Published: 2026-06-03
Building an agent that remembers across 10,000 conversations without hallucination
Most agent memory systems break down somewhere between conversation 50 and conversation 500. Retrieval quality decays. Hallucinations compound. Context windows overflow. At Myco Brain, we needed a system that could maintain coherent memory across tens of thousands of agent interactions—without the LLM inventing facts from its own degraded memory. Here’s how we built it.
The real scaling problem isn’t storage
Storing 10,000 conversations is cheap. Retrieving the right 3 facts from them is the problem. As memory grows, every retrieval query competes against an expanding corpus. The vector similarity score for the correct fact gets buried under hundreds of near-matches. The LLM gets a pile of semi-relevant chunks and starts guessing.
We call this the retrieval signal-to-noise collapse. It happens around 200–400 conversations with pure vector search, and it gets worse exponentially from there. The solution isn’t better embeddings. It’s structural retrieval—graph traversal that doesn’t degrade with corpus size.
Graph traversal over 10,000 conversations costs a function of the traversal depth d and the fan-out of the nodes it visits, not of n, the total corpus size. Whether you have 100 or 100,000 conversations, brain.neighbors walks roughly the same number of edges to find the answer, provided the nodes along the path keep a bounded number of relations. The graph doesn’t get slower as you add more data; it gets richer.
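To make the scaling argument concrete, here is a minimal sketch of a depth-limited walk over an in-memory adjacency map. The `neighbors` function, the node names, and the graph shape are all hypothetical stand-ins, not the actual brain.neighbors tool. The sketch only illustrates that the number of edges examined depends on what is reachable from the start node, not on how many unrelated nodes exist.

```python
from collections import deque

def neighbors(graph, start, max_depth):
    """Breadth-first walk from `start`, at most `max_depth` hops out.

    Returns (reachable nodes, number of edges examined).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    edges_examined = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth limit
        for nxt in graph.get(node, ()):
            edges_examined += 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen, edges_examined

# Hypothetical toy graph: a person, a project, and a decision.
graph = {
    "person:ada": ["project:atlas"],
    "project:atlas": ["decision:use-postgres"],
    "decision:use-postgres": [],
}
small_nodes, small_cost = neighbors(graph, "person:ada", max_depth=2)

# Add 10,000 unrelated conversations, each with its own fact.
for i in range(10_000):
    graph[f"conversation:{i}"] = [f"fact:{i}"]
    graph[f"fact:{i}"] = []
large_nodes, large_cost = neighbors(graph, "person:ada", max_depth=2)

# Same edges walked before and after the corpus grew.
print(small_cost, large_cost)
```

The caveat is fan-out: if a hub node gains a new relation with every conversation, the walk through that hub grows with the corpus, so the flat cost holds only while the nodes on the path keep a bounded degree.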
Deterministic at scale
The architecture splits cleanly into two planes that scale independently:
Client / Agent
↓ MCP tool call (context_pack, search, recall_memory)
MCP Server — tool validation + policy + RLS session context
↓ deterministic writes + graph traversals
Postgres 16 + pgvector — source of truth (hyobjects, relations, evidence)
↑ advisory suggestions only
LLM Advisory Worker — embeddings, NER, relation proposals, confidence scoring
The advisory workers (the LLM side) scale horizontally. More GPUs, more parallel extraction. The deterministic plane (MCP server + Postgres) scales vertically and through read replicas. Neither plane blocks the other. An increase in ingest volume doesn’t degrade query performance because querying the graph is independent of advisory extraction.
Why hallucination doesn’t compound
In a vector-only memory system, a hallucinated memory becomes part of the retrieval corpus. Future queries match against it. The agent retrieves its own hallucination from last week and treats it as fact. This creates a feedback loop where hallucination compounds.
Our confidence gating prevents this at three levels:
- Extraction confidence. Every proposed fact carries a confidence score from the advisory LLM. If the score falls below the gate’s threshold, the fact never reaches durable storage.
- Schema validation. Facts must conform to the schema. A hallucinated fact about a person that doesn’t exist as a hyobject gets rejected at write time because the foreign key reference fails.
- Provenance anchoring. Every durable fact links back to source evidence. The brain.why tool can trace any fact to its origin. Hallucinated facts have no provenance chain, so they fail audit at the first why query.
The feedback loop breaks because bad data never enters the retrieval pool. It gets rejected at the deterministic gate before it can corrupt future retrievals.
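The three gates above can be sketched as a single deterministic check that runs before any write. This is an illustration under stated assumptions, not the production gate: the `ProposedFact` shape, the `gate` function, the rejection reasons, and the 0.8 threshold are all hypothetical, and the set lookups stand in for what would be foreign key and evidence constraints in the database.

```python
from dataclasses import dataclass, field

MIN_CONFIDENCE = 0.8  # assumed threshold; the article does not state one

@dataclass
class ProposedFact:
    subject_id: str
    statement: str
    confidence: float
    evidence_ids: list = field(default_factory=list)

def gate(fact, known_objects, known_evidence):
    """Return (accepted, reason). Deterministic: no LLM call happens here."""
    # Gate 1: extraction confidence.
    if fact.confidence < MIN_CONFIDENCE:
        return False, "low_confidence"
    # Gate 2: schema validation (stands in for a foreign key check).
    if fact.subject_id not in known_objects:
        return False, "unknown_subject"
    # Gate 3: provenance anchoring; every cited evidence id must exist.
    has_evidence = bool(fact.evidence_ids)
    if not has_evidence or any(e not in known_evidence for e in fact.evidence_ids):
        return False, "no_provenance"
    return True, "accepted"

known_objects = {"person:ada"}
known_evidence = {"ev:1"}

proposals = [
    ProposedFact("person:ada", "prefers Postgres", 0.95, ["ev:1"]),
    ProposedFact("person:ada", "lives in Lisbon", 0.40, ["ev:1"]),
    ProposedFact("person:bob", "manages Ada", 0.90, ["ev:1"]),
    ProposedFact("person:ada", "owns a boat", 0.90, []),
]
results = [gate(p, known_objects, known_evidence) for p in proposals]
```

Only the first proposal is accepted. The other three are rejected with a recorded reason, so none of them can be retrieved later as if it were a fact.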
Idempotency: replay safety at scale
At scale, workers crash. Jobs retry. Ingestion pipelines replay the same input multiple times. Without idempotency, every retry creates duplicate facts. In a long-running memory system, those replay artifacts can pollute the graph and make retrieval harder to trust.
We enforce idempotency at the database level through content-addressable hashing. Every fact is uniquely identified by a normalized hash of its content. Before writing, the system checks: does a fact with this hash already exist? If yes, link to it instead. If no, create it. This is enforced per-transaction, so concurrent workers can’t race each other into creating duplicates.
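A minimal sketch of the hash-then-write pattern follows. It uses Python's built-in sqlite3 so it runs anywhere. The table layout, the `content_hash` normalization (lowercase plus collapsed whitespace), and the `upsert_fact` helper are assumptions for illustration, not the actual schema. The sketch also makes one design choice explicit: a UNIQUE constraint on the hash, so the database rejects the duplicate even if two workers pass an application-level existence check at the same moment. In Postgres the equivalent statement would be `INSERT ... ON CONFLICT (hash) DO NOTHING`.

```python
import hashlib
import sqlite3

def content_hash(text):
    """SHA-256 of normalized content: lowercased, whitespace collapsed."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    " id INTEGER PRIMARY KEY,"
    " hash TEXT NOT NULL UNIQUE,"  # the constraint that makes replays safe
    " content TEXT NOT NULL)"
)

def upsert_fact(conn, text):
    """Insert the fact if its hash is new; return the stored row id either way."""
    h = content_hash(text)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO facts (hash, content) VALUES (?, ?)",
            (h, text),
        )
        row = conn.execute("SELECT id FROM facts WHERE hash = ?", (h,)).fetchone()
    return row[0]

batch = ["Ada prefers Postgres.", "ada  prefers postgres.", "Atlas ships in June."]

first_ids = [upsert_fact(conn, t) for t in batch]
second_ids = [upsert_fact(conn, t) for t in batch]  # simulated retry of the same job

fact_count = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
```

After both passes the table holds two rows: the first two inputs normalize to the same hash, and the replay resolves every input to an existing id instead of creating new rows.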
The operational payoff: you can replay your entire ingestion pipeline from day one without duplicating already-seen facts. Disaster recovery becomes a replay operation instead of a fragile data migration.
Multi-tenancy: why it matters at scale
If you’re an agency running Myco Brain for multiple clients, the scaling problem gets harder. Client A’s memories must never leak into Client B’s context—not through retrieval, not through the LLM’s training data, not through a misconfigured workspace.
We enforce isolation through Postgres Row-Level Security. Every query runs under a session context that scopes to a specific workspace. The deterministic plane enforces this at the database level. The advisory plane operates within the same RLS boundary. The isolation boundary is enforced by Postgres, not by application logic.
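As a rough sketch of what workspace scoping through RLS can look like, the snippet below pairs Postgres policy DDL with a small helper that binds the workspace before each query. Everything named here is hypothetical: the `facts` table, the `workspace_id` column, the `app.workspace_id` setting, and the `query_in_workspace` helper. The helper assumes a DB-API connection in the style of psycopg, with both statements running inside the same transaction.

```python
# Postgres DDL, shown as a string. Table, column, policy, and setting names
# are hypothetical. FORCE makes the policy apply to the table owner too.
RLS_SETUP = """
ALTER TABLE facts ENABLE ROW LEVEL SECURITY;
ALTER TABLE facts FORCE ROW LEVEL SECURITY;
CREATE POLICY workspace_isolation ON facts
    USING (workspace_id = current_setting('app.workspace_id')::uuid);
"""

# set_config(..., true) scopes the value to the current transaction only,
# so a pooled connection cannot carry one tenant's context into the next.
SET_WORKSPACE = "SELECT set_config('app.workspace_id', %s, true)"

def query_in_workspace(conn, workspace_id, sql, params=()):
    """Bind the workspace for this transaction, then run the query.

    Assumes `conn` is a psycopg-style connection that is not in autocommit
    mode, so both statements share one transaction.
    """
    with conn.cursor() as cur:
        cur.execute(SET_WORKSPACE, (str(workspace_id),))
        cur.execute(sql, params)
        return cur.fetchall()
```

With a policy like this, a query that forgets its own `WHERE workspace_id = ...` clause still only sees rows for the bound workspace. Note that Postgres superusers and roles with BYPASSRLS are exempt from row security, so the application would need to connect as an ordinary role.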
What we learned shipping this
Three things that surprised us:
- Confidence gates matter more than embeddings. We spent months tuning embedding models. The single biggest improvement came from the confidence gating that prevents low-confidence facts from entering the graph. Model quality is secondary to architectural discipline.
- RLS is worth the complexity. Row-Level Security adds latency to every query and makes migrations harder. But the alternative is application-level isolation, which fails in ways you can’t predict. RLS at the database level keeps isolation enforcement close to the data instead of scattering it through application code.
- Deterministic writes are a superpower. Knowing that every fact has a content hash, a provenance chain, and an idempotency key means you can debug production issues by querying the database instead of grepping logs. The database becomes the audit trail.