Compounding confidence, full dynamic schema, per-object privacy, in-repo benchmark
With this release, everything on the original launch roadmap has shipped.
Compounding confidence — the full engine
A fact’s confidence now rises with independent corroboration and falls on contradiction — and contradicted facts are superseded, never silently overwritten.
- Every relation sighting is recorded as evidence, one row per source document per edge; confidence is recomputed with a damped noisy-OR anchored on the strongest source.
- On single-valued (“functional”) predicates like works-for, a confident conflicting observation closes the old edge, weakens it, and records the supersession in the claims ledger — history stays queryable.
- brain_why gains independent-source counts, an audited confidence trend (e.g. “0.8 → 0.86”), and a superseded-relations list. brain_stats gains an evidence section. All additive — no tool contract changes.
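The confidence and supersession rules above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code. The release notes do not give the exact formula, so the type names, the `DAMPING`, `CONFLICT_THRESHOLD`, and `WEAKEN_FACTOR` constants, and the "move part of the way from the strongest source toward the noisy-OR value" reading of "damped" are all assumptions.

```typescript
// Hypothetical shapes; none of these names come from the project.
interface Evidence {
  sourceDocId: string; // one row per source document per edge
  confidence: number; // per-source confidence in [0, 1]
}

interface Edge {
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
  status: "ACTIVE" | "SUPERSEDED";
}

interface LedgerEntry {
  superseded: Edge;
  by: Edge;
}

// Assumed tuning values; the real ones are not stated in the release notes.
const DAMPING = 0.5;
const CONFLICT_THRESHOLD = 0.8;
const WEAKEN_FACTOR = 0.5;

// Plain noisy-OR: probability that at least one independent source is right.
function noisyOr(confidences: number[]): number {
  return 1 - confidences.reduce((acc, c) => acc * (1 - c), 1);
}

// Damped variant anchored on the strongest source: start from the best single
// source and move only part of the way toward the full noisy-OR value.
function compoundedConfidence(evidence: Evidence[], damping = DAMPING): number {
  if (evidence.length === 0) return 0;
  const confidences = evidence.map((e) => e.confidence);
  const strongest = Math.max(...confidences);
  return strongest + damping * (noisyOr(confidences) - strongest);
}

// On a functional predicate, a confident conflicting observation closes and
// weakens the active edge and logs the supersession; nothing is deleted.
// Simplification: assumes `incoming` is not already present in `edges`.
function applyObservation(
  edges: Edge[],
  incoming: Edge,
  functionalPredicates: Set<string>,
  ledger: LedgerEntry[],
): Edge[] {
  const confident =
    functionalPredicates.has(incoming.predicate) &&
    incoming.confidence >= CONFLICT_THRESHOLD;
  const updated = edges.map((edge) => {
    const conflicts =
      confident &&
      edge.status === "ACTIVE" &&
      edge.subject === incoming.subject &&
      edge.predicate === incoming.predicate &&
      edge.object !== incoming.object;
    if (!conflicts) return edge;
    const closed: Edge = {
      ...edge,
      status: "SUPERSEDED",
      confidence: edge.confidence * WEAKEN_FACTOR,
    };
    ledger.push({ superseded: closed, by: incoming });
    return closed;
  });
  return [...updated, incoming];
}
```

Under this formulation a single source keeps its own confidence, each additional independent source raises the result by a bounded amount, and the superseded edge remains in the returned list, so its history can still be queried.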
Verified by npm run test:compounding (the full lifecycle end-to-end against a live database), plus 17 unit tests. Proven live with llama3.2:3b, where two documents asserting different employers produce:
Rhea Calloway -[works for]-> Halcyon Labs conf=0.600 [SUPERSEDED]
Rhea Calloway -[works for]-> Driftwood Analytics conf=1.000 [ACTIVE]
Full dynamic schema — gated auto-promotion
The propose-and-surface loop from v1.1.0 now completes: proposals corroborated by enough distinct documents at high confidence auto-promote into the live catalogs, and the promoted type is immediately usable by the next extraction batch. Strictly opt-in (BRAIN_SCHEMA_AUTO_PROMOTE=1, defaults: 3 independent documents at ≥0.8 confidence), with a full audit trail on the proposal row — and strict curation mode always wins. Verified by npm run test:schema-promotion.
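As a rough sketch of the gate described above: only the BRAIN_SCHEMA_AUTO_PROMOTE variable and the defaults (3 documents, 0.8) come from the release notes. The `Proposal` and `PromotionConfig` shapes, the function names, and the way strict curation is passed in are hypothetical. The sketch also assumes the threshold applies to each document's own sighting confidence, which the notes do not spell out.

```typescript
// Hypothetical proposal shape: one sighting per corroborating observation.
interface Proposal {
  kind: string;
  sightings: { docId: string; confidence: number }[];
}

interface PromotionConfig {
  autoPromote: boolean; // from BRAIN_SCHEMA_AUTO_PROMOTE=1
  strictCuration: boolean; // how this is configured is an assumption
  minDocuments: number; // documented default: 3
  minConfidence: number; // documented default: 0.8
}

// The environment is passed in as a plain record so the sketch stays
// self-contained; strict curation is a plain argument because the notes
// do not say how it is switched on.
function configFromEnv(
  env: Record<string, string | undefined>,
  strictCuration: boolean,
): PromotionConfig {
  return {
    autoPromote: env.BRAIN_SCHEMA_AUTO_PROMOTE === "1",
    strictCuration,
    minDocuments: 3,
    minConfidence: 0.8,
  };
}

function shouldAutoPromote(proposal: Proposal, cfg: PromotionConfig): boolean {
  // Opt-in only, and strict curation always wins.
  if (!cfg.autoPromote || cfg.strictCuration) return false;
  // Count distinct documents, so repeated sightings in one document
  // do not count as independent corroboration.
  const qualifyingDocs = new Set(
    proposal.sightings
      .filter((s) => s.confidence >= cfg.minConfidence)
      .map((s) => s.docId),
  );
  return qualifyingDocs.size >= cfg.minDocuments;
}
```

Counting distinct document IDs rather than raw sightings is the design point here: a type mentioned five times in one document still has only one independent source.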
Per-object privacy
Documents marked private are readable only by the agent that created them (plus service-role callers) across all six read tools — brain_search, brain_context_pack, brain_recall_memory, brain_why, brain_neighbors, and brain_get_related. Private rows with no recorded creator stay hidden from non-service callers, conservative by design. Workspace, org, and public documents behave exactly as before. Verified by a two-agent visibility-matrix check, npm run test:sharing.
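The visibility rule above amounts to a small predicate. The sketch below is illustrative only: the `Doc` and `Caller` shapes and the `passesPrivacyGate` name are hypothetical, and non-private documents are simply passed through to whatever access rules already apply.

```typescript
type Visibility = "private" | "workspace" | "org" | "public";

// Hypothetical shapes; field names are assumptions, not the project's schema.
interface Doc {
  visibility: Visibility;
  createdByAgentId: string | null; // null when no creator was recorded
}

interface Caller {
  agentId: string;
  isServiceRole: boolean;
}

// Returns true when the private-document gate does not block the read.
function passesPrivacyGate(doc: Doc, caller: Caller): boolean {
  // Workspace, org, and public documents are unaffected by this gate.
  if (doc.visibility !== "private") return true;
  // Service-role callers can always read private documents.
  if (caller.isServiceRole) return true;
  // Conservative default: a private row with no recorded creator stays hidden.
  if (doc.createdByAgentId === null) return false;
  // Otherwise only the creating agent may read it.
  return doc.createdByAgentId === caller.agentId;
}
```

A two-agent check in the spirit of the visibility matrix would assert that agent A reads its own private document, agent B does not, and a service-role caller reads both.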
LongMemEval benchmark harness — in the repo
The headline number is reproducible by anyone, not asserted: 73.6% end-to-end QA accuracy on the complete 500-question LongMemEval oracle subset (no sampling) with 100% evidence-retrieval recall — reader gpt-4o-mini, judge gpt-4o. The harness ships in-repo at evals/longmemeval — self-contained, per-question workspace isolation, with methodology in its README. Run it yourself.
Run the checks yourself
- npm run test:compounding: full corroboration → contradiction → supersession lifecycle, live DB, no LLM
- npm run test:schema-promotion: default-off, gated auto-promotion, promoted-kind-becomes-usable
- npm run test:sharing: two-agent visibility matrix for private documents
- evals/longmemeval/: the benchmark harness itself; run the headline number yourself
Full details in the repository changelog. All changes are additive — no breaking changes to any brain_* tool contract.