v1.2.2, 2026-06-16

Keyless recency reranker: better retrieval, no API key

A retrieval-quality release. There are no tool-contract changes: the reranker is an optional, additive brain_search argument.

What shipped

brain_search(reranker: 'recency') reorders results by a deterministic blend of relevance and recency, with no API key and no network call:

final = 0.7 · relevance + 0.3 · recency_norm

Memories that are both fresh and relevant rise to the top, which is useful whenever recent context should win ties. This is the production form of the eval's "temporal" strategy (identical formula), and all 15 reranker unit tests pass.
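To make the blend concrete, here is a minimal Python sketch of the formula above. It is not the shipped implementation: the Hit type, the rerank_by_recency name, and the way recency_norm is computed (min-max scaling of timestamps across the candidate set) are assumptions for illustration, as is the premise that relevance is already scaled to [0, 1]. Only the 0.7 / 0.3 weights come from this release.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search result; not the real brain_search result type."""
    id: str
    relevance: float  # assumed to be already scaled to [0, 1]
    timestamp: float  # e.g. Unix seconds; larger means newer


def rerank_by_recency(hits, w_rel=0.7, w_rec=0.3):
    """Sort hits by w_rel * relevance + w_rec * recency_norm, best first."""
    if not hits:
        return []
    oldest = min(h.timestamp for h in hits)
    newest = max(h.timestamp for h in hits)
    span = newest - oldest

    def score(h):
        # Assumption: recency_norm is min-max scaled within the candidate set.
        # If every hit has the same timestamp, treat them all as equally fresh.
        recency_norm = (h.timestamp - oldest) / span if span > 0 else 1.0
        return w_rel * h.relevance + w_rec * recency_norm

    # Break score ties on id so the ordering stays deterministic.
    return sorted(hits, key=lambda h: (-score(h), h.id))


if __name__ == "__main__":
    candidates = [
        Hit("old-note", relevance=0.80, timestamp=100.0),
        Hit("new-note", relevance=0.75, timestamp=200.0),
    ]
    for hit in rerank_by_recency(candidates):
        print(hit.id)
```

In this example the slightly less relevant but newer hit is ranked first, because its recency term outweighs the 0.05 relevance gap. No randomness, clock reads, or network calls are involved, so the same inputs always produce the same order.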

The numbers (full 500-question run)

Evidence recall@5 on the complete 500-question LongMemEval longmemeval_s set, distractors and all:

89.2% (446/500): keyless default (hybrid search).
91.6% (458/500): with the recency reranker (temporal).
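As a quick sanity check, the percentages follow directly from the hit counts. The short Python sketch below only redoes that arithmetic on the two figures above; the variable and function names are illustrative and it does not run the eval.

```python
TOTAL_QUESTIONS = 500

# Questions with evidence in the top 5, as reported in this release.
HITS_AT_5 = {"hybrid": 446, "temporal": 458}


def recall_pct(hits, total=TOTAL_QUESTIONS):
    """Return recall as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


if __name__ == "__main__":
    for strategy, hits in HITS_AT_5.items():
        print(f"{strategy}: {hits}/{TOTAL_QUESTIONS} = {recall_pct(hits)}%")
    extra = HITS_AT_5["temporal"] - HITS_AT_5["hybrid"]
    print(f"gain: {extra} questions, {recall_pct(extra)} percentage points")
```

The reranker's gain works out to 12 additional questions out of 500, or 2.4 percentage points.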

Retrieval is deterministic; both figures were verified in two independent ways with zero drift. Reproduce them yourself:

python -m evals.longmemeval.run --subset longmemeval_s -n 500 --no-qa

Compare the hybrid and temporal rows (ev_at_5). These are the full 500-question figures: every number we publish has to reproduce on the full set, with no sampling.

Also in this release

Eval robustness: embedder inputs are sanitized so a single bad batch can no longer silently skew benchmark results.
README accuracy: the relation edge-survival figure is corrected to 0% → ~80% (11–12/14); 86% is the directed-extraction accuracy, a different metric.

Full details in the repository changelog. No breaking changes to any brain_* tool contract.