Keyless recency reranker — better retrieval, no API key
A retrieval-quality release. No tool-contract changes — the reranker is an optional, additive brain_search argument.
What shipped
brain_search(reranker: 'recency') reorders results by a deterministic blend of relevance and recency — no API key, no network call:
final = 0.7 · relevance + 0.3 · recency_norm
So memories that are both fresh and relevant rise to the top, which is useful whenever recent context should win ties. It’s the production form of the eval’s “temporal” strategy (identical formula), and 15/15 reranker unit tests pass.
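The blend above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped implementation: the `Hit` type and the `recency_rerank` name are hypothetical, relevance is assumed to already lie in [0, 1], and recency_norm is assumed to be a min-max scaling of timestamps across the returned results (the release notes do not say how recency is normalized).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical search result; field names are illustrative only."""
    id: str
    relevance: float  # assumed to be scaled to [0, 1] already
    timestamp: float  # assumed Unix seconds; larger means newer


def recency_rerank(hits, w_rel=0.7, w_rec=0.3):
    """Reorder hits by w_rel * relevance + w_rec * recency_norm.

    recency_norm is ASSUMED to be min-max scaling over this result set:
    the oldest hit gets 0.0 and the newest gets 1.0.
    """
    if not hits:
        return []
    oldest = min(h.timestamp for h in hits)
    newest = max(h.timestamp for h in hits)
    span = newest - oldest

    def score(h):
        # If every hit has the same timestamp, recency cannot break ties,
        # so give all of them the same recency value.
        recency_norm = (h.timestamp - oldest) / span if span > 0 else 1.0
        return w_rel * h.relevance + w_rec * recency_norm

    # sorted() is stable, so equal scores keep their incoming order and the
    # output is deterministic for a given input.
    return sorted(hits, key=score, reverse=True)


hits = [Hit("old", 0.80, 0.0), Hit("new", 0.75, 100.0)]
print([h.id for h in recency_rerank(hits)])
```

In this example the newer memory scores 0.7 · 0.75 + 0.3 · 1.0 = 0.825 against 0.7 · 0.80 + 0.3 · 0.0 = 0.56 for the older one, so a small relevance deficit is outweighed by freshness.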
The numbers (full 500-question run)
On the complete 500-question LongMemEval longmemeval_s set — distractors and all — evidence recall@5:
- 89.2%: keyless default (hybrid search), 446/500.
- 91.6%: with the recency reranker (temporal), 458/500.
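As a quick arithmetic check, the percentages follow directly from the hit counts quoted above. The variable and function names here are illustrative; only the counts come from the run.

```python
TOTAL_QUESTIONS = 500

# Questions with evidence retrieved in the top 5, as reported above.
hits_at_5 = {"hybrid": 446, "temporal": 458}


def recall_pct(hits, total=TOTAL_QUESTIONS):
    """Recall as a percentage, rounded to one decimal place."""
    return round(100 * hits / total, 1)


for strategy, hits in hits_at_5.items():
    print(f"{strategy}: {recall_pct(hits)}%")

# Difference between the two strategies, in questions and percentage points.
gain_questions = hits_at_5["temporal"] - hits_at_5["hybrid"]
gain_points = round(recall_pct(hits_at_5["temporal"]) - recall_pct(hits_at_5["hybrid"]), 1)
print(f"gain: {gain_questions} questions, {gain_points} points")
```

The reranker recovers evidence for 12 more questions than the default, a gain of 2.4 percentage points.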
Retrieval is deterministic; these were verified two independent ways with zero drift. Reproduce them yourself:
python -m evals.longmemeval.run --subset longmemeval_s -n 500 --no-qa
Compare the hybrid and temporal rows (ev_at_5). These are the full 500-question figures: every number we publish has to reproduce on the full set, no sampling.
Also in this release
- Eval robustness — embedder inputs are sanitized so a single bad batch can no longer silently skew benchmark results.
- README accuracy — the relation edge-survival figure is corrected to 0% → 79% (86% is the directed-extraction accuracy, a different metric).
Full details in the repository changelog. No breaking changes to any brain_* tool contract.