Corpus
The current launch-family corpus is the v16 dgen1-r5-synth-300k corpus. It is
generated from structured BOTCOIN challenge ingredients: worlds, entities,
relations, temporal updates, traps, and hard negatives. The generation path
turns those ingredients into retrieval-evaluation records.
Each record carries:
| Field | Purpose |
|---|---|
| Query text | The retrieval task |
| Truth documents | Answer-bearing memory documents |
| Hard negatives | Plausible documents that should rank below truth |
| Graded qrels | Relevance labels for nDCG@10, MRR, recall, and audit metrics |
| Split | train_visible, calibration, eval_hidden, or canary |
| Public intent metadata | Temporal, relation, evidence, conflict, scope, entity, and abstention hints available to all miners |
| Embeddings | Bundle-layout-compatible BGE-M3 query and document vectors |
| Provenance | Source domain, seed, generator path, and deterministic roots |
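
As a rough illustration of the record shape in the table above, the following Python sketch models one record as a dataclass. All class, field, and method names here are hypothetical and chosen only for readability; the actual bundle layout is not specified in this section, and embeddings are left out to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Split(str, Enum):
    # Split names are taken from the table; the enum itself is an assumption.
    TRAIN_VISIBLE = "train_visible"
    CALIBRATION = "calibration"
    EVAL_HIDDEN = "eval_hidden"
    CANARY = "canary"


@dataclass(frozen=True)
class CorpusRecord:
    """Hypothetical in-memory view of one retrieval-evaluation record."""

    query_text: str
    truth_doc_ids: tuple[str, ...]
    hard_negative_ids: tuple[str, ...]
    qrels: dict[str, int]  # document id -> graded relevance label
    split: Split
    intent: dict[str, str] = field(default_factory=dict)
    provenance: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Check that every truth document is graded above every hard negative."""
        if not self.truth_doc_ids:
            raise ValueError("record has no truth documents")
        lowest_truth = min(self.qrels[d] for d in self.truth_doc_ids)
        highest_negative = max(
            (self.qrels.get(d, 0) for d in self.hard_negative_ids), default=0
        )
        if lowest_truth <= highest_negative:
            raise ValueError("a hard negative is graded at or above a truth document")
```

The `validate` check encodes only the one property the table states directly, that hard negatives should rank below truth.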
The production qrel path uses synthesizer-category labels. The generator knows why each negative exists, for example stale fact, entity swap, relation neighbor, attribute swap, lexical distractor, or unrelated filler, and the bundle maps those categories into graded relevance. Because the labels come from the generator itself, production corpus growth avoids a heavier relabeling pass over every query-document pair. Larger audit rerankers remain useful for calibration checks.
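The category-to-grade mapping can be sketched as follows. The grade values in `CATEGORY_GRADE` are illustrative assumptions, not the bundle's actual mapping, and the function names are hypothetical. The nDCG@10 computation uses the common exponential-gain form, which may differ from the pinned scorer.

```python
import math

# Assumed grades for illustration only; the real bundle mapping is not given here.
CATEGORY_GRADE = {
    "truth": 3,
    "stale_fact": 1,
    "relation_neighbor": 1,
    "entity_swap": 0,
    "attribute_swap": 0,
    "lexical_distractor": 0,
    "unrelated_filler": 0,
}


def grade_qrels(doc_categories):
    """Turn {doc_id: synthesizer category} into {doc_id: graded relevance}."""
    return {doc_id: CATEGORY_GRADE[cat] for doc_id, cat in doc_categories.items()}


def dcg(grades):
    """Discounted cumulative gain with exponential gain (2^grade - 1)."""
    return sum((2**g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k of a ranking against graded qrels; unlabeled documents count as 0."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = grade_qrels(
    {"doc_truth": "truth", "doc_stale": "stale_fact", "doc_filler": "unrelated_filler"}
)
perfect = ndcg_at_k(["doc_truth", "doc_stale", "doc_filler"], qrels)
inverted = ndcg_at_k(["doc_filler", "doc_stale", "doc_truth"], qrels)
```

In this sketch a ranking that places the truth document first scores 1.0, while a ranking that buries it under filler scores lower, which is the behavior graded qrels are meant to reward.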
Corpus growth is published through signed deltas. Validators retain the launch
base corpus and can reconstruct historical corpus roots by walking the signed
delta chain forward. A manual --corpus-for-root 0x...=path shortcut exists
for operators, while the normal validator path auto-resolves and verifies
historical roots before post-reveal rescoring.
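A minimal sketch of walking a delta chain forward from the launch base root is shown below. Everything in it is an assumption made for illustration: the `Delta` fields, the root derivation (SHA-256 over the parent root and the payload hash), and the use of an HMAC as a stand-in for a real signature so the example runs on the standard library alone. The actual delta format and signature scheme are not described in this section.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Hypothetical signed corpus delta."""

    parent_root: bytes
    payload: bytes
    signature: bytes


def next_root(parent_root: bytes, payload: bytes) -> bytes:
    # Assumed root derivation: hash of parent root plus payload hash.
    return hashlib.sha256(parent_root + hashlib.sha256(payload).digest()).digest()


def sign(key: bytes, parent_root: bytes, payload: bytes) -> bytes:
    # HMAC stand-in for a publisher signature (assumption, not the real scheme).
    return hmac.new(key, parent_root + payload, hashlib.sha256).digest()


def walk_to_root(base_root: bytes, deltas, target_root: bytes, key: bytes):
    """Return the verified deltas that lead from base_root to target_root."""
    root = base_root
    applied = []
    if root == target_root:
        return applied
    for delta in deltas:
        if delta.parent_root != root:
            raise ValueError("delta does not extend the current root")
        expected = sign(key, delta.parent_root, delta.payload)
        if not hmac.compare_digest(expected, delta.signature):
            raise ValueError("delta signature check failed")
        root = next_root(root, delta.payload)
        applied.append(delta)
        if root == target_root:
            return applied
    raise LookupError("target root is not reachable from the base corpus")
```

The walk verifies each delta before advancing, so a historical root is only accepted if every step from the base corpus up to it checks out.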
Corpus evolution is part of the memory model. New information enters, older information becomes stale, conflicts appear, retired hidden tasks leave the scoring pool, and new hidden tasks are added. Each evolve event is calibrated against its own corpus root, query pack, baseline, and pinned scorer context.
The calibration path also checks whether useful substrate changes generalize across corpus generations. A representative test starts with substrate design A on corpus/query pack A, evolves into corpus/query pack B, accepts a miner patch that beats the newly calibrated baseline, then backtests the resulting substrate design B against the pre-evolve corpus/query pack A. When design B preserves pre-evolve performance while improving on the evolved context, the result indicates a retrieval-routing improvement rather than corpus-specific indexing churn.
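The backtest described above reduces to two comparisons, sketched here in Python. The `CalibrationContext` fields mirror the items listed for each evolve event, but the class, the `generalizes` function, the `tolerance` parameter, and the use of a single scalar score per run are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationContext:
    """Hypothetical bundle of what one evolve event is calibrated against."""

    corpus_root: str
    query_pack: str
    baseline_score: float
    scorer_version: str


def generalizes(
    design_a_on_a: float,
    design_b_on_a: float,
    design_b_on_b: float,
    context_b: CalibrationContext,
    tolerance: float = 0.0,
) -> bool:
    """True when design B beats the evolved baseline without regressing on pack A."""
    improves_evolved = design_b_on_b > context_b.baseline_score
    preserves_pre_evolve = design_b_on_a >= design_a_on_a - tolerance
    return improves_evolved and preserves_pre_evolve
```

A patch that improves the evolved context but drops below design A on the pre-evolve pack would fail this check, which is the signature of corpus-specific indexing churn rather than a routing improvement.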