Corpus

The current launch-family corpus is v16 dgen1-r5-synth-300k. It is generated from structured BOTCOIN challenge ingredients: worlds, entities, relations, temporal updates, traps, and hard negatives. The generation path turns these ingredients into retrieval-evaluation records.

Each record carries:

| Field | Purpose |
| --- | --- |
| Query text | The retrieval task |
| Truth documents | Answer-bearing memory documents |
| Hard negatives | Plausible documents that should rank below truth |
| Graded qrels | Relevance labels for nDCG@10, MRR, recall, and audit metrics |
| Split | train_visible, calibration, eval_hidden, or canary |
| Public intent metadata | Temporal, relation, evidence, conflict, scope, entity, and abstention hints available to all miners |
| Embeddings | Bundle-layout-compatible BGE-M3 query and document vectors |
| Provenance | Source domain, seed, generator path, and deterministic roots |
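
As a rough illustration of the record shape listed above, the following Python sketch models one record as a dataclass. The class name, field names, field types, and the `validate` rule are all assumptions made for this example; the actual bundle layout is not specified in this section.

```python
from dataclasses import dataclass, field
from enum import Enum


class Split(Enum):
    """The four splits named in the field list."""
    TRAIN_VISIBLE = "train_visible"
    CALIBRATION = "calibration"
    EVAL_HIDDEN = "eval_hidden"
    CANARY = "canary"


@dataclass
class CorpusRecord:
    """Hypothetical in-memory view of one retrieval-evaluation record."""
    query_text: str
    truth_doc_ids: list[str]
    hard_negative_doc_ids: list[str]
    qrels: dict[str, int]  # doc_id -> graded relevance label
    split: Split
    intent_metadata: dict[str, str] = field(default_factory=dict)
    query_embedding: list[float] = field(default_factory=list)
    doc_embeddings: dict[str, list[float]] = field(default_factory=dict)
    provenance: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Check the invariant that hard negatives rank below truth."""
        overlap = set(self.truth_doc_ids) & set(self.hard_negative_doc_ids)
        if overlap:
            raise ValueError(f"documents are both truth and negative: {sorted(overlap)}")
        truth_grades = [self.qrels.get(d, 0) for d in self.truth_doc_ids]
        negative_grades = [self.qrels.get(d, 0) for d in self.hard_negative_doc_ids]
        if truth_grades and negative_grades and min(truth_grades) <= max(negative_grades):
            raise ValueError("every truth document must outrank every hard negative")
```

A record built this way can be checked with `record.validate()` before it is used for scoring; the check raises `ValueError` if a hard negative is graded at or above a truth document.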

The production qrel path uses synthesizer-category labels. Because the generator knows why each negative exists (stale fact, entity swap, relation neighbor, attribute swap, lexical distractor, or unrelated filler), the bundle can map those categories directly into graded relevance. Larger audit rerankers remain useful for calibration checks, but production corpus growth does not need a heavier relabeling pass over every query-document pair.
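
The category-to-grade mapping can be sketched as follows. The grade values in `CATEGORY_GRADE` are hypothetical placeholders chosen for illustration, not the bundle's actual mapping, and the nDCG@10 function uses the common exponential-gain formulation, which may differ from the pinned scorer.

```python
import math

# Hypothetical grades: the real bundle mapping is not given in this section.
CATEGORY_GRADE = {
    "truth": 3,
    "stale_fact": 1,
    "relation_neighbor": 1,
    "entity_swap": 0,
    "attribute_swap": 0,
    "lexical_distractor": 0,
    "unrelated_filler": 0,
}


def qrels_from_categories(doc_categories: dict[str, str]) -> dict[str, int]:
    """Turn generator-assigned categories into graded relevance labels."""
    return {doc_id: CATEGORY_GRADE[cat] for doc_id, cat in doc_categories.items()}


def dcg(grades: list[int]) -> float:
    """Discounted cumulative gain with exponential gain (2^grade - 1)."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg_at_k(ranked_doc_ids: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k of a ranking against graded qrels; unlabeled documents count as 0."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Because the labels come from a dictionary lookup on the generator's own category, adding documents costs one lookup per document rather than a reranker call per pair.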

Corpus growth is published through signed deltas. Validators retain the launch base corpus and can reconstruct historical corpus roots by walking the signed delta chain forward. A manual --corpus-for-root 0x...=path shortcut exists for operators, while the normal validator path auto-resolves and verifies historical roots before post-reveal rescoring.
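
A minimal sketch of walking a delta chain forward to a historical root is shown below. It rests on several assumptions that are not taken from the protocol: the root update rule (SHA-256 over parent root plus payload), the `SignedDelta` layout, and the use of an HMAC as a stand-in for whatever signature scheme the real deltas use.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedDelta:
    """Hypothetical delta: the root it extends, its payload, and a signature."""
    parent_root: bytes
    payload: bytes
    signature: bytes


def next_root(parent_root: bytes, payload: bytes) -> bytes:
    # Assumed root update rule, for illustration only.
    return hashlib.sha256(parent_root + payload).digest()


def sign(key: bytes, parent_root: bytes, payload: bytes) -> bytes:
    # HMAC stands in for the real signature scheme, which is not specified here.
    return hmac.new(key, parent_root + payload, hashlib.sha256).digest()


def resolve_root(base_root: bytes, deltas: list[SignedDelta],
                 target_root: bytes, key: bytes) -> int:
    """Return how many deltas must be applied to the base corpus to reach target_root."""
    root = base_root
    if root == target_root:
        return 0
    for applied, delta in enumerate(deltas, start=1):
        if delta.parent_root != root:
            raise ValueError("delta does not extend the current root")
        expected = sign(key, delta.parent_root, delta.payload)
        if not hmac.compare_digest(expected, delta.signature):
            raise ValueError("delta signature failed verification")
        root = next_root(root, delta.payload)
        if root == target_root:
            return applied
    raise LookupError("target root is not on the signed delta chain")
```

The walk verifies every delta it passes, so a historical root is only accepted if the whole prefix of the chain leading to it checks out.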

Corpus evolution is part of the memory model. New information enters, older information becomes stale, conflicts appear, retired hidden tasks leave the scoring pool, and new hidden tasks are added. Each evolve event is calibrated against its own corpus root, query pack, baseline, and pinned scorer context.

The calibration path also checks whether useful substrate changes generalize across corpus generations. A representative test starts with substrate design A on corpus/query pack A, evolves into corpus/query pack B, accepts a miner patch that beats the newly calibrated baseline, then backtests the resulting substrate design B against the pre-evolve corpus/query pack A. When design B preserves pre-evolve performance while improving the evolved context, the result shows a retrieval-routing improvement rather than corpus-specific indexing churn.
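
The acceptance logic of that backtest can be sketched as a single predicate. The function name, the use of one scalar score per design and context, and the `tolerance` parameter are assumptions for illustration; the real calibration path may compare several metrics.

```python
def generalizes(design_a_on_a: float,
                design_b_on_a: float,
                baseline_on_b: float,
                design_b_on_b: float,
                tolerance: float = 0.01) -> bool:
    """Decide whether design B looks like a retrieval-routing improvement.

    design_a_on_a: score of the pre-evolve design on corpus/query pack A
    design_b_on_a: score of the patched design backtested on pack A
    baseline_on_b: newly calibrated baseline on the evolved pack B
    design_b_on_b: score of the patched design on pack B
    tolerance:     hypothetical allowance for score noise on pack A
    """
    improves_evolved = design_b_on_b > baseline_on_b
    preserves_pre_evolve = design_b_on_a >= design_a_on_a - tolerance
    return improves_evolved and preserves_pre_evolve
```

For example, `generalizes(0.60, 0.60, 0.55, 0.58)` returns `True`, while a patch that gains on pack B but drops to 0.50 on pack A returns `False`, which would point to corpus-specific indexing churn.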