Retrieval Evaluation
CoreTex evaluates each candidate patch against hidden query packs derived after the patch is received. The seed is bound to:
epochSecret + future Base blockhash + epochId + patchHash
+ parentRoot + minerAddress + corpusRoot + bundleHash
The future blockhash is not known when the patch arrives, so the coordinator cannot pre-test the patch against its actual hidden pack at receive time. Re-submitting the same (parentRoot, patchBytes) uses the cached verdict instead of rolling a fresh pack.
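A minimal sketch of how such a seed could be bound to all eight inputs. The hash function (SHA-256), the length-prefixed encoding, and the function name `derive_pack_seed` are assumptions for illustration; the text above only lists which values the seed is bound to.

```python
import hashlib


def derive_pack_seed(
    epoch_secret: bytes,
    future_blockhash: bytes,
    epoch_id: int,
    patch_hash: bytes,
    parent_root: bytes,
    miner_address: bytes,
    corpus_root: bytes,
    bundle_hash: bytes,
) -> bytes:
    """Bind every listed input into one 32-byte seed (hypothetical encoding)."""
    fields = [
        epoch_secret,
        future_blockhash,
        epoch_id.to_bytes(8, "big"),
        patch_hash,
        parent_root,
        miner_address,
        corpus_root,
        bundle_hash,
    ]
    h = hashlib.sha256()
    for field in fields:
        # Length-prefix each field so adjacent fields cannot be re-split ambiguously.
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.digest()
```

Because the future blockhash is one of the inputs, the seed cannot be computed until the target block exists, and changing any single input yields a different pack.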
The coordinator records receivedAtBlock when a patch enters the eval queue. That value is part of the signed evaluation report, along with the target block, target blockhash, patch hash, and duplicate key. Replay watchers verify the blockhash against Base RPC data.
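The replay check can be pictured roughly as follows. The `EvalReport` field names, the rule that the target block comes strictly after `receivedAtBlock`, and the `get_blockhash` callable (a stand-in for whatever Base RPC client a watcher uses) are all assumptions. Signature verification is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalReport:
    """Hypothetical shape of the signed report fields named above."""
    received_at_block: int
    target_block: int
    target_blockhash: str
    patch_hash: str
    duplicate_key: str


def replay_check(report: EvalReport, get_blockhash: Callable[[int], str]) -> bool:
    """Return True if the reported blockhash matches the chain data."""
    # Assumption: the target block must come strictly after the receive block,
    # otherwise its hash would not have been "future" at receive time.
    if report.target_block <= report.received_at_block:
        return False
    # Compare case-insensitively, since hex encodings can differ in case.
    return get_blockhash(report.target_block).lower() == report.target_blockhash.lower()
```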
For each hidden query:
- Decode active substrate slots.
- Compare the query embedding to active retrieval-key vectors.
- Take the top retrieval candidates.
- Resolve candidates to corpus documents through memory slots.
- Rerank (query, document) pairs with the pinned Qwen3 reranker.
- Score the ranked list against graded qrels.
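The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the evaluator's code: cosine similarity is assumed for the embedding comparison, slot decoding is assumed to have already produced `key_vectors` and `slot_to_doc`, and `rerank_score` is a placeholder callable standing in for the pinned reranker.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_and_rerank(
    query_text: str,
    query_vec: Sequence[float],
    key_vectors: dict[int, Sequence[float]],  # active slot id -> retrieval key
    slot_to_doc: dict[int, str],              # memory slot id -> corpus doc id
    docs: dict[str, str],                     # corpus doc id -> document text
    rerank_score: Callable[[str, str], float],
    top_k: int = 50,
) -> list[str]:
    # Compare the query embedding to every active retrieval key, keep the top k.
    top_slots = sorted(
        key_vectors, key=lambda s: cosine(query_vec, key_vectors[s]), reverse=True
    )[:top_k]
    # Resolve slots to documents, dropping duplicates while preserving order.
    doc_ids = list(dict.fromkeys(slot_to_doc[s] for s in top_slots if s in slot_to_doc))
    # Rerank (query, document) pairs; the result is what gets scored against qrels.
    return sorted(doc_ids, key=lambda d: rerank_score(query_text, docs[d]), reverse=True)
```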
Two independent packs are used:
| Pack | Purpose |
|---|---|
| Gate | First hidden-pack pass |
| Confirm | Second hidden-pack pass to filter pack-luck wins |
A state advance must clear threshold on both packs.
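As a sketch, the two-pack rule is a conjunction. Whether the two packs share one threshold, and whether the comparison is strict, is not stated above, so the separate threshold parameters and the `>=` comparison are assumptions.

```python
def state_advances(
    gate_score: float,
    confirm_score: float,
    gate_threshold: float,
    confirm_threshold: float,
) -> bool:
    """A patch advances state only if it clears threshold on both packs."""
    return gate_score >= gate_threshold and confirm_score >= confirm_threshold
```

A patch that scores well on the Gate pack by luck but falls short on the independently drawn Confirm pack is rejected.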
The dominant metric is nDCG@10. It rewards ranking highly relevant, answer-bearing documents near the top and penalizes plausible but wrong documents that displace them. The evaluator also tracks temporal current/stale correctness, multi-hop relation recall, abstention behavior, and structural validity.
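For reference, a minimal nDCG@k computation over graded qrels. The exponential gain `2^rel - 1` is one common convention and is an assumption here; the evaluator may use linear gain instead.

```python
import math


def dcg(gains: list[int]) -> float:
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """qrels maps doc id -> graded relevance; unjudged documents count as 0."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A ranked list that places an irrelevant document above the relevant ones scores below 1.0 even when every relevant document is still inside the top 10.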
The default composite shape is retrieval-dominant:
| Component | Role |
|---|---|
| Retrieval nDCG@10 | Main signal for whether the substrate retrieves answer-bearing documents |
| Temporal score | Rewards current facts and stale-memory rejection |
| Relation recall | Rewards useful multi-hop routing through the relation region |
| Abstention | Penalizes surfacing irrelevant memories when no answer should be retrieved |
| Structural sanity | Ensures the substrate is well-formed and replayable |
The exact weights are bundle-profile values. The design requires retrieval to remain the dominant component and structural sanity to remain a small guardrail, not the reward law.
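A sketch of a weighted composite that enforces the retrieval-dominant constraint. The weight values below are purely illustrative, not the bundle-profile values. The component keys, the weighted-sum form, and the reading of "dominant" as "strictly the largest weight" are likewise assumptions.

```python
import math

# Illustrative weights only; real values come from the bundle profile.
EXAMPLE_WEIGHTS = {
    "retrieval": 0.70,
    "temporal": 0.10,
    "relation": 0.10,
    "abstention": 0.07,
    "structural": 0.03,
}


def composite_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-component scores, each assumed to lie in [0, 1]."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    others = [w for name, w in weights.items() if name != "retrieval"]
    # Assumed reading of "dominant": retrieval outweighs every other component.
    if any(weights["retrieval"] <= w for w in others):
        raise ValueError("retrieval must be the dominant component")
    return sum(weights[name] * components[name] for name in weights)
```

With weights of this shape, a perfectly well-formed substrate that retrieves nothing useful still scores close to zero, which matches the role of structural sanity as a guardrail.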