Evaluation
A live submission is scored against hidden packs derived after receipt. The seed binds:
epochSecret + future Base blockhash + epochId + patchHash
+ parentRoot + minerAddress + corpusRoot + bundleHash
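The binding above can be sketched as a single hash over all eight inputs. This is a minimal illustration, not the coordinator's actual derivation: the hash function (SHA-256), the field order, the length-prefixed encoding, and the names `SEED_FIELDS` and `derive_seed` are all assumptions made for the example.

```python
import hashlib

# Field order is an assumption; the source lists the inputs but not their encoding.
SEED_FIELDS = (
    "epochSecret", "futureBlockhash", "epochId", "patchHash",
    "parentRoot", "minerAddress", "corpusRoot", "bundleHash",
)


def derive_seed(fields: dict[str, bytes]) -> bytes:
    """Hash every bound field in a fixed order (hypothetical scheme)."""
    missing = [name for name in SEED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing seed fields: {missing}")
    h = hashlib.sha256()  # assumed hash; the real scheme may differ
    for name in SEED_FIELDS:
        value = fields[name]
        # A length prefix keeps adjacent fields from being confused with each other.
        h.update(len(value).to_bytes(4, "big"))
        h.update(value)
    return h.digest()
```

Because the future blockhash is one of the inputs, the seed cannot be computed until that block exists, which is why the packs can only be derived after receipt.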
The coordinator records the block at which the submission was received, waits for the target future blockhash, pins the seed before scoring, and then scores against two packs:
| Pack | Purpose |
|---|---|
| Gate | First hidden evaluation sample |
| Confirm | Independent sample used to reduce pack-luck acceptance |
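One way to read the two-pack design is as a conjunction: a patch that wins on the Gate pack by chance is unlikely to also win on an independent Confirm pack. The sketch below assumes that reading. The acceptance rule, the `min_delta` threshold, and the names `PackResult` and `accept` are hypothetical and not taken from the coordinator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackResult:
    parent_score: float   # parent substrate score on this pack
    patched_score: float  # patched substrate score on the same pack

    @property
    def delta(self) -> float:
        return self.patched_score - self.parent_score


def accept(gate: PackResult, confirm: PackResult, min_delta: float = 0.0) -> bool:
    """Accept only if the patch beats the parent on both packs (assumed rule)."""
    return gate.delta > min_delta and confirm.delta > min_delta
```

Under this rule, a submission that improves the Gate score but regresses on Confirm is rejected.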
For each query, the evaluator:
- Decodes active substrate slots.
- Builds retrieval candidates from public corpus indexes and substrate routes.
- Renders Memory-IR where the active profile enables it.
- Reranks query/document pairs with the pinned Qwen reranker.
- Scores ranked documents against graded qrels.
- Compares the patched substrate to the parent substrate on the same pack.
nDCG@10 is the primary retrieval metric. Secondary signals include temporal
current/stale behavior, relation recall, abstention, structural validity, and
policy-atom effects. Live packs are samples of hidden tasks. Broader full-pack
and pair-trace runs serve as calibration, parity, and release-gate evidence.
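For reference, nDCG@10 over graded qrels can be computed as below. This is a generic sketch with linear gain and a log2 position discount. Whether the evaluator uses linear or exponential gain is not stated in this section, so the gain function and the names `dcg_at_k` and `ndcg_at_k` are assumptions.

```python
import math


def dcg_at_k(gains: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; documents missing from qrels count as grade 0."""
    gains = [float(qrels.get(doc_id, 0)) for doc_id in ranked_doc_ids]
    ideal = sorted((float(g) for g in qrels.values()), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg
```

Comparing the patched and parent substrates then amounts to computing this value per query for both rankings on the same pack and aggregating the differences.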
The coordinator can use GPU scoring as an execution venue through the keyless
scorer sidecar. The sidecar receives packed parent substrate bytes and
re-merkleizes them against parentStateRoot before scoring. The coordinator
verifies scorer pins, health fields, seed echo, artifact bytes, and context
hashes before signing. Validators replay on the pinned CPU path.
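The re-merkleization step can be pictured as recomputing a Merkle root over the received bytes and refusing to score on a mismatch. The tree layout here (32-byte chunks, SHA-256, domain-separated leaves, last node duplicated on odd levels) is an assumption chosen for illustration, as are the names `merkle_root` and `verify_parent_bytes`. The sidecar's real layout may differ.

```python
import hashlib
import hmac

CHUNK_SIZE = 32  # assumed leaf size in bytes


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(packed: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Binary Merkle root over fixed-size chunks (hypothetical layout)."""
    chunks = [packed[i:i + chunk_size] for i in range(0, len(packed), chunk_size)] or [b""]
    level = [_h(b"\x00" + chunk) for chunk in chunks]  # 0x00 prefix marks leaves
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_parent_bytes(packed: bytes, parent_state_root: bytes) -> None:
    """Raise if the received bytes do not hash to the expected root."""
    if not hmac.compare_digest(merkle_root(packed), parent_state_root):
        raise ValueError("parent substrate bytes do not match parentStateRoot")
```

Checking the root before scoring means the sidecar does not have to trust whoever handed it the bytes. It only has to trust the root it was told to expect.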