Evaluation

A live submission is scored against hidden packs derived after receipt. The seed binds:

epochSecret + future Base blockhash + epochId + patchHash
+ parentRoot + minerAddress + corpusRoot + bundleHash
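
As a rough illustration of how the eight inputs above could be bound into one seed, here is a minimal Python sketch. The hash function (SHA-256), the field order, the 8-byte encoding of epochId, and the length-prefixing are all assumptions made for the example; the text above does not specify the actual encoding, and `derive_seed` is a hypothetical name.

```python
import hashlib


def derive_seed(epoch_secret: bytes, future_blockhash: bytes, epoch_id: int,
                patch_hash: bytes, parent_root: bytes, miner_address: bytes,
                corpus_root: bytes, bundle_hash: bytes) -> bytes:
    """Bind all eight inputs into a single 32-byte seed (illustrative encoding)."""
    parts = [
        epoch_secret,
        future_blockhash,
        epoch_id.to_bytes(8, "big"),  # assumed fixed-width integer encoding
        patch_hash,
        parent_root,
        miner_address,
        corpus_root,
        bundle_hash,
    ]
    h = hashlib.sha256()  # assumption: the real hash function is not stated
    for part in parts:
        # Length-prefix each field so adjacent fields cannot be re-split ambiguously.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()
```

Because the future blockhash and the epoch secret are both inputs, a submitter cannot compute the seed at submission time, and changing any single field (for example the patch hash) yields a different seed.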

The coordinator records the block at which the submission was received, waits for the target future blockhash, pins the seed before scoring, and uses two packs:

Pack     Purpose
Gate     First hidden evaluation sample
Confirm  Independent sample used to reduce pack-luck acceptance
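
One way to read the two-pack design is that a patch must improve on the parent on both independent samples, so a lucky draw on a single pack is not enough. The sketch below illustrates that reading only. The acceptance rule, the `min_gain` threshold, and the names `PackResult`, `passes`, and `accept` are hypothetical and are not taken from the coordinator.

```python
from dataclasses import dataclass


@dataclass
class PackResult:
    parent_score: float   # metric for the parent substrate on this pack
    patched_score: float  # same metric for the patched substrate on the same pack


def passes(result: PackResult, min_gain: float = 0.0) -> bool:
    """True when the patched substrate beats the parent by more than min_gain."""
    return result.patched_score - result.parent_score > min_gain


def accept(gate: PackResult, confirm: PackResult, min_gain: float = 0.0) -> bool:
    """Assumed rule: the patch must pass Gate and then also pass Confirm."""
    return passes(gate, min_gain) and passes(confirm, min_gain)
```

Under this assumed rule, if a non-improving patch passes any one pack by chance with probability p, it passes two independent packs with probability about p squared.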

For each query, the evaluator:

  1. Decodes active substrate slots.
  2. Builds retrieval candidates from public corpus indexes and substrate routes.
  3. Renders Memory-IR where the active profile enables it.
  4. Reranks query/document pairs with the pinned Qwen reranker.
  5. Scores ranked documents against graded qrels.
  6. Compares the patched substrate to the parent substrate on the same pack.

nDCG@10 is the primary retrieval metric. Secondary signals include temporal current/stale behavior, relation recall, abstention, structural validity, and policy-atom effects. Live packs are samples of hidden tasks; broader full-pack and pair-trace runs serve as calibration, parity, and release-gate evidence.
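
For readers unfamiliar with the metric, the following Python sketch computes nDCG@10 from a ranked list and graded qrels, then compares a patched ranking to a parent ranking on the same queries. It assumes the common exponential gain (2^grade - 1) with a log2 position discount; the evaluator's exact gain formulation is not stated here, and the function names are illustrative.

```python
import math


def dcg(grades, k=10):
    """Discounted cumulative gain over the first k grades (exponential gain)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance (0 = not relevant)."""
    grades = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg(ideal, k)
    return dcg(grades, k) / idcg if idcg > 0 else 0.0


def mean_delta(parent_scores, patched_scores):
    """Mean per-query difference (patched minus parent) on the same pack."""
    pairs = list(zip(parent_scores, patched_scores))
    return sum(patched - parent for parent, patched in pairs) / len(pairs)
```

A ranking that places documents in descending grade order scores 1.0, and a query with no relevant documents scores 0.0 by convention in this sketch.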

The coordinator can use GPU scoring as an execution venue through the keyless scorer sidecar. The sidecar receives the packed parent substrate bytes and re-merkleizes them against parentStateRoot before scoring. The coordinator verifies scorer pins, health fields, seed echo, artifact bytes, and context hashes before signing. Validators replay on the pinned CPU path.
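
The re-merkleization check can be pictured as recomputing a Merkle root over the received bytes and refusing to score if it differs from the expected root. The sketch below is a generic illustration, not the sidecar's actual scheme: the 32-byte leaf size, SHA-256, the leaf/node prefix bytes, and the handling of odd nodes are all assumptions, and `merkle_root` and `verify_parent_bytes` are hypothetical names.

```python
import hashlib

CHUNK = 32  # assumed leaf size in bytes


def _leaf(chunk: bytes) -> bytes:
    # Prefix 0x00 separates leaf hashes from interior-node hashes.
    return hashlib.sha256(b"\x00" + chunk).digest()


def _node(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an interior node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(data: bytes, chunk: int = CHUNK) -> bytes:
    """Binary Merkle root over fixed-size chunks of data."""
    level = [_leaf(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    if not level:
        level = [_leaf(b"")]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is carried up unchanged
        level = nxt
    return level[0]


def verify_parent_bytes(packed: bytes, parent_state_root: bytes) -> bool:
    """True only if the received bytes hash to the expected root."""
    return merkle_root(packed) == parent_state_root
```

With a check of this shape, a single flipped byte in the packed substrate changes the recomputed root, so the mismatch is detected before any scoring happens.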