Model And Reranker Calibration

CoreTex uses three model roles:

| Model role | Launch selection | Why |
| --- | --- | --- |
| Bi-encoder | BAAI/bge-m3 | Produces compact dense retrieval vectors for queries and corpus documents |
| Production reranker | Qwen/Qwen3-Reranker-0.6B | Reranks query/document pairs on CPU with a practical 0.6B footprint |
| Offline audit/reference reranker | IAAR-Shanghai/MemReranker-4B | Stronger memory-specific audit path for qrel and benchmark sanity checks |
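
The division of labor between the bi-encoder and the reranker can be sketched as a two-stage ranking. This is a minimal illustration, not the CoreTex evaluator: `embed` and `pair_score` are hypothetical stand-ins for the bi-encoder and the production reranker, and `top_k` is an assumed parameter.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    embed: Callable[[str], Vector],           # hypothetical bi-encoder stand-in
    doc_texts: dict[str, str],                # doc id -> document text
    pair_score: Callable[[str, str], float],  # hypothetical reranker stand-in
    top_k: int,
) -> list[str]:
    """Return doc ids: dense retrieval picks top_k, the reranker orders them."""
    query_vec = embed(query)
    # Stage 1: rank every document by vector similarity and keep the top_k.
    by_similarity = sorted(
        doc_texts,
        key=lambda doc_id: cosine(query_vec, embed(doc_texts[doc_id])),
        reverse=True,
    )
    candidates = by_similarity[:top_k]
    # Stage 2: score each (query, document) pair and reorder the candidates.
    return sorted(
        candidates,
        key=lambda doc_id: pair_score(query, doc_texts[doc_id]),
        reverse=True,
    )
```

Only the shortlist from stage 1 reaches the pairwise scorer, which is what makes a small reranker workable on CPU.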

The production path is CPU-only. Miners may use any model or GPU setup to search for good patches, but accepted work is scored by the pinned CoreTex evaluator. That keeps canonical scoring from becoming "most expensive GPU wins."

The 0.6B reranker was selected for the launch path because it is public, pinnable by immutable revision and file hashes, practical for CPU evaluation, and strong enough to provide a meaningful retrieval signal. The audit/reference reranker is deliberately separate. It can be used during corpus validation to check whether qrel categories agree with a stronger memory-specific model, but live eval does not depend on it.
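
Pinning by file hashes can be illustrated with a small check. This is a sketch under stated assumptions: the manifest shape (relative path mapped to a SHA-256 hex digest) and the function names are hypothetical, and the actual CoreTex pinning format is not described here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_files(model_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash differs.

    `expected` is an assumed manifest: relative file path -> SHA-256 hex digest.
    An empty result means every pinned file matched.
    """
    mismatches = []
    for rel_path, want in expected.items():
        file_path = model_dir / rel_path
        if not file_path.is_file() or sha256_of(file_path) != want.lower():
            mismatches.append(rel_path)
    return mismatches
```

An evaluator built this way would refuse to score if the returned list is non-empty, so that every replay runs against byte-identical model files.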

Calibration produces or validates:

| Calibration output | Meaning |
| --- | --- |
| replayTolerancePpm | Maximum score drift tolerated between coordinator and replay |
| minImprovementPpm | Minimum improvement required over the parent substrate |
| baselineParentScorePpm | Parent score on the calibrated baseline pack |
| baselineVariancePpm | Measured baseline variance across calibration samples |
| hidden-pack quotas | Required family/depth coverage inside sampled packs |
| qrel map sanity | Whether category labels track reranker score order |
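
How these outputs might be applied can be sketched as three small checks. Everything here is an assumption made for illustration: that scores are integers in parts per million, that drift is an absolute difference, that the improvement threshold is inclusive, and that qrel categories are integer grades where a higher grade means more relevant. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def replay_agrees(coordinator_ppm: int, replay_ppm: int, replay_tolerance_ppm: int) -> bool:
    """True when coordinator and replay scores differ by no more than the tolerance."""
    return abs(coordinator_ppm - replay_ppm) <= replay_tolerance_ppm


def improves_on_parent(candidate_ppm: int, parent_ppm: int, min_improvement_ppm: int) -> bool:
    """True when the candidate beats the parent by at least the required margin."""
    return candidate_ppm - parent_ppm >= min_improvement_ppm


def qrel_order_is_sane(graded_scores: list[tuple[int, float]]) -> bool:
    """True when the mean reranker score rises strictly with the category grade.

    `graded_scores` holds (grade, reranker_score) pairs.
    """
    by_grade: dict[int, list[float]] = defaultdict(list)
    for grade, score in graded_scores:
        by_grade[grade].append(score)
    means = [mean(by_grade[grade]) for grade in sorted(by_grade)]
    return all(low < high for low, high in zip(means, means[1:]))


# Example with made-up numbers: a candidate at 510,000 ppm against a parent
# at 500,000 ppm clears a 5,000 ppm improvement threshold.
accepted = improves_on_parent(510_000, 500_000, 5_000) and replay_agrees(510_000, 510_030, 50)
```

Integer ppm values keep comparisons exact across machines, which matters when a replay has to reproduce the coordinator's decision.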

The staging calibration record showed the full BGE-M3/Qwen3/MemReranker pipeline executing end to end. The launch corpus is larger and uses the synthesizer-label path, so the final bundle values are produced only after the launch corpus is complete.