Model And Reranker Calibration

CoreTex uses three model roles:

| Model role | Launch selection | Why |
| --- | --- | --- |
| Bi-encoder | `BAAI/bge-m3` | Produces compact dense retrieval vectors for queries and corpus documents |
| Production reranker | `Qwen/Qwen3-Reranker-0.6B` | Reranks query/document pairs on CPU with a practical 0.6B footprint |
| Offline audit/reference reranker | `IAAR-Shanghai/MemReranker-4B` | Stronger memory-specific audit path for qrel and benchmark sanity checks |

The production path is CPU-only. Miners may use any model or GPU setup to search for good patches, but accepted work is scored by the pinned CoreTex evaluator. That keeps canonical scoring from becoming "most expensive GPU wins."

The 0.6B reranker was selected for the launch path because it is public, pinnable by immutable revision and file hashes, practical for CPU evaluation, and strong enough to provide a meaningful retrieval signal. The audit/reference reranker is deliberately separate. It can be used during corpus validation to check whether qrel categories agree with a stronger memory-specific model, but live eval does not depend on it.
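
One way to read "pinnable by immutable revision and file hashes" is that the evaluator refuses to use model files whose digests differ from a recorded manifest. The sketch below assumes a SHA-256 manifest keyed by relative file path. The manifest shape, the function names, and the choice of SHA-256 are illustrative assumptions, not the CoreTex implementation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_files(model_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash does not match.

    `expected` is a hypothetical manifest: relative file path -> hex SHA-256.
    An empty result means every pinned file matched.
    """
    mismatches = []
    for relative_path, expected_hash in expected.items():
        file_path = model_dir / relative_path
        if not file_path.is_file() or sha256_of(file_path) != expected_hash:
            mismatches.append(relative_path)
    return mismatches
```

A caller would typically treat any non-empty result as a hard failure before loading the model, so that a silently changed upstream file cannot alter canonical scores.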

Calibration produces or validates:

| Calibration output | Meaning |
| --- | --- |
| `replayTolerancePpm` | Maximum score drift tolerated between coordinator and replay |
| `minImprovementPpm` | Minimum improvement required over the parent substrate |
| `baselineParentScorePpm` | Parent score on the calibrated baseline pack |
| `baselineVariancePpm` | Measured baseline variance across calibration samples |
| hidden-pack quotas | Required family/depth coverage inside sampled packs |
| qrel map sanity | Whether category labels track reranker score order |
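
To make the calibration outputs concrete, here is a minimal sketch of how the two thresholds and the qrel sanity check could be applied. It assumes that scores are integers in parts per million, that improvement and drift are absolute differences in those units, and that qrel categories can be mapped to integer grades where higher means more relevant. The class, the function names, and these conventions are assumptions for illustration; the actual evaluator may define them differently.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical container; field names mirror the calibration outputs.
    replay_tolerance_ppm: int
    min_improvement_ppm: int


def replay_agrees(coordinator_ppm: int, replay_ppm: int, t: Thresholds) -> bool:
    """True when the replayed score drifts no more than the tolerance."""
    return abs(coordinator_ppm - replay_ppm) <= t.replay_tolerance_ppm


def improves_on_parent(candidate_ppm: int, parent_ppm: int, t: Thresholds) -> bool:
    """True when the candidate beats the parent by at least the minimum margin."""
    return candidate_ppm - parent_ppm >= t.min_improvement_ppm


def discordant_fraction(samples: list[tuple[int, float]]) -> float:
    """Fraction of differently graded pairs whose reranker scores are out of order.

    Each sample is (grade, reranker_score). Pairs with equal grades are skipped.
    A tie in reranker score between different grades counts as discordant.
    """
    comparable = 0
    discordant = 0
    for (g1, s1), (g2, s2) in combinations(samples, 2):
        if g1 == g2:
            continue
        comparable += 1
        if (g1 - g2) * (s1 - s2) <= 0:
            discordant += 1
    return discordant / comparable if comparable else 0.0
```

A low discordant fraction suggests that category labels track reranker score order. What counts as low enough is itself something calibration would have to decide.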

The staging calibration record showed the full BGE-M3/Qwen3/MemReranker pipeline executing end to end. The launch corpus is larger and uses the synthesizer-label path, so the final calibration bundle values are produced only after the launch corpus is complete.