Model And Reranker Calibration
CoreTex uses three model roles:
| Model role | Launch selection | Why |
|---|---|---|
| Bi-encoder | BAAI/bge-m3 | Produces compact dense retrieval vectors for queries and corpus documents |
| Production reranker | Qwen/Qwen3-Reranker-0.6B | Reranks query/document pairs on CPU with a practical 0.6B footprint |
| Offline audit/reference reranker | IAAR-Shanghai/MemReranker-4B | Stronger memory-specific audit path for qrel and benchmark sanity checks |
The production path is CPU-only. Miners may use any model or GPU setup to search for good patches, but accepted work is scored by the pinned CoreTex evaluator. That keeps canonical scoring from becoming "most expensive GPU wins."
The 0.6B reranker was selected for the launch path because it is public, pinnable by immutable revision and file hashes, practical for CPU evaluation, and strong enough to provide a meaningful retrieval signal. The audit/reference reranker is deliberately separate. It can be used during corpus validation to check whether qrel categories agree with a stronger memory-specific model, but live eval does not depend on it.
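As a rough illustration of what "pinnable by file hashes" can mean in practice, the sketch below checks a local model directory against a set of expected SHA-256 digests. The manifest shape (relative path mapped to hex digest) and the function names are assumptions made for this example, not the CoreTex bundle format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_files(model_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash differs from the pin.

    `expected` maps relative file path -> hex SHA-256 (hypothetical manifest shape).
    An empty result means every pinned file matched.
    """
    mismatches = []
    for rel_path, pinned_hash in expected.items():
        candidate = model_dir / rel_path
        if not candidate.is_file() or sha256_of(candidate) != pinned_hash.lower():
            mismatches.append(rel_path)
    return mismatches
```

An evaluator built along these lines would refuse to score when the returned list is non-empty, so a silently updated weight file cannot change canonical scores.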
Calibration produces or validates:
| Calibration output | Meaning |
|---|---|
| `replayTolerancePpm` | Maximum score drift tolerated between coordinator and replay |
| `minImprovementPpm` | Minimum improvement required over the parent substrate |
| `baselineParentScorePpm` | Parent score on the calibrated baseline pack |
| `baselineVariancePpm` | Measured baseline variance across calibration samples |
| hidden-pack quotas | Required family/depth coverage inside sampled packs |
| qrel map sanity | Whether category labels track reranker score order |
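To make the two threshold outputs concrete, here is a minimal sketch of how `replayTolerancePpm` and `minImprovementPpm` might gate a result. It assumes scores are integers in parts per million of a [0, 1] score, that both comparisons are absolute differences, and that the bounds are inclusive. None of these details are confirmed by this document, and the function names are hypothetical.

```python
PPM = 1_000_000  # assumed scale: a score of 1.0 corresponds to 1_000_000 ppm


def to_ppm(score: float) -> int:
    """Convert a fractional score in [0, 1] to integer parts per million."""
    return round(score * PPM)


def replay_within_tolerance(
    coordinator_ppm: int, replay_ppm: int, replay_tolerance_ppm: int
) -> bool:
    """True when coordinator and replay scores differ by at most the tolerance."""
    return abs(coordinator_ppm - replay_ppm) <= replay_tolerance_ppm


def clears_min_improvement(
    candidate_ppm: int, parent_ppm: int, min_improvement_ppm: int
) -> bool:
    """True when the candidate beats the parent score by at least the minimum margin."""
    return candidate_ppm - parent_ppm >= min_improvement_ppm


# Illustrative values only; real thresholds come from the calibration bundle.
replay_ok = replay_within_tolerance(712_400, 712_450, replay_tolerance_ppm=100)
improved = clears_min_improvement(715_000, 712_400, min_improvement_ppm=2_000)
```

Working in integer ppm keeps the comparisons exact, which matters when a coordinator and a replay node must reach the same accept or reject decision.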
The staging calibration record showed the full BGE-M3/Qwen3/MemReranker pipeline executing end to end. The launch corpus is larger and uses the synthesizer-label path, so final bundle values are produced after the launch corpus completes.
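For the qrel map sanity output listed in the calibration table, one way to test whether category labels track reranker score order is a pairwise agreement rate. The sketch below assumes qrel categories have already been mapped to ordered integer grades (higher means more relevant). That mapping, the function name, and the choice of metric are illustrative assumptions rather than the check CoreTex actually runs.

```python
from itertools import combinations


def pairwise_agreement(labels: list[int], scores: list[float]) -> float:
    """Fraction of differently-labeled document pairs that the reranker orders the same way.

    `labels` are ordered integer grades (assumed mapping of qrel categories);
    `scores` are reranker scores for the same documents, in the same order.
    Pairs with equal labels are skipped; tied scores count as disagreement.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    concordant = 0
    comparable = 0
    for (label_a, score_a), (label_b, score_b) in combinations(zip(labels, scores), 2):
        if label_a == label_b:
            continue  # same category: no ordering claim to test
        comparable += 1
        if (label_a - label_b) * (score_a - score_b) > 0:
            concordant += 1
    if comparable == 0:
        raise ValueError("need at least two documents with different labels")
    return concordant / comparable
```

A value near 1.0 suggests the labels and the reranker agree on ordering; a low value for a given query family would flag that part of the qrel map for review.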