Difficulty Scaling

CoreTex difficulty has two layers:

  1. The benchmark distribution changes as the corpus grows.
  2. minImprovementPpm adapts to observed state advances and quality attempts.

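There is no separate corpus-hardness score, and the system does not need a second oracle to declare that the corpus got harder. Instead, the parent substrate and the candidate substrate are evaluated on the same per-patch hidden packs, and the evaluator measures the candidate's improvement over that parent. Because both substrates answer the same queries, a harder corpus shows up in both scores rather than being mistaken for a change in the candidate.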
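A minimal sketch of the paired comparison described above, assuming a substrate can be modeled as a function from query to answer and scored by exact-match accuracy in ppm. The names score_ppm, improvement_ppm, Substrate, and Pack are hypothetical and are not the CoreTex API.

```python
from typing import Callable, Sequence, Tuple

PPM = 1_000_000  # scores are expressed in parts per million

# Hypothetical stand-ins: a substrate is modeled as a function from query to
# answer, and a hidden pack as a list of (query, expected_answer) pairs.
Substrate = Callable[[str], str]
Pack = Sequence[Tuple[str, str]]


def score_ppm(substrate: Substrate, pack: Pack) -> int:
    """Exact-match accuracy on the pack, in ppm (exact match is an assumption)."""
    if not pack:
        raise ValueError("hidden pack is empty")
    correct = sum(1 for query, expected in pack if substrate(query) == expected)
    return correct * PPM // len(pack)


def improvement_ppm(parent: Substrate, candidate: Substrate, pack: Pack) -> int:
    """Candidate improvement over the parent, both scored on the same pack."""
    return score_ppm(candidate, pack) - score_ppm(parent, pack)


# Usage with toy data: the candidate answers one more query than the parent.
pack = [("q1", "a"), ("q2", "b"), ("q3", "c"), ("q4", "d")]
parent_answers = {"q1": "a", "q2": "b"}
candidate_answers = {"q1": "a", "q2": "b", "q3": "c"}


def parent(q: str) -> str:
    return parent_answers.get(q, "")


def candidate(q: str) -> str:
    return candidate_answers.get(q, "")


print(improvement_ppm(parent, candidate, pack))  # 250000
```

Scoring both substrates inside one function call on one pack is the whole point: nothing about the pack's difficulty enters the result except through the difference.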

Baseline Re-Evaluation

The simplified baseline helper computes:

Field            Meaning
parentScorePpm   Parent substrate score on the baseline hidden query pack
variancePpm      Score variance measured across repeated samples
samples          Number of baseline samples
corpusRoot       Corpus root used for the baseline
epochId          Epoch the query pack belongs to
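The baseline fields can be illustrated with a small helper. This is a sketch under assumptions: compute_baseline and PPM are hypothetical names, parentScorePpm is taken as the mean of the repeated samples, and variancePpm is taken as the population variance of the scores (as fractions of 1) scaled to ppm. The real helper may define either value differently.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Sequence

PPM = 1_000_000


@dataclass(frozen=True)
class Baseline:
    parentScorePpm: int  # mean parent score over the repeated samples
    variancePpm: int     # population variance of those scores (assumed definition)
    samples: int
    corpusRoot: str
    epochId: int


def compute_baseline(
    sample_scores_ppm: Sequence[int], corpus_root: str, epoch_id: int
) -> Baseline:
    """Summarize repeated parent-substrate runs on one baseline hidden pack."""
    if not sample_scores_ppm:
        raise ValueError("need at least one baseline sample")
    # Variance is computed on scores as fractions of 1, then scaled to ppm.
    fractions = [s / PPM for s in sample_scores_ppm]
    return Baseline(
        parentScorePpm=round(mean(sample_scores_ppm)),
        variancePpm=round(pvariance(fractions) * PPM),
        samples=len(sample_scores_ppm),
        corpusRoot=corpus_root,
        epochId=epoch_id,
    )


baseline = compute_baseline([500_000, 510_000, 490_000], "corpus-root-example", 7)
print(baseline.parentScorePpm, baseline.variancePpm, baseline.samples)
# 500000 67 3
```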

Acceptance compares the candidate against the parent substrate on the same pack. variancePpm and replayTolerancePpm add margin to the threshold, so that reranker or runtime noise near the acceptance boundary does not decide the outcome.
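One way to read the acceptance rule, as a hedged sketch: the candidate's paired improvement must clear minImprovementPpm plus the variancePpm and replayTolerancePpm margins. Summing the three margins, the >= comparison, and the function name accept_candidate are all assumptions made for illustration.

```python
def accept_candidate(
    candidate_score_ppm: int,
    parent_score_ppm: int,
    min_improvement_ppm: int,
    variance_ppm: int,
    replay_tolerance_ppm: int,
) -> bool:
    """Accept only if the paired improvement clears every margin.

    Summing the three margins is an illustrative assumption, not a documented rule.
    """
    improvement = candidate_score_ppm - parent_score_ppm
    required = min_improvement_ppm + variance_ppm + replay_tolerance_ppm
    return improvement >= required


# A 2000 ppm gain meets minImprovementPpm alone but not the noise margins.
print(accept_candidate(502_000, 500_000, 2_000, 67, 500))  # False
print(accept_candidate(503_000, 500_000, 2_000, 67, 500))  # True
```

The first call shows why the noise margins matter: a gain that clears minImprovementPpm by itself is still rejected when it sits inside the measured noise.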

Major-Delta Grace

When an eval_hidden delta crosses the bundle-pinned majorDeltaThreshold, majorDeltaActive is set to true for the next epoch. In that epoch:

  • minImprovementPpm is frozen at the current value.
  • No ramp, decay, or drift is applied.
  • The reason code is major_delta_grace.
  • The baseline is recomputed and published before normal threshold movement resumes.

This prevents one-epoch cliffs after large corpus changes without adding a hardness knob.
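The grace rule above can be sketched as an epoch transition. Assumptions: eval_hidden deltas are compared by absolute value with >=, the normal ramp/decay/drift rule is abstracted as a hypothetical normal_move callback, and next_epoch, EpochThreshold, and baseline_published are illustrative names rather than CoreTex identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class EpochThreshold:
    minImprovementPpm: int
    majorDeltaActive: bool
    reasonCode: str


# Hypothetical hook for the normal ramp/decay/drift rule: takes the current
# threshold and returns (new threshold, reason code).
NormalMove = Callable[[int], Tuple[int, str]]


def next_epoch(
    prev: EpochThreshold,
    eval_hidden_delta: int,
    major_delta_threshold: int,
    baseline_published: bool,
    normal_move: NormalMove,
) -> EpochThreshold:
    # A delta crossing the pinned threshold (in either direction, by assumption)
    # opens a grace epoch: the threshold is frozen, with no ramp, decay, or drift.
    if abs(eval_hidden_delta) >= major_delta_threshold:
        return EpochThreshold(prev.minImprovementPpm, True, "major_delta_grace")
    # Stay frozen until the recomputed baseline has been published.
    if prev.majorDeltaActive and not baseline_published:
        return EpochThreshold(prev.minImprovementPpm, True, "major_delta_grace")
    new_value, reason = normal_move(prev.minImprovementPpm)
    return EpochThreshold(new_value, False, reason)


def ramp(value: int) -> Tuple[int, str]:
    return value + 100, "ramp"  # toy rule for illustration only


start = EpochThreshold(2_000, False, "ramp")
grace = next_epoch(start, 150_000, 100_000, True, ramp)
held = next_epoch(grace, 1_000, 100_000, False, ramp)
resumed = next_epoch(grace, 1_000, 100_000, True, ramp)
print(grace)
# EpochThreshold(minImprovementPpm=2000, majorDeltaActive=True, reasonCode='major_delta_grace')
print(resumed)
# EpochThreshold(minImprovementPpm=2100, majorDeltaActive=False, reasonCode='ramp')
```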

Hidden-Pack Strata

The hidden-pack sampler supports multi-strata membership through strataOf(event), so a single event can count toward several quotas at once. Example strata:

family=temporal
bucket=hard
family=multi_hop_relation,relationHop>=3
depth>=2
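A sketch of multi-strata membership in the spirit of strataOf(event), assuming events are plain dictionaries and that the example strata above are derived from family, bucket, relationHop, and depth fields. strata_of is a hypothetical Python rendering, and treating missing depth fields as 0 is an assumed safe default.

```python
from typing import Any, Mapping, Set


def strata_of(event: Mapping[str, Any]) -> Set[str]:
    """Return every stratum label an event belongs to (multi-strata membership)."""
    strata: Set[str] = set()
    family = event.get("family")
    if family:
        strata.add(f"family={family}")
    bucket = event.get("bucket")
    if bucket:
        strata.add(f"bucket={bucket}")
    # Missing or null depth fields are treated as 0 (assumed default), so
    # records without them never land in a depth stratum.
    relation_hop = event.get("relationHop") or 0
    depth = event.get("depth") or 0
    if family == "multi_hop_relation" and relation_hop >= 3:
        strata.add("family=multi_hop_relation,relationHop>=3")
    if depth >= 2:
        strata.add("depth>=2")
    return strata


deep = {"family": "multi_hop_relation", "bucket": "hard", "relationHop": 3, "depth": 2}
print(sorted(strata_of(deep)))
# ['bucket=hard', 'depth>=2', 'family=multi_hop_relation', 'family=multi_hop_relation,relationHop>=3']
print(sorted(strata_of({"family": "temporal"})))  # ['family=temporal']
```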

Corpus synthesis emits depth fields when they are known, and older records without them fall back to a safe default. Quotas are calibration outputs pinned in the bundle profile, so hidden packs keep sampling deep temporal, causal, and multi-hop memory as those strata grow.
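Given per-event strata and quotas pinned in a bundle profile, a hidden-pack sampler can fill quotas greedily. This is a minimal sketch, not the CoreTex sampler: sample_pack is a hypothetical name, the caller is assumed to pass events already shuffled with a fixed seed, and the quota numbers are made up. Because membership is multi-strata, one selected event decrements every unmet quota it belongs to.

```python
from typing import FrozenSet, List, Mapping, Sequence, Tuple


def sample_pack(
    events: Sequence[Tuple[str, FrozenSet[str]]],
    quotas: Mapping[str, int],
    pack_size: int,
) -> List[str]:
    """Greedy quota-aware selection; `events` is assumed to be pre-shuffled."""
    remaining = dict(quotas)
    chosen: List[str] = []
    # Pass 1: take events that still help fill at least one unmet quota.
    for event_id, strata in events:
        if len(chosen) >= pack_size:
            break
        helps = [s for s in strata if remaining.get(s, 0) > 0]
        if helps:
            chosen.append(event_id)
            for s in helps:  # one event counts toward every stratum it helps
                remaining[s] -= 1
    # Pass 2: fill any leftover slots in input order.
    picked = set(chosen)
    for event_id, _ in events:
        if len(chosen) >= pack_size:
            break
        if event_id not in picked:
            chosen.append(event_id)
            picked.add(event_id)
    return chosen


# Hypothetical quota profile and events (strata precomputed, e.g. via strataOf).
quotas = {"family=temporal": 1, "depth>=2": 2, "bucket=hard": 1}
events = [
    ("e1", frozenset({"family=temporal", "depth>=2"})),
    ("e2", frozenset({"bucket=hard"})),
    ("e3", frozenset({"family=temporal"})),
    ("e4", frozenset({"depth>=2", "bucket=hard"})),
    ("e5", frozenset()),
]
print(sample_pack(events, quotas, pack_size=4))  # ['e1', 'e2', 'e4', 'e3']
```

Here e1 alone covers the temporal quota and one of the two depth>=2 slots, and e3 only enters in the fill pass because its single stratum was already satisfied.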