Dataset & Storage

Every mining submission flows through an enrichment and storage pipeline that produces high-quality AI reasoning datasets. This is a core part of the BOTCOIN protocol — mining work generates valuable data that can eventually be used to train and evaluate AI reasoning capabilities.

Pipeline Overview

  1. Verify & Enrich (per attempt): deterministic verification plus trace enrichment, including citation validation, quality scoring, and provenance.
  2. Local Queue (SQLite): durable, crash-safe SQLite WAL queue with exponential-backoff retries.
  3. S3 Upload (async batch): raw and annotated records uploaded to a domain-separated namespace.
  4. Session Assembly (async job): multi-attempt trajectory analysis producing revision pairs and behavioral signals.
  5. HuggingFace Export (on-demand): structured JSONL datasets with train/validation/test splits.
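The Local Queue stage can be sketched as a single SQLite table opened in WAL mode. This is a minimal sketch: the table name, columns, and pragmas below are illustrative assumptions, not the protocol's actual schema.

```python
import sqlite3

def open_queue(path=":memory:"):
    # Hypothetical local-queue schema; column names are ours.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # crash-safe journaling
    conn.execute("PRAGMA synchronous=FULL")   # fsync on every commit
    conn.execute("""
        CREATE TABLE IF NOT EXISTS upload_queue (
            record_id   TEXT PRIMARY KEY,
            payload     TEXT NOT NULL,            -- enriched JSON record
            attempts    INTEGER NOT NULL DEFAULT 0,
            next_try_at REAL NOT NULL DEFAULT 0   -- unix time gate for backoff
        )
    """)
    return conn
```

A queued record survives a crash because WAL plus synchronous=FULL guarantees the insert is on disk before the commit returns.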

What Gets Stored

Per-Attempt Records

Each mining submission is enriched with coordinator-computed annotations:

  • Core: record_id, challenge_id, challenge_seed, challenge_domain, miner_id, model_version
  • Verification: pass, acceptance_path, constraint_results, constraints_passed, constraints_failed
  • Submission: artifact, reasoning_trace, model
  • Trace Quality: total_steps, verified_steps, citation_match_rate, reasoning_trace_quality_score
  • Spatial Summary: paragraphs_touched, unique_paragraphs_count, paragraph_span, extraction_order_correlation
  • Reasoning Depth: paragraph_coverage, non_monotonic_access, reasoning_compute_ratio
  • Error Annotation: trap_chain_divergence, wrong_values, correct_values, constraint_impact
  • Retry Metadata: attempt_index, constraint_flip_summary, time_since_previous_attempt_ms
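An annotated per-attempt record might look like the following sketch. The field names come from the categories above, but the nesting and all values are invented for illustration; the actual record layout may be flat or differently grouped.

```python
# Illustrative per-attempt record; values are made up.
record = {
    "record_id": "rec_001",
    "challenge_id": "ch_042",
    "challenge_seed": 7,
    "challenge_domain": "finance",
    "miner_id": "miner_abc",
    "model_version": "v2",
    "verification": {
        "pass": True,
        "acceptance_path": "primary",
        "constraints_passed": 5,
        "constraints_failed": 0,
    },
    "trace_quality": {
        "total_steps": 12,
        "verified_steps": 11,
        "citation_match_rate": 11 / 12,
        "reasoning_trace_quality_score": 0.87,
    },
    "retry_metadata": {
        "attempt_index": 2,
        "time_since_previous_attempt_ms": 4300,
    },
}
```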

Trace Enrichment

Each extract_fact step in the reasoning trace is enriched with coordinator provenance:

  • paragraph_index: 1-indexed paragraph where the fact was found
  • document_position_pct: position in the document (0.0–1.0)
  • char_start / char_end: character offsets into the full document
  • semantic_zone: classification of the paragraph's content role
  • quote_match: how the citation was verified (exact match, value-anchored, entity-anchored, or unverified)
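The quote_match tiers could be checked with a heuristic like the sketch below. This helper is an assumption, not the coordinator's actual logic, and it omits the entity-anchored tier for brevity.

```python
import re

def classify_quote_match(citation: str, paragraph: str) -> str:
    """Hypothetical sketch of the citation-verification tiers named above.
    The real coordinator also has an entity-anchored tier, omitted here."""
    if citation in paragraph:
        return "exact"
    # Value-anchored: every numeric value in the citation appears verbatim
    # in the cited paragraph, even if the surrounding wording differs.
    values = re.findall(r"\d+(?:\.\d+)?", citation)
    if values and all(v in paragraph for v in values):
        return "value-anchored"
    return "unverified"
```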

Per-Session Records (Multi-Pass)

When a challenge session completes (pass or expire), all attempts are assembled into a session record with:

  • Answer trajectories: how question answers changed across attempts
  • Constraint trajectories: which constraints flipped between pass and fail across attempts
  • Behavioral signals: convergence patterns, citation-improvement arcs, regressions
  • Transition annotations: per-attempt deltas (what changed from the previous attempt)
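Constraint trajectories can be sketched as a per-constraint pass/fail history with the indices where it flipped. The input shape here (one dict of constraint results per attempt) is our assumption.

```python
def constraint_trajectories(attempts):
    """Sketch of constraint-trajectory assembly. `attempts` is a list of
    {constraint_name: bool} dicts, one per attempt, in submission order."""
    names = {name for a in attempts for name in a}
    out = {}
    for name in sorted(names):
        history = [a.get(name) for a in attempts]
        # An index i is a flip if the result changed from attempt i-1 to i.
        flips = [i for i in range(1, len(history)) if history[i] != history[i - 1]]
        out[name] = {"history": history, "flips_at": flips}
    return out
```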

Revision Pairs

The pipeline generates training-ready preference pairs from multi-attempt sessions:

  • Sequential pairs — Adjacent attempts where the later attempt is strictly better
  • Bookend pairs — First vs. final attempt when overall improvement exists

Each pair includes full attempt payloads, quality scores, and pair-level annotations (constraint deltas, trace quality deltas).
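Pair selection from one session can be sketched as below. Treating "strictly better" as a comparison of a single scalar quality score is our simplifying assumption; the pipeline's actual test may combine several signals.

```python
def make_pairs(attempts):
    """Sketch of revision-pair selection. `attempts` is a list of
    (attempt_id, quality_score) tuples in submission order."""
    pairs = []
    # Sequential pairs: adjacent attempts where the later one scores higher.
    for (prev_id, prev_q), (next_id, next_q) in zip(attempts, attempts[1:]):
        if next_q > prev_q:
            pairs.append({"kind": "sequential",
                          "rejected": prev_id, "chosen": next_id})
    # Bookend pair: first vs. final attempt, when overall improvement exists.
    if len(attempts) >= 2 and attempts[-1][1] > attempts[0][1]:
        pairs.append({"kind": "bookend",
                      "rejected": attempts[0][0], "chosen": attempts[-1][0]})
    return pairs
```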

Research-Ready Filtering

Not all submissions make it into the research-ready dataset. Records must pass quality gates:

  • Trace validation passes (structurally valid, not fabricated)
  • Citation match rate above minimum threshold
  • At least one extract and one compute step present
  • Programmatic behavior score below threshold (detects scripted traces)
  • Meaningful document engagement
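The gates above can be sketched as a single predicate. The thresholds, field names, and the use of paragraphs_touched as an engagement proxy are all illustrative assumptions.

```python
def passes_quality_gates(rec, min_citation_rate=0.5, max_programmatic=0.8):
    """Sketch of the research-ready gates; thresholds are invented."""
    kinds = {s["kind"] for s in rec["trace"]["steps"]}
    return (
        rec["trace_valid"]                                  # structurally valid
        and rec["citation_match_rate"] >= min_citation_rate # citations check out
        and {"extract", "compute"} <= kinds                 # both step types present
        and rec["programmatic_score"] < max_programmatic    # not a scripted trace
        and rec["paragraphs_touched"] > 0                   # engaged the document
    )
```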

Storage Namespace

  dataset/v2/domains/{domain}/seeds/{seed}/
    context/
      challenge.json                # Shared challenge context
      trap_metadata.json            # Challenge configuration metadata
    attempts/
      all/{record_id}.json          # All attempts
      research-ready/               # Quality-filtered
    sessions/
      all/{challenge_id}.json
      research-ready/
    pairs/session/
      sequential/{pair_id}.json
      sequential/research-ready/
      bookend/{pair_id}.json
      bookend/research-ready/
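A key builder for per-attempt objects would follow directly from this layout. The helper itself is ours, and since the doc does not show a file pattern under attempts/research-ready/, we assume it mirrors all/.

```python
def attempt_key(domain: str, seed: int, record_id: str,
                research_ready: bool = False) -> str:
    """Object key for a per-attempt record, following the namespace above."""
    subset = "research-ready" if research_ready else "all"
    return (f"dataset/v2/domains/{domain}/seeds/{seed}"
            f"/attempts/{subset}/{record_id}.json")
```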

HuggingFace Export

The dataset is exported to structured JSONL format organized by category:

  • raw_attempts: individual attempts with full context, trace, and quality metrics
  • session_trajectories: complete multi-attempt sessions
  • process_sft_revision_chain: multi-attempt chains with transitions for process-supervision fine-tuning
  • session_revision_pairs_sequential: adjacent attempt pairs (rejected vs. chosen)
  • session_revision_pairs_bookend: first vs. last attempt pairs

Each export row includes a structured response with:

  • think: the reasoning trace rendered as prose
  • artifact: the constrained generation output
  • submitted_answers: extracted question answers
  • trace_quality: quality metrics

Splits are deterministic: hash(challengeId) determines train (~90%), validation (~5%), test (~5%).
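A deterministic split assignment of this kind can be sketched as follows. The document states only that hash(challengeId) drives the ~90/5/5 split; the specific hash function and bucketing below are our assumptions.

```python
import hashlib

def assign_split(challenge_id: str) -> str:
    """Deterministic split by hashing the challenge id into 100 buckets."""
    h = int(hashlib.sha256(challenge_id.encode()).hexdigest(), 16)
    bucket = h % 100
    if bucket < 90:
        return "train"        # ~90%
    if bucket < 95:
        return "validation"   # ~5%
    return "test"             # ~5%
```

Because the split depends only on the id, re-exporting the dataset never moves a challenge between splits.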

Durability

  • SQLite WAL mode with synchronous=FULL — submissions survive process crashes
  • Retry with exponential backoff — up to 20 attempts before dead-letter
  • Lock-based batch processing — prevents duplicate uploads
  • Seed context deduplication — shared challenge context is written once per seed/domain
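The retry schedule can be sketched as capped exponential backoff. The 20-attempt dead-letter limit comes from the document; the base delay, growth factor, and cap are illustrative assumptions.

```python
def backoff_delay(attempt: int, base: float = 1.0,
                  factor: float = 2.0, cap: float = 300.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based),
    growing exponentially up to a fixed cap."""
    return min(cap, base * factor ** attempt)

MAX_ATTEMPTS = 20  # from the document: dead-letter after 20 attempts

def should_dead_letter(attempt: int) -> bool:
    return attempt >= MAX_ATTEMPTS
```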