OpenArena
The Proof of Intelligence. A decentralized adversarial evaluation protocol on Bittensor powered by LiveBench.
Description
# OpenArena: The Truth Machine for AI
## The Problem: Benchmark Saturation
Static benchmarks (GSM8K, MMLU) are dead. Frontier models score 90%+ by
memorizing test sets but fail on novel problems. The industry cannot
distinguish a model that remembers from a model that reasons.
## The Solution: Dynamic Adversarial Evaluation
OpenArena is a decentralized Bittensor subnet where:
1. Validators pull fresh, contamination-limited tasks from LiveBench
(a continuously updated benchmark that withholds its newest questions
from public release, so models are unlikely to have seen them in training).
2. Miners solve tasks under a cryptographic Commit-Reveal scheme
(prevents front-running and answer copying).
3. Scoring uses the Generalization Score:
S = (Accuracy × Calibration) − Latency
The calibration term is based on the Brier score, which penalizes confidently wrong answers (hallucinations) and rewards calibrated confidence.
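The formula above leaves its terms undefined, so the following is only a minimal sketch under stated assumptions: Accuracy is the fraction of tasks answered correctly, Calibration is taken as 1 minus the mean Brier score of the miner's self-reported confidences (one plausible reading, not confirmed by the source), and Latency is mean seconds per task scaled by a hypothetical `latency_weight`. The `Answer` type and both function names are illustrative, not OpenArena's API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Answer:
    correct: bool      # did the miner's answer match the LiveBench ground truth?
    confidence: float  # miner's self-reported probability of being correct, in [0, 1]
    latency_s: float   # seconds from task dispatch to reveal


def brier_score(answers: Sequence[Answer]) -> float:
    """Mean squared error between stated confidence and the 0/1 outcome."""
    return sum((a.confidence - float(a.correct)) ** 2 for a in answers) / len(answers)


def generalization_score(
    answers: Sequence[Answer],
    latency_weight: float = 0.01,  # hypothetical: penalty per second of mean latency
) -> float:
    """S = (Accuracy x Calibration) - Latency, under the assumptions stated above."""
    accuracy = sum(a.correct for a in answers) / len(answers)
    calibration = 1.0 - brier_score(answers)  # assumption: 1.0 means perfectly calibrated
    mean_latency = sum(a.latency_s for a in answers) / len(answers)
    return accuracy * calibration - latency_weight * mean_latency
```

With these assumptions, a miner that gets one of two tasks right with confidences 0.9 (right) and 0.2 (wrong) scores about 0.47, while a miner with the same accuracy and latency that claims 1.0 confidence on both scores 0.23: overconfidence on the wrong answer is what costs it.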
## The Unfair Advantage: KaggleIngest
Most subnets fail at cold start because they have no skilled miners. We solve this via
KaggleIngest, bridging 15M+ Kaggle data scientists directly into Bittensor.
- `!pip install openarena-kaggle`: one-line onboarding from a notebook
- Web2-clean leaderboard UI: no wallet required to compete
- Cold start solved: a deep pool of skilled competitors from day one
## Architecture
- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)
- Entropy Source: LiveBench-2026-01-08 (questions withheld from public release)
- Scoring: Brier Score decomposition (accuracy + calibration)
- Frontend: Next.js with live generalization leaderboard
- Security: SHA-256 commit hashes prevent plagiarism
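The architecture names a Brier score decomposition without spelling it out. The standard (Murphy) decomposition splits the Brier score into reliability (calibration error, lower is better), resolution (how much per-bin hit rates differ from the base rate, higher is better), and uncertainty (set by the task set alone). How OpenArena maps these onto its accuracy and calibration terms is not stated, so this sketch only computes the textbook terms; `brier_decomposition` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def brier_decomposition(
    forecasts: Sequence[float], outcomes: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (brier, reliability, resolution, uncertainty).

    Forecasts are grouped by their exact value, so the identity
    brier == reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

    groups: Dict[float, List[int]] = defaultdict(list)  # forecast value -> outcomes
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0  # weighted gap between stated confidence and observed hit rate
    resolution = 0.0   # weighted spread of per-group hit rates around the base rate
    for f, obs in groups.items():
        k = len(obs)
        hit_rate = sum(obs) / k
        reliability += k * (f - hit_rate) ** 2
        resolution += k * (hit_rate - base_rate) ** 2
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)
    return brier, reliability, resolution, uncertainty
```

For forecasts `[0.9, 0.9, 0.2, 0.2]` with outcomes `[1, 0, 0, 0]`, the Brier score is 0.225, which equals reliability 0.1 minus resolution 0.0625 plus uncertainty 0.1875. Grouping by exact forecast value keeps the identity exact; binning continuous confidences into ranges instead makes it approximate.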
Hackathon Progress
- Whitepaper: Formalized "Proof of Intelligence" game theory and
Generalization Score formula (S = Accuracy × Calibration − Latency).
- Commit-Reveal: Implemented cryptographic anti-plagiarism scheme
in openarena/utils/crypto.py.
- Validator Loop: Built LiveBench task dispatcher with epoch-based cadence.
- Miner Loop: Built LLM inference agent with commit → reveal flow.
- Simulation: demo.py shows honest miners winning while copycat miners are slashed.
- Frontend: Next.js brutalist dashboard with live mock leaderboard and
Mermaid architecture diagram at openarena.kaggleingest.com.
- PROPOSAL.md: Full subnet design proposal, following the Ridges template, in the repo root.
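The contents of openarena/utils/crypto.py are not shown here, so what follows is a minimal sketch of a SHA-256 commit-reveal scheme, not the project's code. It assumes each commitment hashes a `hotkey:nonce:answer` string; `make_commitment`, `verify_reveal`, and that message layout are hypothetical. The random nonce stops others from brute-forcing short answers out of the published digest, and binding the miner's hotkey into the hash stops a copycat from republishing someone else's digest and later revealing their answer as its own.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _digest(hotkey: str, nonce: str, answer: str) -> str:
    """SHA-256 over the miner's identity, a secret nonce, and the answer."""
    return hashlib.sha256(f"{hotkey}:{nonce}:{answer}".encode("utf-8")).hexdigest()


def make_commitment(answer: str, hotkey: str) -> Tuple[str, str]:
    """Commit phase: publish only the digest; keep the nonce secret until reveal."""
    nonce = secrets.token_hex(16)  # 128-bit random salt
    return _digest(hotkey, nonce, answer), nonce


def verify_reveal(commitment: str, answer: str, nonce: str, hotkey: str) -> bool:
    """Reveal phase: the validator recomputes the digest and compares in constant time."""
    return hmac.compare_digest(_digest(hotkey, nonce, answer), commitment)
```

In a copycat scenario like the one demo.py simulates, an honest miner's reveal verifies, while a copycat fails either way: revealing the honest answer against its own earlier commitment to a guess does not match, and reusing the honest miner's digest under its own hotkey does not match either.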
Fundraising Status
Not funded. Bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.