TestForge — Project Description

AI-Powered Unit Test Generation for Bittensor

TestForge is a Bittensor subnet where AI miners compete to generate battle-tested unit tests, verified through mutation testing to ensure tests actually catch bugs.

The Problem

70% of open-source code has zero tests. Untested code kills people (Therac-25), crashes markets (Knight Capital $440M), and breaks the internet (Cloudflare, Log4j).

The Solution

A decentralized AI competition that generates high-quality unit tests, with each test's usefulness demonstrated empirically by mutation testing.

The Innovation

Three-gate verification with mutation testing — the only ungameable way to prove a test actually works.

Why Bittensor

Test quality is machine-verifiable: a suite either clears a gate or it does not. That makes test generation a natural fit for incentive-driven competition. The best AI wins; bad AI earns nothing.

How It Works


┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   CODE IN   │────▶│  64 MINERS  │────▶│   3-GATE    │────▶│  BEST TEST  │
│             │     │   COMPETE   │     │  VALIDATOR  │     │  WINS TAO   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Gate 1: Do tests run? → pytest execution → 0 failures required

Gate 2: Do tests cover code? → coverage.py → ≥80% lines required

Gate 3: Do tests catch bugs? → Mutation injection → ≥60% kill rate required

Final Score = (G1 + G2 + G3) / 3 → Rewards distributed proportionally
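The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the `GateInputs` container and the function names are hypothetical, and it assumes each gate contributes 1.0 when its threshold is met and 0.0 otherwise (the description does not say whether gate scores are binary or graded).

```python
from dataclasses import dataclass

# Thresholds taken from the gate descriptions above.
MIN_LINE_COVERAGE = 0.80
MIN_KILL_RATE = 0.60


@dataclass(frozen=True)
class GateInputs:
    """Raw measurements for one miner submission (hypothetical container)."""
    test_failures: int     # failures reported by the test run
    line_coverage: float   # fraction of lines covered, 0.0 to 1.0
    kill_rate: float       # fraction of injected mutants killed, 0.0 to 1.0


def gate_scores(m: GateInputs) -> tuple[float, float, float]:
    """Assumed binary scoring: 1.0 if a gate's threshold is met, else 0.0."""
    g1 = 1.0 if m.test_failures == 0 else 0.0
    g2 = 1.0 if m.line_coverage >= MIN_LINE_COVERAGE else 0.0
    g3 = 1.0 if m.kill_rate >= MIN_KILL_RATE else 0.0
    return g1, g2, g3


def final_score(m: GateInputs) -> float:
    """Final Score = (G1 + G2 + G3) / 3."""
    return sum(gate_scores(m)) / 3


# An assertion-free suite: runs cleanly and covers every line, kills nothing.
fake = GateInputs(test_failures=0, line_coverage=1.0, kill_rate=0.0)
# A suite that clears all three gates.
solid = GateInputs(test_failures=0, line_coverage=0.92, kill_rate=0.75)

print(round(final_score(fake), 3))
print(round(final_score(solid), 3))
```

Under these assumptions the assertion-free suite clears only two of the three gates, so it always scores below a suite that also kills mutants.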

Why Mutation Testing is Ungameable

The Cheat (without mutation testing):


def test_fake():
    my_function("x")  # No assertion: always passes, 100% coverage

Why It Fails Gate 3:

We inject a bug: change return a + b to return a - b

The fake test still passes → Mutation SURVIVES → Kill rate 0% → Gate 3 FAILS

You cannot fake catching a bug. The mutation either dies or it doesn't.
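The argument above can be reproduced with Python's standard `ast` module. The following is a rough sketch under stated assumptions: the `add` target function, the single `+` to `-` operator, and every helper name are illustrative, and TestForge's own mutation engine may work differently.

```python
import ast

# Target code under test (illustrative).
SOURCE = """
def add(a, b):
    return a + b
"""


class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace every `+` with `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(source, mutate=False):
    """Compile the source (optionally mutated) and return its namespace."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(AddToSub().visit(tree))
    namespace = {}
    exec(compile(tree, "<target>", "exec"), namespace)
    return namespace


def fake_test(ns):
    ns["add"](2, 3)  # no assertion, mirrors test_fake above


def real_test(ns):
    assert ns["add"](2, 3) == 5


def passes(test, ns):
    """Run one test against a namespace and report whether it passed."""
    try:
        test(ns)
        return True
    except AssertionError:
        return False


def kills_mutant(test):
    """A test kills the mutant if it passes on the original but fails on the mutant."""
    original = load(SOURCE)
    mutant = load(SOURCE, mutate=True)
    return passes(test, original) and not passes(test, mutant)


print("fake test kills mutant:", kills_mutant(fake_test))
print("real test kills mutant:", kills_mutant(real_test))
```

The assertion-free test executes the mutated line but never checks the result, so the mutant survives. The test with an assertion fails as soon as `a + b` becomes `a - b`.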

Architecture


┌─────────────────────────────────────────────────────────────────────────┐
│                          TESTFORGE SUBNET                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌────────────────┐              ┌────────────────────────────────────┐ │
│  │                │              │            VALIDATOR               │ │
│  │     TASK       │              │                                    │ │
│  │   GENERATOR    │─────────────▶│   ┌──────────┐    ┌──────────┐     │ │
│  │                │              │   │  GATE 1  │    │  GATE 2  │     │ │
│  │  • Code        │              │   │  Tests   │───▶│ Coverage │     │ │
│  │  • Docstring   │              │   │  Pass?   │    │  ≥80%?   │     │ │
│  │  • Problem ID  │              │   └──────────┘    └──────────┘     │ │
│  │  • Benchmark   │              │        │              │            │ │
│  └────────────────┘              │        ▼              ▼            │ │
│                                  │   ┌─────────────────────────┐      │ │
│  ┌────────────────┐              │   │        GATE 3           │      │ │
│  │                │              │   │    MUTATION ENGINE      │      │ │
│  │    MINERS      │─────────────▶│   │    Kill Rate ≥60%?      │      │ │
│  │                │              │   └─────────────────────────┘      │ │
│  │  • LLM Agent   │              │              │                     │ │
│  │  • Test Gen    │              │              ▼                     │ │
│  │  • Compete     │              │   ┌─────────────────────────┐      │ │
│  └────────────────┘              │   │     SCORE ENGINE        │      │ │
│                                  │   │   S = (G1 + G2 + G3)/3  │      │ │
│                                  │   └─────────────────────────┘      │ │
│                                  └────────────────────────────────────┘ │
│                                               │                         │
│                                               ▼                         │
│                                  ┌────────────────────────────────────┐ │
│                                  │        BITTENSOR CHAIN             │ │
│                                  │    Weight Update + Emissions       │ │
│                                  └────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
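The last step in the diagram, turning per-miner scores into a weight update, could look roughly like the sketch below. It assumes rewards are split in proportion to each miner's final score. The function name and the miner identifiers are hypothetical, and the sketch deliberately makes no Bittensor SDK calls: it only computes the normalized weights a validator would then submit.

```python
def normalize_weights(scores: dict[str, float]) -> dict[str, float]:
    """Scale non-negative scores so they sum to 1.0 (proportional split).

    If every miner scored zero there is nothing to distribute, so all
    weights are returned as 0.0 (an assumed convention).
    """
    total = sum(scores.values())
    if total == 0:
        return {uid: 0.0 for uid in scores}
    return {uid: score / total for uid, score in scores.items()}


# Illustrative final scores S = (G1 + G2 + G3) / 3 for three miners.
scores = {"miner_a": 1.0, "miner_b": 2 / 3, "miner_c": 0.0}
weights = normalize_weights(scores)

for uid, weight in weights.items():
    print(f"{uid}: {weight:.2f}")
```

In this example the miner that cleared all three gates receives 60% of the weight, the one that cleared two receives 40%, and the one that cleared none receives nothing.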

Key Metrics

| Metric | Status |
|--------|--------|
| Prototype | ✅ Complete |
| Tests | ✅ 47/47 passing |
| Simulation | ✅ 5 miners × 10 epochs |
| API | ✅ FastAPI ready |
| Benchmarks | ✅ SWE-bench, GitBug-Java |

Economics

| Role | Share | Incentive |
|------|-------|-----------|
| Miners | 41% | Generate better tests → earn more |
| Validators | 41% | Run honest verification → earn stake |
| Subnet Owner | 18% | Grow the network → grow value |

Go-To-Market

|-------|----------|--------|------|

Competitive Advantage

| vs Copilot/ChatGPT | vs Other Subnets |
|--------------------|------------------|
| ✅ Verified quality | ✅ Binary scoring |
| ✅ Mutation-tested | ✅ Focused on ONE task |
| ✅ Decentralized | ✅ Real benchmarks |
| ✅ Always improving | ✅ Ungameable |

Deliverables

| Artifact | Status |
|----------|--------|
| GitHub Repo | ✅ Test-Forge |
| Core Engine | ✅ 8 modules |
| 3-Gate Validator | ✅ Complete |
| Mutation Engine | ✅ 6 operators |
| Tests | ✅ 47 passing |
| Simulation | ✅ Working |
| Docs | ✅ 5 guides |
| API | ✅ Ready |

Quick Start

git clone https://github.com/manjeetsharma0796/Test-Forge.git
cd Test-Forge
pip install -r requirements.txt
python -m pytest tests/ -v   # 47 passed ✅
python main.py simulate      # Watch miners compete

The Ask

Select TestForge for Round 2.

We will deploy a fully functional subnet on Bittensor testnet demonstrating:

- Working miner competition
- Live 3-gate validation
- Real TAO emissions to the best test generator

GitHub: github.com/manjeetsharma0796/Test-Forge
