Verification-first autonomous research agents
Agents that can’t grade their own homework.
The agent that produces a result never decides whether it's good: an independent verifier grades it on held-out data the agent never saw. Wins and honest negatives are published with the same candor.
The stack
▲ all built on ▲
Touchstone · Verification spine
The domain-agnostic harness: a held-out evaluator, crash-resumable registry, token/experiment budget, single-GPU lease, and a calibration gate that refuses to open if the grader can't tell a good run from a broken one.
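To make the calibration gate concrete, here is a minimal Python sketch of the idea. It is not Touchstone's actual code: the names `calibration_gate`, `GateResult`, and `min_margin`, and the choice of accuracy as the metric, are all assumptions made for illustration. The gate scores a known-good run and a deliberately broken run on the same held-out targets, and opens only if the grader separates them by a margin.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GateResult:
    is_open: bool        # True only if the grader separated good from broken
    good_score: float
    broken_score: float
    margin: float


def calibration_gate(
    grade: Callable[[Sequence[int], Sequence[int]], float],
    held_out_targets: Sequence[int],
    good_predictions: Sequence[int],
    broken_predictions: Sequence[int],
    min_margin: float = 0.1,  # hypothetical threshold, not a Touchstone default
) -> GateResult:
    """Open the gate only if `grade` scores the known-good run above the
    known-broken run by at least `min_margin` (higher score = better)."""
    good = grade(good_predictions, held_out_targets)
    broken = grade(broken_predictions, held_out_targets)
    margin = good - broken
    return GateResult(margin >= min_margin, good, broken, margin)


def accuracy(preds: Sequence[int], targets: Sequence[int]) -> float:
    """A grader that actually looks at the predictions."""
    return sum(1 for p, t in zip(preds, targets) if p == t) / len(targets)


def constant_grader(preds: Sequence[int], targets: Sequence[int]) -> float:
    """A faulty grader that gives every run the same score."""
    return 1.0


targets = [1, 0, 1, 1, 0, 1, 0, 0]
good_run = list(targets)          # reference run known to be correct
broken_run = [0] * len(targets)   # degenerate run that always predicts 0

working = calibration_gate(accuracy, targets, good_run, broken_run)
blind = calibration_gate(constant_grader, targets, good_run, broken_run)

print(working.is_open, working.margin)  # gate opens: grader discriminates
print(blind.is_open, blind.margin)      # gate stays shut: grader is blind
```

In this sketch the check runs before any agent output is graded, so a grader that cannot discriminate blocks the pipeline instead of quietly approving everything.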