Whasuk Lee

Verification-first autonomous research agents

Agents that can’t grade their own homework.

Verification-first autonomous research agents. The agent that produces a result never decides whether it's good — an independent verifier grades it on held-out data it never saw. Wins and honest negatives are published with the same candor.

The stack

▲ all built on ▲
Touchstone· Verification spine

The domain-agnostic harness: a held-out evaluator, crash-resumable registry, token/experiment budget, single-GPU lease, and a calibration gate that refuses to open if the grader can't tell a good run from a broken one.

Latest works

all →