OutcomeOS

Chapter 8. Evals: The New PRD

We launched at roughly 90% benchmark accuracy. I thought that was solid. The team thought that was solid. Benchmarks do not capture what broke us. Within three days of l
