Evals: The New PRD
Chapter 8

We launched at roughly 90% benchmark accuracy. I thought that was solid. The team thought that was solid. Benchmarks do not capture what broke us. Within three days of l…