FrontierMath v2: 42% of Math Problems Had Errors
June 26, 2026
Epoch AI's FrontierMath v2 fixed errors in 42% of problems on June 12, 2026. Here's what broke, how it was caught, and what it means for trusting AI benchmarks.
A hands-on promptfoo tutorial: test LLM prompts with deterministic and model-graded assertions, catch prompt regressions, and gate your CI with GitHub Actions.