#llm-evaluation

FrontierMath v2: 42% of Math Problems Had Errors

June 26, 2026

On June 12, 2026, Epoch AI's FrontierMath v2 fixed errors in 42% of its problems. Here's what broke, how it was caught, and what it means for trusting AI benchmarks.

#frontiermath #ai benchmarks

Promptfoo Tutorial: Test LLM Prompts in CI (2026)

May 22, 2026

A hands-on promptfoo tutorial: test LLM prompts with deterministic and model-graded assertions, catch prompt regressions, and gate your CI with GitHub Actions.

#promptfoo #llm testing