Claude Opus 4.8: Benchmarks, Dynamic Workflows, Pricing

May 29, 2026

Claude Opus 4.8: Benchmarks, Dynamic Workflows, Pricing

TL;DR

Anthropic released Claude Opus 4.8 on May 28, 2026, the same day it closed a $65 billion Series H at a $965 billion valuation.12 The model holds the same $5 input / $25 output per million tokens as Opus 4.7, but fast mode now runs 2.5x faster at 3x lower cost than previous Opus fast modes ($10/$50 per million tokens).1 On Anthropic's own benchmarks, Opus 4.8 reaches 88.6% on SWE-bench Verified, 69.2% on SWE-bench Pro, and 1890 Elo on GDPval-AA, which Anthropic reports as 121 Elo ahead of OpenAI's GPT-5.5 on knowledge work.3 The bigger product news may be dynamic workflows in Claude Code — a research preview that lets Claude write an orchestration script and run hundreds of parallel subagents in a single session, capped at 16 concurrent and 1,000 total per run.4 Anthropic also says it expects to ship Mythos-class models — a tier above Opus that the company has kept restricted on cybersecurity grounds — "in the coming weeks."1


What You'll Learn

  • What changed between Claude Opus 4.7 and Opus 4.8
  • The full benchmark table for Opus 4.8 vs Opus 4.7 and GPT-5.5
  • How Anthropic's new dynamic workflows feature works in Claude Code
  • Opus 4.8 pricing across standard, fast mode, and prompt caching
  • Why honesty is the headline non-coding improvement
  • The new effort control on claude.ai and Cowork
  • What Mythos is and when Anthropic plans to release it

What Is Claude Opus 4.8?

Claude Opus 4.8 is the May 28, 2026 refresh of Anthropic's flagship Opus line. The Anthropic announcement frames it as "a modest but tangible improvement on its predecessor" — incremental on raw benchmark numbers but more reliable on agentic work and noticeably more honest about its own progress.1

The API model ID is claude-opus-4-8, and Anthropic shipped it everywhere on launch day: the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.156 The context window stays at 1M tokens by default on the Claude API, Bedrock, and Vertex AI; Microsoft Foundry caps it at 200K.7

Two things distinguish this release from a routine point upgrade. First, Anthropic shipped it alongside a new Claude Code feature called dynamic workflows that changes what one model session can do in a single turn (more on that below). Second, the company tightened the calendar dramatically — Opus 4.8 lands 42 days after Opus 4.7 (April 16, 2026), fast even by Anthropic's recent cadence.8

Claude Opus 4.8 Benchmarks: SWE-bench, GDPval, Terminal-Bench

Anthropic published a head-to-head benchmark table comparing Opus 4.8 against Opus 4.7 and the strongest competitor in the same category.3 The pattern is consistent: Opus 4.8 beats Opus 4.7 on every measured category, and beats GPT-5.5 on most — but GPT-5.5 still wins one important corner of the table.

BenchmarkOpus 4.8Opus 4.7GPT-5.5
SWE-bench Verified — coding88.6%87.6%88.7%
SWE-bench Pro — agentic coding69.2%64.3%58.6%
Terminal-Bench 2.1 — terminal use (Terminus-2 harness)74.6%66.1%78.2%
GDPval-AA — knowledge work (Elo)189017531769

A few notes on reading this table:

SWE-bench Verified is now a near-tie at the top. Opus 4.8's 88.6% is one tenth of a point under GPT-5.5's 88.7%.39 At the leaderboard top, this benchmark is approaching its ceiling, and the more telling story has moved to SWE-bench Pro — a harder agentic-coding benchmark where Opus 4.8 leads GPT-5.5 by 10.6 points.3

Terminal-Bench is the one place GPT-5.5 still wins. Anthropic reports all numbers in the table using the Terminus-2 public harness for fairness, and Opus 4.8's 74.6% is 3.6 points behind GPT-5.5's 78.2% on the same harness.1 In a footnote, Anthropic acknowledges that GPT-5.5 with its own Codex CLI harness reaches 83.4%, so the gap on the terminal is real for terminal-centric workflows.1

GDPval-AA is the broad knowledge-work measure. Opus 4.8 lands at 1890 Elo, which Anthropic reports as 121 Elo points above GPT-5.5's 1769.3 On Elo scales used in head-to-head model comparisons, a 121-point gap implies roughly a 67% win rate in pairwise comparisons.10 This flips the GDPval-AA ranking that GPT-5.5 took on its April 23, 2026 launch, when it sat above Opus 4.7's 1753 Elo.39

Anthropic also says Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked," based on its own evaluations — a reliability metric that does not show up on public leaderboards but matters more than a benchmark point for production engineering work.1

Dynamic Workflows in Claude Code

The biggest non-benchmark feature is dynamic workflows, available in Claude Code as a research preview. The pitch: instead of running one Claude instance against a hard task, Claude writes a JavaScript orchestration script, then a runtime executes it and spawns parallel subagents to handle pieces of the work.4

Anthropic's example workloads are the kinds of jobs that previously needed a team rather than a model — codebase-scale migrations across hundreds of thousands of lines, full test-suite generation, and codebase audits where every file gets its own pass.1 Claude plans the work, dispatches the subagents, then verifies its own outputs before returning a result.

There are two structural caps on a dynamic workflow run. The runtime allows up to 16 concurrent subagents running in parallel, and any single workflow run is capped at 1,000 total subagents end-to-end.4 Those are the hard guardrails for the research preview.

Dynamic workflows are available on Max, Team, and Enterprise plans for Claude Code, and Anthropic says the feature is on by default on Max and Team.4 Because the model selector is unchanged, dynamic workflows work most reliably when paired with Opus 4.8 — Anthropic says "with Opus 4.8, the agents can run for even longer" without losing the plot.1

Claude Opus 4.8 Pricing: Standard, Fast Mode, Caching

Standard pricing is unchanged from Opus 4.7. Fast mode is the headline change. The complete table:

TierInput (per 1M tokens)Output (per 1M tokens)
Standard$5.00$25.00
Fast mode$10.00$50.00
US-only inference$5.50$27.50

⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider before making cost decisions: Anthropic · OpenAI · Google Gemini · Google Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Cohere · Together AI · DeepSeek · Groq · Fireworks AI · Perplexity · xAI · Cursor · GitHub Copilot · Windsurf.

Fast mode runs the same model at up to 2.5x output tokens per second for a 2x token-price premium.111 What makes this a story is that the same fast-mode tier on prior Opus releases cost more — Anthropic says fast mode on 4.8 is "now three times cheaper than it was for previous models," even though the standard price is flat.1 In practice this lets latency-sensitive workloads (interactive agents, IDE autocomplete-style flows) lean on the same model at a price closer to what was previously the slow tier.

Three more pricing levers matter for production:

  • Prompt caching can cut input cost by up to 90%.7 Opus 4.8 also lowers the minimum cacheable prompt to 1,024 tokens, down from a higher threshold on 4.7, so shorter prompts that previously couldn't cache now can.7
  • Batch processing delivers a 50% discount on async workloads.7
  • US-only inference for compliance workloads is priced at 1.1x standard — a small premium for data-residency guarantees.7

For a developer comparison: at $5 input and $25 output, Opus 4.8 ties OpenAI's GPT-5.5 on input ($5) and undercuts it by about 17% on output ($25 vs $30), while running roughly 3.3x above Google's Gemini 3.5 Flash on input and 2.8x on output ($1.50/$9).10

Honesty and the "Sharper Judgment" Pitch

Anthropic's strongest qualitative claim on Opus 4.8 isn't a benchmark — it's behavior. The announcement says Opus 4.8 is "more likely to flag uncertainties about its work and less likely to make unsupported claims."1 Anthropic's own evaluations report that Opus 4.8 is four times less likely than Opus 4.7 to let flaws in its own generated code slip through unflagged.1

The alignment assessment, run pre-release, found measurable gains in "prosocial traits like supporting user autonomy and acting in the user's best interest."1 On the misalignment side, deception and misuse-cooperation rates are "substantially lower than Opus 4.7," and similar to Claude Mythos Preview — Anthropic's most safety-aligned model.1 The full assessment is in the Opus 4.8 System Card.12

Several launch-day partners published independent observations:

  • Cursor said Opus 4.8 "exceeds prior Opus models across every effort level" on its internal CursorBench and that tool calling uses fewer steps for the same intelligence (Michael Truell, Co-Founder and CEO).1
  • Browserbase said Opus 4.8 scored 84% on the Online-Mind2Web computer-use benchmark, which it called "a meaningful jump over both Opus 4.7 and GPT-5.5" (Miguel Gonzalez, Tech Lead).1
  • Databricks said Opus 4.8 reasons over PDFs and diagrams at "61% cheaper token cost than Opus 4.7" in its Genie agent, attributing the saving to better token efficiency rather than a price change (Hanlin Tang, CTO of Neural Networks).1
  • Cognition said Opus 4.8 fixes the "comment-verbosity and tool-calling issues" that some teams hit on Opus 4.7 (Scott Wu, CEO).1

These are partner quotes, not Anthropic claims — worth treating accordingly — but the consistency across testers is the closest a launch post gets to independent validation on release day.

Effort Control and the Messages API Update

Two smaller but production-relevant changes shipped alongside the model.

Effort control is now a setting alongside the model selector on claude.ai and Cowork, available on every plan.1 Users pick the amount of thinking effort Claude spends per response, ranging from Low (faster, less rate-limit usage) to Max. Opus 4.8 defaults to High, which Anthropic calls "the best overall balance of quality and user experience." For harder problems, users can step up to "extra" (xhigh in Claude Code) or "max."1 Anthropic notes that on coding tasks, high effort spends a token count similar to Opus 4.7's default — so you get better results at a similar cost, not a quality bump priced in additional tokens.1

The Messages API now accepts system entries inside the messages array. Practically, this lets developers update Claude's instructions mid-task without breaking the prompt cache or having to route the update through a user turn.1 Anthropic suggests using this to update permissions, token budgets, or environment context as an agent runs — a hook designed for long-running agentic harnesses where the operator wants to change the rules without restarting the session.

Anthropic's Series H and the $965 Billion Valuation

Opus 4.8 dropped the same day Anthropic announced a $65 billion Series H at a $965 billion post-money valuation.2 Per Anthropic's own announcement, the round was led by Altimeter Capital, Dragoneer, Greenoaks, and Sequoia Capital, with Capital Group, Coatue, D1 Capital Partners, GIC, ICONIQ, and XN co-leading.13 Of the $65 billion total, $15 billion was previously committed from hyperscalers — including $5 billion from Amazon.13

The financial snapshot Anthropic disclosed:13

  • $47 billion revenue run rate — Anthropic says it crossed this earlier in May 2026
  • Up from a $14 billion run rate at the Series G round in February 202614
  • Previous valuation at the February 2026 Series G: $380 billion — meaning the May round nearly tripled the company's valuation in three months14

This puts Anthropic ahead of OpenAI as the most highly valued private AI lab.1516 Both companies are reportedly heading toward IPOs, and the same enterprise relationships Anthropic used to validate Opus 4.8 — Cursor, Cognition, Databricks, Browserbase, Thomson Reuters CoCounsel, Hebbia — read as a partial list of the customers funding the run-rate numbers.1

When Mythos Ships

The clearest forward-looking line in the Opus 4.8 post: Anthropic expects to bring Mythos-class models to all customers "in the coming weeks."1

Mythos is the model Anthropic disclosed earlier as having identified thousands of previously unknown zero-day vulnerabilities across every major operating system and every major web browser.17 Because of those capabilities, Anthropic has restricted Mythos Preview to the 12 launch partners of Project Glasswing and "over 40 additional organizations" that build or maintain critical infrastructure, rather than shipping it generally.18 The Opus 4.8 announcement is the first time Anthropic has paired a specific near-term window — "coming weeks" — with broader Mythos availability; earlier statements committed only to "the near future."

The technical bridge between Opus 4.8 and that release is the alignment work. Anthropic explicitly framed Opus 4.8 as a vehicle to refine the safeguards needed before Mythos can be released safely: "we plan to release a new class of model with even higher intelligence than Opus … Models of this capability level require stronger cyber safeguards before they can be generally released. We're making swift progress on developing these safeguards."1 Opus 4.8's improved deception and misuse-cooperation rates — which Anthropic says are now similar to Mythos Preview — are the trial run for what those safeguards look like at Opus capability levels.

Bottom Line

Opus 4.8 is the cleanest pure Opus point release Anthropic has shipped in some time — same price, marginally better leaderboards, plus a noticeable jump on harder agentic benchmarks like SWE-bench Pro and GDPval-AA. The bigger story is the surrounding announcements: dynamic workflows are the agentic-coding feature engineers will actually feel day-to-day, fast mode at 3x lower cost reshapes the latency-vs-price tradeoff for interactive workloads, and the "Mythos in the coming weeks" line puts a public deadline on a model that has been gated for safety reasons since launch. All of that landed on the same day Anthropic priced the most valuable AI company in the world at $965 billion. The model is the product, but the announcement around it tells you where Anthropic thinks the next quarter is going.

Sources

Footnotes

  1. Anthropic, "Introducing Claude Opus 4.8," May 28, 2026. https://www.anthropic.com/news/claude-opus-4-8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33

  2. TechCrunch, "Anthropic raises $65 billion, nears $1T valuation ahead of IPO," May 28, 2026. https://techcrunch.com/2026/05/28/anthropic-raises-65-billion-nears-1t-valuation-ahead-of-ipo/ 2

  3. officechai, "Anthropic Releases Claude Opus 4.8, Beats Opus 4.7, GPT-5.5 On Many Benchmarks," May 28, 2026. https://officechai.com/ai/claude-opus-4-8-benchmarks/ 2 3 4 5 6 7 8

  4. MarkTechPost, "Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents," May 28, 2026. https://www.marktechpost.com/2026/05/28/anthropic-ships-claude-opus-4-8-alongside-dynamic-workflows-and-cheaper-fast-mode-with-workflows-capped-at-1000-subagents/ 2 3 4 5 6

  5. AWS, "Claude Opus 4.8 is now available on AWS," May 28, 2026. https://aws.amazon.com/about-aws/whats-new/2026/05/claude-opus-4.8-aws/ 2

  6. GitHub Changelog, "Claude Opus 4.8 is generally available for GitHub Copilot," May 28, 2026. https://github.blog/changelog/2026-05-28-claude-opus-4-8-is-generally-available-for-github-copilot/ 2

  7. Anthropic, "What's new in Claude Opus 4.8," Claude API Docs. https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-8 2 3 4 5 6 7

  8. codersera, "Claude Opus 4.8 Launch Guide: Benchmarks & Pricing 2026," May 28, 2026. https://codersera.com/blog/claude-opus-4-8-launch-guide-2026/

  9. marc0.dev, "SWE-Bench Leaderboard May 2026 | GPT-5.5 Leads at 88.7%." https://www.marc0.dev/en/leaderboard 2

  10. digitalapplied, "Claude Opus 4.8 vs GPT-5.5: Benchmarks & Cost Compared." https://www.digitalapplied.com/blog/claude-opus-4-8-vs-gpt-5-5-frontier-comparison 2

  11. VentureBeat, "Anthropic's Claude Opus 4.8 is here with 3X cheaper fast mode and near-Mythos level alignment," May 28, 2026. https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment

  12. Anthropic, "Claude Opus 4.8 System Card," May 28, 2026. https://www.anthropic.com/claude-opus-4-8-system-card

  13. Anthropic, "Anthropic raises $65B in Series H funding at $965B post-money valuation," May 28, 2026. https://www.anthropic.com/news/series-h 2 3

  14. Anthropic, "Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation," February 12, 2026. https://www.anthropic.com/news/anthropic-raises-30-billion-series-g-funding-380-billion-post-money-valuation 2

  15. Bloomberg, "Anthropic Valued at $965 Billion After Latest Funding Round, Eclipsing OpenAI," May 28, 2026. https://www.bloomberg.com/news/articles/2026-05-28/anthropic-raises-at-965-billion-valuation-eclipsing-openai

  16. CNBC, "Anthropic tops OpenAI as most valuable AI startup, nears $1 trillion valuation in latest round," May 28, 2026. https://www.cnbc.com/2026/05/28/anthropic-open-ai-startup-value.html

  17. CNBC, "Anthropic CEO warns of cyber 'moment of danger' as AI exposes thousands of vulnerabilities," May 5, 2026. https://www.cnbc.com/2026/05/05/anthropic-ceo-cyber-moment-of-danger-mythos-vulnerabilities.html

  18. Anthropic, "Project Glasswing: Securing critical software for the AI era." https://www.anthropic.com/glasswing 2

Frequently Asked Questions

May 28, 2026. Anthropic shipped it to the Claude API, Amazon Bedrock, Vertex AI, Microsoft Foundry, and GitHub Copilot on launch day, with the API model ID claude-opus-4-8 . 1 5 6

FREE WEEKLY NEWSLETTER

Stay on the Nerd Track

One email per week — courses, deep dives, tools, and AI experiments.

No spam. Unsubscribe anytime.