Claude Opus 4.7: Benchmarks, Features & Pricing
April 17, 2026
TL;DR
Anthropic released Claude Opus 4.7 on April 16, 2026. It leads SWE-bench Pro at 64.3% — ahead of GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%) — and sets a new high on OSWorld-Verified at 78.0%, up from 72.7% for Opus 4.6. Pricing is unchanged at $5.00/$25.00 per million tokens input/output, though a new tokenizer uses up to 35% more tokens for equivalent text. Key additions include a new xhigh effort level, adaptive thinking (replacing extended thinking budgets), task budgets for agentic loops, and 3× higher image resolution for computer use.
What You'll Learn
- How Claude Opus 4.7 scores on SWE-bench Pro, OSWorld, GPQA, CursorBench, and other key benchmarks
- What changed vs. Claude Opus 4.6 — and what was removed
- The new xhigh effort level and adaptive thinking system
- Pricing, tokenizer changes, and what they mean for real API costs
- Where Opus 4.7 leads, where competitors catch up, and what's still invitation-only
Release Details
Claude Opus 4.7 became generally available on April 16, 2026, one day before this post. The API model ID is claude-opus-4-7, available on Anthropic's API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, Snowflake Cortex AI, and GitHub Copilot Enterprise[1].
The model ships as a single variant — there are no Thinking, Pro, or Mini tiers for Opus 4.7. Extended thinking budgets have been removed entirely (setting budget_tokens now returns a 400 error); adaptive thinking replaces them. Knowledge cutoff is January 2026[2].
Benchmark Results
Coding — Where Opus 4.7 Leads Most Clearly
SWE-bench is the standard measure for autonomous software engineering: given a GitHub issue, can the model write a pull request that passes the test suite? Opus 4.7 sets a new top score on the harder SWE-bench Pro variant:
| Model | SWE-bench Pro | SWE-bench Verified |
|---|---|---|
| Claude Opus 4.7 | 64.3% | 87.6% |
| GPT-5.4 | 57.7% | — |
| Gemini 3.1 Pro | 54.2% | 80.6% |
| Claude Opus 4.6 | 53.4% | 80.8% |
On SWE-bench Pro, Opus 4.7 leads GPT-5.4 by 6.6 points and Gemini 3.1 Pro by 10.1 points. On the standard SWE-bench Verified leaderboard, Opus 4.7 climbs from 80.8% to 87.6% — a 6.8-point improvement over its predecessor[3].
Anthropic also reports a 13% improvement on an internal 93-task coding benchmark, and Rakuten's production deployment found that Opus 4.7 resolves 3× more real production tasks than Opus 4.6 on their SWE-bench variant[4].
Computer Use — OSWorld
OSWorld-Verified measures autonomous desktop task completion (file management, browser navigation, multi-app workflows). The human expert baseline is approximately 72.4%:
| Model | OSWorld-Verified |
|---|---|
| GPT-5.4 | 75.0% |
| Claude Opus 4.7 | 78.0% |
| Claude Opus 4.6 | 72.7% |
| Human expert baseline | ~72.4% |
Opus 4.7 moves past GPT-5.4 on this benchmark, reaching 78.0% and extending the gap above the human baseline. The improvement is partly driven by a new 3.75MP image resolution ceiling for computer use — up from 1.15MP in Opus 4.6 — and coordinates that map 1:1 with pixels, removing the scale-factor math that previously introduced errors in screen coordinate targeting[5].
Developer Workflows — CursorBench
CursorBench evaluates real-world coding assistant tasks as they occur in an IDE environment. Opus 4.7 scores 70%, up from 58% for Opus 4.6 — a 12-point jump that positions it above competing models on this benchmark[6].
Graduate-Level Science — GPQA Diamond
On GPQA Diamond (graduate-level physics, chemistry, biology), the three frontier models are statistically indistinguishable:
| Model | GPQA Diamond |
|---|---|
| GPT-5.4 Pro | 94.4% |
| Gemini 3.1 Pro | 94.3% |
| Claude Opus 4.7 | 94.2% |
The differences here are within measurement noise. No single model holds a meaningful advantage on graduate-level scientific reasoning[7].
Knowledge Work — GDPVal-AA
GDPVal-AA is an Elo-based benchmark measuring general knowledge work across business analysis, document processing, and professional reasoning tasks:
| Model | GDPVal-AA (Elo) |
|---|---|
| Claude Opus 4.7 | 1,753 |
| GPT-5.4 | 1,674 |
| Gemini 3.1 Pro | 1,314 |
Opus 4.7 holds a 79-point advantage over GPT-5.4 on this benchmark, with Gemini 3.1 Pro trailing significantly at 1,314[8].
Security — XBOW Visual Acuity
On XBOW's visual acuity cybersecurity benchmark, Opus 4.7 scores 98.5%, versus 54.5% for Opus 4.6 — the largest single-generation jump of any benchmark in this release. Anthropic describes Opus 4.7 as a testbed for new cyber safeguards being validated before an eventual broader release of Mythos-class models[9].
Legal — BigLaw Bench
On Harvey's BigLaw Bench (professional legal reasoning), Opus 4.7 scores 90.9% at high effort[10].
What's New vs. Opus 4.6
Adaptive Thinking Replaces Extended Thinking Budgets
The biggest architectural change: extended thinking budgets are gone. Setting budget_tokens in your API request now returns a 400 error. In their place, Anthropic introduces adaptive thinking — off by default, opt-in via the API — which Anthropic says outperforms extended thinking in internal evaluations. The system dynamically allocates reasoning compute rather than requiring developers to set a token ceiling.
Thinking blocks still appear in the response stream, but the thinking field is empty by default unless you set display: "summarized" in your request[11].
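A minimal request sketch under these rules might look like the following. The shape of the thinking object and the display field are taken from this post's description and should be treated as assumptions, not verified API syntax:

```python
# Minimal Messages API payload opting in to adaptive thinking.
# Field names ("thinking", "type", "display") follow this post's
# description and are illustrative assumptions.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    # Adaptive thinking is off by default; opt in explicitly.
    # "display": "summarized" surfaces thinking text in the stream.
    "thinking": {"type": "adaptive", "display": "summarized"},
    "messages": [
        {"role": "user", "content": "Refactor this parser for clarity."}
    ],
}

# The removed "budget_tokens" field now triggers a 400 error, so a
# migration check like this is worth keeping in request-building code:
assert "budget_tokens" not in payload["thinking"]
```

The key migration step is deleting any budget_tokens plumbing rather than translating it: there is no equivalent knob to set, since the model allocates reasoning compute itself.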
New xhigh Effort Level
A new xhigh effort tier sits between the previous high and max levels, giving developers finer control over the reasoning/latency tradeoff. Anthropic recommends xhigh for coding and agentic use cases where you want stronger reasoning without paying the full max cost[12].
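As a sketch, selecting the new tier could look like this. Treating effort as a top-level request field, and the exact level names beyond high, xhigh, and max, are assumptions based on the post rather than confirmed syntax:

```python
# Hypothetical effort selection for a coding-agent request. The
# "effort" field name is an assumption; the post names only the
# "high", "xhigh", and "max" levels, so only those are listed here.
KNOWN_EFFORT_LEVELS = ("high", "xhigh", "max")

payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 8192,
    "effort": "xhigh",  # stronger reasoning than "high", cheaper than "max"
    "messages": [{"role": "user", "content": "Plan the database migration."}],
}
assert payload["effort"] in KNOWN_EFFORT_LEVELS
```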
Task Budgets for Agentic Loops
A new public beta feature (task-budgets-2026-03-13 beta header) lets you set an advisory token budget across an entire agentic loop — not just a single model call. The minimum is 20,000 tokens. This is not a hard cap but guides the model's planning toward completing the task within budget, reducing runaway token usage in long agent workflows[13].
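A hypothetical request using the beta might look like this. The beta header value and the 20,000-token minimum come from the post; the task_budget field name is an illustrative assumption:

```python
# Hypothetical task-budget request for an agentic loop.
# Header value and minimum are from the post; "task_budget" is an
# assumed field name, not confirmed API syntax.
MIN_TASK_BUDGET = 20_000  # tokens; requests below this are invalid

headers = {
    "anthropic-beta": "task-budgets-2026-03-13",
    "anthropic-version": "2023-06-01",
}
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 4096,
    # Advisory, not a hard cap: guides planning across the whole loop.
    "task_budget": {"tokens": 60_000},
    "messages": [{"role": "user", "content": "Triage the failing CI jobs."}],
}
assert payload["task_budget"]["tokens"] >= MIN_TASK_BUDGET
```

Because the budget is advisory, you still want an independent hard limit (e.g. a loop-iteration cap) in your own agent harness; the budget shapes the model's planning, it does not enforce anything.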
3× Higher Image Resolution for Computer Use
Max image resolution climbs from 1,568px (1.15MP) to 2,576px (3.75MP) — more than three times the pixel count. Coordinate output now maps 1:1 to actual pixel positions, eliminating the scale-factor conversion errors that caused missed clicks in Opus 4.6 computer use deployments.
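A small sketch of why 1:1 mapping matters: under the old cap, a large display had to be downscaled for capture, and every coordinate the model emitted had to be rescaled back to screen pixels, which is where missed clicks crept in. The helper below is illustrative, not part of any Anthropic SDK:

```python
# Convert a model-emitted x coordinate to a screen pixel position.
def to_screen(coord: int, screenshot_width: int, screen_width: int) -> int:
    scale = screen_width / screenshot_width
    return round(coord * scale)

# Opus 4.6 style: a 2560px-wide display captured at the 1,568px cap,
# so every coordinate needed rescaling (a rounding/scale error source).
assert to_screen(784, 1568, 2560) == 1280

# Opus 4.7: the same display fits under the 2,576px cap, so the
# screenshot is native resolution and the mapping is the identity.
assert to_screen(1280, 2560, 2560) == 1280
```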
For a deeper look at how Claude's computer use capabilities are reshaping agent workflows, see our post on Claude managed agents.
Sampling Parameters Removed
Setting temperature, top_p, or top_k to non-default values now returns a 400 error. Anthropic has taken full control of sampling for Opus 4.7. Developers who relied on temperature tuning for output diversity will need to prompt-engineer instead.
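For existing integrations, the safest migration is to strip these fields before sending. A minimal sketch, assuming the standard Messages API field names and the 400-error behavior the post describes:

```python
# Migration helper: drop sampling parameters that Opus 4.7 rejects
# with a 400 error. Field names are the standard Messages API ones;
# the rejection behavior is as described in this post.
REMOVED_PARAMS = ("temperature", "top_p", "top_k")

def migrate_payload(payload: dict) -> dict:
    """Return a copy of the request safe to send to claude-opus-4-7."""
    return {k: v for k, v in payload.items() if k not in REMOVED_PARAMS}

old = {
    "model": "claude-opus-4-7",
    "temperature": 0.2,
    "max_tokens": 1024,
    "messages": [],
}
clean = migrate_payload(old)
assert "temperature" not in clean
assert clean["max_tokens"] == 1024
```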
More Direct, Less Deferential Tone
Opus 4.7 is described as more direct and opinionated than 4.6 — less validation-forward, with fewer emoji in responses and stronger opinions when asked. At lower effort levels, it is more literal and will not silently generalize instructions it considers ambiguous[14].
/ultrareview in Claude Code
A new /ultrareview slash command is available in Claude Code for deeper code review passes[15].
Pricing and Real-World Cost
The published rate card is unchanged from Opus 4.6:
| Tier | Input | Output |
|---|---|---|
| Standard | $5.00 / MTok | $25.00 / MTok |
| Batch API (50% off) | $2.50 / MTok | $12.50 / MTok |
| Cache reads | $0.50 / MTok | — |
| Cache writes (5-min) | $6.25 / MTok | — |
| Cache writes (1-hour) | $10.00 / MTok | — |
The critical caveat: Opus 4.7 uses a new tokenizer that converts the same input text into up to 35% more tokens than older Claude models. The per-token price is unchanged, but the effective per-request cost is higher. Developers migrating from Opus 4.6 should benchmark their actual token consumption before assuming cost parity[16].
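A back-of-envelope check makes the caveat concrete. The sketch below treats the 35% figure as a worst-case upper bound (the post says "up to 35%"; real inflation depends on your traffic, so measure it):

```python
# Estimate Opus 4.7 cost from Opus 4.6 token counts, assuming the
# worst-case 35% tokenizer inflation described above. Rates are the
# published standard tier ($5 in / $25 out per million tokens).
INPUT_PER_MTOK, OUTPUT_PER_MTOK = 5.00, 25.00
TOKENIZER_INFLATION = 1.35  # upper bound; benchmark your real traffic

def est_opus_47_cost(input_tokens_46: int, output_tokens_46: int,
                     inflation: float = TOKENIZER_INFLATION) -> float:
    """Dollar cost on Opus 4.7 for a workload measured in 4.6 tokens."""
    inp = input_tokens_46 * inflation
    out = output_tokens_46 * inflation
    return (inp * INPUT_PER_MTOK + out * OUTPUT_PER_MTOK) / 1_000_000

# A workload of 2M input / 800k output tokens costs $30.00/day at
# Opus 4.6 token counts; at worst-case inflation it scales to $40.50.
base = (2_000_000 * 5.00 + 800_000 * 25.00) / 1_000_000  # $30.00
assert abs(est_opus_47_cost(2_000_000, 800_000) - base * 1.35) < 1e-6
```

Because the inflation multiplies both input and output tokens, the effective rate card is roughly $6.75/$33.75 per million "old" tokens at the upper bound, which is the number to use when comparing against competitors' list prices.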
For reference, Gemini 3.1 Pro is available at approximately $2.00 input / $12.00 output per million tokens — roughly 2.5× cheaper at list price — though with a different capability profile and no published SWE-bench Pro or GDPVal-AA scores to compare directly.
What Opus 4.7 Is Not: Claude Mythos Preview
Anthropic's highest-capability model is not Opus 4.7. Claude Mythos Preview — developed under Project Glasswing — launched with 12 named enterprise and government partners, with access extended to over 40 additional vetted organizations on an invitation-only basis. Named launch partners are Anthropic, AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks.
Mythos is focused on defensive cybersecurity workflows. Anthropic describes it as "more capable and better-aligned" than Opus 4.7 in their internal evaluations. Opus 4.7 is explicitly positioned as the model on which Anthropic is testing new cyber safeguards — validating approaches before eventually moving toward a broader Mythos-class release.
For context on Mythos's cybersecurity evaluation, see our post on the AISI Claude Mythos cyber evaluation.
Safety Profile
Anthropic characterizes Opus 4.7's alignment as "largely well-aligned and trustworthy, though not fully ideal in its behavior." Specific improvements over 4.6 include better honesty and improved resistance to prompt injection attacks. One noted regression: Opus 4.7 is "modestly weaker" than 4.6 on avoiding overly detailed harm-reduction advice on controlled substances[17].
The model automatically detects and blocks requests indicating prohibited or high-risk cybersecurity uses. A Cyber Verification Program is available for legitimate security professionals who need expanded capabilities. A full Claude Opus 4.7 System Card has been published[18].
How It Compares: The Short Version
Opus 4.7 is the strongest model for autonomous coding and agentic tasks right now. On SWE-bench Pro it leads by over 6 points; on OSWorld it overtakes GPT-5.4 at 78.0%. For general scientific reasoning (GPQA Diamond), the gap between frontier models has effectively collapsed — all three sit within 0.2 points of each other.
Where Opus 4.7 loses ground: pricing (Gemini 3.1 Pro is cheaper at list price at approximately $2.00/$12.00 per million tokens, though benchmark comparisons are incomplete). Context windows are at parity — both models support 1M tokens. If you're optimizing for cost in long-document workflows rather than coding agents, the tradeoffs shift.
For a broader look at where AI agents now stand relative to human performance across every benchmark category, see the Stanford AI Index 2026 and our earlier breakdown of GPT-5.4's computer use scores.
References
1. Claude Opus 4.7 coding benchmarks — Anthropic announcement
2. GDPVal-AA Elo scores — benchmark aggregators citing Anthropic data
3. XBOW visual acuity benchmark and cyber safeguards — Anthropic announcement
4. Adaptive thinking replaces extended thinking budgets — Anthropic platform docs
5. Tone and instruction-following changes — Anthropic platform docs
6. Claude Opus 4.7 pricing and tokenizer — Anthropic pricing docs