GPT-5.5: OpenAI's First Retrained Base Since GPT-4.5
April 24, 2026
TL;DR
On April 23, 2026, OpenAI released GPT-5.5 — its first fully retrained base model since GPT-4.5, arriving just seven weeks after GPT-5.4 [1][2]. GPT-5.5 hits 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, 78.7% on OSWorld-Verified, and 98.0% on Tau2-bench Telecom — benchmarks that together describe an agent that can plan, use tools, and drive a real computer [3][4]. On the hardest math, GPT-5.5 Pro reaches 39.6% on FrontierMath Tier 4, versus 22.9% for Claude Opus 4.7 [5]. API pricing lands at $5 per million input tokens and $30 per million output tokens — exactly 2× GPT-5.4's rate — with a 1M-token context window, though the API launch is slated for "very soon" rather than day one [1][6]. In ChatGPT, GPT-5.5 Thinking is live today for Plus, Pro, Business, and Enterprise users; GPT-5.5 Pro for Pro, Business, Enterprise, and Edu. Codex access extends to the Go tier as well [4][7].
What You'll Learn
- Why GPT-5.5 is architecturally different from GPT-5.1 through GPT-5.4
- Headline benchmark numbers — Terminal-Bench, GDPval, OSWorld, FrontierMath, SWE-Bench Pro
- How GPT-5.5 compares to Claude Opus 4.7 and Gemini 3.1 Pro
- Pricing for GPT-5.5 and GPT-5.5 Pro across ChatGPT and the API
- What the NVIDIA GB200/GB300 NVL72 partnership means for the model
- Where the safety classification sits under OpenAI's Preparedness Framework
A Fully Retrained Base, Not Another Post-Training Pass
Here is the structural fact that separates this release from recent OpenAI launches: GPT-5.5 is the first fully retrained base model since GPT-4.5 [2][4]. Every GPT-5.x release between them — 5.1, 5.2, 5.3, and 5.4 — was a post-training iteration on the same underlying base. GPT-5.5 reworks the architecture, the pretraining corpus, and the agent-oriented training objectives.
That distinction matters because it reframes what GPT-5.5 is competing against. A post-training iteration mostly redistributes capability — better alignment here, sharper tool use there. A retrained base can shift where the model's ceiling sits on fundamentals like long-context reliability, multi-step reasoning, and token efficiency. OpenAI calls the result "natively omnimodal" — text, images, audio, and video integrated inside a single system rather than stitched together after the fact [2].
OpenAI's chief research officer Mark Chen framed the upgrade around agent workflows: GPT-5.5 is better at navigating computer work than its predecessors, with meaningful gains on scientific and technical research tasks [8]. That phrasing tracks the benchmark focus. The headline wins are all agentic — terminal use, economic knowledge work, desktop tasks, tool chains — rather than raw reasoning.
The Headline Benchmark Numbers
OpenAI reports state-of-the-art scores on 14 benchmarks. Here are the ones that matter for deciding whether to switch models:
| Benchmark | GPT-5.5 | What it tests |
|---|---|---|
| Terminal-Bench 2.0 | 82.7% | Complex command-line workflows with planning, iteration, tool coordination [3] |
| GDPval | 84.9% | Well-specified knowledge work across 44 occupations [3] |
| OSWorld-Verified | 78.7% | Operating real computer environments autonomously [4] |
| Tau2-bench Telecom | 98.0% | Multi-turn customer-service tool use (no prompt tuning) [4] |
| FrontierMath Tier 4 | 35.4% | Hardest research-level math problems [5] |
| MRCR v2 (512K–1M) | 74% | Long-context retrieval across the upper end of the window [4] |
| SWE-Bench Pro | 58.6% | Real-world GitHub issue resolution [9] |
| Harvey BigLaw Bench | 91.7% overall | Substantive legal accuracy across practice areas (up from 91.0% for GPT-5.4) [10] |
GPT-5.5 Pro adds another lift on the hardest evaluations: 39.6% on FrontierMath Tier 4 and 52.4% on FrontierMath Tiers 1–3 [5]. OpenAI also reports a 60% drop in hallucinations compared to GPT-5.4 [11].
One caveat worth naming: Claude Opus 4.7 still leads on SWE-Bench Pro at 64.3% [12]. On the pure-reasoning side, GPT-5.5 takes the ARC-AGI-2 crown at 85%, ahead of Gemini 3.1 Pro's 77.1% and Claude Opus 4.7's 75.8% [13]. GPQA Diamond is essentially a cluster at the top — Gemini 3.1 Pro at 94.3%, Opus 4.7 at 94.2%, GPT-5.5 at 93.6% [13]. GPT-5.5's clearest wins are in agentic and computer-use evaluations plus the hardest math; SWE-Bench Pro remains the one row where Opus 4.7 holds on.
GPT-5.5 vs Claude Opus 4.7: Where Each One Wins
Claude Opus 4.7 shipped exactly one week before GPT-5.5, on April 16, 2026. That makes the head-to-head unusually clean — two frontier models launched within the same release window.
| Category | GPT-5.5 | Claude Opus 4.7 | Edge |
|---|---|---|---|
| Agentic coding (Terminal-Bench 2.0) | 82.7% | — | GPT-5.5 |
| Real-world coding (SWE-Bench Pro) | 58.6% | 64.3% | Opus 4.7 |
| Desktop agent work (OSWorld-Verified) | 78.7% | 78.0% | GPT-5.5 (narrow) |
| Hardest math (FrontierMath Tier 4) | 35.4% / 39.6% Pro | 22.9% | GPT-5.5 |
| Input price per 1M tokens | $5.00 | $5.00 | Tie |
| Output price per 1M tokens | $30.00 | $25.00 | Opus 4.7 |
| Context window | 1M tokens | 1M tokens | Tie |
The pricing line is the one that changes the deployment calculus. Opus 4.7 kept the same $5/$25 rate as Opus 4.6 [14], while GPT-5.5 doubled the GPT-5.4 price to $5/$30 [6]. For output-heavy workloads — agent loops, long code generations, chain-of-thought reasoning — Opus 4.7 is now meaningfully cheaper per token. OpenAI's counterargument is that GPT-5.5 needs fewer tokens to finish comparable tasks, so the sticker price overstates the real cost [6]. Whether that holds up in your workload is an empirical question you have to run yourself.
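The token-efficiency counterargument is checkable with a few lines of arithmetic. The sketch below uses purely hypothetical token counts (200K input, 50K output per task) to find how much shorter GPT-5.5's output would have to be to match Opus 4.7's per-task cost at the listed prices:

```python
# Break-even check for the $30-vs-$25 output-price gap. The per-task
# token counts are illustrative assumptions, not measured values.

def task_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Dollar cost of one task at per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical agent task on Opus 4.7: 200K input tokens, 50K output tokens.
opus = task_cost(200_000, 50_000, in_price=5.00, out_price=25.00)  # $2.25

# Output tokens GPT-5.5 may spend at $5/$30 before it costs more than Opus:
breakeven_output = (opus - 200_000 / 1e6 * 5.00) / 30.00 * 1e6

print(f"Opus 4.7 task cost: ${opus:.2f}")
print(f"GPT-5.5 breaks even at {breakeven_output:,.0f} output tokens "
      f"({breakeven_output / 50_000:.0%} of the Opus output)")
```

At equal input, the break-even point is just the 25/30 price ratio: GPT-5.5 matches Opus on cost only if it finishes the same task in roughly 17% fewer output tokens.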
Claude Mythos Preview — which leads several of these same benchmarks — is gated to a small audience of trusted partners and government agencies and is not a commercially competing product [15].
Pricing and Availability
Unlike most recent OpenAI launches, GPT-5.5 did not arrive on the API on day one. The product pages and pricing have been published, but API access is still described as "very soon" [1][4]. That gap is a deliberate safety choice. OpenAI says serving the model at scale through the API requires different safeguards than ChatGPT's integrated environment, and the team is working with partners on the security requirements [1].
ChatGPT (live April 23, 2026):
- GPT-5.5 (exposed as "GPT-5.5 Thinking") → Plus, Pro, Business, Enterprise [4]
- GPT-5.5 Pro → Pro, Business, Enterprise, Edu [4]
- Free tier → No GPT-5.5 access; free users remain on GPT-5.3 [7]
Codex (live April 23, 2026):
- GPT-5.5 available with a 400K-token context window on Plus, Pro, Business, Enterprise, Edu, and Go plans [4]
- Fast mode generates tokens 1.5× faster for 2.5× the cost [4]
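The fast-mode tradeoff reduces to simple arithmetic: you pay 2.5× per token to get tokens 1.5× sooner. A quick sketch, where the baseline throughput and generation size are illustrative assumptions rather than published figures:

```python
# Fast-mode tradeoff: 1.5x token throughput for 2.5x the price.
# Baseline throughput and generation size are illustrative assumptions.
base_tps = 100.0        # assumed baseline throughput, tokens/second
base_price = 30.00      # $ per 1M output tokens (GPT-5.5 list price)

fast_tps = base_tps * 1.5      # 150 tokens/second
fast_price = base_price * 2.5  # $75 per 1M output tokens

output_tokens = 50_000  # hypothetical long generation

base_time = output_tokens / base_tps  # 500 s
fast_time = output_tokens / fast_tps  # ~333 s
extra_cost = output_tokens / 1e6 * (fast_price - base_price)  # $2.25

print(f"Fast mode saves {base_time - fast_time:.0f}s on this job "
      f"for an extra ${extra_cost:.2f}")
```

Whether that trade is worth it depends on whether a human (or a blocked agent loop) is waiting on the output.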
API (planned, not yet live):
| Model | Input / 1M tokens | Output / 1M tokens | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 1M tokens [1] |
| GPT-5.5 Pro | $30.00 | $180.00 | 1M tokens [6] |
⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider before making cost decisions.
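At these list prices, GPT-5.5 Pro is a flat 6× multiplier on both rates, so the tier choice scales any workload's cost linearly. A minimal estimator over the table values (which, per the warning above, may be stale), with the job size a hypothetical example:

```python
# Cost estimator over the table's illustrative prices; verify live
# pricing before relying on these numbers.
PRICES = {  # $ per 1M tokens: (input, output)
    "gpt-5.5":     (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

def estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one job at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical job: 500K input tokens, 100K output tokens.
base_cost = estimate("gpt-5.5", 500_000, 100_000)     # $5.50
pro_cost = estimate("gpt-5.5-pro", 500_000, 100_000)  # $33.00, a 6x premium
print(f"GPT-5.5: ${base_cost:.2f}  GPT-5.5 Pro: ${pro_cost:.2f}")
```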
No mini or nano variants shipped with GPT-5.5. GPT-5.4 mini and nano had launched on March 17, 2026, so that tier may arrive later — but it is not part of the 5.5 rollout [16].
The NVIDIA GB200 and GB300 Partnership
GPT-5.5 is co-designed with NVIDIA GB200 and GB300 NVL72 rack-scale systems — the same pattern NVIDIA described for GPT-5.3-Codex, now extended to Blackwell Ultra (GB300) [17]. The key engineering milestone is that the first 100,000-GPU GB200 NVL72 cluster completed large-scale training runs for GPT-5.5 and set a new benchmark for system-level reliability at frontier scale [17].
That matters for two reasons. First, it constrains what "state of the art" means in April 2026 — frontier training now requires an infrastructure tier that only a handful of operators can stand up, which maps neatly onto Google's own TPU 8 push and Meta's MTIA chip deployment. Second, NVIDIA has already rolled out GPT-5.5-powered Codex to more than 10,000 of its own staff across engineering, product, legal, marketing, and operations — a tight feedback loop between the model's behavior and the hardware it runs on [17].
Safety Classification: High on Biological and Cybersecurity
OpenAI classified GPT-5.5 as High risk on both biological/chemical and cybersecurity capabilities under its Preparedness Framework [18]. That is the same classification GPT-5.4 received — but OpenAI notes the underlying cybersecurity capability is itself a step up within the High bracket, without crossing the Critical threshold that would block broad deployment. High risk means the model could "amplify existing pathways to severe harm."
The release came with what OpenAI calls its "strongest set of safeguards to date," informed by internal and external red-teaming and feedback from nearly 200 trusted early-access partners [1]. The Preparedness High classification — alongside Anthropic's decision to gate Claude Mythos to a small trusted-partner audience — is a useful signal about where the frontier is heading: cybersecurity capability is now a first-order release concern for both major labs, not an afterthought.
How GPT-5.5 Fits the Bigger 2026 Model Race
A short map of the competitive field as of April 24, 2026:
- GPT-5.5 (OpenAI, April 23): Best-in-class on agentic/computer-use benchmarks, ARC-AGI-2 (85%), and hardest math; retrained base
- Claude Opus 4.7 (Anthropic, April 16): Best-in-class on real-world coding (SWE-Bench Pro at 64.3%); cheaper output tokens
- Gemini 3.1 Pro (Google, preview since February 19, 2026): Top-cluster GPQA Diamond at 94.3%; featured at Cloud Next 2026 inside Gemini Enterprise Agent Platform
- Claude Mythos Preview (Anthropic): Leads several benchmarks but not broadly available
- GLM-5.1 (Z.ai, April 7): Best open-weight SWE-Bench Pro at 58.4%
The pattern worth watching is that no single model leads across categories anymore. The competitive advantage is moving from "best model" to "best-routed stack" — applications that can pick the right model per task. Multi-model routing has quietly become a first-class infrastructure problem rather than a curiosity.
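A per-task router does not need heavy infrastructure to start; a lookup table with a fallback captures the idea. The category names and model IDs below are illustrative assumptions based on the benchmark leads discussed above, not anyone's published routing guidance:

```python
# Minimal per-task router: a lookup table plus a fallback. Category
# names and model IDs are illustrative assumptions for this sketch.
ROUTES = {
    "terminal_agent": "gpt-5.5",          # Terminal-Bench / OSWorld-style work
    "code_review":    "claude-opus-4.7",  # SWE-Bench-Pro-style coding
    "research_math":  "gpt-5.5-pro",      # FrontierMath-tier problems
}

def route(task_category: str, default: str = "gpt-5.5") -> str:
    """Pick a model per task category; unknown categories get the default."""
    return ROUTES.get(task_category, default)

print(route("code_review"))    # claude-opus-4.7
print(route("email_summary"))  # falls back to gpt-5.5
```

Real routers add cost ceilings, latency budgets, and evaluation feedback on top, but the core decision is exactly this mapping.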
The Bottom Line
GPT-5.5 is the first OpenAI release in over a year that is not a post-training iteration, and the benchmarks reflect that. The gains are concentrated where a retrained base can actually move numbers — agent workflows, long context, token efficiency, and the hardest math. The price doubled, the API is not yet live, and Claude Opus 4.7 still holds the SWE-Bench Pro crown. But for teams building agents that have to drive terminals, operate real computers, and finish multi-step knowledge work, GPT-5.5 is the first model where the ceiling feels like it moved rather than shifted sideways.
Whether it is worth switching depends on the classic tradeoff: if you need the agentic ceiling, pay the 2× and run it. If you need output-heavy coding, Opus 4.7 is cheaper and scores higher on the benchmark you care about. The new default is routing, not allegiance.
References
[1] TechCrunch — "OpenAI releases GPT-5.5, bringing company one step closer to an AI 'super app'", April 23, 2026.
[2] The Next Web — "OpenAI launches GPT-5.5, its first fully retrained base model since GPT-4.5", April 23, 2026.
[3] MarkTechPost — "OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval", April 23, 2026.
[4] OpenAI — "Introducing GPT-5.5", official announcement, April 23, 2026.
[5] DigitalApplied — "GPT-5.5 vs Claude Opus 4.7: Benchmarks & Pricing", April 23, 2026.
[6] The Decoder — "OpenAI unveils GPT-5.5, claims a 'new class of intelligence' at double the API price", April 23, 2026.
[7] OpenAI Help Center — "GPT-5.3 and GPT-5.5 in ChatGPT".
[8] CNBC — "OpenAI announces GPT-5.5, its latest artificial intelligence model", April 23, 2026.
[9] SiliconANGLE — "OpenAI releases GPT-5.5 with advanced math, coding capabilities", April 23, 2026.
[10] Harvey — "GPT-5.5: Research Preview Results", April 23, 2026.
[11] Startup Fortune — "OpenAI's GPT-5.5 benchmarks show a 60% hallucination drop", April 23, 2026.
[12] Scale AI — SWE-Bench Pro Leaderboard.
[13] Officechai — "GPT 5.5 Tops ARC-AGI 2 With 85% Score", April 23, 2026.
[14] Anthropic — "Introducing Claude Opus 4.7", April 16, 2026.
[15] R&D World — "How OpenAI's recently released GPT-5.5 stacks up with Anthropic's gated Claude Mythos", April 23, 2026.
[17] NVIDIA Blog — "OpenAI's New GPT-5.5 Powers Codex on NVIDIA Infrastructure", April 23, 2026.
[18] DataCamp — "Open AI's GPT-5.5: Benchmarks, Safety Classification, and Availability", April 23, 2026.