GPT-5.5: OpenAI's First Retrained Base Since GPT-4.5
April 24, 2026
TL;DR
On April 23, 2026, OpenAI released GPT-5.5 — its first fully retrained base model since GPT-4.5, arriving just seven weeks after GPT-5.4 [1][2]. GPT-5.5 hits 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, 78.7% on OSWorld-Verified, and 98.0% on Tau2-bench Telecom — benchmarks that together describe an agent that can plan, use tools, and drive a real computer [3][4]. On the hardest math, GPT-5.5 Pro reaches 39.6% on FrontierMath Tier 4, versus 22.9% for Claude Opus 4.7 [5]. API pricing lands at $5 per million input tokens and $30 per million output tokens — exactly 2× GPT-5.4's rate — with a 1M-token context window, though the API launch is slated for "very soon" rather than day one [1][6]. In ChatGPT, GPT-5.5 Thinking is live today for Plus, Pro, Business, and Enterprise users; GPT-5.5 Pro for Pro, Business, Enterprise, and Edu. Codex access extends to the Go tier as well [4][7].
What You'll Learn
- Why GPT-5.5 is architecturally different from GPT-5.1 through GPT-5.4
- Headline benchmark numbers — Terminal-Bench, GDPval, OSWorld, FrontierMath, SWE-Bench Pro
- How GPT-5.5 compares to Claude Opus 4.7 and Gemini 3.1 Pro
- Pricing for GPT-5.5 and GPT-5.5 Pro across ChatGPT and the API
- What the NVIDIA GB200/GB300 NVL72 partnership means for the model
- Where the safety classification sits under OpenAI's Preparedness Framework
A Fully Retrained Base, Not Another Post-Training Pass
Here is the structural fact that separates this release from recent OpenAI launches: GPT-5.5 is the first fully retrained base model since GPT-4.5 [2][4]. Every GPT-5.x release between them — 5.1, 5.2, 5.3, and 5.4 — was a post-training iteration on the same underlying base. GPT-5.5 reworks the architecture, the pretraining corpus, and the agent-oriented training objectives.
That distinction matters because it reframes what GPT-5.5 is competing against. A post-training iteration mostly redistributes capability — better alignment here, sharper tool use there. A retrained base can shift where the model's ceiling sits on fundamentals like long-context reliability, multi-step reasoning, and token efficiency. OpenAI calls the result "natively omnimodal" — text, images, audio, and video integrated inside a single system rather than stitched together after the fact [2].
OpenAI's chief research officer Mark Chen framed the upgrade around agent workflows: GPT-5.5 is better at navigating computer work than its predecessors, with meaningful gains on scientific and technical research tasks [8]. That phrasing tracks the benchmark focus. The headline wins are all agentic — terminal use, economic knowledge work, desktop tasks, tool chains — rather than raw reasoning.
The Headline Benchmark Numbers
OpenAI reports state-of-the-art scores on 14 benchmarks. Here are the ones that matter for deciding whether to switch models:
| Benchmark | GPT-5.5 | What it tests |
|---|---|---|
| Terminal-Bench 2.0 | 82.7% | Complex command-line workflows with planning, iteration, tool coordination [3] |
| GDPval | 84.9% | Well-specified knowledge work across 44 occupations [3] |
| OSWorld-Verified | 78.7% | Operating real computer environments autonomously [4] |
| Tau2-bench Telecom | 98.0% | Multi-turn customer-service tool use (no prompt tuning) [4] |
| FrontierMath Tier 4 | 35.4% | Hardest research-level math problems [5] |
| MRCR v2 (512K–1M) | 74% | Long-context retrieval across the upper end of the window [4] |
| SWE-Bench Pro | 58.6% | Real-world GitHub issue resolution [9] |
| Harvey BigLaw Bench | 91.7% overall | Substantive legal accuracy across practice areas (up from 91.0% for GPT-5.4) [10] |
GPT-5.5 Pro adds another lift on the hardest evaluations: 39.6% on FrontierMath Tier 4 and 52.4% on FrontierMath Tiers 1–3 [5]. OpenAI also reports a 60% drop in hallucinations compared to GPT-5.4 [11].
One caveat worth naming: Claude Opus 4.7 still leads on SWE-Bench Pro at 64.3% [12]. On the pure-reasoning side, GPT-5.5 takes the ARC-AGI-2 crown at 85%, ahead of Gemini 3.1 Pro's 77.1% and Claude Opus 4.7's 75.8% [13]. GPQA Diamond is essentially a cluster at the top — Gemini 3.1 Pro at 94.3%, Opus 4.7 at 94.2%, GPT-5.5 at 93.6% [13]. GPT-5.5's clearest wins are in agentic and computer-use evaluations plus the hardest math; SWE-Bench Pro remains the one row where Opus 4.7 holds on.
GPT-5.5 vs Claude Opus 4.7: Where Each One Wins
Claude Opus 4.7 shipped exactly one week before GPT-5.5, on April 16, 2026. That makes the head-to-head unusually clean — two frontier models launched within the same release window.
| Category | GPT-5.5 | Claude Opus 4.7 | Edge |
|---|---|---|---|
| Agentic coding (Terminal-Bench 2.0) | 82.7% | — | GPT-5.5 |
| Real-world coding (SWE-Bench Pro) | 58.6% | 64.3% | Opus 4.7 |
| Desktop agent work (OSWorld-Verified) | 78.7% | 78.0% | GPT-5.5 (narrow) |
| Hardest math (FrontierMath Tier 4) | 35.4% / 39.6% Pro | 22.9% | GPT-5.5 |
| Input price per 1M tokens | $5.00 | $5.00 | Tie |
| Output price per 1M tokens | $30.00 | $25.00 | Opus 4.7 |
| Context window | 1M tokens | 1M tokens | Tie |
The pricing line is the one that changes the deployment calculus. Opus 4.7 kept the same $5/$25 rate as Opus 4.6 [14], while GPT-5.5 doubled the GPT-5.4 price to $5/$30 [6]. For output-heavy workloads — agent loops, long code generations, chain-of-thought reasoning — Opus 4.7 is now meaningfully cheaper per token. OpenAI's counterargument is that GPT-5.5 needs fewer tokens to finish comparable tasks, so the sticker price overstates the real cost [6]. Whether that holds up in your workload is an empirical question you have to run yourself.
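The token-efficiency counterargument is checkable with a few lines of arithmetic. The sketch below uses purely hypothetical token counts (200K input, 50K output per task) to find how much shorter GPT-5.5's output would have to be to match Opus 4.7's per-task cost at the listed prices:

```python
# Break-even check for the $30-vs-$25 output-price gap. The per-task
# token counts are illustrative assumptions, not measured values.

def task_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Dollar cost of one task at per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical agent task on Opus 4.7: 200K input tokens, 50K output tokens.
opus = task_cost(200_000, 50_000, in_price=5.00, out_price=25.00)  # $2.25

# Output tokens GPT-5.5 may spend at $5/$30 before it costs more than Opus:
breakeven_output = (opus - 200_000 / 1e6 * 5.00) / 30.00 * 1e6

print(f"Opus 4.7 task cost: ${opus:.2f}")
print(f"GPT-5.5 breaks even at {breakeven_output:,.0f} output tokens "
      f"({breakeven_output / 50_000:.0%} of the Opus output)")
```

At equal input, the break-even point is just the 25/30 price ratio: GPT-5.5 matches Opus on cost only if it finishes the same task in roughly 17% fewer output tokens.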
Claude Mythos Preview — which leads several of these same benchmarks — is gated to a small audience of trusted partners and government agencies and is not a commercially competing product [15].
Pricing and Availability
Unlike most recent OpenAI launches, GPT-5.5 did not arrive on the API on day one. The product pages and pricing have been published, but API access is still described as "very soon" [1][4]. That gap is a deliberate safety choice. OpenAI says serving the model at scale through the API requires different safeguards than ChatGPT's integrated environment, and the team is working with partners on the security requirements [1].
ChatGPT (live April 23, 2026):
- GPT-5.5 (exposed as "GPT-5.5 Thinking") → Plus, Pro, Business, Enterprise [4]
- GPT-5.5 Pro → Pro, Business, Enterprise, Edu [4]
- Free tier → No GPT-5.5 access; free users remain on GPT-5.3 [7]
Codex (live April 23, 2026):
- GPT-5.5 available with a 400K-token context window on Plus, Pro, Business, Enterprise, Edu, and Go plans [4]
- Fast mode generates tokens 1.5× faster for 2.5× the cost [4]
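The fast-mode tradeoff reduces to simple arithmetic: you pay 2.5× per token to get tokens 1.5× sooner. A quick sketch, where the baseline throughput and generation size are illustrative assumptions rather than published figures:

```python
# Fast-mode tradeoff: 1.5x token throughput for 2.5x the price.
# Baseline throughput and generation size are illustrative assumptions.
base_tps = 100.0        # assumed baseline throughput, tokens/second
base_price = 30.00      # $ per 1M output tokens (GPT-5.5 list price)

fast_tps = base_tps * 1.5      # 150 tokens/second
fast_price = base_price * 2.5  # $75 per 1M output tokens

output_tokens = 50_000  # hypothetical long generation

base_time = output_tokens / base_tps  # 500 s
fast_time = output_tokens / fast_tps  # ~333 s
extra_cost = output_tokens / 1e6 * (fast_price - base_price)  # $2.25

print(f"Fast mode saves {base_time - fast_time:.0f}s on this job "
      f"for an extra ${extra_cost:.2f}")
```

Whether that trade is worth it depends on whether a human (or a blocked agent loop) is waiting on the output.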
API (planned, not yet live):
| Model | Input / 1M tokens | Output / 1M tokens | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 1M tokens [1] |
| GPT-5.5 Pro | $30.00 | $180.00 | 1M tokens [6] |
⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider before making cost decisions.
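At these list prices, GPT-5.5 Pro is a flat 6× multiplier on both rates, so the tier choice scales any workload's cost linearly. A minimal estimator over the table values (which, per the warning above, may be stale), with the job size a hypothetical example:

```python
# Cost estimator over the table's illustrative prices; verify live
# pricing before relying on these numbers.
PRICES = {  # $ per 1M tokens: (input, output)
    "gpt-5.5":     (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

def estimate(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one job at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical job: 500K input tokens, 100K output tokens.
base_cost = estimate("gpt-5.5", 500_000, 100_000)     # $5.50
pro_cost = estimate("gpt-5.5-pro", 500_000, 100_000)  # $33.00, a 6x premium
print(f"GPT-5.5: ${base_cost:.2f}  GPT-5.5 Pro: ${pro_cost:.2f}")
```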
No mini or nano variants shipped with GPT-5.5. GPT-5.4 mini and nano had launched on March 17, 2026, so that tier may arrive later — but it is not part of the 5.5 rollout [16].
The NVIDIA GB200 and GB300 Partnership
GPT-5.5 is co-designed with NVIDIA GB200 and GB300 NVL72 rack-scale systems — the same pattern NVIDIA described for GPT-5.3-Codex, now extended to Blackwell Ultra (GB300) [17]. The key engineering milestone is that the first 100,000-GPU GB200 NVL72 cluster completed large-scale training runs for GPT-5.5 and set a new benchmark for system-level reliability at frontier scale [17].
That matters for two reasons. First, it constrains what "state of the art" means in April 2026 — frontier training now requires an infrastructure tier that only a handful of operators can stand up, which maps neatly onto Google's own TPU 8 push and Meta's MTIA chip deployment. Second, NVIDIA has already rolled out GPT-5.5-powered Codex to more than 10,000 of its own staff across engineering, product, legal, marketing, and operations — a tight feedback loop between the model's behavior and the hardware it runs on [17].
Safety Classification: High on Biological and Cybersecurity
OpenAI classified GPT-5.5 as High risk on both biological/chemical and cybersecurity capabilities under its Preparedness Framework [18]. That is the same classification GPT-5.4 received — but OpenAI notes the underlying cybersecurity capability is itself a step up within the High bracket, without crossing the Critical threshold that would block broad deployment. High risk means the model could "amplify existing pathways to severe harm."
The release came with what OpenAI calls its "strongest set of safeguards to date," informed by internal and external red-teaming and feedback from nearly 200 trusted early-access partners [1]. The Preparedness High classification — alongside Anthropic's decision to gate Claude Mythos to a small trusted-partner audience — is a useful signal about where the frontier is heading: cybersecurity capability is now a first-order release concern for both major labs, not an afterthought.
How GPT-5.5 Fits the Bigger 2026 Model Race
A short map of the competitive field as of April 24, 2026:
- GPT-5.5 (OpenAI, April 23): Best-in-class on agentic/computer-use benchmarks, ARC-AGI-2 (85%), and hardest math; retrained base
- Claude Opus 4.7 (Anthropic, April 16): Best-in-class on real-world coding (SWE-Bench Pro at 64.3%); cheaper output tokens
- Gemini 3.1 Pro (Google, preview since February 19, 2026): Top-cluster GPQA Diamond at 94.3%; featured at Cloud Next 2026 inside Gemini Enterprise Agent Platform
- Claude Mythos Preview (Anthropic): Leads several benchmarks but not broadly available
- GLM-5.1 (Z.ai, April 7): Best open-weight SWE-Bench Pro at 58.4%
The pattern worth watching is that no single model leads across categories anymore. The competitive advantage is moving from "best model" to "best-routed stack" — applications that can pick the right model per task. Multi-model routing has quietly become a first-class infrastructure problem rather than a curiosity.
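A per-task router does not need heavy infrastructure to start; a lookup table with a fallback captures the idea. The category names and model IDs below are illustrative assumptions based on the benchmark leads discussed above, not anyone's published routing guidance:

```python
# Minimal per-task router: a lookup table plus a fallback. Category
# names and model IDs are illustrative assumptions for this sketch.
ROUTES = {
    "terminal_agent": "gpt-5.5",          # Terminal-Bench / OSWorld-style work
    "code_review":    "claude-opus-4.7",  # SWE-Bench-Pro-style coding
    "research_math":  "gpt-5.5-pro",      # FrontierMath-tier problems
}

def route(task_category: str, default: str = "gpt-5.5") -> str:
    """Pick a model per task category; unknown categories get the default."""
    return ROUTES.get(task_category, default)

print(route("code_review"))    # claude-opus-4.7
print(route("email_summary"))  # falls back to gpt-5.5
```

Real routers add cost ceilings, latency budgets, and evaluation feedback on top, but the core decision is exactly this mapping.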
The Bottom Line
GPT-5.5 is the first OpenAI release in over a year that is not a post-training iteration, and the benchmarks reflect that. The gains are concentrated where a retrained base can actually move numbers — agent workflows, long context, token efficiency, and the hardest math. The price doubled, the API is not yet live, and Claude Opus 4.7 still holds the SWE-Bench Pro crown. But for teams building agents that have to drive terminals, operate real computers, and finish multi-step knowledge work, GPT-5.5 is the first model where the ceiling feels like it moved rather than shifted sideways.
Whether it is worth switching depends on the classic tradeoff: if you need the agentic ceiling, pay the 2× and run it. If you need output-heavy coding, Opus 4.7 is cheaper and scores higher on the benchmark you care about. The new default is routing, not allegiance.
References
[1] TechCrunch — "OpenAI releases GPT-5.5, bringing company one step closer to an AI 'super app'", April 23, 2026.
[2] The Next Web — "OpenAI launches GPT-5.5, its first fully retrained base model since GPT-4.5", April 23, 2026.
[3] MarkTechPost — "OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval", April 23, 2026.
[4] OpenAI — "Introducing GPT-5.5", official announcement, April 23, 2026.
[5] DigitalApplied — "GPT-5.5 vs Claude Opus 4.7: Benchmarks & Pricing", April 23, 2026.
[6] The Decoder — "OpenAI unveils GPT-5.5, claims a 'new class of intelligence' at double the API price", April 23, 2026.
[7] OpenAI Help Center — "GPT-5.3 and GPT-5.5 in ChatGPT".
[8] CNBC — "OpenAI announces GPT-5.5, its latest artificial intelligence model", April 23, 2026.
[9] SiliconANGLE — "OpenAI releases GPT-5.5 with advanced math, coding capabilities", April 23, 2026.
[10] Harvey — "GPT-5.5: Research Preview Results", April 23, 2026.
[11] Startup Fortune — "OpenAI's GPT-5.5 benchmarks show a 60% hallucination drop", April 23, 2026.
[12] Scale AI — SWE-Bench Pro Leaderboard.
[13] Officechai — "GPT 5.5 Tops ARC-AGI 2 With 85% Score", April 23, 2026.
[14] Anthropic — "Introducing Claude Opus 4.7", April 16, 2026.
[15] R&D World — "How OpenAI's recently released GPT-5.5 stacks up with Anthropic's gated Claude Mythos", April 23, 2026.
[17] NVIDIA Blog — "OpenAI's New GPT-5.5 Powers Codex on NVIDIA Infrastructure", April 23, 2026.
[18] DataCamp — "Open AI's GPT-5.5: Benchmarks, Safety Classification, and Availability", April 23, 2026.