DeepSeek V4: Open-Source Frontier at 1/6 the Cost
April 25, 2026
TL;DR
On April 24, 2026, DeepSeek released DeepSeek V4 as a preview — a two-model open-weight family shipped under the MIT license: V4-Pro (1.6 trillion total parameters, 49 billion active per token, pre-trained on 33 trillion tokens) and V4-Flash (284 billion total, 13 billion active)[1][2][3]. Both models support a 1 million-token context window and 384,000-token maximum output[4][5]. V4-Pro (at maximum reasoning effort) scores 80.6% on SWE-bench Verified — within 0.2 points of Claude Opus 4.6 (80.8%) — and 93.5% on LiveCodeBench, leading Gemini 3.1 Pro (91.7%) and Claude Opus 4.6 (88.8%)[6][7]. API pricing lands at $1.74 per million input tokens (cache-miss) and $3.48 per million output tokens for V4-Pro, dropping to $0.145/$3.48 with cache hits — roughly one-sixth the cost of Claude Opus 4.7 ($5/$25) and one-seventh the cost of GPT-5.5 ($5/$30)[8][9]. The architecture's headline change is a Hybrid Attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention, cutting V4-Pro's per-token inference FLOPs to 27% of V3.2's and KV cache to 10% at the 1M-token setting[4][10]. Huawei announced "full support" via its Ascend SuperNode product line, with V4 reportedly hitting performance parity on Ascend NPUs and NVIDIA GPUs[11][12].
What You'll Learn
- Why the V4 architecture's Hybrid Attention is more than a marketing label
- The exact V4-Pro and V4-Flash benchmark scores against Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro
- API pricing tiers — including the cache-hit price that changes the cost calculus
- What the Huawei Ascend partnership means for the China–US chip race
- Where V4 wins, where it trails, and where the open-source frontier sits today
A Two-Model Open-Weight Release Under MIT License
DeepSeek shipped V4 as a preview release on April 24, 2026, roughly four months after V3.2's December 2025 launch[2][13]. The family contains two distinct Mixture-of-Experts (MoE) models:
| Model | Total params | Active params | Pre-training tokens | Context | Max output |
|---|---|---|---|---|---|
| DeepSeek V4-Pro | 1.6T | 49B | 33T | 1M | 384K |
| DeepSeek V4-Flash | 284B | 13B | 32T | 1M | 384K |
Both models are released under the MIT license and published as open weights on Hugging Face. The instruct variants use mixed FP4/FP8 precision, with MoE expert weights in FP4 and everything else in FP8, while the base models ship in mixed FP8[14]. Both expose three reasoning-effort modes: non-think, think (high), and think (max), with thinking enabled by default at the high setting[15].
That MIT license matters. Llama 4's release uses Meta's custom Community License, which restricts companies above 700 million monthly active users and is not OSI-approved as open source[16]. Some other open-weight releases (including Kimi K2.6) ship under "modified MIT" terms. DeepSeek choosing standard MIT — one of the most permissive widely-used open-source licenses — means V4 can be downloaded, fine-tuned, redistributed, and run commercially with no legal acrobatics.
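Before digging into the architecture, here is what the three reasoning-effort modes look like from the caller's side. This is a minimal sketch assuming an OpenAI-compatible chat endpoint; the endpoint URL, model identifier, and `reasoning_effort` field name are illustrative assumptions, not confirmed API surface, so check DeepSeek's API docs for the real names.

```python
# Hypothetical sketch: selecting a V4 reasoning-effort mode over an
# OpenAI-compatible chat API. The endpoint, model id, and the
# "reasoning_effort" field are assumptions for illustration only.
import os

import requests

API_URL = "https://api.deepseek.com/chat/completions"  # assumed endpoint

def ask(prompt: str, effort: str = "high") -> str:
    """Send one chat turn; effort is one of 'none', 'high', 'max'."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
        json={
            "model": "deepseek-v4-pro",          # assumed model id
            "messages": [{"role": "user", "content": prompt}],
            "reasoning_effort": effort,          # assumed field name
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize the tradeoffs of FP4 expert weights.", effort="max"))
```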
Hybrid Attention Architecture: Why the Million-Token Window Is Cheaper
The architectural headline of V4 is a Hybrid Attention mechanism that interleaves two new attention layers across the Transformer stack[4][10].
Compressed Sparse Attention (CSA) compresses the key-value cache of every m tokens into a single entry using a learned token-level compressor. Each query then attends only to the top-k selected compressed entries via DeepSeek Sparse Attention (DSA). A sliding window branch runs in parallel for local-dependency modeling.
Heavily Compressed Attention (HCA) is more aggressive: it consolidates m' tokens (where m' is much larger than m) into a single compressed entry, then applies dense attention across those compressed representations.
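A toy, single-head sketch helps show what CSA buys. In the snippet below, mean pooling stands in for the learned token-level compressor, the sliding-window branch and the HCA layers are omitted, and the block size m and top-k values are arbitrary; it illustrates the shape of the computation, not DeepSeek's implementation.

```python
# Toy single-head sketch of the Compressed Sparse Attention idea:
# compress every m cached tokens into one KV entry, then let each
# query attend to only the top-k compressed entries.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compressed_sparse_attention(q, k, v, m=64, top_k=8):
    """q: (d,) single query; k, v: (T, d) cached keys/values."""
    T, d = k.shape
    n_blocks = T // m
    # 1) Compress every m tokens into one KV entry (mean pooling here;
    #    the release describes a learned token-level compressor).
    k_c = k[: n_blocks * m].reshape(n_blocks, m, d).mean(axis=1)
    v_c = v[: n_blocks * m].reshape(n_blocks, m, d).mean(axis=1)
    # 2) Score the query against compressed entries; keep only top-k.
    scores = k_c @ q / np.sqrt(d)
    idx = np.argsort(scores)[-top_k:]
    # 3) Attention over the selected compressed entries only.
    w = softmax(scores[idx])
    return w @ v_c[idx]

rng = np.random.default_rng(0)
T, d = 4096, 64
out = compressed_sparse_attention(rng.normal(size=d),
                                  rng.normal(size=(T, d)),
                                  rng.normal(size=(T, d)))
print(out.shape)  # (64,)
```

With m = 64 the cache holds one entry per 64 tokens, and each query scores only those compressed entries rather than all T cached tokens, which is where order-of-magnitude KV and FLOP savings come from.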
The efficiency numbers are the point. At a 1M-token context:
| Model | Per-token inference FLOPs vs V3.2 | KV cache size vs V3.2 |
|---|---|---|
| V4-Pro | 27% | ~10% |
| V4-Flash | 10% | 7% |
That is the gap between "1M context exists on a spec sheet" and "1M context is something an agent can actually use without burning a server farm." The Hugging Face technical write-up frames it as "a million-token context that agents can actually use" — which, given how often 1M-context claims fall apart in practice, is the harder problem to solve[10].
Benchmark Scorecard: V4-Pro vs the Frontier
The DeepSeek team published benchmarks across coding, math, reasoning, and agentic categories. The pattern is consistent: V4-Pro leads or ties on coding and competitive programming, sits in the upper-middle pack on math, and trails meaningfully on the hardest expert-knowledge benchmarks.
Coding and software engineering
The headline scores below are reported at the think (max) setting (V4-Pro-Max), so they are compared against competitor models at their maximum-effort settings where applicable[17].
| Benchmark | DeepSeek V4-Pro (Max) | Claude Opus 4.6 | Gemini 3.1 Pro | GPT-5.4 |
|---|---|---|---|---|
| SWE-bench Verified | 80.6% | 80.8% | — | — |
| LiveCodeBench | 93.5% | 88.8% | 91.7% | — |
| Codeforces (rating) | 3206 | — | 3052 | 3168 |
| Terminal-Bench 2.0 | 67.9% | 65.4% | 68.5% | 75.1% (xHigh) |
V4-Pro is 0.2 points behind Claude Opus 4.6 on SWE-bench Verified — putting V4-Pro among the closest open-weight models to a Claude flagship on this benchmark, ahead of Kimi K2.6 (80.2%)[6]. On LiveCodeBench it leads outright. On Terminal-Bench 2.0, it edges Claude Opus 4.6 by 2.5 points but trails GPT-5.4 (xHigh setting) by 7.2 points and Gemini 3.1 Pro by 0.6 points[17].
Math and reasoning
| Benchmark | DeepSeek V4-Pro (Max) | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| IMO AnswerBench | 89.8 | 75.3 | 91.4 | 81.0 |
| HMMT 2026 | 95.2% | 96.2% | 97.7% | — |
| HLE (Humanity's Last Exam, no tools) | 37.7% | 40.0% | 39.8% | 44.4% |
V4-Pro takes IMO AnswerBench against Claude Opus 4.6 and Gemini 3.1 Pro and sits within striking distance of GPT-5.4. On the more contested HMMT 2026 benchmark and on HLE (without tools), the closed frontier models pull ahead[18][19]. HLE is particularly telling: it is the cross-domain expert benchmark where world-knowledge depth shows up, and V4-Pro lands more than six points behind Gemini 3.1 Pro. Note that against the newer Claude Opus 4.7 (released April 16, 2026) and GPT-5.5 (April 23, 2026), the gap on HLE widens further — Opus 4.7 reaches 46.9% and GPT-5.5 reaches 41.4% without tools[20].
Agentic and web browsing
On BrowseComp, the agentic web-browsing benchmark, the V4-Pro Max variant reportedly scores 83.4% — putting it in the upper-middle pack among frontier closed-source models, with Claude Opus 4.7 at 79.3% and Gemini 3.1 Pro reported at 85.9%[21]. The DeepSeek team also says V4 has been optimized for use with agent stacks like Anthropic's Claude Code[4].
The honest summary that emerges across these benchmarks: V4-Pro is decisively the best open-weight model on coding and competitive programming, competitive on agentic tasks, and roughly 3–6 months behind the closed frontier on world-knowledge reasoning[2].
API Pricing: The 1/6th and 1/7th Numbers
This is where V4 reshapes the conversation.
V4-Pro
| Tier | Per million tokens |
|---|---|
| Input (cache hit) | $0.145 |
| Input (cache miss) | $1.74 |
| Output | $3.48 |
V4-Flash
| Tier | Per million tokens |
|---|---|
| Input | $0.14 |
| Output | $0.28 |
Frontier comparisons
| Model | Input | Output | V4-Pro savings (output) |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | ~7.2× cheaper |
| GPT-5.5 | $5.00 | $30.00 | ~8.6× cheaper |
| GPT-5.4 | $2.50 | $15.00 | ~4.3× cheaper |
A workload that produces 100M output tokens per month on Claude Opus 4.7 costs $2,500. The same workload on V4-Pro is $348. The same workload on GPT-5.5 is $3,000[9]. VentureBeat described V4-Pro as arriving "at 1/6th the cost" of Opus 4.7 and GPT-5.5; on output alone the ratio is closer to one-seventh against Opus 4.7 and one-ninth against GPT-5.5, but the framing holds[9].
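A quick way to see how the cache-hit tier changes the calculus is to compute a blended input price. The sketch below uses the prices quoted in this section; the 60% cache-hit rate and the 300M/100M monthly token volumes are illustrative assumptions, and competitor cache discounts are not modeled.

```python
# Monthly cost comparison for a fixed workload, using the
# per-million-token prices quoted above.
PRICES = {  # (input cache-hit, input cache-miss, output), $ per 1M tokens
    "deepseek-v4-pro": (0.145, 1.74, 3.48),
    "claude-opus-4.7": (5.00, 5.00, 25.00),  # cache discount not modeled
    "gpt-5.5":         (5.00, 5.00, 30.00),  # cache discount not modeled
}

def monthly_cost(model, in_tok, out_tok, cache_hit_rate=0.0):
    hit, miss, out = PRICES[model]
    blended_in = cache_hit_rate * hit + (1 - cache_hit_rate) * miss
    return (in_tok * blended_in + out_tok * out) / 1e6

# Illustrative workload: 300M input / 100M output tokens per month,
# with 60% of input tokens hitting the cache.
for model in PRICES:
    cost = monthly_cost(model, 300e6, 100e6, cache_hit_rate=0.6)
    print(f"{model:>16}: ${cost:,.0f}/month")
```

Under those assumptions the run prints roughly $583 for V4-Pro against $4,000 for Opus 4.7 and $4,500 for GPT-5.5, which is why the cache-hit tier matters for prompt-heavy agent loops.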
The V4-Flash numbers are even more striking. At $0.14/$0.28, V4-Flash undercuts every Western frontier-tier "small" model — GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5 — while still scoring 91.6% on LiveCodeBench and shipping output at roughly 83.7 tokens per second on DeepSeek's API[22].
The Huawei Ascend Story
Alongside the V4 release, Huawei announced "full support" for V4 inference on its Ascend AI processors. Huawei said its entire Ascend SuperNode product line was "fully adapted" to V4 ahead of the launch, with DeepSeek and Huawei collaborating closely in the run-up to the release[11][12].
DeepSeek separately reported that V4 demonstrates performance parity on Huawei Ascend NPUs and NVIDIA GPUs — a claim that, if independently confirmed, undercuts the assumption that Chinese AI labs need NVIDIA's latest silicon to ship a frontier-class model[23].
The forward-looking piece: Huawei's Ascend 950PR chip launched in Q1 2026 and the complementary 950DT is slated to ship by end of 2026, with DeepSeek expecting Ascend 950 SuperNodes to reach scale availability in the second half of the year[11][24].
This collaboration is being read in Beijing as a step toward AI self-reliance under continuing US export controls. It is also, more practically, a hedge: if NVIDIA's H100/H200/B200 supply remains constrained for Chinese labs, V4's training and deployment pipeline now has a domestic fallback.
How V4 Compares to the Open-Weight Field
V4 enters a competitive open-weight landscape that has tightened considerably over the past six months:
| Model | Lab | Released | Open weights | Top benchmark strength |
|---|---|---|---|---|
| DeepSeek V4-Pro | DeepSeek | April 24, 2026 | Yes (MIT) | SWE-bench, LiveCodeBench, Codeforces |
| GLM-5.1 | Z.ai | April 7, 2026 | Yes | SWE-Bench Pro (open-weight leader) |
| Kimi K2.6 | Moonshot AI | April 20, 2026 | Yes (Modified MIT) | SWE-Bench, agentic swarms |
| Llama 4 (Scout/Maverick) | Meta | 2025 | Yes (custom license) | General-purpose |
V4-Pro takes the open-weight crown on coding and competitive programming, while GLM-5.1 still leads on SWE-Bench Pro at 58.4%. The headline is that the open-weight ceiling has moved into a position where, on most non-expert-knowledge benchmarks, it is within a single-digit gap of the closed frontier[2].
What This Means for Builders
Three practical takeaways for teams deciding where to spend their token budget:
For coding-heavy workloads: V4-Pro and Claude Opus 4.6 are now within 0.2 points on SWE-bench Verified. The cost differential is roughly 7×. For high-volume code-generation and code-review workflows where the marginal task does not require frontier-tier reasoning, V4-Pro is the new default. Pair it with a frontier closed model for hard escalations.
For agentic workflows: V4-Pro is competitive on BrowseComp and Terminal-Bench 2.0, trailing the top closed frontier models by a few points on each. For agent stacks where the orchestration layer can route hard subtasks to a closed frontier model and dispatch the rest to V4-Pro, the stack-level economics shift dramatically.
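A minimal sketch of that routing pattern follows. The model identifiers are placeholders and the difficulty heuristic is deliberately crude; a production router would use a learned classifier, task metadata, or self-reported confidence.

```python
# Minimal sketch of the routing pattern described above: send routine
# subtasks to V4-Pro and escalate hard ones to a closed frontier model.
from dataclasses import dataclass

@dataclass
class Subtask:
    prompt: str
    needs_expert_knowledge: bool = False  # e.g., HLE-style questions
    prior_failures: int = 0               # retries on the cheap model

def pick_model(task: Subtask) -> str:
    # Escalate on expert-knowledge needs or repeated failures;
    # otherwise default to the cheap open-weight model.
    if task.needs_expert_knowledge or task.prior_failures >= 2:
        return "claude-opus-4.7"   # placeholder frontier escalation path
    return "deepseek-v4-pro"       # placeholder cheap default

tasks = [
    Subtask("write unit tests for parser.py"),
    Subtask("derive the closed form of this integral",
            needs_expert_knowledge=True),
]
for t in tasks:
    print(pick_model(t), "<-", t.prompt)
```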
For self-hosters: V4-Flash at 284B total / 13B active is the more interesting number. The full V4-Pro requires substantial infrastructure even quantized; V4-Flash is in the range that a well-funded team can run on commodity GPU clusters or Huawei Ascend supernodes. Combined with MIT licensing and 1M context, V4-Flash is the strongest open-weight self-hosting target for code-and-agent workloads released to date.
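A back-of-envelope weight-memory estimate shows why V4-Flash is the self-hosting target. Parameter counts come from the release table above; the 90/10 split between FP4 expert weights and FP8 non-expert weights is an illustrative assumption, since the exact breakdown is not published here.

```python
# Back-of-envelope weight memory for self-hosting. The 90% expert-weight
# fraction is an assumption (MoE models are dominated by expert weights,
# but DeepSeek has not published the exact split).
def weight_gb(total_params_b: float, expert_frac: float = 0.90) -> float:
    expert = total_params_b * expert_frac * 0.5        # FP4 = 0.5 bytes/param
    rest = total_params_b * (1 - expert_frac) * 1.0    # FP8 = 1 byte/param
    return expert + rest  # billions of params x bytes/param = GB

print(f"V4-Flash (284B):  ~{weight_gb(284):.0f} GB of weights")
print(f"V4-Pro  (1600B): ~{weight_gb(1600):.0f} GB of weights")
```

Under those assumptions, V4-Flash lands around 156 GB of weights before KV cache and activation overhead, within reach of a single well-equipped multi-GPU node, while V4-Pro's roughly 880 GB is not.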
The Bottom Line
DeepSeek V4 does not redraw the frontier — Claude Opus 4.7 and GPT-5.5 still lead on the hardest expert-knowledge and reasoning benchmarks. What V4 does is collapse the price of access to near-frontier capability. A 1.6T MoE that sits within 0.2 points of Claude Opus 4.6 on SWE-bench Verified, with 1M context and MIT licensing, at $1.74 input and $3.48 output per million tokens, is a different kind of release than what we have seen from any other lab in the past quarter.
The Huawei Ascend integration adds a second layer: this is the first frontier-class Chinese AI release with confirmed domestic-silicon production and inference pathways. For a builder choosing where to spend the next million tokens of budget, V4 is now the default open-weight pick on coding workloads and one of the strongest options on agentic tasks. Set against how OpenAI's GPT-5.5 reshaped pricing earlier this week, V4 is the response that closed-source labs were probably hoping would not arrive this quickly.
What remains genuinely open is whether DeepSeek can sustain this cadence. V3.2 shipped in December 2025; V4 in April 2026. If V4.x and V5 land on similar timelines, the gap between the open-weight ceiling and the closed frontier may keep closing. If they do not, V4 becomes the high-water mark for a lab that earned its reputation by punching above its compute budget.
References
1. TechCrunch — "DeepSeek previews new AI model that 'closes the gap' with frontier models", April 24, 2026.
2. Bloomberg — "DeepSeek Unveils Newest Flagship AI Model a Year after Upending Silicon Valley", April 24, 2026.
3. CNBC — "China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies", April 24, 2026.
4. MarkTechPost — "DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts", April 24, 2026.
5. NVIDIA Technical Blog — "Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints", April 24, 2026.
6. NxCode — "DeepSeek V4 (2026): 1T Parameters, 81% SWE-bench, $0.30/MTok — Full Specs", April 24, 2026.
7. BuildFastWithAI — "DeepSeek V4-Pro Review: Benchmarks, Pricing & Architecture", April 24, 2026.
8. DeepSeek API Docs — "Models & Pricing".
9. VentureBeat — "DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5", April 24, 2026.
10. Hugging Face Blog — "DeepSeek-V4: a million-token context that agents can actually use", April 24, 2026.
11. South China Morning Post — "DeepSeek unveils next-gen AI model as Huawei vows 'full support' with new chips", April 24, 2026.
12. SCMP — "Huawei, DeepSeek strengthen China's AI self-reliance with collaboration on V4 model", April 24, 2026.
13. DeepSeek (X/Twitter) — "Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale", December 2025.
14. Hugging Face — deepseek-ai/DeepSeek-V4-Pro, MIT License.
15. vLLM Recipes — DeepSeek-V4-Pro, April 2026.
16. Llama 4 Community License Agreement — llama.com/llama4/license/ — 700M MAU restriction; classified as "source available" rather than OSI open source.
17. BenchLM — "DeepSeek V4 Pro Benchmarks 2026: Scores, Rankings & Performance", April 2026.
18. DEV Community — "DeepSeek Just Dropped V4. Here's What the Benchmarks Actually Tell You.", April 24, 2026.
19. Officechai — "DeepSeek Releases DeepSeek V4-Pro & V4-Flash, Delivers GPT 5.4 & Opus 4.6-Level Performance At Fraction Of The Price", April 24, 2026.
20. SCMP — "Underwhelming or underrated? DeepSeek V4 shows 'impressive' gains", April 24, 2026 — HLE without-tools comparison vs Opus 4.7 and GPT-5.5.
21. AnalyticsIndiaMag — "DeepSeek Releases V4 Pro, Challenging OpenAI, Anthropic on Key Benchmarks", April 24, 2026.
22. ArtificialAnalysis — "DeepSeek V4 Flash (Max) - Intelligence, Performance & Price Analysis", April 2026.
23. Phemex News — "DeepSeek V4 Matches NVIDIA on Huawei Ascend, Dispels Rumors", April 24, 2026.
24. TrendForce — "Decoding DeepSeek V4: How Huawei's Ascend 950 PR Is Powering China's Push to Break CUDA Dependence", April 7, 2026 — Ascend 950PR launched Q1 2026, 950DT expected by end of 2026.