DeepSeek V4: Frontier-Class Open Weights at 1/7 the Cost
April 30, 2026
TL;DR
DeepSeek released V4-Pro and V4-Flash as a public preview on April 24, 2026.[^1] V4-Pro is a 1.6 trillion-parameter Mixture-of-Experts model with 49 billion parameters activated per token, a 1,000,000-token context window, and weights published on Hugging Face under the MIT license.[^2] On DeepSeek's own evaluations it scores 80.6% on SWE-bench Verified, 87.5% on MMLU-Pro, and a 3,206 Codeforces rating — matching the previous-generation Claude Opus 4.6 (80.8%) but trailing the current frontier (Claude Opus 4.7 at 87.6%, GPT-5.5 at ~82.6%).[^3][^4][^5] The headline number is the price: $1.74 per million input tokens and $3.48 per million output tokens at list, roughly 7× cheaper on output than Claude Opus 4.7 and 8× cheaper than GPT-5.5.[^6][^7]
The other story is hardware. V4 is the first frontier-scale Chinese model engineered to run inference natively on Huawei's Ascend chips, even though V4-Pro itself appears to have been trained primarily on NVIDIA GPUs.[^8][^9]
What You'll Learn
- What V4-Pro and V4-Flash actually are, and how they fit in the open-weight landscape
- The hybrid attention architecture (CSA + HCA) that makes 1M-token context cheap to serve
- How V4 benchmarks compare to Claude Opus 4.7 and GPT-5.5 — and where it falls short
- DeepSeek's preview pricing and how to think about cache hits and the launch promotion
- What the Huawei Ascend angle does and does not mean
- When to pick V4-Pro vs V4-Flash vs a closed frontier model
What DeepSeek Shipped
DeepSeek published two models simultaneously on April 24, 2026, alongside open weights, an API, and a technical report titled "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence."[^1][^2]
| Model | Total params | Activated params | Context | Max output |
|---|---|---|---|---|
| DeepSeek-V4-Pro | 1.6 trillion | 49 billion | 1,000,000 tokens | 384,000 tokens |
| DeepSeek-V4-Flash | 284 billion | 13 billion | 1,000,000 tokens | 384,000 tokens |
Both models are sparse Mixture-of-Experts (MoE) architectures: only a fraction of the network's parameters are activated for any given token, which keeps inference cost roughly proportional to the activated count rather than the total. V4-Flash, in particular, behaves like a 13B-parameter model at runtime while drawing on a 284B-parameter expert pool.[^2]
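To make the activated-parameter math concrete, here is a minimal sketch of top-k expert routing, the mechanism behind sparse MoE layers. Everything in it (the expert count, the value of k, the gating scheme) is illustrative, not DeepSeek's actual configuration, which has not been published at this level of detail.

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=8):
    """Toy sparse-MoE forward pass: route one token to its top-k experts.

    Only the k selected experts run, so per-token compute scales with k,
    not with the total expert count. Shapes and counts are illustrative.
    """
    logits = x @ gate_w                      # router score for each expert
    top_k = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(logits[top_k] - logits[top_k].max())
    weights /= weights.sum()                 # softmax over the selected experts only
    # Weighted combination of the chosen experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Demo: 64 tiny experts, 8 active per token (made-up numbers, not V4's).
rng = np.random.default_rng(0)
d = 16
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(64)]
gate_w = rng.normal(size=(d, 64))
print(moe_layer(rng.normal(size=d), experts, gate_w).shape)  # (16,)
```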
Both come with open weights under the MIT license on Hugging Face — one of the most permissive licenses in modern AI. There are no commercial-use restrictions, no acceptable-use clauses tied to user count, and no required attribution beyond the license text.[^2]
DeepSeek frames the release as a preview. The previous-generation deepseek-chat and deepseek-reasoner API endpoints are scheduled for full retirement on July 24, 2026, after which V4 becomes the only path on the official API.[^1]
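For teams already on the API, migration should mostly be a model-name change, since DeepSeek's endpoint has historically been OpenAI-compatible. Here is a minimal sketch using the OpenAI Python SDK; the base URL matches DeepSeek's current docs, but the model identifier is an assumption (DeepSeek has historically re-pointed the deepseek-chat alias at each new generation), so check the V4 release notes for the real name.

```python
# Minimal call against DeepSeek's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder
    base_url="https://api.deepseek.com",   # DeepSeek's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    # Assumed alias: "deepseek-chat" has pointed at the newest generation
    # so far; verify the V4 model id in the official docs.
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize the V4 release notes."}],
)
print(resp.choices[0].message.content)
```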
The Architecture Story: Compressing Attention, Not the Model
The headline architectural innovation in V4 is not the parameter count. V4-Pro's 1.6T total is a meaningful jump from V3's 671B-parameter MoE, but the more interesting story is a new way to handle attention at million-token contexts.
V4 introduces a hybrid attention mechanism that interleaves two operators:[^2]
- Compressed Sparse Attention (CSA) compresses small chunks of tokens into summary representations. Each new token attends only to the most relevant summaries via top-k selection, rather than every previous token.
- Heavily Compressed Attention (HCA) collapses much larger chunks into a single representation, giving the model a global view of the full context.
Layers alternate between CSA and HCA, so the model gets both fine-grained local recall and coarse-grained global context without paying full quadratic-attention cost everywhere. The result, per DeepSeek's technical report, is that V4-Pro at a 1M-token context requires only 27% of the per-token inference FLOPs and 10% of the KV cache memory that V3.2 needs at the same context length.[^1]
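DeepSeek has not released reference code for CSA or HCA, but the selection mechanism is simple enough to sketch. The following is a hypothetical single-query illustration of the CSA idea (pool each chunk of keys into a summary, pick the top-k chunks by summary score, attend only within those chunks); HCA would be the same shape with much larger chunks pooled down to a single vector. Chunk size, k, and mean-pooling as the compressor are all assumptions, not DeepSeek's actual design.

```python
import numpy as np

def csa_step(q, keys, values, chunk=16, k=4):
    """Hypothetical single-query sketch of Compressed Sparse Attention.

    1. Mean-pool each chunk of keys into one summary vector.
    2. Score the query against the summaries; keep the top-k chunks.
    3. Run ordinary softmax attention over only those chunks' tokens.
    Chunk size, k, and mean-pooling are assumptions, not DeepSeek's design.
    """
    n, d = keys.shape
    n_chunks = n // chunk
    summaries = keys[: n_chunks * chunk].reshape(n_chunks, chunk, d).mean(axis=1)
    top = np.argsort(summaries @ q)[-k:]     # the k most relevant chunks
    idx = np.concatenate([np.arange(c * chunk, (c + 1) * chunk) for c in top])
    scores = keys[idx] @ q / np.sqrt(d)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ values[idx]               # attends to k*chunk tokens, not all n

# The query touches 4 * 16 = 64 of the 256 context tokens.
rng = np.random.default_rng(1)
q = rng.normal(size=16)
K, V = rng.normal(size=(256, 16)), rng.normal(size=(256, 16))
print(csa_step(q, K, V).shape)  # (16,)
```

The cost lever is visible in the function's last line: attention touches k × chunk tokens no matter how long the context grows, which is where sub-quadratic FLOPs and a smaller KV-cache footprint would come from.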
Two other components show up in the technical report:[^2]
- Manifold-Constrained Hyper-Connections (mHC), an extension of residual connections meant to stabilize signal flow across the model's many layers without sacrificing expressivity.
- The Muon optimizer, a relatively new alternative to AdamW that has been showing up in several large training runs in late 2025 and 2026.
Pre-training scale is reported at 32 trillion+ tokens, with a two-stage post-training pipeline: independent expert specialization via supervised fine-tuning and reinforcement learning, then unified consolidation through on-policy distillation.[^2]
Benchmarks: Near-Frontier, Not Frontier
V4's benchmark scores are strong, especially for an open-weight model, but they sit just below the closed frontier on most of the comparisons the industry actually argues about.
| Benchmark | V4-Pro (DeepSeek-published) | Reference point |
|---|---|---|
| SWE-bench Verified | 80.6% | Claude Opus 4.6: 80.8%; Claude Opus 4.7: 87.6%; GPT-5.5: ~82.6%[^3][^4][^5] |
| SWE-bench Multilingual | 76.2% | — |
| MMLU-Pro | 87.5% | Comparable to the GPT-5.4 generation; trails the closed frontier (Gemini 3.1 Pro reports ~91% on this benchmark)[^10] |
| HMMT 2026 (math) | 95.2% | Strong, but trails GPT-5.4 and Claude on the same benchmark per third-party comparisons[^10] |
| Codeforces (Elo rating) | 3,206 | GPT-5.4 reported at ~3,168 in third-party comparisons[^11] |
| LiveCodeBench | 93.5 | — |
The Codeforces number is the one DeepSeek leans on hardest. A rating of 3,206 puts V4-Pro inside the top-tier human grandmaster band on the platform — and the company says it edges out GPT-5.4's reported 3,168.[^11] OpenAI itself does not publish Codeforces ratings as official benchmarks, so this comparison is built on third-party evaluations rather than head-to-head leaderboard runs.
On SWE-bench Verified — the closer-to-real-world software engineering benchmark — V4-Pro is statistically tied with Claude Opus 4.6 (80.6% vs 80.8%) but trails the current frontier by a real margin: Claude Opus 4.7 reports 87.6% (a 7-point gap) and GPT-5.5 reports roughly 82.6% (a 2-point gap).[^4][^5] The pitch is not that V4-Pro wins; it's that the capability gap is small relative to the 7-9× gap in output cost.
Knowledge snippet: DeepSeek V4-Pro is a 1.6T-parameter Mixture-of-Experts model with 49B parameters activated per token, a 1,000,000-token context window, and MIT-licensed open weights. On DeepSeek's own evaluations it scores 80.6% on SWE-bench Verified — statistically tied with Claude Opus 4.6 (80.8%) but trailing Claude Opus 4.7 (87.6%) by 7 points — at roughly 7× lower output cost than Opus 4.7 list pricing.
The Pricing: Where V4 Actually Wins
This is where the story tightens. DeepSeek published two price tiers for V4-Pro: a list price that takes effect after the launch window, and a promotional preview price active until late May 2026.[^6]
| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
|---|---|---|---|
| DeepSeek V4-Pro (list / post-promo) | $1.74 | $3.48 | Standard pricing after promotional window |
| DeepSeek V4-Pro (launch promo) | $0.435 | $0.87 | Active until ~May 31, 2026 |
| DeepSeek V4-Flash | $0.14 | $0.28 | Standard pricing |
| Claude Opus 4.7 | $5.00 | $25.00 | List API price[^7] |
| GPT-5.5 | $5.00 | $30.00 | List API price[^12] |
| GPT-5.5 Pro | $30.00 | $180.00 | Pro tier list price[^12] |
Using the V4-Pro list price (the durable comparison, since the promo expires), the gap looks like this:
- Output tokens: V4-Pro is roughly 7.2× cheaper than Claude Opus 4.7 ($25 / $3.48) and 8.6× cheaper than GPT-5.5 ($30 / $3.48).
- Input tokens: V4-Pro is roughly 2.9× cheaper than either Opus 4.7 or GPT-5.5 on a cache-miss basis.
Cache hits are an additional lever. From April 26, 2026 onward, DeepSeek dropped the cache-hit input price across V4 models to one-tenth of the cache-miss price — so an application with high prefix reuse pays even less.[^6]
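To sanity-check those multipliers, and to see how the cache discount compounds with them, here is the arithmetic spelled out. The prices are the list figures from the table above; the 80% cache-hit rate is a made-up assumption for illustration.

```python
# List prices in $ per million tokens, from the table above.
V4_IN, V4_OUT = 1.74, 3.48
OPUS_IN, OPUS_OUT = 5.00, 25.00
GPT_IN, GPT_OUT = 5.00, 30.00

print(f"output vs Opus 4.7: {OPUS_OUT / V4_OUT:.1f}x")   # 7.2x
print(f"output vs GPT-5.5:  {GPT_OUT / V4_OUT:.1f}x")    # 8.6x
print(f"input (cache miss): {OPUS_IN / V4_IN:.1f}x")     # 2.9x

# Cache hits bill at one-tenth of the miss price. The 80% hit rate
# below is an illustrative assumption, not a DeepSeek figure.
hit_rate = 0.80
effective_in = V4_IN * (hit_rate * 0.1 + (1 - hit_rate))
print(f"effective input $/M at {hit_rate:.0%} hits: ${effective_in:.3f}")  # $0.487
```

At that illustrative hit rate, the effective input gap versus Opus 4.7's $5.00 cache-miss price widens from 2.9× to roughly 10×.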
The headline VentureBeat used at launch — "near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5" — sits in the same ballpark as the list comparison above.[^3]
The Huawei Ascend Angle (and What It Doesn't Mean)
Coverage of the V4 launch leaned heavily on the framing that V4 runs on Huawei's Ascend chips rather than NVIDIA. That framing is only partly true, and it's worth being precise about which parts.
What is verified:
- V4 is the first Chinese frontier-scale model designed to run native inference on Huawei Ascend, and DeepSeek's API is reportedly already serving V4 from Ascend clusters in China.[^8][^9]
- DeepSeek's technical report is silent on what chips V4 was trained on. Community speculation — picked up alongside the MIT Technology Review coverage — suggests V4-Flash (the smaller 284B-parameter model) may have been used to test Huawei training infrastructure, but this is not confirmed by DeepSeek.[^9]
What is not verified, despite some headlines:
- V4-Pro itself was probably still trained primarily on NVIDIA hardware. A computer-science professor at Tsinghua University told MIT Technology Review that DeepSeek's report does not describe a full Ascend-only training pipeline for V4-Pro, and that the long-context features in particular look like they came from NVIDIA GPUs.[^9]
So "China builds frontier AI without NVIDIA" is the wrong takeaway for now. The accurate version is more interesting: DeepSeek built an inference path that works on domestic chips, which is the part of the AI export-control story that actually changes day-to-day economics inside China.
V4 vs the Open-Weight Landscape
V4 is not the first open-weight model to ship a 1M-token context — Llama 4 Maverick offered 1M context as a fine-tuned variant in 2025, and Qwen 3.6 Plus opened a 1M-token preview on March 31, 2026.[^13] Calling V4 "the first 1M-token open-weight model" is wrong. (For a longer treatment of how 1M-token windows actually behave in practice, see our guide on context window optimization for LLMs.)
What V4 is among open-weight models, as of late April 2026:
- The largest open-weight model by total parameters at 1.6T (DeepSeek V3 was 671B / 37B activated; Llama 4 Maverick is 400B / 17B activated; Qwen 3.6 Plus parameter count is not publicly disclosed by Alibaba).
- The strongest reported open-weight Codeforces score at 3,206.
- Released under MIT, alongside Qwen's Apache 2.0 as the most permissive licenses in the open-weight tier — a meaningful difference from the Llama 4 Community License, which has acceptable-use clauses tied to user counts.[^13]
It is not the SWE-bench leader: Claude Opus 4.7 sits 7 points ahead of V4-Pro on SWE-bench Verified (87.6% vs 80.6%), and GPT-5.5 is also ahead at ~82.6%.[^4][^5] (See our Claude Opus 4.5 deep dive for the family's lineage and benchmark history.)
When to Pick V4 — And When Not To
A practical decision matrix, given the April 2026 landscape:
| Use case | Better fit |
|---|---|
| High-volume coding agents on a price-per-task budget | V4-Pro — output cost matters more than the last 5% of SWE-bench accuracy |
| Document analysis at 500K+ token contexts | V4-Pro or V4-Flash — hybrid attention makes this cheap |
| Top-of-line agentic coding where accuracy gaps matter | Claude Opus 4.7 — leads SWE-bench Verified at 87.6%, a 7-point gap over V4-Pro[^4] |
| Production deployments needing strict data residency in China | V4 on Huawei Ascend — uniquely compatible[^8] |
| Self-hosted on commodity hardware | V4-Flash — 13B activated params makes it tractable to host |
| Workloads where output tokens dominate cost | V4-Pro — the 7-8× output discount stacks fast |
| Small-context cheap calls (chat replies, classifications) | V4-Flash or GPT-5.4 Mini — neither needs a 1M model (GPT-5.5 Mini was not yet released as of late April 2026) |
Caveats Worth Stating Clearly
Three stand out.
First, the benchmarks above are DeepSeek's own published numbers. Independent reproductions for V4-Pro are still arriving, and the model has been in public preview for only six days as of this writing. Some early third-party evaluations have come in slightly below DeepSeek's published figures on tasks like SWE-bench Pro.[^3] Treat the official numbers as upper-bound claims until external benchmarks settle.
Second, the launch promotional pricing — $0.435/M input and $0.87/M output for V4-Pro — is a temporary discount, not the durable cost. The list price of $1.74/$3.48 is what budgets should be built around.[^6]
Third, "open weights" does not mean "easy to self-host." 1.6T parameters at FP4/FP8 mixed precision is still hundreds of gigabytes of weights, and serving 1M-token contexts at low latency requires significant accelerator capacity. For most teams, V4 is open-weight in spirit and API-only in practice. If you are trying to keep token bills down on either tier, our notes on saving tokens and optimizing prompts cover the levers that matter most.
The Bottom Line
DeepSeek V4 doesn't take the frontier. It moves the price-performance curve sharply downward, in the open. For high-volume agentic coding, document analysis, and any workload where output tokens dominate the bill, V4-Pro at $3.48 per million output tokens is hard to ignore, even though Claude Opus 4.7 leads SWE-bench Verified by 7 points (87.6% vs 80.6%). For teams that want to self-host or to run on Huawei Ascend hardware, V4 is the first frontier-class model that lets them do either.
The next interesting question is whether OpenAI and Anthropic respond on price, on capability, or both. The open-weight tier's ceiling just moved sharply upward, and there is a lot less room above it for closed models to hide.
Footnotes
[^1]: DeepSeek API Documentation, "DeepSeek V4 Preview Release," April 24, 2026. https://api-docs.deepseek.com/news/news260424
[^2]: deepseek-ai/DeepSeek-V4-Pro on Hugging Face, model card and technical report. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
[^3]: VentureBeat, "DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5," April 24, 2026. https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5
[^4]: Anthropic, "Introducing Claude Opus 4.7," April 16, 2026, headline benchmark of 87.6% on SWE-bench Verified (up from 80.84% on Opus 4.6). https://www.anthropic.com/news/claude-opus-4-7
[^5]: vals.ai SWE-bench leaderboard tracking GPT-5.5 at 82.60% on SWE-bench Verified. https://www.vals.ai/benchmarks/swebench
[^6]: DeepSeek API Documentation, Models & Pricing. https://api-docs.deepseek.com/quick_start/pricing
[^7]: Anthropic, Claude Opus 4.7 product page and pricing. https://www.anthropic.com/claude/opus
[^8]: Fortune, "DeepSeek unveils V4 model, with rock-bottom prices and close integration with Huawei's chips," April 24, 2026. https://fortune.com/2026/04/24/deepseek-v4-ai-model-price-performance-china-open-source/
[^9]: MIT Technology Review, "Three reasons why DeepSeek's new model matters," April 24, 2026. https://www.technologyreview.com/2026/04/24/1136422/why-deepseeks-v4-matters/
[^10]: Third-party benchmark comparisons of DeepSeek V4 vs GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on MMLU-Pro and HMMT 2026 (independent comparisons; specific competitor scores vary by source). https://framia.pro/page/en-US/news/deepseek-v4-benchmarks
[^11]: Codeforces blog and DeepSeek announcement (third-party reported comparison; OpenAI does not publish Codeforces ratings as official benchmarks). https://codeforces.com/blog/entry/153267
[^12]: OpenAI API Pricing. https://openai.com/api/pricing/
[^13]: Best Open-Source LLMs comparison, April 2026 (Llama 4 Maverick and Qwen 3.6 Plus 1M-context details). https://lushbinary.com/blog/best-open-source-llms-april-2026-comparison-guide/