DeepSeek V4: Open-Weight Frontier at 1/7 the Cost
May 2, 2026
TL;DR
- On April 24, 2026, DeepSeek released DeepSeek V4 as a preview — a two-model open-weight family shipped under the MIT license: V4-Pro (1.6 trillion total parameters, 49 billion active per token, pre-trained on 33 trillion tokens) and V4-Flash (284 billion total, 13 billion active) [1][2][3]. Both models support a 1 million-token context window and 384,000-token maximum output [4][5].
- V4-Pro (at maximum reasoning effort) scores 80.6% on SWE-bench Verified — statistically tied with the previous-generation Claude Opus 4.6 (80.8%) but trailing the current frontier Claude Opus 4.7 (87.6%) by 7 points and GPT-5.5 (~82.6% on vals.ai's leaderboard) by 2 points [6][7][8][9]. On LiveCodeBench, V4-Pro scores 93.5%, leading Gemini 3.1 Pro (91.7%) and Claude Opus 4.6 (88.8%) [7].
- List API pricing lands at $1.74 per million input tokens (cache miss) and $3.48 per million output tokens for V4-Pro, with cache hits at $0.174 per million — roughly one-seventh the output cost of Claude Opus 4.7 ($5/$25) and one-eighth that of GPT-5.5 ($5/$30) [10][11]. A 75% launch promo discounts V4-Pro to $0.435/$0.87 until May 31, 2026 [10].
- The architecture's headline change is a Hybrid Attention mechanism combining Compressed Sparse Attention and Heavily Compressed Attention, cutting V4-Pro's per-token inference FLOPs to 27% of V3.2's and its KV cache to 10% at the 1M-token setting [4][12].
- Huawei announced "full support" via its Ascend SuperNode product line for inference, while V4-Pro itself appears to have been trained primarily on NVIDIA GPUs [8][9][13].
What You'll Learn
- Why the V4 architecture's Hybrid Attention is more than a marketing label
- The exact V4-Pro and V4-Flash benchmark scores against Opus 4.7, GPT-5.5, and Gemini 3.1 Pro
- API pricing tiers — including the cache-hit price and the 75% launch promo that expires May 31
- What the Huawei Ascend partnership does and does not mean about training versus inference
- A practical decision matrix for when to pick V4-Pro versus the closed frontier
- Where V4 wins, where it trails, and where the open-weight frontier sits today
A Two-Model Open-Weight Release Under MIT License
DeepSeek shipped V4 as a preview release on April 24, 2026, roughly five months after V3.2's December 2025 launch [2][14]. The previous-generation deepseek-chat and deepseek-reasoner API endpoints are scheduled for full retirement on July 24, 2026, after which V4 becomes the only path on the official API [1].
The family contains two distinct Mixture-of-Experts (MoE) models:
| Model | Total params | Active params | Pre-training tokens | Context | Max output |
|---|---|---|---|---|---|
| DeepSeek V4-Pro | 1.6T | 49B | 33T | 1M | 384K |
| DeepSeek V4-Flash | 284B | 13B | 32T | 1M | 384K |
Both models are released under the MIT license and published as open weights on Hugging Face. The instruct variants use mixed FP4/FP8 precision — MoE expert weights in FP4 and the rest in FP8 — while the base models ship in mixed FP8 precision [15]. Both expose three reasoning-effort modes — non-think, think (high), and think (max) — with thinking enabled by default at the high setting [16].
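For teams wiring V4 into an existing stack, mode selection should be an API-level switch. A minimal sketch, assuming the V4 endpoints keep the OpenAI-compatible shape of DeepSeek's current API; the model id and the `reasoning_effort` field are assumptions, not documented parameter names:

```python
# Hedged sketch: assumes DeepSeek's V4 API stays OpenAI-compatible and exposes
# the three effort modes through a reasoning_effort-style field. The model id
# and the parameter name are assumptions, not documented values.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",   # assumed model id
    reasoning_effort="high",   # assumed values: "none" | "high" | "max"
    messages=[{"role": "user", "content": "Summarize the tradeoffs of MoE routing."}],
)
print(response.choices[0].message.content)
```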
A note on terminology: this is an open-weight release, not strictly open-source. DeepSeek published the weights under MIT, but did not release the full training data or the training pipeline source. That posture is the same as Llama 4, Qwen 3.6, and most other "open" frontier models [15].
That MIT license still matters. Llama 4's release uses Meta's custom Community License, which restricts companies above 700 million monthly active users and is not OSI-approved as open source [17]. Some other open-weight releases (including Kimi K2.6) ship under "modified MIT" terms. DeepSeek choosing standard MIT — one of the most permissive widely-used open-source licenses — means V4 weights can be downloaded, fine-tuned, redistributed, and run commercially with no legal acrobatics.
Hybrid Attention Architecture: Why the Million-Token Window Is Cheaper
V4 is not the first open-weight model to ship a 1M-token context — Llama 4 Maverick offered 1M context in 2025, and Qwen 3.6 Plus opened a 1M-token preview on March 31, 2026 [12]. Calling V4 the first 1M-token open-weight model is wrong. What V4 does claim is a fundamentally cheaper way to actually serve that context.
The architectural headline of V4 is a Hybrid Attention mechanism that interleaves two new attention layers across the Transformer stack [4][12].
Compressed Sparse Attention (CSA) compresses the key-value cache of every m tokens into a single entry using a learned token-level compressor. Each query then attends only to the top-k selected compressed entries via DeepSeek Sparse Attention (DSA). A sliding window branch runs in parallel for local-dependency modeling.
Heavily Compressed Attention (HCA) is more aggressive: it consolidates m' tokens (where m' is much larger than m) into a single compressed entry, then applies dense attention across those compressed representations. Layers alternate between CSA and HCA, so the model gets both fine-grained local recall and coarse-grained global context without paying full quadratic-attention cost everywhere.
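To make the mechanism concrete, here is a minimal single-head sketch of the CSA path (compress, select top-k, attend, plus a sliding-window branch). Mean-pooling stands in for the learned compressor, and all shapes and constants are illustrative rather than DeepSeek's actual configuration:

```python
# Minimal single-head sketch of the CSA idea described above, in NumPy.
# Assumptions: mean-pooling stands in for the learned token-level compressor,
# and m / top_k / window are illustrative, not DeepSeek's real settings.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def csa_step(q, K, V, m=16, top_k=4, window=32):
    """One query attending via (1) top-k compressed blocks + (2) a local window."""
    d = q.shape[-1]
    n_blocks = K.shape[0] // m
    # 1) Compress every m tokens' keys/values into one entry (learned in the real model).
    Kc = K[: n_blocks * m].reshape(n_blocks, m, d).mean(axis=1)
    Vc = V[: n_blocks * m].reshape(n_blocks, m, d).mean(axis=1)
    # 2) Score compressed entries and keep only the top-k for this query (the DSA-style step).
    scores = Kc @ q / np.sqrt(d)
    idx = np.argsort(scores)[-top_k:]
    global_out = softmax(scores[idx]) @ Vc[idx]
    # 3) Sliding-window branch over the most recent tokens for local recall.
    Kw, Vw = K[-window:], V[-window:]
    local_out = softmax(Kw @ q / np.sqrt(d)) @ Vw
    # The real model combines branches with learned mixing; a plain sum here.
    return global_out + local_out

rng = np.random.default_rng(0)
n, d = 1024, 64
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
q = rng.standard_normal(d)
print(csa_step(q, K, V).shape)  # (64,)
```

An HCA layer, by the description above, would run the same compression with a much larger block size m' and then attend densely over all compressed entries instead of selecting top-k.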
The technical report also introduces Manifold-Constrained Hyper-Connections (mHC), an extension of residual connections meant to stabilize signal flow across layers, and uses the Muon optimizer — a relatively new alternative to AdamW that has been showing up in several large training runs in late 2025 and early 2026 [15].
The efficiency numbers are the point. At a 1M-token context:
| Model | Per-token inference FLOPs vs V3.2 | KV cache size vs V3.2 |
|---|---|---|
| V4-Pro | 27% | ~10% |
| V4-Flash | 10% | 7% |
That is the gap between "1M context exists on a spec sheet" and "1M context is something an agent can actually use without burning a server farm." The Hugging Face technical write-up frames it as "a million-token context that agents can actually use" — which, given how often 1M-context claims fall apart in practice, is the harder problem to solve [12].
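A quick back-of-envelope makes the serving implication concrete. Both the V3.2 KV-cache baseline and the HBM budget below are hypothetical figures chosen for illustration; only the ratios come from DeepSeek's published numbers:

```python
# Back-of-envelope: what the published ratios imply for serving capacity.
# The 100 GB V3.2 baseline and 640 GB HBM budget are hypothetical figures;
# only the 10% (V4-Pro) and 7% (V4-Flash) KV-cache ratios come from the report.
V32_KV_GB_PER_1M_SEQ = 100.0   # assumed V3.2 KV cache for one 1M-token sequence
HBM_BUDGET_GB = 640.0          # assumed memory set aside for KV cache

kv_ratios = {"V3.2": 1.00, "V4-Pro": 0.10, "V4-Flash": 0.07}
for model, ratio in kv_ratios.items():
    per_seq = V32_KV_GB_PER_1M_SEQ * ratio
    sessions = int(HBM_BUDGET_GB // per_seq)
    print(f"{model}: {per_seq:.0f} GB per 1M-token session -> "
          f"{sessions} concurrent sessions in {HBM_BUDGET_GB:.0f} GB")
```

Same hardware, roughly an order of magnitude more concurrent long-context sessions: that is the claim the table is making.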
Benchmark Scorecard: V4-Pro vs the Frontier
The DeepSeek team published benchmarks across coding, math, reasoning, and agentic categories. The pattern is consistent: V4-Pro leads or ties on coding and competitive programming when measured against the previous generation, sits in the upper-middle pack on math, and trails meaningfully on the hardest expert-knowledge benchmarks against the current frontier (Opus 4.7 and GPT-5.5).
Coding and software engineering
V4 exposes three reasoning-effort modes — non-think, think (high), and think (max). The headline scores below are reported at the think (max) setting (V4-Pro-Max), so they are compared against competitor models at their maximum-effort settings where applicable [18].
| Benchmark | DeepSeek V4-Pro (Max) | Claude Opus 4.6 | Claude Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|
| SWE-bench Verified | 80.6% | 80.8% | 87.6% | ~82.6% | — |
| LiveCodeBench | 93.5% | 88.8% | — | — | 91.7% |
| Codeforces (rating) | 3206 | — | — | 3168 (GPT-5.4) | 3052 |
| Terminal-Bench 2.0 | 67.9% | 65.4% | — | 75.1% (GPT-5.4 xHigh) | 68.5% |
The honest framing on SWE-bench Verified: V4-Pro is statistically tied with Opus 4.6 (the previous Anthropic flagship released before V4) but trails Opus 4.7 by 7 points and GPT-5.5 by ~2 points (per vals.ai's leaderboard tracking) [8][9]. Note that benchmark-aggregation sources disagree on GPT-5.5 — BenchLM and TokenMix report 88.7%, while vals.ai reports 82.6%. The discrepancy likely reflects standard tier versus Pro/high-effort settings; treat 82.6% as a conservative floor.
On LiveCodeBench, V4-Pro leads the comparable Anthropic and Google flagships outright. On Terminal-Bench 2.0, it edges Claude Opus 4.6 by 2.5 points but trails GPT-5.4 (xHigh setting) by 7.2 points and Gemini 3.1 Pro by 0.6 points [18].
Math and reasoning
| Benchmark | DeepSeek V4-Pro (Max) | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| IMO AnswerBench | 89.8 | 75.3 | 91.4 | 81.0 |
| HMMT 2026 | 95.2% | 96.2% | 97.7% | — |
| MMLU-Pro | 87.5% | — | — | ~91% |
| HLE (Humanity's Last Exam, no tools) | 37.7% | 40.0% | 39.8% | 44.4% |
V4-Pro takes IMO AnswerBench against Claude Opus 4.6 and Gemini 3.1 Pro and sits within striking distance of GPT-5.4. On HMMT 2026 and HLE without tools, the closed frontier models pull ahead [19][20][14].
HLE is particularly telling: it is the cross-domain expert benchmark where world-knowledge depth shows up, and V4-Pro lands more than six points behind Gemini 3.1 Pro. Against the newer Claude Opus 4.7 (released April 16, 2026) and GPT-5.5 (April 23, 2026), the gap on HLE widens further — Opus 4.7 reaches 46.9% and GPT-5.5 reaches 41.4% without tools [21].
Agentic and web browsing
On BrowseComp, the agentic web-browsing benchmark, the V4-Pro Max variant reportedly scores 83.4% — putting it in the upper-middle pack among frontier closed-source models, with Claude Opus 4.7 at 79.3% and Gemini 3.1 Pro reported at 85.9% [22]. The DeepSeek team also says V4 has been optimized for use with agent stacks like Anthropic's Claude Code [4].
The honest summary that emerges across these benchmarks: V4-Pro is decisively the best open-weight model on coding and competitive programming, competitive on agentic tasks, and roughly 2–9 percentage points behind the closed frontier on world-knowledge reasoning and the hardest software-engineering benchmarks [2][8].
API Pricing: The Cache-Hit Math and the May 31 Promo Deadline
This is where V4 reshapes the conversation, though the number that matters is the list price that survives after the launch promo expires.
V4-Pro list pricing (post-promo, what budgets should plan for)
| Tier | Per million tokens |
|---|---|
| Input (cache hit) | $0.174 |
| Input (cache miss) | $1.74 |
| Output | $3.48 |
The cache-hit price is exactly 1/10 of the cache-miss price — DeepSeek dropped it from the original launch ratio on April 26, 2026 [10].
V4-Pro launch promo (active until May 31, 2026, 15:59 UTC)
| Tier | Per million tokens |
|---|---|
| Input (cache hit) | $0.0435 |
| Input (cache miss) | $0.435 |
| Output | $0.87 |
A 75% discount is in effect on V4-Pro through the end of May 2026 [10]. After that window, list pricing kicks in. Build budgets around the list price, not the promo — using $0.435/$0.87 as a planning number sets you up for a 4× sticker-shock when the promo ends.
V4-Flash (standard pricing)
| Tier | Per million tokens |
|---|---|
| Input (cache hit) | $0.0028 |
| Input (cache miss) | $0.14 |
| Output | $0.28 |
Frontier comparisons (V4-Pro list output cost)
| Model | Input | Output | V4-Pro savings (output) |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | ~7.2× cheaper |
| GPT-5.5 | $5.00 | $30.00 | ~8.6× cheaper |
| GPT-5.5 Pro | $30.00 | $180.00 | ~51.7× cheaper |
| GPT-5.4 | $2.50 | $15.00 | ~4.3× cheaper |
A workload that produces 100M output tokens per month on Claude Opus 4.7 costs $2,500. The same workload on V4-Pro at list is $348. The same workload on GPT-5.5 is $3,000 [11]. VentureBeat described V4-Pro as arriving "at 1/6th the cost" of Opus 4.7 and GPT-5.5; the math at list pricing is closer to 1/7 on output and roughly 1/3 on cache-miss input — but the order of magnitude holds [11].
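The cache-hit tier changes the blended input number substantially, so it is worth writing the arithmetic out. A small calculator using the list prices quoted above; the 300M-in/100M-out workload and the 60% cache-hit rate are illustrative assumptions, not measurements:

```python
# Cost sketch for the comparison above. Prices are the list figures quoted in
# this article, in dollars per million tokens; the workload mix is illustrative.
def monthly_cost(m_in, m_out, price_in, price_out, cache_hit=0.0, price_hit=None):
    """Monthly spend in dollars; m_in/m_out are millions of tokens."""
    hit_price = price_hit if price_hit is not None else price_in
    input_cost = m_in * ((1 - cache_hit) * price_in + cache_hit * hit_price)
    return input_cost + m_out * price_out

workload = dict(m_in=300, m_out=100)  # 300M input / 100M output tokens per month

v4 = monthly_cost(**workload, price_in=1.74, price_out=3.48,
                  cache_hit=0.6, price_hit=0.174)
opus = monthly_cost(**workload, price_in=5.00, price_out=25.00)
gpt = monthly_cost(**workload, price_in=5.00, price_out=30.00)
print(f"V4-Pro (list, 60% cache hits): ${v4:,.0f}")    # ~$588
print(f"Claude Opus 4.7:               ${opus:,.0f}")  # $4,000
print(f"GPT-5.5:                       ${gpt:,.0f}")   # $4,500
```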
The V4-Flash numbers are even more striking. At $0.14/$0.28, V4-Flash undercuts every Western frontier-tier "small" model — GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5 — while still scoring 91.6% on LiveCodeBench and shipping output at roughly 83.7 tokens per second on DeepSeek's API [23].
The Huawei Ascend Story: Inference Confirmed, Training Probably Still NVIDIA
Alongside the V4 release, Huawei announced "full support" for V4 inference on its Ascend AI processors. Huawei said its entire Ascend SuperNode product line was "fully adapted" to V4 ahead of the launch, with DeepSeek and Huawei collaborating closely in the run-up to the release [8][9].
DeepSeek separately reported that V4 demonstrates performance parity on Huawei Ascend NPUs and NVIDIA GPUs for inference workloads — a claim that, if independently confirmed, undercuts the assumption that Chinese AI labs need NVIDIA's latest silicon to serve a frontier-class model in production [24].
What the Ascend story does not establish is domestic training: V4-Pro itself was probably still trained primarily on NVIDIA hardware. DeepSeek's technical report does not describe a full Ascend-only training pipeline, and a computer-science professor at Tsinghua University told MIT Technology Review that long-context features in particular look like they came from NVIDIA GPUs [13]. Community speculation suggests V4-Flash (the smaller 284B model) may have been used to test Huawei training infrastructure, but DeepSeek has not confirmed this.
So "China builds frontier AI without NVIDIA" is the wrong takeaway for V4. The accurate version is more interesting: DeepSeek built an inference path that works on domestic chips, which is the part of the AI export-control story that actually changes day-to-day economics inside China — even if training still depends on smuggled-in or pre-restriction NVIDIA stockpiles.
The forward-looking piece: Huawei's Ascend 950PR chip launched in Q1 2026 and the complementary 950DT is slated to ship by end of 2026, with DeepSeek expecting Ascend 950 SuperNodes to reach scale availability in the second half of the year [8][25]. If 950DT supports a credible training pipeline, the next DeepSeek release could be the first one with a plausible end-to-end domestic-silicon story.
How V4 Compares to the Open-Weight Field
V4 enters a competitive open-weight landscape that has tightened considerably over the past six months:
| Model | Lab | Released | Open weights | License | Top benchmark strength |
|---|---|---|---|---|---|
| DeepSeek V4-Pro | DeepSeek | April 24, 2026 | Yes | MIT | SWE-bench, LiveCodeBench, Codeforces |
| GLM-5.1 | Z.ai | April 7, 2026 | Yes | Apache 2.0 | SWE-bench Pro (open-weight leader) |
| Kimi K2.6 | Moonshot AI | April 20, 2026 | Yes | Modified MIT | SWE-bench, agentic swarms |
| Qwen 3.6 Plus | Alibaba | March 31, 2026 (1M preview) | Yes | Apache 2.0 | Multilingual, long-context |
| Llama 4 (Scout/Maverick) | Meta | 2025 | Yes | Custom Community | General-purpose |
V4-Pro takes the open-weight crown on coding and competitive programming, while GLM-5.1 still leads on SWE-bench Pro at 58.4%. The headline is that the open-weight ceiling has moved into a position where, on most non-expert-knowledge benchmarks, it is within a single-digit gap of the closed frontier [2].
Decision Matrix: When to Pick V4 — And When Not To
Given the April–May 2026 landscape:
| Use case | Better fit | Why |
|---|---|---|
| High-volume coding agents on a price-per-task budget | V4-Pro | Output cost matters more than the last 5–7 SWE-bench points |
| Document analysis at 500K+ token contexts | V4-Pro or V4-Flash | Hybrid attention makes 1M-token serving cheap |
| Top-of-line agentic coding where accuracy gaps decide | Claude Opus 4.7 | Leads SWE-bench Verified at 87.6% — a 7-point gap over V4-Pro |
| Production deployments needing strict data residency in China | V4 on Huawei Ascend | Uniquely compatible inference pathway |
| Self-hosted on commodity hardware | V4-Flash | 13B active params make it tractable to host |
| Workloads where output tokens dominate cost | V4-Pro | The 7–8× output discount stacks fast at scale |
| Small-context cheap calls (chat, classifications) | V4-Flash or GPT-5.4 Mini | Neither needs a 1M-context model |
| Hardest world-knowledge reasoning | GPT-5.5 / Opus 4.7 / Gemini 3.1 Pro | V4-Pro is 6+ points behind on HLE without tools |
Caveats Worth Stating Clearly
Three stand out.
First, the benchmarks above are DeepSeek's own published numbers. Independent reproductions for V4-Pro are still arriving, and the model has been in public preview for just over a week as of writing. Some early third-party evaluations have come in slightly below DeepSeek's published figures on tasks like SWE-bench Pro [3]. Treat the official numbers as upper-bound claims until external benchmarks settle.
Second, the launch promotional pricing — $0.435/M input and $0.87/M output for V4-Pro — is a temporary discount, not the durable cost. The list price of $1.74/$3.48 is what budgets should be built around [10]. After May 31, 2026, anyone who priced their roadmap on the promo gets a 4× bill.
Third, "open weights" does not mean "easy to self-host." 1.6T parameters at FP4/FP8 mixed precision is still hundreds of gigabytes of weights, and serving 1M-token contexts at low latency requires significant accelerator capacity. For most teams, V4-Pro is open-weight in spirit and API-only in practice. V4-Flash is the more realistic self-hosting target.
What This Means for Builders
Three practical takeaways for teams deciding where to spend their token budget:
For coding-heavy workloads: V4-Pro at list pricing is roughly 7× cheaper on output than Claude Opus 4.7 while trailing it by 7 points on SWE-bench Verified. For high-volume code-generation and code-review workflows where the marginal task does not require frontier-tier reasoning, V4-Pro is the new default. Pair it with a frontier closed model for hard escalations, or use Opus 4.7 when accuracy gaps decide outcomes.
For agentic workflows: V4-Pro is competitive on BrowseComp and Terminal-Bench 2.0, trailing the top closed frontier models by a few points on each. For agent stacks where the orchestration layer can route hard subtasks to a closed frontier model and dispatch the rest to V4-Pro, the stack-level economics shift dramatically.
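A minimal version of that routing layer is just a threshold over a difficulty estimate. Everything in the sketch below (model ids, the difficulty score, the 0.8 cutoff) is illustrative rather than a reference implementation:

```python
# Sketch of the routing idea above: send most subtasks to V4-Pro, escalate the
# hard ones to a closed frontier model. Heuristics and names are illustrative.
from dataclasses import dataclass

@dataclass
class Subtask:
    prompt: str
    expected_difficulty: float  # 0..1, e.g. from a cheap classifier pass

CHEAP_MODEL = "deepseek-v4-pro"      # assumed model id
FRONTIER_MODEL = "claude-opus-4.7"   # assumed model id

def route(task: Subtask, escalation_threshold: float = 0.8) -> str:
    """Return the model id a dispatcher would call for this subtask."""
    if task.expected_difficulty >= escalation_threshold:
        return FRONTIER_MODEL  # pay frontier prices only where accuracy decides
    return CHEAP_MODEL         # ~7x cheaper output for the long tail of subtasks

tasks = [Subtask("rename variables across module", 0.2),
         Subtask("design migration for sharded DB", 0.9)]
for t in tasks:
    print(route(t), "<-", t.prompt)
```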
For self-hosters: V4-Flash at 284B total / 13B active is the more interesting number. The full V4-Pro requires substantial infrastructure even quantized; V4-Flash is in the range that a well-funded team can run on commodity GPU clusters or Huawei Ascend supernodes. Combined with MIT licensing and 1M context, V4-Flash is the strongest open-weight self-hosting target for code-and-agent workloads released to date.
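A first self-hosting experiment might look like the vLLM sketch below. The repo id follows the naming pattern of the published V4-Pro card but is an assumption for Flash, and the parallelism and context-length settings are placeholders to size against real hardware:

```python
# Hedged self-hosting sketch using vLLM's offline API. The repo id is an
# assumption for V4-Flash; tensor_parallel_size and max_model_len must be
# sized to the actual cluster rather than copied from here.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",  # assumed Hugging Face repo id
    tensor_parallel_size=8,                 # size to your GPU/NPU topology
    max_model_len=131072,                   # raise toward 1M only with memory to match
    trust_remote_code=True,
)
out = llm.generate(["Write a binary search in Python."],
                   SamplingParams(max_tokens=256, temperature=0.2))
print(out[0].outputs[0].text)
```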
For broader context on how frontier-tier output costs are being squeezed across the board, see our analysis of GPT-5.5 and OpenAI's first retrained base since GPT-4.5.
The Bottom Line
DeepSeek V4 does not redraw the frontier — Claude Opus 4.7 and GPT-5.5 still lead on the hardest expert-knowledge and reasoning benchmarks, and Opus 4.7's 7-point lead on SWE-bench Verified is real. What V4 does is collapse the price of access to near-frontier capability. A 1.6T MoE that lands within striking distance of Opus 4.6 on SWE-bench Verified and leads on LiveCodeBench, with 1M context and MIT licensing, at $1.74 input and $3.48 output per million tokens (list), is a different kind of release than what we have seen from any other lab in the past quarter.
The Huawei Ascend integration adds a second layer: V4 is the first frontier-class Chinese AI release with a confirmed domestic-silicon inference pathway, even if the training pipeline still leans on NVIDIA. For a builder choosing where to spend the next million tokens of budget — especially before the May 31 promo expires — V4 is now the default open-weight pick on coding workloads and one of the strongest options on agentic tasks.
What remains genuinely open is whether DeepSeek can sustain this cadence. V3.2 shipped in December 2025; V4 in April 2026. If V4.x and V5 land on similar timelines, the gap between the open-weight ceiling and the closed frontier may keep closing. If they do not, V4 becomes the high-water mark for a lab that earned its reputation by punching above its compute budget.
References
1. TechCrunch — "DeepSeek previews new AI model that 'closes the gap' with frontier models", April 24, 2026.
2. Bloomberg — "DeepSeek Unveils Newest Flagship AI Model a Year after Upending Silicon Valley", April 24, 2026.
3. CNBC — "China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies", April 24, 2026.
4. MarkTechPost — "DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts", April 24, 2026.
5. NVIDIA Technical Blog — "Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints", April 24, 2026.
6. NxCode — "DeepSeek V4 (2026): 1T Parameters, 81% SWE-bench, $0.30/MTok — Full Specs", April 24, 2026.
7. BuildFastWithAI — "DeepSeek V4-Pro Review: Benchmarks, Pricing & Architecture", April 24, 2026.
8. Anthropic — "Introducing Claude Opus 4.7", April 16, 2026 — 87.6% on SWE-bench Verified per third-party leaderboards (Opus 4.6 baseline 80.84%).
9. vals.ai — SWE-bench Verified leaderboard, tracking GPT-5.5 at 82.60% (vals.ai/benchmarks/swebench). BenchLM and TokenMix report 88.7% for GPT-5.5 at Pro/high-effort settings; treat 82.6–88.7% as the reported range depending on tier and methodology.
10. DeepSeek API Docs — "Models & Pricing". 75% V4-Pro discount until 2026-05-31 15:59 UTC; cache-hit price set to 1/10 of cache-miss after the April 26, 2026 adjustment.
11. VentureBeat — "DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5", April 24, 2026.
12. Hugging Face Blog — "DeepSeek-V4: a million-token context that agents can actually use", April 24, 2026. Includes precedent on Llama 4 Maverick and Qwen 3.6 Plus 1M-context releases.
13. MIT Technology Review — "Three reasons why DeepSeek's new model matters", April 24, 2026 — Tsinghua University CS professor on V4-Pro likely still trained primarily on NVIDIA GPUs, especially long-context features.
14. DEV Community — "DeepSeek Just Dropped V4. Here's What the Benchmarks Actually Tell You.", April 24, 2026.
15. Hugging Face — deepseek-ai/DeepSeek-V4-Pro, MIT License. Technical report; mHC and Muon optimizer details; FP4/FP8 precision.
16. vLLM Recipes — DeepSeek-V4-Pro, April 2026.
17. Llama 4 Community License Agreement — llama.com/llama4/license/ — 700M MAU restriction; classified as "source available" rather than OSI open source.
18. BenchLM — "DeepSeek V4 Pro Benchmarks 2026: Scores, Rankings & Performance", April 2026.
19. Officechai — "DeepSeek Releases DeepSeek V4-Pro & V4-Flash, Delivers GPT 5.4 & Opus 4.6-Level Performance At Fraction Of The Price", April 24, 2026.
20. AnalyticsIndiaMag — "DeepSeek Releases V4 Pro, Challenging OpenAI, Anthropic on Key Benchmarks", April 24, 2026.
21. SCMP — "Underwhelming or underrated? DeepSeek V4 shows 'impressive' gains", April 24, 2026 — HLE without-tools comparison vs Opus 4.7 and GPT-5.5.
22. Framia — "Third-party benchmark comparisons of DeepSeek V4 vs GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro", April 2026 — BrowseComp 83.4% for V4-Pro Max.
23. ArtificialAnalysis — "DeepSeek V4 Flash (Max) — Intelligence, Performance & Price Analysis", April 2026.
24. Phemex News — "DeepSeek V4 Matches NVIDIA on Huawei Ascend, Dispels Rumors", April 24, 2026.
25. TrendForce — "Decoding DeepSeek V4: How Huawei's Ascend 950PR Is Powering China's Push to Break CUDA Dependence", April 7, 2026 — Ascend 950PR launched Q1 2026; 950DT expected by end of 2026.