What does "open-weight" mean here?

The model weights are downloadable from Hugging Face and usable under the license attached (MIT for GLM-5.1 and DeepSeek V4; Modified MIT for Kimi K2.6 and MiniMax M2.7). All four models in this wave can be self-hosted on appropriate hardware. Training data and training code are not necessarily released — these are "open-weight," not fully open-source. MiniMax M2.7's Modified-MIT license also restricts commercial use without prior authorization.

Why did four Chinese labs release in 18 days?

The releases bracket Anthropic's Claude Opus 4.7 launch on April 16, 2026, suggesting at least partial competitive timing. Each lab targets a slightly different niche — Z.ai on agentic engineering with permissive licensing, MiniMax on self-improving scaffolds, Moonshot on agent swarms, DeepSeek on inference efficiency — so the cluster is less monolithic than the timeline implies.

Is DeepSeek V4-Flash actually cheaper than running my own GPU?

At $0.14 input / $0.28 output per million tokens on a cache miss, DeepSeek V4-Flash is below the marginal cost of running most self-hosted MoE setups for low-volume workloads. The economics flip back toward self-hosting at very high request volumes or where data residency requires VPC inference. Promotional pricing on V4-Pro also runs only until May 31, 2026.

Will Western frontier models close the price gap?

The current pattern is that closed-source frontier models hold a capability lead and charge accordingly, while open-weight models from Chinese labs price aggressively against that capability gap. There is no public signal that Anthropic, OpenAI, or Google plans Opus/GPT/Gemini list-price cuts of the magnitude needed to match Chinese open-weight rates.

ai-ml

China's Open-Weight Coding Wave: 4 Models, 18 Days

May 17, 2026

#open-weight LLM #DeepSeek V4 #GLM-5.1 #Kimi K2.6 #MiniMax M2.7 #agentic coding #Chinese AI models #SWE-Bench Pro #LLM pricing #mixture of experts

China's Open-Weight Coding Wave: 4 Models, 18 Days

TL;DR

Between April 7 and April 24, 2026, four Chinese AI labs shipped open-weight coding models in close succession: Z.ai's GLM-5.1, MiniMax's M2.7 (open-sourced April 12 after a March announcement), Moonshot's Kimi K2.6, and DeepSeek's V4 (Pro and Flash variants). All target agentic engineering workflows, all are mixture-of-experts architectures, and all price at roughly an order of magnitude below Claude Opus 4.7 — which holds the top of the publicly available SWE-Bench Pro leaderboard at 64.3% versus the open-weight high mark of 58.6% from Kimi K2.6 (tied at 58.6% with OpenAI's GPT-5.5). Anthropic's invitation-only Claude Mythos Preview is ahead of all of these at 77.8% but is not generally available. The story isn't that open weights overtook the frontier. It's that the inference-cost floor for "good enough" agentic coding just collapsed.

What You'll Learn

The release timeline and specs for GLM-5.1, MiniMax M2.7, Kimi K2.6, and DeepSeek V4
Where each model lands on SWE-Bench Pro and SWE-Bench Verified
How list prices compare to Claude Opus 4.7 and what that means for production budgets
The architectural moves that made these models possible — Muon optimizer, compressed attention, agent swarms, self-improving scaffolds
Where Western frontier models still hold a real lead

The 18-Day Release Window

The compressed timeline is the headline. Four major Chinese labs shipped open-weight coding models within an 18-day window, each one positioned to compete on agentic engineering rather than chat¹:

Date	Model	Lab	License
Apr 7, 2026	GLM-5.1	Z.ai	MIT²
Apr 12, 2026	MiniMax M2.7 (open)	MiniMax	Modified-MIT (non-commercial)³
Apr 20, 2026	Kimi K2.6	Moonshot AI	Modified MIT⁴
Apr 24, 2026	DeepSeek V4 (Pro + Flash)	DeepSeek	MIT⁵

MiniMax M2.7 was originally announced March 18, 2026 as a closed model and then released with open weights on Hugging Face on April 12³. The other three shipped as open weights from day one.

In the same window, Anthropic shipped Claude Opus 4.7 on April 16, 2026, which leads the publicly available SWE-Bench Pro leaderboard at 64.3%⁶. OpenAI then released GPT-5.5 on April 23, scoring 58.6% on SWE-Bench Pro⁷. Anthropic's invitation-only Claude Mythos Preview (Project Glasswing, not generally available) sits ahead of both at 77.8% but is gated to ~40 named partner organizations⁸. So the four open-weight releases bracket two frontier-model launches — useful context for the comparison that follows.

The Models

GLM-5.1 (Z.ai) — April 7

GLM-5.1 is a mixture-of-experts model with roughly 744–754 billion total parameters (sources disagree on the exact figure, with both widely cited), 40 billion active parameters per token, a 200K context window, and an MIT license². At launch, Z.ai claimed the top SWE-Bench Pro score at 58.4% on vendor-published numbers — narrowly above GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3% per Z.ai's measurement), making it the first open-weight model to claim the top of that leaderboard before Anthropic's Claude Opus 4.7 release nine days later². List pricing on Z.ai's direct API is $1.05 per million input tokens and $3.50 per million output tokens⁹.

GLM-5.1's appeal is the combination of permissive license, strong agentic engineering scores, and an inference price that lands well under closed-source frontier rates. For a deeper breakdown of the model and its Huawei Ascend training stack, see our GLM-5.1 benchmarks post.

MiniMax M2.7 — April 12 (open)

MiniMax M2.7 is a 230-billion-parameter MoE with 10 billion active per token, a 200K context window (technically 204,800 tokens), and weights released on Hugging Face under a "Modified-MIT" license that — unlike the standard MIT license MiniMax used for M2 and M2.5 — restricts commercial use without prior written authorization³. The licensing change was controversial when announced; community feedback called the "Modified-MIT" label misleading. The model scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2³. MiniMax pricing comes in at $0.30 input / $1.20 output per million tokens¹⁰.

The lab's pitch is self-evolution: MiniMax says M2.7 actively participated in its own development, running over 100 autonomous rounds of scaffold optimization and reporting an ~30% performance improvement from that loop³. That framing is novel, though independent verification of the "self-evolution" gains is still limited.

Kimi K2.6 (Moonshot AI) — April 20

Kimi K2.6 is a 1-trillion-parameter MoE with 32 billion active per token, a 256K context window, and a Modified MIT license⁴. Architectural details Moonshot published include 384 experts (8 routed plus 1 shared), 61 layers, 64 attention heads, and Multi-head Latent Attention (MLA)⁴. On SWE-Bench Pro, Kimi K2.6 scores 58.6% — narrowly above GLM-5.1's vendor number and high enough to briefly hold the open-weight top spot when it shipped⁴. On SWE-Bench Verified it reaches 80.2%⁴.

Kimi K2.6 also introduces an Agent Swarm primitive that fans a single task across up to 300 sub-agents over 4,000 coordinated steps — a tripling of K2.5's 100-agent ceiling and a near-2.7× jump in the step horizon, and a deliberate bet on the multi-agent direction that several Chinese labs are now investing in⁴. Pricing on Moonshot's official API is $0.60 input / $2.50 output per million tokens; third-party providers list rates in the $0.73–$0.95 input range¹¹. For more on K2.6's swarm architecture and INT4-native checkpoint, see our Kimi K2.6 Agent Swarm coverage.

DeepSeek V4 — April 24

DeepSeek shipped two variants on the same day under MIT-licensed weights and a 1M-token context window⁵:

V4-Pro: 1.6 trillion total parameters, 49B active, pretrained on 33 trillion tokens
V4-Flash: 284 billion total parameters, 13B active, pretrained on 32 trillion tokens

V4-Pro reaches 80.6% on SWE-Bench Verified and 55.4% on SWE-Bench Pro¹². V4-Flash trails Pro by roughly 1.6 points on Verified at 79.0% in its strongest tier¹². DeepSeek positions Pro for maximum capability and Flash for production inference economics.

The architecture is the most aggressive of the four. V4 introduces Compressed Sparse Attention (CSA) paired with Heavily Compressed Attention (HCA), which DeepSeek reports cuts single-token inference FLOPs to roughly 27% of V3.2 and KV cache footprint to roughly 10% at 1M-token context¹². Pretraining used the Muon optimizer rather than AdamW, applying Newton-Schulz iterations to approximately orthogonalize gradient updates before each weight step — chosen for convergence speed and stability at the 33T-token scale¹².

DeepSeek is also running a launch promotion: V4-Pro is priced at $0.435 input / $0.87 output per million tokens through May 31, 2026, against a list price of $1.74 input / $3.48 output¹³. V4-Flash is $0.14 input / $0.28 output on a cache miss, with cache hits priced at $0.0028 per million tokens — a 98% discount¹³. For a full architecture breakdown of CSA, HCA, and the 1M-token economics, see our DeepSeek V4 deep dive.

Benchmark Snapshot

Putting the four releases on the same axis, alongside Claude Opus 4.7 as the closed-source reference point:

Model	SWE-Bench Pro	SWE-Bench Verified
Claude Opus 4.7	64.3%⁶	87.6%⁶
Kimi K2.6	58.6%⁴	80.2%⁴
GLM-5.1	58.4% (vendor)²	—
MiniMax M2.7	56.22% (SWE-Pro)³	—
DeepSeek V4-Pro	55.4%¹²	80.6%¹²
DeepSeek V4-Flash (Max)	—	79.0%¹²

Three notes on reading this table:

First, "SWE-Pro" and "SWE-Bench Pro" are the same benchmark — MiniMax just drops the "Bench" in its reporting. The methodological wrinkles are real, though: each lab uses its own scaffold, sometimes its own task subset, and run-count conventions vary. Cross-lab comparisons are directionally correct but should not be read as identically configured measurements.

Second, Claude Opus 4.7 holds the publicly available SWE-Bench Pro lead by roughly 5.7 points over Kimi K2.6 and Verified by 7+ points over both DeepSeek V4-Pro and Kimi K2.6. The Western frontier did not collapse in April — Opus 4.7 widened the publicly available lead on coding the same week the open-weight wave hit⁶. GPT-5.5 (April 23) ties Kimi K2.6 at 58.6% on SWE-Bench Pro, so the Western frontier and the open-weight frontier are now interleaved below Opus 4.7 rather than cleanly separated⁷.

Third, the open-weight pack is tightly clustered. About 3.2 percentage points separate the best (Kimi K2.6 at 58.6%) and worst (DeepSeek V4-Pro at 55.4%) SWE-Bench Pro score among the four Chinese models. From a procurement standpoint, the differentiators are price, license, context window, and architectural fit — not raw benchmark order.

For a fuller picture of how Western frontier coding scores climbed in parallel, see our Claude Opus 4.7 benchmarks breakdown and GPT-5.4 computer-use analysis.

The Real Story Is the Price Floor

The capability story is "open-weights are clustered ~6 points behind the frontier on coding." The cost story is much sharper. List pricing per million tokens, input/output⁹¹⁰¹¹¹³⁶:

Model	Input	Output	vs. Opus 4.7 (output)
Claude Opus 4.7	$5.00	$25.00	—
GLM-5.1	$1.05	$3.50	14%
Kimi K2.6 (official)	$0.60	$2.50	10%
MiniMax M2.7	$0.30	$1.20	5%
DeepSeek V4-Pro (promo until May 31)	$0.435	$0.87	3.5%
DeepSeek V4-Flash (cache miss)	$0.14	$0.28	1.1%

Every open-weight model in this wave prices below 15% of Claude Opus 4.7's output rate, and DeepSeek's promotional V4-Flash sits near 1% on the same axis. For agentic loops — where output tokens dominate the bill because the model is generating long action traces — this is where the math actually shifts production decisions¹³.

A few caveats. DeepSeek V4-Pro's promotional rate runs out on May 31, 2026; at the list price of $1.74/$3.48, V4-Pro lands closer to GLM-5.1's tier — still cheap, just less dramatic¹³. Kimi K2.6 pricing varies meaningfully across providers; the $0.60/$2.50 number is Moonshot's direct API, with third parties listing $0.73–$0.95 input¹¹. And Claude Opus 4.7 uses a new tokenizer that consumes up to 35% more tokens for the same text, which inflates real-world Opus costs above what the per-token rates suggest⁶.

The "no more than a third of Claude Opus 4.7" framing some commentators applied to the wave understates the gap. By output-rate alone, every Chinese open-weight model in this group prices at or below 14% of Opus, with Kimi K2.6 and MiniMax M2.7 well under 10% and DeepSeek's Flash variant near 1%.

Architecture: Where the Frontier Moved

A few technical moves are worth flagging because they show where Chinese open-weight teams are now competing on substance rather than scale.

Muon optimizer (DeepSeek V4). Replacing AdamW with Muon — which approximately orthogonalizes gradient updates via Newton-Schulz iterations — let DeepSeek push pretraining to 33 trillion tokens on V4-Pro without the gradient collapse that typically plagues runs at that scale¹². Moonshot's earlier Kimi K2 pretraining run also used Muon (specifically a MuonClip variant designed to handle instabilities at scale), so DeepSeek is not the first frontier-scale lab here — but V4 is the largest publicly disclosed Muon-trained model to date.

Compressed attention (DeepSeek V4). CSA + HCA cuts KV cache to roughly 10% and per-token FLOPs to roughly 27% of V3.2 at 1M-token context¹². This is the kind of inference-efficiency move that explains how DeepSeek can run V4-Flash at $0.14 input per million tokens at all. For more on inference-economics breakthroughs, see our TurboQuant KV-cache compression coverage.

Agent Swarms (Kimi K2.6). Moonshot's Agent Swarm primitive runs a single task across up to 300 sub-agents over 4,000 coordinated steps — 3× the agent count and ~2.7× the step horizon of K2.5⁴. Whether this beats well-structured single-agent workflows on production tasks is an open empirical question, but it is among the more aggressive multi-agent primitives shipped at the open-weight frontier so far in 2026.

Self-improving scaffolds (MiniMax M2.7). MiniMax describes M2.7 as having actively participated in its own development across more than 100 autonomous rounds of scaffold optimization, reporting a ~30% gain from the loop³. The "model trains itself" narrative is overstated — humans still own the training run — but evaluator-in-the-loop scaffold tuning is a real technique and M2.7 is the most publicly committed bet on it among the four.

Permissive licensing (mostly). GLM-5.1 and DeepSeek V4-Pro/Flash ship under standard MIT²⁵. Kimi K2.6 uses a Modified MIT that adds attribution requirements only above 100M monthly active users or $20M monthly revenue — effectively standard MIT for most users⁴. MiniMax M2.7 is the outlier: its Modified-MIT license restricts commercial use without prior written authorization, a meaningful regression from the standard MIT licensing MiniMax used for M2 and M2.5³. For enterprises with strict compliance reviews, GLM-5.1 and DeepSeek V4 are the smoothest pickup.

Where the Frontier Still Holds

For balance: the Western frontier did not lose any ground on capability during the April wave.

Publicly available coding leadership widened. Claude Opus 4.7 leads the publicly available SWE-Bench Pro leaderboard by 5.7 points over the best open-weight number from this group, and SWE-Bench Verified by 7+ points⁶. Anthropic's invitation-only Claude Mythos Preview sits even higher at 77.8% on SWE-Bench Pro but is restricted to Project Glasswing partners⁸.
Computer use. OSWorld-Verified scores for the four Chinese releases are not consistently published; Opus 4.7 holds 78.0% with the human baseline at ~72.4%⁶.
Tool integration depth. Anthropic's task-budgets agentic-loop controls, Microsoft Foundry integration, and adaptive thinking remain features the open-weight pack matches only piecemeal.

If you need the highest accuracy on multi-hour autonomous coding sessions today, Opus 4.7 is the clearest choice. If you need to run those same workflows at 1/10 the output-token cost and can absorb a 5–9 point capability gap, the Chinese open-weight pack is the new floor.

For broader context on the China–US AI capability gap and adoption curves, see our coverage of the Stanford AI Index 2026.

What This Means for Production Teams

A few takeaways that fall out of the data rather than the hype:

Inference economics changed faster than capability did. The interesting number this month is not the SWE-Bench delta — it's the price-per-output-token delta.
Open-weight self-hosting is now defensible for coding agents. GLM-5.1 and DeepSeek V4 ship under MIT with strong agentic scores; teams with compliance reasons to keep weights inside their VPC have a stack that works.
The Chinese frontier converged. GLM-5.1, Kimi K2.6, MiniMax M2.7, and DeepSeek V4 cluster within ~3.2 points on SWE-Bench Pro. Differentiation now runs through context window, license, and serving cost rather than raw benchmark.
Multi-agent and self-improvement are no longer fringe. Kimi's Agent Swarm and MiniMax's self-evolving scaffold are sitting on top of capable base models, not behind academic-grade demos.
The promotional pricing window matters. DeepSeek V4-Pro's promo rate runs through May 31, 2026; budget plans assuming the promo rate need to model the post-May 31 list price¹³.

Bottom Line

The April 2026 wave didn't unseat closed-source frontier coding models — Claude Opus 4.7 still leads the publicly available SWE-Bench Pro leaderboard by a clear margin, and Anthropic's invitation-only Mythos Preview sits even higher. What the wave did was collapse the price floor for "good-enough" agentic coding to roughly one-tenth of frontier rates. For teams running long-loop coding agents in production, the question shifted from "which closed-source model do we buy?" to "which open-weight model do we self-host, and what does the capability gap cost us per task?"

The frontier is two parallel races now: capability at the top, served by Anthropic, OpenAI, and Google; and inference cost at the open-weight tier, served by Z.ai, MiniMax, Moonshot, and DeepSeek. Eighteen days in April made the second race real.

References

Digital Bright Future, "China's AI Labs Released 4 Open-Weight Coding Models in 12 Days," April 2026: https://digitalbrightfuture.com/best-open-source-coding-models-2026/ ↩
Z.ai GLM-5.1 release coverage, llm-stats.com benchmarks page and Dataconomy launch coverage, April 7, 2026: https://llm-stats.com/models/glm-5.1 and https://dataconomy.com/2026/04/08/z-ais-glm-5-1-tops-swe-bench-pro-beating-major-ai-rivals/ ↩ ↩² ↩³ ↩⁴ ↩⁵
MiniMax M2.7 release announcement and MarkTechPost coverage, April 12, 2026: https://www.minimax.io/news/minimax-m27-en and https://www.marktechpost.com/2026/04/12/minimax-just-open-sourced-minimax-m2-7-a-self-evolving-agent-model-that-scores-56-22-on-swe-pro-and-57-0-on-terminal-bench-2/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
Moonshot AI Kimi K2.6 release coverage, Moonshot blog and llm-stats benchmarks, April 20, 2026: https://www.kimi.com/blog/kimi-k2-6 and https://llm-stats.com/models/kimi-k2.6 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰
DeepSeek V4 Preview release notes, April 24, 2026: https://api-docs.deepseek.com/news/news260424 ↩ ↩² ↩³
Claude Opus 4.7 release coverage, Anthropic and platform pricing, April 16, 2026: https://www.anthropic.com/news/claude-opus-4-7 and https://platform.claude.com/docs/en/about-claude/pricing ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
GPT-5.5 release and benchmarks, OpenAI launch coverage and SWE-Bench Pro leaderboard, April 23, 2026: https://openai.com/index/introducing-gpt-5-5/ and https://llm-stats.com/benchmarks/swe-bench-pro ↩ ↩²
Claude Mythos Preview leaderboard performance, OfficeChai and Scale AI SWE-Bench Pro public leaderboard: https://officechai.com/ai/claude-mythos-preview-benchmarks-swe-bench-pro/ and https://labs.scale.com/leaderboard/swe_bench_pro_public ↩ ↩²
GLM-5.1 API pricing, Artificial Analysis and pricepertoken.com listings, 2026: https://artificialanalysis.ai/models/glm-5-1 and https://pricepertoken.com/pricing-page/model/z-ai-glm-5.1 ↩ ↩²
MiniMax M2.7 pricing, Artificial Analysis listing, 2026: https://artificialanalysis.ai/models/minimax-m2-7 ↩ ↩²
Kimi K2.6 official API pricing, Moonshot platform docs and DeepInfra/OpenRouter listings: https://platform.kimi.ai/docs/pricing/chat-k2 and https://openrouter.ai/moonshotai/kimi-k2.6 ↩ ↩² ↩³
DeepSeek V4 architecture and benchmarks, Hugging Face technical report summary and DataCamp coverage, April 2026: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro and https://www.datacamp.com/blog/deepseek-v4 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹
DeepSeek V4 official API pricing, DeepSeek platform docs and TheNextWeb price-cut coverage, 2026: https://api-docs.deepseek.com/quick_start/pricing and https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶

Frequently Asked Questions

On vendor-published SWE-Bench Pro, Kimi K2.6 leads the four at 58.6%, narrowly ahead of GLM-5.1 at 58.4%. On SWE-Bench Verified, DeepSeek V4-Pro reaches 80.6% and Kimi K2.6 reaches 80.2%. Differences are small enough that license, context window, and pricing should dominate the choice for production teams.