China's Open-Weight Coding Wave: 4 Models, 18 Days
May 17, 2026
TL;DR
Between April 7 and April 24, 2026, four Chinese AI labs shipped open-weight coding models in close succession: Z.ai's GLM-5.1, MiniMax's M2.7 (open-sourced April 12 after a March announcement), Moonshot's Kimi K2.6, and DeepSeek's V4 (Pro and Flash variants). All target agentic engineering workflows, all are mixture-of-experts architectures, and all price at roughly an order of magnitude below Claude Opus 4.7 — which holds the top of the publicly available SWE-Bench Pro leaderboard at 64.3% versus the open-weight high mark of 58.6% from Kimi K2.6 (tied at 58.6% with OpenAI's GPT-5.5). Anthropic's invitation-only Claude Mythos Preview is ahead of all of these at 77.8% but is not generally available. The story isn't that open weights overtook the frontier. It's that the inference-cost floor for "good enough" agentic coding just collapsed.
What You'll Learn
- The release timeline and specs for GLM-5.1, MiniMax M2.7, Kimi K2.6, and DeepSeek V4
- Where each model lands on SWE-Bench Pro and SWE-Bench Verified
- How list prices compare to Claude Opus 4.7 and what that means for production budgets
- The architectural moves that made these models possible — Muon optimizer, compressed attention, agent swarms, self-improving scaffolds
- Where Western frontier models still hold a real lead
The 18-Day Release Window
The compressed timeline is the headline. Four major Chinese labs shipped open-weight coding models within an 18-day window, each one positioned to compete on agentic engineering rather than chat1:
| Date | Model | Lab | License |
|---|---|---|---|
| Apr 7, 2026 | GLM-5.1 | Z.ai | MIT2 |
| Apr 12, 2026 | MiniMax M2.7 (open) | MiniMax | Modified-MIT (non-commercial)3 |
| Apr 20, 2026 | Kimi K2.6 | Moonshot AI | Modified MIT4 |
| Apr 24, 2026 | DeepSeek V4 (Pro + Flash) | DeepSeek | MIT5 |
MiniMax M2.7 was originally announced March 18, 2026 as a closed model and then released with open weights on Hugging Face on April 123. The other three shipped as open weights from day one.
In the same window, Anthropic shipped Claude Opus 4.7 on April 16, 2026, which leads the publicly available SWE-Bench Pro leaderboard at 64.3%6. OpenAI then released GPT-5.5 on April 23, scoring 58.6% on SWE-Bench Pro7. Anthropic's invitation-only Claude Mythos Preview (Project Glasswing, not generally available) sits ahead of both at 77.8% but is gated to ~40 named partner organizations8. So the four open-weight releases bracket two frontier-model launches — useful context for the comparison that follows.
The Models
GLM-5.1 (Z.ai) — April 7
GLM-5.1 is a mixture-of-experts model with roughly 744–754 billion total parameters (sources disagree on the exact figure, with both widely cited), 40 billion active parameters per token, a 200K context window, and an MIT license2. At launch, Z.ai claimed the top SWE-Bench Pro score at 58.4% on vendor-published numbers — narrowly above GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3% per Z.ai's measurement), making it the first open-weight model to claim the top of that leaderboard before Anthropic's Claude Opus 4.7 release nine days later2. List pricing on Z.ai's direct API is $1.05 per million input tokens and $3.50 per million output tokens9.
GLM-5.1's appeal is the combination of permissive license, strong agentic engineering scores, and an inference price that lands well under closed-source frontier rates. For a deeper breakdown of the model and its Huawei Ascend training stack, see our GLM-5.1 benchmarks post.
MiniMax M2.7 — April 12 (open)
MiniMax M2.7 is a 230-billion-parameter MoE with 10 billion active per token, a 200K context window (technically 204,800 tokens), and weights released on Hugging Face under a "Modified-MIT" license that — unlike the standard MIT license MiniMax used for M2 and M2.5 — restricts commercial use without prior written authorization3. The licensing change was controversial when announced; community feedback called the "Modified-MIT" label misleading. The model scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 23. MiniMax pricing comes in at $0.30 input / $1.20 output per million tokens10.
The lab's pitch is self-evolution: MiniMax says M2.7 actively participated in its own development, running over 100 autonomous rounds of scaffold optimization and reporting an ~30% performance improvement from that loop3. That framing is novel, though independent verification of the "self-evolution" gains is still limited.
Kimi K2.6 (Moonshot AI) — April 20
Kimi K2.6 is a 1-trillion-parameter MoE with 32 billion active per token, a 256K context window, and a Modified MIT license4. Architectural details Moonshot published include 384 experts (8 routed plus 1 shared), 61 layers, 64 attention heads, and Multi-head Latent Attention (MLA)4. On SWE-Bench Pro, Kimi K2.6 scores 58.6% — narrowly above GLM-5.1's vendor number and high enough to briefly hold the open-weight top spot when it shipped4. On SWE-Bench Verified it reaches 80.2%4.
Kimi K2.6 also introduces an Agent Swarm primitive that fans a single task across up to 300 sub-agents over 4,000 coordinated steps — a tripling of K2.5's 100-agent ceiling and a near-2.7× jump in the step horizon, and a deliberate bet on the multi-agent direction that several Chinese labs are now investing in4. Pricing on Moonshot's official API is $0.60 input / $2.50 output per million tokens; third-party providers list rates in the $0.73–$0.95 input range11. For more on K2.6's swarm architecture and INT4-native checkpoint, see our Kimi K2.6 Agent Swarm coverage.
DeepSeek V4 — April 24
DeepSeek shipped two variants on the same day under MIT-licensed weights and a 1M-token context window5:
- V4-Pro: 1.6 trillion total parameters, 49B active, pretrained on 33 trillion tokens
- V4-Flash: 284 billion total parameters, 13B active, pretrained on 32 trillion tokens
V4-Pro reaches 80.6% on SWE-Bench Verified and 55.4% on SWE-Bench Pro12. V4-Flash trails Pro by roughly 1.6 points on Verified at 79.0% in its strongest tier12. DeepSeek positions Pro for maximum capability and Flash for production inference economics.
The architecture is the most aggressive of the four. V4 introduces Compressed Sparse Attention (CSA) paired with Heavily Compressed Attention (HCA), which DeepSeek reports cuts single-token inference FLOPs to roughly 27% of V3.2 and KV cache footprint to roughly 10% at 1M-token context12. Pretraining used the Muon optimizer rather than AdamW, applying Newton-Schulz iterations to approximately orthogonalize gradient updates before each weight step — chosen for convergence speed and stability at the 33T-token scale12.
DeepSeek is also running a launch promotion: V4-Pro is priced at $0.435 input / $0.87 output per million tokens through May 31, 2026, against a list price of $1.74 input / $3.48 output13. V4-Flash is $0.14 input / $0.28 output on a cache miss, with cache hits priced at $0.0028 per million tokens — a 98% discount13. For a full architecture breakdown of CSA, HCA, and the 1M-token economics, see our DeepSeek V4 deep dive.
Benchmark Snapshot
Putting the four releases on the same axis, alongside Claude Opus 4.7 as the closed-source reference point:
| Model | SWE-Bench Pro | SWE-Bench Verified |
|---|---|---|
| Claude Opus 4.7 | 64.3%6 | 87.6%6 |
| Kimi K2.6 | 58.6%4 | 80.2%4 |
| GLM-5.1 | 58.4% (vendor)2 | — |
| MiniMax M2.7 | 56.22% (SWE-Pro)3 | — |
| DeepSeek V4-Pro | 55.4%12 | 80.6%12 |
| DeepSeek V4-Flash (Max) | — | 79.0%12 |
Three notes on reading this table:
First, "SWE-Pro" and "SWE-Bench Pro" are the same benchmark — MiniMax just drops the "Bench" in its reporting. The methodological wrinkles are real, though: each lab uses its own scaffold, sometimes its own task subset, and run-count conventions vary. Cross-lab comparisons are directionally correct but should not be read as identically configured measurements.
Second, Claude Opus 4.7 holds the publicly available SWE-Bench Pro lead by roughly 5.7 points over Kimi K2.6 and Verified by 7+ points over both DeepSeek V4-Pro and Kimi K2.6. The Western frontier did not collapse in April — Opus 4.7 widened the publicly available lead on coding the same week the open-weight wave hit6. GPT-5.5 (April 23) ties Kimi K2.6 at 58.6% on SWE-Bench Pro, so the Western frontier and the open-weight frontier are now interleaved below Opus 4.7 rather than cleanly separated7.
Third, the open-weight pack is tightly clustered. About 3.2 percentage points separate the best (Kimi K2.6 at 58.6%) and worst (DeepSeek V4-Pro at 55.4%) SWE-Bench Pro score among the four Chinese models. From a procurement standpoint, the differentiators are price, license, context window, and architectural fit — not raw benchmark order.
For a fuller picture of how Western frontier coding scores climbed in parallel, see our Claude Opus 4.7 benchmarks breakdown and GPT-5.4 computer-use analysis.
The Real Story Is the Price Floor
The capability story is "open-weights are clustered ~6 points behind the frontier on coding." The cost story is much sharper. List pricing per million tokens, input/output91011136:
| Model | Input | Output | vs. Opus 4.7 (output) |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | — |
| GLM-5.1 | $1.05 | $3.50 | 14% |
| Kimi K2.6 (official) | $0.60 | $2.50 | 10% |
| MiniMax M2.7 | $0.30 | $1.20 | 5% |
| DeepSeek V4-Pro (promo until May 31) | $0.435 | $0.87 | 3.5% |
| DeepSeek V4-Flash (cache miss) | $0.14 | $0.28 | 1.1% |
Every open-weight model in this wave prices below 15% of Claude Opus 4.7's output rate, and DeepSeek's promotional V4-Flash sits near 1% on the same axis. For agentic loops — where output tokens dominate the bill because the model is generating long action traces — this is where the math actually shifts production decisions13.
A few caveats. DeepSeek V4-Pro's promotional rate runs out on May 31, 2026; at the list price of $1.74/$3.48, V4-Pro lands closer to GLM-5.1's tier — still cheap, just less dramatic13. Kimi K2.6 pricing varies meaningfully across providers; the $0.60/$2.50 number is Moonshot's direct API, with third parties listing $0.73–$0.95 input11. And Claude Opus 4.7 uses a new tokenizer that consumes up to 35% more tokens for the same text, which inflates real-world Opus costs above what the per-token rates suggest6.
The "no more than a third of Claude Opus 4.7" framing some commentators applied to the wave understates the gap. By output-rate alone, every Chinese open-weight model in this group prices at or below 14% of Opus, with Kimi K2.6 and MiniMax M2.7 well under 10% and DeepSeek's Flash variant near 1%.
Architecture: Where the Frontier Moved
A few technical moves are worth flagging because they show where Chinese open-weight teams are now competing on substance rather than scale.
Muon optimizer (DeepSeek V4). Replacing AdamW with Muon — which approximately orthogonalizes gradient updates via Newton-Schulz iterations — let DeepSeek push pretraining to 33 trillion tokens on V4-Pro without the gradient collapse that typically plagues runs at that scale12. Moonshot's earlier Kimi K2 pretraining run also used Muon (specifically a MuonClip variant designed to handle instabilities at scale), so DeepSeek is not the first frontier-scale lab here — but V4 is the largest publicly disclosed Muon-trained model to date.
Compressed attention (DeepSeek V4). CSA + HCA cuts KV cache to roughly 10% and per-token FLOPs to roughly 27% of V3.2 at 1M-token context12. This is the kind of inference-efficiency move that explains how DeepSeek can run V4-Flash at $0.14 input per million tokens at all. For more on inference-economics breakthroughs, see our TurboQuant KV-cache compression coverage.
Agent Swarms (Kimi K2.6). Moonshot's Agent Swarm primitive runs a single task across up to 300 sub-agents over 4,000 coordinated steps — 3× the agent count and ~2.7× the step horizon of K2.54. Whether this beats well-structured single-agent workflows on production tasks is an open empirical question, but it is among the more aggressive multi-agent primitives shipped at the open-weight frontier so far in 2026.
Self-improving scaffolds (MiniMax M2.7). MiniMax describes M2.7 as having actively participated in its own development across more than 100 autonomous rounds of scaffold optimization, reporting a ~30% gain from the loop3. The "model trains itself" narrative is overstated — humans still own the training run — but evaluator-in-the-loop scaffold tuning is a real technique and M2.7 is the most publicly committed bet on it among the four.
Permissive licensing (mostly). GLM-5.1 and DeepSeek V4-Pro/Flash ship under standard MIT25. Kimi K2.6 uses a Modified MIT that adds attribution requirements only above 100M monthly active users or $20M monthly revenue — effectively standard MIT for most users4. MiniMax M2.7 is the outlier: its Modified-MIT license restricts commercial use without prior written authorization, a meaningful regression from the standard MIT licensing MiniMax used for M2 and M2.53. For enterprises with strict compliance reviews, GLM-5.1 and DeepSeek V4 are the smoothest pickup.
Where the Frontier Still Holds
For balance: the Western frontier did not lose any ground on capability during the April wave.
- Publicly available coding leadership widened. Claude Opus 4.7 leads the publicly available SWE-Bench Pro leaderboard by 5.7 points over the best open-weight number from this group, and SWE-Bench Verified by 7+ points6. Anthropic's invitation-only Claude Mythos Preview sits even higher at 77.8% on SWE-Bench Pro but is restricted to Project Glasswing partners8.
- Computer use. OSWorld-Verified scores for the four Chinese releases are not consistently published; Opus 4.7 holds 78.0% with the human baseline at ~72.4%6.
- Tool integration depth. Anthropic's
task-budgetsagentic-loop controls, Microsoft Foundry integration, and adaptive thinking remain features the open-weight pack matches only piecemeal.
If you need the highest accuracy on multi-hour autonomous coding sessions today, Opus 4.7 is the clearest choice. If you need to run those same workflows at 1/10 the output-token cost and can absorb a 5–9 point capability gap, the Chinese open-weight pack is the new floor.
For broader context on the China–US AI capability gap and adoption curves, see our coverage of the Stanford AI Index 2026.
What This Means for Production Teams
A few takeaways that fall out of the data rather than the hype:
- Inference economics changed faster than capability did. The interesting number this month is not the SWE-Bench delta — it's the price-per-output-token delta.
- Open-weight self-hosting is now defensible for coding agents. GLM-5.1 and DeepSeek V4 ship under MIT with strong agentic scores; teams with compliance reasons to keep weights inside their VPC have a stack that works.
- The Chinese frontier converged. GLM-5.1, Kimi K2.6, MiniMax M2.7, and DeepSeek V4 cluster within ~3.2 points on SWE-Bench Pro. Differentiation now runs through context window, license, and serving cost rather than raw benchmark.
- Multi-agent and self-improvement are no longer fringe. Kimi's Agent Swarm and MiniMax's self-evolving scaffold are sitting on top of capable base models, not behind academic-grade demos.
- The promotional pricing window matters. DeepSeek V4-Pro's promo rate runs through May 31, 2026; budget plans assuming the promo rate need to model the post-May 31 list price13.
Bottom Line
The April 2026 wave didn't unseat closed-source frontier coding models — Claude Opus 4.7 still leads the publicly available SWE-Bench Pro leaderboard by a clear margin, and Anthropic's invitation-only Mythos Preview sits even higher. What the wave did was collapse the price floor for "good-enough" agentic coding to roughly one-tenth of frontier rates. For teams running long-loop coding agents in production, the question shifted from "which closed-source model do we buy?" to "which open-weight model do we self-host, and what does the capability gap cost us per task?"
The frontier is two parallel races now: capability at the top, served by Anthropic, OpenAI, and Google; and inference cost at the open-weight tier, served by Z.ai, MiniMax, Moonshot, and DeepSeek. Eighteen days in April made the second race real.
References
Footnotes
-
Digital Bright Future, "China's AI Labs Released 4 Open-Weight Coding Models in 12 Days," April 2026: https://digitalbrightfuture.com/best-open-source-coding-models-2026/ ↩
-
Z.ai GLM-5.1 release coverage, llm-stats.com benchmarks page and Dataconomy launch coverage, April 7, 2026: https://llm-stats.com/models/glm-5.1 and https://dataconomy.com/2026/04/08/z-ais-glm-5-1-tops-swe-bench-pro-beating-major-ai-rivals/ ↩ ↩2 ↩3 ↩4 ↩5
-
MiniMax M2.7 release announcement and MarkTechPost coverage, April 12, 2026: https://www.minimax.io/news/minimax-m27-en and https://www.marktechpost.com/2026/04/12/minimax-just-open-sourced-minimax-m2-7-a-self-evolving-agent-model-that-scores-56-22-on-swe-pro-and-57-0-on-terminal-bench-2/ ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8
-
Moonshot AI Kimi K2.6 release coverage, Moonshot blog and llm-stats benchmarks, April 20, 2026: https://www.kimi.com/blog/kimi-k2-6 and https://llm-stats.com/models/kimi-k2.6 ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8 ↩9 ↩10
-
DeepSeek V4 Preview release notes, April 24, 2026: https://api-docs.deepseek.com/news/news260424 ↩ ↩2 ↩3
-
Claude Opus 4.7 release coverage, Anthropic and platform pricing, April 16, 2026: https://www.anthropic.com/news/claude-opus-4-7 and https://platform.claude.com/docs/en/about-claude/pricing ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8
-
GPT-5.5 release and benchmarks, OpenAI launch coverage and SWE-Bench Pro leaderboard, April 23, 2026: https://openai.com/index/introducing-gpt-5-5/ and https://llm-stats.com/benchmarks/swe-bench-pro ↩ ↩2
-
Claude Mythos Preview leaderboard performance, OfficeChai and Scale AI SWE-Bench Pro public leaderboard: https://officechai.com/ai/claude-mythos-preview-benchmarks-swe-bench-pro/ and https://labs.scale.com/leaderboard/swe_bench_pro_public ↩ ↩2
-
GLM-5.1 API pricing, Artificial Analysis and pricepertoken.com listings, 2026: https://artificialanalysis.ai/models/glm-5-1 and https://pricepertoken.com/pricing-page/model/z-ai-glm-5.1 ↩ ↩2
-
MiniMax M2.7 pricing, Artificial Analysis listing, 2026: https://artificialanalysis.ai/models/minimax-m2-7 ↩ ↩2
-
Kimi K2.6 official API pricing, Moonshot platform docs and DeepInfra/OpenRouter listings: https://platform.kimi.ai/docs/pricing/chat-k2 and https://openrouter.ai/moonshotai/kimi-k2.6 ↩ ↩2 ↩3
-
DeepSeek V4 architecture and benchmarks, Hugging Face technical report summary and DataCamp coverage, April 2026: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro and https://www.datacamp.com/blog/deepseek-v4 ↩ ↩2 ↩3 ↩4 ↩5 ↩6 ↩7 ↩8 ↩9
-
DeepSeek V4 official API pricing, DeepSeek platform docs and TheNextWeb price-cut coverage, 2026: https://api-docs.deepseek.com/quick_start/pricing and https://thenextweb.com/news/deepseek-v4-pro-price-cut-75-percent ↩ ↩2 ↩3 ↩4 ↩5 ↩6