Kimi K2.6: Open-Weight 300-Agent Swarm Tops GPT-5.4 (2026)

April 27, 2026

On April 20, 2026, Moonshot AI shipped Kimi K2.6 with full open weights, full API access, and a benchmark suite that pushes the open-weight frontier 0.2 points past Z.ai's GLM-5.1 — the previous open-weight leader — on SWE-Bench Pro.[1] Kimi K2.6 is a 1-trillion-parameter Mixture-of-Experts model with 32B active parameters, a 256K context window, and an Agent Swarm architecture that scales to 300 sub-agents executing 4,000 coordinated steps.[2]

TL;DR

Kimi K2.6 scores 58.6% on SWE-Bench Pro, narrowly ahead of GLM-5.1 (58.4%) and GPT-5.4 (57.7%) but well behind Claude Opus 4.7 (64.3%, released April 16, 2026).[1][3][4] On SWE-Bench Verified it scores 80.2% — within a fraction of a point of Claude Opus 4.6's 80.8%.[1][3] K2.6 ships under a Modified MIT License with weights on Hugging Face.[5] Its swarm orchestrator scales to 300 parallel sub-agents — three times the 100-agent ceiling of Kimi K2.5 in January 2026 — and the team demonstrated 13 hours of continuous autonomous coding on an open-source financial matching engine.[2][6] At $0.95 per million input tokens and $4.00 per million output tokens, K2.6 sits well below the closed flagships (Claude Opus 4.7 at $5/$25, GPT-5.5 at $5/$30) though above DeepSeek V4's standard tier at $0.30/$0.50.[7][8]

What You'll Learn

  • The exact release timeline and licensing terms for Kimi K2.6
  • The MoE architecture: 1T total parameters, 32B active, 384 experts
  • Benchmark results versus GPT-5.4, Claude Opus 4.6, and Claude Opus 4.7
  • How Agent Swarm scales from 100 to 300 sub-agents
  • The 13-hour exchange-core demonstration and what it actually optimized
  • Pricing comparison and what the open-weight angle changes

Release Timeline and Licensing

Moonshot AI seeded Kimi K2.6 to Kimi Code subscribers as a preview on April 13, 2026. Eight days later, on April 21, the company removed the "Preview" label and shipped K2.6 as a generally available model across Kimi.com, the Kimi App, the official API, and the Kimi Code CLI.[9] The open-weights drop on Hugging Face — moonshotai/Kimi-K2.6 — happened on April 20, 2026, alongside a benchmark suite and a deployment guide.[2][5]

Both the model weights and the code repository are released under a Modified MIT License.[5] That makes K2.6 a genuinely open-weight frontier model — distinct from "open API, closed weights" releases like GPT-5.5 or invitation-only previews like Claude Mythos, which Anthropic gates through Project Glasswing rather than a public API.[10] As of late April 2026, Kimi K2.6 sits alongside Z.ai's GLM-5.1 (April 7, 2026) and DeepSeek V4 (April 24, 2026) as one of a small group of open-weight models that contest the closed-source flagships on coding benchmarks.[11][4][12]

Architecture: 1T MoE With INT4 Native Quantization

Kimi K2.6 is a sparse Mixture-of-Experts model. The full parameter count is 1 trillion, but only 32 billion parameters are activated for any given token during inference. The router selects 8 experts from a pool of 384, plus 1 shared expert that runs on every token.[5] The practical consequence is that K2.6 has the parameter breadth of a 1T dense model while incurring roughly the inference cost of a 32B dense model — a useful property when you are renting GPU time.
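As a concrete sketch of what sparse activation means mechanically, here is a toy top-k router in Python. The 8-of-384 selection, the shared expert, and the 32B-of-1T ratio come from the published specs; the tensor shapes and the routing logic itself are illustrative, not Moonshot's implementation:

```python
import numpy as np

def route(router_logits: np.ndarray, k: int = 8) -> np.ndarray:
    """Return the indices of the top-k scoring experts for each token."""
    return np.argsort(-router_logits, axis=-1)[:, :k]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 384))       # 4 tokens, 384 routed experts

chosen = route(logits)                   # shape (4, 8): 8 experts per token
experts_per_token = chosen.shape[1] + 1  # +1 shared expert on every token
active_fraction = 32e9 / 1e12            # 32B active of 1T total parameters

print(experts_per_token)                 # 9
print(f"{active_fraction:.1%}")          # 3.2%
```

Only ~3.2% of the weights participate in any single forward pass, which is where the "1T breadth at 32B cost" framing comes from.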

The weights are published natively in INT4 quantization, which keeps the released checkpoint at approximately 594GB on Hugging Face.[5] That is still a serious deployment, but INT4-native means there is no FP16 → INT4 quality drop to debate. The model handles text, images, and video in the same architecture — there is no separate vision module bolted on top.[13]
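The checkpoint size is roughly what 4-bit weights predict. A back-of-envelope check (my arithmetic, not Moonshot's): 1T parameters at half a byte each comes to about 466 GiB, with the remainder of the published ~594GB artifact presumably going to higher-precision tensors such as embeddings and quantization scales, plus file overhead:

```python
# Back-of-envelope weight-file size for 1T parameters stored at INT4.
params = 1.0e12
raw_bytes = params * 0.5     # 4 bits = 0.5 bytes per parameter
raw_gib = raw_bytes / 2**30  # bytes → GiB

print(f"{raw_gib:.0f} GiB")  # 466 GiB of pure INT4 weights
```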

Context window is 256K tokens (precisely 262,144), with a maximum output length of 65,536 tokens.[14] That is shorter than Claude Opus 4.6's 1M-token context and GPT-5.4's roughly 1.05M tokens, and the gap is real if your workflow streams entire repositories into a single prompt.[11] For the agent-driven workflows that K2.6 is optimized around, where retrieval and tool calls fragment context anyway, 256K is rarely the bottleneck.

Coding Benchmarks: New Open-Weight High On SWE-Bench Pro

The headline number is SWE-Bench Pro. K2.6 scores 58.6% versus GPT-5.4 at 57.7% and the previous open-weight leader GLM-5.1 at 58.4% — a 0.2-point margin that nudges K2.6 to the top of the open-weight pack.[1][4] At the broader leaderboard level, Anthropic's Claude Opus 4.7 (released April 16, 2026, four days before K2.6) leads SWE-Bench Pro at 64.3%, so K2.6 is the highest-scoring open-weight model on this benchmark, not the highest-scoring model overall.[3] On SWE-Bench Verified, K2.6 scores 80.2% — within 0.6 points of Claude Opus 4.6's 80.8%.[15]

| Benchmark | Kimi K2.6 | GPT-5.4 | Claude Opus 4.6 | Claude Opus 4.7 |
|---|---|---|---|---|
| SWE-Bench Verified | 80.2% | — | 80.8% | 87.6% |
| SWE-Bench Pro | 58.6% | 57.7% | 53.4% | 64.3% |
| HLE with tools | 54.0% | 52.1% | 53.0% | — |
| DeepSearchQA (F1) | 92.5% | 78.6% | — | — |
| BrowseComp (single-agent) | 83.2 | — | — | — |
| BrowseComp (Agent Swarm) | 86.3 | — | — | — |
| AIME 2026 | 96.4% | 99.2% | — | — |
| GPQA-Diamond | 90.5% | 92.8% | — | — |

K2.6 leads GPT-5.4 on the agentic and tool-use benchmarks listed here, and trails on pure reasoning — GPT-5.4 stays ahead on AIME 2026 and GPQA-Diamond by 2-3 points, and Claude Opus 4.7 sits 5.7 points ahead on SWE-Bench Pro.[11][16] The shape of the gap matches Moonshot's pitch: the model is tuned for long-horizon agent execution, not for static math contests.

The 5.6 percentage-point jump on SWE-Bench Pro from K2.5's 53.0% is significant.[17] Caveat: every benchmark figure cited here comes from vendor announcements; independent third-party replication for K2.6 is still in progress as of the April 27 publication date.[11]

Agent Swarm: From 100 to 300 Sub-Agents in One Generation

Kimi K2.6's most aggressive architectural bet is Agent Swarm. Mechanically, it lets the main model decompose a complex task into heterogeneous subtasks, spawn specialized sub-agents to execute them in parallel, and synthesize their outputs through a shared state coordinator.[6]
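The decompose → spawn → synthesize loop can be sketched in a few lines. Everything below is hypothetical structure — Moonshot has not published the orchestrator's internals — but it shows the shape of the pattern: parallel sub-agents writing into shared state that the coordinator reads back:

```python
import asyncio

async def sub_agent(task: str, shared: dict) -> None:
    # Stand-in for a real sub-agent's tool calls and reasoning.
    await asyncio.sleep(0)
    shared[task] = f"result of {task!r}"

async def swarm(goal: str, n_agents: int) -> dict:
    shared: dict[str, str] = {}
    # Decompose the goal into subtasks (trivially uniform here).
    subtasks = [f"{goal}/part-{i}" for i in range(n_agents)]
    # Spawn all sub-agents in parallel.
    await asyncio.gather(*(sub_agent(t, shared) for t in subtasks))
    # Synthesize: the coordinator reads the shared state back.
    return shared

results = asyncio.run(swarm("profile-hot-paths", n_agents=8))
print(len(results))  # 8
```

In a real deployment each `sub_agent` is a full model invocation with its own tool budget, which is exactly why orchestration overhead, not model capability, becomes the constraint at 300 agents.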

Three numbers define the upgrade from K2.5 to K2.6:

| | K2.5 (Jan 2026) | K2.6 (Apr 2026) |
|---|---|---|
| Max sub-agents | 100 | 300 |
| Coordinated steps | 1,500 | 4,000 |
| BrowseComp w/ Swarm | 78.4 | 86.3 |

Kimi K2.5 launched January 27, 2026 with a 100-sub-agent ceiling and a 1,500-step coordination horizon.[18] K2.6 triples the agent count and multiplies the step horizon by roughly 2.7, while pushing the BrowseComp-with-swarm score from 78.4 to 86.3.[2][6] The practical question is whether enterprise orchestration layers can actually drive 300 parallel agents — a separate VentureBeat analysis flagged orchestration overhead as the binding constraint, not model capability.[19]

The Skills capability ships alongside the swarm. Upload a high-quality PDF, spreadsheet, slide, or Word document, and K2.6 captures its structural and stylistic DNA as a reusable Skill that the swarm can apply to future tasks. Moonshot's launch demonstrations included a 100-sub-agent run that matched a single uploaded CV against 100 California job postings and returned 100 customized resumes, and another that turned an astrophysics paper into a 40-page, 7,000-word research output backed by a structured dataset of 20,000+ entries and 14 astronomy-grade charts.[20]

The 13-Hour Exchange-Core Demonstration

Moonshot's headline real-world demo for K2.6 is exchange-core — an 8-year-old open-source financial matching engine. The model worked autonomously for 13 hours, iterated through 12 optimization strategies, made over 1,000 tool calls, and modified more than 4,000 lines of code in a single execution.[21]

What it actually changed is more interesting than the headline numbers. K2.6 read CPU and allocation flame graphs to localize hidden bottlenecks, then reconfigured the core thread topology from 4ME+2RE to 2ME+1RE.[21] After the run, exchange-core's median throughput jumped 185% (0.43 → 1.24 MT/s) and peak throughput jumped 133% (1.23 → 2.86 MT/s).[21]
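Checking the throughput claims against the raw endpoints is plain arithmetic, reproduced here for the skeptical reader. The peak figure matches the quoted 133% exactly; the median endpoints compute to ~188%, a hair above the quoted 185%, presumably because the endpoints themselves are rounded:

```python
def pct_gain(before: float, after: float) -> float:
    """Throughput improvement as a percentage of the starting value."""
    return (after - before) / before * 100

median = pct_gain(0.43, 1.24)      # ~188% from the rounded endpoints
peak = pct_gain(1.23, 2.86)        # ~133%, matching the quoted figure

print(round(median), round(peak))  # 188 133
```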

This is the kind of demonstration that would have been a multi-day human engineering project two years ago. The autonomous-mode caveat is that exchange-core has comprehensive tests and a clear performance metric, which makes it an easier target than a sparsely-tested production codebase. Still, sustaining 13 hours of useful work without supervision is a meaningful operational milestone for a single-model agentic system.

Pricing: $0.95 Input, $4.00 Output

Kimi K2.6's official Moonshot pricing is $0.95 per 1M input tokens, $4.00 per 1M output tokens, and $0.16 per 1M cache-hit tokens — an 83% discount on cached context.[7] At the official rate, K2.6 lands well below the closed flagships (Claude Opus 4.7 at $5/$25 and GPT-5.5 at $5/$30) but above DeepSeek V4's standard tier ($0.30/$0.50) and below GLM-5.1's direct rate ($1.40/$4.40).[8][22] Third-party providers — Fireworks, Parasail, DeepInfra, Together.ai, Cloudflare, Novita, SiliconFlow, and Clarifai — host K2.6 with their own price points; OpenRouter currently lists $0.7448 input / $4.655 output as a representative third-party rate.[23]

For agentic workflows that issue many tool calls and re-read the same context, the cache-hit pricing matters more than the headline rate. At $0.16 per million cache-hit input tokens, repeated context (a fixed system prompt, a stable set of tools, a documents directory) becomes effectively free relative to the output cost.
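To make the cache economics concrete, here is a hypothetical agent loop costed at the official rates quoted above — the 50K-token context and 20 iterations are made-up workload numbers for illustration, not a benchmark:

```python
# Official Moonshot rates, $ per 1M tokens (from the pricing section above).
IN_RATE, OUT_RATE, CACHE_RATE = 0.95, 4.00, 0.16

def call_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Dollar cost of one API call, split by fresh vs cache-hit input."""
    return (fresh_in * IN_RATE + cached_in * CACHE_RATE + out * OUT_RATE) / 1e6

# Hypothetical loop: re-read a 50K-token context 20 times, emit 1K tokens each.
uncached = 20 * call_cost(50_000, 0, 1_000)
cached = call_cost(50_000, 0, 1_000) + 19 * call_cost(0, 50_000, 1_000)

print(f"no cache: ${uncached:.2f}  with cache: ${cached:.2f}")
```

In this toy workload the cached run costs roughly a quarter of the uncached one, which is why the $0.16 cache-hit rate matters more than the $0.95 headline rate for tool-heavy agents.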

What This Changes

K2.6 is one of three open-weight models in April 2026 — alongside Z.ai's GLM-5.1 (April 7) and DeepSeek V4 (April 24) — that credibly contest the closed-source coding frontier.[4][12] On SWE-Bench Pro it nudges past GLM-5.1 by 0.2 points, but Anthropic's Claude Opus 4.7 (April 16, 64.3%) sits clearly ahead at the leaderboard level.[3] The combination of open weights, INT4-native deployment, agentic tuning, and a 300-sub-agent orchestration ceiling is a real product decision: Moonshot is betting that long-horizon agent execution, not single-shot reasoning, is where the next year of frontier work lives.

The remaining open questions are the ones that closed labs do not have to answer publicly. Independent replication of the SWE-Bench Pro and HLE-with-tools numbers will determine whether the leaderboard position survives third-party harnesses. Enterprise-scale orchestration of 300 parallel agents is gated by infrastructure that most teams have not built yet. And K2.6 launched without a published safety evaluation suite, which leaves alignment claims at "trust the benchmarks" rather than "read the report."[11]

For developers building agentic systems today, the calculus is straightforward: K2.6 is open-weight, cheaper than the closed flagships, and competitive on the benchmarks that match agentic workloads. For teams building on Claude Opus 4.6 or GPT-5.4 because of context window or pure-reasoning gaps, those gaps still exist — they just shrank.

Bottom Line

Kimi K2.6 takes the open-weight crown on SWE-Bench Pro by 0.2 points over GLM-5.1, while sitting 5.7 points below Claude Opus 4.7 at the overall leaderboard level. It also ships the largest publicly disclosed single-model swarm ceiling — 300 sub-agents and 4,000 coordinated steps — and a real-world demonstration of 13 hours of autonomous coding. At $0.95 input and $4.00 output per million tokens — undercutting both GPT-5.4 and Claude Opus 4.6 — the cost case is straightforward. The open questions are whether independent third-party harnesses will reproduce the benchmark numbers, and whether enterprise orchestration layers can actually drive a 300-agent swarm in production. As of April 27, 2026, neither has been answered at scale.

For more on the closed-flagship comparison points, see our coverage of GPT-5.4 beating humans on computer-use benchmarks and GPT-5.5's retrained agentic base. For the open-weight context, see DeepSeek V4 and GLM-5.1.

Footnotes

  1. Moonshot AI, "Kimi K2.6 Tech Blog: Advancing Open-Source Coding," April 2026. https://www.kimi.com/blog/kimi-k2-6

  2. MarkTechPost, "Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps," April 20, 2026. https://www.marktechpost.com/2026/04/20/moonshot-ai-releases-kimi-k2-6-with-long-horizon-coding-agent-swarm-scaling-to-300-sub-agents-and-4000-coordinated-steps/

  3. FindSkill, "Claude Opus 4.7 Release Tracker: Shipped April 16, 2026 — First-Week Verdict," April 2026. https://findskill.ai/blog/claude-opus-4-7-release-tracker/

  4. MarkTechPost, "Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution," April 8, 2026. https://www.marktechpost.com/2026/04/08/z-ai-introduces-glm-5-1-an-open-weight-754b-agentic-model-that-achieves-sota-on-swe-bench-pro-and-sustains-8-hour-autonomous-execution/

  5. Hugging Face, "moonshotai/Kimi-K2.6 model card," April 2026. https://huggingface.co/moonshotai/Kimi-K2.6

  6. Verdent Guides, "Kimi K2.6 Agent Swarm: 300 Sub-Agents and 4,000 Steps Explained," April 2026. https://www.verdent.ai/guides/kimi-k2-6-agent-swarm

  7. Kimi.com, "Kimi K2.6 Pricing | API Costs, Plans & Membership," April 2026. https://www.kimi.com/resources/kimi-k2-6-pricing

  8. DeepSeek API Docs, "Models & Pricing," April 2026. https://api-docs.deepseek.com/quick_start/pricing

  9. Verdent Guides, "What Is Kimi K2.6? Moonshot AI's Open-Weight Agent Model Explained," April 2026. https://www.verdent.ai/guides/what-is-kimi-k2-6

  10. InfoQ, "Anthropic Releases Claude Mythos Preview with Cybersecurity Capabilities but Withholds Public Access," April 2026. https://www.infoq.com/news/2026/04/anthropic-claude-mythos/

  11. The Decoder, "Open-weight Kimi K2.6 takes on GPT-5.4 and Claude Opus 4.6 with agent swarms," April 2026. https://the-decoder.com/open-weight-kimi-k2-6-takes-on-gpt-5-4-and-claude-opus-4-6-with-agent-swarms/

  12. VentureBeat, "DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5," April 24, 2026. https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5

  13. Artificial Analysis, "Kimi K2.6: The new leading open weights model," April 2026. https://artificialanalysis.ai/articles/kimi-k2-6-the-new-leading-open-weights-model

  14. Kimi API Platform, "Kimi K2.6 Quickstart," April 2026. https://platform.kimi.ai/docs/guide/kimi-k2-6-quickstart

  15. Anthropic, "Introducing Claude Opus 4.6," 2026. https://www.anthropic.com/news/claude-opus-4-6

  16. OpenAI, "Introducing GPT-5.5," April 23, 2026. https://openai.com/index/introducing-gpt-5-5/

  17. ApiDog, "What is Kimi K2.6? Moonshot AI's 1T-Parameter Open Model Explained," April 2026. https://apidog.com/blog/what-is-kimi-k2-6/

  18. SpectrumAI Lab, "Kimi K2.5: 100 AI Agents Working Together, Complete Guide [2026]," 2026. https://spectrumailab.com/blog/kimi-k2-5-agent-swarm-moonshot-ai-guide-2026

  19. VentureBeat, "Kimi K2.6 runs agents for days — and exposes the limits of enterprise orchestration," April 2026. https://venturebeat.com/orchestration/kimi-k2-6-runs-agents-for-days-and-exposes-the-limits-of-enterprise-orchestration

  20. AImadeTools, "Kimi K2.6 Complete Guide — Open-Source Agentic Model With 300 Sub-Agents," April 2026. https://www.aimadetools.com/blog/kimi-k2-6-complete-guide/

  21. Moonshot AI, "Kimi K2.6 Tech Blog — exchange-core demonstration section," April 2026. https://www.kimi.com/blog/kimi-k2-6

  22. Z.AI Developer Document, "Pricing Overview," April 2026. https://docs.z.ai/guides/overview/pricing

  23. OpenRouter, "Kimi K2.6 — API Pricing & Providers," April 2026. https://openrouter.ai/moonshotai/kimi-k2.6

Frequently Asked Questions

When did Kimi K2.6 become available?

Open weights and the benchmark suite shipped on April 20, 2026. The "Preview" label was removed and K2.6 became generally available across Kimi.com, the Kimi App, the API, and the Kimi Code CLI on April 21, 2026. The preview itself opened to Kimi Code subscribers on April 13, 2026.[2][9]
