Kimi K2.7-Code: Open Weights, First-Party Numbers (2026)
June 13, 2026
Moonshot AI shipped Kimi K2.7-Code on June 12, 2026: a 1-trillion-parameter open-weight coding model with API pricing well below the closed frontier models. It reports gains of roughly 9% to 31% over its predecessor on every benchmark it published. The catch: as of launch, every one of those benchmarks is Moonshot's own.
TL;DR
- Kimi K2.7-Code is a Mixture-of-Experts model with 1T total parameters and 32B active per token, a 256K-token context window, and open weights under a Modified MIT license — released June 12, 2026.12
- API pricing is aggressive: $0.95 per million input tokens, $4.00 per million output tokens, and $0.19 per million on cache hits, per Moonshot's official pricing page.2
- Moonshot reports a +21.8% jump over Kimi K2.6 on its Kimi Code Bench v2 (50.9 → 62.0) and gains on five other suites, plus roughly 30% fewer "thinking" tokens than K2.6.1
- But all six launch benchmarks are Moonshot's proprietary suites. As of June 12, there were no independent third-party results on standard public benchmarks like SWE-bench Verified or Terminal-Bench.3
- On Moonshot's own numbers, K2.7-Code trails GPT-5.5 on every row and Claude Opus 4.8 on five of six; its only win over Opus 4.8 is on one of the two tool-use benchmarks.1
What You'll Learn
- What Kimi K2.7-Code actually is, and how its architecture differs from K2.6
- What Moonshot's launch benchmarks claim — and why practitioners are cautious
- How K2.7-Code's pricing compares to closed frontier coding models
- Where the model genuinely stands out, and where it still trails
- What the "30% fewer reasoning tokens" claim means for your bill
- How to call it through the OpenAI-compatible API, and the constraints to watch
- Whether it's worth adopting today versus waiting for independent results
What Kimi K2.7-Code Is
Kimi K2.7-Code is a coding-specialized, agentic model from Moonshot AI, built on the earlier Kimi K2.6 release. It is designed for long-horizon software engineering rather than general chat: it plans, edits files, runs tools, and debugs across many steps.1
Architecturally it is a Mixture-of-Experts (MoE) transformer holding 1 trillion total parameters and activating 32 billion per token. The design uses 384 experts, with 8 routed plus 1 shared per token, across 61 layers (one of them dense). Attention uses Multi-head Latent Attention (MLA), the feed-forward path uses SwiGLU, and a MoonViT vision encoder adds about 400M parameters for image and video input. The model ships with native INT4 quantization, and the published context window is 256K tokens (262,144).12
This is not a laptop model. The Hugging Face repository weighs in at roughly 595 GB on disk, making self-hosting a server-class commitment even though the weights are openly licensed under a Modified MIT license.1
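The published figures can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope calculation, not a description of how the checkpoint is actually laid out: it assumes every weight is stored at exactly 4 bits, which gives a floor for the on-disk size rather than a prediction. The source does not break down how the remaining space is used.

```python
# Back-of-envelope figures derived from the published specs.
TOTAL_PARAMS = 1.0e12        # 1T total parameters (published)
ACTIVE_PARAMS = 32e9         # 32B active per token (published)
NUM_EXPERTS = 384            # experts per MoE layer (published)
ROUTED_PER_TOKEN = 8         # routed experts per token (published)
CONTEXT_TOKENS = 256 * 1024  # "256K" expressed in tokens
REPO_SIZE_GB = 595           # approximate Hugging Face repo size (published)

# Assumption: every weight is stored at exactly 4 bits (0.5 bytes).
BYTES_PER_PARAM_INT4 = 0.5

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
routed_fraction = ROUTED_PER_TOKEN / NUM_EXPERTS
int4_floor_gb = TOTAL_PARAMS * BYTES_PER_PARAM_INT4 / 1e9

print(f"active parameters per token: {active_fraction:.1%}")
print(f"routed experts used per token: {routed_fraction:.1%}")
print(f"context window in tokens: {CONTEXT_TOKENS:,}")
print(f"INT4 floor for weights: {int4_floor_gb:.0f} GB of ~{REPO_SIZE_GB} GB on disk")
```

Under that assumption the weights alone need about 500 GB, which is consistent with the roughly 595 GB repository and with the point that this is server-class hardware territory.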
The Benchmark Picture — Read the Footnotes
Moonshot published six benchmark rows comparing K2.7-Code against K2.6, OpenAI's GPT-5.5, and Anthropic's Claude Opus 4.8. K2.7-Code beats its own predecessor on every row. The largest absolute jump is on Kimi Code Bench v2, from 50.9 to 62.0 (11.1 points, a 21.8% relative gain); the largest relative gain is on MLS Bench Lite, at +31.5%.1
| Benchmark | Kimi K2.6 | Kimi K2.7-Code | GPT-5.5 | Claude Opus 4.8 | K2.7 vs K2.6 |
|---|---|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 | 69.0 | 67.4 | +21.8% |
| Program Bench | 48.3 | 53.6 | 69.1 | 63.8 | +11.0% |
| MLS Bench Lite | 26.7 | 35.1 | 35.5 | 42.8 | +31.5% |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 | +9.3% |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 | +9.5% |
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 | +11.4% |
Two things stand out once you read across the columns rather than down. First, the improvement over K2.6 is real and consistent. Second, K2.7-Code still trails GPT-5.5 on all six rows (MLS Bench Lite is close, at 35.1 vs 35.5) and Claude Opus 4.8 on five of six. The one win over Opus 4.8 is MCP Mark Verified (81.1 vs 76.4), a tool-use suite, but on the other tool-use benchmark, MCP Atlas, Opus 4.8 leads (81.3 vs 76.0).1
The run conditions also matter for any apples-to-apples reading: K2.7-Code ran inside Kimi Code CLI, GPT-5.5 ran in Codex on its "xhigh" setting, and Opus 4.8 ran in Claude Code on "xhigh." Harness choice affects agentic scores, so these are vendor-configured comparisons, not a neutral leaderboard.1
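The relative-gain column and the head-to-head counts can be recomputed from the table itself. The short script below is only a check on the arithmetic of the vendor-reported scores; it says nothing about whether the scores would reproduce under a neutral harness.

```python
# Scores copied from the launch table: (K2.6, K2.7-Code, GPT-5.5, Claude Opus 4.8)
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0, 69.0, 67.4),
    "Program Bench": (48.3, 53.6, 69.1, 63.8),
    "MLS Bench Lite": (26.7, 35.1, 35.5, 42.8),
    "Kimi Claw 24/7 Bench": (42.9, 46.9, 52.8, 50.4),
    "MCP Atlas": (69.4, 76.0, 79.4, 81.3),
    "MCP Mark Verified": (72.8, 81.1, 92.9, 76.4),
}


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


gains = {
    name: round(relative_gain(k26, k27), 1)
    for name, (k26, k27, _gpt, _opus) in SCORES.items()
}
beats_gpt = [name for name, (_, k27, gpt, _) in SCORES.items() if k27 > gpt]
beats_opus = [name for name, (_, k27, _, opus) in SCORES.items() if k27 > opus]

for name, gain in gains.items():
    print(f"{name}: +{gain}% vs K2.6")
print("rows where K2.7-Code beats GPT-5.5:", beats_gpt)
print("rows where K2.7-Code beats Opus 4.8:", beats_opus)
```

The recomputed gains match the published column, and the win lists confirm the reading above: no rows ahead of GPT-5.5, and only MCP Mark Verified ahead of Opus 4.8.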
Why Practitioners Are Holding Applause
Here is the part that deserves the most weight: every benchmark above is one of Moonshot's own proprietary suites — Kimi Code Bench v2, Program Bench, MLS Bench Lite, Kimi Claw 24/7 Bench, MCP Atlas, and MCP Mark Verified. As of June 12, 2026, VentureBeat reported there were no independent third-party numbers for K2.7-Code on standard public benchmarks such as SWE-bench Verified, SWE-bench Pro, Terminal-Bench, or LiveCodeBench.3
That does not make the gains fake. It makes them unverified. Vendor-run benchmarks are a starting point, not settled evidence, and in coding especially the gap between a curated internal suite and a community-run leaderboard can be large. Practitioner commentary collected by VentureBeat went further, suggesting some of the change reflects more "honest" behavior (for instance, writing real code where the older model leaned on library wrappers) rather than a straightforward jump in capability. In one probe of GPU kernels written by the model, several still failed because of bugs the model itself had introduced.3
The practical takeaway: treat the +21.8% headline as a hypothesis to test on your own repository, not a number to put in a procurement deck.
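One way to turn that into practice is to run both models on the same set of tasks from your own codebase and compare pass/fail outcomes per task. The sketch below assumes you already have such results; the task lists are hypothetical placeholders, and the helper names are illustrative rather than part of any Moonshot tooling.

```python
def compare(baseline: list[bool], candidate: list[bool]) -> dict:
    """Compare per-task pass/fail results for two models on the same tasks."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("both models must be run on the same non-empty task list")

    base_rate = sum(baseline) / len(baseline)
    cand_rate = sum(candidate) / len(candidate)
    return {
        "baseline_pass_rate": base_rate,
        "candidate_pass_rate": cand_rate,
        # Relative gain is undefined when the baseline passes nothing.
        "relative_gain_pct": (
            (cand_rate - base_rate) / base_rate * 100 if base_rate else None
        ),
        # Tasks the candidate newly passes, and tasks it newly breaks.
        "fixed": sum(c and not b for b, c in zip(baseline, candidate)),
        "regressed": sum(b and not c for b, c in zip(baseline, candidate)),
    }


# Hypothetical results on eight tasks from your own repository.
k26_results = [True, False, False, True, False, True, False, False]
k27_results = [True, True, False, True, False, True, True, False]

report = compare(k26_results, k27_results)
print(report)
```

Tracking regressions alongside the headline pass rate matters here: a model can post a higher aggregate score while breaking tasks the older one handled.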
Pricing: The Strongest Verified Claim
If the benchmarks are the soft part of the story, pricing is the hard part — and it's where K2.7-Code looks most compelling. Moonshot's official pricing page lists the model at $0.19 per million tokens on cache hits, $0.95 per million on cache-miss input, and $4.00 per million output.2
Against closed frontier coding models, that's a steep discount. Anthropic lists Claude Opus 4.8 at $5.00 per million input and $25.00 per million output — so K2.7-Code's output tokens are roughly one-sixth the price ($4.00 vs $25.00), before any caching.4
| Model | License | Params | Context | API (in / out per 1M) |
|---|---|---|---|---|
| Kimi K2.7-Code | Modified MIT (open) | 1T / 32B active | 256K | $0.95 / $4.00 |
| Kimi K2.6 | Open-weight | 1T-class MoE | 256K | open-weight |
| Claude Opus 4.8 | Closed | Not disclosed | 1M | $5.00 / $25.00 |
| Qwen3-Coder-480B-A35B | Open (Qwen license) | 480B / 35B active | 256K | varies by host |
⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider before making cost decisions.
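To see what the price gap means for a concrete workload, the sketch below prices the same token volume on both models using the list prices quoted above. The workload size and the 80% cache-hit ratio are hypothetical, the dictionary keys are just labels, and no cache discount is applied to Opus 4.8 because the source gives no cache pricing for it.

```python
# USD per 1M tokens, taken from the prices quoted above.
PRICES = {
    "kimi-k2.7-code": {"input": 0.95, "output": 4.00},
    "claude-opus-4.8": {"input": 5.00, "output": 25.00},
}
KIMI_CACHE_HIT_PRICE = 0.19  # USD per 1M cached input tokens


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD with no caching."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def kimi_cost_with_cache(
    input_tokens: int, output_tokens: int, cache_hit_ratio: float
) -> float:
    """Cost in USD for K2.7-Code when a share of input tokens hit the cache."""
    hits = input_tokens * cache_hit_ratio
    misses = input_tokens - hits
    p = PRICES["kimi-k2.7-code"]
    total = hits * KIMI_CACHE_HIT_PRICE + misses * p["input"]
    total += output_tokens * p["output"]
    return total / 1_000_000


# Hypothetical monthly workload: 20M input tokens, 2M output tokens.
kimi = run_cost("kimi-k2.7-code", 20_000_000, 2_000_000)
opus = run_cost("claude-opus-4.8", 20_000_000, 2_000_000)
kimi_cached = kimi_cost_with_cache(20_000_000, 2_000_000, cache_hit_ratio=0.8)

print(f"K2.7-Code, no cache: ${kimi:.2f}")
print(f"Opus 4.8, no cache:  ${opus:.2f}")
print(f"K2.7-Code, 80% hits: ${kimi_cached:.2f}")
```

For this made-up workload the uncached bill comes to about $27 on K2.7-Code against $150 on Opus 4.8, and heavy cache reuse roughly halves the K2.7-Code figure again.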
The cost story compounds with the ~30% reduction in reasoning tokens Moonshot claims versus K2.6. Reasoning tokens bill as output tokens on most price cards, and agentic coding runs hundreds or thousands of steps — so a per-step cut in "thinking" multiplies across a long run, lowering cost and freeing context budget at the same time.1 Like the benchmark gains, that 30% figure is a vendor claim, but it's a plausible one and easy for teams to measure directly.
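A quick way to reason about that claim is to split each step's output into reasoning tokens and visible tokens and apply the cut to the reasoning share only. The step count and per-step token figures below are hypothetical, and the 30% reduction is Moonshot's number, not a measured one.

```python
OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens (published K2.7-Code price)
REASONING_CUT = 0.30       # Moonshot's claimed reduction vs K2.6 (vendor claim)


def output_cost(
    steps: int,
    reasoning_per_step: float,
    visible_per_step: float,
    reasoning_cut: float = 0.0,
) -> float:
    """Output-token cost in USD for an agent run, billing reasoning as output."""
    reasoning = reasoning_per_step * (1.0 - reasoning_cut)
    total_tokens = steps * (reasoning + visible_per_step)
    return total_tokens * OUTPUT_PRICE_PER_M / 1_000_000


# Hypothetical run: 1,000 steps, 1,500 reasoning + 500 visible tokens per step.
baseline = output_cost(1_000, 1_500, 500)
reduced = output_cost(1_000, 1_500, 500, REASONING_CUT)
saving_pct = (baseline - reduced) / baseline * 100

print(f"without the cut: ${baseline:.2f}")
print(f"with the cut:    ${reduced:.2f}")
print(f"output-cost saving: {saving_pct:.1f}%")
```

Because only the reasoning share shrinks, the overall saving is smaller than 30%: in this example it works out to 22.5% of output cost. The same price is applied to both cases to isolate the token effect, so logging actual reasoning-token counts from your own runs is the way to replace these placeholders.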
Calling the Model: API Notes
The Kimi API is OpenAI-compatible, so adoption is mostly a base-URL swap. The model string is kimi-k2.7-code. A few server-side constraints are worth knowing before you wire it into an agent loop.5
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

resp = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py to remove duplicate code."},
    ],
    max_tokens=32768,  # default cap and also the maximum
)

print(resp.choices[0].message.content)
```
Three rules come straight from the docs. Thinking mode is mandatory — disabling it returns an API error. Sampling is locked to fixed values (temperature 1.0, top_p 0.95, n 1, penalties 0.0); passing anything else errors out. And during multi-step tool calls you must keep each turn's reasoning_content in context, or the next turn throws.5 These are unusual constraints, and they mean some existing agent scaffolds will need small adjustments rather than a drop-in swap.
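In an existing scaffold, those rules usually translate into two small helpers: one that drops sampling overrides the server would reject, and one that carries reasoning_content forward when an assistant turn is appended to the history. The sketch below works on plain dictionaries and makes two assumptions that should be checked against the docs: that "penalties" maps to the standard presence_penalty and frequency_penalty parameters, and that reasoning_content is echoed back as a key on the assistant message. The helper names are illustrative.

```python
# Fixed values described in the docs. Mapping "penalties" to these two
# OpenAI-style parameter names is an assumption.
FIXED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def sanitize_request(kwargs: dict) -> dict:
    """Return a copy of kwargs without sampling values that conflict with the fixed ones."""
    cleaned = dict(kwargs)
    for key, required in FIXED_SAMPLING.items():
        if key in cleaned and cleaned[key] != required:
            del cleaned[key]  # fall back to the server-side fixed value
    return cleaned


def assistant_turn(message: dict) -> dict:
    """Build the history entry for an assistant reply, keeping reasoning_content.

    Assumes the reply has been converted to a dict and exposes the reasoning
    under a "reasoning_content" key.
    """
    turn = {"role": "assistant", "content": message.get("content")}
    for key in ("reasoning_content", "tool_calls"):
        if message.get(key) is not None:
            turn[key] = message[key]
    return turn


request = sanitize_request(
    {"model": "kimi-k2.7-code", "temperature": 0.2, "top_p": 0.95}
)
history_entry = assistant_turn(
    {"content": None, "reasoning_content": "Plan: read utils.py first.", "tool_calls": []}
)
print(request)
print(history_entry)
```

Dropping a conflicting value rather than raising keeps older scaffolds running, at the cost of silently ignoring a caller's intent; a stricter variant could raise instead.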
Where It Fits in the Open-Weight Wave
K2.7-Code lands in the middle of a crowded open-weight coding race. In the past few weeks alone we've covered MiniMax M3's sparse-attention efficiency play, Nex-N2-Pro's run at GPT-5.5, and the broader Chinese open-weight cost war. Kimi's pitch within that field is specialization plus price: a coding-only model, openly licensed, that undercuts closed frontier labs on tokens.
What's missing is the same thing missing from most of these launches — neutral, reproducible evaluation. The model that wins developer trust in this segment may not be the one with the highest vendor chart, but the one whose numbers survive contact with an independent leaderboard.
The Bottom Line
Kimi K2.7-Code is a genuinely interesting release: open weights, a coding-focused 1T MoE, frontier-cheap pricing, and a credible efficiency claim. But the launch benchmarks are entirely first-party, and on Moonshot's own charts the model still sits behind GPT-5.5 and Claude Opus 4.8 on most tasks. The right posture is curiosity, not conversion — pull the weights or hit the API, run it against your own repository, and let independent evaluations catch up before you treat the headline numbers as settled.
Footnotes
1. Asif Razzaq, "Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6," MarkTechPost, June 12, 2026 (release date, Modified MIT license, built on K2.6, MoE architecture: 1T total / 32B active, 384 experts, 61 layers, MLA, SwiGLU, MoonViT +400M, INT4, ~595 GB; full six-row benchmark table with run conditions; ~30% reasoning-token claim; comparator table). https://www.marktechpost.com/2026/06/12/moonshot-ai-releases-kimi-k2-7-code-a-coding-model-reporting-21-8-on-kimi-code-bench-v2-over-k2-6/
2. "Coding Model Kimi K2.7 Code Pricing," Moonshot/Kimi official documentation (cache hit $0.19, cache-miss input $0.95, output $4.00 per 1M tokens; 262,144-token context window). https://platform.kimi.ai/docs/pricing/chat-k27-code
3. "Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out," VentureBeat, June 12, 2026 (all launch benchmarks are Moonshot's proprietary suites; no independent third-party results on SWE-bench Verified, SWE-bench Pro, Terminal-Bench, or LiveCodeBench as of June 12; practitioner skepticism on capability vs. behavior). https://venturebeat.com/technology/kimi-k2-7-code-cuts-thinking-tokens-30-practitioners-say-benchmarks-dont-check-out
4. Comparator pricing and specs (Claude Opus 4.8: closed, 1M context, $5.00 / $25.00 per 1M; Qwen3-Coder-480B-A35B: open, 480B / 35B active, 256K), MarkTechPost "How K2.7-Code Compares" table, June 12, 2026. https://www.marktechpost.com/2026/06/12/moonshot-ai-releases-kimi-k2-7-code-a-coding-model-reporting-21-8-on-kimi-code-bench-v2-over-k2-6/
5. "Kimi K2.7 Code" quickstart, Moonshot/Kimi official documentation (256K context; mandatory thinking mode; fixed sampling parameters; OpenAI-compatible API; model string kimi-k2.7-code; tool-use and reasoning_content constraints; default max_tokens 32,768). https://platform.kimi.ai/docs/guide/kimi-k2-7-code-quickstart