MiniMax M3: Open-Weight Coding at 1/10 the Cost (2026)
June 9, 2026
MiniMax M3 is a Chinese open-weight language model, released June 1, 2026, that pairs frontier-level coding with a 1-million-token context window and native multimodality, at roughly a tenth of the per-token price of GPT-5.5 or Claude Opus.[1][2] The headline benchmarks are real, but they are self-reported and measured against an Anthropic model that has already been superseded.
The interesting part is the new attention mechanism underneath it, MiniMax Sparse Attention (MSA), which is what makes a 1M-token context affordable to serve.[1] The asterisks are worth your attention too: the numbers come from MiniMax's own test harness, the comparison ceiling was Opus 4.7 rather than the newer Opus 4.8, and, as of this writing, you still cannot download the "open" weights.[3][4]
TL;DR
MiniMax M3 is an open-weight model from the Shanghai-based AI company MiniMax that reaches frontier-class scores on coding and agentic benchmarks while costing far less to run than the proprietary leaders.[1][2] Its core innovation is MSA (MiniMax Sparse Attention), which selects only the relevant blocks of the key-value cache instead of attending to every token, cutting per-token compute at 1M context to 1/20th of the previous generation, with more than 9× faster prefill and more than 15× faster decode.[1] On MiniMax's own SWE-Bench Pro run, M3 scores 59.0%, edging GPT-5.5 (58.6%), beating Gemini 3.1 Pro (54.2%), and trailing Claude Opus 4.7 (64.3%).[1][5] Three caveats matter: every headline benchmark was produced on MiniMax's infrastructure and scaffolding; the comparison used Opus 4.7, while Opus 4.8 (69.2% on SWE-Bench Pro) had already shipped days earlier;[3][6] and despite the "open-weight" framing, the weights and technical report were still pending their promised release as of June 9, 2026.[1][4] Where M3 is unambiguously strong is price: around $0.60 per million input tokens and $2.40 per million output, roughly an order of magnitude below GPT-5.5 and Opus 4.8.[7][8][9]
What is MiniMax M3?
MiniMax M3 is a large language model from MiniMax, the Chinese AI company that was founded in 2022, is backed by Alibaba and Tencent, and listed in Hong Kong in January 2026.[10] Released on June 1, 2026, M3 is positioned as a frontier model for coding and agentic work, and MiniMax bills it as the first open-weight model to combine three things proprietary labs had kept bundled: top-tier coding, a 1-million-token context window, and native multimodality covering image and video input plus desktop computer use.[1][2]
Under the hood it is a reasoning model with a toggle that turns extended "thinking" on for complex tasks or off for latency-sensitive ones, at the same price either way.[1] It was trained with mixed modalities from the start, and MiniMax says it scaled training data to the order of 100 trillion tokens.[1] Independent evaluator Artificial Analysis places M3 at 55 on its Intelligence Index (well above the median for models in its price tier, though below the proprietary frontier leaders), which is a useful reality check on the more aggressive self-reported claims.[4]
One detail the marketing glosses over: MiniMax has not disclosed M3's parameter count, and Artificial Analysis currently lists the model as proprietary precisely because the weights are not yet public.[4] So while "open-weight" is the pitch, the verifiable facts available today are about the API product, not a downloadable model.
What is MiniMax Sparse Attention (MSA)?
MiniMax Sparse Attention (MSA) is the attention architecture introduced with M3, and it is the reason the model can offer a 1M-token context without compute costs spiraling.[1] Classic full attention compares every token with every other token, so cost grows quadratically with input length. MSA avoids that by computing attention only over the slices of context that matter for the current query.[3]
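To make the scaling argument concrete, here is a rough back-of-the-envelope counter for query-key score computations under causal full attention versus a block-sparse scheme. It is a sketch only: `block_size` and `top_k` are hypothetical parameters (MiniMax has not published M3's values), and the cost of choosing the blocks is ignored.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Causal full attention: token i is scored against tokens 0..i."""
    return n_tokens * (n_tokens + 1) // 2


def block_sparse_pairs(n_tokens: int, block_size: int, top_k: int) -> int:
    """Each query is scored against at most top_k * block_size keys.

    Hypothetical model: early queries see fewer keys than the budget,
    so they are counted exactly as in full attention.
    """
    budget = top_k * block_size
    if n_tokens <= budget:
        return full_attention_pairs(n_tokens)
    # The first `budget` queries behave like full attention; the rest hit the cap.
    return full_attention_pairs(budget) + (n_tokens - budget) * budget


if __name__ == "__main__":
    n = 1_000_000
    dense = full_attention_pairs(n)
    sparse = block_sparse_pairs(n, block_size=64, top_k=32)  # assumed values
    print(f"dense pairs:  {dense:,}")
    print(f"sparse pairs: {sparse:,}")
    print(f"reduction:    {dense / sparse:.0f}x")
```

The dense count grows with the square of the context length while the capped count grows linearly, which is the whole point of the design. The printed reduction depends entirely on the assumed budget and should not be read as M3's reported 1/20th figure.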
Mechanically, MSA works in two stages. The key-value (KV) cache is split into blocks; a lightweight index branch scores the blocks and keeps only the most relevant via top-k selection; then a sparse branch runs full attention on just those blocks.[3] Independent analyses of the architecture diagram MiniMax published read M3 as keeping a standard grouped-query attention (GQA) backbone and running attention on the real, uncompressed KV (unlike DeepSeek's latent-attention (MLA) approach, which compresses keys and values into a lower-dimensional space), though MiniMax has not yet released the full technical report to confirm those details.[11] MiniMax's own, narrower claim is that MSA partitions the KV into blocks more precisely than rival sparse-attention designs like DeepSeek's DSA and Moonshot's MoBA, achieving higher effective context coverage.[1]
MiniMax also reworked the GPU-level execution with what it calls a "KV outer gather Q" approach: instead of loading KV blocks separately for each query, blocks are processed sequentially and every query that needs a block is batched together, so each block is read from memory once in a contiguous pattern.[1][3] MiniMax claims this runs more than 4× faster than open-source sparse-attention kernels, and that across ablations MSA matched full attention on the vast majority of capabilities.[1] The payoff it reports: at 1 million tokens of context, M3's per-token compute is 1/20th of the previous generation, with prefill more than 9× faster and decoding more than 15× faster.[1][2] If you have followed the long-context efficiency race, from subquadratic attention designs to KV-cache compression tricks, MSA is another credible swing at the same problem, this time shipping inside a frontier-class model.
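The two-stage idea can be sketched for a single query in plain Python. This is an illustration of generic block top-k sparse attention, not MiniMax's implementation: the index branch here scores each block by the dot product of the query with the block's mean key, which is an assumption, and all function names are hypothetical.

```python
import math


def split_blocks(items, block_size):
    """Cut a list of vectors into consecutive blocks of block_size."""
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_blocks(query, key_blocks, top_k):
    """Index branch (assumed scoring rule): q . mean(keys in block), keep top_k."""
    scores = [dot(query, mean_vector(block)) for block in key_blocks]
    ranked = sorted(range(len(key_blocks)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def sparse_attention(query, keys, values, block_size, top_k):
    """Sparse branch: ordinary softmax attention over the selected blocks only."""
    key_blocks = split_blocks(keys, block_size)
    value_blocks = split_blocks(values, block_size)
    chosen = select_blocks(query, key_blocks, top_k)

    sel_keys = [k for b in chosen for k in key_blocks[b]]
    sel_values = [v for b in chosen for v in value_blocks[b]]

    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, k) * scale for k in sel_keys]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)

    dim = len(sel_values[0])
    output = [sum(w * v[d] for w, v in zip(weights, sel_values)) / total
              for d in range(dim)]
    return output, chosen


if __name__ == "__main__":
    q = [1.0, 0.0]
    K = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    V = [[10.0], [10.0], [1.0], [3.0]]
    print(sparse_attention(q, K, V, block_size=2, top_k=1))
```

Setting `top_k` to the total number of blocks recovers full attention, which is a handy sanity check when experimenting with the selection rule.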
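The access-pattern change behind "KV outer gather Q" can be illustrated without any GPU code. The sketch below only counts block loads under the two loop orders; it is a toy model with hypothetical names, not MiniMax's kernel, and it leaves out the step a real kernel would need to merge each query's partial softmax results across blocks.

```python
from collections import defaultdict


def group_queries_by_block(selected_blocks):
    """Invert {query -> blocks it selected} into {block -> queries that need it}."""
    by_block = defaultdict(list)
    for query_id, blocks in enumerate(selected_blocks):
        for block_id in blocks:
            by_block[block_id].append(query_id)
    return dict(sorted(by_block.items()))


def loads_query_major(selected_blocks):
    """Query-outer loop: every query fetches each of its blocks separately."""
    return sum(len(blocks) for blocks in selected_blocks)


def loads_block_major(selected_blocks):
    """Block-outer loop: each needed block is fetched once for all its queries."""
    return len(group_queries_by_block(selected_blocks))


if __name__ == "__main__":
    # Three queries and the block indices each one selected (made-up data).
    selections = [[0, 2], [0, 1], [0, 2]]
    print(group_queries_by_block(selections))
    print(loads_query_major(selections), loads_block_major(selections))
```

The more the queries' selections overlap, the larger the gap between the two counts, which is where a block-outer ordering would be expected to save memory traffic.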
MiniMax M3 benchmarks: strong, but self-reported
On paper, M3's coding numbers are excellent. Across MiniMax's reported results, M3 hits 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas.[1] On autonomous web browsing it posts 83.5 on BrowseComp, ahead of the 79.3 it lists for Opus 4.7.[1][3] MiniMax's headline framing is that M3 "surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7" on coding.[1]
Here is the part most coverage skips. Every one of those numbers was produced on MiniMax's own infrastructure, using scaffolding it chose (largely Claude Code), against baselines it selected.[1] That is not inherently dishonest (labs routinely benchmark this way), but it means the results are claims awaiting independent replication, not settled facts. TechTimes flagged exactly this on launch day, headlining its coverage "Frontier Claims, Unverified Benchmarks."[5] The one independent data point we do have, Artificial Analysis's Intelligence Index score of 55, is genuinely good for the price class but does not put M3 atop the frontier.[4] Treat M3's coding scores as a strong opening bid, not a final verdict.
The Opus 4.8 problem: M3 was measured against the wrong ceiling
The most important caveat is about timing. MiniMax compared M3 against Claude Opus 4.7 and reported "approaching" it on SWE-Bench Pro (59.0% versus 64.3%).[1][6] But Anthropic had already shipped Claude Opus 4.8 on May 28, 2026, four days before M3's June 1 launch, and Opus 4.8 scores 69.2% on SWE-Bench Pro, up from 64.3% on 4.7.[6] The Decoder, covering M3 the same day, noted the gap directly: "Anthropic has since shipped Opus 4.8, a somewhat stronger model."[3]
That shifts the story. Against the Anthropic model that was actually current at launch, M3's 59.0% trails by roughly ten points, not the comfortable near-parity the chart implies. It is a textbook example of how a self-selected baseline flatters a result.
| Model | SWE-Bench Pro | Source of score | Input $/M | Output $/M |
|---|---|---|---|---|
| MiniMax M3 | 59.0% | MiniMax (internal harness) | ~$0.60 | ~$2.40 |
| GPT-5.5 | 58.6% | MiniMax-reported | $5.00 | $30.00 |
| Gemini 3.1 Pro | 54.2% | MiniMax-reported | — | — |
| Claude Opus 4.7 | 64.3% | MiniMax-reported | $5.00 | $25.00 |
| Claude Opus 4.8 | 69.2% | Anthropic (current frontier) | $5.00 | $25.00 |
The table also makes the real selling point obvious: look at the right-hand columns, not just the score column.[1][6][7][8][9] M3 is not the best coder here; it is by far the cheapest credible one.
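The baseline effect is easy to check by hand, and a few lines of Python make the arithmetic explicit. The scores are copied from the table above; the helper name `gap_to` is just for illustration.

```python
# SWE-Bench Pro scores (percent) as listed in the table above.
SWE_BENCH_PRO = {
    "MiniMax M3": 59.0,
    "GPT-5.5": 58.6,
    "Gemini 3.1 Pro": 54.2,
    "Claude Opus 4.7": 64.3,
    "Claude Opus 4.8": 69.2,
}


def gap_to(model, baseline, scores=SWE_BENCH_PRO):
    """Percentage points by which `model` trails `baseline` (negative if ahead)."""
    return round(scores[baseline] - scores[model], 1)


if __name__ == "__main__":
    for baseline in ("Claude Opus 4.7", "Claude Opus 4.8"):
        print(baseline, gap_to("MiniMax M3", baseline))
```

Against Opus 4.7 the gap is 5.3 points; against Opus 4.8 it is 10.2, nearly double, from nothing more than a change of baseline.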
MiniMax M3 pricing: where the model actually wins
Strip away the benchmark theater and the genuine story is cost. M3's API runs at roughly $0.60 per million input tokens and $2.40 per million output tokens at standard rates, with a 50% launch-week discount that dropped those to about $0.30 and $1.20 in the first days after release.[9] Pricing is tiered by input length: requests up to 512K tokens bill at the standard rate, while longer contexts cost more, a sensible split given that most coding and chat sessions never approach the ceiling.[1]
Set that next to the proprietary leaders and the gap is an order of magnitude. GPT-5.5 lists at $5.00 input and $30.00 output per million tokens; Claude Opus 4.8 at $5.00 and $25.00.[7][8] That puts M3's input price around 12% of GPT-5.5's and its output price around 8%, close to the "5 to 10% of the cost" framing that accompanied its debut.[2] For teams running high-volume agentic workloads, a model that lands near GPT-5.5 on coding while costing a tenth as much is a serious proposition, even with the benchmark asterisks. MiniMax also sells subscription token plans that bundle large quotas: $20/month for roughly 1.7 billion tokens, up to $120/month for about 9.8 billion.[1] It is the same aggressive-pricing playbook we have watched Chinese open-weight labs run all year, including the DeepSeek V4 launch before it.
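For a rough sense of what the price gap means on a real bill, the sketch below applies the standard list prices quoted in this article to a workload. It is a simplification under stated assumptions: it ignores the launch discount, the higher tier for inputs above 512K tokens (whose rate is not given here), caching, and subscription plans, and the function names are illustrative.

```python
# USD per million tokens as (input, output), standard rates quoted in the article.
PRICES = {
    "MiniMax M3": (0.60, 2.40),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def workload_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Cost in USD at flat standard rates (no discounts or long-context tier)."""
    in_rate, out_rate = prices[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def price_ratio(model, baseline, prices=PRICES):
    """(input, output) price of `model` as a fraction of `baseline`."""
    m_in, m_out = prices[model]
    b_in, b_out = prices[baseline]
    return m_in / b_in, m_out / b_out


if __name__ == "__main__":
    # Hypothetical monthly agentic workload: 200M input and 20M output tokens.
    for name in PRICES:
        print(f"{name}: ${workload_cost(name, 200_000_000, 20_000_000):,.2f}")
    print(price_ratio("MiniMax M3", "GPT-5.5"))
```

For that hypothetical workload the totals come to $168 for M3, $1,600 for GPT-5.5, and $1,500 for Opus 4.8. The exact ratio shifts with the input-to-output mix, since M3's discount is steeper on output tokens than on input.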
The open-weight catch: you can't download it yet
"Open-weight" is doing a lot of work in M3's positioning, so it is worth being precise. At launch, MiniMax said it would publish the model weights and a full technical report on Hugging Face and GitHub "over the next 10 days," which points to roughly June 11.[1] As of June 9, 2026, that had not happened: the weights are not on MiniMax's Hugging Face organization or its M3 GitHub repository, even though the company's earlier M2.7 weights are already public there.[4] Artificial Analysis, reflecting this, still classifies M3 as a proprietary model with an undisclosed parameter count.[4]
None of this means the weights won't arrive — MiniMax has a track record of open-sourcing its models, and the release window is still open. But until they land, M3 is functionally an API-only product wearing an open-weight label, and the usual benefits of open weights (self-hosting, fine-tuning, auditing the architecture) remain promises rather than options. If you are evaluating M3 specifically because it is open, that is the box to watch this week.
Bottom line
MiniMax M3 is a real achievement wrapped in a slightly oversold launch. The MSA architecture is a genuinely clever answer to the cost of long context, the multimodal-plus-1M-context combination remains rare outside the major proprietary labs, and the pricing is the kind that reshapes what high-volume teams can afford to run.[1][2] But the benchmark narrative leans on self-run tests against a baseline (Opus 4.7) that Anthropic had already superseded with Opus 4.8, and the "open-weight" label describes a release that hasn't shipped yet.[3][4][6] The honest summary: M3 is probably the best dollar-for-dollar coding model you can hit through an API today, and a weaker claimant to the outright frontier than its own charts suggest. The independent benchmarks and the actual weights (both expected within days) will settle which half of that sentence carries more weight.
Related reading: DeepSeek V4: open-weight frontier at 1/7 the cost, China's open-weight coding wave, and Claude Opus 4.8: benchmarks and pricing.
Footnotes
1. MiniMax, "MiniMax M3: Frontier Coding, 1M Context, Native Multimodality — All in One Model," June 1, 2026 (release date, MSA architecture, efficiency figures, reported benchmarks, pricing tiers, token plans, open-weight/technical-report timeline). https://www.minimax.io/blog/minimax-m3
2. "MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5–10% of the cost," VentureBeat, June 1, 2026 (cost framing, efficiency, positioning). https://venturebeat.com/technology/minimax-m3-debuts-eclipsing-gpt-5-5-and-gemini-3-1-pro-on-key-benchmark-performance-for-just-5-10-of-the-cost
3. "MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders," The Decoder, June 1, 2026 (MSA mechanics, Opus 4.8 note, benchmark context). https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/
4. "MiniMax-M3 — Intelligence, Performance & Price Analysis," Artificial Analysis (Intelligence Index 55, proprietary classification, undisclosed parameters, weights not yet public). https://artificialanalysis.ai/models/minimax-m3
5. "MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks," TechTimes, June 1, 2026. https://www.techtimes.com/articles/317532/20260601/minimax-m3-open-weight-coding-model-frontier-claims-unverified-benchmarks.htm
6. "Claude Opus 4.8 Release, Benchmarks And More," LLM-Stats (SWE-Bench Pro 69.2%, up from 64.3% on Opus 4.7); release date May 28, 2026 per "Anthropic releases Opus 4.8 with new 'dynamic workflow' tool," TechCrunch, May 28, 2026. https://llm-stats.com/blog/research/claude-opus-4-8-launch
7. "GPT-5.5 — API Pricing," OpenAI / OpenRouter (GPT-5.5 standard pricing $5.00 input / $30.00 output per million tokens). https://openrouter.ai/openai/gpt-5.5
8. "Claude Opus 4.8 API Pricing," price-per-token reference (Opus 4.8 standard pricing $5.00 input / $25.00 output per million tokens). https://pricepertoken.com/pricing-page/model/anthropic-claude-opus-4.8
9. "MiniMax M3 — API Pricing," OpenRouter (standard $0.60/M input, $2.40/M output; 50% launch-week promo $0.30/$1.20). https://openrouter.ai/minimax/minimax-m3
10. "MiniMax doubles in Hong Kong debut, marking yet another Chinese AI listing," CNBC, January 9, 2026 (MiniMax founded 2022, Alibaba/Tencent backing, Hong Kong IPO). https://www.cnbc.com/2026/01/09/minimax-hong-kong-ipo-ai-tigers-zhipu.html
11. "MiniMax Goes Sparse: Decoding M3's Attention from a Single Diagram," Atlas Cloud (Hugging Face community article), May 29, 2026 (independent analysis of MSA — GQA backbone, real/uncompressed KV, comparison with DeepSeek NSA/DSA/CSA; details inferred from MiniMax's diagram, pending the official technical report). https://huggingface.co/blog/AtlasCloud-AI/minimax-goes-sparse