Is MiniMax M3 better than GPT-5.5?

On MiniMax's own SWE-Bench Pro testing, M3 (59.0%) narrowly edges GPT-5.5 (58.6%). But the numbers are self-reported on MiniMax's infrastructure and haven't been independently replicated, so treat the lead as marginal and provisional rather than decisive.¹⁵

How much does MiniMax M3 cost?

About $0.60 per million input tokens and $2.40 per million output tokens at standard API rates, with a 50% launch-week discount initially. That's roughly a tenth of the per-token cost of GPT-5.5 ($5/$30) or Claude Opus 4.8 ($5/$25).⁷⁸⁹

Are MiniMax M3's weights actually open?

Yes, as of June 7, 2026. MiniMax announced it would release the weights and a technical report within about 10 days of the June 1 launch. The weights were published to Hugging Face on June 7, and the full technical report followed on arXiv on June 11, both inside the promised window.¹⁴

How does MiniMax M3 compare to Claude Opus?

MiniMax benchmarked M3 against Opus 4.7 (64.3% on SWE-Bench Pro) and reported approaching it. But Anthropic's newer Opus 4.8, released days before M3, scores 69.2%, so against the current Anthropic frontier M3 trails by about ten points on that benchmark.¹⁶


MiniMax M3: Open-Weight Coding at 1/10 the Cost (2026)

What is MiniMax Sparse Attention (MSA)?

MSA is the new attention architecture in M3. It splits the key-value cache into blocks, uses a lightweight index branch to pick only the relevant blocks for each query, and runs full attention on just those. This keeps a 1M-token context affordable, cutting per-token compute to 1/20th of the previous generation.¹³

June 9, 2026



MiniMax M3 is a Chinese open-weight language model, released June 1, 2026, that pairs frontier-level coding with a 1-million-token context window and native multimodality — at roughly a tenth of the per-token price of GPT-5.5 or Claude Opus.¹² The headline benchmarks are real, but they are self-reported and measured against an Anthropic model that has already been superseded.

The interesting part is the new attention mechanism underneath it, MiniMax Sparse Attention (MSA), which is what makes a 1M-token context affordable to serve.¹ The asterisks are worth your attention too: the numbers come from MiniMax's own test harness, the comparison ceiling was Opus 4.7 rather than the newer Opus 4.8, and for the first six days after launch the "open" weights could not be downloaded.³⁴

TL;DR

MiniMax M3 is an open-weight model from the Shanghai-based AI company MiniMax that reaches frontier-class scores on coding and agentic benchmarks while costing far less to run than the proprietary leaders.¹² Its core innovation is MSA (MiniMax Sparse Attention), which selects only the relevant blocks of the key-value cache instead of attending to every token, cutting per-token compute at 1M context to 1/20th of the previous generation, with more than 9× faster prefill and more than 15× faster decode.¹ On MiniMax's own SWE-Bench Pro run, M3 scores 59.0%, edging GPT-5.5 (58.6%), beating Gemini 3.1 Pro (54.2%), and trailing Claude Opus 4.7 (64.3%).¹⁵ Three caveats matter: every headline benchmark was produced on MiniMax's infrastructure and scaffolding; the comparison used Opus 4.7, while Opus 4.8 (69.2% on SWE-Bench Pro) had already shipped days earlier;³⁶ and despite the "open-weight" framing, the weights did not land on Hugging Face until June 7, 2026 — six days after launch, ahead of MiniMax's own 10-day estimate — with the full technical report following on June 11.¹⁴ Where M3 is unambiguously strong is price: around $0.60 per million input tokens and $2.40 per million output, roughly an order of magnitude below GPT-5.5 and Opus 4.8.⁷⁸⁹

What is MiniMax M3?

MiniMax M3 is a large language model from MiniMax, the Chinese AI company, founded in 2022 and backed by Alibaba and Tencent, that listed in Hong Kong in January 2026.¹⁰ Released on June 1, 2026, M3 is positioned as a frontier model for coding and agentic work, and MiniMax bills it as the first open-weight model to combine three things proprietary labs had kept bundled: top-tier coding, a 1-million-token context window, and native multimodality covering image and video input plus desktop computer use.¹²

Under the hood it is a reasoning model with a toggle that turns extended "thinking" on for complex tasks or off for latency-sensitive ones, at the same price either way.¹ It was trained with mixed modalities from the start, and MiniMax says it scaled training data to the order of 100 trillion tokens.¹ Independent evaluator Artificial Analysis places M3 at 55 on its Intelligence Index — well above the median for models in its price tier, though below the proprietary frontier leaders — which is a useful reality check on the more aggressive self-reported claims.⁴

One detail the marketing glosses over: MiniMax has not disclosed M3's parameter count, and Artificial Analysis initially listed the model as proprietary because the weights were not public at launch.⁴ So while "open-weight" was the pitch from day one, the verifiable facts available during the first week were about the API product, not a downloadable model.

What is MiniMax Sparse Attention (MSA)?

MiniMax Sparse Attention (MSA) is the attention architecture introduced with M3, and it is the reason the model can offer a 1M-token context without compute costs spiraling.¹ Classic full attention compares every token with every other token, so cost grows quadratically with input length. MSA avoids that by computing attention only over the slices of context that matter for the current query.³

Mechanically, MSA works in two stages. The key-value (KV) cache is split into blocks; a lightweight index branch scores the blocks and keeps only the most relevant via top-k selection; then a sparse branch runs full attention on just those blocks.³ Independent analyses of the architecture diagram MiniMax published read M3 as keeping a standard grouped-query attention (GQA) backbone and running attention on the real, uncompressed KV, unlike DeepSeek's multi-head latent attention (MLA), which compresses keys and values into a lower-dimensional space. That reading was inferred from the diagram before MiniMax published its full technical report, so treat the details as provisional.¹¹ MiniMax's own, narrower claim is that MSA partitions the KV into blocks more precisely than rival sparse-attention designs like DeepSeek's DSA and Moonshot's MoBA, achieving higher effective context coverage.¹
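To make the two-stage selection concrete, here is a minimal pure-Python sketch of block-wise top-k attention for a single query. It is an illustration under stated assumptions, not MiniMax's implementation: the function name `block_sparse_attention`, the `block_size` and `top_k` parameters, and especially the index-branch scoring rule (the query dotted with each block's mean key) are hypothetical stand-ins, since the sources describe the index branch only as "lightweight."

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend over only the top_k highest-scoring KV blocks for one query."""
    n = len(keys)
    # Split the KV cache into contiguous blocks of token indices.
    blocks = [list(range(start, min(start + block_size, n)))
              for start in range(0, n, block_size)]

    # Index branch (ASSUMED scoring rule): query . mean key of the block.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_k])  # keep the top_k blocks, in cache order

    # Sparse branch: ordinary scaled softmax attention over selected tokens only.
    token_ids = [i for b in selected for i in blocks[b]]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in token_ids])
    output = [sum(w * values[i][d] for w, i in zip(weights, token_ids))
              for d in range(len(values[0]))]
    return selected, output


# Toy cache: 8 tokens, 2-dimensional keys, blocks of 2 tokens each.
keys = [[0.0, 1.0], [0.0, 1.0], [5.0, 0.0], [5.0, 0.0],
        [-1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[float(i)] for i in range(8)]
selected, output = block_sparse_attention([1.0, 0.0], keys, values,
                                          block_size=2, top_k=2)
# selected == [1, 3]: only those two blocks contribute to the output.
```

Setting `top_k` to the total number of blocks makes the function attend over every token, which recovers ordinary full attention and gives a simple way to compare the sparse and dense outputs on the same toy data.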

MiniMax also reworked the GPU-level execution with what it calls a "KV outer gather Q" approach: instead of loading KV blocks separately for each query, blocks are processed sequentially and every query that needs a block is batched together, so each block is read from memory once in a contiguous pattern.¹³ MiniMax claims this runs more than 4× faster than open-source sparse-attention kernels, and that across ablations MSA matched full attention on the vast majority of capabilities.¹ The payoff it reports: at 1 million tokens of context, M3's per-token compute is 1/20th of the previous generation, with prefill more than 9× faster and decoding more than 15× faster.¹² If you have followed the long-context efficiency race — from subquadratic attention designs to KV-cache compression tricks — MSA is another credible swing at the same problem, this time shipping inside a frontier-class model.
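The scheduling idea behind "KV outer gather Q" can be sketched in a few lines of Python. This is only a bookkeeping illustration of the block-major loop order described above, not a GPU kernel and not MiniMax's code; the helper names `group_queries_by_block` and `count_block_loads` are hypothetical, and the sketch assumes each query lists a block at most once.

```python
from collections import defaultdict


def group_queries_by_block(selected_blocks_per_query):
    """Invert a query -> blocks map into block -> queries, in block order."""
    queries_by_block = defaultdict(list)
    for query_id, block_ids in enumerate(selected_blocks_per_query):
        for block_id in block_ids:
            queries_by_block[block_id].append(query_id)
    # Walking blocks in ascending order mimics a sequential pass over the cache.
    return dict(sorted(queries_by_block.items()))


def count_block_loads(selected_blocks_per_query):
    """Compare block reads for a query-major loop versus a block-major loop."""
    # Query-major: every query fetches each of its selected blocks itself.
    per_query_loads = sum(len(ids) for ids in selected_blocks_per_query)
    # Block-major: each needed block is read once and shared by its queries.
    block_major_loads = len(group_queries_by_block(selected_blocks_per_query))
    return per_query_loads, block_major_loads


# Three queries whose top-k selections overlap on blocks 0 and 2.
selections = [[0, 2], [2, 3], [0, 2]]
schedule = group_queries_by_block(selections)
# schedule == {0: [0, 2], 2: [0, 1, 2], 3: [1]}
loads = count_block_loads(selections)
# loads == (6, 3): six block reads query-by-query, three when batched by block.
```

The more the queries' selections overlap, the larger the saving from the block-major order, which is the intuition behind reading each block from memory once.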

MiniMax M3 benchmarks: strong, but self-reported

On paper, M3's coding numbers are excellent. Across MiniMax's reported results, M3 hits 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas.¹ On autonomous web browsing it posts 83.5 on BrowseComp, ahead of the 79.3 it lists for Opus 4.7.¹³ MiniMax's headline framing is that M3 "surpasses GPT-5.5 and Gemini 3.1 Pro and approaches Opus 4.7" on coding.¹

Here is the part most coverage skips. Every one of those numbers was produced on MiniMax's own infrastructure, using scaffolding it chose (largely Claude Code), against baselines it selected.¹ That is not inherently dishonest — labs routinely benchmark this way — but it means the results are claims awaiting independent replication, not settled facts. TechTimes flagged exactly this on launch day, headlining its coverage "Frontier Claims, Unverified Benchmarks."⁵ The one independent data point we do have, Artificial Analysis's Intelligence Index score of 55, is genuinely good for the price class but does not put M3 atop the frontier.⁴ Treat M3's coding scores as a strong opening bid, not a final verdict.

The Opus 4.8 problem: M3 was measured against the wrong ceiling

The most important caveat is about timing. MiniMax compared M3 against Claude Opus 4.7 and reported "approaching" it on SWE-Bench Pro (59.0% versus 64.3%).¹⁶ But Anthropic had already shipped Claude Opus 4.8 on May 28, 2026 — four days before M3's June 1 launch — and Opus 4.8 scores 69.2% on SWE-Bench Pro, up from 64.3% on 4.7.⁶ The Decoder, covering M3 the same day, noted the gap directly: "Anthropic has since shipped Opus 4.8, a somewhat stronger model."³

That shifts the story. Against the Anthropic model that was actually current at launch, M3's 59.0% trails by roughly ten points, not the comfortable near-parity the chart implies. It is a textbook example of how a self-selected baseline flatters a result.

Model	SWE-Bench Pro	Source of score	Input $/M	Output $/M
MiniMax M3	59.0%	MiniMax (internal harness)	~$0.60	~$2.40
GPT-5.5	58.6%	MiniMax-reported	$5.00	$30.00
Gemini 3.1 Pro	54.2%	MiniMax-reported	—	—
Claude Opus 4.7	64.3%	MiniMax-reported	$5.00	$25.00
Claude Opus 4.8	69.2%	Anthropic (current frontier)	$5.00	$25.00

The table also makes the real selling point obvious — look at the right-hand columns, not just the score column.¹⁶⁷⁸⁹ M3 is not the best coder here; it is by far the cheapest credible one.

MiniMax M3 pricing: where the model actually wins

Strip away the benchmark theater and the genuine story is cost. M3's API runs at roughly $0.60 per million input tokens and $2.40 per million output tokens at standard rates, with a 50% launch-week discount that dropped those to about $0.30 and $1.20 in the first days after release.⁹ Pricing is tiered by input length: requests up to 512K tokens bill at the standard rate, while longer contexts cost more — a sensible split given that most coding and chat sessions never approach the ceiling.¹

Set that next to the proprietary leaders and the gap is an order of magnitude. GPT-5.5 lists at $5.00 input and $30.00 output per million tokens; Claude Opus 4.8 at $5.00 and $25.00.⁷⁸ That puts M3's input price around 12% of GPT-5.5's and its output price around 8%, in line with the "5 to 10% of the cost" framing that accompanied its debut.² For teams running high-volume agentic workloads, a model that lands near GPT-5.5 on coding while costing a tenth as much is a serious proposition, even with the benchmark asterisks. MiniMax also sells subscription token plans that bundle large quotas: $20/month for roughly 1.7 billion tokens, up to $120/month for about 9.8 billion.¹ It is the same aggressive-pricing playbook Chinese open-weight labs have run all year, including with the DeepSeek V4 launch before it.
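For a sense of what those list prices mean on a real bill, here is a small Python sketch that prices one workload across the three models. The per-million rates are the ones quoted above; the workload size (200 million input tokens, 20 million output tokens) is a hypothetical example, the `workload_cost` helper is illustrative, and the calculation deliberately ignores M3's higher tier for inputs above 512K tokens because that rate is not given here.

```python
# (input $/M tokens, output $/M tokens) at standard list rates quoted above.
PRICES_PER_MILLION = {
    "MiniMax M3": (0.60, 2.40),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def workload_cost(model, input_tokens, output_tokens, discount=0.0):
    """Dollar cost of a workload at standard rates, with an optional discount."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return cost * (1.0 - discount)


# Hypothetical monthly agentic workload: 200M input tokens, 20M output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 200_000_000, 20_000_000

for model in PRICES_PER_MILLION:
    print(f"{model}: ${workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):,.2f}")

m3 = workload_cost("MiniMax M3", INPUT_TOKENS, OUTPUT_TOKENS)
gpt = workload_cost("GPT-5.5", INPUT_TOKENS, OUTPUT_TOKENS)
print(f"M3 as a share of GPT-5.5: {m3 / gpt:.1%}")

# The 50% launch-week promotion halves the M3 figure.
launch_week = workload_cost("MiniMax M3", INPUT_TOKENS, OUTPUT_TOKENS, discount=0.5)
```

On this input-heavy mix the M3 bill comes to about $168 against roughly $1,600 for GPT-5.5 and $1,500 for Opus 4.8, so the blended ratio sits near 10%, between the 12% input and 8% output figures.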

The open-weight catch: you couldn't download it at launch

"Open-weight" is doing a lot of work in M3's positioning, so it is worth being precise. At launch, MiniMax said it would publish the model weights and a full technical report on Hugging Face and GitHub "over the next 10 days" — which points to roughly June 11.¹ For the first six days, that had not happened: at launch the weights were not on MiniMax's Hugging Face organization or its M3 GitHub repository, even though the company's earlier M2.7 weights were already public there.⁴ Artificial Analysis, reflecting the pre-release gap, initially classified M3 as a proprietary model with an undisclosed parameter count.⁴ The weights landed on MiniMax's Hugging Face organization on June 7, 2026, and the full MSA technical report followed on arXiv on June 11, 2026 — inside MiniMax's own 10-day window.⁴

MiniMax's track record of open-sourcing its models held: the weights and technical report both arrived within the promised window, letting outside engineers begin verifying the efficiency claims. Readers evaluating M3 before June 7 would have been looking at an API-only product wearing an open-weight label; that gap has since closed.

Bottom line

MiniMax M3 is a real achievement wrapped in a slightly oversold launch. The MSA architecture is a genuinely clever answer to the cost of long context, the multimodal-plus-1M-context combination remains rare outside the major proprietary labs, and the pricing is the kind that reshapes what high-volume teams can afford to run.¹² But the benchmark narrative leans on self-run tests against a baseline (Opus 4.7) that Anthropic had already superseded with Opus 4.8, and for its first six days the "open-weight" label described a release that hadn't shipped yet.³⁴⁶ The honest summary: M3 is probably the best dollar-for-dollar coding model you can hit through an API today, and a weaker claimant to the outright frontier than its own charts suggest. The weights landed on Hugging Face June 7 and the technical report on arXiv June 11, so independent verification of the efficiency and benchmark claims can now begin in earnest.⁴

1. MiniMax, "MiniMax M3: Frontier Coding, 1M Context, Native Multimodality — All in One Model," June 1, 2026 (release date, MSA architecture, efficiency figures, reported benchmarks, pricing tiers, token plans, open-weight/technical-report timeline). https://www.minimax.io/blog/minimax-m3
2. "MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5–10% of the cost," VentureBeat, June 1, 2026 (cost framing, efficiency, positioning). https://venturebeat.com/technology/minimax-m3-debuts-eclipsing-gpt-5-5-and-gemini-3-1-pro-on-key-benchmark-performance-for-just-5-10-of-the-cost
3. "MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders," The Decoder, June 1, 2026 (MSA mechanics, Opus 4.8 note, benchmark context). https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/
4. "MiniMax-M3 — Intelligence, Performance & Price Analysis," Artificial Analysis (Intelligence Index 55, proprietary classification at launch, undisclosed parameters); weights-publication timeline per "MiniMaxAI/MiniMax-M3," Hugging Face (weights published June 7, 2026) and "MiniMax M3 Takes Open-Weight AI Lead: Sparse Attention Architecture Now Verified," TechTimes, June 18, 2026 (confirms technical report published on arXiv June 11, 2026). https://artificialanalysis.ai/models/minimax-m3 ; https://huggingface.co/MiniMaxAI/MiniMax-M3 ; https://www.techtimes.com/articles/318622/20260618/minimax-m3-takes-open-weight-ai-lead-sparse-attention-architecture-now-verified.htm
5. "MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks," TechTimes, June 1, 2026. https://www.techtimes.com/articles/317532/20260601/minimax-m3-open-weight-coding-model-frontier-claims-unverified-benchmarks.htm
6. "Claude Opus 4.8 Release, Benchmarks And More," LLM-Stats (SWE-Bench Pro 69.2%, up from 64.3% on Opus 4.7); release date May 28, 2026 per "Anthropic releases Opus 4.8 with new 'dynamic workflow' tool," TechCrunch, May 28, 2026. https://llm-stats.com/blog/research/claude-opus-4-8-launch
7. "GPT-5.5 — API Pricing," OpenAI / OpenRouter (GPT-5.5 standard pricing $5.00 input / $30.00 output per million tokens). https://openrouter.ai/openai/gpt-5.5
8. "Claude Opus 4.8 API Pricing," price-per-token reference (Opus 4.8 standard pricing $5.00 input / $25.00 output per million tokens). https://pricepertoken.com/pricing-page/model/anthropic-claude-opus-4.8
9. "MiniMax M3 — API Pricing," OpenRouter (standard $0.60/M input, $2.40/M output; 50% launch-week promo $0.30/$1.20). https://openrouter.ai/minimax/minimax-m3
10. "MiniMax doubles in Hong Kong debut, marking yet another Chinese AI listing," CNBC, January 9, 2026 (MiniMax founded 2022, Alibaba/Tencent backing, Hong Kong IPO). https://www.cnbc.com/2026/01/09/minimax-hong-kong-ipo-ai-tigers-zhipu.html
11. "MiniMax Goes Sparse: Decoding M3's Attention from a Single Diagram," Atlas Cloud (Hugging Face community article), May 29, 2026 (independent analysis of MSA: GQA backbone, real/uncompressed KV, comparison with DeepSeek NSA/DSA/CSA; details inferred from MiniMax's diagram, pending the official technical report). https://huggingface.co/blog/AtlasCloud-AI/minimax-goes-sparse

Frequently Asked Questions

What is MiniMax M3?

It's an open-weight large language model released by Chinese AI company MiniMax on June 1, 2026, built for coding and agentic tasks. It combines frontier-class coding performance, a 1-million-token context window, and native multimodality (text, image, and video input) in a single model.¹²
