How big is MiniMax M3?

Roughly 428 billion total parameters with about 23 billion active per token, using a Mixture-of-Experts architecture. 1

What is the context window?

One million tokens, made practical by MiniMax Sparse Attention (MSA). 1

Does MiniMax M3 beat GPT-5.5?

On MiniMax's own SWE-Bench Pro run, M3 scored 59.0% to GPT-5.5's reported 58.6% — a narrow lead on one vendor-run benchmark. It trails Claude Opus 4.8 on every shared benchmark. 2

How much does it cost?

Launch API pricing is $0.30 per million input tokens and $1.20 per million output tokens — a temporary 50% discount off the standard $0.60 / $2.40 rate. 2

Can I run it locally?

The weights are on Hugging Face and can be served with vLLM or SGLang, but at ~428B parameters it needs multiple GPUs. 1

ai-ml

MiniMax M3: Open-Weight 1M-Context Coding Model

June 20, 2026

#minimax m3 #open-weight models #1m context window #ai coding agents #mixture of experts #llm benchmarks #ai-ml

MiniMax M3: Open-Weight 1M-Context Coding Model

MiniMax M3 is an open-weight, natively multimodal Mixture-of-Experts model with a 1-million-token context window, released June 1, 2026. It pairs frontier-level agentic coding scores with a new sparse-attention design that makes long context far cheaper to run.

TL;DR

MiniMax M3 ships ~428B total parameters with ~23B active per token¹, a 1M-token context window, and native text/image/video input. Its headline trick is MiniMax Sparse Attention (MSA), which the company says delivers a 9× prefill and 15× decode speedup over its M2 generation at 1M context while cutting per-token compute to one-twentieth.¹ On MiniMax's own benchmarks it edges GPT-5.5 on SWE-Bench Pro but trails Anthropic's Claude Opus 4.8 across the board²³. One independent index already ranks its reasoning variant first among open-weight models.⁴ The catch most write-ups get wrong: it is not MIT-licensed — commercial use carries conditions.⁵

What you'll learn

What MiniMax M3 actually is, in one paragraph
How MiniMax Sparse Attention makes 1M context affordable
How M3's benchmarks compare to GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.8 — and why the numbers need an asterisk
What it costs to call, and how to self-host it
The license trap: why "open weights" does not mean "free for your startup"
Who should actually reach for M3 today

What is MiniMax M3?

MiniMax M3 is a frontier open-weight model from MiniMax, a Shanghai-based AI lab, announced on June 1, 2026, with the open weights and a technical report (arXiv:2606.13392) following on Hugging Face and GitHub in the days after launch.¹⁶ Architecturally it is a Mixture-of-Experts (MoE) multimodal transformer: roughly 428 billion total parameters, but only about 23 billion are activated per token, so inference compute scales with the active count rather than the full model.¹

Three properties define it. It is natively multimodal — trained on mixed text, image, and video from the first step rather than bolting vision on afterward.¹ It has a 1-million-token context window, enough to hold a large codebase or a long agent trajectory in a single prompt.¹ And it is open weight, downloadable and self-hostable, which separates it from closed competitors like GPT-5.5 and Gemini 3.1 Pro. MiniMax markets M3 as the first open-weight model to combine all three of those traits in one architecture — a claim worth treating as the company's framing rather than an independently audited fact.⁶ The model also exposes two modes: a thinking mode for complex reasoning and long-horizon agent work, and a non-thinking mode for latency-sensitive chat and code completion.¹

MiniMax Sparse Attention (MSA): why 1M context gets cheap

Standard attention cost grows quadratically with sequence length, which is why million-token context windows are usually slow and expensive. M3's answer is MiniMax Sparse Attention (MSA), a sparse-attention operator built specifically for million-token contexts. Compared with grouped-query attention (GQA), MSA cuts both attention compute and memory while, MiniMax says, preserving model quality.¹

The reported gains are large. Against its own M2 generation at a 1-million-token context, MiniMax states M3 runs prefill (reading the prompt) about 9× faster, decode (generating the answer) about 15× faster, and reduces per-token compute to one-twentieth.¹ Those figures come from MiniMax's own measurements, so treat them as vendor claims until third parties reproduce them. The encouraging signal is that the MSA design itself is documented in a public technical report and an open kernel, which makes it far easier to verify than a benchmark score run behind closed doors.¹

Benchmarks: M3 vs GPT-5.5, Gemini 3.1 Pro, and Opus 4.8

Here is where the asterisk matters most. MiniMax reports the following on agentic benchmarks. Every number in this table is vendor-run — measured on MiniMax's own infrastructure, with baselines MiniMax selected, frequently using Claude Code as the agent scaffolding.²

Benchmark	MiniMax M3	GPT-5.5	Gemini 3.1 Pro	Claude Opus 4.8
SWE-Bench Pro	59.0%	58.6%	54.2%	69.2%
Terminal-Bench 2.1	66.0%	—	—	74.6%
OSWorld-Verified	70.0%	—	—	83.4%
BrowseComp	83.5%	—	—	—

Sources: MiniMax-reported figures via TechTimes and VentureBeat.²³

The story those numbers tell is nuanced. On SWE-Bench Pro — a harder benchmark than the saturated SWE-Bench Verified, built from 1,865 real pull requests across 41 actively maintained repositories — M3's 59.0% edges GPT-5.5's reported 58.6% by a fraction of a point and clears Gemini 3.1 Pro.² On BrowseComp, an autonomous web-search task, MiniMax reports 83.5%, the highest among the models it tested.² But against Claude Opus 4.8, M3 trails on every shared benchmark: by roughly 10 points on SWE-Bench Pro, 9 points on Terminal-Bench 2.1, and 13 points on OSWorld-Verified.² So "beats GPT-5.5" is defensible on one vendor-run benchmark; "frontier across the board" is not.

There is one early independent data point. Artificial Analysis, which runs its own composite of benchmarks, scores M3's reasoning variant 55 on its Intelligence Index — first among the open-weight models it tracks, and ahead of several proprietary models, though still below Claude Opus 4.8's score of around 61.⁴ The non-reasoning variant scores 44, well above the roughly-24 median for comparable open models.⁴ That makes M3 a credible open-weight leader even if the broader "frontier" label is generous. For context on how fast open-weight coding models have been closing the gap, see our earlier breakdown of GLM-5.1 beating GPT on coding benchmarks, and our Claude Opus benchmarks and pricing profile for background on the Anthropic line M3 is measured against.

Pricing and how to run it

You can use M3 two ways: call the hosted API, or self-host the open weights.

Through MiniMax's API, launch pricing is $0.30 per million input tokens and $1.20 per million output tokens — a 50% introductory discount, available for the first week of availability, off the standard pay-as-you-go rate of $0.60 / $2.40.² Per MiniMax's launch materials, inputs above and below 512,000 tokens are billed at the same rate structure — there is no long-context step-up surcharge.² For teams that prefer fixed costs, MiniMax also sells subscription tiers through its MiniMax Code interface, starting at $20 per month.² VentureBeat framed the overall economics as landing around 5–10% of the cost of GPT-5.5 or Gemini 3.1 Pro for comparable work, which is the real headline for budget-conscious teams.³ The introductory discount is temporary, so confirm current numbers on MiniMax's official pricing page before committing.

To self-host, the weights are on Hugging Face under MiniMaxAI/MiniMax-M3, with quantized builds already published by the community. MiniMax recommends serving via vLLM or SGLang, and the model is also available through Transformers.¹ Recommended sampling parameters are temperature=1.0, top_p=0.95, top_k=40.¹ At ~428B total parameters this is a multi-GPU deployment, not a laptop model — if you want a genuinely local setup, our guide to running local AI with Ollama and Qwen3 covers smaller models that fit a single machine.

The license catch: read it before you ship

This is the detail several launch write-ups got wrong, including some that called M3 "MIT-licensed." It is not MIT. M3 ships under the MiniMax Community License.⁵

The actual terms: the model is free to use, copy, modify, and distribute for non-commercial purposes. The moment you put it into a commercial product or service, two conditions attach. First, you must prominently display "Built with MiniMax M3" on a related website, UI, blog post, about page, or product docs. Second, you must email MiniMax — a one-time notice if your product earns under $20 million in yearly revenue, or a prior written authorization request if it earns more than that.⁵ The license also forbids uses such as military applications, harming minors, and generating harmful disinformation.⁵

None of this makes M3 unusable commercially — plenty of teams will happily add an attribution line and send an email. But "open weights" is not the same as "do whatever you want," and the difference between this and a true permissive license (MIT, Apache 2.0) matters for legal review. Read the LICENSE file in the repo before building on it.

Who should use MiniMax M3?

Reach for M3 if you need very long context cheaply — large-codebase agents, long document analysis, multi-step browser or terminal automation — and you are comfortable either paying MiniMax's low API rates or running a multi-GPU deployment. Its agentic and web-search numbers are genuinely strong for an open-weight model, and the price-to-capability ratio is its real selling point.

Be more cautious if you need the absolute top of the quality curve, where Claude Opus 4.8 still leads on the shared benchmarks, or if your legal team needs a clean permissive license. And until more independent evaluations land, weight the vendor benchmarks accordingly — the open weights mean the community will pressure-test those claims quickly.

Bottom line

MiniMax M3 is the most interesting open-weight release of the month: a 1M-context, natively multimodal MoE that, on its own benchmarks, trades blows with GPT-5.5 and already tops an independent open-weight leaderboard — at a fraction of proprietary API prices. The honest caveats are that most of its headline numbers are vendor-run, it still trails Claude Opus 4.8 on shared tests, and its license is permissive only until you go commercial. Treat the benchmarks as a starting hypothesis, read the license before you build, and M3 earns a real place in your evaluation shortlist.

MiniMax M3 model card, Hugging Face — "MiniMax-M3 is a native multimodal model with 1M context. It has ~428B parameters and ~23B activated parameters," plus MSA speedup and inference-parameter details. https://huggingface.co/MiniMaxAI/MiniMax-M3/blob/main/README.md ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵
"MiniMax M3 Open-Weight Coding Model: Frontier Claims, Unverified Benchmarks," TechTimes, June 1, 2026. https://www.techtimes.com/articles/317532/20260601/minimax-m3-open-weight-coding-model-frontier-claims-unverified-benchmarks.htm ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹
"MiniMax-M3 debuts, eclipsing GPT-5.5 and Gemini 3.1 Pro on key benchmark performance for just 5-10% of the cost," VentureBeat. https://venturebeat.com/technology/minimax-m3-debuts-eclipsing-gpt-5-5-and-gemini-3-1-pro-on-key-benchmark-performance-for-just-5-10-of-the-cost ↩ ↩² ↩³
"MiniMax-M3 — Intelligence, Performance & Price Analysis," Artificial Analysis. https://artificialanalysis.ai/models/minimax-m3 ↩ ↩² ↩³
MiniMax Community License, official LICENSE file in the model repository. https://huggingface.co/MiniMaxAI/MiniMax-M3/blob/main/LICENSE ↩ ↩² ↩³ ↩⁴ ↩⁵
"MiniMax M3: Open-weight model with a million-token context challenges proprietary leaders," The Decoder. https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/ ↩ ↩²

Frequently Asked Questions

It is open weight — the weights are downloadable and self-hostable — but it is released under the custom MiniMax Community License, not a standard open-source license like MIT or Apache 2.0. Commercial use carries attribution and notification conditions. 5