Mistral Medium 3.5: 128B Open-Weight Frontier Coder
May 1, 2026
TL;DR
Mistral AI released Mistral Medium 3.5 on April 29, 2026 — a 128 billion-parameter dense model with a 256,000-token context window and configurable reasoning effort, published as open weights on Hugging Face under a Modified MIT license.12 The model scores 77.6% on SWE-Bench Verified and 91.4% on the τ³-Telecom agentic benchmark, a narrow lead over Anthropic's previous-generation Claude Sonnet 4.5 (77.2% on SWE-Bench Verified) but a step behind the current Anthropic Sonnet flagship, Sonnet 4.6, at 79.6%.134 API pricing is $1.50 per million input tokens and $7.50 per million output tokens, roughly half the $3.00/$15.00 list price shared by Sonnet 4.5 and Sonnet 4.6.564
The other half of the news is the deployment story. Mistral says Medium 3.5 self-hosts "on as few as four GPUs," and ships an EAGLE speculative-decoding draft head (Mistral-Medium-3.5-128B-EAGLE) for low-concurrency latency-bound serving.17 The model also became the new default for Le Chat and powers a refreshed Mistral Vibe that now runs coding agents in the cloud asynchronously rather than only on a developer's laptop.8
What You'll Learn
- What Mistral Medium 3.5 actually is, and why a dense 128B was an unusual choice in 2026
- The 256K context window, multimodal inputs, and configurable reasoning effort
- The Modified MIT license and where its revenue carve-outs apply
- How Medium 3.5 benchmarks against Claude Sonnet 4.5, Sonnet 4.6, and DeepSeek V4
- How EAGLE speculative decoding and four-GPU self-hosting change the deployment math
- How Le Chat and Mistral Vibe consume the model in production
What Mistral Shipped
On April 29, 2026 Mistral published Medium 3.5 alongside a refresh of its product surface. The release bundle has four pieces:128
| Component | What it is |
|---|---|
| mistralai/Mistral-Medium-3.5-128B | Base 128B dense weights on Hugging Face |
| mistralai/Mistral-Medium-3.5-128B-EAGLE | EAGLE draft head for speculative decoding |
| Mistral API | mistral-medium-3.5 endpoint, $1.50 / $7.50 per million tokens |
| Le Chat + Mistral Vibe | Medium 3.5 set as the new default model |
The Hugging Face collection page consolidates the base model and the Mistral-Medium-3.5-128B-EAGLE draft head used for speculative decoding.2 Third-party packagers — Unsloth, Ollama, NVIDIA NIM — picked up GGUF and container images within hours.910
This is a single model that "merges instruction-following, reasoning, and coding into a single 128B dense model."1 In other words, Mistral collapsed the previous specialised line — Devstral 2 for coding, Magistral for reasoning, Medium 3.1 for general instruction-following — into one weight set. Reasoning effort is configurable per request rather than baked into a separate model variant.
The 128B Dense Choice
Almost every frontier-class open-weight release in early 2026 has been Mixture-of-Experts. DeepSeek V4-Pro is a 1.6 trillion-parameter MoE with 49 billion parameters activated per token; V4-Flash is a 284B MoE with 13B activated.11 GLM 5.1 and Kimi K2.6 are MoE. Even Meta's now-internal Muse Spark was described publicly as a sparse architecture.
Mistral went the other way: a dense 128B model. Every parameter participates in every token. That is heavier per FLOP than an equivalent-quality MoE, but it has two practical consequences:
- Memory residency is predictable. A dense 128B model in BF16 needs roughly 256 GB of weights, which technically fits inside four 80-GB GPUs (320 GB) but leaves little headroom for activations and KV cache. Mistral's "as few as four GPUs" claim rests on FP8 quantisation bringing the weights closer to 128 GB.112
- Speculative decoding is straightforward. EAGLE works best when the draft head shares the target's tokenizer and head, and dense models give you a clean vocabulary to draft against. Mistral's published EAGLE numbers are roughly 1.41× output throughput and around 29% lower end-to-end latency at low concurrency, with an acceptance length of about 1.72 tokens per draft cycle.7
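The memory arithmetic above is simple enough to sanity-check directly. A minimal sketch, using the parameter count and byte widths from the article (the four-GPU split is an illustration of the sizing math, not a published Mistral recipe):

```python
# Back-of-envelope weight footprint for a dense 128B model.
PARAMS = 128e9  # 128 billion parameters, every one active per token

def weight_gb(bytes_per_param: float) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes, matching vendor math)."""
    return PARAMS * bytes_per_param / 1e9

bf16 = weight_gb(2.0)  # ~256 GB: tight on 4x80 GB once KV cache is added
fp8 = weight_gb(1.0)   # ~128 GB: fits a 4x80 GB node with real headroom

print(f"BF16 weights: {bf16:.0f} GB -> {bf16 / 4:.0f} GB per GPU on 4 GPUs")
print(f"FP8  weights: {fp8:.0f} GB -> {fp8 / 4:.0f} GB per GPU on 4 GPUs")
```

The per-GPU numbers explain both claims in the bullet: BF16 consumes 64 of each GPU's 80 GB before any KV cache, while FP8 halves that to 32 GB.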
The trade-off: a dense 128B will not match a 1T-class MoE on raw benchmark ceilings. Medium 3.5's 77.6% on SWE-Bench Verified sits below DeepSeek V4-Pro's 80.6% and below the current closed-source frontier.111 Mistral is betting that "small enough to self-host, big enough to ship" is a more useful operating point for enterprises than the absolute leaderboard top.
The 256K Context Window and Multimodal Inputs
Medium 3.5 supports a 256,000-token context window — large enough to ingest a mid-sized codebase or several hundred pages of policy documents in one pass.12 Mistral has stayed below the megacontext threshold that DeepSeek V4 (1M tokens) and Claude Sonnet 4.6 (1M tokens, GA without surcharge as of February 2026) target.4 The result is a context that is deep but kept inside a single sequence-parallelism regime, which is part of why a four-GPU self-host stays plausible.
The model is multimodal on input: it accepts text and images, but produces only text.213 The vision encoder was trained from scratch to handle variable image sizes and aspect ratios, which is a quieter but useful detail — most open-weight vision encoders stamp images down to a fixed square. For document-parsing and visual-QA workloads in enterprises, native variable-resolution input means fewer pre-processing surprises.
Medium 3.5 also supports function calling and "configurable reasoning effort" per request — the same pattern Anthropic, Google, and OpenAI have been moving toward, where a single model can be asked to spend more or fewer thinking tokens on a problem rather than being forked into a separate "thinking" SKU.12
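A hedged sketch of what a per-request reasoning-effort call might look like. The endpoint shape, `model`, `messages`, and `tools` fields follow Mistral's public chat-completions API; the `reasoning_effort` field name and its values are assumptions based on the article's description of configurable effort, not a documented parameter:

```python
import json

# Request body for POST https://api.mistral.ai/v1/chat/completions (sketch).
payload = {
    "model": "mistral-medium-3.5",
    "reasoning_effort": "high",  # assumed parameter name and values
    "messages": [
        {"role": "user", "content": "Why does this test deadlock?"},
    ],
    "tools": [{  # standard function-calling tool definition
        "type": "function",
        "function": {
            "name": "run_tests",  # hypothetical tool for illustration
            "description": "Run the project test suite",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
}

print(json.dumps(payload, indent=2))
```

The point of the pattern is that effort is a request-time knob on one weight set, so the same deployment serves both quick completions and deliberate multi-step reasoning.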
Benchmarks: Where Medium 3.5 Actually Lands
Mistral's headline benchmarks sit on a narrow strip between the open-weight MoE leaders of April 2026 and the proprietary Sonnet tier:1311144
| Model | License | Total params | SWE-Bench Verified | Notes |
|---|---|---|---|---|
| DeepSeek V4-Pro | MIT | 1.6T (MoE, 49B activated) | 80.6% | 1M context, $1.74 / $3.48 per M tokens |
| Claude Sonnet 4.6 | Closed | Not disclosed | 79.6% | 1M context (no surcharge), $3.00 / $15.00 per M tokens |
| Mistral Medium 3.5 | Modified MIT | 128B (dense) | 77.6% | 256K context, $1.50 / $7.50 per M tokens |
| Claude Sonnet 4.5 (legacy) | Closed | Not disclosed | 77.2% | 200K standard context (1M beta retired April 30, 2026), $3.00 / $15.00 per M tokens |
| DeepSeek V4-Flash | MIT | 284B (MoE, 13B activated) | Not separately disclosed | Cost-optimised tier of V4 |
The Mistral-vs-Sonnet comparison is the cleanest: SWE-Bench Verified is a fixed test set, and all of these models were evaluated against it. Medium 3.5's 77.6% edges legacy Sonnet 4.5's 77.2% by 0.4 percentage points but trails the current Sonnet 4.6 (79.6%, released February 17, 2026, at the same $3/$15 list price as its predecessor) by two points.134 On τ³-Telecom, an agentic-tool-use benchmark used to stress-test multi-turn workflows, Mistral reports 91.4% for Medium 3.5.1 Mistral has not published full MMLU, GPQA, or AIME scores as of publication, so reasoning and general-knowledge ceilings are harder to triangulate than the coding numbers.
Medium 3.5 is not the leaderboard king. DeepSeek V4-Pro stays ahead by three percentage points on SWE-Bench Verified on a model roughly twelve times bigger by parameter count, and Sonnet 4.6 stays ahead by two points on a closed model. The story is what those points cost: V4-Pro needs many more GPUs to host than a 128B dense network, and Sonnet at any version cannot be self-hosted — your prompts run on Anthropic's infrastructure or its cloud partners (AWS Bedrock, Google Vertex AI), but never on hardware you control.
Self-Hosting on Four GPUs
Mistral's "self-hosts on as few as four GPUs" claim is the most consequential part of the release for engineering teams.1 In practice it lands like this:
- BF16 native: 128B parameters × 2 bytes ≈ 256 GB of weights. Add KV cache (which scales with context length) and activations, and you are looking at four to eight 80-GB GPUs depending on context.
- FP8 quantised (Mistral's published recipe): roughly 128 GB of weights, comfortably within a 4×H100 or 4×H200 node, with KV-cache headroom for medium-length contexts.12
- EAGLE draft head (Mistral-Medium-3.5-128B-EAGLE): FP8-quantised, two-layer GQA, about 4 GB of weight overhead. It buys roughly 1.41× output throughput and around 29% lower end-to-end latency in low-concurrency, latency-bound serving, with an average acceptance length of 1.72 tokens per draft cycle.7
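The published EAGLE numbers hang together under a standard first-order model of speculative decoding: each target-model forward pass verifies a batch of drafted tokens and emits the accepted run, so speedup is acceptance length divided by the relative cost of drafting. A sketch (the draft-cost figure is an assumed value chosen for illustration, not a Mistral-published number):

```python
# First-order speculative-decoding estimate (illustrative model, not
# Mistral's benchmarking methodology).

def est_speedup(accepted: float, draft_tokens: int, draft_cost: float) -> float:
    """accepted: avg tokens emitted per target forward pass (article: ~1.72)
    draft_tokens: tokens speculated per cycle
    draft_cost: draft-head cost per token, as a fraction of one target pass
    """
    # One target pass yields `accepted` tokens; drafting adds overhead.
    return accepted / (1.0 + draft_tokens * draft_cost)

# With ~1.72 accepted tokens and a cheap two-layer draft head, the
# first-order estimate lands near the reported ~1.41x.
print(f"{est_speedup(1.72, 3, 0.073):.2f}x")
```

The model also shows why EAGLE pays off mainly at low concurrency: at high batch sizes the target GPU is already saturated, so the "free" verification slots that speculation exploits disappear.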
vLLM is the inference engine Mistral and the third-party docs both lean toward, with SGLang as the alternative that ships first-class EAGLE support.12 If you have ever tried to deploy a 1.6T-parameter MoE in production, the difference is not a rounding error — Medium 3.5 fits inside a single high-end inference node where DeepSeek V4-Pro needs a small cluster.
The Modified MIT License
Mistral published Medium 3.5 under a Modified MIT license: an MIT-style permissive base with revenue-based exceptions for very large companies.12 For independent developers, startups, mid-market vendors, and most universities, the license behaves like vanilla MIT — commercial use, redistribution, and self-hosting are allowed without additional terms. Companies above Mistral's revenue threshold need to come to a separate commercial arrangement.
The exact threshold is documented in the Hugging Face card and Mistral's help center; multiple aggregators have summarised it as a monthly-revenue test, but the precise figure, the trigger conditions, and the carve-outs for usage versus redistribution are worth reading directly from the license text rather than from third-party paraphrases.215 Pulling the license out of LICENSE.md on the Hugging Face repo is a five-minute exercise; trusting a blog summary is the kind of thing that lawyers eventually re-read in a different mood.
Le Chat and Mistral Vibe: How Mistral Uses It
Mistral made Medium 3.5 the default model for two of its own products on the same day:18
Le Chat — Mistral's consumer and enterprise chat assistant — gained a new "Work mode" in preview that supports multi-step tasks like research and cross-tool actions, with Medium 3.5 underneath for Pro, Team, and Enterprise plans.16
Mistral Vibe — the open-source Apache 2.0 CLI coding agent originally announced alongside Devstral 2 — picked up cloud-side execution in this release.817 Until April 29, Vibe was a local-only CLI: you ran it in your terminal, it shelled out to bash, it edited files in your working directory. The new "remote agents" feature lets multiple Vibe agents run simultaneously in Mistral's cloud, kicked off either from the CLI or from inside Le Chat. The use case Mistral pitches is asynchronous coding work — file a task from Le Chat, walk away, come back to a pull-request-ready diff.
That is structurally similar to what GitHub's Copilot coding agent, Anthropic's Claude Code Remote Sessions, and Google's Gemini Enterprise Agent Platform already offer; the Mistral angle is that the underlying model is open-weight and the CLI client is Apache 2.0, so the whole stack can in principle be self-hosted by a customer who does not want to use Mistral's cloud.
How Medium 3.5 Compares to the Sonnet Tier
For most enterprise readers the practical question is "Mistral Medium 3.5 versus the Anthropic Sonnet tier." Mistral's own announcement compared against Sonnet 4.5, but Sonnet 4.6 has been the current Anthropic Sonnet flagship since February 17, 2026, holding the same $3/$15 list price as its predecessor.13564
| Dimension | Mistral Medium 3.5 | Claude Sonnet 4.6 (current) | Claude Sonnet 4.5 (legacy) |
|---|---|---|---|
| License | Modified MIT (open weights) | Closed (API + first-party clients only) | Closed |
| Parameter count | 128B dense | Not disclosed | Not disclosed |
| Context window | 256,000 tokens | 1,000,000 tokens (no surcharge, GA) | 200,000 tokens (1M beta retired April 30, 2026) |
| SWE-Bench Verified | 77.6% | 79.6% | 77.2% |
| Self-hostable | Yes — Mistral cites four GPUs at FP8 | No | No |
| API price (input) | $1.50 / M tokens | $3.00 / M tokens | $3.00 / M tokens |
| API price (output) | $7.50 / M tokens | $15.00 / M tokens | $15.00 / M tokens |
| Multimodal input | Text + image | Text + image + PDFs | Text + image + PDFs |
| Released | April 29, 2026 | February 17, 2026 | September 29, 2025 |
Sonnet 4.6 wins on raw SWE-Bench score, on the now-GA 1M-token context window without surcharges, on PDF input, and on overall product polish — the Anthropic stack is older and more battle-tested. Medium 3.5 wins on price at any context length below 200K, on the option to bring weights in-house, and against Sonnet 4.5 specifically on the coding benchmark Mistral chose to highlight.
The practical decision tree is closer to "which side of the build/buy fence are you on" than "which is the better model." If your security team is uncomfortable with prompts leaving the building, Medium 3.5 is the only one of the three you can deploy. If you want a managed API and are comfortable on that side, the two-point SWE-Bench gap to Sonnet 4.6 may matter more than the price gap when you weigh the operational lift of standing up a four-GPU inference node, three replicas for high availability, and the EAGLE draft pipeline.
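The price side of that fence is easy to quantify from the list prices in the table. A toy comparison (the workload volumes are hypothetical):

```python
# API cost comparison at the article's list prices, USD per million tokens.
PRICES = {
    "mistral-medium-3.5": (1.50, 7.50),
    "claude-sonnet-4.6": (3.00, 15.00),
}

def monthly_cost(model: str, in_mtok: float, out_mtok: float) -> float:
    """Cost in USD for a workload of in/out token volumes (millions)."""
    p_in, p_out = PRICES[model]
    return in_mtok * p_in + out_mtok * p_out

# Hypothetical workload: 400M input + 80M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 400, 80):,.0f}/month")
```

Because Mistral's input and output prices are each exactly half of Sonnet's, the ratio is 2× at any token mix; what the arithmetic cannot capture is the operational cost of the self-hosted alternative, which is the real variable in the decision.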
The European AI Independence Story
Mistral was founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix — three researchers who came together after stints at DeepMind and Meta.18 The company is headquartered in Paris with offices in the U.S., London, Luxembourg, and Singapore, and reportedly employs around 350 people.1819 On March 30, 2026 Mistral raised $830 million in debt from a seven-bank European consortium to fund a 44-megawatt NVIDIA-powered data center at Bruyères-le-Châtel near Paris — its largest infrastructure financing to date, and a separate effort from a $1.4 billion February 2026 pledge with EcoDataCenter for Swedish capacity.20
The strategic frame Mistral has pushed since 2023 is that Europe needs a domestic frontier-class lab whose weights companies can actually inspect, fine-tune, and self-host. Medium 3.5 is the most pointed version of that pitch so far. It is small enough to deploy on European-owned hardware, openly licensed enough to satisfy procurement teams that are wary of U.S. closed APIs, and benchmark-credible enough to be a real choice rather than a political one.
Whether that pitch lands depends less on the model itself than on enterprise buyers' patience for self-hosting frontier-class models. Most companies do not actually want to operate four-GPU inference nodes. Most companies do, however, want the option to walk away from a vendor without rebuilding their stack — and that optionality is exactly what an open-weight 77.6% SWE-Bench model under a Modified MIT license is selling.
Bottom Line
Mistral Medium 3.5 is not the highest-scoring model in the open-weight category — DeepSeek V4-Pro still owns that crown by three percentage points on the headline coding benchmark. It is, however, among the most deployable open-weight frontier-class models published as of April 29, 2026: a dense 128B network that fits inside a single four-GPU node with EAGLE-accelerated inference and a 256K-token context, shipped by one of the few European labs still pushing into this performance tier.
For enterprises that need to keep prompts in-house, for European procurement teams that want a credible domestic option, and for engineering teams that want the price floor of an open-weight model with the deployment ergonomics of a dense network, Medium 3.5 is one of the most interesting releases of the week. The cost of admission is a four-GPU node and a vLLM cluster; the prize is a model that lands within two percentage points of Anthropic's current Sonnet flagship at half the API price, edges past the previous Sonnet generation by 0.4 points on the same benchmark, and gives you the option to walk away from the API entirely.
Footnotes
- Mistral Medium 3.5 — Mistral Docs (model card, version 26-04)
- mistralai/Mistral-Medium-3.5-128B — Hugging Face model card
- Claude Sonnet 4.5: Highest-Scoring Claude Model Yet on SWE-bench — Caylent
- Introducing Claude Sonnet 4.6 — Anthropic (released February 17, 2026; 79.6% SWE-Bench Verified at $3/$15 list price; 1M context GA without surcharge)
- Mistral Medium 3.5 Developer Guide: API & Benchmarks — Lushbinary
- mistralai/Mistral-Medium-3.5-128B-EAGLE — Hugging Face model card
- Remote agents in Vibe — Mistral AI news (April 29, 2026)
- DeepSeek V4 (2026): 1T Parameters, 81% SWE-bench, $0.30/MTok — NxCode
- Self-Host Mistral Medium 3.5: vLLM, SGLang & GPU Guide — Lushbinary
- Mistral Medium 3.5 Launched: What It Means for Self Hosted AI Infrastructure — VRLA Tech
- Mistral Medium 3.5 vs Claude Sonnet 4 vs GPT-4o Compared — Lushbinary
- Under which license are Mistral's open models available? — Mistral AI Help Center
- Mistral AI unveils Medium 3.5 model and Work Mode for Le Chat — testingcatalog.com
- Introducing: Devstral 2 and Mistral Vibe CLI — Mistral AI news
- Mistral secures $830 million in debt financing to fund AI data center — CNBC (March 30, 2026)