Is MAI-Thinking-1 better than Claude?

On Microsoft's own benchmarks it is "toe-to-toe" with Claude Opus 4.6 on SWE-Bench Pro and was preferred over Claude Sonnet 4.6 in a 1,276-task blind human review run by Surge. These are Microsoft-reported results; independent benchmarks may differ. 2

How much do the MAI models cost?

GitHub's Copilot pricing page lists MAI-Code-1-Flash at $0.75 per million input tokens, $0.075 per million cached input tokens, and $4.50 per million output tokens, in line with Copilot's move to usage-based AI Credits billing on June 1, 2026; MAI-Thinking-1's pricing was not publicly disclosed as of June 3, 2026, while it is in private preview on Microsoft Foundry. 1 2

Were the MAI models trained on OpenAI data?

No. Microsoft says all MAI models were trained from scratch on clean, appropriately licensed data, without distillation from third-party models, and with AI-generated content excluded from pre-training. 2 3

ai-ml

Microsoft MAI Models: In-House AI in Copilot (2026)

Q: How big is MAI-Thinking-1?

MAI-Thinking-1 is a sparse Mixture-of-Experts model with 35 billion active parameters and roughly 1 trillion total parameters, and it supports a 256,000-token context window. 2

June 3, 2026

#microsoft mai #mai-code-1-flash #mai-thinking-1 #github copilot #microsoft foundry #reasoning model #ai coding

Microsoft MAI Models: In-House AI in Copilot (2026)

On June 2, 2026, at Build, Microsoft launched a family of seven in-house MAI models — most notably MAI-Code-1-Flash, a sparse Mixture-of-Experts coding model with 137 billion total parameters and 5 billion active parameters now rolling out inside GitHub Copilot, and MAI-Thinking-1, a 35B-active / ~1-trillion-total-parameter reasoning model that Microsoft says goes toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro. Both were trained from scratch on clean, licensed data with no distillation from other labs — Microsoft's clearest move yet to build first-party models alongside its OpenAI partnership.¹²³

TL;DR

Microsoft AI, the group led by Mustafa Suleyman, announced the MAI model family at Build 2026: reasoning, coding, image, voice, and transcription models built entirely in-house.³ The two that matter most for developers ship now or soon: MAI-Code-1-Flash is reaching GitHub Copilot users in VS Code today, and MAI-Thinking-1 is in private preview on Microsoft Foundry.¹² Microsoft trained both from the ground up — no distillation from third-party models — and co-designed them with its own Maia 200 inference silicon.³ The strategic subtext is independence: usable first-party models so the OpenAI relationship no longer defines the ceiling of Microsoft's AI business.

What Microsoft launched

MAI-Thinking-1 and MAI-Code-1-Flash are the headline releases, but they arrive as part of a seven-model family. Microsoft AI describes the lineup as "a family of seven new models developed in-house," spanning reasoning, coding, image generation, voice, and transcription, all built on a shared foundation with "zero distillation."³ Alongside the two developer-facing models, the family includes MAI-Image-2.5 (with a Flash variant), MAI-Transcribe-1.5, and MAI-Voice-2 (with MAI-Voice-2-Flash coming soon).³

The unifying theme is self-sufficiency. Microsoft says every component — architecture, training pipeline, and post-training — was built in-house, with datasets that are "clean and appropriately licensed" and AI-generated content excluded from pre-training.²³ The models are distributed through Microsoft Foundry (the platform renamed from Azure AI Foundry on January 1, 2026) and are also being made available on OpenRouter, Fireworks, and Baseten, where developers can tune the weights themselves.³

MAI-Code-1-Flash: a coding model built for Copilot

MAI-Code-1-Flash is the model most developers will touch first. It is a lightweight, agentic coding model — a sparse Mixture-of-Experts architecture with 137 billion total parameters but only 5 billion active per token, which Microsoft positions as "comparable to Haiku but cheaper" — and it is rolling out to GitHub Copilot individual users in Visual Studio Code, both in the model picker and under the default Auto picker.¹³ Crucially, Microsoft trained it directly against the GitHub Copilot harnesses used in production, so it learns to interact with the surrounding tools and systems for agentic coding tasks in the same environment where it runs.¹

On Microsoft's benchmarks, MAI-Code-1-Flash beats Claude Haiku 4.5 across all four coding evaluations tested — SWE-Bench Verified, SWE-Bench Pro, SWE-Bench Multilingual, and Terminal Bench 2 — with its widest margin a +16-point lead on SWE-Bench Pro (51.2% vs. 35.2%).¹ Just as important for cost, Microsoft says the model uses "adaptive solution length control" to stay concise on easy requests and spend more reasoning budget on hard ones, solving harder problems with up to 60% fewer tokens on SWE-Bench Verified.¹

Benchmark (Microsoft harness)	MAI-Code-1-Flash	Claude Haiku 4.5
SWE-Bench Pro (pass rate)	51.2%	35.2%
IF Bench (instruction following)	+28.9 pts vs. Haiku	baseline
Advanced IF (rubric-based)	+14.5 pts vs. Haiku	baseline
Adversarial reasoning (186 Q, 34 categories)	85.8% adjusted	lower overall

Microsoft also built a 186-question, 34-category adversarial benchmark — inverted classics, impossible tasks, underdetermined prompts — to separate real reasoning from memorization. MAI-Code-1-Flash reached 85.8% adjusted accuracy, though Microsoft notes some categories such as Einstellung traps stayed below 50%, leaving room to grow.¹

If you already follow how GitHub Copilot is evolving toward agentic, spec-driven workflows, MAI-Code-1-Flash fits the pattern: a model tuned for the harness rather than for leaderboards.

MAI-Thinking-1: a mid-weight reasoning model

MAI-Thinking-1 is Microsoft AI's flagship reasoning model, and Microsoft trained it from scratch in-house without distilling from another lab's model — unlike its earlier MAI-DS-R1, which was a post-trained variant of DeepSeek-R1.²⁴ It is a sparse Mixture-of-Experts model with 35 billion active parameters and roughly 1 trillion total, giving it a smaller inference footprint than much larger models.²

The headline performance claims:

97.0% on AIME 2025 and 94.5% on AIME 2026, the competition-math benchmarks that test multi-step reasoning.²
On SWE-Bench Pro, Microsoft says it is "toe-to-toe with Claude Opus 4.6" — notable for a mid-weight model against a frontier flagship.²
In a blind human side-by-side evaluation spanning 1,276 tasks run by Microsoft's rating partner Surge, raters preferred MAI-Thinking-1 over Claude Sonnet 4.6.²

For builders, the enterprise plumbing matters as much as the scores. MAI-Thinking-1 ships with a 256,000-token context window — enough to hold a roughly 600-page document in a single pass — plus function calling, developer-instruction layering, and compatibility with the widely used Chat Completions API.² If you are weighing how much that window buys you in practice, our guide to context-window optimization for LLMs covers the trade-offs. The model is in private preview on Microsoft Foundry now, with public preview on the MAI Playground promised soon.²

Why "no distillation" is the real story

Most of the marketing around MAI emphasizes how the models were built, not just what they score. Microsoft frames this as a "hill-climbing machine": a pipeline where capabilities are "learned, not inherited," trained on clean licensed data, with self-sufficiency "across the entire stack."³ The argument is that a model distilled from a teacher inherits that teacher's design choices and struggles to adapt — whereas a model trained from the ground up is more steerable.²

That philosophy connects directly to silicon. Microsoft says it co-designed the MAI models with its own Maia 200 accelerator — the TSMC 3nm inference chip it introduced in January 2026 — and is already seeing a 1.4× efficiency boost from that co-design, with a next-generation GB200 cluster now operational.³ Owning the model, the data pipeline, and the inference hardware is the whole point: it is what lets Microsoft control cost and behavior end to end.

Frontier Tuning and the customization angle

Beyond the base models, Microsoft is pushing a customization layer it calls Frontier Tuning — using reinforcement-learning environments ("training gyms") so a MAI model can learn from a customer's own workflow traces, with the resulting model staying private to that customer.³ Microsoft cites two early data points: a MAI model tuned for Excel that it says "matches GPT 5.4 while being up to 10× more efficient," and a MAI model tuned to McKinsey's enterprise standards that it says "achieved the highest win rate of any model tested at roughly 10× lower cost."³ These are vendor-reported figures from internal evaluations, not independently verified, so treat them as directional.

Microsoft also announced a healthcare collaboration with the Mayo Clinic to co-create a frontier clinical model, which will be owned by Mayo Clinic and later distributed through Foundry.³

What it means for the OpenAI relationship

The framing across press coverage is that MAI lessens Microsoft's reliance on OpenAI — Microsoft has committed a reported $13 billion to the startup since 2019 — while lowering costs for developers.⁵ Microsoft's own framing is more measured: Foundry is a multi-model platform where MAI models run alongside those from OpenAI, Anthropic, Meta, Mistral, and other providers, rather than replacing them.⁶ The honest read is "optionality, not divorce." Microsoft keeps OpenAI in the stack but now has credible first-party models so that relationship no longer caps what Microsoft can ship. For more on how that partnership has been renegotiated, see our breakdown of the Microsoft–OpenAI deal and its AGI clause.

The bottom line

MAI is Microsoft's statement that it can build frontier-adjacent models itself. MAI-Code-1-Flash is the one developers can judge immediately — it is in Copilot now, and the claim of higher pass rates at up to 60% fewer tokens is exactly the kind of efficiency that matters in daily use. MAI-Thinking-1 is the more strategic bet: a mid-weight reasoning model trained without distillation, co-designed with Microsoft's own silicon, and pitched at enterprises that want to tune and own their models. Whether the benchmark claims hold up under independent testing is the open question — but the direction is unmistakable.

Microsoft AI, "Introducing MAI-Code-1-Flash," June 2, 2026. https://microsoft.ai/news/introducingmai-code-1-flash/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹
Microsoft AI, "Introducing MAI-Thinking-1," June 2, 2026. https://microsoft.ai/news/introducing-mai-thinking-1/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵
Mustafa Suleyman / Microsoft AI, "Building a hill-climbing machine: Launching seven new MAI models," June 2, 2026. https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵
Microsoft, "MAI-DS-R1" model card (post-trained DeepSeek-R1 variant), Hugging Face. https://huggingface.co/microsoft/MAI-DS-R1 ↩
CNBC, "Microsoft unveils new AI models to lessen reliance on OpenAI and lower costs for developers," June 2, 2026. https://www.cnbc.com/2026/06/02/microsoft-unveils-new-ai-models-lessen-reliance-on-openai-lower-costs.html ↩
Microsoft, "Microsoft Foundry Models overview" (multi-model catalog spanning OpenAI, Anthropic, Meta, Mistral, and others), Microsoft Learn. https://learn.microsoft.com/en-us/azure/foundry/concepts/foundry-models-overview ↩

Frequently Asked Questions

MAI-Code-1-Flash is Microsoft's in-house agentic coding model, announced June 2, 2026. It is a sparse Mixture-of-Experts model with 137 billion total parameters and 5 billion active parameters, is built end-to-end by Microsoft on licensed data, and is rolling out to GitHub Copilot individual users in VS Code via the model picker and Auto picker. 1 3

Microsoft MAI Models: In-House AI in Copilot (2026)

TL;DR

What Microsoft launched

MAI-Code-1-Flash: a coding model built for Copilot

MAI-Thinking-1: a mid-weight reasoning model

Why "no distillation" is the real story

Frontier Tuning and the customization angle

What it means for the OpenAI relationship

The bottom line

Frequently Asked Questions

Related Posts

Microsoft MAI Models Explained: 7 In-House AI Models

Claude Agent SDK Observability With OpenTelemetry (2026)

Trace Claude Agent Tool Calls with OpenTelemetry (2026)

Claude Memory Tool + Context Editing: Agent Tutorial (2026)