🎙️ Episode 299 • 06:38 • June 9, 2026

MiniMax M3: Open-Weight Coding at 1/10 the Cost (2026)

#ai #ai-generated #cloud #nerd-level-tech #tech-podcast #technology

AI-generated discussion by Alex and Jamie

About this episode

Join hosts Alex and Jamie in this episode of the Nerd Level Tech AI Cast as they explore the groundbreaking MiniMax M3 coding model, a game-changer in AI technology that's both powerful and surprisingly affordable. Discover how this innovative model harnesses a million-token context and employs MiniMax Sparse Attention to revolutionize coding and agentic tasks, making sophisticated AI accessible like never before. Tune in for a lively discussion filled with insights and a few laughs as they unpack what truly sets MiniMax M3 apart from the competition!

Transcript

[Alex]: Welcome back, fellow nerds, to the Nerd Level Tech AI Cast—the place where GPUs are currency and context windows never close! I’m Alex.

[Jamie]: And I’m Jamie. Today, we’re diving into the world of coding models that are not only smarter, but—wait for it—cheaper. Like, “order of magnitude” cheaper. We’re talking about the brand-new MiniMax M3.

[Alex]: That’s right. The latest open-weight coding model out of Shanghai, promising frontier-level performance at roughly a tenth of the price of GPT-5.5. It’s like if you could get a Ferrari for the price of a used bicycle… with a few asterisks.

[Jamie]: A few asterisks? You mean, like the fine print saying “actual Ferrari may not include engine or wheels”?

[Alex]: [laughs] Something like that. But let’s dig in.

[Jamie]: So, Alex, what exactly is MiniMax M3? Is this another “we promise it’s open” model, or is there real substance here?

[Alex]: Great question. MiniMax M3 is the latest large language model from the Chinese company MiniMax—backed by Alibaba and Tencent, listed in Hong Kong this year. It’s designed for coding and agentic tasks, but what really sets it apart is three things: it’s “open-weight” (sort of), it handles a million tokens of context, and it’s natively multimodal—so it can work with text, images, video, even desktop actions.

[Jamie]: Wait, a million tokens? That’s like… the War and Peace of prompts. Why would anyone need that much context?

[Alex]: Imagine giving the model an entire project’s worth of code, documentation, and Slack arguments in one go. Or, you know, your unread email backlog. It’s not just about size—long context windows mean the model can “see” more, reason better, and do more complex work without losing the thread.

[Jamie]: Okay, but how are they pulling off a million-token context without melting their GPUs into puddles?

[Alex]: That’s the real innovation: MiniMax Sparse Attention, or MSA for short. Instead of doing the classic “every token talks to every other token” dance—which gets expensive fast—MSA breaks the context into blocks and only pays attention to the blocks that actually matter for your query. It’s like cramming for an exam by only reading the highlighted bits instead of the whole textbook.

[Jamie]: That’s how I made it through college. [laughs] But, seriously, does it actually work? Or is it just another clever trick that falls apart in practice?

[Alex]: According to MiniMax, it really works. Their numbers say per-token compute at a million tokens is down to 1/20th of what it used to be. Prefill is 9 times faster, decoding 15 times faster. The key is how they batch memory reads, so they aren’t reloading the same blocks over and over. It’s a bit like a really efficient librarian who brings all the relevant books at once.
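
The block-selection idea Alex describes can be sketched in a few lines of plain Python. This is a toy illustration only, not MiniMax's implementation: the episode says only that MSA splits the context into blocks and attends to the ones relevant to the query. The scoring rule used here (query dotted with each block's mean key), the `top_k` cutoff, and every function and parameter name are assumptions made for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size=4, top_k=2):
    """Attend only to the top_k key blocks that look most relevant to the query.

    Hypothetical sketch: block relevance is the query dotted with the block's
    mean key. The real MSA scoring rule is not described in the episode.
    """
    # Split token positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(indices):
        # Assumed relevance score: similarity between query and mean key.
        mean_key = [
            sum(keys[i][d] for i in indices) / len(indices)
            for d in range(len(query))
        ]
        return dot(query, mean_key)

    # Keep the top_k highest-scoring blocks, restored to original order.
    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    selected = [i for b in chosen for i in blocks[b]]

    # Ordinary scaled softmax attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(len(values[0]))
    ]
    return output, selected


if __name__ == "__main__":
    # Eight tokens in two blocks; only the second block matches the query.
    keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    values = [[float(i)] for i in range(8)]
    out, used = block_sparse_attention([0.0, 1.0], keys, values, top_k=1)
    print(used, out)
```

With `top_k` fixed, the softmax step touches at most `top_k * block_size` tokens however long the context grows, which is where the savings in a scheme like this would come from. Scoring the blocks still costs one dot product per block.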

[Jamie]: So, less “needle in a haystack,” more “bring me the needles only.” Got it. But what about the model’s actual skills—can it code as well as the big names?

[Alex]: Here’s where it gets interesting. On MiniMax’s own tests, M3 scores just above GPT-5.5 and Gemini 3.1 Pro on coding benchmarks like SWE-Bench Pro. It posts a 59.0, versus 58.6 for GPT-5.5 and 54.2 for Gemini 3.1 Pro. But, and this is a big but, those are self-reported, on MiniMax’s own infrastructure.

[Jamie]: So, basically, “trust me, bro, I’m really good at coding.” [laughs]

[Alex]: [laughs] Exactly. And they compared M3 to Claude Opus 4.7, not the latest Opus 4.8, which shipped just a few days before M3’s launch and actually scored higher—69.2 to be exact. So, M3 isn’t the absolute best coder, but it’s competing at a tenth of the price.

[Jamie]: Let’s talk about that price, because that’s honestly wild. How cheap are we talking?

[Alex]: For the API, input tokens are $0.60 per million, output tokens $2.40 per million. For comparison, GPT-5.5 is around $5 and $30, respectively. Claude Opus 4.8 is $5 and $25. So, M3 is roughly an order of magnitude cheaper: about 8 times on input and 10 to 12 times on output.
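
To see what those list prices mean for a real bill, here is a small calculator using the per-million-token figures quoted in the episode. The workload (1M input tokens, 100k output tokens) and the function names are made up for illustration. Actual prices may differ from the quoted ones, which Alex gives as approximate for GPT-5.5.

```python
# USD per million tokens (input, output), as quoted in the episode.
PRICES = {
    "MiniMax M3": (0.60, 2.40),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def job_cost(model, input_tokens, output_tokens):
    """Cost in USD of one job at the quoted list prices."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


def cost_ratio(model, input_tokens, output_tokens, baseline="MiniMax M3"):
    """How many times more expensive `model` is than `baseline` for a job."""
    return (job_cost(model, input_tokens, output_tokens)
            / job_cost(baseline, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical workload: 1M tokens in, 100k tokens out.
    for name in PRICES:
        cost = job_cost(name, 1_000_000, 100_000)
        ratio = cost_ratio(name, 1_000_000, 100_000)
        print(f"{name}: ${cost:.2f} ({ratio:.1f}x M3)")
```

For this input-heavy workload the gap comes out a little under 10x, because the input-price ratio (about 8.3x) dominates. Output-heavy jobs move toward the 10x to 12.5x output-price ratios.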

[Jamie]: That’s, like… fast food prices in a Michelin-star restaurant world. Is there a secret menu, too?

[Alex]: [chuckles] Well, they did offer a launch-week discount—cut those prices in half. And they’re selling subscription bundles, with billions of tokens for a monthly fee. It’s the same aggressive pricing we’ve seen with other Chinese labs this year.

[Jamie]: I’m still stuck on the “open-weight” thing. Can you actually download the weights and run M3 on your own setup?

[Alex]: As of today—no. MiniMax promised to release the weights and a technical report on Hugging Face and GitHub within 10 days of launch, so theoretically by June 11th. But as of June 9th, they’re still not out. So right now, “open-weight” is more of a promise than a reality. You get API access, but not the self-hosting, fine-tuning magic yet.

[Jamie]: So it’s like showing up to a potluck with a Tupperware, but the food’s still in the kitchen.

[Alex]: [laughs] Perfect analogy. They do have a track record of open-sourcing, so there’s reason to believe it’ll happen. But if you need open weights right now, you’ll have to wait.

[Jamie]: Quick gut check—if I’m a dev running lots of agentic workloads, should I jump on M3?

[Alex]: If price is your main concern and you’re okay with API access, it’s a strong contender. The coding performance is close to the best, and you can run massive projects for pennies on the dollar. Just keep an eye on those benchmarks until we see independent replication, and watch for the actual weights to drop.

[Jamie]: So, solid potential, but “trust, then verify.”

[Alex]: Exactly—trust, but with your finger on the refresh button for that Hugging Face page.

[Jamie]: [laughs] I feel like that’s the story of AI in 2026—refresh, refresh, refresh.

[Alex]: And possibly a little bit of existential dread mixed in. But hey, at least it’s affordable now!

[Jamie]: [laughs] Silver linings! All right, that’s all for today’s Nerd Level Tech AI Cast. If you liked this episode, smash that subscribe button—digitally, not literally, please.

[Alex]: And as always, thanks for geeking out with us. We’ll be back next week with more AI news, more context windows, and probably more jokes about benchmarks.

[Jamie]: See you next time, nerds! [OUTRO MUSIC]