🎙️ Episode 29206:34June 1, 2026

SubQ: First Subquadratic LLM Ships 12M Context Window

Listen to this episode

AI-generated discussion by Alex and Jamie

About this episode

Join hosts Alex and Jamie in this episode of Nerd Level Tech AI Cast as they dive into the groundbreaking SubQ, a new large language model boasting a staggering 12-million-token context window. Discover how Subquadratic Sparse Attention (SSA) revolutionizes efficiency by selecting only the essential bits of information, making it a game-changer for everything from coding to storytelling. Tune in to learn why this Miami-based startup is generating major buzz in the tech world!

Transcript

[Alex]: Hey everyone, welcome back to Nerd Level Tech AI Cast—where the context window is never too big, and the attention span is… well, slightly subquadratic.

[Jamie]: [laughs] Speak for yourself, Alex. I still get distracted by shiny new LLMs. And today, do we have a whopper! We’re talking SubQ—the new kid on the long-context block, with a 12-million-token context window. That’s… a lot of tokens. Like, “I could fit my entire email inbox in there” levels of tokens.

[Alex]: Or, you know, your entire codebase, your D&D campaign notes, and every restaurant menu in Miami. Which, funnily enough, is where Subquadratic—the startup behind SubQ—calls home.

[Jamie]: Only in Miami, right? Silicon Valley’s out, South Beach is in. So, Alex, give us the elevator pitch—what’s SubQ, and why is everyone so hyped?

[Alex]: Okay, picture this: Subquadratic, a 13-person team with $29 million in seed funding, pops out of stealth mode and drops SubQ—a large language model with a native 12-million-token context window. That’s an order of magnitude bigger than the biggest public models out there.

[Jamie]: Wait, 12 million? That’s… not just “GPT-5.5 has context envy,” that’s “Claude Opus needs a support group” territory. How’d they pull that off?

[Alex]: It all comes down to something called Subquadratic Sparse Attention—or SSA, for those who like their acronyms crunchy. Traditional transformer models scale with the square of the input length. Double your context, quadruple the compute. Not fun. SSA, though, claims to pick out just the important bits—so compute and memory scale almost linearly.

[Jamie]: So… instead of trying to read every word in War and Peace at once, SSA just skims the juicy bits?

[Alex]: Exactly. It’s like a speed-reader who actually remembers details. The magic is in how it decides which tokens matter for attention. With SSA, SubQ claims it can keep up with the big boys in accuracy, but at a fraction of the cost.

[Jamie]: And when you say “fraction,” you mean…?

[Alex]: According to their own benchmarks, SubQ runs about 300 times cheaper than Claude Opus at the same accuracy on the RULER 128K benchmark. For context—pun intended—Claude Opus clocks in at $2,600 per run, SubQ does it for $8.

[Jamie]: Okay, but you know what they say: “If it sounds too good to be true, it probably needs an arXiv paper.” Are people buying these numbers?

[Alex]: [laughs] Oh, the research community is sharpening their pitchforks, Jamie. There’s skepticism, for sure. No peer-reviewed paper, no open weights, no third-party leaderboard results—just a company blog and some marketing slides. The numbers are impressive, but until someone else can reproduce them, it’s all very “trust me, bro.”

[Jamie]: So, we’re in the “show us the receipts” phase.

[Alex]: Big time. And it doesn’t help that there’s a weird gap in their own benchmarks. On the MRCR v2 task—think “find-the-needle-in-a-haystack, but the haystack is 1 million tokens”—their research model scores 83, but the production model drops to about 66. That’s a 17-point drop they haven’t really explained yet.

[Jamie]: That’s like telling your boss you aced the practice test, but on the real thing, you sort of… forgot your pencil.

[Alex]: [laughs] Exactly! Still, on other benchmarks—like coding tasks—SubQ holds its own. It edges out the previous-gen Claude Opus, but trails the latest Opus 4.7 and Anthropic’s latest models.

[Jamie]: Let’s talk about what they actually shipped. So, there’s SubQ API, SubQ Code, and SubQ Search?

[Alex]: Right. The API is OpenAI-compatible, so you can plug it into your existing stack with minimal fuss. SubQ Code is for developers—it claims you can load your whole repo into the context window. And SubQ Search targets enterprises with massive document databases.

[Jamie]: But before everyone gets too excited—this is all behind a waitlist, right?

[Alex]: Yep, private beta for now. And that headline 12-million-token context? That’s only available to select research and enterprise partners. For mere mortals, the API supports up to 1 million tokens. Still massive compared to the competition.

[Jamie]: So, if I want to run my entire “Jamie’s Thoughts” Notion archive through SubQ, I need to know someone at Subquadratic?

[Alex]: Or bribe the right Miami barista, I hear. But seriously, if you’re already using long-context tools like Claude Code or Cursor, SubQ’s pitch is that they can be a cost reducer—a long-context layer, not necessarily a total replacement.

[Jamie]: Okay, time for the million-token question: Is SubQ really the “first subquadratic LLM,” or is that just marketing?

[Alex]: Ah, the “first” wars. Not strictly true. Mamba, RWKV, and Jamba have been doing subquadratic stuff for years. Subquadratic’s angle is that SubQ is the first commercial, frontier-tier LLM built on a fully subquadratic sparse-attention architecture—no hybrid layers, no dense attention lurking in the corners.

[Jamie]: So, it’s less “We invented fire!” and more “We put a fire pit in a Miami penthouse and invited everyone over.”

[Alex]: [laughs] Perfect analogy. The architecture is real and interesting, but the magnitude of the cost and speed claims? Still to be proven.

[Jamie]: Alright, Alex, you’ve read all the footnotes. What’s your honest take? Hype or hope?

[Alex]: If even half their claims hold up, SubQ could make long-context tasks—like whole-repo coding assistants or legal document QA—actually affordable. But the next couple of months are critical. If independent researchers can reproduce the results, SubQ could become the new default for long-context LLMs. If not… it’ll join the “cool, but commercially overstated” club.

[Jamie]: So, SubQ is either the Miami Heat of LLMs, or just another startup with a flashy launch party. Either way, we’ll be watching.

[Alex]: And probably refreshing the LMArena leaderboard every morning. [laughs]

[Jamie]: That’s it for today’s episode of Nerd Level Tech AI Cast! If you want early access to SubQ, hit their waitlist—just don’t forget to let us know if you get in.

[Alex]: Thanks for listening! Subscribe, leave us a review, and remember—keep your context windows wide, but your skepticism wider.

[Jamie]: Catch you next time! [OUTRO MUSIC FADES OUT]