🎙️ Episode 282 • 07:18 • May 23, 2026

Gemini 3.5 Flash: Benchmarks, Pricing, Speed (2026)

#ai #ai-generated #aws #development #nerd-level-tech #tech-podcast #technology

AI-generated discussion by Alex and Jamie

About this episode

In this episode of Nerd Level Tech AI Cast, hosts Alex and Jamie dive into the groundbreaking Gemini 3.5 Flash, Google's latest AI model that's outshining its predecessor in surprising ways. They explore its impressive benchmarks, affordability, and game-changing "agentic" capabilities that allow it to tackle complex tasks like a proactive intern. Join them for a fun and insightful discussion that demystifies the future of AI and what it means for your tech landscape!

Transcript

[Alex]: Hey everyone! Welcome back to Nerd Level Tech AI Cast—the show that tackles the latest in AI so you don’t have to read 40-page whitepapers. I’m Alex, your resident code whisperer and explainer-in-chief.

[Jamie]: And I’m Jamie! Self-appointed chief question-asker, occasional code breaker, and your voice of “Wait, what does that actually mean?” Today’s episode: “Gemini 3.5 Flash: Benchmarks, Pricing, Speed.” Or as I like to call it, “How fast can Google’s new AI make me obsolete?”

[Alex]: [chuckles] Trust me, Jamie, you’re safe. For now. So, Google dropped Gemini 3.5 Flash at Google I/O 2026, and it’s already making waves. It’s the first model in the 3.5 family, and they’re calling it their “fast, mid-priced” model. But here’s the twist: it’s outscoring its own big sibling, Gemini 3.1 Pro, on a bunch of coding and agentic benchmarks.

[Jamie]: Hold up, “agentic”? That sounds like some kind of sci-fi serum. What does that actually mean in AI world?

[Alex]: [laughs] Good question! “Agentic” basically means the model isn’t just answering simple questions anymore. It’s built to plan, build, and iterate on tasks—think multi-step projects where it can make decisions, use tools, and even ask for help if it gets stuck.

[Jamie]: So, less like a chatbot that spits out trivia, more like an intern who can actually get things done without bugging me every five minutes?

[Alex]: Exactly! But this intern doesn’t sneak snacks from the office fridge. What’s wild is—Flash is Google’s “mid-tier” line, but it actually beat their top dog, Gemini 3.1 Pro, in head-to-head tests for coding, tool use, and certain expert tasks.

[Jamie]: Wait, so the smaller model is outrunning the big fancy one? That’s like my Chromebook beating a gaming rig at, uh, Minesweeper.

[Alex]: [chuckles] Not a terrible analogy! It’s like the smaller model figured out parkour and is zipping past the muscle-bound heavyweight in obstacle courses. On Google’s own benchmarks, Flash hit 76.2 on Terminal-Bench for coding—higher than 3.1 Pro’s 70.3. It’s even faster and more adaptable for agent-style work.

[Jamie]: But does it win at everything? Or is this one of those “nearly all” situations where the fine print bites you?

[Alex]: You’re onto it. On some “needle-in-a-haystack” stuff—like super long document searches or deep academic reasoning—the bigger Pro still has the edge. For example, on the Humanity’s Last Exam benchmark, Pro wins 44.4 to Flash’s 40.2. So, if you’re searching for legal loopholes in a 200,000-token contract, Pro’s still your guy.

[Jamie]: All right, but the real question: how fast are we talking? Is this just marketing hype, or will my code review bot finally stop making me wait for coffee breaks?

[Alex]: Google’s calling this their speed demon. In terms of output tokens per second, Flash is four times faster than previous “frontier” models. And for developer workloads—think agents that might run for hours, spawning subagents—this speed adds up. Instead of waiting all afternoon for an audit, you might get your results before lunch.

[Jamie]: Four times faster? That’s like going from dial-up to fiber overnight. But—[skeptical voice]—what’s the catch? There’s always a catch.

[Alex]: The catch is in the price tag. The new Flash tier is, well, not so cheap anymore. Standard API pricing is $1.50 per million input tokens and $9.00 per million output tokens. That’s six times more expensive than the old Gemini 3.1 Flash-Lite.

[Jamie]: Ouch. So, my intern analogy just got a little more “Silicon Valley contractor rates,” huh?

[Alex]: Pretty much. But here’s the twist: it’s still about 25% cheaper per token than the big Pro model. So, Google’s pitching it as “frontier-class results, below-Pro pricing.” But—[leans in]—the cost per finished task can be tricky. If your workload is heavy on “thinking” or reasoning, you’ll rack up more output tokens, which are billed at the higher rate. Some devs found Flash ended up pricier than Pro for complex tasks.
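
Show-notes sketch: Alex's point about cost per finished task can be checked with a few lines of arithmetic. The Flash prices below are the ones quoted in the episode. The Pro prices are an assumption, back-calculated from the "about 25% cheaper per token" remark, and the token counts are hypothetical numbers chosen only to illustrate a reasoning-heavy task where Flash emits more output tokens than Pro. Treat it as a rough template for your own measurements, not as official pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD price per one million tokens."""
    input_per_million: float
    output_per_million: float


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single task for the given token counts."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Flash prices as quoted in the episode.
FLASH = Pricing(input_per_million=1.50, output_per_million=9.00)

# ASSUMPTION: Pro prices derived from "about 25% cheaper per token"
# (1.50 / 0.75 and 9.00 / 0.75). Check the real price sheet before relying on this.
PRO = Pricing(input_per_million=2.00, output_per_million=12.00)

# HYPOTHETICAL token counts: same prompt, but Flash "thinks" longer.
flash_cost = task_cost(FLASH, input_tokens=50_000, output_tokens=40_000)
pro_cost = task_cost(PRO, input_tokens=50_000, output_tokens=25_000)

print(f"Flash: ${flash_cost:.3f} per task")
print(f"Pro:   ${pro_cost:.3f} per task")
```

With these made-up counts, Flash comes out slightly more expensive per task despite the lower per-token rate, which is why measuring real output-token usage on your own workload matters more than the headline price.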

[Jamie]: So, budget-minded devs: run your own tests before going all in. Got it. Is this price creep just a Google thing, or is this the new normal?

[Alex]: It’s an industry-wide trend. OpenAI’s latest models are pricier too; Anthropic’s Claude Opus 4.7 jumped up as well. Everyone’s seeing how much API customers will actually cough up.

[Jamie]: So, big question—when do I pick Flash over Pro? Is this a “one size fits all,” or am I still juggling models?

[Alex]: Still juggling! Use Gemini 3.5 Flash for agentic, execution-heavy jobs: coding pipelines, multi-step tool use, parallel subagents, document processing—basically, if speed and throughput matter, and you can check work along the way. But if you need deep reasoning or massive context—like legal analysis or giant document search—stick with Pro.
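
Show-notes sketch: the "still juggling" advice could be written down as a tiny routing rule. Everything here is hypothetical: the `Task` fields, the `choose_model` helper, and the 200,000-token cutoff (borrowed from the contract example earlier in the episode) are illustrative assumptions, and the returned strings are plain labels rather than real API model identifiers.

```python
from dataclasses import dataclass

# HYPOTHETICAL cutoff, borrowed from the 200,000-token contract example.
LONG_CONTEXT_TOKENS = 200_000


@dataclass(frozen=True)
class Task:
    description: str
    context_tokens: int
    needs_deep_reasoning: bool = False


def choose_model(task: Task) -> str:
    """Pick a model label using the rule of thumb from the episode.

    Deep reasoning or very large context goes to "pro";
    everything else (execution-heavy agent work) goes to "flash".
    """
    if task.needs_deep_reasoning or task.context_tokens >= LONG_CONTEXT_TOKENS:
        return "pro"
    return "flash"


tasks = [
    Task("Run the test suite and patch failures", context_tokens=12_000),
    Task("Find loopholes in a long contract", context_tokens=200_000),
    Task("Expert-level legal analysis", context_tokens=8_000, needs_deep_reasoning=True),
]

for task in tasks:
    print(f"{choose_model(task):5s} <- {task.description}")
```

A real router would probably be tuned on measured quality and cost per task rather than on two hard-coded conditions.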

[Jamie]: Sounds like, in Google’s perfect world, you’ll have Pro as your “mastermind” planner, then unleash a swarm of Flash agents to do the actual work. Like a heist movie, but for data.

[Alex]: [laughs] Exactly, Jamie. “Ocean’s Eleven: AI Edition.” And this isn’t just theory—Google’s already offering 3.5 Flash as the default in the Gemini app and Search, plus through the Gemini API, Antigravity IDE, and more. And if you’re curious about their new Gemini Spark assistant, it’s running on 3.5 Flash too.

[Jamie]: So, it’s everywhere, whether I like it or not. [mock horror] My emails might already be written by a Flash agent.

[Alex]: [chuckles] Maybe so! But, real talk—Google’s betting big on agents, not just chatbots. They’re shifting from “ask a question, get an answer” to “give me a task, bring back results.” That’s a huge change and, yeah, it comes with safety risks. Google’s adding new safeguards, but the real test will be how it handles real-world workloads.

[Jamie]: So, the demos look shiny—like building a game or even an operating system from scratch. But until devs kick the tires, we won’t know if it’s genius or just good at staged magic tricks.

[Alex]: Couldn’t have said it better. So, in summary: Gemini 3.5 Flash is crazy fast, beats Pro on most agentic tasks, costs more than the old Flash-Lite but less than Pro per token, and represents Google’s bet on a future filled with autonomous AI agents. But don’t throw away your Pro models just yet, and definitely keep your expense reports handy!

[Jamie]: That’s it for today’s deep dive on Gemini 3.5 Flash. If you liked the episode, leave us a review, share with your favorite AI skeptic, or just send us your wildest Gemini agent stories.

[Alex]: Thanks for tuning in to Nerd Level Tech AI Cast. We’ll be back next week with more tech news, more banter, and more ways to keep your nerd level high.

[Jamie]: Stay curious, folks—and remember, if your AI starts building an operating system from scratch, maybe unplug it after hours. [Outro music fades up]

[Alex]: See you next time!

[Jamie]: Bye!