🎙️ Episode 305 • 07:45 • June 15, 2026
Gemini 3.5 Flash: Benchmarks, Pricing & API (2026)
AI-generated discussion by Alex and Jamie
About this episode
Join hosts Alex and Jamie in this episode of the Nerd Level Tech AI Cast as they dissect Google’s latest release, Gemini 3.5 Flash. From its impressive speed and performance benchmarks to the intriguing question of whether it's worth investing in for your tech arsenal, they break down what makes this coding model a game changer for complex tasks. Tune in for insights that will help you decide if it's time to upgrade your tech setup or stick with your trusty old devices!
Transcript
[Alex]: Welcome back to the Nerd Level Tech AI Cast, where we read the changelogs so you don’t have to! I’m Alex…

[Jamie]: …and I’m Jamie. Today, we’re breaking down the launch of Gemini 3.5 Flash: benchmarks, pricing, API changes, and the million-token question of whether you should switch.

[Alex]: Or as I like to call it, “Is it worth mortgaging my GPU farm for Google’s latest shiny thing?”

[Jamie]: [laughs] My GPU farm is just a sad gaming laptop and a Raspberry Pi, so I’ll have to take your word for it. Alright, Alex, kick us off: What is Gemini 3.5 Flash and why is everyone suddenly talking about it?

[Alex]: Good question. Gemini 3.5 Flash is Google’s latest agentic coding model; think of it as the speedster in the Gemini lineup. It was announced at Google I/O 2026 and, unlike the usual vaporware, it actually launched as generally available on day one.

[Jamie]: Agentic coding model… so, is this for building Skynet, or just automating my to-do list?

[Alex]: More the latter, though with a million tokens of context, it might actually remember your to-do list for once. In plain English, “agentic” means it’s designed for complex, multi-step tasks: code generation, workflow orchestration, document wrangling, and anything that needs sub-agents or parallel execution. Not just chat.

[Jamie]: Got it. So, big brain stuff, but fast. How fast are we talking?

[Alex]: Google claims 3.5 Flash is about four times faster than other so-called “frontier” models, which is their way of saying “it’s really, really fast.” On benchmarks, it beats the previous Gemini 3.1 Pro on key coding and agent tasks. But, and here’s a pro tip, always check that those numbers come from Google’s docs, not some random blog with a suspicious number of pop-ups.

[Jamie]: [mock serious] I only trust benchmarks that come with at least three footnotes and a pie chart.

[Alex]: Or a meme. Benchmarks as memes, 2026’s hottest trend.

[Jamie]: Okay, numbers time. Where does 3.5 Flash land on the scoreboard?

[Alex]: According to Google’s own numbers, it scores 76.2 on Terminal-Bench 2.1, 83.6 on MCP Atlas, 84.2 on CharXiv Reasoning, and 1656 Elo on GDPval-AA. Translation: it’s leading in agentic coding, tool use, multimodal chart understanding, and real-world economic tasks.

[Jamie]: Wait, Elo? Are we playing chess with this thing?

[Alex]: Basically! The GDPval-AA Elo score is all about economically valuable reasoning; think of it as the AI Olympics for business workflows.

[Jamie]: So, it’s smart, fast, and… I’m guessing not cheap?

[Alex]: You guessed right. Here’s the sticker shock: 3.5 Flash is $1.50 per million input tokens and $9.00 per million output tokens. That’s three times the price of the outgoing Gemini 3 Flash Preview.

[Jamie]: Oof. So, if I’m running high-volume jobs, my cloud bill just went from “maybe I’ll expense this” to “time to sell my Magic cards.”

[Alex]: [laughs] Or just your unopened Funko Pops. But, silver lining: it’s still 25% cheaper than Gemini 3.1 Pro, which was $2 per million input and $12 per million output. So, you’re paying more than before, but less than the true Pro tier.

[Jamie]: And what’s included in those output tokens? Are those just words, or…?

[Alex]: Great catch: output tokens include “thinking tokens.” So, if you’re asking for heavy reasoning or complex problem solving, those get billed too. If you’re just summarizing a recipe, probably cheap. If you’re debugging a quantum algorithm, brace yourself.

[Jamie]: [mock sigh] My hopes for cheap quantum debugging, dashed again.

[Alex]: But if you use Google’s Batch API, you can cut those costs in half. And there’s context caching to avoid paying again for the same input.

[Jamie]: Nice. Now, what’s the deal with the context window? I keep hearing “one million tokens” thrown around like confetti.

[Alex]: That’s the headline feature. Gemini 3.5 Flash supports a 1,048,576-token input window, so you can feed it entire codebases, massive PDFs, or the complete works of Shakespeare and still have room for your resume. Output is up to 65,536 tokens.

[Jamie]: So, for the first time, my AI can actually remember the first thing I told it… even if I start rambling?

[Alex]: Pretty much. Just keep in mind, the knowledge cutoff is January 2025. So, if you need up-to-the-minute facts, you’ll want to use its Search grounding or URL context features.

[Jamie]: No Taylor Swift tour dates from 2026, got it.

[Alex]: Sorry, Swifties.

[Jamie]: Let’s talk API changes. Should developers be worried about breaking stuff?

[Alex]: There are a few gotchas. The “thinking_level” parameter is now a string with four values: minimal, low, medium, high. Default is “medium” now, not “high.” And those old sampling parameters (temperature, top_p, top_k)? Google says ditch ‘em for all 3.x models.

[Jamie]: So, no more fiddling with temperature to make it sound poetic?

[Alex]: Nope. The model is tuned for its own defaults. Also, function-calling responses are stricter: each response has to match the originating call exactly, or you’ll get errors.

[Jamie]: That’s… actually kind of nice? Fewer mystery bugs.

[Alex]: [chuckles] Unless you like mystery bugs, in which case, I’ve got some JavaScript for you.

[Jamie]: [groans] Please, no more callback hell.

[Alex]: For new projects, Google recommends the Interactions API over the old generateContent API, but both are supported, so you don’t have to rewrite everything overnight.

[Jamie]: And how do I actually call this thing? Python, JavaScript… REST?

[Alex]: All of the above. For Python, it’s basically:

```python
from google import genai

# Client() picks up the API key from the environment (e.g. GEMINI_API_KEY).
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain how parallel agentic execution works in three sentences.",
)
print(response.text)
```

[Jamie]: [impressed] That’s… refreshingly simple. Even I can copy-paste that.

[Alex]: In JavaScript, it’s just as easy. And for the REST fans, a single POST request and you’re rolling.

[Jamie]: Any cool new features I should know about?

[Alex]: The Batch API, context caching, and subagent deployment are all supported. The exceptions are “Computer Use” and live browser control. If you need those, stick with the older model for now.

[Jamie]: And what’s this about Gemini 3.5 Pro? Do I wait, or is Flash the way to go?

[Alex]: Pro was announced alongside Flash, but as of mid-June, it’s not generally available yet. If you’re doing massive long-context retrieval, or you’re a glutton for punishment, you might wait. For most coding and agentic tasks, 3.5 Flash is ready and shippable now.

[Jamie]: So, big picture: is Gemini 3.5 Flash worth it?

[Alex]: If you need high-speed, high-intelligence agentic workflows and can stomach the price hike, yes. For cost-sensitive workloads, maybe hold off or try Flash-Lite. And as always, verify any benchmark claims directly from Google; don’t trust the guy selling “AI secrets” on YouTube.

[Jamie]: [laughs] Or the ones making up new benchmarks like “AI-Generated Cat Picture Quality.”

[Alex]: I’d win that one, hands down.

[Jamie]: Well, that’s all we have for today on the Nerd Level Tech AI Cast. Don’t forget to rate, subscribe, and send us your favorite AI-generated haikus.

[Alex]: And if you’ve already migrated to Gemini 3.5 Flash, let us know what you love, or hate, about it. We read every comment… with the help of a million-token context window.

[Jamie]: Thanks for listening, and we’ll catch you next time!

[OUTRO MUSIC]
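
For listeners who want to sanity-check the pricing discussed in the episode, here is a rough sketch of the arithmetic in Python. It uses only the figures quoted by the hosts ($1.50 per million input tokens, $9.00 per million output tokens, thinking tokens billed as output, Batch API at half price, a 1,048,576-token input window). The function name and its parameters are made up for illustration and are not part of any Google SDK, and the prices should be checked against Google's own pricing page before budgeting. Context caching is left out because the episode gives no rate for it.

```python
# Rough cost estimator based only on figures quoted in the episode.
# All names here are hypothetical; none of this is a Google API.

INPUT_PRICE_PER_M = 1.50    # USD per million input tokens (as quoted)
OUTPUT_PRICE_PER_M = 9.00   # USD per million output tokens (as quoted)
BATCH_DISCOUNT = 0.5        # Batch API "cuts those costs by half" (as quoted)
MAX_INPUT_TOKENS = 1_048_576  # input window quoted in the episode


def estimate_cost_usd(input_tokens, output_tokens, thinking_tokens=0, batch=False):
    """Estimate the cost of one request in USD.

    Thinking tokens are added to the output count, since the episode
    says they are billed as output tokens.
    """
    if min(input_tokens, output_tokens, thinking_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the quoted 1,048,576-token window")

    billed_output = output_tokens + thinking_tokens
    cost = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + billed_output / 1_000_000 * OUTPUT_PRICE_PER_M
    )
    return cost * BATCH_DISCOUNT if batch else cost


if __name__ == "__main__":
    # 200k tokens in, 4k visible tokens out, 6k thinking tokens.
    standard = estimate_cost_usd(200_000, 4_000, thinking_tokens=6_000)
    batched = estimate_cost_usd(200_000, 4_000, thinking_tokens=6_000, batch=True)
    print(f"standard: ${standard:.3f}")  # standard: $0.390
    print(f"batch:    ${batched:.3f}")   # batch:    $0.195
```

In this example the thinking tokens cost more than the visible answer (6,000 versus 4,000 billed output tokens), which is the point Alex makes about heavy reasoning tasks.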