Gemini 3.5 Flash: Benchmarks, Pricing, Speed (2026)

May 23, 2026

Gemini 3.5 Flash: Benchmarks, Pricing, Speed (2026)

TL;DR

Gemini 3.5 Flash is Google's new agentic coding model, released to general availability on May 19, 2026 at Google I/O. It is a "Flash"-tier model — the fast, mid-priced line — but Google says it outperforms its own larger frontier model, Gemini 3.1 Pro, on coding and agentic benchmarks while running about 4x faster.1 The headline scores: 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, and 83.6% on MCP Atlas.1 The catch is price. At $1.50 per million input tokens and $9.00 per million output tokens, 3.5 Flash costs roughly six times more per token than the cheaper Gemini 3.1 Flash-Lite and lands just 25% below Gemini 3.1 Pro.23 Google made it the default model in the Gemini app and AI Mode in Search globally on launch day, and a larger 3.5 Pro is due "next month."14


What You'll Learn

  • What Gemini 3.5 Flash is and where it sits in the Gemini 3.5 family
  • The benchmark scores, and how Flash compares to Gemini 3.1 Pro
  • How fast Gemini 3.5 Flash actually runs
  • Gemini 3.5 Flash API pricing across every tier
  • When to pick Flash over the larger 3.1 Pro
  • Where you can access Gemini 3.5 Flash today
  • Why Google is betting its Flash line on agents, not chatbots

What Is Gemini 3.5 Flash?

Gemini 3.5 Flash is a multimodal large language model from Google DeepMind, released on May 19, 2026 as the first model in the new Gemini 3.5 family.1 Google describes the family as "frontier intelligence with action" — a phrase that signals the shift the model is built around: instead of just answering questions, it is designed to plan, build, and iterate on long-running tasks with limited human input.1

"Flash" is Google's naming for its fast, mid-tier line, positioned below the larger "Pro" models on cost and latency. What makes 3.5 Flash unusual is that it broke that hierarchy on capability. Google says it is "our strongest agentic and coding model yet," and Koray Kavukcuoglu, CTO of Google DeepMind, told reporters it "outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks."14

A few specs worth knowing up front: Google released 3.5 Flash as generally available on launch day; the API model ID is gemini-3.5-flash; it supports a context window of 1,048,576 input tokens and up to 65,536 output tokens; and its knowledge cutoff is January 2025.125 Developer Simon Willison, who reviewed the launch documentation, noted that 3.5 Flash carries the same developer platform features as the earlier Gemini 3.x series with one exception — the computer-use tool is not exposed for 3.5 Flash as of launch (the model still posts a strong 78.4% on the OSWorld-Verified computer-use benchmark).35

A larger sibling, Gemini 3.5 Pro, is already in internal use and Google says it will roll out "next month."1

Gemini 3.5 Flash Benchmarks: Beating Gemini 3.1 Pro

The story Google wants you to take away is that a Flash model beat the bigger Pro. On the benchmarks Google published, 3.5 Flash does exactly that across coding, tool use, and expert tasks.

Here is how 3.5 Flash compares with the previous Gemini 3 Flash and with Google's larger Gemini 3.1 Pro, using Google's own published numbers:15

BenchmarkGemini 3.5 FlashGemini 3 FlashGemini 3.1 Pro
Terminal-Bench 2.1 — agentic coding76.2%58.0%70.3%
MCP Atlas — tool use83.6%62.0%78.2%
Finance Agent v2 — expert tasks57.9%42.6%43.0%
GDPval-AA — knowledge work (Elo)165612041314
CharXiv Reasoning — multimodal84.2%80.3%83.3%
MRCR v2 128k — long context77.3%67.2%84.9%
Humanity's Last Exam — reasoning40.2%33.7%44.4%

Two patterns stand out. First, the jump over the previous Gemini 3 Flash is large: on MCP Atlas, Flash climbs from 62.0% to 83.6%; on GDPval-AA it rises from 1204 to 1656 Elo. Second, 3.5 Flash genuinely passes 3.1 Pro on coding, tool use, expert financial tasks, and multimodal reasoning — exactly the agentic workloads Google built it for.

But the same table shows the limits. On Humanity's Last Exam, an academic-reasoning test, 3.1 Pro still leads 44.4% to 40.2%. On MRCR v2 — long-context retrieval across a 128k-token window — Pro leads 84.9% to 77.3%. That is why Kavukcuoglu's claim was worded carefully as "nearly all the benchmarks," and why Google is keeping 3.1 Pro in the lineup rather than retiring it.4 When raw stored knowledge or needle-in-a-haystack retrieval matters more than agentic execution, the bigger model still wins.

One more honest caveat: "frontier" is a crowded field. On Google's own comparison table, OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.7 still edge out 3.5 Flash on several benchmarks, including GDPval-AA and Humanity's Last Exam.5 The case for 3.5 Flash is not that it is the single smartest model available — it is that it delivers near-frontier results at Flash speed and a below-Pro price. Google says it lands in the "top-right quadrant" of the Artificial Analysis Intelligence Index, the corner that combines high intelligence with high speed.1

How Fast Is Gemini 3.5 Flash?

Speed is the real product here. Google's headline claim is that, measured in output tokens per second, 3.5 Flash is 4 times faster than other frontier models.1 Kavukcuoglu went further on the call with reporters, saying Google built an optimized version of Flash that runs 12 times faster with the same quality.4

Why does a model team obsess over tokens per second? Because agentic work changes the math. A chatbot answers one prompt and stops. An agent plans a task, calls tools, reads the results, revises, and loops — sometimes for hours. Tulsee Doshi, Google's senior director and head of product, said 3.5 Flash can run autonomously for multiple hours, pausing to ask for human input when it hits a decision point or a permission boundary.4 When a model runs that long and spawns parallel subagents, latency per step compounds — so a 4x speedup is the difference between an agent that finishes a job over lunch and one that takes all afternoon.

That design goal is also why Google says 3.5 Flash can do work that "used to take a developer days or an auditor weeks" in a fraction of the time — and, it claims, "often at less than half the cost of other frontier models."1

Gemini 3.5 Flash Pricing: The Flash Tier Got Expensive

Here is where the launch gets more complicated than the marketing. Gemini 3.5 Flash standard pricing, from Google's official API pricing page, is:2

TierInput (per 1M tokens)Output (per 1M tokens)
Standard$1.50$9.00
Batch$0.75$4.50
Priority$2.70$16.20

⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider before making cost decisions: Anthropic · OpenAI · Google Gemini · Google Vertex AI · AWS Bedrock · Azure OpenAI · Mistral · Cohere · Together AI · DeepSeek · Groq · Fireworks AI · Perplexity · xAI · Cursor · GitHub Copilot · Windsurf.

Context caching on the standard tier costs $0.15 per million tokens, plus a storage fee of $1.00 per million tokens per hour.2 Search grounding includes 5,000 prompts per month free (shared across the Gemini 3 family), then $14 per 1,000 search queries.2

The number that surprised developers is not how that compares to Pro — it is how it compares to the old Flash tier. Gemini 3.1 Flash-Lite, Google's cost-efficient model, runs $0.25 input and $1.50 output.2 At $1.50 and $9.00, 3.5 Flash costs six times more per token than Flash-Lite. Simon Willison calculated that it is also roughly three times the price of the previous Gemini 3 Flash Preview.3

Against Pro, the picture flips: Gemini 3.1 Pro costs $2.00 input and $12.00 output for prompts up to 200k tokens, so 3.5 Flash is about 25% cheaper per token.2 That is the comparison Google leads with — frontier-class results, 25% under Pro pricing.

But per-token price is not the same as per-task cost. Gemini 3.5 Flash is a reasoning model, and Google's pricing bills its internal "thinking" tokens at the output rate — so a reasoning-heavy job can run up a bigger bill than the sticker price suggests.2 Willison flagged this directly: run at high reasoning effort, Gemini 3.5 Flash cost more to put through Artificial Analysis's standard benchmark suite than Gemini 3.1 Pro Preview did, despite Flash's lower per-token rate.3 The lesson for anyone budgeting: benchmark your own workload before assuming Flash is the cheap option.

This price creep is an industry pattern, not a Google quirk. Willison notes that OpenAI's GPT-5.5 launched at twice the price of GPT-5.4, and Claude Opus 4.7 is meaningfully pricier than its predecessor.3 All three labs appear to be testing how much their API customers will pay.

Gemini 3.5 Flash vs Gemini 3.1 Pro: Which to Use

With Flash now beating Pro on several benchmarks while costing less per token, the obvious question is whether 3.1 Pro is still worth using. The honest answer is that Google designed them to play different roles.

Reach for Gemini 3.5 Flash when the job is agentic and execution-heavy: coding pipelines, multi-step tool use, parallel subagents, document-processing workflows, or anything where speed and throughput matter and the task can be checked along the way.

Reach for Gemini 3.1 Pro when the job leans on deep reasoning or precise retrieval across very long contexts — needle-in-a-haystack search over 100k-plus-token documents, dense legal or financial analysis, or problems where you want the largest model's stored knowledge.

Google's own framing points the same way. Doshi described how the forthcoming 3.5 Pro and 3.5 Flash are meant to work as a pair: "3.5 Pro becomes your orchestrator, your planner, and then it actually can leverage Flash to be the various sub-agents."4 In other words, the future Google is building toward is not Flash or Pro — it is a planner model directing a swarm of fast Flash workers.

How to Access Gemini 3.5 Flash

Google made 3.5 Flash broadly available on launch day.1 You can reach it through:

  • The Gemini API, via Google AI Studio and Android Studio — the route for developers building their own apps.
  • Google Antigravity, Google's agent-first development platform and IDE. Google also shipped Antigravity 2.0 at I/O as a standalone desktop application built around agent-first development.4
  • Gemini Enterprise and the Gemini Enterprise Agent Platform, for organizations.
  • The Gemini app and AI Mode in Google Search, where 3.5 Flash is now the default model globally for everyone.1

It also powers Google's new Gemini Spark personal assistant, a 24/7 agent that Google began rolling out to trusted testers, with a beta planned for Google AI Ultra subscribers in the US.1 On the API side, Google paired the launch with a new Interactions API, currently in beta, that adds server-side conversation history management.3

Why Google Bet Flash on Agents

The most telling line of the launch is the framing TechCrunch put on it: with Gemini 3.5 Flash, Google is betting its next AI wave on agents, not chatbots.4 Putting frontier-class results in a fast, mid-tier model — and making that model the default across Search and the Gemini app — is a statement that Google expects the dominant mode of AI use to shift from "ask a question, read an answer" to "hand off a task, get back a result."

That bet carries real risk. Google acknowledged it strengthened the model's cyber and CBRN (chemical, biological, radiological, nuclear) safeguards under its Frontier Safety Framework, and TechCrunch noted Google is already facing a lawsuit tied to harmful Gemini interactions.14 Handing autonomous, hours-long agents to billions of consumers raises the stakes on every safety decision.

And the demos remain demos. Google showed agents building a playable game and even an operating system from scratch inside Antigravity.14 Impressive — but a staged demo is not a track record. The real verdict on Gemini 3.5 Flash will come from the workloads developers run against it over the next few months, not from a keynote stage.

The Bottom Line

Gemini 3.5 Flash is a notable release: a mid-tier model that out-benchmarks Google's larger Gemini 3.1 Pro on the tasks that matter most for agents, at four times the speed. For developers building coding tools, automation pipelines, or multi-agent systems, it is an easy model to test and likely a strong default.

But read the price tag carefully. "Flash" used to mean cheap. Gemini 3.5 Flash is fast and capable, not cheap — it costs six times more per token than the Flash-Lite model it sits above, and a reasoning-heavy workload can cost more end-to-end than Gemini 3.1 Pro. The smart move is the unglamorous one: run your real workload on it, measure the full per-task cost, and decide from your own numbers rather than the keynote's.

Footnotes

  1. Google — Gemini 3.5: frontier intelligence with action, The Keyword blog, May 19, 2026. https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/ 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

  2. Google — Gemini Developer API pricing, Google AI for Developers. https://ai.google.dev/gemini-api/docs/pricing 2 3 4 5 6 7 8 9 10 11

  3. Simon Willison — Gemini 3.5 Flash: more expensive, but Google plan to use it for everything, May 19, 2026. https://simonwillison.net/2026/May/19/gemini-35-flash/ 2 3 4 5 6 7

  4. TechCrunch — With Gemini 3.5 Flash, Google bets its next AI wave on agents, not chatbots, Rebecca Bellan, May 19, 2026. https://techcrunch.com/2026/05/19/with-gemini-3-5-flash-google-bets-its-next-ai-wave-on-agents-not-chatbots/ 2 3 4 5 6 7 8 9 10 11 12

  5. Google DeepMind — Gemini 3.5 Flash model page (benchmark comparison table and model specifications). https://deepmind.google/models/gemini/flash/ 2 3 4

Frequently Asked Questions

Gemini 3.5 Flash is a fast, multimodal AI model from Google DeepMind, released on May 19, 2026. It is the first model in the Gemini 3.5 family and is built for agentic and coding tasks — planning, tool use, and long-running workflows rather than simple chat. 1

FREE WEEKLY NEWSLETTER

Stay on the Nerd Track

One email per week — courses, deep dives, tools, and AI experiments.

No spam. Unsubscribe anytime.