ai-ml

Gemini 3.5 Flash: Benchmarks, Pricing & API (2026)

June 15, 2026

Gemini 3.5 Flash: Benchmarks, Pricing & API (2026)

Google shipped Gemini 3.5 Flash at I/O 2026 and made it generally available the same day. It's a 1-million-token agentic coding model that Google says beats last generation's Gemini 3.1 Pro on key coding and agent benchmarks — but it also costs three times as much as the Flash model it replaces. Here's what's actually verified, straight from Google's own docs.

TL;DR

  • Gemini 3.5 Flash is generally available (GA) under the model ID gemini-3.5-flash, announced at Google I/O 2026 on May 19, 2026.12
  • It has a 1,048,576-token input window (~1M) and up to 65,536 output tokens, with a January 2025 knowledge cutoff.3
  • On Google's published numbers it scores 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, 84.2% on CharXiv Reasoning, and 1656 Elo on GDPval-AA, while running roughly 4x faster than other frontier models.1
  • API pricing is $1.50 per million input tokens and $9.00 per million output tokens3x the price of the Gemini 3 Flash Preview it succeeds, but 25% cheaper than Gemini 3.1 Pro.4
  • The big API change: the default thinking_level is now medium (down from high), and temperature/top_p/top_k are no longer recommended for any Gemini 3.x model.2
  • Computer Use is not supported on 3.5 Flash. Gemini 3.5 Pro was announced alongside Flash but, as of mid-June 2026, has not reached general availability.12

What You'll Learn

  • What Gemini 3.5 Flash is and how it fits in the Gemini 3.x lineup
  • The benchmark numbers Google published — and which comparison numbers to distrust
  • Exactly what the model costs, and how that compares to its predecessor and to Pro
  • The full spec sheet: context window, modalities, and supported capabilities
  • The API changes that will break or change your existing Gemini code
  • How to call it in Python, JavaScript, and REST
  • Where Gemini 3.5 Pro stands and whether to wait for it

What Gemini 3.5 Flash Is

Gemini 3.5 Flash is the first model in Google's Gemini 3.5 family, announced at Google I/O 2026 and described by Google as combining "frontier intelligence with action."1 Google positions it as its "strongest agentic and coding model yet" — built for sub-agent deployment, multi-step workflows, and long-horizon tasks rather than simple chat.13

The pitch is that you no longer have to trade quality for latency: Google says 3.5 Flash delivers intelligence that rivals large flagship models while keeping the speed the Flash line is known for, landing in the top-right (high-intelligence, high-speed) quadrant of the Artificial Analysis Intelligence Index.1 It is already the default model for the Gemini app and AI Mode in Google Search globally.1

Unlike the preview models that came before it, gemini-3.5-flash is stable and GA, which means Google considers it ready for scaled production use.2

Gemini 3.5 Flash Benchmarks

Google published the following scores for Gemini 3.5 Flash, stating that the model outperforms the previous-generation Gemini 3.1 Pro on these coding and agentic benchmarks:1

BenchmarkGemini 3.5 FlashWhat it measures
Terminal-Bench 2.176.2%Agentic terminal/coding tasks
MCP Atlas83.6%Tool use via Model Context Protocol
CharXiv Reasoning84.2%Multimodal chart/document understanding
GDPval-AA1656 EloReal-world economically valuable work

On throughput, Google reports 3.5 Flash generates output about 4x faster than other frontier models.1

One important caveat for fact-checkers: these are Gemini 3.5 Flash's own scores. Google's announcement says the model beats Gemini 3.1 Pro on these benchmarks, but Google did not publish a head-to-head table of Pro's exact numbers in that post. A lot of "76.2% vs 70.3%"-style comparison tables circulating on third-party blogs are not sourced from Google and should be treated as unverified. Where this post quotes a number, it comes from Google's own materials.

Gemini 3.5 Flash Pricing

Here is where the story gets more complicated for cost-sensitive teams. On the paid tier, Gemini 3.5 Flash is priced per million tokens as follows, compared with its siblings:4

ModelInput / 1MOutput / 1MStatus
Gemini 3.5 Flash$1.50$9.00GA, stable
Gemini 3 Flash Preview$0.50$3.00Preview (predecessor)
Gemini 3.1 Pro Preview$2.00$12.00Preview (≤200k-token prompts)

Two things stand out. First, Gemini 3.5 Flash costs exactly three times what the Gemini 3 Flash Preview did on both input and output — a meaningful jump for anyone running high-volume Flash workloads. Google itself flags this in its migration guide, suggesting that highly cost-sensitive users consider Gemini 3.1 Flash-Lite instead.2

Second, despite the increase, 3.5 Flash is 25% cheaper than Gemini 3.1 Pro on both input ($1.50 vs $2.00) and output ($9.00 vs $12.00).4 So the value question is not "Flash vs. its old price" but "a cheaper, faster model that Google says wins on coding and agent benchmarks vs. a pricier Pro that still leads on long-context reasoning."

Output tokens include thinking tokens, so reasoning-heavy prompts are billed at the $9.00 rate. The Batch API offers a 50% cost reduction and is supported, as is context caching.24

Gemini 3.5 Flash Specs

The full spec sheet, from Google's model documentation:3

PropertyValue
Model IDgemini-3.5-flash
Input token limit1,048,576 (~1M)
Output token limit65,536 (~65k)
InputsText, image, video, audio, PDF
OutputText
Knowledge cutoffJanuary 2025
Latest updateMay 2026

Capability-wise, 3.5 Flash supports the Batch API, context caching, code execution, file search, function calling, Google Maps grounding, Google Search grounding, priority inference, structured outputs, thinking, and URL context.3 It does not support Computer Use, image generation, the Live API, or audio generation.3 If you need browser-control/Computer Use workloads, Google says to stay on Gemini 3 Flash Preview for now.2

The January 2025 knowledge cutoff is worth internalizing: for anything time-sensitive, you should pair the model with the Search grounding or URL context tools rather than relying on its parametric knowledge.2

API Changes That Will Affect Your Code

If you're migrating from Gemini 3 Flash Preview or from Gemini 2.5, several behavioral changes matter.2

The default thinking level changed. Gemini 3.5 Flash uses a string thinking_level enum with four values, and the default is now medium (it was high on Gemini 3 Flash Preview):

LevelUse it for
minimalSpeed; chat, quick factual answers, simple tool calls
lowLower-latency code and agentic tasks with fewer steps
medium (default)Best quality for most tasks
highHardest reasoning, math, and agent problems

The raw numeric thinking_budget parameter is no longer recommended (though still supported for backward compatibility) — migrate to thinking_level.2

Stop sending sampling parameters. Google now recommends removing temperature, top_p, and top_k from all Gemini 3.x requests; the models are tuned for their defaults.2

Thought preservation is automatic. The model carries intermediate reasoning across multi-turn conversations on its own, which improves iterative debugging and refactoring — but can increase input token counts over a long conversation.2

Function-calling responses are stricter. Every FunctionResponse must include the id from the originating call, match the name, and return exactly one response per call.2

For new agentic projects, Google now recommends its Interactions API (in Beta) over the classic generateContent API, though both are supported.2

How to Call Gemini 3.5 Flash

A minimal call looks like this. In Python with the Gen AI SDK:2

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain how parallel agentic execution works in three sentences.",
)
print(response.text)

In JavaScript/TypeScript:2

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Explain how parallel agentic execution works in three sentences.",
});
console.log(response.text);

To override the default thinking level for a hard problem, pass a thinking config:2

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Prove that the square root of 2 is irrational.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)
print(response.text)

And the equivalent REST call:2

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{
      "parts": [{"text": "Explain how parallel agentic execution works in three sentences."}]
    }]
  }'

Where to Use It — and Who's Already On It

Gemini 3.5 Flash is available through the Gemini app, AI Mode in Search, the Gemini API in Google AI Studio and Android Studio, Google's agent-first development platform Antigravity, and the Gemini Enterprise Agent Platform.1 Google also debuted Gemini Spark, a personal AI agent built on 3.5 Flash, rolling out first to trusted testers and then in beta to Google AI Ultra subscribers in the US.1

Google named several launch partners using the model for real workflows, including Shopify (parallel subagents for merchant growth forecasts), Macquarie Bank (reasoning over 100+ page onboarding documents), Salesforce (integration into Agentforce), Ramp (multimodal invoice OCR), Xero (autonomous tax-form workflows), and Databricks (agentic data diagnostics).1

What About Gemini 3.5 Pro?

Google announced Gemini 3.5 Pro at the same I/O event, saying it was "already being used internally" and would roll out "next month" — i.e., June 2026.1 As of mid-June 2026, Pro has not reached general availability; it remains the higher-end option targeting frontier reasoning and the longest-context workloads.1 If your application depends on retrieving specific facts from very long documents, it may be worth waiting for Pro's GA rather than committing to Flash today. For agentic coding and most production workloads, Flash is shippable now.

If caching costs are part of your calculus, our guide to prompt caching with the Claude API covers the same trade-offs across vendors, and our breakdown of Kimi K2.7-Code's first-party benchmarks is a useful reminder to read launch numbers critically. For high-volume jobs, the Batch API pattern applies here too.

The Bottom Line

Gemini 3.5 Flash is a genuine step up for agentic and coding work — GA, fast, with a 1M-token window and Google-reported wins over last generation's Pro on the benchmarks developers care about. The asterisk is cost: at $1.50/$9.00 it's triple the old Flash price, so re-run your unit economics before migrating high-volume workloads, and consider Flash-Lite if you're squeezed. Verify any head-to-head benchmark table against Google's own post before you trust it, lean on Search grounding given the January 2025 knowledge cutoff, and if you live in long-context retrieval, it may be worth waiting for Gemini 3.5 Pro's GA.


This topic is moving quickly. All figures here are sourced to Google's official announcement and developer documentation as of June 15, 2026; pricing and availability can change.

Footnotes

  1. Google, "Gemini 3.5: frontier intelligence with action," The Keyword (blog.google), May 19, 2026. https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/ 2 3 4 5 6 7 8 9 10 11 12 13 14 15

  2. Google AI for Developers, "What's new in Gemini 3.5 Flash." https://ai.google.dev/gemini-api/docs/whats-new-gemini-3.5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

  3. Google AI for Developers, "Gemini 3.5 Flash" model reference. https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flash 2 3 4 5 6 7

  4. Google AI for Developers, "Gemini Developer API pricing." https://ai.google.dev/gemini-api/docs/pricing 2 3 4 5 6

Frequently Asked Questions

Yes. gemini-3.5-flash is GA and stable for production use as of its May 19, 2026 launch. 2