Google TPU 8t and TPU 8i: The Agentic-Era Chip Split

April 23, 2026

TL;DR

On April 22, 2026, at Google Cloud Next in Las Vegas, Google unveiled its eighth-generation TPU family — and for the first time, split it into two purpose-built chips: TPU 8t (codename Sunfish), a training accelerator co-designed with Broadcom, and TPU 8i (codename Zebrafish), an inference accelerator co-designed with MediaTek[1][2]. The training-focused TPU 8t delivers up to 2.7x better price-performance than Ironwood for large-scale training, while the inference-focused TPU 8i claims 80% better performance-per-dollar for low-latency serving of Mixture-of-Experts models[3][4]. A single TPU 8t superpod scales to 9,600 chips and two petabytes of shared HBM, reaching 121 FP4 exaFLOPS per pod[5]. Layered on top, Google's new Virgo Network fabric can stitch 134,000 TPU 8t chips into a single data-center fabric and over one million chips across multiple sites — the physical substrate for running what CEO Sundar Pichai described as the "agentic enterprise"[6].

Both chips will reach general availability later in 2026[1][4].


What You'll Learn

  • What's new in Google's eighth-generation TPU, and why Google split it into two SKUs
  • TPU 8t (Sunfish) and TPU 8i (Zebrafish) specs and performance claims
  • How the new Boardfly interconnect differs from the classic 3D torus
  • What the Virgo Network adds — 134,000 chips in one fabric, and more than one million across sites
  • How TPU 8 fits Anthropic's up-to-one-million-chip deal with Google
  • Where the Nvidia partnership stands at Google Cloud Next 2026

Why Google Split the Eighth-Gen TPU

Every TPU generation until now has been a single chip asked to do everything — pre-training, fine-tuning, reinforcement learning, and inference. Ironwood, Google's seventh-generation TPU introduced in April 2025 and generally available in November 2025, was already pitched as "the first Google TPU for the age of inference," but physically it was still one silicon design[7][8].

With TPU 8, Google changed that. The eighth generation is two chips:

  • TPU 8t (Sunfish) — a training accelerator co-designed with Broadcom, built around two compute dies, one I/O die, and eight twelve-high stacks of HBM3e[2].
  • TPU 8i (Zebrafish) — an inference accelerator co-designed with MediaTek, using a single compute die, one I/O die, and six stacks of HBM3e[2].

The logic is straightforward. Training and inference have drifted into different shapes of workload. Training wants enormous, dense, all-to-all collectives across tens of thousands of chips. Inference — especially for sparse Mixture-of-Experts (MoE) models running agentic workloads with strict latency budgets — wants lower-diameter networks, more on-chip memory, and better performance-per-dollar per served token. Asking one chip to optimise for both makes the die bigger, hotter, and more expensive than either workload actually needs.

Splitting the SKUs lets Google push each target harder.


TPU 8t (Sunfish): The Training Workhorse

TPU 8t is designed as the chip you build frontier models on. Google's pitch is that it can "reduce the frontier model development cycle from months to weeks."

Per-chip specs

  • Compute: up to 12.6 FP4 petaFLOPS per chip[5]
  • HBM: 216 GB of HBM3e per chip[3]
  • HBM bandwidth: 6.5 TB/s per chip[3]
  • Chip-to-chip interconnect: up to 19.2 Tb/s[5]

Pod and fabric scale

  • Superpod: 9,600 chips, held together by Google's proven 3D torus topology[5]
  • Shared memory per pod: two petabytes of HBM[6]
  • Pod FP4 compute: 121 exaFLOPS — about 2.8x Ironwood's 42.5 exaFLOPS FP4 per pod[5] (the arithmetic is checked in the sketch after this list)
  • Virgo Network fabric: up to 134,000 TPU 8t chips in a single data-center fabric, and more than one million chips across multiple data centers in a single training cluster[6]
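
The pod-level figures follow directly from the per-chip specs. A quick back-of-envelope check in Python (the variable names are ours; the inputs are the published per-chip numbers above):

```python
# Sanity-check of the published TPU 8t pod figures from per-chip specs.
chips_per_pod = 9_600
fp4_pflops_per_chip = 12.6   # FP4 petaFLOPS per chip
hbm_gb_per_chip = 216        # GB of HBM3e per chip

pod_exaflops = chips_per_pod * fp4_pflops_per_chip / 1_000  # PFLOPS -> EFLOPS
pod_hbm_petabytes = chips_per_pod * hbm_gb_per_chip / 1e6   # GB -> PB

print(f"Pod FP4 compute: {pod_exaflops:.1f} EFLOPS")   # ~121.0, matching the claim
print(f"Pod shared HBM:  {pod_hbm_petabytes:.2f} PB")  # ~2.07, i.e. "two petabytes"
print(f"vs Ironwood:     {pod_exaflops / 42.5:.1f}x")  # ~2.8x, as cited
```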

Performance claims vs. Ironwood

| Metric | TPU 8t vs. Ironwood (TPU 7) |
| --- | --- |
| Training compute per pod | ~3x [3] |
| Price-performance for large-scale training | Up to 2.7x [3] |
| Performance-per-watt | Up to 2x [4] |
| FP4 EFLOPS per pod | 2.8x (121 vs 42.5) [5] |

One caveat: these are Google's own numbers, comparing the new TPU to the previous TPU. They are not head-to-head benchmarks against Nvidia's Blackwell or the newer Vera Rubin, and Google did not publish such comparisons at launch[4]. If you're evaluating TPU 8t against an Nvidia-based buildout, you'll want your own workload numbers.


TPU 8i (Zebrafish): The Inference Specialist

If TPU 8t is about getting frontier models trained faster, TPU 8i is about serving them — and serving millions of concurrent agents — at a price-per-token that Google argues Nvidia can't match on dense GPUs.

Per-chip specs

  • Compute: 10.1 FP4 petaFLOPS per chip[9]
  • On-chip SRAM: 384 MB per chip — triple the amount in Ironwood[4]
  • HBM: 288 GB of HBM3e per chip[9]
  • HBM bandwidth: 8.6 TB/s per chip[9]
  • ICI bandwidth: 19.2 Tb/s per chip, doubled from the previous generation and tuned specifically for MoE all-to-all traffic[9]

Notice that TPU 8i actually carries more HBM than TPU 8t (288 GB vs 216 GB) and higher memory bandwidth. That's intentional: inference for large MoE models is memory-bandwidth-bound, not compute-bound. The chip that serves tokens needs to stream weights and KV-cache faster than the chip that trains them.
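
To make that concrete, here is a minimal roofline-style sketch of decode throughput. The assumption is that every generated token must stream the model's active expert weights from HBM at least once; the model size below is an illustrative guess, not a TPU 8i benchmark.

```python
# Why MoE decoding is bandwidth-bound, not compute-bound (illustrative).
# The 30B active-parameter figure is an assumption for a large MoE model.
hbm_tb_s = 8.6            # TPU 8i HBM bandwidth (TB/s, published)
fp4_flops = 10.1e15       # TPU 8i FP4 FLOPS (published)
active_params = 30e9      # assumed active parameters per token
bytes_per_param = 0.5     # FP4 weights: 4 bits each

# Ceiling if weight streaming is the bottleneck (ignores KV cache, batching):
bw_tokens_s = hbm_tb_s * 1e12 / (active_params * bytes_per_param)

# Ceiling if math were the bottleneck (~2 FLOPs per active param per token):
compute_tokens_s = fp4_flops / (2 * active_params)

print(f"bandwidth ceiling: ~{bw_tokens_s:,.0f} tokens/s")      # ~573
print(f"compute ceiling:   ~{compute_tokens_s:,.0f} tokens/s") # ~168,000
```

With these inputs the compute ceiling sits roughly 300x above the bandwidth ceiling, which is why TPU 8i's extra HBM bandwidth matters more for serving than its raw FLOPS.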

TPU 8i also drops Ironwood's dedicated SparseCores in favour of a new Collective Acceleration Engine (CAE) that offloads collective communications from the tensor cores, keeping the math units busier during all-to-all phases[9].

Boardfly: a new interconnect for inference pods

The biggest architectural departure in TPU 8i isn't the die — it's the network. TPU 8t keeps the proven 3D torus. TPU 8i throws it out.

Boardfly is Google's new high-radix interconnect, organised in three layers[9]:

  1. Building blocks: each board (tray) forms a four-chip ring.
  2. Groups: eight boards fully connected with copper cabling.
  3. Pod: up to 36 groups — 1,024 active chips — linked through Optical Circuit Switches (OCS).

The payoff is network diameter. In a 1,024-chip configuration, a 3D torus can require up to 16 hops between arbitrary chips. Boardfly compresses that worst case to seven hops — a 56% reduction in diameter, and up to a 50% improvement in all-to-all communication latency[9]. For MoE inference, where every token routes through a different subset of experts that may sit on different chips, fewer hops translate almost directly into lower tail latency.
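
The torus hop count is easy to reproduce: in a torus, wraparound links mean the farthest chip along a dimension of size d is d // 2 hops away, and distances add across dimensions. A sketch under one plausible 1,024-chip shape (the 16 x 8 x 8 arrangement is our assumption; Google didn't publish the torus dimensions):

```python
# Worst-case hop count (diameter) of a 3D torus vs. Boardfly's claim.
# A 16 x 8 x 8 torus is one plausible 1,024-chip shape (assumption).
dims = (16, 8, 8)
assert dims[0] * dims[1] * dims[2] == 1_024

# Farthest chip per dimension is d // 2 hops thanks to wraparound links.
torus_diameter = sum(d // 2 for d in dims)   # 8 + 4 + 4 = 16 hops

boardfly_diameter = 7                        # Google's published worst case
print(f"torus diameter:     {torus_diameter} hops")
print(f"diameter reduction: {1 - boardfly_diameter / torus_diameter:.0%}")  # 56%
```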

Google's headline inference claim: 80% better performance-per-dollar for low-latency inference on large MoE models, compared to Ironwood[4].


TPU 8t vs. TPU 8i: Side-by-Side

| Spec | TPU 8t (Sunfish) | TPU 8i (Zebrafish) |
| --- | --- | --- |
| Role | Training | Inference |
| Co-design partner | Broadcom | MediaTek |
| FP4 compute per chip | 12.6 PFLOPS | 10.1 PFLOPS |
| On-chip SRAM | 128 MB | 384 MB |
| HBM capacity | 216 GB | 288 GB |
| HBM bandwidth | 6.5 TB/s | 8.6 TB/s |
| Chip-to-chip bandwidth | Up to 19.2 Tb/s | 19.2 Tb/s |
| Interconnect topology | 3D torus | Boardfly (high-radix) |
| Pod size | 9,600 chips (superpod) | 1,024 chips (Boardfly pod) |
| Headline claim | 2.7x price-performance vs Ironwood for training | 80% better performance-per-dollar for MoE inference |

Sources: [2][3][4][5][9].


The Virgo Network: The Fabric Behind a Million Chips

A superpod is still finite. Google's bigger-picture infrastructure story at Next 2026 was the Virgo Network — the scale-out fabric that glues superpods into data-center-wide and multi-data-center training clusters[6].

Headline Virgo numbers:

  • Single fabric: links up to 134,000 TPU 8t chips in one data center with up to 47 petabits/second of non-blocking bisection bandwidth[6] (see the per-chip division after this list)
  • Multi-site: over one million TPU 8t chips across multiple data centers, in a single training cluster[6]
  • Bandwidth per accelerator: up to 4x the prior generation[6]
  • Unloaded fabric latency: 40% lower than the prior generation[6]
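
For scale, dividing the bisection figure across the full fabric gives a rough per-chip share. This division is ours, not a published per-chip number:

```python
# Per-chip share of Virgo's claimed non-blocking bisection bandwidth
# (our arithmetic, not a Google-published per-chip figure).
bisection_pbit_s = 47     # petabits/second across the fabric
chips = 134_000

per_chip_gbit_s = bisection_pbit_s * 1e6 / chips   # petabits -> gigabits
print(f"~{per_chip_gbit_s:.0f} Gb/s of bisection bandwidth per chip")  # ~351
```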

This is the substrate Google needs to make good on deals like Anthropic's up-to-one-million-TPU agreement from October 2025 (a commitment made alongside Anthropic's $100 billion AWS Trainium multi-cloud deal) — one data center alone can't hold that many accelerators, so the fabric between data centers has to behave like one machine.


How TPU 8 Connects to the Anthropic Deal

On October 23, 2025, Anthropic announced that it would expand its use of Google Cloud, gaining access to up to one million TPU chips and well over a gigawatt of capacity coming online in 2026, in a deal worth tens of billions of dollars[10][11]. That was already the company's largest TPU commitment.

In April 2026, as TPU 8t ramped, Anthropic tripled the deal's power envelope, signing a new multi-year agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity starting in 2027[12]. Anthropic's run-rate revenue has surpassed $30 billion — up from around $9 billion at the end of 2025 — and it now has more than 1,000 customers spending over $1 million a year on Claude[12].

TPU 8t and TPU 8i are the silicon that deal is being placed on. Broadcom designs the TPU 8t training die; MediaTek designs the TPU 8i inference die[2]. Both are fabricated by TSMC. Google is not Anthropic's only supplier — Claude also runs on Amazon Trainium and Nvidia GPUs under its multi-cloud strategy — but the TPU footprint is the largest by far, and TPU 8 is what the new capacity is built on.

Beyond Anthropic, Google named Midjourney, Salesforce, Safe Superintelligence, Figma, Palo Alto Networks, and Cursor as existing TPU customers[4].


And Yet, Nvidia Is Still at the Party

The striking thing about Google Cloud Next 2026 isn't that Google announced its own chips — it's that Google announced them alongside an expanded partnership with Nvidia on the same day. The AI Hypercomputer — Google Cloud's umbrella for its AI infrastructure — now spans TPU 8, Nvidia's Vera Rubin, and Google's Arm-based Axion CPUs[13].

The Nvidia side of the announcement included:

  • A5X bare-metal instances powered by Nvidia Vera Rubin NVL72 rack-scale systems, with Google claiming up to 10x lower inference cost per token and 10x higher token throughput per megawatt versus the prior generation[13]
  • A5X scaling via Nvidia ConnectX-9 SuperNICs and Virgo networking — up to 80,000 Rubin GPUs in a single-site cluster, and up to 960,000 across multiple sites[13]
  • Gemini on Google Distributed Cloud, running on Nvidia Blackwell and Blackwell Ultra GPUs, now in preview — letting customers run Gemini against sensitive data inside their own environments[13]

The message from Google is that TPU and Nvidia are not an either-or for its customers. Some workloads run better on TPU, some run better on GPU, and the AI Hypercomputer will hand you whichever makes sense for the job.


The Agentic-Era Framing

Sundar Pichai framed the launch explicitly around AI agents. The quote from his Next 2026 address[1]:

The conversation has gone from "Can we build an agent?" to "How do we manage thousands of them?" That's why we're introducing our new Gemini Enterprise Agent Platform. It provides the secure, full-stack connective tissue you need to build, scale, govern and optimize your agents with confidence — a mission control for the agentic enterprise.

Google's pitch is that the TPU 8t/8i split is the hardware expression of that shift. Training bigger agents needs the dense, high-FLOP pods that TPU 8t delivers. Running millions of agents concurrently — each with its own context window, tool calls, and latency budget — needs TPU 8i's memory-rich, low-hop inference design. The chip split is how Google is trying to make agents cheap enough to run at enterprise scale.

Whether the economics actually play out that way will depend on what TPU 8i's $/1M-token numbers look like once the chips ship and workloads land on them. Google has not published standalone retail pricing; TPU capacity has historically been sold through custom enterprise agreements rather than public rate cards.
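
When real numbers do land, the conversion from chip pricing to token economics is a one-liner. A sketch with placeholder inputs (both values are hypothetical, since Google has published neither TPU 8i hourly pricing nor per-chip serving throughput):

```python
# $/1M-token from chip-hour pricing. Both inputs are hypothetical
# placeholders; Google has published neither figure for TPU 8i.
chip_hour_usd = 4.00    # assumed hourly rate per TPU 8i chip
tokens_per_s = 500      # assumed sustained decode throughput per chip

usd_per_1m_tokens = chip_hour_usd / (tokens_per_s * 3600) * 1_000_000
print(f"~${usd_per_1m_tokens:.2f} per 1M tokens with these inputs")  # ~$2.22
```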


Timeline at a Glance

| Date | Event |
| --- | --- |
| April 9, 2025 | Ironwood (TPU 7) introduced at Google Cloud Next 25 |
| October 23, 2025 | Anthropic announces up-to-1M-TPU deal with Google, over 1 GW of capacity in 2026 |
| Late November 2025 | Ironwood reaches general availability |
| April 7, 2026 | Anthropic expands Google/Broadcom deal to multiple gigawatts starting in 2027 |
| April 22, 2026 | Google unveils TPU 8t and TPU 8i at Google Cloud Next 2026 |
| Later in 2026 | TPU 8t and TPU 8i expected to reach general availability |

Sources: [1][4][7][8][10][12].


The Bottom Line

TPU 8 is the first time Google has treated training and inference as different problems at the silicon level, and it is betting that the agentic era will reward the architecture. TPU 8t is a bigger, faster version of the well-understood training pod, with Broadcom inside and Virgo networking around it. TPU 8i is a genuinely new design for inference — more HBM, more SRAM, a new interconnect, and a different design partner in MediaTek.

The numbers to watch are 2.7x, 80%, and one million. 2.7x is Google's training price-performance claim over Ironwood — large enough to matter if it holds up on customer workloads, small enough to be dented by Nvidia's next generation. 80% is the inference price-performance claim, which is where the agent economics actually live. And one million is the number of TPU 8 chips the Anthropic deal implies Google has to be able to stitch together as one machine.

If all three numbers survive contact with production customers, the second-wave custom-silicon thesis — Google TPU, Amazon Trainium, Meta MTIA, and Microsoft Maia all chipping away at Nvidia's margins — gets a lot more credible. If they don't, we'll see that in the next earnings cycle too.


Footnotes

  1. Sundar Pichai shares news from Google Cloud Next 2026 — Google blog, April 22, 2026.

  2. Google Splits TPUv8 Strategy Into Two Chips, Handing Broadcom Training and MediaTek Inference Duties — Wccftech, April 2026.

  3. Google unveils chips for AI training and inference in latest shot at Nvidia — CNBC, April 22, 2026.

  4. Google Cloud launches two new AI chips to compete with Nvidia — TechCrunch, April 22, 2026.

  5. Our eighth generation TPUs: two chips for the agentic era — Google blog, April 22, 2026.

  6. Introducing Virgo Network megascale data center fabric — Google Cloud blog, April 2026.

  7. Ironwood: The first Google TPU for the age of inference — Google blog, April 9, 2025.

  8. Google unveils Ironwood, seventh generation TPU, competing with Nvidia — CNBC, November 6, 2025.

  9. TPU 8t and TPU 8i technical deep dive — Google Cloud blog, April 22, 2026.

  10. Anthropic to Expand Use of Google Cloud TPUs and Services — Google Cloud press, October 23, 2025.

  11. Google and Anthropic announce cloud deal worth tens of billions of dollars — CNBC, October 23, 2025.

  12. Anthropic expands partnership with Google and Broadcom — Anthropic, April 2026.

  13. NVIDIA and Google Cloud Collaborate to Advance Agentic and Physical AI — Nvidia blog, April 22, 2026.

Frequently Asked Questions

What is the difference between TPU 8t and TPU 8i?

TPU 8t (Sunfish) is a training chip co-designed with Broadcom, optimised for dense compute across 9,600-chip superpods on a 3D torus. TPU 8i (Zebrafish) is an inference chip co-designed with MediaTek, with more HBM, a new Boardfly interconnect, and a focus on low-latency serving of Mixture-of-Experts models. They are the first time Google has shipped purpose-built training and inference dies in the same generation[2][4].
