Google TPU 8t and TPU 8i: The Agentic-Era Chip Split

April 23, 2026

TL;DR

On April 22, 2026, at Google Cloud Next in Las Vegas, Google unveiled its eighth-generation TPU family — and for the first time, split it into two purpose-built chips: TPU 8t (codename Sunfish), a training accelerator co-designed with Broadcom, and TPU 8i (codename Zebrafish), an inference accelerator co-designed with MediaTek[1][2]. The training-focused TPU 8t delivers up to 2.7x better price-performance than Ironwood for large-scale training, while the inference-focused TPU 8i claims 80% better performance-per-dollar for low-latency serving of Mixture-of-Experts models[3][4]. A single TPU 8t superpod scales to 9,600 chips and two petabytes of shared HBM, reaching 121 FP4 exaFLOPS per pod[5]. Layered on top, Google's new Virgo Network fabric can stitch 134,000 TPU 8t chips into a single data-center fabric and over one million chips across multiple sites — the physical substrate for running what CEO Sundar Pichai described as the "agentic enterprise"[6].

Both chips will reach general availability later in 2026[1][4].


What You'll Learn

  • What's new in Google's eighth-generation TPU, and why Google split it into two SKUs
  • TPU 8t (Sunfish) and TPU 8i (Zebrafish) specs and performance claims
  • How the new Boardfly interconnect differs from the classic 3D torus
  • What the Virgo Network adds — 134,000 chips in one fabric, and more than one million across sites
  • How TPU 8 fits Anthropic's up-to-one-million-chip deal with Google
  • Where the Nvidia partnership stands at Google Cloud Next 2026

Why Google Split the Eighth-Gen TPU

Every TPU generation until now has been a single chip asked to do everything — pre-training, fine-tuning, reinforcement learning, and inference. Ironwood, Google's seventh-generation TPU introduced in April 2025 and generally available in November 2025, was already pitched as "the first Google TPU for the age of inference," but physically it was still one silicon design[7][8].

With TPU 8, Google changed that. The eighth generation is two chips:

  • TPU 8t (Sunfish) — a training accelerator co-designed with Broadcom, built around two compute dies, one I/O die, and eight twelve-high stacks of HBM3e[2].
  • TPU 8i (Zebrafish) — an inference accelerator co-designed with MediaTek, using a single compute die, one I/O die, and six stacks of HBM3e[2].

The logic is straightforward. Training and inference have drifted into different shapes of workload. Training wants enormous, dense, all-to-all collectives across tens of thousands of chips. Inference — especially for sparse Mixture-of-Experts (MoE) models running agentic workloads with strict latency budgets — wants lower-diameter networks, more on-chip memory, and better performance-per-dollar per served token. Asking one chip to optimise for both makes the die bigger, hotter, and more expensive than either workload actually needs.

Splitting the SKUs lets Google push each target harder.


TPU 8t (Sunfish): The Training Workhorse

TPU 8t is designed as the chip you build frontier models on. Google's pitch is that it can "reduce the frontier model development cycle from months to weeks."

Per-chip specs

  • Compute: up to 12.6 FP4 petaFLOPS per chip[5]
  • HBM: 216 GB of HBM3e per chip[3]
  • HBM bandwidth: 6.5 TB/s per chip[3]
  • Chip-to-chip interconnect: up to 19.2 Tb/s[5]

Pod and fabric scale

  • Superpod: 9,600 chips, held together by Google's proven 3D torus topology[5]
  • Shared memory per pod: two petabytes of HBM[6]
  • Pod FP4 compute: 121 exaFLOPS — about 2.8x Ironwood's 42.5 exaFLOPS FP4 per pod[5] (the arithmetic is checked in the sketch after this list)
  • Virgo Network fabric: up to 134,000 TPU 8t chips in a single data-center fabric, and more than one million chips across multiple data centers in a single training cluster[6]
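
The pod-level figures follow directly from the per-chip specs. A quick back-of-envelope check in Python (the variable names are ours; the inputs are the published per-chip numbers above):

```python
# Sanity-check of the published TPU 8t pod figures from per-chip specs.
chips_per_pod = 9_600
fp4_pflops_per_chip = 12.6   # FP4 petaFLOPS per chip
hbm_gb_per_chip = 216        # GB of HBM3e per chip

pod_exaflops = chips_per_pod * fp4_pflops_per_chip / 1_000  # PFLOPS -> EFLOPS
pod_hbm_petabytes = chips_per_pod * hbm_gb_per_chip / 1e6   # GB -> PB

print(f"Pod FP4 compute: {pod_exaflops:.1f} EFLOPS")   # ~121.0, matching the claim
print(f"Pod shared HBM:  {pod_hbm_petabytes:.2f} PB")  # ~2.07, i.e. "two petabytes"
print(f"vs Ironwood:     {pod_exaflops / 42.5:.1f}x")  # ~2.8x, as cited
```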

Performance claims vs. Ironwood

| Metric | TPU 8t vs. Ironwood (TPU 7) |
| --- | --- |
| Training compute per pod | ~3x [3] |
| Price-performance for large-scale training | Up to 2.7x [3] |
| Performance-per-watt | Up to 2x [4] |
| FP4 EFLOPS per pod | 2.8x (121 vs 42.5) [5] |

One caveat: these are Google's own numbers, comparing the new TPU to the previous TPU. They are not head-to-head benchmarks against Nvidia's Blackwell or the newer Vera Rubin, and Google did not publish such comparisons at launch[4]. If you're evaluating TPU 8t against an Nvidia-based buildout, you'll want your own workload numbers.


TPU 8i (Zebrafish): The Inference Specialist

If TPU 8t is about getting frontier models trained faster, TPU 8i is about serving them — and serving millions of concurrent agents — at a price-per-token that Google argues Nvidia can't match on dense GPUs.

Per-chip specs

  • Compute: 10.1 FP4 petaFLOPS per chip[9]
  • On-chip SRAM: 384 MB per chip — triple the amount in Ironwood[4]
  • HBM: 288 GB of HBM3e per chip[9]
  • HBM bandwidth: 8.6 TB/s per chip[9]
  • ICI bandwidth: 19.2 Tb/s per chip, doubled from the previous generation and tuned specifically for MoE all-to-all traffic[9]

Notice that TPU 8i actually carries more HBM than TPU 8t (288 GB vs 216 GB) and higher memory bandwidth. That's intentional: inference for large MoE models is memory-bandwidth-bound, not compute-bound. The chip that serves tokens needs to stream weights and KV-cache faster than the chip that trains them.
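
To make that concrete, here is a minimal roofline-style sketch of decode throughput. The assumption is that every generated token must stream the model's active expert weights from HBM at least once; the model size below is an illustrative guess, not a TPU 8i benchmark.

```python
# Why MoE decoding is bandwidth-bound, not compute-bound (illustrative).
# The 30B active-parameter figure is an assumption for a large MoE model.
hbm_tb_s = 8.6            # TPU 8i HBM bandwidth (TB/s, published)
fp4_flops = 10.1e15       # TPU 8i FP4 FLOPS (published)
active_params = 30e9      # assumed active parameters per token
bytes_per_param = 0.5     # FP4 weights: 4 bits each

# Ceiling if weight streaming is the bottleneck (ignores KV cache, batching):
bw_tokens_s = hbm_tb_s * 1e12 / (active_params * bytes_per_param)

# Ceiling if math were the bottleneck (~2 FLOPs per active param per token):
compute_tokens_s = fp4_flops / (2 * active_params)

print(f"bandwidth ceiling: ~{bw_tokens_s:,.0f} tokens/s")      # ~573
print(f"compute ceiling:   ~{compute_tokens_s:,.0f} tokens/s") # ~168,000
```

With these inputs the compute ceiling sits roughly 300x above the bandwidth ceiling, which is why TPU 8i's extra HBM bandwidth matters more for serving than its raw FLOPS.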

TPU 8i also drops Ironwood's dedicated SparseCores in favour of a new Collective Acceleration Engine (CAE) that offloads collective communications from the tensor cores, keeping the math units busier during all-to-all phases[9].

Boardfly: a new interconnect for inference pods

The biggest architectural departure in TPU 8i isn't the die — it's the network. TPU 8t keeps the proven 3D torus. TPU 8i throws it out.

Boardfly is Google's new high-radix interconnect, organised in three layers[9]:

  1. Building blocks: each board (tray) forms a four-chip ring.
  2. Groups: eight boards fully connected with copper cabling.
  3. Pod: up to 36 groups — 1,024 active chips — linked through Optical Circuit Switches (OCS).

The payoff is network diameter. In a 1,024-chip configuration, a 3D torus can require up to 16 hops between arbitrary chips. Boardfly compresses that worst case to seven hops — a 56% reduction in diameter, and up to a 50% improvement in all-to-all communication latency[9]. For MoE inference, where every token routes through a different subset of experts that may sit on different chips, fewer hops translate almost directly into lower tail latency.
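
The torus hop count is easy to reproduce: in a torus, wraparound links mean the farthest chip along a dimension of size d is d // 2 hops away, and distances add across dimensions. A sketch under one plausible 1,024-chip shape (the 16 x 8 x 8 arrangement is our assumption; Google didn't publish the torus dimensions):

```python
# Worst-case hop count (diameter) of a 3D torus vs. Boardfly's claim.
# A 16 x 8 x 8 torus is one plausible 1,024-chip shape (assumption).
dims = (16, 8, 8)
assert dims[0] * dims[1] * dims[2] == 1_024

# Farthest chip per dimension is d // 2 hops thanks to wraparound links.
torus_diameter = sum(d // 2 for d in dims)   # 8 + 4 + 4 = 16 hops

boardfly_diameter = 7                        # Google's published worst case
print(f"torus diameter:     {torus_diameter} hops")
print(f"diameter reduction: {1 - boardfly_diameter / torus_diameter:.0%}")  # 56%
```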

Google's headline inference claim: 80% better performance-per-dollar for low-latency inference on large MoE models, compared to Ironwood[4].


TPU 8t vs. TPU 8i: Side-by-Side

| Spec | TPU 8t (Sunfish) | TPU 8i (Zebrafish) |
| --- | --- | --- |
| Role | Training | Inference |
| Co-design partner | Broadcom | MediaTek |
| FP4 compute per chip | 12.6 PFLOPS | 10.1 PFLOPS |
| On-chip SRAM | 128 MB | 384 MB |
| HBM capacity | 216 GB | 288 GB |
| HBM bandwidth | 6.5 TB/s | 8.6 TB/s |
| Chip-to-chip bandwidth | Up to 19.2 Tb/s | 19.2 Tb/s |
| Interconnect topology | 3D torus | Boardfly (high-radix) |
| Pod size | 9,600 chips (superpod) | 1,024 chips (Boardfly pod) |
| Headline claim | 2.7x price-performance vs Ironwood for training | 80% better performance-per-dollar for MoE inference |

Sources: [2][3][4][5][9].


The Virgo Network: The Fabric Behind a Million Chips

A superpod is still finite. Google's bigger-picture infrastructure story at Next 2026 was the Virgo Network — the scale-out fabric that glues superpods into data-center-wide and multi-data-center training clusters[6].

Headline Virgo numbers:

  • Single fabric: links up to 134,000 TPU 8t chips in one data center with up to 47 petabits/second of non-blocking bisection bandwidth[6] (see the per-chip division after this list)
  • Multi-site: over one million TPU 8t chips across multiple data centers, in a single training cluster[6]
  • Bandwidth per accelerator: up to 4x the prior generation[6]
  • Unloaded fabric latency: 40% lower than the prior generation[6]
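
For scale, dividing the bisection figure across the full fabric gives a rough per-chip share. This division is ours, not a published per-chip number:

```python
# Per-chip share of Virgo's claimed non-blocking bisection bandwidth
# (our arithmetic, not a Google-published per-chip figure).
bisection_pbit_s = 47     # petabits/second across the fabric
chips = 134_000

per_chip_gbit_s = bisection_pbit_s * 1e6 / chips   # petabits -> gigabits
print(f"~{per_chip_gbit_s:.0f} Gb/s of bisection bandwidth per chip")  # ~351
```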

This is the substrate Google needs to make good on deals like Anthropic's up-to-one-million-TPU agreement from October 2025 (a commitment made alongside Anthropic's $100 billion AWS Trainium multi-cloud deal) — one data center alone can't hold that many accelerators, so the fabric between data centers has to behave like one machine.


How TPU 8 Connects to the Anthropic Deal

On October 23, 2025, Anthropic announced that it would expand its use of Google Cloud, gaining access to up to one million TPU chips and well over a gigawatt of capacity coming online in 2026, in a deal worth tens of billions of dollars[10][11]. That was already the company's largest TPU commitment.

In April 2026, as TPU 8t ramped, Anthropic tripled the deal's power envelope, signing a new multi-year agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity starting in 2027[12]. Anthropic's run-rate revenue has surpassed $30 billion — up from around $9 billion at the end of 2025 — and it now has more than 1,000 customers spending over $1 million a year on Claude[12].

TPU 8t and TPU 8i are the silicon that deal is being placed on. Broadcom designs the TPU 8t training die; MediaTek designs the TPU 8i inference die[2]. Both are fabricated by TSMC. Google is not Anthropic's only supplier — Claude also runs on Amazon Trainium and Nvidia GPUs under its multi-cloud strategy — but the TPU footprint is the largest by far, and TPU 8 is what the new capacity is built on.

Beyond Anthropic, Google named Midjourney, Salesforce, Safe Superintelligence, Figma, Palo Alto Networks, and Cursor as existing TPU customers[4].


And Yet, Nvidia Is Still at the Party

The striking thing about Google Cloud Next 2026 isn't that Google announced its own chips — it's that Google announced them alongside an expanded partnership with Nvidia on the same day. The AI Hypercomputer — Google Cloud's umbrella for its AI infrastructure — now spans TPU 8, Nvidia's Vera Rubin, and Google's Arm-based Axion CPUs[13].

The Nvidia side of the announcement included:

  • A5X bare-metal instances powered by Nvidia Vera Rubin NVL72 rack-scale systems, with Google claiming up to 10x lower inference cost per token and 10x higher token throughput per megawatt versus the prior generation[13]
  • A5X scaling via Nvidia ConnectX-9 SuperNICs and Virgo networking — up to 80,000 Rubin GPUs in a single-site cluster, and up to 960,000 across multiple sites[13]
  • Gemini on Google Distributed Cloud, running on Nvidia Blackwell and Blackwell Ultra GPUs, now in preview — letting customers run Gemini against sensitive data inside their own environments[13]

The message from Google is that TPU and Nvidia are not an either-or for its customers. Some workloads run better on TPU, some run better on GPU, and the AI Hypercomputer will hand you whichever makes sense for the job.


The Agentic-Era Framing

Sundar Pichai framed the launch explicitly around AI agents. The quote from his Next 2026 address[1]:

The conversation has gone from "Can we build an agent?" to "How do we manage thousands of them?" That's why we're introducing our new Gemini Enterprise Agent Platform. It provides the secure, full-stack connective tissue you need to build, scale, govern and optimize your agents with confidence — a mission control for the agentic enterprise.

Google's pitch is that the TPU 8t/8i split is the hardware expression of that shift. Training bigger agents needs the dense, high-FLOP pods that TPU 8t delivers. Running millions of agents concurrently — each with its own context window, tool calls, and latency budget — needs TPU 8i's memory-rich, low-hop inference design. The chip split is how Google is trying to make agents cheap enough to run at enterprise scale.

Whether the economics actually play out that way will depend on what TPU 8i's $/1M-token numbers look like once the chips ship and workloads land on them. Google has not published standalone retail pricing; TPU capacity has historically been sold through custom enterprise agreements rather than public rate cards.
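
When real numbers do land, the conversion from chip pricing to token economics is a one-liner. A sketch with placeholder inputs (both values are hypothetical, since Google has published neither TPU 8i hourly pricing nor per-chip serving throughput):

```python
# $/1M-token from chip-hour pricing. Both inputs are hypothetical
# placeholders; Google has published neither figure for TPU 8i.
chip_hour_usd = 4.00    # assumed hourly rate per TPU 8i chip
tokens_per_s = 500      # assumed sustained decode throughput per chip

usd_per_1m_tokens = chip_hour_usd / (tokens_per_s * 3600) * 1_000_000
print(f"~${usd_per_1m_tokens:.2f} per 1M tokens with these inputs")  # ~$2.22
```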


Timeline at a Glance

| Date | Event |
| --- | --- |
| April 9, 2025 | Ironwood (TPU 7) introduced at Google Cloud Next 25 |
| October 23, 2025 | Anthropic announces up-to-1M-TPU deal with Google, over 1 GW of capacity in 2026 |
| Late November 2025 | Ironwood reaches general availability |
| April 7, 2026 | Anthropic expands Google/Broadcom deal to multiple gigawatts starting in 2027 |
| April 22, 2026 | Google unveils TPU 8t and TPU 8i at Google Cloud Next 2026 |
| Later in 2026 | TPU 8t and TPU 8i expected to reach general availability |

Sources: [1][4][7][8][10][12].


The Bottom Line

TPU 8 is the first time Google has treated training and inference as different problems at the silicon level, and it is betting that the agentic era will reward the architecture. TPU 8t is a bigger, faster version of the well-understood training pod, with Broadcom inside and Virgo networking around it. TPU 8i is a genuinely new design for inference — more HBM, more SRAM, a new interconnect, and a different design partner in MediaTek.

The numbers to watch are 2.7x, 80%, and one million. 2.7x is Google's training price-performance claim over Ironwood — large enough to matter if it holds up on customer workloads, small enough to be dented by Nvidia's next generation. 80% is the inference price-performance claim, which is where the agent economics actually live. And one million is the number of TPU 8 chips the Anthropic deal implies Google has to be able to stitch together as one machine.

If all three numbers survive contact with production customers, the second-wave custom-silicon thesis — Google TPU, Amazon Trainium, Meta MTIA, and Microsoft Maia all chipping away at Nvidia's margins — gets a lot more credible. If they don't, we'll see that in the next earnings cycle too.


Footnotes

  1. Sundar Pichai shares news from Google Cloud Next 2026 — Google blog, April 22, 2026.

  2. Google Splits TPUv8 Strategy Into Two Chips, Handing Broadcom Training and MediaTek Inference Duties — Wccftech, April 2026.

  3. Google unveils chips for AI training and inference in latest shot at Nvidia — CNBC, April 22, 2026.

  4. Google Cloud launches two new AI chips to compete with Nvidia — TechCrunch, April 22, 2026.

  5. Our eighth generation TPUs: two chips for the agentic era — Google blog, April 22, 2026.

  6. Introducing Virgo Network megascale data center fabric — Google Cloud blog, April 2026.

  7. Ironwood: The first Google TPU for the age of inference — Google blog, April 9, 2025.

  8. Google unveils Ironwood, seventh generation TPU, competing with Nvidia — CNBC, November 6, 2025.

  9. TPU 8t and TPU 8i technical deep dive — Google Cloud blog, April 22, 2026.

  10. Anthropic to Expand Use of Google Cloud TPUs and Services — Google Cloud press, October 23, 2025.

  11. Google and Anthropic announce cloud deal worth tens of billions of dollars — CNBC, October 23, 2025.

  12. Anthropic expands partnership with Google and Broadcom — Anthropic, April 2026.

  13. NVIDIA and Google Cloud Collaborate to Advance Agentic and Physical AI — Nvidia blog, April 22, 2026.

Frequently Asked Questions

What is the difference between TPU 8t and TPU 8i?

TPU 8t (Sunfish) is a training chip co-designed with Broadcom, optimised for dense compute across 9,600-chip superpods on a 3D torus. TPU 8i (Zebrafish) is an inference chip co-designed with MediaTek, with more HBM, a new Boardfly interconnect, and a focus on low-latency serving of Mixture-of-Experts models. They are the first time Google has shipped purpose-built training and inference dies in the same generation[2][4].
