Google TPU 8t and TPU 8i: The Agentic-Era Chip Split
Google unveiled TPU 8t (Sunfish) and TPU 8i (Zebrafish) at Cloud Next 2026 — eighth-gen chips splitting AI training from inference at 2.7x price-perf.
Amazon invests up to $25 billion in Anthropic as Anthropic commits $100 billion to AWS over 10 years for up to 5 GW of Trainium2 and Trainium3 compute.
Meta and Broadcom extended their MTIA chip partnership through 2029, starting with over 1 GW of custom silicon on the industry's first 2nm AI accelerator.
Cerebras targets a $22–25B Nasdaq listing backed by a $10B OpenAI contract. Inside the wafer-scale chip 57x larger than Nvidia's H100, and what's at stake.
Big Tech is spending $700B on AI infrastructure in 2026. Here is what Amazon, Google, Meta, Microsoft, and Oracle are building and whether it will pay off.
Beyond hourly rates — true GPU cloud TCO in 2026: egress, storage, commitment discounts, and what AI training and inference actually cost per month.
Specialized GPU clouds cost 60–85% less than AWS. RunPod, Vast.ai, Thunder Compute, and Northflank benchmarked for AI training and inference in 2026.
The custom AI chip race in 2026 — Meta MTIA, Google TPU, Amazon Trainium, Microsoft Maia vs. Nvidia. Strategy, cost, supply, and workload fit for each.
ML model training from costs to code: frontier costs (Gemini Ultra $191M), compute doubling every 6 months, plus runnable training patterns for smaller teams.
AI rate limiting in 2026: adaptive, context-aware limits across prompts, tokens, users, and cost. The patterns that balance fairness and runaway spend.
Model serving patterns: batch, online, streaming, edge. Latency, cost, and throughput trade-offs for each — plus the tools (BentoML, vLLM, TGI) to ship with.
ML model monitoring: detect data drift, concept drift, and fairness regressions before they hit users. Tools, dashboards, and alerts that catch problems early.
Cut LLM costs without cutting corners: quantization, distillation, caching, batching, router choice, and infrastructure moves that actually preserve quality.
SQLite beyond embedded: edge compute, AI inference caching, and local-first apps on Turso and Cloudflare D1. Why it's the database quietly powering 2026.
The future of LLMs and fine-tuning: LoRA, adapters, RAG, synthetic data, and the modular techniques replacing full retraining in 2026 production workflows.
A hands-on, deeply detailed guide to mastering MLOps — from model versioning and CI/CD to monitoring, scaling, and real-world production practices.