Google TPU
Google's decade-old captive AI silicon — built to run Google's models on Google's network at Google's prices, now opened just enough to anchor Anthropic's largest training contract.
1. Core Product / Service
The Tensor Processing Unit is a custom AI ASIC co-designed by Google and Broadcom and manufactured at tsmc. Active generations in 2026:
- TPU v5e — inference-optimized; cheapest per-token serving SKU in the lineup
- TPU v5p — training flagship preceding Trillium; 459 TFLOPS BF16 compute and 95 GB HBM2e at ~2.8 TB/s per chip
- TPU v6e (Trillium) — current GA generation; 4.7× compute / 2× HBM capacity & bandwidth vs v5e; 2.1× perf/$ vs v5e and 2.5× perf/$ vs v5p on dense LLM training [1][2]
- TPU v7 (Ironwood) — announced as inference-first generation, deploying through 2026
Pods scale to thousands of chips over Google's proprietary OCS (Optical Circuit Switch) fabric plus ICI (Inter-Chip Interconnect) — an alternative to NVIDIA's NVLink-plus-InfiniBand approach that prioritizes cluster-wide bisection bandwidth. Software stack: JAX + XLA + Pathways, with PyTorch/XLA as the bridge for non-Google customers.
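A rough way to see why cluster-wide bisection bandwidth is the design target: for a torus-shaped slice, bisection is set by the cut across the longest axis. A minimal sketch for a 3D torus, assuming a hypothetical per-link ICI bandwidth (the real per-generation link speeds are not quoted here):

```python
# Back-of-envelope bisection bandwidth for a 3D-torus pod slice.
# ICI_LINK_GBPS is an illustrative assumption, not a published TPU spec.

def torus_bisection_gbps(x: int, y: int, z: int, link_gbps: float) -> float:
    """Bisection bandwidth when cutting a 3D torus across its longest axis.

    A cut perpendicular to the longest dimension severs 2 * (product of the
    other two dims) links; the factor 2 comes from the wrap-around links.
    """
    dims = sorted([x, y, z])              # cut across the longest axis
    cross_section_links = 2 * dims[0] * dims[1]
    return cross_section_links * link_gbps

ICI_LINK_GBPS = 100.0  # assumed per-link, per-direction bandwidth

# A hypothetical 4 x 4 x 8 slice (128 chips):
print(torus_bisection_gbps(4, 4, 8, ICI_LINK_GBPS), "GB/s")  # 3200.0 GB/s
```

Note that the cross-section grows with the two shorter dimensions, which is why scaling a torus in one axis alone does not improve bisection — the motivation for OCS-reconfigurable topologies.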
Distribution: TPUs are rented exclusively through GCP (no direct sale). Captive use inside Google (Search, Ads, Gemini, YouTube) consumes the majority of fleet capacity; external rental is the remainder.
2. Target Users & Pain Points
- Google itself — Search ranking, Gemini training/serving, Ads ranking, YouTube transcoding/recommendation. Captive use is by far the biggest TPU customer.
- Anthropic — committed in November 2025 to "the largest TPU deal in Google history": hundreds of thousands of Trillium TPUs in 2026, scaling toward ~1M by 2027 [1]. Anthropic now multi-homes across Trainium (aws-trainium) and TPU.
- Frontier labs / large enterprises on GCP — Salesforce, Anthropic, Midjourney, Character.ai (historical), Hugging Face — choose TPU for perf/$ on supported workloads
- Pain solved: cost-per-token for training and dense inference; access to extreme-scale clusters via OCS topology
- Pain not solved: model portability — moving a TPU-trained model to GPU (or vice versa) remains a multi-week engineering project
3. Competitive Landscape
| Chip | Buyer profile | Software | Distribution |
|---|---|---|---|
| Google TPU v6e (Trillium) | Captive + GCP-rental | JAX/XLA, PyTorch/XLA | GCP only |
| nvidia B200 | Open market | CUDA | All clouds + direct |
| aws-trainium 2 | Captive + AWS-rental | Neuron SDK | AWS only |
| microsoft-maia 200 | Captive | Microsoft toolchain | Azure only (limited) |
| amd MI355X | Open market | ROCm | All clouds |
The captive-chip cohort (TPU/Trainium/Maia) collectively erodes NVIDIA's grip on hyperscaler internal demand without ever competing in the open market.
4. Unique Observations
- Captive cost vs market alternative — the central economic question. Google does not publish per-chip TPU cost; supply-chain analysts put Trillium silicon in the $3–5K/chip COGS range vs the $25–40K NVIDIA charges for an H100 (~$40K for a B200). Even after amortizing in-house design costs (the Google chip team plus Broadcom implementation fees) and shared CoWoS allocation at tsmc, Google's effective per-FLOP cost on captive workloads is plausibly 40–60% below GPU rental — though no publicly disclosed numbers exist. The Anthropic deal's pricing (not disclosed) is the only external benchmark.
- The Anthropic anchor reframes TPU as a credible second source. For most of TPU's history the strategic question was "is Google subsidizing internal use to justify a captive program?" The 2025 Anthropic commitment of hundreds of thousands of Trillium chips makes TPU economically self-sustaining on external revenue alone — and signals to NVIDIA that hyperscaler captive silicon is not just defensive [1].
- JAX/XLA gravity well. Anyone trained on TPU writes JAX or PyTorch/XLA, which then runs poorly on GPUs without re-optimization. This is the inverse of CUDA lock-in — equally sticky, just with a smaller installed base. See gpu-kernel-optimization.
- Trillium pricing reveals the strategy. Trillium claims 2.5× perf/$ over v5p and 2.1× over v5e — but there is no stable public list price; access runs through committed-use tiers. Google deliberately avoids publishing a number NVIDIA could counter-price against; effective customer pricing is negotiated (Anthropic's rate is rumored to be materially below GCP H100 list).
- Ironwood (v7) signals a fork: TPU is becoming a two-track product, with training (Trillium / a future v7p-equivalent) and inference (v6e, Ironwood) tracks optimized separately — mirroring the aws-trainium / aws-inferentia split.
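The captive-cost claim in the first bullet can be made concrete with a crude amortization model. Every input below (amortization period, utilization, opex multiplier, rental rate) is an illustrative assumption, not a disclosed figure, and the model ignores per-chip performance differences between TPU and GPU:

```python
# Illustrative captive-cost model using the COGS estimate ranges quoted above.
# All parameters are assumptions; none are disclosed Google figures.

def chip_hour_cost(capex_usd: float,
                   amort_years: float = 4.0,
                   utilization: float = 0.6,
                   opex_mult: float = 2.0) -> float:
    """Effective $/chip-hour: capex amortized over useful life, scaled by an
    opex multiplier (power, cooling, datacenter, networking) and by the
    fraction of hours the chip is actually utilized."""
    utilized_hours = amort_years * 365 * 24 * utilization
    return capex_usd * opex_mult / utilized_hours

tpu_low = chip_hour_cost(3_000)    # low end of the $3-5K COGS estimate
tpu_high = chip_hour_cost(5_000)   # high end
gpu_rental = 2.50                  # assumed market H100 rental $/GPU-hr (illustrative)

print(f"captive TPU: ${tpu_low:.2f}-${tpu_high:.2f}/chip-hr "
      f"vs ${gpu_rental:.2f}/hr GPU rental")
```

Under these assumptions the captive chip-hour lands well below typical GPU rental, directionally consistent with the 40–60% effective-cost gap estimated above once per-chip performance differences are netted out.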
5. Financials / Funding
- Parent: Alphabet/Google Cloud
- Disclosed TPU revenue: not separated from GCP segment
- GCP revenue (2025 full year): $50B+ annualized run rate; AI compute (TPU + GPU rental) is the fastest-growing line
- Capex (Alphabet): $75B+ guided for 2025, a substantial portion of it AI infrastructure including TPU buildout
- Trillium GA pricing (committed-use): list ranges roughly $1.20–$2.70/chip-hour depending on commitment tier (varies by region)
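To put the quoted chip-hour range in context, a sketch of what a committed 256-chip Trillium slice would run per month at the tier endpoints (the 256-chip slice size and the 730-hour average month are illustrative choices, not GCP SKUs):

```python
# Monthly cost of a hypothetical 256-chip slice across the quoted
# committed-use range of roughly $1.20-$2.70/chip-hour.

HOURS_PER_MONTH = 730  # average month (8760 hours / 12)

def monthly_cost(chips: int, rate_per_chip_hour: float) -> float:
    """Simple linear cost: every chip billed every hour of the month."""
    return chips * rate_per_chip_hour * HOURS_PER_MONTH

for rate in (1.20, 2.70):  # the rough tier endpoints quoted above
    print(f"256-chip slice @ ${rate:.2f}/chip-hr: "
          f"${monthly_cost(256, rate):,.0f}/month")
```

At the low and high ends this is roughly $224K and $505K per month respectively — a scale at which the negotiated discounts discussed in section 4 dominate the economics.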
6. People & Relationships
- TPU engineering origin: Norman Jouppi (lead architect), out of Google Research
- Google Cloud CEO: Thomas Kurian
- Cloud AI VP: Amin Vahdat (TPU systems / network), Mark Lohmeyer (compute)
- Foundry partner: tsmc (advanced nodes; uses CoWoS)
- Design partner: Broadcom (custom ASIC implementation/IP)
- Major external customers: Anthropic (anchor), Salesforce, Hugging Face, Midjourney
- Captive consumer: Google Search, Ads, Gemini, YouTube, Workspace AI features
- Direct competitors: nvidia, aws-trainium, aws-inferentia, microsoft-maia, amd, cerebras