Google TPU
Google's decade-old captive AI silicon — built to run Google's models on Google's network at Google's prices, now opened just enough to anchor Anthropic's largest training contract.
1. Core Product / Service
The Tensor Processing Unit is a custom AI ASIC co-designed by Google and Broadcom and manufactured at tsmc. Active generations in 2026:
- TPU v5e — inference-optimized; cheapest per-token serving SKU in the lineup
- TPU v5p — training flagship preceding Trillium; 459 TFLOPS BF16 compute and 95 GB HBM2e at ~2.8 TB/s per chip
- TPU v6e (Trillium) — current GA generation; 4.7× compute / 2× HBM capacity & bandwidth vs v5e; 2.1× perf/$ vs v5e and 2.5× perf/$ vs v5p on dense LLM training [1][2]
- TPU v7 (Ironwood) — announced as inference-first generation, deploying through 2026
Pods scale to thousands of chips over Google's proprietary OCS (Optical Circuit Switch) fabric plus ICI (Inter-Chip Interconnect) — an alternative to NVIDIA's NVLink-plus-InfiniBand approach that prioritizes cluster-wide bisection bandwidth. Software stack: JAX + XLA + Pathways, with PyTorch/XLA as the bridge for non-Google customers.
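A rough way to see why cluster-wide bisection bandwidth is the design target: for a torus-shaped slice, bisection is set by the cut across the longest axis. A minimal sketch for a 3D torus, assuming a hypothetical per-link ICI bandwidth (the real per-generation link speeds are not quoted here):

```python
# Back-of-envelope bisection bandwidth for a 3D-torus pod slice.
# ICI_LINK_GBPS is an illustrative assumption, not a published TPU spec.

def torus_bisection_gbps(x: int, y: int, z: int, link_gbps: float) -> float:
    """Bisection bandwidth when cutting a 3D torus across its longest axis.

    A cut perpendicular to the longest dimension severs 2 * (product of the
    other two dims) links; the factor 2 comes from the wrap-around links.
    """
    dims = sorted([x, y, z])              # cut across the longest axis
    cross_section_links = 2 * dims[0] * dims[1]
    return cross_section_links * link_gbps

ICI_LINK_GBPS = 100.0  # assumed per-link, per-direction bandwidth

# A hypothetical 4 x 4 x 8 slice (128 chips):
print(torus_bisection_gbps(4, 4, 8, ICI_LINK_GBPS), "GB/s")  # 3200.0 GB/s
```

Note that the cross-section grows with the two shorter dimensions, which is why scaling a torus in one axis alone does not improve bisection — the motivation for OCS-reconfigurable topologies.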
Distribution: TPUs are rented exclusively through GCP (no direct sale). Captive use inside Google (Search, Ads, Gemini, YouTube) consumes the majority of fleet capacity; external rental is the remainder.
2. Target Users & Pain Points
- Google itself — Search ranking, Gemini training/serving, Ads ranking, YouTube transcoding/recommendation. Captive use is by far the biggest TPU customer.
- Anthropic — committed in November 2025 to "the largest TPU deal in Google history": hundreds of thousands of Trillium TPUs in 2026, scaling toward ~1M by 2027 [1]. Anthropic now multi-homes across Trainium (aws-trainium) and TPU.
- Frontier labs / large enterprises on GCP — Salesforce, Anthropic, Midjourney, Character.ai (historical), Hugging Face — choose TPU for perf/$ on supported workloads
- Pain solved: cost-per-token for training and dense inference; access to extreme-scale clusters via OCS topology
- Pain not solved: model portability — moving a TPU-trained model to GPU (or vice versa) remains a multi-week engineering project
3. Competitive Landscape
| Chip | Buyer profile | Software | Distribution |
|---|---|---|---|
| Google TPU v6e (Trillium) | Captive + GCP-rental | JAX/XLA, PyTorch/XLA | GCP only |
| nvidia B200 | Open market | CUDA | All clouds + direct |
| aws-trainium 2 | Captive + AWS-rental | Neuron SDK | AWS only |
| microsoft-maia 200 | Captive | Microsoft toolchain | Azure only (limited) |
| amd MI355X | Open market | ROCm | All clouds |
The captive-chip cohort (TPU/Trainium/Maia) collectively erodes NVIDIA's grip on hyperscaler internal demand without ever competing in the open market.
4. Unique Observations
- Captive cost vs market alternative — the central economic question. Google does not publish per-chip TPU cost; supply-chain analysts put Trillium silicon in the $3–5K/chip COGS range vs the $25–40K NVIDIA charges for an H100 (~$40K for a B200). Even after amortizing in-house design costs (the Google chip team plus Broadcom implementation fees) and shared CoWoS allocation at tsmc, Google's effective per-FLOP cost on captive workloads is plausibly 40–60% below GPU rental — though no publicly disclosed numbers exist. The Anthropic deal's pricing (not disclosed) is the only external benchmark.
- The Anthropic anchor reframes TPU as a credible second source. For most of TPU's history the strategic question was "is Google subsidizing internal use to justify a captive program?" The 2025 Anthropic commitment of hundreds of thousands of Trillium chips makes TPU economically self-sustaining on external revenue alone — and signals to NVIDIA that hyperscaler captive silicon is not just defensive [1].
- JAX/XLA gravity well. Anyone trained on TPU writes JAX or PyTorch/XLA, which then runs poorly on GPUs without re-optimization. This is the inverse of CUDA lock-in — equally sticky, just with a smaller installed base. See gpu-kernel-optimization.
- Trillium pricing reveals the strategy. Trillium claims 2.5× perf/$ over v5p and 2.1× over v5e — but there is no stable public list price; access runs through committed-use tiers. Google deliberately avoids publishing a number NVIDIA could counter-price against; effective customer pricing is negotiated (Anthropic's rate is rumored to be materially below GCP H100 list).
- Ironwood (v7) signals a fork: TPU is becoming a two-track product, with training (Trillium / a future v7p-equivalent) and inference (v6e, Ironwood) tracks optimized separately — mirroring the aws-trainium / aws-inferentia split.
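The captive-cost claim in the first bullet can be made concrete with a crude amortization model. Every input below (amortization period, utilization, opex multiplier, rental rate) is an illustrative assumption, not a disclosed figure, and the model ignores per-chip performance differences between TPU and GPU:

```python
# Illustrative captive-cost model using the COGS estimate ranges quoted above.
# All parameters are assumptions; none are disclosed Google figures.

def chip_hour_cost(capex_usd: float,
                   amort_years: float = 4.0,
                   utilization: float = 0.6,
                   opex_mult: float = 2.0) -> float:
    """Effective $/chip-hour: capex amortized over useful life, scaled by an
    opex multiplier (power, cooling, datacenter, networking) and by the
    fraction of hours the chip is actually utilized."""
    utilized_hours = amort_years * 365 * 24 * utilization
    return capex_usd * opex_mult / utilized_hours

tpu_low = chip_hour_cost(3_000)    # low end of the $3-5K COGS estimate
tpu_high = chip_hour_cost(5_000)   # high end
gpu_rental = 2.50                  # assumed market H100 rental $/GPU-hr (illustrative)

print(f"captive TPU: ${tpu_low:.2f}-${tpu_high:.2f}/chip-hr "
      f"vs ${gpu_rental:.2f}/hr GPU rental")
```

Under these assumptions the captive chip-hour lands well below typical GPU rental, directionally consistent with the 40–60% effective-cost gap estimated above once per-chip performance differences are netted out.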
5. Financials / Funding
- Parent: Alphabet/Google Cloud
- Disclosed TPU revenue: not separated from GCP segment
- GCP revenue (2025 full year): $50B+ annualized run rate; AI compute (TPU + GPU rental) is the fastest-growing line
- Capex (Alphabet): $75B+ guided for 2025, a substantial portion of it AI infrastructure including TPU buildout
- Trillium GA pricing (committed-use): list ranges roughly $1.20–$2.70/chip-hour depending on commitment tier (varies by region)
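To put the quoted chip-hour range in context, a sketch of what a committed 256-chip Trillium slice would run per month at the tier endpoints (the 256-chip slice size and the 730-hour average month are illustrative choices, not GCP SKUs):

```python
# Monthly cost of a hypothetical 256-chip slice across the quoted
# committed-use range of roughly $1.20-$2.70/chip-hour.

HOURS_PER_MONTH = 730  # average month (8760 hours / 12)

def monthly_cost(chips: int, rate_per_chip_hour: float) -> float:
    """Simple linear cost: every chip billed every hour of the month."""
    return chips * rate_per_chip_hour * HOURS_PER_MONTH

for rate in (1.20, 2.70):  # the rough tier endpoints quoted above
    print(f"256-chip slice @ ${rate:.2f}/chip-hr: "
          f"${monthly_cost(256, rate):,.0f}/month")
```

At the low and high ends this is roughly $224K and $505K per month respectively — a scale at which the negotiated discounts discussed in section 4 dominate the economics.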
6. People & Relationships
- TPU engineering origin: Norman Jouppi (lead architect), out of Google Research
- Google Cloud CEO: Thomas Kurian
- Cloud AI VP: Amin Vahdat (TPU systems / network), Mark Lohmeyer (compute)
- Foundry partner: tsmc (advanced nodes; uses CoWoS)
- Design partner: Broadcom (custom ASIC implementation/IP)
- Major external customers: Anthropic (anchor), Salesforce, Hugging Face, Midjourney
- Captive consumer: Google Search, Ads, Gemini, YouTube, Workspace AI features
- Direct competitors: nvidia, aws-trainium, aws-inferentia, microsoft-maia, amd, cerebras