Cerebras
Wafer-scale AI chip maker pivoting from training appliance to high-throughput inference cloud, riding a $10B+ OpenAI contract toward a 2026 Nasdaq IPO.
1. Core Product / Service
Cerebras builds the Wafer-Scale Engine (WSE-3) — a single chip, fabricated on TSMC's 5nm process, that occupies nearly an entire 300mm wafer: 4 trillion transistors, 900,000 AI cores, 44GB of on-chip SRAM, and 125 PFLOPS of peak AI performance. The chip ships inside the CS-3 appliance for on-prem training and fine-tuning, and powers Cerebras Inference, a token-streaming cloud service.
The differentiation lies in keeping model weights and activations on-wafer: with no PCIe/NVLink/HBM hop in the decode loop, sequential token generation scales near-linearly. Public benchmarks show 969 t/s on Llama 3.1-405B and ~2,100 t/s on Llama 3.1-70B — roughly an order of magnitude faster than GPU stacks for streaming workloads. See ai-inference-engines for where this sits in the inference-stack landscape.
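A back-of-envelope sketch makes the bottleneck concrete: batch-1 autoregressive decode on GPUs is memory-bound, because every generated token requires streaming the full weight set from HBM. The figures below (FP16 weights, ~3.35 TB/s HBM per H100-class GPU, perfect 8-way bandwidth aggregation) are illustrative assumptions for this estimate, not measurements.

```python
# Back-of-envelope: single-stream decode speed is bounded by how fast
# model weights can be streamed from memory per token (memory-bound regime).
# All figures below are illustrative assumptions, not measured numbers.

def decode_tokens_per_sec(weight_bytes: float, mem_bw_bytes_per_sec: float) -> float:
    """Upper bound on tokens/s for batch-1 autoregressive decode:
    each generated token requires one full pass over the weights."""
    return mem_bw_bytes_per_sec / weight_bytes

# Llama 3.1-70B in FP16: ~70e9 params * 2 bytes = 140 GB of weights.
weights = 70e9 * 2

# Hypothetical 8-GPU node, ~3.35 TB/s HBM per GPU (H100-class spec),
# assuming ideal tensor-parallel bandwidth aggregation.
gpu_bound = decode_tokens_per_sec(weights, 8 * 3.35e12)

print(f"Ideal 8xGPU batch-1 ceiling: ~{gpu_bound:.0f} tokens/s")
```

Under these assumptions the GPU node tops out below ~200 t/s even with zero overhead, which is why the cited ~2,100 t/s figure is only reachable once weights live in on-wafer SRAM and decode stops being HBM-bandwidth-bound.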
2. Target Users & Pain Points
Two distinct buyer profiles:
- Frontier labs / sovereign clouds wanting massive training throughput without multi-rack networking complexity (OpenAI, G42/Core42, MBZUAI, national labs).
- Latency-sensitive inference customers — agentic apps, real-time code/voice, reasoning-heavy chains — where time-to-first-token and tokens/second materially change UX (Cognition, Mistral via partners, Hugging Face, Mayo Clinic, GSK).
The pain Cerebras solves: GPU clusters force a tradeoff between throughput and latency at large context lengths; wafer-scale removes the inter-chip-communication tax. The pains it does not solve: CUDA-ecosystem lock-in remains a switching cost, and wafer-scale capacity is a poor fit for spiky or small workloads that need elastic rental.
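Why tokens/second "materially changes UX" for the second buyer profile: agentic chains run LLM calls sequentially, so per-step generation time multiplies. A minimal sketch, using assumed (not benchmarked) latency numbers:

```python
# A k-step agent chain runs its LLM calls sequentially, so per-step
# generation time compounds. All numbers below are assumptions for
# illustration, not vendor benchmarks.

def chain_latency_sec(steps: int, ttft_sec: float, out_tokens: int, tps: float) -> float:
    """End-to-end latency of `steps` sequential LLM calls, each emitting
    `out_tokens` tokens at `tps` tokens/s after `ttft_sec` time-to-first-token."""
    return steps * (ttft_sec + out_tokens / tps)

# A hypothetical 10-step coding agent emitting 500 tokens per step:
slow = chain_latency_sec(10, ttft_sec=0.5, out_tokens=500, tps=50)    # GPU-class assumption
fast = chain_latency_sec(10, ttft_sec=0.2, out_tokens=500, tps=2000)  # wafer-scale-class assumption

print(f"{slow:.0f}s vs {fast:.1f}s end-to-end")
```

A ~40x tokens/s gap turns a coffee-break wait into an interactive loop, which is the whole pitch to agentic-app buyers.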
3. Competitive Landscape
| Vendor | Architecture | Strength | Weakness vs Cerebras |
|---|---|---|---|
| NVIDIA (H100/B200) | GPU + NVLink/HBM | CUDA ecosystem, supply, generality | 5-20x slower on streaming inference for large open models |
| Groq | LPU (deterministic SRAM) | Lowest TTFT on small models | Smaller on-chip memory; struggles at 405B+ scale |
| SambaNova (SN40L/SN50) | Reconfigurable dataflow | Best perf/area; strong on private deployments | Lower peak t/s than Cerebras on biggest open models |
| GPU clouds (coreweave, lambda-labs, runpod, nebius, together-ai) | NVIDIA rental | Flexibility, ecosystem | Bottlenecked by GPU memory bandwidth for streaming |
Cerebras' wedge is streaming throughput on very large open-weight models; it is not trying to win on $/training-FLOP against H100/B200 fleets.
4. Unique Observations
- Customer concentration risk is real but shifting. At 2024 S-1, G42 was 87% of revenue. For FY2025, MBZUAI + G42 together were ~86% of $510M revenue — still highly concentrated, but the OpenAI Master Relationship Agreement (750 MW expandable to 2 GW, $20B+ over term) reframes the story as "diversifying via a single mega-anchor."
- The AWS Bedrock binding term sheet is the more strategically interesting disclosure: it forces NVIDIA-native hyperscalers to integrate non-GPU silicon into managed inference, which is a precedent Groq/SambaNova have not matched.
- Inference-first pivot matches the broader industry shift — see ai-inference-engines. Cerebras was a training-chip story in 2022; in 2026 the IPO narrative is almost entirely about inference economics.
- Wafer-scale is fundamentally orthogonal to the gpu-kernel-optimization playbook: there is no kernel-fusion lever to pull because the whole model fits on-die. This is both liberating and limiting.
5. Financials / Funding
- FY2025 revenue: $510M, +76% YoY; net income $87.9M (first profitable year per S-1).
- IPO: amended S-1 filed May 2026; targeting Nasdaq listing (ticker CBRS), 28M Class A shares at $115-$125 range, ~$3.5B raise at ~$26.6B valuation.
- Anchor contracts: $20B+ OpenAI Master Relationship Agreement (750 MW, expandable to 2 GW); binding term sheet with AWS for CS-3 in Bedrock.
- Earlier private rounds led by G42 / Foundation Capital / Benchmark / Eclipse / Altimeter; pre-IPO valuation marks were below the IPO range.
6. People & Relationships
- Founders: Andrew Feldman (CEO), Gary Lauterbach, Michael James, Sean Lie, Jean-Philippe Fricker — most of the founding team came from SeaMicro (acquired by AMD in 2012).
- Investors: G42, Foundation Capital, Benchmark, Eclipse Ventures, Altimeter, Coatue.
- Customers / Partners: OpenAI (anchor inference contract), G42 / Core42, MBZUAI, AWS (Bedrock integration), Meta, IBM, Hugging Face, Oracle, Cognition, Mayo Clinic, GlaxoSmithKline.
- Adjacent infra plays in this wiki: coreweave, lambda-labs, runpod, nebius, together-ai — all NVIDIA-fleet operators competing for the same inference dollars from a different architectural direction.