Cerebras
Wafer-scale AI chip maker pivoting from training appliance to high-throughput inference cloud, riding a $10B+ OpenAI contract toward a 2026 Nasdaq IPO.
1. Core Product / Service
Cerebras builds the Wafer-Scale Engine (WSE-3) — a single chip, fabricated on TSMC's 5nm process, that occupies nearly an entire 300mm wafer: 4 trillion transistors, 900,000 AI cores, 44GB of on-chip SRAM, and 125 PFLOPS of peak AI performance. The chip ships inside the CS-3 appliance for on-prem training and fine-tuning, and powers Cerebras Inference, a token-streaming cloud service.
The differentiation lies in keeping model weights and activations on-wafer: with no PCIe/NVLink/HBM hop in the decode loop, sequential token generation scales near-linearly. Public benchmarks show 969 t/s on Llama 3.1-405B and ~2,100 t/s on Llama 3.1-70B — roughly an order of magnitude faster than GPU stacks for streaming workloads. See ai-inference-engines for where this sits in the inference-stack landscape.
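A back-of-envelope sketch makes the bottleneck concrete: batch-1 autoregressive decode on GPUs is memory-bound, because every generated token requires streaming the full weight set from HBM. The figures below (FP16 weights, ~3.35 TB/s HBM per H100-class GPU, perfect 8-way bandwidth aggregation) are illustrative assumptions for this estimate, not measurements.

```python
# Back-of-envelope: single-stream decode speed is bounded by how fast
# model weights can be streamed from memory per token (memory-bound regime).
# All figures below are illustrative assumptions, not measured numbers.

def decode_tokens_per_sec(weight_bytes: float, mem_bw_bytes_per_sec: float) -> float:
    """Upper bound on tokens/s for batch-1 autoregressive decode:
    each generated token requires one full pass over the weights."""
    return mem_bw_bytes_per_sec / weight_bytes

# Llama 3.1-70B in FP16: ~70e9 params * 2 bytes = 140 GB of weights.
weights = 70e9 * 2

# Hypothetical 8-GPU node, ~3.35 TB/s HBM per GPU (H100-class spec),
# assuming ideal tensor-parallel bandwidth aggregation.
gpu_bound = decode_tokens_per_sec(weights, 8 * 3.35e12)

print(f"Ideal 8xGPU batch-1 ceiling: ~{gpu_bound:.0f} tokens/s")
```

Under these assumptions the GPU node tops out below ~200 t/s even with zero overhead, which is why the cited ~2,100 t/s figure is only reachable once weights live in on-wafer SRAM and decode stops being HBM-bandwidth-bound.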
2. Target Users & Pain Points
Two distinct buyer profiles:
- Frontier labs / sovereign clouds wanting massive training throughput without multi-rack networking complexity (OpenAI, G42/Core42, MBZUAI, national labs).
- Latency-sensitive inference customers — agentic apps, real-time code/voice, reasoning-heavy chains — where time-to-first-token and tokens/second materially change UX (Cognition, Mistral via partners, Hugging Face, Mayo Clinic, GSK).
The pain Cerebras solves: GPU clusters force a tradeoff between throughput and latency at large context lengths; wafer-scale removes the inter-chip-communication tax. The pains it does not solve: CUDA-ecosystem lock-in remains a switching cost, and wafer-scale capacity is a poor fit for spiky or small workloads that need elastic rental.
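Why tokens/second "materially changes UX" for the second buyer profile: agentic chains run LLM calls sequentially, so per-step generation time multiplies. A minimal sketch, using assumed (not benchmarked) latency numbers:

```python
# A k-step agent chain runs its LLM calls sequentially, so per-step
# generation time compounds. All numbers below are assumptions for
# illustration, not vendor benchmarks.

def chain_latency_sec(steps: int, ttft_sec: float, out_tokens: int, tps: float) -> float:
    """End-to-end latency of `steps` sequential LLM calls, each emitting
    `out_tokens` tokens at `tps` tokens/s after `ttft_sec` time-to-first-token."""
    return steps * (ttft_sec + out_tokens / tps)

# A hypothetical 10-step coding agent emitting 500 tokens per step:
slow = chain_latency_sec(10, ttft_sec=0.5, out_tokens=500, tps=50)    # GPU-class assumption
fast = chain_latency_sec(10, ttft_sec=0.2, out_tokens=500, tps=2000)  # wafer-scale-class assumption

print(f"{slow:.0f}s vs {fast:.1f}s end-to-end")
```

A ~40x tokens/s gap turns a coffee-break wait into an interactive loop, which is the whole pitch to agentic-app buyers.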
3. Competitive Landscape
| Vendor | Architecture | Strength | Weakness vs Cerebras |
|---|---|---|---|
| NVIDIA (H100/B200) | GPU + NVLink/HBM | CUDA ecosystem, supply, generality | 5-20x slower on streaming inference for large open models |
| Groq | LPU (deterministic SRAM) | Lowest TTFT on small models | Smaller on-chip memory; struggles at 405B+ scale |
| SambaNova (SN40L/SN50) | Reconfigurable dataflow | Best perf/area; strong on private deployments | Lower peak t/s than Cerebras on biggest open models |
| GPU clouds (coreweave, lambda-labs, runpod, nebius, together-ai) | NVIDIA rental | Flexibility, ecosystem | Bottlenecked by GPU memory bandwidth for streaming |
Cerebras' wedge is streaming throughput on very large open-weight models; it is not trying to win on $/training-FLOP against H100/B200 fleets.
4. Unique Observations
- Customer concentration risk is real but shifting. At 2024 S-1, G42 was 87% of revenue. For FY2025, MBZUAI + G42 together were ~86% of $510M revenue — still highly concentrated, but the OpenAI Master Relationship Agreement (750 MW expandable to 2 GW, $20B+ over term) reframes the story as "diversifying via a single mega-anchor."
- The AWS Bedrock binding term sheet is the more strategically interesting disclosure: it forces NVIDIA-native hyperscalers to integrate non-GPU silicon into managed inference, which is a precedent Groq/SambaNova have not matched.
- Inference-first pivot matches the broader industry shift — see ai-inference-engines. Cerebras was a training-chip story in 2022; in 2026 the IPO narrative is almost entirely about inference economics.
- Wafer-scale is fundamentally orthogonal to the gpu-kernel-optimization playbook: there is no kernel-fusion lever to pull because the whole model fits on-die. This is both liberating and limiting.
5. Financials / Funding
- FY2025 revenue: $510M, +76% YoY; net income $87.9M (first profitable year per S-1).
- IPO: amended S-1 filed May 2026; targeting Nasdaq listing (ticker CBRS), 28M Class A shares at $115-$125 range, ~$3.5B raise at ~$26.6B valuation.
- Anchor contracts: $20B+ OpenAI Master Relationship Agreement (750 MW, expandable to 2 GW); binding term sheet with AWS for CS-3 in Bedrock.
- Earlier private rounds led by G42 / Foundation Capital / Benchmark / Eclipse / Altimeter; pre-IPO valuation marks were below the IPO range.
6. People & Relationships
- Founders: Andrew Feldman (CEO), Gary Lauterbach, Michael James, Sean Lie, Jean-Philippe Fricker — most of the founding team came from SeaMicro (acquired by AMD in 2012).
- Investors: G42, Foundation Capital, Benchmark, Eclipse Ventures, Altimeter, Coatue.
- Customers / Partners: OpenAI (anchor inference contract), G42 / Core42, MBZUAI, AWS (Bedrock integration), Meta, IBM, Hugging Face, Oracle, Cognition, Mayo Clinic, GlaxoSmithKline.
- Adjacent infra plays in this wiki: coreweave, lambda-labs, runpod, nebius, together-ai — all NVIDIA-fleet operators competing for the same inference dollars from a different architectural direction.