Company
AMD
The credible #2 in AI accelerators — bigger memory than NVIDIA at the chip level, smaller software ecosystem at every other level.
1. Core Product / Service
AMD's Instinct MI line is the only non-NVIDIA discrete GPU family with material datacenter share in 2026:
- MI300X — TSMC N5 + N6 chiplets, 192 GB HBM3, 1.3 PFLOPS FP16 / 2.6 PFLOPS FP8; shipped late 2023.
- MI325X — refresh with 256 GB HBM3e; Llama-2 70B LoRA fine-tune in 21.75 min on 8 GPUs, ~8% faster than H200 in MLPerf-style tests [2].
- MI355X — CDNA 4, 288 GB HBM3e, near-parity with B200 on training (10.18 min vs 9.85 min on Llama-2 70B LoRA) and ~30% faster inference than B200 on Llama 3.1 405B with ~40% better tokens/$ in vendor-favorable benchmarks [1].
- MI400 / CDNA Next — targeted late-2026/2027 to compete with NVIDIA's Rubin generation.
The software stack is ROCm (AMD's open-source CUDA analogue), supplemented by recent investment in PyTorch upstreaming, vLLM/SGLang ROCm ports, and in-house model-compilation work via the SiloAI, ZT Systems, and Nod.ai acquisitions.
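What the ROCm ports mean in practice: ROCm builds of PyTorch expose HIP devices through the familiar `torch.cuda` namespace, so most CUDA-targeting model code runs unmodified. A minimal sketch, assuming a ROCm PyTorch wheel and an Instinct GPU in the machine:

```python
# Minimal ROCm/PyTorch portability check. Assumes a ROCm build of PyTorch
# (e.g. a rocm wheel from pytorch.org) and an Instinct GPU are present.
import torch

# On ROCm builds, HIP devices enumerate under the "cuda" device type,
# which is exactly why CUDA-first code tends to run without changes.
print(torch.cuda.is_available())      # True on a working ROCm install
print(torch.version.hip)              # HIP version string; None on CUDA builds
print(torch.cuda.get_device_name(0))  # reports the Instinct part

# The same tensor code targets either backend unchanged.
x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
y = x @ x  # GEMM dispatched to AMD's BLAS libraries on ROCm
print(y.shape)
```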
2. Target Users & Pain Points
- Hyperscalers diversifying away from NVIDIA risk — Microsoft and Meta are the visible MI300X buyers; both want a credible second source on cost and supply.
- Inference-heavy customers for whom 192–288 GB of HBM means a full 405B-class model fits on fewer GPUs (lower TCO; memory math sketched at the end of this section).
- Sovereigns and HPC labs with existing AMD CPU footprints (El Capitan, Frontier).
Pain solved: NVIDIA allocation queue + price; HBM headroom for memory-bound inference. Pain not solved: every framework, kernel, and reference implementation is CUDA-first.
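The memory math behind the "fewer GPUs" claim is simple division. A hedged sketch, assuming FP8 weights (1 byte/param) and a flat 20% allowance for KV cache and activations — both illustrative placeholders, not measured figures:

```python
import math

def gpus_needed(params_b: float, hbm_gb: float,
                bytes_per_param: float = 1.0, overhead: float = 0.20) -> int:
    """Smallest GPU count whose pooled HBM holds weights plus overhead.

    bytes_per_param=1.0 assumes FP8 weights; overhead=0.20 is a rough
    placeholder for KV cache and activations, not a measured figure.
    """
    needed_gb = params_b * bytes_per_param * (1 + overhead)
    return math.ceil(needed_gb / hbm_gb)

for name, hbm in [("H100 (80 GB)", 80), ("MI300X (192 GB)", 192),
                  ("B200 (192 GB)", 192), ("MI355X (288 GB)", 288)]:
    print(f"{name}: {gpus_needed(405, hbm)} GPUs for a 405B model")
```

Under these assumptions a 405B-class model needs 7 H100s but only 2 MI355Xs; in practice KV cache grows with batch size and context length, so the HBM headroom matters even more than the weight math alone suggests.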
3. Competitive Landscape
| Chip | HBM | Peak compute (precision) | Software | Notes |
|---|---|---|---|---|
| AMD MI300X | 192 GB HBM3 | 2.6 PF FP8 | ROCm 6 | ~45% of theoretical FLOPS achieved in microbench vs ~93% on H100 [2] |
| AMD MI355X | 288 GB HBM3e | ~10 PF FP4 (claimed) | ROCm 6 | Memory advantage for big-model inference |
| nvidia H100 | 80 GB HBM3 | ~2 PF FP8 (dense) | CUDA | Reference workhorse |
| nvidia B200 | 192 GB HBM3e | ~10 PF FP4 | CUDA | Direct competitor; FlashAttention runs ~20% faster on CUDA than ROCm [4] |
| intel Gaudi 3 | 128 GB HBM2e | ~1.8 PF FP8 | SynapseAI | Cheaper but slower ramp |
| google-tpu Trillium | 32 GB HBM | ~0.9 PF BF16 | XLA | Captive |
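The FlashAttention note in the B200 row is easy to probe at small scale: PyTorch's `scaled_dot_product_attention` dispatches to fused flash-style kernels on both CUDA and ROCm builds. A hedged timing sketch — shapes and iteration counts are arbitrary choices, not the methodology behind [4]:

```python
# Rough flash-attention timing sketch; runs on CUDA or ROCm builds alike.
import time
import torch
import torch.nn.functional as F

device = "cuda"  # HIP devices also enumerate as "cuda" on ROCm builds
q, k, v = (torch.randn(8, 32, 4096, 128, device=device, dtype=torch.float16)
           for _ in range(3))

for _ in range(3):  # warm-up so kernel selection is excluded from timing
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(20):
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()
print(f"{(time.perf_counter() - t0) / 20 * 1e3:.2f} ms/iteration")
```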
4. Unique Observations
- The hardware-on-paper vs hardware-in-production gap is the defining AMD problem. MI300X has 2.4× the HBM of H100 and similar FLOPS, yet achieves only ~45% of theoretical peak on common workloads where H100/B200 hit ~93% [2]. The bottleneck is the kernel/compiler stack (see gpu-kernel-optimization), not the silicon; the ratio arithmetic is sketched after this list.
- ROCm is closing on CUDA, but slowly. FlashAttention, the load-bearing primitive for transformer inference, runs ~20% slower on ROCm than on CUDA as of 2026; ROCm 6 added PyTorch 2.x and TensorFlow support, but optimization still requires manual work where CUDA handles it automatically [4]. Each generation closes maybe 20–30% of the gap.
- MI355X tokens-per-dollar narrative is the wedge. With 50% more HBM than B200 and AMD pricing aggressively for share, the marginal-cost story for memory-bound serving (large MoE, long-context) is genuinely better. This is what Microsoft and Meta are buying. Whether it scales to a 20%+ share by 2027 depends almost entirely on how fast ROCm catches up.
- Market share trajectory: 5–7% in 2026 [1]; AMD's own guidance pushes for the high-single digits. The credible bear case: AMD becomes a structural #2 at 10–15% share, NVIDIA holds ~80%, captive chips (google-tpu aws-trainium microsoft-maia) absorb the rest. The bull case requires ROCm to reach CUDA-parity, which is a 3–5 year project.
- Acquisition strategy is "buy a software stack." ZT Systems (system integration), SiloAI (Finnish open-model team), Nod.ai (compiler), Pensando (DPU) — all strengthen the non-silicon layers AMD has historically been weak in.
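Both the utilization gap and the tokens-per-dollar wedge above reduce to ratio arithmetic. A sketch using the peaks from the section 3 table, with all prices and throughputs as placeholder assumptions, not quotes or measurements:

```python
# Ratio arithmetic behind the utilization and tokens/$ observations above.
# Peak figures follow the table in section 3; everything else is a
# placeholder assumption, not measured or quoted data.

def utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak sustained on a real workload."""
    return achieved_tflops / peak_tflops

print(utilization(1170, 2600))  # ~0.45, the MI300X-like figure from [2]
print(utilization(1840, 1980))  # ~0.93, the H100-like figure from [2]

def tokens_per_dollar(tokens_per_s: float, gpu_price_usd: float,
                      lifetime_years: float = 3.0) -> float:
    """Lifetime tokens per capex dollar (ignores power, hosting, failures)."""
    return tokens_per_s * lifetime_years * 365 * 86400 / gpu_price_usd

# Equal throughput at a lower (assumed) price is the whole wedge:
print(tokens_per_dollar(1000, 25_000))  # hypothetical aggressive pricing
print(tokens_per_dollar(1000, 35_000))  # hypothetical incumbent pricing
```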
5. Financials / Funding
- Public (NASDAQ: AMD); market cap in the ~$200–300B range in 2026
- Data Center: now AMD's largest segment; Instinct accelerators contributing a >$5B annual run rate
- MI300/325/355 cumulative AI revenue: AMD has guided to tens of billions of dollars over a multi-year trajectory; specific 2026 numbers depend on quarterly disclosure
- R&D: ~$6–7B/year; significant share now AI-targeted
- Foundry: tsmc N5/N4P → N3/N2 for MI400
6. People & Relationships
- CEO: Lisa Su — engineer-CEO who turned AMD around from near-bankruptcy in 2014; one of the most respected operators in semis
- CTO: Mark Papermaster
- DC GPU lead: Vamsi Boppana (SVP, AI)
- Foundry: tsmc (N5 → N3 → N2)
- HBM suppliers: SK hynix, Micron, Samsung
- Major customers: Microsoft Azure (MI300X), Meta (MI300X for Llama serving), Oracle, lambda-labs, coreweave (limited), HPE/Dell OEMs
- Acquisitions (AI-related, 2022–2025): ZT Systems ($4.9B, system integration), SiloAI ($665M, models), Nod.ai (compiler), Pensando ($1.9B, networking)
- Direct competitors: nvidia intel huawei-ascend google-tpu aws-trainium microsoft-maia cerebras
Last compiled: 2026-05-10