Company

AMD

The credible #2 in AI accelerators — bigger memory than NVIDIA at the chip level, smaller software ecosystem at every other level.

1. Core Product / Service

AMD's Instinct MI line is the only non-NVIDIA discrete GPU with material datacenter share in 2026:

  • MI300X — TSMC N5 + N6 chiplets, 192 GB HBM3, 1.3 PFLOPS FP16 / 2.6 PFLOPS FP8; shipped late 2023.
  • MI325X — refresh with 256 GB HBM3e; Llama-2 70B LoRA fine-tune in 21.75 min on 8 GPUs, ~8% faster than H200 in MLPerf-style tests [2].
  • MI355X — CDNA 4, 288 GB HBM3e; near parity with B200 on training (10.18 min vs B200's 9.85 min on the Llama-2 70B LoRA run) and ~30% faster inference than B200 on Llama 3.1 405B, with ~40% better tokens/$ in vendor-favorable benchmarks [1].
  • MI400 / CDNA Next — targeted late-2026/2027 to compete with Rubin.

The software stack is ROCm (an open-source CUDA analogue), supplemented by recent investments in PyTorch upstreaming, vLLM/SGLang ROCm ports, and AMD-internal model-compilation work via the SiloAI / ZT Systems / Nod.ai acquisitions.

2. Target Users & Pain Points

  • Hyperscalers hedging NVIDIA concentration risk — Microsoft and Meta are the visible MI300X buyers; both want a credible second source on price and supply.
  • Inference-heavy customers for whom 192–288 GB HBM means a full 405B-class model fits on fewer GPUs (lower TCO).
  • Sovereigns and HPC labs with existing AMD CPU footprints (El Capitan, Frontier).

Pain solved: NVIDIA allocation queue + price; HBM headroom for memory-bound inference. Pain not solved: every framework, kernel, and reference implementation is CUDA-first.
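The memory-headroom argument can be made concrete with back-of-envelope arithmetic. A minimal sketch, counting weights only at FP8 (1 byte/param) and ignoring KV cache, activations, and parallelism overheads:

```python
import math

def min_gpus(params_b: float, bytes_per_param: float, hbm_gb: float) -> int:
    """Minimum GPU count just to hold the weights (no KV cache or activations)."""
    weight_gb = params_b * bytes_per_param  # params in billions * bytes/param = GB
    return math.ceil(weight_gb / hbm_gb)

# Llama 3.1 405B at FP8 ~= 405 GB of weights
for name, hbm in [("H100 (80 GB)", 80), ("MI300X (192 GB)", 192), ("MI355X (288 GB)", 288)]:
    print(f"{name}: {min_gpus(405, 1.0, hbm)} GPUs minimum")
```

Weights alone need 6+ H100s but only 3+ MI300Xs or 2+ MI355Xs; in practice KV cache pushes the real counts higher on every chip, but the ratio, and hence the host-count TCO argument, survives.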

3. Competitive Landscape

| Chip | HBM | Peak FP8 / FP4 | Software | Notes |
|---|---|---|---|---|
| AMD MI300X | 192 GB HBM3 | 2.6 PF FP8 | ROCm 6 | ~45% of theoretical FLOPS achieved in microbenchmarks vs ~93% on H100 [2] |
| AMD MI355X | 288 GB HBM3e | ~10 PF FP4 (claimed) | ROCm 6 | Memory advantage for big-model inference |
| nvidia H100 | 80 GB | 4 PF FP8 | CUDA | Reference workhorse |
| nvidia B200 | 192 GB HBM3e | ~10 PF FP4 | CUDA | Direct competitor; FlashAttention runs ~20% faster on CUDA than ROCm [4] |
| intel Gaudi 3 | 128 GB HBM2e | ~1.8 PF FP8 | SynapseAI | Cheaper but slower ramp |
| google-tpu Trillium | 32 GB HBM | ~0.9 PF BF16 | XLA | Captive |
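The utilization figures cited in the table ([2]) are model-FLOPS-utilization (MFU): achieved FLOPS, estimated from measured token throughput at ≈ 6 × parameters FLOPs per training token, divided by the chip's peak. A sketch with hypothetical per-GPU throughputs chosen to reproduce the cited ratios (only the ~45%/~93% figures come from [2]):

```python
def train_mfu(params_b: float, tokens_per_sec: float, peak_tflops: float) -> float:
    """Training MFU: (6 * N FLOPs/token * tokens/s) / theoretical peak."""
    achieved_tflops = 6 * params_b * 1e9 * tokens_per_sec / 1e12
    return achieved_tflops / peak_tflops

# Throughputs below are illustrative assumptions; peaks are the FP8 figures above.
print(f"MI300X: {train_mfu(70, 2786, 2600):.0%}")  # ~45% of peak [2]
print(f"H100:   {train_mfu(70, 8857, 4000):.0%}")  # ~93% of peak [2]
```

The same paper-FLOPS chip can deliver half the sustained throughput purely from the kernel stack, which is why the table's Notes column matters more than its Peak column.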

4. Unique Observations

  • Hardware-on-paper vs hardware-in-production gap is the defining AMD problem. MI300X has 2.4× the HBM of H100 and similar FLOPS, yet achieves only ~45% of theoretical peak on common workloads where H100/B200 hit ~93% [2]. The bottleneck is the kernel/compiler stack — see gpu-kernel-optimization — not the silicon.
  • ROCm is closing on CUDA but slowly. FlashAttention, the load-bearing primitive for transformer inference, runs ~20% slower on ROCm than CUDA as of 2026; ROCm 6 added PyTorch 2.x and TensorFlow support, but optimization still requires manual work where CUDA is automatic [4]. Each generation closes maybe 20–30% of the gap.
  • MI355X tokens-per-dollar narrative is the wedge. With 50% more HBM than B200 and AMD pricing aggressively for share, the marginal-cost story for memory-bound serving (large MoE, long-context) is genuinely better. This is what Microsoft and Meta are buying. Whether it scales to a 20%+ share by 2027 depends almost entirely on how fast ROCm catches up.
  • Market share trajectory: 5–7% in 2026 [1]; AMD's own guidance pushes for high single digits. The credible bear case: AMD settles as a structural #2 at 10–15% share, NVIDIA holds ~80%, and captive chips (google-tpu aws-trainium microsoft-maia) absorb the rest. The bull case requires ROCm to reach CUDA parity, a 3–5-year project.
  • Acquisition strategy is "buy a software stack." ZT Systems (system integration), SiloAI (Finnish open-model team), Nod.ai (compiler), Pensando (DPU) — all strengthen the non-silicon layers AMD has historically been weak in.
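The tokens-per-dollar wedge above is simple arithmetic: relative throughput divided by relative price. A sketch using the ~30% inference speedup from [1]; the price discount is an illustrative assumption, not a disclosed figure:

```python
def tokens_per_dollar_ratio(throughput_ratio: float, price_ratio: float) -> float:
    """Relative tokens/$ advantage given relative throughput and relative price."""
    return throughput_ratio / price_ratio

# MI355X vs B200 on Llama 3.1 405B inference: ~1.3x throughput [1].
# An ~8% lower system price (assumed) would produce a tokens/$ edge in the
# ~40% range claimed in [1].
ratio = tokens_per_dollar_ratio(throughput_ratio=1.30, price_ratio=0.92)
print(f"tokens/$ advantage: ~{ratio - 1:.0%}")
```

The point of the sketch: the wedge is the *product* of a performance edge and a price edge, so even modest discounting compounds the memory-driven throughput advantage.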

5. Financials / Funding

  • Public: NASDAQ: AMD; market cap range ~$200–300B in 2026
  • Data Center segment: now AMD's largest; MI accelerators contributing a >$5B annual revenue run rate
  • MI300/325/355 cumulative AI revenue: AMD has guided to a multi-year multi-tens-of-billions trajectory; specific 2026 numbers depend on quarterly disclosure
  • R&D: ~$6–7B/year; significant share now AI-targeted
  • Foundry: tsmc N5/N4P → N3/N2 for MI400

6. People & Relationships

  • CEO: Lisa Su — engineer-CEO who turned AMD around from near-bankruptcy in 2014; one of the most respected operators in semis
  • CTO: Mark Papermaster
  • DC GPU lead: Vamsi Boppana (SVP, AI)
  • Foundry: tsmc (N5 → N3 → N2)
  • HBM suppliers: SK hynix, Micron, Samsung
  • Major customers: Microsoft Azure (MI300X), Meta (MI300X for Llama serving), Oracle, lambda-labs, coreweave (limited), HPE/Dell OEMs
  • Acquisitions (AI-related, 2024–2025): ZT Systems ($4.9B, system integration), SiloAI ($665M, models), Nod.ai (compiler), Pensando ($1.9B, networking)
  • Direct competitors: nvidia intel huawei-ascend google-tpu aws-trainium microsoft-maia cerebras
Last compiled: 2026-05-10