Company
AMD
The credible #2 in AI accelerators — bigger memory than NVIDIA at the chip level, smaller software ecosystem at every other level.
1. Core Product / Service
AMD's Instinct MI line is the only non-NVIDIA discrete GPU family with material datacenter share in 2026:
- MI300X — TSMC N5 + N6 chiplets, 192 GB HBM3, 1.3 PFLOPS FP16 / 2.6 PFLOPS FP8; shipped late 2023.
- MI325X — refresh with 256 GB HBM3e; Llama-2 70B LoRA fine-tune in 21.75 min on 8 GPUs, ~8% faster than H200 in MLPerf-style tests [2].
- MI355X — CDNA 4, 288 GB HBM3e, near-parity with B200 on training (10.18 min vs 9.85 min on Llama-2 70B LoRA) and ~30% faster inference than B200 on Llama 3.1 405B with ~40% better tokens/$ in vendor-favorable benchmarks [1].
- MI400 / CDNA Next — targeted late-2026/2027 to compete with NVIDIA's Rubin generation.
The software stack is ROCm (AMD's open-source CUDA analogue), supplemented by recent investment in PyTorch upstreaming, vLLM/SGLang ROCm ports, and in-house model-compilation work via the SiloAI, ZT Systems, and Nod.ai acquisitions.
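What the ROCm ports mean in practice: ROCm builds of PyTorch expose HIP devices through the familiar `torch.cuda` namespace, so most CUDA-targeting model code runs unmodified. A minimal sketch, assuming a ROCm PyTorch wheel and an Instinct GPU in the machine:

```python
# Minimal ROCm/PyTorch portability check. Assumes a ROCm build of PyTorch
# (e.g. a rocm wheel from pytorch.org) and an Instinct GPU are present.
import torch

# On ROCm builds, HIP devices enumerate under the "cuda" device type,
# which is exactly why CUDA-first code tends to run without changes.
print(torch.cuda.is_available())      # True on a working ROCm install
print(torch.version.hip)              # HIP version string; None on CUDA builds
print(torch.cuda.get_device_name(0))  # reports the Instinct part

# The same tensor code targets either backend unchanged.
x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
y = x @ x  # GEMM dispatched to AMD's BLAS libraries on ROCm
print(y.shape)
```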
2. Target Users & Pain Points
- Hyperscalers diversifying away from NVIDIA risk — Microsoft and Meta are the visible MI300X buyers; both want a credible second source on cost and supply.
- Inference-heavy customers for whom 192–288 GB of HBM means a full 405B-class model fits on fewer GPUs (lower TCO; memory math sketched at the end of this section).
- Sovereigns and HPC labs with existing AMD CPU footprints (El Capitan, Frontier).
Pain solved: NVIDIA allocation queue + price; HBM headroom for memory-bound inference. Pain not solved: every framework, kernel, and reference implementation is CUDA-first.
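The memory math behind the "fewer GPUs" claim is simple division. A hedged sketch, assuming FP8 weights (1 byte/param) and a flat 20% allowance for KV cache and activations — both illustrative placeholders, not measured figures:

```python
import math

def gpus_needed(params_b: float, hbm_gb: float,
                bytes_per_param: float = 1.0, overhead: float = 0.20) -> int:
    """Smallest GPU count whose pooled HBM holds weights plus overhead.

    bytes_per_param=1.0 assumes FP8 weights; overhead=0.20 is a rough
    placeholder for KV cache and activations, not a measured figure.
    """
    needed_gb = params_b * bytes_per_param * (1 + overhead)
    return math.ceil(needed_gb / hbm_gb)

for name, hbm in [("H100 (80 GB)", 80), ("MI300X (192 GB)", 192),
                  ("B200 (192 GB)", 192), ("MI355X (288 GB)", 288)]:
    print(f"{name}: {gpus_needed(405, hbm)} GPUs for a 405B model")
```

Under these assumptions a 405B-class model needs 7 H100s but only 2 MI355Xs; in practice KV cache grows with batch size and context length, so the HBM headroom matters even more than the weight math alone suggests.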
3. Competitive Landscape
| Chip | HBM | Peak compute (precision) | Software | Notes |
|---|---|---|---|---|
| AMD MI300X | 192 GB HBM3 | 2.6 PF FP8 | ROCm 6 | ~45% of theoretical FLOPS achieved in microbench vs ~93% on H100 [2] |
| AMD MI355X | 288 GB HBM3e | ~10 PF FP4 (claimed) | ROCm 6 | Memory advantage for big-model inference |
| nvidia H100 | 80 GB HBM3 | ~2 PF FP8 (dense) | CUDA | Reference workhorse |
| nvidia B200 | 192 GB HBM3e | ~10 PF FP4 | CUDA | Direct competitor; FlashAttention runs ~20% faster on CUDA than ROCm [4] |
| intel Gaudi 3 | 128 GB HBM2e | ~1.8 PF FP8 | SynapseAI | Cheaper but slower ramp |
| google-tpu Trillium | 32 GB HBM | ~0.9 PF BF16 | XLA | Captive |
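The FlashAttention note in the B200 row is easy to probe at small scale: PyTorch's `scaled_dot_product_attention` dispatches to fused flash-style kernels on both CUDA and ROCm builds. A hedged timing sketch — shapes and iteration counts are arbitrary choices, not the methodology behind [4]:

```python
# Rough flash-attention timing sketch; runs on CUDA or ROCm builds alike.
import time
import torch
import torch.nn.functional as F

device = "cuda"  # HIP devices also enumerate as "cuda" on ROCm builds
q, k, v = (torch.randn(8, 32, 4096, 128, device=device, dtype=torch.float16)
           for _ in range(3))

for _ in range(3):  # warm-up so kernel selection is excluded from timing
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(20):
    F.scaled_dot_product_attention(q, k, v, is_causal=True)
torch.cuda.synchronize()
print(f"{(time.perf_counter() - t0) / 20 * 1e3:.2f} ms/iteration")
```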
4. Unique Observations
- The hardware-on-paper vs hardware-in-production gap is the defining AMD problem. MI300X has 2.4× the HBM of H100 and similar FLOPS, yet achieves only ~45% of theoretical peak on common workloads where H100/B200 hit ~93% [2]. The bottleneck is the kernel/compiler stack (see gpu-kernel-optimization), not the silicon; the ratio arithmetic is sketched after this list.
- ROCm is closing on CUDA, but slowly. FlashAttention, the load-bearing primitive for transformer inference, runs ~20% slower on ROCm than on CUDA as of 2026; ROCm 6 added PyTorch 2.x and TensorFlow support, but optimization still requires manual work where CUDA handles it automatically [4]. Each generation closes maybe 20–30% of the gap.
- MI355X tokens-per-dollar narrative is the wedge. With 50% more HBM than B200 and AMD pricing aggressively for share, the marginal-cost story for memory-bound serving (large MoE, long-context) is genuinely better. This is what Microsoft and Meta are buying. Whether it scales to a 20%+ share by 2027 depends almost entirely on how fast ROCm catches up.
- Market share trajectory: 5–7% in 2026 [1]; AMD's own guidance pushes for the high-single digits. The credible bear case: AMD becomes a structural #2 at 10–15% share, NVIDIA holds ~80%, captive chips (google-tpu aws-trainium microsoft-maia) absorb the rest. The bull case requires ROCm to reach CUDA-parity, which is a 3–5 year project.
- Acquisition strategy is "buy a software stack." ZT Systems (system integration), SiloAI (Finnish open-model team), Nod.ai (compiler), Pensando (DPU) — all strengthen the non-silicon layers AMD has historically been weak in.
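Both the utilization gap and the tokens-per-dollar wedge above reduce to ratio arithmetic. A sketch using the peaks from the section 3 table, with all prices and throughputs as placeholder assumptions, not quotes or measurements:

```python
# Ratio arithmetic behind the utilization and tokens/$ observations above.
# Peak figures follow the table in section 3; everything else is a
# placeholder assumption, not measured or quoted data.

def utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak sustained on a real workload."""
    return achieved_tflops / peak_tflops

print(utilization(1170, 2600))  # ~0.45, the MI300X-like figure from [2]
print(utilization(1840, 1980))  # ~0.93, the H100-like figure from [2]

def tokens_per_dollar(tokens_per_s: float, gpu_price_usd: float,
                      lifetime_years: float = 3.0) -> float:
    """Lifetime tokens per capex dollar (ignores power, hosting, failures)."""
    return tokens_per_s * lifetime_years * 365 * 86400 / gpu_price_usd

# Equal throughput at a lower (assumed) price is the whole wedge:
print(tokens_per_dollar(1000, 25_000))  # hypothetical aggressive pricing
print(tokens_per_dollar(1000, 35_000))  # hypothetical incumbent pricing
```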
5. Financials / Funding
- Public (NASDAQ: AMD); market cap in the ~$200–300B range in 2026
- Data Center: now AMD's largest segment; Instinct accelerators contributing a >$5B annual run rate
- MI300/325/355 cumulative AI revenue: AMD has guided to tens of billions of dollars over a multi-year trajectory; specific 2026 numbers depend on quarterly disclosure
- R&D: ~$6–7B/year; significant share now AI-targeted
- Foundry: tsmc N5/N4P → N3/N2 for MI400
6. People & Relationships
- CEO: Lisa Su — engineer-CEO who turned AMD around from near-bankruptcy in 2014; one of the most respected operators in semis
- CTO: Mark Papermaster
- DC GPU lead: Vamsi Boppana (SVP, AI)
- Foundry: tsmc (N5 → N3 → N2)
- HBM suppliers: SK hynix, Micron, Samsung
- Major customers: Microsoft Azure (MI300X), Meta (MI300X for Llama serving), Oracle, lambda-labs, coreweave (limited), HPE/Dell OEMs
- Acquisitions (AI-related, 2022–2025): ZT Systems ($4.9B, system integration), SiloAI ($665M, models), Nod.ai (compiler), Pensando ($1.9B, networking)
- Direct competitors: nvidia intel huawei-ascend google-tpu aws-trainium microsoft-maia cerebras
Last compiled: 2026-05-10