AI Inference Engine Landscape

Related: inferact, radixark, gpu-kernel-optimization

Overview

AI inference engines are the infrastructure layer that serves trained large language models (LLMs) to users at scale. The market is commercializing rapidly around two dominant open-source projects: vLLM and SGLang.

Market Structure (2026)

Open-Source Leaders

Project	GitHub Stars	Commercial Entity	Valuation	Lead Investor(s)
vLLM	~65K	Inferact	$800M	a16z + Lightspeed
SGLang	~16K	RadixArk	$400M	Accel

Other Players

Engine	Developer	Notes
TensorRT-LLM	NVIDIA	Most optimized for NVIDIA hardware (NVIDIA GPUs only); open-source (Apache 2.0) but built on proprietary TensorRT
LMDeploy	Shanghai AI Lab (InternLM)	Strong INT4, TurboMind C++ engine
Xinference	Xorbits (Alibaba-affiliated)	Chinese market, distributed inference
Fireworks AI	Fireworks Inc.	$10B+ valuation, own engine

DeepSeek's Strategic Choice

DeepSeek (models V3, R1, V3-0324) chose to contribute optimizations back to vLLM rather than building their own inference engine.

Rationale:

DeepSeek is a model company, not an infrastructure company; maintaining its own engine would require a dedicated team.
vLLM has the largest deployment base, so contributing to vLLM lets DeepSeek models reach more users.
vLLM is hardware-agnostic, so DeepSeek benefits regardless of which hardware users run.

AI Lab Official Recommendations

Lab	Models	Recommended Engines
DeepSeek	V3, R1, V3-0324	SGLang (Day-0) + vLLM
Meta	Llama 4	vLLM + SGLang + TensorRT-LLM
Google	Gemma 3/4	vLLM
Mistral	Mistral Large 3	vLLM + SGLang
Moonshot	Kimi K2, K2.5	vLLM + SGLang

Key Metrics

Metric	SGLang	vLLM
H100 throughput (reported; model and workload not specified)	~16,200 tok/s	~12,500 tok/s
Multi-GPU scaling	TP + PP + EP	TP + PP + EP
MoE support	Yes (DeepSeek V3/R1)	Yes
FP8 support	Yes	Yes (Hopper)
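To make the Key Metrics table concrete, here is a small Python sketch. It computes the relative throughput gap implied by the two H100 figures (about 30%, on whatever unstated workload produced them) and illustrates what the TP, PP and EP degrees mean for GPU count and MoE expert placement. `ParallelLayout`, `experts_per_rank` and the contiguous expert placement are hypothetical illustrations, not the API or the actual placement logic of either engine. The sketch also assumes the EP group is carved out of each stage's TP group (TP divisible by EP). The 256-expert example matches DeepSeek-V3's routed experts per MoE layer.

```python
from dataclasses import dataclass

# Throughput figures copied from the Key Metrics table (H100, workload unspecified).
SGLANG_TOK_S = 16_200
VLLM_TOK_S = 12_500


def relative_speedup(a_tok_s: float, b_tok_s: float) -> float:
    """Return how much faster a is than b as a fraction (0.30 means 30% faster)."""
    return a_tok_s / b_tok_s - 1.0


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical parallel layout; not a real vLLM or SGLang config object."""
    tp: int  # tensor parallel: each layer's weights split across tp GPUs
    pp: int  # pipeline parallel: layers split into pp sequential stages
    ep: int  # expert parallel: one MoE layer's experts sharded across ep GPUs

    def __post_init__(self) -> None:
        # Assumption of this sketch: EP reuses GPUs from each stage's TP group.
        if self.tp % self.ep != 0:
            raise ValueError("this sketch assumes tp is divisible by ep")

    def world_size(self) -> int:
        # EP adds no GPUs under the assumption above, so the total is tp * pp.
        return self.tp * self.pp


def experts_per_rank(num_experts: int, ep: int) -> list[range]:
    """Assign expert IDs to EP ranks in contiguous blocks (simplest placement)."""
    if num_experts % ep != 0:
        raise ValueError("this sketch requires num_experts divisible by ep")
    per_rank = num_experts // ep
    return [range(r * per_rank, (r + 1) * per_rank) for r in range(ep)]


if __name__ == "__main__":
    gap = relative_speedup(SGLANG_TOK_S, VLLM_TOK_S)
    print(f"SGLang vs vLLM on the reported H100 figures: {gap:.1%} higher")

    # Hypothetical DeepSeek-V3-style deployment: 256 routed experts, TP=8, PP=2, EP=8.
    # Each pipeline stage shards the experts of its own MoE layers over its 8 GPUs.
    layout = ParallelLayout(tp=8, pp=2, ep=8)
    print("GPUs needed:", layout.world_size())
    for rank, experts in enumerate(experts_per_rank(256, layout.ep)):
        print(f"EP rank {rank}: experts {experts.start}-{experts.stop - 1}")
```

Real engines may balance experts by load rather than by contiguous ID blocks; the contiguous split is used here only because it makes the per-GPU arithmetic (256 / 8 = 32 experts per GPU) easy to see.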

Business Model

All commercial players follow the same model: the open-source engine is free, and enterprise managed services are paid.

Paid services include SLA guarantees, dedicated GPU clusters, commercial support, and hardware co-development.

Sources

Inferact $150M seed round coverage (Fintool, Pulse2, a16z)
SGLang GitHub: lmsys-org/sglang
DeepSeek official model cards
H100 benchmark data from various inference tests