
AI Inference Engine Landscape

Related: inferact, radixark, gpu-kernel-optimization

Overview

AI inference engines are the infrastructure layer that serves trained large language models (LLMs) to users at scale. The market is commercializing rapidly around two dominant open-source projects: vLLM and SGLang.

Market Structure (2026)

Open-Source Leaders

Project | Stars | Commercial Entity | Valuation | Lead Investor
--------|-------|-------------------|-----------|--------------
vLLM | ~65K | Inferact | $800M | a16z + Lightspeed
SGLang | ~16K | RadixArk | $400M | Accel

Other Players

Engine | Developer | Notes
-------|-----------|------
TensorRT-LLM | NVIDIA | Most optimized for NVIDIA hardware; partially closed source
LMDeploy | Shanghai AI Lab (InternLM) | Strong INT4 quantization; TurboMind C++ engine
Xinference | Xorbits (Alibaba-affiliated) | Focused on the Chinese market; distributed inference
Fireworks AI | Fireworks Inc. | $10B+ valuation; proprietary engine

DeepSeek's Strategic Choice

DeepSeek (V3, R1, V3-0324) chose to contribute its optimizations back to vLLM rather than build its own inference engine.

Logic:

  1. DeepSeek is a model company, not an infrastructure company; maintaining its own engine would tie up an entire team
  2. vLLM has the largest deployment base; contributing to vLLM means DeepSeek models reach more users
  3. vLLM is hardware-agnostic, so DeepSeek benefits regardless of what hardware its users run (see the sketch after this list)
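
A minimal sketch of what that upstream-contribution path buys in practice: any vLLM deployment can serve a DeepSeek model through the standard offline Python API. The sampling settings below are illustrative, and a full DeepSeek-V3 deployment assumes a multi-GPU node (e.g., 8× H100):

```python
# Illustrative vLLM offline-inference sketch (assumes vLLM is installed
# and the model fits across the available GPUs; settings are examples).
from vllm import LLM, SamplingParams

# Any Hugging Face model ID is loaded the same way; DeepSeek-V3 at full
# size assumes tensor parallelism across an 8-GPU node.
llm = LLM(model="deepseek-ai/DeepSeek-V3", tensor_parallel_size=8)

params = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Why contribute optimizations upstream?"], params)
print(outputs[0].outputs[0].text)
```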

AI Lab Official Recommendations

Lab | Models | Recommended Engines
----|--------|--------------------
DeepSeek | V3, R1, V3-0324 | SGLang (Day-0) + vLLM
Meta | Llama 4 | vLLM + SGLang + TensorRT-LLM
Google | Gemma 3/4 | vLLM
Mistral | Mistral Large 3 | vLLM + SGLang
Moonshot | Kimi K2, K2.5 | vLLM + SGLang
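
One practical consequence of these multi-engine recommendations: both vLLM and SGLang expose OpenAI-compatible HTTP endpoints, so client code is identical regardless of which engine a lab recommends. A sketch, assuming a server is already running locally on port 8000 with the model ID shown:

```python
# Engine-agnostic client sketch: the same code talks to a vLLM or an
# SGLang server, since both expose OpenAI-compatible endpoints.
# The base_url, port, and model ID are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # whatever the server was launched with
    messages=[{"role": "user", "content": "Summarize expert parallelism."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```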

Key Metrics

Metric | SGLang | vLLM
-------|--------|-----
H100 throughput | ~16,200 tok/s | ~12,500 tok/s
Multi-GPU scaling | TP + PP + EP (tensor, pipeline, and expert parallelism) | TP + PP (tensor and pipeline parallelism)
MoE support | Yes (DeepSeek V3/R1) | Yes
FP8 support | Partial | Yes (Hopper)
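
To make the multi-GPU scaling row concrete, here is a hedged sketch of how the parallelism degrees map onto each engine's offline Python API. The model ID and parallel sizes are illustrative, and the SGLang keyword names are assumptions based on its server arguments:

```python
# Illustrative parallelism configuration for an assumed 8-GPU node.
# Both engines are shown side by side for comparison; in practice you
# would run one or the other, not both in the same process.

# vLLM: tensor parallelism (TP) and pipeline parallelism (PP).
from vllm import LLM

vllm_engine = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model ID
    tensor_parallel_size=4,    # TP: shard each layer across 4 GPUs
    pipeline_parallel_size=2,  # PP: split the layer stack into 2 stages
)

# SGLang: tensor parallelism via tp_size (keyword name assumed from
# SGLang's server arguments; expert parallelism for MoE models is
# configured separately).
import sglang as sgl

sgl_engine = sgl.Engine(
    model_path="meta-llama/Llama-3.1-70B-Instruct",
    tp_size=8,  # TP: shard across all 8 GPUs
)
```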

Business Model

All commercial players follow the same model: the core engine stays free and open source, while enterprise managed services are paid.

Paid offerings include SLA guarantees, dedicated GPU clusters, commercial support, and hardware co-development.

Sources

  • Inferact $150M seed round coverage (Fintool, Pulse2, a16z)
  • SGLang GitHub: lmsys-org/sglang
  • DeepSeek official model cards
  • H100 benchmark data from various inference tests