Inferact
A commercial company founded by the core vLLM team, turning the world's most widely deployed open-source inference engine into an enterprise managed platform.
1. Core Product / Service
- Underlying engine = vLLM (continued open-source maintenance; independent project identity preserved). Inferact builds its commercial version on top: auto-scaling, multi-node orchestration, failure recovery [1]
- Planned paid serverless vLLM: automatic provisioning, updates, and operations, including observability / troubleshooting / disaster recovery; expected to be Kubernetes-based [3]
- Enterprise-grade hardware-optimized kernels: proprietary kernels for specific GPU configurations (see gpu-kernel-optimization), targeting SLA customers
- Key vLLM technical capabilities: PagedAttention memory management, tensor parallelism (serving 70B+ models by sharding one instance across multiple GPUs), multi-step scheduling (vLLM 0.6.0+), and support for Llama / DeepSeek / Mistral / Gemma / Moonshot Kimi and other mainstream models
- vLLM's deployment scale: self-reported (via a16z/Inferact) as running on 400,000+ GPUs worldwide at any given time [3]
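The PagedAttention idea above can be sketched as a block-based KV-cache allocator: the cache is carved into fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks, so memory is allocated on demand instead of being reserved for the maximum sequence length. A minimal illustrative sketch in plain Python (invented names, not vLLM's actual internals):

```python
class ToyPagedKVCache:
    """Toy PagedAttention-style allocator: fixed-size blocks + per-sequence block tables."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))      # free physical block ids
        self.tables: dict[str, list[int]] = {}   # seq id -> block table
        self.lengths: dict[str, int] = {}        # seq id -> tokens stored

    def append(self, seq: str, n_tokens: int = 1) -> None:
        """Reserve cache space for n new tokens, allocating blocks lazily."""
        table = self.tables.setdefault(seq, [])
        self.lengths[seq] = self.lengths.get(seq, 0) + n_tokens
        while len(table) * self.block_size < self.lengths[seq]:
            if not self.free:
                raise MemoryError("KV cache exhausted; preempt or swap a sequence")
            table.append(self.free.pop())

    def release(self, seq: str) -> None:
        """A finished sequence returns its blocks to the shared free pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)

cache = ToyPagedKVCache(num_blocks=8, block_size=16)
cache.append("req-1", 20)   # 20 tokens -> 2 blocks allocated
cache.append("req-2", 5)    # 5 tokens  -> 1 block allocated
cache.release("req-1")      # both of req-1's blocks return to the pool
```

Because finished sequences return their blocks to a shared pool, fragmentation stays low and far more concurrent sequences fit in the same GPU memory than with contiguous per-sequence allocation — the core reason vLLM's throughput numbers hold up under many simultaneous requests.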
2. Target Users & Pain Points
Target customers: enterprises unwilling to operate GPU clusters themselves but needing SLA guarantees — finance, healthcare, government, AI app companies.
Core pain points:
- Running open-source vLLM yourself means solving multi-node orchestration, scaling, monitoring, and disaster recovery on your own
- Enterprises need SOC 2 compliance, private-cloud deployment, and secure model management
- They need throughput / latency SLAs; the open-source project carries no support commitment
- Adapting new hardware (beyond H100/H200) requires heavy engineering effort
vLLM's "universal inference layer" positioning: coexist with existing inference providers rather than replace them, serving as the standardized foundational software layer [1].
3. Competitive Landscape
| Competitor | Path difference |
|---|---|
| radixark (SGLang commercialization) | Direct rival. SGLang ~16,200 tok/s on H100 exceeds vLLM's ~12,500 tok/s; valuation $400M vs Inferact's $800M; GitHub stars ~16K (SGLang) vs ~65K (vLLM) |
| together-ai | In-house engine + managed API, already mature commercialization (2-3 years earlier than Inferact) |
| Fireworks AI | Closed-source in-house engine, valuation ~$10B, founded 2022, has many enterprise customers |
| Anyscale | Ray ecosystem commercialization, multi-workload (not just inference), Ion Stoica overlapping founder |
| Modal / Replicate | Serverless GPU inference, more developer-oriented |
| runpod / Lambda Labs / Nebius | Pure GPU compute rental, no inference engine layer |
Inferact's differentiation: the only one directly bound to the world's most widely deployed open-source inference engine. When all major model labs (Meta Llama, Google Gemma, Mistral, Moonshot, DeepSeek, etc.) release models, they officially recommend vLLM → Inferact is the natural default.
See ai-inference-engines.
4. Unique Observations
- Before the vLLM team commercialized, DeepSeek strategically chose to merge its inference optimizations back into vLLM instead of running a separate commercial play — confirming vLLM's dominant ecosystem position. This locks in a "de facto standard" distribution channel for Inferact even before its public launch.
- The ~5.3x ratio of the $800M post-money valuation to the $150M seed is on the high side for 2026 AI-infra seed rounds, reflecting investor buy-in on the open-source → commercial conversion narrative behind vLLM's installed base (400K+ GPUs).
- The real risk isn't technical but enterprise GTM: the vLLM team comes from Berkeley academia and lacks the enterprise sales muscle of Together AI / Fireworks. Ion Stoica's presence (Databricks / Anyscale co-founder) is the team's strongest "commercialization insurance".
- The Inferact vs radixark battle is more like an extension of the vLLM vs SGLang ecosystem war: performance slightly favors SGLang, ecosystem breadth overwhelmingly favors vLLM.
5. Financials / Funding
| Round | Date | Amount | Valuation | Investors |
|---|---|---|---|---|
| Seed | Announced 2026-01 (founded 2025-11) | $150M | $800M post | a16z + Lightspeed Venture Partners co-leading; Sequoia Capital, Altimeter Capital, Redpoint Ventures, ZhenFund following [2][4] |
Revenue model:
- Enterprise-grade managed vLLM inference (per-call / subscription)
- SLA commercial support (finance, healthcare, government)
- Hardware vendor adaptation fees (helping new chips quickly enter the vLLM ecosystem)
vLLM open source stays free → user growth → a natural funnel into the paid managed service.
6. People & Companies
Founding team
- Simon Mo — CEO, UC Berkeley PhD, vLLM core maintainer
- Woosuk Kwon — co-founder, vLLM project originator, Berkeley PhD (Ion Stoica's student)
- Kaichao You (游凯超) — co-founder, Tsinghua Special Award winner, vLLM core contributor
- Roger Wang — co-founder
- Joseph Gonzalez — co-founder, UC Berkeley professor
- Ion Stoica — co-founder, UC Berkeley professor, Databricks / Anyscale co-founder
Investors: a16z, Lightspeed Venture Partners, Sequoia Capital, Altimeter Capital, Redpoint Ventures, ZhenFund
Ecosystem / competition
- Direct competition: radixark
- Adjacent ecosystem: together-ai, openrouter, runpod
- Model-layer users (vLLM upstream): deepseek, kimi
- Related modules: ai-inference-engines, gpu-kernel-optimization
Sources
- [1] a16z official announcement Investing in Inferact — https://a16z.com/announcement/investing-in-inferact/ (2026-05-09)
- [2] Bloomberg, Inferact Raises $150 Million in Seed Funding Led by Andreessen Horowitz — 2026-01-22 (2026-05-09)
- [3] SiliconANGLE, Inferact launches with $150M in funding to commercialize vLLM — 2026-01-22 (2026-05-09)
- [4] TechCrunch, Inference startup Inferact lands $150M to commercialize vLLM — 2026-01-22 (2026-05-09)
- [local] raw/daily_log-2026-04-08.md (Jimmy's own 2026-04-07 OpenClaw research notes)