Baseten

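A dedicated model-serving platform: it lets AI engineering teams deploy AI models the way they deploy web services, and its engineers work hands-on with TensorRT-LLM, SGLang, and vLLM tuning.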

1. Core products / services

Among third-party (3P) inference providers, Baseten is the player focused specifically on model-serving infrastructure:

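Truss: an open-source model packaging framework (maintained by Baseten) that bundles a model, its inference code, and its dependencies into a deployable unit. Similar to replicate's Cog, but Truss is more production-oriented and emphasizes observability.
Dedicated Deployments: a GPU pool reserved for a single customer, billed per GPU-minute; Baseten handles autoscaling, canary releases, and blue-green deployments automatically.
Model Library: a menu of pre-packaged models (the full Llama 3 family, Whisper, SDXL, FLUX, custom fine-tuned weights, and others) that can be deployed to a dedicated deployment in one click.
Chains: a tool for composing multiple models or multi-step inference into a production pipeline (positioned similarly to fireworks-ai's Compound AI).
Performance Engineering team: public case studies show Baseten engineers working alongside large customers on TensorRT-LLM / vLLM / SGLang tuning. This is the fundamental difference from players whose only abstraction is the API.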
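
To make the Truss packaging idea concrete, here is a minimal sketch of what a Truss-style model/model.py can look like. It assumes Truss's convention of a Model class with load and predict methods; the word-counting "model" is a hypothetical stand-in for real weights, and the "config" keyword is read defensively rather than relied on.

```python
# Sketch of a Truss-style model/model.py. The class shape (Model with
# __init__, load, predict) follows the Truss packaging convention; the
# word-count "model" below is a HYPOTHETICAL stand-in for real weights.


class Model:
    def __init__(self, **kwargs):
        # The serving framework passes packaging context as keyword
        # arguments; nothing here depends on a specific key being present.
        self._config = kwargs.get("config", {})
        self._model = None

    def load(self):
        # Runs once at server start-up. Real code would load weights
        # here so that per-request latency stays low.
        self._model = lambda text: len(text.split())

    def predict(self, model_input):
        # Runs per request with the deserialized request body.
        if self._model is None:
            raise RuntimeError("load() must run before predict()")
        return {"word_count": self._model(model_input["text"])}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "deploy models like web services"}))
```

Keeping weight loading in load rather than in the constructor or in predict is the design point: the expensive step happens once per replica, which matters when autoscaling adds replicas under load.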

2. Target customers & pain points

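AI companies with their own fine-tuned weights: teams that are unwilling to hand model weights to a token API (IP concerns) but also do not want to build their own K8s + Triton stack.
Latency- / throughput-sensitive production applications: voice, real-time translation, multi-step agents. These need dedicated GPUs rather than noisy serverless capacity.
Pain point: self-hosting TensorRT-LLM requires CUDA / kernel / Triton experts; Baseten hides this behind the platform and adds hands-on consulting from its team.
vs. token APIs: what Baseten customers pay for is isolation, controllable latency, and weights that never leave their control, not a low token price.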

3. Competitive landscape

Competitor	Positioning	Vs. Baseten
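modal	serverless GPU compute	Modal leans toward general-purpose GPU functions; Baseten toward a purpose-built model-serving stack
replicate	model hub + Cog	Replicate is dev-first / long-tail; Baseten is enterprise / production
fireworks-ai / together-ai	token API + dedicated option	Both also offer dedicated; Baseten's edge is a stack that is "fully yours" (including custom engines and kernels)
AWS SageMaker	cloud-vendor managed service	SageMaker is complex; Baseten is simple
NVIDIA Triton (self-hosted)	self-deployed	Triton is flexible; Baseten packages Triton as a product
Bento / Anyscale	model-serving peers	Baseten leads on customer reputation and engineering depth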

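Differentiation: the Truss framework + dedicated focus + performance engineering services. In the mid-market of "self-hosted models + wants an SLA + does not want to operate Triton", Baseten is the de facto standard.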

4. Notable observations

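GPU pricing (dedicated, 2026-05): H100 80GB ~$0.10/min (~$6/h), A100 80GB ~$0.07/min (~$4.20/h), L4 ~$0.014/min, T4 ~$0.011/min [1]. Priced above modal (H100 ~$3.95/h, so Baseten is roughly 50% higher) and runpod: Baseten is selling the GPU plus the platform and engineering services.
No direct per-token list price: after deploying from the model library, customers work out their own per-token cost. Industry measurements put Llama 70B on 1×H100 dedicated at ~$0.30-0.50/M blended (ideal, fully loaded), rising to ~$0.50-1.0/M once an autoscaling buffer is included. Slightly pricier than token-API players, but fully dedicated, with the customer's own weights and tunable kernels.
Price gap vs. 1P: Llama 70B dedicated on Baseten at ~$0.80/M vs. GPT-4o at ~$10/M is a ~12× gap; vs. DeepInfra at ~$0.30/M, Baseten is 2-3× more expensive but buys isolation.
Inference engines: multi-engine (polyglot). TensorRT-LLM, vLLM, SGLang, and TGI are all supported, and the Baseten team picks the best fit together with the customer. This is the fundamental difference from peers tied to a single engine (inferact locked to vLLM, fireworks-ai locked to FireAttention).
Compute sourcing: rents H100 / A100 pools from multiple L2 hyperscaler providers (coreweave / Oracle / GCP / AWS) and runs its own capacity scheduling layer; it does not build its own colo. Take rate ≈ (sale price - GPU rental cost) / sale price; industry estimates put the dedicated tier at 30-40%.
Performance Engineering services: Baseten frequently publishes technical blog posts along the lines of "we pushed Llama 70B on H100 to X tok/s". These are both a marketing asset and a sales tool, selling performance-engineering capability directly to enterprises.
Strategic tradeoff: staying out of the serverless token price war means higher unit prices and a narrower but stickier customer base. ARR was rumored to be growing fast before the Series C (no specific figures disclosed publicly), making Baseten one of the few players to prove out the dedicated tier as a standalone business.
Risk: if token-API players fully build out their dedicated products (Together and Fireworks are both strengthening dedicated), Baseten's mid-market will be squeezed from both sides. Baseten has to hold its differentiation through its engineering team's consulting services and its polyglot engine capability.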
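
The pricing arithmetic in this section (per-minute to per-hour, GPU price to cost per million tokens, and the take-rate formula) can be sketched in a few lines. This is a rough illustration, not Baseten's pricing model: the H100 price and the 30-40% take-rate range come from the text above, while the 4,000 tok/s throughput and the 50% utilization figure are hypothetical inputs picked only to show how the ranges quoted above can arise.

```python
# Back-of-the-envelope unit economics for a dedicated GPU deployment.
# Prices come from the text above; throughput and utilization are
# HYPOTHETICAL inputs chosen for illustration, not measured values.


def hourly_rate(per_minute_usd: float) -> float:
    """Convert a per-GPU-minute price into a per-hour price."""
    return per_minute_usd * 60


def cost_per_million_tokens(gpu_hourly_usd: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Effective $/M tokens at a sustained throughput and utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def take_rate(sale_price: float, gpu_rental: float) -> float:
    """Take rate = (sale price - GPU rental cost) / sale price."""
    return (sale_price - gpu_rental) / sale_price


def implied_rental(sale_price: float, rate: float) -> float:
    """GPU rental cost implied by a given take rate."""
    return sale_price * (1 - rate)


if __name__ == "__main__":
    h100 = hourly_rate(0.10)   # ~$6/h, from the listed $0.10/min
    assumed_tps = 4000         # hypothetical aggregate tokens/second

    full = cost_per_million_tokens(h100, assumed_tps)
    half = cost_per_million_tokens(h100, assumed_tps, utilization=0.5)
    print(f"H100 hourly:          ${h100:.2f}")
    print(f"$/M at full load:     ${full:.2f}")
    print(f"$/M at 50% util:      ${half:.2f}")

    # Rental cost range implied by a 30-40% take rate on the H100 price.
    low, high = implied_rental(h100, 0.40), implied_rental(h100, 0.30)
    print(f"Implied rental range: ${low:.2f}-${high:.2f}/h")
```

Under these assumptions the fully loaded figure lands inside the ~$0.30-0.50/M range and the half-utilized figure inside the ~$0.50-1.0/M range, which shows why utilization, rather than the GPU list price, dominates the effective token cost on a dedicated deployment.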

5. Financials / funding

Round	Date	Amount	Valuation	Lead
Seed	2019	$2.6M	—	Greylock
Series A	2022	$20M	—	Greylock
Series B	2023	$40M	—	Spark Capital
Series C	2025-02	$75M	~$825M post	IVP [2]
Series D (reported)	2025-12	~$150M	~$2.1B	a16z (reported)

Founded: 2019
Total funding: ~$290M
Customers: public case studies include Descript, Patreon, Pictory, Robust Intelligence; ARR is undisclosed, but the entry of IVP / a16z suggests strong growth.

6. People & Relationships

Co-founders: Tuhin Srivastava (CEO, ex-Gumroad), Amir Haghighat, Pankaj Gupta, Phil Howes. An all ex-Gumroad team, with product DNA leaning toward dev tools.
Investors: IVP, Spark Capital, Greylock, South Park Commons, a16z (reported).
Partners: NVIDIA (early TensorRT-LLM partner), Hugging Face.
Competes with: modal, replicate, fireworks-ai (dedicated tier), together-ai (dedicated tier), AWS SageMaker.

Sources

[1] https://www.baseten.co/pricing (2026-05-10)
[2] https://www.baseten.co/blog/announcing-our-series-c (2026-05-10)
[3] https://www.crunchbase.com/organization/baseten (2026-05-10)
[4] https://www.baseten.co/library/ (2026-05-10)
