Modal

Turns "writing a function" into "running code on a GPU": a developer-first take on serverless GPU compute. Backed by a16z; founded by Erik Bernhardsson.

1. Core Products / Services

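Modal is the low-level abstraction layer for 3P inference: it sells neither tokens nor a fixed image API. What it sells is a developer experience in which writing a Python function is enough to run it serverlessly on a GPU.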

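Modal Functions: a Python decorator (@app.function(gpu="H100")) sends the function to run on a cloud GPU. Cold starts take a few seconds, and billing is per second.
Containers + GPU scheduling: underneath is an in-house container runtime plus GPU pools from multiple L2 providers, with a full menu of H100 / A100 / A10G / L40S / T4.
Web Endpoints: expose a function as an HTTPS endpoint. Many teams use Modal to self-host LLMs, batch jobs, and inference services for their own small models.
Volumes / Networks / Sandboxes: a complete sandbox + persistent storage abstraction that supports untrusted code execution. This has become an unexpectedly important selling point in the agent era; the built-in code execution features of Claude and GPT use similar abstractions.
Not a token API: Modal offers no pre-built "call Llama 70B, get tokens back" menu; you package the model yourself. This is the most fundamental difference from Together / Fireworks.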
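
The decorator model described above can be sketched as follows. This is a minimal illustration, assuming the `modal` Python package is installed and an account is configured; the app name `example-gpu-fn` and the function `square` are hypothetical placeholders, not anything taken from Modal's own examples.

```python
import modal

# "example-gpu-fn" is a hypothetical app name chosen for this sketch.
app = modal.App("example-gpu-fn")


@app.function(gpu="H100")  # request one H100 for containers running this function
def square(x: int) -> int:
    # Placeholder body; a real workload would load a model or launch GPU work here.
    return x * x


@app.local_entrypoint()
def main():
    # .remote() executes the function in a Modal container instead of locally.
    print(square.remote(7))
```

Saved as a file such as `example.py` (a hypothetical name), this would typically be started with `modal run example.py`. The point of the sketch is that the GPU request lives in the decorator argument, so no cluster or autoscaling configuration appears in user code.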

2. Target Users & Pain Points

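ML engineers / AI startups: need to run experiments, batch jobs, and custom model inference without operating K8s + GPU autoscaling.
Agent / sandbox companies: need a sandbox abstraction for safely running user-submitted code; Modal's container isolation + elastic GPUs is one of the few off-the-shelf options.
Workflow / pipeline teams: run irregular workloads such as video processing, embedding indexing, and model training sweeps; with Modal's ergonomics the code is roughly an order of magnitude shorter than with K8s + Ray.
Pain point: building K8s + GPU operator + autoscaling + image pipeline in-house takes at least 3-6 months of engineering effort; Modal's decorator model compresses this into a one-line decorator.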

3. Competitive Landscape

Competitor	Positioning	Vs. Modal
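replicate	Model hub + Cog containers	Replicate leans toward hosting off-the-shelf models; Modal leans toward running your own code
runpod	Per-minute GPU rental + serverless	RunPod is cheaper but has weaker dev UX; Modal offers a higher-level abstraction
baseten	Dedicated model serving	Baseten focuses on deployment + monitoring; Modal focuses on general-purpose GPU functions
AWS Lambda + GPU / SageMaker	General-purpose cloud vendors	Cloud vendors have a steep learning curve; Modal is Python-native and simple
Beam.cloud / Banana	Serverless GPU, same category	Modal leads in funding and ecosystem
fireworks-ai / together-ai	Token API	An entirely different abstraction layer (function vs. token)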

Differentiation: Python-native ergonomics + a unified container / sandbox / GPU abstraction. Developers write functions and never touch K8s.

4. Distinctive Observations

GPU-second pricing (2026-05): H100 $0.001097/s (~$3.95/h), A100 80GB $0.000694/s (~$2.50/h), A10G $0.000306/s (~$1.10/h), T4 $0.000164/s (~$0.59/h), CPU $0.0000131/core-s [1]. Slightly more expensive than runpod, but far cheaper than AWS / GCP list prices for equivalent GPUs.
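Tokens are not priced directly: Modal does not sell "call Llama, get tokens back"; customers deploy models themselves. A common industry pattern is to run vLLM on Modal as an in-house OpenAI-compatible endpoint. Self-hosting Llama 70B on 1×H100 at ~$3.95/h with ~10K tok/s of output (vLLM, batched) yields ~36M tokens per hour, i.e. a marginal cost of ~$0.11 / M tokens (the ideal case of a fully loaded GPU); once idle time and autoscaling are included, the real figure is ~$0.30-0.60/M. That is slightly more expensive than the ~$0.30/M DeepInfra charges, but you get full control and keep your data private.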
Gap vs. 1P pricing: treat Modal as the cost floor for a "self-hosted OpenAI-compatible alternative". Llama 70B at ~$0.50/M blended vs. GPT-4o at ~$10/M is a ~20× gap, but you have to handle latency and model ops yourself.
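Inference engine: Modal does not mandate one; users can install any of vLLM, SGLang, TGI, or TensorRT-LLM. Modal is a GPU compute platform and does not build an engine.
Compute supply: rents GPU pools from multiple L2 sources such as GCP / Oracle / coreweave / Lambda; it builds neither hardware nor data centers. Modal's differentiation lies in the scheduling layer + container orchestration + cold-start optimization (in-house lazy loading + image caching make cold starts under 2s achievable).
Agent / sandbox pull on Modal: in 2024-2025 many agent-era startups used Modal as a code execution sandbox (the same kind of need that Code Interpreter-style features address), a key source of Modal's incremental traffic.
Capital model + strategy: Modal has no token API of its own to sell, so it stays out of the token price war; its moat is the engineer ecosystem + ergonomics. Modal is to GPUs what Vercel is to the web. This is the a16z investment thesis.
Risk: as dedicated deployment products from fireworks-ai / baseten / together-ai mature, the question "why not just use a token API?" will grow louder. Modal has to survive on the scenarios token APIs handle poorly: sandboxes, agents, batch, and custom models.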
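
The pricing arithmetic in this section can be reproduced with a short sketch. The per-second prices and the ~10K tok/s throughput come from the text above; the `utilization` parameter (fraction of billed GPU time actually spent generating tokens) is an assumption introduced here to model idle time and autoscaling overhead, not a figure Modal publishes.

```python
SECONDS_PER_HOUR = 3600

# Per-second list prices quoted in the text above (USD).
GPU_PRICE_PER_SECOND = {
    "H100": 0.001097,
    "A100 80GB": 0.000694,
    "A10G": 0.000306,
    "T4": 0.000164,
}


def hourly_price(price_per_second: float) -> float:
    """Convert a per-second price to a per-hour price."""
    return price_per_second * SECONDS_PER_HOUR


def cost_per_million_tokens(
    hourly_usd: float, tokens_per_second: float, utilization: float = 1.0
) -> float:
    """USD per million output tokens for one GPU.

    utilization is the fraction of billed time spent generating tokens;
    it is a modelling assumption for this sketch, not a Modal metric.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR * utilization
    return hourly_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    for gpu, per_second in GPU_PRICE_PER_SECOND.items():
        print(f"{gpu}: ${hourly_price(per_second):.2f}/h")

    h100_hourly = hourly_price(GPU_PRICE_PER_SECOND["H100"])
    for u in (1.0, 0.35, 0.2):
        cost = cost_per_million_tokens(h100_hourly, 10_000, u)
        print(f"utilization {u:.0%}: ${cost:.2f} per M tokens")
```

Under these assumptions, full utilization gives roughly $0.11 per million tokens, and the ~$0.30-0.60/M range corresponds to a GPU that is busy somewhere between about a fifth and a third of its billed time.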

5. Financials / Funding

Round	Date	Amount	Valuation	Lead
Seed	2022	$7M	—	Lux Capital, Definition
Series A	2023	$16M	—	Redpoint
Series B	2024-09	$80M	$1.1B post	Lux Capital + Andreessen Horowitz [3]

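Founded: 2021
Total funding: ~$103M
ARR and user counts are not publicly disclosed; industry estimates put active developers in the tens of thousands.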

6. People & Relationships

Founder / CEO: Erik Bernhardsson, former head of ML infra at Spotify and author of the open-source Annoy library; a well-known figure in ML infra circles.
Co-founder / CTO: Akshat Bubna.
Investors: Lux Capital, Andreessen Horowitz, Redpoint, Definition, Amplify Partners.
Partners / Customers: public case studies include Suno, Ramp, ElevenLabs, and Substack; mostly agent companies.
Competes with: replicate, baseten, runpod (serverless tier), Beam.cloud.

Sources

[1] https://modal.com/pricing (2026-05-10)
[2] https://modal.com/blog (2026-05-10)
[3] https://a16z.com/announcement/investing-in-modal/ (2026-05-10)
[4] https://www.crunchbase.com/organization/modal-labs (2026-05-10)
