Fireworks AI

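A third-party (3P) inference platform built on its closed-source, in-house FireAttention inference engine and positioned around enterprise compound AI; valuation reported at ~$5.5B (2025-07) [3].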

1. Core Products / Services

Fireworks' product lineup follows one main thread: serve open-source models on an in-house inference engine and bill per token.

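Serverless Inference API: a catalog of 100+ open-source models, including Llama 3.1/3.3 (8B/70B/405B), DeepSeek V3 / R1, Qwen2.5, Mistral, Mixtral, and Gemma, billed per million tokens [1].
On-Demand / Dedicated Deployments: the customer gets exclusive GPUs billed per GPU-hour, avoiding the multi-tenant queueing of serverless; suited to steady QPS or privately fine-tuned weights.
Fine-Tuning: LoRA and full-parameter fine-tuning; the resulting models can be deployed on serverless with no extra hosting fee.
FireOptimizer / Compound AI: a tooling layer that orchestrates multiple models, multi-step inference, and function calling into production pipelines; this is the differentiated product Fireworks uses to enter the enterprise market.
FireAttention engine: in-house and closed-source; claimed to run 4× faster than vLLM at FP8 / FP16 (Fireworks' own benchmark, with quantization deliberately applied) [5].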
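
The choice between serverless (per-token) and dedicated (per GPU-hour) billing in the list above comes down to a break-even volume. The sketch below is a minimal illustration, not Fireworks' pricing logic: the $0.90/M serverless price is the Llama 3.1 70B figure cited on this page, while the GPU count and GPU-hour price are hypothetical placeholders to be replaced with a real quote.

```python
def serverless_cost(tokens: float, usd_per_million: float) -> float:
    """Cost of a token volume under per-million-token billing."""
    return tokens / 1e6 * usd_per_million


def dedicated_cost(hours: float, gpus: int, gpu_hour_usd: float) -> float:
    """Cost of reserving `gpus` GPUs for `hours` hours, regardless of traffic."""
    return hours * gpus * gpu_hour_usd


def break_even_tokens_per_hour(gpus: int, gpu_hour_usd: float,
                               usd_per_million: float) -> float:
    """Hourly token volume at which dedicated and serverless cost the same."""
    return gpus * gpu_hour_usd / usd_per_million * 1e6


# Hypothetical inputs: NOT published Fireworks prices.
GPUS = 8
GPU_HOUR_USD = 4.00

# Llama 3.1 70B blended serverless figure cited on this page.
SERVERLESS_USD_PER_M = 0.90

threshold = break_even_tokens_per_hour(GPUS, GPU_HOUR_USD, SERVERLESS_USD_PER_M)
print(f"break-even: {threshold:,.0f} tokens/hour")
```

Above the break-even volume, dedicated is cheaper only if the deployment can actually sustain that throughput and traffic is steady enough to keep the GPUs busy.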

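Enterprise AI teams: they do not want to be locked into the OpenAI / Anthropic APIs but need SLAs, multi-region availability, and private data handling; Fireworks offers a single API for switching among 100+ open-source models, plus dedicated clusters.
High-traffic SaaS / agent companies: once monthly consumption exceeds 10B tokens, 1P API pricing becomes hard to sustain. At Llama 70B-class capability, Fireworks is ~10× cheaper than GPT-4o (per Fireworks' own marketing material) [4].
Pain point: self-hosting open-source vLLM requires kernel tuning, multi-node orchestration, and autoscaling; Fireworks abstracts these layers behind a single API and keeps a dedicated option for customers who need isolation.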

Competitor	Positioning	Vs. Fireworks
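together-ai	In-house Kernel Collection (Tri Dao); serverless + GPU clusters	Direct rival; Together has a larger model catalog and a 1000-GPU cluster product; Fireworks goes deeper on compound AI / agent orchestration
inferact	Commercializes vLLM; tied to the open-source ecosystem	Inferact is the open-source camp; Fireworks controls its own optimization path with a closed-source engine
radixark	Commercializes SGLang	Also an engine-first rival; Fireworks is more productized
Groq	In-house LPU hardware	Competes on a different axis (hardware vs. software)
DeepInfra	Ultra-low-price serverless	DeepInfra is cheaper but weaker on enterprise product; Fireworks sits further upmarket
AWS Bedrock	Cloud-vendor managed service	Bedrock wins on breadth of distribution; Fireworks wins on performance / engine depth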

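Differentiation: the FireAttention engine plus Compound AI orchestration form Fireworks' two-part selling point. Most peers compete on speed / price; Fireworks packages a "production-ready agent / function-calling pipeline" and sells it to enterprises.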

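Per-token pricing (serverless, public figures as of 2026-05): Llama 3.1 8B ~$0.20/M tokens; Llama 3.1 70B ~$0.90/M (input/output blended); Llama 3.1 405B ~$3/M; DeepSeek V3 ~$0.90/M; Qwen2.5 72B ~$0.90/M [1].
Price gap vs. 1P: Llama 3.1 70B at ~$0.90/M vs. GPT-4o at ~$5/M input + $15/M output, i.e. ~$10/M blended at an even input/output mix, a gap of roughly 10×. Capability is not fully equivalent, though: Llama 70B still trails GPT-4o on general reasoning, so the trade-off only holds when the task tolerates the quality gap.
vs. Together: both price a comparable Llama 70B at ~$0.88-0.90/M, so prices are already neck and neck; the contest is not over price or volume but over engine efficiency and enterprise product depth.
Inference engine: closed-source, in-house FireAttention (not vLLM / SGLang). This means Fireworks must do 100% of the porting work for every new hardware target and new model architecture itself, which is both a burden and a moat.
Compute sourcing: Fireworks does not build its own L1 capacity; it mainly rents H100 / H200 from L2 hyperscalers such as coreweave / Oracle / GCP and handles capacity scheduling and kernel-level optimization itself. Take rate ≈ (token price − GPU rental cost per token) / token price; the figure is not publicly disclosed, but industry estimates put serverless gross margin at 30-50%, with dedicated lower.
Compound AI is a hedge: if the gap between open and closed models narrows and the token price war worsens (pure inference commoditizes), Fireworks wants a second business in the agent / pipeline tooling layer, similar to Snowflake's early playbook of moving toward a data-app platform as the cloud data warehouse commoditized.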
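
The price-gap and take-rate arithmetic above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the GPT-4o and Llama 70B prices are the approximate figures quoted on this page, the even input/output mix is an assumption, and the GPU-hour price and per-GPU throughput fed into the take-rate formula are hypothetical placeholders, not disclosed Fireworks numbers.

```python
def blended_price(input_usd_per_m: float, output_usd_per_m: float,
                  input_share: float = 0.5) -> float:
    """Weighted price per million tokens for a given input/output mix."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_usd_per_m + (1.0 - input_share) * output_usd_per_m


def monthly_spend(tokens_per_month: float, usd_per_m: float) -> float:
    """Monthly bill for a token volume at a per-million-token price."""
    return tokens_per_month / 1e6 * usd_per_m


def gpu_cost_per_million(gpu_hour_usd: float, tokens_per_gpu_hour: float) -> float:
    """GPU rental cost converted to USD per million tokens served."""
    return gpu_hour_usd / tokens_per_gpu_hour * 1e6


def take_rate(token_price_usd_per_m: float, gpu_cost_usd_per_m: float) -> float:
    """(token price - GPU rental cost per token) / token price."""
    return (token_price_usd_per_m - gpu_cost_usd_per_m) / token_price_usd_per_m


# Approximate prices quoted on this page; the 1:1 mix is an assumption.
gpt4o = blended_price(5.0, 15.0)
llama70b = 0.90
print(f"GPT-4o blended: ${gpt4o:.2f}/M, gap: {gpt4o / llama70b:.1f}x")

# Spend at the 10B tokens/month scale mentioned for high-traffic customers.
for name, price in [("Llama 70B on Fireworks", llama70b), ("GPT-4o", gpt4o)]:
    print(f"{name}: ${monthly_spend(10e9, price):,.0f}/month")

# Hypothetical GPU economics: placeholders, not measured or disclosed values.
cost = gpu_cost_per_million(gpu_hour_usd=2.50, tokens_per_gpu_hour=5_000_000)
print(f"illustrative take rate: {take_rate(llama70b, cost):.0%}")
```

Shifting `input_share` toward input-heavy workloads lowers the blended 1P price and narrows the gap, so the ~10× figure depends on the traffic mix.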

Round	Date	Amount	Valuation	Lead
Seed	2022	—	—	Sequoia
Series A	2023-07	$25M	—	Benchmark
Series B	2024-07	$52M	$552M post	Sequoia [2]
Series C	2025-07 (reported)	~$200M	~$5.5B	Multiple growth funds (per Reuters) [3]

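Note: the "$10B val" figure in the Section 4 brief conflicts with the ~$5.5B (2025-07) reported by Reuters; this page uses the Reuters-anchored $5.5B. If a new round in 2026 lifts the valuation to $10B, update this page to follow the news reports.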

CEO / Founder: Lin Qiao, former head of the PyTorch team at Meta and a key figure behind PyTorch Distributed / Inference; Fireworks' "in-house engine" narrative rests largely on its PyTorch alumni network.
Investors: Sequoia, Benchmark, NVIDIA, AMD, MongoDB Ventures (strategic); Databricks Ventures (reported).
Competes with: together-ai, inferact, radixark, DeepInfra, Groq, Anyscale.
Partners with: NVIDIA (GPUs + early hardware), AMD (MI300X support), MongoDB (vector integration).
Hosts models from: Meta (Llama), deepseek, Mistral, Alibaba (Qwen), Google (Gemma).

Sources

[1] https://fireworks.ai/pricing (2026-05-10)
[2] https://fireworks.ai/blog/fireworks-raises-52m-series-b (2026-05-10)
[3] https://www.reuters.com/technology/artificial-intelligence/fireworks-ai-valued-552-billion-latest-funding-round-2025-07-11/ (2026-05-10)
[4] https://northflank.com/blog/fireworks-ai-vs-together-ai (2026-05-10)
[5] https://fireworks.ai/blog/fire-attention-serving-open-source-models-4x-faster-than-vllm-by-quantizing-with-no-tradeoffs (2026-05-10)