Groq

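Custom LPU (Language Processing Unit) hardware that pushes inference latency and throughput beyond what GPU-based systems reach; the third-party (3P) inference player with the strongest speed narrative.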

1. Core Products / Services

Groq is one of the few 3P inference companies that is vertically integrated down to the chip layer:

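LPU chip (GroqChip): a deterministic tensor streaming processor without the SIMT scheduling that GPUs use. Each chip has little SRAM but very high bandwidth, so model weights must be sharded across many LPUs. The upside is near stall-free token-by-token decoding; the downside is that the LPU count grows with model size.
GroqCloud: a serverless API billed per token. The model menu covers mainstream open-weight models, including Llama 3.1 / 3.3 (8B/70B), DeepSeek R1 distill, Mixtral, Whisper, and Qwen2.5.
GroqRack / On-Prem: packaged LPU systems sold to large customers such as sovereign clouds, governments, and Aramco (Saudi strategic partnership, 2024).
Speed: ~250 tok/s output on Llama 3.1 70B and ~750 tok/s on Llama 3.1 8B; third-party benchmarks of Llama 3.3 70B on GroqCloud have repeatedly measured ~250-330 tok/s [4][5], well above the ~50-80 tok/s typical of GPU-based serving. The often-quoted "500+ tok/s" figure usually refers to 8B-class small models.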
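
The sharding trade-off and the speed figures above can be made concrete with back-of-envelope arithmetic. The sketch below assumes roughly 230 MB of on-chip SRAM per LPU (a figure from Groq's public GroqChip materials, not from this note) and 1 byte per parameter. It ignores activations, KV cache, and any replication, so the result is only a rough lower bound on chip count, not a real deployment size. The function names are illustrative.

```python
import math

# Assumption (not from this note): ~230 MB of on-chip SRAM per LPU.
SRAM_PER_LPU_BYTES = 230e6


def min_lpus_for_weights(params_billion, bytes_per_param=1.0,
                         sram_bytes=SRAM_PER_LPU_BYTES):
    """Lower bound on LPUs needed just to hold the weights in SRAM."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return math.ceil(weight_bytes / sram_bytes)


def seconds_to_generate(num_tokens, tokens_per_second):
    """Wall-clock time to stream num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second


print(min_lpus_for_weights(8))        # 35 chips for an 8B model
print(min_lpus_for_weights(70))       # 305 chips for a 70B model
print(seconds_to_generate(500, 250))  # 2.0 s at the ~250 tok/s quoted above
print(seconds_to_generate(500, 50))   # 10.0 s at the low end of the GPU range
```

Under these assumptions a 70B model needs roughly nine times as many chips as an 8B model, which is the "larger models need more LPUs" cost stated above.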

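2. Target Customers & Pain Points

Latency-sensitive applications: voice agents, real-time translation, and long agent chains (multiple LLM calls in sequence), where tokens per second is the variable users feel most directly.
Enterprise / sovereign deployment: LPU racks sold into Saudi and Canadian sovereign clouds, sidestepping NVIDIA export controls and GPU allocation queues.
Pain point: at batch=1 (low concurrency), GPU inference latency is bound by memory bandwidth. Groq's SRAM-based design sidesteps that bottleneck, so its single-stream speed is a structural advantage over GPU architectures rather than an incremental engineering optimization.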
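
The bandwidth argument can be sketched as a roofline-style bound: at batch size 1, each generated token has to stream every weight from memory once, so tokens per second cannot exceed memory bandwidth divided by weight bytes. The inputs below (3.35 TB/s of HBM bandwidth for an H100-class GPU, FP16 weights, ideal scaling across devices) are illustrative assumptions rather than measurements from this note, and real systems land below the ceiling. The second helper shows why decode rate dominates perceived latency in a sequential agent chain.

```python
def decode_ceiling_tok_s(params_billion, bytes_per_param,
                         bandwidth_tb_s, num_devices=1):
    """Upper bound on batch=1 decode speed when weight streaming dominates."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    aggregate_bw = bandwidth_tb_s * 1e12 * num_devices  # bytes per second
    return aggregate_bw / weight_bytes


def chain_latency_s(num_calls, tokens_per_call, tok_s):
    """Decode time for num_calls sequential LLM calls (prefill ignored)."""
    return num_calls * tokens_per_call / tok_s


# 70B model, FP16 (2 bytes/param), assumed 3.35 TB/s per GPU
print(f"{decode_ceiling_tok_s(70, 2, 3.35):.1f}")                 # 23.9
print(f"{decode_ceiling_tok_s(70, 2, 3.35, num_devices=8):.1f}")  # 191.4

# 10 chained calls of 200 output tokens each
print(chain_latency_s(10, 200, 250))  # 8.0 s at ~250 tok/s
print(chain_latency_s(10, 200, 50))   # 40.0 s at ~50 tok/s
```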

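3. Competitive Landscape

Competitor	Positioning	Vs. Groq
cerebras	Wafer-scale WSE-3 inference	Both are non-GPU inference; Cerebras has more extreme per-wafer performance but a narrower ecosystem; Groq commercialized earlier and more broadly
NVIDIA H100 + vLLM	Mainstream GPU inference path	NVIDIA has overwhelming advantages in generality and software ecosystem; Groq is a niche player built around extreme speed
fireworks-ai / together-ai	Software optimization on GPUs	Groq leads on speed / latency; Fireworks/Together offer a broader model menu and more flexible pricing
SambaNova RDU	Custom reconfigurable dataflow	Both are non-GPU; SambaNova leans toward dedicated enterprise deployments; Groq leans toward public cloud and a token API
AWS Trainium / Inferentia	Cloud vendor's in-house inference chips	AWS has distribution; Groq is more aggressive on speed

Differentiation: speed narrative + custom silicon + government/sovereign-cloud sales. Within L3b, only three players do all three at once: Groq, Cerebras, and SambaNova.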

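4. Key Observations

Per-token pricing (GroqCloud, 2026-05): Llama 3.1 8B ~$0.05/M (input + output combined); Llama 3.3 70B ~$0.59/M input + $0.79/M output; DeepSeek R1 distill 70B ~$0.75/M; Whisper Large v3 ~$0.111/hour of audio [1]. The often-quoted "$1.50/M" figure roughly corresponds to 70B-class blended pricing, on par with fireworks-ai / together-ai.
Price gap vs. 1P: Llama 3.3 70B at ~$0.70/M blended vs. GPT-4o at ~$10/M blended is a ~14× gap; vs. Claude Haiku at ~$1.6/M it is ~2×. Groq's pitch is the same price or cheaper, but 5-10× faster.
Inference engine: fully in-house and closed-source (built on Groq's own LPU compiler stack); neither vLLM nor SGLang is used. Model onboarding is slow, which is why Groq's model menu is much smaller than Fireworks' or Together's (~20 models vs. 100+).
Capital model: Groq is a fabless chip company (taping out on GlobalFoundries 14nm rather than depending on TSMC 5nm). Its chip design cycle is slow, leaving it a generation behind NVIDIA's H100→B200 cadence; whether LPU generations can keep pace is a real risk.
Take rate / cost: with its own chips and racks, Groq's cost structure is not directly comparable to that of GPU-based players. It avoids paying NVIDIA's ~60% gross-margin "tax" but must amortize its own chip R&D, tape-outs, and rack deployment. The per-token cost of chip depreciation plus data-center colocation has not been disclosed.
Capacity ramp: in 2024 Groq said it aimed to deploy 1 million LPUs. Actual progress has been constrained by fab capacity, and GroqCloud has repeatedly imposed tight rate limits, which suggests capacity is the current bottleneck on growth.
Saudi Aramco strategy: the 2024 agreement to deploy an LPU data center is the key go-to-market event in which Groq turned its speed narrative into sovereign-AI sales.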
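
The pricing comparisons above depend on how "blended" is defined. The sketch below assumes a blended price is a weighted average of the input and output prices by token share; the note does not state its weighting, so `input_share` is a hypothetical parameter. With a 50/50 mix the Llama 3.3 70B list prices come out near the ~$0.70/M figure used above.

```python
def blended_price(input_price, output_price, input_share=0.5):
    """Weighted-average $/M tokens; input_share is the fraction of input tokens."""
    return input_share * input_price + (1 - input_share) * output_price


def price_ratio(other_price, groq_price):
    """How many times more expensive the other offering is."""
    return other_price / groq_price


# Llama 3.3 70B on GroqCloud: $0.59/M input, $0.79/M output
print(f"{blended_price(0.59, 0.79):.2f}")        # 0.69 with a 50/50 token mix
print(f"{blended_price(0.59, 0.79, 0.75):.2f}")  # 0.64 with a 3:1 input:output mix

# Gaps quoted above, using the ~$0.70/M blended figure
print(f"{price_ratio(10.0, 0.70):.1f}")  # 14.3 (GPT-4o at ~$10/M)
print(f"{price_ratio(1.6, 0.70):.1f}")   # 2.3 (Claude Haiku at ~$1.6/M)
```

Input-heavy workloads pull the blended figure toward the cheaper input price, so the gap against a 1P model depends on the workload's token mix as well as on list prices.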

5. Financials / Funding

Round	Date	Amount	Valuation	Lead
Series C	2021	$300M	$1B+ post	Tiger Global
Series D	2024-08	$640M	$2.8B post	BlackRock [2]
Series E (reported)	2025-08	~$750M	~$6.9B	Disruptive (per Bloomberg) [3]

Founded: 2016, by Jonathan Ross (an early engineer on Google's TPU).
Total funding: ~$2B+ estimated (including strategic funding and sovereign-customer prepayments).
Customers: Aramco (Saudi strategic deployment); GroqCloud's public user base is self-reported at ~2M developers.

6. People & Relationships

Founder / CEO: Jonathan Ross, an early engineer on Google's TPU project; TPU team alumni are the root of Groq's engineering culture.
Chief Architect: Dennis Abts (former Google TPU colleague).
Investors: BlackRock, Tiger Global, Cisco Investments, Type One Ventures, Samsung Catalyst, KDDI, D1 Capital, Disruptive (reported), Saudi Aramco (strategic), Lee Fixel.
Partners: Aramco (sovereign cloud), Meta (early Llama adaptation partner), DeepSeek (R1 distill launched in sync).
Competes with: cerebras, SambaNova, NVIDIA, fireworks-ai, together-ai.

Sources

[1] https://groq.com/pricing/ (2026-05-10)
[2] https://wow.groq.com/news_press/groq-raises-640m-series-d/ (2026-05-10)
[3] https://www.bloomberg.com/news/articles/2025-08-27/groq-said-to-raise-funds-at-roughly-6-9-billion-valuation (2026-05-10)
[4] https://groq.com/blog/groq-on-llama-3-1-70b (2026-05-10)
[5] https://artificialanalysis.ai/providers/groq (2026-05-10)
