Company
Groq
Self-developed LPU (Language Processing Unit) hardware that pushes inference latency and throughput beyond the limits of the GPU paradigm — the strongest speed-narrative third-party (3P) inference player.
1. Core Product / Service
Groq is one of the few 3P inference companies vertically integrated all the way down to the chip layer:
- LPU chip (GroqChip): a deterministic tensor stream processor without the GPU's SIMT scheduling. Each chip has small SRAM capacity but extremely high bandwidth; model weights must be sliced across many LPUs. Upside: near-zero stall in token-by-token decoding. Downside: larger models require more LPUs.
- GroqCloud: serverless API, sold per token. Model menu includes Llama 3.1 / 3.3 (8B/70B), DeepSeek R1 distills, Mixtral, Whisper, Qwen2.5, and other mainstream open-source models.
- GroqRack / On-Prem: packages LPU systems for sale to sovereign clouds, governments, Aramco (strategic partnership 2024), and other large customers.
- Speed selling point: Llama 3.1 70B at ~250 tok/s output, Llama 3.1 8B at ~750 tok/s. Llama 3.3 70B on GroqCloud has been measured multiple times by third parties at ~250-330 tok/s [4][5], far exceeding the ~50-80 tok/s typical of GPU-based systems. The "500+ tok/s" figure usually refers to 8B-class small-model scenarios.
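The throughput figures above translate directly into perceived latency. A back-of-the-envelope sketch using the cited numbers (the 500-token answer length and the 65 tok/s GPU baseline are illustrative assumptions, not Groq benchmarks):

```python
# Time to stream a full completion at different decode speeds.
# Throughputs are the figures cited above; answer length is illustrative.
def completion_seconds(tokens: int, tok_per_s: float) -> float:
    """Wall-clock seconds to decode `tokens` at a steady `tok_per_s`."""
    return tokens / tok_per_s

for label, tps in [
    ("Groq Llama 3.1 70B", 250),
    ("Groq Llama 3.1 8B", 750),
    ("typical GPU 70B (assumed)", 65),
]:
    print(f"{label}: {completion_seconds(500, tps):.1f} s for 500 tokens")
```

At 250 tok/s a 500-token answer streams in 2.0 s; at the assumed ~65 tok/s GPU baseline the same answer takes ~7.7 s, which is the gap users perceive in interactive workloads.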
2. Target Users & Pain Points
- Latency-sensitive applications: voice agents, real-time translation, long-chain agent workflows (chained multi-LLM calls) — token/s is the key perceptual variable.
- Enterprise / sovereign deployments: selling LPU racks to sovereign clouds in Saudi Arabia and Canada, sidestepping NVIDIA export controls and GPU procurement queues.
- Pain point: GPU inference at batch=1 (low concurrency) is bottlenecked by memory bandwidth; Groq's SRAM-based design sidesteps that bottleneck, so its single-stream speed is a structural advantage over the GPU architecture, not just an engineering optimization.
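The bandwidth bottleneck can be sketched with a standard roofline estimate: at batch=1, every generated token must stream all model weights through memory, so decode speed is bounded by bandwidth divided by weight bytes. A minimal sketch, using illustrative numbers (FP16 weights, a single ~3.35 TB/s HBM accelerator; real systems add KV-cache traffic and other overheads):

```python
def max_tok_per_s(params_b: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on batch=1 decode speed: each token reads all weights once."""
    weight_gb = params_b * bytes_per_param  # total weight bytes, in GB
    return bandwidth_gb_s / weight_gb

# 70B parameters in FP16 (2 bytes/param) on one ~3.35 TB/s HBM device:
ceiling = max_tok_per_s(70, 2, 3350)
print(f"~{ceiling:.0f} tok/s ceiling per device, before overheads")
```

The ~24 tok/s single-device ceiling this yields is why GPU serving leans on quantization, tensor parallelism, and batching, and why an SRAM architecture with far higher effective bandwidth changes the single-stream picture.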
3. Competitive Landscape
| Competitor | Positioning | Vs. Groq |
|---|---|---|
| Cerebras | Wafer-scale WSE-3 inference | Also non-GPU inference; Cerebras's single-wafer performance is more extreme but its ecosystem is narrower; Groq commercialized earlier and more broadly |
| NVIDIA H100 + vLLM | Mainstream GPU inference route | NVIDIA's overwhelming generality + software ecosystem; Groq is niche speed extremity |
| Fireworks AI / Together AI | Software optimization on top of GPUs | Groq wins on speed / latency; Fireworks/Together offer broader model menus and more flexible pricing |
| SambaNova RDU | In-house reconfigurable dataflow | Also non-GPU; SambaNova focuses on enterprise dedicated; Groq favors public cloud + token API |
| AWS Trainium / Inferentia | Cloud-provider in-house inference chips | AWS has distribution; Groq's speed is more aggressive |
Differentiation: speed narrative + in-house chip + government/sovereign cloud sales; the only 3P players doing all three simultaneously are Groq, Cerebras, and SambaNova.
4. Unique Observations
- Per-token pricing (GroqCloud, 2026-05): Llama 3.1 8B ~$0.05/M (input + output blended); Llama 3.3 70B ~$0.59/M input + $0.79/M output; DeepSeek R1 distill 70B ~$0.75/M; Whisper Large v3 ~$0.111/hour audio [1]. The "$1.50/M" figure roughly corresponds to 70B-class blended pricing, on par with fireworks-ai / together-ai.
- Vs first-party (1P) price gap: Llama 3.3 70B @ ~$0.70/M blended vs GPT-4o ~$10/M blended → ~14× price gap; vs Claude Haiku ~$1.6/M → ~2× price gap. Groq's pitch is same price or cheaper, but 5-10× faster.
- Inference engine: completely in-house + closed source (based on a proprietary LPU compiler stack); doesn't use vLLM or SGLang. Model onboarding cycle is long — this is why Groq's model menu is significantly smaller than Fireworks/Together (~20 vs 100+).
- Capital model: Groq is a fabless chip company (tapes out at GlobalFoundries 14nm, not dependent on TSMC 5nm); new chip design cycles are slow, with a generational gap vs NVIDIA's H100→B200 cadence. LPU generational catch-up is a real risk.
- Take rate / cost: owning the chip and the rack gives a cost structure incomparable to GPU-renting players; Groq avoids NVIDIA's ~60% margin tax but must amortize in-house chip R&D, tape-outs, and rack deployment. How chip depreciation and data-center colocation translate into per-token cost is not disclosed.
- Capacity ramp: announced plans in 2024 to deploy 1M LPUs; actual progress has been constrained by manufacturing capacity. GroqCloud has repeatedly hit severe rate limits, reflecting capacity as the current growth bottleneck.
- Saudi Aramco strategy: a 2024 agreement to deploy LPU data centers, the key GTM event translating Groq's speed narrative into sovereign-AI sales.
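The blended-price and price-gap figures above follow from simple arithmetic. A sketch using the cited Llama 3.3 70B rates ($0.59/M input, $0.79/M output) and, as an assumption, an equal input/output token mix for the blend:

```python
def blended_price(in_price: float, out_price: float, in_frac: float = 0.5) -> float:
    """Blended $/M tokens, assuming `in_frac` of tokens are input (0.5 here)."""
    return in_price * in_frac + out_price * (1 - in_frac)

llama70b = blended_price(0.59, 0.79)   # 0.69 -> the "~$0.70/M blended" figure
print(round(llama70b, 2))
print(round(10 / llama70b))            # 14 -> the "~14x" gap vs a ~$10/M 1P model
```

A heavier input-skewed mix (common in RAG workloads) would pull the blend below $0.69/M, so the ~14× gap is if anything conservative under the cited rates.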
5. Financials / Funding
| Round | Date | Amount | Valuation | Lead |
|---|---|---|---|---|
| Series C | 2021 | $300M | $1B+ post | Tiger Global |
| Series D | 2024-08 | $640M | $2.8B post | BlackRock [2] |
| Series E (reported) | 2025-08 | ~$750M | ~$6.9B | Disruptive (Bloomberg report) [3] |
- Founded: 2016, by Jonathan Ross (former early Google TPU engineer).
- Total funding: ~$2B+ estimated (including strategic + sovereign customer prepayments).
- Customers: Aramco (Saudi strategic deployment); GroqCloud self-reports ~2M registered developers.
6. People & Relationships
- Founder / CEO: Jonathan Ross — early engineer on Google's TPU project; TPU team alumni are the foundation of Groq's engineering culture.
- Chief Architect: Dennis Abts (former Google TPU colleague).
- Investors: BlackRock, Tiger Global, Cisco Investments, Type One Ventures, Samsung Catalyst, KDDI, D1 Capital, Disruptive (reported), Saudi Aramco (strategic), Lee Fixel.
- Partners: Aramco (sovereign cloud), Meta (early Llama adaptation partner), DeepSeek (R1 distill models launched on GroqCloud at release).
- Competes with: Cerebras, SambaNova, NVIDIA, Fireworks AI, Together AI.
Sources
- [1] https://groq.com/pricing/ (2026-05-10)
- [2] https://wow.groq.com/news_press/groq-raises-640m-series-d/ (2026-05-10)
- [3] https://www.bloomberg.com/news/articles/2025-08-27/groq-said-to-raise-funds-at-roughly-6-9-billion-valuation (2026-05-10)
- [4] https://groq.com/blog/groq-on-llama-3-1-70b (2026-05-10)
- [5] https://artificialanalysis.ai/providers/groq (2026-05-10)
Last compiled: 2026-05-10