Company
Together AI
AI-native cloud for open-source model inference, fine-tuning, and dedicated NVIDIA GPU clusters — built around the Together Kernel Collection from FlashAttention author Tri Dao.
1. Core Products / Services
Three product lines on a single platform:
- Together Inference — serverless API for 200+ open-source LLMs (Llama, Qwen, Mistral, DeepSeek, etc.), token-priced. Claims #1 output speed on demanding open models, with up to 2x faster serverless inference via FlashAttention-4 kernels, fused MoE kernels, and FP8/FP4 quantization that's "effectively lossless" [1].
- Together Fine-Tuning — full fine-tunes plus LoRA on open weights, customer keeps the weights. Targeted at production workloads, not just experimentation.
- Together GPU Clusters — two tiers: Instant GPU Clusters (self-service, up to 64 NVIDIA Hopper GPUs, spin up in minutes via console, GA'd Sept 2025) and Dedicated GPU Clusters (64–1,000 GPUs, custom-configured, supports SkyPilot/Terraform IaC). Expanding Blackwell (B200/GB200) deployments announced at GTC 2025 [2].
Core technical moat is the Together Kernel Collection built under Chief Scientist Tri Dao (FlashAttention creator) — hardware-aware kernels for attention, MoE routing, and low-bit quantization.
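Together's serverless endpoint follows the OpenAI-compatible chat-completions shape. A minimal sketch of a token-priced call, assuming the `api.together.xyz/v1` base URL and an illustrative Llama model slug (both should be checked against current docs):

```python
# Hedged sketch of a serverless inference call. Base URL and model slug are
# illustrative assumptions; the payload shape is the standard OpenAI-compatible one.
import json
import os
import urllib.request

TOGETHER_BASE_URL = "https://api.together.xyz/v1"  # assumed OpenAI-compatible base URL


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload (billed per token)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the chat-completions endpoint and parse the JSON reply."""
    req = urllib.request.Request(
        f"{TOGETHER_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request(
        "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model slug
        "Summarize FlashAttention in one sentence.",
    )
    # Only hits the network when a key is present.
    if os.environ.get("TOGETHER_API_KEY"):
        print(send(payload, os.environ["TOGETHER_API_KEY"]))
```

The OpenAI-compatible shape is the practical reason switching from a closed API provider to a hosted open model is usually a one-line base-URL change.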
2. Target Customers & Pain Points
- Open-source-first AI startups & enterprises that don't want to be locked into closed API providers (OpenAI/Anthropic) and want Llama/Qwen/DeepSeek-class quality.
- Pain points solved: hosting open models in-house requires GPU procurement, kernel tuning, autoscaling, eval — Together absorbs that. Pricing positioned ~11x cheaper than GPT-4 for Llama-3 equivalents [3].
- Burst training / fine-tuning: customers who need 64–1000 GPUs for a few weeks and don't want to commit to AWS/Azure reserved capacity.
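Multiples like the "~11x cheaper" figure in [3] come down to blended per-token arithmetic. A sketch with illustrative rates (assumed for the example, not actual quotes from either vendor):

```python
# Blended per-token cost arithmetic behind price-ratio claims like "~11x cheaper" [3].
# All rates are illustrative assumptions (USD per 1M tokens), not published prices.

def workload_cost(in_rate: float, out_rate: float, in_tok: int, out_tok: int) -> float:
    """USD cost of a workload with in_tok input tokens and out_tok output tokens."""
    return (in_rate * in_tok + out_rate * out_tok) / 1_000_000


# Hypothetical GPT-4-class rates vs. a hosted open-weights Llama-3-class model,
# on a workload of 750k input and 250k output tokens.
closed = workload_cost(10.0, 30.0, 750_000, 250_000)
hosted = workload_cost(0.90, 0.90, 750_000, 250_000)
print(f"closed=${closed:.2f} hosted=${hosted:.2f} ratio={closed / hosted:.1f}x")
```

The multiple is entirely a function of the assumed rates and the input/output token mix, which is why published comparisons quote different numbers.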
3. Competitive Landscape
| Competitor | Positioning | Vs. Together |
|---|---|---|
| Fireworks AI | Token-priced inference, FireAttention engine | Direct competitor; Fireworks claims lower latency on some workloads, Together has broader model catalog + GPU cluster tier |
| Anyscale | Ray-native, RayTurbo, enterprise governance | Anyscale is more "infrastructure framework", Together is more "API product" |
| Modal | Commodity GPU host, per-second billing | Modal = bare-metal control for devs; Together = managed inference + training stack |
| RunPod | Per-minute GPU rental, broad accelerator menu | RunPod is raw GPU; Together adds inference engine + fine-tuning UX |
| Lambda Labs | Training-optimized GPU cloud | Lambda is more training/research focused; Together covers the full inference→training loop |
| CoreWeave | Hyperscale GPU IaaS, NVIDIA-aligned | CoreWeave sells wholesale GPU capacity (a peer to Together's underlying suppliers); Together sits a layer up as a managed AI platform |
| OpenRouter | Aggregator/router across providers | OpenRouter routes traffic to Together (and others); they're complements more than competitors |
Differentiation: Tri Dao's kernel work, plus being one of the few players offering both serverless token pricing and dedicated 1,000-GPU clusters under one console.
4. Key Observations
- The Tri Dao hire is the load-bearing piece of the technical story — FlashAttention is foundational to every modern inference stack, so "we ship the kernels first" is a credible moat narrative vs. Fireworks/Anyscale.
- Bet that open-source models stay competitive enough that enterprises want a neutral host. If frontier closed models (GPT-5, Claude 5) keep stretching the lead and open models stagnate, Together's TAM compresses. Continued strength of DeepSeek / Llama / Qwen is existential.
- Strategic positioning between CoreWeave (wholesale GPUs) and OpenAI/Anthropic (closed APIs) — Together is the "Snowflake of open AI inference" pitch.
- Self-service Instant Clusters (Sept 2025) is a meaningful product expansion — moves Together from "API vendor" toward "Vercel-for-GPUs" UX. See ai-inference-engines and gpu-kernel-optimization for the technical context.
- Heavy NVIDIA partnership (NVIDIA is on the cap table, Blackwell early access at GTC 2025) — fortunes are correlated with NVIDIA roadmap.
5. Financials / Funding
- Founded: June 2022.
- Series B (Feb 2025): $305M led by General Catalyst, co-led by Prosperity7, at a $3.3B valuation, up >160% from the $1.25B post-money valuation set by the Salesforce-led $106M round in March 2024 [4].
- Total raised: ~$534M as of Feb 2025.
- Investors: General Catalyst, Prosperity7, Salesforce Ventures, NVIDIA, Kleiner Perkins, Coatue, Lux Capital, Greycroft, Emergence, March Capital, SK Telecom, John Chambers, Scott Banister, DAMAC Capital.
- Reported: seeking ~$1B follow-on per DCD reporting (2025) — would push valuation materially higher if closed.
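The ">160%" step-up above is straightforward to check from the two post-money figures reported in [4]:

```python
# Step-up check: $1.25B post-money (Mar 2024) -> $3.3B (Feb 2025), per [4].
prev_val, new_val = 1.25, 3.3  # valuations in $B
step_up = new_val / prev_val - 1
print(f"valuation step-up: {step_up:.0%}")  # ~164%, consistent with ">160%"
```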
6. People & Relationships
- Founder / CEO: Vipul Ved Prakash — serial founder (prior exits in search/data infra), founded Together June 2022.
- Chief Scientist: Tri Dao — FlashAttention author, Princeton CS, technical credibility anchor.
- Lead investors: General Catalyst, Prosperity7, NVIDIA (strategic).
- Cooperates with: OpenRouter (as upstream provider), NVIDIA (Blackwell early access).
- Competes with: Fireworks AI, Anyscale, Modal, RunPod, Lambda Labs (overlapping zones).
- Hosts models from: DeepSeek, Meta (Llama), Alibaba (Qwen), Mistral.
Sources
- [1] https://www.together.ai/products (2026-05-09)
- [2] https://siliconangle.com/2025/09/09/exclusive-together-ai-launches-self-service-gpu-infrastructure/ (2026-05-09)
- [3] https://northflank.com/blog/fireworks-ai-vs-together-ai (2026-05-09)
- [4] https://news.crunchbase.com/cloud/together-ai-valuation-jump-general-catalyst-nvda/ (2026-05-09)
- https://www.together.ai/blog/together-ai-announcing-305m-series-b (2026-05-09)
Last compiled: 2026-05-09