Together AI

AI-native cloud for open-source model inference, fine-tuning, and dedicated NVIDIA GPU clusters — built around the Together Kernel Collection from FlashAttention author Tri Dao.

1. Core Products / Services

Three product lines on a single platform:

Together Inference: serverless, token-priced API for 200+ open-source LLMs (Llama, Qwen, Mistral, DeepSeek, etc.). Together claims the #1 output speed on demanding open models and up to 2x faster serverless inference, which it attributes to FlashAttention-4 kernels, fused MoE kernels, and FP8/FP4 quantization that it describes as "effectively lossless" [1].
Together Fine-Tuning: full fine-tunes plus LoRA on open weights; the customer keeps the weights. Targeted at production workloads, not just experimentation.
Together GPU Clusters: two tiers. Instant GPU Clusters are self-service, scale up to 64 NVIDIA Hopper GPUs, spin up in minutes via the console, and have been generally available since September 2025. Dedicated GPU Clusters cover 64–1,000 GPUs, are custom-configured, and support SkyPilot/Terraform infrastructure-as-code. Expanded Blackwell (B200/GB200) deployments were announced at GTC 2025 [2].

The core technical moat is the Together Kernel Collection, built under Chief Scientist Tri Dao (FlashAttention creator): hardware-aware kernels for attention, MoE routing, and low-bit quantization.

2. Target Customers & Pain Points

Open-source-first AI startups & enterprises that don't want to be locked into closed API providers (OpenAI/Anthropic) and want Llama/Qwen/DeepSeek-class quality.
Pain points solved: hosting open models in-house requires GPU procurement, kernel tuning, autoscaling, and evaluation; Together absorbs that work. Pricing is positioned at roughly 11x cheaper than GPT-4 for Llama-3 equivalents [3].
Burst training / fine-tuning: customers who need 64–1,000 GPUs for a few weeks and don't want to commit to AWS/Azure reserved capacity.

3. Competitive Landscape

Competitor	Positioning	Vs. Together
Fireworks AI	Token-priced inference, FireAttention engine	Direct competitor; Fireworks claims lower latency on some workloads, Together has broader model catalog + GPU cluster tier
Anyscale	Ray-native, RayTurbo, enterprise governance	Anyscale is more "infrastructure framework", Together is more "API product"
Modal	Commodity GPU host, per-second billing	Modal = bare-metal control for devs; Together = managed inference + training stack
RunPod	Per-minute GPU rental, broad accelerator menu	RunPod is raw GPU; Together adds inference engine + fine-tuning UX
Lambda Labs	Training-optimized GPU cloud	Lambda is more training/research focused; Together covers the full inference→training loop
CoreWeave	Hyperscale GPU IaaS, NVIDIA-aligned	CoreWeave sells wholesale GPU capacity (the kind of provider that can sit underneath Together as a supplier); Together sits a layer up as a managed AI platform
OpenRouter	Aggregator/router across providers	OpenRouter routes traffic to Together (and others); they're complements more than competitors

Differentiation: Tri Dao's kernel work, plus being the only player among those compared above to offer both serverless tokens and dedicated 1,000-GPU clusters under one console.

4. Unique Observations

The Tri Dao hire is the load-bearing piece of the technical story — FlashAttention is foundational to every modern inference stack, so "we ship the kernels first" is a credible moat narrative vs. Fireworks/Anyscale.
The core bet is that open-source models stay competitive enough for enterprises to want a neutral host. If frontier closed models (GPT-5, Claude 5) keep stretching the lead and open models stagnate, Together's TAM compresses. Continued strength of DeepSeek / Llama / Qwen is existential.
Strategic positioning sits between CoreWeave (wholesale GPUs) and OpenAI/Anthropic (closed APIs); the pitch is "Snowflake of open AI inference".
Self-service Instant Clusters (Sept 2025) is a meaningful product expansion — moves Together from "API vendor" toward "Vercel-for-GPUs" UX. See ai-inference-engines and gpu-kernel-optimization for the technical context.
Heavy NVIDIA partnership (NVIDIA is on the cap table, Blackwell early access at GTC 2025) — fortunes are correlated with NVIDIA roadmap.

5. Financials / Funding

Founded: June 2022.
Series B (Feb 2025): $305M led by General Catalyst, co-led by Prosperity7. Valuation of $3.3B, up more than 160% from the $1.25B valuation that followed the Salesforce-led $106M round in March 2024 [4].
Total raised: ~$534M as of Feb 2025.
Investors: General Catalyst, Prosperity7, Salesforce Ventures, NVIDIA, Kleiner Perkins, Coatue, Lux Capital, Greycroft, Emergence, March Capital, SK Telecom, John Chambers, Scott Banister, DAMAC Capital.
Reported: seeking a follow-on round of ~$1B, per DCD reporting (2025); if closed, it would push the valuation materially higher.
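
As a quick sanity check on the Series B step-up quoted above, the arithmetic can be sketched as follows. The function and variable names are illustrative only; the two valuation figures are the ones cited in this section.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

# Valuations quoted in this section, in billions of USD.
prior_valuation = 1.25     # after the March 2024 round
series_b_valuation = 3.3   # after the February 2025 Series B

step_up_pct = pct_increase(prior_valuation, series_b_valuation)
multiple = series_b_valuation / prior_valuation

print(f"step-up: {step_up_pct:.0f}%, multiple: {multiple:.2f}x")
# prints "step-up: 164%, multiple: 2.64x"
```

A 164% increase is consistent with the "more than 160%" figure and corresponds to a 2.64x multiple in roughly eleven months.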

6. People & Relationships

Founder / CEO: Vipul Ved Prakash — serial founder (prior exits in search/data infra), founded Together June 2022.
Chief Scientist: Tri Dao — FlashAttention author, Princeton CS, technical credibility anchor.
Lead investors: General Catalyst, Prosperity7, NVIDIA (strategic).
Cooperates with: OpenRouter (as upstream provider), NVIDIA (Blackwell early access).
Competes with: Fireworks AI, Anyscale, Modal, RunPod, Lambda Labs (overlapping zones).
Hosts models from: DeepSeek, Meta (Llama), Alibaba (Qwen), Mistral.

Sources

[1] https://www.together.ai/products (2026-05-09)
[2] https://siliconangle.com/2025/09/09/exclusive-together-ai-launches-self-service-gpu-infrastructure/ (2026-05-09)
[3] https://northflank.com/blog/fireworks-ai-vs-together-ai (2026-05-09)
[4] https://news.crunchbase.com/cloud/together-ai-valuation-jump-general-catalyst-nvda/ (2026-05-09)
https://www.together.ai/blog/together-ai-announcing-305m-series-b (2026-05-09)