Company

Together AI

AI-native cloud for open-source model inference, fine-tuning, and dedicated NVIDIA GPU clusters — built around the Together Kernel Collection from FlashAttention author Tri Dao.

1. Core Products / Services

Three product lines on a single platform:

  • Together Inference — serverless API for 200+ open-source LLMs (Llama, Qwen, Mistral, DeepSeek, etc.), priced per token. Claims #1 output speed on demanding open models, with up to 2x faster serverless inference via FlashAttention-4 kernels, fused MoE kernels, and FP8/FP4 quantization described as "effectively lossless" [1]. See the API sketch after this list.
  • Together Fine-Tuning — full fine-tunes plus LoRA on open weights; the customer keeps the weights. Targeted at production workloads, not just experimentation.
  • Together GPU Clusters — two tiers: Instant GPU Clusters (self-service, up to 64 NVIDIA Hopper GPUs, spun up in minutes via the console, GA since Sept 2025) and Dedicated GPU Clusters (64–1,000 GPUs, custom-configured, with SkyPilot/Terraform infrastructure-as-code support). Expanded Blackwell (B200/GB200) deployments were announced at GTC 2025 [2].
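
A minimal sketch of what the serverless inference tier looks like to a developer, assuming Together's OpenAI-compatible chat completions endpoint at https://api.together.xyz/v1; the model ID and environment-variable name below are illustrative, not a definitive integration guide.

    import os
    import requests

    # Minimal sketch of a serverless inference call, assuming Together's
    # OpenAI-compatible chat completions endpoint. The model ID and env var
    # name are illustrative; check the live model catalog before use.
    api_key = os.environ["TOGETHER_API_KEY"]  # assumed environment variable name

    resp = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example open-weight model
            "messages": [
                {"role": "user", "content": "Summarize FlashAttention in one sentence."}
            ],
            "max_tokens": 128,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])

Because the endpoint follows the OpenAI request shape, the same call can presumably be made through a standard OpenAI-style client pointed at the Together base URL, which is also broadly how aggregators like openrouter route traffic to Together.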

The core technical moat is the Together Kernel Collection, built under Chief Scientist Tri Dao (creator of FlashAttention): hardware-aware kernels for attention, MoE routing, and low-bit quantization.

2. Target Customers & Pain Points

  • Open-source-first AI startups & enterprises that don't want to be locked into closed API providers (OpenAI/Anthropic) and want Llama/Qwen/DeepSeek-class quality.
  • Pain points solved: hosting open models in-house requires GPU procurement, kernel tuning, autoscaling, and evaluation infrastructure; Together absorbs all of that. Pricing is positioned at roughly 11x cheaper than GPT-4 for Llama-3-class models [3].
  • Burst training / fine-tuning: customers who need 64–1,000 GPUs for a few weeks and don't want to commit to AWS/Azure reserved capacity.

3. Competitive Landscape

Competitor positioning (competitor / positioning / vs. Together):

  • Fireworks AI: token-priced inference, FireAttention engine. Vs. Together: direct competitor; Fireworks claims lower latency on some workloads, while Together has a broader model catalog plus the GPU cluster tier.
  • Anyscale: Ray-native, RayTurbo, enterprise governance. Vs. Together: Anyscale is more "infrastructure framework", Together is more "API product".
  • Modal: commodity GPU host, per-second billing. Vs. Together: Modal offers bare-metal-style control for developers; Together is a managed inference + training stack.
  • runpod: per-minute GPU rental, broad accelerator menu. Vs. Together: RunPod is raw GPU; Together adds the inference engine + fine-tuning UX.
  • lambda-labs: training-optimized GPU cloud. Vs. Together: Lambda is more training/research focused; Together covers the full inference→training loop.
  • coreweave: hyperscale GPU IaaS, NVIDIA-aligned. Vs. Together: CoreWeave is wholesale GPU capacity (often Together's underlying supplier-class peer); Together sits a layer up as a managed AI platform.
  • openrouter: aggregator/router across providers. Vs. Together: OpenRouter routes traffic to Together (and others); they're complements more than competitors.

Differentiation: Tri Dao's kernel work, plus being the only player offering both serverless token-priced inference and dedicated 1,000-GPU clusters under one console.

4. Unique Observations

  • The Tri Dao hire is the load-bearing piece of the technical story — FlashAttention is foundational to every modern inference stack, so "we ship the kernels first" is a credible moat narrative vs. Fireworks/Anyscale.
  • Bet that open-source models stay competitive enough that enterprises want a neutral host. If frontier closed models (GPT-5, Claude 5) keep stretching the lead and open models stagnate, Together's TAM compresses. Continued strength of deepseek / Llama / Qwen is existential.
  • Strategic positioning between coreweave (wholesale GPUs) and OpenAI/Anthropic (closed APIs) — Together is the "Snowflake of open AI inference" pitch.
  • The self-service Instant Clusters launch (Sept 2025) is a meaningful product expansion — it moves Together from "API vendor" toward a "Vercel-for-GPUs" UX. See ai-inference-engines and gpu-kernel-optimization for the technical context.
  • Heavy NVIDIA partnership (NVIDIA is on the cap table, Blackwell early access at GTC 2025) — fortunes are correlated with NVIDIA roadmap.

5. Financials / Funding

  • Founded: June 2022.
  • Series B (Feb 2025): $305M led by General Catalyst, co-led by Prosperity7. Valuation $3.3B, up >160% from the $1.25B post-money valuation set by the Salesforce-led $106M round in March 2024 [4].
  • Total raised: ~$534M as of Feb 2025.
  • Investors: General Catalyst, Prosperity7, Salesforce Ventures, NVIDIA, Kleiner Perkins, Coatue, Lux Capital, Greycroft, Emergence, March Capital, SK Telecom, John Chambers, Scott Banister, DAMAC Capital.
  • Reported: seeking a ~$1B follow-on round per DCD reporting (2025), which would push the valuation materially higher if closed.

6. People & Relationships

  • Founder / CEO: Vipul Ved Prakash — serial founder (prior exits in search/data infra); founded Together in June 2022.
  • Chief Scientist: Tri Dao — FlashAttention author, Princeton CS, technical credibility anchor.
  • Lead investors: General Catalyst, Prosperity7, NVIDIA (strategic).
  • Cooperates with: openrouter (Together serves as an upstream inference provider), NVIDIA (Blackwell early access).
  • Competes with: Fireworks AI, Anyscale, Modal, runpod, lambda-labs (overlapping zones).
  • Hosts models from: deepseek, Meta (Llama), Alibaba (Qwen), Mistral.

Sources

Last compiled: 2026-05-09