Company
Modal
Turning "writing a function" into "running code on a GPU" — a dev-first serverless GPU compute play, backed by a16z, founded by Erik Bernhardsson.
1. Core Product / Service
Modal is the abstraction layer beneath third-party (3P) inference — it sells neither tokens nor fixed image APIs; it sells the developer experience of writing a Python function that runs serverlessly on a GPU.
- Modal Functions: a Python decorator (`@app.function(gpu="H100")`) ships the function to a cloud GPU for execution. Cold starts in seconds, billed by the second.
- Container + GPU scheduling: backed by an in-house container runtime plus GPU pools rented from multiple tier-2 providers (H100/A100/A10G/L40S/T4 — a full menu).
- Web Endpoints: expose functions as HTTPS interfaces; many teams use Modal to self-host LLMs, batch jobs, and in-house small-model inference services.
- Volumes / Networks / Sandboxes: complete sandbox + persistent storage abstraction, supporting untrusted code execution (an unexpectedly important selling point in the agent era — Claude / GPT in-house code execution also uses similar abstractions).
- Not a token API: Modal doesn't pre-load a menu of "call Llama 70B → give me tokens"; you have to package the model yourself. This is the most fundamental difference vs Together/Fireworks.
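The decorator model above can be sketched as a minimal Modal app. This is a hedged sketch using Modal's documented primitives (`modal.App`, `@app.function`, `@app.local_entrypoint`, `.remote()`); exact signatures may vary by SDK version, and running it requires a Modal account and the `modal` CLI:

```python
import modal

app = modal.App("gpu-demo")

@app.function(gpu="H100")
def square(x: int) -> int:
    # Executes inside a container on a cloud H100, billed by the second.
    return x * x

@app.local_entrypoint()
def main():
    # .remote() ships the call to Modal's scheduler instead of running locally;
    # `modal run app.py` builds the image, schedules the GPU, streams the result.
    print(square.remote(7))
```

The point of the abstraction: no Dockerfile, no K8s manifest, no autoscaler config — the decorator arguments are the entire deployment spec.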
2. Target Users & Pain Points
- ML engineers / AI startups: want to run experiments, batch jobs, custom model inference — don't want to operate K8s + GPU autoscaling.
- Agent / Sandbox companies: need the "safely run user-submitted code" sandbox abstraction; Modal's container isolation + elastic GPUs is one of the few ready-made solutions.
- Workflow / pipeline teams: running irregular loads like video processing, embedding indexing, training sweeps — Modal requires an order of magnitude less setup than K8s + Ray.
- Pain point: building K8s + GPU operator + autoscaling + image pipeline yourself takes at least 3-6 person-months; Modal's decorator pattern compresses this into one line.
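For the sandbox pain point specifically, Modal exposes a Sandbox primitive. A minimal sketch, assuming the `modal.Sandbox` API as documented (`Sandbox.create`, `exec`); method names and parameters may differ across SDK versions, and it only runs against a live Modal account:

```python
import modal

# Look up (or create) an app to attach the sandbox to.
app = modal.App.lookup("sandbox-demo", create_if_missing=True)

# Spin up an isolated container for untrusted, user-submitted code.
sb = modal.Sandbox.create(app=app)
proc = sb.exec("python", "-c", "print(2 + 2)")
print(proc.stdout.read())
sb.terminate()
```

This is the "safely run user-submitted code" abstraction the agent companies are buying: container isolation plus elastic compute, without building the runtime themselves.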
3. Competitive Landscape
| Competitor | Positioning | Vs. Modal |
|---|---|---|
| Replicate | Model hub + Cog containers | Replicate leans toward hosting ready-made models; Modal leans toward running your own code |
| RunPod | Per-minute GPU rental + serverless | RunPod is cheaper but weaker on dev UX; Modal is a higher-level abstraction |
| Baseten | Dedicated model serving | Baseten leans toward deployment + monitoring; Modal toward general-purpose GPU functions |
| AWS Lambda + GPU / SageMaker | General cloud-provider compute | Steep learning curve; Modal is Python-native and simple |
| Beam.cloud / Banana | Same serverless-GPU class | Modal leads on capital and ecosystem |
| Fireworks AI / Together AI | Token API | Entirely different abstraction layer (functions vs tokens) |
Differentiation: Python-native ergonomics + integrated container / sandbox / GPU abstraction. Developers just write functions; no K8s.
4. Unique Observations
- GPU-second pricing (2026-05): H100 $0.001097/s ($3.95/h), A100 80GB $0.000694/s ($2.50/h), A10G $0.000306/s ($1.10/h), T4 $0.000164/s ($0.59/h), CPU $0.0000131/core-s [1]. Slightly more expensive than RunPod but far cheaper than equivalent AWS / GCP GPU list prices.
- Tokens not priced directly: Modal doesn't sell "call Llama → get tokens"; customers must self-deploy. But the industry commonly runs vLLM on Modal as an in-house OpenAI-compatible endpoint — self-deployed Llama 70B on 1×H100 at ~$3.95/h, producing ~10K output tok/s (vLLM batched), works out to ~$0.11/M tokens marginal cost (at ideal full GPU utilization); with idle time and autoscaling, realistically ~$0.30-0.60/M. Slightly pricier than DeepInfra's ~$0.30/M list price, but with full control + private data.
- Vs first-party (1P) pricing: treating Modal as the cost floor for a self-hosted OpenAI-compatible alternative — Llama 70B at ~$0.50/M blended vs GPT-4o at ~$10/M is a ~20× price gap, but you own latency and model ops yourself.
- Inference engine: Modal doesn't mandate one — users can install vLLM, SGLang, TGI, TensorRT-LLM, any of them. Modal is a GPU compute platform, not an engine.
- Compute source: rents GPU pools from GCP / Oracle / CoreWeave / Lambda and other tier-2 providers; doesn't do hardware or data centers itself. Modal's differentiation is at the scheduling layer: container orchestration + cold-start optimization (in-house lazy loading + image caching can get cold starts under 2s).
- Agent / Sandbox pull on Modal: in the agent era of 2024-2025 many startups use Modal as a code-execution sandbox (same demand as Anthropic Code Interpreter); a key source of Modal's incremental traffic.
- Capital model + strategy: Modal doesn't sell its own token API, so it sits out the token price war; the moat is engineer ecosystem + ergonomics — Vercel is to web what Modal is to GPU. This is a16z's investment thesis.
- Risks: as dedicated-deployment products from Fireworks / Baseten / Together mature, "why not just use the token API?" gets asked more loudly. Modal has to win on the sandbox / agent / batch / custom-model scenarios that token APIs handle poorly.
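The back-of-envelope unit economics above can be checked directly. Assuming the listed $3.95/h H100 rate and the assumed ~10K batched output tok/s for Llama 70B under vLLM:

```python
# Verify the marginal-cost arithmetic from the pricing notes above.
H100_PER_HOUR = 3.95          # Modal H100 list price, 2026-05 [1]
OUTPUT_TOK_PER_SEC = 10_000   # assumed vLLM batched throughput, Llama 70B

tokens_per_hour = OUTPUT_TOK_PER_SEC * 3600              # 36M tokens/h
cost_per_m_tokens = H100_PER_HOUR / (tokens_per_hour / 1e6)
print(f"${cost_per_m_tokens:.2f}/M tokens")              # ~$0.11/M at full utilization

# 1P price gap: ~$0.50/M blended self-hosted vs ~$10/M GPT-4o.
print(f"{10 / 0.50:.0f}x gap")                           # 20x
```

The gap between the $0.11/M ideal and the $0.30-0.60/M realistic figure is entirely utilization: idle GPU-seconds during autoscaling are billed the same as busy ones.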
5. Financials / Funding
| Round | Date | Amount | Valuation | Lead |
|---|---|---|---|---|
| Seed | 2022 | $7M | — | Lux Capital, Definition |
| Series A | 2023 | $16M | — | Redpoint |
| Series B | 2024-09 | $80M | $1.1B post | Lux Capital + Andreessen Horowitz [3] |
- Founded: 2021
- Total funding: ~$103M
- Public ARR / user counts not disclosed; industry estimates suggest tens of thousands of active developers
6. People & Relationships
- Founder / CEO: Erik Bernhardsson — former Spotify ML infra head (author of the open-source Annoy library); a known figure in the ML infra community.
- Co-founder / CTO: Akshat Bubna.
- Investors: Lux Capital, Andreessen Horowitz, Redpoint, Definition, Amplify Partners.
- Partners / Customers: Suno, Ramp, ElevenLabs, Substack, etc. appear in public case studies; agent companies are well represented.
- Competes with: Replicate, Baseten, RunPod (serverless tier), Beam.cloud.
Sources
- [1] https://modal.com/pricing (2026-05-10)
- [2] https://modal.com/blog (2026-05-10)
- [3] https://a16z.com/announcement/investing-in-modal/ (2026-05-10)
- [4] https://www.crunchbase.com/organization/modal-labs (2026-05-10)
Last compiled: 2026-05-10