OpenVLA
The de-facto open-source VLA baseline: 7B parameters, MIT license, and 1.52M monthly HuggingFace downloads. It outperforms Google's 55B-parameter RT-2-X by 16.5 percentage points in absolute task success rate.
1. Core Product / Service
OpenVLA is an open-source 7-billion-parameter Vision-Language-Action (VLA) model for robotic manipulation, developed by researchers from Stanford, UC Berkeley, Google DeepMind, Toyota Research Institute (TRI), and MIT [1]. Released in June 2024 under the MIT license, it was the first credibly open competitor to Google's closed RT-2 series.
Key architecture decisions:
- Dual visual encoders: DINOv2 (spatial reasoning) + SigLIP (semantic understanding), fused into one visual representation where most VLAs at the time relied on a single encoder [1].
- Base LLM: Llama 2 7B, fine-tuned on ~970K robot manipulation episodes from the Open X-Embodiment dataset spanning 70+ datasets and 22 robot embodiments.
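- Discrete action tokenization: each action dimension is discretized into 256 bins, and the bin indices are mapped onto the 256 least-used tokens in the Llama 2 vocabulary, which are overwritten as action tokens.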
- Fine-tuning access: LoRA/QLoRA support enables adaptation on consumer GPUs with as little as 8 GB VRAM.
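The discrete action tokenization described in the list above can be illustrated with a minimal sketch. It assumes uniform bins over a per-dimension range and places the 256 action tokens at the end of a 32,000-entry vocabulary. The function names, the example ranges, and the exact bin-to-token mapping are illustrative assumptions, not OpenVLA's actual code.

```python
# Sketch of 256-bin action tokenization (illustrative, not OpenVLA's code).
N_BINS = 256
VOCAB_SIZE = 32_000                       # Llama 2 tokenizer vocabulary size
FIRST_ACTION_TOKEN = VOCAB_SIZE - N_BINS  # assumption: action tokens occupy the last 256 ids


def action_to_tokens(action, low, high):
    """Map each continuous action dimension to one token id."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        # Scale to [0, 1] and clip values outside the assumed range.
        unit = min(max((value - lo) / (hi - lo), 0.0), 1.0)
        # Uniform binning; the upper edge falls into the last bin.
        bin_index = min(int(unit * N_BINS), N_BINS - 1)
        tokens.append(FIRST_ACTION_TOKEN + bin_index)
    return tokens


def tokens_to_action(tokens, low, high):
    """Decode token ids back to the centre of their bin."""
    action = []
    for token, lo, hi in zip(tokens, low, high):
        bin_index = token - FIRST_ACTION_TOKEN
        center = (bin_index + 0.5) / N_BINS
        action.append(lo + center * (hi - lo))
    return action


if __name__ == "__main__":
    low, high = [-1.0, -1.0, 0.0], [1.0, 1.0, 1.0]   # hypothetical per-dimension ranges
    tokens = action_to_tokens([0.25, -1.0, 1.0], low, high)
    print(tokens, tokens_to_action(tokens, low, high))
```

Decoding to the bin centre bounds the round-trip error at half a bin width per dimension, which is the resolution cost of treating actions as language tokens.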
The February 2025 OpenVLA-OFT update replaced discrete action tokens with a continuous regression head and added action chunking, achieving a 26× improvement in action-generation throughput and raising the average LIBERO success rate from 76.5% to 97.1% [1]. The accompanying PyTorch training pipeline (FSDP-based) is designed to scale from 1B to 34B parameters.
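To make the action-chunking idea concrete, here is a minimal sketch of a control loop in which each policy call returns several actions that are executed before the policy is queried again. The `dummy_policy` stand-in, the function names, and the chunk size of 8 are hypothetical and only show how chunking reduces the number of forward passes; they are not OpenVLA-OFT's implementation.

```python
# Sketch of action chunking: one forward pass yields several actions.
def dummy_policy(step, chunk_size):
    """Hypothetical stand-in for a model forward pass; returns a chunk of actions."""
    return [float(step + i) for i in range(chunk_size)]


def run_episode(policy, n_steps, chunk_size):
    """Execute n_steps control steps, querying the policy once per chunk."""
    executed = []
    forward_passes = 0
    while len(executed) < n_steps:
        chunk = policy(len(executed), chunk_size)  # one forward pass
        forward_passes += 1
        remaining = n_steps - len(executed)
        executed.extend(chunk[:remaining])         # drop actions past the episode end
    return executed, forward_passes


if __name__ == "__main__":
    for k in (1, 8):
        actions, passes = run_episode(dummy_policy, n_steps=20, chunk_size=k)
        print(f"chunk_size={k}: {len(actions)} actions from {passes} forward passes")
```

With one action per pass the model runs at every control step; with chunks the same episode needs roughly `n_steps / chunk_size` passes, at the cost of reacting to new observations less often.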
On HuggingFace, OpenVLA-7B records ~1.52M monthly downloads, about 36× the downloads of the #2 model (GR00T-N1.7-3B at 41.4K), making it by far the most-downloaded robot model on the platform [2].
2. Target Users & Pain Points
- Academic robotics researchers: Need a strong, open baseline to build upon — OpenVLA fills the gap that Google's closed RT-2 left. Most subsequent VLA papers benchmark against or build on top of OpenVLA.
- Robotics startups: Get a production-quality VLA model without paying for API access or building from scratch. The MIT license enables commercial use.
- Edge deployment engineers: The LoRA/QLoRA + quantization stack allows running VLA models on edge hardware — MiniVLA and other derivatives target this use case.
The core pain point: building a VLA from scratch requires GPU clusters and massive robot datasets that most labs can't access. OpenVLA democratizes the starting point.
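A back-of-envelope sketch shows why the quantization and LoRA points above matter for access. It uses the nominal 7B parameter count and counts weights only (no activations, optimizer state, or KV cache). The LoRA rank of 32 is an illustrative assumption; 4096 is the Llama 2 7B hidden size.

```python
# Rough weight-memory and LoRA-size estimates (weights only; illustrative).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, precision):
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    n_params = 7e9  # nominal model size from the text
    for precision in ("bf16", "int8", "int4"):
        print(f"{precision}: {weight_memory_gb(n_params, precision):.1f} GB")

    d = 4096   # Llama 2 7B hidden size
    rank = 32  # assumed rank, for illustration only
    fraction = lora_params(d, d, rank) / (d * d)
    print(f"LoRA trains {fraction:.2%} of a {d}x{d} projection")
```

Real requirements are higher than these figures because of activations and framework overhead, so they are best read as lower bounds when sizing hardware.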
3. Competitive Landscape
| Model | Type | Open Source | Scale |
|---|---|---|---|
| Google RT-2 / RT-2-X | Closed VLA | No | 55B params |
| NVIDIA GR00T N1 → N1.7 | Open VLA + WBC | Yes (weights, not code) | 3B |
| Physical Intelligence π0.5 | Closed VLA | No (HF hosted) | 3–4B |
| G0.5 (Galaxea) | Open VLA (unified autoregressive) | Yes | ~2B |
OpenVLA's differentiation is its combination of open-source completeness (code + weights + training pipeline, MIT license) and academic neutrality — it's not tied to any single robot manufacturer or cloud provider, making it the default baseline that subsequent models benchmark against.
4. Unique Observations
- 36× the downloads of GR00T. At 1.52M monthly HuggingFace downloads vs GR00T-N1.7-3B's 41.4K, OpenVLA leads robot model downloads by more than an order of magnitude [2]. Downloads are only a proxy for usage, but the gap suggests the developer community strongly prefers an MIT-licensed, fully open model over a corporate-open model tied to a specific hardware ecosystem.
- The academic-commercial flywheel. Because OpenVLA is the default baseline, every new VLA paper benchmarks against it → more citations and community trust → more downloads → more derivative work → OpenVLA remains the baseline. This is the same dynamic that made BERT and ResNet dominant in their respective eras.
- From lab to production. OpenVLA-OFT's 26× throughput gain and OpenVLA's quantization support move VLA inference closer to real-time operation on affordable hardware. If 30+ Hz VLA inference on a Jetson becomes routine, the barrier between research demo and deployed robot drops dramatically.
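The 30 Hz target in the last observation translates into a simple latency budget, sketched below. The sketch assumes actions are consumed at a fixed control rate and that each forward pass yields `chunk_size` actions; the chunk size of 8 is a hypothetical value, not a measured OpenVLA figure.

```python
# Latency budget for a target control rate (illustrative arithmetic).
def max_forward_latency_ms(control_hz, chunk_size=1):
    """Longest forward pass that still supplies actions at control_hz."""
    return 1000.0 * chunk_size / control_hz


def achieved_hz(forward_latency_ms, chunk_size=1):
    """Action rate sustained by a model with the given per-pass latency."""
    return 1000.0 * chunk_size / forward_latency_ms


if __name__ == "__main__":
    for k in (1, 8):
        budget = max_forward_latency_ms(30, chunk_size=k)
        print(f"chunk_size={k}: up to {budget:.1f} ms per forward pass for 30 Hz")
```

Under these assumptions a single-action policy has about 33 ms per forward pass at 30 Hz, while a chunked policy can spend proportionally longer per pass, which is why chunking and throughput gains matter for edge hardware.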
5. Financials / Funding
OpenVLA is a research project, not a commercial entity, so it has no funding rounds of its own.
- Funding: the contributing institutions (Stanford University, UC Berkeley, Google DeepMind, TRI, and MIT) fund the researchers.
- Maintenance: the project is MIT-licensed and community-maintained via GitHub (github.com/openvla/openvla) and HuggingFace.
6. People & Relationships
- Key authors: Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn
- Hosting: HuggingFace (openvla/openvla-7b), GitHub (openvla/openvla)
- Competitors / benchmarks against: Google RT-2, NVIDIA GR00T, Physical Intelligence, Galaxea G0.5
- Part of: Open X-Embodiment collaboration, LeRobot ecosystem