Why this comparison matters
"RT-X" is a family, not a single model, and that confusion kills a lot of procurement conversations. RT-1-X is the ~35M-parameter transformer policy the Open X-Embodiment authors trained and open-sourced to demonstrate cross-embodiment transfer. RT-2-X is the 55B PaLI-X-based VLA that showed large multimodal models can do robot control zero-shot. RT-2-X was never released, and later DeepMind models (Gemini Robotics, RT-H) stayed inside Google. If you are evaluating "RT-X" as a base for your own system, you are really evaluating which RT-X checkpoint is available — usually just RT-1-X — against OpenVLA, which was published end-to-end with weights, code, and adapters.
For a broader view, see our VLA model directory, the RT-X page, or the OpenVLA page.
At-a-glance comparison
| Dimension | RT-X family | OpenVLA |
|---|---|---|
| Parameters | RT-1-X ~35M · RT-2-X 55B (PaLI-X based) | 7B |
| Backbone | RT-1-X: EfficientNet + token learner + transformer. RT-2-X: PaLI-X multimodal LLM | Llama-2 7B + DINOv2 + SigLIP |
| Action head | Discretized action tokens (both variants) | Discretized action tokens |
| Training data | Open X-Embodiment (22 embodiments, ~970K traj) | Open X-Embodiment (~970K traj) |
| Language conditioning | Strong on RT-2-X, moderate on RT-1-X | Strong (LLM-native) |
| Action space | End-effector delta, configurable dim | 7-DOF end-effector delta (extensible) |
| Inference hardware | RT-1-X: single GPU. RT-2-X: multi-GPU inference cluster | Single A100/H100/L40S bf16 |
| Weights available? | RT-1-X: yes. RT-2-X: no (proprietary) | Yes, on Hugging Face |
| License | RT-1-X: Apache 2.0. RT-2-X: closed | MIT |
| Fine-tuning | RT-1-X fine-tune supported. RT-2-X: no external access | LoRA, QLoRA, and full fine-tune recipes published |
| Paper | Open X-Embodiment paper (2023), RT-2 (arXiv:2307.15818) | Kim et al., CoRL 2024 (arXiv:2406.09246) |
| Code | github.com/google-deepmind/open_x_embodiment | github.com/openvla/openvla |
RT-X: the family that started it all
RT-1-X — the available one
RT-1-X is the Open X-Embodiment re-training of Google's original RT-1 architecture: an EfficientNet visual backbone, a FiLM language conditioner, a token learner, and a transformer that outputs discretized actions. It is small (~35M parameters), fast, and, critically, the weights are publicly available under Apache 2.0. If your use case is manipulation with a Franka or Google Robot arm and you want a lightweight baseline, RT-1-X remains a credible starting point even in 2026.
RT-2-X — the one you cannot have
RT-2-X is a PaLI-X-based 55B VLA that Google trained on the same Open X-Embodiment data. It established the thesis that big VLMs can do robot control, but the weights have never been released. If you read "RT-X performance" in a blog post, check whether they mean the 35M RT-1-X or the 55B RT-2-X — the gap between the two is enormous. RT-2-X is a published reference, not a model you can deploy.
OpenVLA: the open answer
OpenVLA was built explicitly to close the RT-2-X gap for the open-source community. It is 7B parameters: large enough to capture language priors, small enough to serve on one high-end GPU. The OpenVLA paper reports that it outperforms RT-2-X in absolute success rate across 29 evaluation tasks on the BridgeData V2 (WidowX) and Google Robot platforms, despite using 7× fewer parameters. It uses a Llama-2 7B LLM with DINOv2 + SigLIP vision encoders, discretizes actions into 256 bins per dimension, and is trained on the same Open X-Embodiment corpus RT-X saw.
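To make the "256 bins per dimension" scheme concrete, here is a minimal sketch of uniform action discretization. It assumes actions are normalized to [-1, 1] per dimension and binned uniformly; the actual OpenVLA code computes per-dimension normalization statistics from the training data, so treat this as illustrative, not the repository's implementation.

```python
import numpy as np

N_BINS = 256  # bins per action dimension, as described in the paper

def discretize(action: np.ndarray) -> np.ndarray:
    """Map continuous actions in [-1, 1] to integer bin indices in [0, 255]."""
    clipped = np.clip(action, -1.0, 1.0)
    # Scale [-1, 1] -> [0, N_BINS - 1] and round to the nearest bin.
    return np.round((clipped + 1.0) / 2.0 * (N_BINS - 1)).astype(np.int64)

def undiscretize(tokens: np.ndarray) -> np.ndarray:
    """Map bin indices back to the corresponding continuous value."""
    return tokens.astype(np.float64) / (N_BINS - 1) * 2.0 - 1.0

action = np.array([0.1, -0.5, 0.0, 0.25, -1.0, 1.0, 0.9])  # 7-DOF delta
tokens = discretize(action)
recovered = undiscretize(tokens)
# Round-trip error is bounded by half a bin width (~0.004).
assert np.max(np.abs(recovered - action)) <= 1.0 / (N_BINS - 1)
```

The round-trip bound is why 256 bins is usually enough for end-effector deltas: the quantization error is far below typical controller tracking error.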
More importantly for builders: OpenVLA ships with a LoRA fine-tuning recipe that lets a team adapt the policy to a new robot on a single 80 GB A100 in hours rather than days. That deployment path simply does not exist for RT-2-X.
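The reason LoRA fits on a single A100 is a parameter-count argument, which a from-scratch numpy sketch makes plain. This is the low-rank adaptation idea behind the recipe, not the openvla repository's actual code; the layer sizes, rank, and alpha below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 1024, 1024, 16, 32

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, init 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen path plus scaled low-rank update (alpha/r) * B @ A."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

# Because B starts at zero, the adapted layer initially matches the frozen
# layer exactly, so fine-tuning starts from the pretrained policy.
x = rng.standard_normal((2, d_in))
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable parameters drop from d_out * d_in to rank * (d_in + d_out).
full = d_out * d_in
lora = rank * (d_in + d_out)
print(f"trainable fraction: {lora / full:.4%}")
```

Only A and B receive gradients; the 7B backbone stays frozen in bf16, which is what keeps optimizer state small enough for one 80 GB card.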
The licensing and access question
This is usually the decisive factor. RT-1-X under Apache 2.0 is commercial-grade but old and small. RT-2-X is not licensable at all. OpenVLA under MIT is the only way to get RT-2-class behavior in a commercially usable package. Teams shipping real robots in 2026 — especially warehouse deployments or lab automation installs that need explicit rights — almost always land on OpenVLA or one of its derivatives.
Hardware footprint
RT-1-X runs on a single GPU at real-time rates and was demonstrated on a range of robots from Franka to Google Robot. It is the lightweight option. OpenVLA needs ~16 GB of VRAM in bfloat16 and typically runs at 5–10 Hz on an A100 without quantization — which is fine for most manipulation but tight for dexterous control. Teams often pair OpenVLA with action chunking and a lower-level impedance controller to hit the required loop rate. RT-2-X, if it were available, would need a multi-GPU inference server — another reason open-source deployments gravitated toward 7B-class models instead.
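The chunking-plus-low-level-controller pairing described above can be sketched with back-of-envelope numbers. All rates here are assumed for illustration, not measured: a ~6 Hz VLA feeding a 60 Hz impedance loop.

```python
POLICY_HZ = 6    # VLA forward passes per second (assumed)
CONTROL_HZ = 60  # low-level impedance controller rate (assumed)
CHUNK = CONTROL_HZ // POLICY_HZ  # actions the policy must emit per inference

def ticks_served(seconds: int) -> int:
    """Count control ticks covered when each policy call yields CHUNK actions."""
    served = 0
    for _ in range(seconds * POLICY_HZ):  # each policy inference...
        served += CHUNK                   # ...covers CHUNK control ticks
    return served

# One second of wall clock: 6 inferences x 10 actions = 60 ticks, so the
# 60 Hz loop never starves despite the 6 Hz policy.
assert CHUNK == 10
assert ticks_served(1) == CONTROL_HZ
```

The cost is latency: the controller is always executing actions predicted up to one chunk in the past, which is acceptable for pick-and-place but is exactly why chunked VLAs feel "tight" for dexterous control.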
When RT-1-X still makes sense
- You want the smallest, fastest Open-X baseline and do not need strong language understanding.
- You are comparing cross-embodiment transfer in a research paper and want a historically grounded baseline.
- You are running on edge hardware where a 7B model is not viable.
When OpenVLA is the obvious pick
- You want RT-2-class language behavior with weights you can actually download.
- You need a commercial license path (MIT).
- You want an active community shipping LoRA adapters, quantized variants, and deployment recipes.
- You plan to fine-tune on your own teleop data — OpenVLA's documentation is materially better than RT-X's for downstream use.
Honest tradeoffs
OpenVLA is not strictly "better" than RT-2-X — DeepMind has continued to push internal models well past RT-2-X, and Google's production robotics stack uses closed descendants. What OpenVLA is, unambiguously, is the best open-weight foundation model in the RT-X lineage. If you need SOTA-at-any-cost and can partner with DeepMind, that is a different conversation. If you are building a product, OpenVLA's MIT weights plus a Supabase-hosted data pipeline plus teleoperation data collection is a realistic stack today.
Benchmarks and evaluation
Published numbers for the two families are not directly comparable suite-for-suite. OpenVLA's LIBERO and BridgeData V2 numbers are in the paper and reproducible; RT-X numbers come largely from the Open X-Embodiment paper's real-robot evaluations and vary by variant, so be careful which RT-X row you are quoting. See our benchmarks directory for current suites.
When reading a benchmark claim involving RT-X, always ask three questions: which RT-X variant (RT-1-X or RT-2-X), which embodiment subset of Open X-Embodiment was used for evaluation, and whether the run used the original lab hardware (the Google Robot, or the Bridge setup's WidowX) or an external reproduction. The numbers can swing 20 percentage points depending on these choices. OpenVLA evaluations are generally simpler to interpret because the paper is explicit about checkpoint, protocol, and dataset splits.
Fine-tuning and deployment recipes
Downstream teams working with RT-1-X typically fine-tune on narrower task sets within the Open X-Embodiment umbrella — selecting only the trajectories relevant to their target embodiment, then running a short training loop to bring the policy up on their specific hardware. The RT-1-X training code is mature but sparse on production guidance, and most of the downstream knowledge lives in papers rather than recipes.
OpenVLA's fine-tuning story is considerably richer. The official repository ships a LoRA fine-tune example, a full fine-tune example, a dataset conversion tool for custom RLDS data, and reference configurations for several popular robots. The community has added 4-bit quantization, vLLM serving, and integration with LeRobot. For a team that needs to move from "we have teleop data" to "we have a policy running on our robot" in under a month, OpenVLA's tooling is the decisive advantage.
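For teams sitting on raw teleop logs, the conversion target is an RLDS-style episode layout. The sketch below is a hypothetical illustration of that layout built with plain dicts; the field names follow common RLDS conventions, but the exact schema expected by OpenVLA's conversion tooling is dataset-specific, so check the repository before committing to one.

```python
import numpy as np

def make_step(image, instruction, action, is_first=False, is_last=False):
    """One RLDS-style step: observation, action, and episode-boundary flags."""
    return {
        "observation": {
            "image": image,                       # H x W x 3 uint8 camera frame
            "language_instruction": instruction,  # task string
        },
        "action": np.asarray(action, dtype=np.float32),  # 7-DOF EE delta
        "is_first": is_first,
        "is_last": is_last,
        "is_terminal": is_last,
    }

# A toy 3-step episode; real teleop logs would carry actual frames and deltas.
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(3)]
episode = [
    make_step(frames[i], "pick up the red block",
              action=[0.0] * 6 + [1.0],
              is_first=(i == 0), is_last=(i == 2))
    for i in range(3)
]

assert episode[0]["is_first"] and episode[-1]["is_terminal"]
assert episode[1]["action"].shape == (7,)
```

Getting the boundary flags and per-dimension action conventions right up front is most of the conversion work; the training loop itself is the easy part.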
What DeepMind is shipping next
Google DeepMind has continued to iterate past RT-2-X internally. Gemini Robotics and related projects extend the foundation-model-for-robotics thesis with larger VLMs and tighter coupling to production Google robots. None of these checkpoints have been made available for external deployment, so from a builder's perspective they are worth reading about but not worth building on. OpenVLA, pi0, and the LeRobot-ecosystem models are the realistic forward path for open-source work.
Our recommendation
For almost every practical team today, OpenVLA is the answer. It inherits the RT-X lineage intellectually, matches or beats RT-2-X on public benchmarks, ships with open weights under MIT, and has an active fine-tuning ecosystem. RT-1-X remains a fine tiny baseline, but you are more likely to use Octo or a small Diffusion Policy for that role in 2026. RT-2-X is a paper reference — cite it, do not build on it.