Best Robot Learning Datasets 2025
A curated guide to the top open-source datasets for imitation learning, VLA fine-tuning, and robot learning research.
From public data to a trainable signal
Top Datasets for Robot Learning
Choosing the right dataset depends on your robot, task, and model. Here are the most widely used datasets in 2025.
1. Open X-Embodiment
Pools more than 60 existing robot datasets, including BridgeData and DROID, into a unified RLDS format spanning 22 robot embodiments. Used to train the RT-X models and foundation models like OpenVLA and Octo. Best for: pre-training generalist policies. See Open X-Embodiment.
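To make "unified format" concrete, here is a minimal sketch of mapping per-dataset frames onto one shared episode layout. The field names loosely mirror RLDS step fields (observation, action, is_first, is_last), but the `Step` and `Episode` classes, the `to_episode` helper, and the frame keys are hypothetical illustrations, not the RLDS or Open X-Embodiment API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    # One timestep in the shared layout (names loosely follow RLDS).
    observation: dict[str, Any]
    action: list[float]
    is_first: bool = False
    is_last: bool = False


@dataclass
class Episode:
    # One trajectory, tagged with the dataset it came from.
    source_dataset: str
    steps: list[Step] = field(default_factory=list)


def to_episode(
    source_dataset: str,
    frames: list[dict[str, Any]],
    obs_map: dict[str, str],
    action_key: str,
) -> Episode:
    """Rename source-specific keys to shared names (hypothetical helper).

    obs_map maps a shared observation name to the key used by the source.
    """
    last = len(frames) - 1
    steps = [
        Step(
            observation={shared: frame[src] for shared, src in obs_map.items()},
            action=list(frame[action_key]),
            is_first=(i == 0),
            is_last=(i == last),
        )
        for i, frame in enumerate(frames)
    ]
    return Episode(source_dataset, steps)


# Two made-up sources that name their camera and action fields differently.
frames_a = [{"cam": "img0", "cmd": [0.1, 0.2]}, {"cam": "img1", "cmd": [0.0, 0.0]}]
frames_b = [{"rgb": "img0", "delta": [0.3, 0.1]}]

ep_a = to_episode("source_a", frames_a, {"image": "cam"}, "cmd")
ep_b = to_episode("source_b", frames_b, {"image": "rgb"}, "delta")
```

After conversion, both episodes expose the same `image` observation key and `action` field, which is what lets a single training loop consume data from different robots.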
2. DROID
Large-scale, diverse manipulation data collected on a single platform (Franka Panda arm): 76K trajectories, about 350 hours, across 564 scenes. Best for: scene and task generalization, foundation model training. See DROID.
3. BridgeData
WidowX 250 manipulation. BridgeData V2 contains about 60K trajectories across 24 environments and is widely used in research. Best for: single-arm manipulation, WidowX compatibility. See BridgeData.
4. ALOHA / Stanford Datasets
Bimanual teleoperation demonstrations from the ALOHA and Mobile ALOHA platforms, covering tabletop, kitchen, and mobile manipulation tasks. Best for: bimanual tasks, Mobile ALOHA. See ALOHA.
5. LeRobot
Community datasets hosted on the Hugging Face Hub in a shared format. Easy to add your own. Best for: quick experiments, sharing data. See LeRobot.
How to Choose
- Same robot as dataset? Use that dataset (e.g., WidowX → BridgeData).
- Different robot? Use Open X-Embodiment for multi-robot transfer, or DROID for broad scene and task diversity.
- Custom task? Collect your own or use our data services.
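The rules above can be sketched as a small lookup function. This is only an illustration: `recommend_dataset` and `SAME_ROBOT_DATASETS` are hypothetical names, only the WidowX pairing comes from this guide, and checking the custom-task case first is an assumption about how the rules combine.

```python
# Hypothetical mapping from robot platform to a dataset collected on it.
# Only the WidowX -> BridgeData pairing is taken from the guide.
SAME_ROBOT_DATASETS = {"widowx": "BridgeData"}


def recommend_dataset(robot: str, custom_task: bool = False) -> list[str]:
    """Return dataset suggestions following the guide's three rules."""
    # Assumption: a custom task overrides the robot-based rules.
    if custom_task:
        return ["collect your own data"]
    # Same robot as an existing dataset: use that dataset.
    match = SAME_ROBOT_DATASETS.get(robot.strip().lower())
    if match is not None:
        return [match]
    # Different robot: fall back to the broad multi-dataset options.
    return ["Open X-Embodiment", "DROID"]


print(recommend_dataset("WidowX"))
print(recommend_dataset("UR5"))
print(recommend_dataset("WidowX", custom_task=True))
```

Extending the mapping with further robot and dataset pairs keeps the decision logic unchanged.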
Full Catalog
See our complete Datasets catalog with links to all datasets, papers, and download pages.