LeRobot Dataset Collection
HuggingFace's standardized hub of 181+ robotics datasets, and the closest thing to an ImageNet for robot learning.
Key Stats
| Metric | Value |
|---|---|
| Total datasets | 181+ (and growing) |
| Robots covered | ALOHA, xArm, WidowX, Franka, UR5e, Unitree G1, Reachy, Koch, SO-100, and dozens more |
| Format | Standardized Parquet (tabular) + MP4 (video), streaming-friendly |
| Framework license | Apache 2.0 |
| Individual dataset licenses | Varies (mostly Apache 2.0; check each dataset card) |
| Training support | ACT, Diffusion Policy, VQ-BeT, TD-MPC built in |
What is LeRobot?
LeRobot is HuggingFace's answer to the fragmentation problem in robotics data. Before LeRobot, every lab used its own format -- HDF5, TFRecord, RLDS, raw ROS bags, custom CSVs. Training a model on data from multiple sources meant writing a different data loader for each one.
LeRobot standardizes everything into a single schema: Parquet tables for structured data (joint positions, actions, timestamps, episode boundaries) and MP4 files for video observations. As a result, any dataset in the collection loads with the same two lines of code, can be streamed from HuggingFace without downloading the full dataset, and plugs directly into the LeRobot training pipelines for ACT, Diffusion Policy, VQ-BeT, or TD-MPC.
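To make the tabular half of that schema concrete, here is a minimal sketch in plain Python of a frame table with one row per timestep, and of how episode boundaries can be recovered from it. The column names (`episode_index`, `frame_index`, `timestamp`, `action`), the sample values, and the helper `episode_boundaries` are assumptions for illustration and are not taken from the LeRobot source; real datasets store these rows in Parquet and are read through the library.

```python
from itertools import groupby

# Hypothetical frame rows, one per timestep, standing in for a Parquet table.
# Column names and values are illustrative assumptions.
rows = [
    {"episode_index": 0, "frame_index": 0, "timestamp": 0.00, "action": [0.10, 0.00]},
    {"episode_index": 0, "frame_index": 1, "timestamp": 0.02, "action": [0.12, 0.01]},
    {"episode_index": 0, "frame_index": 2, "timestamp": 0.04, "action": [0.15, 0.03]},
    {"episode_index": 1, "frame_index": 0, "timestamp": 0.00, "action": [0.00, 0.20]},
    {"episode_index": 1, "frame_index": 1, "timestamp": 0.02, "action": [0.02, 0.22]},
]


def episode_boundaries(rows):
    """Map each episode to its (start, end) row range, end exclusive.

    Assumes rows are already ordered so that each episode is contiguous.
    """
    bounds = {}
    start = 0
    for episode, group in groupby(rows, key=lambda row: row["episode_index"]):
        length = sum(1 for _ in group)
        bounds[episode] = (start, start + length)
        start += length
    return bounds


bounds = episode_boundaries(rows)
print(bounds)
```

Keeping every frame in one flat table and deriving episodes from an index column is what lets a single loader serve many datasets: the loader only needs to know the column names, not each lab's file layout.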
Key dataset families in the collection include:
- ALOHA suite: 20+ variants covering sim and real bimanual manipulation
- PushT: The canonical diffusion policy benchmark (state and image variants)
- xArm: Lift and push tasks with UFactory xArm
- RoboCasa: Kitchen manipulation in simulation
- DROID subset: 40M-row conversion of the DROID real-world corpus
- Community datasets: UTokyo, Columbia, Koch, SO-100, and many more
How to use
```shell
# Install LeRobot
pip install lerobot

# List all available datasets
python -m lerobot.scripts.list_datasets
```

```python
# Load a dataset (fetched from the HuggingFace Hub)
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/aloha_sim_transfer_cube_human")

# Access observations and actions
print(dataset[0])  # First frame: a dict with observation.images.*, observation.state, and action entries
```

```shell
# Train ACT on the dataset
python -m lerobot.scripts.train \
  --dataset.repo_id=lerobot/aloha_sim_transfer_cube_human \
  --policy.type=act
```
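Since each frame is a flat dictionary keyed by dotted names, it can help to group the entries before feeding them to a model. The following is a rough sketch in plain Python, with no LeRobot import. The `frame` dictionary, its key names (such as `observation.images.top`), and the helper `split_frame` are hypothetical stand-ins for illustration; print a real frame to see the keys a given dataset actually provides.

```python
# Hypothetical frame, shaped like the flat dictionary a dataset index might return.
# Key names and values are illustrative assumptions, not copied from a real dataset.
frame = {
    "observation.images.top": "<image tensor placeholder>",
    "observation.state": [0.0, 0.1, 0.2],
    "action": [0.0, 0.1, 0.25],
    "episode_index": 0,
    "frame_index": 0,
    "timestamp": 0.0,
}


def split_frame(frame):
    """Group a flat frame dict into images, other observations, actions, and metadata."""
    groups = {"images": {}, "observations": {}, "actions": {}, "metadata": {}}
    for key, value in frame.items():
        if key.startswith("observation.images"):
            groups["images"][key] = value
        elif key.startswith("observation."):
            groups["observations"][key] = value
        elif key == "action":
            groups["actions"][key] = value
        else:
            groups["metadata"][key] = value
    return groups


groups = split_frame(frame)
for name, entries in groups.items():
    print(name, sorted(entries))
```

Grouping by key prefix rather than by a hard-coded key list means the same helper keeps working for datasets with several cameras, as long as they follow the same naming pattern.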
Related datasets
- ALOHA Dataset -- the most popular family in the LeRobot collection
- Open X-Embodiment -- the larger cross-embodiment pretraining corpus
- DROID -- large-scale real-world data, subset available in LeRobot format
- RoboCasa -- kitchen simulation data in LeRobot format