LeRobot Dataset Collection

HuggingFace's standardized hub of 181+ robotics datasets. The closest thing to an ImageNet for robot learning.

Apache 2.0 (framework) · Parquet + MP4 · PyTorch Native

Key Stats

Metric | Value
Total datasets | 181+ (and growing)
Robots covered | ALOHA, xArm, WidowX, Franka, UR5e, Unitree G1, Reachy, Koch, SO-100, and dozens more
Format | Standardized Parquet (tabular) + MP4 (video), streaming-friendly
Framework license | Apache 2.0
Individual dataset licenses | Varies (mostly Apache 2.0; check each dataset card)
Training support | ACT, Diffusion Policy, VQ-BeT, TD-MPC built in

What is LeRobot?

LeRobot is HuggingFace's answer to the fragmentation problem in robotics data. Before LeRobot, every lab used its own format -- HDF5, TFRecord, RLDS, raw ROS bags, custom CSVs. Training a model on data from multiple sources meant writing a different data loader for each one.

LeRobot standardizes everything into a single schema: Parquet tables for structured data (joint positions, actions, timestamps, episode boundaries) and MP4 files for video observations. This means you can load any dataset in the collection with the same two lines of code, stream it from HuggingFace without downloading the full dataset, and plug it directly into the LeRobot training pipelines for ACT, Diffusion Policy, or VQ-BeT.
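To make the flat-table idea concrete, here is a minimal sketch in plain Python (no LeRobot install needed) of how episode boundaries can be recovered from a table with one row per timestep. The rows and the column names (`episode_index`, `frame_index`, `timestamp`, `observation.state`, `action`) are illustrative assumptions, and the helper `episode_boundaries` is hypothetical, not part of the LeRobot API. It also assumes the rows of each episode are stored contiguously.

```python
# Hypothetical rows mimicking a flat frame table: one row per timestep.
# Column names and values are assumptions made up for illustration.
rows = [
    {"episode_index": 0, "frame_index": 0, "timestamp": 0.00, "observation.state": [0.0, 0.1], "action": [0.1, 0.1]},
    {"episode_index": 0, "frame_index": 1, "timestamp": 0.02, "observation.state": [0.1, 0.2], "action": [0.2, 0.1]},
    {"episode_index": 0, "frame_index": 2, "timestamp": 0.04, "observation.state": [0.3, 0.3], "action": [0.0, 0.0]},
    {"episode_index": 1, "frame_index": 0, "timestamp": 0.00, "observation.state": [0.5, 0.5], "action": [0.1, 0.0]},
    {"episode_index": 1, "frame_index": 1, "timestamp": 0.02, "observation.state": [0.6, 0.5], "action": [0.0, 0.0]},
]


def episode_boundaries(rows):
    """Map each episode index to its (first_row, last_row_exclusive) range.

    Assumes all rows of an episode are contiguous in the table.
    """
    bounds = {}
    for i, row in enumerate(rows):
        ep = row["episode_index"]
        start, _ = bounds.get(ep, (i, i))  # first sighting fixes the start
        bounds[ep] = (start, i + 1)        # every sighting extends the end
    return bounds


bounds = episode_boundaries(rows)
for ep, (start, stop) in bounds.items():
    print(f"episode {ep}: rows {start}..{stop - 1} ({stop - start} frames)")
```

Because every frame carries its own episode index, a single flat table can hold many episodes without a separate file per demonstration, which is what lets one loader serve every dataset.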

Key dataset families in the collection include:

  • ALOHA suite: 20+ variants covering sim and real bimanual manipulation
  • PushT: The canonical diffusion policy benchmark (state and image variants)
  • xArm: Lift and push tasks with UFactory xArm
  • RoboCasa: Kitchen manipulation in simulation
  • DROID subset: 40M-row conversion of the DROID real-world corpus
  • Community datasets: UTokyo, Columbia, Koch, SO-100, and many more

How to use

# Install LeRobot (shell)
pip install lerobot

# List the datasets registered in the library (shell)
python -c "import lerobot; print(lerobot.available_datasets)"

# Load a dataset in Python (fetched from the HuggingFace Hub and cached locally on first use)
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
dataset = LeRobotDataset("lerobot/aloha_sim_transfer_cube_human")

# Access observations and actions
print(dataset[0])  # First frame: observation.images, observation.state, action

# Train ACT on the dataset (shell)
python -m lerobot.scripts.train \
  --dataset.repo_id=lerobot/aloha_sim_transfer_cube_human \
  --policy.type=act
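Each frame returned by indexing the dataset is a dictionary keyed by names such as `observation.state` and `action`. As a rough sketch of how such a frame might be split into policy inputs and targets, the snippet below uses a hand-written stand-in frame. The helper `split_frame`, the key `observation.images.top`, and all values are assumptions for illustration; real frames hold tensors rather than lists, and the exact keys depend on the dataset.

```python
def split_frame(frame):
    """Separate observation entries from the action in one frame dictionary."""
    observations = {
        key: value for key, value in frame.items() if key.startswith("observation.")
    }
    action = frame.get("action")  # None if the frame carries no action
    return observations, action


# Hypothetical stand-in for dataset[0]; real frames hold tensors, not lists.
mock_frame = {
    "observation.images.top": [[0, 0], [0, 0]],
    "observation.state": [0.0, 0.1],
    "action": [0.1, 0.2],
    "episode_index": 0,
}

observations, action = split_frame(mock_frame)
print(sorted(observations))
print(action)
```

Filtering on the `observation.` prefix keeps the helper independent of how many cameras or state vectors a given robot records.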

Related datasets

  • ALOHA Dataset -- the most popular family in the LeRobot collection
  • Open X-Embodiment -- the larger cross-embodiment pretraining corpus
  • DROID -- large-scale real-world data, subset available in LeRobot format
  • RoboCasa -- kitchen simulation data in LeRobot format

Contribute your data to the ecosystem

We help robotics teams package their demonstration data into LeRobot format and publish it to HuggingFace or our marketplace.