LeRobot is HuggingFace's open-source robotics framework that provides a standardized format for robot learning datasets (Parquet + MP4), along with training scripts for ACT, Diffusion Policy, and VQ-BeT. It hosts 181+ datasets covering manipulation, locomotion, and navigation tasks across dozens of robot platforms.

What license do LeRobot datasets use?

The LeRobot framework itself is Apache 2.0 licensed. Individual dataset licenses vary -- most ALOHA, PushT, xArm, and RoboCasa datasets are Apache 2.0, while converted versions of third-party datasets (like DROID) retain their original licenses. Always check the dataset card on HuggingFace for the specific license.

LeRobot Dataset Collection

HuggingFace's standardized hub of 181+ robotics datasets. The closest thing to an ImageNet for robot learning.

Apache 2.0 (framework) | Parquet + MP4 | PyTorch Native

Key Stats

Metric	Value
Total datasets	181+ (and growing)
Robots covered	ALOHA, xArm, WidowX, Franka, UR5e, Unitree G1, Reachy, Koch, SO-100, and dozens more
Format	Standardized Parquet (tabular) + MP4 (video), streaming-friendly
Framework license	Apache 2.0
Individual dataset licenses	Varies (mostly Apache 2.0; check each dataset card)
Training support	ACT, Diffusion Policy, VQ-BeT, TD-MPC built in

What is LeRobot?

LeRobot is HuggingFace's answer to the fragmentation problem in robotics data. Before LeRobot, every lab used its own format -- HDF5, TFRecord, RLDS, raw ROS bags, custom CSVs. Training a model on data from multiple sources meant writing a different data loader for each one.

LeRobot standardizes everything into a single schema: Parquet tables for structured data (joint positions, actions, timestamps, episode boundaries) and MP4 files for video observations. This means you can load any dataset in the collection with the same two lines of code, stream it from HuggingFace without downloading the full dataset, and plug it directly into the LeRobot training pipelines for ACT, Diffusion Policy, or VQ-BeT.
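To make the tabular half of the schema concrete, here is a minimal sketch in plain Python of how per-frame rows can carry episode boundaries. The rows are a toy stand-in for a Parquet table, and the column names (`episode_index`, `frame_index`, `timestamp`, `action`) and the helper `episode_ranges` are illustrative assumptions rather than LeRobot's API; check a real dataset's columns before relying on them.

```python
from itertools import groupby
from operator import itemgetter

# Toy stand-in for a per-frame table: one row per timestep.
# Column names are assumed for illustration only.
frames = [
    {"episode_index": 0, "frame_index": 0, "timestamp": 0.00, "action": [0.1, 0.0]},
    {"episode_index": 0, "frame_index": 1, "timestamp": 0.02, "action": [0.2, 0.0]},
    {"episode_index": 0, "frame_index": 2, "timestamp": 0.04, "action": [0.2, 0.1]},
    {"episode_index": 1, "frame_index": 0, "timestamp": 0.00, "action": [0.0, 0.3]},
    {"episode_index": 1, "frame_index": 1, "timestamp": 0.02, "action": [0.1, 0.3]},
]


def episode_ranges(rows):
    """Map each episode to its (start, end) row range, end exclusive.

    Assumes rows belonging to one episode are stored contiguously.
    """
    ranges = {}
    start = 0
    for episode, group in groupby(rows, key=itemgetter("episode_index")):
        length = sum(1 for _ in group)
        ranges[episode] = (start, start + length)
        start += length
    return ranges


ranges = episode_ranges(frames)

# Slice out every action belonging to episode 1.
first, last = ranges[1]
episode_1_actions = [row["action"] for row in frames[first:last]]
```

Keeping one flat table with an episode column, instead of one file per episode, is what lets a loader index any frame directly while still recovering whole trajectories when a policy needs them.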

Key dataset families in the collection include:

ALOHA suite: 20+ variants covering sim and real bimanual manipulation
PushT: The canonical diffusion policy benchmark (state and image variants)
xArm: Lift and push tasks with UFactory xArm
RoboCasa: Kitchen manipulation in simulation
DROID subset: 40M-row conversion of the DROID real-world corpus
Community datasets: UTokyo, Columbia, Koch, SO-100, and many more

How to use

# Install LeRobot (shell)
pip install lerobot

# List all available datasets (shell)
python -m lerobot.scripts.list_datasets

# Load a dataset (Python); the data is pulled from the HuggingFace Hub
# Note: the import path may differ between LeRobot releases
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
dataset = LeRobotDataset("lerobot/aloha_sim_transfer_cube_human")

# Access observations and actions (Python)
print(dataset[0])  # First frame: observation.images, observation.state, action

# Train ACT on the dataset (shell)
python -m lerobot.scripts.train \
  --dataset.repo_id=lerobot/aloha_sim_transfer_cube_human \
  --policy.type=act
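
The frame printed above is a dictionary keyed by dotted names. As a minimal sketch, the hypothetical helper `split_frame` below separates camera, state, and action entries by key prefix. The toy `frame` uses plain lists and strings in place of tensors, and the exact keys (for example `observation.images.top`) are assumptions that vary by dataset.

```python
def split_frame(frame):
    """Group a frame's entries into images, state, and action by key prefix."""
    images = {k: v for k, v in frame.items() if k.startswith("observation.images")}
    state = frame.get("observation.state")
    action = frame.get("action")
    return images, state, action


# Toy frame: real frames hold tensors; the key names here are assumed.
frame = {
    "observation.images.top": "<image tensor placeholder>",
    "observation.state": [0.0, 0.1, 0.2],
    "action": [0.0, 0.1, 0.25],
    "episode_index": 0,
}

images, state, action = split_frame(frame)
print(sorted(images))           # camera keys found in this frame
print(len(state), len(action))  # dimensionality of state and action
```

Matching on the `observation.images` prefix, instead of hard-coding one camera name, keeps the same helper usable on datasets with a different number of cameras.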

Access

Browse on HuggingFace | GitHub Repository | Paper (arXiv)

Related datasets

ALOHA Dataset -- the most popular family in the LeRobot collection
Open X-Embodiment -- the larger cross-embodiment pretraining corpus
DROID -- large-scale real-world data, subset available in LeRobot format
RoboCasa -- kitchen simulation data in LeRobot format

Contribute your data to the ecosystem

We help robotics teams package their demonstration data into LeRobot format and publish it to HuggingFace or our marketplace.
