Why a Standard Format Matters
Robot learning has historically suffered from every lab using a different data format — making it impossible to share datasets, combine data from different robots, or use pre-trained policies across systems. The LeRobot dataset format solves this by defining a single schema that works across all supported hardware. A dataset recorded on an SO-100 can be used to train a policy for an OpenArm without any conversion — as long as the action space dimensions match.
Understanding the format before you record means you will not discover a structural problem in your data during training. It also makes debugging much easier: when training fails, the first place to look is the dataset.
Dataset Structure: Parquet + MP4
Each LeRobot dataset lives in a directory with this structure:
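A typical layout is sketched below. Exact file and chunk names vary between LeRobot codebase versions; this follows the v2-style convention, so treat it as illustrative rather than exact:

```
my_dataset/
├── meta/
│   ├── info.json            # fps, feature names and shapes, codebase version
│   ├── episodes.jsonl       # per-episode metadata (length, task)
│   └── tasks.jsonl          # task_index → natural-language instruction
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet   # states, actions, timestamps
│       └── episode_000001.parquet
└── videos/
    └── chunk-000/
        └── observation.images.cam_high/
            ├── episode_000000.mp4
            └── episode_000001.mp4
```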
The split between Parquet (for numeric time-series) and MP4 (for video) is deliberate. Parquet compresses joint states and actions efficiently and supports fast random access by episode index. MP4 uses video codecs designed for image sequences, yielding 10–30x smaller files than storing raw images as tensors.
Key Fields in Each Episode
| Field | Shape | Description |
|---|---|---|
| observation.state | [T, D] | Joint positions (and optionally velocities) at each timestep. D is the number of motors (e.g., 6 for the SO-100: 5 arm joints + 1 gripper). |
| action | [T, D] | Target joint positions commanded at each timestep. Same dimensionality as observation.state. |
| timestamp | [T] | Time in seconds since the start of the episode, at 50Hz by default (0.02s per step). |
| episode_index | scalar | Integer index of this episode within the dataset. Used by the dataloader to group timesteps into episodes. |
| frame_index | [T] | Frame number within the episode (0 to T-1). Matches the frame number in the corresponding MP4. |
| next.done | [T] | Boolean flag — True at the last timestep of an episode. Used to signal episode boundaries during training. |
| task_index | scalar | Index into tasks.jsonl. Enables multi-task datasets where different episodes correspond to different instructions. |
Camera frames are not stored in the Parquet file itself. Instead, an image field such as observation.images.cam_high is stored as a path reference (frame index + episode index) into the corresponding MP4 rather than raw pixel data. The LeRobot dataloader handles the decode and sync transparently.
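The field shapes above can be sketched with a toy episode in Python. The numbers here are synthetic and purely illustrative; a real episode is read from the Parquet file:

```python
import numpy as np

T, D = 100, 6          # 100 timesteps, 6-dim state/action (illustrative)
fps = 50               # 50 Hz → 0.02 s per step

episode = {
    "observation.state": np.zeros((T, D), dtype=np.float32),  # joint positions
    "action": np.zeros((T, D), dtype=np.float32),             # commanded targets
    "timestamp": np.arange(T, dtype=np.float32) / fps,        # seconds since episode start
    "frame_index": np.arange(T),                              # matches MP4 frame numbers
    "next.done": np.zeros(T, dtype=bool),
    "episode_index": 0,
    "task_index": 0,
}
episode["next.done"][-1] = True  # flag the final timestep

assert episode["observation.state"].shape == episode["action"].shape
print(episode["timestamp"][:3])  # first three timestamps: 0.0, 0.02, 0.04 s
```

Note how `observation.state` and `action` share the same [T, D] shape: the policy learns to map the former to the latter.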
Load and Visualize an Existing Dataset
Load the lerobot-raw/aloha_sim_insertion_scripted dataset from HuggingFace Hub and visualize 3 episodes. This dataset contains scripted demonstrations of a bimanual robot inserting a peg — a clean example of what a well-structured dataset looks like.
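One way to do this is with LeRobot's bundled HTML visualizer script. The script name and flags below reflect recent LeRobot releases and may differ in yours; check the `scripts/` directory of your LeRobot checkout if the command is not found:

```shell
# Download the dataset from the Hub and render an HTML page
# for the first 3 episodes
python lerobot/scripts/visualize_dataset_html.py \
  --repo-id lerobot-raw/aloha_sim_insertion_scripted \
  --episodes 0 1 2
```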
The visualizer generates an HTML page with video playback of each episode alongside synchronized joint state plots. Look for:
- Smooth joint trajectories — sharp spikes indicate recording artifacts or arm crashes
- Consistent episode length — episodes whose lengths vary wildly (e.g., 50 vs 400 frames) often mean that some demonstrations captured partial or aborted motions
- Gripper state changes — the last joint dimension should show clear binary transitions (open → close → open) for manipulation tasks
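These checks are easy to script once an episode is loaded. Below is a minimal sketch on synthetic arrays; in practice `state` would come from the `observation.state` column of a Parquet file, and `spike_thresh` is a placeholder you would tune for your robot:

```python
import numpy as np

def sanity_check(state: np.ndarray, fps: float = 50.0, spike_thresh: float = 1.0):
    """Flag recording artifacts in a [T, D] joint-state array.

    spike_thresh is joint-position change per second; 1.0 is a placeholder —
    a sensible value depends on your robot's units and speed.
    """
    # Exclude the gripper (last dim), which legitimately changes step-wise
    velocity = np.diff(state[:, :-1], axis=0) * fps   # per-step change → per-second
    spikes = np.abs(velocity).max(axis=1) > spike_thresh
    gripper = state[:, -1]
    transitions = np.count_nonzero(np.abs(np.diff(gripper)) > 0.5)
    return int(spikes.sum()), transitions

# Synthetic episode: smooth motion plus one open → close → open gripper cycle
T = 200
state = np.zeros((T, 6), dtype=np.float32)
state[:, 0] = np.linspace(0.0, 0.5, T)   # slow, smooth joint sweep
state[80:120, -1] = 1.0                  # gripper closed for 40 frames

n_spikes, n_grip = sanity_check(state)
print(n_spikes, n_grip)  # 0 spikes, 2 gripper transitions
```

Running this over every episode before training catches the artifacts above in seconds rather than after a failed training run.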
Explore the SVRC Dataset Collection
The SVRC dataset library includes curated robot learning datasets in LeRobot format. Browse them to understand what different tasks and hardware look like before recording your own. Browse datasets →
Unit 2 Complete When...
- You have successfully visualized 3 episodes from lerobot-raw/aloha_sim_insertion_scripted and the HTML output opens in your browser.
- You can identify the observation.state, action, and timestamp fields in a Parquet file loaded with Python.
- You understand the difference between what is stored in Parquet vs MP4.
- You are ready to record your own dataset in Unit 3.