Why a Standard Format Matters
Robot learning has historically suffered from every lab using a different data format — making it impossible to share datasets, combine data from different robots, or use pre-trained policies across systems. The LeRobot dataset format solves this by defining a single schema that works across all supported hardware. A dataset recorded on an SO-100 can be used to train a policy for an OpenArm without any conversion — as long as the action space dimensions match.
Understanding the format before you record means you will not discover a structural problem in your data during training. It also makes debugging much easier: when training fails, the first place to look is the dataset.
Dataset Structure: Parquet + MP4
Each LeRobot dataset lives in a directory with this structure:
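A typical layout is sketched below. Exact file and chunk names vary between LeRobot codebase versions; this follows the v2-style convention, so treat it as illustrative rather than exact:

```
my_dataset/
├── meta/
│   ├── info.json            # fps, feature names and shapes, codebase version
│   ├── episodes.jsonl       # per-episode metadata (length, task)
│   └── tasks.jsonl          # task_index → natural-language instruction
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet   # states, actions, timestamps
│       └── episode_000001.parquet
└── videos/
    └── chunk-000/
        └── observation.images.cam_high/
            ├── episode_000000.mp4
            └── episode_000001.mp4
```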
The split between Parquet (for numeric time-series) and MP4 (for video) is deliberate. Parquet compresses joint states and actions efficiently and supports fast random access by episode index. MP4 uses video codecs designed for image sequences, yielding 10–30x smaller files than storing raw images as tensors.
Key Fields in Each Episode
| Field | Shape | Description |
|---|---|---|
| observation.state | [T, D] | Joint positions (and optionally velocities) at each timestep. D is the number of motors (e.g., 6 for the SO-100: 5 arm joints + 1 gripper). |
| action | [T, D] | Target joint positions commanded at each timestep. Same dimensionality as observation.state. |
| timestamp | [T] | Time in seconds since the start of the episode, at 50Hz by default (0.02s per step). |
| episode_index | scalar | Integer index of this episode within the dataset. Used by the dataloader to group timesteps into episodes. |
| frame_index | [T] | Frame number within the episode (0 to T-1). Matches the frame number in the corresponding MP4. |
| next.done | [T] | Boolean flag — True at the last timestep of an episode. Used to signal episode boundaries during training. |
| task_index | scalar | Index into tasks.jsonl. Enables multi-task datasets where different episodes correspond to different instructions. |
Camera frames are not stored in the Parquet file itself. Instead, an image field such as observation.images.cam_high is stored as a path reference (frame index + episode index) into the corresponding MP4 rather than raw pixel data. The LeRobot dataloader handles the decode and sync transparently.
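The field shapes above can be sketched with a toy episode in Python. The numbers here are synthetic and purely illustrative; a real episode is read from the Parquet file:

```python
import numpy as np

T, D = 100, 6          # 100 timesteps, 6-dim state/action (illustrative)
fps = 50               # 50 Hz → 0.02 s per step

episode = {
    "observation.state": np.zeros((T, D), dtype=np.float32),  # joint positions
    "action": np.zeros((T, D), dtype=np.float32),             # commanded targets
    "timestamp": np.arange(T, dtype=np.float32) / fps,        # seconds since episode start
    "frame_index": np.arange(T),                              # matches MP4 frame numbers
    "next.done": np.zeros(T, dtype=bool),
    "episode_index": 0,
    "task_index": 0,
}
episode["next.done"][-1] = True  # flag the final timestep

assert episode["observation.state"].shape == episode["action"].shape
print(episode["timestamp"][:3])  # first three timestamps: 0.0, 0.02, 0.04 s
```

Note how `observation.state` and `action` share the same [T, D] shape: the policy learns to map the former to the latter.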
Load and Visualize an Existing Dataset
Load the lerobot-raw/aloha_sim_insertion_scripted dataset from HuggingFace Hub and visualize 3 episodes. This dataset contains scripted demonstrations of a bimanual robot inserting a peg — a clean example of what a well-structured dataset looks like.
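One way to do this is with LeRobot's bundled HTML visualizer script. The script name and flags below reflect recent LeRobot releases and may differ in yours; check the `scripts/` directory of your LeRobot checkout if the command is not found:

```shell
# Download the dataset from the Hub and render an HTML page
# for the first 3 episodes
python lerobot/scripts/visualize_dataset_html.py \
  --repo-id lerobot-raw/aloha_sim_insertion_scripted \
  --episodes 0 1 2
```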
The visualizer generates an HTML page with video playback of each episode alongside synchronized joint state plots. Look for:
- Smooth joint trajectories — sharp spikes indicate recording artifacts or arm crashes
- Consistent episode length — episodes whose lengths vary wildly (e.g., 50 vs 400 frames) often mean that some demonstrations captured partial or aborted motions
- Gripper state changes — the last joint dimension should show clear binary transitions (open → close → open) for manipulation tasks
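These checks are easy to script once an episode is loaded. Below is a minimal sketch on synthetic arrays; in practice `state` would come from the `observation.state` column of a Parquet file, and `spike_thresh` is a placeholder you would tune for your robot:

```python
import numpy as np

def sanity_check(state: np.ndarray, fps: float = 50.0, spike_thresh: float = 1.0):
    """Flag recording artifacts in a [T, D] joint-state array.

    spike_thresh is joint-position change per second; 1.0 is a placeholder —
    a sensible value depends on your robot's units and speed.
    """
    # Exclude the gripper (last dim), which legitimately changes step-wise
    velocity = np.diff(state[:, :-1], axis=0) * fps   # per-step change → per-second
    spikes = np.abs(velocity).max(axis=1) > spike_thresh
    gripper = state[:, -1]
    transitions = np.count_nonzero(np.abs(np.diff(gripper)) > 0.5)
    return int(spikes.sum()), transitions

# Synthetic episode: smooth motion plus one open → close → open gripper cycle
T = 200
state = np.zeros((T, 6), dtype=np.float32)
state[:, 0] = np.linspace(0.0, 0.5, T)   # slow, smooth joint sweep
state[80:120, -1] = 1.0                  # gripper closed for 40 frames

n_spikes, n_grip = sanity_check(state)
print(n_spikes, n_grip)  # 0 spikes, 2 gripper transitions
```

Running this over every episode before training catches the artifacts above in seconds rather than after a failed training run.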
Explore the SVRC Dataset Collection
The SVRC dataset library includes curated robot learning datasets in LeRobot format. Browse them to understand what different tasks and hardware look like before recording your own. Browse datasets →
Unit 2 Complete When...
- You have successfully visualized 3 episodes from lerobot-raw/aloha_sim_insertion_scripted and the HTML output opens in your browser.
- You can identify the observation.state, action, and timestamp fields in a Parquet file loaded with Python.
- You understand the difference between what is stored in Parquet vs MP4.
- You are ready to record your own dataset in Unit 3.