Assembly101: Multi-View Assembly and Disassembly Dataset
513 hours of procedural assembly and disassembly video from 12 synchronized cameras, with 1M+ fine-grained action annotations for manufacturing robotics research.
Key Stats
| Metric | Value |
|---|---|
| Duration | 513 hours |
| Participants | 53 |
| Videos | 4,321 |
| Camera views | 12 synchronized (8 fixed + 4 egocentric) |
| Annotations | 100K+ coarse action labels, 1M+ fine-grained action segments, 3D hand pose |
| Size | ~1.5 TB |
| License | CC-BY-NC-4.0 |
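
The figures in the table support a few back-of-envelope estimates that help when planning storage and download. The sketch below assumes that the 513 hours is total footage summed over all camera views, that every recording session has all 12 views, and that the listed size uses decimal units. None of these assumptions is stated above, so treat the results as rough.

```python
# Figures copied from the Key Stats table; derived values are approximate.
TOTAL_HOURS = 513
NUM_VIDEOS = 4321
NUM_VIEWS = 12
SIZE_TB = 1.5  # listed as approximate

# Average length of a single video, in minutes.
avg_minutes_per_video = TOTAL_HOURS * 60 / NUM_VIDEOS

# Storage per hour of footage (assumption: 1 TB = 1000 GB).
gb_per_hour = SIZE_TB * 1000 / TOTAL_HOURS

# Assumption: each recording session contributes one video per view.
approx_sessions = NUM_VIDEOS / NUM_VIEWS

print(f"Average video length: {avg_minutes_per_video:.1f} min")
print(f"Storage per hour:     {gb_per_hour:.1f} GB")
print(f"Approx. sessions:     {approx_sessions:.0f}")
```

Under these assumptions, a typical video runs about seven minutes and an hour of footage takes roughly 3 GB.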
What is Assembly101?
Assembly101 captures procedural assembly tasks in detail. Participants assemble and disassemble take-apart toys while being recorded by 12 synchronized cameras: 8 fixed external views and 4 head-mounted egocentric cameras. The multi-view setup covers hand-object interactions from multiple angles throughout each task.
The annotations are dense and layered: 100K+ coarse action labels, 1M+ fine-grained temporal segments, and 3D hand pose estimates. This makes Assembly101 well suited to manufacturing robotics research, where robots need to understand assembly sequences, detect errors, and plan corrective actions.
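One way to work with two annotation granularities is to nest fine-grained segments under the coarse action they overlap most. The sketch below illustrates that idea only. The `Segment` class, the `group_fine_under_coarse` helper, and the example labels and timestamps are all hypothetical and do not reflect the dataset's actual file format or label vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical temporal segment: times in seconds plus an action label."""
    start: float
    end: float
    label: str


def overlap(a: Segment, b: Segment) -> float:
    """Length of the temporal intersection of two segments (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def group_fine_under_coarse(
    coarse: List[Segment], fine: List[Segment]
) -> Dict[int, List[Segment]]:
    """Assign each fine segment to the index of the coarse segment it overlaps most.

    Fine segments that overlap no coarse segment are left out.
    """
    groups: Dict[int, List[Segment]] = {i: [] for i in range(len(coarse))}
    for f in fine:
        best = max(range(len(coarse)), key=lambda i: overlap(coarse[i], f), default=None)
        if best is not None and overlap(coarse[best], f) > 0:
            groups[best].append(f)
    return groups


# Made-up example data, for illustration only.
coarse = [Segment(0, 10, "assemble cabin"), Segment(10, 20, "attach wheel")]
fine = [
    Segment(1, 3, "pick up cabin"),
    Segment(8, 11, "position cabin"),  # straddles the coarse boundary
    Segment(12, 15, "screw wheel"),
]

groups = group_fine_under_coarse(coarse, fine)
for i, members in groups.items():
    print(coarse[i].label, "->", [m.label for m in members])
```

Assigning by maximum overlap keeps a segment that straddles a boundary in a single group. A stricter variant could require full containment instead.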
Related datasets
- EPIC-KITCHENS -- egocentric kitchen activities
- Ego4D -- large-scale egocentric video dataset from Meta
- RH20T -- multi-modal manipulation with force/tactile