- RGB and depth: Still the baseline for perception-led manipulation tasks.
- Force and tactile: Important for contact-rich transitions and grasp stability.
- Language and metadata: Useful for retrieval, evaluation slices, and instruction grounding.
Multimodal robot datasets
Multimodal datasets connect vision, action, proprioception, and touch so teams can reason about what information their policy will actually need.
This page helps teams decide whether they need more modalities, better timing alignment, or clearer metadata before retraining.
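One rough way to check whether timing alignment is the weak point is to measure how often each modality has a sample close to every camera frame. The sketch below is illustrative only. The function names (`nearest_gaps`, `alignment_report`), the stream names, and the 20 ms tolerance are all assumptions and not part of any particular dataset format. Timestamps are assumed to be sorted integers in milliseconds.

```python
from bisect import bisect_left


def nearest_gaps(reference_ts, other_ts):
    """For each reference timestamp, return the absolute gap to the
    nearest timestamp in other_ts. Both lists must be sorted."""
    if not other_ts:
        raise ValueError("other_ts must not be empty")
    gaps = []
    for t in reference_ts:
        i = bisect_left(other_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = other_ts[max(i - 1, 0): i + 1]
        gaps.append(min(abs(t - c) for c in candidates))
    return gaps


def alignment_report(streams, reference, tolerance_ms):
    """Return, per non-reference stream, the fraction of reference samples
    that have a sample within tolerance_ms in that stream."""
    ref_ts = streams[reference]
    report = {}
    for name, ts in streams.items():
        if name == reference:
            continue
        gaps = nearest_gaps(ref_ts, ts)
        report[name] = sum(g <= tolerance_ms for g in gaps) / len(gaps)
    return report


# Hypothetical episode: timestamps in milliseconds (made-up values).
streams = {
    "rgb": [0, 100, 200, 300],
    "force": [0, 50, 100, 150, 200, 250, 300],
    "tactile": [20, 150, 310],
}
report = alignment_report(streams, reference="rgb", tolerance_ms=20)
print(report)
```

In this made-up episode, the force stream covers every camera frame while the tactile stream covers only half of them, which would suggest fixing tactile timing before collecting more modalities.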