Pipeline Architecture Overview
A production-quality data collection pipeline has five stages: hardware setup (robot + sensors + teleoperation interface), recording software (synchronized multi-modal capture), storage backend (structured episode format), quality control (automated filtering + human review), and dataset packaging (RLDS, LeRobot, or custom format).
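To make the "structured episode format" stage concrete, here is a minimal sketch of what one recorded episode might look like in memory before it is packaged. The class and field names (Step, Episode, wrist_image_path, and so on) are illustrative assumptions, not a schema defined by RLDS, LeRobot, or SVRC.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One synchronized sample; all field names are hypothetical."""
    timestamp_s: float            # capture time in seconds
    joint_positions: List[float]  # joint encoder readings
    action: List[float]           # teleoperation command at this step
    wrist_image_path: str         # path to the saved wrist-camera frame


@dataclass
class Episode:
    """One demonstration, from start of recording to stop."""
    episode_id: str
    task: str
    steps: List[Step] = field(default_factory=list)
    success: Optional[bool] = None  # filled in during quality control

    def duration_s(self) -> float:
        # Duration is the span between the first and last timestamps.
        if len(self.steps) < 2:
            return 0.0
        return self.steps[-1].timestamp_s - self.steps[0].timestamp_s
```

Leaving success unset until quality control keeps recording and labeling as separate stages, which mirrors the five-stage split described above.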
Hardware Requirements
Minimum viable setup: a robot arm with joint encoders, a wrist-mounted RGB camera (640×480 at 30 fps), and a teleoperation device (SpaceMouse, VR controller, or leader arm). Recommended additions: an external scene camera, a wrist force-torque sensor, and a depth camera. SVRC's data collection stations include all recommended sensors pre-configured and calibrated.
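As a rough sizing aid, the camera spec above implies a raw data rate that is worth estimating before choosing storage. The sketch below assumes uncompressed 8-bit RGB (3 bytes per pixel); real pipelines usually compress, so treat the result as an upper bound. The function name is hypothetical.

```python
def raw_video_bytes_per_second(width: int, height: int, fps: int,
                               bytes_per_pixel: int = 3) -> int:
    """Uncompressed video data rate; assumes 8-bit RGB by default."""
    return width * height * bytes_per_pixel * fps


# Minimum wrist camera from the text: 640x480 at 30 fps.
rate = raw_video_bytes_per_second(640, 480, 30)
print(rate)  # 27648000 bytes per second, roughly 27.6 MB/s
```

Each additional camera adds its own stream, so the recommended scene and depth cameras raise this figure accordingly.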
Software Stack
Use ROS 2 for sensor synchronization and recording. Store episodes in the RLDS format for compatibility with Open X-Embodiment; LeRobot uses its own dataset format, so plan a conversion step if you also target it. Implement automated quality checks: episode length bounds, action magnitude outliers, success/failure labeling, and camera occlusion detection. Version your dataset with DVC or Git LFS for reproducibility.
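Two of the automated checks listed above, episode length bounds and action magnitude outliers, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the thresholds (min_steps, max_steps, max_action_norm) are placeholder values to tune per robot and task, and the outlier test is a simple fixed limit on the Euclidean norm of each action rather than a statistical method.

```python
import math
from typing import Dict, List, Sequence


def action_norm(action: Sequence[float]) -> float:
    """Euclidean norm of one action vector."""
    return math.sqrt(sum(a * a for a in action))


def check_episode(actions: Sequence[Sequence[float]],
                  min_steps: int = 30,
                  max_steps: int = 3000,
                  max_action_norm: float = 1.0) -> Dict[str, object]:
    """Run two basic quality checks; all thresholds are assumed defaults."""
    # Check 1: episode length must fall inside the allowed bounds.
    length_ok = min_steps <= len(actions) <= max_steps

    # Check 2: flag steps whose action magnitude exceeds the fixed limit.
    outlier_steps: List[int] = [
        i for i, a in enumerate(actions) if action_norm(a) > max_action_norm
    ]

    return {
        "length_ok": length_ok,
        "outlier_steps": outlier_steps,
        "passed": length_ok and not outlier_steps,
    }
```

Episodes that fail can be routed to human review instead of being dropped outright, since a flagged step may be a legitimate fast motion rather than a recording fault.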
Scaling from 100 to 10,000 Episodes
The jump from proof-of-concept to production-scale data requires parallel collection stations, operator scheduling, and centralized quality dashboards. SVRC's Data Services team has collected 50,000+ episodes across multiple robot platforms. Contact us for pilot programs.