Example Dataset
Below is an example of a typical dataset collected through our human-in-the-loop robotic data collection pipeline.
All datasets are customized per project; this example is provided to illustrate structure, scope, and data quality, not as a fixed offering.
Example: Contact-Rich Manipulation Dataset
Task
Pick, reposition, and place household objects under varying initial conditions using a robotic arm with tactile sensing.
Environment
Tabletop manipulation setup
Semi-structured object layouts
Controlled lighting with minor variations
Real-world contact and friction dynamics
Hardware Configuration
Robotic Arm: 7-DOF anthropomorphic arm
End Effector: Parallel gripper with tactile sensing
Sensors:
Multi-view RGB / RGB-D cameras
Joint encoders (position, velocity, torque)
End-effector force and distributed tactile arrays
Data Modalities
Each episode includes synchronized streams of:
Vision
RGB or RGB-D frames (time-aligned)
Optional multi-camera viewpoints
Proprioception
Joint positions, velocities, torques
Low-level control signals
Tactile & Force
Per-cell triaxial tactile forces
Aggregate force vectors
Contact location and pressure distribution
Human Demonstration Signals
Teleoperation commands
Corrective actions during execution
Metadata
Object identity and properties
Task parameters
Success / failure flags
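To make the schema concrete, here is a minimal sketch of how one episode's time-aligned streams might look once loaded into memory. The field names, array shapes, and 7-DOF sizing are illustrative assumptions, not the delivered schema.

# Hypothetical in-memory view of one episode (assumed names and shapes).
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Episode:
    rgb: np.ndarray                # (T, H, W, 3) uint8 frames, time-aligned
    depth: Optional[np.ndarray]    # (T, H, W) float32, present if RGB-D is enabled
    joint_pos: np.ndarray          # (T, 7) joint positions
    joint_vel: np.ndarray          # (T, 7) joint velocities
    joint_torque: np.ndarray       # (T, 7) joint torques
    tactile: np.ndarray            # (T, n_cells, 3) per-cell triaxial forces
    ee_wrench: np.ndarray          # (T, 6) aggregate end-effector force/torque
    teleop_actions: np.ndarray     # (T, action_dim) operator commands
    corrections: np.ndarray        # (T,) bool mask marking corrective segments
    meta: dict                     # object identity, task parameters, success flag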
Dataset Structure
Data is organized into episode-based trajectories:
dataset/
├── metadata/
│ ├── task_definition.json
│ ├── object_catalog.json
│ └── calibration.yaml
├── episodes/
│ ├── episode_0001/
│ │ ├── observations/
│ │ │ ├── vision/
│ │ │ ├── proprioception/
│ │ │ └── tactile/
│ │ ├── actions/
│ │ ├── rewards/ (optional)
│ │ └── annotations.json
│ ├── episode_0002/
│ └── ...
This structure maps directly to imitation learning trajectories, reinforcement learning rollouts, and offline RL datasets.
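As a sketch of that mapping, the hypothetical loader below walks the episode directories and yields (observation, action) pairs suitable for imitation learning. The file names inside observations/ and actions/ are assumptions; the concrete on-disk schema is agreed per project.

import json
from pathlib import Path
import numpy as np

def load_episode(ep_dir: Path) -> dict:
    # Assumed file names inside each episode directory.
    obs_dir = ep_dir / "observations"
    return {
        "proprio": np.load(obs_dir / "proprioception" / "joints.npy"),
        "tactile": np.load(obs_dir / "tactile" / "forces.npy"),
        "actions": np.load(ep_dir / "actions" / "commands.npy"),
        "annotations": json.loads((ep_dir / "annotations.json").read_text()),
    }

def iter_transitions(dataset_root: str):
    # One (observation, action) pair per timestep, episode by episode.
    for ep_dir in sorted(Path(dataset_root, "episodes").glob("episode_*")):
        ep = load_episode(ep_dir)
        flat_tactile = ep["tactile"].reshape(len(ep["tactile"]), -1)
        obs = np.concatenate([ep["proprio"], flat_tactile], axis=1)
        for o, a in zip(obs, ep["actions"]):
            yield o, a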
Scale
A typical pilot dataset may include:
200–1,000 episodes
5–20 minutes per episode
Multiple operators to capture variation
Intentional failure cases (misgrasp, slip, collision)
Larger engagements can scale to tens of thousands of episodes over extended collection periods.
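For a rough sense of volume, those ranges translate into robot time roughly as follows (a back-of-the-envelope calculation, not a quoted deliverable):

episodes = (200, 1_000)              # pilot range
minutes_per_episode = (5, 20)        # pilot range

low_hours = episodes[0] * minutes_per_episode[0] / 60     # ~17 hours
high_hours = episodes[1] * minutes_per_episode[1] / 60    # ~333 hours
print(f"Pilot collection spans roughly {low_hours:.0f}-{high_hours:.0f} hours of robot time.")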
Quality Controls
Each dataset undergoes:
Sensor calibration before collection
Real-time monitoring during capture
Automatic synchronization and integrity checks
Manual review of sampled episodes
Clear documentation of known edge cases
The result is learning-ready data, not raw logs.
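The synchronization and integrity checks are, in spirit, along the lines of the sketch below, which flags streams whose timestamps drift apart or go non-monotonic. The skew threshold and stream naming are illustrative assumptions, not the production tooling.

import numpy as np

def check_episode(streams: dict, max_skew_s: float = 0.005) -> list:
    """streams maps a modality name (e.g. 'rgb', 'tactile') to its timestamp array in seconds."""
    issues = []
    starts = [t[0] for t in streams.values()]
    ends = [t[-1] for t in streams.values()]
    # All streams must cover a common time window within the allowed skew.
    if max(starts) - min(starts) > max_skew_s or max(ends) - min(ends) > max_skew_s:
        issues.append("streams do not cover a common time window")
    # Timestamps must be strictly increasing: no dropped or duplicated samples.
    for name, t in streams.items():
        if np.any(np.diff(t) <= 0):
            issues.append(f"{name}: non-monotonic timestamps")
    return issues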
Intended Use
This type of dataset is commonly used for:
Imitation learning policy training
Tactile-aware grasp and manipulation models
Contact-rich reinforcement learning
Sim-to-real validation and benchmarking
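As an illustration of the first item, a minimal behavior-cloning loop over flat observation/action tensors (for example, those yielded by a loader like the one sketched earlier) could look like this. The network size, loss, and optimizer are illustrative choices, not a prescribed training recipe.

import torch
import torch.nn as nn

def train_bc(obs: torch.Tensor, actions: torch.Tensor, epochs: int = 10) -> nn.Module:
    # Small MLP policy: observation vector in, demonstrated action out.
    policy = nn.Sequential(
        nn.Linear(obs.shape[1], 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, actions.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(obs), actions)   # clone the demonstrated actions
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy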
Customization
All datasets are customizable, including:
Task definitions
Sensor configurations
Data modalities and sampling rates
Annotation depth
Delivery format and schema
We work closely with clients to ensure the dataset aligns with their model architecture, training pipeline, and evaluation strategy.
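For illustration only, a per-project collection spec covering those customization points might be captured along these lines; every key and value below is hypothetical.

collection_spec = {
    "task": "pick_reposition_place",
    "sensors": {
        "cameras": {"count": 3, "modality": "rgbd", "hz": 30},
        "proprioception_hz": 500,
        "tactile_hz": 200,
    },
    "annotations": ["success_flag", "contact_events", "correction_segments"],
    "delivery": {"format": "hdf5", "schema_version": "client_v1"},
}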