Example Dataset
Below is an example of a typical dataset collected through our human-in-the-loop robotic data collection pipeline.
All datasets are customized per project; this example is provided to illustrate structure, scope, and data quality, not as a fixed offering.
Example: Contact-Rich Manipulation Dataset
Task
Pick, reposition, and place household objects under varying initial conditions using a robotic arm with tactile sensing.
Environment
Tabletop manipulation setup
Semi-structured object layouts
Controlled lighting with minor variations
Real-world contact and friction dynamics
Hardware Configuration
Robotic Arm: 7-DOF anthropomorphic arm
End Effector: Parallel gripper with tactile sensing
Sensors:
Multi-view RGB / RGB-D cameras
Joint encoders (position, velocity, torque)
End-effector force and distributed tactile arrays
Data Modalities
Each episode includes synchronized streams of:
Vision
RGB or RGB-D frames (time-aligned)
Optional multi-camera viewpoints
Proprioception
Joint positions, velocities, torques
Low-level control signals
Tactile & Force
Per-cell triaxial tactile forces
Aggregate force vectors
Contact location and pressure distribution
Human Demonstration Signals
Teleoperation commands
Corrective actions during execution
Metadata
Object identity and properties
Task parameters
Success / failure flags
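To make the schema concrete, here is a minimal sketch of how one episode's time-aligned streams might look once loaded into memory. The field names, array shapes, and 7-DOF sizing are illustrative assumptions, not the delivered schema.

# Hypothetical in-memory view of one episode (assumed names and shapes).
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Episode:
    rgb: np.ndarray                # (T, H, W, 3) uint8 frames, time-aligned
    depth: Optional[np.ndarray]    # (T, H, W) float32, present if RGB-D is enabled
    joint_pos: np.ndarray          # (T, 7) joint positions
    joint_vel: np.ndarray          # (T, 7) joint velocities
    joint_torque: np.ndarray       # (T, 7) joint torques
    tactile: np.ndarray            # (T, n_cells, 3) per-cell triaxial forces
    ee_wrench: np.ndarray          # (T, 6) aggregate end-effector force/torque
    teleop_actions: np.ndarray     # (T, action_dim) operator commands
    corrections: np.ndarray        # (T,) bool mask marking corrective segments
    meta: dict                     # object identity, task parameters, success flag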
Dataset Structure
Data is organized into episode-based trajectories:
dataset/
├── metadata/
│ ├── task_definition.json
│ ├── object_catalog.json
│ └── calibration.yaml
├── episodes/
│ ├── episode_0001/
│ │ ├── observations/
│ │ │ ├── vision/
│ │ │ ├── proprioception/
│ │ │ └── tactile/
│ │ ├── actions/
│ │ ├── rewards/ (optional)
│ │ └── annotations.json
│ ├── episode_0002/
│ └── ...
This structure maps directly to imitation learning trajectories, reinforcement learning rollouts, and offline RL datasets.
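As a sketch of that mapping, the hypothetical loader below walks the episode directories and yields (observation, action) pairs suitable for imitation learning. The file names inside observations/ and actions/ are assumptions; the concrete on-disk schema is agreed per project.

import json
from pathlib import Path
import numpy as np

def load_episode(ep_dir: Path) -> dict:
    # Assumed file names inside each episode directory.
    obs_dir = ep_dir / "observations"
    return {
        "proprio": np.load(obs_dir / "proprioception" / "joints.npy"),
        "tactile": np.load(obs_dir / "tactile" / "forces.npy"),
        "actions": np.load(ep_dir / "actions" / "commands.npy"),
        "annotations": json.loads((ep_dir / "annotations.json").read_text()),
    }

def iter_transitions(dataset_root: str):
    # One (observation, action) pair per timestep, episode by episode.
    for ep_dir in sorted(Path(dataset_root, "episodes").glob("episode_*")):
        ep = load_episode(ep_dir)
        flat_tactile = ep["tactile"].reshape(len(ep["tactile"]), -1)
        obs = np.concatenate([ep["proprio"], flat_tactile], axis=1)
        for o, a in zip(obs, ep["actions"]):
            yield o, a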
Scale
A typical pilot dataset may include:
200–1,000 episodes
5–20 minutes per episode
Multiple operators to capture variation
Intentional failure cases (misgrasp, slip, collision)
Larger engagements can scale to tens of thousands of episodes over extended collection periods.
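For a rough sense of volume, those ranges translate into robot time roughly as follows (a back-of-the-envelope calculation, not a quoted deliverable):

episodes = (200, 1_000)              # pilot range
minutes_per_episode = (5, 20)        # pilot range

low_hours = episodes[0] * minutes_per_episode[0] / 60     # ~17 hours
high_hours = episodes[1] * minutes_per_episode[1] / 60    # ~333 hours
print(f"Pilot collection spans roughly {low_hours:.0f}-{high_hours:.0f} hours of robot time.")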
Quality Controls
Each dataset undergoes:
Sensor calibration before collection
Real-time monitoring during capture
Automatic synchronization and integrity checks
Manual review of sampled episodes
Clear documentation of known edge cases
The result is learning-ready data, not raw logs.
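The synchronization and integrity checks are, in spirit, along the lines of the sketch below, which flags streams whose timestamps drift apart or go non-monotonic. The skew threshold and stream naming are illustrative assumptions, not the production tooling.

import numpy as np

def check_episode(streams: dict, max_skew_s: float = 0.005) -> list:
    """streams maps a modality name (e.g. 'rgb', 'tactile') to its timestamp array in seconds."""
    issues = []
    starts = [t[0] for t in streams.values()]
    ends = [t[-1] for t in streams.values()]
    # All streams must cover a common time window within the allowed skew.
    if max(starts) - min(starts) > max_skew_s or max(ends) - min(ends) > max_skew_s:
        issues.append("streams do not cover a common time window")
    # Timestamps must be strictly increasing: no dropped or duplicated samples.
    for name, t in streams.items():
        if np.any(np.diff(t) <= 0):
            issues.append(f"{name}: non-monotonic timestamps")
    return issues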
Intended Use
This type of dataset is commonly used for:
Imitation learning policy training
Tactile-aware grasp and manipulation models
Contact-rich reinforcement learning
Sim-to-real validation and benchmarking
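As an illustration of the first item, a minimal behavior-cloning loop over flat observation/action tensors (for example, those yielded by a loader like the one sketched earlier) could look like this. The network size, loss, and optimizer are illustrative choices, not a prescribed training recipe.

import torch
import torch.nn as nn

def train_bc(obs: torch.Tensor, actions: torch.Tensor, epochs: int = 10) -> nn.Module:
    # Small MLP policy: observation vector in, demonstrated action out.
    policy = nn.Sequential(
        nn.Linear(obs.shape[1], 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, actions.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(policy(obs), actions)   # clone the demonstrated actions
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy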
Customization
All datasets are customizable, including:
Task definitions
Sensor configurations
Data modalities and sampling rates
Annotation depth
Delivery format and schema
We work closely with clients to ensure the dataset aligns with their model architecture, training pipeline, and evaluation strategy.
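For illustration only, a per-project collection spec covering those customization points might be captured along these lines; every key and value below is hypothetical.

collection_spec = {
    "task": "pick_reposition_place",
    "sensors": {
        "cameras": {"count": 3, "modality": "rgbd", "hz": 30},
        "proprioception_hz": 500,
        "tactile_hz": 200,
    },
    "annotations": ["success_flag", "contact_events", "correction_segments"],
    "delivery": {"format": "hdf5", "schema_version": "client_v1"},
}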