Example Dataset

Below is an example of a typical dataset collected through our human-in-the-loop robotic data collection pipeline.

All datasets are customized per project; this example is provided to illustrate structure, scope, and data quality, not as a fixed offering.

Example: Contact-Rich Manipulation Dataset

Task

Pick, reposition, and place household objects under varying initial conditions using a robotic arm with tactile sensing.

Environment

  • Tabletop manipulation setup

  • Semi-structured object layouts

  • Controlled lighting with minor variations

  • Real-world contact and friction dynamics

Hardware Configuration

  • Robotic Arm: 7-DOF anthropomorphic arm

  • End Effector: Parallel gripper with tactile sensing

  • Sensors:

    • Multi-view RGB / RGB-D cameras

    • Joint encoders and torque sensing (position, velocity, torque)

    • End-effector force sensing and distributed tactile arrays

Data Modalities

Each episode includes synchronized streams of:

  • Vision

    • RGB or RGB-D frames (time-aligned)

    • Optional multi-camera viewpoints

  • Proprioception

    • Joint positions, velocities, torques

    • Low-level control signals

  • Tactile & Force

    • Per-cell triaxial tactile forces

    • Aggregate force vectors

    • Contact location and pressure distribution

  • Human Demonstration Signals

    • Teleoperation commands

    • Corrective actions during execution

  • Metadata

    • Object identity and properties

    • Task parameters

    • Success / failure flags
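To illustrate how the tactile items above relate to each other, here is a minimal Python sketch that derives an aggregate force vector and a contact location from per-cell triaxial forces. The array layout (N cells by [fx, fy, fz] in newtons), the cell position table (N by [x, y] in meters on the pad surface), and the function name are assumptions made for illustration, not the delivered schema.

```python
import numpy as np

def summarize_tactile_frame(cell_forces, cell_positions):
    """Summarize one tactile frame.

    cell_forces:    (N, 3) per-cell forces [fx, fy, fz] in newtons (assumed layout).
    cell_positions: (N, 2) cell centers on the pad surface in meters (assumed layout).
    Returns (aggregate_force, contact_center); contact_center is None without contact.
    """
    cell_forces = np.asarray(cell_forces, dtype=float)
    cell_positions = np.asarray(cell_positions, dtype=float)

    # Aggregate force vector: sum of the per-cell triaxial forces.
    aggregate_force = cell_forces.sum(axis=0)

    # Use only compressive normal load (fz >= 0) as the pressure weight.
    normal_load = np.clip(cell_forces[:, 2], 0.0, None)
    total_normal = normal_load.sum()
    if total_normal <= 0.0:
        return aggregate_force, None  # no cell is in contact

    # Contact location: pressure-weighted centroid of the cell positions.
    contact_center = (normal_load[:, None] * cell_positions).sum(axis=0) / total_normal
    return aggregate_force, contact_center
```

A pressure-weighted centroid is only one possible definition of contact location; a project may instead deliver full per-cell pressure maps and leave the reduction to the training pipeline.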

Dataset Structure

Data is organized into episode-based trajectories:

dataset/
├── metadata/
│   ├── task_definition.json
│   ├── object_catalog.json
│   └── calibration.yaml
├── episodes/
│   ├── episode_0001/
│   │   ├── observations/
│   │   │   ├── vision/
│   │   │   ├── proprioception/
│   │   │   └── tactile/
│   │   ├── actions/
│   │   ├── rewards/ (optional)
│   │   └── annotations.json
│   ├── episode_0002/
│   └── ...

Each episode directory holds one trajectory of observations, actions, and optional rewards, so the structure maps directly to imitation learning trajectories, reinforcement learning rollouts, and offline RL datasets.
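As a rough sketch of how a training pipeline might walk this layout, the Python below lists episode directories and reads each annotations.json. The directory and file names come from the tree above; the "success" key inside annotations.json is a hypothetical field name, since the annotation schema is defined per project.

```python
import json
from pathlib import Path

def list_episodes(dataset_root):
    """Return episode directories under <dataset_root>/episodes, sorted by name."""
    episodes_dir = Path(dataset_root) / "episodes"
    return sorted(
        p for p in episodes_dir.iterdir()
        if p.is_dir() and p.name.startswith("episode_")
    )

def load_annotations(episode_dir):
    """Load the annotations.json file of a single episode."""
    with open(Path(episode_dir) / "annotations.json", "r", encoding="utf-8") as f:
        return json.load(f)

def successful_episodes(dataset_root, success_key="success"):
    """Keep episodes whose annotations mark success.

    success_key is a hypothetical field name used for illustration only.
    """
    return [
        ep for ep in list_episodes(dataset_root)
        if load_annotations(ep).get(success_key) is True
    ]
```

Filtering on the success flag is a common first step for imitation learning, while offline RL pipelines would typically keep the failure episodes as well.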

Scale

A typical pilot dataset may include:

  • 200–1,000 episodes

  • 5–20 minutes per episode

  • Multiple operators to capture variation

  • Intentional failure cases (misgrasp, slip, collision)

Larger engagements can scale to tens of thousands of episodes over extended collection periods.
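For a sense of what the pilot figures above imply in recorded time, the following Python snippet multiplies the episode counts by the episode lengths. The helper name is illustrative; the inputs are the ranges quoted above.

```python
def total_hours(episodes, minutes_per_episode):
    """Total recorded time in hours for a given episode count and length."""
    return episodes * minutes_per_episode / 60.0

low = total_hours(200, 5)      # fewest, shortest episodes
high = total_hours(1000, 20)   # most, longest episodes
print(f"{low:.1f} to {high:.1f} hours")  # prints "16.7 to 333.3 hours"
```

A pilot therefore spans roughly 17 to 333 hours of recorded interaction, depending on where it falls in those ranges.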

Quality Controls

Each dataset undergoes:

  • Sensor calibration before collection

  • Real-time monitoring during capture

  • Automatic synchronization and integrity checks

  • Manual review of sampled episodes

  • Clear documentation of known edge cases

The result is calibrated, synchronized, and reviewed data that is ready for training, not raw logs.
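The automatic checks listed above could look something like the following Python sketch. It is not the pipeline's actual implementation: the function names, the required-entry list (taken from the example directory tree), and the max_gap_s threshold are assumptions chosen for illustration.

```python
from pathlib import Path

import numpy as np

# Entries every episode is expected to contain, per the example layout (assumed required).
REQUIRED_ENTRIES = (
    "observations/vision",
    "observations/proprioception",
    "observations/tactile",
    "actions",
    "annotations.json",
)

def missing_entries(episode_dir):
    """Return the required files or directories that are absent from an episode."""
    root = Path(episode_dir)
    return [rel for rel in REQUIRED_ENTRIES if not (root / rel).exists()]

def check_stream(timestamps, max_gap_s):
    """Return a list of problems found in one stream's timestamps (in seconds)."""
    ts = np.asarray(timestamps, dtype=float)
    if ts.size < 2:
        return ["fewer than two samples"]
    issues = []
    gaps = np.diff(ts)
    if np.any(gaps <= 0):
        issues.append("timestamps not strictly increasing")
    if np.any(gaps > max_gap_s):
        issues.append("gap exceeds max_gap_s (possible dropped samples)")
    return issues
```

An episode with an empty result from both functions passes these two checks; flagged episodes would go to the manual review step.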

Intended Use

This type of dataset is commonly used for:

  • Imitation learning policy training

  • Tactile-aware grasp and manipulation models

  • Contact-rich reinforcement learning

  • Sim-to-real validation and benchmarking

Customization

Every aspect of a dataset can be customized, including:

  • Task definitions

  • Sensor configurations

  • Data modalities and sampling rates

  • Annotation depth

  • Delivery format and schema

We work closely with clients to ensure the dataset aligns with their model architecture, training pipeline, and evaluation strategy.