Custom Robot Teleoperation Data Collection
Spec the robot, the task list, the modalities, and the scale. We collect, label, QA, and deliver imitation learning data in the format your training stack expects. 24-hour teleoperation coverage at our Mountain View lab, with pricing from $0.50 per episode for simple tasks and $500 per task episode for novel embodiments (see the Pricing Guide).
Ready-to-License Dataset Packs
If you need imitation learning data today, start with one of our curated dataset packs. Each pack ships cleaned, labeled, and ready for OpenVLA / Octo / Diffusion Policy fine-tuning.
Bimanual Manipulation Pack
20,000 dual-arm episodes across 50 tabletop tasks. ALOHA 2 and bimanual Panda coverage, RGB + wrist cameras, language-annotated.
From $5,000
Humanoid Locomotion Pack
50,000 locomotion trajectories on Unitree G1/H1 across 30 terrains. Joint torques, IMU, full-body keypoints, RGB-D.
From $10,000
Dexterous Grasping Pack
10,000 multi-finger grasp trajectories on Shadow Hand and LEAP Hand across 200 objects. Tactile + force-torque + RGB-D.
From $5,000
VLA Evaluation Pack
2,000 held-out evaluation rollouts on Franka and WidowX, distribution-matched to BridgeData V2 and LIBERO. Ideal for benchmarking a new VLA without contaminating your training set.
From $3,000
Commission a Custom Campaign
Every custom project follows a four-step flow. Typical total turnaround is 2-6 weeks depending on scale.
- Spec. You share a task brief — robot, objects, scenes, modalities, success criteria, delivery format. We return a detailed data collection plan and a fixed quote within 24 hours.
- Pilot. We collect ~100 episodes against your spec, train a sanity-check Diffusion Policy on the pilot data, and deliver raw episodes plus a validation report. You approve, tweak, or kill the spec.
- Scale. Once the pilot is approved, we run 24-hour teleoperation coverage to your target episode count with continuous QA and weekly drop deliveries.
- Delivery. The final dataset ships in RLDS, Hugging Face Datasets, LeRobot, or a custom schema, together with a CHANGELOG, per-episode metadata, and a reproducible training config.
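As a rough illustration of the Spec step, a task brief can be written down as a small structured record before it is shared. The sketch below is hypothetical: the `TaskBrief` class, its field names, and the delivery format identifiers are assumptions made for illustration, not a required schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical identifiers for the four delivery options listed in this section.
DELIVERY_FORMATS = {"rlds", "huggingface", "lerobot", "custom"}


@dataclass
class TaskBrief:
    """One task brief: the six items the Spec step asks for."""
    robot: str
    objects: list[str]
    scenes: list[str]
    modalities: list[str]
    success_criteria: str
    delivery_format: str

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the brief is complete."""
        found = []
        for name in ("robot", "success_criteria"):
            if not getattr(self, name).strip():
                found.append(f"{name} is empty")
        for name in ("objects", "scenes", "modalities"):
            if not getattr(self, name):
                found.append(f"{name} is empty")
        if self.delivery_format not in DELIVERY_FORMATS:
            found.append(f"unknown delivery_format: {self.delivery_format!r}")
        return found


brief = TaskBrief(
    robot="Franka Panda",
    objects=["mug", "sponge"],
    scenes=["kitchen counter"],
    modalities=["rgb", "wrist_rgb", "joint_positions"],
    success_criteria="mug placed upright on the rack",
    delivery_format="lerobot",
)
print(brief.problems())
```

Checking the brief for gaps before sending it tends to shorten the back-and-forth on the quote, since missing modalities or success criteria are the usual reasons a plan needs revision.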
Data Quality Standards
Every episode goes through a three-stage quality pipeline: automatic heuristic filters (action clipping, sensor dropout, episode length), manual review by a second teleoperator, and a final policy-based sanity check. Fewer than 0.5% of delivered episodes require post-hoc rework. See our full data services page for protocol details, inter-rater agreement numbers, and sample validation reports.
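To make the first stage concrete, here is a minimal sketch of what automatic heuristic filters of this kind might look like. It is not our production pipeline: the `Episode` layout, the function name, and every threshold are assumptions chosen for illustration, and actions are assumed to be normalized to [-1, 1].

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Episode:
    actions: list[list[float]]  # one action vector per timestep (assumed normalized)
    timestamps: list[float]     # sensor frame times in seconds


def heuristic_flags(
    ep: Episode,
    action_limit: float = 1.0,           # assumed saturation bound
    max_clipped_fraction: float = 0.05,  # assumed tolerance for saturated steps
    max_frame_gap_s: float = 0.1,        # assumed longest acceptable gap between frames
    min_steps: int = 10,
    max_steps: int = 2000,
) -> list[str]:
    """Return the names of the heuristic checks this episode fails."""
    flags = []
    n = len(ep.actions)

    # Episode length: too short or too long usually means an aborted or stalled demo.
    if n < min_steps or n > max_steps:
        flags.append("episode_length")

    # Action clipping: share of timesteps where any dimension sits at the limit.
    clipped = sum(1 for a in ep.actions if any(abs(x) >= action_limit for x in a))
    if n and clipped / n > max_clipped_fraction:
        flags.append("action_clipping")

    # Sensor dropout: an unusually long gap between consecutive frames.
    gaps = [t1 - t0 for t0, t1 in zip(ep.timestamps, ep.timestamps[1:])]
    if gaps and max(gaps) > max_frame_gap_s:
        flags.append("sensor_dropout")

    return flags


good = Episode(actions=[[0.1, -0.2]] * 20, timestamps=[i * 0.05 for i in range(20)])
bad = Episode(actions=[[1.0, 0.0]] * 5, timestamps=[0.0, 0.05, 0.1, 0.6, 0.65])
print(heuristic_flags(good), heuristic_flags(bad))
```

Returning the list of failed checks, rather than a single pass/fail flag, lets the manual reviewer in the second stage see why an episode was flagged.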
Licensing & Formats
You choose the delivery format:
- RLDS / TFDS. Drop-in compatible with the Open X-Embodiment mix and the Octo / OpenVLA training scripts.
- Hugging Face Datasets. Parquet-backed, streamable, versioned, ready for a private Hugging Face repo.
- LeRobot. The Hugging Face LeRobot schema for community-friendly sharing.
- Custom schema. Your own HDF5 / Zarr / ROS bag layout — we match whatever your existing pipeline expects.
All custom-collected data is licensed to you exclusively by default. Non-exclusive licensing (to help offset collection cost) is available on request at a discount.
Pricing Guide
| Task complexity | Price per episode | Typical use |
|---|---|---|
| Simple pick-and-place, 5s episodes | $0.50 - $1 | VLA pretraining volume |
| Mid-complexity manipulation, 15-30s | $1 - $3 | Standard imitation learning |
| Long-horizon, bimanual, or contact-rich | $3 - $5 | Production policy fine-tuning |
| Humanoid / dexterous, full body | $5 - $15 | Whole-body policy training |
| Novel embodiment (customer-shipped robot) | $500 per task episode | Bespoke R&D collection |
Volume discounts kick in above 10,000 episodes. Full project budgets typically range from $5,000 for a pilot to $50,000+ for a production campaign.
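For a quick back-of-the-envelope range, the per-episode bands in the table can be multiplied by a target episode count. The sketch below is only that arithmetic: the tier keys and the `estimate_budget` helper are hypothetical names, volume discounts are not modeled because their size is not published here, and the novel-embodiment tier is left out because it is priced per task episode rather than as a range.

```python
from __future__ import annotations

# Per-episode price bands in USD, copied from the Pricing Guide table.
# The dictionary keys are illustrative labels, not official tier names.
PRICE_BANDS = {
    "simple": (0.50, 1.00),
    "mid": (1.00, 3.00),
    "long_horizon": (3.00, 5.00),
    "humanoid": (5.00, 15.00),
}


def estimate_budget(tier: str, episodes: int) -> tuple[float, float]:
    """Return the (low, high) list-price range in USD, before any volume discount."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    low, high = PRICE_BANDS[tier]
    return episodes * low, episodes * high


low, high = estimate_budget("mid", 10_000)
print(f"${low:,.0f} to ${high:,.0f}")
```

A fixed quote will differ from this range once task details and discounts are taken into account, so treat the output as a planning figure only.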
Customer Logos & Case Studies
We work with leading robot foundation model labs, humanoid startups, and academic research groups. Full case studies are available under NDA on request. Representative outcomes:
- Collected 120,000 bimanual episodes for a Series-A humanoid startup in 8 weeks.
- Delivered a 30,000-episode BridgeData-style WidowX pack for a VLA research group at a top-5 university.
- Ran a 6-week dexterous grasping campaign on a customer-shipped robot hand, producing 8,000 tactile-annotated episodes.
Start with open data instead
Not sure you need custom collection yet? Start with one of the open datasets below — if you still need more coverage afterwards, come back here.
- Open X-Embodiment — 1M+ cross-robot trajectories for VLA pretraining
- BridgeData V2 — 60K WidowX demos with language
- DROID — 76K Franka episodes across 564 scenes
- LIBERO — 6.5K lifelong learning demos (130 tasks, 50 demos each)
- CALVIN — long-horizon language-conditioned sim
- ALOHA — bimanual real-world teleoperation
- Robomimic — canonical BC benchmark
- RoboNet — cross-robot video prediction
- MimicGen — synthetic data augmentation
See the full datasets hub for an up-to-date directory.
Request a Quote
Tell us what you need. We respond with a fixed quote and a data collection plan within 24 hours.