Why Cost Transparency Matters

Budget surprises kill robot learning programs. The most common failure mode is not a bad algorithm; it is running out of data budget 40% of the way to a viable policy. This breakdown lets you build a defensible budget before committing to a data collection program.

All figures are US market rates as of 2025. Hardware costs are purchase prices; depreciation over 3 years is noted where relevant.

Hardware Costs

Hardware is the largest upfront cost and the most commonly miscounted category. Teams routinely omit grippers, mounting fixtures, safety enclosures, and calibration targets, all of which are required for production collection. A worked amortization sketch follows the list.

  • Robot arm: Entry-level desktop collaborative arms (WidowX 250, Kinova Jaco) run $3,000–$8,000. Mid-range research arms (Universal Robots UR5e, Franka FR3) cost $20,000–$35,000. High-dexterity platforms (Ability Hand integration, custom cable-driven hands) add $5,000–$50,000. Budget $5,000–$300,000 per robot depending on the task.
  • Gripper: A parallel jaw gripper (Robotiq 2F-85) costs $3,500–$5,000. Dexterous hands (Ability Hand, Inspire Dexterous Hand) cost $8,000–$15,000. Custom soft grippers can be manufactured for $500–$3,000 in small quantities.
  • Teleoperation system: Leader-follower puppet arm kits (ACT-style) are available for $2,000–$8,000. VR-based systems using Meta Quest 3 with custom controller mapping run $2,500–$12,000 in hardware. Full exoskeleton haptic systems (SenseGlove Nova 2, HaptX DK2) cost $5,000–$20,000 per operator station.
  • Cameras: Intel RealSense D435i: $200 each. ZED 2 stereo camera: $450 each. FLIR Blackfly S (industrial, high framerate): $600–$1,500 each. A standard 3-camera rig (2 fixed + 1 wrist-mounted) totals $800–$4,000 in camera hardware plus $200–$500 in mounts, cables, and lighting.
  • Workspace and fixtures: A dedicated data collection station needs a sturdy work table, mounting frame for the robot, and repeatable object placement fixtures (3D-printed trays, jigs). Budget $500–$3,000 per station.
  • Safety enclosure: For arms operating above 1 kg payload at speed, safety caging or light-curtain interlocks are legally required in many jurisdictions. Budget $1,000–$8,000.
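
To turn these purchase prices into the per-demo amortization used in the table at the end of this section, here is a minimal sketch of 3-year straight-line depreciation. The station cost, throughput, and utilization inputs below are hypothetical placeholders, not measured values:

```python
# Minimal sketch: spread a station's up-front hardware cost over the
# demos it produces during a 3-year depreciation horizon.
# All example inputs below are hypothetical, not recommended values.

def hardware_cost_per_demo(
    station_cost_usd: float,                  # arm + gripper + cameras + fixtures + enclosure
    demos_per_day: float,                     # sustained throughput for this task type
    collection_days_per_year: float = 100.0,  # assumed utilization; labs rarely collect full-time
    life_years: float = 3.0,                  # straight-line depreciation horizon
) -> float:
    annual_cost = station_cost_usd / life_years
    annual_demos = demos_per_day * collection_days_per_year
    return annual_cost / annual_demos

# Example: a $35,000 station (mid-range arm, parallel gripper, 3-camera
# rig, fixtures, enclosure) running a moderate-complexity task.
print(f"${hardware_cost_per_demo(35_000, demos_per_day=40):.2f} per demo")
# -> $2.92 per demo, inside the amortized ranges in the table below
```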

Operator Costs

Operator costs are the largest ongoing expense and are often underestimated because the full cost of an employee extends well beyond base wage.

  • Base wage: $20–$35/hour for operators in mid-cost US cities; $30–$50/hour in San Francisco or New York. Remote operators (using VR teleoperation from home) average $20–$30/hour.
  • Fully loaded cost: Add 30–40% for benefits, payroll tax, and equipment. A $28/hour operator costs your organization roughly $36–$39/hour fully loaded (see the sketch after this list).
  • Training cost: Plan for $1,000–$3,000 per operator in training time before they reach production quality — typically 3–5 days of simulator practice, task familiarization, and quality calibration.
  • Supervision overhead: A QA lead spending 2–4 hours/day reviewing demos for a team of 5 operators adds ~$500–$1,000/week in supervisory cost.
  • Throughput: A skilled operator completes 30–80 demonstrations per day for moderate-complexity tasks (L2–L3 on the difficulty scale). Simple pick-and-place can reach 100–150/day. Precise assembly or bimanual tasks drop to 15–30/day.
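
Putting the wage, load factor, and throughput figures together, a minimal sketch of fully loaded operator cost per demo; the example inputs are drawn from the ranges above, not prescriptions:

```python
# Minimal sketch: fully loaded operator cost per demonstration.
# Inputs are illustrative values drawn from the ranges above.

def operator_cost_per_demo(
    base_wage_per_hour: float,
    load_factor: float = 1.35,   # +30-40% for benefits, payroll tax, equipment
    demos_per_day: float = 50.0,
    hours_per_day: float = 8.0,
) -> float:
    loaded_rate = base_wage_per_hour * load_factor
    return loaded_rate * hours_per_day / demos_per_day

# Example: $28/hour operator at 35% load producing 50 demos/day.
print(f"${operator_cost_per_demo(28.0):.2f} per demo")
# 28 * 1.35 = $37.80/hour loaded; * 8 h / 50 demos -> $6.05 per demo
```

Note that this excludes the training and supervision overheads above, which is one reason the per-demo figures in the closing table run higher.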

Annotation Costs

Not all datasets require post-hoc annotation; if success or failure is implicit in the teleoperation flow, you may be able to skip this category. Many tasks, however, need additional labels (a budgeting sketch follows the list):

  • Success/failure labeling: Human review of each episode at $0.05–$0.25/episode (roughly 10–45 seconds per episode at a $20/hour annotator rate). For 10,000 demos this is $500–$2,500.
  • Object segmentation masks: Polygon annotation of objects in key frames runs $0.50–$2.50 per image on specialized platforms (Scale AI, Labelbox). A 3-camera, 10Hz system generates 30 images per second, so a 10-second demo yields 300 frames. Even annotating just 5 key frames per camera per demo produces 150,000 images for a 10K-demo dataset, a $75,000–$375,000 line item at these rates. Be selective.
  • Contact event timestamps: Marking grasp contact, lift, transport, and release phases in each trajectory. Budget $0.10–$0.50 per trajectory for a skilled annotator.
  • Language instruction labels: For VLM fine-tuning, each episode needs a natural language description of the task variant. $0.05–$0.20 per episode with a template + human review system.
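
A minimal sketch for totaling the per-episode label costs above; the rate table mirrors the quoted ranges, and segmentation is deliberately left out because it is priced per image, not per episode:

```python
# Minimal sketch: annotation budget from the per-episode rates above.
# Rates are (low, high) USD per episode; swap in your vendor's quote.

PER_EPISODE_RATES = {
    "success_failure": (0.05, 0.25),
    "contact_events":  (0.10, 0.50),
    "language_label":  (0.05, 0.20),
}

def annotation_budget(n_episodes: int, label_types: list[str]) -> tuple[float, float]:
    low  = sum(PER_EPISODE_RATES[t][0] for t in label_types) * n_episodes
    high = sum(PER_EPISODE_RATES[t][1] for t in label_types) * n_episodes
    return low, high

lo, hi = annotation_budget(10_000, ["success_failure", "language_label"])
print(f"${lo:,.0f}-${hi:,.0f}")   # -> $1,000-$4,500 for 10K episodes
```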

Compute and Infrastructure Costs

  • Storage: A single demonstration episode in HDF5 format (3 cameras at 30Hz, 10 seconds, joint states, actions) runs 50–500 MB depending on resolution. A 10,000-demo dataset therefore occupies 500 GB–5 TB. At AWS S3's $0.023/GB/month, budget $12–$115/month ongoing (reproduced in the sketch after this list).
  • Preprocessing compute: Resizing, color normalization, action filtering, and episode validation for 10K demos requires ~50–200 GPU-hours on an A100 or equivalent. At $2–3/hour on Lambda Labs or AWS, that is $100–$600 per dataset processing run.
  • Data pipeline engineering: Building and maintaining a robust ingestion, validation, and versioning pipeline requires 2–4 weeks of engineering time. At $150–$250/hour for a senior ML engineer, this is a $12,000–$40,000 one-time investment that teams routinely forget to budget.
  • Model training runs: Training an ACT or diffusion policy on 10K demos takes 8–48 hours on 4× A100 GPUs. At $12/hour for the cluster, that is $96–$576 per training run. Expect 10–30 training runs during development.
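
The storage and training arithmetic above reduces to two one-liners; a minimal sketch using the S3 and cluster rates quoted in the list (the quoted $12–$115/month uses decimal gigabytes):

```python
# Minimal sketch: recurring storage cost and per-run training cost,
# reproducing the arithmetic in the list above.

def monthly_storage_usd(n_demos: int, mb_per_demo: float,
                        usd_per_gb_month: float = 0.023) -> float:
    # S3 standard-tier rate quoted above, decimal GB (1 GB = 1000 MB)
    return n_demos * mb_per_demo / 1000 * usd_per_gb_month

def training_run_usd(hours: float, cluster_usd_per_hour: float = 12.0) -> float:
    # 4x A100 cluster rate quoted above
    return hours * cluster_usd_per_hour

print(f"${monthly_storage_usd(10_000, 50):.0f}-"
      f"${monthly_storage_usd(10_000, 500):.0f}/month storage")
# -> $12-$115/month for a 10K-demo dataset at 50-500 MB/episode
print(f"${training_run_usd(8):.0f}-${training_run_usd(48):.0f} per training run")
# -> $96-$576 per run; multiply by the 10-30 development runs above
```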

Hidden Costs

  • QA rejection waste: Expect 20–40% of demos to fail QA for new tasks (wrong object placement, operator error, hardware glitches). You pay full collection cost for rejected demos, so a 20% rejection rate means 1.25× the operator time per accepted demo, and 40% means 1.67× (see the sketch after this list).
  • Hardware failure and maintenance: Robot joints, gripper fingers, and cable harnesses fail. Budget 5–10% of hardware cost annually for repairs and consumables.
  • Calibration time: Camera extrinsic calibration, robot kinematic calibration, and workspace registration take 2–4 hours per setup change. For a 5-station lab running 3 object sets, this is 30–60 hours/month of unproductive operator time.
  • Task redesign iterations: The first version of your data collection protocol almost always produces data that does not train a working policy. Budget 2–4 weeks of iteration on task specification, fixture design, and operator instructions before collecting your "real" dataset.
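
The QA rejection item compounds with everything else: you pay for every attempt but keep only the accepted demos, so the effective per-demo cost is the attempt cost divided by yield. A minimal sketch:

```python
# Minimal sketch: effective cost per *accepted* demo under QA rejection.
# Rejection rates are the 20-40% range quoted above.

def cost_per_accepted_demo(cost_per_attempt: float, rejection_rate: float) -> float:
    yield_rate = 1.0 - rejection_rate
    return cost_per_attempt / yield_rate

for rejection in (0.20, 0.40):
    cost = cost_per_accepted_demo(10.0, rejection)   # $10/attempt example
    print(f"{rejection:.0%} rejection -> ${cost:.2f} per accepted demo")
# 20% rejection turns a $10 attempt into $12.50 (1.25x operator time);
# 40% turns it into $16.67 (1.67x)
```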

Total Cost Per Demonstration

| Task Type | Demos/Day/Operator | Operator Cost/Demo | Hardware (Amortized) | Annotation | Total/Demo |
|---|---|---|---|---|---|
| Simple pick-and-place | 80–120 | $4–$6 | $2–$5 | $0.10–$0.25 | $6–$11 |
| Varied grasps (L2) | 40–70 | $8–$14 | $3–$8 | $0.25–$1.00 | $11–$23 |
| Tool use / two-step (L3) | 20–40 | $14–$28 | $5–$12 | $0.50–$2.00 | $20–$42 |
| Contact-rich assembly (L4) | 10–25 | $22–$56 | $8–$20 | $1.00–$3.00 | $31–$79 |
| Bimanual / deformable (L5) | 5–15 | $37–$112 | $15–$40 | $2.00–$5.00 | $54–$157 |
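
To budget a full program from this table, multiply the per-demo components by your target dataset size and apply the QA overhead from the previous section. A minimal sketch with hypothetical mid-range L3 inputs:

```python
# Minimal sketch: total program cost for a target number of *accepted*
# demos, combining the table's per-demo components with QA rejection
# overhead. Input values are illustrative mid-range L3 figures.

def program_cost(n_demos: int, operator: float, hardware: float,
                 annotation: float, rejection_rate: float = 0.25) -> float:
    per_accepted = (operator + hardware + annotation) / (1.0 - rejection_rate)
    return n_demos * per_accepted

# Example: 10,000 accepted tool-use (L3) demos at mid-range table rates.
print(f"${program_cost(10_000, operator=21.0, hardware=8.0, annotation=1.25):,.0f}")
# -> $403,333, before pipeline engineering, training runs, and storage
```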

Compare these figures with SVRC's data services pricing to determine whether outsourcing delivers cost savings for your specific task type and volume.