The Core Tradeoff
Every robotics team building a manipulation policy eventually faces the same question: do we collect our own demonstration data, or do we contract it out? The answer is not universal — it depends on your hardware, team capabilities, timeline, and budget. Getting it wrong in either direction costs months and six-figure sums.
This framework is designed for ML engineers and robotics leads who need to make a defensible, data-driven recommendation to their organization. We cover the true all-in cost of in-house collection, three clear signals that favor outsourcing, three signals that favor in-house, and a hybrid model that many mature teams use in practice.
The True Cost of In-House Data Collection
Teams routinely underestimate in-house data collection costs by 2–3× because they only count hardware and operator wages. The real cost stack includes equipment acquisition, operator training and wages, annotation, compute infrastructure, and a significant QA and rejection budget.
- Robot hardware: A 6-DOF collaborative arm (Universal Robots UR5e, Kinova Gen3, or Franka FR3) runs $15,000–$50,000. Bimanual platforms like the ALOHA system reach $30,000–$80,000 once you add grippers, mounting hardware, and safety enclosures.
- Teleoperation system: A basic leader-follower rig (e.g., the ALOHA-style puppeting setup used to train ACT policies) with a low-cost leader arm adds $2,000–$8,000. VR-based teleoperation with a Meta Quest 3 or HTC Vive Tracker setup runs $2,000–$25,000 depending on haptic feedback requirements.
- Cameras: Budget $200–$600 each for Intel RealSense D435i or ZED 2 stereo cameras. A standard 3-camera rig (2 fixed + 1 wrist) costs $1,000–$2,500 in hardware alone, plus mounts, cables, and lighting rigs.
- Operator wages: Skilled teleoperation operators earn $25–$45/hour in the US. Fully loaded (benefits, training, supervision overhead), budget $35–$65/hour. A typical operator completes 30–80 demonstrations per day depending on task complexity.
- Operator training: Plan for 3–5 days of onboarding per operator ($1,000–$3,000 per person in lost productivity + trainer time) before they reach production quality.
- Annotation: Even with good teleoperation, many datasets need post-hoc labeling — success/failure labels, object segmentation masks, or contact event timestamps. Budget $0.05–$2.50 per frame depending on task complexity.
- Compute and infrastructure: Storing, preprocessing, and versioning HDF5 episode files runs $0.50–$5.00 per trajectory at scale. A 10,000-demo dataset can accumulate $5,000–$50,000 in cloud storage and compute costs.
- QA rejection rate: For new tasks with inexperienced operators, expect 20–40% of collected demonstrations to be rejected during quality review. Budget for this waste explicitly.
Putting it together: an all-in cost of $50–$200 per demonstration is typical for new in-house programs. That means a 5,000-demo dataset can cost $250,000–$1,000,000 when you count everything.
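The line items above fold into a simple per-demo estimator. The sketch below is illustrative only: the function and every input value are hypothetical mid-range picks from the ranges listed above, not measured figures from any real program.

```python
def all_in_cost_per_demo(
    hardware_amortized: float,   # rig cost spread over the campaign, $/demo
    operator_hourly: float,      # fully loaded operator rate, $/hour
    demos_per_day: float,        # demos collected per operator-day
    hours_per_day: float,        # paid operator hours per day
    annotation_per_demo: float,  # post-hoc labeling cost, $/demo
    storage_per_demo: float,     # storage + preprocessing cost, $/trajectory
    rejection_rate: float,       # fraction of demos rejected in QA
) -> float:
    """Estimate the all-in cost of one *accepted* demonstration."""
    labor = operator_hourly * hours_per_day / demos_per_day
    per_collected = hardware_amortized + labor + annotation_per_demo + storage_per_demo
    # Rejected demos still consume budget; spread their cost over accepted ones.
    return per_collected / (1.0 - rejection_rate)

# Hypothetical mid-range inputs (e.g. a $50K rig amortized over 5,000 demos,
# annotation at roughly $0.10/frame for ~300-frame episodes):
print(round(all_in_cost_per_demo(
    hardware_amortized=10.0,
    operator_hourly=50.0,
    demos_per_day=40,
    hours_per_day=8,
    annotation_per_demo=30.0,
    storage_per_demo=2.0,
    rejection_rate=0.30,
)))  # → 74, i.e. ~$74 per accepted demo, inside the $50–$200 range
```

Note how the 30% rejection rate alone inflates the per-demo cost by over 40% — it is the line item new programs most often leave out of their budget.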
3 Signals You Should Outsource Data Collection
Outsourcing to a specialized data provider makes sense when any of the following signals apply:
- High task diversity (>20 distinct scenes or SKUs): When your policy must generalize across many object types, backgrounds, or kitchen/warehouse environments, you need breadth of data that is expensive to achieve in a single lab. A provider with multiple collection sites and pre-trained operators can deliver this breadth in weeks rather than months.
- Compressed timeline (<8 weeks to data delivery): Staffing, training, and ramping an in-house operation takes 4–8 weeks before first production demos. If you need 2,000+ demonstrations in under 8 weeks, outsourcing is the only viable path — providers like SVRC have operators and infrastructure already running.
- Team lacking teleoperation experience: Teleoperation quality is highly skill-dependent. A team that has never run a data collection campaign will spend 4–6 weeks on tooling, calibration, and operator training before producing policy-quality data. This is opportunity cost that a focused ML team cannot afford during an early product cycle.
3 Signals You Should Build In-House
- Proprietary hardware or task secrecy: If your robot platform is pre-production, uses a novel end-effector, or if your task involves trade-secret workflows, you cannot send hardware or procedures to an external lab. In-house collection is the only option.
- Ongoing, continuous dataset curation: Policies that run in production need continuous improvement — collecting failure cases, adding new SKUs, handling distribution shift. This is a long-term operational function, not a one-time project, and it is more cost-effective to build in-house when the program runs for 12+ months.
- Infrastructure budget already committed (>$500K): If your organization has already committed capital to compute infrastructure, a dedicated robot lab, and full-time robotics staff, the marginal cost of data collection shifts dramatically in favor of in-house. The fixed cost is sunk; only variable costs matter at that point.
The Hybrid Model
The most effective approach for teams past their initial pilot is a hybrid model: outsource breadth, build depth.
Concretely, this means contracting a data provider to collect a large, diverse "foundation" dataset — 5,000–20,000 demonstrations across all task variants and environments. The in-house team then collects a smaller, high-quality "fine-tuning" set (200–1,000 demos) on the exact hardware and in the exact deployment environment where the robot will operate.
This hybrid approach typically reduces total data cost by 30–50% versus pure in-house, while achieving better policy performance than pure outsourcing because the fine-tuning set captures real deployment distribution. It also preserves any trade-secret workflows in the fine-tuning phase, which stays internal.
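As a back-of-envelope check on that 30–50% figure, consider a toy comparison under assumed rates. The $45/demo outsourced and $80/demo in-house rates are hypothetical picks from the ranges in this article, and the 9,500/500 split is illustrative, not a recommendation.

```python
def campaign_cost(n_outsourced: int, n_inhouse: int,
                  outsourced_rate: float, inhouse_rate: float) -> float:
    """Total data cost for a mix of outsourced and in-house demonstrations."""
    return n_outsourced * outsourced_rate + n_inhouse * inhouse_rate

OUTSOURCED, INHOUSE = 45.0, 80.0  # hypothetical $/demo rates

# Pure in-house: all 10,000 demos collected internally.
pure_inhouse = campaign_cost(0, 10_000, OUTSOURCED, INHOUSE)  # $800,000

# Hybrid: outsourced foundation set + small in-house fine-tuning set.
hybrid = campaign_cost(9_500, 500, OUTSOURCED, INHOUSE)       # $467,500

savings = 1 - hybrid / pure_inhouse
print(f"hybrid saves {savings:.0%}")  # → hybrid saves 42%
```

With these assumed rates the hybrid split lands at a 42% saving — inside the 30–50% range — and the result is dominated by the rate gap, so the exact foundation/fine-tuning split matters less than keeping the fine-tuning set small.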
Cost Comparison: SVRC vs. DIY vs. Academic
| Approach | Cost/Demo | Time to 5K Demos | Quality Control | Scalability |
|---|---|---|---|---|
| SVRC Data Services | $20–$60 | 3–6 weeks | Automated + human QA | High (fleet of operators) |
| DIY (new program) | $80–$200 | 3–6 months | Manual, ad hoc | Low (bottlenecked by ops) |
| DIY (mature program) | $30–$80 | 6–12 weeks | Systematic QA pipeline | Medium |
| Academic collaboration | $5–$20 | 3–9 months | Variable | Very low |
The academic route is cheapest per demo but has the longest and least predictable timelines. SVRC data services sit at a price point that beats DIY for any team that has not yet amortized a full in-house operation.
Use our data services page to get a custom quote for your task and volume, or explore the SVRC platform to understand how collected data flows into policy training.