What to Look For in a Robot Data Provider
The robot training data market is young and underregulated. Unlike traditional AI data labeling, collecting robot demonstrations requires physical equipment, skilled operators, and task-specific knowledge that most data labeling companies do not have. Evaluating providers therefore requires different criteria than you would use for image labeling or text annotation services.
Four dimensions matter most:
- Robot variety: a policy trained on a single arm type generalizes poorly.
- Operator quality: smooth, consistent demonstrations train better than jerky or hesitant ones.
- Data format flexibility: receiving data in a format incompatible with your training framework wastes weeks of conversion time.
- Legal clarity: unambiguous terms on IP ownership and confidentiality.
Start by defining your requirements precisely: which robot arm(s), task description with success criteria, environment setup, number of demonstrations, target format, and timeline. A provider who asks these questions before quoting is more credible than one who quotes immediately.
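It helps to capture that spec in a structured form you can attach to the RFQ, so every provider prices the same job. A minimal sketch in Python; the field names here are ours for illustration, not any provider's intake schema:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    """Illustrative RFQ spec -- field names are hypothetical, not a provider schema."""
    robot_arms: list[str]     # e.g. ["Franka Panda", "UR5e"]
    gripper: str              # end-effector the task requires
    task: str                 # plain-language task description
    success_criteria: str     # how an accepted demonstration is judged
    environment: str          # tabletop layout, object set, background
    num_demos: int            # demonstrations requested
    target_formats: list[str] # e.g. ["HDF5", "LeRobot", "RLDS"]
    deadline_weeks: int       # delivery timeline

spec = TaskSpec(
    robot_arms=["Franka Panda"],
    gripper="parallel-jaw, 85 mm stroke",
    task="Pick a mug from a cluttered bin and hang it on a rack",
    success_criteria="Mug hangs on rack peg; no dropped objects",
    environment="60x90 cm tabletop, 5 distractor objects, plain backdrop",
    num_demos=200,
    target_formats=["HDF5", "LeRobot"],
    deadline_weeks=6,
)
```

Writing success criteria down this concretely also makes a rejection-rate guarantee (see the contract section below) enforceable, because "success" is no longer a matter of interpretation.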
Key Evaluation Questions to Ask Every Provider
- Which specific robot models do you operate? Get a list with arm models, gripper options, and available robot hands. Vague answers ("various industrial arms") are a red flag.
- What is your demonstration rejection rate, and how is it measured? Target: <15% rejection. A good provider tracks rejection reasons (operator error, hardware fault, task infeasible) separately.
- Do you perform annotation? Some tasks benefit from annotated contact events, success/failure labels, or language descriptions per episode. Ask whether annotation is included or priced separately.
- What data format do you deliver, and can you export to HDF5, LeRobot Parquet, and RLDS? Requiring format conversion on your side adds 1–3 weeks of engineering work. See our format guide.
- Can I see 5–10 sample episodes before committing? Any serious provider has a demo dataset. Review it for smoothness, camera framing, gripper timing, and episode-level consistency (a scripted pass is sketched after this list).
- What are your NDA and IP terms? Specifically: does your contract prohibit the provider from using your task data to train their own models?
- What is your environment replication process? If your task requires a specific tabletop layout, object set, or background, how do they replicate it? Do they require you to ship objects, or can they source proxies?
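When sample episodes arrive, a short scripted pass complements eyeballing the videos: jerky or hesitant teleoperation shows up clearly in trajectory statistics. A minimal sketch assuming HDF5 delivery with per-episode joint positions and timestamps; the file name, key names ("joint_positions", "timestamps"), and threshold are assumptions to adapt to the provider's actual schema:

```python
import h5py
import numpy as np

JERK_THRESHOLD = 50.0  # rad/s^3 -- illustrative cutoff; tune per robot

def mean_abs_jerk(qpos: np.ndarray, dt: float) -> float:
    """Mean absolute jerk of a (T, DoF) joint-position trajectory."""
    vel = np.gradient(qpos, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    jerk = np.gradient(acc, dt, axis=0)
    return float(np.mean(np.abs(jerk)))

# File layout and key names are assumptions -- check the provider's
# schema documentation before running this against real samples.
with h5py.File("sample_episodes.hdf5", "r") as f:
    for episode in f:
        qpos = f[episode]["joint_positions"][:]
        t = f[episode]["timestamps"][:]
        dt = float(np.mean(np.diff(t)))
        score = mean_abs_jerk(qpos, dt)
        verdict = "review" if score > JERK_THRESHOLD else "ok"
        print(f"{episode}: mean |jerk| = {score:.1f} rad/s^3 [{verdict}]")
```

High mean jerk across episodes usually indicates hesitant or corrective teleoperation, which is exactly the operator-quality problem described above.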
Pricing Models in the Robot Data Market
Pricing varies enormously depending on task complexity, robot type, required demonstrations, and quality level. Understand the model before comparing quotes — a low per-demo price with a high rejection rate may cost more in practice than a higher per-demo price with quality guarantees.
| Pricing Model | Typical Range | Best For | Watch Out For |
|---|---|---|---|
| Per-demonstration | $15–$200/demo | Well-defined tasks with clear success criteria | Rejection rate not included in price; quality variance |
| Per-hour (operator) | $150–$500/hr | Exploratory tasks, novel robot setups | Efficiency varies; no guaranteed demos per hour |
| Project-based (fixed) | $5K–$100K+ | Large defined dataset with full spec | Scope creep if task spec is loose; no iteration |
| Volume tier | $80/demo → $25/demo at 500+ | High-volume production datasets | Only works if first batch quality is verified |
| Subscription / retainer | $3K–$15K/month | Ongoing data collection pipelines | Overkill for one-time research; good for flywheels |
For a typical manipulation research project (single task, 200 demonstrations, bimanual, medium complexity), expect $8K–$25K at per-demo pricing or $15K–$40K at hourly rates depending on setup time. Factor in: environment setup (charged once), operator training on your task (1–3 hours typically), quality review, and format conversion.
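To compare quotes on an equal footing, fold rejection handling into the per-demo price. A minimal sketch with illustrative numbers showing how a cheap quote that bills rejected episodes can end up costing more than a pricier quote with a rejection guarantee:

```python
def effective_cost(price_per_demo: float, demos_needed: int,
                   rejection_rate: float, rejects_billed: bool) -> float:
    """Total spend to obtain `demos_needed` accepted episodes."""
    attempts = demos_needed / (1.0 - rejection_rate)  # expected collection runs
    billed = attempts if rejects_billed else demos_needed  # guarantee: pay only for accepted
    return price_per_demo * billed

# Illustrative numbers only -- plug in the quotes you actually receive.
cheap = effective_cost(45.0, 200, rejection_rate=0.40, rejects_billed=True)
guaranteed = effective_cost(60.0, 200, rejection_rate=0.15, rejects_billed=False)
print(f"$45/demo, 40% rejection, rejects billed: ${cheap:,.0f}")       # ~$15,000
print(f"$60/demo, rejects not billed:            ${guaranteed:,.0f}")  # $12,000
```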
SVRC prices manipulation data at $25–$80 per demonstration depending on task complexity and robot configuration, with volume discounts at 500+ demos. All prices include QA review, metadata, and multi-format export.
Contract Terms: What Must Be in Your Agreement
The following terms are non-negotiable. Walk away from any provider who will not include them.
- Full IP transfer: "All demonstration data, including raw sensor logs, video, joint trajectories, and derived annotations, are the exclusive intellectual property of [Your Company]." No "license" language — you must own it outright.
- No training on your data: "Provider shall not use Client's task specifications, demonstration data, or derived data to train, fine-tune, or benchmark any Provider model or third-party model." This clause is often missing in early contracts.
- Rejection rate guarantee: Define the rejection criteria in the contract. "Provider guarantees a maximum 15% rejection rate; rejected episodes are not billed." This aligns incentives.
- Data retention and deletion: After delivery, the provider should delete all copies within 30 days. For sensitive tasks, require a certificate of destruction as proof.
- Confidentiality covers task specification: Your task design may be proprietary. The NDA must cover not just data but the task description, environment setup, and object list.
Red Flags: When to Walk Away
- Cannot disclose rejection rate — means their quality process is immature or rejection is high.
- Proprietary format only — "We deliver in our platform format" with no standard export means you are locked to their analysis tools or face expensive conversion.
- No sample episodes available — a provider who has collected robot data at scale has samples. No samples = no track record.
- Fewer than 2 distinct robot types — a provider running only one arm type cannot offer the diversity needed for generalization; this is acceptable only if you specifically need data from that single platform.
- Operator credentials unavailable — "trained teleoperators" without specifics means you cannot assess quality. Good providers describe operator selection, training process, and demonstrate operator-consistency metrics.
- Vague IP language — phrases like "non-exclusive license" or "Provider retains derivative rights" are unacceptable for proprietary task data.
- No task feasibility assessment — a credible provider will tell you if your task is too difficult, requires special hardware, or has a low expected success rate before you pay.
Provider Comparison
| Provider | Robot Variety | Format Output | IP Clarity | Typical Price/Demo | Best For |
|---|---|---|---|---|---|
| SVRC (us) | High (8+ arms, humanoids, hands) | HDF5, LeRobot, RLDS | Full transfer standard | $25–$80 | Research + commercial, flexible format |
| Scale AI Robotics | Medium (UR, xArm focus) | Custom + some standard | Good (enterprise) | $40–$150 | Large commercial contracts |
| DIY (internal) | Whatever you own | Your choice | N/A | $5–$15 (labor only) | Maximum control, if you have bandwidth |
| Academic lab partnership | Varies | Usually HDF5/custom | Requires explicit agreement | $0–$20 | Cost-sensitive, relationship-dependent |