What to Look For in a Robot Data Provider

The robot training data market is young and underregulated. Unlike traditional AI data labeling, robot demonstration data requires physical equipment, skilled operators, and task-specific knowledge that most data labeling companies do not have. Evaluating providers requires different criteria than you would use for image labeling or text annotation services.

Four dimensions matter most: robot variety (a policy trained on a single arm type generalizes poorly), operator quality (smooth, consistent demonstrations train better than jerky or hesitant ones), data format flexibility (receiving data in a format incompatible with your training framework wastes weeks of conversion time), and legal clarity on IP ownership and confidentiality.

Start by defining your requirements precisely: which robot arm(s), task description with success criteria, environment setup, number of demonstrations, target format, and timeline. A provider who asks these questions before quoting is more credible than one who quotes immediately.

Key Evaluation Questions to Ask Every Provider

  • Which specific robot models do you operate? Get a list with arm models, gripper options, and available robot hands. Vague answers ("various industrial arms") are a red flag.
  • What is your demonstration rejection rate, and how is it measured? Target: <15% rejection. A good provider tracks rejection reasons (operator error, hardware fault, task infeasible) separately.
  • Do you perform annotation? Some tasks benefit from annotated contact events, success/failure labels, or language descriptions per episode. Ask whether annotation is included or priced separately.
  • What data format do you deliver, and can you export to HDF5, LeRobot Parquet, and RLDS? Requiring format conversion on your side adds 1–3 weeks of engineering work. See our format guide.
  • Can I see 5–10 sample episodes before committing? Any serious provider has a demo dataset. Review it for smoothness, camera framing, gripper timing, and episode-level consistency.
  • What are your NDA and IP terms? Specifically: does your contract prohibit the provider from using your task data to train their own models?
  • What is your environment replication process? If your task requires a specific tabletop layout, object set, or background, how do they replicate it? Do they require you to ship objects, or can they source proxies?
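When reviewing sample episodes, it helps to script the basic checks rather than eyeball them. A minimal sketch in Python, assuming a hypothetical episode schema — the field names here are illustrative, not a standard, so adapt them to the provider's actual delivery format:

```python
# Minimal sanity check for delivered demonstration episodes.
# The field names (joint_positions, gripper_state, camera_frames,
# success) are illustrative assumptions, not a standard schema.
REQUIRED_FIELDS = {"joint_positions", "gripper_state", "camera_frames", "success"}

def check_episode(episode: dict) -> list[str]:
    """Return a list of problems found in one episode record."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - episode.keys()]
    traj = episode.get("joint_positions", [])
    frames = episode.get("camera_frames", [])
    # Streams should be time-aligned: one camera frame per control step.
    if traj and frames and len(traj) != len(frames):
        problems.append("trajectory and camera streams have different lengths")
    return problems

sample = {
    "joint_positions": [[0.0] * 7] * 3,      # 3 timesteps, 7-DoF arm
    "gripper_state": [0.0, 0.5, 1.0],
    "camera_frames": ["f0.png", "f1.png", "f2.png"],
    "success": True,
}
print(check_episode(sample))  # -> [] (no problems)
```

Running a script like this over a provider's sample batch surfaces missing labels or misaligned streams before you commit to a full order.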

Pricing Models in the Robot Data Market

Pricing varies enormously depending on task complexity, robot type, required demonstrations, and quality level. Understand the model before comparing quotes — a low per-demo price with a high rejection rate may cost more in practice than a higher per-demo price with quality guarantees.
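To compare quotes on equal footing, normalize to cost per accepted demonstration. A quick sketch with illustrative figures:

```python
def cost_per_accepted_demo(price: float, rejection_rate: float,
                           rejections_billed: bool) -> float:
    """Effective cost of one *usable* demonstration.

    If rejected episodes are billed, you pay for 1/(1 - r) attempts
    per accepted demo; otherwise the sticker price stands.
    """
    return price / (1.0 - rejection_rate) if rejections_billed else price

# "Cheap" quote: $30/demo, 30% rejection, rejections billed.
cheap = cost_per_accepted_demo(30.0, 0.30, rejections_billed=True)
# Pricier quote: $40/demo, 10% rejection, rejections not billed.
solid = cost_per_accepted_demo(40.0, 0.10, rejections_billed=False)
print(round(cheap, 2), solid)  # 42.86 40.0
```

In this example the lower sticker price costs nearly $3 more per usable demo, which is why the rejection-rate guarantee belongs in the contract, not just the sales deck.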

| Pricing Model | Typical Range | Best For | Watch Out For |
|---|---|---|---|
| Per-demonstration | $15–$200/demo | Well-defined tasks with clear success criteria | Rejection rate not included in price; quality variance |
| Per-hour (operator) | $150–$500/hr | Exploratory tasks, novel robot setups | Efficiency varies; no guaranteed demos per hour |
| Project-based (fixed) | $5K–$100K+ | Large defined dataset with full spec | Scope creep if task spec is loose; no iteration |
| Volume tier | $80/demo → $25/demo at 500+ | High-volume production datasets | Only works if first batch quality is verified |
| Subscription / retainer | $3K–$15K/month | Ongoing data collection pipelines | Overkill for one-time research; good for flywheels |

For a typical manipulation research project (single task, 200 demonstrations, bimanual, medium complexity), expect $8K–$25K at per-demo pricing or $15K–$40K at hourly rates depending on setup time. Factor in: environment setup (charged once), operator training on your task (1–3 hours typically), quality review, and format conversion.
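Those line items add up as a simple sum. A budgeting sketch with illustrative numbers drawn from the ranges above:

```python
def project_budget(n_demos: int, price_per_demo: float,
                   setup_fee: float = 0.0,
                   training_hours: float = 0.0,
                   hourly_rate: float = 0.0) -> float:
    """Rough project budget under per-demo pricing: demonstrations,
    plus one-time environment setup, plus operator training billed
    hourly. (QA review and format conversion are assumed bundled
    into price_per_demo here; itemize them if your quote separates
    them.)"""
    return n_demos * price_per_demo + setup_fee + training_hours * hourly_rate

# 200 demos at $60 each, $1,500 environment setup,
# 2 hours of operator training at $250/hr:
total = project_budget(200, 60.0, setup_fee=1500.0,
                       training_hours=2.0, hourly_rate=250.0)
print(total)  # 14000.0
```

Note that the one-time items ($2,000 here) are a small fraction at 200 demos but dominate small pilot batches, which is one reason per-demo quotes for tiny orders run high.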

SVRC prices manipulation data at $25–$80 per demonstration depending on task complexity and robot configuration, with volume discounts at 500+ demos. All prices include QA review, metadata, and multi-format export.

Contract Terms: What Must Be in Your Agreement

The following terms are non-negotiable. Walk away from any provider who will not include them.

  • Full IP transfer: "All demonstration data, including raw sensor logs, video, joint trajectories, and derived annotations, are the exclusive intellectual property of [Your Company]." No "license" language — you must own it outright.
  • No training on your data: "Provider shall not use Client's task specifications, demonstration data, or derived data to train, fine-tune, or benchmark any Provider model or third-party model." This clause is often missing in early contracts.
  • Rejection rate guarantee: Define the rejection criteria in the contract. "Provider guarantees a maximum 15% rejection rate; rejected episodes are not billed." This aligns incentives.
  • Data retention and deletion: After delivery, the provider should delete all copies within 30 days. For sensitive tasks, require a certificate of destruction in the contract.
  • Confidentiality covers task specification: Your task design may be proprietary. The NDA must cover not just data but the task description, environment setup, and object list.

Red Flags: When to Walk Away

  • Cannot disclose rejection rate — means their quality process is immature or rejection is high.
  • Proprietary format only — "We deliver in our platform format" with no standard export means you are locked to their analysis tools or face expensive conversion.
  • No sample episodes available — a provider who has collected robot data at scale has samples. No samples = no track record.
  • Fewer than 2 distinct robot types — a provider running only one arm type cannot offer the diversity needed for generalization unless you specifically need single-arm data.
  • Operator credentials unavailable — "trained teleoperators" without specifics means you cannot assess quality. Good providers describe operator selection, training process, and demonstrate operator-consistency metrics.
  • Vague IP language — phrases like "non-exclusive license" or "Provider retains derivative rights" are unacceptable for proprietary task data.
  • No task feasibility assessment — a credible provider will tell you if your task is too difficult, requires special hardware, or has a low expected success rate before you pay.

Provider Comparison

| Provider | Robot Variety | Format Output | IP Clarity | Typical Price/Demo | Best For |
|---|---|---|---|---|---|
| SVRC (us) | High (8+ arms, humanoids, hands) | HDF5, LeRobot, RLDS | Full transfer standard | $25–$80 | Research + commercial, flexible format |
| Scale AI Robotics | Medium (UR, xArm focus) | Custom + some standard | Good (enterprise) | $40–$150 | Large commercial contracts |
| DIY (internal) | Whatever you own | Your choice | N/A | $5–$15 (labor only) | Maximum control, if you have bandwidth |
| Academic lab partnership | Varies | Usually HDF5/custom | Requires explicit agreement | $0–$20 | Cost-sensitive, relationship-dependent |