Why Sim Metrics Are Not Enough

Simulation success rates often do not transfer to the real world. Lighting changes, object variations, calibration drift, and sensor noise all affect how a policy performs on physical hardware. A rigorous real-world evaluation protocol is essential for publishable results and deployment decisions.

The Evaluation Protocol

Run a minimum of 50 trials per condition (100 is preferred for tighter confidence intervals). Vary object instances, starting positions, lighting conditions, and operators. Report the success rate with a 95% confidence interval computed using the Wilson score interval. Record and review every failure episode, and document environment conditions for reproducibility.

  • 50+ trials per condition
  • At least 3 object variations
  • 2+ lighting conditions
  • Wilson score confidence intervals
  • Video recording of all trials
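A minimal Python sketch of the Wilson score interval for a success rate follows. The function name `wilson_interval` is illustrative, and z = 1.96 corresponds to the 95% level. The example also shows why 100 trials are preferred: 40/50 and 80/100 are both an 80% success rate, but the interval is noticeably wider at n = 50.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 -> 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    # Center is pulled from p_hat toward 0.5, more strongly for small n.
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    # Clamp to [0, 1] to absorb floating-point round-off at 0% or 100%.
    return max(0.0, center - half_width), min(1.0, center + half_width)


for successes, trials in [(40, 50), (80, 100)]:
    lo, hi = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: {successes / trials:.0%} success, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With these inputs the 50-trial interval spans roughly 0.67 to 0.89 (about 22 points wide), while the 100-trial interval spans roughly 0.71 to 0.87 (about 16 points). Wilson is preferred over the simple normal (Wald) approximation because the Wald interval can extend outside [0, 1] and collapses to zero width at 0% or 100% success. If statsmodels is already available, `statsmodels.stats.proportion.proportion_confint(count, nobs, method="wilson")` computes the same interval.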

Common Pitfalls

The most common pitfalls are cherry-picking easy starting configurations, failing to report failure modes, using the same object instance for every trial, and running evaluations immediately after tuning, which overfits the results to the current conditions. SVRC's evaluation services provide standardized, reproducible testing environments.
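One way to guard against cherry-picking and single-object evaluation is to fix a randomized, balanced trial schedule before the first run. The sketch below assumes a tabletop task in which the object's start position is sampled uniformly from a rectangle in a table frame; the object IDs, lighting names, workspace bounds, and the `make_trial_schedule` helper itself are hypothetical placeholders, not an established tool.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedTrial:
    index: int                     # run order within the session
    object_id: str                 # which physical object instance to use
    lighting: str                  # named lighting condition
    start_xy: tuple[float, float]  # start position (m) in an assumed table frame


def make_trial_schedule(objects, lightings, n_trials, x_range, y_range, seed=0):
    """Build a balanced, randomized trial list before any trial is run."""
    if len(objects) < 3:
        raise ValueError("use at least 3 object variations")
    if len(lightings) < 2:
        raise ValueError("use at least 2 lighting conditions")
    rng = random.Random(seed)  # fixed seed so the schedule can be reproduced and reported
    combos = list(itertools.product(objects, lightings))
    # Cycle through every object/lighting pair so their counts differ by at most one.
    cells = [combos[i % len(combos)] for i in range(n_trials)]
    rng.shuffle(cells)  # randomize run order to reduce drift and order effects
    schedule = []
    for i, (obj, light) in enumerate(cells):
        # Sample the start position up front, so "easy" poses cannot be picked later.
        xy = (rng.uniform(*x_range), rng.uniform(*y_range))
        schedule.append(PlannedTrial(i, obj, light, xy))
    return schedule


schedule = make_trial_schedule(
    ["mug_a", "mug_b", "bowl_a"], ["overhead", "window"], 50,
    x_range=(0.35, 0.65), y_range=(-0.20, 0.20), seed=7,
)
for trial in schedule[:3]:
    print(trial)
```

Logging the seed with the results lets others regenerate the exact schedule. If changing the lighting between every trial is impractical, a reasonable variant is to block trials by lighting condition and randomize only within each block.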