The Persistent Sim-to-Real Gap

Simulation has come a long way. Isaac Lab simulates rigid-body physics at thousands of steps per second. MuJoCo MPC closes manipulation loops in milliseconds. Yet every team that has tried to ship a contact-rich policy trained purely in simulation has hit the same wall: the real world does not behave the way the simulator predicts, and the policy fails in ways that are difficult to diagnose and expensive to fix.

The core issue is contact modeling. When a robot finger touches a soap bar, the contact stiffness, friction coefficient, and surface micro-geometry all interact to determine whether the bar slides, rolls, or stays put. Simulation engines approximate this with a handful of scalar parameters. Real silicone soap on a wet ABS surface has a friction coefficient that varies with sliding speed, normal force, and surface wetness in ways that no physics engine currently models accurately. The policy trained in simulation learns to exploit the simulated contact model, not the real one.
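To make the mismatch concrete, here is a minimal sketch comparing the constant Coulomb coefficient most engines assume against a Stribeck-style model in which friction falls from static toward kinetic as sliding speed rises and softens with normal load. All function names and coefficients are illustrative, not taken from any engine or from measured soap data.

```python
import math

def coulomb_mu(_v, _fn):
    """Constant friction coefficient, as a typical engine assumes."""
    return 0.4

def stribeck_mu(v, fn, mu_s=0.7, mu_k=0.3, v_s=0.01, load_soft=0.002):
    """Illustrative Stribeck-style model: friction decays from the static
    value mu_s toward the kinetic value mu_k as sliding speed v rises,
    and softens slightly with normal force fn."""
    speed_term = mu_k + (mu_s - mu_k) * math.exp(-(v / v_s) ** 2)
    return speed_term / (1.0 + load_soft * fn)

# Near zero sliding speed the two models disagree sharply; a policy tuned
# to the constant model will misjudge the force needed to initiate a slide.
for v in (0.0, 0.005, 0.05):
    print(f"v={v:.3f}  coulomb={coulomb_mu(v, 5.0):.2f}  "
          f"stribeck={stribeck_mu(v, 5.0):.2f}")
```

The point of the sketch is not the specific curve but the shape of the error: the gap between the two models is largest exactly in the low-speed, grasp-initiation regime where contact-rich policies spend most of their time.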

Why Human Demonstrations Capture the Right Distribution

When a skilled operator teleoperates a robot arm to pick up a soap bar, they are not following a computed trajectory. They are using the same sensorimotor heuristics that evolution spent millions of years optimizing. They instinctively adjust approach angle when they see a rounded surface. They feel (through haptic feedback or visual cues) when a grasp is unstable and correct before failure. They find grasp configurations that are not globally optimal but are locally stable — exactly the kind of solutions that transfer to deployment.

This means teleoperation demonstrations implicitly encode two things that simulation cannot easily provide: the correct task distribution (the set of object poses, grasp strategies, and action sequences that actually work in the physical world) and implicit constraint satisfaction (a skilled operator never attempts a grasp that their experience tells them will fail — the failure modes are filtered out before data collection even begins).

Three Case Studies Where Simulation Fails

  • Soap bar grasping: Six teams across industry reported that policies trained in Isaac Sim with default physics-material friction parameters achieved >80% success in simulation but <40% on real hardware. The contact stiffness model was wrong. Switching to real teleoperation data brought success rates above 85% within 300 demonstrations.
  • Cable insertion: Deformable geometry is essentially unsolved in real-time physics engines. A USB-C cable's deformation under finger contact depends on its specific braid tension, jacket stiffness, and core compliance. Simulation policies for cable routing achieve roughly 20% success on real hardware; teleoperation-trained policies with 500 demos achieve 70-80%.
  • Liquid pouring: Fluid dynamics simulation at the scale of a cup of water is computationally tractable with SPH or grid-based methods, but the interaction between fluid, cup rim geometry, and surface tension is complex enough that sim-trained policies systematically over-pour. A 200-demo teleoperation dataset produced policies that outperformed 50K-step RL policies trained in simulation.

What Simulation Actually Gets Right

This is not an argument against simulation — it is an argument for using simulation for what it is good at. Three areas where sim genuinely helps:

  • Free-space motion planning: Collision-free trajectory generation in known environments transfers well from sim to real. The physics that matter (rigid body kinematics) are modeled accurately.
  • Diverse scene generation: Simulation can generate thousands of object poses, table configurations, and environment layouts that would take weeks to set up physically. This diversity is valuable for pre-training visual representations.
  • Infinite data scale for coarse behaviors: Getting a robot to roughly orient toward a target, approach an object, or navigate a hallway can be bootstrapped from millions of simulated episodes. The coarse behavior transfers even if the fine-grained contact policy does not.
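The scene-diversity point is easy to see in code. The sketch below samples randomized tabletop layouts; the category list, pose ranges, and lighting field are all hypothetical placeholders, standing in for whatever your simulator's scene API actually exposes.

```python
import random

def sample_scene(n_objects=3, table=(0.6, 0.4), seed=None):
    """Sample one randomized tabletop layout: object identities, planar
    poses on the table, and a lighting intensity. Ranges are illustrative."""
    rng = random.Random(seed)
    categories = ["soap_bar", "mug", "usb_cable", "box"]
    objects = [
        {
            "category": rng.choice(categories),
            "x": rng.uniform(0.0, table[0]),
            "y": rng.uniform(0.0, table[1]),
            "yaw_deg": rng.uniform(0.0, 360.0),
        }
        for _ in range(n_objects)
    ]
    return {"objects": objects, "light_intensity": rng.uniform(0.3, 1.0)}

# A thousand distinct layouts in well under a second -- the kind of
# diversity that would take weeks to stage physically.
scenes = [sample_scene(seed=i) for i in range(1000)]
```

Seeding each sample makes every layout reproducible, which matters when you later want to replay the exact scenes where a pre-trained representation underperformed.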

The Right Mental Model: Coarse in Sim, Fine in Real

The most effective approach we have seen is a two-phase strategy. In phase one, use simulation to train a broad prior: the robot learns to approach objects, estimate grasp candidates, and execute rough pick motions across thousands of object categories. This phase can run overnight on a single A100 and produces a policy that gets within 10cm of the right grasp ~90% of the time.

In phase two, collect 200-500 real teleoperation demonstrations on the specific task. Fine-tune the simulation-pretrained model on this real data. The combination typically outperforms either sim-only or real-only approaches, and it reduces the real data requirement by 5-10× compared to training from scratch.
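The two-phase recipe can be sketched in miniature. The example below is a toy illustration, not a real training stack: a linear policy is "pretrained" by behavior cloning on plentiful synthetic data whose dynamics are slightly wrong, then fine-tuned at a lower learning rate on a small "real" dataset whose observation-to-action mapping differs. All arrays and coefficients are invented for the sketch.

```python
import numpy as np

def bc_step(w, obs, act, lr):
    """One behavior-cloning gradient step on a linear policy w."""
    pred = obs @ w
    grad = obs.T @ (pred - act) / len(obs)
    return w - lr * grad

rng = np.random.default_rng(0)

# Phase 1: "sim" pre-training -- plentiful data, slightly wrong dynamics.
true_sim = np.array([[1.0], [0.5]])
obs_sim = rng.normal(size=(10_000, 2))
act_sim = obs_sim @ true_sim
w = np.zeros((2, 1))
for _ in range(200):
    w = bc_step(w, obs_sim, act_sim, lr=0.1)

# Phase 2: fine-tune on a small "real" dataset (300 demos) whose mapping
# differs, starting from the sim prior and using a lower learning rate.
true_real = np.array([[1.2], [0.3]])
obs_real = rng.normal(size=(300, 2))
act_real = obs_real @ true_real
for _ in range(200):
    w = bc_step(w, obs_real, act_real, lr=0.02)
```

Because fine-tuning starts from the sim prior rather than from scratch, the 300-sample phase only has to close the sim-to-real residual, not learn the whole mapping; that is the mechanism behind the 5-10x reduction in real-data requirements.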

SVRC Observations from Data Collection Work

Across dozens of data collection projects at SVRC, we have consistently observed that data quality beats data quantity for contact tasks. A collection of 300 expert demonstrations from trained operators outperforms 1,500 demonstrations from novice operators on tasks involving contact, insertion, or surface-following. The expert operators instinctively avoid demonstrations that will confuse the policy — they use consistent, clean motions that the learning algorithm can actually extract signal from.

We have also observed that returns diminish after the first 200 demonstrations on most L2-complexity tasks (structured pick-place): those demonstrations capture the core behavior. The next 300 demonstrations improve robustness to novel object poses. Beyond 500, further improvement requires introducing new object instances and environmental variations, not simply more of the same.

The Hybrid Strategy in Practice

Our recommended approach for teams starting a new manipulation task: (1) build or download a simulation environment for coarse behavior pre-training, (2) run 50K-100K sim steps to initialize the policy, (3) collect 300-500 real teleoperation demonstrations through a structured data collection protocol, (4) fine-tune, (5) evaluate in 3 novel conditions you have not trained on. This pipeline typically produces deployment-ready policies in 4-6 weeks rather than the 3-6 months required for sim-only approaches.
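Step (5) deserves its own discipline: report success per held-out condition, not just one aggregate number, so a policy that fails on one novel condition cannot hide behind strong results on the others. A minimal sketch, with hypothetical condition names and outcomes:

```python
from statistics import mean

def summarize_eval(results):
    """Aggregate per-condition episode outcomes into success rates.
    `results` maps condition name -> list of episode outcomes (bool)."""
    rates = {cond: mean(outcomes) for cond, outcomes in results.items()}
    rates["overall"] = mean(
        v for outcomes in results.values() for v in outcomes
    )
    return rates

# Hypothetical evaluation across three conditions never seen in training.
results = {
    "novel_object": [True, True, False, True, True],
    "novel_pose":   [True, False, True, True, False],
    "novel_light":  [True, True, True, False, True],
}
print(summarize_eval(results))
```

Keeping the per-condition breakdown also tells you where the next batch of demonstrations should go: collect against the weakest condition, not uniformly.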

For teams who want to start collecting real demonstration data today, SVRC's data collection services provide trained operators, calibrated hardware, and a structured quality pipeline — so you get clean, policy-ready data without building the infrastructure yourself.