The Sim-to-Real Gap

Simulation is appealing for robot learning: it is cheap, parallelizable, and risk-free. But the sim-to-real gap — the difference between simulated and real-world physics and perception — has limited simulation's practical utility for manipulation to a narrower set of scenarios than the research literature sometimes suggests.

The gap has three distinct sources, each requiring different mitigation strategies:

  • Visual gap: Simulated images use rasterized textures, synthetic lighting, and idealized material properties. Real images have complex global illumination, specular reflections, dust, motion blur, and depth noise that simulators do not reproduce accurately.
  • Dynamics gap: Rigid-body simulators (MuJoCo, Isaac Lab, PyBullet) use simplified contact models that do not accurately capture real-world friction, compliance, and contact dynamics. This gap is most severe for contact-rich manipulation tasks (peg insertion, assembly, deformable object handling).
  • Sensor noise gap: Real depth cameras (RealSense D435i, ZED 2) produce structured noise, dropouts on shiny/dark surfaces, and temporal flickering, all of which are difficult to reproduce in simulation. RGB cameras have fixed-pattern noise, lens distortion, and ISP artifacts.
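One way to narrow this last gap is to inject a crude depth-noise model into simulated observations so the policy never sees "clean" depth. The sketch below adds depth-dependent Gaussian noise (stereo depth error grows roughly with the square of distance) plus random dropouts that mimic holes on shiny or dark surfaces. The function name and the default constants are illustrative, not calibrated to any specific camera.

```python
import random

def add_depth_noise(depth, dropout_p=0.02, noise_k=0.001, seed=None):
    """Apply a crude RealSense-style noise model to a depth map (metres).

    depth      : list of rows of depth values; 0.0 marks invalid pixels
    dropout_p  : probability a valid pixel becomes a dropout (hole)
    noise_k    : noise scale; std-dev grows ~ z^2, as in stereo depth error
    """
    rng = random.Random(seed)
    noisy = []
    for row in depth:
        out_row = []
        for z in row:
            if z <= 0.0 or rng.random() < dropout_p:
                out_row.append(0.0)          # dropout / invalid pixel
            else:
                sigma = noise_k * z * z      # error grows quadratically with depth
                out_row.append(max(0.0, z + rng.gauss(0.0, sigma)))
        noisy.append(out_row)
    return noisy
```

Applied per frame during simulated rollouts, this forces the policy to tolerate holes and jitter instead of exploiting perfect simulated depth.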

Domain Randomization

Domain randomization (DR) addresses the visual and dynamics gaps by training the policy on a large distribution of simulated environments, hoping that the real world falls within this distribution. The policy must learn to generalize across the randomized factors rather than memorizing any single simulated environment.

  • Visual DR: Randomize object textures (uniform color noise, random ImageNet textures), lighting (random positions, colors, intensities), background (random images from web datasets), and camera pose (small perturbations around nominal). Typical ranges: lighting intensity ±50%, camera position ±3 cm, backgrounds drawn from a pool of roughly 10,000 images.
  • Dynamics DR: Randomize object mass (±30% of nominal), friction coefficients (±50%), robot joint damping (±20%), and contact stiffness (±40%). These ranges are empirically determined — too wide and the policy becomes overly conservative; too narrow and it overfits to nominal simulation.
  • Mass randomization for grasp tasks: A ±30% mass perturbation is sufficient for most rigid-body grasping tasks. For tasks sensitive to inertia (throwing, dynamic manipulation), ±20% mass combined with ±30% inertia tensor randomization is more appropriate.
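The dynamics-DR ranges above can be packaged as a sampler that draws one randomized configuration per episode. This is a minimal sketch using the percentages from the bullets; the parameter names are placeholders and are not tied to any specific simulator's API.

```python
import random

# Illustrative ranges taken from the text: each factor is scaled by a value
# drawn uniformly from [1 - r, 1 + r] around nominal.
DR_RANGES = {
    "mass_scale":      0.30,   # +/-30% object mass
    "friction_scale":  0.50,   # +/-50% friction coefficients
    "damping_scale":   0.20,   # +/-20% joint damping
    "stiffness_scale": 0.40,   # +/-40% contact stiffness
}

def sample_dynamics(nominal, rng=None):
    """Return one randomized dynamics configuration.

    `nominal` maps parameter names ("mass", "friction", ...) to nominal
    values; each gets an independent uniform multiplicative perturbation.
    """
    rng = rng or random.Random()
    sampled = {}
    for key, value in nominal.items():
        r = DR_RANGES[key + "_scale"]
        sampled[key] = value * rng.uniform(1.0 - r, 1.0 + r)
    return sampled
```

Calling this once at the start of each simulated episode (and writing the sampled values into the simulator) gives the policy a fresh dynamics draw every rollout, which is the standard way dynamics DR is applied in practice.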

Photorealistic Rendering

An alternative to DR is closing the visual gap directly with photorealistic rendering. NVIDIA Replicator (in Isaac Sim), Blender with Cycles, and custom NeRF-based renderers can produce images close enough to real camera output that the policy does not need to generalize across a large randomized distribution.

  • NVIDIA Isaac Lab / Replicator: Path-traced rendering with PBR materials. Integrates with Isaac Lab RL framework. Rendering cost: 10–100× slower than rasterization. Best for generating evaluation images rather than large training sets.
  • BlenderProc: Python-based Blender pipeline for dataset generation. Excellent for generating labeled synthetic datasets for perception models (object detection, pose estimation). Less useful for end-to-end policy training due to rendering speed.
  • NeRF-based rendering: Capture the real environment with a NeRF (nerfstudio, Gaussian Splatting) and render new viewpoints/configurations. Combines realistic real-world appearance with the flexibility of simulation. Emerging approach — not yet production-ready for most teams.

What Transfers Well vs. Poorly

| Task Category | Sim-to-Real Transfer | Why | Mitigation If Needed |
|---|---|---|---|
| Free-space arm motion | Excellent | No contact, kinematics accurate | None required |
| Coarse top-down grasping | Good | Large contact surfaces, forgiving | Light visual DR |
| Pick-and-place (rigid, known objects) | Good | Low contact complexity | Visual DR + pose randomization |
| Precision peg insertion (±2 mm) | Poor | Contact model inaccuracy dominates | Real data required |
| Deformable object manipulation | Very poor | FEM simulation errors large | Mostly real data required |
| Bimanual coordination | Moderate | Timing depends on contact | Hybrid: sim init + real fine-tune |
| Cloth/fabric folding | Very poor | Soft-body sim diverges quickly | Real data required |

Hybrid Approaches That Work

The most cost-effective approach for most teams is a hybrid: use simulation for the portion of the task where sim transfer works, and collect real demonstrations for the contact-rich or precision-sensitive portions.

  • Sim pre-training + real fine-tuning: Pre-train the policy on 10,000 simulated demonstrations of approximate task behavior (approach, coarse grasp). Fine-tune on 500–1,000 real demonstrations of the precision phase. This roughly 10:1 to 20:1 sim-to-real ratio is the most widely validated approach in manipulation research.
  • Sim for curriculum: Use simulation to generate easy training examples at the beginning of training (object placed directly under gripper, large tolerance) and progressively increase difficulty. This curriculum accelerates early policy learning, reducing the real data needed to reach performance thresholds.
  • Sim for data augmentation: Use simulation to generate additional viewpoints, lighting conditions, and object configurations that supplement a real demonstration dataset. Particularly effective for improving visual generalization without additional real collection cost.
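The curriculum idea above reduces to a schedule that interpolates task parameters from easy to hard as training progresses. A minimal linear-schedule sketch, where the parameter names (spawn radius, grasp tolerance) are illustrative examples rather than fixed conventions:

```python
def curriculum_params(step, total_steps, easy, hard):
    """Linearly interpolate task parameters from easy to hard over training.

    `easy` and `hard` map parameter names to their start and end values,
    e.g. object spawn radius grows while grasp tolerance shrinks. The
    fraction is clamped so training past `total_steps` stays at `hard`.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return {k: easy[k] + frac * (hard[k] - easy[k]) for k in easy}

# Example: start with the object nearly under the gripper and a loose
# tolerance, end with a wide spawn area and a tight tolerance.
EASY = {"spawn_radius_m": 0.02, "tolerance_m": 0.020}
HARD = {"spawn_radius_m": 0.25, "tolerance_m": 0.002}
```

Linear schedules are the simplest choice; success-rate-triggered advancement (only increase difficulty once the policy clears a threshold on the current stage) is a common refinement.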

Simulator Comparison

| Simulator | Physics Engine | Rendering | Speed (FPS) | Best For |
|---|---|---|---|---|
| Isaac Lab (NVIDIA) | PhysX 5 | RTX path-tracing | 1,000–10,000 (parallel) | RL at scale, GPU clusters |
| MuJoCo | Custom | Basic OpenGL | 500–5,000 | Research, accurate contact |
| PyBullet | Bullet 3 | Basic OpenGL | 100–1,000 | Easy setup, prototyping |
| Webots | ODE | Basic | 100–500 | Education, ROS integration |
| Gazebo / gz-sim | ODE/Bullet/DART | OGRE | 50–200 | ROS ecosystem integration |

For teams running RL training that requires millions of environment steps, Isaac Lab on a multi-GPU cluster is the current standard. For academic research or debugging, MuJoCo remains the most-cited simulator for contact-rich manipulation.

If you need real demonstration data to complement your simulation program, the SVRC data services team can scope a targeted real-data collection campaign for the portions of your task where simulation fails.