The Sim-to-Real Gap
Simulation is appealing for robot learning: it is cheap, parallelizable, and risk-free. But the sim-to-real gap — the difference between simulated and real-world physics and perception — has limited simulation's practical utility for manipulation to a narrower set of scenarios than the research literature sometimes suggests.
The gap has three distinct sources, each requiring different mitigation strategies:
- Visual gap: Simulated images use rasterized textures, synthetic lighting, and idealized material properties. Real images have complex global illumination, specular reflections, dust, motion blur, and depth noise that simulators do not reproduce accurately.
- Dynamics gap: Rigid-body simulators (MuJoCo, Isaac Lab, PyBullet) use simplified contact models that do not accurately capture real-world friction, compliance, and contact dynamics. This gap is most severe for contact-rich manipulation tasks (peg insertion, assembly, deformable object handling).
- Sensor noise gap: Real depth cameras (RealSense D435i, ZED 2) produce structured noise, dropouts on shiny/dark surfaces, and temporal flickering that are difficult to reproduce in simulation. RGB cameras have fixed-pattern noise, lens distortion, and ISP artifacts.
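One common mitigation for the sensor noise gap is to corrupt clean simulated depth images with a hand-built noise model before feeding them to the policy. The sketch below is illustrative only: the noise scale, dropout probability, and the quadratic depth-dependence are placeholder assumptions, not calibrated values for any specific camera.

```python
import numpy as np

def add_depth_noise(depth, rng, dropout_prob=0.02, noise_scale=0.001):
    """Apply an illustrative depth-camera noise model to a clean
    simulated depth image (in meters). All parameters are placeholders,
    not calibrated to a real sensor."""
    noisy = depth.copy()
    # Depth-dependent Gaussian noise: error on stereo depth cameras
    # grows roughly quadratically with distance.
    noisy += rng.normal(0.0, noise_scale * depth**2)
    # Random dropouts, mimicking missing returns on shiny/dark surfaces.
    mask = rng.random(depth.shape) < dropout_prob
    noisy[mask] = 0.0  # zero commonly encodes "no depth" in sensor output
    return noisy

rng = np.random.default_rng(0)
clean = np.full((64, 64), 1.5)  # flat scene 1.5 m from the camera
noisy = add_depth_noise(clean, rng)
```

A policy trained on images corrupted this way is less likely to treat clean simulated depth as the only input distribution it will ever see.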
Domain Randomization
Domain randomization (DR) addresses the visual and dynamics gaps by training the policy on a large distribution of simulated environments, hoping that the real world falls within this distribution. The policy must learn to generalize across the randomized factors rather than memorizing any single simulated environment.
- Visual DR: Randomize object textures (uniform color noise, random ImageNet textures), lighting (random positions, colors, intensities), background (random images from web datasets), and camera pose (small perturbations around nominal). Typical ranges: lighting intensity ±50%, camera position ±3 cm, backgrounds drawn from a pool of roughly 10,000 random images.
- Dynamics DR: Randomize object mass (±30% of nominal), friction coefficients (±50%), robot joint damping (±20%), and contact stiffness (±40%). These ranges are empirically determined — too wide and the policy becomes overly conservative; too narrow and it overfits to nominal simulation.
- Mass randomization for grasp tasks: A ±30% mass perturbation is sufficient for most rigid-body grasping tasks. For tasks sensitive to inertia (throwing, dynamic manipulation), ±20% mass combined with ±30% inertia tensor randomization is more appropriate.
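The dynamics DR ranges above can be sketched as a simple per-episode sampler. The nominal values below are illustrative assumptions (not from any specific robot); only the half-widths come from the ranges stated in the text.

```python
import numpy as np

# Nominal physical parameters: illustrative placeholder values.
NOMINAL = {
    "mass": 0.5,               # kg
    "friction": 0.8,           # dimensionless
    "joint_damping": 0.1,      # N*m*s/rad
    "contact_stiffness": 1e4,  # N/m
}

# Half-widths from the text: mass ±30%, friction ±50%,
# joint damping ±20%, contact stiffness ±40%.
RANGES = {
    "mass": 0.30,
    "friction": 0.50,
    "joint_damping": 0.20,
    "contact_stiffness": 0.40,
}

def sample_dynamics(rng):
    """Draw one randomized parameter set, uniform around nominal.
    Called once per episode so each rollout sees a different world."""
    return {k: v * rng.uniform(1 - RANGES[k], 1 + RANGES[k])
            for k, v in NOMINAL.items()}

rng = np.random.default_rng(42)
params = sample_dynamics(rng)
```

Resampling at episode boundaries (rather than per step) is the usual choice, since real-world parameters are constant within an episode.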
Photorealistic Rendering
An alternative to DR is closing the visual gap directly with photorealistic rendering. NVIDIA Replicator (in Isaac Sim), Blender with Cycles, and custom NeRF-based renderers can produce images close enough to real camera output that the policy does not need to generalize across a large distribution.
- NVIDIA Isaac Lab / Replicator: Path-traced rendering with PBR materials. Integrates with Isaac Lab RL framework. Rendering cost: 10–100× slower than rasterization. Best for generating evaluation images rather than large training sets.
- BlenderProc: Python-based Blender pipeline for dataset generation. Excellent for generating labeled synthetic datasets for perception models (object detection, pose estimation). Less useful for end-to-end policy training due to rendering speed.
- NeRF-based rendering: Capture the real environment with a NeRF (nerfstudio, Gaussian Splatting) and render new viewpoints/configurations. Combines realistic real-world appearance with the flexibility of simulation. Emerging approach — not yet production-ready for most teams.
What Transfers Well vs. Poorly
| Task Category | Sim-to-Real Transfer | Why | Mitigation If Needed |
|---|---|---|---|
| Free-space arm motion | Excellent | No contact, kinematics accurate | None required |
| Coarse top-down grasping | Good | Large contact surfaces, forgiving | Light visual DR |
| Pick-and-place (rigid, known objects) | Good | Low contact complexity | Visual DR + pose randomization |
| Precision peg insertion (±2 mm) | Poor | Contact model inaccuracy dominates | Real data required |
| Deformable object manipulation | Very poor | FEM simulation errors large | Mostly real data required |
| Bimanual coordination | Moderate | Timing depends on contact | Hybrid: sim init + real fine-tune |
| Cloth/fabric folding | Very poor | Soft-body sim diverges quickly | Real data required |
Hybrid Approaches That Work
The most cost-effective approach for most teams is a hybrid: use simulation for the portion of the task where sim transfer works, and collect real demonstrations for the contact-rich or precision-sensitive portions.
- Sim pre-training + real fine-tuning: Pre-train the policy on 10,000 simulated demonstrations of approximate task behavior (approach, coarse grasp). Fine-tune on 500–1,000 real demonstrations of the precision phase. This roughly 10:1 to 20:1 sim-to-real ratio is among the most widely used recipes in manipulation research.
- Sim for curriculum: Use simulation to generate easy training examples at the beginning of training (object placed directly under gripper, large tolerance) and progressively increase difficulty. This curriculum accelerates early policy learning, reducing the real data needed to reach performance thresholds.
- Sim for data augmentation: Use simulation to generate additional viewpoints, lighting conditions, and object configurations that supplement a real demonstration dataset. Particularly effective for improving visual generalization without additional real collection cost.
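During real fine-tuning, a practical detail is how sim and real demonstrations are mixed in each batch: the scarce real data is usually upweighted so it is not drowned out by the much larger sim set. The sketch below is a minimal illustration; the function name, data format, and the 50/50 mixing fraction are assumptions, not conventions from any specific framework.

```python
import random

def mixed_batch(sim_demos, real_demos, batch_size, real_fraction, rng):
    """Sample a training batch mixing sim and real demonstrations.
    `real_fraction` oversamples the scarcer real data relative to its
    share of the combined dataset. Illustrative sketch only."""
    n_real = round(batch_size * real_fraction)
    batch = rng.choices(real_demos, k=n_real)
    batch += rng.choices(sim_demos, k=batch_size - n_real)
    rng.shuffle(batch)  # avoid a fixed real/sim ordering within the batch
    return batch

rng = random.Random(0)
sim = [("sim", i) for i in range(10_000)]    # large pre-training set
real = [("real", i) for i in range(500)]     # small fine-tuning set
batch = mixed_batch(sim, real, batch_size=64, real_fraction=0.5, rng=rng)
```

Without this upweighting, uniform sampling over the combined pool would give real data only about a 5% share of each batch.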
Simulator Comparison
| Simulator | Physics Engine | Rendering | Speed (FPS) | Best For |
|---|---|---|---|---|
| Isaac Lab (NVIDIA) | PhysX 5 | RTX path-tracing | 1,000–10,000 (parallel envs) | RL at scale, GPU clusters |
| MuJoCo | Custom | Basic OpenGL | 500–5,000 | Research, accurate contact |
| PyBullet | Bullet 3 | Basic OpenGL | 100–1,000 | Easy setup, prototyping |
| Webots | ODE | Basic | 100–500 | Education, ROS integration |
| Gazebo / gz-sim | ODE/Bullet/DART | OGRE | 50–200 | ROS ecosystem integration |
For teams running RL training that requires millions of environment steps, Isaac Lab on a multi-GPU cluster is the current standard. For academic research or debugging, MuJoCo remains the most-cited simulator for contact-rich manipulation.
If you need real demonstration data to complement your simulation program, the SVRC data services team can scope a targeted real-data collection campaign for the portions of your task where simulation fails.