Robosuite: Standardized Manipulation Benchmarks on MuJoCo
Built by the ARISE Initiative (Stanford, UT Austin, ASU), Robosuite is among the most widely used manipulation benchmarks in robot learning. It ships nine core tabletop tasks on MuJoCo, drop-in support for Mobile ALOHA and robomimic, controller abstractions from joint torque to operational space, and an MIT license that makes it a default choice for lab-wide imitation-learning pipelines.
What is Robosuite?
Robosuite is an open-source modular simulation framework and benchmark for robot manipulation learning, originally introduced in the SURREAL project at Stanford and later developed by the ARISE Initiative — a joint effort by Stanford, UT Austin, and Arizona State University. Released in 2018 and continuously maintained since, it is among the most widely cited benchmarks in published imitation-learning and offline-RL manipulation papers.
The framework sits on top of MuJoCo and adds three things that plain MuJoCo does not give you: a standardized task interface with reward functions, success criteria, and reset logic; a robot abstraction layer that lets the same task run on a Franka, Sawyer, UR5e, Kinova, IIWA, Jaco, or the Mobile ALOHA bimanual platform with a single argument change; and a controller abstraction layer that exposes joint-torque, joint-position, operational-space, and inverse-kinematics control under a unified API. Together, these abstractions make Robosuite the natural environment for training imitation-learning policies that need to generalize across robot embodiments.
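The controller abstraction can be pictured as a single interface that maps one policy action vector to joint torques, with each controller interpreting the action differently. The sketch below is a conceptual illustration of that pattern, not Robosuite's actual classes — the names `Controller`, `JointPositionController`, `JointTorqueController`, and the PD gains are ours:

```python
from abc import ABC, abstractmethod

class Controller(ABC):
    """Unified interface: every controller turns an action vector into torques."""
    @abstractmethod
    def torques(self, action, q, qd):
        ...

class JointPositionController(Controller):
    """PD control toward an action interpreted as target joint positions."""
    def __init__(self, kp=50.0, kd=10.0):
        self.kp, self.kd = kp, kd

    def torques(self, action, q, qd):
        return [self.kp * (a - qi) - self.kd * qdi
                for a, qi, qdi in zip(action, q, qd)]

class JointTorqueController(Controller):
    """Pass-through: the action already is a torque command."""
    def torques(self, action, q, qd):
        return list(action)

# The same policy action produces different low-level behavior per controller.
q, qd = [0.0, 0.5], [0.0, 0.0]
pd_out = JointPositionController().torques([0.2, 0.5], q, qd)
raw_out = JointTorqueController().torques([0.2, 0.5], q, qd)
print(pd_out, raw_out)
```

Because the policy only ever sees the unified action interface, swapping the controller (or the robot behind it) leaves training code untouched — the core idea behind the `suite.make(..., robots=...)` single-argument change.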
Robosuite's tight integration with robomimic — the ARISE Initiative's companion library for imitation learning — is one of its defining features. Robomimic provides BC, BC-RNN, BC-Transformer, IQL, and CQL baselines with reference checkpoints on all core Robosuite tasks, and the two libraries are designed to hand data and configs back and forth cleanly. Diffusion Policy, ACT, and many VLA evaluation papers use Robosuite-robomimic as their default benchmarking stack.
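The hand-off between the two libraries works because robomimic consumes demonstrations in a fixed HDF5 layout — one group per demo, with `data/demo_i/obs/<key>` arrays and a `data/demo_i/actions` array. A minimal sketch of that layout, using nested dicts in place of HDF5 groups (the observation key follows robomimic's naming convention; the rollout data itself is made up):

```python
def rollout_to_robomimic_layout(demos):
    """Arrange (obs, action) trajectories into robomimic's data/demo_i layout."""
    data = {}
    for i, demo in enumerate(demos):
        obs_keys = demo[0][0].keys()
        data[f"demo_{i}"] = {
            # per-key observation arrays, stacked over time
            "obs": {k: [obs[k] for obs, _ in demo] for k in obs_keys},
            # one action row per timestep
            "actions": [act for _, act in demo],
            "num_samples": len(demo),
        }
    return {"data": data}

# Two tiny fake demos with a single low-dim observation key.
demo = [({"robot0_eef_pos": [0.1, 0.0, 0.9]}, [0.0] * 7),
        ({"robot0_eef_pos": [0.1, 0.0, 0.8]}, [0.1] * 7)]
layout = rollout_to_robomimic_layout([demo, demo])
print(sorted(layout["data"]), layout["data"]["demo_0"]["num_samples"])
```

In practice the same structure is written to disk with `h5py`; any data collected in Robosuite and stored this way can be fed straight into a robomimic training config.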
Installation Quickstart
Robosuite installs in under two minutes. The framework pulls MuJoCo automatically as a dependency, so a single pip install produces a working setup:
# Install Robosuite (MuJoCo is pulled as a dependency)
pip install robosuite
# Verify installation with a zero-action rollout on Lift
python - <<'EOF'
import numpy as np
import robosuite as suite

env = suite.make("Lift", robots="Panda", has_renderer=False)
obs = env.reset()
low, high = env.action_spec            # per-dimension action bounds
for _ in range(100):
    obs, reward, done, info = env.step(np.zeros_like(low))
print("Robosuite ready, reward:", reward)
EOF
For imitation learning, add robomimic and the demonstration datasets from the ARISE hub. A minimal Robosuite + robomimic workflow — install, download demonstrations, train BC-Transformer — takes three commands:
pip install robomimic
# Download 200 human demonstrations for the Square peg task
python -m robomimic.scripts.download_datasets --tasks square
# Train BC-Transformer on those demos (~30 min on an RTX 4090)
python -m robomimic.scripts.train --config bc_transformer_square.json
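robomimic experiments are driven entirely by a JSON config. The fragment below is an illustrative sketch of the overall shape (top-level sections and a dataset path in robomimic's naming style); the exact keys and defaults in the shipped templates may differ:

```json
{
  "algo_name": "bc",
  "experiment": { "name": "bc_transformer_square" },
  "train": {
    "data": "datasets/square/ph/low_dim_v141.hdf5",
    "batch_size": 100,
    "num_epochs": 2000
  },
  "algo": {
    "transformer": { "enabled": true }
  }
}
```

Editing the `train.data` path is usually the only change needed to point the same algorithm at a different Robosuite task's dataset.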
Mobile ALOHA bimanual support is available as a first-class robot option — suite.make('TwoArmTransport', robots='ALOHA') — and includes teleoperation replay and action-chunking-compatible datasets out of the box.
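Two-arm environments expose a single concatenated action vector that the framework splits per arm, so a bimanual policy still emits one flat action. A conceptual sketch of that split (the 7-dimensional per-arm block is an illustrative assumption, not the exact spec of any particular controller):

```python
def split_bimanual_action(action, per_arm_dim=7):
    """Split a concatenated two-arm action into (arm0, arm1) sub-actions."""
    assert len(action) == 2 * per_arm_dim, "expected one block per arm"
    return action[:per_arm_dim], action[per_arm_dim:]

# First arm's block, then the second arm's block, in one flat vector.
action = [0.1] * 7 + [0.2] * 7
arm0, arm1 = split_bimanual_action(action)
print(len(arm0), len(arm1))
```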
Supported Robots and Tasks
Robosuite supports eight tabletop manipulator arms: Franka Panda, Rethink Sawyer, Universal Robots UR5e, Kinova Gen3, Kuka IIWA, Jaco, Baxter (bimanual), and ALOHA (bimanual). The robot abstraction layer runs deep: reward functions, observation spaces, and controller parameters all adapt to the selected robot automatically, which is why many cross-embodiment imitation-learning papers use Robosuite as their evaluation substrate.
The core benchmark consists of nine standardized tasks that have become the reference set for published manipulation results: Lift (single-object lifting), Stack (vertical stacking), Pick-Place (ordered bin sorting across four object classes), Nut Assembly (peg-in-hole with square and round nuts), Door (lever opening), Wipe (surface cleaning with variable dirt distributions), Two-Arm Transport (bimanual handover), Two-Arm Peg-In-Hole (bimanual assembly), and Two-Arm Lift (bimanual cooperative lifting). Each task ships with dense and sparse reward variants, configurable object sets, and deterministic seeds.
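Dense rewards in tasks like Lift are typically staged: a smooth reaching term, a grasp bonus, then a fixed maximum once the object is lifted. The sketch below illustrates that shaping pattern; the thresholds and weights are illustrative choices, not Robosuite's exact constants:

```python
import math

def staged_lift_reward(eef_pos, cube_pos, grasped, cube_height,
                       table_height=0.8, lift_margin=0.04, sparse=False):
    """Illustrative Lift-style reward: reach -> grasp -> lift stages."""
    lifted = cube_height > table_height + lift_margin
    if sparse:
        return 1.0 if lifted else 0.0
    dist = math.dist(eef_pos, cube_pos)
    reward = 1.0 - math.tanh(10.0 * dist)   # smooth reaching term in [0, 1]
    if grasped:
        reward += 0.25                       # grasp bonus
    if lifted:
        reward = 2.25                        # max out once the cube is lifted
    return reward

# Gripper on the cube, grasped, cube above the lift threshold.
r = staged_lift_reward([0.0, 0.0, 0.85], [0.0, 0.0, 0.85],
                       grasped=True, cube_height=0.9)
print(r)
```

The sparse variant simply drops the shaping and pays out only on success, which is why the dense/sparse toggle matters for RL baselines but rarely for imitation learning.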
Beyond the core tasks, Robosuite's MJCF-authoring system makes it straightforward to build custom tabletop scenes: tool use, pouring, insertion with clearance variation, articulated-object manipulation, and language-conditioned multi-stage sequences are all common community extensions. LIBERO, RoboCasa, and MimicGen are all built on top of Robosuite and extend it to long-horizon, kitchen-scale, and procedurally generated scenarios respectively.
Benchmarks on Robosuite
The Robosuite core-9 task set is among the most reproduced manipulation benchmarks in the literature. Reference numbers on Lift, Can, Square, and Transport come from the robomimic paper (Mandlekar et al., CoRL 2021) and are the baseline that nearly every subsequent imitation-learning paper reports against. Diffusion Policy (Chi et al., RSS 2023) published headline numbers on these robomimic tasks, BC-Transformer's reference results come from the robomimic suite itself, and ACT-style policies (Zhao et al., RSS 2023) are routinely re-benchmarked on the same tasks by later papers.
For VLA evaluation, Robosuite underpins a growing share of the standardized test beds. LIBERO-130 is built on Robosuite's task-authoring system and uses the same observation and action spaces. MimicGen, which automatically generates large-scale demonstrations from a few human examples, runs end-to-end on Robosuite. The Open X-Embodiment and RoboCasa datasets both use Robosuite-compatible action specs, which simplifies cross-dataset policy evaluation.
Pros and Cons
Strengths. MIT license, minimal install (one pip command), eight pre-calibrated robot arms with automatic reward adaptation, nine standardized core tasks with published baselines, tight integration with robomimic, LIBERO, MimicGen, and Mobile ALOHA, and a deep library of community-contributed extensions. Because Robosuite inherits MuJoCo's contact physics, it is genuinely accurate for tabletop manipulation.
Weaknesses. Scope is intentionally limited to tabletop manipulation — no legged locomotion, no mobile manipulation (outside a few community forks), no photorealistic rendering. The controller abstraction is powerful but adds a learning curve for teams coming from raw MuJoCo. The codebase assumes MuJoCo as the backend; porting to Isaac Sim or Genesis requires community forks. CPU-bound stepping caps throughput compared with GPU-parallel frameworks like Isaac Lab.
When to Pick Robosuite
Choose Robosuite when your experiments are tabletop manipulation and you want standardized tasks rather than building your own. It is the right default for imitation-learning research, VLA evaluation on a Franka or bimanual ALOHA, robomimic baseline reproduction, and any workflow where publishing apples-to-apples comparisons against prior papers matters. The MIT license and one-command install also make it the easiest simulator to adopt for a class, tutorial, or hackathon.
Pick MuJoCo directly when you want full authoring control rather than a benchmark harness. Pick Isaac Sim when photorealistic rendering or ROS 2 integration is the bottleneck. Pick Isaac Lab when you need GPU-parallel RL at 4,096+ envs for locomotion or large-scale training. See our MuJoCo vs Isaac Sim 2026 guide for the broader comparison.
Get a Custom Robosuite Environment
SVRC builds custom Robosuite environments for imitation-learning and VLA-evaluation teams: new tabletop tasks with robomimic-compatible demonstration datasets, cross-embodiment suites that evaluate the same policy across multiple arms, and Mobile ALOHA bimanual scenes with teleoperation replay. Every delivery ships with robomimic configs, deterministic seeds, and a matching teleoperation dataset from our Mountain View lab.
Related Links
- RL Environments hub — compare 8 major simulators.
- MuJoCo — the physics engine Robosuite builds on.
- Isaac Lab — GPU-parallel RL alternative for large-scale training.
- MuJoCo vs Isaac Sim 2026 — head-to-head simulator comparison.
- Compatible hardware in the store — Franka, Mobile ALOHA, UR5e.
- Custom teleoperation datasets for imitation learning.