COLOSSEUM
A simulation benchmark that systematically evaluates robot manipulation policies under 14 types of environmental perturbation, including lighting changes, table texture changes, distractor objects, and camera position shifts. COLOSSEUM tests the robustness and generalization of learned policies by measuring how much task performance degrades under each perturbation type.
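One plausible way to quantify the degradation is the relative drop in task success rate against an unperturbed baseline. The sketch below assumes that definition; the function names, perturbation labels, and success rates are hypothetical placeholders, not the benchmark's API or reported results.

```python
from typing import Dict


def relative_degradation(baseline_success: float, perturbed_success: float) -> float:
    """Fractional drop in success rate relative to the unperturbed baseline."""
    if baseline_success <= 0.0:
        raise ValueError("baseline success rate must be positive")
    return (baseline_success - perturbed_success) / baseline_success


def degradation_report(baseline_success: float,
                       perturbed: Dict[str, float]) -> Dict[str, float]:
    """Map each perturbation name to its relative degradation."""
    return {
        name: relative_degradation(baseline_success, rate)
        for name, rate in perturbed.items()
    }


# Placeholder success rates for illustration only (assumed values).
example_rates = {
    "lighting": 0.40,
    "table_texture": 0.30,
    "distractor_objects": 0.20,
    "camera_position": 0.10,
}
report = degradation_report(0.50, example_rates)

# The perturbation with the largest relative drop is the one the policy
# is least robust to under this metric.
worst = max(report, key=report.get)
```

Normalizing by the baseline makes policies with different unperturbed success rates comparable; an absolute difference in success rate would be an equally reasonable choice when baselines are similar.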