Glossary

COLOSSEUM

A simulation benchmark that systematically evaluates robot manipulation policies under 14 types of environmental perturbation, including lighting changes, table texture changes, distractor objects, and camera position shifts. COLOSSEUM tests the robustness and generalization of learned policies by measuring how much task performance degrades under each perturbation type.
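One way to picture the per-perturbation measurement is as a relative drop in task success rate against an unperturbed baseline. The sketch below is illustrative only: the function names, the perturbation labels, and every number in it are made up for the example, and it is not taken from the COLOSSEUM codebase or its reported results.

```python
def relative_degradation(baseline_success: float, perturbed_success: float) -> float:
    """Fractional drop in success rate relative to the unperturbed baseline."""
    if baseline_success <= 0:
        raise ValueError("baseline success rate must be positive")
    return (baseline_success - perturbed_success) / baseline_success


def degradation_report(baseline_success: float, perturbed: dict[str, float]) -> dict[str, float]:
    """Map each perturbation name to its relative drop in success rate."""
    return {
        name: relative_degradation(baseline_success, rate)
        for name, rate in perturbed.items()
    }


# Hypothetical success rates, invented purely for illustration.
example_baseline = 0.80
example_perturbed = {
    "lighting": 0.60,
    "table_texture": 0.72,
    "distractor_objects": 0.40,
    "camera_pose": 0.20,
}

report = degradation_report(example_baseline, example_perturbed)

# List perturbations from most to least harmful.
for name, drop in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {drop:.0%} relative drop")
```

Normalizing by the baseline makes drops comparable across tasks with different starting success rates; an absolute difference in success rate would be an equally reasonable choice.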


Benchmark, Robot Learning
