COLOSSEUM

Large-scale real-robot manipulation benchmark. Diverse tasks and environments.

Overview

COLOSSEUM is a real-robot benchmark of diverse manipulation tasks spanning multiple environments. It is used to evaluate the generalization and robustness of vision-language-action (VLA) models and other policy models. BridgeVLA achieves a 64% success rate on it.
