Google Robot Benchmark

Real-world manipulation evaluation. 700+ tasks, multiple robot embodiments.

Overview

The Google Robot Benchmark evaluates policies on real physical robots across 700+ tasks. It supports WidowX and other robot embodiments. Reported metrics include success rate, multi-task performance, and language grounding. The benchmark has been used to evaluate OpenVLA, RT-X, and related models.
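
Success rate is usually computed per task as successful trials divided by attempted trials, then averaged across tasks. The sketch below is a minimal illustration, not the benchmark's own tooling. It assumes that each trial is logged as a hypothetical `Trial(embodiment, task, success)` record and that aggregate scores are unweighted means over tasks. The benchmark's actual aggregation may differ, and the task names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluation episode (hypothetical record format, not the benchmark's)."""
    embodiment: str
    task: str
    success: bool


def per_task_success(trials):
    """Return {(embodiment, task): success rate} over all logged trials."""
    counts = defaultdict(lambda: [0, 0])  # [successes, attempts] per (embodiment, task)
    for t in trials:
        c = counts[(t.embodiment, t.task)]
        c[0] += int(t.success)
        c[1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


def mean_success_by_embodiment(trials):
    """Unweighted mean of per-task success rates, grouped by embodiment (assumed aggregation)."""
    by_emb = defaultdict(list)
    for (emb, _task), rate in per_task_success(trials).items():
        by_emb[emb].append(rate)
    return {emb: sum(rates) / len(rates) for emb, rates in by_emb.items()}


# Hypothetical trials for illustration only.
trials = [
    Trial("WidowX", "put carrot on plate", True),
    Trial("WidowX", "put carrot on plate", False),
    Trial("WidowX", "stack blocks", True),
    Trial("WidowX", "stack blocks", True),
]
print(mean_success_by_embodiment(trials))  # {'WidowX': 0.75}
```

Averaging per task first keeps tasks that happen to have more trials from dominating the score. Pooling all trials instead would give a trial-weighted rate, which can differ when trial counts are uneven.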

Key Results

InternVLA-M1: 71.7% success rate on WidowX; 76–81% on other embodiments
OpenVLA: outperforms RT-2-X by 16.5% absolute task success rate across 29 tasks
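
A gain reported "in percent" can mean an absolute difference in percentage points or a relative improvement over the baseline. The two can differ substantially. The sketch below computes both. The rates in it are hypothetical and are not results from OpenVLA, RT-2-X, or any other model. The helper names are illustrative.

```python
def absolute_gain(new_rate: float, baseline_rate: float) -> float:
    """Difference in percentage points; rates are fractions in [0, 1]."""
    return (new_rate - baseline_rate) * 100.0


def relative_gain(new_rate: float, baseline_rate: float) -> float:
    """Improvement expressed as a percentage of the baseline rate."""
    if baseline_rate == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new_rate - baseline_rate) / baseline_rate * 100.0


# Hypothetical success rates, chosen only to show the difference.
baseline, new = 0.40, 0.60
print(round(absolute_gain(new, baseline), 1))  # 20.0
print(round(relative_gain(new, baseline), 1))  # 50.0
```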

Related

BridgeData: WidowX dataset
OpenVLA: model evaluation
