CALVIN
A simulation benchmark for evaluating long-horizon, language-conditioned manipulation. CALVIN provides a robotic tabletop environment with 34 manipulation tasks (such as push, slide, open, close, pick, and place), each conditioned on a natural language instruction. Policies are evaluated on their ability to complete several tasks in sequence, following one language goal after another.
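
To make the idea of chained evaluation concrete, here is a minimal sketch. It assumes a chain is scored by how many instructions the policy completes before its first failure. The function names and the `attempt` callback are hypothetical illustrations, not part of CALVIN's actual API.

```python
from typing import Callable, List, Sequence


def completed_in_a_row(
    instructions: Sequence[str],
    attempt: Callable[[str], bool],
) -> int:
    """Count the instructions completed before the first failure in one chain.

    `attempt` is a hypothetical stand-in for rolling out a policy on a single
    language instruction and reporting whether the task succeeded.
    """
    count = 0
    for instruction in instructions:
        if not attempt(instruction):
            break  # the chain ends at the first failed task
        count += 1
    return count


def chain_success_rates(results: Sequence[int], chain_length: int) -> List[float]:
    """For k = 1..chain_length, the fraction of chains with at least k tasks completed."""
    total = len(results)
    return [
        sum(r >= k for r in results) / total
        for k in range(1, chain_length + 1)
    ]


if __name__ == "__main__":
    # Toy chain and a toy "policy" that always fails on one instruction.
    chain = ["open the drawer", "push the block", "close the drawer"]
    n_done = completed_in_a_row(chain, lambda text: text != "push the block")
    print(n_done)

    # Per-chain counts from four hypothetical evaluation chains of length 3.
    print(chain_success_rates([3, 1, 0, 2], chain_length=3))
```

Scoring by consecutive completions means an early failure costs more than a late one, which is what makes the evaluation long-horizon rather than a set of independent single-task trials.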