RT-2

Robotics Transformer 2: a vision-language-action (VLA) model from Google DeepMind that fine-tunes a large vision-language model (PaLI-X or PaLM-E) to output robot actions as text tokens. RT-2 shows that knowledge from internet-scale pre-training carries over to robot control, helping robots follow novel language instructions and generalize to objects and scenarios not seen in the robot training data. It exemplifies the approach of treating robot action prediction as a vision-language modeling problem.
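The idea of emitting actions as text tokens can be illustrated with a minimal Python sketch. It assumes each continuous action dimension is clipped to a known range, split into 256 uniform bins, and written out as a space-separated string of bin indices. The function names, the bounds, and the three-dimensional example action are hypothetical and chosen for illustration; this is not RT-2's actual code.

```python
from typing import List, Sequence

NUM_BINS = 256  # assumed number of uniform bins per action dimension


def encode_action(action: Sequence[float],
                  low: Sequence[float],
                  high: Sequence[float],
                  num_bins: int = NUM_BINS) -> str:
    """Turn a continuous action vector into a string of integer bin indices."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)           # keep the value inside its range
        frac = (clipped - lo) / (hi - lo)           # normalize to [0, 1]
        index = min(int(frac * num_bins), num_bins - 1)  # top edge maps to last bin
        tokens.append(str(index))
    return " ".join(tokens)


def decode_action(text: str,
                  low: Sequence[float],
                  high: Sequence[float],
                  num_bins: int = NUM_BINS) -> List[float]:
    """Map a string of bin indices back to continuous values (bin centers)."""
    indices = [int(tok) for tok in text.split()]
    if len(indices) != len(low):
        raise ValueError("token count does not match action dimensionality")
    return [lo + (idx + 0.5) / num_bins * (hi - lo)
            for idx, lo, hi in zip(indices, low, high)]


if __name__ == "__main__":
    # Hypothetical 3-D action with each dimension bounded to [-1, 1].
    low, high = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
    text = encode_action([-1.0, 0.0, 1.0], low, high)
    print(text)
    print(decode_action(text, low, high))
```

Under this scheme a language model only needs to generate a short string of integers, which a decoder then maps back to motor commands, at the cost of a quantization error of at most half a bin width per dimension.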

Robot Learning, VLA
