Definition

In robotics, foundation models are large neural networks pretrained on broad datasets (internet images, text, or cross-embodiment robot data) that provide general representations transferable to specific downstream tasks. Vision-Language-Action (VLA) models such as RT-2, OpenVLA, and π₀ are prominent examples: they take a natural-language instruction and visual observations as input and output robot actions. The key advantage is reduced data requirements for new tasks, because the pretrained representations already capture useful visual and semantic concepts.
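The (image, instruction) → action interface described above can be sketched as follows. This is a minimal illustrative stub, not the actual RT-2/OpenVLA/π₀ implementation: the class name, the toy feature encoder, and the random linear action head are all assumptions standing in for a large pretrained transformer backbone.

```python
import numpy as np

class VisionLanguageActionPolicy:
    """Hypothetical sketch of a VLA-style interface: (image, instruction) -> action.

    Real VLAs use large pretrained vision-language transformers; here toy
    placeholders stand in for the backbone to show only the data flow.
    """

    FEATURE_DIM = 4  # 3 pooled image channels + 1 toy text feature

    def __init__(self, action_dim: int = 7, seed: int = 0):
        self.action_dim = action_dim  # e.g. 6-DoF end-effector delta + gripper
        rng = np.random.default_rng(seed)
        # Placeholder "action head": a fixed random projection. A real model
        # decodes discretized action tokens or regresses continuous actions.
        self.head = rng.standard_normal((action_dim, self.FEATURE_DIM))

    def encode(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # Stand-in for the pretrained vision-language encoder:
        # pool the image channels and hash the instruction to a scalar feature.
        visual = image.mean(axis=(0, 1))                      # shape (3,)
        text = np.array([hash(instruction) % 1000 / 1000.0])  # shape (1,)
        return np.concatenate([visual, text])                 # shape (4,)

    def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
        features = self.encode(image, instruction)
        return self.head @ features  # one action vector per observation

policy = VisionLanguageActionPolicy()
obs = np.zeros((224, 224, 3), dtype=np.float32)  # dummy RGB observation
action = policy.act(obs, "pick up the red block")
print(action.shape)  # -> (7,)
```

In deployed systems this call typically runs in a closed loop: at each control step the latest camera frame and the standing instruction are fed to `act`, and the resulting action is sent to the robot controller.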

Why It Matters for Robot Teams

Understanding foundation models is essential for teams building real-world robot systems. Whether you are collecting demonstration data, training policies in simulation, or deploying in production, this concept directly affects your workflow and system design.