Pre-Training
The initial phase of training a model on a large, general dataset before fine-tuning it on a specific downstream task. In robot learning, visual encoders (e.g., ResNet, ViT, DINOv2) are pre-trained on ImageNet or internet-scale image collections, and vision-language-action (VLA) models are pre-trained on internet-scale vision-language data. Pre-training provides general-purpose feature representations that transfer to robot perception tasks, so downstream tasks can be learned from limited robot-specific data.
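
The pre-train-then-fine-tune pattern can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: PRETRAINED_WEIGHTS is a hypothetical stand-in for an encoder learned during pre-training (a real one would be a ResNet or ViT checkpoint), and robot_samples is a made-up four-example "robot" dataset. The encoder stays frozen and only a small linear head is fitted to the task data.

```python
# Hypothetical stand-in for a pre-trained encoder: a fixed linear map from a
# 3-dimensional raw input to 2 features. In practice these weights would come
# from pre-training on a large general dataset; here they are made up.
PRETRAINED_WEIGHTS = ((0.5, -0.2, 0.1), (0.3, 0.8, -0.5))


def encode(x, weights):
    """Frozen encoder: project a raw input vector to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def predict(x, weights, head, bias):
    """Linear head applied on top of the frozen features."""
    feats = encode(x, weights)
    return sum(h * f for h, f in zip(head, feats)) + bias


def mse(samples, weights, head, bias):
    """Mean squared error of the head's predictions over a dataset."""
    errors = [(predict(x, weights, head, bias) - y) ** 2 for x, y in samples]
    return sum(errors) / len(errors)


def fine_tune_head(samples, weights, lr=0.1, epochs=200):
    """Fit only the head with per-sample gradient descent on squared error.

    The encoder weights are read but never updated.
    """
    head = [0.0] * len(weights)
    bias = 0.0
    # Features can be computed once because the encoder is frozen.
    encoded = [(encode(x, weights), y) for x, y in samples]
    for _ in range(epochs):
        for feats, y in encoded:
            err = sum(h * f for h, f in zip(head, feats)) + bias - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias


# Made-up task data: only four labeled examples, standing in for scarce
# robot-specific data.
robot_samples = [
    ([1.0, 0.0, 0.0], 1.2),
    ([0.0, 1.0, 0.0], -0.7),
    ([0.0, 0.0, 1.0], 1.2),
    ([1.0, 1.0, 1.0], 0.7),
]

weights_before = PRETRAINED_WEIGHTS
loss_before = mse(robot_samples, PRETRAINED_WEIGHTS, [0.0, 0.0], 0.0)
head, bias = fine_tune_head(robot_samples, PRETRAINED_WEIGHTS)
loss_after = mse(robot_samples, PRETRAINED_WEIGHTS, head, bias)

print("loss before fine-tuning:", loss_before)
print("loss after fine-tuning:", loss_after)
```

Freezing the encoder and training only a head is just one way to use pre-trained features; updating the whole network on the downstream data is another common choice, and the sketch does not cover it.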