Masked Autoencoder
A self-supervised pre-training approach in which a large random subset of image patches (typically around 75%) is masked out and the model learns to reconstruct them. The design is an asymmetric ViT encoder-decoder: the encoder processes only the visible patches, and a lightweight decoder reconstructs the masked patches, with the reconstruction loss computed only on the masked positions. MAE pre-training learns strong visual representations that transfer well to downstream tasks. In robotics, MAE-pre-trained vision encoders provide robust features for manipulation policies, especially when labeled robot data is limited.
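The core mechanics can be sketched in a few lines of NumPy: patchify an image, mask a random 75% of the patches, and compute a reconstruction loss only on the masked ones. The patch size, masking ratio, and the stand-in "prediction" are illustrative assumptions; a real MAE runs a ViT encoder over the visible patches and a lightweight decoder over latent tokens plus mask tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, patch=4):
    # Split an (H, W, C) image into flattened (num_patches, patch*patch*C) tokens.
    H, W, C = img.shape
    grid = img.reshape(H // patch, patch, W // patch, patch, C)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

def random_mask(num_patches, ratio=0.75):
    # MAE masks a high fraction (typically 75%) of patches uniformly at random.
    num_masked = int(num_patches * ratio)
    perm = rng.permutation(num_patches)
    return perm[:num_masked], perm[num_masked:]  # masked ids, visible ids

img = rng.random((32, 32, 3)).astype(np.float32)
patches = patchify(img)  # (64, 48) for 4x4 patches on a 32x32x3 image
masked_ids, visible_ids = random_mask(len(patches))

# Stand-in "model": predict each masked patch as the mean of the visible patches.
# In MAE, this role is played by the ViT encoder (visible patches only) + decoder.
pred = np.tile(patches[visible_ids].mean(axis=0), (len(masked_ids), 1))

# The MSE loss is computed only on the masked patches.
loss = np.mean((pred - patches[masked_ids]) ** 2)
print(f"{len(masked_ids)}/{len(patches)} patches masked, loss={loss:.4f}")
```

Because the encoder never sees the masked 75% of patches, pre-training is substantially cheaper than processing the full token sequence, which is what makes the high masking ratio practical.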