Layer Normalization
A normalization technique that normalizes across the feature dimension of each individual sample (rather than across the batch, as in batch normalization). Layer normalization is the standard in transformer architectures because it behaves identically during training and inference (no running batch statistics are needed) and handles variable sequence lengths and batch sizes. It is used in nearly all VLA and policy transformer models.
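A minimal sketch of the core computation in NumPy, omitting the learnable gain and bias parameters that typically follow the normalization:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each sample across its feature dimension (last axis),
    # unlike batch norm, which normalizes each feature across the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Each row (sample) is normalized independently, so the result does not
# depend on batch size or on the other samples in the batch.
x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
y = layer_norm(x)
# Each row of y now has mean ~0 and standard deviation ~1.
```

Because the statistics are computed per sample rather than per batch, the same code runs unchanged at training and inference time, which is what makes the technique a natural fit for transformers.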