Batch Normalization
A technique that normalizes layer inputs per feature across the mini-batch to have zero mean and unit variance, then applies a learned affine transformation (a per-feature scale and shift). Batch normalization stabilizes training, enables higher learning rates, and acts as a mild regularizer, since the statistics computed from each mini-batch inject noise. It is standard in CNNs but less common in transformers, which typically use layer normalization instead.
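The training-time forward pass can be sketched in a few lines of NumPy; the function name, shapes, and `eps` value here are illustrative, and real frameworks additionally track running statistics for use at inference time:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass (training mode).

    x: (batch, features) activations.
    gamma, beta: learned per-feature scale and shift.
    eps avoids division by zero for low-variance features.
    """
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learned affine transformation

# With gamma=1 and beta=0, each output feature has (near) zero mean
# and unit variance regardless of the input's scale and offset.
x = np.random.randn(64, 8) * 3.0 + 2.0
out = batch_norm(x, np.ones(8), np.zeros(8))
```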