Activation Function
A nonlinear function applied element-wise to the output of a neural network layer. Common activations include ReLU, GELU, sigmoid, tanh, and SiLU/Swish. The choice of activation affects training dynamics, gradient flow, and representational capacity. GELU is standard in transformers; ReLU and its variants dominate convolutional architectures.
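A minimal sketch of a few of these activations applied element-wise with NumPy; the GELU here uses the common tanh approximation, and the function names and toy input are illustrative:

```python
import numpy as np

def relu(x):
    # Zero out negative entries; identity for positive ones.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squash each entry into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    # SiLU/Swish: x * sigmoid(x).
    return x * sigmoid(x)

def gelu(x):
    # GELU via the widely used tanh approximation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Element-wise application to a layer's pre-activation output.
pre_activation = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(pre_activation))   # [0.  0.  0.  0.5 2. ]
print(gelu(pre_activation))   # smooth; slightly negative for small negative inputs
```

Because each activation acts independently on every element, the nonlinearity itself is cheap; its main influence is on gradient behavior, e.g. ReLU's zero gradient for negative inputs versus the smooth, nonzero gradients of GELU and SiLU.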