Stochastic Gradient Descent
An iterative optimization algorithm that updates model parameters using gradients computed on random mini-batches of data, making it the foundation of deep learning training. Variants such as momentum, adaptive learning rates (Adam, RMSProp), and learning-rate scheduling improve convergence. In robot learning, mini-batch gradient descent on demonstration datasets is used to train imitation learning policies.
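A minimal sketch of the update rule, including the momentum variant mentioned above. The function name, toy objective, and hyperparameter values are illustrative choices, not a reference implementation; the noisy gradient stands in for a mini-batch estimate.

```python
import numpy as np

def sgd_momentum_step(params, grads, velocity, lr=0.05, momentum=0.9):
    """One SGD-with-momentum update: v <- mu*v - lr*g; params <- params + v."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity

# Toy example: minimize f(w) = (w - 3)^2 with noisy gradients,
# mimicking the stochasticity of mini-batch estimates.
rng = np.random.default_rng(0)
w = np.array([0.0])
v = np.zeros_like(w)
for _ in range(200):
    grad = 2 * (w - 3.0) + rng.normal(scale=0.1)  # noisy gradient of f
    w, v = sgd_momentum_step(w, grad, v)
print(w)  # ends up close to the minimizer w = 3
```

Setting `momentum=0.0` recovers plain SGD; schedulers would additionally shrink `lr` over the course of training.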
MathML