Adam Optimizer
An adaptive learning rate optimization algorithm that maintains per-parameter first- and second-moment estimates of the gradients. Adam combines the benefits of AdaGrad (per-parameter learning rates) and RMSProp (exponential moving averages of squared gradients). It is the default optimizer for training most robot learning models, with AdamW (which applies decoupled weight decay) being the most common variant.
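The update rule can be sketched in a few lines. Below is a minimal single-step Adam implementation in NumPy (hyperparameter defaults follow the values commonly used in practice; the function name `adam_step` is illustrative, not from any particular library):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are the first- and second-moment estimates;
    t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)              # correct bias from zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Example: minimize f(x) = x^2 starting from x = 5
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
```

AdamW differs only in where weight decay is applied: instead of adding an L2 penalty to the gradient, it shrinks the parameter directly (`param -= lr * weight_decay * param`), decoupling regularization from the adaptive step size.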