DDPG
Deep Deterministic Policy Gradient — an off-policy RL algorithm for continuous action spaces that uses a deterministic policy and a Q-function, trained via the deterministic policy gradient theorem. DDPG uses experience replay and target networks for stability. It has been widely applied to robot control tasks but can be sensitive to hyperparameters. TD3 (Twin Delayed DDPG) addresses its overestimation bias.