A3C
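Asynchronous Advantage Actor-Critic: a deep RL algorithm that runs multiple agent instances in parallel, each on its own copy of the environment, and has them asynchronously apply gradient updates to a shared network that outputs both the policy (actor) and a value estimate (critic). The value estimate serves as a baseline: actions are reinforced in proportion to their advantage, that is, how much the observed return exceeds that estimate. A3C was one of the first algorithms to achieve superhuman performance on Atari games, and it demonstrated that parallel data collection can replace experience replay as a way to stabilize training, because workers exploring different states produce less correlated updates.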
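
The asynchronous update pattern can be illustrated with a toy sketch. It is not the published A3C implementation: the neural network is replaced by two policy logits and one scalar value estimate, the environment is a hypothetical two-armed bandit in which arm 1 always pays 1, and the names `shared`, `worker`, and `train` are invented for this example. What it keeps from the description above is the structure: several workers, each with its own source of environment randomness, snapshot the shared parameters, collect a short rollout, weight the policy gradient by the advantage, and write the result back to the shared parameters without a lock.

```python
import math
import random
import threading

# Shared parameters (assumption: a tabular stand-in for the shared network).
# Two policy logits form the actor; one scalar value estimate is the critic.
shared = {"logits": [0.0, 0.0], "value": 0.0}


def softmax(logits):
    """Turn logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def worker(seed, updates=400, t_max=5, lr=0.1):
    """One agent instance interacting with its own copy of the toy environment."""
    rng = random.Random(seed)  # per-worker randomness, so workers see different data
    for _ in range(updates):
        # Sync step: take a local snapshot of the shared parameters.
        probs = softmax(list(shared["logits"]))
        value = shared["value"]

        grad_logits = [0.0, 0.0]
        grad_value = 0.0
        for _step in range(t_max):
            action = 0 if rng.random() < probs[0] else 1
            reward = 1.0 if action == 1 else 0.0  # hypothetical bandit: arm 1 pays 1
            advantage = reward - value  # observed return minus the critic's baseline
            for i in range(2):
                indicator = 1.0 if i == action else 0.0
                # Gradient of log pi(action) w.r.t. logit i, scaled by the advantage.
                grad_logits[i] += advantage * (indicator - probs[i])
            grad_value += advantage  # moves the value estimate toward the return

        # Asynchronous update: written to the shared parameters without a lock,
        # so an occasional lost or stale update is tolerated.
        for i in range(2):
            shared["logits"][i] += lr * grad_logits[i] / t_max
        shared["value"] += lr * grad_value / t_max


def train(num_workers=4):
    """Run the workers in parallel threads and return the final action probabilities."""
    threads = [threading.Thread(target=worker, args=(seed,)) for seed in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(shared["logits"])
```

Calling `train()` returns the learned action probabilities, which should strongly favor arm 1. Because each bandit pull is a one-step episode, the return here is just the immediate reward; in a sequential task the advantage would instead be computed from multi-step discounted returns bootstrapped from the critic.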