PPO

Proximal Policy Optimization: a policy gradient RL algorithm that keeps each policy update close to the previous policy by optimizing a clipped surrogate objective, which approximates a trust-region constraint rather than enforcing one exactly. PPO is a common default RL algorithm for robot locomotion (legged robots, humanoids) and sim-to-real transfer because it is stable, simple to implement, and reasonably sample-efficient for an on-policy method. It achieves trust-region-like stability with ordinary first-order gradient steps, avoiding the computational cost of TRPO's constrained optimization.
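The clipped surrogate objective can be illustrated with a minimal NumPy sketch. This is not a full PPO implementation: the function name `ppo_clip_loss` and its argument names are hypothetical, the advantages are assumed to be precomputed, and the clip range of 0.2 is a commonly used value rather than a required one.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over samples.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the policy that collected the data (hypothetical names).
    advantages: precomputed advantage estimates (assumed given).
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    advantages = np.asarray(advantages)

    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Taking the element-wise minimum removes any incentive to move the
    # ratio outside [1 - clip_eps, 1 + clip_eps] in the favorable direction.
    return -np.mean(np.minimum(unclipped, clipped))


# Ratio of 2 with a positive advantage: the objective is capped at 1.2 * A.
loss = ppo_clip_loss(np.log([2.0]), np.log([1.0]), [1.0])
```

When the new and old policies agree (ratio of 1), the loss reduces to the negative mean advantage, which is the plain policy gradient surrogate; the clipping only takes effect once the policy drifts away from the one that collected the data.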

Robot Learning, RL
