Exploration-Exploitation Tradeoff

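The fundamental dilemma in reinforcement learning (RL): the agent must balance exploiting actions already known to yield high reward against exploring untried actions that might yield more. Pure exploitation can lock the agent into a suboptimal policy (a local optimum); pure exploration wastes samples on actions already known to be poor. Methods such as ε-greedy, upper confidence bound (UCB), Thompson sampling, and curiosity-driven exploration manage this tradeoff. In robotics, safe-exploration constraints make the tradeoff harder, because the robot cannot freely try actions that risk damaging itself or its surroundings.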
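To make two of the named methods concrete, here is a minimal sketch of ε-greedy and UCB1 action selection on a toy multi-armed bandit. The three Bernoulli arms, their success probabilities, the step count, and all function names (`epsilon_greedy`, `ucb1`, `run_bandit`) are assumptions chosen for illustration; they do not come from any particular library or robot setup.

```python
import math
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Explore a uniformly random arm with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def ucb1(estimates, counts, t):
    """Pick the arm with the highest estimate plus an uncertainty bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # pull every arm once before using the bonus
    return max(
        range(len(estimates)),
        key=lambda a: estimates[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_bandit(true_means, steps, select, seed=0):
    """Simulate a Bernoulli bandit; `select` chooses the arm at each step."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # running mean reward per arm
    counts = [0] * len(true_means)       # number of pulls per arm
    for t in range(1, steps + 1):
        arm = select(estimates, counts, t, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical success probabilities; arm 2 is the best one.
TRUE_MEANS = [0.1, 0.5, 0.9]

eg_estimates, eg_counts = run_bandit(
    TRUE_MEANS, 5000, lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng)
)
ucb_estimates, ucb_counts = run_bandit(
    TRUE_MEANS, 5000, lambda q, n, t, rng: ucb1(q, n, t)
)

print("epsilon-greedy pulls per arm:", eg_counts)
print("UCB1 pulls per arm:", ucb_counts)
```

In this sketch ε-greedy keeps spending a fixed fraction of its pulls on random arms, while UCB1 shrinks its exploration bonus as an arm is pulled more often, so its exploration tapers off over time. Neither rule says anything about safety; a real robot would additionally need to restrict which actions may be tried at all.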

Robot Learning, RL
