Exploration-Exploitation Tradeoff
The fundamental dilemma in RL: the agent must balance exploiting known high-reward actions against exploring unknown actions that might yield higher rewards. Pure exploitation risks converging to a local optimum; pure exploration is sample-inefficient. Methods such as ε-greedy, UCB, Thompson sampling, and curiosity-driven exploration manage this tradeoff. In robotics, safe-exploration constraints (the robot must not damage itself or its surroundings while exploring) make the tradeoff harder.
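The simplest of these methods, ε-greedy, can be illustrated on a multi-armed bandit. This is a minimal sketch (the function names, arm means, and hyperparameters are illustrative, not from the source): with probability ε the agent explores a random arm, otherwise it exploits the arm with the highest estimated value.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Explore a random arm with probability epsilon; otherwise exploit the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run an epsilon-greedy agent on a Gaussian bandit with the given arm means."""
    rng = random.Random(seed)
    n = len(true_means)
    q = [0.0] * n      # estimated value per arm
    counts = [0] * n   # number of pulls per arm
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)  # noisy reward sample
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]     # incremental mean update
    return q, counts
```

With ε = 0.1 the agent keeps sampling every arm forever, so its value estimates converge and the best arm ends up pulled most often; setting ε = 0 (pure exploitation) can lock it onto a suboptimal arm after a few unlucky early rewards, which is exactly the local-optimum failure described above.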
Robot Learning, RL