Preference Learning

Learning from human comparative judgments (e.g., 'trajectory A is better than trajectory B') rather than explicit reward signals or demonstrations. A reward model is trained to be consistent with human preferences, then used to optimize the policy via RL. This approach (RLHF applied to robotics) avoids the need for precise scalar reward engineering and can capture nuanced human intent.
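The reward-model fitting step is commonly formulated with a Bradley-Terry model: each trajectory gets a scalar score, and the probability that a human prefers A over B is the sigmoid of the score difference. A minimal sketch of the per-comparison training loss (the function name and example scores are illustrative, not from a specific library):

```python
import numpy as np

def preference_loss(r_a, r_b, label):
    """Bradley-Terry negative log-likelihood for one human comparison.

    r_a, r_b : scalar reward-model scores for trajectories A and B.
    label    : 1.0 if the human preferred A, 0.0 if they preferred B.
    """
    # P(A preferred over B) = sigmoid(r_a - r_b)
    p_a = 1.0 / (1.0 + np.exp(r_b - r_a))
    # Binary cross-entropy against the human's choice
    return -(label * np.log(p_a) + (1.0 - label) * np.log(1.0 - p_a))

# The loss is small when the model's ranking agrees with the human
# and large when it disagrees:
agree = preference_loss(r_a=2.0, r_b=0.0, label=1.0)
disagree = preference_loss(r_a=2.0, r_b=0.0, label=0.0)
```

Minimizing this loss over a dataset of comparisons yields a reward model whose scores rank trajectories consistently with the human judgments; that model then supplies the scalar reward for the RL stage.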

Related terms: Robot Learning, RL
