CQL

Conservative Q-Learning: an offline RL algorithm that learns a conservative (pessimistic) Q-function by adding a regularizer to the standard Bellman-error objective. The regularizer pushes down Q-values on out-of-distribution actions, meaning actions the dataset does not cover, while pushing up Q-values on actions that do appear in the data. This mitigates the overestimation problem that leads offline RL policies to exploit spurious high-value regions not supported by the data. CQL is one of the most widely used offline RL methods for robot manipulation.
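As an illustration of the penalty described above, here is a minimal sketch for the discrete-action case. It assumes the commonly used log-sum-exp form of the regularizer, in which the soft maximum of Q over all actions is pushed down and the Q-value of the logged action is pushed up. The function names, the `alpha` weight, and the plain NumPy setup are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def cql_penalty(q_values, data_actions):
    """Conservative penalty for a batch of states (hypothetical helper).

    q_values:     array of shape (batch, num_actions), Q(s, a) for every action
    data_actions: int array of shape (batch,), the action logged in the dataset
    """
    # Numerically stable log-sum-exp over actions (a soft maximum of Q).
    q_max = q_values.max(axis=1, keepdims=True)
    lse = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))
    # Q-value of the action actually taken in the dataset.
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    # Push down the soft maximum, push up the in-data action value.
    return float((lse - q_data).mean())

def cql_loss(q_values, data_actions, td_targets, alpha=1.0):
    """Squared TD error on dataset actions plus the weighted penalty.

    alpha is an assumed trade-off hyperparameter; larger values are
    more conservative.
    """
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    td_error = 0.5 * np.mean((q_data - td_targets) ** 2)
    return float(td_error + alpha * cql_penalty(q_values, data_actions))
```

The penalty is never negative, and it shrinks toward zero only when the logged action dominates the Q-values for its state, which is what discourages the policy from preferring actions outside the data.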

RL
