CQL
Conservative Q-Learning: an offline reinforcement learning (RL) algorithm that learns a conservative (pessimistic) Q-function by adding a penalty to the standard Bellman error. The penalty pushes down Q-values on out-of-distribution actions while pushing up Q-values on actions that appear in the dataset. This counters the overestimation that otherwise leads offline RL policies to exploit spurious high-value regions not supported by the data. CQL is one of the most widely used offline RL methods for robot manipulation.
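
As a rough illustration of the penalty described above, the following is a minimal NumPy sketch for a discrete action space. It assumes one common form of the regularizer, a log-sum-exp over all actions minus the Q-value of the action actually taken in the dataset. The function names `cql_penalty` and `cql_loss`, the weight `alpha`, and the array layout are illustrative choices, not part of any particular library.

```python
import numpy as np

def cql_penalty(q_values, data_actions):
    """Conservative penalty for a batch of states (discrete actions).

    q_values:     array of shape (batch, n_actions), Q(s, a) for every action
    data_actions: integer array of shape (batch,), actions taken in the dataset
    """
    q_values = np.asarray(q_values, dtype=float)
    data_actions = np.asarray(data_actions, dtype=int)

    # Numerically stable log-sum-exp over the action dimension.
    # This term is large when any action, seen or unseen, has a high Q-value.
    q_max = q_values.max(axis=1, keepdims=True)
    lse = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))

    # Q-values of the actions that are actually supported by the data.
    q_data = q_values[np.arange(len(data_actions)), data_actions]

    # Minimizing (lse - q_data) lowers Q on unsupported actions
    # and raises it on dataset actions.
    return float((lse - q_data).mean())

def cql_loss(q_values, data_actions, td_targets, alpha=1.0):
    """TD error plus the conservative penalty, weighted by alpha (assumed)."""
    q_values = np.asarray(q_values, dtype=float)
    data_actions = np.asarray(data_actions, dtype=int)
    td_targets = np.asarray(td_targets, dtype=float)

    q_data = q_values[np.arange(len(data_actions)), data_actions]
    td_loss = 0.5 * np.mean((q_data - td_targets) ** 2)
    return float(td_loss + alpha * cql_penalty(q_values, data_actions))

# One state, two actions: action 0 has a much higher Q-value than action 1.
q = np.array([[10.0, 0.0]])
penalty_if_high_q_action_in_data = cql_penalty(q, [0])
penalty_if_high_q_action_unseen = cql_penalty(q, [1])
```

In this toy case the penalty is close to zero when the high-value action is the one in the dataset, and close to 10 when the high-value action was never observed, which is the situation the regularizer is meant to discourage. In practice the Q-values come from a neural network and the loss is minimized with automatic differentiation; continuous action spaces need a sampled approximation of the log-sum-exp term.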