Behavioral Regularization
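A family of offline RL methods that constrain the learned policy to stay close to the behavior policy that collected the data. This discourages the learned policy from exploiting out-of-distribution actions, whose value estimates are unreliable because the dataset contains no evidence about them. Approaches include policy constraints (e.g., TD3+BC, which adds a behavior cloning term to the actor objective), KL divergence penalties, and support constraints (e.g., BEAR). Behavioral regularization is one of the main mechanisms for stabilizing offline RL.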
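To make the policy-constraint idea concrete, here is a minimal sketch of an actor loss in the style of TD3+BC: a value-maximization term plus a squared-error penalty that pulls the policy's actions toward the logged actions. The function name `td3_bc_actor_loss`, its argument layout, and the default `alpha=2.5` are assumptions chosen for illustration, not a reference implementation. A real training loop would compute this with an autodiff framework and treat the scaling factor as a constant during backpropagation.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative behavior-regularized actor loss (hypothetical helper).

    q_values:        list of critic estimates Q(s, pi(s)), one per sample
    policy_actions:  list of action vectors pi(s) proposed by the policy
    dataset_actions: list of action vectors logged in the offline dataset
    alpha:           trade-off weight (assumed default; tune per task)
    """
    n = len(q_values)

    # Scale the value term by the mean |Q| so alpha is insensitive to
    # the magnitude of the rewards.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / (mean_abs_q + 1e-8)

    # Value term: the actor wants high Q, so the loss uses its negative.
    mean_q = sum(q_values) / n

    # Behavior cloning term: mean squared distance to the logged actions.
    sq_errors = [
        (p - d) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, d in zip(pa, da)
    ]
    bc_term = sum(sq_errors) / len(sq_errors)

    return -lam * mean_q + bc_term


# Usage: with identical Q-values, actions far from the data cost more.
logged = [[0.0, 0.0], [1.0, 1.0]]
near = td3_bc_actor_loss([1.0, 1.0], [[0.1, 0.0], [1.0, 0.9]], logged)
far = td3_bc_actor_loss([1.0, 1.0], [[2.0, 2.0], [-1.0, -1.0]], logged)
print(near < far)
```

The single weight `alpha` controls the trade-off: larger values favor value maximization, smaller values keep the policy closer to the data.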
Robot Learning, RL