Offline RL
Reinforcement learning from a fixed dataset of previously collected transitions, without any additional online interaction with the environment. Offline RL algorithms such as CQL, IQL, and TD3+BC address the distribution shift between the behavior policy that collected the data and the learned policy, which otherwise tends to produce overestimated values for actions the dataset does not cover. Offline RL is attractive for robotics because it avoids the safety risks and cost of online exploration.
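
As one illustration of how such an algorithm can stay within the data, the sketch below shows the asymmetric (expectile) squared loss that IQL-style methods use to fit a state-value estimate from Q-values of dataset actions only. It is a minimal sketch, not a full algorithm: the function name `expectile_loss`, the plain-list inputs, and the default `tau=0.7` are assumptions made for illustration.

```python
def expectile_loss(q_values, v_values, tau=0.7):
    """Mean asymmetric squared error between Q(s, a) and V(s).

    Hypothetical helper for illustration. With tau > 0.5, cases where
    Q exceeds V are penalised more, so minimising over V pushes V toward
    an upper expectile of Q over actions that appear in the dataset.
    """
    if len(q_values) != len(v_values) or len(q_values) == 0:
        raise ValueError("q_values and v_values must be non-empty and equal length")
    total = 0.0
    for q, v in zip(q_values, v_values):
        diff = q - v
        # Weight is (1 - tau) when diff < 0, and tau otherwise.
        weight = (1.0 - tau) if diff < 0 else tau
        total += weight * diff * diff
    return total / len(q_values)


if __name__ == "__main__":
    # Same absolute error, different sign: underestimating V costs more.
    print(expectile_loss([1.0], [0.0], tau=0.7))
    print(expectile_loss([0.0], [1.0], tau=0.7))
```

Setting `tau=0.5` recovers an ordinary (halved) mean squared error; values closer to 1 make the fitted value behave more like a maximum over in-dataset actions, without ever querying actions outside the dataset.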