Q-function (Action-Value Function)
The Q-function Q(s, a) estimates the expected cumulative discounted reward an agent will receive by taking action a in state s and then following a given policy thereafter. Q-functions are central to value-based and actor-critic reinforcement learning algorithms such as DQN (discrete actions) and DDPG, TD3, and SAC (continuous actions). In robot RL, learning accurate Q-functions for long-horizon manipulation tasks is challenging because rewards are often sparse and the state-action space is high-dimensional. Recent work in offline RL (IQL, CQL) uses Q-functions to extract policies from fixed datasets without online interaction, bridging the gap between imitation learning and RL.
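The definition above can be made concrete with tabular Q-learning, which updates Q(s, a) toward the Bellman target r + γ · max_a′ Q(s′, a′). The sketch below uses a toy 5-state chain environment; the environment, hyperparameters, and all names are illustrative assumptions, not taken from any specific library or paper.

```python
import numpy as np

# Toy chain MDP (assumed for illustration): states 0..4, action 1 moves
# right, action 0 moves left; reward +1 for reaching state 4, which ends
# the episode.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5  # discount factor, learning rate

def step(s, a):
    """Deterministic chain dynamics; returns (next_state, reward, done)."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
for _ in range(200):                      # episodes
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions))  # random behavior policy (Q-learning is off-policy)
        s_next, r, done = step(s, a)
        # Bellman backup: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

# Greedy policy extracted from Q: move right in every non-terminal state.
greedy = Q.argmax(axis=1)
print(greedy[:-1].tolist())  # -> [1, 1, 1, 1]
```

Because the Bellman target bootstraps from Q at the next state, value propagates backward from the rewarding state even though the reward is sparse; deep RL methods like DQN replace the table with a neural network trained on the same target.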