This glossary covers 55 key terms across six categories used in robot learning, teleoperation, and manipulation. Each definition is written for practitioners — concise, concrete, and with a brief example. For a more visual explanation of any concept, explore the rest of the Robotics Library.
Imitation Learning Terms
- Behavior Cloning (BC): The simplest imitation learning algorithm — train a supervised model to predict expert actions given states, treating it as a regression problem. Example: train a CNN to predict gripper position from camera images using 100 recorded pick-and-place demonstrations.
- DAgger (Dataset Aggregation): An iterative IL algorithm that fixes distribution shift by having the expert correct the learned policy's actions at states the policy actually visits. Example: deploy a BC policy, have an operator correct any mistakes in real-time, add corrections to training data, retrain.
- Action Chunking: Predicting K consecutive future actions at once rather than one action per step. Used in ACT (K=100). Example: the policy outputs 2 seconds of future arm motion at once, then re-plans — breaking the compounding error cycle.
- Distribution Shift: The mismatch between the state distribution seen during training (demonstrations) and the distribution visited at test time (robot execution). Example: the training data always starts with the object in a specific location; the robot fails when the object is slightly moved.
- Covariate Shift: A specific type of distribution shift where the input distribution (states) changes but the conditional output distribution (actions given states) is unchanged. Example: robot visits states not in training data because its own imperfect actions lead it off the demonstrated path.
- Compounding Error: The phenomenon where small per-step prediction errors push the policy off the demonstrated state distribution, where its predictions get worse still; the worst-case total error grows quadratically, as O(T²ε), for horizon T and per-step error ε. Example: a 0.5% per-step error already sums to 25% over 50 steps under purely linear accumulation, and off-distribution drift makes each subsequent error larger.
- Multi-modality: The property of a demonstration dataset where multiple distinct actions are valid for the same state. Example: a red block can be grasped from the left or the right — both are correct, but a single-mode model averages them into a bad middle grasp.
- Demonstration: A single recorded instance of a human expert performing a task that the robot should learn. Example: one complete recording of an operator picking up a block and placing it in a bin.
- Episode: One complete attempt at a task from start to finish, including all sensor data. May succeed or fail. Example: 150 seconds of synchronized image and joint data for one pick-and-place attempt.
- Rollout: An episode generated by the learned policy (rather than by a human). Used to evaluate and sometimes to collect additional training data. Example: deploying the trained BC policy for 20 test episodes to measure its success rate.
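The behavior cloning recipe above is, at its core, supervised regression. A minimal sketch on synthetic data — the linear "expert" and the demonstration arrays here are hypothetical stand-ins; a real setup would regress from camera images to actions with a neural network:

```python
import numpy as np

# Toy behavior cloning: fit a linear policy a = s @ W to synthetic
# "demonstrations" (state, expert-action pairs) via least squares.
rng = np.random.default_rng(0)
true_W = np.array([[0.5, -0.2], [0.1, 0.8]])            # unknown expert mapping
states = rng.normal(size=(100, 2))                       # 100 demonstration states
actions = states @ true_W.T + 0.01 * rng.normal(size=(100, 2))  # noisy expert actions

# Supervised fit: solve min ||states @ W_hat - actions||^2
W_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(s):
    """Learned policy: predict the expert's action for a state."""
    return s @ W_hat

# With 100 demos and small noise, the recovered mapping is close to the expert's.
err = np.abs(W_hat.T - true_W).max()
```

The same distribution-shift caveat from the definitions applies: this fit is only trustworthy on states resembling the demonstrations.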
Policy Architecture Terms
- Transformer Policy: A policy implemented as a Transformer neural network — using self-attention to model temporal dependencies across timesteps or spatial dependencies across image patches. Example: ACT uses a Transformer decoder to generate action sequences conditioned on visual observations.
- CVAE (Conditional Variational Autoencoder): A generative model that learns a latent distribution conditioned on observed variables. Used in ACT to encode the multi-modality of demonstrations into a style variable. Example: ACT's CVAE encoder compresses the full action sequence into a latent code z, allowing the decoder to generate consistent action chunks.
- Diffusion Policy: A policy that models the action distribution as a denoising diffusion process — iteratively denoising random noise into a plausible action trajectory conditioned on observations. Example: Chi et al. 2023 demonstrated that Diffusion Policy outperforms BC on multi-modal tasks by representing multiple valid grasps simultaneously.
- Energy-Based Model (EBM): A model that assigns a scalar "energy" to each (state, action) pair; inference finds the action minimizing energy. Example: IBC trains an EBM where low energy = expert-like (state, action) pair; inference via gradient descent on action.
- Flow Matching: A generative modeling technique that learns to transport samples from a simple distribution to a complex target distribution via continuous normalizing flows. Example: π₀ (Physical Intelligence's foundation model) uses flow matching for action generation.
- Prediction Horizon: The number of future timesteps a policy predicts at once. Longer horizons enable smoother, more coordinated motion but require more accurate models. Example: a 50-step prediction horizon at 50 Hz covers 1 second of future motion.
- Chunk Size: The number of action steps in one prediction chunk (same as prediction horizon in ACT-style policies). Example: ACT uses chunk size K=100 at 50 Hz, so the arm plans 2 seconds of motion at a time.
- Action Head: The output module of a policy network that maps latent features to action values. Example: an MLP head taking Transformer decoder output and producing 7 values (6 joint angles plus a gripper command) for a 6-DOF arm with gripper.
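The chunk-size and prediction-horizon definitions above reduce to a simple execution pattern: predict K actions, execute them, re-plan. A toy sketch — `policy` is a hypothetical stand-in for a Transformer decoder, and K is shrunk from ACT's 100 to 5 for readability:

```python
import numpy as np

K = 5  # chunk size (ACT uses K=100 at 50 Hz)

def policy(obs):
    """Stand-in for a learned policy: predict K future actions at once.
    Here it just ramps a 1-D state toward a goal of 1.0."""
    return np.linspace(obs, 1.0, num=K, endpoint=False) + (1.0 - obs) / K

state = 0.0
trajectory = []
for step in range(20):
    if step % K == 0:                # re-plan only every K steps
        chunk = policy(state)
    state = float(chunk[step % K])   # execute the pre-computed action
    trajectory.append(state)
```

Because the policy is queried once per chunk rather than once per timestep, per-step prediction errors have fewer chances to compound.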
Data and Collection Terms
- Teleoperation: Human-controlled operation of a robot from a remote interface, typically used to collect demonstration data. Example: an operator wearing a VR headset moves their hands, and the robot arm mirrors those motions while all data is logged.
- Kinesthetic Teaching: A data collection method where the operator physically grasps and moves the robot arm (gravity-compensated, back-drivable mode) to demonstrate a task. Example: a researcher physically guides an arm through a peg-insertion task; joint angles are recorded at 100 Hz.
- Leader-Follower: A teleoperation architecture where the operator moves a lightweight replica arm (leader) and the robot arm (follower) mirrors the motion in real time. Example: ALOHA uses two leader arms (one per hand) to control a bimanual robot system.
- HDF5 (Hierarchical Data Format 5): A binary file format for storing large numerical datasets with chunked, compressed arrays. The standard storage format for robot episode data. Example: one HDF5 file per episode containing groups for /images, /joint_states, /actions, /metadata.
- RLDS (Reinforcement Learning Datasets): A TensorFlow Datasets-based format for robot learning data, standardizing episode and step structure. Used by Octo, Open X-Embodiment. Example: convert your HDF5 files to RLDS to train Octo on your custom data with minimal code changes.
- LeRobot: A HuggingFace-based robot learning framework and dataset format using Parquet files and a standardized schema. Example: publish your dataset to HuggingFace Hub in LeRobot format to share it with the community.
- Data Augmentation: Applying random transformations to training data to improve policy generalization. Example: randomly crop images by ±10%, jitter color by ±15%, add Gaussian noise to joint states during training.
- Success Rate: The fraction of evaluation rollouts in which the policy successfully completes the task. The primary evaluation metric for manipulation policies. Example: 18/20 successful grasps = 90% success rate.
- Rejection Rate: The fraction of collected episodes discarded during quality filtering. Example: collecting 500 episodes with 35% rejection rate yields 325 training episodes.
- Episode Length: The duration (in timesteps or seconds) of one episode. Example: mean episode length of 8 seconds at 50 Hz = 400 timesteps per episode.
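The collection statistics above (rejection rate, episode length) are simple bookkeeping over episode metadata. A minimal sketch with hypothetical quality thresholds — in a real pipeline each record would point at an HDF5 or LeRobot file holding the actual arrays:

```python
# Hypothetical episode metadata; "steps" is episode length in timesteps.
episodes = [
    {"id": 0, "steps": 400, "success": True},
    {"id": 1, "steps": 35,  "success": True},   # too short -> reject
    {"id": 2, "steps": 410, "success": False},  # failed attempt -> reject
    {"id": 3, "steps": 395, "success": True},
]

MIN_STEPS = 50  # example quality threshold, not a standard value

kept = [e for e in episodes if e["success"] and e["steps"] >= MIN_STEPS]
rejection_rate = 1.0 - len(kept) / len(episodes)   # 2 of 4 rejected -> 0.5
```

Applied to the numbers in the definition above, a 35% rejection rate on 500 collected episodes leaves 325 for training.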
Robot Hardware Terms
- DOF (Degrees of Freedom): The number of independent axes of motion in a robot arm. 6 DOF is the minimum for arbitrary end-effector poses; 7 DOF adds redundancy. Example: the UR5e has 6 revolute joints → 6 DOF.
- Payload: The maximum mass the robot arm can carry at its end-effector while maintaining rated performance. Example: a 5 kg payload cobot can reliably lift objects up to 5 kg including the gripper's own weight.
- Reach: The maximum radius from the robot's base that the end-effector can achieve. Determined by the sum of link lengths. Example: the UR5e has 850 mm reach — sufficient for a standard lab table setup.
- Repeatability: The precision with which the arm returns to the same commanded position on repeated attempts, measured as ±X mm. Example: ±0.05 mm repeatability means 200 repeated moves to the same target all land within a 0.1 mm diameter sphere.
- End-Effector: The device mounted at the tip of the robot arm that interacts with objects — the "hand." Example: a parallel jaw gripper, a suction cup array, or a multi-finger dexterous hand.
- Gripper: A specific type of end-effector that grasps objects by applying clamping force. Parallel jaw grippers are the most common for pick-and-place. Example: the Robotiq 2F-85 opens to 85 mm and closes with up to 235 N force.
- Force/Torque (F/T) Sensor: A six-axis wrist sensor measuring forces (Fx, Fy, Fz) and torques (Tx, Ty, Tz) at the end-effector. Example: detecting a successful grasp by verifying that gripper force exceeds 5 N with no slip torque.
- Tactile Sensor: A sensor that measures distributed contact pressure or geometry at the fingertip level. Example: GelSight produces a high-resolution image of finger contact geometry, enabling slip detection and grasp quality assessment.
- Joint Encoder: A sensor measuring the angle of a robot joint. Absolute encoders report the true angle without homing. Example: a 19-bit absolute encoder provides 0.0007° resolution at 1 kHz.
- Servo: In robotics, a self-contained motor+encoder+controller unit that accepts a position, velocity, or torque command. Example: Dynamixel servos are widely used in low-cost research arms — each servo handles its own PID control internally.
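The encoder-resolution figure in the joint-encoder definition is simple arithmetic: an n-bit absolute encoder divides one revolution into 2^n counts. A small helper (the 12-bit comparison value is an illustrative example, not tied to any specific product):

```python
def encoder_resolution_deg(bits: int) -> float:
    """Angular resolution of an absolute joint encoder: 360 deg / 2**bits."""
    return 360.0 / (2 ** bits)

res_12bit = encoder_resolution_deg(12)   # 360/4096  ~= 0.088 deg per count
res_19bit = encoder_resolution_deg(19)   # 360/524288 ~= 0.0007 deg, as above
```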
Learning Paradigms
- Reinforcement Learning (RL): A learning paradigm where an agent learns by taking actions and receiving scalar reward signals — no expert demonstrations needed. Example: training a grasping policy in simulation where the reward is +1 for successful lift and 0 otherwise.
- Offline RL: RL from a fixed dataset of pre-collected experience, without additional environment interaction. Example: training a policy from a robot dataset collected by humans using conservative Q-learning (CQL) to avoid out-of-distribution actions.
- Online RL: RL with active environment interaction — the policy collects new data and immediately learns from it. Example: a simulated robot repeatedly attempts grasps, updates its policy after each attempt, and gradually improves.
- Sim-to-Real: Training a policy in simulation and then deploying it on a real robot. The core challenge is the "reality gap" — simulation is never perfectly accurate. Example: training a locomotion policy in IsaacGym with randomized physics parameters, then deploying on a real quadruped.
- Domain Randomization: A sim-to-real technique that randomizes simulation parameters (friction, lighting, mass, texture) so the policy learns to be robust to these variations. Example: randomly varying table friction between 0.3–0.9 during training so the policy works on real tables with unknown friction.
- Foundation Model: A large neural network pre-trained on broad data (internet-scale or cross-robot datasets) that can be fine-tuned for specific tasks. Example: Octo pre-trained on 800K robot episodes from Open X-Embodiment, then fine-tuned on 50 task-specific episodes.
- VLA (Vision-Language-Action Model): A model that takes visual observations and natural language instructions as input and outputs robot actions. Example: OpenVLA accepts a camera image and the instruction "pick up the red cup" and outputs joint angle deltas.
- Fine-Tuning: Adapting a pre-trained model to a new task or environment by continuing training on a small task-specific dataset. Example: fine-tuning Octo on 200 custom pick-and-place episodes for 500 gradient steps.
- Zero-Shot: Applying a trained model to a new task or environment without any task-specific training. Example: a VLA trained on many tasks succeeds at a new instruction "pick up the blue block" without ever having seen that specific instruction during training.
- Few-Shot: Adapting to a new task using only a small number of examples (typically 1–20). Example: a meta-learning policy adapts to a new object shape after seeing just 5 demonstration episodes.
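Domain randomization, as defined above, amounts to resampling simulator parameters every training episode. A minimal sketch — the parameter names and ranges are hypothetical, chosen to match the friction example:

```python
import random

def sample_sim_params(rng: random.Random) -> dict:
    """Draw one set of randomized physics/rendering parameters per episode."""
    return {
        "table_friction": rng.uniform(0.3, 0.9),   # range from the example above
        "object_mass_kg": rng.uniform(0.05, 0.5),
        "light_intensity": rng.uniform(0.5, 1.5),
    }

rng = random.Random(0)
# One fresh parameter draw per training episode:
params = [sample_sim_params(rng) for _ in range(1000)]
frictions = [p["table_friction"] for p in params]
```

Because the policy never sees the same simulator twice, it is pushed toward behaviors that work across the whole parameter range, including the unknown real-world values.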
Evaluation Terms
- Success Rate: Fraction of evaluation trials where the policy completes the task. The standard primary metric. Example: 16/20 = 80% success rate over 20 real-robot evaluation trials.
- Out-of-Distribution (OOD): Inputs that differ from the training distribution — new object colors, positions, or environments not seen during training. Example: a policy trained on red blocks evaluated on green blocks is tested OOD on color.
- Generalization Gap: The difference between in-distribution and out-of-distribution success rates. Large gaps indicate overfitting to training conditions. Example: 90% in-distribution vs. 45% OOD = a 45-percentage-point generalization gap.
- Benchmark: A standardized set of tasks, objects, and evaluation protocols for comparing algorithms fairly. Example: FurnitureBench, RLBench, and LIBERO are common robot manipulation benchmarks.
- Real-World Evaluation: Policy evaluation on a physical robot (not simulation). Considered the gold standard; simulation metrics often do not transfer. Example: running 50 real-world trials on 5 different table setups to measure robustness.
- Ablation Study: An experiment removing or modifying one component of a system to measure its contribution. Example: ablating the CVAE component from ACT to measure how much it contributes to multi-task performance.
- Held-Out Set: A subset of data kept completely separate from training and only used for final evaluation. Ensures evaluation metrics are not inflated by overfitting. Example: reserving 10% of episodes from each task variant for evaluation, never exposing them to the training procedure.
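The success-rate and generalization-gap definitions above can be captured in a few lines of evaluation bookkeeping. The trial outcomes below are hypothetical (1 = success, 0 = failure):

```python
def success_rate(outcomes):
    """Fraction of evaluation trials that succeeded."""
    return sum(outcomes) / len(outcomes)

in_dist = [1] * 18 + [0] * 2    # 18/20 successes on in-distribution trials
ood     = [1] * 9  + [0] * 11   #  9/20 successes on out-of-distribution trials

gap = success_rate(in_dist) - success_rate(ood)   # 0.90 - 0.45 = 0.45
```

With only 20 trials per condition, these point estimates are noisy; reporting trial counts alongside percentages (as the definitions above do) is standard practice.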
For full-length explanations of any concept in this glossary, browse the SVRC Glossary or explore the Robotics Library articles above. This reference is updated as new techniques emerge in the field.