What Is a Robot Trajectory?

A robot trajectory is the sequence of states and actions recorded during one episode of robot operation. Formally: {(s_0, a_0), (s_1, a_1), ..., (s_T, a_T)}, where s_t is the full sensor observation at time step t (camera images + joint states + force readings) and a_t is the action taken.
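As a minimal sketch of this structure (the field names below are illustrative, not a standard schema):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Step:
    """One (s_t, a_t) pair from a single episode."""
    image: np.ndarray      # camera observation, e.g. (H, W, 3) uint8
    joint_pos: np.ndarray  # joint angles at time t
    force: np.ndarray      # force/torque reading
    action: np.ndarray     # a_t; format depends on the chosen action space

# A trajectory is simply the ordered list of steps for one episode.
trajectory = [
    Step(image=np.zeros((64, 64, 3), np.uint8),
         joint_pos=np.zeros(6),
         force=np.zeros(6),
         action=np.zeros(6))
    for _ in range(100)
]
```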

For training, the action sequence is what the policy must learn to reproduce. But "action" is not a single universal format — it depends on your design choices, and those choices have significant consequences for policy generalization and training difficulty.

Joint Space Representation

The most direct representation: the action a_t is a vector of absolute joint angles q ∈ ℝⁿ, where n is the number of joints. The robot controller receives joint angle targets and drives each joint to the commanded position.

  • Advantages: No IK required — angles go directly to the motor controllers. Deterministic: the same action always produces the same joint configuration. Fast execution — no intermediate computation.
  • Disadvantages: Arm-specific. A policy trained on a 6-DOF arm cannot be transferred to a 7-DOF arm without retraining, even for the same task. Not intuitive: angle values do not correspond to meaningful task concepts.
  • Standard use: ALOHA and ACT use absolute joint space as the primary action representation. For fixed-workspace, single-arm tasks this works well.
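A sketch of how direct (and how arm-specific) this is — send_joint_command is a hypothetical controller interface, and the target values are arbitrary:

```python
import numpy as np

ARM_DOF = 6  # this policy's action dimension is tied to one specific arm

def send_joint_command(q):
    """Forward each absolute target angle to its motor controller -- no IK step."""
    if q.shape != (ARM_DOF,):
        raise ValueError("action dimension does not match this arm's DOF")
    return q  # a real controller would stream this to the joints

q_target = np.array([0.0, -0.5, 1.2, 0.0, 0.9, 0.0])  # radians
send_joint_command(q_target)            # fine on the 6-DOF arm
# send_joint_command(np.zeros(7))       # a 7-DOF policy output would raise
```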

Cartesian Task Space

The action a_t represents the desired end-effector pose: (x, y, z, qx, qy, qz, qw) — three position components and a quaternion for orientation.

  • Advantages: Intuitive — the action directly specifies where the end-effector should be. More transferable across robots with different joint configurations but similar workspaces.
  • Disadvantages: Requires IK to convert to joint commands — adds latency and singularity risk. Rotation representation is tricky: Euler angles have gimbal lock (avoid them), quaternions are compact but not unique (double cover of SO(3)).
  • Rotation representation note: Always use quaternions (not Euler angles) in code. For neural network outputs, consider 6D rotation representation (first two columns of the rotation matrix) — it is continuous and singularity-free.
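The 6D representation above can be sketched as follows (function names are mine; the reconstruction is the standard Gram-Schmidt recipe for turning two predicted columns back into a valid rotation matrix):

```python
import numpy as np

def rotmat_to_6d(R):
    """Take the first two columns of a 3x3 rotation matrix -> (6,) vector."""
    return R[:, :2].reshape(-1, order="F")  # column 0, then column 1

def sixd_to_rotmat(d6):
    """Gram-Schmidt the two (possibly unnormalized) columns into a rotation."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1       # remove the component along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)               # third column from the right-hand rule
    return np.stack([b1, b2, b3], axis=1)
```

Because the reconstruction renormalizes, a network can output any 6 real numbers and still yield a valid rotation — which is exactly what makes this parameterization continuous and singularity-free.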

Delta Actions

Instead of absolute targets, delta actions specify the change from the current state: a_t = Δq (change in joint angles) or a_t = (Δx, Δy, Δz, ΔR) (change in Cartesian position and orientation).

  • Why deltas are easier to learn: The magnitude of delta actions is small and roughly constant across a trajectory. Networks learn to predict small, bounded values more easily than large absolute coordinates that vary with workspace position.
  • Implicit safety: Clipping delta actions to a maximum magnitude bounds the robot's speed — an important safety property.
  • Standard use: Diffusion Policy, RT-1, Octo, and most VLA models use Cartesian delta actions as their primary action space.
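The clipping-based safety property can be sketched in a few lines (the 1 cm limit is an assumed value; real limits are tuned per robot and control rate):

```python
import numpy as np

MAX_DELTA = 0.01  # meters per control step -- assumed safety limit

def apply_delta(pos_xyz, delta_xyz, max_delta=MAX_DELTA):
    """Clip the commanded translation delta, then integrate it.
    Bounding the per-step delta bounds the end-effector speed."""
    clipped = np.clip(delta_xyz, -max_delta, max_delta)
    return pos_xyz + clipped

pos = np.array([0.30, -0.10, 0.15])
# The oversized x delta (0.05 m) gets clipped to 0.01 m.
pos = apply_delta(pos, np.array([0.05, 0.0, -0.002]))
```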

Absolute vs. Object-Relative Representations

A fundamental generalization question: should actions be represented in the robot's workspace frame, or relative to the object being manipulated?

  • Absolute (workspace frame): Action values depend on where in the workspace the object is. If the table is moved 10 cm, the policy fails. Good for fixed setups; poor for deployment generalization.
  • Object-relative: Actions are expressed as offsets from the detected object pose. Policy learns "grasp from 5 cm above the object" rather than "move to (0.3, -0.1, 0.15)". Requires reliable object detection or pose estimation, but generalizes dramatically better to new table heights, positions, and even new environments.
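A translation-only sketch of the conversion (real pipelines also handle orientation, and the poses here are made-up values):

```python
import numpy as np

def to_object_relative(ee_target, obj_pos):
    """Re-express an absolute end-effector target as an offset from the object."""
    return ee_target - obj_pos

def to_workspace(rel_action, obj_pos):
    """At execution time, recover an absolute target from the current object pose."""
    return obj_pos + rel_action

# Training scene: grasp demonstrated 5 cm above the object.
rel = to_object_relative(np.array([0.30, -0.10, 0.20]),
                         np.array([0.30, -0.10, 0.15]))
# Deployment: the object has moved, but the same relative action still applies.
target = to_workspace(rel, np.array([0.45, 0.05, 0.15]))
```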

Trajectory Length and Padding

Tasks vary in duration: a simple grasp might take 2 seconds (100 steps at 50 Hz); a multi-step assembly might take 30 seconds (1500 steps). This creates a practical challenge for batch training.

  • Fixed-length with padding: Pad shorter episodes to a maximum length with a special "no-op" action token. Use attention masking in Transformer-based policies so the network ignores padding tokens. Simple to implement.
  • Variable-length with masking: Process each episode at its natural length. Requires careful batching (group by similar length or use dynamic padding).
  • Action chunking: ACT-style: break the trajectory into fixed-length chunks (e.g., 100 steps) and train on chunks independently. Naturally handles variable episode lengths while maintaining fixed-size model inputs.
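The padding and chunking strategies above can be sketched together (function names are mine, not from a specific codebase):

```python
import numpy as np

def pad_and_mask(actions, max_len):
    """Pad a (T, d) action sequence to max_len and return an attention mask
    (True = real step, False = padding) for Transformer-style training."""
    T, d = actions.shape
    padded = np.zeros((max_len, d), actions.dtype)
    padded[:T] = actions
    mask = np.zeros(max_len, dtype=bool)
    mask[:T] = True
    return padded, mask

def chunk(actions, chunk_len):
    """ACT-style: split an episode into fixed-length chunks, padding the last.
    Any episode length maps to fixed-size model inputs."""
    return [pad_and_mask(actions[start:start + chunk_len], chunk_len)
            for start in range(0, len(actions), chunk_len)]
```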

Representation Comparison

Representation            | Intuitive | Generalizable | IK Needed         | Used In
--------------------------|-----------|---------------|-------------------|--------------------------------
Absolute joint angles     | No        | Low           | No                | ALOHA, ACT, RoboAgent
Absolute Cartesian pose   | Yes       | Moderate      | Yes               | Classic manipulation research
Cartesian delta actions   | Yes       | High          | Yes (incremental) | Diffusion Policy, RT-1, Octo
Object-relative Cartesian | Yes       | Very High     | Yes               | Generalizable grasping research
Waypoint sequences        | Yes       | High          | Yes               | Long-horizon task planning

Trajectory representation is one of the most impactful design decisions in a robot learning system. Spend time on this choice before investing in large-scale data collection. See the SVRC platform for tools that support all of the above representations with built-in format conversion.