Running Inference on the Real Arm
Deployment means running your trained checkpoint in real time, feeding live camera and joint observations into the network and executing the output actions on the physical arm. The inference script handles the observation-action loop at 50Hz.
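The loop described above can be sketched as follows. This is a minimal illustration, not the actual inference script: `policy`, `camera`, and `arm` are hypothetical interfaces standing in for whatever your stack provides, and the only real point is holding the 50 Hz control rate while reading observations and sending actions.

```python
# Sketch of a 50 Hz observation-action loop. `policy`, `camera`, and
# `arm` are placeholder objects -- substitute your own interfaces.
import time

CONTROL_HZ = 50
DT = 1.0 / CONTROL_HZ  # 20 ms control period

def run_episode(policy, camera, arm, max_steps=500):
    for step in range(max_steps):
        t_start = time.monotonic()
        obs = {
            "image": camera.read(),         # latest RGB frame
            "qpos": arm.joint_positions(),  # current joint angles
        }
        action = policy.predict(obs)        # network forward pass
        arm.send_joint_command(action)
        # Sleep for the remainder of the period to hold 50 Hz.
        elapsed = time.monotonic() - t_start
        time.sleep(max(0.0, DT - elapsed))
```

If the forward pass ever takes longer than 20 ms, the loop silently runs slower than 50 Hz; a production script should log or abort on missed deadlines rather than drift.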
For the first deployment run, keep your hand near the physical E-stop. A freshly deployed policy can make unexpected movements whenever its live observations drift outside the training distribution — different lighting, a shifted camera, or an object pose it never saw. Treat the first few episodes as untrusted, and only relax once you have watched the arm behave consistently.
For comprehensive deployment and production guidance including safety envelopes and watchdog timers, see the OpenArm Production Guide.
Evaluation Methodology
Do not evaluate your policy informally. Use a structured protocol — it is the only way to know if a change you make (more data, different checkpoint, different task framing) actually improved performance:
| Protocol Item | Specification |
|---|---|
| Number of episodes per evaluation | 10 minimum, 20 for high-confidence results |
| Object starting position | Fixed. Use tape marks. Same position every episode. |
| Object type | Same object as training. Lighting must match training conditions. |
| What counts as success | Object placed within 3cm of target. Arm returns to home. No human intervention during episode. |
| Failure classification | Log failure type: missed grasp / dropped object / wrong target / timeout. This tells you what to fix. |
| Report metric | Success rate = successful episodes / total episodes. Report with episode count (e.g., "7/10 = 70%"). |
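The protocol table above is easy to mirror in a small logging helper. This is a sketch under the assumption that you record one dict per episode; the function name and dict keys are illustrative, but the failure categories and the reporting format ("7/10 = 70%") follow the table directly.

```python
# Summarize a structured evaluation run: success rate plus a
# failure-type breakdown, matching the protocol table.
from collections import Counter

FAILURE_TYPES = {"missed_grasp", "dropped_object", "wrong_target", "timeout"}

def summarize(episodes):
    """episodes: list of dicts like {"success": bool, "failure": str or None}."""
    n = len(episodes)
    successes = sum(1 for e in episodes if e["success"])
    failures = Counter(
        e["failure"]
        for e in episodes
        if not e["success"] and e["failure"] in FAILURE_TYPES
    )
    return {
        "success_rate": f"{successes}/{n} = {100 * successes / n:.0f}%",
        "failure_breakdown": dict(failures),
    }
```

Logging the failure type per episode is what makes the "Analyze" step of the flywheel possible: the breakdown tells you which failure videos to watch first.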
The Data Flywheel: How to Get Better
A policy that succeeds 7/10 times is a good start — but the path to 9/10 or beyond is through the data flywheel. This is the core loop of robot learning in production:
1. Collect: Record demonstrations, including failure cases your current policy struggles with.
2. Train: Retrain (or fine-tune) on your expanded dataset with the new demonstrations added.
3. Evaluate: Run the structured eval protocol. Did the success rate improve? What failure modes remain?
4. Analyze: Watch the failure videos. Identify the specific state where the policy breaks down. Collect targeted data there, and the loop repeats.
The key insight of the flywheel: targeted data beats random data. Instead of recording 50 more random demonstrations, watch your failure videos and identify the exact moment things go wrong. Record 20 demonstrations that specifically cover that difficult state (e.g., the grasp at the edge of the workspace, or the object at an unusual angle). Your success rate will improve faster with 20 targeted demos than 50 random ones.
Common Failure Modes and How to Fix Them
- Arm overshoots the grasp position: The policy's action chunks are too large or your data had high velocity variance. Record 10 more demos at slow speed near the grasp point, or reduce `chunk_size` from 100 to 50 in the training config.
- Arm succeeds on the training object but fails on slightly different objects: Your training data lacked object position diversity. Record 20 demos with the object at 5 different positions within a 10cm radius. This teaches the policy to generalize.
- Policy freezes or produces repeated motions: The CVAE style variable is collapsing. This often means your dataset has too much variance — the model cannot find a consistent style. Check for mixed demonstrations (different operators, different task framings) and clean your dataset.
Unit 6 Complete When...
Your arm completes the pick-and-place task autonomously 7 out of 10 times in a structured evaluation run. You have watched the 3 failure videos and identified what went wrong. You understand the data flywheel well enough to plan your next improvement iteration. This is the end of the structured path — but it is the beginning of your robot learning practice.
What's Next
You have the foundation. Here is where to go next: