Policy Choices

LeRobot ships three production-ready policy architectures. Choose one before you run training — you cannot switch mid-run.

ACT

Action Chunking with Transformers. The fastest of the three to train and the baseline this unit trains below. Start here.

Diffusion Policy

Higher peak accuracy on precision tasks but 3–5x slower to train and infer. Use it after you have a working ACT baseline.

SmolVLA

Language-conditioned VLA. Use when your task requires natural language instructions or multi-task generalization. Requires more data.

ACT Training Command

Replace $HF_USER/pick-place-v1 with your dataset repo ID from Unit 3.

source ~/lerobot-env/bin/activate

python -m lerobot.scripts.train \
  --policy-type act \
  --dataset-repo-id $HF_USER/pick-place-v1 \
  --output-dir ~/lerobot-policies/pick-place-v1 \
  --config-overrides \
    training.num_steps=50000 \
    training.eval_freq=5000 \
    training.save_freq=5000 \
    training.batch_size=32 \
    policy.chunk_size=100 \
    policy.n_action_steps=100

# Add --device cuda if you have a GPU (strongly recommended)
# Checkpoints save every 5k steps to ~/lerobot-policies/pick-place-v1/
# Start this before sleep; it can run unattended
GPU vs CPU training time: On an RTX 3090 (24GB), 50,000 steps takes approximately 60–80 minutes. On an RTX 3080 (10GB), approximately 90–120 minutes. On CPU, expect 8–12 hours. Cloud GPU options (Lambda Labs, Vast.ai) run $0.50–1.50/hr for the hardware needed.
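A back-of-the-envelope way to convert throughput into wall-clock time. The steps-per-second figures below are assumptions chosen to be consistent with the ranges above, not benchmarks:

```python
def estimate_hours(num_steps: int, steps_per_sec: float) -> float:
    """Rough wall-clock estimate for a training run."""
    return num_steps / steps_per_sec / 3600

# Illustrative throughputs (assumed, not measured):
# fast GPU ~11 steps/s, CPU ~1.4 steps/s
print(f"{estimate_hours(50_000, 11):.1f} h")   # ~1.3 h on a fast GPU
print(f"{estimate_hours(50_000, 1.4):.1f} h")  # ~9.9 h on CPU
```

Measure your own steps/s from the first few minutes of the run before deciding whether to train locally or rent a cloud GPU.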

Recommended Hyperparameters for Single-Arm Pick-and-Place

| Parameter | Recommended | Why |
|---|---|---|
| num_steps | 50000 | Sufficient for 50–100 demos of a simple pick-and-place. Increase to 80k if your loss plateau occurs late. |
| batch_size | 32 | Standard for single-arm datasets. Reduce to 16 if you run out of GPU memory. |
| chunk_size | 100 | ACT plans 100 steps ahead. At 30 fps this is ~3.3 seconds, a good planning horizon for pick-and-place. |
| n_action_steps | 100 | Must match chunk_size. Reduces inference frequency and smooths execution. |
| kl_weight | 10 | LeRobot default. Do not change unless L_kl stays near zero after 20k steps. |
| lr | 1e-5 | LeRobot default for ACT. Lower to 5e-6 if reconstruction loss oscillates instead of converging. |
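To build intuition for chunk_size and n_action_steps, here is the arithmetic behind the table as a small sketch (illustrative helpers, not LeRobot code):

```python
import math

def planning_horizon_s(chunk_size: int, fps: int) -> float:
    """Seconds of motion covered by one predicted action chunk."""
    return chunk_size / fps

def policy_calls(episode_frames: int, n_action_steps: int) -> int:
    """Forward passes needed per episode when the policy executes
    n_action_steps actions between inferences."""
    return math.ceil(episode_frames / n_action_steps)

print(round(planning_horizon_s(100, 30), 2))  # 3.33 s, as in the table
print(policy_calls(300, 100))                 # 3 calls for a 10 s episode at 30 fps
```

Larger chunks mean fewer inference calls and smoother motion, at the cost of reacting more slowly to anything that changes mid-chunk.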

Reading Training Logs

Training logs print to the terminal and to TensorBoard. Launch TensorBoard in a second terminal:

tensorboard --logdir ~/lerobot-policies/

Then open http://localhost:6006 in your browser. Watch these curves:

loss/reconstruction (L_recon)

The primary training signal. Should decrease from ~2.5–3.5 to below 0.1 by 50,000 steps. A plateau above 0.15 after 40,000 steps usually means your dataset has too much variance — review Unit 3's good demo practices and consider recording more consistent demonstrations.
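If you prefer to check for a plateau numerically rather than by eye, one simple heuristic is to compare windowed means of the logged loss. This is a sketch, not part of LeRobot; the window size and tolerance are assumptions to tune for your run:

```python
def plateaued(losses, window=3, tol=0.01):
    """True if the mean loss over the last `window` evals improved by
    less than `tol` versus the preceding window. `losses` holds one
    L_recon value per eval, e.g. every 5k steps."""
    if len(losses) < 2 * window:
        return False
    earlier = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return earlier - recent < tol

print(plateaued([1.0, 0.7, 0.5, 0.35, 0.25, 0.18]))        # False: still improving
print(plateaued([0.2, 0.19, 0.186, 0.184, 0.183, 0.182]))  # True: flat above 0.15
```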

loss/kl (L_kl)

Rises slowly from near 0 to 5–20. This is expected behavior — the CVAE is learning a compact style embedding. If it exceeds 40, your demonstrations contain too much behavioral diversity. If it stays near 0 after 20k steps, the CVAE is not learning; increase kl_weight to 20.

train/loss (total loss)

L_recon + kl_weight × L_kl. Dominated by L_recon in early training. Should decrease monotonically. A total loss that rises after an initial decrease indicates learning rate decay is too aggressive — check the scheduler config.
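As a concrete instance of the formula above (the loss values are illustrative only, and this is a simplification; the actual LeRobot logs may include additional terms):

```python
def act_total_loss(l_recon: float, l_kl: float, kl_weight: float = 10.0) -> float:
    """train/loss as described above: L_recon + kl_weight * L_kl."""
    return l_recon + kl_weight * l_kl

print(act_total_loss(0.5, 0.02))  # 0.7 -- reconstruction still dominates here
```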

Checkpoint Management

Checkpoints save every 5,000 steps to ~/lerobot-policies/pick-place-v1/checkpoints/. Do not assume the final checkpoint is the best. The policy can overfit at high step counts, especially with small datasets.

After training, identify your best checkpoint: the step where L_recon reached its minimum before starting to plateau. For 50 demonstrations, this typically falls in the 35,000–50,000 step range. Note this step number; you will use it in Unit 5.
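Once you have read the per-checkpoint L_recon values off the TensorBoard curve, picking the best step is a one-liner. This is an illustrative helper, not a LeRobot API, and the loss values below are made up:

```python
def best_checkpoint(eval_losses: dict) -> int:
    """Return the saved step with the lowest eval reconstruction loss.
    `eval_losses` maps checkpoint step -> L_recon at that step."""
    return min(eval_losses, key=eval_losses.get)

# Hypothetical values read off a TensorBoard curve:
curve = {35_000: 0.092, 40_000: 0.085, 45_000: 0.088, 50_000: 0.091}
print(best_checkpoint(curve))  # 40000
```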

Unit 4 Complete When...

Training has completed 50,000 steps and checkpoints are saved in ~/lerobot-policies/pick-place-v1/checkpoints/. The final L_recon loss is below 0.1. You have identified your best checkpoint step from the loss curves. You understand what L_kl is doing in your training run. You are ready to evaluate the policy in Unit 5.