ACT (Action Chunking with Transformers)

ACT is an imitation learning algorithm introduced by Tony Zhao et al. (2023) that trains a transformer-based policy to predict a fixed-length chunk of future actions rather than a single action at each timestep. By predicting action sequences in one shot, ACT reduces the compounding error typical of step-by-step behavioral cloning and produces temporally consistent motion. The policy is trained as a conditional variational autoencoder (CVAE): an encoder compresses the demonstrated action sequence and proprioceptive state into a latent variable, and a transformer decoder conditions on RGB observations, proprioception, and that latent to output the action chunk. At test time, the paper additionally smooths control by querying the policy every step and averaging overlapping chunk predictions (temporal ensembling). ACT was demonstrated on the ALOHA bimanual platform, achieving strong performance on fine-grained tasks such as opening a bag and transferring eggs. See also: Action Chunking (deep dive).
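The temporal-ensembling step can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name, the chunk-history representation, and the weight constant `m` are assumptions, but the weighting scheme (exponential weights that favor older predictions, as in the ACT paper) is the technique being shown.

```python
import numpy as np

def temporal_ensemble(chunk_history, t, m=0.01):
    """Combine overlapping action-chunk predictions for timestep t.

    chunk_history: list of (start_step, chunk) pairs, oldest first, where
    chunk is an (H, action_dim) array predicted at start_step and covering
    timesteps start_step .. start_step + H - 1.

    Each chunk that covers timestep t contributes its prediction for t,
    weighted by w_i = exp(-m * i) with i = 0 for the oldest chunk, so
    earlier predictions are weighted more heavily (as in the ACT paper).
    """
    preds, weights = [], []
    for i, (start, chunk) in enumerate(chunk_history):
        offset = t - start
        if 0 <= offset < len(chunk):          # chunk covers timestep t
            preds.append(chunk[offset])
            weights.append(np.exp(-m * i))
    w = np.array(weights)
    # Normalized weighted average over all overlapping predictions.
    return (np.stack(preds) * w[:, None]).sum(axis=0) / w.sum()

# Toy usage: three overlapping 4-step chunks from a hypothetical policy,
# predicted at timesteps 0, 1, and 2; query the ensembled action at t=2.
history = [(s, np.full((4, 2), float(s))) for s in range(3)]
action = temporal_ensemble(history, t=2)
```

Because only a weighted average is taken, the executed action changes smoothly even when consecutive chunk predictions disagree, which is the motivation for this step in ACT.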
Related terms: Policy, Transformer, Imitation Learning
