Multi-Head Attention

An attention mechanism that runs multiple attention operations in parallel, each with its own learned projections, then concatenates their outputs. Multi-head attention allows the model to attend to different types of information (e.g., position, color, shape) simultaneously. It is the core computational primitive of the transformer architectures used in VLAs and policy transformers.
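The mechanism can be sketched in a few lines of NumPy: project the input into queries, keys, and values, split the model dimension into heads, run scaled dot-product attention per head, then concatenate and apply an output projection. This is a minimal illustrative sketch, not any specific library's implementation; the function and variable names (`multi_head_attention`, `Wq`, `Wk`, `Wv`, `Wo`) are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention over x of shape (seq_len, d_model).
    Wq, Wk, Wv, Wo are (d_model, d_model) learned projections (illustrative)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project, then split the model dimension into num_heads parallel heads.
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)
    # Concatenate the heads and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, heads = 16, 4, 4
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
y = multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads=heads)
print(y.shape)
```

Because each head gets its own slice of the projected dimensions, the heads can specialize (one attending to positional cues, another to appearance features) while the total compute stays comparable to a single full-width attention operation.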
