Question 1

What is HumanML3D?

Accepted Answer

HumanML3D is a dataset pairing 14,616 human motion sequences with 44,970 natural language descriptions, which works out to roughly 3 descriptions per motion. Descriptions are short free-form sentences such as 'a person walks forward and turns left.' It is the most widely used benchmark for text-to-motion generation research and is built on motion data from the AMASS archive and HumanAct12.
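To make the annotation structure concrete, here is a minimal sketch of parsing one description line. It assumes the layout commonly seen in the released text files, where each line holds the caption, part-of-speech-tagged tokens, and a start and end time separated by '#'. The `Caption` class, the `parse_caption_line` helper, and the example line are illustrative and not part of the dataset's tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    text: str          # the free-form description
    tokens: List[str]  # "word/POS" pairs
    start: float       # segment start in seconds
    end: float         # segment end in seconds


def parse_caption_line(line: str) -> Caption:
    # Assumed layout: caption#word/POS word/POS ...#start#end
    text, tokens, start, end = line.strip().split("#")
    return Caption(text, tokens.split(" "), float(start), float(end))


# Hypothetical example line written for this sketch, not copied from the dataset.
example = (
    "a person walks forward and turns left."
    "#a/DET person/NOUN walk/VERB forward/ADV and/CCONJ turn/VERB left/ADV"
    "#0.0#0.0"
)
cap = parse_caption_line(example)
print(cap.text)
print(cap.tokens[:3])
```

If the files you download use a different separator or column order, adjust the `split` call accordingly.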

Question 2

Is HumanML3D free to use?

Accepted Answer

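Partly. The HumanML3D repository is released under the MIT license, which is permissive. The underlying motion data, however, comes from AMASS and HumanAct12, which carry their own license terms; AMASS in particular is licensed for non-commercial research. Research use is straightforward, but check the source licenses before any commercial use.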

Question 3

How is HumanML3D useful for robotics?

Accepted Answer

HumanML3D enables training language-conditioned motion generation models whose output can be retargeted to humanoid robots. Instead of manually programming every motion, you describe the desired behavior in natural language, the model generates a human motion, and a retargeting step maps it to the robot's joint trajectories.
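As a rough illustration of the last step before a motion reaches a robot, the sketch below takes a trajectory that has already been retargeted to robot joint angles (retargeting itself is not shown), resamples it from the dataset frame rate to a controller rate, and clamps it to joint limits. The function name, the 20 fps source rate, the 100 Hz controller rate, and the limits are all assumptions for the example, not values from any particular robot or library.

```python
def resample_and_clamp(traj, src_fps, dst_fps, lower, upper):
    """Linearly resample joint-angle frames and clamp them to joint limits.

    traj: list of frames (at least two), each a list of joint angles in radians.
    lower/upper: per-joint limits, same length as one frame.
    """
    duration = (len(traj) - 1) / src_fps
    n_out = int(round(duration * dst_fps)) + 1
    out = []
    for i in range(n_out):
        # Position of output sample i measured in source frames.
        t = min(i / dst_fps * src_fps, len(traj) - 1)
        k = min(int(t), len(traj) - 2)
        alpha = t - k
        frame = [(1 - alpha) * a + alpha * b for a, b in zip(traj[k], traj[k + 1])]
        out.append([min(max(q, lo), hi) for q, lo, hi in zip(frame, lower, upper)])
    return out


# Two frames of a hypothetical 2-joint robot, assumed to be sampled at 20 fps.
traj = [[0.0, 0.0], [1.0, -2.0]]
out = resample_and_clamp(traj, src_fps=20, dst_fps=100,
                         lower=[-0.5, -1.0], upper=[0.5, 1.0])
print(len(out), out[-1])
```

Plain linear interpolation ignores angle wraparound and dynamics, so a real pipeline would likely add smoothing and velocity limits on top of this.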

Metric	Value
Motions	14,616 unique motion sequences
Text descriptions	44,970 (avg. ~3 per motion)
Source	AMASS motion capture + HumanAct12
Representation	263-dim feature vector per frame (joints, velocities, foot contact)
License	MIT (repository); source data subject to AMASS and HumanAct12 terms
Benchmark models	MDM, MotionDiffuse, MoMask, T2M-GPT, ReMoDiffuse
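The 263-dimensional representation in the table can be broken down as follows. This is a sketch assuming the commonly described layout for a 22-joint skeleton: root angular velocity, root linear velocity, root height, local joint positions, 6D joint rotations, joint velocities, and binary foot contacts. The `LAYOUT` table and `feature_slices` helper are illustrative names, and the ordering should be checked against the dataset's own processing scripts before relying on it.

```python
NUM_JOINTS = 22  # assumed skeleton size

# (name, width) pairs in the assumed feature order.
LAYOUT = [
    ("root_angular_velocity", 1),
    ("root_linear_velocity", 2),
    ("root_height", 1),
    ("local_joint_positions", (NUM_JOINTS - 1) * 3),  # root excluded
    ("joint_rotations_6d", (NUM_JOINTS - 1) * 6),     # root excluded
    ("joint_velocities", NUM_JOINTS * 3),
    ("foot_contacts", 4),
]


def feature_slices():
    """Return a dict mapping each feature name to its slice of the vector."""
    slices = {}
    offset = 0
    for name, width in LAYOUT:
        slices[name] = slice(offset, offset + width)
        offset += width
    return slices


slices = feature_slices()
total = sum(width for _, width in LAYOUT)
print(total)

# Apply the slices to one placeholder frame (all zeros, not real data).
frame = [0.0] * total
foot_contacts = frame[slices["foot_contacts"]]
print(len(foot_contacts))
```

The same slice objects also index the last axis of a NumPy array or tensor, for example `motion[..., slices["joint_rotations_6d"]]`.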

HumanML3D: Text-to-Motion Generation Dataset
