MoMask: Generative Masked Modeling of 3D Human Motions
January 2026
25 min read
Generative Models, Motion Models, Transformers, MoMask, 3D Vision
What You'll Learn
- Architecture Overview of MoMask
- Training the Residual VQ-VAE
- Masked Transformer Modeling
- Inference Pipeline & Sampling
- Quantitative & Qualitative Results
Key Concepts Covered
Learning to reconstruct a corrupted motion-token sequence by predicting its masked tokens from the visible ones.
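How a training input is corrupted for masked modeling can be sketched in a few lines. This is a minimal illustration, not MoMask's code: it assumes a MaskGIT-style cosine schedule for the masking ratio, and `MASK_ID`, `cosine_mask_ratio`, and `mask_tokens` are hypothetical names. A transformer would then be trained to predict the original token at every masked position.

```python
import math
import numpy as np

MASK_ID = -1  # hypothetical id for the special [MASK] token; real ids are >= 0


def cosine_mask_ratio(tau: float) -> float:
    """Cosine schedule gamma(tau) = cos(pi * tau / 2) for tau in [0, 1]."""
    return math.cos(math.pi * tau / 2)


def mask_tokens(tokens: np.ndarray, rng: np.random.Generator):
    """Corrupt a motion-token sequence for one masked-modeling training step.

    Returns the corrupted sequence and a boolean array marking the masked
    positions; the prediction loss would be computed only at those positions.
    """
    n = len(tokens)
    ratio = cosine_mask_ratio(rng.uniform())          # fraction of tokens to hide
    num_masked = max(1, math.ceil(ratio * n))         # always hide at least one
    masked_idx = rng.choice(n, size=num_masked, replace=False)
    is_masked = np.zeros(n, dtype=bool)
    is_masked[masked_idx] = True
    corrupted = np.where(is_masked, MASK_ID, tokens)  # replace hidden tokens with [MASK]
    return corrupted, is_masked


# Usage: corrupt a toy sequence of codebook indices.
rng = np.random.default_rng(0)
tokens = np.array([12, 7, 33, 7, 90, 4, 51, 18])
corrupted, is_masked = mask_tokens(tokens, rng)
```

Sampling the ratio per step means the model sees everything from lightly to almost fully masked inputs, which matches the range of masking levels it encounters during iterative decoding.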
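A residual VQ-VAE that quantizes motion into several layers of discrete tokens, each layer encoding the error left by the layers before it, yielding a coarse-to-fine representation.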
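The coarse-to-fine behaviour of residual quantization can be illustrated with a small NumPy sketch. It assumes fixed, randomly initialized codebooks and plain nearest-neighbour lookup; the encoder, decoder, codebook learning, and any training tricks of the actual model are omitted, and `nearest_code` and `residual_quantize` are hypothetical names.

```python
import numpy as np


def nearest_code(x: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest codebook row (squared L2 distance) for each row of x."""
    # x: (T, D), codebook: (K, D) -> pairwise distances: (T, K)
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def residual_quantize(z: np.ndarray, codebooks: list[np.ndarray]):
    """Quantize latents z with a stack of codebooks, one layer at a time.

    Each layer quantizes whatever error the previous layers left behind, so the
    first layer captures coarse structure and later layers add finer detail.
    """
    residual = z.copy()
    quantized = np.zeros_like(z)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)  # one token per latent time step
        q = codebook[idx]                       # (T, D) selected code vectors
        indices.append(idx)
        quantized += q                          # running coarse-to-fine reconstruction
        residual -= q                           # error handed to the next layer
    return indices, quantized


# Usage: 16 latent steps of dimension 8, four layers of 32 codes each.
rng = np.random.default_rng(0)
T, D, K, num_layers = 16, 8, 32, 4
z = rng.normal(size=(T, D))
codebooks = [rng.normal(scale=0.5 ** i, size=(K, D)) for i in range(num_layers)]
indices, z_q = residual_quantize(z, codebooks)
```

Here `indices[0]` holds the coarse base-layer tokens and later entries refine them. In MoMask, the masked transformer generates the base-layer tokens and a separate residual transformer predicts the remaining layers.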
Capabilities enabled by masked modeling, such as motion in-painting and interpolation.
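Because the predictor is trained only to fill masked positions, the same decoding loop can in principle in-paint a motion: keep the known tokens, mask the span to be generated, and decode it over a few steps. The sketch below assumes MaskGIT-style confidence-based decoding with a cosine schedule and greedy argmax selection; `iterative_inpaint` and `predict_logits` are hypothetical, the latter standing in for a trained text-conditioned transformer. A real sampler would typically add temperature or top-k sampling.

```python
import math
from typing import Callable

import numpy as np

MASK_ID = -1  # hypothetical id for the special [MASK] token


def iterative_inpaint(
    tokens: np.ndarray,
    is_known: np.ndarray,
    predict_logits: Callable[[np.ndarray], np.ndarray],
    num_steps: int = 4,
) -> np.ndarray:
    """Fill the unknown positions of a token sequence by iterative masked decoding.

    Known tokens are kept fixed; unknown ones start as [MASK] and are committed
    a few at a time, most confident first, following a cosine schedule.
    """
    seq = np.where(is_known, tokens, MASK_ID)
    n_unknown = int((~is_known).sum())
    for step in range(1, num_steps + 1):
        logits = predict_logits(seq)                        # (T, K)
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)           # softmax over codes
        pred, conf = probs.argmax(axis=1), probs.max(axis=1)
        was_masked = seq == MASK_ID
        seq = np.where(was_masked, pred, seq)               # tentatively fill all masks
        conf = np.where(was_masked, conf, np.inf)           # committed tokens stay put
        # Number of positions that remain masked after this step (reaches 0 at the end).
        n_remask = math.floor(n_unknown * math.cos(math.pi / 2 * step / num_steps))
        if n_remask > 0:
            seq[np.argsort(conf)[:n_remask]] = MASK_ID      # re-mask least confident
    return seq


# Usage: in-paint positions 3 to 6 of a 10-token sequence with a random stand-in model.
rng = np.random.default_rng(0)
K, T = 16, 10
tokens = rng.integers(0, K, size=T)
is_known = np.ones(T, dtype=bool)
is_known[3:7] = False


def dummy_model(seq: np.ndarray) -> np.ndarray:
    """Random logits; stands in for a trained, text-conditioned transformer."""
    return rng.normal(size=(len(seq), K))


out = iterative_inpaint(tokens, is_known, dummy_model)
```

Interpolation between two motions follows the same pattern, with the known tokens placed at both ends and the middle left masked.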
Slide Overview
- Architecture Overview (Slides 1-5)
- Residual VQ-VAE & Training (Slides 6-12)
- Masked & Residual Transformer (Slides 13-20)
- Inference Pipeline (Slides 21-25)
- Results & Future Work (Slides 26-end)
