MoMask: Generative Masked Modeling of 3D Human Motions

January 2026
25 min read
Generative Models, Motion Models, Transformers, MoMask, 3D Vision

What You'll Learn

  • Architecture Overview of MoMask
  • Training the Residual VQ-VAE
  • Masked Transformer Modeling
  • Inference Pipeline & Sampling
  • Quantitative & Qualitative Results

Key Concepts Covered

Learning to reconstruct corrupted motion sequences token by token.
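To make the corruption step concrete, here is a minimal sketch of how a sequence of discrete motion tokens might be masked before a model is asked to reconstruct it. The `MASK` id, the `mask_tokens` helper, the fixed 50% ratio, and the example token ids are all illustrative assumptions, not details taken from MoMask itself.

```python
import random

MASK = -1  # hypothetical id standing in for a [MASK] token


def mask_tokens(tokens, ratio, rng):
    """Replace a random subset of tokens with MASK.

    Returns the corrupted sequence and the sorted masked positions.
    """
    n_mask = max(1, round(ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)  # copy so the original sequence is untouched
    for p in positions:
        corrupted[p] = MASK
    return corrupted, positions


rng = random.Random(0)
tokens = [12, 7, 7, 301, 45, 45, 9, 88]  # made-up motion token ids
corrupted, positions = mask_tokens(tokens, ratio=0.5, rng=rng)
print(corrupted, positions)
```

During training, a model would see `corrupted` and be scored only on how well it predicts the original tokens at `positions`.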

A hierarchical VQ-VAE that learns coarse-to-fine discrete representations.
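The coarse-to-fine idea can be sketched with plain residual quantization: each layer picks the nearest codebook entry to whatever the previous layers failed to explain. This is a generic illustration under assumed settings (random codebooks, 3 layers, 16 entries, 4 dimensions), with a hypothetical `residual_quantize` helper; it omits the encoder, decoder, and training losses of an actual VQ-VAE.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Quantize x with a stack of codebooks.

    Each codebook encodes the residual left over by the previous layers.
    Returns one index per layer and the summed approximation of x.
    """
    residual = x.astype(float)  # astype returns a copy, so x is not modified
    approx = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Nearest entry to the current residual (squared Euclidean distance).
        dists = np.sum((codebook - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        approx += codebook[idx]
        residual -= codebook[idx]
    return indices, approx


rng = np.random.default_rng(0)
# Assumed setup: later layers use smaller-scale entries to model finer detail.
codebooks = [rng.normal(scale=1.0 / (2 ** i), size=(16, 4)) for i in range(3)]
x = rng.normal(size=4)
indices, approx = residual_quantize(x, codebooks)
print(indices, np.linalg.norm(x - approx))
```

The first index alone gives a coarse approximation of `x`; adding the entries selected by later layers refines it.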

Capabilities enabled by masked modeling, such as motion in-painting and interpolation.

Slide Overview

  • Architecture Overview (Slides 1-5)
  • Residual VQ-VAE & Training (Slides 6-12)
  • Masked & Residual Transformer (Slides 13-20)
  • Inference Pipeline (Slides 21-25)
  • Results & Future Work (Slides 26-end)