Diffusion Transformers with Representation Autoencoders

January 2026
25 min read
Generative Models, Diffusion Models, Transformers, Representation Learning

What You'll Learn

  • Limitations of standard VAE encoders (outdated backbones, low-dim latents)
  • Introduction to Representation Autoencoders (RAEs)
  • Challenges of operating in high-dimensional latent spaces
  • Theoretical fixes for faster convergence in these wide latent spaces (see the sketch after this list)
  • ImageNet 256×256 generation results (1.51 FID)
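One such fix is shifting the diffusion noise schedule to account for latent dimensionality, in the spirit of SD3's resolution-dependent timestep shift. Below is a minimal sketch; the sqrt(dim / base_dim) shift factor and the base_dim constant are assumptions for illustration, not the paper's exact recipe.

```python
import math

def shift_timestep(t: float, dim: int, base_dim: int = 4096) -> float:
    """Shift a flow-matching timestep toward the noisier end for wider
    latents, following the functional form of SD3's resolution shift.
    The sqrt(dim / base_dim) factor and base_dim=4096 are illustrative
    assumptions, not constants taken from the paper."""
    alpha = math.sqrt(dim / base_dim)
    return alpha * t / (1 + (alpha - 1) * t)

# At the base dimensionality nothing changes; for a ~48x wider latent
# the same nominal timestep lands much closer to pure noise.
print(shift_timestep(0.5, dim=4096))    # 0.5
print(shift_timestep(0.5, dim=196608))  # ~0.87
```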

Key Concepts Covered

Pairing frozen pretrained encoders (DINO, SigLIP) with decoders trained for pixel reconstruction.
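A minimal sketch of the idea, assuming a frozen ViT-style encoder that emits (B, N, D) patch tokens. The PatchEmbed stand-in, the layer sizes, and the plain MSE objective are illustrative only; the paper starts from a real pretrained DINO/SigLIP encoder and trains the decoder with a richer reconstruction loss.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Stand-in for a pretrained encoder: maps (B, 3, H, W) images to
    (B, N, D) patch tokens, the interface DINO/SigLIP-style ViTs expose."""
    def __init__(self, dim: int = 768, patch: int = 16):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)  # (B, N, D)

class RAE(nn.Module):
    """Representation Autoencoder sketch: frozen pretrained encoder,
    decoder trained from scratch to map tokens back to pixels."""
    def __init__(self, encoder: nn.Module, dim: int = 768, patch: int = 16):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)  # the representation encoder stays frozen
        self.patch = patch
        # Hypothetical lightweight decoder; the paper trains a ViT decoder.
        self.decoder = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(),
            nn.Linear(dim, patch * patch * 3),
        )

    def forward(self, x):
        b, _, h, w = x.shape
        z = self.encoder(x)        # (B, N, D) latent tokens
        patches = self.decoder(z)  # (B, N, patch*patch*3)
        gh, gw = h // self.patch, w // self.patch
        img = patches.view(b, gh, gw, self.patch, self.patch, 3)
        return img.permute(0, 5, 1, 3, 2, 4).reshape(b, 3, h, w)

rae = RAE(PatchEmbed())
x = torch.randn(2, 3, 224, 224)
loss = nn.functional.mse_loss(rae(x), x)  # simplified reconstruction objective
```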

Semantically rich, high-dimensional representations that pose challenges for standard diffusion training.
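To make the gap concrete, compare latent sizes (the shapes are standard for the models named; the side-by-side comparison is an illustration): an SD-style VAE squeezes a 256×256 image into a 4×32×32 grid, while a DINOv2-B/14 encoder turns a 224×224 image into 256 tokens of 768 dimensions each.

```python
import math

vae_latent = (4, 32, 32)   # SD-style VAE: 4 channels per spatial position
rae_latent = (256, 768)    # DINOv2-B/14 tokens: 768 dims per token

print(math.prod(vae_latent))  # 4096 values to model
print(math.prod(rae_latent))  # 196608 values, ~48x more
```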

The diffusion transformer (DiT) backbone and how it is scaled for these experiments.
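For reference, the core pattern such a backbone is built from: a transformer block with adaLN-Zero conditioning, as in the original DiT. This is a generic sketch; the paper's exact block, widths, and depth are not reproduced here.

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """Minimal DiT-style block: self-attention + MLP, each modulated by
    scale/shift/gate terms produced from a conditioning vector (adaLN)."""
    def __init__(self, dim: int, heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )
        self.ada = nn.Linear(dim, 6 * dim)
        nn.init.zeros_(self.ada.weight)  # adaLN-Zero: block starts as identity
        nn.init.zeros_(self.ada.bias)

    def forward(self, x, c):
        # c: (B, D) timestep/class embedding -> per-block modulation terms
        s1, b1, g1, s2, b2, g2 = self.ada(c).unsqueeze(1).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1) + b1
        x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2) + b2
        return x + g2 * self.mlp(h)

block = DiTBlock(dim=768, heads=12)
tokens = block(torch.randn(2, 256, 768), torch.randn(2, 768))  # (2, 256, 768)
```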

Slide Overview

  • Motivation & VAE Limitations (Slides 1-5)
  • RAE Architecture Definition (Slides 6-12)
  • Latent Space Analysis (Slides 13-20)
  • Experimental Results (Slides 21-end)