We present significant extensions to diffusion-based sequence generation models, blurring the line between them and autoregressive language models. First, we introduce *hyperschedules*, which assign a distinct noise schedule to each token position and recover both autoregressive models (*e.g.*, GPT) and conventional diffusion models (*e.g.*, SEDD, MDLM) as special cases. Second, we propose two *hybrid token-wise noising processes* that interpolate between absorbing and uniform processes, enabling the model to fix past mistakes, and we introduce a *novel inference algorithm* that leverages this capability in a simplified context inspired by MDLM. To support efficient training and inference, we design attention masks compatible with KV-caching. Our methods achieve state-of-the-art perplexity and generate diverse, high-quality sequences across standard benchmarks, suggesting a promising path for autoregressive diffusion-based sequence generation. See code and resources at https://hdlm-colm.github.io/.
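To make the *hyperschedule* idea concrete, below is a minimal NumPy sketch, not the paper's actual parameterization: the function name `hyperschedule` and the `shift` knob are illustrative assumptions. It assigns each token position its own noise level as a function of the denoising step, with a single parameter interpolating between a shared, diffusion-style schedule and a left-to-right, autoregressive-style one.

```python
import numpy as np

def hyperschedule(num_tokens: int, num_steps: int, shift: float) -> np.ndarray:
    """Illustrative per-position schedule alpha[t, i] in [0, 1] (1 = clean, 0 = fully noised).

    shift = 0      -> every position shares one schedule (conventional diffusion).
    shift -> large -> positions are denoised strictly left to right (autoregressive-like).
    Intermediate values interpolate between the two regimes.
    """
    t = np.linspace(0.0, 1.0, num_steps)[:, None]            # denoising progress, shape (T, 1)
    pos = np.arange(num_tokens)[None, :] / max(num_tokens - 1, 1)  # relative position, shape (1, L)
    return np.clip(t * (1.0 + shift) - shift * pos, 0.0, 1.0)

# Shared schedule across positions (diffusion-like) vs. near step functions (AR-like):
print(hyperschedule(num_tokens=4, num_steps=5, shift=0.0))
print(hyperschedule(num_tokens=4, num_steps=5, shift=50.0))
```

Any schedule family with this structure would work; the point is only that a per-position schedule contains the fully parallel and fully sequential generation orders as endpoints of one continuum.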