Parallel Token Prediction for Language Models

Authors: Felix Draxler, Justus Will, Farrin Marouf Sofian, Theofanis Karaletsos, Sameer Singh, Stephan Mandt

We propose Parallel Token Prediction (PTP), a universal framework for parallel sequence generation in language models. PTP jointly predicts multiple dependent tokens in a single transformer call by incorporating the sampling procedure into the model. This reduces the latency bottleneck of autoregressive decoding, and avoids the restrictive independence assumptions common in existing multi-token prediction methods. We prove that PTP can represent arbitrary autoregressive sequence distributions. PTP is trained either by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, we achieve state-of-the-art speculative decoding performance on Vicuna-7B by accepting over four tokens per step on Spec-Bench. The universality of our framework indicates that parallel generation of long sequences is feasible without loss of modeling power.
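One way to read "incorporating the sampling procedure into the model" is that sampling a sequence can be written as a deterministic function of the random numbers that drive it. Once the noise is fixed, every token is determined, so a model that receives the noise as input could in principle emit several dependent tokens in one call. The sketch below illustrates only that observation with a hypothetical three-token toy distribution (`VOCAB`, `COND`, and both function names are invented for illustration); it is not the authors' architecture or training procedure.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical toy "teacher": the next-token distribution depends only on
# the previous token (None marks the start of the sequence).
VOCAB = ["a", "b", "c"]
COND = {
    None: [0.5, 0.3, 0.2],
    "a": [0.1, 0.6, 0.3],
    "b": [0.4, 0.2, 0.4],
    "c": [0.3, 0.3, 0.4],
}

def inverse_cdf(probs, u):
    """Return the index of the token selected by uniform noise u in [0, 1)."""
    cdf = list(accumulate(probs))
    # The min() guards against floating-point sums that fall just below 1.0.
    return min(bisect_right(cdf, u), len(probs) - 1)

def decode_from_noise(us, prev=None):
    """Autoregressive sampling written as a deterministic map noise -> tokens."""
    tokens = []
    for u in us:
        prev = VOCAB[inverse_cdf(COND[prev], u)]
        tokens.append(prev)
    return tokens

rng = random.Random(0)
us = [rng.random() for _ in range(4)]  # one uniform draw per position
seq = decode_from_noise(us)

# Replaying the same noise reproduces the same dependent tokens exactly.
assert decode_from_noise(us) == seq
```

Here the loop is still sequential; the point is only that the target of a parallel predictor is well defined once the noise is given, which is compatible with the abstract's claim that no independence assumption between tokens is needed.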

Subjects: Computation and Language, Machine Learning

Published: 2025-12-24 18:46:55 UTC

arXiv: 2512.21323
