2504.15475

#1 Speculative Sampling via Exponential Races

Authors: Szymon Kobus, Deniz Gündüz

Speculative decoding accelerates large language model inference using a smaller draft model. In this paper, we establish a surprising connection between speculative decoding and channel simulation, which aims to simulate a noisy channel using as few bits as possible. This connection allows us to provide an information-theoretic analysis of the speed-up that speculative decoding can achieve. Leveraging this link, we derive an explicit relation between the generation speed-up and the number of tokens k generated by the draft model for large k, which serves as an upper bound for all k. We also propose a novel speculative decoding method based on exponential races (ERSD) that matches state-of-the-art performance.
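
The abstract does not describe how ERSD works, so the following is only a minimal sketch of the exponential-race sampling primitive that the title refers to, not the authors' algorithm. It relies on a standard fact: if E_i are independent Exp(1) variables, then the index minimizing E_i / p_i is distributed according to p. Running the draft and target races on the same E_i couples the two samples, so they often agree. The distributions `p` and `q`, the helper names `exponential_race` and `simulate`, and the sample count are all illustrative assumptions.

```python
import numpy as np


def exponential_race(probs, exp_draws):
    """Return the winner of an exponential race: argmin_i E_i / p_i."""
    probs = np.asarray(probs, dtype=float)
    # Symbols with zero probability never arrive (arrival time = +inf).
    with np.errstate(divide="ignore", invalid="ignore"):
        arrival = np.where(probs > 0, exp_draws / probs, np.inf)
    return int(np.argmin(arrival))


def simulate(p, q, n=20000, seed=0):
    """Run n coupled races; return (empirical target freqs, match rate).

    p is a hypothetical target distribution, q a hypothetical draft
    distribution. Both races share the same exponential draws.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    counts = np.zeros(len(p))
    matches = 0
    for _ in range(n):
        e = rng.exponential(size=len(p))  # shared randomness
        draft_token = exponential_race(q, e)
        target_token = exponential_race(p, e)
        counts[target_token] += 1
        matches += int(draft_token == target_token)
    return counts / n, matches / n


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # illustrative values only
    q = [0.4, 0.4, 0.2]
    freqs, match_rate = simulate(p, q)
    print("target frequencies:", freqs)
    print("draft/target agreement:", match_rate)
```

In this sketch the target sample always follows p exactly, whatever q is; the draft distribution only affects how often the two races pick the same token, which is the quantity a speculative scheme would want to be high.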

Subjects: Computation and Language, Information Theory

Published: 2025-04-21 23:02:08 UTC