StarSD: One-for-Many Speculative Decoding


Authors: Junhao He, Feiran You, Hongyang Du

Speculative decoding accelerates autoregressive generation by separating token proposal from verification, but most existing approaches are designed for single-node execution and do not scale well to the multi-accelerator clusters used to serve modern Large Language Models (LLMs). We present StarSD, a one-for-many speculative decoding framework in which a single draft model serves multiple target models across distributed nodes via a star topology. StarSD decouples drafting from verification, which enables draft computation to be shared and prevents distributed accelerators from sitting idle under bursty workloads. We provide a system-level analysis that characterizes when and why a single draft model can remain fully utilized by multiple verifiers, yielding predictable latency and utilization gains. Extensive experiments in real-world distributed inference settings demonstrate that StarSD simplifies deployment and supports flexible resource allocation across heterogeneous accelerators while maintaining output quality. These results indicate that StarSD is a practical and scalable framework for bringing speculative decoding to modern cloud and edge inference infrastructures.
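The draft-then-verify split described in the abstract can be illustrated with a toy, single-process sketch. This is not the StarSD implementation: the "models" are plain Python functions that map a token prefix to the next token, verification is greedy (exact match), and the hub visits the targets one after another rather than in parallel across nodes. All names (`draft_propose`, `verify`, `star_decode`, `greedy_decode`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

# Assumption: a "model" is any function from a token prefix to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Plain autoregressive decoding, used as the reference output."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq


def draft_propose(draft: Model, prefix: List[int], k: int) -> List[int]:
    """The draft model proposes k tokens autoregressively."""
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)
    return proposal


def verify(target: Model, prefix: List[int], proposal: List[int]) -> List[int]:
    """Greedy verification: keep draft tokens while they match the target,
    then commit one token chosen by the target itself."""
    ctx = list(prefix)
    committed = []
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            committed.append(expected)  # target's correction ends the round
            return committed
        committed.append(tok)
        ctx.append(tok)
    committed.append(target(ctx))  # all accepted: target adds one extra token
    return committed


def star_decode(draft: Model, targets: List[Model],
                prompts: List[List[int]], k: int, max_new: int) -> List[List[int]]:
    """One draft model (the hub) serves every target in turn (sequentially here)."""
    seqs = [list(p) for p in prompts]
    limits = [len(p) + max_new for p in prompts]
    while any(len(s) < lim for s, lim in zip(seqs, limits)):
        for i, target in enumerate(targets):
            if len(seqs[i]) >= limits[i]:
                continue
            proposal = draft_propose(draft, seqs[i], k)
            seqs[i].extend(verify(target, seqs[i], proposal))
    # A round may overshoot the budget by a few tokens, so trim to the limit.
    return [s[:lim] for s, lim in zip(seqs, limits)]
```

Under greedy verification, every committed token is the one the target would have produced on its own, so each sequence returned by `star_decode` matches `greedy_decode` for the corresponding target regardless of how good the draft is. Draft quality only affects how many tokens are accepted per round.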

Subject: Systems and Control

Publish: 2026-01-29 12:25:39 UTC

arXiv: 2601.21622
