2606.16568

#1 Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation

Authors: Rutherford A. Patamia, Ming Liu, Wei Luo, Favour Ekong, Akan Cosgun

Reliable turn-taking is essential for spoken dialogue systems. However, most existing methods are designed for two-speaker interaction and struggle with realistic multiparty audio containing overlap and rapid speaker changes. We study multiparty turn-taking on the VoxConverse dataset and propose an audio-only two-stage pipeline that separates when to trigger a turn boundary from whether the floor is actually transferring. A fast trigger scans the audio and proposes candidate end-of-turn times, while a lightweight verifier runs only at those times to decide Hold or Shift and support next-speaker prediction. We report results in the full multiparty setting and a controlled dyadic top-2 projection for comparability. We also investigate diffusion-based, label-preserving background-audio mixing as a data augmentation strategy. Results show improved shift detection over a baseline, with further improvements from diffusion augmentation.
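The two-stage control flow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the trigger is a simple energy-threshold silence detector, the verifier is an oracle that compares speaker labels on either side of the candidate, and the names `propose_candidates`, `oracle_verifier`, and `run_pipeline` are hypothetical. The only point is that the cheap stage scans every frame while the costlier decision runs only at the proposed times.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def propose_candidates(
    energy: Sequence[float], threshold: float, min_silent_frames: int
) -> List[int]:
    """Stage 1 (assumed trigger): scan every frame and propose a candidate
    end-of-turn at the first frame of each silence that follows speech and
    lasts at least `min_silent_frames` frames."""
    candidates: List[int] = []
    silent_run = 0
    seen_speech = False
    for i, e in enumerate(energy):
        if e >= threshold:
            seen_speech = True
            silent_run = 0
        else:
            silent_run += 1
            if seen_speech and silent_run == min_silent_frames:
                candidates.append(i - min_silent_frames + 1)
                seen_speech = False  # one candidate per silence
    return candidates


def oracle_verifier(speakers: Sequence[Optional[str]]) -> Callable[[int], str]:
    """Stage 2 stand-in (hypothetical): labels a candidate "Shift" if the
    nearest speaker after it differs from the nearest speaker before it,
    else "Hold". A real verifier would be a learned model on audio."""

    def classify(t: int) -> str:
        prev = next((s for s in reversed(speakers[:t]) if s is not None), None)
        nxt = next((s for s in speakers[t:] if s is not None), None)
        return "Shift" if nxt is not None and nxt != prev else "Hold"

    return classify


def run_pipeline(
    energy: Sequence[float],
    classify: Callable[[int], str],
    threshold: float = 0.3,
    min_silent_frames: int = 2,
) -> List[Tuple[int, str]]:
    """Run the verifier only at the times the trigger proposes."""
    return [
        (t, classify(t))
        for t in propose_candidates(energy, threshold, min_silent_frames)
    ]


if __name__ == "__main__":
    # Toy per-frame energies and speaker labels (None marks silence).
    energy = [0.9, 0.8, 0.1, 0.1, 0.7, 0.9, 0.05, 0.05, 0.05, 0.6]
    speakers = ["A", "A", None, None, "A", "A", None, None, None, "B"]
    print(run_pipeline(energy, oracle_verifier(speakers)))
```

On the toy input, the verifier is called twice rather than once per frame, which is the efficiency argument behind separating the trigger from the Hold/Shift decision.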

Subjects: Computation and Language, Artificial Intelligence

Publish: 2026-06-15 11:09:40 UTC