pan24@interspeech_2024@ISCA

#1 PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation

Authors: Zexu Pan, Gordon Wichern, François G. Germain, Kohei Saijo, Jonathan Le Roux

While offline speech separation models have made significant advances, the streaming regime remains less explored and is typically limited to causal modifications of existing offline networks. This study focuses on giving a streaming speech separation model autoregressive capability, in which separation at the current step is conditioned on the separated samples from past steps. To do so, we introduce pseudo-autoregressive Siamese (PARIS) training: with only two forward passes through a Siamese-style network for each batch, PARIS avoids both the training-inference mismatch of teacher forcing and the need for numerous autoregressive steps during training. The proposed PARIS training improves the recent online SkiM model by 1.5 dB in scale-invariant signal-to-noise ratio (SI-SNR) on the WSJ0-2mix dataset, with minimal change to the network architecture and inference time.
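
For readers unfamiliar with the metric behind the reported 1.5 dB gain, the following is a minimal sketch of the standard SI-SNR definition for a single estimated source and its reference. The function name `si_snr`, the `eps` stabilizer, and the synthetic sine-plus-noise signals are assumptions made for illustration; the abstract does not describe the authors' evaluation code, and details such as permutation handling across the two speakers are not covered here.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for two 1-D signals (hypothetical helper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target; the residual counts as noise.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection

    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    # Synthetic example (assumption): a sine "source" with additive noise.
    rng = np.random.default_rng(0)
    t = np.arange(8000) / 8000.0
    source = np.sin(2 * np.pi * 220.0 * t)
    noisy = source + 0.1 * rng.standard_normal(source.shape)
    cleaner = source + 0.05 * rng.standard_normal(source.shape)

    print("noisy estimate  :", si_snr(noisy, source), "dB")
    print("cleaner estimate:", si_snr(cleaner, source), "dB")
    # Rescaling the estimate leaves the score essentially unchanged.
    print("rescaled noisy  :", si_snr(3.0 * noisy, source), "dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate does not change the score, so a difference such as 1.5 dB reflects a change in residual error relative to the reference rather than in output level.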