zhang25l@interspeech_2025@ISCA

LSPnet: an ultra-low bitrate hybrid neural codec

Authors: Bowen Zhang, Ian McLoughlin, Xiaoxiao Miao, AS Madhukumar

This paper presents an ultra-low bitrate speech codec that achieves high-fidelity speech coding at 1.2 kbps while maintaining low computational complexity. Building upon the LPCNet framework combined with a parametric encoder, we introduce several key improvements: incorporating line spectral pairs (LSP) to improve quantization error performance; eliminating explicit LPC estimation by directly predicting the probability distribution of audio samples with a deep neural network; and employing a joint time-frequency training strategy that combines short-time Fourier transform (STFT) loss with cross-entropy (CE) loss. The codec is suitable for real-time applications in resource-constrained environments. Experimental results show that the proposed codec not only outperforms traditional speech codecs but also achieves superior speech quality compared to state-of-the-art end-to-end codecs, offering a compelling balance between quality and computational cost.
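For readers unfamiliar with line spectral pairs, the following is a minimal sketch of the textbook conversion between LPC coefficients and line spectral frequencies (the angles of the LSP roots). It is not the authors' implementation: the function names `lpc_to_lsf` and `lsf_to_lpc` are illustrative assumptions, and the sketch assumes an even-order, minimum-phase LPC polynomial. It shows why the representation is convenient to quantize: the frequencies are real, ordered, and confined to (0, π).

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to line spectral frequencies.

    Assumes A(z) = 1 + a1 z^-1 + ... + ap z^-p is minimum phase and p is even.
    Returns p angles in radians, sorted in ascending order within (0, pi).
    """
    a = np.asarray(a, dtype=float)
    ext = np.concatenate([a, [0.0]])   # A(z) padded to degree p + 1
    rev = ext[::-1]                    # z^-(p+1) * A(z^-1)
    p_poly = ext + rev                 # symmetric polynomial P(z)
    q_poly = ext - rev                 # antisymmetric polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair and drop the trivial roots at z = +1 and z = -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def lsf_to_lpc(lsf):
    """Rebuild LPC coefficients [1, a1, ..., ap] from sorted LSFs (even p only)."""
    lsf = np.asarray(lsf, dtype=float)
    p_poly = np.array([1.0, 1.0])      # trivial root of P(z) at z = -1
    q_poly = np.array([1.0, -1.0])     # trivial root of Q(z) at z = +1
    # The roots interlace: the 1st, 3rd, ... frequencies belong to P(z),
    # the 2nd, 4th, ... to Q(z).
    for w in lsf[0::2]:
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    a_ext = 0.5 * (p_poly + q_poly)    # A(z) = (P(z) + Q(z)) / 2
    return a_ext[:-1]                  # the padded last coefficient is zero


if __name__ == "__main__":
    # Hypothetical 4th-order filter built from poles inside the unit circle.
    poles = [0.9 * np.exp(1j * 0.5), 0.9 * np.exp(-1j * 0.5),
             0.8 * np.exp(1j * 1.5), 0.8 * np.exp(-1j * 1.5)]
    a = np.poly(poles).real
    lsf = lpc_to_lsf(a)
    print("LSFs (rad):", lsf)
    print("round-trip error:", np.max(np.abs(lsf_to_lpc(lsf) - a)))
```

Because the synthesis filter stays stable whenever the quantized frequencies remain in ascending order within (0, π), stability can be checked with a simple ordering test after quantization, which is not possible with raw LPC coefficients.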

Subject: INTERSPEECH.2025 - Speech Processing