kim25o@interspeech_2025@ISCA

#1 Towards an Ultra-Low-Delay Neural Audio Coding with Computational Efficiency

Authors: Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang

Recent studies on neural audio codecs (NACs) have primarily focused on improving audio quality in extremely low bit-rate scenarios. However, they have not thoroughly explored the impact of latency. In this work, we first demonstrate that NACs can achieve high-quality reconstruction with an algorithmic delay below 1 ms, albeit at substantial computational costs. To address this challenge, we propose DualStream, a novel framework designed to significantly reduce computational costs in ultra-low-delay settings. DualStream integrates a lightweight encoding module with a small down-sampling ratio to maintain low algorithmic delay, combined with a larger module with a higher down-sampling ratio that processes time-delayed inputs to improve efficiency without introducing additional delay. Experimental results demonstrate that DualStream, with an algorithmic delay of 0.7 ms, achieves comparable performance to conventional NACs while reducing computational costs by approximately 40%.
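The link between down-sampling ratio and algorithmic delay can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' method: it assumes a causal codec with no lookahead, where the encoder must buffer one full frame of input samples before emitting a latent, so the buffering delay is roughly the down-sampling ratio divided by the sampling rate. The 16 kHz sampling rate, the example ratios, and the function names are all illustrative assumptions; none of them are stated in the abstract.

```python
import math


def frame_delay_ms(downsampling_ratio: int, sample_rate_hz: int) -> float:
    """Time (ms) needed to buffer one latent frame of input samples.

    Assumption: causal model, no lookahead, computation time ignored.
    """
    if downsampling_ratio <= 0 or sample_rate_hz <= 0:
        raise ValueError("ratio and sample rate must be positive")
    return 1000.0 * downsampling_ratio / sample_rate_hz


def max_ratio_for_delay(target_delay_ms: float, sample_rate_hz: int) -> int:
    """Largest integer down-sampling ratio whose frame delay fits the target."""
    return math.floor(target_delay_ms * sample_rate_hz / 1000.0)


if __name__ == "__main__":
    sr = 16_000  # hypothetical sampling rate, not taken from the paper
    for ratio in (8, 320):  # hypothetical small and large ratios
        print(f"ratio={ratio:4d} -> {frame_delay_ms(ratio, sr):.2f} ms per frame")
    print("max ratio for a 0.7 ms budget:", max_ratio_for_delay(0.7, sr))
```

Under these assumptions, a small ratio keeps the per-frame buffering well under 1 ms but produces many latent frames per second, which is one way to see why ultra-low delay tends to be computationally expensive and why a second, coarser module operating on already-delayed input could reduce the cost.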

Subject: INTERSPEECH.2025 - Speech Processing