xu23@interspeech_2023@ISCA


An Intra-BRNN and GB-RVQ Based End-to-End Neural Audio Codec

Authors: Linping Xu; Jiawei Jiang; Dejun Zhang; Xianjun Xia; Li Chen; Yijian Xiao; Piao Ding; Shenyi Song; Sixing Yin; Ferdous Sohel

Recently, neural networks have proven effective for speech coding at low bitrates. However, underutilized intra-frame correlations and quantization error both degrade the quality of the reconstructed audio. To improve coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure of 1D-CNN and Intra-BRNN layers is designed to exploit intra-frame correlations more efficiently. Furthermore, a Group-wise and Beamsearch Residual Vector Quantizer (GB-RVQ) is used to reduce quantization noise. CBRC encodes audio in 20 ms frames with no additional latency, which makes it suitable for real-time communication. Experimental results show that CBRC at 3 kbps outperforms Opus at 12 kbps.
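The abstract does not spell out how GB-RVQ works. The sketch below assumes only a generic reading of its name. The latent vector is split into groups, and each group is quantized by a residual vector quantizer (each stage codes what the previous stages left over). Instead of greedily picking the nearest codeword at every stage, a beam search keeps the `beam_width` best partial paths. The function names, toy codebooks and beam width are hypothetical illustrations, not the paper's implementation, codebook sizes or training procedure.

```python
import math


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_beam_search(x: list[float], codebooks: list[list[list[float]]], beam_width: int):
    """Residual VQ of x with beam search across stages (hypothetical helper).

    codebooks[s] is the list of codewords for stage s; the reconstruction is
    the sum of one codeword per stage. beam_width=1 reduces to greedy RVQ.
    Returns (chosen index per stage, reconstruction).
    """
    dim = len(x)
    # Each beam entry: (error vs. x, indices chosen so far, running reconstruction).
    beams = [(sq_dist(x, [0.0] * dim), [], [0.0] * dim)]
    for stage in codebooks:
        candidates = []
        for _, indices, recon in beams:
            for k, code in enumerate(stage):
                new_recon = [r + c for r, c in zip(recon, code)]
                candidates.append((sq_dist(x, new_recon), indices + [k], new_recon))
        # Keep only the beam_width lowest-error partial paths.
        candidates.sort(key=lambda c: c[0])
        beams = candidates[:beam_width]
    _, best_indices, best_recon = beams[0]
    return best_indices, best_recon


def gb_rvq_encode(x: list[float], group_codebooks, beam_width: int):
    """Split x into equal groups and quantize each with its own RVQ stack (hypothetical)."""
    n_groups = len(group_codebooks)
    assert len(x) % n_groups == 0, "vector length must be divisible by the group count"
    size = len(x) // n_groups
    all_indices, recon = [], []
    for g, codebooks in enumerate(group_codebooks):
        idx, sub_recon = rvq_beam_search(x[g * size:(g + 1) * size], codebooks, beam_width)
        all_indices.append(idx)
        recon.extend(sub_recon)
    return all_indices, recon


def bits_per_frame(group_codebooks) -> int:
    """Bits needed to send one fixed-length index per stage per group."""
    return sum(math.ceil(math.log2(len(stage))) for cbs in group_codebooks for stage in cbs)


if __name__ == "__main__":
    # Toy 1-D codebooks chosen so that greedy selection is suboptimal.
    cbs = [[[0.6], [1.2]], [[0.4], [0.3]]]
    print(rvq_beam_search([1.0], cbs, beam_width=1)[0])  # [1, 1] (greedy: 1.2 + 0.3)
    print(rvq_beam_search([1.0], cbs, beam_width=2)[0])  # [0, 0] (beam: 0.6 + 0.4)
```

In the toy example, greedy selection commits to the closer first-stage codeword (1.2) and cannot recover. A beam of width 2 finds 0.6 + 0.4, which matches the target. For scale, 3 kbps with 20 ms frames gives 3000 × 0.02 = 60 bits per frame. One hypothetical split of that budget is 2 groups × 3 stages × 1024-entry codebooks (2 × 3 × 10 bits). The paper's actual configuration is not stated in the abstract.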