ng23b@interspeech_2023@ISCA


Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness

Authors: Dianwen Ng, Yang Xiao, Jia Qi Yip, Zhao Yang, Biao Tian, Qiang Fu, Eng Siong Chng, Bin Ma

Spoken keyword spotting (KWS) in noisy far-field environments is challenging for small-footprint models because of their limited computational resources (e.g., model size and runtime memory). The problem becomes harder still when the noisy input comes from multiple microphones. To address this, we present a new multi-channel model that combines a CNN-based network with a linear mixing unit to learn representations of both local and global dependencies. The method improves noise robustness while keeping computation efficient. In addition, we propose an end-to-end centroid-based awareness module that provides class-similarity awareness at the bottleneck layer to correct ambiguous cases during prediction. Experiments on real noisy far-field data from the MISP Challenge 2021 show state-of-the-art results among existing small-footprint KWS models. Our best score of 0.126 is highly competitive with that of much larger models such as 3D-ResNet (0.122), while our model is far smaller: 473K parameters compared with 13M.
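The abstract does not specify how the centroid-based awareness module is built. The sketch below shows one plausible reading, assuming the module keeps one centroid per class, scores the bottleneck embedding by cosine similarity to each centroid, and appends those scores to the embedding before the classifier. The function names, the concatenation step, and the toy values are hypothetical and are not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid_awareness(embedding: Vector, centroids: List[Vector]) -> Vector:
    """Hypothetical awareness step (not the paper's code).

    Scores the bottleneck embedding against one centroid per class and
    appends those similarity scores to the embedding, so a downstream
    classifier can see how close the sample lies to each class.
    """
    similarities = [cosine_similarity(embedding, c) for c in centroids]
    return embedding + similarities


if __name__ == "__main__":
    # Toy 2-D bottleneck embedding and two hypothetical class centroids
    # (e.g. "keyword" and "non-keyword"); in a trained model these would be learned.
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    features = centroid_awareness([0.6, 0.8], centroids)
    print(features)  # embedding followed by its similarity to each centroid
```

Concatenating the similarity scores is only one assumed design; the same scores could instead rescale the logits or drive an auxiliary loss that pulls embeddings toward their class centroid.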
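The abstract pairs a CNN (local context) with a linear mixing unit (global context) but gives no details of the mixing unit. As a hedged illustration, assuming it behaves like a token-mixing linear layer in the style of MLP-Mixer, the sketch mixes a (frames x channels) feature map along the time axis with a learned T x T matrix, so every output frame depends on every input frame. The function name `time_mixing` and the toy weights are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def time_mixing(features: Matrix, weights: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical time-axis linear mixing of a (T x C) feature map.

    out[t][c] = bias[t] + sum_s weights[t][s] * features[s][c]

    Each output frame sees all T input frames (a global receptive field),
    unlike a small convolution kernel that only sees neighboring frames.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    if len(weights) != num_frames or any(len(row) != num_frames for row in weights):
        raise ValueError("weights must be a T x T matrix")
    if len(bias) != num_frames:
        raise ValueError("bias must have length T")
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            acc = bias[t]
            for s in range(num_frames):
                acc += weights[t][s] * features[s][c]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy CNN output: 3 frames, 2 channels.
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Uniform weights make every output frame the average over all frames.
    uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
    print(time_mixing(feats, uniform, [0.0, 0.0, 0.0]))
```

In a trained model the mixing weights would be learned, letting each frame draw on distant context at the cost of only T x T extra parameters per layer, which suits a small-footprint budget.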