Multi-channel Attention for End-to-End Speech Recognition

Authors: Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini, Shih-Chii Liu

Recent end-to-end models for automatic speech recognition use sensory attention to integrate multiple input channels within a single neural network. However, these attention models are sensitive to the ordering of the channels used during training. This work proposes a sensory attention mechanism that is invariant to the channel ordering and increases the overall parameter count by only 0.09%. We demonstrate that, even without re-training, our attention-equipped end-to-end model can handle arbitrary numbers of input channels during inference. Compared with a recent related model with sensory attention, our model achieves a relative character error rate (CER) improvement of 40.3% to 42.9% when tested on the real noisy recordings from the multi-channel CHiME-4 dataset. In a two-channel configuration experiment, the attention signal allows the lower signal-to-noise ratio (SNR) sensor to be identified with 97.7% accuracy.
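To illustrate why an attention mechanism can be invariant to channel ordering and accept any number of channels, here is a minimal sketch in plain Python. It is not the authors' architecture: the function name `channel_attention`, the linear scoring vector `w`, and the bias `b` are assumptions made for illustration. The idea shown is that if every channel is scored with the same shared parameters and the scores are normalized by a softmax across channels, then the weighted sum does not depend on the order of the channels, and the parameter count does not depend on how many channels are present.

```python
import math


def channel_attention(channels, w, b=0.0):
    """Fuse per-channel feature vectors with shared-parameter attention.

    channels: list of feature vectors (lists of floats), one per channel,
              all of the same length. Any number of channels is accepted.
    w, b:     hypothetical scoring parameters shared by all channels.
    Returns (fused_vector, attention_weights).
    """
    # Score each channel with the SAME parameters, so no parameter is
    # tied to a particular channel position.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in channels]

    # Numerically stable softmax across channels.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum over channels: permuting the channels permutes the
    # weights in the same way, so the fused vector is unchanged.
    dim = len(channels[0])
    fused = [sum(a * x[d] for a, x in zip(weights, channels)) for d in range(dim)]
    return fused, weights


# Usage: the same parameters work for three channels or two channels.
features = [[0.2, 1.0], [0.5, -0.3], [1.5, 0.4]]
shared_w = [0.7, -0.2]
fused_three, weights_three = channel_attention(features, shared_w)
fused_two, weights_two = channel_attention(features[:2], shared_w)
```

In a sketch like this, the attention weights also give a per-channel signal that could be inspected, which is the kind of quantity the abstract refers to when it reports identifying the lower-SNR sensor from the attention signal.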

Subject: INTERSPEECH.2018 - Speech Recognition

braun18@interspeech_2018@ISCA
