Improving Attention-Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

Authors: Chao Weng; Jia Cui; Guangsen Wang; Jun Wang; Chengzhu Yu; Dan Su; Dong Yu

In this work, we propose two improvements to attention-based sequence-to-sequence models for end-to-end speech recognition. First, we propose an input-feeding architecture that feeds not only the previous context vector but also the previous decoder hidden state as inputs to the decoder. Second, we improve the hypothesis generation scheme for sequential minimum Bayes risk (MBR) training of sequence-to-sequence models by introducing softmax smoothing into N-best generation during MBR training. We conduct experiments on both the Switchboard-300hrs and Switchboard+Fisher-2000hrs datasets and observe significant gains from both proposed improvements. Together with other training strategies such as dropout and scheduled sampling, our best model achieves WERs of 8.3%/15.5% on the Switchboard/CallHome subsets of Eval2000 without any external language model, which is highly competitive among state-of-the-art English conversational speech recognition systems.
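
A minimal PyTorch-style sketch of the first improvement, the input-feeding decoder step: the previous context vector and the previous decoder hidden state are concatenated with the current token embedding before the recurrent cell. The LSTM cell, the dot-product attention, and all dimension and module names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InputFeedingDecoderStep(nn.Module):
    """One decoder step with input feeding (illustrative sketch):
    the previous context vector and the previous decoder hidden state
    are fed back as extra inputs alongside the token embedding."""

    def __init__(self, embed_dim, hidden_dim, enc_dim):
        super().__init__()
        # Cell input = token embedding + previous context + previous hidden state.
        self.cell = nn.LSTMCell(embed_dim + enc_dim + hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, enc_dim, bias=False)  # assumed attention form

    def forward(self, y_embed, prev_context, prev_state, enc_out):
        prev_h, _ = prev_state
        # Input feeding: concatenate the three sources along the feature dim.
        x = torch.cat([y_embed, prev_context, prev_h], dim=-1)
        h, c = self.cell(x, prev_state)
        # Content-based attention over encoder outputs (batch, time, enc_dim).
        e = torch.bmm(enc_out, self.score(h).unsqueeze(-1)).squeeze(-1)
        a = torch.softmax(e, dim=-1)
        context = torch.bmm(a.unsqueeze(1), enc_out).squeeze(1)
        return context, h, (h, c)
```

Feeding prev_h explicitly, in addition to its usual role as the recurrent state, gives the cell's input gates direct access to the previous decoding step at every time step.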
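For the second improvement, a hedged sketch of softmax smoothing in N-best generation: scaling the logits by a factor gamma < 1 flattens the output distribution, so beam search produces more diverse hypotheses for MBR training. The beam-search skeleton, the function names, and the value of gamma are assumptions for illustration, not the paper's reported setting.

```python
import torch

def smoothed_log_probs(logits, gamma=0.8):
    """Softmax smoothing: gamma < 1 flattens the distribution,
    encouraging more diverse N-best hypotheses (gamma value assumed)."""
    return torch.log_softmax(gamma * logits, dim=-1)

def beam_step(logits, beam_log_probs, beam_size, gamma=0.8):
    """One beam-search expansion step with smoothed scores.
    logits: (beam, vocab); beam_log_probs: (beam,). Illustrative only."""
    log_p = smoothed_log_probs(logits, gamma)    # (beam, vocab)
    total = beam_log_probs.unsqueeze(1) + log_p  # accumulated path scores
    top_scores, top_idx = total.view(-1).topk(beam_size)
    vocab = logits.size(1)
    beam_idx = top_idx // vocab                  # parent beam of each survivor
    token_idx = top_idx % vocab                  # token extending that beam
    return top_scores, beam_idx, token_idx
```

Per the abstract, the smoothing is applied when generating the N-best hypothesis lists used during MBR training, where the lists define the space over which the expected word error is minimized.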