Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity

Authors: Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo

The long short-term memory (LSTM) network is one of the most widely used recurrent neural networks (RNNs) for automatic speech recognition (ASR), but it has millions of parameters. This makes it prohibitively expensive for memory-constrained hardware accelerators, because the storage demand increases dependence on off-chip memory, which bottlenecks latency and power. In this paper, we propose a new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers. HCGS maintains the same hierarchical structured sparsity throughout training and inference; this reduces weight storage for both training and inference hardware systems. We also jointly optimize in-training quantization with HCGS on 2-/3-layer LSTM networks for the TIMIT and TED-LIUM corpora. With 16× structured compression and 6-bit weight precision, we achieved a phoneme error rate (PER) of 16.9% for TIMIT and a word error rate (WER) of 18.9% for TED-LIUM, showing the best trade-off between error rate and LSTM memory compression compared with prior work.
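
To make the idea of static, hierarchical block-wise dropping concrete, the following is a minimal NumPy sketch of how such a mask might be generated. It is not the authors' implementation. The function names (`block_mask`, `hcgs_mask`), the two-level structure, the block sizes (64 and 16), the per-level keep fraction of 1/4, and the rule of keeping an equal number of blocks in every block row are all assumptions chosen for illustration; the only figure taken from the abstract is the overall 16× structured compression, which here corresponds to a mask density of 1/16.

```python
import numpy as np

def block_mask(rows, cols, block, keep_frac, rng):
    """Random block-wise mask (hypothetical helper).

    Assumes rows and cols are multiples of `block`. Every block row keeps
    the same number of blocks, chosen at random.
    """
    br, bc = rows // block, cols // block
    keep = max(1, int(round(bc * keep_frac)))
    grid = np.zeros((br, bc), dtype=bool)
    for i in range(br):
        grid[i, rng.choice(bc, size=keep, replace=False)] = True
    # Expand each grid cell into a block x block tile.
    return np.repeat(np.repeat(grid, block, axis=0), block, axis=1)

def hcgs_mask(rows, cols, blocks=(64, 16), keep_fracs=(0.25, 0.25), seed=0):
    """Two-level hierarchical mask (illustrative, assumed parameters).

    Level 1 drops large blocks; level 2 drops small blocks inside each
    surviving large block. Assumes blocks[0] is a multiple of blocks[1].
    """
    rng = np.random.default_rng(seed)
    big, small = blocks
    mask = block_mask(rows, cols, big, keep_fracs[0], rng)
    for r in range(0, rows, big):
        for c in range(0, cols, big):
            if mask[r, c]:  # large block survived level 1
                mask[r:r + big, c:c + big] = block_mask(
                    big, big, small, keep_fracs[1], rng)
    return mask

# The mask is generated once and then held fixed: the same mask would be
# applied to the weights during both training and inference.
mask = hcgs_mask(512, 512)
weights = np.random.default_rng(1).standard_normal((512, 512))
sparse_weights = weights * mask
density = mask.mean()  # fraction of weights that are kept
```

Because the mask never changes, only the surviving blocks and a small amount of block-index metadata would need to be stored, which is where a storage saving for both training and inference hardware would come from in a scheme of this kind.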

Subject: INTERSPEECH.2020 - Speech Recognition

kadetotad20@interspeech_2020@ISCA
