wang22y@interspeech_2022@ISCA

Total: 1

#1 Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding [PDF]

Authors: Pu Wang; Hugo Van hamme

End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora followed by fine-tuning on application-specific data, but the resulting models are too large for edge applications. For instance, BERT-based systems contain over 110M parameters. Observing that the model is over-parameterized, we propose a lean transformer structure in which the dimension of the attention mechanism is automatically reduced using group sparsity. We also propose a variant in which the learned attention subspace is transferred to an attention bottleneck layer. In a low-resource setting and without pre-training, the resulting compact SLU model achieves accuracies competitive with pre-trained large models.
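
The abstract describes two ingredients: attention projections routed through a low-dimensional bottleneck, and a group-sparsity penalty that lets the attention dimension shrink automatically during training. Below is a minimal sketch of how such a layer could look; it is not the authors' implementation, and all class names, dimensions, and the penalty weight are assumptions made for illustration only.

```python
# Sketch only: bottleneck low-rank multi-head attention with a group-lasso
# penalty on the bottleneck dimensions. Not the paper's actual code.
import math
import torch
import torch.nn as nn


class BottleneckLowRankAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, bottleneck_dim: int):
        super().__init__()
        assert bottleneck_dim % n_heads == 0
        self.n_heads = n_heads
        self.d_head = bottleneck_dim // n_heads
        # Low-rank projections: d_model -> bottleneck_dim (< d_model).
        self.q_proj = nn.Linear(d_model, bottleneck_dim, bias=False)
        self.k_proj = nn.Linear(d_model, bottleneck_dim, bias=False)
        self.v_proj = nn.Linear(d_model, bottleneck_dim, bias=False)
        # Expand back to the model width for the residual stream.
        self.out_proj = nn.Linear(bottleneck_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        b, t, _ = x.shape

        def split(h: torch.Tensor) -> torch.Tensor:
            # (b, t, bottleneck_dim) -> (b, n_heads, t, d_head)
            return h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        ctx = (scores.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(ctx)

    def group_sparsity_penalty(self) -> torch.Tensor:
        # Group lasso over bottleneck units: each output row of the Q/K/V
        # projections is one group; summing their L2 norms and adding the
        # result to the training loss pushes whole dimensions toward zero,
        # which is one way an attention subspace could shrink automatically.
        penalty = torch.zeros((), device=self.q_proj.weight.device)
        for proj in (self.q_proj, self.k_proj, self.v_proj):
            penalty = penalty + proj.weight.norm(dim=1).sum()
        return penalty


# Hypothetical usage: add the penalty to the task loss during training.
layer = BottleneckLowRankAttention(d_model=256, n_heads=4, bottleneck_dim=64)
x = torch.randn(2, 50, 256)
y = layer(x)  # (2, 50, 256)
loss = y.pow(2).mean() + 1e-4 * layer.group_sparsity_penalty()
loss.backward()
```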