peng21b@interspeech_2021@ISCA

#1 Data Augmentation for Spoken Language Understanding via Pretrained Language Models

Authors: Baolin Peng; Chenguang Zhu; Michael Zeng; Jianfeng Gao

Training spoken language understanding (SLU) models often suffers from data scarcity. In this paper, we propose a data augmentation method that uses pretrained language models to boost both the variability and the accuracy of generated utterances. Furthermore, we investigate and propose solutions for two previously overlooked semi-supervised data-scarcity scenarios in SLU: i) Rich-in-Ontology, where ontology information with numerous valid dialogue acts is given; and ii) Rich-in-Utterance, where a large number of unlabelled utterances are available. Empirical results show that our method produces synthetic training data that improves the performance of language understanding models across these scenarios.
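To make the generation-based augmentation concrete, below is a minimal sketch of the Rich-in-Ontology setting using the Hugging Face transformers library: a pretrained causal LM is conditioned on a dialogue act and sampled for varied candidate utterances. The model name (gpt2), the prompt format, the example dialogue act, and the decoding settings are illustrative assumptions, not the paper's implementation; the paper fine-tunes a pretrained LM on labelled (dialogue act, utterance) pairs before generating.

```python
# Minimal sketch: sample synthetic SLU utterances conditioned on a
# dialogue act with an off-the-shelf GPT-2. Illustrative assumptions
# only -- the prompt format and decoding settings are not from the
# paper, which fine-tunes the LM on labelled (act, utterance) pairs.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Rich-in-Ontology: a valid dialogue act from the ontology is known;
# generate utterances that realize it (hypothetical act format).
dialogue_act = "inform(food=italian, area=centre)"
prompt = f"Dialogue act: {dialogue_act}\nUser utterance:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,          # nucleus sampling for utterance variability
    top_p=0.9,
    num_return_sequences=5,  # several candidates per dialogue act
    pad_token_id=tokenizer.eos_token_id,
)

prompt_len = inputs["input_ids"].shape[1]
for seq in outputs:
    # Strip the prompt tokens, keep only the generated utterance.
    text = tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
    print(text.strip())
```

Sampling with top-p rather than greedy decoding is what supplies the utterance variability the abstract emphasizes; the resulting synthetic (dialogue act, utterance) pairs would then be added to the SLU training set.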