Analysis of Acoustic information in End-to-End Spoken Language Translation

Authors: Gerard Sant, Carlos Escolano

End-to-End Transformer-based models are the most popular approach for Spoken Language Translation (SLT). Although these models obtain state-of-the-art results, we are still far from understanding how they extract acoustic information from the data and how that information is transformed into semantic representations. In this paper, we seek to provide a better understanding of the flow of acoustic information along speech-to-text translation models. By means of the Speaker Classification and Spectrogram Reconstruction tasks, this study (i) interprets the main role of the encoder with respect to the acoustic features, (ii) highlights the importance of the acoustic information throughout the model and its transfer between encoder and decoder, and (iii) reveals the significant effect of downsampling convolutional layers on learning acoustic features. Finally, (iv) we observe a strong correlation between the semantic domain and the speakers' labels in MuST-C.

Subject: INTERSPEECH.2023 - Language and Multimodal

sant23@interspeech_2023@ISCA
