uttam19@interspeech_2019@ISCA

Hush-Hush Speak: Speech Reconstruction Using Silent Videos

Authors: Shashwat Uttam ; Yaman Kumar ; Dhruva Sahrawat ; Mansi Aggarwal ; Rajiv Ratn Shah ; Debanjan Mahata ; Amanda Stent

Speech reconstruction is the task of recreating speech from silent videos; in the literature, it is also referred to as lipreading. In this paper, we design an encoder-decoder architecture that takes silent videos as input and outputs an audio spectrogram of the reconstructed speech. Despite being speaker-independent, the model achieves speech reconstruction results comparable to those of the current state-of-the-art speaker-dependent model. We also conduct user studies to assess speech intelligibility. Additionally, we test the usability of the trained model on bilingual speech.
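The input/output contract described in the abstract (a sequence of silent video frames in, a spectrogram out) can be illustrated with a toy sketch. This is not the authors' model: the function names `encode`, `decode`, and `reconstruct_spectrogram`, the two hand-picked per-frame features, the random untrained weights, and the one-spectrogram-row-per-frame alignment are all assumptions made purely to show the data flow.

```python
import random

def encode(frames):
    """Toy encoder (assumption): reduce each grayscale frame,
    given as an H x W list of lists, to a [mean, spread] feature vector."""
    feats = []
    for frame in frames:
        pixels = [p for row in frame for p in row]
        mean = sum(pixels) / len(pixels)
        spread = max(pixels) - min(pixels)
        feats.append([mean, spread])
    return feats

def decode(feats, weights):
    """Toy decoder (assumption): linearly project each feature vector
    onto len(weights) spectrogram bins."""
    return [
        [sum(w * f for w, f in zip(row, feat)) for row in weights]
        for feat in feats
    ]

def reconstruct_spectrogram(frames, n_bins=8, seed=0):
    """Map T video frames to a T x n_bins spectrogram-shaped array.
    Weights are random and untrained; only the shapes are meaningful."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_bins)]
    return decode(encode(frames), weights)

# Hypothetical input: 5 frames of 4 x 4 grayscale pixels in [0, 1].
video = [
    [[((t + r + c) % 8) / 7 for c in range(4)] for r in range(4)]
    for t in range(5)
]
spec = reconstruct_spectrogram(video)
print(len(spec), len(spec[0]))
```

In a trained system, the encoder and decoder would be learned networks and the spectrogram would then be converted to a waveform by a separate step; none of those details are specified in the abstract.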