2023.iwslt-1.20@ACL

CMU’s IWSLT 2023 Simultaneous Speech Translation System

Authors: Brian Yan ; Jiatong Shi ; Soumi Maiti ; William Chen ; Xinjian Li ; Yifan Peng ; Siddhant Arora ; Shinji Watanabe

This paper describes CMU’s submission to the IWSLT 2023 simultaneous speech translation shared task for translating English speech to both German text and speech in a streaming fashion. We first build offline speech-to-text (ST) models using the joint CTC/attention framework. These models also use WavLM front-end features and mBART decoder initialization. We adapt our offline ST models for simultaneous speech-to-text translation (SST) by 1) incrementally encoding chunks of input speech, re-computing encoder states for each new chunk, and 2) incrementally decoding output text, pruning beam search hypotheses to 1-best after processing each chunk. We then build text-to-speech (TTS) models using the VITS framework and achieve simultaneous speech-to-speech translation (SS2ST) by cascading our SST and TTS models.
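
The two-step adaptation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`simultaneous_decode`, `toy_encode`, `toy_extend_beam`), the chunking by list slices, and the toy scoring are all assumptions made for illustration. The toy encoder and beam step stand in for the real ST model only so that the control flow (re-encode everything heard so far, then keep only the 1-best hypothesis as the committed prefix) can be run end to end.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (tokens, score). Hypothetical type for this sketch.
Hypothesis = Tuple[List[str], float]


def simultaneous_decode(
    speech: Sequence[str],
    chunk_size: int,
    encode: Callable[[Sequence[str]], List[str]],
    extend_beam: Callable[[List[str], List[str]], List[Hypothesis]],
) -> List[List[str]]:
    """Return the committed output prefix after each input chunk."""
    committed: List[str] = []
    steps: List[List[str]] = []
    for start in range(0, len(speech), chunk_size):
        # 1) Incremental encoding: re-compute encoder states over all
        #    input received so far, not just the newest chunk.
        seen = speech[: start + chunk_size]
        states = encode(seen)
        # 2) Incremental decoding: run a beam step from the committed
        #    prefix, then prune the beam to its single best hypothesis.
        beam = extend_beam(states, committed)
        best_tokens, _ = max(beam, key=lambda hyp: hyp[1])
        committed = list(best_tokens)
        steps.append(list(committed))
    return steps


def toy_encode(frames: Sequence[str]) -> List[str]:
    # Stand-in encoder (assumption): identity over the frames seen so far.
    return list(frames)


def toy_extend_beam(states: List[str], prefix: List[str]) -> List[Hypothesis]:
    # Stand-in beam step (assumption): one hypothesis "translates" every
    # not-yet-covered frame by upper-casing it; the other emits nothing.
    new_tokens = [s.upper() for s in states[len(prefix):]]
    return [(prefix + new_tokens, 1.0), (list(prefix), 0.5)]


if __name__ == "__main__":
    for prefix in simultaneous_decode(
        ["a", "b", "c", "d", "e"], 2, toy_encode, toy_extend_beam
    ):
        print(prefix)
```

Because only the 1-best hypothesis survives each chunk, output that has been committed is never revised later, which is what allows the text to be handed to a downstream TTS model as it is produced. In a real system the encoder and beam step would be the ST model's own, and the chunk size would trade latency against translation quality.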