IWSLT.2020

Total: 34

#1 FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN

Authors: Ebrahim Ansari ; Amittai Axelrod ; Nguyen Bach ; Ondřej Bojar ; Roldano Cattoni ; Fahim Dalvi ; Nadir Durrani ; Marcello Federico ; Christian Federmann ; Jiatao Gu ; Fei Huang ; Kevin Knight ; Xutai Ma ; Ajay Nagesh ; Matteo Negri ; Jan Niehues ; Juan Pino ; Elizabeth Salesky ; Xing Shi ; Sebastian Stüker ; Marco Turchi ; Alexander Waibel ; Changhan Wang

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2020) featured six challenge tracks this year: (i) Simultaneous speech translation, (ii) Video speech translation, (iii) Offline speech translation, (iv) Conversational speech translation, (v) Open domain translation, and (vi) Non-native speech translation. A total of teams participated in at least one of the tracks. This paper introduces each track’s goal, data and evaluation metrics, and reports the results of the received submissions.

#2 ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

Authors: Maha Elbayad ; Ha Nguyen ; Fethi Bougares ; Natalia Tomashenko ; Antoine Caubrière ; Benjamin Lecouteux ; Yannick Estève ; Laurent Besacier

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention-based encoder-decoder models, trained end-to-end, were used for our submissions to the offline speech translation track. Our contributions focused on data augmentation and ensembling of multiple models. In the simultaneous speech translation track, we build on Transformer-based wait-k models for the text-to-text subtask. For speech-to-text simultaneous translation, we attach a wait-k MT system to a hybrid ASR system. We propose an algorithm to control the latency of the ASR+MT cascade and achieve a good latency-quality trade-off on both subtasks.
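Under a wait-k policy, the decoder first reads k source tokens, then alternates one WRITE per READ until the source is exhausted, after which it writes the remaining target tokens. The generic schedule can be sketched as follows; token-level granularity and the function name are assumptions, and this is not ON-TRAC's implementation.

```python
def wait_k_schedule(src_len: int, tgt_len: int, k: int) -> list[str]:
    """Return the READ/WRITE action sequence of a generic wait-k policy.

    Before emitting target token t (1-indexed), the policy must have read
    g(t) = min(k + t - 1, src_len) source tokens.
    """
    actions: list[str] = []
    read = 0
    for t in range(1, tgt_len + 1):
        needed = min(k + t - 1, src_len)
        # Read just enough source tokens to satisfy g(t).
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions
```

For example, `wait_k_schedule(4, 4, 2)` yields READ, READ, WRITE, READ, WRITE, READ, WRITE, WRITE: once all four source tokens are read, the last target token is written without further reads.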

#3 Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University

Authors: Parnia Bahar ; Patrick Wilken ; Tamer Alkhouli ; Andreas Guta ; Pavel Golik ; Evgeny Matusov ; Christian Herold

AppTek and RWTH Aachen University teamed up to participate in the offline and simultaneous speech translation tracks of IWSLT 2020. For the offline task, we create both cascaded and end-to-end speech translation systems, paying attention to careful data selection and weighting. In the cascaded approach, we combine high-quality hybrid automatic speech recognition (ASR) with Transformer-based neural machine translation (NMT). Our end-to-end direct speech translation systems benefit from pretraining of adapted encoder and decoder components, as well as from synthetic data and fine-tuning, and are thus able to compete with cascaded systems in terms of MT quality. For simultaneous translation, we utilize a novel architecture that makes dynamic decisions, learned from parallel data, about whether to continue reading input or to generate output words. Experiments with speech and text input show that even at low latency this architecture leads to superior translation results.

#4 KIT’s IWSLT 2020 SLT Translation System

Authors: Ngoc-Quan Pham ; Felix Schneider ; Tuan-Nam Nguyen ; Thanh-Le Ha ; Thai Son Nguyen ; Maximilian Awiszus ; Sebastian Stüker ; Alexander Waibel

This paper describes KIT’s submissions to the IWSLT2020 Speech Translation evaluation campaign. We first participate in the simultaneous translation task, in which our simultaneous models are Transformer-based and can be efficiently trained to obtain low latency with minimal compromise in quality. On the offline speech translation task, we applied our new Speech Transformer architecture to end-to-end speech translation. The obtained model can provide translation quality competitive with a complicated cascade. The latter still has the upper hand, thanks to its ability to transparently access the transcription and to resegment the inputs to avoid fragmentation.

#5 End-to-End Simultaneous Translation System for IWSLT2020 Using Modality Agnostic Meta-Learning

Authors: Hou Jeung Han ; Mohd Abbas Zaidi ; Sathish Reddy Indurthi ; Nikhil Kumar Lakumarapu ; Beomseok Lee ; Sangha Kim

In this paper, we describe end-to-end simultaneous speech-to-text and text-to-text translation systems submitted to the IWSLT2020 online translation challenge. The systems are built by adding wait-k and meta-learning approaches to the Transformer architecture. The systems are evaluated on different latency regimes. The simultaneous text-to-text translation achieved a BLEU score of 26.38 compared to the competition baseline score of 14.17 on the low latency regime (Average latency ≤ 3). The simultaneous speech-to-text system improves the BLEU score by 7.7 points over the competition baseline for the low latency regime (Average Latency ≤ 1000).
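The latency regimes above are stated as average-latency thresholds. Assuming the measure is Average Lagging (AL), the metric used in the wait-k literature, a single-sentence computation on token-level delays can be sketched as below. The function name and the delay representation are illustrative assumptions; the speech-to-text variant, with delays in milliseconds, is not covered.

```python
def average_lagging(delays: list[int], src_len: int, tgt_len: int) -> float:
    """Average Lagging (AL) for one sentence, with token-level delays.

    delays[t - 1] = g(t): source tokens read before writing target token t.
    AL = (1 / tau) * sum_{t=1}^{tau} [g(t) - (t - 1) / gamma],
    where gamma = tgt_len / src_len and tau is the first t with g(t) = src_len.
    Assumes at least one target token was written.
    """
    gamma = tgt_len / src_len
    # tau: first target position emitted after the whole source was read.
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= src_len),
        len(delays),
    )
    lag = sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1))
    return lag / tau
```

For a wait-3 policy with a 10-token source and a 10-token target, the delays are [3, 4, ..., 10, 10, 10] and the function returns 3.0, matching the intuition that wait-k lags by k tokens when source and target lengths are equal.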

#6 DiDi Labs’ End-to-end System for the IWSLT 2020 Offline Speech Translation Task

Authors: Arkady Arkhangorodsky ; Yiqi Huang ; Amittai Axelrod

This paper describes the system that was submitted by DiDi Labs to the offline speech translation task for IWSLT 2020. We trained an end-to-end system that translates audio from English TED talks to German text, without producing intermediate English text. We use the S-Transformer architecture and train using the MuST-C dataset. We also describe several additional experiments that were attempted but did not yield improved results.

#7 End-to-End Offline Speech Translation System for IWSLT 2020 using Modality Agnostic Meta-Learning

Authors: Nikhil Kumar Lakumarapu ; Beomseok Lee ; Sathish Reddy Indurthi ; Hou Jeung Han ; Mohd Abbas Zaidi ; Sangha Kim

In this paper, we describe the system submitted to the IWSLT 2020 Offline Speech Translation Task. We adopt the Transformer architecture coupled with the meta-learning approach to build our end-to-end Speech-to-Text Translation (ST) system. Our meta-learning approach tackles the data scarcity of the ST task by leveraging the data available from Automatic Speech Recognition (ASR) and Machine Translation (MT) tasks. The meta-learning approach combined with synthetic data augmentation techniques improves the model performance significantly and achieves BLEU scores of 24.58, 27.51, and 27.61 on IWSLT test 2015, MuST-C test, and Europarl-ST test sets respectively.

#8 End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

Authors: Marco Gaido ; Mattia A. Di Gangi ; Matteo Negri ; Marco Turchi

This paper describes FBK’s participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems’ ability to translate English TED talk audio into German text. The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation. Participants can decide whether to work on custom segmentation or not. We used the provided segmentation. Our system is an end-to-end model based on an adaptation of the Transformer for speech data. Its training process is the main focus of this paper and is based on: i) transfer learning (ASR pretraining and knowledge distillation), ii) data augmentation (SpecAugment, time stretch and synthetic data), iii) combining synthetic and real data marked as different domains, and iv) multi-task learning using the CTC loss. Finally, after the training with word-level knowledge distillation is complete, our ST models are fine-tuned using label-smoothed cross entropy. Our best model scored 29 BLEU on the MuST-C En-De test set, which is an excellent result compared to recent papers, and 23.7 BLEU on the same data segmented with VAD, showing the need for research into solutions that address this specific data condition.
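Word-level knowledge distillation, in its usual formulation, trains the student against the teacher's full output distribution at each target position instead of a one-hot reference. A minimal sketch of that loss follows, assuming the teacher and student distributions are given as per-token probability lists over a shared vocabulary; it is not FBK's training code, and the function name is hypothetical.

```python
import math


def word_level_kd_loss(
    teacher_probs: list[list[float]],
    student_probs: list[list[float]],
    eps: float = 1e-12,
) -> float:
    """Mean over target positions of -sum_v q_teacher(v) * log p_student(v)."""
    if len(teacher_probs) != len(student_probs):
        raise ValueError("teacher and student must cover the same target positions")
    total = 0.0
    for q_row, p_row in zip(teacher_probs, student_probs):
        # Cross-entropy of the student against the teacher's soft distribution;
        # eps guards against log(0) when the student assigns zero probability.
        total -= sum(q * math.log(max(p, eps)) for q, p in zip(q_row, p_row))
    return total / len(teacher_probs)
```

The loss is minimized when the student reproduces the teacher's distribution, at which point it equals the teacher's entropy; with a one-hot teacher it reduces to ordinary cross entropy against the reference.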

#9 SRPOL’s System for the IWSLT 2020 End-to-End Speech Translation Task

Authors: Tomasz Potapczyk ; Pawel Przybysz

We took part in the offline end-to-end English-to-German TED lectures translation task. We based our solution on our last year’s submission. We used a slightly altered Transformer architecture with a ResNet-like convolutional layer that prepares the audio input for the Transformer encoder. To improve the model’s translation quality, we introduced two regularization techniques and trained on a machine-translated Librispeech corpus in addition to the iwslt-corpus, TEDLIUM2 and MuST-C corpora. Our best model scored almost 3 BLEU higher than last year’s model. To segment the 2020 test set, we used exactly the same procedure as last year.

#10 The University of Helsinki Submission to the IWSLT2020 Offline Speech Translation Task

Authors: Raúl Vázquez ; Mikko Aulamo ; Umut Sulubacak ; Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group’s participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text. In line with this year’s task objective, we train both cascade and end-to-end systems for spoken language translation. We opt for an end-to-end multitasking architecture with shared internal representations and a cascade approach that follows a standard procedure consisting of ASR, correction, and MT stages. We also describe the experiments that served as a basis for the submitted systems. Our experiments reveal that multitasking training with shared internal representations is not only possible but allows for knowledge-transfer across modalities.

#11 The AFRL IWSLT 2020 Systems: Work-From-Home Edition

Authors: Brian Ore ; Eric Hansen ; Tim Anderson ; Jeremy Gwinnup

This report summarizes the Air Force Research Laboratory (AFRL) submission to the offline spoken language translation (SLT) task as part of the IWSLT 2020 evaluation campaign. As in previous years, we chose to adopt the cascade approach of using separate systems to perform speech activity detection, automatic speech recognition, sentence segmentation, and machine translation. All systems were neural based, including a fully-connected neural network for speech activity detection, a Kaldi factorized time delay neural network with recurrent neural network (RNN) language model rescoring for speech recognition, a bidirectional RNN with attention mechanism for sentence segmentation, and transformer networks trained with OpenNMT and Marian for machine translation. Our primary submission yielded BLEU scores of 21.28 on tst2019 and 23.33 on tst2020.

#12 LIT Team’s System Description for Japanese-Chinese Machine Translation Task in IWSLT 2020

Authors: Yimeng Zhuang ; Yuan Zhang ; Lijie Wang

This paper describes the LIT Team’s submission to the IWSLT2020 open domain translation task, focusing primarily on the Japanese-to-Chinese translation direction. Our system is based on the organizers’ baseline system, and we improve the Transformer baseline mainly through elaborate data pre-processing. We obtain significant improvements, and this paper aims to share our data processing experience in this translation task. Large-scale back-translation on a monolingual corpus is also investigated. In addition, we try shared and exclusive word embeddings and compare different token granularities, such as sub-word level. Our Japanese-to-Chinese translation system achieves a BLEU score of 34.0 and ranks 2nd among all participating systems.

#13 OPPO’s Machine Translation System for the IWSLT 2020 Open Domain Translation Task

Authors: Qian Zhang ; Xiaopu Li ; Dawei Dang ; Tingxun Shi ; Di Ai ; Zhengshan Xue ; Jie Hao

In this paper, we present our machine translation system for the Chinese-Japanese bidirectional translation task (a.k.a. the open domain translation task) at IWSLT 2020. Our model is based on the Transformer (Vaswani et al., 2017), aided by many popular data preprocessing and augmentation methods that are widely proven to be effective. Experiments show that these methods can improve the baseline model steadily and significantly.

#14 Character Mapping and Ad-hoc Adaptation: Edinburgh’s IWSLT 2020 Open Domain Translation System

Authors: Pinzhen Chen ; Nikolay Bogoychev ; Ulrich Germann

This paper describes the University of Edinburgh’s neural machine translation systems submitted to the IWSLT 2020 open domain Japanese↔Chinese translation task. On top of commonplace techniques like tokenisation and corpus cleaning, we explore character mapping and unsupervised decoding-time adaptation. Our techniques focus on leveraging the provided data, and we show the positive impact of each technique through the gradual improvement of BLEU.
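Character mapping plausibly means rewriting characters so that Japanese kanji and Chinese hanzi share surface forms, which increases vocabulary overlap between the two sides; the abstract does not spell out the table. The sketch below assumes a tiny, illustrative Japanese-to-simplified-Chinese table (a real one would be much larger and would need care with characters whose meaning differs across the two languages). The table and function name are hypothetical, not Edinburgh's mapping.

```python
# Hypothetical, tiny illustrative table: Japanese shinjitai -> simplified Chinese.
KANJI_TO_HANZI: dict[str, str] = {"気": "气", "図": "图", "広": "广", "読": "读"}


def map_characters(text: str, table: dict[str, str] = KANJI_TO_HANZI) -> str:
    """Replace each mapped character; all other characters pass through unchanged."""
    return text.translate(str.maketrans(table))
```

For instance, `map_characters("天気")` returns "天气", while kana such as を are left untouched because they have no entry in the table.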

#15 CASIA’s System for IWSLT 2020 Open Domain Translation

Authors: Qian Wang ; Yuchen Liu ; Cong Ma ; Yu Lu ; Yining Wang ; Long Zhou ; Yang Zhao ; Jiajun Zhang ; Chengqing Zong

This paper describes CASIA’s system for the IWSLT 2020 open domain translation task. This year we participate in both the Chinese→Japanese and Japanese→Chinese translation tasks. Our system is a neural machine translation system based on the Transformer model. We augment the training data with knowledge distillation and back-translation to improve translation performance. Domain data classification and weighted domain model ensembling are introduced to generate the final translation result. We compare and analyze the performance on development data with different model settings and different data processing techniques.
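A weighted model ensemble is commonly realized by interpolating the models' next-token distributions with per-model weights at each decoding step (averaging log-probabilities is an equally common alternative). The sketch below assumes fixed weights; in CASIA's setup the weights would presumably depend on the domain classification, which is not modeled here. The function name is hypothetical.

```python
def weighted_ensemble(
    distributions: list[list[float]], weights: list[float]
) -> list[float]:
    """Linearly interpolate next-token distributions from several models."""
    if len(distributions) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    vocab = len(distributions[0])
    # Normalize by the weight sum so the result is again a probability distribution.
    return [
        sum(w * dist[v] for w, dist in zip(weights, distributions)) / total
        for v in range(vocab)
    ]
```

For example, `weighted_ensemble([[1.0, 0.0], [0.0, 1.0]], [3, 1])` returns `[0.75, 0.25]`: the more heavily weighted model dominates without fully overriding the other.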

#16 Deep Blue Sonics’ Submission to IWSLT 2020 Open Domain Translation Task

Authors: Enmin Su ; Yi Ren

We present in this report our submission to IWSLT 2020 Open Domain Translation Task. We built a data pre-processing pipeline to efficiently handle large noisy web-crawled corpora, which boosts the BLEU score of a widely used transformer model in this translation task. To tackle the open-domain nature of this task, back-translation is applied to further improve the translation performance.

#17 University of Tsukuba’s Machine Translation System for IWSLT20 Open Domain Translation Task

Authors: Hongyi Cui ; Yizhen Wei ; Shohei Iida ; Takehito Utsuro ; Masaaki Nagata

In this paper, we introduce the University of Tsukuba’s submission to the IWSLT20 Open Domain Translation Task. We participate in both the Chinese→Japanese and Japanese→Chinese directions. For both directions, our machine translation systems are based on the Transformer architecture. Several techniques are integrated to boost the performance of our models: data filtering, large-scale noised training, model ensembling, reranking and postprocessing. As a result, our systems achieve BLEU scores of 33.0 for Chinese→Japanese and 32.3 for Japanese→Chinese translation.

#18 Xiaomi’s Submissions for IWSLT 2020 Open Domain Translation Task

Authors: Yuhui Sun ; Mengxue Guo ; Xiang Li ; Jianwei Cui ; Bin Wang

This paper describes Xiaomi’s submissions to the IWSLT20 shared open domain translation task for the Chinese<->Japanese language pair. We explore different model ensembling strategies based on recent Transformer variants. We further strengthen our systems via several effective techniques, such as data filtering, data selection, tagged back-translation, domain adaptation, knowledge distillation, and re-ranking. Our Chinese->Japanese primary system ranked second in terms of character-level BLEU score among all submissions, and our Japanese->Chinese primary system also achieved competitive performance.
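Tagged back-translation marks synthetic source sentences with a reserved token so the model can tell them apart from genuine parallel data. A minimal sketch follows, assuming a hypothetical `reverse_translate` callable (a target-to-source model) and a hypothetical `<BT>` tag token; it illustrates the data construction only, not Xiaomi's pipeline.

```python
from typing import Callable

BT_TAG = "<BT>"  # hypothetical reserved tag token


def tagged_back_translation(
    target_monolingual: list[str],
    reverse_translate: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Build synthetic (source, target) pairs with a tag on each synthetic source."""
    return [(f"{BT_TAG} {reverse_translate(tgt)}", tgt) for tgt in target_monolingual]
```

For a Chinese->Japanese system, the monolingual sentences would be Japanese and `reverse_translate` a Japanese->Chinese model; at test time genuine inputs carry no tag, so the model treats them as real source text.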

#19 ISTIC’s Neural Machine Translation System for IWSLT’2020

Authors: Jiaze Wei ; Wenbin Liu ; Zhenfeng Wu ; You Pan ; Yanqing He

This paper introduces the technical details of the machine translation system of the Institute of Scientific and Technical Information of China (ISTIC) for the 17th International Conference on Spoken Language Translation (IWSLT 2020). ISTIC participated in both translation tasks of the Open Domain Translation track: the Japanese-to-Chinese MT task and the Chinese-to-Japanese MT task. The paper mainly elaborates on the model framework, data preprocessing methods and decoding strategies adopted in our system. In addition, the system's performance on the development set is given under different settings.

#20 Octanove Labs’ Japanese-Chinese Open Domain Translation System

Author: Masato Hagiwara

This paper describes Octanove Labs’ submission to the IWSLT 2020 open domain translation challenge. In order to build a high-quality Japanese-Chinese neural machine translation (NMT) system, we use a combination of 1) parallel corpus filtering and 2) back-translation. We have shown that, by using heuristic rules and learned classifiers, the size of the parallel data can be reduced by 70% to 90% without much impact on the final MT performance. We have also shown that including the artificially generated parallel data through back-translation further boosts the metric by 17% to 27%, while self-training contributes little. Aside from a small number of parallel sentences annotated for filtering, no external resources have been used to build our system.
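The heuristic half of the parallel corpus filtering can be sketched as a set of rule checks plus deduplication. The thresholds, rules and function names below are assumptions for illustration, not Octanove Labs' actual rules, and the learned classifiers are not shown. Lengths are counted in characters because Japanese and Chinese text is not whitespace-segmented.

```python
def keep_pair(src: str, tgt: str, max_len: int = 200, max_ratio: float = 2.0) -> bool:
    """Apply simple rule-based checks to one sentence pair (assumed thresholds)."""
    s, t = src.strip(), tgt.strip()
    if not s or not t:
        return False  # drop pairs with an empty side
    if s == t:
        return False  # drop untranslated copies
    if len(s) > max_len or len(t) > max_len:
        return False  # drop overly long segments
    ratio = max(len(s), len(t)) / min(len(s), len(t))
    return ratio <= max_ratio  # drop pairs with implausible length ratios


def filter_corpus(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first occurrence of each pair that passes keep_pair."""
    seen: set[tuple[str, str]] = set()
    kept: list[tuple[str, str]] = []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen or not keep_pair(src, tgt):
            continue
        seen.add(key)
        kept.append((src, tgt))
    return kept
```

Cheap rules like these remove obvious noise first, so that a learned classifier only has to score the remaining, harder cases.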

#21 NAIST’s Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task

Authors: Ryo Fukuda ; Katsuhito Sudoh ; Satoshi Nakamura

This paper describes NAIST’s NMT system submitted to the IWSLT 2020 conversational speech translation task. We focus on the translation of disfluent speech transcripts that include ASR errors and non-grammatical utterances. We tried a domain adaptation method that transfers the style of out-of-domain data (United Nations Parallel Corpus) to resemble in-domain data (Fisher transcripts). Our results showed that the NMT model with domain adaptation outperformed a baseline. In addition, a slight improvement from the style transfer was observed.

#22 Generating Fluent Translations from Disfluent Text Without Access to Fluent References: IIT Bombay@IWSLT2020

Authors: Nikhil Saini ; Jyotsana Khatri ; Preethi Jyothi ; Pushpak Bhattacharyya

Machine translation systems perform reasonably well when the input is well-formed speech or text. Conversational speech is spontaneous and inherently consists of many disfluencies. Producing fluent translations of disfluent source text would typically require parallel disfluent to fluent training data. However, fluent translations of spontaneous speech are an additional resource that is tedious to obtain. This work describes the submission of IIT Bombay to the Conversational Speech Translation challenge at IWSLT 2020. We specifically tackle the problem of disfluency removal in disfluent-to-fluent text-to-text translation assuming no access to fluent references during training. Common patterns of disfluency are extracted from disfluent references and a noise induction model is used to simulate them starting from a clean monolingual corpus. This synthetically constructed dataset is then considered as a proxy for labeled data during training. We also make use of additional fluent text in the target language to help generate fluent translations. This work uses no fluent references during training and beats a baseline model by a margin of 4.21 and 3.11 BLEU points where the baseline uses disfluent and fluent references, respectively. Index Terms- disfluency removal, machine translation, noise induction, leveraging monolingual data, denoising for disfluency removal.
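A noise induction step of this kind can be sketched as randomly inserting filler words and token repetitions into clean text. This is a simplified stand-in: the filler inventory, probabilities and function name are assumptions, whereas the paper extracts its disfluency patterns from the disfluent references.

```python
import random
from typing import Optional

# Assumed filler inventory; a real setup would derive patterns from data.
FILLERS = ["eh", "um"]


def induce_disfluencies(
    sentence: str,
    p_filler: float = 0.1,
    p_repeat: float = 0.1,
    seed: Optional[int] = None,
) -> str:
    """Insert fillers before tokens and repeat tokens at random."""
    rng = random.Random(seed)
    out: list[str] = []
    for tok in sentence.split():
        if rng.random() < p_filler:
            out.append(rng.choice(FILLERS))  # filled pause before the token
        out.append(tok)
        if rng.random() < p_repeat:
            out.append(tok)  # simple repetition disfluency
    return " ".join(out)
```

Pairing each noisy output with its clean input yields synthetic disfluent-to-fluent training data; passing a fixed `seed` makes the corruption reproducible.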

#23 The HW-TSC Video Speech Translation System at IWSLT 2020

Authors: Minghan Wang ; Hao Yang ; Yao Deng ; Ying Qin ; Lizhi Lei ; Daimeng Wei ; Hengchao Shang ; Ning Xie ; Xiaochun Li ; Jiaxian Guo

The paper presents details of our system in the IWSLT Video Speech Translation evaluation. The system works in a cascade form, which contains three modules: 1) a proprietary ASR system; 2) a disfluency correction system that aims to remove interregnums or other disfluent expressions with a fine-tuned BERT and a series of rule-based algorithms; 3) an NMT system based on the Transformer and trained on a massive publicly available corpus.

#24 CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020

Authors: Peter Polák ; Sangeet Sagar ; Dominik Macháček ; Ondřej Bojar

In this paper, we present our submission to the Non-Native Speech Translation Task for IWSLT 2020. Our main contribution is a proposed speech recognition pipeline that consists of an acoustic model and a phoneme-to-grapheme model. As an intermediate representation, we utilize phonemes. We demonstrate that the proposed pipeline surpasses commercially used automatic speech recognition (ASR) and submit it into the ASR track. We complement this ASR with off-the-shelf MT systems to take part also in the speech translation track.

#25 ELITR Non-Native Speech Translation at IWSLT 2020

Authors: Dominik Macháček ; Jonáš Kratochvíl ; Sangeet Sagar ; Matúš Žilinec ; Ondřej Bojar ; Thai-Son Nguyen ; Felix Schneider ; Philip Williams ; Yuekun Yao

This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, develop a new end-to-end general ASR system, and a hybrid ASR trained on non-native speech. The provided small validation set prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.