IWSLT.2012 | Cool Papers - Immersive Paper Discovery

#1 Toward universal network-based speech translation

No summary was provided.

#2 Who can understand your speech better – deep neural network or Gaussian mixture model

Author: Dong Yu

No summary was provided.

#3 The NICT ASR system for IWSLT2012

Authors: Hitoshi Yamamoto ; Youzheng Wu ; Chien-Lin Huang ; Xugang Lu ; Paul R. Dixon ; Shigeki Matsuda ; Chiori Hori ; Hideki Kashioka

This paper describes our automatic speech recognition (ASR) system for the IWSLT 2012 evaluation campaign. The target data of the campaign is selected from the TED talks, a collection of public speeches on a variety of topics spoken in English. Our ASR system is based on weighted finite-state transducers and exploits a combination of acoustic models for spontaneous speech, language models based on n-grams and a factored recurrent neural network trained with effectively selected corpora, and an unsupervised topic adaptation framework utilizing ASR results. With these components, the system achieved word error rates of 10.6% and 12.0% on the tst2011 and tst2012 evaluation sets, respectively.
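The word error rate quoted in this and later abstracts is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The following is a minimal sketch of that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not the campaign's official scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization by whitespace is an assumption of this sketch; real
    scoring tools also normalize case, punctuation, and so on.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" gives one deletion over six reference words, a WER of 1/6.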

#4 The KIT translation systems for IWSLT 2012

Authors: Mohammed Mediani ; Yuqi Zhang ; Thanh-Le Ha ; Jan Niehues ; Eunah Cho ; Teresa Herrmann ; Rainer Kärgel ; Alexander Waibel

In this paper, we present the KIT systems participating in the English-French TED Translation tasks in the framework of the IWSLT 2012 machine translation evaluation. We also present several additional experiments on the English-German, English-Chinese and English-Arabic translation pairs. Our system is a phrase-based statistical machine translation system, extended with many additional models which were proven to enhance the translation quality. For instance, it uses the part-of-speech (POS)-based reordering, translation and language model adaptation, bilingual language model, word-cluster language model, discriminative word lexica (DWL), and continuous space language model. In addition to this, the system incorporates special steps in the preprocessing and in the post-processing step. In the preprocessing the noisy corpora are filtered by removing the noisy sentence pairs, whereas in the postprocessing the agreement between a noun and its surrounding words in the French translation is corrected based on POS tags with morphological information. Our system deals with speech transcription input by removing case information and punctuation except periods from the text translation model.

#5 The UEDIN systems for the IWSLT 2012 evaluation

Authors: Eva Hasler ; Peter Bell ; Arnab Ghoshal ; Barry Haddow ; Philipp Koehn ; Fergus McInnes ; Steve Renals ; Pawel Swietojanski

This paper describes the University of Edinburgh (UEDIN) systems for the IWSLT 2012 Evaluation. We participated in the ASR (English), MT (English-French, German-English) and SLT (English-French) tracks.

#6 The NAIST machine translation system for IWSLT2012

Authors: Graham Neubig ; Kevin Duh ; Masaya Ogushi ; Takatomo Kano ; Tetsuo Kiso ; Sakriani Sakti ; Tomoki Toda ; Satoshi Nakamura

This paper describes the NAIST statistical machine translation system for the IWSLT2012 Evaluation Campaign. We participated in all TED Talk tasks, for a total of 11 language-pairs. For all tasks, we use the Moses phrase-based decoder and its experiment management system as a common base for building translation systems. The focus of our work is on performing a comprehensive comparison of a multitude of existing techniques for the TED task, exploring issues such as out-of-domain data filtering, minimum Bayes risk decoding, MERT vs. PRO tuning, word alignment combination, and morphology.

#7 FBK’s machine translation systems for IWSLT 2012’s TED lectures

Authors: N. Ruiz ; A. Bisazza ; R. Cattoni ; M. Federico

This paper reports on FBK’s Machine Translation (MT) submissions at the IWSLT 2012 Evaluation on the TED talk translation tasks. We participated in the English-French and the Arabic-, Dutch-, German-, and Turkish-English translation tasks. Several improvements are reported over last year's baselines. In addition to using fill-up combinations of phrase-tables for domain adaptation, we explore the use of corpora filtering based on cross-entropy to produce concise and accurate translation and language models. We describe challenges encountered in under-resourced languages (Turkish) and language-specific preprocessing needs.
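Cross-entropy-based corpus filtering, mentioned here and in the MIT-LL/AFRL abstract, is commonly implemented by ranking candidate sentences by the difference between their cross-entropy under an in-domain language model and under a general one. The abstract does not give the details of FBK's setup, so the following is only an illustrative sketch under simplifying assumptions: add-one-smoothed unigram models stand in for real n-gram models, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count words; returns (counts, total_tokens). A toy stand-in for an n-gram LM."""
    counts = Counter(word for s in sentences for word in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    # One extra vocabulary slot is reserved for unseen words.
    log_prob = sum(
        math.log2((counts[w] + 1) / (total + vocab_size + 1)) for w in words
    )
    return -log_prob / len(words)


def select_by_cross_entropy_difference(candidates, in_domain, general, keep):
    """Keep the `keep` candidates that look most in-domain.

    Score = H_in(s) - H_general(s); lower means closer to the in-domain data.
    """
    in_model = train_unigram(in_domain)
    gen_model = train_unigram(general)
    vocab_size = len(set(in_model[0]) | set(gen_model[0]))

    def score(s):
        return cross_entropy(s, in_model, vocab_size) - cross_entropy(
            s, gen_model, vocab_size
        )

    return sorted(candidates, key=score)[:keep]
```

In practice the two models would be trained on TED data and a sample of the out-of-domain corpus respectively, and the cut-off would be tuned on held-out data rather than fixed in advance.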

#8 The RWTH Aachen speech recognition and machine translation system for IWSLT 2012

Authors: Stephan Peitz ; Saab Mansour ; Markus Freitag ; Minwei Feng ; Matthias Huck ; Joern Wuebker ; Malte Nuhn ; Markus Nußbaum-Thom ; Hermann Ney

In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, additional reordering model, word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

#9 The HIT-LTRC machine translation system for IWSLT 2012

Authors: Xiaoning Zhu ; Yiming Cui ; Conghui Zhu ; Tiejun Zhao ; Hailong Cao

In this paper, we describe HIT-LTRC's participation in the IWSLT 2012 evaluation campaign. This year, we took part in the Olympics Task, which required the participants to translate Chinese to English with limited data. Our system is based on Moses[1], which is an open source machine translation system. We mainly used phrase-based models to carry out our experiments, and factored models were also built for comparison. All the involved tools are freely available. In the evaluation campaign, we focus on data selection, phrase extraction method comparison and phrase table combination.

#10 FBK@IWSLT 2012 – ASR track

Authors: D. Falavigna ; R. Gretter ; F. Brugnara ; D. Giuliani

This paper reports on the participation of FBK at the IWSLT2012 evaluation campaign on automatic speech recognition: namely in the English ASR track. Both primary and contrastive submissions have been sent for evaluation. The ASR system features acoustic models trained on a portion of the TED talk recordings that was automatically selected according to the fidelity of the provided transcriptions. Three decoding steps are performed, interleaved with acoustic feature normalization and acoustic model adaptation. A final rescoring step, based on the usage of an interpolated language model, is applied to word graphs generated in the third decoding step. For the primary submission, language models entering the interpolation are trained on both out-of-domain and in-domain text data, whereas the contrastive submission uses both "general purpose" and auxiliary language models trained only on out-of-domain text data. Despite this difference, similar performance is obtained with the two submissions.

#11 The 2012 KIT and KIT-NAIST English ASR systems for the IWSLT evaluation

Authors: Christian Saam ; Christian Mohr ; Kevin Kilgour ; Michael Heck ; Matthias Sperber ; Keigo Kubo ; Sebastian Stüker ; Sakriani Sakti ; Graham Neubig ; Tomoki Toda ; Satoshi Nakamura ; Alex Waibel

This paper describes our English Speech-to-Text (STT) systems for the 2012 IWSLT TED ASR track evaluation. The systems consist of 10 subsystems that are combinations of different front-ends, e.g. MVDR based and MFCC based ones, and two different phone sets. The outputs of the subsystems are combined via confusion network combination. Decoding is done in two stages, where the systems of the second stage are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR.

#12 The KIT-NAIST (contrastive) English ASR system for IWSLT 2012

Authors: Michael Heck ; Keigo Kubo ; Matthias Sperber ; Sakriani Sakti ; Sebastian Stüker ; Christian Saam ; Kevin Kilgour ; Christian Mohr ; Graham Neubig ; Tomoki Toda ; Satoshi Nakamura ; Alex Waibel

This paper describes the KIT-NAIST (Contrastive) English speech recognition system for the IWSLT 2012 Evaluation Campaign. In particular, we participated in the ASR track of the IWSLT TED task. The system was developed by Karlsruhe Institute of Technology (KIT) and Nara Institute of Science and Technology (NAIST) teams in collaboration within the interACT project. We employ single system decoding with fully continuous and semi-continuous models, as well as a three-stage, multipass system combination framework built with the Janus Recognition Toolkit. On the IWSLT 2010 test set our single system introduced in this work achieves a WER of 17.6%, and our final combination achieves a WER of 14.4%.

#13 EBMT system of Kyoto University in OLYMPICS task at IWSLT 2012

Authors: Chenhui Chu ; Toshiaki Nakazawa ; Sadao Kurohashi

This paper describes the EBMT system of Kyoto University that participated in the OLYMPICS task at IWSLT 2012. When translating very different language pairs such as Chinese-English, it is very important to handle sentences in tree structures to overcome the difference. Many recent studies incorporate tree structures in some parts of the translation process, but not all the way from model training (alignment) to decoding. Our system is a fully tree-based translation system where we use the Bayesian phrase alignment model on dependency trees and example-based translation. To improve the translation quality, we conduct some special processing for the IWSLT 2012 OLYMPICS task, including sub-sentence splitting, non-parallel sentence filtering, adoption of an optimized Chinese segmenter and rule-based decoding constraints.

#14 The LIG English to French machine translation system for IWSLT 2012

Authors: Laurent Besacier ; Benjamin Lecouteux ; Marwen Azouzi ; Ngoc Quang Luong

This paper presents the LIG participation in the E-F MT task of IWSLT 2012. The primary system proposed made a large improvement (more than 3 BLEU points on the tst2010 set) compared to our participation last year. Part of this improvement was due to the use of an extraction from the Gigaword corpus. We also propose a preliminary adaptation of the driven decoding concept for machine translation. This method allows an efficient combination of machine translation systems, by rescoring the log-linear model at the N-best list level according to auxiliary systems: the basis technique is essentially guiding the search using one or previous system outputs. The results show that the approach allows a significant improvement in BLEU score using Google translate to guide our own SMT system. We also try to use a confidence measure as an additional log-linear feature but we could not get any improvement with this technique.

#15 The MIT-LL/AFRL IWSLT 2012 MT system

Authors: Jennifer Drexler ; Wade Shen ; Tim Anderson ; Raymond Slyh ; Brian Ore ; Eric Hansen ; Terry Gleason

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2012 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk translation task. We also applied our existing ASR system to the TED-talk lecture ASR task, and combined our ASR and MT systems for the TED-talk SLT task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2011 system, and experiments we ran during the IWSLT-2012 evaluation. Specifically, we focus on 1) cross-domain translation using MAP adaptation, 2) cross-entropy filtering of MT training data, and 3) improved Arabic morphology for MT preprocessing.

#16 Minimum Bayes-risk decoding extended with similar examples: NAIST-NICT at IWSLT 2012

Authors: Hiroaki Shimizu ; Masao Utiyama ; Eiichiro Sumita ; Satoshi Nakamura

This paper describes our methods used in the NAIST-NICT submission to the International Workshop on Spoken Language Translation (IWSLT) 2012 evaluation campaign. In particular, we propose two extensions to minimum Bayes-risk decoding, which reduces the expected loss.
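As background for this abstract, plain minimum Bayes-risk (MBR) decoding over an N-best list picks the hypothesis with the lowest expected loss under the model's posterior, rather than the single most probable one. The sketch below shows only that baseline idea, not the two extensions the paper proposes. The loss (one minus a unigram F1 overlap) and the names `unigram_f1` and `mbr_decode` are assumptions chosen for brevity; MT systems typically use a sentence-level BLEU-based loss instead.

```python
from collections import Counter


def unigram_f1(a: str, b: str) -> float:
    """Unigram overlap F1 between two sentences (a crude similarity measure)."""
    count_a, count_b = Counter(a.split()), Counter(b.split())
    overlap = sum((count_a & count_b).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(count_a.values())
    recall = overlap / sum(count_b.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(nbest):
    """Return the hypothesis with minimum expected loss.

    `nbest` is a list of (hypothesis, posterior) pairs whose posteriors
    are assumed to be normalized over the list.
    """

    def expected_loss(hypothesis):
        return sum(
            posterior * (1.0 - unigram_f1(hypothesis, other))
            for other, posterior in nbest
        )

    return min((hyp for hyp, _ in nbest), key=expected_loss)
```

With the list `[("the house is small", 0.4), ("the house is little", 0.3), ("the home is little", 0.3)]`, the most probable hypothesis is the first, but MBR selects "the house is little" because it is closest on average to the other candidates.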

#17 The NICT translation system for IWSLT 2012

Authors: Andrew Finch ; Ohnmar Htun ; Eiichiro Sumita

No summary was provided.

#18 TED Polish-to-English translation system for the IWSLT 2012

Author: Krzysztof Marasek

This paper presents efforts in preparation of the Polish-to-English SMT system for the TED lectures domain that is to be evaluated during the IWSLT 2012 Conference. Our attempts cover systems which use stems and morphological information on Polish words (using two different tools) and stems and POS.

#19 Forest-to-string translation using binarized dependency forest for IWSLT 2012 OLYMPICS task

Authors: Hwidong Na ; Jong-Hyeok Lee

We participated in the OLYMPICS task in IWSLT 2012 and submitted two formal runs using a forest-to-string translation system. Our primary run achieved better translation quality than our contrastive run, but worse than a phrase-based and a hierarchical system using Moses.

#20 Romanian to English automatic MT experiments at IWSLT12 – system description paper

Authors: Ştefan Daniel Dumitrescu ; Radu Ion ; Dan Ştefănescu ; Tiberiu Boroş ; Dan Tufiş

The paper presents the system developed by RACAI for the IWSLT 2012 competition, TED task, MT track, Romanian to English translation. We describe the starting baseline phrase-based SMT system, the experiments conducted to adapt the language and translation models and our post-translation cascading system designed to improve the translation without external resources. We further present our attempts at creating a better controlled decoder than the open-source Moses system offers.

#21 The TÜBİTAK statistical machine translation system for IWSLT 2012

Authors: Coşkun Mermer ; Hamza Kaya ; İlknur Durgar El-Kahlout ; Mehmet Uğur Doğan

We describe the TÜBİTAK submission to the IWSLT 2012 Evaluation Campaign. Our system development focused on utilizing Bayesian alignment methods such as variational Bayes and Gibbs sampling in addition to the standard GIZA++ alignments. The submitted tracks are the Arabic-English and Turkish-English TED Talks translation tasks.

#22 A method for translation of paralinguistic information

Authors: Takatomo Kano ; Sakriani Sakti ; Shinnosuke Takamichi ; Graham Neubig ; Tomoki Toda ; Satoshi Nakamura

This paper is concerned with speech-to-speech translation that is sensitive to paralinguistic information. From the many different possible paralinguistic features to handle, in this paper we chose duration and power as a first step, proposing a method that can translate these features from input speech to the output speech in continuous space. This is done in a simple and language-independent fashion by training a regression model that maps source language duration and power information into the target language. We evaluate the proposed method on a digit translation task and show that paralinguistic information in input speech appears in output speech, and that this information can be used by target language speakers to detect emphasis.

#23 Continuous space language models using restricted Boltzmann machines

Authors: Jan Niehues ; Alex Waibel

We present a novel approach for continuous space language models in statistical machine translation by using Restricted Boltzmann Machines (RBMs). The probability of an n-gram is calculated by the free energy of the RBM instead of a feedforward neural net. Therefore, the calculation is much faster and can be integrated into the translation process instead of using the language model only in a re-ranking step. Furthermore, it is straightforward to introduce additional word factors into the language model. We observed a faster convergence in training if we include automatically generated word classes as an additional word factor. We evaluated the RBM-based language model on the German to English and English to French translation task of TED lectures. Instead of replacing the conventional n-gram-based language model, we trained the RBM-based language model on the more important but smaller in-domain data and combined them in a log-linear way. With this approach we could show improvements of about half a BLEU point on the translation task.
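The abstract's point that an n-gram can be scored through the RBM's free energy can be illustrated with the standard free-energy formula for a binary RBM, F(v) = -b·v - Σ_j log(1 + exp(c_j + Σ_i v_i W_ij)), where exp(-F(v)) is the unnormalized probability of the visible vector v. The sketch below computes only this quantity. How the paper encodes an n-gram as visible units is not stated in the abstract, so the plain binary vector `v` and the parameter names `W`, `b`, `c` are assumptions of this example.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v, W, b, c):
    """Free energy of a binary RBM for visible vector v.

    v: visible units (list of 0/1), e.g. an encoded n-gram (encoding assumed)
    W: weights as a list of rows, W[i][j] links visible i to hidden j
    b: visible biases, c: hidden biases
    """
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = 0.0
    for j, c_j in enumerate(c):
        activation = c_j + sum(W[i][j] * v[i] for i in range(len(v)))
        hidden_term += softplus(activation)
    return -visible_term - hidden_term
```

Because the hidden units are summed out in closed form, scoring costs one pass over the weight matrix per n-gram, with no normalization over the vocabulary, which is consistent with the speed argument made in the abstract.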

#24 Focusing language models for automatic speech recognition

Authors: Daniele Falavigna ; Roberto Gretter

This paper describes a method for selecting text data from a corpus with the aim of training auxiliary Language Models (LMs) for an Automatic Speech Recognition (ASR) system. A novel similarity score function is proposed, which scores each document in the corpus so that those with the highest scores can be selected for training auxiliary LMs, which are then linearly interpolated with the baseline one. The similarity score function makes use of "similarity models" built from the automatic transcriptions furnished by earlier stages of the ASR system, while the documents selected for training auxiliary LMs are drawn from the same set of data used to train the baseline LM used in the ASR system. In this way, the resulting interpolated LMs are "focused" towards the output of the recognizer itself. The approach improves word error rate, measured on a task of spontaneous speech, by about 3% relative. It is important to note that a similar improvement has been obtained using an "in-domain" set of text data not contained in the sources used to train the baseline LM. In addition, we compared the proposed similarity score function with two others, based on perplexity (PP) and on the TFxIDF (Term Frequency x Inverse Document Frequency) vector space model. The proposed approach provides about the same performance as that based on the TFxIDF model but requires less computation and less memory.
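The TFxIDF baseline that the abstract compares against can be sketched as ranking corpus documents by cosine similarity to the recognizer's transcript in a TF-IDF vector space. The paper's own similarity score is not described in the abstract and is not reproduced here. The smoothed IDF formula, the whitespace tokenization, and the function names below are all assumptions of this illustration.

```python
import math
from collections import Counter


def select_similar_documents(corpus, asr_transcript, keep):
    """Return indices of the `keep` corpus documents most similar to the transcript.

    Similarity is the cosine between TF-IDF vectors; IDF is estimated on the
    corpus with add-one smoothing (an assumption of this sketch).
    """
    tokenized = [doc.split() for doc in corpus]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    def idf(word):
        return math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0

    def vectorize(tokens):
        return {word: tf * idf(word) for word, tf in Counter(tokens).items()}

    def cosine(u, w):
        dot = sum(weight * w.get(word, 0.0) for word, weight in u.items())
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_w = math.sqrt(sum(x * x for x in w.values()))
        return dot / (norm_u * norm_w) if norm_u and norm_w else 0.0

    query = vectorize(asr_transcript.split())
    scores = [cosine(query, vectorize(tokens)) for tokens in tokenized]
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]
```

The selected documents would then be used to train an auxiliary LM that is linearly interpolated with the baseline LM, as the abstract describes.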

#25 Simulating human judgment in machine translation evaluation campaigns

Author: Philipp Koehn

We present a Monte Carlo model to simulate human judgments in machine translation evaluation campaigns, such as WMT or IWSLT. We use the model to compare different ranking methods and to give guidance on the number of judgments that need to be collected to obtain sufficiently significant distinctions between systems.
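The kind of question this abstract raises (how many judgments are needed to separate systems reliably) can be explored with a small Monte Carlo simulation. The sketch below is a generic illustration and not the model from the paper: it assumes each system has a fixed true quality, that a judge compares two randomly chosen systems with Gaussian noise added to each, and that systems are ranked by win ratio. All names and parameters are hypothetical.

```python
import random


def simulate_campaign(true_quality, n_judgments, noise, rng):
    """Simulate noisy pairwise judgments and rank systems by win ratio."""
    n_systems = len(true_quality)
    wins = [0] * n_systems
    comparisons = [0] * n_systems
    for _ in range(n_judgments):
        i, j = rng.sample(range(n_systems), 2)
        # The judge perceives each system's quality with independent noise.
        perceived_i = true_quality[i] + rng.gauss(0.0, noise)
        perceived_j = true_quality[j] + rng.gauss(0.0, noise)
        winner = i if perceived_i > perceived_j else j
        wins[winner] += 1
        comparisons[i] += 1
        comparisons[j] += 1
    ratio = [w / c if c else 0.0 for w, c in zip(wins, comparisons)]
    return sorted(range(n_systems), key=lambda k: ratio[k], reverse=True)


def ranking_recovery_rate(true_quality, n_judgments, noise, trials, seed=0):
    """Fraction of simulated campaigns that recover the true system ranking."""
    rng = random.Random(seed)
    gold = sorted(
        range(len(true_quality)), key=lambda k: true_quality[k], reverse=True
    )
    hits = sum(
        simulate_campaign(true_quality, n_judgments, noise, rng) == gold
        for _ in range(trials)
    )
    return hits / trials
```

Sweeping `n_judgments` for a fixed noise level and plotting the recovery rate gives a rough sense of how many judgments a campaign would need under these assumptions; other ranking methods can be compared by swapping out the win-ratio step.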