IWSLT.2010 | Cool Papers - Immersive Paper Discovery

#1 Hierarchical phrase-based translation with weighted finite state transducers

Author: William Byrne

No summary was provided.

Subject: IWSLT.2010 - Plenaries

#2 Resources for adding semantics to machine translation

Author: Jan Hajič

No summary was provided.

Subject: IWSLT.2010 - Plenaries

#3 The Quaero program: multilingual and multimedia technologies

Author: Jean-Luc Gauvain

No summary was provided.

Subject: IWSLT.2010 - Plenaries

#4 Overview of the IWSLT 2010 evaluation campaign

Authors: Michael Paul, Marcello Federico, Sebastian Stüker

This paper gives an overview of the evaluation campaign results of the 7th International Workshop on Spoken Language Translation (IWSLT 2010). This year, we focused on three spoken language tasks: (1) public speeches on a variety of topics (TALK) from English to French, (2) spoken dialog in travel situations (DIALOG) between Chinese and English, and (3) traveling expressions (BTEC) from Arabic, Turkish, and French to English. In total, 28 teams (including 7 first-time participants) took part in the shared tasks, submitting 60 primary and 112 contrastive runs. Automatic and subjective evaluations of the primary runs were carried out in order to investigate the impact of different communication modalities, spoken language styles and semantic context on automatic speech recognition (ASR) and machine translation (MT) system performance.

Subject: IWSLT.2010 - Evaluation Campaign

#5 AppTek’s APT machine translation system for IWSLT 2010

Authors: Evgeny Matusov, Selçuk Köprü

In this paper, we describe AppTek’s new APT machine translation system that we employed in the IWSLT 2010 evaluation campaign. This year, we participated in the Arabic-to-English and Turkish-to-English BTEC tasks. We discuss the architecture of the system, the preprocessing steps and the experiments carried out during the campaign. We show that competitive translation quality can be obtained with a system that can be turned into a real-life product without much effort.

Subject: IWSLT.2010 - Evaluation Campaign

#6 The DCU machine translation systems for IWSLT 2010

Authors: Hala Almaghout, Jie Jiang, Andy Way

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#7 N-gram-based machine translation enhanced with neural networks

Authors: Francisco Zamora-Martinez, Maria Jose Castro-Bleda, Holger Schwenk

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#8 FBK @ IWSLT 2010

Authors: Arianna Bisazza, Ioannis Klasinas, Mauro Cettolo, Marcello Federico

This year FBK took part in the BTEC translation task, with source languages Arabic and Turkish and target language English, and in the new TALK task, source English and target French. We worked in the framework of phrase-based statistical machine translation, aiming to improve coverage of models in the presence of rich morphology on the one hand, and to make better use of available resources through data selection techniques on the other. New morphological segmentation rules were developed for Turkish-English. The combination of several Turkish segmentation schemes into a lattice input led to an improvement with respect to last year. The use of additional training data was explored for Arabic-English, while on the English to French task improvement was achieved over a strong baseline by automatically selecting relevant and high quality data from the available training corpora.

Subject: IWSLT.2010 - Evaluation Campaign

#9 The GREYC/LLACAN machine translation systems for the IWSLT 2010 campaign

Authors: Julien Gosme, Wigdan Mekki, Fathi Debili, Yves Lepage, Nadine Lucas

In this paper we explore the contribution of the use of two Arabic morphological analyzers as preprocessing tools for statistical machine translation. Similar investigations have already been reported for morphologically rich languages like German, Turkish and Arabic. Here, we focus on the case of the Arabic language and mainly discuss the use of the G-LexAr analyzer. A preliminary experiment was designed to choose the most promising translation system among the 3 G-LexAr-based systems; we concluded that the systems are equivalent. Nevertheless, we decided to use the lemmatized output of G-LexAr and use its translations as primary run for the BTEC AE track. The results showed that G-LexAr output degrades translation compared to the basic SMT system trained on the un-analyzed corpus.

Subject: IWSLT.2010 - Evaluation Campaign

#10 I2R’s machine translation system for IWSLT 2010

Authors: Xiangyu Duan, Rafael Banchs, Jun Lang, Deyi Xiong, Aiti Aw, Min Zhang, Haizhou Li

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#11 The ICT statistical machine translation system for IWSLT 2010

Authors: Hao Xiong, Jun Xie, Hui Yu, Kai Liu, Wei Luo, Haitao Mi, Yang Liu, Yajuan Lü, Qun Liu

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#12 The INESC-ID machine translation system for the IWSLT 2010

Authors: Wang Ling, Tiago Luís, João Graça, Luísa Coheur, Isabel Trancoso

In this paper we describe the Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento (INESC-ID) system that participated in the IWSLT 2010 evaluation campaign. Our main goal for this evaluation was to employ several state-of-the-art methods applied to phrase-based machine translation in order to improve the translation quality. Aside from the IBM M4 alignment model, two constrained alignment models were tested, which produced better overall results. These results were further improved by using weighted alignment matrices during phrase extraction, rather than the single best alignment. Finally, we tested several filters that ruled out phrase pairs based on punctuation. Our system was evaluated on the BTEC and DIALOG tasks, having achieved a better overall ranking in the DIALOG task.

Subject: IWSLT.2010 - Evaluation Campaign

#13 ITI-UPV machine translation system for IWSLT 2010

Authors: Guillem Gascó, Vicent Alabau, Jesús-Andrés Ferrer, Jesús González-Rubio, Martha-Alicia Rocha, Germán Sanchis-Trilles, Francisco Casacuberta, Jorge González, Joan-Andreu Sánchez

This paper presents the submissions of the PRHLT group for the evaluation campaign of the International Workshop on Spoken Language Translation. We focus on the development of reliable translation systems between syntactically different languages (DIALOG task) and on the efficient training of SMT models in resource-rich scenarios (TALK task).

Subject: IWSLT.2010 - Evaluation Campaign

#14 The KIT translation system for IWSLT 2010

Authors: Jan Niehues, Mohammed Mediani, Teresa Herrmann, Michael Heck, Christian Herff, Alex Waibel

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#15 LIG statistical machine translation systems for IWSLT 2010

Authors: Laurent Besacier, Haitem Afli, Thi Ngoc Diep Do, Hervé Blanchon, Marion Potet

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#16 LIMSI @ IWSLT 2010

Authors: Alexandre Allauzen, Josep M. Crego, İlknur Durgar El-Kahlout, Le Hai-Son, Guillaume Wisniewski, François Yvon

This paper describes LIMSI’s Statistical Machine Translation systems (SMT) for the IWSLT evaluation, where we participated in two tasks (Talk for English to French and BTEC for Turkish to English). For the Talk task, we studied an extension of our in-house n-code SMT system (the integration of a bilingual reordering model over generalized translation units), as well as the use of training data extracted from Wikipedia in order to adapt the target language model. For the BTEC task, we concentrated on pre-processing schemes on the Turkish side in order to reduce the morphological discrepancies with the English side. We also evaluated the use of two different continuous space language models for such a small size of training data.

Subject: IWSLT.2010 - Evaluation Campaign

#17 LIUM’s statistical machine translation system for IWSLT 2010

Authors: Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève

This paper describes the two systems developed by the LIUM laboratory for the 2010 IWSLT evaluation campaign. We participated in the new English to French TALK task. We developed two systems, one for each evaluation condition, both being statistical phrase-based systems using the Moses toolkit. Several approaches were investigated.

Subject: IWSLT.2010 - Evaluation Campaign

#18 The MIRACL Arabic-English statistical machine translation system for IWSLT 2010

Authors: Ines Turki Khemakhem, Salma Jamoussi, Abdelmajid Ben Hamadou

This paper describes the MIRACL statistical Machine Translation system and the improvements that were developed during the IWSLT 2010 evaluation campaign. We participated in the Arabic to English BTEC task using a phrase-based statistical machine translation approach. In this paper, we first discuss some challenges in translating from Arabic to English and we explore various techniques to improve performance on such a task. Next, we present our solution for disambiguating the output of an Arabic morphological analyzer. In fact, the Arabic morphological analyzer used produces all possible morphological structures for each word, with a unique correct proposition. In this work we exploit the Arabic-English alignment to choose the correct segmented form and the correct morpho-syntactic features produced by our morphological analyzer.

Subject: IWSLT.2010 - Evaluation Campaign

#19 The MIT-LL/AFRL IWSLT-2010 MT system

Authors: Wade Shen, Timothy Anderson, Raymond Slyh, A. Ryan Aminzadeh

This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2010 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic and Turkish to English translation tasks. We also participated in the new French to English BTEC and English to French TALK tasks. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2008 system, and experiments we ran during the IWSLT-2010 evaluation. Specifically, we focus on 1) cross-domain translation using MAP adaptation, 2) Turkish morphological processing and translation, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation.

Subject: IWSLT.2010 - Evaluation Campaign

#20 The MSRA machine translation system for IWSLT 2010

Authors: Chi-Ho Li, Nan Duan, Yinggong Zhao, Shujie Liu, Lei Cui, Mei-yuh Hwang, Amittai Axelrod, Jianfeng Gao, Yaodong Zhang, Li Deng

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#21 The NICT translation system for IWSLT 2010

Authors: Chooi-Ling Goh, Taro Watanabe, Michael Paul, Andrew Finch, Eiichiro Sumita

This paper describes NICT’s participation in the IWSLT 2010 evaluation campaign for the DIALOG translation (Chinese-English) and the BTEC (French-English) translation shared tasks. For the DIALOG translation, the main challenge is applying context information during translation. Context information can be used to decide on word choice and also to replace missing information during translation. We applied discriminative reranking using contextual information as additional features. In order to provide more choices for re-ranking, we generated n-best lists from multiple phrase-based statistical machine translation systems that varied in the type of Chinese word segmentation schemes used. We also built a model that merged the phrase tables generated by the different segmentation schemes. Furthermore, we used a lattice-based system combination model to combine the output from different systems. A combination of all of these systems was used to produce the n-best lists for re-ranking. For the BTEC task, a general approach that used lattice-based system combination of two systems, a standard phrase-based system and a hierarchical phrase-based system, was taken. We also tried to process some unknown words by replacing them with the same words but different inflections that are known to the system.

Subject: IWSLT.2010 - Evaluation Campaign

#22 NTT statistical MT system for IWSLT 2010

Authors: Katsuhito Sudoh, Kevin Duh, Hajime Tsukada

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#23 The POSTECH’s statistical machine translation system for the IWSLT 2010

Authors: Hwidong Na, Jong-Hyeok Lee

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#24 The QMUL system description for IWSLT 2010

Authors: Sirvan Yahyaei, Christof Monz

No summary was provided.

Subject: IWSLT.2010 - Evaluation Campaign

#25 The RWTH Aachen machine translation system for IWSLT 2010

Authors: Saab Mansour, Stephan Peitz, David Vilar, Joern Wuebker, Hermann Ney

In this paper we describe the statistical machine translation system of the RWTH Aachen University developed for the translation task of the IWSLT 2010. This year, we participated in the BTEC translation task for the Arabic to English language direction. We experimented with two state-of-the-art decoders: phrase-based and hierarchical-based decoders. Extensions to the decoders included phrase training (as opposed to heuristic phrase extraction) for the phrase-based decoder, and soft syntactic features for the hierarchical decoder. Additionally, we experimented with various rule-based and statistical-based segmenters for Arabic. Due to the different decoders and the different methodologies that we apply for segmentation, we expect that there will be complementary variation in the results achieved by each system. The next step would be to exploit these variations and achieve better results by combining the systems. We try different strategies for system combination and report significant improvements over the best single system.

Subject: IWSLT.2010 - Evaluation Campaign