IWSLT.2010 - Papers

Total: 18

#1 An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting

Authors: Marianna Apidianaki ; Yifan He

No summary was provided.

#2 Mining parallel fragments from comparable texts

Authors: Mauro Cettolo ; Marcello Federico ; Nicola Bertoldi

This paper proposes a novel method for exploiting comparable documents to generate parallel data for machine translation. First, each source document is paired to each sentence of the corresponding target document; second, partial phrase alignments are computed within the paired texts; finally, fragment pairs across linked phrase-pairs are extracted. The algorithm has been tested on two recent challenging news translation tasks. Results show that mining for parallel fragments is more effective than mining for parallel sentences, and that comparable in-domain texts can be more valuable than parallel out-of-domain texts.

#3 Improved Vietnamese-French parallel corpus mining using English language

Authors: Thi Ngoc Diep Do ; Laurent Besacier ; Eric Castelli

No summary was provided.

#4 Analysis of translation model adaptation in statistical machine translation

Authors: Kevin Duh ; Katsuhito Sudoh ; Hajime Tsukada

No summary was provided.

#5 The pay-offs of preprocessing for German-English statistical machine translation

Authors: Ilknur Durgar El-Kahlout ; Francois Yvon

In this paper, we present the results of our work on improving preprocessing for German-English statistical machine translation. We implemented and tested various improvements aimed at i) converting German texts to the new orthographic conventions; ii) performing a new tokenization for German; iii) normalizing lexical redundancy with the help of POS tagging and morphological analysis; iv) splitting German compound words with a frequency-based algorithm; and v) reducing singletons and out-of-vocabulary (OOV) words. All these steps are performed during preprocessing on the German side. Combining all these processes, we reduced singletons by 10% and OOV words by 2%, and obtained a 1.5 absolute (7% relative) BLEU improvement on the WMT 2010 German-to-English news translation task.
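
The abstract does not say which frequency-based splitting algorithm was used. As an illustration only, the sketch below follows the widely known geometric-mean heuristic of Koehn and Knight (2003): a compound is split into two parts when the geometric mean of the parts' corpus frequencies exceeds the frequency of the unsplit word. The function name `split_compound`, the `min_len` parameter, the handling of a linking "s", and the toy frequency table are all assumptions made for this example, not details from the paper.

```python
from math import prod


def split_compound(word, freq, min_len=3):
    """Return the two-part split of `word` with the highest geometric-mean
    frequency, or [word] if no split beats the unsplit word.

    `freq` maps lowercased word forms to corpus counts (hypothetical data).
    """
    word = word.lower()
    best_parts, best_score = [word], float(freq.get(word, 0))
    # Try every split point that leaves at least `min_len` characters per side.
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        candidates = [(left, right)]
        # Also try dropping a linking "s" at the end of the left part.
        if left.endswith("s") and len(left) > min_len:
            candidates.append((left[:-1], right))
        for parts in candidates:
            counts = [freq.get(p, 0) for p in parts]
            score = prod(counts) ** (1 / len(counts))
            if score > best_score:
                best_parts, best_score = list(parts), score
    return best_parts


# Toy counts, invented for the example.
toy_freq = {"aktionsplan": 2, "aktion": 100, "plan": 200, "aktions": 1}
print(split_compound("Aktionsplan", toy_freq))
```

A real splitter would usually allow more than two parts and a richer set of linking elements; this sketch keeps to two-part splits so that the scoring rule stays visible.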

#6 A Bayesian model of bilingual segmentation for transliteration

Authors: Andrew Finch ; Eiichiro Sumita

No summary was provided.

#7 Faster cube pruning

Authors: Andrea Gesmundo ; James Henderson

No summary was provided.

#8 Factor templates for factored machine translation models

Authors: Yvette Graham ; Josef van Genabith

No summary was provided.

#9 Modelling pronominal anaphora in statistical machine translation

Authors: Christian Hardmeier ; Marcello Federico

Current Statistical Machine Translation (SMT) systems translate texts sentence by sentence without considering any cross-sentential context. Assuming independence between sentences makes it difficult to take certain translation decisions when the necessary information cannot be determined locally. We argue for the necessity to include cross-sentence dependencies in SMT. As a case in point, we study the problem of pronominal anaphora translation by manually evaluating German-English SMT output. We then present a word dependency model for SMT, which can represent links between word pairs in the same or in different sentences. We use this model to integrate the output of a coreference resolution system into English-German SMT with a view to improving the translation of anaphoric pronouns.

#10 A combination of hierarchical systems with forced alignments from phrase-based systems

Authors: Carmen Heger ; Joern Wuebker ; David Vilar ; Hermann Ney

Currently, most state-of-the-art statistical machine translation systems present a mismatch between training and generation conditions. Word alignments are computed using the well-known IBM models for single-word based translation. Afterwards, phrases are extracted using extraction heuristics that are unrelated to the stochastic models applied for finding the word alignment. In recent years, several research groups have tried to overcome this mismatch, but only with limited success. Recently, the technique of forced alignments has been shown to improve translation quality for a phrase-based system, applying a more statistically sound approach to phrase extraction. In this work we investigate the first steps to combine forced alignment with a hierarchical model. Experimental results on IWSLT and WMT data show improvements in translation quality of up to 0.7% BLEU and 1.0% TER.

#11 Multi-pivot translation by system combination

Authors: Gregor Leusch ; Aurélien Max ; Josep Maria Crego ; Hermann Ney

This paper describes a technique to exploit multiple pivot languages when using machine translation (MT) on language pairs with scarce bilingual resources, or where no translation system for a language pair is available. The principal idea is to generate intermediate translations in several pivot languages, translate them separately into the target language, and generate a consensus translation out of these using MT system combination techniques. Our technique can also be applied when a translation system for a language pair is available, but is limited in its translation accuracy because of scarce resources. Using statistical MT systems for the 11 different languages of Europarl, we show experimentally that a direct translation system can be replaced by this pivot approach without a loss in translation quality if about six pivot languages are available. Furthermore, we can already improve an existing MT system by adding two pivot systems to it. The maximum improvement was found to be 1.4% abs. in BLEU in our experiments for 8 or more pivot languages.
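
The pipeline described in this abstract (translate into several pivot languages, translate each pivot output into the target language, then form a consensus) can be sketched roughly as follows. This is a simplified stand-in, not the paper's method: real MT system combination typically builds and rescores a confusion network, whereas this sketch only selects the single hypothesis that is most similar on average to the others. The names `consensus` and `multi_pivot_translate`, and the idea of passing each pivot system as a pair of callables, are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def consensus(hypotheses):
    """Return the hypothesis with the highest average token-level similarity
    to all other hypotheses (a selection-based stand-in for system combination).
    """
    if len(hypotheses) == 1:
        return hypotheses[0]
    tokenized = [h.split() for h in hypotheses]

    def avg_similarity(i):
        others = [t for j, t in enumerate(tokenized) if j != i]
        total = sum(SequenceMatcher(None, tokenized[i], o).ratio() for o in others)
        return total / len(others)

    best = max(range(len(hypotheses)), key=avg_similarity)
    return hypotheses[best]


def multi_pivot_translate(source, pivot_systems):
    """Translate `source` through every (to_pivot, to_target) pair of
    hypothetical translation functions and return the consensus hypothesis.
    """
    hypotheses = [to_target(to_pivot(source)) for to_pivot, to_target in pivot_systems]
    return consensus(hypotheses)


# Toy "translation systems": two agree, one produces an outlier.
toy_systems = [
    (str.upper, str.lower),
    (str.upper, str.lower),
    (lambda s: s, lambda s: "something else entirely"),
]
print(multi_pivot_translate("the cat sits", toy_systems))
```

Selecting a whole hypothesis cannot produce a translation better than the best single pivot path, which is one reason word-level combination is used in practice.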

#12 Real-time spoken language identification and recognition for speech-to-speech translation

Authors: Daniel Chung Yong Lim ; Ian Lane ; Alex Waibel

No summary was provided.

#13 Towards a general and extensible phrase-extraction algorithm

Authors: Wang Ling ; Tiago Luís ; João Graça ; Luísa Coheur ; Isabel Trancoso

Phrase-based systems depend heavily on the quality of their phrase tables, and the process of phrase extraction is therefore a fundamental step. In this paper we present a general and extensible phrase extraction algorithm in which we have highlighted several control points. Instantiating these control points allows previous approaches to be simulated, since different strategies and heuristics can be tested at each point. We show how previous approaches fit into this algorithm, compare several of them and, in addition, propose alternative heuristics, showing their impact on the final translation results. Considering two different test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we have obtained improvements of 2.4 and 2.8 BLEU points, respectively.

#14 MorphTagger: HMM-based Arabic segmentation for statistical machine translation

Author: Saab Mansour

In this paper, we investigate different methodologies of Arabic segmentation for statistical machine translation by comparing a rule-based segmenter to different statistically-based segmenters. We also present a new method for segmentation that serves the need for a real-time translation system without impairing the translation accuracy.

#15 Comparing intrinsic and extrinsic evaluation of MT output in a dialogue system

Authors: Anne H. Schneider ; Ielka van der Sluis ; Saturnino Luz

No summary was provided.

#16 Sign language machine translation overkill

Authors: Daniel Stein ; Christoph Schmidt ; Hermann Ney

Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarcity of suitable data, and most papers in this area apply only a few well-known techniques and do not adapt them to small-sized corpora. In this paper, we propose new methods for common approaches like scaling factor optimization and alignment merging strategies, which helped improve our baseline. We also conduct experiments with different decoders and employ state-of-the-art techniques like soft syntactic labels as well as trigger-based and discriminative word lexica and system combination. All methods are evaluated on one of the largest sign language corpora available.

#17 If I only had a parser: poor man’s syntax for hierarchical machine translation

Authors: David Vilar ; Daniel Stein ; Stephan Peitz ; Hermann Ney

No summary was provided.

#18 Dynamic distortion in a discriminative reordering model for statistical machine translation

Authors: Sirvan Yahyaei ; Christoph Monz

No summary was provided.