CON-TXT-MT 2011 : META-NET Challenge: Context in Machine Translation
Call For Papers
Machine translation (MT) can be considered one of the most challenging tasks computer science has ever undertaken. Statistical methods have been increasingly successful in efficiently providing MT solutions for many language pairs. However, there is considerable room for improvement in translation quality. Prototypical sentences are translated well, but in certain situations the end result is far from what is expected. One central reason for these failures is that current systems take context into account only in a limited manner.
In natural language processing, the context of use has a considerable impact on the understanding process. It can refer to multiple kinds of meta-data, including information on the document type, domain, genre and medium used. Automatic machine translation systems typically restrict the considered context to one sentence or smaller parts of it.
In order to encourage research in this area, the META-NET Network of Excellence (http://www.meta-net.eu) is launching a series of challenges. The first challenge, Context in Machine Translation, is organized as an event associated with the ICANN 2011 conference, June 14th to 17th, 2011, in Finland (http://www.ics.tkk.fi/icann11/).
The problem will be formulated as a machine learning (ML) task that does not require ML researchers to invest heavily in MT. In particular, participants do not need to train any machine translation system, and the data will include a set of reasonable context features to reduce the amount of language processing needed.
OBJECTIVES AND TASKS
The challenge aims at advancing machine translation research by providing a concrete application area for supervised or unsupervised algorithms whose objective is to learn to assess the quality of the translations in the given context of use.
The concrete task is to choose the best translation from a set of likely translations produced by multiple machine translation systems, with the help of additional information about the context in which the translation occurs (domain, surrounding text, etc.). This is typically described as re-ranking and should lead to improved translation performance scores compared to the translations originally selected by the respective MT systems.
N-best re-ranking can be seen as a subproblem of structured prediction. Given data from methodologically different MT systems, selecting the best translation poses a multi-task learning problem.
The data includes parts of the JRC Acquis corpus (http://wt.jrc.it/lt/Acquis/) and additional language models and context features derived from the corpus. The participants are not allowed to use extra data in order to ensure the comparability of the proposed solutions.
The evaluation is split into eight subtasks, depending on which context features are used. For each subtask, the following two result sets are to be submitted:
* a) given the outputs of one system, re-rank them (best translation of each MT system individually)
* b) given the sets of outputs of all systems, re-rank them (best translation across all MT systems, which can use methods for combining models)
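The two result sets can be illustrated with a minimal sketch. It assumes a simple linear scoring model over context features; the `Candidate` type, the feature names (`lm`, `domain`), the weights and the example sentences are all hypothetical and are not part of the challenge data or its official format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    system: str                 # hypothetical MT system identifier
    text: str                   # candidate translation
    features: Dict[str, float]  # hypothetical context feature values


def score(cand: Candidate, weights: Dict[str, float]) -> float:
    """Linear score: weighted sum of the candidate's feature values."""
    return sum(weights.get(name, 0.0) * value
               for name, value in cand.features.items())


def rerank_per_system(cands: List[Candidate],
                      weights: Dict[str, float]) -> Dict[str, Candidate]:
    """Result set a): best candidate of each MT system individually."""
    best: Dict[str, Candidate] = {}
    for cand in cands:
        current = best.get(cand.system)
        if current is None or score(cand, weights) > score(current, weights):
            best[cand.system] = cand
    return best


def rerank_across_systems(cands: List[Candidate],
                          weights: Dict[str, float]) -> Candidate:
    """Result set b): best candidate across all MT systems."""
    return max(cands, key=lambda cand: score(cand, weights))


# Illustrative n-best lists for one source sentence (made-up values).
candidates = [
    Candidate("sysA", "the council adopted the regulation",
              {"lm": -2.0, "domain": 1.0}),
    Candidate("sysA", "the advice adopted the regulation",
              {"lm": -3.5, "domain": 0.2}),
    Candidate("sysB", "the council has adopted the regulation",
              {"lm": -1.5, "domain": 1.0}),
    Candidate("sysB", "council adopted regulation",
              {"lm": -4.0, "domain": 0.8}),
]
weights = {"lm": 1.0, "domain": 2.0}  # assumed; would normally be learned

per_system = rerank_per_system(candidates, weights)
overall = rerank_across_systems(candidates, weights)
```

In an actual submission, the weights (or a non-linear scoring function) would be learned from the provided data rather than fixed by hand.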
The performance of the submitted algorithms will be evaluated with BLEU, the standard automatic machine translation evaluation metric.
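For orientation, the following is a simplified sketch of corpus-level BLEU: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes a single reference per sentence, whitespace tokenization and no smoothing; the scoring tool actually used in the challenge may differ in these details.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]],
                references: List[List[str]],
                max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(count, ref_counts[gram])
                                  for gram, count in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t)
                        for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return penalty * math.exp(log_precision)


hyp = "the council adopted the regulation".split()
exact = corpus_bleu([hyp], [hyp])
short = corpus_bleu([hyp], ["the council adopted the regulation today".split()])
```

Here `exact` is a perfect match, while `short` has perfect n-gram precision but is reduced by the brevity penalty because the hypothesis is one token shorter than its reference.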
The data will be made available at http://www.cis.hut.fi/icann11/con-txt-mt11/
Jun 14, 2011: Context in Machine Translation workshop at ICANN 2011. See http://www.cis.hut.fi/icann11/con-txt-mt11/#workshop
The challenge timetable will be announced at the workshop.
The Context in Machine Translation Challenge is part of a series of challenges organized by the META-NET Network of Excellence (http://www.meta-net.eu), jointly by Aalto University (Finland), CNRS/LIMSI (France) and ILSP (Greece), supported by other network partners.
(Last Update: Jun 7th, 2011.)