WMT 2008 : Third Workshop on Statistical Machine Translation

When	Jun 19, 2008 - Jun 19, 2008
Where	Columbus, Ohio, USA
Submission Deadline	Mar 14, 2008
Notification Due	Apr 12, 2008

Categories NLP machine translation

Call For Papers

This workshop focuses on statistical and hybrid methods for machine translation. It builds on the 2005 ACL Workshop on Parallel Text, the 2006 NAACL Workshop on Statistical Machine Translation, and the 2007 ACL Second Workshop on Statistical Machine Translation. The workshop will feature papers on topics related to MT, along with two shared tasks: a shared translation task covering 12 language directions (six pairs of European languages, each translated both ways), and a shared evaluation task to test automatic evaluation metrics.

Topics of interest include, but are not limited to:

* word-based, phrase-based, syntax-based SMT
* using comparable corpora for SMT
* incorporating linguistic information into SMT
* decoding
* system combination
* error analysis
* manual and automatic methods for evaluating MT
* scaling MT to very large data sets

We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared translation task. In addition to scientific papers, the workshop will feature two shared tasks.

SHARED TRANSLATION TASK

The first is a shared translation task which will examine translation between the following language pairs:

* English-German and German-English
* English-French and French-English
* English-Spanish and Spanish-English
* German-Spanish and Spanish-German
* English-Czech and Czech-English
* English-Hungarian and Hungarian-English

Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and out-of-domain corpora.

All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation we will require participants in the shared task to manually judge some of the submitted translations.

A more detailed description of the shared translation task (including information about the test and training corpora, a freely available MT system, and a number of other resources) is available from http://www.statmt.org/wmt08/shared-task.html. We also provide a baseline machine translation system, whose performance matches the best systems from last year's shared task.

SHARED EVALUATION TASK

The second task is a shared evaluation task. Participants in this task will submit automatic evaluation metrics for machine translation, which will be assessed on their ability to:

* Rank systems on their overall performance on the test set
* Rank systems at the sentence-by-sentence level

Participants in the shared translation task will submit translation results for a set of a few thousand sentences. Their system outputs will be distributed to participants in the shared evaluation task along with the reference translations. The translations will be ranked with automatic evaluation metrics. We will measure the correlation of automatic evaluation metrics with the human judgments.
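As a rough illustration of how a metric's system ranking could be compared with a human ranking, the sketch below computes Spearman's rank correlation. This call does not say which correlation statistic will be used, so the choice of Spearman's rho is an assumption, and the function names and scores are hypothetical.

```python
def ranks(scores):
    """Return 1-based ranks; the highest score gets rank 1.

    Assumption: no tied scores (ties are not handled in this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(metric_scores, human_scores):
    """Spearman's rank correlation between two score lists of equal length."""
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    metric_ranks = ranks(metric_scores)
    human_ranks = ranks(human_scores)
    # Sum of squared rank differences, one term per system.
    d_squared = sum((m - h) ** 2 for m, h in zip(metric_ranks, human_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for four systems (not real workshop data).
metric = [0.31, 0.27, 0.35, 0.22]   # automatic metric score per system
human = [0.60, 0.55, 0.58, 0.40]    # human judgment score per system
print(spearman_rho(metric, human))
```

A value near 1 would mean the metric orders the systems much as the human judges do; a value near -1 would mean it reverses their order.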

More details of the shared evaluation task (including submission formats and the collected manual evaluations from last year's workshop) are available from http://www.statmt.org/wmt08/shared-evaluation-task.html.

PAPER SUBMISSION INFORMATION
Submissions will consist of regular full papers of max. 8 pages, formatted following the ACL 2008 guidelines. In addition, shared task participants will be invited to submit short papers (max. 4 pages) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically.

We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop and past workshops, so that their experiments can be repeated by others using these publicly available corpora.

IMPORTANT DATES
Regular paper submissions: March 14, 2008
Results submissions (shared translation task): March 21, 2008
Results submissions (shared evaluation task): April 4, 2008
Short paper submissions (both shared tasks): April 4, 2008
Notification: April 12, 2008
Camera-ready papers: April 21, 2008

ORGANIZERS
Chris Callison-Burch (Johns Hopkins University)
Philipp Koehn (University of Edinburgh)
Christof Monz (University of London)
Josh Schroeder (University of Edinburgh)
Cameron Shaw Fordyce

PROGRAM COMMITTEE
Lars Ahrenberg (Linköping University)
Yaser Al-Onaizan (IBM Research)
Oliver Bender (RWTH Aachen)
Chris Brockett (Microsoft Research)
Bill Byrne (Cambridge University)
Francisco Casacuberta (University of Valencia)
Colin Cherry (Microsoft Research)
Stephen Clark (Oxford University)
Trevor Cohn (Edinburgh University)
Mona Diab (Columbia University)
Hal Daume (University of Utah)
Chris Dyer (University of Maryland)
Andreas Eisele (University Saarbrücken)
Marcello Federico (ITC-IRST)
George Foster (National Research Council Canada)
Alex Fraser (University of Stuttgart)
Ulrich Germann (University of Toronto)
Nizar Habash (Columbia University)
Jan Hajic (Charles University)
Keith Hall (Google)
John Henderson (MITRE)
Rebecca Hwa (University of Pittsburgh)
Doug Jones (Lincoln Labs MIT)
Damianos Karakos (Johns Hopkins University)
Kevin Knight (ISI/University of Southern California)
Shankar Kumar (Google)
Philippe Langlais (University of Montreal)
Alon Lavie (Carnegie Mellon University)
Adam Lopez (Edinburgh University)
Daniel Marcu (ISI/University of Southern California)
Lambert Mathias (Johns Hopkins University)
Arul Menezes (Microsoft Research)
Bob Moore (Microsoft Research)
Miles Osborne (University of Edinburgh)
Kay Peterson (NIST)
Mark Przybocki (NIST)
Chris Quirk (Microsoft Research)
Philip Resnik (University of Maryland)
Michel Simard (National Research Council Canada)
Libin Shen (BBN Technologies)
Wade Shen (Lincoln Labs MIT)
Eiichiro Sumita (NICT/ATR)
David Talbot (Edinburgh University)
Jörg Tiedemann (University of Groningen)
Christoph Tillmann (IBM Research)
Kristina Toutanova (Microsoft Research)
Nicola Ueffing (National Research Council Canada)
Clare Voss (Army Research Labs)
Taro Watanabe (NTT)
Dekai Wu (HKUST)
Richard Zens (Google)

CONTACT
For questions, comments, etc. please send email to pkoehn@inf.ed.ac.uk.