CoCo4MT Shared Task 2023: First Call for Participation
We are excited to introduce a new shared task for this year’s CoCo4MT
workshop! Our aim is to encourage and facilitate research on corpus
construction for low-resource machine translation.
Corpus creation for machine translation is typically constrained by the
cost and availability of human translators. When a new dataset must be
created for a low-resource language or a specialized domain, the annotation
budget has to be used efficiently: every sentence chosen for translation
should be of high quality and as useful as possible for training a machine
translation system.
In this shared task, we ask participants to develop methods for identifying
such sentences for a target language that has no existing data.
Specifically, given a parallel corpus between high-resource languages, the
goal is to select a subset of the high-resource corpus to be translated
into the low-resource language so that the result is an effective training
set for a machine translation system. The winner will be the team whose
selected sentences yield the best-performing system after training.
Important Dates
- May 19, 2023: Release of train, dev, and test data
- May 30, 2023: Release of baselines
- July 12, 2023: Deadline to submit results
- July 20, 2023: System description papers due
Organizers (listed alphabetically)
- Ananya Ganesh, University of Colorado Boulder
- Constantine Lignos, Brandeis University
- John E. Ortega, Northeastern University
- Jonne Sälevä, Brandeis University
- Katharina Kann, University of Colorado Boulder
- Marine Carpuat, University of Maryland
- Rodolfo Zevallos, Universitat Pompeu Fabra
- Shabnam Tafreshi, University of Maryland
- William Chen, Carnegie Mellon University