
CoCo4MT 2022 : 2nd CFP - The First Workshop on Corpus Generation and Corpus Augmentation for Machine Translation


When Sep 16, 2022 - Sep 16, 2022
Where Orlando, Florida and Hybrid
Submission Deadline Jul 13, 2022
Categories    NLP   artificial intelligence   computer science   machine learning

Call For Papers


The First Workshop on Corpus Generation and Corpus Augmentation for Machine Translation (CoCo4MT)
@ AMTA – 2022
The 15th biennial conference of the Association for Machine Translation in the Americas
12-16 September 2022, Orlando, Florida, USA


Julia Kreutzer Google Research
More TBA...


It is well known that machine translation systems, especially those that use deep learning, require massive amounts of data. For several languages, resources are not available in a human-created format. The types of resources that are available include monolingual and multilingual corpora, translation memories, and lexicons. Parallel resources are generally created for formal purposes, such as parliamentary collections, while monolingual resources tend to come from more informal situations. The quality and abundance of resources, including corpora, created for formal purposes are generally higher than those of resources created for informal purposes. Additionally, corpora for low-resource languages (languages with fewer digital resources available) tend to be less abundant and of lower quality.

CoCo4MT sets out to be the first workshop centered around research that focuses on corpora creation, cleansing, and augmentation techniques specifically for machine translation. We accept work that covers any spoken language (including high-resource languages), but we are specifically interested in submissions on low-resource languages, where existing resources are limited.

The goal of this workshop is to begin to close the gap in corpora available for low-resource translation systems. Promoting high-quality data for online systems that can be used by native speakers of low-resource languages is of particular interest. Therefore, it will be beneficial if the techniques presented in research papers include their impact on the quality of MT output and how they can be used in the real world.

CoCo4MT aims to encourage research on new and undiscovered techniques. We hope that submissions will provide high-quality corpora that are publicly available for download and can be used to increase machine translation performance, thus encouraging new dataset creation for multiple languages that will, in turn, make the workshop a general venue to consult for corpora needs in the future. The workshop’s success will be measured by the following key performance indicators:

- Promotes the ongoing increase in quality of machine translation systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance of low-resource language data.


We are highly interested in original research papers on the topics below; however, we welcome all novel ideas that cover research on corpora techniques.

- Difficulties with using existing corpora (e.g., political considerations or domain limitations) and their effects on final MT systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for training MT systems.


There is one type of submission in the workshop: research, review, and position papers. Each paper should be at least four (4) pages and must not exceed ten (10) pages, plus unlimited pages for references. Submissions should be formatted according to the official AMTA 2022 style templates (PDF, LaTeX, Word). Accepted papers will be published online in the AMTA 2022 proceedings, including in the ACL Anthology, and will be presented at the conference either orally or as a poster.

Submissions must be anonymized and should be made using the official conference management system. Scientific papers that have been or will be submitted to other venues must be declared as such, and must be withdrawn from the other venues if accepted and published at CoCo4MT. The review will be double-blind.

We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided.

Registration will be handled by the main conference. (To be announced)


June 1, 2022 – Call for papers released
June 15, 2022 – Second call for papers
June 29, 2022 – Third and final call for papers
July 13, 2022 – Paper submissions due
July 27, 2022 – Notification of acceptance
August 7, 2022 – Camera-ready due
August 31, 2022 – Video recordings due
September 16, 2022 – CoCo4MT workshop


CoCo4MT Workshop Organizers

ORGANIZING COMMITTEE (listed alphabetically)

Constantine Lignos Brandeis University
John E. Ortega New York University and University of Santiago de Compostela (CITIUS)
Katharina Kann University of Colorado Boulder
Maja Popović ADAPT Centre at Dublin City University
Marine Carpuat University of Maryland
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University

PROGRAM COMMITTEE (listed alphabetically, tentative)

Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Amirhossein Tebbifakhr University of Trento
Anna Currey Amazon
Arturo Oncevay University of Edinburgh
Atul Kr. Ojha National University of Ireland Galway
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Briakou Eleftheria University of Maryland
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleni Metheniti Université Toulouse - Paul Sabatier
Francis Tyers Indiana University
Jasper Kyle Catapang University of Birmingham
John E. Ortega New York University and USC - CITIUS
José Ramom Pichel Campos Universidade de Santiago de Compostela - CITIUS
Kalika Bali Microsoft
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Niu Xing Amazon
Pablo Gamallo Universidade de Santiago de Compostela - CITIUS
Rico Sennrich University of Zurich
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
Xu Weijia University of Maryland
