BUCC 2019 : Building and Using Comparable Corpora
Call For Papers
Comparable corpora with various degrees of comparability (from noisy parallel corpora to random web snapshots) have been used in a range of applications, including information retrieval, machine translation, and cross-lingual text classification. We believe that the linguistic definitions and observations related to comparable corpora can improve methods for mining such corpora for statistical NLP applications, for example extracting parallel corpora from comparable corpora for neural MT, as in the BUCC shared tasks of past years.
The special topic for this year is Neural Networks for Building and Using Comparable Corpora. The workshop is co-located with RANLP'19.
We solicit contributions on the following topics:
Building Comparable Corpora
• Automatic and semi-automatic methods
• Methods to mine parallel and non-parallel corpora from the Web
• Tools and criteria to evaluate the comparability of corpora
• Parallel vs non-parallel corpora, monolingual corpora
• Rare and minority languages, across language families
• Multi-media/multi-modal comparable corpora
Applications of Comparable Corpora
• Human translations
• Language learning
• Cross-language information retrieval & document categorization
• Bilingual projections
• Machine translation
• Writing assistance
Mining from Comparable Corpora
• Cross-language distributional semantics
• Extraction of parallel segments or paraphrases from comparable corpora
• Methods to extract parallel data from non-parallel corpora (e.g. to support low-resource languages in neural machine translation)
• Extraction of bilingual and multilingual translations of single words and multi-word expressions, including proper names and named entities