LREC Wkshp Comparable Corpora 2008 : LREC 2008 workshop Building and Using Comparable Corpora


When: May 31, 2008
Where: Marrakech, Morocco
Submission Deadline: Feb 18, 2008
Notification Due: Mar 14, 2008
Categories: computational linguistics, NLP, information retrieval, linguistics

Call For Papers

Building and Using Comparable Corpora

LREC 2008 post-conference workshop
Marrakech, Morocco
31 May 2008

*Deadline extended to 18 February 2008*


Research on comparable corpora is motivated by the scarcity of
parallel corpora. Parallel corpora are a key resource for mining
translations for statistical machine translation and for building
or extending bilingual lexicons and terminologies. However, beyond
a few language pairs such as English-French or English-Chinese and
a few contexts such as parliamentary debates or legal texts, they
remain a scarce resource, despite the creation of automated
methods to collect parallel corpora from the Web. A more
fundamental limitation is that translated texts, whatever the
skills of the translators, are generally influenced by the
translation process itself and by the language of the source
texts, so that they may not be fully adequate for the task at hand.

This has motivated research into the use of comparable corpora:
pairs of monolingual corpora selected according to the same set of
criteria, but in different languages or language
varieties. Comparable corpora overcome both limitations of
parallel corpora, since original, monolingual texts are much more
abundant than translated texts and are not shaped by a
translation process. However, because the texts are not
translations of each other, mining translations from comparable
corpora is much more challenging than from parallel corpora. What
constitutes a good comparable corpus, for a given task or per se,
also requires specific attention: while the definition of a
parallel corpus is fairly straightforward, building a comparable
corpus requires control over the selection of source texts in
both languages.


This workshop aims to bring together researchers interested in the
constitution and use of comparable corpora. Contributions are
solicited on topics including the following:

Applications of comparable corpora:

tools for translators;
tools for language learning;
cross-language information retrieval;
cross-language document categorization;
machine translation;
monolingual comparable corpora for writing assistance;
extraction of parallel segments in comparable corpora.

Units aligned in comparable corpora:

single words and multi-word expressions;
proper names;
alignment across different scripts.

Constitution of comparable corpora:

criteria of comparability;
degree of comparability;
methods for mining comparable corpora.


Important dates:

18 February 2008 *Extended deadline for submission*
14 March 2008 Notification
31 March 2008 Final version
31 May 2008 Workshop


Organizers:

Pierre Zweigenbaum
LIMSI, CNRS, Orsay, France
Eric Gaussier
LIG, Université Joseph Fourier, Grenoble, France
Pascale Fung
Department of Electronic & Computer Engineering,
University of Science & Technology, Hong Kong


Submission:

We invite short papers of at most 3500 words (about 4-6 pages)
describing research on one of the above topics. Papers should be
submitted as PDF documents by email to the following address:

Pierre Zweigenbaum (

Final papers should not exceed 6 pages and should adhere to the
stylesheet that will be adopted for the LREC Proceedings (to be
announced later on the Conference web site).


Program committee:

Lynne Bowker (University of Ottawa, Canada)
Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)
Eric Gaussier (Université Joseph Fourier, Grenoble, France)
Gregory Grefenstette (CEA/LIST, Fontenay-aux-Roses, France)
Pascale Fung (University of Science & Technology, Hong Kong)
Nathalie Kübler (Université Paris Diderot, France)
Tony McEnery (Lancaster University, UK)
Emmanuel Morin (Université de Nantes, France)
Dragos Stefan Munteanu (Information Sciences Institute, Marina Del Rey, USA)
Carol Peters (ISTI-CNR, Pisa, Italy)
Reinhard Rapp (Johannes Gutenberg-Universität Mainz, Germany)
Serge Sharoff (University of Leeds, UK)
Monique Slodzian (INALCO, Paris, France)
Richard Sproat (University of Illinois at Urbana-Champaign, USA)
Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)
