BUCC 2016



When: May 23, 2016
Where: Portorož, Slovenia
Submission Deadline: Feb 10, 2016
Notification Due: Mar 10, 2016
Final Version Due: Mar 25, 2016
Categories: computational linguistics, corpus linguistics, corpora, comparable corpora

Call For Papers


Special Topic: Continuous Vector Space Models and Comparable Corpora

Shared Task: Identifying Parallel Sentences in Comparable Corpora

Monday, May 23, 2016

Co-located with LREC 2016, Portorož, Slovenia

DEADLINE FOR PAPERS: February 10, 2016



In the language engineering and linguistics communities, research on
comparable corpora has two main motivations. In language engineering,
it is chiefly motivated by the need to use comparable corpora as
training data for statistical Natural Language Processing applications
such as statistical machine translation or cross-lingual retrieval. In
linguistics, comparable corpora are of interest in themselves because
they make inter-linguistic discoveries and comparisons possible. It is
generally accepted in both communities that comparable corpora are
documents in one or several languages that are comparable in content
and form to various degrees and along various dimensions. We believe
that the linguistic definitions and observations related to comparable
corpora can improve methods to mine such corpora for applications of
statistical NLP. It is therefore of great interest to bring together
builders and users of such corpora.


There will be a shared task on "Identifying Parallel Sentences in
Comparable Corpora"; its details will be described on the workshop
website.


Beyond this year's special topic "Continuous Vector Space Models and
Comparable Corpora" and the shared task on "Identifying Parallel
Sentences in Comparable Corpora", we solicit contributions including
but not limited to the following topics:

Building comparable corpora:

* Human translations
* Automatic and semi-automatic methods
* Methods to mine parallel and non-parallel corpora from the Web
* Tools and criteria to evaluate the comparability of corpora
* Parallel vs non-parallel corpora, monolingual corpora
* Rare and minority languages, across language families
* Multi-media/multi-modal comparable corpora

Applications of comparable corpora:

* Human translations
* Language learning
* Cross-language information retrieval & document categorization
* Bilingual projections
* Machine translation
* Writing assistance

Mining from comparable corpora:

* Cross-language distributional semantics
* Extraction of parallel segments or paraphrases from comparable corpora
* Extraction of translations of single words and multi-word expressions,
proper names, named entities, etc.


February 10, 2016 Deadline for submission of full papers
March 10, 2016 Notification of acceptance
March 25, 2016 Camera-ready papers due
May 23, 2016 Workshop date


Papers should follow the LREC main conference formatting details (to be
announced on the conference website) and should be submitted as a PDF
file via the START workshop manager.

Contributions can be short or long papers. Short paper submissions must
describe original and unpublished work without exceeding six (6)
pages. Characteristics of short papers include: a small, focused
contribution; work in progress; a negative result; an opinion piece;
an interesting application nugget. Long paper submissions must
describe substantial, original, completed and unpublished work without
exceeding ten (10) pages.

Reviewing will be double blind, so the papers should not reveal the
authors' identity. Accepted papers will be published in the workshop
proceedings.

Double submission policy: Parallel submission to other meetings or
publications is possible, but the workshop organizers must be notified
immediately.

Please also observe the following two paragraphs which are applicable
to all LREC workshops as well as to the main conference:

Describing your language resources (LRs) in the LRE Map is now normal
practice in the submission procedure of LREC (introduced in 2010 and
adopted by other conferences). To continue the efforts initiated at
LREC 2014 about
“Sharing LRs” (data, tools, web-services, etc.), authors will have
the possibility, when submitting a paper, to upload LRs in a special
LREC repository. This effort of sharing LRs, linked to the LRE Map
for their description, may become a new “regular” feature for conferences
in our field, thus contributing to creating a common repository where
everyone can deposit and share data.

As scientific work requires accurate citations of referenced work so
as to allow the community to understand the whole context and also
replicate the experiments conducted by other researchers, LREC 2016
endorses the need to uniquely identify LRs through the use of the
International Standard Language Resource Number (ISLRN), a Persistent
Unique Identifier to be assigned to each Language Resource.
The assignment of ISLRNs to LRs cited in LREC papers will be offered at
submission time.


Reinhard Rapp, University of Mainz (Germany)
Pierre Zweigenbaum, LIMSI, CNRS, Orsay (France)
Serge Sharoff, University of Leeds (UK)


Reinhard Rapp: reinhardrapp (at) gmx (dot) de

