Corpus Profiling 2008 : Corpus Profiling for Information Retrieval and Natural Language Processing Workshop


When Oct 18, 2008 - Oct 18, 2008
Where London, UK
Submission Deadline Aug 15, 2008
Notification Due Sep 12, 2008
Final Version Due Sep 26, 2008
Categories    IR   NLP   information retrieval

Call For Papers


We aim to bring together people from different research communities
interested in exploring how corpus characteristics affect the behaviour
of techniques in information retrieval and natural language processing,
and to set out a roadmap for a shared research agenda.

It is well known in NLP and IR that the effectiveness of a technique
depends on both the data on which it is deployed and its match with the
task at hand. In 1973, Spärck Jones attributed differing degrees of
success at automatic classification to differences in dataset
characteristics. Since Croft and Harper (1979), IR performance has
repeatedly been related to collection size and other features, though no
upper bound has been found.

The importance of data and task dependencies has been highlighted in IR,
anaphora resolution, automatic summarization and, more recently, word sense
disambiguation. Many web and enterprise retrieval systems rely on URL
properties, link graph properties, click streams, and so on, with
performance dependent on the degree to which this evidence is present
and meaningful in a particular corpus.

IR/NLP research has lacked a systematic exploration of the features that
can effectively characterise corpora. This creates problems for the
replicability of experimental results and for the development of
applications.

The time is right to investigate this dependence systematically, tracking
the effect of a dataset's profile on technique performance. Over the past
15 years, the approaches of several subject areas have converged with IR,
as large corpora and test collections have assumed central importance in
research methodologies. These areas have highlighted issues surrounding
the role of data.


The workshop will be a one-day event, held in conjunction with Information
Interaction in Context (IIiX'2008). The workshop will have three components:

(1) invited talks in the morning, introducing the background from
different perspectives

(2) two afternoon sessions, presenting peer-reviewed papers

(3) a panel discussion (panel composed of presenters and the organizers).


We welcome original research or position papers. We particularly
encourage postgraduate students and postdoctoral researchers to submit
papers. Topics of interest include, but are NOT LIMITED to, the
following areas:

* Suitable features to characterise text/language variety,
capturing known effects on technique performance with respect to a task;

* Tasks that depend on aspects of corpus profiles (e.g., the
positive correlation of QA performance with fact frequency in a corpus);

* Limitations of context-independent frequency-based measures, and
exploration of measures that highlight complex dependencies;

* Tools/techniques for characterising a feature or the extent to
which it is manifested in a corpus;

* Evaluation methodologies for testing feature candidates relative
to task/technique;

* Learnability of features (cf. meta-level learning for
classification algorithms).


15 August 2008: Paper submission due

12 September 2008: Notification of acceptance/rejection

26 September 2008: Camera-ready due

18 October 2008: Workshop


Original technical papers, short papers and position papers are all
welcome. Please ensure that your submission does not exceed 5,000 words
in length. Use a 10-point font in a double-column layout for body text,
and 12-point bold for headings. Please send your submission in PDF to all
three organizers with the subject "Corpus Profiling workshop submission".

We will publish the accepted papers electronically through BCS's
Electronic Workshops in Computing (eWiC), together with extended
abstracts of the invited talks and a summary of the panel discussion. We
will seek to pursue the research thread through further workshops at
relevant conferences. We plan to organize a post-workshop special issue
in a suitable IR- or NLP-related journal.


Anne De Roeck (The Open University)
Udo Kruschwitz (University of Essex)
Ruslan Mitkov (University of Wolverhampton)
Nikolaos Nanas (CERETETH, Greece)
Michael Oakes (University of Sunderland)
Ian Ruthven (University of Strathclyde)
Dawei Song (KMi, The Open University)
Tomek Strzalkowski (SUNY Albany)
Alistair Willis (The Open University)

For further information please visit
