posted by user: leondz

ScaNLP 2013 : Workshop on Scalability in Natural Language Processing


When Sep 12, 2013 - Sep 13, 2013
Where Hissar, Bulgaria
Submission Deadline Jul 5, 2013
Notification Due Aug 2, 2013
Final Version Due Aug 16, 2013
Categories    NLP   parallel computing   cloud   big data

Call For Papers

First Call for Papers

Workshop on Scalability in Natural Language Processing

Full-day workshop in conjunction with RANLP 2013

Deadline: 3 July 2013, 23:59 Hawaii Time

This workshop, held in conjunction with RANLP 2013, aims to introduce
contemporary work, to discuss novel methods for natural language
processing at a large scale, and to explore how the resulting technology
and methods can be reused in applications both on the Web and in
the physical world.


For a processing approach to be scalable, it should be able to take on
large volumes of data, work through them at high speed, and
adapt smoothly to changes in these needs. We discuss this
in the context of NLP, with particular focus on the core tasks
of resource creation, discourse processing, and evaluation.

Now is a particularly important time to develop scalable methods
in our field. Big data is here and the benefits of effectively
getting through it remain to be harvested by the pioneers. Huge
datasets are becoming available: Google Books contains 155 billion
tokens, over which only shallow surveys have been conducted; the
new Common Crawl web corpus contains over 60 terabytes of text and
metadata. But size is not the only driver for scalable methods:
the rapid creation of text content we see every day presents masses
of data we are not yet equipped to handle. For example, Twitter
alone is responsible for 500 million microtexts every day; the
publicly-visible holds a part of the 2 million
blog documents we create every 24 hours.

Big text data is becoming plentiful, and demand for this data
is also high. The fast, un-curated nature of microtext has been
shown to be of value in stock valuation by multiple researchers.
User location and movement analysis enables powerful search and
analysis modes, such as computational journalism and powerful
personalisation. Sentiment detection informs corporations,
governance and political activities. Media monitoring requires
extracting and co-referring entities and events from thousands
of outlets in real time. And finally, the emerging field of
deep learning places but one core demand in all its guises:
large amounts of data. Together, the pressures from these applications
create a demand for NLP that can be done quickly and broadly.

There is more demand than ever for scalable natural language
processing. Many organisations are interested in the potential
results as big data becomes better defined and data-intensive
approaches to computational linguistics reach production-level
performance. Enormous quantities of data, from user input to
news archives, are being mined using more powerful and
computationally demanding techniques. The organisation, variety,
integrity and public availability of the resulting resources will
have a major impact on how we continue to do science.

Newly introduced data-intensive approaches to computational
linguistics continue to thrive on input volume; we need scalable
technology to handle the next order of magnitude in corpus
sizes and, given the nature of language, to continue
data-intensive advances in our field.


With regard to scalable NLP, we aim to encourage discussion
regarding three key areas of natural language processing:
resource creation, processing of discourse, and evaluation.
Topics of interest include:

-- General scalability issues
-- Application approaches
-- Performance limits
-- Flexible resource creation
-- Parallelising annotation
-- Handling huge corpora
-- Crowdsourcing for corpus creation
-- Decomposing resource creation tasks
-- Rapid or realtime annotation quality assessment
-- Running NLP in the cloud
-- Privacy issues
-- NLP application optimisation / parallelisation
-- Scalable machine learning for NLP
-- High performance computing for NLP
-- Rapid evaluation
-- On-line learning for NLP
-- Reinforcement learning
-- Iterative and ensemble learning
-- Hypothesis generation

In addition to the invited talk and presentations, the
workshop will include a 30-minute hands-on demonstration slot
with participants doing NLP in the cloud using GATECloud,
possibly including social media processing using GATE TwitIE
(supported and funded by the organisers).



Submission deadline: 5 July 2013
Notification of acceptance: 2 August 2013
Camera-ready copies due: 16 August 2013
Workshop date: 12/13 September 2013



Submission is via EasyChair:

All submissions must be in PDF format and must follow the RANLP
template.

Multiple submission policy: We welcome papers that are under review for
other venues, but, in the event of multiple acceptances, authors are
requested to notify us and choose which meeting to present and publish the
work at as soon as possible - we cannot accept for publication or
presentation work that will be (or has been) published elsewhere.

Reviewing: Reviewing will be blind. No information identifying the authors
should be in the paper: this includes not only the authors' names and
affiliations, but also self-references that reveal authors' identities; for
example, "We have previously shown (Smith 1999)" should be changed to "Smith
(1999) has previously shown".

Paper length and presentation: We invite long (8 pages) and short (4 pages) papers.
Accepted short papers will be presented either as short oral presentations
or as posters.



Leon Derczynski, University of Sheffield, UK
Kalina Bontcheva, University of Sheffield, UK
Bin Yang, Aarhus University, Denmark
Valentin Tablan, University of Sheffield, UK
Arno Scharl, MODUL University Vienna, Austria
Thierry Declerck, DFKI, Germany



Galia Angelova, Bulgarian Academy of Sciences, Bulgaria
Srikanta Bedathur, Indraprastha Institute of Information Technology, India
Kai-wei Chang, University of Illinois Urbana-Champaign, USA
Freddy Chong-Tat Chua, Singapore Management University, Singapore
Hamish Cunningham, University of Sheffield, UK
David Martins de Matos, L2F INESC ID, Portugal
Ted Dunning, MapR Technologies, USA
Chris Dyer, Carnegie Mellon University, USA
Rainer Gemulla, Max Planck Institut für Informatik, Germany
Amit Goyal, University of Maryland, USA
Christian S. Jensen, Aarhus University, Denmark
Vinh Ngoc Khuc, Ohio State University, USA
Oleksandr Kolomiyets, KU Leuven, Belgium
Hector Llorens, Nuance, Spain
Barry Norton, Ontotext, UK
Miles Osborne, University of Edinburgh, UK
Weining Qian, East China Normal University, China
Alan Ritter, University of Washington, USA
Matthew Rowe, Lancaster University, UK
Marta Sabou, MODUL University Vienna, Austria
Sina Samangooei, University of Southampton, UK
Sebastian Schelter, TU Berlin / Apache Software Foundation, Germany
Darius Sidlauskas, Aarhus University, Denmark
Marc Spaniol, Max Planck Institut für Informatik, Germany
Andreas Vlachos, University of Cambridge, UK



The ScaNLP workshop is partially supported by GATE, the EU FP7 projects
TrendMiner and AnnoMarket, and the CHIST-ERA uComp project.
