
WAC 2008 : The 4th Web as Corpus workshop: Can we beat Google?


Link: http://webascorpus.sf.net/WAC4/
 
When Jun 1, 2008 - Jun 1, 2008
Where Marrakech, Morocco
Submission Deadline Feb 29, 2008
Categories    NLP   information retrieval
 

Call For Papers

The 4th Web as Corpus workshop: Can we beat Google?

Marrakech, Morocco (post-LREC workshop)
1 June 2008

http://webascorpus.sf.net/WAC4/

Submission deadline: 29 February 2008

DESCRIPTION

Commercial Web search engines offer fast search on huge amounts of text,
combined with increasingly clever ranking and data analysis algorithms,
but their content-centric services do not cater to the needs of the
computational linguistics and NLP communities. The leading theme of
this workshop, the fourth in a series of highly successful Web as Corpus
meetings, is to find out how to combine the power and scalability of
modern search engine technology with sophisticated linguistic annotation
and query processing.

We invite papers on various topics concerning the use of Web resources
for corpus research and NLP applications, including (but not limited to)
the following:

* linguistic Web crawler technology and Web corpus collection
projects
* applications of Web-derived corpora and other kinds of Web data
* how far does the "easy way" get you? (using search engines, or
Google's n-gram lists; we are particularly interested in a critical
discussion of the usefulness and limitations of such approaches)
* methods and tools for "cleaning" Web pages to turn them into a
corpus (contributors to this topic will be encouraged to participate in
the second CLEANEVAL competition to be held in 2009)
* automatic linguistic annotation of Web data: tokenisation, POS
tagging, lemmatisation, semantic tagging, etc. (established tools often
perform very poorly on Web data)
* search engine architectures for linguists: bringing linguistics to
commercial search engines, or high-performance search technology to
linguistics?
* search engine-related topics such as result ranking (e.g. how to
identify "typical" uses rather than returning 50 very similar matches on
the first page)
* duplicate detection, interactive query refinement, etc.
* reviews and clever uses of search engine APIs (Google, Yahoo,
AltaVista and, in particular, Microsoft's currently generous LiveSearch
API)

This workshop is endorsed by the Special Interest Group on the Web as
Corpus (SIGWAC) of the Association for Computational Linguistics (ACL).

Submission Information: Authors are invited to submit full papers on
original, unpublished work in the topic area of this workshop.
Submissions should follow the format of LREC proceedings and should not
exceed eight (8) pages, including references. We strongly recommend the
use of LREC LaTeX or Microsoft Word style files tailored for this year's
conference. Details on the submission procedure will be posted on the
conference website shortly.

PROGRAMME COMMITTEE

Silvia Bernardini, U of Bologna, Italy
Massimiliano Ciaramita, CNR Pisa, Italy
Jesse de Does, INL, Netherlands
Katrien Depuydt, INL, Netherlands
Stefan Evert, U of Osnabrück, Germany
Cédrick Fairon, UCLouvain, Belgium
William Fletcher, U.S. Naval Academy, USA
Gregory Grefenstette, Commissariat à l'Énergie Atomique, France
Péter Halácsy, Budapest U of Technology and Economics, Hungary
Katja Hofmann, U of Amsterdam, Netherlands
Adam Kilgarriff, Lexical Computing Ltd, UK
Igor Leturia, U of the Basque Country, Spain
Phil Resnik, U of Maryland, College Park, USA
Kevin Scannell, Saint Louis U, USA
Gilles-Maurice de Schryver, U Gent, Belgium
Klaus Schulz, LMU München, Germany
Serge Sharoff, U of Leeds, UK
Eros Zanchetta, U of Bologna, Italy

ORGANISING COMMITTEE

Stefan Evert, University of Osnabrück
Adam Kilgarriff, Lexical Computing
Serge Sharoff, University of Leeds
