
WaC 2026: 13th Web-as-Corpus (WaC-13) Workshop @EMNLP2026


Link: https://wacky-workshop.github.io/
 
When Oct 24, 2026 - Oct 29, 2026
Where Budapest, Hungary
Submission Deadline Aug 7, 2026
Notification Due Sep 5, 2026
Final Version Due Sep 20, 2026
Categories NLP, computational linguistics
 

First Call for Papers

13th Web-as-Corpus (WaC-13) Workshop @EMNLP2026, Budapest, Hungary, 24-29 Oct, 2026

The World Wide Web has evolved from a resource for building linguistic corpora into the central data infrastructure powering modern natural language processing and Large Language Models (LLMs). As web-scale data increasingly shapes AI systems’ knowledge and capabilities, understanding its quality, representativeness, and ethical implications has become critical.

At the same time, the “more is better” paradigm is being challenged by issues such as machine-generated content, data toxicity, limited metadata, and the under-representation of many languages and domains. These challenges call for a shift toward Data-Centric AI, focusing on the curation, analysis, and responsible use of web-derived data.

The 13th Web-as-Corpus (WaC-13) workshop provides a multidisciplinary forum for research addressing the full lifecycle of web data. We invite submissions on methods, resources, and applications related to web corpora, with special emphasis on multilingual data and less-resourced languages.

Topics of interest include (but are not limited to):
* Creation and evaluation of high-quality datasets for foundation models (e.g., data collection, filtering, enrichment, language identification)
* Use of web data in empirical linguistic research
* Analysis of web-scale corpora for quality, representativeness, and societal insights
* Ethical and legal aspects of collecting, sharing, and using web data

By bringing together researchers from NLP, linguistics, and the social sciences, WaC aims to advance best practices for one of the field’s most influential data sources.

Important dates:
Direct paper submission deadline: 7 August, 2026
Pre-reviewed ARR commitment deadline: 1 September, 2026
Notification of acceptance: 5 September, 2026
Camera-ready paper due: 20 September, 2026
Conference dates: 24-29 Oct, 2026

Submissions:
Papers can be submitted directly through openreview.net or committed with existing reviews through ARR.

Workshop Organizers:
Nikola Ljubešić, Jožef Stefan Institute, Slovenia
Yves Scherrer, University of Oslo, Norway
Laurie Burchell, Common Crawl
Veronika Laippala, University of Turku, Finland
Pedro Ortiz Suarez, Common Crawl
Jen English, Common Crawl
Vuk Dinić, Jožef Stefan Institute, Slovenia
