posted by user: ziqizhang

SWC-MWPD 2020 : (ROUND 1 RESULTS AND ROUND 2 CfP) Semantic Web Challenge on Mining the Web of HTML-embedded Product Data (@ISWC2020)


Link: https://ir-ischool-uos.github.io/mwpd/index.html
 
When Nov 2, 2020 - Nov 6, 2020
Where Athens
Submission Deadline Aug 17, 2020
Notification Due Aug 24, 2020
Categories    semantic web, data mining, natural language processing, machine learning
 

Call For Papers

Call for Participation: Semantic Web Challenge on Mining the Web of HTML-embedded Product Data (co-located with ISWC2020) - Round 1 results announced and Round 2 open!

NEWS: Round 1 results published, and Round 2 now open for submission! See our website for details: https://ir-ischool-uos.github.io/mwpd/

1. Overview
The Semantic Web Challenge on Mining the Web of HTML-embedded Product Data is co-located with the 19th International Semantic Web Conference (https://iswc2020.semanticweb.org/, 2-6 Nov 2020, Athens, Greece). The challenge organises two shared tasks related to product data mining on the Web: (1) product matching and (2) product classification. The event is organised by the University of Sheffield, the University of Mannheim and Amazon, and is open to anyone. Systems that beat the baseline of their respective task will be invited to write a paper describing their method and system, and to present it as a poster (and potentially also a short talk) at the ISWC2020 conference. Winners of each task will be awarded a prize of 500 euros (partly sponsored by Peak Indicators, https://www.peakindicators.com/).

2. Challenge website
For details of the challenge please visit https://ir-ischool-uos.github.io/mwpd/

3. Important dates
13 July 2020: Round 2 system output submission open
17 August 2020: Round 2 system output submission end
02 September 2020: Final system paper submission

4. Task and dataset brief
The challenge organises two tasks: product matching and product classification.

i) Product matching deals with identifying product offers on different websites that refer to the same real-world product (e.g., the same iPhone X model offered under different names/offer titles and with different descriptions on various websites). A corpus of 16 million product offers, grouped into clusters of offers for the same product, is released for generating training data. A validation set of 1.1K offer pairs and a test set of 600 offer pairs will also be released. The goal of this task is to classify each offer pair in these datasets as a match (i.e., both offers refer to the same product) or a non-match.
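Because the training corpus is released as clusters rather than as ready-made pairs, participants have to derive labelled pairs themselves. The sketch below shows one common way of doing this, assuming a hypothetical in-memory format of `(offer_id, title, cluster_id)` tuples (the real file format is described on the challenge website): offers that share a cluster become match pairs, and offers sampled from two different clusters become non-match pairs. The function name `build_pairs` and its parameters are illustrative, not part of the released utility code.

```python
import itertools
import random
from typing import Dict, List, Tuple

# Hypothetical offer representation: (offer_id, title, cluster_id).
Offer = Tuple[str, str, int]


def build_pairs(offers: List[Offer], negatives_per_positive: int = 1,
                seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (offer_id_a, offer_id_b, label) triples; label 1 = match, 0 = non-match."""
    rng = random.Random(seed)

    # Group offer ids by the cluster (i.e. real-world product) they belong to.
    by_cluster: Dict[int, List[str]] = {}
    for offer_id, _title, cluster_id in offers:
        by_cluster.setdefault(cluster_id, []).append(offer_id)

    # Positive pairs: every pair of offers inside the same cluster.
    positives = [(a, b, 1)
                 for ids in by_cluster.values()
                 for a, b in itertools.combinations(ids, 2)]

    # Negative pairs: one random offer from each of two different clusters.
    cluster_ids = list(by_cluster)
    negatives = []
    if len(cluster_ids) >= 2:
        for _ in range(len(positives) * negatives_per_positive):
            c1, c2 = rng.sample(cluster_ids, 2)
            negatives.append((rng.choice(by_cluster[c1]), rng.choice(by_cluster[c2]), 0))

    return positives + negatives
```

Randomly sampled negatives tend to be easy to tell apart; sampling harder negatives (for example, offers from different clusters with similar titles) may give a matcher more informative training signal.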

ii) Product classification deals with assigning predefined product category labels (which can span multiple levels of a hierarchy) to product instances (e.g., iPhone X is a ‘SmartPhone’, and also ‘Electronics’). A training set of 10K product offers, a validation set of 3K product offers and a test set of 3K product offers will be released. Each dataset contains product offers with their metadata (e.g., name, description, URL) and three classification labels, each corresponding to a level in the GS1 Global Product Classification taxonomy. The goal is to assign each product offer the correct pre-defined category label at each of these levels.
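Since every offer carries one label per GS1 level, a simple starting point is to treat each level as an independent classification problem. The following minimal sketch (not the challenge baseline) builds a word-frequency profile for each label at each level and predicts the label whose profile best overlaps the offer's words, normalised by profile size. The record format and the class name `LevelwiseProfileClassifier` are hypothetical, and any label strings used with it are placeholders rather than real GS1 category names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format: (offer_text, (level1_label, level2_label, level3_label)).
Record = Tuple[str, Tuple[str, str, str]]


def tokens(text: str) -> List[str]:
    # Naive lower-cased whitespace tokenisation; a real system would normalise further.
    return text.lower().split()


class LevelwiseProfileClassifier:
    """Predicts one label per level from per-label word-frequency profiles."""

    def fit(self, records: List[Record]) -> "LevelwiseProfileClassifier":
        # profiles[level][label] counts word occurrences across offers with that label.
        self.profiles: List[Dict[str, Counter]] = [defaultdict(Counter) for _ in range(3)]
        for text, labels in records:
            words = Counter(tokens(text))
            for level, label in enumerate(labels):
                self.profiles[level][label].update(words)
        return self

    def predict(self, text: str) -> Tuple[str, str, str]:
        words = Counter(tokens(text))
        predictions = []
        for level_profiles in self.profiles:
            def score(label: str) -> float:
                profile = level_profiles[label]
                total = sum(profile.values()) or 1  # guard against an empty profile
                # Word overlap between the offer and the label profile, size-normalised.
                return sum(profile[w] * n for w, n in words.items()) / total
            predictions.append(max(level_profiles, key=score))
        return tuple(predictions)
```

Predicting each level independently can yield inconsistent combinations (e.g. a level-3 label that does not sit under the predicted level-1 label); a hierarchy-aware approach would constrain lower-level predictions by the higher-level ones.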

All datasets are built based on structured data that was extracted from the Common Crawl (https://commoncrawl.org/) by the Web Data Commons project (http://webdatacommons.org/).

5. Resources and tools
The challenge will also release utility code (in Python) for processing the above datasets and scoring the system outputs. In addition, the following language resources for product-related data mining tasks will be released:
A text corpus of 150 million product offer descriptions
Word embeddings trained on the above corpus
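The challenge's own Python utility code is the authoritative scorer. As a hedged illustration, assuming product matching is evaluated on the match class, precision, recall and F1 over binary labels (1 = match, 0 = non-match) can be computed as below; the function name `match_prf` is hypothetical.

```python
from typing import Sequence, Tuple


def match_prf(gold: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the positive (match = 1) class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives and false negatives for the match class.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Scoring only the match class keeps the metric from being dominated by the (usually more numerous and easier) non-match pairs.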

6. Organizing committee
Dr Ziqi Zhang (Information School, The University of Sheffield)
Prof. Christian Bizer (Institute of Computer Science and Business Informatics, The University of Mannheim)
Dr Haiping Lu (Department of Computer Science, The University of Sheffield)
Dr Jun Ma (Amazon Inc., Seattle, US)
Prof. Paul Clough (Information School, The University of Sheffield & Peak Indicators)
Ms Anna Primpeli (Institute of Computer Science and Business Informatics, The University of Mannheim)
Mr Ralph Peeters (Institute of Computer Science and Business Informatics, The University of Mannheim)
Mr Abdulkareem Alqusair (Information School, The University of Sheffield)

7. Contact
To contact the organising committee please use the Google discussion group https://groups.google.com/forum/#!forum/mwpd2020
