
NEATCLasS 2023 : 2nd Workshop on Novel Evaluation Approaches for Text Classification Systems


When Jun 5, 2023 - Jun 5, 2023
Where Limassol, Cyprus
Submission Deadline Apr 17, 2023
Notification Due Apr 30, 2023
Final Version Due May 6, 2023
Categories    computational social science, natural language processing, evaluation, benchmarking

Call For Papers

Co-located with ICWSM 2023

The automatic or semiautomatic analysis of textual data is a key approach to making sense of the massive amounts of user-generated content online, from the identification of sentiment in text and topic classification to the detection of abusive language, misinformation or propaganda. However, the development of such systems faces a crucial challenge. Static benchmarking datasets and performance metrics are the primary method for measuring progress in the field, and the publication of research on new systems typically requires demonstrating an improvement over state-of-the-art approaches in this way. Yet these performance metrics can obscure critical failings in current models, and improvements in metrics often do not reflect improvements in the real-world performance of models. There is clearly a need to rethink performance evaluation if text classification and analysis systems are to be usable and trustworthy.

If unreliable systems achieve astonishing scores with traditional metrics, how do we recognise progress when we see it? The goal of the Workshop on Novel Evaluation Approaches for Text Classification Systems (NEATCLasS) is to promote the development and use of novel metrics for abuse detection, hate speech recognition, sentiment analysis and similar tasks within the community, to better measure whether models really improve upon the state of the art, and to encourage a wide range of models to be tested on these new metrics.

Recently, there have been attempts to address the problem of benchmarks and metrics that do not represent performance well. For example, in abusive language detection, there are both static datasets of hard-to-detect examples (Röttger et al. 2021) and dynamic approaches for generating such examples (Calabrese et al. 2021). On the platform DynaBench (Kiela et al. 2021), benchmarks are dynamic and constantly updated with hard-to-classify examples, which avoids overfitting to a predetermined dataset. However, these approaches capture only a tiny fraction of the issues with benchmarking. There is still much work to do.

We welcome submissions discussing such new evaluation approaches, introducing new ones or refining existing ones, and promoting the use of novel metrics for abuse detection, sentiment analysis and similar tasks within the community. Furthermore, the workshop will promote discussion on the importance, potential and danger of disagreement in tasks that require subjective judgements. This discussion will also focus on how to evaluate human annotations, and how to find the most suitable set of annotators (if any) for a given instance and task. The workshop will solicit, among others, research papers about:
* Issues with current evaluation metrics and benchmarking datasets
* New evaluation metrics
* User-centred (qualitative or quantitative) evaluation of social media text analysis tools
* Adaptations and translations of novel evaluation metrics for other languages
* New datasets for benchmarking
* Increasing data quality in benchmarking datasets, e.g., avoidance of selection bias, identification of suitable expert human annotators for tasks involving subjective judgements
* Systems that facilitate dynamic evaluation and benchmarking
* Models that perform better at hard-to-classify instances and novel evaluation metrics such as AAA, DynaBench and HateCheck
* Bias, error analysis and model diagnostics
* Phenomena not captured by existing evaluation metrics (such as models making the right predictions for the wrong reason)
* Approaches to mitigating bias and common errors
* Alternative designs for NLP competitions that evaluate a wide range of model characteristics (such as bias, error analysis, cross-domain performance)
* Challenges of downstream applications (in industry, computational social science and elsewhere) and reflections on how these challenges can be captured in evaluation metrics

Format and Submissions

We invite research papers (8 pages), position and short papers (4 pages), and demo papers (2 pages). Detailed submission instructions can be found on the workshop website.

The workshop will take place as a half-day meeting on 5 June. We are looking forward to an exciting mix of activities including invited talks, paper presentations and a group discussion. Authors of accepted papers will be invited to trial an innovative format for paper presentations: presenters will be given 5 minutes to describe their research questions and hypotheses, followed by a group discussion. Presenters will then be given 5 more minutes to describe their method and results, followed by a second group discussion about the interpretation and implications of those results. The group discussions aim to bring researchers together and collect ideas for new evaluation approaches and future work in the field.

While we would encourage attending the workshop in person, we are also planning to live stream the workshop on Zoom and record talks to allow as many people as possible to participate.

Authors of accepted papers will have the opportunity to publish their papers through workshop proceedings by the AAAI Press. Submission instructions will be uploaded to the workshop web page in due course.


Björn Ross, University of Edinburgh (Contact:
Roberto Navigli, Sapienza University of Rome
Agostina Calabrese, University of Edinburgh
Sheikh Muhammad Sarwar, Amazon
