Eval4NLP 2025 : 5th Workshop on Evaluation and Comparison for NLP systems (Eval4NLP)
Link: https://eval4nlp.github.io/2025/index.html
Call For Papers

The 5th Workshop on Evaluation and Comparison for NLP systems (Eval4NLP) will be co-located with the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2025) in December, 2025. 
We invite the submission of long and short papers of a theoretical or experimental nature, describing recent advances in system evaluation and comparison in NLP, with a particular focus on developing model evaluation and human evaluation strategies for multitasking, multilingual, and multimodal scenarios. We particularly encourage works considering these issues for low-resource and highly distant languages.

IJCNLP-AACL webpage: https://2025.aaclnet.org
Workshop webpage: https://eval4nlp.github.io/2025/index.html

-------Important Dates-------

All deadlines are 11:59 pm UTC-12h (Anywhere on Earth).

- Paper submission deadline (direct submission via OpenReview): September 29, 2025
- ARR commitment deadline: October 27, 2025
- Notification of acceptance: November 3, 2025
- Camera-ready papers due: November 11, 2025
- Workshop date: December 23, 2025

-------Topics-------

1. Designing evaluation metrics

Proposing and/or analyzing:
- Metrics with desirable properties, e.g., high correlation with human judgments, strong discrimination between high-quality outputs and mediocre or low-quality outputs, robustness across lengths of input and output sequences, efficiency, etc.;
- Reference-free evaluation metrics, which only require source text(s) and system predictions;
- Cross-modal metrics, which can reliably and robustly measure the quality of system outputs from heterogeneous modalities (e.g., text, image, and speech), different genres (e.g., newspapers, Wikipedia articles, and scientific papers), and different languages;
- Cross-lingual metrics that can take inputs in different languages (e.g., an input document in language A and its summary in language B);
- Cost-effective methods for eliciting high-quality manual annotations; and
- Methods and metrics for evaluating interpretability and explanations of NLP models.

2. Creating adequate evaluation data

Proposing new datasets or analyzing existing ones by studying their:
- Coverage and diversity, e.g., size of the corpus, covered phenomena, representativeness of samples, distribution of sample types, variability among data sources, eras, and genres;
- Quality of annotations and human evaluation, e.g., consistency of annotations, inter-rater agreement, methods for human evaluation, and bias checks.

3. Reporting correct results

Ensuring and reporting:
- Statistics for the trustworthiness of results, e.g., via appropriate significance tests and reporting of score distributions rather than single-point estimates, to avoid chance findings;
- Reproducibility of experiments, e.g., quantifying the reproducibility of papers and issuing reproducibility guidelines;
- Comprehensive and unbiased error analyses and case studies, avoiding cherry-picking and sampling bias.

-------Submission Guidelines-------

The workshop welcomes two types of submission: long and short papers. Long papers may consist of up to 8 pages of content, plus unlimited pages of references. Short papers may consist of up to 4 pages of content, plus unlimited pages of references. Please follow the ACL ARR formatting requirements, using the official templates. Final versions of both submission types will be given one additional page of content for addressing reviewers' comments. The accepted papers will appear in the workshop proceedings.

The review process is double-blind. Therefore, no author information should be included in the papers. Self-references that reveal the authors' identity must be avoided. Papers that do not conform to these requirements will be rejected without review.

-------Two submission modes: standard and pre-reviewed-------

Eval4NLP features two modes of submission.
- Standard submissions: We invite the submission of papers that will receive up to three double-blind reviews from the Eval4NLP committee and a final verdict from the workshop chairs.
- Pre-reviewed: With a later deadline, we invite unpublished papers that have already been reviewed, either through ACL ARR or at recent AACL/EACL/ACL/EMNLP/COLING venues. These papers will not receive new reviews but will be judged together with their reviews via a meta-review; authors are invited to attach a note with comments on the reviews and to describe possible revisions.

Final decisions will be either accept, reject, or conditional accept, i.e., the paper is only accepted provided that specific (meta-)reviewer requirements have been met. Please also note the multiple submission policy.

The submission site (OpenReview) is currently available here: https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/Eval4NLP

-------Optional Supplementary Materials-------

Authors are allowed to submit (optional) supplementary materials (e.g., appendices, software, and data) to improve the reproducibility of results and/or to provide additional information that does not fit in the paper. All of the supplementary materials must be packed into one single file (.tgz or .zip) and submitted via OpenReview together with the paper. However, because supplementary materials are completely optional, reviewers may or may not review or even download them. The submitted paper should therefore be fully self-contained.

-------Preprints-------

Papers uploaded to preprint servers (e.g., arXiv) can be submitted to the workshop. There is no deadline concerning when the papers were made publicly available. However, the version submitted to Eval4NLP must be anonymized, and we ask the authors not to update the preprints or advertise them on social media while they are under review at Eval4NLP.

-------Multiple Submission Policy-------

Eval4NLP allows authors to submit a paper that is under review at another venue (journal, conference, or workshop) or that will be submitted elsewhere during the Eval4NLP review period. However, the authors need to withdraw the paper from all other venues if it is accepted and they want to publish it in Eval4NLP.

Note that IJCNLP-AACL and ARR do not allow double submissions. Papers submitted both to the main conference and to IJCNLP-AACL workshops (including ours) will therefore violate the multiple submission policy of the main conference. If authors would like to submit a paper under review by IJCNLP-AACL to the Eval4NLP workshop, they need to withdraw their paper from IJCNLP-AACL and submit it to our workshop before the workshop submission deadline.

-------Best Paper Awards-------

We will award prizes to the three best paper submissions (subject to availability; more details to come soon). Both long and short submissions will be eligible for prizes.

-------Contact Information-------

Email: eval4nlp@gmail.com