
SDP Shared Tasks 2021 : 2nd Workshop on Scholarly Document Processing (SDP 2021) @ NAACL Shared Tasks


When: Jun 10, 2021
Where: Mexico City, Mexico
Submission Deadline: Mar 15, 2021
Notification Due: Apr 15, 2021
Final Version Due: Apr 26, 2021
Categories: scholarly document processing, natural language processing, summarization, information retrieval

Call For Papers

Dear colleagues,

The 2nd Workshop on Scholarly Document Processing (SDP 2021) is hosting three shared tasks tackling key NLP challenges in summarization, claim verification, and citation context classification. Participating teams will be invited to submit their work for presentation at the SDP 2021 workshop on June 10 at NAACL 2021.

The call for participation is described below. More details can be found on our website, alongside our usual call for Research track papers.

Mailing list:

** Task 1: LongSumm - Generating Long Summaries for Scientific Documents **

Most work on scientific document summarization focuses on generating relatively short summaries (250 words or less). While such a length constraint can be sufficient for summarizing news articles, it is far from sufficient for summarizing scientific work. In fact, such a short summary resembles an abstract more than a summary that aims to cover all the salient information conveyed in a given text. Writing longer summaries of that kind requires expertise and a deep understanding of a scientific domain, as can be found in some researchers’ blogs.

The LongSumm task leverages blogs written by researchers in the NLP and Machine Learning communities, using their summaries as the reference summaries against which submissions are compared.

The corpus for this task includes a training set that consists of 1705 extractive summaries and around 700 abstractive summaries of NLP and Machine Learning scientific papers. These are drawn from papers based on video talks from associated conferences (TalkSumm: and from blogs created by NLP and ML researchers. In addition, we create a test set of abstractive summaries. Each submission is judged against one reference summary (gold summary) on ROUGE and should not exceed 600 words.
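As a rough illustration of the scoring setup described above, the sketch below caps a candidate summary at 600 words and computes unigram-overlap (ROUGE-1) precision, recall, and F1 against a single reference summary. It is a simplified stand-in and not the official scorer: the whitespace tokenization, the lowercasing, the choice to truncate over-long summaries, and the names `truncate_words` and `rouge1` are all assumptions made for illustration.

```python
from collections import Counter

MAX_WORDS = 600  # length limit stated in the task description


def truncate_words(text, limit=MAX_WORDS):
    """Keep at most `limit` whitespace-separated words.

    Truncation is an assumed way of handling over-long summaries;
    the task description only states the 600-word limit.
    """
    return " ".join(text.split()[:limit])


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap with one reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the per-word minimum of the two counts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the model summarizes long scientific papers"
    candidate = truncate_words("the model summarizes papers")
    print(rouge1(candidate, reference))
```

A real evaluation would normally rely on an established ROUGE implementation with stemming and several ROUGE variants; the sketch only shows how the word limit and a single gold summary fit together.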

The training data is composed of abstractive and extractive summaries. To download both datasets, and for further details, see the LongSumm GitHub repository: and our website:

** Task 2: SciVer - Scientific Claim Verification **

Due to the rapid growth in scientific literature, it is difficult for scientists to stay up-to-date on the latest findings. This challenge is especially acute during pandemics due to the risk of making decisions based on outdated or incomplete information. There is a need for AI systems that can help scientists with information overload and support scientific fact checking and evidence synthesis.

In the SciVer shared task, participants will build systems that take a scientific claim as input, identify all abstracts in a large corpus that Support or Refute the claim, and provide rationales (supporting evidence). Here's a live demo of what such a system would do:

We will use the SciFact dataset of 1409 expert-annotated biomedical claims verified against 5183 abstracts from peer-reviewed publications. Download the full dataset and any baseline models from GitHub ( Find out more from the EMNLP 2020 paper:
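To make the three stages of the task concrete (retrieve abstracts, label each one, point to rationale sentences), here is a minimal sketch. It does not reflect the SciFact baseline or the official submission format: the `Prediction` fields, the label strings, the Jaccard word-overlap retrieval, and the `classify` callable that stands in for a trained model are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the two verdicts named in the task description.
LABELS = ("SUPPORT", "REFUTE")


@dataclass
class Prediction:
    abstract_id: int
    label: str
    rationale_sentences: list = field(default_factory=list)  # sentence indices


def tokens(text):
    """Lowercased word set with basic punctuation stripped."""
    return set(text.lower().replace(".", " ").replace(",", " ").split())


def overlap(a, b):
    """Jaccard similarity between the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def verify(claim, corpus, classify, top_k=2, min_score=0.1):
    """corpus maps abstract_id -> list of sentences.

    classify(claim, sentences) is an assumed stand-in for a trained model
    that returns a label string.
    """
    scored = sorted(
        ((overlap(claim, " ".join(sents)), aid) for aid, sents in corpus.items()),
        reverse=True,
    )
    predictions = []
    for score, aid in scored[:top_k]:
        if score < min_score:
            continue  # treat low-overlap abstracts as not relevant
        sents = corpus[aid]
        # Use the single best-matching sentence as the rationale.
        best = max(range(len(sents)), key=lambda i: overlap(claim, sents[i]))
        label = classify(claim, [sents[best]])
        if label in LABELS:
            predictions.append(Prediction(aid, label, [best]))
    return predictions
```

Lexical overlap is used here only to keep the example self-contained; a competitive system would replace both the retrieval step and the `classify` stub with learned models.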

To register, please send an email to with:
- Team name
- Participant (full) names
- Participant affiliation(s)
- Email(s) for primary contact(s)

Feel free to contact the organizers at More details are available on the shared task page:

** Task 3: 3C - Citation Context Classification **

Recent years have witnessed a massive increase in the amount of scientific literature and research data published online, revealing advances across many domains. The introduction of aggregator services like CORE ( has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations ( The shared task organized as part of the SDP 2021 workshop focuses on classifying citation contexts in research publications based on their influence and purpose:

Subtask A: A task for identifying the purpose of a citation. Multiclass classification of citations into one of six classes: Background, Uses, Compares_Contrasts, Motivation, Extension, and Future.

Subtask B: A task for identifying the importance of a citation. Binary classification of citations into one of two classes: Incidental and Influential.
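The two label sets above can be written down directly, which makes it easy to catch out-of-vocabulary predictions before submitting. This is only a sketch: representing classes as plain strings and the helper name `invalid_predictions` are assumptions, since the official label encoding and submission format are not given here.

```python
# Class names as listed for each subtask; string encoding is an assumption.
PURPOSE_LABELS = (
    "Background", "Uses", "Compares_Contrasts",
    "Motivation", "Extension", "Future",
)
INFLUENCE_LABELS = ("Incidental", "Influential")


def invalid_predictions(predictions, allowed):
    """Return (index, label) pairs for predictions outside the allowed set."""
    allowed_set = set(allowed)
    return [(i, p) for i, p in enumerate(predictions) if p not in allowed_set]


if __name__ == "__main__":
    print(invalid_predictions(["Uses", "Future", "Other"], PURPOSE_LABELS))
    print(invalid_predictions(["Incidental", "Influential"], INFLUENCE_LABELS))
```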

A sample training dataset can be downloaded by filling in the following registration form: The full training dataset will be released shortly via the Kaggle platform (

More information about the dataset and the shared task can be found on the workshop website:

To register, please use the following form:

** Participation instructions **

Each shared task is organized by a separate sub-team of the SDP 2021 workshop organizers. Please contact the following people with questions:

- LongSumm: and
- SciVer: or and
- 3C: and

Participants in our shared tasks will be invited to submit papers describing their systems for publication in the ACL Anthology; see examples of these papers from the SDP shared tasks at EMNLP 2020.

More details about participation (both Research Papers and Shared Tasks) are available on our website: To receive updates about SDP 2021, please join our mailing list: or follow us on Twitter:
