SDP Shared Tasks 2021 : 2nd Workshop on Scholarly Document Processing (SDP 2021) @ NAACL Shared Tasks
Call For Papers
The 2nd Workshop on Scholarly Document Processing (SDP 2021) is hosting 3 Shared Tasks tackling key NLP challenges in summarization, claim verification, and citation context classification. Participating teams will be invited to submit their work for presentation at the SDP 2021 workshop on June 10 at NAACL 2021.
The call for participation is described below. More details can be found on our website, alongside our usual call for Research track papers.
Mailing list: https://groups.google.com/g/sdproc-updates
** Task 1: LongSumm - Generating Long Summaries for Scientific Documents **
Most of the work on scientific document summarization focuses on generating relatively short summaries (250 words or less). While such a length constraint can be sufficient for summarizing news articles, it is far from sufficient for summarizing scientific work. In fact, such a short summary resembles an abstract more than a summary that aims to cover all the salient information conveyed in a given text. Writing such summaries requires expertise and deep understanding of a scientific domain, as can be found in some researchers' blogs.
The LongSumm task leverages blog posts created by researchers in the NLP and machine learning communities, using these summaries as references against which submissions are compared.
The corpus for this task includes a training set of 1705 extractive summaries and around 700 abstractive summaries of NLP and machine learning papers. The extractive summaries are based on video talks from associated conferences (TalkSumm: https://arxiv.org/abs/1906.01351), while the abstractive summaries come from blog posts by NLP and ML researchers. In addition, the test set consists of abstractive summaries. Each submission is judged against a single reference summary (gold summary) using ROUGE and must not exceed 600 words.
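To make the evaluation protocol concrete, the sketch below illustrates the 600-word submission limit and a simplified unigram-overlap ROUGE-1 F1. This is an illustration only; the official evaluation uses the standard ROUGE toolkit, and the example summaries are invented:

```python
from collections import Counter

MAX_WORDS = 600  # LongSumm submissions must not exceed 600 words

def truncate(summary: str, limit: int = MAX_WORDS) -> str:
    """Keep at most `limit` whitespace-delimited tokens."""
    return " ".join(summary.split()[:limit])

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 via unigram overlap (illustration only;
    the official scoring uses the standard ROUGE implementation)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

cand = truncate("the model improves summarization of long scientific papers")
ref = "a model for summarization of long scientific documents"
print(round(rouge1_f1(cand, ref), 3))  # → 0.625
```

In practice participants would use an off-the-shelf ROUGE package rather than this toy scorer, but the truncation step matters: words past the 600-word cap contribute nothing to the score.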
The training data is composed of abstractive and extractive summaries. To download both datasets, and for further details, see the shared task page: https://sdproc.org/2021/sharedtasks.html#longsumm.
** Task 2: SciVer - Scientific Claim Verification**
Due to the rapid growth in scientific literature, it is difficult for scientists to stay up-to-date on the latest findings. This challenge is especially acute during pandemics due to the risk of making decisions based on outdated or incomplete information. There is a need for AI systems that can help scientists with information overload and support scientific fact checking and evidence synthesis.
In the SciVer shared task, participants will build systems that take a scientific claim as input, identify all relevant abstracts from a large corpus that Support or Refute the claim, and provide rationales (supporting evidence) for each decision. Here's a live demo of what such a system would do: https://scifact.apps.allenai.org/.
We will use the SciFact dataset of 1409 expert-annotated biomedical claims verified against 5183 abstracts from peer-reviewed publications. The full dataset and baseline models can be downloaded from GitHub (https://github.com/allenai/scifact). Find out more from the EMNLP 2020 paper: https://www.aclweb.org/anthology/2020.emnlp-main.609/.
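A system's expected output per claim is the set of relevant abstracts, a verdict label, and the rationale sentences inside each abstract. The sketch below walks one claim record in an approximation of the SciFact JSONL format; the field names follow the public dataset but should be checked against the GitHub repository, and the record itself is invented:

```python
import json

# One claim record in the (approximate) SciFact JSONL layout. Field names
# here mirror the public dataset; verify against the repo before relying
# on them. The claim and doc ids below are fabricated for illustration.
record = json.loads("""
{"id": 42,
 "claim": "Drug X reduces infection risk.",
 "evidence": {"1001": [{"sentences": [2, 3], "label": "SUPPORT"}]},
 "cited_doc_ids": [1001, 1002]}
""")

# For each evidence abstract, a system must report the verdict label and
# the indices of the rationale sentences within that abstract.
for doc_id, rationales in record["evidence"].items():
    for r in rationales:
        print(doc_id, r["label"], r["sentences"])  # → 1001 SUPPORT [2, 3]
```

Claims whose evidence field is empty play the role of "not enough info" cases: the corpus contains no abstract that supports or refutes them.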
To register, please send an email to email@example.com with:
- Team name
- Participant (full) names
- Participant affiliation(s)
- Email(s) for primary contact(s)
Feel free to contact the organizers at firstname.lastname@example.org. More details are available on the shared task page: https://sdproc.org/2021/sharedtasks.html#sciver.
** Task 3: 3C - Citation Context Classification**
Recent years have witnessed a massive increase in the amount of scientific literature and research data being published online, providing insight into advancements across different domains. The introduction of aggregator services like CORE (https://core.ac.uk/) has enabled unprecedented levels of open access to scholarly publications. The availability of the full text of research documents makes it possible to extend bibliometric studies by identifying the context of citations (http://oro.open.ac.uk/51751/). The shared task organized as part of the SDP 2021 workshop focuses on classifying citation context in research publications based on their influence and purpose:
Subtask A: A task for identifying the purpose of a citation. Multiclass classification of citations into one of six classes: Background, Uses, Compares_Contrasts, Motivation, Extension, and Future.
Subtask B: A task for identifying the importance of a citation. Binary classification of citations into one of two classes: Incidental, and Influential.
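The shape of Subtask A can be sketched with a toy rule-based classifier over citation contexts. The keyword cues below are illustrative assumptions, not derived from the 3C training data; competitive systems would learn such signals from the released Kaggle dataset:

```python
# Toy rule-based classifier for Subtask A (citation purpose).
# The cue phrases are hypothetical examples chosen for illustration,
# not features extracted from the actual 3C training set.
PURPOSE_CUES = {
    "Uses": ["we use", "we adopt", "we apply"],
    "Compares_Contrasts": ["compared to", "in contrast", "outperforms"],
    "Motivation": ["motivated by", "inspired by"],
    "Extension": ["we extend", "builds on"],
    "Future": ["future work", "could be applied"],
}

def classify_purpose(context: str) -> str:
    """Return the first purpose class whose cue appears in the context;
    fall back to the majority class, Background."""
    text = context.lower()
    for label, cues in PURPOSE_CUES.items():
        if any(cue in text for cue in cues):
            return label
    return "Background"

print(classify_purpose("We extend the parser of [CITE] to new domains."))
# → Extension
print(classify_purpose("[CITE] surveys prior work on citation analysis."))
# → Background
```

Subtask B has the same interface but a binary label set (Incidental vs. Influential), so the same skeleton applies with two classes instead of six.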
A sample training dataset can be downloaded by filling the following registration form: https://forms.gle/AjYfMrTzZXjfBjgS6. The full training dataset will be released shortly via the Kaggle platform (https://www.kaggle.com/).
More information about the dataset and the shared task can be found on the workshop website: https://sdproc.org/2021/sharedtasks.html#3c
To register, please use the following form: https://forms.gle/AjYfMrTzZXjfBjgS6
** Participation instructions **
Each shared task is organized by a separate sub-team of the SDP 2021 workshop organizers. Please contact the following people with questions:
- LongSumm: email@example.com and firstname.lastname@example.org
- SciVer: email@example.com or firstname.lastname@example.org and email@example.com
- 3C: firstname.lastname@example.org and email@example.com
Participants in our shared tasks will be invited to submit papers describing their systems for publication in the ACL Anthology; see examples of these papers from the SDP shared tasks at EMNLP 2020.
More details about participation (both Research Papers and Shared Tasks) are available on our website: http://www.sdproc.org/. To receive updates about SDP 2021, please join our mailing list: https://groups.google.com/g/sdproc-updates or follow us on Twitter: https://twitter.com/sdproc