SDP 2022 : 3rd Workshop on Scholarly Document Processing @ COLING 2022
Call For Papers
You are invited to participate in the 3rd Workshop on Scholarly Document Processing (SDP 2022) to be held at COLING 2022 (October 12-17, 2022, https://coling2022.org/). The SDP 2022 workshop will consist of a Research track and six Shared Tasks. The call for research papers is described below, and more details can be found on our website, http://www.sdproc.org/.
Papers must follow the COLING format and conform to the COLING Submission Guidelines.
The paper submission site will be provided on the workshop website shortly. The paper submission deadline is July 11, 2022.
Mailing list: https://groups.google.com/g/sdproc-updates
** Call for Research Papers **
== Introduction ==
Although scientific literature plays a major part in research and policy-making, these texts represent an underserved area of NLP. NLP can play a role in addressing research information overload, identifying disinformation and its effect on people and society, and enhancing the reproducibility of science. The unique challenges of processing scholarly documents necessitate the development of specific methods and resources optimized for this domain. The Scholarly Document Processing (SDP) workshop provides a venue for discussing these challenges and bringing together stakeholders from different communities including computational linguistics, text mining, information retrieval, digital libraries, scientometrics, and others to develop and present methods and resources in support of these goals.
This workshop builds on the success of prior workshops: the 1st SDP workshop held at EMNLP 2020, the 2nd SDP workshop held at NAACL 2021, and the 1st and 2nd SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad appeal within the NLP community, we hope the SDP workshop will attract researchers from other relevant fields including meta-science, scientometrics, data mining, information retrieval, and digital libraries, bringing together these disparate communities within ACL.
== Topics of Interest ==
We invite submissions from all communities demonstrating usage of and challenges associated with natural language processing, information retrieval, and data mining of scholarly and scientific documents. Relevant tasks include (but are not limited to):
* Representation learning
* Information extraction
* Language generation
* Question answering
* Discourse modeling and argumentation mining
* Network analysis
* Bibliometrics, scientometrics, and altmetrics
* Peer review
* Search and indexing
* Datasets and resources
* Document parsing
* Text mining
* Research infrastructure
We specifically invite research on important and/or underserved areas, such as:
* Identifying/mitigating scientific disinformation and its effects on public policy and behavior
* Reducing information overload through summarization and aggregation of information within and across documents
* Improving access to scientific papers through multilingual scholarly document processing
* Improving research reproducibility by connecting scientific claims to evidence such as data, software, and cited claims
** Submission Information **
Authors are invited to submit full and short papers with unpublished, original work. Submissions will be subject to a double-blind peer-review process. Accepted papers will be presented by the authors at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings (proceedings from previous years can be found here: https://aclanthology.org/venues/sdp/).
The submissions must be in PDF format and anonymized for review. All submissions must be written in English and follow the COLING 2022 formatting requirements: https://coling2022.org/Cpapers
We follow the same policies as COLING 2022 regarding preprints and double-submissions. The anonymity period for SDP 2022 is from June 13 to August 22.
Long paper submissions: up to 9 pages of content, plus unlimited references.
Short paper submissions: up to 4 pages of content, plus unlimited references.
Final versions of accepted papers will be allowed 1 additional page of content so that reviewer comments can be taken into account.
More details about submissions are available on our website: http://www.sdproc.org/. To receive updates, please join our mailing list: https://groups.google.com/g/sdproc-updates or follow us on Twitter: https://twitter.com/sdproc
** Important Dates (Main Research Track) **
All paper submissions due – July 11, 2022
Notification of acceptance – August 22, 2022
Camera-ready papers due – September 5, 2022
Workshop – October 16/17, 2022
** SDP 2022 Keynote Speakers **
We are excited to have several keynote speakers at SDP 2022. The following speakers have been confirmed (others will be announced later).
* Min-Yen Kan, NUS, Singapore (https://www.comp.nus.edu.sg/~kanmy/)
* Sophia Ananiadou, University of Manchester, UK, who will discuss her recent work on uncertainty and negation, summarisation, and citation graphs (https://www.research.manchester.ac.uk/portal/sophia.ananiadou.html)
** SDP 2022 Shared Tasks **
SDP 2022 will host six exciting shared tasks. More information about all shared tasks is provided on the workshop website (https://sdproc.org/2022/sharedtasks.html). Each shared task will follow up with a separate CfP.
== Multi Perspective Scientific Document Summarization ==
Generating summaries of scientific documents is known to be a challenging task. The majority of existing work in summarization assumes a single best gold summary for each given document. Because writing summaries is a subjective activity, having only one gold summary negatively impacts our ability to evaluate the quality of summarization systems. At the same time, annotating multiple gold summaries for scientific documents can be extremely expensive, as it requires domain experts to read and understand long scientific documents. This shared task will enable exploring methods for generating multi-perspective summaries. We introduce a novel summarization corpus, leveraging data from scientific peer reviews to capture diverse perspectives from the reader's point of view. More information coming soon at: https://github.com/guyfe/Mup
== LongSumm 2022: Generation of Long Summaries for Scientific Documents ==
Most of the work on scientific document summarization focuses on generating relatively short summaries. Such a short summary resembles an abstract and cannot cover all the salient information conveyed in a given scientific text. Writing longer summaries requires expertise and a deep understanding of a scientific domain, as can be found in some researchers' blogs. This shared task leverages blog posts, created by researchers in the NLP and Machine Learning communities, that summarize scientific articles, and uses these posts as reference summaries. The corpus for this task includes a training set that consists of 1705 extractive summaries and 531 abstractive summaries of NLP and Machine Learning scientific papers.
More information at: https://github.com/guyfe/LongSumm
== SV-Ident 2022: Survey Variable Identification in Social Science Publications ==
In this shared task, we focus on concepts specific to social science literature, namely survey variables. Survey variable mention identification in texts can be seen as a multi-label classification problem: given a sentence in a document and a list of unique variables (from a reference vocabulary of survey variables), the task is to classify which variables, if any, are mentioned in that sentence. This task is organized by the VAriable Detection, Interlinking, and Summarization (VADIS) project. Further details: https://vadis-project.github.io/sv-ident-sdp2022/
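The multi-label framing described above can be illustrated with a minimal sketch. It is not the official SV-Ident data format or baseline: the variable IDs, the vocabulary entries, the example sentence, and the helper `to_label_vector` are all hypothetical and are used only to show how one sentence maps to zero or more variables.

```python
from typing import Dict, List

# Hypothetical reference vocabulary: variable ID -> survey question text.
# The real vocabulary is supplied by the task organizers.
VOCABULARY: Dict[str, str] = {
    "v1": "How satisfied are you with your life?",
    "v2": "How much do you trust the national parliament?",
    "v3": "How often do you attend religious services?",
}


def to_label_vector(mentioned: List[str], vocabulary: Dict[str, str]) -> List[int]:
    """Encode the variables mentioned in one sentence as a multi-hot vector.

    Position i is 1 if the i-th variable (in sorted ID order) is mentioned,
    else 0. An all-zero vector means the sentence mentions no variable.
    """
    ordered_ids = sorted(vocabulary)
    unknown = set(mentioned) - set(ordered_ids)
    if unknown:
        raise ValueError(f"unknown variable IDs: {sorted(unknown)}")
    return [1 if vid in mentioned else 0 for vid in ordered_ids]


# Hypothetical sentence that mentions two of the three variables.
sentence = "Respondents reported their life satisfaction and their trust in parliament."
labels = to_label_vector(["v1", "v2"], VOCABULARY)

# A sentence that mentions no survey variable gets an all-zero vector.
no_mention = to_label_vector([], VOCABULARY)
```

Under these assumptions, a system for the task would predict one such vector per sentence, and several positions may be set at once, which is what distinguishes the multi-label setting from ordinary single-label classification.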
== MSLR 2022: Multi-document summarization for medical literature reviews ==
In the context of medicine, systematic literature reviews constitute the highest-quality evidence used to inform clinical care. However, reviews are expensive to produce manually; (semi-)automation via NLP may facilitate faster evidence synthesis without sacrificing rigor. Toward this end, we are running a shared task to study the generation of multi-document summaries in this domain. We make use of two datasets: 1) MS^2, consisting of 20K reviews (citing 470K studies) from the biomedical literature (https://github.com/allenai/ms2), and 2) Cochrane Conclusions, derived from over 4500 Cochrane reviews (https://github.com/bwallace/RCT-summarization-data). Each submission is judged against a gold review summary using ROUGE scores and the evidence-inference-based divergence metric defined in the MS^2 paper. We also encourage contributions that extend this task and dataset, e.g., by proposing scaffolding tasks, methods for model interpretability, and, especially, improved automated evaluation methods in this domain. More information: https://sdproc.org/2022/sharedtasks.html#mslr
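For readers unfamiliar with ROUGE, the following is a simplified sketch of a ROUGE-1 F1 computation (clipped unigram overlap between a generated summary and a gold summary). It is not the official scorer used by the shared task: the tokenizer, the function names `tokenize` and `rouge1_f1`, and the example sentences are assumptions made for illustration, and standard ROUGE implementations add options such as stemming and further variants (ROUGE-2, ROUGE-L).

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Multiset intersection clips each token's count to the smaller side.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: all 4 candidate tokens appear in the 6-token gold summary,
# so precision is 4/4, recall is 4/6, and F1 is 0.8.
score = rouge1_f1(
    "The drug reduced mortality",
    "The drug reduced mortality in adults",
)
```

Surface-overlap scores of this kind do not check whether a summary reports the same direction of evidence as the gold review, which is one reason the task pairs ROUGE with a second metric.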
== Scholarly Knowledge Graph Generation ==
With the demise of the widely used Microsoft Academic Graph (MAG) at the end of 2021, the scholarly document processing community is facing a pressing need to replace MAG with an open-source, community-supported service. A number of challenging data processing tasks are essential for the scalable creation of a comprehensive scholarly graph, i.e., a graph of entities involving but not limited to research papers, their authors, research organizations, and research themes. This shared task will evaluate three key sub-tasks involved in the generation of a scholarly graph: 1) document deduplication, i.e., identifying and linking different versions of the same scholarly document, 2) extracting research themes, and 3) affiliation mining, i.e., linking research papers or their metadata to the organizational entities that produced them. Test and evaluation data will be supplied by the CORE aggregator (https://core.ac.uk/). Pre-register your team here: https://forms.gle/7nduU6meseEpv9i69 and we will keep you posted with competition updates and timelines. More information: https://sdproc.org/2022/sharedtasks.html#skgg
== DAGPap22: Detecting automatically generated scientific papers ==
There are increasing reports that research papers can be written by computers, which raises a series of concerns. In this challenge, we explore the state of the art in detecting automatically generated papers. We frame the detection problem as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. To this end, we will provide a corpus of automatically written papers, as well as documents collected by our publishing and editorial teams. As a control, we will provide a corpus of openly accessible human-written papers from the same scientific domains. We also encourage contributions that aim to extend this dataset with other computer-generated scientific papers, or papers that propose valid metrics to assess automatically generated papers against those written by humans.
More information will be made available at https://sdproc.org/2022/sharedtasks.html#dagpap
** Organizing Committee **
Arman Cohan, Allen Institute for AI, Seattle, USA
Guy Feigenblat, Piiano, Israel
Dayne Freitag, SRI International, San Diego, USA
Tirthankar Ghosal, Charles University, Czech Republic
Drahomira Herrmannova, Elsevier, USA
Petr Knoth, Open University, UK
Kyle Lo, Allen Institute for AI, Seattle, USA
Philipp Mayr, GESIS -- Leibniz Institute for the Social Sciences, Germany
Robert M. Patton, Oak Ridge National Laboratory, USA
Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel
Anita de Waard, Elsevier, USA
Lucy Lu Wang, Allen Institute for AI, Seattle, USA