[WWW 2022] FinSim-4 2022: The 4th Shared Task on Learning Semantic Similarities for the Financial Domain - extended version to ESG insights
Call For Papers
We invite you to participate in FinSim-4, the 4th Shared Task on Learning Semantic Similarities for the Financial Domain, extended to ESG insights. The shared task is part of the FinWeb-2022 workshop, held in conjunction with The Web Conference 2022 (online, April 25-26, 2022).
Shared Task URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finweb2022/shared-task-finsim-4
Workshop URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finweb2022/home
Registration Form: https://forms.gle/aScP11s5vPSK1ghm6
The FinSim-4 shared task aims to spark interest from the Natural Language Processing (NLP), ML/AI, Knowledge Engineering and financial document processing communities. Going beyond the mere representation of words is a key step toward industrial applications of NLP. This is typically addressed in one of three ways: 1) unsupervised, corpus-derived representations such as word embeddings, which are usually opaque to human understanding but very useful in NLP applications; 2) supervised learning of semantic representations, which typically requires a large volume of labeled data but achieves high coverage of the target domain; or 3) manually built resources such as corpora, lexica, taxonomies and ontologies, which typically have low coverage and contain inconsistencies but provide a deeper understanding of the target domain.
These approaches span a spectrum, and a number of works have attempted to combine them, particularly in tasks that aim to expand the coverage of manual resources using automatic methods.
The SemEval community has organized several evaluation campaigns to stimulate the development of methods that extract semantic/lexical relations between concepts/words (Bordea et al. 2015, Bordea et al. 2016, Jurgens et al. 2016, Camacho-Collados et al. 2018).
A large number of datasets and challenges specifically look at how to automatically populate knowledge bases such as DBpedia or Wikidata (e.g. KBP challenges, https://tac.nist.gov/2020/KBP/SM-KBP/).
There are also a number of studies on supervised and unsupervised approaches to extracting semantic relations between concepts and terms (Alfarone et al. 2015, Fauconnier et al. 2015, Shwartz et al. 2016, Sarkar et al. 2018, Martel et al. 2021).
Regarding ESG (Environmental, Social and Governance) issues in the financial domain: according to the European Commission, from the end of 2022, companies providing investment products that make sustainability or environmental claims will be required to disclose how their portfolios align with the EU taxonomy for sustainable activities (https://ec.europa.eu/info/business-economy-euro/banking-and-finance/sustainable-finance/eu-taxonomy-sustainable-activities_en). The objective is to build an ESG taxonomy, or representations of ESG-related concepts, and use it to analyze how well an economic activity complies with the taxonomy, which in turn makes it possible to determine how an investment product is aligned with it.
This new edition, FinSim-4, proposes two sub-tasks:
Sub-task 1. We have created an in-house sustainable finance taxonomy called the Fortia ESG taxonomy. It is based on the taxonomies of several financial data providers, as well as on sustainability and annual reports that we searched for ESG-related criteria. Given a subset of the Fortia ESG taxonomy (the training set), participants are asked to enrich it so that it covers the remaining terms of the original Fortia ESG taxonomy. For this purpose, participants will be given a set of annual reports and sustainability reports of financial companies, from which they can develop a model that induces terms semantically related to the concepts defined in the training set. For example, given a set of terms related to the concept Waste management (e.g. Hazardous Waste, Waste Reduction Initiatives), you need to find the missing terms by predicting the corresponding concept for each unlabeled term.
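To make the Sub-task 1 setup concrete, here is a minimal sketch of one possible baseline, nearest-centroid labeling; it is not the organizers' method. Each concept is represented by the sum of the vectors of its training terms, and an unlabeled term receives the concepts ranked by cosine similarity to those centroids. Assumptions: terms are represented as bags of lowercase words (a real system would more likely use embeddings trained on the provided reports, or a contextual model such as BERT), and the concept "Emissions" and its terms are hypothetical, not taken from the Fortia ESG taxonomy.

```python
import math
from collections import Counter


def embed(term: str) -> Counter:
    # Toy representation (assumption): bag of lowercase word tokens.
    # A real system would likely use word or contextual embeddings instead.
    return Counter(term.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_centroids(trainset):
    # Sum the term vectors of each concept; cosine similarity ignores
    # vector length, so the sum points in the same direction as the mean.
    centroids = {}
    for term, concept in trainset:
        centroids.setdefault(concept, Counter()).update(embed(term))
    return centroids


def rank_concepts(term, centroids):
    # Return all concepts, most similar first.
    vec = embed(term)
    return sorted(centroids, key=lambda c: cosine(vec, centroids[c]), reverse=True)


# Hypothetical (term, concept) pairs; only the "Waste management"
# examples come from the task description.
train = [
    ("Hazardous Waste", "Waste management"),
    ("Waste Reduction Initiatives", "Waste management"),
    ("Greenhouse Gas Emissions", "Emissions"),
    ("Emissions Reduction Targets", "Emissions"),
]
centroids = build_centroids(train)
print(rank_concepts("Waste Recycling", centroids))  # ['Waste management', 'Emissions']
```

Returning a full ranking rather than a single label keeps the option of inspecting the runner-up concepts for ambiguous terms.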
Sub-task 2. Participants will be asked to design a system that automatically classifies sentences as sustainable or unsustainable, making use of the enriched taxonomy if helpful. For this purpose, participants will be given a list of carefully selected, labeled sentences from sustainability reports and other documents. In this shared task, we consider a sentence sustainable if it semantically mentions Environmental, Social or Governance related factors as defined in our ESG taxonomy.
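As a point of comparison for Sub-task 2, here is a deliberately naive keyword baseline, not the organizers' method. It assumes the (enriched) taxonomy is available as a set of term strings and labels a sentence sustainable if it contains any taxonomy term as a whole-word sequence. It ignores the semantic matching the task calls for, so it mainly illustrates how the output of Sub-task 1 could feed into Sub-task 2; a learned classifier (for example, a fine-tuned BERT model) would be a more realistic submission.

```python
import re


def tokenize(text: str) -> list:
    # Lowercase alphanumeric tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())


def is_sustainable(sentence: str, taxonomy_terms) -> bool:
    # Naive rule (assumption): sustainable if the sentence contains any
    # taxonomy term as a whole-word token sequence.
    padded = " " + " ".join(tokenize(sentence)) + " "
    for term in taxonomy_terms:
        term_tokens = tokenize(term)
        if term_tokens and (" " + " ".join(term_tokens) + " ") in padded:
            return True
    return False


# Hypothetical taxonomy terms taken from the Sub-task 1 example.
terms = {"Hazardous Waste", "Waste Reduction Initiatives"}
print(is_sustainable("The company expanded its waste reduction initiatives in 2021.", terms))  # True
print(is_sustainable("Revenue grew by 4% year on year.", terms))  # False
```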
Performance will be measured by the accuracy with which labels are assigned, and by recall (based on the total number of predictions).
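For reference, accuracy is the fraction of items whose predicted label matches the gold label. The snippet below is only an illustration with made-up labels; the official scoring scripts released with the training set define the exact computation, including the recall measure, which is not reproduced here.

```python
def accuracy(gold, predicted):
    # Fraction of items whose predicted label equals the gold label.
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Made-up labels for illustration only.
gold = ["sustainable", "unsustainable", "sustainable", "sustainable"]
pred = ["sustainable", "sustainable", "sustainable", "unsustainable"]
print(accuracy(gold, pred))  # 0.5
```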
This year, we provide a subset of our in-house ESG taxonomy and a dataset composed of financial and non-financial reports. We are particularly interested in systems that make use of contextual word embeddings such as BERT (Devlin et al. 2018), as well as systems that make use of resources related to ESG (Environmental, Social and Governance) and/or sustainability, including the EU taxonomy.
References
Daniele Alfarone and Jesse Davis (2015). “Unsupervised Learning of an IS-A Taxonomy from a Limited Domain-Specific Corpus”. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015).
Georgeta Bordea, Paul Buitelaar, Stefano Faralli and Roberto Navigli (2015). “SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)”. In Proceedings of SemEval 2015, co-located with NAACL HLT 2015, Denver, CO, USA.
Georgeta Bordea, Els Lefever and Paul Buitelaar (2016). “SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)”. In Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA.
Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli and Horacio Saggion (2018). “SemEval-2018 Task 9: Hypernym Discovery”. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA. Association for Computational Linguistics.
Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. https://arxiv.org/abs/1810.04805v2.
Jean-Philippe Fauconnier, Mouna Kamel and Bernard Rothenburger (2015). “A Supervised Machine Learning Approach for Taxonomic Relation Recognition through Non-linear Enumerative Structures”. In Proceedings of the 30th ACM Symposium on Applied Computing (SAC 2015), Salamanca, Spain, April 13-17, 2015.
David Jurgens and Mohammad Taher Pilehvar (2016). “SemEval-2016 Task 14: Semantic Taxonomy Enrichment”. In Proceedings of SemEval-2016, NAACL-HLT.
Félix Martel and Amal Zouaq (2021). “Taxonomy Extraction Using Knowledge Graph Embeddings and Hierarchical Clustering”. In SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing, March 2021, pages 836–844.
Rajdeep Sarkar, John P. McCrae and Paul Buitelaar (2018). “A Supervised Approach to Taxonomy Extraction Using Word Embeddings”. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
Vered Shwartz, Yoav Goldberg and Ido Dagan (2016). “Improving Hypernymy Detection with an Integrated Path-based and Distributional Method”. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
To register your interest in participating in the FinSim-4 shared task, please use the following Google form: https://forms.gle/aScP11s5vPSK1ghm6.
A prize of USD 500 will be awarded to the best-performing team.
Paper submission: https://easychair.org/conferences/?conf=finweb2022
Important Dates
December 22, 2021: First announcement of the shared task and beginning of registration.
January 14, 2022: Release of training set and scoring scripts.
February 16, 2022: Release of test set.
February 22, 2022: Deadline for submission of system outputs.
February 25, 2022: Release of results.
February 25, 2022: Title and abstract of shared task paper due.
March 01, 2022: Shared task paper submissions due.
March 03, 2022: Registration deadline.
March 10, 2022: Camera-ready version of shared task paper due.
April 25-26, 2022: FinWeb workshop @ WWW Conference 2022.
For any questions about the shared task, please contact us at email@example.com.