FinSim4-ESG 2022 : [FinNLP-2022] The 4th Shared Task on Learning Semantic Similarities for the Financial Domain: Extended edition to ESG insights

Link: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-2022/home?authuser=0

When	Jul 23, 2022 - Jul 29, 2022
Where	Vienna, Austria
Submission Deadline	TBD

Categories	financial data, machine learning, ESG, distributional semantics

Call For Papers

Greetings,

We would like to invite you to submit to FinSim4-ESG, the 4th shared task on Learning Semantic Similarities for the Financial Domain, extended to ESG insights. It will be held in conjunction with IJCAI-ECAI-2022 at Messe Wien, Vienna, Austria, on 23rd-25th July, 2022, as part of the FinNLP-2022 workshop.

Shared Task URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-2022/shared-task-finsim4-esg
Workshop URL: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-2022/home
Registration Form: https://docs.google.com/forms/d/1wUrFr1z9Z3Yi3XC6UAM6WwvVNd7N0V3_bTj4PVmClVU/edit

=====Introduction=====
The FinSim 2022 shared task aims to spark interest from communities in NLP, ML/AI, Knowledge Engineering and Financial document processing. Going beyond the mere representation of words is a key step towards industrial applications of Natural Language Processing (NLP). This is typically addressed using either 1) unsupervised corpus-derived representations such as word embeddings, which are typically opaque to human understanding but very useful in NLP applications, or 2) supervised approaches to learning semantic representations, which typically require a large volume of labeled data but have high coverage of the target domain, or 3) manually labeled resources such as corpora, lexica, taxonomies and ontologies, which typically have low coverage and contain inconsistencies but provide a deeper understanding of the target domain.
These approaches span a spectrum, and a number of studies have attempted to combine them, particularly in tasks that aim to expand the coverage of manual resources using automatic methods.
The SemEval community has organized several evaluation campaigns to stimulate the development of methods that extract semantic/lexical relations between concepts/words (Bordea et al. 2015, Bordea et al. 2016, Jurgens et al. 2016, Camacho-Collados et al. 2018).
A large number of datasets and challenges specifically look at how to automatically populate knowledge bases such as DBpedia or Wikidata (e.g. KBP challenges, https://tac.nist.gov/2020/KBP/SM-KBP/).
There are also a number of studies on the supervised and unsupervised approaches to the extraction of semantic relations between concepts and terms (Alfarone et al. 2015, Fauconnier et al. 2015, Shwartz et al. 2016, Sarkar et al. 2018, Martel et al. 2021).

This new edition, FinSim4-ESG, is extended to "Environment, Social and Governance (ESG)" related issues in the financial domain. According to the European Commission, from the end of 2022, companies providing investment products that make sustainability or environmental claims will be required to disclose how their portfolios align with the EU taxonomy (https://ec.europa.eu/info/business-economy-euro/banking-and-finance/sustainable-finance/eu-taxonomy-sustainable-activities_en) and ESG regulations for sustainable activities. The objective of this shared task is to elaborate an ESG taxonomy (representations of ESG related concepts) based on data such as companies' sustainability reports, annual reports and environment reports, and to use it to analyze how an economic activity complies with the taxonomy. This in turn makes it possible to assess how an investment product aligns with ESG regulations.
Keywords: distributional semantics, taxonomy enrichment, ESG taxonomy, Natural Language Processing (NLP), Machine Learning (ML)

=====Task Description=====
The new edition proposes two sub-tasks:
Sub-task 1. We have created an in-house sustainable finance taxonomy called "Fortia ESG taxonomy". It is based on the taxonomies of different financial data providers as well as several sustainability and annual reports in which we looked for ESG related criteria. Given a subset of the "Fortia ESG taxonomy" (your training set), participants will be asked to enrich this training set to cover the rest of the terms of the original "Fortia ESG taxonomy". For this purpose, participants will be given a set of annual reports and sustainability reports of financial companies, from which they can develop a model that induces terms semantically related to the concepts defined in the training set. For example, given a set of terms related to the concept Waste management (e.g. Hazardous Waste, Waste Reduction Initiatives), you need to find the missing ones by predicting the corresponding concept for each unlabeled term.
Sub-task 2. Participants will be asked to design a system that automatically classifies sentences as sustainable or unsustainable, making use of the enriched taxonomy if helpful. For this purpose, participants will be given a list of carefully selected labeled sentences from sustainability reports and other documents. In this shared task, we consider a sentence sustainable if it semantically mentions Environmental, Social or Governance related factors as defined in our ESG taxonomy.
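To make the input and output of sub-task 1 concrete, here is a minimal sketch of a term-to-concept ranker. It is not the organizers' baseline: it assumes character trigram overlap as a stand-in for semantic similarity, and it ignores the provided reports entirely. Only the two Waste management terms come from the task description; the "Board structure" concept, its terms, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(term, n=3):
    """Bag of character n-grams for a lowercased, space-padded term."""
    padded = f" {term.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(train):
    """Sum the n-gram vectors of all training terms under each concept."""
    centroids = defaultdict(Counter)
    for term, concept in train:
        centroids[concept].update(char_ngrams(term))
    return centroids


def rank_concepts(term, centroids):
    """Return all concepts, most similar to the term first."""
    vec = char_ngrams(term)
    return sorted(centroids, key=lambda c: cosine(vec, centroids[c]), reverse=True)


# Toy training set (term, concept).
TRAIN = [
    ("Hazardous Waste", "Waste management"),
    ("Waste Reduction Initiatives", "Waste management"),
    ("Board Independence", "Board structure"),  # hypothetical concept and term
    ("Board Diversity", "Board structure"),     # hypothetical concept and term
]

centroids = build_centroids(TRAIN)
# "Waste Recycling" is a made-up unlabeled term used only for illustration.
print(rank_concepts("Waste Recycling", centroids)[0])
```

Returning a ranked list rather than a single label keeps the sketch usable with either a top-1 or a rank-based evaluation. A competitive system would replace the trigram vectors with embeddings learned from the provided reports.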
Performance will be measured according to the accuracy with which labels are assigned, and according to recall (based on the total number of predictions).
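The call does not spell out the exact formulas, and the official scoring scripts remain authoritative. As an illustration only, the sketch below computes accuracy and per-label recall under their standard definitions, which is an assumption about what the organizers intend. The function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_label_recall(gold, pred):
    """For each gold label, the fraction of its items that were predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        if g == p:
            correct[g] += 1
    return {label: correct[label] / total[label] for label in total}


# Toy gold and predicted labels, made up for illustration.
gold = ["sustainable", "sustainable", "unsustainable", "unsustainable"]
pred = ["sustainable", "unsustainable", "unsustainable", "unsustainable"]

print(accuracy(gold, pred))          # 0.75
print(per_label_recall(gold, pred))  # {'sustainable': 0.5, 'unsustainable': 1.0}
```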
This year, we propose a subset of our in-house ESG taxonomy and a dataset composed of financial and non-financial reports. We are interested in systems that make use of contextual word embeddings such as BERT (Devlin et al. 2018), as well as systems that make use of resources related to ESG (Environmental, Social and Governance) and sustainability, including the EU taxonomy.
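As a point of comparison for such systems, a deliberately naive baseline for sub-task 2 could simply check whether a sentence mentions any term of the (enriched) taxonomy. The sketch below assumes exact phrase matching after normalization, which misses paraphrases and ignores context. The function names and example sentences are hypothetical, and the two terms are the ones quoted in the sub-task 1 description.

```python
import re


def normalize(text):
    """Lowercase and collapse every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def classify_sentence(sentence, taxonomy_terms):
    """Label a sentence 'sustainable' if it contains any taxonomy term as a whole phrase."""
    padded = f" {normalize(sentence)} "
    for term in taxonomy_terms:
        norm_term = normalize(term)
        if norm_term and f" {norm_term} " in padded:
            return "sustainable"
    return "unsustainable"


# Terms taken from the sub-task 1 example; a real run would use the full enriched taxonomy.
TERMS = ["Hazardous Waste", "Waste Reduction Initiatives"]

# Made-up sentences for illustration.
print(classify_sentence("We cut hazardous waste by 12% across all sites.", TERMS))
print(classify_sentence("Revenue grew in the third quarter.", TERMS))
```

Padding with spaces ensures that terms match only on word boundaries, so "waste" would not fire inside "wasteland".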

=====References=====
Daniele Alfarone and Jesse Davis (2015). Unsupervised Learning of an IS-A Taxonomy from a Limited Domain-Specific Corpus. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015).
Georgeta Bordea, Paul Buitelaar, Stefano Faralli and Roberto Navigli (2015). “SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)”. In Proceedings of SemEval 2015, co-located with NAACL HLT 2015, Denver, CO, USA.
Georgeta Bordea, Els Lefever, and Paul Buitelaar (2016). “SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)”. In Proceedings of the 10th International Workshop on Semantic Evaluation, San Diego, CA, USA.
Jose Camacho-Collados, Claudio Delli Bovi, Luis Espinosa-Anke, Sergio Oramas, Tommaso Pasini, Enrico Santus, Vered Shwartz, Roberto Navigli, and Horacio Saggion (2018). “SemEval-2018 Task 9: Hypernym Discovery”. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, United States. Association for Computational Linguistics.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. https://arxiv.org/abs/1810.04805v2.
Jean-Philippe Fauconnier, Mouna Kamel and Bernard Rothenburger (2015). A Supervised Machine Learning Approach for Taxonomic Relation Recognition through Non-linear Enumerative Structures. In 30th ACM Symposium on Applied Computing (SAC 2015), 13-17 April 2015, Salamanca, Spain.
David Jurgens and Mohammad Taher Pilehvar (2016). “SemEval-2016 Task 14: Semantic Taxonomy Enrichment”. In Proceedings of SemEval-2016, NAACL-HLT.
Félix Martel and Amal Zouaq (2021). Taxonomy extraction using knowledge graph embeddings and hierarchical clustering. In SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing, March 2021, pages 836–844.
Rajdeep Sarkar, John P. McCrae and Paul Buitelaar (2018). “A supervised approach to taxonomy extraction using word embeddings”. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
Vered Shwartz, Yoav Goldberg and Ido Dagan (2016). Improving Hypernymy Detection with an Integrated Path-based and Distributional Method. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

=====Registration=====
To register your interest in participating in the FinSim shared task, please use the following Google form: https://docs.google.com/forms/d/1wUrFr1z9Z3Yi3XC6UAM6WwvVNd7N0V3_bTj4PVmClVU/edit?usp=sharing

=====Prize=====
A USD 1,000 prize will be awarded to the best-performing teams.

=====Important Dates=====
April 04, 2022: First announcement of the shared task and beginning of registration
April 20, 2022: Second announcement of the shared task
April 20, 2022: Release of training set and scoring scripts
May 20, 2022: Release of test set
May 26, 2022: System output submission deadline
May 30, 2022: Release of results
May 30, 2022: Shared task title and abstract due
June 06, 2022: Shared task paper submissions due
June 17, 2022: Registration deadline
June 17, 2022: Camera-ready version of shared task paper due
July 23-25, 2022: FinNLP-2022 workshop @IJCAI-ECAI-2022

=====Contact=====
For any questions about the shared task, please contact us at fin.sim.task@gmail.com.

=====Shared Task Co-organizers - Fortia Financial Solutions=====
Juyeon KANG
Mehdi Kchouk
Sandra Bellato
Mei Gan
Ismail EL MAAROUF