DiLiTAL 2017 : Diversité linguistique et TAL


When Jun 26, 2017 - Jun 26, 2017
Where Orléans, France
Submission Deadline May 1, 2017
Notification Due May 20, 2017
Final Version Due Jun 26, 2017
Categories    natural language processing   poorly endowed languages

Call For Papers

Knowing how to communicate is one of the founding principles of every society. Our lives are inextricably tied to our ability to understand one another. Yet mutual understanding can seem an insurmountable problem, given our cultural and linguistic differences as well as our differing life experiences and viewpoints. In many countries people use several languages of unequal status, a situation that produces a marked disparity between languages in the Natural Language Processing (NLP) tools and resources available for them. Languages with a lower status are commonly dubbed “Poorly Endowed Languages” (PEL). More often than not, NLP research shows little interest in these languages, which is a clear handicap for them, especially in the prevailing context of globalization.

Well aware of this situation, this workshop aims to lay the ground for a reflection on how to extend NLP work to a whole range of languages beyond the supposedly well-served ones. Its main goal is to bring linguistic and cultural diversity into focus through the creation of resources and tools and through meeting the training needs of PEL specialists. This approach is in line with the various initiatives set up by major international organizations (United Nations, Council of Europe, etc.) toward the same objective.

The need to automatically process PELs stems from scientific and humanitarian needs (health, education, culture and communication). The stakes may also be political (access to information and learning). In line with the thinking shared by a number of researchers interested in PELs, DiLiTAL’s central goal is to make the scientific community aware of the stakes and obstacles involved in processing such languages. We would like to identify the main difficulties with a view to opening an epistemological reflection on how generic tools and resources can be, so that a large number of languages can be processed. Another objective is to identify the specific needs of PELs based on their particularities and typology. The workshop thus aims to shed light on the methods and techniques in use, so as to stimulate the construction and sharing of resources and to ease the transfer of knowledge and know-how.

The DiLiTAL workshop will focus on the following points:
I. Resources and corpora: production, standardization and archiving
Under this point, we would like to survey current and future efforts to collect and structure both specialized and general PEL language data. These data (lexicons, corpora, etc.) are most often used to train morpho-syntactic taggers. This training is, in turn, viewed as a preliminary stage for tackling more complex issues, such as the accessibility and portability of linguistic data, which remain serious obstacles in work on PELs. Participants’ contributions to the creation of open-source resources are therefore particularly welcome.
II. PEL processing tools
We will also assess the relevance of existing PEL tools and how these tools handle multilingualism. Since the emergence of UTF-8 (Unicode), NLP has been able to broaden its field of action research, making the processing of PELs possible. Along these lines, we intend to invest more effort in improving multilingualism management by including new writing systems and standards, while identifying the methodological constraints that hinder researchers who work on PELs with existing tools.
III. Social, cultural and ethical questions
DiLiTAL will also be an opportunity to consider the possibilities for cross-border cooperation, with a view to encouraging the exchange of experience and stimulating NLP research in countries where PELs are spoken. It is also our goal in this workshop to address the ethical dimension of collecting, archiving and processing linguistic data, which are anchored in a whole range of social contexts. Importantly, the codes and customs of these social contexts are unknown to researchers or at odds with researchers' own codes and traditions. Another key goal is to reflect on a field methodology that not only serves science but also takes into account citizens, who identify with the language they speak and with the sociocultural context in which they live. This workshop will further be an opportunity to discuss how NLP could be viewed as a driver of linguistic change.
IV. Experience reports
Any contribution that reports on experience with PEL processing is welcome. The syntheses reached by researchers who have worked on such problems will enrich the epistemological reflection needed to build tools, and will help us move beyond the framework of one particular type of language. These experiences may prove highly useful for identifying the fields to which NLP research can contribute. Whatever the language concerned, experience reports are essential for identifying where approaches converge and diverge. Any such report will therefore enrich this workshop and the action research that follows.

Topics addressed:
- PEL tools and resources: implementation and evaluation;
- Writing systems and multilingualism management;
- Methodology (roadmap) for PEL digitization;
- Processing of neologisms;
- NLP for the normalization and codification of PELs;
- etc.

