CMDLC 2023 : Computational models of diachronic language change at ICHL26
Call for abstracts
Workshop: Computational models of diachronic language change
at the International Conference on Historical Linguistics (ICHL26)
Stefania Degaetano-Ortlieb*, Lauren Fonteyn+, Marie-Pauline Krielke*, Elke Teich*
*Saarland University, +Leiden University
Submission deadline: January 1st 2023
Submission format: One-page abstract (plus references) to be sent to email@example.com
Notification of acceptance: January 12th 2023
We envisage a full-day, in-person workshop with presentations (20 min + 5-10 min discussion). After the workshop, we aim to publish a special issue in an open-access journal.
While the study of diachronic language change has long been firmly grounded in corpus data analysis, it seems fair to say that the field has undergone a ‘computational turn’ over the last decade or so. Computational models are increasingly being adopted across several research communities, including corpus and computational linguistics, computational social science, digital humanities, and historical linguistics.
The core technique for investigating diachronic change is the distributional model (DM). DMs rely on the observation that words with related meanings tend to occur in similar contexts, which allows us to study lexical-semantic change in a data-driven way (e.g. as argued by Sagi et al. 2011) and on a larger scale (e.g. as shown on the Google Books Ngram corpus by Gulordava & Baroni 2011). Besides count-based models (e.g. Hilpert & Saavedra 2020), contextualized word embeddings are increasingly employed for diachronic modeling, as such models are able to encode rich, context-sensitive information on word usage (see Lenci 2018 or Fonteyn et al. 2022 for discussion).
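To make the distributional idea concrete, the following is a minimal sketch of a count-based comparison, not the method of any of the studies cited here. It builds a context-count vector for one target word in each of two time slices and compares the vectors by cosine similarity. The two toy "corpora", the window size, and the function names context_vector and cosine_similarity are assumptions made purely for illustration.

```python
from collections import Counter
import math


def context_vector(tokens, target, window=2):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no contexts observed for the target in one slice
    return dot / (norm_a * norm_b)


# Invented toy data standing in for an earlier and a later time slice.
early = "the mouse ran from the cat and the mouse ate the cheese".split()
late = "click the mouse button and move the mouse pointer on the screen".split()

vec_early = context_vector(early, "mouse")
vec_late = context_vector(late, "mouse")

# A lower similarity suggests that the contexts of the target differ more
# between the two slices; on real data this would need far larger corpora.
similarity = cosine_similarity(vec_early, vec_late)
print(f"cosine similarity across periods: {similarity:.3f}")
```

In this toy setting, function words such as "the" dominate the raw counts. Work on real corpora would typically filter or reweight contexts (for example with an association measure) before comparing vectors.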
In previous work, DMs have been used to determine laws of semantic change (e.g. Hamilton et al. 2016b, Dubossarsky et al. 2017) as well as to develop statistical measures that help detect different types of change (e.g. specification vs. broadening; cultural change vs. linguistic change; Hamilton et al. 2016a, Del Tredici et al. 2019). DMs have also been used to map change in specific (groups of) concepts (e.g. ‘racism’, ‘knowledge’; see Sommerauer & Fokkens 2019 for a discussion). Further studies have suggested ways of improving the models that generate (diachronic) word embeddings to attain these goals (e.g. Rudolph & Blei 2018).
Existing studies and projects focus on capturing and quantifying aspects of semantic change. Yet, over the past decade, DMs have also proven useful for investigating other types of change in language use, including grammatical change. Within the computational and corpus linguistic communities, for example, Bizzoni et al. (2019, 2020) have shown an interdependency between lexical and grammatical changes, and Teich et al. (2021) use embeddings to detect (lexico-)grammatical conventionalization (which may lead to grammaticalization). Within diachronic linguistics, the use of distributional models is focused on examining the underlying functions of grammatical structures across time (e.g. Perek 2016, Hilpert & Perek 2015, Gries & Hilpert 2008, Fonteyn 2020, Budts 2020). Specifically targeting historical linguistic questions, Rodda et al. (2019) and Sprugnoli et al. (2020) have shown that computational models are promising for analyzing ancient languages, and McGillivray et al. (2022) highlight the advantages of word embeddings (vs. count-based methods) while also pointing to the challenges and limitations of these models.
A common concern across these different communities is to better understand the general principles or “laws” of language change and the underlying mechanisms (analogy, priming, processing efficiency, contextual predictability as measured by surprisal, etc.). In the proposed workshop, we want to bring together researchers from relevant communities to talk about the unique promises that computational models hold when applied to diachronic data as well as the specific challenges they involve. In doing so, we will identify common ground and explore the most pressing problems and possible solutions.
Specific questions will concern:
Model utility: How can we capture change in language use beyond lexical-semantic change, e.g. change in grammatical constructions, collocations, phraseology?
Model quality: How can we evaluate computational models of historical language stages in the absence of native-speaker ‘gold standards’? To what extent does the quality of historical and diachronic corpora affect the performance of models?
Model analytics: How do we transition from testing the reliability of models to employing them to address previously unanswered research questions on language change? How can we detect and “measure” change? What are suitable analytic procedures to interpret the output of models?
Bizzoni, Y., Degaetano-Ortlieb, S., Menzel, K., Krielke, P., and Teich, E. (2019). “Grammar and meaning: analysing the topology of diachronic word embeddings”. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, ACL, Florence, Italy, pp. 175–185.
Bizzoni, Y., Degaetano-Ortlieb, S., Fankhauser, P., and Teich, E. (2020). “Linguistic variation and change in 250 years of English scientific writing: a data-driven approach”. Frontiers in Artificial Intelligence, 3.
Budts, S. (2020). “A connectionist approach to analogy. On the modal meaning of periphrastic do in Early Modern English”. Corpus Linguistics and Linguistic Theory, 18(2), pp. 337–364.
Del Tredici, M., Fernández, R., and Boleda, G. (2019). “Short-term meaning shift: A distributional exploration.” In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies, Minneapolis, Minnesota, USA, pp. 2069–2075.
Dubossarsky, H., Weinshall, D., and Grossman, E. (2017). “Outta control: laws of semantic change and inherent biases in word representation models”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, pp. 1136–1145.
Fonteyn, L. (2020). “What about grammar? Using BERT embeddings to explore functional-semantic shifts of semi-lexical and grammatical constructions”. Computational Humanities Research CEUR-WS, pp. 257–268.
Fonteyn, L., Manjavacas, E., and Budts, S. (2022). “Exploring Morphosyntactic Variation & Change with Distributional Semantic Models”. Journal of Historical Syntax, 7(12), pp. 1–41.
Gries, S. T., and Hilpert, M. (2008). “The identification of stages in diachronic data: variability-based Neighbor Clustering”. Corpora, 3(1), pp. 59–81.
Gulordava, K., and Baroni, M. (2011). “A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus”. In Proceedings of Geometrical Models for Natural Language Semantics (GEMS), EMNLP, Edinburgh, United Kingdom, pp. 67–71.
Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016a). “Cultural shift or linguistic drift? Comparing two computational measures of semantic change”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, Austin, Texas, USA, pp. 2116–2121.
Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016b). “Diachronic word embeddings reveal statistical laws of semantic change”. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL, Berlin, Germany, pp. 1489–1501.
Hilpert, M., and Saavedra, D.C. (2020). “Using token-based semantic vector spaces for corpus-linguistic analyses: From practical applications to tests of theoretical claims”. Corpus Linguistics and Linguistic Theory, 16(2), pp. 393–424.
Hilpert, M., and Perek, F. (2015). “Meaning change in a petri dish: constructions, semantic vector spaces, and motion charts”. Linguistics Vanguard, 1(1), pp. 339–350.
Lenci, A. (2018). “Distributional Models of Word Meaning”. Annual Review of Linguistics, 4, pp. 151–171.
Perek, F. (2016). “Using distributional semantics to study syntactic productivity in diachrony: a case study”. Linguistics, 54(1), pp. 149–188.
Rodda, M.A., Probert, P., and McGillivray, B. (2019). “Vector space models of Ancient Greek word meaning, and a case study on Homer”. TAL Traitement Automatique des Langues, 60(3), pp. 63–87.
Rudolph, M., and Blei, D. (2018). “Dynamic embeddings for language evolution”. In Proceedings of the 2018 World Wide Web Conference (WWW ’18), Lyon, France, pp. 1003–1011.
Sagi, E., Kaufmann, S., and Clark, B. (2011). “Tracing semantic change with Latent Semantic Analysis”. Current Methods in Historical Semantics, 73, pp. 161–183.
Sommerauer, P., and Fokkens, A. (2019). “Conceptual Change and Distributional Semantic Models: An Exploratory Study on Pitfalls and Possibilities”. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, Florence, Italy, pp. 223–233.
Sprugnoli, R., Moretti, G., and Passarotti, M. (2020). “Building and Comparing Lemma Embeddings for Latin. Classical Latin versus Thomas Aquinas”. IJCoL. Italian Journal of Computational Linguistics, 6(6-1), pp. 29–45.
Teich, E., Fankhauser, P., Degaetano-Ortlieb, S., and Bizzoni, Y. (2021). “Less is More/More Diverse: On the Communicative Utility of Linguistic Conventionalization”. Frontiers in Communication, 5.