Tokenization 2026 : The Second Tokenization Workshop @ COLM 2026

Link: https://tokenization-workshop.github.io/

When	Oct 29, 2026 - Oct 29, 2026
Where	San Francisco, CA, USA
Submission Deadline	Jun 23, 2026
Notification Due	Jul 24, 2026

Categories	NLP, artificial intelligence, computational linguistics

Call For Papers

*First Call for Papers*

TokShop: Second Tokenization Workshop (COLM 2026) https://tokenization-workshop.github.io

**Important dates**

- Deadline for submissions is June 23, 2026, at 11:59 pm (anywhere on earth)
- Notifications of acceptance will be sent out on July 24, 2026
- Camera-ready papers will be due shortly afterward at 11:59 pm (anywhere on earth)

The workshop will take place at the Hilton Union Square in San Francisco, CA, USA on October 9, 2026.

***Workshop Description***

The Second Tokenization Workshop (TokShop) at COLM 2026 aims to bring together researchers and practitioners from across machine learning to explore tokenization in its broadest sense. We will discuss innovations, challenges, and future directions for tokenization across diverse data types and modalities.

***Call for Papers***

Topics of interest include:

- Subword Tokenization in NLP: Analysis of techniques such as BPE, WordPiece, and UnigramLM, as well as improvements for efficiency, interpretability, and adaptability.
- Multimodal Tokenization: Tokenization strategies for images, audio, video, and other modalities, including methods to align representations across different types of data.
- Multilingual Tokenization: Development of tokenizers that work robustly across languages and scripts, and investigation into failure modes tied to tokenization.
- Tokenizer Modification Post-Training: Methods for updating tokenizers after model training to boost performance and/or efficiency without retraining from scratch.
- Alternative Input Representations: Exploration of non-traditional tokenization approaches, such as byte-level, pixel-level, or patch-based representations.
- Statistical Perspectives on Tokenization: Empirical analysis of token distributions, compression properties, and correlations with model behavior.

By broadening the scope of tokenization research beyond language, this workshop seeks to foster cross-disciplinary dialogue and inspire new advances at the intersection of representation learning, data efficiency, and model design.

***Submission Guidelines***

Our author guidelines follow the COLM requirements unless otherwise specified.

- Paper submission is hosted on OpenReview: https://openreview.net/group?id=colmweb.org/COLM/2026/Workshop/TokShop#tab-your-consoles
- We accept non-archival submissions of two types:
  - Research papers (up to 9 pages, not including references or appendices)
  - Extended abstracts (up to 2 pages)
- Please use the provided LaTeX template (Style Files) for your submission. Please follow the general paper formatting guidelines for COLM, as specified in the style files.
- You may use as many pages of references and appendix as you wish, but reviewers are not required to read the appendix.
- Posting papers on preprint servers like arXiv is permitted.
- We encourage authors to discuss the limitations as well as the ethical and societal implications of their work, wherever applicable (neither is required). These sections do not count towards the page limit.
- The paper should be anonymized and uploaded to OpenReview as a single PDF.
- The review process will be double-blind.

Read more: https://tokenization-workshop.github.io/
