
DQAML 2021 : 2nd International Workshop on Data Quality Assessment for Machine Learning @ KDD 2021


When Aug 14, 2021 - Aug 18, 2021
Where Virtual
Submission Deadline May 27, 2021
Notification Due Jun 10, 2021
Categories    data quality   machine learning

Call For Papers

DQAML2021: Call for Papers

Dear Colleagues,

We invite you to submit high-quality research papers to the 2nd International Workshop on Data Quality Assessment for Machine Learning, held in conjunction with the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021), a virtual event, August 14-18, 2021.

Website :
Contact :

Important Deadlines

Submission: May 27th, 2021 (Extended Deadline)
Decisions: June 10th, 2021
Workshop: August 14-18, 2021

All deadlines are 11:59 pm UTC-12 ("Anywhere on Earth").

Workshop Description

In the past decade, AI/ML technologies have become pervasive in academia and industry, finding use in new and challenging applications. While there has been a focus on building better, smarter, and more automated ML models, little work has been done to systematically understand the challenges in the data and assess its quality issues before it is fed to an ML pipeline. Issues such as incorrect labels, synonymous categories in a categorical variable, and heterogeneity within columns, which may go undetected by the standard pre-processing modules of ML frameworks, can lead to suboptimal model performance. Although some systems are able to generate comprehensive reports detailing the ML pipeline, a lack of insight into, and explainability of, data quality issues leads data scientists to spend ~80% of their time on data preparation before employing these AutoML solutions. This is why data preparation has been called out as one of the most time-consuming steps in the AI lifecycle. Since the quality of the data is unknown at Step 0, when the data is acquired, data preparation becomes an iterative debugging process, more of an art that relies on the experience of the data scientist. Because an ML model is only as good as the training data it sees, a systematic analysis of data quality before building AI/ML models is of utmost importance.

The goal of this workshop is to attract researchers working on data acquisition, data labeling, data quality, data preparation, and AutoML to understand how detecting and remediating data issues can help build better models. With a focus on different modalities such as structured data, time series data, text data, and graph data, this workshop invites researchers from academia and industry to submit novel approaches for systematically identifying and mitigating data issues to make data AI-ready.


Methods of data assessment can change depending on the modality of the data. This workshop invites submissions on data quality assessment for different modalities: structured (or tabular) data, unstructured data (such as text, logs, and images), graph-structured (relational, network) data, time series data, spatio-temporal data, etc. We would like to explore state-of-the-art deep learning and AI concepts such as deep reinforcement learning, graph neural networks, self-supervised learning, capsule networks, and adversarial learning to address the problems of data quality assessment for ML. The following is a non-exhaustive list of topics of interest to this workshop:
- Algorithms for assessment of data quality issues relevant to ML
- Automatic remediation of data quality issues
- Human-assisted data cleaning and remediation
- Automated data cleaning workflows
- Explainability and interpretability of quality assessment
- Interactive debugging of data
- Smarter data visualisations for high dimensional data
- Evaluation techniques for data quality assessment
- Real world use cases and applications of data quality assessment
- Novel interfaces to assist human-in-the-loop intervention for interactive data cleaning
- Quality-aware representations and sampling of high dimensional data
- Representative sampling for high dimensional data
- Detection of bias and privacy breaches
- Label noise detection, explanation and incorporating feedback
- Robustness studies on noisy and low-quality data
- Handling corrupted, missing and uncertain data
- Outlier (or anomaly) detection and mitigation in data
- Addressing class imbalance in data
- Benchmarking of data preparation and cleaning systems and tools: data sets and frameworks

Submission Instructions

We solicit papers of 4 to 10 pages reporting original research, preliminary research results, case studies, proposals for new work, and position papers.

All papers will be peer reviewed in a single-blind process (i.e., author names and affiliations should be listed). If a paper is accepted, at least one of the authors must attend the workshop to present the work. Submitted papers must be written in English and formatted in the double-column standard according to the ACM Proceedings Template, Tighter Alternate style. Papers should be in PDF format and submitted via the EasyChair submission site. The workshop website will archive the published papers. Submitted papers must not have been previously published and must not be under consideration by any other conference or journal during the workshop review process.

Workshop Organizers

- Hima Patel, IBM Research AI, India
- Fuyuki Ishikawa, National Institute of Informatics, Japan
- Laure Berti-Equille, IRD, ESPACE-DEV, France
- Nitin Gupta, IBM Research AI, India
- Sameep Mehta, IBM Research AI, India
- Satoshi Masuda, IBM Research AI, Japan
- Shashank Mujumdar, IBM Research AI, India
- Shazia Afzal, IBM Research AI, India
- Srikanta Bedathur, Indian Institute of Technology Delhi, India
- Yasuharu Nishi, The University of Electro-Communications, Japan


Workshop Chairs,
