posted by user: Hassanat

MLCID 2023 : Machine Learning from Class Imbalanced Data


Link: https://www.mutah.edu.jo/Lists/NewAtMutah/Disp_Form.aspx?ID=25
 
When Jul 9, 2023 - Jul 12, 2023
Where Tunis, Tunisia
Submission Deadline Apr 15, 2023
Notification Due May 22, 2023
Final Version Due May 30, 2023
Categories    machine learning   class imbalance   resampling
 

Call For Papers

This workshop will be organized by Prof. Hassanat as part of the 28th IEEE Symposium on Computers and Communications (ISCC),
https://2023.ieee-iscc.org/ .
It will include presentations, tutorials, and panel discussions to facilitate knowledge sharing and collaboration among participants. The workshop is peer-reviewed, its papers are published as part of the symposium proceedings, and registration fees apply as per the ISCC.

Problem and scope of the workshop:

Class imbalance occurs when a training dataset contains examples from one class that vastly outnumber those from another. The class with more examples is commonly referred to as the majority class, and the class with fewer examples as the minority class. A single dataset may contain more than one majority class and more than one minority class. Machine learning models trained on such unequal training sets exhibit a prediction bias, which manifests as poor performance on the minority class(es). The bias can range from minor to severe depending on the degree of imbalance in the dataset.
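As a minimal illustration of the definition above, the degree of imbalance can be quantified by counting examples per class and taking the ratio of majority to minority counts (the label vector here is hypothetical, purely for demonstration):

```python
from collections import Counter

# Hypothetical label vector: class 0 vastly outnumbers class 1.
labels = [0] * 950 + [1] * 50

counts = Counter(labels)
majority = max(counts, key=counts.get)  # class with the most examples
minority = min(counts, key=counts.get)  # class with the fewest examples

# Imbalance ratio: majority count divided by minority count.
ratio = counts[majority] / counts[minority]
print(majority, minority, ratio)  # 0 1 19.0
```

A ratio near 1 indicates a balanced dataset; the larger the ratio, the stronger the prediction bias a naively trained model is likely to exhibit.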



This problem has grown into a substantial challenge because the minority class is frequently the crucial one: it often reflects the instances of interest, which are rare in nature or expensive to collect. This is true in contexts such as big data analytics, biometrics, gene profiling, credit card fraud detection, content-based image retrieval, disease detection, natural language processing, network security, image recognition, and anomaly detection.

There are several approaches to mitigating the class imbalance problem before training begins, such as:

· Acquiring more samples from the minority class(es) from the knowledge domain.

· Changing the loss function to assign a higher cost to misclassifying the minority class.

· Oversampling the minority class.

· Undersampling the majority class.

· Any combination of the previous approaches.
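As a concrete sketch of one of the approaches listed above, random undersampling of the majority class can be implemented in a few lines. This is a simplified illustration (the function name and interface are hypothetical, not a library API):

```python
import random

def undersample_majority(X, y, majority_label, seed=0):
    """Randomly drop majority-class examples until the classes are balanced.
    A sketch of the undersampling approach; discards potentially useful data,
    which is one of its known disadvantages."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    # Keep only as many majority examples as there are minority examples.
    kept = rng.sample(majority_idx, k=len(minority_idx))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]

X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
Xb, yb = undersample_majority(X, y, majority_label=0)
print(yb.count(0), yb.count(1))  # 10 10
```

The trade-off is visible directly: the balanced set is much smaller than the original, so information from the discarded majority examples is lost.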

Each of the aforementioned approaches has advantages and disadvantages. Oversampling is nevertheless the most commonly used approach, as evidenced by the plethora of oversampling methods developed over the last two decades. However, popularity does not necessarily imply that the oversampling approach is desirable.

According to several recent studies, oversampling methods increase the number of minority-class cases by creating new ones out of nothing, based only on their similarity to one or more of the minority's examples. This is problematic because such methods increase the probability that the learning process will overfit.
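The similarity-based generation described above can be sketched as a simplified, SMOTE-style interpolation between minority examples (this is a toy illustration, not the actual SMOTE algorithm or any library's API):

```python
import random

def smote_like(minority_points, n_new, seed=0):
    """Simplified SMOTE-style oversampling: each synthetic point is a random
    interpolation between two existing minority examples. Note that nothing
    guarantees the synthetic point's true class is the minority class."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority_points, 2)  # pick two minority examples
        t = rng.random()                       # interpolation weight in [0, 1)
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority, n_new=5)
# Every synthetic point lies on a segment between two minority examples,
# yet its true label is assumed, not verified.
```

The sketch makes the criticism tangible: each generated point is labeled "minority" purely by construction, which is exactly the unverified assumption discussed below.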

Overfitted synthetic datasets produce impressive machine-learning results on paper, but not necessarily in practice. A more serious issue with oversampling is that a generated example may exist in the real world as a member of a different class, regardless of how similar it is to the minority's examples, as demonstrated by a recent study calling for an end to oversampling class-imbalanced data.

The main issue with oversampling methods is that they assume all synthesized data belong to the minority class without providing any guarantee; some of the generated examples may actually belong to the majority class. This makes a learned model dangerous in real-world applications, especially critical ones such as security, autonomous driving, aviation safety, and medicine, where even a single unrealistic false synthetic example can cause fatal harm.

Workshop goal:

This workshop seeks creative ideas for class-imbalance learning based on the aforementioned methodologies, with the exception of the oversampling approach, because the problem remains significant and requires stronger and more practical solutions.

We anticipate that at least ten accepted papers will be presented on the workshop's designated date. The workshop's chair, Prof. Ahmad Hassanat, will give a talk about the problem and current solutions, and will lead a lively discussion of each accepted paper to encourage brainstorming around the proposed solutions and their viability.

Related Resources

DataMod 2024   12th International Symposium DataMod 2024: From Data to Models and Back
ECAI 2024   27th European Conference on Artificial Intelligence
ICMLA 2024   23rd International Conference on Machine Learning and Applications
DSIT 2024   2024 7th International Conference on Data Science and Information Technology (DSIT 2024)
IEEE-Ei/Scopus-SGGEA 2024   2024 Asia Conference on Smart Grid, Green Energy and Applications (SGGEA 2024) -EI Compendex
AIM@EPIA 2024   Artificial Intelligence in Medicine
CCBDIOT 2024   2024 3rd International Conference on Computing, Big Data and Internet of Things (CCBDIOT 2024)
AMLDS 2025   2025 International Conference on Advanced Machine Learning and Data Science
ICDM 2024   IEEE International Conference on Data Mining
BDCAT 2024   IEEE/ACM Int’l Conf. on Big Data Computing, Applications, and Technologies