3Clust 2012 : Workshop on Multi-view data, High-dimensionality, External Knowledge: Striving for a Unified Approach to Clustering (3Clust)
Call For Papers
Workshop on Multi-view data, High-dimensionality, External Knowledge: Striving for a Unified Approach to Clustering (3Clust)
In conjunction with the 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2012)
May 29 - June 1, 2012. Kuala Lumpur, Malaysia
Clustering is a key step in many data/knowledge management and mining tasks whose aim is to discover unknown relationships and/or patterns in large data sets. A considerable amount of work on data clustering has been done over the last four decades, and a myriad of methods have been proposed, focusing on different data types, proximity functions, cluster representation models, and cluster presentation. Clustering is a challenging problem due to its ill-posed nature. It is well known that off-the-shelf clustering methods may discover different patterns in the same data set, because each clustering algorithm has its own bias resulting from the optimization of a different criterion. Moreover, no ground truth is available to validate the result.
We propose a full-day workshop titled "Multi-view data, High-dimensionality, External Knowledge: Striving for a Unified Approach to Clustering (3Clust)". The purpose of this workshop is to solicit and discuss the latest advances in data clustering research for solving emerging and challenging issues concerning three major themes: (i) multi-view data, (ii) high-dimensionality, and (iii) external knowledge.
Most existing approaches to data clustering provide a single clustering solution and/or use the same (typically very large) space of attributes to represent all clusters. However, in several real-life domains, data can be explained according to different views. For instance, in genomics, multiple clustering solutions should be provided to capture the multiple functional roles of genes. In text mining, documents inherently discuss multiple topics, hence their grouping by content should reflect different informative views, which correspond to multiple (possibly alternative) clustering solutions. In the management of evolving data (streams), users may be interested in different views of the data that correspond to different information needs and that depend on a subset of the existing dimensions or on new dimensions selected over time.
The high dimensionality of the data poses an additional challenge to the clustering process. Almost all problems of practical interest are high-dimensional. Data with thousands of dimensions abound in fields and applications as diverse as bioinformatics, security and intrusion detection, and information and image retrieval. A common scenario with high-dimensional data is that several clusters exist in different subspaces composed of different combinations of features. In many real-world problems, points in a given region of the input space may cluster along a given set of dimensions, while points located in another region may form a tight group with respect to different dimensions. Each dimension could be relevant to at least one of the clusters. Multiple clustering solutions can also be hidden in projections of the data. Furthermore, in many applications, domain knowledge is available and could be used to guide the clustering process.
All the above issues have vexed researchers dealing with clustering for decades. Advances in clustering techniques include clustering ensembles, semi-supervised clustering, subspace/projective clustering, co-clustering, and multi-view clustering. Despite these advances, the literature still lacks a unified framework capable of handling the challenges posed by the ill-posed nature of clustering, the high dimensionality of the data, the often multi-faceted nature of the data, and the opportunity to exploit application-driven and user-provided knowledge. Such a framework would enable clustering solutions for a variety of tasks and applications, including data integration, topic detection and tracking, evolving data management, collaborative filtering, document classification and retrieval, Web data analysis, and social network applications.
We solicit original papers (including work in progress) that contribute to narrowing the aforementioned research gap in data clustering. In particular, we solicit approaches for solving emerging problems such as clustering ensembles, semi-supervised clustering, subspace/projective clustering, co-clustering, and multi-view clustering. Of particular interest will be papers that draw new and insightful connections between these techniques, and papers that contribute to the achievement of a unified framework that combines two or more of these techniques.
TOPICS of interest include (but are not limited to):
1. Clustering Ensembles
2. Co-clustering Ensembles
3. Subspace/Projective Clustering
4. Semi-supervised Clustering
5. Multiview/Alternative Clustering
6. Combining Clustering Ensembles/Multiview Clustering and Subspace Clustering/Co-clustering
7. Combining Clustering Ensembles/Multiview Clustering and Semi-supervised Clustering
8. Combining Subspace Clustering/Co-clustering and Semi-supervised Clustering
9. Bayesian Learning for Clustering
10. Model Selection Issues: How Many Clusters?
11. Multiview and Clustering Ensembles: How Many Clusterings?
12. Co-clustering with External Knowledge for Relational Learning
13. Probabilistic Clustering with Constraints
14. Kernels for Semi-supervised Clustering
15. Active Learning of Constraints in Clustering Ensembles
16. Clustering Ensembles for Uncertain Data Management and Mining
17. Constraint-based Clustering for Uncertain Data Management and Mining
18. Integration of Frequent Pattern Mining in (Semi-supervised) Multi-view Clustering
19. Evaluation Criteria for Multi-view Data Clustering
20. Incorporating User Feedback in Semi-supervised Clustering
Submission deadline: January 23, 2012
Acceptance notification: February 10, 2012
Camera-ready deadline: February 24, 2012
3Clust Workshop Organizers:
Department of Computer Science
George Mason University,
Fairfax, VA 22030, USA
08018 Barcelona, Spain
Department of Electronics, Computer and Systems Sciences
University of Calabria,
Rende, CS 87036, Italy
To be announced soon.
Submission Instructions and Policy:
Papers submitted to this workshop should have a maximum length of 12 pages and be formatted according to the Springer-Verlag Lecture Notes in Artificial Intelligence guidelines. Author instructions and style files can be downloaded at http://www.springer.de/comp/lncs/authors.html.
All papers (in PDF format) should be submitted via the Microsoft CMT system at https://cmt.research.microsoft.com/3CLUST2012/.
As required by the PAKDD 2012 Workshop Co-Chairs, by submitting a paper to the workshop, the authors promise that, if the paper is accepted, at least one author will attend the workshop to present the paper. The affiliations of no-show authors will receive a notification.
Each workshop will have the right to include its outstanding papers in the LNCS/LNAI post-proceedings of the PAKDD Workshops, published by Springer. Under this program, the workshop chairs will organize a review committee to select the outstanding papers from those presented at the workshop. Based on the reviews, each selected paper should be further improved for the camera-ready version. A detailed schedule of due dates for the paper selection and the collection of the camera-ready versions will be announced immediately after the workshop.
Further information will be made available on the workshop website (http://sites.google.com/site/3clust/).
If you have any questions, please contact us at firstname.lastname@example.org. Thank you.