• Aucun résultat trouvé

Clustering Evaluation in High-Dimensional Data

N/A
N/A
Protected

Academic year: 2022

Partager "Clustering Evaluation in High-Dimensional Data"

Copied!
1
0
0

Texte intégral

(1)

Clustering Evaluation in High-Dimensional Data

Miloˇs Radovanovi´ c

May 4, 2019

Abstract of Invited Presentation

Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the performance of different clustering algorithms as well as determining the optimal number of clusters in clustering algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it is often unclear which quality index to use in which case.

In this talk, we review existing clustering quality measures and evaluate them in the challenging context of high-dimensional data clustering. High-dimensional data is sparse and distances tend to concentrate, possibly affecting the applicability of various clustering quality indexes. We analyze the stability and discriminative power of a set of standard clustering quality measures with increasing data dimensionality. Our evaluation shows that the curse of dimensionality affects different clustering quality indexes in different ways, and that some are to be preferred when determining clustering quality in many dimensions.

University of Novi Sad

Références

Documents relatifs

A new method for proper assessment of cluster quality is therefore preferred so that: (i) the data set is kept intact during clustering, (ii) one is able to determine

To evaluate the clusters that we have obtained from the proposed method, together with the other baseline algorithms, we performed sample clustering us- ing an automatically

Subspace clustering techniques are mostly based on probabilistic versions of the factor analysis model and allow to classify the data in low-dimensional subspaces without reducing

It is well known that, because of the high dimensionality and sparsity of the data, co-clustering approaches, like one-sided clustering methods, tend to generate highly

These relations are called CLAS since they are assigned a primary role during clustering, they cluster connected components into clusters of possible domain topics..

These relations are called CLAS since they are assigned a primary role during clustering, they cluster connected components into clusters of possible domain topics.. The

Popular clustering methods are based on the Gaussian Mixture Model (GMM) [32] and show a disappointing behavior when the size of the dataset is too small compared to the number

Popular cluster- ing methods are based on the Gaussian mixture model and show a disappointing behavior when the size of the training dataset is too small compared to the num- ber