• Aucun résultat trouvé

Clustering Evaluation in High-Dimensional Data

N/A
N/A
Protected

Academic year: 2022

Partager "Clustering Evaluation in High-Dimensional Data"

Copied!
1
0
0

Texte intégral

(1)

Clustering Evaluation in High-Dimensional Data

Miloˇs Radovanovi´ c

May 4, 2019

Abstract of Invited Presentation

Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the performance of different clustering algorithms as well as determining the optimal number of clusters in clustering algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it is often unclear which quality index to use in which case.

In this talk, we review existing clustering quality measures and evaluate them in the challenging context of high-dimensional data clustering. High-dimensional data is sparse and distances tend to concentrate, possibly affecting the applicability of various clustering quality indexes. We analyze the stability and discriminative power of a set of standard clustering quality measures with increasing data dimensionality. Our evaluation shows that the curse of dimensionality affects different clustering quality indexes in different ways, and that some are to be preferred when determining clustering quality in many dimensions.

University of Novi Sad

Références

Documents relatifs

The standard model for lustering observations desribed through ategorial variables is the so-alled latent lass model (see.. for instane

The remainder of this paper is laid out as follows. Section 2 presents a detailed back- ground on high dimensional clustering techniques as well as details on traditional

For these close phenomenological links between trait- schizotypy and state-CHR, the current review critically assesses the value of the schizotypy construct for the prediction

Our experiments seem to show that using the clustering to partition the ob- ject space, and then executing independent supervised learnings on each created group of data objects

After having recalled the bases of model-based clustering, this article will review dimension reduction approaches, regularization-based techniques, parsimonious modeling,

Abstract: For the last three decades, the advent of technologies for massive data collection have brought deep changes in many scientific fields. What was first seen as a

This work shows that learning a representation of seismic data in order to cluster seismic events in continuous waveforms is a challenging task that can be tackled with deep