
HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data


HAL Id: hal-00717506

https://hal.archives-ouvertes.fr/hal-00717506

Submitted on 13 Jul 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Laurent Bergé, Charles Bouveyron, Stéphane Girard

To cite this version:

Laurent Bergé, Charles Bouveyron, Stéphane Girard. HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data. 1ères Rencontres R, Jul 2012, Bordeaux, France. hal-00717506

L. Bergé (a), C. Bouveyron (b) and S. Girard (c)

(a) Laboratoire GREThA, Université Bordeaux IV, laurent.berge@u-bordeaux4.fr

(b) Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, charles.bouveyron@univ-paris1.fr

(c) Equipe Mistis, INRIA Rhône-Alpes & LJK, stephane.girard@inrialpes.fr

Keywords: Model-based classification and clustering, high-dimensional data, subspaces.

This paper presents the R package HDclassif which is devoted to the clustering and the discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the idea of dimension reduction and model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA). In a similar manner, the associated clustering method is called high dimensional data clustering (HDDC) and uses the expectation-maximization algorithm for inference. In order to correctly fit the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than other model-based methods and this allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R codes allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is a free software and distributed under the general public license, as part of the R software project.
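The claim above that the constrained model needs far fewer parameters than other model-based methods can be made concrete with a small count. The sketch below is only an illustration under stated assumptions: the abstract does not give a counting formula, so the layout used here (for each group k, a subspace of dimension d_k with d_k variances inside it, a single noise variance outside it, and an orientation matrix) is assumed from the most general subspace model described in the authors' related work. The function names and the example values K = 4, p = 100, d_k = 10 are hypothetical.

```python
def full_gmm_params(K, p):
    """Free parameters of an unconstrained Gaussian mixture with K groups in dimension p."""
    means_and_proportions = K * p + (K - 1)
    covariances = K * p * (p + 1) // 2  # one full symmetric matrix per group
    return means_and_proportions + covariances


def subspace_gmm_params(p, dims):
    """Free parameters when group k lives in its own d_k-dimensional subspace.

    Assumed layout (not stated in the abstract): per group, d_k variances
    inside the subspace, one noise variance outside it, the subspace
    orientation, and the intrinsic dimension d_k itself.
    """
    K = len(dims)
    means_and_proportions = K * p + (K - 1)
    # Parameters needed to orient a d-dimensional subspace inside R^p.
    orientations = sum(d * p - d * (d + 1) // 2 for d in dims)
    # d_k subspace variances plus one noise variance per group.
    variances = sum(d + 1 for d in dims)
    # One intrinsic dimension to estimate per group.
    dimensions = K
    return means_and_proportions + orientations + variances + dimensions


if __name__ == "__main__":
    K, p, d = 4, 100, 10
    print("full Gaussian mixture:", full_gmm_params(K, p))        # 20603
    print("subspace model:", subspace_gmm_params(p, [d] * K))     # 4231
```

With these illustrative values the subspace model uses roughly five times fewer parameters than the full mixture, and the gap widens as p grows because the full covariance term is quadratic in p while the orientation term is linear in p for fixed d_k.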

The R package HDclassif (currently in version 1.2) implements these two classification methods for the clustering and the discriminant analysis of high-dimensional data. The package is available from the CRAN at http://CRAN.R-project.org/package=HDclassif.

References

[1] L. Bergé, C. Bouveyron and S. Girard, HDclassif: an R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data, Journal of Statistical Software, vol. 42
