HAL Id: hal-00717506
https://hal.archives-ouvertes.fr/hal-00717506
Submitted on 13 Jul 2012
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data
Laurent Bergé, Charles Bouveyron, Stéphane Girard
To cite this version:
Laurent Bergé, Charles Bouveyron, Stéphane Girard. HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data. 1ères Rencontres R, Jul 2012, Bordeaux, France. ⟨hal-00717506⟩
HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data
L. Bergé (a), C. Bouveyron (b) and S. Girard (c)
(a) Laboratoire GREThA, Université Bordeaux IV, laurent.berge@u-bordeaux4.fr
(b) Laboratoire SAMM, EA 4543, Université Paris 1 Panthéon-Sorbonne, charles.bouveyron@univ-paris1.fr
(c) Equipe Mistis, INRIA Rhône-Alpes & LJK, stephane.girard@inrialpes.fr
Keywords: Model-based classification and clustering, high-dimensional data, subspaces.
This paper presents the R package HDclassif, which is devoted to the clustering and the discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the idea of dimension reduction and model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA). In a similar manner, the associated clustering method is called high dimensional data clustering (HDDC) and uses the expectation-maximization algorithm for inference. In order to correctly fit the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than for other model-based methods, and this allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R codes allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is free software distributed under the General Public License, as part of the R software project.
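The abstract does not spell out the parametrization or the parameter counts. As background only, the sketch below uses the notation commonly found in the HDDA/HDDC literature; the symbols Q_k, Δ_k, a_kj, b_k, d_k and p are assumptions introduced here, not taken from this abstract. Each class covariance matrix is assumed to have d_k leading eigenvalues of its own and a single common value b_k for the remaining p − d_k directions, which is what reduces the number of free parameters.

```latex
% Background sketch; notation is assumed, not quoted from the abstract.
% Class covariance: eigenvectors Q_k, with d_k "signal" variances and one "noise" variance.
\Sigma_k = Q_k \Delta_k Q_k^{\top}, \qquad
\Delta_k = \operatorname{diag}(\underbrace{a_{k1},\dots,a_{kd_k}}_{d_k},\
                               \underbrace{b_k,\dots,b_k}_{p-d_k}), \qquad a_{kj} > b_k .

% Free covariance parameters per class under this constraint,
% compared with an unconstrained symmetric p x p matrix.
\underbrace{d_k\left(p - \tfrac{d_k+1}{2}\right)}_{\text{subspace orientation}}
+ \underbrace{d_k}_{a_{kj}} + \underbrace{1}_{b_k}
\quad\text{versus}\quad \frac{p(p+1)}{2} .

% Worked example: p = 100, d_k = 10.
10\,(100 - 5.5) + 10 + 1 = 956
\quad\text{versus}\quad \frac{100 \cdot 101}{2} = 5050 .
```

In this illustrative setting the constrained model needs roughly five times fewer covariance parameters per class, which is the kind of reduction the abstract refers to.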
The R package HDclassif (currently in version 1.2) implements these two classification methods for the clustering and the discriminant analysis of high-dimensional data. The package is available from the CRAN at http://CRAN.R-project.org/package=HDclassif.
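As a rough illustration of how the two functions named above might be called, here is a minimal sketch. It assumes the package is installed from CRAN and that the `wine` dataset shipped with it has the class label in its first column. The train/test split, the seed, the choice of K = 3 and the use of `scaling = TRUE` are illustrative choices made here, not taken from the abstract, and optional argument names may differ between package versions.

```r
# Minimal sketch, not the authors' own example.
# install.packages("HDclassif")   # one-time installation from CRAN
library(HDclassif)

data(wine)                 # assumed layout: column 1 = class label, others = features
X <- wine[, -1]
y <- wine[, 1]

# Supervised case (HDDA): learn on a random subset, predict on the rest.
set.seed(1)                              # illustrative seed
train <- sample(nrow(X), 120)            # illustrative split size
fit   <- hdda(X[train, ], y[train], scaling = TRUE)
pred  <- predict(fit, X[-train, ], y[-train])
print(table(predicted = pred$class, actual = y[-train]))

# Unsupervised case (HDDC): cluster all observations into K = 3 groups
# and compare the partition with the known labels.
clu <- hddc(X, K = 3, scaling = TRUE)
print(table(cluster = clu$class, actual = y))
```

The cluster labels returned by `hddc` are arbitrary, so the second table should be read up to a permutation of its rows.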
References
[1] L. Bergé, C. Bouveyron and S. Girard, HDclassif: an R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data, Journal of Statistical Software, vol. 42