

HAL Id: hal-02400486

https://hal.archives-ouvertes.fr/hal-02400486

Submitted on 9 Dec 2019


Gaussian Based Visualization of Gaussian and Non-Gaussian Based Clustering

Christophe Biernacki, Matthieu Marbac-Lourdelle, Vincent Vandewalle

To cite this version:

Christophe Biernacki, Matthieu Marbac-Lourdelle, Vincent Vandewalle. Gaussian Based Visualization of Gaussian and Non-Gaussian Based Clustering. SPSR 2019, Apr 2019, Bucharest, Romania. ⟨hal-02400486⟩


Gaussian Based Visualization of Gaussian and Non-Gaussian Based Clustering

M. Marbac-Lourdelle, C. Biernacki, V. Vandewalle

SPSR 2019

10-11 May 2019, Bucharest, Romania


Take home message

Data/density visualization

Traditionally: choose the form of the mapping from X to Y for user convenience
Proposal: choose the form of the density of the data on Y for cluster interpretation convenience


Outline

1 Clustering: from modeling to visualizing

2 Mapping clusters as spherical Gaussians

3 Numerical illustrations for functional data

4 Discussion


Model-based clustering: pitch¹

Data set: x = (x_1, ..., x_n), each x_i ∈ X with d_X variables (possibly mixing continuous, categorical, functional, ...)

Unknown partition into K clusters: z = (z_1, ..., z_n), with binary notation z_i = (z_{i1}, ..., z_{iK})

Statistical model: the couples (x_i, z_i) independently arise from the parametrized pdf

\underbrace{f(x_i, z_i)}_{\in \mathcal{F}} = \prod_{k=1}^{K} \left[ \pi_k f_k(x_i) \right]^{z_{ik}} \quad \Rightarrow \quad f(x_i) = \sum_{k=1}^{K} \pi_k f_k(x_i)

Estimating f: implement the MLE principle through an EM-like algorithm
Estimating K: use an information criterion such as BIC, ICL, ...
Estimating z: use the MAP principle, \hat{z}_{ik} = 1 iff k = \arg\max_\ell t_{i\ell}(\hat{f}), where

t_{ik}(f) = p(z_{ik} = 1 \mid x_i; f) = \frac{\pi_k f_k(x_i)}{\underbrace{\sum_{\ell=1}^{K} \pi_\ell f_\ell(x_i)}_{f(x_i)}}

¹ See for instance [McLachlan & Peel 2004], [Biernacki 2017]
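To make the pitch concrete, here is a minimal sketch, assuming scikit-learn and simulated data (a generic illustration, not the authors' tooling): EM fits the mixture, predict_proba returns the posterior probabilities t_ik, and their argmax gives the MAP partition ẑ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated data set x with n = 250 and d_X = 2 (illustration only)
x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (150, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)  # EM-like MLE of f
t = gmm.predict_proba(x)    # posterior probabilities t_ik(f_hat)
z_hat = t.argmax(axis=1)    # MAP partition z_hat
print(gmm.bic(x))           # an information criterion (here BIC) for choosing K
```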


Model-based clustering: poor user-friendly understanding

n or K large: poor overview of the partition ẑ

d_X large: too many parameters to grasp as a whole in f̂_k

Complex X: specific and non-trivial parameters involved in f̂_k

Visualization procedures

Aim at providing a user-friendly understanding of the mathematical clustering results


Overview of clustering visualization: individual and pdf mappings

Individual mapping: visualize x and its estimated partition ẑ

Transforms x, defined on X, into y = (y_1, ..., y_n), defined on a new space Y

M_{\text{ind}} \in \mathcal{M}_{\text{ind}}: \ x \in \mathcal{X}^n \mapsto y = M_{\text{ind}}(x) \in \mathcal{Y}^n

Many methods, depending on the definition of X: PCA, MCA, MFA, FPCA, MDS, ...
Some of them use ẑ in M_ind: LDA, mixture entropy preservation [Scrucca 2010]
Nearly always, Y = R²

The model f̂(x, z) is not taken into account; the approach is focused on x

Pdf mapping: display information relative to the f distribution

Transforms f = \sum_k \pi_k f_k \in \mathcal{F} into a new mixture g = \sum_k \pi_k g_k \in \mathcal{G}

M_{\text{pdf}} \in \mathcal{M}_{\text{pdf}}: \ f \in \mathcal{F} \mapsto g = M_{\text{pdf}}(f) \in \mathcal{G}

G is a pdf family defined on the space Y
M_pdf is often obtained as a by-product of M_ind (change-of-variables formula)
G is not a usual mixture family (Gaussian, ...) when M_ind is a nonlinear mapping
For large n, M_ind finally displays M_pdf

Often, both y and g are overlaid
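As a concrete instance of an individual mapping M_ind (a generic illustration, not a method from the talk), PCA maps x linearly onto Y = R²; a minimal sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))               # data in X = R^5 (illustration only)
y = PCA(n_components=2).fit_transform(x)    # M_ind: X^n -> Y^n with Y = R^2
```

For such a linear M_ind, the induced M_pdf keeps Gaussian mixtures Gaussian; a nonlinear M_ind generally does not, which is exactly the point made above.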


Traditional visualization strategies and proposal

Traditional strategies: controlling the mapping family M_pdf ²

\underbrace{\mathcal{G}(\mathcal{M}_{\text{pdf}})}_{\text{uncontrolled}} = \Big\{\, g : g = M_{\text{pdf}}(f), \ f \in \mathcal{F}, \ \underbrace{M_{\text{pdf}} \in \mathcal{M}_{\text{pdf}}}_{\text{controlled}} \,\Big\}

The nature of G can dramatically depend on the choice of M_pdf
It can potentially lead to very different cluster shapes!
Arguments for a traditional M_pdf: user-friendly, easy to compute
Examples: linear mappings in all PCA-like methods

Proposed strategy: controlling the pdf family G

\underbrace{\mathcal{M}_{\text{pdf}}(\mathcal{G})}_{\text{uncontrolled}} = \Big\{\, M_{\text{pdf}} : g = M_{\text{pdf}}(f), \ f \in \mathcal{F}, \ \underbrace{g \in \mathcal{G}}_{\text{controlled}} \,\Big\}

It is the reversed situation, where G is controlled instead of M_pdf
It offers the opportunity to directly impose that G be a user-friendly mixture family
Strategy M and Strategy G are both valid, but Strategy G is rarely explored!

² Similar thinking with M_ind



Spherical Gaussians as candidates

Users are usually familiar with multivariate spherical Gaussians on Y = R^{d_Y}
Thus a simple and "user-friendly" candidate g is a mixture of spherical Gaussians

g(y; \mu) = \sum_{k=1}^{K} \underbrace{\pi_k}_{\text{from } f} \, \phi_{d_Y}(y; \underbrace{\mu_k}_{?}, I)

where μ = (μ_1, ..., μ_K) and φ_{d_Y}(·; μ_k, I) is the pdf of the Gaussian distribution with expectation μ_k = (μ_{k1}, ..., μ_{k d_Y}) ∈ R^{d_Y} and covariance matrix equal to the identity I

g(·; μ) should then be linked with f in order to define a sensible G:

\mathcal{G} = \{\, g : g(\cdot; \mu), \ \mu \in \arg\min \delta(f, g(\cdot; \mu)), \ f \in \mathcal{F} \,\}
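For concreteness, a minimal sketch of evaluating such a spherical Gaussian mixture density (function and variable names are ours, not from the talk):

```python
import numpy as np
from scipy.stats import multivariate_normal

def g_density(y, pi, mu):
    """Spherical Gaussian mixture g(y; mu): proportions pi inherited from f,
    free centers mu, identity covariance matrices."""
    d = mu.shape[1]
    return sum(p * multivariate_normal.pdf(y, mean=m, cov=np.eye(d))
               for p, m in zip(pi, mu))

pi = np.array([0.4, 0.6])                  # proportions pi_k, fixed by f
mu = np.array([[0.0, 0.0], [2.5, 0.0]])    # centers mu_k, the free parameters
print(g_density(np.array([1.0, 0.0]), pi, mu))
```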


g as the “clustering twin” of f

Question: how to choose δ, since generally X ≠ Y?

Answer: in our clustering context, δ should measure the clustering ability difference

Kullback-Leibler divergence of clustering ability between f and g(·; μ)²:

\delta_{\text{KL}}(f, g(\cdot; \mu)) = \int_{\mathcal{T}} p_f(t) \ln \frac{p_f(t)}{p_g(t; \mu)} \, dt

where
p_f: pdf of the probabilities of classification t(f) = (t_i(f))_{i=1}^{n}, with t_i(f) = (t_{ik}(f))_{k=1}^{K-1}
p_g(·; μ): pdf of the probabilities of classification t(g) = (t_i(g))_{i=1}^{n}, with t_i(g) = (t_{ik}(g))_{k=1}^{K-1}
\mathcal{T} = \{\, t : t = (t_1, \dots, t_{K-1}), \ t_k > 0, \ \sum_k t_k = 1 \,\}

Thus, g should produce a distribution of the class-membership posterior probabilities similar to the one resulting from f.

A natural requirement: p_g(·; μ) and g should be linked by a one-to-one mapping
Currently not true, since rotations and/or translations are possible
It means: for one distribution f, there is a unique optimal distribution g(·; μ)
Additional constraints on g(·; μ): d_Y = K − 1, μ_K = 0, μ_kh = 0 (h > k), μ_kk ≥ 0

² p_f is the reference measure
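Computing the posterior probabilities t(g) under a candidate g is itself a one-liner; a small sketch (names are ours), using the usual log-sum-exp stabilization:

```python
import numpy as np

def posterior_t(y, pi, mu):
    """Posterior probabilities t_k(y; g) for a spherical Gaussian mixture with
    identity covariances: a softmax over log pi_k - ||y - mu_k||^2 / 2."""
    log_w = np.log(pi) - 0.5 * np.sum((y - mu) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())      # stabilized softmax
    return w / w.sum()
```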


Estimating the Gaussian centers

The Kullback-Leibler divergence δ_KL generally has no closed form
Estimate it by the following consistent (in S) Monte Carlo expression:

\hat{\delta}_{\text{KL}}(f, g(\cdot; \mu)) = - \underbrace{\frac{1}{S} \sum_{s=1}^{S} \ln p_g(t^{(s)}; \mu)}_{L(\mu;\, \mathbf{t})} + \text{cst}

with S independent draws of conditional probabilities t = (t^{(1)}, ..., t^{(S)}) from p_f

L(μ; t) is the normalized (observed-data) log-likelihood function of a mixture model
But, by construction, all the conditional probabilities are fixed in this mixture
Thus, just maximize the normalized complete-data log-likelihood L_comp(μ; t):

K = 2: this maximization is straightforward
K > 2: use a standard quasi-Newton algorithm with different random initializations, to avoid possible local optima
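The slides do not spell out L_comp, so the sketch below is only one plausible reading under the constraints above (identity covariances, μ_K = 0): each drawn t^(s) determines a pseudo-observation y^(s)(μ), because the posterior log-ratios are affine in y, and the complete-data log-likelihood is then maximized by BFGS with random restarts. All names are ours, the draws t^(s) are faked with a Dirichlet for runnability, and the triangular identifiability constraints on μ are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K = 3
pi = np.array([0.3, 0.3, 0.4])            # proportions inherited from f
# Stand-in for S draws t^(s) from p_f (a real run would use posteriors of f)
T = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=200)

def pseudo_y(t, mu):
    # With identity covariances and mu_K = 0:
    # ln(t_k/t_K) = ln(pi_k/pi_K) + mu_k @ y - ||mu_k||^2 / 2, linear in y
    A = mu[:-1]                                        # (K-1) x (K-1)
    b = (np.log(t[:-1] / t[-1]) - np.log(pi[:-1] / pi[-1])
         + 0.5 * np.sum(A ** 2, axis=1))
    return np.linalg.solve(A, b)

def neg_Lcomp(theta):
    # Normalized complete-data log-likelihood, conditional probabilities fixed
    mu = np.vstack([theta.reshape(K - 1, K - 1), np.zeros(K - 1)])
    total = 0.0
    for t in T:
        y = pseudo_y(t, mu)
        log_phi = -0.5 * np.sum((y - mu) ** 2, axis=1)  # ln phi(y; mu_k, I) + cst
        total += t @ (np.log(pi) + log_phi)
    return -total / len(T)

# Quasi-Newton with random restarts, as advocated on the slide for K > 2
runs = [minimize(neg_Lcomp, rng.normal(size=(K - 1) ** 2), method="BFGS")
        for _ in range(10)]
best = min(runs, key=lambda r: r.fun)
mu_hat = np.vstack([best.x.reshape(K - 1, K - 1), np.zeros(K - 1)])
print(mu_hat)
```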


From a multivariate to a bivariate Gaussian mixture

g is defined on R^{K−1}, but it is more convenient to be on R²
Just apply LDA on g to display this distribution on its most discriminative map
It leads to the bivariate spherical Gaussian mixture g̃:

\tilde{g}(\tilde{y}; \tilde{\mu}) = \sum_{k=1}^{K} \pi_k \, \phi_2(\tilde{y}; \tilde{\mu}_k, I),

where ỹ ∈ R², μ̃ = (μ̃_1, ..., μ̃_K) and μ̃_k ∈ R²

Use the % of inertia of the LDA to measure the quality of the mapping from g to g̃

Remark

If X = R^d and f is a Gaussian mixture with isotropic covariance matrices, then the proposed mapping is equivalent to applying an LDA to the centers of f
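A hedged sketch of this LDA step (our own illustration, with uniform proportions for simplicity): draw labelled points from the (K−1)-dimensional spherical mixture, fit an LDA, and read the % of inertia per axis from explained_variance_ratio_.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
K, S = 4, 2000
mu = rng.normal(scale=2.0, size=(K, K - 1))     # centers of g in R^{K-1}
labels = rng.integers(K, size=S)
y = mu[labels] + rng.normal(size=(S, K - 1))    # samples from g

lda = LinearDiscriminantAnalysis(n_components=2).fit(y, labels)
y_tilde = lda.transform(y)                      # most discriminative 2-D map
mu_tilde = lda.transform(mu)                    # mapped centers mu_tilde_k
print(lda.explained_variance_ratio_)            # % of inertia per axis
```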


Overall accuracy of the mapping between f and g̃

Use the following difference between the normalized entropies of f and g̃:

\delta_E(f, \tilde{g}) = - \frac{1}{\ln K} \sum_{k=1}^{K} \left[ \int_{\mathcal{X}} t_k(x; f) \ln t_k(x; f) \, dx - \int_{\mathbb{R}^2} t_k(\tilde{y}; \tilde{g}) \ln t_k(\tilde{y}; \tilde{g}) \, d\tilde{y} \right]

Such a quantity can be easily estimated by empirical values
Its meaning is particularly relevant:

δ_E(f, g̃) ≈ 0: the component overlap conveyed by g̃ (over f) is accurate
δ_E(f, g̃) ≈ 1: g̃ strongly underestimates the component overlap of f
δ_E(f, g̃) ≈ −1: g̃ strongly overestimates the component overlap of f

δ_E(f, g̃) permits evaluating the bias of the visualization
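An empirical version is immediate (a sketch with our own names, reading the integrals as expectations under f and g̃ respectively): average the row entropies of the two posterior-probability matrices and normalize their difference.

```python
import numpy as np

def mean_entropy(T, eps=1e-12):
    """Average entropy of the rows of a posterior-probability matrix T."""
    T = np.clip(T, eps, 1.0)
    return -(T * np.log(T)).sum(axis=1).mean()

def delta_E(T_f, T_g):
    """Empirical delta_E from posteriors under f (on the data x_i)
    and under g-tilde (on draws from g-tilde)."""
    K = T_f.shape[1]
    return (mean_entropy(T_f) - mean_entropy(T_g)) / np.log(K)
```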


Drawing g̃

Cluster centers: the locations of μ̃_1, ..., μ̃_K are materialized by vectors
Cluster spread: the 95% confidence level displayed by a black border
Cluster overlap: iso-probability curves of the MAP classification for different levels
Mapping accuracy: δ_E(f, g̃) and also the % of inertia per axis


Bike sharing system: data³ and model

Station occupancy data collected over the course of one month on the bike sharing system in Paris

Data collected over 5 weeks, between February 24 and March 30, 2014, on 1,189 bike stations

Functional data: station status information (available bikes/docks) downloaded every hour from the open-data APIs of the JCDecaux company

The final data set contains 1,189 loading profiles, one per station, sampled at 1,448 time points

Model: the station profiles were projected onto a basis of 25 Fourier functions
Model-based clustering of these functional data [Bouveyron et al. 2015] with the R package FunFEM [Bouveyron 2015]
10 clusters are retained

Visualization using the ClusVis R package

³ [Bouveyron et al. (2015)]


Bike sharing system: cluster of curves visualization


[Figure] The mapping of f on this graph is accurate, because δ_E(f, g̃) = −0.03


Conclusion and extensions

Conclusion

Generic method for visualizing the results of a model-based clustering
Very easy-to-understand output, since it is "Gaussian-like"
Permits visualization for any type of data, because it is only based on the probabilities of classification
Can be used after any existing model-based clustering package
The overall accuracy of the visualization is also provided

Extensions

Possibility to explore other pdf visualizations than Gaussians
However, one should keep in mind that simple visualizations are targeted
Possibility to compare pdf candidates through δ_KL or δ_E


About individual visualization

Theoretically, it is impossible to obtain an individual visualization from the pdf visualization
However, we can propose a pseudo scatter plot of x as follows:

x_i \longmapsto t_i(f) = t_i(g) \overset{\text{bijection}}{\longmapsto} y_i \in \mathbb{R}^{K-1} \overset{\text{LDA}}{\longmapsto} \tilde{y}_i \in \mathbb{R}^2

ỹ only allows visualizing the classification position of x

Caution: do not overlay pdf and individual plots, since ỹ = (ỹ_1, ..., ỹ_n) is not necessarily drawn from a Gaussian mixture
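A sketch of this pseudo scatter plot under our reading of the bijection (spherical components, identity covariances, μ_K = 0, so the posterior log-ratios are affine in y_i); names are ours:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def t_to_y(T, pi, mu):
    """Row-wise inversion t_i -> y_i in R^{K-1}, using
    ln(t_ik/t_iK) = ln(pi_k/pi_K) + mu_k @ y_i - ||mu_k||^2 / 2."""
    A = mu[:-1]                                       # (K-1) x (K-1), mu_K = 0
    B = (np.log(T[:, :-1] / T[:, -1:]) - np.log(pi[:-1] / pi[-1])
         + 0.5 * np.sum(A ** 2, axis=1))
    return np.linalg.solve(A, B.T).T                  # one y_i per row

# T: n x K posteriors t_i(f); pi, mu: parameters of g; then, for the 2-D map:
# Y = t_to_y(T, pi, mu)
# y_tilde = LinearDiscriminantAnalysis(n_components=2).fit(
#     Y, T.argmax(axis=1)).transform(Y)               # classification positions
```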


Model-based clustering: flexibility of F for complex X

Continuous data (X = R^{d_X}): multivariate Gaussian/t distributions [McNicholas 2016]

Categorical data: product of multinomial distributions [Goodman 1974]

Mixed continuous/categorical data: product of Gaussian/multinomial distributions [Moustaki & Papageorgiou 2005]

Functional data: the discriminative functional mixture [Bouveyron et al. 2015]

Network data: the Erdős-Rényi mixture [Zanghi et al. 2008]

Other kinds of data, missing data, high dimension, ...
