• Aucun résultat trouvé

A study of the manifold hypothesis for functional data by using spectral clustering

N/A
N/A
Protected

Academic year: 2022

Partager "A study of the manifold hypothesis for functional data by using spectral clustering"

Copied!
71
0
0

Texte intégral

(1)

A study of the manifold hypothesis for functional data by using spectral clustering

Julien Ah-Pine 1,2 Anne-Fran¸ coise Yao 2

1

Univ. of Lyon - ERIC Lab

2

Univ. of Clermont Auvergne - LMBP

CMStatistics 2019

London, 14th of June 2019

(2)

Outline

1 Background and Motivations

2 Spectral clustering

3 Experiments and discussion

(3)

Background and Motivations

Outline

1 Background and Motivations

2 Spectral clustering

3 Experiments and discussion

(4)

Background and Motivations

Functional Data (FD)

In many applications, observations are realization of functional data (FD) (curves, time series, signals, images,. . . ).

Functional Data Analysis (FDA) extends multivariate data analysis techniques to FD or develops specific techniques for FD, see for e.g.

[?, ?].

Objects under study are n real valued functions {x i } i=1,...,n in L 2 ([0, T ]), where T > 0.

However ∀x i , we only have p measurements {y ij } j =1,...,p at discrete time points {t j } j =1,...,p in [0, T ] and these observations are assumed to be corrupted by noise ij :

y ij = x i (t j ) + ij , ∀i , ∀j

where ij are assumed to be independent across i and j .

(5)

Background and Motivations

Functional Data Clustering (FDC)

Given {y ij } i,j find a partition of {x i } i where FD in a class are more similar to each other than to FD in other classes (see for e.g. [?]).

One possible workflow for FDC is the following one :

Represent the FD in a low-dimensional space using either : Pre-defined finite set of basis functions such as bsplines.

Data-driven finite set of basis functions such as truncated Karhunen-Loeve expansion (a.k.a. functional PCA).

Apply multivariate clustering techniques either : Assuming all FD belong to the whole low-dimensional representation space (e.g. k -means or hierarchical clustering).

Assuming that each cluster only belong to a subspace of the

representation space (e.g. subspace clustering or model-based

functional clustering techniques).

(6)

Background and Motivations

Example : Berkeley Growth data

{y ij } j =1,...,p = heights measured at different times t j .

x i = height function of individual i .

● ●● ●

●●

● ● ●● ● ● ● ●

0 5 10 15 20 25 30

80120160

Discrete observations of 2 FD

time

value

(7)

Background and Motivations

Example : Berkeley Growth data

{y ij } j =1,...,p = heights measured at different times t j . x i = height function of individual i .

● ●● ●

●●

● ● ●● ● ● ● ●

0 5 10 15 20 25 30

80120160

Discrete observations of 2 FD

time

value

0 5 10 15 20 25 30

80120160

Continuous representation of 2 FD

time

value

(8)

Background and Motivations

Example : Berkeley Growth data

{y ij } j =1,...,p = heights measured at different times t j . x i = height function of individual i .

● ●● ●

●●

● ● ●● ● ● ● ●

0 5 10 15 20 25 30

80120160

Discrete observations of 2 FD

time

value

0 5 10 15 20 25 30

80120160

Continuous representation of 2 FD

time

value

0 5 10 15 20 25 30

80120160200

Continuous representation of all FD

time

value

(9)

Background and Motivations

Example : Berkeley Growth data

{y ij } j =1,...,p = heights measured at different times t j . x i = height function of individual i .

● ●● ●

●●

● ● ●● ● ● ● ●

0 5 10 15 20 25 30

80120160

Discrete observations of 2 FD

time

value

0 5 10 15 20 25 30

80120160

Continuous representation of 2 FD

time

value

0 5 10 15 20 25 30

80120160200

Continuous representation of all FD

time

value

0 5 10 15 20 25 30

80120160200

Clustering of all FD in 2 clusters

time

value

(10)

Background and Motivations

Motivations of our study

Most of previous works do not consider that FD belong to a RKHS.

→ We want to investigate kernel methods for FDC.

E.g. of related work [?, ?].

Most of previous works only use one representation x i xor Dx i the derivative functions.

→ We want to investigate if information fusion can leverage the functional nature of the data by considering Sobolev spaces W 1,2 ([0, T ]).

E.g. of related work [?].

Most of previous works assume that FD belong to linear spaces or subspaces.

→ We want to investigate further the manifold hypothesis : FD belong to low-dimensional non-linear manifold.

E.g. of related work [?].

⇒ We investigate these points jointly and from an empirical viewpoint

using 20 benchmarks and by using spectral clustering (SC).

(11)

Background and Motivations

Motivations of our study

Most of previous works do not consider that FD belong to a RKHS.

→ We want to investigate kernel methods for FDC.

E.g. of related work [?, ?].

Most of previous works only use one representation x i xor Dx i the derivative functions.

→ We want to investigate if information fusion can leverage the functional nature of the data by considering Sobolev spaces W 1,2 ([0, T ]).

E.g. of related work [?].

Most of previous works assume that FD belong to linear spaces or subspaces.

→ We want to investigate further the manifold hypothesis : FD belong to low-dimensional non-linear manifold.

E.g. of related work [?].

⇒ We investigate these points jointly and from an empirical viewpoint

using 20 benchmarks and by using spectral clustering (SC).

(12)

Background and Motivations

Motivations of our study

Most of previous works do not consider that FD belong to a RKHS.

→ We want to investigate kernel methods for FDC.

E.g. of related work [?, ?].

Most of previous works only use one representation x i xor Dx i the derivative functions.

→ We want to investigate if information fusion can leverage the functional nature of the data by considering Sobolev spaces W 1,2 ([0, T ]).

E.g. of related work [?].

Most of previous works assume that FD belong to linear spaces or subspaces.

→ We want to investigate further the manifold hypothesis : FD belong to low-dimensional non-linear manifold.

E.g. of related work [?].

⇒ We investigate these points jointly and from an empirical viewpoint

using 20 benchmarks and by using spectral clustering (SC).

(13)

Background and Motivations

Motivations of our study

Most of previous works do not consider that FD belong to a RKHS.

→ We want to investigate kernel methods for FDC.

E.g. of related work [?, ?].

Most of previous works only use one representation x i xor Dx i the derivative functions.

→ We want to investigate if information fusion can leverage the functional nature of the data by considering Sobolev spaces W 1,2 ([0, T ]).

E.g. of related work [?].

Most of previous works assume that FD belong to linear spaces or subspaces.

→ We want to investigate further the manifold hypothesis : FD belong to low-dimensional non-linear manifold.

E.g. of related work [?].

⇒ We investigate these points jointly and from an empirical viewpoint

using 20 benchmarks and by using spectral clustering (SC).

(14)

Spectral clustering

Outline

1 Background and Motivations

2 Spectral clustering

3 Experiments and discussion

(15)

Spectral clustering

Spectral clustering (SC) in a nutshell

Methods developed in the ML community since the early 2000’s.

Capture the intrinsic geometry of the data.

Similarity, neighbor end Laplacian graphs are important concepts.

Methodology : use the spectral decomposition of the Laplacian matrix as an embedding of the graph nodes in an Euclidean space then partition the nodes using k -means.

Motivations : the eigenvalues and eigenvectors of the Laplacian encode information about the connected components (and more generally clusters) of the graph, they also provide solutions to (relaxed) graph cuts problems.

See for e.g. [?] for an introduction.

(16)

Spectral clustering

Similarity and Neighbor graphs

Similarities between objects as a weighted undirected graph G = ( V , E ) :

V = {x

1

, . . . , x

n

} is the set of nodes : objects to cluster.

E is the set of edges : pairs of objects that are similar to each other.

Edges are weighted : if (x i , x j ) ∈ E then K (x i , x j ) > 0 is the measure of the similarity.

G is represented by a weighted adjacency matrix denoted W = (w ij ) i ,j =1,...,n with :

w ij =

K (x i , x j ) if (x i , x j ) ∈ E

0 else

K is a kernel function : objects belong to an RKHS.

We can sparsify W and have a k nearest neighbor graph in order to

strengthen the manifold hypothesis.

(17)

Spectral clustering

Similarity and Neighbor graphs

Similarities between objects as a weighted undirected graph G = ( V , E ) :

V = {x

1

, . . . , x

n

} is the set of nodes : objects to cluster.

E is the set of edges : pairs of objects that are similar to each other.

Edges are weighted : if (x i , x j ) ∈ E then K (x i , x j ) > 0 is the measure of the similarity.

G is represented by a weighted adjacency matrix denoted W = (w ij ) i ,j =1,...,n with :

w ij =

K (x i , x j ) if (x i , x j ) ∈ E

0 else

K is a kernel function : objects belong to an RKHS.

We can sparsify W and have a k nearest neighbor graph in order to

strengthen the manifold hypothesis.

(18)

Spectral clustering

Similarity and Neighbor graphs

Similarities between objects as a weighted undirected graph G = ( V , E ) :

V = {x

1

, . . . , x

n

} is the set of nodes : objects to cluster.

E is the set of edges : pairs of objects that are similar to each other.

Edges are weighted : if (x i , x j ) ∈ E then K (x i , x j ) > 0 is the measure of the similarity.

G is represented by a weighted adjacency matrix denoted W = (w ij ) i ,j =1,...,n with :

w ij =

K (x i , x j ) if (x i , x j ) ∈ E

0 else

K is a kernel function : objects belong to an RKHS.

We can sparsify W and have a k nearest neighbor graph in order to

strengthen the manifold hypothesis.

(19)

Spectral clustering

Laplacian matrix and its normalization

Let D = (d ij ) i,j=1,...,n be the degree matrix defined by :

d ij =

d i if i = j 0 else with d i = P n

j=1 w ij , ∀i = 1, . . . , n.

The Laplacian matrix of G denoted L is given by : L = D − W

Its (symmetric) normalization denoted L sym is defined by : L sym = D −1/2 LD −1/2

= I − D −1/2 WD −1/2

with I the identity matrix of order n.

(20)

Spectral clustering

Properties of the normalized Laplacian matrix

Property.

L sym can be viewed as a quadratic form (that we aim at minimizing) :

f > L sym f = 1 2

n

X

i,j=1

w ij ( f i

√ d i − f j

p d j

) 2 , ∀f ∈ R n

L sym is symmetric and psd :

0 = λ 1 ≤ λ 2 ≤ . . . ≤ λ n

The multiplicity order k of the null eigenvalue is the number of

connected components of G . Let denote the latter subset of nodes

as C 1 , . . . , C k . The eigen subspace associated to λ 1 is spanned by

D 1/2 1 C

1

, . . . , D 1/2 1 C

k

where 1 C

l

is the assignment vector of C l .

(21)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(22)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(23)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(24)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(25)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(26)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(27)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0}

→ D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(28)

Spectral clustering

Illustration with a disconnected graph

V = {x 1 , x 2 , x 3 , x 4 , x 5 } E = {(x 1 , x 2 )

| {z }

2

, (x 2 , x 3 )

| {z }

3

, (x 4 , x 5 )

| {z }

2

}

x 1 x 2

x 3 x 4 x 5

2

3 2

→ W =

0 2 0 0 0

2 0 3 0 0

0 3 0 0 0

0 0 0 0 2

0 0 0 2 0

→ D =

2 0 0 0 0

0 5 0 0 0

0 0 3 0 0

0 0 0 2 0

0 0 0 0 2

→ L =

2 −2 0 0 0

−2 5 −3 0 0

0 −3 3 0 0

0 0 0 2 −2

0 0 0 −2 2

→ L

sym

=

1 −.63 0 0 0

−.63 1 −.77 0 0

0 −.77 1 0 0

0 0 0 1 −1

0 0 0 −1 1

→ Spectra of L

sym

: {2,2,1,0,0} → D

1/2

1

C1

| {z }

f01

=

 1.41 2.24 1.73 0 0

and D

1/2

1

C2

| {z }

f02

=

 0 0 0 1.41 1.41

.

(29)

Experiments and discussion

Outline

1 Background and Motivations

2 Spectral clustering

3 Experiments and discussion

(30)

Experiments and discussion

Workflow

1. Smoothing : from {y ij } i,j to {x i } i :

Basis functions are cubic bspline {φ

k

}

k=1,...,q

with q = 4 + p : x

i

(t ) = c

>i

φ(t ) =

q

X

k=1

c

ik

φ

k

(t)

where c

i

= c

i1

. . . c

iq

>

and φ(t) = φ

1

(t) . . . φ

q

(t )

>

∈ R

q

. We find c

i

as follows :

c

i

= arg min

c∈Rq p

X

j=1

(y

ij

− x

i

(t

j

))

2

+ λ Z

T

0

D

2

x

i

(t )dt

where D is the differential operator and λ is the smoothing coefficient selected in {10

−4

, 10

−3

, 10

−2

, 10

−1

, 10

0

} wrt the GCV criterion.

2. Center the {x i } i and compute derivatives {Dx i } i . 3. Compute the Gram matrix S wrt a given kernel function. 4. Perform clustering procedures.

5. Evaluate clustering outputs and compare the results.

(31)

Experiments and discussion

Workflow

1. Smoothing : from {y ij } i,j to {x i } i :

Basis functions are cubic bspline {φ

k

}

k=1,...,q

with q = 4 + p : x

i

(t ) = c

>i

φ(t ) =

q

X

k=1

c

ik

φ

k

(t)

where c

i

= c

i1

. . . c

iq

>

and φ(t) = φ

1

(t) . . . φ

q

(t )

>

∈ R

q

. We find c

i

as follows :

c

i

= arg min

c∈Rq p

X

j=1

(y

ij

− x

i

(t

j

))

2

+ λ Z

T

0

D

2

x

i

(t )dt

where D is the differential operator and λ is the smoothing coefficient selected in {10

−4

, 10

−3

, 10

−2

, 10

−1

, 10

0

} wrt the GCV criterion.

2. Center the {x i } i and compute derivatives {Dx i } i .

3. Compute the Gram matrix S wrt a given kernel function. 4. Perform clustering procedures.

5. Evaluate clustering outputs and compare the results.

(32)

Experiments and discussion

Workflow

1. Smoothing : from {y ij } i,j to {x i } i :

Basis functions are cubic bspline {φ

k

}

k=1,...,q

with q = 4 + p : x

i

(t ) = c

>i

φ(t ) =

q

X

k=1

c

ik

φ

k

(t)

where c

i

= c

i1

. . . c

iq

>

and φ(t) = φ

1

(t) . . . φ

q

(t )

>

∈ R

q

. We find c

i

as follows :

c

i

= arg min

c∈Rq p

X

j=1

(y

ij

− x

i

(t

j

))

2

+ λ Z

T

0

D

2

x

i

(t )dt

where D is the differential operator and λ is the smoothing coefficient selected in {10

−4

, 10

−3

, 10

−2

, 10

−1

, 10

0

} wrt the GCV criterion.

2. Center the {x i } i and compute derivatives {Dx i } i . 3. Compute the Gram matrix S wrt a given kernel function.

4. Perform clustering procedures.

5. Evaluate clustering outputs and compare the results.

(33)

Experiments and discussion

Workflow

1. Smoothing : from {y ij } i,j to {x i } i :

Basis functions are cubic bspline {φ

k

}

k=1,...,q

with q = 4 + p : x

i

(t ) = c

>i

φ(t ) =

q

X

k=1

c

ik

φ

k

(t)

where c

i

= c

i1

. . . c

iq

>

and φ(t) = φ

1

(t) . . . φ

q

(t )

>

∈ R

q

. We find c

i

as follows :

c

i

= arg min

c∈Rq p

X

j=1

(y

ij

− x

i

(t

j

))

2

+ λ Z

T

0

D

2

x

i

(t )dt

where D is the differential operator and λ is the smoothing coefficient selected in {10

−4

, 10

−3

, 10

−2

, 10

−1

, 10

0

} wrt the GCV criterion.

2. Center the {x i } i and compute derivatives {Dx i } i . 3. Compute the Gram matrix S wrt a given kernel function.

4. Perform clustering procedures.

5. Evaluate clustering outputs and compare the results.

(34)

Experiments and discussion

Workflow

1. Smoothing : from {y ij } i,j to {x i } i :

Basis functions are cubic bspline {φ

k

}

k=1,...,q

with q = 4 + p : x

i

(t ) = c

>i

φ(t ) =

q

X

k=1

c

ik

φ

k

(t)

where c

i

= c

i1

. . . c

iq

>

and φ(t) = φ

1

(t) . . . φ

q

(t )

>

∈ R

q

. We find c

i

as follows :

c

i

= arg min

c∈Rq p

X

j=1

(y

ij

− x

i

(t

j

))

2

+ λ Z

T

0

D

2

x

i

(t )dt

where D is the differential operator and λ is the smoothing coefficient selected in {10

−4

, 10

−3

, 10

−2

, 10

−1

, 10

0

} wrt the GCV criterion.

2. Center the {x i } i and compute derivatives {Dx i } i . 3. Compute the Gram matrix S wrt a given kernel function.

4. Perform clustering procedures.

5. Evaluate clustering outputs and compare the results.

(35)

Experiments and discussion

Kernel/Representation/Sparsification

FD are centered : P n

i=1 x i (t ) = 0, ∀t ∈ [0, T ].

Different Hilbert spaces :

00 x

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

. 11 Dx

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hDx

i

, Dx

j

i

L2

.

01 x

i

∈ W

1,2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

+ hDx

i

, Dx

j

i

L2

. Different kernel functions (RKHS) :

Linear kernel : K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

= R

T

0

x

i

(t )x

j

(t)dt Gaussian Kernel [?] : K

g

(x

i

, x

j

) = exp

kxi−xjk

2 L2

σiσj

Different sparsifications :

0 “Connected”graph : w

ij

= max(K (x

i

, x

j

), 0). 1 k nearest-neighbor graph (with k=7).

⇒ Main questions :

Does basis expansion and RKHS help ?

Does “fusing” both x

i

and Dx

i

and work in a Sobolev space help ?

Does sparsification (that emphasizes the manifold hypothesis) help ?

(36)

Experiments and discussion

Kernel/Representation/Sparsification

FD are centered : P n

i=1 x i (t ) = 0, ∀t ∈ [0, T ].

Different Hilbert spaces :

00 x

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

. 11 Dx

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hDx

i

, Dx

j

i

L2

.

01 x

i

∈ W

1,2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

+ hDx

i

, Dx

j

i

L2

.

Different kernel functions (RKHS) : Linear kernel : K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

= R

T

0

x

i

(t )x

j

(t)dt Gaussian Kernel [?] : K

g

(x

i

, x

j

) = exp

kxi−xjk

2 L2

σiσj

Different sparsifications :

0 “Connected”graph : w

ij

= max(K (x

i

, x

j

), 0). 1 k nearest-neighbor graph (with k=7).

⇒ Main questions :

Does basis expansion and RKHS help ?

Does “fusing” both x

i

and Dx

i

and work in a Sobolev space help ?

Does sparsification (that emphasizes the manifold hypothesis) help ?

(37)

Experiments and discussion

Kernel/Representation/Sparsification

FD are centered : P n

i=1 x i (t ) = 0, ∀t ∈ [0, T ].

Different Hilbert spaces :

00 x

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

. 11 Dx

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hDx

i

, Dx

j

i

L2

.

01 x

i

∈ W

1,2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

+ hDx

i

, Dx

j

i

L2

. Different kernel functions (RKHS) :

Linear kernel : K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

= R

T

0

x

i

(t )x

j

(t)dt Gaussian Kernel [?] : K

g

(x

i

, x

j

) = exp

kxi−xjk

2 L2

σiσj

Different sparsifications :

0 “Connected”graph : w

ij

= max(K (x

i

, x

j

), 0). 1 k nearest-neighbor graph (with k=7).

⇒ Main questions :

Does basis expansion and RKHS help ?

Does “fusing” both x

i

and Dx

i

and work in a Sobolev space help ?

Does sparsification (that emphasizes the manifold hypothesis) help ?

(38)

Experiments and discussion

Kernel/Representation/Sparsification

FD are centered : P n

i=1 x i (t ) = 0, ∀t ∈ [0, T ].

Different Hilbert spaces :

00 x

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

. 11 Dx

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hDx

i

, Dx

j

i

L2

.

01 x

i

∈ W

1,2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

+ hDx

i

, Dx

j

i

L2

. Different kernel functions (RKHS) :

Linear kernel : K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

= R

T

0

x

i

(t )x

j

(t)dt Gaussian Kernel [?] : K

g

(x

i

, x

j

) = exp

kxi−xjk

2 L2

σiσj

Different sparsifications :

0 “Connected”graph : w

ij

= max(K (x

i

, x

j

), 0).

1 k nearest-neighbor graph (with k=7).

⇒ Main questions :

Does basis expansion and RKHS help ?

Does “fusing” both x

i

and Dx

i

and work in a Sobolev space help ?

Does sparsification (that emphasizes the manifold hypothesis) help ?

(39)

Experiments and discussion

Kernel/Representation/Sparsification

FD are centered : P n

i=1 x i (t ) = 0, ∀t ∈ [0, T ].

Different Hilbert spaces :

00 x

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

. 11 Dx

i

∈ L

2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hDx

i

, Dx

j

i

L2

.

01 x

i

∈ W

1,2

([0, T ]), e.g. K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

+ hDx

i

, Dx

j

i

L2

. Different kernel functions (RKHS) :

Linear kernel : K

l

(x

i

, x

j

) = hx

i

, x

j

i

L2

= R

T

0

x

i

(t )x

j

(t)dt Gaussian Kernel [?] : K

g

(x

i

, x

j

) = exp

kxi−xjk

2 L2

σiσj

Different sparsifications :

0 “Connected”graph : w

ij

= max(K (x

i

, x

j

), 0).

1 k nearest-neighbor graph (with k=7).

⇒ Main questions :

Does basis expansion and RKHS help ?

Does “fusing” both x

i

and Dx

i

and work in a Sobolev space help ?

Does sparsification (that emphasizes the manifold hypothesis) help ?

(40)

Experiments and discussion

Clustering procedures

We test the different kernel/representation/sparsification using the two following clustering procedures. S is the Gram matrix.

Kernel k-means (K km) : Spectral decomposition of S.

Euclidean embedding : F = f

1

. . . f

l

(all eigenvectors associated to strictly positive eigenvalues).

Apply k -means to F. Spectral clustering (SC km) :

From S, determine W (with/without sparsification) and L

sym

. Spectral decomposition of L

sym

.

Euclidean embedding : F = f

1

. . . f

k

(k first eigenvectors associated to the lowest eigenvalues).

Normalize rows of F to have unit norms. Apply k -means to F.

Baseline : kernel k-means with linear kernel K l (x i , x j ) = hx i , x j i L

2

.

(41)

Experiments and discussion

Clustering procedures

We test the different kernel/representation/sparsification using the two following clustering procedures. S is the Gram matrix.

Kernel k-means (K km) : Spectral decomposition of S.

Euclidean embedding : F = f

1

. . . f

l

(all eigenvectors associated to strictly positive eigenvalues).

Apply k -means to F.

Spectral clustering (SC km) :

From S, determine W (with/without sparsification) and L

sym

. Spectral decomposition of L

sym

.

Euclidean embedding : F = f

1

. . . f

k

(k first eigenvectors associated to the lowest eigenvalues).

Normalize rows of F to have unit norms. Apply k -means to F.

Baseline : kernel k-means with linear kernel K l (x i , x j ) = hx i , x j i L

2

.

(42)

Experiments and discussion

Clustering procedures

We test the different kernel/representation/sparsification using the two following clustering procedures. S is the Gram matrix.

Kernel k-means (K km) : Spectral decomposition of S.

Euclidean embedding : F = f

1

. . . f

l

(all eigenvectors associated to strictly positive eigenvalues).

Apply k -means to F.

Spectral clustering (SC km) :

From S, determine W (with/without sparsification) and L

sym

. Spectral decomposition of L

sym

.

Euclidean embedding : F = f

1

. . . f

k

(k first eigenvectors associated to the lowest eigenvalues).

Normalize rows of F to have unit norms.

Apply k -means to F.

Baseline : kernel k-means with linear kernel K l (x i , x j ) = hx i , x j i L

2

.

(43)

Experiments and discussion

Clustering procedures

We test the different kernel/representation/sparsification using the two following clustering procedures. S is the Gram matrix.

Kernel k-means (K km) : Spectral decomposition of S.

Euclidean embedding : F = f

1

. . . f

l

(all eigenvectors associated to strictly positive eigenvalues).

Apply k -means to F.

Spectral clustering (SC km) :

From S, determine W (with/without sparsification) and L

sym

. Spectral decomposition of L

sym

.

Euclidean embedding : F = f

1

. . . f

k

(k first eigenvectors associated to the lowest eigenvalues).

Normalize rows of F to have unit norms.

Apply k -means to F.

Baseline : kernel k-means with linear kernel K l (x i , x j ) = hx i , x j i

L

2

.

(44)

Experiments and discussion

List of the 18 clustering models

Acronym Representation Kernel Clustering proc. Sparsif.

00 linear K km x

i

∈ L

2

([0, T ]) Linear Ker. k-means

00 gaussian K km x

i

∈ L

2

([0, T ]) Gaussian Ker. k-means 11 linear K km Dx

i

∈ L

2

([0, T ]) Linear Ker. k-means 11 gaussian K km Dx

i

∈ L

2

([0, T ]) Gaussian Ker. k-means 01 linear K km x

i

∈ W

1,2

([0, T ]) Linear Ker. k-means 01 gaussian K km x

i

∈ W

1,2

([0, T ]) Gaussian Ker. k-means

00 linear SC km 0 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

00 gaussian SC km 0 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

11 linear SC km 0 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

11 gaussian SC km 0 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

01 linear SC km 0 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. Connected

01 gaussian SC km 0 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. Connected

00 linear SC km 1 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

00 gaussian SC km 1 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

11 linear SC km 1 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

11 gaussian SC km 1 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

01 linear SC km 1 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. 7 near. neig.

01 gaussian SC km 1 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

(45)

Experiments and discussion

List of the 18 clustering models

Acronym Representation Kernel Clustering proc. Sparsif.

00 linear K km x

i

∈ L

2

([0, T ]) Linear Ker. k-means

00 gaussian K km x

i

∈ L

2

([0, T ]) Gaussian Ker. k-means

11 linear K km Dx

i

∈ L

2

([0, T ]) Linear Ker. k-means

11 gaussian K km Dx

i

∈ L

2

([0, T ]) Gaussian Ker. k-means

01 linear K km x

i

∈ W

1,2

([0, T ]) Linear Ker. k-means

01 gaussian K km x

i

∈ W

1,2

([0, T ]) Gaussian Ker. k-means

00 linear SC km 0 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

00 gaussian SC km 0 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

11 linear SC km 0 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

11 gaussian SC km 0 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

01 linear SC km 0 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. Connected

01 gaussian SC km 0 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. Connected

00 linear SC km 1 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

00 gaussian SC km 1 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

11 linear SC km 1 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

11 gaussian SC km 1 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

01 linear SC km 1 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. 7 near. neig.

01 gaussian SC km 1 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

(46)

Experiments and discussion

List of the 18 clustering models

Acronym Representation Kernel Clustering proc. Sparsif.

00 linear K km x

i

∈ L

2

([0, T ]) Linear Ker. k-means

00 gaussian K km x

i

∈ L

2

([0, T ]) Gaussian Ker. k-means

11 linear K km Dx

i

∈ L

2

([0, T ]) Linear Ker. k-means

11 gaussian K km Dx

i

∈ L

2

([0, T ]) Gaussian Ker. k-means

01 linear K km x

i

∈ W

1,2

([0, T ]) Linear Ker. k-means

01 gaussian K km x

i

∈ W

1,2

([0, T ]) Gaussian Ker. k-means

00 linear SC km 0 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

00 gaussian SC km 0 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

11 linear SC km 0 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

11 gaussian SC km 0 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

01 linear SC km 0 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. Connected

01 gaussian SC km 0 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. Connected

00 linear SC km 1 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

00 gaussian SC km 1 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

11 linear SC km 1 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

11 gaussian SC km 1 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

01 linear SC km 1 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. 7 near. neig.

01 gaussian SC km 1 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

(47)

Experiments and discussion

List of the 18 clustering models

Acronym Representation Kernel Clustering proc. Sparsif.

00 linear K km x

i

∈ L

2

([0, T ]) Linear Ker. k-means 00 gaussian K km x

i

∈ L

2

([0, T ]) Gaussian Ker. k-means 11 linear K km Dx

i

∈ L

2

([0, T ]) Linear Ker. k-means 11 gaussian K km Dx

i

∈ L

2

([0, T ]) Gaussian Ker. k-means 01 linear K km x

i

∈ W

1,2

([0, T ]) Linear Ker. k-means 01 gaussian K km x

i

∈ W

1,2

([0, T ]) Gaussian Ker. k-means

00 linear SC km 0 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

00 gaussian SC km 0 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

11 linear SC km 0 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected

11 gaussian SC km 0 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected

01 linear SC km 0 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. Connected

01 gaussian SC km 0 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. Connected

00 linear SC km 1 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

00 gaussian SC km 1 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

11 linear SC km 1 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

11 gaussian SC km 1 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

01 linear SC km 1 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. 7 near. neig.

01 gaussian SC km 1 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

(48)

Experiments and discussion

List of the 18 clustering models

Acronym Representation Kernel Clustering proc. Sparsif.

00 linear K km x

i

∈ L

2

([0, T ]) Linear Ker. k-means 00 gaussian K km x

i

∈ L

2

([0, T ]) Gaussian Ker. k-means 11 linear K km Dx

i

∈ L

2

([0, T ]) Linear Ker. k-means 11 gaussian K km Dx

i

∈ L

2

([0, T ]) Gaussian Ker. k-means 01 linear K km x

i

∈ W

1,2

([0, T ]) Linear Ker. k-means 01 gaussian K km x

i

∈ W

1,2

([0, T ]) Gaussian Ker. k-means

00 linear SC km 0 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected 00 gaussian SC km 0 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected 11 linear SC km 0 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. Connected 11 gaussian SC km 0 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. Connected 01 linear SC km 0 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. Connected 01 gaussian SC km 0 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. Connected

00 linear SC km 1 x

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

00 gaussian SC km 1 x

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

11 linear SC km 1 Dx

i

∈ L

2

([0, T ]) Linear Spectral Clust. 7 near. neig.

11 gaussian SC km 1 Dx

i

∈ L

2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

01 linear SC km 1 x

i

∈ W

1,2

([0, T ]) Linear Spectral Clust. 7 near. neig.

01 gaussian SC km 1 x

i

∈ W

1,2

([0, T ]) Gaussian Spectral Clust. 7 near. neig.

Références

Documents relatifs

Using Matlab, we first generate random graphs with an exact group structure. We choose different number of blocks and different sizes for the blocks. Then, we add a noise on

Antibiotics delivery by oral gavage twice a day or in drinking water induces in contrast a robust and consistent depletion of mouse fecal bacteria, as soon as 4 days of treatment,

In both experiments, plots were sown with different numbers of species to unravel mechanisms underlying the relationship between biodiversity and ecosystem functioning (BEF).

This is also consistent with predictions from fluid-dynamic models for time required to generate a zoned phonolite magma chamber with the dimensions of the Laacher See system (Tait

In this thesis, with addressing text clustering tasks as our focus, we elaborate on our contributions on toward scalable hierarchical clustering methods, tests on the cluster

‫ﺧﺎﺗﻤﺔ اﻟﻔﺼﻞ ‪:‬‬ ‫ﻤن ﺨﻼل دراﺴﺔ ﻓﻌﺎﻝﻴﺔ اﻝﺴﻴﺎﺴﺔ اﻝﻨﻘدﻴﺔ وﻗدرة ﺘﺄﺜﻴرﻫﺎ ﻋﻠﻰ ﻨظم اﻝﺼرف ﻓﻲ اﻝﺠزاﺌر وذﻝك ﻓﻲ ﻤﻌﺎﻝﺠﺔ‬ ‫اﻹﺨﺘﻼﻻت اﻻﻗﺘﺼﺎدﻴﺔ‪ ،‬ﺒﺎﻹﻀﺎﻓﺔ إﻝﻰ ﺘﻔﺎﻋل ﺒﻴن

Bartlett’s test for the equality of variances is used since long for determining the number of factors to retain in the context of a Principal Component Analysis (PCA) [3]. We show

Since the original Euclidean distance is not suit for complex manifold data structure, we design a manifold similarity measurement and proposed a K-AP clustering algorithm based