Academic year: 2022

TP7: KERNEL PRINCIPAL COMPONENT ANALYSIS

COURS D'APPRENTISSAGE, ÉCOLE NORMALE SUPÉRIEURE

Raphaël Berthier

This exercise sheet was composed using [1, Exercises 1.6.4, 9.5.6].

1. Plain Principal Component Analysis (PCA)

Let $X_1, \dots, X_n \in \mathbb{R}^p$ be a sequence of samples. The goal of PCA is to compute a low-dimensional representation of the data. For a given dimension $d \le p$, PCA computes a subspace $V_d$ such that

$$V_d \in \operatorname*{argmin}_{V \subset \mathbb{R}^p:\ \dim(V) \le d} \ \sum_{i=1}^{n} \|X_i - \mathrm{Proj}_V X_i\|_2^2, \tag{1}$$

where $\mathrm{Proj}_V$ denotes the orthogonal projection onto $V$. Let us denote by $\mathbf{X}$ the $n \times p$ data matrix, whose rows are $X_1, \dots, X_n$, and by $\mathbf{X} = \sum_{k=1}^{r} \sigma_k u_k v_k^T$ its singular value decomposition (SVD), with $\sigma_1 \ge \dots \ge \sigma_r > 0$.

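We have the following theorem.

Theorem 1. Let $d \le r$. We have

$$\min_{Y:\ \mathrm{rank}(Y) \le d} \|Y - \mathbf{X}\|_F^2 = \sum_{k=d+1}^{r} \sigma_k^2,$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\|Z\|_F^2 = \mathrm{tr}(ZZ^T) = \sum_{i,j} Z_{i,j}^2$. Moreover, the minimum is reached if

$$Y = \sum_{k=1}^{d} \sigma_k u_k v_k^T.$$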
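Theorem 1 can be checked numerically on a small example. The sketch below assumes NumPy; the matrix sizes, the random seed and the variable names are arbitrary illustrative choices, not part of the exercise sheet.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 8, 5, 2                      # illustrative sizes only
X = rng.standard_normal((n, p))

# Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rank-d truncation Y = sum_{k <= d} sigma_k u_k v_k^T.
Y = (U[:, :d] * s[:d]) @ Vt[:d, :]

truncation_error = np.linalg.norm(Y - X, "fro") ** 2
tail_energy = np.sum(s[d:] ** 2)       # sum_{k > d} sigma_k^2

# Some other matrix of rank at most d, for comparison.
Z = rng.standard_normal((n, d)) @ rng.standard_normal((d, p))
random_error = np.linalg.norm(Z - X, "fro") ** 2
```

Here `truncation_error` and `tail_energy` should agree up to rounding, and `random_error` should not be smaller than either. This is a sanity check, not a proof.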

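1) Check that

$$\sum_{i=1}^{n} \|X_i - \mathrm{Proj}_V X_i\|_2^2 = \|\mathbf{X} - \mathbf{X}\,\mathrm{Proj}_V\|_F^2.$$

Conclude that $V_d = \mathrm{Span}\{v_1, \dots, v_d\}$ minimizes (1).

2) Prove that the coordinates of $\mathrm{Proj}_{V_d} X_i$ in the orthonormal basis $(v_1, \dots, v_d)$ of $V_d$ are given by $(\sigma_1 \langle e_i, u_1 \rangle, \dots, \sigma_d \langle e_i, u_d \rangle)$, where $(e_1, \dots, e_n)$ denotes the canonical basis of $\mathbb{R}^n$.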

The right-singular vectors $v_1, \dots, v_d \in \mathbb{R}^p$ are called the principal axes. They represent the directions along which the data varies the most. The vectors $c_k = \mathbf{X} v_k = \sigma_k u_k \in \mathbb{R}^n$, $k = 1, \dots, d$, are called the principal components. The principal component $c_k$ gathers the coordinates of $X_1, \dots, X_n$ on $v_k$.

Note that in practice, since $V_d$ is a linear span and not an affine span, it is highly recommended to first center the data points, $\tilde{X}_i = X_i - \frac{1}{n} \sum_{j=1}^{n} X_j$, before applying PCA.
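The definitions above translate into a few lines of NumPy. The following is a minimal sketch: the function name `pca`, the synthetic data and the seed are hypothetical choices made for illustration, and the centering step follows the recommendation just stated.

```python
import numpy as np

def pca(X, d):
    """Plain PCA of the rows of X after centering.

    Returns
    -------
    axes : (p, d) array whose columns are the principal axes v_1, ..., v_d.
    components : (n, d) array whose columns are c_k = X v_k.
    sigma_u : (n, d) array whose columns are sigma_k u_k.
    """
    Xc = X - X.mean(axis=0)                           # center the data points
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:d].T                                   # right-singular vectors
    components = Xc @ axes                            # coordinates on the axes
    sigma_u = U[:, :d] * s[:d]
    return axes, components, sigma_u

rng = np.random.default_rng(1)
# Synthetic data with very different variances per direction, plus an offset.
X = rng.standard_normal((20, 4)) @ np.diag([3.0, 1.0, 0.3, 0.1]) + 5.0
axes, components, sigma_u = pca(X, d=2)
```

Row $i$ of `components` holds the coordinates of the projected (centered) point $i$ in the basis of principal axes, and `components` should coincide with `sigma_u`, in line with $c_k = \mathbf{X} v_k = \sigma_k u_k$.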

2. Kernel principal component analysis

The Reproducing Kernel Hilbert Space (RKHS) framework allows us to "delinearize" some linear algorithms.

We show here how it can be applied to PCA. Let us consider that we have $n$ points $X_1, \dots, X_n \in \mathcal{X}$. Note that in the kernel setting, $\mathcal{X}$ does not need to be a vector space anymore. Let $\phi : \mathcal{X} \to \mathcal{F}$ be a mapping from $\mathcal{X}$ to some RKHS $\mathcal{F}$ associated to some positive definite kernel $k$ on $\mathcal{X}$. The principle of kernel PCA is to perform a PCA on the points $\phi(X_1), \dots, \phi(X_n)$ without computing the points $\phi(X_1), \dots, \phi(X_n)$ explicitly.
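To make "without computing $\phi(X_i)$ explicitly" concrete, here is a small sketch with one kernel whose feature map happens to be finite-dimensional. The choice of the kernel $k(x, y) = (x \cdot y)^2$, the function names `phi` and `k`, and the random points are assumptions made for illustration; the Euclidean space $\mathbb{R}^{9}$ used here only stands in for $\mathcal{F}$.

```python
import numpy as np

def phi(x):
    # Explicit feature map: the flattened outer product x x^T (dimension p^2).
    return np.outer(x, x).ravel()

def k(x, y):
    # The same inner product, computed without forming phi(x) or phi(y).
    return float(np.dot(x, y)) ** 2

rng = np.random.default_rng(2)
points = rng.standard_normal((4, 3))   # four illustrative points in R^3

# Gram matrix from explicit features, then from kernel evaluations only.
gram_explicit = np.array([[phi(x) @ phi(y) for y in points] for x in points])
gram_kernel = np.array([[k(x, y) for y in points] for x in points])
```

Both Gram matrices should agree up to rounding. For kernels such as the Gaussian kernel no finite-dimensional `phi` is available, and only the second computation remains possible.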

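Let $d$ be a fixed positive integer. We seek a subspace $\mathcal{V}_d \subset \mathcal{F}$ such that

$$\mathcal{V}_d \in \operatorname*{argmin}_{\mathcal{V} \subset \mathcal{F}:\ \dim(\mathcal{V}) \le d} \ \sum_{i=1}^{n} \|\phi(X_i) - \mathrm{Proj}_{\mathcal{V}}\, \phi(X_i)\|_{\mathcal{F}}^2,$$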


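where $\mathrm{Proj}_{\mathcal{V}}$ denotes the orthogonal projection onto $\mathcal{V}$ in the Hilbert space $\mathcal{F}$. In the following, we denote by $\mathcal{L}$ the linear map

$$\mathcal{L} : \mathbb{R}^n \to \mathcal{F}, \qquad \alpha \mapsto \mathcal{L}\alpha = \sum_{i=1}^{n} \alpha_i\, \phi(X_i).$$

3) Prove that $\mathcal{V}_d = \mathcal{L} V_d$, with $V_d$ fulfilling

$$V_d \in \operatorname*{argmin}_{V \subset \mathbb{R}^n:\ \dim(V) \le d} \ \sum_{i=1}^{n} \|\phi(X_i) - \mathrm{Proj}_{\mathcal{L}V}\, \phi(X_i)\|_{\mathcal{F}}^2. \tag{2}$$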

4) We denote by $K$ the $n \times n$ matrix with entries $K_{i,j} = k(X_i, X_j)$ for $i, j = 1, \dots, n$. We assume in the following that $K$ is non-singular. Prove that for any $\alpha \in \mathbb{R}^n$, $\|\mathcal{L} K^{-1/2} \alpha\|_{\mathcal{F}}^2 = \|\alpha\|_2^2$.

5) Let $V$ be a subspace of $\mathbb{R}^n$ of dimension $d$ and denote by $(b_1, \dots, b_d)$ an orthonormal basis of the linear span $K^{1/2} V$. Prove that $(\mathcal{L} K^{-1/2} b_1, \dots, \mathcal{L} K^{-1/2} b_d)$ is an orthonormal basis of $\mathcal{L} V$.

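6) Prove the identities

$$\begin{aligned}
\mathrm{Proj}_{\mathcal{L}V}\, \mathcal{L}\alpha &= \sum_{k=1}^{d} \langle \mathcal{L} K^{-1/2} b_k, \mathcal{L}\alpha \rangle_{\mathcal{F}}\ \mathcal{L} K^{-1/2} b_k \\
&= \mathcal{L} K^{-1/2}\, \mathrm{Proj}_{K^{1/2}V}\, K^{1/2} \alpha.
\end{aligned}$$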

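7) Let us denote by $(e_1, \dots, e_n)$ the canonical basis of $\mathbb{R}^n$. Check that

$$\begin{aligned}
\sum_{i=1}^{n} \|\phi(X_i) - \mathrm{Proj}_{\mathcal{L}V}\, \phi(X_i)\|_{\mathcal{F}}^2
&= \sum_{i=1}^{n} \|\mathcal{L} e_i - \mathcal{L} K^{-1/2}\, \mathrm{Proj}_{K^{1/2}V}\, K^{1/2} e_i\|_{\mathcal{F}}^2 \\
&= \sum_{i=1}^{n} \|K^{1/2} e_i - \mathrm{Proj}_{K^{1/2}V}\, K^{1/2} e_i\|_2^2 \\
&= \|K^{1/2} - \mathrm{Proj}_{K^{1/2}V}\, K^{1/2}\|_F^2.
\end{aligned}$$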

8) Using Theorem 1, show that $V_d = \mathrm{Span}\{v_1, \dots, v_d\}$, where $v_1, \dots, v_d$ are orthonormal eigenvectors of $K$ associated to the $d$ largest eigenvalues of $K$, is a minimizer of (2).

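9) We set $f_k = \mathcal{L} K^{-1/2} v_k$ for $k = 1, \dots, d$. Check that $(f_1, \dots, f_d)$ is an orthonormal basis of $\mathcal{V}_d$ and

$$\mathrm{Proj}_{\mathcal{V}_d}\, \phi(X_i) = \sum_{k=1}^{d} \langle v_k, K^{1/2} e_i \rangle\, f_k.$$

Thus, in the basis $(f_1, \dots, f_d)$, the coordinates of the orthogonal projection of the point $\phi(X_i)$ onto $\mathcal{V}_d$ are $(\langle v_1, K^{1/2} e_i \rangle, \dots, \langle v_d, K^{1/2} e_i \rangle)$.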
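The coordinate formula can be sanity-checked whenever the feature map is available in closed form. The sketch below assumes the kernel $k(x, y) = (x \cdot y)^2$ with explicit features $\phi(x) = \mathrm{vec}(x x^T)$, a choice made here purely for illustration. The sizes, seed and variable names are hypothetical, and the Gram matrix is assumed non-singular, as in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 2
points = rng.standard_normal((n, 3))

# Explicit features: row i of Phi is phi(X_i) = vec(X_i X_i^T) in R^9.
Phi = np.stack([np.outer(x, x).ravel() for x in points])
K = (points @ points.T) ** 2            # Gram matrix, K_ij = k(X_i, X_j)

lam, V = np.linalg.eigh(K)              # ascending eigenvalues, orthonormal columns
lam, V = lam[::-1], V[:, ::-1]          # reorder so that lam[0] is the largest
K_half = (V * np.sqrt(lam)) @ V.T       # K^{1/2}
K_inv_half = (V / np.sqrt(lam)) @ V.T   # K^{-1/2}

# f_k = L K^{-1/2} v_k, stored as the columns of F (shape 9 x d).
# Here L alpha = sum_i alpha_i phi(X_i) is the matrix product Phi.T @ alpha.
F = Phi.T @ K_inv_half @ V[:, :d]

# Coordinates <phi(X_i), f_k> computed in feature space ...
coords_feature = Phi @ F
# ... and from the Gram matrix alone: entry (i, k) is <v_k, K^{1/2} e_i>.
coords_gram = K_half @ V[:, :d]
```

The columns of `F` should be orthonormal, and `coords_feature` should match `coords_gram` up to rounding. The second computation never touches `Phi`.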

The take-home message is that kernel PCA can be computed in the original space $\mathcal{X}$ through an eigenvalue decomposition of the Gram matrix $K$.
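A minimal sketch of the resulting procedure, assuming NumPy. The function names `kernel_pca_coordinates` and `gaussian_gram`, the bandwidth and the data are hypothetical. As in the derivation above, no centering is performed in feature space. Since $K^{1/2} v_k = \sqrt{\lambda_k}\, v_k$ for an eigenpair $(\lambda_k, v_k)$ of $K$, the coordinate $\langle v_k, K^{1/2} e_i \rangle$ reduces to $\sqrt{\lambda_k}$ times the $i$-th entry of $v_k$.

```python
import numpy as np

def kernel_pca_coordinates(K, d):
    """Coordinates of the projected points in the basis (f_1, ..., f_d).

    Row i holds (<v_1, K^{1/2} e_i>, ..., <v_d, K^{1/2} e_i>), computed as
    sqrt(lam_k) * v_k[i] for the d largest eigenvalues lam_k of K.
    """
    lam, V = np.linalg.eigh(K)           # ascending eigenvalues
    top = np.argsort(lam)[::-1][:d]      # indices of the d largest eigenvalues
    return V[:, top] * np.sqrt(lam[top])

def gaussian_gram(X, bandwidth=1.0):
    # Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 7))

# With the linear kernel k(x, y) = x . y, kernel PCA should reproduce the
# plain (uncentered) PCA coordinates sigma_k <e_i, u_k>, up to the sign of
# each eigenvector.
coords_linear = kernel_pca_coordinates(X @ X.T, d=2)
U, s, _ = np.linalg.svd(X, full_matrices=False)
coords_pca = U[:, :2] * s[:2]

# A nonlinear example, for which no explicit feature map is used.
coords_gauss = kernel_pca_coordinates(gaussian_gram(X), d=2)
```

The comparison with the linear kernel ties the two sections together: the eigenvectors of $\mathbf{X}\mathbf{X}^T$ are the left-singular vectors $u_k$ and its eigenvalues are $\sigma_k^2$.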

References

[1] Christophe Giraud. Introduction to High-Dimensional Statistics. Chapman and Hall/CRC, 2014.
