
TP7: KERNEL PRINCIPAL COMPONENT ANALYSIS

MACHINE LEARNING COURSE, ÉCOLE NORMALE SUPÉRIEURE

Raphaël Berthier, raphael.berthier@inria.fr. This exercise sheet was composed using [1, Exercises 1.6.4, 9.5.6].

1. Plain Principal Component Analysis (PCA)

Let $X_1, \dots, X_n \in \mathbb{R}^p$ be a sequence of samples. The goal of PCA is to compute a low-dimensional representation of the data. For a given dimension $d \le p$, the PCA computes the subspace $V_d$ such that

(1)   $V_d \in \operatorname*{argmin}_{V \subset \mathbb{R}^p : \dim(V) \le d} \ \sum_{i=1}^n \| X_i - \mathrm{Proj}_V X_i \|_2^2$,

where $\mathrm{Proj}_V$ denotes the orthogonal projection onto $V$. Let us denote by $X$ the data matrix, whose rows are $X_1, \dots, X_n$, and by $X = \sum_{k=1}^r \sigma_k u_k v_k^T$ its singular value decomposition (SVD), with $\sigma_1 \ge \dots \ge \sigma_r > 0$.

We have the following theorem.

Theorem 1. Let $d \le r$. We have
$\min_{Y : \mathrm{rank}(Y) \le d} \| Y - X \|_F^2 = \sum_{k=d+1}^r \sigma_k^2$,
where $\| \cdot \|_F$ denotes the Frobenius norm of a matrix, $\| Z \|_F^2 = \mathrm{tr}(Z Z^T) = \sum_{i,j} Z_{i,j}^2$. Moreover, the minimum is reached for
$Y = \sum_{k=1}^d \sigma_k u_k v_k^T$.
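As a quick numerical illustration of Theorem 1, the following NumPy sketch (with arbitrary random data and dimensions) checks that the rank-$d$ truncation of the SVD attains the stated Frobenius error.

import numpy as np

rng = np.random.default_rng(0)
n, p, d = 8, 5, 2
X = rng.standard_normal((n, p))                    # data matrix with rows X_1, ..., X_n
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = sum_k sigma_k u_k v_k^T

# Best rank-d approximation: keep the d largest singular values
Y = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

# Theorem 1: ||Y - X||_F^2 equals the sum of the squared tail singular values
print(np.linalg.norm(Y - X, "fro") ** 2, np.sum(s[d:] ** 2))   # the two numbers agree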

1) Check that
$\sum_{i=1}^n \| X_i - \mathrm{Proj}_V X_i \|_2^2 = \| X - X \, \mathrm{Proj}_V \|_F^2$.
Conclude that $V_d = \mathrm{Span}\{ v_1, \dots, v_d \}$ minimizes (1).

2) Prove that the coordinates of $\mathrm{Proj}_{V_d} X_i$ in the orthonormal basis $(v_1, \dots, v_d)$ of $V_d$ are given by $(\sigma_1 \langle e_i, u_1 \rangle, \dots, \sigma_d \langle e_i, u_d \rangle)$.

The right-singular vectors $v_1, \dots, v_d \in \mathbb{R}^p$ are called the principal axes. They represent the directions along which the data varies the most. The vectors $c_k = X v_k = \sigma_k u_k \in \mathbb{R}^n$, $k = 1, \dots, d$, are called the principal components. The principal component $c_k$ gathers the coordinates of $X_1, \dots, X_n$ on $v_k$.

Note that in practice, since $V_d$ is a linear span and not an affine span, it is highly recommended to first center the data points, $\tilde{X}_i = X_i - \frac{1}{n} \sum_{j=1}^n X_j$, before applying the PCA.
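This recipe (centering, then an SVD of the centered data matrix) can be sketched in a few lines of NumPy; the function and variable names below are illustrative.

import numpy as np

def pca(X, d):
    """Plain PCA of the rows of X: returns the d principal axes and principal components."""
    Xc = X - X.mean(axis=0)                        # center the data points
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:d]                                  # principal axes v_1, ..., v_d (as rows)
    components = U[:, :d] * s[:d]                  # principal components c_k = sigma_k u_k
    return axes, components

# The principal components are the coordinates of the centered points on the principal axes
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
axes, comps = pca(X, d=2)
print(np.allclose(comps, (X - X.mean(axis=0)) @ axes.T))   # True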

2. Kernel Principal Component Analysis

The Reproducing Kernel Hilbert Space (RKHS) framework allows us to “delinearize” some linear algorithms.

We show here how it can be applied to PCA. Let us consider that we have $n$ points $X_1, \dots, X_n \in \mathcal{X}$. Note that in the kernel setting, $\mathcal{X}$ does not need to be a vector space anymore. Let $\phi : \mathcal{X} \to \mathcal{F}$ be a mapping from $\mathcal{X}$ to some RKHS $\mathcal{F}$ associated to some positive definite kernel $k$ on $\mathcal{X}$. The principle of kernel PCA is to perform a PCA on the points $\phi(X_1), \dots, \phi(X_n)$ without computing these points explicitly.

Let $d$ be a fixed positive integer. We seek the space $V_d \subset \mathcal{F}$ such that
$V_d \in \operatorname*{argmin}_{V \subset \mathcal{F} : \dim(V) \le d} \ \sum_{i=1}^n \| \phi(X_i) - \mathrm{Proj}_V \phi(X_i) \|_{\mathcal{F}}^2 \, ,$


where $\mathrm{Proj}_V$ denotes the orthogonal projection onto $V$ in the Hilbert space $\mathcal{F}$. In the following, we denote by $\mathcal{L}$ the linear map
$\mathcal{L} : \mathbb{R}^n \to \mathcal{F}, \quad \alpha \mapsto \mathcal{L}\alpha = \sum_{i=1}^n \alpha_i \phi(X_i) \, .$

3) Prove that $V_d = \mathcal{L} \mathcal{V}_d$, with $\mathcal{V}_d$ fulfilling

(2)   $\mathcal{V}_d \in \operatorname*{argmin}_{\mathcal{V} \subset \mathbb{R}^n : \dim(\mathcal{V}) \le d} \ \sum_{i=1}^n \| \phi(X_i) - \mathrm{Proj}_{\mathcal{L}\mathcal{V}} \phi(X_i) \|_{\mathcal{F}}^2 \, .$

4) We denote by $K$ the $n \times n$ matrix with entries $K_{i,j} = k(X_i, X_j)$ for $i, j = 1, \dots, n$. We assume in the following that $K$ is non-singular. Prove that for any $\alpha \in \mathbb{R}^n$, $\| \mathcal{L} K^{-1/2} \alpha \|_{\mathcal{F}}^2 = \| \alpha \|_2^2$.
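This identity can be checked numerically in the special case of the linear kernel $k(x, y) = \langle x, y \rangle$, where $\phi$ is the identity, $\mathcal{F} = \mathbb{R}^p$ and $\mathcal{L}\alpha = X^T \alpha$ (an illustrative special case, not the general RKHS setting).

import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 8
X = rng.standard_normal((n, p))        # rows X_1, ..., X_n; n <= p, so K is almost surely non-singular
K = X @ X.T                            # Gram matrix of the linear kernel

# K^{-1/2} via the eigendecomposition of the symmetric positive definite matrix K
w, V = np.linalg.eigh(K)
K_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

alpha = rng.standard_normal(n)
L_alpha = X.T @ (K_inv_sqrt @ alpha)   # L K^{-1/2} alpha, since L beta = X^T beta for the linear kernel
print(np.linalg.norm(L_alpha) ** 2, np.linalg.norm(alpha) ** 2)   # the two values agree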

5) Let $\mathcal{V}$ be a subspace of $\mathbb{R}^n$ of dimension $d$ and denote by $(b_1, \dots, b_d)$ an orthonormal basis of the linear span $K^{1/2} \mathcal{V}$. Prove that $(\mathcal{L} K^{-1/2} b_1, \dots, \mathcal{L} K^{-1/2} b_d)$ is an orthonormal basis of $\mathcal{L} \mathcal{V}$.

6) Prove the identities
$\mathrm{Proj}_{\mathcal{L}\mathcal{V}} \mathcal{L}\alpha = \sum_{k=1}^d \langle \mathcal{L} K^{-1/2} b_k , \mathcal{L}\alpha \rangle_{\mathcal{F}} \, \mathcal{L} K^{-1/2} b_k = \mathcal{L} K^{-1/2} \, \mathrm{Proj}_{K^{1/2} \mathcal{V}} K^{1/2} \alpha \, .$

7) Let us denote by $(e_1, \dots, e_n)$ the canonical basis of $\mathbb{R}^n$. Check that
$\sum_{i=1}^n \| \phi(X_i) - \mathrm{Proj}_{\mathcal{L}\mathcal{V}} \phi(X_i) \|_{\mathcal{F}}^2 = \sum_{i=1}^n \| \mathcal{L} e_i - \mathcal{L} K^{-1/2} \mathrm{Proj}_{K^{1/2}\mathcal{V}} K^{1/2} e_i \|_{\mathcal{F}}^2 = \sum_{i=1}^n \| K^{1/2} e_i - \mathrm{Proj}_{K^{1/2}\mathcal{V}} K^{1/2} e_i \|_2^2 = \| K^{1/2} - \mathrm{Proj}_{K^{1/2}\mathcal{V}} K^{1/2} \|_F^2 \, .$
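The same linear-kernel special case gives a quick numerical check of this chain of identities, with $\mathcal{V}$ an arbitrary $d$-dimensional subspace of $\mathbb{R}^n$ and orthogonal projections computed by the usual least-squares formula.

import numpy as np

rng = np.random.default_rng(0)
n, p, d = 5, 8, 2
X = rng.standard_normal((n, p))                  # rows X_1, ..., X_n
K = X @ X.T                                      # Gram matrix of the linear kernel

w, V = np.linalg.eigh(K)
K_sqrt = V @ np.diag(np.sqrt(w)) @ V.T           # K^{1/2}

def proj(B):
    """Orthogonal projection matrix onto the column span of B."""
    return B @ np.linalg.solve(B.T @ B, B.T)

A = rng.standard_normal((n, d))                  # columns spanning an arbitrary subspace V of R^n
P_LV = proj(X.T @ A)                             # projection onto L V = {X^T alpha : alpha in V}
P_KV = proj(K_sqrt @ A)                          # projection onto K^{1/2} V

lhs = np.linalg.norm(X.T - P_LV @ X.T, "fro") ** 2        # sum_i ||phi(X_i) - Proj_{LV} phi(X_i)||^2
rhs = np.linalg.norm(K_sqrt - P_KV @ K_sqrt, "fro") ** 2  # ||K^{1/2} - Proj_{K^{1/2}V} K^{1/2}||_F^2
print(lhs, rhs)                                  # the two values agree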

8) Using Theorem 1, show that $\mathcal{V}_d = \mathrm{Span}\{ v_1, \dots, v_d \}$, where $v_1, \dots, v_d$ are eigenvectors of $K$ associated with the $d$ largest eigenvalues of $K$, is a minimizer of (2).

9) We set $f_k = \mathcal{L} K^{-1/2} v_k$ for $k = 1, \dots, d$. Check that $(f_1, \dots, f_d)$ is an orthonormal basis of $V_d$ and
$\mathrm{Proj}_{V_d} \phi(X_i) = \sum_{k=1}^d \langle v_k , K^{1/2} e_i \rangle \, f_k \, .$

Thus, in the basis $(f_1, \dots, f_d)$, the coordinates of the orthogonal projection of the point $\phi(X_i)$ onto $V_d$ are $(\langle v_1, K^{1/2} e_i \rangle, \dots, \langle v_d, K^{1/2} e_i \rangle)$.

The take-home message is that the kernel PCA can be computed in the original space $\mathcal{X}$ via an eigenvalue decomposition of the Gram matrix $K$.
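A minimal NumPy sketch of this message, assuming a Gaussian (RBF) kernel purely for illustration: by question 9, the coordinates of $\mathrm{Proj}_{V_d} \phi(X_i)$ in the basis $(f_1, \dots, f_d)$ are $\langle v_k, K^{1/2} e_i \rangle = \sqrt{\lambda_k} \, (v_k)_i$.

import numpy as np

def kernel_pca(X, d, gamma=1.0):
    """Kernel PCA coordinates via the eigendecomposition of the Gram matrix K."""
    # Gaussian kernel k(x, y) = exp(-gamma ||x - y||^2), chosen for illustration
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)

    # Eigenvectors of K associated with the d largest eigenvalues
    w, V = np.linalg.eigh(K)                     # eigenvalues in ascending order
    w, V = w[::-1][:d], V[:, ::-1][:, :d]

    # Coordinate of phi(X_i) on f_k is <v_k, K^{1/2} e_i> = sqrt(lambda_k) * v_k[i]
    return V * np.sqrt(w)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
coords = kernel_pca(X, d=2)                      # one row of kernel PCA coordinates per data point
print(coords.shape)                              # (50, 2)

As noted at the end of Section 1, in practice one would first center the data, which in the kernel setting amounts to replacing $K$ by its doubly centered version $K - \frac{1}{n} \mathbf{1} K - \frac{1}{n} K \mathbf{1} + \frac{1}{n^2} \mathbf{1} K \mathbf{1}$, where $\mathbf{1}$ is the all-ones $n \times n$ matrix; this centering step is omitted from the sketch above.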

References

[1] Christophe Giraud. Introduction to High-Dimensional Statistics. Chapman and Hall/CRC, 2014.
