TP7: KERNEL PRINCIPAL COMPONENT ANALYSIS
COURS D'APPRENTISSAGE, ÉCOLE NORMALE SUPÉRIEURE
Raphaël Berthier (raphael.berthier@inria.fr). This exercise sheet was composed using [1, Exercises 1.6.4, 9.5.6].
1. Plain Principal Component Analysis (PCA)
Let $X_1, \dots, X_n \in \mathbb{R}^p$ be a sequence of samples. The goal of PCA is to compute a low-dimensional representation of the data. For a given dimension $d \le p$, PCA computes the subspace $V_d$ such that
\[
V_d \in \operatorname*{argmin}_{V \subset \mathbb{R}^p :\, \dim(V) \le d} \ \sum_{i=1}^n \| X_i - \mathrm{Proj}_V X_i \|_2^2 , \tag{1}
\]
where $\mathrm{Proj}_V$ denotes the orthogonal projection onto $V$. Let us denote by $X$ the data matrix, whose rows are $X_1, \dots, X_n$, and by $X = \sum_{k=1}^r \sigma_k u_k v_k^T$ its singular value decomposition (SVD), with $\sigma_1 \ge \dots \ge \sigma_r > 0$.
We have the following theorem.

Theorem 1. Let $d \le r$. We have
\[
\min_{Y :\, \mathrm{rank}(Y) \le d} \|Y - X\|_F^2 = \sum_{k=d+1}^r \sigma_k^2 ,
\]
where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\|Z\|_F^2 = \mathrm{tr}(Z Z^T) = \sum_{i,j} Z_{i,j}^2$. Moreover, the minimum is reached for
\[
Y = \sum_{k=1}^d \sigma_k u_k v_k^T .
\]
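Theorem 1 (the Eckart–Young theorem) can be checked numerically. Below is a minimal NumPy sketch; the dimensions and the random data are illustrative choices, not part of the exercise:

```python
# Numerical check of Theorem 1 (Eckart-Young) on random data.
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 8, 5, 2
X = rng.standard_normal((n, p))

# Full SVD: X = sum_k sigma_k u_k v_k^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-12))

# Best rank-d approximation: keep the d largest singular values
Y = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]

# The squared Frobenius error equals the sum of the discarded sigma_k^2
err = np.linalg.norm(X - Y, "fro") ** 2
tail = np.sum(s[d:r] ** 2)
assert np.isclose(err, tail)
```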
1) Check that
\[
\sum_{i=1}^n \|X_i - \mathrm{Proj}_V X_i\|_2^2 = \|X - X \,\mathrm{Proj}_V\|_F^2 .
\]
Conclude that $V_d = \mathrm{Span}\{v_1, \dots, v_d\}$ minimizes (1).
2) Prove that the coordinates of $\mathrm{Proj}_{V_d} X_i$ in the orthonormal basis $(v_1, \dots, v_d)$ of $V_d$ are given by $(\sigma_1 \langle e_i, u_1\rangle, \dots, \sigma_d \langle e_i, u_d\rangle)$.
The right-singular vectors $v_1, \dots, v_d \in \mathbb{R}^p$ are called the principal axes. They represent the directions in which the data varies the most. The vectors $c_k = X v_k = \sigma_k u_k \in \mathbb{R}^n$, $k = 1, \dots, d$, are called the principal components. The principal component $c_k$ gathers the coordinates of $X_1, \dots, X_n$ on $v_k$.
Note that in practice, since $V_d$ is a linear span and not an affine span, it is highly recommended to first center the data points, $\tilde{X}_i = X_i - \frac{1}{n} \sum_{j=1}^n X_j$, before applying PCA.
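The whole procedure of this section (centering, SVD, principal axes and components) can be sketched in a few lines of NumPy; the dimensions and the random data below are illustrative assumptions:

```python
# A minimal plain-PCA sketch: center the data, take the SVD,
# and read off the principal axes and principal components.
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 100, 4, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))

# Center each sample: X_i - (1/n) sum_j X_j
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
axes = Vt[:d]              # principal axes v_1, ..., v_d (as rows)
components = Xc @ axes.T   # principal components c_k = X v_k = sigma_k u_k

# Sanity check: c_k = sigma_k u_k
assert np.allclose(components, U[:, :d] * s[:d])
```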
2. Kernel Principal Component Analysis
The Reproducing Kernel Hilbert Space (RKHS) framework allows us to "delinearize" some linear algorithms. We show here how it can be applied to PCA. Let us consider that we have $n$ points $X_1, \dots, X_n \in \mathcal{X}$. Note that in the kernel setting, $\mathcal{X}$ does not need to be a vector space anymore. Let $\varphi : \mathcal{X} \to \mathcal{F}$ be a mapping from $\mathcal{X}$ to some RKHS $\mathcal{F}$ associated to some positive definite kernel $k$ on $\mathcal{X}$. The principle of kernel PCA is to perform a PCA on the points $\varphi(X_1), \dots, \varphi(X_n)$ without computing these points explicitly.
Let $d$ be a fixed positive integer. We seek the subspace $V_d \subset \mathcal{F}$ such that
\[
V_d \in \operatorname*{argmin}_{V \subset \mathcal{F} :\, \dim(V) \le d} \ \sum_{i=1}^n \|\varphi(X_i) - \mathrm{Proj}_V \varphi(X_i)\|_{\mathcal{F}}^2 ,
\]
where $\mathrm{Proj}_V$ denotes the orthogonal projection onto $V$ in the Hilbert space $\mathcal{F}$. In the following, we denote by $\mathcal{L}$ the linear map
\[
\mathcal{L} : \mathbb{R}^n \to \mathcal{F}, \qquad \alpha \mapsto \mathcal{L}\alpha = \sum_{i=1}^n \alpha_i \varphi(X_i) .
\]
3) Prove that $V_d = \mathcal{L}\mathcal{V}_d$, with $\mathcal{V}_d$ fulfilling
\[
\mathcal{V}_d \in \operatorname*{argmin}_{V \subset \mathbb{R}^n :\, \dim(V) \le d} \ \sum_{i=1}^n \|\varphi(X_i) - \mathrm{Proj}_{\mathcal{L}V} \varphi(X_i)\|_{\mathcal{F}}^2 . \tag{2}
\]
4) We denote by $K$ the $n \times n$ matrix with entries $K_{i,j} = k(X_i, X_j)$ for $i, j = 1, \dots, n$. We assume in the following that $K$ is non-singular. Prove that for any $\alpha \in \mathbb{R}^n$, $\|\mathcal{L} K^{-1/2} \alpha\|_{\mathcal{F}}^2 = \|\alpha\|_2^2$.
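This isometry property can be checked numerically in the special case of the linear kernel $k(x, y) = \langle x, y \rangle$, where $\varphi(x) = x$, $\mathcal{L}\alpha = X^T \alpha$ and $K = X X^T$. The sketch below rests on that special-case assumption, with illustrative dimensions:

```python
# Numerical check of question 4 for the linear kernel:
# phi(x) = x, L(alpha) = X^T alpha, K = X X^T.
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 6
X = rng.standard_normal((n, p))   # n <= p, so K is (generically) non-singular
K = X @ X.T

# K^{-1/2} via the eigendecomposition of the symmetric matrix K
w, Q = np.linalg.eigh(K)
K_inv_sqrt = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

alpha = rng.standard_normal(n)
# ||L K^{-1/2} alpha||_F = ||X^T K^{-1/2} alpha||_2 should equal ||alpha||_2
assert np.isclose(np.linalg.norm(X.T @ K_inv_sqrt @ alpha),
                  np.linalg.norm(alpha))
```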
5) Let $V$ be a subspace of $\mathbb{R}^n$ of dimension $d$ and denote by $(b_1, \dots, b_d)$ an orthonormal basis of the linear span $K^{1/2} V$. Prove that $(\mathcal{L} K^{-1/2} b_1, \dots, \mathcal{L} K^{-1/2} b_d)$ is an orthonormal basis of $\mathcal{L} V$.
6) Prove the identities
\[
\mathrm{Proj}_{\mathcal{L}V} \mathcal{L}\alpha
= \sum_{k=1}^d \langle \mathcal{L} K^{-1/2} b_k, \mathcal{L}\alpha \rangle_{\mathcal{F}} \, \mathcal{L} K^{-1/2} b_k
= \mathcal{L} K^{-1/2} \mathrm{Proj}_{K^{1/2} V} K^{1/2} \alpha .
\]
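The second identity can also be checked numerically for the linear kernel $\varphi(x) = x$, so that $\mathcal{L}\alpha = X^T \alpha$ and $K = X X^T$; the subspace $V$ and the dimensions are arbitrary illustrative choices:

```python
# Numerical check of the identity in question 6 for the linear kernel.
import numpy as np

rng = np.random.default_rng(4)
n, p, d = 4, 6, 2
X = rng.standard_normal((n, p))
K = X @ X.T

# K^{1/2} and K^{-1/2} via the eigendecomposition of K
w, Q = np.linalg.eigh(K)
K_sqrt = Q @ np.diag(np.sqrt(w)) @ Q.T
K_inv_sqrt = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

V = rng.standard_normal((n, d))   # a basis of V (columns)
alpha = rng.standard_normal(n)

def proj(B):
    """Orthogonal projection matrix onto the column span of B."""
    Qb, _ = np.linalg.qr(B)
    return Qb @ Qb.T

# Left-hand side: project L(alpha) = X^T alpha onto L(V) = span of X^T V
lhs = proj(X.T @ V) @ (X.T @ alpha)
# Right-hand side: L K^{-1/2} Proj_{K^{1/2} V} K^{1/2} alpha
rhs = X.T @ K_inv_sqrt @ proj(K_sqrt @ V) @ K_sqrt @ alpha
assert np.allclose(lhs, rhs)
```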
7) Let us denote by $(e_1, \dots, e_n)$ the canonical basis of $\mathbb{R}^n$. Check that
\begin{align*}
\sum_{i=1}^n \|\varphi(X_i) - \mathrm{Proj}_{\mathcal{L}V} \varphi(X_i)\|_{\mathcal{F}}^2
&= \sum_{i=1}^n \|\mathcal{L} e_i - \mathcal{L} K^{-1/2} \mathrm{Proj}_{K^{1/2}V} K^{1/2} e_i\|_{\mathcal{F}}^2 \\
&= \sum_{i=1}^n \|K^{1/2} e_i - \mathrm{Proj}_{K^{1/2}V} K^{1/2} e_i\|_2^2 \\
&= \|K^{1/2} - \mathrm{Proj}_{K^{1/2}V} K^{1/2}\|_F^2 .
\end{align*}
8) Using Theorem 1, show that $\mathcal{V}_d = \mathrm{Span}\{v_1, \dots, v_d\}$, where $v_1, \dots, v_d$ are eigenvectors of $K$ associated to the $d$ largest eigenvalues of $K$, is a minimizer of (2).
9) We set $f_k = \mathcal{L} K^{-1/2} v_k$ for $k = 1, \dots, d$. Check that $(f_1, \dots, f_d)$ is an orthonormal basis of $V_d$ and
\[
\mathrm{Proj}_{V_d} \varphi(X_i) = \sum_{k=1}^d \langle v_k, K^{1/2} e_i \rangle f_k .
\]
Thus, in the basis $(f_1, \dots, f_d)$, the coordinates of the orthogonal projection of the point $\varphi(X_i)$ onto $V_d$ are $(\langle v_1, K^{1/2} e_i\rangle, \dots, \langle v_d, K^{1/2} e_i\rangle)$.
The take-home message is that kernel PCA can be computed in the original space $\mathcal{X}$ through an eigenvalue decomposition of the Gram matrix $K$.
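The resulting algorithm can be sketched as follows. Since $K^{1/2} v_k = \sqrt{\lambda_k}\, v_k$ for an eigenvector $v_k$ of $K$, the coordinate of $\varphi(X_i)$ on $f_k$ is $\sqrt{\lambda_k}\,(v_k)_i$. The RBF kernel, its bandwidth `gamma` and the function name `kernel_pca` are illustrative choices, not part of the exercise sheet:

```python
# A minimal kernel-PCA sketch: eigendecompose the Gram matrix K and read off
# the coordinates <v_k, K^{1/2} e_i> = sqrt(lambda_k) * (v_k)_i.
import numpy as np

def kernel_pca(X, d, gamma=1.0):
    """Return the n x d matrix of kernel-PCA coordinates for an RBF kernel."""
    # Gram matrix K_ij = exp(-gamma ||X_i - X_j||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)
    # Eigenvectors of K for the d largest eigenvalues
    w, V = np.linalg.eigh(K)                 # ascending order
    w, V = w[::-1][:d], V[:, ::-1][:, :d]
    # Coordinate of phi(X_i) on f_k is sqrt(lambda_k) * (v_k)_i
    return V * np.sqrt(np.maximum(w, 0.0))

rng = np.random.default_rng(3)
Z = kernel_pca(rng.standard_normal((10, 3)), d=2)
print(Z.shape)   # (10, 2)
```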
References
[1] Christophe Giraud. Introduction to High-Dimensional Statistics. Chapman and Hall/CRC, 2014.