
Comparing the left-hand side of (4.45) and the right-hand side of (4.46) shows that there is a simple pattern in the diagonal elements $l_{ii}$:

$$l_{11} = \sqrt{a_{11}}, \quad l_{22} = \sqrt{a_{22} - l_{21}^2}, \quad l_{33} = \sqrt{a_{33} - (l_{31}^2 + l_{32}^2)}. \qquad (4.47)$$

Similarly, for the elements below the diagonal ($l_{ij}$, where $i > j$), there is also a repeating pattern:

$$l_{21} = \frac{1}{l_{11}} a_{21}, \quad l_{31} = \frac{1}{l_{11}} a_{31}, \quad l_{32} = \frac{1}{l_{22}} (a_{32} - l_{31} l_{21}). \qquad (4.48)$$

Thus, we constructed the Cholesky decomposition for any symmetric, positive definite $3 \times 3$ matrix. The key realization is that we can backward-calculate what the components $l_{ij}$ for the $L$ should be, given the values $a_{ij}$ for $A$ and previously computed values of $l_{ij}$.
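To make this backward calculation concrete, here is a minimal Python/NumPy sketch that computes $L$ column by column via the pattern of (4.47) and (4.48); the example matrix is an assumed illustration, not taken from the text.

```python
import numpy as np

def cholesky_lower(A):
    """Cholesky factor L (lower triangular) of a symmetric, positive
    definite matrix A, computed column by column via the pattern in
    (4.47) and (4.48)."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum of squares in row j), cf. (4.47).
        L[j, j] = np.sqrt(A[j, j] - np.sum(L[j, :j] ** 2))
        for i in range(j + 1, n):
            # Below-diagonal entry, using previously computed values of L, cf. (4.48).
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

# Assumed symmetric, positive definite example matrix.
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
L = cholesky_lower(A)
print(np.allclose(L @ L.T, A))  # True
```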

The Cholesky decomposition is an important tool for the numerical computations underlying machine learning. Here, symmetric positive definite matrices require frequent manipulation, e.g., the covariance matrix of a multivariate Gaussian variable (see Section 6.5) is symmetric, positive definite. The Cholesky factorization of this covariance matrix allows us to generate samples from a Gaussian distribution. It also allows us to perform a linear transformation of random variables, which is heavily exploited when computing gradients in deep stochastic models, such as the variational auto-encoder (Jimenez Rezende et al., 2014; Kingma and Welling, 2014). The Cholesky decomposition also allows us to compute determinants very efficiently. Given the Cholesky decomposition $A = LL^\top$, we know that $\det(A) = \det(L)\det(L^\top) = \det(L)^2$. Since $L$ is a triangular matrix, the determinant is simply the product of its diagonal entries, so that $\det(A) = \prod_i l_{ii}^2$. Thus, many numerical software packages use the Cholesky decomposition to make computations more efficient.
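A brief sketch of these two uses, assuming an illustrative covariance matrix `Sigma` and mean `mu` (not from the text), and relying on NumPy's `np.linalg.cholesky`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative covariance matrix (symmetric, positive definite) and mean.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mu = np.array([1.0, -1.0])

L = np.linalg.cholesky(Sigma)            # Sigma = L L^T

# Sampling: if z ~ N(0, I), then mu + L z ~ N(mu, Sigma).
z = rng.standard_normal((2, 10000))
samples = mu[:, None] + L @ z
print(np.cov(samples))                   # approximately Sigma

# Determinant: det(Sigma) = det(L)^2 = prod_i l_ii^2.
print(np.prod(np.diag(L)) ** 2, np.linalg.det(Sigma))
```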

4.4 Eigendecomposition and Diagonalization

A diagonal matrix is a matrix that has value zero on all off-diagonal elements, i.e., they are of the form

$$D = \begin{bmatrix} c_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & c_n \end{bmatrix}. \qquad (4.49)$$

They allow fast computation of determinants, powers, and inverses. The determinant is the product of the diagonal entries, a matrix power $D^k$ is given by each diagonal element raised to the power $k$, and the inverse $D^{-1}$ has the reciprocals of the diagonal elements on its diagonal, if all of them are nonzero.
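A minimal NumPy sketch of these three properties, with assumed diagonal entries:

```python
import numpy as np

c = np.array([2.0, 3.0, 5.0])            # assumed diagonal entries c_1, ..., c_n
D = np.diag(c)

print(np.isclose(np.linalg.det(D), np.prod(c)))                    # determinant
print(np.allclose(np.linalg.matrix_power(D, 4), np.diag(c ** 4)))  # power D^4
print(np.allclose(np.linalg.inv(D), np.diag(1.0 / c)))             # inverse
```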

In this section, we will discuss how to transform matrices into diagonal form. This is an important application of the basis change we discussed in Section 2.7.2 and of the eigenvalues from Section 4.2.

Recall that two matrices $A, D$ are similar (Definition 2.22) if there exists an invertible matrix $P$, such that $D = P^{-1}AP$. More specifically, we will look at matrices $A$ that are similar to diagonal matrices $D$ that contain the eigenvalues of $A$ on the diagonal.

Definition 4.19 (Diagonalizable). A matrix $A \in \mathbb{R}^{n \times n}$ is diagonalizable if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix $P \in \mathbb{R}^{n \times n}$ such that $D = P^{-1}AP$.

In the following, we will see that diagonalizing a matrix $A \in \mathbb{R}^{n \times n}$ is a way of expressing the same linear mapping but in another basis (see Section 2.6.1), which will turn out to be a basis that consists of the eigenvectors of $A$.

Let $A \in \mathbb{R}^{n \times n}$, let $\lambda_1, \ldots, \lambda_n$ be a set of scalars, and let $p_1, \ldots, p_n$ be a set of vectors in $\mathbb{R}^n$. We define $P := [p_1, \ldots, p_n]$ and let $D \in \mathbb{R}^{n \times n}$ be a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. Then we can show that

$$AP = PD \qquad (4.50)$$

if and only if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$ and $p_1, \ldots, p_n$ are corresponding eigenvectors of $A$.

We can see that this statement holds because

$$AP = A[p_1, \ldots, p_n] = [Ap_1, \ldots, Ap_n], \qquad (4.51)$$

$$PD = [p_1, \ldots, p_n] \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} = [\lambda_1 p_1, \ldots, \lambda_n p_n]. \qquad (4.52)$$

Thus, (4.50) implies that

$$Ap_1 = \lambda_1 p_1 \qquad (4.53)$$
$$\vdots$$
$$Ap_n = \lambda_n p_n. \qquad (4.54)$$

Therefore, the columns of $P$ must be eigenvectors of $A$.
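A quick numerical check of (4.50), with an assumed example matrix; `np.linalg.eig` returns the eigenvalues and a matrix whose columns are eigenvectors:

```python
import numpy as np

# Assumed example matrix, for illustration only.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)     # columns of P: eigenvectors; eigvals: lambda_i
D = np.diag(eigvals)

print(np.allclose(A @ P, P @ D))  # AP = PD, cf. (4.50): True
```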

Our definition of diagonalization requires that $P \in \mathbb{R}^{n \times n}$ is invertible, i.e., $P$ has full rank (Theorem 4.3). This requires us to have $n$ linearly independent eigenvectors $p_1, \ldots, p_n$, i.e., the $p_i$ form a basis of $\mathbb{R}^n$.

Theorem 4.20 (Eigendecomposition). A square matrix $A \in \mathbb{R}^{n \times n}$ can be factored into

$$A = PDP^{-1}, \qquad (4.55)$$

where $P \in \mathbb{R}^{n \times n}$ and $D$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A$, if and only if the eigenvectors of $A$ form a basis of $\mathbb{R}^n$.

Theorem 4.20 implies that only non-defective matrices can be diagonalized and that the columns of $P$ are the $n$ eigenvectors of $A$. For symmetric matrices we can obtain even stronger outcomes for the eigenvalue decomposition.

Theorem 4.21. A symmetric matrix $S \in \mathbb{R}^{n \times n}$ can always be diagonalized.

Theorem 4.21 follows directly from the spectral theorem (Theorem 4.15). Moreover, the spectral theorem states that we can find an ONB of eigenvectors of $\mathbb{R}^n$. This makes $P$ an orthogonal matrix, so that $D = P^\top S P$.
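A minimal sketch with an assumed symmetric matrix; `np.linalg.eigh` returns an orthonormal eigenbasis, so $P$ is orthogonal:

```python
import numpy as np

S = np.array([[3.0, 1.0],
              [1.0, 3.0]])                         # assumed symmetric example

eigvals, P = np.linalg.eigh(S)                     # orthonormal eigenbasis of S
print(np.allclose(P.T @ P, np.eye(2)))             # P is orthogonal: True
print(np.allclose(P.T @ S @ P, np.diag(eigvals)))  # D = P^T S P: True
```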

Remark. The Jordan normal form of a matrix offers a decomposition that works for defective matrices (Lang, 1987) but is beyond the scope of this book. ♦

Geometric Intuition for the Eigendecomposition

We can interpret the eigendecomposition of a matrix as follows (see also Figure 4.7): Let $A$ be the transformation matrix of a linear mapping with respect to the standard basis $e_i$ (blue arrows). $P^{-1}$ performs a basis change from the standard basis into the eigenbasis. Then, the diagonal $D$ scales the vectors along these axes by the eigenvalues $\lambda_i$. Finally, $P$ transforms these scaled vectors back into the standard/canonical coordinates, yielding $\lambda_i p_i$.
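As a sketch, the three stages can be traced numerically for an assumed matrix and vector:

```python
import numpy as np

# Assumed example matrix and input vector, for illustration only.
A = np.array([[2.0, 0.0],
              [1.0, 3.0]])
x = np.array([1.0, 1.0])

eigvals, P = np.linalg.eig(A)        # columns of P are eigenvectors of A
D = np.diag(eigvals)

y = np.linalg.inv(P) @ x             # 1. P^{-1}: change into the eigenbasis
y = D @ y                            # 2. D: scale along the axes by lambda_i
y = P @ y                            # 3. P: change back to standard coordinates

print(np.allclose(y, A @ x))         # same as applying A directly: True
```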

Example 4.11 (Eigendecomposition)

Let us compute the eigendecomposition of

$$A = \frac{1}{2}\begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}.$$

Step 1: Compute eigenvalues and eigenvectors. The characteristic polynomial of $A$ is

$$\det(A - \lambda I) = \left(\tfrac{5}{2} - \lambda\right)^2 - 1 = \lambda^2 - 5\lambda + \tfrac{21}{4} = \left(\lambda - \tfrac{7}{2}\right)\left(\lambda - \tfrac{3}{2}\right).$$

Therefore, the eigenvalues of $A$ are $\lambda_1 = \tfrac{7}{2}$ and $\lambda_2 = \tfrac{3}{2}$ (the roots of the characteristic polynomial), and the associated (normalized) eigenvectors are obtained by solving $(A - \lambda_i I)p_i = 0$, which yields

$$p_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad p_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Step 2: Check for existence. The eigenvectors $p_1, p_2$ form a basis of $\mathbb{R}^2$. Therefore, $A$ can be diagonalized.

Step 3: Construct the matrix $P$ to diagonalize $A$. We collect the eigenvectors of $A$ in $P$ so that

$$P = [p_1, p_2] = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.$$

We then obtain

$$P^{-1}AP = \begin{bmatrix} \tfrac{7}{2} & 0 \\ 0 & \tfrac{3}{2} \end{bmatrix} = D.$$

Equivalently, we get (exploiting that $P^{-1} = P^\top$ since the eigenvectors $p_1$ and $p_2$ form an orthonormal basis)

$$\frac{1}{2}\begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} \tfrac{7}{2} & 0 \\ 0 & \tfrac{3}{2} \end{bmatrix} \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$

[Figure 4.7 visualizes the eigendecomposition of a matrix in $\mathbb{R}^2$ as a sequence of linear transformations.]
Diagonal matrices $D$ can efficiently be raised to a power. Therefore, we can find a matrix power for a matrix $A \in \mathbb{R}^{n \times n}$ via the eigenvalue decomposition (if it exists), so that

$$A^k = (PDP^{-1})^k = PD^kP^{-1}. \qquad (4.62)$$

Computing $D^k$ is efficient because we apply this operation individually to each diagonal element.
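A minimal sketch of (4.62), with an assumed matrix and power:

```python
import numpy as np

# Assumed example matrix, for illustration only.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, P = np.linalg.eig(A)

k = 5
# A^k = P D^k P^{-1}; D^k only requires powering the diagonal entries.
Ak = P @ np.diag(eigvals ** k) @ np.linalg.inv(P)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```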

Assume that the eigendecomposition $A = PDP^{-1}$ exists. Then

$$\det(A) = \det(PDP^{-1}) = \det(P)\det(D)\det(P^{-1}) \qquad (4.63a)$$
$$= \det(D) = \prod_i d_{ii}, \qquad (4.63b)$$

since $\det(P)\det(P^{-1}) = 1$. Thus, the determinant of $A$ is the product of its eigenvalues.
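A quick numerical check of this identity, with an assumed matrix:

```python
import numpy as np

# Assumed example matrix, for illustration only.
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
eigvals = np.linalg.eigvals(A)

# det(A) equals the product of the eigenvalues (the diagonal of D).
print(np.prod(eigvals), np.linalg.det(A))  # both 6.0
```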
