

In the document MACHINE LEARNING (Page 105-111)

Matrix Decompositions

4.1 Determinant and Trace

Figure 4.1 A mind map of the concepts introduced in this chapter, along with where they are used in other parts of the book. [Nodes: determinant, invertibility, Cholesky, eigenvalues, eigenvectors, orthogonal matrix, diagonalization, SVD; used in Chapter 6 (Probability & distributions) and Chapter 10 (Dimensionality reduction).]

The concepts of this chapter are used both in subsequent mathematical chapters, such as Chapter 6, and in applied chapters, such as dimensionality reduction in Chapter 10 or density estimation in Chapter 11. This chapter's overall structure is depicted in the mind map of Figure 4.1.

4.1 Determinant and Trace

The determinant notation |A| must not be confused with the absolute value.

Determinants are important concepts in linear algebra. A determinant is a mathematical object used in the analysis and solution of systems of linear equations. Determinants are only defined for square matrices A ∈ R^{n×n}, i.e., matrices with the same number of rows and columns. In this book, we write the determinant as det(A) or sometimes as |A| so that

\det(A) = \begin{vmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots &        & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{vmatrix}.   (4.1)

The determinant of a square matrix A ∈ R^{n×n} is a function that maps A onto a real number. Before providing a definition of the determinant for general n×n matrices, let us have a look at some motivating examples, and define determinants for some special matrices.

Example 4.1 (Testing for Matrix Invertibility)

Let us begin with exploring if a square matrix A is invertible (see Section 2.2.2). For the smallest cases, we already know when a matrix is invertible. If A is a 1×1 matrix, i.e., it is a scalar number, then A = a implies A^{-1} = 1/a. Thus a(1/a) = 1 holds if and only if a ≠ 0.

For 2×2 matrices, by the definition of the inverse (Definition 2.3), we know that AA^{-1} = I. Then, with (2.24), the inverse of A is

A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.   (4.2)

Hence, A is invertible if and only if

a_{11}a_{22} - a_{12}a_{21} \neq 0.   (4.3)

This quantity is the determinant of A ∈ R^{2×2}, i.e.,

\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.   (4.4)
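The 2×2 formulas (4.2) and (4.4) translate directly into code. The following is a minimal sketch; the helper names `det2` and `inverse2` are illustrative, not from the book.

```python
def det2(A):
    """Determinant of a 2x2 matrix A = [[a11, a12], [a21, a22]], as in (4.4)."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse2(A):
    """Inverse of a 2x2 matrix via (4.2); fails exactly when det(A) = 0."""
    d = det2(A)
    if d == 0:
        raise ValueError("matrix is not invertible: det(A) = 0")
    return [[ A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d,  A[0][0] / d]]

A = [[1.0, 2.0], [3.0, 4.0]]
Ainv = inverse2(A)
```

Multiplying `A` by `Ainv` recovers the identity matrix, confirming AA^{-1} = I for this invertible example.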

Example 4.1 already points at the relationship between determinants and the existence of inverse matrices. The next theorem states the same result for n×n matrices.

Theorem 4.1. For any square matrix A ∈ R^{n×n} it holds that A is invertible if and only if det(A) ≠ 0.

We have explicit (closed-form) expressions for determinants of small matrices in terms of the elements of the matrix. For n = 1,

\det(A) = \det(a_{11}) = a_{11}.   (4.5)

For n = 2,

\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21},   (4.6)

which we have observed in the preceding example.

For n = 3 (known as Sarrus' rule),

\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33}.   (4.7)

For a memory aid of the product terms in Sarrus' rule, try tracing the elements of the triple products in the matrix.
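Sarrus' rule (4.7) can be written down term by term. A minimal sketch (the function name is illustrative):

```python
def det3_sarrus(A):
    """3x3 determinant via Sarrus' rule (4.7): three positive diagonal
    products minus three negative anti-diagonal products."""
    return (A[0][0] * A[1][1] * A[2][2]
            + A[1][0] * A[2][1] * A[0][2]
            + A[2][0] * A[0][1] * A[1][2]
            - A[2][0] * A[1][1] * A[0][2]
            - A[0][0] * A[2][1] * A[1][2]
            - A[1][0] * A[0][1] * A[2][2])
```

For instance, `det3_sarrus` on the identity matrix returns 1, as expected.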

We call a square matrix T an upper-triangular matrix if T_{ij} = 0 for i > j, i.e., the matrix is zero below its diagonal. Analogously, we define a lower-triangular matrix as a matrix with zeros above its diagonal. For a triangular matrix T ∈ R^{n×n}, the determinant is the product of the diagonal elements, i.e.,

\det(T) = \prod_{i=1}^{n} T_{ii}.   (4.8)
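Since a triangular determinant is just the product of the diagonal, the computation is a one-line loop. A minimal sketch with an illustrative upper-triangular matrix:

```python
def det_triangular(T):
    """Determinant of a triangular matrix: the product of its diagonal elements."""
    prod = 1.0
    for i in range(len(T)):
        prod *= T[i][i]
    return prod

# Illustrative upper-triangular matrix: entries below the diagonal are zero.
U = [[2.0, 5.0, 1.0],
     [0.0, 3.0, 7.0],
     [0.0, 0.0, 4.0]]
```

Here `det_triangular(U)` returns 2 · 3 · 4 = 24.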

Figure 4.2 The area of the parallelogram spanned by the vectors b and g.

Example 4.2 (Determinants as Measures of Volume)

The notion of a determinant is natural when we consider it as a mapping from a set of n vectors spanning an object in R^n. It turns out that the determinant det(A) is the signed volume of an n-dimensional parallelepiped formed by the columns of the matrix A.

For n = 2, the columns of the matrix form a parallelogram; see Figure 4.2. As the angle between the vectors gets smaller, the area of the parallelogram shrinks, too. Consider two vectors b, g that form the columns of a matrix A = [b, g]. Then the absolute value of the determinant of A is the area of the parallelogram with vertices 0, b, g, b + g. In particular, if b, g are linearly dependent so that b = λg for some λ ∈ R, they no longer form a two-dimensional parallelogram, and the corresponding area is 0. On the contrary, if b, g are linearly independent multiples of the canonical basis vectors e_1, e_2, then they can be written as b = [b, 0]^T and g = [0, g]^T, and the determinant is

\begin{vmatrix} b & 0 \\ 0 & g \end{vmatrix} = bg - 0 = bg.

This is the familiar formula: area = height × length.

The sign of the determinant indicates the orientation of the spanning vectors b, g with respect to the standard basis (e_1, e_2). In our figure, flipping the order to g, b swaps the columns of A and reverses the orientation of the shaded area.

This intuition extends to higher dimensions. In R^3, we consider three vectors r, b, g ∈ R^3 spanning the edges of a parallelepiped, i.e., a solid with faces that are parallel parallelograms (see Figure 4.3). The absolute value of the determinant of the 3×3 matrix [r, b, g] is the volume of the solid. Thus, the determinant acts as a function that measures the signed volume formed by column vectors composed in a matrix.

Consider three linearly independent vectors r, g, b ∈ R^3. Writing these vectors as the columns of a matrix A = [r, g, b] allows us to compute the desired volume as

V = |\det(A)| = 186.   (4.11)
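The volume computation can be reproduced numerically. Note that the vectors below are illustrative stand-ins (the specific example vectors are not reproduced in this excerpt); any three linearly independent vectors work the same way.

```python
import numpy as np

# Illustrative edge vectors of a parallelepiped (not the example's values):
# an axis-aligned box with side lengths 1, 2, and 3.
r = np.array([1.0, 0.0, 0.0])
g = np.array([0.0, 2.0, 0.0])
b = np.array([0.0, 0.0, 3.0])

A = np.column_stack([r, g, b])   # vectors as the columns of A
volume = abs(np.linalg.det(A))   # |det(A)| is the parallelepiped's volume
```

For this axis-aligned box, the volume is simply 1 · 2 · 3 = 6.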

Computing the determinant of an n×n matrix requires a general algorithm to solve the cases for n > 3, which we are going to explore in the following. Theorem 4.2 below reduces the problem of computing the determinant of an n×n matrix to computing the determinants of (n−1)×(n−1) matrices. By recursively applying the Laplace expansion (Theorem 4.2), we can therefore compute determinants of n×n matrices by ultimately computing determinants of 2×2 matrices.

Theorem 4.2 (Laplace Expansion). Consider a matrix A ∈ R^{n×n}. Then, for all j = 1, …, n:

1. Expansion along column j:

   \det(A) = \sum_{k=1}^{n} (-1)^{k+j} a_{kj} \det(A_{k,j}).   (4.12)

2. Expansion along row j:

   \det(A) = \sum_{k=1}^{n} (-1)^{k+j} a_{jk} \det(A_{j,k}).   (4.13)

Here A_{k,j} ∈ R^{(n−1)×(n−1)} is the submatrix of A that we obtain when deleting row k and column j. det(A_{k,j}) is called a minor and (−1)^{k+j} det(A_{k,j}) a cofactor.
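The recursive structure of Theorem 4.2 can be sketched directly: expand along the first row (j = 1 in (4.13)) and recurse until the base case. The helper name is illustrative, and this O(n!) recursion is for exposition only, not for large matrices.

```python
def det_laplace(A):
    """Determinant via Laplace expansion along the first row (4.13, j = 1)."""
    n = len(A)
    if n == 1:
        return A[0][0]                       # base case: det of a 1x1 matrix
    total = 0
    for k in range(n):
        # Submatrix A_{1,k+1}: delete row 0 and column k (0-based indices).
        sub = [row[:k] + row[k + 1:] for row in A[1:]]
        # (-1)**k is the cofactor sign (-1)^{1+k} written with 0-based k.
        total += (-1) ** k * A[0][k] * det_laplace(sub)
    return total
```

On a 2×2 input this reduces to formula (4.4), so the recursion bottoms out correctly.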

Example 4.3 (Laplace Expansion)

Let us compute the determinant of

A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}   (4.14)

using the Laplace expansion along the first row. Applying (4.13) yields

\det(A) = 1 \begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} - 2 \begin{vmatrix} 3 & 2 \\ 0 & 1 \end{vmatrix} + 3 \begin{vmatrix} 3 & 1 \\ 0 & 0 \end{vmatrix}.   (4.15)

We use (4.6) to compute the determinants of all 2×2 matrices and obtain

\det(A) = 1(1 − 0) − 2(3 − 0) + 3(0 − 0) = −5.   (4.16)

For completeness we can compare this result to computing the determinant using Sarrus' rule (4.7):

\det(A) = 1·1·1 + 3·0·3 + 0·2·2 − 0·1·3 − 1·0·2 − 3·2·1 = 1 − 6 = −5.   (4.17)

For A ∈ R^{n×n} the determinant exhibits the following properties:

- The determinant of a matrix product is the product of the corresponding determinants, det(AB) = det(A) det(B).
- Determinants are invariant to transposition, i.e., det(A) = det(A^T).
- If A is regular (invertible), then det(A^{-1}) = 1/det(A).
- Similar matrices (Definition 2.22) possess the same determinant. Therefore, for a linear mapping Φ : V → V all transformation matrices A_Φ of Φ have the same determinant. Thus, the determinant is invariant to the choice of basis of a linear mapping.
- Adding a multiple of a column/row to another one does not change det(A).
- Multiplication of a column/row with λ ∈ R scales det(A) by λ. In particular, det(λA) = λ^n det(A).
- Swapping two rows/columns changes the sign of det(A).
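These properties can be spot-checked numerically. A sketch using NumPy on an arbitrary random matrix (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a random matrix is invertible almost surely
B = rng.standard_normal((n, n))
lam = 2.5

checks = {
    # det(AB) = det(A) det(B)
    "product":   np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)),
    # det(A) = det(A^T)
    "transpose": np.isclose(np.linalg.det(A), np.linalg.det(A.T)),
    # det(A^{-1}) = 1 / det(A)
    "inverse":   np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)),
    # det(lam * A) = lam^n det(A)
    "scaling":   np.isclose(np.linalg.det(lam * A), lam ** n * np.linalg.det(A)),
}

# Swapping two rows flips the sign of the determinant.
A_swapped = A[[1, 0, 2, 3], :]
checks["row swap"] = np.isclose(np.linalg.det(A_swapped), -np.linalg.det(A))
```

Each entry of `checks` evaluates to `True` for this matrix, consistent with the list above.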

Because of the last three properties, we can use Gaussian elimination (see Section 2.1) to compute det(A) by bringing A into row-echelon form. We can stop Gaussian elimination when we have A in triangular form, where the elements below the diagonal are all 0. Recall from (4.8) that the determinant of a triangular matrix is the product of the diagonal elements.
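The elimination-based procedure just described can be sketched as follows (an illustrative helper, not the book's implementation): reduce to triangular form, track the sign flips from row swaps, then multiply the diagonal.

```python
def det_gauss(A):
    """Determinant via Gaussian elimination: row operations leave det(A)
    unchanged, each row swap flips its sign, and the triangular result's
    determinant is the product of its diagonal."""
    M = [row[:] for row in A]          # work on a copy
    n = len(M)
    sign = 1
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            return 0.0                 # no pivot available => singular matrix
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign               # a row swap changes the sign of det
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]   # det is unchanged by this step
    prod = 1.0
    for i in range(n):
        prod *= M[i][i]
    return sign * prod
```

Unlike the Laplace recursion, this runs in O(n^3), which is why elimination-style methods are the practical choice.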

Theorem 4.3. A square matrix A ∈ R^{n×n} has det(A) ≠ 0 if and only if rk(A) = n. In other words, A is invertible if and only if it is full rank.

When mathematics was mainly performed by hand, the determinant calculation was considered an essential way to analyze matrix invertibility. However, contemporary approaches in machine learning use direct numerical methods that have superseded the explicit calculation of the determinant. For example, in Chapter 2, we learned that inverse matrices can be computed by Gaussian elimination. Gaussian elimination can thus also be used to compute the determinant of a matrix.

Determinants will play an important theoretical role in the following sections, especially when we learn about eigenvalues and eigenvectors (Section 4.2) through the characteristic polynomial.

Definition 4.4. The trace of a square matrix A ∈ R^{n×n} is defined as

\operatorname{tr}(A) := \sum_{i=1}^{n} a_{ii},   (4.18)

i.e., the trace is the sum of the diagonal elements of A. The trace satisfies the following properties:

- tr(A + B) = tr(A) + tr(B) for A, B ∈ R^{n×n}
- tr(αA) = α tr(A), α ∈ R, for A ∈ R^{n×n}
- tr(I_n) = n
- tr(AB) = tr(BA) for A ∈ R^{n×k}, B ∈ R^{k×n}

It can be shown that only one function satisfies these four properties together – the trace (Gohberg et al., 2012).
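The four trace properties can be verified numerically. A sketch (the helper `tr` mirrors definition (4.18); the matrices are arbitrary illustrative examples):

```python
import numpy as np

def tr(M):
    """Trace as in (4.18): the sum of the diagonal elements."""
    return sum(M[i, i] for i in range(M.shape[0]))

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 5))   # non-square factors for tr(CD) = tr(DC)
D = rng.standard_normal((5, 3))
alpha = 0.7
```

Note that `C @ D` is 3×3 while `D @ C` is 5×5, yet their traces agree, exactly as the fourth property states for non-square factors.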

The properties of the trace of matrix products are more general. Specifically, the trace is invariant under cyclic permutations, i.e.,

\operatorname{tr}(AKL) = \operatorname{tr}(KLA)   (4.19)

for matrices A ∈ R^{a×k}, K ∈ R^{k×l}, L ∈ R^{l×a}. This property generalizes to products of an arbitrary number of matrices. As a special case of (4.19), it follows that for two vectors x, y ∈ R^n

\operatorname{tr}(xy^T) = \operatorname{tr}(y^T x) = y^T x \in \mathbb{R}.   (4.20)

Given a linear mapping Φ : V → V, where V is a vector space, we define the trace of this map by using the trace of a matrix representation of Φ. For a given basis of V, we can describe Φ by means of the transformation matrix A. Then the trace of Φ is the trace of A. For a different basis of V, the corresponding transformation matrix B of Φ can be obtained by a basis change of the form S^{-1}AS for suitable S (see Section 2.7.2). For the corresponding trace of Φ, this means

\operatorname{tr}(B) = \operatorname{tr}(S^{-1}AS) \stackrel{(4.19)}{=} \operatorname{tr}(ASS^{-1}) = \operatorname{tr}(A).   (4.21)

Hence, while matrix representations of linear mappings are basis dependent, the trace of a linear mapping Φ is independent of the basis.
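Cyclic invariance (4.19), the outer-product special case (4.20), and the basis-change invariance (4.21) can all be checked on random matrices; the shapes below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
# A in R^{2x4}, K in R^{4x3}, L in R^{3x2}: both AKL (2x2) and KLA (4x4) exist.
A = rng.standard_normal((2, 4))
K = rng.standard_normal((4, 3))
L = rng.standard_normal((3, 2))
cyclic_ok = np.isclose(np.trace(A @ K @ L), np.trace(K @ L @ A))   # (4.19)

# Special case (4.20): tr(x y^T) = y^T x for vectors x, y.
x, y = rng.standard_normal(5), rng.standard_normal(5)
outer_ok = np.isclose(np.trace(np.outer(x, y)), y @ x)

# Basis-change invariance (4.21): tr(S^{-1} M S) = tr(M) for square M.
M = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))   # invertible almost surely
similarity_ok = np.isclose(np.trace(np.linalg.inv(S) @ M @ S), np.trace(M))
```

All three checks hold, mirroring the derivation in (4.21): S and S^{-1} cancel once the cyclic shift brings them together.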

In this section, we covered determinants and traces as functions characterizing a square matrix. Taking together our understanding of determinants and traces, we can now define an important equation describing a matrix A in terms of a polynomial, which we will use extensively in the following sections.

Definition 4.5 (Characteristic Polynomial). For λ ∈ R and a square matrix A ∈ R^{n×n},

p_A(\lambda) := \det(A - \lambda I)   (4.22a)
             = c_0 + c_1 \lambda + c_2 \lambda^2 + \dots + c_{n-1} \lambda^{n-1} + (-1)^n \lambda^n,   (4.22b)

c_0, …, c_{n−1} ∈ R, is the characteristic polynomial of A. In particular,
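Definition (4.22a) can be evaluated numerically by forming A − λI directly. The following sketch uses an illustrative 2×2 matrix; for n = 2, expanding the determinant gives p_A(λ) = λ² − tr(A)λ + det(A), so c_1 = −tr(A) and c_0 = det(A) in the notation of (4.22b).

```python
import numpy as np

def char_poly(A, lam):
    """Evaluate p_A(lambda) = det(A - lambda * I), as in (4.22a)."""
    n = A.shape[0]
    return np.linalg.det(A - lam * np.eye(n))

# Illustrative 2x2 example (not from the book): tr(A) = 7, det(A) = 10.
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# For n = 2: p_A(lam) = lam^2 - tr(A)*lam + det(A).
lam = 1.5
expected = lam**2 - np.trace(A) * lam + np.linalg.det(A)
```

Here p_A(λ) = λ² − 7λ + 10 = (λ − 2)(λ − 5), so the polynomial vanishes at λ = 2 and λ = 5 — foreshadowing the role of its roots as eigenvalues in Section 4.2.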
