Representation and Approximation in Vector Spaces
3.3 Error minimization via gradients
While the orthogonality theorem is used principally throughout this chapter as the geometrical basis for finding a minimum-error approximation under an induced norm, it is pedagogically worthwhile to consider another approach based on gradients, which reaffirms what we already know but demonstrates the use of some new tools.
Minimizing ‖e‖² for the induced norm requires minimizing

J(c) = ⟨x − Σᵢ cᵢ pᵢ, x − Σⱼ cⱼ pⱼ⟩.    (3.13)

Using the vector notations defined in (3.3), (3.6), and (3.7), we can write (3.13) as

J(c) = ⟨x, x⟩ − c^H p − p^H c + c^H R c,    (3.14)

where R is the Grammian and p is the cross-correlation vector.
Gradient formulas appropriate for this optimization are presented in section E.1.1 of appendix E. In particular, the following gradient formulas are derived:

∇_c (p^H c) = 0,    ∇_c (c^H p) = p,    ∇_c (c^H R c) = R c.
Taking the gradient of (3.14) using the last two of these, we obtain

∇_c J(c) = −p + R c.

Equating this result to zero we obtain

R c = p,

giving us again the normal equations.
To determine whether the extremum we have obtained by the gradient is in fact a minimum, we compute the gradient a second time. We have the Hessian matrix

∇²_c J(c) = R,

which is a positive-semidefinite matrix, so the extremum must be a minimum.
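As a concrete check of the gradient approach, the following sketch (with small hypothetical real data: two basis vectors p₁, p₂ and a data vector x, all invented for illustration) forms the Grammian R and cross-correlation p, solves the normal equations R c = p by Cramer's rule, and verifies that the gradient term R c − p vanishes at the solution. Exact rational arithmetic is used so the check is exact.

```python
from fractions import Fraction as F

# Hypothetical basis vectors and data vector (real case)
p1 = [F(1), F(0), F(1)]
p2 = [F(0), F(1), F(1)]
x  = [F(1), F(1), F(1)]

dot = lambda u, v: sum(a * b for a, b in zip(u, v))

# Grammian R and cross-correlation vector p
R = [[dot(p1, p1), dot(p2, p1)],
     [dot(p1, p2), dot(p2, p2)]]
p = [dot(x, p1), dot(x, p2)]

# Normal equations R c = p, solved by Cramer's rule for this 2x2 system
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
c = [(p[0] * R[1][1] - p[1] * R[0][1]) / det,
     (R[0][0] * p[1] - R[1][0] * p[0]) / det]

# The gradient term R c - p is zero at the minimizing coefficients
grad = [R[0][0] * c[0] + R[0][1] * c[1] - p[0],
        R[1][0] * c[0] + R[1][1] * c[1] - p[1]]
print(c, grad)   # c = [2/3, 2/3], grad = [0, 0]
```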
Restricting attention for the moment to real variables, consider the plot of the norm of the error J(c) as a function of the variables c₁, c₂, ..., cₘ. Such a plot is called an error surface. Because J(c) is quadratic in c and R is positive semidefinite, the error surface is a parabolic bowl. Figure 3.3 illustrates such an error surface for two variables c₁ and c₂. Because of its parabolic shape, any extremum must be a minimum, and is in fact a global minimum.
Figure 3.3: An error surface for two variables
3.4 Matrix representations of least-squares problems
While vector space methods apply to both infinite- and finite-dimensional vectors (signals), the notational power of matrices can be applied when the basis vectors are finite dimensional.
The linear combination of the finite set of vectors {p₁, p₂, ..., pₘ} can be written as

x̂ = Σᵢ₌₁ᵐ cᵢ pᵢ.

This is the linear combination of the columns of the matrix A defined by

A = [p₁  p₂  ...  pₘ],

which we compute by x̂ = A c. The approximation problem can be stated as follows:
Determine c to minimize ‖e‖² in the problem

x = A c + e = x̂ + e.    (3.16)

The minimum ‖e‖² = ‖x − A c‖² occurs when e is orthogonal to each of the vectors pⱼ:

⟨x − A c, pⱼ⟩ = 0,   j = 1, 2, ..., m.

Stacking these orthogonality conditions, we obtain

[ p₁^H ]
[ p₂^H ] (x − A c) = 0.
[  ...  ]
[ pₘ^H ]

Recognizing that the stack of vectors is simply A^H, we obtain

A^H A c = A^H x.    (3.17)
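The orthogonality conditions above can be verified directly on a small example. The sketch below reuses hypothetical data (two real basis vectors as the columns of A, hand-solved coefficients c = [2/3, 2/3] for x = [1, 1, 1]) and checks that the residual e = x − A c has zero inner product with every basis vector.

```python
from fractions import Fraction as F

# Columns of A and the data vector (hypothetical example)
p1 = [F(1), F(0), F(1)]
p2 = [F(0), F(1), F(1)]
x  = [F(1), F(1), F(1)]

dot = lambda u, v: sum(a * b for a, b in zip(u, v))

# Solution of the normal equations A^H A c = A^H x for this data
c = [F(2, 3), F(2, 3)]

# Residual e = x - A c
e = [x[k] - c[0] * p1[k] - c[1] * p2[k] for k in range(3)]

# Orthogonality: <e, p_j> = 0 for each basis vector
print(dot(e, p1), dot(e, p2))   # 0 0
```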
The matrix A^H A is the Grammian R, and the vector A^H x is the cross-correlation p. We can write (3.17) as

R c = p.    (3.18)

These equations are the normal equations. Then the optimal (least-squares) coefficients are
c = (A^H A)⁻¹ A^H x = R⁻¹ p.    (3.19)

By theorem 3.1, A^H A is positive definite if the p₁, ..., pₘ are linearly independent. The matrix (A^H A)⁻¹ A^H is called a pseudoinverse of A, and is often denoted A†. More is said about pseudoinverses in section 4.9. While (3.19) provides an analytical prescription for the optimal coefficients, it should rarely be computed explicitly as shown, since many problems are numerically unstable (subject to amplification of roundoff errors). Numerical stability is discussed in section 4.10. Stable methods for computing pseudoinverses are discussed in sections 5.3 and 7.3. In MATLAB, the pseudoinverse may be computed using the command pinv.
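For a full-column-rank real A, the pseudoinverse formula (A^H A)⁻¹ A^H can be evaluated by hand on a small hypothetical 3×2 matrix, as in the Python sketch below; in practice a numerically stable routine (such as MATLAB's pinv) should be used instead of this explicit formula.

```python
from fractions import Fraction as F

# Hypothetical 3x2 matrix with linearly independent columns
A = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]

# A^T A and its explicit 2x2 inverse
at_a = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
det = at_a[0][0] * at_a[1][1] - at_a[0][1] * at_a[1][0]
inv = [[at_a[1][1] / det, -at_a[0][1] / det],
       [-at_a[1][0] / det, at_a[0][0] / det]]

# Pseudoinverse A+ = (A^T A)^{-1} A^T, a 2x3 matrix
A_pinv = [[sum(inv[i][j] * A[k][j] for j in range(2)) for k in range(3)] for i in range(2)]

# For full column rank, A+ A = I (a defining property of the pseudoinverse)
ident = [[sum(A_pinv[i][k] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
print(ident)   # [[1, 0], [0, 1]]
```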
Using (3.19), the approximation is
The matnx P = A (AH A)-"AH is a projection matrix, which we encountered in section 2.13. The matrix P projects onto the range of A. Consider geometrically what is taking place: we wish to solve the equation Ac = x, but there is no exact solution, since x is not in the range of A. So we project x orthogonally down onto the range of A, and find the best solution in that range space. The idea is shown in figure 3.4
Figure 3.4: Projection solution
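The defining properties of this orthogonal projection, P² = P (projecting twice changes nothing) and P = P^H, can be confirmed on a small hypothetical real example:

```python
from fractions import Fraction as F

# Hypothetical 3x2 matrix with linearly independent columns
A = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]

# Inverse of A^T A, computed explicitly for this 2x2 case
at_a = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
det = at_a[0][0] * at_a[1][1] - at_a[0][1] * at_a[1][0]
inv = [[at_a[1][1] / det, -at_a[0][1] / det],
       [-at_a[1][0] / det, at_a[0][0] / det]]

# Projection matrix P = A (A^T A)^{-1} A^T, a 3x3 matrix
P = [[sum(A[r][i] * inv[i][j] * A[s][j] for i in range(2) for j in range(2))
      for s in range(3)] for r in range(3)]

# Idempotence: P^2 = P
P2 = [[sum(P[r][k] * P[k][s] for k in range(3)) for s in range(3)] for r in range(3)]
print(P2 == P)   # True
```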
A useful representation of the Grammian R = A^H A can be obtained by considering A as a stack of rows,

A = [ q₁^H ]
    [ q₂^H ]
    [  ... ]
    [ q_N^H ]

so that A^H = [q₁  q₂  ...  q_N] and

R = A^H A = Σᵢ₌₁ᴺ qᵢ qᵢ^H.
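This outer-product representation is easy to check numerically. The sketch below (real-valued, hypothetical 3×2 example) accumulates Σᵢ qᵢ qᵢ^T over the rows of A and compares it with A^T A computed directly.

```python
# Hypothetical 3x2 real matrix; each row is q_i^T
A = [[1, 0], [0, 1], [1, 1]]

# Accumulate the sum of outer products q_i q_i^T
R = [[0, 0], [0, 0]]
for q in A:
    for i in range(2):
        for j in range(2):
            R[i][j] += q[i] * q[j]

# Direct computation of the Grammian A^T A
R_direct = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
print(R, R == R_direct)   # [[2, 1], [1, 2]] True
```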
3.4.1 Weighted least-squares
A weight can also be applied to the data points, reflecting the confidence in the data, as illustrated by the next example. This is naturally incorporated into the inner product. Define a weighted inner product as

⟨x, y⟩_W = y^H W x.
Then minimizing ‖e‖²_W = ‖A c − x‖²_W leads to the weighted normal equations

A^H W A c = A^H W x,    (3.23)

so the coefficients which minimize the weighted squared error are

c = (A^H W A)⁻¹ A^H W x.    (3.24)
Another approach to (3.24) is to presume that we have a factorization of the weight W = S^H S (see section 5.2). Then we weight the equation:

S A c = S x.

Multiplying through by (S A)^H and solving for c, we obtain

c = ((S A)^H S A)⁻¹ (S A)^H S x,

which is equivalent to (3.24).
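The equivalence of the two routes can be checked on a small hypothetical example: a real 3×2 matrix A, diagonal weights w = (1, 1, 4) (the third data point trusted most), and the exact factor S = diag(1, 1, 2) with S^T S = W.

```python
from fractions import Fraction as F

# Hypothetical data and diagonal weights
A = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]
x = [F(1), F(1), F(1)]
w = [F(1), F(1), F(4)]

def solve_normal(B, y):
    """Solve B^T B c = B^T y for a real 3x2 B by Cramer's rule."""
    bt_b = [[sum(B[k][i] * B[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
    bt_y = [sum(B[k][i] * y[k] for k in range(3)) for i in range(2)]
    det = bt_b[0][0] * bt_b[1][1] - bt_b[0][1] * bt_b[1][0]
    return [(bt_y[0] * bt_b[1][1] - bt_y[1] * bt_b[0][1]) / det,
            (bt_b[0][0] * bt_y[1] - bt_b[1][0] * bt_y[0]) / det]

# Direct route: weighted normal equations A^T W A c = A^T W x
atwa = [[sum(w[k] * A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
atwx = [sum(w[k] * A[k][i] * x[k] for k in range(3)) for i in range(2)]
det = atwa[0][0] * atwa[1][1] - atwa[0][1] * atwa[1][0]
c_direct = [(atwx[0] * atwa[1][1] - atwx[1] * atwa[0][1]) / det,
            (atwa[0][0] * atwx[1] - atwa[1][0] * atwx[0]) / det]

# Factored route: W = S^T S with S = diag(1, 1, 2); ordinary LS on (SA, Sx)
s = [F(1), F(1), F(2)]
SA = [[s[k] * A[k][j] for j in range(2)] for k in range(3)]
Sx = [s[k] * x[k] for k in range(3)]
c_factored = solve_normal(SA, Sx)

print(c_direct, c_direct == c_factored)   # [5/9, 5/9] True
```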
3.4.2 Statistical properties of the least-squares estimate
The matrix least-squares solution (3.20) has some useful statistical properties. Suppose that the signal x has the true model according to the equation

x = A c₀ + e,    (3.25)

for some "true" model parameter vector c₀, and that we assume a statistical model for the model error e: assume that each component of e is a zero-mean, i.i.d. random variable with variance σ². The estimated parameter vector is

c = (A^H A)⁻¹ A^H x.    (3.26)
This least-squares estimate, being a function of the random vector x, is itself a random vector. We will determine the mean and covariance matrix for this random vector.

Mean of c. Substituting the "true" model of (3.25) into (3.26), we obtain

c = (A^H A)⁻¹ A^H A c₀ + (A^H A)⁻¹ A^H e = c₀ + (A^H A)⁻¹ A^H e.

If we now take the expected value of our estimated parameter vector, we obtain

E[c] = E[c₀ + (A^H A)⁻¹ A^H e] = c₀,

since each component of e has zero mean. Thus, the expected value of the estimate is equal to the true value. Such an estimate is said to be unbiased.
Covariance of c. The covariance can be written as

Cov[c] = E[(c − c₀)(c − c₀)^H] = (A^H A)⁻¹ A^H E[e e^H] A (A^H A)⁻¹.

Since the components of e are i.i.d., it follows that E[e e^H] = σ² I, so that

Cov[c] = σ² (A^H A)⁻¹ = σ² R⁻¹.
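These statistical properties can be checked by simulation. The Monte Carlo sketch below uses a hypothetical real model (3×2 matrix A, true parameters c₀ = (1, 2), Gaussian noise with σ = 0.5): it draws many noise realizations, forms the least-squares estimate each time, and compares the empirical mean with c₀ and the empirical variance of the first coefficient with the theoretical value σ²[(A^T A)⁻¹]₁₁ = 0.25 · 2/3.

```python
import random

# Hypothetical model: x = A c0 + e, with i.i.d. zero-mean Gaussian noise
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c0 = [1.0, 2.0]
sigma = 0.5
random.seed(0)

def lstsq(A, x):
    """Solve the normal equations A^T A c = A^T x for a real 3x2 A."""
    at_a = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
    at_x = [sum(A[k][i] * x[k] for k in range(3)) for i in range(2)]
    det = at_a[0][0] * at_a[1][1] - at_a[0][1] * at_a[1][0]
    return [(at_x[0] * at_a[1][1] - at_x[1] * at_a[0][1]) / det,
            (at_a[0][0] * at_x[1] - at_a[1][0] * at_x[0]) / det]

# Estimate c over many independent noise realizations
trials = []
for _ in range(5000):
    e = [random.gauss(0.0, sigma) for _ in range(3)]
    x = [A[k][0] * c0[0] + A[k][1] * c0[1] + e[k] for k in range(3)]
    trials.append(lstsq(A, x))

# Empirical mean should approach c0 (unbiasedness) and the empirical
# variance of c[0] should approach sigma^2 * [(A^T A)^{-1}]_{11} = 1/6
mean = [sum(t[i] for t in trials) / len(trials) for i in range(2)]
var0 = sum((t[0] - mean[0]) ** 2 for t in trials) / len(trials)
print(mean, var0)
```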