
Mathematical concepts for discrete inverse problems


(2.20) where K_B is the Boltzmann constant and T is referred to as the control parameter (or temperature in the original SA). The energies of the original configuration q and the new configuration r are represented as E_q and E_r, respectively. In addition, an annealing process is needed to obtain a lower-energy configuration. The well-known cooling schedule that provides the necessary and sufficient conditions for convergence is

C(t) = C_0 / log t,   ∀ t > 0   (2.21)

where C(t) is a sequence of control parameters, C_0 is a constant and t is the time. It is noted that C(t) approaches zero when t goes to infinity.

When the simulated annealing schedule is applied to an optimization problem, the energy function becomes the objective function, and the configuration becomes the solution configuration of the parameters. Equations (2.20) and (2.21) can then be further transformed as

Pro(r) = exp( −(E_r − E_q) / T(l) )   (2.22)

T(l) = T_0 / log l   (2.23)

where l denotes an integer step sequence, T_0 is the initial constant control parameter and T(l) is a sequence of control parameters. Equations (2.22) and (2.23) then give

Pro(l) = l^( −(E_r − E_q)/T_0 )   (2.24)

It is noted that Pro(l) equals 1.0 when l = 1.0.
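As a sketch of how the acceptance rule (2.22) and the logarithmic cooling schedule (2.23) fit together, the following minimal Python fragment could read as below; the function names are illustrative, not from the source, and the step counter starts at 2 so that the logarithm in the denominator is nonzero.

```python
import math

def acceptance_probability(e_old, e_new, temperature):
    # Metropolis criterion: a lower-energy configuration is always accepted;
    # otherwise accept with probability exp(-(E_r - E_q)/T) as in Eq. (2.22)
    if e_new <= e_old:
        return 1.0
    return math.exp(-(e_new - e_old) / temperature)

def control_parameter(t0, step):
    # Logarithmic cooling schedule T(l) = T0 / log l, Eq. (2.23);
    # use step >= 2 to avoid division by log(1) = 0
    return t0 / math.log(step)
```

In a full SA loop one would draw a random perturbation of the current configuration at each step, evaluate the objective function, and accept or reject the move by comparing `acceptance_probability` with a uniform random number.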

2.5 Tikhonov regularization

Regularization techniques include direct regularization methods and iterative regularization methods. A brief review of the regularization methods for numerical treatment of discrete ill-posed problems is found in Hansen (1994). Equation (2.3) shows the zeroth-order regularization when L = I (the identity matrix). It becomes the first-order regularization when L is a gradient operator, and a second-order regularization when L is a surface Laplacian operator. Only one of the three orders of regularization is employed under most circumstances. The zeroth-order regularization biases the estimates towards zero but also greatly reduces large-magnitude oscillations in the parameter values, whereas the first-order regularization biases the estimates towards a constant and reduces the tendency to fluctuate from one value to the next. Three regularization methods (truncated SVD, generalized cross-validation and L-curve) are discussed in the following sections.

2.5.1 Truncated singular value decomposition

The SVD allows the solution of singular systems by separating the components of the operator belonging to its range from those belonging to its null-space (corresponding to the null singular values). If the whole set of n singular values in Equation (2.4) is nonzero, the solution becomes:

x = Σ_{i=1}^{n} σ_i^{−1} (u_i^T b) v_i   (2.25)

The small singular values may cause solution instability, as discussed in Section 2.3.1, and the terms corresponding to the smallest singular values are the most affected. It would be useful to consider these terms as belonging to the null-space and to neglect the corresponding singular values. In this way, the solution is deprived of some information content but becomes more regular, without the solution instability due to the small singular values. This is one way to treat the ill-conditioning of A: a new problem is generated with a well-conditioned, rank-deficient coefficient matrix. The rank-deficient matrix, which is the closest rank-k approximation A_k to A as measured in the 2-norm, is obtained by truncating the SVD expansion in Equation (2.4) at k, to give

A_k = Σ_{i=1}^{k} u_i σ_i v_i^T,   k ≤ n   (2.26)

and the solution is given by

x_k = Σ_{i=1}^{k} (u_i^T b / σ_i) v_i   (2.27)
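A minimal NumPy sketch of the truncated-SVD solution (2.27) could read as follows; the function name `tsvd_solve` is an assumption for illustration.

```python
import numpy as np

def tsvd_solve(A, b, k):
    # Truncated-SVD solution x_k = sum_{i=1}^{k} (u_i^T b / sigma_i) v_i, Eq. (2.27).
    # The terms beyond k are treated as belonging to the null-space.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])
```

With k = n this reproduces the full solution of Equation (2.25); smaller k discards the terms associated with the smallest singular values.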

The number of neglected singular values should be neither too high, so as to preserve the information content, nor too low, so as to obtain a more stable solution. If an estimate, δ_b, of the amount of noise in the data is available, the summation in Equation (2.25) can be truncated when the following condition is not satisfied:

σ_k ≥ δ_b σ_1   (2.28)

This means that the first k singular values can be retained when the error in the data can be removed by filtering.

An alternative method to treat the problem is to use the discrete Picard condition to determine the number of terms in the summation in Equation (2.12). Since the error-free right-hand-side of Equation (2.1) is generally unknown, the Picard condition can be expressed using the perturbed right-hand-side, b, i.e. a bounded solution of the ill-posed problem exists if the terms |u_i^T b| decay faster than the corresponding singular values, σ_i. When the |u_i^T b| do not decay faster than the corresponding singular values, the summation should be truncated, because the remaining singular values are not able to filter out the error contained in the data. This method does not require knowledge of the error in the data. However, it can be used only when the operator A is noise-free and the error is only on the right-hand-side. In fact, error in the operator A affects both the singular values and the singular vectors. The trend of the terms |u_i^T b| can then exhibit large oscillations, and it becomes very difficult to decide when the Picard condition is satisfied.
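Assuming NumPy, one hypothetical way to inspect the discrete Picard condition is to compare the coefficients |u_i^T b| with the singular values σ_i; bounded ratios |u_i^T b|/σ_i indicate that the condition holds up to that index.

```python
import numpy as np

def picard_ratios(A, b):
    # Discrete Picard condition: a bounded solution exists if |u_i^T b|
    # decays faster than sigma_i, i.e. the ratios |u_i^T b| / sigma_i
    # remain bounded as i increases.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    coeffs = np.abs(U.T @ b)
    return coeffs, s, coeffs / s
```

In practice these three sequences are plotted on a logarithmic scale and the truncation index is chosen where the ratios start to grow.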

The third method is based on the minimization of the output residual. If b_pred is the vector of measurements reconstructed from the solution through Equation (2.1), then the non-dimensional output residual ς can be defined as

ς = ‖b_pred − b‖ / ‖b‖   (2.29)

The output residual is computed for different numbers of retained singular values, varying from 1 to n. The summation is then truncated at the value of k for which ς is a minimum.
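The residual-based truncation rule above could be sketched as follows (the function name is hypothetical); it evaluates ς of Equation (2.29) for each k and returns the minimizer.

```python
import numpy as np

def truncation_by_residual(A, b):
    # Evaluate the non-dimensional output residual of Eq. (2.29)
    # for k = 1..n retained singular values and pick the minimizing k.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    residuals = []
    for k in range(1, len(s) + 1):
        xk = Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])  # TSVD solution, Eq. (2.27)
        residuals.append(np.linalg.norm(A @ xk - b) / np.linalg.norm(b))
    return int(np.argmin(residuals)) + 1, residuals
```

Note that for noise-free, consistent data the residual decreases monotonically with k, so the criterion is informative mainly when the data are perturbed.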

2.5.2 Generalized cross-validation

The idea of cross-validation (Stone, 1974) is to maximize the predictability of the model with a choice of the regularization parameter, λ. A predictability test can be arranged by omitting one measured data point, b_k (k = 1, 2, …, n), at a time and determining an estimate, x^(k)(λ), using the remaining data points. Then, for each of the estimates, the missing data point is predicted, and the value of λ that predicts the b_k (k = 1, 2, …, n) best is found. The procedure of this cross-validation is explained in the following steps.

(1) Find the estimate, x^(k)(λ), which minimizes

Σ_{i=1, i≠k}^{n} ( b_i − Σ_{j=1}^{m} a_ij x_j )^2 + λ ‖L x‖_2^2   (2.30)

(2) Predict the missing data point,

b̂_k(λ) = Σ_{j=1}^{m} a_kj x_j^(k)(λ)   (2.31)

(3) Choose the value of λ which minimizes the cross-validation function,

V_0(λ) = (1/n) Σ_{k=1}^{n} ( b_k − b̂_k(λ) )^2   (2.32)
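The three steps can be sketched directly as a naive leave-one-out loop; the sketch assumes a Tikhonov-type estimate in step (1) and NumPy's dense solvers, and the names are illustrative.

```python
import numpy as np

def loo_cv_score(A, b, L, lam):
    # Ordinary cross-validation V0(lambda), Eqs. (2.30)-(2.32): omit one
    # data point at a time, solve the regularized problem on the rest,
    # and score how well the omitted point is predicted.
    n = A.shape[0]
    errors = []
    for k in range(n):
        mask = np.arange(n) != k
        Ak, bk = A[mask], b[mask]
        # Step (1): regularized estimate from the remaining data points
        xk = np.linalg.solve(Ak.T @ Ak + lam * (L.T @ L), Ak.T @ bk)
        # Step (2): predict the omitted data point
        b_pred = A[k] @ xk
        errors.append((b[k] - b_pred) ** 2)
    # Step (3): the function to be minimized over lambda
    return np.mean(errors)
```

This direct loop solves n regularized problems per λ; the identities developed next exist precisely to avoid that cost.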

To simplify the cross-validation, consider the identity,

b_k − b̂_k(λ) = ( b_k − Σ_{j=1}^{m} a_kj x_j(λ) ) / ( 1 − r_kk )   (2.33)

where,

r_kk = ( Σ_{j=1}^{m} a_kj x_j(λ) − b̂_k(λ) ) / ( b_k − b̂_k(λ) )   (2.34)

and x_j(λ) is the j-th term in x(λ). Since b̂_k(λ) = Σ_{j=1}^{m} a_kj x_j^(k)(λ), it follows that

r_kk = Σ_{j=1}^{m} a_kj ( x_j(λ) − x_j^(k)(λ) ) / ( b_k − b̂_k(λ) )   (2.35)

Replacing the term on the right-hand-side by a derivative, gives

r_kk = ∂/∂b_k ( Σ_{j=1}^{m} a_kj x_j(λ) ) = r_kk(λ)   (2.36)

Combining Equations (2.32), (2.33) and (2.36) gives

V_0(λ) = (1/n) Σ_{k=1}^{n} [ ( b_k − Σ_{j=1}^{m} a_kj x_j(λ) ) / ( 1 − r_kk ) ]^2   (2.37)

Equation (2.37) can be rewritten in the form

V_0(λ) = (1/n) ‖ Q(λ) ( A x(λ) − b ) ‖_2^2   (2.38)

where

Q(λ) = diag( 1 / ( 1 − r_kk(λ) ) ),   (k = 1, 2, …, n)   (2.39)

and r_kk is the kk-th element of the influence matrix, R(λ), where

R(λ) = A ( A^T A + λ L^T L )^{−1} A^T   (2.40)

Figure 2.3 The normal GCV function (V(λ) plotted against λ)

Figure 2.4 The generic form of the L-curve (log‖Lx‖_2 versus log‖Ax − b‖_2; the vertical branch corresponds to less filtering, the horizontal branch to over-filtering)

Golub et al. (1979) showed that the 'ordinary' cross-validation method led to solutions that were rotationally dependent. They replaced r_kk(λ) in Equation (2.37) with (1/n) trace(R(λ)) to give the generalized cross-validation (GCV) function,

V(λ) = [ (1/n) ‖ A x(λ) − b ‖_2^2 ] / [ (1/n) trace( I − R(λ) ) ]^2   (2.41)

Figure 2.3 shows a normal GCV function. The GCV function usually has a flat minimum, and it works well in determining the optimal λ value. However, the minimum may be difficult to locate numerically under some circumstances.
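A direct, dense-matrix sketch of the GCV function (2.41), with the influence matrix of Equation (2.40) formed explicitly, could read as below; this is suitable only for small problems, and the function name is an assumption.

```python
import numpy as np

def gcv_function(A, b, L, lam):
    # GCV function V(lambda) of Eq. (2.41), with R(lambda) as in Eq. (2.40).
    n = A.shape[0]
    M = A.T @ A + lam * (L.T @ L)
    x = np.linalg.solve(M, A.T @ b)      # Tikhonov solution x(lambda)
    R = A @ np.linalg.solve(M, A.T)      # influence matrix R(lambda)
    num = np.sum((A @ x - b) ** 2) / n
    den = (np.trace(np.eye(n) - R) / n) ** 2
    return num / den
```

In practice V(λ) is evaluated on a logarithmic grid of λ values and the minimizer is taken as the regularization parameter; because the minimum is flat, a fine grid or a dedicated 1-D minimizer helps.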

2.5.3 The L-curve

Perhaps the most convenient graphical tool for the analysis of discrete ill-posed problems is the L-curve (Hansen, 1992), which is a plot, for all valid regularization parameters, of the norm, ‖L x_reg‖_2, of the regularized solution versus the corresponding residual norm, ‖A x_reg − b‖_2. The L-curve clearly displays the compromise between the minimization of these two quantities, which is the essence of any regularization method. Figure 2.4 shows the generic form of the L-curve.

For a discrete ill-posed problem, the L-curve, when plotted in the log-log scale, usually has a characteristic L-shaped appearance with a distinct corner separating the vertical and the horizontal parts of the curve. It is noted that if x denotes the exact, un-regularized solution corresponding to the exact right-hand-side, b, in Equation (2.11), then the error, x_reg − x, in the regularized solution consists of two components: a perturbation error from the error, e, in the given vector b, and a regularization error due to the regularization of the error-free component, b, of the right-hand-side.

The vertical part of the L-curve corresponds to solutions where ‖L x_reg‖_2 is very sensitive to changes in the regularization parameter, because the perturbation error e dominates x_reg, which then does not satisfy the discrete Picard condition. The horizontal part of the L-curve corresponds to solutions where it is the residual norm, ‖A x_reg − b‖_2, that is most sensitive to the regularization parameter, because x_reg is dominated by the regularization error, as long as b satisfies the discrete Picard condition.

This can be substantiated by means of the regularized solution, x_reg, expressed in terms of the filter factors. For the general form of regularization, Equation (2.14) yields the following expression for the error in x_reg:

x_reg − x = ( Σ_{i=1}^{p} f_i (u_i^T e / σ_i) x_i + Σ_{i=p+1}^{n} (u_i^T e) x_i ) + Σ_{i=1}^{p} (f_i − 1) (u_i^T b / σ_i) x_i   (2.42)

Here, the term in parentheses is the perturbation error due to the perturbation, e, and the last term is the regularization error caused by the regularization of the unperturbed component, b, of the right-hand-side. When only a little regularization is introduced, most of the filter factors, f_i, are approximately equal to unity and the error, x_reg − x, is dominated by the perturbation error. However, when most of the filter factors are small, i.e. f_i ≪ 1, x_reg − x is dominated by the regularization error.
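For standard-form Tikhonov regularization (L = I) the filter factors take the known closed form f_i = σ_i² / (σ_i² + λ), which makes the two regimes easy to see; the following sketch is illustrative and not from the source.

```python
import numpy as np

def tikhonov_filter_factors(s, lam):
    # f_i = sigma_i^2 / (sigma_i^2 + lambda): close to 1 for sigma_i >> sqrt(lambda)
    # (little filtering, perturbation error dominates) and close to 0 for
    # sigma_i << sqrt(lambda) (heavy filtering, regularization error dominates)
    return s**2 / (s**2 + lam)
```

Sweeping λ therefore moves the transition point along the singular-value spectrum, trading perturbation error against regularization error.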

The L-curve for the Tikhonov regularization plays a central role in the regularization methods for discrete ill-posed problems because it divides the first quadrant into two regions. It is impossible to construct any solution that corresponds to a point below the Tikhonov L-curve; any regularized solution must lie on or above this curve. The solution computed by Tikhonov regularization is therefore optimal in the sense that, for a given residual norm, there does not exist a solution with a smaller semi-norm than the Tikhonov solution; the same is true with the roles of the norms interchanged.

A consequence of this is that other regularization methods can be compared with the Tikhonov regularization by inspecting how close the L-curve for the alternative method is to the Tikhonov L-curve.

For the regularized solution in Equation (2.14), there is obviously an optimal regularization parameter that balances the perturbation error and the regularization error in the solution, x_reg. An essential feature of the L-curve is that this optimal regularization parameter is not far from the regularization parameter that corresponds to the L-curve's corner. In other words, by locating the corner of the L-curve, an approximation to the optimal regularization parameter can be computed and thus, in turn, a regularized solution with a good balance between the two types of errors can be computed. For continuous L-curves, a computationally convenient definition of the L-curve's corner is the point with maximum curvature, in the log-log scale, of the curve

( log ‖A x_λ − b‖_2 , log ‖L x_λ‖_2 )   (2.43)

When the regularization parameter is discrete, the discrete L-curve in the log-log scale can be approximated by a two-dimensional spline curve. The point on the spline curve with maximum curvature is computed, and the corner of the discrete L-curve is then defined as the point on the discrete curve closest to the corner of the spline curve.
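A finite-difference sketch of corner location, as a simpler stand-in for the spline-based approach described above, could read as below; the function names and the curvature approximation are illustrative assumptions.

```python
import numpy as np

def l_curve_points(A, b, L, lambdas):
    # For each lambda, compute the L-curve point
    # (log ||A x - b||_2, log ||L x||_2) for the Tikhonov solution.
    rho, eta = [], []
    for lam in lambdas:
        x = np.linalg.solve(A.T @ A + lam * (L.T @ L), A.T @ b)
        rho.append(np.log(np.linalg.norm(A @ x - b)))
        eta.append(np.log(np.linalg.norm(L @ x)))
    return np.array(rho), np.array(eta)

def corner_index(rho, eta):
    # Approximate the point of maximum curvature of the discrete curve
    # using finite differences in place of a spline fit.
    drho, deta = np.gradient(rho), np.gradient(eta)
    d2rho, d2eta = np.gradient(drho), np.gradient(deta)
    kappa = np.abs(drho * d2eta - deta * d2rho) / (drho**2 + deta**2) ** 1.5
    return int(np.argmax(kappa))
```

The λ value at the returned index is then taken as the approximate corner, i.e. the near-optimal regularization parameter.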

2.6 General optimization procedure for the
