Downloaded 11/08/17 to 129.107.134.161. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php


ON THE GENERALIZED LANCZOS TRUST-REGION METHOD

LEI-HONG ZHANG, CHUNGEN SHEN, AND REN-CANG LI§

Abstract. The so-called trust-region subproblem gets its name from the trust-region method in optimization and also plays a vital role in various other applications. Several numerical algorithms have been proposed in the literature for solving small-to-medium size dense problems as well as for large-scale sparse problems. The generalized Lanczos trust-region (GLTR) method proposed by [N. I. M. Gould, S. Lucidi, M. Roma and P. L. Toint, SIAM J. Optim., 9 (1999), pp. 561–580]

is a natural extension of the classical Lanczos method for the linear system to the trust-region subproblem. In this paper, we first analyze the convergence of GLTR to reveal its convergence behavior in theory and then propose new stopping criteria that can be integrated into GLTR for better numerical performance. Specifically, we develop a priori upper bounds for the convergence to both the optimal objective value and the optimal solution, and we argue that these bounds can be efficiently estimated numerically and can serve as stopping criteria for iterative methods such as GLTR.

Two sets of numerical tests are presented. In the first set, we demonstrate the sharpness of the upper bounds, and in the second set, we integrate the upper bound estimates into the Fortran routine GLTR in the library GALAHAD as new stopping criteria and test the trust-region solver TRU on the problem collection CUTEr. The numerical results show that, with the new stopping criteria in GLTR, the overall performance of TRU can be improved considerably.

Key words. trust-region subproblem, Lanczos method, conjugate gradient method, trust-region method, convergence, stopping criterion

AMS subject classifications. 90C20, 90C06, 65F10, 65F15, 65F35

DOI. 10.1137/16M1095056

1. Introduction. Minimization of a quadratic function over a Euclidean ball

$$\min_{\|s\|_2 \le \Delta} f(s) \quad\text{with}\quad f(s) := \frac{1}{2}\, s^\top H s + s^\top g \tag{1.1}$$

is widely known as the trust-region subproblem (TRS) [20, 21], where $H = H^\top \in \mathbb{R}^{n\times n}$, $g \in \mathbb{R}^n$, and $\Delta > 0$ is the trust-region radius. It also shows up in other important applications such as Tikhonov regularization [24, 25, 26, 32, 33] for ill-posed problems, graph partitioning problems [13], and the Levenberg–Marquardt approach in optimization [21].

Because of its vital role in numerous applications, several algorithms have been proposed for (1.1). Basically, these algorithms can be classified into two categories:

algorithms based on matrix factorizations for small-to-medium size dense problems

Received by the editors September 22, 2016; accepted for publication (in revised form) May 5, 2017; published electronically September 28, 2017.

http://www.siam.org/journals/siopt/27-3/M109505.html

Funding: The first author was supported in part by National Natural Science Foundations of China NSFC-11371102, NSFC-11671246, and NSFC-91730303 and by the Basic Academic Discipline Program, the 11th Five Year Plan of 211 Project for Shanghai University of Finance and Economics.

The work of the second author is supported in part by National Natural Science Foundations of China NSFC-11101281 and NSFC-11271259. The third author’s work was supported in part by NSF grants DMS-1317330 and CCF-1527104 and NSFC grant 11428104.

School of Mathematics and Research School for Interdisciplinary Sciences, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China (zhang.leihong@mail.shufe.edu.cn).

College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China (shenchungen@gmail.com).

§Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019-0408 (rcli@uta.edu).



(see, e.g., [20, 21]) and factorization-free algorithms for large-scale sparse problems (see, e.g., [8, 12, 21, 23, 24, 25, 26, 30, 31, 32, 34]). The method proposed in [11] is based on matrix factorization but can often be just as successful as factorization-free methods for large problems when the sparsity structure is favorable.

The Moré–Sorensen method [20], available as subroutine GQTPAR in MINPACK-2, is probably the most well-known one for small-size dense problems, and it and its modifications are frequently embedded into programs as building blocks for solving relevant subproblems within large-scale computational problems. This is also the case for the method proposed in [9] (see also [4, Chapter 5]). In particular, the authors of [9] presented a generalized Lanczos trust-region method (GLTR) [9, Algorithm 5.1], which is an improved Steihaug–Toint [31, 34] truncated conjugate gradient (tCG) iteration for the weighted-norm TRS:

$$\min_{\|s\|_M \le \Delta} f(s), \tag{1.2}$$

where the weighting matrix $M \in \mathbb{R}^{n\times n}$ is a given symmetric positive definite matrix, and $\|s\|_M := \sqrt{s^\top M s}$ is the $M$-vector norm of $s$.

GLTR generally consists of two main phases, namely, the first pass and the second pass. In the first pass, it starts with the (preconditioned) CG iteration [9, Algorithm 4.1] for minimizing $f(s)$. During the (preconditioned) CG iterations, the objective function decreases while the $M$-norm of the iterate increases. Thus, the iteration in the first pass stops when either the solution $s_{\rm opt}$ of (1.2) is attained within the trust region $\|s\|_M \le \Delta$, or one CG step exceeds $\Delta$ in $M$-norm, or a direction of negative curvature (a vector $p$ is a direction of negative curvature if $p^\top H p < 0$) is detected. The former means that problem (1.2) is equivalent to the linear system $Hs = -g$, while the latter implies that either $H$ is indefinite and $\|s_{\rm opt}\|_M = \Delta$, or $H$ is positive semidefinite but there is a $p \in \mathbb{R}^n$ such that $Hp = 0$ and $p^\top g < 0$; the second pass is then triggered to obtain (together with a third pass) an approximation $s_k$ on the boundary. By exploiting the close relationship between CG and the Lanczos process (see [9, section 4]), in the second pass GLTR successively solves smaller TRSs that result from projecting the original TRS (1.2) onto the Krylov subspace generated by the Lanczos process or, equivalently, by CG (see [9] for details). Extensive numerical testing suggests that, by integrating the first pass and the second pass carefully [9, section 5.1], GLTR is able to achieve a boundary solution efficiently on the one hand and to maintain the efficiency of CG as long as the iterates lie in the interior on the other.

GLTR can be understood as a generalization, and indeed an efficient implementation, of a kind of Lanczos method for TRS, as detailed in [9, section 5] under the name truncated Lanczos approach (TLTRS). In particular, TLTRS mimics the classical Rayleigh–Ritz procedure (see [22, section 11.3] and [5, Definition 7.1]) for the eigenvalue problem and proceeds iteratively by the following three steps, for $k = 0, 1, \ldots$ (more detail will be given in section 3):

1. generate the $k$th Krylov subspace by the preconditioned Lanczos process [22, Algorithm 4.2] or, equivalently, the preconditioned CG [9, Algorithm 4.1];

2. project the original TRS (1.2) onto the $k$th Krylov subspace to obtain a smaller TRS;

3. solve the resulting smaller TRS to get an approximate solution to TRS (1.2).

TLTRS can be viewed as a natural extension of the classical Lanczos method (see, e.g., [5, 28]) for the linear system and eigenvalue problem to TRS. There has been


a wealth of developments, in both theory and implementation, on Lanczos-based methods; see, e.g., [5, 22, 28] for a complete development up to 1998 and, more recently, [15, 16, 17]. However, to the authors' best knowledge, a convergence analysis of the Lanczos-type method for TRS has not yet been fully developed.

Our goals in this paper are twofold. First, on the theoretical side, we analyze the convergence of TLTRS. In contrast to the a posteriori error bounds in [36], we will develop a priori upper bounds for both the convergence to the optimal objective value and the convergence to the optimal solution during TLTRS iterations. Second, on the numerical side, we will offer practical and effective estimates of the upper bounds. These estimates can be computed at roughly $O(k^2)$ extra flops, which is practical since $k \ll n$ in general, and they can therefore be used as stopping criteria for the second pass of GLTR, an efficient implementation of TLTRS.

We conduct two sets of numerical tests to support both our theoretical bounds and their practical estimates used as effective stopping criteria. In the first set, we present several numerical tests to show the sharpness of these upper bounds, and in the second set, we integrate our upper bound estimates into the Fortran package GLTR in the library GALAHAD¹ (version 2.6) and test the trust-region method implemented in the solver TRU on unconstrained minimization problems with $n \ge 100$ from the CUTEr collection (86 test problems in all). The numerical results show that, with the new stopping criteria integrated into GLTR, the overall performance of the trust-region solver TRU improves considerably.

The rest of this paper is organized as follows. In section 2, we first present some preliminary results on TRS, where the so-called nondegenerate case (or easy case) and the degenerate case (or hard case) are explicitly stated. In section 3, we then briefly describe the framework of TLTRS/GLTR as well as some basic properties.

Section 4 contains the main convergence results of this paper: in subsections 4.1 and 4.2, we discuss the convergence of TLTRS for the cases $\lambda_{\rm opt} = 0$ and $\lambda_{\rm opt} \ne 0$, respectively, where $\lambda_{\rm opt}$ denotes the Lagrangian multiplier of (1.1) associated with the solution $s_{\rm opt}$; subsection 4.3 shows how to extend the main convergence results to the weighted-norm TRS (1.2). Numerical verification of the sharpness of the established upper bounds is carried out in section 5. In section 6, we suggest new stopping criteria for GLTR in GALAHAD, and their numerical performance in comparison with the original GLTR is presented in section 7. Final conclusions are drawn in section 8.

Notation. Throughout this paper, all vectors are column vectors and are typeset in bold lowercase letters. For $x \in \mathbb{R}^n$ (the set of all real $n$-vectors), $x_i$ stands for its $i$th entry. For $A \in \mathbb{R}^{m\times n}$ (the set of all $m\times n$ real matrices), $A^\dagger$ stands for the Moore–Penrose inverse of $A$, and $A^\top$ and $\mathcal{R}(A)$ denote its transpose and range, respectively. The $n\times n$ identity matrix is $I_n$, or simply $I$ if its size is clear from the context, and $e_j$ is the $j$th column of an identity matrix whose size is determined by the context. To simplify our presentation, we shall also adopt MATLAB-like conventions to access the entries of vectors and matrices. For example, $A_{(i,j)}$ is the $(i,j)$th entry of $A$. With $i{:}j$ for the set of integers from $i$ to $j$ inclusive, $A_{(k:\ell,\, i:j)}$ is the submatrix of $A$ that consists of the intersections of rows $k$ to $\ell$ and columns $i$ to $j$.

2. Optimality conditions. The following well-known optimality conditions are due to Gay [7] and Moré and Sorensen [20] (see also [29] and [21, Theorem 4.1]). They have served as the fundamental guideline for most existing methods for TRS.

¹GALAHAD is a thread-safe library of Fortran 2003 packages for solving nonlinear optimization problems; its version 2.6 is available at http://www.galahad.rl.ac.uk/.


Theorem 2.1 (see [29]). The vector $s_{\rm opt}$ is a global optimal solution of the trust-region problem (1.1) if and only if $s_{\rm opt}$ is feasible, i.e., $\|s_{\rm opt}\|_2 \le \Delta$, and there is a scalar $\lambda_{\rm opt} \ge 0$ such that the following conditions are satisfied:

$$(H + \lambda_{\rm opt} I_n)\, s_{\rm opt} = -g, \qquad \lambda_{\rm opt}(\Delta - \|s_{\rm opt}\|_2) = 0,$$

and $H + \lambda_{\rm opt} I_n$ is positive semidefinite.

Let the eigendecomposition of $H$ be

$$H = U \Theta U^\top \quad\text{with}\quad \Theta = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_n),$$

where the eigenvector matrix $U = [u_1, u_2, \ldots, u_n]$ is orthogonal, and

$$\theta_1 = \theta_2 = \cdots = \theta_p < \theta_{p+1} \le \cdots \le \theta_n \tag{2.1}$$

are the eigenvalues. In (2.1), we assume $\theta_1$ has multiplicity $p$. Let $\mathcal{E}_1$ be the invariant subspace associated with the smallest eigenvalue $\theta_1$. Then $U_1 = [u_1, \ldots, u_p] \in \mathbb{R}^{n\times p}$ is an orthonormal basis matrix for $\mathcal{E}_1$. Write $U = [U_1, U_2]$, where $U_2 = [u_{p+1}, \ldots, u_n]$, and set $\mathcal{E}_2 = \mathcal{R}(U_2) = \mathcal{E}_1^\perp$, the orthogonal complement of $\mathcal{E}_1$.

For TRS (1.1), there are two situations (see, e.g., [12, 20, 21]) to consider:

1. the degenerate case [12, Lemma 2.2] (or the hard case [21]), characterized by²

$$g \perp \mathcal{E}_1 \quad\text{and}\quad \|(H - \theta_1 I_n)^\dagger g\|_2 \le \Delta, \tag{2.2}$$

with corresponding Lagrangian multiplier $\lambda_{\rm opt} = -\theta_1$. In this case, there are multiple global solutions, which can be expressed as [12, Lemma 2.2]

$$s_{\rm opt} = -(H - \theta_1 I_n)^\dagger g + \tau u \tag{2.3}$$

for any $u \in \mathcal{E}_1$ with $\|u\|_2 = 1$, and

$$\tau^2 = \Delta^2 - \|(H - \theta_1 I_n)^\dagger g\|_2^2 \ge 0;$$

2. the nondegenerate case [12, Lemma 2.2] (or the easy case [21]), characterized by the opposite of (2.2). In this case, the corresponding Lagrangian multiplier satisfies $\lambda_{\rm opt} > \max\{-\theta_1, 0\}$, and the global solution $s_{\rm opt}$ is unique and given by [12, Lemma 2.2]

$$s_{\rm opt} = -(H + \lambda_{\rm opt} I_n)^{-1} g.$$

By examining these two cases, it can be seen that if $H$ is positive definite, the global solution $s_{\rm opt}$ can only be either $s_{\rm opt} = -H^{-1} g$ (i.e., $\lambda_{\rm opt} = 0$) or $s_{\rm opt} = -(H + \lambda_{\rm opt} I_n)^{-1} g$ (i.e., $\lambda_{\rm opt} > -\theta_1$) on the boundary. Therefore, the degenerate case can occur only when $\theta_1 \le 0$.
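The dichotomy above suggests a simple computational recipe for the easy case: first try the interior step $-H^{-1}g$, and otherwise bisect the secular equation $\|(H + \lambda I_n)^{-1} g\|_2 = \Delta$ for $\lambda > \max\{-\theta_1, 0\}$. The following minimal Python sketch is our own illustration (the function name and interface are ours; this is not GQTPAR or any GALAHAD routine) and returns a pair satisfying the conditions of Theorem 2.1:

```python
import numpy as np

def trs_nondegenerate(H, g, Delta, tol=1e-12):
    """Solve min f(s) = 0.5*s'Hs + s'g over ||s||_2 <= Delta in the easy
    (nondegenerate) case, by bisection on ||(H + lam*I)^{-1} g||_2 = Delta.
    Illustrative only; returns (s, lam)."""
    theta1 = np.linalg.eigvalsh(H)[0]
    n = len(g)
    if theta1 > 0:                       # H positive definite: try the interior step
        s = np.linalg.solve(H, -g)
        if np.linalg.norm(s) <= Delta:
            return s, 0.0                # lam_opt = 0, solution strictly feasible
    # Boundary solution: ||s(lam)||_2 decreases for lam > max(-theta1, 0).
    norm_s = lambda lam: np.linalg.norm(np.linalg.solve(H + lam*np.eye(n), -g))
    lo = max(-theta1, 0.0) + 1e-14
    hi = lo + 1.0
    while norm_s(hi) > Delta:            # expand until the root is bracketed
        hi = 2.0*hi + 1.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if norm_s(mid) > Delta else (lo, mid)
    lam = 0.5*(lo + hi)
    return np.linalg.solve(H + lam*np.eye(n), -g), lam
```

A dense eigendecomposition-based routine such as this is only sensible for small problems; the factorization-free methods cited above are designed precisely to avoid it.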

3. The truncated Lanczos approach for TRS. We first outline the TLTRS method [9, section 5]. Given a symmetric positive definite $M \in \mathbb{R}^{n\times n}$, TLTRS starts by using the generalized Lanczos process to produce an $M$-orthonormal basis $Q_k = [q_0, q_1, \ldots, q_k] \in \mathbb{R}^{n\times(k+1)}$ of the $(k+1)$st Krylov subspace³

$$\mathcal{K}_k(M^{-1}H, M^{-1}g) := \mathcal{R}\big([\,M^{-1}g,\ (M^{-1}H)M^{-1}g,\ \ldots,\ (M^{-1}H)^k M^{-1}g\,]\big), \quad k = 0, 1, \ldots,$$

of $M^{-1}H$ on $M^{-1}g$, and to partially reduce $H$ to the tridiagonal matrix

$$T_k := Q_k^\top H Q_k = \begin{bmatrix} \delta_0 & \gamma_1 & & & \\ \gamma_1 & \delta_1 & \gamma_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma_{k-1} & \delta_{k-1} & \gamma_k \\ & & & \gamma_k & \delta_k \end{bmatrix}, \tag{3.1}$$

where $Q_k^\top M Q_k = I_{k+1}$ [9, Algorithm 4.2], assuming $\dim \mathcal{K}_k(M^{-1}H, M^{-1}g) = k+1$. Compactly, the process can be expressed by the relation

$$H Q_k - M Q_k T_k = \gamma_{k+1}\, M q_{k+1} e_{k+1}^\top \tag{3.2}$$

with $\gamma_0 = \|M^{-1}g\|_2$ and $Q_k e_1 = q_0 := \gamma_0^{-1}(M^{-1}g)$. This leads to the following reduced TRS:

$$\min_{\|h\|_2 \le \Delta} \phi(h) \quad\text{with}\quad \phi(h) := \frac{1}{2}\, h^\top T_k h + \gamma_0\, h^\top e_1. \tag{3.3}$$

Let $h_k$ be the minimizer of (3.3). It can be verified that the vector

$$s_k = Q_k h_k \in \mathcal{K}_k(M^{-1}H, M^{-1}g)$$

is the minimizer of

$$\min_{\substack{s \in \mathcal{K}_k(M^{-1}H,\, M^{-1}g) \\ \|s\|_M \le \Delta}} f(s) \tag{3.4}$$

and thus naturally serves as an approximation to the global optimal solution $s_{\rm opt}$ of (1.2).

Generically, $\dim \mathcal{K}_k(M^{-1}H, M^{-1}g)$ strictly increases by 1 as $k$ increases by 1, and thus often $\dim \mathcal{K}_k(M^{-1}H, M^{-1}g) = k+1$ until $k = n-1$. But it can happen that $\dim \mathcal{K}_k(M^{-1}H, M^{-1}g)$ stops increasing at a certain $k$. When that happens, the Lanczos process breaks down, and an invariant subspace of $M^{-1}H$ has been found. Let $k_{\max}$ be the smallest nonnegative integer such that

$$\dim \mathcal{K}_{k_{\max}}(M^{-1}H, M^{-1}g) = \dim \mathcal{K}_{k_{\max}+1}(M^{-1}H, M^{-1}g) = k_{\max}+1. \tag{3.5}$$

This is reflected by $\gamma_{k_{\max}+1} = 0$, while $\gamma_k \ne 0$ for all $0 \le k \le k_{\max}$. In such a case, $H Q_{k_{\max}} = M Q_{k_{\max}} T_{k_{\max}}$.
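As a concrete illustration of relation (3.2) in the unpreconditioned case $M = I_n$, the following sketch (our own simplified implementation with full reorthogonalization for numerical robustness, not the GALAHAD code) builds $Q_k$ and $T_k$ and exposes the residual term $\gamma_{k+1}\, q_{k+1} e_{k+1}^\top$:

```python
import numpy as np

def lanczos(H, g, k):
    """Lanczos process on H started from g (the M = I_n case of section 3).
    Returns Q_k (n x (k+1)) with orthonormal columns, the tridiagonal T_k,
    and gamma_{k+1}, q_{k+1} so that H Q_k - Q_k T_k = gamma_{k+1} q_{k+1} e_{k+1}'."""
    n = len(g)
    Q = np.zeros((n, k + 2))
    delta = np.zeros(k + 1)
    gamma = np.zeros(k + 2)
    gamma[0] = np.linalg.norm(g)               # gamma_0 = ||g||_2
    Q[:, 0] = g / gamma[0]
    for j in range(k + 1):
        w = H @ Q[:, j]
        if j > 0:
            w -= gamma[j] * Q[:, j - 1]
        delta[j] = Q[:, j] @ w
        w -= delta[j] * Q[:, j]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        gamma[j + 1] = np.linalg.norm(w)
        if gamma[j + 1] > 0:                       # gamma_{j+1} = 0 means breakdown
            Q[:, j + 1] = w / gamma[j + 1]
    T = np.diag(delta) + np.diag(gamma[1:k + 1], 1) + np.diag(gamma[1:k + 1], -1)
    return Q[:, :k + 1], T, gamma[k + 1], Q[:, k + 1]
```

In exact arithmetic the residual $HQ_k - Q_kT_k$ is nonzero only in its last column, which equals $\gamma_{k+1} q_{k+1}$; the reorthogonalization keeps this true to machine precision for small dense examples.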

³We adopt a notation convention consistent with the one used in [9]. That is, we use $\mathcal{K}_k$ for the Krylov subspace of order $k+1$ and, accordingly, $T_k$ and $Q_k$ for the generated $(k+1)\times(k+1)$ symmetric tridiagonal matrix and the $n\times(k+1)$ orthonormal basis matrix, different from the $\mathcal{K}_{k+1}$, $T_{k+1}$, and $Q_{k+1}$ that are customarily used in the numerical linear algebra community [5, p. 305].


Remark 3.1. GLTR can be thought of as an efficient implementation of TLTRS. In GLTR, the (preconditioned) CG is used instead of the (preconditioned) Lanczos process, and in the first pass, the constraint $\|s\|_M \le \Delta$ in (3.4) is implicitly skipped and $s_k$ is updated by CG until $\|s_k\|_M$ exceeds $\Delta$ or $T_k$ is detected to be indefinite. GLTR then enters the second pass, after which $\mathcal{K}_k(M^{-1}H, M^{-1}g)$ is continuously expanded by (preconditioned) CG and

$$s_k = \arg\min_{\substack{s \in \mathcal{K}_k(M^{-1}H,\, M^{-1}g) \\ \|s\|_M = \Delta}} f(s).$$

Therefore, the second pass of GLTR is mathematically equivalent to TLTRS.

Two special cases for $M = I_n$ are worth mentioning:

1. The case $g = 0$: TLTRS reduces to the classical Lanczos method for finding the smallest eigenpair of $H$.

2. The case when $H$ is positive definite and $\Delta \ge \|H^{-1}g\|_2$: TLTRS is equivalent to CG for solving the linear system $Hs = -g$ (see subsection 4.2 for details).

In view of these two special cases, we may say that TLTRS, or its efficient implementation GLTR, lies between the Lanczos-based method for the linear system and that for the symmetric eigenvalue problem.

Remark 3.2. To simplify our presentation, in what follows we assume the weighting matrix $M = I_n$, except in subsection 4.3, and discuss the relations between the optimal values and optimal solutions of the classical problem (1.1) and the projected one (3.3). Mathematically, we will see in subsection 4.3 that doing so loses no generality, because any convergence result for $M = I_n$ can be translated into one for $M \ne I_n$ through the substitutions

$$H \leftarrow M^{-1/2} H M^{-1/2}, \qquad g \leftarrow M^{-1/2} g.$$

Setting $M = I_n$ simplifies $Q_k$ to having orthonormal columns, i.e., $Q_k^\top Q_k = I_{k+1}$, and (3.2) to

$$H Q_k - Q_k T_k = \gamma_{k+1}\, q_{k+1} e_{k+1}^\top, \qquad \gamma_0 = \|g\|_2, \quad Q_k e_1 = g/\gamma_0. \tag{3.6}$$

As previously assumed, let $k_{\max}$ be the smallest nonnegative integer such that (3.5) holds, i.e., the Lanczos process breaks down at iteration $k_{\max}$, and let $k \le k_{\max} \le n$. Let $Q_\perp \in \mathbb{R}^{n\times(n-k_{\max}-1)}$ be any orthogonal complement of $Q_{k_{\max}}$ such that $Q := [Q_{k_{\max}}, Q_\perp] \in \mathbb{R}^{n\times n}$ is orthogonal. We have

$$Q^\top H Q = \begin{bmatrix} T_{k_{\max}} & \\ & Q_\perp^\top H Q_\perp \end{bmatrix} =: T \quad\text{with}\quad T_{k_{\max}} = \begin{bmatrix} T_k & \gamma_{k+1}\, e_{k+1} e_1^\top \\ \gamma_{k+1}\, e_1 e_{k+1}^\top & \widetilde{T}_k \end{bmatrix}, \tag{3.7}$$

where $T_{k_{\max}}$ is the full $(k_{\max}+1)\times(k_{\max}+1)$ symmetric tridiagonal matrix with diagonal entries $\delta_0, \delta_1, \ldots, \delta_{k_{\max}}$ and off-diagonal entries $\gamma_1, \ldots, \gamma_{k_{\max}}$, and $\widetilde{T}_k$ is its trailing $(k_{\max}-k)\times(k_{\max}-k)$ principal block.


Denote by $\sigma_i^{(k)}$ the eigenvalues of $T_k$, often called the Ritz values, arranged in nondecreasing order:

$$\sigma_1^{(k)} \le \sigma_2^{(k)} \le \cdots \le \sigma_{k+1}^{(k)}.$$

Associated with every Lanczos step $k$ before a breakdown is the corresponding TRS (3.3). Let $h_k$ and $\lambda_k$ be the solution of (3.3) and the Lagrangian multiplier for it, respectively, and set $s_k = Q_k h_k$. With these settings, the following lemma holds.

Lemma 3.3. We have

(i) for any $k = 0, 1, \ldots, k_{\max}$,

$$\theta_i \le \sigma_i^{(k)} \le \theta_{n+i-k-1} \quad\text{for } i = 1, 2, \ldots, k+1;$$

(ii) for $0 \le j \le k \le k_{\max}$,

$$\sigma_i^{(k)} \le \sigma_i^{(j)} \quad\text{for } i = 1, 2, \ldots, j+1;$$

(iii) in the nondegenerate case, $s_{k_{\max}} = s_{\rm opt}$ and $\lambda_{k_{\max}} = \lambda_{\rm opt}$.

Proof. The inequalities in items (i) and (ii) are straightforward consequences of the Cauchy interlacing inequalities [22].

Item (iii) for the case $g \not\perp \mathcal{E}_1$ has been proved in [9, Theorem 5.7]. We consider the remaining scenario

$$g \perp \mathcal{E}_1 \quad\text{but}\quad \|(H - \theta_1 I_n)^\dagger g\|_2 > \Delta.$$

Define $\rho(\lambda) := \|(H + \lambda I_n)^\dagger g\|_2$. Using the eigendecomposition of $H$, we can write $\rho(\lambda)$ as a secular function, from which we see that the condition $g \perp \mathcal{E}_1$ implies that $\rho(\lambda)$ is a continuous and nonincreasing function of $\lambda \in (-\theta_{p+1}, +\infty)$. Also note from $Q^\top g = \gamma_0 e_1$ and (3.7) with $\gamma_{k_{\max}+1} = 0$ that, for $\lambda > -\theta_{p+1}$,

$$\rho(\lambda) = \|(Q T Q^\top + \lambda I_n)^\dagger g\|_2 = \gamma_0 \|(T + \lambda I_n)^\dagger e_1\|_2 = \gamma_0 \|(T_{k_{\max}} + \lambda I_{k_{\max}+1})^\dagger e_1\|_2,$$

implying that $\gamma_0 \|(T_{k_{\max}} + \lambda I_{k_{\max}+1})^\dagger e_1\|_2$ is also a continuous and nonincreasing function of $\lambda > -\theta_{p+1}$. Thus, it follows from

$$\rho(\lambda_{k_{\max}}) = \gamma_0 \|(T_{k_{\max}} + \lambda_{k_{\max}} I_{k_{\max}+1})^\dagger e_1\|_2 \le \Delta < \rho(-\theta_1)$$

that $\lambda_{k_{\max}} > -\theta_1$, i.e., $H + \lambda_{k_{\max}} I_n$ is positive definite. Moreover, by [9, Theorem 5.1], $(H + \lambda_{k_{\max}} I_n)\, s_{k_{\max}} = -g$, which, according to Theorem 2.1 and the uniqueness of the solution in the nondegenerate case, leads to item (iii).

Lemma 3.3(iii) says that when a breakdown occurs, TLTRS solves the original problem (1.1) exactly in the nondegenerate case. In the degenerate case, however, the solution $s_{\rm opt}$ is of the form (2.3) with $\tau > 0$. As the Lanczos process starting from $g$ cannot extract any information from the eigenspace $\mathcal{E}_1$, the approximate solution $s_k = Q_k h_k$ does not contain the component $\tau u$ for any $u \in \mathcal{E}_1$, even for $k = k_{\max}$. In other words, the projected problem (3.3) can never deliver a sufficiently close approximation to the original problem (1.1) in the degenerate case. This is fully discussed in [9, Theorem 5.8], together with a restarting strategy to cure the problem. Therefore, our convergence analysis in section 4 is mainly concerned with the nondegenerate case.

We conclude this section with an important result from [18], which says that the Lagrangian multipliers $\lambda_k$ are monotonically nondecreasing in $k$.


Lemma 3.4 (see [18]). The sequence $\{\lambda_k\}_{k=0}^{k_{\max}}$ of Lagrangian multipliers associated with (3.3) is monotonically nondecreasing in $k$.

Combining Lemmas 3.3 and 3.4, we obtain the following Proposition 3.5, the first step in the convergence analysis of TLTRS/GLTR.

Proposition 3.5. Let $0 \le k \le k_{\max}$. Then

(i) if $\lambda_k = 0$, then $\lambda_i = 0$ for $i = 0, 1, \ldots, k$; and

(ii) in the nondegenerate case, $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{k_{\max}} = \lambda_{\rm opt}$.
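The monotonicity in Lemma 3.4 and Proposition 3.5 can be observed numerically with a toy TLTRS loop. The sketch below is our own illustration: a dense bisection-based solver stands in for the tridiagonal TRS solver used by GLTR, and full orthogonalization replaces the three-term Lanczos recurrence for clarity. It records the multiplier $\lambda_k$ of each reduced problem (3.3):

```python
import numpy as np

def solve_trs(H, g, Delta):
    """Dense easy-case TRS solver via bisection on ||(H + lam*I)^{-1}g||_2 = Delta.
    Illustrative only; returns (s, lam)."""
    theta1 = np.linalg.eigvalsh(H)[0]
    n = len(g)
    if theta1 > 0:
        s = np.linalg.solve(H, -g)
        if np.linalg.norm(s) <= Delta:
            return s, 0.0                      # interior solution, lam = 0
    ns = lambda lam: np.linalg.norm(np.linalg.solve(H + lam*np.eye(n), -g))
    lo = max(-theta1, 0.0) + 1e-14
    hi = lo + 1.0
    while ns(hi) > Delta:                      # bracket the boundary multiplier
        hi = 2.0*hi + 1.0
    for _ in range(200):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if ns(mid) > Delta else (lo, mid)
    lam = 0.5*(lo + hi)
    return np.linalg.solve(H + lam*np.eye(n), -g), lam

def tltrs_multipliers(H, g, Delta, kmax):
    """Project the TRS onto growing Krylov subspaces K_k(H, g) and return the
    Lagrangian multiplier lam_k of each reduced problem (3.3)."""
    n = len(g)
    Q = np.zeros((n, kmax + 2))
    Q[:, 0] = g / np.linalg.norm(g)
    lams = []
    for k in range(kmax + 1):
        T = Q[:, :k+1].T @ H @ Q[:, :k+1]      # T_k = Q_k' H Q_k (tridiagonal up to roundoff)
        lin = np.zeros(k + 1)
        lin[0] = np.linalg.norm(g)             # gamma_0 * e_1, the reduced linear term
        _, lam = solve_trs(T, lin, Delta)
        lams.append(lam)
        w = H @ Q[:, k]                        # expand the basis by one vector
        w -= Q[:, :k+1] @ (Q[:, :k+1].T @ w)   # full orthogonalization, for clarity
        if np.linalg.norm(w) < 1e-12:          # breakdown: invariant subspace found
            break
        Q[:, k+1] = w / np.linalg.norm(w)
    return lams
```

On a generic (nondegenerate) example, the recorded multipliers are nondecreasing and the final one matches the multiplier of the full problem, as Proposition 3.5(ii) predicts.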

4. Convergence analysis for TLTRS. Throughout this section, we assume that (1.1) is nondegenerate, unless explicitly stated otherwise. Also, since GLTR is an efficient implementation of TLTRS and is essentially equivalent to it (cf. Remark 3.1), our convergence analysis will focus on TLTRS only.

We will analyze the convergence for the two cases $\lambda_{\rm opt} = 0$ and $\lambda_{\rm opt} > 0$ separately.

Let $\mathcal{P}_k$ denote the set of all polynomials of degree no higher than $k$. Chebyshev polynomials will show up in our convergence analysis. The $k$th Chebyshev polynomial of the first kind $\mathcal{T}_k(t) \in \mathcal{P}_k$ is

$$\mathcal{T}_k(t) = \begin{cases} \cos(k \arccos t) & \text{for } |t| \le 1, \\[4pt] \dfrac{1}{2}\left[\left(t + \sqrt{t^2 - 1}\right)^{k} + \left(t + \sqrt{t^2 - 1}\right)^{-k}\right] & \text{for } |t| \ge 1. \end{cases}$$

It frequently shows up in numerical analysis and computation because of its numerous nice properties; for example, $|\mathcal{T}_k(t)| \le 1$ for $|t| \le 1$, while $|\mathcal{T}_k(t)|$ grows extremely fast⁴ for $|t| > 1$. We have the following classical result (see, e.g., [28, Theorem 6.25]).

Lemma 4.1. For a given $\gamma \notin [a, b]$, we have

$$\min_{p \in \mathcal{P}_k,\ p(\gamma) = 1}\ \max_{t \in [a, b]} |p(t)| = \left|\mathcal{T}_k\!\left(1 + 2\,\frac{\gamma - b}{b - a}\right)\right|^{-1} = \left|\mathcal{T}_k\!\left(-\,\frac{[b - \gamma] + [a - \gamma]}{[b - \gamma] - [a - \gamma]}\right)\right|^{-1}. \tag{4.1}$$
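The minimum in Lemma 4.1 is attained by the shifted and scaled Chebyshev polynomial $p^*(t) = \mathcal{T}_k\big(1 + 2(t - b)/(b - a)\big)/\mathcal{T}_k\big(1 + 2(\gamma - b)/(b - a)\big)$, which satisfies $p^*(\gamma) = 1$. This can be checked numerically on a grid (an illustrative sketch of ours, not from the paper):

```python
import numpy as np

def cheb(k, t):
    """T_k(t) on and off [-1, 1] (hyperbolic form outside the interval)."""
    t = np.asarray(t, dtype=float)
    inside = np.cos(k * np.arccos(np.clip(t, -1.0, 1.0)))
    outside = np.sign(t)**k * np.cosh(k * np.arccosh(np.maximum(np.abs(t), 1.0)))
    return np.where(np.abs(t) <= 1.0, inside, outside)

# p*(t) = T_k(1 + 2(t - b)/(b - a)) / T_k(1 + 2(gamma - b)/(b - a)) satisfies
# p*(gamma) = 1, and its max modulus on [a, b] equals the right-hand side of (4.1).
a, b, gamma, k = 1.0, 10.0, 0.0, 5
ref = cheb(k, 1.0 + 2.0*(gamma - b)/(b - a))
t_grid = np.linspace(a, b, 20001)
p_star = cheb(k, 1.0 + 2.0*(t_grid - b)/(b - a)) / ref
print(float(np.max(np.abs(p_star))), 1.0/abs(float(ref)))
```

On $[a, b]$ the argument of the numerator lies in $[-1, 1]$, so $|p^*|$ equioscillates with maximum exactly $1/|\mathcal{T}_k(1 + 2(\gamma - b)/(b - a))|$.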

There is an elegant expression for $\mathcal{T}_k(\cdot)$ in (4.1), namely [15],

$$\mathcal{T}_k\!\left(\frac{1 + t}{1 - t}\right) = \mathcal{T}_k\!\left(\frac{t + 1}{t - 1}\right) = \frac{1}{2}\left(\Gamma_t^{\,k} + \Gamma_t^{-k}\right) \quad\text{for } 1 \ne t > 0, \tag{4.2a}$$

where

$$\Gamma_t := \frac{\sqrt{t} + 1}{|\sqrt{t} - 1|} \quad\text{for } t > 0. \tag{4.2b}$$
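Identity (4.2a) in the regime $t > 1$, where the common argument $(t+1)/(t-1)$ exceeds 1, can be verified directly (an illustrative check of ours; the key fact is that $\frac{1}{2}(\Gamma_t + \Gamma_t^{-1}) = (t+1)/|t-1|$, so both sides are $\mathcal{T}_k$ evaluated via its hyperbolic form):

```python
import math

def cheb(k, t):
    """T_k(t) for t >= 1, via the hyperbolic form."""
    return math.cosh(k * math.acosh(t))

def Gamma(t):
    """Gamma_t = (sqrt(t) + 1)/|sqrt(t) - 1| from (4.2b)."""
    r = math.sqrt(t)
    return (r + 1.0) / abs(r - 1.0)

# Check (4.2a) for several t > 1 and degrees k.
for t in (1.5, 2.0, 9.0):
    for k in (1, 2, 5):
        lhs = cheb(k, (t + 1.0)/(t - 1.0))
        rhs = 0.5*(Gamma(t)**k + Gamma(t)**(-k))
        assert abs(lhs - rhs) < 1e-9 * rhs
print("identity (4.2a) verified for sample t > 1")
```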

4.1. Convergence when $\lambda_{\rm opt} = 0$. In this case, $H$ is positive definite and, moreover, $\|H^{-1}g\|_2 \le \Delta$, implying that (1.1) is equivalent to the linear system $H s_{\rm opt} = -g$, and

$$f(s) = \frac{1}{2}(s_{\rm opt} - s)^\top H (s_{\rm opt} - s) - \frac{1}{2}\, s_{\rm opt}^\top H s_{\rm opt}.$$

Furthermore, by Lemma 3.3(iii) and Proposition 3.5, we know that $\lambda_k = 0$ for all $k = 0, 1, \ldots, k_{\max}$, which implies that each TRS (3.3) is equivalent to the linear

⁴In fact, a result due to Chebyshev himself says that if $p(t)$ is a polynomial of degree no bigger than $k$ and $|p(t)| \le 1$ for $-1 \le t \le 1$, then $|p(t)| \le |\mathcal{T}_k(t)|$ for any $t$ outside $[-1, 1]$ [3, p. 65].


system $T_k h_k = -\gamma_0 e_1$, and TLTRS turns out to be the full orthogonalization method [28, Algorithm 6.4]. Indeed,

$$s_k = Q_k h_k = \arg\min_{s \in \mathcal{K}_k(H, g)} \frac{1}{2}(s_{\rm opt} - s)^\top H (s_{\rm opt} - s),$$

the same as the iterate obtained from CG [28, section 6.7] applied to the linear system $H s_{\rm opt} = -g$. In other words, in this case, GLTR will never go over the boundary of $\|s\|_2 \le \Delta$ (i.e., the second pass of GLTR will never be called), and thereby the approximation $s_k$ from GLTR is the same as that from the CG iteration for $H s_{\rm opt} = -g$. Consequently, the standard convergence theory [28, section 6.11.3] for CG applies in this situation.

In particular, we have

$$\frac{\|s_{\rm opt} - s_k\|_H}{\|s_{\rm opt}\|_H} \le \frac{1}{\mathcal{T}_{k+1}\big((\kappa+1)/(\kappa-1)\big)} = \frac{2}{\Gamma_\kappa^{\,k+1} + \Gamma_\kappa^{-(k+1)}}, \tag{4.3}$$

where $\Gamma_\kappa$ is defined by (4.2), and $\kappa := \kappa(H) = \theta_n/\theta_1$ is the spectral condition number of $H$. In terms of the spectral norm, (4.3) implies

$$\frac{\|s_{\rm opt} - s_k\|_2}{\|s_{\rm opt}\|_2} \le \frac{\sqrt{\kappa}}{\mathcal{T}_{k+1}\big((\kappa+1)/(\kappa-1)\big)},$$

and, by $f(s_{\rm opt}) = -\frac{1}{2}\, s_{\rm opt}^\top H s_{\rm opt}$ in this case, it holds that

$$0 \le f(s_k) - f(s_{\rm opt}) = \frac{1}{2}(s_{\rm opt} - s_k)^\top H (s_{\rm opt} - s_k) \le \frac{\|H\|_2\, \kappa\, \|s_{\rm opt}\|_2^2}{2\, \mathcal{T}_{k+1}^2\big((\kappa+1)/(\kappa-1)\big)}.$$

4.2. Convergence when $\lambda_{\rm opt} > 0$. If $\lambda_{\rm opt} > 0$, then $\|s_{\rm opt}\|_2 = \Delta$. This is the case when $H$ is indefinite, or positive definite but with $\|H^{-1}g\|_2 > \Delta$.

We point out, however, that $\lambda_{\rm opt} > 0$ does not imply that all $\lambda_k > 0$, as the following simple example demonstrates: there, $\lambda_{\rm opt} > 0$ but $\lambda_k = 0$ for some $0 \le k \le k_{\max}$, i.e., $h_k = -\gamma_0 T_k^{-1} e_1$ is the solution to the projected TRS (3.3).

Example 4.2. Consider TRS with

$$H = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \qquad g = e_1 \in \mathbb{R}^2, \qquad \Delta = 1.$$

It can be verified that $\lambda_{\rm opt} \approx 0.1701$ and $s_{\rm opt} \approx [-0.7602,\ 0.6497]^\top$, but $\lambda_0 = 0$ and $s_0 = -e_1/2$.
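The numbers in Example 4.2 are easy to reproduce: the $k = 0$ projected problem is one-dimensional and its unconstrained minimizer lies inside the ball, while the full problem must be solved on the boundary via the secular equation (an illustrative computation of ours):

```python
import numpy as np

H = np.array([[2.0, 1.0], [1.0, 1.0]])
g = np.array([1.0, 0.0])                  # g = e_1
Delta = 1.0

# k = 0: K_0 = span{g}; the 1-D unconstrained minimizer of f(alpha*g) is
# alpha = -(g'g)/(g'Hg) = -1/2, which lies inside the ball, so lambda_0 = 0.
alpha = -(g @ g) / (g @ (H @ g))
s0 = alpha * g

# Full problem: ||H^{-1}g||_2 = sqrt(2) > Delta, so s_opt is on the boundary;
# bisect the secular equation ||(H + lam*I)^{-1} g||_2 = Delta for lam > 0.
ns = lambda lam: np.linalg.norm(np.linalg.solve(H + lam*np.eye(2), -g))
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if ns(mid) > Delta else (lo, mid)
lam_opt = 0.5*(lo + hi)
s_opt = np.linalg.solve(H + lam_opt*np.eye(2), -g)
print(lam_opt, s_opt)   # approx 0.1701 and [-0.7602, 0.6497]
```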

Even though $\lambda_k = 0$ may happen in the early stage of TLTRS, eventually $\lambda_k > 0$ as $k$ increases, and thereby $\|s_k\|_2 = \Delta$. This also means that GLTR will eventually encounter the boundary of $\|s\|_2 \le \Delta$ and proceed to the second and third passes. For that reason, in what follows we analyze the errors

$$\|s_{\rm opt} - s_k\|_2 \quad\text{and}\quad |f(s_k) - f(s_{\rm opt})|$$

under the assumption $\|s_k\|_2 = \Delta$. Set

$$H_{\rm opt} := H + \lambda_{\rm opt} I_n, \tag{4.4}$$

which is positive definite in the nondegenerate case assumed here.


Before proceeding further, we mention a related analysis, given in [35], of the tCG method for the strictly convex TRS (i.e., $H$ positive definite). When CG encounters the boundary, GLTR and tCG invoke different procedures. In particular, tCG stops at the next step by choosing the intersection point of the CG path [35] with the boundary $\|s\|_2 = \Delta$, while GLTR continues from the last CG step by expanding the Krylov subspace and finding a boundary approximation $s_k$ in the Krylov subspace.

The main result in [35] shows that the reduction in the objective function achieved by tCG is at least half of the reduction achieved by $s_{\rm opt}$. This result was generalized to the convex case in [4, section 7.5.2]. There is a major difference in goals between GLTR and tCG: GLTR seeks approximations to $s_{\rm opt}$ as accurate as dictated by a chosen tolerance, whereas tCG attempts to find approximations that reduce the objective function by a significant fraction of what $s_{\rm opt}$ achieves. Our analysis in this paper is concerned with GLTR for the general TRS and will provide upper bounds for the absolute errors $|f(s_k) - f(s_{\rm opt})|$ and $\|s_{\rm opt} - s_k\|_2$.

4.2.1. The optimal polynomial. Note that the approximate solution $s_k = Q_k h_k \in \mathcal{K}_k(H, g)$ can be expressed as

$$s_k = \psi_k(H)\, g = U \psi_k(\Theta) U^\top g = U \psi_k(\Theta)\, a = \sum_{i=1}^{n} \psi_k(\theta_i)\, a_i u_i,$$

where $a = U^\top g$ and the optimal polynomial $\psi_k \in \mathcal{P}_k$ is given by

$$\psi_k = \arg\min_{\psi \in \mathcal{P}_k,\ \|\psi(H)g\|_2 = \Delta} f(\psi(H)\, g). \tag{4.5}$$

Let

$$\psi_k(\theta) = \sum_{i=0}^{k} \widehat{p}_{i+1}\, \theta^i = \underbrace{[\widehat{p}_1, \widehat{p}_2, \ldots, \widehat{p}_{k+1}]}_{=:\ \widehat{p}^{\,\top}} \begin{bmatrix} 1 \\ \theta \\ \vdots \\ \theta^k \end{bmatrix} \quad\text{and}\quad V_{k+1,n} := \begin{bmatrix} 1 & 1 & \ldots & 1 \\ \theta_1 & \theta_2 & \ldots & \theta_n \\ \vdots & \vdots & & \vdots \\ \theta_1^k & \theta_2^k & \ldots & \theta_n^k \end{bmatrix}.$$

Since any $s \in \mathcal{K}_k(H, g)$ takes the form $s = \psi(H)\, g = U \psi(\Theta)\, a = U D_a V_{k+1,n}^\top\, p$, where $D_a = \mathrm{diag}(a_1, a_2, \ldots, a_n)$ and $p \in \mathbb{R}^{k+1}$ is the coefficient vector of $\psi$, we can express the coefficient vector $\widehat{p}$ corresponding to the optimal $\psi_k$ as

$$\widehat{p} = \arg\min_{\|D_a V_{k+1,n}^\top p\|_2 = \Delta}\ \frac{1}{2}\, p^\top \big(V_{k+1,n} D_a \Theta D_a V_{k+1,n}^\top\big)\, p + p^\top V_{k+1,n} D_a a.$$

By the Lagrangian multiplier theory, there is $\vartheta_k \in \mathbb{R}$ such that

$$V_{k+1,n} D_a (\Theta + \vartheta_k I_n) D_a V_{k+1,n}^\top\, \widehat{p} = -V_{k+1,n} D_a a \quad\text{and}\quad \|D_a V_{k+1,n}^\top\, \widehat{p}\|_2 = \Delta.$$

Because of the presence of the Lagrangian multiplier $\vartheta_k$, however, this characterization of the optimal $\psi_k$ does not lead to a simple convergence analysis, in contrast to the ones in [14, 15] for analyzing CG and the minimal residual method. In what follows, we adopt an approach based on suboptimal polynomial approximations to establish bounds on the errors of the approximate solutions.
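The representation $s = U D_a V_{k+1,n}^\top\, p$ used above is a direct consequence of the eigendecomposition of $H$ and can be sanity-checked numerically (an illustrative sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n)); H = 0.5*(A + A.T)
g = rng.standard_normal(n)
theta, U = np.linalg.eigh(H)                   # H = U diag(theta) U'
a = U.T @ g                                    # a = U' g, as in the text

# V_{k+1,n}: (k+1) x n matrix whose rows are 1, theta, ..., theta^k.
V = np.vander(theta, k + 1, increasing=True).T
p = rng.standard_normal(k + 1)                 # coefficients of some psi in P_k

# psi(H) g computed directly from powers of H ...
psiHg = sum(p[i] * (np.linalg.matrix_power(H, i) @ g) for i in range(k + 1))
# ... equals U * D_a * V' * p, the form used in the text.
assert np.allclose(psiHg, U @ (np.diag(a) @ (V.T @ p)))
print("s = U D_a V' p representation confirmed")
```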


4.2.2. Solutions resulting from suboptimal polynomials. Recall that we focus on the situation where $\|s_{\rm opt}\|_2 = \Delta$ and on approximations $s$ with $\|s\|_2 = \Delta$. We first present a general framework to bound the errors

$$f(s_k) - f(s_{\rm opt}) \quad\text{and}\quad \|s_k - s_{\rm opt}\|_2$$

in terms of any nonzero $\widetilde{s} \in \mathcal{K}_k(H, g)$. Later, this framework will be realized with

$$\widetilde{s} = \wp_k(H)\, g \in \mathbb{R}^n \tag{4.6}$$

constructed from a certain suboptimal polynomial $\wp_k \in \mathcal{P}_k$, as opposed to the optimal one given by (4.5), for the purpose of establishing error bounds for TLTRS solutions (i.e., for the second pass of GLTR).

Theorem 4.3. Suppose (1.1) is nondegenerate, $\|s_{\rm opt}\|_2 = \Delta$, and $s_k$ is the $k$th ($k \le k_{\max}$) approximation of TLTRS satisfying $\|s_k\|_2 = \Delta$. Then for any nonzero $\widetilde{s} \in \mathcal{K}_k(H, g)$, we have

$$0 \le f(s_k) - f(s_{\rm opt}) \le 2\,\|H_{\rm opt}\|_2\, \|\widetilde{s} - s_{\rm opt}\|_2^2 \quad\text{and} \tag{4.7}$$

$$\|s_k - s_{\rm opt}\|_2 \le 2\sqrt{\kappa}\; \|\widetilde{s} - s_{\rm opt}\|_2, \tag{4.8}$$

where $H_{\rm opt}$ is given by (4.4) and

$$\kappa := \kappa(H_{\rm opt}) = \frac{\theta_n + \lambda_{\rm opt}}{\theta_1 + \lambda_{\rm opt}} \tag{4.9}$$

is the spectral condition number of $H_{\rm opt}$.

Proof. First, we have

$$\big|\, \|\widetilde{s}\|_2 - \Delta \,\big| = \big|\, \|\widetilde{s}\|_2 - \|s_{\rm opt}\|_2 \,\big| \le \|\widetilde{s} - s_{\rm opt}\|_2,$$

which leads to

$$\left|\, 1 - \frac{\Delta}{\|\widetilde{s}\|_2} \,\right| \le \frac{\|\widetilde{s} - s_{\rm opt}\|_2}{\|\widetilde{s}\|_2}. \tag{4.10}$$

Let $r = v - s_{\rm opt}$, where $v = (\widetilde{s}/\|\widetilde{s}\|_2)\,\Delta$. We then have

$$\|r\|_2 = \|s_{\rm opt} - v\|_2 \le \|s_{\rm opt} - \widetilde{s}\|_2 + \|\widetilde{s} - v\|_2 = \|s_{\rm opt} - \widetilde{s}\|_2 + \left\| \widetilde{s} - \Delta\cdot\frac{\widetilde{s}}{\|\widetilde{s}\|_2} \right\|_2 = \|s_{\rm opt} - \widetilde{s}\|_2 + \|\widetilde{s}\|_2 \left|\, 1 - \frac{\Delta}{\|\widetilde{s}\|_2} \,\right| \le 2\,\|s_{\rm opt} - \widetilde{s}\|_2, \tag{4.11}$$

where the last inequality is obtained by using (4.10). Moreover, since for any $0 \le i \le k_{\max}-1$,

$$f(s_i) = \min_{\substack{s \in \mathcal{K}_i(H,g) \\ \|s\|_2 \le \Delta}} f(s) \ \ge\ \min_{\substack{s \in \mathcal{K}_{i+1}(H,g) \\ \|s\|_2 \le \Delta}} f(s) = f(s_{i+1}) \ \ge\ \min_{\|s\|_2 \le \Delta} f(s) = f(s_{\rm opt}), \tag{4.12}$$


we have

$$0 \le f(s_k) - f(s_{\rm opt}) \le f(v) - f(s_{\rm opt}) = \frac{r^\top H r}{2} + r^\top (H s_{\rm opt} + g) = \frac{r^\top H r}{2} - \lambda_{\rm opt}\, r^\top s_{\rm opt} = \frac{r^\top (H + \lambda_{\rm opt} I)\, r}{2} \tag{4.13}$$

$$\le \frac{\|H_{\rm opt}\|_2}{2}\, \|r\|_2^2 \le 2\,\|H_{\rm opt}\|_2\, \|s_{\rm opt} - \widetilde{s}\|_2^2, \tag{4.14}$$

where for obtaining (4.13) we have used

$$\Delta^2 = \|v\|_2^2 = \|s_{\rm opt}\|_2^2 + \|r\|_2^2 + 2\, r^\top s_{\rm opt}$$

to get $r^\top s_{\rm opt} = -\|r\|_2^2/2 = -r^\top r/2$, and we used (4.11) for getting (4.14). This completes the proof of (4.7).

For (4.8), we define

$$f_{\rm opt}(s) := \frac{1}{2}\, s^\top H_{\rm opt}\, s + s^\top g = f(s) + \frac{1}{2}\,\lambda_{\rm opt} \|s\|_2^2.$$

Then, by noting that $\nabla f_{\rm opt}(s_{\rm opt}) = H_{\rm opt} s_{\rm opt} + g = 0$, we have for any $s$,

$$f_{\rm opt}(s) = f_{\rm opt}(s_{\rm opt}) + \frac{1}{2}(s - s_{\rm opt})^\top H_{\rm opt} (s - s_{\rm opt}),$$

and thus

$$f_{\rm opt}(s) - f_{\rm opt}(s_{\rm opt}) \ge \frac{1}{2}(\theta_1 + \lambda_{\rm opt})\, \|s - s_{\rm opt}\|_2^2. \tag{4.15}$$

Furthermore, if $\|s\|_2 = \Delta$, then

$$f_{\rm opt}(s) - f_{\rm opt}(s_{\rm opt}) = \left[f(s) + \frac{1}{2}\,\lambda_{\rm opt}\|s\|_2^2\right] - \left[f(s_{\rm opt}) + \frac{1}{2}\,\lambda_{\rm opt}\|s_{\rm opt}\|_2^2\right] = f(s) - f(s_{\rm opt}),$$

since $\|s_{\rm opt}\|_2 = \Delta$ as well. Consequently, for $s_k$, by (4.7) and (4.15) we have

$$\frac{1}{2}(\theta_1 + \lambda_{\rm opt})\, \|s_k - s_{\rm opt}\|_2^2 \le f(s_k) - f(s_{\rm opt}) \le 2\,\|H_{\rm opt}\|_2\, \|s_{\rm opt} - \widetilde{s}\|_2^2,$$

yielding (4.8).

Next, we discuss two suboptimal polynomials $\wp_k \in \mathcal{P}_k$ with which to realize $\widetilde{s}$ via (4.6).

Polynomials resulting from minimizing $\|s_{\rm opt} - \widetilde{s}\|_2$. According to Theorem 4.3, a good bound on $f(s_k) - f(s_{\rm opt})$ and $\|s_k - s_{\rm opt}\|_2$ can be pursued by minimizing $\|s_{\rm opt} - \widetilde{s}\|_2$ over $\widetilde{s} \in \mathcal{K}_k(H, g)$. Note that $\widetilde{s} \in \mathcal{K}_k(H, g) = \mathcal{K}_k(H_{\rm opt}, g)$, and therefore

$$\widetilde{s} - s_{\rm opt} = \wp_k(H_{\rm opt})\, g + H_{\rm opt}^{-1} g = \big(\wp_k(H_{\rm opt})\, H_{\rm opt} + I_n\big)\, H_{\rm opt}^{-1} g = -\hbar(H_{\rm opt})\, s_{\rm opt},$$


where $\hbar(t) := 1 + t\,\wp_k(t) \in \mathbb{P}_{k+1}$ satisfies $\hbar(0) = 1$. Hence, noting $\|s_{\rm opt}\|_2 \le \Delta$, we have
$$\min_{\wp \in \mathbb{P}_k} \|\tilde{s} - s_{\rm opt}\|_2 \le \min_{\hbar \in \mathbb{P}_{k+1},\, \hbar(0)=1} \|\hbar(H_{\rm opt})\|_2 \cdot \Delta \le \min_{\hbar \in \mathbb{P}_{k+1},\, \hbar(0)=1}\ \max_{i=1,2,\dots,n} |\hbar(\theta_i + \lambda_{\rm opt})| \cdot \Delta$$
$$\le \min_{\hbar \in \mathbb{P}_{k+1},\, \hbar(0)=1}\ \max_{t \in [\theta_1 + \lambda_{\rm opt},\, \theta_n + \lambda_{\rm opt}]} |\hbar(t)| \cdot \Delta = \frac{\Delta}{T_{k+1}(\eta)} = \frac{2\Delta}{\Gamma_\kappa^{k+1} + \Gamma_\kappa^{-(k+1)}}, \tag{4.16}$$
where the second equality in (4.16) follows from Lemma 4.1 and (4.2), and
$$\eta := \frac{\kappa+1}{\kappa-1} = 1 + \frac{2(\theta_1 + \lambda_{\rm opt})}{\theta_n - \theta_1} \tag{4.17}$$
with $\kappa = \kappa(H_{\rm opt})$ defined by (4.9). Substituting (4.16) into (4.7) and (4.8) gives
$$0 \le f(s_k) - f(s_{\rm opt}) \le 2\|H_{\rm opt}\|_2 \left(\frac{\Delta}{T_{k+1}(\eta)}\right)^{2}, \tag{4.18a}$$
$$\|s_{\rm opt} - s_k\|_2 \le \frac{2\sqrt{\kappa}\,\Delta}{T_{k+1}(\eta)}. \tag{4.18b}$$
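The identity behind the last equality of (4.16), namely $T_{k+1}(\eta) = \frac{1}{2}\big(\Gamma_\kappa^{k+1} + \Gamma_\kappa^{-(k+1)}\big)$ for $\eta = (\kappa+1)/(\kappa-1)$, is easy to check numerically. Below is a minimal Python sketch with a hypothetical condition number $\kappa$; it evaluates $T_{k+1}$ by the three-term recurrence and compares against the closed form.

```python
import numpy as np

def cheb_T(k, x):
    # Chebyshev polynomial T_k of the first kind via the three-term
    # recurrence T_{j+1}(x) = 2x T_j(x) - T_{j-1}(x); valid also for x > 1.
    t0, t1 = 1.0, x
    for _ in range(k):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t0

kappa = 10.0                                         # hypothetical kappa(H_opt)
eta = (kappa + 1) / (kappa - 1)                      # eta from (4.17)
gamma = (np.sqrt(kappa) + 1) / (np.sqrt(kappa) - 1)  # Gamma_kappa as in (4.2b)

for k in range(20):
    lhs = cheb_T(k + 1, eta)
    rhs = 0.5 * (gamma ** (k + 1) + gamma ** -(k + 1))
    assert abs(lhs - rhs) <= 1e-10 * rhs
```

In particular, $\Delta/T_{k+1}(\eta)$ in (4.18) decays linearly in $k$ with factor $\Gamma_\kappa^{-1}$.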

Best polynomials for approximating $\frac{1}{x-\eta}$. We next discuss yet another suboptimal polynomial. Note that $g = Ua = \sum_{i=1}^n u_i a_i$, and we have for any $s \in \mathcal{K}_k(H, g)$
$$s = \psi(H)\, g = U\psi(\Theta)\, a = \sum_{i=1}^n \psi(\theta_i)\, a_i u_i \quad \text{for some } \psi \in \mathbb{P}_k, \tag{4.19}$$
and
$$s_{\rm opt} = -(H + \lambda_{\rm opt} I_n)^{-1} g = -U(\Theta + \lambda_{\rm opt} I_n)^{-1} a = -\sum_{i=1}^n \frac{a_i}{\theta_i + \lambda_{\rm opt}}\, u_i. \tag{4.20}$$
By comparing (4.19) and (4.20), we define a suboptimal polynomial $\wp_k^{\rm ra} \in \mathbb{P}_k$ as the solution to the following minimax approximation problem:
$$\wp_k^{\rm ra} := \arg\min_{\wp \in \mathbb{P}_k}\ \max_{\theta_1 \le \theta \le \theta_n} \left|\wp(\theta) - \frac{1}{\theta + \lambda_{\rm opt}}\right|. \tag{4.21}$$
In other words, $\wp_k^{\rm ra}$ is the best polynomial approximation to the rational function $\frac{1}{\theta + \lambda_{\rm opt}}$ on the interval $[\theta_1, \theta_n]$.
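As an illustration of (4.21), near-minimax approximations to such rational functions are cheap to compute: a Chebyshev interpolant of $1/(x+\eta)$ on $[-1,1]$ (the mirror image of $1/(x-\eta)$, with the pole outside the interval) already comes within a small factor of the best error. A minimal Python sketch with a hypothetical $\eta$ and degree $k$, comparing the interpolant's error against the Bernstein value of (4.23):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

eta, k = 1.5, 10                                   # hypothetical eta > 1, degree k
f = lambda x: 1.0 / (x + eta)                      # pole at -eta, outside [-1, 1]

# degree-k interpolant at the k+1 Chebyshev points of the first kind
nodes = np.cos((2 * np.arange(k + 1) + 1) * np.pi / (2 * (k + 1)))
p = C.Chebyshev.fit(nodes, f(nodes), k)            # square system: interpolation

xs = np.linspace(-1.0, 1.0, 4001)
err = np.max(np.abs(p(xs) - f(xs)))                # sup-norm error on a fine grid
best = (eta + np.sqrt(eta**2 - 1)) ** (-k) / np.sqrt(eta**2 - 1)   # cf. (4.23)

# no degree-k polynomial can beat the minimax error, and the Chebyshev
# interpolant is within a modest factor of it
assert 0.9 * best <= err <= 10.0 * best
```

The same comparison on $[\theta_1,\theta_n]$ follows by the linear change of variables discussed next.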

Note that the linear transformation
$$\theta(x) = \frac{\theta_n - \theta_1}{2}\, x + \frac{\theta_1 + \theta_n}{2}$$
maps $x \in [-1,1]$ one-to-one and onto $\theta \in [\theta_1, \theta_n]$; moreover, by (4.17), $\eta > 1$, and we have
$$\min_{\psi \in \mathbb{P}_k}\ \max_{\theta_1 \le \theta \le \theta_n} \left|\psi(\theta) - \frac{1}{\theta + \lambda_{\rm opt}}\right| = \min_{\psi \in \mathbb{P}_k}\ \max_{-1 \le x \le 1} \left|\psi(\theta(x)) - \frac{2}{(\theta_n - \theta_1)\big(x + \frac{\theta_1 + \theta_n + 2\lambda_{\rm opt}}{\theta_n - \theta_1}\big)}\right|$$
$$= \frac{2}{\theta_n - \theta_1} \times \min_{\psi \in \mathbb{P}_k}\ \max_{-1 \le x \le 1} \left|\frac{(\theta_n - \theta_1)\,\psi(\theta(x))}{2} - \frac{1}{x + \eta}\right| = \frac{2}{\theta_n - \theta_1} \times \min_{\wp \in \mathbb{P}_k}\ \max_{-1 \le x \le 1} \left|\wp(x) - \frac{1}{x - \eta}\right|$$
with $\wp(x) = -\frac{(\theta_n - \theta_1)\,\psi(\theta(-x))}{2}$, where the last equality follows from the substitution $x \mapsto -x$, which maps $[-1,1]$ onto itself. This implies that
$$\max_{\theta_1 \le \theta \le \theta_n} \left|\wp_k^{\rm ra}(\theta) - \frac{1}{\theta + \lambda_{\rm opt}}\right| = \frac{2}{\theta_n - \theta_1} \times \underbrace{\min_{\wp \in \mathbb{P}_k}\ \max_{-1 \le x \le 1} \left|\wp(x) - \frac{1}{x - \eta}\right|}_{=:\, r_k^{a}(\eta)}, \tag{4.22}$$
where $r_k^{a}(\eta)$ is the error of approximation by the best polynomial in $\mathbb{P}_k$ to $\frac{1}{x - \eta}$ on the interval $[-1,1]$.

For the behavior of $r_k^{a}(\eta)$ with respect to $k$ and $\eta$, we fortunately have an explicit formula from the pioneering works of Chebyshev and Bernstein. Indeed, Chebyshev found an explicit expression for the best approximating polynomial of $\frac{1}{x-\eta}$ on $[-1,1]$, and Bernstein gave a trigonometric representation, as stated in the next lemma [19, section 4.3].

Lemma 4.4 (Bernstein [2]). Given $\eta > 1$, the best approximating polynomial $p_k(x) \in \mathbb{P}_k$ of $\frac{1}{x-\eta}$ on $[-1,1]$ satisfies
$$\frac{1}{x-\eta} - p_k(x) = \frac{\big(\eta + \sqrt{\eta^2 - 1}\,\big)^{-k}}{\sqrt{\eta^2 - 1}}\, \cos(k\alpha + \beta),$$
where $\alpha$ and $\beta$ are such that $x = \cos\alpha$ and $\frac{\eta x - 1}{x - \eta} = \cos\beta$, and moreover,
$$r_k^{a}(\eta) := \min_{\wp \in \mathbb{P}_k}\ \max_{-1 \le x \le 1} \left|\wp(x) - \frac{1}{x-\eta}\right| = \frac{\big(\eta + \sqrt{\eta^2 - 1}\,\big)^{-k}}{\sqrt{\eta^2 - 1}}. \tag{4.23}$$

Remark 4.5. It is noted that $\eta + \sqrt{\eta^2 - 1} > \eta > 1$ since $\eta > 1$, and for $\eta$ given by (4.17),
$$\eta + \sqrt{\eta^2 - 1} = \frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1} = \Gamma_\kappa,$$
where $\Gamma_\kappa$ is defined by (4.2b). Therefore, $r_k^{a}(\eta)$ converges linearly to zero with the linear factor $\Gamma_\kappa^{-1}$ as $k$ increases.
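The identity in Remark 4.5 and the stated decay rate of $r_k^{a}(\eta)$ are elementary to verify; a quick numerical check with a hypothetical $\kappa$ (for $\kappa = 25$ one gets $\eta = 13/12$ and $\Gamma_\kappa = 3/2$):

```python
import numpy as np

kappa = 25.0                                          # hypothetical kappa(H_opt)
eta = (kappa + 1) / (kappa - 1)                       # (4.17)
gamma = (np.sqrt(kappa) + 1) / (np.sqrt(kappa) - 1)   # Gamma_kappa in (4.2b)

# identity of Remark 4.5: eta + sqrt(eta^2 - 1) = Gamma_kappa
assert abs((eta + np.sqrt(eta**2 - 1)) - gamma) < 1e-12

# Bernstein's error (4.23) decays linearly with factor 1/Gamma_kappa
r_a = lambda k: (eta + np.sqrt(eta**2 - 1)) ** (-k) / np.sqrt(eta**2 - 1)
ratios = [r_a(k + 1) / r_a(k) for k in range(1, 10)]
assert all(abs(rat - 1 / gamma) < 1e-12 for rat in ratios)
```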

Now we can establish error bounds for the TLTRS solutions in terms of $r_k^{a}(\eta)$. The corresponding estimates for $\|s_k - s_{\rm opt}\|_2$ and $f(s_k) - f(s_{\rm opt})$ also reflect the behavior of TLTRS as characterized by the number of Lanczos steps $k$ and the parameter $\eta$, as we will see from the numerical examples in section 5.

Let $\wp_k^{\rm ra}$ be defined by (4.21), and set $\tilde{s}_k^{\rm ra} := -\wp_k^{\rm ra}(H)\, g$. Note by (4.22) that
$$\|\tilde{s}_k^{\rm ra} - s_{\rm opt}\|_2 = \left\| U\big[\wp_k^{\rm ra}(\Theta) - (\Theta + \lambda_{\rm opt} I_n)^{-1}\big]\, a \right\|_2 \le \|g\|_2 \times \max_{\theta \in \{\theta_1, \dots, \theta_n\}} \left|\wp_k^{\rm ra}(\theta) - \frac{1}{\theta + \lambda_{\rm opt}}\right|$$
$$\le \|g\|_2 \times \max_{\theta_1 \le \theta \le \theta_n} \left|\wp_k^{\rm ra}(\theta) - \frac{1}{\theta + \lambda_{\rm opt}}\right| = \frac{2\|g\|_2}{\theta_n - \theta_1}\, r_k^{a}(\eta).$$


Now with $\tilde{s} = \tilde{s}_k^{\rm ra}$, by Theorem 4.3, we have
$$0 \le f(s_k) - f(s_{\rm opt}) \le 2\|H_{\rm opt}\|_2 \left(\frac{2\|g\|_2\, r_k^{a}(\eta)}{\theta_n - \theta_1}\right)^{2}, \tag{4.24a}$$
$$\|s_{\rm opt} - s_k\|_2 \le 2\sqrt{\kappa}\, \frac{2\|g\|_2\, r_k^{a}(\eta)}{\theta_n - \theta_1} = \frac{4\sqrt{\kappa}\, \|g\|_2}{\theta_n - \theta_1}\, r_k^{a}(\eta). \tag{4.24b}$$
Summarizing the results in (4.18) and (4.24) for the two suboptimal solutions yields item (ii) of the following theorem.

Theorem 4.6. Let $\theta_1 \le \theta_2 \le \cdots \le \theta_n$ be the ordered eigenvalues of $H$, and let $\lambda_{\rm opt}$ be the optimal Lagrange multiplier as in Theorem 2.1 for TRS (1.1). Let the sequence $\{s_k\}_{k=0}^{k_{\max}}$ be generated by TLTRS for (1.1).

(i) The sequence $\{f(s_k)\}_{k=0}^{k_{\max}}$ is nonincreasing, and $f(s_{k_{\max}}) = f(s_{\rm opt})$ for the nondegenerate case, and
$$f(s_{k_{\max}}) + \frac{\tau^2 \theta_1}{2} \le f(s_{\rm opt}) \le f(s_{k_{\max}}) \tag{4.25}$$
for the degenerate case, where $\tau^2 = \Delta^2 - \|(H - \theta_1 I_n)^{\dagger} g\|_2^2 \ge 0$ and $\theta_1 \le 0$.

(ii) For the nondegenerate case, if $\|s_{\rm opt}\|_2 = \|s_k\|_2 = \Delta$ for some $0 \le k \le k_{\max}$, then
$$0 \le f(s_k) - f(s_{\rm opt}) \le 2\|H_{\rm opt}\|_2\, \zeta_k^2, \tag{4.26a}$$
$$\|s_{\rm opt} - s_k\|_2 \le 2\sqrt{\kappa}\, \zeta_k, \tag{4.26b}$$
where $H_{\rm opt} = H + \lambda_{\rm opt} I_n$, $\kappa = \kappa(H_{\rm opt})$ is given by (4.9), and
$$\zeta_k = \min\left\{\frac{\Delta}{T_{k+1}(\eta)},\ \frac{2\|g\|_2\, r_k^{a}(\eta)}{\theta_n - \theta_1}\right\}, \tag{4.27}$$
where $r_k^{a}(\eta)$ is defined by (4.23), and $T_{k+1}(\eta)$ is the $(k+1)$st Chebyshev polynomial of the first kind evaluated at $\eta$ given in (4.17).

Proof. Based on our previous discussions, only the inequality (4.25) needs a proof. First, it can be seen that $f(s_{k_{\max}})$ is an upper bound for $f(s_{\rm opt})$ by (4.12). For the orthogonal matrix $Q := [Q_{k_{\max}},\, Q_\perp] \in \mathbb{R}^{n \times n}$ satisfying (3.7), let
$$Q^{\top} s_{\rm opt} = \begin{bmatrix} Q_{k_{\max}}^{\top} s_{\rm opt} \\ Q_\perp^{\top} s_{\rm opt} \end{bmatrix} = \begin{bmatrix} y \\ z \end{bmatrix}.$$
From $Q^{\top} H Q = T = \operatorname{diag}(T_{k_{\max}}, T_\perp)$ with $T_{k_{\max}} = T_{(1:k_{\max}+1,\, 1:k_{\max}+1)}$ and $T_\perp = T_{(k_{\max}+2:n,\, k_{\max}+2:n)}$, and $Q^{\top} g = \gamma_0 e_1$, we have
$$f(s_{\rm opt}) = \frac{1}{2}\, s_{\rm opt}^{\top} H s_{\rm opt} + g^{\top} s_{\rm opt} = \frac{1}{2}\, s_{\rm opt}^{\top} Q T Q^{\top} s_{\rm opt} + s_{\rm opt}^{\top} Q Q^{\top} g$$
$$= \frac{1}{2}\, y^{\top} T_{k_{\max}}\, y + \gamma_0\, y^{\top} e_1 + \frac{1}{2}\, z^{\top} T_\perp\, z \ge f(s_{k_{\max}}) + \frac{1}{2}\, z^{\top} T_\perp\, z,$$
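The a priori quantities in Theorem 4.6(ii) are inexpensive to evaluate once estimates of $\theta_1$, $\theta_n$, and $\lambda_{\rm opt}$ are available, which is what makes them usable as stopping criteria. Below is a minimal Python sketch on a small synthetic dense TRS (all problem data, sizes, and tolerances are hypothetical; $\lambda_{\rm opt}$ is obtained by bisection on the secular equation rather than by GLTR) that forms $\zeta_k$ of (4.27) and tabulates the bound (4.26a).

```python
import numpy as np

# Evaluate zeta_k of (4.27) and the bound (4.26a) on a small synthetic TRS;
# H, g, Delta, n, and all tolerances below are hypothetical choices.
rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
g = rng.standard_normal(n)
Delta = 0.05                        # small radius: boundary solution, lam > -theta_1

theta = np.linalg.eigvalsh(H)
th1, thn = theta[0], theta[-1]

def snorm(lam):                     # ||(H + lam I)^{-1} g||_2, the secular function
    return np.linalg.norm(np.linalg.solve(H + lam * np.eye(n), g))

lo, hi = -th1 + 1e-12, -th1 + 1.0   # bracket the root of snorm(lam) = Delta
while snorm(hi) > Delta:
    hi *= 2.0
for _ in range(200):                # plain bisection for lam_opt
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if snorm(mid) > Delta else (lo, mid)
lam = 0.5 * (lo + hi)

kappa = (thn + lam) / (th1 + lam)   # kappa(H_opt) as in (4.9)
eta = (kappa + 1) / (kappa - 1)     # (4.17)
Hopt_norm = max(abs(thn + lam), abs(th1 + lam))

def cheb_T(k, x):                   # Chebyshev recurrence, valid for x > 1
    t0, t1 = 1.0, x
    for _ in range(k):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t0

def r_a(k):                         # Bernstein's best-approximation error (4.23)
    return (eta + np.sqrt(eta**2 - 1)) ** (-k) / np.sqrt(eta**2 - 1)

def zeta(k):                        # (4.27)
    return min(Delta / cheb_T(k + 1, eta),
               2 * np.linalg.norm(g) * r_a(k) / (thn - th1))

bounds = [2 * Hopt_norm * zeta(k) ** 2 for k in range(1, 30)]   # (4.26a)
assert all(b > 0 for b in bounds)
assert all(b1 > b2 for b1, b2 in zip(bounds, bounds[1:]))       # monotone decay
```

As expected from Remark 4.5, the bound decreases monotonically in $k$, at a linear rate governed by $\Gamma_\kappa^{-1}$.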
