Vol. 40, No. 4, pp. A2005–A2032
A NESTED LANCZOS METHOD FOR THE TRUST-REGION SUBPROBLEM∗
LEI-HONG ZHANG† AND CHUNGEN SHEN‡
Abstract. The trust-region subproblem (TRS) minimizes a quadratic $f(s) = s^T H s/2 + s^T g$ over the ellipsoidal constraint $\|s\|_M \le \Delta$ for a symmetric and positive definite matrix $M$. For a large-scale TRS, a Lanczos-type approach, namely the generalized Lanczos trust-region (GLTR) method, was introduced by Gould, Lucidi, Roma, and Toint [SIAM J. Optim., 9 (1999), pp. 504–525]; it nicely extends the classical Lanczos method for the eigenvalue problem to TRS. Basically, GLTR attempts to obtain a feasible approximation in the Krylov subspace $\mathcal{K}_k(M^{-1}H, M^{-1}g)$ in an efficient way. For an accurate approximation, the dimension $k$ of $\mathcal{K}_k(M^{-1}H, M^{-1}g)$ is usually modest for a well-conditioned TRS, but can be large for ill-conditioned problems. This causes difficulties in computational cost, memory requirements, and numerical stability. This paper introduces an efficient nested restarting strategy for GLTR that resolves these difficulties. Convergence analysis and numerical tests are carried out to support our improvements upon GLTR.
Key words. Lanczos method, Krylov subspace, restarting, nested GMRES, trust-region subproblem, ill-conditioned
AMS subject classifications. 90C20, 90C06, 65F10, 65F15, 65K05 DOI. 10.1137/17M1145914
1. Introduction. The main task of this paper is to develop an efficient restarting Lanczos-type method for solving the large-scale trust-region subproblem (TRS) [21, 22]
(1.1) $\min_{\|\mathbf{s}\|_M \le \Delta} \ \frac{1}{2}\mathbf{s}^T H \mathbf{s} + \mathbf{s}^T \mathbf{g},$
where $H \in \mathbb{R}^{n\times n}$ is symmetric, $\mathbf{g} \in \mathbb{R}^n$, $\Delta > 0$, and $\|\mathbf{s}\|_M = \sqrt{\mathbf{s}^T M \mathbf{s}}$ with a symmetric and positive definite weight matrix $M$. TRS aims at minimizing a quadratic function $f(\mathbf{s}) := \frac{1}{2}\mathbf{s}^T H \mathbf{s} + \mathbf{s}^T \mathbf{g}$ over an ellipsoid and is among the fundamental problems in numerical optimization as well as in other practical applications. It was originally studied in depth as a subproblem in the prominent trust-region method (refer to [16, 22] for a general discussion and the references therein for in-depth extensions), and it also finds numerous other important applications. The list of these applications is ever increasing; here we mention only a few notable ones, arising from the Lorentz eigenvalue problem [37], Tikhonov regularization [25, 26, 27, 31, 32], the constrained eigenvalue problem [11, 13, 34], graph partitioning [19], and the Levenberg–Marquardt approach [22].
Because the model (1.1) plays a vital role in these applications, many efficient algorithms have been proposed in the literature. In general, according to [1], algorithms
∗Submitted to the journal’s Methods and Algorithms for Scientific Computing section September 1, 2017; accepted for publication (in revised form) March 26, 2018; published electronically July 3, 2018.
http://www.siam.org/journals/sisc/40-4/M114591.html
Funding: The work of the first author was supported in part by the National Natural Science Foundation of China via grants NSFC-11671246, NSFC-91730303, and NSFC-11371102.
†School of Mathematics and Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China ([email protected]).
‡College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China ([email protected]).
for (1.1) can be grouped into three broad categories: accurate methods for the dense TRS [1, 21, 22], accurate methods for the large sparse TRS [1, 10, 15, 17, 18, 24, 25, 26, 27, 29, 31], and approximate methods [16, 17, 30, 33], which aim at reducing the overall cost of solving the nonlinear minimization problem via TRS instead of finding the exact minimizer of TRS.
As a highly efficient accurate method for the small-size dense TRS (1.1), the Moré–Sorensen method [21] is one of the most widely used matrix factorization methods, and algorithms for large-scale sparse problems frequently use it or an appropriate modification of it as a subroutine to solve the subproblems arising from the original problem (1.1). For example, in [16], a generalized Lanczos trust-region (GLTR) method ([16, Algorithm 5.1]; see also [2, Chapter 5]) is proposed as an improved Steihaug [30]–Toint [33] truncated conjugate-gradient (tCG) iteration for (1.1).
The idea behind GLTR is the Lanczos process plus the Rayleigh–Ritz procedure. Similar to the classical Lanczos method for the eigenvalue problem (see, e.g., [23]), to handle the large-scale TRS, the original TRS (1.1) is first projected onto the Krylov subspace to form a small-size TRS, which is then solved by an efficient accurate method. The details of this procedure are formulated in [16] (see also section 2 and [36]), and we call it the Lanczos method for TRS (LTRS). GLTR can be viewed as an efficient implementation of LTRS that relies on tCG as well as on the intimate relationship between the three-term preconditioned Lanczos process and the preconditioned CG (PCG) iteration (see [16]). More recently, [36] offers a theoretical analysis of the convergence behavior of LTRS/GLTR.
Apart from favorable numerical performance on some problems and the theoretical convergence analysis of LTRS/GLTR, there are, however, drawbacks in solving a large and ill-conditioned TRS; namely, the initial vector for generating the Krylov subspace is fixed to $M^{-1}\mathbf{g}$ (we shall give an explanation for this in section 2), and the Krylov subspace needs to be expanded continuously to obtain an accurate approximation. The fixed initial vector $M^{-1}\mathbf{g}$ induces numerical trouble when the dimension $k$ of the Krylov subspace gets large. In particular, as $k$ increases, not only do the computational overhead and memory requirements increase, but the numerical stability (e.g., the orthogonality between Lanczos vectors) also deteriorates gradually. The increase in computational cost and memory demand comes from the expansion of the Lanczos vectors on the one hand, and from the Rayleigh–Ritz procedure, which solves a sequence of projected TRSs of continuously increasing dimension, on the other hand. Breakdown in the Lanczos process can happen in theory, but roundoff error always prevents its appearance in practice, so that $k$ keeps increasing without bound.
In this paper we attempt to develop an efficient restarting method to alleviate the above-mentioned numerical difficulties. The key to our restarting strategy comes from the convergence analysis of LTRS/GLTR in [36]. Basically, we find that the convergence behavior of LTRS/GLTR is essentially the same as implicitly applying a Krylov subspace method to a “linear” system $(H+\lambda_{\mathrm{opt}}M)\mathbf{s}_{\mathrm{opt}} = -\mathbf{g}$, or its preconditioned “linear” system $M^{-1}(H+\lambda_{\mathrm{opt}}M)\mathbf{s}_{\mathrm{opt}} = -M^{-1}\mathbf{g}$, where $\mathbf{s}_{\mathrm{opt}}$ and $\lambda_{\mathrm{opt}}$ are the global solution of TRS (1.1) and the corresponding Lagrange multiplier, respectively. Here we remark that $\lambda_{\mathrm{opt}}$ is unknown, and hence $M^{-1}(H+\lambda_{\mathrm{opt}}M)\mathbf{s}_{\mathrm{opt}} = -M^{-1}\mathbf{g}$ is not actually a linear system, so traditional Krylov subspace methods cannot be applied directly. From this point of view, the Lanczos method LTRS/GLTR can be regarded as a Krylov subspace approach that simultaneously produces approximations of $\mathbf{s}_{\mathrm{opt}}$ and $\lambda_{\mathrm{opt}}$ (a more detailed discussion is presented in sections 2 and 3). The connection between LTRS/GLTR and the implicit realization for the preconditioned system $M^{-1}(H+\lambda_{\mathrm{opt}}M)\mathbf{s}_{\mathrm{opt}} = -M^{-1}\mathbf{g}$ leads us to a nested restarting Lanczos method for TRS in this paper.
After a brief review of the TRS and LTRS, we present a convergence result (Theorem 2.2) for LTRS in section 2. The intuition behind our restarting strategy follows from this convergence result and is also discussed in section 2. In section 3, we introduce our nested restarting Lanczos method for TRS and provide some related properties. Its convergence behavior is investigated in section 4. We show in section 5 that the performance of the nested restarting Lanczos method can be improved by an initialization step using tCG. The effectiveness and efficiency of our nested restarting Lanczos method are evaluated in section 6 in various situations, and final conclusions are drawn in section 7.
Notation. Throughout this paper, all vectors are column vectors and are typeset in bold lowercase letters. The $n\times n$ identity matrix is $I_n$, and $\mathbf{e}_j$ is the $j$th column of an identity matrix whose size is determined by the context. For any symmetric matrix $C$, $\lambda_i(C)$ stands for the $i$th smallest eigenvalue. When $C$ is positive semidefinite (positive definite), we denote it by $C \succeq 0$ ($C \succ 0$). For a given nonzero vector $\mathbf{x}$, the $k$th Krylov subspace generated by a square matrix $C$ is $\mathcal{K}_k(C,\mathbf{x}) := \mathrm{span}(\mathbf{x}, C\mathbf{x}, \dots, C^{k-1}\mathbf{x})$. The dimension of $\mathcal{K}_k(C,\mathbf{x})$ in general strictly increases by 1 after each step $k$, and thus we often have $\dim\mathcal{K}_k(C,\mathbf{x}) = k$ until $k = \chi(\mathbf{x})$, the grade of $\mathbf{x}$ with respect to $C$. Here $\chi(\mathbf{x})$ is the smallest nonnegative integer $k$ such that $\dim\mathcal{K}_k(C,\mathbf{x}) = \dim\mathcal{K}_{k+1}(C,\mathbf{x})$; in this case, $\mathcal{K}_{\chi(\mathbf{x})}(C,\mathbf{x})$ is an invariant subspace of $C$.
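Let $\mathbb{P}_k$ denote the set of all polynomials with degree at most $k$. For the convergence of Krylov subspace methods, the Chebyshev polynomials frequently play a central role. The $k$th Chebyshev polynomial of the first kind $T_k(t) \in \mathbb{P}_k$ is
$T_k(t) = \begin{cases} \cos(k \arccos t) & \text{for } |t| \le 1, \\ \frac{1}{2}\left[\left(t+\sqrt{t^2-1}\right)^{k} + \left(t+\sqrt{t^2-1}\right)^{-k}\right] & \text{for } |t| \ge 1. \end{cases}$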
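The two-branch definition of $T_k(t)$ can be evaluated directly. The following is a minimal sketch, not part of the original paper; the function names are illustrative, and the three-term recurrence $T_{j+1}(t) = 2tT_j(t) - T_{j-1}(t)$ is used only as an independent cross-check.

```python
import math

def chebyshev_closed_form(k: int, t: float) -> float:
    """T_k(t) from the two-branch formula stated above."""
    if abs(t) <= 1.0:
        return math.cos(k * math.acos(t))
    w = t + math.sqrt(t * t - 1.0)      # nonzero whenever |t| > 1
    return 0.5 * (w ** k + w ** (-k))

def chebyshev_recurrence(k: int, t: float) -> float:
    """T_k(t) from T_0 = 1, T_1 = t, T_{j+1} = 2 t T_j - T_{j-1}."""
    prev, curr = 1.0, t
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, curr = curr, 2.0 * t * curr - prev
    return curr

# Cross-check both evaluations inside and outside [-1, 1].
for t in (-2.0, -0.3, 0.5, 1.0, 1.5, 3.0):
    for k in range(7):
        a = chebyshev_closed_form(k, t)
        b = chebyshev_recurrence(k, t)
        assert abs(a - b) <= 1e-9 * max(1.0, abs(b))
```

Outside $[-1, 1]$ the values grow geometrically in $k$, which is what makes the bounds of the form $1/T_k(\eta)$ used later decay quickly.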
2. The Lanczos method for TRS. We first briefly review the Lanczos method for solving TRS (LTRS) introduced in [16, section 5] (see also [2, 36]). LTRS basically mimics the classical Lanczos method for the eigenvalue problem, which consists of the preconditioned Lanczos process and the Rayleigh–Ritz procedure (see [23, section 11.3] and [6, Definition 7.1]). The $k$th step can be structured into the following three procedures:
(A) Produce the $k$th Krylov subspace via the preconditioned Lanczos process [23, Algorithm 4.2].
(B) Project TRS (1.1) onto the $k$th Krylov subspace to yield a smaller-size TRS.
(C) Solve the resulting smaller-size TRS to obtain an approximate solution of TRS (1.1).
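More precisely, LTRS starts with $\mathbf{q}_1 = M^{-1}\mathbf{g}/\gamma_1$, with $\gamma_1 = \|\mathbf{g}\|_{M^{-1}}$, and the preconditioned Lanczos process produces an $M$-orthonormal basis $Q_k = [\mathbf{q}_1, \dots, \mathbf{q}_k] \in \mathbb{R}^{n\times k}$ satisfying $Q_k^T M Q_k = I_k$ of the $k$th Krylov subspace $\mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g})$. Note
(2.1) $\mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g}) = M^{-1/2}\,\mathcal{K}_k(M^{-1/2}HM^{-1/2}, M^{-1/2}\mathbf{g}).$
Assuming $\dim\mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g}) = k$, the preconditioned Lanczos process partially reduces $H$ to the tridiagonal form
$T_k = Q_k^T H Q_k = \begin{bmatrix} \delta_1 & \gamma_2 & & & \\ \gamma_2 & \delta_2 & \gamma_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma_{k-1} & \delta_{k-1} & \gamma_k \\ & & & \gamma_k & \delta_k \end{bmatrix},$
and it bears the compact relation
$HQ_k - MQ_kT_k = \gamma_{k+1} M \mathbf{q}_{k+1}\mathbf{e}_k^T \quad\text{and}\quad Q_k^T M Q_k = I_k$
with $Q_k\mathbf{e}_1 = \mathbf{q}_1$. Projecting the original TRS (1.1) onto $\mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g})$ and noting $\|Q_k\mathbf{h}\|_M = \|\mathbf{h}\|_2$ for any $\mathbf{h} \in \mathbb{R}^k$, we have the following reduced trust-region subproblem:
(2.2) $\min_{\|\mathbf{h}\|_2 \le \Delta} \hat{f}(\mathbf{h}) \quad\text{with}\quad \hat{f}(\mathbf{h}) := \frac{1}{2}\mathbf{h}^T T_k \mathbf{h} + \gamma_1 \mathbf{h}^T\mathbf{e}_1.$
Let $\mathbf{h}_k$ be the minimizer of (2.2); it can be readily verified that
(2.3) $\mathbf{s}^{(k)} := Q_k\mathbf{h}_k \in \mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g})$ solves $\min_{\mathbf{s} \in \mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g}),\ \|\mathbf{s}\|_M \le \Delta} f(\mathbf{s}),$
and thus naturally serves as an approximation to the global optimal solution $\mathbf{s}_{\mathrm{opt}}$ of (1.1).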
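Step (A) and the relations $Q_k^TMQ_k = I_k$, $T_k = Q_k^THQ_k$, and $Q_k^T\mathbf{g} = \gamma_1\mathbf{e}_1$ can be illustrated with a small dense sketch. This is not the implementation of [16] or [23]: the function name is hypothetical, dense solves with $M$ stand in for a preconditioner solve, and full re-orthogonalization is added as an assumption for numerical robustness, whereas the three-term recurrence only needs the two latest vectors.

```python
import numpy as np

def preconditioned_lanczos(H, M, g, k, tol=1e-12):
    """At most k steps of the M-preconditioned Lanczos process from M^{-1} g.

    Returns Q (M-orthonormal columns), the tridiagonal T = Q^T H Q, and gamma_1.
    """
    n = g.shape[0]
    w = np.linalg.solve(M, g)               # M^{-1} g
    gamma1 = np.sqrt(g @ w)                 # ||g||_{M^{-1}}
    Q = np.zeros((n, k))
    T = np.zeros((k, k))
    Q[:, 0] = w / gamma1
    for j in range(k):
        q = Q[:, j]
        Hq = H @ q
        T[j, j] = q @ Hq                    # diagonal entry delta
        if j == k - 1:
            break
        w = np.linalg.solve(M, Hq)          # M^{-1} H q
        # Assumption: full re-orthogonalization in the M-inner product,
        # applied twice, instead of the bare three-term recurrence.
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ (M @ w))
        gamma = np.sqrt(w @ (M @ w))        # off-diagonal entry gamma
        if gamma <= tol:                    # breakdown: invariant subspace
            return Q[:, :j + 1], T[:j + 1, :j + 1], gamma1
        T[j, j + 1] = T[j + 1, j] = gamma
        Q[:, j + 1] = w / gamma
    return Q, T, gamma1

# Illustrative random data (not from the paper).
rng = np.random.default_rng(0)
n, k = 30, 8
A = rng.standard_normal((n, n)); H = (A + A.T) / 2
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)
g = rng.standard_normal(n)
Q, T, gamma1 = preconditioned_lanczos(H, M, g, k)
print(np.linalg.norm(Q.T @ M @ Q - np.eye(Q.shape[1])))   # M-orthonormality defect
print(np.linalg.norm(Q.T @ H @ Q - T))                    # tridiagonal reduction defect
```

The reduced problem (2.2) is then fully determined by `T` and `gamma1`, so steps (B) and (C) never need to touch the $n$-dimensional data again except to form $\mathbf{s}^{(k)} = Q_k\mathbf{h}_k$.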
The breakdown in the preconditioned Lanczos process happens when $k = \chi(M^{-1}\mathbf{g})$, the grade$^1$ of $M^{-1}\mathbf{g}$ with respect to $M^{-1}H$. Let $\iota := \chi(M^{-1}\mathbf{g})$; the breakdown is reflected (at least in exact arithmetic) by $\gamma_{\iota+1} = 0$ while $\gamma_k \ne 0$ for all $1 \le k \le \iota$. In such a case, it holds that $HQ_\iota = MQ_\iota T_\iota$. The breakdown in the nondegenerate case (to be defined below) implies $\mathbf{s}^{(k)} = \mathbf{s}_{\mathrm{opt}}$, and we refer to [36, 38] for more detailed discussions on the accuracy of the approximate solution $\mathbf{s}^{(k)}$ in general.
Next we explain why the initial vector $M^{-1}\mathbf{g}$ is fixed to generate the Krylov subspace $\mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{g})$, and also state the underlying principle for restarting the Lanczos method for TRS. To this end, we begin with an important result on TRS due to Gay [12] and Moré and Sorensen [21]:
Theorem 2.1 (see Gay [12] and Moré and Sorensen [21]). The vector $\mathbf{s}_{\mathrm{opt}}$ is a global optimal solution of the trust-region problem (1.1) if and only if $\mathbf{s}_{\mathrm{opt}}$ is feasible, i.e., $\|\mathbf{s}_{\mathrm{opt}}\|_M \le \Delta$, and there is a scalar $\lambda_{\mathrm{opt}} \ge 0$ such that the following conditions are satisfied:
$(H+\lambda_{\mathrm{opt}}M)\mathbf{s}_{\mathrm{opt}} = -\mathbf{g}, \quad \lambda_{\mathrm{opt}}(\Delta - \|\mathbf{s}_{\mathrm{opt}}\|_M) = 0, \quad\text{and}\quad H+\lambda_{\mathrm{opt}}M \succeq 0.$
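The three conditions of Theorem 2.1 are easy to verify numerically on a small dense problem. The sketch below is an illustration under stated assumptions, not the Moré–Sorensen method: it diagonalizes $L^{-1}HL^{-T}$ (with $M = LL^T$) and finds the multiplier by bisection on the secular equation $\|\mathbf{s}(\lambda)\|_M = \Delta$, and it assumes the nondegenerate (easy) case. The function name and the random test data are hypothetical.

```python
import numpy as np

def solve_trs_dense(H, M, g, Delta, iters=200):
    """min 1/2 s^T H s + s^T g  s.t. ||s||_M <= Delta (easy case assumed)."""
    L = np.linalg.cholesky(M)                              # M = L L^T
    Ht = np.linalg.solve(L, np.linalg.solve(L, H).T).T     # L^{-1} H L^{-T}
    Ht = 0.5 * (Ht + Ht.T)
    theta, P = np.linalg.eigh(Ht)                          # ascending eigenvalues
    c = P.T @ np.linalg.solve(L, g)
    step_norm = lambda lam: np.linalg.norm(c / (theta + lam))
    if theta[0] > 0 and step_norm(0.0) <= Delta:
        lam = 0.0                                          # interior solution
    else:
        lo = max(0.0, -theta[0])
        hi = lo + 1.0
        while step_norm(hi) > Delta:                       # bracket the root
            hi = lo + 2.0 * (hi - lo)
        for _ in range(iters):                             # bisection
            mid = 0.5 * (lo + hi)
            if step_norm(mid) > Delta:
                lo = mid
            else:
                hi = mid
        lam = hi
    y = -P @ (c / (theta + lam))
    s = np.linalg.solve(L.T, y)                            # back to s = L^{-T} y
    return s, lam, theta[0]

rng = np.random.default_rng(1)
n, Delta = 6, 1.0
A = rng.standard_normal((n, n)); H = (A + A.T) / 2
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)
g = rng.standard_normal(n)
s, lam, theta1 = solve_trs_dense(H, M, g, Delta)
s_norm = np.sqrt(s @ (M @ s))
print("stationarity   :", np.linalg.norm((H + lam * M) @ s + g))
print("complementarity:", lam * (Delta - s_norm))
print("theta_1 + lam  :", theta1 + lam)    # nonnegative iff H + lam*M is PSD
```

Because $H + \lambda M = L(L^{-1}HL^{-T} + \lambda I)L^T$, the semidefiniteness condition reduces to $\theta_1 + \lambda \ge 0$, which is why the bisection starts from $\max(0, -\theta_1)$.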
Let the eigen-decomposition of $M^{-1/2}HM^{-1/2}$ be $P\Theta P^T$ with $P^TP = I_n$ and the ordered eigenvalues $\theta_1 = \dots = \theta_p < \theta_{p+1} \le \dots \le \theta_n$. Denote
(2.4) $W = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_n] = M^{-1/2}P = [M^{-1/2}\mathbf{p}_1, M^{-1/2}\mathbf{p}_2, \dots, M^{-1/2}\mathbf{p}_n].$
It is true that $W$ is $M$-orthogonal and $W^THW = \Theta$. Let $\mathcal{E}_1 = \mathrm{span}(W_1)$ be the invariant subspace associated with the smallest eigenvalue $\theta_1 = \dots = \theta_p$, where $W_1 = [\mathbf{w}_1, \dots, \mathbf{w}_p] \in \mathbb{R}^{n\times p}$ and $W = [W_1, W_2]$. There are two scenarios (e.g., [1, 18, 21, 22]) when the solution is on the boundary $\|\mathbf{s}_{\mathrm{opt}}\|_M = \Delta$: the degenerate case (or the hard case [1, Definition 4.2]) and the nondegenerate case (or the easy case). The former$^2$ happens if and only if
(2.5) $\mathbf{g} \perp \mathcal{E}_1 \quad\text{and}\quad \|(H-\theta_1M)^{\dagger}\mathbf{g}\|_M \le \Delta.$
$^1$By (2.1), the grade $\chi(M^{-1}\mathbf{g})$ with respect to $M^{-1}H$ is the grade of $M^{-1/2}\mathbf{g}$ with respect to $M^{-1/2}HM^{-1/2}$.
$^2$We adopt the definitions of degenerate and nondegenerate cases of [1, Definition 4.2] in this paper.
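The corresponding Lagrange multiplier for the degenerate case is $\lambda_{\mathrm{opt}} = -\theta_1$ and there are multiple global solutions, all of which can be expressed as
$\mathbf{s}_{\mathrm{opt}} = -(H-\theta_1M)^{\dagger}\mathbf{g} + \tau\mathbf{u} \quad \forall\,\mathbf{u} \in \mathcal{E}_1,\ \|\mathbf{u}\|_M = 1, \quad\text{and}\quad \tau = \pm\sqrt{\Delta^2 - \|(H-\theta_1M)^{\dagger}\mathbf{g}\|_M^2}.$
The nondegenerate case is characterized by the opposite of (2.5) with $\lambda_{\mathrm{opt}} > -\theta_1$ and $\mathbf{s}_{\mathrm{opt}} = -(H+\lambda_{\mathrm{opt}}M)^{-1}\mathbf{g}$.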
Denote
(2.6) $H_{\mathrm{opt}} = H + \lambda_{\mathrm{opt}}M.$
The optimality condition indicates that $H_{\mathrm{opt}}$ is positive semidefinite and that the pair $(\lambda_{\mathrm{opt}}, \mathbf{s}_{\mathrm{opt}})$ satisfies the system $H_{\mathrm{opt}}\mathbf{s}_{\mathrm{opt}} = -\mathbf{g}$, or equivalently, the preconditioned system
(2.7) $(M^{-1}H + \lambda_{\mathrm{opt}}I_n)\mathbf{s}_{\mathrm{opt}} = -M^{-1}\mathbf{g}.$
This is one of the keys to the convergence bounds in Theorem 2.2 ([36, Theorem 4.11]) for $f(\mathbf{s}^{(k)}) - f(\mathbf{s}_{\mathrm{opt}})$ and $\|\mathbf{s}_{\mathrm{opt}} - \mathbf{s}^{(k)}\|_M$. For the special case $\lambda_{\mathrm{opt}} = 0$, the Lanczos method for TRS turns out to be the full orthogonalization method (FOM) [28, Algorithm 6.4], and the approximate solution $\mathbf{s}^{(k)}$ is the same as the one obtained from PCG [28, section 6.7] on the linear system $H\mathbf{s}_{\mathrm{opt}} = -\mathbf{g}$. Consequently, the standard convergence theory [28, section 6.11.3] for CG applies in this situation. For the generic and more difficult case $\lambda_{\mathrm{opt}} \ne 0$, i.e., $\|\mathbf{s}_{\mathrm{opt}}\|_M = \Delta$, we can implicitly apply PCG to $H_{\mathrm{opt}}\mathbf{s}_{\mathrm{opt}} = -\mathbf{g}$ to yield the approximation $\mathbf{s}^{(k)}$ of (2.3), and the convergence has been established in [36]. Theorem 2.2 summarizes a priori error bounds for LTRS.
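Theorem 2.2 (see [36, Theorem 4.11]). Let $W \in \mathbb{R}^{n\times n}$ be $M$-orthogonal such that $W^THW = \mathrm{diag}(\theta_1, \dots, \theta_n)$ with the $i$th smallest eigenvalue $\theta_i = \lambda_i(M^{-1/2}HM^{-1/2})$. Suppose the preconditioned Lanczos process does not break down until $\iota = \chi(M^{-1}\mathbf{g})$ and let the sequence $\{\mathbf{s}^{(k)}\}_{k=1}^{\iota}$ be generated by LTRS for TRS (1.1).
(i) The sequence $\{f(\mathbf{s}^{(k)})\}_{k=1}^{\iota}$ is nonincreasing, and $f(\mathbf{s}^{(\iota)}) = f(\mathbf{s}_{\mathrm{opt}})$ for the nondegenerate case, and
$f(\mathbf{s}^{(\iota)}) + \frac{\tau^2\theta_1}{2} \le f(\mathbf{s}_{\mathrm{opt}}) \le f(\mathbf{s}^{(\iota)})$
for the degenerate case$^3$, where $\tau^2 = \Delta^2 - \|(H-\theta_1M)^{\dagger}\mathbf{g}\|_M^2 \ge 0$ and $\theta_1 \le 0$.
(ii) For the nondegenerate case, if $\|\mathbf{s}_{\mathrm{opt}}\|_M = \|\mathbf{s}^{(k)}\|_M = \Delta$ for some $1 \le k \le \iota$, then
$0 \le f(\mathbf{s}^{(k)}) - f(\mathbf{s}_{\mathrm{opt}}) \le 2(\theta_n+\lambda_{\mathrm{opt}})\zeta_k^2, \qquad \|\mathbf{s}_{\mathrm{opt}} - \mathbf{s}^{(k)}\|_M \le 2\sqrt{\kappa}\,\zeta_k,$
where
(2.8) $\kappa = \frac{\theta_n+\lambda_{\mathrm{opt}}}{\theta_1+\lambda_{\mathrm{opt}}}, \qquad \zeta_k = \min\left\{\frac{\Delta}{T_k(\eta)},\ \frac{2\|M^{-1/2}\mathbf{g}\|_2\,\mathrm{ra}_k(\eta)}{\theta_n-\theta_1}\right\},$
$T_k(x)$ is the $k$th degree Chebyshev polynomial of the first kind, $\mathrm{ra}_k(\eta) = \frac{\left(\eta+\sqrt{\eta^2-1}\right)^{1-k}}{\eta^2-1}$, and $\eta = \frac{\kappa+1}{\kappa-1} > 1$.
$^3$For the degenerate case, when $\mathcal{K}_\iota(M^{-1}H, M^{-1}\mathbf{g})$ is an invariant subspace of $M^{-1}H$, we can choose a new vector $\mathbf{q}$ which is $M$-orthonormal to $\mathcal{K}_\iota(M^{-1}H, M^{-1}\mathbf{g})$, and as long as $\mathbf{q}$ contains components in $\mathcal{E}_1$, $\mathbf{s}_{\mathrm{opt}}$ can be obtained [16, Theorem 5.8] in theory.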
The convergence analysis [36] for Theorem 2.2 essentially reveals that LTRS can be viewed as a certain kind of Lanczos-type method for the system $H_{\mathrm{opt}}\mathbf{s}_{\mathrm{opt}} = -\mathbf{g}$, or its preconditioned system (2.7), with an unknown coefficient matrix $H_{\mathrm{opt}}$. Once $\lambda_{\mathrm{opt}}$ is known, $\mathbf{s}_{\mathrm{opt}}$ can be approximated by PCG starting from any initial guess $\mathbf{s}_0$. Note that the underlying Krylov subspace for (2.7) is $\mathcal{K}_k\big((M^{-1}H+\lambda_{\mathrm{opt}}I_n), M^{-1}\mathbf{r}_0\big) = \mathcal{K}_k(M^{-1}H, M^{-1}\mathbf{r}_0)$, which is generated from $M^{-1}\mathbf{r}_0$ with the residual $\mathbf{r}_0 = H_{\mathrm{opt}}\mathbf{s}_0 + \mathbf{g}$. In practice, however, we do not know $\lambda_{\mathrm{opt}}$ at the first stage, and the residual $\mathbf{r}_0$ is therefore unknown unless $\mathbf{s}_0 = \mathbf{0}$. Thus, the only reasonable choice of $\mathbf{r}_0$ to initiate the Lanczos process is $\mathbf{r}_0 = \mathbf{g}$, and this is why the starting vector $M^{-1}\mathbf{g}$ is used in LTRS.
Motivated by the above observation, on the other hand, we are able to design a restarting procedure for LTRS because, after obtaining $\mathbf{s}^{(k)}$ of (2.3), we naturally have, as a byproduct, an approximation, say $\hat{\lambda}$, of the Lagrange multiplier $\lambda_{\mathrm{opt}}$; thus the unknown coefficient matrix $H_{\mathrm{opt}}$ can be estimated by $H+\hat{\lambda}M$. Since $\mathbf{s}^{(k)}$ is generally a good approximation of $\mathbf{s}_{\mathrm{opt}}$, the residual $(H+\hat{\lambda}M)\mathbf{s}^{(k)} + \mathbf{g}$ can serve as the initial vector to generate the next Krylov subspace, on which the basic steps (B) and (C) of LTRS follow similarly. We will present a restarting-type procedure in the next sections.
3. The nested Lanczos method for TRS (LTRSR).
3.1. The nested GMRES method (GMRESR). Our restarting procedure for the TRS is inspired by the family of recursive GMRES methods from [7] (see also [5]), which are improvements of the generalized conjugate residual (GCR) method [8] for nonsymmetric linear systems. These methods possess a nested, “inner-loop/outer-loop” structure. In principle, any linear solver may be used in the inner loop, but when GMRES is used, the resulting algorithm is known as GMRESR.
To briefly state the framework of GMRESR, at the $i$th iteration, let $\mathbf{x}_i$ and $\mathbf{v}_i = \mathbf{b} - A\mathbf{x}_i$ be the current approximation and the residual, respectively; the inner and outer loops of GMRESR aim at finding a good correction $\mathbf{u}_i$ from the associated residual system $A\mathbf{u} = \mathbf{v}_i$. Suppose $\mathbf{u}_i^{(0)} \approx A^{-1}\mathbf{v}_i$ is an approximation produced by the inner loop; if the $k$-step GMRES is used, then $\mathbf{u}_i^{(0)} = \mathcal{P}_{k,i}(A)\mathbf{v}_i$ for some polynomial $\mathcal{P}_{k,i}(t)$ of degree $k-1$. The restarted GMRES simply updates $\mathbf{x}_i$ by $\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{u}_i^{(0)}$, as $\mathbf{u}_i^{(0)}$ is the “best” correction (in the sense of minimizing $\|A\mathbf{u} - \mathbf{v}_i\|_2$ if GMRES is used) over $\mathcal{K}_k(A, \mathbf{v}_i)$.
However, this update ignores the information obtained from the previous Krylov subspaces. This is where the outer loop comes in: GMRESR improves $\mathbf{u}_i^{(0)}$ by finding the minimizer
$\mathbf{u}_i = \arg\min_{\mathbf{u} \in \mathrm{span}(U_i)} \|\mathbf{v}_i - A\mathbf{u}\|_2$
and updating $\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{u}_i$, where
$\mathrm{span}(U_i) = \mathrm{span}(\mathcal{P}_{k,0}(A)\mathbf{v}_0, \dots, \mathcal{P}_{k,i}(A)\mathbf{v}_i) = \mathrm{span}(\mathbf{u}_0^{(0)}, \dots, \mathbf{u}_i^{(0)})$
is the subspace formed by all the previous “best” corrections $\mathcal{P}_{k,j}(A)\mathbf{v}_j$ associated with $A\mathbf{u} = \mathbf{v}_j$ over the corresponding Krylov subspaces $\mathcal{K}_k(A, \mathbf{v}_j)$, $j = 0, 1, \dots, i$.
This procedure of GMRESR [7] can be simply stated as Algorithm 3.1.
Remark 3.1. There are four remarks for Algorithm 3.1:
(1) If $\mathcal{P}_{k,i}(t) = 1$ for all $i$, then $\mathbf{x}_i$ is the same as that resulting from GCR and GMRES.
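Algorithm 3.1. The framework of GMRESR in [7].
1: Choose $\mathbf{x}_0$, $\epsilon > 0$ and $k$, and let $\mathbf{v}_0 = \mathbf{b} - A\mathbf{x}_0$, $i = 0$, $U_{-1} = [\ ]$;
2: while $\|\mathbf{v}_i\|_2 > \epsilon$ do
3: Apply $k$ steps of GMRES to $A\mathbf{u} = \mathbf{v}_i$ to obtain an approximation $\mathbf{u}_i^{(0)}$ of $A^{-1}\mathbf{v}_i$ (or use another suitable approximation of $A^{-1}\mathbf{v}_i$); i.e., $\mathbf{u}_i^{(0)} = \mathcal{P}_{k,i}(A)\mathbf{v}_i = \arg\min_{\mathbf{u} \in \mathcal{K}_k(A, \mathbf{v}_i)} \|\mathbf{v}_i - A\mathbf{u}\|_2$;
4: Let $U_i = [U_{i-1}, \mathbf{u}_i^{(0)}]$;
5: Solve $\mathbf{u}_i = \arg\min_{\mathbf{u} \in \mathrm{span}(U_i)} \|\mathbf{v}_i - A\mathbf{u}\|_2$;
6: Update $\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{u}_i$, $\mathbf{v}_{i+1} = \mathbf{v}_i - A\mathbf{u}_i$ and $i = i + 1$;
7: end while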
(2) If $U_i$ in step 4 is replaced by $U_i = \mathbf{u}_i^{(0)}$, then it reduces to the restarted GMRES (i.e., GMRES($k$): restart every $k$ steps).
(3) The minimization problems in steps 3 and 5 can be equivalently stated as
$\mathbf{x}_i + \mathbf{u}_i^{(0)} = \mathbf{x}_i + \mathcal{P}_{k,i}(A)\mathbf{v}_i = \arg\min_{\mathbf{x} \in \mathbf{x}_i + \mathcal{K}_k(A, \mathbf{v}_i)} \|\mathbf{b} - A\mathbf{x}\|_2$
and
$\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{u}_i = \arg\min_{\mathbf{x} \in \mathbf{x}_i + \mathrm{span}(U_i)} \|\mathbf{b} - A\mathbf{x}\|_2,$
respectively.
(4) For the efficient implementation of GMRESR [7], an $A^TA$-orthogonal basis for $\mathrm{span}(U_i)$ is formed via the modified Gram–Schmidt process, and the update of $\mathbf{x}_i$ in the outer-loop iteration at steps 5 and 6 can be executed efficiently.
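The inner/outer structure of Algorithm 3.1 can be sketched in a few lines. This is a plain, unoptimized illustration and not the implementation of [7]: both minimizations are solved with dense least squares instead of maintaining an $A^TA$-orthogonal basis as in remark (4), and the names `inner_gmres`, `gmresr`, and `max_outer` are assumptions made for this example.

```python
import numpy as np

def inner_gmres(A, v, k):
    """Step 3: minimize ||v - A u||_2 over u in K_k(A, v)."""
    n = v.shape[0]
    V = np.zeros((n, k))
    V[:, 0] = v / np.linalg.norm(v)
    cols = 1
    for j in range(1, k):
        w = A @ V[:, j - 1]
        for _ in range(2):                       # Gram-Schmidt, applied twice
            w -= V[:, :j] @ (V[:, :j].T @ w)
        nw = np.linalg.norm(w)
        if nw <= 1e-12:                          # Krylov subspace is invariant
            break
        V[:, j] = w / nw
        cols = j + 1
    V = V[:, :cols]
    y, *_ = np.linalg.lstsq(A @ V, v, rcond=None)
    return V @ y

def gmresr(A, b, k=5, eps=1e-10, max_outer=50):
    """Sketch of Algorithm 3.1 with x_0 = 0."""
    x = np.zeros(b.shape[0])
    v = b - A @ x
    U = np.zeros((b.shape[0], 0))
    outer = 0
    while np.linalg.norm(v) > eps and outer < max_outer:
        u0 = inner_gmres(A, v, k)                        # step 3
        U = np.column_stack([U, u0])                     # step 4
        y, *_ = np.linalg.lstsq(A @ U, v, rcond=None)    # step 5
        u = U @ y
        x, v = x + u, v - A @ u                          # step 6
        outer += 1
    return x, outer

# Illustrative nonsymmetric test matrix (not from the paper).
rng = np.random.default_rng(0)
n = 40
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, outer = gmresr(A, b)
print(outer, np.linalg.norm(b - A @ x))
```

Replacing `U = np.column_stack([U, u0])` by `U = u0[:, None]` gives the restarted GMRES($k$) of remark (2), which makes the role of the outer loop easy to compare experimentally.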
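3.2. A nested LTRS (LTRSR). Following GMRESR in Algorithm 3.1, we will derive a nested Lanczos method for TRS. In order to put our final algorithm into a recursive inner-and-outer-loop fashion, we change the degree $k$ (and all other quantities involved in the first Krylov subspace (2.1) appearing in section 2) of the Krylov subspace in (2.3) to $k_1$. Also, we use $\mathbf{s}_1^{(0)}$ and $\lambda_1^{(0)}$ to denote the solution $\mathbf{s}^{(k)}$ and the Lagrange multiplier of (2.3), respectively, where the subscript 1 denotes the first outer-loop iteration, and the superscript (0) means that the solution has not been refined in the nested structure to be defined below.
With $\mathbf{s}_0 = \mathbf{0}$ and $\lambda_0 = 0$, we suppose $(\mathbf{s}_i, \lambda_i)$ is an approximate pair of $(\mathbf{s}_{\mathrm{opt}}, \lambda_{\mathrm{opt}})$ after the $i$th outer-loop iteration, and the corresponding residual is $\mathbf{r}_i = H\mathbf{s}_i + \lambda_iM\mathbf{s}_i + \mathbf{g}$. We first take a look at the correction system. By comparing with the optimality condition $H\mathbf{s}_{\mathrm{opt}} + \lambda_{\mathrm{opt}}M\mathbf{s}_{\mathrm{opt}} + \mathbf{g} = \mathbf{0}$, we have
(3.1) $H_{\mathrm{opt}}\mathbf{u} + \mathbf{r}_i + \delta\lambda\cdot M\mathbf{s}_i = \mathbf{0} \quad\text{or}\quad M^{-1}H_{\mathrm{opt}}\mathbf{u} + M^{-1}\mathbf{r}_i + \delta\lambda\cdot\mathbf{s}_i = \mathbf{0},$
where $\mathbf{u} = \mathbf{s}_{\mathrm{opt}} - \mathbf{s}_i$ and $\delta\lambda = \lambda_{\mathrm{opt}} - \lambda_i$. It should be noticed that, unlike in GMRESR, the correction system (3.1) is nonlinear; from it we have
$\mathbf{u} = -(M^{-1}H_{\mathrm{opt}})^{-1}M^{-1}\mathbf{r}_i - \delta\lambda\cdot(M^{-1}H_{\mathrm{opt}})^{-1}\mathbf{s}_i;$
if we approximate the terms $(M^{-1}H_{\mathrm{opt}})^{-1}M^{-1}\mathbf{r}_i$ and $\delta\lambda\cdot(M^{-1}H_{\mathrm{opt}})^{-1}\mathbf{s}_i$ using Krylov subspace based methods and note that $\mathcal{K}_k(M^{-1}H_{\mathrm{opt}}, \mathbf{x}) = \mathcal{K}_k(M^{-1}H, \mathbf{x})$ for any $\mathbf{x}$, we have
$\begin{aligned} \mathbf{u} &= -(M^{-1}H_{\mathrm{opt}})^{-1}M^{-1}\mathbf{r}_i - \delta\lambda\cdot(M^{-1}H_{\mathrm{opt}})^{-1}\mathbf{s}_i \\ &\approx \mathcal{P}(M^{-1}H_{\mathrm{opt}})M^{-1}\mathbf{r}_i + \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i \\ &\in \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i) \end{aligned}$
for some polynomials $\mathcal{P}$ and $\hat{\mathcal{P}}$ of degrees $k_{i+1}-1$ and $m_{i+1}-1$, respectively. This suggests that a good correction $\mathbf{u}$ for the current $\mathbf{s}_i$ should be in
(3.2) $\mathrm{span}(U_i^{(0)}) := \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i),$
where $U_i^{(0)}$ is a basis of $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$. The inner-loop iteration terminates with $\mathbf{s}_{i+1}^{(0)} = \mathbf{s}_i + \mathbf{u}_i^{(0)}$, where $\mathbf{u}_i^{(0)}$ is obtained from
$\mathbf{s}_{i+1}^{(0)} = \mathbf{s}_i + \mathbf{u}_i^{(0)} = \arg\min_{\mathbf{s} \in \mathbf{s}_i + \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i),\ \|\mathbf{s}\|_M \le \Delta} f(\mathbf{s}).$
Analogous to GMRESR, we cast the $i$th outer-loop iteration as minimizing $f(\mathbf{s})$ over the subspace spanned by the previous $\mathbf{u}_j^{(0)}$ for $j = 0, 1, \dots, i$. This nested procedure is summarized in Algorithm 3.2.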
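Algorithm 3.2. The framework of the nested Lanczos method for TRS (LTRSR).
1: Choose $\epsilon > 0$ and let $\mathbf{s}_0 = \mathbf{0}$, $\mathbf{r}_0 = -\mathbf{g}$, $i = 0$, $U_{-1} = [\ ]$, $k_0 = 0$;
2: while $\|\mathbf{r}_i\|_{M^{-1}} > \epsilon$ and $i < i_{\max}$ do
3: Obtain the solution $\mathbf{s}_{i+1}^{(0)}$ and the associated Lagrange multiplier $\lambda_{i+1}^{(0)}$ from
(3.3) $\mathbf{s}_{i+1}^{(0)} = \mathbf{s}_i + \mathbf{u}_i^{(0)} := \arg\min_{\mathbf{s} \in \mathbf{s}_i + \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i),\ \|\mathbf{s}\|_M \le \Delta} f(\mathbf{s});$
4: Let $U_i = [U_{i-1}, \mathbf{u}_i^{(0)}]$;
5: Obtain the solution $\mathbf{s}_{i+1}$ and the associated Lagrange multiplier $\lambda_{i+1}$ of
(3.4) $\mathbf{s}_{i+1} = \mathbf{s}_i + \mathbf{u}_i := \arg\min_{\mathbf{s} \in \mathbf{s}_i + \mathrm{span}(U_i),\ \|\mathbf{s}\|_M \le \Delta} f(\mathbf{s});$
6: Update $\mathbf{r}_{i+1} = H\mathbf{s}_{i+1} + \lambda_{i+1}M\mathbf{s}_{i+1} + \mathbf{g}$ and $i = i + 1$;
7: end while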
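3.3. Implementation details. In this subsection, we give the implementation details involved in Algorithm 3.2.
First, note that both (3.3) and (3.4) can be restated as
(3.5) $\arg\min_{\mathbf{s} \in \mathbf{s}_i + \mathcal{U},\ \|\mathbf{s}\|_M \le \Delta} f(\mathbf{s})$
for some subspace $\mathcal{U}$. Suppose $\dim\mathcal{U} = \zeta$ and $U \in \mathbb{R}^{n\times\zeta}$ is an $M$-orthonormal basis for $\mathcal{U}$; thus the next approximation, say $\tilde{\mathbf{s}}$, can be expressed as $\tilde{\mathbf{s}} = \mathbf{s}_i + U\mathbf{h}$, $\mathbf{h} \in \mathbb{R}^{\zeta}$. Note that
$f(\tilde{\mathbf{s}}) = \frac{1}{2}\mathbf{h}^TU^THU\mathbf{h} + \mathbf{h}^TU^T(H\mathbf{s}_i + \mathbf{g}) + f(\mathbf{s}_i) = \frac{1}{2}\mathbf{h}^TT\mathbf{h} + \mathbf{h}^T\mathbf{c}_i + f(\mathbf{s}_i),$
where $T = U^THU$ and $\mathbf{c}_i = U^T(H\mathbf{s}_i + \mathbf{g})$; furthermore,
$\|\mathbf{s}_i + U\mathbf{h}\|_M^2 = \|\mathbf{h}\|_2^2 + 2\mathbf{h}^TU^TM\mathbf{s}_i + \|\mathbf{s}_i\|_M^2 = \|\mathbf{h}\|_2^2 + 2\mathbf{h}^T\mathbf{y}_i + \|\mathbf{s}_i\|_M^2,$
where $\mathbf{y}_i = U^TM\mathbf{s}_i$. Thus the optimal correction $U\mathbf{h} \in \mathcal{U}$ in (3.5) can be obtained from
$\min_{\|\mathbf{h}\|_2^2 + 2\mathbf{h}^T\mathbf{y}_i \le \Delta^2 - \|\mathbf{s}_i\|_M^2} \ \frac{1}{2}\mathbf{h}^TT\mathbf{h} + \mathbf{h}^T\mathbf{c}_i,$
which, by denoting $\mathbf{z} = \mathbf{h} + \mathbf{y}_i$ and $\mathbf{m}_i = \mathbf{c}_i - T\mathbf{y}_i$, is equivalent to
(3.6) $\min_{\|\mathbf{z}\|_2 \le \Delta_{i+1}} \ \frac{1}{2}\mathbf{z}^TT\mathbf{z} + \mathbf{z}^T\mathbf{m}_i, \quad\text{where}\quad \Delta_{i+1} = \sqrt{\Delta^2 - \|\mathbf{s}_i\|_M^2 + \|\mathbf{y}_i\|_2^2}.$
Let $\mathbf{z}_i$ and $\tilde{\lambda}$ be the solution and the Lagrange multiplier of (3.6), respectively; then $\tilde{\mathbf{s}} = \mathbf{s}_i + U(\mathbf{z}_i - \mathbf{y}_i)$ solves (3.5). Note that (3.6) is also a smaller-size TRS, and the efficiency of the final restarting Lanczos method can benefit from the full power of any sophisticated solver (for example, the Moré–Sorensen method [21] or the eigenvalue-based solver [1]) suitable for small- to medium-size TRSs.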
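The reduction from (3.5) to the standard small TRS (3.6) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the small problem is solved by an eigen-decomposition and bisection on the secular equation (assuming the easy case) in place of the solvers of [21] or [1], the basis `U` is assumed to be $M$-orthonormal, and the names `small_trs` and `affine_trs` are hypothetical.

```python
import numpy as np

def small_trs(T, m, radius, iters=200):
    """min 1/2 z^T T z + z^T m  s.t. ||z||_2 <= radius (easy case assumed)."""
    theta, P = np.linalg.eigh(T)
    c = P.T @ m
    norm_z = lambda lam: np.linalg.norm(c / (theta + lam))
    if theta[0] > 0 and norm_z(0.0) <= radius:
        lam = 0.0
    else:
        lo = max(0.0, -theta[0])
        hi = lo + 1.0
        while norm_z(hi) > radius:
            hi = lo + 2.0 * (hi - lo)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if norm_z(mid) > radius else (lo, mid)
        lam = hi
    return -P @ (c / (theta + lam)), lam

def affine_trs(H, M, g, s_i, U, Delta):
    """Solve (3.5) over s_i + span(U) via the shifted problem (3.6)."""
    T = U.T @ H @ U
    c_i = U.T @ (H @ s_i + g)
    y_i = U.T @ (M @ s_i)
    m_i = c_i - T @ y_i
    radius = np.sqrt(Delta**2 - s_i @ (M @ s_i) + y_i @ y_i)   # Delta_{i+1}
    z_i, lam = small_trs(T, m_i, radius)
    return s_i + U @ (z_i - y_i), lam

# Illustrative data: random H, M, g, a feasible s_i, and an M-orthonormal U.
rng = np.random.default_rng(2)
n, zeta, Delta = 12, 4, 1.0
A = rng.standard_normal((n, n)); H = (A + A.T) / 2
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)
g = rng.standard_normal(n)
X = rng.standard_normal((n, zeta))
R = np.linalg.cholesky(X.T @ M @ X)
U = np.linalg.solve(R, X.T).T                 # U^T M U = I
x = rng.standard_normal(n)
s_i = 0.3 * x / np.sqrt(x @ (M @ x))
s_new, lam = affine_trs(H, M, g, s_i, U, Delta)
f = lambda s: 0.5 * s @ (H @ s) + s @ g
print(f(s_i), f(s_new), np.sqrt(s_new @ (M @ s_new)))
```

At the returned pair, the residual $(H + \tilde{\lambda}M)\tilde{\mathbf{s}} + \mathbf{g}$ is orthogonal to the columns of `U`, which is the projected form of the first condition in Theorem 2.1.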
We next discuss the procedure for finding the $M$-orthogonal basis $U_i^{(0)}$ in (3.2) and an $M$-orthogonal basis of $\mathrm{span}(U_i)$, respectively. In general, one can call the block Lanczos process (see, e.g., [3, 14]) to generate the orthogonal basis for $\mathcal{K}_{k_{i+1}}(M^{-1}H, [M^{-1}\mathbf{r}_i, \mathbf{s}_i])$ when $k_{i+1} = m_{i+1}$. Alternatively, we can obtain an orthogonal basis by first carrying out the standard preconditioned Lanczos process [23, Algorithm 4.2] to obtain an $M$-orthogonal basis for $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i)$, and then applying the Gram–Schmidt process to expand it to an $M$-orthogonal basis $U_i^{(0)}$ for $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$. This is stated in Algorithm 3.3. Note that when the breakdown in the Lanczos process in step 1 occurs before $k_{i+1}$, then $\ell = \chi(M^{-1}\mathbf{r}_i)$, i.e., the grade of $M^{-1}\mathbf{r}_i$ with respect to $M^{-1}H$; otherwise $\ell = k_{i+1}$. Also, when roundoff errors are taken into consideration, the modified Gram–Schmidt process and/or full re-orthogonalization [6, Algorithm 7.2] should be incorporated in steps 4 and 6 to maintain $M$-orthogonality numerically. The modified Gram–Schmidt process can likewise be used to compute an $M$-orthogonal basis $U_i$ for $\mathrm{span}(U_i)$ numerically.
Finally, we mention the primary computational and memory requirements of LTRSR. For (3.3) in the inner loop, it first uses about $k_{i+1}+m_{i+1}$ $H$-vector products
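Algorithm 3.3. A process to form an $M$-orthogonal basis $U_i^{(0)}$ for $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$.
1: Obtain an $M$-orthogonal basis $U \in \mathbb{R}^{n\times\ell}$ for $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i)$ by the Lanczos process [23, Algorithm 4.2];
2: for $j = 1, 2, \dots, m_{i+1}$ do
3: if $j = 1$ then
4: $\mathbf{p} = \mathbf{s}_i - UU^TM\mathbf{s}_i$
5: else
6: $\mathbf{p} = M^{-1}HU(:, j+\ell-1) - U(U^THU(:, j+\ell-1))$
7: end if
8: $\gamma = \|\mathbf{p}\|_M$; if $\gamma = 0$ then break
9: $U = [U, \mathbf{p}/\gamma]$
10: end for
11: $U_i^{(0)} = U$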
and $k_{i+1}+m_{i+1}$ $M^{-1}$-vector products (i.e., solutions of linear systems with the coefficient matrix $M$) to form the basis $U_i^{(0)}$, and then costs $O((k_{i+1}+m_{i+1})^3)$ flops to solve (3.3). For (3.4), it calls a procedure to $M$-orthogonalize $\mathbf{u}_i^{(0)}$ against $U_{i-1}$ and then costs $O(i^3)$ flops to obtain $\mathbf{s}_{i+1}$. Moreover, we can restrict the number of columns of $U_i$ to $\varsigma$ by discarding the first columns, which controls the size of (3.4). The main memory requirement is the storage of $U_i^{(0)}$ and $U_i$, which can be kept modest by controlling $k_{i+1}+m_{i+1}$ and $\varsigma$.
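The steps of Algorithm 3.3 translate almost line by line into code. The sketch below is a hedged illustration, not the authors' implementation: step 1 is replaced by a Gram–Schmidt (Arnoldi-style) construction with re-orthogonalization instead of the three-term Lanczos recurrence, an extra $M$-orthogonalization pass is added after steps 4 and 6 as suggested in the text, a small tolerance replaces the exact test $\gamma = 0$, and the function names are hypothetical.

```python
import numpy as np

def m_norm(M, x):
    return np.sqrt(x @ (M @ x))

def krylov_basis(H, M, r, k, tol=1e-12):
    """Step 1 (sketch): M-orthonormal basis of K_k(M^{-1}H, M^{-1}r)."""
    q = np.linalg.solve(M, r)
    cols = [q / m_norm(M, q)]
    for _ in range(k - 1):
        w = np.linalg.solve(M, H @ cols[-1])
        Q = np.column_stack(cols)
        for _pass in range(2):                    # re-orthogonalize twice
            w -= Q @ (Q.T @ (M @ w))
        nw = m_norm(M, w)
        if nw <= tol:                             # breakdown: ell < k
            break
        cols.append(w / nw)
    return np.column_stack(cols)

def expand_basis(H, M, U, s_i, m, tol=1e-12):
    """Steps 2-11 of Algorithm 3.3 applied to the basis U from step 1."""
    for j in range(1, m + 1):
        if j == 1:
            p = s_i - U @ (U.T @ (M @ s_i))                 # step 4
        else:
            u_last = U[:, -1]                               # column j + ell - 1
            p = np.linalg.solve(M, H @ u_last) - U @ (U.T @ (H @ u_last))  # step 6
        p -= U @ (U.T @ (M @ p))                            # extra pass (assumption)
        gamma = m_norm(M, p)                                # step 8
        if gamma <= tol:
            break
        U = np.column_stack([U, p / gamma])                 # step 9
    return U

rng = np.random.default_rng(4)
n, k, m = 20, 4, 3
A = rng.standard_normal((n, n)); H = (A + A.T) / 2
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)
r_i = rng.standard_normal(n)
s_i = rng.standard_normal(n)
U0 = expand_basis(H, M, krylov_basis(H, M, r_i, k), s_i, m)
print(U0.shape, np.linalg.norm(U0.T @ M @ U0 - np.eye(U0.shape[1])))
```

With $\mathbf{s}_0 = \mathbf{0}$ the loop stops at $j = 1$, so the first inner iteration works on the plain Krylov subspace, consistent with LTRS started from $M^{-1}\mathbf{g}$.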
4. Convergence of the LTRSR. We now explore some convergence behavior of the LTRSR in Algorithm 3.2. We begin with the investigation of the breakdown behavior of LTRSR.
4.1. Breakdown in LTRSR.
Theorem 4.1. For $i = 0, 1, \dots$ until convergence in Algorithm 3.2 and $\min(k_i, m_i) \ge 1$,
(i) $f(\mathbf{s}_i) \ge f(\mathbf{s}_{i+1}^{(0)}) \ge f(\mathbf{s}_{i+1})$;
(ii) as the optimality conditions for (3.3) and (3.4), respectively, it holds that $\mathbf{r}_{i+1}^{(0)} \perp \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$ and $\mathbf{r}_{i+1} \perp \mathrm{span}(U_i)$, where $\mathbf{r}_{i+1}^{(0)} = H\mathbf{s}_{i+1}^{(0)} + \lambda_{i+1}^{(0)}M\mathbf{s}_{i+1}^{(0)} + \mathbf{g}$ and $\mathbf{r}_{i+1}$ is given by step 6;
(iii) if $\mathbf{r}_i \ne \mathbf{0}$, then $f(\mathbf{s}_i) > f(\mathbf{s}_{i+1}^{(0)})$; in other words, the $i$th inner-loop iteration will always improve the objective function whenever $\mathbf{r}_i \ne \mathbf{0}$.
Proof. The assertion (i) is trivial. The conclusion (ii) is indeed the first-order necessary condition of (3.3) and (3.4), respectively, and can be verified by, e.g., [35, Theorem 4.1]. For (iii), we show that $f(\mathbf{s}_i) = f(\mathbf{s}_{i+1}^{(0)})$ implies $\mathbf{r}_i = \mathbf{0}$. To this end, we note that $f(\mathbf{s}_i) = f(\mathbf{s}_{i+1}^{(0)})$ indicates that $\mathbf{s}_i$ and $\lambda_i$ are the solution and the corresponding Lagrange multiplier of (3.3), respectively. Thus, by (ii), we have
$\mathbf{r}_i = (H+\lambda_iM)\mathbf{s}_i + \mathbf{g} \perp \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i),$
which by $M^{-1}\mathbf{r}_i \in \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i)$ implies $\mathbf{r}_i^TM^{-1}\mathbf{r}_i = 0$, leading to $\mathbf{r}_i = \mathbf{0}$.
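We next consider the breakdown in the inner-loop iteration.
Lemma 4.2. Suppose $\mathbf{g} \not\perp \mathcal{E}_1$ (implying that the original TRS (1.1) is nondegenerate) and at the $i$th iteration of Algorithm 3.2, $\min(k_{i+1}, m_{i+1}) \ge 1$. Then $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$ is not orthogonal to $\mathcal{E}_1$.
Proof. In (2.4), $\mathcal{E}_1 = \mathrm{span}(W_1)$ with $W_1 = [\mathbf{w}_1, \dots, \mathbf{w}_p]$ and $W = [W_1, W_2]$. The assertion is true if $\mathbf{s}_i$ has a component in $\mathcal{E}_1$. When $\mathbf{s}_i \in \mathrm{span}(W_2)$, we have $M^{-1}\mathbf{g} = WW^T\mathbf{g} = W_1(W_1^T\mathbf{g}) + W_2(W_2^T\mathbf{g})$ with $W_1^T\mathbf{g} \ne \mathbf{0}$ by assumption. Since both $\mathcal{E}_1$ and $\mathrm{span}(W_2)$ are invariant subspaces of $M^{-1}H + \lambda_iI_n$, we know that $M^{-1}\mathbf{r}_i = W_1(W_1^T\mathbf{g}) + (M^{-1}H + \lambda_iI_n)\mathbf{s}_i + W_2(W_2^T\mathbf{g})$ contains a nonzero component $W_1(W_1^T\mathbf{g})$ in $\mathcal{E}_1$.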
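Theorem 4.3. Under the assumptions of Lemma 4.2, and if additionally $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$ is an invariant subspace of $M^{-1}H$, then $\mathbf{s}_{i+1}^{(0)} = \mathbf{s}_{\mathrm{opt}}$.
Proof. Let $U_i^{(0)}$ be an $M$-orthogonal basis of $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$. According to Lemma 4.2, if $\mathrm{span}(U_i^{(0)})$ is an invariant subspace of $M^{-1}H$, then all the eigenvalues of $T = (U_i^{(0)})^THU_i^{(0)}$ are eigenvalues of $M^{-1}H$, and the smallest one is $\theta_1$.
Consider the $i$th inner-loop subproblem (3.3) and the associated problem (3.6). With $\mathbf{s}_{i+1}^{(0)} = \mathbf{s}_i + U_i^{(0)}\mathbf{h}_i$ for some $\mathbf{h}_i$, it holds that
$\begin{aligned} M^{-1}\mathbf{r}_{i+1}^{(0)} &= (M^{-1}H + \lambda_{i+1}^{(0)}I_n)\mathbf{s}_{i+1}^{(0)} + M^{-1}\mathbf{g} \\ &= (M^{-1}H + \lambda_{i+1}^{(0)}I_n)(\mathbf{s}_i + U_i^{(0)}\mathbf{h}_i) + M^{-1}\mathbf{g} \\ &= M^{-1}\mathbf{r}_i + (\lambda_{i+1}^{(0)} - \lambda_i)\mathbf{s}_i + (M^{-1}H + \lambda_{i+1}^{(0)}I_n)U_i^{(0)}\mathbf{h}_i \in \mathrm{span}(U_i^{(0)}), \end{aligned}$
where the last relation follows because $\mathrm{span}(U_i^{(0)})$ is an invariant subspace of $M^{-1}H$. Furthermore, by (ii) of Theorem 4.1, we know that $\mathbf{r}_{i+1}^{(0)} \perp U_i^{(0)}$, which together with the above relation $M^{-1}\mathbf{r}_{i+1}^{(0)} \in \mathrm{span}(U_i^{(0)})$ leads to $(\mathbf{r}_{i+1}^{(0)})^TM^{-1}\mathbf{r}_{i+1}^{(0)} = 0$ and thus $\mathbf{r}_{i+1}^{(0)} = \mathbf{0}$. Moreover, according to Theorem 2.1 and the associated problem (3.6), the global optimality conditions ensure that $\lambda_{i+1}^{(0)} \ge 0$, $\lambda_{i+1}^{(0)}(\|\mathbf{s}_{i+1}^{(0)}\|_M - \Delta) = 0$, and $T + \lambda_{i+1}^{(0)}I \succeq 0$, implying $\lambda_1(T) + \lambda_{i+1}^{(0)} = \theta_1 + \lambda_{i+1}^{(0)} \ge 0$, and therefore $H + \lambda_{i+1}^{(0)}M \succeq 0$. Consequently, all the conditions in Theorem 2.1 hold, and $\mathbf{s}_{i+1}^{(0)}$ and $\lambda_{i+1}^{(0)}$ are the solution and the corresponding Lagrange multiplier of the original problem (1.1), respectively.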
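Remark 4.1.
(1) Note that for any $i$, $\mathbf{s}_i = \mathcal{P}(M^{-1}H)M^{-1}\mathbf{g}$ and $\mathbf{r}_i = \tilde{\mathcal{P}}(M^{-1}H)M^{-1}\mathbf{g}$ for some polynomials $\mathcal{P}$ and $\tilde{\mathcal{P}}$, respectively. Thus $\max(\chi(M^{-1}\mathbf{r}_i), \chi(\mathbf{s}_i)) \le \chi(M^{-1}\mathbf{g}) = \iota$ and
$\mathrm{span}(U_i^{(0)}) = \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i) \subseteq \mathcal{K}_\iota(M^{-1}H, M^{-1}\mathbf{g}).$
Therefore, for the case $\mathbf{g} \not\perp \mathcal{E}_1$, if $\dim(\mathrm{span}(U_i^{(0)})) = \chi(M^{-1}\mathbf{g})$, then $\mathbf{s}_{\mathrm{opt}}$ is attained at the $i$th iteration. In general, it is hoped that $\mathbf{s}_{\mathrm{opt}}$ (or a highly accurate approximation) is obtained with $\dim(\mathrm{span}(U_i^{(0)})) \ll \chi(M^{-1}\mathbf{g})$. By (iii) of Theorem 4.1, we know that the outer-loop iteration serves as the refinement procedure so that the next inner-loop iteration returns $\mathbf{s}_{\mathrm{opt}}$ (or a satisfactory approximation) with $\max(m_{i+1}, k_{i+1}) \ll \chi(M^{-1}\mathbf{g})$.
(2) Theorem 4.3 also partially shows the advantage of using the affine space
(4.1) $\mathbf{s}_i + \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$
over the alternative choice $\mathbf{s}_i + \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i)$ as the constraint for the inner-loop iteration in step 3 of Algorithm 3.2: When $\mathbf{s}_i$ is not $\mathbf{s}_{\mathrm{opt}}$, it is possible that $M^{-1}\mathbf{r}_i \in \mathrm{span}(W_2)$; thus, even if $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) = \mathrm{span}(Q_{k_{i+1}})$ is an invariant subspace of $M^{-1}H$, the smallest eigenvalue of $T = Q_{k_{i+1}}^THQ_{k_{i+1}}$ is strictly larger than $\theta_1$ and therefore $H + \lambda_{i+1}^{(0)}M$ is indefinite; in other words, for $\mathbf{g} \not\perp \mathcal{E}_1$, the breakdown in $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i)$ does not necessarily imply that $\mathbf{s}_{\mathrm{opt}}$ is attained. The other advantage of using (4.1) as the constraint for the inner-loop subproblem (3.3) will be revealed in Theorems 4.6 and 4.7. Numerical testing will also be carried out in section 6.2 to illustrate the importance of this double Krylov subspace $\mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$ as well as the nested structure in steps 4–5 of Algorithm 3.2.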
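4.2. A priori error bounds. We next consider a priori bounds for $f(\mathbf{s}_{i+1}^{(0)}) - f(\mathbf{s}_{\mathrm{opt}})$ and $\|\mathbf{s}_{i+1}^{(0)} - \mathbf{s}_{\mathrm{opt}}\|_M$ before the occurrence of the breakdown for (3.3). Similar to the convergence analysis in [36], we have the following lemma.
Lemma 4.4. Suppose $\|\mathbf{s}_{\mathrm{opt}}\|_M = \Delta$ and $\mathcal{W}$ is an arbitrary subspace of $\mathbb{R}^n$. Let
$\hat{\mathbf{s}} \in \arg\min_{\mathbf{s} \in \mathbf{s}_i + \mathcal{W},\ \|\mathbf{s}\|_M = \Delta} f(\mathbf{s}).$
Then for any nonzero $\tilde{\mathbf{s}} \in \mathbf{s}_i + \mathcal{W}$, it holds that
(4.2) $0 \le f(\hat{\mathbf{s}}) - f(\mathbf{s}_{\mathrm{opt}}) \le 2(\theta_n+\lambda_{\mathrm{opt}})\|\tilde{\mathbf{s}} - \mathbf{s}_{\mathrm{opt}}\|_M^2.$
Furthermore, if (1.1) is nondegenerate, then
(4.3) $\|\hat{\mathbf{s}} - \mathbf{s}_{\mathrm{opt}}\|_M \le 2\sqrt{\kappa}\,\|\tilde{\mathbf{s}} - \mathbf{s}_{\mathrm{opt}}\|_M,$
where $\kappa := \frac{\theta_n+\lambda_{\mathrm{opt}}}{\theta_1+\lambda_{\mathrm{opt}}}$ is the condition number of $M^{-1/2}H_{\mathrm{opt}}M^{-1/2}$ with $H_{\mathrm{opt}}$ given in (2.6).
Proof. The proof is similar to that of [36, Theorem 4.3] and the details are omitted.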
Moreover, we also need the following classical result (see, e.g., [28, Theorem 6.25]) in our convergence analysis.
Lemma 4.5. On the interval $[a, b]$ and for a real value $\gamma \notin [a, b]$, we have
(4.4) $\min_{L \in \mathbb{P}_k,\ L(\gamma)=1}\ \max_{t \in [a,b]} |L(t)| = \frac{1}{\left|T_k\!\left(1 + 2\frac{\gamma-b}{b-a}\right)\right|},$
and the polynomial $L(t) = T_k\!\left(1 + 2\frac{t-b}{b-a}\right) / T_k\!\left(1 + 2\frac{\gamma-b}{b-a}\right)$ solves the above problem (4.4).
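A quick numerical check of Lemma 4.5 is sketched below with illustrative values $k = 5$, $[a, b] = [1, 10]$, and $\gamma = 0$ (these numbers are assumptions, not from the paper). It evaluates the scaled Chebyshev polynomial on a grid and compares its maximum modulus with the right-hand side of (4.4); it verifies the attained value only, not the optimality over all of $\mathbb{P}_k$.

```python
import math

def chebyshev_T(k: int, t: float) -> float:
    """k-th Chebyshev polynomial of the first kind (two-branch formula)."""
    if abs(t) <= 1.0:
        return math.cos(k * math.acos(t))
    w = t + math.sqrt(t * t - 1.0)
    return 0.5 * (w ** k + w ** (-k))

def minimax_poly(k: int, a: float, b: float, gamma: float):
    """L(t) = T_k(1 + 2(t-b)/(b-a)) / T_k(1 + 2(gamma-b)/(b-a))."""
    denom = chebyshev_T(k, 1.0 + 2.0 * (gamma - b) / (b - a))
    return lambda t: chebyshev_T(k, 1.0 + 2.0 * (t - b) / (b - a)) / denom

k, a, b, gamma = 5, 1.0, 10.0, 0.0
L = minimax_poly(k, a, b, gamma)
grid = [a + (b - a) * j / 2000 for j in range(2001)]      # includes a and b
max_abs = max(abs(L(t)) for t in grid)
bound = 1.0 / abs(chebyshev_T(k, 1.0 + 2.0 * (gamma - b) / (b - a)))
print(L(gamma), max_abs, bound)
```

The affine map $t \mapsto 1 + 2(t-b)/(b-a)$ sends $[a, b]$ onto $[-1, 1]$, where $|T_k| \le 1$ with equality at the endpoints, so the maximum over the interval is exactly the reciprocal of the normalizing constant.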
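Theorem 4.6. Suppose (1.1) is nondegenerate with $\|\mathbf{s}_{\mathrm{opt}}\|_M = \Delta$, and $(\mathbf{s}_i, \lambda_i)$ is an approximate pair to (1.1) with $\lambda_i$ the associated Lagrange multiplier and $\mathbf{r}_i = H\mathbf{s}_i + \lambda_iM\mathbf{s}_i + \mathbf{g}$. Let $\mathbf{s}_{i+1}^{(0)}$ be the solution to (3.3) satisfying $\|\mathbf{s}_{i+1}^{(0)}\|_M = \Delta$; then we have
(4.5a) $0 \le f(\mathbf{s}_{i+1}^{(0)}) - f(\mathbf{s}_{\mathrm{opt}}) \le 2(\theta_n+\lambda_{\mathrm{opt}})\left(\frac{\|\mathbf{s}_{\mathrm{opt}} - \mathbf{s}_i\|_M}{T_{\nu_i}(\eta)}\right)^2,$
(4.5b) $\|\mathbf{s}_{\mathrm{opt}} - \mathbf{s}_{i+1}^{(0)}\|_M \le 2\sqrt{\kappa}\,\frac{\|\mathbf{s}_{\mathrm{opt}} - \mathbf{s}_i\|_M}{T_{\nu_i}(\eta)},$
where
(4.6) $\kappa = \frac{\theta_n+\lambda_{\mathrm{opt}}}{\theta_1+\lambda_{\mathrm{opt}}}, \qquad \eta = \frac{\kappa+1}{\kappa-1} > 1, \qquad \nu_i = \min\{\chi(M^{-1}\mathbf{r}_i),\ k_{i+1},\ \chi(\mathbf{s}_i),\ m_{i+1}\},$
and $T_{\nu_i}(t)$ is the $\nu_i$th Chebyshev polynomial of the first kind.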
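To get a feeling for the size of the bounds (4.5a) and (4.5b), the following sketch evaluates their right-hand sides for given spectral data. The function name and the sample numbers are assumptions for illustration only; for $\eta \ge 1$ the code uses the standard identity $T_\nu(\eta) = \cosh(\nu\,\mathrm{arccosh}\,\eta)$, which agrees with the definition of $T_k$ for $|t| \ge 1$.

```python
import math

def inner_loop_bounds(theta_1, theta_n, lam_opt, nu, err_i):
    """Right-hand sides of (4.5a) and (4.5b).

    err_i stands for ||s_opt - s_i||_M; requires theta_1 + lam_opt > 0
    (nondegenerate case) and theta_n > theta_1.
    """
    kappa = (theta_n + lam_opt) / (theta_1 + lam_opt)
    eta = (kappa + 1.0) / (kappa - 1.0)
    T_nu = math.cosh(nu * math.acosh(eta))        # T_nu(eta) for eta >= 1
    ratio = err_i / T_nu
    f_gap = 2.0 * (theta_n + lam_opt) * ratio ** 2        # (4.5a)
    s_err = 2.0 * math.sqrt(kappa) * ratio                # (4.5b)
    return kappa, eta, f_gap, s_err

# Hypothetical spectral data: theta_1 = -1, theta_n = 10, lambda_opt = 2.
for nu in (1, 5, 10, 20):
    kappa, eta, f_gap, s_err = inner_loop_bounds(-1.0, 10.0, 2.0, nu, err_i=1.0)
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    print(nu, f_gap, s_err, 2.0 * rho ** nu)      # last column bounds 1/T_nu(eta)
```

Since $\eta + \sqrt{\eta^2-1} = (\sqrt{\kappa}+1)/(\sqrt{\kappa}-1)$, one has $1/T_\nu(\eta) \le 2\big((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\big)^\nu$, the familiar CG-type rate; the factor $2\sqrt{\kappa}$ in (4.5b) means that a single inner loop is guaranteed to reduce the error only when $\nu_i$ is large enough relative to $\kappa$.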
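Proof. Denote
(4.7) $\phi_i = \min(k_{i+1}, \chi(M^{-1}\mathbf{r}_i)) \quad\text{and}\quad \vartheta_i = \min(m_{i+1}, \chi(\mathbf{s}_i)).$
Note that any
$\mathbf{s} \in \mathbf{s}_i + \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i) = \mathbf{s}_i + \mathcal{K}_{k_{i+1}}(M^{-1}H_{\mathrm{opt}}, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H_{\mathrm{opt}}, \mathbf{s}_i)$
can be represented as
(4.8) $\mathbf{s} = \mathbf{s}_i + \mathcal{P}(M^{-1}H_{\mathrm{opt}})M^{-1}\mathbf{r}_i + \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i$
for some polynomials $\mathcal{P} \in \mathbb{P}_{\phi_i-1}$ and $\hat{\mathcal{P}} \in \mathbb{P}_{\vartheta_i-1}$. Also, with $\delta\lambda_i = \lambda_{\mathrm{opt}} - \lambda_i$, we have
(4.9) $\mathbf{r}_i = (H+\lambda_iM)\mathbf{s}_i + \mathbf{g} = H_{\mathrm{opt}}(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}) - \delta\lambda_i\cdot M\mathbf{s}_i.$
We prove our assertion for the two cases: $\phi_i \le \vartheta_i$ and $\phi_i \ge \vartheta_i$.
Case $\phi_i \le \vartheta_i$: By (4.8) and (4.9), we have
(4.10) $\begin{aligned} \mathbf{s} - \mathbf{s}_{\mathrm{opt}} &= \mathbf{s}_i - \mathbf{s}_{\mathrm{opt}} + \mathcal{P}(M^{-1}H_{\mathrm{opt}})M^{-1}\mathbf{r}_i + \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i \\ &= (\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}) + \mathcal{P}(M^{-1}H_{\mathrm{opt}})M^{-1}H_{\mathrm{opt}}(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}) - \delta\lambda_i\cdot\mathcal{P}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i + \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i \\ &= [I_n + \mathcal{P}(M^{-1}H_{\mathrm{opt}})M^{-1}H_{\mathrm{opt}}](\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}) - \delta\lambda_i\cdot\mathcal{P}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i + \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i \\ &= L(M^{-1}H_{\mathrm{opt}})(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}) - \left(\delta\lambda_i\cdot\mathcal{P}(M^{-1}H_{\mathrm{opt}}) - \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\right)\mathbf{s}_i, \end{aligned}$
where $L(t) = 1 + t\mathcal{P}(t) \in \mathbb{P}_{\phi_i}$ satisfies $L(0) = 1$.
Note that the polynomials $\mathcal{P}(t)$ and $\hat{\mathcal{P}}(t)$ in (4.10) can be chosen independently to obtain an $\mathbf{s}$. Since $\phi_i \le \vartheta_i$, we can choose the polynomial $\hat{\mathcal{P}}(t) = \delta\lambda_i\cdot\mathcal{P}(t) \in \mathbb{P}_{\vartheta_i-1}$ so that the last term in (4.10) vanishes, i.e., $(\delta\lambda_i\cdot\mathcal{P}(M^{-1}H_{\mathrm{opt}}) - \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}}))\mathbf{s}_i = \mathbf{0}$. With this choice, the corresponding $\mathbf{s} - \mathbf{s}_{\mathrm{opt}}$ given in (4.10) is of the form $\mathbf{s} - \mathbf{s}_{\mathrm{opt}} = L(M^{-1}H_{\mathrm{opt}})(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}})$. Note that the degree of $L(t)$ is $\phi_i$. Now, we minimize the above $\|\mathbf{s} - \mathbf{s}_{\mathrm{opt}}\|_M$ over all polynomials $L(t) = 1 + t\mathcal{P}(t)$ of degree $\phi_i$ satisfying $L(0) = 1$ to obtain a specific $\tilde{\mathbf{s}}$. Precisely,
(4.11) $\begin{aligned} \|\tilde{\mathbf{s}} - \mathbf{s}_{\mathrm{opt}}\|_M &= \min_{L \in \mathbb{P}_{\phi_i},\ L(0)=1} \|L(M^{-1}H_{\mathrm{opt}})(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}})\|_M \\ &= \min_{L \in \mathbb{P}_{\phi_i},\ L(0)=1} \|M^{1/2}L(M^{-1}H_{\mathrm{opt}})M^{-1/2}M^{1/2}(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}})\|_2 \\ &\le \|M^{1/2}(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}})\|_2 \cdot \min_{L \in \mathbb{P}_{\phi_i},\ L(0)=1} \|M^{1/2}L(M^{-1}H_{\mathrm{opt}})M^{-1/2}\|_2 \\ &= \|\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}\|_M \cdot \min_{L \in \mathbb{P}_{\phi_i},\ L(0)=1} \|L(M^{-1/2}HM^{-1/2} + \lambda_{\mathrm{opt}}I_n)\|_2 \\ &= \|\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}\|_M \cdot \min_{L \in \mathbb{P}_{\phi_i},\ L(0)=1}\ \max_{j=1,2,\dots,n} |L(\theta_j + \lambda_{\mathrm{opt}})| \\ &\le \|\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}\|_M \cdot \min_{L \in \mathbb{P}_{\phi_i},\ L(0)=1}\ \max_{t \in [\theta_1+\lambda_{\mathrm{opt}},\ \theta_n+\lambda_{\mathrm{opt}}]} |L(t)| \\ &= \frac{\|\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}\|_M}{T_{\phi_i}(\eta)}, \end{aligned}$
where the last equality follows from (4.4) and $\eta$ is given in (4.6). Plugging (4.11) into (4.2) and (4.3) with $\mathcal{W} = \mathcal{K}_{k_{i+1}}(M^{-1}H, M^{-1}\mathbf{r}_i) + \mathcal{K}_{m_{i+1}}(M^{-1}H, \mathbf{s}_i)$ and $\hat{\mathbf{s}} = \mathbf{s}_{i+1}^{(0)}$ yields (4.5).
Case $\phi_i \ge \vartheta_i$: For this case, we first claim that if $\delta\lambda_i = 0$, then (4.10) reduces to $\mathbf{s} - \mathbf{s}_{\mathrm{opt}} = L(M^{-1}H_{\mathrm{opt}})(\mathbf{s}_i - \mathbf{s}_{\mathrm{opt}}) + \hat{\mathcal{P}}(M^{-1}H_{\mathrm{opt}})\mathbf{s}_i,$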