
A RIEMANNIAN NEWTON ALGORITHM FOR NONLINEAR EIGENVALUE PROBLEMS

ZHI ZHAO, ZHENG-JIAN BAI, AND XIAO-QING JIN

Abstract. We give the formulation of a Riemannian Newton algorithm for solving a class of nonlinear eigenvalue problems by minimizing a total energy function subject to the orthogonality constraint. Under some mild assumptions, we establish the global and quadratic convergence of the proposed method. Moreover, the positive definiteness condition of the Riemannian Hessian of the total energy function at a solution is derived. Some numerical tests are reported to illustrate the efficiency of the proposed method for solving large-scale problems.

Key words. nonlinear eigenvalue problem, Riemannian Newton algorithm, Stiefel manifold, Grassmann manifold

AMS subject classifications. 15A18, 65F15, 49M15, 47J10

DOI. 10.1137/140967994

1. Introduction. We consider the following total energy minimization problem:

(1.1) $$\min_{X\in\mathbb{R}^{n\times k}} E(X) := \frac{1}{2}\operatorname{tr}(X^TLX) + \frac{\alpha}{4}\rho(X)^TL^{-1}\rho(X) \quad \text{s.t.} \quad X^TX = I_k,$$

where $X^T$ denotes the transpose of $X$, $L$ is a discrete Laplacian operator, $\alpha > 0$ is a given constant, $\rho(X) := \operatorname{diag}(XX^T)$, "s.t." means "subject to", and $I_k$ is the identity matrix of order $k$. We point out that the matrix $L$ may be singular with different boundary conditions (see [45]). In this case, we may replace $L^{-1}$ by the Moore–Penrose generalized inverse $L^{\dagger}$. The symbol $\operatorname{diag}(M) := (m_{11}, m_{22}, \ldots, m_{nn})^T$ denotes a vector containing the diagonal elements of an $n\times n$ matrix $M = [m_{ij}]$. Obviously, the first-order necessary conditions for the total energy minimization problem (1.1) are given by [33]

$$H(X)X = X\Lambda_k, \qquad X^TX = I_k,$$

where the $k$-by-$k$ real symmetric matrix $\Lambda_k$ is a Lagrange multiplier. We note that the global minimizer of the constrained minimization problem (1.1) is not unique. If $X$ is a solution, then $XQ$ is also a solution for any $k\times k$ real orthogonal matrix $Q$.

Thus a necessary condition for the global minimum of problem (1.1) takes the form of a nonlinear eigenvalue problem (NEP) [46]:

(1.2) $$H(X)X = X\Lambda_k, \qquad X^TX = I_k,$$

where the diagonal matrix $\Lambda_k \in \mathbb{R}^{k\times k}$ contains the $k$ smallest eigenvalues of the symmetric matrix $H(X) = L + \alpha\,\mathrm{Diag}(L^{-1}\rho(X)) \in \mathbb{R}^{n\times n}$. The symbol $\mathrm{Diag}(x)$ is a diagonal matrix with a vector $x$ on its diagonal. Note that the meaning of the notation $\mathrm{diag}(\cdot)$ is different from that of the notation $\mathrm{Diag}(\cdot)$.

Received by the editors May 5, 2014; accepted for publication (in revised form) by F. Tisseur March 30, 2015; published electronically June 11, 2015.

http://www.siam.org/journals/simax/36-2/96799.html

Department of Mathematics, University of Macau, Macau, People's Republic of China (zhaozhi231@163.com, xqjin@umac.mo). The research of the third author was supported by research grant MYRG098(Y2-L3)-FST13-JXQ from the University of Macau.

Corresponding author. School of Mathematical Sciences, Xiamen University, Xiamen 361005, People's Republic of China (zjbai@xmu.edu.cn). The research of this author was partially supported by National Natural Science Foundation of China grant 11271308, by NCET, and by the Fundamental Research Funds for the Central Universities (20720150001).
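To make the objects in (1.1) and (1.2) concrete, the following NumPy sketch evaluates $\rho(X)$, $H(X)$, and the total energy $E(X)$. The one-dimensional Laplacian, the value of `alpha`, and the use of a dense linear solve in place of $L^{-1}$ are illustrative assumptions on our part, not the authors' implementation.

```python
import numpy as np

def discrete_laplacian(n):
    # Assumed 1-D discrete Laplacian with Dirichlet boundary; for illustration only.
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def rho(X):
    # rho(X) = diag(X X^T): the vector of diagonal entries of X X^T.
    return np.sum(X * X, axis=1)

def H(X, L, alpha):
    # H(X) = L + alpha * Diag(L^{-1} rho(X)).
    return L + alpha * np.diag(np.linalg.solve(L, rho(X)))

def total_energy(X, L, alpha):
    # E(X) = 1/2 tr(X^T L X) + alpha/4 rho(X)^T L^{-1} rho(X).
    r = rho(X)
    return 0.5 * np.trace(X.T @ L @ X) + 0.25 * alpha * r @ np.linalg.solve(L, r)
```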

The total energy minimization problem (1.1) is a simplified version of the Hartree–Fock (HF) total energy minimization problem and the Kohn–Sham (KS) total energy minimization problem in electronic structure calculations (see, for instance, [30, 38, 44, 45]). Moreover, the NEP (1.2) is a simplified version of the associated HF and KS equations. The self-consistent field (SCF) iteration is widely used for solving the HF and KS equations, which calculates the $k$ smallest eigenvalues and associated eigenvectors of the NEP (1.2) iteratively: Given the current iterate $X_j$, compute $X_{j+1}$ such that

$$H(X_j)X_{j+1} = X_{j+1}\Lambda_k^{j+1} \quad \text{and} \quad (X_{j+1})^TX_{j+1} = I_k,$$

where $\Lambda_k^{j+1}$ contains the $k$ smallest eigenvalues of $H(X_j)$. However, the original version of the SCF iteration often fails to converge [11]. In past decades, different heuristics were developed to accelerate and stabilize the SCF iteration [24, 25]. On the convergence of the SCF iteration, one may refer to [13, 27, 46]. In [45], the SCF iteration is used as an indirect way to solve problem (1.1) by minimizing a sequence of quadratic surrogate functions.
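As a minimal sketch of the SCF iteration just described (reusing the helpers `H` and `discrete_laplacian` from the snippet above, which are our own illustrative assumptions), each step takes the eigenvectors of $H(X_j)$ associated with its $k$ smallest eigenvalues as $X_{j+1}$; as noted in the text, this plain iteration need not converge.

```python
def scf_iteration(L, k, alpha, maxit=100, tol=1e-8, seed=0):
    # SCF: X_{j+1} spans the invariant subspace of the k smallest eigenvalues of H(X_j).
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # random starting point on St(k, n)
    for _ in range(maxit):
        _, V = np.linalg.eigh(H(X, L, alpha))          # eigenvalues in ascending order
        X_new = V[:, :k]                               # eigenvectors of the k smallest eigenvalues
        # Stop when the spanned subspace has (nearly) stopped changing.
        if np.linalg.norm(X - X_new @ (X_new.T @ X)) < tol:
            return X_new
        X = X_new
    return X
```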

There are several recent optimization methods for solving the minimization problem (1.1) directly [5, 7, 25, 26, 32, 34, 35, 40, 41]. Because of the orthogonality constraint $X^TX = I_k$, those methods only use the gradient of the total energy and often converge slowly. In [44], a constrained optimization algorithm is proposed for minimizing the total energy by projecting the total energy into a sequence of subspaces and seeking the minimum point of the total energy over each subspace. In [42], a projected gradient-type method is given for minimizing a general function with the orthogonality constraint. In [16], Newton's method and the conjugate gradient (CG) method are developed on the Grassmann and Stiefel manifolds. In [28], a modified steepest descent–type method with Armijo's line search and a modified Newton method are presented on the Grassmann and Stiefel manifolds. Also, in [31], line-search, trust region, and Newton algorithms are well studied on matrix manifolds.

The SCF iteration with various trust region techniques is employed to minimize the total energy [17, 18, 39, 45]. In [19], a Newton method is presented for solving a class of NEPs arising from electronic structure calculation, which is efficient only for small-scale problems.

In this paper, we propose a Riemannian Newton algorithm for solving the total energy minimization problem (1.1) over the Grassmann manifold related to the Stiefel manifold $\mathrm{St}(k, n) := \{X \in \mathbb{R}^{n\times k} \mid X^TX = I_k\}$. This is sparked by two recent papers, [27] and [19]. In [27], the convergence condition of the SCF iteration is related to the Hessian of the total energy. In [19], the NEP is viewed as a system of nonlinear equations, and then a Newton method is used for solving it. Therefore, in this paper, we first construct the Grassmann manifold from the Stiefel manifold $\mathrm{St}(k, n)$ based on an orthogonal equivalence relation and a Riemannian metric. Then we propose a Riemannian Newton algorithm for solving problem (1.1) over the Grassmann manifold. In particular, we combine the Riemannian Newton algorithm with the Riemannian line search technique. Sparked by [2, 16, 28], we use the CG method [20, Algorithm 10.2.1] to solve each Newton equation inexactly, so that we do not need the inverse of the Riemannian Hessian of the total energy function and thus the computational complexity is reduced. Also, the Riemannian line search guarantees that the proposed method will converge to a local minimum [28].


Under some mild conditions, we show that the proposed Riemannian Newton algorithm converges globally and quadratically. Moreover, we give the positive definiteness condition of the Riemannian Hessian of the total energy function at a solution. Some numerical experiments are reported to demonstrate the efficiency of our method for solving large-scale problems.

The rest of this paper is organized as follows. In section 2 we review some preliminary results on Riemannian manifolds. In section 3 we present a Riemannian Newton algorithm for solving the minimization problem (1.1) over the Grassmann manifold related to the Stiefel manifold $\mathrm{St}(k, n)$. In section 4 we give a convergence analysis. In section 5 we investigate the positive definiteness condition of the Riemannian Hessian of the total energy function in problem (1.1) over the Grassmann manifold. In section 6 we report some numerical results, and finally we give some concluding remarks in section 7.

2. Preliminaries. In this section, we recall some basic concepts and results on Riemannian manifolds [1, 2]. Let $\mathcal{M}$ be a $d$-dimensional manifold. Let $\mathcal{R}_x(\mathcal{M})$ be the set of all smooth real-valued functions defined on a neighborhood of a point $x \in \mathcal{M}$. A tangent vector $\xi_x$ to $\mathcal{M}$ at $x$ is defined as a mapping from $\mathcal{R}_x(\mathcal{M})$ to $\mathbb{R}$ such that

$$\xi_x f = \dot{\gamma}(0)f := \frac{d}{dt} f(\gamma(t))\Big|_{t=0} \quad \forall f \in \mathcal{R}_x(\mathcal{M})$$

for some smooth curve $\gamma$ on $\mathcal{M}$ with $\gamma(0) = x$. The tangent space $T_x\mathcal{M}$ to $\mathcal{M}$ at $x$ consists of all tangent vectors to $\mathcal{M}$ at $x$. Denote by $T\mathcal{M}$ the tangent bundle of $\mathcal{M}$:

$$T\mathcal{M} := \bigcup_{x\in\mathcal{M}} T_x\mathcal{M}.$$

A vector field on $\mathcal{M}$ is a smooth function $\xi: \mathcal{M} \to T\mathcal{M}$ such that $\xi(x) = \xi_x \in T_x\mathcal{M}$ for all $x \in \mathcal{M}$. A Riemannian metric $g$ on $\mathcal{M}$ is a family of inner products,

$$g_x : T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}, \quad x \in \mathcal{M},$$

where the inner product $g_x(\cdot,\cdot)$ varies smoothly and induces a norm $\|\xi_x\|_x = \sqrt{g_x(\xi_x, \xi_x)}$ on $T_x\mathcal{M}$. Thus, $(\mathcal{M}, g)$ is a Riemannian manifold [2, p. 45].

Let $\mathcal{M}$ and $\mathcal{L}$ be two manifolds. Let $G:\mathcal{M} \to \mathcal{L}$ be a smooth mapping. Then the differential $\mathrm{D}G(x)$ of $G$ at $x \in \mathcal{M}$ is a mapping from $T_x\mathcal{M}$ to $T_{G(x)}\mathcal{L}$ such that

$$\mathrm{D}G(x)[\xi_x] \in T_{G(x)}\mathcal{L} \quad \forall \xi_x \in T_x\mathcal{M},$$

where $\mathrm{D}G(x)[\xi_x]$ is a tangent vector to $\mathcal{L}$ at $G(x) \in \mathcal{L}$, which is a mapping from $\mathcal{R}_{G(x)}(\mathcal{L})$ to $\mathbb{R}$ defined by

$$\mathrm{D}G(x)[\xi_x]f = \xi_x(f\circ G) \quad \forall f \in \mathcal{R}_{G(x)}(\mathcal{L}).$$

Given a Riemannian manifold $(\mathcal{M}, g)$ with a Riemannian connection $\nabla$ (see, for instance, [2, 10]), let $f :\mathcal{M} \to \mathbb{R}$ be a smooth function. Then the Riemannian gradient $\operatorname{grad} f(x)$ of $f$ at $x \in \mathcal{M}$ is defined as the unique element in $T_x\mathcal{M}$ such that

$$g_x(\operatorname{grad} f(x), \xi_x) = \mathrm{D}f(x)[\xi_x] \quad \forall \xi_x \in T_x\mathcal{M}.$$

The Riemannian Hessian of $f$ at $x \in \mathcal{M}$ is defined as the linear mapping from $T_x\mathcal{M}$ to $T_x\mathcal{M}$ such that [2, Definition 5.5.1]

$$\operatorname{Hess} f(x)[\xi_x] = \nabla_{\xi_x}\operatorname{grad} f(x) \quad \forall \xi_x \in T_x\mathcal{M}.$$


The concept of retraction originally appeared in the field of algebraic topology [21]. Here, we adopt the following definition of retraction [2, 4, 37].

Definition 2.1. Let $\mathcal{M}$ be a manifold. Let $R$ be a mapping from $T\mathcal{M}$ onto $\mathcal{M}$. Let $R_x$ denote the restriction of $R$ to $T_x\mathcal{M}$. We say that $R$ is a retraction on $\mathcal{M}$ if

(i) $R$ is smooth,

(ii) $R_x(0_x) = x$, where $0_x$ is the origin of $T_x\mathcal{M}$,

(iii) $\mathrm{D}R_x(0_x) = \mathrm{id}_{T_x\mathcal{M}}$, where $\mathrm{id}_{T_x\mathcal{M}}$ is the identity mapping on $T_x\mathcal{M}$, with the canonical identification $T_{0_x}T_x\mathcal{M} \simeq T_x\mathcal{M}$.

For a real-valued function $f$ on the manifold $\mathcal{M}$ and a retraction $R$ on $\mathcal{M}$, we define the pullback $\widehat{f}$ of $f$ as the mapping from $T\mathcal{M}$ to $\mathbb{R}$ such that

(2.1) $$\widehat{f}(\xi) = f(R(\xi)) \quad \forall \xi \in T\mathcal{M},$$

and let $\widehat{f}_x$ denote the restriction of $\widehat{f}$ to $T_x\mathcal{M}$, which is defined by $\widehat{f}_x(\xi_x) = f(R_x(\xi_x))$ for all $\xi_x \in T_x\mathcal{M}$.

On the Riemannian distance to a nondegenerate local minimizer $x^*$ of a smooth real-valued function $f$ on $(\mathcal{M}, g)$, we have the following lemma [2, Lemma 7.4.8].

Lemma 2.2. Let $x^* \in \mathcal{M}$ and let $f :\mathcal{M} \to \mathbb{R}$ be a $C^2$ function (its first and second derivatives are continuous) such that $\operatorname{grad} f(x^*) = 0$ and $\operatorname{Hess} f(x^*)$ is positive definite with maximal and minimal eigenvalues $\lambda_{\max}$ and $\lambda_{\min}$. Then, given two positive scalars $\tau_0, \tau_1$ with $\tau_0 < \lambda_{\min}$ and $\tau_1 > \lambda_{\max}$, there exists a neighborhood $\mathcal{N}(x^*)$ of $x^*$ such that

$$\tau_0\,\mathrm{dist}(x, x^*) \le \|\operatorname{grad} f(x)\| \le \tau_1\,\mathrm{dist}(x, x^*) \quad \forall x \in \mathcal{N}(x^*),$$

where $\mathrm{dist}(\cdot,\cdot)$ means the Riemannian distance on $(\mathcal{M}, g)$ [2, p. 46].

On a relation between the Riemannian gradient of a smooth function $f$ on $\mathcal{M}$ at $R_x(\xi)$ and the gradient of $\widehat{f}_x$ at $\xi \in T_x\mathcal{M}$ with $\|\xi\| \le \delta$ for some $\delta > 0$, we have the following special result [2, Lemma 7.4.9].

Lemma 2.3. Let $R$ be a retraction on $\mathcal{M}$ and let $f$ be a continuously differentiable cost function on $\mathcal{M}$. Then for any given $\bar{x} \in \mathcal{M}$ and a scalar $\tau_2 > 1$, there exist a neighborhood $\mathcal{N}(\bar{x})$ of $\bar{x}$ and $\delta > 0$ such that

$$\|\operatorname{grad} f(R_x(\xi))\| \le \tau_2\|\operatorname{grad}\widehat{f}_x(\xi)\|$$

for all $x \in \mathcal{N}(\bar{x})$ and all $\xi \in T_x\mathcal{M}$ with $\|\xi\| \le \delta$, where $\widehat{f}$ is defined as in (2.1).

3. Riemannian Newton algorithm. In this section, we propose a Riemannian Newton algorithm for solving the total energy minimization problem (1.1). We first construct a Grassmann manifold from the Stiefel manifold St(k, n). Then, based on the induced Grassmann manifold, we give a matrix-form Riemannian Newton algorithm for solving problem (1.1).

3.1. The Grassmann manifold. We observe the fact that the function $E :\mathrm{St}(k, n) \to \mathbb{R}$ defined in problem (1.1) is such that for any given $X \in \mathrm{St}(k, n)$, $E(X) = E(XQ)$ for all $Q \in \mathcal{O}_k$, where $\mathcal{O}_k$ is the set of all $k\times k$ orthogonal matrices. Thus, the global minimizer of problem (1.1) is not unique and is not isolated. The Riemannian Hessian of $E$ must be singular, which causes trouble for applying a Riemannian Newton algorithm to problem (1.1). To overcome this difficulty, we construct a Grassmann manifold $\mathcal{Q}$ from the Stiefel manifold $\mathrm{St}(k, n)$ under the operation of the orthogonal group $\mathcal{O}_k$. We define a quotient manifold by

(3.1) $$\mathcal{Q} := \mathrm{St}(k, n)/\mathcal{O}_k,$$


based on the following equivalence relation on $\mathrm{St}(k, n)$:

$$X \sim Y \iff \{XQ \mid Q \in \mathcal{O}_k\} = \{YQ \mid Q \in \mathcal{O}_k\}.$$

Then we have $\mathcal{Q} := \{[X] : X \in \mathrm{St}(k, n)\}$, where

$$[X] := \{Y \in \mathrm{St}(k, n) \mid Y = XQ,\ Q \in \mathcal{O}_k\}$$

is the equivalence class containing $X$. The natural projection is defined as the mapping from $\mathrm{St}(k, n)$ to $\mathcal{Q}$ such that

$$\pi(X) = [X] \quad \forall X \in \mathrm{St}(k, n).$$

Moreover, we have $[X] = \pi^{-1}(\pi(X))$ and $\dim\pi^{-1}(\pi(X)) = \dim\mathcal{O}_k = \tfrac{1}{2}k(k-1)$. Since $\mathrm{St}(k, n)$ is the total space of $\mathcal{Q}$, we have [2, Proposition 3.4.4]

$$\dim\mathcal{Q} = \dim\mathrm{St}(k, n) - \dim\pi^{-1}(\pi(X)) = nk - \tfrac{1}{2}k(k+1) - \tfrac{1}{2}k(k-1) = k(n-k).$$

The function $E:\mathrm{St}(k, n) \to \mathbb{R}$ induces a unique function $E:\mathcal{Q} \to \mathbb{R}$, denoted by the same symbol, such that

$$E([X]) = E\big(\pi^{-1}([X])\big) \quad \text{and} \quad E(X) = E\big(\pi(X)\big).$$

Hence, problem (1.1) can be written as the following minimization problem:

(3.2) $$\min E(\mathcal{X}) \quad \text{s.t.} \quad \mathcal{X} \in \mathcal{Q}.$$

For any $\mathcal{X} \in \mathcal{Q}$, let $\xi_{\mathcal{X}}$ be an element of $T_{\mathcal{X}}\mathcal{Q}$, and let $X$ be an element in the equivalence class $\pi^{-1}(\mathcal{X})$, which is an embedded submanifold of $\mathrm{St}(k, n)$. Any element $\xi_X \in T_X\mathrm{St}(k, n)$ that satisfies $\mathrm{D}\pi(X)[\xi_X] = \xi_{\mathcal{X}}$ can be considered a representation of $\xi_{\mathcal{X}}$. For any smooth function $f :\mathcal{Q} \to \mathbb{R}$, the function $\bar{f} := f\circ\pi:\mathrm{St}(k, n) \to \mathbb{R}$ is smooth [2, Proposition 3.4.5]. Moreover,

$$\mathrm{D}\bar{f}(X)[\xi_X] = \mathrm{D}f\big(\pi(X)\big)\big[\mathrm{D}\pi(X)[\xi_X]\big] = \mathrm{D}f(\mathcal{X})[\xi_{\mathcal{X}}].$$

Since there are infinitely many valid representations $\xi_X$ of $\xi_{\mathcal{X}}$ at $X$, we need to define the vertical space and horizontal space at the point $X$ [2, p. 48]. Note that the tangent space to $\mathrm{St}(k, n)$ at $X \in \mathrm{St}(k, n)$ is given by [2, p. 42]

$$T_X\mathrm{St}(k, n) = \{Z \in \mathbb{R}^{n\times k} : X^TZ + Z^TX = 0\} = \{X\Omega + X_\perp K : \Omega^T = -\Omega,\ K \in \mathbb{R}^{(n-k)\times k}\},$$

where $X_\perp \in \mathbb{R}^{n\times(n-k)}$ is such that $\mathrm{span}(X_\perp)$ is the orthogonal complement of $\mathrm{span}(X)$.

Also, a Riemannian metric $g$ on $\mathrm{St}(k, n)$ is defined by

$$g_X(Z_1, Z_2) := \operatorname{tr}(Z_1^TZ_2) \quad \forall Z_1, Z_2 \in T_X\mathrm{St}(k, n),\ X \in \mathrm{St}(k, n),$$

with its induced Frobenius norm $\|\cdot\|_X$. Thus, the vertical space at $X$ is defined as

$$\mathcal{V}_X := T_X\big(\pi^{-1}(\mathcal{X})\big) = \{X\Omega : \Omega^T = -\Omega,\ \Omega \in \mathbb{R}^{k\times k}\}.$$


We can set the horizontal space $\mathcal{H}_X$ at $X$ to be

$$\mathcal{H}_X := \mathcal{V}_X^{\perp} = \{\xi_X \in T_X\mathrm{St}(k, n) : g_X(\xi_X, \nu_X) = 0\ \forall \nu_X \in \mathcal{V}_X\} = \{\xi_X \in T_X\mathrm{St}(k, n) : X^T\xi_X = 0\} = \{X_\perp K : K \in \mathbb{R}^{(n-k)\times k}\}.$$

Then for any $\xi_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}$, there exists a unique element $\bar{\xi}_X \in \mathcal{H}_X$ such that $\mathrm{D}\pi(X)[\bar{\xi}_X] = \xi_{\mathcal{X}}$, and $\bar{\xi}_X$ is called the horizontal lift of $\xi_{\mathcal{X}}$ at $X$. Moreover, the orthogonal projection of any element $\eta_X \in T_X\mathrm{St}(k, n)$ onto $\mathcal{H}_X$ at $X$ is given by

$$P^h_X\eta_X = (I_n - XX^T)\eta_X.$$
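In matrix terms the horizontal projection is a rank-$k$ update; the following one-line sketch (our own, written to avoid forming the $n\times n$ matrix $I_n - XX^T$ explicitly) is reused by the later snippets.

```python
def proj_horizontal(X, eta):
    # P^h_X(eta) = (I_n - X X^T) eta, computed as eta - X (X^T eta).
    return eta - X @ (X.T @ eta)
```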

Now, we define a Riemannian metric $g$ on the quotient manifold $\mathcal{Q}$ by

(3.3) $$g_{\mathcal{X}}(\xi_{\mathcal{X}}, \zeta_{\mathcal{X}}) := g_X(\bar{\xi}_X, \bar{\zeta}_X), \quad \xi_{\mathcal{X}}, \zeta_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q},\ \mathcal{X} \in \mathcal{Q},$$

where $\bar{\xi}_X, \bar{\zeta}_X \in \mathcal{H}_X$ are the unique horizontal lifts of $\xi_{\mathcal{X}}, \zeta_{\mathcal{X}}$ at $X$, respectively. Since $X \in \pi^{-1}(\mathcal{X})$ implies that $XQ$ is in $\pi^{-1}(\mathcal{X})$ for any $Q \in \mathcal{O}_k$, we need to show that

(3.4) $$g_{XQ}(\bar{\xi}_{XQ}, \bar{\zeta}_{XQ}) = g_X(\bar{\xi}_X, \bar{\zeta}_X) \quad \forall Q \in \mathcal{O}_k.$$

To verify (3.4), we first establish the following result. The proof is similar to that of Proposition 3.6.1 in [1], and we therefore omit it.

Proposition 3.1. Let $X \in \mathrm{St}(k, n)$, $\mathcal{X} = \pi(X)$, and $\xi_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}$. Then it holds that

$$\bar{\xi}_{XQ} = \bar{\xi}_X\cdot Q$$

for all $Q \in \mathcal{O}_k$, where the center dot denotes matrix multiplication, and

$$g_{XQ}(\bar{\xi}_{XQ}, \bar{\zeta}_{XQ}) = g_X(\bar{\xi}_X, \bar{\zeta}_X)$$

for all $\xi_{\mathcal{X}}, \zeta_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}$.

Thus the quotient manifold $\mathcal{Q}$ endowed with the Riemannian metric $g$ defined in (3.3) is a Grassmann manifold.

Next, we define a second-order retraction $R$ on $(\mathcal{Q}, g)$ as follows:

(3.5) $$R_{\mathcal{X}}(\xi_{\mathcal{X}}) := \pi\big(R_X(\bar{\xi}_X)\big) \quad \forall \xi_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q},$$

where $\mathcal{X} = \pi(X) \in \mathcal{Q}$, $\bar{\xi}_X \in \mathcal{H}_X$ is the horizontal lift of $\xi_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}$ at $X$, and $R$ is a second-order retraction on $\mathrm{St}(k, n)$, which is defined by [1, 3]

(3.6) $$R_X(Z) = \sum_{i=1}^{k} \bar{u}_i\bar{v}_i^T \quad \forall Z \in T_X\mathrm{St}(k, n),$$

where $\{\bar{u}_i\}_{i=1}^k$ and $\{\bar{v}_i\}_{i=1}^k$ are the left and right singular vectors corresponding to the largest $k$ singular values of $X + Z$, which admits the singular value decomposition [20]

$$X + Z = U\Sigma V^T, \quad \Sigma = \mathrm{Diag}\big(\bar{\sigma}_1(X+Z), \ldots, \bar{\sigma}_k(X+Z)\big) \in \mathbb{R}^{n\times k}.$$

Here, $\bar{\sigma}_1(X+Z) \ge \bar{\sigma}_2(X+Z) \ge \cdots \ge \bar{\sigma}_k(X+Z) > 0$, and $U = [\bar{u}_1, \ldots, \bar{u}_n] \in \mathbb{R}^{n\times n}$ and $V = [\bar{v}_1, \ldots, \bar{v}_k] \in \mathbb{R}^{k\times k}$ are orthogonal matrices. The retraction $R$ on $\mathrm{St}(k, n)$


may reduce the computational complexity and accelerate convergence [28, 29]. In our numerical experiments, we use the retraction $R$. Obviously, for the retraction $R$ defined in (3.6), we have $\pi\big(R_{X_a}(\bar{\xi}_{X_a})\big) = \pi\big(R_{X_b}(\bar{\xi}_{X_b})\big)$ for all $X_a, X_b \in \pi^{-1}(\mathcal{X})$. Thus $R$ defined by (3.5) is a retraction on $\mathcal{Q}$ [2, Proposition 4.1.3].
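The retraction (3.6) can be evaluated from a thin singular value decomposition of $X + Z$; the sketch below (NumPy, illustrative only) returns $\sum_{i=1}^k \bar{u}_i\bar{v}_i^T$.

```python
def retraction(X, Z):
    # R_X(Z) = sum_{i=1}^k u_i v_i^T, with u_i, v_i the dominant singular vectors of X + Z.
    U, _, Vt = np.linalg.svd(X + Z, full_matrices=False)   # thin SVD: U is n-by-k
    return U @ Vt
```

When $X + Z$ has full column rank, this product of the singular vector factors is the closest matrix with orthonormal columns to $X + Z$ in the Frobenius norm, which is one way to view this retraction.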

For the function $E :\mathrm{St}(k, n) \to \mathbb{R}$ defined in problem (1.1), we have $E = E\circ\pi$, where $E :\mathcal{Q} \to \mathbb{R}$ is defined in problem (3.2). By the smoothness of $E$ on $\mathrm{St}(k, n)$, we know that $E$ is smooth on $\mathcal{Q}$.

3.2. Riemannian gradient and Riemannian Hessian of E. We give explicit formulas for the Riemannian gradient and the Riemannian Hessian of the cost function $E$ defined in problem (3.2). To do so, we define the extended function $\overline{E}:\mathbb{R}^{n\times k} \to \mathbb{R}$ by

$$\overline{E}(X) = \frac{1}{2}\operatorname{tr}(X^TLX) + \frac{\alpha}{4}\rho(X)^TL^{-1}\rho(X) \quad \forall X \in \mathbb{R}^{n\times k}.$$

Then $E$ is the restriction of $\overline{E}$ onto $\mathrm{St}(k, n)$, i.e., $E = \overline{E}|_{\mathrm{St}(k,n)}$. By simple calculation, the gradient of $\overline{E}$ at $X \in \mathbb{R}^{n\times k}$ is given by [2, p. 48]

$$\operatorname{grad}\overline{E}(X) = H(X)X.$$

Since $\mathrm{St}(k, n)$ is a Riemannian submanifold of $\mathbb{R}^{n\times k}$, the Riemannian gradient of $E$ at $X \in \mathrm{St}(k, n)$ is given by

$$\operatorname{grad} E(X) = \mathrm{P}_X\big(\operatorname{grad}\overline{E}(X)\big) = P^h_X\big(H(X)X\big),$$

where $\mathrm{P}_X$ denotes the orthogonal projection onto $T_X\mathrm{St}(k, n)$, which is given by

$$\mathrm{P}_XZ = (I_n - XX^T)Z + X\,\mathrm{skew}(X^TZ) = Z - X\,\mathrm{sym}(X^TZ) \quad \forall Z \in \mathbb{R}^{n\times k}.$$

Here, $\mathrm{skew}(A) := (A - A^T)/2$ and $\mathrm{sym}(A) := (A + A^T)/2$. Therefore, for any $\mathcal{X} \in \mathcal{Q}$ and $X \in \pi^{-1}(\mathcal{X})$, the unique horizontal lift at $X \in \mathrm{St}(k, n)$ of the Riemannian gradient $\operatorname{grad} E(\mathcal{X})$ of $E$ at $\mathcal{X}$ is given by

(3.7) $$\overline{\operatorname{grad} E(\mathcal{X})}_X = \operatorname{grad} E(X) = P^h_X\big(H(X)X\big) = (I_n - XX^T)H(X)X.$$
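A direct transcription of (3.7), reusing the helpers `H` and `proj_horizontal` defined in the earlier sketches (illustrative assumptions of ours, not the authors' code):

```python
def riemannian_gradient(X, L, alpha):
    # Horizontal lift of grad E at X: (I_n - X X^T) H(X) X, cf. (3.7).
    return proj_horizontal(X, H(X, L, alpha) @ X)
```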

Let $\nabla$ and $\widetilde{\nabla}$ be the Riemannian connections on $\mathcal{Q}$ and $\mathrm{St}(k, n)$, respectively. The Riemannian Hessian of $E$ at $\mathcal{X} \in \mathcal{Q}$ is given by

$$\operatorname{Hess} E(\mathcal{X})[Z_{\mathcal{X}}] = \nabla_{Z_{\mathcal{X}}}\operatorname{grad} E(\mathcal{X}) \quad \forall Z_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}.$$

Since $P^h_X\mathrm{P}_XZ = P^h_XZ$ for all $Z \in T_X\mathrm{St}(k, n)$, we have [2, equation (5.15) and Proposition 5.3.3]

$$\overline{\operatorname{Hess} E(\mathcal{X})[Z_{\mathcal{X}}]}_X = \overline{\nabla_{Z_{\mathcal{X}}}\operatorname{grad} E(\mathcal{X})}_X = P^h_X\big(\widetilde{\nabla}_{\bar{Z}_X}\operatorname{grad} E(X)\big) = P^h_X\Big(\mathrm{P}_X\big(\mathrm{D}\operatorname{grad} E(X)[\bar{Z}_X]\big)\Big) = P^h_X\big(\mathrm{D}\operatorname{grad} E(X)[\bar{Z}_X]\big),$$

where $\mathrm{D}\operatorname{grad} f(x)[\xi_x]$ means the classical directional derivative. By (3.7), we get

$$\mathrm{D}\operatorname{grad} E(X)[\bar{Z}_X] = -\big(X\bar{Z}_X^T + \bar{Z}_XX^T\big)H(X)X + 2\alpha\big(I_n - XX^T\big)\mathrm{Diag}\big(L^{-1}\mathrm{diag}(X\bar{Z}_X^T)\big)X + \big(I_n - XX^T\big)H(X)\bar{Z}_X.$$


Thus

(3.8) $$\overline{\operatorname{Hess} E(\mathcal{X})[Z_{\mathcal{X}}]}_X = P^h_X\Big(-\bar{Z}_XX^TH(X)X + 2\alpha\,\mathrm{Diag}\big(L^{-1}\mathrm{diag}(X\bar{Z}_X^T)\big)X + H(X)\bar{Z}_X\Big),$$

where the fact that $P^h_XX = 0$ is used.
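The Hessian formula (3.8) translates into the following action on a horizontal direction $Z$ (a sketch reusing the earlier helpers; the dense solve with $L$ is again an illustrative shortcut):

```python
def hessian_action(X, Z, L, alpha):
    # Horizontal lift of Hess E(X)[Z] for horizontal Z (X^T Z = 0), cf. (3.8).
    HX = H(X, L, alpha)
    W = (-Z @ (X.T @ (HX @ X))                                          # -Z X^T H(X) X
         + 2.0 * alpha * (np.linalg.solve(L, np.sum(X * Z, axis=1))[:, None] * X)
         + HX @ Z)                                                      # + H(X) Z
    return proj_horizontal(X, W)
```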

We remark that the Newton equation on the Grassmann manifold $\mathcal{Q}$ at the point $\mathcal{X} \in \mathcal{Q}$ is given by [2, p. 113]

$$\operatorname{Hess} E(\mathcal{X})[Z_{\mathcal{X}}] = -\operatorname{grad} E(\mathcal{X}), \quad Z_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}.$$

Taking the horizontal lift yields

$$\overline{\operatorname{Hess} E(\mathcal{X})[Z_{\mathcal{X}}]}_X = -\overline{\operatorname{grad} E(\mathcal{X})}_X,$$

or

$$P^h_X\Big(-\bar{Z}_XX^TH(X)X + 2\alpha\,\mathrm{Diag}\big(L^{-1}\mathrm{diag}(X\bar{Z}_X^T)\big)X + H(X)\bar{Z}_X\Big) = -P^h_X\big(H(X)X\big)$$

for $\bar{Z}_X \in \mathcal{H}_X$.

3.3. Riemannian Newton algorithm. Without causing any confusion, we use $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ to denote the Riemannian metrics and their induced norms on $\mathrm{St}(k, n)$ and $\mathcal{Q}$, respectively. Based on the discussion in section 3.2, we describe a matrix-form Riemannian Newton algorithm for solving the minimization problem (3.2).

Algorithm 3.2 (a matrix-form Riemannian Newton algorithm).

Step 0. Given $X_0 \in \mathrm{St}(k, n)$, $\beta, \eta \in (0,1)$, $\sigma \in (0, 1/2]$, and $j := 0$.

Step 1. Apply the CG method [20, Algorithm 10.2.1] to solve

(3.9) $$P^h_{X_j}\big(\mathrm{D}\operatorname{grad} E(X_j)[\Delta X_j]\big) + \operatorname{grad} E(X_j) = 0$$

for $\Delta X_j \in \mathcal{H}_{X_j}$ such that

(3.10) $$\big\|P^h_{X_j}\big(\mathrm{D}\operatorname{grad} E(X_j)[\Delta X_j]\big) + \operatorname{grad} E(X_j)\big\| \le \eta_j\|\operatorname{grad} E(X_j)\|$$

and

(3.11) $$\langle\operatorname{grad} E(X_j), \Delta X_j\rangle \le -\eta_j\langle\Delta X_j, \Delta X_j\rangle,$$

where $\eta_j := \min\{\eta, \|\operatorname{grad} E(X_j)\|\}$. If (3.10) and (3.11) are not attainable, then let $\Delta X_j := -\operatorname{grad} E(X_j)$.

Step 2. Let $l_j$ be the smallest nonnegative integer $l$ such that

(3.12) $$E\big(R_{X_j}(\beta^l\Delta X_j)\big) - E(X_j) \le \sigma\beta^l\langle\operatorname{grad} E(X_j), \Delta X_j\rangle.$$

Set $X_{j+1} := R_{X_j}(\beta^{l_j}\Delta X_j)Q_j$ for some $Q_j \in \mathcal{O}_k$.

Step 3. Replace $j$ by $j+1$ and go to Step 1.
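The following sketch assembles the earlier pieces into a rough implementation of Algorithm 3.2, reusing `total_energy`, `riemannian_gradient`, `hessian_action`, `proj_horizontal`, and `retraction` from the previous snippets. The parameter values, stopping tolerance, CG iteration cap, and the choice $Q_j = I_k$ are our own illustrative assumptions, not prescriptions from the paper.

```python
def riemannian_newton(L, k, alpha, beta=0.5, eta=0.1, sigma=1e-4,
                      maxit=50, tol=1e-8, cg_maxit=100, seed=0):
    # Sketch of Algorithm 3.2: inexact Newton steps via CG plus Armijo backtracking.
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))          # X_0 on St(k, n)
    for _ in range(maxit):
        G = riemannian_gradient(X, L, alpha)
        gnorm = np.linalg.norm(G)
        if gnorm < tol:
            break
        eta_j = min(eta, gnorm)
        # Step 1: CG on the horizontal space for Hess E(X)[D] = -grad E(X).
        D = np.zeros_like(X)
        R = -G.copy()
        P = R.copy()
        rs = np.sum(R * R)
        for _ in range(cg_maxit):
            HP = hessian_action(X, P, L, alpha)
            curv = np.sum(P * HP)
            if curv <= 0.0:                                    # stop on nonpositive curvature
                break
            a = rs / curv
            D += a * P
            R -= a * HP
            rs_new = np.sum(R * R)
            if np.sqrt(rs_new) <= eta_j * gnorm:               # inexactness test (3.10)
                break
            P = R + (rs_new / rs) * P
            rs = rs_new
        # Fall back to steepest descent if (3.10) or (3.11) fails.
        res = np.linalg.norm(hessian_action(X, D, L, alpha) + G)
        if res > eta_j * gnorm or np.sum(G * D) > -eta_j * np.sum(D * D):
            D = -G
        # Step 2: Armijo backtracking along the retraction, cf. (3.12), with Q_j = I_k.
        E0, gd, t = total_energy(X, L, alpha), np.sum(G * D), 1.0
        while total_energy(retraction(X, t * D), L, alpha) - E0 > sigma * t * gd:
            t *= beta
            if t < 1e-12:
                break
        X = retraction(X, t * D)
    return X
```

Note that CG only requires applications of the Hessian via `hessian_action`, never its inverse, which reflects the point made in the introduction about reducing the computational complexity.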

We remark that Algorithm 3.2 is a numerically realizable Riemannian Newton algorithm for solving the minimization problem (3.2). Suppose that $\{X_j\}$ and $\{Y_j\}$ are two sequences generated by Algorithm 3.2. If $[X_0] = [Y_0]$, then $[X_j] = [Y_j]$ for all $j$. Thus, Algorithm 3.2 returns a sequence $\{[X_j]\} \subset \mathcal{Q}$ by taking $\mathcal{X}_0 = [X_0] \in \mathcal{Q}$, where $X_0 \in \mathrm{St}(k, n)$. We also point out that our method has some advantages over classical equality-constrained optimization methods: (1) A nice feature is that the generated iterates are all feasible. (2) As shown in section 4, our method converges globally and quadratically as an unconstrained optimization on a constrained set. (3) No additional Lagrange multipliers or penalty functions are required. Finally, the numerical tests in section 6 show the efficiency of our method over the classical interior-point method [12].

4. Convergence analysis. In this section, we establish the global and quadratic convergence of Algorithm 3.2. As in (2.1), we have the following equality on the Riemannian gradient of $E$ and its pullback function $\widehat{E}$ through the retraction $R$ defined in (3.5) [2, p. 56]:

(4.1) $$\operatorname{grad} E(\mathcal{X}) = \operatorname{grad}\widehat{E}_{\mathcal{X}}(0_{\mathcal{X}}) \quad \forall \mathcal{X} \in \mathcal{Q},\ 0_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}.$$

For the second-order retraction $R$ on $\mathcal{Q}$ defined in (3.5), we have [2, Proposition 5.5.5]

(4.2) $$\operatorname{Hess} E(\mathcal{X}) = \operatorname{Hess}\widehat{E}_{\mathcal{X}}(0_{\mathcal{X}}) \quad \forall \mathcal{X} \in \mathcal{Q},\ 0_{\mathcal{X}} \in T_{\mathcal{X}}\mathcal{Q}.$$

4.1. Global convergence. On the global convergence of Algorithm 3.2, we have the following result. The proof follows that of Theorem 11(a) in [15].

Theorem 4.1. Any accumulation point $X^*$ of the sequence $\{X_j\}$ generated by Algorithm 3.2 produces a stationary point $\mathcal{X}^* := [X^*]$ of the cost function $E$ defined in problem (3.2).

Proof. Suppose that $\{X_j\} \to X^*$, renumbering if necessary. If there exists a subsequence $\{\Delta X_j\}_{\mathcal{J}}$ such that $\Delta X_j = -\operatorname{grad} E(X_j)$ for all $j \in \mathcal{J}$, then $X^*$ is a stationary point of $E$. We note that $\mathrm{D}\pi(X^*)[\operatorname{grad} E(X^*)] = \operatorname{grad} E([X^*])$. Hence, $[X^*]$ is a stationary point of $E$. Therefore, without loss of generality, to prove the theorem we only need to consider the case in which the direction is always given by (3.9). To verify that $\operatorname{grad} E([X^*]) = 0$, we only need to show that $\operatorname{grad} E(X^*) = 0$. By contradiction, we assume that $\operatorname{grad} E(X^*) \ne 0$. Let $\mathcal{X}_j := [X_j]$ for all $j$. By (3.3) and (3.10), we have

(4.3) $$\|\operatorname{grad} E(X_j)\| \le \|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\| + \|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j] + \operatorname{grad} E(X_j)\| \le \|\operatorname{Hess} E(\mathcal{X}_j)\|\cdot\|\Delta X_j\| + \eta_j\|\operatorname{grad} E(X_j)\| \le \|\operatorname{Hess} E(\mathcal{X}_j)\|\cdot\|\Delta X_j\| + \eta\|\operatorname{grad} E(X_j)\|,$$

where $0 < \eta_j \le \eta < 1$ and $\|\operatorname{Hess} E(\mathcal{X}_j)\|$ denotes the operator norm defined by

$$\|\operatorname{Hess} E(\mathcal{X}_j)\| := \sup\big\{\|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\| : \Delta X_j \in T_{X_j}\mathrm{St}(k, n),\ \|\Delta X_j\| = 1\big\}.$$

It follows from (4.3) that

(4.4) $$\|\Delta X_j\| \ge \frac{(1-\eta)\|\operatorname{grad} E(X_j)\|}{\|\operatorname{Hess} E(\mathcal{X}_j)\|},$$

where $\|\operatorname{Hess} E(\mathcal{X}_j)\| > 0$ for all $j$. Otherwise, if $\|\operatorname{Hess} E(\mathcal{X}_j)\| = 0$ for some $j$, then by (4.3) we have $\operatorname{grad} E(X_j) = 0$. Thus $X_j$ is a stationary point of $E$ and the algorithm stops.

Now, we note that there exist two constants $c_1, c_2 > 0$ such that $0 < c_1 \le \|\Delta X_j\| \le c_2$ for all $j$. In fact, if there exists some subsequence $\{\Delta X_j\}_{\mathcal{K}} \to 0$, then we have by (4.4) that $\{\|\operatorname{grad} E(X_j)\|\}_{\mathcal{K}} \to 0$, since $\|\operatorname{Hess} E(\mathcal{X}_j)\|$ is bounded for the bounded sequence $\{X_j\}_{\mathcal{K}}$. By continuity, we get $\operatorname{grad} E(X^*) = 0$, a contradiction. On the other hand, $\{\Delta X_j\}$ cannot be unbounded because, taking into account the boundedness of $\{\operatorname{grad} E(X_j)\}$, this would contradict (3.11).

We observe from (3.12) that the sequence $\{E(X_j)\}_{j\ge 0}$ is monotonically nonincreasing and thus is convergent. Hence,

(4.5) $$\lim_{j\to\infty}\big(E(X_j) - E(X_{j+1})\big) = 0.$$

By (3.11), (3.12), and (4.4), we have

$$E(X_j) - E(X_{j+1}) \ge -\sigma\beta^{l_j}\langle\operatorname{grad} E(X_j), \Delta X_j\rangle \ge \frac{\sigma(1-\eta)^2\beta^{l_j}\eta_j\|\operatorname{grad} E(X_j)\|^2}{\|\operatorname{Hess} E(\mathcal{X}_j)\|^2} \ge 0,$$

which, together with (4.5), implies

$$\lim_{j\to\infty}\beta^{l_j}\eta_j\|\operatorname{grad} E(X_j)\|^2 = 0.$$

This implies that $\liminf \beta^{l_j} = 0$. Otherwise, if $\liminf \beta^{l_j} > 0$, then, by the definition of $\eta_j$, we have $\operatorname{grad} E(X^*) = 0$, a contradiction. Therefore, we may assume that $\lim \beta^{l_j} = 0$, taking a subsequence if necessary. Then we get by (3.12),

$$E\Big(R_{X_j}\Big(\frac{\beta^{l_j}\|\Delta X_j\|}{\beta}\,\frac{\Delta X_j}{\|\Delta X_j\|}\Big)\Big) - E(X_j) > \sigma\,\frac{\beta^{l_j}\|\Delta X_j\|}{\beta}\Big\langle\operatorname{grad} E(X_j), \frac{\Delta X_j}{\|\Delta X_j\|}\Big\rangle,$$

and it follows that

(4.6) $$\frac{\widehat{E}_{X_j}\Big(\frac{\beta^{l_j}\|\Delta X_j\|}{\beta}\,\frac{\Delta X_j}{\|\Delta X_j\|}\Big) - \widehat{E}_{X_j}(0_{X_j})}{\frac{\beta^{l_j}\|\Delta X_j\|}{\beta}} > \sigma\Big\langle\operatorname{grad} E(X_j), \frac{\Delta X_j}{\|\Delta X_j\|}\Big\rangle,$$

where $\widehat{E} = E\circ R$ means the pullback of $E$ through the retraction $R$ on $\mathrm{St}(k, n)$. Since $\Delta X_j/\|\Delta X_j\|$ has unit norm, we may assume that $\{\Delta X_j/\|\Delta X_j\|\}$ converges to $\xi^*$ with $\|\xi^*\| = 1$, taking a subsequence if necessary. By continuity of the Riemannian metric $\langle\cdot,\cdot\rangle$ and (4.6), we obtain

$$\langle\operatorname{grad} E(X^*), \xi^*\rangle \ge \sigma\langle\operatorname{grad} E(X^*), \xi^*\rangle,$$

and then

(4.7) $$\langle\operatorname{grad} E(X^*), \xi^*\rangle \ge 0,$$

since $0 < \sigma < 1$. By (3.11) and (4.4), we have

(4.8) $$\Big\langle\operatorname{grad} E(X_j), \frac{\Delta X_j}{\|\Delta X_j\|}\Big\rangle \le -\eta_j\|\Delta X_j\|.$$

Note that the sequence $\{\|\Delta X_j\|\}$ is bounded below and $\operatorname{grad} E(X^*) \ne 0$ by assumption. Hence, we may assume that $\{\Delta X_j\} \to \Delta X^*$, taking a subsequence if necessary. Then, letting $j \to \infty$ in (4.8), we get

$$\langle\operatorname{grad} E(X^*), \xi^*\rangle \le -\min\{\eta, \|\operatorname{grad} E(X^*)\|\}\|\Delta X^*\| < 0,$$

which contradicts (4.7). Therefore, $\operatorname{grad} E(X^*) = 0$. The proof is complete.

4.2. Quadratic convergence. We establish the quadratic convergence of Algorithm 3.2. To do so, we need the following positive definiteness assumption on the Riemannian Hessian of $E$.

Assumption 4.2. The Riemannian Hessian operator $\operatorname{Hess} E([X^*]) : T_{[X^*]}\mathcal{Q} \to T_{[X^*]}\mathcal{Q}$ is positive definite, where $X^*$ is an accumulation point of the sequence $\{X_j\}$ generated by Algorithm 3.2.

Assumption 4.2 guarantees that a stationary point $\mathcal{X}^* := [X^*]$ of $E$ is an isolated local minimum point of $E$. In section 5 we provide a sufficient condition such that Assumption 4.2 is satisfied.

To establish the quadratic convergence of Algorithm 3.2, we need the following result.

Lemma 4.3. Let $X^*$ be an accumulation point of the sequence $\{X_j\}$ generated by Algorithm 3.2, i.e., $X^*$ is a limit point of a subsequence $\{X_j\}_{\mathcal{K}}$. Suppose that Assumption 4.2 is satisfied. Then there exist two constants $d_1, d_2 > 0$ such that for all $j \in \mathcal{K}$ sufficiently large, it holds that

$$d_1\|\operatorname{grad} E([X_j])\| \le \|\Delta X_j\| \le d_2\|\operatorname{grad} E([X_j])\|.$$

Proof. Let $\mathcal{X}^* := [X^*]$ and $\mathcal{X}_j := [X_j]$ for all $j$. As $\{X_j\}_{\mathcal{K}} \to X^*$, we get $\{\mathcal{X}_j\}_{\mathcal{K}} \to \mathcal{X}^*$. By Assumption 4.2, there exist two scalars $\kappa_0, \kappa_1 > 0$ such that for all $j \in \mathcal{K}$ sufficiently large, $\operatorname{Hess} E(\mathcal{X}_j)$ is nonsingular, and

(4.9) $$\|\operatorname{Hess} E(\mathcal{X}_j)\| \le \kappa_0, \qquad \|[\operatorname{Hess} E(\mathcal{X}_j)]^{-1}\| \le \kappa_1.$$

By (4.9), we have for all $j \in \mathcal{K}$ sufficiently large,

$$\|\Delta X_j\| = \big\|[\operatorname{Hess} E(\mathcal{X}_j)]^{-1}\big(\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j] + \operatorname{grad} E(X_j) - \operatorname{grad} E(X_j)\big)\big\| \le \big\|[\operatorname{Hess} E(\mathcal{X}_j)]^{-1}\big\|\big(\|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j] + \operatorname{grad} E(X_j)\| + \|\operatorname{grad} E(X_j)\|\big) \le \kappa_1(1 + \eta_j)\|\operatorname{grad} E(X_j)\| \le \kappa_1(1 + \eta)\|\operatorname{grad} E(X_j)\| \equiv d_2\|\operatorname{grad} E(X_j)\|$$

and

$$\|\operatorname{grad} E(X_j)\| = \|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j] + \operatorname{grad} E(X_j) - \operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\| \le \|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j] + \operatorname{grad} E(X_j)\| + \|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\| \le \eta_j\|\operatorname{grad} E(X_j)\| + \|\operatorname{Hess} E(\mathcal{X}_j)\|\cdot\|\Delta X_j\| \le \eta\|\operatorname{grad} E(X_j)\| + \kappa_0\|\Delta X_j\|.$$

Thus for all $j \in \mathcal{K}$ sufficiently large,

$$\|\Delta X_j\| \ge \frac{1-\eta}{\kappa_0}\|\operatorname{grad} E(X_j)\| \equiv d_1\|\operatorname{grad} E(X_j)\|.$$

The proof is complete.

On the local convergence of Algorithm 3.2 related to the nondegenerate local minima, we have the following result. The proof follows that of Theorem 11(b) in [15].

Lemma 4.4. Let $X^*$ be an accumulation point of the sequence $\{X_j\}$ generated by Algorithm 3.2. Suppose that Assumption 4.2 holds. Then $\{[X_j]\}$ converges to $[X^*]$ on $\mathcal{Q}$ defined by (3.1).

Proof. By Theorem 4.1, we have $\operatorname{grad} E([X^*]) = 0$. Also, the Riemannian Hessian operator $\operatorname{Hess} E([X^*])$ is positive definite by assumption. Then $\mathcal{X}^* := [X^*]$ is an isolated local minimum point of $E$. Let $\mathcal{S}$ be the set of limit points of the sequence $\{\mathcal{X}_j := [X_j]\}$, which is nonempty since $\mathcal{X}^* \in \mathcal{S}$. Suppose that $\mathcal{X}^*$ is not the only limit point of the sequence $\{\mathcal{X}_j\}$. Then

$$\varsigma := \begin{cases} \displaystyle\inf_{\mathcal{X}\in\mathcal{S}\setminus\mathcal{X}^*} \mathrm{dist}(\mathcal{X}, \mathcal{X}^*) & \text{if } \mathcal{S}\setminus\mathcal{X}^* \ne \emptyset, \\ 1 & \text{otherwise.} \end{cases}$$

Since $\mathcal{X}^*$ is an isolated local minimizer of $E$, it follows that $\varsigma > 0$. Define

$$\mathcal{S}_1 := \{\mathcal{Y} \in \mathcal{Q} \mid \mathrm{dist}(\mathcal{Y}\,|\,\mathcal{S}) \le \varsigma/4\}, \qquad \mathcal{S}_2 := \{\mathcal{Y} \in \mathcal{Q} \mid \mathrm{dist}(\mathcal{Y}, \mathcal{X}^*) \ge \varsigma\},$$

where $\mathrm{dist}(\mathcal{Y}\,|\,\mathcal{S}) := \inf_{\mathcal{X}\in\mathcal{S}}\mathrm{dist}(\mathcal{Y}, \mathcal{X})$. Then for all $j$ sufficiently large, $\mathcal{X}_j$ belongs to at least one of the sets $\mathcal{S}_1$ and $\mathcal{S}_2$. Next, let $\{\mathcal{X}_j\}_{\mathcal{K}}$ be a subsequence of $\{\mathcal{X}_j\}$ such that $\mathrm{dist}(\mathcal{X}_j, \mathcal{X}^*) \le \varsigma/4$ for all $j \in \mathcal{K}$ sufficiently large. Thus, every limit point of $\{\mathcal{X}_j\}_{j\in\mathcal{K}}$ lies in the compact set $B_{\varsigma/4}(\mathcal{X}^*)$, which is also an accumulation point of the sequence $\{\mathcal{X}_j\}$. Hence, $\{\mathcal{X}_j\}_{j\in\mathcal{K}}$ converges to $\mathcal{X}^*$, which is the unique accumulation point of $\{\mathcal{X}_j\}$ in $B_{\varsigma/4}(\mathcal{X}^*)$. By Theorem 4.1 again, $\{\|\operatorname{grad} E(X_j)\|\}_{\mathcal{K}} \to 0$. This, together with Lemma 4.3, yields that $\{\Delta X_j\}_{\mathcal{K}} \to 0$. Since $\mathcal{Q}$ is a compact manifold, for the retraction $R$ on $\mathcal{Q}$, there exist two scalars $\mu > 0$ and $\delta_\mu > 0$ such that [2, p. 149]

(4.10) $$\|\Delta\mathcal{X}\| \ge \mu\,\mathrm{dist}\big(\mathcal{X}, R_{\mathcal{X}}(\Delta\mathcal{X})\big) \quad \forall \mathcal{X} \in \mathcal{Q},\ \Delta\mathcal{X} \in T_{\mathcal{X}}\mathcal{Q},\ \|\Delta\mathcal{X}\| \le \delta_\mu.$$

Notice that $\|\Delta X_j\| \le \min\{\varsigma/4, \delta_\mu, \mu\varsigma/4\}$ for all $j \in \mathcal{K}$ sufficiently large. Let $\hat{j} \in \mathcal{K}$ be sufficiently large. Then, by using $\mathcal{X}_{\hat{j}+1} := [X_{\hat{j}+1}] = [R_{X_{\hat{j}}}(\beta^{l_{\hat{j}}}\Delta X_{\hat{j}})Q_{\hat{j}}] = R_{\mathcal{X}_{\hat{j}}}(\beta^{l_{\hat{j}}}\Delta\mathcal{X}_{\hat{j}})$ and (4.10), we obtain

$$\mathrm{dist}(\mathcal{X}_{\hat{j}+1}\,|\,\mathcal{S}\setminus\mathcal{X}^*) \ge \inf_{\mathcal{Y}\in\mathcal{S}\setminus\mathcal{X}^*}\{\mathrm{dist}(\mathcal{Y}, \mathcal{X}^*)\} - \mathrm{dist}(\mathcal{X}_{\hat{j}+1}, \mathcal{X}_{\hat{j}}) - \mathrm{dist}(\mathcal{X}_{\hat{j}}, \mathcal{X}^*) = \inf_{\mathcal{Y}\in\mathcal{S}\setminus\mathcal{X}^*}\{\mathrm{dist}(\mathcal{Y}, \mathcal{X}^*)\} - \mathrm{dist}\big(R_{\mathcal{X}_{\hat{j}}}(\beta^{l_{\hat{j}}}\Delta\mathcal{X}_{\hat{j}}), \mathcal{X}_{\hat{j}}\big) - \mathrm{dist}(\mathcal{X}_{\hat{j}}, \mathcal{X}^*) \ge \inf_{\mathcal{Y}\in\mathcal{S}\setminus\mathcal{X}^*}\{\mathrm{dist}(\mathcal{Y}, \mathcal{X}^*)\} - \frac{1}{\mu}\|\Delta X_{\hat{j}}\| - \mathrm{dist}(\mathcal{X}_{\hat{j}}, \mathcal{X}^*) \ge \varsigma - \varsigma/4 - \varsigma/4 = \varsigma/2,$$

which shows $\mathcal{X}_{\hat{j}+1} \notin \mathcal{S}_1\setminus B_{\varsigma/4}(\mathcal{X}^*)$.


By using $\|\Delta X_{\hat{j}}\| \le \min\{\varsigma/4, \delta_\mu, \mu\varsigma/4\}$, $\mathcal{X}_{\hat{j}+1} := R_{\mathcal{X}_{\hat{j}}}(\beta^{l_{\hat{j}}}\Delta\mathcal{X}_{\hat{j}})$, and (4.10) again, we get

$$\mathrm{dist}(\mathcal{X}_{\hat{j}+1}, \mathcal{X}^*) \le \mathrm{dist}(\mathcal{X}_{\hat{j}+1}, \mathcal{X}_{\hat{j}}) + \mathrm{dist}(\mathcal{X}_{\hat{j}}, \mathcal{X}^*) \le \mathrm{dist}\big(R_{\mathcal{X}_{\hat{j}}}(\beta^{l_{\hat{j}}}\Delta\mathcal{X}_{\hat{j}}), \mathcal{X}_{\hat{j}}\big) + \mathrm{dist}(\mathcal{X}_{\hat{j}}, \mathcal{X}^*) \le \frac{1}{\mu}\|\Delta X_{\hat{j}}\| + \mathrm{dist}(\mathcal{X}_{\hat{j}}, \mathcal{X}^*) \le \varsigma/4 + \varsigma/4 = \varsigma/2,$$

which implies $\mathcal{X}_{\hat{j}+1} \notin \mathcal{S}_2$. Hence, $\mathcal{X}_{\hat{j}+1} \in B_{\varsigma/4}(\mathcal{X}^*)$. By definition, we derive that $\hat{j}+1 \in \mathcal{K}$. Therefore, by induction, we conclude that $j \in \mathcal{K}$ for all $j$ sufficiently large, and then the whole sequence $\{\mathcal{X}_j\}$ converges to $\mathcal{X}^*$.

On the stepsize $\beta^{l_j}$ in (3.12), we have the following result, which is similar to Proposition 5 in [36].

Lemma 4.5. Let $X^*$ be an accumulation point of the sequence $\{X_j\}$ generated by Algorithm 3.2. Suppose that Assumption 4.2 holds; then for $j$ sufficiently large, $l_j = 0$ satisfies (3.12).

Proof. Let $\mathcal{X}^* := [X^*]$ and $\mathcal{X}_j := [X_j]$ for all $j$. Let $\Delta X_j^N$ be the exact solution of the Newton equation (3.9). Then we have

$$\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j - \Delta X_j^N] = \operatorname{grad} E(X_j) + \operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j],$$

and thus

(4.11) $$\|\operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j - \Delta X_j^N]\| = \|\operatorname{grad} E(X_j) + \operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\|.$$

According to (4.1) and (4.2), we have

(4.12) $$\operatorname{grad}\widehat{E}_{\mathcal{X}_j}(0_{\mathcal{X}_j}) + \operatorname{Hess}\widehat{E}_{\mathcal{X}_j}(0_{\mathcal{X}_j})[\Delta X_j^N] = \operatorname{grad} E(\mathcal{X}_j) + \operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j^N] = 0_{\mathcal{X}_j}.$$

By Lemma 4.4, we have $\mathcal{X}_j \to \mathcal{X}^*$. Thus, by Lemma 4.3, (3.10), (4.9), and (4.11), we have for all $j$ sufficiently large,

(4.13) $$\|\Delta X_j - \Delta X_j^N\| = \big\|[\operatorname{Hess} E(\mathcal{X}_j)]^{-1}\big(\operatorname{grad} E(X_j) + \operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\big)\big\| \le \big\|[\operatorname{Hess} E(\mathcal{X}_j)]^{-1}\big\|\cdot\big\|\operatorname{grad} E(X_j) + \operatorname{Hess} E(\mathcal{X}_j)[\Delta X_j]\big\| \le \kappa_1\eta_j\|\operatorname{grad} E(X_j)\| \le \kappa_1\|\operatorname{grad} E(X_j)\|^2 \le \frac{\kappa_1}{d_1^2}\|\Delta X_j\|^2.$$

In addition, $\operatorname{Hess}\widehat{E}_{\mathcal{X}}$ is Lipschitz continuous at $0_{\mathcal{X}}$ uniformly in a neighborhood of $\mathcal{X}^*$; i.e., there exist scalars $\kappa_2 > 0$, $\delta_1 > 0$, and $\delta_2 > 0$ such that for all $\mathcal{X} \in B_{\delta_1}(\mathcal{X}^*)$ and all $\xi \in B_{\delta_2}(0_{\mathcal{X}})$, it holds that

(4.14) $$\big\|\operatorname{Hess}\widehat{E}_{\mathcal{X}}(\xi) - \operatorname{Hess}\widehat{E}_{\mathcal{X}}(0_{\mathcal{X}})\big\| \le \kappa_2\|\xi\|.$$

By Taylor's theorem, there exists some constant $\theta \in [0, 1]$ such that

$$\widehat{E}_{\mathcal{X}_j}(\Delta X_j) = \widehat{E}_{\mathcal{X}_j}(0_{\mathcal{X}_j}) + \big\langle\operatorname{grad}\widehat{E}_{\mathcal{X}_j}(0_{\mathcal{X}_j}), \Delta X_j\big\rangle + \frac{1}{2}\big\langle\operatorname{Hess}\widehat{E}_{\mathcal{X}_j}(\theta\Delta X_j)[\Delta X_j], \Delta X_j\big\rangle.$$
