Digital Object Identifier (DOI) 10.1007/s10107-005-0617-0
Jein-Shan Chen · Paul Tseng
An unconstrained smooth minimization reformulation of the second-order cone complementarity problem
In honor of Terry Rockafellar on his 70th birthday Received: July 12, 2004 / Accepted: May 25, 2005 Published online: July 14, 2005 – © Springer-Verlag 2005
Abstract. A popular approach to solving the nonlinear complementarity problem (NCP) is to reformulate it as the global minimization of a certain merit function over $\mathbb{R}^n$. A popular choice of the merit function is the squared norm of the Fischer-Burmeister function, shown to be smooth over $\mathbb{R}^n$ and, for monotone NCP, to have the property that each stationary point is a solution of the NCP. This merit function and its analysis were subsequently extended to the semidefinite complementarity problem (SDCP), although only differentiability, not continuous differentiability, was established. In this paper, we extend this merit function and its analysis, including continuous differentiability, to the second-order cone complementarity problem (SOCCP). Although the SOCCP is reducible to an SDCP, the reduction does not allow for easy translation of the analysis from SDCP to SOCCP. Instead, our analysis exploits properties of the Jordan product and spectral factorization associated with the second-order cone. We also report preliminary numerical experience with solving DIMACS second-order cone programs using a limited-memory BFGS method to minimize the merit function.
Key words. Second-order cone – Complementarity – Merit function – Spectral factorization – Jordan product – Level set – Error bound
1. Introduction
We consider the following conic complementarity problem of finding $x, y \in \mathbb{R}^n$ and $\zeta \in \mathbb{R}^n$ satisfying
$$\langle x, y \rangle = 0, \qquad x \in \mathcal{K}, \qquad y \in \mathcal{K}, \tag{1}$$
$$x = F(\zeta), \qquad y = G(\zeta), \tag{2}$$
where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product, $F : \mathbb{R}^n \to \mathbb{R}^n$ and $G : \mathbb{R}^n \to \mathbb{R}^n$ are smooth (i.e., continuously differentiable) mappings, and $\mathcal{K}$ is a closed convex cone in $\mathbb{R}^n$ that is self-dual in the sense that $\mathcal{K}$ equals its dual cone $\mathcal{K}^* := \{ y \mid \langle x, y \rangle \ge 0\ \forall x \in \mathcal{K} \}$. We will focus on the case where $\mathcal{K}$ is the Cartesian product of second-order cones (SOC), also called Lorentz cones [11]. In other words,
$$\mathcal{K} = \mathcal{K}^{n_1} \times \cdots \times \mathcal{K}^{n_N}, \tag{3}$$
J.-S. Chen: Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan.
e-mail:jschen@math.ntnu.edu.tw
P. Tseng: Department of Mathematics, University of Washington, Seattle, Washington 98195, USA.
e-mail:tseng@math.washington.edu
Mathematics Subject Classification (1991): 26B05, 26B35, 90C33, 65K05
where $N, n_1, \dots, n_N \ge 1$, $n_1 + \cdots + n_N = n$, and
$$\mathcal{K}^{n_i} := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n_i - 1} \mid \|x_2\| \le x_1 \right\},$$
with $\|\cdot\|$ denoting the Euclidean norm and $\mathcal{K}^1$ denoting the set of nonnegative reals $\mathbb{R}_+$. A special case of (3) is $\mathcal{K} = \mathbb{R}^n_+$, the nonnegative orthant in $\mathbb{R}^n$, which corresponds to $N = n$ and $n_1 = \cdots = n_N = 1$. We will refer to (1), (2), (3) as the second-order cone complementarity problem (SOCCP).
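For concreteness, membership in the product cone (3) can be checked block by block; the following sketch (NumPy; the helper names `in_soc` and `in_product_cone` are ours, not from the paper) simply tests $\|x_2\| \le x_1$ on each block.

```python
import numpy as np

def in_soc(x, tol=0.0):
    """Check (x1, x2) in K^n, i.e. ||x2|| <= x1 (K^1 is the nonnegative reals)."""
    x = np.asarray(x, dtype=float)
    if x.size == 1:
        return bool(x[0] >= -tol)
    return bool(np.linalg.norm(x[1:]) <= x[0] + tol)

def in_product_cone(x, block_sizes):
    """Check membership in K^{n_1} x ... x K^{n_N}, with x split into the given blocks."""
    x = np.asarray(x, dtype=float)
    offsets = np.cumsum([0] + list(block_sizes))
    return all(in_soc(x[offsets[i]:offsets[i + 1]]) for i in range(len(block_sizes)))

print(in_soc([2.0, 1.0, 1.0]))                    # ||(1,1)|| = sqrt(2) <= 2 -> True
print(in_product_cone([1.0, 0.5, -1.0], [2, 1]))  # second block (K^1) is negative -> False
```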
An important special case of SOCCP corresponds to $G(\zeta) = \zeta$ for all $\zeta \in \mathbb{R}^n$. Then (1) and (2) reduce to
$$\langle F(\zeta), \zeta \rangle = 0, \qquad F(\zeta) \in \mathcal{K}, \qquad \zeta \in \mathcal{K}. \tag{4}$$
If $\mathcal{K} = \mathbb{R}^n_+$, then (4) reduces to the nonlinear complementarity problem (NCP) and (1)–(2) reduce to the vertical NCP [9]. The NCP plays a fundamental role in optimization theory and has many applications in engineering and economics; see, e.g., [9, 13–15].
Another important special case of SOCCP corresponds to the Karush-Kuhn-Tucker (KKT) optimality conditions for the convex second-order cone program (CSOCP):
$$\text{minimize } g(x) \quad \text{subject to } Ax = b,\ x \in \mathcal{K}, \tag{5}$$
where $A \in \mathbb{R}^{m \times n}$ has full row rank, $b \in \mathbb{R}^m$, and $g : \mathbb{R}^n \to \mathbb{R}$ is a convex, twice continuously differentiable function. When $g$ is linear, (5) reduces to the SOCP, which has numerous applications in engineering design, finance, and robust optimization, and includes as special cases convex quadratically constrained quadratic programs and linear programs (LP); see [1, 33] and references therein. The KKT optimality conditions for (5), which are sufficient but not necessary for optimality, are (1) and
$$Ax = b, \qquad y = \nabla g(x) - A^T \zeta_d \ \text{ for some } \zeta_d \in \mathbb{R}^m.$$
Choose any $d \in \mathbb{R}^n$ satisfying $Ad = b$. (If no such $d$ exists, then (5) has no feasible solution.) Let $B \in \mathbb{R}^{n \times (n-m)}$ be any matrix whose columns span the null space of $A$. Then $x$ satisfies $Ax = b$ if and only if $x = d + B\zeta_p$ for some $\zeta_p \in \mathbb{R}^{n-m}$. Thus, the KKT optimality conditions can be written in the form of (1) and (2) with
$$\zeta := (\zeta_p, \zeta_d), \qquad F(\zeta) := d + B\zeta_p, \qquad G(\zeta) := \nabla g(F(\zeta)) - A^T \zeta_d. \tag{6}$$
Alternatively, since any $\zeta \in \mathbb{R}^n$ can be decomposed into the sum of its orthogonal projections onto the column space of $A^T$ and the null space of $A$,
$$F(\zeta) := d + \left( I - A^T(AA^T)^{-1}A \right)\zeta, \qquad G(\zeta) := \nabla g(F(\zeta)) - A^T(AA^T)^{-1}A\zeta \tag{7}$$
can also be used in place of (6). For large problems where $A$ is sparse, (7) has the advantage that the main cost of evaluating the Jacobians $\nabla F$ and $\nabla G$ lies in inverting $AA^T$, which can be done efficiently via sparse Cholesky factorization. In contrast, (6) entails multiplication by the matrix $B$, which can be dense.
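The maps in (7) can be assembled directly. The sketch below (NumPy; the names `make_FG` and `grad_g` are illustrative, and a dense solve stands in for the sparse Cholesky factorization mentioned above) builds $F$ and $G$ for a small dense $A$ of full row rank.

```python
import numpy as np

def make_FG(A, b, grad_g):
    """Build F, G of (7): F(z) = d + (I - P)z, G(z) = grad_g(F(z)) - P z,
    where P = A^T (A A^T)^{-1} A projects onto the row space of A."""
    AAT = A @ A.T
    d = A.T @ np.linalg.solve(AAT, b)      # a particular solution of A d = b
    def P(z):                              # z -> A^T (A A^T)^{-1} A z
        return A.T @ np.linalg.solve(AAT, A @ z)
    F = lambda z: d + z - P(z)
    G = lambda z: grad_g(F(z)) - P(z)
    return F, G

# Tiny example: g(x) = 0.5*||x||^2, so grad_g is the identity map.
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
F, G = make_FG(A, b, lambda x: x)
z = np.array([0.3, -0.2, 0.5])
print(np.allclose(A @ F(z), b))   # F(z) is always feasible for Ax = b -> True
```

Feasibility of $F(\zeta)$ holds by construction, since $A(I - A^T(AA^T)^{-1}A) = 0$.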
Various methods have been proposed for solving CSOCP and SOCCP. They include interior-point methods [2, 3, 33, 36, 37, 42, 52], reformulation of the SOC constraints as smooth convex constraints [4], (non-interior) smoothing Newton methods [6, 19], and smoothing-regularization methods [22]. These methods require solving a nontrivial system of linear equations at each iteration. For the case where $G \equiv I$ and $F$ is affine with $\nabla F$ strictly $\mathcal{K}$-copositive, a matrix splitting method has been proposed [21]. In this paper, we study an alternative approach based on reformulating CSOCP and SOCCP as an unconstrained smooth minimization problem. In particular, we aim to find a smooth function $\psi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ such that
$$\psi(x, y) = 0 \iff (x, y) \text{ satisfies (1)}. \tag{8}$$
We call such a $\psi$ a merit function. Then SOCCP can be expressed as an unconstrained smooth (global) minimization problem:
$$\min_{\zeta \in \mathbb{R}^n}\ f(\zeta) := \psi(F(\zeta), G(\zeta)). \tag{9}$$
Various gradient methods, such as conjugate gradient methods and (limited-memory) quasi-Newton methods [5, 18, 38], can then be applied to solve (9). They have the advantage of requiring less work per iteration than interior-point methods and non-interior Newton methods. This approach can also be combined with smoothing and nonsmooth Newton methods to improve the efficiency and robustness of the latter, as was done in the case of NCP [7, 8, 12, 17, 24, 27, 30]. For this approach to be effective, the choice of $\psi$ is crucial. In the case of NCP, corresponding to (4) and $\mathcal{K} = \mathbb{R}^n_+$, a popular choice is
$$\psi(x, y) = \frac{1}{2} \sum_{i=1}^n \phi(x_i, y_i)^2$$
for all $x = (x_1, \dots, x_n)^T, y = (y_1, \dots, y_n)^T \in \mathbb{R}^n$, where $\phi$ is the well-known Fischer-Burmeister (FB) NCP-function [16, 17] defined by
$$\phi(x_i, y_i) = \sqrt{x_i^2 + y_i^2} - x_i - y_i.$$
It has been shown that $\psi$ is smooth (even though $\phi$ is not differentiable) and satisfies (8) [10, 25, 26]. Moreover, when $F$ is monotone or, more generally, a $P_0$-function, every stationary point of $\zeta \mapsto \psi(F(\zeta), \zeta)$ is a solution of NCP [10, 20]. This is an important property since (i) gradient methods are guaranteed to find only stationary points, and (ii) when an LP is reformulated as an NCP, the resulting $F$ is monotone, but neither strongly monotone nor a uniformly $P$-function. In contrast, other smooth merit functions for NCP, such as the implicit Lagrangian and the D-gap function [28, 35, 40, 45, 51, 54], require $F$ to be a uniformly $P$-function in order for stationary points to be solutions of NCP. Thus these other merit functions cannot be used for LP. Subsequently, a number of variants of $\psi$ with additional desirable properties have been proposed, e.g., [6, 10, 29, 31, 34, 41, 47, 49, 53]. A recent discussion of these variants can be found in [47].
Moreover, the above merit function $\psi$, as well as a related merit function of Yamashita and Fukushima [53], have been extended to the semidefinite complementarity problem (SDCP), which has the form (1), (2), but with $x, y$ being $q \times q$ ($q \ge 1$) real symmetric block-diagonal matrices of fixed block sizes, $\langle \cdot, \cdot \rangle$ being the trace inner product, and $\mathcal{K}$ being the cone of $q \times q$ block-diagonal positive semidefinite matrices of fixed block sizes [50, 53]. However, the analysis in [50] showed $\psi$ to be differentiable, but did not show it to be smooth.¹
Can the above merit functions for NCP be extended to SOCCP? To our knowledge, this question has not been studied previously; we study it in this paper. We are motivated by previous work on extending merit functions from NCP to SDCP [50, 53]. We are further motivated by a recent work [19] showing that the FB function extends from NCP to SOCCP via the Jordan product associated with SOC [11]. Nice properties of the FB function, such as strong semismoothness, are preserved when extended to SOCCP [48]. More specifically, for any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we define their Jordan product associated with $\mathcal{K}^n$ as
$$x \cdot y := \left( \langle x, y \rangle,\ y_1 x_2 + x_1 y_2 \right). \tag{10}$$
The identity element under this product is $e := (1, 0, \dots, 0)^T \in \mathbb{R}^n$. We write $x^2$ to mean $x \cdot x$ and write $x + y$ to mean the usual componentwise addition of vectors. It is known that $x^2 \in \mathcal{K}^n$ for all $x \in \mathbb{R}^n$. Moreover, if $x \in \mathcal{K}^n$, then there exists a unique vector in $\mathcal{K}^n$, denoted by $x^{1/2}$, such that $(x^{1/2})^2 = x^{1/2} \cdot x^{1/2} = x$. Then
$$\phi(x, y) := (x^2 + y^2)^{1/2} - x - y \tag{11}$$
is well defined for all $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$ and maps $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^n$. It was shown in [19] that $\phi(x, y) = 0$ if and only if $(x, y)$ satisfies (1). Thus,
$$\psi_{\mathrm{FB}}(x, y) := \frac{1}{2} \sum_{i=1}^N \left\| \phi(x_i, y_i) \right\|^2, \tag{12}$$
where $x = (x_1, \dots, x_N)^T,\ y = (y_1, \dots, y_N)^T \in \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_N}$, is a merit function for SOCCP. We will show that, as in the NCP case, $\psi_{\mathrm{FB}}$ is smooth and, when $\nabla F$ and $-\nabla G$ are column monotone, every stationary point of (9) solves SOCCP; see Propositions 2 and 3. The same holds for the following analog of the SDCP merit function studied by Yamashita and Fukushima [53]:
$$\psi_{\mathrm{YF}}(x, y) := \psi_0(\langle x, y \rangle) + \psi_{\mathrm{FB}}(x, y), \tag{13}$$
where $\psi_0 : \mathbb{R} \to [0, \infty)$ is any smooth function satisfying
$$\psi_0(t) = 0\ \ \forall t \le 0 \qquad \text{and} \qquad \psi_0(t) > 0\ \ \forall t > 0; \tag{14}$$
see Proposition 4. In [53], $\psi_0(t) = \frac{1}{4}(\max\{0, t\})^4$ was considered. Analogous to the NCP and SDCP cases, when $\nabla G(\zeta)$ is invertible, a $\nabla F$-free descent direction for
$$f_{\mathrm{FB}}(\zeta) := \psi_{\mathrm{FB}}(F(\zeta), G(\zeta)) \tag{15}$$
and
$$f_{\mathrm{YF}}(\zeta) := \psi_{\mathrm{YF}}(F(\zeta), G(\zeta)) \tag{16}$$
can be found. The function $f_{\mathrm{YF}}$, compared to $f_{\mathrm{FB}}$, has additional bounded level-set and error bound properties; see Section 5. Our proof of the smoothness of $\psi_{\mathrm{FB}}$ in Section 3 is quite technical, but further simplification seems difficult. In particular, neither general properties of the Jordan product associated with symmetric cones [11] nor the strong semismoothness proof for $\phi$ given in [48] lends itself readily to a smoothness proof for $\psi_{\mathrm{FB}}$. In Section 6, we report our numerical experience with solving SOCPs (5) from the DIMACS library by using a limited-memory BFGS (L-BFGS) method to minimize $f_{\mathrm{FB}}$, with $F$ and $G$ given by (7). On problems with $n \gg m$ and for low-to-medium solution accuracy, L-BFGS appears to be competitive with interior-point methods. We also report our experience with solving CSOCP using a BFGS method to minimize $f_{\mathrm{FB}}$.

It is known that SOCCP can be reduced to an SDCP by observing that, for any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we have $x \in \mathcal{K}^n$ if and only if
$$L_x := \begin{bmatrix} x_1 & x_2^T \\ x_2 & x_1 I \end{bmatrix}$$
is positive semidefinite (also see [19, p. 437] and [44]). However, this reduction increases the problem dimension from $n$ to $n(n+1)/2$, and it is not known whether this increase can be mitigated by exploiting the special "arrow" structure of $L_x$.

¹ During the revision of this paper, a proof of smoothness was reported in [43].
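The equivalence between $x \in \mathcal{K}^n$ and positive semidefiniteness of the arrow matrix $L_x$ can be checked numerically. The following sketch (NumPy; the helper name `arrow` is ours) uses the standard fact that $L_x$ has eigenvalues $x_1 \pm \|x_2\|$ and $x_1$ (with multiplicity $n - 2$).

```python
import numpy as np

def arrow(x):
    """The 'arrow' matrix L_x = [[x1, x2^T], [x2, x1 I]] for x = (x1, x2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    L = x[0] * np.eye(n)
    L[0, 1:] = x[1:]
    L[1:, 0] = x[1:]
    return L

x = np.array([2.0, 1.0, 1.0])   # ||x2|| = sqrt(2) <= 2, so x lies in K^3
print(np.all(np.linalg.eigvalsh(arrow(x)) >= -1e-12))   # True: L_x is PSD
y = np.array([1.0, 2.0, 0.0])   # ||y2|| = 2 > 1, so y is outside K^3
print(np.min(np.linalg.eigvalsh(arrow(y))) < 0)         # True: L_y is not PSD
```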
Throughout this paper, $\mathbb{R}^n$ denotes the space of $n$-dimensional real column vectors, and $^T$ denotes transpose. For any differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x)$ denotes the gradient of $f$ at $x$. For any differentiable mapping $F = (F_1, \dots, F_m)^T : \mathbb{R}^n \to \mathbb{R}^m$, $\nabla F(x) = [\nabla F_1(x) \cdots \nabla F_m(x)]$ denotes the transposed Jacobian of $F$ at $x$. For any symmetric matrices $A, B \in \mathbb{R}^{n \times n}$, we write $A \succeq B$ (respectively, $A \succ B$) to mean that $A - B$ is positive semidefinite (respectively, positive definite). For nonnegative scalars $\alpha$ and $\beta$, we write $\alpha = O(\beta)$ to mean $\alpha \le C\beta$, with $C$ independent of $\alpha$ and $\beta$.
2. Jordan product and spectral factorization
It is known that $\mathcal{K}^n$ is a closed convex self-dual cone with nonempty interior given by
$$\mathrm{int}(\mathcal{K}^n) = \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \mid \|x_2\| < x_1 \right\}.$$
The Jordan product (10), unlike scalar or matrix multiplication, is not associative, which is a main source of complication in the analysis of SOCCP. For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, its determinant is defined by
$$\det(x) := x_1^2 - \|x_2\|^2.$$
In general, $\det(x \cdot y) \ne \det(x)\det(y)$ unless $x_2 = y_2$.
We next recall from [19] that each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ admits a spectral factorization, associated with $\mathcal{K}^n$, of the form
$$x = \lambda_1 u^{(1)} + \lambda_2 u^{(2)},$$
where $\lambda_1, \lambda_2$ and $u^{(1)}, u^{(2)}$ are the spectral values and the associated spectral vectors of $x$, given by
$$\lambda_i = x_1 + (-1)^i\|x_2\|, \qquad u^{(i)} = \begin{cases} \dfrac{1}{2}\left( 1,\ (-1)^i \dfrac{x_2}{\|x_2\|} \right) & \text{if } x_2 \ne 0; \\[6pt] \dfrac{1}{2}\left( 1,\ (-1)^i w_2 \right) & \text{if } x_2 = 0, \end{cases}$$
for $i = 1, 2$, with $w_2$ being any vector in $\mathbb{R}^{n-1}$ satisfying $\|w_2\| = 1$. If $x_2 \ne 0$, the factorization is unique.
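As a minimal numerical sketch (NumPy; the name `spectral` is ours), the factorization above can be computed and verified as follows; note that for $x_2 = 0$ the code picks one admissible unit vector $w_2$.

```python
import numpy as np

def spectral(x):
    """Spectral factorization of x = (x1, x2) w.r.t. K^n:
    returns (lam1, lam2, u1, u2) with x = lam1*u1 + lam2*u2."""
    x = np.asarray(x, dtype=float)
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    if nrm > 0:
        w2 = x2 / nrm
    else:                         # x2 = 0: any unit vector w2 is admissible
        w2 = np.zeros_like(x2)
        w2[0] = 1.0
    lam1, lam2 = x1 - nrm, x1 + nrm
    u1 = 0.5 * np.concatenate(([1.0], -w2))
    u2 = 0.5 * np.concatenate(([1.0], w2))
    return lam1, lam2, u1, u2

x = np.array([1.0, 3.0, 4.0])                  # ||x2|| = 5
lam1, lam2, u1, u2 = spectral(x)
print(lam1, lam2)                              # -4.0 6.0
print(np.allclose(lam1 * u1 + lam2 * u2, x))   # True: x is reconstructed
```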
The above spectral factorization of $x$, as well as $x^2$, $x^{1/2}$, and the matrix $L_x$, have various interesting properties; see [19]. We list four properties that we will use later.

Property 1. For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with spectral values $\lambda_1, \lambda_2$ and spectral vectors $u^{(1)}, u^{(2)}$, the following results hold.

(a) $x^2 = \lambda_1^2 u^{(1)} + \lambda_2^2 u^{(2)} \in \mathcal{K}^n$.

(b) If $x \in \mathcal{K}^n$, then $0 \le \lambda_1 \le \lambda_2$ and $x^{1/2} = \sqrt{\lambda_1}\, u^{(1)} + \sqrt{\lambda_2}\, u^{(2)}$.

(c) If $x \in \mathrm{int}(\mathcal{K}^n)$, then $0 < \lambda_1 \le \lambda_2$, $\det(x) = \lambda_1\lambda_2$, and $L_x$ is invertible with
$$L_x^{-1} = \frac{1}{\det(x)} \begin{bmatrix} x_1 & -x_2^T \\[4pt] -x_2 & \dfrac{\det(x)}{x_1} I + \dfrac{1}{x_1} x_2 x_2^T \end{bmatrix}.$$

(d) $x \cdot y = L_x y$ for all $y \in \mathbb{R}^n$, and $L_x \succ 0$ if and only if $x \in \mathrm{int}(\mathcal{K}^n)$.
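Property 1(b)–(d) can be verified numerically. This self-contained sketch (NumPy; helper names are ours) computes $x^{1/2}$ from the spectral factorization, checks $(x^{1/2})^2 = x$ under the Jordan product, and checks the closed-form inverse of Property 1(c) against direct matrix inversion.

```python
import numpy as np

def jordan(x, y):
    """Jordan product (10): x.y = (<x,y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(x):
    """x^{1/2} via Property 1(b): sqrt of spectral values, same spectral vectors."""
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    w2 = x2 / nrm if nrm > 0 else np.eye(x.size - 1)[0]
    lam1, lam2 = x1 - nrm, x1 + nrm
    u1 = 0.5 * np.concatenate(([1.0], -w2))
    u2 = 0.5 * np.concatenate(([1.0], w2))
    return np.sqrt(lam1) * u1 + np.sqrt(lam2) * u2

def arrow(v):
    n = v.size
    L = v[0] * np.eye(n)
    L[0, 1:] = v[1:]
    L[1:, 0] = v[1:]
    return L

x = np.array([3.0, 1.0, 2.0])        # ||x2|| = sqrt(5) < 3, so x in int(K^3)
s = soc_sqrt(x)
print(np.allclose(jordan(s, s), x))  # True: (x^{1/2})^2 = x

# Property 1(c): the closed-form L_x^{-1}.
detx = x[0]**2 - x[1:] @ x[1:]
inv = np.empty((3, 3))
inv[0, 0] = x[0]
inv[0, 1:] = -x[1:]
inv[1:, 0] = -x[1:]
inv[1:, 1:] = (detx / x[0]) * np.eye(2) + np.outer(x[1:], x[1:]) / x[0]
inv /= detx
print(np.allclose(inv, np.linalg.inv(arrow(x))))   # True
```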
3. Smoothness property of merit functions
In this section we show that the functions (12) and (13) are smooth functions satisfying (8). For simplicity, we focus on the special case of $N = 1$, i.e.,
$$\psi_{\mathrm{FB}}(x, y) = \frac{1}{2}\|\phi(x, y)\|^2 \tag{17}$$
in this and the next two sections. Extension of our analysis to the general case of $N \ge 1$ is straightforward. We begin with the following result from [19], showing that the FB function $\phi$ given by (11) has properties analogous to the NCP and SDCP cases. Additional properties of $\phi$ are studied in [19, 48].

Lemma 1. ([19, Proposition 2.1]) Let $\phi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ be given by (11). Then
$$\phi(x, y) = 0 \iff x, y \in \mathcal{K}^n,\ x \cdot y = 0 \iff x, y \in \mathcal{K}^n,\ \langle x, y \rangle = 0.$$
Since $x^2, y^2 \in \mathcal{K}^n$ for any $x, y \in \mathbb{R}^n$, we have that $x^2 + y^2 = \left( \|x\|^2 + \|y\|^2,\ 2x_1x_2 + 2y_1y_2 \right) \in \mathcal{K}^n$. Thus
$$x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n) \iff \|x\|^2 + \|y\|^2 > 2\|x_1x_2 + y_1y_2\|. \tag{18}$$
The spectral values of $x^2 + y^2$ are
$$\lambda_1 := \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\|, \qquad \lambda_2 := \|x\|^2 + \|y\|^2 + 2\|x_1x_2 + y_1y_2\|. \tag{19}$$
Then, by Property 1(b), $z := (x^2 + y^2)^{1/2}$ has the spectral values $\sqrt{\lambda_1}, \sqrt{\lambda_2}$ and
$$z = (z_1, z_2) = \left( \frac{\sqrt{\lambda_1} + \sqrt{\lambda_2}}{2},\ \frac{\sqrt{\lambda_2} - \sqrt{\lambda_1}}{2}\, w_2 \right), \tag{20}$$
where $w_2 := \dfrac{x_1x_2 + y_1y_2}{\|x_1x_2 + y_1y_2\|}$ if $x_1x_2 + y_1y_2 \ne 0$, and otherwise $w_2$ is any vector in $\mathbb{R}^{n-1}$ satisfying $\|w_2\| = 1$. The next key lemma, describing special properties of $x, y$ with $x^2 + y^2 \notin \mathrm{int}(\mathcal{K}^n)$, will be used to prove Propositions 1, 2, and Lemma 6.
Lemma 2. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with $x^2 + y^2 \notin \mathrm{int}(\mathcal{K}^n)$, we have
$$x_1^2 = \|x_2\|^2, \qquad y_1^2 = \|y_2\|^2, \qquad x_1y_1 = x_2^Ty_2, \qquad x_1y_2 = y_1x_2.$$

Proof. By (18), $\|x\|^2 + \|y\|^2 = 2\|x_1x_2 + y_1y_2\|$. Thus $\left( \|x\|^2 + \|y\|^2 \right)^2 = 4\|x_1x_2 + y_1y_2\|^2$, so that
$$\|x\|^4 + 2\|x\|^2\|y\|^2 + \|y\|^4 = 4(x_1x_2 + y_1y_2)^T(x_1x_2 + y_1y_2).$$
Notice that $\|x\|^2 = x_1^2 + \|x_2\|^2$ and $\|y\|^2 = y_1^2 + \|y_2\|^2$. Thus,
$$\left( x_1^2 + \|x_2\|^2 \right)^2 + 2\|x\|^2\|y\|^2 + \left( y_1^2 + \|y_2\|^2 \right)^2 = 4x_1^2\|x_2\|^2 + 8x_1y_1x_2^Ty_2 + 4y_1^2\|y_2\|^2.$$
Simplifying the above expression yields
$$\left( x_1^2 - \|x_2\|^2 \right)^2 + \left( y_1^2 - \|y_2\|^2 \right)^2 + \left( 2\|x\|^2\|y\|^2 - 8x_1y_1x_2^Ty_2 \right) = 0.$$
The first two terms are nonnegative. The third term is also nonnegative because
$$\|x\|^2\|y\|^2 = \left( x_1^2 + \|x_2\|^2 \right)\left( y_1^2 + \|y_2\|^2 \right) \ge \left( 2|x_1|\|x_2\| \right)\left( 2|y_1|\|y_2\| \right) = 4|x_1||y_1|\|x_2\|\|y_2\| \ge 4x_1y_1x_2^Ty_2.$$
Hence
$$x_1^2 = \|x_2\|^2, \qquad y_1^2 = \|y_2\|^2, \qquad 2\|x\|^2\|y\|^2 - 8x_1y_1x_2^Ty_2 = 0.$$
Substituting $x_1^2 = \|x_2\|^2$ and $y_1^2 = \|y_2\|^2$ into the last equation, the resulting three equations imply $x_1y_1 = x_2^Ty_2$.

It remains to prove that $x_1y_2 = y_1x_2$. If $x_1 = 0$, then $\|x_2\| = |x_1| = 0$, so this relation is true. Symmetrically, if $y_1 = 0$, then this relation is also true. Suppose that $x_1 \ne 0$ and $y_1 \ne 0$. Then $x_2 \ne 0$, $y_2 \ne 0$, and
$$x_1y_1 = x_2^Ty_2 = \|x_2\|\|y_2\|\cos\theta = |x_1||y_1|\cos\theta,$$
where $\theta$ is the angle between $x_2$ and $y_2$. Thus $\cos\theta \in \{-1, 1\}$, i.e., $y_2 = \alpha x_2$ for some $\alpha \ne 0$. Then
$$x_1y_1 = x_2^Ty_2 = \alpha\|x_2\|^2 = \alpha x_1^2,$$
so that $y_1/x_1 = \alpha$. Thus $y_2 = x_2 y_1/x_1$. $\square$
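Lemma 2 can be sanity-checked numerically. The construction below (ours, not from the paper) takes $\|x_2\| = |x_1|$ and $y$ a scalar multiple of $x$, which by (18) places $x^2 + y^2$ on the boundary of $\mathcal{K}^3$; all four identities of the lemma then hold.

```python
import numpy as np

# A point with x^2 + y^2 on the boundary of K^3: ||x2|| = |x1| and y = alpha*x.
x = np.array([np.sqrt(2.0), 1.0, 1.0])     # x1^2 = 2 = ||x2||^2
y = -0.5 * x

# The four identities of Lemma 2:
print(np.isclose(x[0]**2, x[1:] @ x[1:]))       # x1^2 = ||x2||^2 -> True
print(np.isclose(y[0]**2, y[1:] @ y[1:]))       # y1^2 = ||y2||^2 -> True
print(np.isclose(x[0] * y[0], x[1:] @ y[1:]))   # x1*y1 = x2^T y2 -> True
print(np.allclose(x[0] * y[1:], y[0] * x[1:]))  # x1*y2 = y1*x2   -> True
```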
The next technical lemma shows that two squared terms are bounded above by a quantity that measures how close $x^2 + y^2$ comes to the boundary of $\mathcal{K}^n$ (cf. (18)). This lemma will be used to prove Lemma 4 and Proposition 2.

Lemma 3. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with $x_1x_2 + y_1y_2 \ne 0$, we have
$$\left( x_1 - \frac{(x_1x_2 + y_1y_2)^Tx_2}{\|x_1x_2 + y_1y_2\|} \right)^2 \le \left\| x_2 - x_1\frac{x_1x_2 + y_1y_2}{\|x_1x_2 + y_1y_2\|} \right\|^2 \le \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\|.$$
Proof. The first inequality can be seen by expanding the square on both sides and using the Cauchy-Schwarz inequality. It remains to prove the second inequality. Let us multiply both sides of this inequality by
$$\|x_1x_2 + y_1y_2\|^2 = x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2$$
and let $L$ and $R$ denote, respectively, the resulting left-hand side and right-hand side. Since $x_1x_2 + y_1y_2 \ne 0$, the second inequality is equivalent to $R - L \ge 0$. We have
$$\begin{aligned}
L &= \left( \|x_2\|^2 - 2x_1\frac{(x_1x_2 + y_1y_2)^Tx_2}{\|x_1x_2 + y_1y_2\|} + x_1^2 \right)\|x_1x_2 + y_1y_2\|^2 \\
&= \|x_2\|^2\left( x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2 \right) - 2x_1\left( x_1\|x_2\|^2 + y_1x_2^Ty_2 \right)\|x_1x_2 + y_1y_2\| \\
&\quad + x_1^2\left( x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2 \right) \\
&= x_1^2\|x_2\|^4 + 2x_1y_1x_2^Ty_2\|x_2\|^2 + y_1^2\|x_2\|^2\|y_2\|^2 - 2x_1^2\|x_2\|^2\|x_1x_2 + y_1y_2\| \\
&\quad - 2x_1y_1x_2^Ty_2\|x_1x_2 + y_1y_2\| + x_1^4\|x_2\|^2 + 2x_1^3y_1x_2^Ty_2 + x_1^2y_1^2\|y_2\|^2,
\end{aligned}$$
and
$$\begin{aligned}
R &= \left( \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\| \right)\|x_1x_2 + y_1y_2\|^2 \\
&= \left( x_1^2 + \|x_2\|^2 - 2\|x_1x_2 + y_1y_2\| \right)\|x_1x_2 + y_1y_2\|^2 + \|y\|^2\|x_1x_2 + y_1y_2\|^2 \\
&= \left( x_1^2 + \|x_2\|^2 - 2\|x_1x_2 + y_1y_2\| \right)\left( x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2 \right) + \|y\|^2\|x_1x_2 + y_1y_2\|^2 \\
&= x_1^4\|x_2\|^2 + 2x_1^3y_1x_2^Ty_2 + x_1^2y_1^2\|y_2\|^2 + x_1^2\|x_2\|^4 + 2x_1y_1x_2^Ty_2\|x_2\|^2 + y_1^2\|x_2\|^2\|y_2\|^2 \\
&\quad - 2x_1^2\|x_2\|^2\|x_1x_2 + y_1y_2\| - 4x_1y_1x_2^Ty_2\|x_1x_2 + y_1y_2\| \\
&\quad - 2y_1^2\|y_2\|^2\|x_1x_2 + y_1y_2\| + \|y\|^2\|x_1x_2 + y_1y_2\|^2.
\end{aligned}$$
Thus, taking the difference and using the Cauchy-Schwarz inequality yields
$$\begin{aligned}
R - L &= \|y\|^2\|x_1x_2 + y_1y_2\|^2 - 2x_1y_1x_2^Ty_2\|x_1x_2 + y_1y_2\| - 2y_1^2\|y_2\|^2\|x_1x_2 + y_1y_2\| \\
&= y_1^2\|x_1x_2 + y_1y_2\|^2 + \|y_2\|^2\|x_1x_2 + y_1y_2\|^2 - 2y_1y_2^T(x_1x_2 + y_1y_2)\|x_1x_2 + y_1y_2\| \\
&\ge y_1^2\|x_1x_2 + y_1y_2\|^2 + \|y_2\|^2\|x_1x_2 + y_1y_2\|^2 - 2|y_1|\|y_2\|\,\|x_1x_2 + y_1y_2\|^2 \\
&= \left( |y_1| - \|y_2\| \right)^2\|x_1x_2 + y_1y_2\|^2 \ \ge\ 0. \qquad \square
\end{aligned}$$
Using Lemmas 1, 2, 3, and [19, Proposition 5.2], we now prove our first main result, showing that $\psi_{\mathrm{FB}}$ is differentiable and that its gradient has a computable formula.

Proposition 1. Let $\phi$ be given by (11). Then $\psi_{\mathrm{FB}}$ given by (17) has the following properties.

(a) $\psi_{\mathrm{FB}} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ and satisfies (8).

(b) $\psi_{\mathrm{FB}}$ is differentiable at every $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$. Moreover, $\nabla_x\psi_{\mathrm{FB}}(0, 0) = \nabla_y\psi_{\mathrm{FB}}(0, 0) = 0$. If $(x, y) \ne (0, 0)$ and $x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n)$, then
$$\nabla_x\psi_{\mathrm{FB}}(x, y) = \left( L_x L_{(x^2+y^2)^{1/2}}^{-1} - I \right)\phi(x, y), \qquad \nabla_y\psi_{\mathrm{FB}}(x, y) = \left( L_y L_{(x^2+y^2)^{1/2}}^{-1} - I \right)\phi(x, y). \tag{21}$$
If $(x, y) \ne (0, 0)$ and $x^2 + y^2 \notin \mathrm{int}(\mathcal{K}^n)$, then $x_1^2 + y_1^2 \ne 0$ and
$$\nabla_x\psi_{\mathrm{FB}}(x, y) = \left( \frac{x_1}{\sqrt{x_1^2 + y_1^2}} - 1 \right)\phi(x, y), \tag{22}$$
$$\nabla_y\psi_{\mathrm{FB}}(x, y) = \left( \frac{y_1}{\sqrt{x_1^2 + y_1^2}} - 1 \right)\phi(x, y). \tag{23}$$
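The gradient formula (21) can be checked against finite differences at a generic point, where $x^2 + y^2$ lies in $\mathrm{int}(\mathcal{K}^n)$ almost surely. The sketch below (NumPy; helper names are ours) is illustrative only.

```python
import numpy as np

def soc_sqrt(x):
    """(x)^{1/2} for x in K^n, via the spectral factorization."""
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    w2 = x2 / nrm if nrm > 0 else np.eye(x.size - 1)[0]
    l1, l2 = x1 - nrm, x1 + nrm
    return 0.5 * np.concatenate(([np.sqrt(l1) + np.sqrt(l2)],
                                 (np.sqrt(l2) - np.sqrt(l1)) * w2))

def jordan(x, y):
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def arrow(v):
    n = v.size
    L = v[0] * np.eye(n)
    L[0, 1:] = v[1:]
    L[1:, 0] = v[1:]
    return L

def phi(x, y):
    return soc_sqrt(jordan(x, x) + jordan(y, y)) - x - y

def psi(x, y):
    p = phi(x, y)
    return 0.5 * p @ p

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)

# Formula (21): grad_x psi = (L_x L_z^{-1} - I) phi(x, y), z = (x^2 + y^2)^{1/2}.
z = soc_sqrt(jordan(x, x) + jordan(y, y))
gx = (arrow(x) @ np.linalg.inv(arrow(z)) - np.eye(3)) @ phi(x, y)

# Compare with central finite differences.
eps = 1e-6
num = np.array([(psi(x + eps * e, y) - psi(x - eps * e, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(gx, num, atol=1e-5))   # True
```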
Proof. (a) This follows from Lemma 1.

(b) Case (1): $x = y = 0$.
For any $h, k \in \mathbb{R}^n$, let $\mu_1 \le \mu_2$ be the spectral values and let $v^{(1)}, v^{(2)}$ be the corresponding spectral vectors of $h^2 + k^2$. Then, by Property 1(b),
$$\left\| (h^2 + k^2)^{1/2} - h - k \right\| = \left\| \sqrt{\mu_1}\,v^{(1)} + \sqrt{\mu_2}\,v^{(2)} - h - k \right\| \le \sqrt{\mu_1}\|v^{(1)}\| + \sqrt{\mu_2}\|v^{(2)}\| + \|h\| + \|k\| = \frac{\sqrt{\mu_1} + \sqrt{\mu_2}}{\sqrt{2}} + \|h\| + \|k\|.$$
Also,
$$\mu_1 \le \mu_2 = \|h\|^2 + \|k\|^2 + 2\|h_1h_2 + k_1k_2\| \le \|h\|^2 + \|k\|^2 + 2|h_1|\|h_2\| + 2|k_1|\|k_2\| \le 2\|h\|^2 + 2\|k\|^2.$$
Combining the above two inequalities yields
$$\psi_{\mathrm{FB}}(h, k) - \psi_{\mathrm{FB}}(0, 0) = \frac{1}{2}\left\| (h^2 + k^2)^{1/2} - h - k \right\|^2 \le \frac{1}{2}\left( \frac{\sqrt{\mu_1} + \sqrt{\mu_2}}{\sqrt{2}} + \|h\| + \|k\| \right)^2 \le \frac{1}{2}\left( \frac{2\sqrt{2\|h\|^2 + 2\|k\|^2}}{\sqrt{2}} + \|h\| + \|k\| \right)^2 = O\!\left( \|h\|^2 + \|k\|^2 \right).$$
This shows that $\psi_{\mathrm{FB}}$ is differentiable at $(0, 0)$ with
$$\nabla_x\psi_{\mathrm{FB}}(0, 0) = \nabla_y\psi_{\mathrm{FB}}(0, 0) = 0.$$
Case (2): $(x, y) \ne (0, 0)$ and $x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n)$.
Since $x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n)$, Proposition 5.2 of [19] implies that $\phi$ is continuously differentiable at $(x, y)$. Since $\psi_{\mathrm{FB}}$ is the composition of $\phi$ with $x \mapsto \frac{1}{2}\|x\|^2$, $\psi_{\mathrm{FB}}$ is continuously differentiable at $(x, y)$. The expressions (21) for $\nabla_x\psi_{\mathrm{FB}}(x, y)$ and $\nabla_y\psi_{\mathrm{FB}}(x, y)$ follow from the chain rule and the expression for the Jacobian of $\phi$ given in [19, Proposition 5.2] (also see [19, Corollary 5.4]).
Case (3): $(\bar{x}, \bar{y}) \ne (0, 0)$ and $\bar{x}^2 + \bar{y}^2 \notin \mathrm{int}(\mathcal{K}^n)$.
By (18), $\|\bar{x}\|^2 + \|\bar{y}\|^2 = 2\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$. Since $(\bar{x}, \bar{y}) \ne (0, 0)$, this also implies $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, so Lemmas 2 and 3 are applicable. By (20),
$$(\bar{x}^2 + \bar{y}^2)^{1/2} = \left( \frac{\sqrt{\lambda_1} + \sqrt{\lambda_2}}{2},\ \frac{\sqrt{\lambda_2} - \sqrt{\lambda_1}}{2}\,\bar{w}_2 \right),$$
where $\lambda_1, \lambda_2$ are given by (19), evaluated at $(\bar{x}, \bar{y})$, and $\bar{w}_2 := \dfrac{\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|}$. Thus $\lambda_1 = 0$ and $\lambda_2 > 0$.
Since $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, we have $x_1x_2 + y_1y_2 \ne 0$ for all $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$ sufficiently near to $(\bar{x}, \bar{y})$. Moreover,
$$\begin{aligned}
2\psi_{\mathrm{FB}}(x, y) &= \left\| (x^2 + y^2)^{1/2} - x - y \right\|^2 \\
&= \left\| (x^2 + y^2)^{1/2} \right\|^2 + \|x + y\|^2 - 2\left\langle (x^2 + y^2)^{1/2},\ x + y \right\rangle \\
&= \|x\|^2 + \|y\|^2 + \|x + y\|^2 - 2\left\langle (x^2 + y^2)^{1/2},\ x + y \right\rangle,
\end{aligned}$$
where the third equality uses the observation that $\|z\|^2 = \langle z^2, e \rangle$ for any $z \in \mathbb{R}^n$. Since $\|x\|^2 + \|y\|^2 + \|x + y\|^2$ is clearly differentiable in $(x, y)$, it suffices to show that
$$\begin{aligned}
2\left\langle (x^2 + y^2)^{1/2},\ x + y \right\rangle &= (\sqrt{\mu_2} + \sqrt{\mu_1})(x_1 + y_1) + (\sqrt{\mu_2} - \sqrt{\mu_1})\frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \\
&= \sqrt{\mu_2}\left( x_1 + y_1 + \frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \right) + \sqrt{\mu_1}\left( x_1 + y_1 - \frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \right)
\end{aligned} \tag{24}$$
is differentiable at $(x, y) = (\bar{x}, \bar{y})$, where $\mu_1, \mu_2$ are the spectral values of $x^2 + y^2$, i.e., $\mu_i = \|x\|^2 + \|y\|^2 + 2(-1)^i\|x_1x_2 + y_1y_2\|$. Since $\lambda_2 > 0$, we see that the first term on the right-hand side of (24) is differentiable at $(x, y) = (\bar{x}, \bar{y})$. We claim that the second term on the right-hand side of (24) is $o(\|h\| + \|k\|)$ with $h := x - \bar{x}$, $k := y - \bar{y}$, i.e., it is differentiable with zero gradient. To see this, notice that $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, so that $\mu_1 = \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\|$, viewed as a function of $(x, y)$, is differentiable at $(x, y) = (\bar{x}, \bar{y})$. Moreover, $\mu_1 = \lambda_1 = 0$ when $(x, y) = (\bar{x}, \bar{y})$. Thus, a first-order Taylor expansion of $\mu_1$ at $(\bar{x}, \bar{y})$ yields
$$\mu_1 = O(\|x - \bar{x}\| + \|y - \bar{y}\|) = O(\|h\| + \|k\|).$$
Also, since $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, by the product and quotient rules for differentiation, the function
$$x_1 + y_1 - \frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \tag{25}$$
is differentiable at $(x, y) = (\bar{x}, \bar{y})$. Moreover, the function (25) has value 0 at $(x, y) = (\bar{x}, \bar{y})$. This is because
$$\bar{x}_1 + \bar{y}_1 - \frac{(\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2)^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|} = \left( \bar{x}_1 - \bar{w}_2^T\bar{x}_2 \right) + \left( \bar{y}_1 - \bar{w}_2^T\bar{y}_2 \right) = 0 + 0,$$
where $\bar{w}_2 := (\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2)/\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$ and the last equality uses the fact that, by Lemma 3 and $\|\bar{x}\|^2 + \|\bar{y}\|^2 = 2\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$, we have $\bar{w}_2^T\bar{x}_2 = \bar{x}_1$ and $\bar{w}_2^T\bar{y}_2 = \bar{y}_1$. (By symmetry, Lemma 3 still holds when $x$ and $y$ are switched.) Thus, the function (25) is $O(\|h\| + \|k\|)$ in magnitude. This, together with $\mu_1 = O(\|h\| + \|k\|)$, shows that the second term on the right of (24) is $O\!\left( (\|h\| + \|k\|)^{3/2} \right) = o(\|h\| + \|k\|)$.
Thus, we have shown that $\psi_{\mathrm{FB}}$ is differentiable at $(\bar{x}, \bar{y})$. Moreover, the preceding argument shows that $2\nabla\psi_{\mathrm{FB}}(\bar{x}, \bar{y})$ is the gradient of $\|x\|^2 + \|y\|^2 + \|x + y\|^2$ minus the gradient of the first term on the right of (24), both evaluated at $(x, y) = (\bar{x}, \bar{y})$. The gradient of $\|x\|^2 + \|y\|^2 + \|x + y\|^2$ with respect to $x$, evaluated at $(x, y) = (\bar{x}, \bar{y})$, is $4\bar{x} + 2\bar{y}$. Using the product and quotient rules for differentiation, the gradient of the first term on the right of (24) with respect to $x_1$, evaluated at $(x, y) = (\bar{x}, \bar{y})$, works out to be
$$\frac{\bar{x}_1 + \bar{w}_2^T\bar{x}_2}{\sqrt{\lambda_2}}\left( \bar{x}_1 + \bar{y}_1 + \bar{w}_2^T(\bar{x}_2 + \bar{y}_2) \right) + \sqrt{\lambda_2}\left( 1 + \frac{\bar{x}_2^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|} - \frac{\bar{w}_2^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|}\,\bar{w}_2^T\bar{x}_2 \right) = \frac{2\bar{x}_1(\bar{x}_1 + \bar{y}_1)}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}} + 2\sqrt{\bar{x}_1^2 + \bar{y}_1^2},$$
where $\bar{w}_2 := (\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2)/\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$ and the equality uses Lemma 2 and the fact that, by Lemma 3 and $\|\bar{x}\|^2 + \|\bar{y}\|^2 = 2\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$, we have $\bar{w}_2^T\bar{x}_2 = \bar{x}_1$, $\bar{w}_2^T\bar{y}_2 = \bar{y}_1$. Similarly, the gradient of the first term on the right of (24) with respect to $x_2$, evaluated at $(x, y) = (\bar{x}, \bar{y})$, works out to be
$$\frac{\bar{x}_2 + \bar{w}_2\bar{x}_1}{\sqrt{\lambda_2}}\left( \bar{x}_1 + \bar{y}_1 + \bar{w}_2^T(\bar{x}_2 + \bar{y}_2) \right) + \sqrt{\lambda_2}\left( \frac{2\bar{x}_1\bar{x}_2 + (\bar{x}_1 + \bar{y}_1)\bar{y}_2}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|} - \frac{\bar{w}_2^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|}\,\bar{w}_2\bar{x}_1 \right) = \frac{2\left( 2\bar{x}_1\bar{x}_2 + (\bar{x}_1 + \bar{y}_1)\bar{y}_2 \right)}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}}.$$
In particular, the equality uses the fact that, by Lemma 2, we have $\bar{x}_1\bar{y}_2 = \bar{y}_1\bar{x}_2$ and $\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\| = \bar{x}_1^2 + \bar{y}_1^2$, so that $\bar{w}_2\bar{x}_1 = \bar{x}_2$ and $\lambda_2 = 4(\bar{x}_1^2 + \bar{y}_1^2)$. Thus, combining the preceding gradient expressions yields
$$2\nabla_x\psi_{\mathrm{FB}}(\bar{x}, \bar{y}) = 4\bar{x} + 2\bar{y} - \begin{pmatrix} 2\sqrt{\bar{x}_1^2 + \bar{y}_1^2} \\ 0 \end{pmatrix} - \frac{2}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}}\begin{pmatrix} \bar{x}_1(\bar{x}_1 + \bar{y}_1) \\ 2\bar{x}_1\bar{x}_2 + (\bar{x}_1 + \bar{y}_1)\bar{y}_2 \end{pmatrix}.$$
Using $\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\| = \bar{x}_1^2 + \bar{y}_1^2$ and $\lambda_2 = 4(\bar{x}_1^2 + \bar{y}_1^2)$, we can also write
$$(\bar{x}^2 + \bar{y}^2)^{1/2} = \left( \sqrt{\bar{x}_1^2 + \bar{y}_1^2},\ \frac{\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}} \right),$$
so that
$$\phi(\bar{x}, \bar{y}) = \left( \sqrt{\bar{x}_1^2 + \bar{y}_1^2} - (\bar{x}_1 + \bar{y}_1),\ \frac{\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}} - (\bar{x}_2 + \bar{y}_2) \right). \tag{26}$$
Using the fact that $\bar{x}_1\bar{y}_2 = \bar{y}_1\bar{x}_2$, we can rewrite the above expression for $\nabla_x\psi_{\mathrm{FB}}(\bar{x}, \bar{y})$ in the form of (22). By symmetry, (23) also holds.