Digital Object Identifier (DOI) 10.1007/s10107-005-0617-0
Jein-Shan Chen · Paul Tseng
An unconstrained smooth minimization reformulation of the second-order cone complementarity problem
In honor of Terry Rockafellar on his 70th birthday Received: July 12, 2004 / Accepted: May 25, 2005 Published online: July 14, 2005 – © Springer-Verlag 2005
Abstract. A popular approach to solving the nonlinear complementarity problem (NCP) is to reformulate it as the global minimization of a certain merit function over $\mathbb{R}^n$. A popular choice of the merit function is the squared norm of the Fischer-Burmeister function, shown to be smooth over $\mathbb{R}^n$ and, for monotone NCP, to have the property that each stationary point is a solution of the NCP. This merit function and its analysis were subsequently extended to the semidefinite complementarity problem (SDCP), although only differentiability, not continuous differentiability, was established. In this paper, we extend this merit function and its analysis, including continuous differentiability, to the second-order cone complementarity problem (SOCCP). Although the SOCCP is reducible to an SDCP, the reduction does not allow for easy translation of the analysis from SDCP to SOCCP. Instead, our analysis exploits properties of the Jordan product and spectral factorization associated with the second-order cone. We also report preliminary numerical experience with solving DIMACS second-order cone programs using a limited-memory BFGS method to minimize the merit function.
Key words. Second-order cone – Complementarity – Merit function – Spectral factorization – Jordan product – Level set – Error bound
1. Introduction
We consider the following conic complementarity problem of finding $x, y \in \mathbb{R}^n$ and $\zeta \in \mathbb{R}^n$ satisfying
$$\langle x, y \rangle = 0, \qquad x \in \mathcal{K}, \qquad y \in \mathcal{K}, \tag{1}$$
$$x = F(\zeta), \qquad y = G(\zeta), \tag{2}$$
where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product, $F : \mathbb{R}^n \to \mathbb{R}^n$ and $G : \mathbb{R}^n \to \mathbb{R}^n$ are smooth (i.e., continuously differentiable) mappings, and $\mathcal{K}$ is a closed convex cone in $\mathbb{R}^n$ that is self-dual in the sense that $\mathcal{K}$ equals its dual cone $\mathcal{K}^* := \{ y \mid \langle x, y \rangle \ge 0\ \forall x \in \mathcal{K} \}$. We will focus on the case where $\mathcal{K}$ is the Cartesian product of second-order cones (SOC), also called Lorentz cones [11]. In other words,
$$\mathcal{K} = \mathcal{K}^{n_1} \times \cdots \times \mathcal{K}^{n_N}, \tag{3}$$
J.-S. Chen: Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan.
e-mail:jschen@math.ntnu.edu.tw
P. Tseng: Department of Mathematics, University of Washington, Seattle, Washington 98195, USA.
e-mail:tseng@math.washington.edu
Mathematics Subject Classification (1991): 26B05, 26B35, 90C33, 65K05
where $N, n_1, \dots, n_N \ge 1$, $n_1 + \cdots + n_N = n$, and
$$\mathcal{K}^{n_i} := \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n_i - 1} \mid \|x_2\| \le x_1 \right\},$$
with $\|\cdot\|$ denoting the Euclidean norm and $\mathcal{K}^1$ denoting the set of nonnegative reals $\mathbb{R}_+$. A special case of (3) is $\mathcal{K} = \mathbb{R}^n_+$, the nonnegative orthant in $\mathbb{R}^n$, which corresponds to $N = n$ and $n_1 = \cdots = n_N = 1$. We will refer to (1), (2), (3) as the second-order cone complementarity problem (SOCCP).
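For concreteness, membership in the product cone (3) can be checked block by block; the following sketch (NumPy; the helper names `in_soc` and `in_product_cone` are ours, not from the paper) simply tests $\|x_2\| \le x_1$ on each block.

```python
import numpy as np

def in_soc(x, tol=0.0):
    """Check (x1, x2) in K^n, i.e. ||x2|| <= x1 (K^1 is the nonnegative reals)."""
    x = np.asarray(x, dtype=float)
    if x.size == 1:
        return bool(x[0] >= -tol)
    return bool(np.linalg.norm(x[1:]) <= x[0] + tol)

def in_product_cone(x, block_sizes):
    """Check membership in K^{n_1} x ... x K^{n_N}, with x split into the given blocks."""
    x = np.asarray(x, dtype=float)
    offsets = np.cumsum([0] + list(block_sizes))
    return all(in_soc(x[offsets[i]:offsets[i + 1]]) for i in range(len(block_sizes)))

print(in_soc([2.0, 1.0, 1.0]))                    # ||(1,1)|| = sqrt(2) <= 2 -> True
print(in_product_cone([1.0, 0.5, -1.0], [2, 1]))  # second block (K^1) is negative -> False
```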
An important special case of SOCCP corresponds to $G(\zeta) = \zeta$ for all $\zeta \in \mathbb{R}^n$. Then (1) and (2) reduce to
$$\langle F(\zeta), \zeta \rangle = 0, \qquad F(\zeta) \in \mathcal{K}, \qquad \zeta \in \mathcal{K}. \tag{4}$$
If $\mathcal{K} = \mathbb{R}^n_+$, then (4) reduces to the nonlinear complementarity problem (NCP) and (1)–(2) reduce to the vertical NCP [9]. The NCP plays a fundamental role in optimization theory and has many applications in engineering and economics; see, e.g., [9, 13–15].
Another important special case of SOCCP corresponds to the Karush-Kuhn-Tucker (KKT) optimality conditions for the convex second-order cone program (CSOCP):
$$\text{minimize } g(x) \quad \text{subject to } Ax = b,\ x \in \mathcal{K}, \tag{5}$$
where $A \in \mathbb{R}^{m \times n}$ has full row rank, $b \in \mathbb{R}^m$, and $g : \mathbb{R}^n \to \mathbb{R}$ is a convex, twice continuously differentiable function. When $g$ is linear, (5) reduces to the SOCP, which has numerous applications in engineering design, finance, and robust optimization, and includes as special cases convex quadratically constrained quadratic programs and linear programs (LP); see [1, 33] and references therein. The KKT optimality conditions for (5), which are sufficient but not necessary for optimality, are (1) and
$$Ax = b, \qquad y = \nabla g(x) - A^T \zeta_d \ \text{ for some } \zeta_d \in \mathbb{R}^m.$$
Choose any $d \in \mathbb{R}^n$ satisfying $Ad = b$. (If no such $d$ exists, then (5) has no feasible solution.) Let $B \in \mathbb{R}^{n \times (n-m)}$ be any matrix whose columns span the null space of $A$. Then $x$ satisfies $Ax = b$ if and only if $x = d + B\zeta_p$ for some $\zeta_p \in \mathbb{R}^{n-m}$. Thus, the KKT optimality conditions can be written in the form of (1) and (2) with
$$\zeta := (\zeta_p, \zeta_d), \qquad F(\zeta) := d + B\zeta_p, \qquad G(\zeta) := \nabla g(F(\zeta)) - A^T \zeta_d. \tag{6}$$
Alternatively, since any $\zeta \in \mathbb{R}^n$ can be decomposed into the sum of its orthogonal projections onto the column space of $A^T$ and the null space of $A$,
$$F(\zeta) := d + \left( I - A^T(AA^T)^{-1}A \right)\zeta, \qquad G(\zeta) := \nabla g(F(\zeta)) - A^T(AA^T)^{-1}A\zeta \tag{7}$$
can also be used in place of (6). For large problems where $A$ is sparse, (7) has the advantage that the main cost of evaluating the Jacobians $\nabla F$ and $\nabla G$ lies in inverting $AA^T$, which can be done efficiently via sparse Cholesky factorization. In contrast, (6) entails multiplication by the matrix $B$, which can be dense.
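The maps in (7) can be assembled directly. The sketch below (NumPy; the names `make_FG` and `grad_g` are illustrative, and a dense solve stands in for the sparse Cholesky factorization mentioned above) builds $F$ and $G$ for a small dense $A$ of full row rank.

```python
import numpy as np

def make_FG(A, b, grad_g):
    """Build F, G of (7): F(z) = d + (I - P)z, G(z) = grad_g(F(z)) - P z,
    where P = A^T (A A^T)^{-1} A projects onto the row space of A."""
    AAT = A @ A.T
    d = A.T @ np.linalg.solve(AAT, b)      # a particular solution of A d = b
    def P(z):                              # z -> A^T (A A^T)^{-1} A z
        return A.T @ np.linalg.solve(AAT, A @ z)
    F = lambda z: d + z - P(z)
    G = lambda z: grad_g(F(z)) - P(z)
    return F, G

# Tiny example: g(x) = 0.5*||x||^2, so grad_g is the identity map.
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
F, G = make_FG(A, b, lambda x: x)
z = np.array([0.3, -0.2, 0.5])
print(np.allclose(A @ F(z), b))   # F(z) is always feasible for Ax = b -> True
```

Feasibility of $F(\zeta)$ holds by construction, since $A(I - A^T(AA^T)^{-1}A) = 0$.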
Various methods have been proposed for solving CSOCP and SOCCP. They include interior-point methods [2, 3, 33, 36, 37, 42, 52], reformulation of the SOC constraints as smooth convex constraints [4], (non-interior) smoothing Newton methods [6, 19], and smoothing-regularization methods [22]. These methods require solving a nontrivial system of linear equations at each iteration. For the case where $G \equiv I$ and $F$ is affine with $\nabla F$ strictly $\mathcal{K}$-copositive, a matrix splitting method has been proposed [21]. In this paper, we study an alternative approach based on reformulating CSOCP and SOCCP as an unconstrained smooth minimization problem. In particular, we aim to find a smooth function $\psi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ such that
$$\psi(x, y) = 0 \iff (x, y) \text{ satisfies (1)}. \tag{8}$$
We call such a $\psi$ a merit function. Then SOCCP can be expressed as an unconstrained smooth (global) minimization problem:
$$\min_{\zeta \in \mathbb{R}^n}\ f(\zeta) := \psi(F(\zeta), G(\zeta)). \tag{9}$$
Various gradient methods, such as conjugate gradient methods and (limited-memory) quasi-Newton methods [5, 18, 38], can then be applied to solve (9). They have the advantage of requiring less work per iteration than interior-point methods and non-interior Newton methods. This approach can also be combined with smoothing and nonsmooth Newton methods to improve the efficiency and robustness of the latter, as was done in the case of NCP [7, 8, 12, 17, 24, 27, 30]. For this approach to be effective, the choice of $\psi$ is crucial. In the case of NCP, corresponding to (4) and $\mathcal{K} = \mathbb{R}^n_+$, a popular choice is
$$\psi(x, y) = \frac{1}{2} \sum_{i=1}^n \phi(x_i, y_i)^2$$
for all $x = (x_1, \dots, x_n)^T, y = (y_1, \dots, y_n)^T \in \mathbb{R}^n$, where $\phi$ is the well-known Fischer-Burmeister (FB) NCP-function [16, 17] defined by
$$\phi(x_i, y_i) = \sqrt{x_i^2 + y_i^2} - x_i - y_i.$$
It has been shown that $\psi$ is smooth (even though $\phi$ is not differentiable) and satisfies (8) [10, 25, 26]. Moreover, when $F$ is monotone or, more generally, a $P_0$-function, every stationary point of $\zeta \mapsto \psi(F(\zeta), \zeta)$ is a solution of NCP [10, 20]. This is an important property since (i) gradient methods are guaranteed to find only stationary points, and (ii) when an LP is reformulated as an NCP, the resulting $F$ is monotone, but neither strongly monotone nor a uniformly $P$-function. In contrast, other smooth merit functions for NCP, such as the implicit Lagrangian and the D-gap function [28, 35, 40, 45, 51, 54], require $F$ to be a uniformly $P$-function in order for stationary points to be solutions of NCP. Thus these other merit functions cannot be used for LP. Subsequently, a number of variants of $\psi$ with additional desirable properties have been proposed, e.g., [6, 10, 29, 31, 34, 41, 47, 49, 53]. A recent discussion of these variants can be found in [47].
Moreover, the above merit function $\psi$, as well as a related merit function of Yamashita and Fukushima [53], have been extended to the semidefinite complementarity problem (SDCP), which has the form (1), (2), but with $x, y$ being $q \times q$ ($q \ge 1$) real symmetric block-diagonal matrices of fixed block sizes, $\langle \cdot, \cdot \rangle$ being the trace inner product, and $\mathcal{K}$ being the cone of $q \times q$ block-diagonal positive semidefinite matrices of fixed block sizes [50, 53]. However, the analysis in [50] showed $\psi$ to be differentiable, but did not show it to be smooth.¹
Can the above merit functions for NCP be extended to SOCCP? To our knowledge, this question has not been studied previously; we study it in this paper. We are motivated by previous work on extending merit functions from NCP to SDCP [50, 53]. We are further motivated by a recent work [19] showing that the FB function extends from NCP to SOCCP via the Jordan product associated with SOC [11]. Nice properties of the FB function, such as strong semismoothness, are preserved when extended to SOCCP [48]. More specifically, for any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we define their Jordan product associated with $\mathcal{K}^n$ as
$$x \cdot y := \left( \langle x, y \rangle,\ y_1 x_2 + x_1 y_2 \right). \tag{10}$$
The identity element under this product is $e := (1, 0, \dots, 0)^T \in \mathbb{R}^n$. We write $x^2$ to mean $x \cdot x$ and write $x + y$ to mean the usual componentwise addition of vectors. It is known that $x^2 \in \mathcal{K}^n$ for all $x \in \mathbb{R}^n$. Moreover, if $x \in \mathcal{K}^n$, then there exists a unique vector in $\mathcal{K}^n$, denoted by $x^{1/2}$, such that $(x^{1/2})^2 = x^{1/2} \cdot x^{1/2} = x$. Then
$$\phi(x, y) := (x^2 + y^2)^{1/2} - x - y \tag{11}$$
is well defined for all $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$ and maps $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^n$. It was shown in [19] that $\phi(x, y) = 0$ if and only if $(x, y)$ satisfies (1). Thus,
$$\psi_{\mathrm{FB}}(x, y) := \frac{1}{2} \sum_{i=1}^N \left\| \phi(x_i, y_i) \right\|^2, \tag{12}$$
where $x = (x_1, \dots, x_N)^T,\ y = (y_1, \dots, y_N)^T \in \mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_N}$, is a merit function for SOCCP. We will show that, as in the NCP case, $\psi_{\mathrm{FB}}$ is smooth and, when $\nabla F$ and $-\nabla G$ are column monotone, every stationary point of (9) solves SOCCP; see Propositions 2 and 3. The same holds for the following analog of the SDCP merit function studied by Yamashita and Fukushima [53]:
$$\psi_{\mathrm{YF}}(x, y) := \psi_0(\langle x, y \rangle) + \psi_{\mathrm{FB}}(x, y), \tag{13}$$
where $\psi_0 : \mathbb{R} \to [0, \infty)$ is any smooth function satisfying
$$\psi_0(t) = 0\ \ \forall t \le 0 \qquad \text{and} \qquad \psi_0(t) > 0\ \ \forall t > 0; \tag{14}$$
see Proposition 4. In [53], $\psi_0(t) = \frac{1}{4}(\max\{0, t\})^4$ was considered. Analogous to the NCP and SDCP cases, when $\nabla G(\zeta)$ is invertible, a $\nabla F$-free descent direction for
$$f_{\mathrm{FB}}(\zeta) := \psi_{\mathrm{FB}}(F(\zeta), G(\zeta)) \tag{15}$$
and
$$f_{\mathrm{YF}}(\zeta) := \psi_{\mathrm{YF}}(F(\zeta), G(\zeta)) \tag{16}$$
can be found. The function $f_{\mathrm{YF}}$, compared to $f_{\mathrm{FB}}$, has additional bounded level-set and error bound properties; see Section 5. Our proof of the smoothness of $\psi_{\mathrm{FB}}$ in Section 3 is quite technical, but further simplification seems difficult. In particular, neither general properties of the Jordan product associated with symmetric cones [11] nor the strong semismoothness proof for $\phi$ given in [48] lends itself readily to a smoothness proof for $\psi_{\mathrm{FB}}$. In Section 6, we report our numerical experience with solving SOCPs (5) from the DIMACS library by using a limited-memory BFGS (L-BFGS) method to minimize $f_{\mathrm{FB}}$, with $F$ and $G$ given by (7). On problems with $n \gg m$ and for low-to-medium solution accuracy, L-BFGS appears to be competitive with interior-point methods. We also report our experience with solving CSOCP using a BFGS method to minimize $f_{\mathrm{FB}}$.

It is known that SOCCP can be reduced to an SDCP by observing that, for any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, we have $x \in \mathcal{K}^n$ if and only if
$$L_x := \begin{bmatrix} x_1 & x_2^T \\ x_2 & x_1 I \end{bmatrix}$$
is positive semidefinite (also see [19, p. 437] and [44]). However, this reduction increases the problem dimension from $n$ to $n(n+1)/2$, and it is not known whether this increase can be mitigated by exploiting the special "arrow" structure of $L_x$.

¹ During the revision of this paper, a proof of smoothness was reported in [43].
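The equivalence between $x \in \mathcal{K}^n$ and positive semidefiniteness of the arrow matrix $L_x$ can be checked numerically. The following sketch (NumPy; the helper name `arrow` is ours) uses the standard fact that $L_x$ has eigenvalues $x_1 \pm \|x_2\|$ and $x_1$ (with multiplicity $n - 2$).

```python
import numpy as np

def arrow(x):
    """The 'arrow' matrix L_x = [[x1, x2^T], [x2, x1 I]] for x = (x1, x2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    L = x[0] * np.eye(n)
    L[0, 1:] = x[1:]
    L[1:, 0] = x[1:]
    return L

x = np.array([2.0, 1.0, 1.0])   # ||x2|| = sqrt(2) <= 2, so x lies in K^3
print(np.all(np.linalg.eigvalsh(arrow(x)) >= -1e-12))   # True: L_x is PSD
y = np.array([1.0, 2.0, 0.0])   # ||y2|| = 2 > 1, so y is outside K^3
print(np.min(np.linalg.eigvalsh(arrow(y))) < 0)         # True: L_y is not PSD
```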
Throughout this paper, $\mathbb{R}^n$ denotes the space of $n$-dimensional real column vectors, and $^T$ denotes transpose. For any differentiable function $f : \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x)$ denotes the gradient of $f$ at $x$. For any differentiable mapping $F = (F_1, \dots, F_m)^T : \mathbb{R}^n \to \mathbb{R}^m$, $\nabla F(x) = [\nabla F_1(x) \cdots \nabla F_m(x)]$ denotes the transposed Jacobian of $F$ at $x$. For any symmetric matrices $A, B \in \mathbb{R}^{n \times n}$, we write $A \succeq B$ (respectively, $A \succ B$) to mean that $A - B$ is positive semidefinite (respectively, positive definite). For nonnegative scalars $\alpha$ and $\beta$, we write $\alpha = O(\beta)$ to mean $\alpha \le C\beta$, with $C$ independent of $\alpha$ and $\beta$.
2. Jordan product and spectral factorization
It is known that $\mathcal{K}^n$ is a closed convex self-dual cone with nonempty interior given by
$$\mathrm{int}(\mathcal{K}^n) = \left\{ (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} \mid \|x_2\| < x_1 \right\}.$$
The Jordan product (10), unlike scalar or matrix multiplication, is not associative, which is a main source of complication in the analysis of SOCCP. For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, its determinant is defined by
$$\det(x) := x_1^2 - \|x_2\|^2.$$
In general, $\det(x \cdot y) \ne \det(x)\det(y)$ unless $x_2 = y_2$.
We next recall from [19] that each $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ admits a spectral factorization, associated with $\mathcal{K}^n$, of the form
$$x = \lambda_1 u^{(1)} + \lambda_2 u^{(2)},$$
where $\lambda_1, \lambda_2$ and $u^{(1)}, u^{(2)}$ are the spectral values and the associated spectral vectors of $x$, given by
$$\lambda_i = x_1 + (-1)^i\|x_2\|, \qquad u^{(i)} = \begin{cases} \dfrac{1}{2}\left( 1,\ (-1)^i \dfrac{x_2}{\|x_2\|} \right) & \text{if } x_2 \ne 0; \\[6pt] \dfrac{1}{2}\left( 1,\ (-1)^i w_2 \right) & \text{if } x_2 = 0, \end{cases}$$
for $i = 1, 2$, with $w_2$ being any vector in $\mathbb{R}^{n-1}$ satisfying $\|w_2\| = 1$. If $x_2 \ne 0$, the factorization is unique.
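As a minimal numerical sketch (NumPy; the name `spectral` is ours), the factorization above can be computed and verified as follows; note that for $x_2 = 0$ the code picks one admissible unit vector $w_2$.

```python
import numpy as np

def spectral(x):
    """Spectral factorization of x = (x1, x2) w.r.t. K^n:
    returns (lam1, lam2, u1, u2) with x = lam1*u1 + lam2*u2."""
    x = np.asarray(x, dtype=float)
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    if nrm > 0:
        w2 = x2 / nrm
    else:                         # x2 = 0: any unit vector w2 is admissible
        w2 = np.zeros_like(x2)
        w2[0] = 1.0
    lam1, lam2 = x1 - nrm, x1 + nrm
    u1 = 0.5 * np.concatenate(([1.0], -w2))
    u2 = 0.5 * np.concatenate(([1.0], w2))
    return lam1, lam2, u1, u2

x = np.array([1.0, 3.0, 4.0])                  # ||x2|| = 5
lam1, lam2, u1, u2 = spectral(x)
print(lam1, lam2)                              # -4.0 6.0
print(np.allclose(lam1 * u1 + lam2 * u2, x))   # True: x is reconstructed
```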
The above spectral factorization of $x$, as well as $x^2$, $x^{1/2}$, and the matrix $L_x$, have various interesting properties; see [19]. We list four properties that we will use later.

Property 1. For any $x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with spectral values $\lambda_1, \lambda_2$ and spectral vectors $u^{(1)}, u^{(2)}$, the following results hold.

(a) $x^2 = \lambda_1^2 u^{(1)} + \lambda_2^2 u^{(2)} \in \mathcal{K}^n$.

(b) If $x \in \mathcal{K}^n$, then $0 \le \lambda_1 \le \lambda_2$ and $x^{1/2} = \sqrt{\lambda_1}\, u^{(1)} + \sqrt{\lambda_2}\, u^{(2)}$.

(c) If $x \in \mathrm{int}(\mathcal{K}^n)$, then $0 < \lambda_1 \le \lambda_2$, $\det(x) = \lambda_1\lambda_2$, and $L_x$ is invertible with
$$L_x^{-1} = \frac{1}{\det(x)} \begin{bmatrix} x_1 & -x_2^T \\[4pt] -x_2 & \dfrac{\det(x)}{x_1} I + \dfrac{1}{x_1} x_2 x_2^T \end{bmatrix}.$$

(d) $x \cdot y = L_x y$ for all $y \in \mathbb{R}^n$, and $L_x \succ 0$ if and only if $x \in \mathrm{int}(\mathcal{K}^n)$.
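Property 1(b)–(d) can be verified numerically. This self-contained sketch (NumPy; helper names are ours) computes $x^{1/2}$ from the spectral factorization, checks $(x^{1/2})^2 = x$ under the Jordan product, and checks the closed-form inverse of Property 1(c) against direct matrix inversion.

```python
import numpy as np

def jordan(x, y):
    """Jordan product (10): x.y = (<x,y>, y1*x2 + x1*y2)."""
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def soc_sqrt(x):
    """x^{1/2} via Property 1(b): sqrt of spectral values, same spectral vectors."""
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    w2 = x2 / nrm if nrm > 0 else np.eye(x.size - 1)[0]
    lam1, lam2 = x1 - nrm, x1 + nrm
    u1 = 0.5 * np.concatenate(([1.0], -w2))
    u2 = 0.5 * np.concatenate(([1.0], w2))
    return np.sqrt(lam1) * u1 + np.sqrt(lam2) * u2

def arrow(v):
    n = v.size
    L = v[0] * np.eye(n)
    L[0, 1:] = v[1:]
    L[1:, 0] = v[1:]
    return L

x = np.array([3.0, 1.0, 2.0])        # ||x2|| = sqrt(5) < 3, so x in int(K^3)
s = soc_sqrt(x)
print(np.allclose(jordan(s, s), x))  # True: (x^{1/2})^2 = x

# Property 1(c): the closed-form L_x^{-1}.
detx = x[0]**2 - x[1:] @ x[1:]
inv = np.empty((3, 3))
inv[0, 0] = x[0]
inv[0, 1:] = -x[1:]
inv[1:, 0] = -x[1:]
inv[1:, 1:] = (detx / x[0]) * np.eye(2) + np.outer(x[1:], x[1:]) / x[0]
inv /= detx
print(np.allclose(inv, np.linalg.inv(arrow(x))))   # True
```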
3. Smoothness property of merit functions
In this section we show that the functions (12) and (13) are smooth functions satisfying (8). For simplicity, we focus on the special case of $N = 1$, i.e.,
$$\psi_{\mathrm{FB}}(x, y) = \frac{1}{2}\|\phi(x, y)\|^2 \tag{17}$$
in this and the next two sections. Extension of our analysis to the general case of $N \ge 1$ is straightforward. We begin with the following result from [19], showing that the FB function $\phi$ given by (11) has properties analogous to the NCP and SDCP cases. Additional properties of $\phi$ are studied in [19, 48].

Lemma 1. ([19, Proposition 2.1]) Let $\phi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ be given by (11). Then
$$\phi(x, y) = 0 \iff x, y \in \mathcal{K}^n,\ x \cdot y = 0 \iff x, y \in \mathcal{K}^n,\ \langle x, y \rangle = 0.$$
Since $x^2, y^2 \in \mathcal{K}^n$ for any $x, y \in \mathbb{R}^n$, we have that $x^2 + y^2 = \left( \|x\|^2 + \|y\|^2,\ 2x_1x_2 + 2y_1y_2 \right) \in \mathcal{K}^n$. Thus
$$x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n) \iff \|x\|^2 + \|y\|^2 > 2\|x_1x_2 + y_1y_2\|. \tag{18}$$
The spectral values of $x^2 + y^2$ are
$$\lambda_1 := \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\|, \qquad \lambda_2 := \|x\|^2 + \|y\|^2 + 2\|x_1x_2 + y_1y_2\|. \tag{19}$$
Then, by Property 1(b), $z := (x^2 + y^2)^{1/2}$ has the spectral values $\sqrt{\lambda_1}, \sqrt{\lambda_2}$ and
$$z = (z_1, z_2) = \left( \frac{\sqrt{\lambda_1} + \sqrt{\lambda_2}}{2},\ \frac{\sqrt{\lambda_2} - \sqrt{\lambda_1}}{2}\, w_2 \right), \tag{20}$$
where $w_2 := \dfrac{x_1x_2 + y_1y_2}{\|x_1x_2 + y_1y_2\|}$ if $x_1x_2 + y_1y_2 \ne 0$, and otherwise $w_2$ is any vector in $\mathbb{R}^{n-1}$ satisfying $\|w_2\| = 1$. The next key lemma, describing special properties of $x, y$ with $x^2 + y^2 \notin \mathrm{int}(\mathcal{K}^n)$, will be used to prove Propositions 1, 2, and Lemma 6.
Lemma 2. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with $x^2 + y^2 \notin \mathrm{int}(\mathcal{K}^n)$, we have
$$x_1^2 = \|x_2\|^2, \qquad y_1^2 = \|y_2\|^2, \qquad x_1y_1 = x_2^Ty_2, \qquad x_1y_2 = y_1x_2.$$

Proof. By (18), $\|x\|^2 + \|y\|^2 = 2\|x_1x_2 + y_1y_2\|$. Thus $\left( \|x\|^2 + \|y\|^2 \right)^2 = 4\|x_1x_2 + y_1y_2\|^2$, so that
$$\|x\|^4 + 2\|x\|^2\|y\|^2 + \|y\|^4 = 4(x_1x_2 + y_1y_2)^T(x_1x_2 + y_1y_2).$$
Notice that $\|x\|^2 = x_1^2 + \|x_2\|^2$ and $\|y\|^2 = y_1^2 + \|y_2\|^2$. Thus,
$$\left( x_1^2 + \|x_2\|^2 \right)^2 + 2\|x\|^2\|y\|^2 + \left( y_1^2 + \|y_2\|^2 \right)^2 = 4x_1^2\|x_2\|^2 + 8x_1y_1x_2^Ty_2 + 4y_1^2\|y_2\|^2.$$
Simplifying the above expression yields
$$\left( x_1^2 - \|x_2\|^2 \right)^2 + \left( y_1^2 - \|y_2\|^2 \right)^2 + \left( 2\|x\|^2\|y\|^2 - 8x_1y_1x_2^Ty_2 \right) = 0.$$
The first two terms are nonnegative. The third term is also nonnegative because
$$\|x\|^2\|y\|^2 = \left( x_1^2 + \|x_2\|^2 \right)\left( y_1^2 + \|y_2\|^2 \right) \ge \left( 2|x_1|\|x_2\| \right)\left( 2|y_1|\|y_2\| \right) = 4|x_1||y_1|\|x_2\|\|y_2\| \ge 4x_1y_1x_2^Ty_2.$$
Hence
$$x_1^2 = \|x_2\|^2, \qquad y_1^2 = \|y_2\|^2, \qquad 2\|x\|^2\|y\|^2 - 8x_1y_1x_2^Ty_2 = 0.$$
Substituting $x_1^2 = \|x_2\|^2$ and $y_1^2 = \|y_2\|^2$ into the last equation, the resulting three equations imply $x_1y_1 = x_2^Ty_2$.

It remains to prove that $x_1y_2 = y_1x_2$. If $x_1 = 0$, then $\|x_2\| = |x_1| = 0$, so this relation is true. Symmetrically, if $y_1 = 0$, then this relation is also true. Suppose that $x_1 \ne 0$ and $y_1 \ne 0$. Then $x_2 \ne 0$, $y_2 \ne 0$, and
$$x_1y_1 = x_2^Ty_2 = \|x_2\|\|y_2\|\cos\theta = |x_1||y_1|\cos\theta,$$
where $\theta$ is the angle between $x_2$ and $y_2$. Thus $\cos\theta \in \{-1, 1\}$, i.e., $y_2 = \alpha x_2$ for some $\alpha \ne 0$. Then
$$x_1y_1 = x_2^Ty_2 = \alpha\|x_2\|^2 = \alpha x_1^2,$$
so that $y_1/x_1 = \alpha$. Thus $y_2 = x_2 y_1/x_1$. $\square$
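Lemma 2 can be sanity-checked numerically. The construction below (ours, not from the paper) takes $\|x_2\| = |x_1|$ and $y$ a scalar multiple of $x$, which by (18) places $x^2 + y^2$ on the boundary of $\mathcal{K}^3$; all four identities of the lemma then hold.

```python
import numpy as np

# A point with x^2 + y^2 on the boundary of K^3: ||x2|| = |x1| and y = alpha*x.
x = np.array([np.sqrt(2.0), 1.0, 1.0])     # x1^2 = 2 = ||x2||^2
y = -0.5 * x

# The four identities of Lemma 2:
print(np.isclose(x[0]**2, x[1:] @ x[1:]))       # x1^2 = ||x2||^2 -> True
print(np.isclose(y[0]**2, y[1:] @ y[1:]))       # y1^2 = ||y2||^2 -> True
print(np.isclose(x[0] * y[0], x[1:] @ y[1:]))   # x1*y1 = x2^T y2 -> True
print(np.allclose(x[0] * y[1:], y[0] * x[1:]))  # x1*y2 = y1*x2   -> True
```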
The next technical lemma shows that two squared terms are bounded above by a quantity that measures how close $x^2 + y^2$ comes to the boundary of $\mathcal{K}^n$ (cf. (18)). This lemma will be used to prove Lemma 4 and Proposition 2.

Lemma 3. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$ with $x_1x_2 + y_1y_2 \ne 0$, we have
$$\left( x_1 - \frac{(x_1x_2 + y_1y_2)^Tx_2}{\|x_1x_2 + y_1y_2\|} \right)^2 \le \left\| x_2 - x_1\frac{x_1x_2 + y_1y_2}{\|x_1x_2 + y_1y_2\|} \right\|^2 \le \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\|.$$
Proof. The first inequality can be seen by expanding the square on both sides and using the Cauchy-Schwarz inequality. It remains to prove the second inequality. Let us multiply both sides of this inequality by
$$\|x_1x_2 + y_1y_2\|^2 = x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2$$
and let $L$ and $R$ denote, respectively, the resulting left-hand side and right-hand side. Since $x_1x_2 + y_1y_2 \ne 0$, the second inequality is equivalent to $R - L \ge 0$. We have
$$\begin{aligned}
L &= \left( \|x_2\|^2 - 2x_1\frac{(x_1x_2 + y_1y_2)^Tx_2}{\|x_1x_2 + y_1y_2\|} + x_1^2 \right)\|x_1x_2 + y_1y_2\|^2 \\
&= \|x_2\|^2\left( x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2 \right) - 2x_1\left( x_1\|x_2\|^2 + y_1x_2^Ty_2 \right)\|x_1x_2 + y_1y_2\| \\
&\quad + x_1^2\left( x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2 \right) \\
&= x_1^2\|x_2\|^4 + 2x_1y_1x_2^Ty_2\|x_2\|^2 + y_1^2\|x_2\|^2\|y_2\|^2 - 2x_1^2\|x_2\|^2\|x_1x_2 + y_1y_2\| \\
&\quad - 2x_1y_1x_2^Ty_2\|x_1x_2 + y_1y_2\| + x_1^4\|x_2\|^2 + 2x_1^3y_1x_2^Ty_2 + x_1^2y_1^2\|y_2\|^2,
\end{aligned}$$
and
$$\begin{aligned}
R &= \left( \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\| \right)\|x_1x_2 + y_1y_2\|^2 \\
&= \left( x_1^2 + \|x_2\|^2 - 2\|x_1x_2 + y_1y_2\| \right)\|x_1x_2 + y_1y_2\|^2 + \|y\|^2\|x_1x_2 + y_1y_2\|^2 \\
&= \left( x_1^2 + \|x_2\|^2 - 2\|x_1x_2 + y_1y_2\| \right)\left( x_1^2\|x_2\|^2 + 2x_1y_1x_2^Ty_2 + y_1^2\|y_2\|^2 \right) + \|y\|^2\|x_1x_2 + y_1y_2\|^2 \\
&= x_1^4\|x_2\|^2 + 2x_1^3y_1x_2^Ty_2 + x_1^2y_1^2\|y_2\|^2 + x_1^2\|x_2\|^4 + 2x_1y_1x_2^Ty_2\|x_2\|^2 + y_1^2\|x_2\|^2\|y_2\|^2 \\
&\quad - 2x_1^2\|x_2\|^2\|x_1x_2 + y_1y_2\| - 4x_1y_1x_2^Ty_2\|x_1x_2 + y_1y_2\| \\
&\quad - 2y_1^2\|y_2\|^2\|x_1x_2 + y_1y_2\| + \|y\|^2\|x_1x_2 + y_1y_2\|^2.
\end{aligned}$$
Thus, taking the difference and using the Cauchy-Schwarz inequality yields
$$\begin{aligned}
R - L &= \|y\|^2\|x_1x_2 + y_1y_2\|^2 - 2x_1y_1x_2^Ty_2\|x_1x_2 + y_1y_2\| - 2y_1^2\|y_2\|^2\|x_1x_2 + y_1y_2\| \\
&= y_1^2\|x_1x_2 + y_1y_2\|^2 + \|y_2\|^2\|x_1x_2 + y_1y_2\|^2 - 2y_1y_2^T(x_1x_2 + y_1y_2)\|x_1x_2 + y_1y_2\| \\
&\ge y_1^2\|x_1x_2 + y_1y_2\|^2 + \|y_2\|^2\|x_1x_2 + y_1y_2\|^2 - 2|y_1|\|y_2\|\,\|x_1x_2 + y_1y_2\|^2 \\
&= \left( |y_1| - \|y_2\| \right)^2\|x_1x_2 + y_1y_2\|^2 \ \ge\ 0. \qquad \square
\end{aligned}$$
Using Lemmas 1, 2, 3, and [19, Proposition 5.2], we now prove our first main result, showing that $\psi_{\mathrm{FB}}$ is differentiable and that its gradient has a computable formula.

Proposition 1. Let $\phi$ be given by (11). Then $\psi_{\mathrm{FB}}$ given by (17) has the following properties.

(a) $\psi_{\mathrm{FB}} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ and satisfies (8).

(b) $\psi_{\mathrm{FB}}$ is differentiable at every $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$. Moreover, $\nabla_x\psi_{\mathrm{FB}}(0, 0) = \nabla_y\psi_{\mathrm{FB}}(0, 0) = 0$. If $(x, y) \ne (0, 0)$ and $x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n)$, then
$$\nabla_x\psi_{\mathrm{FB}}(x, y) = \left( L_x L_{(x^2+y^2)^{1/2}}^{-1} - I \right)\phi(x, y), \qquad \nabla_y\psi_{\mathrm{FB}}(x, y) = \left( L_y L_{(x^2+y^2)^{1/2}}^{-1} - I \right)\phi(x, y). \tag{21}$$
If $(x, y) \ne (0, 0)$ and $x^2 + y^2 \notin \mathrm{int}(\mathcal{K}^n)$, then $x_1^2 + y_1^2 \ne 0$ and
$$\nabla_x\psi_{\mathrm{FB}}(x, y) = \left( \frac{x_1}{\sqrt{x_1^2 + y_1^2}} - 1 \right)\phi(x, y), \tag{22}$$
$$\nabla_y\psi_{\mathrm{FB}}(x, y) = \left( \frac{y_1}{\sqrt{x_1^2 + y_1^2}} - 1 \right)\phi(x, y). \tag{23}$$
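The gradient formula (21) can be checked against finite differences at a generic point, where $x^2 + y^2$ lies in $\mathrm{int}(\mathcal{K}^n)$ almost surely. The sketch below (NumPy; helper names are ours) is illustrative only.

```python
import numpy as np

def soc_sqrt(x):
    """(x)^{1/2} for x in K^n, via the spectral factorization."""
    x1, x2 = x[0], x[1:]
    nrm = np.linalg.norm(x2)
    w2 = x2 / nrm if nrm > 0 else np.eye(x.size - 1)[0]
    l1, l2 = x1 - nrm, x1 + nrm
    return 0.5 * np.concatenate(([np.sqrt(l1) + np.sqrt(l2)],
                                 (np.sqrt(l2) - np.sqrt(l1)) * w2))

def jordan(x, y):
    return np.concatenate(([x @ y], y[0] * x[1:] + x[0] * y[1:]))

def arrow(v):
    n = v.size
    L = v[0] * np.eye(n)
    L[0, 1:] = v[1:]
    L[1:, 0] = v[1:]
    return L

def phi(x, y):
    return soc_sqrt(jordan(x, x) + jordan(y, y)) - x - y

def psi(x, y):
    p = phi(x, y)
    return 0.5 * p @ p

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)

# Formula (21): grad_x psi = (L_x L_z^{-1} - I) phi(x, y), z = (x^2 + y^2)^{1/2}.
z = soc_sqrt(jordan(x, x) + jordan(y, y))
gx = (arrow(x) @ np.linalg.inv(arrow(z)) - np.eye(3)) @ phi(x, y)

# Compare with central finite differences.
eps = 1e-6
num = np.array([(psi(x + eps * e, y) - psi(x - eps * e, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(gx, num, atol=1e-5))   # True
```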
Proof. (a) This follows from Lemma 1.

(b) Case (1): $x = y = 0$.
For any $h, k \in \mathbb{R}^n$, let $\mu_1 \le \mu_2$ be the spectral values and let $v^{(1)}, v^{(2)}$ be the corresponding spectral vectors of $h^2 + k^2$. Then, by Property 1(b),
$$\left\| (h^2 + k^2)^{1/2} - h - k \right\| = \left\| \sqrt{\mu_1}\,v^{(1)} + \sqrt{\mu_2}\,v^{(2)} - h - k \right\| \le \sqrt{\mu_1}\|v^{(1)}\| + \sqrt{\mu_2}\|v^{(2)}\| + \|h\| + \|k\| = \frac{\sqrt{\mu_1} + \sqrt{\mu_2}}{\sqrt{2}} + \|h\| + \|k\|.$$
Also,
$$\mu_1 \le \mu_2 = \|h\|^2 + \|k\|^2 + 2\|h_1h_2 + k_1k_2\| \le \|h\|^2 + \|k\|^2 + 2|h_1|\|h_2\| + 2|k_1|\|k_2\| \le 2\|h\|^2 + 2\|k\|^2.$$
Combining the above two inequalities yields
$$\psi_{\mathrm{FB}}(h, k) - \psi_{\mathrm{FB}}(0, 0) = \frac{1}{2}\left\| (h^2 + k^2)^{1/2} - h - k \right\|^2 \le \frac{1}{2}\left( \frac{\sqrt{\mu_1} + \sqrt{\mu_2}}{\sqrt{2}} + \|h\| + \|k\| \right)^2 \le \frac{1}{2}\left( \frac{2\sqrt{2\|h\|^2 + 2\|k\|^2}}{\sqrt{2}} + \|h\| + \|k\| \right)^2 = O\!\left( \|h\|^2 + \|k\|^2 \right).$$
This shows that $\psi_{\mathrm{FB}}$ is differentiable at $(0, 0)$ with
$$\nabla_x\psi_{\mathrm{FB}}(0, 0) = \nabla_y\psi_{\mathrm{FB}}(0, 0) = 0.$$
Case (2): $(x, y) \ne (0, 0)$ and $x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n)$.
Since $x^2 + y^2 \in \mathrm{int}(\mathcal{K}^n)$, Proposition 5.2 of [19] implies that $\phi$ is continuously differentiable at $(x, y)$. Since $\psi_{\mathrm{FB}}$ is the composition of $\phi$ with $x \mapsto \frac{1}{2}\|x\|^2$, $\psi_{\mathrm{FB}}$ is continuously differentiable at $(x, y)$. The expressions (21) for $\nabla_x\psi_{\mathrm{FB}}(x, y)$ and $\nabla_y\psi_{\mathrm{FB}}(x, y)$ follow from the chain rule and the expression for the Jacobian of $\phi$ given in [19, Proposition 5.2] (also see [19, Corollary 5.4]).
Case (3): $(\bar{x}, \bar{y}) \ne (0, 0)$ and $\bar{x}^2 + \bar{y}^2 \notin \mathrm{int}(\mathcal{K}^n)$.
By (18), $\|\bar{x}\|^2 + \|\bar{y}\|^2 = 2\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$. Since $(\bar{x}, \bar{y}) \ne (0, 0)$, this also implies $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, so Lemmas 2 and 3 are applicable. By (20),
$$(\bar{x}^2 + \bar{y}^2)^{1/2} = \left( \frac{\sqrt{\lambda_1} + \sqrt{\lambda_2}}{2},\ \frac{\sqrt{\lambda_2} - \sqrt{\lambda_1}}{2}\,\bar{w}_2 \right),$$
where $\lambda_1, \lambda_2$ are given by (19), evaluated at $(\bar{x}, \bar{y})$, and $\bar{w}_2 := \dfrac{\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|}$. Thus $\lambda_1 = 0$ and $\lambda_2 > 0$.
Since $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, we have $x_1x_2 + y_1y_2 \ne 0$ for all $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$ sufficiently near to $(\bar{x}, \bar{y})$. Moreover,
$$\begin{aligned}
2\psi_{\mathrm{FB}}(x, y) &= \left\| (x^2 + y^2)^{1/2} - x - y \right\|^2 \\
&= \left\| (x^2 + y^2)^{1/2} \right\|^2 + \|x + y\|^2 - 2\left\langle (x^2 + y^2)^{1/2},\ x + y \right\rangle \\
&= \|x\|^2 + \|y\|^2 + \|x + y\|^2 - 2\left\langle (x^2 + y^2)^{1/2},\ x + y \right\rangle,
\end{aligned}$$
where the third equality uses the observation that $\|z\|^2 = \langle z^2, e \rangle$ for any $z \in \mathbb{R}^n$. Since $\|x\|^2 + \|y\|^2 + \|x + y\|^2$ is clearly differentiable in $(x, y)$, it suffices to show that
$$\begin{aligned}
2\left\langle (x^2 + y^2)^{1/2},\ x + y \right\rangle &= (\sqrt{\mu_2} + \sqrt{\mu_1})(x_1 + y_1) + (\sqrt{\mu_2} - \sqrt{\mu_1})\frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \\
&= \sqrt{\mu_2}\left( x_1 + y_1 + \frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \right) + \sqrt{\mu_1}\left( x_1 + y_1 - \frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \right)
\end{aligned} \tag{24}$$
is differentiable at $(x, y) = (\bar{x}, \bar{y})$, where $\mu_1, \mu_2$ are the spectral values of $x^2 + y^2$, i.e., $\mu_i = \|x\|^2 + \|y\|^2 + 2(-1)^i\|x_1x_2 + y_1y_2\|$. Since $\lambda_2 > 0$, we see that the first term on the right-hand side of (24) is differentiable at $(x, y) = (\bar{x}, \bar{y})$. We claim that the second term on the right-hand side of (24) is $o(\|h\| + \|k\|)$ with $h := x - \bar{x}$, $k := y - \bar{y}$, i.e., it is differentiable with zero gradient. To see this, notice that $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, so that $\mu_1 = \|x\|^2 + \|y\|^2 - 2\|x_1x_2 + y_1y_2\|$, viewed as a function of $(x, y)$, is differentiable at $(x, y) = (\bar{x}, \bar{y})$. Moreover, $\mu_1 = \lambda_1 = 0$ when $(x, y) = (\bar{x}, \bar{y})$. Thus, a first-order Taylor expansion of $\mu_1$ at $(\bar{x}, \bar{y})$ yields
$$\mu_1 = O(\|x - \bar{x}\| + \|y - \bar{y}\|) = O(\|h\| + \|k\|).$$
Also, since $\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2 \ne 0$, by the product and quotient rules for differentiation, the function
$$x_1 + y_1 - \frac{(x_1x_2 + y_1y_2)^T(x_2 + y_2)}{\|x_1x_2 + y_1y_2\|} \tag{25}$$
is differentiable at $(x, y) = (\bar{x}, \bar{y})$. Moreover, the function (25) has value 0 at $(x, y) = (\bar{x}, \bar{y})$. This is because
$$\bar{x}_1 + \bar{y}_1 - \frac{(\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2)^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|} = \left( \bar{x}_1 - \bar{w}_2^T\bar{x}_2 \right) + \left( \bar{y}_1 - \bar{w}_2^T\bar{y}_2 \right) = 0 + 0,$$
where $\bar{w}_2 := (\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2)/\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$ and the last equality uses the fact that, by Lemma 3 and $\|\bar{x}\|^2 + \|\bar{y}\|^2 = 2\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$, we have $\bar{w}_2^T\bar{x}_2 = \bar{x}_1$ and $\bar{w}_2^T\bar{y}_2 = \bar{y}_1$. (By symmetry, Lemma 3 still holds when $x$ and $y$ are switched.) Thus, the function (25) is $O(\|h\| + \|k\|)$ in magnitude. This, together with $\mu_1 = O(\|h\| + \|k\|)$, shows that the second term on the right of (24) is $O\!\left( (\|h\| + \|k\|)^{3/2} \right) = o(\|h\| + \|k\|)$.
Thus, we have shown that $\psi_{\mathrm{FB}}$ is differentiable at $(\bar{x}, \bar{y})$. Moreover, the preceding argument shows that $2\nabla\psi_{\mathrm{FB}}(\bar{x}, \bar{y})$ is the gradient of $\|x\|^2 + \|y\|^2 + \|x + y\|^2$ minus the gradient of the first term on the right of (24), both evaluated at $(x, y) = (\bar{x}, \bar{y})$. The gradient of $\|x\|^2 + \|y\|^2 + \|x + y\|^2$ with respect to $x$, evaluated at $(x, y) = (\bar{x}, \bar{y})$, is $4\bar{x} + 2\bar{y}$. Using the product and quotient rules for differentiation, the gradient of the first term on the right of (24) with respect to $x_1$, evaluated at $(x, y) = (\bar{x}, \bar{y})$, works out to be
$$\frac{\bar{x}_1 + \bar{w}_2^T\bar{x}_2}{\sqrt{\lambda_2}}\left( \bar{x}_1 + \bar{y}_1 + \bar{w}_2^T(\bar{x}_2 + \bar{y}_2) \right) + \sqrt{\lambda_2}\left( 1 + \frac{\bar{x}_2^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|} - \frac{\bar{w}_2^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|}\,\bar{w}_2^T\bar{x}_2 \right) = \frac{2\bar{x}_1(\bar{x}_1 + \bar{y}_1)}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}} + 2\sqrt{\bar{x}_1^2 + \bar{y}_1^2},$$
where $\bar{w}_2 := (\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2)/\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$ and the equality uses Lemma 2 and the fact that, by Lemma 3 and $\|\bar{x}\|^2 + \|\bar{y}\|^2 = 2\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|$, we have $\bar{w}_2^T\bar{x}_2 = \bar{x}_1$, $\bar{w}_2^T\bar{y}_2 = \bar{y}_1$. Similarly, the gradient of the first term on the right of (24) with respect to $x_2$, evaluated at $(x, y) = (\bar{x}, \bar{y})$, works out to be
$$\frac{\bar{x}_2 + \bar{w}_2\bar{x}_1}{\sqrt{\lambda_2}}\left( \bar{x}_1 + \bar{y}_1 + \bar{w}_2^T(\bar{x}_2 + \bar{y}_2) \right) + \sqrt{\lambda_2}\left( \frac{2\bar{x}_1\bar{x}_2 + (\bar{x}_1 + \bar{y}_1)\bar{y}_2}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|} - \frac{\bar{w}_2^T(\bar{x}_2 + \bar{y}_2)}{\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\|}\,\bar{w}_2\bar{x}_1 \right) = \frac{2\left( 2\bar{x}_1\bar{x}_2 + (\bar{x}_1 + \bar{y}_1)\bar{y}_2 \right)}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}}.$$
In particular, the equality uses the fact that, by Lemma 2, we have $\bar{x}_1\bar{y}_2 = \bar{y}_1\bar{x}_2$ and $\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\| = \bar{x}_1^2 + \bar{y}_1^2$, so that $\bar{w}_2\bar{x}_1 = \bar{x}_2$ and $\lambda_2 = 4(\bar{x}_1^2 + \bar{y}_1^2)$. Thus, combining the preceding gradient expressions yields
$$2\nabla_x\psi_{\mathrm{FB}}(\bar{x}, \bar{y}) = 4\bar{x} + 2\bar{y} - \begin{pmatrix} 2\sqrt{\bar{x}_1^2 + \bar{y}_1^2} \\ 0 \end{pmatrix} - \frac{2}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}}\begin{pmatrix} \bar{x}_1(\bar{x}_1 + \bar{y}_1) \\ 2\bar{x}_1\bar{x}_2 + (\bar{x}_1 + \bar{y}_1)\bar{y}_2 \end{pmatrix}.$$
Using $\|\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2\| = \bar{x}_1^2 + \bar{y}_1^2$ and $\lambda_2 = 4(\bar{x}_1^2 + \bar{y}_1^2)$, we can also write
$$(\bar{x}^2 + \bar{y}^2)^{1/2} = \left( \sqrt{\bar{x}_1^2 + \bar{y}_1^2},\ \frac{\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}} \right),$$
so that
$$\phi(\bar{x}, \bar{y}) = \left( \sqrt{\bar{x}_1^2 + \bar{y}_1^2} - (\bar{x}_1 + \bar{y}_1),\ \frac{\bar{x}_1\bar{x}_2 + \bar{y}_1\bar{y}_2}{\sqrt{\bar{x}_1^2 + \bar{y}_1^2}} - (\bar{x}_2 + \bar{y}_2) \right). \tag{26}$$
Using the fact that $\bar{x}_1\bar{y}_2 = \bar{y}_1\bar{x}_2$, we can rewrite the above expression for $\nabla_x\psi_{\mathrm{FB}}(\bar{x}, \bar{y})$ in the form of (22). By symmetry, (23) also holds.