DOI 10.1007/s10589-008-9227-0

An entropy-like proximal algorithm and the exponential multiplier method for convex symmetric cone programming

Jein-Shan Chen · Shaohua Pan

Received: 28 April 2006 / Revised: 6 December 2008 / Published online: 18 December 2008

© Springer Science+Business Media, LLC 2008

Abstract We introduce an entropy-like proximal algorithm for the problem of minimizing a closed proper convex function subject to symmetric cone constraints. The algorithm is based on a distance-like function that extends the Kullback-Leibler relative entropy to the setting of symmetric cones. As with the proximal algorithms for convex programming with nonnegative orthant cone constraints, we show that, under some mild assumptions, the sequence generated by the proposed algorithm is bounded and every accumulation point is a solution of the considered problem. In addition, we present a dual application of the proposed algorithm to the symmetric cone linear program, leading to a multiplier method which is shown to possess properties similar to those of the exponential multiplier method (Tseng and Bertsekas in Math. Program. 60:1–19, 1993).

Keywords Symmetric cone optimization · Proximal-like method · Entropy-like distance · Exponential multiplier method

J.-S. Chen is a member of the Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is partially supported by the National Science Council of Taiwan.

J.-S. Chen (✉)
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
e-mail: jschen@math.ntnu.edu.tw

S. Pan
School of Mathematical Sciences, South China University of Technology, Guangzhou 510640, China
e-mail: shhpan@scut.edu.cn

1 Introduction

Symmetric cone programming provides a unified framework for linear programming, second-order cone programming, and semidefinite programming, which arise from a wide range of applications in engineering, economics, management science, optimal control, combinatorial optimization, and other fields; see [1,16,28] and the references therein. Recently, symmetric cone programming, especially symmetric cone linear programming (SCLP), has attracted the attention of researchers, with a focus on the development of interior point methods similar to those for linear programming; see [9,10,23]. Although interior point methods have been applied successfully to SCLPs, it is worthwhile to explore other solution methods for general convex symmetric cone optimization problems.

Let $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle)$ be a Euclidean Jordan algebra, where $(\mathbb{V}, \langle\cdot,\cdot\rangle)$ is a finite-dimensional inner product space over the real field $\mathbb{R}$ and "$\circ$" denotes the Jordan product, which will be defined in the next section. Let $\mathcal{K}$ be the symmetric cone in $\mathbb{V}$. In this paper, we consider the following convex symmetric cone programming problem (CSCP):

$$\min\ f(x) \quad \text{s.t.} \quad x \succeq_{\mathcal{K}} 0, \qquad (1)$$

where $f: \mathbb{V} \to (-\infty, \infty]$ is a closed proper convex function and $x \succeq_{\mathcal{K}} 0$ means $x \in \mathcal{K}$. In general, for any $x, y \in \mathbb{V}$, we write $x \succeq_{\mathcal{K}} y$ if $x - y \in \mathcal{K}$, and $x \succ_{\mathcal{K}} y$ if $x - y \in \operatorname{int}(\mathcal{K})$. A function is closed if and only if it is lower semicontinuous, and a function is proper if $f(x) < \infty$ for at least one $x \in \mathbb{V}$ and $f(x) > -\infty$ for all $x \in \mathbb{V}$.

Notice that the CSCP is a special class of convex programs, and hence in principle it can be solved via general convex programming methods. One such method is the proximal point algorithm for minimizing a convex function $f(x)$ on $\mathbb{R}^n$, which generates a sequence $\{x^k\}_{k \in \mathbb{N}}$ via the iterative scheme

$$x^k = \operatorname{argmin}_{x \in \mathbb{R}^n} \left\{ f(x) + \frac{1}{\mu_k} \|x - x^{k-1}\|_2^2 \right\}, \qquad (2)$$

where $\mu_k > 0$ and $\|\cdot\|_2$ denotes the Euclidean norm in $\mathbb{R}^n$. This method was first introduced by Martinet [17], based on the Moreau proximal approximation of $f$ (see [18]), and was further developed and studied by Rockafellar [20,21]. Later, several generalizations of the proximal point algorithm were proposed in which the usual quadratic proximal term in (2) is replaced by a nonquadratic distance-like function; see, for example, [5,7,8,14,25].
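To make the scheme concrete, the following is a minimal numerical sketch of iteration (2), assuming NumPy and SciPy; the test function and all names are our own illustration, not from the paper, and a general-purpose solver stands in for an exact proximal step.

```python
import numpy as np
from scipy.optimize import minimize

# Proximal point iteration (2): each step minimizes f plus a quadratic
# penalty that ties the new iterate to the previous one.
c = np.array([1.0, -2.0])
f = lambda x: np.sum((x - c) ** 2)      # simple convex test function
x, mu = np.array([5.0, 5.0]), 1.0
for _ in range(25):
    prev = x.copy()
    x = minimize(lambda z: f(z) + np.sum((z - prev) ** 2) / mu, prev).x
print(np.round(x, 4))                   # converges to the minimizer c
```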

Among others, the algorithms using an entropy-like distance [13,14,25,26] for minimizing a convex function $f(x)$ subject to $x \in \mathbb{R}^n_+$ generate the iterates by

$$x^0 \in \mathbb{R}^n_{++}, \qquad x^k = \operatorname{argmin}_{x \in \mathbb{R}^n_+} \left\{ f(x) + \frac{1}{\mu_k} d_\varphi(x, x^{k-1}) \right\}, \qquad (3)$$

where $d_\varphi(\cdot,\cdot): \mathbb{R}^n_{++} \times \mathbb{R}^n_{++} \to \mathbb{R}_+$ is the entropy-like distance defined by

$$d_\varphi(x, y) = \sum_{i=1}^n y_i\, \varphi(x_i / y_i), \qquad (4)$$

with $\varphi$ satisfying certain conditions; see [13,14,25,26]. An important choice of $\varphi$ is $\varphi(t) = t \ln t - t + 1$, for which the corresponding $d_\varphi$, given by

$$d_\varphi(x, y) = \sum_{i=1}^n \left( x_i \ln x_i - x_i \ln y_i + y_i - x_i \right), \qquad (5)$$

is the popular Kullback-Leibler relative entropy from statistics, which is where the "entropy" terminology stems from. One key feature of entropic proximal methods is that they automatically generate a sequence staying in the interior of $\mathbb{R}^n_+$, thus eliminating the combinatorial nature of the problem. One of the main applications of such proximal methods is to the dual of smooth convex programs, yielding twice continuously differentiable nonquadratic augmented Lagrangians and thereby allowing the use of Newton's method.

The main purpose of this paper is to propose an interior proximal-like method and the corresponding dual augmented Lagrangian method for the CSCP (1). Specifically, by using Euclidean Jordan algebraic techniques, we extend the entropy-like proximal algorithm defined by (3)-(4) with $\varphi(t) = t \ln t - t + 1$ to the solution of (1). For the proposed algorithm, we establish a global convergence estimate in terms of the objective value, and moreover we present a dual application to the standard SCLP, which leads to an exponential multiplier method shown to possess properties analogous to the method proposed in [3,27] for convex programming over the nonnegative orthant cone $\mathbb{R}^n_+$.

The paper is organized as follows. Section 2 reviews some basic concepts and materials on Euclidean Jordan algebras which are needed in the analysis of the algorithm. In Sect. 3, we introduce a distance-like function $H$ to measure how close two points in the symmetric cone $\mathcal{K}$ are, and we investigate some of its properties. Furthermore, we outline a basic proximal-like algorithm with the measure function $H$. The convergence analysis of the algorithm is the main content of Sect. 4. In Sect. 5, we consider a dual application of the algorithm to the SCLP and establish the convergence results for the corresponding multiplier method. We close this paper with some remarks in Sect. 6.

2 Preliminaries on Euclidean Jordan algebra

This section recalls some concepts and results on Euclidean Jordan algebras that will be used in the subsequent sections. More detailed expositions of Euclidean Jordan algebras can be found in Koecher's lecture notes [15] and Faraut and Korányi's monograph [11].

Let $\mathbb{V}$ be a finite-dimensional inner product space endowed with a bilinear mapping $(x, y) \mapsto x \circ y : \mathbb{V} \times \mathbb{V} \to \mathbb{V}$. For a given $x \in \mathbb{V}$, let $L(x)$ be the linear operator on $\mathbb{V}$ defined by

$$L(x)y := x \circ y \quad \text{for every } y \in \mathbb{V}.$$

The pair $(\mathbb{V}, \circ)$ is called a Jordan algebra if, for all $x, y \in \mathbb{V}$,

(i) $x \circ y = y \circ x$;
(ii) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$, where $x^2 := x \circ x$.

In a Jordan algebra $(\mathbb{V}, \circ)$, $x \circ y$ is said to be the Jordan product of $x$ and $y$. Note that a Jordan algebra need not be associative, i.e., $x \circ (y \circ z) = (x \circ y) \circ z$ may fail in general. If for some element $e \in \mathbb{V}$ we have $x \circ e = e \circ x = x$ for all $x \in \mathbb{V}$, then $e$ is called a unit element of the Jordan algebra $(\mathbb{V}, \circ)$. The unit element, if it exists, is unique.

A Jordan algebra does not necessarily have a unit element. For $x \in \mathbb{V}$, let $\zeta(x)$ be the degree of the minimal polynomial of $x$, which can be equivalently defined as

$$\zeta(x) := \min\left\{ k : \{e, x, x^2, \ldots, x^k\} \text{ are linearly dependent} \right\}.$$

Then the rank of $(\mathbb{V}, \circ)$ is well defined by $r := \max\{\zeta(x) : x \in \mathbb{V}\}$.

A Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$, defined over the real field $\mathbb{R}$, is called a Euclidean Jordan algebra, or a formally real Jordan algebra, if there exists a positive definite symmetric bilinear form on $\mathbb{V}$ which is associative; in other words, there exists on $\mathbb{V}$ an inner product, denoted by $\langle\cdot,\cdot\rangle_{\mathbb{V}}$, such that for all $x, y, z \in \mathbb{V}$:

(iii) $\langle x \circ y, z\rangle_{\mathbb{V}} = \langle y, x \circ z\rangle_{\mathbb{V}}$.

In a Euclidean Jordan algebra $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$, we define the set of squares as

$$\mathcal{K} := \{x^2 : x \in \mathbb{V}\}.$$

By [11, Theorem III.2.1], $\mathcal{K}$ is a symmetric cone. This means that $\mathcal{K}$ is a self-dual closed convex cone with nonempty interior, and that for any two elements $x, y \in \operatorname{int}(\mathcal{K})$ there exists an invertible linear transformation $\mathcal{T}: \mathbb{V} \to \mathbb{V}$ such that $\mathcal{T}(\mathcal{K}) = \mathcal{K}$ and $\mathcal{T}(x) = y$.

Here are two popular examples of Euclidean Jordan algebras. Let $\mathcal{S}^n$ be the space of $n \times n$ real symmetric matrices with the inner product given by $\langle X, Y\rangle_{\mathcal{S}^n} := \operatorname{Tr}(XY)$, where $XY$ is the matrix product of $X$ and $Y$ and $\operatorname{Tr}(XY)$ is the trace of $XY$. Then $(\mathcal{S}^n, \circ, \langle\cdot,\cdot\rangle_{\mathcal{S}^n})$ is a Euclidean Jordan algebra with the Jordan product defined by

$$X \circ Y := (XY + YX)/2, \quad \forall X, Y \in \mathcal{S}^n.$$

In this case, the unit element is the identity matrix $I$ in $\mathcal{S}^n$, and the cone of squares $\mathcal{K}$ is the set of all positive semidefinite matrices in $\mathcal{S}^n$. Let $\mathbb{R}^n$ be the Euclidean space of dimension $n$ with the usual inner product $\langle x, y\rangle_{\mathbb{R}^n} = x^T y$. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, define $x \circ y := (x^T y,\ x_1 y_2 + y_1 x_2)$. Then $(\mathbb{R}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{R}^n})$ is a Euclidean Jordan algebra, also called the algebra of quadratic forms. In this algebra, the unit element is $e = (1, 0, \ldots, 0)^T$ and $\mathcal{K} = \{x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} : x_1 \geq \|x_2\|_2\}$ is the second-order cone.
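As a quick numerical illustration (our own, assuming NumPy), the snippet below checks the Jordan identity (ii) for the symmetrized matrix product on $\mathcal{S}^n$, exhibits its non-associativity, and verifies the unit element of the quadratic forms algebra.

```python
import numpy as np

def jordan_sym(X, Y):
    """Jordan product on S^n: X ∘ Y = (XY + YX)/2."""
    return (X @ Y + Y @ X) / 2

def jordan_soc(x, y):
    """Jordan product of the quadratic forms algebra: (x^T y, x1*y2 + y1*x2)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
A2 = jordan_sym(A, A)
# Jordan identity (ii): x ∘ (x^2 ∘ y) = x^2 ∘ (x ∘ y).
print(np.allclose(jordan_sym(A, jordan_sym(A2, B)), jordan_sym(A2, jordan_sym(A, B))))  # True
# Non-associativity: (A ∘ A) ∘ B generally differs from A ∘ (A ∘ B).
print(np.allclose(jordan_sym(A2, B), jordan_sym(A, jordan_sym(A, B))))                  # False
# e = (1, 0, ..., 0) is the unit element of the quadratic forms algebra.
x, e = rng.standard_normal(3), np.array([1.0, 0.0, 0.0])
print(np.allclose(jordan_soc(x, e), x))                                                 # True
```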

Recall that an element $c \in \mathbb{V}$ is said to be idempotent if $c^2 = c$. Two idempotents $c$ and $q$ are said to be orthogonal if $c \circ q = 0$. One says that $\{c_1, c_2, \ldots, c_k\}$ is a complete system of orthogonal idempotents if

$$c_j^2 = c_j, \qquad c_j \circ c_i = 0 \ \text{ if } j \neq i, \ \text{ for all } j, i = 1, 2, \ldots, k, \qquad \text{and} \qquad \sum_{j=1}^k c_j = e.$$

An idempotent is said to be primitive if it is nonzero and cannot be written as the sum of two other nonzero idempotents. We call a complete system of orthogonal primitive idempotents a Jordan frame. Then we have the following spectral decomposition theorem.

Theorem 2.1 [11, Theorem III.1.2] Suppose that $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$ is a Euclidean Jordan algebra of rank $r$. Then for any $x \in \mathbb{V}$, there exist a Jordan frame $\{c_1, c_2, \ldots, c_r\}$ and real numbers $\lambda_1(x), \lambda_2(x), \ldots, \lambda_r(x)$, arranged in decreasing order $\lambda_1(x) \geq \lambda_2(x) \geq \cdots \geq \lambda_r(x)$, such that $x = \sum_{j=1}^r \lambda_j(x) c_j$. The numbers $\lambda_j(x)$ (counting multiplicities), which are uniquely determined by $x$, are called the eigenvalues of $x$; $\sum_{j=1}^r \lambda_j(x) c_j$ is called the spectral decomposition of $x$, and $\operatorname{tr}(x) = \sum_{j=1}^r \lambda_j(x)$ is the trace of $x$.

From [11, Proposition III.1.5], a Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$ is Euclidean if and only if the symmetric bilinear form $\operatorname{tr}(x \circ y)$ is positive definite. Therefore, we may define another inner product on $\mathbb{V}$ by

$$\langle x, y\rangle := \operatorname{tr}(x \circ y), \quad \forall x, y \in \mathbb{V}.$$

By the associativity of $\operatorname{tr}(\cdot)$ [11, Proposition II.4.3], the inner product $\langle\cdot,\cdot\rangle$ is associative, i.e., for all $x, y, z \in \mathbb{V}$ there holds $\langle x, y \circ z\rangle = \langle y, x \circ z\rangle$. Thus, for each $x \in \mathbb{V}$, the operator $L(x)$ is symmetric with respect to the inner product $\langle\cdot,\cdot\rangle$ in the sense that

$$\langle L(x)y, z\rangle = \langle y, L(x)z\rangle, \quad \forall y, z \in \mathbb{V}.$$

In the sequel, we let $\|\cdot\|$ be the norm on $\mathbb{V}$ induced by the inner product $\langle\cdot,\cdot\rangle$, i.e.,

$$\|x\| := \sqrt{\langle x, x\rangle} = \left( \sum_{j=1}^r \lambda_j^2(x) \right)^{1/2}, \quad \forall x \in \mathbb{V},$$

and we denote by $\lambda_{\min}(x)$ and $\lambda_{\max}(x)$ the smallest and the largest eigenvalue of $x$, respectively. Then, by Lemma 13 and the proof of Lemma 14 in [23], we can prove the following lemma.

Lemma 2.1 Let $x, y \in \mathbb{V}$. Then the minimum eigenvalue of $x + y$ can be bounded as follows:

$$\lambda_{\min}(x) + \lambda_{\min}(y) \leq \lambda_{\min}(x + y) \leq \lambda_{\min}(x) + \lambda_{\max}(y).$$

Let $g: \mathbb{R} \to \mathbb{R}$ be a scalar-valued function. Then it is natural to define a vector-valued function associated with the Euclidean Jordan algebra $(\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$ by

$$g^{\mathrm{sc}}(x) := g(\lambda_1(x)) c_1 + g(\lambda_2(x)) c_2 + \cdots + g(\lambda_r(x)) c_r, \qquad (6)$$

where $x \in \mathbb{V}$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$. This function is called the Löwner operator in [24] and was shown to have the following important property.

Lemma 2.2 [24, Theorem 13] For any $x = \sum_{j=1}^r \lambda_j(x) c_j$, let $g^{\mathrm{sc}}$ be defined by (6). Then $g^{\mathrm{sc}}$ is (continuously) differentiable at $x$ if and only if $g$ is (continuously) differentiable at all $\lambda_j(x)$. Furthermore, the derivative of $g^{\mathrm{sc}}$ at $x$, for any $h \in \mathbb{V}$, is given by

$$(g^{\mathrm{sc}})'(x)\, h = \sum_{j=1}^r g'(\lambda_j(x)) \langle c_j, h\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, [\lambda_j(x), \lambda_l(x)]_g \; c_j \circ (c_l \circ h),$$

where

$$[\lambda_i(x), \lambda_j(x)]_g := \frac{g(\lambda_i(x)) - g(\lambda_j(x))}{\lambda_i(x) - \lambda_j(x)}, \quad i, j = 1, 2, \ldots, r,\ i \neq j.$$

In fact, the Jacobian $(g^{\mathrm{sc}})'(\cdot)$ is a linear and symmetric operator, and it can be written as

$$(g^{\mathrm{sc}})'(x) = \sum_{j=1}^r g'(\lambda_j(x)) Q(c_j) + 2 \sum_{i,j=1,\, i \neq j}^r [\lambda_i(x), \lambda_j(x)]_g \, L(c_j) L(c_i), \qquad (7)$$

where $Q(x) := 2L^2(x) - L(x^2)$ for any $x \in \mathbb{V}$ is called the quadratic representation of $\mathbb{V}$.

Finally, we recall the spectral function generated by a symmetric function. Let $\mathcal{P}$ denote the set of all permutations of $r$-dimensional vectors. A subset of $\mathbb{R}^r$ is said to be symmetric if it remains unchanged under every permutation of $\mathcal{P}$. Let $S$ be a symmetric set in $\mathbb{R}^r$. A real-valued function $f: S \to \mathbb{R}$ is said to be symmetric if for every permutation $P \in \mathcal{P}$ and each $s \in S$ there holds $f(Ps) = f(s)$. For any $x \in \mathbb{V}$ with the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$, define $\mathcal{K}_S := \{x \in \mathbb{V} \mid \lambda(x) \in S\}$, with $\lambda(x) = (\lambda_1(x), \ldots, \lambda_r(x))^T$ being the spectral vector of $x$. Then $F: \mathcal{K}_S \to \mathbb{R}$ defined by

$$F(x) := f(\lambda(x)) \qquad (8)$$

is called the spectral function generated by $f$. From Theorem 41 of [2], $F$ is (strictly) convex if and only if $f$ is (strictly) convex.

Unless otherwise stated, in the rest of this paper the notation $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle)$ represents a Euclidean Jordan algebra of rank $r$ with $\dim(\mathbb{V}) = n$. For a closed proper convex function $f: \mathbb{V} \to (-\infty, +\infty]$, we denote by $\operatorname{dom} f := \{x \in \mathbb{V} \mid f(x) < +\infty\}$ the domain of $f$. The subdifferential of $f$ at $x_0 \in \mathbb{V}$ is the convex set

$$\partial f(x_0) = \{\xi \in \mathbb{V} \mid f(x) \geq f(x_0) + \langle \xi, x - x_0\rangle, \ \forall x \in \mathbb{V}\}. \qquad (9)$$

Since $\langle x, y\rangle = \operatorname{tr}(x \circ y)$ for any $x, y \in \mathbb{V}$, the above subdifferential set is equivalent to

$$\partial f(x_0) = \left\{ \xi \in \mathbb{V} \mid f(x) \geq f(x_0) + \operatorname{tr}\!\left( \xi \circ (x - x_0) \right), \ \forall x \in \mathbb{V} \right\}. \qquad (10)$$

For a sequence $\{x^k\}_{k \in \mathbb{N}}$, the notation $\mathbb{N}$ denotes the set of natural numbers.

3 Entropy-like proximal algorithm

To solve the CSCP (1), we suggest the following proximal-like minimization algorithm:

$$x^0 \succ_{\mathcal{K}} 0, \qquad x^k = \operatorname{argmin}_{x \succeq_{\mathcal{K}} 0} \left\{ f(x) + \frac{1}{\mu_k} H(x, x^{k-1}) \right\}, \qquad (11)$$

where $\mu_k > 0$ and $H: \mathbb{V} \times \mathbb{V} \to (-\infty, +\infty]$ is defined by

$$H(x, y) := \begin{cases} \operatorname{tr}(x \circ \ln x - x \circ \ln y + y - x), & x \in \operatorname{int}(\mathcal{K}),\ y \in \operatorname{int}(\mathcal{K}), \\ +\infty, & \text{otherwise}. \end{cases} \qquad (12)$$

This algorithm is indeed of proximal type, except that the classical quadratic term $\|x - x^{k-1}\|_2^2$ is replaced by the distance-like function $H$ to guarantee that $\{x^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$, thus leading to an interior proximal-like method (see Proposition 4.1).

By the definition of the Löwner operator, the function $H(x, y)$ is clearly well-defined for all $x, y \in \operatorname{int}(\mathcal{K})$. Moreover, the domain $x \in \operatorname{int}(\mathcal{K})$ can be extended to $x \in \mathcal{K}$ by adopting the convention $0 \ln 0 \equiv 0$. The function $H$ is a natural extension of the distance-like entropy function in (5), and it is used to measure the "distance" between two points in $\mathcal{K}$. In fact, $H$ reduces to the entropy function $d_\varphi$ in (5) if the Euclidean Jordan algebra $\mathbb{A}$ is specified as $(\mathbb{R}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{R}^n})$, with "$\circ$" denoting the componentwise product of two vectors in $\mathbb{R}^n$ and $\langle\cdot,\cdot\rangle_{\mathbb{R}^n}$ the usual Euclidean inner product. As shown by Proposition 3.1 below, most of the important properties of $d_\varphi$, though not all, also hold for $H$.

The following two technical lemmas will be used to investigate the favorable properties of the distance measure $H$. Lemma 3.1 states an extension of the von Neumann inequality to Euclidean Jordan algebras, and Lemma 3.2 gives the properties of $\operatorname{tr}(x \circ \ln x)$.

Lemma 3.1 [2] For any $x, y \in \mathbb{V}$, we have $\operatorname{tr}(x \circ y) \leq \sum_{j=1}^r \lambda_j(x) \lambda_j(y) = \lambda(x)^T \lambda(y)$, where $\lambda(x)$ and $\lambda(y)$ are the spectral vectors of $x$ and $y$, respectively.
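On $\mathcal{S}^n$ this is the classical von Neumann trace inequality, which is easy to check numerically (our own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
B = rng.standard_normal((5, 5)); B = (B + B.T) / 2
lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]   # spectral vector, decreasing order
lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]
# tr(A ∘ B) = tr(AB) on S^n; Lemma 3.1 bounds it by the ordered eigenvalue product.
print(np.trace(A @ B) <= lam_A @ lam_B + 1e-12)  # True
```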

Lemma 3.2 For any $x \in \mathcal{K}$, let $\Phi(x) := \operatorname{tr}(x \circ \ln x)$. Then we have the following results.

(a) $\Phi(x)$ is the spectral function generated by the symmetric entropy function

$$\phi(u) = \sum_{j=1}^r u_j \ln u_j, \quad \forall u \in \mathbb{R}^r_+. \qquad (13)$$

(b) $\Phi(x)$ is continuously differentiable on $\operatorname{int}(\mathcal{K})$ with $\nabla \Phi(x) = \ln x + e$.

(c) The function $\Phi(x)$ is strictly convex over $\mathcal{K}$.

Proof (a) Suppose that $x$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$, and let $g(t) = t \ln t$ for $t \geq 0$.

From Sect. 2, it follows that the vector-valued function $x \circ \ln x$ is the Löwner function $g^{\mathrm{sc}}(x)$, i.e., $g^{\mathrm{sc}}(x) = x \circ \ln x$. Clearly, $g^{\mathrm{sc}}$ is well-defined for any $x \in \mathcal{K}$, and

$$g^{\mathrm{sc}}(x) = x \circ \ln x = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(x)) \, c_j.$$

Therefore,

$$\Phi(x) = \operatorname{tr}(x \circ \ln x) = \operatorname{tr}(g^{\mathrm{sc}}(x)) = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(x)) = \phi(\lambda(x))$$

with $\phi: \mathbb{R}^r_+ \to \mathbb{R}$ given by (13). Since the function $\phi$ is symmetric, $\Phi(x)$ is the spectral function generated by the symmetric function $\phi$ in view of (8).

(b) From Lemma 2.2, $g^{\mathrm{sc}}(x) = x \circ \ln x$ is continuously differentiable on $\operatorname{int}(\mathcal{K})$. Thus $\Phi(x)$ is also continuously differentiable on $\operatorname{int}(\mathcal{K})$, because $\Phi$ is the composition of the trace function (clearly continuously differentiable) and $g^{\mathrm{sc}}$. It remains to find its gradient formula. From the fact that $\operatorname{tr}(x \circ y) = \langle x, y\rangle$, we have

$$\Phi(x) = \operatorname{tr}(x \circ \ln x) = \langle x, \ln x\rangle.$$

Applying the chain rule to this inner product of two functions, and using the symmetry of the Jacobian $(\ln)'(x)$, we then obtain

$$\nabla \Phi(x) = \ln x + (\ln)'(x)\, x. \qquad (14)$$

On the other hand, from formula (7) it follows that for any $h \in \mathbb{V}$,

$$(\ln)'(x)\, h = \sum_{j=1}^r \frac{1}{\lambda_j(x)} \langle c_j, h\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)} \, c_j \circ (c_l \circ h).$$

By this and the spectral decomposition of $x$, it is easy to compute that

$$\begin{aligned} (\ln)'(x)\, x &= \sum_{j=1}^r \frac{1}{\lambda_j(x)} \langle c_j, x\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)} \, c_j \circ (c_l \circ x) \\ &= \sum_{j=1}^r \frac{1}{\lambda_j(x)} \lambda_j(x)\, c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)} \, \lambda_l(x)\, c_j \circ c_l \\ &= e, \end{aligned}$$

where the last equality uses $\langle c_j, x\rangle = \lambda_j(x)$ and the orthogonality $c_j \circ c_l = 0$ for $j \neq l$. Combining this with (14), we readily obtain the desired result.

(c) Note that the function $\phi$ in (13) is strictly convex over $\mathbb{R}^r_+$. Therefore, the conclusion follows immediately from part (a) and Theorem 41 of [2]. $\square$

Next we study the favorable properties of the distance-like function H. These properties play a crucial role in the convergence analysis of the algorithm defined by (11)–(12).


Proposition 3.1 Let $H(x, y)$ be defined by (12). Then the following results hold.

(a) $H(x, y)$ is continuous on $\mathcal{K} \times \operatorname{int}(\mathcal{K})$, and $H(\cdot, y)$ is strictly convex for any $y \in \operatorname{int}(\mathcal{K})$.

(b) For any fixed $y \in \operatorname{int}(\mathcal{K})$, $H(\cdot, y)$ is continuously differentiable on $\operatorname{int}(\mathcal{K})$ with

$$\nabla_x H(x, y) = \ln x - \ln y.$$

(c) $H(x, y) \geq 0$ for any $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$, and $H(x, y) = 0$ if and only if $x = y$.

(d) $H(x, y) \geq d(\lambda(x), \lambda(y)) \geq 0$ for any $x \in \mathcal{K}$, $y \in \operatorname{int}(\mathcal{K})$, where $d(\cdot,\cdot)$ is defined by

$$d(u, v) = \sum_{i=1}^r \left( u_i \ln u_i - u_i \ln v_i + v_i - u_i \right), \quad \forall u \in \mathbb{R}^r_+,\ v \in \mathbb{R}^r_{++}. \qquad (15)$$

(e) For fixed $y \in \operatorname{int}(\mathcal{K})$, the level sets $L_H(x, \gamma) := \{x \in \mathcal{K} \mid H(x, y) \leq \gamma\}$ are bounded for all $\gamma \geq 0$; and for fixed $x \in \mathcal{K}$, the level sets $L_H(y, \gamma) := \{y \in \operatorname{int}(\mathcal{K}) \mid H(x, y) \leq \gamma\}$ are bounded for all $\gamma \geq 0$.

Proof (a) Since $x \circ \ln x$ and $x \circ \ln y$ are continuous in $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$, and the trace function is also continuous, the function $H$ is continuous over $\mathcal{K} \times \operatorname{int}(\mathcal{K})$. Notice that

$$H(x, y) = \Phi(x) - \operatorname{tr}(x \circ \ln y) + \operatorname{tr}(y) - \operatorname{tr}(x), \qquad (16)$$

that $\Phi(x)$ is strictly convex over $\mathcal{K}$ by Lemma 3.2(c), and that the other terms on the right-hand side of (16) are clearly convex in $x$ for any fixed $y \in \operatorname{int}(\mathcal{K})$. Therefore, $H(\cdot, y)$ is strictly convex for any fixed $y \in \operatorname{int}(\mathcal{K})$.

(b) From the expression of $H(x, y)$ given by (16) and Lemma 3.2(b), the function $H(\cdot, y)$ is clearly continuously differentiable on $\operatorname{int}(\mathcal{K})$. Moreover,

$$\nabla_x H(x, y) = \nabla \Phi(x) - \ln y - e = \ln x - \ln y.$$

(c) From the definition of $\Phi(x)$ and its gradient formula in Lemma 3.2(b),

$$\begin{aligned} \Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y\rangle &= \operatorname{tr}(x \circ \ln x) - \operatorname{tr}(y \circ \ln y) - \langle \ln y + e,\ x - y\rangle \\ &= \operatorname{tr}(x \circ \ln x) - \operatorname{tr}(x \circ \ln y) - \operatorname{tr}(x) + \operatorname{tr}(y) \\ &= H(x, y) \end{aligned} \qquad (17)$$

for any $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$. In addition, the strict convexity of $\Phi$ implies that

$$\Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y\rangle \geq 0,$$

with equality if and only if $x = y$. These two facts readily give the desired result.

(d) Using the definition of $H$ and Lemma 3.1, we have for all $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$,

$$\begin{aligned} H(x, y) &= \operatorname{tr}(x \circ \ln x + y - x) - \operatorname{tr}(x \circ \ln y) \\ &\geq \operatorname{tr}(x \circ \ln x + y - x) - \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)) \\ &= \sum_{j=1}^r \left( \lambda_j(x) \ln(\lambda_j(x)) + \lambda_j(y) - \lambda_j(x) \right) - \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)) \\ &= \sum_{j=1}^r \left( \lambda_j(x) \ln(\lambda_j(x)) - \lambda_j(x) \ln(\lambda_j(y)) + \lambda_j(y) - \lambda_j(x) \right) \\ &= d(\lambda(x), \lambda(y)). \end{aligned}$$

The nonnegativity of $d(\lambda(x), \lambda(y))$ is direct from the definition of $d(\cdot,\cdot)$ in (15).

(e) For fixed $y \in \operatorname{int}(\mathcal{K})$, part (d) gives $L_H(x, \gamma) \subseteq \{x \in \mathcal{K} \mid d(\lambda(x), \lambda(y)) \leq \gamma\}$ for all $\gamma \geq 0$. Since the sets $\{u \in \mathbb{R}^r_+ \mid d(u, v) \leq \gamma\}$ are bounded for all $\gamma \geq 0$ by [26, Lemma 2.3], it follows from the continuity of $\lambda(\cdot)$ that the sets $L_H(x, \gamma)$ are bounded for all $\gamma \geq 0$. Similarly, for fixed $x \in \mathcal{K}$, using Lemma 2.1(i) of [26], the sets $L_H(y, \gamma)$ are bounded for all $\gamma \geq 0$. $\square$

Proposition 3.2 Let $H(x, y)$ be defined by (12). Suppose $\{x^k\}_{k \in \mathbb{N}} \subseteq \mathcal{K}$ and $\{y^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$ are bounded sequences such that $H(x^k, y^k) \to 0$. Then, as $k \to \infty$, we have

(a) $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \ldots, r$;
(b) $\operatorname{tr}(x^k - y^k) \to 0$.

Proof (a) From Proposition 3.1(d), $H(x^k, y^k) \geq d(\lambda(x^k), \lambda(y^k)) \geq 0$. Hence $H(x^k, y^k) \to 0$ implies $d(\lambda(x^k), \lambda(y^k)) \to 0$. By the definition of $d(\cdot,\cdot)$ given by (15),

$$d(\lambda(x^k), \lambda(y^k)) = \sum_{j=1}^r \lambda_j(y^k)\, \varphi\!\left( \lambda_j(x^k)/\lambda_j(y^k) \right)$$

with $\varphi(t) = t \ln t - t + 1$ ($t \geq 0$). Since $\varphi(t) \geq 0$ for any $t \geq 0$, each term of the above sum is nonnegative, and consequently $d(\lambda(x^k), \lambda(y^k)) \to 0$ implies

$$\lambda_j(y^k)\, \varphi\!\left( \lambda_j(x^k)/\lambda_j(y^k) \right) \to 0, \quad j = 1, 2, \ldots, r.$$

This is equivalent to saying that

$$\lambda_j(x^k) \ln(\lambda_j(x^k)) - \lambda_j(x^k) \ln(\lambda_j(y^k)) + \lambda_j(y^k) - \lambda_j(x^k) \to 0, \quad j = 1, 2, \ldots, r.$$

Since $\{\lambda_j(x^k)\}$ and $\{\lambda_j(y^k)\}$ are bounded, using Lemma A.1 of [6] then yields $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \ldots, r$.

(b) Since $\operatorname{tr}(x^k - y^k) = \sum_{j=1}^r \left( \lambda_j(x^k) - \lambda_j(y^k) \right)$, the result follows from part (a). $\square$

To close this section, we present two useful relations for the function $H$, which can be easily verified by using the definition of $H$ and recalling its nonnegativity.

Proposition 3.3 Let $H(x, y)$ be defined by (12). For all $x, y \in \operatorname{int}(\mathcal{K})$ and $z \in \mathcal{K}$, we have

(a) $H(z, x) - H(z, y) = \operatorname{tr}(z \circ \ln y - z \circ \ln x + x - y)$;
(b) $\operatorname{tr}\!\left( (z - y) \circ (\ln y - \ln x) \right) = H(z, x) - H(z, y) - H(y, x) \leq H(z, x) - H(z, y)$.

4 Convergence analysis of the algorithm

The convergence analysis of the entropy-like proximal algorithm defined by (11)-(12) is similar to that of the proximal point method [12] for convex minimization problems and of the proximal-like algorithm [4] based on Bregman functions. In this section, we present a global convergence rate estimate for the algorithm in terms of function values. For this purpose, we make the following assumption for the CSCP (1):

(A) $\inf\{f(x) \mid x \succeq_{\mathcal{K}} 0\} := f_* > -\infty$ and $\operatorname{dom} f \cap \operatorname{int}(\mathcal{K}) \neq \emptyset$.

In addition, we denote by $X_* := \{x \in \mathcal{K} \mid f(x) = f_*\}$ the solution set of (1).

First, we show that the algorithm in (11)-(12) is well defined, i.e., it generates a sequence $\{x^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$. This is a direct consequence of Proposition 4.1 below, for whose proof we need the following technical lemma.

Lemma 4.1 For any $x \succeq_{\mathcal{K}} 0$ and $y \succ_{\mathcal{K}} 0$ with $\operatorname{tr}(x \circ y) = 0$, we always have $x = 0$.

Proof From the self-duality of $\mathcal{K}$ and [11, Proposition I.1.4], we have that

$$u \in \operatorname{int}(\mathcal{K}) \iff \langle u, v\rangle > 0, \quad \forall\, 0 \neq v \in \mathcal{K},$$

which together with $\operatorname{tr}(x \circ y) = \langle x, y\rangle$ immediately implies $x = 0$. $\square$

Proposition 4.1 For any fixed $y \in \operatorname{int}(\mathcal{K})$ and $\mu > 0$, let $F_\mu(\cdot, y) := f(\cdot) + \mu^{-1} H(\cdot, y)$. Then, under assumption (A), the following results hold.

(a) The function $F_\mu(\cdot, y)$ has bounded level sets.

(b) There exists a unique $x(y, \mu) \in \operatorname{int}(\mathcal{K})$ such that

$$x(y, \mu) := \operatorname{argmin}_{x \succeq_{\mathcal{K}} 0} F_\mu(x, y)$$

and

$$\mu^{-1}\left( \ln y - \ln x(y, \mu) \right) \in \partial f(x(y, \mu)). \qquad (18)$$

Proof (a) Since $F_\mu(\cdot, y)$ is convex, to show that $F_\mu(\cdot, y)$ has bounded level sets it suffices to show that for any $\nu \geq f_* > -\infty$, the level set $L(\nu) := \{x \mid F_\mu(x, y) \leq \nu\}$ is bounded. Let $\bar\nu := (\nu - f_*)\mu$. Clearly, $L(\nu) \subseteq L_H(x, \bar\nu)$. Moreover, by Proposition 3.1(e), $L_H(x, \bar\nu)$ is bounded. Therefore, $L(\nu)$ is bounded.

(b) From part (a), $F_\mu(\cdot, y)$ has bounded level sets, which in turn implies the existence of the minimum point $x(y, \mu)$. Also, the strict convexity of $F_\mu(\cdot, y)$ from Proposition 3.1(a) guarantees its uniqueness. Under assumption (A), using Proposition 3.1(b) and the optimality condition for the minimization problem $\operatorname{argmin}_{x \succeq_{\mathcal{K}} 0} F_\mu(x, y)$ gives

$$0 \in \partial f(x(y, \mu)) + \mu^{-1}\left( \ln x(y, \mu) - \ln y \right) + \partial \delta(x(y, \mu) \mid \mathcal{K}), \qquad (19)$$

where $\delta(z \mid \mathcal{K}) = 0$ if $z \succeq_{\mathcal{K}} 0$ and $+\infty$ otherwise. We will show that $\partial \delta(x(y, \mu) \mid \mathcal{K}) = \{0\}$, whence the desired result follows. Notice that for any $x^k \in \operatorname{int}(\mathcal{K})$ with $x^k \to x \in \operatorname{bd}(\mathcal{K})$,

$$\|\ln x^k\| = \left( \sum_{j=1}^r \left[ \ln(\lambda_j(x^k)) \right]^2 \right)^{1/2} \to +\infty,$$

where the second relation is due to the continuity of $\lambda_j(\cdot)$ and of $\ln t$. Consequently,

$$\|\nabla_x H(x^k, y)\| = \|\ln x^k - \ln y\| \geq \|\ln x^k\| - \|\ln y\| \to +\infty.$$

By [19, p. 251], this means that $H(\cdot, y)$ is essentially smooth, and then $\partial_x H(x, y) = \emptyset$ for all $x \in \operatorname{bd}(\mathcal{K})$ by [19, Theorem 26.1]. Together with (19), we must have $x(y, \mu) \in \operatorname{int}(\mathcal{K})$. Furthermore, from [19, p. 226], it follows that

$$\partial \delta(z \mid \mathcal{K}) = \{v \in \mathbb{V} \mid v \preceq_{\mathcal{K}} 0,\ \operatorname{tr}(v \circ z) = 0\}.$$

Using Lemma 4.1 then yields $\partial \delta(x(y, \mu) \mid \mathcal{K}) = \{0\}$. Thus, the proof is completed. $\square$

Next we establish several important properties of the algorithm defined by (11)-(12), from which our convergence result will follow.

Proposition 4.2 Let $\{x^k\}_{k \in \mathbb{N}}$ be the sequence generated by the algorithm defined by (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then the following results hold.

(a) The sequence $\{f(x^k)\}_{k \in \mathbb{N}}$ is nonincreasing.
(b) $\mu_k \left( f(x^k) - f(x) \right) \leq H(x, x^{k-1}) - H(x, x^k)$ for all $x \in \mathcal{K}$.
(c) $\sigma_m \left( f(x^m) - f(x) \right) \leq H(x, x^0) - H(x, x^m)$ for all $x \in \mathcal{K}$.
(d) If the optimal set $X_* \neq \emptyset$, then $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for any $x \in X_*$.
(e) If the optimal set $X_* \neq \emptyset$, then $H(x^k, x^{k-1}) \to 0$ and $\operatorname{tr}(x^k - x^{k-1}) \to 0$.

Proof (a) From the definition of $x^k$ in (11), it follows that

$$f(x^k) + \mu_k^{-1} H(x^k, x^{k-1}) \leq f(x^{k-1}) + \mu_k^{-1} H(x^{k-1}, x^{k-1}).$$

Since $H(x^k, x^{k-1}) \geq 0$ and $H(x^{k-1}, x^{k-1}) = 0$ (see Proposition 3.1(c)), we obtain

$$f(x^k) \leq f(x^{k-1}), \quad \forall k \in \mathbb{N},$$

and therefore the sequence $\{f(x^k)\}_{k \in \mathbb{N}}$ is nonincreasing.

(b) From (18), $-\mu_k^{-1}(\ln x^k - \ln x^{k-1}) \in \partial f(x^k)$. Plugging this into the formula for $\partial f(x^k)$ given by (10), we have

$$f(x) \geq f(x^k) - \mu_k^{-1} \operatorname{tr}\!\left( (x - x^k) \circ (\ln x^k - \ln x^{k-1}) \right), \quad \forall x \in \mathbb{V}.$$

Consequently, for any $x \in \mathcal{K}$,

$$\begin{aligned} \mu_k \left( f(x^k) - f(x) \right) &\leq \operatorname{tr}\!\left( (x - x^k) \circ (\ln x^k - \ln x^{k-1}) \right) \\ &= H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}) \\ &\leq H(x, x^{k-1}) - H(x, x^k), \end{aligned} \qquad (20)$$

where the equality holds due to Proposition 3.3(b), and the last inequality follows from the nonnegativity of $H(x, y)$.

(c) First, summing the inequalities in part (b) over $k = 1, 2, \ldots, m$ yields

$$\sum_{k=1}^m \mu_k f(x^k) - \sigma_m f(x) \leq H(x, x^0) - H(x, x^m), \quad \forall x \in \mathcal{K}. \qquad (21)$$

On the other hand, since $f(x^m) \leq f(x^k)$ by part (a), multiplying this inequality by $\mu_k$ and summing over $k = 1, 2, \ldots, m$ gives

$$\sum_{k=1}^m \mu_k f(x^k) \geq \sum_{k=1}^m \mu_k f(x^m) = \sigma_m f(x^m). \qquad (22)$$

Now, combining (21) with (22), we readily obtain the desired result.

(d) Since $f(x^k) - f(x) \geq 0$ for all $x \in X_*$, the result follows immediately from part (b).

(e) From part (d), the sequence $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for any $x \in X_*$. Together with $H(x, x^k) \geq 0$, this implies that $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is convergent. Thus,

$$H(x, x^{k-1}) - H(x, x^k) \to 0, \quad \forall x \in X_*. \qquad (23)$$

On the other hand, from (20) and part (d), we have

$$0 \leq \mu_k \left( f(x^k) - f(x) \right) \leq H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}), \quad \forall x \in X_*,$$

which in turn implies

$$H(x^k, x^{k-1}) \leq H(x, x^{k-1}) - H(x, x^k).$$

Using (23) and the nonnegativity of $H(x^k, x^{k-1})$, the first result is obtained. Since $\{x^k\}_{k \in \mathbb{N}} \subset \{y \in \operatorname{int}(\mathcal{K}) \mid H(x, y) \leq H(x, x^0)\}$ with $x \in X_*$, the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded. Thus, the second result follows from the first result and Proposition 3.2(b). $\square$

By now, we have proved that the algorithm in (11)-(12) is well-defined and satisfies some favorable properties. With these properties, we are ready to establish the convergence results of the algorithm, which is one of the main purposes of this paper.

Proposition 4.3 Let $\{x^k\}_{k \in \mathbb{N}}$ be the sequence generated by the algorithm in (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then, under assumption (A), the following results hold.

(a) $f(x^m) - f(x) \leq \sigma_m^{-1} H(x, x^0)$ for all $x \in \mathcal{K}$.
(b) If $\sigma_m \to +\infty$, then $\lim_{m \to +\infty} f(x^m) = f_*$.
(c) If the optimal set $X_* \neq \emptyset$, then the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded, and if, in addition, $\sigma_m \to +\infty$, every accumulation point is a solution of the CSCP (1).

Proof (a) This result follows from Proposition 4.2(c) and the nonnegativity of $H(x, x^m)$.

(b) Since $\sigma_m \to +\infty$, passing to the limit in the inequality in part (a), we have

$$\limsup_{m \to +\infty} f(x^m) \leq f(x), \quad \forall x \in \mathcal{K},$$

which in particular implies

$$\limsup_{m \to +\infty} f(x^m) \leq \inf\{f(x) : x \in \mathcal{K}\}.$$

Together with the fact that $f(x^m) \geq f_* = \inf\{f(x) : x \in \mathcal{K}\}$, this yields the result.

(c) From Proposition 3.1(e), $H(x, \cdot)$ has bounded level sets for every $x \in \mathcal{K}$. Also, from Proposition 4.2(d), $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for all $x \in X_*$ if $X_* \neq \emptyset$. Thus, the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded and therefore has an accumulation point. Let $\hat{x} \in \mathcal{K}$ be an accumulation point of $\{x^k\}_{k \in \mathbb{N}}$, so that $\{x^{k_j}\} \to \hat{x}$ for some $k_j \to +\infty$. Since $f$ is lower semicontinuous, we have $f(\hat{x}) \leq \liminf_{k_j \to +\infty} f(x^{k_j})$ (see [22, p. 8]). On the other hand, we also have $f(x^{k_j}) \to f_*$; hence $f(\hat{x}) = f_*$, which says that $\hat{x}$ is a solution of (1). $\square$

Proposition 4.3(a) gives an estimate of the global rate of convergence, similar to the one obtained for proximal-like algorithms for convex minimization over the nonnegative orthant cone. Analogously, as remarked in [6, Remark 4.1], global convergence of $\{x^k\}$ itself to a solution of (1) can be achieved under the assumption that $\{x^k\} \subset \operatorname{int}(\mathcal{K})$ has a limit point in $\operatorname{int}(\mathcal{K})$. Nonetheless, this assumption is rather stringent.
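To illustrate Sections 3-4 at once, the sketch below runs algorithm (11)-(12) on the PSD cone for $f(X) = \|X - B\|_F^2$, whose solution over $\mathcal{K}$ is the PSD projection of $B$. This is our own illustration under stated assumptions (NumPy/SciPy, a generic solver for each subproblem, the substitution $X = \exp(S)$ to stay in $\operatorname{int}(\mathcal{K})$), not the paper's implementation; with $\sigma_m = m\mu$, the observed objective gap should be consistent with the $\sigma_m^{-1} H(x^*, x^0)$ estimate of Proposition 4.3(a).

```python
import numpy as np
from scipy.optimize import minimize

def mat_fun(g, X):
    """Löwner operator: apply g to the eigenvalues of symmetric X."""
    lam, U = np.linalg.eigh(X)
    return (U * g(lam)) @ U.T

def H(X, Y):
    """Distance-like function (12) on S^n."""
    return np.trace(X @ mat_fun(np.log, X) - X @ mat_fun(np.log, Y) + Y - X)

def unpack(z, n):
    """Free parameter vector -> symmetric matrix (upper-triangular packing)."""
    Z = np.zeros((n, n))
    Z[np.triu_indices(n)] = z
    return Z + np.triu(Z, 1).T

def entropic_prox_psd(f, X0, mu=5.0, iters=50):
    """Algorithm (11)-(12) on the PSD cone; writing X = exp(S) with S symmetric
    keeps every iterate in the interior of the cone."""
    n, X = X0.shape[0], X0.copy()
    idx = np.triu_indices(n)
    for _ in range(iters):
        prev = X.copy()
        obj = lambda z: (lambda Xz: f(Xz) + H(Xz, prev) / mu)(mat_fun(np.exp, unpack(z, n)))
        z = minimize(obj, mat_fun(np.log, prev)[idx], method="BFGS").x
        X = mat_fun(np.exp, unpack(z, n))
    return X

# f(X) = ||X - B||_F^2: the solution over the PSD cone is the projection of B.
B = np.array([[1.0, 2.0], [2.0, -3.0]])
X = entropic_prox_psd(lambda X: np.sum((X - B) ** 2), X0=np.eye(2))
lam, U = np.linalg.eigh(B)
print(np.round(X, 2))                               # close to the PSD projection
print(np.round((U * np.maximum(lam, 0)) @ U.T, 2))  # exact projection, for comparison
```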

5 Exponential multiplier method for SCLP

In this section, we give a dual application of the entropy-like proximal algorithm in (11)-(12), leading to an exponential multiplier method for the symmetric cone linear program

$$\text{(SCLP)} \qquad \min\left\{ b^T y \;:\; c - \sum_{i=1}^m y_i a_i \succeq_{\mathcal{K}} 0,\ y \in \mathbb{R}^m \right\},$$

where $c \in \mathbb{V}$, $a_i \in \mathbb{V}$ and $b \in \mathbb{R}^m$. For notational convenience, we denote the feasible set of (SCLP) by $\mathcal{F} := \{y \in \mathbb{R}^m : \mathcal{A}(y) \succeq_{\mathcal{K}} 0\}$, where $\mathcal{A}: \mathbb{R}^m \to \mathbb{V}$ is the affine operator defined by

$$\mathcal{A}(y) := c - \sum_{i=1}^m y_i a_i, \quad \forall y \in \mathbb{R}^m. \qquad (24)$$

In addition, we make the following assumptions for the problem (SCLP):

(A1) the optimal solution set of (SCLP) is nonempty and bounded;
(A2) Slater's condition holds, i.e., there exists $\hat{y} \in \mathbb{R}^m$ such that $\mathcal{A}(\hat{y}) \succ_{\mathcal{K}} 0$.

Notice that the Lagrangian function associated with (SCLP) is defined as follows:

$$L(y, x) = \begin{cases} b^T y - \operatorname{tr}[x \circ \mathcal{A}(y)], & x \in \mathcal{K}, \\ +\infty, & \text{otherwise}. \end{cases}$$

Therefore, the dual problem corresponding to the SCLP is given by

$$\text{(DSCLP)} \qquad \max_{x \succeq_{\mathcal{K}} 0} \; \inf_{y \in \mathbb{R}^m} L(y, x),$$

which can be written explicitly as

$$\text{(DSCLP)} \qquad \begin{array}{ll} \max & -\operatorname{tr}(c \circ x) \\ \text{s.t.} & \operatorname{tr}(a_i \circ x) = -b_i, \quad i = 1, 2, \ldots, m, \\ & x \succeq_{\mathcal{K}} 0. \end{array}$$

From the standard convex analysis arguments in [19], it follows that under assumption (A2) there is no duality gap between (SCLP) and (DSCLP), and furthermore the solution set of (DSCLP) is nonempty and compact.

To solve the problem (SCLP), we introduce the following multiplier-type algorithm: given $x^0 \in \operatorname{int}(\mathcal{K})$, generate the sequences $\{y^k\}_{k \in \mathbb{N}} \subseteq \mathbb{R}^m$ and $\{x^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$ by

$$y^k = \operatorname{argmin}_{y \in \mathbb{R}^m} \left\{ b^T y + \mu_k^{-1} \operatorname{tr}\!\left[ \exp\!\left( -\mu_k \mathcal{A}(y) + \ln x^{k-1} \right) \right] \right\}, \qquad (25)$$

$$x^k = \exp\!\left( -\mu_k \mathcal{A}(y^k) + \ln x^{k-1} \right), \qquad (26)$$

where the parameters $\mu_k$ satisfy $\mu_k > \bar{\mu} > 0$ for all $k \in \mathbb{N}$. The algorithm can be viewed as a natural extension to symmetric cones of the exponential multiplier method for convex programs over nonnegative orthant cones (see, e.g., [3,27]). In the rest of this section, we will show that the algorithm defined by (25)-(26) possesses similar properties.
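The following is a minimal sketch of iterations (25)-(26) on a tiny SDP instance (our own, assuming NumPy/SciPy; the subproblem (25) is smooth by Lemma 5.1 below, so a quasi-Newton solver suffices). The instance minimizes $y_1$ subject to $\operatorname{diag}(y_1 - 1,\ y_1 + 1) \succeq 0$, whose solution is $y_1 = 1$.

```python
import numpy as np
from scipy.optimize import minimize

def mat_fun(g, X):
    """Löwner operator: apply g to the eigenvalues of symmetric X."""
    lam, U = np.linalg.eigh(X)
    return (U * g(lam)) @ U.T

def exp_multiplier_sclp(b, c, a, mu=1.0, iters=40):
    """Exponential multiplier method (25)-(26) for
    min b^T y  s.t.  A(y) = c - sum_i y_i a_i  positive semidefinite."""
    m, n = len(b), c.shape[0]
    A = lambda y: c - sum(y[i] * a[i] for i in range(m))
    X, y = np.eye(n), np.zeros(m)              # x^0 = identity lies in int(K)
    for _ in range(iters):
        lnX = mat_fun(np.log, X)
        obj = lambda y: b @ y + np.trace(mat_fun(np.exp, -mu * A(y) + lnX)) / mu
        y = minimize(obj, y, method="BFGS").x  # subproblem (25)
        X = mat_fun(np.exp, -mu * A(y) + lnX)  # multiplier update (26)
    return y, X

b = np.array([1.0])
c = np.diag([-1.0, 1.0])
a = [np.diag([-1.0, -1.0])]                    # A(y) = diag(y1 - 1, y1 + 1)
y, X = exp_multiplier_sclp(b, c, a)
print(np.round(y, 3))                          # approximately [1.]
```

As in the orthant case, the multiplier $X$ concentrates on the active constraint: its component along the binding eigenvalue converges to the dual optimal value, while the inactive component decays exponentially.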

We first present two technical lemmas: Lemma 5.1 collects some properties of the Löwner operator $\exp(z)$, and Lemma 5.2 characterizes the recession cone of $\mathcal{F}$.

Lemma 5.1

(a) For any $z \in \mathbb{V}$, there always holds $\exp(z) \in \operatorname{int}(\mathcal{K})$.

(b) The function $\exp(z)$ is continuously differentiable everywhere, with

$$(\exp)'(z)\, e = \exp(z), \quad \forall z \in \mathbb{V}.$$

(c) The function $\operatorname{tr}[\exp(\sum_{i=1}^m y_i a_i)]$ is continuously differentiable everywhere, with

$$\nabla_y \operatorname{tr}\!\left[ \exp\!\left( \sum_{i=1}^m y_i a_i \right) \right] = \mathcal{A}^T\!\left( \exp\!\left( \sum_{i=1}^m y_i a_i \right) \right),$$

where $y \in \mathbb{R}^m$ and $a_i \in \mathbb{V}$ for all $i = 1, 2, \ldots, m$, and $\mathcal{A}^T: \mathbb{V} \to \mathbb{R}^m$ is the linear transformation defined by $\mathcal{A}^T(x) = (\langle a_1, x\rangle, \ldots, \langle a_m, x\rangle)^T$ for any $x \in \mathbb{V}$.

Proof (a) The result is clear, since for any $z \in \mathbb{V}$ with $z = \sum_{j=1}^r \lambda_j(z) c_j$, all eigenvalues of $\exp(z)$, given by $\exp(\lambda_j(z))$ for $j = 1, 2, \ldots, r$, are positive.

(b) By Lemma 2.2, $\exp(z)$ is clearly continuously differentiable everywhere, and

$$(\exp)'(z)\, h = \sum_{j=1}^r \exp(\lambda_j(z)) \langle c_j, h\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)} \, c_j \circ (c_l \circ h)$$

for any $h \in \mathbb{V}$. From this formula, we have in particular

$$(\exp)'(z)\, e = \sum_{j=1}^r \exp(\lambda_j(z)) \langle c_j, e\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)} \, c_j \circ (c_l \circ e) = \sum_{j=1}^r \exp(\lambda_j(z)) c_j = \exp(z).$$

(c) The first part is direct from the continuous differentiability of the trace function and part (b), and the second part follows from the chain rule. $\square$

Lemma 5.2 Let $\mathcal{F}^\infty$ denote the recession cone of the feasible set $\mathcal{F}$. Then

$$\mathcal{F}^\infty = \left\{ d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0 \right\}.$$

Proof Assume that $d \in \mathbb{R}^m$ satisfies $\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0$. If $d = 0$, clearly $d \in \mathcal{F}^\infty$. If $d \neq 0$, take any $y \in \mathcal{F}$. From the definition of the operator $\mathcal{A}$, for any $\tau > 0$,

$$\mathcal{A}(y + \tau d) = c - \sum_{i=1}^m (y_i + \tau d_i) a_i = \mathcal{A}(y) + \tau \left( \mathcal{A}(d) - c \right) \succeq_{\mathcal{K}} 0.$$

By the definition of a recession direction [19, p. 61], this shows that $d \in \mathcal{F}^\infty$. Thus we have proved that $\{d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\} \subseteq \mathcal{F}^\infty$.

Now take any $d \in \mathcal{F}^\infty$ and $y \in \mathcal{F}$. Then $\mathcal{A}(y + \tau d) \succeq_{\mathcal{K}} 0$ for any $\tau > 0$, and therefore $\lambda_{\min}[\mathcal{A}(y + \tau d)] \geq 0$. This must imply $\lambda_{\min}(\mathcal{A}(d) - c) \geq 0$. If not, using the fact that

$$\lambda_{\min}[\mathcal{A}(y + \tau d)] = \lambda_{\min}\!\left[ \mathcal{A}(y) + \tau \left( \mathcal{A}(d) - c \right) \right] \leq \lambda_{\min}\!\left( \tau \left( \mathcal{A}(d) - c \right) \right) + \lambda_{\max}(\mathcal{A}(y)) = \tau \lambda_{\min}(\mathcal{A}(d) - c) + \lambda_{\max}(\mathcal{A}(y)),$$

where the inequality is due to Lemma 2.1, and letting $\tau \to +\infty$, we would have

$$\lambda_{\min}[\mathcal{A}(y + \tau d)] \to -\infty,$$

contradicting the fact that $\lambda_{\min}[\mathcal{A}(y + \tau d)] \geq 0$. Consequently, $\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0$. Together with the above discussion, this shows that $\mathcal{F}^\infty = \{d \in \mathbb{R}^m \mid \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\}$. $\square$

Next we establish the convergence of the algorithm in (25)-(26). We first prove that the generated sequence is well-defined, which is implied by the following lemma.

Lemma 5.3 For any fixed $x^{k-1} \in \operatorname{int}(\mathcal{K})$ and $\mu > 0$, let $F: \mathbb{R}^m \to \mathbb{R}$ be defined by

$$F(y) := b^T y + \mu^{-1} \operatorname{tr}\!\left[ \exp\!\left( -\mu \mathcal{A}(y) + \ln x^{k-1} \right) \right]. \qquad (27)$$

Then, under assumption (A1), the minimum set of $F$ is nonempty and bounded.

Proof We prove that $F$ is coercive, by contradiction. Suppose not, i.e., some level set of $F$ is unbounded. Then there exists a sequence $\{y^k\} \subseteq \mathbb{R}^m$ such that

$$\|y^k\| \to +\infty, \qquad \lim_{k \to +\infty} \frac{y^k}{\|y^k\|} = d \neq 0, \qquad \text{and} \qquad F(y^k) \leq \delta \qquad (28)$$

for some $\delta \in \mathbb{R}$. Since $\exp(-\mu \mathcal{A}(y) + \ln x^{k-1}) \in \operatorname{int}(\mathcal{K})$ for any $y \in \mathbb{R}^m$ by Lemma 5.1(a),

$$\operatorname{tr}\!\left[ \exp\!\left( -\mu \mathcal{A}(y) + \ln x^{k-1} \right) \right] = \sum_{j=1}^r \exp\!\left( \lambda_j\!\left( -\mu \mathcal{A}(y) + \ln x^{k-1} \right) \right) > 0.$$
