DOI 10.1007/s10589-008-9227-0

An entropy-like proximal algorithm and the exponential multiplier method for convex symmetric cone programming

Jein-Shan Chen · Shaohua Pan

Received: 28 April 2006 / Revised: 6 December 2008 / Published online: 18 December 2008

© Springer Science+Business Media, LLC 2008

Abstract We introduce an entropy-like proximal algorithm for the problem of minimizing a closed proper convex function subject to symmetric cone constraints. The algorithm is based on a distance-like function that extends the Kullback-Leibler relative entropy to the setting of symmetric cones. As with the proximal algorithms for convex programming with nonnegative orthant cone constraints, we show that, under some mild assumptions, the sequence generated by the proposed algorithm is bounded and every accumulation point is a solution of the considered problem. In addition, we present a dual application of the proposed algorithm to the symmetric cone linear program, leading to a multiplier method which is shown to possess properties similar to those of the exponential multiplier method (Tseng and Bertsekas in Math. Program. 60:1–19, 1993).

Keywords Symmetric cone optimization · Proximal-like method · Entropy-like distance · Exponential multiplier method

J.-S. Chen is a member of the Mathematics Division, National Center for Theoretical Sciences, Taipei Office. The author's work is partially supported by the National Science Council of Taiwan.

J.-S. Chen (✉)
Department of Mathematics, National Taiwan Normal University, Taipei 11677, Taiwan
e-mail: jschen@math.ntnu.edu.tw

S. Pan
School of Mathematical Sciences, South China University of Technology, Guangzhou 510640, China
e-mail: shhpan@scut.edu.cn

1 Introduction

Symmetric cone programming provides a unified framework for linear programming, second-order cone programming, and semidefinite programming, which arise from a wide range of applications in engineering, economics, management science, optimal control, combinatorial optimization, and other fields; see [1,16,28] and the references therein. Recently, symmetric cone programming, especially symmetric cone linear programming (SCLP), has attracted the attention of researchers, with a focus on the development of interior point methods similar to those for linear programming; see [9,10,23]. Although interior point methods have been applied successfully to SCLPs, it is worthwhile to explore other solution methods for general convex symmetric cone optimization problems.

Let $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle)$ be a Euclidean Jordan algebra, where $(\mathbb{V}, \langle\cdot,\cdot\rangle)$ is a finite-dimensional inner product space over the real field $\mathbb{R}$ and "$\circ$" denotes the Jordan product, which will be defined in the next section. Let $\mathcal{K}$ be the symmetric cone in $\mathbb{V}$. In this paper, we consider the following convex symmetric cone programming problem (CSCP):

$$\min\ f(x) \quad \text{s.t.} \quad x \succeq_{\mathcal{K}} 0, \qquad (1)$$

where $f: \mathbb{V} \to (-\infty, \infty]$ is a closed proper convex function and $x \succeq_{\mathcal{K}} 0$ means $x \in \mathcal{K}$. In general, for any $x, y \in \mathbb{V}$, we write $x \succeq_{\mathcal{K}} y$ if $x - y \in \mathcal{K}$, and $x \succ_{\mathcal{K}} y$ if $x - y \in \operatorname{int}(\mathcal{K})$. A function is closed if and only if it is lower semicontinuous, and a function is proper if $f(x) < \infty$ for at least one $x \in \mathbb{V}$ and $f(x) > -\infty$ for all $x \in \mathbb{V}$.

Notice that the CSCP is a special class of convex programs, and hence in principle it can be solved via general convex programming methods. One such method is the proximal point algorithm for minimizing a convex function $f(x)$ on $\mathbb{R}^n$, which generates a sequence $\{x^k\}_{k \in \mathbb{N}}$ via the iterative scheme

$$x^k = \operatorname{argmin}_{x \in \mathbb{R}^n} \left\{ f(x) + \frac{1}{\mu_k} \|x - x^{k-1}\|_2^2 \right\}, \qquad (2)$$

where $\mu_k > 0$ and $\|\cdot\|_2$ denotes the Euclidean norm in $\mathbb{R}^n$. This method was first introduced by Martinet [17], based on the Moreau proximal approximation of $f$ (see [18]), and was further developed and studied by Rockafellar [20,21]. Later, several generalizations of the proximal point algorithm were proposed in which the usual quadratic proximal term in (2) is replaced by a nonquadratic distance-like function; see, for example, [5,7,8,14,25].
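To make the scheme concrete, the following is a minimal numerical sketch of iteration (2), assuming NumPy and SciPy; the test function and all names are our own illustration, not from the paper, and a general-purpose solver stands in for an exact proximal step.

```python
import numpy as np
from scipy.optimize import minimize

# Proximal point iteration (2): each step minimizes f plus a quadratic
# penalty that ties the new iterate to the previous one.
c = np.array([1.0, -2.0])
f = lambda x: np.sum((x - c) ** 2)      # simple convex test function
x, mu = np.array([5.0, 5.0]), 1.0
for _ in range(25):
    prev = x.copy()
    x = minimize(lambda z: f(z) + np.sum((z - prev) ** 2) / mu, prev).x
print(np.round(x, 4))                   # converges to the minimizer c
```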

Among others, the algorithms using an entropy-like distance [13,14,25,26] for minimizing a convex function $f(x)$ subject to $x \in \mathbb{R}^n_+$ generate the iterates by

$$x^0 \in \mathbb{R}^n_{++}, \qquad x^k = \operatorname{argmin}_{x \in \mathbb{R}^n_+} \left\{ f(x) + \frac{1}{\mu_k} d_\varphi(x, x^{k-1}) \right\}, \qquad (3)$$

where $d_\varphi(\cdot,\cdot): \mathbb{R}^n_{++} \times \mathbb{R}^n_{++} \to \mathbb{R}_+$ is the entropy-like distance defined by

$$d_\varphi(x, y) = \sum_{i=1}^n y_i\, \varphi(x_i / y_i), \qquad (4)$$

with $\varphi$ satisfying certain conditions; see [13,14,25,26]. An important choice of $\varphi$ is $\varphi(t) = t \ln t - t + 1$, for which the corresponding $d_\varphi$, given by

$$d_\varphi(x, y) = \sum_{i=1}^n \left( x_i \ln x_i - x_i \ln y_i + y_i - x_i \right), \qquad (5)$$

is the popular Kullback-Leibler relative entropy from statistics, which is where the "entropy" terminology stems from. One key feature of entropic proximal methods is that they automatically generate a sequence staying in the interior of $\mathbb{R}^n_+$, thus eliminating the combinatorial nature of the problem. One of the main applications of such proximal methods is to the dual of smooth convex programs, yielding twice continuously differentiable nonquadratic augmented Lagrangians and thereby allowing the use of Newton's method.

The main purpose of this paper is to propose an interior proximal-like method and the corresponding dual augmented Lagrangian method for the CSCP (1). Specifically, by using Euclidean Jordan algebraic techniques, we extend the entropy-like proximal algorithm defined by (3)-(4) with $\varphi(t) = t \ln t - t + 1$ to the solution of (1). For the proposed algorithm, we establish a global convergence estimate in terms of the objective value, and moreover we present a dual application to the standard SCLP, which leads to an exponential multiplier method shown to possess properties analogous to the method proposed in [3,27] for convex programming over the nonnegative orthant cone $\mathbb{R}^n_+$.

The paper is organized as follows. Section 2 reviews some basic concepts and materials on Euclidean Jordan algebras which are needed in the analysis of the algorithm. In Sect. 3, we introduce a distance-like function $H$ to measure how close two points in the symmetric cone $\mathcal{K}$ are, and we investigate some of its properties. Furthermore, we outline a basic proximal-like algorithm with the measure function $H$. The convergence analysis of the algorithm is the main content of Sect. 4. In Sect. 5, we consider a dual application of the algorithm to the SCLP and establish the convergence results for the corresponding multiplier method. We close this paper with some remarks in Sect. 6.

2 Preliminaries on Euclidean Jordan algebra

This section recalls some concepts and results on Euclidean Jordan algebras that will be used in the subsequent sections. More detailed expositions of Euclidean Jordan algebras can be found in Koecher's lecture notes [15] and Faraut and Korányi's monograph [11].

Let $\mathbb{V}$ be a finite-dimensional inner product space endowed with a bilinear mapping $(x, y) \mapsto x \circ y : \mathbb{V} \times \mathbb{V} \to \mathbb{V}$. For a given $x \in \mathbb{V}$, let $L(x)$ be the linear operator on $\mathbb{V}$ defined by

$$L(x)y := x \circ y \quad \text{for every } y \in \mathbb{V}.$$

The pair $(\mathbb{V}, \circ)$ is called a Jordan algebra if, for all $x, y \in \mathbb{V}$,

(i) $x \circ y = y \circ x$;
(ii) $x \circ (x^2 \circ y) = x^2 \circ (x \circ y)$, where $x^2 := x \circ x$.

In a Jordan algebra $(\mathbb{V}, \circ)$, $x \circ y$ is said to be the Jordan product of $x$ and $y$. Note that a Jordan algebra need not be associative, i.e., $x \circ (y \circ z) = (x \circ y) \circ z$ may fail in general. If for some element $e \in \mathbb{V}$ we have $x \circ e = e \circ x = x$ for all $x \in \mathbb{V}$, then $e$ is called a unit element of the Jordan algebra $(\mathbb{V}, \circ)$. The unit element, if it exists, is unique.

A Jordan algebra does not necessarily have a unit element. For $x \in \mathbb{V}$, let $\zeta(x)$ be the degree of the minimal polynomial of $x$, which can be equivalently defined as

$$\zeta(x) := \min\left\{ k : \{e, x, x^2, \ldots, x^k\} \text{ are linearly dependent} \right\}.$$

Then the rank of $(\mathbb{V}, \circ)$ is well defined by $r := \max\{\zeta(x) : x \in \mathbb{V}\}$.

A Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$, defined over the real field $\mathbb{R}$, is called a Euclidean Jordan algebra, or a formally real Jordan algebra, if there exists a positive definite symmetric bilinear form on $\mathbb{V}$ which is associative; in other words, there exists on $\mathbb{V}$ an inner product, denoted by $\langle\cdot,\cdot\rangle_{\mathbb{V}}$, such that for all $x, y, z \in \mathbb{V}$:

(iii) $\langle x \circ y, z\rangle_{\mathbb{V}} = \langle y, x \circ z\rangle_{\mathbb{V}}$.

In a Euclidean Jordan algebra $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$, we define the set of squares as

$$\mathcal{K} := \{x^2 : x \in \mathbb{V}\}.$$

By [11, Theorem III.2.1], $\mathcal{K}$ is a symmetric cone. This means that $\mathcal{K}$ is a self-dual closed convex cone with nonempty interior, and that for any two elements $x, y \in \operatorname{int}(\mathcal{K})$ there exists an invertible linear transformation $\mathcal{T}: \mathbb{V} \to \mathbb{V}$ such that $\mathcal{T}(\mathcal{K}) = \mathcal{K}$ and $\mathcal{T}(x) = y$.

Here are two popular examples of Euclidean Jordan algebras. Let $\mathcal{S}^n$ be the space of $n \times n$ real symmetric matrices with the inner product given by $\langle X, Y\rangle_{\mathcal{S}^n} := \operatorname{Tr}(XY)$, where $XY$ is the matrix product of $X$ and $Y$ and $\operatorname{Tr}(XY)$ is the trace of $XY$. Then $(\mathcal{S}^n, \circ, \langle\cdot,\cdot\rangle_{\mathcal{S}^n})$ is a Euclidean Jordan algebra with the Jordan product defined by

$$X \circ Y := (XY + YX)/2, \quad \forall X, Y \in \mathcal{S}^n.$$

In this case, the unit element is the identity matrix $I$ in $\mathcal{S}^n$, and the cone of squares $\mathcal{K}$ is the set of all positive semidefinite matrices in $\mathcal{S}^n$. Let $\mathbb{R}^n$ be the Euclidean space of dimension $n$ with the usual inner product $\langle x, y\rangle_{\mathbb{R}^n} = x^T y$. For any $x = (x_1, x_2),\ y = (y_1, y_2) \in \mathbb{R} \times \mathbb{R}^{n-1}$, define $x \circ y := (x^T y,\ x_1 y_2 + y_1 x_2)$. Then $(\mathbb{R}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{R}^n})$ is a Euclidean Jordan algebra, also called the algebra of quadratic forms. In this algebra, the unit element is $e = (1, 0, \ldots, 0)^T$ and $\mathcal{K} = \{x = (x_1, x_2) \in \mathbb{R} \times \mathbb{R}^{n-1} : x_1 \geq \|x_2\|_2\}$ is the second-order cone.
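As a quick numerical illustration (our own, assuming NumPy), the snippet below checks the Jordan identity (ii) for the symmetrized matrix product on $\mathcal{S}^n$, exhibits its non-associativity, and verifies the unit element of the quadratic forms algebra.

```python
import numpy as np

def jordan_sym(X, Y):
    """Jordan product on S^n: X ∘ Y = (XY + YX)/2."""
    return (X @ Y + Y @ X) / 2

def jordan_soc(x, y):
    """Jordan product of the quadratic forms algebra: (x^T y, x1*y2 + y1*x2)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
A2 = jordan_sym(A, A)
# Jordan identity (ii): x ∘ (x^2 ∘ y) = x^2 ∘ (x ∘ y).
print(np.allclose(jordan_sym(A, jordan_sym(A2, B)), jordan_sym(A2, jordan_sym(A, B))))  # True
# Non-associativity: (A ∘ A) ∘ B generally differs from A ∘ (A ∘ B).
print(np.allclose(jordan_sym(A2, B), jordan_sym(A, jordan_sym(A, B))))                  # False
# e = (1, 0, ..., 0) is the unit element of the quadratic forms algebra.
x, e = rng.standard_normal(3), np.array([1.0, 0.0, 0.0])
print(np.allclose(jordan_soc(x, e), x))                                                 # True
```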

Recall that an element $c \in \mathbb{V}$ is said to be idempotent if $c^2 = c$. Two idempotents $c$ and $q$ are said to be orthogonal if $c \circ q = 0$. One says that $\{c_1, c_2, \ldots, c_k\}$ is a complete system of orthogonal idempotents if

$$c_j^2 = c_j, \qquad c_j \circ c_i = 0 \ \text{ if } j \neq i, \ \text{ for all } j, i = 1, 2, \ldots, k, \qquad \text{and} \qquad \sum_{j=1}^k c_j = e.$$

An idempotent is said to be primitive if it is nonzero and cannot be written as the sum of two other nonzero idempotents. We call a complete system of orthogonal primitive idempotents a Jordan frame. Then we have the following spectral decomposition theorem.

Theorem 2.1 [11, Theorem III.1.2] Suppose that $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$ is a Euclidean Jordan algebra of rank $r$. Then for any $x \in \mathbb{V}$, there exist a Jordan frame $\{c_1, c_2, \ldots, c_r\}$ and real numbers $\lambda_1(x), \lambda_2(x), \ldots, \lambda_r(x)$, arranged in decreasing order $\lambda_1(x) \geq \lambda_2(x) \geq \cdots \geq \lambda_r(x)$, such that $x = \sum_{j=1}^r \lambda_j(x) c_j$. The numbers $\lambda_j(x)$ (counting multiplicities), which are uniquely determined by $x$, are called the eigenvalues of $x$; $\sum_{j=1}^r \lambda_j(x) c_j$ is called the spectral decomposition of $x$, and $\operatorname{tr}(x) = \sum_{j=1}^r \lambda_j(x)$ is the trace of $x$.

From [11, Proposition III.1.5], a Jordan algebra $(\mathbb{V}, \circ)$ with a unit element $e \in \mathbb{V}$ is Euclidean if and only if the symmetric bilinear form $\operatorname{tr}(x \circ y)$ is positive definite. Therefore, we may define another inner product on $\mathbb{V}$ by

$$\langle x, y\rangle := \operatorname{tr}(x \circ y), \quad \forall x, y \in \mathbb{V}.$$

By the associativity of $\operatorname{tr}(\cdot)$ [11, Proposition II.4.3], the inner product $\langle\cdot,\cdot\rangle$ is associative, i.e., for all $x, y, z \in \mathbb{V}$ there holds $\langle x, y \circ z\rangle = \langle y, x \circ z\rangle$. Thus, for each $x \in \mathbb{V}$, the operator $L(x)$ is symmetric with respect to the inner product $\langle\cdot,\cdot\rangle$ in the sense that

$$\langle L(x)y, z\rangle = \langle y, L(x)z\rangle, \quad \forall y, z \in \mathbb{V}.$$

In the sequel, we let $\|\cdot\|$ be the norm on $\mathbb{V}$ induced by the inner product $\langle\cdot,\cdot\rangle$, i.e.,

$$\|x\| := \sqrt{\langle x, x\rangle} = \left( \sum_{j=1}^r \lambda_j^2(x) \right)^{1/2}, \quad \forall x \in \mathbb{V},$$

and we denote by $\lambda_{\min}(x)$ and $\lambda_{\max}(x)$ the smallest and the largest eigenvalue of $x$, respectively. Then, by Lemma 13 and the proof of Lemma 14 in [23], we can prove the following lemma.

Lemma 2.1 Let $x, y \in \mathbb{V}$. Then the minimum eigenvalue of $x + y$ can be bounded as follows:

$$\lambda_{\min}(x) + \lambda_{\min}(y) \leq \lambda_{\min}(x + y) \leq \lambda_{\min}(x) + \lambda_{\max}(y).$$

Let $g: \mathbb{R} \to \mathbb{R}$ be a scalar-valued function. Then it is natural to define a vector-valued function associated with the Euclidean Jordan algebra $(\mathbb{V}, \circ, \langle\cdot,\cdot\rangle_{\mathbb{V}})$ by

$$g^{\mathrm{sc}}(x) := g(\lambda_1(x)) c_1 + g(\lambda_2(x)) c_2 + \cdots + g(\lambda_r(x)) c_r, \qquad (6)$$

where $x \in \mathbb{V}$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$. This function is called the Löwner operator in [24] and was shown to have the following important property.

Lemma 2.2 [24, Theorem 13] For any $x = \sum_{j=1}^r \lambda_j(x) c_j$, let $g^{\mathrm{sc}}$ be defined by (6). Then $g^{\mathrm{sc}}$ is (continuously) differentiable at $x$ if and only if $g$ is (continuously) differentiable at all $\lambda_j(x)$. Furthermore, the derivative of $g^{\mathrm{sc}}$ at $x$, for any $h \in \mathbb{V}$, is given by

$$(g^{\mathrm{sc}})'(x)\, h = \sum_{j=1}^r g'(\lambda_j(x)) \langle c_j, h\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, [\lambda_j(x), \lambda_l(x)]_g \; c_j \circ (c_l \circ h),$$

where

$$[\lambda_i(x), \lambda_j(x)]_g := \frac{g(\lambda_i(x)) - g(\lambda_j(x))}{\lambda_i(x) - \lambda_j(x)}, \quad i, j = 1, 2, \ldots, r,\ i \neq j.$$

In fact, the Jacobian $(g^{\mathrm{sc}})'(\cdot)$ is a linear and symmetric operator, and it can be written as

$$(g^{\mathrm{sc}})'(x) = \sum_{j=1}^r g'(\lambda_j(x)) Q(c_j) + 2 \sum_{i,j=1,\, i \neq j}^r [\lambda_i(x), \lambda_j(x)]_g \, L(c_j) L(c_i), \qquad (7)$$

where $Q(x) := 2L^2(x) - L(x^2)$ for any $x \in \mathbb{V}$ is called the quadratic representation of $\mathbb{V}$.

Finally, we recall the spectral function generated by a symmetric function. Let $\mathcal{P}$ denote the set of all permutations of $r$-dimensional vectors. A subset of $\mathbb{R}^r$ is said to be symmetric if it remains unchanged under every permutation of $\mathcal{P}$. Let $S$ be a symmetric set in $\mathbb{R}^r$. A real-valued function $f: S \to \mathbb{R}$ is said to be symmetric if for every permutation $P \in \mathcal{P}$ and each $s \in S$ there holds $f(Ps) = f(s)$. For any $x \in \mathbb{V}$ with the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$, define $\mathcal{K}_S := \{x \in \mathbb{V} \mid \lambda(x) \in S\}$, with $\lambda(x) = (\lambda_1(x), \ldots, \lambda_r(x))^T$ being the spectral vector of $x$. Then $F: \mathcal{K}_S \to \mathbb{R}$ defined by

$$F(x) := f(\lambda(x)) \qquad (8)$$

is called the spectral function generated by $f$. From Theorem 41 of [2], $F$ is (strictly) convex if and only if $f$ is (strictly) convex.

Unless otherwise stated, in the rest of this paper the notation $\mathbb{A} = (\mathbb{V}, \circ, \langle\cdot,\cdot\rangle)$ represents a Euclidean Jordan algebra of rank $r$ with $\dim(\mathbb{V}) = n$. For a closed proper convex function $f: \mathbb{V} \to (-\infty, +\infty]$, we denote by $\operatorname{dom} f := \{x \in \mathbb{V} \mid f(x) < +\infty\}$ the domain of $f$. The subdifferential of $f$ at $x_0 \in \mathbb{V}$ is the convex set

$$\partial f(x_0) = \{\xi \in \mathbb{V} \mid f(x) \geq f(x_0) + \langle \xi, x - x_0\rangle, \ \forall x \in \mathbb{V}\}. \qquad (9)$$

Since $\langle x, y\rangle = \operatorname{tr}(x \circ y)$ for any $x, y \in \mathbb{V}$, the above subdifferential set is equivalent to

$$\partial f(x_0) = \left\{ \xi \in \mathbb{V} \mid f(x) \geq f(x_0) + \operatorname{tr}\!\left( \xi \circ (x - x_0) \right), \ \forall x \in \mathbb{V} \right\}. \qquad (10)$$

For a sequence $\{x^k\}_{k \in \mathbb{N}}$, the notation $\mathbb{N}$ denotes the set of natural numbers.

3 Entropy-like proximal algorithm

To solve the CSCP (1), we suggest the following proximal-like minimization algorithm:

$$x^0 \succ_{\mathcal{K}} 0, \qquad x^k = \operatorname{argmin}_{x \succeq_{\mathcal{K}} 0} \left\{ f(x) + \frac{1}{\mu_k} H(x, x^{k-1}) \right\}, \qquad (11)$$

where $\mu_k > 0$ and $H: \mathbb{V} \times \mathbb{V} \to (-\infty, +\infty]$ is defined by

$$H(x, y) := \begin{cases} \operatorname{tr}(x \circ \ln x - x \circ \ln y + y - x), & x \in \operatorname{int}(\mathcal{K}),\ y \in \operatorname{int}(\mathcal{K}), \\ +\infty, & \text{otherwise}. \end{cases} \qquad (12)$$

This algorithm is indeed of proximal type, except that the classical quadratic term $\|x - x^{k-1}\|_2^2$ is replaced by the distance-like function $H$ to guarantee that $\{x^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$, thus leading to an interior proximal-like method (see Proposition 4.1).

By the definition of the Löwner operator, the function $H(x, y)$ is clearly well-defined for all $x, y \in \operatorname{int}(\mathcal{K})$. Moreover, the domain $x \in \operatorname{int}(\mathcal{K})$ can be extended to $x \in \mathcal{K}$ by adopting the convention $0 \ln 0 \equiv 0$. The function $H$ is a natural extension of the distance-like entropy function in (5), and it is used to measure the "distance" between two points in $\mathcal{K}$. In fact, $H$ reduces to the entropy function $d_\varphi$ in (5) if the Euclidean Jordan algebra $\mathbb{A}$ is specified as $(\mathbb{R}^n, \circ, \langle\cdot,\cdot\rangle_{\mathbb{R}^n})$, with "$\circ$" denoting the componentwise product of two vectors in $\mathbb{R}^n$ and $\langle\cdot,\cdot\rangle_{\mathbb{R}^n}$ the usual Euclidean inner product. As shown by Proposition 3.1 below, most of the important properties of $d_\varphi$, though not all, also hold for $H$.

The following two technical lemmas will be used to investigate the favorable properties of the distance measure $H$. Lemma 3.1 states an extension of the von Neumann inequality to Euclidean Jordan algebras, and Lemma 3.2 gives the properties of $\operatorname{tr}(x \circ \ln x)$.

Lemma 3.1 [2] For any $x, y \in \mathbb{V}$, we have $\operatorname{tr}(x \circ y) \leq \sum_{j=1}^r \lambda_j(x) \lambda_j(y) = \lambda(x)^T \lambda(y)$, where $\lambda(x)$ and $\lambda(y)$ are the spectral vectors of $x$ and $y$, respectively.
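On $\mathcal{S}^n$ this is the classical von Neumann trace inequality, which is easy to check numerically (our own sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
B = rng.standard_normal((5, 5)); B = (B + B.T) / 2
lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]   # spectral vector, decreasing order
lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]
# tr(A ∘ B) = tr(AB) on S^n; Lemma 3.1 bounds it by the ordered eigenvalue product.
print(np.trace(A @ B) <= lam_A @ lam_B + 1e-12)  # True
```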

Lemma 3.2 For any $x \in \mathcal{K}$, let $\Phi(x) := \operatorname{tr}(x \circ \ln x)$. Then we have the following results.

(a) $\Phi(x)$ is the spectral function generated by the symmetric entropy function

$$\phi(u) = \sum_{j=1}^r u_j \ln u_j, \quad \forall u \in \mathbb{R}^r_+. \qquad (13)$$

(b) $\Phi(x)$ is continuously differentiable on $\operatorname{int}(\mathcal{K})$ with $\nabla \Phi(x) = \ln x + e$.

(c) The function $\Phi(x)$ is strictly convex over $\mathcal{K}$.

Proof (a) Suppose that $x$ has the spectral decomposition $x = \sum_{j=1}^r \lambda_j(x) c_j$, and let $g(t) = t \ln t$ for $t \geq 0$.

From Sect. 2, it follows that the vector-valued function $x \circ \ln x$ is the Löwner function $g^{\mathrm{sc}}(x)$, i.e., $g^{\mathrm{sc}}(x) = x \circ \ln x$. Clearly, $g^{\mathrm{sc}}$ is well-defined for any $x \in \mathcal{K}$, and

$$g^{\mathrm{sc}}(x) = x \circ \ln x = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(x)) \, c_j.$$

Therefore,

$$\Phi(x) = \operatorname{tr}(x \circ \ln x) = \operatorname{tr}(g^{\mathrm{sc}}(x)) = \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(x)) = \phi(\lambda(x))$$

with $\phi: \mathbb{R}^r_+ \to \mathbb{R}$ given by (13). Since the function $\phi$ is symmetric, $\Phi(x)$ is the spectral function generated by the symmetric function $\phi$ in view of (8).

(b) From Lemma 2.2, $g^{\mathrm{sc}}(x) = x \circ \ln x$ is continuously differentiable on $\operatorname{int}(\mathcal{K})$. Thus $\Phi(x)$ is also continuously differentiable on $\operatorname{int}(\mathcal{K})$, because $\Phi$ is the composition of the trace function (clearly continuously differentiable) and $g^{\mathrm{sc}}$. It remains to find its gradient formula. From the fact that $\operatorname{tr}(x \circ y) = \langle x, y\rangle$, we have

$$\Phi(x) = \operatorname{tr}(x \circ \ln x) = \langle x, \ln x\rangle.$$

Applying the chain rule to this inner product of two functions, and using the symmetry of the Jacobian $(\ln)'(x)$, we then obtain

$$\nabla \Phi(x) = \ln x + (\ln)'(x)\, x. \qquad (14)$$

On the other hand, from formula (7) it follows that for any $h \in \mathbb{V}$,

$$(\ln)'(x)\, h = \sum_{j=1}^r \frac{1}{\lambda_j(x)} \langle c_j, h\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)} \, c_j \circ (c_l \circ h).$$

By this and the spectral decomposition of $x$, it is easy to compute that

$$\begin{aligned} (\ln)'(x)\, x &= \sum_{j=1}^r \frac{1}{\lambda_j(x)} \langle c_j, x\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)} \, c_j \circ (c_l \circ x) \\ &= \sum_{j=1}^r \frac{1}{\lambda_j(x)} \lambda_j(x)\, c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\ln(\lambda_j(x)) - \ln(\lambda_l(x))}{\lambda_j(x) - \lambda_l(x)} \, \lambda_l(x)\, c_j \circ c_l \\ &= e, \end{aligned}$$

where the last equality uses $\langle c_j, x\rangle = \lambda_j(x)$ and the orthogonality $c_j \circ c_l = 0$ for $j \neq l$. Combining this with (14), we readily obtain the desired result.

(c) Note that the function $\phi$ in (13) is strictly convex over $\mathbb{R}^r_+$. Therefore, the conclusion follows immediately from part (a) and Theorem 41 of [2]. $\square$

Next we study the favorable properties of the distance-like function H. These properties play a crucial role in the convergence analysis of the algorithm defined by (11)–(12).


Proposition 3.1 Let $H(x, y)$ be defined by (12). Then the following results hold.

(a) $H(x, y)$ is continuous on $\mathcal{K} \times \operatorname{int}(\mathcal{K})$, and $H(\cdot, y)$ is strictly convex for any $y \in \operatorname{int}(\mathcal{K})$.

(b) For any fixed $y \in \operatorname{int}(\mathcal{K})$, $H(\cdot, y)$ is continuously differentiable on $\operatorname{int}(\mathcal{K})$ with

$$\nabla_x H(x, y) = \ln x - \ln y.$$

(c) $H(x, y) \geq 0$ for any $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$, and $H(x, y) = 0$ if and only if $x = y$.

(d) $H(x, y) \geq d(\lambda(x), \lambda(y)) \geq 0$ for any $x \in \mathcal{K}$, $y \in \operatorname{int}(\mathcal{K})$, where $d(\cdot,\cdot)$ is defined by

$$d(u, v) = \sum_{i=1}^r \left( u_i \ln u_i - u_i \ln v_i + v_i - u_i \right), \quad \forall u \in \mathbb{R}^r_+,\ v \in \mathbb{R}^r_{++}. \qquad (15)$$

(e) For fixed $y \in \operatorname{int}(\mathcal{K})$, the level sets $L_H(x, \gamma) := \{x \in \mathcal{K} \mid H(x, y) \leq \gamma\}$ are bounded for all $\gamma \geq 0$; and for fixed $x \in \mathcal{K}$, the level sets $L_H(y, \gamma) := \{y \in \operatorname{int}(\mathcal{K}) \mid H(x, y) \leq \gamma\}$ are bounded for all $\gamma \geq 0$.

Proof (a) Since $x \circ \ln x$ and $x \circ \ln y$ are continuous in $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$, and the trace function is also continuous, the function $H$ is continuous over $\mathcal{K} \times \operatorname{int}(\mathcal{K})$. Notice that

$$H(x, y) = \Phi(x) - \operatorname{tr}(x \circ \ln y) + \operatorname{tr}(y) - \operatorname{tr}(x), \qquad (16)$$

that $\Phi(x)$ is strictly convex over $\mathcal{K}$ by Lemma 3.2(c), and that the other terms on the right-hand side of (16) are clearly convex in $x$ for any fixed $y \in \operatorname{int}(\mathcal{K})$. Therefore, $H(\cdot, y)$ is strictly convex for any fixed $y \in \operatorname{int}(\mathcal{K})$.

(b) From the expression of $H(x, y)$ given by (16) and Lemma 3.2(b), the function $H(\cdot, y)$ is clearly continuously differentiable on $\operatorname{int}(\mathcal{K})$. Moreover,

$$\nabla_x H(x, y) = \nabla \Phi(x) - \ln y - e = \ln x - \ln y.$$

(c) From the definition of $\Phi(x)$ and its gradient formula in Lemma 3.2(b),

$$\begin{aligned} \Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y\rangle &= \operatorname{tr}(x \circ \ln x) - \operatorname{tr}(y \circ \ln y) - \langle \ln y + e,\ x - y\rangle \\ &= \operatorname{tr}(x \circ \ln x) - \operatorname{tr}(x \circ \ln y) - \operatorname{tr}(x) + \operatorname{tr}(y) \\ &= H(x, y) \end{aligned} \qquad (17)$$

for any $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$. In addition, the strict convexity of $\Phi$ implies that

$$\Phi(x) - \Phi(y) - \langle \nabla \Phi(y), x - y\rangle \geq 0,$$

with equality if and only if $x = y$. These two facts readily give the desired result.

(d) Using the definition of $H$ and Lemma 3.1, we have for all $x \in \mathcal{K}$ and $y \in \operatorname{int}(\mathcal{K})$,

$$\begin{aligned} H(x, y) &= \operatorname{tr}(x \circ \ln x + y - x) - \operatorname{tr}(x \circ \ln y) \\ &\geq \operatorname{tr}(x \circ \ln x + y - x) - \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)) \\ &= \sum_{j=1}^r \left( \lambda_j(x) \ln(\lambda_j(x)) + \lambda_j(y) - \lambda_j(x) \right) - \sum_{j=1}^r \lambda_j(x) \ln(\lambda_j(y)) \\ &= \sum_{j=1}^r \left( \lambda_j(x) \ln(\lambda_j(x)) - \lambda_j(x) \ln(\lambda_j(y)) + \lambda_j(y) - \lambda_j(x) \right) \\ &= d(\lambda(x), \lambda(y)). \end{aligned}$$

The nonnegativity of $d(\lambda(x), \lambda(y))$ is direct from the definition of $d(\cdot,\cdot)$ in (15).

(e) For fixed $y \in \operatorname{int}(\mathcal{K})$, part (d) gives $L_H(x, \gamma) \subseteq \{x \in \mathcal{K} \mid d(\lambda(x), \lambda(y)) \leq \gamma\}$ for all $\gamma \geq 0$. Since the sets $\{u \in \mathbb{R}^r_+ \mid d(u, v) \leq \gamma\}$ are bounded for all $\gamma \geq 0$ by [26, Lemma 2.3], it follows from the continuity of $\lambda(\cdot)$ that the sets $L_H(x, \gamma)$ are bounded for all $\gamma \geq 0$. Similarly, for fixed $x \in \mathcal{K}$, using Lemma 2.1(i) of [26], the sets $L_H(y, \gamma)$ are bounded for all $\gamma \geq 0$. $\square$

Proposition 3.2 Let $H(x, y)$ be defined by (12). Suppose $\{x^k\}_{k \in \mathbb{N}} \subseteq \mathcal{K}$ and $\{y^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$ are bounded sequences such that $H(x^k, y^k) \to 0$. Then, as $k \to \infty$, we have

(a) $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \ldots, r$;
(b) $\operatorname{tr}(x^k - y^k) \to 0$.

Proof (a) From Proposition 3.1(d), $H(x^k, y^k) \geq d(\lambda(x^k), \lambda(y^k)) \geq 0$. Hence $H(x^k, y^k) \to 0$ implies $d(\lambda(x^k), \lambda(y^k)) \to 0$. By the definition of $d(\cdot,\cdot)$ given by (15),

$$d(\lambda(x^k), \lambda(y^k)) = \sum_{j=1}^r \lambda_j(y^k)\, \varphi\!\left( \lambda_j(x^k)/\lambda_j(y^k) \right)$$

with $\varphi(t) = t \ln t - t + 1$ ($t \geq 0$). Since $\varphi(t) \geq 0$ for any $t \geq 0$, each term of the above sum is nonnegative, and consequently $d(\lambda(x^k), \lambda(y^k)) \to 0$ implies

$$\lambda_j(y^k)\, \varphi\!\left( \lambda_j(x^k)/\lambda_j(y^k) \right) \to 0, \quad j = 1, 2, \ldots, r.$$

This is equivalent to saying that

$$\lambda_j(x^k) \ln(\lambda_j(x^k)) - \lambda_j(x^k) \ln(\lambda_j(y^k)) + \lambda_j(y^k) - \lambda_j(x^k) \to 0, \quad j = 1, 2, \ldots, r.$$

Since $\{\lambda_j(x^k)\}$ and $\{\lambda_j(y^k)\}$ are bounded, using Lemma A.1 of [6] then yields $\lambda_j(x^k) - \lambda_j(y^k) \to 0$ for all $j = 1, 2, \ldots, r$.

(b) Since $\operatorname{tr}(x^k - y^k) = \sum_{j=1}^r \left( \lambda_j(x^k) - \lambda_j(y^k) \right)$, the result follows from part (a). $\square$

To close this section, we present two useful relations for the function $H$, which can be easily verified by using the definition of $H$ and recalling its nonnegativity.

Proposition 3.3 Let $H(x, y)$ be defined by (12). For all $x, y \in \operatorname{int}(\mathcal{K})$ and $z \in \mathcal{K}$, we have

(a) $H(z, x) - H(z, y) = \operatorname{tr}(z \circ \ln y - z \circ \ln x + x - y)$;
(b) $\operatorname{tr}\!\left( (z - y) \circ (\ln y - \ln x) \right) = H(z, x) - H(z, y) - H(y, x) \leq H(z, x) - H(z, y)$.

4 Convergence analysis of the algorithm

The convergence analysis of the entropy-like proximal algorithm defined by (11)-(12) is similar to that of the proximal point method [12] for convex minimization problems and of the proximal-like algorithm [4] based on Bregman functions. In this section, we present a global convergence rate estimate for the algorithm in terms of function values. For this purpose, we make the following assumption for the CSCP (1):

(A) $\inf\{f(x) \mid x \succeq_{\mathcal{K}} 0\} := f_* > -\infty$ and $\operatorname{dom} f \cap \operatorname{int}(\mathcal{K}) \neq \emptyset$.

In addition, we denote by $X_* := \{x \in \mathcal{K} \mid f(x) = f_*\}$ the solution set of (1).

First, we show that the algorithm in (11)-(12) is well defined, i.e., it generates a sequence $\{x^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$. This is a direct consequence of Proposition 4.1 below, for whose proof we need the following technical lemma.

Lemma 4.1 For any $x \succeq_{\mathcal{K}} 0$ and $y \succ_{\mathcal{K}} 0$ with $\operatorname{tr}(x \circ y) = 0$, we always have $x = 0$.

Proof From the self-duality of $\mathcal{K}$ and [11, Proposition I.1.4], we have that

$$u \in \operatorname{int}(\mathcal{K}) \iff \langle u, v\rangle > 0, \quad \forall\, 0 \neq v \in \mathcal{K},$$

which together with $\operatorname{tr}(x \circ y) = \langle x, y\rangle$ immediately implies $x = 0$. $\square$

Proposition 4.1 For any fixed $y \in \operatorname{int}(\mathcal{K})$ and $\mu > 0$, let $F_\mu(\cdot, y) := f(\cdot) + \mu^{-1} H(\cdot, y)$. Then, under assumption (A), the following results hold.

(a) The function $F_\mu(\cdot, y)$ has bounded level sets.

(b) There exists a unique $x(y, \mu) \in \operatorname{int}(\mathcal{K})$ such that

$$x(y, \mu) := \operatorname{argmin}_{x \succeq_{\mathcal{K}} 0} F_\mu(x, y)$$

and

$$\mu^{-1}\left( \ln y - \ln x(y, \mu) \right) \in \partial f(x(y, \mu)). \qquad (18)$$

Proof (a) Since $F_\mu(\cdot, y)$ is convex, to show that $F_\mu(\cdot, y)$ has bounded level sets it suffices to show that for any $\nu \geq f_* > -\infty$, the level set $L(\nu) := \{x \mid F_\mu(x, y) \leq \nu\}$ is bounded. Let $\bar\nu := (\nu - f_*)\mu$. Clearly, $L(\nu) \subseteq L_H(x, \bar\nu)$. Moreover, by Proposition 3.1(e), $L_H(x, \bar\nu)$ is bounded. Therefore, $L(\nu)$ is bounded.

(b) From part (a), $F_\mu(\cdot, y)$ has bounded level sets, which in turn implies the existence of the minimum point $x(y, \mu)$. Also, the strict convexity of $F_\mu(\cdot, y)$ from Proposition 3.1(a) guarantees its uniqueness. Under assumption (A), using Proposition 3.1(b) and the optimality condition for the minimization problem $\operatorname{argmin}_{x \succeq_{\mathcal{K}} 0} F_\mu(x, y)$ gives

$$0 \in \partial f(x(y, \mu)) + \mu^{-1}\left( \ln x(y, \mu) - \ln y \right) + \partial \delta(x(y, \mu) \mid \mathcal{K}), \qquad (19)$$

where $\delta(z \mid \mathcal{K}) = 0$ if $z \succeq_{\mathcal{K}} 0$ and $+\infty$ otherwise. We will show that $\partial \delta(x(y, \mu) \mid \mathcal{K}) = \{0\}$, whence the desired result follows. Notice that for any $x^k \in \operatorname{int}(\mathcal{K})$ with $x^k \to x \in \operatorname{bd}(\mathcal{K})$,

$$\|\ln x^k\| = \left( \sum_{j=1}^r \left[ \ln(\lambda_j(x^k)) \right]^2 \right)^{1/2} \to +\infty,$$

where the second relation is due to the continuity of $\lambda_j(\cdot)$ and of $\ln t$. Consequently,

$$\|\nabla_x H(x^k, y)\| = \|\ln x^k - \ln y\| \geq \|\ln x^k\| - \|\ln y\| \to +\infty.$$

By [19, p. 251], this means that $H(\cdot, y)$ is essentially smooth, and then $\partial_x H(x, y) = \emptyset$ for all $x \in \operatorname{bd}(\mathcal{K})$ by [19, Theorem 26.1]. Together with (19), we must have $x(y, \mu) \in \operatorname{int}(\mathcal{K})$. Furthermore, from [19, p. 226], it follows that

$$\partial \delta(z \mid \mathcal{K}) = \{v \in \mathbb{V} \mid v \preceq_{\mathcal{K}} 0,\ \operatorname{tr}(v \circ z) = 0\}.$$

Using Lemma 4.1 then yields $\partial \delta(x(y, \mu) \mid \mathcal{K}) = \{0\}$. Thus, the proof is completed. $\square$

Next we establish several important properties of the algorithm defined by (11)-(12), from which our convergence result will follow.

Proposition 4.2 Let $\{x^k\}_{k \in \mathbb{N}}$ be the sequence generated by the algorithm defined by (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then the following results hold.

(a) The sequence $\{f(x^k)\}_{k \in \mathbb{N}}$ is nonincreasing.
(b) $\mu_k \left( f(x^k) - f(x) \right) \leq H(x, x^{k-1}) - H(x, x^k)$ for all $x \in \mathcal{K}$.
(c) $\sigma_m \left( f(x^m) - f(x) \right) \leq H(x, x^0) - H(x, x^m)$ for all $x \in \mathcal{K}$.
(d) If the optimal set $X_* \neq \emptyset$, then $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for any $x \in X_*$.
(e) If the optimal set $X_* \neq \emptyset$, then $H(x^k, x^{k-1}) \to 0$ and $\operatorname{tr}(x^k - x^{k-1}) \to 0$.

Proof (a) From the definition of $x^k$ in (11), it follows that

$$f(x^k) + \mu_k^{-1} H(x^k, x^{k-1}) \leq f(x^{k-1}) + \mu_k^{-1} H(x^{k-1}, x^{k-1}).$$

Since $H(x^k, x^{k-1}) \geq 0$ and $H(x^{k-1}, x^{k-1}) = 0$ (see Proposition 3.1(c)), we obtain

$$f(x^k) \leq f(x^{k-1}), \quad \forall k \in \mathbb{N},$$

and therefore the sequence $\{f(x^k)\}_{k \in \mathbb{N}}$ is nonincreasing.

(b) From (18), $-\mu_k^{-1}(\ln x^k - \ln x^{k-1}) \in \partial f(x^k)$. Plugging this into the formula for $\partial f(x^k)$ given by (10), we have

$$f(x) \geq f(x^k) - \mu_k^{-1} \operatorname{tr}\!\left( (x - x^k) \circ (\ln x^k - \ln x^{k-1}) \right), \quad \forall x \in \mathbb{V}.$$

Consequently, for any $x \in \mathcal{K}$,

$$\begin{aligned} \mu_k \left( f(x^k) - f(x) \right) &\leq \operatorname{tr}\!\left( (x - x^k) \circ (\ln x^k - \ln x^{k-1}) \right) \\ &= H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}) \\ &\leq H(x, x^{k-1}) - H(x, x^k), \end{aligned} \qquad (20)$$

where the equality holds due to Proposition 3.3(b), and the last inequality follows from the nonnegativity of $H(x, y)$.

(c) First, summing the inequalities in part (b) over $k = 1, 2, \ldots, m$ yields

$$\sum_{k=1}^m \mu_k f(x^k) - \sigma_m f(x) \leq H(x, x^0) - H(x, x^m), \quad \forall x \in \mathcal{K}. \qquad (21)$$

On the other hand, since $f(x^m) \leq f(x^k)$ by part (a), multiplying this inequality by $\mu_k$ and summing over $k = 1, 2, \ldots, m$ gives

$$\sum_{k=1}^m \mu_k f(x^k) \geq \sum_{k=1}^m \mu_k f(x^m) = \sigma_m f(x^m). \qquad (22)$$

Now, combining (21) with (22), we readily obtain the desired result.

(d) Since $f(x^k) - f(x) \geq 0$ for all $x \in X_*$, the result follows immediately from part (b).

(e) From part (d), the sequence $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for any $x \in X_*$. Together with $H(x, x^k) \geq 0$, this implies that $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is convergent. Thus,

$$H(x, x^{k-1}) - H(x, x^k) \to 0, \quad \forall x \in X_*. \qquad (23)$$

On the other hand, from (20) and part (d), we have

$$0 \leq \mu_k \left( f(x^k) - f(x) \right) \leq H(x, x^{k-1}) - H(x, x^k) - H(x^k, x^{k-1}), \quad \forall x \in X_*,$$

which in turn implies

$$H(x^k, x^{k-1}) \leq H(x, x^{k-1}) - H(x, x^k).$$

Using (23) and the nonnegativity of $H(x^k, x^{k-1})$, the first result is obtained. Since $\{x^k\}_{k \in \mathbb{N}} \subset \{y \in \operatorname{int}(\mathcal{K}) \mid H(x, y) \leq H(x, x^0)\}$ with $x \in X_*$, the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded. Thus, the second result follows from the first result and Proposition 3.2(b). $\square$

By now, we have proved that the algorithm in (11)-(12) is well-defined and satisfies some favorable properties. With these properties, we are ready to establish the convergence results of the algorithm, which is one of the main purposes of this paper.

Proposition 4.3 Let $\{x^k\}_{k \in \mathbb{N}}$ be the sequence generated by the algorithm in (11)-(12), and let $\sigma_m := \sum_{k=1}^m \mu_k$. Then, under assumption (A), the following results hold.

(a) $f(x^m) - f(x) \leq \sigma_m^{-1} H(x, x^0)$ for all $x \in \mathcal{K}$.
(b) If $\sigma_m \to +\infty$, then $\lim_{m \to +\infty} f(x^m) = f_*$.
(c) If the optimal set $X_* \neq \emptyset$, then the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded, and if, in addition, $\sigma_m \to +\infty$, every accumulation point is a solution of the CSCP (1).

Proof (a) This result follows from Proposition 4.2(c) and the nonnegativity of $H(x, x^m)$.

(b) Since $\sigma_m \to +\infty$, passing to the limit in the inequality in part (a), we have

$$\limsup_{m \to +\infty} f(x^m) \leq f(x), \quad \forall x \in \mathcal{K},$$

which in particular implies

$$\limsup_{m \to +\infty} f(x^m) \leq \inf\{f(x) : x \in \mathcal{K}\}.$$

Together with the fact that $f(x^m) \geq f_* = \inf\{f(x) : x \in \mathcal{K}\}$, this yields the result.

(c) From Proposition 3.1(e), $H(x, \cdot)$ has bounded level sets for every $x \in \mathcal{K}$. Also, from Proposition 4.2(d), $\{H(x, x^k)\}_{k \in \mathbb{N}}$ is nonincreasing for all $x \in X_*$ if $X_* \neq \emptyset$. Thus, the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded and therefore has an accumulation point. Let $\hat{x} \in \mathcal{K}$ be an accumulation point of $\{x^k\}_{k \in \mathbb{N}}$, so that $\{x^{k_j}\} \to \hat{x}$ for some $k_j \to +\infty$. Since $f$ is lower semicontinuous, we have $f(\hat{x}) \leq \liminf_{k_j \to +\infty} f(x^{k_j})$ (see [22, p. 8]). On the other hand, we also have $f(x^{k_j}) \to f_*$; hence $f(\hat{x}) = f_*$, which says that $\hat{x}$ is a solution of (1). $\square$

Proposition 4.3(a) gives an estimate of the global rate of convergence, similar to the one obtained for proximal-like algorithms for convex minimization over the nonnegative orthant cone. Analogously, as remarked in [6, Remark 4.1], global convergence of $\{x^k\}$ itself to a solution of (1) can be achieved under the assumption that $\{x^k\} \subset \operatorname{int}(\mathcal{K})$ has a limit point in $\operatorname{int}(\mathcal{K})$. Nonetheless, this assumption is rather stringent.
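To illustrate Sections 3-4 at once, the sketch below runs algorithm (11)-(12) on the PSD cone for $f(X) = \|X - B\|_F^2$, whose solution over $\mathcal{K}$ is the PSD projection of $B$. This is our own illustration under stated assumptions (NumPy/SciPy, a generic solver for each subproblem, the substitution $X = \exp(S)$ to stay in $\operatorname{int}(\mathcal{K})$), not the paper's implementation; with $\sigma_m = m\mu$, the observed objective gap should be consistent with the $\sigma_m^{-1} H(x^*, x^0)$ estimate of Proposition 4.3(a).

```python
import numpy as np
from scipy.optimize import minimize

def mat_fun(g, X):
    """Löwner operator: apply g to the eigenvalues of symmetric X."""
    lam, U = np.linalg.eigh(X)
    return (U * g(lam)) @ U.T

def H(X, Y):
    """Distance-like function (12) on S^n."""
    return np.trace(X @ mat_fun(np.log, X) - X @ mat_fun(np.log, Y) + Y - X)

def unpack(z, n):
    """Free parameter vector -> symmetric matrix (upper-triangular packing)."""
    Z = np.zeros((n, n))
    Z[np.triu_indices(n)] = z
    return Z + np.triu(Z, 1).T

def entropic_prox_psd(f, X0, mu=5.0, iters=50):
    """Algorithm (11)-(12) on the PSD cone; writing X = exp(S) with S symmetric
    keeps every iterate in the interior of the cone."""
    n, X = X0.shape[0], X0.copy()
    idx = np.triu_indices(n)
    for _ in range(iters):
        prev = X.copy()
        obj = lambda z: (lambda Xz: f(Xz) + H(Xz, prev) / mu)(mat_fun(np.exp, unpack(z, n)))
        z = minimize(obj, mat_fun(np.log, prev)[idx], method="BFGS").x
        X = mat_fun(np.exp, unpack(z, n))
    return X

# f(X) = ||X - B||_F^2: the solution over the PSD cone is the projection of B.
B = np.array([[1.0, 2.0], [2.0, -3.0]])
X = entropic_prox_psd(lambda X: np.sum((X - B) ** 2), X0=np.eye(2))
lam, U = np.linalg.eigh(B)
print(np.round(X, 2))                               # close to the PSD projection
print(np.round((U * np.maximum(lam, 0)) @ U.T, 2))  # exact projection, for comparison
```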

5 Exponential multiplier method for SCLP

In this section, we give a dual application of the entropy-like proximal algorithm in (11)-(12), leading to an exponential multiplier method for the symmetric cone linear program

$$\text{(SCLP)} \qquad \min\left\{ b^T y \;:\; c - \sum_{i=1}^m y_i a_i \succeq_{\mathcal{K}} 0,\ y \in \mathbb{R}^m \right\},$$

where $c \in \mathbb{V}$, $a_i \in \mathbb{V}$ and $b \in \mathbb{R}^m$. For notational convenience, we denote the feasible set of (SCLP) by $\mathcal{F} := \{y \in \mathbb{R}^m : \mathcal{A}(y) \succeq_{\mathcal{K}} 0\}$, where $\mathcal{A}: \mathbb{R}^m \to \mathbb{V}$ is the affine operator defined by

$$\mathcal{A}(y) := c - \sum_{i=1}^m y_i a_i, \quad \forall y \in \mathbb{R}^m. \qquad (24)$$

In addition, we make the following assumptions for the problem (SCLP):

(A1) the optimal solution set of (SCLP) is nonempty and bounded;
(A2) Slater's condition holds, i.e., there exists $\hat{y} \in \mathbb{R}^m$ such that $\mathcal{A}(\hat{y}) \succ_{\mathcal{K}} 0$.

Notice that the Lagrangian function associated with (SCLP) is defined as follows:

$$L(y, x) = \begin{cases} b^T y - \operatorname{tr}[x \circ \mathcal{A}(y)], & x \in \mathcal{K}, \\ +\infty, & \text{otherwise}. \end{cases}$$

Therefore, the dual problem corresponding to the SCLP is given by

$$\text{(DSCLP)} \qquad \max_{x \succeq_{\mathcal{K}} 0} \; \inf_{y \in \mathbb{R}^m} L(y, x),$$

which can be written explicitly as

$$\text{(DSCLP)} \qquad \begin{array}{ll} \max & -\operatorname{tr}(c \circ x) \\ \text{s.t.} & \operatorname{tr}(a_i \circ x) = -b_i, \quad i = 1, 2, \ldots, m, \\ & x \succeq_{\mathcal{K}} 0. \end{array}$$

From the standard convex analysis arguments in [19], it follows that under assumption (A2) there is no duality gap between (SCLP) and (DSCLP), and furthermore the solution set of (DSCLP) is nonempty and compact.

To solve the problem (SCLP), we introduce the following multiplier-type algorithm: given $x^0 \in \operatorname{int}(\mathcal{K})$, generate the sequences $\{y^k\}_{k \in \mathbb{N}} \subseteq \mathbb{R}^m$ and $\{x^k\}_{k \in \mathbb{N}} \subset \operatorname{int}(\mathcal{K})$ by

$$y^k = \operatorname{argmin}_{y \in \mathbb{R}^m} \left\{ b^T y + \mu_k^{-1} \operatorname{tr}\!\left[ \exp\!\left( -\mu_k \mathcal{A}(y) + \ln x^{k-1} \right) \right] \right\}, \qquad (25)$$

$$x^k = \exp\!\left( -\mu_k \mathcal{A}(y^k) + \ln x^{k-1} \right), \qquad (26)$$

where the parameters $\mu_k$ satisfy $\mu_k > \bar{\mu} > 0$ for all $k \in \mathbb{N}$. The algorithm can be viewed as a natural extension to symmetric cones of the exponential multiplier method for convex programs over nonnegative orthant cones (see, e.g., [3,27]). In the rest of this section, we will show that the algorithm defined by (25)-(26) possesses similar properties.
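The following is a minimal sketch of iterations (25)-(26) on a tiny SDP instance (our own, assuming NumPy/SciPy; the subproblem (25) is smooth by Lemma 5.1 below, so a quasi-Newton solver suffices). The instance minimizes $y_1$ subject to $\operatorname{diag}(y_1 - 1,\ y_1 + 1) \succeq 0$, whose solution is $y_1 = 1$.

```python
import numpy as np
from scipy.optimize import minimize

def mat_fun(g, X):
    """Löwner operator: apply g to the eigenvalues of symmetric X."""
    lam, U = np.linalg.eigh(X)
    return (U * g(lam)) @ U.T

def exp_multiplier_sclp(b, c, a, mu=1.0, iters=40):
    """Exponential multiplier method (25)-(26) for
    min b^T y  s.t.  A(y) = c - sum_i y_i a_i  positive semidefinite."""
    m, n = len(b), c.shape[0]
    A = lambda y: c - sum(y[i] * a[i] for i in range(m))
    X, y = np.eye(n), np.zeros(m)              # x^0 = identity lies in int(K)
    for _ in range(iters):
        lnX = mat_fun(np.log, X)
        obj = lambda y: b @ y + np.trace(mat_fun(np.exp, -mu * A(y) + lnX)) / mu
        y = minimize(obj, y, method="BFGS").x  # subproblem (25)
        X = mat_fun(np.exp, -mu * A(y) + lnX)  # multiplier update (26)
    return y, X

b = np.array([1.0])
c = np.diag([-1.0, 1.0])
a = [np.diag([-1.0, -1.0])]                    # A(y) = diag(y1 - 1, y1 + 1)
y, X = exp_multiplier_sclp(b, c, a)
print(np.round(y, 3))                          # approximately [1.]
```

As in the orthant case, the multiplier $X$ concentrates on the active constraint: its component along the binding eigenvalue converges to the dual optimal value, while the inactive component decays exponentially.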

We first present two technical lemmas: Lemma 5.1 collects some properties of the Löwner operator $\exp(z)$, and Lemma 5.2 characterizes the recession cone of $\mathcal{F}$.

Lemma 5.1

(a) For any $z \in \mathbb{V}$, there always holds $\exp(z) \in \operatorname{int}(\mathcal{K})$.

(b) The function $\exp(z)$ is continuously differentiable everywhere, with

$$(\exp)'(z)\, e = \exp(z), \quad \forall z \in \mathbb{V}.$$

(c) The function $\operatorname{tr}[\exp(\sum_{i=1}^m y_i a_i)]$ is continuously differentiable everywhere, with

$$\nabla_y \operatorname{tr}\!\left[ \exp\!\left( \sum_{i=1}^m y_i a_i \right) \right] = \mathcal{A}^T\!\left( \exp\!\left( \sum_{i=1}^m y_i a_i \right) \right),$$

where $y \in \mathbb{R}^m$ and $a_i \in \mathbb{V}$ for all $i = 1, 2, \ldots, m$, and $\mathcal{A}^T: \mathbb{V} \to \mathbb{R}^m$ is the linear transformation defined by $\mathcal{A}^T(x) = (\langle a_1, x\rangle, \ldots, \langle a_m, x\rangle)^T$ for any $x \in \mathbb{V}$.

Proof (a) The result is clear, since for any $z \in \mathbb{V}$ with $z = \sum_{j=1}^r \lambda_j(z) c_j$, all eigenvalues of $\exp(z)$, given by $\exp(\lambda_j(z))$ for $j = 1, 2, \ldots, r$, are positive.

(b) By Lemma 2.2, $\exp(z)$ is clearly continuously differentiable everywhere, and

$$(\exp)'(z)\, h = \sum_{j=1}^r \exp(\lambda_j(z)) \langle c_j, h\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)} \, c_j \circ (c_l \circ h)$$

for any $h \in \mathbb{V}$. From this formula, we have in particular

$$(\exp)'(z)\, e = \sum_{j=1}^r \exp(\lambda_j(z)) \langle c_j, e\rangle c_j + \sum_{1 \leq j < l \leq r} 4\, \frac{\exp(\lambda_j(z)) - \exp(\lambda_l(z))}{\lambda_j(z) - \lambda_l(z)} \, c_j \circ (c_l \circ e) = \sum_{j=1}^r \exp(\lambda_j(z)) c_j = \exp(z).$$

(c) The first part is direct from the continuous differentiability of the trace function and part (b), and the second part follows from the chain rule. $\square$

Lemma 5.2 Let $\mathcal{F}^\infty$ denote the recession cone of the feasible set $\mathcal{F}$. Then

$$\mathcal{F}^\infty = \left\{ d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0 \right\}.$$

Proof Assume that $d \in \mathbb{R}^m$ satisfies $\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0$. If $d = 0$, clearly $d \in \mathcal{F}^\infty$. If $d \neq 0$, take any $y \in \mathcal{F}$. From the definition of the operator $\mathcal{A}$, for any $\tau > 0$,

$$\mathcal{A}(y + \tau d) = c - \sum_{i=1}^m (y_i + \tau d_i) a_i = \mathcal{A}(y) + \tau \left( \mathcal{A}(d) - c \right) \succeq_{\mathcal{K}} 0.$$

By the definition of a recession direction [19, p. 61], this shows that $d \in \mathcal{F}^\infty$. Thus we have proved that $\{d \in \mathbb{R}^m : \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\} \subseteq \mathcal{F}^\infty$.

Now take any $d \in \mathcal{F}^\infty$ and $y \in \mathcal{F}$. Then $\mathcal{A}(y + \tau d) \succeq_{\mathcal{K}} 0$ for any $\tau > 0$, and therefore $\lambda_{\min}[\mathcal{A}(y + \tau d)] \geq 0$. This must imply $\lambda_{\min}(\mathcal{A}(d) - c) \geq 0$. If not, using the fact that

$$\lambda_{\min}[\mathcal{A}(y + \tau d)] = \lambda_{\min}\!\left[ \mathcal{A}(y) + \tau \left( \mathcal{A}(d) - c \right) \right] \leq \lambda_{\min}\!\left( \tau \left( \mathcal{A}(d) - c \right) \right) + \lambda_{\max}(\mathcal{A}(y)) = \tau \lambda_{\min}(\mathcal{A}(d) - c) + \lambda_{\max}(\mathcal{A}(y)),$$

where the inequality is due to Lemma 2.1, and letting $\tau \to +\infty$, we would have

$$\lambda_{\min}[\mathcal{A}(y + \tau d)] \to -\infty,$$

contradicting the fact that $\lambda_{\min}[\mathcal{A}(y + \tau d)] \geq 0$. Consequently, $\mathcal{A}(d) - c \succeq_{\mathcal{K}} 0$. Together with the above discussion, this shows that $\mathcal{F}^\infty = \{d \in \mathbb{R}^m \mid \mathcal{A}(d) - c \succeq_{\mathcal{K}} 0\}$. $\square$

Next we establish the convergence of the algorithm in (25)-(26). We first prove that the generated sequence is well-defined, which is implied by the following lemma.

Lemma 5.3 For any fixed $x^{k-1} \in \operatorname{int}(\mathcal{K})$ and $\mu > 0$, let $F: \mathbb{R}^m \to \mathbb{R}$ be defined by

$$F(y) := b^T y + \mu^{-1} \operatorname{tr}\!\left[ \exp\!\left( -\mu \mathcal{A}(y) + \ln x^{k-1} \right) \right]. \qquad (27)$$

Then, under assumption (A1), the minimum set of $F$ is nonempty and bounded.

Proof We prove that $F$ is coercive, by contradiction. Suppose not, i.e., some level set of $F$ is unbounded. Then there exists a sequence $\{y^k\} \subseteq \mathbb{R}^m$ such that

$$\|y^k\| \to +\infty, \qquad \lim_{k \to +\infty} \frac{y^k}{\|y^k\|} = d \neq 0, \qquad \text{and} \qquad F(y^k) \leq \delta \qquad (28)$$

for some $\delta \in \mathbb{R}$. Since $\exp(-\mu \mathcal{A}(y) + \ln x^{k-1}) \in \operatorname{int}(\mathcal{K})$ for any $y \in \mathbb{R}^m$ by Lemma 5.1(a),

$$\operatorname{tr}\!\left[ \exp\!\left( -\mu \mathcal{A}(y) + \ln x^{k-1} \right) \right] = \sum_{j=1}^r \exp\!\left( \lambda_j\!\left( -\mu \mathcal{A}(y) + \ln x^{k-1} \right) \right) > 0.$$
