One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

Fedor S. Stonyakin¹ and Alexander A. Titov²

¹ V. I. Vernadsky Crimean Federal University, Simferopol, Russia, fedyor@mail.ru

² Moscow Institute of Physics and Technologies, Moscow, Russia, a.a.titov@phystech.edu

Abstract. The paper is devoted to a special Mirror Descent algorithm for problems of convex minimization with functional constraints. The objective function may not satisfy the Lipschitz condition, but it must necessarily have a Lipschitz-continuous gradient. We assume that the functional constraint can be non-smooth, but satisfies the Lipschitz condition. In particular, such functionals appear in the well-known Truss Topology Design problem. We also apply the technique of restarts in the mentioned version of Mirror Descent for strongly convex problems. Some estimates of the rate of convergence are obtained for the Mirror Descent algorithms under consideration.

Keywords: Adaptive Mirror Descent algorithm · Lipschitz-continuous gradient · Technique of restarts

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org

1 Introduction

The optimization of non-smooth functionals with constraints attracts widespread interest in large-scale optimization and its applications [6, 18]. There are various methods for solving this kind of optimization problem, for example the bundle-level method [14], the penalty method [19], and the Lagrange multipliers method [7]. Among them, Mirror Descent (MD) [4, 12] is viewed as a simple method for non-smooth convex optimization.

In optimization problems with quadratic functionals we consider functionals which do not satisfy the usual Lipschitz property (or whose Lipschitz constant is quite large) but which have a Lipschitz-continuous gradient. For such problems, in ([2], item 3.3) the ideas of [14, 15] were adopted to construct an adaptive version of the Mirror Descent algorithm.


For example, let $A_i$ ($i = 1, \ldots, m$) be a positive-definite matrix ($x^{\top} A_i x \ge 0$ for all $x$) and consider the objective function
$$f(x) = \max_{1 \le i \le m} f_i(x) \quad \text{for} \quad f_i(x) = \frac{1}{2}\langle A_i x, x\rangle - \langle b_i, x\rangle + \alpha_i, \quad i = 1, \ldots, m.$$

Note that such functionals appear in the Truss Topology Design problem with weights of the bars [15].
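As a small illustration (not taken from the paper), here is a NumPy sketch of a first-order oracle for such a functional: it evaluates $f(x) = \max_i f_i(x)$ and returns the gradient of an active component, which is a valid subgradient of the pointwise maximum; the matrices $A_i$, vectors $b_i$ and shifts $\alpha_i$ in the usage example are placeholder data.

```python
import numpy as np

def max_quadratic_oracle(x, A_list, b_list, alpha_list):
    """Value and one subgradient of f(x) = max_i (0.5*<A_i x, x> - <b_i, x> + alpha_i).

    A subgradient of a pointwise maximum is the gradient A_j x - b_j of any
    component f_j attaining the maximum at x.
    """
    values = [0.5 * x @ (A @ x) - b @ x + a
              for A, b, a in zip(A_list, b_list, alpha_list)]
    j = int(np.argmax(values))
    return values[j], A_list[j] @ x - b_list[j]

# toy usage with placeholder data
rng = np.random.default_rng(0)
n, m = 5, 3
A_list = [np.eye(n) * (i + 1.0) for i in range(m)]   # simple positive-definite matrices
b_list = [rng.standard_normal(n) for _ in range(m)]
alpha_list = [0.0, 1.0, -1.0]
f_val, f_sub = max_quadratic_oracle(rng.standard_normal(n), A_list, b_list, alpha_list)
```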

In this paper we propose a partial adaptive (with respect to the objective functional) version of the algorithm from ([2], item 3.3). It simplifies work with problems for which calculating the norm of the subgradient of the functional constraint is burdensome in view of the large number of constraints. The idea of restarts [9] is adopted to construct the proposed algorithm in the case of strongly convex objective and constraints. It is well known that both considered methods are optimal in terms of the lower bounds [12].

Note that a functional constraint can, generally, be non-smooth. That is why we consider subgradient methods. These methods have a long history, starting with the method for deterministic unconstrained problems in the Euclidean setting in [17] and the generalization to constrained problems in [16], where the idea of switching steps between the direction of the subgradient of the objective and the direction of the subgradient of the constraint was suggested. The non-Euclidean extension, usually referred to as Mirror Descent, originated in [10, 12] and was later analyzed in [4]. An extension for constrained problems was proposed in [12]; see also a recent version in [3]. To prove a faster convergence rate of Mirror Descent for a strongly convex objective in the unconstrained case, the restart technique [11–13] was used in [8]. Usually, the stepsize and stopping rule for Mirror Descent require knowledge of the Lipschitz constant of the objective function and the constraint, if any. Adaptive stepsizes, which do not require this information, are considered in [5] for problems without inequality constraints, and in [3] for constrained problems.

We consider some Mirror Descent algorithms for constrained problems in the case of non-standard growth properties of the objective functional (it has a Lipschitz-continuous gradient).

The paper consists of the Introduction and three main sections. In Section 2 we give some basic notation concerning convex optimization problems with functional constraints. In Section 3 we describe a partial adaptive version (Algorithm 2) of the Mirror Descent algorithm from ([2], item 3.3) and prove some estimates for the rate of convergence of Algorithm 2. The last Section 4 is focused on the strongly convex case with restarting of Algorithm 2 and the corresponding theoretical estimates for the rate of convergence.


2 Problem Statement and Standard Mirror Descent Basics

Let $(E, \|\cdot\|)$ be a normed finite-dimensional vector space and $E^*$ be the conjugate space of $E$ with the norm
$$\|y\|_* = \max_{x}\{\langle y, x\rangle : \|x\| \le 1\},$$
where $\langle y, x\rangle$ is the value of the continuous linear functional $y$ at $x \in E$.

Let $X \subset E$ be a (simple) closed convex set. We consider two convex subdifferentiable functionals $f$ and $g: X \to \mathbb{R}$. Also, we assume that $g$ is Lipschitz-continuous:
$$|g(x) - g(y)| \le M_g\|x - y\| \quad \forall x, y \in X. \tag{1}$$
We focus on the following type of convex optimization problems:
$$f(x) \to \min_{x \in X}, \tag{2}$$
$$\text{s.t.}\quad g(x) \le 0. \tag{3}$$

Let $d: X \to \mathbb{R}$ be a distance generating function (d.g.f.) which is continuously differentiable and 1-strongly convex w.r.t. the norm $\|\cdot\|$, i.e.
$$\forall x, y \in X \quad \langle \nabla d(x) - \nabla d(y), x - y\rangle \ge \|x - y\|^2,$$
and assume that $\min_{x \in X} d(x) = d(0)$. Suppose we have a constant $\Theta_0$ such that $d(x_*) \le \Theta_0^2$, where $x_*$ is a solution of (2)–(3).

Note that if there is a set of optimal points $X_*$, then we may assume that
$$\min_{x_* \in X_*} d(x_*) \le \Theta_0^2.$$

For all $x, y \in X$ consider the corresponding Bregman divergence
$$V(x, y) = d(y) - d(x) - \langle \nabla d(x), y - x\rangle.$$

Standard proximal setups, i.e. Euclidean, entropy, $\ell_1/\ell_2$, simplex, nuclear norm, spectahedron, can be found, e.g., in [5]. Let us define the proximal mapping operator in the standard way:
$$\mathrm{Mirr}_x(p) = \arg\min_{u \in X}\{\langle p, u\rangle + V(x, u)\} \quad \text{for each } x \in X \text{ and } p \in E^*.$$
We make the simplicity assumption, which means that $\mathrm{Mirr}_x(p)$ is easily computable.
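For concreteness, here is a minimal NumPy sketch of $\mathrm{Mirr}_x(p)$ for two of the standard setups mentioned above; the function names are ours and the formulas are the textbook closed forms, not code from the paper. In the Euclidean setup on $X = \mathbb{R}^n$ (with $d(x) = \frac{1}{2}\|x\|_2^2$) the mapping is a plain gradient-type step, and in the entropy setup on the probability simplex it is a normalized multiplicative update.

```python
import numpy as np

def mirr_euclidean(x, p):
    """Mirr_x(p) = argmin_u { <p, u> + 0.5 * ||u - x||_2^2 } over X = R^n."""
    return x - p

def mirr_entropy(x, p):
    """Mirr_x(p) on the probability simplex with d(u) = sum_i u_i * log(u_i).

    Here V(x, u) is the Kullback-Leibler divergence, and the minimizer is the
    exponentiated (multiplicative) update, renormalized to sum to one.
    """
    w = x * np.exp(-p)
    return w / w.sum()

def bregman_divergence(d, grad_d, x, y):
    """V(x, y) = d(y) - d(x) - <grad d(x), y - x> for a generic d.g.f. d."""
    return d(y) - d(x) - grad_d(x) @ (y - x)
```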


3 Some Mirror Descent Algorithm for the Type of Problems Under Consideration

Following [14], given a function $f$, for each subgradient $\nabla f(x)$ at a point $y \in X$ we define
$$v_f(x, y) = \begin{cases} \left\langle \dfrac{\nabla f(x)}{\|\nabla f(x)\|},\; x - y\right\rangle, & \nabla f(x) \ne 0,\\[2mm] 0, & \nabla f(x) = 0, \end{cases} \qquad x \in X. \tag{4}$$

In ([2], item 3.3) the following adaptive Mirror Descent algorithm for Problem (2) – (3) was proposed by the first author.

Algorithm 1 Adaptive Mirror Descent, Non-Standard Growth

Require: $\varepsilon$, $\Theta_0^2$, $X$, $d(\cdot)$
1: $x^0 = \arg\min_{x \in X} d(x)$
2: $I := \emptyset$
3: $N \leftarrow 0$
4: repeat
5: if $g(x^N) \le \varepsilon$ then
6: $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla f(x^N)\|}$
7: $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive steps")
8: $N \to I$
9: else
10: ($g(x^N) > \varepsilon$)
11: $h_N \leftarrow \dfrac{\varepsilon}{\|\nabla g(x^N)\|^2}$
12: $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive steps")
13: end if
14: $N \leftarrow N + 1$
15: until $\Theta_0^2 \le \dfrac{\varepsilon^2}{2}|I| + \displaystyle\sum_{k \notin I} \dfrac{\varepsilon^2}{2\|\nabla g(x^k)\|^2}$
Ensure: $\bar{x}^N := \arg\min_{x^k,\, k \in I} f(x^k)$
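A possible NumPy realization of Algorithm 1 in the Euclidean setup, where $\mathrm{Mirr}_x(p)$ reduces to the projected step $\Pi_X(x - p)$: the oracles f, grad_f, g, grad_g and the projector proj_X are assumptions of this sketch (the paper itself is setup-agnostic), and the max_iter safeguard is an addition that is not part of the algorithm.

```python
import numpy as np

def adaptive_mirror_descent(f, grad_f, g, grad_g, proj_X, x0, eps, theta0_sq,
                            max_iter=100_000):
    """Sketch of Algorithm 1: adaptive MD with productive/non-productive steps."""
    x = np.asarray(x0, dtype=float)
    productive = []          # iterates x^k with k in I
    nonproductive_sum = 0.0  # sum_{k not in I} eps^2 / (2 * ||grad g(x^k)||^2)
    for _ in range(max_iter):
        if g(x) <= eps:
            gf = grad_f(x)                       # "productive" step: k goes to I
            productive.append(x.copy())
            x = proj_X(x - (eps / np.linalg.norm(gf)) * gf)
        else:
            gg = grad_g(x)                       # "non-productive" step
            nonproductive_sum += eps ** 2 / (2 * np.linalg.norm(gg) ** 2)
            x = proj_X(x - (eps / np.linalg.norm(gg) ** 2) * gg)
        # stopping rule of Algorithm 1
        if theta0_sq <= eps ** 2 / 2 * len(productive) + nonproductive_sum:
            break
    # output: productive iterate with the smallest objective value
    return min(productive, key=f) if productive else x
```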

For the previous method the following result was obtained in [2].

Theorem 1. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 1 work
$$N = \left\lceil \frac{2\max\{1, M_g^2\}\Theta_0^2}{\varepsilon^2}\right\rceil \tag{5}$$
steps. Then
$$\min_{k \in I} v_f(x^k, x_*) < \varepsilon. \tag{6}$$

Let us recall one well-known statement (see, e.g., [5]).


Lemma 1. Let $f: X \to \mathbb{R}$ be a convex subdifferentiable function over the convex set $X$ and let a sequence $\{x^k\}$ be defined by the following relation:
$$x^{k+1} = \mathrm{Mirr}_{x^k}(h_k \nabla f(x^k)).$$
Then for each $x \in X$
$$h_k\langle \nabla f(x^k), x^k - x\rangle \le \frac{h_k^2}{2}\|\nabla f(x^k)\|^2 + V(x^k, x) - V(x^{k+1}, x). \tag{7}$$

The following Algorithm 2 is proposed by us for Problem (2)–(3).

Algorithm 2 Partial Adaptive Version of Algorithm 1

Require: $\varepsilon$, $\Theta_0^2$, $X$, $d(\cdot)$
1: $x^0 = \arg\min_{x \in X} d(x)$
2: $I := \emptyset$
3: $N \leftarrow 0$
4: if $g(x^N) \le \varepsilon$ then
5: $h_N \leftarrow \dfrac{\varepsilon}{M_g \cdot \|\nabla f(x^N)\|}$
6: $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla f(x^N))$ ("productive steps")
7: $N \to I$
8: else
9: ($g(x^N) > \varepsilon$)
10: $h_N \leftarrow \dfrac{\varepsilon}{M_g^2}$
11: $x^{N+1} \leftarrow \mathrm{Mirr}_{x^N}(h_N \nabla g(x^N))$ ("non-productive steps")
12: end if
13: $N \leftarrow N + 1$
Ensure: $\bar{x}^N := \arg\min_{x^k,\, k \in I} f(x^k)$

Set $[N] = \{0, 1, \ldots, N-1\}$ and $J = [N] \setminus I$, where $I$ is the collection of indexes of productive steps, with
$$h_k = \frac{\varepsilon}{M_g\|\nabla f(x^k)\|}, \tag{8}$$
and $|I|$ is the number of "productive steps". Similarly, for the "non-productive" steps from the set $J$ the stepsize is defined as follows:
$$h_k = \frac{\varepsilon}{M_g^2}, \tag{9}$$
and $|J|$ is the number of "non-productive steps". Obviously,
$$|I| + |J| = N. \tag{10}$$
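With these stepsizes, Algorithm 2 admits the following NumPy sketch under the same illustrative Euclidean assumptions as the sketch of Algorithm 1 above; the fixed number of steps is taken from Theorem 2 below, and the oracles and projector are again assumptions of the illustration rather than part of the paper.

```python
import numpy as np

def partial_adaptive_mirror_descent(f, grad_f, g, grad_g, proj_X, x0,
                                    eps, theta0_sq, M_g):
    """Sketch of Algorithm 2: only the objective part of the stepsize is adaptive."""
    x = np.asarray(x0, dtype=float)
    productive = []                                        # iterates x^k with k in I
    N = int(np.ceil(2 * M_g ** 2 * theta0_sq / eps ** 2))  # step count, see Theorem 2 below
    for _ in range(N):
        if g(x) <= eps:                                    # productive step, stepsize (8)
            gf = grad_f(x)
            productive.append(x.copy())
            x = proj_X(x - (eps / (M_g * np.linalg.norm(gf))) * gf)
        else:                                              # non-productive step, stepsize (9)
            x = proj_X(x - (eps / M_g ** 2) * grad_g(x))
    return min(productive, key=f) if productive else x
```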

Let us formulate the following analogue of Theorem 1.


Theorem 2. Let $\varepsilon > 0$ be a fixed positive number and let Algorithm 2 work
$$N = \left\lceil \frac{2M_g^2\Theta_0^2}{\varepsilon^2}\right\rceil \tag{11}$$
steps. Then
$$\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{12}$$

Proof. 1) For the productive steps, from (7) and (8) one can get that
$$h_k\langle \nabla f(x^k), x^k - x\rangle \le \frac{h_k^2}{2}\|\nabla f(x^k)\|^2 + V(x^k, x) - V(x^{k+1}, x).$$
Taking into account $\frac{h_k^2}{2}\|\nabla f(x^k)\|^2 = \frac{\varepsilon^2}{2M_g^2}$, we have
$$h_k\langle \nabla f(x^k), x^k - x\rangle = \frac{\varepsilon}{M_g}\left\langle \frac{\nabla f(x^k)}{\|\nabla f(x^k)\|},\; x^k - x\right\rangle = \frac{\varepsilon}{M_g}\, v_f(x^k, x). \tag{13}$$

2) Similarly, for the "non-productive" steps $k \in J$:
$$h_k(g(x^k) - g(x)) \le \frac{h_k^2}{2}\|\nabla g(x^k)\|^2 + V(x^k, x) - V(x^{k+1}, x).$$
Using (1) and $\|\nabla g(x)\| \le M_g$, we have
$$h_k(g(x^k) - g(x)) \le \frac{\varepsilon^2}{2M_g^2} + V(x^k, x) - V(x^{k+1}, x). \tag{14}$$

3) From (13) and (14), for $x = x_*$ we have:
$$\frac{\varepsilon}{M_g}\sum_{k \in I} v_f(x^k, x_*) + \sum_{k \in J}\frac{\varepsilon}{M_g^2}(g(x^k) - g(x_*)) \le \frac{N\varepsilon^2}{2M_g^2} + \sum_{k=0}^{N-1}\left(V(x^k, x_*) - V(x^{k+1}, x_*)\right). \tag{15}$$

Let us note that for any $k \in J$
$$g(x^k) - g(x_*) \ge g(x^k) > \varepsilon,$$
and in view of
$$\sum_{k=0}^{N-1}\left(V(x^k, x_*) - V(x^{k+1}, x_*)\right) \le \Theta_0^2$$
the inequality (15) can be transformed in the following way:
$$\frac{\varepsilon}{M_g}\sum_{k \in I} v_f(x^k, x_*) \le \frac{N\varepsilon^2}{2M_g^2} + \Theta_0^2 - \frac{\varepsilon^2}{M_g^2}|J|.$$


On the other hand,
$$\sum_{k \in I} v_f(x^k, x_*) \ge |I|\min_{k \in I} v_f(x^k, x_*).$$
Assume that
$$\frac{\varepsilon^2}{2M_g^2}N \ge \Theta_0^2, \quad \text{or} \quad N \ge \frac{2M_g^2\Theta_0^2}{\varepsilon^2}. \tag{16}$$
Thus,
$$|I|\frac{\varepsilon}{M_g}\min_{k \in I} v_f(x^k, x_*) < \frac{N\varepsilon^2}{2M_g^2} - \frac{\varepsilon^2}{M_g^2}|J| + \Theta_0^2 \le \frac{N\varepsilon^2}{M_g^2} - \frac{\varepsilon^2}{M_g^2}|J| = \frac{\varepsilon^2}{M_g^2}|I|,$$
whence
$$|I|\frac{\varepsilon}{M_g}\min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon^2}{M_g^2}|I| \;\Rightarrow\; \min_{k \in I} v_f(x^k, x_*) < \frac{\varepsilon}{M_g}. \tag{17}$$

To finish the proof we should demonstrate that $|I| \ne 0$. Supposing the reverse, we have $|I| = 0 \Rightarrow |J| = N$, i.e. all the steps are non-productive, so after using
$$g(x^k) - g(x_*) \ge g(x^k) > \varepsilon$$
we can see that
$$\sum_{k=0}^{N-1} h_k(g(x^k) - g(x_*)) \le \sum_{k=0}^{N-1}\frac{\varepsilon^2}{2M_g^2} + \Theta_0^2 \le \frac{N\varepsilon^2}{2M_g^2} + \frac{N\varepsilon^2}{2M_g^2} = \frac{N\varepsilon^2}{M_g^2}.$$
So,
$$\frac{\varepsilon}{M_g^2}\sum_{k=0}^{N-1}(g(x^k) - g(x_*)) \le \frac{N\varepsilon^2}{M_g^2}$$
and
$$N\varepsilon < \sum_{k=0}^{N-1}(g(x^k) - g(x_*)) \le N\varepsilon.$$
So we have a contradiction. It means that $|I| \ne 0$.

The following auxiliary assertion (see, e.g., [14, 15]) is fulfilled ($x_*$ is a solution of (2)–(3)).

Lemma 2. Let us introduce the following function:
$$\omega(\tau) = \max_{x \in X}\{f(x) - f(x_*) : \|x - x_*\| \le \tau\}, \tag{18}$$
where $\tau$ is a positive number. Then for any $y \in X$
$$f(y) - f(x_*) \le \omega(v_f(y, x_*)). \tag{19}$$


Now we show how, using the previous assertion and Theorem 2, one can estimate the rate of convergence of Algorithm 2 if the objective function $f$ is differentiable and its gradient satisfies the Lipschitz condition
$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\| \quad \forall x, y \in X. \tag{20}$$
Using the well-known fact
$$f(x) \le f(x_*) + \|\nabla f(x_*)\|\,\|x - x_*\| + \frac{1}{2}L\|x - x_*\|^2,$$
we can get that
$$\min_{k \in I} f(x^k) - f(x_*) \le \min_{k \in I}\left\{\|\nabla f(x_*)\|\,\|x^k - x_*\| + \frac{1}{2}L\|x^k - x_*\|^2\right\}.$$
So,
$$\min_{k \in I} f(x^k) - f(x_*) \le \|\nabla f(x_*)\|\frac{\varepsilon}{M_g} + \frac{L\varepsilon^2}{2M_g^2} = \varepsilon_f + \frac{L\varepsilon^2}{2M_g^2}, \quad \text{where} \quad \varepsilon_f = \|\nabla f(x_*)\|\frac{\varepsilon}{M_g}.$$
That is why the following result holds.

Corollary 1. Let $f$ be differentiable on $X$ and let (20) hold. Then after
$$N = \left\lceil \frac{2M_g^2\Theta_0^2}{\varepsilon^2}\right\rceil$$
steps of Algorithm 2 the following estimate holds:
$$\min_{0 \le k \le N} f(x^k) - f(x_*) \le \varepsilon_f + \frac{L}{2}\cdot\frac{\varepsilon^2}{M_g^2}, \quad \text{where} \quad \varepsilon_f = \|\nabla f(x_*)\|\frac{\varepsilon}{M_g}.$$
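As a quick arithmetic illustration of Corollary 1 (all constants below are made-up example values, not data from the paper):

```python
import math

# made-up example constants: M_g, Theta_0^2, L, ||grad f(x_*)|| and the accuracy eps
M_g, theta0_sq, L, grad_f_opt_norm, eps = 2.0, 1.0, 10.0, 0.5, 1e-2

N = math.ceil(2 * M_g ** 2 * theta0_sq / eps ** 2)   # number of steps of Algorithm 2
eps_f = grad_f_opt_norm * eps / M_g                  # epsilon_f from Corollary 1
f_gap_bound = eps_f + (L / 2) * eps ** 2 / M_g ** 2  # bound on min_k f(x^k) - f(x_*)
print(N, eps_f, f_gap_bound)                         # 80000  0.0025  0.002625
```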

We can also apply our method to a class of problems with special non-smooth objective functionals.

Corollary 2. Assume that $f(x) = \max_{i=1,\ldots,m} f_i(x)$, where each $f_i$ is differentiable at every $x \in X$ and
$$\|\nabla f_i(x) - \nabla f_i(y)\| \le L_i\|x - y\| \quad \forall x, y \in X.$$
Then after
$$N = \left\lceil \frac{2M_g^2\Theta_0^2}{\varepsilon^2}\right\rceil$$
steps of Algorithm 2 the following estimate holds:
$$\min_{0 \le k \le N} f(x^k) - f(x_*) \le \varepsilon_f + \frac{L}{2}\cdot\frac{\varepsilon^2}{M_g^2}, \quad \text{where} \quad \varepsilon_f = \|\nabla f(x_*)\|\frac{\varepsilon}{M_g}, \quad L = \max_{i=1,\ldots,m} L_i.$$

Remark 1. Generally, $\|\nabla f(x_*)\| \ne 0$, because we consider a class of constrained problems.

4 On the Technique of Restarts in the Considered Version of Mirror Descent for Strongly Convex Problems

In this section we consider the problem
$$f(x) \to \min, \quad g(x) \le 0, \quad x \in X \tag{21}$$
with assumption (1) and the additional assumption of strong convexity of $f$ and $g$ with the same parameter $\mu$, i.e.,
$$f(y) \ge f(x) + \langle \nabla f(x), y - x\rangle + \frac{\mu}{2}\|y - x\|^2, \quad x, y \in X,$$
and the same holds for $g$. We also slightly modify the assumptions on the prox-function $d(x)$. Namely, we assume that $0 = \arg\min_{x \in X} d(x)$ and that $d$ is bounded on the unit ball in the chosen norm $\|\cdot\|_E$, that is,
$$d(x) \le \frac{\Omega}{2} \quad \forall x \in X : \|x\| \le 1, \tag{22}$$
where $\Omega$ is some known number. Finally, we assume that we are given a starting point $x_0 \in X$ and a number $R_0 > 0$ such that $\|x_0 - x_*\|^2 \le R_0^2$.

To construct a method for solving problem (21) under the stated assumptions, we use the idea of restarting Algorithm 2. The idea of restarting a method for convex problems in order to obtain a faster rate of convergence for strongly convex problems dates back to the 1980s, see e.g. [12, 13]. To show that restarting is also possible for problems with inequality constraints, we rely on the following lemma (see, e.g., [1]).

Lemma 3. Let $f$ and $g$ be $\mu$-strongly convex functionals with respect to the norm $\|\cdot\|$ on $X$, let $x_* = \arg\min_{x \in X,\, g(x) \le 0} f(x)$, and let $\varepsilon_f > 0$ and $\varepsilon_g > 0$ be such that
$$f(x) - f(x_*) \le \varepsilon_f, \quad g(x) \le \varepsilon_g. \tag{23}$$
Then
$$\frac{\mu}{2}\|x - x_*\|^2 \le \max\{\varepsilon_f, \varepsilon_g\}. \tag{24}$$


Under the conditions of Corollary 2, after Algorithm 2 stops, inequalities (23) hold with
$$\varepsilon_f = \frac{\varepsilon}{M_g}\|\nabla f(x_*)\| + \frac{\varepsilon^2 L}{2M_g^2} \quad \text{and} \quad \varepsilon_g = \varepsilon.$$
Consider the function $\tau: \mathbb{R}_+ \to \mathbb{R}_+$:
$$\tau(\delta) = \max\left\{\delta\|\nabla f(x_*)\| + \frac{\delta^2 L}{2};\; \delta M_g\right\}.$$
It is clear that $\tau$ is increasing and therefore for each $\varepsilon > 0$ there exists $\varphi(\varepsilon) > 0$ such that $\tau(\varphi(\varepsilon)) = \varepsilon$.

Remark 2. $\varphi(\varepsilon)$ depends on $\|\nabla f(x_*)\|$ and the Lipschitz constant $L$ for $\nabla f$. If $\|\nabla f(x_*)\| < M_g$, then $\varphi(\varepsilon) = \varepsilon$ for small enough $\varepsilon$:
$$\varepsilon < \frac{2(M_g - \|\nabla f(x_*)\|)}{L}.$$
In the other case ($\|\nabla f(x_*)\| > M_g$) we have, for all $\varepsilon > 0$,
$$\varphi(\varepsilon) = \frac{\sqrt{\|\nabla f(x_*)\|^2 + 2\varepsilon L} - \|\nabla f(x_*)\|}{L}.$$
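A small helper implementing $\varphi(\varepsilon)$ exactly as stated in Remark 2 can look as follows; the function name and arguments are ours, and the second branch simply solves $\delta\|\nabla f(x_*)\| + \delta^2 L/2 = \varepsilon$ for $\delta > 0$.

```python
import math

def phi(eps, grad_f_opt_norm, L, M_g):
    """phi(eps) from Remark 2: the value with tau(phi(eps)) = eps."""
    if grad_f_opt_norm < M_g and eps < 2.0 * (M_g - grad_f_opt_norm) / L:
        return eps
    # otherwise solve  delta * ||grad f(x_*)|| + delta^2 * L / 2 = eps  for delta > 0
    return (math.sqrt(grad_f_opt_norm ** 2 + 2.0 * eps * L) - grad_f_opt_norm) / L
```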

Let us consider the following Algorithm 3 for the problem (21).

Algorithm 3 Algorithm for the Strongly Convex Problem

Require: accuracy $\varepsilon > 0$; strong convexity parameter $\mu$; $\Theta_0^2$ such that $d(x) \le \Theta_0^2$ $\forall x \in X: \|x\| \le 1$; starting point $x_0$ and number $R_0$ such that $\|x_0 - x_*\|^2 \le R_0^2$.
1: Set $d_0(x) = d\left(\frac{x - x_0}{R_0}\right)$.
2: Set $p = 1$.
3: repeat
4: Set $R_p^2 = R_0^2 \cdot 2^{-p}$.
5: Set $\varepsilon_p = \frac{\mu R_p^2}{2}$.
6: Set $x_p$ as the output of Algorithm 2 with accuracy $\varepsilon_p$, prox-function $d_{p-1}(\cdot)$ and $\Theta_0^2$.
7: $d_p(x) \leftarrow d\left(\frac{x - x_p}{R_p}\right)$.
8: Set $p = p + 1$.
9: until $p > \log_2\frac{\mu R_0^2}{2\varepsilon}$.
Ensure: $x_p$.
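A sketch of the restart scheme is given below. It reuses the partial_adaptive_mirror_descent sketch above as the inner solver and a one-argument callable phi_fn for $\varphi(\cdot)$ (e.g. a partial application of the phi helper from Remark 2); the Euclidean setup is assumed, so re-centering the prox-function $d_p$ around the current point simply means restarting from it, and the rescaling by $R_p$ is left out of this simplified illustration. The inner accuracy $\varphi(\varepsilon_p)$ follows the iteration bound of Theorem 3 below; none of this code is from the paper.

```python
import math
import numpy as np

def restarted_mirror_descent(f, grad_f, g, grad_g, proj_X, x0, R0, mu,
                             eps, theta0_sq, M_g, phi_fn):
    """Sketch of Algorithm 3: restarts of Algorithm 2 for mu-strongly convex f and g."""
    x = np.asarray(x0, dtype=float)
    p_hat = math.ceil(math.log2(mu * R0 ** 2 / (2 * eps)))
    for p in range(1, p_hat + 1):
        R_p_sq = R0 ** 2 * 2.0 ** (-p)
        eps_p = mu * R_p_sq / 2.0                 # target accuracy on the p-th restart
        x = partial_adaptive_mirror_descent(      # inner solver, warm-started at x
            f, grad_f, g, grad_g, proj_X, x,
            eps=phi_fn(eps_p), theta0_sq=theta0_sq, M_g=M_g)
    return x   # Theorem 3: ||x - x_*||^2 <= 2 * eps / mu
```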

The following theorem holds.

Theorem 3. Let $f$ and $g$ satisfy the conditions of Corollary 2. Let $f, g$ be $\mu$-strongly convex functionals on $X \subset \mathbb{R}^n$ and $d(x) \le \Theta_0^2$ for all $x \in X$ with $\|x\| \le 1$. Let the starting point $x_0 \in X$ and the number $R_0 > 0$ be given such that $\|x_0 - x_*\|^2 \le R_0^2$. Then for $\hat{p} = \left\lceil \log_2\frac{\mu R_0^2}{2\varepsilon}\right\rceil$, $x_{\hat{p}}$ is an $\varepsilon$-solution of Problem (2)–(3), where
$$\|x_{\hat{p}} - x_*\|^2 \le \frac{2\varepsilon}{\mu}.$$
At the same time, the total number of iterations of Algorithm 2 does not exceed
$$\hat{p} + \sum_{p=1}^{\hat{p}} \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)}, \quad \text{where} \quad \varepsilon_p = \frac{\mu R_0^2}{2^{p+1}}.$$

Proof. The function $d_p(x)$ $(p = 0, 1, 2, \ldots)$ is 1-strongly convex with respect to the norm $\frac{\|\cdot\|}{R_p}$. By mathematical induction we show that
$$\|x_p - x_*\|^2 \le R_p^2 \quad \forall p \ge 0.$$
For $p = 0$ this assertion is obvious by virtue of the choice of $x_0$ and $R_0$. Suppose that for some $p$: $\|x_p - x_*\|^2 \le R_p^2$. Let us prove that $\|x_{p+1} - x_*\|^2 \le R_{p+1}^2$. We have $d_p(x_*) \le \Theta_0^2$, and on the $(p+1)$-th restart, after no more than
$$\left\lceil \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_{p+1})}\right\rceil$$
iterations of Algorithm 2 the following inequalities are true:
$$f(x_{p+1}) - f(x_*) \le \varepsilon_{p+1}, \quad g(x_{p+1}) \le \varepsilon_{p+1} \quad \text{for} \quad \varepsilon_{p+1} = \frac{\mu R_{p+1}^2}{2}.$$
Then, according to Lemma 3,
$$\|x_{p+1} - x_*\|^2 \le \frac{2\varepsilon_{p+1}}{\mu} = R_{p+1}^2.$$
So, for all $p \ge 0$
$$\|x_p - x_*\|^2 \le R_p^2 = \frac{R_0^2}{2^p}, \quad f(x_p) - f(x_*) \le \frac{\mu R_0^2}{2}\cdot 2^{-p}, \quad g(x_p) \le \frac{\mu R_0^2}{2}\cdot 2^{-p}.$$
For $p = \hat{p} = \left\lceil \log_2\frac{\mu R_0^2}{2\varepsilon}\right\rceil$ the following relation is true:
$$\|x_{\hat{p}} - x_*\|^2 \le R_{\hat{p}}^2 = R_0^2 \cdot 2^{-\hat{p}} \le \frac{2\varepsilon}{\mu}.$$
It remains only to note that the total number of iterations of Algorithm 2 is no more than
$$\sum_{p=1}^{\hat{p}} \left\lceil \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)}\right\rceil \le \hat{p} + \sum_{p=1}^{\hat{p}} \frac{2\Theta_0^2 M_g^2}{\varphi^2(\varepsilon_p)}.$$


Acknowledgements. The authors are very grateful to Yurii Nesterov, Alexander Gasnikov and Pavel Dvurechensky for fruitful discussions. The authors would also like to thank the anonymous reviewers for useful comments and suggestions.

The research presented by Fedor Stonyakin and Alexander Titov (Algorithm 3, Theorem 3, Corollary 1 and Corollary 2) was partially supported by the Russian Foundation for Basic Research, research project 18-31-00219.

The research presented by Fedor Stonyakin (Algorithm 2 and Theorem 2) was partially supported by the grant of the President of the Russian Federation for young candidates of sciences, project no. MK-176.2017.1.

References

1. Bayandina, A., Gasnikov, A., Gasnikova, E., Matsievsky, S.: Primal-dual mirror descent for the stochastic programming problems with functional constraints. Computational Mathematics and Mathematical Physics (accepted) (2018). https://arxiv.org/pdf/1604.08194.pdf (in Russian)
2. Bayandina, A., Dvurechensky, P., Gasnikov, A., Stonyakin, F., Titov, A.: Mirror descent and convex optimization problems with non-smooth inequality constraints. In: LCCC Focus Period on Large-Scale and Distributed Optimization. Lund, Sweden: Springer (accepted) (2017). https://arxiv.org/abs/1710.06612
3. Beck, A., Ben-Tal, A., Guttmann-Beck, N., Tetruashvili, L.: The comirror algorithm for solving nonsmooth constrained convex problems. Operations Research Letters 38(6), 493–498 (2010)
4. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics, Philadelphia (2001)
6. Ben-Tal, A., Nemirovski, A.: Robust Truss Topology Design via semidefinite programming. SIAM Journal on Optimization 7(4), 991–1016 (1997)
7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)
8. Juditsky, A., Nemirovski, A.: First order methods for non-smooth convex large-scale optimization. I: General purpose methods. In: Sra, S., et al. (eds.) Optimization for Machine Learning, pp. 121–184. MIT Press, Cambridge, MA (2012)
9. Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Systems 4(1), 44–80 (2014)
10. Nemirovskii, A.: Efficient methods for large-scale convex optimization problems. Ekonomika i Matematicheskie Metody (1979) (in Russian)
11. Nemirovskii, A., Nesterov, Y.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) (in Russian)
12. Nemirovsky, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. J. Wiley & Sons, New York (1983)
13. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady 27(2), 372–376 (1983)
14. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts (2004)
15. Nesterov, Y.: Subgradient methods for convex functions with nonstandard growth properties. https://www.mathnet.ru:8080/PresentFiles/16179/growthbm_nesterov.pdf [Online; accessed 01-April-2018]
16. Polyak, B.: A general method of solving extremum problems. Soviet Mathematics Doklady 8(3), 593–597 (1967) (in Russian)
17. Shor, N. Z.: Generalized gradient descent with application to block programming. Kibernetika 3(3), 53–55 (1967) (in Russian)
18. Shpirko, S., Nesterov, Y.: Primal-dual subgradient methods for huge-scale linear conic problems. SIAM Journal on Optimization 24(3), 1444–1457 (2014)
19. Vasilyev, F.: Optimization Methods. Fizmatlit, Moscow (2002) (in Russian)
