GC-ES: A Globally Convergent Evolution Strategy
for Unconstrained and Constrained Optimization
Youssef Diouane (ISAE-SUPAERO)
Joint work with: Serge Gratton and Luis Nunes Vicente
22nd International Symposium on Mathematical Programming, Pittsburgh, Monday, July 13
Plan

1 Introduction
2 A Globally Convergent Evolution Strategy
   Algorithmic Modifications
   Global Convergence
3 Numerical results
   The GC-ES solver
   Linear constraints case
   Non-linear constraints case
4 Conclusions
Derivative-free optimization

Minimizing f(x), possibly subject to a constraint set Ω.

f is assumed to be differentiable, but the derivatives of f are not available (expensive in CPU time, legacy code, or not coded at all).

In Derivative-Free Optimization (DFO), there are two types of algorithms:

Deterministic DFO (essentially model-based methods, direct-search methods, ...).
Stochastic DFO (essentially simulated annealing, particle swarm optimization, evolutionary algorithms, ...).
Stochastic DFO & Evolution Strategies

Only a few interactions between the two DFO categories have been established.

In this work, we show how ideas from deterministic DFO can improve the efficiency of one of the most successful classes of stochastic algorithms, known as Evolution Strategies (ES's). [Rechenberg 1973] [Schwefel 1975]
A Class of Evolution Strategies

1 Offspring Generation: compute new sample points Y_{k+1} = {y_{k+1}^1, ..., y_{k+1}^λ} such that
   y_{k+1}^i = x_k + σ_k^ES d_k^i,
   where d_k^i is drawn from a distribution C_k, i = 1, ..., λ.
2 Parent Selection: evaluate f(y_{k+1}^i), i = 1, ..., λ, and reorder Y_{k+1} = {ỹ_{k+1}^1, ..., ỹ_{k+1}^λ} so that f(ỹ_{k+1}^1) ≤ ... ≤ f(ỹ_{k+1}^λ). Set
   x_{k+1} = ∑_{i=1}^{µ} ω_k^i ỹ_{k+1}^i,   with λ ≥ µ.
3 Updates: update σ_{k+1}^ES, C_k, and the weights (ω^1, ..., ω^µ) ∈ S. Return to step (1).
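A minimal sketch of one iteration of this ES class, assuming the distribution C_k is a multivariate normal N(0, C); the helper name `es_iteration` and the fixed equal weights in the usage example are illustrative, not part of the original algorithm:

```python
import numpy as np

def es_iteration(f, x, sigma, C, lam, mu, weights, rng):
    """One iteration of a (mu/mu_W, lambda)-ES as on the slide:
    sample lambda offspring around x, rank them by f, and recombine
    the mu best with the weights omega^i."""
    n = x.size
    # Offspring generation: y^i = x + sigma * d^i, with d^i ~ N(0, C)
    d = rng.multivariate_normal(np.zeros(n), C, size=lam)
    y = x + sigma * d
    # Parent selection: reorder so that f(y~^1) <= ... <= f(y~^lambda)
    order = np.argsort([f(yi) for yi in y])
    y_sorted = y[order]
    # Weighted recombination of the mu best offspring (weights sum to 1)
    x_next = weights @ y_sorted[:mu]
    return x_next
```

For example, with f(x) = ‖x‖², λ = 10, µ = 3, and equal weights, one call returns the recombined mean of the three best sampled points.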
A Class of Evolution Strategies

A relevant implementation [Hansen et al., 1996]: CMA-ES (Covariance Matrix Adaptation Evolution Strategy).

Evolution of the population using a normal distribution with a covariance matrix (and step-size) adaptation mechanism.
Considered one of the best derivative-free optimization methods. [BBOB, 2013] [BBOB, 2011] [Rios, Sahinidis, 2011]
Some Existing Convergence Results

The sphere problem is among the most studied cases. [Beyer, 1998] [Bienvenüe and François, 2003] [Auger, 2005]
A single parent is considered (the recombination ∑_{i=1}^{µ} ω_k^i ỹ_{k+1}^i is not allowed). [Yin, Rudolph and Schwefel, 1995] [Greenwood and Zhu, 2001] [Jägersküpper, 2003]
The isotropic case with a scale-invariant adaptation rule. [Jebalia and Auger, 2010]
Main objectives

Equip a class of evolution strategies with known techniques from deterministic DFO.
Achieve convergence from any starting point to a stationary point for evolution strategies.
Obtain good performance on practical problems, in terms of function evaluations, which are assumed to be expensive.
Extend the proposed algorithm to handle constrained optimization problems.
Algorithmic Modifications (1)

The objective function is minimized over a dense set of directions. [Audet and Dennis, 2006]

Changes in the algorithm [Kolda, Lewis, Torczon 2003]:

We impose a sufficient decrease along the iterates.
We also require a control of the scaling parameter (i.e., the step length), possibly scaling back overly long steps.
A feasible approach is used to handle the constraints (using an extreme barrier function f_Ω instead of f).

⟹ Idea: give a general framework into which a slight modification of an ES can fit, hopefully without jeopardizing performance.
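The extreme barrier function f_Ω mentioned above can be sketched as follows (a minimal version, assuming Ω is available as an arbitrary membership test `feasible`, a hypothetical name):

```python
import numpy as np

def f_omega(f, x, feasible):
    """Extreme barrier function f_Omega: return f(x) inside Omega and
    +infinity outside, so infeasible offspring are never selected
    when ranking by f_Omega."""
    return f(x) if feasible(x) else np.inf
```

For instance, with Ω = [0, 1]², any point outside the box gets value +∞ and is automatically discarded by the parent selection.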
Algorithmic Modifications (2)

Offspring Generation: compute new sample points Y_{k+1} = {y_{k+1}^1, ..., y_{k+1}^λ} such that
   y_{k+1}^i = x_k + σ_k d̃_k^i,
where d_k^i is drawn from a distribution C_k, i = 1, ..., λ (and d̃_k^i denotes the possibly rescaled direction).

Parent Selection: evaluate f_Ω(y_{k+1}^i), i = 1, ..., λ, and reorder Y_{k+1} = {ỹ_{k+1}^1, ..., ỹ_{k+1}^λ} so that f_Ω(ỹ_{k+1}^1) ≤ ... ≤ f_Ω(ỹ_{k+1}^λ). Set
   x_{k+1}^trial = ∑_{i=1}^{µ} ω_k^i ỹ_{k+1}^i,   with λ ≥ µ.

Imposing Sufficient Decrease: (detailed on the next slide).

ES Updates: σ_{k+1}^ES, C_k, and (ω^1, ..., ω^µ) ∈ S. Return to step (1).
Algorithmic Modifications (3)

Imposing Sufficient Decrease:

If f_Ω(x_{k+1}^trial) ≤ f(x_k) − ρ(σ_k), then the iteration is successful: set x_{k+1} = x_{k+1}^trial and σ_{k+1} = max(σ_k, σ_k^ES);
otherwise, set x_{k+1} = x_k and σ_{k+1} = β σ_k (where β < 1).

Conforming to the geometry of the constraints: a barrier approach or a projection approach.
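The sufficient-decrease test and step-length update above can be sketched as follows (a minimal version; ρ is a forcing function, and the concrete choice ρ(σ) = 10⁻⁴ σ² shown here is an assumption for illustration, not the solver's setting):

```python
def sufficient_decrease_update(f_x_trial, f_x, sigma, sigma_es, rho, beta=0.5):
    """Globalization step from the slide: accept the trial point only if it
    yields a sufficient decrease rho(sigma); otherwise contract the step
    length by beta < 1. Returns (success, next sigma)."""
    if f_x_trial <= f_x - rho(sigma):
        # Successful iteration: keep the trial point, do not shrink the step.
        return True, max(sigma, sigma_es)
    # Unsuccessful iteration: keep x_k and contract the step length.
    return False, beta * sigma

# An assumed forcing function for illustration.
rho = lambda s: 1e-4 * s ** 2
```

On success the step length is reset to at least the ES's own σ_k^ES, so the globalization does not interfere with the ES adaptation on successful iterations.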
Global Convergence (1)

By inspecting the sign of the Clarke directional derivative.

Definition
Let f be Lipschitz continuous near the point x*. Given u ∈ T_Ω(x*), the Clarke–Jahn generalized derivative is
   f°(x*; u) = lim_{d ∈ T_Ω^H(x*), d → u} f°(x*; d).
If x* ∈ Ω is a local minimizer of the function f, then
   f°(x*; d) ≥ 0 for all d ∈ T_Ω(x*)
(the point x* is then said to be Clarke stationary).
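For reference, the Clarke generalized directional derivative underlying the definition above (for f Lipschitz continuous near x and a direction d) is the standard limsup of difference quotients:

```latex
% Clarke generalized directional derivative of f at x along d,
% for f Lipschitz continuous near x:
f^{\circ}(x; d) \;=\; \limsup_{y \to x,\; t \downarrow 0}
  \frac{f(y + t d) - f(y)}{t}.
```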
Global Convergence (2)

Theorem [D., Gratton and Vicente, 2014]
Consider a sequence of iterates generated by our algorithm without any stopping criterion, and let f be bounded below.

There exists a subsequence K of unsuccessful iterates for which lim_{k∈K} σ_k = 0.
If {x_k} is bounded, then there exist x* and a subsequence K of unsuccessful iterates for which lim_{k∈K} σ_k = 0 and lim_{k∈K} x_k = x*.
Global Convergence (3)

Theorem [D., Gratton and Vicente, 2014]
Let x* ∈ Ω be a limit point of a subsequence K of unsuccessful iterates {x_k} for which lim_{k∈K} σ_k = 0, and let a_k = ∑_{i=1}^{µ} ω_k^i d_k^i. Assume that f is Lipschitz continuous near x* and that T_Ω^H(x*) ≠ ∅.

If d ∈ T_Ω^H(x*) is a limit point of {a_k/‖a_k‖}_K, then f°(x*; d) ≥ 0.
If the set of limit points of {a_k/‖a_k‖}_K is dense in the unit sphere, then x* is a Clarke stationary point.

Corollary [D., Gratton and Vicente, 2014]
When f is strictly differentiable at x*, we conclude that the projection of ∇f(x*) onto T_Ω(x*) is zero.
The GC-ES solver

GC-ES: Globally Convergent Evolution Strategy, a Matlab software.
Constrained and unconstrained optimization problems are handled, with geometry control for general linear constraints.
A search step is implemented using least-squares regression models.
No geometry control in the search step; we only require the singular values to be large enough.
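The search-step model fit can be sketched as follows (a simplified version: a linear least-squares model instead of the solver's regression models, with `sv_tol` an assumed threshold on the singular values of the regression matrix):

```python
import numpy as np

def regression_model(Y, fY, sv_tol=1e-8):
    """Least-squares linear model m(x) = c + g.(x - y0) built from sample
    points Y with values fY, in the spirit of the search step: the fit is
    only trusted when the smallest singular value of the regression matrix
    is large enough (otherwise the sample set is poorly poised)."""
    Y = np.asarray(Y, dtype=float)
    fY = np.asarray(fY, dtype=float)
    y0 = Y[0]
    # Regression matrix with columns [1, x - y0].
    M = np.hstack([np.ones((len(Y), 1)), Y - y0])
    if np.linalg.svd(M, compute_uv=False).min() < sv_tol:
        return None  # singular values too small: skip the search step
    coef, *_ = np.linalg.lstsq(M, fY, rcond=None)
    c, g = coef[0], coef[1:]
    return lambda x: c + g @ (np.asarray(x, dtype=float) - y0)
```

When the samples are affinely dependent (e.g. all on a line), the smallest singular value vanishes and the model is rejected, which mirrors the singular-value check on the slide.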
Solvers in our comparison

Implemented solvers:

The barrier approach: GC-ES-LC-B.
The projection approach: GC-ES-LC-P.

Solvers used in our numerical comparison: [Rios, Sahinidis 2010]

The "best" stochastic solvers, CMA-ES [Hansen 2010] and PSWARM [Vaz, Vicente 2009] [Vaz, Vicente 2010].
The "best" deterministic solver, MCS [Huyer, Neumaier 1999].
BCDFO [Gratton, Toint, Tröltzsch 2011], developed later but shown to perform very well.
Test set

Two constrained problem categories:

114 bound-constrained problems [Vaz, Vicente 2009].
107 general linearly constrained problems [Vaz, Vicente 2010].

In the proposed experiments:

performance is compared using performance profiles;
the budget is fixed at 3000 objective function evaluations;
the advantage of the search step was significant for bound constraints;
for linear constraints, the search step did not improve performance and was switched off.
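The performance profiles used in the plots below can be computed as follows (a sketch in the style of Dolan and Moré: T[p, s] holds the number of function evaluations solver s needs on problem p, with np.inf marking a failure, and ρ_s(τ) is plotted against the log₂-scaled performance ratio, matching the figures):

```python
import numpy as np

def performance_profile(T, taus):
    """Performance profile rho_s(tau): the fraction of problems that
    solver s solves within a factor 2**tau of the best solver on each
    problem. T is problems x solvers; failures are np.inf."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best evaluation count per problem
    ratios = np.log2(T / best)            # log2-scaled performance ratios
    return {s: [float((ratios[:, s] <= tau).mean()) for tau in taus]
            for s in range(T.shape[1])}
```

At τ = 0, ρ_s(0) is the fraction of problems on which solver s is the fastest; as τ grows, ρ_s(τ) approaches the fraction of problems solver s solves at all.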
Bound constraints (accuracy 10^-2)

[Figure: log2-scaled performance profiles ρ_s(τ), α = 10^-2, for ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS, and BCDFO; (a) search step disabled, (b) search step enabled.]
23/29
Bound constraints (accuracy 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-4; solvers: ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS, BCDFO.]
(c) Search step disabled.
23/29
Bound constraints (accuracy 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-4; solvers: ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS, BCDFO.]
(d) Search step enabled.
24/29
Linear constraints (accuracy 10^-2 and 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-2; solvers: ES-LC-B, ES-LC-P, PSWARM.]
(e) Search step disabled.
24/29
Linear constraints (accuracy 10^-2 and 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-4; solvers: ES-LC-B, ES-LC-P, PSWARM.]
(f) Search step disabled.
25/29
Plan
1 Introduction
2 A Globally Convergent Evolution Strategy
Algorithmic Modifications
Global Convergence
3 Numerical results
The GC-ES solver
Linear constraints case
Non-linear constraints case
4 Conclusions
26/29
Implementation details
Comparison with the extreme barrier approach proposed in MADS
[Audet, Dennis 2009]: MADS-EB.
Using 13 constrained optimization problems [Michalewicz, Schoenauer 1996]
[Koziel, Michalewicz 1999]
that tend to cover many kinds of constrained global optimization difficulties.
A large computational budget: 20000 objective function evaluations.
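The extreme barrier idea used by both GC-ES-EB and MADS-EB can be sketched in a few lines: infeasible points get the value +infinity, so an otherwise unconstrained method never accepts them. This is a minimal illustration (the wrapper name and the c(x) <= 0 feasibility convention are assumptions of this sketch, not the papers' notation):

```python
import numpy as np

def extreme_barrier(f, constraints):
    """Wrap an objective with the extreme barrier f_Omega.

    `constraints` is a list of functions c with c(x) <= 0 meaning
    feasible. The wrapped function returns f(x) on feasible points and
    +inf elsewhere, so infeasible trial points can never become the
    new best point of a DFO iteration.
    """
    def f_omega(x):
        if all(c(x) <= 0.0 for c in constraints):
            return f(x)
        return np.inf
    return f_omega
```

For example, minimizing x^2 subject to x >= 1 would use `extreme_barrier(lambda x: x**2, [lambda x: 1.0 - x])`.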
27/29
Non-linear constraints tests
Name   Best known f   GC-ES-EB final f   #f      MADS-EB final f   #f
G1     −15            −15                15904   −7.82761          10093
G2     −0.803619      −0.261765          4235    −0.206864         11027
G3     −1             −0.340501          20000   −6.36481e−233     1310
G4     −30665.5       −30665.5           6008    −30664.9          6666
G5     5126.5         5976.79            1653    5609.84           2542
G6     −6961.81       −6961.81           1158    −6961.81          1863
G7     24.3062        24.7203            14606   29.9121           7825
G8     −0.095825      −0.095825          438     −0.095825         343
G9     680.63         680.641            11114   681.301           3443
G10    7049.33        11129.7            20000   7186.62           20000
G11    0.75           0.99998            211     0.9998            193
G12    −1             −1                 247     −1                173
G13    0.0539498      2.52108            5413    2.64292           20000
28/29
Conclusions
GC-ES is a software package that:
implements a globally convergent evolution strategy.
offers two feasible approaches to handle constraints: a barrier approach and a projection approach.
optionally uses surrogate quadratic models to improve the performance of the algorithm.
gives promising results compared to some of the best DFO solvers.
Version 0.1 of GC-ES is available (Diouane.Youssef@isae.fr).
Next: add a merit function approach to handle relaxable constraints.
D., S. Gratton and L. N. Vicente, Globally Convergent Evolution Strategies, Math. Program., 2014.
D., S. Gratton and L. N. Vicente, Globally Convergent Evolution Strategies for Constrained Optimization, Comput. Optim. Appl., 2015.
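The projection approach mentioned above maps each sampled point back onto the feasible set before evaluating f. For bound constraints the Euclidean projection is just a componentwise clip; this sketch shows only that simple case (for general linear constraints the projection requires solving a small QP, not shown, and the function name here is illustrative):

```python
import numpy as np

def project_bounds(x, lb, ub):
    """Euclidean projection of a sampled point onto the box lb <= x <= ub.

    Componentwise clipping is exact for bound constraints; evaluating f
    at the projected point keeps every iterate feasible.
    """
    return np.clip(x, lb, ub)
```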
29/29