
GC-ES: A Globally Convergent Evolution Strategy for Unconstrained and Constrained Optimization

Youssef Diouane (ISAE-SUPAERO)

Joint work with: Serge Gratton and Luis Nunes Vicente

22nd International Symposium on Mathematical Programming, Pittsburgh, Monday, July 13


Plan

1 Introduction

2 A Globally Convergent Evolution Strategy
  Algorithmic Modifications
  Global Convergence

3 Numerical results
  The GC-ES solver
  Linear constraints case
  Non-linear constraints case

4 Conclusions


Derivative-free optimization

Minimizing f(x), possibly subject to a constraint set Ω.

f is assumed to be differentiable, but the derivatives of f are not available (expensive in CPU time, legacy code, or not coded at all).

In Derivative-Free Optimization (DFO), two types of algorithms can be distinguished:

Deterministic DFO (essentially model-based methods, direct-search methods, ...).

Stochastic DFO (essentially simulated annealing, particle swarm optimization, evolutionary algorithms, ...).


Stochastic DFO & Evolution Strategies

Only a few interactions between the two DFO categories have been established.

In this work, we show how ideas from deterministic DFO can improve the efficiency of one of the most successful classes of stochastic algorithms, known as Evolution Strategies (ES's). [Rechenberg 1973] [Schwefel 1975]


A Class of Evolution Strategies

1 Offspring Generation: compute new sample points $Y_{k+1} = \{y_{k+1}^1, \ldots, y_{k+1}^\lambda\}$ such that
$$y_{k+1}^i = x_k + \sigma_k^{ES} d_k^i,$$
where $d_k^i$ is drawn from a distribution $C_k$, $i = 1, \ldots, \lambda$.

2 Parent Selection: evaluate $f(y_{k+1}^i)$, $i = 1, \ldots, \lambda$, and reorder $Y_{k+1} = \{\tilde{y}_{k+1}^1, \ldots, \tilde{y}_{k+1}^\lambda\}$ so that $f(\tilde{y}_{k+1}^1) \le \ldots \le f(\tilde{y}_{k+1}^\lambda)$. Then set
$$x_{k+1} = \sum_{i=1}^{\mu} \omega_k^i \tilde{y}_{k+1}^i, \qquad \lambda \ge \mu.$$

3 Updates: update $\sigma_{k+1}^{ES}$, $C_{k+1}$ and the weights $(\omega_{k+1}^1, \ldots, \omega_{k+1}^\mu) \in S$. Return to step 1.
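To fix ideas, one iteration of this class can be sketched as follows in Python/NumPy. This is only a minimal illustration of the class (Gaussian sampling for $C_k$, hand-picked weights, ES-specific updates omitted), not the update rules of any particular ES such as CMA-ES; the names es_iteration, lam and weights are ours.

```python
import numpy as np

def es_iteration(f, x_k, sigma_es, C_k, weights, lam, rng):
    """One iteration of the ES class above (lambda offspring, mu = len(weights) parents)."""
    n = x_k.size
    mu = len(weights)
    # Offspring generation: d_i drawn from the distribution C_k (here a Gaussian).
    D = rng.multivariate_normal(np.zeros(n), C_k, size=lam)   # lam x n directions
    Y = x_k + sigma_es * D                                    # candidate points y_i
    # Parent selection: order by objective value and keep the mu best.
    order = np.argsort([f(y) for y in Y])
    Y_best = Y[order[:mu]]
    # Weighted recombination (weights assumed to sum to one, i.e. to lie in S).
    x_next = weights @ Y_best
    # The updates of sigma_es, C_k and the weights are ES-specific and omitted here.
    return x_next, Y_best

# Illustrative usage on a quadratic objective.
rng = np.random.default_rng(0)
f = lambda x: float(x @ x)
weights = np.array([0.5, 0.3, 0.2])
x = np.array([2.0, -1.0])
x, _ = es_iteration(f, x, sigma_es=0.5, C_k=np.eye(2), weights=weights, lam=10, rng=rng)
```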


A Class of Evolution Strategies

A relevant implementation [Hansen et al, 1996]: CMA-ES (Covariance Matrix Adaptation Evolution Strategy).

The population evolves using a normal distribution with a covariance matrix (and step-size) adaptation mechanism.

Considered one of the best derivative-free optimization methods. [BBOB, 2013] [BBOB, 2011] [Rios, Sahinidis, 2011]


Some Existing Convergence Results

The sphere problem is among the most studied cases. [Bayer, 1998] [Bienvenüe and François, 2003] [Auger, 2005]

A single parent is considered (the recombination $\sum_{i=1}^{\mu} \omega_k^i \tilde{y}_{k+1}^i$ is not allowed). [Yin, Rudolph and Schwefel, 1995] [Greenwood and Zhu 2001] [Jägersküpper 2003]

In the isotropic case and with a scale-invariant adaptation rule. [Jebalia and Auger, 2010]


Main objectives

Equip a class of evolution strategies with known techniques from deterministic DFO.

Achieve convergence from any starting point to a stationary point for evolution strategies.

Obtain good performance on practical problems, in terms of function evaluations, which are assumed to be expensive.

Extend the proposed algorithm to handle constrained optimization problems.



Algorithmic Modifications (1)

The objective function is minimized along a dense set of directions. [Audet and Dennis, 2006]

Changes in the algorithm [Kolda, Lewis, Torczon 2003]:

One will impose a sufficient decrease along the iterates.

We also require a control of the scaling parameter (i.e. the step length), and possibly rescale steps that are too long.

A feasible approach is used to handle the constraints, using an extreme barrier function $f_\Omega$ instead of f (a sketch of $f_\Omega$ is given after this slide).

⇒ Idea: provide a general framework into which a slight modification of an ES can fit, hopefully without jeopardizing performance.
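For concreteness, here is a minimal sketch of the extreme barrier function $f_\Omega$ mentioned above, assuming the feasible set Ω is available only through a membership test; the names extreme_barrier and is_feasible are ours, not part of GC-ES.

```python
import math

def extreme_barrier(f, is_feasible):
    """Return the extreme barrier function f_Omega:
    f_Omega(x) = f(x) if x is in Omega, and +infinity otherwise,
    so that infeasible points are never preferred in the selection."""
    def f_omega(x):
        return f(x) if is_feasible(x) else math.inf
    return f_omega

# Illustrative example with simple bound constraints 0 <= x_i <= 1.
f_omega = extreme_barrier(lambda x: sum(v * v for v in x),
                          lambda x: all(0.0 <= v <= 1.0 for v in x))
```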


Algorithmic Modifications (2)

Offspring Generation: compute new sample points $Y_{k+1} = \{y_{k+1}^1, \ldots, y_{k+1}^\lambda\}$ such that
$$y_{k+1}^i = x_k + \sigma_k \tilde{d}_k^i,$$
where $d_k^i$ is drawn from a distribution $C_k$, $i = 1, \ldots, \lambda$, and $\tilde{d}_k^i$ denotes $d_k^i$ after the possible rescaling of too-long steps.

Parent Selection: evaluate $f_\Omega(y_{k+1}^i)$, $i = 1, \ldots, \lambda$, and reorder $Y_{k+1} = \{\tilde{y}_{k+1}^1, \ldots, \tilde{y}_{k+1}^\lambda\}$ so that $f_\Omega(\tilde{y}_{k+1}^1) \le \ldots \le f_\Omega(\tilde{y}_{k+1}^\lambda)$. Then set
$$x_{k+1}^{trial} = \sum_{i=1}^{\mu} \omega_k^i \tilde{y}_{k+1}^i, \qquad \lambda \ge \mu.$$

Imposing Sufficient Decrease: (detailed on the next slide).

ES Updates: update $\sigma_{k+1}^{ES}$, $C_{k+1}$ and the weights $(\omega_{k+1}^1, \ldots, \omega_{k+1}^\mu) \in S$. Return to step (1).
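The modified offspring generation and the $f_\Omega$-based recombination can be sketched as below. The deck only states that too-long steps may be rescaled, so the cap sigma_max and the specific rescaling rule used here are illustrative assumptions, as are the function and parameter names.

```python
import numpy as np

def modified_offspring_and_trial(f_omega, x_k, sigma_k, D, weights, sigma_max=1e3):
    """Produce the trial point x_trial of the modified ES iteration.

    D holds the lambda raw directions d_k^i (one per row).  Directions whose
    step sigma_k * ||d|| would exceed sigma_max are rescaled (assumed rule),
    offspring are ranked by the extreme barrier function f_omega, and the mu
    best are recombined with the weights."""
    mu = len(weights)
    norms = np.linalg.norm(D, axis=1)
    # Rescale too-long steps so that sigma_k * ||d_tilde|| <= sigma_max.
    scale = np.minimum(1.0, sigma_max / np.maximum(sigma_k * norms, 1e-16))
    D_tilde = D * scale[:, None]
    Y = x_k + sigma_k * D_tilde
    # Rank by f_omega: infeasible points get +inf and end up last.
    order = np.argsort([f_omega(y) for y in Y])
    x_trial = weights @ Y[order[:mu]]
    return x_trial
```

With the extreme barrier sketched earlier, infeasible offspring receive the value +∞ and are therefore never selected as parents.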


Algorithmic Modifications (3)

Imposing Sufficient Decrease:

If $f_\Omega(x_{k+1}^{trial}) \le f(x_k) - \rho(\sigma_k)$, then the iteration is successful: set $x_{k+1} = x_{k+1}^{trial}$ and $\sigma_{k+1} = \max(\sigma_k, \sigma_k^{ES})$;

otherwise, set $x_{k+1} = x_k$ and $\sigma_{k+1} = \beta \sigma_k$ (where $\beta < 1$).

Conforming to the geometry of the constraints:

Barrier approach
Projection approach
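A minimal sketch of this sufficient-decrease test and step-size update follows, assuming the forcing function $\rho(\sigma) = c\,\sigma^2$ with a small constant c (a common choice; the slide does not fix ρ beyond its role in the test).

```python
def sufficient_decrease_update(f_omega, f_xk, x_k, x_trial, sigma_k, sigma_es,
                               c=1e-4, beta=0.5):
    """Accept or reject x_trial and update the step length sigma.

    Successful:   f_omega(x_trial) <= f(x_k) - rho(sigma_k), with rho(s) = c * s**2,
                  then x_{k+1} = x_trial and sigma_{k+1} = max(sigma_k, sigma_es).
    Unsuccessful: x_{k+1} = x_k and sigma_{k+1} = beta * sigma_k (beta < 1)."""
    rho = c * sigma_k ** 2
    if f_omega(x_trial) <= f_xk - rho:
        return x_trial, max(sigma_k, sigma_es), True
    return x_k, beta * sigma_k, False
```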


Global Convergence (1)

By inspecting the sign of the Clarke directional derivative.

Definition

Let f be a Lipschitz continuous function near the point $x_*$. Given $u \in T_\Omega(x_*)$, the Clarke-Jahn generalized derivative is
$$f^\circ(x_*; u) = \lim_{d \in T_\Omega^H(x_*),\, d \to u} f^\circ(x_*; d).$$

If $x_* \in \Omega$ is a local minimizer of the function f, then
$$f^\circ(x_*; d) \ge 0, \quad \forall d \in T_\Omega(x_*)$$
(the point $x_*$ is then said to be Clarke stationary).


Global Convergence (2)

Theorem [D., Gratton and Vicente, 2014]

Consider a sequence of iterations generated by our algorithm without any stopping criterion, and let f be bounded below.

Then there exists a subsequence K of unsuccessful iterates for which $\lim_{k \in K} \sigma_k = 0$.

If $\{x_k\}$ is bounded, then there exist $x_*$ and a subsequence K of unsuccessful iterates for which $\lim_{k \in K} \sigma_k = 0$ and $\lim_{k \in K} x_k = x_*$.


Global Convergence (3)

Theorem [D., Gratton and Vicente, 2014]

Let $x_* \in \Omega$ be a limit point of an unsuccessful subsequence of iterates $\{x_k\}_K$ for which $\lim_{k \in K} \sigma_k = 0$, and let $a_k = \sum_{i=1}^{\mu} \omega_k^i d_k^i$. Assume that f is Lipschitz continuous near $x_*$ and that $T_\Omega^H(x_*) \neq \emptyset$.

If $u \in T_\Omega^H(x_*)$ is a limit point of $\{a_k/\|a_k\|\}_K$, then $f^\circ(x_*; u) \ge 0$.

If the set of limit points of $\{a_k/\|a_k\|\}_K$ is dense in the unit sphere, then $x_*$ is a Clarke stationary point.

Corollary [D., Gratton and Vicente, 2014]

When f is strictly differentiable at $x_*$, we conclude that the projection of $\nabla f(x_*)$ onto $T_\Omega(x_*)$ is zero.



The GC-ES solver

GC-ES: Globally Convergent Evolution Strategy. A Matlab software.

Constrained and unconstrained optimization problems are handled.

A geometry control for general linear constraints.

A search step is implemented using least-squares regression models (a sketch is given after this slide).

No geometry control in the search step; we only impose that the singular values be large enough.
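As an illustration of such a search step, here is one way to fit a quadratic model by least-squares regression over previously evaluated points and to check the singular values of the regression matrix. The monomial basis, the threshold sv_min and the function name are assumptions for illustration, not the exact GC-ES implementation.

```python
import numpy as np

def quadratic_regression_model(X, fvals, sv_min=1e-8):
    """Fit f(x) ~ c + g^T x + 0.5 x^T H x by least squares over the rows of X
    (assumes at least as many points as basis functions).

    Returns (c, g, H), or None when the smallest singular value of the
    regression matrix is too small (no geometry control beyond this check)."""
    m, n = X.shape
    # Monomial basis: 1, x_i, x_i * x_j (i <= j).
    cols = [np.ones(m)]
    cols += [X[:, i] for i in range(n)]
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    M = np.column_stack(cols)
    if np.linalg.svd(M, compute_uv=False).min() < sv_min:
        return None                      # ill-conditioned sample set: skip the search step
    coef, *_ = np.linalg.lstsq(M, fvals, rcond=None)
    c, g = coef[0], coef[1:1 + n]
    H = np.zeros((n, n))
    k = 1 + n
    for i in range(n):
        for j in range(i, n):
            if i == j:
                H[i, i] = 2.0 * coef[k]  # coefficient of x_i^2 is H_ii / 2
            else:
                H[i, j] = H[j, i] = coef[k]
            k += 1
    return c, g, H
```

Minimizing such a surrogate (subject to the constraints and a trust region, for instance) would then give a candidate point to try before the regular ES iteration.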


Solvers in our comparison

Implemented solvers:

The barrier approach: GC-ES-LC-B
The projection approach: GC-ES-LC-P

Solvers used in our numerical comparison: [Rios, Sahinidis 2010]

The "best" stochastic solvers CMA-ES [Hansen 2010] and PSWARM [Vaz, Vicente 2009], [Vaz, Vicente 2010].

The "best" deterministic solver MCS [Huyer, Neumaier 1999].

BCDFO [Gratton, Toint, Tröltzsch 2011]; developed later, but it was shown to perform very well.


Test set

Two categories of constrained problems:

114 bound-constrained problems [Vaz, Vicente 2009].

107 general linearly constrained problems [Vaz, Vicente 2010].

In the proposed experiments:

The performance is compared using performance profiles (see the sketch after this slide).

The budget is fixed to 3000 objective function evaluations.

The advantage of the search step was significant for bound constraints.

For linear constraints, the search step did not lead to any improvement of the performance and was switched off.
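A standard way to build such performance profiles (in the Dolan-Moré sense) is sketched below; the data layout (a problems-by-solvers matrix of evaluation counts, with np.inf for failures, and every problem solved by at least one solver) and the function name are assumptions for illustration, mirroring the log2-scaled profiles shown on the following slides.

```python
import numpy as np

def performance_profile(T):
    """Performance profile from a (n_problems x n_solvers) matrix T of costs
    (e.g. function evaluations to reach the required accuracy), np.inf = failure.

    Returns the performance ratios R and a function rho(s, tau) giving the
    fraction of problems solver s solves within a factor 2**tau of the best."""
    best = np.min(T, axis=1, keepdims=True)   # best cost per problem
    R = T / best                              # performance ratios r_{p,s}
    def rho(s, tau):
        return np.mean(np.log2(R[:, s]) <= tau)
    return R, rho

# Tiny illustrative example: 3 problems, 2 solvers.
T = np.array([[100.0, 150.0],
              [300.0, np.inf],                # solver 2 fails on problem 2
              [80.0,  60.0]])
R, rho = performance_profile(T)
print(rho(0, 0.0), rho(0, 1.0))               # fraction solved at tau = 0 and tau = 1
```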


Bound constraints (accuracy is $10^{-2}$)

[Figure: log2-scaled performance profiles ($\rho_s(\tau)$ versus $\tau$), $\alpha = 10^{-2}$, for ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS and BCDFO. (a) Search step disabled. (b) Search step enabled.]

Bound constraints (accuracy is $10^{-4}$)

[Figure: log2-scaled performance profiles ($\rho_s(\tau)$ versus $\tau$), $\alpha = 10^{-4}$, for ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS and BCDFO. (c) Search step disabled. (d) Search step enabled.]

Linear constraints (accuracy $10^{-2}$ and $10^{-4}$)

[Figure: log2-scaled performance profiles ($\rho_s(\tau)$ versus $\tau$) for ES-LC-B, ES-LC-P and PSWARM, search step disabled. (e) $\alpha = 10^{-2}$. (f) $\alpha = 10^{-4}$.]


Implementation details

Comparison with the extreme barrier approach proposed in MADS [Audet, Dennis 2009]: MADS-EB.

Using 13 constrained optimization problems [Michalewicz, Schoenauer 1996] [Koziel, Michalewicz 1999], which tend to cover many kinds of constrained global optimization difficulties.

A large computational budget: 20000 objective function evaluations.


Non-linear constraints tests

Name   Best known     GC-ES-EB                  MADS-EB
       f value        final f        #f         final f          #f
G1     -15            -15            15904      -7.82761         10093
G2     -0.803619      -0.261765      4235       -0.206864        11027
G3     -1             -0.340501      20000      -6.36481e-233    1310
G4     -30665.5       -30665.5       6008       -30664.9         6666
G5     5126.5         5976.79        1653       5609.84          2542
G6     -6961.81       -6961.81       1158       -6961.81         1863
G7     24.3062        24.7203        14606      29.9121          7825
G8     -0.095825      -0.095825      438        -0.095825        343
G9     680.63         680.641        11114      681.301          3443
G10    7049.33        11129.7        20000      7186.62          20000
G11    0.75           0.99998        211        0.9998           193
G12    -1             -1             247        -1               173
G13    0.0539498      2.52108        5413       2.64292          20000


Conclusions

GC-ES is a software that:

implements a globally convergent evolution strategy.

offers two feasible approaches to handle constraints: barrier and projection.

optionally uses surrogate quadratic models to improve the performance of the proposed algorithm.

gives promising results compared to some of the best DFO solvers.

Version 0.1 of GC-ES is available (Diouane.Youssef@isae.fr).

Next: add a merit function approach to handle relaxable constraints.

D., S. Gratton and L. N. Vicente, Globally Convergent Evolution Strategies, Math. Program., 2014.

D., S. Gratton and L. N. Vicente, Globally Convergent Evolution Strategies for Constrained Optimization, Comput. Optim. Appl., 2015.


Thank you
