GC-ES: A Globally Convergent Evolution Strategy
for Unconstrained and Constrained Optimization
Youssef Diouane (ISAE-SUPAERO)
Joint work with: Serge Gratton and Luis Nunes Vicente
22nd International Symposium on Mathematical Programming, Pittsburgh, Monday, July 13
Plan

1 Introduction
2 A Globally Convergent Evolution Strategy
   Algorithmic Modifications
   Global Convergence
3 Numerical results
   The GC-ES solver
   Linear constraints case
   Non-linear constraints case
4 Conclusions
Derivative-free optimization

Minimizing f(x), possibly subject to a constraint set Ω.

f is assumed to be differentiable, but the derivatives of f are not available (expensive in CPU time, legacy code, or not coded at all).

In Derivative-Free Optimization (DFO), there are two types of algorithms:

Deterministic DFO (essentially model-based methods, direct-search methods, ...).
Stochastic DFO (essentially simulated annealing, particle swarm optimization, evolutionary algorithms, ...).
Stochastic DFO & Evolution Strategies

Only a few interactions between the two DFO categories have been established.

In this work, we show how ideas from deterministic DFO can improve the efficiency of one of the most successful classes of stochastic algorithms, known as Evolution Strategies (ES's). [Rechenberg 1973] [Schwefel 1975]
A Class of Evolution Strategies

1 Offspring Generation: compute new sample points Y_{k+1} = {y_{k+1}^1, ..., y_{k+1}^λ} such that
   y_{k+1}^i = x_k + σ_k^ES d_k^i,
   where d_k^i is drawn from a distribution C_k, i = 1, ..., λ.
2 Parent Selection: evaluate f(y_{k+1}^i), i = 1, ..., λ, and reorder Y_{k+1} = {ỹ_{k+1}^1, ..., ỹ_{k+1}^λ} so that f(ỹ_{k+1}^1) ≤ ... ≤ f(ỹ_{k+1}^λ). Set
   x_{k+1} = ∑_{i=1}^{µ} ω_k^i ỹ_{k+1}^i,   with λ ≥ µ.
3 Updates: update σ_{k+1}^ES, C_k, and the weights (ω^1, ..., ω^µ) ∈ S. Return to step (1).
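A minimal sketch of one iteration of this ES class, assuming the distribution C_k is a multivariate normal N(0, C); the helper name `es_iteration` and the fixed equal weights in the usage example are illustrative, not part of the original algorithm:

```python
import numpy as np

def es_iteration(f, x, sigma, C, lam, mu, weights, rng):
    """One iteration of a (mu/mu_W, lambda)-ES as on the slide:
    sample lambda offspring around x, rank them by f, and recombine
    the mu best with the weights omega^i."""
    n = x.size
    # Offspring generation: y^i = x + sigma * d^i, with d^i ~ N(0, C)
    d = rng.multivariate_normal(np.zeros(n), C, size=lam)
    y = x + sigma * d
    # Parent selection: reorder so that f(y~^1) <= ... <= f(y~^lambda)
    order = np.argsort([f(yi) for yi in y])
    y_sorted = y[order]
    # Weighted recombination of the mu best offspring (weights sum to 1)
    x_next = weights @ y_sorted[:mu]
    return x_next
```

For example, with f(x) = ‖x‖², λ = 10, µ = 3, and equal weights, one call returns the recombined mean of the three best sampled points.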
A Class of Evolution Strategies

A relevant implementation [Hansen et al., 1996]: CMA-ES (Covariance Matrix Adaptation Evolution Strategy).

Evolution of the population using a normal distribution with a covariance matrix (and step-size) adaptation mechanism.
Considered one of the best derivative-free optimization methods. [BBOB, 2013] [BBOB, 2011] [Rios, Sahinidis, 2011]
Some Existing Convergence Results

The sphere problem is among the most studied cases. [Beyer, 1998] [Bienvenüe and François, 2003] [Auger, 2005]
A single parent is considered (the recombination ∑_{i=1}^{µ} ω_k^i ỹ_{k+1}^i is not allowed). [Yin, Rudolph and Schwefel, 1995] [Greenwood and Zhu, 2001] [Jägersküpper, 2003]
The isotropic case with a scale-invariant adaptation rule. [Jebalia and Auger, 2010]
Main objectives

Equip a class of evolution strategies with known techniques from deterministic DFO.
Achieve convergence from any starting point to a stationary point for evolution strategies.
Obtain good performance on practical problems, in terms of function evaluations, which are assumed to be expensive.
Extend the proposed algorithm to handle constrained optimization problems.
Algorithmic Modifications (1)

The objective function is minimized over a dense set of directions. [Audet and Dennis, 2006]

Changes in the algorithm [Kolda, Lewis, Torczon 2003]:

We impose a sufficient decrease along the iterates.
We also require a control of the scaling parameter (i.e., the step length), possibly scaling back overly long steps.
A feasible approach is used to handle the constraints (using an extreme barrier function f_Ω instead of f).

⟹ Idea: give a general framework into which a slight modification of an ES can fit, hopefully without jeopardizing performance.
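The extreme barrier function f_Ω mentioned above can be sketched as follows (a minimal version, assuming Ω is available as an arbitrary membership test `feasible`, a hypothetical name):

```python
import numpy as np

def f_omega(f, x, feasible):
    """Extreme barrier function f_Omega: return f(x) inside Omega and
    +infinity outside, so infeasible offspring are never selected
    when ranking by f_Omega."""
    return f(x) if feasible(x) else np.inf
```

For instance, with Ω = [0, 1]², any point outside the box gets value +∞ and is automatically discarded by the parent selection.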
Algorithmic Modifications (2)

Offspring Generation: compute new sample points Y_{k+1} = {y_{k+1}^1, ..., y_{k+1}^λ} such that
   y_{k+1}^i = x_k + σ_k d̃_k^i,
where d_k^i is drawn from a distribution C_k, i = 1, ..., λ (and d̃_k^i denotes the possibly rescaled direction).

Parent Selection: evaluate f_Ω(y_{k+1}^i), i = 1, ..., λ, and reorder Y_{k+1} = {ỹ_{k+1}^1, ..., ỹ_{k+1}^λ} so that f_Ω(ỹ_{k+1}^1) ≤ ... ≤ f_Ω(ỹ_{k+1}^λ). Set
   x_{k+1}^trial = ∑_{i=1}^{µ} ω_k^i ỹ_{k+1}^i,   with λ ≥ µ.

Imposing Sufficient Decrease: (detailed on the next slide).

ES Updates: σ_{k+1}^ES, C_k, and (ω^1, ..., ω^µ) ∈ S. Return to step (1).
Algorithmic Modifications (3)

Imposing Sufficient Decrease:

If f_Ω(x_{k+1}^trial) ≤ f(x_k) − ρ(σ_k), then the iteration is successful: set x_{k+1} = x_{k+1}^trial and σ_{k+1} = max(σ_k, σ_k^ES);
otherwise, set x_{k+1} = x_k and σ_{k+1} = β σ_k (where β < 1).

Conforming to the geometry of the constraints: a barrier approach or a projection approach.
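The sufficient-decrease test and step-length update above can be sketched as follows (a minimal version; ρ is a forcing function, and the concrete choice ρ(σ) = 10⁻⁴ σ² shown here is an assumption for illustration, not the solver's setting):

```python
def sufficient_decrease_update(f_x_trial, f_x, sigma, sigma_es, rho, beta=0.5):
    """Globalization step from the slide: accept the trial point only if it
    yields a sufficient decrease rho(sigma); otherwise contract the step
    length by beta < 1. Returns (success, next sigma)."""
    if f_x_trial <= f_x - rho(sigma):
        # Successful iteration: keep the trial point, do not shrink the step.
        return True, max(sigma, sigma_es)
    # Unsuccessful iteration: keep x_k and contract the step length.
    return False, beta * sigma

# An assumed forcing function for illustration.
rho = lambda s: 1e-4 * s ** 2
```

On success the step length is reset to at least the ES's own σ_k^ES, so the globalization does not interfere with the ES adaptation on successful iterations.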
Global Convergence (1)

By inspecting the sign of the Clarke directional derivative.

Definition
Let f be Lipschitz continuous near the point x*. Given u ∈ T_Ω(x*), the Clarke–Jahn generalized derivative is
   f°(x*; u) = lim_{d ∈ T_Ω^H(x*), d → u} f°(x*; d).
If x* ∈ Ω is a local minimizer of the function f, then
   f°(x*; d) ≥ 0 for all d ∈ T_Ω(x*)
(the point x* is then said to be Clarke stationary).
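For reference, the Clarke generalized directional derivative underlying the definition above (for f Lipschitz continuous near x and a direction d) is the standard limsup of difference quotients:

```latex
% Clarke generalized directional derivative of f at x along d,
% for f Lipschitz continuous near x:
f^{\circ}(x; d) \;=\; \limsup_{y \to x,\; t \downarrow 0}
  \frac{f(y + t d) - f(y)}{t}.
```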
Global Convergence (2)

Theorem [D., Gratton and Vicente, 2014]
Consider a sequence of iterates generated by our algorithm without any stopping criterion, and let f be bounded below.

There exists a subsequence K of unsuccessful iterates for which lim_{k∈K} σ_k = 0.
If {x_k} is bounded, then there exist x* and a subsequence K of unsuccessful iterates for which lim_{k∈K} σ_k = 0 and lim_{k∈K} x_k = x*.
Global Convergence (3)

Theorem [D., Gratton and Vicente, 2014]
Let x* ∈ Ω be a limit point of a subsequence K of unsuccessful iterates {x_k} for which lim_{k∈K} σ_k = 0, and let a_k = ∑_{i=1}^{µ} ω_k^i d_k^i. Assume that f is Lipschitz continuous near x* and that T_Ω^H(x*) ≠ ∅.

If d ∈ T_Ω^H(x*) is a limit point of {a_k/‖a_k‖}_K, then f°(x*; d) ≥ 0.
If the set of limit points of {a_k/‖a_k‖}_K is dense in the unit sphere, then x* is a Clarke stationary point.

Corollary [D., Gratton and Vicente, 2014]
When f is strictly differentiable at x*, we conclude that the projection of ∇f(x*) onto T_Ω(x*) is zero.
The GC-ES solver

GC-ES: Globally Convergent Evolution Strategy, a Matlab software.
Constrained and unconstrained optimization problems are handled, with geometry control for general linear constraints.
A search step is implemented using least-squares regression models.
No geometry control in the search step; we only require the singular values to be large enough.
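The search-step model fit can be sketched as follows (a simplified version: a linear least-squares model instead of the solver's regression models, with `sv_tol` an assumed threshold on the singular values of the regression matrix):

```python
import numpy as np

def regression_model(Y, fY, sv_tol=1e-8):
    """Least-squares linear model m(x) = c + g.(x - y0) built from sample
    points Y with values fY, in the spirit of the search step: the fit is
    only trusted when the smallest singular value of the regression matrix
    is large enough (otherwise the sample set is poorly poised)."""
    Y = np.asarray(Y, dtype=float)
    fY = np.asarray(fY, dtype=float)
    y0 = Y[0]
    # Regression matrix with columns [1, x - y0].
    M = np.hstack([np.ones((len(Y), 1)), Y - y0])
    if np.linalg.svd(M, compute_uv=False).min() < sv_tol:
        return None  # singular values too small: skip the search step
    coef, *_ = np.linalg.lstsq(M, fY, rcond=None)
    c, g = coef[0], coef[1:]
    return lambda x: c + g @ (np.asarray(x, dtype=float) - y0)
```

When the samples are affinely dependent (e.g. all on a line), the smallest singular value vanishes and the model is rejected, which mirrors the singular-value check on the slide.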
Solvers in our comparison

Implemented solvers:

The barrier approach: GC-ES-LC-B.
The projection approach: GC-ES-LC-P.

Solvers used in our numerical comparison: [Rios, Sahinidis 2010]

The "best" stochastic solvers, CMA-ES [Hansen 2010] and PSWARM [Vaz, Vicente 2009] [Vaz, Vicente 2010].
The "best" deterministic solver, MCS [Huyer, Neumaier 1999].
BCDFO [Gratton, Toint, Tröltzsch 2011], developed later but shown to perform very well.
Test set

Two constrained problem categories:

114 bound-constrained problems [Vaz, Vicente 2009].
107 general linearly constrained problems [Vaz, Vicente 2010].

In the proposed experiments:

performance is compared using performance profiles;
the budget is fixed at 3000 objective function evaluations;
the advantage of the search step was significant for bound constraints;
for linear constraints, the search step did not improve performance and was switched off.
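The performance profiles used in the plots below can be computed as follows (a sketch in the style of Dolan and Moré: T[p, s] holds the number of function evaluations solver s needs on problem p, with np.inf marking a failure, and ρ_s(τ) is plotted against the log₂-scaled performance ratio, matching the figures):

```python
import numpy as np

def performance_profile(T, taus):
    """Performance profile rho_s(tau): the fraction of problems that
    solver s solves within a factor 2**tau of the best solver on each
    problem. T is problems x solvers; failures are np.inf."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best evaluation count per problem
    ratios = np.log2(T / best)            # log2-scaled performance ratios
    return {s: [float((ratios[:, s] <= tau).mean()) for tau in taus]
            for s in range(T.shape[1])}
```

At τ = 0, ρ_s(0) is the fraction of problems on which solver s is the fastest; as τ grows, ρ_s(τ) approaches the fraction of problems solver s solves at all.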
Bound constraints (accuracy 10^-2)

[Figure: log2-scaled performance profiles ρ_s(τ), α = 10^-2, for ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS, and BCDFO; (a) search step disabled, (b) search step enabled.]
23/29
Bound constraints (accuracy 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-4; solvers: ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS, BCDFO.]
(c) Search step disabled.
23/29
Bound constraints (accuracy 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-4; solvers: ES-LC-B, ES-LC-P, PSWARM, CMA-ES, MCS, BCDFO.]
(d) Search step enabled.
24/29
Linear constraints (accuracy 10^-2 and 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-2; solvers: ES-LC-B, ES-LC-P, PSWARM.]
(e) Search step disabled.
24/29
Linear constraints (accuracy 10^-2 and 10^-4)
[Figure: log2-scaled performance profiles, α = 10^-4; solvers: ES-LC-B, ES-LC-P, PSWARM.]
(f) Search step disabled.
25/29
Plan
1 Introduction
2 A Globally Convergent Evolution Strategy
Algorithmic Modifications
Global Convergence
3 Numerical results
The GC-ES solver
Linear constraints case
Non-linear constraints case
4 Conclusions
26/29
Implementation details
Comparison with the extreme barrier approach proposed in MADS
[Audet, Dennis 2009]: MADS-EB.
Using 13 constrained optimization problems [Michalewicz, Schoenauer 1996]
[Koziel, Michalewicz 1999]
that tend to cover many kinds of constrained global optimization difficulties.
A large computational budget: 20000 objective function evaluations.
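The extreme barrier idea used by both GC-ES-EB and MADS-EB can be sketched in a few lines: infeasible points get the value +infinity, so an otherwise unconstrained method never accepts them. This is a minimal illustration (the wrapper name and the c(x) <= 0 feasibility convention are assumptions of this sketch, not the papers' notation):

```python
import numpy as np

def extreme_barrier(f, constraints):
    """Wrap an objective with the extreme barrier f_Omega.

    `constraints` is a list of functions c with c(x) <= 0 meaning
    feasible. The wrapped function returns f(x) on feasible points and
    +inf elsewhere, so infeasible trial points can never become the
    new best point of a DFO iteration.
    """
    def f_omega(x):
        if all(c(x) <= 0.0 for c in constraints):
            return f(x)
        return np.inf
    return f_omega
```

For example, minimizing x^2 subject to x >= 1 would use `extreme_barrier(lambda x: x**2, [lambda x: 1.0 - x])`.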
27/29
Non-linear constraints tests
Name   Best known f   GC-ES-EB final f   #f      MADS-EB final f   #f
G1     −15            −15                15904   −7.82761          10093
G2     −0.803619      −0.261765          4235    −0.206864         11027
G3     −1             −0.340501          20000   −6.36481e−233     1310
G4     −30665.5       −30665.5           6008    −30664.9          6666
G5     5126.5         5976.79            1653    5609.84           2542
G6     −6961.81       −6961.81           1158    −6961.81          1863
G7     24.3062        24.7203            14606   29.9121           7825
G8     −0.095825      −0.095825          438     −0.095825         343
G9     680.63         680.641            11114   681.301           3443
G10    7049.33        11129.7            20000   7186.62           20000
G11    0.75           0.99998            211     0.9998            193
G12    −1             −1                 247     −1                173
G13    0.0539498      2.52108            5413    2.64292           20000
28/29
Conclusions
GC-ES is a software package that:
implements a globally convergent evolution strategy.
offers two feasible approaches to handle constraints: a barrier approach and a projection approach.
optionally uses surrogate quadratic models to improve the performance of the algorithm.
gives promising results compared to some of the best DFO solvers.
Version 0.1 of GC-ES is available (Diouane.Youssef@isae.fr).
Next: add a merit function approach to handle relaxable constraints.
D., S. Gratton and L. N. Vicente, Globally Convergent Evolution Strategies, Math. Program., 2014.
D., S. Gratton and L. N. Vicente, Globally Convergent Evolution Strategies for Constrained Optimization, Comput. Optim. Appl., 2015.
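The projection approach mentioned above maps each sampled point back onto the feasible set before evaluating f. For bound constraints the Euclidean projection is just a componentwise clip; this sketch shows only that simple case (for general linear constraints the projection requires solving a small QP, not shown, and the function name here is illustrative):

```python
import numpy as np

def project_bounds(x, lb, ub):
    """Euclidean projection of a sampled point onto the box lb <= x <= ub.

    Componentwise clipping is exact for bound constraints; evaluating f
    at the projected point keeps every iterate feasible.
    """
    return np.clip(x, lb, ub)
```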
29/29