ANNALES DE L'I. H. P., SECTION B

RAPHAËL CERF
The dynamics of mutation-selection algorithms with large population sizes

Annales de l'I. H. P., section B, tome 32, no 4 (1996), p. 455-508
<http://www.numdam.org/item?id=AIHPB_1996__32_4_455_0>

© Gauthier-Villars, 1996, all rights reserved.
The dynamics of mutation-selection algorithms
with large population sizes

Raphaël CERF

Université de Montpellier II, Département des Sciences Mathématiques,
Case 051, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France.

Vol. 32, n° 4, 1996, p. 455-508. Probabilités et Statistiques
ABSTRACT. - We build the mutation-selection algorithm by randomly perturbing a very simple selection scheme. Our process belongs to the class of the generalized simulated annealing algorithms studied by Trouvé. When the population size m is large, the various quantities associated with the algorithm are affine functions of m and the hierarchy of cycles on the set of uniform populations stabilizes. If the mutation kernel is symmetric, the limiting distribution is the uniform distribution over the set of the global maxima of the fitness function. The optimal convergence exponent defined by Azencott, Catoni and Trouvé is an affine strictly increasing function of m.

Key words and phrases: Freidlin-Wentzell theory, evolutionary algorithms, stochastic optimization.

1991 Mathematics Subject Classification: 60F10, 60J10, 92D15.
Professor Jean-Arcady Meyer first spoke to me of evolutionary algorithms. I am deeply grateful to Professor Gérard Ben Arous for introducing me to the Freidlin-Wentzell theory. I thank Professor Alain Berlinet for his useful advice and for having carefully read this paper. I thank Alain Trouvé for several discussions about the generalized simulated annealing and for comments which improved this work. I thank an anonymous Referee for his meticulous work and numerous changes which significantly increased the clarity of this paper. He gave a proof of proposition 10.1 which is considerably simpler than the original one.
Annales de l'Institut Henri Poincaré - Probabilités et Statistiques - 0246-0203
Vol. 32/96/04/$ 4.00/© Gauthier-Villars
RÉSUMÉ. - We construct the mutation-selection algorithm by randomly perturbing a very simple selection mechanism. The resulting process falls within the class of generalized simulated annealing algorithms studied by Trouvé. When the population size m is large, the various quantities associated with the algorithm are affine functions of m and the hierarchy of cycles on the set of uniform populations stabilizes. If the mutation kernel is symmetric, the limiting distribution is the uniform distribution over the set of global maxima of the fitness function. The optimal convergence exponent defined by Azencott, Catoni and Trouvé is an affine strictly increasing function of m.

1. INTRODUCTION
Evolutionary algorithms are optimization techniques based on the mechanics of natural selection [6]. Experimental simulations have demonstrated their efficiency: they are robust, flexible and in addition extremely easy to implement on parallel computers [5]. Unfortunately, there is a critical lack of theoretical results for the convergence of this kind of algorithm. In a previous paper [3], the author constructed a Markovian model of Holland's Genetic Algorithm. This model was built by randomly perturbing a very simple selection scheme. The study of the asymptotic dynamics of the process was carried out with the powerful tools developed by Freidlin and Wentzell for the study of random perturbations of dynamical systems, yielding what seemed to be the first mathematically well-founded convergence results for genetic algorithms. It was proved that the convergence toward the global maxima of the fitness function becomes possible when the population size is greater than a critical value (which depends on all the characteristics of the optimization problem). Surprisingly, the crossover did not play a fundamental role in this model. The crucial point to ensure the desired convergence was the delicate asymptotic interaction between the local perturbations of the individuals (i.e. the mutations) and the selection pressure.

In this paper, we deal with the two-operator mutation-selection algorithm. We introduce a new parameter (analogous to the temperature of the simulated annealing) which controls the intensity of the random perturbations. We study the asymptotic dynamics of the process when the perturbations disappear. This new point of view in the field of genetic algorithms allows us to prove several results concerning the convergence of the law of the process toward the maxima of the fitness function. We focus here on the dynamics of the algorithm when the population size m becomes very large.
Once more, we follow the road opened by Freidlin and Wentzell [4]. However, we will use the concepts and tools introduced by Catoni in his precise study of the sequential simulated annealing ([1], [2]); these were put in a more general framework by Trouvé, who carried out a systematic study (initiated by Hwang and Sheu [7]) of a class of processes he baptized "generalized simulated annealing" ([9], [10], [11]). Our algorithm belongs to this class of processes and our study will thus rely heavily on Catoni and Trouvé's work.
This paper has the following structure. First we describe our model of the mutation-selection algorithm. We show how it fits in the framework of the generalized simulated annealing and give a sufficient condition to ensure the concentration of the law of the algorithm on the global maxima of the fitness function. The dynamics of the algorithm is best described through the decomposition of the space into a hierarchy of cycles, which are the most attractive and stable sets we can build for the perturbed process. We prove a result valid for the generalized simulated annealing which reduces the study of the dynamics to the bottom of the cycles (the uniform populations in our situation). The key result for our study lies in the structure of the most probable trajectories of populations joining two uniform populations: a small group of individuals sacrifice themselves in order to create an ideal path which is then followed by all other individuals. As a consequence, the various quantities associated with the algorithm (such as the communication cost, the virtual energy, the communication altitude...) are affine functions of the population size. We then prove that the hierarchy of cycles on the set of uniform populations stabilizes when the population size is large. The structure of the limiting hierarchy of cycles depends upon the relative values of the mutation cost and the variations of the fitness function. However, the fitness function is constant on the "bad" cycles (i.e. those which do not contain the minima of the virtual energy). Furthermore, if the mutation kernel is symmetric and if the population size is greater than a critical value, the limiting distribution is the uniform distribution over the set of the global maxima of the fitness function. Finally, we investigate the critical constants describing the convergence of the algorithm. The optimal convergence speed exponent defined by Azencott, Catoni and Trouvé ([1], [2], [9], [10], [11]) (which gives the optimal convergence rate toward the maxima in finite time) increases linearly with m. This fact shows that our algorithm is intrinsically parallel: it involves only local independent computations.
2. DESCRIPTION OF THE MODEL

2.1. The fitness landscape

We consider a finite space of states E and a real-valued positive non-constant function f (which will be called the fitness function) defined on E. The set E is endowed with an irreducible Markov kernel α, that is, a function defined on E × E with values in [0, 1] satisfying

    ∀i ∈ E   Σ_{j∈E} α(i, j) = 1.

The three objects E, f, α define an abstract fitness landscape. We are searching for the set f* of the global maxima of f, i.e.

    f* = { i ∈ E : f(i) = max_{j∈E} f(j) }.

By f(f*) we mean the maximum value of f over E, i.e.

    f(f*) = max { f(i) : i ∈ E }.

Symbols with a star * in superscript will denote sets realizing the minimum or the maximum of a particular functional. The points of E will be called individuals and will be mostly denoted by the letters i, j, e.
2.2. The population space

We will consider Markov chains with state space E^m, where m is the population size of our algorithm; the m-tuples of elements of E, i.e. the points of the set E^m, are called populations and will be mostly denoted by the letters x, y, z. For x in E^m and i in E, x(i) is the number of occurrences of i in x:

    x(i) = card { k : 1 ≤ k ≤ m, x_k = i }.

With f we associate a function f̂ defined on E^m by

    f̂(x) = max { f(x_k) : 1 ≤ k ≤ m }.

For x in E^m, x̂ denotes the set of those individuals of x which realize the value f̂(x):

    x̂ = { x_k : 1 ≤ k ≤ m, f(x_k) = f̂(x) }.
If i belongs to E, (i) is the m-tuple whose m components are equal to i, and U is the set of all such m-tuples (which are called the uniform populations). We sometimes identify an element i of E with the uniform population (i). Thus f* may be seen as a subset of U.

2.3. The unperturbed Markov chain (X_n^∞)
In the absence of perturbations, the process under study is a Markov chain with state space E^m. The superscript ∞ reflects the fact that this process describes the limit behavior of our model, when all perturbations have disappeared. The transition probabilities of this chain are

    P(X_{n+1}^∞ = y | X_n^∞ = x) = Π_{k=1}^m  1_{x̂}(y_k) / card x̂,

that is, the individuals of the population are chosen randomly (under the uniform distribution) and independently among the elements of x̂, which are the best individuals of X_n^∞ according to the fitness function f. Notice that after a while, the chain (X_n^∞) starting from a population x is trapped in a uniform population selected in x̂. In fact, the absorbing states of the chain (X_n^∞) are exactly the uniform populations U.

2.4. The perturbed Markov chain (X_n^l)
The previous Markov chain (X_n^∞) is randomly perturbed by two distinct mechanisms. The first one acts directly upon the population and mimics the phenomenon of mutation. The second one consists in loosening the selection of the individuals. The intensity of the perturbations is governed by a positive parameter l, so that we obtain a family of Markov chains indexed by l. As l grows toward infinity, the perturbations progressively disappear. The transition from X_n to X_{n+1} includes two stages corresponding to the mutation and the selection operators:

    X_n  → (mutation) →  Y_n  → (selection) →  X_{n+1}.

We now describe these two operators more precisely.
2.5. X_n → Y_n: mutation

The mutation operator is modelled by random independent perturbations of the individuals of the population X_n. With the Markov kernel α (which is initially given with the set E and the function f), we build a family (α_l) of Markov kernels on the set E. Let a be a positive real number. Define for i, j in E and l ≥ 1

    α_l(i, j) = l^{−a} α(i, j)   if i ≠ j,
    α_l(i, i) = 1 − Σ_{j≠i} l^{−a} α(i, j).

The quantity α_l(i, j) is the probability that a mutation transforms the individual i into the individual j at the perturbation level l. The transition probabilities from X_n to Y_n are then given by

    P(Y_n = y | X_n = x) = Π_{k=1}^m α_l(x_k, y_k).    (1)

Note that the mutations vanish when l goes to infinity, i.e.

    lim_{l→∞} α_l(i, j) = δ(i, j),

where δ is the Kronecker symbol (the identity matrix indexed by E): δ(i, j) = 1 if i = j and 0 otherwise.
2.6. Y_n → X_{n+1}: selection

The selection operator is built with a selection function [3]. We evaluate the fitness of the individuals of the population Y_n and we build a probability distribution over Y_n which is biased toward the best fit individuals and which progressively concentrates on the set Ŷ_n as l goes to infinity. We then select independently m individuals from Y_n to form the population X_{n+1}. Let c be a positive real number. The transition probabilities from Y_n to X_{n+1} are

    P(X_{n+1} = z | Y_n = y) = Π_{h=1}^m  l^{c f(z_h)} / Σ_i l^{c f(i)},    (2)

where i ranges over the distinct individuals present in y and the probability is zero unless every individual of z is present in y. We have

    lim_{l→∞} P(X_{n+1} = z | Y_n = y) = Π_{h=1}^m  1_{ŷ}(z_h) / card ŷ,

i.e. the selections of individuals below peak fitness tend to disappear when l goes to infinity.
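For intuition, the two perturbed operators can be sketched numerically. The block below is a minimal sketch, not the paper's exact construction: the tiny landscape (E, f, α), the values of a and c, and the Boltzmann-like selection weights l^{c f(i)} over the distinct individuals of Y_n are illustrative assumptions consistent with the perturbation costs stated above.

```python
# Hypothetical tiny fitness landscape (E, f, alpha); all numerical values
# are illustrative assumptions, not data from the paper.
E = [0, 1, 2]
f = {0: 1.0, 1: 2.0, 2: 3.0}                     # fitness function on E
alpha = {(i, j): 1.0 / 3 for i in E for j in E}  # irreducible Markov kernel
a, c = 1.0, 1.0                                  # perturbation parameters

def mutation_kernel(l):
    """Perturbed kernel alpha_l: off-diagonal transitions are damped by
    l**(-a), so alpha_l tends to the Kronecker delta as l -> infinity."""
    al = {}
    for i in E:
        off = sum(l ** (-a) * alpha[i, j] for j in E if j != i)
        for j in E:
            al[i, j] = l ** (-a) * alpha[i, j] if j != i else 1.0 - off
    return al

def selection_prob(y, i, l):
    """Probability of selecting the individual i from the population y,
    with a bias l**(c*f(i)) over the distinct individuals present in y;
    it concentrates on the best individuals as l grows."""
    weights = {j: l ** (c * f[j]) for j in set(y)}
    return weights.get(i, 0.0) / sum(weights.values())
```

At level l, one transition of the chain mutates each of the m individuals independently according to mutation_kernel(l), then draws m individuals independently with selection_prob; as l grows, mutations become rare and selections of sub-optimal individuals disappear.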
2.7. The vanishing perturbations

The two operators play antagonistic roles: whereas the mutation tends to disperse the population over the space E, the selection tends to concentrate the population on the current best individual. Both are built through random perturbations: random perturbations of the identity for the mutations, random perturbations of the very strong selection mechanism of the chain (X_n^∞) for the selection. With this scheme of non-overlapping generations, it is essential that the mutations vanish: otherwise, the current population could be destroyed at any time through a massive mutation event with a null perturbation cost, preventing the concentration of the law of the process on the set of the global maxima of the fitness function. It is yet questionable whether the vanishing mutations and the growing selection pressure have a biological meaning. However, the optimization problem leads to the following question: is it possible to exert a control over these very rudimentary operators to ensure the convergence toward a state of maximum fitness?

Formulas (1) and (2) yield

    lim_{l→∞} P(X_{n+1}^l = z | X_n^l = x) = P(X_{n+1}^∞ = z | X_n^∞ = x),

so that the process (X_n^l) is a perturbation of the process (X_n^∞). A crucial point is to give the same intensity to the two kinds of perturbations so that they interact properly when l goes to infinity. More precisely, the rates of convergence of the transition probabilities in formulas (1) and (2) should be logarithmically of the same order (this is the reason why we introduce the two additional parameters a and c). The parameters a and c are not independent: we will in fact compare a/c to quantities related to the fitness function f. Rescaling the function f with the parameter c has the same effect as changing the mutation cost a. To avoid breaking the symmetry, we keep both parameters a and c in the sequel. Finally, we are entirely free for the choice of the control parameter l: here l goes to infinity. We could also take ln l, or the temperature T = 1/ln l.

3. ASYMPTOTIC EXPANSION OF P(X_{n+1}^l = y | X_n^l = x)
For a fixed value of l, the expression of the transition matrix of the chain (X_n^l) is quite complicated. We will focus on the asymptotic behavior of the chain, when l goes to infinity. The first step consists in estimating the transition matrix of (X_n^l).
By the very construction of the process (X_n^l), we have

    P(X_{n+1}^l = z | X_n^l = x) = Σ_{y∈E^m} P(Y_n = y | X_n^l = x) P(X_{n+1}^l = z | Y_n = y),

and, for each y in E^m, as l goes to infinity,

    P(Y_n = y | X_n^l = x) P(X_{n+1}^l = z | Y_n = y) ~ α(x, y) γ(y, z) l^{−( a d(x, y) + c Σ_{1≤h≤m} (f̂(y) − f(z_h)) )},

where we note for x, y, z in E^m

    α(x, y) = Π_{k : x_k ≠ y_k} α(x_k, y_k),

γ(y, z) is a positive combinatorial factor when every individual of z is present in y and γ(y, z) = 0 otherwise, and d(x, y) is the Hamming distance between the vectors x and y, i.e.

    d(x, y) = card { k : 1 ≤ k ≤ m, x_k ≠ y_k }.

The above transition probability vanishes whenever α(x, y) γ(y, z) = 0. There are two different terms in the exponent. The quantity d(x, y) is simply the number of mutations necessary to go from x to y, and a d(x, y) represents the perturbation cost necessary to achieve these mutations. The second term originates from the fact that individuals of y with a fitness strictly less than f̂(y) may have been selected to form z: such an event will be called an anti-selection. The term c Σ_{1≤h≤m} (f̂(y) − f(z_h)) represents the perturbation cost necessary to achieve all the anti-selections. We define next the communication cost V_1 on E^m × E^m by

    V_1(x, z) = min_y { a d(x, y) + c Σ_{1≤h≤m} (f̂(y) − f(z_h)) : α(x, y) γ(y, z) > 0 }.

We put finally q_1(x, z) = Σ α(x, y) γ(y, z), the sum being extended over the populations y realizing the minimum in V_1(x, z). With these notations, we see that

    P(X_{n+1}^l = z | X_n^l = x) ~ q_1(x, z) l^{−V_1(x, z)}   as l → ∞.

Remark in addition that, for each x, z in E^m, the quantities q_1(x, z) and V_1(x, z) do not depend on l. We are now in the framework of the generalized simulated annealing studied by Trouvé ([8], [9], [10], [11]). That is, the transition probabilities of the process (X_n^l) form a family of Markov kernels on the space E^m indexed by l which is admissible for the communication kernel q_1 and the cost function V_1 [9, Definition 3.1].
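On a toy instance, the communication cost V_1 can be evaluated by brute force. This is a hedged sketch under stated assumptions: the landscape below is hypothetical, and the minimization follows the decomposition above (mutation cost a·d(x, y) plus anti-selection cost), assuming selection can only produce individuals present in the intermediate population.

```python
import itertools

# Hypothetical landscape: E = {0,1,2}, fitness f, all mutations allowed.
E = [0, 1, 2]
f = {0: 1.0, 1: 2.0, 2: 3.0}
a, c = 1.0, 1.0

def hamming(x, y):
    """d(x, y): number of mutations needed to go from x to y."""
    return sum(1 for xk, yk in zip(x, y) if xk != yk)

def one_step_cost(x, z):
    """V_1(x, z): minimize over intermediate populations y the mutation
    cost a*d(x, y) plus the anti-selection cost c*sum_h (fhat(y) - f(z_h)),
    where fhat(y) is the best fitness in y; the selection stage requires
    every individual of z to be present in y."""
    m = len(x)
    best = float('inf')
    for y in itertools.product(E, repeat=m):
        if not all(zh in y for zh in z):
            continue
        fhat = max(f[yk] for yk in y)
        cost = a * hamming(x, y) + c * sum(fhat - f[zh] for zh in z)
        best = min(best, cost)
    return best
```

For instance, reaching the uniform population (2, 2) from (0, 0) costs a single mutation (one individual mutates to the best state, which is then selected twice at no extra cost), while degrading (2, 2) into (0, 0) costs two mutations.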
4. CONVERGENCE OF THE MUTATION-SELECTION ALGORITHM

We will use graphs over the set E^m. By a graph over E^m we mean a set of arrows x → y with endpoints in E^m. Thus our graphs are all oriented.

NOTATION 4.1. - Let g be a graph on E^m. The V_1-cost of g is

    V_1(g) = Σ_{(x→y)∈g} V_1(x, y).

We recall that an x-graph is a graph with no arrow starting from x and such that for any y ≠ x there exists a unique path in g leading from y to x. The set of all x-graphs is denoted by G(x). For more details and notations concerning graphs, see [4, chapter 6].

DEFINITION 4.2. - The virtual energy W_1 associated with the cost function V_1 is defined by

    W_1(x) = min { V_1(g) : g ∈ G(x) }.

We put
also W_1(E^m) = min { W_1(x) : x ∈ E^m } and W_1* = { x ∈ E^m : W_1(x) = W_1(E^m) }.

PROPOSITION 4.3 (Freidlin and Wentzell). - The chain (X_n^l) admits a unique stationary distribution μ_l, and

    lim_{l→∞} − ln μ_l(x) / ln l = W_1(x) − W_1(E^m)   for every x in E^m;

in particular, μ_l concentrates on the set W_1* as l goes to infinity.
THEOREM 4.4 (convergence of the homogeneous algorithm). - Fix the space E, the fitness function f, the kernel α and the positive constants a, c. There exists a critical population size m*, depending upon all these objects, such that, for every m > m*, the law of the algorithm concentrates asymptotically (first n → ∞, then l → ∞) on the populations of global maxima of the fitness function.

This theorem has been proved in [3] and various upper bounds for the value m* were given there, for instance a bound expressed in terms of the fitness landscape and the constant R, where R is the minimal number of transitions necessary to join two arbitrary points of E through the kernel α, i.e. R is the smallest integer such that any two points of E may be joined by a path of at most R transitions of positive α-probability. The preceding bound may be improved in several ways with more complicated constants: however, since we will deal only with very large population sizes, we do not care about a precise estimation of m*. Anyway, it is obvious that m* depends strongly on the fitness landscape (E, f, α).
We restate now Trouvé's result for the convergence in the inhomogeneous case, where the parameter l is an increasing function of n. We have then an inhomogeneous Markov chain and we suppress the superscript l: X_n stands for X_n^{l(n)}.

THEOREM 4.5 (Trouvé [9, Theorem 3.23]). - There exists a constant H_1, called the critical height, such that, for all increasing sequences l(n), we have the equivalence

    lim_{n→∞} P(X_n ∈ W_1*) = 1   ⟺   Σ_{n=0}^∞ l(n)^{−H_1} = ∞.

Note that H_1 may depend on m.

COROLLARY 4.6. - Suppose m > m* and Σ_{n=0}^∞ l(n)^{−H_1} = ∞. Then the law of X_n concentrates, as n goes to infinity, on the populations of global maxima of the fitness function.

All the tools introduced
by Trouvé in ([9], [10], [11]) for the study of the generalized simulated annealing may be used for analyzing the mutation-selection algorithm. The particular interest of our process lies in the presence of several control parameters, among which the population size m is of paramount importance. In the sequel, we will try to understand the behaviour of the mutation-selection algorithm for very large populations.

The next section contains several results valid for the generalized simulated annealing; they will allow us to reduce the study of the dynamics of the process to the set U of uniform populations (whose cardinality is independent of m).
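The virtual energy of Definition 4.2 can be computed by brute force on a small state space. The sketch below uses a hypothetical three-point space and an arbitrary cost matrix in place of V_1; it enumerates the x-graphs and takes the minimal total cost, exactly as in the Freidlin-Wentzell definition.

```python
import itertools

# Hypothetical 3-point space with an arbitrary cost matrix playing the
# role of the communication cost V_1 (not taken from the paper).
S = ['u', 'v', 'w']
V1 = {('u', 'v'): 1, ('u', 'w'): 3, ('v', 'u'): 2,
      ('v', 'w'): 1, ('w', 'u'): 4, ('w', 'v'): 2}

def reaches(g, y, x):
    """Follow the arrows of g starting from y; True if they lead to x."""
    seen = set()
    while y != x:
        if y in seen:
            return False
        seen.add(y)
        y = g[y]
    return True

def x_graphs(x):
    """Enumerate the x-graphs over S: one outgoing arrow per point y != x,
    no arrow out of x, and every point leading to x along the arrows."""
    others = [y for y in S if y != x]
    for targets in itertools.product(*([z for z in S if z != y] for y in others)):
        g = dict(zip(others, targets))
        if all(reaches(g, y, x) for y in others):
            yield g

def virtual_energy(x):
    """W(x) = min over x-graphs g of the total cost of the arrows of g."""
    return min(sum(V1[y, z] for y, z in g.items()) for g in x_graphs(x))
```

On this instance the energy ranks 'w' as the deepest state (W = 2), then 'v' (W = 3) and 'u' (W = 4); the annealed process concentrates on the minimizers of W.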
5. GENERAL RESULTS ABOUT THE VIRTUAL ENERGY
AND THE COMMUNICATION ALTITUDE

Let us introduce some notations and definitions. If S is an arbitrary set, S^(N) denotes the set of paths in S, that is, the finite sequences of elements of S. A path s in S is noted indifferently (s_1, ..., s_r) or s_1 → ... → s_r, and its length is noted |s| (r in the above example). A path s in S is said to join two elements t_1 and t_2 if s_1 = t_1 and s_{|s|} = t_2; the set of all paths in S joining the points t_1 and t_2 is noted S^(N)(t_1, t_2).
NOTATION 5.1. - By D^m we denote the paths in E^m which correspond to possible trajectories of the process (X_n^l), i.e. the paths p in E^m satisfying q_1(p_k, p_{k+1}) > 0 for 1 ≤ k < |p|. The V_1-cost of such a path is

    V_1(p) = Σ_{1≤k<|p|} V_1(p_k, p_{k+1}).

This definition coincides with the cost of a graph (see notation 4.1) if we consider the path as a graph over E^m. Notice that for the empty path (which has a null length), the cost is zero. If p does not belong to D^m, we put V_1(p) = ∞. We put also, for y, z in E^m, D^m(y, z) for the set of paths of D^m joining y and z.

DEFINITION 5.2. - We define the minimal communication cost V for y and z in E^m by

    V(y, z) = min { V_1(p) : p ∈ D^m(y, z) },

and let D^m*(y, z) be the paths of D^m(y, z) realizing the above minimum. Notice that for all x in E^m, we have V(x, x) = 0. With this new communication cost V, we associate a cost function for the graphs and a virtual energy.

DEFINITION 5.3. - For a graph g on E^m, we define its V-cost by

    V(g) = Σ_{(x→y)∈g} V(x, y),
and we define the virtual energy for any element x of E^m by

    W(x) = min { V(g) : g ∈ G(x) }.

We put also W(E^m) = min { W(x) : x ∈ E^m } and W* = { x ∈ E^m : W(x) = W(E^m) }.
We will now show that the minimal cost V contains all the relevant information to carry out the study of the dynamics of the generalized simulated annealing: the following results are not specific to the case of the mutation-selection algorithms and are valid in the general framework. Propositions 5.4 and 5.6 below are consequences of the more general lemma 5.6 of [10]. However, we give here direct proofs which do not involve the hierarchy of cycles.

PROPOSITION 5.4. - The virtual energy W associated with the communication cost V coincides with the virtual energy W_1 associated with the communication cost V_1, i.e. W(x) = W_1(x) for every x in E^m. As a consequence, we have W(E^m) = W_1(E^m) and W* = W_1*.
Proof. - Since V ≤ V_1, we clearly have W ≤ W_1. The proof of the reverse inequality W ≥ W_1 is similar to the proof of lemma 4.1 of [4, chapter 6].
□

The fundamental quantity used to build the hierarchical decomposition of the space into cycles is the communication altitude.

DEFINITION 5.5 (Trouvé [9, Definition 3.15]). - The communication altitude A_1 between two distinct points x and y of E^m is

    A_1(x, y) = min_{p ∈ D^m(x,y)} max_{1≤k<|p|} ( W_1(p_k) + V_1(p_k, p_{k+1}) ).

For any x in E^m, we put A_1(x, x) = W_1(x). The communication altitude may equivalently be defined through the cost function V.
PROPOSITION 5.6. - Let, for x and y two distinct points of E^m,

    A(x, y) = min_{p ∈ D^m(x,y)} max_{1≤k<|p|} ( W(p_k) + V(p_k, p_{k+1}) ).

For x in E^m, put A(x, x) = W(x). Then A = A_1.

Proof. - We have W = W_1 and V ≤ V_1: clearly A ≤ A_1. Conversely, let p belong to D^m(x, y). For each k, 1 ≤ k < |p|, let p_k = p_k^1 → ... → p_k^{n_k} be a path in E^m joining p_k and p_{k+1} whose V_1-cost equals V(p_k, p_{k+1}). Consider the path p̃ obtained by joining end to end all these paths. Let k and h be two integers such that 1 ≤ k < |p| and 1 ≤ h ≤ n_k. We have

    W(p_k^h) ≤ W(p_k) + V_1(p_k^1, p_k^2) + ... + V_1(p_k^{h−1}, p_k^h),

whence

    W(p_k^h) + V_1(p_k^h, p_k^{h+1}) ≤ W(p_k) + V(p_k, p_{k+1}) ≤ max_{1≤j<|p|} ( W(p_j) + V(p_j, p_{j+1}) ).

However, the left-hand side member of the above inequality coincides with the elevation of the path p̃ at the corresponding step. Taking the minimum over all paths of D^m(x, y), we obtain A_1(x, y) ≤ A(x, y). □
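Since the minimal communication cost V of Definition 5.2 is the cheapest total one-step cost over paths, it can be computed on a finite space by a standard shortest-path pass. The instance below is hypothetical: an arbitrary cost matrix stands in for V_1 on a three-point space.

```python
# Hypothetical one-step costs standing in for V_1 on a 3-point space.
INF = float('inf')
S = ['u', 'v', 'w']
V1 = {('u', 'v'): 1, ('u', 'w'): 3, ('v', 'u'): 2,
      ('v', 'w'): 1, ('w', 'u'): 4, ('w', 'v'): 2}

def min_communication_cost():
    """Floyd-Warshall relaxation: V(y, z) = min over paths joining y and z
    of the summed one-step costs, with V(y, y) = 0 (Definition 5.2)."""
    V = {(y, z): 0 if y == z else V1.get((y, z), INF) for y in S for z in S}
    for k in S:
        for y in S:
            for z in S:
                if V[y, k] + V[k, z] < V[y, z]:
                    V[y, z] = V[y, k] + V[k, z]
    return V
```

Here the direct step u → w costs 3 but the path u → v → w costs 2, so V(u, w) = 2; it is this path cost V, not the one-step cost V_1, that enters the virtual energy and the communication altitude (propositions 5.4 and 5.6).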
Suppose we wish only to examine the trace of the dynamics on a subset H of E^m. We build a new communication cost V_H on H by making the set E^m \ H a "taboo" set. Once more, the following definitions and results are not specific to mutation-selection algorithms.
DEFINITION 5.7. - Let H be a subset of E^m. We define a communication cost V_H for y, z in H by

    V_H(y, z) = min { V_1(p) : p ∈ D^m(y, z), p_k ∉ H for 1 < k < |p| }

(for the definition of V_1(p), see notation 5.1). We define a virtual energy W_H on H by

    W_H(x) = min { V_H(g) : g ∈ G_H(x) },

where G_H(x) is the set of x-graphs over H and the V_H-cost of a graph g over H is

    V_H(g) = Σ_{(y→z)∈g} V_H(y, z).

Finally, we define a communication altitude A_H: if x and y are two distinct points of H,

    A_H(x, y) = min_{p ∈ H^(N)(x,y)} max_{1≤k<|p|} ( W_H(p_k) + V_H(p_k, p_{k+1}) ).

For any x in H, we put A_H(x, x) = W_H(x).

THEOREM 5.8. - Let H be a subset of E^m such that

    ∀ y ∈ E^m \ H   ∃ z ∈ H   V(y, z) = 0.

Then W_H = W and A_H = A on the set H.

Proof. - Let g belong to G_H(x), where x ∈ H. By hypothesis, for each y in E^m \ H, there exists z in H such that V(y, z) = 0. Let g̃ be an x-graph over E^m obtained by adding to g one such arrow for each point of E^m \ H. We have V(g̃) ≤ V_H(g), whence, by taking the minimum over all x-graphs, W(x) ≤ W_H(x). Conversely, let g belong to G(x), where x is in H, and consider the graph g̃ of G_H(x) defined by: the arrow (y → z) is in g̃ if and only if y and z are in H and there exists a path x_1 = y → x_2 → ... → x_{r−1} → x_r = z in E^m whose arrows belong to g and whose intermediate points lie outside H. We have V_H(g̃) ≤ V(g). It follows that W_H(x) ≤ W(x) and the first equality W_H = W is proved.
Let p belong to H^(N)(x, y), where x, y are elements of H. For each k, 1 ≤ k < |p|, let p_k = p_k^1 → ... → p_k^{n_k} be a path in E^m such that p_k^1 = p_k, p_k^{n_k} = p_{k+1}, whose intermediate points lie outside H and whose V_1-cost equals V_H(p_k, p_{k+1}). Consider the path p̃ obtained by joining end to end all these paths. Let k and h be two integers such that 1 ≤ k < |p| and 1 ≤ h ≤ n_k. We have

    W(p_k^h) ≤ W(p_k) + V_1(p_k^1, p_k^2) + ... + V_1(p_k^{h−1}, p_k^h),

whence

    W(p_k^h) + V_1(p_k^h, p_k^{h+1}) ≤ W(p_k) + V_H(p_k, p_{k+1}).

However, the left-hand side member of the above inequality coincides with the elevation of the path p̃ at the corresponding step. Taking the minimum over all paths of H^(N)(x, y), we obtain A_1(x, y) = A(x, y) ≤ A_H(x, y).
The proof of the reverse inequality will use the following little lemma.

LEMMA 5.9. - Let x belong to E^m. There exists a graph g in G(x) such that V(g) = W(x) and, for each arrow (y → z) of g, we have either (y ∈ E^m \ H, z ∈ H and V(y, z) = 0) or (y ∈ H and z ∈ H ∪ {x}). In particular, no arrow ends in a population of E^m \ (H ∪ {x}).

Remark. - This lemma is a slight improvement of the first part of Lemma 4.3 of [4, chapter 6].
Proof. - Let g be an element of G(x) such that V(g) = W(x). First, we proceed as in the proof of Lemma 4.3 of [4, chapter 6] to deal with the arrows starting from E^m \ H. Let (y → z) be such an arrow. Let z' be in H such that V(y, z') = 0. We replace the arrow (y → z) by (y → z'). If a cycle y → x_1 → ... → x_r → y is formed, we replace the arrow (x_r → y) by (x_r → z). Since V(x_r, z) ≤ V(x_r, y) + V(y, z), the cost of the graph does not increase. We obtain a graph g in G(x) such that V(g) = W(x) and all arrows starting from E^m \ H end in H and have a null cost. We now remove the arrows from H to E^m \ (H ∪ {x}). Let (y → z) be such an arrow. Let z' be the unique element of H such that (z → z') ∈ g. We replace the arrow (y → z) by (y → z'). This operation does not increase the cost of the graph since V(y, z') ≤ V(y, z) + V(z, z') = V(y, z). The resulting graph has the desired properties.
□

Now let p belong to H^(N)(x, y). Let k belong to {1, ..., |p|}. If p_k is in H, we put p_k^1 = p_k and r_k = 1. Suppose p_k is not in H and let p_k^1 be an element of H joined to p_k through a chain of populations with null one-step costs V_1. Let g belong to G(p_k) be as in lemma 5.9. We add the arrow (p_k → p_k^1) to g. Let g̃ be the graph on H defined by: the arrow (y → z) is in g̃ if and only if y and z are in H and there exists a path x_1 = y → x_2 → ... → x_{r−1} → x_r = z in E^m whose arrows belong to g and whose intermediate points lie outside H (in particular, we suppress from g all the arrows (y → z) with y in E^m \ (H ∪ {p_k})). We have V(g̃) = W(p_k). Moreover, the graph g̃ contains at most one cycle p_k^1 → p_k^2 → ... → p_k^{r_k} → p_k^1 (where the populations p_k^j, 2 ≤ j ≤ r_k, belong to H). This cycle has been created by the addition of the arrow (p_k → p_k^1) in g, which causes the arrow (p_k^{r_k} → p_k^1) to appear in g̃; necessarily this arrow comes from a sequence of arrows of g (recall that no arrow of g ends in E^m \ (H ∪ {p_k})). If we now remove this arrow from g̃, we obtain an element of G_H(p_k^{r_k}), whence an equality relating W(p_k) to the cost of the resulting graph. If there is no cycle, we put r_k = 1 and the preceding equality is still valid. Removing the arrow (p_k^h → p_k^{h+1}) from g̃ (1 ≤ h < r_k) gives an element of G_H(p_k^h), so that we obtain the inequalities (3). Yet, there exists a sequence of populations p_k^{r_k+1}, ..., p_k^{n_k} in H such that each cost appearing in the right-hand side is realized by a path in E^m \ H; to obtain such a sequence, we just take the successive populations of H which appear in a path of the set D^m*(p_k, p_{k+1}) (see definition 5.2). Let p̃ be the path obtained by joining end to end all these paths. For each k in {1, ..., |p|}, the elevation of p̃ along the corresponding portion is controlled by inequality (3). However, the left-hand side member of the above inequality is exactly the elevation of the path p̃, and taking the minimum over all paths of H^(N)(x, y) yields A_H(x, y) ≤ A(x, y). □
Coming back to the mutation-selection algorithm, we see that the set U of the uniform populations verifies the hypothesis of theorem 5.8 (more precisely: ∀x ∈ E^m, ∀i ∈ x̂, V(x, (i)) = 0), so that the preceding results may be applied to the set U. Our next task is to study the cost function V_U, or equivalently (by propositions 5.4 and 5.6 and theorem 5.8), the restriction of V to U, as a function of m. We will often omit the parentheses when speaking of uniform populations: for instance V(i, j) will stand for V((i), (j)).
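The fact that V(x, (i)) = 0 for every best individual i of x has a direct one-step verification on a toy instance: mutate nobody and select a best individual of x every time, so both the mutation cost and the anti-selection cost vanish. The landscape below is a hypothetical stand-in.

```python
# Hypothetical landscape; the uniform population (i) is reached from x
# at null cost whenever i is a best individual of x.
E = [0, 1, 2]
f = {0: 1.0, 1: 2.0, 2: 3.0}
a, c = 1.0, 1.0

def cost_to_uniform(x, i):
    """One-step cost of the trajectory x -> x -> (i, ..., i): no mutation
    (d(x, x) = 0) and m selections of i, each with anti-selection cost
    c*(fhat(x) - f(i)) where fhat(x) is the best fitness in x."""
    fhat = max(f[xk] for xk in x)
    return a * 0 + c * len(x) * (fhat - f[i])
```

cost_to_uniform vanishes exactly when f(i) = f̂(x), i.e. when i belongs to x̂; this is the zero-cost absorption that allows theorem 5.8 to be applied with H = U.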
The crucial result is that, for m sufficiently large, V(i, j) is an affine function of m.

6. V(i, j) IS AN AFFINE FUNCTION OF m

Before proceeding to the proof of the main theorem 6.12, we give several notations and definitions. We will consider paths in the sets E^m and P(E). Paths of E^m will mostly be denoted by the letter p and paths of P(E) by the letter q.

DEFINITION 6.1. - We define a bracket
operator [ ] from E^(N), the set of all finite sequences of elements of E, onto P(E), the set of all subsets of E, by

    [x] = { x_k : 1 ≤ k ≤ |x| },

i.e. [x] is the set of all individuals present in the population x. The bracket operator [ ] provides a natural projection from (E^m)^(N) (the set of all finite sequences of elements of E^m) onto P(E)^(N): with each path p = (p_1, ..., p_r) in E^m we associate the path [p] = ([p_1], ..., [p_r]) in P(E).
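The bracket operator and the induced projection of paths are straightforward to state in code; this small sketch (with tuples standing for populations) only illustrates Definition 6.1.

```python
def bracket(x):
    """[x]: the set of individuals present in the population x."""
    return set(x)

def project(p):
    """Componentwise projection [p] of a path of populations of E^m
    onto a path of subsets of E."""
    return [bracket(pk) for pk in p]
```

For instance bracket((1, 1, 2)) == {1, 2}: a path of populations projects onto the path of its supports, forgetting multiplicities and positions.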
From now on, the population size m will vary and we will sometimes consider simultaneously several paths, possibly with different population sizes. Thus, if p is a path in E^m, we put m(p) = m.

NOTATION 6.2. - By D̃^m we denote the paths in E^m which correspond to possible trajectories for the whole process, i.e. such a path p includes the intermediate populations Y_n, has an odd length and satisfies the transition constraints of the mutation and selection operators.
The corresponding cost function V is defined by

    V(p) = Σ_k [ a d(p_{2k−1}, p_{2k}) + c Σ_{1≤h≤m} ( f̂(p_{2k}) − f(p_{2k+1}^h) ) ]

if the path p belongs to D̃^m (here p_{2k+1}^h is the h-th component of the vector p_{2k+1}), and V(p) = ∞ otherwise (we recall that d(x, y) is the Hamming distance between x and y). We put also, for y, z in E^m,

    V(y, z) = min { V(p) : p ∈ D̃^m(y, z) }.

We denote by D̃^m*(y, z) the elements p of D̃^m(y, z) realizing this minimum.
DEFINITION 6.3. - We define two cost functions ⌈V⌉ and [V] on P(E)^(N): for q in P(E)^(N), they are built from the costs of the paths of populations projecting onto q (for the definitions of V_1 and V, see notations 5.1 and 6.2).

NOTATION 6.4. - We put, for i, j in E, P(i, j) for the set of paths in P(E) joining {i} and {j}.

DEFINITION 6.5. - Let q be a path in P(E). We say that the path e_1 → ... → e_r in E is admissible for q if r = |q| and e_k ∈ q_k for every k in {1, ..., r}. The set of all admissible paths for q is denoted by A(q). For instance, if p belongs to D̃, for each h in {1, ..., m(p)}, the path p_1^h → ... → p_{|p|}^h is admissible for [p] (here p_k^h denotes the h-th component of the vector p_k).
NOTATION 6.6. - We define a function Ω on the set P(E)^(N) by: if q ∉ [D̃], then Ω(q) = ∞; if q ∈ [D̃], then Ω(q) is defined through the costs of the admissible paths for q (we recall that δ is the Kronecker symbol). We use the function Ω defined on P(E)^(N) to build a function Ω on E × E:

    Ω(i, j) = min { Ω(q) : q ∈ P(i, j) }.

Since the set P(i, j) is never empty and is included in [D̃] (see notation 6.4), the quantity Ω(i, j) is finite. We denote by P*(i, j) the elements of P(i, j) realizing the above minimum.

Let p be a
the above minimum.Let p be a
path
in E"~ andp
apath
in E"’ .We say that
p
is included in p(noted p C p)
if~~~
= and(we
recall thatpk (i)
is the number of occurrences of i in thepopulation p~).
A
path p
included in p may be obtained from thepath
pby destroying
some individuals in each
population
of thepath
in such a way that the set of mutationsoccurring
at odd times ispreserved.
Let q
be apath in P(E) . Among
thepaths
p such that[p]
= q, wesingle
out the
paths
which are minimal withrespect
to the above inclusion relation:

DEFINITION 6.7. - The path p in D̃^m is a minimal path realizing the path q in P(E) if [p] = q and, for each m̃ in N* and each path p̃ in D̃^m̃, we have the implication

    ( p̃ ⊂ p and [p̃] = q )  ⟹  m̃ = m

(whence also ∀k ∀i ∈ E p̃_k(i) = p_k(i)). The path p in D̃ is minimal if it is a minimal path realizing the path [p]. This definition has the following interpretation: the path p is minimal if we cannot destroy a fixed number of individuals in each population of p and still have a path belonging to D̃, without altering [p] and the set of mutations occurring in p.
We have the following upper bound for the population sizes of minimal paths.

LEMMA 6.8. - Let p be a minimal path realizing the path q in P(E). Then

    m(p) ≤ max { card(q_{2k−1} × q_{2k}) : 1 ≤ 2k ≤ |q| }.

Proof. - Suppose m(p) > max_k card(q_{2k−1} × q_{2k}). For each k, 1 ≤ 2k ≤ |q|, consider the pairs (p_{2k−1}^h, p_{2k}^h), 1 ≤ h ≤ m(p): they belong to the set q_{2k−1} × q_{2k}, whose cardinality is strictly less than m(p); necessarily at least two pairs are identical. We choose an index h(k) such that the pair (p_{2k−1}^{h(k)}, p_{2k}^{h(k)}) is present twice. Let p̃ be a path obtained in the following way:
- for each k, 1 ≤ 2k ≤ |q|, we remove the individuals p_{2k−1}^{h(k)} and p_{2k}^{h(k)} from p_{2k−1} and p_{2k} to obtain the populations p̃_{2k−1} and p̃_{2k};
- to build p̃_{|q|}, we remove from the population p_{|q|} an individual which is present twice (such an individual necessarily exists since the individuals of p_{|q|} are not all distinct).

Finally, the path p̃ so obtained belongs to D̃, realizes the same path q and has population size m(p) − 1, thus contradicting the minimality of p. □

DEFINITION 6.9. - We define a function Θ on [D̃] which associates with each
path q a value Θ(q). By the very definition of Ω, all paths p satisfy the inequality V(p) ≥ Ω([p]), so that the quantity Θ(q) is non-negative. More precisely, we have the following

LEMMA 6.10. - For all paths p belonging to D̃,

Proof. - Let p be a path of D̃. Necessarily, there exists a minimal path p̃ included in p which realizes the path [p]. We may rearrange the individuals of the populations p_1, ..., p_{|p|} so that p̃ appears as the trajectories of the