ON SOME THEORETICAL PROPERTIES OF (1 + 1) EVOLUTIONARY ALGORITHMS


ALEXANDRU AGAPIE and MIRCEA AGAPIE

Evolutionary algorithms in general search spaces can be seen as continuous-parameter Markov chains. We define and estimate transition functions for the (1+1) evolutionary algorithm on the inclined plane and corridor models. In the first case, the probability of maximal success in n iterations is derived in closed form, under uniform mutation. For the second case, the algorithm is proved to escape the corner of the corridor exponentially fast, for both uniform and normal mutation.

AMS 2000 Subject Classification: 60J27, 68W20.

Key words: evolutionary algorithm, Markov transition function.

INTRODUCTION

A good introduction to the theory of continuous evolutionary algorithms (EAs) could come from the analysis of simulated annealing [4]. Yet, two major impediments make this theory inappropriate for extrapolation. First, convergence of simulated annealing is built on particular properties of the Boltzmann distribution as the only selection operator; this is not the case with EAs. Second, following the argumentation from [5], convergence of simulated annealing involves a double chain procedure (convergence of a sequence of homogeneous Markov chains to their limit distributions, then convergence of the sequence of limit distributions to a measure which carries only the optimal points) which, when transferred to convergence in probability for a practical algorithm, corresponds to very slow annealing schedules [3], involving impractically huge computation times.

The Robbins-Monro family of algorithms, usually referred to as stochastic approximation [11], and the martingale approach to random search performed in [7], can be criticized for assuming a certain positive success rate at each algorithmic iteration. That corresponds to local, rather than global, behavior. The same applies to the evolution strategy theory developed in [2].

REV. ROUMAINE MATH. PURES APPL., 52 (2007), 3, 287-303

In contrast, the present paper analyzes continuous EAs seen as stochastic processes in a general search space. The first step in this direction was taken by [8], which introduced the formalization and presented the first global convergence result. According to that result, two sufficient conditions for the convergence of an EA acting on a continuous space are: (i) elitism, that is, the best state found so far cannot be lost from one iteration to another, and (ii) a positively bounded probability of reaching the target zone in one step, from any point of the space. The original approach of Rudolph [8] was also extended to adaptive EAs, modeled by multi-order Markov chains [1]. Still, from a practical viewpoint, both analyses share the same deficit: they are too general; the two conditions above do not provide the EA user with important information, such as convergence rates or computation times for particular algorithms.

The present paper is a first attempt to fill this gap, by considering examples of continuous EAs with specific mutation operators, confronted with different optimization tasks. Although a complete characterization of the algorithmic behavior is cumbersome, even for the simplest cases, we present convergence problems that can be solved within the proposed framework.

To start, let 0 = (0, . . . , 0) be the initial point of the algorithm and F : R^k → R_+ be the fitness (objective) function to be maximized. Let {z_i}_{i≥1} be a sequence of i.i.d. random variables, uniformly distributed on the k-dimensional cube of volume one centered at 0. Their common measure will be denoted z, and referred to as the mutation distribution. The simplest EA consists of only one individual, which transits the space under the action of successive mutations z_i. It can be seen as a stochastic process if we set

(1)   X_n = 0 + Σ_{i=1}^n z_i = X_{n−1} + z_n,   n ≥ 1.

This process is called the (1,1) EA with uniform mutation z. As the next state of the algorithm depends only on its current state, X_n defines a homogeneous Markov chain. The comma indicates that there is no selection between the current individual (the parent) and the next one (the offspring): the offspring replaces the parent, no matter what their fitness values are. This is exactly the case of the random walk. But if a replacement occurs only when the offspring is better than the parent (elitist selection), then we have a proper evolutionary algorithm, denoted (1+1) EA and given for all n ≥ 1 by

X_n = (X_{n−1} + z_n) 1{F(X_{n−1}+z_n) ≥ F(X_{n−1})} + X_{n−1} 1{F(X_{n−1}+z_n) < F(X_{n−1})}.

Definition 1 ([6]). A function P : R^k × B(R^k) → [0, 1] is said to be a transition function (kernel) if P(w, ·) is a probability measure on B(R^k) for all w ∈ R^k, and P(·, A) is a random variable on R^k for all A ∈ B(R^k).
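As a concrete illustration, here is a minimal Python sketch of the (1+1) EA update rule above, with uniform mutation on the centered unit square; the function and variable names are ours, not from the paper.

```python
import random

def one_plus_one_ea(fitness, x0, steps, seed=0):
    """(1+1) EA: propose x + z with z uniform on the unit cube centered at 0;
    keep the offspring only if its fitness is at least the parent's (elitism)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        offspring = [xi + rng.uniform(-0.5, 0.5) for xi in x]
        if fitness(offspring) >= fitness(x):
            x = offspring
    return x

# Inclined plane model F(x, y) = x (Section 1): every step to the right is
# accepted and every step to the left is rejected, so x never decreases.
best = one_plus_one_ea(lambda p: p[0], (0.0, 0.0), 1000)
```

With the elitist indicator written as an `if`, the rejected-offspring branch is exactly the atom that makes the kernel discontinuous, as discussed in Section 1.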

For the optimization task, we first consider the inclined plane model, in Section 1. This model describes the simplest dependence between the objective function and the variables of the search space. Under the assumption of uniform mutation, we derive a closed-form probability for the 2-dimensional EA to reach a certain region of the plane in n iterations. In Section 2 we restrict the search space to a corridor and analyze the behavior of the EA at the corner. The algorithm is proved to escape the corner of the 2-dimensional corridor exponentially fast. The result is generalized in Subsection 2.1 to the k-dimensional search space, while in Subsection 2.2 the same behavior is demonstrated for the algorithm with normal mutation.

1. EA ON THE INCLINED PLANE

In this section we restrict the search to the real plane and consider the (1+1) EA with square uniform mutation starting at zero. For the fitness function, we orient the coordinate system so that the plane slopes in the direction of the x-axis, i.e., x = +∞ corresponds to the optimum [10]. For this reason, the x-axis is also called the progress axis in this paper. The simplest example of a fitness function satisfying this requirement is F(x, y) = x.

Departing from the traditional random walk, the kernel of the (1+1) EA is no longer continuous with respect to the Lebesgue measure d. This is due to elitism, which makes the associated probability measure have an atom at zero (a discontinuity of the distribution function). Namely, the algorithm is allowed to move only to the right, and any unsuccessful mutation (that is, to the left) makes the EA stagnate in its current state. From this point on we omit the '(1+1)' tag of the EA, since this is the only type of algorithm we analyze.

The associated one-step kernel can be described as the sum of two measures, one singular (Dirac) and one continuous (note that here A ∈ B(R^2)):

(2)   P((x, y), A) = (1/2) δ_{(x,y)}(A) + 1_{(x, x+1/2) × (y−1/2, y+1/2)} · d(A)
                   = (1/2) δ_{(x,y)}(A) + d( ((x, x+1/2) × (y−1/2, y+1/2)) ∩ A ).

We shall frequently use the 1-dimensional version of (2), corresponding to the progress along the x-axis, and also its density form

(3)   P(x, A) = (1/2) δ_x(A) + d( (x, x+1/2] ∩ A ),   A ∈ B(R),

      P(x, du) = (1/2) δ_x(u) + 1_{(x, x+1/2]} · du,

where the first term in (3) carries only the null set {x}.

For all fixed x, y ∈ R, formulas (2)-(3) define probability measures with respect to A. Yet, it is not obvious that P(·, A) is a measurable function for every Borel set A. This is proved in Lemma 1 below. For simplicity we consider just one dimension and a set A of the form A = [a, b). In two dimensions, the movements are independent with respect to the x and y axes, which makes the contribution of the y-term multiplicative.

Lemma 1. Let [a, b) be a fixed interval of the real line. The restriction to the x-axis of the kernel corresponding to the EA with square uniform mutation is a measurable function given by

(4)   P(x, [a, b)) =
        0,                          x < a − 1/2,
        min{x + 1/2, b} − a,        x ∈ [a − 1/2, a),
        1/2 + min{x + 1/2, b} − x,  x ∈ [a, b),
        0,                          x ≥ b.

Proof. On each branch of (4) we have the probability for the one-step EA to move from x into [a, b), calculated as the Lebesgue measure of the intersection between [a, b) and the 1/2-radius interval centered at x; see (3). The only case to watch out for is x ∈ [a, b), where the non-Lebesgue term 1/2 is added, as the probability of staying at x due to elitism. Since each branch function is measurable with respect to x, so is P(x, [a, b)). And, because B(R) is generated by the intervals [a, b), P(·, A) is measurable for any Borel set A.
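Formula (4) can be checked numerically. The sketch below is our own illustration, not from the paper: it implements the four branches and compares one of them against a Monte Carlo estimate of a single EA step on the line with fitness F(x) = x.

```python
import random

def kernel_interval(x, a, b):
    """One-step probability P(x, [a, b)) of formula (4): Lebesgue measure of
    [a, b) ∩ (x, x + 1/2], plus the elitist atom of mass 1/2 when x ∈ [a, b)."""
    if x < a - 0.5:
        return 0.0
    if x < a:
        return min(x + 0.5, b) - a
    if x < b:
        return 0.5 + min(x + 0.5, b) - x
    return 0.0

# Monte Carlo cross-check: one EA step from x0; a move u <= 0 lowers the
# fitness F(x) = x and is therefore rejected (the EA stays at x0).
rng = random.Random(0)
x0, a0, b0, trials = 0.3, 0.25, 0.6, 200_000
hits = 0
for _ in range(trials):
    u = rng.uniform(-0.5, 0.5)
    y = x0 + u if u > 0 else x0
    hits += a0 <= y < b0
estimate = hits / trials
```

For x0 = 0.3 and [a, b) = [0.25, 0.6), the third branch of (4) gives 1/2 + 0.6 − 0.3 = 0.8, which the simulation reproduces.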

The following result is elementary.

Lemma 2. Let A ∈ B(R) be a set with d(A) = 0. Then

∫ P(x, A) dx = 0.

Now we consider the n-step kernel P^n. Assume as usual that the algorithm starts at zero, and furthermore restrict the search to the real line. Applying the Chapman-Kolmogorov formula we get

P^2(0, A) = ∫ P(0, dx) P(x, A) = (1/2) P(0, A) + ∫_{0+}^{1/2} P(x, A) dx,   A ∈ B(R).

The only discontinuity of this kernel is at zero. This is true not only for P^2 but for any power P^n with n ≥ 1, as stated in

Proposition 1. Let P be the kernel of a 1-dimensional EA and A ∈ B((0, ∞)) with d(A) = 0. Then for all n ≥ 1 we have

(i)   P^n(0, A) = 0,

(ii)  ∫ P^n(x, A) dx = 0.

Proof. The case n = 1 corresponds to definition (3) for (i), respectively to Lemma 2 for (ii). Next, applying Chapman-Kolmogorov to P^n and assuming by induction that the statement holds for n − 1, we get

P^n(0, A) = ∫ P(0, dx) P^{n−1}(x, A) = (1/2) P^{n−1}(0, A) + ∫_{0+}^{1/2} P^{n−1}(x, A) dx ≤ 0 + ∫ P^{n−1}(x, A) dx = 0.

We now return to the inclined plane model. The goal is to derive an expression for the n-step progress of the EA along the x-axis, by calculating its probability to reach in n iterations the rectangle S_n (see Figure 1):

S_n = { (x, y) : (n−1)/2 ≤ x < n/2, |y| < n/2 },   n ≥ 1.

[Figure 1 shows the rectangles S_1, S_2, S_3, . . . , S_n along the progress axis x, each of width 1/2: S_n covers (n−1)/2 ≤ x < n/2 with |y| < n/2.]

Fig. 1. Regions of progress for the EA on the inclined plane model.

To this end, we start with an intuitive result, which can be stated informally as follows: the kernel P is invariant to translations along the progress axis.

Formally, we have

Lemma 3. Let S_n^1 = [(n−1)/2, n/2) be the projection of S_n on the x-axis for n ≥ 1. Let n ≥ 1, 1 ≤ k < n, and m ≥ 1. Then for all x ≥ k/2 we have

P^m(x, S_n^1) = P^m( x − k/2, S_{n−k}^1 ).

Proof. Fix n and k < n, and proceed by induction on m. For m = 1, (4) yields

P(x, S_n^1) = P( x, [(n−1)/2, n/2) ) =
    0,                     x ≤ (n−2)/2,
    x + 1/2 − (n−1)/2,     x ∈ ((n−2)/2, (n−1)/2),
    1/2 + n/2 − x,         x ∈ [(n−1)/2, n/2),
    0,                     x ≥ n/2,
=
    0,                               x − k/2 ≤ (n−k−2)/2,
    (x − k/2) + 1/2 − (n−k−1)/2,     x − k/2 ∈ ((n−k−2)/2, (n−k−1)/2),
    1/2 + (n−k)/2 − (x − k/2),       x − k/2 ∈ [(n−k−1)/2, (n−k)/2),
    0,                               x − k/2 ≥ (n−k)/2,
= P( x − k/2, [(n−k−1)/2, (n−k)/2) ) = P( x − k/2, S_{n−k}^1 ).

Assume now that the property holds up to m − 1, and calculate

P^m(x, S_n^1) = ∫ P(x, du) P^{m−1}(u, S_n^1) = ∫ P(x, du) P^{m−1}( u − k/2, S_{n−k}^1 )
             = ∫ P( x − k/2, du ) P^{m−1}( u − k/2, S_{n−k}^1 ) = P^m( x − k/2, S_{n−k}^1 ),

where the last equalities come from the induction hypothesis, cases m − 1 and 1.

Actually, Lemma 3 holds for any power of P and any translation rate, but for the purpose of this analysis the form provided is sufficient. The following result is essential for characterizing the n-step kernels.

Proposition 2. Let x ∈ S_1^1. For all n ≥ 1 we have

P^n(x, S_{n+1}^1) = x^n / n!.

Proof. Induction on n. For n = 1, apply (4) with [a, b) = [1/2, 1):

P( x, [1/2, 1) ) = x + 1/2 − 1/2 = x.

Next, assume the statement holds for n − 1 and derive

P^n(x, S_{n+1}^1) = ∫ P(x, du) P^{n−1}(u, S_{n+1}^1) = (1/2) P^{n−1}(x, S_{n+1}^1) + ∫_x^{x+1/2} P^{n−1}(u, S_{n+1}^1) du,

where the first term is zero because S_{n+1}^1 is not accessible from x < 1/2. Therefore,

P^n(x, S_{n+1}^1) = ∫_x^{1/2} P^{n−1}(u, S_{n+1}^1) du + ∫_{1/2}^{x+1/2} P^{n−1}(u, S_{n+1}^1) du,

where the first integral is zero. Apply Lemma 3, change the variable u − 1/2 = v, and use the induction hypothesis to obtain the conclusion.
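Proposition 2 is easy to check by simulation. The sketch below is our own illustration (names are ours): it estimates P^2(x, S_3^1) for x = 0.3 and compares it with x^2/2! = 0.045.

```python
import random

def ea_line(x, n, rng):
    """n steps of the 1-D EA with fitness F(x) = x: uniform proposals in
    (-1/2, 1/2); moves to the left are rejected by elitism."""
    for _ in range(n):
        u = rng.uniform(-0.5, 0.5)
        if u > 0:
            x += u
    return x

rng = random.Random(0)
x0, n, trials = 0.3, 2, 400_000
# S_{n+1}^1 = [n/2, (n+1)/2) = [1, 1.5) for n = 2
hits = sum(1.0 <= ea_line(x0, n, rng) < 1.5 for _ in range(trials))
estimate = hits / trials
exact = x0 ** n / 2        # Proposition 2: x^n / n! with n = 2
```

Geometrically, both mutations must be accepted with u1 + u2 ≥ 0.7, a triangle of area 0.3^2/2 inside the unit square of proposals, in agreement with the closed form.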

The following result opens the path to the 2-dimensional case.

Lemma 4. Let k ≥ 0 and (x, y) ∈ S_k. Then for all n ≥ 1 we have

P^n((x, y), S_{n+k}) = P^n(x, S_{n+k}^1).

Proof. Induction on n. For n = 1, equation (2) yields

P((x, y), S_{k+1}) = (1/2) δ_{(x,y)}(S_{k+1}) + d( ((x, x+1/2) × (y−1/2, y+1/2)) ∩ S_{k+1} )
                  = (1/2) δ_x(S_{k+1}^1) + 1 · d( (x, x+1/2) ∩ S_{k+1}^1 ) = P(x, S_{k+1}^1).

Assuming that the equality holds for n − 1, we have

P^n((x, y), S_{n+k}) = ∫ P((x, y), dw) P^{n−1}(w, S_{n+k})
  = (1/2) P^{n−1}((x, y), S_{n+k}) + ∫_x^{x+1/2} du ∫_{y−1/2}^{y+1/2} P^{n−1}((u, v), S_{n+k}) dv
  = (1/2) P^{n−1}(x, S_{n+k}^1) + ∫_x^{x+1/2} du ∫_{y−1/2}^{y+1/2} P^{n−1}(u, S_{n+k}^1) dv
  = (1/2) P^{n−1}(x, S_{n+k}^1) + ∫_x^{x+1/2} P^{n−1}(u, S_{n+k}^1) du
  = ∫ P(x, du) P^{n−1}(u, S_{n+k}^1) = P^n(x, S_{n+k}^1).

The main result of this section provides a closed form for P^n(0, S_n), the probability for the uniform-mutation EA to be in the region S_n of the plane after n iterations.

Theorem 1. For all n ≥ 1 we have

P^n(0, S_n) = 1 / (n! 2^n).

Proof. According to Lemma 4, it is sufficient to prove the 1-dimensional result. We have

P^n(0, S_n^1) = ∫ P(0, dx) P^{n−1}(x, S_n^1) = ∫_{0+}^{1/2} P(0, dx) P^{n−1}(x, S_n^1)
             = ∫_{0+}^{1/2} P(0, dx) x^{n−1}/(n−1)! = ∫_0^{1/2} x^{n−1}/(n−1)! dx = 1 / (n! 2^n),

where in the last line we used Proposition 2 and the fact that S_n^1 is not attainable from 0 in n − 1 iterations.
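Theorem 1 can also be verified by direct simulation of the two-dimensional EA. The sketch below is our own check: it estimates P^2(0, S_2) and compares it with 1/(2! · 2^2) = 1/8. Since at most n accepted steps of size below 1/2 occur, |y| < n/2 holds automatically and only the x-condition needs testing.

```python
import math
import random

def ea_plane(n, rng):
    """n steps of the (1+1) EA on the inclined plane F(x, y) = x, starting
    at (0, 0), with uniform mutation on the centered unit square."""
    x = y = 0.0
    for _ in range(n):
        dx, dy = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
        if dx > 0:                    # accepted iff the fitness x increases
            x, y = x + dx, y + dy
    return x, y

rng = random.Random(0)
n, trials = 2, 400_000
hits = sum((n - 1) / 2 <= ea_plane(n, rng)[0] < n / 2 for _ in range(trials))
estimate = hits / trials
exact = 1 / (math.factorial(n) * 2 ** n)    # Theorem 1: 1/(n! 2^n)
```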

Theorem 1 can be easily generalized to the case of an EA with uniform mutation inside a square of area r^2.

Corollary 1.1. If mutation is uniformly distributed inside the square of area r^2, and S_n^r = {(x, y) : r(n−1)/2 ≤ x < rn/2, |y| < rn/2}, then

P^n(0, S_n^r) = 1 / (n! 2^n).

Proof. Same as for Theorem 1, but integrating from 0 to r/2 along the x-axis, respectively from −r/2 to r/2 along the y-axis, and dividing each occurrence of P by r^2.

Remark 1.1. For the inclined plane model and the x-axis, another description of the n-step transition function is provided by the inversion formula for its characteristic function. This is easily derived by representing the algorithm as a sum of i.i.d. random variables of the form (1), but with the discontinuous distribution z = (1/2) δ_0 + 1_{(0, 1/2]} · du.

2. EA ON THE CORRIDOR

The corridor model is obtained from the inclined plane model by restricting the search to the band A_h ≤ y_h ≤ B_h, h = 1, . . . , k, where the space dimension is now k + 1. As before, x is considered to be the progress axis. A special situation occurs when the EA is located at a corner of the corridor, where the probability of a successful mutation is small, making the algorithm stagnate for a long time. The success probability gets smaller with increasing dimension k, and the EA seems to stay in the corner forever. This is undesirable algorithmic behavior, as pointed out in [10] as an empirical result for multi-normal mutation EAs. We shall prove in what follows that, from a theoretical viewpoint, things are different: the probability for the EA to stagnate in the corner converges to zero exponentially fast, for either uniform or normal mutation. Nevertheless, and this explains the experimental conclusion, the exponential convergence rate approaches one (also exponentially) as k goes to infinity, which may be a reason for not observing the escape of the algorithm in (real) computation time.

For the rest of the paper we confine the analysis to the ability of the EA, under different mutation distributions, to escape the corner of the corridor, assuming it has reached one. Thus we can fix the lower barrier at A_h = 0, h = 1, . . . , k, and disregard the upper barrier B_h. Furthermore, if we assume movements along the x and y axes to be independent, then the contribution in probability of the x-axis mutation will always be a constant factor (1/2); it is thus sufficient to consider the y-axis kernel. The reader is cautioned at this point that the goal of the analysis has shifted from the goal pursued in the previous section: rather than concerning ourselves with the behavior along the progress axis, from now on we are dealing with the behavior along the non-progress axes.

Let us start with the 2-dimensional case (k = 1), and with the EA with uniform mutation inside the square of area 1, introduced in Section 1. We assume the algorithm starts at 0, and aim to describe the probability for the EA to leave the corner, which will be defined as the band {y : 0 ≤ y ≤ 1/2}; see Figure 2.

[Figure 2 shows the 2-dimensional corridor along the progress axis x: the corridor band 0 ≤ y ≤ B, the corner region 0 ≤ y ≤ 1/2 at the lower wall, and forbidden zones outside the corridor.]

Fig. 2. The 2-dimensional corridor model.

The one-step kernel along the y-axis is given, for all y ≥ 0, by

(5)   P(y, A) = (1/2 − y) δ_y(A) + d( (0, y+1/2) ∩ A ),   y ≤ 1/2,
      P(y, A) = d( (y−1/2, y+1/2) ∩ A ),                  y > 1/2,

(6)   P(y, du) = (1/2 − y) δ_y(u) + 1_{(0, y+1/2]} · du,  y ≤ 1/2,
      P(y, du) = 1_{(y−1/2, y+1/2]} · du,                 y > 1/2.

Note that the movement along the y-axis copies the discontinuous transition rule along the x-axis from the inclined plane model, equation (3), but only for y = 0.
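Kernel (5) can be cross-checked by simulating a single y-axis step at the corridor wall. The sketch below is our own illustration; it verifies P(y, [0, 1/2]) = 1 − y at y = 0.2.

```python
import random

def corridor_step(y, rng):
    """One y-axis step of the EA in the corridor: a uniform proposal v is
    accepted only if y + v stays feasible (> 0); otherwise the EA remains
    at y, which produces the atom of mass 1/2 - y in kernel (5)."""
    v = rng.uniform(-0.5, 0.5)
    return y + v if y + v > 0 else y

rng = random.Random(0)
y0, trials = 0.2, 200_000
hits = sum(corridor_step(y0, rng) <= 0.5 for _ in range(trials))
estimate = hits / trials     # kernel (5) gives P(y0, [0, 1/2]) = 1 - y0
```

The estimate combines the atom (v ≤ −0.2, probability 0.3) with the continuous mass of (0, 0.5], probability 0.5, totaling 0.8 = 1 − y0.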

The formulas above make it clear why the 1-dimensional Markov chain, which is the EA on the corridor restricted to the y-axis, cannot be put in the additive form (1). Next, as the point y = 0 acts like a non-absorbing barrier, one would be interested in the behavior of the algorithm in the vicinity of this point, namely in evaluating the successive terms {P^n(0, [0, 1/2])}_{n≥1}. That calculation is tractable, even for high-order terms, yet no recursion becomes apparent. A different conclusion can be drawn if we confine the analysis to the sequence {P̄^n(0, [0, 1/2])}_{n≥1}, where P̄^n(0, [0, 1/2]) is, by definition, the probability of the EA starting at 0 to be inside the y-axis interval [0, 1/2] from iteration 1 to iteration n. Note that P^n(0, [0, 1/2]) ≥ P̄^n(0, [0, 1/2]) for all n, and also

Proposition 3. For all y ∈ (0, 1/2] we have

P̄^1(0, [0, 1/2]) = 1,

(7)   P̄^n(0, [0, 1/2]) = (1/2) P̄^{n−1}(0, [0, 1/2]) + I_{n−1},   n ≥ 2,

I_n = ∫_0^{1/2} P̄^n(y, [0, 1/2]) dy,   n ≥ 1,   P̄^1(y, [0, 1/2]) = 1 − y,

(8)   P̄^n(y, [0, 1/2]) = (1/2 − y)^{n−1} (1 − y) + (1/2 − y)^{n−2} I_1 + · · · + I_{n−1},   n ≥ 2.

Proof. Induction on n. For n = 1 we have P̄^1(y, [0, 1/2]) = P(y, [0, 1/2]) for all y ∈ [0, 1/2], and the result follows from the previous calculation of P. Next, assume the recursion holds for n − 1 and use the Chapman-Kolmogorov equation

P̄^n(0, [0, 1/2]) = ∫ P(0, dy) P̄^{n−1}(y, [0, 1/2]),

then apply (6) with y = 0 to obtain (7). In order to prove (8), apply again Chapman-Kolmogorov, then (6), to get

P̄^n(y, [0, 1/2]) = ∫ P(y, du) P̄^{n−1}(u, [0, 1/2]) = (1/2 − y) P̄^{n−1}(y, [0, 1/2]) + ∫_0^{1/2} P̄^{n−1}(u, [0, 1/2]) du,

which, according to the induction hypothesis, is equal to

(1/2 − y) [ (1/2 − y)^{n−2} (1 − y) + (1/2 − y)^{n−3} I_1 + · · · + I_{n−2} ] + I_{n−1}
  = (1/2 − y)^{n−1} (1 − y) + (1/2 − y)^{n−2} I_1 + · · · + (1/2 − y) I_{n−2} + I_{n−1}.

By iterating numerically the sequence {P̄^n(0, [0, 1/2])}_{n≥1} from equation (7), we observe exponential convergence to zero as the number of iterations increases. This corresponds to exponential convergence to one for the complement, the probability of escape from the corner for the EA with uniform mutation. To make that rigorous, an upper bound for P̄^n(0, [0, 1/2]) is computed next.
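The numerical iteration just mentioned can be sketched as follows. This is our own code, and the grid/trapezoid discretization is our assumption: starting from f_1(y) = 1 − y, we iterate f_n(y) = (1/2 − y) f_{n−1}(y) + I_{n−1} with I_n = ∫_0^{1/2} f_n(y) dy, and watch f_n(0) = P̄^n(0, [0, 1/2]) decay.

```python
# Discretize [0, 1/2] and iterate the recursion; I_n is computed with the
# composite trapezoidal rule (accurate to O(h^2) for these smooth f_n).
N = 1001
h = 0.5 / (N - 1)
ys = [i * h for i in range(N)]
f = [1.0 - y for y in ys]            # f_1(y) = 1 - y

def trapezoid(vals, step):
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

corner_prob = [f[0]]                 # f_n(0) for n = 1, 2, ...
for n in range(2, 21):
    I = trapezoid(f, h)
    f = [(0.5 - y) * fy + I for y, fy in zip(ys, f)]
    corner_prob.append(f[0])
```

The sequence starts 1, 0.875, 0.729, . . . and decreases roughly geometrically, in line with the exponential bound derived below from the matrix recursion.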

Lemma 5. Let f_n : [0, 1/2] → [0, 1] be such that f_1(y) = 1 − y,

(9)   f_n(y) = (1/2 − y) f_{n−1}(y) + I_{n−1},   n ≥ 2,

I_n = ∫_0^{1/2} f_n(y) dy,   n ≥ 1.

Then {f_n(0)}_{n≥1} converges to zero as n → ∞.

Proof. First note that f_n is monotonically decreasing and convex on [0, 1/2] for all n. Products and sums of monotonically decreasing positive functions are monotonically decreasing. As for convexity, all terms in the expanded form (8), which also holds for f_n(y), have factors of the form (1/2 − y)^i, so their second derivatives are positive on the given interval.

It follows from convexity that the integral defining I_n is less than the area of a certain trapezoid: I_n ≤ [f_n(0) + f_n(1/2)]/4. From (9) we have f_n(1/2) = I_{n−1} and also f_n(0) = (1/2) f_{n−1}(0) + I_{n−1}, so we get

(10)   I_n ≤ (1/8) f_{n−1}(0) + (1/2) I_{n−1}.

Given the positivity of all terms in (9), it is clear that if we replace (10) with an equality, all functions f_n will increase, and in particular so will f_n(0) and I_n:

(11)   I_n = (1/8) f_{n−1}(0) + (1/2) I_{n−1}.

The advantage is that (9) and (11) now constitute a purely numerical double sequence, which can be solved in closed form. Technically, we should rename the variables, since the sequence defined by (9) and (11) is a different one, bounding the original f_n(0) and I_n. Let us call its terms a_n and b_n, respectively.

Define the vector x_n = [a_n, b_n]^T and the matrix

A = ( 1/2  1
      1/8  1/2 ).

The numerical recursion is now written simply x_n = A x_{n−1}. Under the initial condition x_1 = [1, 3/8]^T and given the eigenvalues of A, σ_1 ≈ 0.853 and σ_2 ≈ 0.146, the Jordan decomposition yields

a_n = 1.030 σ_1^{n−1} − 0.030 σ_2^{n−1},   b_n = 0.364 σ_1^{n−1} + 0.010 σ_2^{n−1}.

Conclusion: a_n above is an upper bound on f_n(0), and it tends to zero exponentially as n → ∞.
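The bound can be reproduced numerically. In our sketch below the closed-form coefficients are recomputed from the eigendecomposition of A rather than copied from the text:

```python
import math

# Recursion a_n = a_{n-1}/2 + b_{n-1}, b_n = a_{n-1}/8 + b_{n-1}/2,
# i.e. x_n = A x_{n-1} with A = [[1/2, 1], [1/8, 1/2]] and x_1 = [1, 3/8].
sigma1 = 0.5 + math.sqrt(0.125)      # eigenvalue ~0.8536
sigma2 = 0.5 - math.sqrt(0.125)      # eigenvalue ~0.1464

# Expanding x_1 in the eigenbasis (1, +-sqrt(1/8)) gives the closed form.
p = 0.5 + 0.1875 / math.sqrt(0.125)  # ~1.0303
q = 1.0 - p                          # ~-0.0303

def a_closed(n):
    return p * sigma1 ** (n - 1) + q * sigma2 ** (n - 1)

a, b = 1.0, 0.375
iterated = [a]
for _ in range(19):
    a, b = 0.5 * a + b, 0.125 * a + 0.5 * b
    iterated.append(a)               # iterated[n-1] holds a_n
```

The iterated values and the closed form agree, and a_n decays at the rate σ_1 ≈ 0.853, the dominant eigenvalue.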

The main result is now a straightforward implication of Lemma 5 and Proposition 3, by identifying f_n(y) = P̄^n(y, [0, 1/2]) for all n.

Theorem 2. The probability for the EA with uniform mutation to stagnate at the corner of the 2-dimensional corridor from iteration 1 to n converges to zero exponentially as n → ∞.

2.1. THE MULTI-DIMENSIONAL CORRIDOR

The escape-from-the-corner problem can be approached in a similar manner for the multi-dimensional corridor. Consider the most general case: a fixed positive integer k, a (k+1)-dimensional corridor, and the associated k-dimensional kernel P̄^n_k((y_1, . . . , y_k), [0, 1/2]^k), denoted f^k_n(y_1, . . . , y_k) in what follows. A derivation similar to the 2-dimensional case leads to

Lemma 6. Let f^k_n : [0, 1/2]^k → [0, 1] be such that

f^k_1(y_1, . . . , y_k) = 1 − ∏_{i=1}^k (1/2 + y_i) + 1/2^k,

(12)   f^k_n(y_1, . . . , y_k) = [ 1 − ∏_{i=1}^k (1/2 + y_i) ] f^k_{n−1}(y_1, . . . , y_k) + I_{n−1},   n ≥ 2,

(13)   I_n = ∫_0^{1/2} · · · ∫_0^{1/2} f^k_n(y_1, . . . , y_k) dy_1 · · · dy_k,   n ≥ 1.

Then {f^k_n(0, . . . , 0)}_{n≥1} converges to zero exponentially as n → ∞.

The main result follows now by identifying f^k_n(y) = P̄^n_k((y), [0, 1/2]^k) for all n ≥ 1 and all (y_1, . . . , y_k) ∈ [0, 1/2]^k. The verification of Lemma 6 on the transition probabilities P̄^n_k, the analogue of Proposition 3, is a straightforward consequence of the Chapman-Kolmogorov equation and of the k-dimensional version of (5). For simplicity, k + 1 is denoted by N.


Theorem 3. The probability for the EA with uniform mutation to stagnate at the corner of the N-dimensional corridor from iterations 1 to n converges to zero exponentially fast as n → ∞, for all N ≥ 3.

One should note that, through more detailed geometrical computations, tighter bounds can be derived. This is made clear for N = 3 by the following result (with a proof similar to that of Lemma 5).

Proposition 4. For the 3-dimensional case, the probability for the EA with uniform mutation to stagnate at the corner from iterations 1 to n is bounded from above by

a_n = 0.776 σ_1^{n−1} + 0.200 σ_2^{n−1} + 0.023 σ_3^{n−1},   where σ_1 = 0.478, σ_2 = 0.250, σ_3 = 0.021.

2.2. EA WITH NORMAL MUTATION

Normal mutation is extensively used in EA applications, being also a workhorse for empirical testing and statistical study [10]. For the present analysis, passing from uniform to normal mutation means passing from a kernel with compact support to one which is spread over the whole space. Together with elitism, this makes the EA fulfill both sufficient conditions for global convergence [8], [9].

We keep the corridor model and analyze the escape-from-the-corner problem, but give a unified treatment of the 2- and (k+1)-dimensional cases. As remarked in Section 2, we can omit the x-component of the transition function and concentrate only on the k non-progress directions, which are the axes defining the corridor. For convenience, we replace the upper bound 1/2 by 1 in the corner definition.

Let us denote by L(u, y) the density of the k-variate normal distribution with mean y and covariance matrix C:

L(u, y) = (1 / √((2π)^k det(C))) e^{−(1/2)(u−y)^T C^{−1} (u−y)}.

The one-step k-dimensional kernel associated with the EA with normal mutation is defined, for (y_1, . . . , y_k) ≥ 0 and A ∈ B(R^k), by

(14)   P(y, A) = [ 1 − ∫_{(0,∞)^k} L(u, y) du ] δ_y(A) + ∫_{(0,∞)^k ∩ A} L(u, y) du,

       P(y, du) = [ 1 − ∫_{(0,∞)^k} L(u, y) du ] δ_y(u) + L(u, y) 1_{(0,∞)^k} du.

As before, we confine the analysis to the sequence {P̄^n(0, [0, 1])}_{n≥1}, where P̄^n(0, [0, 1]) now stands for the probability of the EA with normal mutation starting at 0 to be inside the y-axis interval [0, 1] from iterations 1 to n. Then we have the following equivalent of Lemma 6.

Lemma 7. Let f^k_n : [0, 1]^k → [0, 1] be such that

f^k_1(y) = 1 − ∫_{(1,∞)^k} L(u, y) du,

f^k_n(y) = [ 1 − ∫_{(0,∞)^k} L(u, y) du ] f^k_{n−1}(y) + I_{n−1}(y),   n ≥ 2,

I_n(y) = ∫_{(0,1)^k} L(u, y) f^k_n(u) du,   n ≥ 1.

Then {f^k_n(0)}_{n≥1} decreases exponentially to zero as n → ∞.

Proof. For all n, denote by f^MAX_n the maximal value of f_n when the components of y take values independently between 0 and 1, and omit the index k from f. Then

(15)   f_n(y) ≤ f^MAX_{n−1} [ 1 − ∫_{(0,∞)^k} L(u, y) du ] + I_{n−1}(y),

(16)   I_{n−1}(y) ≤ f^MAX_{n−1} ∫_{(0,1)^k} L(u, y) du.

Substitute (16) into (15) to get

(17)   f_n(y) ≤ f^MAX_{n−1} [ 1 − ∫_{(0,∞)^k} L(u, y) du + ∫_{(0,1)^k} L(u, y) du ].

Denote by α_k(y) the term in the large brackets above. This is a continuous function of y with compact domain (the unit cube). Hence, by a well-known result in analysis, it attains both its minimum and its maximum at points of the domain. Since 0 < α_k(y) < 1 at any point y of the domain, the same inequality holds for both its minimum and its maximum, denoted α^MIN_k and α^MAX_k, respectively.

For numerical evaluations, we can also derive computable versions of these two bounds, as follows. We have

(18)   1 − ∫_{(−1,∞)^k} L(u, 0) du + ∫_{(0,1)^k} L(u, 0) du ≤ α_k(y) ≤ 1 − ∫_{(0,∞)^k} L(u, 0) du + ∫_{(−1/2,1/2)^k} L(u, 0) du.

Both inequalities follow easily by induction on k, using at each induction step the one-dimensional version of the same inequality for the respective marginal of the k-variate normal. In compact form we have

(19)   α^min_k ≤ α_k(y) ≤ α^max_k,

with the obvious identifications. The new bounds are independent of y, and α^max_k < 1 for all k ≥ 1. Note that the explicit bounds just obtained are in general less tight than the previous ones, α^MIN_k and α^MAX_k.

For the final step of the proof, either upper bound works, since both are strictly sub-unitary. Let f_n(y) attain its maximum in (17). From (19) we obtain

f^MAX_n ≤ α^max_k f^MAX_{n−1} ≤ · · · ≤ (α^max_k)^{n−1} f^MAX_1,

which tends to 0 as n → ∞. Thus, f_n(0) does the same.

As with uniform mutation, the main result follows by identifying f^k_n(y) = P̄^n_k((y), [0, 1]^k) for all n ≥ 1 and all y ∈ [0, 1]^k. The verification of Lemma 7 on P̄^n_k is a straightforward consequence of Chapman-Kolmogorov and (14). Again, k + 1 is denoted by N.

Theorem 4. The probability for the EA with normal mutation to stagnate at the corner of the N-dimensional corridor from iterations 1 to n converges to zero exponentially fast as n → ∞, for all N ≥ 2.

Remark 4.1. For the case of uncorrelated mutations (diagonal covariance matrix C), the bases responsible for the EA's convergence, α^min_k and α^max_k from the proof of Lemma 7, both converge to 1 as k → ∞. Indeed, in this case the domain integrals in (18) can be easily computed from their univariate forms, giving

α^min_k = 1 − 0.84^k + 0.34^k ≤ α_k(y) ≤ 1 − 0.5^k + 0.38^k = α^max_k.

Obviously, both bounds tend to 1 exponentially as k → ∞, and this explains the stagnation at the corner reported in [10] for certain corridor problems on spaces of large dimension k.
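For uncorrelated unit-variance normal mutation (our assumption, for concreteness), the one-dimensional masses behind these bounds can be recomputed from the standard normal CDF; the sketch below is our own check of the constants 0.84, 0.34, 0.5, 0.38.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# One-dimensional masses used in (18), for an N(0, 1) marginal:
p_m1 = 1.0 - Phi(-1.0)            # mass of (-1, inf)    ~0.841
p_01 = Phi(1.0) - Phi(0.0)        # mass of (0, 1)       ~0.341
p_0i = 0.5                        # mass of (0, inf)
p_half = Phi(0.5) - Phi(-0.5)     # mass of (-1/2, 1/2)  ~0.383

def alpha_min(k):
    return 1.0 - p_m1 ** k + p_01 ** k    # ~ 1 - 0.84^k + 0.34^k

def alpha_max(k):
    return 1.0 - p_0i ** k + p_half ** k  # ~ 1 - 0.5^k + 0.38^k
```

Both bounds approach 1 exponentially in k, matching the stagnation effect described in the remark.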


3. CONCLUSIONS

In continuous search spaces, significant insight into the behavior of a probabilistic algorithm can be gained by analyzing the associated transition function. This was done in this paper for the one-individual evolutionary algorithm with elitist selection and two types of mutation. First, uniform mutation was used to derive the probability of maximal success in n iterations on the inclined plane model. Next, the search was restricted to a corridor, and the algorithm's behavior at one of the corners was analyzed in detail, under uniform/normal mutation, in two-/multi-dimensional space. The conclusion is that the probability of escaping the corner tends to one as the number of iterations goes to infinity, and the convergence is exponentially fast.

Taking the analysis to more complicated landscapes and/or different mutation distributions would be an important step ahead. An even more challenging task would be to characterize the kernel of multi-individual evolutionary algorithms, thus allowing for interactions within the current population.

Acknowledgements. This work was done while the first author was visiting Dortmund University. Financial support from the Collaborative Research Center Computational Intelligence (SFB 531) and scientific support from the Chair of Computer Science XI are gratefully acknowledged.

REFERENCES

[1] A. Agapie, Theoretical analysis of mutation-adaptive evolutionary algorithms. Evolutionary Comput. 9 (2001), 127-146.

[2] H.-G. Beyer, The Theory of Evolution Strategies. Springer, Heidelberg, 2001.

[3] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intelligence 6 (1984), 721-741.

[4] H. Haario and E. Saksman, Simulated annealing process in general state space. Adv. in Appl. Probab. 23 (1991), 866-893.

[5] O. Häggström, Finite Markov Chains and Algorithmic Applications. Cambridge Univ. Press, Cambridge, 2002.

[6] E. Nummelin, General Irreducible Markov Chains and Non-negative Operators. Cambridge Univ. Press, Cambridge, 1984.

[7] G. Rappl, On linear convergence of a class of random search algorithms. Z. Angew. Math. Mech. 69 (1989), 37-45.

[8] G. Rudolph, Convergence of evolutionary algorithms in general search spaces. In: Proc. 3rd IEEE Conf. on Evolutionary Computation, Piscataway, NJ, pp. 50-54. IEEE Press, 1996.

[9] G. Rudolph, Convergence Properties of Evolutionary Algorithms. Kovač, Hamburg, 1997.

[10] H.-P. Schwefel, Evolution and Optimum Seeking. Wiley, New York, 1995.

[11] M.T. Wasan, Stochastic Approximation. Cambridge Univ. Press, Cambridge, 1969.

Received 15 October 2005 Romanian Academy

Institute of Mathematical Statistics and Applied Mathematics Calea 13 Septembrie nr. 13 050711 Bucharest 5, Romania

agapie@rdslink.ro and

Computer Science, Tarleton State University Box T-0930 Stephenville, TX 76402, USA

agapie@tarleton.edu
