ON SOME THEORETICAL PROPERTIES OF (1 + 1) EVOLUTIONARY ALGORITHMS


ALEXANDRU AGAPIE and MIRCEA AGAPIE

Evolutionary algorithms in general search spaces can be seen as continuous-parameter Markov chains. We define and estimate transition functions for the (1+1) evolutionary algorithm on the inclined plane and corridor models. In the first case, the probability of maximal success in n iterations is derived in closed form, under uniform mutation. For the second case, the algorithm is proved to escape the corner of the corridor exponentially fast, for both uniform and normal mutation.

AMS 2000 Subject Classification: 60J27, 68W20.

Key words: evolutionary algorithm, Markov transition function.

INTRODUCTION

A good introduction to the theory of continuous evolutionary algorithms (EAs) could come from the analysis of simulated annealing [4]. Yet, two major impediments make this theory inappropriate for extrapolation. First, convergence of simulated annealing is built on particular properties of the Boltzmann distribution as the only selection operator; this is not the case with EAs. Second, following the argumentation from [5], convergence of simulated annealing involves a double chain procedure (convergence of a sequence of homogeneous Markov chains to their limit distributions, then convergence of the sequence of limit distributions to a measure which carries only the optimal points) which, when transferred to convergence in probability for a practical algorithm, corresponds to very slow annealing schedules [3], involving impractically huge computation times.

The Robbins-Monro family of algorithms, usually referred to as stochastic approximation [11], and the martingale approach to random search performed in [7], can be criticized for assuming a certain positive success rate at each algorithmic iteration. That corresponds to local, rather than global, behavior. The same applies to the evolution strategy theory developed in [2].

REV. ROUMAINE MATH. PURES APPL., 52 (2007), 3, 287-303

In contrast, the present paper analyzes continuous EAs seen as stochastic processes in a general search space. The first step in this direction was taken by [8], which introduced the formalization and presented the first global convergence result. According to that result, two sufficient conditions for the convergence of an EA acting on a continuous space are: (i) elitism, that is, the best state found so far cannot be lost from one iteration to another, and (ii) a positively bounded probability of reaching the target zone in one step, from any point of the space. The original approach of Rudolph [8] was also extended to adaptive EAs, modeled by multi-order Markov chains [1]. Still, from a practical viewpoint, both analyses share the same deficit: they are too general; the two conditions above do not provide the EA user with important information, such as convergence rates or computation times for particular algorithms.

The present paper is a first attempt to fill this gap, by considering examples of continuous EAs with specific mutation operators, confronted with different optimization tasks. Although a complete characterization of the algorithmic behavior is cumbersome, even for the simplest cases, we present convergence problems that can be solved within the proposed framework.

To start, let 0 = (0, . . . , 0) be the initial point of the algorithm and F : R^k → R_+ be the fitness (objective) function to be maximized. Let {z_i}_{i≥1} be a sequence of i.i.d. random variables, uniformly distributed on the k-dimensional cube of volume one centered at 0. Their common measure will be denoted z, and referred to as the mutation distribution. The simplest EA consists of only one individual, which transits the space under the action of successive mutations z_i. It can be seen as a stochastic process if we set

(1)   X_n = 0 + Σ_{i=1}^n z_i = X_{n−1} + z_n,   n ≥ 1.

This process is called the (1,1) EA with uniform mutation z. As the next state of the algorithm depends only on its current state, X_n defines a homogeneous Markov chain. The comma indicates that there is no selection between the current individual (the parent) and the next one (the offspring): the offspring replaces the parent, no matter what their fitness values are. This is exactly the case of the random walk. But if a replacement occurs only when the offspring is better than the parent (elitist selection), then we have a proper evolutionary algorithm, denoted (1+1) EA and given for all n ≥ 1 by

X_n = (X_{n−1} + z_n) 1{F(X_{n−1}+z_n) ≥ F(X_{n−1})} + X_{n−1} 1{F(X_{n−1}+z_n) < F(X_{n−1})}.

Definition 1 ([6]). A function P : R^k × B(R^k) → [0, 1] is said to be a transition function (kernel) if P(w, ·) is a probability measure on B(R^k) for all w ∈ R^k, and P(·, A) is a random variable on R^k for all A ∈ B(R^k).
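As a concrete illustration, here is a minimal Python sketch of the (1+1) EA update rule above, with uniform mutation on the centered unit square; the function and variable names are ours, not from the paper.

```python
import random

def one_plus_one_ea(fitness, x0, steps, seed=0):
    """(1+1) EA: propose x + z with z uniform on the unit cube centered at 0;
    keep the offspring only if its fitness is at least the parent's (elitism)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        offspring = [xi + rng.uniform(-0.5, 0.5) for xi in x]
        if fitness(offspring) >= fitness(x):
            x = offspring
    return x

# Inclined plane model F(x, y) = x (Section 1): every step to the right is
# accepted and every step to the left is rejected, so x never decreases.
best = one_plus_one_ea(lambda p: p[0], (0.0, 0.0), 1000)
```

With the elitist indicator written as an `if`, the rejected-offspring branch is exactly the atom that makes the kernel discontinuous, as discussed in Section 1.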

For the optimization task, we first consider the inclined plane model, in Section 1. This model describes the simplest dependence between the objective function and the variables of the search space. Under the assumption of uniform mutation, we derive a closed-form probability for the 2-dimensional EA to reach a certain region of the plane in n iterations. In Section 2 we restrict the search space to a corridor and analyze the behavior of the EA at the corner. The algorithm is proved to escape the corner of the 2-dimensional corridor exponentially fast. The result is generalized in Subsection 2.1 to the k-dimensional search space, while in Subsection 2.2 the same behavior is demonstrated for the algorithm with normal mutation.

1. EA ON THE INCLINED PLANE

In this section we restrict the search to the real plane and consider the (1+1) EA with square uniform mutation starting at zero. For the fitness function, we orient the coordinate system so that the plane slopes in the direction of the x-axis, i.e., x = +∞ corresponds to the optimum [10]. For this reason, the x-axis is also called the progress axis in this paper. The simplest example of a fitness function satisfying this requirement is F(x, y) = x.

Departing from the traditional random walk, the kernel of the (1+1) EA is no longer continuous with respect to the Lebesgue measure d. This is due to elitism, which makes the associated probability measure have an atom at zero (a discontinuity of the distribution function). Namely, the algorithm is allowed to move only to the right, and any unsuccessful mutation (that is, to the left) makes the EA stagnate in its current state. From this point on we omit the '(1+1)' tag of the EA, since this is the only type of algorithm we analyze.

The associated one-step kernel can be described as the sum of two measures, one singular (Dirac) and one continuous (note that here A ∈ B(R^2)):

(2)   P((x, y), A) = (1/2) δ_{(x,y)}(A) + 1_{(x, x+1/2) × (y−1/2, y+1/2)} · d(A)
                   = (1/2) δ_{(x,y)}(A) + d( ((x, x+1/2) × (y−1/2, y+1/2)) ∩ A ).

We shall frequently use the 1-dimensional version of (2), corresponding to the progress along the x-axis, and also its density form

(3)   P(x, A) = (1/2) δ_x(A) + d( (x, x+1/2] ∩ A ),   A ∈ B(R),

      P(x, du) = (1/2) δ_x(u) + 1_{(x, x+1/2]} · du,

where the first term in (3) carries only the null set {x}.

For all fixed x, y ∈ R, formulas (2)-(3) define probability measures with respect to A. Yet, it is not obvious that P(·, A) is a measurable function for every Borel set A. This is proved in Lemma 1 below. For simplicity we consider just one dimension and a set A of the form A = [a, b). In two dimensions, the movements are independent with respect to the x and y axes, which makes the contribution of the y-term multiplicative.

Lemma 1. Let [a, b) be a fixed interval of the real line. The restriction to the x-axis of the kernel corresponding to the EA with square uniform mutation is a measurable function given by

(4)   P(x, [a, b)) =
        0,                          x < a − 1/2,
        min{x + 1/2, b} − a,        x ∈ [a − 1/2, a),
        1/2 + min{x + 1/2, b} − x,  x ∈ [a, b),
        0,                          x ≥ b.

Proof. On each branch of (4) we have the probability for the one-step EA to move from x into [a, b), calculated as the Lebesgue measure of the intersection between [a, b) and the 1/2-radius interval centered at x; see (3). The only case to watch out for is x ∈ [a, b), where the non-Lebesgue term 1/2 is added, as the probability of staying at x due to elitism. Since each branch function is measurable with respect to x, so is P(x, [a, b)). And, because B(R) is generated by the intervals [a, b), P(·, A) is measurable for any Borel set A.
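Formula (4) can be checked numerically. The sketch below is our own illustration, not from the paper: it implements the four branches and compares one of them against a Monte Carlo estimate of a single EA step on the line with fitness F(x) = x.

```python
import random

def kernel_interval(x, a, b):
    """One-step probability P(x, [a, b)) of formula (4): Lebesgue measure of
    [a, b) ∩ (x, x + 1/2], plus the elitist atom of mass 1/2 when x ∈ [a, b)."""
    if x < a - 0.5:
        return 0.0
    if x < a:
        return min(x + 0.5, b) - a
    if x < b:
        return 0.5 + min(x + 0.5, b) - x
    return 0.0

# Monte Carlo cross-check: one EA step from x0; a move u <= 0 lowers the
# fitness F(x) = x and is therefore rejected (the EA stays at x0).
rng = random.Random(0)
x0, a0, b0, trials = 0.3, 0.25, 0.6, 200_000
hits = 0
for _ in range(trials):
    u = rng.uniform(-0.5, 0.5)
    y = x0 + u if u > 0 else x0
    hits += a0 <= y < b0
estimate = hits / trials
```

For x0 = 0.3 and [a, b) = [0.25, 0.6), the third branch of (4) gives 1/2 + 0.6 − 0.3 = 0.8, which the simulation reproduces.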

The following result is elementary.

Lemma 2. Let A ∈ B(R) be a set with d(A) = 0. Then

∫ P(x, A) dx = 0.

Now we consider the n-step kernel P^n. Assume as usual that the algorithm starts at zero, and furthermore restrict the search to the real line. Applying the Chapman-Kolmogorov formula we get

P^2(0, A) = ∫ P(0, dx) P(x, A) = (1/2) P(0, A) + ∫_{0+}^{1/2} P(x, A) dx,   A ∈ B(R).

The only discontinuity of this kernel is at zero. This is true not only for P^2 but for any power P^n with n ≥ 1, as stated in

Proposition 1. Let P be the kernel of a 1-dimensional EA and A ∈ B((0, ∞)) with d(A) = 0. Then for all n ≥ 1 we have

(i)   P^n(0, A) = 0,

(ii)  ∫ P^n(x, A) dx = 0.

Proof. The case n = 1 corresponds to definition (3) for (i), respectively to Lemma 2 for (ii). Next, applying Chapman-Kolmogorov to P^n and assuming by induction that the statement holds for n − 1, we get

P^n(0, A) = ∫ P(0, dx) P^{n−1}(x, A) = (1/2) P^{n−1}(0, A) + ∫_{0+}^{1/2} P^{n−1}(x, A) dx ≤ 0 + ∫ P^{n−1}(x, A) dx = 0.

We now return to the inclined plane model. The goal is to derive an expression for the n-step progress of the EA along the x-axis, by calculating its probability to reach in n iterations the rectangle S_n (see Figure 1):

S_n = { (x, y) : (n−1)/2 ≤ x < n/2, |y| < n/2 },   n ≥ 1.

[Figure 1 shows the rectangles S_1, S_2, S_3, . . . , S_n along the progress axis x, each of width 1/2: S_n covers (n−1)/2 ≤ x < n/2 with |y| < n/2.]

Fig. 1. Regions of progress for the EA on the inclined plane model.

To this end, we start with an intuitive result, which can be stated informally as follows: the kernel P is invariant to translations along the progress axis.

Formally, we have

Lemma 3. Let S_n^1 = [(n−1)/2, n/2) be the projection of S_n on the x-axis for n ≥ 1. Let n ≥ 1, 1 ≤ k < n, and m ≥ 1. Then for all x ≥ k/2 we have

P^m(x, S_n^1) = P^m( x − k/2, S_{n−k}^1 ).

Proof. Fix n and k < n, and proceed by induction on m. For m = 1, (4) yields

P(x, S_n^1) = P( x, [(n−1)/2, n/2) ) =
    0,                     x ≤ (n−2)/2,
    x + 1/2 − (n−1)/2,     x ∈ ((n−2)/2, (n−1)/2),
    1/2 + n/2 − x,         x ∈ [(n−1)/2, n/2),
    0,                     x ≥ n/2,
=
    0,                               x − k/2 ≤ (n−k−2)/2,
    (x − k/2) + 1/2 − (n−k−1)/2,     x − k/2 ∈ ((n−k−2)/2, (n−k−1)/2),
    1/2 + (n−k)/2 − (x − k/2),       x − k/2 ∈ [(n−k−1)/2, (n−k)/2),
    0,                               x − k/2 ≥ (n−k)/2,
= P( x − k/2, [(n−k−1)/2, (n−k)/2) ) = P( x − k/2, S_{n−k}^1 ).

Assume now that the property holds up to m − 1, and calculate

P^m(x, S_n^1) = ∫ P(x, du) P^{m−1}(u, S_n^1) = ∫ P(x, du) P^{m−1}( u − k/2, S_{n−k}^1 )
             = ∫ P( x − k/2, du ) P^{m−1}( u − k/2, S_{n−k}^1 ) = P^m( x − k/2, S_{n−k}^1 ),

where the last equalities come from the induction hypothesis, cases m − 1 and 1.

Actually, Lemma 3 holds for any power of P and any translation rate, but for the purpose of this analysis the form provided is sufficient. The following result is essential for characterizing the n-step kernels.

Proposition 2. Let x ∈ S_1^1. For all n ≥ 1 we have

P^n(x, S_{n+1}^1) = x^n / n!.

Proof. Induction on n. For n = 1, apply (4) with [a, b) = [1/2, 1):

P( x, [1/2, 1) ) = x + 1/2 − 1/2 = x.

Next, assume the statement holds for n − 1 and derive

P^n(x, S_{n+1}^1) = ∫ P(x, du) P^{n−1}(u, S_{n+1}^1) = (1/2) P^{n−1}(x, S_{n+1}^1) + ∫_x^{x+1/2} P^{n−1}(u, S_{n+1}^1) du,

where the first term is zero because S_{n+1}^1 is not accessible from x < 1/2. Therefore,

P^n(x, S_{n+1}^1) = ∫_x^{1/2} P^{n−1}(u, S_{n+1}^1) du + ∫_{1/2}^{x+1/2} P^{n−1}(u, S_{n+1}^1) du,

where the first integral is zero. Apply Lemma 3, change the variable u − 1/2 = v, and use the induction hypothesis to obtain the conclusion.
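Proposition 2 is easy to check by simulation. The sketch below is our own illustration (names are ours): it estimates P^2(x, S_3^1) for x = 0.3 and compares it with x^2/2! = 0.045.

```python
import random

def ea_line(x, n, rng):
    """n steps of the 1-D EA with fitness F(x) = x: uniform proposals in
    (-1/2, 1/2); moves to the left are rejected by elitism."""
    for _ in range(n):
        u = rng.uniform(-0.5, 0.5)
        if u > 0:
            x += u
    return x

rng = random.Random(0)
x0, n, trials = 0.3, 2, 400_000
# S_{n+1}^1 = [n/2, (n+1)/2) = [1, 1.5) for n = 2
hits = sum(1.0 <= ea_line(x0, n, rng) < 1.5 for _ in range(trials))
estimate = hits / trials
exact = x0 ** n / 2        # Proposition 2: x^n / n! with n = 2
```

Geometrically, both mutations must be accepted with u1 + u2 ≥ 0.7, a triangle of area 0.3^2/2 inside the unit square of proposals, in agreement with the closed form.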

The following result opens the path to the 2-dimensional case.

Lemma 4. Let k ≥ 0 and (x, y) ∈ S_k. Then for all n ≥ 1 we have

P^n((x, y), S_{n+k}) = P^n(x, S_{n+k}^1).

Proof. Induction on n. For n = 1, equation (2) yields

P((x, y), S_{k+1}) = (1/2) δ_{(x,y)}(S_{k+1}) + d( ((x, x+1/2) × (y−1/2, y+1/2)) ∩ S_{k+1} )
                  = (1/2) δ_x(S_{k+1}^1) + 1 · d( (x, x+1/2) ∩ S_{k+1}^1 ) = P(x, S_{k+1}^1).

Assuming that the equality holds for n − 1, we have

P^n((x, y), S_{n+k}) = ∫ P((x, y), dw) P^{n−1}(w, S_{n+k})
  = (1/2) P^{n−1}((x, y), S_{n+k}) + ∫_x^{x+1/2} du ∫_{y−1/2}^{y+1/2} P^{n−1}((u, v), S_{n+k}) dv
  = (1/2) P^{n−1}(x, S_{n+k}^1) + ∫_x^{x+1/2} du ∫_{y−1/2}^{y+1/2} P^{n−1}(u, S_{n+k}^1) dv
  = (1/2) P^{n−1}(x, S_{n+k}^1) + ∫_x^{x+1/2} P^{n−1}(u, S_{n+k}^1) du
  = ∫ P(x, du) P^{n−1}(u, S_{n+k}^1) = P^n(x, S_{n+k}^1).

The main result of this section provides a closed form for P^n(0, S_n), the probability for the uniform-mutation EA to be in the region S_n of the plane after n iterations.

Theorem 1. For all n ≥ 1 we have

P^n(0, S_n) = 1 / (n! 2^n).

Proof. According to Lemma 4, it is sufficient to prove the 1-dimensional result. We have

P^n(0, S_n^1) = ∫ P(0, dx) P^{n−1}(x, S_n^1) = ∫_{0+}^{1/2} P(0, dx) P^{n−1}(x, S_n^1)
             = ∫_{0+}^{1/2} P(0, dx) x^{n−1}/(n−1)! = ∫_0^{1/2} x^{n−1}/(n−1)! dx = 1 / (n! 2^n),

where in the last line we used Proposition 2 and the fact that S_n^1 is not attainable from 0 in n − 1 iterations.
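Theorem 1 can also be verified by direct simulation of the two-dimensional EA. The sketch below is our own check: it estimates P^2(0, S_2) and compares it with 1/(2! · 2^2) = 1/8. Since at most n accepted steps of size below 1/2 occur, |y| < n/2 holds automatically and only the x-condition needs testing.

```python
import math
import random

def ea_plane(n, rng):
    """n steps of the (1+1) EA on the inclined plane F(x, y) = x, starting
    at (0, 0), with uniform mutation on the centered unit square."""
    x = y = 0.0
    for _ in range(n):
        dx, dy = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
        if dx > 0:                    # accepted iff the fitness x increases
            x, y = x + dx, y + dy
    return x, y

rng = random.Random(0)
n, trials = 2, 400_000
hits = sum((n - 1) / 2 <= ea_plane(n, rng)[0] < n / 2 for _ in range(trials))
estimate = hits / trials
exact = 1 / (math.factorial(n) * 2 ** n)    # Theorem 1: 1/(n! 2^n)
```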

Theorem 1 can be easily generalized to the case of an EA with uniform mutation inside a square of area r^2.

Corollary 1.1. If mutation is uniformly distributed inside the square of area r^2, and S_n^r = {(x, y) : r(n−1)/2 ≤ x < rn/2, |y| < rn/2}, then

P^n(0, S_n^r) = 1 / (n! 2^n).

Proof. Same as for Theorem 1, but integrating from 0 to r/2 along the x-axis, respectively from −r/2 to r/2 along the y-axis, and dividing each occurrence of P by r^2.

Remark 1.1. For the inclined plane model and the x-axis, another description of the n-step transition function is provided by the inversion formula for its characteristic function. This is easily derived by representing the algorithm as a sum of i.i.d. random variables of the form (1), but with the discontinuous distribution z = (1/2) δ_0 + 1_{(0, 1/2]} · du.

2. EA ON THE CORRIDOR

The corridor model is obtained from the inclined plane model by restricting the search to the band A_h ≤ y_h ≤ B_h, h = 1, . . . , k, where the space dimension is now k + 1. As before, x is considered to be the progress axis. A special situation occurs when the EA is located at a corner of the corridor, where the probability of a successful mutation is small, making the algorithm stagnate for a long time. The success probability gets smaller with increasing dimension k, and the EA seems to stay in the corner forever. This is undesirable algorithmic behavior, as pointed out in [10] as an empirical result for multi-normal mutation EAs. We shall prove in what follows that, from a theoretical viewpoint, things are different: the probability for the EA to stagnate in the corner converges to zero exponentially fast, for either uniform or normal mutation. Nevertheless, and this explains the experimental conclusion, the exponential convergence rate approaches one (also exponentially) as k goes to infinity, which may be a reason for not observing the escape of the algorithm in (real) computation time.

For the rest of the paper we confine the analysis to the ability of the EA, under different mutation distributions, to escape the corner of the corridor, assuming it has reached one. Thus we can fix the lower barrier at A_h = 0, h = 1, . . . , k, and disregard the upper barrier B_h. Furthermore, if we assume movements along the x and y axes to be independent, then the contribution in probability of the x-axis mutation will always be a constant factor (1/2); it is thus sufficient to consider the y-axis kernel. The reader is cautioned at this point that the goal of the analysis has shifted from the goal pursued in the previous section: rather than concerning ourselves with the behavior along the progress axis, from now on we are dealing with the behavior along the non-progress axes.

Let us start with the 2-dimensional case (k = 1), and with the EA with uniform mutation inside the square of area 1, introduced in Section 1. We assume the algorithm starts at 0, and aim to describe the probability for the EA to leave the corner, which will be defined as the band {y : 0 ≤ y ≤ 1/2}; see Figure 2.

[Figure 2 shows the 2-dimensional corridor along the progress axis x: the corridor band 0 ≤ y ≤ B, the corner region 0 ≤ y ≤ 1/2 at the lower wall, and forbidden zones outside the corridor.]

Fig. 2. The 2-dimensional corridor model.

The one-step kernel along the y-axis is given, for all y ≥ 0, by

(5)   P(y, A) = (1/2 − y) δ_y(A) + d( (0, y+1/2) ∩ A ),   y ≤ 1/2,
      P(y, A) = d( (y−1/2, y+1/2) ∩ A ),                  y > 1/2,

(6)   P(y, du) = (1/2 − y) δ_y(u) + 1_{(0, y+1/2]} · du,  y ≤ 1/2,
      P(y, du) = 1_{(y−1/2, y+1/2]} · du,                 y > 1/2.

Note that the movement along the y-axis copies the discontinuous transition rule along the x-axis from the inclined plane model, equation (3), but only for y = 0.
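Kernel (5) can be cross-checked by simulating a single y-axis step at the corridor wall. The sketch below is our own illustration; it verifies P(y, [0, 1/2]) = 1 − y at y = 0.2.

```python
import random

def corridor_step(y, rng):
    """One y-axis step of the EA in the corridor: a uniform proposal v is
    accepted only if y + v stays feasible (> 0); otherwise the EA remains
    at y, which produces the atom of mass 1/2 - y in kernel (5)."""
    v = rng.uniform(-0.5, 0.5)
    return y + v if y + v > 0 else y

rng = random.Random(0)
y0, trials = 0.2, 200_000
hits = sum(corridor_step(y0, rng) <= 0.5 for _ in range(trials))
estimate = hits / trials     # kernel (5) gives P(y0, [0, 1/2]) = 1 - y0
```

The estimate combines the atom (v ≤ −0.2, probability 0.3) with the continuous mass of (0, 0.5], probability 0.5, totaling 0.8 = 1 − y0.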

The formulas above make it clear why the 1-dimensional Markov chain, which is the EA on the corridor restricted to the y-axis, cannot be put in the additive form (1). Next, as the point y = 0 acts like a non-absorbing barrier, one would be interested in the behavior of the algorithm in the vicinity of this point, namely in evaluating the successive terms {P^n(0, [0, 1/2])}_{n≥1}. That calculation is tractable, even for high-order terms, yet no recursion becomes apparent. A different conclusion can be drawn if we confine the analysis to the sequence {P̄^n(0, [0, 1/2])}_{n≥1}, where P̄^n(0, [0, 1/2]) is, by definition, the probability of the EA starting at 0 to be inside the y-axis interval [0, 1/2] from iteration 1 to iteration n. Note that P^n(0, [0, 1/2]) ≥ P̄^n(0, [0, 1/2]) for all n, and also

Proposition 3. For all y ∈ (0, 1/2] we have

P̄^1(0, [0, 1/2]) = 1,

(7)   P̄^n(0, [0, 1/2]) = (1/2) P̄^{n−1}(0, [0, 1/2]) + I_{n−1},   n ≥ 2,

I_n = ∫_0^{1/2} P̄^n(y, [0, 1/2]) dy,   n ≥ 1,   P̄^1(y, [0, 1/2]) = 1 − y,

(8)   P̄^n(y, [0, 1/2]) = (1/2 − y)^{n−1} (1 − y) + (1/2 − y)^{n−2} I_1 + · · · + I_{n−1},   n ≥ 2.

Proof. Induction on n. For n = 1 we have P̄^1(y, [0, 1/2]) = P(y, [0, 1/2]) for all y ∈ [0, 1/2], and the result follows from the previous calculation of P. Next, assume the recursion holds for n − 1 and use the Chapman-Kolmogorov equation

P̄^n(0, [0, 1/2]) = ∫ P(0, dy) P̄^{n−1}(y, [0, 1/2]),

then apply (6) with y = 0 to obtain (7). In order to prove (8), apply again Chapman-Kolmogorov, then (6), to get

P̄^n(y, [0, 1/2]) = ∫ P(y, du) P̄^{n−1}(u, [0, 1/2]) = (1/2 − y) P̄^{n−1}(y, [0, 1/2]) + ∫_0^{1/2} P̄^{n−1}(u, [0, 1/2]) du,

which, according to the induction hypothesis, is equal to

(1/2 − y) [ (1/2 − y)^{n−2} (1 − y) + (1/2 − y)^{n−3} I_1 + · · · + I_{n−2} ] + I_{n−1}
  = (1/2 − y)^{n−1} (1 − y) + (1/2 − y)^{n−2} I_1 + · · · + (1/2 − y) I_{n−2} + I_{n−1}.

By iterating numerically the sequence {P̄^n(0, [0, 1/2])}_{n≥1} from equation (7), we observe exponential convergence to zero as the number of iterations increases. This corresponds to exponential convergence to one for the complement, the probability of escape from the corner for the EA with uniform mutation. To make that rigorous, an upper bound for P̄^n(0, [0, 1/2]) is computed next.
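The numerical iteration just mentioned can be sketched as follows. This is our own code, and the grid/trapezoid discretization is our assumption: starting from f_1(y) = 1 − y, we iterate f_n(y) = (1/2 − y) f_{n−1}(y) + I_{n−1} with I_n = ∫_0^{1/2} f_n(y) dy, and watch f_n(0) = P̄^n(0, [0, 1/2]) decay.

```python
# Discretize [0, 1/2] and iterate the recursion; I_n is computed with the
# composite trapezoidal rule (accurate to O(h^2) for these smooth f_n).
N = 1001
h = 0.5 / (N - 1)
ys = [i * h for i in range(N)]
f = [1.0 - y for y in ys]            # f_1(y) = 1 - y

def trapezoid(vals, step):
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

corner_prob = [f[0]]                 # f_n(0) for n = 1, 2, ...
for n in range(2, 21):
    I = trapezoid(f, h)
    f = [(0.5 - y) * fy + I for y, fy in zip(ys, f)]
    corner_prob.append(f[0])
```

The sequence starts 1, 0.875, 0.729, . . . and decreases roughly geometrically, in line with the exponential bound derived below from the matrix recursion.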

Lemma 5. Let f_n : [0, 1/2] → [0, 1] be such that f_1(y) = 1 − y,

(9)   f_n(y) = (1/2 − y) f_{n−1}(y) + I_{n−1},   n ≥ 2,

I_n = ∫_0^{1/2} f_n(y) dy,   n ≥ 1.

Then {f_n(0)}_{n≥1} converges to zero as n → ∞.

Proof. First note that f_n is monotonically decreasing and convex on [0, 1/2] for all n. Products and sums of monotonically decreasing positive functions are monotonically decreasing. As for convexity, all terms in the expanded form (8), which also holds for f_n(y), have factors of the form (1/2 − y)^i, so their second derivatives are positive on the given interval.

It follows from convexity that the integral defining I_n is less than the area of a certain trapezoid: I_n ≤ [f_n(0) + f_n(1/2)]/4. From (9) we have f_n(1/2) = I_{n−1} and also f_n(0) = (1/2) f_{n−1}(0) + I_{n−1}, so we get

(10)   I_n ≤ (1/8) f_{n−1}(0) + (1/2) I_{n−1}.

Given the positivity of all terms in (9), it is clear that if we replace (10) with an equality, all functions f_n will increase, and in particular so will f_n(0) and I_n:

(11)   I_n = (1/8) f_{n−1}(0) + (1/2) I_{n−1}.

The advantage is that (9) and (11) now constitute a purely numerical double sequence, which can be solved in closed form. Technically, we should rename the variables, since the sequence defined by (9) and (11) is a different one, bounding the original f_n(0) and I_n. Let us call its terms a_n and b_n, respectively.

Define the vector x_n = [a_n, b_n]^T and the matrix

A = ( 1/2  1
      1/8  1/2 ).

The numerical recursion is now written simply x_n = A x_{n−1}. Under the initial condition x_1 = [1, 3/8]^T and given the eigenvalues of A, σ_1 ≈ 0.853 and σ_2 ≈ 0.146, the Jordan decomposition yields

a_n = 1.030 σ_1^{n−1} − 0.030 σ_2^{n−1},   b_n = 0.364 σ_1^{n−1} + 0.010 σ_2^{n−1}.

Conclusion: a_n above is an upper bound on f_n(0), and it tends to zero exponentially as n → ∞.
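The bound can be reproduced numerically. In our sketch below the closed-form coefficients are recomputed from the eigendecomposition of A rather than copied from the text:

```python
import math

# Recursion a_n = a_{n-1}/2 + b_{n-1}, b_n = a_{n-1}/8 + b_{n-1}/2,
# i.e. x_n = A x_{n-1} with A = [[1/2, 1], [1/8, 1/2]] and x_1 = [1, 3/8].
sigma1 = 0.5 + math.sqrt(0.125)      # eigenvalue ~0.8536
sigma2 = 0.5 - math.sqrt(0.125)      # eigenvalue ~0.1464

# Expanding x_1 in the eigenbasis (1, +-sqrt(1/8)) gives the closed form.
p = 0.5 + 0.1875 / math.sqrt(0.125)  # ~1.0303
q = 1.0 - p                          # ~-0.0303

def a_closed(n):
    return p * sigma1 ** (n - 1) + q * sigma2 ** (n - 1)

a, b = 1.0, 0.375
iterated = [a]
for _ in range(19):
    a, b = 0.5 * a + b, 0.125 * a + 0.5 * b
    iterated.append(a)               # iterated[n-1] holds a_n
```

The iterated values and the closed form agree, and a_n decays at the rate σ_1 ≈ 0.853, the dominant eigenvalue.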

The main result is now a straightforward implication of Lemma 5 and Proposition 3, by identifying f_n(y) = P̄^n(y, [0, 1/2]) for all n.

Theorem 2. The probability for the EA with uniform mutation to stagnate at the corner of the 2-dimensional corridor from iteration 1 to n converges to zero exponentially as n → ∞.

2.1. THE MULTI-DIMENSIONAL CORRIDOR

The escape-from-the-corner problem can be approached in a similar manner for the multi-dimensional corridor. Consider the most general case: a fixed positive integer k, a (k+1)-dimensional corridor, and the associated k-dimensional kernel P̄^n_k((y_1, . . . , y_k), [0, 1/2]^k), denoted f^k_n(y_1, . . . , y_k) in what follows. A derivation similar to the 2-dimensional case leads to

Lemma 6. Let f^k_n : [0, 1/2]^k → [0, 1] be such that

f^k_1(y_1, . . . , y_k) = 1 − ∏_{i=1}^k (1/2 + y_i) + 1/2^k,

(12)   f^k_n(y_1, . . . , y_k) = [ 1 − ∏_{i=1}^k (1/2 + y_i) ] f^k_{n−1}(y_1, . . . , y_k) + I_{n−1},   n ≥ 2,

(13)   I_n = ∫_0^{1/2} · · · ∫_0^{1/2} f^k_n(y_1, . . . , y_k) dy_1 · · · dy_k,   n ≥ 1.

Then {f^k_n(0, . . . , 0)}_{n≥1} converges to zero exponentially as n → ∞.

The main result follows now by identifying f^k_n(y) = P̄^n_k((y), [0, 1/2]^k) for all n ≥ 1 and all (y_1, . . . , y_k) ∈ [0, 1/2]^k. The verification of Lemma 6 on the transition probabilities P̄^n_k, the analogue of Proposition 3, is a straightforward consequence of the Chapman-Kolmogorov equation and of the k-dimensional version of (5). For simplicity, k + 1 is denoted by N.


Theorem 3. The probability for the EA with uniform mutation to stagnate at the corner of the N-dimensional corridor from iterations 1 to n converges to zero exponentially fast as n → ∞, for all N ≥ 3.

One should note that, through more detailed geometrical computations, tighter bounds can be derived. This is made clear for N = 3 by the following result (with a proof similar to that of Lemma 5).

Proposition 4. For the 3-dimensional case, the probability for the EA with uniform mutation to stagnate at the corner from iterations 1 to n is bounded from above by

a_n = 0.776 σ_1^{n−1} + 0.200 σ_2^{n−1} + 0.023 σ_3^{n−1},   where σ_1 = 0.478, σ_2 = 0.250, σ_3 = 0.021.

2.2. EA WITH NORMAL MUTATION

Normal mutation is extensively used in EA applications, being also a workhorse for empirical testing and statistical study [10]. For the present analysis, passing from uniform to normal mutation means passing from a kernel with compact support to one which is spread over the whole space. Together with elitism, this makes the EA fulfill both sufficient conditions for global convergence [8], [9].

We keep the corridor model and analyze the escape-from-the-corner problem, but give a unified treatment of the 2- and (k+1)-dimensional cases. As remarked in Section 2, we can omit the x-component of the transition function and concentrate only on the k non-progress directions, which are the axes defining the corridor. For convenience, we replace the upper bound 1/2 by 1 in the corner definition.

Let us denote by L(u, y) the density of the k-variate normal distribution with mean y and covariance matrix C:

L(u, y) = (1 / √((2π)^k det(C))) e^{−(1/2)(u−y)^T C^{−1} (u−y)}.

The one-step k-dimensional kernel associated with the EA with normal mutation is defined, for (y_1, . . . , y_k) ≥ 0 and A ∈ B(R^k), by

(14)   P(y, A) = [ 1 − ∫_{(0,∞)^k} L(u, y) du ] δ_y(A) + ∫_{(0,∞)^k ∩ A} L(u, y) du,

       P(y, du) = [ 1 − ∫_{(0,∞)^k} L(u, y) du ] δ_y(u) + L(u, y) 1_{(0,∞)^k} du.

As before, we confine the analysis to the sequence {P̄^n(0, [0, 1])}_{n≥1}, where P̄^n(0, [0, 1]) now stands for the probability of the EA with normal mutation starting at 0 to be inside the y-axis interval [0, 1] from iterations 1 to n. Then we have the following equivalent of Lemma 6.

Lemma 7. Let f^k_n : [0, 1]^k → [0, 1] be such that

f^k_1(y) = 1 − ∫_{(1,∞)^k} L(u, y) du,

f^k_n(y) = [ 1 − ∫_{(0,∞)^k} L(u, y) du ] f^k_{n−1}(y) + I_{n−1}(y),   n ≥ 2,

I_n(y) = ∫_{(0,1)^k} L(u, y) f^k_n(u) du,   n ≥ 1.

Then {f^k_n(0)}_{n≥1} decreases exponentially to zero as n → ∞.

Proof. For all n, denote by f^MAX_n the maximal value of f_n when the components of y take values independently between 0 and 1, and omit the index k from f. Then

(15)   f_n(y) ≤ f^MAX_{n−1} [ 1 − ∫_{(0,∞)^k} L(u, y) du ] + I_{n−1}(y),

(16)   I_{n−1}(y) ≤ f^MAX_{n−1} ∫_{(0,1)^k} L(u, y) du.

Substitute (16) into (15) to get

(17)   f_n(y) ≤ f^MAX_{n−1} [ 1 − ∫_{(0,∞)^k} L(u, y) du + ∫_{(0,1)^k} L(u, y) du ].

Denote by α_k(y) the term in the large brackets above. This is a continuous function of y with compact domain (the unit cube). Hence, by a well-known result in analysis, it attains both its minimum and its maximum at points of the domain. Since 0 < α_k(y) < 1 at any point y of the domain, the same inequality holds for both its minimum and its maximum, denoted α^MIN_k and α^MAX_k, respectively.

For numerical evaluations, we can also derive computable versions of these two bounds, as follows. We have

(18)   1 − ∫_{(−1,∞)^k} L(u, 0) du + ∫_{(0,1)^k} L(u, 0) du ≤ α_k(y) ≤ 1 − ∫_{(0,∞)^k} L(u, 0) du + ∫_{(−1/2,1/2)^k} L(u, 0) du.

Both inequalities follow easily by induction on k, using at each induction step the one-dimensional version of the same inequality for the respective marginal of the k-variate normal. In compact form we have

(19)   α^min_k ≤ α_k(y) ≤ α^max_k,

with the obvious identifications. The new bounds are independent of y, and α^max_k < 1 for all k ≥ 1. Note that the explicit bounds just obtained are in general less tight than the previous ones, α^MIN_k and α^MAX_k.

For the final step of the proof, either upper bound works, since both are strictly sub-unitary. Let f_n(y) attain its maximum in (17). From (19) we obtain

f^MAX_n ≤ α^max_k f^MAX_{n−1} ≤ · · · ≤ (α^max_k)^{n−1} f^MAX_1,

which tends to 0 as n → ∞. Thus, f_n(0) does the same.

As with uniform mutation, the main result follows by identifying f^k_n(y) = P̄^n_k((y), [0, 1]^k) for all n ≥ 1 and all y ∈ [0, 1]^k. The verification of Lemma 7 on P̄^n_k is a straightforward consequence of Chapman-Kolmogorov and (14). Again, k + 1 is denoted by N.

Theorem 4. The probability for the EA with normal mutation to stagnate at the corner of the N-dimensional corridor from iterations 1 to n converges to zero exponentially fast as n → ∞, for all N ≥ 2.

Remark 4.1. For the case of uncorrelated mutations (diagonal covariance matrix C), the bases responsible for the EA's convergence, α^min_k and α^max_k from the proof of Lemma 7, both converge to 1 as k → ∞. Indeed, in this case the domain integrals in (18) can be easily computed from their univariate forms, giving

α^min_k = 1 − 0.84^k + 0.34^k ≤ α_k(y) ≤ 1 − 0.5^k + 0.38^k = α^max_k.

Obviously, both bounds tend to 1 exponentially as k → ∞, and this explains the stagnation at the corner reported in [10] for certain corridor problems on spaces of large dimension k.
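For uncorrelated unit-variance normal mutation (our assumption, for concreteness), the one-dimensional masses behind these bounds can be recomputed from the standard normal CDF; the sketch below is our own check of the constants 0.84, 0.34, 0.5, 0.38.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# One-dimensional masses used in (18), for an N(0, 1) marginal:
p_m1 = 1.0 - Phi(-1.0)            # mass of (-1, inf)    ~0.841
p_01 = Phi(1.0) - Phi(0.0)        # mass of (0, 1)       ~0.341
p_0i = 0.5                        # mass of (0, inf)
p_half = Phi(0.5) - Phi(-0.5)     # mass of (-1/2, 1/2)  ~0.383

def alpha_min(k):
    return 1.0 - p_m1 ** k + p_01 ** k    # ~ 1 - 0.84^k + 0.34^k

def alpha_max(k):
    return 1.0 - p_0i ** k + p_half ** k  # ~ 1 - 0.5^k + 0.38^k
```

Both bounds approach 1 exponentially in k, matching the stagnation effect described in the remark.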


3. CONCLUSIONS

In continuous search spaces, significant insight into the behavior of a probabilistic algorithm can be gained by analyzing the associated transition function. This was done in this paper for the one-individual evolutionary algorithm with elitist selection and two types of mutation. First, uniform mutation was used to derive the probability of maximal success in n iterations on the inclined plane model. Next, the search was restricted to a corridor, and the algorithm's behavior at one of the corners was analyzed in detail, under uniform/normal mutation, in two-/multi-dimensional space. The conclusion is that the probability of escaping the corner tends to one as the number of iterations goes to infinity, and the convergence is exponentially fast.

Taking the analysis to more complicated landscapes and/or different mutation distributions would be an important step ahead. An even more challenging task would be to characterize the kernel of multi-individual evolutionary algorithms, thus allowing for interactions within the current population.

Acknowledgements. This work was done while the first author was visiting Dortmund University. Financial support from the Collaborative Research Center Computational Intelligence (SFB 531) and scientific support from the Chair of Computer Science XI are gratefully acknowledged.

REFERENCES

[1] A. Agapie, Theoretical analysis of mutation-adaptive evolutionary algorithms. Evolutionary Comput. 9 (2001), 127-146.

[2] H.-G. Beyer, The Theory of Evolution Strategies. Springer, Heidelberg, 2001.

[3] S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intelligence 6 (1984), 721-741.

[4] H. Haario and E. Saksman, Simulated annealing process in general state space. Adv. in Appl. Probab. 23 (1991), 866-893.

[5] O. Häggström, Finite Markov Chains and Algorithmic Applications. Cambridge Univ. Press, Cambridge, 2002.

[6] E. Nummelin, General Irreducible Markov Chains and Non-negative Operators. Cambridge Univ. Press, Cambridge, 1984.

[7] G. Rappl, On linear convergence of a class of random search algorithms. Z. Angew. Math. Mech. 69 (1989), 37-45.

[8] G. Rudolph, Convergence of evolutionary algorithms in general search spaces. In: Proc. 3rd IEEE Conf. on Evolutionary Computation, Piscataway, NJ, pp. 50-54. IEEE Press, 1996.

[9] G. Rudolph, Convergence Properties of Evolutionary Algorithms. Kovač, Hamburg, 1997.

[10] H.-P. Schwefel, Evolution and Optimum Seeking. Wiley, New York, 1995.

[11] M.T. Wasan, Stochastic Approximation. Cambridge Univ. Press, Cambridge, 1969.

Received 15 October 2005 Romanian Academy

Institute of Mathematical Statistics and Applied Mathematics Calea 13 Septembrie nr. 13 050711 Bucharest 5, Romania

agapie@rdslink.ro and

Computer Science, Tarleton State University Box T-0930 Stephenville, TX 76402, USA

agapie@tarleton.edu
