
Theory and Analysis of Evolutionary Optimization

4.2 Theoretical Foundation

This section reviews the theory of stochastic generational methods proposed by [Zhigljavsky, 1991], which presents a convenient and intuitive mathematical foundation for modeling a class of evolutionary algorithms in terms of the construction and evolution of sampling distributions over the feasible space X. The key results in [Zhigljavsky, 1991] are adapted and presented without proof. Similar models are formulated and analyzed in the mathematical population genetics literature (see for example [Burger, 1988; Karlin, 1979; Slatkin, 1970]). These general methods accommodate extensions to model other evolutionary algorithm variants. For instance, [Slatkin, 1970] has used this approach to study the effect of random mating, and [Peck and Dhawan, 1995] have formulated similar extensions to model genetic algorithms. The reader interested in a relative evaluation of this theoretical approach against other prevailing theoretical approaches is referred to [Peck and Dhawan, 1995], and to [Gao, 1998; Qi and Palmieri, 1994].

4.2.1 Notation

• μn is the Lebesgue measure on ℝ^n or on X ⊆ ℝ^n. Less formally, it is a general measure of volume in n-dimensional real space.

• dx compactly denotes the infinitesimal volume element in n-dimensional real space.

• B(x, ε) = {z ∈ X : ‖x - z‖ < ε}, and B(ε) = B(x*, ε). B(x, ε) is the set of all points in X that are ε-close to x, and represents an enclosing ball of radius ε centered at x in the space X. B(x*, ε) represents the ε ball in X centered at the global optimizer x*.

• A(ε) = {z ∈ X : |ψ(x*) - ψ(z)| < ε} is the set of all points in the search space X that are ε-close to the global optimizer x* in the objective space ψ(·).

• Given a set of events ℰ, for any event e ∈ ℰ the image of the indicator function I : ℰ → {0, 1} is given by I[e] = 1 if e is true, and I[e] = 0 if e is false.

4.2.2 General Algorithm

A four-step scheme of a generational global stochastic search algorithm is presented. An evolutionary algorithm that does not include crossover but may include deterministic survival heuristics in combination with fitness proportional selection and mutation-oriented stochastic variation operations is a special case.

(1) Initialization: Set g = 0. Choose a probability distribution Pg on X, and sample N times to obtain points xg^(1), xg^(2), ..., xg^(N) constituting the initial population.

(2) Fitness Proportional Selection: Construct the population distribution Rg on X by selecting N points based on their fitness. The probability pj of selecting the point with index j is given by

pj = ψ(xg^(j)) / Σ_{i=1}^{N} ψ(xg^(i))        (4.2)

(3) One Step Evolution: Construct the population distribution at the next generation Pg+1 on X by varying Rg, according to

Pg+1(dx) = Σ_{j=1}^{N} pj Qg(xg^(j), dx)        (4.3)

Qg(z, ·) represents a stochastic variational operator and is a generic function that can model a variety of mutation operations including deterministic survival heuristics. Qg(z, dx) = qg(z, x) μn(dx), and max_{z,x} qg(z, x) < ∞. qg(z, x) is an n-dimensional probability density function representing the transition from point z to point x. Require that the width of such a function be non-zero, or alternatively, require its maximum height to be less than infinity. Qg(z, dx) represents the associated probability mass measured over the volume dx. Formally, Qg(z, ·) is a measurable, nonnegative function with respect to its first argument and a probability measure with respect to its second argument. A desired sample x from the distribution Pg+1 is obtained by first sampling Rg(dz) to obtain z, and then sampling Qg(z, dx). Stated alternatively, a desired sample x from the distribution Pg+1 is obtained by selection followed by stochastic variation.

(4) Iteration: Set g ← g + 1, and repeat from Step 2 until some termination criterion is met.
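As an illustration, the four steps above can be sketched in a few lines of Python. This is a minimal sketch, not from the source: the fitness ψ, the bounds and dimension of X, the population size, and the choice of Qg as an isotropic Gaussian with shrinking widths βg are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(x):
    # Hypothetical fitness (0 < psi < inf, to be maximized), peaked at
    # the assumed global optimizer x* = (1, 1).
    return np.exp(-np.sum((x - 1.0) ** 2, axis=-1))

def evolve(n=2, N=200, generations=60, beta0=0.5):
    """Four-step scheme: fitness proportional selection plus Gaussian
    mutation as a simple special case of Qg."""
    # (1) Initialization: sample N points from P0 (uniform on [-5, 5]^n here).
    pop = rng.uniform(-5.0, 5.0, size=(N, n))
    for g in range(generations):
        # (2) Fitness proportional selection, eq. (4.2):
        # p_j = psi(x_j) / sum_i psi(x_i).
        fit = psi(pop)
        parents = pop[rng.choice(N, size=N, p=fit / fit.sum())]
        # (3) One step evolution, eq. (4.3): vary each selected parent
        # through Qg, here an isotropic Gaussian whose width shrinks with g.
        pop = parents + rng.normal(scale=beta0 / (1.0 + g), size=(N, n))
        # (4) Iteration: g <- g + 1 via the loop.
    return pop

final = evolve()
print(final.mean(axis=0))  # population mean drifts toward x* = (1, 1)
```

Note that the empirical distribution of `pop` plays the role of Pg, and the resampling step realizes Rg without ever forming either measure explicitly.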

The construction of the distribution Rg on X via selection captures the global aspect of the search strategy, whereby a point z is chosen from all of X, while the distribution Qg(z, ·), representing stochastic variation, comprises the local aspect of the search strategy, whereby a point in the neighborhood of z is chosen. Importantly, the general form of Qg(z, ·) permits an encapsulation of deterministic problem-specific heuristics in addition to randomized heuristics, and may be used to some degree to offset the specificity of the fitness proportional selection assumed above, though proportional selection is not in itself a restrictive assumption of narrow appeal or application. The nature of Qg(z, ·) largely determines the trade-off between search accuracy and search efficiency, and for a theoretical study it is convenient to select

Qg(z, dx) = φ((x - z)/βg) μn(dx) / ∫_X φ((y - z)/βg) μn(dy)        (4.4)

where φ(·) is decreasing for ‖x - z‖ > 0, is symmetrical, continuous, decomposes into a product of one-dimensional densities, βg > 0, {βg} is a non-increasing sequence, and the denominator is a normalization constant.
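For a concrete sense of (4.4), suppose φ(·) is Gaussian; a draw from Qg(z, ·) can then be sketched by resampling a perturbation until the result lands inside X. The box X = [0, 1]^n and the width βg below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_Qg(z, beta_g, lo=0.0, hi=1.0):
    """Draw x from a Qg(z, .) of the form (4.4) with Gaussian phi:
    sample a perturbation xi_g ~ N(0, beta_g^2 I) repeatedly until
    z + xi_g lies in the box X = [lo, hi]^n, then set x = z + xi_g."""
    while True:
        xi_g = rng.normal(scale=beta_g, size=z.shape)
        x = z + xi_g
        if np.all((x >= lo) & (x <= hi)):
            return x

x = sample_Qg(np.array([0.9, 0.9]), beta_g=0.2)
print(x)  # a point of [0, 1]^2 in the neighborhood of (0.9, 0.9)
```

The rejection loop realizes the restriction of the Gaussian to X, so the normalizing denominator of (4.4) never needs to be computed explicitly.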

A random realization x ∈ X is obtained by repeatedly sampling φ(·) until a ξg is obtained such that z + ξg ∈ X, and then setting x = z + ξg. In this mode, the distribution Qg(z, ·) serves as a radially symmetric mutation operation. The above form is also preferred when there is random noise in fitness evaluations. In practice, and when there is no noise in the fitness evaluations, an option is to select Qg(z, ·) as

Qg(z, A) = ∫_X I[x ∈ A : ψ(x) ≥ ψ(z)] Q'g(z, dx) + I[z ∈ A] ∫_X I[ψ(x) < ψ(z)] Q'g(z, dx)        (4.5)

where Q'g(z, dx) is also of the form of Qg(z, dx) as it appears in (4.4). What (4.5) represents is that if the probability mass of Qg(z, ·) is considered over an arbitrary set A ⊆ X, then the contribution by the first integral is due to the probability of sampling an x ∈ A with ψ(x) ≥ ψ(z), while the second integral contributes to the mass only when z ∈ A and the sample x ∈ X is such that ψ(x) < ψ(z). Essentially, expression (4.5) represents a policy by which the offspring survives only when its fitness is equal to or better than that of its parent. This expression demonstrates the incorporation of a deterministic survival heuristic based on parent-offspring competition.

Other heuristics may be conceived and expressed as well using the above technique based on indicator functions.
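The survival policy in (4.5) amounts to a simple accept/reject rule on top of Q'g. A sketch, with a hypothetical Gaussian Q'g and fitness ψ (both assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(2)

def psi(x):
    # Hypothetical fitness, maximized at x* = (1, 1).
    return float(np.exp(-np.sum((x - 1.0) ** 2)))

def elitist_step(z, beta_g):
    """One draw from the Qg of (4.5): propose x from a Gaussian Q'g(z, .);
    the offspring survives only if psi(x) >= psi(z), otherwise the
    parent z survives (parent-offspring competition)."""
    x = z + rng.normal(scale=beta_g, size=z.shape)
    return x if psi(x) >= psi(z) else z

z = np.array([0.0, 0.0])
fitness_trace = [psi(z)]
for _ in range(100):
    z = elitist_step(z, beta_g=0.2)
    fitness_trace.append(psi(z))
```

By construction the fitness trace is non-decreasing, which is exactly the guarantee encoded by the indicator functions in (4.5).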

4.2.3 Basic Results

This section presents four results based on the following assumptions:

(a) 0 < ψ(x) < ∞ for all x ∈ X.

(b) Qg(z, dx) = qg(z, x) μn(dx), and max_{z,x} qg(z, x) < ∞, as stated earlier.

(c) The global optimizer x* is unique, and there exists an ε > 0 such that ψ(·) is continuous in the finite region B(ε).

(d) There exists a δ > 0 such that the sets A(ε) are connected for all ε : 0 < ε < δ.

(e) For any z ∈ X and g → ∞ the sequence of probability measures Qg(z, dx) weakly converges to the probability measure concentrated at the point z.

(f) For any ε > 0 there exist a δ > 0 and a natural number k such that Pg(B(ε)) ≥ δ for all g > k.

(g) P0(B(x, ε)) > 0 for all ε > 0, x ∈ X.

Result 4.1  Let (a), (b) hold, and let the probability distribution iteration follow the scheme in (4.3). Then, as N → ∞, the distribution of the samples in the search follows the sequence

Pg+1(dx) = [∫_X Pg(dz) ψ(z) Qg(z, dx)] / [∫_X Pg(dz) ψ(z)]        (4.6)

□

Result 4.2  Let (a) through (d) hold. Let ψ(·) be evaluated without random noise. Let Qg(z, dx) be chosen according to (4.5), Q'g(z, dx) = q'g(z, x) μn(dx), and let q'g(z, x) be an n-dimensional Gaussian density with independent coordinates whose variance is non-increasing with respect to generations, whereby Q'g(z, dx) weakly converges to the probability measure concentrated at the point z. Then, (e), (f), (g) hold. □

Result 4.3  Let (a) through (d) hold. Let ψ(·) be evaluated either with finite bounded random noise or without noise. Let Qg(z, dx) be chosen according to (4.4), and let qg(z, x) be an n-dimensional Gaussian density with independent coordinates whose variance is non-increasing with respect to generations, whereby Qg(z, dx) weakly converges to the probability measure concentrated at the point z. Then, (e), (f), (g) hold. □

Result 4.4  Let Result 4.2 or Result 4.3 hold. Then, the distribution sequence (4.6) converges to the distribution concentrated at x* as g → ∞. □
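The convergence claimed in Result 4.4 can be visualized numerically by iterating (4.6) on a discretized one-dimensional X. The grid, the fitness, and the shrinking Gaussian kernel below are assumptions of this sketch, not part of the source.

```python
import numpy as np

# Discretize X = [0, 1] into m cells; the unique maximizer x* sits at 0.7.
m = 201
xs = np.linspace(0.0, 1.0, m)
psi = 1.0 + np.exp(-50.0 * (xs - 0.7) ** 2)   # 0 < psi < inf, as in (a)

P = np.full(m, 1.0 / m)                        # P0: uniform on the grid
for g in range(60):
    beta_g = 0.2 / (1.0 + g)                   # non-increasing widths: Qg -> delta_z
    # Discrete Gaussian transition kernel qg(z, x), each row normalized over the grid.
    Q = np.exp(-((xs[None, :] - xs[:, None]) ** 2) / (2.0 * beta_g ** 2))
    Q /= Q.sum(axis=1, keepdims=True)
    # One step of (4.6): Pg+1(dx) = int Pg(dz) psi(z) Qg(z, dx) / int Pg(dz) psi(z)
    P = (P * psi) @ Q
    P /= P.sum()

print(xs[np.argmax(P)])   # mass concentrates near x* = 0.7
```

Selection (multiplication by ψ) repeatedly tilts the distribution toward x*, while the shrinking kernel satisfies assumption (e), so the smoothing it applies vanishes in the limit.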