
In this section, we propose an algorithm to provide a maximin design with $N$ points in any set $E$ enclosed in a bounded set. It is based on a simulated annealing method. It aims at finding the global minimum of the function $U : E^N \to \mathbb{R}_+$, $U(X) = \mathrm{diam}(E) - \delta_X$, where $\mathrm{diam}(E)$ is the diameter of the set $E$ ($\mathrm{diam}(E) = \max_{x, x' \in E} \|x - x'\|$) and $\delta_X$ is the maximin criterion of the design $X$, i.e. its smallest pairwise distance. Minimizing $U$ is obviously equivalent to maximizing $\delta : X \mapsto \delta_X$.
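As an illustration, here is a minimal Python sketch of the criterion, assuming the design is stored as an $N \times d$ array and that $\mathrm{diam}(E)$ is known or pre-computed; the function names are purely illustrative.

```python
import numpy as np

def maximin_criterion(X):
    """delta_X: the smallest pairwise distance of the design X (an N x d array)."""
    N = X.shape[0]
    return min(np.linalg.norm(X[i] - X[j])
               for i in range(N) for j in range(i + 1, N))

def U(X, diam_E):
    """U(X) = diam(E) - delta_X; minimizing U is equivalent to maximizing delta_X."""
    return diam_E - maximin_criterion(X)
```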

The initialization step consists in simulating a large number of points uniformly in the domain $E$ and computing the corresponding empirical covariance matrix, denoted by $\Sigma$. At the end of the initialization step, $N$ points are kept at random, denoted by $X^{(0)} = \{x_1^{(0)}, \ldots, x_N^{(0)}\}$. Then, we propose to iterate the following steps, for $t = 1, \ldots$:

Algorithm 4.1.

1. A pair of points $(x_i^{(t)}, x_j^{(t)})$ is drawn in $X^{(t)}$ according to a multinomial distribution with probabilities proportional to $1/(\|x_i - x_j\| + \alpha)$;

2. One of the two points is chosen with probability $1/2$; it is denoted by $x_k^{(t)}$;

3. A constrained Gaussian random walk is used to propose a new point:
\[
x_k^{\mathrm{prop}} \sim \mathcal{N}_d(x_k^{(t)}, \tau \Sigma)\, \mathbb{I}_E(\cdot).
\]
The proposed design is denoted by $X^{\mathrm{prop}} = \{x_1^{(t)}, \ldots, x_{k-1}^{(t)}, x_k^{\mathrm{prop}}, x_{k+1}^{(t)}, \ldots, x_N^{(t)}\}$;

4. $X^{(t+1)} = X^{\mathrm{prop}}$ with probability
\[
\min\left(1,\ \exp\!\big(-\beta_t \big(U(X^{\mathrm{prop}}) - U(X^{(t)})\big)\big)\, \frac{q_\tau(X^{\mathrm{prop}}, X^{(t)})}{q_\tau(X^{(t)}, X^{\mathrm{prop}})}\right),
\]
and $X^{(t+1)} = X^{(t)}$ otherwise.

The idea behind this proposal is to force pairs of points that are very close to move further apart. $\beta : t \mapsto \beta_t$ is an inverse cooling schedule (i.e. $\beta_t$ is an increasing positive sequence with $\lim_{t\to\infty} \beta_t = \infty$), chosen in order to ensure the convergence of the algorithm.

$q_\tau(X, \cdot)$ is the probability density function of the proposal kernel $Q_\tau(X, dY)$, where $X \in E^N$ is the current state and $dY$ is an infinitesimal neighborhood of the state $Y$.

$\tau$ is a variance parameter which is allowed to change during the iterations but, at each iteration, $\tau$ is such that $\tau_0 \ge \tau \ge \tau_{\min}$. $\alpha > 0$ is a very small positive number which prevents the denominator of $1/(\|x_i - x_j\| + \alpha)$ from vanishing.
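The iteration above can be sketched in Python as follows. This is only an illustrative sketch: the membership test `in_E`, the criterion `U` and the proposal density `q_density` are assumed to be supplied by the user, and the constrained Gaussian step is sampled here by simple rejection, which is one possible implementation among others.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_weights(X, alpha):
    """d^X_{i,j} = 1 / (||x_i - x_j|| + alpha) for all pairs i < j."""
    N = X.shape[0]
    pairs, w = [], []
    for i in range(N):
        for j in range(i + 1, N):
            pairs.append((i, j))
            w.append(1.0 / (np.linalg.norm(X[i] - X[j]) + alpha))
    return pairs, np.array(w)

def one_iteration(X, beta_t, tau, Sigma, alpha, U, in_E, q_density):
    """One step of a sketch of Algorithm 4.1.

    `U` is the criterion, `in_E(x)` tests membership in E, and `q_density(X, Y, k)`
    evaluates the proposal density q_tau when point k is moved -- all user-supplied.
    """
    # Step 1: draw a pair with probabilities proportional to 1/(||x_i - x_j|| + alpha).
    pairs, w = pairwise_weights(X, alpha)
    i, j = pairs[rng.choice(len(pairs), p=w / w.sum())]
    # Step 2: pick one of the two points with probability 1/2.
    k = i if rng.random() < 0.5 else j
    # Step 3: constrained Gaussian random walk, sampled here by rejection in E.
    while True:
        x_prop = rng.multivariate_normal(X[k], tau * Sigma)
        if in_E(x_prop):
            break
    Y = X.copy()
    Y[k] = x_prop
    # Step 4: Metropolis-Hastings acceptance with the proposal-density ratio.
    log_ratio = (-beta_t * (U(Y) - U(X))
                 + np.log(q_density(Y, X, k)) - np.log(q_density(X, Y, k)))
    return Y if np.log(rng.random()) < min(0.0, log_ratio) else X
```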

In order to make the proposal kernel $Q_\tau(X, dY)$ explicit, let us introduce some notation:

• $d^X_{i,j} = 1/(\|x_i - x_j\| + \alpha)$,

• $D_X = \sum_{k, l\,:\, k < l} d^X_{k,l}$,

• $\varphi(\cdot \mid \mu, S)$ denotes the Gaussian pdf with mean $\mu$ and covariance matrix $S$,

• $G_{\mu, S} = \int_E \varphi(y \mid \mu, S)\, dy$ denotes the normalization constant associated with $\varphi(\cdot \mid \mu, S)$ on the domain $E$.
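For a given mean and covariance, $G_{\mu,S}$ is simply the probability that a $\mathcal{N}_d(\mu, S)$ draw falls in $E$; when an indicator function of $E$ is available, it can be approximated by Monte Carlo, as in the illustrative sketch below (the names and the sample size are arbitrary).

```python
import numpy as np

def estimate_G(mu, S, in_E, n_samples=100_000, seed=0):
    """Monte Carlo estimate of G_{mu,S} = P(Z in E) with Z ~ N_d(mu, S)."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(mu, S, size=n_samples)
    return np.mean([in_E(z) for z in Z])
```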

In order to show the convergence of the previous algorithm, some lemmas are introduced.

Lemma 4.3. For all $X \in E^N$, $q_\tau(X, \cdot) \ge q_{\min} > 0$ and $q_\tau(X, \cdot) \le q_{\max}$, $Q_\tau(X, \cdot)$-almost everywhere on $E^N$.

Proof

The fact that $q_\tau(X, \cdot) \le q_{\max}$ holds since the normalization constants are lower-bounded, the Gaussian densities are uniformly bounded because $\tau_0 \ge \tau \ge \tau_{\min} > 0$, and all the other terms can be upper-bounded by $1$.

The other assertion is only true $Q_\tau(X, \cdot)$-almost everywhere on $E^N$: the lower bound on $q_\tau(X, Y)$ holds when $X$ and $Y$ have at least $N - 1$ points in common and both belong to $E^N$.

The following lower bounds are used: $G_{x, \tau\Sigma}^{-1} \ge 1$ for the normalization constants, uniform lower bounds on the selection probabilities $d^X_{i,j}/D_X$ and $1/2$, and a lower bound on the Gaussian densities of the form $(2\pi\tau_0)^{-d/2} \det(\Sigma)^{-1/2} \exp\!\big(-\xi\, \mathrm{diam}(E)^2/(2\tau_{\min})\big)$, where $\xi$ is the largest eigenvalue of $\Sigma^{-1}$.

$q_{\min} > 0$ is obtained by multiplying these expressions; it is a lower bound on $q_\tau(X, Y)$ which depends neither on $\tau$ nor on the states, provided $X \in E^N$ and $Y \in E^N$ have at least $N - 1$ points in common. ✷

Let us denote by $(\tau_t)_{t \ge 1}$ the values of $\tau$ used during the iterations of the algorithm. This lemma implies that, for a sequence of $N$ proposal kernels $(Q_{\tau_1}, \ldots, Q_{\tau_N})$, it is possible to reach any state $Y \in E^N$ from any state $X \in E^N$. Indeed, at each transition the density is lower-bounded by $q_{\min}$ and one of the $N$ points is moved. Hence, we get the following lemma:

Lemma 4.4. If $\tau_t$ is such that $\tau_0 \ge \tau_t \ge \tau_{\min}$ for all $t \ge 1$, there exists $\epsilon > 0$ such that, for all $A \in \mathcal{B}(E^N)$ (Borel subsets of $E^N$) and for all $X \in E^N$,

\[
(Q_{\tau_1} \cdots Q_{\tau_N})(X, A) \ \ge\ \epsilon\, \lambda(A)/\lambda(E^N), \tag{4.3}
\]
where $\lambda$ denotes the Lebesgue measure on the compact set $E^N$ ($\lambda(dX) = \mathbb{I}_{E^N}(X)\, \mathrm{Leb}(dX)$, where $\mathrm{Leb}$ is the Lebesgue measure on $(\mathbb{R}^d)^N$).

According to the previous comments, $\epsilon = q_{\min}^N$ suits.

Then, the Hastings-Metropolis (HM) kernel is considered. It is the global kernel which describes an iteration of the algorithm, and it obviously depends on the parameters $\beta$ and $\tau$. For $\beta$ and $\tau$ fixed, it reads
\[
K_{\beta,\tau}(X, dY) = a_{\beta,\tau}(X, Y)\, Q_\tau(X, dY) + \Big(1 - \int_{E^N} a_{\beta,\tau}(X, Z)\, Q_\tau(X, dZ)\Big)\, \delta_X(dY),
\]
where $a_{\beta,\tau}(X, Y) = \min\big(1,\ \exp(-\beta (U(Y) - U(X)))\, q_\tau(Y, X)/q_\tau(X, Y)\big)$ is the acceptance rate. In a simulated annealing algorithm, the target distribution is the Gibbs measure, i.e. $\mu_\beta(dX) = \exp(-\beta U(X))\, Z_\beta^{-1}\, \lambda(dX)$, where $Z_\beta = \int e^{-\beta U(Y)}\, \lambda(dY)$.

Lemma 4.5. $\mu_\beta$ is $K_{\beta,\tau}$-reversible for all $\tau, \beta$. This implies that $\mu_\beta$ is $K_{\beta,\tau}$-invariant.

Proof

If $X \ne Y$, we have $\mu_\beta(X)\, q_\tau(X, Y)\, a_{\beta,\tau}(X, Y) = \mu_\beta(Y)\, q_\tau(Y, X)\, a_{\beta,\tau}(Y, X)$. Indeed, if $\mu_\beta(Y)\, q_\tau(Y, X) > \mu_\beta(X)\, q_\tau(X, Y)$, then $a_{\beta,\tau}(X, Y) = 1$ and
\[
a_{\beta,\tau}(Y, X) = \frac{\mu_\beta(X)\, q_\tau(X, Y)}{\mu_\beta(Y)\, q_\tau(Y, X)}.
\]
The other case follows by symmetry in $X$ and $Y$.

Let $\bar{b}_{\beta,\tau}(X) = 1 - \int_{E^N} a_{\beta,\tau}(X, Z)\, Q_\tau(X, dZ)$. We have
\[
\mu_\beta(X)\, \bar{b}_{\beta,\tau}(X)\, \delta_X(dY) = \mu_\beta(Y)\, \bar{b}_{\beta,\tau}(Y)\, \delta_Y(dX).
\]
Indeed, this measure is non-zero only in the case $X = Y$. Therefore,
\[
\mu_\beta(dX)\, K_{\beta,\tau}(X, dY) = \mu_\beta(dY)\, K_{\beta,\tau}(Y, dX). \qquad ✷
\]

We will show the convergence of our algorithm by following the proof given in Bartoli and Del Moral (2001). Some adaptations are necessary, since our proposal kernel depends on a variance parameter $\tau$ and since there is no reversible measure for the proposal kernel $Q_\tau$. For these reasons, the ratio of the proposal densities has to appear in the acceptance rate $a_{\beta,\tau}$, since it makes $\mu_\beta$ $K_{\beta,\tau}$-reversible and hence invariant.

In Bartoli and Del Moral (2001), the reversibility of $K_\beta$ was shown thanks to the reversibility of $Q$. That is why the ratio of the proposal densities does not appear in the acceptance rate of their algorithm.

The next lemma states that, when $\beta$ is large, the target distribution $\mu_\beta$ concentrates on the minima of the function $U$. $U : E^N \to \mathbb{R}_+$ is lower-bounded with respect to $\lambda$ (the Lebesgue measure on the compact set $E^N$). We use the notation
\[
m = \sup\{a;\ \lambda(\{X;\ U(X) < a\}) = 0\},
\]
so that, by definition, $\lambda(\{X;\ U(X) < m\}) = 0$. Moreover, for all $\epsilon > 0$, we define $U_\lambda^\epsilon = \{X;\ U(X) \le m + \epsilon\}$, which is clearly such that $\lambda(U_\lambda^\epsilon) > 0$, and $U_\lambda^{\epsilon, c} = \{X;\ U(X) > m + \epsilon\}$.

Lemma 4.6. $\forall \epsilon > 0$, $\lim_{\beta \to \infty} \mu_\beta(U_\lambda^\epsilon) = 1$.

Proof

If $X \in U_\lambda^\epsilon$, then $e^{-\beta(U(X) - (m + \epsilon))} \ge 1$, and
\[
\lambda\big(e^{-\beta(U - (m+\epsilon))}\big) = \int e^{-\beta(U(X) - (m+\epsilon))}\, \lambda(dX) \ \ge\ \int \mathbb{I}_{U_\lambda^\epsilon}(X)\, e^{-\beta(U(X) - (m+\epsilon))}\, \lambda(dX) \ \ge\ \lambda(U_\lambda^\epsilon).
\]
Then,
\[
\mu_\beta(U_\lambda^{\epsilon,c}) = \frac{\lambda\big(\mathbb{I}_{U_\lambda^{\epsilon,c}}\, e^{-\beta(U - (m+\epsilon))}\big)}{\lambda\big(e^{-\beta(U - (m+\epsilon))}\big)} \ \le\ \frac{1}{\lambda(U_\lambda^\epsilon)} \int \mathbb{I}_{U_\lambda^{\epsilon,c}}(X)\, e^{-\beta(U(X) - (m+\epsilon))}\, \lambda(dX).
\]
The dominated convergence theorem can be applied to the integral on the right-hand side, since the integrand is bounded by $1$, which is integrable on the compact set $E^N$, and tends to $0$ pointwise on $U_\lambda^{\epsilon,c}$. Thus,
\[
\lim_{\beta \to \infty} \lambda\big(\mathbb{I}_{U_\lambda^{\epsilon,c}}\, e^{-\beta(U - (m+\epsilon))}\big) = 0,
\]
hence $\lim_{\beta \to \infty} \mu_\beta(U_\lambda^{\epsilon,c}) = 0$ and $\lim_{\beta \to \infty} \mu_\beta(U_\lambda^\epsilon) = 1$. ✷
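A toy numerical illustration of this concentration phenomenon, on a one-dimensional discretized example unrelated to the design problem itself:

```python
import numpy as np

# Discretized Gibbs measure mu_beta on a grid: the mass of {U <= m + eps}
# tends to 1 as beta grows, as stated in Lemma 4.6.
x = np.linspace(0.0, 1.0, 1001)
U_vals = (x - 0.3) ** 2          # toy criterion, minimum m = 0 attained at x = 0.3
eps = 1e-2
for beta in [1.0, 10.0, 100.0, 1000.0]:
    w = np.exp(-beta * U_vals)
    mu = w / w.sum()
    print(beta, mu[U_vals <= U_vals.min() + eps].sum())  # -> 1 as beta -> infinity
```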

The distribution of the Markov chain associated with an inverse cooling schedule $t \mapsto \beta(t)$ and a variance schedule $t \mapsto \tau(t)$ is denoted by $\eta_n$. According to the previous results, we have $\eta_{n+1} = \eta_n K_{\beta(n), \tau(n)}$ and $\mu_{\beta(n)} = \mu_{\beta(n)} K_{\beta(n), \tau(n)}$. The aim is to prove that
\[
\lim_{n \to \infty} \|\eta_n - \mu_{\beta(n)}\| = 0,
\]
where $\|\cdot\|$ denotes the total variation norm. In the sequel, $\mathrm{osc}(U)$ is the smallest positive number $h$ such that, for all $X, Y$ in $E^N$, $U(Y) - U(X) \le h$.

Lemma 4.7. For all $X, Y \in E^N$,
\[
a_{\beta,\tau}(X, Y) \ \ge\ e^{-\beta\, \mathrm{osc}(U)}\, \frac{q_{\min}}{q_{\max}}.
\]

According to this bound on $a_{\beta,\tau}(X, Y)$, for $(X, A) \in E^N \times \mathcal{B}(E^N)$,
\[
K_{\beta,\tau}(X, A) \ \ge\ e^{-\beta\, \mathrm{osc}(U)}\, \frac{q_{\min}}{q_{\max}}\, Q_\tau(X, A).
\]
For a product of $p = N$ kernels and thanks to condition (4.3), this yields an upper bound on the Dobrushin contraction coefficient. As a consequence, an application of the Dobrushin theorem states that there exists $\epsilon > 0$ such that, for all probability measures $\mu_1, \mu_2$ on $E^N$,
\[
\|\mu_1 K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_2 K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N}\| \ \le\ \big(1 - \epsilon\, e^{-N \beta_N\, \mathrm{osc}(U)}\big)\, \|\mu_1 - \mu_2\|.
\]

Lemma 4.8. For a function $U : E^N \to \mathbb{R}_+$ such that $\lambda(e^{-U}) > 0$, these notations are used:
\[
\mu_U(dX) = Z_U^{-1}\, e^{-U(X)}\, \lambda(dX), \quad \text{where } Z_U = \lambda(e^{-U}).
\]
Thanks to the Dobrushin theorem, we get
\[
\|\mu_{U_1} - \mu_{U_2}\| \ \le\ 1 - \exp(-\mathrm{osc}(U_1 - U_2)) \ \le\ \mathrm{osc}(U_1 - U_2).
\]
If this lemma is applied to $U_1 = \beta_1 U$ and $U_2 = \beta_2 U$, $0 < \beta_1 < \beta_2$, then an upper bound is obtained on the Gibbs measures:
\[
\|\mu_{\beta_1} - \mu_{\beta_2}\| \ \le\ (\beta_2 - \beta_1)\, \mathrm{osc}(U).
\]

The next lemma is useful in order to choose the function $n \mapsto \beta_n$.

Lemma 4.9. Let $I_n$, $a_n$, $b_n$, $n \ge 0$, be three sequences of positive numbers such that $\forall n \ge 1$, $I_n \le (1 - a_n) I_{n-1} + b_n$. If $a_n$ and $b_n$ are such that $\lim_{n\to\infty} b_n/a_n = 0$ and $\lim_{n\to\infty} \prod_{p=1}^n (1 - a_p) = 0$, then
\[
\lim_{n \to \infty} I_n = 0.
\]

Proof

According to the assumptions, for all $\epsilon > 0$ there exists an integer $n(\epsilon) \ge 1$ such that
\[
\forall n \ge n(\epsilon), \quad b_n \le \epsilon\, a_n \quad \text{and} \quad \prod_{p=1}^n (1 - a_p) \le \epsilon.
\]
As a consequence, for all these $n \ge n(\epsilon)$, it holds that
\[
I_n - \epsilon \ \le\ (1 - a_n) I_{n-1} + b_n - \epsilon \ \le\ (1 - a_n) I_{n-1} - \epsilon (1 - a_n) \ =\ (1 - a_n)(I_{n-1} - \epsilon) \ \le\ \cdots \ \le\ \Big(\prod_{p=1}^n (1 - a_p)\Big)(I_0 - \epsilon).
\]
It implies that, for all $n \ge n(\epsilon)$,
\[
0 \le I_n \le \epsilon + \epsilon\,(I_0 + \epsilon) \le \epsilon\,(1 + \epsilon + |I_0|),
\]
which ends the proof. ✷
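A quick numerical check of the lemma with, for instance, $a_n = 1/(n+1)$ and $b_n = 1/(n+1)^2$, so that $b_n/a_n \to 0$ and $\prod_{p \le n}(1 - a_p) = 1/(n+1) \to 0$:

```python
# Illustration of Lemma 4.9: iterate the recursion with equality and observe I_n -> 0.
I = 1.0
for n in range(1, 100001):
    a_n = 1.0 / (n + 1)
    b_n = 1.0 / (n + 1) ** 2
    I = (1 - a_n) * I + b_n
print(I)  # close to 0 (of order log(n)/n)
```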

The convergence Theorem can now be stated.

Theorem 4.1. If the sequence $(\tau_n)_{n \ge 0}$ is such that $\forall n \ge 0$, $\tau_0 \ge \tau_n \ge \tau_{\min} > 0$, and if
\[
\beta_n = \frac{1}{C} \log(n + e), \quad C > N\, \mathrm{osc}(U),
\]
we get
\[
\forall \epsilon > 0, \quad \lim_{n\to\infty} \mathbb{P}_\eta(X_n \in U_\lambda^\epsilon) = 1,
\]
where $U_\lambda^\epsilon = \{X \in E^N;\ U(X) \le m + \epsilon\}$ and $\{X_n;\ n \ge 0\}$ denotes the random sequence obtained from the simulated annealing algorithm with an initial probability distribution $\eta$ on $E^N$.

Proof

For any non-decreasing sequence
\[
0 \le \beta_1 \le \ldots \le \beta_{N+1}
\]
and for every probability distribution $\eta$ on $E^N$, it is first noticed that
\[
\|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\| \ \le\ \|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N}\| + \|\mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\|.
\]
Thanks to the remark following Lemma 4.7, for some $\epsilon > 0$, it holds that
\[
\|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N}\| \ \le\ \big(1 - \epsilon\, e^{-N \beta_N\, \mathrm{osc}(U)}\big)\, \|\eta - \mu_{\beta_1}\|. \tag{4.4}
\]
For the second term, the following decomposition is used:

\[
\mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}} \ =\ \sum_{k=1}^{N} \big(\mu_{\beta_k} - \mu_{\beta_{k+1}}\big)\, K_{\beta_{k+1},\tau_{k+1}} \cdots K_{\beta_N,\tau_N},
\]
which follows from $\mu_{\beta_k} K_{\beta_k,\tau_k} = \mu_{\beta_k}$. The total variation norm of each term is bounded using $\|(\mu - \nu) K\| \le b(K)\, \|\mu - \nu\| \le \|\mu - \nu\|$, where $b$ is the contraction coefficient (by the Dobrushin theorem, $a(K) + b(K) = 1$).

An application of Lemma 4.8 gives

\[
\|\mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\| \ \le\ \mathrm{osc}(U) \sum_{k=1}^{N} (\beta_{k+1} - \beta_k) \ =\ (\beta_{N+1} - \beta_1)\, \mathrm{osc}(U). \tag{4.5}
\]
By combining (4.4) and (4.5), it is deduced that

\[
\|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\| \ \le\ \big(1 - \epsilon\, e^{-N \beta_N\, \mathrm{osc}(U)}\big)\, \|\eta - \mu_{\beta_1}\| + (\beta_{N+1} - \beta_1)\, \mathrm{osc}(U).
\]
Instead of $(\beta_1, \ldots, \beta_N)$ and $\eta$, we take $(\beta_{kN}, \ldots, \beta_{(k+1)N})$ and $\eta_{kN}$:

\[
I_{k+1} = \|\eta_{kN} K_{\beta_{kN},\tau_{kN}} \cdots K_{\beta_{(k+1)N},\tau_{(k+1)N}} - \mu_{\beta_{(k+1)N}}\| = \|\eta_{(k+1)N} - \mu_{\beta_{(k+1)N}}\|.
\]
By the previous upper bound, the recursive inequalities are obtained:
\[
I_{k+1} \ \le\ (1 - a_{k+1})\, I_k + b_{k+1}, \quad \text{with } a_{k+1} = \epsilon\, e^{-N \beta_{(k+1)N}\, \mathrm{osc}(U)} \text{ and } b_{k+1} = (\beta_{(k+1)N} - \beta_{kN})\, \mathrm{osc}(U).
\]
It can be checked that, if $C > N\, \mathrm{osc}(U)$, then $b_{k+1}/a_{k+1} \to 0$ and $\prod_{p=1}^{k} (1 - a_p) \to 0$, so that Lemma 4.9 gives $\lim_{k \to \infty} I_k = 0$. Together with Lemma 4.6, this yields the announced result. ✷
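In practice, the schedules of Theorem 4.1 can be implemented as in the following sketch; the decreasing choice of $\tau_n$ is arbitrary (any sequence staying in $[\tau_{\min}, \tau_0]$ is allowed), and $C$ may for instance be taken slightly larger than $N\,\mathrm{diam}(E)$, since $\mathrm{osc}(U) \le \mathrm{diam}(E)$ for $U = \mathrm{diam}(E) - \delta_X$.

```python
import numpy as np

def schedules(n, C, tau0, tau_min):
    """Cooling and variance schedules compatible with Theorem 4.1 (illustrative).

    beta_n = (1/C) * log(n + e) with C > N * osc(U); tau_n may vary freely as
    long as tau_min <= tau_n <= tau0 (here, one arbitrary decreasing choice).
    """
    beta_n = np.log(n + np.e) / C
    tau_n = max(tau_min, tau0 / (1.0 + np.log(1.0 + n)))
    return beta_n, tau_n

# Example: N = 20 points in a domain of diameter 1, so e.g. C = 1.01 * 20 * 1.0.
# beta_t, tau_t = schedules(t, C=20.2, tau0=0.1, tau_min=1e-3)
```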

In the case where $E$ is not explicit, the normalization constant $G_{m,S}$ of a Gaussian distribution with mean $m$ and covariance matrix $S$ cannot be computed. Hence, the ratio of the proposal densities is not tractable. In that case, we first propose to use an unconstrained Gaussian random walk as a proposal. Steps 3 and 4 of Algorithm 4.1 are modified.

Algorithm 4.2. The first steps until step 3 are the same.

Step 3 is replaced with

3bis. A Gaussian random walk is used to propose a new point: $x_k^{\mathrm{prop}} \sim \mathcal{N}_d(x_k^{(t)}, \tau \Sigma)$. And step 4 is replaced with

4bis. If $X^{\mathrm{prop}} \in E^N$, $X^{(t+1)} = X^{\mathrm{prop}}$ with probability
\[
\min\left(1,\ \exp\!\big(-\beta_t \big(U(X^{\mathrm{prop}}) - U(X^{(t)})\big)\big)\, \frac{\tilde{q}_\tau(X^{\mathrm{prop}}, X^{(t)})}{\tilde{q}_\tau(X^{(t)}, X^{\mathrm{prop}})}\right);
\]
otherwise, $X^{(t+1)} = X^{(t)}$.

In the last step, $\tilde{q}_\tau(X, \cdot)$ stands for the density of the proposal kernel in which the Gaussian random walk is not constrained to remain in the domain $E$. For any $X \in E^N$ and $Y \in (\mathbb{R}^d)^N$, $\tilde{q}_\tau(X, Y)$ is obtained from $q_\tau(X, Y)$ by replacing the truncated Gaussian densities $\varphi(\cdot \mid x_k, \tau\Sigma)/G_{x_k, \tau\Sigma}$ by the untruncated densities $\varphi(\cdot \mid x_k, \tau\Sigma)$.

Since a lemma similar to Lemma 4.3 can be proved for the kernel $\tilde{Q}_\tau$ (corresponding to the density $\tilde{q}_\tau$), Theorem 4.1 still applies to it. Hence, there is also a convergence result for Algorithm 4.2.

However, since a point can be proposed outside of the domainE, this algorithm can suffer from a lack of efficiency. Another solution is to use the first algorithm without the ratio of densities of proposal kernels.
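For completeness, here is an illustrative Python sketch of steps 3bis and 4bis, assuming the pair and the point $x_k^{(t)}$ have already been selected as in steps 1 and 2, and assuming that $\tilde{q}_\tau$ takes the mixture form implied by those steps (so that the symmetric Gaussian factors cancel and only the pair-selection weights remain in the ratio); all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def move_weight(X, k, alpha):
    """Probability that point k is the one moved: sum_{j != k} d^X_{k,j} / (2 D_X)."""
    N = X.shape[0]
    d_k = sum(1.0 / (np.linalg.norm(X[k] - X[j]) + alpha)
              for j in range(N) if j != k)
    D = sum(1.0 / (np.linalg.norm(X[i] - X[j]) + alpha)
            for i in range(N) for j in range(i + 1, N))
    return d_k / (2.0 * D)

def step_3bis_4bis(X, k, beta_t, tau, Sigma, alpha, U, in_E):
    """Steps 3bis and 4bis: unconstrained Gaussian proposal, rejected if it leaves E."""
    x_prop = rng.multivariate_normal(X[k], tau * Sigma)   # step 3bis
    if not in_E(x_prop):                                  # step 4bis: X^prop must lie in E^N
        return X
    Y = X.copy()
    Y[k] = x_prop
    # The Gaussian factor is symmetric in (x_k, x_prop), so the q~ ratio reduces
    # to the ratio of the move weights under Y and under X.
    log_ratio = (-beta_t * (U(Y) - U(X))
                 + np.log(move_weight(Y, k, alpha))
                 - np.log(move_weight(X, k, alpha)))
    return Y if np.log(rng.random()) < min(0.0, log_ratio) else X
```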