
In this section, we propose an algorithm to provide a maximin design with $N$ points in any set $E$ enclosed in a bounded set. It is based on a simulated annealing method. It aims at finding the global minimum of the function $U : E^N \to \mathbb{R}_+$, $U(X) = \mathrm{diam}(E) - \delta_X$, where $\mathrm{diam}(E)$ is the diameter of the set $E$ ($\mathrm{diam}(E) = \max_{x, x' \in E} \|x - x'\|$) and $\delta_X$ is the maximin criterion of the design $X$, i.e. its smallest pairwise distance. Minimizing $U$ is obviously equivalent to maximizing $\delta : X \mapsto \delta_X$.
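As an illustration, here is a minimal Python sketch of the criterion, assuming the design is stored as an $N \times d$ array and that $\mathrm{diam}(E)$ is known or pre-computed; the function names are purely illustrative.

```python
import numpy as np

def maximin_criterion(X):
    """delta_X: the smallest pairwise distance of the design X (an N x d array)."""
    N = X.shape[0]
    return min(np.linalg.norm(X[i] - X[j])
               for i in range(N) for j in range(i + 1, N))

def U(X, diam_E):
    """U(X) = diam(E) - delta_X; minimizing U is equivalent to maximizing delta_X."""
    return diam_E - maximin_criterion(X)
```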

The initialization step consists in simulating a large number of points uniformly in the domain $E$ and computing the corresponding empirical covariance matrix, denoted by $\Sigma$. At the end of the initialization step, $N$ points are kept at random, denoted by $X^{(0)} = \{x_1^{(0)}, \ldots, x_N^{(0)}\}$. Then, we propose to iterate the following steps, for $t = 1, \ldots$:

Algorithm 4.1.

1. A pair of points $(x_i^{(t)}, x_j^{(t)})$ is drawn in $X^{(t)}$ according to a multinomial distribution with probabilities proportional to $1/(\|x_i - x_j\| + \alpha)$;

2. One of the two points is chosen with probability $1/2$; it is denoted by $x_k^{(t)}$;

3. A constrained Gaussian random walk is used to propose a new point:
\[
x_k^{\mathrm{prop}} \sim \mathcal{N}_d(x_k^{(t)}, \tau \Sigma)\, \mathbb{I}_E(\cdot).
\]
The proposed design is denoted by $X^{\mathrm{prop}} = \{x_1^{(t)}, \ldots, x_{k-1}^{(t)}, x_k^{\mathrm{prop}}, x_{k+1}^{(t)}, \ldots, x_N^{(t)}\}$;

4. $X^{(t+1)} = X^{\mathrm{prop}}$ with probability
\[
\min\left(1,\ \exp\!\big(-\beta_t \big(U(X^{\mathrm{prop}}) - U(X^{(t)})\big)\big)\, \frac{q_\tau(X^{\mathrm{prop}}, X^{(t)})}{q_\tau(X^{(t)}, X^{\mathrm{prop}})}\right),
\]
and $X^{(t+1)} = X^{(t)}$ otherwise.

The idea behind this proposal is to force pairs of points that are very close to move further apart. $\beta : t \mapsto \beta_t$ is an inverse cooling schedule (i.e. $\beta_t$ is an increasing positive sequence with $\lim_{t\to\infty} \beta_t = \infty$), chosen in order to ensure the convergence of the algorithm.

$q_\tau(X, \cdot)$ is the probability density function of the proposal kernel $Q_\tau(X, dY)$, where $X \in E^N$ is the current state and $dY$ is an infinitesimal neighborhood of the state $Y$.

$\tau$ is a variance parameter which is allowed to change during the iterations but, at each iteration, $\tau$ is such that $\tau_0 \ge \tau \ge \tau_{\min}$. $\alpha > 0$ is a very small positive number which prevents the denominator of $1/(\|x_i - x_j\| + \alpha)$ from vanishing.
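The iteration above can be sketched in Python as follows. This is only an illustrative sketch: the membership test `in_E`, the criterion `U` and the proposal density `q_density` are assumed to be supplied by the user, and the constrained Gaussian step is sampled here by simple rejection, which is one possible implementation among others.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_weights(X, alpha):
    """d^X_{i,j} = 1 / (||x_i - x_j|| + alpha) for all pairs i < j."""
    N = X.shape[0]
    pairs, w = [], []
    for i in range(N):
        for j in range(i + 1, N):
            pairs.append((i, j))
            w.append(1.0 / (np.linalg.norm(X[i] - X[j]) + alpha))
    return pairs, np.array(w)

def one_iteration(X, beta_t, tau, Sigma, alpha, U, in_E, q_density):
    """One step of a sketch of Algorithm 4.1.

    `U` is the criterion, `in_E(x)` tests membership in E, and `q_density(X, Y, k)`
    evaluates the proposal density q_tau when point k is moved -- all user-supplied.
    """
    # Step 1: draw a pair with probabilities proportional to 1/(||x_i - x_j|| + alpha).
    pairs, w = pairwise_weights(X, alpha)
    i, j = pairs[rng.choice(len(pairs), p=w / w.sum())]
    # Step 2: pick one of the two points with probability 1/2.
    k = i if rng.random() < 0.5 else j
    # Step 3: constrained Gaussian random walk, sampled here by rejection in E.
    while True:
        x_prop = rng.multivariate_normal(X[k], tau * Sigma)
        if in_E(x_prop):
            break
    Y = X.copy()
    Y[k] = x_prop
    # Step 4: Metropolis-Hastings acceptance with the proposal-density ratio.
    log_ratio = (-beta_t * (U(Y) - U(X))
                 + np.log(q_density(Y, X, k)) - np.log(q_density(X, Y, k)))
    return Y if np.log(rng.random()) < min(0.0, log_ratio) else X
```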

In order to make the proposal kernel $Q_\tau(X, dY)$ explicit, let us introduce some notation:

• $d^X_{i,j} = 1/(\|x_i - x_j\| + \alpha)$,

• $D_X = \sum_{k, l\,:\, k < l} d^X_{k,l}$,

• $\varphi(\cdot \mid \mu, S)$ denotes the Gaussian pdf with mean $\mu$ and covariance matrix $S$,

• $G_{\mu, S} = \int_E \varphi(y \mid \mu, S)\, dy$ denotes the normalization constant associated with $\varphi(\cdot \mid \mu, S)$ on the domain $E$.
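For a given mean and covariance, $G_{\mu,S}$ is simply the probability that a $\mathcal{N}_d(\mu, S)$ draw falls in $E$; when an indicator function of $E$ is available, it can be approximated by Monte Carlo, as in the illustrative sketch below (the names and the sample size are arbitrary).

```python
import numpy as np

def estimate_G(mu, S, in_E, n_samples=100_000, seed=0):
    """Monte Carlo estimate of G_{mu,S} = P(Z in E) with Z ~ N_d(mu, S)."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(mu, S, size=n_samples)
    return np.mean([in_E(z) for z in Z])
```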

In order to show the convergence of the previous algorithm, some lemmas are introduced.

Lemma 4.3. For all $X \in E^N$, $q_\tau(X, \cdot) \ge q_{\min} > 0$ and $q_\tau(X, \cdot) \le q_{\max}$, $Q_\tau(X, \cdot)$-almost everywhere on $E^N$.

Proof

The fact that $q_\tau(X, \cdot) \le q_{\max}$ holds since the normalization constants are lower-bounded, the Gaussian densities are uniformly bounded because $\tau_0 \ge \tau \ge \tau_{\min} > 0$, and all the other terms can be upper-bounded by $1$.

The other assertion is only true $Q_\tau(X, \cdot)$-almost everywhere on $E^N$: the lower bound on $q_\tau(X, Y)$ holds when $X$ and $Y$ have at least $N - 1$ points in common and both belong to $E^N$.

The following lower bounds are used: $G_{x, \tau\Sigma}^{-1} \ge 1$ for the normalization constants, uniform lower bounds on the selection probabilities $d^X_{i,j}/D_X$ and $1/2$, and a lower bound on the Gaussian densities of the form $(2\pi\tau_0)^{-d/2} \det(\Sigma)^{-1/2} \exp\!\big(-\xi\, \mathrm{diam}(E)^2/(2\tau_{\min})\big)$, where $\xi$ is the largest eigenvalue of $\Sigma^{-1}$.

$q_{\min} > 0$ is obtained by multiplying these expressions; it is a lower bound on $q_\tau(X, Y)$ which depends neither on $\tau$ nor on the states, provided $X \in E^N$ and $Y \in E^N$ have at least $N - 1$ points in common. ✷

Let us denote by $(\tau_t)_{t \ge 1}$ the values of $\tau$ used during the iterations of the algorithm. This lemma implies that, for a sequence of $N$ proposal kernels $(Q_{\tau_1}, \ldots, Q_{\tau_N})$, it is possible to reach any state $Y \in E^N$ from any state $X \in E^N$. Indeed, at each transition the density is lower-bounded by $q_{\min}$ and one of the $N$ points is moved. Hence, we get the following lemma:

Lemma 4.4. If $\tau_t$ is such that $\tau_0 \ge \tau_t \ge \tau_{\min}$ for all $t \ge 1$, there exists $\epsilon > 0$ such that, for all $A \in \mathcal{B}(E^N)$ (Borel subsets of $E^N$) and for all $X \in E^N$,

\[
(Q_{\tau_1} \cdots Q_{\tau_N})(X, A) \ \ge\ \epsilon\, \lambda(A)/\lambda(E^N), \tag{4.3}
\]
where $\lambda$ denotes the Lebesgue measure on the compact set $E^N$ ($\lambda(dX) = \mathbb{I}_{E^N}(X)\, \mathrm{Leb}(dX)$, where $\mathrm{Leb}$ is the Lebesgue measure on $(\mathbb{R}^d)^N$).

According to the previous comments, $\epsilon = q_{\min}^N$ suits.

Then, the Hastings-Metropolis (HM) kernel is considered. It is the global kernel which describes an iteration of the algorithm, and it obviously depends on the parameters $\beta$ and $\tau$. For $\beta$ and $\tau$ fixed, it reads
\[
K_{\beta,\tau}(X, dY) = a_{\beta,\tau}(X, Y)\, Q_\tau(X, dY) + \Big(1 - \int_{E^N} a_{\beta,\tau}(X, Z)\, Q_\tau(X, dZ)\Big)\, \delta_X(dY),
\]
where $a_{\beta,\tau}(X, Y) = \min\big(1,\ \exp(-\beta (U(Y) - U(X)))\, q_\tau(Y, X)/q_\tau(X, Y)\big)$ is the acceptance rate. In a simulated annealing algorithm, the target distribution is the Gibbs measure, i.e. $\mu_\beta(dX) = \exp(-\beta U(X))\, Z_\beta^{-1}\, \lambda(dX)$, where $Z_\beta = \int e^{-\beta U(Y)}\, \lambda(dY)$.

Lemma 4.5. $\mu_\beta$ is $K_{\beta,\tau}$-reversible for all $\tau, \beta$. This implies that $\mu_\beta$ is $K_{\beta,\tau}$-invariant.

Proof

If $X \ne Y$, we have $\mu_\beta(X)\, q_\tau(X, Y)\, a_{\beta,\tau}(X, Y) = \mu_\beta(Y)\, q_\tau(Y, X)\, a_{\beta,\tau}(Y, X)$. Indeed, if $\mu_\beta(Y)\, q_\tau(Y, X) > \mu_\beta(X)\, q_\tau(X, Y)$, then $a_{\beta,\tau}(X, Y) = 1$ and
\[
a_{\beta,\tau}(Y, X) = \frac{\mu_\beta(X)\, q_\tau(X, Y)}{\mu_\beta(Y)\, q_\tau(Y, X)}.
\]
The other case follows by symmetry in $X$ and $Y$.

Let $\bar{b}_{\beta,\tau}(X) = 1 - \int_{E^N} a_{\beta,\tau}(X, Z)\, Q_\tau(X, dZ)$. We have
\[
\mu_\beta(X)\, \bar{b}_{\beta,\tau}(X)\, \delta_X(dY) = \mu_\beta(Y)\, \bar{b}_{\beta,\tau}(Y)\, \delta_Y(dX).
\]
Indeed, this measure is non-zero only in the case $X = Y$. Therefore,
\[
\mu_\beta(dX)\, K_{\beta,\tau}(X, dY) = \mu_\beta(dY)\, K_{\beta,\tau}(Y, dX). \qquad ✷
\]

We will show the convergence of our algorithm by following the proof given in Bartoli and Del Moral (2001). Some adaptations are necessary, since our proposal kernel depends on a variance parameter $\tau$ and since there is no reversible measure for the proposal kernel $Q_\tau$. For these reasons, the ratio of the proposal densities has to appear in the acceptance rate $a_{\beta,\tau}$, since it makes $\mu_\beta$ $K_{\beta,\tau}$-reversible and hence invariant.

In Bartoli and Del Moral (2001), the reversibility of $K_\beta$ was shown thanks to the reversibility of $Q$. That is why the ratio of the proposal densities does not appear in the acceptance rate of their algorithm.

The next lemma states that, when $\beta$ is large, the target distribution $\mu_\beta$ concentrates on the minima of the function $U$. $U : E^N \to \mathbb{R}_+$ is lower-bounded with respect to $\lambda$ (the Lebesgue measure on the compact set $E^N$). We use the notation
\[
m = \sup\{a;\ \lambda(\{X;\ U(X) < a\}) = 0\},
\]
so that, by definition, $\lambda(\{X;\ U(X) < m\}) = 0$. Moreover, for all $\epsilon > 0$, we define $U_\lambda^\epsilon = \{X;\ U(X) \le m + \epsilon\}$, which is clearly such that $\lambda(U_\lambda^\epsilon) > 0$, and $U_\lambda^{\epsilon, c} = \{X;\ U(X) > m + \epsilon\}$.

Lemma 4.6. $\forall \epsilon > 0$, $\lim_{\beta \to \infty} \mu_\beta(U_\lambda^\epsilon) = 1$.

Proof

If $X \in U_\lambda^\epsilon$, then $e^{-\beta(U(X) - (m + \epsilon))} \ge 1$, and
\[
\lambda\big(e^{-\beta(U - (m+\epsilon))}\big) = \int e^{-\beta(U(X) - (m+\epsilon))}\, \lambda(dX) \ \ge\ \int \mathbb{I}_{U_\lambda^\epsilon}(X)\, e^{-\beta(U(X) - (m+\epsilon))}\, \lambda(dX) \ \ge\ \lambda(U_\lambda^\epsilon).
\]
Then,
\[
\mu_\beta(U_\lambda^{\epsilon,c}) = \frac{\lambda\big(\mathbb{I}_{U_\lambda^{\epsilon,c}}\, e^{-\beta(U - (m+\epsilon))}\big)}{\lambda\big(e^{-\beta(U - (m+\epsilon))}\big)} \ \le\ \frac{1}{\lambda(U_\lambda^\epsilon)} \int \mathbb{I}_{U_\lambda^{\epsilon,c}}(X)\, e^{-\beta(U(X) - (m+\epsilon))}\, \lambda(dX).
\]
The dominated convergence theorem can be applied to the integral on the right-hand side, since the integrand is bounded by $1$, which is integrable on the compact set $E^N$, and tends to $0$ pointwise on $U_\lambda^{\epsilon,c}$. Thus,
\[
\lim_{\beta \to \infty} \lambda\big(\mathbb{I}_{U_\lambda^{\epsilon,c}}\, e^{-\beta(U - (m+\epsilon))}\big) = 0,
\]
hence $\lim_{\beta \to \infty} \mu_\beta(U_\lambda^{\epsilon,c}) = 0$ and $\lim_{\beta \to \infty} \mu_\beta(U_\lambda^\epsilon) = 1$. ✷
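A toy numerical illustration of this concentration phenomenon, on a one-dimensional discretized example unrelated to the design problem itself:

```python
import numpy as np

# Discretized Gibbs measure mu_beta on a grid: the mass of {U <= m + eps}
# tends to 1 as beta grows, as stated in Lemma 4.6.
x = np.linspace(0.0, 1.0, 1001)
U_vals = (x - 0.3) ** 2          # toy criterion, minimum m = 0 attained at x = 0.3
eps = 1e-2
for beta in [1.0, 10.0, 100.0, 1000.0]:
    w = np.exp(-beta * U_vals)
    mu = w / w.sum()
    print(beta, mu[U_vals <= U_vals.min() + eps].sum())  # -> 1 as beta -> infinity
```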

The distribution of the Markov chain associated with an inverse cooling schedule $t \mapsto \beta(t)$ and a variance schedule $t \mapsto \tau(t)$ is denoted by $\eta_n$. According to the previous results, we have $\eta_{n+1} = \eta_n K_{\beta(n), \tau(n)}$ and $\mu_{\beta(n)} = \mu_{\beta(n)} K_{\beta(n), \tau(n)}$. The aim is to prove that
\[
\lim_{n \to \infty} \|\eta_n - \mu_{\beta(n)}\| = 0,
\]
where $\|\cdot\|$ denotes the total variation norm. In the sequel, $\mathrm{osc}(U)$ is the smallest positive number $h$ such that, for all $X, Y$ in $E^N$, $U(Y) - U(X) \le h$.

Lemma 4.7. For all $X, Y \in E^N$,
\[
a_{\beta,\tau}(X, Y) \ \ge\ e^{-\beta\, \mathrm{osc}(U)}\, \frac{q_{\min}}{q_{\max}}.
\]

According to this bound on $a_{\beta,\tau}(X, Y)$, for $(X, A) \in E^N \times \mathcal{B}(E^N)$,
\[
K_{\beta,\tau}(X, A) \ \ge\ e^{-\beta\, \mathrm{osc}(U)}\, \frac{q_{\min}}{q_{\max}}\, Q_\tau(X, A).
\]
For a product of $p = N$ kernels and thanks to condition (4.3), this yields an upper bound on the Dobrushin contraction coefficient. As a consequence, an application of the Dobrushin theorem states that there exists $\epsilon > 0$ such that, for all probability measures $\mu_1, \mu_2$ on $E^N$,
\[
\|\mu_1 K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_2 K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N}\| \ \le\ \big(1 - \epsilon\, e^{-N \beta_N\, \mathrm{osc}(U)}\big)\, \|\mu_1 - \mu_2\|.
\]

Lemma 4.8. For a function $U : E^N \to \mathbb{R}_+$ such that $\lambda(e^{-U}) > 0$, these notations are used:
\[
\mu_U(dX) = Z_U^{-1}\, e^{-U(X)}\, \lambda(dX), \quad \text{where } Z_U = \lambda(e^{-U}).
\]
Thanks to the Dobrushin theorem, we get
\[
\|\mu_{U_1} - \mu_{U_2}\| \ \le\ 1 - \exp(-\mathrm{osc}(U_1 - U_2)) \ \le\ \mathrm{osc}(U_1 - U_2).
\]
If this lemma is applied to $U_1 = \beta_1 U$ and $U_2 = \beta_2 U$, $0 < \beta_1 < \beta_2$, then an upper bound is obtained on the Gibbs measures:
\[
\|\mu_{\beta_1} - \mu_{\beta_2}\| \ \le\ (\beta_2 - \beta_1)\, \mathrm{osc}(U).
\]

The next lemma is useful in order to choose the function $n \mapsto \beta_n$.

Lemma 4.9. Let $I_n$, $a_n$, $b_n$, $n \ge 0$, be three sequences of positive numbers such that $\forall n \ge 1$, $I_n \le (1 - a_n) I_{n-1} + b_n$. If $a_n$ and $b_n$ are such that $\lim_{n\to\infty} b_n/a_n = 0$ and $\lim_{n\to\infty} \prod_{p=1}^n (1 - a_p) = 0$, then
\[
\lim_{n \to \infty} I_n = 0.
\]

Proof

According to the assumptions, for all $\epsilon > 0$ there exists an integer $n(\epsilon) \ge 1$ such that
\[
\forall n \ge n(\epsilon), \quad b_n \le \epsilon\, a_n \quad \text{and} \quad \prod_{p=1}^n (1 - a_p) \le \epsilon.
\]
As a consequence, for all these $n \ge n(\epsilon)$, it holds that
\[
I_n - \epsilon \ \le\ (1 - a_n) I_{n-1} + b_n - \epsilon \ \le\ (1 - a_n) I_{n-1} - \epsilon (1 - a_n) \ =\ (1 - a_n)(I_{n-1} - \epsilon) \ \le\ \cdots \ \le\ \Big(\prod_{p=1}^n (1 - a_p)\Big)(I_0 - \epsilon).
\]
It implies that, for all $n \ge n(\epsilon)$,
\[
0 \le I_n \le \epsilon + \epsilon\,(I_0 + \epsilon) \le \epsilon\,(1 + \epsilon + |I_0|),
\]
which ends the proof. ✷
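A quick numerical check of the lemma with, for instance, $a_n = 1/(n+1)$ and $b_n = 1/(n+1)^2$, so that $b_n/a_n \to 0$ and $\prod_{p \le n}(1 - a_p) = 1/(n+1) \to 0$:

```python
# Illustration of Lemma 4.9: iterate the recursion with equality and observe I_n -> 0.
I = 1.0
for n in range(1, 100001):
    a_n = 1.0 / (n + 1)
    b_n = 1.0 / (n + 1) ** 2
    I = (1 - a_n) * I + b_n
print(I)  # close to 0 (of order log(n)/n)
```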

The convergence Theorem can now be stated.

Theorem 4.1. If the sequence $(\tau_n)_{n \ge 0}$ is such that $\forall n \ge 0$, $\tau_0 \ge \tau_n \ge \tau_{\min} > 0$, and if
\[
\beta_n = \frac{1}{C} \log(n + e), \quad C > N\, \mathrm{osc}(U),
\]
we get
\[
\forall \epsilon > 0, \quad \lim_{n\to\infty} \mathbb{P}_\eta(X_n \in U_\lambda^\epsilon) = 1,
\]
where $U_\lambda^\epsilon = \{X \in E^N;\ U(X) \le m + \epsilon\}$ and $\{X_n;\ n \ge 0\}$ denotes the random sequence obtained from the simulated annealing algorithm with an initial probability distribution $\eta$ on $E^N$.

Proof

For any non-decreasing sequence
\[
0 \le \beta_1 \le \ldots \le \beta_{N+1}
\]
and for every probability distribution $\eta$ on $E^N$, it is first noticed that
\[
\|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\| \ \le\ \|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N}\| + \|\mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\|.
\]
Thanks to the remark following Lemma 4.7, for some $\epsilon > 0$, it holds that
\[
\|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N}\| \ \le\ \big(1 - \epsilon\, e^{-N \beta_N\, \mathrm{osc}(U)}\big)\, \|\eta - \mu_{\beta_1}\|. \tag{4.4}
\]
For the second term, the following decomposition is used:

\[
\mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}} \ =\ \sum_{k=1}^{N} \big(\mu_{\beta_k} - \mu_{\beta_{k+1}}\big)\, K_{\beta_{k+1},\tau_{k+1}} \cdots K_{\beta_N,\tau_N},
\]
which follows from $\mu_{\beta_k} K_{\beta_k,\tau_k} = \mu_{\beta_k}$. The total variation norm of each term is bounded using $\|(\mu - \nu) K\| \le b(K)\, \|\mu - \nu\| \le \|\mu - \nu\|$, where $b$ is the contraction coefficient (by the Dobrushin theorem, $a(K) + b(K) = 1$).

An application of Lemma 4.8 gives

\[
\|\mu_{\beta_1} K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\| \ \le\ \mathrm{osc}(U) \sum_{k=1}^{N} (\beta_{k+1} - \beta_k) \ =\ (\beta_{N+1} - \beta_1)\, \mathrm{osc}(U). \tag{4.5}
\]
By combining (4.4) and (4.5), it is deduced that

\[
\|\eta K_{\beta_1,\tau_1} \cdots K_{\beta_N,\tau_N} - \mu_{\beta_{N+1}}\| \ \le\ \big(1 - \epsilon\, e^{-N \beta_N\, \mathrm{osc}(U)}\big)\, \|\eta - \mu_{\beta_1}\| + (\beta_{N+1} - \beta_1)\, \mathrm{osc}(U).
\]
Instead of $(\beta_1, \ldots, \beta_N)$ and $\eta$, we take $(\beta_{kN}, \ldots, \beta_{(k+1)N})$ and $\eta_{kN}$:

\[
I_{k+1} = \|\eta_{kN} K_{\beta_{kN},\tau_{kN}} \cdots K_{\beta_{(k+1)N},\tau_{(k+1)N}} - \mu_{\beta_{(k+1)N}}\| = \|\eta_{(k+1)N} - \mu_{\beta_{(k+1)N}}\|.
\]
By the previous upper bound, the recursive inequalities are obtained:
\[
I_{k+1} \ \le\ (1 - a_{k+1})\, I_k + b_{k+1}, \quad \text{with } a_{k+1} = \epsilon\, e^{-N \beta_{(k+1)N}\, \mathrm{osc}(U)} \text{ and } b_{k+1} = (\beta_{(k+1)N} - \beta_{kN})\, \mathrm{osc}(U).
\]
It can be checked that, if $C > N\, \mathrm{osc}(U)$, then $b_{k+1}/a_{k+1} \to 0$ and $\prod_{p=1}^{k} (1 - a_p) \to 0$, so that Lemma 4.9 gives $\lim_{k \to \infty} I_k = 0$. Together with Lemma 4.6, this yields the announced result. ✷
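In practice, the schedules of Theorem 4.1 can be implemented as in the following sketch; the decreasing choice of $\tau_n$ is arbitrary (any sequence staying in $[\tau_{\min}, \tau_0]$ is allowed), and $C$ may for instance be taken slightly larger than $N\,\mathrm{diam}(E)$, since $\mathrm{osc}(U) \le \mathrm{diam}(E)$ for $U = \mathrm{diam}(E) - \delta_X$.

```python
import numpy as np

def schedules(n, C, tau0, tau_min):
    """Cooling and variance schedules compatible with Theorem 4.1 (illustrative).

    beta_n = (1/C) * log(n + e) with C > N * osc(U); tau_n may vary freely as
    long as tau_min <= tau_n <= tau0 (here, one arbitrary decreasing choice).
    """
    beta_n = np.log(n + np.e) / C
    tau_n = max(tau_min, tau0 / (1.0 + np.log(1.0 + n)))
    return beta_n, tau_n

# Example: N = 20 points in a domain of diameter 1, so e.g. C = 1.01 * 20 * 1.0.
# beta_t, tau_t = schedules(t, C=20.2, tau0=0.1, tau_min=1e-3)
```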

In the case where $E$ is not explicit, the normalization constant $G_{m,S}$ of a Gaussian distribution with mean $m$ and covariance matrix $S$ cannot be computed. Hence, the ratio of the proposal densities is not tractable. In that case, we first propose to use an unconstrained Gaussian random walk as a proposal. Steps 3 and 4 of Algorithm 4.1 are modified.

Algorithm 4.2. The first steps until step 3 are the same.

Step 3 is replaced with

3bis. A Gaussian random walk is used to propose a new point: $x_k^{\mathrm{prop}} \sim \mathcal{N}_d(x_k^{(t)}, \tau \Sigma)$. And step 4 is replaced with

4bis. If $X^{\mathrm{prop}} \in E^N$, $X^{(t+1)} = X^{\mathrm{prop}}$ with probability
\[
\min\left(1,\ \exp\!\big(-\beta_t \big(U(X^{\mathrm{prop}}) - U(X^{(t)})\big)\big)\, \frac{\tilde{q}_\tau(X^{\mathrm{prop}}, X^{(t)})}{\tilde{q}_\tau(X^{(t)}, X^{\mathrm{prop}})}\right);
\]
otherwise, $X^{(t+1)} = X^{(t)}$.

In the last step, $\tilde{q}_\tau(X, \cdot)$ stands for the density of the proposal kernel in which the Gaussian random walk is not constrained to remain in the domain $E$. For any $X \in E^N$ and $Y \in (\mathbb{R}^d)^N$, $\tilde{q}_\tau(X, Y)$ is obtained from $q_\tau(X, Y)$ by replacing the truncated Gaussian densities $\varphi(\cdot \mid x_k, \tau\Sigma)/G_{x_k, \tau\Sigma}$ by the untruncated densities $\varphi(\cdot \mid x_k, \tau\Sigma)$.

Since a lemma similar to Lemma 4.3 can be proved for the kernel $\tilde{Q}_\tau$ (corresponding to the density $\tilde{q}_\tau$), Theorem 4.1 still applies to it. Hence, there is also a convergence result for Algorithm 4.2.

However, since a point can be proposed outside of the domainE, this algorithm can suffer from a lack of efficiency. Another solution is to use the first algorithm without the ratio of densities of proposal kernels.
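For completeness, here is an illustrative Python sketch of steps 3bis and 4bis, assuming the pair and the point $x_k^{(t)}$ have already been selected as in steps 1 and 2, and assuming that $\tilde{q}_\tau$ takes the mixture form implied by those steps (so that the symmetric Gaussian factors cancel and only the pair-selection weights remain in the ratio); all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def move_weight(X, k, alpha):
    """Probability that point k is the one moved: sum_{j != k} d^X_{k,j} / (2 D_X)."""
    N = X.shape[0]
    d_k = sum(1.0 / (np.linalg.norm(X[k] - X[j]) + alpha)
              for j in range(N) if j != k)
    D = sum(1.0 / (np.linalg.norm(X[i] - X[j]) + alpha)
            for i in range(N) for j in range(i + 1, N))
    return d_k / (2.0 * D)

def step_3bis_4bis(X, k, beta_t, tau, Sigma, alpha, U, in_E):
    """Steps 3bis and 4bis: unconstrained Gaussian proposal, rejected if it leaves E."""
    x_prop = rng.multivariate_normal(X[k], tau * Sigma)   # step 3bis
    if not in_E(x_prop):                                  # step 4bis: X^prop must lie in E^N
        return X
    Y = X.copy()
    Y[k] = x_prop
    # The Gaussian factor is symmetric in (x_k, x_prop), so the q~ ratio reduces
    # to the ratio of the move weights under Y and under X.
    log_ratio = (-beta_t * (U(Y) - U(X))
                 + np.log(move_weight(Y, k, alpha))
                 - np.log(move_weight(X, k, alpha)))
    return Y if np.log(rng.random()) < min(0.0, log_ratio) else X
```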