ANNALES DE L'I. H. P., SECTION B

ALAIN TROUVÉ
Rough large deviation estimates for the optimal convergence speed exponent of generalized simulated annealing algorithms

Annales de l'I. H. P., section B, tome 32, no 3 (1996), p. 299-348
<http://www.numdam.org/item?id=AIHPB_1996__32_3_299_0>
© Gauthier-Villars, 1996, tous droits réservés.
Rough large deviation estimates for the optimal convergence speed exponent of generalized simulated annealing algorithms

Alain TROUVÉ

LMENS/DIAM, URA 762, Ecole Normale Supérieure, 45, rue d'Ulm, 75230 Paris Cedex 05, France. E-mail address: trouve@ens.ens.fr

Vol. 32, n° 3, 1996, p. 299-348. Probabilités et Statistiques
ABSTRACT. - We study the convergence speed of generalized simulated annealing algorithms. Large deviation estimates, uniform in the cooling schedule, are established for the exit from the cycles, in the spirit of Catoni's work on sequential annealing [2]. We compute the optimal convergence speed exponent, proving a conjecture of R. Azencott [1].
RÉSUMÉ. - In this paper, we study the convergence speed of generalized simulated annealing algorithms. We establish large deviation estimates, uniform in the cooling schedule, for the exit times and exit laws of the cycles, in the spirit of O. Catoni's work on sequential annealing [2]. We then obtain the value of the optimal convergence speed exponent over decreasing cooling schedules, conjectured by R. Azencott [1].
CONTENTS

1. Introduction
1.1. From sequential to generalized simulated annealing
1.2. Theoretical framework
1.3. Outlines
2. The cycle decomposition
3. Renormalization of the communication cost
4. The localization theorems
4.1. Survival kernels
4.2. Localization kernels
4.3. Main large deviation estimates
5. Necessary and sufficient condition for convergence
6. Optimal convergence speed exponent
6.1. Upper bound
6.2. Lower bound
7. Conclusion

Annales de l'Institut Henri Poincaré - Probabilités et Statistiques - 0246-0203
Vol. 32/96/03/$ 4.00/© Gauthier-Villars
1. INTRODUCTION

1.1. From sequential to generalized simulated annealing

Sequential simulated annealing is a very general and flexible method for finding good solutions to hard optimization problems. Its principle is surprisingly simple and was formulated in the early 1980s [12].
Consider a real valued function U to be minimized on a finite set E called the state space (or the configuration space). Now, let an irreducible Markov kernel q on E be given and define for all T > 0 the Markov kernel Q_T on E such that for any i, j ∈ E, i ≠ j,

    Q_T(i, j) = q(i, j) exp(-(U(j) - U(i))_+ / T),     (1)

where x_+ denotes the positive part of x. For T = 0, denote Q_0 = lim_{T→0} Q_T.
A homogeneous Markov chain X = (X_n)_{n∈N} with transition kernel Q_0 satisfies U(X_{n+1}) ≤ U(X_n) and is trapped in a finite number of steps in a local minimum of U. For T > 0, Q_T is a perturbation of the previous mechanism allowing hill-climbing moves. Assuming that q(i, j) = q(j, i), Q_T admits π_T as unique equilibrium probability measure, which satisfies for all i ∈ E

    π_T(i) = exp(-U(i)/T) / Σ_{j∈E} exp(-U(j)/T).     (2)

The parameter T is physically interpreted as a temperature, and one verifies that

    lim_{T→0} π_T({ i ∈ E | U(i) = min U }) = 1.     (3)

This particular feature is the key of the simulated annealing approach.
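The kernel Q_T and the equilibrium measure π_T just described can be sketched in code. The toy states, energies and proposal kernel q below are our own illustrative choices, not taken from the paper:

```python
import math

def metropolis_kernel(U, q, T):
    """Q_T(i, j) = q(i, j) * exp(-(U(j) - U(i))_+ / T) for i != j;
    the diagonal absorbs the remaining mass so each row sums to 1."""
    states = list(U)
    Q = {}
    for i in states:
        stay = 1.0
        for j in states:
            if j == i:
                continue
            p = q.get((i, j), 0.0) * math.exp(-max(U[j] - U[i], 0.0) / T)
            Q[(i, j)] = p
            stay -= p
        Q[(i, i)] = stay
    return Q

def gibbs_measure(U, T):
    """pi_T(i) proportional to exp(-U(i) / T)."""
    Z = sum(math.exp(-u / T) for u in U.values())
    return {i: math.exp(-U[i] / T) / Z for i in U}

U = {0: 1.0, 1: 0.0, 2: 0.5}                       # unique minimum at state 1
q = {(i, j): 0.5 for i in U for j in U if i != j}  # symmetric proposal kernel
Q = metropolis_kernel(U, q, T=1.0)
pi = gibbs_measure(U, T=0.05)                       # low temperature
```

With q symmetric, one can check detailed balance π_T(i) Q_T(i, j) = π_T(j) Q_T(j, i), and at T = 0.05 virtually all of the mass of π_T sits on the minimizer, illustrating the limit above.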
Considering a decreasing cooling schedule T = (T_n)_{n∈N}, i.e. a sequence of decreasing temperatures, we define an inhomogeneous Markov chain X = (X_n)_{n∈N} with transition kernel Q_{T_{n+1}} at time n. The intuitive idea is that for sufficiently slowly decreasing cooling schedules, the law of X_n should be close to π_{T_n} and we should have

    lim_{n→+∞} P(U(X_n) = min U) = 1.     (4)

Many results are known on sequential simulated annealing.
Since Hajek's paper [8], one knows that there exists H > 0 such that for all decreasing cooling schedules T vanishing to zero, (4) holds if and only if Σ_n exp(-H/T_n) = +∞. This problem of convergence can be extended to the more general problem of convergence on the level sets E_λ = { i ∈ E | U(i) > min U + λ } for λ > 0. This problem has been successfully solved by O. Catoni in [2], who proves that there exists r_λ > 0 such that if T is a decreasing cooling schedule vanishing to zero, then

    lim_{n→+∞} P(X_n ∈ E_λ) = 0  if and only if  Σ_n exp(-r_λ/T_n) = +∞.     (5)
More crucial for practical use of sequential simulated annealing is the rate of convergence. More precisely, Catoni's work [2] gives that there exist α_λ > 0, ã_λ > 0, K_1 > 0 and K_2 > 0 for which matching lower and upper bounds hold. The lower bound says that the mass in E_λ cannot vanish faster than a power of 1/n, and that this power is upper bounded by α_λ. Conversely, the upper bound says that for each N ∈ N there exists a decreasing cooling schedule T^N such that, for this cooling schedule, the mass in E_λ at the horizon N is of order N^{-ã_λ} (this is estimate (8)).
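The role of the divergence criterion can be seen on logarithmic schedules T_n = c/ln(n + 2): then exp(-H/T_n) = (n + 2)^{-H/c}, so the series diverges exactly when c ≥ H. A quick numerical check (with illustrative values of c and H of our own choosing):

```python
import math

def partial_sum(c, H, N):
    """Partial sum of exp(-H / T_n) for the schedule T_n = c / ln(n + 2)."""
    return sum(math.exp(-H * math.log(n + 2) / c) for n in range(N))

# c = H: terms are 1/(n + 2), so partial sums grow like ln N (divergence);
# c = H/2: terms are (n + 2)^(-2), so the partial sums stay bounded.
slow = partial_sum(1.0, 1.0, 10_000)
fast = partial_sum(0.5, 1.0, 10_000)
```

Here `slow` keeps growing with N while `fast` stays below Σ 1/m² = π²/6 − 1 ≈ 0.64, matching the dichotomy above.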
Note that we may have T^N ≠ T^{N'} for N ≠ N', and in fact the proof of (8) shows that one should design the cooling schedule according to the horizon. The values of α_λ and ã_λ are given explicitly in function of U and q (even if their numerical computation is a hard combinatorial problem). Since for sufficiently small values of λ we have α_λ = ã_λ, we deduce that we cannot expect, for the convergence to the set of global minima of U, a better speed exponent with decreasing cooling schedules than this common value.
schedules than In manypractical situation, computer
scientists haveproposed
modifiedversion of the
sequential
simulatedannealing
in order to increase thespeed
of convergence. One of the most
promising
issues is theparallelization
of thealgorithm
in order to distribute thecomputations
on several processes. As an illustration of such anapproach,
let us have a look onsequential
simulatedannealing
inimage processing (see [6]).
In this case, theconfiguration
space E is a
product
space E =LS
where S is a set ofpixels
and L aset of labels
(usually
greylevels).
For each i EE, z(s)
is the value of the grey level atpixel s
E S. Thesequential annealing
process has in this framework thefollowing particular
form. One define first afamily
of localupdating
kernelsby
where il’,s
is theconfiguration j
E E suchthat j ( s )
= l’and j (Sf)
=for s’ ~
s(only
thepixel
s ischanged). Now,
let afamily
exhausting
the elements of S begiven (generally
we choose asweeping
line
by
line of theimage).
We define thefamily Q
= of transition kernels of thesequential
simulatedannealing by
Rigorously speaking, Q_T cannot be written in the form (1) and the previous results do not apply. However, an extension of them to this case does not imply deep modifications of the simulated annealing theory, since π_T is still the unique invariant probability measure of Q_T and since Q_T is reversible for T > 0. Now, consider the following parallelization of the sequential process: assume that one has a small processor attached to each pixel, and assume that all the pixels are updated synchronously according to the local updating rule. We define then a new transition matrix given by the synchronous product of the local updating rules. One can expect that the speed of convergence is increased by a factor |S|, since all the labels are updated in one unit of time (instead of |S| units of time previously). However, some very important changes have occurred owing to the interactions between processors, and we have to generalize the framework of simulated annealing.
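The difference between the sequential sweep and the synchronous update can be made concrete on a toy one-dimensional "image": a ring of binary pixels with an Ising-type energy counting disagreeing neighbours (our own toy energy, chosen for illustration). In the T → 0 limit each local update becomes a greedy best-label choice; a sequential sweep then relaxes a checkerboard to a ground state, while the synchronous update flips every pixel at once and oscillates forever, a symptom of the interactions between processors mentioned above:

```python
def energy(x):
    """Number of disagreeing neighbour pairs on a ring of pixels."""
    n = len(x)
    return sum(x[s] != x[(s + 1) % n] for s in range(n))

def best_label(x, s, labels=(0, 1)):
    """Greedy (T -> 0) local update at pixel s; keeps the current label on ties."""
    n = len(x)
    def local_cost(l):
        return (l != x[(s - 1) % n]) + (l != x[(s + 1) % n])
    return min(labels, key=lambda l: (local_cost(l), l != x[s]))

def sequential_sweep(x):
    x = list(x)
    for s in range(len(x)):   # pixels updated one after the other
        x[s] = best_label(x, s)
    return x

def synchronous_step(x):
    return [best_label(x, s) for s in range(len(x))]  # all pixels at once

checker = [s % 2 for s in range(6)]    # checkerboard configuration
relaxed = sequential_sweep(checker)    # reaches a constant image (energy 0)
flipped = synchronous_step(checker)    # complement: still a checkerboard
```

Applying `synchronous_step` twice returns the initial checkerboard: the parallel dynamics cycles with period 2 and never reaches the minimum of the energy.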
Instead of the form (1), we can only say that there exist an irreducible Markov kernel q on E, a real number κ ∈ [1, +∞[ and a family of nonnegative numbers (V(i, j))_{i,j∈E} such that, for all i ≠ j,

    (1/κ) q(i, j) exp(-V(i, j)/T) ≤ K_T(i, j) ≤ κ q(i, j) exp(-V(i, j)/T).

For the classical sequential annealing, we have κ = 1 and V(i, j) = (U(j) - U(i))_+, so that V(i, j) - V(j, i) = U(j) - U(i). This property, which is a consequence of the reversibility of the sequential annealing transition kernel, does not hold any more for the kernel K_T (in fact, there does not exist any function W such that V(i, j) - V(j, i) = W(j) - W(i)). As a consequence, none of the previous results can apply, and we need an extended theory to handle the convergence properties of parallel versions of sequential simulated annealing. This extension will be the main subject of this paper. We should emphasize the fact that many stochastic optimization algorithms fall into the scope of this extension, and not only the parallel versions of the sequential annealing for image processing presented above. This extension covers in fact all the small random perturbations (with discrete time and finite state space) of dynamical systems as developed by Wentzell and Freidlin in [5]. The point is to handle here the convergence properties of such processes when the perturbation vanishes with time.

1.2. Theoretical framework
We introduce now precisely our theoretical framework. Let E be a finite set which will stay fixed throughout this work.

DEFINITION 1.1. - Let κ ∈ [1, +∞[ and let q be an irreducible Markov kernel on E. We say that a continuously parametered family Q = (Q_T)_{T≥0} of Markov kernels on E is admissible for κ and q if and only if there exists a family V = (V(i, j))_{i,j∈E} of nonnegative numbers such that, for all T > 0 and all i ≠ j,

    (1/κ) q(i, j) exp(-V(i, j)/T) ≤ Q_T(i, j) ≤ κ q(i, j) exp(-V(i, j)/T).

For T = 0, we take the convention that exp(-V(i, j)/T) = 1 if V(i, j) = 0 and 0 otherwise. The family V is called the communication cost, and the set of all the admissible families Q for q and κ is denoted by A(q, κ).
Remark 1. - As a main extension of the standard sequential simulated annealing transition kernels, we do not assume that q is symmetric, and V can be completely arbitrary.

In the following definition, we define the probability measure associated with an abstract annealing process on E.

DEFINITION 1.2. - Let X = (X_n)_{n∈N} be the coordinate process on E^N. For all θ = (Q, T, ν_0) where
. Q = (Q_T)_{T≥0} is a continuously parametered family of Markov kernels on E,
. T = (T_n)_{n∈N} is a sequence of nonnegative real numbers called the cooling schedule,
. ν_0 is a probability measure on E called the initial distribution,
we denote by P_θ the unique probability measure on E^N for the σ-algebra σ(X_n, n ≥ 0) such that X is under P_θ a Markov chain with initial distribution ν_0 and transition kernel Q_{T_{n+1}} at time n.

We can now define the extended notion of generalized simulated annealing.
DEFINITION 1.3. - Let q be an irreducible Markov kernel on E, let κ ∈ [1, +∞[ and let X = (X_n)_{n∈N} be the coordinate process on E^N. We say that X is a generalized simulated annealing process (G.S.A.) with parameter (q, κ, P) if there exist Q ∈ A(q, κ), a cooling schedule T and an initial distribution ν_0 such that P = P_θ with θ = (Q, T, ν_0). Moreover, as usual, P(· | X_0 = i) will denote the probability measure P_{θ_i} where, for all i ∈ E, θ_i = (Q, T, δ_i).

Remark 2. - An equivalent framework has been considered by Hwang and Sheu in [10]. Their results will be reported below in this section.

Considering an arbitrary communication cost V, a G.S.A. is not defined around an explicit energy function U. However, there exists an implicit function W, introduced by the definitions below, which plays the same role for low temperatures. This function depends on the communication cost V through a functional on A-graphs that we define now.

DEFINITION 1.4. - Let A ⊂ E. We say that a set g of arrows i → j in A^c × E is an A-graph iff:
(1) for each i ∈ A^c, there exists a unique j ∈ E such that i → j ∈ g;
(2) for each i ∈ A^c, there is a path in g ending on a configuration in A.

We denote by G(A) the set of the A-graphs.
DEFINITION 1.5. - Let q be an irreducible Markov kernel on E, let κ ∈ [1, +∞[ and let Q ∈ A(q, κ). Let V be the associated communication cost. For each i ∈ E and each g ∈ G({i}), we denote V(g) = Σ_{(j→k)∈g} V(j, k). We say that W : E → R_+ is the virtual energy associated with Q if and only if for each i ∈ E

    W(i) = min_{g∈G({i})} V(g) - min_{j∈E} min_{g∈G({j})} V(g).

Remark 3. - Since V(i, j) < +∞ if and only if q(i, j) > 0, and since q is irreducible on E, one easily verifies that W(i) < +∞. The above construction of W has been given by Wentzell and Freidlin in [5], as well as the following proposition.
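On a very small state space, the minimization over {i}-graphs in Definition 1.5 can be done by brute force. The sketch below (our own illustration) enumerates all assignments of one outgoing arrow per state j ≠ i, keeps those from which every state reaches i, and normalizes the minimal cost. The three-state cost V is the sequential case V(i, j) = (U(j) - U(i))_+ on a small line graph of our own choosing, for which W recovers U - min U:

```python
import itertools

def min_graph_cost(V, states, i):
    """Minimal cost V(g) over {i}-graphs g: each j != i carries one arrow
    j -> g(j), and following the arrows from any j must lead to i."""
    others = [j for j in states if j != i]
    best = float("inf")
    for targets in itertools.product(states, repeat=len(others)):
        g = dict(zip(others, targets))
        if any(g[j] == j for j in others):
            continue
        ok = True
        for j in others:          # check that j reaches i along the arrows
            seen, k = set(), j
            while k != i and ok:
                if k in seen:     # cycle avoiding i: not an {i}-graph
                    ok = False
                seen.add(k)
                k = g[k]
            if not ok:
                break
        if ok:
            cost = sum(V.get((j, g[j]), float("inf")) for j in others)
            best = min(best, cost)
    return best

def virtual_energy(V, states):
    raw = {i: min_graph_cost(V, states, i) for i in states}
    m = min(raw.values())
    return {i: raw[i] - m for i in states}

# line graph 0 - 1 - 2 with U = (0, 2, 1) and V(i, j) = (U(j) - U(i))_+
V = {(0, 1): 2, (1, 0): 0, (1, 2): 0, (2, 1): 1}
W = virtual_energy(V, [0, 1, 2])
```

On this reversible example the computed W equals U − min U, consistent with the remark above on the sequential case.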
PROPOSITION 1.6. - Let q be an irreducible Markov kernel on E, let κ ∈ [1, +∞[ and let Q ∈ A(q, κ). For each T > 0, Q_T is irreducible and we denote by μ_T its unique invariant probability measure. Then, if W is the virtual energy associated with Q, we have, for some constant A ≥ 1 and all i ∈ E,

    (1/A) exp(-(W(i) - min W)/T) ≤ μ_T(i) ≤ A exp(-(W(i) - min W)/T).

Remark 4. - This proposition shows that μ_T({ i ∈ E | W(i) = min W }) → 1 when the temperature vanishes. Moreover, μ_T(i) is of the order of exp(-(W(i) - min W)/T) for low temperature, which should be compared to π_T(i), of the order of exp(-(U(i) - min U)/T) for the sequential simulated annealing. Hence a G.S.A. is an optimizing process for the virtual energy.

1.3. Outlines
In this general framework, one can easily show that the generalized simulated annealing algorithm converges, for sufficiently slowly decreasing cooling schedules, to the global minima of a virtual energy function W. For cooling schedules T_n = c/ln(n + 2), a necessary and sufficient condition on the value of c for this convergence is given in [3], [10], [9] and [13], without giving the optimal convergence speed, in particular because such a result needs large deviation estimates uniform in the cooling schedule, as established by O. Catoni in [2]. Our main goal will be here to prove that the approach of O. Catoni can be successfully extended to the general framework, and gives the value of the optimal convergence speed exponent conjectured by R. Azencott in [1].
We will start in Section 2 with the cycle decomposition of the state space E. Roughly speaking, a cycle is a subset of E which can be considered as a point for sufficiently low temperature. One of their important properties is that at constant temperature T, the exit time of a cycle Π is of order exp(H_e(Π)/T), whatever the starting point in Π is. The nonnegative number H_e(Π) is called the exit height of Π. The cycles are structured on a tree whose root is E and whose leaves are the singletons, since two cycles Π_1 and Π_2 are either disjoint or included one in the other. We will recall the recursive construction of the cycles, which depends only on the communication cost V of the G.S.A., and we will show the link between the virtual energy and the cycle decomposition.
In Section 3, we will introduce, for each A ⊂ E, i ∈ A and j ∈ E, a new cost C_A(i, j) called the renormalized communication cost in A. Starting from i ∈ A, the probability that the process at constant temperature visits the edge (i, j) before the exit time of A (included) is of order exp(-C_A(i, j)/T). This renormalized cost in A will be expressed in function of the cycle decomposition.

Both previous sections are in fact devoted to the combinatorial definitions of the cycle decomposition and of the renormalized costs, without any results on their probabilistic meaning. We will state in Section 4 large deviation estimates on the exit time and exit point of subsets of E, which will enlighten the probabilistic role of the quantities mentioned above. We will handle large deviation estimates for arbitrary decreasing cooling schedules, which are the keys for the study of the convergence of the G.S.A.

As done for the sequential simulated annealing, the problem of convergence can be written in the following way. Let λ > 0 and denote E_λ = { i ∈ E | W(i) > min W + λ }. Our first convergence result, established in Section 5, concerns the necessary and sufficient condition on the cooling schedule T for which the mass in E_λ vanishes when the number of steps increases:

Let q be an irreducible Markov kernel on E and let κ ∈ [1, +∞[. Let X be a G.S.A. with parameter (q, κ, P) and let T be the underlying cooling schedule. Assume that T is decreasing (T_n ≥ T_{n+1}) and vanishes when n tends to infinity. Then for all λ > 0 there exists r_λ > 0 (independent of T) such that

    lim_{n→+∞} P(X_n ∈ E_λ) = 0  if and only if  Σ_n exp(-r_λ/T_n) = +∞.

The value of r_λ is explicitly given in function of the cycle decomposition.
Finally, in Section 6, we study the optimal convergence speed and we establish the following result, which says that if q is an irreducible Markov kernel on E, if κ ∈ [1, +∞[ and if X is a G.S.A. with parameter (q, κ, P) whose underlying cooling schedule T is decreasing, then, assuming that the underlying family Q ∈ A(q, κ) satisfies an additional weak condition C1, there exists b > 0 (which depends on Q but not on T) such that, for all levels λ > 0 and all n > 0, the mass in E_λ is bounded below by b/n^{α_λ}, where the exponent α_λ is defined through the cycle decomposition and W(Π) = inf_{i∈Π} W(i).

We finally establish a last result which says that we can construct, for each N, a decreasing cooling schedule T^N such that, if P^N denotes the corresponding law, then there exists K_1 > 0 such that P^N(X_N ∈ E_λ) ≤ K_1/N^{ã_λ}, where ã_λ is an explicit exponent. This exponent is optimal for sufficiently small λ > 0, since we have then ã_λ = α_λ. As a corollary, we prove a conjecture of Azencott [1] on the optimal convergence speed exponent for the convergence towards the global minima of the virtual energy W.

2. THE CYCLE DECOMPOSITION
Let q be an irreducible Markov kernel on E and let κ ≥ 1. Let Q ∈ A(q, κ) and denote by V the underlying communication cost. We present now the cycle decomposition of E, which depends only on V. The proofs of the probabilistic properties of the cycles are postponed to Section 4, where we will study the exit time and the exit point from a cycle.

The cycle decomposition for a given communication cost has been introduced by Wentzell and Freidlin [5], and we recall it briefly for completeness. We need first to define the usual notion of path associated with a cost function.
DEFINITION 2.1. - Let F be a finite set, let C : F × F → [0, +∞] be a function (such a function will be called a communication cost function on F), and let i and j be two distinct configurations of F.
(1) We denote by Pth_F(i, j) the set of all paths g = (g_0, ..., g_n) in F such that g_0 = i and g_n = j. The path dependent integer n will be denoted n_g and called the length of g.
(2) Let g be in Pth_F(i, j); we define C(g) by

    C(g) = Σ_{k=0}^{n_g - 1} C(g_k, g_{k+1}).

We adopt the convention that C(g) = +∞ if one of the summation terms is +∞.

The decomposition in cycles is defined in a recursive way. First, the set E^0 of the cycles of order 0 is defined by E^0 = { {i} | i ∈ E }. Let us consider the communication cost function V^0 on E^0 defined by V^0({i}, {j}) = V(i, j).
Assume that the set E^k of the cycles of order k has been constructed, as well as a communication cost function V^k on E^k. The construction of the pair (E^{k+1}, V^{k+1}) can be split in several steps:

(1) From V^k we define another communication cost Ṽ^k on E^k.
(2) We write Π → Π' if either Π = Π' or there exists g ∈ Pth_{E^k}(Π, Π') such that Ṽ^k(g) = 0, and we define the equivalence relation R_k by Π R_k Π' iff Π → Π' and Π' → Π.
(3) We define the quotient classes and define on them the partial order Π̃_1 ≤ Π̃_2 if there exist Π_i ⊂ Π̃_i for i = 1, 2 such that Π_1 → Π_2. We denote by D^{k+1} the set of the minimal elements for the order ≤.
(4) We define E^{k+1} from these minimal elements.
(5) We define a communication cost V^{k+1} on E^{k+1}.

The construction continues until E^k = {E}. We denote by n_E the integer such that E^{n_E - 1} ≠ {E} and E^{n_E} = {E}. This procedure gives a hierarchical decomposition of the state space on a tree, beginning with the singletons and ending with the whole space. One can find a helpful example of this recursive construction in [16].
From this tree we will introduce the following subsets.

DEFINITION 2.2. - (1) We denote by C(E) the set of all the cycles, that is, the cycles of all orders.
(2) We define the maximal partition M(A) of A by:
M(A) = { Π ∈ C(E) | Π is a maximal element of C(E) included in A }.
(3) We define the maximal proper partition M*(A) of A by:
M*(A) = { Π ∈ C(E) | Π is a maximal element of C(E) strictly included in A }.

We will now introduce in the following definition some important parameters on the cycles.
DEFINITION 2.3. - Let Π ∈ C(E).
(1) We denote by H_e(Π) the real valued number called the exit height of Π. For Π = {i} we will often prefer the notation H_e(i) to H_e({i}).
(2) We denote by H_m(Π) the real number called the mixing height of Π.
(3) We denote by W(Π) the real valued number inf_{i∈Π} W(i).
(4) We denote by F(Π) the set { i ∈ Π | W(i) = W(Π) }; the set F(Π) will be called the bottom of Π.

Remark 5. - The probabilistic interpretation of H_e(Π) and H_m(Π) can be easily given. For T > 0, let X be a G.S.A. with parameter (q, κ, P_T) where the underlying cooling schedule T = (T_n)_{n∈N} is assumed to be constant and equal to T (T_n = T, ∀n ∈ N). For each cycle Π, the exit time from Π is of order exp(H_e(Π)/T) for any starting point in Π, and the probability to go from i ∈ Π to j ∈ Π within a time of order exp(H_m(Π)/T) tends to 1 when the temperature T vanishes.

Some parts of the above definition can be extended to an arbitrary set.

DEFINITION 2.4. - Let A ⊂ E.
(1) We define the exit height H_e(A) of A.
(2) We denote by W(A) the real valued number inf_{i∈A} W(i).
(3) We define the bottom F(A) of A by F(A) = { i ∈ A | W(i) = W(A) }.

DEFINITION 2.5. - Let i ∈ E; we define by induction the increasing family of cycles (i^k)_{k≥0} by i^0 = {i} and i^{k+1} the smallest cycle strictly containing i^k.

Remark 6. - A set A which is not a cycle should be considered as an inhomogeneous subset of E for the exit time from A. However, for any starting point in A, a G.S.A. with parameter (q, κ, P_T) (see Remark 5) can exit from A at least within a time of order exp(H_e(A)/T).
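The exponential scaling exp(H_e/T) of exit times can be observed numerically. The toy chain below is our own example (a three-state line with a barrier of height 1 between two wells), run at two constant temperatures; the mean time to leave A = {0, 1} grows sharply as T decreases, consistent with an exit height of 1:

```python
import math, random

def mean_exit_time(T, trials=400, seed=0):
    """Metropolis chain on U(0)=0, U(1)=1, U(2)=0 (line 0 - 1 - 2),
    started at 0; mean number of steps before leaving A = {0, 1}."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, t = 0, 0
        while x != 2:
            t += 1
            if x == 0:
                # uphill move 0 -> 1 accepted with probability e^(-1/T)
                if rng.random() < math.exp(-1.0 / T):
                    x = 1
            else:
                # from the barrier both moves are downhill and accepted
                x = 0 if rng.random() < 0.5 else 2
        total += t
    return total / trials

warm = mean_exit_time(T=0.5)
cold = mean_exit_time(T=0.3)   # lower temperature: much longer exit time
```

On this example the ratio cold/warm is close to exp(1/0.3 − 1/0.5), the predicted exponential factor, up to a bounded prefactor.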
We show in the next proposition, proved in [15], the strong link that exists between the virtual energy W and the cycle decomposition.

PROPOSITION 2.6. - Let i ∈ E. Then
3. RENORMALIZATION OF THE COMMUNICATION COST

In this section, we define from the communication cost V an action functional for the exit paths from a subset A of E. More precisely, starting from a point i ∈ A, we want to evaluate the probability that, before the exit time from A, the process (running at constant temperature T) visits the edge (i, j). It appears that this probability is of order exp(-C_A(i, j)/T). Hence C_A(i, j) can be interpreted as a new cost which weights the cost of a large deviation event (the visit of the edge (i, j)) from the standard behavior of the process running in A. This cost will be called the renormalized cost in A. This section is devoted to its combinatorial definition in function of V and to preparatory lemmas for the large deviation estimates established in Section 4.

DEFINITION 3.1. - (1) Let i, j ∈ E, i ≠ j. We denote by n_{ij} the integer uniquely defined by: j ∉ i^{n_{ij}} and j ∈ i^{n_{ij}+1}.
(2) Let A ⊂ E, A ≠ E. For all i ∈ A we define n_{i,A} as the integer uniquely defined by: i^{n_{i,A}} ∩ A^c = ∅ and i^{n_{i,A}+1} ∩ A^c ≠ ∅.
PROPOSITION 3.2. - Let A ⊂ E, A ≠ E. We define on E the function C_A by an explicit formula in terms of the cycle decomposition. Then C_A is a communication cost, called the renormalized communication cost in A.

Proof. - It is sufficient to prove that C_A(i, j) ≥ 0. Moreover, it is sufficient to prove that for i ≠ j we have (14). This will be proved by induction on the value of n_{ij}. Assume that n_{ij} = 0. Then (14) is equivalent to an obvious inequality. Now assume that the result has been proved for n_{ij} ≤ k - 1. Then from the recursive construction of the cycles and from the induction hypothesis we obtain (15). Furthermore, from the definition of Ṽ^k we obtain (16). Adding (15) and (16) we deduce (14), so that the proposition is proved.
□

DEFINITION 3.3. - Let A ⊂ E, and let i, j ∈ E be two distinct configurations. We denote by C*_A(i, j) the number

    C*_A(i, j) = inf{ C_A(g) | g ∈ Pth_A(i, j) }.

Here the infimum is taken on the paths from i to j that stay within A except possibly at their extremities.

Remark 7. - Starting from the probabilistic interpretation of C_A given in the introduction of this section, we can easily deduce the interpretation of C*_A(i, j). For all i ∈ A and j ∈ E, if g ∈ Pth_A(i, j) is a path staying in A (except possibly for its right hand extremity), the probability that the process starting from i visits all the edges of g before exiting from A is of order exp(-C_A(g)/T). Hence, if j ∈ A^c, the probability that the process exits from A at the point j is of order exp(-C*_A(i, j)/T).

We establish now four lemmas which will be useful for the next section.
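With nonnegative costs, the infimum in Definition 3.3 is a shortest-path computation and can be sketched with Dijkstra's algorithm; the constraint is that only states inside A (besides the starting point) may be left. The costs in the example are arbitrary illustrative numbers of our own, not a renormalized cost computed from some V:

```python
import heapq

def min_path_cost(cost, A, i, j):
    """Infimum of the summed cost over paths from i to j whose intermediate
    states stay inside A (the extremities themselves may lie outside A)."""
    dist = {i: 0.0}
    heap = [(0.0, i)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == j:
            return d
        if d > dist.get(u, float("inf")):
            continue              # stale heap entry
        if u != i and u not in A:
            continue              # may not travel through states outside A
        for (a, b), c in cost.items():
            if a == u and d + c < dist.get(b, float("inf")):
                dist[b] = d + c
                heapq.heappush(heap, (d + c, b))
    return float("inf")

cost = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 5.0, (2, 1): 0.0}
```

For A = {0, 1}, the path 0 → 1 → 2 (cost 3) beats the direct edge 0 → 2 (cost 5); shrinking A to {0} forbids the detour through 1.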
LEMMA 3.4. - Let Π ∈ C(E) and A ⊂ E, A ≠ E.
(1) Assume that Π ≠ E and let Π' be a cycle of the same order communicating with Π in the recursive construction. Then there exist e ∈ Π and f ∈ Π' such that C_Π(e, f) = 0.
(2) For all i, j ∈ Π, we have: C*_Π(i, j) = 0.
(3) For all i ∈ A, let C*_A(i, A^c) = inf{ C*_A(i, j) | j ∈ A^c }. We have C*_A(i, A^c) = 0.

Remark 8. -
According to the probabilistic interpretation given in Remark 7, (1) defines the arrows followed by the process when it goes out of a particular cycle. For a cycle of level 0, that is for a singleton {e}, these arrows are defined by V(e, f) = H_e(e). Moreover, (2) says that, starting from i ∈ Π, the process visits all the configurations j ∈ Π before exiting, with a probability non-vanishing with the temperature. The interpretation of (3) is straightforward.
Proof. - Let us begin with part (1). From the recursive definition of the cycles, there exist e ∈ Π and f ∈ Π' satisfying (17) and (18). We then verify that, combining (17) and (18), C_Π(e, f) = 0.

We prove now part (2) of the lemma by induction on k = sup_{i,j∈Π} n_{ij}. Assume that k = 0; then n_{ij} = 0 and j ∈ i^1. Furthermore, for all a, b ∈ Π, a ≠ b, we have n_{ab} = 0, so that the result follows easily. Assume now that the result is true for k - 1. If i^k = j^k, then the induction hypothesis gives C*_{i^k}(i, j) = 0. Since we have C*_Π(i, j) ≤ C*_{i^k}(i, j), the result is proved. Otherwise, it is sufficient to prove that if i^{k+1} = j^{k+1} then C*_Π(i, j) = 0. However, using part (1) with Π = i^k, we deduce that there exist e and f such that C_Π(e, f) ≤ C_{i^k}(e, f) = 0. The proof is completed if we notice that we get also from the induction hypothesis the remaining zero-cost links.

We are now concerned with the proof of (3). We start by proving the result for a cycle Π ∈ C(E) distinct from E. Since the point (2) is proved, it is sufficient to prove that there exist e ∈ Π and f ∈ Π^c such that C_Π(e, f) = 0. However, this has been proved in part (1).

We consider now the general case of a strict subset A of E. Let i be an element of A; we prove the result by induction on the value of n_{i,A}. We assume here that n_{i,A} = 0. From the definition of n_{i,A} we deduce that i^1 ∩ A^c ≠ ∅. Let j be in i^1 ∩ A^c. From the construction of the cycles there exists a zero-cost path g from i to j. Hence, if η = inf{ k ≥ 0 | g_k ∈ A^c }, the path g stopped at η, defined by g̃ = (g_0, ..., g_η), verifies the same property, so that C*_A(i, A^c) = 0 and the result is proved.

Assume now that the result is proved for all i in A such that n_{i,A} < k. Assume that n_{i,A} = k. We note then Π^+ = i^{k+1}. From the definition of n_{i,A}, we have Π^+ ∩ A^c ≠ ∅. Let j be in Π^+ ∩ A^c; we have from the point (1) that C*_{Π^+}(i, j) = 0. Hence, we consider a path g ∈ Pth_{Π^+}(i, j) such that C_{Π^+}(g) = 0. As previously, we stop the path at its first exit from A, i.e. we define g̃ = (g_0, ..., g_η) where η = inf{ k | g_k ∈ A^c }. Since for all k < η we have g̃_k ∈ Π^+, the corresponding costs are controlled. Define s_g = sup{ k < η | n_{g_k,A} = n_{i,A} }. For all k ≤ s_g we have n_{g_k,A} = n_{i,A}, so that, by (19), C*_A(i, g_{s_g}) = 0. Now consider the two following cases. If s_g + 1 = η, then the result is proved. Otherwise, applying the induction hypothesis, we have C*_A(g_{s_g+1}, A^c) = 0, so that the result follows. The proof of the lemma is complete.
□

Remark 9. - We can give a heuristic proof of the above equality with the probabilistic interpretation of the renormalized costs. Consider the process at equilibrium for a constant cooling schedule at temperature T. Now, compute the order of the mass exiting from Π at each time step. This mass is given by the probability to be in Π, which is of order exp(-(W(Π) - min W)/T) (see Proposition 1.6), multiplied by the probability per unit of time to escape from Π starting from Π, which is of order exp(-H_e(Π)/T). However, choosing a point i ∈ Π, we can say that this mass is also given by the probability to be at i, which is of order exp(-(W(i) - min W)/T), multiplied by the probability to escape from Π without any return to i. Hence we get W(Π) + H_e(Π) = W(i) + C*_Π(i, Π^c).

Proof. - From Proposition 2.6 we know that there exists a constant A_E such that the corresponding identity holds. Assume that j ∈ Π. It is true that n_{j,Π^c} does not depend on j for j ∈ Π, so that we will omit the index j. Furthermore, since j^{n} ∩ Π = Π, only the last term depends really on j for j ∈ Π. Hence for f ∈ F(Π) we obtain the reduced expression, and to prove the lemma it is sufficient to prove (20).

Let j ∈ Π. Since n_{ji} = n_{ij}, we have the first comparison. However, for k > n_{ij} we have j^k = i^k. Otherwise, if n_{jl} > n_{ij}, then n_{il} = n_{jl}. We deduce then that for all paths g from i to Π^c through Π we have (20). Moreover, there exists a path g from i to Π^c through Π such that C_Π(g) = 0 and n_{i,g_k} is increasing in k, whence (21). We deduce from (20) and (21) the stated equality, so that the lemma is proved.
□

LEMMA 3.6. - Let Π ∈ C(E); there exists a real valued constant A_Π such that for each Π' ∈ M*(Π) we have the stated equality.

Remark 10. - The probabilistic interpretation of the above equality is that at equilibrium at constant temperature T, the mass exiting from each maximal proper sub-cycle Π' of Π is of the same order.

Proof. - Let f ∈ F(Π'). The result is proved if we notice that the last summation term does not depend on Π' but only on Π.

LEMMA 3.7. - Let (i, j) ∈ E × E, i ≠ j, and consider a cycle Π such that i ∈ Π. Then we have:
Remark 11. - Here again we can give a heuristic probabilistic proof of the above equality if we consider the process at equilibrium at constant temperature T. The probability to visit the edge (i, j) is given by the probability to be at i, which is of order exp(-(W(i) - min W)/T), multiplied by the probability of the transition from i to j, which is of order exp(-V(i, j)/T). However, considering the cycle Π, this probability can be computed in another way: it is given by the probability to exit from Π, multiplied by the probability to visit the edge (i, j) at the exit time, which is of order exp(-C_Π(i, j)/T). Hence we get W(Π) + H_e(Π) + C_Π(i, j) = W(i) + V(i, j).

Proof. - From the definition of C_Π we get the first expression. If j ∉ Π, we have n_{ij} ≥ n_{i,Π^c} and the expression simplifies accordingly. The lemma is proved.
□

4. THE LOCALIZATION THEOREMS

In Section 3, we have defined the main objects needed to perform a large deviation study of a G.S.A. In this section, we turn to a rigorous setting of the heuristic probabilistic interpretations of the cycle decomposition of E and of the renormalized communication costs. One of our main tasks will be to estimate the probability that, starting from a point i ∈ Π, the Markov chain escapes from Π at a point j ∈ Π^c at a given time n. We will follow the original approach of O. Catoni [2], who has handled large deviation estimates for arbitrary decreasing cooling schedules. Throughout this section, we will consider a fixed irreducible Markov kernel q on E and a fixed κ ∈ [1, +∞[. We denote by X a G.S.A. with parameter (q, κ, P), by T the underlying cooling schedule and by V the communication cost.
4.1. Survival kernels

DEFINITION 4.1. - Let H > 0, a > 0 and b > 0 be three real valued numbers. Now consider a kernel on the integers Q : Z × Z → [0, 1] such that the survival estimates hold. The set of all such kernels is called the set of the right survival kernels for (H, a, b) and is denoted by D^r(H, a, b). In the same way, we define the set D^l(H, a, b) of the left survival kernels for (H, a, b). We will take the convention that the product Π_{l=m}^{n+1}(·) is 0 if one of the terms is negative.
Remark 12. - The survival kernels have been introduced in [2] and play a central role in our large deviation estimates. The right kernels will be used to give upper and lower bounds on the probability that the Markov chain, starting in a subset A at time m, exits from A at time n. They follow approximately an inhomogeneous exponential law, where the control is done on the partial sums.

We recall now the stability lemmas given in [2].
LEMMA 4.2 (Stability under product). - Consider two real valued numbers a > 0 and b > 0. Let Q ∈ D^r(H, a, b) (respectively D^l(H, a, b)) and let R ∈ D^r(H, a, b) (respectively D^l(H, a, b)). Then there exist a' > 0 and b' > 0, which depend on a and b, such that the product kernel QR is an element of D^r(H, a', b') (respectively D^l(H, a', b')).

LEMMA 4.3 (Stability under convex combination). - Let Q ∈ D^r(H, a, b) (respectively D^l(H, a, b)), let R ∈ D^r(H, a, b) (respectively D^l(H, a, b)) and let λ ∈ [0, 1]. Then λQ + (1 - λ)R ∈ D^r(H, a, b) (respectively D^l(H, a, b)).
4.2. Localization kernels

Following Catoni, we define:

DEFINITION 4.4. - (1) Let A ⊂ E. We define the stopping time τ(A, m) = inf{ n ≥ m | X_n ∉ A }.
(2) Let A ⊂ E, B ⊂ E. We define the localization kernels M(A, B) and L(A, B).

The kernel M(A, B) gives the probability to have an entrance at time n into the set B at j, starting from i at time m and traveling in A during the intermediate time. It will be used with B = A^c and will describe the way the Markov chain escapes from A. On the other side, L(A, B) gives the probability to be in j at time n, starting from i at time m and traveling in A.

4.3. Main large deviation estimates

THEOREM 4.5. - There exist a > 0, b > 0, c > 0, d > 0, K_1 > 0 and K_2 > 0, which depend only on E, q and κ (and not on T), such that for all n ∈ N the induction hypothesis H^n(a, b, c, d, K_1, K_2) is true:

H1(a, b, K_1): for all Π ∈ C(E), all m ≤ n and all i ∈ Π, there exists Q ∈ D^r(H_e(Π), a, b) such that the announced estimate holds; moreover P(τ(Π, m) > n | X_m = i) admits the matching lower bound.

H2(a, b, K_2): for all A, all m ≤ n, all i ∈ A and all j, there exists Q ∈ D^l(·, a, b) such that the announced estimate holds; moreover the corresponding bound holds.

H3(a, b, K_2): for all Π ∈ C(E), all m ≤ n and all i ∈ Π, there exists Q ∈ D^r(·, a, b) such that the announced estimate holds.

H4(a, b, K_2): for all A, all m ≤ n, all Π ∈ M(A) and all i ∈ A, there exists Q ∈ D^l(·, a, b) such that the announced estimate holds.

Remark 13. - This first theorem collects all the basic large deviation estimates needed for the