A NNALES DE L ’I. H. P., SECTION B
O. C ATONI
Sharp large deviations estimates for simulated annealing algorithms
Annales de l’I. H. P., section B, tome 27, n
o3 (1991), p. 291-383
<http://www.numdam.org/item?id=AIHPB_1991__27_3_291_0>
© Gauthier-Villars, 1991, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
Sharp large deviations estimates for simulated
annealing algorithms
O. CATONI
lntelhgence Artiticielle et Mathematiqucs, Laboratoire de Mathématiques, U.A. 762 du C.N.R.S.,
Departement de Mathematiques et d’lnformatique, Ecole Normale Superieure, 45, rue d’Ulm, 75005 Paris, France
Probabilités et Statistiques
ABSTRACT. - Simulated
annealing algorithms
are Monte-Carlo simula- tions ofphysical systems
where thetemperature
is adecreasing
functionof time. The method can be used as a
general
purposeoptimization technique
to locate the minima of anarbitrary
function defined on a finitebut
possibly
verylarge
set. It is described as anon-stationary
controlledMarkov chain. The aim of this paper is to build a
large
deviationstheory
in thistime-inhomogeneous
discretesetting.
We make a carefulinvestigation
of the law of the exitpoint
and time from sets, based on Wentzell and Freidlin’sdecomposition
of the states space intocycles,
which leads us to establish some kind of
systematic
"calculus" of theprobability
ofjumps
from onearbitrary
subdomain into another one. We feel thatgiving
aprecise description
of howtrajectories
escape from attractorsbrings
aqualitative
contribution to theinsight
one may have into the behaviour of simulatedannealing.
We think also that oursharp large
deviation estimates are a useful tool for furtherinvestigations.
Weillustrate their use
by addressing
in this paper and in aforthcoming
onefour
questions:
convergence to theground
states,asymptotical equidistri-
bution on the
ground
states, the discussion ofquasi-equilibrium
and theClassification A.M.S. : 60 F 10, 60 J 10, 93 E 2.
Annales de l’lnstitut Henri Poincaré - Probabilités et Statistiques - 0246-0203 Vol. 27,~9I i03/291; 931~ 11,30,~ ~ Gauthier-Villars
Vol. 27, n° 3, 1991, p. 291 -383.
292 O. CATONI
shape
otoptimal cooling
schedules. Short cuts torough
but more uniform estimate will begiven
elsewhere.Key words : Simulated annealing, large deviations, non-stationary Markov chains.
RESUME. - Les
algorithmes de
recuit simulépermettent
de simuler Iecomportement
d’unsystème physique
dont latemperature
décroît au coursdu
temps.
Ils constituent par là-même une méthoded’optimisation
deportée générale capable
de situer les minimums d’une fonction arbitraire definie sur un ensemblefini,
fut-il de trèsgrande
taille. Leurdescription mathématique
est celle d’une chaine de Markov contrôlée non stationnaire.Le but de cet article est d’établir une théorie des
grandes
deviations pources
systèmes
discrets nonhomogènes
dans letemps.
Une etude détaillée de la loi du lieu et dutemps
de sortie d’un sous-ensembled’états,
fondéesur la
decomposition
encycles
de Wentzell etFreidlin,
conduit a un calculsystématique
de laprobabilité
des sauts d’un sous-domainequelconque
dans un autre. Cette
description
de lafaçon
dont lestrajectoires s’echap- pent
des attracteurs nous sembleapporter
une informationqualitative
intéressante sur le
comportement
du recuit. Nous pensons aussi que cesestimées
précises
degrandes
deviations sontsusceptibles
de nombreusesapplications.
Nous illustrons leurportée
en traitant dans cet article etcelui
qui
lui fera suitequatre
themes : la convergence vers les états fonda- mentaux,1’equidistribution asymptotique
sur les étatsfondamentaux,
laquestion
duquasi-équilibre thermique,
la forme des courbes detemperature optimales.
Un raccourci vers des estiméesplus
«grossières
» maisplus
uniformes sera
présenté
ultérieurement.Contents
Introduction... 293
1. Description of the model. 298
1.1. Annealing algorithms ... 298 1.2. Decomposition into cycles ... 299 1.3. Invariant probability measures ... 302 2. Study of some large deviation events. 3~3
2.1. The most probable behaviour of X... 304 2.2. Generalized transition kernels ... 306 2.3. Jumping out of cycles ... 312 Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
3. Convergence of annealing algorithms. 3I 7
4. Behaviour of annealing in restricted domains. 322 5. Proofs of large deviation estimates. 339
Conclusion. 369
6. Appendix. 370
6.1. Comparison lemmas ... 370 6.2. Composition lemmas... 374
INTRODUCTION
Simulated
annealing
is a simulationalgorithm
from Statistical Mech- anics. It wasdesigned
to simulate the canonical distribution of aphysical system
in contact with a heat bath when thetemperature
of the heat bath isprogressively
lowered to zero. If thecooling
is not toofast,
thesystem
will gothrough
a succession ofquasi-equilibrium
states towards one ofits
ground
states. As any function defined on a finite set can be viewed asthe energy levels of an "abstract"
physical system,
simulatedannealing algorithms
have been found to be ageneral
purposeoptimization
tech-nique, allowing
one to locate theglobal
minima of anarbitrary
function.They
have beenwidely
used as such without further reference to theirphysical interpretation.
From the mathematical
point
ofview,
simulatedannealing
is an interest-ing
case ofnon-stationary
Markov chain: the transition matrices converge to aboundary point
of the domain ofergodic
Markov matrices. Astemperature
goes to zero the "instantaneous"mixing properties
grow weaker and weaker and onequestion
is to know whether the non-station- ary Markov chain as a whole retains somemixing properties.
The answerwill
depend
on thecooling schedule,
on how fasttemperature
isapproach- ing
zero.Non-stationary
Markov chains are a not so recent but stillwidely
open field of
investigation (Dobrushin [7],
Iosifescu and Theodorescu[17],
Isaacson and Madsen[18],
Seneta[20],
Gidas[10]).
Let us formulate some of the
questions
about simulatedannealing people
have been interested in. One is to know for whichcooling
schedulesquasi-equilibrium
is maintained. Another one is to know whether the distribution law of thesystem
concentrates on theground
states. In caseit is so and there are more than one
ground
state, one is entitled to ask whether all theground
states are reached withequal probabilities. Finally
Vol. 27, n° 3-1991.
294 O. CATONI
when the
target
is to minimize the energyfunction,
one wants to knowwhich
cooling
schedulesprovide
the fastest convergence.The aim of this paper and of a
forthcoming
one is to address thesequestions.
Partial answers
already
exist in the mathematical literature. S. Geman and D. Gemanpointed
out in 1984[9]
thatcooling
schedules of thetype
with clarge enough
ensurequasi-equilibrium (n
is the discrete time variable in thisequation).
Thequestion
was then to know what wasthe smallest
possible
value of c.The critical value was
given by
differentpeople. Holley
and Stroock showed in 1988[13]
that the lawFXn of the system Xn undergoing
simulated
annealing
had a boundeddensity In
withrespect
to theequilibrium
law attemperature Tn
when c> co, cobeing
thesharp
critical value. Their
argument
is based on thestudy
of the smallest non zeroeigenvalue
of the Dirichlet form associated with the infinitesimalgenerator
of the process and goesthrough Lp
estimates offn
derivedfrom the
Kolmogorov equations. They
were not able to conclude on theL ex) norm
of In
in the critical case c = co.More
precise
results weregiven by Chiang
and Chow andby Hwang
and
Sheu,
both in 1988. These authors showed that lim I when c> Co, cobeing
the critical value.Chiang
and Chow uses the forwardequation
and an inductionargument
on theprecision
of their estimations.Hwang
and Sheu have a differentapproach. They
use the results from Wentzell and Freidlin[8]
onstationary
randomperturbations
ofdynamical systems
to derive a weakergodicity
estimate in theapproximately
statio-nary case, that is for Markov chains with transitions
remaining
of thesame order of
magnitude
as the transitions ofannealing
at one fixedtemperature.
Thenthey
use their estimates on time intervals on which the variation of thetemperature
is smallenough.
These papers, which are focused on the
study
ofquasi-equilibrium, give
also some answers for the convergence to a
ground
state but thesharpest
results are in two papers
by Hajek [11]
in 1988 and Tsitsiklis[22]
in 1989.They give
a necessary and sufficient condition for which theprobability
of the
system
to be in aground
state tends to one when time tends toinfinity.
The condition is + oo , whered Co
is a "criticaldepth".
k
Tsitsiklis uses the same
strategy
asHwang
and Sheu: he studies the almost constanttemperature
case andapplies
it on well chosen time intervals.Hajek
worksdirectly
in the non-constant temperature case. HeAnnales de 1’Institut Henri Poincaré - Probabilités et Statistiques
studies the exit time and
point
fromsecondary
attractors of thesystem (he
calls them"cups").
If C is one of these cups, he showsby
an inductionargument
on the size of C that for any state i in C of minimal energywhere H is
sharp
but not r. He shows also the weak reverseinequality
To my
knowledge
there was no necessary and sufficient answer to theproblem
ofknowing
whether the law of thesystem
becomesequidistributed
on the
ground
states before the onegiven
in this paper. I did not found either anystudy
ofoptimal cooling
schedules(those leading
to the fastestconvergence).
Let us close this
quick
review of the literatureby saying
that theconstant
temperature
case is well understood in a moregeneral
contextfrom the works of Wentzell and Freidlin on random
perturbations
ofdynamical systems. They
have made decisivebreakthroughs
in thetheory
of
large
deviations ofdynamical systems
with small randomperturbations.
Large
deviationstheory
is theright
framework tostudy annealing algo-
rithms because
escaping
from local minima is an event of smallprobability
at low
temperature.
Roughly speaking
we willbring
thefollowing
answers to the fouraforementioned
questions:
1. To
study
the critical case it is necessary tointroduce
a second constant and to consider
cooling
schedules of thetype
The behaviour of the systemdepends
on whether B islarger
than some critical valueBo
or not.2022 If B >
Bo,
thenlim ~fn~~ =
+ ~.~ If
B Bo,
then limIn (i)
exists for anystate i,
isfinite,
is always > 1 and for some states > 1.Hence
quasi-equilibrium
is not maintained in the critical case but it can be almost maintained for smallenough
values of B.2. We
give
a newproof
ofHajek
criterion of convergence: if F is the set ofground
states, lim if andonly
k
27. n° ~-I~91.
296
O. CATONI3. We prove that lim P
(X~ = f)
= Card(F) ’ ~
for anyf
e F if andonly
if where
co>d
is the same critical constant as in thek
discussion
ofquasi-equilibrium.
4. We say that
t~,
...,t~
isoptimal
if__ ._ _f-r~J __. _ ._ 2014’-r’ __~ , ~ ~
Thus
our measure of convergence is theprobability
to be in aground
state at some finite
(large)
time Ncorresponding
to the end of thesimulation.
There is no reason apriori
to think that the best schedulesT 1,
... ,TN
areindependent
ofN,
hence we haveput
asuperscript
N. Weshow that there is some value
B
of B and some a > 0 such thatas soon as k and N - k are
large enough.
We show that
changing 1/N
into1 /co ln k + B) produces only
a small variation of
P (XeF).
Remarks:
~ We will prove elsewhere
[3]
that(4)
does not hold for N - k "small".This shows that there is no
optimal cooling
schedule which isindependent
of the simulation time N
(i.
e. such thatTk
is constant in N for kfixed).
. Our
study
ofT
is done under somenon-degeneracy asumptions
onthe energy
landscape:
we assume that there isonly
oneground
state andthat the critical
depth d
inHajek
criterion is reachedonly
once.This first paper will be
mainly
devoted to build the technical tool weneed to prove our claims. This tool is made of a
precise
andsystematic study
of the way thesystem
escapes from anygiven
strict subdomain of the states space.This program has been carried
through by
Wentzell and Freidlin[8]
ina
general
context thatapplies
to the constanttemperature
case. We aregoing
to carry it for simulatedannealing,
that is in anon-stationary
case.We have not used Wentzell and Freidlin results as
Hwang
and Sheu didbecause we needed more
precise
estimates: we wanted them to be uniform withrespect
to the values of thetemperatures
and we wanted to knowmore than the
exponential
order ofmagnitude
of theprobability
to espace from attractors, we wanted a fullequivalent.
Nontheless we followed theconceptions
aboutlarge
deviations which Wentzell and Freidlingive along
with their
proofs.
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
There are subdomains for which the
dependence
of the exit time andpoint
on thestarting point
within the domain fades away at lowtempera-
tures. These subdomains were called
cycles by
Wentzell and Freidlin. We will prove that for anycycle
C and any state e Cwith
sharp
constants a and H. We will makeexplicit
how the "almost~-equal"
can be considered to be uniform withrespect
to thetemperatures
sequence
Tm + 1, ... , Tn. Getting
uniform estimates is crucial to address theproblem
ofoptimal cooling
schedules.Having
asharp
constant a is useful to prove convergence to theequidistributed
law on theground
states. This is
sharper
than usuallarge
deviation estimates whereonly
theexponential order,
that isH,
issharp.
The
cycles
areorganized by
the inclusion relation into a tree structure.Two
cycles
are eitherdisjoint
or one is a subset of the other.Along
with the escape fromcycles,
we willstudy
thejumps
from onecycle
to anothercycle.
Nowtaking
somegeneral
strict subdomain of the states space, we can write it as thedisjoint
union ofcycles
in the coarsestpossible
way.Studying
thejumps
from onecycle
of this "maximalparti-
tion" to
another,
we willgive
thedependence
of the exit time andpoint
of the domain on the
starting point.
The
proofs
areby
induction on the size of thecycles.
Thestudy
of thebehaviour of the
system
incycles
and thestudy
ofjumps
fromcycle
tocycle
areclosely
related in the inductionargument.
To prove that the conditional law of thesystem knowing
that it remains in somecycle
becomes
equidistributed
on theground
states of thecycle,
westudy jumps
between
subcycles containing
theground
states. Then we remove oneground
state from thecycle, obtaining
thus a subdomain which is no more acycle,
and westudy
the behaviour of thesystem
within this domain after its last visit to the markedground
state. It has a strongprobability
to return to the marked
ground
state, and a small one to escape from thecycle.
In this we follow the program tracedby
Wentzell and Freidlin’sinvestigation
of the escape from a domaincontaining
one stable attractor.What we
get
in the end is asystematic algebraic
tool to derive estimates for thesystem
to have lived in a tube ofarbitrary shape during
agiven period
of time as well as estimates for anarbitrary
succession ofjumps
from one tube ’to another one.
This tool is therefore very
general
and could lead to otherapplications
than the results we
give
in these two papers.Vol. 27, n° 3-1991.
298 O. CATONI
This research has been carried under the direction of R.
Azencott,
I amglad
to thank him forhaving
introduced me to simulatedannealing
andhaving supported
meby
usefulsuggestions
andencouragements.
1. DESCRIPTION OF THE MODEL
1.1.
Annealing algorithms
DEFINITION l.l. - An energy
landscape
is acouple (E, U),
where:~ E is a
finite
set;~ U : E ~ (~ is a non-constant real valued
function.
DEFINITION 1.2. - Let
(E, U)
be an energylandscape.
A communication kernel on E is an irreduciblesymmetric
Markov kernel q onE,
that is afunction
such that
For any subset A
of E, by q|A
we will mean the restrictionof
q toA,
thatis the kernel
otherwise.
Remark: The
assumption
ofsymmetry
could bereplaced by
the existence of alocally
invariantprobability
measure, that is a measure ~., such that:or even
by
a weakerassumption
such as inHajek [11];
but this would not lead us tosignificantly deeper
results norchange
the nature of theproofs.
DEFINITION 1 .3. - Let us r~~ite
x ~ , for
thepositive
partof
x, thatis for
sup
(x, 0).
Let(E, U, q)
be an energylandscape
with communications. For any T the transition kernel ~.~ at temperature T is a Maikoc kernelAnnales de l’Institut Henri Poincaré - Probabilités et Statistiques
on E
defined by
and
By
convention we will extend thisdefinition
to the case T = 0 inthe following
way:
DEFINITION 1.4. - A
cooling
schedule is anon-increasing
sequenceof positive
real numbersDEFINITION 1.5. - A simulated
annealing algorithm
is a collection(E, U,
q,T, X),
where:~
(E, U, q)
is an energylandscape
withcommunications;
~
~o
is aprobability
distribution on E(the
initialdistribution);
. T is a
cooling schedule;
. and X is the Markov chain on E
defined by
its initial distribution~o
and its transitions
Remark. - We can
identify (E, U, 20,
q,T, X)
and(E, U,
q,
T, X)
as soon as U -U
is a constant function on E(because
thelaw of X is then the
same).
Hence we can - and will - assume that1.2.
Decomposition
intocycles
We follow here the
decomposition
intocycles given by
Wentzell andFreidlin
[8]
in a moregeneral setting.
DEFINITION i.~. - Let
(E, U, q)
be an energylandscape.
The level set atlevel ~,
of (E, U)
is the subsetE~ of
Edefined hy
Vol. 27, n° 3-1991.
300 O. CATONI
DEFINITION 1.7. 2014 Let (E, U, q) be an energy landscape. Let 03BB be some real number. The communication relation at level ?~ on
(E, U~
is theequiva-
lence relation
defined by
_ f . _ _. , , _ __ . _ - _.... , "’B _ _ _.
We shall assume
by
convention that(q~
=Id,
hence that(i, i)
E i E E.Two states i
and j
communicate at level ~, if there is apath
from ito j
which does not gothrough
any state of energysuperior
to ?~.DEFINITION 1 .8. - Let
(E, U, q)
be an energylandscape.
Let ~, be a realnumber. A
cycle of
level ?~ is a subsetof
E which is anequivalence
classof
The class
of
allcycles of
level ?~ will be denotedby ~~ (E, U).
The classof
allcycles
will be denotedby (E, U).
We haveThe class
of
allsub-cycles of
acycle
Cof (E, U)
will be denotedby CC (E, U, C).
Thus we haveT T !’rB - ~ f~ _ VTB ~ ~~ ~ 1 /inB
Let us mention a few
simple
facts aboutcycles:
PROPOSITION 1.9:
~ All sets reduced to one state are
cycles.
~
If
C and G are in ~(E, U) they
are eitherdisjoint
orcomparable for
inclusion
(that
is one is a subsetof
theother).
~
(E, U)
is apartition of
E.DEFINITION 1.10. - Let
(E, U, q)
be an energylandscape.
We extendthe
definition of
U to(E, U) by putting
~ T / ~B . t T / .. /~B
The energy of a
cycle
is the energy of its fundamental states.DEFINITION l.l l. - Let
(E, U, q)
be an energylandscape.
Let A be asubset
of
E and let be any states in E. Wedefine
the minimal energyof
communication between i
and j through
A to beT T /.: ~ A 1 - : ,.~ ~’ / ’’1 . ~,"..~ 1tI /: =1 ~ i
DEFINITION l .12. - The
boundary of
AcE is- .._ ~ - .. _
Annales de l’Institut Henri Probabilités et Statistiques
DEFINITION 1.13. - The
depth function (E, U~ --~ Il~
isdefined by:
Remark. - The max is needed in the case when C is reduced to one
point.
DEFINITION 1.14. - The
principal boundary of
~(E, U)
isRemark. -
Again
the is there to cover the casesingletons.
When C isnot reduced to one
point,
theinequality
isequivalent
tothe equality:
PROPOSITION 1.15. - then
DEFINITION 1.16. - The bottom
of
C E ~(E, U)
isDEFINITION 1.17. - Let
(E, U, q)
be an energylandscape
with communi-cations. The set
of
minimumsof E
isA minimum i is called a local minimum
f Ui
> ~[let
us recall that U(E)
= 0by convention].
DEFINITION 1.18. - Extension
of
the communication kernel q toP(E) P
(E)
and(E, U) :
and
for
anycycle
Cof
S~(E, U)
DEFINITION 1.19. - Some more classes
of cycles:
Let C be a
cycle
in ~(E, U),
~"(E, U, C)
will denote the classof cycles
G in ~
(E, U, C)
such that U(G)
> U(C),
and ~’(E, U, C)
will denote the classof cycles
G inU, C)
such that F We will abreviate~‘
(E, U, E)
and ~"(E, U, E) by
~’(E, U)
and ~"(E, U).
The reason for
introducing
~’ and ~" is thefollowing:
if we want tomake sure that the system can reach some fundamental state of C
starting
Vol. 27. n° 3-1 ~91.
302 0.CATONI
trom a state in i, we nave to maKe sure tnat it is possioie to get out 01 any
cycle
of(E, U, C).
If we want to make sure that it ispossible
totravel from any
ground
state of C to any otherground
state ofC,
wehave to make sure that it is
possible
toget
out of anycycle (E, U, C).
PROPOSITION 1.20. - We have
DEFINITION 1 .21. - For any
cycles C1, C2
such thatC2 cC1,
we willdenote
by
~’C1)
and call the natural contextof C2
inC1
thesubcycle
which is minimal
for
inclusion in, _ _ ._ _ _ . , ~. - - - - _ . ~ . _ , . _ _ .
DEFINITION 1.22. - Let C be a
cycle of (E, U, q).
The naturalpartition of
C is thepartition of
C into its maximal strictsubcycles.
LEMMA 1.23. - The
quantity H (G) + U (G)
is constant among the Gsbelonging
to the naturalpartition of
C.Proof. -
If it were not the case,putting
(G)
+ U(G) ‘
G in the naturalpartition of C ~ (31 )
the communication relation at level
?~,
would induce apartition
of Ccoarser than the natural
partition
ofC,
which is a contradiction.End
of
theproof of
lemma 1.23.DEFINITION 1 .24. - Let
(E, U, )
be an energylandscape
with communica-tions. For any
CE(E, U)
wedefine
___ . ~.. ~ __ . ~._ ~ _ __ ._ _ _ ~.. ~ "’B. - __
ana
1.3. Invariant
probability
measuresDEFINITION 1.25. - Let
(E, U)
be an energylandscape.
Theequilibrium
distribution at temperature T is
defined
to be theprobability
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
where
is the
partition function.
We extend thisdefinition
to T = 0by putting
where |A| is
the cardinalof
A and xA is the characteristicfunction of
A.PROPOSITION 1.26. - Let
(E, U)
be an energylandscape.
We havefor
any iEE
Proof. - Easy.
PROPOSITION 1.27. - For any T
E (1~+,
~.T is the invariantprobability of
the transition kernel at temperature T.
In fact
we have a strongform of invariance,
which is local invariance:for
any iand j
in E2. STUDY OF SOME LARGE DEVIATION EVENTS
We will need uniform
estimates, holding
for some families ofcooling
schedules. The
simplest interesting family
ofcooling
schedules we willencounter are the
neighbourhoods
of zero for theL~
norm. We willstudy
how
long
thesystem
remains in agiven
subdomain of E at lowtempera-
ture. This is a way of
localizing
ourstudy
both in space and in time.DEFINITION 2.1. - Let
(E, U, q)
be an energylandscape
with communi-cations. A
cooling framework
is any setof cooling
schedules. For anywe
define
thecooling framework:
Such
cooling frameworks
will be calledsimple cooling
frameworks.DEFINITION 2.2. - An
annealing framework
is a collection(E, U,
q,~ ,
where:.
~E, U,
q, is an energylandscape
with communications and initialdistribution;
Vol. 27, n° ~-~991.
304 O. CATONI 2022 F is a cooling framework;
X is the class
of
Markov chainsappearing
in theannealing algorithms (E, U,
q,T, X)
withAn
annealing framework
is said to besimple f
thecorresponding cooling framework
issimple.
DEFINITION 2.3. - Let
(E, U,
q,~ , ~’)
be anannealing framework.
Let A be some subset
of E, the first
exit timefrom
Aafter
time m will benoted
2.1. The most
probable
behaviour of XThe results from this section are very
simple. They quantify
the factthat in first
approximation
a simulatedannealing algorithm strongly
resembles a descent
algorithm.
THEOREM 2.4. - For any energy
landscape
with communications and initial distribution(E, U,
q,~o),
there exist asimple annealing framework (E, U,
q,F, X)
and a constant rt>O suchthat, for any X~X, for
anyi, j E
E we havek
Proof. -
There are constants a > 0 andTo > 0
for which for any TTo,
for
any i, je
E we haveConsider the
annealing
framework(E, U,
~(To),
In thisframework
-
DEFINITION 2.5. - Let
(E, U, q)
be an energylandscape
with communica-tions. For any i ~ E the set
of
minimaof
E under i isAnnales de l’Institut Henri Poincaré - Probabilités et Statistiques
The set
M~
is thus the set of local minima of the states space E whichcan be reached
by
the descentalgorithm
with transitions postarting
at i.THEOREM 2.6. - For any energy
landscape
with communications and initial distribution(E, U,
q, there exist asimple annealing framework
~E, U,
q,~ ,
and constants a >0, ]0, 1 ~,
y > 0 suchthat, for
any anyi,jEE,for
any k~N we havek
Proof. - According
to thepreceding theorem,
we have to prove that_ ._ . _ _ _ _ ~_ .. _.
Let
Oi
be the set of states which can be reachedby
the descentalgorithm
with transitions po
starting
from i:The relation "to be reachable from" is
clearly transitive,
which canequiva- lently
be stated in thefollowing
way:If j belongs
tooi,
thenO J
is a subsetof oi.
The next
thing
to remark is thatM J
is neverempty,
becauseeither j
isa minimum or there is in
OJ
some state s with energyUS U~.
Foreach j
in
oi
considerwhere
Consider
and
If s is in
Mj’
we haveVol. 27, n° 3-1991.
306 O. CATONI I Thus
Thus A > 0. Then
But po (i, Q~ - M~) =1-pa (i, M;),
henceEquation (46)
is an easy consequence of this lastinequality.
2.2. Generalized transition kernels
DEFINITION 2 . 7. -
(E, U, ~o,
c~,T, X)
be anannealing algo-
rithm. The
family
1VIof generalized
transition kernelsof
s~ is afamily
Mof
2 x 2 tensors indexedby P (E) (E) defined by
and
The
product of
GTKs will be the usual inner tensorproduct:
Remark. - M
(A,
is theprobability
ofjumping
from A into B at time n andpoint j
ofB, coming
from i at time m andhaving stayed
in Ain the mean time.
It will be
helpful
to introduce some "characteristic kernels":DEFINITION 2. 8. - For any subset A
of
E wedefine
the characteristic kernelof
A to beRemark. - The characteristic kernel of
E, I(E)
is a unit for thegeneralized
transition kernels of .~‘:__.. A. ~~_._. _____ _
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
The
following decomposition
formulas will allow us to write thejumps
from A into B in terms of smaller subdomains.
PROPOSITION 2 . 9. - Let A and C be subsets
of
E and let B be a subsetof A.
We have:Proof. -
We havefrom which it is easy to deduce
equation (59).
PROPOSITION 2 . 10. - Let A and C be subsets
of
E. Let ~ be apartition of A.
We haveProof -
Let us introduce the time of lastjump
into one of thecomponents
of d: let us call - theequivalence
relation inducedby d
onE, ij
if andonly
ifi = j
or there is G E j~ such that(i, j)
EG~.
Weput
We have
but
..., ~ 2014 1
_.._l~_.. ~ A
Moreover,
if i EG,
Vol. 27, n° 3-1991.
308 O. CATONI
and if
End
of -
theproof of proposition
2 . 10.We can write the
jump
from A to C as a series ofjumps
betweensmaller
subdomains,
asexpressed
in thefollowing proposition.
PROPOSITION 2.11. -
Decomposition of
M :Let A and C be subsets
of E,
and let be apartition of
A. Wehave
with the convention that in the
first
sum the termcorresponding
to k = 0 isT /T ... ’- ... IVS
in equation tne summation in /c is unite tor any
M(A, ~)I; m (namely
it can be restricted tokn-m).
Proof -
Consider theequivalence
relation - associated with thepartition
of A. Define the sequence ofstopping
timespm by
and
We have for any i~E and
any jEC:
Conditioning by pm, I k
in each term in theright
membergives
thedesired result.
End
of
theproof.
For well behaved domains A
(i.
e. forcycles)
theprobability
to stay inn
A between times m and n is
roughly speaking f1 ( 1- ~ e - H ~A~~T~~
whereH(A)
is thedepth
of A and theprobability
ofjumping
out of A into C isAnnales de l’Institut Henri Poincaré - Probabilités et Statistiques
roughly
The
following
definitions will make moreprecise
what"roughly"
couldmean in the
previous
sentences.DEFINITION 2.12. - A KI
(Kernel
on theIntegers)
G is afunction
A KI G will be said to be
increasing f Gm
= 0for
m > n, it will be said tobe
decreasing f Gm
= 0for
m n, it will be said to befinite if
A RKI
(Right
Kernel on theIntegers)
is anincreasing
KIQ
which hasthe Markov property:
w ~ri , ~ ~.,~~
A ~KI is a I~I
Q
such thatQ defined by Qm
=(~n
isdecreasing
and has the Markov property.Let
~E, U,
q,T, ~)
be anannealing algorithm.
Ror anypositive
constants
H,
a,b,
a class Dr(H,
a,b)
is a RKIQ satisfying
1 ,..~ ., - 1
r
where the
cooling
schedule T is extended to Zby putting Tk
=T1 for
k 1.In the same way a LKI
of
classDl (H,
a,b)
is a LKIQ satisfying
DEFINITION 2 . 13 We will use the
following notations for
KIs andand
Vol. 27, n° ~-~~~~.
310 O. CATONI
using mese aecomposmon formulas, we are going estlmate Im B) m terms of RKIs and LKIs.
Let us
study
first thesimplest
case as anexample
to begeneralized.
We will
study
thejumps
from A into Bby
induction on the size of A.Hence the first
thing
to do is to describe thesimple
case when A is reducedto one
point.
Example: Study o~~{~5~,
We will
study
thefollowing
cases:~ The
pair { i, s ~
forms acycle C,
= F(C).
~ The
starting
state i isequal
to s.The first
corresponds
to thesimplest
case when A is acycle
with oneof its
ground
states removed. This situation isinteresting
in thestudy
ofthe last excursion of the
system
from one of theground
states of acycle
before it
jumps
out of it.The second case is the initialization of the induction
argument
on the size of A and is mentioned for the sake ofcompleteness.
PROPOSITION 2. 14. - energy
landscape
withinitial distribution and communications. Let C
= ~ i, s ~
be acycle
withelements. Assume that
Ui Us. Let j
be a stateof
E - C. There arepositive
constants a and
b,
there is asimple annealing framework (E, U,
q, F(To), X)
in which there is aRKIQ1
and a LKIQ2 of
class~ {H~ ~ ~, a, b)
such thatProof -
Let usput
.~ _ , ..
where
~~ =1
if k = I and~k = 0
otherwise. In the same way, let usput:
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
It is easy to see that
Q
is a RKI and R a LKI of class~d (0, a, b).
Wehave
and
There exist
positive
constantsTo
and a suchthat,
in theannealing
frame-work associated with the
cooling
framework /(To)
we have:with
The choice of
To
and a arelinked,
it ispossible,
forexample,
to takeLet us
put
and
it is easy to deduce the
part
ofproposition
2.14concerning
RKIs fromthe
preceding equations.
The
proof
for LKIs is in the sametrend, putting
End
of
theproof
PROPOSITION 2 .15. -
Let(E, U,
q, be an energylandscape
withcommunications and initial distribution. Let C
= ~ i ~
be asingleton
in E andlet j
be in E - C. There arepositive
constants a,To,
a and asimple annealing framework (E, U,
q, ~(To), ~’)
in which there exist a R~Q~
and aLKI
Q~ of
class ~(H (C),
a,0)
such that:Vol. 27, n° 3-1991.