HAL Id: jpa-00246344
https://hal.archives-ouvertes.fr/jpa-00246344
Submitted on 1 Jan 1991
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Ensemble approach to simulated annealing
George Ruppeiner, Jacob Mørch Pedersen, Peter Salamon
To cite this version:
George Ruppeiner, Jacob Mørch Pedersen, Peter Salamon. Ensemble approach to simulated annealing.
Journal de Physique I, EDP Sciences, 1991, 1 (4), pp.455-470. �10.1051/jp1:1991146�. �jpa-00246344�
Classificafion
Physics
Abstracts05.20G 02.50 02.70
Ensemble approach to simulated annealing
George Ruppeiner ('),
JacobM§rch
Pedersen ~2,*)
and Peter Salamon(3)
(~) Division of Natural Sciences, New
College
of theUniversity
of SouthFlorida,
Sarasota, Florida 34243, U.S.A.f) Physics Laboratory,
H. C.§§rsted Institute, University
ofCopenhagen, Universitetsparken
5, DK-2100Copenhagen,
Denmark(~) Department of Mathematical Sciences, San
Diego
StateUniversity,
San Diego, California 92182, U.S.A.(Received16
November 1990,accepted
9 January1991)
Abstract. We present three reasons for
implementing
simulatedannealing
with an ensemble of random walkers which search theconfiguration
space inparallel.
First, an ensemble allows theimplementation
of anadaptive coofing
schedule because itprovides good
statistics forcollecting
thermodynamic information. TMs newadaptive implementation
is general,simple,
and, in somesense,
optimal.
Second, the ensemble can tell us how tooptimally
allocate our effort in the searchfor a
good
solution, I.e.,given
the total computer time available, how many members to use in the ensemble. llfird, an ensemble can reveal otherwise hiddenproperties
of theconfiguration
space,e.g.,
by examining Hamming
distance distributions among the ensemble members. We present numerical results on thebipartitioning
of randomgraphs
and on agraph bipartitioning problem
whose staticthermodynamic properties
may be solved forexactly.
1. Introducdon.
Simulated
annealing
is a class of computeralgorithms
forfinding good
solutions to certaincomputationally
intractableproblems [I].
It is based on ananalogy
with statisticalmechanics, particularly
the statistical mechanics ofspin glasses [1, 2].
Theobjective
is to n~inimize the cost functions forproblems
which have many local minima in their space ofconfigurations.
Most
previous approaches
to simulatedannealing
used a serial searchrepeated
several times to get agood
solution.Among
the fewparallel studies,
thetypical procedure [3]
divides theconfiguration
space into cells. In view of thephenomenon
of brokenergodicity,
this seems unnecessary, since the accessibleconfiguration
space will in any casefragment
into cells[4].
Better to let the nature of the space decide the division We advocate a
parallel
search, of the entireconfiguration
spacesimultaneously by
an ensemble of random walkers with limitedcommunication.
Specifically,
we allow the random walkers to send informationgiving
the(*)
Permanent address.-#degaard
& Danneskiold-SaJns#eApS., Kroghsgade
I,Copenhagen,
Dennlark.values of the
energies they
visited to a central control routine which decides when toupdate
the shared value of the
temperature.
We advocate a
parallel
search of the entireconfiguration
space for three reasons.First,
it enables us toget good
statistics forthermodynamic quantities
with ensemble averages. Good statistics make theimplementation
of ouradaptive
schedulepossible.
Second,
it enables us tocompute
collectiveproperties,
such asHamming distances,
among the members of the ensemble.Hamming
distancesprovide insight
into the nature of theconfiguration
space of theproblem.
Inparticular, they
illuminate the distribution of local energy minima in these verycomplicated
spaces.Third,
the ensemble makes itpossible
to calculate the distribution of Best So FarEnergies, (BSFE),
as a function of time[5].
The BSFE is the minimum energy a walker has seen up to time t. The BSFE distribution for all walkers allows a determination of theoptimal
ensemble size for agiven computational
effort which minimizes theexpectation
value of the lowestenergy seen.
The idea of ensembles is
clearly
dictatedby
theanalogy
to realphysical systems.
While ourarguments
do notrely
on thisdirectly,
the intuition for them is often motivatedby
theanalogy.
Westrongly
believe that theanalogy
tophysical
systems is useful and runs fardeeper
than what is
presently
understood. Simulatedannealing
should bejudged
with animplemen-
tation which makes full use of the intuition
gained
from theanalogy
rather thanmerely
the idea of ii slowcooling
».Let us add that an essential aspect
conceming
ourapproach
to simulatedannealing
isii broken
ergodicity
», which is of fundamentalimportance
inspin glass problems [4].
At lowtemperatures,
time averages are notequal
to ensemble averages even after a verylong
time since thesystem gets
stuck inregions
ofphase
space from which escapeis, practically speaking, impossible.
This makes the ensembleapproach fundamentally
different in character from the serialapproach.
2.
Theory.
In this
section,
we discuss thetheory
behind our method. The discussion dividesnaturally
into four sections 2.Ibackground,
2.2implementation,
2.3 Best So FarEnergies (BSFE),
and 2.4Hamming
distances.2,I BACKGROUND. Let us introduce the
example
we use to illustrate the ideas in thispaper the
problem
ofgraph bipartitioning [2].
Agraph
consists of a collection of vertices connectedby edges [6].
Theproblem
is to divide the vertices into twoequal
size sets, A andB,
such that the number ofedges running
between the sets is minimized. Each way ofassigning
vertices to sets is a «configuration
» and the <i cost function » for anyconfiguration
is the number of
edges connecting
vertices in different subsets.A
practical application
ofgraph bipartitioning
isprovided by
circuitdesign,
where the vertices arecomponents
of an electronic circuit and theedges
are electrical connections between thecomponents.
How should the components be dividedequally
between twochips
to minimize the number of connections between the
chips
?The
difficulty
ingraph bipartitioning
is that it appears one cannot, for ageneral graph,
beguaranteed
ofhaving
found theoptimal solution, I.e.,
theconfiguration
with the smallest value of the costfunction,
unlessessentially
all of theconfigurations
have beentried,
and thenumber of
possible configurations
growsexponentially
with the number of vertices.Graph bipartitioning
is anNP-complete problem,
for which no efficientalgorithm
forfinding
theoptimal
solution is known[7].
Theobjective, therefore,
is to find efficientalgorithms
forfinding good
solutions.A
graph
consists ofvertices,
and ofedges connecting
somepairs
of vertices. The verticesare numbered from I to
N,
where we take N to be an even number. Two vertices connectedby
anedge
are said to beadjacent.
There are
l~
=~"
=2/~ (1)
I
~6'~
2 2
possible configurations
for thepartition
of agraph
into twoequal
size subsets A and B. For N=
52,
it would take a computer which can test a millionconfigurations
per second about sixteen years to test everyconfiguration.
Theproblem
with 54 vertices would take about four times aslong. Typical problem
sizes in circuitdesign [I]
have N= 5 000. It is clear that the most we can
hope
to find is agood configuration.
An
analogy
is made between the cost function in simulatedannealing
and the energy in statistical mechanics.Henceforth,
we call the cost function the energy.Configurations
in simulatedannealing
areanalogous
to microstates in statistical mechanics.To
approach
thegraph bipartitioning problem, imagine
that there is a «walker»constrained to travel over the
configuration
spacelooking
for theoptimal
solution. A clockkeeps
time. As the time increasesby
oneunit,
the walker considersmaking
a «step
fromone
configuration
to another. A stepcorresponds
toswitching
arandomly
selected vertex in subset A with arandomly
selected vertex in subset B. Anattempted step
isaccepted
with theprobability
~
~~~ ~~ ~xp
AE/T)
~ ~
,
~~~
given by
theMetropolis algorithm [1, 8].
AE is thechange
in energy which would result fromaccepting
the step, and T is a parameter called the temperature. If T isinfinite,
allpossible
steps areaccepted,
and the search isentirely
random. If T is verysmall, uphill
moves areessentially forbidden,
and the search is a descentsearch,
or iiquench
», which ceases to accept furtherattempted
moves once the walker reaches a local minimum. The idea in simulatedannealing
is to start at ahigh
temperature and coolslowly
so that the walker iscontinually
drawn towards low energy
configurations,
while at the same timebeing
able to escape local minimaalong
the way.Our
step
rule for the random walk inconfiguration
space defines adynamics
for theproblem.
For fixedT,
it leads to the Gibbs-Boltzmann distributionexpj- E(w )/n
(3) P(w, T)
=
z(T)
'in the limit of infinite
time, regardless
of the initialconfiguration [1, 8]. Here,
p(w, T)
is theprobability
ofgetting
theconfiguration
w, with energyE(w ),
attemperature
T andz(T~
=
z
exPE(W )/11 (4)
is the
partition
function. The sum is over allpossible configurations.
The
equilibrium
average energy(E)~
at sometemperature
T is(El
~ =
z E(W )P(W, T) (5)
The
equilibrium
heatcapacity
isgiven by
a fluctuation formulad
(El
~
j(AE)~j
C(T~
~"
j
= ~ ,(6)
where
i(AE)~>
~ =
z lE(W <El TI~P (W, T) (7)
In our
approach
to simulatedannealing,
we use an ensemble of walkers to calculatethermodynamic
averages. The ensemble of walkerscontinually
tries to reachequilibrium
witha « reservoir» at temperature
T,
which is the temperature in theMetropolis probabifity, equation (2).
We refer to this reservoir as the «target
». Akey assumption
in thejustification
of
optimization,
but not in theimplementation
of ourmethod,
is that theenergies
of the ensemble of walkers are at all times distributedaccording
to the Gibbs-Boltzmanndistribution at some
temperature.
The two
examples
ofgraph partitioning problems employed
in the present paper arerandom
graphs
and asimple family
ofgraphs
called « necklaces ». Randomgraphs
aregraphs
in which each
possible edge
ispresent
or not with a certainprobability.
An instance of thegraph
isgenerated by using
this property to decideindependently
whether eachedge
is present. Necklaces are afamily
ofgraphs
whosethermodynamic properties
can be solved forexactly [9].
An[m,
n necklace consists of acycle
of melements,
each element is acompletely
connected
graph
with n vertices. In this paper, we have restricted m to be even, and usedn = 2.
Figure
I shows the[8, 2]
necklace in aconfiguration
of lowest energy. There are m lowest energyconfigurations
relatedby
«rotations». In ourdynamics,
eachoptimal configuration
isseparated
from the one next to itby
an energy barrier ofheight
2.3
2 4
15 5
6
14 8
13 7
12 10
9
Fig,
I. An [8, 2] necklaceconsisting
of a cycle of 8 beads, each beadcomposed
of bvo vertices. Anoptimal partition,
With energy 2, of thisgraph
is indicatedby
the dashed vertical line ; let all vertices to the left of the dashed line be in subset A and all the ones to theright
in subset B. A rotationby
ar/4 leads to anotheroptimal configuration.
Bipartitioning
necklaces is not anNP-complete problem. Nonetheless,
it possessesfeatures,
such as many local minima and brokenergodicity,
which lead toinsight
into more difficultproblems.
In a similarspirit,
Ettelaie and Moore have used a one dimensionalIsing spin glass
tostudy
simulatedannealing [10].
A fundamental
difficulty
with simulatedannealing
forgraph bipartitioning
is that there areregions
of theconfiguration
space which aredynamically
inaccessible from one another on the time scale ofobservation, particularly
at low temperatures. This causesergodicity
to break down over time scales even muchlonger
than those characteristic of simulatedannealing
runs[4]. Hence,
time averages taken for asingle
walker are not the same as ensemble averages taken over the entireconfiguration
space. It follows that the ensembleapproach
isquite
different from the serial
approach.
Thus the ensembleapproach
is well worthexploring
evenaside from the
question
ofcomputational speed.
If we cool
starting
with an ensemble inequilibrium
at sometemperature,
then the ensemble members must rearrange themselves withinrelatively
isolatedregions
of theconfiguration
space, and
they
must also rearrange between theregions.
lvhile the firstrearrangement
islikely
to berelatively fast,
the latterlikely
takes considerable time and results in the ensemblefailing
to reachequilibrium.
Theonly
way to alleviate this is to cool veryslowly.
Ourgeneral
approach,
based on ensembleequilibrium
forjustification
ofoptimality,
may be viewed as apossible improvement,
but it also suffers from verylong equilibration
times at lowtemperatures.
2.2 SCHEDULE. A critical issue is how to best carry out simulated
annealing
runs. What is the best sequence of temperatures, and how much time should be spent at each temperature ? Mostprevious applications
of simulatedannealing
have used preset schedules based either onsimplicity
ofimplementation
or, in the case of the Geman and Geman schedule[I1, 12],
onthe fact that it has
provable asymptotic properties.
Here we present anadaptive
schedule whoseimplementation depends
on the use of ensembles.Let t denote the «time measured as the number of
Metropolis
stepsattempted
perwalker. The Geman and Geman schedule uses
T(t)
=
d/In
I +t),
where dis a constant. Forsufficiently large
d, this schedule results in a random walkwhich,
in the limit of infinitetime,
visits theground
state withprobability
I.However,
no one has used a value of d whichgives
asufficiently
slowcooling
rate for the provenasymptotic properties
toapply.
More
typically,
simulatedannealing
studies have used a constantcooling
rateT(t)
= a bt
[10, 13-14]
or anexponential cooling
rateT(t)
=
aexp(-bt) [15],
where a and b areconstants. Our interest is in a certain schedule which makes use of
thermodynamic
information
gathered during
thecooling.
Severalcooling
schedules have been advanced which make use ofthermodynamic
criteria.Examples
of such schedules include ones whichkeep
AE[16, 17]
or AS[18]
constant for each relaxation fromT(t)
toT(t
+ I).
Such schedulescan
improve performance
ascompared
withexponential
or linearcooling.
Our constantthermodynamic speed
schedulekeeps
the average energy of the system within a fixed number of standard deviations of the energy which the system would reach inequilibrium [5, 19-24].
The constant
thermodynamic speed
schedule has been shown to beoptimal
in a certain sensespecified
below and hasperformed
very well in simulations[5, 20-23].
Let
(E)
be the average energy of the ensemble of walkers at sometime,
and let(E)
~
be the average energy, as
given by equation (5), corresponding
to T.~loote, though,
that
equation (5)
is neveractually
used to calculate(E)
~!) In our
schedule,
T is controlled soas to
keep (E)
~ a fixed number v of standard deviations less than
(E)
:jEj
~ =
jEj
vj (AE)2j '/2, (g)
where v is the «
thermodynamic speed
» which iskept
constant, andj( AE)2j
=
j (E (E) )~)
is the variance of theenergies
of the ensemble members. The ideais to evaluate
(E) often,
andadjust (E)~
to maintain theequality
inequation (8).
Following
each calculation of a new(E)
~
using equation (8),
a new T must be determined forcalculating
thestep probabilities.
We refer to(E)~
and T as thetarget
energy andtemperature, respectively.
Let d(E)
~
and dT be the
change,
from the oldvalues,
of thetarget
energy and thetarget
temperature,respectively.
Fromequation (6),
d(E)
dT ~
=
T~
~
((AE) ) (9)
Details of our
implementation
will be discussed further in thecomputation
section where we present anexplicit algorithm.
Our schedule is consistent with the common wisdom that
cooling
should slow down if either the heatcapacity [I]
or the time scale[25] gets large.
We tested our
implementation
of simulatedannealing against
some othercooling
schedules[23].
Ourimplementation
was found toproduce generally superior
results.Our schedule is
optimal
in two senses. The first is based on the constraint that the state of the ensemble should remain at all timesstatistically
«indistinguishable
from theequilibrium
state toward which the system is
striving [19, 26].
The second is the fact that our schedule minimizesentropy production
in thethermodynamic analog
of a system cooled to a lowtemperature by
successive contact with cooler and cooler reservoirs[23, 27~.
2.3 BSF ENERGY. The Best So Far
Energy
of the I-th walker at time t,BSFE;(t),
isdefined as the lowest energy that walker has seen up to time t:
BSFE,(t)
=
ruin
(E;(t'))
,
(10)
o«i~«i
where
E;(t')
is the energy of the I-th walker at time t'.We have used BSFE'S as a measure of
quality
incomparing
differentoptimization
methods[5, 23, 28, 29].
Here we show how one can use BSFE'S to determine theoptimal
ensemble sizefrom an
annealing
run, forgiven
total number ofMetropolis
steps, orcomputational
effortc = M* t, where M is the number of walkers
[30].
Denote the distribution of BSFE'S
by f(E,
t).
Fromf(E,
t it ispossible
to calculate the distribution of theVery
Best So FarEnergy, VBSFE,
which is the minimum energy seen up to time t over the entire ensemble :VBSFE(t)
=
ruin
BSFE;(t) (I I)
o «; «M
Denote the
probability
distribution of VBSFE'Sby
h(E,
t, M).
This distribution is found from its cumulative distributionH(E,
t,M)
which isgiven by
:E M
H(E,
t,M)
= Il
f(E',
t) dE') (12)
-w
The
integral
inequation (12)
is theprobability
that the BSFE is less than orequal
to E. One minus thisintegral
is then theprobability
that the BSFE is greater than E. This difference to the M-th power is theprobability
that all Mindependent samples give
a BSFE of E orgreater.
H(E,
t,M)
is therefore theprobability
that at least one of the M walkers has visited a state with energy E or lower.The
probability density h(E,
t,M)
is :h(E,
t,M)
=
~~~~
~'~'~
,
(13)
with
expectation
value(VBSFE (t, M) ) lm
= Eh
(E,
t dE.(14)
m
For a
given
totalcomputer
effort c, it ispossible
to minimize(VBSF(t, M)) by varying
the ensemble size M. At least one local minimum for(VBSF(t, M~)
should exist since for very small M the walkerssample
too little of theconfiguration
space to find a low energy, and for verylarge
M too little time is allocated to each walker to get very far. We let M*=
M*(c)
denote theoptimal
ensemble size for fixed c.Using
an ensemble in simulatedannealing
allows us tosample
and hence estimate thedistribution
f(E,t).
From this distribution we can find theoptimal
ensemble sizeM* a
posteriori.
As yet, no run-time method forchoosing
M* has been found.To get a
good
estimate of the distributionf(E,
t)
it is necessary to run theannealing
withan ensemble. The calculation can serve as a
guideline
forchoosing
an ensemble size M on similarproblems.
2.4 HAMMING DISTANCES. A measure of the nature of the
configuration
spacebeing
sampled by
the ensemble at any instant isprovided by
the distribution ofHamming
distances between the walkers. Eachgraph
vertex contributes either 0 or I to theHamming
distance between two walkers ; the contribution is 0 if the vertex is in the same subset for both walkers and I otherwise. The sum of the contributions of all the vertices is theHamming
distancebetween the two walkers. The
Hamming
distance is a natural measure for distance inconfiguration
space.There are
M(M-1)/2
distinctpairs
of walkers. We note thefollowing
facts :(I)
theHamming
distance H between two walkers lies between 0 and the number of verticesN,
inclusive ;(2)
His an even number ;(3)
in the limit as M getslarge,
theHamming
distancedistribution is
symmetric
aboutN/2; (4)
athigh
temperatures, where the walkers arerandomly
distributed over theconfiguration
space, theprobability
distributionP~(H)
ofHamming
distances H isPN(H)
=
°
if His odd~i
N N!(15)
j~r
j~)j
jpif HIS even
Condition
(3)
holds since theconfiguration
space issymmetric
under a swap of all A and B vertices. If such a swap is made for awalker,
theHamming
distance with any other walker goes from H to N H. Thisexpected
symmetry offers agood
runtime check as to whether or notenough
walkers arebeing
used.As the temperature is
lowered,
walkers should tend to grouptogether
in low energyregions
of the
configuration
space and theHamming
distribution shouldspread
out from a centralpeak.
The distribution ofHamming
distances is a direct measure of the effectiveconfiguration
space
currently being sampled by
the ensemble.In
problems
with aunique optimal configuration,
theHamming
distance distribution shouldapproach
twospikes
at 0 and N as thetemperature
isslowly
reduced to zero.start
Set Initial T
Equilibrate Ensemble at T
<E>T
= <E>NewE <E> v <(AE)z~i/2
NewT
=
+
T2[NewE
T
=
NewT E(T)
= NewE
Run Ensemble at T for t~ time steps
Fig.
2. Flowchart of our main program. Theexpression
for New E comes fromequation
(8) and theone for New T from
equation (9).
The averages are the ensemble averages. A program constant tr contained the number of times each walker was considered at each temperature. In our runs, thisconstant was set at 100.
Also
illuminating
are theHamming
distributions for the walkers with aspecific
energy. This allowsviewing
theconfiguration
space at aparticular
energyslice,
forexample,
the lowest energyconfigurations.
3.
Computer
program.The
major portion
of our program was written in MaCFORTH. Forspeed,
the oftenrepeated parts
were written in 68 000 ASSEMBLYlanguage. Thirty-one
bit random numbers weregenerated
with Tautsworth'salgorithm (R250)
which isfast,
and has anessentially
infinitecycle
time[31, 32].
Below we describe the data structures andalgorithms
usedby
ourprogram.
The
graph
data structure consists of threeparts
:first,
a constantgiving
the total number ofvertices, second,
a list ofneighbors
for each vertex[33], and, third,
a table of Naddresses,
one address for each vertex
neighbor
list. This data structure waspicked
to minimize the timerequired
to read off all theneighbors
of any vertex, and to minimize the amount of memoryrequired
to store thegraph.
Anadjacency
matrixtype representation
of N x Nflags,
inwhich the
I, j-th
matrix element is I if vertices I andj
areconnected,
and 0otherwise,
takes up much more memory for sparsegraphs,
and is slowerreading
off all theneighbors
of a vertex.On the other
hand,
with anadjacency
matrixrepresentation, given
any twovertices,
one may look updirectly
whether or notthey
areneighbors.
This must be known tocompute
energychanges.
With our data structure for thegraph,
thisquestion required
a list search.The list of
neighbors
for each vertexbegan
with the number ofneiglibors
of the vertex, followedby
a list of theneighbors.
Theneighbor
vertex numbers were listed inascending
order to allow use of the efficient
binary
searchalgorithm.
The walkers were numbered from I to
M,
where M is a constant. Each walker datastructure consists of two
parts
: the energy of the walker and an array of Nflags
whichgives
the subset to which each vertex
belongs.
If two vertices areswitched,
both walker vertex subset list and walker energy areupdated.
The
target
data structure consists of threeparts
: thetarget temperature T,
thetarget
energy(E)
~
and a table of
Metropolis probabilities corresponding
to the target temperature,one element for every
possible
AE. Theprobabilities
were stored as scaled upintegers
tospeed
up the arithmetic.We
kept
track of the BSFE'S in an array with one element for each walker.Following
eachupdate
of awalker,
the walker's energy wascompared
with hiscorresponding
BSFE. If the walker's energy was less than hisBSFE,
then his current energyreplaced
his BSFE.In addition there were constants for the
thermodynamic speed
v and for the number of times each walker was considered beforeupdating
the target. There was also a constantcontrolling
howfrequently analysis
was saved to a text file for later use. A counter was used tokeep
track of the total number of times each walker was consideredduring
a run.The program
proceeded
verysimply,
as shown infigure
2.During initialization,
all data structures wereprepared.
With a newgraph,
the initialtemperature
waspicked fairly high by
the user, and the walkers were allowed to
equilibrate
at that temperature. This allowed the determination of the firsttarget
energy. The criterion forchoosing
the firsttemperature
wasthat the
Hamming
distance distribution shouldapproximately
match thehigh
temperature distribution inequation (15).
Then each walker in the ensemble was considered several times(100
times for all the runsreported here)
for aMetropolis step
at thetemperature
T. Afterthis,
a new target was set and the process continued until the average energy ceased toterminate.
4. Results.
In this
section,
wepresent
results of ourcomputer experiments
with several 100 vertex randomgraphs,
and with the[80, 2]
necklace.We
begin
with the necklace results since these are easy tointerpret.
We made a sequence ofruns at various
thermodynamic speeds
on the[80, 2]
necklace. Each run started with anensemble of 500 walkers in
equilibrium
at T= 2.0.
Equilibrium
was achievedby running
at thistemperature
for several thousand steps per walker. This temperature is well above T=
0.42,
the location of thepeak
in the heatcapacity.
TheHamming
distance distribution T= 2.0 isreasonably
close to that of thehigh
temperature distribution inequation (15).
During
thecooling,
the target wasupdated
after each 100attempted
steps per walker.Figure
3a shows the target energy(E)~
as a function of the targettemperature
T for severalspeeds.
Also shown is the theoretical curve[9].
Thepoints
for all thesespeeds
areclose to one
another,
and to the theoretical curve,indicating
that the temperature did notchange
too fast at any of thespeeds
shown.Ultimately,
several walkers found anoptimal
configuration
of the[80, 2]
necklace after each of the runs.14
40 ))~~~ ~,
~o
ii
~~~ ~
~
;~
,to~'°
~f
~~.
wA~°°'
C , «
" 20 a . $'
~ i i
.jw
+ . ..~
~ ,.
~° io
a)
~~
0 9
0.3 O.4 0.5 O.6 O.7 O.8 O.9 1-O 0.35 O.36 O.37 O.38 O.39 0.40
Target Temperature Target Temperature
Fig.
3.-Target
energy(E)~
for the[80,
2] necklace as a function of the target temperature Tcomputed along
the way for fourspeeds.
Also shown is the theoretical curve for(E)
~-
Figure
3b is amagnification
of a section of thegraph
at low temperatures. At low temperatures, theexperimental
curve falls above the theoretical one for all speeds. We believe that this is an indication of non-
equilibrium brought
onby long
relaxation times. The horizontal arrow infigure
3a shows where the theoretical curve crosses the leftedge
of thegraph.
Figure
3b shows amagnification
of thegraph
at lowtemperature.
It is clear that thepoints
fall closer to the theoretical curve for smaller
speeds. However,
thelimiting
curve isapproached slowly
withdecreasing speed, indicating
the presence of verylong
relaxation times. The ensemble method offers asimple limiting technique
to at least test whether or not we areapproaching equilibrium along
the way. Note thatdespite
the presence of verylong
relaxation
times,
theexperimental
results do not deviateexcessively
from the theoretical results.6
5 o °
~
O
~fl
~4 O
~
V
3 05
Theo
2
lo 20 30 40
Fig. 4.- Ensemble energy standard deviation as a function of the ensemble average energy for
v = 0.05. The scatter is less than 10 9b about the theoretical curve. Good data for
((6E)~)"~
are
essential for the
implementation
of ouradaptive
schedule.Essential
ingredients
in theimplementation
of our schedule aregood
run time estimates of((AE)~)
~/~.Figure
4 shows((AE)~)
"~ for the ensemble of 500 walkers as a function of the average ensemble energy(E).
Also shown is the theoreticalequilibrium
curve. Both thequality
of the data and theagreement
withtheory
aregood.
20
~ ° O v=0 05
6 O D v=0 20
o A
v=0 50
p 4 ~
B D
~ l 2 O
I O
1.0 O
~ A ~
f~ 8
~
O 6~
~
04 ~ A
02
go io3 io4 ios
Time
Fig.
5.Target
temperature T as a function of theMetropolis
time t for the [80, 2] necklace for three values of thethermodynamic speed.
All of the runs started with an ensemble of 500 walkers inequilibrium
at T = 2.0. For these threespeeds,
the lowest average energy waseventually
achieved withv =0.2. The runs with v =0.5 got stuck in local energy minima too soon, and the run with
v = 0.05 was too slow to reach
really
low temperatures in the time allotted.Figure
5 shows the temperature schedule followed with severalspeeds. Cooling
israpid
atfirst,
but slows as thetemperature
decreases and relaxation times grow.We turn now to our results on the random
graph.
Consider first randomgraph
A which consists of N= 100
vertices,
and was constructed withprobability
of anypair
of verticesbeing
connected p = 0.05. Random
graph
A was used ih reference[23],
where it was shown that theconstant
thermodynamic speed
scheduleproduced
a lower BSFEnergy
distribution thanseveral other schedules encountered in the literature. This reference also showed a
graph
of the heatcapacity
for randomgraph
A which has a well definedpeak
at T= 0.87 ± 0.03. The constant
thermodynamic speed
scheduleproved
to followroughly
the samecooling pattem
asa modified Geman and Geman schedule
[12, 34].
We note,however,
that the Geman and Geman schedule is notadaptive,
and hasparameters
which may be difficult to select.53
a v=O.20 o v=O.lO
~ 52
0.05 o.
~
'
cn
~P
51~
uj
50
%
~
49
~
48
47
0 2 0.25 0.35 0.45
Target Temperature
Target TemperatureFig.
6.Target
energy(E)
~ as a function of the target temperature T for Random
Graph
A, which has 100 vertices and p= 0.05, for three
speeds. Figure
6b is amagnification
of a section of thegraph
at low temperatures.~ o
oo o
o
o~
6 o°o °
ru *°
$
~W~4 *
f~ E=47
li
2 v 0. I
~
60 70 80 90 loo
<E>
Fig.
7.- Ensemble energy standard deviation as a function of the ensemble average energy for RandomGraph
A with v = 0.I. The scatter is less than about 10 9b. Good data for((AE)~)
"~ areessential for our
adaptive
schedule. The vertical arrow indicates the lowest energy for thisgraph
in all ofthe runs, E
= 47.
We made a sequence of constant
thermodynamic speed coolings
on randomgraph
A. Eachcooling
used an ensemble of 500 walkersstarting
inequilibrium
at T= 2.0 and ran for a total time of about 100 000 steps per walker.
Figure
6 shows thetarget
energy as a function of the target temperature for severalspeeds.
Qualitatively,
the results look similar to those infigure
3 for the[80, 2]
necklacegraph.
We have no exact theoretical solution to compare with theexperimental
results forbipartitioning
a random
graph.
Figure
7 shows((AE)~)
~/~ as a function of(E)
taken in the run with v= 0.I.
Again
the results lookqualitatively
similar to thecorresponding
ones for the[80, 2]
necklace. Thescatter with 500 walkers is not excessive.
The
cooling
schedules followed for randomgraph
A are shown in reference[23].
The characteristiccooling pattem
evident isroughly
the same as that for the[80, 2]
necklaceinitially rapid coofipg,
followedby
slowercooling.
Of considerable interest is the nature of the
configuration
spaces forproblems
such as those considered here. Palmer[35]
hasposed
a set ofquestions regarding
the nature of theconfiguration
space, some of which we take up here.Perhaps
mostimportant
are the nature and the distribution of thedeep
energy minima. Are the minimum energyconfigurations
at the bottom of broadvalleys,
or arethey
more like narrow holes in a flatgolf
course ? Are the sides of thevalleys reasonably smooth,
or are there numerousledges
and traps toimpede
asearch for the bottom ? Are the locations of the
deep
energy minima correlated ? In otherwords,
are theoptimal configurations
close to one another in theconfiguration spare,
or arethey
far apart ? Thesequestions
are amenable toanalysis using
ensembles.We consider first Random
Graph
A discussed above. In the course of the run withv =
0,I,
weanalyzed
theHamming
distance distributions of the walkers withspecific energies. Figure
8 shows theHamming
distance distribution for thisgraph
for four energyslices at and above what we believe to be the lowest energy E
=
47. It is clear that there is a
multiplicity
of states with lowest energy E=
47. In
addition, they
seem to be clustered about tworegions
of thephase
space.A trivial
degeneracy
in the lowest energyconfiguration
is due to vertices which are not connected to any other vertex.Swapping
such vertices results in AE= 0. Random
Graph
Ahas
only
one unconnected vertex, so this is not the cause of thisdegeneracy.
Thedegeneracy
results because of
reasonably
farseparated valleys
inconfiguration
space.E=94 E=62
#=42 #=49
~r=2.0) (T=O.8)
g E=48
g #=195 #=146
~ (T=O.28) (T=O.28)
d
d3
0 50 loo
Hamming Distance
Fig.
8.Hamming
distance distribution for RandomGraph
A for walkers with fourspecific energies.
For each
graph,
the number of walkers at each energy isgiven.
Also show for eachgraph
is the temperaturealong
the way at which the data wascomputed.
The scale of thefrequency
distribution is different for eachgraph.
T/C)E=50
#=46 #=68)fi
E=51TE=5g
#=34~ ~
E)E=53 =54lofiE~56
#=61fi T=52
#=58 154H)E=53 E=54
]E=50
j #=87 #=203 #=49 #=77
I
§
I
~ ~
~
50 loo
Hamming Distance
Fig.
9.Hamming
distance distributions foreight
randomgraphs.
Shown are the distributions for the lowest and the next to the lowestenergies
for eachgraph.
ii
)
A£
ii,o m
§i
v
io 5
5 lo 15 20
M
Fig.
10. Theexpectation
value of the lowest energy observed over the ensemble as a function of the ensemble size M. The total computer effort is calculated as theproduct
of the ensemble she and the time allocated to each walker and waskept
constant at 529 000Metropolis
steps.We
point
out that notonly
was itrelatively
easy to find what we believed to be the lowest energy state(E
= 47 with many
walkers,
but on astraight quench
to T= 0 about 20 iii of the walkers found a lowest energy
configuration.
This indicates that the sides of thelarge valleys
are
reasonably
smooth withenough byways
to slide down even on astraiglit quench.
We examined nine random
graphs
with N=
100 constructed with connection
probability
p =
0.05. For all of these
graphs
it was easy to find what we believe to be the lowest energystates with many walkers. The average lowest energy for these
graphs
is 52 ±3,
whichcompares with 42
predicted by
Fu and Andersen[2]
and 44by
Banaver et al.[36]. Considering
the
approximations
made in these theoreticalcalculations,
we consider this to begood
agreement. These theoretical
predictions
were based on seriesexpansions
for sparsegraphs
;our random
graphs
were not all that sparse.Figure
9 shows that for mostgraphs
the lowest energy states tend to be few in number and clusteredclosely together
in oneregion
of theconfiguration
space. The next tohighest
energyshows more structure as the bottoms of additional
valleys
make their appearance.Our
general
conclusion about the randomgraphs
we studied is that theoptimal energies
areto be found at the bottoms of rather broad
valleys
withreasonably
smooth walls. In morecases than not, the lowest energy
configurations
were in asingle
smallregion
of theconfiguration
space.In
general,
we conclude that our ensembleapproach
offers an excellent way toexplore
theconfiguration
space ofcomplicated systems by
means of theHamming
distance distribution for the ensemble of walkers.Figure
10 shows theexpectation
value of the very lowest energy observed over the ensemble as a function of the ensemble size M. The total computer effort is calculated as theproduct
of the ensemble size and the time allocated to each walker and waskept
constant.The annealed
graph
is a 200 vertex randomgraph
with connectionprobability
of 0.01. Theannealing
was done with an ensemble of 200 walkers with v= 0.I.
S. Conclusions.
We have
presented
a newimplementation
of simulatedannealing.
In contrast to mostprevious implementations,
which are based onrepetitive
serialsearches,
ourapproach
is based on an ensemble : manycopies searching simultaneously
over the entireconfiguration
space.