HAL Id: jpa-00247320
https://hal.archives-ouvertes.fr/jpa-00247320
Submitted on 1 Jan 1997
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
The Random Link Approximation for the Euclidean
Traveling Salesman Problem
N. Cerf, J. Boutet de Monvel, O. Bohigas, O. Martin, A. Percus
To cite this version:
The
Random
Link
Approximation
for
the
Euclidean
lYaveling
Salesman
Problem
N.J.
Cerf
(*),
J. Boutet deMonvel,
O.Bohigas,
O.C. Martinand A.G.
Percus
(**)
Division de
Physique
Th60rique
(***),
Institut dePhysique
NuclAaire,
UniversitAParis-Sud,
91406Orsay
Cedex,
France(Received
22April
1996, revised 28 June 1996,accepted
23September1996)
PACS.02.60.Pn Numerical
optimization
PACS.02.70.Lq
Monte Carlo and statistical methodsPACS.64.60.Cn Order-disorder transformations; statistical mechanics of model systems
#
Abstract. The
traveling
salesmanproblem
(TSP)
consists offinding
thelength
of the short-est closed tourvisiting
N "cities". We consider the Euclidean TSP where the cities aredis-tributed
randomly
andindependently
in a d-dimensional unithypercube.
Working
withperiodic
boundary
conditions and inspiredby
a remarkable
universality
in the kth nearestneighbor
dis-tribution,
we find for the average optimum tourlength
(LE)
=
4E(d)
N~~~/~
[1+
O(1IN)]
withflE(2)
= 0.7120 + 0.0002 and
flE(3)
= 0.6979 + 0.0002. We then deriveanalytical
predictions
forthese
quantities
using
the random linkapproximation,
where thelengths
between cities are takenas
independent
random variables. From the"cavity"
equationsdeveloped by
Krauth, MAzardand Parisi, we calculate the associated random link values
fIRL(d).
For d= 1, 2, 3, numerical
results show that the random link
approximation
isa
good
one, with adiscrepancy
of less than2.1% between
flE(d)
andfIRL(d).
For large d, we argue that the approximation is exact upto
O(1/d~)
andgive
aconjecture
forflE(d),
in terms of a power series in1Id,
specifying
bothleading
andsubleading
coefficients.Rdsumd. Le
problAme
du voyageur de commerce(TSP)
consiste h trouver le chemin ferm6 leplus
court qui relie N "villes". Nous dtudions le TSP euclidien off les villes sont distribuAesau hasard de maniAre ddcorrAlde dans
l'hypercube
de c6tA 1, en dimension d. En imposant desconditions aux bords
pdriodiques
et guidAs par une universalitdremarquable
de la distribution deskiimes voisins, nous trouvons la
longueur
moyenne du cheminoptimal
(LE)
=~E(d)
N~~~~~
Ii
+O(1/N)j,
avecflE(2)
= 0,7120 + 0,0002 etflE(3)
= 0,6979 + 0,0002. Nous Atablissons ensuitedes
prAdictions
analytiques
sur cesquantitAs
h l'aide del'approximation
de liens alAatoires, offles
longueurs
entre les villes sent des variables alAatoiresindApendantes.
Grice auxAquations
"cavitA"
dAveloppAes
par Krauth, MAzard et Parisi, nous obtenons dans le cas de liens alAatoiresles valeurs,
fIRL(d), analogues
hflE(d).
Pour d= 1,2,3, les rAsultats
numAriques
confirment quel'approximation
de liens a14atoires est bonne, conduisant hun Acart inf6rieur h
2,1%
entreflE(d)
et
fIRL(d).
Pour dgrand,
nous donnons des arguments montrant que cetteapproximation
estexacte
jusqu'h
l'ordre1/d~
et nous proposons une conjecture pourflE(d),
exprimAe
en fonctiond'une sArie en
1Id,
dent on donne les deuxpremiers
ordres.(*)
Present address: W-K-Kellogg
RadiationLaboratory,
California Institute ofTechnology,
Pasadena, CA 91125, USA
(**)
Author for correspondence(e-mail:
percusflipno.in2p3.fr)
(***)
UnitA de Recherche des LiniversitAs Paris XI et Paris VI assoc14e au CNRS1. Introduction
Given N "cities" and the distances between
them,
thetraveling
salesmanproblem
(TSP)
consists of
finding
thelength
of the shortest closed "tour"(path)
visiting
everycity exactly
once,~N.here the tour
length
is the sum of thecity-to-city
distancesalong
the tour. The TSP isNP-complete,
whichsuggests
that there is nogeneral algorithm
capable
offinding
theoptimum
tourin an amount of time
polynomial
in N. Theproblem
is thussimple
to state, but very difficultto solve. It also
happens
to be the most well known combinatorialoptimization
problem,
andhas attracted interest from a wide range of fields. In
operations
research,
mathematics andcomputer
science,
researchers have concentrated onalgorithmic
aspects. Aparticular
focus hasbeen on heuristic
algorithms
algorithms
which do notguarantee
optimal
tours for caseswhere exact methods are too slow to be of use. The most effective heuristics are based on local
search
methods,
which start with anon-optimal
tour anditeratively
improve
the tour withina well-defined
"neighborhood";
a famousexample
is theLin-Kernighan
heuristic [1]. Morerecent efforts have involved
combining
local search and non-deterministicmethods,
in order torefine heuristics to the
point
wherethey give
good enough
solutions forpractical
purposes; apowerful
suchtechnique
is Chained LocalOptimization
[2].Over the last fifteen years,
physicists
haveincreasingly
been,drawn
to the TSP aswell,
andparticularly
to stochastic versions of theproblem,
where instances arerandomly
chosen froman ensemble. The motivation has often been to find
properties
applicable
to alarge
classof disordered
systems,
eitherthrough
good
approximate
methods orthrough
exactanalytical
approaches.
In ourwork,
we consider two such stochastic TSPS. Thefirst,
the EuclideanTSP,
is the more classic form of theproblem:
N cities areplaced
randomly
andindependently
in a d-dimensional
hypercube,
and the distances between cities are definedby
the Euclideanmetric. The
second,
the random linkTSP,
is a relatedproblem
developed
within the contextof disordered
systems:
rather thanspecifying
thepositions
of cities, wespecify
thelengths
l~jseparating
cities I and j, where the l~j are taken to beindependent,
identically
distributedran-dom variables. The
appeal
of the random linkproblem
is,
on the onehand,
that ananalytical
approach
exists forsolving
it[3,4],
and on the otherhand,
that when certain correlations areneglected
this TSP can be made to resemble the Euclidean TSP. We therefore consider therandom link
problem
as a random linkappro~imation
to the(random point)
Euclideanprob-lem. Researchers outside of
physics
remainlargely
unaware of theanalytical
progress madeon the random link
TSP;
one of ourhopes
is to demonstrate how these results are of directinterest in
problems
where the aim is to find theoptimum
Euclidean TSP tourlength.
Our
approach
in this paper is then to examine both the Euclideanproblem
and the randomlink
problem
the latter for its own theoretical interest as well as for a betterunderstanding
of the Euclidean case. We
begin by considering
indepth
the EuclideanTSP,
including
a reviewof
previous
work. We find that,given periodic
boundary
conditions(toroidal
geometry),
the Euclideanoptimum
tourlength
LE
averaged
over the ensemble of allpossible
instances hasthe finite size
scaling
behavior(LE)
=
@E(d)
N~~~/~
ll
+ O(j)j
(I)
I
From simulations. we extract very
precise
numerical values forflE(d)
at d = 2 and d =3;
methodological
and numericalprocedures
are detailed in theappendices.
We alsogive
numer-ical evidence that the
probability
distribution ofLE
becomes Gaussian in thelarge
N limit.In addition to these TSP
results,
we find asurprising
universality
in thescaling
of the meanis
equivalent,
inappropriate
units,
to an infinite volume limit at constantdensity.
LE/N~~~/~
then
corresponds
to an energydensity
that isself-averaging
and has a well-defined infinitevolume limit. The
original
proof by
BHH isquite
complicated;
simpler
proofs
have since beengiven by Karp
and Steele[6,
ii.
One of our
goals
is to determineflE(d).
BHH gaverigorous
lower and upper bounds as afunction of dimension. For any
given
instance,
a trivial lower bound onLE
is the sum over allcities I of the distance between I and its nearest
neighbor
in space. Infact,
since a tour at bestlinks a
city
with its two nearestneighbors,
this bound can beimproved
uponby summing,
over all
I,
the mean of i's nearest and next-nearestneighbor
distances.Taking
the ensembleaverage of this
quantity
(that
is, the average over allinstances)
leads to the bestanalytical
lower bound to date. For upper
bounds,
BHH introduced a heuristicalgorithm,
now knownas
"strip",
in order togenerate
near-optimal
tours(discussed
also in a paperby
Armour and ~vheeler[8]).
In two dimensions the method involvesdividing
the square intoadjacent
columnsor
strips,
andsequentially
visiting
the cities on agiven
strip
according
to theirpositions
along
it. The
respective
lower and upper boundsgive
0.6250 <pE(2)
< 0.9204.In addition to
bounds,
it ispossible
to obtain numerical estimates for@E(d).
BHH usedtwo
instances,
N= 202 and N =
400,
from whichthey
estimatedpE(2)
m 0.749using
hand-drawn tours.
Surprisingly
little has been done toimprove
upon this value in twodimensions,
and
essentially
nothing
inhigher
dimensions. Stein [9] has foundpE(2)
m 0.765, which isfrequently
cited.Only
recently
have better values beenobtained,
but asthey
come fromnear-optimal
tours foundby
heuristicalgorithms,
they
should be considered more as upper bounds than as estimates.Using
a local search heuristic known as"3-opt"
[10],
Ong
andHuang
Ill]
have found
@E(2)
<0.743;
using
anotherheuristic,
"tabu"search,
Fiechter[12]
has found@E(2)
<0.731;
andusing
a variant of simulatedannealing,
Lee and Choi [13] have foundpE(2)
< 0.721. In what follows we shall show what is needed for a moreprecise
estimate offlE
Id)
with,
furthermore,
a way toquantify
the associated error.2.2. EXTRACTING
flE(d).
As N- oo,
LE/N~~~/~
converges withprobability
I to theinstance-independent
flE(d).
Our estimate of@E(d)
must rest on someassumptions,
though,
since
only
finite values of N are accessiblenumerically.
Note first that at values of N wherecomputation
times are reasonable.LE
has substantial instance-to-instance fluctuations. Toreduce and at the same time
quantify
these fluctuations, we average over alarge
number ofinstances. We thus consider the numerical mean of
LE
over the instancessampled,
which itselfsatisfies the
asymptotic
relation(3)
but with a smoother convergence. To extractflE(d),
wemust understand
precisely
what this convergence in N is.If cities were
randomly
distributed in thehypercube
with openboundary
conditions,
thecities near the boundaries would have fewer
neighbors
and thereforelengthen
the tour. Instandard statistical mechanical
systems
at constantdensity, boundary
effects lead to correctionsof the form surface over volume. For the TSP at constant
density.,
the volume grows as ~T andthe surface as
N~~~/~
In a d-dimensional unithypercube,
then, the ensemble average ofLE
would
presumably
have thelarge
N behaviorN~-~/~ fl~(d)
Ii
+£
+]
+(4)
In order to extract
@E(d)
numerically.
it would be necessary toperform
a fit which includesthese corrections. A reliable numerical
fit,
however,
must have fewadjustable
parameters,
andthe slow convergence of this series would prevent us from
extracting
flE(d)
tohigh
accuracy. Wetherefore have chosen to eliminate these
boundary
(surface)
effectsby using periodic
boundary
parameters and a faster convergence,
enabling
us to work with smaller values of N wherenumerical simulations are not too slow.
For the
hypercube
withperiodic
boundary
conditions,
let us introduce the notation~
~~
~~ILE(N,
d))
~~~~
' "
Ni-i/d
'where
(LB
is the average ofLB
over the ensemble of instances.(flE
IN, d)
is,
inphysical
units,the
zero-temperature
energydensity.)
We then wish to understand howflE
IN,
d)
converges toits
large
Nlimit,
flE(d).
In standard statistical mechanicalsystems,
there is a characteristiccorrelation
length
(.
Away
from a criticalpoint,
(
isfinite,
and finite size corrections decrease ase~~/~,
where 11' is a measure of thesystem
"width". At a criticalpoint,
(
isinfinite,
and finitesize corrections decrease as a power of
I/W.
For disordered statisticalsystems,
however,
thispicture
must be modified. Even if(
is finite for each instance in theenserqble,
thefluctuating
disorder can still
give
rise topower-law
corrections for ensembleaveraged
quantities.
In thecase of the TSP. this is
particularly
clear: the disorder in thepositions
of the cities induceslarge
finite size effects even onsimple
geometric
quantities.
To see how this
might
affect the convergence offlE(N,d),
consider thefollowing.
For agiven
configuration
of Npoints,
callDk(N, d)
the distance between apoint
and its kth nearestneighbor,
where k=
I,.
.,
N 1. Take the
points
to be distributedrandomly
anduniformly
in the unit
hypercube.
Let us find(Dk(N,d)).
Underperiodic
boundary
conditions,
theprobability
density
p(I)
offinding
apoint
at distance from anotherpoint
issimply equal
(for
0 < <
1/2)
to the surface area at radius I of the d-dimensionalsphere:
d~d/2
~
~~~~
T(d/2
+
1)
~~~The
probability
offinding
apoint's
kth nearestneighbor
at distance I(see Fig.
2)
isequal
tothe
probability
offinding
k I(out
of N-1)
points
withinI,
onepoint
at and theremaining
N k I
points
beyond
1:PlDklN,
d)
=lj
=)_
))
/~
p(I')
l'j
~(N
k) p(I)
I
j~
p(I')
l'j
~ ~ ~(7)
o ~ ~-ll
II
i~v
L)
dlr~~il~l
~l
i~~>
~~il~l
i~
~l
~~18)
giving
the ensemble average~/
~d/2
k~~~~~'~~~
k~
~~
~~ ~r(d/2
+1)
~~~~
~~~~
r(/(~~
i)
~j
~ ~ ~ ~~ ~ ~~~where the corrections are due to the >
1/2
case, and areexponentially
small in N.Recognizing
theintegral,
up to asimple change
ofvariable,
as a Beta function(B(a,
b)
ef/
t~~~ (1
t)~~~
dt,--,_ . ,' , . , ' . , , . ' ~ / ' / ' , , / ' j
~
~~
j . l , I 'fi
' N-k-I points , / , . , . . , / , ' , , . , , . '-__-'Fig.
2. Apoint's
N -1neighbors:
k -1 nearestneighbors
are within distance I, kth nearestneighbor
is at I, and remaining N k -1 points are beyond 1.N, we see that
l~~~~'~~~
"~~~~)~~~~~
~~~il~~~
r((~(lid)
~ ~~°~~~~~~i~~~~~
~~~il~~~
~ ~~~~
~~~~2/~~~
~ ~~/2~~
~~~~We are confronted here with a
remarkable,
and hithertounexplored,
universality:
the exactsame
1IN
seriesgives
theN-dependence
regardless
of k. The same finite sizescaling
behaviortherefore
applies
to all kth nearestneighbor
distances.It
might
behoped
then that thetypical
linklength
inoptimum
tours would have thisN-dependence,
and thatflE(N,d)
would therefore have the same1IN
expansion.
This is notquite
the case. The link between cities and jfigures
in the average(Dk IN, d))
wheneverj
is the kth
neighbor
of i; itfigures
inflE(N,d),
however,
only
when itbelongs
to theoptimal
tour. Two different kinds of averages arebeing
taken, and so finite size corrections need not beidentical.
Nevertheless,
it remainsplausible
thatflE(N, d)
has a1IN
seriesexpansion,
albeita different one from
(11).
While we cannot prove thisproperty,
it is confirmedby
ananalysis
of our numerical data.
Our
approach
tofinding
flE(d)
is thus as follows:Ii)
we consider the ensemble average(LE),
rather than
LE
for agiven
instance,
in order to have aquantity
with a well-defineddependence
on
N;
iii)
we useperiodic
boundary
conditions to eliminate surfaceeffects;
(iii)
wesample
theensemble
using
numericalsimulations,
and measureflE(N,
d)
within well controlled errors;(iv)
we extract
flE(d)
by fitting
these values to a1IN
series.2.3. FINITE SIzE SCALING RESULTS. Let us consider the d
= 2 case in detail. We found
the most effective numerical
optimization
methods for our purposes to be the local search0.713
7
0.711 + $~ 0.709~
~
0.707 ~i~
~~
0.705 0.703 0 0.02 0.04 0.06 0.08 0.I I/NFig.
3. Finite sizedependence
of the rescaled 2-D Euclidean TSP optimum. Best fit(;~
=
5.56)
gives:
flE(N,
2)/[1+
1/(8N)
+ =0.7120(1
0.0171/N
1.048/N~
). Error bars represent statisticalerrors.
the introduction. Both
heuristics,
by
definition, give
tourlengths
that are notalways optimal.
However,
it is not necessary that theoptimum
be found100~
of the time: there isalready
a
significant
statistical errorarising
from instance-to-instance fluctuations, and so a furthersystematic
error due tonon-optimal
tours isacceptable
aslong
as this error iskept
negligible
compared
to the statistical error. Ourmethods,
along
with relevant numerical details, arediscussed in the
appendices.
For thepresent
purposes, let ussimply
mention thegeneral
nature of the two heuristics used. LK works
by
performing
a~'variable-depth"
localsearch,
asdiscussed further in Section 3.6. CLO works
by
an iterative processcombining
LKoptimizations
with random
perturbations
to the tour, in order toexplore
many different localneighborhoods.
We used LK for "small" N values
(N
<17),
averaging
over 250,000 instances at each value ofN,
and we used CLO for"large"
N valuesIN
= 30 and N =
100),
averaging
over10,000
and6,000
instancesrespectively.
We fitted our
resulting
flE(N,d)
estimates to a truncated1IN
series: the fits aregood,
and are stable with
respect
to the use ofsub-samples
of the data. For a fit of the formflE(N,d)
=
flE(d)(1+ A/N
+B/N~),
we findflE(2)
= o.7120 +0.0002,
withx~
= 5.57 for8 data
points
and 3 fitparameters
(5
degrees
offreedom).
Our error estimate forflE(2)
isobtained
by
the standard method ofperforming
fitsusing
a range of fixed values for thisparameter:
the error bar +0.0002 is determinedby
the values offlE(2)
which make X~ exceedits
original
resultby
exactly
I,
I.e.,
making
j2
= 6.57 in this case.It is
possible
to extract anotherflE (IV,
d)
estimateby making
direct use of theuniversality
discussed
previously:
the universalI/N
series in(11)
suggests
that there will be a fasterconvergence if we use the rescaled data
flE(N,
2)
Ill
+1/(8N)
+ .] This also has theappealing
property
ofleading
to a function monotonic inN,
as shown inFigure
3. We findwith the
leading
termhaving
the same error bar of +0.0002 as before. Note that the1IN
termin the fit is small 2 orders of
magnitude
smaller than theleading
order coefficient and so0.16 0.14 0.12 ~ 0.I
(
f
~~~ U~ 0.06 0.04 0.02 0 0.75 -0.5 -0.25 0 0.25 0.5 0.75X~
Fig.
4. Distribution of 2-D Euclidean TSPscaling
variable XN =(LE
(LE))/N~/~~"~.
Shadedregion is for ~T
= 12
(100,000
instancesused)
and solid line is for N = 30(10,000
instancesused).
Superimposed
curve shows(extrapolated)
limiting
Gaussian.The same
methodology
wasapplied
to the d = 3 case. The~2's
again
confirmed thefunc-tional form of the
fit,
and we find from our dataflE(3)
= 0.6979 + 0.0002.Also,
since ourinitial work
[14],
Johnson et al. haveperformed
simulations at d=
2, 3, 4,
obtaining
results[15]
consistent with ours:
flE(2)
m0.7124,
flE(3)
m 0.6980 andflE(4)
m 0.7234.2.4. DISTRIBUTION oF OPTIMUM TouR LENGTHS. While BHH and others
[6,7]
haveshown that the variance of
LB
/N~~"~
goes to zero as N - oo(see
alsoFig.
1), they
havenot determined how fast this variance decreases. More
generally,
onemight
ask how thedistribution of
LE/N~~~/~
behaves as N - oo. We are aware ofonly
oneresult,
by
Rhee andTalagrand
[16], showing
that theprobability
offinding
LE
with(LE
(LE)(
> t is smaller thanIi
exp(-t2/K)
for some Ii.Unfortunately
this is notstrong
enough
togive
bounds on thevariance.
Let us characterize the distribution at d = 2
by
numerical simulation. Formotivation,
con-sider the
analogy
betweenLE/N~~~/~
andE/V,
the energydensity
in a disordered statisticalsystem.
If thesystem's
correlationlength (
is finite(the
system
is notcritical),
Eli'
has adistribution which becomes Gaussian when V
- oo. This is because as the subvolumes
in-crease, the energy densities in each subvolume become
uncorrelated;
the central limit theoremthen
applies.
A consequence is thata2,
the variance of EIV,
decreases asV~~
If(
is infinite(the
system
iscritical),
then ingeneral
the distribution ofE/V
is not Gaussian. In both casesthough,
theself-averaging
ofE/V
suggests
that thescaling
variable X=
(E
(E))lal'
has alimiting
distribution when V - oo.In the case of the
TSP,
it can beargued
using
a theoreticalanalysis
of the LK heuristic thatat d > 2 the system is not critical.
By analogy
withE/V,
if we take subvolumeK to contain afixed number of
cities,
the central limit theorem thensuggests
thatLE/N~~~/~
has a Gaussiandistribution with
a2
decreasing
asN~~
Thescaling
variableXN
=(LE
(LE))/N~/2~~/~
should
consequently
have a Gaussian distribution with a finite width for N - oo(and
atd >
2).
Numerical results at d2.5. CONJECTURES oN THE LARGE d LIMIT. In most statistical mechanics
problems,
thelarge
dimensional limit introducessimplifications
because fluctuations becomenegligible.
Forthe
TSP,
can oneexpect flE
Id)
to have asimple
limit as d - oo?Again,
consider theproperty
of the kth nearest
neighbor
distanceDk.
In thelarge
Nlimit,
(11)
gives
~~~~~~'~~~
~~~
~~~~~~~ji~~~~~
~~ik(~~~'
°~ ~~ ~~~~~~'
~~~~ mJN~~~/~
~(~d)~/~~
(l
+(
+ ,(14)
2~ewhere
Ak
% -'i +)
+)
+ (~/ is Euler'sconstant).
Notice thatAh
mJ In k at
large
k. Thissuggests
strongly
that unless the"typical"
k used in theoptimum
tour growsexponentially
ind,
we may write for d - oo:flE(d)
= lim~~~)
'~~~~
mJ ~(grd)~/~~
(1+
O(15)
N-°~ N 2~e dUp
toO(I
Id),
thisexpression
is identical to the BHH lower bound onflE(d)
discussed in Section2.1,
given by
thelarge
N limit ofN~/~(Di(N,
d)
+D2(N, d)) /2.
A weaker
conjecture
than(15)
has beenproposed by
Bertsimas and vanRyzin [17]
flE(d)
m~
fi$
as d - oo.
(16)
This
limiting
behavior was motivatedby
ananalogous
result for a related combinatorial op-timizationproblem,
the minimumspanning
tree.Unfortunately,
there is noproof
of either(15)
or(16);
inparticular,
the upper bound onflE(d)
given by
strip,
discussed in Section2.1,
behaves as
/fl
atlarge
d. Thus if theconjectures
are true, thestrip
construction leadsasymptotically
to tours which are on average 1.69 times toolong.
Can we derivestronger
upper bounds? A number of heuristic construction methods should do better than
strip,
butthere are no reliable calculations to this effect. The
only
improvements
over the BHH resultsare due to Smith
[18],
whogeneralized
thestrip
algorithm by optimizing
theshape
of thestrips,
leading
to an upper bound which isvi
timesgreater
than thepredictions
of(15)
and(16)
atlarge
d.In
spite
of ourinability
to derive an upper boundwhich, together
with the BHH lowerbound,
would confirm the two
conjectures
for d - oo, we are confident that(15)
and(16)
are truebecause of
non-rigorous
yet
convincing
arguments.
One is aproof
that(16)
is satisfied for theTSP if it is satisfied for another related combinatorial
optimization
problem
(see
Appendix
D fordetails).
A morepowerful
argument,
presented
in Section3.6,
relies on a theoreticalanalysis
of the LI< heuristic. Itsuggests
that up toO(1/d2),
flE(d)
isgiven by
a random linkapproximation,
leading
to aconjecture
evenstronger
than(15).
3. The Random Link TSP
3.I. CORRESPONDENCE WITH THE EUCLIDEAN TSP. Let us now consider a
problem
atfirst
sight
dramatically
different from the Euclidean TSP. Instead oftaking
thepositions
ofthe N cities to be
independent
randomvariables,
take thelengths
lj
= lj~ between cities I
and
j
(1
<I, j
<N)
to beindependent
randomvariables,
identically
distributedaccording
tosome
p(I).
Wespeak
oflengths
rather thandistances,
as there is no distance metric here. ThisThe connection between this TSP and the Euclidean TSP is not
obvious,
as we now haverandom links rather than random
points.
Nevertheless,
one can relate the twoproblems.
Tosee this, consider the
probability
distribution for the distance between a fixedpair
of citiesii. j)
in the Euclidean TSP. Thisdistribution,
in the unithypercube
withperiodic
boundary
conditions,
isgiven
for 0 < <1/2
by
theexpression
in(6):
d~d/2
~~~~
T(d/2
+1)
~~ ~ ~~~~Of course, in the Euclidean TSP the link
lengths
areby
no meansindependent
randomvari-ables: correlations such as the
triangle
inequality
arepresent.
However,
as notedby
MAzardand Parisi
[3],
correlations appearexclusively
whenconsidering
three or moredistances,
since any two Euclidean distances arenecessarily
independent.
Let usadopt
(17)
as the l~jdistribu-tion in the limit of small I for the random link TSP, where d in this case no
longer
representsphysical
dimension but issimply
a parameter of the model. The Euclidean and random linkproblems
then have the same small I one- and two-link distributions. In thelarge
N limit therandom link TSP may therefore be considered, rather than as a separate
problem,
as a randomlink
appro~imafion
to the Euclidean TSP.Only
joint
distributions of three or more links differbetween these two TSPS. If indeed the correlations involved are not too
important,
then therandom link fIRL
Id)
can be taken as agood
estimate offlE(d).
We shall see that this istrue,
particularly
forlarge
d.3.2. SCALING AT LARGE N. As in the Euclidean case, we are interested in
understanding
the N - oo
scaling
law in the random link TSP. It isrelatively
simple
to see,following
anargument
similar to the one in Section 2.2. that the nearestneighbor
distancesDk
have aprobability
distribution with ascaling
factorN~~/~
atlarge
N. Vannimenus and MAzard [20]have
suggested
that the random linkoptimum
tourlength
with N links will then scale asN~~~/~,
and the tour will beself-averaging,
i.e.,
parallel
to the BHH theorem(3)
for the Euclidean case. This involves theimplicit
assumption
that
optimum
tourssample
arepresentative
part of theDk
distribution,
so no further Nscaling
effects are introduced. Theassumption
seems reasonable based on theanalogy
withthe Euclidean
TSP,
and for our purposes we shall accept here thatfIRL(d)
exists. However,there is to our
kno~vledge
no mathematicalproof
ofself-averaging
in the random link TSP.Following
the discussion of Section2.I,
let us consider some bounds on the ensemble average(LRL)
as derived in[20].
Asbefore,
weget
a lower bound on fIRLId)
using
nearest and nextnearest
neighbor
distances. For an upperbound,
the"strip"
algorithm
used in the Euclideancase
(Sect. 2.1)
cannot beapplied
to the random link case. On the otherhand,
Vannimenusand MAzard make use of an
algorithm
called"greedy"
[21]:
this constructs anon-optimal
tourby starting
at anarbitrary city,
and thensuccessively
picking
the link to the nearest availablecity
until all cities are used once and a closed tour is formed. At d >I,
greedy gives
rise totour
lengths
that areself-averaging,
and leads to the upper bound [20]fl~~(d)
<I
F(d/2
+1)~/~
T(I Id)
fi
d 1(19)
At d
=
I,
thepresumed
scaling
(18)
suggests
that(LRL)
isindependent
ofN,
whereasgreedy
generates tour
lengths
which grow as In N. There is numerical evidence[4,22],
however,
that3.3. SOLUTION via THE CAVITY
EQUATIONS.
Since the work of Vannimenus andM4zard,
several groups
[23-25]
have tried to "solve" the statistical mechanicalproblem
of the randomlink TSP at finite temperature
using
thereplica
method,
atechnique
developed
foranalyzing
disordered
systems
such asspin
glasses
[26].
Todate,
it hasonly
beenpossible
to obtain partof the
high
temperature
series of this system[23].
In view of theintractability
of thesereplica
approaches,
MAzard and Parisi have derived ananalytical
solutionusing
anothertechnique
fromspin
glass
theory,
the"cavity
method". The details of thisapproach
arebeyond
the scope ofthis paper, and are discussed in several technical articles
[3,26,27].
For readersacquainted
withthe
language
of disorderedsystems,
however,
the broad outline is as follows: onebegins
witha
representation
of the TSP in terms of aHeisenberg
(multi-dimensional spin)
model in thelimit where the
spin
dimension goes to zero. Under theassumption
that thissystem
hasonly
oneequilibrium
state(no
replica
symmetry
breaking),
M6zard and Parisi have then written arecursion
equation
for thesystem
when a new(N
+1)th
spin
is added. Thecavity
methodthen supposes that this new
spin's
effect on the N otherspins
isnegligible
in thelarge
Nlimit,
and that itsmagnetization
may beexpressed
in terms of themagnetizations
of the otherspins.
Using
thismethod,
Krauth and M6zard have derived a self-consistentequation
for theran-dom link
TSP,
at N - oo[4].
They
have determined theprobability
distribution of linklengths
in the
optimum
tour in terms ofGd(x),
whereGd(x)
is the solution to theintegral
equation
g~(x)
~
f~°~
(~
j
[j~~~
ii
+g~(y)j
e-Gd(v) dy.
(20)
Their
probability
distribution leads to theprediction
fIRL
Id)
=j
(~/j~/~)~
~~~
/~~
Gd(x)
J
+ Gdlx)]
e~~~~~~ dx.(21)
These
equations
can be solvednumerically,
as well asanalytically
in terms of a1Id
power series(see
nextsection).
At d= 1, Krauth and M6zard
compared
theirprediction
with the resultsof a direct simulation of the random link
model;
their numericalstudy
[4, 22]
strongly
suggests
that the
cavity
prediction
is exact in this case. It has beenargued,
furthermore,
that thecavity
method is exact at N - oo for any distribution of theindependent
random links[26].
Good numerical evidence has been found for
this,
notably
in the case of thematching problem,
a related combinatorial
optimization
problem
[28].
Thevalidity
of thecavity
assumptions
therefore does not appear to be sensitive to the dimension
d,
and we shall assume that(21)
holds for the random link TSP at all d.
Krauth and M6zard
computed
the d= and d = 2 cases to
give fIRL(1)
= 1.0208 andfIRL(2)
= 0.7251. SincefIRL(d)
is taken toapproximate
flE(d),
let us compare these valueswith their Euclidean counterparts. At d
=
1,
the Euclidean TSP withperiodic
boundary
conditions is trivial
(flE(1)
=
1);
the random link TSP thus has a2.1%
relative excess. At
d
=
2,
comparing
withflE(2)
= 0.7120 found in Section2.3,
the random link TSP has a1.8%
excess. In low dimensions, the random link results are then agood
approximation
ofthe Euclidean results. The
approximation
is better than Krauth and MAzardbelieved,
sincethey
made thecomparison
at d= 2
using
theconsiderably
overestimated Euclideanvalue of
flE(2)
m 0.749 from [5].Extending
the numerical solutions tohigher
dimensions,
at d= 3 we find
fIRL(3)
=0.7100,
which
compared
withflE(3)
=
0.6979,
has an excessof1.7%.
Some further random link valuesare
fIRL(4)
= 0.7322 andfIRL(5)
= 0.7639. The value at d = 4 may becompared
with theEuclidean estimate of Johnson et al.
[15],
flE(4)
m0.7234,
giving
an excessof1.2%.
TheflE(d)
data at d =
1,2,3,4
thereforesuggest
that the random linkapproximation
improves
with1.4
)
~/
~
l.2;/
/ £ _.~ , / q/ ;~ + / ~ :" / ~ l-1 ,:~ .II
3
_:~ . / ~ . 'II
Q3 II -' 0.9 0 O-1 0.2 0.3 0A 0.5 0.6 1/dFig.
5. Dimensionaldependence
of rescaled random link TSP optimum, shownby
smallpoints,
betweenconverging
"greedy"
upper bound(dotted line)
andnearest-neighbors
lower bound(dashed
line).
Plussigns
at d = 2 and d = 3 show Euclidean results forcomparison.
3.4. DIMENSIONAL DEPENDENCE. The
large
d limit was consideredby
Vannimenus andMAzard
[20].
ForfIRL(d),
the lower bound obtained from(Di IN, d)
+D2(N,d))/2
by
way of(11)
and the upper boundgiven
in(19)
differ atlarge
donly by
O(1Id),
giving:
fIRL(d)
=~
(~d)~/~~
l + O(22)
2~e
d~~
Note that this exact result is the random link
analogue
of the Euclideanconjecture
(15).
For values of d <
50,
we have calculatedfIRL(d)
numerically
using
thecavity
equations
(20,21).
The results are shown inFigure
5,along
with theconverging
upper and lowerbounds,
and our low d Euclidean results.
For
large
d,
we may see whether thecavity
equations
arecompatible
with(22) by solving
them
analytically
in terms of a IId
power series. Define4d(~)
a Gd(T(d
+1)~/~
[1/2
+~/d]).
(20)
may then be written:4d(x)
=/~~
1
+~)
~ ~(l
+4d(il)j
e~4d~~)
dy
(23)
-x-d d =/~~
e~+~
1
(x
+ y + ~~ ~ ~~~ + O1
+Gd(il)j
e~GdlY)
dy.
(24)
-x-d d 2 d
Strictly speaking,
theexpansion
of(I
+lx
+ill
/d)~~~
isonly
valid in the interval -x d <y < -x +
d; however,
forlarge
y it can be shown that4d(il)
~il~,
so thee~4d(Y)
term in theintegrand
makes the y > -x + d contributionexponentially
small in d.Furthermore,
extending
theintegral's
lower limit to include theregion
y < -x d alsolimit at y = -oo, the
equation
may be solved:~
1x~
3 In 2 2~/(In
2 +2~/)2
+ 6 In 2 + 12~/9j
Gd(x)
= 2e~ 1j
~
+ x ~ ~ +Oj
,(25)
where ~/, we
recall,
represents Euler's constant.Using (21),
we then findfIRL(d)
= ~(~d)~/~~
l
+ ~ ~~ ~ ~'~ + O)j
,(26)
2~e d dwhich is
perfectly
compatible
with(22).
Thisprovides
further evidence that thecavity
methodis exact for the random link TSP.
3.5. RENORMALIzED RANDOM LINK MODEL AT LARGE d. We can motivate the
large
d
scaling
found in theprevious
sectionby
examining
a different sort of random link TSP.Consider a new "renormalized" model where link
"lengths"
x~j are obtained from theoriginal
(j
by
the linear transformation x~j ed[l~j
(Di
IN,
d))]/(Di
IN, d)).
Note that the x~j maytake on
negative values,
and that the nearestneighbor
length
in this new model has mean zero.Since the transformation is
linear,
there is a directequivalence
between the renormalized xijand
original
(j
TSPS,
and the two have the sameoptimum
tours. The renormalizedoptimum
tour
length
L~
may then begiven
in terms of theoriginal
tourlength
Lj
by
Now
takeN
-
oo and d - oo. It maybe
seen
(Di(N,
d))
(14)
that the ariablesx~j
have the -independent obabilityistribution p(x) mJ
N~~
exp(x
-
).Also,
in
the large Nlimit, since
Li
scalesas
N~~~/~
(Di) cales
as
N~~/~,we
expect (Lx ) mJNp
for some pof
d.Then,
from 7), the P in theriginal
j satisfiesor,
using
theexpansion
(14),
fIRL(d)
=
~
(~d)~/~~
l
+ ~ '~ + O~)j
(29)
2~e d d
This result may be
compared
with ourcavity
solution of(26),
where the IId
coefficient isequal
to 2 ln2 2~/. If the
cavity
method is correct atO(I
Id),
which westrongly
believe is thecase, then a direct solution of the renormalized model should
give
p = 2 In 2 ~/. Work iscurrently
in progress to test this claimby
numerical methods.3.6. LARGE d AccuRAcY oF THE RANDOM LINK APPROXIMATION. Since the random
link model is considered to be an
approximation
to the Euclidean case, it is natural to askwhether the
approximation
becomes exact as d - oo. In this section we argue that:(I)
in stochasticTSPS,
good
tours can be obtainedusing
almostexclusively
low orderneighbors;
l~
~'
2 lj ~j'
j ' j ' 'f
ljFig.
6. Recursive construction of removed links(dashed
lines)
and added links(bold
lines)
inan
LK search.
(iii)
the relative error of the random linkapproximation
decreases as1/d~
atlarge
d. All threeclaims are based on a theoretical
analysis
of theLin-Kernighan
(LK)
heuristicalgorithm
forconstructing
near-optimal
tours.The LK
algorithm
works as follows[1,
29].
An LK search starts with anarbitrary
tour. Theprinciple
of the search is to substitute links in the tourrecursively,
as illustratedschematically
in
Figure
6. The firststep
consists ofchoosing
anarbitrary
starting city
io.
Callii
the nextcity
on thetour,
andfi
the link between the two. Now remove this link. Letii
be the nearestneighbor
toii
that was not connected toii
on theoriginal
tour, and letI[
be a new linkconnecting
ii
toI[.
We now have not a tour but a"tadpole
graph",
containing
aloop
with atail attached to it at
i[.
At thispoint,
call 12 one of the cities next toi[
on theoriginal
tour,and remove the link 12 between the two. There are two
possibilities
for12
(and
thus 12)1 LKchooses the one
which,
if we were toput
in a new link between 12 andio,
wouldgive
asingle
closed tour. Now asbefore,
letii
be the nearestneighbor
of12
that was not connectedto12
on theoriginal
tour, and letI[
be a new link between the two. Thisgives
a newtadpole.
Theprocess continues
recursively
in this manner, with the vertexhopping
around while the endpoint
stays
fixed,
until no newtadpoles
are found. At eachstep,
LK chooses the newim
so asto allow the
path
to be closed up between im and to,forming
asingle
tour; the result of the LK search is then the best of all such closed up tours. The LKalgorithm
consists ofrepeating
these LK searches on differentstarting
points
io,
each timeusing
the current best tour as astarting
tour, until no further tourimprovements
arepossible.
Let us first sketch
why
the LKalgorithm
leads to tours which useonly
links between "near"neighbors,
where "near" means that theneighborhood
order k is small and does not growwith d. Consider any tour where a
significant
fraction of the links connect distantneighbors
(large k).
The linksI$
which the LK search substitutes for thelm
are,by
definition,
betweenvery near
neighbors
(k
<3).
Aslong
as manylong
linksexist,
theprobability
at eachstep
ofsubstituting
a nearneighbor
inplace
of a farneighbor
issignificant.
Towards thebeginning
ofan LK search this
probability
isrelatively
constant, so theexpected tadpole
length
will decreaselinearly
with the number ofsteps.
Eventaking
into account the fact thatclosing
up thepath
between im and iomight
require
inserting
a link with k >3,
there is ahigh
probability
asN - oo that the
improvement
intadpole
length
faroutweighs
this cost ofclosing
the tour.tiny
fraction of thelong
links with short links. It follows that in accordance with our Euclidean TSPassumption
of Section2.5,
the"typical"
k used in theoptimum
tour remains small atlarge
d. This
provides
verypowerful
support
for theflE(d)
conjectures
(15)
and(16).
A consequence,making
use of the exactasymptotic
fIRLId)
result(22),
is that the relative difference betweenflE(d)
andfIRL(d)
is at most ofO(1Id).
Our second
argument
concernswhy fIRL(d)
must be greater thanflE(d)
at all d. For the random link TSP there is notriangle inequality,
which means thatgiven
twoedges
of atriangle,
the third
edge
is on averagelonger
than it would be for the Euclidean TSP.Applying
this toour LK search, we can
expect
the link between im and loclosing
up the tour to belonger
in the random link case than in the Euclidean case. Thus on average, the LK
algorithm
willfind
longer
random link tours than Euclidean tours. Infact,
thisproperty
holds as well forany LK-like
algorithm
where the method ofchoosing
thelm
andI[
links isgeneralized.
If thealgorithm
were to allow allpossibilities
forlm
andI[,
we would be sure ofobtaining
the exactoptimum
tour,given
along enough
search. In that case, theinequality
on the tourlengths
found
by
ouralgorithm
leadsdirectly
to@RL(d)
>flE(d).
Notsurprisingly,
the numericaldata confirm this
inequality
at d up to 4(although
one should be cautious whenapplying
theargument
at d=1).
Note also that theinequality
in itselfimplies
conjectures
(15)
and(16)
for the Euclidean
model,
since itsupplies
precisely
the upper bound we need onflE(d).
Finally
let usexplain
why
the relative difference betweendRL(d)
andflE(d)
should be ofO(1/d~).
This involvesquantifying
the tourlength
improvement
discussed above. It is clearthat any
non-optimal
tour can beimproved
to thepoint
~vhere links aremostly
betweenneighbors
of low order. If LK, or ageneralized
LK-likealgorithm,
is able toimprove
the tourfurther;
the relative difference inlength
will be ofO(1Id
);
we see this from(14),
noting
that theneighborhood
order k is small both before and after the LK search. Now we need toquantify
the
probability
that LK indeed succeeds inimproving
the tour. We may consider the vertex of the LI<tadpole
graph
asexecuting
a random walk, in which case theprobability
ofclosing
upa tour
by
asufficiently
short link isequivalent
to theprobability
of the random walk'send-to-end distance
being sufficiently
small. In that case it may be shownthat,
over the course of an LKsearch,
theprobability
ofsuccessfully
closing
a random link tourminis
theprobability
of
successfully closing
a Euclidean tour scales atlarge
d as2/(d
2).
Fromthis,
we concludethat
improvements
in the Euclidean model areO(1Id)
moreprobable
than in the random linkmodel.
Now,
the relative tourlength
improvement
for the Euclidean TSPcompared
to therandom link TSP is
simply
the relative tourlength
improvement
when a better tour isfound,
times the
probability
offinding
a better tour henceO(1/d2).
If we consider ageneralized
LK search as described in the
previous
paragraph,
where thealgorithm
necessarily
finds thetrue
optimum,
then this resultapplies
to the exactp's:
the relative difference betweenfIRL(d)
and
flE(d)
will scale atlarge
d as1/d2.
Three comments are in order
concerning
thissurprisingly good
accuracy of the random linkapproximation.
First,
the factor 2lid
2)
isonly
appropriate
forlarge
d. It is not smalleien
for d= 4.
(Its
divergence
at d = 2 is associated with the fact that a two-dimensional random~N-alk returns to its
origin
withprobability
1.)
We therefore expect the1/d2
scaling
to becomeapparent
only
for d >5,
beyond
the range of our numerical data.Second,
we have seen thatthe coefficient of the
1Id
term infIRL(d)
may be obtainedby
thecavity
method.Assuming
that this method is correct and that
fIRL(d)
andflE(d)
do indeed converge as1/d2,
this leadsto a
particularly
strong
conjecture
for the Euclidean TSP:Third,
thistype
of LKanalysis
can in fact be extended to many other combinatorialoptimiza-tion
problems,
such as theassignment,
matching
andbipartite
matching
problems.
In thesecases, we
expect
the random linkapproximation
togive
rise to aO(1/d~)
relative errorjust
asin the TSP.
4.
Summary
and ConclusionsThe first
goal
in our work has been toinvestigate
the finite sizescaling
ofLE,
theoptimum
Eu-clidean
traveling
salesman tourlength,
and to obtainprecise
estimates for itslarge
N behavior.Motivated
by
a remarkableuniversality
in the kth nearestneighbor
distribution,
we have foundthat under
periodic
boundary
conditions,
the convergence of(LE)/N~~~/~
to its limitflE(d)
isdescribed
by
a series in1IN.
This has enabled us to extractflE(2)
andflE(3)
using
numericalsimulatidns at small values of
N,
where errors are easy to control. Furthermore, thanks to abias-free variance reduction method
(see
Appendix
B),
these estimates areextremely
precise.
Our second
goal
has been to examine the random linkTSP,
where there are no correlations between linklengths.
We have considered it as anapproximation
to the EuclideanTSP,
inorder to understand better the dimensional
scaling
offlE(d).
For smalld,
we have used thecavity
method to obtain numerical values of the random linkfIRL(d).
Comparing
these withour numerical values for
flE(d)
shows that the random linkapproximation
isremarkably
good,
accurate to within
2%
at low dimension. Forlarge d,
we have solved thecavity
equations
analytically
togive fIRL(d)
in terms of a1Id
series. We have thenargued, using
a theoreticalanalysis
of iterative tourimprovement
algorithms,
that the relative difference betweenfIRL(d)
and
flE(d)
decreases as1/d~.
This leads to ourconjecture
(30)
on thelarge
d behavior offlE(d),
specifying
both itsasymptotic
form and itsleading
order correction.Let us conclude with some
remaining
openquestions.
First ofall,
while thecavity
methodmost
likely gives
the exact result for the random link TSP, we would be interested inseeing
thisargued
on a more fundamentalphysical
level. Readers with abackground
in disorderedsystems
will
recognize
that theunderlying
assumption
of aunique
equilibrium
state is false in manyNP-complete
problems,
and inparticular
in thespin-
glass
problem
that hasinspired
thecavity
method. What makes the TSP different? Second of
all,
our renormalized random link modelprovides
an alternateapproach
tofinding
the1Id
coefficient of the power series infIRL(d),
and could prove a useful test of the
cavity
method'svalidity.
A solution to the renormalizedmodel
using
heuristic methods appears within reach. Third ofall,
theO(1/d2)
convergenceof the random link
approximation
merits furtherstudy,
from both numerical andanalytical
perspectives.
Numerically,
Euclidean simulations at d > 5 couldprovide
powerful
support
forthe form of the convergence, and thus for our
conjecture
(30).
Analytically,
thequalitative
arguments
presented
in Section3.6,
based on the LKalgorithm,
couldperhaps
be refinedby
a more
quantitative
approach.
Lastly,
it is worthnoting
that theO(1/d2)
convergence shouldapply equally
well to the distrtbution of linklengths
in theoptimum
tour. The random linkprediction
for this distribution can be obtained from thecavity
method [4]; aninteresting
testwould then be to compare it with simulation results for the true Euclidean distribution.
Acknowledgments
We thank E.
Bogomolny,
D.S.Johnson,
M.MAzard,
S-W- Otto and N. Sourlas for fruitfuldiscussions. NJC
acknowledges
agrant
from the HumanCapital
andMobility
Program
of theCommission of the
European
Communities;
JBdMacknowledges
financialsupport
from theFrench
Ministry
ofHigher
Education andResearch;
OCMacknowledges
support from NATOAppendix
AOverview of the Numerical
Methodology
In the
following,
we discuss theprocedures
used to obtain the raw data from whichflE(d)
and the finite size
scaling
coefficients are extracted. Twomajor
problems
must be solved inorder to
get
good
estimates offlE(N,d).
First,
flE(N,
d)
is defined as an ensemble average(LE(N,
d))
/N~~~/~,
but is measuredby
a numerical average over a finitesample
of instances.The instance-to-instance fluctuations in
LE
give
rise to a statistical error, which decreasesonly
as the inverse square root of the
sample
size.Keeping
the statistical error down toacceptable
levels could
require
inordinate amounts ofcomputing
time. We therefore find it useful tointroduce a variance reduction trick: instead of
measuring LE,
we measureLE
AL*,
whereis a free parameter and L* can be any
quantity
which isstrongly
correlated withLE.
Detailsare
given
inAppendix
B.A second and more basic
problem
is that it iscomputationally
costly
to determine theoptimal
tourlengths
for alarge
number ofinstances,
precisely
because the TSP is anNP-complete
problem.
The mostsophisticated
"branch and cut"algorithms
can take minutes on a workstation to solve asingle
instance of size N < 100 tooptimality.
However,
we do notneed to
guarantee
optimality:
the statistical error inflE(N, d)
already
limits thequality
of ourestimate,
and so an additional(systematic)
error inLE
is admissible aslong
as it isnegligible
compared
to the statistical error. We may thus use fast heuristics to measureLE,
rather thanexact but slower
algorithms.
This is discussed further inAppendix
C.Appendix
BStatistical Errors and a Variance Reduction Trick
Consider
estimating
(LE(N,d))
at agiven
Nby
sampling
over many instances. If we haveM
independent
instances,
thesimplest
estimator for(LE(N,d))
isLE(N,d),
the numerical average over the ~I instances of the minimum tourlengths.
This estimator has anexpected
statistical error
alit)
=
aL~/fi,
where aL~ is the instance-to-instance standard deviationof
LE.
Now let us define
Lk
to be the sum, over allcities,
of kth nearestneighbor
distances.(Lk)
is its ensemble average; in terms of the notation used earlier in the
text,
(Lk)
"N(Dk).
It hasbeen noted
by
Sourlas[30]
thatLE
isstrongly
correlated withLi, L2
andL3.
He thereforesuggested
reducing
the statistical error in(LE)
using
the estimatorEs
"(L123)
LE/L123,
(B.I)
where
L123
is the arithmetic mean ofLi, L2
andL3.
The ensemble average(L123)
can becalculated
analytically
from(ii),
and so the variance ofEs
comes from fluctuations in theratio
LE/L123.
IfLE
were a constant factor timesL123,
this estimator would of course beperfect,
I.e.,
it would have zero variance. This is not the case,however,
and furthermore theuse of a ratio biases the Sourlas estimator: its true mathematical
expectation
value differsfrom
(LE(N,d))
by
Oil /N).
Toimprove
uponthis,
we have introduced our own bias-freeestimator
[31]:
EM-P
=
A(L12)
+LE
AL12,
(B.2)
where