ANNALES DE L'I. H. P., SECTION A

PIERRE PICCO
Artificial neural networks. A review from physical and mathematical points of view
Annales de l'I. H. P., section A, tome 64, no 3 (1996), p. 289-307
<http://www.numdam.org/item?id=AIHPA_1996__64_3_289_0>

© Gauthier-Villars, 1996, all rights reserved.
Access to the archives of the journal "Annales de l'I. H. P., section A" implies agreement with the general conditions of use (http://www.numdam.org/conditions). Any commercial use or systematic printing constitutes a criminal offence. Any copy or printing of this file must contain this copyright notice.
Article digitised as part of the programme "Numérisation de documents anciens mathématiques"
http://www.numdam.org/
Artificial neural networks. A review from physical and mathematical points of view

Pierre PICCO*

CPT CNRS, Luminy Marseille, Case 907, 13288 Marseille Cedex 09
E-mail: picco@cptsu2.univ-mes.fr

Vol. 64, n° 3, 1996, Physique théorique
ABSTRACT. - This survey summarises recent mathematical results on the statistical mechanics of the Hopfield and Kac-Hopfield models.

Key words: Neural Networks, Hopfield model.

RÉSUMÉ. - We give here a review, without entering into the details of the proofs, of the mathematical results recently obtained for the Hopfield and Kac-Hopfield models.

Mots clés : Réseaux de neurones, modèle de Hopfield.
0. INTRODUCTION

Neural Networks are models made for understanding some features of the brain. We do not think that biological neurons work as neural networks. However, an interesting consequence of the study of Neural Networks is that they are now used for many applications such as pattern classification and pattern recognition. They are used in robotics and computer vision. The range of possible applications seems to become wider and wider. New journals dedicated to Neural Networks appeared: Neural Networks in 1988, Neural Computation in 1989, Network in 1990, Artificial Neural Networks in 1991. People coming from different scientific communities are working on this subject, and we will restrict ourselves in this review to a rather small part of these communities, namely the Mathematical Physics one. From the Mathematical Statistics point of view, a review by Cheng and Titterington [CT] appeared in February 1994. From the discrete dynamical point of view, the book by Goles and Martinez [GM] is a good reference. From the Theoretical Physics point of view, the book by Amit [A] is now a classic. The one by Minsky and Papert [MP] is historically fundamental.

* Work partially supported by EU grant CHRX-CT93-0411.

Annales de l'Institut Henri Poincaré - Physique théorique - 0246-0211
Vol. 64/96/03/$ 4.00/© Gauthier-Villars
Neural Networks are collections of simple units called neurons and connections that link neurons. In the models we consider, the neuron can be in two states, fired or unfired, and is described by a variable taking the values ±1 that we call σ. For some applications, it could be useful to consider the case where the number of possible states of a given neuron is an integer between 1 and q. This is the case of colored image recognition. Models where σ takes continuous values between -1 and +1 are also considered. The connections are usually intricate, and in some models any given neuron is connected with all the other ones. However, in the brain there are not as many connections, and a way to model this fact is to consider the so-called diluted Neural Networks, where the connections are cut in a random way and stability with respect to dilution of the connections is studied.

The first model of Neural Networks is the McCulloch-Pitts neuron [McCP], an input-output model with a single unit. It is described as follows: the inputs are σ_1, ..., σ_n and the output is x, where
$$ x = \mathrm{sgn}\Big( \sum_{j=1}^{n} \omega_j \sigma_j + \omega_0 \Big), $$
with σ_j = ±1. The ω_j are called connection weights and ω_0 is a bias, sometimes called a threshold. Modified versions of this model exist where the inputs and output are not restricted to be ±1 but can take continuous values between 0 and 1 or between -1 and +1; in this case the sgn function is replaced by a convenient sigmoid.
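As an illustration (the weights, the bias and the convention sgn(0) = +1 below are arbitrary choices, not taken from the references), a McCulloch-Pitts unit can be sketched in a few lines of Python:

```python
def mcculloch_pitts(inputs, weights, bias):
    """Output of a single McCulloch-Pitts unit: sgn(sum_j w_j*sigma_j + w_0).

    The convention sgn(0) = +1 is an arbitrary choice."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else -1
```

For instance, with weights (0.5, 0.5, 0.5) and zero bias, the unit computes a majority vote of its three ±1 inputs.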
A multidimensional extension of this model is defined in the following way: a state of the system is characterized by a vector σ = (σ_i)_{i=1}^N, with i ∈ {1, ..., N} for some integer N which is usually very large. N represents the number of neurons, which in the brain is of the order of 10^11. In this model, the outputs are
$$ \sigma'_i = \mathrm{sgn}\Big( \sum_{j=1}^{N} W_{ij}\,\sigma_j - \theta_i \Big). $$
The matrix W is called the connection matrix and θ the threshold vector.

Multilayered Neural Networks are models where units are arranged in a series of layers and connections are present only between consecutive layers. Multilayered versions are used for practical applications, and finding the architecture that has some optimality for a given problem is usually extremely difficult. As an example, the Le Cun et al. network for handwritten Zip-code recognition involves 1256 units, 63,660 connections and 9760 independent parameters [LeC1] [LeC2]. All these models and possible generalisations belong to the family of input-output models; that is, given an input, the Network gives an answer, the output. They are very well adapted to recognition problems.
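In the same spirit, the multidimensional input-output map σ'_i = sgn(Σ_j W_ij σ_j − θ_i) can be sketched as follows (the sign convention at zero is again an arbitrary choice):

```python
def network_output(W, theta, sigma):
    """Multidimensional threshold map: sigma'_i = sgn(sum_j W_ij*sigma_j - theta_i)."""
    n = len(sigma)
    return [1 if sum(W[i][j] * sigma[j] for j in range(n)) - theta[i] >= 0 else -1
            for i in range(n)]
```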
1. THE HOPFIELD MODEL
An important conceptual step in Neural Network theory was done by Hopfield [Ho] in 1982. The objective was to associate with a binary ±1 vector ξ one of a set of M exemplars that have been memorized in the system. The association is not done in the form of an output as before, but as the result of an iterative procedure, as in a discrete time dynamical system. Starting with ξ as an initial condition, the iteration will bring the system to the stable points (or limit cycles) of the dynamics, which are the associated vectors. The advantage of this formulation is the possibility to model the fact that the human memory does not need a perfect image or sound to associate it with a previously memorized image or sound, but is able to correct errors. In dynamical system theory, this is just the fact that starting in the basin of attraction of a stable point, the trajectory converges to this stable point.

Let us describe how Hopfield introduced his model: take N-dimensional ±1 vectors ξ = (ξ_i)_{i=1}^N, then take M such objects, called patterns, ξ^μ for μ = 1, ..., M, and consider the following iterative procedure: given a configuration σ(t) at a time t, on the space {-1, +1}^N, the configuration at the time t + 1 is
$$ \sigma_i(t+1) = \mathrm{sgn}\Big( \sum_{j=1}^{N} W_{ij}\,\sigma_j(t) \Big), \tag{1.1} $$
where the connections are chosen according to the Hebb [He] rule:
$$ W_{ij} = \frac{1}{N} \sum_{\mu=1}^{M} \xi_i^\mu \xi_j^\mu. $$
The dynamics could be synchronous (parallel), i.e., the updating defined by (1.1) is done simultaneously for all the sites i, or asynchronous (sequential), i.e., the updating is done on one site and then on another one and so on, by using an exhaustive procedure.
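The Hebb rule and the sequential dynamics are easy to simulate. The following sketch (the system size and the single stored pattern are illustrative choices) recovers a stored pattern from a corrupted initial condition:

```python
def hebb_weights(patterns):
    """Hebb rule: W_ij = (1/N) sum_mu xi_i^mu xi_j^mu."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) / n for j in range(n)]
            for i in range(n)]

def recall(W, sigma, sweeps=5):
    """Asynchronous (sequential) dynamics: sigma_i <- sgn(sum_j W_ij sigma_j),
    one site after another, in exhaustive sweeps."""
    sigma = list(sigma)
    n = len(sigma)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(W[i][j] * sigma[j] for j in range(n))
            sigma[i] = 1 if h >= 0 else -1
    return sigma
```

Storing the single pattern ξ = (1, 1, -1, -1, 1, 1, -1, -1) and starting from ξ with the first spin flipped, the dynamics restores ξ.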
There are a lot of choices for the connection matrix W. An important property of Hopfield's one is the fact that W is a symmetric matrix. Hopfield noticed that the function
$$ H(\sigma) = -\frac{1}{2} \sum_{i,j=1}^{N} W_{ij}\,\sigma_i \sigma_j $$
is a Lyapunov function of the asynchronous dynamics, that is, a decreasing function along the orbits, so the asynchronous dynamics will converge to local minima of this function and there are no limit cycles in this case. This is true for general matrix connections as long as the diagonal elements are strictly positive and the matrix is symmetric, as was proved by Goles, Fogelman-Soulie and Pellegrin [GFP]. For the synchronous dynamics, when the connection matrix is symmetric, we can have fixed points or limit cycles of period at most two. This follows from the fact that the functional
$$ E(\sigma(t)) = -\sum_{i,j=1}^{N} W_{ij}\,\sigma_i(t)\,\sigma_j(t+1) $$
is a Lyapunov function of the synchronous dynamics [GFP] [GM]. Notice that the Hebb rule with a zero diagonal can have limit cycles of period at most two. If the connection matrix is not symmetric, limit cycles of period more than two may exist.
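This monotonicity is easy to check numerically; in the sketch below (the matrix is an arbitrary symmetric example), a single asynchronous update never increases the function H:

```python
def energy(W, sigma):
    """H(sigma) = -(1/2) sum_ij W_ij sigma_i sigma_j."""
    n = len(sigma)
    return -0.5 * sum(W[i][j] * sigma[i] * sigma[j]
                      for i in range(n) for j in range(n))

def async_update(W, sigma, i):
    """One asynchronous update at site i; returns the new configuration."""
    h = sum(W[i][j] * sigma[j] for j in range(len(sigma)))
    new = list(sigma)
    new[i] = 1 if h >= 0 else -1
    return new
```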
In Neural Network theory it is said that the memory works if, roughly speaking, starting "near" a pattern, the dynamics converge, as the time tends to ∞, to a pattern. Here "near" means that the number of discrepancies between the initial point and the pattern is smaller than εN, for some ε > 0, that is, with respect to the Hamming distance. We can define the radius of attraction of a stable (for the dynamics) state x as the largest value of ρ such that every vector at a distance not more than ρN eventually reaches the state x. However, the usual notion of stability, that is a fixed point of the dynamics, could be too strong, and we can define a weaker one: given two non-negative numbers ε and δ, we call a vector x (δ, ε)-stable if, starting from an initial point within a distance δN, the system ends up within a distance εN of x. A stable state is just a (δ, 0)-stable state. Notice also that (0, ε)-stable states are just configurations such that, starting from them, the system ends up in a neighbourhood of radius εN. As a dynamical system this model is involved, but its discrete nature has however the advantage to make it suitable for numerical simulations.
The numerical results of Hopfield are the following: he chose for testing ξ_i^μ = ±1 randomly with probability p = 1/2. If M/N = α and α < α_c ≈ 0.14, he found that the memory works, in a sense that can be interpreted as the fact that the original patterns are (0, ε)-stable for some ε > 0 small enough. If α > α_c then the memory does not work, in a sense that can be interpreted as ε ≥ 1/2 for any δ > 0. This is because the typical Hamming distance between two patterns is N/2.
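A small numerical experiment in this spirit (the system sizes and trial counts below are illustrative, far smaller than Hopfield's original simulations) estimates how often a stored pattern is a fixed point of the dynamics as α = M/N grows:

```python
import random

def fraction_of_fixed_patterns(N, M, trials=10, seed=1):
    """Fraction of trials in which the first stored pattern is a fixed point
    of the deterministic dynamics (Hebb rule, self-coupling removed)."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        pats = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
        xi = pats[0]
        # xi is fixed iff every spin agrees with the sign of its local field
        if all(xi[i] * sum(sum(p[i] * p[j] for p in pats) * xi[j]
                           for j in range(N) if j != i) > 0
               for i in range(N)):
            fixed += 1
    return fixed / trials
```

For N = 60 one observes that the patterns are essentially always fixed for M = 2 (α ≈ 0.03) and essentially never for M = 30 (α = 0.5).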
What is now known as the Hopfield model is merely this particular randomized version. Let us remark at this point that, as a dynamical system, this is a model in a random environment, and the knowledge of the properties of such kinds of dynamical systems is rather poor. An important feature to notice is that N and M are supposed to be very large, and we will be interested in the case where they tend to infinity. The main difficulty is now that the state space of our dynamical system is growing. The advantage is that in this limit we can expect to have results that are either true with probability one, or else with probability approaching one with N and M, or we could have just convergence in Law.

Knowing that the function H is a Lyapunov function, it is natural to study the corresponding energy landscape, and to check that in some regime of the parameters the stored patterns are minima of this function. However, since H is a function on a discrete space, a point σ* is a local minimum if, for any single flip at the site k, where (σ*)^k(j) = σ*(j) if j ≠ k and (σ*)^k(k) = -σ*(k), we have
$$ \Delta H_k(\sigma^*) = H\big((\sigma^*)^k\big) - H(\sigma^*) > 0. $$
In this form, the memory capacity, that is the largest α such that every one of the patterns is a (local) minimum, was given by McEliece et al. [McE], who show that if α = 1/(4 ln N) then, with a probability tending to one when N ↑ ∞, the patterns are local minima. This was extended to almost sure convergence by Martinez [M].
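The single-flip criterion above can be checked directly (a brute-force sketch, only sensible for small N):

```python
def energy(W, sigma):
    """H(sigma) = -(1/2) sum_ij W_ij sigma_i sigma_j."""
    n = len(sigma)
    return -0.5 * sum(W[i][j] * sigma[i] * sigma[j]
                      for i in range(n) for j in range(n))

def is_local_minimum(W, sigma):
    """True iff every single spin flip strictly increases the energy."""
    base = energy(W, sigma)
    for k in range(len(sigma)):
        flipped = list(sigma)
        flipped[k] = -flipped[k]
        if energy(W, flipped) - base <= 0:
            return False
    return True
```

With the Hebb matrix built from a single pattern, that pattern is a local minimum, while an uncorrelated configuration generally is not.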
A weaker notion was implicitly introduced by Amit, Gutfreund and Sompolinsky [AGS] and mathematically analyzed by Newman [N] in a fundamental work. Let S(σ, δ) = {σ' : d(σ, σ') = [δN]}, where d(σ, σ') = (1/2)[N - (σ, σ')] is the Hamming distance, be the sphere of radius [δN] centered at σ. Let us call h_N(σ, δ) the minimum of the energy on the sphere S(σ, δ). We will say that there exists an energy barrier of height εN if, for some δ ∈ (0, 1/2), we have
$$ h_N(\sigma, \delta) \ge H(\sigma) + \varepsilon N. $$
The main result of Newman is

THEOREM 1. - There exists α_0 > 0 such that if M ≤ α_0 N then there exist ε > 0 and 0 < δ < 1/2 such that every pattern ξ^μ is surrounded by an energy barrier of height εN, almost surely.
Newman's estimate, α_0 ≈ 0.056, is far, though not too much so, from the expected 0.14. This result was recently improved by Loukianova [Lo], who gets 0.071. Let us notice that this implies that there exists at least one minimum in a ball of radius δN centered at each pattern. The study of the energy landscape is far from being complete; in particular, the expected local minima that are mixtures of the patterns are not fully understood. However, Newman proved that mixture vectors of the form sgn(Σ_μ ε_μ v_μ ξ^μ), with all but a finite number of the v_μ equal to zero and such that Σ_μ ε_μ v_μ ξ_i^μ ≠ 0 for all possible choices of ε_μ = ±1, the v_μ being some particular dyadic rationals, are surrounded by energy barriers. It is expected, in fact, that the number of local minima is growing exponentially with N, see [McE].

The convergence of the previous iterative procedure was studied by Komlos and Paturi [KPa]. They proved convergence of the synchronous dynamics in the regime α = 1/(4 ln N). In the regime where α is small enough, they exhibit a real-valued function λ(·) such that, starting within a Hamming distance λ(α)N from a pattern ξ^μ, then, uniformly with respect to the chosen pattern, in a time O(ln N) the system ends up within a small distance of ξ^μ, and this holds with a probability converging to one when N ↑ ∞.

Newman's result was extended to the case of variables σ_i taking discrete values in {1, 2, ..., q} with a uniform distribution, by Ferrari, Martinez and Picco [FMP]. The Hamiltonian is now
$$ H(\sigma) = -\frac{1}{N} \sum_{i,j=1}^{N} \sum_{\mu=1}^{M} \delta(\sigma_i, \xi_i^\mu)\,\delta(\sigma_j, \xi_j^\mu), $$
where δ(·, ·) is the Kronecker symbol. Their results are exactly the same as Theorem 1, except that now all the constants are functions of q.

The Newman result was extended by Bovier and Gayrard [BG1] to the case of the diluted Hopfield model. They consider the Hamiltonian
$$ H(\sigma) = -\frac{1}{N} \sum_{i,j=1}^{N} \varepsilon_{ij} \sum_{\mu=1}^{M} \xi_i^\mu \xi_j^\mu\, \sigma_i \sigma_j, $$
where the ε_ij are a family of N² i.i.d. random variables with P(ε_ij = 1) = 1 − P(ε_ij = 0) = p(N). Their result is

THEOREM 1.2. - Suppose that p(N) does not decrease too fast. Then there exists α_c > 0 such that if M ≤ α_c pN then there exist ε > 0 and 0 < δ < 1/2 such that the patterns are surrounded by energy barriers, almost surely.

Notice that here pN is the mean number of connections per site, and it is in fact the correct scale of the parameter α. A very important open problem that remains is to prove the converse of Newman's theorem.

Another interesting problem is to consider correlated patterns and to try to extend at least Newman's result to these cases. The problem of correlated patterns was studied in slightly different contexts. One is related to optimal storage properties. The problem is to find the connections W_ij that give the maximal possible storage. This was studied first by Gardner [Gar] and Gardner and Derrida [GD] from the Theoretical Physics point of view. Monasson [Mo1], [Mo2] and Tarkowski and Lewenstein [TL] studied the problem of optimal storage properties in the case of correlated patterns. The patterns can have spatial correlations, i.e. E(ξ_i^μ ξ_j^μ) = C(i, j), or semantic correlations, E(ξ_i^μ ξ_j^ν) = C'(μ, ν) δ_{i,j}. Here δ(·, ·) is the Kronecker symbol and C, C' are correlation matrices. It could be interesting to study how the α_0 of Newman depends on the correlations. The case with exponentially decreasing spatial correlations does not seem too difficult, but the case of semantic correlations could be a little more involved. We can expect to get at least an α_0 which is larger than the one in Newman's theorem. Let us notice that there are no rigorous results on the problem of optimal storage properties. Another context where correlated patterns are studied is a work of Griniasty, Tsodyks and Amit [GTA], who model the results of Miyashita's experiment [Mi], in which patterns presented in a certain order in time lead to attractors with spatial correlations. Let us note that various modifications of the Hopfield Hamiltonian exist, including the following:
Higher order Networks: the two-spin interaction is replaced by a d-spin interaction, with a Hamiltonian of the form
$$ H(\sigma) = -\frac{1}{N^{d-1}} \sum_{\mu=1}^{M} \Big( \sum_{i=1}^{N} \xi_i^\mu \sigma_i \Big)^{d}. $$
For d > 2 this was studied from the Theoretical Physics point of view by Lee et al. [L] and by Newman [N] from the Mathematical Physics point of view. Here the scaling relation α = M/N valid for d = 2 has to be replaced by α = M/N^{d-1}, and α_c is a decreasing function of d. Notice that the extension to d > 2 of the result of Komlos and Paturi is an interesting problem to study. Here also there is no limit cycle for the dynamics if d is even, as was shown by Lee et al. [L].
Model for short time memory: this was introduced by Kohonen [Koh] and studied by Mezard, Nadal and Toulouse [MNT]. The Hamiltonian is the Hopfield one with the Hebb matrix modified through a function Λ(x). It will be interesting to have the analogues of the Newman Theorem and the Komlos and Paturi Theorem in this case, and also for the corresponding higher order models.

The pseudo inverse model: if the ξ^μ were orthogonal vectors, the connection matrix would satisfy W ξ^μ = ξ^μ and the ξ^μ would be fixed points of the dynamics. In the random case the patterns are not orthogonal, and the idea is just to project on the space generated by the (ξ^μ)_{μ=1}^M. The Hamiltonian is
$$ H(\sigma) = -\frac{1}{2N} \sum_{i,j=1}^{N} \sum_{\mu,\nu=1}^{M} \xi_i^\mu \big(C^{-1}\big)_{\mu\nu} \xi_j^\nu\, \sigma_i \sigma_j; $$
here (C^{-1})_{μν} is the element of the M × M matrix which is the inverse of the positive definite matrix with entries C_{μν} = (1/N) Σ_k ξ_k^μ ξ_k^ν. This model was introduced by Personnaz, Guyon and Dreyfus [PGD] and studied by Kanter and Sompolinsky [KS] from the Statistical Physics point of view. Here also the analogues of Newman's Theorem and the Komlos and Paturi Theorem are interesting problems.
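For the pseudo-inverse rule, the projection property can be sketched with numpy (pattern sizes are illustrative; the inverse exists as long as the patterns are linearly independent):

```python
import numpy as np

def pseudo_inverse_weights(patterns):
    """W = (1/N) Xi^T C^{-1} Xi, with C_{mu nu} = (1/N) sum_k xi_k^mu xi_k^nu.

    W is the orthogonal projector on the span of the patterns, so each
    pattern satisfies W xi^mu = xi^mu and is a fixed point of the dynamics."""
    Xi = np.asarray(patterns, dtype=float)   # shape (M, N)
    N = Xi.shape[1]
    C = Xi @ Xi.T / N                        # M x M correlation matrix
    return Xi.T @ np.linalg.inv(C) @ Xi / N  # N x N connection matrix
```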
2. STATISTICAL MECHANICS OF THE HOPFIELD MODEL

The main problem of the deterministic dynamical system defined by Hopfield is the fact that the system is trapped in local minima of the Lyapunov functional. Since there is an exponential (in N) number of such minima, this situation will happen quite frequently. The usual way to avoid this situation is to use stochastic dynamics, that is, to introduce a random noise that allows the system to go away from these minima and reach the absolute minima, which could have or not, depending on the value of α, some relationship with the stored patterns. There are many possibilities to choose such a dynamics. A possible choice is the heat bath algorithm, that is, to introduce a temperature in such a way that the previous dynamics corresponds to the zero temperature dynamics. Let us call
$$ h_i(\sigma) = \sum_{j=1}^{N} W_{ij}\,\sigma_j $$
the local field at site i, and β the inverse temperature. The stochastic dynamics is defined by
$$ \mathbb{P}\big( \sigma_i(t+1) = \pm 1 \mid \sigma(t) \big) = \frac{e^{\pm\beta h_i(\sigma(t))}}{2\cosh\big(\beta h_i(\sigma(t))\big)}. $$
By construction, for all finite N, the Gibbs measure, a measure-valued random variable, defined by
$$ \mathcal{G}_{N,\beta}(\sigma) = \frac{e^{-\beta H_N(\sigma)}}{\sum_{\sigma'} e^{-\beta H_N(\sigma')}}, $$
is reversible with respect to this dynamics. Therefore a preliminary question is to study the Gibbs measures in the limit when N tends to ∞. At this point the Hopfield model can be either this Statistical Mechanics model or the previous deterministic Dynamical System. In 1985-87 Amit, Gutfreund and Sompolinsky [AGS1] and [AGS2] studied the thermodynamics of the Hopfield model from the Theoretical Physics point of view. They found a very rich structure in the regime where α > 0, with a spin glass phase [MPV] if β and α are large enough. The Hopfield model is, as α varies, intermediate between an ordered model of Curie-Weiss type (when M = 1) and the Sherrington and Kirkpatrick model (when α ↑ ∞). All the techniques introduced by Mezard, Parisi and Virasoro [MPV] were applied to the Hopfield model. Unfortunately for the Mathematical point of view, the only rigorous results for the Sherrington and Kirkpatrick model are in the high temperature regime, cf. Aizenman, Lebowitz and Ruelle [ALR].
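A heat bath sweep for this dynamics can be sketched as follows (the inverse temperature and system size in the test are arbitrary; at large β one effectively recovers the deterministic dynamics):

```python
import math
import random

def heat_bath_sweep(W, sigma, beta, rng):
    """One exhaustive sweep of the heat bath dynamics:
    P(sigma_i = +1 | rest) = exp(beta*h_i) / (2*cosh(beta*h_i)),
    with local field h_i = sum_j W_ij sigma_j."""
    n = len(sigma)
    for i in range(n):
        h = sum(W[i][j] * sigma[j] for j in range(n))
        p_plus = math.exp(beta * h) / (2.0 * math.cosh(beta * h))
        sigma[i] = 1 if rng.random() < p_plus else -1
    return sigma
```

At low β the noise dominates and the chain wanders; at large β a corrupted pattern is repaired with probability close to one, as in the deterministic case.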
The way used by Mathematical Physicists is in fact the opposite of the Theoretical Physicists': namely, the study of the Hopfield model could help us to understand the Sherrington and Kirkpatrick model and the spin-glass phase. An important historical point to notice is that the Hopfield model, as a model of Statistical Mechanics, is not a new one: it was introduced in 1977 by Pastur and Figotin [FP1], [FP2] and [FP3], with the main difference that M was a finite number. It was considered as a model of spin glasses; since a basic property of spin-glasses is the presence of an infinite number of Gibbs states, this model, which has M Gibbs states at low temperature, did not receive too much interest at that moment. Let us notice that this model has the two other properties that seem to characterise a spin glass, namely frustration and disorder.

The first problem to solve in Statistical Mechanics is the existence of thermodynamics, namely the existence of the infinite volume limit of the free energy. In mean field models this is not a trivial problem. The first reason is that you cannot use a sub-additive argument to prove it, as in the usual Statistical Mechanics models where the strength of the interaction does not depend on the volume. In random mean field models the situation is worse, as the models are usually not even stable, in the sense that the energy is not bounded from below by some constant times the volume for all realisations of the randomness. However, such a lower bound does happen to be true for almost all the realisations. The main difficulty is to prove the existence of the infinite volume limit, i.e. in our case N ↑ ∞. Since the finite volume free energy is a random quantity, the infinite volume limit could occur in probability (that is, with a probability that goes to one with the volume) or almost surely (that is, when the probability of the realisations for which the free energy converges is one). In usual models this is not a problem, since the existence of the thermodynamics follows from the sub-additive ergodic theorem and the convergence always occurs almost surely.
The second problem, given that the existence of the thermodynamic limit is proven, is to show that the limiting free energy is a non-random function. At first sight, the two problems seem unrelated. This is not the case. We proved recently [BGP3] a concentration result for the Hopfield model and the Sherrington and Kirkpatrick model that implies

THEOREM 2.1. - If the mean of the free energy converges, then the free energy converges almost surely.

We stated this result for these two models, but any model which depends in a Lipschitzian way on the random parameters should also have this property. This is the case for all the Hamiltonians introduced in the previous section. However, in some cases it could be difficult to prove the analogue of the previous theorem.

Historically, the first rigorous result on the Hopfield model was given by Pastur and Figotin in 1977, [FP1] [FP2] [FP3]. They proved the existence of the infinite volume limit for the free energy in the regime of finite M. The value of the limit was given as the solution of a variational problem.
The second result is contained in a very nice paper which did not receive the success it deserves. In 1989 Koch and Piasko [KP] studied the case where 2^M remains small compared with the system size, using a special representation of the model found by Grensing and Kuhn [GK]. They got convergence in probability for the free energy, but almost everywhere results can be obtained by an easy modification of their proof. An important fact is that the value of the limiting free energy is the same as in the usual Curie-Weiss model [E]. This result can be found in [FP1]; this was not noticed at that moment. This was extended in 1991 to the case of the Potts-Hopfield model by Gayrard [Gay], who gave almost sure results for the free energy and studied the structure of the Gibbs states. In 1991 Scacciatelli and Tirozzi [ST] extended the result of Aizenman, Lebowitz and Ruelle [ALR] for the SK model to the Hopfield model and proved the existence of thermodynamics in the whole paramagnetic phase.
At the end of 1991 a very nice article by Shcherbina and Tirozzi [ShT] appeared, where the limit α = 0 was considered. They proved that the limiting (in probability) free energy is the Curie-Weiss free energy. They proved also that the free energy and the overlap parameters are self-averaging in the following sense: a thermodynamic quantity g(β, N) is self-averaging if
$$ \lim_{N \uparrow \infty} \Big( \mathbb{E}\big[ g(\beta, N)^2 \big] - \big( \mathbb{E}\big[ g(\beta, N) \big] \big)^2 \Big) = 0, $$
a notion which implies only convergence in Probability. This estimate was extended to some weak exponential moments in [BGP3].
At the beginning of 1992, Koch [Ko] gave another proof of the same result. All these results were extended by Bovier and Gayrard to the case of the diluted Hopfield model [BG2]. In 1992 Pastur, Shcherbina and Tirozzi studied the cavity method for the Hopfield model. This method was introduced by Mezard, Parisi and Virasoro [MPV] for the study of the Sherrington and Kirkpatrick model, as an alternative to the replica symmetry breaking scheme. Pastur, Shcherbina and Tirozzi [PST] proved that if the Parisi replica symmetry breaking scheme holds, then the Edwards-Anderson parameter is not self-averaging, an important fact that does not seem to be known by people working in this subject.
In 1993 Bovier, Gayrard and Picco [BGP1] studied the limiting Gibbs states of the Hopfield model. Since we are in a mean field model, the notion of Gibbs states is not very well defined. This has been a source of mistakes and misunderstanding that seems to persist in the literature related to disordered mean field models. There is, however, a very complete analysis of the basic mean field model, the Curie-Weiss model, done by Ellis and Newman [E], that gives us a framework to start with. The extension to the Hopfield model is not trivial, owing to the fact that the number of order parameters depends on the volume in a very sensitive way. This creates new difficulties. To describe the results, we need to introduce some notation.

For N ∈ N, we denote by G_{N,β,h} the random probability measure that assigns to each configuration σ the mass
$$ \mathcal{G}_{N,\beta,h}(\sigma) = \frac{e^{-\beta H_N(\sigma) + \beta h \sum_i \xi_i^1 \sigma_i}}{Z_{N,\beta,h}}, $$
where Z_{N,β,h} is the partition function. G_{N,β,h} is called a finite volume Gibbs state with magnetic field. An important observation is that the value of the measure depends on σ only through the quantities
$$ m_N^\mu(\sigma) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \sigma_i, \qquad \mu = 1, \dots, M, $$
called overlap parameters, since the Hamiltonian may be written in the form
$$ H_N(\sigma) = -\frac{N}{2} \sum_{\mu=1}^{M} \big( m_N^\mu(\sigma) \big)^2. $$
We define the random map
$$ m_N : \{-1, +1\}^N \to \mathbb{R}^M, \qquad \sigma \mapsto \big( m_N^1(\sigma), \dots, m_N^M(\sigma) \big), $$
and the measures on (R^M, B(R^M)) that are induced by G_{N,β,h} through the map m_N, i.e.
$$ \mathcal{Q}_{N,\beta,h} = \mathcal{G}_{N,\beta,h} \circ m_N^{-1}. $$
These measures are the natural definition of the Gibbs measures of the Hopfield model. In mean field models, the measures on the space of spin configurations are usually trivial: they are just products of Bernoulli measures. As we will see later, this is also the case for the Hopfield model.

THEOREM 2.4. - Assume that M is non-decreasing and satisfies lim_{N→∞} M(N)/N = 0. Let a_±(β) denote the largest (resp. smallest) solution of
$$ a = \tanh(\beta a). $$
Then, for all β ≥ 0,
$$ \lim_{h \downarrow 0} \lim_{N \uparrow \infty} \mathcal{Q}_{N,\beta,h} = \delta_{a_+(\beta)e^1}, \qquad \lim_{h \uparrow 0} \lim_{N \uparrow \infty} \mathcal{Q}_{N,\beta,h} = \delta_{a_-(\beta)e^1}, \tag{2.11} $$
where the limits are understood in the sense of weak convergence of probability distributions; δ_x denotes the Dirac measure concentrated on x, and e^μ is the μ-th unit vector.
(ii) Moreover, any limiting induced measure is a convex combination of the measures in (2.11).

Note that for β ≤ 1, a_+(β) = a_-(β) = 0, so that in this case there is a unique limiting measure.

In the case where lim_{N→∞} M(N)/N > 0, some modifications have to be done. For δ > 0, we will write a(δ, β) for the largest solution of a perturbed version of the above equation. We denote by ‖·‖ the l²-norm on R^M. Given that lim_{N→∞} M(N)/N = α, we consider, for fixed β, small balls (in the l²-norm) centered at the points ±a(δ, β)e^μ. Using results of [BGP1] and [BGP3] we have

THEOREM 2.5. - There exists α_0 > 0 such that, if lim_{N→∞} M(N)/N = α with α < α_0, then, for almost all ω, any limiting induced measure is supported on the union of these small balls around the points ±a(δ, β)e^μ, μ = 1, ..., M.

Theorem 2.5 excludes in particular that any of the so-called mixed states (which are associated to local minima of the Hamiltonian; see [AGS], [N], [KPa]) give rise to Gibbs states in this regime of parameters. As we mentioned before, the measures induced on the spin configurations are rather trivial. Here we have:

THEOREM 2.6. - Under the assumptions and with the notation of Theorem 2.4,
$$ \lim_{h \downarrow 0} \lim_{N \uparrow \infty} \mathcal{G}_{N,\beta,h} = B^{a_+(\beta)}, $$
where B^a denotes the product measure with the marginal measure at site i given by the Bernoulli measure on {−1, 1} with mean ξ_i^1 a.
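The overlap parameters and the overlap form of the Hamiltonian are straightforward to compute; a sketch (this keeps the diagonal Hebb term, as in the overlap representation above):

```python
def overlaps(patterns, sigma):
    """m_N^mu(sigma) = (1/N) sum_i xi_i^mu sigma_i, for mu = 1, ..., M."""
    n = len(sigma)
    return [sum(p[i] * sigma[i] for i in range(n)) / n for p in patterns]

def hamiltonian(patterns, sigma):
    """H_N(sigma) = -(N/2) sum_mu m_N^mu(sigma)^2."""
    n = len(sigma)
    return -0.5 * n * sum(m * m for m in overlaps(patterns, sigma))
```

A configuration equal to a stored pattern has overlap one with that pattern, and its overlap with an orthogonal pattern vanishes.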
3. THE KAC-HOPFIELD MODEL

The main problem of mean field models like the Curie-Weiss one is the unphysical property that the interaction energy between two sites depends on the volume. This fact has various consequences that are rather pathological. The first one is that the canonical free energy (that is, the free energy with fixed magnetisation in the case of magnetic systems) is not a convex function of the magnetisation. This non-convexity can be seen from the fact that, starting from a spin configuration with all spins σ_i = 1, the cost in energy to create a droplet of -1 is proportional to the volume of the droplet, instead of being proportional to the length of the boundary of the droplet. This fact is responsible also for the impossibility to use a sub-additive argument to prove the existence of the infinite volume free energy. Moreover, the definition of Gibbs States is not given by DLR equations [D], [LR], and the notion of extremal measures is not well defined. The advantage is however the possibility to make explicit computations and to have non-trivial critical thermodynamical properties, see [E].

However, there is a model invented by Kac [K] in the fifties that has no such pathologies, which is explicitly solvable, and which is related to the Curie-Weiss model. The one-dimensional ferromagnetic version of this model is the following: the Hamiltonian is
$$ H_\gamma(\sigma) = -\frac{1}{2} \sum_{i \ne j} \gamma\, J(\gamma |i - j|)\, \sigma_i \sigma_j, $$
with, for instance, J(r) = (1/2) e^{-r}. Therefore, for all fixed γ, there is no phase transition. However, if we perform the limit γ ↓ 0 after the thermodynamic limit, it was proved that the free energy is the same as the Curie-Weiss free energy; the canonical free energy is the convex hull of the canonical free energy of the Curie-Weiss model. This is known as the Lebowitz and Penrose theorem [LP], and there is spontaneous magnetisation at low temperature (which is β > 1 here). The ferromagnetic model is now relatively well understood, although the complete study in one dimension is rather recent and was done by Cassandro, Orlandi and Presutti [COP].
To avoid the previously mentioned pathologies of the mean field models, it is important to consider the Kac version of the Hopfield model. Let us define it precisely. We denote by Λ the set of integers Λ = {-N, -N+1, ..., N}. We define a random Hamiltonian as follows. Let (ξ_i^μ) be a two-parameter family of independent, identically distributed random variables such that P(ξ_i^μ = 1) = P(ξ_i^μ = -1) = 1/2. The Hamiltonian with free boundary conditions on Λ is then given by
$$ H_{\gamma,\Lambda}(\sigma) = -\frac{1}{2} \sum_{i,j \in \Lambda} \gamma\, \mathbf{1}\{\gamma |i-j| \le 1\} \sum_{\mu=1}^{M(\gamma)} \xi_i^\mu \xi_j^\mu\, \sigma_i \sigma_j. $$
It is slightly different from the model with exponentially decaying interactions; however, the two versions are equivalent. We are interested in the case where M(γ) ↑ ∞ as γ ↓ 0. We will set M = M(γ). The first result concerns the free energy,
$$ F_{\gamma,\beta,\Lambda} = -\frac{1}{\beta |\Lambda|} \ln Z_{\gamma,\beta,\Lambda}. $$

PROPOSITION 3.1. - Assume that lim_{γ↓0} γ M(γ) = 0. Then, for almost all ω,
$$ \lim_{\gamma \downarrow 0} \lim_{\Lambda \uparrow \mathbb{Z}} F_{\gamma,\beta,\Lambda} = F^{CW}_\beta, $$
with F^{CW}_β the free energy of the Curie-Weiss model.
The finite volume Gibbs measure for our model is defined by assigning to each σ ∈ S_Λ = {-1, +1}^Λ the mass
$$ \mathcal{G}_{\gamma,\beta,\Lambda}(\sigma) = \frac{e^{-\beta H_{\gamma,\Lambda}(\sigma)}}{Z_{\gamma,\beta,\Lambda}}, $$
where Z_{γ,β,Λ} is the partition function. For any subset A ⊂ Z, we define the M-dimensional vector of 'overlaps' m_A(σ) whose components are
$$ m_A^\mu(\sigma) = \frac{1}{|A|} \sum_{i \in A} \xi_i^\mu \sigma_i, \qquad \mu = 1, \dots, M. $$
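For the Kac version, the relevant order parameter is the local overlap profile on blocks of spins; a sketch (the block length is an illustrative choice, in practice of order 1/γ):

```python
def block_overlap_profile(patterns, sigma, block_len):
    """Local overlaps m_A^mu(sigma) = (1/|A|) sum_{i in A} xi_i^mu sigma_i,
    computed on consecutive blocks A of length block_len."""
    n = len(sigma)
    profile = []
    for start in range(0, n, block_len):
        block = range(start, min(start + block_len, n))
        size = len(block)
        profile.append([sum(p[i] * sigma[i] for i in block) / size
                        for p in patterns])
    return profile
```

A configuration that agrees with a pattern on one block and disagrees on the next produces a profile with local overlaps +1 and -1.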
The main object to study are the distributions of m_Λ(σ) under the Gibbs measure, i.e.
$$ \mathcal{Q}_{\gamma,\beta,\Lambda} = \mathcal{G}_{\gamma,\beta,\Lambda} \circ m_\Lambda^{-1}, $$
which defines a random probability measure on R^M. For fixed γ > 0, this sequence of probability measures satisfies a large deviation principle, in the sense that, for instance, the limit defining the rate function f_{γ,β} exists almost surely by the sub-additive ergodic theorem. Moreover, f_{γ,β} is a convex function of its argument. We are interested in the limiting behaviour of f_{γ,β} as γ ↓ 0. Since the domain of this function depends on γ via M(γ), it is natural to consider restrictions to finite dimensional cylinders. Thus, let I ⊂ N be a finite set and denote by Π_I, for any M such that I ⊂ {1, ..., M}, the orthogonal projector on the components with μ ∈ I of a vector m ∈ R^M. We set, for m ∈ R^I, f^I_{γ,β}(m) to be the corresponding restricted rate function.

The Lebowitz-Penrose theory relates the limit of these quantities to the analogous ones in the corresponding mean-field model, i.e. the Hopfield model. We denote by G^{Hopf} the corresponding Gibbs measure, and by Q^{Hopf} the induced distribution of the overlap parameters. We also write f^{Hopf,I}_β for the corresponding rate function, provided this limit exists. Notice that the case of a finite number of patterns was studied by Comets [Co]. Let us define the convex functions which, if they exist, are the convex hulls of these functions; we denote them with the prefix "conv". Our first result concerns the existence of these functions.

THEOREM 3.1. - Suppose that p(N) is such that lim_{N→∞} p(N) ln N / N = 0. Then,
(i) for almost all realisations of the patterns, conv f^{Hopf,I}_β(m), defined through (3.11), exists for any finite set I ⊂ N and is independent of ε and of the function p(N);
(ii) if, moreover, lim_{N→∞} 2^{p(N)}/N = 0, then, almost surely, f^{Hopf,I}_β(·), defined through (3.11), exists and is independent of p(N).

For the Kac-Hopfield model, we obtain the analogue of the Lebowitz-Penrose Theorem:

THEOREM 3.2. - Assume that M(γ) satisfies lim_{γ↓0} M(γ) = +∞ and lim_{γ↓0} γ M(γ) = 0. Then, for any β, any finite subset I, and for almost all realisations of the patterns,
$$ \lim_{\gamma \downarrow 0} f^{I}_{\gamma,\beta}(m) = \mathrm{conv}\, f^{Hopf,I}_\beta(m). $$

ACKNOWLEDGEMENTS
I would like to thank Flora Koukiou for giving me the opportunity to write this review, and the referee for his suggestions.
[ALR] M. AIZENMAN, J. LEBOWITZ and D. RUELLE, Some rigorous results on the Sherrington-Kirkpatrick model, Comm. Math. Phys., Vol. 112, 1987, pp. 3-20.
[A] D. J. AMIT, Modeling Brain Function, Cambridge Univ. Press. Cambridge, 1988.
[AGS1] D. J. AMIT, H. GUTFREUND and H. SOMPOLINSKY, Spin-glass models of neural networks, Phys. Rev., Vol. A32, 1985, pp. 1007-1018.
[AGS] D. J. AMIT, H. GUTFREUND and H. SOMPOLINSKY, Statistical Mechanics of Neural Networks near Saturation, Ann. of Phys., Vol. 173, 1987, pp. 30-67.
[BG1] A. BOVIER and V. GAYRARD, Rigorous bounds on the storage capacity of the dilute
Hopfield model, J. Stat. Phys., Vol. 69, 1992, pp. 597-627.
[BG2] A. BOVIER and V. GAYRARD, Rigorous results on the thermodynamics of the dilute
Hopfield model, J. Stat. Phys., Vol. 69, 1993, pp. 597-627.
[BGP1] A. BOVIER, V. GAYRARD and P. PICCO, Gibbs states of the Hopfield model in the regime of perfect memory, to appear in Prob. Theor. Rel. Fields, 1995.
[BGP2] A. BOVIER, V. GAYRARD and P. PICCO, Large deviation principles for the Hopfield and the Kac-Hopfield model, to appear in Prob. Theor. Rel. Fields, 1995.
[BGP3] A. BOVIER, V. GAYRARD and P. PICCO, Gibbs states for the Hopfield model with extensively many patterns, to appear in J.S.P., 1995.
[CT] B. CHENG and D. M. TITTERINGTON, Neural Networks: A Review from a Statistical
Perspective, Statistical Science, Vol. 9, 1994, pp. 2-54.