HAL Id: jpa-00212374
https://hal.archives-ouvertes.fr/jpa-00212374
Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
On the storage of correlated patterns in Hopfield's model

J. F. Fontanari (1,*) and W. K. Theumann (2)

(1) Division of Chemistry, California Institute of Technology, Pasadena, CA 91125, U.S.A.
(2) Instituto de Fisica, Universidade Federal do Rio Grande do Sul, Caixa Postal 15051, 91500 Porto Alegre, RS, Brazil

(Received 3 April 1989, revised 18 October 1989, accepted 21 November 1989)
Abstract. - The effects of storing p statistically independent but effectively correlated patterns in the Hopfield model of associative memory are studied. This leads us to propose a local learning rule which, by enhancing the differences among the patterns, allows the network to store them with an efficiency comparable to that of nonlocal learning rules.

Physics Abstracts Classification: 87.30G - 64.60C - 75.10H
1. Introduction.

The statistical mechanics analysis of feedback, fully connected neural networks has made it possible to unveil a variety of interesting features of these systems which would have been hard to detect solely through numerical simulations [1, 2]. The most studied neural network is the Hopfield model of associative memory: the states of the neurons are represented by Ising spins, S_i = +1 (active) or S_i = -1 (passive), and the system of N interacting neurons is governed by the Hamiltonian [3]

  H = - \frac{1}{2} \sum_{i \neq j} J_{ij} S_i S_j .    (1.1)
The stored patterns {\xi_i^\mu = \pm 1, \mu = 1, ..., p} are imprinted on the J_{ij}'s (synaptic connections) by the generalized Hebb rule

  J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu .    (1.2)

(*) Present address: Instituto de Fisica e Quimica de Sao Carlos, Universidade de Sao Paulo, 13560 Sao Carlos SP, Brazil.

The equilibrium states are characterized by an overlap vector m of p components defined by

  m^\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu S_i ,   \mu = 1, ..., p .    (1.3)

In this approach the neural network is viewed as a system of Ising spins in contact with a heat bath which simulates the biological synaptic noise. The tools developed for infinite-range spin-glasses [4] allow an analytical study of the equilibrium properties of neural networks [1, 2]. It has been shown that there exist so-called mixture states, i.e. states that are linear combinations of the stored patterns, in which m has several macroscopic, O(1), components [1]. That study uncovered, somewhat surprisingly, that the synaptic noise suppresses the mixture states in favour of the retrieval states in which m has only one macroscopic component.
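As an illustrative sketch (not from the paper; the sizes and seed are arbitrary), the Hebb rule (1.2) and the overlaps (1.3) can be written down directly; for p << N each stored pattern is then very nearly a fixed point of the zero temperature dynamics S_i -> sign(sum_j J_ij S_j):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 3

# p random +/-1 patterns, one per row
xi = rng.choice([-1, 1], size=(p, N))

# Generalized Hebb rule (1.2): J_ij = (1/N) sum_mu xi_i^mu xi_j^mu
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)  # no self-coupling

# Zero temperature update: each stored pattern should be (close to)
# a fixed point of S_i -> sign(sum_j J_ij S_j) when p << N.
S = xi[0].copy()
S_new = np.sign(J @ S)
print(np.mean(S_new == S))  # close to 1.0 for p << N

# Overlap vector (1.3): m^mu = (1/N) sum_i xi_i^mu S_i
m = xi @ S / N
print(m)  # first component ~1, crosstalk components small
```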
Storing correlated patterns in the Hopfield model is a problem that involves mixture states, since the network's state must have either a non-zero overlap with all the patterns or with none. In this paper we consider the symmetric and asymmetric mixture states. The former describes the situation in which the network confuses the patterns, i.e. it cannot perceive the individual details which make a pattern different from the others. This state is described by the vector

  \mathbf{m} = m_p (1, 1, ..., 1) ,    (1.4)

where m_p = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu S_i for \mu = 1, ..., p. The asymmetric state we consider, for simplicity, distinguishes only one of the patterns from the other p - 1, and is described by the vector

  \mathbf{m} = (m_1, m_{p-1}, ..., m_{p-1}) ,    (1.5)

where m_1 = \frac{1}{N} \sum_{i=1}^{N} \xi_i^1 S_i and m_{p-1} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu S_i for \mu = 2, ..., p.
These two states capture the essence of the problem of storing correlated patterns in Hopfield's model: the network's tendency to enhance the common part of the patterns makes it more difficult to retrieve the details that distinguish them. Our
study is restricted to a particular type of correlations in which the patterns are statistically independent random variables generated by the asymmetric distribution

  P(\xi_i^\mu) = b_1 \, \delta_{\xi_i^\mu, 1} + b_2 \, \delta_{\xi_i^\mu, -1} ,    (1.6)

with b_1 = (1 + a)/2, b_2 = (1 - a)/2, and a \in [-1, 1].
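A quick numerical sketch (illustrative sizes, not from the paper) of the biased distribution: independent +/-1 patterns drawn with P(+1) = (1 + a)/2 acquire mean activity a and an effective pairwise correlation a^2 between distinct patterns:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, a = 100_000, 4, 0.4
b1 = (1 + a) / 2  # P(xi = +1); P(xi = -1) = (1 - a)/2

# p independent biased patterns, one per row
xi = np.where(rng.random((p, N)) < b1, 1, -1)

print(xi.mean())               # close to a = 0.4
print(np.mean(xi[0] * xi[1]))  # close to a^2 = 0.16
```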
Though independent, the patterns are effectively correlated since

  \langle \xi_i^\mu \xi_i^\nu \rangle = a^2    (1.7)

for \mu \neq \nu. Due to this particular choice of correlations the asymmetric state is p-degenerate: any component we choose to be m_1 leads to an equivalent asymmetric state. This paper is
organized as follows. In section 2 we study the thermodynamics of the Hopfield model in the regime p/N -> 0 when N -> \infty. The simplicity of the model singles out the effects of the correlations among the patterns. In section 3 we propose a local learning rule which enhances the asymmetric state by suppressing its rival, the symmetric state. The network's overall performance is comparable to that of the nonlocal modified Hebb rule proposed by Amit et al. [5], where the nonlocality is due to the parameter a, which stands for the average activity rate of the entire network. In section 4 we discuss our results and present some concluding remarks.

2. The generalized Hebb rule.

The
design of associative memory models may be thought of as an optimization problem [6]. Given p patterns {\xi_i^\mu, \mu = 1, ..., p} to be stored in the network, we must choose the connection matrix such that the local minima of the cost function or energy occur when the network's state S_i is near each one of the patterns. The simplest guess is

  H = - \frac{N}{2} \sum_{\mu=1}^{p} (m^\mu)^2 ,    (2.1)

where m^\mu is defined by equation (1.3). Notice that, except for the diagonal term, this equation is identical to equation (1.1). A rather different criterion for the design of an effective associative memory model, which we do not pursue in this paper, is that the basins of attraction of the local minima be as large as possible [7], and some progress in this direction has already been made with correlated patterns [8].
The thermodynamics of the Hamiltonian (2.1) has been fully studied by Amit et al. when the stored patterns are uncorrelated [1, 2]. This section focuses on the problems of using the generalized Hebb rule, equation (1.2), to store correlated patterns. Since the diagonal term in (2.1) does not affect the thermodynamics, we can straightforwardly take the average free energy density from reference [5],

  f = \frac{1}{2} \sum_{\mu=1}^{p} (m^\mu)^2 - \frac{1}{\beta} \langle \ln [2 \cosh (\beta \, \mathbf{m} \cdot \mathbf{\xi})] \rangle ,    (2.2)

where \mathbf{m} \cdot \mathbf{\xi} = \sum_{\mu=1}^{p} m^\mu \xi^\mu, and m^\mu is given by the saddle point equation

  m^\mu = \langle \xi^\mu \tanh (\beta \, \mathbf{m} \cdot \mathbf{\xi}) \rangle ,    (2.3)

for \mu = 1, ..., p. The site subscripts were dropped since the self-averaging property of \frac{1}{N} \sum_{i=1}^{N} allows us to replace \frac{1}{N} \sum_{i=1}^{N} (...) by the average \langle ... \rangle over the \xi's, and \beta^{-1} = T is a parameter measuring the amount of noise acting in the network. Next we consider two particular solutions of equation (2.3).
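For small p, the saddle point equation (2.3) can be solved by brute-force fixed-point iteration, averaging over all 2^p pattern configurations with the biased weights b_1, b_2. This is an illustrative sketch (parameters and iteration scheme are our own, not from the paper):

```python
import numpy as np
from itertools import product

def saddle_point(beta, p, a, m0, iters=2000):
    """Fixed-point iteration of (2.3): m_mu = <xi_mu tanh(beta m.xi)>,
    averaging over all 2^p sign configurations with the biased
    probabilities b1 = (1+a)/2, b2 = (1-a)/2."""
    b1, b2 = (1 + a) / 2, (1 - a) / 2
    m = np.array(m0, dtype=float)
    for _ in range(iters):
        new = np.zeros(p)
        for config in product([1, -1], repeat=p):
            xi = np.array(config)
            weight = np.prod(np.where(xi == 1, b1, b2))
            new += weight * xi * np.tanh(beta * (m @ xi))
        m = new
    return m

# Unbiased case (a = 0), one condensed pattern: the Mattis retrieval state.
m = saddle_point(beta=2.0, p=3, a=0.0, m0=[0.9, 0.0, 0.0])
print(m)  # m[0] -> ~0.9575, the root of m = tanh(2m); the others stay 0
```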
2.1 SYMMETRIC SOLUTIONS. - For the symmetric solutions, equation (1.4), the saddle point equation (2.3) reduces to

  m_p = \frac{1}{p} \langle z_p \tanh (\beta m_p z_p) \rangle ,    (2.4)

where z_p = \sum_{\mu=1}^{p} \xi^\mu. Expanding equation (2.4) in powers of m_p, one finds that the critical temperature T_c at which the symmetric state undergoes a continuous transition to the paramagnetic state (m_p = 0) is

  T_c = 1 + (p - 1) a^2 .    (2.8)

This equation is an indication that the correlations among the patterns make the symmetric state more robust to noise effects. However this transition only occurs if the symmetric solutions are stable near T_c, which is not the case for a = 0, since the odd-p symmetric solutions are unstable above a certain temperature 0 < T_p < 1 and the even-p solutions are always unstable [1]. Next we show how these results change for a \neq
0.

The elements of the matrix A whose eigenvalues determine the local stability of the saddle point solutions, equation (2.3), are

  A_{\mu\nu} = \delta_{\mu\nu} - \beta \langle \xi^\mu \xi^\nu [1 - \tanh^2 (\beta \, \mathbf{m} \cdot \mathbf{\xi})] \rangle ,    (2.9)

which for the symmetric solutions are reduced to

  A_{\mu\nu} = [1 - \beta (1 - q)] \delta_{\mu\nu} - \beta (a^2 - Q)(1 - \delta_{\mu\nu}) ,    (2.10)

where

  q = \langle \tanh^2 (\beta m_p z_p) \rangle    (2.11a)

and

  Q = \langle \xi^\mu \xi^\nu \tanh^2 (\beta m_p z_p) \rangle ,   \mu \neq \nu .    (2.11b)

There are two types of eigenvalues:

  a nondegenerate eigenvalue
  \lambda_1 = 1 - \beta (1 - q) - (p - 1) \beta (a^2 - Q) ,    (2.12)

  a (p - 1)-degenerate eigenvalue
  \lambda_2 = 1 - \beta (1 - q) + \beta (a^2 - Q) .    (2.13)

The signs of these eigenvalues determine the local stability of the symmetric solutions. In the limit T -> 0 one finds q = 1 - prob(z_p = 0) and Q = a^2 - prob(z_p = 0). Since prob(z_p = 0) = C_{p, p/2} (b_1 b_2)^{p/2} is nonzero only for even p, the odd-p symmetric solutions remain stable at T = 0. Near
T_c the expansion of equations (2.11a-b) in powers of m_p gives the behaviour of q and Q near the transition. Substituting these results into the equations for \lambda_1 and \lambda_2 yields the following: \lambda_1 is always positive below T_c, and \lambda_2 becomes negative, signaling the instability of the symmetric solutions, only in the limit a^2 -> 0 for T -> T_c. For non-zero a, and T not too far below T_c, \lambda_2 is positive independently of the parity of p. This implies that the even-p solutions must become stable above a certain temperature 0 < T_p < T_c.
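As a numerical sketch of the symmetric solutions (assuming the reduced saddle point equation (2.4); the parameters are illustrative, not from the paper), m_p can be obtained by iterating the map over the binomially distributed values of z_p; it vanishes continuously at T_c = 1 + (p - 1) a^2:

```python
import numpy as np
from math import comb

def m_symmetric(T, p, a, m0=0.5, iters=4000):
    """Fixed-point iteration of (2.4): m_p = (1/p) <z_p tanh(beta m_p z_p)>.
    z_p = sum_mu xi^mu takes the value p - 2k with probability
    C(p, k) b1^(p-k) b2^k under the biased distribution (1.6)."""
    b1, b2 = (1 + a) / 2, (1 - a) / 2
    k = np.arange(p + 1)
    z = p - 2 * k
    w = np.array([comb(p, int(j)) for j in k]) * b1 ** (p - k) * b2 ** k
    m = m0
    for _ in range(iters):
        m = np.sum(w * z * np.tanh(z * m / T)) / p
    return m

p, a = 3, 0.5
Tc = 1 + (p - 1) * a ** 2  # = 1.5
m_below = m_symmetric(0.9 * Tc, p, a)
m_above = m_symmetric(1.1 * Tc, p, a)
print(m_below, m_above)  # non-zero below Tc, essentially zero above
```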
To identify the regions of stability of the symmetric solutions in the space of parameters a and T, we perform a numerical analysis of \lambda_2, equation (2.13), since it is the first eigenvalue to change sign. The results for several values of odd p are presented in figure 1, where the odd-p solutions are unstable inside the contours of T_p(a). Increasing p increases the region of stability. For each p there is a critical value of a above which these solutions are stable for all T < T_c. The results for even p are presented in figure 2, where the even-p symmetric solutions become stable above T_p < T_c. This figure shows clearly that the role played by the synaptic noise changes when the patterns are correlated: it stabilizes the symmetric mixture state. So far we have studied the network's tendency to enhance the common information contained in the patterns, reflected in the increasing stability of the symmetric state. Next we show how this tendency jeopardizes the network's ability to retrieve individual details of the stored patterns.
2.2 ASYMMETRIC SOLUTIONS. - The asymmetric solutions, equation (1.5), obey the equations

  m_1 = \langle \xi^1 \tanh (\beta h) \rangle ,    (2.17a)

  m_{p-1} = \frac{1}{p-1} \langle z_{p-1} \tanh (\beta h) \rangle ,    (2.17b)

where h = m_1 \xi^1 + m_{p-1} z_{p-1} and z_{p-1} = \sum_{\mu=2}^{p} \xi^\mu. For weak correlations, i.e. a << 1, these equations can be expanded in powers of a, giving equations (2.18).
Fig. 1. - The odd-p symmetric solutions are unstable inside the contours of T_p (solid curves), shown for p = 3, 5, 7, 9, 21, and above T_c (broken curves), shown for p = 3 and 21.

Fig. 2. - The even-p symmetric solutions are unstable below T_p (solid curves), shown for p = 2, 4, 6, 12, and above T_c (broken curves), shown for p = 2 and 12.

Thus
making a = 0 one recovers the Mattis solutions. Notice that the retrieval overlap m_1 is only slightly reduced by the presence of weak correlations among the patterns. Next we consider the zero temperature solutions of equations (2.17). Taking the limit \beta -> \infty in equations (2.18) one finds

  m_1 = 1 ,   m_{p-1} = a^2 ,    (2.19)

which clearly is the best the network can do for retrieving patterns correlated according to equation (1.7). However, it can be easily verified that this solution exists only if \epsilon_+ and \epsilon_- are positive. This condition is satisfied for a < a_c, where a_c is given by equation (2.20).
This condition is satisfied for a a,, whereThe behaviour for a > a,,
depends
on theparity
of p. Forodd p
the modelundergoes
awhere k’-=
[p/2]
stands for theinteger
part
of p/2 [9].
Since for even p thesymmetric
solutions are
unstable,
the model mustundergo
a discontinuous transition to anotherasymmetric
state withThis solution has
mP _ 1 ml 1
and forlarge p
tends to asymmetric
solution withml = mp - 1 = a.
It should be emphasized that equations (2.19), (2.21), and (2.22) are not the only zero temperature solutions of equations (2.17) [10], but they are the relevant ones for our purpose, since their basins of attraction contain the input states with m_1 near 1 and m_{p-1} near 0, which bias the network to distinguish pattern 1 from the other p - 1. We also remark that the transition occurring at a = a_c is not a thermodynamic transition, because the energy of the states plays no role in the transition. Nevertheless, a_c clearly signals a change in the dynamical behaviour of the model, which is the important information for software and hardware implementations.
To describe the behaviour of the model for T > 0 we associate the solutions (2.19), (2.21), and (2.22) with the phases asymmetric I (AI), symmetric (S), and asymmetric II (AII), respectively. The solution m_1 = m_{p-1} = 0 corresponds to the paramagnetic phase (P). For odd p the phase diagram has two transitions: a discontinuous one from phase AI to phase S, and a continuous one from phase S to phase P given by equation (2.8). Figure 3a illustrates these transitions for p = 3. In the case of even p we need to include the phase AII. This phase is very sensitive to noise and undergoes a continuous transition to phase S as soon as the symmetric solutions become stable, i.e. when \lambda_2 = 0. The remaining transitions are similar to the odd-p case. Figure 3b shows the phase diagram for p = 4. Notice the robustness of the phase AI to noise and the large domain of the phase S in both diagrams.
3. A learning rule for correlated patterns.

An effective learning rule for storing correlated patterns should enhance their dissimilarities or, which is the same, penalize their similarities. To implement this idea we must design a network whose cost function or energy is maximized by the symmetric mixture state. A simple guess is equation (3.1), where \bar m = \frac{1}{\bar p} \sum_\nu m^\nu and \bar p = p - 1, with m^\nu defined by equation (1.3).

Fig. 3. - Phase diagram for (a) p = 3 and (b) p = 4. The broken curves correspond to continuous transitions while the solid curves correspond to discontinuous transitions.

A less transparent expression for H can be obtained by expanding the quadratic term in equation (3.1), giving equation (3.2), where
J_{ij} is a local learning rule given by equation (3.3). Notice that in going from equation (3.1) to (3.2) we have omitted the diagonal term J_{ii}, since it plays no role in the statistical mechanics analysis of the model. However, it does affect the dynamical properties in a nontrivial way [11] and, to avoid future ambiguities, we define the model by equations (3.2)-(3.3).
The thermodynamics of the Hamiltonian (3.1) or (3.2) is straightforward. The partition function is evaluated by the standard saddle point method: deforming the contours of integration so that they pass through the saddle point, and using the self-averaging property of the averaged free energy density, one finds the free energy density, where m^\nu and t^\nu obey the saddle point equations.
Clearly the symmetric state, equation (1.4), is not a solution of these equations. This is the main consequence of our learning rule being different from that of Amit et al., equation (1.9), which has an energy landscape dominated by spurious symmetric states. We return to a comparison of the two rules below. For the asymmetric solutions, equation (1.5), the saddle point equations become a pair of coupled equations. After
averaging over \xi^1, introducing the variable M = m_1 - m_{p-1}, and eliminating t_1 and t_{p-1}, one gets the equations (3.10a-b). We now consider the zero temperature solutions of these equations. Taking the limit \beta -> \infty in equation (3.10b) and multiplying both sides by M, one finds equation (3.11). Turning to equation (3.10a), we notice that \theta_+ and \theta_- are always positive except for z_{p-1} = -(p - 1) and z_{p-1} = p - 1, respectively, where they vanish. Hence, taking the limit \beta -> \infty, the only solutions of equations (3.10) are equations (3.13a-b), which rapidly approach (2.19) as p increases. These equations give
us a clue to understanding how the network works. Take a certain bit in pattern 1, say \xi_i^1. The probability of \xi_i^\mu = \xi_i^1 for \mu = 2, ..., p is \Gamma = b_1^p + b_2^p, since the patterns are independent. Thus the mean number of bits which are common to all the patterns is N\Gamma. Because the learning rule penalizes the symmetric state, the network's state must be such that half of these N\Gamma bits are reversed (if all of them were reversed the network would in fact enhance the symmetric state, due to the symmetry \xi^\mu -> -\xi^\mu) and the N - N\Gamma/2 remaining ones are equal to the \xi_i^1's. Hence m_1 = 1 - \Gamma, in agreement with equation (3.13a). If one thinks of the patterns as pictures, then N\Gamma is the common background in the pictures and what the network does is homogenize the background, leaving the principal features untouched. It is
interesting to compare our learning rule, equation (1.8), with the rule of Amit et al., equation (1.9): for large p, neglecting fluctuations of O(p^{-1/2}), one recovers equation (1.9). Nevertheless, these fluctuations are strong enough to modify the energy landscape and, consequently, the thermodynamics of the model. To illustrate this, let us consider the T = 0 solutions of both models in the limit of large but finite p. For the rule (1.9) the symmetric solutions are given in reference [5], while for the rule (1.8) one has m_p = H_s = 0. The asymmetric solutions are m_1 = 1, m_{p-1} = a^2 for both rules. Thus the role of the O(p^{-1/2}) fluctuations is to destabilize the symmetric states. Notice that for rule (1.9) the symmetric solutions have lower energy for a > (1 - 2/\pi)^{1/2}, though this serious drawback can be avoided by imposing a global constraint on the dynamics [5].
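The bit-counting argument of this section (half of the N\Gamma common-background bits reversed, with \Gamma = b_1^p + b_2^p) can be checked numerically; this sketch, with illustrative parameters of our own, verifies the background fraction:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, a = 200_000, 5, 0.6
b1, b2 = (1 + a) / 2, (1 - a) / 2

xi = np.where(rng.random((p, N)) < b1, 1, -1)

# Fraction of sites where all p patterns agree (the "common background")
common = np.all(xi == xi[0], axis=0).mean()
gamma = b1 ** p + b2 ** p
print(common, gamma)  # both close to 0.328 for p = 5, a = 0.6
# The argument in the text then predicts a retrieval overlap m_1 = 1 - gamma.
```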
4. Discussion.

In this paper we proposed a local learning rule for storing correlated patterns in a neural network model of associative memory. The learning rule, equation (1.8), emerges as a natural result of posing the design of the neural network as an optimization problem for the energy function: find a learning rule that enhances the patterns' differences. Such a rule must necessarily be complex. A simple additive learning rule, like the generalized Hebb rule, treats each pattern as a new piece of information to be stored even if the patterns are correlated. Instead, equation (1.8) extracts and stores only the new information contained in the new pattern presented to the network. Since the process of selecting the new information involves a comparison of the new pattern with all the data already stored in the network, the learning rule cannot be local in the space of the stored patterns. A similar situation occurs in the storage of hierarchically correlated patterns, where in order to store a pattern (descendant) the network needs to recall information already stored (ancestor) [14, 15]. Our learning rule may have a promising application in pattern recognition problems where important features of the stored patterns are hidden in the background. We think that this point deserves further attention.
Throughout this paper we have considered only learning rules where all the patterns are embedded in the network at once through a prescription for J_{ij}. Although this unsupervised learning strategy does not guarantee the stability of the patterns, it allows an analytical study of the retrieval process. It should be emphasized that, in the context of supervised learning rules, there exists a local algorithm - the perceptron algorithm - which can generate the appropriate synaptic connections to stabilize a set of N correlated or uncorrelated patterns [16].
It should be remarked that the results of section 2 could be attributed to the statistical independence of the biased patterns, \langle \xi^\mu \xi^\nu \rangle = \langle \xi^\mu \rangle \langle \xi^\nu \rangle = a^2, instead of to a true correlation effect. We believe this is not the case, since those results agree with the intuitive expectation that the correlations should favour the symmetric mixture state and that this preference should be enhanced in the presence of noise. Further evidence is provided by comparing our results for p = 2 with the results of reference [13], where \langle \xi^\mu \rangle = \langle \xi^\nu \rangle = 0 and \langle \xi^\mu \xi^\nu \rangle = Q. The equations there are reduced to equations (2.17) when one replaces Q by a^2.
Among the advantages of the learning rule (1.8) over equation (1.9) are the absence of the symmetric mixture state and the applicability to any set of correlated patterns without prior knowledge of the correlations. To compare these two learning rules in the limit of non-zero \alpha = p/N, we have run simulations for a = 0.1 (N = 200) and measured the retrieval overlap m_1 as a function of \alpha. Rule (1.9) has only a retrieval overlap
Acknowledgments.

The research at Caltech was supported by contract N00014-87-K-0377 from the Office of Naval Research. J. F. F. thanks the kind hospitality of IF-UFRGS, where this work was started. W. K. T. thanks R. Erichsen Jr. for aid with some of the calculations in section 2. The research of W. K. T. was supported in part by Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq) and Financiadora de Estudos e Projetos (FINEP), Brazil. J. F. F. was partly supported by a CNPq fellowship.
References