HAL Id: jpa-00212362
https://hal.archives-ouvertes.fr/jpa-00212362
Submitted on 1 Jan 1990
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Retrieval dynamics of neural networks constructed from
local and nonlocal learning rules
J. Krätzschmar, G.A. Kohring
To cite this version:
Retrieval
dynamics
of neural networks constructed from local
and nonlocal
learning
rules
J. Krätzschmar and G. A.
Kohring
Universität Bonn,
Physikalisches Institut,
Nussallee 12, D-5300 Bonn I, F. R. G.(Reçu
le 7 avril 1989, révisé le 18septembre
1989,accepté
le 20 octobre1989)
Résumé. 2014 Les
dynamiques
derappel
de réseaux neuronaux construits àpartir
derègles
d’apprentissage
locales et non locales sontcomparées
à l’aide de simulationsnumériques
etapparaissent
très similaires. Nos simulationsindiquent
qu’il
ne semble paspossible
de déterminer le comportement dessystèmes
aux tempslongs
en fonction de ladynamique
auxpremiers
instants, comme cela avait été récemment
suggéré
parKepler
et Abbott. Abstract. 2014 The retrievaldynamics
of neural networks constructed from local and nonlocallearning
rules arecompared
via computer simulations and shown to be very similar. Furthermore,the simulations show no
hope
fordetermining
thelong
time behavior of the system in terms of the first stepdynamics
as has been advocatedby Kepler
and Abbott.Classification
Physics
Abstracts87.30G - 06.50
Introduction.
The
thermodynamic properties
of theoriginal Little-Hopfield
model have been studied in detailby
Amit, Gutfreund,
Sompolinsky
and their co-workers[1].
They
showed that astrongly
connected networkhaving
Nnodes,
orIsing-spins,
behaves as an associative memorydevice
provided
the number ofpattems,
P,
scales with N as: a =- limPIN 0. 14.
The N-+oorelatively
lowstorage
capacity
of the modelprompted
Personnaz et al.[2]
to invent thenonlocal,
orpseudo-inverse, learning
rule fordetermining
thecouplings,
Jjj,
between thespins.
Such a model has a maximumstorage
capacity
of a = 1. Vekentash[3],
however,
showed that the maximum
storage
capacity
for any model withonly two-spin
interactions isa = 2. This
prompted
Gardner[4]
to find a locallearning
rule which could reach thismaximum
storage
capacity
and indoing
so shedeveloped
theanalytical
tools needed forcompletely determining
the staticproperties
of networkshaving asymmetric couplings.
Staticproperties
alone are not sufficient forunderstanding
neural networks because their ultimate use is asdynamical
attractors. In thatrespect
it has beenargued
[5]
that networks constructed from nonlocallearning
rules should have betterdynamical properties
than those constructed from locallearning
rules.However,
extensive numerical simulationsby
Forrest[6]
and Kanter andSompolinsky
[2]
at two values of a do not bear that out. Anapproximate
analytic
treatment[7]
also indicates that at least in theneighborhood
of « =1,
the twolearning
rules should have similardynamical
behavior.Furthermore,
it has been shownby
224
Diederich and
Opper
[8]
that some locallearning
rules can in factreproduce exactly
thecoupling
matrix of thepseudo-inverse.
In this paper
then,
we makeexplicit comparison
between these twotypes
of networksusing
as a measure of the
dynamical
response, the radius ofattraction,
R. R is defined as one minusthe minimum
overlap,
mmin, with a stored state such that as N - 00 almost all of the stateshaving >
mmin(m =
(1 IN ej
Sj,
where e
represents
the stored state and S anarbitrary
jstate)
will evolve towards the stored state.Previously Kepler
and Abbott[9]
gave aphenomenological
rule forcalculating
thisquantity,
but the simulations to bepresented
here will show that care isrequired
whenusing
their rule. In the next section the numericalprocedures
used will beexplained
and aninterpretation
isgiven
in the last section. Numerical methods.Local
learning
rules aredesigned
toproduce
acoupling
matrix,
Jjj,
such that thefollowing
stability
condition is fulfilled for every site of everypattern :
The
learning
rulesproposed by
Krauth and Mèzard[10],
Forrest[6]
and Gardner[4]
can beshown to
produce equivalent
networks atsaturation, i.e.,
when the network has the maximumstorage
capacity
allowed for agiven
K. In the case ofrandom,
uncorrelatedpatterns,
a is related to K at saturation
by
the Gardner formula[4] :
To construct a network at saturation we used the
approach
advocatedby
Gardner[4].
Starting
with
pattern
1 andgoing sequentially through
all of thepatterns
to bestored,
thecoupling
matrix ischanged by :
After all the
pattems
have beenchecked,
theprocedure
isrepeated
untilAJij
= 0 for everypattern, J.L
andsite,
i. For a finite sizenetwork,
thesystem
saturates much sooner than wouldbe
expected by equation
(2.2).
In order to correct forthis,
we note that thelearning
time on an infinitesystem
shoulddiverge
asK --> K S ( a ),
where K S is the saturation value ofK
[6, 11].
Hence,
we looked for a similar finitesize,
K fs, at which thelearning
time tends toinfinity
and chose for the simulations a value of Kslightly
smaller so as toproduce
reasonablelearning
times. Thisprocedure
produces
a network withproperties
very similar to that of afully
saturated,
infinite volume network. Infact,
a calculation of the average localalignment,
(§fi
L Jij
03BEjr>>03BE ,
gives
a value on a 100 node network within 0.5 % of thatexpected
for an03BEui
j =iiij
03BEUj>>
03BE
’j¥Ai
03BE
infinite size
system
at saturation[9].
stability
condition in(2.1)
to,§iU
E
Jij 03BEuj’
=1,
couldreproduce
thecoupling
matrix structurej=i
of the
pseudo-inverse
method of Personnazet al.,
as N --> oo and in the limit of an infinite number ofleaming
steps.
Thepseudo-inverse
method defines thesynaptic
couplings
as :-where the correlation matrix is defined as : This method has the
advantage
that no refinements are necessary for finite sizesystems.
After
preparing
a networkusing
one of theseprescriptions,
thesystem
was initialized to some statehaving
anoverlap,
mo, with one of the storedpattems.
Then the network wasupdated
inparallel according
to thezero-temperature,
deterministic rule(Glauber
Dynamics) :
Biologically,
this rule is not correct ;however,
a naivecomparison
of time scales indicates thatreal
systems
shouldupdate
more inparallel
than inserial, hence,
there has been agreat
amount of attentionpaid
to thisupdating
scheme[9, 12, 7, 13].
Otherupdating
rules willgive
differentquantitative
results for the radius ofattraction,
but thecomparative
results should be the same.The
updating
rule is thenapplied
until thesystem
reaches a fixedpoint
or acycle.
Thus,
thepercentage
of times thepattern IL
is recalled as a function ofme
can be calculated.Repeating
this
procedure
for many different values of mo andusing
the definitiongiven
above,
enablesone to calculate the radius of attraction. This
radius, however,
must berenormalized, since,
as Kantor and
Som olinsky
[2]
havepointed
out, for finite sizesystems
the radius is reduced due to the0(1/
N )
overlap
of every stored state with every other stored state.Hence,
the actual definition in use is : R1 Mnùn
where m av is the averageoverlap
with the other stored states and the brackets indicate an average over all stored states and allpossible
sets ofpatterns.
We have also checked in the course of these simulations the
proposed
phenomenological
rule of
Kepler
and Abbott[9] :
where,
ml is theoverlap
of thesystem
at the first timestep
and P(So --> E )
is theprobability
that initialsystem
ismapped
onto the stored state.Kepler
and Abbottsuggested
thatX has the value 1/2.
For the simulation results to be
presented
here,
an N = 100 node network was used andtypically
75 different sets ofpatterns
wererandomly
chosen to stabilize for each distinct value of a. For each set ofpattems,
the radius of attraction was measured for eachpattern
as226
Results and discussion.
The main results of this paper are
presented
infigure
1,
where it canclearly
be seen that localand
nonlocal learning
rules have similardynamical
behavior as manifestedby
thesimilarity
of their radius of attraction. At a= 1,
Rnonlocal
must tend to zero[2],
however,
it is also clear thatR10cal
al isbecomming
very small.Hence,
eventhough
the locallearning
rule has astorage
capacity
of a= 2,
thesystems
ability
to function as an associative memory device isalready
seriously
impaired
at a = 1. These results confirm thepredictions
of Krauth et al.[13]
whoargued
on the basis of aone-pattern-neural-network
that thestability
of thepatterns
is the mostimportant
variable fordetermining
the radius of attraction.Indeed,
for the nonlocallearning
thestability,
à,
is[2] : à
=(1 -
a )/ a ,
while for thelocal learning
rule we haveFig.
1. -The radius of attraction, R vs. a.
(D)
For thelocal learning
rule,(A)
for thenonlocal learning
rule,
(0)
using
theKepler
and Abbottphenomenological
rule. The broken line is theanalytical
[4]:
a = [ ( K2; 1 ) erfc (- K/ v2) + (K/ v2)e-K2/2t1,
and as K-->oo,a -> 1 /
(K 2 + 1),
==> K= ( 1 - a )/ a .
Hence,
these results confirm Krauth et al.’sextropolation
from aone-pattern-network
to a full network.Furthermore,
since the localleaming
ruleproduces
couplings
which aresymmetric only
in the mean(i. e. ,
thequantity,
y =Tin - Tji
is Gaussian distributed with zeromean),
while the nonlocallearning produces completely
symmetric couplings ;
these results indicate the lesserimportance
of theasymmetry
of thecouplings
(at
least when suchasymmetry
is not toolarge).
Figure
1 also shows the radius of attraction aspredicted by
thephenomenological
rule ofKepler
and Abbottalong
with the data from their paper[9].
Thedisagreement
between their results and thepresent
results can be understoodby
looking
atfigure
2a. Here theprobability
that P(So -03BE )
as a function of1 -mo
isplotted
for a =0.10,
K = 2.5 and two different1 - m0
values
of mo. Although
simulations at mo = 0.15give
results inagreement
with theirproposed
rule,
results for mo = 0.50 do not.Clearly,
a universal value for theadjustable
parameter
X cannot be determined from this data because itsdependence
upon mo is not clear at this finite value of N.Furthermore,
the results infigure
2b for a = 0.40 and K = 0.92 indicatethat the value
of X
may also have a Kdependence.
Infigure
3 additional evidence of this isshown at a =
0.4,
K =0.92,
where P(So --+ e )
isplotted
as a function of mousing
the normalprocedure
and theKepler
and Abbott rule. The formerpredicts
the correct radius of228
Fig.
3. -P (S --> 03BE) )
vs. mo. The full line is for the normalprocedure
while the dotted line is for theKepler
and Abbottprocedure.
Both are for a = 0.40 and K = 0.92.attraction,
while the latterpredicts
a radius of attraction consistent with the resultsgiven by
[9]
and indicated infigure
1. Thepresent
results shown infigure
1 could in fact be accounted forby using X =
0.3[14].
However,
the results infigures
2 and 3 would still be difficult tointerpret
assupporting
any such universal value of X . Given thisuncertainty
for the correctvalue of X , if any,
prudence
would dictate that any results should be based upon the morereliable method for
computing
the radius ofattraction,
namely
the standard method asdescribed in section 2 and used
by
other authors(see
forexample
Kanter andSompolinsky
in[2])
which usesonly
the initialoverlap
mo as itsdependent
variable.In summary
then,
thesimilarity
of thedynamical
behavior between networks constructed from local andnonlocal learning
rules has been demonstrated andimplies
noadvantage,
fromthis
viewpoint,
of either networktype
in actualapplications.
Furthermore,
it has been shown thatattempting
adescription
of thelong
time behavior of thesesystems
in terms ofonly
the firststep
dynamics
may lead tospurious
results.References
[1]
AMIT D., GUTFREUND H. and SOMPOLINSKY H.,Phys.
Rev. Lett. 55(1985)
1530 ; Ann.Phys.
173(1987)
30 ;CRISANTI A., AMIT D. J. and GUTFREUND H.,
Europhys.
Lett. 2(1986)
337.[2]
PERSONNAZ L., GUYON I. and DREYFUS G.,Phys.
Rev. A 34(1986)
4217 ;[3]
VENKATESH S. S., Neural Networksfor
Computing,
Ed. J. S. Denker(American
Inst. ofPhys. ,
New
York)
1986.[4]
GARDNER E.,Europhys.
Lett. 4(1987)
481 ; J.Phys.
A 21(1988)
257 ; GARDNER E. and DERRIDA B., J.Phys.
A 21(1988)
271.[5]
AMIT D. J.(Racah
Inst. ofPhys.,
Preprint),
RI/87/19(to
bepublished).
[6]
FORREST B. M., J.Phys.
A 21(1988)
245.[7]
KOHRING G. A.,Europhys.
Lett. 8(1989)
697.[8]
DIEDERICH S. and OPPER M.,Phys.
Rev. Lett. 58(1987)
949.[9]
KEPLER T. B. and ABBOTT L. F., J.Phys.
France 49(1988)
1657.[10]
KRAUTH W. and MÈZARD M., J.Phys.
A 20(1987)
L745.[11]
OPPER M.,Phys.
Rev. A 38(1988)
3824.[12]
GARDNER E., DERRIDA B. and MOTTISHAW M., J.Phys.
France 48(1987)
741 ;KRAUTH W., NADAL J.-P. and MÈZARD M.,