HAL Id: jpa-00246392
https://hal.archives-ouvertes.fr/jpa-00246392
Submitted on 1 Jan 1991
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Generalization in the Hopfield model: numerical results
E. Miranda
To cite this version:
E. Miranda. Generalization in the Hopfield model: numerical results. Journal de Physique I, EDP Sciences, 1991, 1 (7), pp. 999-1004. 10.1051/jp1:1991183. jpa-00246392
J. Phys. I France 1 (1991) 999-1004    JUILLET 1991

Classification
Physics Abstracts
87.10, 64.60C

Short Communication

Generalization in the Hopfield model: numerical results

E. N. Miranda

HLRZ, KFA Jülich, D-5170 Jülich, Germany

(Received 11 April 1991, accepted 16 April 1991)
Abstract. The generalization capability of the Hopfield model is studied numerically. There is a critical number tc of noisy examples that should be presented before the system grasps the pure patterns. Above tc, the generalization error falls off as a power of the presented examples. The critical number of examples increases exponentially with the noise level in the examples and linearly with the number of patterns to be learned.
The Hopfield model [1] is the most popular neural network and it is responsible for the present excitement around this subject in the theoretical physics community. It has been extensively studied as an associative memory model and its phase space has been explored with the powerful tools of spin-glass mean-field theory [2]. However, little is known about its performance in more complex computational tasks like generalization. This problem is the focus of many recent papers [3-8], although these works deal with mono- or multilayered perceptrons. Fontanari [9] has examined generalization in the Hopfield model with the usual mean-field techniques. The aim of this paper is to study numerically the same problem.
In the usual Hopfield model, patterns are learned by choosing the couplings according to Hebb's rule. In this way, the patterns become minima of a properly defined energy [2]. Now, suppose that several noisy examples of the pure patterns are stored in the system. Does it grasp the pure patterns? We say the system "generalizes" from the examples if the pure patterns (which have never been stored!) are energy minima or are very close to energy minima. Our results show that a critical number of examples must be taught to the system in order to start generalization. Once the network is in the generalization regime, the retrieval error decreases as a power of the taught examples. The critical number of examples increases exponentially with the amount of noise in the taught examples and linearly with the number of patterns to be stored.

Consider a neural network with N neurons which can take the values S_i = ±1; every site is connected to every site. There is a set of p patterns {ξ_i^μ(0)} (μ = 1, ..., p; i = 1, ..., N) to be learned by the system. As usual, the relevant quantity is α = p/N. We assume two realistic hypotheses about learning: a) we never learn a pure pattern but a noisy version of it; b) learning never stops; we are always relearning and improving previously stored information. So, the learning procedure of our model is a continuous one. A "time step" at time t implies the following operations:
• Choose a set of noisy examples of the pure patterns to be learned; they are denoted by {ξ_i^μ(t)}. If r is the noise level, this means that ξ_i^μ(t) = -ξ_i^μ(0) with probability r.

• Store the set of noisy examples using the usual Hebb rule:

$$ J_{ij}(t) = J_{ij}(t-1) + \sum_{\mu=1}^{p} \xi_i^\mu(t)\, \xi_j^\mu(t), \qquad J_{ii} = 0 $$

Initially, the system starts from a "tabula rasa" state (all the J_ij are zero).
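To make the procedure concrete, here is a minimal sketch of one such time step (Python/NumPy; the variable names, sizes, seed and overall organisation are our own illustrative choices, not the author's code):

    import numpy as np

    rng = np.random.default_rng(0)
    N, p, r = 512, 51, 0.25                 # neurons, patterns (alpha = p/N ~ 0.1), noise level
    xi0 = rng.choice([-1, 1], size=(p, N))  # pure patterns xi_i^mu(0)
    J = np.zeros((N, N))                    # "tabula rasa": all couplings start at zero

    def learning_step(J, xi0, r, rng):
        """One time step: draw noisy examples of all patterns and store them with Hebb's rule."""
        flips = rng.random(xi0.shape) < r    # each site is flipped with probability r
        xi_t = np.where(flips, -xi0, xi0)    # noisy examples xi_i^mu(t)
        J += xi_t.T @ xi_t                   # J_ij(t) = J_ij(t-1) + sum_mu xi_i^mu(t) xi_j^mu(t)
        np.fill_diagonal(J, 0.0)             # keep J_ii = 0
        return J

Iterating learning_step realises the continuous learning protocol: every pass stores a fresh noisy version of all p patterns on top of the previously accumulated couplings.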
It is well known [2] that an energy may be defined if the J_ij are symmetric. In our case, the energy will be a time dependent quantity:

$$ E(t) = -\sum_{i=1}^{N} \sum_{j=1}^{N} J_{ij}(t)\, S_i S_j $$
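In the same illustrative NumPy setting as above, this time dependent energy can be evaluated directly (a sketch; the double sum counts each pair twice, which only rescales E):

    def energy(J, S):
        """E(t) = -sum_i sum_j J_ij(t) S_i S_j for a configuration S of +/-1 spins."""
        return -S @ J @ S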
Generalization is achieved when the pure patterns are energy minima. This point is checked in the following way. A pure pattern is taken as the initial configuration; it evolves according to a sequential steepest-descent dynamics, i.e. each neuron should be parallel to its internal field. After a few iterations, the system reaches a fixed point. Its overlap with the initial pure pattern is given by:

$$ m^\mu(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu(0)\, S_i $$

This quantity is averaged over the whole set of pure patterns. The generalization error is defined as:

$$ \varepsilon(t) = \frac{1 - \overline{m}(t)}{2} $$

The bar means average over the pattern set. If the system generalizes from the shown examples, one expects ε ≈ 0.
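A possible implementation of this check, continuing the sketch above (the function names, the sweep limit and the tie-breaking at zero field are our own assumptions), is:

    def steepest_descent(J, S0, max_sweeps=100):
        """Sequential steepest descent: align each neuron with its internal field until a fixed point."""
        S = S0.copy()
        for _ in range(max_sweeps):
            changed = False
            for i in range(len(S)):
                h = J[i] @ S                    # internal field at site i
                s_new = 1 if h >= 0 else -1     # ties (h = 0) are resolved to +1 here
                if s_new != S[i]:
                    S[i], changed = s_new, True
            if not changed:                     # fixed point reached
                break
        return S

    def generalization_error(J, xi0):
        """epsilon = (1 - m_bar)/2, with m_bar the overlap averaged over the pure patterns."""
        m = [np.mean(xi0[mu] * steepest_descent(J, xi0[mu])) for mu in range(xi0.shape[0])]
        return (1.0 - np.mean(m)) / 2.0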
Computer simulations were performed with neural networks of 256 and 512 neurons. These are rather modest sizes (see reference [10] for a review of large scale simulations), but a qualitative change of the system behaviour is not expected for bigger sizes. The time evolution of the overlap was measured and averaged over 50 samples. We simulated over 50 to 250 time steps depending on N, r and α. The simulation time was chosen such that the power-law decrease of the error (see below) was seen. In figure 1, the generalization error is plotted against time (i.e. the number of shown examples). These data correspond to r = 0.25, α = 0.10 and N = 512. It is clear that the generalization error is approximately constant in the first iterations; then it goes down with a power law. A critical number of examples (i.e. a critical time) tc may be defined as shown in figure 1 (see the construction with the dashed line). For t > tc, the generalization error falls off with a power law:

$$ \varepsilon \sim t^{-b} $$
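The exponent b can be estimated, for instance, by a straight-line fit in the log-log plane restricted to t > tc (an illustrative sketch; in the paper tc is read off graphically from the plot, so here it is simply an input):

    def fit_exponent(ts, eps, t_c):
        """Fit eps ~ t^(-b) for t > t_c and return the exponent b."""
        ts, eps = np.asarray(ts, float), np.asarray(eps, float)
        keep = ts > t_c
        slope, _ = np.polyfit(np.log10(ts[keep]), np.log10(eps[keep]), 1)
        return -slope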
We now study the dependence of tc and b on the parameters involved in the model. In figure 2, the critical time is plotted against the noise level r for α = 0.05. The data fit very well an exponential law:

$$ t_c \sim e^{\lambda r} $$

This means that generalization becomes much more difficult as the patterns are less well defined (a good advice for our teachers!). The exponent b certainly decreases with r (from b ≈ 3.5 for r = 0.1 to b ≈ 1.5 for r = 0.4), but the values fluctuate too much for a definitive statement.
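In the same spirit as the power-law fit above, the exponential dependence of tc on r can be checked by a linear fit of log tc against the noise level (again only a sketch of the kind of analysis involved):

    def fit_exponential(rs, tcs):
        """Fit tc ~ exp(lambda * r) and return lambda."""
        lam, _ = np.polyfit(np.asarray(rs, float), np.log(np.asarray(tcs, float)), 1)
        return lam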
Fig. 1. Log10-Log10 plot of the generalization error vs. time in a 512 neuron network. The noise level is r = 0.20 and α = 0.10. A critical time tc may be defined as shown. If t > tc, the generalization error falls off with a power law: ε ∼ t^{-b}.
Fig. 2. The critical time tc as a function of the noise level r for N = 256 (○) and N = 512 (□). The data show that tc grows exponentially with the noise in the examples.
It should be remarked that for r = 0.5 there is no generalization at all. This is quite obvious: a pattern {ξ^μ(0)} with 50% of wrong sites may be considered as the "anti"-pattern {-ξ^μ(0)} with the same amount of noise. Therefore, one may expect a singularity at r = 0.5.

In figure 3a the critical number of examples tc is plotted against α.
Fig. 3. The critical time tc (a) and the exponent b (b) as a function of α. For α = 0.15 there is no generalization at all. Symbols as in the previous figure.
We can see that tc increases linearly with the number of patterns to be learned. A finite tc is found for α = 0.14, but there is no generalization at all for α = 0.15. It is well known that there is a first order transition in the model for its retrieval properties [2], and large scale simulations [11] have shown that it takes place at αc = 0.143(1). So, it is reasonable to expect a jump in the generalization capability of the model around α ≈ 0.14. In figure 3b the exponent b is plotted against α. We may conclude that tc grows linearly with α and that b decreases linearly with α, provided that α < αc.

Thus:
(i) there is a critical number of examples tc which should be shown to the network before it generalizes;
(ii) in the generalization regime, the generalization error decreases with a power law ε ∼ t^{-b};
(iii) tc grows exponentially with the noise level in the examples shown to the network;
(iv) tc grows linearly with α and the exponent b decreases linearly with it, provided α < αc.
The first point is in full accordance with the mean-field results [9] for the Hopfield model. There are some analytical calculations for single-layer perceptrons [12] which predict no critical size tc for the training set. Most probably, the results strongly depend on the architecture; for this reason one should not expect agreement with those calculations. In any case, our numerical simulations support the claim of reference [9]. One should remember that our data have been obtained in finite systems and they may depend on N. In fact, tc increases with N (compare the data for N = 256 with those for N = 512). Some runs with N = 640 were performed and the results agree with those of 512 neurons within statistical errors (∼5%). A careful study of finite-size effects (and bigger systems!) would be needed for a quantitative comparison with mean-field predictions.
Our second conclusion is in disagreement with previous simulations for multilayered neural networks [13, 14] that show an exponential decay of ε. However, the behaviour of ε seems to depend on the details of the architecture even for multilayered perceptrons [14]. Therefore, it is not surprising that we get a completely different behaviour with a completely different architecture. Finally, there are no analytical predictions about points (iii) and (iv). Perhaps a noise-to-signal analysis can be a useful tool to study such questions. It would also be very interesting to analyze generalization with Gardner's technique [15] in order to get results which are not bound to a particular architecture.

Acknowledgements.
The author wishes to thank H. J. Herrmann for a careful reading of the manuscript and the referees for many suggestions.
References
[1] HOPFIELD J.J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] ... Functions (Cambridge Univ. Press, 1988).
[3] GARDNER E. and DERRIDA B., J. Phys. A 22 (1989) 1983.
[4] VALLET F., CAILTON J.-G. and REFREGIER P., Europhys. Lett. 9 (1989) 315.
[5] OPPER M., KINZEL W., KLEINZ J. and NEHL R., J. Phys. A 23 (1990) L581.
[6] GYÖRGYI G., Phys. Rev. Lett. 64 (1990) 2967.
[7] HERTZ J.A., KROGH A. and THORBERGSSON G.I., J. Phys. A 22 (1989) 2133.
[8] LEVIN E., TISHBY N. and SOLLA S.A., Proc. IEEE 78 (1990) 1568.
[9] FONTANARI J., J. Phys. France 51 (1990) 2421.
[10] KOHRING G., Int. J. Mod. Phys. C 1 (1990) 259.
[11] KOHRING G., J. Stat. Phys. 59 (1990) 1077.
[12] HANSEL D. and SOMPOLINSKY H., Europhys. Lett. 11 (1990) 687.
[13] DENKER J., SCHWARTZ D., WITTNER B., SOLLA S., HOWARD R., JACKEL L. and HOPFIELD J.J., Complex Syst. 1 (1987) 877.
[14] ...