HAL Id: jpa-00212540
https://hal.archives-ouvertes.fr/jpa-00212540
Submitted on 1 Jan 1990
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Generalization in a Hopfield network
J.F. Fontanari
To cite this version:
2421
Generalization in
aHopfield
network
J. F. FontanariInstituto de Fisica e
Quimica
de São Carlos, Universidade de São Paulo, Caixa Postal 369, 13560 São Carlos SP, Brasil(Received
23April
1990,accepted
infinal
form
12July 1990)
Abstract. 2014 The
performance
of aHopfield
network inlearning
an extensive number of conceptshaving
accessonly
to a finitesupply
oftypical
data whichexemplify
the concepts is studied. Theminimal number of
examples
which must betaught
to the network in order it starts to createrepresentations
for the concepts is calculatedanalitically.
It is shown that the mixture statesplay a
crucial role in the creation of these
representations.
J.
Phys.
France 51(1990)
2421-2430 ler NOVEMBRE 1990,Classification
Physics
Abstracts 87.10 - 64.60C1. Introduction
Learning
andgeneralization
in neural networks has been thesubject
of intensive research in thepast
few years[1-7].
The most recent studies have been carried out in the context ofsupervised learning
insingle-layer
feedforward neural networks[5-7],
following
the theoreti-cal frameworkpresented
in Gardner’s seminal paper[8]. Comparatively,
little attention has been devoted to theability
ofsimple
feedback neuralnetworks,
e.g.Hopfield’s
model[9],
toperform computational
tasksbeyond
thesimple
storage
of a set ofactivity
patterns.
Thelearning
process inHopfield’s
model consists ofsetting
the value of thecoupling
.l¡j
between neurons iand j
for allpairs
of neurons such that agiven
set ofactivity
patterns
is memorizedby
the network. In this model the states of the neurons arerepresented by Ising
spins, Si
+ 1(firing) or Si= - 1 (rest). Storage
of anactivity
pattern
{ç r
= ±1,
i =
1, ...,
N }
into the memory of the network is achievedby modifying
thecouplings
according
to thegeneralized
Hebb ruleAfter
being exposed
to p
activity
patterns
thecouplings
are set towhere we have
assumed Jii
= 0initially (tabula rasa).
Once the
couplings
arefixed,
thedynamical
retrieval process isgoverned by
the Hamiltonian[9]
The network can
potentially
retrieve agiven
activity
pattern
if it is a minimum or if it is very near a minimum of H. The naturalparameters
formeasuring
theperformance
of the network inretrieving
the storedpatterns
are theoverlaps
where the state
{Si, i = 1, ..., N } is a
minimum of H. Theproperties
of these minima have beenfully
studiedby
Amit et al.[10, 11 ]
using
statistical mechanics toolsdeveloped
in theanalysis
of infinite rangespin-glasses [12].
It has been shown that besides the retrieval stateswhich have
macroscopic overlaps, 0(1),
withonly
one storedpattern
there exist mixturestates which have
macroscopic overlaps
with several storedpatterns
[10].
Since the interest inHopfield’s
model ismainly
due to itsprospective
use as an associative memory, the attention has been focused on the retrieval states, the mixtures statesbeing regarded
as a nuisancewhich can be eliminated either
by adding
external noise to thesystem
[10]
orby modifying
thelearning
rule[13].
On the otherhand,
thesespurious
states have been seen asproof
of theability
of the network to create newrepresentations
to handle the information contained in the storedpatterns
[14].
In this paper we show that the mixture statesplay
a crucial role when the taskposed
to the network is to extractmeaningful
information from theactivity
patterns
it isexposed
toduring
thelearning
stage.
We consider the
following problem.
Let us suppose thatduring
thelearning
stage
the network isexposed
to sexamples
of agiven
concept.
Theexamples
are embedded in the memory of the networkby
the Hebbianlearning
process,equation (1.2).
Thequestion
we address is whether the network can create arepresentation
for theconcept
to which it had beenexposed only through examples.
We say that the network has arepresentation
for aconcept
if theconcept
is a minimum or if it is very near a minimum of H. Morespecifically,
weconsider p
= a Nconcepts
represented by
theactivity
patterns {T}.
=1,
..., p. For each
concept,
a finite number ofexamples
{ç jJl}, V
=1, ..., s
isgenerated.
Theircomponents
arestatistically independent
random variables drawn from the distributionwith O:s; b :s; 1. The
examples
can bethought
of asnoisy
versions of theconcepts
they
exemplify.
Theparameter
b measures thedifficulty
of the taskposed
to the network : Smallvalues of b result in low correlations between
examples
andconcepts,
making
thegrasping
ofthe
concepts
more difficult. Forsimplicity,
thecomponents
of theconcepts
arerandomly
chosen as ± 1 withequal probability.
As an alternative
viewpoint,
one may consider theconcepts
asdefining
p classes each onecontaining s
individuals(examples).
Thus,
the task of the network would be to group theexamples
in theirrespective
classes,
i.e. the network shouldcategorize
theexamples.
However,
in this paper we follow thepoint
of viewexpressed
in Denker et al.[1]
]
thatcategorization
is aparticular
case of rule extraction(generalization)
in which the rule may be2423
Finished the
learning
stage,
thecouplings
are set toThe
quantities
we focus on are thegeneralization
errors eJL definedby
where
m e,
thegeneralization overlaps,
are theoverlaps
between theconcepts
and a certain minimum of H which will bespecified
later.The aim of this paper is to calculate the
dependence
of thegeneralization
error(el-")
on the number ofexamples taught
to the network(s)
for agiven
task characterizedby
theparameter
b. To achieve this westudy
thethermodynamics
ofHopfield’s
Hamiltonian,
equation (1.3),
with thecouplings
set as inequation (1.6).
Ouranalysis
is restricted to thenoiseless
(zero temperature)
limit. A firstattempt
to tackle thisproblem
has beenpublished
recently [15].
This paperpresents
asimpler
and moregeneral approach.
The paper is
organized
as follows. In section 2 westudy
thethermodynamics
of the model in the limit a =p
/N -
0. Thesimplicity
of thislimit,
whichdispenses
the use of thereplica
trick in thecomputation
of theaveraged
free energydensity,
allows us to findanalytical
expressions relating
ee with s and b. Theanalysis
of the nonzero a limit isperformed
within thereplica symmetric
framework in section 3. We summarize our results andpresent
someconcluding
remarks in section 4. 2. Finite number ofconcepts.
The Hamiltonian of
Hopfield’s
model with thecouplings given by equation (1.6)
can be written aswhere we have omitted a constant term and included an additional term in order to
compute
M,". In
fact,
writing
theaveraged
free energydensity
aswhere Z =
Trs e- pH
one hasThe
parameter 8 =-
T-1
in
theexpression
for thepartition
function Z is a measure of the amount of noiseacting
on thesystem.
The noiseless limit is obtainedby
taking
/3
- oo. Thenotation ... >
stands for the averages over theexamples
and over theconcept
taken in this order. The calculation
of f is straightforward
in the limit a = 0[10]
so wepresent
The order
parameters,
with
(... ) T standing
for the thermal average, aregiven by
thesaddle-point equations
with hu = 0.
Using
equation
(2.3)
we can write thegeneralization overlaps
me in terms of the retrievaloverlaps
m p, JIWe restrict our
analysis
to aparticular
class of solutions form J.L 11,
i.e. to aparticular
class of minimaof f (or
H for T =0).
Since there are nomacroscopic
correlations between differentconcepts
we consider solutions of the formm J.L 11 = m 1 11 8 J.L 1.
Moreover,
we choosem1v
to be of the formThe motivation for
choosing
this solution is that itgives
a bias to the network to behave as an associative memory : eachexample
issingled
out(the
solution is sdegenerate
since we can select any of theexamples
to beml 1)
and treated as anindependent piece
of information to be stored. If any other behaviour emerges, it will be aspontaneous
property
of the network and not an artifice due to theparticular
choiceexpressed by equation (2.8).
At T =0,
m11
and ms - 1satisfy
theequations
where Xs -
1 = ¿ ç 1 JI.
For s > 10 the binomial distribution of Xs - 1 can bereplaced by
av > i
Gaussian with mean
(s - 1 )
bç
1 and
variance4)
=(s - 1 ) ( 1 -
b 2).
Performing
the averagesover ç Il,
Xs - 1 iand ç
1 in
thisorder,
we findwhere
2425
Next we discuss the solutions of
equations
(2.10).
For small s the network retrieves theexamples
almostperfectly,
i.e.m11 ~ 1
andms - 1 = b2.
Hence thegeneralization
errorE = E ,
equation (1.7),
isStrictly,
the retrieval isperfect
for(s -
1 ) -- b-2
as can beeasily
seen fromequations (2.9)
since 1 xs - 1 1 -- s -
1. This result is not recoveredby equations (2.10)
because thereplacement
of the discrete distribution of xs _ 1by
a Gaussianmakes
1 Xs - 11 [
unbounded. As sincreases,
m 11
decreasesslightly
until s reaches a critical value Sc, above whichm11
jumps
to a much smallervalue,
almostequating
with ms - 1. This behaviour is reflected in thegeneralization
error which we show infigure
1 as a function of s for several values of b. For s - Sc the networkjust
memorizes thepatterns
it isexposed
to. This behaviour is referred to as the retrievalphase (R).
For S:> sc the network nolonger
treats theexamples
asindependent pieces
of information and starts to mixthem,
creating
therepresentation
for theconcept.
This is thegeneralization phase (G).
Thephase diagram
in the(s, b ) plane
indicating
theregions
where eachregime
occurs is shown infigure
2.Although
mIl = m s -
i is solution ofequations (2.10) only
in the limit s -+0 oo, in thegeneralization regime
one hasm ms ms
where ms is thesymmetric
solution ofequation (2.6),
In
fact, m 11
neverequals
ms _ 1 because whenaveraging
over theexamples
we haveexplicitly
ruled out thispossibility by retaining
the discrete natureof e
11 whilemaking
the Gaussianapproximation
for the distribution of Xs - 1.Nevertheless,
since the differences betweenm 11,
ms -and
m, arenegligible
in the Gphase
we canapproximately
characterize thisregime
by
thesymmetric
solution ms.Following
similarsteps
to the onesleading
toequations (2.10)
one findsFig.
1. -The
generalization
error as function of the number ofexamples
for b = 0.18, 0.20, 0.25, 0.40where
Thus,
thegeneralization
error in the Gphase
can beapproximate
toThe values of 8
computed
through
thisequation
areindistinguishable
from the exactgeneralization
error,computed through equations (1.7)
and(2.13),
in the scale offigure
1.3. Infinité number of
concepts.
In this section we consider the case where the network has to create
representations
for an extensive number ofconcepts, a
=p /N +
0,
being exposed
to a finite number ofexamples s
of each
concept.
In this case we have to take into account the fact that the combinedoverlap
of aconcept
with all the otherconcepts
is of0 (-,/à ).
To handle this situation we follow Amit et al.[11] and
assume thatonly
theoverlaps
m
JIcondense,
i.e. are of0(1)
while the others are ofO(N- 1/2).
Theaveraged
free energydensity
is calculatedthrough
thereplica
trickwhere Z’ is the
partition
functionreplicated n
times,
Averaging over rJl(J.L
>1) explicitly
andusing
theself-averaging
property
of ei ’
yields
where we have omitted
multiplicative
factors which vanish in thethermodynamic
limit andwithBJlÀ = b2 +
(1 - b2)
8J1À
andQpu
=qpu
+(1 -
qpu)
8pu.
The integrals in equation (3.3)
2427
resulting
in thefollowing expression
for theaveraged
free energydensity
where
with
C = 6 (1 - q )
and Dz =dz /
J2;
e - z 2/2
Thé orderparameters
aregiven by
thesaddle-point equations
which,
in the limitsf3
-+ oo andhl...
0,
are written asThe
equations
for the standardHopfield
model[11]
are recovered in the cases s = 1 andb = 1 with an
appropriate rescaling
of C and r. For b = 0 the network iseffectively storing
sp uncorrelated
patterns
andHopfield’s equations
are obtainedby rescaling a ’ -
sa. Oncem
"and C are known we can
compute
m
through
theequation
Next we discuss the solutions of the
saddle-point equations (3.10)-(3.12).
In addition to the solutions considered in the a = 0limit,
there exist aspin-glass solution m 1 JI =
0‘d v which is stabilized
by
the Gaussian noise due to theoverlaps
ofO(N- 1/2)
betweene
"and el’ ’,
» > 1.However,
adding
noise to thesystem
has adestabilizing
effect for theretrieval
phase [ 13,
16,
17], reducing
its domain to aregion
much smaller than the one shown infigure
2. Therefore thephase
diagram
in the(s, b ) plane
will be dominatedby
thegeneralization phase
characterizedby
thesymmetric
solution,
equation (2.15),
and thespin-glass phase.
In thefollowing
we will focusonly
on theinterplay
between these twophases.
For the
symmetric
solutionsequations (3.10)
and(3.11)
reduce toFig.
2.Fig.
3.Fig.
2. - Phasediagram showing
thegeneralization
phase (G)
and the retrievalphase (R)
fora = 0. The transition at s = SC is discontinuous. The retrieval of the
examples
isperfect
below thedashed curve, b = 1
/
Bls - 1.
Fig.
3. - Thegeneralization
error as function of the numberof examples
fora/a 0
= 0.05, 0.2, 0.4, 0.5and b = 0.4.
respectively,
where Xs= L çl
JI.
For s > 1 0 one canreplace
Xsby
a Gaussian variable of meanv
sbe
1 and
variances( 1 -
b
2).
Performing
the averages as in section 2yields
where
4)
=a r + m.; s (1 - b 2).
Equations (3.12), (3.16)
and(3.17)
must be solvednumeri-cally
in order tocompute
thegeneralization overlap,
and,
consequently,
thegeneralization
error e =( 1- m 1)/2.
In the limit s - oo the noise in the
examples
isaveraged
out, i.e. Xs -sbe
1 and
the standardHopfield
model with theconcepts
replacing
theexamples
inequation (1.6)
is recovered. Thisresult can be
easily
obtainedby rescaling
C’ =sb 2 C
and r’ =r / S2 b 4
whichimplies
thatms =
bm
1 with m
satisfying
the standardHopfield’s equations.
Next we discuss the behaviourof the
generalization
error as a functionof s, b
and a. Infigure
3 we show e as a function of sfor b = 0.4 and several values of
a / a o,
whereao = 0.138
gives
thestorage
capacity
of thestandard
Hopfield
model. As in the a = 0 case, there is a minimal number ofexamples
which
must be
taught
to the network in order it starts togeneralize.
The transition between the SGphase,
where E = 0.5 since thespin-glass
states have nomacroscopic
correlations with theconcepts,
and the Gphase
isalways
discontinuous. The values ofa lao
for which the transition occurs are shown infigure
4 for several values of b. As b --+ 1 or s --> 00,2429
Fig.
4.Fig. 5.
Fig.
4. - The critical values ofa / a 0
below which the network starts togeneralize
as function of thenumber of
examples
for b = 0.2, 0.4, 0.6, 0.8.Fig.
5. - Thegeneralization
error atcriticality
as function of the number ofexamples.
generalization
error at the transition(Be)
is shown infigure
5. As b --+ 1 or s --+ oo,ec tends to
0.0165,
the critical retrieval error for the standardHopfield
model[11].
4. Conclusion.
In this paper we have
s*hown
that asimple
feedback neural networkusing
a local Hebbianlearning
rule is able to learn a set of conceptshaving
accessonly
to a finitesupply
oftypical
data whichexemplify
theconcepts.
The networkacomplishes
thatby creating representations,
i.e. minima of the energy functiongoverning
the retrieval process, whichcapture
theunderlying
statistical structure of theexamples, allowing
the extraction ofmeaningful
information from the datasupply.
For thespecific problem
we have considered in this paper, thesymmetric
mixture statesprovide
for therepresentations
which allow the network to graspthe
concepts.
Since the research onHopfield’s
model has focusedmainly
on the retrievalstates, very little is known about the relation between the statistical structure of the
patterns
presented
to the networkduring
thelearning
stage
and the structure of therepresentations
createdby
the network[13,
17].
This seems to be a crucial issue if one intends to lead thestudy
ofHopfield’s
modelbeyond
itsmemorizing capabilities.
Next we summarize our main results. For finite p we have found that there is a
regime
where the networksimply
memorizes theexamples ignoring
their statistical structure(retrieval phase).
However,
as the number ofexamples
increasespassing
a certain criticalnumber Sc
the behaviour of the networkundergoes
anabrupt
change entering
a newregime
where therepresentations
of theconcepts
are created(generalization phase).
For p = a N(a >0)
the retrievalphase
is confined to a very smallregion
of the(s, b )
plane.
However,
aspin-glass regime,
where the networkignores
both theexamples
and theconcepts,
appears tocompete
with thegeneralization regime.
Theinterplay
between thesetwo
regimes
isqualitatively
similar to the one discussed in the finite p limit with thespin-glass
replacing
the retrievalregime.
It is
interesting
to compare our results with the onespresented
in the literature forthe
present
work is the existence of a critical number ofexamples (sc) beyond
which thenetwork
generalizes
well(Figs.
1 and3).
Whether a similar behaviour occurs in the case offeedforward networks is an unsettled issue. On the one
hand,
simulations ofmultilayer
networks for the
contiguity
problem
and somegeneral
theoreticalarguments
point
out for theexistence of a critical size of the
training
set above which thegeneralization
error(E)
fallsof exponentially
fast[1, 3].
On the otherhand,
ananalytical study
of theperformance
of asingle-layer
perceptron
inclassifying examples according
to theirHamming
distance from a set ofprototypes
indicates that such a critical number does not exist[6].
However,
since thebehaviour of E seems to
depend strongly
on the architecture of the network considered[3]
there may be no
simple
answer for this issue. A similarcontroversy
could very well arise in thecontext of feedback neural networks if we use other
leaming
rules than the one considered inthis paper.
Finally,
we should mention the relevance of our results to theproblem
ofcategorization
in neural networks. Aspointed
out in theIntroduction,
thegeneralization regime
may beinterpreted
as acategorization
of theexamples
into the classes definedby
theconcepts.
We have found thatcategorization
emergesspontaneously
when a critical number ofexamples
ispresented
to the networkduring
thelearning
stage.
This result corroborates theviewpoint
that
categorization
is related with the limitation of an associative memory. Thispoint
wasbeautifully expressed by
Virasoro : « wecategorize
not because we want to but because we cannot do otherwise »[18].
Acknowledgments
It is a
pleasure
to thankRonny
Meir for manyhelpful
discussions.References
[1]
DENKER J., SCHWARTZ D., WITTNER B., SOLLA S., HOWARD R., JACKEL L. and HOPFIELD J. J.,Complex Systems
1(1987)
877.[2]
PATARNELLO S. and CARNEVALI P.,Europhys.
Lett. 4(1987)
503 ;CARNEVALI P. and PATARNELLO
S., Europhys.
Lett. 4(1987)
1199.[3]
TISHBY N., LEVIN E. and SOLLA S. A.,Proceedings of
the International JointConference
on NeuralNetworks
(1989).
[4]
VALLET F.,Europhys.
Lett. 8(1989)
747.[5]
DEL GIUDICE P., FRANZ S. and VIRASORO M. A., J.Phys.
France 50(1989)
121.[6]
HANSEL D. and SOMPOLINSKY H.,Europhys.
Lett. 11(1990)
687.[7]
GYÖRGYI G. and TISHBY N.,Proceedings of
the STATPHYS-17Workshop
on Neural Networksand