HAL Id: jpa-00246681
https://hal.archives-ouvertes.fr/jpa-00246681
Submitted on 1 Jan 1992
Categorization and generalization in the Hopfield model
Miguel Branchtein, Jeferson Arenzon
To cite this version:
Miguel Branchtein, Jeferson Arenzon. Categorization and generalization in the Hopfield model. Journal de Physique I, EDP Sciences, 1992, 2 (11), pp. 2019-2024. 10.1051/jp1:1992262. jpa-00246681
Classification: Physics Abstracts 87.10, 64.60C
Short Communication
Categorization and generalization in the Hopfield model
Miguel C. Branchtein and Jeferson J. Arenzon

Instituto de Física, Universidade Federal do Rio Grande do Sul, C.P. 15051, 91501-970, Porto Alegre, RS, Brazil

(Received 1 September 1992, accepted in final form 8 September 1992)
Abstract. — The capability of the Hopfield model to generalize concepts from examples is studied through numerical simulation. Noisy examples are taught to the network and we test whether it grasps the underlying concepts. Several parameters are varied in the simulations: the number of concepts, the number of examples and the initial noise in the examples. We obtain the first-order transition from the retrieval phase to the categorization phase predicted by mean-field theory.
Nowadays the study of neural networks focuses mainly on the ability to recall stored patterns that are attractors (stable fixed points) of the network dynamics. However, analogously to real systems, other properties deserve to be studied, among them the capability of generalizing concepts from examples. When a given set of examples is learned, the network must be able to recognize the essential information embedded in each example, learning the underlying concept exemplified by them. This categorization process improves both the way memories are stored in real systems and the speed with which information is handled. As stressed by Virasoro [1], this is not a choice but rather a necessity.
To define the generalization problem, let us denote each of the P concepts by an N-dimensional vector ξ^μ with components ±1, and the s examples of each concept by ξ^{μν}, where μ = 1, ..., P and ν = 1, ..., s. Each example is created from one concept by introducing a certain amount of noise, measured by r. The noise parameter r is defined as the probability that the i-th neuron in an example differs from the corresponding neuron in the concept, i.e. P(ξ_i^{μν} = -ξ_i^μ) = r = (1 - b)/2, where b is the overlap between a concept and its examples. The concepts are randomly chosen uncorrelated patterns, each neuron state being ±1 with probability 0.5. Remark that only the examples are learned, not the concepts. Symmetric spurious states, which disturb the dynamics when other properties are studied, play an important role in the categorization process: in the generalization phase, the system extracts the common information from the examples of a given concept μ, and the state C^μ (or one near it) that lies in the "center of mass" of all examples,

C_i^\mu = \mathrm{sgn}\left( \sum_{\nu=1}^{s} \xi_i^{\mu\nu} \right),    (1)

becomes a stable fixed point. This state overlaps on average equally with all s examples and is the effectively learned concept. In this context, the desired situation occurs when these states are absent if the stored patterns are uncorrelated, but the s-symmetric central state appears when similar patterns (the examples) are stored [2].

In the Hopfield model, which may be described by the energy function

E = -\frac{N}{2} \sum_{\mu=1}^{P} \sum_{\nu=1}^{s} m_{\mu\nu}^{2},    (2)

where m_{μν} is the overlap with the examples,

m_{\mu\nu} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu\nu} S_i,    (3)

the particular examples are no longer stable when the system enters the so-called generalization phase, if the number of concepts is finite (if it is extensive, they are never stable). As a consequence, the network cannot recognize a given realization of a concept (an example) but just the concept itself. One possible measure of the generalization error is the Hamming distance ε between the original concept (not taught) and the generalized state (effectively learned) after a certain number s of learned examples:

\epsilon = \frac{1 - m_\mu}{2},    (4)

where m_μ is the overlap with the μ-th concept,

m_\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu} S_i.    (5)
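To make these definitions concrete, here is a minimal Python sketch (our illustration only; the function names, sizes and seed are ours, not the authors'): it draws P random concepts, builds s noisy examples per concept with flip probability r = (1 - b)/2, and evaluates the s-symmetric state of equation (1) together with the error of equations (4)-(5).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patterns(P, N, s, b):
    """Concepts are random +-1 vectors; each example neuron is flipped
    with probability r = (1 - b)/2, so the concept-example overlap is b."""
    concepts = rng.choice([-1, 1], size=(P, N))
    flips = rng.random((P, s, N)) < (1.0 - b) / 2.0
    examples = np.where(flips, -concepts[:, None, :], concepts[:, None, :])
    return concepts, examples

def center_of_mass(examples):
    """s-symmetric state of Eq. (1); sign(0) stays 0, exposing the
    undefined neurons that appear when s is even."""
    return np.sign(examples.sum(axis=1))

def gen_error(concept, state):
    """Generalization error of Eqs. (4)-(5): eps = (1 - m_mu)/2."""
    m = concept @ state / concept.size
    return (1.0 - m) / 2.0

concepts, examples = make_patterns(P=5, N=4096, s=7, b=0.6)
C = center_of_mass(examples)
print([round(gen_error(c, Ci), 3) for c, Ci in zip(concepts, C)])
```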
The generalization in the Hopfield model for α ≠ 0 (α ≡ number of concepts / number of neurons) was previously studied through numerical simulation by Miranda [4]: above a critical number of examples s_c, the generalization error decays rapidly, following a power law ε ∼ s^{-γ}, with an exponent γ that decreases linearly with α. Miranda also found that s_c grows exponentially with the noise level and linearly with α. The transition to the generalization regime is continuous, while analytical calculations [5] predicted it to be discontinuous.

The T = 0 simulation was performed for network sizes up to 8192 neurons, for α = 0 and 0.05. The algorithm follows the steps: (i) a set of P concepts is generated; (ii) for each concept, one example (s = 1) is generated with noise b and taught to the network; (iii) the initial state is set to one of the original concepts (or one of the examples) and a neuron is flipped whenever this lowers the energy, until a stable state is reached; the generalization error ε is then measured; (iv) another example is created (s → s + 1) and steps (ii) and (iii) are repeated. Step (iii) is repeated for many initial states and for several different sets of concepts, yielding the averaged generalization error ε. An important difference from reference [4] is that there only the stability of the concepts was tested, while here the initial state can also be set to one of the examples, since the transition found in analytical calculations (for α = 0) refers to the situation in which the individual examples stop being stable. The multispin coding algorithm [7] was used in order to save computer memory and decrease computational time.
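The following self-contained Python sketch of steps (i)-(iv) is our own illustration (not the original multispin code; sizes and seed are arbitrary): the couplings follow the Hebb rule over the taught examples only, and the relaxation flips single neurons whenever that lowers the energy of equation (2).

```python
import numpy as np

rng = np.random.default_rng(1)
P, N, b, s_max = 5, 512, 0.6, 9

# step (i)-(ii): concepts and noisy examples, as in the sketch above
concepts = rng.choice([-1, 1], size=(P, N))
flips = rng.random((P, s_max, N)) < (1 - b) / 2
examples = np.where(flips, -concepts[:, None, :], concepts[:, None, :])

def hebb_couplings(pats):
    """Hebb rule over the taught examples only: J = (1/N) sum xi xi^T."""
    J = pats.T @ pats / N
    np.fill_diagonal(J, 0.0)
    return J

def relax(J, state):
    """Step (iii): single-spin flips while they lower the energy."""
    S = state.copy()
    while True:
        changed = False
        for i in rng.permutation(N):
            if S[i] * (J[i] @ S) < 0:   # flipping neuron i lowers E
                S[i] = -S[i]
                changed = True
        if not changed:
            return S                    # stable fixed point reached

# steps (ii)-(iv): teach one more example at a time, then measure eps
for s in range(1, s_max + 1):
    J = hebb_couplings(examples[:, :s, :].reshape(-1, N))
    S = relax(J, concepts[0])           # initial state = a concept
    eps = (1 - concepts[0] @ S / N) / 2
    print(s, eps)
```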
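Reference [7] describes multispin coding, in which many spins are packed into the bits of a machine word and whole groups are processed with bitwise operations. The fragment below is only a toy illustration of the storage idea (our sketch, not the implementation of [7]): spins are packed eight per byte, and overlaps are recovered from XOR bit counts.

```python
import numpy as np

def pack(spins):
    """Map +-1 spins to bits (+1 -> 1, -1 -> 0), packed 8 per byte."""
    return np.packbits(spins > 0)

def overlap(a_bits, b_bits, N):
    """Overlap m = (agreements - disagreements)/N via XOR + bit count."""
    disagree = np.unpackbits(np.bitwise_xor(a_bits, b_bits))[:N].sum()
    return (N - 2 * disagree) / N

rng = np.random.default_rng(2)
x = rng.choice([-1, 1], size=4096)
y = np.where(rng.random(4096) < 0.2, -x, x)   # copy with 20% flips
print(overlap(pack(x), pack(y), 4096))        # close to 0.6
```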
Fig. 1. — Average generalization error ε versus the number of examples s for α = 0 (P = 5) and b = 0.25 and 0.6. The network sizes are N = 4096 (empty symbols) and N = 8192 (filled symbols). Note that for s < s_c there are two values of ε, depending on whether the initial state is the concept (circles) or the example (boxes); for s > s_c both curves merge. The analytical prediction corresponding to the case in which the initial state is the example is indicated by the full line, while for the concepts it is the dashed line (after [5]).

Figure 1 shows the generalization error ε versus the number of examples s in the limit α → 0 (not studied by Miranda [4]) for two different values of b. This limit is achieved by holding the number of concepts fixed (in this case 5) and increasing the size of the network (up to 8192). The averages were taken over 5 distinct sets of concepts and, for each set, all concepts and up to a maximum of 20 examples were tested. The main new result is that the discontinuous transition predicted by the mean-field theory [5] shows up. The step-like form of the simulation curve ε, clearly seen for small s, demonstrates that an odd number of examples is required to correctly extract the concept, while doubts arise when the number is even: the network cannot decide between two situations. When the number of examples is even, some of the values C_i^μ of equation (1) are zero, the corresponding state of the i-th neuron in the concept is not defined, and the generalization error does not decrease when an even example is learned by the network.
These steps do not appear in the generalization-error curves of previous calculations [5] because the substitution of the binomial (discrete) distributions by a Gaussian (continuous) one masks this effect, smoothing the curves. This approximation may also contribute to the small discrepancy between the predicted critical number of examples [5] and the one found in the simulations (it may also be due to replica-symmetric unstable solutions). The actual shape of the curves, obtained with discrete distributions [6], shows these fine details. Note that if the initial state is set to one of the examples, the simulated curves fit very well the analytical prediction [5] (solid line) for large networks. On the other hand, when one of the concepts is chosen as the initial state of the dynamics, we find that for s < s_c the net stabilizes in a different state, yielding another value for the error ε: both examples and concepts are stable and have distinct basins of attraction. As predicted in reference [5], for s > s_c the generalization error obeys

\epsilon = \frac{1}{2}\,\mathrm{erfc}\left( b \sqrt{\frac{s}{2(1-b^{2})}} \right).    (6)

Here we found that this solution also holds for s < s_c, as can be seen in figure 1 (dashed line), if the initial state of the dynamics is one of the concepts. Moreover, the conclusions of Miranda [4] do not hold in the α = 0 limit: from equation (6), the generalization error for large s no longer follows a power law but decays exponentially.
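The parity steps and the Gaussian smoothing can be reproduced site by site for the s-symmetric state. The sketch below uses our own simplified single-site description of the α → 0 limit, not the replica calculation of [5]: each neuron agrees with its concept in k of the s examples, with k binomial, and ties at even s count half.

```python
from math import comb, erfc, sqrt

def eps_exact(s, b):
    """Exact single-site error of the s-symmetric state, Eq. (1):
    k ~ Binomial(s, (1+b)/2) examples agree at a site; a tie
    (k = s/2) leaves the neuron undefined and contributes 1/2."""
    p = (1 + b) / 2
    pmf = lambda k: comb(s, k) * p**k * (1 - p)**(s - k)
    eps = sum(pmf(k) for k in range((s - 1) // 2 + 1))   # k < s/2
    if s % 2 == 0:
        eps += 0.5 * pmf(s // 2)                          # tie at k = s/2
    return eps

def eps_gauss(s, b):
    """Gaussian (continuous) approximation, Eq. (6)."""
    return 0.5 * erfc(b * sqrt(s / (2 * (1 - b * b))))

for s in range(1, 11):
    print(s, round(eps_exact(s, 0.6), 4), round(eps_gauss(s, 0.6), 4))
```

The exact values repeat at each even s (for instance, ε(1) = ε(2) = 0.2 for b = 0.6), reproducing the steps, while the erfc curve of equation (6) interpolates smoothly between them.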
The model was also studied for α = 0.05 (Fig. 2).
Fig. 2. — Average generalization error ε versus the number s of examples for α = 0.05, b = 0.6 and N = 1024, 2048, 4096 and 8192 (a different symbol for each size). For s < s_c the curves, for both concepts (empty symbols) and examples (filled symbols) as initial states, tend to a flat plateau at ε = 0.5 as N increases. The full line is the mean-field prediction (after [5]).

Differently from the previous case, here the plateau must no longer be at the value of r but at 0.5 [5]: below a certain critical number of examples, the network always evolves towards a spin-glass state that has no macroscopic overlap with the concepts. In this limit the dependence on the size of the network is very strong: the 0.5 plateau is not reached for small networks but, as N increases, the curves slowly approach their asymptotic behavior. Also, there is no dependence on the initial state: the curves for examples and concepts are almost the same. Again the curves match the analytical predictions [5] and, comparing with them (solid line), it may also be argued that the continuous transition found by Miranda was just an effect of the finite size of his simulations, although his other conclusions seem to remain valid (at least for the sizes used here).
Virasoro [1] pointed out the analogy between the lack of ability to recognize examples belonging to a certain class (although not the class itself) in a synaptic-dilution context and a mental disorder known as prosopagnosia: when synapses are cut, the system first loses the capability of discerning the examples and, further on, the concepts (agnosia). The same effect appears in the Hopfield model when too many examples are taught (for a finite number of concepts): the basins of attraction of the examples are ruled out when their number is greater than a critical value. If one has an extensive number of concepts, the examples are never stable and the system is always in a prosopagnosia state. This is not relevant when the task proposed to the system is to classify a given pattern into a certain class or category, and it is conceptually correct to call the phase appearing for s > s_c in the Hopfield model a categorization phase. However, when the purpose is to simulate systems that must preserve the identity of the particular examples, as well as extract the common information between them, the denomination "generalization phase" is somewhat imprecise. Also, when the system stores a given hierarchy of patterns, including classes inside classes several times, it is no longer possible to control with temperature the desired level of the hierarchy [8]. This would only be possible if the system preserved the individual basins of attraction. Thus, the ideal situation consists of a big basin of attraction centered at the concept, with smaller basins related to each example lying inside the big basin, around its center: general ideas are easier to remember than particular examples of such ideas. This situation is achieved by other models, for instance the liS model [3].

Summarizing, we presented results for the Hopfield model with the Hebb learning rule, storing sets of correlated patterns, each set consisting of some particular examples of a given concept. The results for α = 0 and α ≠ 0 and several degrees of correlation match very well the analytical predictions [5]. Nevertheless, it would be interesting to study the size of the basins of attraction of examples and concepts as their number increases [3], as well as the mean convergence time, in order to verify whether or not there is a change of behavior at some value of s; in particular, to test whether the twofold solution found for α = 0 and s < s_c has similar basins of attraction or not. It would also be interesting to investigate systems for which a new symmetric phase (or more than one, depending on the number of levels of the hierarchy) appears when sets of correlated patterns are embedded in the network [2, 3]. The Hopfield model presents such a phase (for all values of b) when examples are learned in pairs [9] and, in this case, the desired retrieval level can be controlled with temperature. As a final remark, a suitable and more general definition of the critical number of examples s_c needed to start generalization should not make reference to the stability of the examples or to the mere existence of the s-symmetric state; an example is the definition given by Miranda [4], which considers a change in the decay rate of the generalization error.
Acknowledgments.
We thank R.M.C. de Almeida, T.J.P. Penna and J.R. Iglesias for fruitful discussions and suggestions and for a careful reading of the manuscript. Special thanks to P.R. Krebs for very stimulating conversations and for providing his results prior to publication. Work partially supported by the Brazilian agencies CNPq, FINEP and FAPERGS. Part of the simulations was carried out on a Cray Y-MP2E of the Supercomputing Center at UFRGS.

References
[1] Virasoro M., Phys. Rep. 184 (1989) 300.
[2] Arenzon J.J., de Almeida R.M.C. and Iglesias J.R., J. Stat. Phys. 69 (1992) 385; Arenzon J.J., de Almeida R.M.C., Iglesias J.R., Penna T.J.P. and de Oliveira P.M.C., J. Phys. I France 2 (1992) 55.
[3] Branchtein M.C., Arenzon J.J., de Almeida R.M.C. and Iglesias J.R., in preparation.
[4] Miranda E.N., J. Phys. I France 1 (1991) 999.
[5] Fontanari J.F., J. Phys. France 51 (1990) 2421.
[6] Krebs P.R., private communication.
[7] Penna T.J.P. and de Oliveira P.M.C., J. Phys. A 22 (1989) L719; de Oliveira P.M.C., Computing Boolean Statistical Models (World Scientific Publishing, 1991).
[8] We thank P.R. Krebs for an interesting discussion on this point.
[9] Tamarit F.A. and Curado E.M.F., J. Stat. Phys. 62.