HAL Id: jpa-00212375
https://hal.archives-ouvertes.fr/jpa-00212375
Submitted on 1 Jan 1990
Prosopagnosia in high capacity neural networks storing uncorrelated classes

S. Franz (1, 3), D. J. Amit (2, *) and M. A. Virasoro (3)

(1) Istituto di Psicologia del C.N.R., Viale K. Marx 14, Roma, Italy
(2) I.N.F.N., Dipart. di Fisica, Università di Roma « La Sapienza », Roma, Italy
(3) Dipartimento di Fisica dell'Università di Roma « La Sapienza », Piazzale Aldo Moro 4, Roma, Italy
(*) On leave from the Racah Institute of Physics, Hebrew University, Jerusalem.

(Received 7 August 1989, accepted 20 October 1989)
Abstract. — We display a synaptic matrix that can efficiently store, in attractor neural networks (ANN) and perceptrons, patterns organized in uncorrelated classes. We find a storage capacity limit increasing with m, the overlap of a pattern with its class ancestor, and diverging as m → 1. The probability distribution of the local stability parameters is studied, leading to a complete analysis of the performance of a perceptron with this synaptic matrix, and to a qualitative understanding of the behavior of the corresponding ANN. The analysis of the retrieval attractor of the ANN is completed via statistical mechanics. The motivation for the construction of this matrix was to make possible a study of a model for prosopagnosia, i.e. the shift from individual to class recall under lesion, i.e. a random deterioration of the synaptic efficacies. The retrieval properties of the model with the proposed synaptic matrix, affected by random synaptic dilution, are studied in detail. Finally we compare our synaptic matrix with a generic matrix which has all positive stability parameters.

Classification
Physics Abstracts
87.30 — 64.60 — 75.10
1. Introduction.
Following the discovery by Gardner [1] that there exist synaptic matrices which would store sparsely coded (magnetized) patterns in neural networks with a capacity that exceeds that of uncorrelated patterns, and diverges as the correlation between the patterns tends to one [2], several proposals for explicit realization of such matrices have been put forth. The aim has been to find a modified synaptic prescription of the original Hopfield model [3] for which the Gaussian noise, that affects the local stability of the stored patterns, decreases as the correlation between patterns increases [4, 5], while the signal is controlled by an appropriate threshold [6].
Such constructions filled the gap between the Gardner results and previous work on the storage of magnetized patterns [7-9], which predicted a storage capacity decreasing with the magnetization. In an independent development it has been shown [10], using Gardner's method, that the capacity limit of a network storing patterns organized in a finite number of uncorrelated classes is the same as in the case of a single class of sparsely coded (biased) patterns.
On the other hand, the attempt [11] to extend the high storage prescription [4-6] to multiple classes has turned out to be limited to classes of highly correlated ancestors. Here we generalize the work of reference [6] to find a synaptic prescription which can store uncorrelated classes of patterns. Each of the classes is represented by an N-bit ancestor or prototype. These ancestors are uncorrelated. The individuals in a given class are generated by a random branching process in which they are chosen randomly with a fixed level of correlation with the ancestor. This is carried out in section 2, where we reformulate the synaptic matrix of reference [6] so as to make it gauge invariant. The gauge invariant single class dynamics is easily extended to multiple uncorrelated classes, giving a Gaussian distribution of the local stability parameters identical to that of a single class [6]. Consequently, the capacity diverges as in the optimal storage of Gardner when the difference between individuals and their corresponding prototypes tends to zero (limit of full correlation). It is shown that while the natural extension of the synaptic matrix is asymmetrical, a symmetrical version can also be constructed. Section 2 concludes by showing that the proposed synaptic prescription applies to the storage of patterns in an ANN (attractor neural network), as well as to the implementation of an associative rule for a perceptron. In the latter case the network is expected to associate correctly a set of input patterns with a set of output patterns. Both the input and the output patterns are grouped in corresponding classes, i.e. if two inputs belong to the same class, the corresponding outputs belong to the same class. Within the corresponding classes the association is unspecified. The task of the perceptron is to associate with all individuals in an input class the representative of one output class.
In sections 3 and 4 we analyze the probability distribution of the local stability parameters, both for a perceptron and for an ANN. For perceptrons, the probability distribution of the local stability parameters fully characterizes the properties of the output state when a given pattern is presented as input [12]. For ANNs this probability distribution determines the state of the network after one step of parallel iteration, and gives a rough estimate of the attractors of the dynamics [12, 13]. Other methods must be employed to produce a full description of the attractors. In the case of symmetrical synapses, statistical mechanics can be used. This is done in section 6, where the analysis confirms and clarifies the picture that emerges from the analysis of the stability parameters.

It is interesting to analyze the behavior of networks with this synaptic prescription when they undergo deterioration, for instance by random destruction of a fixed fraction of the synapses. For one individual pattern belonging to a particular class it is useful to distinguish those sites that are common with the class prototype (which we call plus sites) and those in which the individual pattern differs from the class (minus sites). It was pointed out [14] that in Gardner's measure almost every network that stores correlated patterns will spontaneously show a level of robustness which is higher in the plus sites than in the minus sites. This is a probabilistic statement and as such it depends on the a priori probability distribution assumed. But a particular synaptic matrix, the pseudo-inverse for example, may lead to a network which is highly untypical with respect to the probability measure. In section 5 we examine the case in which the number of plus sites incorrectly retrieved is of the same order as that of the minus sites.
In section 7 we study the effect of symmetric dilution of the synaptic matrix on the attractors of an ANN. Just as for the perceptron, or for the first step of parallel dynamics of an ANN, the lesion affects the attractor asymmetrically at the plus and the minus sites. For low levels of dilution the individual patterns can still be retrieved, although the probability of making an error at a minus site increases more rapidly than the corresponding probability at a plus site. As the dilution level increases, the attractors of individuals are completely destabilized, while those of class prototypes remain fixed points of the dynamics. Finally, in section 8 we compare our synaptic matrix to a generic one in the Gardner volume. For that matrix the stability parameters at every site are positive when the network is in a stored pattern, i.e. there are no retrieval errors in the absence of fast noise. We compute the overlap of our matrix with such a generic matrix, and find that it does not depend on the parameter which is introduced to reduce the slow noise. We show that in the limit of full correlation our synaptic matrix is the axis of the cone in which the Gardner volume is embedded (1).

(1) When this work was completed we received a brief report of Ioffe, Kuhn and Van Hemmen.
2. The model.

2.1 REFORMULATION OF HIGH STORAGE IN A SINGLE CLASS. — In a model storing a single class of p = αN magnetized patterns {ξ_i^μ}, the patterns are chosen with the probability law

P(ξ_i^μ = ±1) = (1 ± m)/2 ;

m is the magnetization, or mean activity, of each pattern in the limit N → ∞. The model is formulated in terms of a Hamiltonian, with S_i = ±1, whose synaptic matrix contains two parameters, c and U (cf. (2.7), (2.8) below). The parameters U and c are then varied to maximize the storage capacity, which is an increasing function of m; in the limit m → 1 the capacity diverges as the theoretical limit found by Gardner [1]. As usual, the trend of the result can be detected from a signal to noise calculation for the stability of the stored patterns. Details are filled in by a statistical mechanics computation, which is possible due to the symmetry of the J_ij's.
To generalize the model we focus our attention on the signal and the noise parts of the stability parameters at each site of the stored patterns,

Δ_i^μ = ξ_i^μ h_i^μ ,   (2.4)

where h_i^μ is the local field, equation (2.5), which contains the threshold parameters c and U.
The first step in the generalization of the model to uncorrelated classes is to store a class of patterns whose ancestor is different from the state of all 1's, i.e. to store patterns σ_i^μ = ξ_i^μ ζ_i with an arbitrary ancestor ζ_i = ±1. Note that due to the presence of c and U in (2.5), the stability parameters Δ_i^μ in formula (2.4) are not invariant under the gauge transformation (Mattis transformation) [15]

S_i → ζ_i S_i ,  J_ij → ζ_i J_ij ζ_j ,   (2.6)

where the ζ_i = ±1 are arbitrary. Hence it is not possible to pass from the storage of the ferromagnetic patterns {ξ_i^μ} to the storage of a single class of patterns {σ_i^μ} = {ξ_i^μ ζ_i} by gauge transforming the synaptic matrix. This is also the main obstacle to the storage of uncorrelated classes.
The model is therefore reformulated to ensure that Δ_i^μ and its probability distribution be gauge invariant. We can rewrite (2.4) in the form (2.7) if we take (2.8). In other words, U and c are absorbed in the definition of J'_ij: from PSPs or thresholds they become synaptic efficacies. The non-local term that has appeared in (2.8) could not be avoided when storing multiple classes. It will play an essential role in reducing the slow noise due to the storage of an extensive number of patterns, taking advantage of the correlations between the patterns within a class. The distribution of the Δ_i^μ is the same as in the original model, as can be seen by substituting (2.8) in (2.7) and using the fact that (1/N) Σ_i ξ_i^μ = m. If in (2.6) one substitutes J'_ij for J_ij, the stability parameters are now gauge invariant. We can, therefore, pass from one class to another by a mere gauge transformation [15].

Note that while the distribution of the stability parameters in the model of reference [6] and that of the reformulated model are the same, the dynamical properties of the two models can be different. This can be seen by observing that the original model is formulated in terms of a Hamiltonian, which disappears in the reformulation because J'_ij ≠ J'_ji.
2.2 HIGH STORAGE IN MULTIPLE UNCORRELATED CLASSES. — Next we proceed to formulate a synaptic matrix that stores p = αN patterns {ξ_i^{μν}} organized in classes. There are p_c classes, defined by the uncorrelated prototype patterns or ancestors {ξ_i^μ}, chosen with P(ξ_i^μ = ±1) = 1/2. A pattern ν in class μ is defined in relation to its ancestor (ξ_i^μ) by the stochastic branching process [8]

P(ξ_i^{μν}) = ((1 + m)/2) δ(ξ_i^{μν} − ξ_i^μ) + ((1 − m)/2) δ(ξ_i^{μν} + ξ_i^μ) .   (2.10)

The number of classes is p_c = (αN)^γ, with 0 ≤ γ < 1, and the number of patterns per class is (αN)^{1−γ}.
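As an illustration of this generative scheme, the following numpy sketch draws the ancestors and the descendants of each class according to (2.10); the function names and the sizes used in the example are ours, not the paper's.

```python
import numpy as np

def make_classes(n_classes, per_class, N, m, rng):
    # Ancestors xi[mu] are unbiased random +-1 strings.
    ancestors = rng.choice([-1, 1], size=(n_classes, N))
    # Branching process (2.10): each descendant bit copies its ancestor
    # with probability (1+m)/2 and is flipped with probability (1-m)/2.
    flip = rng.random((n_classes, per_class, N)) < (1.0 - m) / 2.0
    descendants = np.where(flip, -ancestors[:, None, :], ancestors[:, None, :])
    return ancestors, descendants

rng = np.random.default_rng(0)
anc, des = make_classes(n_classes=5, per_class=20, N=2000, m=0.8, rng=rng)
# The mean overlap of a descendant with its ancestor should be close to m:
print((des * anc[:, None, :]).mean())   # ~ 0.8
```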
To store the {ξ^{μν}}'s we use the matrix (2.11). In the sums, the indices μ and λ each take the values 1, ..., p_c; ν takes the values 1, ..., p/p_c; and k takes the values 1, ..., N. The first two terms on the r.h.s. of (2.11) are just those that appear in references [8, 9]. To understand the effect of the third term, let us note that if the number of classes p_c ≪ N, the matrix

P_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ

acts as the projector on the space spanned by the {ξ^μ}. The third term in (2.11) can, therefore, be written as a term which subtracts from the J_ij the projection of the vectors {(ξ^{μν} − m ξ^μ)} on the space spanned by the class prototypes. As such, as we will see, it is very effective in subtracting from the stability parameters of the individual patterns that part of the noise that comes from this projection.
Admittedly, this term gives rise to a non-locality of the synaptic matrix. Yet this may not be such a serious drawback, since it may be possible to generate it by a local iterative procedure. It should be pointed out that a local algorithm, such as the « perceptron learning algorithm », gives rise to matrices which cannot be expressed as local expressions in terms of the patterns.
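Since equation (2.11) itself is not reproduced above, the following sketch implements only the verbal description just given: a Hebbian term on the prototypes, a Hebbian term on the deviations ξ^{μν} − mξ^μ, and a one-sided subtraction of the projection of those deviations on the prototype subspace; the overall normalization and the U, c terms are left out, so this is an assumption-laden illustration rather than the paper's exact matrix. The one-sided subtraction is what makes the result asymmetric, as noted next; a symmetrized variant can be obtained, e.g., as (J + J.T)/2.

```python
import numpy as np

def synaptic_matrix(ancestors, descendants, m):
    """Structure described in the text (NOT the exact (2.11)):
    prototype Hebbian term plus deviation Hebbian term, with the
    deviations' component along the prototype subspace removed
    on one side."""
    n_classes, per_class, N = descendants.shape
    dev = (descendants - m * ancestors[:, None, :]).reshape(-1, N)
    P = ancestors.T @ ancestors / N       # ~ projector when p_c << N
    dev_perp = dev - dev @ P              # subtract class-space projection
    return (ancestors.T @ ancestors + dev_perp.T @ dev) / N
```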
The matrix (2.11) is not symmetric. It is, however, possible to symmetrize it without changing in a relevant way the distribution of the stability parameters. A symmetrical version of (2.11) is given in (2.12). Now the model is again Hamiltonian, with

H = − Σ_{i<j} J_ij S_i S_j ,

and it can be investigated by statistical mechanics.
2.3 THE HETERO-ASSOCIATION PERCEPTRON VIEW. — Before proceeding to show that this synaptic prescription actually stores the extensive set of ξ^{μν}'s, let us point out that (2.11) can be read as the coupling matrix of a perceptron with N input units and N output units. It is natural to extend the prescription (2.11) to the case in which the output patterns are not necessarily the same as the input patterns. We associate a set of input patterns {σ_j^{μν}}, j = 1, ..., N, organized in uncorrelated classes, to a set of output patterns {ξ_i^{μν}}, i = 1, ..., N', belonging to corresponding classes. N and N' are, respectively, the numbers of input and output units. The generation of input and output patterns is done by independent branching processes similar to (2.10), from input and output ancestors {σ_i^μ} and {ξ_i^μ}, respectively. For a perceptron, in contrast to an ANN, the overlap of the input patterns with their input prototypes, m_in, and the corresponding overlap of the output patterns with the output prototypes, m_out, are independent variables [10].
The association is performed through the usual noiseless response function

S_i = sign( Σ_j J_ij σ_j ) .   (2.13)

In the case of auto-association N = N', m_in = m_out, and σ_i^{μν} = ξ_i^{μν} for all i, μ, ν. Equation (2.13) can be viewed as the first step of the parallel dynamics of an ANN.
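In code, the response (2.13), used as one step of zero temperature parallel dynamics, looks as follows; this is a sketch, and J, anc, des refer to the illustrative helpers introduced above, not to objects defined in the paper.

```python
import numpy as np

def one_step(J, sigma):
    # Noiseless response (2.13): S_i = sign(sum_j J_ij sigma_j).
    return np.sign(J @ sigma)

# Error fractions at plus and minus sites after one step, for the
# first descendant of the first class:
# xi = des[0, 0]; S = one_step(J, xi)
# plus = xi == anc[0]
# E_plus  = (S[plus] != xi[plus]).mean()
# E_minus = (S[~plus] != xi[~plus]).mean()
```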
The extension of (2.11) to the hetero-association perceptron can be written as (2.14). The overall factors in equations (2.11), (2.12) and (2.14) have been chosen so as to fix the normalization of the J_ij; this will turn out to be useful in section 8. The prescriptions (2.11), (2.12) and (2.14) can be generalized to a full ultrametric hierarchy of patterns. The presentation of this more involved case will be given elsewhere.

3. The distribution of the local stability parameters.
In this section we analyze separately the probability distribution of the stability parameters when the network is in a state which is an individual pattern or in a state which is an ancestor, with the synaptic matrix (2.14). The perceptron will have N input units and N' output units, and will store p = αN patterns equally divided among p_c = (αN)^γ classes (γ < 1). The corresponding analysis for the first step of parallel dynamics in an ANN is recovered by setting m_in = m_out = m in all formulae. We do not analyze in detail the case of the symmetric matrix (2.12), which is very similar, except to indicate when formulae undergo essential modifications.
As we stressed in the introduction, it is useful to distinguish in a pattern {ξ^{μν}} the sites at which it agrees with its ancestor (ε = +1, the plus sites) and those at which it differs from it (ε = −1, the minus sites). The probability distribution of ε is (from (2.10))

P(ε) = ((1 + m_out)/2) δ(ε − 1) + ((1 − m_out)/2) δ(ε + 1) ,   (3.1)

i.e. the fraction of plus sites is (1 + m_out)/2 of the total.

Let us consider the stability parameter Δ_ε of one of the stored patterns (e.g. pattern (ξ^{1,1})) at an ε site. Using (2.14) and (1/N) Σ_j σ_j^1 σ_j^{1,1} = m_in, we obtain (3.2).
The first term in (3.2) is the expected value of Δ_ε. The second term is a Gaussian noise with zero mean and variance (1 − 2cm_in + c²), which has a minimum equal to 1 − m_in² for c = m_in [6]. The third and fourth terms in (3.2) sum the terms in J_ij that do not involve {σ^1, ξ^{1,1}}. They do not contribute at all: they have zero mean and do not fluctuate in the limit N → ∞. This can be seen, for example, from the expectation (3.3), since p_c = (αN)^γ.
In the case of the symmetrical prescription (2.12) for an ANN (i.e. m_in = m_out = m, σ^{μν} = ξ^{μν}) an extra term appears in Δ_ε, equation (3.4). In the limit N → ∞ this term becomes a constant, namely it is effectively a shift of U by a quantity −α. Such a shift is irrelevant, since U is a parameter to be optimized. It should be stressed that even if the asymmetrical and the symmetrical prescriptions (2.11) and (2.12) lead to the same distribution of the stability parameters, the dynamics of the two models may nevertheless differ, as noted in section 2.1.
The ratio between the expected value of Δ_ε (signal) and its standard deviation (noise) is given by (3.5). This shows that in the limit m_out → 1, α can diverge roughly as 1/(1 − m_out), and that unless U = 1 the plus and the minus sites have different degrees of stability. We will see in the next section, studying the tail of the distribution of the Δ_ε [16], that the maximization of the storage capacity is obtained for U > 1. This implies that the plus sites will be more stable than the minus sites. Further insight can be gained by studying the tail of the probability distribution [14],
as we will see in the next section.

The stability parameter at a site i of a prototype (ancestor) pattern is given by (3.6). As in the case of Δ_ε, the third and the fourth terms in Δ_c have zero mean and do not fluctuate in the limit N → ∞. The second term is a Gaussian variable of zero mean, which vanishes if c = m_in. The choice c = m_in is optimal because it minimizes the noise both in Δ_ε and in Δ_c [6, 11]. In particular, Δ_c does not fluctuate at all.
In fact, Δ_c plays the role of the parameter M introduced by Gardner [1]. We will see below that optimizing U for α = α_G, where α_G is the Gardner bound on the storage capacity [1], implies Δ_c = M for m_out → 1. With the synaptic matrix (2.14), the r.h.s. of (3.6) acquires an extra term (where we have set m_in = m_out = m, as is required for an ANN); this corresponds to shifting U by −α and does not have any effect when U is optimized.
The above analysis does not hold in the limit case m_in = 0, i.e. when all input patterns are independent and there are no classes. It also does not hold when m_in = 1, i.e. when all the patterns in a class coincide with their ancestor.
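The signal-plus-Gaussian-noise picture of this section can be checked numerically; below is a minimal sketch, again using the illustrative helpers of section 2 (all names are ours), that separates the local stabilities at plus and minus sites.

```python
import numpy as np

def stabilities(J, pattern):
    # Local stability parameters Delta_i = xi_i * h_i, h_i = sum_j J_ij xi_j.
    return pattern * (J @ pattern)

# delta = stabilities(J, des[0, 0])
# plus = des[0, 0] == anc[0]
# Plus sites should show a larger mean (more stable) than minus sites,
# with comparable standard deviations (the noise is the same):
# print(delta[plus].mean(), delta[~plus].mean())
# print(delta[plus].std(),  delta[~plus].std())
```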
4. Performance.

From the analysis of the above section, the probability distribution of Δ_ε is given by (4.1), where we have chosen c = m_in as in references [6, 11].
To study the performance of the network we introduce the probability E_ε of an error at an ε site, i.e. a site of a descendant with relative sign ε to the same site in its ancestor. From (4.1) one obtains equation (4.2). Note that E_ε is independent of m_in (except for m_in = 0 and m_in = 1, where (4.2) does not hold). Consequently, the discussion in this section can be carried out for fixed (arbitrary) m_in.
E_ε represents the number of errors in the ε sites divided by the number of ε sites. The contribution of these sites to the total average error is given by (4.3), and the total average error E is

E = ((1 + m_out)/2) E_+ + ((1 − m_out)/2) E_− .   (4.4)

Clearly, as E decreases the network retrieves the stored information more effectively. The performance of the network is optimized if U is chosen to minimize E. Differentiating (4.2) with respect to U one finds the optimal value, equation (4.5).
For this value of U, the error in the plus sites and the error in the minus sites are of the same order of magnitude; in fact, (4.6) holds. Since U_opt ≥ 1, it can be seen from equation (4.2) that E_+ < E_−, which is a consequence of the fact that the expected value of the stability parameter is greater at the plus sites than at the minus sites, the noise at both types of sites being equal.
In figures 1 and 2 we plot E_+, E_− and E vs. m_out for α = β α_G, with β = 0.9 and β = 0.1 respectively. E_− is equal to E for m_out = 0 and tends to zero for m_out → 1. It has a maximum that shifts to the right for increasing values of β. Note that for β = 0.9 the value at the maximum is very high, and it lies very close to m_out = 1.

Fig. 1. — E_+, E_− and E as functions of m_out for α = 0.9 α_G. E_+ and E are decreasing functions of m_out. E_− has a maximum which can lie very close to the value m_out = 1.

Fig. 2. — E_+, E_− and E as functions of m_out for α = 0.1 α_G. The maximum of E_− is lower than in figure 1 and lies at a lower value of m_out. In this case, the range of m_out for which the model can retrieve individuals is wider.
Clearly, a good discrimination of an individual pattern from its class prototype is obtained if, in addition to E ≪ 1/2, also E_− ≪ 1/2. It is, therefore, necessary that the argument of the error function in E_− be positive, which is satisfied if U < 2. When m_out → 1, it is possible to determine the asymptotic behavior of E_ε, equation (4.2), and of the storage capacity.
In this limit, as all the output patterns tend to their output prototypes, the task of the network reduces to the classification of the input patterns, and it is possible to reduce the error to zero, since the number of classes is not extensive. To see this, note that since E is given by (4.4), to have E → 0 it is sufficient that E_+ → 0, regardless of the value of E_−. E_+ will vanish when (4.7) is satisfied: to each input pattern in a given class the network then associates the prototype of the corresponding output class. This is an easy task [10], and can be performed for an arbitrarily large number of patterns.
But it is more relevant to require that even in the limit m_out → 1 the individuals be distinguished from the class prototypes, i.e. to require that E_− → 0. Then, using (4.2), we must have condition (4.9). This is particularly relevant in an ANN, where E_− → 0 guarantees that the p stored patterns ξ^{μν} be fixed points of the dynamics. Using the value (4.5) for U, and writing α as

α = β α_G ,   (4.10)

we find that condition (4.9) is fulfilled for any β in the interval 0 < β < 1. Substituting (4.10) in (4.5) we find (4.11).
It is interesting to compare Δ_c, given by (3.6), with Gardner's parameter M = ξ_i^μ Σ_j Ĵ_ij ξ_j^μ, where Ĵ_ij is a matrix in the Gardner volume. One obtains, from (4.11) and the saddle point equation in [1], equation (4.12). This shows that, in the limit m_out → 1, in which α approaches the Gardner bound, the stability parameters of the class representatives approach the value taken by the optimal matrices that store the given patterns. In section 8 we will see that the distance between the matrix (2.14) and the Gardner volume indeed vanishes in this limit.
The asymptotic form of E_ε, equation (4.2), is given, as m_out → 1, by (4.13), which vanishes with a β-dependent power as m_out → 1, apart from logarithmic corrections. For high values of β (β ≈ 1) the decay of E_− to zero is very slow, and formula (4.13) applies only in a very narrow range below m_out = 1, because the power in the exponent becomes very small. This can be seen in figure 1, for β = 0.9, where E_− has a maximum at m_out = 0.98 and only then decreases to zero. When β ≥ 1, equation (4.13) is still valid for E_+, while E_− = 1/2 for β = 1 and E_− = 1 for β > 1. We see that for β = 1 (α = α_G and Δ_c = M) there is a transition in which the network loses its ability to recall individuals,
but can still recall classes.

5. Noisy patterns and lesions.

Next we study the performance of the network when the input is a pattern with random errors, or when the synaptic matrix is lesioned by a random dilution of synapses. If the input pattern {σ_j} has an overlap a with one of the stored patterns, e.g. {σ_j^{1,1}}, i.e.

(1/N) Σ_j σ_j^{1,1} σ_j = a ,

then the same arguments that led to formula (4.2) give (5.1), with (5.2).
The error E_ε is defined with respect to the desired output ξ^{1,1}. The resulting overlap of the output with ξ^{1,1} is given by (5.3) [12]. The analogous equation was derived in references [12, 13] to determine the « basins of attraction » of the stored patterns. Note that for a in the interval (0, 1) the output overlap is a decreasing function of m_in. This means that for greater m_in the basins of attraction of the stored patterns are smaller.
Expressions (5.1)-(5.3) apply also to the case of random dilution of synapses, with a² replaced by p, the probability of survival of a synapse. In other words, if the synapses are diluted as in (5.4), then (5.1) holds with the substitution (5.5). The presentation of a noisy pattern, as well as the dilution of the synapses, is therefore equivalent to an increase of α by a factor 1/b (keeping U fixed).
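A lesion of this kind is easy to emulate numerically; the sketch below (names and setup are ours) applies random dilution with survival probability p, with the mask symmetrized in anticipation of the symmetric dilution of section 7.

```python
import numpy as np

def dilute(J, p, rng):
    # Each synapse survives with probability p; the survival mask is
    # symmetrized so that a symmetric J stays symmetric.
    mask = rng.random(J.shape) < p
    mask = np.triu(mask) | np.triu(mask, 1).T
    return J * mask

# Re-measuring E_plus and E_minus with one_step() after dilute() should
# show E_minus growing faster than E_plus as p decreases, until only
# the class prototypes remain stable.
```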
The derivative of E_ε with respect to b at b = 1 gives the increment of the error due to an infinitesimal lesion, equation (5.6). As long as U < 2, where it should be to allow recall of individual patterns, the error increment in the minus sites is greater than that in the plus sites. As was mentioned in the introduction, this difference in the increments occurs for almost all synaptic matrices in the volume which optimizes the storage of the patterns [14].
There, it is a consequence of the fact that the expected value of the stability parameters at the plus sites is greater than at the minus sites. In other words, more information is lost at those sites that code the difference of an individual with respect to its class (i.e. sites at which a pattern differs from its class prototype) than at those sites that confirm the class. This effect can be interpreted as prosopagnosia once a quantitative criterion is given to establish the maximum amount of error in the minus sites for which one can distinguish individuals, and the maximum amount of error in the plus sites for which one can have class recall. It is clear that a very small amount of dilution has a marginal effect on the performance of the network, while a large one destroys all the information stored (agnosia). Somewhere in between lies the region of prosopagnosia, in which the classes are recalled, but it is impossible to distinguish the individuals inside a class.
On the other hand, if the input is a pattern which has an overlap a with one of the ancestors, and is otherwise uncorrelated with any of the individuals, then the error fraction of the output with respect to the corresponding output prototype is given by (5.7). Consequently, for any set of parameters (α, m_in, m_out, a) we can make E_c arbitrarily small by taking U sufficiently large. Of course, if U is optimized to have the minimum E, the classes are efficiently retrieved only for α(1 − m²) ≪ 1.
6. Statistical mechanics of the symmetrical model.

We have seen that, up to a redefinition of U, the distribution of the stability parameters does not change whether the symmetrical or the asymmetrical version of the model is chosen. The analysis of [12, 13] then gives only a rough picture of what happens in an ANN. For the symmetrical matrix (2.12) the model is governed by the Hamiltonian H = − Σ_{i<j} J_ij S_i S_j. A complete analysis of the retrieval states of the dynamics can be obtained by studying the statistical mechanics of the system [17].
In the following we restrict the discussion to the cases in which the overlap with a single individual pattern (e.g. (ξ^{1,1})) and/or the overlap with a single class prototype ((ξ^1)) is condensed. The free energy density of the system is found by a mean field theory, using the replica method to average over the quenched patterns [17]. If replica symmetry holds, then the free energy is given by (6.1), where β is the inverse temperature and the double angular brackets imply an average over the distribution of ε.
The meaning of the parameters in (6.1) is the following: m̃ is the mean temporal overlap with the retrieved class prototype, equation (6.2); M is the normalized mean temporal overlap with ξ^{1,1} − m ξ^1, equation (6.3); r is the normalized square of the uncondensed overlaps, equation (6.4); q is the Edwards-Anderson [18] order parameter, equation (6.5); and C is the susceptibility, equation (6.6). The angular brackets stand for thermal, or temporal, averages, and Dy is a normalized Gaussian measure for the slow noise. The parameters (6.2)-(6.6) satisfy a set of saddle point equations.
In the zero temperature limit these reduce to equations (6.7)-(6.11). From the definitions of m̃ and M, equations (6.2) and (6.3), and from their saddle point values (6.7) and (6.8), respectively, the probability E_ε of making an error at an ε site is given by (6.12). When m → 1 below α_c, one has M → 1, C → 0, m̃ → m, and equation (6.12) reduces to equation (4.2) for the first step, as expected.
The numerical solutions of the saddle point equations can be used to find the storage capacity and the corresponding values of the retrieval quality for classes and for individuals, m̃ and M, respectively. In figure 3 we plot α_c vs. m; α_c is an increasing function of m and diverges for m → 1, as expected. In figure 4, M and m̃ are plotted. The shape of M as a function of m can be understood by observing that E_− has a maximum for m > 0, as can be seen from figure 5. The maximum of E_− in figure 5 has the same origin as those of figures 1 and 2: it is due to the optimization of U. It is convenient, in order to maximize the storage capacity, to have relatively high values of E_−, and correspondingly low values of M. It must be noted also that, unlike the first step case, E also has a maximum as a function of m. Note the qualitative accord of figure 5 with figures 1 and 2.

Fig. 3. — The storage capacity α_c vs. m.

Fig. 4. — M and m̃ vs. m for α = α_c. Note that m̃ grows almost linearly and for all values of m one has m̃ ≈ m. M has a minimum at m = 0.78, as a consequence of the optimization of U.

Fig. 5. — E_+, E_− and E vs. m for α = α_c. The maximum in E_− corresponds to the minimum in M (figure 4).
Of course, the states in which a class, but no individual, is retrieved (m̃ ≠ 0, M = 0) are also relevant fixed points. In this case, for any value of α and m, one finds m̃ = 1, confirming that the classes are stored with no errors. This can be understood since we have constructed the synaptic matrix so as to have no noise on the class prototypes.
7. Symmetrical dilution of the synaptic matrix.

We now proceed to study the effect of a lesion of the synaptic matrix on the attractors. In order to keep the system Hamiltonian, we consider a symmetric dilution of the synaptic matrix according to (5.4), (5.5). It has been proven that this kind of dilution is equivalent to the introduction in the Hamiltonian of an SK spin-glass interaction among the neurons [19], with couplings l_ij that are independent Gaussian variables with zero mean and variance (α/N)(1 − p)/p. In the replica symmetric theory, this effective interaction adds to the free energy a term of the form (7.1) [20]. The effect of this term on the saddle point equations is to modify equation (6.10).
A first effect of the dilution on the attractors is analogous to the effect on the perceptron: the noise induced by the lesion affects asymmetrically the plus sites and the minus sites of the individual attractors. Repeating the argument of section 5, one finds that the increment in E_− is greater than the corresponding increment in E_+. A more important effect regards the disappearance of the individual attractors. The capacity limit for the recall of individuals is lowered by the new term in r, as can be realized by observing that α_c is a decreasing function of the noise r. Furthermore, due to the dependence of the additional noise (7.1) on α, another critical value of α appears, to be denoted by α_cl, at which the class prototypes cease to be attractors. This is a point of ferromagnet-spin glass phase transition.
for a smallnoise
level,
a CI >
a c. There is a wholeregion
a c - a aCI in which the network is in a stateof
prosopagnosia.
The individual information iscompletely
lost,
while the information about the classes is still stored in the network.Only
for veryhigh dammage,
the network loosescompletely
allinformation,
falling
into a state ofagnosia.
8. Comparison with optimal storage.

In section 4 we found that in the limit m_out → 1 the storage capacity of a network with the matrix (2.14) approaches the Gardner bound and that, in addition, the error goes to zero. It is natural, therefore, to compare the matrix J_ij of (2.14) with a generic matrix that stores the given patterns with no errors, i.e. a matrix within the volume of Ĵ's that satisfy (8.1) [1]. Specifically, we compute the overlap Q, equation (8.2), averaged over the Gardner volume. Q can also be interpreted as the cosine of the angle between the matrix (2.14) and a generic matrix that satisfies (8.1).
The probability distribution corresponding to the constraints (8.1) is given by (8.3). The quantity Q is self-averaging with respect to the measure (8.3) and does not depend on the particular sample of stored patterns.
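For two concrete matrices, a finite-size stand-in for Q is simply the cosine of the angle between their synaptic vectors, averaged over output units; the sketch below is an illustration under that assumption, not the replica computation of (8.2)-(8.3).

```python
import numpy as np

def row_cosine(J, J_hat):
    # Cosine of the angle between the rows (synaptic vectors) of two
    # matrices, averaged over output units.
    num = (J * J_hat).sum(axis=1)
    den = np.linalg.norm(J, axis=1) * np.linalg.norm(J_hat, axis=1)
    return (num / den).mean()
```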
Substituting (2.14) in (8.3) we obtain (8.4). Since M = ξ_i^μ Σ_j Ĵ_ij σ_j^μ is a quantity that does not fluctuate, we can write the second and the third terms in (8.4) as (8.5) and (8.6), respectively. These two terms are of order p_c/N and do not contribute to Q when N → ∞. The terms proportional to U and c in the J_ij's, which are essential for the storage, give a contribution which is of order 1/√N. Developing the leading term of (8.4),
we obtain (8.7). Hence, in order to compute Q, all that is needed is the probability distribution of the stability parameters at the plus and the minus sites. In fact, if we denote this function at the ε sites by F_ε(Δ*), then (8.8) holds, where the angular brackets stand for the average over the distribution (3.1) of ε; i.e., making the average over ε explicit, we obtain (8.9).
The distribution F_ε(Δ*) was computed in reference [14] as a function of the order parameter M and of the parameter q = (1/N) Σ_j Ĵ_ij^a Ĵ_ij^b, where the distinct matrices Ĵ^a and Ĵ^b satisfy equation (8.1).
It is given by (8.10), where, for a given α ≤ α_c, q and M satisfy the Gardner saddle point equations [1]. One finds (8.11)-(8.13); Q is, consequently, independent of m_in because, as was shown in [10], M does not depend on m_in. For q = 1, i.e. α = α_c, (8.13) reduces to (8.14), which tends to 1 in the limit m_out → 1. From (8.14) one can see that for all m_out ≠ 1 one has Q < 1, i.e. the angle formed by the matrix (2.14) and the Gardner volume is different from zero.

In figure 6, Q is plotted for fixed values of q. Q is an increasing function of m_out, and one sees that as m_out increases the distance of the matrix (2.14) from the Gardner volume decreases. This can be intuitively understood by observing that the retrieval properties of the matrix (2.14) become more and more similar to those of the optimal matrices as m_out approaches 1.

Fig. 6. — Q vs. m_out for q = 1 and q = 0.8. For fixed q, Q is an increasing function of m_out, showing that the distance of the matrix (2.14) from the Gardner volume decreases with m_out.

Actually, for a generic q, as m_out → 1, (8.13) reduces to (8.15). This implies that in this limit our synaptic matrix becomes the axis of the cone in which the Gardner volume is embedded.

Conclusion.
In this paper we have considered a synaptic matrix that stores efficiently uncorrelated classes of patterns. The storage capacity limit behaves qualitatively as in the case of optimal storage: it is an increasing function of the correlation of the individuals with their class prototypes, and diverges in the limit of full correlation. We have obtained this result by modifying the synaptic matrix of references [8, 9].

In the framework of this model, the learning of a new pattern, after O(N) have already been memorized, can be imagined as occurring in two steps. The new pattern is compared with the memorized classes, and the difference with the representative of the class with which it overlaps is stored in a Hebbian-like outer product. In addition, the projection of this difference on the space spanned by the class prototypes is subtracted. This reduces the Gaussian noise on the stability parameters of the stored patterns. Then, together with the individual patterns, the class prototypes are stored so as to have greater stability than the individual patterns. This is necessary to minimize the number of errors when an individual pattern is retrieved. It then follows that the plus sites are more stable than the minus sites, in accord with what happens in the Gardner case.

Prosopagnosia appears at this point. It is modeled by the fact that a lesion in the network structure destroys first the information about individuals and then the information about classes. Only at a much higher level of disruption is the information about classes destroyed, mimicking agnosia.
We have studied the matrix both in the context of ANNs and perceptrons, and have argued that, as far as memory errors are concerned, the two become equivalent when one identifies the errors in the perceptron output with those present in the configuration of the attractor. The retrieval of information in a perceptron is fully characterised by the probability distribution of the local stability parameters of individuals and class prototypes. For ANNs we have considered a symmetric matrix and an asymmetric one. At the level of the probability distribution of the stability parameters, the two matrices have the same properties. However, there would normally be dynamical differences. The symmetric matrix has been studied in detail by statistical mechanics, which confirms the picture obtained from the analysis of the stability parameters.

We did not simulate extensively the dynamics of the ANN. We performed simulations of the zero temperature dynamics of neural networks of up to 100 neurons with these synaptic matrices. Both the symmetric and the asymmetric model have a storage capacity consistent with the results of section 6. We observed that, as expected, beyond the limit of storage capacity, an initial state highly correlated with one individual pattern evolves to the corresponding class prototype.
Acknowledgments.

S.F. gratefully acknowledges a scholarship of Selenia S.P.A. D.J.A. and S.F. are grateful for the hospitality extended to them by Imperial College London, where the work was completed.
References
[1] GARDNER E., J. Phys. A 21 (1988) 257; Europhys. Lett. 4 (1987) 481.
[2] WILLSHAW D. J., BUNEMAN O. P. and LONGUET-HIGGINS H. C., Nature 222 (1969) 960.
[3] HOPFIELD J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[5] TSODYKS M. V. and FEIGELMAN M. V., Europhys. Lett. 6 (1988) 101.
[6] PEREZ VICENTE C. J. and AMIT D. J., J. Phys. A: Math. Gen. 22 (1989) 559.
[7] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Phys. Rev. A 35 (1987) 2293.
[8] PARGA N. and VIRASORO M. A., J. Phys. France 47 (1986) 1857.
[9] FEIGELMAN M. V. and IOFFE L., Chernogolovka preprint (1988).
[10] DEL GIUDICE P., FRANZ S. and VIRASORO M. A., J. Phys. France 50 (1989) 121.
[11] PEREZ VICENTE C. J., preprint (1988).
[12] KRAUTH W., MEZARD M. and NADAL J. P., Complex Syst. 2 (1988) 387.
[13] KEPLER T. B. and ABBOTT L. F., J. Phys. France 49 (1988) 1657.
[14] VIRASORO M. A., Europhys. Lett. 7 (1988) 299.
[15] MATTIS D. C., Phys. Lett. 56A (1976) 421.
[16] WEISBUCH G. and FOGELMAN-SOULIÉ F., J. Phys. Lett. France 46 (1985) L-623; MCELIECE R. J., POSNER E. C., RODEMICH E. R. and VENKATESH S. S., Caltech preprint (1986).
[17] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Phys. Rev. Lett. 55 (1985) 1530; Ann. Phys. 173 (1987) 30.