HAL Id: jpa-00212375
https://hal.archives-ouvertes.fr/jpa-00212375
Submitted on 1 Jan 1990

Prosopagnosia in high capacity neural networks storing uncorrelated classes

S. Franz (1, 3), D. J. Amit (2, *) and M. A. Virasoro (3)

(1) Istituto di Psicologia del C.N.R., Viale K. Marx 14, Roma, Italy
(2) I.N.F.N., Dipart. di Fisica, Università di Roma « La Sapienza », Roma, Italy
(3) Dipartimento di Fisica dell'Università di Roma « La Sapienza », Piazzale Aldo Moro 4, Roma, Italy

(*) On leave from the Racah Institute of Physics, Hebrew University, Jerusalem.

(Received 7 August 1989, accepted 20 October 1989)

Résumé. - We exhibit a synaptic matrix that efficiently stores patterns organized in uncorrelated categories, both in attractor neural networks and in perceptrons. The storage capacity limit increases with the overlap m of a pattern with its ancestral category, and diverges as m tends to 1. The probability distribution of the local stability parameters is studied; it leads to a complete analysis of the performance of a perceptron as a function of its synaptic matrix, and to a qualitative understanding of the behaviour of the corresponding neural network. The analysis of the network attractor is completed with the help of statistical mechanics. The motivation for this construction is to make possible the study of a model of prosopagnosia: the shift from individual recall to class recall under lesions, that is under a random deterioration of the synaptic efficacies. The retrieval properties of the model with the proposed synaptic matrix are studied in detail. Finally, we compare our synaptic matrix with a generic matrix whose stability parameters are all positive.

Abstract. - We display a synaptic matrix that can efficiently store, in attractor neural networks (ANN) and perceptrons, patterns organized in uncorrelated classes. We find a storage capacity limit increasing with m, the overlap of a pattern with its class ancestor, and diverging as m → 1. The probability distribution of the local stability parameters is studied, leading to a complete analysis of the performance of a perceptron with this synaptic matrix, and to a qualitative understanding of the behavior of the corresponding ANN. The analysis of the retrieval attractor of the ANN is completed via statistical mechanics. The motivation for the construction of this matrix was to make possible a study of a model for prosopagnosia, i.e. the shift from individual to class recall, under lesion, i.e. a random deterioration of the synaptic efficacies. The retrieval properties of the model with the proposed synaptic matrix, affected by random synaptic dilution, are studied in detail. Finally we compare our synaptic matrix with a generic matrix which has all positive stability parameters.

Classification
Physics Abstracts 87.30 - 64.60 - 75.10

1. Introduction.

Following the discovery by Gardner [1] that there exist synaptic matrices which would store sparsely coded (magnetized) patterns in neural networks with a capacity that exceeds that of uncorrelated patterns, and diverges as the correlation between the patterns tends to one [2], several proposals for explicit realizations of such matrices have been put forth. The aim has been to find a modified synaptic prescription of the original Hopfield model [3] for which the Gaussian noise, which affects the local stability of the stored patterns, decreases as the correlation between patterns increases [4, 5], while the signal is controlled by an appropriate threshold [6]. Such constructions filled the gap between the Gardner results and previous work on the storage of magnetized patterns [7-9], which predicted a storage capacity decreasing with the magnetization.

In an independent development it has been shown [10], using Gardner's method, that the capacity limit of a network storing patterns organized in a finite number of uncorrelated classes is the same as in the case of a single class of sparsely coded (biased) patterns. On the other hand, the attempt [11] to extend the high storage prescription [4-6] to multiple classes has turned out to be limited to classes with highly correlated ancestors. Here we generalize the work of reference [6] to find a synaptic prescription which can store uncorrelated classes of patterns.

Each of the classes is represented by an N-bit ancestor or prototype. These ancestors are uncorrelated. The individuals in a given class are generated by a random branching process in which they are chosen randomly with a fixed level of correlation with the ancestor. This is carried out in section 2, where we reformulate the synaptic matrix of reference [6] so as to make it gauge invariant. The gauge invariant single class dynamics is easily extended to multiple uncorrelated classes, giving a Gaussian distribution of the local stability parameters identical to that of a single class [6]. Consequently, the capacity diverges as in the optimal storage of Gardner when the difference between individuals and their corresponding prototypes tends to zero (limit of full correlation). It is shown that while the natural extension of the synaptic matrix is asymmetrical, a symmetrical version can also be constructed. Section 2 concludes by showing that the proposed synaptic prescription applies to the storage of patterns in an ANN (attractor neural network), as well as to the implementation of an associative rule for a perceptron. In the latter case the network is expected to associate correctly a set of input patterns to a set of output patterns. Both the input and the output patterns are grouped in corresponding classes, i.e. if two inputs belong to the same class, the corresponding outputs belong to the same class. Inside the corresponding classes the association is unspecified. The task of the perceptron is to associate with all individuals in an input class the representative of one output class.

In sections 3 and 4 we analyze the probability distribution of the local stability parameters both for a perceptron and for an ANN. For perceptrons, the probability distribution of the local stability parameters fully characterizes the properties of the output state when a given pattern is presented as input [12].

It is interesting to analyze the behavior of networks with this synaptic prescription when they undergo deterioration, for instance by random destruction of a fixed fraction of the synapses. For one individual pattern belonging to a particular class it is useful to distinguish those sites that are common with the class prototype (which we call plus sites) and those in which the individual pattern differs from the class (minus sites). It was pointed out [14] that in Gardner's measure almost every network that stores correlated patterns will spontaneously show a level of robustness which is higher in the plus sites than in the minus sites. This is a probabilistic statement and as such it depends on the a priori probability distribution assumed. But a particular synaptic matrix, the pseudo-inverse for example, may lead to a network which is highly untypical with respect to the probability measure. In section 5 we show that when the number of plus sites incorrectly retrieved is of the same order as that of …

For ANN's this probability distribution determines the state of the network after one step of parallel iteration, and gives a rough estimate of the attractors of the dynamics [12, 13]. Other methods must be employed to produce a full description of the attractors. In the case of symmetrical synapses, statistical mechanics can be used. This is done in section 6, where the analysis confirms and clarifies the picture that emerges from the analysis of the stability parameters.

In section 7 we study the effect of symmetric dilution of the synaptic matrix on the attractors of an ANN. Just as in the perceptron, or for the first step of the parallel dynamics of an ANN, the lesion affects the attractor asymmetrically at the plus and the minus sites. For low levels of dilution the individual patterns can still be retrieved, although the probability of making an error at a minus site increases more rapidly than the corresponding probability at a plus site. As the dilution level increases, the attractors of individuals are completely destabilized, while those of class prototypes remain fixed points of the dynamics.

Finally, in section 8 we compare our synaptic matrix to a generic one in the Gardner volume. For that matrix the stability parameters at every site are positive when the network is in a stored pattern, i.e. there are no retrieval errors in the absence of fast noise. We find that this overlap does not depend on the parameter which is introduced to reduce the slow noise. We show that in the limit of full correlation our synaptic matrix is the axis of the cone in which the Gardner volume is embedded (1).

(1) When this work was completed we received a brief report of Ioffe, Kuhn and Van Hemmen, …

2. The model.

2.1 REFORMULATION OF HIGH STORAGE IN A SINGLE CLASS. - In a model storing a single class of p = αN magnetized patterns {a_i^μ}, the patterns are chosen with a probability law such that m is the magnetization, or access activity, of each pattern in the limit N → ∞. The model is formulated in terms of a Hamiltonian, with S_i = ±1. The parameters U and c are then varied to maximize the storage capacity, which is an increasing function of m; in the limit m → 1 the capacity diverges as the theoretical limit found by Gardner [1].

As usual, the trend of the result can be detected from a signal-to-noise calculation for the stability of the stored patterns. Details are filled in by a statistical mechanics computation, which is possible due to the symmetry of the J_ij's.

To generalize the model we focus our attention on the signal and the noise parts of the stability parameters Δ_i^μ at each site of the stored patterns (2.4), where h_i is the local field.

The first step in the generalization of the model to uncorrelated classes is to store a class of patterns whose ancestor is different from the state of all 1's, i.e. to store patterns ξ_i^μ = a_i^μ σ_i with an arbitrary ancestor σ_i = ±1. Note that due to the presence of c and U in (2.5), the stability parameters Δ_i^μ in formula (2.4) are not invariant under the gauge transformation (Mattis transformation) [15], in which the σ_j = ±1 are arbitrary. Hence it is not possible to pass from the storage of the ferromagnetic patterns {a_i^μ} to the storage of a single class of patterns {ξ_i^μ} = {a_i^μ σ_i} by gauge transforming the synaptic matrix. This is also the main obstacle to the storage of uncorrelated classes.

The model is therefore reformulated to ensure that Δ_i^μ and its probability distribution be gauge invariant. We can rewrite (2.4) in the form (2.6) if we take (2.7). In other words, U and c are absorbed in the definition of J′_ij: from PSPs or thresholds they become synaptic efficacies. The non-local term that has appeared in (2.8) could not be avoided when storing multiple classes. It will play an essential role in reducing the slow noise due to the storage of an extensive number of patterns, taking advantage of the correlations between the patterns within a class. The distribution of the Δ_i^μ is the same as in the original model, as can be seen by substituting (2.8) in (2.7) and using the fact that N^{-1} Σ_i a_i^μ = m. If in (2.6) one substitutes J′_ij for J_ij, the stability parameters are now gauge invariant. We can, therefore, pass from one class to another by a mere gauge transformation [15].
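The gauge argument can be checked numerically. The sketch below (Python; an illustration only, since the explicit forms of (2.6)-(2.8) are not reproduced in this extraction) uses a plain Hebbian coupling as a stand-in to show the mechanism the text relies on: if both the patterns and the couplings are Mattis-transformed with the same σ_i = ±1, the stability parameters ξ_i h_i are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 10

# ferromagnetic-class patterns and an arbitrary gauge sigma_i = +/-1
a = rng.choice([-1, 1], size=(p, N))
sigma = rng.choice([-1, 1], size=N)

# plain Hebbian couplings (stand-in for the gauge-covariant part of J'_ij)
J = a.T @ a / N
np.fill_diagonal(J, 0.0)

# gauge (Mattis) transformation: xi_i = a_i * sigma_i, J~_ij = sigma_i sigma_j J_ij
xi = a * sigma
J_gauged = np.outer(sigma, sigma) * J

# stability parameters Delta_i^mu = xi_i^mu * h_i^mu before and after the gauge
delta_original = a * (a @ J.T)
delta_gauged = xi * (xi @ J_gauged.T)

print(np.allclose(delta_original, delta_gauged))   # True: gauge invariant
```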

Note that while the distribution of the stability parameters in the model of reference [6] and that of the reformulated model are the same, the dynamical properties of the two models can be different. This can be seen by observing that the original model is formulated in terms of a Hamiltonian, which disappears in the reformulation because J′_ij ≠ J′_ji.

2.2 HIGH STORAGE IN MULTIPLE UNCORRELATED CLASSES. - Next we proceed to formulate a synaptic matrix that stores p = αN patterns {ξ_i^{μν}} organized in classes. There are p_c classes, defined by the uncorrelated prototype patterns or ancestors {ξ_i^μ}, chosen with the probability law (2.9). A pattern in class μ is defined in relation to its ancestor ξ_i^μ by a stochastic branching process [8], equation (2.10). The number of classes is p_c = (αN)^γ, with 0 ≤ γ < 1, and the number of patterns per class is (αN)^{1-γ}.

To store the {ξ_i^{μν}}'s we use the matrix (2.11). In the sums, the indices μ and λ each take the values 1, ..., p_c; ν takes the values 1, ..., p/p_c; and k takes the values 1, ..., N.

The first two terms on the r.h.s. of (2.11) are just those that appear in references [8, 9]. To understand the effect of the third term, let us note that if the number of classes p_c ≪ N, the matrix P_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ acts as the projector onto the space spanned by the ancestors {ξ_i^μ}. The third term in (2.11) can, therefore, be written as a term which subtracts from the J_ij the projection of the vectors {ξ_j^{μν} − m ξ_j^μ} onto the space spanned by the class prototypes. As such, as we will see, it is very effective in subtracting from the stability parameters of the individual patterns that part of the noise that comes from this projection.
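The role of the projector can be made concrete with a short sketch. Since the explicit expression (2.11) is not reproduced here, the code below only illustrates the mechanism described in the text, with illustrative names: it builds P_ij from the ancestors, forms the centred individual patterns ξ^{μν} − m ξ^μ, and removes their component lying in the prototype subspace before they enter a Hebbian-like outer product.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_classes, per_class, m = 500, 3, 10, 0.8

xi_anc = rng.choice([-1, 1], size=(n_classes, N))                    # ancestors
keep = rng.random((n_classes, per_class, N)) < (1 + m) / 2
xi_ind = np.where(keep, xi_anc[:, None, :], -xi_anc[:, None, :])     # individuals

# projector P_ij = (1/N) sum_mu xi_i^mu xi_j^mu onto the ancestor subspace
P = xi_anc.T @ xi_anc / N

# centred individuals xi^{mu nu} - m xi^mu, with their projection removed
eta = (xi_ind - m * xi_anc[:, None, :]).reshape(-1, N)
eta_perp = eta - eta @ P.T                                            # subtract projection

# Hebbian-like outer-product contribution built from the cleaned vectors
J_individual = eta_perp.T @ eta_perp / N

# the cleaned vectors have (almost) no component along any ancestor
print(np.abs(eta_perp @ xi_anc.T / N).max())   # close to 0
```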

Admittedly, this term gives rise to a non-locality of the synaptic matrix. Yet this may not be such a serious drawback, since it may be possible to generate it by a local iterative procedure. It should be pointed out that a local algorithm, such as the « perceptron learning algorithm », gives rise to matrices which cannot be expressed as a local expression in terms of the patterns.

The matrix (2.11) is not symmetric. It is, however, possible to symmetrize it without changing in a relevant way the distribution of the stability parameters. A symmetrical version of (2.11) is given in (2.12). Now the model is again Hamiltonian, with H = − Σ_{(ij)} J_ij S_i S_j, and it can be investigated by statistical mechanics.

2.3 THE HETERO-ASSOCIATION PERCEPTRON VIEW. - Before proceeding to show that this synaptic prescription actually stores the extensive set of ξ_i^{μν}, let us point out that (2.11) can also be read as an association rule for a perceptron with N input units and N output units. It is natural to extend the prescription (2.11) to the case in which the output patterns are not necessarily the same as the input patterns.

We associate a set of input patterns {σ_j^{μν}}, j = 1, ..., N, organized in uncorrelated classes, to a set of output patterns {ξ_i^{μν}}, i = 1, ..., N′, belonging to corresponding classes. N and N′ are, respectively, the numbers of input and output units. The generation of input and output patterns is done by independent branching processes similar to (2.10), from input and output ancestors σ_i^μ and ξ_i^μ, respectively.

For a perceptron, in contrast to an ANN, the overlaps of the input patterns with their input prototypes, m_in, and the corresponding overlaps of the output patterns with the output prototypes, m_out, are independent variables [10]. The association is performed through the usual noiseless response function (2.13). In the case of auto-association N = N′, m_in = m_out, and σ_i^{μν} = ξ_i^{μν} for all i, μ, ν. Equation (2.13) can be viewed as the first step of the parallel dynamics of an ANN.
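Equation (2.13) itself is not reproduced in this extraction; the sketch below assumes the usual noiseless response, output_i = sgn(Σ_j J_ij σ_j), and shows how the hetero-association and the first step of the ANN parallel dynamics are the same operation. The coupling matrix used here is a placeholder Hebbian one (not the matrix (2.14) of the paper), only to make the snippet runnable.

```python
import numpy as np

def perceptron_response(J, sigma_in):
    """Noiseless response: one parallel update of all output units."""
    field = J @ sigma_in
    return np.where(field >= 0, 1, -1)

rng = np.random.default_rng(2)
N, N_out, p = 400, 400, 20

sigma = rng.choice([-1, 1], size=(p, N))        # input patterns
xi = rng.choice([-1, 1], size=(p, N_out))       # desired output patterns

# placeholder associative couplings (not the matrix (2.14) of the paper)
J = xi.T @ sigma / N

out = perceptron_response(J, sigma[0])
print(np.mean(out == xi[0]))                    # fraction of correct output bits
```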

The extension of (2.11) to the hetero-association perceptron can be written as (2.14). The overall factors in equations (2.11), (2.12) and (2.14) have been chosen so as to fix a convenient normalization; this will turn out to be useful in section 8. The prescriptions (2.11), (2.12) and (2.14) can be generalized to a full ultrametric hierarchy of patterns. The presentation of this more involved case will be given elsewhere.

3. The distribution of the local stability parameters.

In this section we analyze separately the probability distribution of the stability parameters when the network is in a state which is an individual pattern or in a state which is an ancestor, with the synaptic matrix (2.14). The perceptron will have N input units and N′ output units, and will store p = αN patterns equally divided among p_c = (αN)^γ classes (γ < 1). The corresponding analysis for the first step of the parallel dynamics in an ANN is recovered by setting m_in = m_out = m in all formulae. We do not analyze in detail the case of the symmetric matrix (2.12), which is very similar, except to indicate when formulae undergo essential modifications.

As we stressed in the introduction, it is useful to distinguish, in a pattern {ξ_i^{μν}}, the sites at which it coincides with its class ancestor (ε = +1, the plus sites) from those at which it differs from it (ε = −1, the minus sites). The probability distribution of ε follows from (2.10): the fraction of ε sites is (1 + ε m_out)/2 of the total.

Let us consider the stability parameter Δ_ε of one of the stored patterns (e.g. pattern {ξ_i^{11}}) at an ε site. Using (2.14) and N^{-1} Σ_j σ_j^{11} σ_j^1 = m_in, we obtain (3.2).

The first term in (3.2) is the expected value of Δ_ε. The second term is a Gaussian noise with zero mean and variance proportional to (1 − 2 c m_in + c²), which has a minimum equal to 1 − m_in² for c = m_in [6].
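The optimal choice of c is a one-line minimization of the quadratic noise factor quoted above:

\[
\frac{d}{dc}\bigl(1 - 2\,c\,m_{\mathrm{in}} + c^{2}\bigr) = -2\,m_{\mathrm{in}} + 2\,c = 0
\;\Longrightarrow\; c = m_{\mathrm{in}},
\qquad
1 - 2\,m_{\mathrm{in}}^{2} + m_{\mathrm{in}}^{2} = 1 - m_{\mathrm{in}}^{2}.
\]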

The third and fourth terms in (3.2) sum contributions of the J_ij coming from patterns other than {σ_j^{11}, ξ_i^{11}}. They do not contribute at all: they have zero mean and do not fluctuate in the limit N → ∞. This can be seen, for example, from the expectation (3.3), since p_c = (αN)^γ.

In the case of the symmetrical prescription (2.12) for an ANN (i.e. m_in = m_out = m, σ_i^{μν} = ξ_i^{μν}) an extra term appears in Δ_ε. In the limit N → ∞ this term becomes effectively a shift of U by a quantity −α. Such a shift is irrelevant, since U is a parameter to be optimized. It should be stressed that even if the asymmetrical and the symmetrical prescriptions (2.11) and (2.12) lead to the same distribution of the stability parameters, the dynamical properties of the corresponding networks can still differ.

The ratio between the expected value of Δ_ε (signal) and its standard deviation (noise) shows that in the limit m_out → 1, α can diverge roughly as 1/(1 − m_out), and that unless U = 1 the plus and the minus sites have different degrees of stability. We will see in the next section, studying the tail of the distribution of the Δ_ε [15], that the maximization of the storage capacity is obtained for U > 1. This implies that the plus sites will be more stable than the minus sites. Further insight can be gained by studying the tail of the probability distribution [14], as we will see in the next section.

The stability parameter at a site i of a prototype (ancestor) pattern is given by (3.6). As in the case of Δ_ε, the third and the fourth terms in Δ_c have zero mean and do not fluctuate in the limit N → ∞. The second term is a Gaussian variable of zero mean, which vanishes if c = m_in. The choice c = m_in is optimal because it minimizes the noise both in Δ_ε and in Δ_c [6, 11]. In particular, Δ_c does not fluctuate at all; its value is given by (3.7).

In fact, Δ_c plays the role of the parameter M introduced by Gardner [1]. We will see below that optimizing U for α = α_G, where α_G is the Gardner bound on the storage capacity [1], implies Δ_c = M for m_out → 1. With the synaptic matrix (2.14), the r.h.s. of (3.6) becomes modified when we set m_in = m_out = m, as is required for an ANN; the modification corresponds to shifting U by −α and has no effect when U is optimized.

The above analysis does not hold in the limit case m_in = 0, i.e. when all input patterns are independent and there are no classes. It also does not hold when m_in = 1, i.e. when all the individuals of a class coincide with their prototype.

4. Performance.

From the analysis of the above section, the probability distribution of Δ_ε is given by (4.1), where we have chosen c = m_in as in references [6, 11]. To study the performance of the network we introduce the probability E_ε of an error at an ε site, i.e. a site of a descendant with relative sign ε with respect to the same site in its ancestor. From (4.1) one obtains (4.2). Note that E_ε is independent of m_in (except for m_in = 0 and m_in = 1, where (4.2) does not hold). Consequently, the discussion in this section can be carried out for fixed (arbitrary) m_in.

E_ε represents the number of errors at the ε sites divided by the number of ε sites. The contribution of these sites to the total average error is weighted by the fraction (1 + ε m_out)/2 of ε sites, and the total average error E is given by (4.4).
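Since (4.2) and (4.4) are not reproduced here, the following sketch simply estimates these quantities empirically from a sample of stability parameters: an error at a site is a negative stability parameter, E_+ and E_− are the error fractions on the plus and minus sites, and E weights them by the fractions (1 + m_out)/2 and (1 − m_out)/2 of such sites. The Gaussian sample used as input is only a stand-in for the distribution (4.1).

```python
import numpy as np

def error_rates(delta, eps, m_out):
    """E_+, E_- and total E from stability parameters and site signs eps = +/-1."""
    e_plus = np.mean(delta[eps == +1] < 0)
    e_minus = np.mean(delta[eps == -1] < 0)
    e_total = (1 + m_out) / 2 * e_plus + (1 - m_out) / 2 * e_minus
    return e_plus, e_minus, e_total

rng = np.random.default_rng(3)
m_out, n_sites = 0.8, 100_000

# site signs drawn with P(eps = +1) = (1 + m_out)/2, as in (3.1)
eps = np.where(rng.random(n_sites) < (1 + m_out) / 2, 1, -1)

# illustrative stand-in for (4.1): Gaussian stabilities with an eps-dependent mean
delta = rng.normal(loc=1.0 + 0.5 * eps, scale=1.0, size=n_sites)

print(error_rates(delta, eps, m_out))
```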

Clearly, as E decreases the network retrieves the stored information more effectively. The performance of the network is optimized if U is chosen to minimize E. Differentiating (4.2) with respect to U one finds the optimal value (4.5). For this value of U the error at the plus sites and the error at the minus sites are of the same order of magnitude; in fact, (4.6) holds. Since the optimal U is ≥ 1, it can be seen from equation (4.2) that E_+ < E_−, which is a consequence of the fact that the expected value of the stability parameter is greater at the plus sites than at the minus sites, the noise at both types of sites being equal.

In figures 1 and 2 we plot E_+, E_− and E vs. m_out for α = β α_G, with β = 0.9 and β = 0.1 respectively.

Fig. 1. - E_+, E_− and E as functions of m_out for α = 0.9 × α_G. E_+ and E are decreasing functions of m_out. E_− has a maximum which can lie very close to the value m_out = 1.

Fig. 2. - E_+, E_− and E as functions of m_out for α = 0.1 × α_G. The maximum of E_− is lower than in figure 1 and lies at a lower value of m_out. In this case the range of m_out for which the model can retrieve …

E_− is equal to E for m_out = 0 and tends to zero for m_out → 1. It has a maximum that shifts to the right for increasing values of β. Note that for β = 0.9 the value of the maximum is very high, and it lies very close to m_out = 1.

Clearly, a good discrimination of an individual pattern from its class prototype is obtained if, in addition to E ≪ 1/2, also E_− ≪ 1/2. It is, therefore, necessary that the argument of the error function in E_− be positive, which is satisfied if U < 2. When m_out → 1, it is possible to determine the asymptotic behavior of E_ε, equation (4.2), and of the storage capacity. In this limit, as all the output patterns tend to their output prototypes, the task for the network reduces to the classification of input patterns, and it is possible to reduce the error to zero, since the number of classes is not extensive.

To see this, note that since E is given by (4.4), to have E → 0 it is sufficient that E_+ → 0, regardless of the value of E_−. E_+ will vanish when condition (4.8) holds. To each input pattern in a given class there has to be associated the prototype of the corresponding output class. This is an easy task [10], and can be performed for an arbitrarily large number of patterns. But it is more relevant to require that even in the limit m_out → 1 the individuals be distinguished from the class prototypes, i.e. to require that E_− → 0. Then, using (4.2), we must have condition (4.9).

This is particularly relevant in an ANN, where E_ε → 0 guarantees that the p stored patterns ξ^{μν} be fixed points of the dynamics. Using the value (4.5) for U, and writing α in the form (4.10), we find that condition (4.9) is fulfilled for any β in the interval 0 < β < 1. Substituting (4.10) in (4.5) we find (4.11).

It is interesting to compare Δ_c, given by (3.6), with Gardner's parameter M, defined for matrices J_ij in the Gardner volume [1]. One obtains, from (4.11) and the saddle point equation in [1], the relation (4.12). This shows that, in the limit m_out → 1, in which α approaches the Gardner bound, the stability parameters of the class representatives approach the value taken by the optimal matrices that store the given patterns. In section 8 we will see that the distance between the matrix (2.14) and the Gardner volume also vanishes in this limit.

The asymptotic form of E_ε, equation (4.2), is given, as m_out → 1, by (4.13), which vanishes with a β-dependent power as m_out → 1, apart from logarithmic corrections. For high values of β (≈ 1) the decay of E_− to zero is very slow, and formula (4.13) applies only in a very narrow range below m_out = 1, because the power in the exponent becomes very small. This can be seen in figure 1: for β = 0.9, E_− has a maximum at m_out = 0.98 and only then decreases to zero.

When β ≥ 1, equation (4.13) is still valid for E_+, while E_− = 1/2 for β = 1 and E_− = 1 for β > 1. We see that for β = 1 (α = α_G and Δ_c = M) there is a transition, in which the network loses its ability to recall individuals, but can still recall classes.

5. Noisy patterns and lesions.

Next we study the performance of the network when the input is a pattern with random errors, or when the synaptic matrix is lesioned by a random dilution of the synapses. If the input pattern {σ_j} has an overlap a with one of the stored patterns, e.g. {σ_j^{11}}, i.e. N^{-1} Σ_j σ_j^{11} σ_j = a, then the same arguments that led to formula (4.2) give (5.1), with (5.2). The error E_ε is defined with respect to the desired output ξ_i^{11}. The resulting overlap of the output with ξ^{11} is then given by (5.3) [12]. The analogous equation was derived in references [12, 13] to determine the « basins of attraction » of the stored patterns. Note that for a in the interval (0, 1) this overlap is a decreasing function of m_in. This means that for greater m_in the basins of attraction of the stored patterns are smaller.

Expressions (5.1)-(5.3) apply also to the case of a random dilution of synapses, with a² replaced by p, the probability of survival of a synapse. In other words, if the synapses are diluted according to (5.4), then (5.1) holds with (5.5).

The presentation of a noisy pattern, as well as the dilution of the synapses, is therefore equivalent to an increase of α by a factor 1/√b (keeping U fixed). The derivative of E_ε with respect to b at b = 1 gives the increment of the error due to an infinitesimal lesion, equation (5.6).

As long as U < 2, where it should be in order to allow recall of individual patterns, the error increment at the minus sites is greater than that at the plus sites. As was mentioned in the Introduction, this difference in the increments occurs for almost all synaptic matrices in the volume which optimizes the storage of the patterns [14]. There, it is a consequence of the fact that the expected value of the stability parameters at the plus sites is greater than at the minus sites. In other words, more information is lost at those sites that code the difference of the individuals with respect to their class (i.e. sites at which a pattern differs from its class prototype) than at those sites that confirm the class.

This effect can be interpreted as prosopagnosia once a quantitative criterion is given to establish what is the maximum amount of error at the minus sites for which one can still distinguish individuals, and what is the maximum amount of error at the plus sites for which one can still have class recall. It is clear that a very small lesion has a marginal effect on the performance of the network, while a large one destroys all the information stored (agnosia). Somewhere in between lies the region of prosopagnosia, in which the classes are recalled, but it is impossible to distinguish the individuals inside a class.
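A minimal numerical illustration of this lesion experiment is sketched below (the coupling matrix is again a stand-in Hebbian one, not the matrix (2.14), and the symbol names are illustrative): synapses are killed at random with probability 1 − p, one parallel update is applied to a stored individual, and the errors are counted separately on the plus and the minus sites.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_classes, per_class, m = 1000, 2, 25, 0.8

xi_anc = rng.choice([-1, 1], size=(n_classes, N))
keep = rng.random((n_classes, per_class, N)) < (1 + m) / 2
xi = np.where(keep, xi_anc[:, None, :], -xi_anc[:, None, :]).reshape(-1, N)

J = xi.T @ xi / N                      # stand-in couplings
np.fill_diagonal(J, 0.0)

def lesion_errors(J, pattern, ancestor, p_survive):
    """Error fractions on plus/minus sites after dilution and one parallel update."""
    mask = rng.random(J.shape) < p_survive
    out = np.where((J * mask) @ pattern >= 0, 1, -1)
    plus = pattern == ancestor                  # sites agreeing with the prototype
    e_plus = np.mean(out[plus] != pattern[plus])
    e_minus = np.mean(out[~plus] != pattern[~plus])
    return e_plus, e_minus

for p_survive in (1.0, 0.6, 0.3):
    print(p_survive, lesion_errors(J, xi[0], xi_anc[0], p_survive))
```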

On the other hand, if the input is a pattern which has an overlap a with one of the ancestors, and is otherwise uncorrelated with any of the individuals, then the error fraction of the output with respect to the corresponding output prototype is given by (5.7). Consequently, for any set of parameters (a, m_in, m_out, α) we can make this error arbitrarily small by taking U as in (5.8). Of course, if U is optimized so as to have the minimum E, the classes are efficiently retrieved only for α(1 − m²) ≪ 1.

6. Statistical mechanics of the symmetrical model.

We have seen that, up to a redefinition of U, the distribution of the stability parameters does not change whether the symmetrical or the asymmetrical version of the model is chosen. The analysis of the stability parameters, however, gives only a first-step [12, 13] picture of what happens in an ANN. For the symmetrical matrix (2.12) the model is governed by the Hamiltonian H = − Σ_{(ij)} J_ij S_i S_j. A complete analysis of the retrieval states of the dynamics can be obtained by studying the statistical mechanics of the system [17].

In the following we restrict the discussion to the cases in which the overlap with a single individual pattern (e.g. {ξ_i^{11}}) and/or the overlap with a single class prototype ({ξ_i^1}) is condensed. The free energy density of the system is found by a mean field theory, using the replica method to average over the quenched patterns [17]. If replica symmetry holds, the result is (6.1), where β is the inverse temperature and the double angular brackets imply an average over the distribution of ε.

The meaning of the parameters in (6.1) is the following: m̃ is the mean temporal overlap with the retrieved class prototype (6.2); M is the normalized mean temporal overlap with ξ^{11} − m ξ^1 (6.3); r is the normalized square of the uncondensed overlaps; q is the Edwards-Anderson [18] order parameter; and C is the susceptibility (equations (6.4)-(6.6)). The angular brackets stand for the thermal, or temporal, average, and Dy is a normalized Gaussian measure for the slow noise.
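For a concrete reading of m̃ and M, the sketch below measures them on a spin configuration S, assuming the natural normalizations m̃ = N^{-1} Σ_i ξ_i^1 S_i and M = Σ_i (ξ_i^{11} − m ξ_i^1) S_i / (N(1 − m²)); the exact expressions (6.2), (6.3) are not reproduced in this extraction, so the normalization of M is an assumption chosen so that M = 1 on perfect retrieval of the individual.

```python
import numpy as np

def class_and_individual_overlaps(S, xi_individual, xi_prototype, m):
    """m_tilde: overlap with the class prototype; M: normalized overlap with xi^11 - m xi^1."""
    N = S.size
    m_tilde = np.dot(xi_prototype, S) / N
    M = np.dot(xi_individual - m * xi_prototype, S) / (N * (1.0 - m ** 2))
    return m_tilde, M

rng = np.random.default_rng(5)
N, m = 10_000, 0.8
xi_prototype = rng.choice([-1, 1], size=N)
flip = rng.random(N) < (1 - m) / 2
xi_individual = np.where(flip, -xi_prototype, xi_prototype)

print(class_and_individual_overlaps(xi_individual, xi_individual, xi_prototype, m))  # ~ (m, 1)
print(class_and_individual_overlaps(xi_prototype, xi_individual, xi_prototype, m))   # ~ (1, 0)
```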

The parameters (6.2)-(6.6) satisfy a set of saddle point equations.

From the definitions of m̃ and M, equations (6.2) and (6.3), and from their saddle point values (6.7) and (6.8), respectively, the probability E_ε of making an error at an ε site is given by (6.12). When m → 1 below α_c, one has M → 1, C → 0 and m̃ → m, and equation (6.12) reduces to equation (4.2) for the first step, as expected. The numerical solutions of the saddle point equations can be used to find the storage capacity and the corresponding values of the retrieval quality for classes and for individuals, m̃ and M respectively.

In figure 3 we plot α_c vs. m; it is an increasing function of m and diverges for m → 1, as expected. In figure 4, M and m̃ are plotted. The shape of M as a function of m can be understood by observing (6.13), and by noting that E_− has a maximum as a function of m, as can be seen from figure 5. The maximum of E_− in figure 5 has the same origin as those of figures 1 and 2: it is due to the optimization of U. It is convenient, in order to maximize the storage capacity, to have relatively high values of E_− and, correspondingly, low values of M. It must also be noted that, unlike the first step case, E too has a maximum as a function of m. Note the qualitative accord of figure 5 with figures 1 and 2.

Fig. 3. - The storage capacity α_c as a function of m.

Fig. 4. - M and m̃ vs. m for α = α_c. Note that m̃ grows almost linearly and that for all values of m one has m̃ > m. M has a minimum at m = 0.78, as a consequence of the optimization of U.

Fig. 5. - E_+, E_− and E vs. m for α = α_c. The maximum in E_− corresponds to the minimum of M in figure 4.

Of course, the states in which a class, but no individual, is retrieved (m̃ ≠ 0, M = 0) are also relevant fixed points. In this case, for any value of α and m, one finds m̃ = 1, confirming that the classes are stored with no errors. This can be understood, since we have constructed the synaptic matrix so as to have no noise on the class prototypes.

7. Symmetrical dilution of the synaptic matrix.

We now proceed to study the effect on the attractors of a lesion of the synaptic matrix. In order to keep the system Hamiltonian, we consider a symmetric dilution of the synaptic matrix according to (5.4), (5.5).
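A symmetric dilution can be sketched as below: a single Bernoulli mask is drawn for each pair (i, j) and applied to both J_ij and J_ji, so that the lesioned matrix remains symmetric and the model remains Hamiltonian. The rescaling by the survival probability p is optional bookkeeping, not taken from the paper.

```python
import numpy as np

def symmetric_dilution(J, p_survive, rng):
    """Kill each symmetric pair of synapses with probability 1 - p_survive."""
    n = J.shape[0]
    upper = rng.random((n, n)) < p_survive
    mask = np.triu(upper, k=1)
    mask = mask + mask.T                      # same fate for J_ij and J_ji
    return J * mask / p_survive               # optional rescaling by p

rng = np.random.default_rng(6)
J = rng.normal(size=(100, 100))
J = (J + J.T) / 2
J_lesioned = symmetric_dilution(J, p_survive=0.6, rng=rng)
print(np.allclose(J_lesioned, J_lesioned.T))  # True: symmetry preserved
```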

It has been proven that such a dilution is equivalent to the introduction in the Hamiltonian of an SK spin-glass interaction among the neurons [19], with independent Gaussian couplings of zero mean and variance α(1 − p)/(Np). In the replica symmetric theory this effective interaction adds to the free energy a term of the form (7.1) [20]. The effect of this term in the saddle point equations is to modify equation (6.10), which now reads (7.2).

A first effect of the dilution on the attractors is analogous to the effect on the perceptron. The noise induced by the lesion affects asymmetrically the plus sites and the minus sites of the individual attractors. Repeating the argument of section 5, one finds that the increment in E_− is greater than the corresponding increment in E_+.

A more important effect regards the disappearance of the individual attractors. The capacity limit for the recall of individuals is lowered by the new term in r, as can be seen by observing that α_c is a decreasing function of the noise r. Furthermore, due to the dependence of the additional noise (7.1) on α, another critical value of α appears, to be denoted by α_c1, at which the class prototypes cease to be attractors. This is a point of a ferromagnet-spin glass phase transition. Nevertheless, for a small noise level α_c1 > α_c. There is thus a whole region α_c < α < α_c1 in which the network is in a state of prosopagnosia: the individual information is completely lost, while the information about the classes is still stored in the network. Only for very high damage does the network lose all information completely, falling into a state of agnosia.

8. Comparison with optimal storage.

In section 4 we have found that in the limit m_out → 1 the storage capacity of a network with the matrix (2.14) approaches the Gardner bound and that, in addition, the error goes to zero. It is therefore natural to compare the matrix J_ij of (2.14) with a generic matrix that stores the given patterns with no errors, i.e. a matrix within the volume of J's that satisfy the constraints (8.1) [1].

Specifically, we compute the overlap Q, equation (8.2), averaged over the Gardner volume. Q can also be interpreted as the cosine of the angle between the matrix (2.14) and a generic matrix that satisfies (8.1).
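Equation (8.2) is not reproduced in this extraction; the sketch below assumes the natural reading of such an overlap, namely the normalized scalar product of the two coupling vectors seen by a given output unit, which is exactly the cosine of the angle between them.

```python
import numpy as np

def coupling_overlap(J_row_a, J_row_b):
    """Cosine of the angle between two synaptic vectors J_ij and J'_ij (fixed i)."""
    num = np.dot(J_row_a, J_row_b)
    den = np.linalg.norm(J_row_a) * np.linalg.norm(J_row_b)
    return num / den

rng = np.random.default_rng(7)
ours = rng.normal(size=1000)                    # stand-in for a row of (2.14)
generic = ours + 0.5 * rng.normal(size=1000)    # stand-in for a matrix in the Gardner volume
print(coupling_overlap(ours, generic))
```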

The probability distribution corresponding to the constraints (8.1) is given in (8.3). The quantity Q is self-averaging with respect to the measure (8.3) and does not depend on the particular sample of patterns stored. Substituting (2.14) in (8.3) we obtain (8.4). Since Gardner's stability parameter M is a quantity that does not fluctuate, the second and the third terms in (8.4) can be written, respectively, as (8.5) and (8.6). These two terms are of order p_c/N and do not contribute to Q when N → ∞. Similarly, the terms proportional to U and c in the J_ij's, which are essential for the storage, give a contribution which is of order 1/√N. Developing the leading term of (8.4) we obtain (8.7).

Hence, in order to compute Q, all that is needed is the probability distribution of the stability parameters at the plus and at the minus sites. In fact, if we denote this distribution at the ε sites by F_ε(Δ*), then Q is given by (8.8), where the angular brackets stand for the average over the distribution (3.1) of ε; making this average explicit one obtains (8.9). The distribution F_ε(Δ*) was computed in reference [14] as a function of the order parameter M and of the parameter q = Σ_j J_ij^a J_ij^b, where the distinct matrices J^a and J^b satisfy equation (8.1). It is given by (8.10), where, for a given α ≤ α_c, q and M satisfy the Gardner saddle point equations [1]. One then finds that Q is independent of m_in because, as was shown in [10], M does not depend on m_in.

For q = 1, i.e. α = α_c, (8.13) reduces to (8.14), which tends to 1 in the limit m_out → 1. From (8.14) one can see that for all m_out ≠ 1 one has Q < 1, i.e. the angle formed by the matrix (2.14) with the Gardner volume is different from zero. In figure 6, Q is plotted for fixed values of q. Q is an increasing function of m_out, and one sees that as m_out increases the distance of the matrix (2.14) from the Gardner volume decreases. This can be intuitively understood by observing that the retrieval properties of the matrix (2.14) become more and more similar to those of the optimal matrices as m_out approaches 1.

Fig. 6. - Q vs. m_out for q = 1 and q = 0.8. For fixed q, Q is an increasing function of m_out, showing that the distance of the matrix (2.14) from the Gardner volume decreases with m_out.

Actually, for a generic q, as m_out → 1, (8.13) reduces to (8.15). This implies that in the limit of full correlation the matrix (2.14) lies along the axis of the cone in which the Gardner volume is embedded.

Conclusion.

In this paper we have considered a synaptic matrix that stores efficiently uncorrelated classes of patterns. The storage capacity limit behaves qualitatively as in the case of optimal storage: it is an increasing function of the correlation of the individuals with their class prototypes, and diverges in the limit of full correlation. We have obtained this result by modifying the synaptic matrix of references [8, 9].

In the framework of this model the learning of a new pattern, after O(N) patterns have already been memorized, can be imagined as occurring in two steps. The new pattern is compared with the memorized classes, and the difference with the representative of the class with which it overlaps is stored in a Hebbian-like outer product. In addition, the projection of this difference onto the space spanned by the class prototypes is subtracted. This reduces the Gaussian noise on the stability parameters of the stored patterns.

Then, together with the individual patterns, the class prototypes are stored so as to have greater stability than the individual patterns. This is necessary to minimize the number of errors when an individual pattern is retrieved. It then follows that the plus sites are more stable than the minus sites, in accord with what happens in the Gardner case.

Prosopagnosia appears at this point. It is modeled by the fact that a lesion in the network structure destroys first the information about individuals; only at a much higher level of disruption is the information about classes destroyed, mimicking agnosia.

We have studied the matrix both in the context of ANN's and of perceptrons, and have argued that, as far as memory errors are concerned, the two become equivalent when one identifies the errors in the perceptron output with those present in the configuration of the attractor. The retrieval of information in the perceptron is fully characterised by the probability distribution of the local stability parameters of the individuals and of the class prototypes.

For ANN's we have considered a symmetric matrix and an asymmetric one. At the level of the probability distribution of the stability parameters, the two matrices have the same properties. However, there would normally be dynamical differences. The symmetric matrix has been studied in detail by statistical mechanics, which confirms the picture obtained from the analysis of the stability parameters.

We did not simulate extensively the dynamics of the ANN. We performed simulations of the zero temperature dynamics of neural networks of up to 100 neurons with these synaptic matrices. Both the symmetric and the asymmetric model have a storage capacity consistent with the results of section 6. We observed that, as expected, beyond the limit of storage capacity, initial states highly correlated with one individual pattern evolve to the corresponding class prototype.

Acknowledgments.

S.F. gratefully acknowledges a scholarship of Selenia S.P.A. D.J.A. and S.F. are grateful for the hospitality extended to them by Imperial College London, where the work was completed.

References

[1] GARDNER E., J. Phys. A 21 (1988) 257; Europhys. Lett. 4 (1987) 481.
[2] WILLSHAW D. J., BUNEMAN O. P. and LONGUET-HIGGINS H. C., Nature 222 (1969) 960.
[3] HOPFIELD J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[5] TSODYKS M. V. and FEIGELMAN M. V., Europhys. Lett. 6 (1988) 101.
[6] PEREZ VICENTE C. J. and AMIT D. J., J. Phys. A: Math. Gen. 22 (1989) 559.
[7] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Phys. Rev. A 35 (1987) 2293.
[8] PARGA N. and VIRASORO M. A., J. Phys. France 47 (1986) 1857.
[9] FEIGELMAN M. V. and IOFFE L., Chernogolovka preprint (1988).
[10] DEL GIUDICE P., FRANZ S. and VIRASORO M. A., J. Phys. France 50 (1989) 121.
[11] PEREZ VICENTE C. J., preprint (1988).
[12] KRAUTH W., MÉZARD M. and NADAL J. P., Complex Syst. 2 (1988) 387.
[13] KEPLER T. B. and ABBOTT L. F., J. Phys. France 49 (1988) 1657.
[14] VIRASORO M. A., Europhys. Lett. 7 (1988) 299.
[15] MATTIS D. C., Phys. Lett. 56A (1976) 421.
[16] WEISBUCH G. and FOGELMAN-SOULIÉ F., J. Phys. Lett. France 46 (1985) L-623; MCELIECE R. J., POSNER E. C., RODEMICH E. R. and VENKATESH S. S., Caltech preprint (1986).
[17] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Phys. Rev. Lett. 55 (1985) 1530; Ann. Phys. 173 (1987) 30.
[18] EDWARDS S. F. and ANDERSON P. W., J. Phys. F 5 (1975) 965.
[19] SHERRINGTON D. and KIRKPATRICK S., Phys. Rev. Lett. 35 (1975) 1792.
