Retrieval dynamics of neural networks constructed from local and nonlocal learning rules

(1)

HAL Id: jpa-00212362

https://hal.archives-ouvertes.fr/jpa-00212362

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Retrieval dynamics of neural networks constructed from

local and nonlocal learning rules

J. Krätzschmar, G.A. Kohring

To cite this version:

(2)

Retrieval

_dynamics

_{of neural networks constructed from local}

and nonlocal

_learning

rules

J. Krätzschmar and G. A.

_Kohring

Universität Bonn,

Physikalisches Institut,

Nussallee 12, D-5300 Bonn _I,F. R. G.

(Reçu

le 7 avril 1989, révisé le 18

_septembre

1989,

accepté

le 20 octobre

₁₉₈₉₎

Résumé. 2014 Les

dynamiques

de

_rappel

de réseaux neuronaux construits à

_partir

de

règles

d’apprentissage

locales et non locales sont

_comparées

à l’aide de simulations

_numériques

et

apparaissent

très similaires. Nos simulations

_indiquent

_qu’il

ne _{semble pas}

_possible

de déterminer le _comportementdes

_systèmes

aux _temps

_longs

en fonction de la

_dynamique

aux

_premiers

instants, comme cela avait été récemment

_suggéré

_par

_Kepler

et Abbott. Abstract. 2014 The retrieval

dynamics

of neural networks constructed from local and nonlocal

learning

rules are

_compared

via _computersimulations and shown to _{be very similar.}Furthermore,

the simulations show no

_hope

for

_determining

the

_long

time behavior of the _systemin terms of the first _step

_dynamics

as has been advocated

_{by Kepler}

and Abbott.

Classification

Physics

Abstracts

87.30G - 06.50

Introduction.

The

_{thermodynamic properties}

of the

_{original Little-Hopfield}

model have been studied in detail

_by

_{Amit, Gutfreund,}

_Sompolinsky

and their co-workers

_[1].

_They

showed that a

strongly

connected network

_having

N

nodes,

or

_Ising-spins,

behaves as an associative memory

device

_provided

the number of

_pattems,

P,

scales with N as: a =- lim

_{PIN 0. 14.}

The N-+oo

relatively

low

_storage

_capacity

of the model

_prompted

Personnaz et al.

_[2]

to invent the

nonlocal,

or

_{pseudo-inverse, learning}

rule for

_determining

the

_couplings,

_Jjj,

between the

spins.

Such a model has a maximum

_storage

_capacity

of a = 1. Vekentash

[3],

however,

showed that the maximum

_storage

_capacity

_{for any model with}

_{only two-spin}

interactions is

a = 2. This

prompted

Gardner

[4]

to find _alocal

learning

rule which could reach this

maximum

_storage

_capacity

and in

_doing

so she

_developed

the

_analytical

tools needed for

completely determining

the static

_properties

of networks

_{having asymmetric couplings.}

Static

_properties

alone are not sufficient for

_{understanding}

neural networks because their ultimate use is as

_dynamical

attractors. In that

_respect

it has been

_argued

_[5]

that networks constructed from nonlocal

_learning

rules should have better

_{dynamical properties}

than those constructed from local

_learning

rules.

However,

extensive numerical simulations

_by

Forrest

[6]

and Kanter and

_Sompolinsky

_[2]

at two values of a do not bear that out. An

_approximate

analytic

treatment

_[7]

also indicates that at least in the

_neighborhood

of « =

1,

the two

learning

rules should have similar

_dynamical

behavior.

_Furthermore,

it has been shown

_by

(3)

224

Diederich and

_Opper

_[8]

that some local

_learning

rules can in fact

_{reproduce exactly}

the

coupling

matrix of the

_{pseudo-inverse.}

In this paper

then,

we make

_{explicit comparison}

between these two

_types

of networks

_using

as a measure of the

dynamical

response, the radius of

attraction,

R. R is defined as one minus

the minimum

_overlap,

mmin, with a stored state such that as N - 00 almost all of the states

having >

mmin

(m =

(1 IN ej

Sj,

where e

represents

the stored state and S an

arbitrary

j

state)

will evolve towards the stored state.

_{Previously Kepler}

and Abbott

_[9]

_gavea

phenomenological

rule for

_calculating

this

_quantity,

but the simulations to be

_presented

here will show that care is

_required

when

_using

their rule. In the next section the numerical

procedures

used will be

_explained

and an

_{interpretation}

is

_given

in the last section. Numerical methods.

Local

_learning

rules are

_designed

to

_produce

a

_coupling

matrix,

_Jjj,

such that the

following

stability

condition is fulfilled for every site of every

pattern :

The

_learning

rules

_{proposed by}

Krauth and Mèzard

_[10],

Forrest

_[6]

and Gardner

_[4]

can be

shown to

_{produce equivalent}

networks at

_{saturation, i.e.,}

when the network has the maximum

storage

capacity

allowed for a

_given

K. In the case of

random,

uncorrelated

_patterns,

a is related to K at saturation

_by

the Gardner formula

_{[4] :}

To construct a network at saturation we used the

_approach

advocated

_by

Gardner

[4].

_Starting

with

_pattern

1 and

_{going sequentially through}

all of the

patterns

to be

stored,

the

_coupling

matrix is

_{changed by :}

After all the

_pattems

have been

checked,

the

_procedure

is

_repeated

until

_AJij

= 0 for every

pattern, J.L

and

site,

i. For a finite size

network,

the

_system

saturates much sooner than would

be

_{expected by equation}

_(2.2).

In order to correct for

this,

we note that the

_learning

time on an infinite

_system

should

_diverge

as

_{K --> K S ( a ),}

where _{K S} is the saturation value of

K

[6, 11].

_Hence,

we looked for a similar finite

_size,

_{K fs,}at which the

_learning

time tends to

infinity

and chose for the simulations a value of K

_slightly

smaller so as to

_produce

reasonable

learning

times. This

_procedure

_produces

a network with

_properties

_{very similar}to that of a

fully

saturated,

infinite volume network. In

_fact,

a calculation of the average local

alignment,

(§fi

L Jij

03BEjr>>03BE ,

gives

a value on a 100 node network within 0.5 % of that

expected

for an

03BEui

j =i

iij

03BEUj>>

03BE

’j¥Ai

03BE

infinite size

_system

at saturation

_[9].

(4)

stability

condition in

_(2.1)

_to,

_§iU

_E

_{Jij 03BEuj’}

=

1,

could

reproduce

the

coupling

matrix _structure

j=i

of the

_{pseudo-inverse}

method of Personnaz

_{et al.,}

as N --> oo and in the limit of an infinite number of

_leaming

_steps.

The

_{pseudo-inverse}

method defines the

_synaptic

_couplings

as :

-where the correlation matrix is defined as : This method has the

advantage

that no refinements are _{necessary for finite size}

_systems.

After

_preparing

a network

_using

one of these

_{prescriptions,}

the

_system

was initialized to some state

_having

an

_overlap,

_{mo, with}one of the stored

_pattems.

Then the network was

updated

in

_{parallel according}

to the

_{zero-temperature,}

deterministic rule

_(Glauber

Dynamics) :

Biologically,

this rule is not _{correct ;}

_however,

a naive

_comparison

of time scales indicates that

real

_systems

should

_update

_parallel

than in

_{serial, hence,}

there has been a

_great

amount of attention

_paid

to this

_updating

scheme

_{[9, 12, 7, 13].}

Other

_updating

rules will

_give

different

_quantitative

results for the radius of

_attraction,

but the

_comparative

results should be the same.

The

_updating

rule is then

_applied

until the

_system

reaches a fixed

_point

or a

_cycle.

_Thus,

the

percentage

of times the

_{pattern IL}

is recalled as a function of

_me

can be calculated.

Repeating

this

_procedure

_{for many different values of mo and}

_using

the definition

_given

_above,

enables

one to calculate the radius of attraction. This

radius, however,

must be

renormalized, since,

as Kantor and

_{Som olinsky}

_[2]

have

_pointed

out, for finite size

_systems

the radius is reduced due to the

_0(1/

N )

_overlap

_{of every stored}state _{with every other stored}state.

_Hence,

the actual definition in use is : R

1 Mnùn

where m av is the average

overlap

with the other stored states and the brackets indicate an _averageover all stored states and all

_possible

sets of

patterns.

We have also checked in the course of these simulations the

_proposed

_{phenomenological}

rule of

_Kepler

and Abbott

_{[9] :}

where,

ml is the

overlap

of the

system

at the first time

_step

and P

_{(So --> E )}

is the

_probability

that initial

_system

is

_mapped

onto the stored state.

_Kepler

and Abbott

_suggested

that

X has the value 1/2.

For the simulation results to be

_presented

_here,

an N = 100 node network _wasused and

typically

75 different sets of

_patterns

were

_randomly

chosen to stabilize for each distinct value of a. For each set of

_pattems,

the radius of attraction was measured for each

_pattern

as

(5)

226

Results and discussion.

The main results of this paper are

_presented

in

_figure

_1,

where it can

_clearly

be seen that local

and

_{nonlocal learning}

rules have similar

_dynamical

behavior as manifested

_by

the

_similarity

of their radius of attraction. At a

= 1,

_Rnonlocal

must tend to zero

[2],

_however,

it is also clear that

R10cal

al is

becomming

very small.

Hence,

even

though

the local

learning

rule has a

storage

capacity

of a

= 2,

the

_systems

_ability

to function as an _{associative memory device is}

_already

seriously

impaired

at a = 1. These results confirm the

predictions

of Krauth et al.

[13]

who

argued

on the basis of a

_{one-pattern-neural-network}

that the

_stability

of the

_patterns

is the most

_important

variable for

determining

the radius of attraction.

Indeed,

for the nonlocal

learning

the

_stability,

à,

is

_{[2] : à}

=

(1 -

a )/ a ,

while for the

local learning

rule we have

Fig.

1. -

The radius of attraction, R vs. a.

(D)

For the

local learning

rule,

(A)

for the

nonlocal learning

rule,

(0)

using

the

_Kepler

and Abbott

_{phenomenological}

rule. The broken line is the

analytical

(6)

[4]:

a = [ ( K2; 1 ) erfc (- K/ v2) + (K/ v2)e-K2/2t1,

and as K-->oo,

a -> 1 /

(K 2 + 1),

==> K

= ( 1 - a )/ a .

_Hence,

these results confirm Krauth et al.’s

_{extropolation}

from a

_{one-pattern-network}

to a full network.

Furthermore,

since the local

_leaming

rule

produces

couplings

which are

_{symmetric only}

in the mean

_{(i. e. ,}

the

_quantity,

y =

Tin - Tji

is Gaussian distributed with zero

_mean),

while the nonlocal

_{learning produces completely}

symmetric couplings ;

these results indicate the lesser

_importance

of the

_asymmetry

of the

couplings

(at

least when such

_asymmetry

is not too

_large).

Figure

1 also shows the radius of attraction as

_{predicted by}

the

_{phenomenological}

rule of

Kepler

and Abbott

_along

with the data from their _paper

_[9].

The

_disagreement

between their results and the

_present

results can be understood

_by

_looking

at

_figure

2a. Here the

_probability

that P

_{(So -03BE )}

as a function of

1 -mo

is

_plotted

for a =

0.10,

K = 2.5 and two different

1 - m0

values

_{of mo. Although}

simulations at _mo= 0.15

give

results in

_agreement

with their

proposed

rule,

results for mo = 0.50 do not.

Clearly,

_auniversal value for the

adjustable

parameter

X cannot be determined from this data because its

dependence

upon mo is not clear at this finite value of N.

Furthermore,

the results in

_figure

2b for a = 0.40 and K = 0.92 indicate

that the value

_{of X}

_{may also have}a K

_dependence.

In

_figure

3 additional evidence of this is

shown at a =

0.4,

K =

0.92,

where P

(So --+ e )

is

plotted

_{as a}function of mo

using

the normal

procedure

and the

_Kepler

and Abbott rule. The former

_predicts

the correct radius of

(7)

228

Fig.

3. -

P (S --> 03BE) )

vs. _{mo. The full line is for the normal}

_procedure

while the dotted line is for the

Kepler

and Abbott

_procedure.

Both are for a = 0.40 and K = 0.92.

attraction,

while the latter

_predicts

a radius of attraction consistent with the results

_{given by}

[9]

and indicated in

_figure

1. The

_present

results shown in

_figure

1 could in fact be accounted for

_{by using X =}

0.3

_[14].

However,

the results in

_figures

2 and 3 would still be difficult to

interpret

as

_supporting

_{any such universal value of}X . Given this

uncertainty

for the correct

value of _{X ,}_{if any,}

_prudence

_{would dictate that any results should be based upon the}more

reliable method for

_computing

the radius of

attraction,

namely

the standard method as

described in section 2 and used

_by

other authors

_(see

for

_example

Kanter and

_Sompolinsky

in

[2])

which uses

_only

the initial

_overlap

_moas its

_dependent

variable.

In summary

then,

the

_similarity

of the

_dynamical

behavior between networks constructed from local and

_{nonlocal learning}

rules has been demonstrated and

_implies

no

_advantage,

from

this

_viewpoint,

of either network

_type

in actual

_{applications.}

Furthermore,

it has been shown that

_attempting

a

_description

of the

_long

time behavior of these

_systems

in terms of

_only

the first

step

dynamics

may lead to

_spurious

results.

References

[1]

AMIT D., GUTFREUND H. and SOMPOLINSKY H.,

Phys.

Rev. Lett. 55

₍₁₉₈₅₎

1530 ; Ann.

_Phys.

173

(1987)

30 ;

CRISANTI A., AMIT D. J. and GUTFREUND H.,

Europhys.

Lett. 2

₍₁₉₈₆₎

337.

[2]

PERSONNAZ L., GUYON I. and DREYFUS G.,

Phys.

Rev. A 34

₍₁₉₈₆₎

4217 ;

(8)

[3]

VENKATESH S. S., Neural Networks

_for

_Computing,

Ed. J. S. Denker

_(American

Inst. of

Phys. ,

New

_York)

1986.

[4]

GARDNER E.,

Europhys.

Lett. 4

₍₁₉₈₇₎

_{481 ;}J.

Phys.

A 21

₍₁₉₈₈₎

_{257 ;} GARDNER E. and DERRIDA B., J.

_Phys.

A 21

₍₁₉₈₈₎

271.

[5]

AMIT D. J.

_(Racah

Inst. of

Phys.,

Preprint),

RI/87/19

_(to

be

_published).

[6]

FORREST B. _M.,J.

_Phys.

A 21

₍₁₉₈₈₎

245.

[7]

KOHRING G. A.,

Europhys.

Lett. 8

₍₁₉₈₉₎

697.

[8]

DIEDERICH S. and OPPER M.,

Phys.

Rev. Lett. 58

₍₁₉₈₇₎

949.

[9]

KEPLER T. B. and ABBOTT L. _F.,J.

_Phys.

France 49

₍₁₉₈₈₎

1657.

[10]

KRAUTH W. and MÈZARD M., J.

_Phys.

A 20

₍₁₉₈₇₎

L745.

[11]

OPPER M.,

Phys.

Rev. A 38

₍₁₉₈₈₎

3824.

[12]

GARDNER E., DERRIDA B. and MOTTISHAW M., J.

Phys.

France 48

₍₁₉₈₇₎

741 ;

KRAUTH W., NADAL J.-P. and MÈZARD M.,

Complex

Syst.

2

₍₁₉₈₈₎

387.

[13]

KRAUTH W., NADAL J.-P. and MÈZARD M., J.

_Phys.

A 21

₍₁₉₈₈₎

2995.

Retrieval dynamics of neural networks constructed from local and nonlocal learning rules

HAL Id: jpa-00212362

https://hal.archives-ouvertes.fr/jpa-00212362

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Retrieval dynamics of neural networks constructed from

local and nonlocal learning rules

J. Krätzschmar, G.A. Kohring

To cite this version:

Retrieval

dynamics

of neural networks constructed from local

and nonlocal

learning

rules

Kohring

Physikalisches Institut,

(Reçu

septembre

accepté

1989)

dynamiques

rappel

partir

règles

d’apprentissage

comparées

numériques

apparaissent

indiquent

qu’il

possible

systèmes

longs

dynamique

premiers

suggéré

Kepler

dynamics

learning

compared

hope

determining

long

dynamics

by Kepler

Physics

thermodynamic properties

original Little-Hopfield

by

Amit, Gutfreund,

Sompolinsky

[1].

They

strongly

having

nodes,

Ising-spins,

provided

pattems,

P,

PIN 0. 14.

relatively

storage

capacity

prompted

[2]

nonlocal,

pseudo-inverse, learning

_dynamics

_{of neural networks constructed from local}

_learning

_Kohring

_septembre

₁₉₈₉₎

_rappel

_partir

_comparées

_numériques

_indiquent

_qu’il

_possible

_systèmes

_longs

_dynamique

_premiers

_suggéré

_Kepler

_compared

_hope

_determining

_long

_dynamics

_{by Kepler}

_{thermodynamic properties}

_{original Little-Hopfield}

_by

_{Amit, Gutfreund,}

_Sompolinsky

_[1].

_They

_having

_Ising-spins,

_provided

_pattems,

_{PIN 0. 14.}

_storage

_capacity

_prompted

_[2]

_{pseudo-inverse, learning}

_determining

_couplings,

_Jjj,

_storage

_capacity

_storage

_capacity

_{only two-spin}

_storage

_capacity

_doing

_developed

_analytical

_properties

_{having asymmetric couplings.}

_properties

_{understanding}

_dynamical

_respect

_argued

_[5]

_learning

_{dynamical properties}

_learning

_by

_Sompolinsky

_[2]

_approximate

_[7]

_neighborhood

_dynamical

_Furthermore,

_by

_Opper

_[8]

_learning

_{reproduce exactly}

_{pseudo-inverse.}