HAL Id: jpa-00212470
https://hal.archives-ouvertes.fr/jpa-00212470
Submitted on 1 Jan 1990


Quasi-optimized memorization and retrieval dynamics in sparsely connected neural network models

Karl E. Kürten

Institut für Theoretische Physik, Universität zu Köln, D-5000 Köln, F.R.G. and HLRZ c/o KFA Jülich, D-5170 Jülich 1, F.R.G.

(Received 29 March 1990, accepted 4 April 1990)

Abstract. - Several network topologies with sparse connectivity suitable for information processing in neural network models are studied. A trial and error scheme for quasi-optimal neighbour search is shown to successfully overcome instabilities giving rise to chaotic behaviour. Moreover, computer simulations reveal that for random unbiased patterns the connectivity of the network can be adapted to the specific structure of the information the network is asked to capture, such that it can be quasi-optimally stabilized. On the other hand, networks with random or purely nearest-neighbour interactions are not competitive candidates for the realization of associative memories. It is shown further that sparsely connected network models with quasi-optimal neighbour search techniques substantially outperform their fully connected counterparts.

J. Phys. France 51 (1990) 1585-1594, 1 August 1990

Classification: Physics Abstracts
87.30 - 75.10H - 64.60

1. Introduction.

There has been a tremendous amount of interest in dynamical neural network models exhibiting collective phenomena associated with information processing and cognitive behaviour: storage and retrieval of memories, learning and forgetting, association and abstraction [1-3].

One detrimental assumption of some of the currently popular neural network models is full connectivity of the neuron-like elements. On the one hand, high connectivity is rarely found in nature. On the other, highly connected networks of respectable size, suitable for such cognitive tasks as pattern recognition and image processing, raise immense wiring problems in hardware realization and make real-time software simulations computationally expensive. In fact, it is well known that realistic biological networks are rather sparsely interconnected, and it is also well known that systems with low connectivity are often amenable to analytical study [3, 5].

2. Formalism.

The network model consists of N sparsely interconnected logical units, capable only of the values σ_i = 1 and σ_i = -1, where each cell i is supposed to receive exactly K inputs from other units of the network. The dynamical time evolution of the system is then formulated in terms of the local fields h_i(t),

h_i(t) = \frac{1}{|c|_i} \sum_{j} c_{ij} \sigma_j(t),    (1)

where j runs over the arbitrary K neighbours of cell i, not necessarily nearest neighbours, and the normalization factor |c|_i is defined by

|c|_i = \Big( \sum_{j} c_{ij}^2 \Big)^{1/2}.    (2)

The interaction matrix C is assumed to be real, and h_i can be interpreted as the net internal stimulus from cells synapsing onto cell i, represented by the usual linear sum of weighted inputs. The actual binary state is then determined via the deterministic threshold rule

\sigma_i(t+1) = \mathrm{sgn}\big( h_i(t) \big).    (3)
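As an illustration of the retrieval dynamics (1)-(3), the following minimal Python sketch performs one parallel update of a network in which every cell receives exactly K inputs. The array layout, the Euclidean form of the normalization |c|_i and the convention sgn(0) = +1 are assumptions of the sketch, not prescriptions of the model.

```python
import numpy as np

def local_fields(sigma, neighbours, c):
    """Normalized local fields of eqs. (1)-(2):
    h_i = (1/|c|_i) * sum_j c_ij * sigma_j over the K neighbours j of cell i."""
    # sigma:      (N,)   array of states +1/-1
    # neighbours: (N, K) integer array; row i lists the K input cells of cell i
    # c:          (N, K) real couplings; c[i, k] weights the input from neighbours[i, k]
    norm = np.sqrt((c ** 2).sum(axis=1))          # assumed Euclidean norm |c|_i
    return (c * sigma[neighbours]).sum(axis=1) / norm

def update(sigma, neighbours, c):
    """One parallel sweep of the deterministic threshold rule (3); sgn(0) -> +1."""
    return np.where(local_fields(sigma, neighbours, c) >= 0, 1, -1)
```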

The network is now supposed to learn a series of p = αK random, unbiased configurations π^{(μ)} = (π_1^{(μ)}, ..., π_N^{(μ)}), μ = 1, ..., p, with p being proportional to the connectivity parameter K. A necessary though minimal requirement is that the pN polarization parameters R_i^{(μ)}, defined as

R_i^{(\mu)} = \frac{\pi_i^{(\mu)}}{|c|_i} \sum_{j} c_{ij} \pi_j^{(\mu)},    (4)

are larger than zero, which guarantees that the patterns to be memorized are fixed points of the dynamics (1).

In order to engrave the information as strongly as possible, one has to adjust the couplings such that the polarization parameters in (4) satisfy the inequality

R_i^{(\mu)} \geq \kappa,    (5)

with the quantity κ as large as possible.

In the case of random, uncorrelated patterns and random connectivity, the upper storage capacity α_c for given κ has been shown to satisfy [2]

\alpha_c(\kappa) = \Big[ \int_{-\kappa}^{\infty} \frac{dt}{\sqrt{2\pi}} \, e^{-t^2/2} \, (t+\kappa)^2 \Big]^{-1}    (6)

in the thermodynamic limit. Equation (6) implies that the critical capacity α_c = 2 is reached when κ vanishes. In view of technical applications, however, where the total number of cells N is finite and some patterns might be strongly correlated, one is faced with the problem of producing a set of c_{ij} so as to satisfy condition (5) with κ as large as possible.
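Assuming that (6) is the standard Gardner bound of reference [2], the bound can be evaluated numerically as in the sketch below; it reproduces α_c = 2 at κ = 0 and α ≈ 0.42 at κ ≈ 1.2, the value α_G quoted in section 3. The quadrature routine is an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad

def alpha_c(kappa):
    """Assumed Gardner bound: alpha_c(kappa) = 1 / int_{-kappa}^{inf} Dt (t + kappa)^2,
    with the Gaussian measure Dt = dt exp(-t^2/2) / sqrt(2*pi)."""
    integrand = lambda t: np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi) * (t + kappa) ** 2
    value, _ = quad(integrand, -kappa, np.inf)
    return 1.0 / value

print(alpha_c(0.0))   # 2.0  (critical capacity for kappa = 0)
print(alpha_c(1.2))   # ~0.42, the value alpha_G used in section 3
```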

A straightforward generalization of the perceptron learning algorithm [6] suggests the following local dynamical learning rule:

\Delta c_{ij} \propto \pi_i^{(\mu)} \pi_j^{(\mu)} \, \Theta\big( \kappa - R_i^{(\mu)} \big).    (7)

Thus, patterns which are already well embedded have no impact on the modification of the synaptic efficacies, whereas patterns not satisfying (5) become effective in modifying the couplings. If, on average, the magnitudes of the polarization parameters R_i^{(μ)} in (4) are maximized, the sizes of the basins of attraction are also expected to be fairly close to their optimal magnitudes. Here it is useful to start from « tabula rasa » rather than from randomly chosen initial couplings.
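A minimal sketch of one learning pass under rule (7), assuming a unit step size and sequential presentation of the patterns; the step size, the presentation order and the handling of the « tabula rasa » start (zero couplings) are illustrative choices rather than prescriptions of the paper.

```python
import numpy as np

def learning_sweep(patterns, neighbours, c, kappa):
    """One pass of the perceptron-type rule (7): the couplings of cell i are
    reinforced by pi_i * pi_j whenever the polarization R_i (eq. (4)) is below kappa."""
    # patterns: (p, N) array of +1/-1 patterns; neighbours, c as in the sketch above
    for pi in patterns:
        norm = np.sqrt((c ** 2).sum(axis=1))
        norm[norm == 0.0] = 1.0                      # "tabula rasa" start: zero couplings
        R = pi * (c * pi[neighbours]).sum(axis=1) / norm       # polarizations, eq. (4)
        weak = R < kappa                                        # condition (5) violated
        c[weak] += pi[weak][:, None] * pi[neighbours[weak]]    # unit-step increment
    return c
```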


3. Stability analysis.

Exact results can be derived in the thermodynamic limit for connectivity parameters K ∝ log N, if the connectivity as well as the patterns are chosen at random. The fractional Hamming distance between a spin configuration and an arbitrary stored pattern at time t can be given as a function of the distance at the previous time step and of the polarization parameter κ, related to the critical value of α by (6) [10, 5]:

H(t+1) = \Phi\big( H(t) \big),    (8)

where Φ is expressed in terms of the error function erf(z), defined as

\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2} \, dt.    (9)

The time evolution of H(t) admits the natural fixed points H = 0 and H = 1/2, and the existence of a phase transition from an ordered to a chaotic behaviour depends only on the magnitude of the slope of Φ(H) evaluated at the fixed points. Figure 1 shows that dΦ/dH|_{H=0} exhibits a singularity at κ = 0, which allows no recognition for α_c = 2, since the fixed point H = 0 is unstable. On the contrary, for κ ≠ 0, dΦ/dH|_{H=0} and all higher derivatives remain strictly zero. Hence, the fixed point H = 0 is always stable, and any stored pattern can in principle be recovered if it shares a sufficiently large overlap with the initial condition.
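Since the explicit form of Φ(H) is not reproduced here, the following generic sketch merely estimates the slope of a given one-step map at the two natural fixed points by finite differences, which is all the stability argument above requires; the map phi is supplied by the user and the step eps is an arbitrary choice.

```python
def fixed_point_slopes(phi, eps=1.0e-6):
    """Finite-difference slopes of a one-step map H(t+1) = phi(H(t)) at the
    fixed points H = 0 and H = 1/2; |slope| < 1 means the fixed point attracts."""
    slope_at_zero = (phi(eps) - phi(0.0)) / eps                     # one-sided at H = 0
    slope_at_half = (phi(0.5 + eps) - phi(0.5 - eps)) / (2.0 * eps)
    return slope_at_zero, slope_at_half
```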

Moreover, as can be seen from figure 1, for 0 < κ < κ_G ≈ 1.2, dΦ/dH|_{H=1/2} does not exceed the value one, which implies that in this parameter regime the fixed point H = 1/2 is also stable, giving rise to the existence of a strictly unstable fixed point H*. Hence, the memory of initial conditions can be completely erased, since the time evolution of the overlap of slightly different configurations with a stored pattern might evolve to zero. According to equation (6), κ_G corresponds to a critical capacity α_G = 0.42. In contrast to random model networks [5], the prevalence of the ordered and the chaotic phase depends not only on the network parameters, but also on the initial conditions. For κ above the critical κ_G, the unstable fixed point H* disappears and the fixed point H = 1/2 changes its stability from attraction to repulsion. Then, any initial configuration with a Hamming distance H(t = 0) < 1/2 from a stored pattern will lead to perfect recognition.


Fig. 1. - The slope of Φ(H) for a) κ = 0, b) κ = 0.5, c) κ = 1.198 and d) κ = 4.

For H ≪ 1/2 and κ ≠ 0, equation (8) can be well approximated by a simple asymptotic form, and a corresponding approximation (equation (11)) holds for H close to the value 1/2. The small-distance behaviour suggests rather fast local convergence to a stored pattern.

Note that equation (8) is a scalar mean-field equation in the thermodynamic limit, and the two basins of attraction are well separated by the relative position of the unstable fixed point H*. However, for a finite number of cells a small fractional Hamming distance does not guarantee convergence to a stored pattern, since the judgement involves a measure in N-dimensional space, where, depending on the number of stored memories, the basins of attraction can be degraded by large crevices and holes close to the stored information. These holes allow errors to be incurred for small Hamming distances from the stored states and degrade the network performance substantially.

4. Computer simulations.

In this section we give some results of our computer simulations for three network models. The couplings are trained with a fixed maximum number of learning passes, and the learning procedure (7) is stopped when the average of all polarization parameters reaches a near-optimal value. Before the retrieval phase the original patterns are degraded by random noise and presented as initial conditions for the network to recall.
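A sketch of this retrieval test: a stored pattern is corrupted by flipping a random fraction of its bits, the corrupted state is iterated under the threshold dynamics, and the fraction of correctly recalled bits is recorded. The number of synchronous steps is an arbitrary cutoff, and update() refers to the sketch in section 2.

```python
import numpy as np

def recall_fraction(pattern, neighbours, c, flip_fraction, rng, steps=50):
    """Degrade a stored pattern by random noise, run the retrieval dynamics
    and return the fraction of correctly recalled bits."""
    sigma = pattern.copy()
    flip = rng.random(sigma.size) < flip_fraction    # corrupt a random fraction of bits
    sigma[flip] *= -1
    for _ in range(steps):                           # iterate the threshold rule (3)
        sigma = update(sigma, neighbours, c)         # update() as sketched in section 2
    return float((sigma == pattern).mean())
```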

4.1 NEXT-NEIGHBOUR SQUARE LATTICE MODEL. - We first study networks with cells residing on a two-dimensional square lattice, restricted to exclusively eight next-nearest neighbour interactions. Figure 2 shows that the fraction of bits not satisfying condition (5), even with κ = 0, grows with increasing α. Thus, a portion of the patterns are not fixed points of the dynamics (1) and consequently cannot be recalled perfectly. Though for α = 0.375 and α = 0.5 figure 2 shows that the output-input ratio is essentially larger than one, the network is hardly capable of functioning as an associative memory. The results are in accord with Forrest's findings [7], who reports a somewhat smaller output-input ratio, explainable through the restriction of only allowing clipped coupling coefficients, which reduces the storage capacity and the performance of the network. For α = 0.625 the situation gets worse: the retrieval rate is in fact less than the input rate. These results are independent of N for N > 225; the simulations have been performed for N = 225, N = 625 and N = 2500, respectively, averaged over ten different initial configurations and some twenty independent systems. It is obvious, therefore, that models with exclusively nearest-neighbour interactions are not at all designed to process random patterns.

Fig. 2. - Mean fraction of correctly recalled bits as a function of the fraction of initially correct bits for K = 8 (square lattice), α = 0.375, α = 0.5 and α = 0.625 (from above). Stars represent Forrest's results for α = 0.5 based on clipped coupling coefficients (|c_ij| = 1).

4.2 RANDOM NEIGHBOUR LONG-RANGED MODEL. - In a next step we study a somewhat less restricted and more realistic class of networks, where the units choose their K neighbours arbitrarily among all the other cells of the network. As has been found for the next-nearest-neighbour model, computer simulations for K = 8 reveal that after the learning process a small portion of the patterns to be memorized are not fixed points of the dynamics (3), though this fraction decreases to zero with increasing N.

Figure 3 shows the mean fraction of correctly retrieved spins as a function of the fraction of initially correct spins for α = 0.375 and different total numbers of cells N. Note the strong size dependence, clearly indicating that, for large N, an initial damage of less than fifty percent leads to an essentially perfect recall, as predicted by Gardner's theoretical results [10] discussed in section 3.

Fig. 3. - Mean fraction of correctly recalled spins as a function of the fraction of initially correct spins for K = 8 (random connectivity), N = 225, N = 625 and N = 2500 cells for α = 0.375 (from below).

As is to be expected, the situation changes drastically for α = 0.5. The stability of the fixed point H = 1/2 substantially reduces the magnitude of the basins of attraction of the individual attractors. We observe that with increasing degree of noise the average transient length for parameter values above the critical α_G = 0.42 increases dramatically, being of the order of several magnitudes larger than below the critical α_G. Moreover, the system often does not lock into a fixed point after more than some five hundred time steps, wandering around in phase space and indicating that attractors corresponding to chaotic behaviour become dominant.

Though the performance of the random neighbour model is perfect for α = 0.375, it performs very poorly and even breaks down for capacities α above the critical value α_G. The sudden appearance of long transients and failures in reaching a cyclic mode is quite reminiscent of damage spreading effects in networks with randomly chosen couplings, where above a critical parameter value disorder grows exponentially with the number of cells [8, 9] and chaotic dynamics prevails.


4.3 QUASI-OPTIMIZED RANDOM NEIGHBOUR MODEL. - A competitive associative memory should reach a storage capacity far beyond the critical capacity α_G = 0.42. Furthermore, all the patterns to be memorized should be stable fixed points of the dynamics (1), and their basins of attraction should be as large as possible.

As a first improvement, we adopt a trial and error scheme by choosing new random neighbours if the initial random neighbour choice fails to fulfill the embedding condition (5). The strategy is repeated until the network has found its quasi-optimal connections.
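A sketch of this trial and error scheme, reusing learning_sweep() from the sketch in section 2: after training, every cell whose polarizations still violate condition (5) receives a freshly drawn random neighbour set and is retrained. The number of passes per trial, the cap on the number of trials and the decision to reset the rewired couplings to zero are illustrative assumptions.

```python
import numpy as np

def quasi_optimize(patterns, N, K, kappa, rng, max_trials=50, passes=100):
    """Trial and error neighbour search: redraw the K inputs of every cell whose
    polarizations stay below kappa (condition (5)) and train again."""
    draw = lambda i: rng.choice(np.delete(np.arange(N), i), size=K, replace=False)
    neighbours = np.array([draw(i) for i in range(N)])
    c = np.zeros((N, K))
    for _ in range(max_trials):
        for _ in range(passes):                      # learning_sweep() from section 2
            c = learning_sweep(patterns, neighbours, c, kappa)
        norm = np.sqrt((c ** 2).sum(axis=1))
        norm[norm == 0.0] = 1.0
        R = patterns * (c * patterns[:, neighbours]).sum(axis=2) / norm   # (p, N)
        unstable = np.flatnonzero((R < kappa).any(axis=0))
        if unstable.size == 0:                       # quasi-optimal connections found
            break
        for i in unstable:                           # rewire and relearn the weak cells
            neighbours[i] = draw(i)
            c[i] = 0.0
    return neighbours, c
```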

Figure 4 shows the recall performance of such a network for K = 8 and α = 0.5 for different numbers of cells N. The final connections have been obtained after a few trials on average; however, there are large fluctuations, since most of the polarization parameters satisfy (5) already after the initial random neighbour choice.

Fig. 4. - Mean fraction of correctly recalled bits as a function of the fraction of correct input bits for K = 8, N = 225, N = 625 and N = 2500 for α = 0.5 (from above). Crosses show the same quantity for N = 225 cells without the trial and error strategy.

Note that the improved model for α = 0.5 performs somewhat better than the model without quasi-optimization for α = 0.375. Furthermore, computer simulations indicate that the trial and error scheme also works successfully for α far beyond 2, though with increasing α the learning time increases rapidly, due to the larger number of trials needed until the embedding condition (5) is fulfilled. In figure 5 we compare the performance of these optimized sparsely connected networks with that of a fully connected model for α = 0.5. Though in all networks the amount of stored information in bits is alike (3200 bits), and all networks have the same number of 6400 couplings, it is quite clear from the figure that the sparsely connected networks outperform their fully connected counterparts. Note, however, that for structured information the information content is not the same.


Fig. 5. - Mean fraction of correctly recalled spins as a function of the fraction of correct input spins for sparsely connected networks consisting of N = 800, N = 400 and N = 200 cells with α = 0.5 and K = 8, K = 16 and K = 32 couplings, respectively (from the top). The dashed line shows the same quantity for a fully connected network consisting of N = 80 cells.

Due to stronger correlations within the individual patterns, the information content per bit decreases strongly with increasing resolution in the case of structured patterns. Thus, according to figure 5, depending on the technical task one can choose between memorizing a few particular patterns with relatively high resolution (K smaller) and a larger number of patterns with relatively low resolution (K larger).

Let us now study the performance of the network with optimized connectivity when the ratio of the number of connections to the number of cells, γ = K/N, remains constant. Figure 6 shows the retrieval performance of sparsely connected networks with γ = 0.16 compared to fully connected networks for different numbers of cells N and α = 0.5.

As has been observed by Forrest [4] for the case γ = 1, the recall fraction exhibits a crossover effect for increasing N also for γ < 1. As the number of cells N increases, there is a dramatic increase (decrease) in the level of errors when the fraction of correct input bits is reduced below (increased above) a critical value. It is quite evident that the smaller the quantity γ, the more the critical retrieval value is reduced, so that the performance of the network improves.

Our network model is not restricted to the storage of random information but is also able to process artificially structured patterns [11]. Here, we first connect each cell with its eight next-nearest neighbours, while additional suitable long-ranged connections emerge from a trial and error scheme. The resulting architecture is reminiscent of Braitenberg's picture [12] of the interconnectivity of cortical pyramidal cells, which make short-range connections with basal dendrites and long-range connections with apical dendrites. According to the physiological rule that low-efficacy synapses degenerate, we suggest applying a trial and error scheme in which weak connections are eliminated and replaced by new candidate neighbours.


Fig. 6. - Mean fraction of correctly recalled patterns as a function of the fraction of initially correct spins for a) N = 100, N = 200 and N = 300 cells with γ = 0.16, and b) N = 40, N = 80 and N = 120 cells with γ = 1. The storage parameter α has been chosen as 0.5.

This biologically motivated strategy is highly beneficial for improving the convergence to near-optimal answers.

5. Conclusion.

We have shown that for randomly chosen information the performance of optimized sparsely connected networks is quite superior to that of short-ranged lattice models, as well as to that of their fully connected counterparts.

It is quite evident that our strategy for solving the combinatorial optimization problem can be largely improved. Here, parallel genetic algorithms [13], in the spirit of the discrete Mendelian genetic model, can be introduced with a view to getting closer to near-optimal answers. The general philosophy is that the specific neighbour choice, as used here, and in future work also other network parameters such as the degree of connectivity and the thresholds, ought to be adapted to the specific information the network has to capture, and hence can be subject to modification during an optimization process.

Acknowledgments.

The author gratefully acknowledges financial support for this work by the German Federal Department of Research and Technology (BMFT) under grant Nr. ITR 8800K4. It is a pleasure to thank the von Seelen group for their hospitality at the Institute for Neuroinformatics in Bochum during several visits. The author appreciates many useful and stimulating discussions with J. W. Clark, J. Duarte, M. Husson, J. L. van Hemmen, U. Keller, G.


References

[1] HOPFIELD J. J., Proc. Natl. Acad. Sci. 79 (1982) 2554.
[2] GARDNER E., J. Phys. A 21 (1987) 257.
[3] DERRIDA B., GARDNER E. and ZIPPELIUS A., Europhys. Lett. 4 (1987) 167.
[4] FORREST B. M., J. Phys. A: Math. Gen. 21 (1988) 245.
[5] KÜRTEN K. E., Phys. Lett. A 129 (1988) 157.
[6] MINSKY M. L. and PAPERT S., Perceptrons (Cambridge, MA: MIT Press) 1969.
[7] FORREST B. M., J. Phys. France 50 (1989) 2003.
[8] STAUFFER D., Philos. Mag. B 56 (1987) 901.
[9] KÜRTEN K. E., J. Phys. France 50 (1989) 2313.
[10] GARDNER E., J. Phys. A 22 (1989) 1969.
[11] KÜRTEN K. E., Parallel Processing in Neural Systems and Computers, Eds. R. Eckmiller, G. Hartmann and G. Hauske (North-Holland) 1990, pp. 191-194.
[12] BRAITENBERG V., Lect. Notes Biomath. 21 (1978) 171 (Springer Verlag, Berlin).
[13] MÜHLENBEIN H. and KINDERMANN J., Connectionism in Perspective, Eds. R. Pfeifer et al.
