Performance enhancement of Willshaw type networks through the use of limit cycles


HAL Id: jpa-00212536

https://hal.archives-ouvertes.fr/jpa-00212536

Submitted on 1 Jan 1990



G.A. Kohring


Short Communication

Performance enhancement of Willshaw type networks through the use of limit cycles

G.A. Kohring (*)

HLRZ an der KFA Jülich, D-5170 Jülich, F.R.G.

(Received 24 August 1990, accepted 30 August 1990)

Abstract. Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of $N = 36\,864$ neurons ($1.4 \times 10^{9}$ couplings), it is shown to achieve effective updating speeds as high as $1.6 \times 10^{11}$ coupling evaluations per second on one Cray-YMP processor.

LE JOURNAL DE PHYSIQUE

J. Phys. France 51 (1990) 2387-2393, 1 November 1990

Classification
Physics Abstracts 87.30G

Sparse coding, i.e. coding a pattern such that the number of ones is much smaller than the number of zeros or minus ones, has been of interest to the neural network community for many years. Neurological modellers, for example, seek in sparse coding an explanation of the low neuron firing rates [1, 2] which have been experimentally observed in the cerebral cortex [3]; while application oriented researchers note that neural networks are computationally more efficient than conventional methods for associative memory tasks only when the number of stored patterns per neuron is greater than the number of incoming synapses per neuron [4, 5]. This latter condition is effectively fulfillable only for sparsely coded patterns, since for non-sparsely coded patterns the basins of attraction will be negligibly small whenever this constraint is satisfied [5, 6].

One of the earliest models for sparsely coded patterns, and one which has appealed to both of the above groups, is the model introduced by Willshaw et al. [7, 2, 4, 8]. This model consists of $N$ neurons, $\{S_i \mid i = 1, \ldots, N\}$, taking on the values $\{0, 1\}$ for the resting and firing states respectively.


In accord with current neurological thinking, synaptic plasticity is assumed to occur only at excitatory synapses and only when both the presynaptic and postsynaptic neurons fire simultaneously. Inhibition is assumed to be the mechanism whereby nature controls the neuron firing rates and thus achieves sparse coding [1]. A further attribute of the Willshaw model is the use of clipped synapses, i.e., the synapses, $\{J_{ij} \mid i, j = 1, \ldots, N;\ J_{ii} = 0\}$, take only the two values, $\{0, 1\}$. Such a drastic reduction of the coupling phase space is not biologically justifiable except under certain conditions when the post-synaptic membrane of a dendritic spine is active [9], but it has been shown to improve the information storage efficiency of neural networks [5] and, due to its simplicity, it is of much interest to the applied research community. Combined with the above sparse coding of patterns, this leads to improved retrieval performance as well as fast and efficient updating and learning algorithms.

Information is stored in the Willshaw model by a learning procedure which, in contrast to other learning prescriptions, is comparatively simple. For concreteness, consider the case of storing sparsely coded random patterns. Each node, $\{\xi_i^{\mu} \mid i = 1, \ldots, N;\ \mu = 1, \ldots, P\}$, of the $P$ random patterns, is selected to be either 1 or 0 with probability $a^{\mu}$ and $1 - a^{\mu}$, respectively. The pattern activity, $a^{\mu}$, should satisfy $a^{\mu} \ll 1$, and may be different for different patterns. Previous work usually included the extra restriction, $\sum_i \xi_i^{\mu} = aN$, with the same activity level for all patterns [2, 7, 8]. However, this is not necessary provided the network updating rule is appropriately modified by the addition of inhibitory interactions.

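From the chosen patterns, the couplings are defined by the equation:

$$J_{ij} = \begin{cases} 1 & \text{if } \sum_{\mu=1}^{P} \xi_i^{\mu}\, \xi_j^{\mu} > 0 \text{ and } i \neq j, \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$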

This equation is well suited for hardware implementation in artificial neural networks as well as efficient programming on conventional computers. Through the use of multineuron coding [10], i.e., storing the neurons as single bits rather than single words, one can take advantage of the parallelization inherent in conventional computers. For the case of equation (1), one should store the state of the $i$-th neuron from each of $B$ patterns in one computer word of $B$ bits.

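Then, with $\xi_i^{(\nu)}$ denoting the $\nu$-th such word for neuron $i$, equation (1) can be simply rewritten as:

$$J_{ij} = \begin{cases} 1 & \text{if } \bigvee_{\nu=1}^{P/B} \left( \xi_i^{(\nu)} \wedge \xi_j^{(\nu)} \right) \neq 0 \text{ and } i \neq j, \\ 0 & \text{otherwise,} \end{cases} \qquad (2)$$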

where the symbol, $\wedge$, represents the logical AND operator and the symbol, $\vee$, represents the logical OR operator. In this algorithm, the execution of the AND operation calculates the contribution to the couplings of $B$ patterns simultaneously. After OR-ing this contribution with the contribution from all other $P/B$ words containing the stored patterns, $J_{ij}$ is set to one if the resulting word is nonzero and to zero otherwise. Such an algorithm requires $O(P N^{2} / B)$ logical operations for learning $P$ patterns. For neural networks with binary valued synapses, such an algorithm may be the fastest and [...] new patterns are added. Hence, this would seem to be the only learning rule for binary synapses which has a chance of being implemented in any practical device.
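The learning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's Cray implementation: the function names (`pack_patterns`, `learn_couplings`, `random_sparse_patterns`), the 64-bit word size and the toy network size are all chosen for the example, and Python integers stand in for machine words.

```python
import random

WORD_BITS = 64  # assumed word size B; the text only speaks of "B bits"


def pack_patterns(patterns, word_bits=WORD_BITS):
    """Multi-neuron coding: packed[i][w] holds the state of neuron i
    in patterns w*B ... w*B + B - 1, one pattern per bit."""
    n_neurons = len(patterns[0])
    n_words = (len(patterns) + word_bits - 1) // word_bits
    packed = [[0] * n_words for _ in range(n_neurons)]
    for mu, pattern in enumerate(patterns):
        word, bit = divmod(mu, word_bits)
        for i, state in enumerate(pattern):
            if state:
                packed[i][word] |= 1 << bit
    return packed


def learn_couplings(packed):
    """Clipped rule: J[i][j] = 1 if neurons i and j fire together in at
    least one stored pattern, else 0; the diagonal stays at 0."""
    n_neurons = len(packed)
    J = [[0] * n_neurons for _ in range(n_neurons)]
    for i in range(n_neurons):
        for j in range(i + 1, n_neurons):
            acc = 0
            for word_i, word_j in zip(packed[i], packed[j]):
                acc |= word_i & word_j  # one AND covers B patterns at once
            J[i][j] = J[j][i] = 1 if acc else 0
    return J


def random_sparse_patterns(n_neurons, n_patterns, activity, seed=0):
    """Each node is 1 with probability `activity`, else 0."""
    rng = random.Random(seed)
    return [[1 if rng.random() < activity else 0 for _ in range(n_neurons)]
            for _ in range(n_patterns)]


patterns = random_sparse_patterns(n_neurons=32, n_patterns=150, activity=0.05)
J = learn_couplings(pack_patterns(patterns))
```

With 150 patterns and 64-bit words, each neuron is described by three words, so the inner loop performs three AND/OR pairs per coupling instead of 150 multiplications.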

Willshaw et al. have shown in their original paper [7] that this model network reaches a maximum information content of $\mathcal{E} = \ln 2$ stored bits per bit of synapse, when the activity of each pattern is given by $a = (N \ln 2)^{-1} \ln N$. However, this calculation is only based upon the static stability of the patterns. It does not take into account the possibility that the patterns may be dynamically unstable, i.e., that the basins of attraction are of negligible size. (In other models it is known that the basins of attraction tend to zero size at the limit of maximum storage efficiency [5, 6].) The condition of dynamic stability can only be investigated by studying the network dynamics through numerical simulations.
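As a quick numerical check, the optimal activity $a = (N \ln 2)^{-1} \ln N$ can be evaluated for the network sizes used in this paper. The helper name `optimal_activity` is an assumption made for this sketch.

```python
import math


def optimal_activity(n_neurons):
    """Pattern activity a = ln(N) / (N ln 2) at maximum information content."""
    return math.log(n_neurons) / (n_neurons * math.log(2))


for n in (20480, 36864):
    a = optimal_activity(n)
    # a * N equals log2(N), so each pattern has only about log2(N) firing neurons
    print(f"N = {n}: a = {a:.5f}, firing neurons per pattern = {a * n:.1f}")
```

For $N = 20\,480$ this evaluates to roughly 0.0007, which agrees with the lowest average activity quoted for figure 2.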

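The dynamical updating rule for a single layer, recurrent network operating in the noise free limit is given by:

$$S_i(t+1) = \Theta\big(h_i(S(t))\big), \qquad (3)$$

where $\Theta$ is the step function and,

$$h_i(S(t)) = \sum_{j=1}^{N} J_{ij}\, S_j(t) - \lambda \sum_{j=1}^{N} S_j(t). \qquad (4)$$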

The first term on the right hand side of equation (4) is the usual Willshaw local field contribution, while the second, a global inhibitory term [2], has the effect of stabilizing patterns with different activation levels. This obviates the previous restriction of only storing patterns with one definite activation level. $\lambda$ is an a priori defined constant in the range $0 < \lambda < 1$.
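A small Python sketch may help to see how the global inhibitory term lets patterns with different numbers of firing neurons coexist. It assumes a local field of the form $h_i = \sum_j J_{ij} S_j - \lambda \sum_j S_j$ with a neuron firing when $h_i > 0$; the strict inequality, the value $\lambda = 0.5$, the toy patterns and all function names are assumptions made for the illustration.

```python
def local_field(J, S, i, lam):
    """h_i = sum_j J[i][j] * S[j] - lam * sum_j S[j]."""
    excitation = sum(J_ij * S_j for J_ij, S_j in zip(J[i], S))
    return excitation - lam * sum(S)


def parallel_update(J, S, lam=0.5):
    """One noise-free parallel step: neuron i fires if h_i > 0 (assumed)."""
    return [1 if local_field(J, S, i, lam) > 0 else 0 for i in range(len(S))]


def clipped_couplings(patterns):
    """J[i][j] = 1 if i and j fire together in some pattern; J[i][i] = 0."""
    n = len(patterns[0])
    return [[1 if i != j and any(p[i] and p[j] for p in patterns) else 0
             for j in range(n)] for i in range(n)]


def indicator(n, active):
    """Binary state vector with ones at the indices in `active`."""
    return [1 if i in active else 0 for i in range(n)]


N = 12
# Three stored patterns with 3, 4 and 5 firing neurons respectively.
patterns = [indicator(N, {0, 1, 2}),
            indicator(N, {3, 4, 5, 6}),
            indicator(N, {7, 8, 9, 10, 11})]
J = clipped_couplings(patterns)

cue = indicator(N, {3, 4, 5})       # second pattern with one neuron missing
recalled = parallel_update(J, cue)  # the missing neuron is switched back on
```

Because the inhibition scales with the number of currently firing neurons, the same $\lambda$ stabilizes all three patterns even though their activity levels differ.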

When studying the retrieval properties of this model, most of the simulation time is taken up with the calculation of the local fields, $h_i(S(t))$. This time can be improved upon by using multi-neuron coding and taking advantage of the sparse distribution of 1's in each pattern. Equation (4) is best calculated by storing $B$ neurons from $S(t)$ in a single word of $B$ bits, and $B$ synapses from $J_{ij}$ in a single word of $B$ bits. Since only those $S(t)$ which have the value $S_i(t) = 1$ contribute to the sum in equation (4), one should, in principle, sum only over such terms. In practice, the time needed to determine whether or not a particular $S_i(t)$ is 0 or 1 is of the same order of magnitude as the time needed to carry out the multiplication and addition. This problem can be countered by using a fully parallel updating scheme and interchanging the order in which the calculations are performed. As an example of such a calculational scheme, consider the following algorithm:

POPCNT(x) counts the number of bits set equal to one in the word x. The innermost loop of this algorithm can be fully vectorized using Cray-standard Fortran. Such a program comes close to realizing the maximum possible $O(B/a)$ speed-up over conventional algorithms which use [...] paper is the Hopfield-Little model [11] with the commonly used continuous synapses.)
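The bit-packed evaluation of the local fields described above can be sketched in Python as follows. This is one possible arrangement written for illustration and is not a reconstruction of the author's Fortran listing: the loop order, the 64-bit word size, the threshold $h_i > 0$, the value $\lambda = 0.5$ and every function name are assumptions. Python integers stand in for machine words and `popcount` plays the role of POPCNT.

```python
import random

WORD_BITS = 64  # assumed word size B


def popcount(x):
    """Number of bits set to one in the word x."""
    return bin(x).count("1")


def pack_bits(bits, word_bits=WORD_BITS):
    """Store `word_bits` binary values per integer word."""
    words = [0] * ((len(bits) + word_bits - 1) // word_bits)
    for k, bit in enumerate(bits):
        if bit:
            words[k // word_bits] |= 1 << (k % word_bits)
    return words


def packed_parallel_update(J_words, S, lam=0.5):
    """Parallel update with bit-packed couplings.

    J_words[i][w] packs the synapses J[i][j] for the neurons j in word w.
    The outer loop runs over the words of S and skips empty ones, which
    sparse coding makes common; the inner loop over i is the part that
    would be vectorized.
    """
    n = len(S)
    S_words = pack_bits(S)
    activity = sum(popcount(word) for word in S_words)
    excitation = [0] * n
    for w, s_word in enumerate(S_words):
        if s_word == 0:
            continue  # no firing neuron stored in this word
        for i in range(n):
            excitation[i] += popcount(J_words[i][w] & s_word)
    return [1 if excitation[i] - lam * activity > 0 else 0 for i in range(n)]


# Toy comparison with the word-per-neuron formula, on random couplings.
rng = random.Random(1)
N = 200
J = [[0 if i == j else int(rng.random() < 0.3) for j in range(N)]
     for i in range(N)]
S = [int(rng.random() < 0.02) for _ in range(N)]
J_words = [pack_bits(row) for row in J]
fast = packed_parallel_update(J_words, S)
slow = [1 if sum(J[i][j] * S[j] for j in range(N)) - 0.5 * sum(S) > 0 else 0
        for i in range(N)]
```

Each AND followed by a population count replaces up to 64 multiply-add operations, and words of $S(t)$ without any firing neuron cost nothing.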

In figure 1 is a plot of the time needed for one complete parallel update of a fully connected network when the network is loaded to its maximum storage efficiency. Note that for models with continuous synapses, the main memory of the Cray-YMP/832 cannot hold networks with more than $2.2 \times 10^{7}$ couplings, i.e., a fully-connected network with no more than $N = 4\,672$ neurons.

Fig. 1. - Plot of the time needed for one parallel update of the entire network as a function of pattern activity for a single Cray-YMP processor. Triangles are for a network of $N = 20\,480$ neurons. In the insert is shown a plot of update time versus $N$. Squares are for a conventional algorithm implementing continuous synapses and the circles are for the present algorithm when $\langle a \rangle = (N \ln 2)^{-1} \ln N$.

From this plot it is clearly seen that there are orders of magnitude differences in the updating times of the two algorithms. At our maximum network size of $N = 36\,864$, the updating time corresponds to a sustained speed of $1.64 \times 10^{11}$ coupling evaluations per second (164 Gcops). The speed and simplicity of such algorithms make them attractive for practical applications since they can be hard-wired with conventional technology.
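The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that one "coupling evaluation" is counted for each of the $N^{2}$ couplings of a fully connected network in every parallel update; the resulting time per update is derived here and is not a number stated in the text. The function name `n_couplings` is hypothetical.

```python
def n_couplings(n_neurons):
    """Number of entries J_ij in a fully connected network (N * N)."""
    return n_neurons * n_neurons


largest = n_couplings(36864)          # largest network size quoted in the text
speed = 1.64e11                       # coupling evaluations per second
seconds_per_update = largest / speed  # derived under the assumption above

print(f"couplings at N = 36864: {largest:.2e}")
print(f"time per parallel update: {seconds_per_update * 1e3:.1f} ms")
print(f"couplings at N = 4672: {n_couplings(4672):.2e}")
```

The two coupling counts come out close to the $1.4 \times 10^{9}$ and $2.2 \times 10^{7}$ given in the text.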

As mentioned above, the usefulness of such models for associative memory applications depends not only on the information storage capacity, but also on the retrieval properties. Figure 2 shows a plot of the retrieval performance for several different values of the average pattern activity. For each network, the number of stored patterns has been adjusted so that the information content per synapse is approximately the same, $\mathcal{E} = 0.086 \pm 0.002$. Given the different activity levels in the different sets of patterns, this gives a better comparison of network performance than simply storing the same number of patterns in each network.

At $\mathcal{E} \approx 0.086$, one would have expected a network whose stored patterns have an average activity given by $\langle a \rangle = 0.0007$ to exhibit better retrieval properties than a network whose patterns have average activity given by $\langle a \rangle = 0.0035$, because the latter network is operating closer to its overloading threshold of $\mathcal{E} \approx 0.093$ [7, 8]. From figure 2, however, it is clearly seen that this is

Fig. 2. - Plot of the percentage of correctly retrieved patterns as a function of the initial normalized Hamming distance. The network is fully connected with $N = 20\,480$ neurons and the number of stored patterns has been adjusted to the average activity level so that the information content is $\mathcal{E} = 0.086 \pm 0.002$ bits stored per bit of synapse. Circles: $\langle a \rangle = 0.0007$, $P = 206\,336$; Squares: $\langle a \rangle = 0.003$, $P = 61\,184$; Triangles are for the same system as the squares but for the case of storing cycles consisting of 64 states. In the insert is shown a plot of the minimum normalized Hamming distance which is needed to correctly recall 50% of the initial states. All of the networks contain the same amount of information.

[...] activity exhibits better retrieval properties (i.e., up to the point where the pattern activity cannot support the fixed information capacity). A sketch showing the minimum initial Hamming distance needed to ensure correct retrieval 50% of the time as a function of the average activity is shown in figure 2 for $N = 20\,480$.

What these figures show is that the optimal network parameters determined by simply static properties, such as information capacity, are no longer optimal when dynamic properties, such as retrieval performance, are also taken into account. This conclusion is in agreement with other results found in a variety of different models [5, 6].

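One criticism of storing patterns with such low activity levels is that each pattern contains very little information. With the information content of a pattern having activity, $a$, given by the usual entropy expression;

$$I(a) = -N \left[ a \log_2 a + (1 - a) \log_2 (1 - a) \right], \qquad (5)$$

then, the ratio of the information contents of patterns with different activity levels is given by:

$$\Phi(a_1, a_2) = \frac{I(a_1)}{I(a_2)} = \frac{a_1 \log_2 a_1 + (1 - a_1) \log_2 (1 - a_1)}{a_2 \log_2 a_2 + (1 - a_2) \log_2 (1 - a_2)}. \qquad (6)$$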

In the present model, one is in the situation of being able to recall patterns with low information [...] a network model with a much slower updating time. One way out of this dilemma is to transform patterns with higher information content into a set of patterns with lower information content and to recall this set of patterns as a dynamical limit cycle rather than a fixed point.

For random patterns of activity level, $a_1 = 1/2$, the function, $\Phi$, in equation (6) gives the minimum number of patterns having activity level, $a_2$, which are necessary to encode all the information contained in the original pattern as:

$$P_c = \Phi(1/2, a_2) = \frac{-1}{a_2 \log_2 a_2 + (1 - a_2) \log_2 (1 - a_2)}. \qquad (7)$$

Once the information has been appropriately transformed, the dynamical cycle of patterns, $\{\xi_i^{\mu} \mid i = 1, \ldots, N;\ \mu = 1, \ldots, P_c\}$, can be stored in the network by adjusting the couplings previously determined with equation (2) from $J_{ij}$ to $\tilde{J}_{ij}$ as:
$$\tilde{J}_{ij} = \begin{cases} 1 & \text{if } \sum_{\mu=1}^{P_c} \xi_i^{\mu+1}\, \xi_j^{\mu} > 0 \text{ and } i \neq j, \\ 0 & \text{otherwise,} \end{cases} \qquad (8)$$

where, $\xi_i^{P_c + 1} \equiv \xi_i^{1}$.

Although it may seem counterintuitive to expect such highly clipped synapses to be able to store cycles, the results in figure 2 show that the retrieval properties are as good as those for fixed point storage in an equivalent network. (For programming convenience the cycles have a length of 64 states each, but cycles of any length are possible.)
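That clipped synapses can carry a cycle is easy to see in a toy example. The Python sketch below assumes that the cycle couplings are built from consecutive pattern pairs only (each pattern pointing to its successor, the last one back to the first) and reuses the update rule with global inhibition, $\lambda = 0.5$ and threshold $h_i > 0$; these choices, the function names and the tiny four-state cycle are assumptions made for the illustration.

```python
def cycle_couplings(cycle):
    """Clipped, asymmetric couplings: J[i][j] = 1 if neuron j fires in some
    pattern of the cycle and neuron i fires in the pattern that follows it."""
    n = len(cycle[0])
    J = [[0] * n for _ in range(n)]
    for mu, current in enumerate(cycle):
        following = cycle[(mu + 1) % len(cycle)]  # last pattern leads to the first
        for i in range(n):
            for j in range(n):
                if i != j and following[i] and current[j]:
                    J[i][j] = 1
    return J


def parallel_update(J, S, lam=0.5):
    """One parallel step with global inhibition (threshold h_i > 0 assumed)."""
    total = sum(S)
    return [1 if sum(a * b for a, b in zip(row, S)) - lam * total > 0 else 0
            for row in J]


def indicator(n, active):
    """Binary state vector with ones at the indices in `active`."""
    return [1 if i in active else 0 for i in range(n)]


N = 12
cycle = [indicator(N, {0, 1, 2}), indicator(N, {3, 4, 5}),
         indicator(N, {6, 7, 8}), indicator(N, {9, 10, 11})]
J = cycle_couplings(cycle)

state = indicator(N, {0, 1})  # noisy cue: first pattern with one neuron missing
trajectory = []
for _ in range(8):
    state = parallel_update(J, state)
    trajectory.append(state)
```

Starting from the incomplete cue, the network steps through the remaining patterns in order and then keeps repeating the four-state cycle.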

Previous researchers have discussed using cycles to store time sequences [12] or to achieve low, local neuron firing rates [1, 2]. Here, however, the cycles play the fundamental role of conveying more information than can be conveyed in a single pattern of low activity. Combined with the updating algorithm presented here, this means that the length of time needed to completely recall a given amount of information is proportional to that amount of information. This is in agreement with anecdotal results from human beings, but it is in contrast to previous algorithms and models where the time needed to recall information is independent of the amount of information being recalled.

Furthermore, by equation (7), it requires 34 patterns having activity level $\langle a \rangle = 0.003$ to store as much information as is contained in a single random pattern of activity level $\langle a \rangle = 0.5$. Such a random pattern on a network of $N = 20\,480$ neurons would need over 2 000 ms for retrieval using a conventional model and algorithm. On the other hand, with the present model and algorithm, the entire 34 state cycle of low activity patterns can be recalled in about 450 ms. Hence, the present model is computationally very efficient and could be quite advantageous in practical applications.
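The count of 34 patterns can be reproduced directly from the entropy of a random binary pattern. The sketch below evaluates the ratio of information contents for activities 0.5 and 0.003 and rounds it up to a whole number of patterns; the function names are hypothetical, and the time per state and the speed-up are simple quotients of the timings quoted above.

```python
import math


def entropy_per_neuron(a):
    """Information in bits per neuron of a random pattern with activity a."""
    return -(a * math.log2(a) + (1 - a) * math.log2(1 - a))


def patterns_needed(a_high, a_low):
    """Ratio of information contents, rounded up to whole patterns."""
    return math.ceil(entropy_per_neuron(a_high) / entropy_per_neuron(a_low))


cycle_length = patterns_needed(0.5, 0.003)
ms_per_state = 450 / cycle_length  # from the "about 450 ms" for the whole cycle
speed_up = 2000 / 450              # lower bound, since the text says "over 2 000 ms"

print(cycle_length, round(ms_per_state, 1), round(speed_up, 1))
```

The ratio itself is slightly below 34, so 34 is the smallest whole number of low activity patterns that can hold the information of one pattern with activity 0.5.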

One problem which still needs further work is the question of finding a convenient method for converting patterns of high activity into a minimal set of patterns of low activity. This, however, raises the more general question of how to code real world patterns for efficient processing by neural networks. In biological systems, an image at the retina undergoes many transformations before reaching the visual cortex, where it is presumably stored in some highly compact form [13]. It is likely that artificial networks may need to use similar types of transformations before they can deal efficiently with "real-world" patterns.

In summary, a fast and efficient algorithm for simulating Willshaw type models has been presented. Using this algorithm, it has been shown that the optimal value of the pattern activity in terms of the retrieval properties is not in agreement with the optimal value previously determined [...] information content patterns has also been demonstrated and shown to be computationally more efficient under the present algorithm than under conventional algorithms.

Acknowledgements.

I am very grateful to D. Stauffer for many helpful conversations related to this work.

References

[1] AMIT D.J. and TREVES A., Proc. Nat. Acad. Sci. USA 86 (1989) 7871; BUHMANN J., Phys. Rev. A40 (1989) 4145; TREVES A. and AMIT D.J., J. Phys. A22 (1989) 2205.

[2] GOLOMB D., RUBIN N. and SOMPOLINSKY H., Phys. Rev. A41 (1990) 1843.

[3] ABELES M., VAADIA E. and BERGMAN H., Network 1 (1990) 13.

[4] PALM G., Biol. Cybern. 36 (1980) 19; THAKOOR A.P., MOOPENN A., LAMBE J. and KHANNA S.K., Appl. Opt. 26 (1987) 5085.

[5] KOHRING G.A. (preprint) HLRZ-33/90.

[6] KANTER I. and SOMPOLINSKY H., Phys. Rev. A35 (1987) 380; FORREST B.M., J. Phys. A21 (1988) 245; HORNER H., BORMANN D., FRICK M., KINZELBACH H. and SCHMIDT A., Z. Phys. B76 (1989) 381; KRÄTZSCHMAR J. and KOHRING G.A., J. Phys. France 51 (1990) 223.

[7] WILLSHAW D.J., BUNEMAN O.P. and LONGUET-HIGGINS H.C., Nature 222 (1969) 960.

[8] AMARI S.-I., Neural Networks 2 (1989) 451; NADAL J.-P. and TOULOUSE G., Network 1 (1990) 61.

[9] KOCH C. and POGGIO T., in New Insights into Synaptic Function, Eds. G.M. Edelman, W.W. Gall and W.M. Cowan (John Wiley and Sons, New York) 1986.

[10] PENNA T.J.P. and OLIVEIRA P.M.C., J. Phys. A22 (1989) L719; KOHRING G.A., J. Stat. Phys. 59 (1990) 1077.

[11] HOPFIELD J.J., Proc. Nat. Acad. Sci. USA 79 (1982) 2554; LITTLE W.A., Math. Biosci. 19 (1975) 101.

[12] SOMPOLINSKY H. and KANTER I., Phys. Rev. Lett. 57 (1986) 2861; BUHMANN J. and SCHULTEN K., Europhys. Lett. 4 (1987) 1205; BAUER K. and KREY U., Z. Phys. B79 (1990) 461.

[13] DAUGMAN J.G., IEEE Trans. ASSP 36 (1988) 1169; ZETSCHE C., in Parallel Processing in Neural Systems and Computers, Eds. R. Eckmiller, G. Hartmann and G. Hauske (North-Holland, Amsterdam) 1990.

[14] KRAUTH W. and OPPER M., J. Phys. A22 (1989) L519; KÖHLER H., DIEDERICH S., KINZEL W. and OPPER M., Z. Phys. B78 (1990) 333; KÖHLER H., Inst. Th. Phys., Uni. Göttingen (preprint) 1990.