
Performance enhancement of Willshaw type networks through the use of limit cycles


HAL Id: jpa-00212536

https://hal.archives-ouvertes.fr/jpa-00212536

Submitted on 1 Jan 1990



Short Communication

Performance enhancement of Willshaw type networks through the use of limit cycles

G.A. Kohring (*)

HLRZ an der KFA Jülich, D-5170 Jülich, F.R.G.

(Received 24 August 1990, accepted 30 August 1990)

Abstract. Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of $N = 36\,864$ neurons ($1.4 \times 10^{9}$ couplings), it is shown to achieve effective updating speeds as high as $1.6 \times 10^{11}$ coupling evaluations per second on one Cray-YMP processor.

Le Journal de Physique

J. Phys. France 51 (1990) 2387-2393, 1 November 1990

Classification
Physics Abstracts 87.30G

Sparse coding, i.e. coding a pattern such that the number of ones is much smaller than the number of zeros or minus ones, has been of interest to the neural network community for many years. Neurological modellers, for example, seek in sparse coding an explanation of the low neuron firing rates [1, 2] which have been experimentally observed in the cerebral cortex [3]; while application oriented researchers note that neural networks are computationally more efficient than conventional methods for associative memory tasks only when the number of stored patterns per neuron is greater than the number of incoming synapses per neuron [4, 5]. This latter condition is effectively fulfillable only for sparsely coded patterns, since for non-sparsely coded patterns the basins of attraction will be negligibly small whenever this constraint is satisfied [5, 6].

One of the earliest models for sparsely coded patterns, and one which has appealed to both of the above groups, is the model introduced by Willshaw et al. [7, 2, 4, 8]. This model consists of $N$ neurons, $\{S_i \mid i = 1, \ldots, N\}$, taking on the values $\{0, 1\}$ for the resting and firing states respectively. In accord with current neurological thinking, synaptic plasticity is assumed to occur only at excitatory synapses and only when both the presynaptic and postsynaptic neurons fire simultaneously. Inhibition is assumed to be the mechanism whereby nature controls the neuron firing rates and thus achieves sparse coding [1].

A further attribute of the Willshaw model is the use of clipped synapses, i.e., the synapses, $\{J_{ij} \mid i, j = 1, \ldots, N;\ J_{ii} = 0\}$, take only the two values, $\{0, 1\}$. Such a drastic reduction of the coupling phase space is not biologically justifiable except under certain conditions when the post-synaptic membrane of a dendritic spine is active [9], but it has been shown to improve the information storage efficiency of neural networks [5] and, due to its simplicity, it is of much interest to the applied research community. Combined with the above sparse coding of patterns, this leads to improved retrieval performance as well as fast and efficient updating and learning algorithms.

Information is stored in the Willshaw model by a learning procedure which, in contrast to other learning prescriptions, is comparatively simple. For concreteness, consider the case of storing sparsely coded random patterns. Each node, $\{\xi_i^{\mu} \mid i = 1, \ldots, N;\ \mu = 1, \ldots, P\}$, of the $P$ random patterns, is selected to be either 1 or 0 with probability $a^{\mu}$ and $1 - a^{\mu}$, respectively. The pattern activity, $a^{\mu}$, should satisfy $a^{\mu} \ll 1$, and may be different for different patterns. Previous work usually included the extra restriction $\sum_i \xi_i^{\mu} = aN$, with the same activity level for all patterns [2, 7, 8]. However, this is not necessary provided the network updating rule is appropriately modified by the addition of inhibitory interactions. From the chosen patterns, the couplings are defined by the equation:

$$J_{ij} = \begin{cases} 1 & \text{if } \sum_{\mu=1}^{P} \xi_i^{\mu} \xi_j^{\mu} > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad i \neq j. \tag{1}$$

This equation is well suited for hardware implementation in artificial neural networks as well as efficient programming on conventional computers. Through the use of multineuron coding [10], i.e., storing the neurons as single bits rather than single words, one can take advantage of the parallelization inherent in conventional computers. For the case of equation (1), one should store the state of the $i$-th neuron from each of $B$ patterns in one computer word of $B$ bits. Then, equation (1) can be simply rewritten as:

$$J_{ij} = \bigvee_{\mu=1}^{P} \left( \xi_i^{\mu} \wedge \xi_j^{\mu} \right), \tag{2}$$

where the symbol $\wedge$ represents the logical AND operator and the symbol $\vee$ represents the logical OR operator. In this algorithm, the execution of the AND operation calculates the contribution to the couplings of $B$ patterns simultaneously. After OR-ing this contribution with the contribution from all other $P/B$ words containing the stored patterns, $J_{ij}$ is set to one if the resulting word is nonzero and to zero otherwise. Such an algorithm requires $O(PN^2/B)$ logical operations for learning $P$ patterns. For neural networks with binary valued synapses, such an algorithm may be the fastest and [...] new patterns are added. Hence, this would seem to be the only learning rule for binary synapses which has a chance of being implemented in any practical device.
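The learning rule and its multi-neuron coded form can be compared in a short Python sketch. This is only an illustration under stated assumptions: the function names (`random_patterns`, `learn_naive`, `pack_patterns`, `learn_packed`) and the word length of 64 bits are hypothetical choices, Python integers stand in for machine words, and nothing here reproduces the original Cray implementation.

```python
import random

def random_patterns(n_neurons, n_patterns, activity, seed=0):
    """Draw P random patterns; each node is 1 with probability `activity`."""
    rng = random.Random(seed)
    return [[1 if rng.random() < activity else 0 for _ in range(n_neurons)]
            for _ in range(n_patterns)]

def learn_naive(patterns):
    """Clipped rule: J[i][j] = 1 if some pattern has both i and j firing."""
    n = len(patterns[0])
    J = [[0] * n for _ in range(n)]
    for xi in patterns:
        active = [i for i, s in enumerate(xi) if s]
        for i in active:
            for j in active:
                if i != j:              # keep J_ii = 0
                    J[i][j] = 1
    return J

def pack_patterns(patterns, word_bits=64):
    """Multi-neuron coding: words[w][i] holds neuron i of `word_bits` patterns."""
    n = len(patterns[0])
    words = []
    for start in range(0, len(patterns), word_bits):
        chunk = patterns[start:start + word_bits]
        words.append([sum(xi[i] << b for b, xi in enumerate(chunk))
                      for i in range(n)])
    return words

def learn_packed(words):
    """Same couplings, using one AND per word and OR-ing over all words."""
    n = len(words[0])
    J = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            acc = 0
            for w in words:
                acc |= w[i] & w[j]      # AND covers B patterns at once
            J[i][j] = 1 if acc else 0   # nonzero word -> coupling is set
    return J
```

Both routines should produce identical coupling matrices; the packed version performs roughly $P/B$ AND/OR pairs per coupling instead of $P$ multiplications.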

Willshaw et al. have shown in their original paper [7] that this model network reaches a maximum information content of $\mathcal{E} = \ln 2$ stored bits per bit of synapse, when the activity of each pattern is given by $a_c = (N \ln 2)^{-1} \ln N$. However, this calculation is only based upon the static stability of the patterns. It does not take into account the possibility that the patterns may be dynamically unstable, i.e., that the basins of attraction are of negligible size. (In other models it is known that the basins of attraction tend to zero size at the limit of maximum storage efficiency [5, 6].) The condition of dynamic stability can only be investigated by studying the network dynamics through numerical simulations.
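As a quick numerical check of the optimal activity quoted above, the expression $(N \ln 2)^{-1} \ln N$ can be evaluated directly. The helper name `optimal_activity` is a hypothetical choice made for this sketch.

```python
import math

def optimal_activity(n_neurons):
    """Activity a_c = ln(N) / (N ln 2) at which the static capacity peaks."""
    return math.log(n_neurons) / (n_neurons * math.log(2))

# For N = 20 480 this gives an activity close to 0.0007,
# i.e. about log2(N), or roughly 14, firing neurons per pattern.
a_c = optimal_activity(20480)
mean_active = a_c * 20480
```

The value for $N = 20\,480$ agrees with the average activity of 0.0007 used for one of the simulated networks.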

The dynamical updating rule for a single layer, recurrent network operating in the noise free limit is given by:

$$S_i(t+1) = \begin{cases} 1 & \text{if } h_i(S[t]) > 0, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where,

$$h_i(S[t]) = \sum_{j} J_{ij} S_j(t) - \lambda \sum_{j} S_j(t). \tag{4}$$

The first term on the right hand side of equation (4) is the usual Willshaw local field contribution, while the second, a global inhibitory term [2], has the effect of stabilizing patterns with different activation levels. This obviates the previous restriction of only storing patterns with one definite activation level. $\lambda$ is an a priori defined constant in the range $0 < \lambda < 1$.
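A minimal sketch of one parallel update step may help here. It assumes the local field is the excitatory sum $\sum_j J_{ij} S_j$ minus a global inhibition $\lambda \sum_j S_j$, with a neuron firing when its field is positive; this form is inferred from the description of the two terms, and the threshold convention at zero field is an assumption. All function names are hypothetical.

```python
def learn_clipped(patterns):
    """Clipped couplings: J[i][j] = 1 if i and j fire together in any pattern."""
    n = len(patterns[0])
    J = [[0] * n for _ in range(n)]
    for xi in patterns:
        active = [i for i, s in enumerate(xi) if s]
        for i in active:
            for j in active:
                if i != j:
                    J[i][j] = 1
    return J

def local_fields(J, S, lam):
    """Assumed field: excitatory input minus lam times the total activity."""
    total = sum(S)
    return [sum(Jij * Sj for Jij, Sj in zip(row, S)) - lam * total
            for row in J]

def parallel_update(J, S, lam):
    """All neurons are updated simultaneously from the same state S(t)."""
    return [1 if h > 0 else 0 for h in local_fields(J, S, lam)]

def make_pattern(n, active):
    """Helper: binary pattern of length n with the given firing neurons."""
    return [1 if i in active else 0 for i in range(n)]

# Two stored patterns with different activity levels (3 and 5 firing neurons).
small = make_pattern(10, {0, 1, 2})
large = make_pattern(10, {5, 6, 7, 8, 9})
J = learn_clipped([small, large])
```

With $\lambda = 0.5$ both patterns are fixed points of this toy network even though their activities differ, and a copy of the larger pattern with one firing neuron removed is restored in a single step.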

When studying the retrieval properties of this model, most of the simulation time is taken up with the calculation of the local fields, $h_i(S[t])$. This time can be improved upon by using multi-neuron coding and taking advantage of the sparse distribution of 1's in each pattern. Equation (4) is best calculated by storing $B$ neurons from $S(t)$ in a single word of $B$ bits, and $B$ synapses from $J_{ij}$ in a single word of $B$ bits. Since only those $S(t)$ which have the value $S_i(t) = 1$ contribute to the sum in equation (4), one should, in principle, sum only over such terms. In practice, the time needed to determine whether or not a particular $S_i(t)$ is 0 or 1 is of the same order of magnitude as the time needed to carry out the multiplication and addition. This problem can be countered by using a fully parallel updating scheme and interchanging the order in which the calculations are performed. As an example of such a calculational scheme, consider the following algorithm:

POPCNT(x) counts the number of bits set equal to one in the word x. The innermost loop of this algorithm can be fully vectorized using Cray-standard Fortran. Such a program comes close to realizing the maximum possible $O(B/a)$ speed-up over conventional algorithms which use [...] paper is the Hopfield-Little model [11] with the commonly used continuous synapses.)
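The original algorithm listing is not reproduced in this text, so the following Python sketch shows only one possible arrangement consistent with the description: the state and each row of couplings are packed into words, the outer loop runs over the words of the state and skips those with no firing neuron, and the inner loop over neurons accumulates a population count of the AND of two words. The names `pack_bits`, `popcnt`, `packed_update` and `dense_update` are hypothetical, and the local field is assumed to be the excitatory sum minus $\lambda$ times the total activity.

```python
def pack_bits(bits, word_bits=64):
    """Pack a list of 0/1 values into integers of `word_bits` bits each."""
    words = []
    for start in range(0, len(bits), word_bits):
        w = 0
        for b, bit in enumerate(bits[start:start + word_bits]):
            w |= bit << b
        words.append(w)
    return words

def popcnt(x):
    """Number of bits set to one in the non-negative integer x."""
    return bin(x).count("1")

def packed_update(J_words, S, lam, word_bits=64):
    """Parallel update; J_words[i] is row i of the couplings, bit-packed."""
    n = len(S)
    S_words = pack_bits(S, word_bits)
    total = sum(popcnt(w) for w in S_words)
    h = [0] * n
    for w, s_word in enumerate(S_words):
        if s_word == 0:
            continue                    # sparse state: most words are empty
        for i in range(n):              # inner loop: the vectorizable part
            h[i] += popcnt(J_words[i][w] & s_word)
    return [1 if h_i - lam * total > 0 else 0 for h_i in h]

def dense_update(J, S, lam):
    """Reference version with one multiplication per coupling."""
    total = sum(S)
    return [1 if sum(a * b for a, b in zip(row, S)) - lam * total > 0 else 0
            for row in J]
```

The packed routine should return the same new state as the dense reference while touching only the words of the state that contain at least one firing neuron.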

Figure 1 shows a plot of the time needed for one complete parallel update of a fully connected network when the network is loaded to its maximum storage efficiency. Note that for models with continuous synapses, the main memory of the Cray-YMP/832 cannot hold networks with more than $2.2 \times 10^{7}$ couplings, i.e., a fully-connected network with no more than $N = 4\,672$ neurons.

Fig. 1. - Plot of the time needed for one parallel update of the entire network as a function of pattern activity for a single Cray-YMP processor. Triangles are for a network of $N = 20\,480$ neurons. In the insert is shown a plot of update time versus $N$. Squares are for a conventional algorithm implementing continuous synapses and the circles are for the present algorithm when $\langle a \rangle = (N \ln 2)^{-1} \ln N$.

From this plot it is clearly seen that there are orders of magnitude differences in the updating times of the two algorithms. At our maximum network size of $N = 36\,864$, the updating time corresponds to a sustained speed of $1.64 \times 10^{11}$ coupling evaluations per second (164 Gcops). The speed and simplicity of such algorithms make them attractive for practical applications since they can be hard-wired with conventional technology.

As mentioned above, the usefulness of such models for associative memory applications depends not only on the information storage capacity, but also on the retrieval properties. Figure 2 shows a plot of the retrieval performance for several different values of the average pattern activity. For each network, the number of stored patterns has been adjusted so that the information content per synapse is approximately the same, $\mathcal{E} = 0.086 \pm 0.002$. Given the different activity levels in the different sets of patterns, this gives a better comparison of network performance than simply storing the same number of patterns in each network.

At $\mathcal{E} \approx 0.086$, one would have expected a network whose stored patterns have an average activity given by $\langle a \rangle = 0.0007$ to exhibit better retrieval properties than a network whose patterns have average activity given by $\langle a \rangle = 0.0035$, because the latter network is operating closer to its overloading threshold of $\mathcal{E} \approx 0.093$ [7, 8]. From figure 2, however, it is clearly seen that this is [...] activity exhibits better retrieval properties (i.e., up to the point where the pattern activity cannot support the fixed information capacity). A sketch showing the minimum initial Hamming distance needed to ensure correct retrieval 50% of the time as a function of the average activity is shown in figure 2 for $N = 20\,480$.

Fig. 2. - Plot of the percentage of correctly retrieved patterns as a function of the initial normalized Hamming distance. The network is fully connected with $N = 20\,480$ neurons and the number of stored patterns has been adjusted to the average activity level so that the information content is $\mathcal{E} = 0.086 \pm 0.002$ bits stored per bit of synapse. Circles: $\langle a \rangle = 0.0007$, $P = 206\,336$; Squares: $\langle a \rangle = 0.003$, $P = 61\,184$; Triangles are for the same system as the squares but for the case of storing cycles consisting of 64 states. In the insert is shown a plot of the minimum normalized Hamming distance which is needed to correctly recall 50% of the initial states. All of the networks contain the same amount of information.

What these figures show is that the optimal network parameters determined by simply static properties, such as information capacity, are no longer optimal when dynamic properties, such as retrieval performance, are also taken into account. This conclusion is in agreement with other results found in a variety of different models [5, 6].

One criticism of storing patterns with such low activity levels is that each pattern contains very little information. With the information content of a pattern having activity $a$ given by the usual entropy expression,

$$I(a) = -a \log_2 a - (1 - a) \log_2 (1 - a), \tag{5}$$

the ratio of the information contents of patterns with different activity levels is given by:

$$\Phi(a_1, a_2) = \frac{I(a_1)}{I(a_2)} = \frac{a_1 \log_2 a_1 + (1 - a_1) \log_2 (1 - a_1)}{a_2 \log_2 a_2 + (1 - a_2) \log_2 (1 - a_2)}. \tag{6}$$

In the present model, one is in the situation of being able to recall patterns with low information [...] a network model with a much slower updating time. One way out of this dilemma is to transform patterns with higher information content into a set of patterns with lower information content and to recall this set of patterns as a dynamical limit cycle rather than a fixed point.

For random patterns of activity level $a_1 = 1/2$, the function $\Phi$ in equation (6) gives the minimum number of patterns having activity level $a_2$ which are necessary to encode all the information contained in the original pattern as:

$$P_c = \Phi(1/2, a_2) = \frac{-1}{a_2 \log_2 a_2 + (1 - a_2) \log_2 (1 - a_2)}. \tag{7}$$
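The minimum cycle length can be checked numerically with a few lines of Python. The sketch assumes the information content per node is the binary entropy of the activity, measured in bits, and rounds the ratio up to a whole number of patterns; the function names are hypothetical.

```python
import math

def entropy_bits(a):
    """Binary entropy of a node that fires with probability a, in bits."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def info_ratio(a1, a2):
    """Information in a pattern of activity a1 relative to one of activity a2."""
    return entropy_bits(a1) / entropy_bits(a2)

def min_cycle_length(a2, a1=0.5):
    """Smallest whole number of activity-a2 patterns carrying as much information."""
    return math.ceil(info_ratio(a1, a2))

# A random pattern (activity 1/2) encoded with patterns of activity 0.003.
p_c = min_cycle_length(0.003)
```

For an activity of 0.003 the ratio is just under 34, so 34 sparse patterns are needed to carry the information of one random pattern of activity 1/2.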

Once the information has been appropriately transformed, the dynamical cycle of patterns, $\{\xi_i^{\mu} \mid i = 1, \ldots, N;\ \mu = 1, \ldots, P_c\}$, can be stored in the network by adjusting the couplings previously determined with equation (2) from $J_{ij}$ to $\tilde{J}_{ij}$ as: [...] where $\xi_i^{P_c+1} \equiv \xi_i^{1}$. Although it may seem counterintuitive to expect such highly clipped synapses to be able to store cycles, the results in figure 2 show that the retrieval properties are as good as those for fixed point storage in an equivalent network. (For programming convenience the cycles have a length of 64 states each, but cycles of any length are possible.)
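The coupling equation for cycles is not reproduced in this text, so the sketch below rests on an assumption: a coupling is set to one whenever neuron $j$ fires in some pattern of the cycle and neuron $i$ fires in the pattern that follows it, with the last pattern followed by the first. Whether the original rule also retains the fixed-point couplings is not known from this text. The update assumes a field of excitatory input minus $\lambda$ times the total activity. Function names are hypothetical, and the patterns are chosen disjoint so that the toy example is deterministic.

```python
def learn_cycle(cycle):
    """Assumed rule: J[i][j] = 1 if j fires in pattern mu and i fires in mu + 1."""
    n = len(cycle[0])
    length = len(cycle)
    J = [[0] * n for _ in range(n)]
    for mu in range(length):
        pre = cycle[mu]
        post = cycle[(mu + 1) % length]     # the cycle closes on itself
        for i in range(n):
            if not post[i]:
                continue
            for j in range(n):
                if pre[j] and i != j:
                    J[i][j] = 1
    return J

def parallel_update(J, S, lam):
    """Fire when excitatory input exceeds lam times the total activity."""
    total = sum(S)
    return [1 if sum(a * b for a, b in zip(row, S)) - lam * total > 0 else 0
            for row in J]

def run(J, S, lam, steps):
    """Return the list of states visited during `steps` parallel updates."""
    states = []
    for _ in range(steps):
        S = parallel_update(J, S, lam)
        states.append(S)
    return states

# Toy cycle: 4 patterns of 6 firing neurons each on a 24-neuron network.
n, k = 24, 6
cycle = [[1 if mu * k <= i < (mu + 1) * k else 0 for i in range(n)]
         for mu in range(4)]
J = learn_cycle(cycle)
```

Starting from the first pattern, even with one firing neuron removed, this toy network steps through the remaining patterns in order and returns to the first after four updates.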

Previous researchers have discussed using cycles to store time sequences [12] or to achieve low, local neuron firing rates [1, 2]. Here, however, the cycles play the fundamental role of conveying more information than can be conveyed in a single pattern of low activity. Combined with the updating algorithm presented here, this means that the length of time needed to completely recall a given amount of information is proportional to that amount of information. This is in agreement with anecdotal results from human beings, but it is in contrast to previous algorithms and models where the time needed to recall information is independent of the amount of information being recalled. Furthermore, by equation (7), it requires 34 patterns having activity level $\langle a \rangle = 0.003$ to store as much information as is contained in a single random pattern of activity level $\langle a \rangle = 0.5$. Such a random pattern on a network of $N = 20\,480$ neurons would need over 2 000 ms for retrieval using a conventional model and algorithm. On the other hand, with the present model and algorithm, the entire 34 state cycle of low activity patterns can be recalled in about 450 ms. Hence, the present model is computationally very efficient and could be quite advantageous in practical applications.

One problem which still needs further work is the question of finding a convenient method for converting patterns of high activity into a minimal set of patterns of low activity. This, however, raises the more general question of how to code real world patterns for efficient processing by neural networks. In biological systems, an image at the retina undergoes many transformations before reaching the visual cortex, where it is presumably stored in some highly compact form [13]. It is likely that artificial networks may need to use similar types of transformations before they can deal efficiently with "real-world" patterns.

In summary, a fast and efficient algorithm for simulating Willshaw type models has been presented. Using this algorithm, it has been shown that the optimal value of the pattern activity in terms of the retrieval properties is not in agreement with the optimal value previously determined [...] information content patterns has also been demonstrated and shown to be computationally more efficient under the present algorithm than under conventional algorithms.

Acknowledgements.

I am very grateful to D. Stauffer for many helpful conversations related to this work.

References

[1] AMIT D.J. and TREVES A., Proc. Nat. Acad. Sci. USA 86 (1989) 7871; BUHMANN J., Phys. Rev. A40 (1989) 4145; TREVES A. and AMIT D.J., J. Phys. A22 (1989) 2205.

[2] GOLOMB D., RUBIN N. and SOMPOLINSKY H., Phys. Rev. A41 (1990) 1843.

[3] ABELES M., VAADIA E. and BERGMAN H., Network 1 (1990) 13.

[4] PALM G., Biol. Cybern. 36 (1980) 19; THAKOOR A.P., MOOPENN A., LAMBE J. and KHANNA S.K., Appl. Opt. 26 (1987) 5085.

[5] KOHRING G.A. (preprint) HLRZ-33/90.

[6] KANTER I. and SOMPOLINSKY H., Phys. Rev. A35 (1987) 380; FORREST B.M., J. Phys. A21 (1988) 245; HORNER H., BORMANN D., FRICK M., KINZELBACH H. and SCHMIDT A., Z. Phys. B76 (1989) 381; KRÄTZSCHMAR J. and KOHRING G.A., J. Phys. France 51 (1990) 223.

[7] WILLSHAW D.J., BUNEMAN O.P. and LONGUET-HIGGINS H.C., Nature 222 (1969) 960.

[8] AMARI S.-I., Neural Networks 2 (1989) 451; NADAL J.-P. and TOULOUSE G., Network 1 (1990) 61.

[9] KOCH C. and POGGIO T., in New Insights into Synaptic Function, Eds. G.M. Edelman, W.W. Gall and W.M. Cowan (John Wiley and Sons, New York) 1986.

[10] PENNA T.J.P. and OLIVEIRA P.M.C., J. Phys. A22 (1989) L719; KOHRING G.A., J. Stat. Phys. 59 (1990) 1077.

[11] HOPFIELD J.J., Proc. Nat. Acad. Sci. USA 79 (1982) 2554; LITTLE W.A., Math. Biosci. 19 (1975) 101.

[12] SOMPOLINSKY H. and KANTER I., Phys. Rev. Lett. 57 (1986) 2861; BUHMANN J. and SCHULTEN K., Europhys. Lett. 4 (1987) 1205; BAUER K. and KREY U., Z. Phys. B79 (1990) 461.

[13] DAUGMAN J.G., IEEE Trans. ASSP 36 (1988) 1169; ZETSCHE C., in Parallel Processing in Neural Systems and Computers, Eds. R. Eckmiller, G. Hartmann and G. Hauske (North-Holland, Amsterdam) 1990.

[14] KRAUTH W. and OPPER M., J. Phys. A22 (1989) L519; KÖHLER H., DIEDERICH S., KINZEL W. and OPPER M., Z. Phys. B78 (1990) 333; KÖHLER H., Inst. Th. Phys., Uni. Göttingen (preprint) 1990.
