HAL Id: jpa-00212536
https://hal.archives-ouvertes.fr/jpa-00212536
Submitted on 1 Jan 1990
Performance enhancement of Willshaw type networks
through the use of limit cycles
G.A. Kohring
Short Communication
HLRZ an der KFA Jülich, D-5170 Jülich, F.R.G.
(Received 24 August 1990, accepted 30 August 1990)
Abstract. Simulation results of a Willshaw type model for storing sparsely coded patterns are presented. It is suggested that random patterns can be stored in Willshaw type models by transforming them into a set of sparsely coded patterns and retrieving this set as a limit cycle. In this way, the number of steps needed to recall a pattern will be a function of the amount of information the pattern contains. A general algorithm for simulating neural networks with sparsely coded patterns is also discussed, and, on a fully connected network of N = 36 864 neurons ($1.4 \times 10^{9}$ couplings), it is shown to achieve effective updating speeds as high as $1.6 \times 10^{11}$ coupling evaluations per second on one Cray-YMP processor.

LE JOURNAL DE PHYSIQUE
J. Phys. France 51 (1990) 2387-2393, 1 November 1990
Classification
Physics Abstracts 87.30G

Sparse coding,
i.e., coding a pattern such that the number of ones is much smaller than the number of zeros or minus ones, has been of interest to the neural network community for many years. Neurological modellers, for example, seek in sparse coding an explanation of the low neuron firing rates [1, 2] which have been experimentally observed in the cerebral cortex [3]; while application oriented researchers note that neural networks are computationally more efficient than conventional methods for associative memory tasks only when the number of stored patterns per neuron is greater than the number of incoming synapses per neuron [4, 5]. This latter condition is effectively fulfillable only for sparsely coded patterns, since for non-sparsely coded patterns the basins of attraction will be negligibly small whenever this constraint is satisfied [5, 6].
One of the earliest models for sparsely coded patterns, and one which has appealed to both of the above groups, is the model introduced by Willshaw et al. [7, 2, 4, 8]. This model consists of N neurons, $\{S_i \mid i = 1, \ldots, N\}$, taking on the values $\{0, 1\}$ for the resting and firing states respectively.
In accord with current neurological thinking, synaptic plasticity is assumed to occur only at excitatory synapses and only when both the presynaptic and postsynaptic neurons fire simultaneously. Inhibition is assumed to be the mechanism whereby nature controls the neuron firing rates and thus achieves sparse coding [1].
A further attribute of the Willshaw model is the use of clipped synapses, i.e., the synapses, $\{J_{ij} \mid i, j = 1, \ldots, N;\ J_{ii} = 0\}$, take only the two values, $\{0, 1\}$. Such a drastic reduction of the coupling phase space is not biologically justifiable except under certain conditions when the post-synaptic membrane of a dendritic spine is active [9], but it has been shown to improve the information storage efficiency of neural networks [5] and, due to its simplicity, it is of much interest to the applied research community. Combined with the above sparse coding of patterns, this leads to improved retrieval performance as well as fast and efficient updating and learning algorithms.
Information is stored in the Willshaw model by a learning procedure which, in contrast to other learning prescriptions, is comparatively simple. For concreteness, consider the case of storing sparsely coded random patterns. Each node, $\{\xi_i^{\mu} \mid i = 1, \ldots, N;\ \mu = 1, \ldots, P\}$, of the P random patterns, is selected to be either 1 or 0 with probability, $a^{\mu}$, and, $1 - a^{\mu}$, respectively. The pattern activity, $a^{\mu}$, should satisfy, $a^{\mu} \ll 1$, and may be different for different patterns. Previous work usually included the extra restriction; $\sum_i \xi_i^{\mu} = aN$, with the same activity level for all patterns [2, 7, 8]. However, this is not necessary provided the network updating rule is appropriately modified by the addition of inhibitory interactions.

From the chosen patterns, the couplings are defined by the equation:
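A form of this coupling rule that is consistent with the surrounding description (clipped, purely excitatory Hebbian learning with $J_{ii} = 0$) can be sketched as follows. The step-function notation $\Theta$ and the explicit $i \neq j$ condition are assumptions made here for illustration.

```latex
J_{ij} = \Theta\!\left( \sum_{\mu=1}^{P} \xi_i^{\mu}\, \xi_j^{\mu} \right), \quad i \neq j,
\qquad
\Theta(x) =
\begin{cases}
1, & x > 0, \\
0, & \text{otherwise.}
\end{cases}
```

In words, a synapse is switched on as soon as any one stored pattern has both neurons firing, and storing further patterns can only switch on more synapses.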
This equation is well suited for hardware implementation in artificial neural networks as well as efficient programming on conventional computers. Through the use of multineuron coding [10], i.e., storing the neurons as single bits rather than single words, one can take advantage of the parallelization inherent in conventional computers. For the case of equation (1), one should store the state of the i-th neuron from each of B patterns in one computer word of B bits. Then, equation (1) can be simply rewritten as:

where the symbol, $\wedge$, represents the logical AND operator and the symbol, $\vee$, represents the logical OR operator.
In this algorithm, the execution of the AND operation calculates the contribution to the couplings of B patterns simultaneously. After OR-ing this contribution with the contribution from all other P/B words containing the stored patterns, $J_{ij}$ is set to one if the resulting word is nonzero and to zero otherwise. Such an algorithm requires $O(P N^{2} / B)$ logical operations for learning P patterns.
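The learning step described above can be sketched in Python, using integers as B-bit words. This is a minimal illustration under stated assumptions, not the original Cray implementation: the helper names `pack_patterns` and `learn_couplings`, the tiny word size `B = 2`, and the six-neuron patterns are all hypothetical.

```python
def pack_patterns(patterns, B=64):
    """Return words[i][w]: bit b of word w holds neuron i's state in pattern w*B + b."""
    P, N = len(patterns), len(patterns[0])
    n_words = (P + B - 1) // B
    words = [[0] * n_words for _ in range(N)]
    for mu, pattern in enumerate(patterns):
        w, b = divmod(mu, B)
        for i, bit in enumerate(pattern):
            if bit:
                words[i][w] |= 1 << b
    return words


def learn_couplings(words):
    """Clipped Hebbian learning on bit-packed patterns (J_ii is kept at zero)."""
    N = len(words)
    J = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            acc = 0
            for wi, wj in zip(words[i], words[j]):
                acc |= wi & wj  # one AND covers B patterns; OR accumulates over words
            J[i][j] = 1 if acc else 0
    return J


# Three sparse patterns on six neurons (hypothetical data).
patterns = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
]
J = learn_couplings(pack_patterns(patterns, B=2))
for row in J:
    print(row)
```

With `B = 2` the three patterns occupy two words per neuron, so the inner loop runs over P/B words (rounded up) for each of the N squared neuron pairs, in line with the operation count quoted above.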
For neural networks with binary valued synapses, such an algorithm may be the fastest and [...] new patterns are added. Hence, this would seem to be the only learning rule for binary synapses which has a chance of being implemented in any practical device.

Willshaw et al. have shown in their
original paper [7] that this model network reaches a maximum information content of $\mathcal{E} = \ln 2$ stored bits per bit of synapse, when the activity of each pattern is given by $a_c = (N \ln 2)^{-1} \ln N$. However, this calculation is only based upon the static stability of the patterns. It does not take into account the possibility that the patterns may be dynamically unstable, i.e., that the basins of attraction are of negligible size. (In other models it is known that the basins of attraction tend to zero size at the limit of maximum storage efficiency [5, 6].) The condition of dynamic stability can only be investigated by studying the network dynamics through numerical simulations.

The
dynamical updating rule for a single layer, recurrent network operating in the noise free limit is given by:

where,

The first term on the right hand side of equation (4) is the usual Willshaw local field contribution, while the second, a global inhibitory term [2], has the effect of stabilizing patterns with different activation levels. This obviates the previous restriction of only storing patterns with one definite activation level. $\lambda$ is an a priori defined constant in the range $0 < \lambda < 1$.

When studying the retrieval properties of this model, most of the simulation time is taken up with the calculation of the local fields, $h_i(S[t])$. This time can be improved upon by using multi-neuron coding and taking advantage of the sparse distribution of 1's in each pattern.
Equation (4) is best calculated by storing B neurons from $S(t)$ in a single word of B bits, and B synapses from $J_{ij}$ in a single word of B bits. Since only those $S(t)$ which have the value $S_i(t) = 1$ contribute to the sum in equation (4), one should, in principle, sum only over such terms. In practice, the time needed to determine whether or not a particular $S_i(t)$ is 0 or 1 is of the same order of magnitude as the time needed to carry out the multiplication and addition. This problem can be countered by using a fully parallel updating scheme and interchanging the order in which the calculations are performed. As an example of such a calculational scheme, consider the following algorithm:
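One way such a scheme could look is sketched below in Python. It is an illustration under stated assumptions rather than the original Fortran listing: the local field is assumed to have the form $h_i = \sum_j J_{ij} S_j - \lambda \sum_j S_j$ with a strict threshold at zero, Python integers stand in for B-bit machine words, and the names `popcount`, `pack_bits`, `willshaw_couplings` and `parallel_update` are hypothetical. The outer loop visits only the nonzero words of the state, and the inner loop runs over all neurons.

```python
def popcount(x):
    """Number of bits set to one in the word x (stand-in for a hardware POPCNT)."""
    return bin(x).count("1")


def pack_bits(bits, B):
    """Pack a 0/1 list into words of B bits (bit b of word w is element w*B + b)."""
    words = [0] * ((len(bits) + B - 1) // B)
    for k, bit in enumerate(bits):
        if bit:
            words[k // B] |= 1 << (k % B)
    return words


def willshaw_couplings(patterns):
    """Clipped Hebbian couplings with J_ii = 0, stored as plain 0/1 rows."""
    N = len(patterns[0])
    return [[1 if i != j and any(p[i] and p[j] for p in patterns) else 0
             for j in range(N)] for i in range(N)]


def parallel_update(J_words, S_words, N, lam):
    """One fully parallel update; returns the new state as a 0/1 list."""
    active = [w for w, s in enumerate(S_words) if s]   # skip all-zero state words
    n_active = sum(popcount(S_words[w]) for w in active)
    overlap = [0] * N
    for w in active:                 # outer loop: the few nonzero state words
        s = S_words[w]
        for i in range(N):           # inner loop: all neurons
            overlap[i] += popcount(J_words[i][w] & s)
    return [1 if overlap[i] - lam * n_active > 0 else 0 for i in range(N)]


# Hypothetical example: two sparse patterns on N = 12 neurons, B = 4 bits per word.
N, B, lam = 12, 4, 0.5
p0 = [1 if i in (0, 1, 2, 5) else 0 for i in range(N)]
p1 = [1 if i in (3, 6, 7, 9) else 0 for i in range(N)]
J_words = [pack_bits(row, B) for row in willshaw_couplings([p0, p1])]

cue = [1 if i in (0, 1, 5) else 0 for i in range(N)]   # p0 with one firing neuron missing
recalled = parallel_update(J_words, pack_bits(cue, B), N, lam)
print(recalled == p0)
```

Because all-zero state words are skipped, the work per update scales with the number of firing neurons rather than with N, which is where the benefit of sparse coding enters in this sketch.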
POPCNT(x) counts the number of bits set equal to one in the word x. The innermost loop of this algorithm can be fully vectorized using Cray-standard Fortran. Such a program comes close to realizing the maximum possible $O(B/a)$ speed-up over conventional algorithms which use [...] paper is the Hopfield-Little model [11] with the commonly used continuous synapses.)
In figure 1 is a plot of the time needed for one complete parallel update of a fully connected network when the network is loaded to its maximum storage efficiency. Note that for models with continuous synapses, the main memory of the Cray-YMP/832 cannot hold networks with more than $2.2 \times 10^{7}$ couplings, i.e., a fully-connected network with no more than N = 4 672 neurons.

Fig. 1. - Plot of the time needed for one parallel update of the entire network as a function of pattern activity for a single Cray-YMP processor. Triangles are for a network of N = 20 480 neurons. In the insert is shown a plot of update time versus N. Squares are for a conventional algorithm implementing continuous synapses and the circles are for the present algorithm when $\langle a \rangle = (N \ln 2)^{-1} \ln N$.

From this plot it is clearly seen that there are orders of magnitude differences in the updating times of the two algorithms. At our maximum network size of N = 36 864, the updating time corresponds to a sustained speed of $1.64 \times 10^{11}$ coupling evaluations per second (164 Gcops). The speed and simplicity of such algorithms make them attractive for practical applications since they can be hard-wired with conventional technology.
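The figures quoted above can be cross-checked with a few lines of arithmetic. This sketch assumes that a fully connected network has N squared couplings and that the sustained speed is simply the number of couplings divided by the time for one parallel update; the resulting update time is an inferred value, not one stated in the text.

```python
def couplings(n_neurons):
    """Number of couplings in a fully connected network (assumed to be N squared)."""
    return n_neurons ** 2


N_MAX = 36_864          # largest network simulated
SPEED = 1.64e11         # sustained coupling evaluations per second
N_CONTINUOUS = 4_672    # largest network that fits with continuous synapses

n_couplings = couplings(N_MAX)
update_time_ms = 1e3 * n_couplings / SPEED   # implied time for one parallel update

print(f"{n_couplings:.2e} couplings")                  # 1.36e+09
print(f"{update_time_ms:.1f} ms per update")           # 8.3
print(f"{couplings(N_CONTINUOUS):.2e} couplings")      # 2.18e+07
```

Both coupling counts agree with the rounded values given in the text ($1.4 \times 10^{9}$ and $2.2 \times 10^{7}$).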
As mentioned above, the usefulness of such models for associative memory applications depends not only on the information storage capacity, but also on the retrieval properties. Figure 2 shows a plot of the retrieval performance for several different values of the average pattern activity. For each network, the number of stored patterns has been adjusted so that the information content per synapse is approximately the same, $\mathcal{E} = 0.086 \pm 0.002$. Given the different activity levels in the different sets of patterns, this gives a better comparison of network performance than simply storing the same number of patterns in each network.

At $\mathcal{E} \approx 0.086$, one would have expected a network whose stored patterns have an average activity given by $\langle a \rangle = 0.0007$ to exhibit better retrieval properties than a network whose patterns have average activity given by $\langle a \rangle = 0.0035$, because the latter network is operating closer to its overloading threshold of $\mathcal{E} \approx 0.093$ [7, 8].
From figure 2, however, it is clearly seen that this is [...] activity exhibits better retrieval properties (i.e., up to the point where the pattern activity cannot support the fixed information capacity). A sketch showing the minimum initial Hamming distance needed to ensure correct retrieval 50% of the time as a function of the average activity is shown in figure 2 for N = 20 480.

Fig. 2. - Plot of the percentage of correctly retrieved patterns as a function of the initial normalized Hamming distance. The network is fully connected with N = 20 480 neurons and the number of stored patterns has been adjusted to the average activity level so that the information content is $\mathcal{E} = 0.086 \pm 0.002$ bits stored per bit of synapse. Circles: $\langle a \rangle = 0.0007$, P = 206 336; Squares: $\langle a \rangle = 0.003$, P = 61 184; Triangles are for the same system as the squares but for the case of storing cycles consisting of 64 states. In the insert is shown a plot of the minimum normalized Hamming distance which is needed to correctly recall 50% of the initial states. All of the networks contain the same amount of information.

What these figures show is that the optimal network parameters determined simply by static properties, such as information capacity, are no longer optimal when dynamic properties, such as retrieval performance, are also taken into account. This conclusion is in agreement with other results found in a variety of different models [5, 6].
One criticism of storing patterns with such low activity levels is that each pattern contains very little information. With the information content of a pattern having activity, a, given by the usual entropy expression; then, the ratio of the information contents of patterns with different activity levels is given by:
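Written out, and assuming the natural-logarithm form of the binary entropy per neuron, these two expressions can be sketched as below. The symbols $I(a)$ and $\phi(a_1, a_2)$ are labels chosen here for illustration.

```latex
I(a) = -a \ln a - (1 - a) \ln (1 - a),
\qquad
\phi(a_1, a_2) = \frac{I(a_1)}{I(a_2)} .
```

For $a_1 = 1/2$ the numerator reduces to $\ln 2$, so sparser patterns (smaller $a_2$) give a larger ratio.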
In the present model, one is in the situation of being able to recall patterns with low information [...] a network model with a much slower updating time. One way out of this dilemma is to transform patterns with higher information content into a set of patterns with lower information content and to recall this set of patterns as a dynamical limit cycle rather than a fixed point.
For random patterns of activity level, $a_1 = 1/2$, the function, $\phi$, in equation (6) gives the minimum number of patterns having activity level, $a_2$, which are necessary to encode all the information contained in the original pattern as:

Once the information has been appropriately transformed, the dynamical cycle of patterns, $\{\xi_i^{\mu} \mid i = 1, \ldots, N;\ \mu = 1, \ldots, p_c\}$, can be stored in the network by adjusting the couplings previously determined with equation (2) from $J_{ij}$ to $J'_{ij}$ as:

where, $\xi_i^{p_c + 1} \equiv \xi_i^{1}$.
Although it may seem counterintuitive to expect such highly clipped synapses to be able to store cycles, the results in figure 2 show that the retrieval properties are as good as those for fixed point storage in an equivalent network. (For programming convenience the cycles have a length of 64 states each, but cycles of any length are possible.)
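The idea of storing a limit cycle in clipped couplings can be illustrated with a small Python sketch. It assumes that each transition is stored by switching on $J_{ij}$ whenever neuron $j$ fires in one state of the cycle and neuron $i$ fires in the next, and that the update uses a local field of the form $\sum_j J_{ij} S_j - \lambda \sum_j S_j$ with a strict threshold at zero. The function names, the nine-neuron network, the three-state cycle, and the choice to start from empty couplings are all hypothetical simplifications.

```python
def store_cycle(J, cycle):
    """OR the transition terms (state mu -> state mu+1) into the clipped couplings J."""
    n_states = len(cycle)
    for mu in range(n_states):
        current, following = cycle[mu], cycle[(mu + 1) % n_states]  # last state wraps to first
        for i, post in enumerate(following):
            if not post:
                continue
            for j, pre in enumerate(current):
                if pre and i != j:      # keep J_ii = 0
                    J[i][j] = 1
    return J


def step(J, S, lam):
    """One fully parallel update with global inhibition of strength lam."""
    n_active = sum(S)
    new_state = []
    for row in J:
        field = sum(Jij for Jij, Sj in zip(row, S) if Sj) - lam * n_active
        new_state.append(1 if field > 0 else 0)
    return new_state


# Hypothetical three-state cycle of non-overlapping sparse patterns on N = 9 neurons.
N = 9
cycle = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
]
J = store_cycle([[0] * N for _ in range(N)], cycle)

state = cycle[0]
trajectory = []
for _ in range(len(cycle)):
    state = step(J, state, lam=0.5)
    trajectory.append(state)
print(trajectory == [cycle[1], cycle[2], cycle[0]])

noisy_start = [1, 1, 0, 0, 0, 0, 0, 0, 0]    # first state with one firing neuron missing
print(step(J, noisy_start, lam=0.5) == cycle[1])
```

Each update moves the network one state further along the cycle, so recalling a cycle of p states takes p parallel updates.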
Previous researchers have discussed using cycles to store time sequences [12] or to achieve low, local neuron firing rates [1, 2]. Here, however, the cycles play the fundamental role of conveying more information than can be conveyed in a single pattern of low activity. Combined with the updating algorithm presented here, this means that the length of time needed to completely recall a given amount of information is proportional to that amount of information. This is in agreement with anecdotal results from human beings, but it is in contrast to previous algorithms and models where the time needed to recall information is independent of the amount of information being recalled.

Furthermore, by equation (7), it requires 34 patterns having activity level $\langle a \rangle = 0.003$ to store as much information as is contained in a single random pattern of activity level $\langle a \rangle = 0.5$. Such a random pattern on a network of N = 20 480 neurons would need over 2 000 ms for retrieval using a conventional model and algorithm. On the other hand, with the present model and algorithm, the entire 34 state cycle of low activity patterns can be recalled in about 450 ms. Hence, the present model is computationally very efficient and could be quite advantageous in practical applications.
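The count of 34 patterns can be reproduced with a short calculation, assuming that equation (7) is the ratio of the binary entropy at activity 1/2 to that at the lower activity, rounded up to a whole number of patterns. The function names below are hypothetical.

```python
import math


def entropy(a):
    """Binary entropy (in nats) per neuron of a random pattern with activity a."""
    return -a * math.log(a) - (1.0 - a) * math.log(1.0 - a)


def patterns_needed(a_dense, a_sparse):
    """Smallest whole number of sparse patterns carrying as much information as one dense pattern."""
    return math.ceil(entropy(a_dense) / entropy(a_sparse))


print(patterns_needed(0.5, 0.003))   # 34
```

With the timings quoted above, the 34-state cycle is recalled roughly four times faster than the single dense pattern (2 000 ms against about 450 ms).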
One problem which still needs further work is the question of finding a convenient method for converting patterns of high activity into a minimal set of patterns of low activity. This, however, raises the more general question of how to code real world patterns for efficient processing by neural networks. In biological systems, an image at the retina undergoes many transformations before reaching the visual cortex, where it is presumably stored in some highly compact form [13]. It is likely that artificial networks may need to use similar types of transformations before they can deal efficiently with "real-world" patterns.
In summary, a fast and efficient algorithm for simulating Willshaw type models has been presented. Using this algorithm, it has been shown that the optimal value of the pattern activity in terms of the retrieval properties is not in agreement with the optimal value previously determined [...] information content patterns has also been demonstrated and shown to be computationally more efficient under the present algorithm than under conventional algorithms.
Acknowledgements.
I am very grateful to D. Stauffer for many helpful conversations related to this work.

References
[1] AMIT D.J. and TREVES A., Proc. Nat. Acad. Sci. USA 86 (1989) 7871; BUHMANN J., Phys. Rev. A 40 (1989) 4145; TREVES A. and AMIT D.J., J. Phys. A 22 (1989) 2205.
[2] GOLOMB D., RUBIN N. and SOMPOLINSKY H., Phys. Rev. A 41 (1990) 1843.
[3] ABELES M., VAADIA E. and BERGMAN H., Network 1 (1990) 13.
[4] PALM G., Biol. Cybern. 36 (1980) 19; THAKOOR A.P., MOOPENN A., LAMBE J. and KHANNA S.K., Appl. Opt. 26 (1987) 5085.
[5] KOHRING G.A. (preprint) HLRZ-33/90.
[6] KANTER I. and SOMPOLINSKY H., Phys. Rev. A 35 (1987) 380; FORREST B.M., J. Phys. A 21 (1988) 245; HORNER H., BORMANN D., FRICK M., KINZELBACH H. and SCHMIDT A., Z. Phys. B 76 (1989) 381; KRÄTZSCHMAR J. and KOHRING G.A., J. Phys. France 51 (1990) 223.
[7] WILLSHAW D.J., BUNEMAN O.P. and LONGUET-HIGGINS H.C., Nature 222 (1969) 960.
[8] AMARI S.-I., Neural Networks 2 (1989) 451; NADAL J.-P. and TOULOUSE G., Network 1 (1990) 61.
[9] KOCH C. and POGGIO T., in New Insights into Synaptic Function, Eds. G.M. Edelman, W.W. Gall and W.M. Cowan (John Wiley and Sons, New York) 1986.
[10] PENNA T.J.P. and OLIVEIRA P.M.C., J. Phys. A 22 (1989) L719; KOHRING G.A., J. Stat. Phys. 59 (1990) 1077.
[11] HOPFIELD J.J., Proc. Nat. Acad. Sci. USA 79 (1982) 2554; LITTLE W.A., Math. Biosci. 19 (1975) 101.
[12] SOMPOLINSKY H. and KANTER I., Phys. Rev. Lett. 57 (1986) 2861; BUHMANN J. and SCHULTEN K., Europhys. Lett. 4 (1987) 1205; BAUER K. and KREY U., Z. Phys. B 79 (1990) 461.
[13] DAUGMAN J.G., IEEE Trans. ASSP 36 (1988) 1169; ZETSCHE C., in Parallel Processing in Neural Systems and Computers, Eds. R. Eckmiller, G. Hartmann and G. Hauske (North-Holland, Amsterdam) 1990.
[14] KRAUTH W. and OPPER M., J. Phys. A 22 (1989) L519; KÖHLER