HAL Id: jpa-00211044
https://hal.archives-ouvertes.fr/jpa-00211044
Submitted on 1 Jan 1989
Vectorized multi-site coding for nearest-neighbour neural networks
B.M. Forrest
To cite this version:
B.M. Forrest. Vectorized multi-site coding for nearest-neighbour neural networks. Journal de Physique, 1989, 50 (15), pp. 2003-2017. 10.1051/jphys:0198900500150200300. jpa-00211044.
Vectorized multi-site coding for nearest-neighbour neural networks
B. M. Forrest
Institut für Festkörperforschung, Kernforschungsanlage, Postfach 1913, D-5170 Jülich, F.R.G.
(Received 20 April 1989, accepted 21 April 1989)
Résumé. - We study, by numerical simulation, binary neural networks using multi-site coding algorithms. Speeds of over 200 neuron updates per microsecond are obtained with a vectorized algorithm on the Cray X-MP. We present results for two-dimensional networks containing up to 512 x 512 neurons. We show that the networks function as associative memories and that they store information more efficiently than fully-connected networks.
Abstract. - Ising spin neural networks with clipped synapses (±1 only) and with local connectivity are simulated using multi-site coding algorithms. Speeds of over 200 neuron updates per microsecond are achieved by vectorization of the algorithm on the Cray X-MP. Results are presented for two-dimensional networks of up to 512 x 512 neurons. The networks are shown to function as associative memories, and the ratio of information stored to information used to store it improves upon that of fully-connected models.
Classification
Physics Abstracts
02.70 - 05.50
1. Introduction.
In recent years a great deal of research activity has centred around simplified models of neural networks, examining their ability to perform as associative memories. Within the class of models which shall concern us here, each neuron may be represented by an Ising (or bit) variable

  S_i = +1 for neuron « on » (spin up), or S_i = -1 for neuron « off » (spin down).

The state of each neuron is governed by its momentary local field h_i(t), which is assumed to be a linear sum of the incoming signals from all the neurons j which have a synaptic connection T_{ij} incident onto neuron i:

  h_i(t) = \sum_j T_{ij} S_j(t).                                  (1)

If no stochastic noise is present (which shall be the case here), then each neuron simply « aligns » with its local field,

  S_i(t + τ) = sign(h_i(t)),                                      (2)
where τ represents the « clock cycle » of each neuron. In the simulations presented below, the network was updated synchronously, that is, at each time-step τ all of the neurons modified their states according to (2). The states of the network which are fixed points of the above dynamics are customarily identified as the states which are « stored » by the network, corresponding to persistent « firing patterns » (stable spin configurations).
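For concreteness, a minimal plain-Fortran sketch of one such synchronous sweep (one word per neuron; the array names and the neighbour list are our own illustration, not the program used in the paper):

  ! One synchronous sweep of the dynamics (2): all neurons are
  ! updated simultaneously from the state of the previous time-step.
  ! S (+-1) holds the neuron states, T the synapses; NBR lists the
  ! z nearest neighbours of each site. All names are illustrative.
  subroutine sweep(n, z, S, T, nbr)
    implicit none
    integer, intent(in)    :: n, z
    integer, intent(in)    :: T(z, n), nbr(z, n)
    integer, intent(inout) :: S(n)
    integer :: Snew(n), i, j, h
    do i = 1, n
       h = 0
       do j = 1, z
          h = h + T(j, i) * S(nbr(j, i))   ! local field h_i, eq. (1)
       end do
       if (h > 0) then
          Snew(i) = 1
       else if (h < 0) then
          Snew(i) = -1
       else
          Snew(i) = S(i)   ! h_i = 0: keep the old state (see Sect. 3)
       end if
    end do
    S = Snew               ! synchronous: replace all states at once
  end subroutine sweep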
The necessary and sufficient condition for S* to be such a state is that every spin S_i* be aligned with its local field,

  S_i* h_i* > 0  for all i.                                       (3)

Ideally we would like to specify a priori a set of nominal patterns {ξ_i^r ; 1 ≤ i ≤ N ; 1 ≤ r ≤ p} which are to be stored in this neural memory, i.e., which are to be fixed points of the dynamics (2), or, at least, which should lie reasonably close to a fixed point. Whether or not we are successful depends entirely upon our choice of the synaptic efficacies T_{ij}.
It is well known that for the Hopfield-Little model [1, 2], which is a fully-connected network (where each neuron may be connected to every other), the Hebbian prescription

  T_{ij} = \sum_{r=1}^{p} ξ_i^r ξ_j^r                             (4)

can successfully store up to p ≈ 0.14 N random uncorrelated patterns [3]. In the thermodynamic limit (N → ∞) the T_{ij} assume a continuous range of values, since they are discretised on the scale of 1/N, as they are in the error-corrective learning algorithms [4-11] which have been studied in order to improve upon the performance of (4).
The realisation that the fully-connected nature of these synaptic connections would prove an insurmountable obstacle in the fabrication of a network of any reasonable size has prompted the consideration of networks with more restricted architectures, limiting the number of connections per site [12].
Implementation difficulties would also be alleviated by imposing an upper bound on the magnitude of the connections [13-16] (« clipped synapses »), by discretising the connections, or both. For example, in the extreme case of the latter restriction, where the synapses are ±1 only, it has been calculated [15] that a fully-connected network will function as an associative memory, storing p = α_c N random uncorrelated patterns, with critical storage ratio α_c = 0.102 and with retrieval quality at worst 97.4 %.

Here we shall consider the imposition of both restrictions: the T_{ij} will be allowed to assume only the values ±1, and each neuron shall only be connected to its four nearest neighbours in a two-dimensional network. As will be explained below, this allows us to simulate very large networks using very powerful vectorized multi-spin coding techniques [17]. A similar model has recently been studied by Kürten [18], employing different techniques and addressing different aspects.

2. Choosing the synaptic connections.
Before describing the algorithm, we first should specify the choice of the synapses T_{ij}. Given that they are limited to ±1, how should we choose them? We shall consider a general asymmetric network so that we are not encumbered by the condition T_{ij} = T_{ji}. This permits us to consider each site i independently. Now, given the set of nominal patterns {ξ_i^r ; 1 ≤ i ≤ N ; 1 ≤ r ≤ p} which we wish to store, at each site i we would like to have that, for each pattern r,

  ξ_i^r \sum_j T_{ij} ξ_j^r > 0,                                  (5)

where j runs over the four nearest neighbours of i. Since each T_{ij} can only assume one of two possible values, there are only 2^4 = 16 choices for the four incoming connections to site i. We can thus simply perform an exact enumeration, evaluating the p constraints (5) for each of these 16 possible choices, and choose the best one for our connections to site i. The « best » choice shall be designated as that one which satisfies the most of the p inequalities (5).
In the case of a tie, we shall choose the one which maximizes

  \sum_{r=1}^{p} R_i^r Θ(R_i^r),  with  R_i^r = ξ_i^r \sum_j T_{ij} ξ_j^r,

where Θ is the threshold function. The reason for this choice is that a larger value of R_i^r should imply larger content-addressability of the r-th nominal pattern [8]. This exact enumeration of all possible choices of connections requires 16 Np evaluations of (5).
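A sketch of this per-site enumeration (the array layout and names are our illustrative assumptions, not the original code):

  ! Exact enumeration of the 2**4 = 16 possible choices of the four
  ! clipped connections T(1..4) at one site i, keeping the choice
  ! that satisfies the largest number of the p constraints (5).
  ! xi_i(r) holds the site's own bits and xi_nb(j,r) the neighbours'
  ! bits of pattern r (all +-1). Names are illustrative.
  subroutine learn_site(p, xi_i, xi_nb, Tbest)
    implicit none
    integer, intent(in)  :: p, xi_i(p), xi_nb(4, p)
    integer, intent(out) :: Tbest(4)
    integer :: choice, j, r, h, nsat, best, T(4)
    best = -1
    do choice = 0, 15                      ! all 16 sign assignments
       do j = 1, 4
          T(j) = merge(1, -1, btest(choice, j - 1))
       end do
       nsat = 0
       do r = 1, p                         ! count satisfied constraints (5)
          h = 0
          do j = 1, 4
             h = h + T(j) * xi_nb(j, r)
          end do
          if (xi_i(r) * h > 0) nsat = nsat + 1
       end do
       if (nsat > best) then               ! ties: broken by the criterion above
          best = nsat
          Tbest = T
       end if
    end do
  end subroutine learn_site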
3. Multi-spin coding.
Using the optimal choice of connections elucidated in the previous section, a two-dimensional network of L x L neurons S_i, 1 ≤ i ≤ N (N = L^2), was simulated by employing a Fortran multi-spin coding algorithm which requires only one bit per spin (neuron) and is based on a method propounded by Herrmann [19] for the fast simulation of Ising models.

The technique lends itself to system sizes where L is a multiple of 64. Thus the system sizes which are dealt with start off where in other models they often end: 4096 neurons. Defining M = L/64, the first M spins in the first row of the lattice are placed in the first bit of the 64-bit integers IS(1), ..., IS(M); then the next M spins are placed in the second bit, and so on up to the 64th bit. The next row of the lattice will be held by the words IS(M+1) to IS(2M). The actual array of spins is thus represented by the L x M words IS(M+1) to IS(L*M + M), with the top row (IS(1) to IS(M)) and an additional row at the bottom serving as shadow lines to invoke up-down periodic boundary conditions. Hence an array of L*M + 2M words will hold all the spins of an L x L lattice plus these two shadow lines. With the exception of the words M+1, 2M+1, ... and 2M, 3M, ..., the four neighbours of each of the 64 neurons held in the word IS(I) will then be found at the same bit-position in the words IS(I-M), IS(I+1), IS(I+M) and IS(I-1) (up, right, down and left neighbour, respectively).
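A hypothetical helper showing the coordinate-to-storage mapping implied by this layout (bit positions counted from 0; names are ours):

  ! Map lattice coordinates (row, col) to (word, bit) in the packed
  ! array IS. Column c is split as c - 1 = bit*M + (w - 1) with
  ! 1 <= w <= M, so consecutive bits of one word are M columns apart.
  ! The first M words are the top shadow row, hence the offset row*M.
  subroutine locate(row, col, M, word, bit)
    implicit none
    integer, intent(in)  :: row, col, M   ! 1 <= row, col <= L
    integer, intent(out) :: word, bit     ! bit counted from 0 to 63
    bit  = (col - 1) / M
    word = row * M + mod(col - 1, M) + 1  ! row*M skips the shadow row
  end subroutine locate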
Table I. - The number of single neuron updates per microsecond achieved on the Cray X-MP for various systems of size L x L.

Now, recalling the form of the updating rule for a neuron (2), we still have to specify the case of h_i = 0. The rule chosen in the simulations was S_i(t + τ) = S_i(t) if h_i = 0 since, as will be explained below, this was found to induce much more stability in the network than choosing S_i(t + τ) = +1 if h_i = 0.

Representing the state of a neuron S_i by the bit variable s_i = (S_i + 1)/2, and storing the connections in a similar fashion, t_{ij} = (T_{ij} + 1)/2, the modified signal T_{ij} S_j incident from neuron j onto neuron i will then correspond to EQV(t_{ij}, s_j), where EQV is the « equivalence » bitwise logical operation.

The neural updating rule explained above can be realised in the following manner. Denoting the bits EQV(t_{ij}, s_j) by n_j, we set the i-th neuron « on » (TRUE) if and only if at least three of the five bits n_1, n_2, n_3, n_4 and s_i are TRUE. This can be implemented by a Boolean majority function of these five bits, built from the logical OR (∨) and AND (∧) operations.
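One way to realise this majority-of-five rule bitwise, acting on all 64 neurons of a word at once, is via two full adders. The following sketch is a reconstruction under that choice, not necessarily the Boolean form of the original program (EQV is written as NOT(XOR); names are illustrative):

  ! Update one 64-bit word of neurons. nup, nrt, ndn, nlt are the
  ! already-aligned neighbour state words, tup, ... the corresponding
  ! connection words, s the current state word. Each bit position is
  ! an independent neuron, so all 64 are updated in parallel.
  function update_word(s, nup, nrt, ndn, nlt, tup, trt, tdn, tlt) result(snew)
    use iso_fortran_env, only: int64
    implicit none
    integer(int64), intent(in) :: s, nup, nrt, ndn, nlt
    integer(int64), intent(in) :: tup, trt, tdn, tlt
    integer(int64) :: snew, n1, n2, n3, n4, sum1, c1, sum2, c2
    ! modified signals n_j = EQV(t_ij, s_j) = NOT(XOR(t_ij, s_j))
    n1 = not(ieor(tup, nup))
    n2 = not(ieor(trt, nrt))
    n3 = not(ieor(tdn, ndn))
    n4 = not(ieor(tlt, nlt))
    ! full adder: n1 + n2 + n3 = 2*c1 + sum1 (per bit position)
    sum1 = ieor(ieor(n1, n2), n3)
    c1   = ior(iand(n1, n2), iand(ieor(n1, n2), n3))
    ! full adder: n4 + s + sum1 = 2*c2 + sum2
    sum2 = ieor(ieor(n4, s), sum1)
    c2   = ior(iand(n4, s), iand(ieor(n4, s), sum1))
    ! total = 2*(c1 + c2) + sum2 >= 3  <=>  majority of the five bits
    snew = ior(iand(c1, c2), iand(ior(c1, c2), sum2))
  end function update_word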
Three separate loops, each of which fully vectorizes due to the parallel nature of the neural dynamics, are needed for a sweep through the lattice: one for those words (M+1, 2M+1, ...) whose left neighbour is not a bit in the same position as the bit of the site being updated; one for those words which have such a right-hand neighbour (2M, 3M, ...); and one for all remaining words.

The algorithm achieves over 200 million neuron updates per second on the Cray X-MP: timings for various system sizes up to L = 2048 are presented in table I. Of course, the above algorithm generalises for an n-bit machine (n = 64 for the Cray) to systems of linear size L = n x M. (An algorithm using 3 bits per site [17] was slightly slower (around 180 million updates per second) but was applied to systems L ≤ 32 and is discussed in the appendix.)
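Putting the pieces together, a sketch of the bulk part of one such sweep under the layout above (subroutine and array names are our assumptions; update_word is the majority sketch above):

  ! One synchronous sweep over the packed lattice (bulk words only;
  ! the two edge loops differ in fetching the missing left/right
  ! neighbour from a circularly shifted word at the other end of the
  ! row, and the original code uses three separate loops so that each
  ! one vectorizes). ISNEW receives the updated words.
  subroutine sweep_bulk(L, M, IS, ISNEW, TUP, TRT, TDN, TLT)
    use iso_fortran_env, only: int64
    implicit none
    integer, intent(in) :: L, M
    integer(int64), intent(inout) :: IS(L*M + 2*M)
    integer(int64), intent(out)   :: ISNEW(L*M + 2*M)
    integer(int64), intent(in)    :: TUP(*), TRT(*), TDN(*), TLT(*)
    integer(int64), external :: update_word
    integer :: I
    IS(1:M) = IS(L*M+1:L*M+M)             ! top shadow row (periodicity)
    IS(L*M+M+1:L*M+2*M) = IS(M+1:2*M)     ! bottom shadow row
    do I = M + 1, L*M + M
       if (mod(I-1, M) == 0 .or. mod(I, M) == 0) cycle  ! edge words: own loops
       ISNEW(I) = update_word(IS(I), IS(I-M), IS(I+1), IS(I+M), IS(I-1), &
                              TUP(I), TRT(I), TDN(I), TLT(I))
    end do
  end subroutine sweep_bulk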
4. Results.
Systems of linear size L = 4 to 512 were simulated. Although the above algorithm allowed systems as large as 2048 x 2048 to be dealt with, the computational effort required for the exact enumeration learning of the synapses limited the number of statistical samples which could be carried out in a reasonable time. Note that this initialisation effort is equivalent to 16 Np sweeps of the network (Np single-spin updates for each of the 16 possible choices of the connections at a site), but that this part of the program was not carried out by multi-spin coding. The timings for the updating loop were found to be over 100 times faster using multi-spin coding as compared to normal Fortran (one word per site and involving integer multiplications). Given that, for example, at L = 512 the number of sweeps to stability of a pattern was typically between 100 and 200 (see below), this is a substantial saving.
Figure 1a shows the mean final overlap m_f of an iterated pattern with a nominal state after having started from a state which had initial overlap m_0 with that nominal state. These results were obtained by averaging over 10 initial states with overlap m_0 and then performing a quenched average over 10^3 independent samples (10^3 choices of the ξ^r). The overlap is the usual measure of the resemblance of two patterns, or spin configurations S^(1) and S^(2):

  m = (1/N) \sum_{i=1}^{N} S_i^(1) S_i^(2).

Fig. 1. - a) Mean final overlap, m_f, after iteration from an initial state having overlap m_0 and with p = 2 patterns stored; b) the size-dependence of the mean final overlap, m_f, from a state having overlap m_0 = 0.75 with p = 2.
The choice of the dynamics which keeps S_i the same if h_i = 0 is justified from this figure, since the performance using S_i(h_i = 0) → +1 is greatly deteriorated, even at p = 2. Moreover, the mean number of unstable neurons was observed to increase: e.g., to around 5 % at p = 2, m_0 = 1.0, and to 20 % at p = 2, m_0 = 0.5, from less than 0.1 % with the chosen dynamics. (These unstable sites were in fact all bistable. Note that for voting rule cellular automata [20], where each cell adopts the state corresponding to a « poll » of its neighbours, every state evolves into either a fixed point or a bistable state - no limit cycles of higher period exist. The network here is similar to such a rule, but is different in that the states of the neighbouring cells are involved in the « vote » only after they are modified by their synaptic connections.)
These results hold for all the system sizes attempted with L > 12, since the mean final overlap was only found to show a size-dependent drift for L ≤ 12 (Fig. 1b). For larger values of L, m_f remained unchanged (within statistical fluctuations) on increasing L - the width of the distribution of final overlaps merely decreased.

The width Δm_f, defined by Δm_f^2 = <m_f^2> - <m_f>^2, was found to increase as m_0 decreased, but the rate at which Δm_f decreased with respect to increasing system size L was found to be independent of m_0. As shown in table II, obtained from figure 2, Δm_f^2 apparently obeys the scaling relation

  Δm_f^2 = w(m_0) L^{-2y}

with y = 1.0 and w(m_0) some function of m_0 only: thus Δm_f exhibits strong self-averaging. This in turn indicates that as L → ∞ the probability of obtaining a particular final overlap m_f from an initial overlap m_0 will approach a Kronecker delta function: p(m_f | m_0) → δ(m_f - m_f(m_0)), where m_f(m_0) is the function plotted in figure 1a.

Fig. 2. - The size-dependence of the width, Δm_f, of m_f after iteration from four different values of m_0 at p = 2.

Table II. - Estimates of the parameters g and c obtained from the linear relationship (Fig. 2) ln Δm_f = g ln L + c.

If we ask for the fraction f(m_0) of iterated states which are recalled to within 10 % accuracy (m_f ≥ 0.8) from an initial state having overlap m_0, then, for p = 2, we obtain the behaviour in figure 3a.
Fig. 3. - a) At p = 2, the fraction, f, of states recalled with less than 10 % errors from initial overlap m_0. Best-fit scaling forms (9) are drawn through the points; b) the linear relationship between ln(f/(1-f)) and m_0 implies the scaling form (9).

The smooth curves drawn through the points are best-fit forms of the relation expressing the ratio of the probability of recall to that of non-recall as [8]

  f/(1 - f) = exp[g(m_0 - m̃_0)],                                 (9)

where m̃_0 is a fit parameter. The validity of this assumption is confirmed by figure 3b. The gradient g was found to scale linearly with L, since the data fit ln g = y ln L + c with y = 1.004 ± 0.014 and c = 0.230 ± 0.059, so that g/L = 1.26 ± 0.06. An estimate for the critical minimum overlap m_c, above which f(m_0) → 1 for L → ∞, can be obtained by extrapolating the initial overlaps m_0(f) required for a particular f to L^{-1} → 0. This is done in figure 4 for f = 0.2 and 0.8, yielding the estimates of m_c shown there.

Fig. 4. - The initial overlap, m_0, required to produce a mean recall fraction, f, is plotted against L^{-1}. Extrapolation to L^{-1} → 0 yields the critical minimum overlap, m_c.
Similar critical behaviour with respect to f has been obtained for fully-connected models [8], with f(m_0) following an identical scaling form (9).

It is not clear whether the number of sweeps to stability n(L) grows exponentially with respect to the system size for large L, although from figure 5 we cannot rule out the case that the number may obey some scaling law, n(L, m_0) = n_1(m_0) n_2(L). Such a law, with n_2(N) = ln N, has recently been found by Kanter [21] for infinite-ranged interactions. (For the simulations here, the number n(L) is actually the number of sweeps until every neuron is either stable or bistable. In fact, the number of bistable neurons was of the order of 0.1 % in all cases.)
Simulations were also performed for networks containing both nearest- and next-nearest-neighbour (NNN) connections for system sizes L = 16 up to L = 128. The exact enumeration learning procedure then involves 2^8 = 256 possible choices at each of the N sites of the network.

Fig. 5. - The mean number of sweeps, n(L), required for iteration to stability from states having a given initial overlap, m_0, at different system sizes L.
Similar behaviour of the final overlap was observed, i.e., the mean value remained invariant to within statistical fluctuations as L was increased, with the width decreasing. In terms of the closeness of the fixed points to the nominal states, the performance of the network improved, m_f increasing for larger values of m_0 at p = 2 and 4. However, for p = 8 the NNN network had a slightly inferior performance, as can be seen in figure 6.

Fig. 6. - Comparison of the retrieval quality of the network with only nearest-neighbour connections (z = 4) and with additional next-nearest-neighbour connections (z = 8).
Relaxing the condition (3), that the alignment of « spin » and « local field » should be strictly positive, to the case where it is only required to be non-negative enhances the retrieval quality of the network (both for z = 4 and 8), but only from values of m_0 near 1.

Up until now we have only been considering the storage of p random, unbiased patterns, i.e., where the mean « magnetisation » is zero:

  <ξ_i^r> = 0,

where the angular brackets denote a (quenched) average over the choice of the random ξ^r. It has been found that the storage capacity of this class of networks is improved if we instead attempt to store biased patterns [9, 22] which have a non-zero mean magnetisation,

  <ξ_i^r> = a ≠ 0.                                                (11)

Does this also hold for networks with restricted synapses of the type considered here?
Figure 7a produces an affirmative answer, showing that for high enough bias a the retrieval
quality is improved.
However, as explained by Amit et al. [22], we really should examine not merely the number of patterns stored, but the total information content of the patterns. Their measure takes into account both the amount of information stored in a nominal pattern and the loss of information when the pattern is retrieved with errors (m_f < 1). The information stored in each nominal pattern is the entropy, S, associated with the number of ways of choosing a random pattern ξ subject to its magnetisation (11) being a:

  S(a) = -N [ ((1+a)/2) log_2((1+a)/2) + ((1-a)/2) log_2((1-a)/2) ].

The information lost when the retrieved pattern has overlap m_f with the nominal one is the entropy of all possible patterns which have an overlap m_f with that nominal pattern:

  S(m_f) = -N [ ((1+m_f)/2) log_2((1+m_f)/2) + ((1-m_f)/2) log_2((1-m_f)/2) ].

Thus, for z connections per site, the total information per connection stored in the network when the mean final overlap of retrieved patterns is m_f is

  i = p [S(a) - S(m_f)] / (zN).
Note that this is also the information per bit used to store the patterns.
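These measures are straightforward to evaluate numerically; a small sketch (helper names are ours; entropies taken per neuron and in bits, so that the result is an information-per-bit figure):

  ! Information per connection for p patterns of bias a retrieved
  ! with mean final overlap mf in a network with z connections per
  ! site. The per-neuron entropies make the factor N cancel.
  program info_per_connection
    implicit none
    print *, info(2, 4, 0.0d0, 0.95d0)  ! e.g. p = 2, z = 4, a = 0, mf = 0.95
  contains
    pure function entropy(x) result(s)  ! per-neuron entropy S(x)/N, in bits
      real(8), intent(in) :: x
      real(8) :: s, pp, pm
      pp = (1d0 + x) / 2d0
      pm = (1d0 - x) / 2d0
      s  = 0d0
      if (pp > 0d0) s = s - pp * log(pp) / log(2d0)
      if (pm > 0d0) s = s - pm * log(pm) / log(2d0)
    end function entropy
    pure function info(p, z, a, mf) result(i)
      integer, intent(in) :: p, z
      real(8), intent(in) :: a, mf
      real(8) :: i
      i = p * (entropy(a) - entropy(mf)) / z  ! stored minus lost, per connection
    end function info
  end program info_per_connection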
Using this quantity (Fig. 7b) we see that less information is actually stored as the bias (and hence correlation) of the patterns is increased. It is also evident that although the NNN network produces enhanced retrieval of p = 2 and 4 patterns, the information per connection is actually less.

Fig. 7. - a) Mean final overlap, m_f (from m_0 = 1), in networks storing random patterns of mean magnetisation a; b) the corresponding information stored per bit of information used.
5. Discussion.
How well does the performance of this type of network compare with that of a fully-connected model where each neuron can interact with any other? It has become customary to measure the network's ability to store patterns in terms of the storage ratio α = p/N, the ratio of the number p of patterns stored to the number N of neurons in the network. This would be a naive and unfair measure, however: we should rather use a quantity which tells us the number of bits of information stored, b_S, compared to the number of bits used to store them (the number of bits needed to specify all of the synapses), b_T. For the fully-connected network with binary synapses (T_{ij} = ±1), the number of bits used is b_T = N^2 for asymmetric connections, and (1/2)N^2 for symmetric interactions. In the networks considered here, the corresponding number is b_T = zN, where the coordination number z is 4 for the nearest-neighbour network and z = 8 for the NNN network.

In table III two measures are used for the number of bits stored in the network. The information ratio R_1 involves the total number of « uncorrupted » bits which are retrieved: if the nominal patterns are retrieved with a mean final overlap m_f, then the number of retrieval errors for each pattern is (1/2)N(1 - m_f), so that the total number of bits stored correctly is b_S = (1/2)N(1 + m_f)p. For a fully-connected network with symmetric couplings [15], m_f = 0.948 and p = 0.102 N, so that b_S = (1/2)(1.948)(0.102)N^2 ≈ 0.0993 N^2 and thus R_1 ≡ b_S/b_T ≈ 0.199. From table III it is clear that both the nearest-neighbour and NNN networks outperform their long-range counterpart.

The second ratio, R_2, uses an entropic measure (as discussed previously) akin to that of Amit et al. [22] for the information stored: R_2 = I(m_f)/b_T, where I(m_f) is the total information stored in p patterns,

  I(m_f) = p [S(a) - S(m_f)].                                     (15)
Table III. - Comparison of the performance of a clipped fully-connected model with the local z = 4 and z = 8 models in terms of the ratios: R_1, the number of bits retrieved without errors to the number of bits used in the synapses, b_T; and R_2, the information I(m_f) (15) stored in the network to b_T.
Once again, the networks with localised connections perform better with respect to this second measure (Tab. III).

Networks with interactions limited to a local neighbourhood and restricted to one bit only are able to function as associative memories, since it is possible to create stable states with an appreciable correlation to the nominal random patterns. While it is, of course, not possible to store an extensive number (O(N)) of patterns for z = 4 or 8 connections per neuron, the performance of the network is improved compared to the fully-connected model (z = O(N)), since the ratio of information stored to the information used to store it (in the synapses) is higher. It would be interesting to study the effect of systematically increasing z (Ref. [23]) or of increasing the number of bits per synapse. The introduction of asynchronous dynamics may, of course, also have a significant effect. The local nature of the connections may not correspond closely to real neural systems, but for purposes of hardware realisation of associative memory such models should prove far more viable, requiring O(N) links on a chip of N « neurons » instead of O(N^2) links.

For such models with « clipped » synapses and local connectivity, as well as simple Ising-like neurons, the technique of multi-spin coding is a very powerful simulation tool: it provides both a significant speed-up over conventional programs and a very efficient use of computer memory.

Acknowledgments.

I am very grateful to D. Stauffer of the HLRZ Jülich for very helpful and stimulating discussions.
Appendix.
A multi-site coding algorithm involving 3 bits per site [17] (for z = 4 connections per site) is described here. The extension to higher values of z is straightforward (unlike the 1 bit per site algorithm described above), as will be pointed out below.

As before, the state of a neuron S_i is represented by the bit variable s_i = (S_i + 1)/2, and the connections T_{ij} are stored as t_{ij} = (T_{ij} + 1)/2, with the modified signal T_{ij} S_j incident from neuron j onto neuron i corresponding to EQV(t_{ij}, s_j), where EQV is the « equivalence » bitwise logical operation. Now if we consider the local field experienced by a neuron, H_i = \sum_{j=1}^{4} T_{ij} S_j, we see that there are only five possible outcomes, namely H_i = -4, -2, 0, 2 or 4. The corresponding summation in bit variables, h_i = \sum_{j=1}^{4} EQV(t_{ij}, s_j), will yield 0, 1, 2, 3 or 4.

In bit representation this updating rule can be achieved by s_i(t + τ) = 1 (0) if and only if h_i + 1 + s_i(t) ≥ 4 (< 4). By adding the extra 1 we now have the convenient rule that the neuron switches on if and only if the third bit of the sum is set. Furthermore, the outcome of this latter summation is restricted to an integer value lying between 1 and 6 inclusive, which only requires three bits of storage (as it would have without inclusion of the additional 1). This can be