
HAL Id: jpa-00211044

https://hal.archives-ouvertes.fr/jpa-00211044

Submitted on 1 Jan 1989


Vectorized multi-site coding for nearest-neighbour neural networks

B.M. Forrest

To cite this version:

B.M. Forrest. Vectorized multi-site coding for nearest-neighbour neural networks. Journal de Physique, 1989, 50 (15), pp. 2003-2017. doi:10.1051/jphys:0198900500150200300. jpa-00211044.


Vectorized multi-site coding for nearest-neighbour neural networks

B. M. Forrest

Institut für Festkörperforschung, Kernforschungsanlage, Postfach 1913, D-5170 Jülich, F.R.G.

(Received 20 April 1989, accepted 21 April 1989)

Résumé (translated from the French). We study by numerical simulation binary neural networks using multi-site coding algorithms. We obtain speeds of over 200 neuron updates per microsecond with a vectorized algorithm on the Cray X-MP. We present results for two-dimensional networks containing up to 512 x 512 neurons. We show that the networks function as associative memories and that they store information more efficiently than fully-connected networks.

Abstract. Ising spin neural networks with clipped synapses (T_ij = ±1 only) and with local connectivity are simulated using multi-site coding algorithms. Speeds of over 200 neuron updates per microsecond are achieved by vectorization of the algorithm on the Cray X-MP. Results are presented for two-dimensional networks of up to 512 x 512 neurons. The networks are shown to function as associative memories, and the ratio of the amount of information stored to the amount used to store it improves upon fully-connected models.

Classification

Physics Abstracts

02.70 - 05.50

1. Introduction.

In recent years a great deal of research activity has centred around simplified models of neural networks, examining their ability to perform as associative memories. Within the class of models which shall concern us here, each neuron may be represented by an Ising (or bit) variable S_i = 1 for neuron « on » (spin up) or −1 for neuron « off » (spin down). The state of each neuron is governed by its momentary local field h_i(t), which is assumed to be a linear sum of the incoming signals from all the neurons j which have a synaptic connection T_ij incident onto neuron i:

$$h_i(t) = \sum_j T_{ij}\, S_j(t). \qquad (1)$$

If no stochastic noise is present (which shall be the case here), then each neuron simply « aligns » with its local field,

$$S_i(t + \tau) = \mathrm{sgn}\big(h_i(t)\big), \qquad (2)$$



where τ represents the « clock cycle » of each neuron. In the simulations presented below, the network was updated synchronously, that is, at each time-step τ all of the neurons modified their states according to (2).

The states of the network which are the fixed points of the above dynamics are customarily identified as the states which are « stored » by the network, corresponding to persistent « firing patterns » (stable spin configurations). The necessary and sufficient condition for S* to be such a state is that every spin S_i* be aligned with its local field,

$$S_i^*\, h_i^* > 0 \quad \text{for every } i. \qquad (3)$$
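Stated as code, the model so far is small. The following C fragment is a minimal scalar sketch of the synchronous dynamics (2) and the stability condition (3); the dense T array, the value of N and all names are illustrative only (the paper's actual program is the vectorized Fortran of section 3), and keeping S_i when h_i = 0 anticipates the rule adopted in the simulations below.

```c
/* Minimal scalar sketch of the dynamics (2) and condition (3); the
 * dense T array, the value of N and all names are illustrative. */
#include <string.h>

#define N 16                       /* illustrative network size */

/* one synchronous sweep: each neuron aligns with its local field;
 * keeping S_i when h_i = 0 anticipates the rule chosen below */
void sweep(int S[N], const int T[N][N])
{
    int Snew[N];
    for (int i = 0; i < N; i++) {
        int h = 0;                 /* h_i = sum_j T_ij S_j */
        for (int j = 0; j < N; j++)
            h += T[i][j] * S[j];
        Snew[i] = h > 0 ? 1 : h < 0 ? -1 : S[i];
    }
    memcpy(S, Snew, sizeof Snew);  /* all neurons change together */
}

/* condition (3): a state is stored iff every spin is strictly aligned */
int is_fixed_point(const int S[N], const int T[N][N])
{
    for (int i = 0; i < N; i++) {
        int h = 0;
        for (int j = 0; j < N; j++)
            h += T[i][j] * S[j];
        if (S[i] * h <= 0)
            return 0;
    }
    return 1;
}
```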

Ideally we would like to specify a priori a set of nominal patterns {ξ_i^r ; 1 ≤ i ≤ N ; 1 ≤ r ≤ p} which are to be stored in this neural memory, i.e., which are to be fixed points of the dynamics (2), or, at least, which should lie reasonably close to a fixed point. Whether or not we are successful depends entirely upon our choice of the synaptic efficacies T_ij.

It is well known that for the Hopfield-Little model [1-2], which is a fully-connected network (where each neuron may be connected to every other), the Hebbian prescription

$$T_{ij} = \frac{1}{N} \sum_{r=1}^{p} \xi_i^r\, \xi_j^r \qquad (4)$$

can successfully store up to p ≈ 0.14 N random uncorrelated patterns [3]. In the thermodynamic limit (N → ∞) the T_ij assume a continuous range of values, since they are discretised on the scale of 1/N, as they are in the error-corrective learning algorithms [4-11] which have been studied in order to improve upon the performance of (4).

The realisation that the fully-connected nature of these synaptic connections would prove an insurmountable task in the fabrication of a network of any reasonable size has prompted the consideration of networks with more restricted architectures limiting the number of connections per site [12]. Implementation difficulties would also be alleviated either by imposing an upper bound on the magnitude of the connections [13-16] (« clipped synapses ») or by discretising the connections, or both. For example, in the extreme case of the latter restriction, where the synapses are ±1 only, it has been calculated [15] that a fully-connected network will function as an associative memory, storing p = α_c N random uncorrelated patterns, with critical storage ratio α_c = 0.102 and with retrieval quality at worst 97.4 %.

Here we shall consider the imposition of both restrictions: the T_ij will be allowed to assume only the values ±1 and each neuron shall only be connected to its four nearest neighbours in a two-dimensional network. As will be explained below, this shall allow us to simulate very large networks using very powerful vectorized multi-spin coding techniques [17]. A similar model has recently been studied by Kürten [18], employing different techniques and addressing different aspects.

2. Choosing the synaptic connections.


Before describing the algorithm, we first should specify the choice of the synapses T_ij. Given that they are limited to ±1, how should we choose them? We shall consider a general asymmetric network so that we are not encumbered by the condition T_ij = T_ji. This permits us to consider each site i independently. Now, given the set of nominal patterns {ξ_i^r ; 1 ≤ i ≤ N ; 1 ≤ r ≤ p} which we wish to store, at each site i we would like to have that, for each pattern r,

$$\xi_i^r \sum_j T_{ij}\, \xi_j^r > 0, \qquad (5)$$

where j runs over the four nearest neighbours of i. Since each T_ij can only assume one of two possible values, there are only 2^4 = 16 choices for the four incoming connections to site i. We can thus simply perform an exact enumeration, evaluate the p constraints (5) for each of these 16 possible choices and choose the best one for our connections to site i. The « best » choice shall be designated as that one which satisfies the most of the p inequalities (5).

In the case of a tie, we shall choose the one which maximises

$$\sum_{r=1}^{p} \Theta(R_i^r)\, R_i^r, \qquad \text{where } R_i^r = \xi_i^r \sum_{j=1}^{N} T_{ij}\, \xi_j^r$$

and Θ is the threshold function. The reason for this choice is that a larger value of R_i^r should imply larger content-addressability of the r-th nominal pattern [8].

This exact enumeration of all possible choices of connections requires 16 Np evaluations of (5).
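A compact sketch of this enumeration for a single site, in C with illustrative names (the original program is Fortran), might look as follows; the tie-break uses the Θ-weighted stability sum described above.

```c
/* Hedged C sketch of the exact-enumeration learning for one site i
 * with z = 4 synapses.  xi[r][k]: bit (+/-1) of pattern r at the k-th
 * neighbour of i; xii[r]: bit of pattern r at i itself. */
void choose_synapses(int p, const int xi[][4], const int xii[], int Tbest[4])
{
    int best_sat = -1;
    int best_tie = 0;
    for (int choice = 0; choice < 16; choice++) {  /* all 2^4 sign choices */
        int T[4], sat = 0, tie = 0;
        for (int k = 0; k < 4; k++)
            T[k] = (choice >> k & 1) ? 1 : -1;
        for (int r = 0; r < p; r++) {
            int R = 0;                  /* stability R_i^r of pattern r */
            for (int k = 0; k < 4; k++)
                R += T[k] * xi[r][k];
            R *= xii[r];
            if (R > 0) { sat++; tie += R; }   /* constraint (5) holds */
        }
        /* keep the choice satisfying most constraints; break ties with
         * the theta-weighted stability sum of section 2 */
        if (sat > best_sat || (sat == best_sat && tie > best_tie)) {
            best_sat = sat;
            best_tie = tie;
            for (int k = 0; k < 4; k++) Tbest[k] = T[k];
        }
    }
}
```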

3. Multi-spin coding.

Using the optimal choice of connections elucidated in the previous section, a two-dimensional network of L × L neurons S_i ; 1 ≤ i ≤ N (N = L²) was simulated by employing a Fortran multi-spin coding algorithm which requires only one bit per spin (neuron) and is based on a method propounded by Herrmann [19] for the fast simulation of Ising models. The technique lends itself to system sizes where L is a multiple of 64. Thus the system sizes which are dealt with start off where in other models they often end: 4 096 neurons.

Defining M = L/64, the first M spins in the first row of the lattice are placed in the first bit of the 64-bit integers IS(1), ..., IS(M), then the next M spins are placed in the second bit, and so on up to the 64th bit. The next row of the lattice will be held by the words IS(M + 1) to IS(2 × M). The actual array of spins is thus represented by the L × M words IS(M + 1) to IS(L × M + M), with the top row (IS(1) to IS(M)) and an additional row at the bottom serving as shadow lines to invoke up-down periodic boundary conditions. Hence an array IS of L × M + 2 × M words will hold all the spins in an L × L lattice plus these two shadow lines. With the exception of the words M + 1, 2 M + 1, ... and 2 M, 3 M, ..., the four neighbours of each of the 64 neurons held in the word IS(I) will then be found at the same bit-position in the words IS(I − M), IS(I + 1), IS(I + M) and IS(I − 1) (up, right, down and left neighbour, respectively).
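To fix the indexing conventions, here is a hedged C transcription of this layout; the paper's program is Fortran with 1-based arrays, so every name and the 0-based helper functions below are illustrative. The two exceptional word classes noted above are flagged in comments.

```c
/* Hedged C transcription of the 1-bit layout.  Each lattice row
 * occupies M = L/64 words; the first M words and the last M words are
 * the shadow rows for the up-down periodic boundary. */
#include <stdint.h>

enum { L = 128, M = L / 64 };           /* illustrative system size */

static uint64_t IS[L * M + 2 * M];      /* L rows plus two shadow rows */

/* word index and bit position of neuron (x, y), 0 <= x, y < L: the
 * first M spins of a row go into bit 0 of its M words, the next M
 * spins into bit 1, and so on. */
static inline int word_of(int x, int y) { return (y + 1) * M + x % M; }
static inline int bit_of(int x)         { return x / M; }

/* For a word I away from the exceptional columns, the four neighbours
 * of all 64 neurons in IS[I] sit at the same bit position in: */
static inline uint64_t up_w(int I)    { return IS[I - M]; }
static inline uint64_t down_w(int I)  { return IS[I + M]; }
static inline uint64_t left_w(int I)  { return IS[I - 1]; }
/* exception I % M == 0: left neighbour is IS[I + M - 1], one bit lower */
static inline uint64_t right_w(int I) { return IS[I + 1]; }
/* exception I % M == M-1: right neighbour is IS[I - M + 1], one bit higher */
```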

Table I. - The number of single neuron updates per microsecond achieved on the Cray X-MP for various systems of size L × L.


Now, recalling the form of the updating rule for a neuron (2), we still have to specify the case of h_i = 0. The rule chosen in the simulations was S_i(t + τ) = S_i(t) if h_i = 0 since, as will be explained below, this was found to induce much more stability in the network than choosing S_i(t + τ) = 1 if h_i = 0.

Representing the state of a neuron S_i by the bit variable s_i = ½(S_i + 1), and storing the connections in a similar fashion, t_ij = ½(T_ij + 1), the modified signal T_ij S_j incident from neuron j onto neuron i will then correspond to EQV(t_ij, s_j), where EQV is the « equivalence » bitwise logical operation.

The neural updating rule explained above can be realised in the following manner. Denoting the bits EQV(t_ij, s_j) by n_j, we set the i-th neuron « on » (TRUE) if and only if at least three of the five bits n_1, n_2, n_3, n_4 and s_i are TRUE. This can be implemented by a Boolean function of these five bits, where ∨ denotes logical OR and ∧ denotes logical AND.
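Such a bit-parallel majority function can be realised in several equivalent ways; the paper's exact OR/AND expression is not reproduced here, but the following C sketch (illustrative names; the paper's code is Fortran) gives one adder-style form of the « at least three of five » rule acting on all 64 bit positions of a word at once. Note that counting s_i among the five bits is precisely what realises the h_i = 0 convention: a neuron receiving exactly two « on » signals keeps its previous state.

```c
/* Bit-parallel update of 64 neurons at once.  eqv() is the bitwise
 * « equivalence » operation; at_least_3_of_5() is one realisation of
 * the Boolean majority function described in the text. */
#include <stdint.h>

static inline uint64_t eqv(uint64_t a, uint64_t b) { return ~(a ^ b); }

static inline uint64_t at_least_3_of_5(uint64_t a, uint64_t b, uint64_t c,
                                       uint64_t d, uint64_t e)
{
    uint64_t g = a & b, p = a ^ b;        /* a + b = 2g + p     */
    uint64_t h = c & d, q = c ^ d;        /* c + d = 2h + q     */
    uint64_t c1 = p & q, low = p ^ q;     /* p + q = 2c1 + low  */
    uint64_t c2 = low & e, ones = low ^ e;
    /* total = 2(g + h + c1 + c2) + ones; it is >= 3 iff the twos count
     * is >= 2, or it is >= 1 and the ones bit is also set. */
    uint64_t twos2 = (g & h) | ((g | h) & (c1 | c2)) | (c1 & c2);
    uint64_t twos1 = g | h | c1 | c2;
    return twos2 | (twos1 & ones);
}

/* s: 64 current states; t[k]: packed connections to the k-th neighbour;
 * nb[k]: the four neighbour words located as described above. */
uint64_t update_word(uint64_t s, const uint64_t t[4], const uint64_t nb[4])
{
    return at_least_3_of_5(eqv(t[0], nb[0]), eqv(t[1], nb[1]),
                           eqv(t[2], nb[2]), eqv(t[3], nb[3]), s);
}
```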

Three separate loops, each of which fully vectorizes due to the parallel nature of the neural dynamics, are needed for a sweep through the lattice: one for those words (M + 1, 2 M + 1, ...) where the left neighbour is not a bit in the same position as the bit of the site being updated; one for those words which have such a right-hand neighbour (2 M, 3 M, ...); and one for all remaining words.

The algorithm achieves over 200 million neuron updates per second on the Cray X-MP: timings for various system sizes up to L = 2 048 are presented in table I. Of course, the above algorithm generalises for an n-bit machine (n = 64 for the Cray) to systems of linear size L = n × M. (An algorithm using 3 bits per site [17] was slightly slower (around 180 million updates per second) but was applied to systems L ≤ 32 and is discussed in the appendix.)

4. Results.

Systems of linear size L = 4 to 512 were simulated. Although the above algorithm allowed systems as large as 2 048 × 2 048 to be dealt with, the computational effort required for the exact enumeration learning of the synapses limited the number of statistical samples which could be carried out in a reasonable time. Note that this initialisation effort is equivalent to 16 Np sweeps of the network (Np single spin updates for each possible choice of the connections at a site), but that this part of the program was not carried out by multi-spin coding. The timings for the updating loop were found to be over 100 times faster using multi-spin coding as compared to normal Fortran (one word per site and involving integer multiplications). Given that, for example, at L = 512 the number of sweeps to stability of a pattern was typically between 100 and 200 (see below), this is a substantial saving.

Figure 1a shows the mean final overlap m_f of an iterated pattern with a nominal state after having started from a state which had initial overlap m_0 with that nominal state. These results were obtained by averaging over 10 initial states with overlap m_0 and then performing a quenched average over 10³ independent samples (10³ choices of the ξ_i^r). The overlap is the usual measure of the resemblance of two patterns, or spin configurations S(1) and S(2):

$$m = \frac{1}{N} \sum_{i=1}^{N} S_i^{(1)} S_i^{(2)}.$$

Fig. 1. - a) Mean final overlap, m_f, after iteration from an initial state having overlap m_0 and with p = 2 patterns stored; b) the size-dependence of the mean final overlap, m_f, from a state having overlap m_0 = 0.75 with p = 2.

The choice of the dynamics which keeps S_i the same if h_i = 0 is justified from this figure, since the performance using S_i(h_i = 0) → 1 is greatly deteriorated, even at p = 2. Moreover, the mean number of unstable neurons was observed to increase: e.g., to around 5 % at p = 2, m_0 = 1.0, and to 20 % at p = 2, m_0 = 0.5, from less than 0.1 % with the chosen dynamics.

(These unstable sites were in fact all bistable. Note that for voting rule cellular automata [20], where each cell adopts the state corresponding to a « poll » of its neighbours, every state evolves into either a fixed point or a bistable state - no limit cycles of higher period exist. The network here is similar to such a rule, but is different in that the states of the neighbouring cells are involved in the « vote » only after they are modified by their synaptic connections.)

These results hold for all the system sizes attempted with L > 12, since the mean final overlap was only found to show a size-dependent drift for L ≤ 12 (Fig. 1b). For larger values of L, m_f remained unchanged (within statistical fluctuations) on increasing L - the width of the distribution of final overlaps merely decreased.

The width $\Delta m_f = \sqrt{\langle m_f^2 \rangle - \langle m_f \rangle^2}$ of m_f was found to increase as m_0 decreased, but the rate at which Δm_f decreased with respect to increasing system size L was found to be independent of m_0. As shown in table II, obtained from figure 2, Δm_f² apparently obeys the scaling relation

$$\Delta m_f^2 \simeq w(m_0)\, N^{-\gamma},$$

with γ = 1.0 and w(m_0) some function of m_0 only: thus Δm_f exhibits strong self-averaging. This in turn indicates that as L → ∞ the probability of obtaining a particular final overlap m_f from an initial overlap m_0 will approach a Kronecker delta function: p(m_f | m_0) → δ(m_f − m_f(m_0)), where m_f(m_0) is the function plotted in figure 1a.

Fig. 2. - The size-dependence of the width, Δm_f, of m_f after iteration from four different values of m_0 at p = 2.

Table II. - Estimates of the parameters g and c obtained from the linear relationship (Fig. 2) ln Δm_f = g ln L + c.


If we ask for the fraction f(m_0) of iterated states which are recalled to within 10 % accuracy (m_f ≥ 0.8) from an initial state having overlap m_0, then, for p = 2, we obtain the behaviour in figure 3a.

Fig. 3. - a) At p = 2, the fraction, f, of states recalled with less than 10 % errors from initial overlap m_0. Best-fit scaling forms (9) are drawn through the points; b) the linear relationship between ln(f/(1−f)) and m_0 implies the scaling form (9).


The smooth curves drawn through the points are best-fit forms of the relation expressing the ratio of the probability of recall to non-recall as [8]

$$\ln\!\left(\frac{f}{1-f}\right) = g\,(m_0 - \tilde m_0). \qquad (9)$$

The validity of this assumption is confirmed by figure 3b. The gradient g was found to scale linearly with L, since the data fit ln g = γ ln L + c with γ = 1.004 ± 0.014 and c = 0.230 ± 0.059, so that g/L = 1.26 ± 0.06. An estimate for the critical minimum overlap, above which f(m_0) → 1 for L → ∞, can be obtained by extrapolating the initial overlaps m_0(f) required for a particular f to L⁻¹ → 0. This is done in figure 4 for f = 0.2 and 0.8, yielding the critical minimum overlap m_c.

Fig. 4. - The initial overlap, m_0, required to produce a mean recall fraction, f, is plotted against L⁻¹. Extrapolation to L⁻¹ → 0 yields the critical minimum overlap, m_c.

Similar critical behaviour with respect to f has been obtained for fully-connected models [8], with f(m_0) following an identical scaling form (9).

It is not clear whether the number of sweeps to stability n(L) grows exponentially with respect to the system size for large L, although from figure 5 we cannot rule out the case that the number may obey some scaling law, n(L, m_0) = n_1(m_0) n_2(L). Such a law, with n_2(N) = ln N, has recently been found by Kanter [21] for infinite-ranged interactions. (For the simulations here, the number n(L) is actually the number of sweeps until every neuron is either stable or bistable. In fact, the number of bistable neurons was of the order of 0.1 % in all cases.)

Simulations were also performed for networks containing both nearest- and next-nearest-neighbour (NNN) connections for system sizes L = 16 up to L = 128. The exact enumeration learning procedure then involves 2^8 possible choices at each of the N sites of the network.


Fig. 5. - The mean number of sweeps, n(L), required for iteration to stability from states having a given initial overlap, m_0, at different system sizes L.

Similar behaviour of the final overlap was observed, i.e., the mean value remained invariant to within statistical fluctuations as L was increased, with the width decreasing. In terms of the closeness of the fixed points to the nominal states, the performance of the network improved, m_f increasing for larger values of m_0 at p = 2 and 4. However, for p = 8 the NNN network had a slightly inferior performance, as can be seen in figure 6.

Fig. 6. - Comparison of the retrieval quality of the network with only nearest-neighbour connections (z = 4) and with additional next-nearest-neighbour connections (z = 8).


Relaxing the condition (3) that the alignment of « spin » and « local field » should be strictly positive to the case where it is only required to be non-negative enhances the retrieval quality of the network (both for z = 4 and 8), but only from values of m_0 near 1.

Up until now we have only been considering the storage of p random, unbiased patterns, i.e., where the mean « magnetisation » is zero:

$$\langle\langle\, \xi_i^r \,\rangle\rangle = 0,$$

where the angular brackets denote a (quenched) average over the choice of the random ξ_i^r. It has been found that the storage capacity of this class of networks is improved if we instead attempt to store biased patterns [9, 22] which have a non-zero mean magnetisation,

$$\langle\langle\, \xi_i^r \,\rangle\rangle = a \neq 0. \qquad (11)$$

Does this also hold for networks with restricted synapses of the type considered here? Figure 7a produces an affirmative answer, showing that for high enough bias a the retrieval quality is improved.

However, as explained by Amit et al. [22], we really should examine not merely the number of patterns stored, but the total information content of the patterns. Their measure takes into account both the amount of information stored in a nominal pattern and the loss of information when the pattern is retrieved with errors (m_f < 1). The information stored in each nominal pattern is the entropy, S, associated with the number of ways of choosing a random pattern ξ subject to its magnetisation (11) being a:

$$S(a) = -N \left[ \frac{1+a}{2} \log_2 \frac{1+a}{2} + \frac{1-a}{2} \log_2 \frac{1-a}{2} \right]. \qquad (12)$$

The information lost when the retrieved pattern has overlap m_f with the nominal one is the entropy of all possible patterns which have an overlap m_f with that nominal pattern:

$$S(m_f) = -N \left[ \frac{1+m_f}{2} \log_2 \frac{1+m_f}{2} + \frac{1-m_f}{2} \log_2 \frac{1-m_f}{2} \right]. \qquad (13)$$

Thus, for z connections per site, the total information per connection stored in the network when the mean final overlap of retrieved patterns is m_f is

$$i = \frac{p\,[\,S(a) - S(m_f)\,]}{zN}. \qquad (14)$$

Note that this is also the information per bit used to store the patterns.
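The measure lends itself to a few lines of code. The C sketch below (illustrative names) computes the information stored per synapse bit under the entropy forms reconstructed above; treat those forms as an assumption keyed to Amit et al. [22], not a quotation of the paper's equations.

```c
/* Numeric sketch of the information measure (12)-(14); the binary
 * entropy forms are an assumption keyed to Amit et al. [22]. */
#include <math.h>
#include <stdio.h>

static double H2(double x)                 /* binary entropy in bits */
{
    if (x <= 0.0 || x >= 1.0)
        return 0.0;
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x);
}

/* information stored per synapse bit for p patterns of bias a,
 * retrieved with mean overlap mf, with z connections per site;
 * the factor N cancels against b_T = z N */
static double info_per_bit(int p, double a, double mf, int z)
{
    double stored = H2(0.5 * (1.0 + a));   /* S(a) per neuron */
    double lost   = H2(0.5 * (1.0 + mf));  /* loss per neuron */
    return p * (stored - lost) / z;
}

int main(void)
{
    /* illustrative call only: p = 2 unbiased patterns, z = 4 */
    printf("i = %.3f bits per connection\n", info_per_bit(2, 0.0, 0.95, 4));
    return 0;
}
```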

Using this quantity (Fig. 7b), we see that less information is actually stored as the bias (and hence correlation) of the patterns is increased. It is also evident that although the NNN network produces enhanced retrieval of p = 2 and 4 patterns, the information per connection is actually less.


Fig. 7. - a) Mean final overlap, m_f (from m_0 = 1), in networks storing random patterns of mean magnetisation a; b) the corresponding information stored per bit of information used.


5. Discussion.

How well does the performance of this type of network compare with that of a fully-connected model, where each neuron can interact with any other? It has become customary to measure the network's ability to store patterns in terms of the storage ratio α = p/N, the ratio of the number p of patterns stored to the number N of neurons in the network. This would be a naive and unfair measure, however: we should rather use a quantity which tells us the number of bits of information stored, b_s, compared to the number of bits used to store them (the number of bits needed to specify all of the synapses), b_T.

For the fully-connected network with binary synapses (T_ij = ±1), the number of bits used is b_T = N² for asymmetric connections, ½N² for symmetric interactions. In the networks considered here, the corresponding number is b_T = zN, where the coordination number z is 4 for the nearest-neighbour network and z = 8 for the NNN network.

In table III two measures are used for the number of bits stored in the network. The information ratio R_1 involves the total number of « uncorrupted » bits which are retrieved: if the nominal patterns are retrieved with a mean final overlap m_f, then the number of retrieval errors for each pattern is ½N(1 − m_f), so that the total number of bits stored correctly is b_s = ½N(1 + m_f)p. For a fully-connected network with symmetric couplings [15], m_f = 0.948 and p = 0.102 N, thus R_1 = b_s/b_T = 0.199. From table III it is clear that both the nearest-neighbour and NNN networks outperform their long-range counterpart.

The second ratio, R_2, uses an entropic measure (as discussed previously) akin to that of Amit et al. [22] for the information stored: R_2 = I(m_f)/b_T, where I(m_f) is the total information stored in the p patterns,

$$I(m_f) = p\,[\,S(a) - S(m_f)\,]. \qquad (15)$$

Table III. - Comparison of the performance of a clipped fully-connected model with the local z = 4 and z = 8 models in terms of the ratios: R_1, the number of bits retrieved without errors to the number of bits used in the synapses, b_T; and R_2, the information I(m_f) (15) stored in the network to b_T.


Once again, the networks with localised connections perform better with respect to this second measure (Tab. III).

Networks with interactions limited to a local neighbourhood and restricted to one bit only are able to function as associative memories, since it is possible to create stable states with an appreciable correlation to the nominal random patterns. While it is, of course, not possible to store an extensive number (O(N)) of patterns for z = 4 or 8 connections per neuron, the performance of the network is improved compared to the fully-connected model (z = O(N)), since the ratio of information stored to the information used to store it (in the synapses) is higher.

It would be interesting to study the effect of systematically increasing z (Ref. [23]) or of increasing the number of bits per synapse. The introduction of asynchronous dynamics may, of course, also have a significant effect. The local nature of the connections may not correspond closely to real neural systems, but for purposes of hardware realisation of associative memory such models should prove far more viable, requiring O(N) links on a chip of N « neurons » instead of O(N²) links.

For such models with « clipped » synapses and local connectivity as well as simple Ising-like neurons, the technique of multi-spin coding is a very powerful simulation tool: it provides both a significant speed-up over conventional programs and a very efficient use of computer memory.

Acknowledgments.

I am very grateful to D. Stauffer of the HLRZ Jülich for very helpful and stimulating discussions.

Appendix.

A multi-site coding algorithm involving 3 bits per site [17] (for z = 4 connections per site) is described here. The extension to higher values of z is straightforward (unlike the 1 bit per site algorithm described above), as will be pointed out below.

As before, the state of a neuron S_i is represented by the bit variable s_i = ½(S_i + 1), and the connections T_ij are stored as t_ij = ½(T_ij + 1), with the modified signal T_ij S_j incident from neuron j onto neuron i corresponding to EQV(t_ij, s_j), where EQV is the « equivalence » bitwise logical operation. Now if we consider the local field experienced by a neuron, $H_i = \sum_{j=1}^{4} T_{ij} S_j$, we see that there are only five possible outcomes, namely H_i = −4, −2, 0, 2 or 4. The corresponding summation in bit variables, $h_i = \sum_{j=1}^{4} \mathrm{EQV}(t_{ij}, s_j)$, will yield 0, 1, 2, 3 or 4.

In bit representation this updating rule can be achieved by s_i(t + τ) = 1 (0) if and only if h_i + 1 + s_i(t) ≥ 4 (< 4). By adding the extra 1 we now have the convenient rule that the neuron switches on if and only if the third bit of the sum is set. Furthermore, the outcome of this latter summation is restricted to an integer value lying between 1 and 6 inclusive, which only requires three bits of storage (as it would have without inclusion of the additional 1).

This can be exploited in the following fashion: 21 neurons S_i can be stored in one 64-bit integer IS(I), each occupying a 3-bit field in which the above sum can be accumulated.
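A hedged C sketch of this packing follows; the placement of the state bit in the lowest bit of each field and the mask value are assumptions consistent with the text, not the paper's own (Fortran) listing. The key point is that each field accumulates h_i + 1 + s_i(t) ≤ 6, so no carry ever crosses from one 3-bit field into the next, and all 21 sites are updated in parallel.

```c
/* Hedged sketch of the 3-bit packing: 21 neurons per 64-bit word, one
 * 3-bit field each, state bit in the lowest bit of its field.
 * MASK1 selects that lowest bit in all 21 fields. */
#include <stdint.h>

#define MASK1 0x1249249249249249ULL   /* bit 0 of each 3-bit field */

static inline uint64_t eqv(uint64_t a, uint64_t b) { return ~(a ^ b); }

/* s: 21 state bits; t[k], nb[k]: connection and neighbour-state bits in
 * the same field positions.  Each field accumulates h_i + 1 + s_i <= 6,
 * so fields never carry into each other; the neuron is « on » afterwards
 * iff the third bit of its field is set. */
uint64_t update_3bit(uint64_t s, const uint64_t t[4], const uint64_t nb[4])
{
    uint64_t sum = s + MASK1;                 /* s_i + 1 in every field  */
    for (int k = 0; k < 4; k++)
        sum += eqv(t[k], nb[k]) & MASK1;      /* + EQV(t_ij, s_j)        */
    return (sum >> 2) & MASK1;                /* third bit of each field */
}
```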
