
HAL Id: jpa-00211044

https://hal.archives-ouvertes.fr/jpa-00211044

Submitted on 1 Jan 1989


Vectorized multi-site coding for nearest-neighbour neural networks

B.M. Forrest

To cite this version:

B.M. Forrest. Vectorized multi-site coding for nearest-neighbour neural networks. Journal de Physique, 1989, 50 (15), pp. 2003-2017. doi:10.1051/jphys:0198900500150200300. jpa-00211044.


Vectorized multi-site coding for nearest-neighbour neural networks

B. M. Forrest

Institut für Festkörperforschung, Kernforschungsanlage, Postfach 1913, D-5170 Jülich, F.R.G.

(Received 20 April 1989, accepted 21 April 1989)

Résumé (translated from the French). We study by numerical simulation binary neural networks using multi-site coding algorithms. We obtain speeds of over 200 neuron updates per microsecond with a vectorized algorithm on the Cray X-MP. We present results for two-dimensional networks containing up to 512 x 512 neurons. We show that the networks function as associative memories and that they store information more efficiently than fully-connected networks.

Abstract. Ising spin neural networks with clipped synapses (T_ij = ±1 only) and with local connectivity are simulated using multi-site coding algorithms. Speeds of over 200 neuron updates per microsecond are achieved by vectorization of the algorithm on the Cray X-MP. Results are presented for two-dimensional networks of up to 512 x 512 neurons. The networks are shown to function as associative memories, and the ratio of the amount of information stored to the amount used to store it improves upon fully-connected models.

Classification

Physics Abstracts

02.70 - 05.50

1. Introduction.

In recent years a great deal of research activity has centred around simplified models of neural networks, examining their ability to perform as associative memories. Within the class of models which shall concern us here, each neuron may be represented by an Ising (or bit) variable S_i = 1 for neuron « on » (spin up) or −1 for neuron « off » (spin down). The state of each neuron is governed by its momentary local field h_i(t), which is assumed to be a linear sum of the incoming signals from all the neurons j which have a synaptic connection T_ij incident onto neuron i:

$$h_i(t) = \sum_j T_{ij}\, S_j(t). \qquad (1)$$

If no stochastic noise is present (which shall be the case here), then each neuron simply « aligns » with its local field,

$$S_i(t + \tau) = \mathrm{sgn}\big(h_i(t)\big), \qquad (2)$$



where τ represents the « clock cycle » of each neuron. In the simulations presented below, the network was updated synchronously, that is, at each time-step τ all of the neurons modified their states according to (2).

The states of the network which are the fixed points of the above dynamics are customarily identified as the states which are « stored » by the network, corresponding to persistent « firing patterns » (stable spin configurations). The necessary and sufficient condition for S* to be such a state is that every spin S_i* be aligned with its local field,

$$S_i^*\, h_i^* > 0 \quad \text{for every } i. \qquad (3)$$
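Stated as code, the model so far is small. The following C fragment is a minimal scalar sketch of the synchronous dynamics (2) and the stability condition (3); the dense T array, the value of N and all names are illustrative only (the paper's actual program is the vectorized Fortran of section 3), and keeping S_i when h_i = 0 anticipates the rule adopted in the simulations below.

```c
/* Minimal scalar sketch of the dynamics (2) and condition (3); the
 * dense T array, the value of N and all names are illustrative. */
#include <string.h>

#define N 16                       /* illustrative network size */

/* one synchronous sweep: each neuron aligns with its local field;
 * keeping S_i when h_i = 0 anticipates the rule chosen below */
void sweep(int S[N], const int T[N][N])
{
    int Snew[N];
    for (int i = 0; i < N; i++) {
        int h = 0;                 /* h_i = sum_j T_ij S_j */
        for (int j = 0; j < N; j++)
            h += T[i][j] * S[j];
        Snew[i] = h > 0 ? 1 : h < 0 ? -1 : S[i];
    }
    memcpy(S, Snew, sizeof Snew);  /* all neurons change together */
}

/* condition (3): a state is stored iff every spin is strictly aligned */
int is_fixed_point(const int S[N], const int T[N][N])
{
    for (int i = 0; i < N; i++) {
        int h = 0;
        for (int j = 0; j < N; j++)
            h += T[i][j] * S[j];
        if (S[i] * h <= 0)
            return 0;
    }
    return 1;
}
```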

Ideally we would like to specify a priori a set of nominal patterns {ξ_i^r ; 1 ≤ i ≤ N ; 1 ≤ r ≤ p} which are to be stored in this neural memory, i.e., which are to be fixed points of the dynamics (2), or, at least, which should lie reasonably close to a fixed point. Whether or not we are successful depends entirely upon our choice of the synaptic efficacies T_ij.

It is well known that for the Hopfield-Little model [1-2], which is a fully-connected network (where each neuron may be connected to every other), the Hebbian prescription

$$T_{ij} = \frac{1}{N} \sum_{r=1}^{p} \xi_i^r\, \xi_j^r \qquad (4)$$

can successfully store up to p ≈ 0.14 N random uncorrelated patterns [3]. In the thermodynamic limit (N → ∞) the T_ij assume a continuous range of values, since they are discretised on the scale of 1/N, as they are in the error-corrective learning algorithms [4-11] which have been studied in order to improve upon the performance of (4).

The realisation that the fully-connected nature of these synaptic connections would prove an insurmountable task in the fabrication of a network of any reasonable size has prompted the consideration of networks with more restricted architectures limiting the number of connections per site [12]. Implementation difficulties would also be alleviated either by imposing an upper bound on the magnitude of the connections [13-16] (« clipped synapses ») or by discretising the connections, or both. For example, in the extreme case of the latter restriction, where the synapses are ±1 only, it has been calculated [15] that a fully-connected network will function as an associative memory, storing p = α_c N random uncorrelated patterns, with critical storage ratio α_c = 0.102 and with retrieval quality at worst 97.4 %.

Here we shall consider the imposition of both restrictions: the T_ij will be allowed to assume only the values ±1 and each neuron shall only be connected to its four nearest neighbours in a two-dimensional network. As will be explained below, this shall allow us to simulate very large networks using very powerful vectorized multi-spin coding techniques [17]. A similar model has recently been studied by Kürten [18], employing different techniques and addressing different aspects.

2. Choosing the synaptic connections.


Before describing the algorithm, we first should specify the choice of the synapses T_ij. Given that they are limited to ±1, how should we choose them? We shall consider a general asymmetric network so that we are not encumbered by the condition T_ij = T_ji. This permits us to consider each site i independently. Now, given the set of nominal patterns {ξ_i^r ; 1 ≤ i ≤ N ; 1 ≤ r ≤ p} which we wish to store, at each site i we would like to have that, for each pattern r,

$$\xi_i^r \sum_j T_{ij}\, \xi_j^r > 0, \qquad (5)$$

where j runs over the four nearest neighbours of i. Since each T_ij can only assume one of two possible values, there are only 2^4 = 16 choices for the four incoming connections to site i. We can thus simply perform an exact enumeration, evaluate the p constraints (5) for each of these 16 possible choices and choose the best one for our connections to site i. The « best » choice shall be designated as that one which satisfies the most of the p inequalities (5).

In the case of a tie, we shall choose the one which maximises

$$\sum_{r=1}^{p} \Theta(R_i^r)\, R_i^r, \qquad \text{where } R_i^r = \xi_i^r \sum_{j=1}^{N} T_{ij}\, \xi_j^r$$

and Θ is the threshold function. The reason for this choice is that a larger value of R_i^r should imply larger content-addressability of the r-th nominal pattern [8].

This exact enumeration of all possible choices of connections requires 16 Np evaluations of (5).
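A compact sketch of this enumeration for a single site, in C with illustrative names (the original program is Fortran), might look as follows; the tie-break uses the Θ-weighted stability sum described above.

```c
/* Hedged C sketch of the exact-enumeration learning for one site i
 * with z = 4 synapses.  xi[r][k]: bit (+/-1) of pattern r at the k-th
 * neighbour of i; xii[r]: bit of pattern r at i itself. */
void choose_synapses(int p, const int xi[][4], const int xii[], int Tbest[4])
{
    int best_sat = -1;
    int best_tie = 0;
    for (int choice = 0; choice < 16; choice++) {  /* all 2^4 sign choices */
        int T[4], sat = 0, tie = 0;
        for (int k = 0; k < 4; k++)
            T[k] = (choice >> k & 1) ? 1 : -1;
        for (int r = 0; r < p; r++) {
            int R = 0;                  /* stability R_i^r of pattern r */
            for (int k = 0; k < 4; k++)
                R += T[k] * xi[r][k];
            R *= xii[r];
            if (R > 0) { sat++; tie += R; }   /* constraint (5) holds */
        }
        /* keep the choice satisfying most constraints; break ties with
         * the theta-weighted stability sum of section 2 */
        if (sat > best_sat || (sat == best_sat && tie > best_tie)) {
            best_sat = sat;
            best_tie = tie;
            for (int k = 0; k < 4; k++) Tbest[k] = T[k];
        }
    }
}
```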

3. Multi-spin coding.

Using the optimal choice of connections elucidated in the previous section, a two-dimensional network of L × L neurons S_i ; 1 ≤ i ≤ N (N = L²) was simulated by employing a Fortran multi-spin coding algorithm which requires only one bit per spin (neuron) and is based on a method propounded by Herrmann [19] for the fast simulation of Ising models. The technique lends itself to system sizes where L is a multiple of 64. Thus the system sizes which are dealt with start off where in other models they often end: 4 096 neurons.

Defining M = L/64, the first M spins in the first row of the lattice are placed in the first bit of the 64-bit integers IS(1), ..., IS(M), then the next M spins are placed in the second bit, and so on up to the 64th bit. The next row of the lattice will be held by the words IS(M + 1) to IS(2 × M). The actual array of spins is thus represented by the L × M words IS(M + 1) to IS(L × M + M), with the top row (IS(1) to IS(M)) and an additional row at the bottom serving as shadow lines to invoke up-down periodic boundary conditions. Hence an array IS of L × M + 2 × M words will hold all the spins in an L × L lattice plus these two shadow lines. With the exception of the words M + 1, 2 M + 1, ... and 2 M, 3 M, ..., the four neighbours of each of the 64 neurons held in the word IS(I) will then be found at the same bit-position in the words IS(I − M), IS(I + 1), IS(I + M) and IS(I − 1) (up, right, down and left neighbour, respectively).
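To fix the indexing conventions, here is a hedged C transcription of this layout; the paper's program is Fortran with 1-based arrays, so every name and the 0-based helper functions below are illustrative. The two exceptional word classes noted above are flagged in comments.

```c
/* Hedged C transcription of the 1-bit layout.  Each lattice row
 * occupies M = L/64 words; the first M words and the last M words are
 * the shadow rows for the up-down periodic boundary. */
#include <stdint.h>

enum { L = 128, M = L / 64 };           /* illustrative system size */

static uint64_t IS[L * M + 2 * M];      /* L rows plus two shadow rows */

/* word index and bit position of neuron (x, y), 0 <= x, y < L: the
 * first M spins of a row go into bit 0 of its M words, the next M
 * spins into bit 1, and so on. */
static inline int word_of(int x, int y) { return (y + 1) * M + x % M; }
static inline int bit_of(int x)         { return x / M; }

/* For a word I away from the exceptional columns, the four neighbours
 * of all 64 neurons in IS[I] sit at the same bit position in: */
static inline uint64_t up_w(int I)    { return IS[I - M]; }
static inline uint64_t down_w(int I)  { return IS[I + M]; }
static inline uint64_t left_w(int I)  { return IS[I - 1]; }
/* exception I % M == 0: left neighbour is IS[I + M - 1], one bit lower */
static inline uint64_t right_w(int I) { return IS[I + 1]; }
/* exception I % M == M-1: right neighbour is IS[I - M + 1], one bit higher */
```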

Table I. - The number of single neuron updates per microsecond achieved on the Cray X-MP for various systems of size L × L.


Now, recalling the form of the updating rule for a neuron (2), we still have to specify the case of h_i = 0. The rule chosen in the simulations was S_i(t + τ) = S_i(t) if h_i = 0 since, as will be explained below, this was found to induce much more stability in the network than choosing S_i(t + τ) = 1 if h_i = 0.

Representing the state of a neuron S_i by the bit variable s_i = ½(S_i + 1), and storing the connections in a similar fashion, t_ij = ½(T_ij + 1), the modified signal T_ij S_j incident from neuron j onto neuron i will then correspond to EQV(t_ij, s_j), where EQV is the « equivalence » bitwise logical operation.

The neural updating rule explained above can be realised in the following manner. Denoting the bits EQV(t_ij, s_j) by n_j, we set the i-th neuron « on » (TRUE) if and only if at least three of the five bits n_1, n_2, n_3, n_4 and s_i are TRUE. This can be implemented by a Boolean function of these five bits, where ∨ denotes logical OR and ∧ denotes logical AND.
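Such a bit-parallel majority function can be realised in several equivalent ways; the paper's exact OR/AND expression is not reproduced here, but the following C sketch (illustrative names; the paper's code is Fortran) gives one adder-style form of the « at least three of five » rule acting on all 64 bit positions of a word at once. Note that counting s_i among the five bits is precisely what realises the h_i = 0 convention: a neuron receiving exactly two « on » signals keeps its previous state.

```c
/* Bit-parallel update of 64 neurons at once.  eqv() is the bitwise
 * « equivalence » operation; at_least_3_of_5() is one realisation of
 * the Boolean majority function described in the text. */
#include <stdint.h>

static inline uint64_t eqv(uint64_t a, uint64_t b) { return ~(a ^ b); }

static inline uint64_t at_least_3_of_5(uint64_t a, uint64_t b, uint64_t c,
                                       uint64_t d, uint64_t e)
{
    uint64_t g = a & b, p = a ^ b;        /* a + b = 2g + p     */
    uint64_t h = c & d, q = c ^ d;        /* c + d = 2h + q     */
    uint64_t c1 = p & q, low = p ^ q;     /* p + q = 2c1 + low  */
    uint64_t c2 = low & e, ones = low ^ e;
    /* total = 2(g + h + c1 + c2) + ones; it is >= 3 iff the twos count
     * is >= 2, or it is >= 1 and the ones bit is also set. */
    uint64_t twos2 = (g & h) | ((g | h) & (c1 | c2)) | (c1 & c2);
    uint64_t twos1 = g | h | c1 | c2;
    return twos2 | (twos1 & ones);
}

/* s: 64 current states; t[k]: packed connections to the k-th neighbour;
 * nb[k]: the four neighbour words located as described above. */
uint64_t update_word(uint64_t s, const uint64_t t[4], const uint64_t nb[4])
{
    return at_least_3_of_5(eqv(t[0], nb[0]), eqv(t[1], nb[1]),
                           eqv(t[2], nb[2]), eqv(t[3], nb[3]), s);
}
```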

Three separate loops, each of which fully vectorizes due to the parallel nature of the neural dynamics, are needed for a sweep through the lattice: one for those words (M + 1, 2 M + 1, ...) where the left neighbour is not a bit in the same position as the bit of the site being updated; one for those words which have such a right-hand neighbour (2 M, 3 M, ...); and one for all remaining words.

The algorithm achieves over 200 million neuron updates per second on the Cray X-MP: timings for various system sizes up to L = 2 048 are presented in table I. Of course, the above algorithm generalises for an n-bit machine (n = 64 for the Cray) to systems of linear size L = n × M. (An algorithm using 3 bits per site [17] was slightly slower (around 180 million updates per second) but was applied to systems L ≤ 32 and is discussed in the appendix.)

4. Results.

Systems of linear size L = 4 to 512 were simulated. Although the above algorithm allowed systems as large as 2 048 × 2 048 to be dealt with, the computational effort required for the exact enumeration learning of the synapses limited the number of statistical samples which could be carried out in a reasonable time. Note that this initialisation effort is equivalent to 16 Np sweeps of the network (Np single spin updates for each possible choice of the connections at a site), but that this part of the program was not carried out by multi-spin coding. The timings for the updating loop were found to be over 100 times faster using multi-spin coding as compared to normal Fortran (one word per site and involving integer multiplications). Given that, for example, at L = 512 the number of sweeps to stability of a pattern was typically between 100 and 200 (see below), this is a substantial saving.

Figure 1a shows the mean final overlap m_f of an iterated pattern with a nominal state after having started from a state which had initial overlap m_0 with that nominal state. These results were obtained by averaging over 10 initial states with overlap m_0 and then performing a quenched average over 10³ independent samples (10³ choices of the ξ_i^r). The overlap is the usual measure of the resemblance of two patterns, or spin configurations S(1) and S(2):

$$m = \frac{1}{N} \sum_{i=1}^{N} S_i^{(1)} S_i^{(2)}.$$

Fig. 1. - a) Mean final overlap, m_f, after iteration from an initial state having overlap m_0 and with p = 2 patterns stored; b) the size-dependence of the mean final overlap, m_f, from a state having overlap m_0 = 0.75 with p = 2.

The choice of the dynamics which keeps S_i the same if h_i = 0 is justified from this figure, since the performance using S_i(h_i = 0) → 1 is greatly deteriorated, even at p = 2. Moreover, the mean number of unstable neurons was observed to increase: e.g., to around 5 % at p = 2, m_0 = 1.0, and to 20 % at p = 2, m_0 = 0.5, from less than 0.1 % with the chosen dynamics.

(These unstable sites were in fact all bistable. Note that for voting rule cellular automata [20], where each cell adopts the state corresponding to a « poll » of its neighbours, every state evolves into either a fixed point or a bistable state - no limit cycles of higher period exist. The network here is similar to such a rule, but is different in that the states of the neighbouring cells are involved in the « vote » only after they are modified by their synaptic connections.)

These results hold for all the system sizes attempted with L > 12, since the mean final overlap was only found to show a size-dependent drift for L ≤ 12 (Fig. 1b). For larger values of L, m_f remained unchanged (within statistical fluctuations) on increasing L - the width of the distribution of final overlaps merely decreased.

The width $\Delta m_f = \sqrt{\langle m_f^2 \rangle - \langle m_f \rangle^2}$ of m_f was found to increase as m_0 decreased, but the rate at which Δm_f decreased with respect to increasing system size L was found to be independent of m_0. As shown in table II, obtained from figure 2, Δm_f² apparently obeys the scaling relation

$$\Delta m_f^2 \simeq w(m_0)\, N^{-\gamma},$$

with γ = 1.0 and w(m_0) some function of m_0 only: thus Δm_f exhibits strong self-averaging. This in turn indicates that as L → ∞ the probability of obtaining a particular final overlap m_f from an initial overlap m_0 will approach a Kronecker delta function: p(m_f | m_0) → δ(m_f − m_f(m_0)), where m_f(m_0) is the function plotted in figure 1a.

Fig. 2. - The size-dependence of the width, Δm_f, of m_f after iteration from four different values of m_0 at p = 2.

Table II. - Estimates of the parameters g and c obtained from the linear relationship (Fig. 2) ln Δm_f = g ln L + c.


If we ask for the fraction f(m_0) of iterated states which are recalled to within 10 % accuracy (m_f ≥ 0.8) from an initial state having overlap m_0, then, for p = 2, we obtain the behaviour in figure 3a.

Fig. 3. - a) At p = 2, the fraction, f, of states recalled with less than 10 % errors from initial overlap m_0. Best-fit scaling forms (9) are drawn through the points; b) the linear relationship between ln(f/(1−f)) and m_0 implies the scaling form (9).


The smooth curves drawn through the points are best-fit forms of the relation expressing the ratio of the probability of recall to non-recall as [8]

$$\ln\!\left(\frac{f}{1-f}\right) = g\,(m_0 - \tilde m_0). \qquad (9)$$

The validity of this assumption is confirmed by figure 3b. The gradient g was found to scale linearly with L, since the data fit ln g = γ ln L + c with γ = 1.004 ± 0.014 and c = 0.230 ± 0.059, so that g/L = 1.26 ± 0.06. An estimate for the critical minimum overlap, above which f(m_0) → 1 for L → ∞, can be obtained by extrapolating the initial overlaps m_0(f) required for a particular f to L⁻¹ → 0. This is done in figure 4 for f = 0.2 and 0.8, yielding the critical minimum overlap m_c.

Fig. 4. - The initial overlap, m_0, required to produce a mean recall fraction, f, is plotted against L⁻¹. Extrapolation to L⁻¹ → 0 yields the critical minimum overlap, m_c.

Similar critical behaviour with respect to f has been obtained for fully-connected models [8], with f(m_0) following an identical scaling form (9).

It is not clear whether the number of sweeps to stability n(L) grows exponentially with respect to the system size for large L, although from figure 5 we cannot rule out the case that the number may obey some scaling law, n(L, m_0) = n_1(m_0) n_2(L). Such a law, with n_2(N) = ln N, has recently been found by Kanter [21] for infinite-ranged interactions. (For the simulations here, the number n(L) is actually the number of sweeps until every neuron is either stable or bistable. In fact, the number of bistable neurons was of the order of 0.1 % in all cases.)

Simulations were also performed for networks containing both nearest- and next-nearest-neighbour (NNN) connections for system sizes L = 16 up to L = 128. The exact enumeration learning procedure then involves 2^8 possible choices at each of the N sites of the network.


Fig. 5. - The mean number of sweeps, n(L), required for iteration to stability from states having a given initial overlap, m_0, at different system sizes L.

Similar behaviour of the final overlap was observed, i.e., the mean value remained invariant to within statistical fluctuations as L was increased, with the width decreasing. In terms of the closeness of the fixed points to the nominal states, the performance of the network improved, m_f increasing for larger values of m_0 at p = 2 and 4. However, for p = 8 the NNN network had a slightly inferior performance, as can be seen in figure 6.

Fig. 6. - Comparison of the retrieval quality of the network with only nearest-neighbour connections (z = 4) and with additional next-nearest-neighbour connections (z = 8).


Relaxing the condition (3) that the alignment of « spin » and « local field » should be strictly positive to the case where it is only required to be non-negative enhances the retrieval quality of the network (both for z = 4 and 8), but only from values of m_0 near 1.

Up until now we have only been considering the storage of p random, unbiased patterns, i.e., where the mean « magnetisation » is zero:

$$\langle\langle\, \xi_i^r \,\rangle\rangle = 0,$$

where the angular brackets denote a (quenched) average over the choice of the random ξ_i^r. It has been found that the storage capacity of this class of networks is improved if we instead attempt to store biased patterns [9, 22] which have a non-zero mean magnetisation,

$$\langle\langle\, \xi_i^r \,\rangle\rangle = a \neq 0. \qquad (11)$$

Does this also hold for networks with restricted synapses of the type considered here? Figure 7a produces an affirmative answer, showing that for high enough bias a the retrieval quality is improved.

However, as explained by Amit et al. [22], we really should examine not merely the number of patterns stored, but the total information content of the patterns. Their measure takes into account both the amount of information stored in a nominal pattern and the loss of information when the pattern is retrieved with errors (m_f < 1). The information stored in each nominal pattern is the entropy, S, associated with the number of ways of choosing a random pattern ξ subject to its magnetisation (11) being a:

$$S(a) = -N \left[ \frac{1+a}{2} \log_2 \frac{1+a}{2} + \frac{1-a}{2} \log_2 \frac{1-a}{2} \right]. \qquad (12)$$

The information lost when the retrieved pattern has overlap m_f with the nominal one is the entropy of all possible patterns which have an overlap m_f with that nominal pattern:

$$S(m_f) = -N \left[ \frac{1+m_f}{2} \log_2 \frac{1+m_f}{2} + \frac{1-m_f}{2} \log_2 \frac{1-m_f}{2} \right]. \qquad (13)$$

Thus, for z connections per site, the total information per connection stored in the network when the mean final overlap of retrieved patterns is m_f is

$$i = \frac{p\,[\,S(a) - S(m_f)\,]}{zN}. \qquad (14)$$

Note that this is also the information per bit used to store the patterns.
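The measure lends itself to a few lines of code. The C sketch below (illustrative names) computes the information stored per synapse bit under the entropy forms reconstructed above; treat those forms as an assumption keyed to Amit et al. [22], not a quotation of the paper's equations.

```c
/* Numeric sketch of the information measure (12)-(14); the binary
 * entropy forms are an assumption keyed to Amit et al. [22]. */
#include <math.h>
#include <stdio.h>

static double H2(double x)                 /* binary entropy in bits */
{
    if (x <= 0.0 || x >= 1.0)
        return 0.0;
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x);
}

/* information stored per synapse bit for p patterns of bias a,
 * retrieved with mean overlap mf, with z connections per site;
 * the factor N cancels against b_T = z N */
static double info_per_bit(int p, double a, double mf, int z)
{
    double stored = H2(0.5 * (1.0 + a));   /* S(a) per neuron */
    double lost   = H2(0.5 * (1.0 + mf));  /* loss per neuron */
    return p * (stored - lost) / z;
}

int main(void)
{
    /* illustrative call only: p = 2 unbiased patterns, z = 4 */
    printf("i = %.3f bits per connection\n", info_per_bit(2, 0.0, 0.95, 4));
    return 0;
}
```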

Using this quantity (Fig. 7b), we see that less information is actually stored as the bias (and hence correlation) of the patterns is increased. It is also evident that although the NNN network produces enhanced retrieval of p = 2 and 4 patterns, the information per connection is actually less.


Fig. 7. - a) Mean final overlap, m_f (from m_0 = 1), in networks storing random patterns of mean magnetisation a; b) the corresponding information stored per bit of information used.


5. Discussion.

How well does the performance of this type of network compare with that of a fully-connected model, where each neuron can interact with any other? It has become customary to measure the network's ability to store patterns in terms of the storage ratio α = p/N, the ratio of the number p of patterns stored to the number N of neurons in the network. This would be a naive and unfair measure, however: we should rather use a quantity which tells us the number of bits of information stored, b_s, compared to the number of bits used to store them (the number of bits needed to specify all of the synapses), b_T.

For the fully-connected network with binary synapses (T_ij = ±1), the number of bits used is b_T = N² for asymmetric connections, ½N² for symmetric interactions. In the networks considered here, the corresponding number is b_T = zN, where the coordination number z is 4 for the nearest-neighbour network and z = 8 for the NNN network.

In table III two measures are used for the number of bits stored in the network. The information ratio R_1 involves the total number of « uncorrupted » bits which are retrieved: if the nominal patterns are retrieved with a mean final overlap m_f, then the number of retrieval errors for each pattern is ½N(1 − m_f), so that the total number of bits stored correctly is b_s = ½N(1 + m_f)p. For a fully-connected network with symmetric couplings [15], m_f = 0.948 and p = 0.102 N, thus R_1 = b_s/b_T = 0.199. From table III it is clear that both the nearest-neighbour and NNN networks outperform their long-range counterpart.

The second ratio, R_2, uses an entropic measure (as discussed previously) akin to that of Amit et al. [22] for the information stored: R_2 = I(m_f)/b_T, where I(m_f) is the total information stored in the p patterns,

$$I(m_f) = p\,[\,S(a) - S(m_f)\,]. \qquad (15)$$

Table III. - Comparison of the performance of a clipped fully-connected model with the local z = 4 and z = 8 models in terms of the ratios: R_1, the number of bits retrieved without errors to the number of bits used in the synapses, b_T; and R_2, the information I(m_f) (15) stored in the network to b_T.


Once again, the networks with localised connections perform better with respect to this second measure (Tab. III).

Networks with interactions limited to a local neighbourhood and restricted to one bit only are able to function as associative memories, since it is possible to create stable states with an appreciable correlation to the nominal random patterns. While it is, of course, not possible to store an extensive number (O(N)) of patterns for z = 4 or 8 connections per neuron, the performance of the network is improved compared to the fully-connected model (z = O(N)), since the ratio of information stored to the information used to store it (in the synapses) is higher.

It would be interesting to study the effect of systematically increasing z (Ref. [23]) or of increasing the number of bits per synapse. The introduction of asynchronous dynamics may, of course, also have a significant effect. The local nature of the connections may not correspond closely to real neural systems, but for purposes of hardware realisation of associative memory such models should prove far more viable, requiring O(N) links on a chip of N « neurons » instead of O(N²) links.

For such models with « clipped » synapses and local connectivity as well as simple Ising-like neurons, the technique of multi-spin coding is a very powerful simulation tool: it provides both a significant speed-up over conventional programs and a very efficient use of computer memory.

Acknowledgments.

I am very grateful to D. Stauffer of the HLRZ Jülich for very helpful and stimulating discussions.

Appendix.

A multi-site coding algorithm involving 3 bits per site [17] (for z = 4 connections per site) is described here. The extension to higher values of z is straightforward (unlike the 1 bit per site algorithm described above), as will be pointed out below.

As before, the state of a neuron S_i is represented by the bit variable s_i = ½(S_i + 1), and the connections T_ij are stored as t_ij = ½(T_ij + 1), with the modified signal T_ij S_j incident from neuron j onto neuron i corresponding to EQV(t_ij, s_j), where EQV is the « equivalence » bitwise logical operation. Now if we consider the local field experienced by a neuron, $H_i = \sum_{j=1}^{4} T_{ij} S_j$, we see that there are only five possible outcomes, namely H_i = −4, −2, 0, 2 or 4. The corresponding summation in bit variables, $h_i = \sum_{j=1}^{4} \mathrm{EQV}(t_{ij}, s_j)$, will yield 0, 1, 2, 3 or 4.

In bit representation this updating rule can be achieved by s_i(t + τ) = 1 (0) if and only if h_i + 1 + s_i(t) ≥ 4 (< 4). By adding the extra 1 we now have the convenient rule that the neuron switches on if and only if the third bit of the sum is set. Furthermore, the outcome of this latter summation is restricted to an integer value lying between 1 and 6 inclusive, which only requires three bits of storage (as it would have without inclusion of the additional 1).

This can be exploited in the following fashion: 21 neurons S_i can be stored in one 64-bit integer IS(I), each occupying a 3-bit field in which the above sum can be accumulated.
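A hedged C sketch of this packing follows; the placement of the state bit in the lowest bit of each field and the mask value are assumptions consistent with the text, not the paper's own (Fortran) listing. The key point is that each field accumulates h_i + 1 + s_i(t) ≤ 6, so no carry ever crosses from one 3-bit field into the next, and all 21 sites are updated in parallel.

```c
/* Hedged sketch of the 3-bit packing: 21 neurons per 64-bit word, one
 * 3-bit field each, state bit in the lowest bit of its field.
 * MASK1 selects that lowest bit in all 21 fields. */
#include <stdint.h>

#define MASK1 0x1249249249249249ULL   /* bit 0 of each 3-bit field */

static inline uint64_t eqv(uint64_t a, uint64_t b) { return ~(a ^ b); }

/* s: 21 state bits; t[k], nb[k]: connection and neighbour-state bits in
 * the same field positions.  Each field accumulates h_i + 1 + s_i <= 6,
 * so fields never carry into each other; the neuron is « on » afterwards
 * iff the third bit of its field is set. */
uint64_t update_3bit(uint64_t s, const uint64_t t[4], const uint64_t nb[4])
{
    uint64_t sum = s + MASK1;                 /* s_i + 1 in every field  */
    for (int k = 0; k < 4; k++)
        sum += eqv(t[k], nb[k]) & MASK1;      /* + EQV(t_ij, s_j)        */
    return (sum >> 2) & MASK1;                /* third bit of each field */
}
```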
