
A statistical investigation of bidirectional associative memories (BAM)


HAL Id: jpa-00247018

https://hal.archives-ouvertes.fr/jpa-00247018

Submitted on 1 Jan 1994



J. Kurchan, L. Peliti, M. Saber

To cite this version:

J. Kurchan, L. Peliti, M. Saber. A statistical investigation of bidirectional associative memories (BAM). Journal de Physique I, EDP Sciences, 1994, 4 (11), pp.1627-1639. 10.1051/jp1:1994211. jpa-00247018.

J. Phys. I France 4 (1994) 1627-1639 NOVEMBER 1994, PAGE 1627

Classification Physics Abstracts

87.10 89.70 64.60C

A statistical investigation of bidirectional associative memories (BAM)

J. Kurchan (1), L. Peliti (2,*) and M. Saber (3)

(1) Dipartimento di Fisica and Sezione INFN, Università di Roma « La Sapienza », P.le Aldo Moro 2, I-00185 Roma, Italy

(2) Institut Curie, Section de Physique et Chimie, Laboratoire Curie, rue Pierre et Marie Curie, F-75231 Paris Cedex 05, France

(3) Département de Physique, Faculté des Sciences, Université « Moulay Ismail », B.P. 4010, Beni M'hamed, Meknès, Morocco

(Received 12 April 1994, received in final form 18 July 1994, accepted 21 July 1994)

Abstract. We investigate by statistical mechanical methods a stochastic analogue of the bidirectional associative memories introduced by B. Kosko. We derive its storage capacity as a function of the total number of synapses and of the asymmetry of the network.

1. Introduction.

One of the main shortcomings in J. J. Hopfield's model of associative memory [1, 2] is the fact that the stored patterns lack internal organization. Any bit is as important as any other bit. This runs contrary to the commonly experienced fact that the objects which we try to remember are often structured, so that some part of their description is more important than another. One possible way of taking this feature into account in Attractor Neural Networks [2] is to introduce correlations among the stored patterns. For example, Parga and Virasoro [3] have considered the way in which a hierarchical family of patterns can be stored in a Hopfield network, and Amit and Tsodyks [4] have shown that it is possible to introduce artificial correlations between temporally neighboring patterns, so that closeness in the Hamming distance gives a clue to temporal closeness.

One may choose a different approach, separating the information to be coded in different features: one possibility is categorization. This led quite naturally to the consideration of hierarchically structured networks, in which each layer is devoted to the treatment of a different feature. Several authors have considered hierarchical organizations of layers [5-7]. More general organizations have been considered [8, 9]. They have the common feature of considering layers of strongly interacting neurons, which are comparatively weakly connected among themselves. In the model considered by Brunel [9], for example, a module (layer) codes for the category of the stored information, while another group of neurons codes for the remaining information. For example, one may separate a name into surname and given name, or the identity record of a person into a name and a photograph. Then incomplete information on either the name or the photograph may lead to the recall of the whole record.

(*) Boursier Henri de Rothschild. On leave of absence from Dipartimento di Scienze Fisiche and Unità INFM, Università « Federico II », Mostra d'Oltremare, Pad. 19, I-80125 Napoli, Italy. Associato INFN, Sezione di Napoli.

Bart Kosko [10] introduced in 1988 a completely different network architecture performing this task and named it a bidirectional associative memory (BAM). The task of the network is to retrieve a pair of associated activity patterns when a stimulus close to one or the other pattern is presented to the network. Kosko has shown how this task can be performed by a network with a simple two-layer architecture. The interesting point he made is that the network works by « reverberation » even if there are no connections between the neurons on the same layer. He considers a network made up of two layers, A and B, of McCulloch-Pitts neurons. We denote by A and B the corresponding activity patterns

$$A = (a_1, a_2, \ldots, a_M), \qquad B = (b_1, b_2, \ldots, b_N). \tag{1.1}$$

Here the binary variables denoting the activities of the neurons are supposed to take up the values ± 1.

The network evolves according to a deterministic process defined by the following scheme

$$(A, B) \to (A', B) \to (A', B') \to \cdots \tag{1.2}$$

with successive, parallel updatings of the two layers, one at a time. The updatings are defined by the following equations

$$a_i' = \operatorname{sign}\Big(\sum_{j=1}^{N} J_{ij}\, b_j\Big), \tag{1.3a}$$

$$b_j' = \operatorname{sign}\Big(\sum_{i=1}^{M} K_{ji}\, a_i'\Big). \tag{1.3b}$$

It is important to remark that these equations provide no direct coupling between neurons belonging to the same layer at each time step.

If the two matrices J and K are the transpose of each other, it is easy to show that the following quantity acts as a Lyapunov function, since it decreases (or remains unchanged) at each time step:

$$E = -\sum_{i=1}^{M}\sum_{j=1}^{N} J_{ij}\, a_i\, b_j = -\sum_{i=1}^{M}\sum_{j=1}^{N} K_{ji}\, b_j\, a_i. \tag{1.4}$$

As a consequence, the attractors of the dynamics are fixed points, i.e., the local minima of E, which satisfy the equation

$$a_i = \operatorname{sign}\Big(\sum_{j=1}^{N} J_{ij}\, b_j\Big), \qquad b_j = \operatorname{sign}\Big(\sum_{i=1}^{M} a_i\, J_{ij}\Big). \tag{1.5}$$

We shall consider from now on only the matrix $J = K^{T}$.
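The Lyapunov property can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes $K = J^{T}$, uses an arbitrary random integer matrix for J (so that the energies are exact), and adopts the convention sign(0) = +1, which the text does not specify. The names `relax`, `update_a` and `update_b` are illustrative.

```python
import random

def sign(x):
    # Tie-breaking convention assumed for this sketch: sign(0) = +1.
    return 1 if x >= 0 else -1

def energy(J, a, b):
    # E = -sum_{i,j} J_ij a_i b_j, Eq. (1.4)
    return -sum(J[i][j] * a[i] * b[j]
                for i in range(len(a)) for j in range(len(b)))

def update_a(J, b):
    # Eq. (1.3a): every a_i aligns with its field sum_j J_ij b_j
    return [sign(sum(J_ij * b_j for J_ij, b_j in zip(row, b))) for row in J]

def update_b(J, a):
    # Eq. (1.3b) with K = J^T: every b_j aligns with sum_i J_ij a_i
    return [sign(sum(J[i][j] * a[i] for i in range(len(a))))
            for j in range(len(J[0]))]

def relax(J, a, b):
    """Alternate the two parallel updates until nothing changes."""
    energies = [energy(J, a, b)]
    while True:
        new_a = update_a(J, b)
        energies.append(energy(J, new_a, b))
        new_b = update_b(J, new_a)
        energies.append(energy(J, new_a, new_b))
        if new_a == a and new_b == b:
            return a, b, energies
        a, b = new_a, new_b

rng = random.Random(0)
M, N = 12, 8
# Integer couplings keep the energies exact; any real matrix J would do.
J = [[rng.randint(-3, 3) for _ in range(N)] for _ in range(M)]
a0 = [rng.choice((-1, 1)) for _ in range(M)]
b0 = [rng.choice((-1, 1)) for _ in range(N)]
a_fix, b_fix, energies = relax(J, a0, b0)
print(energies)
```

The recorded energies never increase from one half-step to the next, and the final pair of patterns satisfies the fixed-point condition (1.5).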


Storage is performed by choosing a form of the coupling matrix J reminiscent of the Hopfield synaptic matrix:

$$J_{ij} = \frac{1}{\sqrt{MN}} \sum_{\mu=1}^{p} \xi_i^{\mu}\, \eta_j^{\mu}. \tag{1.6}$$

Here $\xi^{\mu} = (\xi_1^{\mu}, \xi_2^{\mu}, \ldots, \xi_M^{\mu})$ and $\eta^{\mu} = (\eta_1^{\mu}, \eta_2^{\mu}, \ldots, \eta_N^{\mu})$, $\mu = 1, 2, \ldots, p$, are activity patterns: $\xi_i^{\mu}, \eta_j^{\mu} = \pm 1$, $\forall i, j, \mu$. The factor $1/\sqrt{MN}$ is irrelevant at this stage, and has been chosen for later convenience. We shall assume here, and throughout the paper, that the components $\xi_i^{\mu}$, $\eta_j^{\mu}$ are independent random variables, assuming the values ±1 with equal probability.

Retrieval is performed in a way analogous to Hopfield models: one of the layers is prepared in an initial condition close to one of the corresponding stored patterns ($\xi$ for layer A, $\eta$ for layer B). Then the system evolves, taking care to update first the other layer; the same reasonings as for the Hopfield models show that eventually the system reaches a fixed point close to the form $(\xi^{\mu_0}, \eta^{\mu_0})$, where the pattern $\mu_0$ is determined by the initial condition.
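Storage and retrieval can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the authors' procedure: the sizes (M = 60, N = 40, p = 2), the number of corrupted bits, the random seed and the helper names `hebbian_couplings`, `recall` and `overlap` are all chosen here for illustration, and sign(0) is taken to be +1.

```python
import math
import random

def sign(x):
    # Assumed convention: sign(0) = +1.
    return 1 if x >= 0 else -1

def hebbian_couplings(xis, etas):
    # J_ij = (1 / sqrt(MN)) * sum_mu xi_i^mu eta_j^mu, Eq. (1.6)
    M, N = len(xis[0]), len(etas[0])
    norm = 1.0 / math.sqrt(M * N)
    return [[norm * sum(xi[i] * eta[j] for xi, eta in zip(xis, etas))
             for j in range(N)] for i in range(M)]

def recall(J, a, sweeps=5):
    """Layer A holds the cue, so layer B is updated first, then layer A."""
    M, N = len(J), len(J[0])
    b = None
    for _ in range(sweeps):
        b = [sign(sum(J[i][j] * a[i] for i in range(M))) for j in range(N)]
        a = [sign(sum(J[i][j] * b[j] for j in range(N))) for i in range(M)]
    return a, b

def overlap(x, y):
    return sum(u * v for u, v in zip(x, y)) / len(x)

rng = random.Random(1)
M, N, p = 60, 40, 2
xis = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(p)]
etas = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
J = hebbian_couplings(xis, etas)

# Cue: the first stored pattern of layer A with 5 of its 60 bits flipped.
cue = list(xis[0])
for i in rng.sample(range(M), 5):
    cue[i] = -cue[i]

a_out, b_out = recall(J, cue)
print(overlap(a_out, xis[0]), overlap(b_out, etas[0]))
```

At this low load the corrupted cue on layer A is enough to restore both members of the stored pair.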

One can suggest several possible uses of this architecture as an association device. Kosko [11] has also suggested some interesting modifications of it, and in particular the introduction of a learning rule leading to what he calls Adaptive Bidirectional Associative Memories (ABAM). Here we shall stick to the simple « nonadaptive » architecture we have just described.

Our aim is to investigate the phase diagram of a stochastic version of the BAM, in particular in order to determine its storage capacity and its performance. The method is close to the classic work of Amit, Gutfreund and Sompolinsky [12, 13] on Hopfield networks. It is a first step towards the consideration of more complex architectures, in which single neural layers act as a unity. The point which makes this architecture worth investigating is the fact that it may become more efficient in the case of great asymmetry between the two layers, as it happens, e.g., for the case of the association of a name with an image. In this case one needs only $NM$ links instead of $(N + M)^2/2$ to have essentially comparable retrieval features, and the former figure is much smaller when $N \ll M$. The price to be paid is in the attraction basin, which may be considerably smaller for the smaller layer.
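To make the comparison of link counts concrete, here is a small arithmetic sketch. The layer sizes are hypothetical (a 100-unit « name » layer and a 10 000-unit « image » layer), and the function names are illustrative.

```python
def bam_links(n, m):
    # Links between the two layers of a BAM
    return n * m

def fully_connected_links(n, m):
    # (N + M)^2 / 2, the figure quoted in the text for a single network
    return (n + m) ** 2 / 2

for n, m in [(100, 100), (100, 10_000)]:
    ratio = bam_links(n, m) / fully_connected_links(n, m)
    print(n, m, bam_links(n, m), fully_connected_links(n, m), ratio)
```

The ratio equals $2NM/(N+M)^2$, which is 1/2 for equal layers and decreases as the layers become more asymmetric.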

We now define the stochastic version of the BAM. A fictitious « inverse temperature » $\beta = 1/T$ can be introduced as usual by considering a stochastic generalization of the updating equations (1.3): namely, the probability that the state of spin i is equal to $a_i'$ at the next time step is given by

$$P(a_i') = \frac{e^{\beta h_i a_i'}}{e^{\beta h_i a_i'} + e^{-\beta h_i a_i'}}. \tag{1.7}$$

The local field $h_i$ is defined by

$$h_i = \sum_{j=1}^{N} J_{ij}\, b_j = \frac{1}{\sqrt{MN}} \sum_{j=1}^{N} \sum_{\mu=1}^{p} \xi_i^{\mu}\, \eta_j^{\mu}\, b_j. \tag{1.8}$$

The layer B is governed by completely analogous equations: the probability that unit j takes up the state $b_j'$ at the next time step is given by

$$P(b_j') = \frac{e^{\beta k_j b_j'}}{e^{\beta k_j b_j'} + e^{-\beta k_j b_j'}}, \tag{1.9}$$

where the local field $k_j$ is given by

$$k_j = \sum_{i=1}^{M} J_{ij}\, a_i. \tag{1.10}$$

Let us remark that $b_j$ and $a_i$ in equations (1.7) and (1.9) respectively stand for the current state of the respective layers. There is no memory in the dynamics.

In order to obtain the Boltzmann distribution of Hamiltonian E (Eq. (1.4)) the updating rule should be modified, by choosing at random, at each time step, the layer to be updated. It is easy to convince oneself, in fact, that if the layers are updated in succession, as in the deterministic rule, the resulting dynamics does not obey detailed balance.
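The stochastic rule with a randomly chosen layer can be sketched as follows. This is an illustrative implementation, not the authors' code: the probability (1.7) is rewritten as $(1 + \tanh \beta h)/2$ for numerical stability, the layer is chosen with probability 1/2 (the text only says « at random »), and the 2 × 2 coupling matrix is a toy example.

```python
import math
import random

def prob_up(beta, h):
    # Eq. (1.7) for the state +1, rewritten with tanh to avoid overflow
    return 0.5 * (1.0 + math.tanh(beta * h))

def local_field_a(J, b, i):
    # h_i = sum_j J_ij b_j, Eq. (1.8)
    return sum(J[i][j] * b[j] for j in range(len(b)))

def local_field_b(J, a, j):
    # k_j = sum_i J_ij a_i, Eq. (1.10)
    return sum(J[i][j] * a[i] for i in range(len(a)))

def stochastic_step(J, a, b, beta, rng):
    """Choose a layer at random and resample all its units in parallel."""
    if rng.random() < 0.5:
        a = [1 if rng.random() < prob_up(beta, local_field_a(J, b, i)) else -1
             for i in range(len(a))]
    else:
        b = [1 if rng.random() < prob_up(beta, local_field_b(J, a, j)) else -1
             for j in range(len(b))]
    return a, b

rng = random.Random(0)
J = [[1.0, 2.0], [-1.0, 0.5]]   # toy couplings, chosen for illustration
a, b = [1, -1], [1, 1]          # a fixed point of the deterministic rule
for _ in range(20):
    a, b = stochastic_step(J, a, b, beta=1e6, rng=rng)
print(a, b)
```

At very large $\beta$ the rule reduces to the deterministic one, so the fixed point used as the initial state is left unchanged; at $\beta = 0$ each unit is set to ±1 with probability 1/2.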

As a consequence, the presentation of the stimulus to the network cannot take place simply by setting one of the layers (say A) in a given initial state: since then, if the next layer to be updated turns out to be A, the stimulus is irretrievably lost. Hence one should assume that the stimulus is presented by applying to the chosen layer a sufficiently strong external field $h_i^{\mathrm{ext}}$, whose sign is determined by the pattern one wants to treat. We are however only concerned with the dynamics that follows once this external field is set to zero.

The paper is organized as follows: section 2 contains the analysis of the simple case in which the number p of stored patterns is finite in the thermodynamic limit. Section 3 discusses the case of an infinite number of patterns. Section 4 contains a few conclusions.

2. Finite number of patterns.

We now derive the phase diagram of the stochastic BAM defined above when the number p of stored patterns is finite. In this situation the system is always self-averaging, and it is not necessary to introduce replicas. We consider therefore the partition function

$$Z = \mathrm{Tr}_{a,b}\, \exp(-\beta E) = \mathrm{Tr}_{a,b}\, \exp\Big[\frac{\beta}{\sqrt{MN}} \sum_{\mu=1}^{p} (\xi^{\mu} \cdot a)(\eta^{\mu} \cdot b)\Big]. \tag{2.1}$$

We denote by Tr the sum over activity patterns:

$$\mathrm{Tr}_{a,b} = \prod_{i=1}^{M} \sum_{a_i = \pm 1}\ \prod_{j=1}^{N} \sum_{b_j = \pm 1}. \tag{2.2}$$

The scalar product $(\xi^{\mu} \cdot a)$ denotes the sum

$$(\xi^{\mu} \cdot a) = \sum_{i=1}^{M} \xi_i^{\mu}\, a_i, \tag{2.3}$$

and analogously for B.

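We now introduce two Hubbard-Stratonovich variables $u_{\mu}$ and $v_{\mu}$, exploiting the identity

$$xy = \frac{1}{4}\big[(x + y)^2 - (x - y)^2\big]. \tag{2.4}$$

We thus obtain

$$Z = \int \prod_{\mu=1}^{p} \frac{\beta\sqrt{MN}\, du_{\mu}\, dv_{\mu}}{2\pi}\, \exp\Big[-\frac{\beta\sqrt{MN}}{2} \sum_{\mu} (u_{\mu}^2 + v_{\mu}^2)\Big] \times \mathrm{Tr}_{a,b}\, \exp\Big\{\frac{\beta}{\sqrt{2}} \sum_{\mu} \big[u_{\mu}\big((\xi^{\mu} \cdot a) + (\eta^{\mu} \cdot b)\big) + i v_{\mu}\big((\xi^{\mu} \cdot a) - (\eta^{\mu} \cdot b)\big)\big]\Big\}. \tag{2.5}$$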


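We can now introduce the two auxiliary fields $m_{\mu}$ and $n_{\mu}$ by

$$m_{\mu} = (u_{\mu} + i v_{\mu})/\sqrt{2}, \qquad n_{\mu} = (u_{\mu} - i v_{\mu})/\sqrt{2}, \tag{2.6}$$

so that

$$m_{\mu}\, n_{\mu} = \frac{1}{2}\,(u_{\mu}^2 + v_{\mu}^2). \tag{2.7}$$

We therefore obtain, up to a normalizing factor, the following expression for the partition function

$$Z = \int \prod_{\mu=1}^{p} dm_{\mu}\, dn_{\mu}\, \exp\Big(-\beta\sqrt{MN} \sum_{\mu} m_{\mu} n_{\mu}\Big) \times \mathrm{Tr}_{a,b}\, \exp\Big(\beta \sum_{\mu} \big[m_{\mu} (\xi^{\mu} \cdot a) + n_{\mu} (\eta^{\mu} \cdot b)\big]\Big). \tag{2.8}$$

The trace can be now simply evaluated, yielding

$$\mathrm{Tr}_{a,b}\, \exp\Big(\beta \sum_{\mu} \big[m_{\mu} (\xi^{\mu} \cdot a) + n_{\mu} (\eta^{\mu} \cdot b)\big]\Big) = \exp\Big\{\sum_{i=1}^{M} \ln 2\cosh\Big(\beta \sum_{\mu} m_{\mu} \xi_i^{\mu}\Big) + \sum_{j=1}^{N} \ln 2\cosh\Big(\beta \sum_{\mu} n_{\mu} \eta_j^{\mu}\Big)\Big\}. \tag{2.9}$$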
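The factorization of the trace over sites in equation (2.9) can be verified by brute force on a small layer. The sketch below checks only the layer-A factor (the layer-B factor has the same structure); the size M = 8, the values of $\beta$ and $m_{\mu}$, and the function names are arbitrary choices made for illustration.

```python
import itertools
import math
import random

def trace_brute_force(beta, m, xis):
    """Sum exp(beta * sum_mu m_mu (xi^mu . a)) over all 2^M states a."""
    M = len(xis[0])
    total = 0.0
    for a in itertools.product((-1, 1), repeat=M):
        exponent = beta * sum(m_mu * sum(x * s for x, s in zip(xi, a))
                              for m_mu, xi in zip(m, xis))
        total += math.exp(exponent)
    return total

def trace_factorized(beta, m, xis):
    """Product over sites of 2 cosh(beta * sum_mu m_mu xi_i^mu), as in Eq. (2.9)."""
    M = len(xis[0])
    return math.prod(
        2.0 * math.cosh(beta * sum(m_mu * xi[i] for m_mu, xi in zip(m, xis)))
        for i in range(M))

rng = random.Random(3)
M, p, beta = 8, 3, 1.3
xis = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(p)]
m = [0.7, -0.2, 0.4]
brute = trace_brute_force(beta, m, xis)
fact = trace_factorized(beta, m, xis)
print(brute, fact)
```

The two numbers agree up to rounding, because the exponent is a sum of single-site terms.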

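When M and N go to infinity, the sum over the M (respectively N) independent random variables $\xi_i$ ($\eta_j$) yields, up to a factor, the average over the distribution of the patterns, which we denote by a bar:

$$\sum_{i=1}^{M} \ln 2\cosh\Big(\beta \sum_{\mu} m_{\mu} \xi_i^{\mu}\Big) \simeq M\, \overline{\ln 2\cosh\Big(\beta \sum_{\mu} m_{\mu} \xi^{\mu}\Big)}, \tag{2.10}$$

while an analogous relation holds for the second term. As in the analogous case of the Hopfield model, the system is self-averaging. It is therefore warranted to calculate the partition function by the saddle point method.

We consider the order parameters $m = (m_1, m_2, \ldots, m_p)$ and $n = (n_1, n_2, \ldots, n_p)$ as vectors with p components. After the rescaling $m \to \sqrt{N/M}\, m$, $n \to \sqrt{M/N}\, n$, which leaves the product $m \cdot n$ unchanged, the free energy F is given as a function of their saddle point values by

$$F = \sqrt{MN}\, (m \cdot n) - \beta^{-1} M\, \overline{\ln 2\cosh\Big(\beta \sqrt{\tfrac{N}{M}}\, (\xi \cdot m)\Big)} - \beta^{-1} N\, \overline{\ln 2\cosh\Big(\beta \sqrt{\tfrac{M}{N}}\, (\eta \cdot n)\Big)}.$$

The saddle point equations are given by

$$n = \overline{\xi \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, (\xi \cdot m)\Big)}, \qquad m = \overline{\eta \tanh\Big(\beta \sqrt{\tfrac{M}{N}}\, (\eta \cdot n)\Big)}. \tag{2.11}$$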


The simplest case is the one of the Mattis (or retrieval) states, in which only one component of the order parameters is nonzero. In this case it is easy to see that the component must be the same for m and n. The corresponding saddle point equations read:

$$n = \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, m\Big), \qquad m = \tanh\Big(\beta \sqrt{\tfrac{M}{N}}\, n\Big). \tag{2.12}$$

We see that m and n must vanish or be nonzero together. In particular, m must be a solution of the equation

$$m = \tanh\Big(\beta \sqrt{\tfrac{M}{N}}\, \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, m\Big)\Big), \tag{2.13}$$

which has the same qualitative behavior as the usual Ising mean field equation:

$$m = \tanh(\beta z J m). \tag{2.14}$$

The discussion is then quite straightforward: if $\beta < 1$, only the zero solution exists (disordered state); for $\beta > 1$ there are two solutions of opposite sign, where m is one of the two nonzero solutions of equation (2.13) and n is given by

$$n = \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, m\Big). \tag{2.15}$$

For $\beta \to \infty$ both m and n tend to either +1 or -1. These thermodynamic states correspond to retrieval of a pair of associated patterns.
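Equations (2.12)-(2.15) can be solved by simple iteration. The sketch below is one possible numerical approach, not a method described in the paper; the starting point m = 1, the iteration count and the function name `retrieval_overlaps` are assumptions made here.

```python
import math

def retrieval_overlaps(beta, M, N, iterations=2000):
    """Iterate the pair of equations (2.12) starting from m = 1."""
    g_mn = math.sqrt(M / N)
    g_nm = math.sqrt(N / M)
    m = 1.0
    for _ in range(iterations):
        n = math.tanh(beta * g_nm * m)   # n = tanh(beta sqrt(N/M) m)
        m = math.tanh(beta * g_mn * n)   # m = tanh(beta sqrt(M/N) n)
    n = math.tanh(beta * g_nm * m)       # Eq. (2.15) at the converged m
    return m, n

for beta in (0.5, 2.0):
    for M, N in ((100, 100), (400, 100)):
        print(beta, M, N, retrieval_overlaps(beta, M, N))
```

For $\beta < 1$ the iteration collapses onto m = n = 0; for $\beta > 1$ it settles on the positive solution, with m = n only when the two layers have equal size.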

Other solutions of the mean field equations correspond to « mixed states », i.e., they have nonzero overlap with more than one pair of patterns. These solutions can be identified and discussed along the lines of reference [12]. We have checked, in particular, that symmetric solutions (which overlap equally with more than one stored pattern) are at most metastable and lose their stability when $\beta$ is slightly larger than one, i.e., immediately below the transition temperature.

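3. Infinite number of patterns.

We now consider the expression for the partition function of n replicas of the system:

$$Z^n = \mathrm{Tr}_{a,b}\, \exp\Big[\frac{\beta}{\sqrt{MN}} \sum_{\mu=1}^{p} \sum_{\rho=1}^{n} (\xi^{\mu} \cdot a^{\rho})(\eta^{\mu} \cdot b^{\rho})\Big]. \tag{3.1}$$

We have introduced a vector notation for the summation over the « neural » indices. By means of the manipulations described in the previous section we obtain a form analogous to equation (2.8):

$$Z^n = C \int \prod_{\mu,\rho} dm_{\mu}^{\rho}\, dn_{\mu}^{\rho}\ \mathrm{Tr}_{a,b}\, \exp\Big(\beta \sum_{\mu,\rho} m_{\mu}^{\rho} (\xi^{\mu} \cdot a^{\rho}) + \beta \sum_{\mu,\rho} n_{\mu}^{\rho} (\eta^{\mu} \cdot b^{\rho})\Big) \times \exp\Big(-\beta\sqrt{MN} \sum_{\mu,\rho} m_{\mu}^{\rho} n_{\mu}^{\rho}\Big), \tag{3.2}$$

where C is a constant. We can now distinguish between « low » ($\nu = 1, \ldots, l$) and « high » ($\mu = l + 1, \ldots, p$) components, according to reference [7]. Summing over the « high » patterns we obtain

$$\overline{\exp\Big(\beta \sum_{\mu > l} \sum_{\rho} m_{\mu}^{\rho} (\xi^{\mu} \cdot a^{\rho})\Big)} = \exp\Big\{\sum_{\mu > l} \sum_{i=1}^{M} \ln\cosh\Big(\beta \sum_{\rho} m_{\mu}^{\rho} a_i^{\rho}\Big)\Big\}, \tag{3.3}$$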

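where we have denoted by the bar the average over patterns. An analogous equation holds for the B units. We now rescale the m and n fields according to the scheme

$$m_{\mu}^{\rho} \to \frac{m_{\mu}^{\rho}}{\sqrt{M}}, \qquad n_{\mu}^{\rho} \to \frac{n_{\mu}^{\rho}}{\sqrt{N}}. \tag{3.4}$$

This yields the following contribution of the « high » patterns to the partition function

$$\int \prod_{\mu > l} \prod_{\rho} dm_{\mu}^{\rho}\, dn_{\mu}^{\rho}\, \exp\Big\{\frac{\beta^2}{2M} \sum_{\mu > l} \sum_{\rho,\sigma} \sum_{i} m_{\mu}^{\rho} m_{\mu}^{\sigma} a_i^{\rho} a_i^{\sigma} + \frac{\beta^2}{2N} \sum_{\mu > l} \sum_{\rho,\sigma} \sum_{j} n_{\mu}^{\rho} n_{\mu}^{\sigma} b_j^{\rho} b_j^{\sigma}\Big\} \times \exp\Big(-\beta \sum_{\mu > l} \sum_{\rho} m_{\mu}^{\rho} n_{\mu}^{\rho}\Big). \tag{3.5}$$

We introduce the « spin glass parameters » $q_{\rho\sigma}$ and $\tilde{q}_{\rho\sigma}$, by the definition

$$q_{\rho\sigma} = \frac{1}{M} \sum_{i} a_i^{\rho} a_i^{\sigma}, \qquad \tilde{q}_{\rho\sigma} = \frac{1}{N} \sum_{j} b_j^{\rho} b_j^{\sigma}, \qquad \rho \neq \sigma. \tag{3.6}$$

Integrating over the m, n fields yields

$$\mathrm{const.} \times \exp\Big\{-\frac{p}{2} \ln \mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big]\Big\}. \tag{3.7}$$

Here I is the identity matrix in the space of replicas, while the matrices $Q$ and $\tilde{Q}$ are defined by

$$[Q]_{\rho\sigma} = q_{\rho\sigma}, \qquad [\tilde{Q}]_{\rho\sigma} = \tilde{q}_{\rho\sigma}, \qquad \rho \neq \sigma. \tag{3.8}$$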

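We finally obtain the following expression for the partition function, up to irrelevant multiplicative constants:

$$Z^n = \int \prod_{\nu,\rho} dm_{\nu}^{\rho}\, dn_{\nu}^{\rho} \prod_{\rho \neq \sigma} dq_{\rho\sigma}\, dr_{\rho\sigma}\, d\tilde{q}_{\rho\sigma}\, d\tilde{r}_{\rho\sigma} \times \exp\Big\{-\beta F_A - \beta F_B - \frac{p}{2} \ln \mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big] - \frac{p \beta^2}{2} \sum_{\rho \neq \sigma} r_{\rho\sigma} q_{\rho\sigma} - \frac{p \beta^2}{2} \sum_{\rho \neq \sigma} \tilde{r}_{\rho\sigma} \tilde{q}_{\rho\sigma} - \beta\sqrt{MN} \sum_{\nu,\rho} m_{\nu}^{\rho} n_{\nu}^{\rho}\Big\}. \tag{3.9}$$

Here $F_A$ and $F_B$ are the spin traces defined by

$$\exp(-\beta F_A) = \mathrm{Tr}_{a}\, \exp\Big\{\frac{p \beta^2}{2M} \sum_{\rho \neq \sigma} r_{\rho\sigma}\, (a^{\rho} \cdot a^{\sigma}) + \beta \sqrt{\tfrac{N}{M}} \sum_{\nu} \sum_{\rho} m_{\nu}^{\rho}\, (\xi^{\nu} \cdot a^{\rho})\Big\} \tag{3.10}$$

and the analogue for B

$$\exp(-\beta F_B) = \mathrm{Tr}_{b}\, \exp\Big\{\frac{p \beta^2}{2N} \sum_{\rho \neq \sigma} \tilde{r}_{\rho\sigma}\, (b^{\rho} \cdot b^{\sigma}) + \beta \sqrt{\tfrac{M}{N}} \sum_{\nu} \sum_{\rho} n_{\nu}^{\rho}\, (\eta^{\nu} \cdot b^{\rho})\Big\}. \tag{3.11}$$

We define

$$\alpha = \frac{p}{\sqrt{MN}}, \qquad \gamma_M = \sqrt{\frac{M}{N}}, \qquad \gamma_N = \sqrt{\frac{N}{M}}. \tag{3.12}$$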

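We now assume replica symmetry, implying that $r_{\rho\sigma} = r$, $q_{\rho\sigma} = q$, $m_{\nu}^{\rho} = m_{\nu}$ (and analogous relations for $\tilde{r}$, $\tilde{q}$ and $n_{\nu}^{\rho}$). This is expected to be a good approximation, just as in the Hopfield case [14]. The spin traces can be then explicitly computed to yield

$$\exp(-\beta F_A - \beta F_B) = \exp\Big[-\frac{n \beta^2 p}{2}\, (r + \tilde{r})\Big] \times \exp\Big\{\sum_{i=1}^{M} \ln \int \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} \Big[2\cosh \beta\Big(z \sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu} m_{\nu} \xi_i^{\nu}\Big)\Big]^n + \sum_{j=1}^{N} \ln \int \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} \Big[2\cosh \beta\Big(z \sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu} n_{\nu} \eta_j^{\nu}\Big)\Big]^n\Big\}. \tag{3.13}$$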

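On the other hand, an explicit evaluation yields

$$e^{\Gamma} \equiv \mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big] = \big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big]^{n-1} \times \big\{\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big] + n \beta^2 (q + \tilde{q} - 2 q \tilde{q}) + O(n^2)\big\}. \tag{3.14}$$

We thus obtain the limit for $n \to 0$ of $\Gamma$:

$$\Gamma \simeq n \Big\{\ln\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big] + \frac{\beta^2 (q + \tilde{q} - 2 q \tilde{q})}{\beta^2 (1 - q)(1 - \tilde{q}) - 1}\Big\}. \tag{3.15}$$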
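The replica-symmetric determinant can be checked numerically for integer n. The sketch below assumes that $Q + I$ has 1 on the diagonal and q elsewhere (and likewise for $\tilde{Q} + I$ with $\tilde{q}$); the exact closed form it uses keeps the $n^2 \beta^2 q \tilde{q}$ term, which is of higher order in n and does not affect the $n \to 0$ limit. The parameter values are arbitrary illustrative numbers, not saddle point values, and are chosen so that the logarithm is real.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def rs_matrix(q, n):
    # Replica-symmetric (Q + I): 1 on the diagonal, q elsewhere
    return [[1.0 if r == s else q for s in range(n)] for r in range(n)]

def det_direct(beta, q, qt, n):
    A, B = rs_matrix(q, n), rs_matrix(qt, n)
    prod = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
    mat = [[beta**2 * prod[i][j] - (1.0 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    return determinant(mat)

def det_closed(beta, q, qt, n):
    X = beta**2 * (1 - q) * (1 - qt) - 1
    Y = beta**2 * (q + qt - 2 * q * qt)
    # (n - 1) eigenvalues equal to X, one equal to X + nY + n^2 beta^2 q qt
    return X ** (n - 1) * (X + n * Y + n**2 * beta**2 * q * qt)

def gamma_per_replica(beta, q, qt):
    # Small-n limit of (1/n) ln Det, Eq. (3.15); real only when X > 0
    X = beta**2 * (1 - q) * (1 - qt) - 1
    Y = beta**2 * (q + qt - 2 * q * qt)
    return math.log(X) + Y / X

beta, q, qt = 2.0, 0.3, 0.5   # illustrative values giving X = 0.4 > 0
for n in (2, 3, 4, 5):
    print(n, det_direct(beta, q, qt, n), det_closed(beta, q, qt, n))
print(gamma_per_replica(beta, q, qt))
```

Evaluating the closed form at a very small non-integer n and dividing its logarithm by n reproduces the bracket of equation (3.15).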

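By considering the straightforward limits

$$\sum_{\rho \neq \sigma} r_{\rho\sigma} q_{\rho\sigma} = n(n - 1)\, r q \to -n\, r q, \qquad \sum_{\rho \neq \sigma} \tilde{r}_{\rho\sigma} \tilde{q}_{\rho\sigma} = n(n - 1)\, \tilde{r} \tilde{q} \to -n\, \tilde{r} \tilde{q}, \qquad \beta\sqrt{MN} \sum_{\nu,\rho} m_{\nu}^{\rho} n_{\nu}^{\rho} \to n \beta \sqrt{MN} \sum_{\nu} m_{\nu} n_{\nu}, \tag{3.16}$$

we obtain the following expression for the average free energy

$$G = -\lim_{n \to 0} \frac{1}{n\sqrt{MN}} \ln \overline{Z^n} = -\gamma_M \Big\langle\!\Big\langle \ln 2\cosh \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu} m_{\nu} \xi^{\nu}\Big) \Big\rangle\!\Big\rangle - \gamma_N \Big\langle\!\Big\langle \ln 2\cosh \beta\Big(z\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu} n_{\nu} \eta^{\nu}\Big) \Big\rangle\!\Big\rangle + \beta \sum_{\nu} m_{\nu} n_{\nu} + \frac{\alpha}{2} \beta^2 r (1 - q) + \frac{\alpha}{2} \beta^2 \tilde{r} (1 - \tilde{q}) + \frac{\alpha}{2} \Big\{\ln\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big] + \frac{\beta^2 (q + \tilde{q} - 2 q \tilde{q})}{\beta^2 (1 - q)(1 - \tilde{q}) - 1}\Big\}. \tag{3.17}$$

Here we have introduced the shorthand

$$\langle\!\langle \cdots \rangle\!\rangle = \int \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}\ \overline{(\cdots)}.$$

The saddle point equations are obtained by taking the derivatives of G. The derivatives with respect to m and n are straightforward:

$$\Big\langle\!\Big\langle \xi^{\nu} \tanh \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu'} m_{\nu'} \xi^{\nu'}\Big) \Big\rangle\!\Big\rangle = n_{\nu}, \qquad \Big\langle\!\Big\langle \eta^{\nu} \tanh \beta\Big(z\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu'} n_{\nu'} \eta^{\nu'}\Big) \Big\rangle\!\Big\rangle = m_{\nu}. \tag{3.18}$$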

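The derivatives with respect to the Lagrange multipliers $r$ and $\tilde{r}$ yield:

$$\Big\langle\!\Big\langle \tanh^2 \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu} m_{\nu} \xi^{\nu}\Big) \Big\rangle\!\Big\rangle = q, \qquad \Big\langle\!\Big\langle \tanh^2 \beta\Big(z\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu} n_{\nu} \eta^{\nu}\Big) \Big\rangle\!\Big\rangle = \tilde{q}. \tag{3.19}$$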

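On the other hand, the saddle points with respect to $q$ and $\tilde{q}$ yield

$$r = \frac{\tilde{q} + \beta^2 q (1 - \tilde{q})^2}{\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big]^2}, \qquad \tilde{r} = \frac{q + \beta^2 \tilde{q} (1 - q)^2}{\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big]^2}. \tag{3.20}$$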
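The replica-symmetric equations can be iterated numerically for a single condensed pattern. The sketch below is an illustration under explicit assumptions, not the authors' procedure: it reads equations (3.18)-(3.20) as $n = \langle\!\langle \tanh \beta(z\sqrt{\alpha \gamma_N r} + \gamma_N m) \rangle\!\rangle$, $m = \langle\!\langle \tanh \beta(z\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M n) \rangle\!\rangle$, with q and $\tilde{q}$ the corresponding averages of $\tanh^2$, and it takes $\alpha = p/\sqrt{MN}$, $\gamma_M = \sqrt{M/N}$, $\gamma_N = \sqrt{N/M}$. The Gaussian average is approximated by a trapezoid rule, and the function names are hypothetical.

```python
import math

def gauss_average(f, half_width=8.0, points=801):
    """Average of f(z) over a standard Gaussian z (normalized trapezoid rule)."""
    step = 2.0 * half_width / (points - 1)
    total = norm = 0.0
    for k in range(points):
        z = -half_width + k * step
        w = math.exp(-0.5 * z * z) * (0.5 if k in (0, points - 1) else 1.0)
        total += w * f(z)
        norm += w
    return total / norm

def solve_retrieval(beta, alpha, M, N, sweeps=200):
    """Fixed-point iteration for one condensed pattern (m and n are scalars)."""
    g_M, g_N = math.sqrt(M / N), math.sqrt(N / M)
    m = n = q = qt = 1.0
    for _ in range(sweeps):
        # Eq. (3.20), as read in the lead-in above
        X = beta**2 * (1.0 - q) * (1.0 - qt) - 1.0
        r = (qt + beta**2 * q * (1.0 - qt) ** 2) / X**2
        rt = (q + beta**2 * qt * (1.0 - q) ** 2) / X**2
        s_a = math.sqrt(alpha * g_N * r)    # noise amplitude on layer A
        s_b = math.sqrt(alpha * g_M * rt)   # noise amplitude on layer B
        # Eqs. (3.18) and (3.19) for a single condensed pattern
        n = gauss_average(lambda z: math.tanh(beta * (z * s_a + g_N * m)))
        q = gauss_average(lambda z: math.tanh(beta * (z * s_a + g_N * m)) ** 2)
        m = gauss_average(lambda z: math.tanh(beta * (z * s_b + g_M * n)))
        qt = gauss_average(lambda z: math.tanh(beta * (z * s_b + g_M * n)) ** 2)
    return m, n, q, qt

print(solve_retrieval(2.0, 0.0, 1, 1))     # alpha = 0: finite-p limit
print(solve_retrieval(10.0, 0.01, 1, 1))   # small load, low temperature
```

At $\alpha = 0$ the noise terms vanish and the iteration reduces to the finite-p equations (2.12); the iteration is not guaranteed to converge near the point where $\beta^2 (1 - q)(1 - \tilde{q}) = 1$.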

It is convenient to define

$$C = \beta (1 - q), \qquad \tilde{C} = \beta (1 - \tilde{q}). \tag{3.21}$$
