A statistical investigation of bidirectional associative memories (BAM)


HAL Id: jpa-00247018

https://hal.archives-ouvertes.fr/jpa-00247018

Submitted on 1 Jan 1994



J. Kurchan, L. Peliti, M. Saber

To cite this version:

J. Kurchan, L. Peliti, M. Saber. A statistical investigation of bidirectional associative memories (BAM). Journal de Physique I, EDP Sciences, 1994, 4 (11), pp.1627-1639. ⟨10.1051/jp1:1994211⟩. ⟨jpa-00247018⟩

J. Phys. I France 4 (1994) 1627-1639 NOVEMBER 1994, PAGE 1627

Classification Physics Abstracts

87.10 89.70 64.60C

A statistical investigation of bidirectional associative memories (BAM)

J. Kurchan (1), L. Peliti (2,*) and M. Saber (3)

(1) Dipartimento di Fisica and Sezione INFN, Università di Roma « La Sapienza », P.le Aldo Moro 2, I-00185 Roma, Italy

(2) Institut Curie, Section de Physique et Chimie, Laboratoire Curie, rue Pierre et Marie Curie, F-75231 Paris Cedex 05, France

(3) Département de Physique, Faculté des Sciences, Université « Moulay Ismail », B.P. 4010, Beni M'hamed, Meknès, Morocco

(Received 12 April 1994, received in final form 18 July 1994, accepted 21 July 1994)

Abstract. We investigate by statistical mechanical methods a stochastic analogue of the bidirectional associative memories introduced by B. Kosko. We derive its storage capacity as a function of the total number of synapses and of the asymmetry of the network.

1. Introduction.

One of the main shortcomings in J. J. Hopfield's model of associative memory [1, 2] is the fact that the stored patterns lack internal organization. Any bit is as important as any other bit. This runs contrary to the commonly experienced fact that the objects which we try to remember are often structured, so that some part of their description is more important than another. One possible way of taking this feature into account in Attractor Neural Networks [2] is to introduce correlations among the stored patterns. For example, Parga and Virasoro [3] have considered the way in which a hierarchical family of patterns can be stored in a Hopfield network, and Amit and Tsodyks [4] have shown that it is possible to introduce artificial correlations between temporally neighboring patterns, so that closeness in the Hamming distance gives a clue to temporal closeness.

One may choose a different approach, separating the information to be coded in different features: one possibility is categorization. This led quite naturally to the consideration of hierarchically structured networks, in which each layer is devoted to the treatment of a different feature. Several authors have considered hierarchical organizations of layers [5-7]. More general organizations have been considered [8, 9]. They have the common feature of considering layers of strongly interacting neurons, which are comparatively weakly connected among themselves. In the model considered by Brunel [9], for example, a module (layer) codes for the category of the stored information, while another group of neurons codes for the remaining information. For example, one may separate a name into surname and given name, or the identity record of a person into a name and a photograph. Then incomplete information on either the name or the photograph may lead to the recall of the whole record.

(*) Boursier Henri de Rothschild. On leave of absence from Dipartimento di Scienze Fisiche and Unità INFM, Università « Federico II », Mostra d'Oltremare, Pad. 19, I-80125 Napoli, Italy. Associato INFN, Sezione di Napoli.

Bart Kosko [10] introduced in 1988 a completely different network architecture performing this task and named it a bidirectional associative memory (BAM). The task of the network is to retrieve a pair of associated activity patterns when a stimulus close to one or the other pattern is presented to the network. Kosko has shown how this task can be performed by a network with a simple two layer architecture. The interesting point he made is that the network works by « reverberation » even if there are no connections between the neurons on the same layer. He considers a network made up of two layers, A and B, of McCulloch-Pitts neurons. We denote by A and B the corresponding activity patterns

$$A = (a_1, a_2, \ldots, a_M), \qquad B = (b_1, b_2, \ldots, b_N). \tag{1.1}$$

Here the binary variables denoting the activities of the neurons are supposed to take up the values $\pm 1$.

The network evolves according to a deterministic process defined by the following scheme

$$(A, B) \to (A', B) \to (A', B') \to \cdots \tag{1.2}$$

with successive, parallel updatings of the two layers, one at a time. The updatings are defined by the following equations

$$a_i' = \mathrm{sign}\Big(\sum_{j=1}^{N} J_{ij}\, b_j\Big), \tag{1.3a}$$

$$b_j' = \mathrm{sign}\Big(\sum_{i=1}^{M} K_{ji}\, a_i'\Big). \tag{1.3b}$$

It is important to remark that these equations provide no direct coupling between neurons belonging to the same layer at each time step.

If the two matrices $J$ and $K$ are the transpose of each other, it is easy to show that the following quantity acts as a Lyapunov function, since it decreases (or remains unchanged) at each time step:

$$E = -\sum_{i=1}^{M}\sum_{j=1}^{N} J_{ij}\, a_i b_j = -\sum_{i=1}^{M}\sum_{j=1}^{N} K_{ji}\, b_j a_i. \tag{1.4}$$

As a consequence, the attractors of the dynamics are fixed points, i.e., the local minima of $E$, which satisfy the equation

$$a_i = \mathrm{sign}\Big(\sum_{j=1}^{N} J_{ij}\, b_j\Big), \qquad b_j = \mathrm{sign}\Big(\sum_{i=1}^{M} a_i J_{ij}\Big). \tag{1.5}$$

We shall consider from now on only the matrix $J = K^{T}$.
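The deterministic scheme (1.2)-(1.3) and the Lyapunov property of (1.4) can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the layer sizes, the Gaussian random couplings and the tie-breaking rule for a vanishing field are assumptions made only for the illustration.

```python
import random

def sign(x, previous):
    """Sign function; a zero field keeps the previous state (assumed convention)."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return previous

def update_a(J, a, b):
    """Parallel update of layer A, eq. (1.3a): a_i' = sign(sum_j J_ij b_j)."""
    return [sign(sum(J[i][j] * b[j] for j in range(len(b))), a[i])
            for i in range(len(a))]

def update_b(J, a, b):
    """Parallel update of layer B with K = J^T, eq. (1.3b)."""
    return [sign(sum(J[i][j] * a[i] for i in range(len(a))), b[j])
            for j in range(len(b))]

def energy(J, a, b):
    """Lyapunov function of eq. (1.4): E = -sum_ij J_ij a_i b_j."""
    return -sum(J[i][j] * a[i] * b[j]
                for i in range(len(a)) for j in range(len(b)))

def run(J, a, b, max_sweeps=100):
    """Alternate the two layer updates until neither layer changes."""
    energies = [energy(J, a, b)]
    for _ in range(max_sweeps):
        new_a = update_a(J, a, b)
        energies.append(energy(J, new_a, b))
        new_b = update_b(J, new_a, b)
        energies.append(energy(J, new_a, new_b))
        if new_a == a and new_b == b:
            break
        a, b = new_a, new_b
    return a, b, energies

rng = random.Random(0)
M, N = 8, 12  # hypothetical layer sizes
J = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
a0 = [rng.choice((-1, 1)) for _ in range(M)]
b0 = [rng.choice((-1, 1)) for _ in range(N)]
a_fix, b_fix, energies = run(J, a0, b0)
```

In this sketch the recorded sequence `energies` never increases, and the final pair `(a_fix, b_fix)` is left unchanged by both updates, as expected for a solution of (1.5).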


Storage is performed by choosing a form of the coupling matrix $J$ reminiscent of the Hopfield synaptic matrix:

$$J_{ij} = \frac{1}{\sqrt{MN}} \sum_{\mu=1}^{p} \xi_i^{\mu}\, \eta_j^{\mu}. \tag{1.6}$$

Here $\xi^{\mu} = (\xi_1^{\mu}, \xi_2^{\mu}, \ldots, \xi_M^{\mu})$ and $\eta^{\mu} = (\eta_1^{\mu}, \eta_2^{\mu}, \ldots, \eta_N^{\mu})$, $\mu = 1, 2, \ldots, p$, are activity patterns: $\xi_i^{\mu}, \eta_j^{\mu} = \pm 1$, $\forall i, j, \mu$. The factor $\sqrt{MN}$ is irrelevant at this stage, and has been chosen for later convenience. We shall assume here, and throughout the paper, that the components $\xi_i^{\mu}$, $\eta_j^{\mu}$ are independent random variables, assuming the values $\pm 1$ with equal probability.

Retrieval is performed in a way analogous to Hopfield models: one of the layers is prepared in an initial condition close to one of the corresponding stored patterns ($\xi$ for layer A, $\eta$ for layer B). Then the system evolves, taking care to update first the other layer: the same reasonings as for the Hopfield models show that eventually the system reaches a fixed point close to the form $(\xi^{\mu_0}, \eta^{\mu_0})$, where the pattern $\mu_0$ is determined by the initial condition.
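Storage by the rule (1.6) and retrieval from a corrupted cue can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the paper: the sizes `M`, `N`, `p`, the 10% corruption level, the number of sweeps and the convention sign(0) = +1 are all hypothetical choices.

```python
import math
import random

def hebb_matrix(xi, eta):
    """Coupling matrix of eq. (1.6): J_ij = (1/sqrt(MN)) sum_mu xi_i^mu eta_j^mu."""
    M, N = len(xi[0]), len(eta[0])
    norm = 1.0 / math.sqrt(M * N)
    return [[norm * sum(x[i] * e[j] for x, e in zip(xi, eta)) for j in range(N)]
            for i in range(M)]

def sgn(x):
    # A zero field is sent to +1; this is an assumed convention.
    return 1 if x >= 0 else -1

def retrieve(J, a, sweeps=5):
    """Start from a cue on layer A, update layer B first, then alternate."""
    M, N = len(J), len(J[0])
    b = [1] * N  # placeholder, overwritten by the first update of layer B
    for _ in range(sweeps):
        b = [sgn(sum(J[i][j] * a[i] for i in range(M))) for j in range(N)]
        a = [sgn(sum(J[i][j] * b[j] for j in range(N))) for i in range(M)]
    return a, b

def overlap(u, v):
    """Normalized scalar product of two +/-1 vectors."""
    return sum(x * y for x, y in zip(u, v)) / len(u)

rng = random.Random(1)
M, N, p = 100, 200, 3  # hypothetical sizes
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(p)]
eta = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
J = hebb_matrix(xi, eta)

# Cue: the first stored pattern of layer A with 10% of its bits flipped.
cue = list(xi[0])
for i in rng.sample(range(M), M // 10):
    cue[i] = -cue[i]

a_out, b_out = retrieve(J, cue)
```

With so few patterns the crosstalk is small, and both `overlap(a_out, xi[0])` and `overlap(b_out, eta[0])` should come out close to one.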

One can suggest several possible uses of this architecture as an association device. Kosko [11] has also suggested some interesting modifications of it, and in particular the introduction of a learning rule leading to what he calls Adaptive Bidirectional Associative Memories (ABAM). Here we shall stick to the simple « nonadaptive » architecture we have just described.

Our aim is to investigate the phase diagram of a stochastic version of the BAM, in particular in order to determine its storage capacity and its performance. The method is close to the classic work of Amit, Gutfreund and Sompolinsky [12, 13] on Hopfield networks. It is a first step towards the consideration of more complex architectures, in which single neural layers act as a unity. The point which makes this architecture worth investigating is the fact that it may become more efficient in the case of great asymmetry between the two layers, as it happens, e.g., for the case of the association of a name with an image. In this case one needs only $NM$ links instead of $(N + M)^2/2$ to have essentially comparable retrieval features, and the former figure is much smaller when $N \gg M$. The price to be paid is in the attraction basin, which may be considerably smaller for the smaller layer.
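The saving in the number of links can be made concrete with a few numbers. The sizes below are hypothetical and serve only to show how the ratio $NM / [(N+M)^2/2]$ behaves as the asymmetry grows.

```python
def synapse_counts(M, N):
    """Return (N*M links of the BAM, (N+M)^2/2 links of a fully connected net)."""
    bam_links = N * M
    full_links = (N + M) ** 2 / 2
    return bam_links, full_links

# Hypothetical sizes: a short "name" layer (M) and a growing "image" layer (N).
for M, N in [(100, 100), (100, 1000), (100, 10000)]:
    bam, full = synapse_counts(M, N)
    print(M, N, bam, full, bam / full)
```

For equal layers the ratio is 1/2, while for $N = 100\,M$ it drops to about 0.02.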

We now define the stochastic version of the BAM. A fictitious « inverse temperature » $\beta = 1/T$ can be introduced as usual by considering a stochastic generalization of the updating equations (1.3): namely, the probability that the state of spin $i$ is equal to $a_i'$ at the next time step is given by

$$P(a_i') = \frac{e^{\beta a_i' h_i}}{e^{\beta h_i} + e^{-\beta h_i}}. \tag{1.7}$$

The local field $h_i$ is defined by

$$h_i = \sum_{j=1}^{N} J_{ij}\, b_j. \tag{1.8}$$

The layer B is governed by completely analogous equations: the probability that unit $j$ takes up the state $b_j'$ at the next time step is given by

$$P(b_j') = \frac{e^{\beta b_j' k_j}}{e^{\beta k_j} + e^{-\beta k_j}}, \tag{1.9}$$

where the local field $k_j$ is given by

$$k_j = \sum_{i=1}^{M} J_{ij}\, a_i. \tag{1.10}$$

Let us remark that $b_j$ and $a_i$ in equations (1.7) and (1.9) respectively stand for the current state of the respective layers. There is no memory in the dynamics.

In order to obtain the Boltzmann distribution of Hamiltonian $E$ (Eq. (1.4)) the updating rule should be modified, by choosing at random, at each time step, the layer to be updated. It is easy to convince oneself, in fact, that if the layers are updated in succession, as in the deterministic rule, the resulting dynamics does not obey detailed balance.

As a consequence, the presentation of the stimulus to the network cannot take place simply by setting one of the layers (say A) in a given initial state: since then, if the next layer to be updated turns out to be A, the stimulus is irretrievably lost. Hence one should assume that the stimulus is presented by applying to the chosen layer a sufficiently strong external field $h_i^{0}$ whose sign is determined by the pattern one wants to treat. We are however only concerned with the dynamics that follows once this external field is set to zero.
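The stochastic rule, with the layer to be updated chosen at random at each time step, can be sketched as follows. The code is an illustration under assumptions: the layer sizes, the Gaussian couplings, the value of `beta` and the number of steps are hypothetical, and the external field used to present the stimulus is not modeled.

```python
import math
import random

def state_probability(state, field, beta):
    """Probability of eqs. (1.7)/(1.9) that a unit takes the value `state` (+1 or -1).

    Uses exp(beta*s*h) / (2*cosh(beta*h)) = (1 + tanh(beta*s*h)) / 2 for s = +1 or -1,
    which avoids overflow at large beta.
    """
    return 0.5 * (1.0 + math.tanh(beta * state * field))

def sample_layer(fields, beta, rng):
    """Draw every unit of one layer independently, given its local field."""
    return [1 if rng.random() < state_probability(1, h, beta) else -1 for h in fields]

def stochastic_step(J, a, b, beta, rng):
    """One time step: choose a layer at random and update all its units in parallel."""
    M, N = len(a), len(b)
    if rng.random() < 0.5:
        fields = [sum(J[i][j] * b[j] for j in range(N)) for i in range(M)]  # h_i, eq. (1.8)
        a = sample_layer(fields, beta, rng)
    else:
        fields = [sum(J[i][j] * a[i] for i in range(M)) for j in range(N)]  # k_j, eq. (1.10)
        b = sample_layer(fields, beta, rng)
    return a, b

rng = random.Random(2)
M, N = 6, 9  # hypothetical layer sizes
J = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
a = [rng.choice((-1, 1)) for _ in range(M)]
b = [rng.choice((-1, 1)) for _ in range(N)]
for _ in range(50):
    a, b = stochastic_step(J, a, b, beta=1.5, rng=rng)
```

For large `beta` the probability of aligning with the local field tends to one, so that each single layer update reduces to the deterministic rule (1.3).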

The paper is organized as follows: section 2 contains the analysis of the simple case in which the number $p$ of stored patterns is finite in the thermodynamic limit. Section 3 discusses the case of an infinite number of patterns. Section 4 contains a few conclusions.

2. Finite number of patterns.

We now derive the phase diagram of the stochastic BAM defined above when the number $p$ of stored patterns is finite. In this situation the system is always self-averaging, and it is not necessary to introduce replicas. We consider therefore the partition function

$$Z = \mathrm{Tr}_{a,b}\, \exp(-\beta E) = \mathrm{Tr}_{a,b}\, \exp\Big[\frac{\beta}{\sqrt{MN}} \sum_{\mu=1}^{p} (\xi^{\mu}\cdot a)(\eta^{\mu}\cdot b)\Big]. \tag{2.1}$$

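We denote by Tr the sum over activity patterns

$$\mathrm{Tr}_{a,b} = \prod_{i=1}^{M} \sum_{a_i = \pm 1}\; \prod_{j=1}^{N} \sum_{b_j = \pm 1}. \tag{2.2}$$

The scalar product $(\xi \cdot a)$ denotes the sum

$$(\xi \cdot a) = \sum_{i=1}^{M} \xi_i\, a_i, \tag{2.3}$$

and analogously for B.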

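We now introduce two Hubbard-Stratonovich variables $u_\mu$ and $v_\mu$, exploiting the identity

$$xy = \tfrac{1}{4}\left[(x + y)^2 - (x - y)^2\right]. \tag{2.4}$$

We thus obtain

$$Z = \int \prod_{\mu=1}^{p} \frac{\beta\sqrt{MN}}{2\pi}\, du_\mu\, dv_\mu\; \exp\Big[-\frac{\beta\sqrt{MN}}{2} \sum_{\mu} \left(u_\mu^2 + v_\mu^2\right)\Big] \times \mathrm{Tr}_{a,b}\, \exp\Big\{\frac{\beta}{\sqrt{2}} \sum_{\mu} \Big[u_\mu \big((\xi^{\mu}\cdot a) + (\eta^{\mu}\cdot b)\big) + i v_\mu \big((\xi^{\mu}\cdot a) - (\eta^{\mu}\cdot b)\big)\Big]\Big\}. \tag{2.5}$$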
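The Gaussian integrals behind this step can be written out for a single pattern. The following is a sketch of the check, where the shorthands $x = (\xi^{\mu}\cdot a)$, $y = (\eta^{\mu}\cdot b)$ and $c = \beta\sqrt{MN}$ are introduced here only for compactness and are not notation of the paper:

```latex
\begin{align*}
% x = (\xi^\mu \cdot a), y = (\eta^\mu \cdot b), c = \beta\sqrt{MN} (assumed shorthands)
\frac{\beta}{\sqrt{MN}}\, x y
  &= \frac{\beta}{4\sqrt{MN}} \left[ (x+y)^2 - (x-y)^2 \right], \\
\exp\!\left[ \frac{\beta\,(x+y)^2}{4\sqrt{MN}} \right]
  &= \sqrt{\frac{c}{2\pi}} \int du\,
     \exp\!\left[ -\frac{c}{2}\,u^2 + \frac{\beta}{\sqrt{2}}\, u\,(x+y) \right], \\
\exp\!\left[ -\frac{\beta\,(x-y)^2}{4\sqrt{MN}} \right]
  &= \sqrt{\frac{c}{2\pi}} \int dv\,
     \exp\!\left[ -\frac{c}{2}\,v^2 + \frac{i\beta}{\sqrt{2}}\, v\,(x-y) \right].
\end{align*}
```

Both lines follow from $\sqrt{c/2\pi}\int du\, e^{-c u^2/2 + b u} = e^{b^2/2c}$ with $b = \beta(x+y)/\sqrt{2}$ and $b = i\beta(x-y)/\sqrt{2}$ respectively.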


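We can now introduce the two auxiliary fields $m_\mu$ and $n_\mu$ by

$$m_\mu = (u_\mu + i v_\mu)/\sqrt{2}, \qquad n_\mu = (u_\mu - i v_\mu)/\sqrt{2}, \tag{2.6}$$

so that

$$u_\mu^2 + v_\mu^2 = 2\, m_\mu n_\mu. \tag{2.7}$$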

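We therefore obtain, up to a normalizing factor, the following expression for the partition function

$$Z = \int \prod_{\mu} dm_\mu\, dn_\mu\; \exp\Big[-\beta\sqrt{MN} \sum_{\mu} m_\mu n_\mu\Big] \times \mathrm{Tr}_{a,b}\, \exp\Big[\beta \sum_{\mu} \big(m_\mu (\xi^{\mu}\cdot a) + n_\mu (\eta^{\mu}\cdot b)\big)\Big]. \tag{2.8}$$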

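The trace can be now simply evaluated, yielding

$$\mathrm{Tr}_{a,b}\, \exp\Big\{\beta \sum_{\mu} \big[m_\mu (\xi^{\mu}\cdot a) + n_\mu (\eta^{\mu}\cdot b)\big]\Big\} = \exp\Big\{\sum_{i=1}^{M} \ln 2\cosh\Big(\beta \sum_{\mu} m_\mu \xi_i^{\mu}\Big) + \sum_{j=1}^{N} \ln 2\cosh\Big(\beta \sum_{\mu} n_\mu \eta_j^{\mu}\Big)\Big\}. \tag{2.9}$$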

When $M$ and $N$ go to infinity, the sum over the $M$ (respectively $N$) independent random variables $\xi$ ($\eta$) yields, up to a factor, the average over the distribution of the patterns, which we denote by a bar:

$$\sum_{i=1}^{M} \ln 2\cosh\Big(\beta \sum_{\mu} m_\mu \xi_i^{\mu}\Big) \simeq M\; \overline{\ln 2\cosh\Big(\beta \sum_{\mu} m_\mu \xi^{\mu}\Big)}, \tag{2.10}$$

while an analogous relation holds for the second term. As in the analogous case of the Hopfield model, the system is self-averaging. It is therefore warranted to calculate the partition function by the saddle point method.
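The self-averaging statement (2.10) can be checked numerically for a small number of patterns, where the average over the pattern distribution is an exact sum over $2^p$ sign configurations. This is only an illustrative sketch: the values of `beta`, of the fields `m` and of the layer size are hypothetical.

```python
import itertools
import math
import random

def log_two_cosh(beta, m, bits):
    """ln 2cosh(beta * sum_mu m_mu xi^mu) for one site."""
    return math.log(2.0 * math.cosh(beta * sum(mm * x for mm, x in zip(m, bits))))

def site_average(beta, m, M, rng):
    """(1/M) times the sum over M sites with independent random +/-1 pattern bits."""
    total = 0.0
    for _ in range(M):
        bits = [rng.choice((-1, 1)) for _ in m]
        total += log_two_cosh(beta, m, bits)
    return total / M

def pattern_average(beta, m):
    """Exact average over the 2^p equally likely sign configurations."""
    configs = list(itertools.product((-1, 1), repeat=len(m)))
    return sum(log_two_cosh(beta, m, c) for c in configs) / len(configs)

rng = random.Random(3)
beta, m = 1.5, (0.8, 0.3)  # hypothetical values, p = 2
empirical = site_average(beta, m, 20000, rng)
exact = pattern_average(beta, m)
```

The two numbers `empirical` and `exact` should agree up to fluctuations of relative order $M^{-1/2}$.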

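We consider the order parameters $m = (m_1, m_2, \ldots, m_p)$ and $n = (n_1, n_2, \ldots, n_p)$ as vectors with $p$ components. After the rescaling $m \to \sqrt{N/M}\, m$, $n \to \sqrt{M/N}\, n$, which leaves the product $m \cdot n$ unchanged, the free energy $F$ is given as a function of their saddle point values by

$$F = \sqrt{MN}\, (m \cdot n) - \beta^{-1} M\; \overline{\ln 2\cosh\Big(\beta \sqrt{\tfrac{N}{M}}\, (\xi \cdot m)\Big)} - \beta^{-1} N\; \overline{\ln 2\cosh\Big(\beta \sqrt{\tfrac{M}{N}}\, (\eta \cdot n)\Big)}.$$

The saddle point equations are given by

$$n = \overline{\xi \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, (\xi \cdot m)\Big)}, \qquad m = \overline{\eta \tanh\Big(\beta \sqrt{\tfrac{M}{N}}\, (\eta \cdot n)\Big)}. \tag{2.11}$$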


The simplest case is the one of the Mattis (or retrieval) states, in which only one component of the order parameters is nonzero. In this case it is easy to see that the component must be the same for $m$ and $n$. The corresponding saddle point equations read:

$$n = \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, m\Big), \qquad m = \tanh\Big(\beta \sqrt{\tfrac{M}{N}}\, n\Big). \tag{2.12}$$

We see that $m$ and $n$ must vanish or be nonzero together. In particular, $m$ must be a solution of the equation

$$m = \tanh\Big(\beta \sqrt{\tfrac{M}{N}}\, \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, m\Big)\Big), \tag{2.13}$$

which has the same qualitative behavior as the usual Ising mean field equation:

$$m = \tanh(\beta z J m). \tag{2.14}$$

The discussion is then quite straightforward: if $\beta < 1$ only the zero solution exists (disordered state); for $\beta > 1$ there are two solutions of opposite sign, where $m$ is one of the two nonzero solutions of equation (2.13) and $n$ is given by

$$n = \tanh\Big(\beta \sqrt{\tfrac{N}{M}}\, m\Big). \tag{2.15}$$

For $\beta \to \infty$ both $m$ and $n$ tend to either $+1$ or $-1$. These thermodynamic states correspond to retrieval of a pair of associated patterns.
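The behavior of the retrieval solution can be explored by iterating the pair of equations (2.12). The sketch below is a plain fixed-point iteration chosen for illustration; the starting value, the number of iterations and the layer sizes are hypothetical.

```python
import math

def mattis_solution(beta, M, N, m0=0.5, iterations=2000):
    """Iterate n = tanh(beta*sqrt(N/M)*m), m = tanh(beta*sqrt(M/N)*n), eq. (2.12)."""
    gamma_n = math.sqrt(N / M)
    gamma_m = math.sqrt(M / N)
    m = m0
    for _ in range(iterations):
        n = math.tanh(beta * gamma_n * m)
        m = math.tanh(beta * gamma_m * n)
    n = math.tanh(beta * gamma_n * m)  # eq. (2.15) evaluated at the final m
    return m, n

m_hot, n_hot = mattis_solution(0.5, 100, 100)    # beta < 1: disordered state
m_sym, n_sym = mattis_solution(2.0, 100, 100)    # beta > 1, equal layers
m_asym, n_asym = mattis_solution(2.0, 100, 400)  # beta > 1, asymmetric layers
```

Linearizing (2.13) around $m = 0$ gives $m \simeq \beta^2 m$, which is why the iteration collapses to zero for $\beta < 1$. For $\beta > 1$ and unequal layers the two order parameters differ, the one with the smaller argument in (2.12) being the smaller.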

Other solutions of the mean field equations correspond to « mixed states », i.e., they have nonzero overlap with more than one pair of patterns. These solutions can be identified and discussed along the lines of reference [12]. We have checked, in particular, that symmetric solutions (which overlap equally with more than one stored pattern) are at most metastable and lose their stability when $\beta$ is slightly larger than one, i.e., immediately below the transition temperature.

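3. Infinite number of patterns.

We now consider the expression for the partition function of $n$ replicas of the system:

$$Z^n = \mathrm{Tr}_{a,b}\, \exp\Big[\frac{\beta}{\sqrt{MN}} \sum_{\mu=1}^{p} \sum_{\rho=1}^{n} (\xi^{\mu}\cdot a^{\rho})(\eta^{\mu}\cdot b^{\rho})\Big]. \tag{3.1}$$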

We have introduced a vector notation for the summation over the « neural » indices. By means of the manipulations described in the previous section we obtain a form analogous to equation (2.8):

$$Z^n = \mathcal{N} \int \prod_{\mu,\rho} dm_\mu^{\rho}\, dn_\mu^{\rho}\; \mathrm{Tr}_{a,b}\, \exp\Big(\beta \sum_{\mu,\rho} m_\mu^{\rho} (\xi^{\mu}\cdot a^{\rho}) + \beta \sum_{\mu,\rho} n_\mu^{\rho} (\eta^{\mu}\cdot b^{\rho})\Big) \times \exp\Big(-\beta\sqrt{MN} \sum_{\mu,\rho} m_\mu^{\rho} n_\mu^{\rho}\Big), \tag{3.2}$$

where $\mathcal{N}$ is a constant. We can now distinguish between « low » ($\nu = 1, \ldots, s$) and « high » ($\mu = s+1, \ldots, p$) components, according to reference [7]. Summing over the « high » patterns we obtain

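$$\overline{\exp\Big(\beta \sum_{\mu > s} \sum_{\rho} m_\mu^{\rho} (\xi^{\mu}\cdot a^{\rho})\Big)} = \exp\Big[\sum_{\mu > s} \sum_{i=1}^{M} \ln\cosh\Big(\beta \sum_{\rho} m_\mu^{\rho} a_i^{\rho}\Big)\Big], \tag{3.3}$$

where we have denoted by the bar the average over patterns. An analogous equation holds for the B units. We now rescale the $m$ and $n$ fields according to the scheme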

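$$m_\mu^{\rho} \to \frac{m_\mu^{\rho}}{\sqrt{M}}, \qquad n_\mu^{\rho} \to \frac{n_\mu^{\rho}}{\sqrt{N}}. \tag{3.4}$$

This yields the following contribution of the « high » patterns to the partition function

$$\prod_{\mu > s} \exp\Big[\frac{\beta^2}{2M} \sum_{i} \sum_{\rho,\sigma} m_\mu^{\rho} m_\mu^{\sigma} a_i^{\rho} a_i^{\sigma} + \frac{\beta^2}{2N} \sum_{j} \sum_{\rho,\sigma} n_\mu^{\rho} n_\mu^{\sigma} b_j^{\rho} b_j^{\sigma}\Big] \times \exp\Big(-\beta \sum_{\mu > s} \sum_{\rho} m_\mu^{\rho} n_\mu^{\rho}\Big). \tag{3.5}$$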

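We introduce the « spin glass parameters » $q_{\rho\sigma}$ and $\tilde{q}_{\rho\sigma}$ by the definition

$$q_{\rho\sigma} = \frac{1}{M} \sum_{i} a_i^{\rho} a_i^{\sigma}, \qquad \tilde{q}_{\rho\sigma} = \frac{1}{N} \sum_{j} b_j^{\rho} b_j^{\sigma}, \qquad \rho \neq \sigma. \tag{3.6}$$

Integrating over the $m$, $n$ fields yields

$$\mathrm{const.} \times \exp\Big\{-\frac{p}{2} \ln \mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big]\Big\}. \tag{3.7}$$

Here $I$ is the identity matrix in the space of replicas, while the matrices $Q$ and $\tilde{Q}$ are defined by

$$[Q]_{\rho\sigma} = q_{\rho\sigma}, \qquad [\tilde{Q}]_{\rho\sigma} = \tilde{q}_{\rho\sigma}, \qquad \rho \neq \sigma. \tag{3.8}$$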

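We finally obtain the following expression for the partition function, up to irrelevant multiplicative constants

$$Z^n = \int \prod_{\nu,\rho} dm_\nu^{\rho}\, dn_\nu^{\rho} \prod_{\rho \neq \sigma} dq_{\rho\sigma}\, d\tilde{q}_{\rho\sigma}\, dr_{\rho\sigma}\, d\tilde{r}_{\rho\sigma} \times \exp\Big\{-\frac{p}{2} \ln \mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big] - \beta F_A - \beta F_B - \frac{p\beta^2}{2} \sum_{\rho \neq \sigma} r_{\rho\sigma} q_{\rho\sigma} - \frac{p\beta^2}{2} \sum_{\rho \neq \sigma} \tilde{r}_{\rho\sigma} \tilde{q}_{\rho\sigma} - \beta\sqrt{MN} \sum_{\nu,\rho} m_\nu^{\rho} n_\nu^{\rho}\Big\}. \tag{3.9}$$

Here $F_A$ and $F_B$ are the spin traces defined by

$$\exp(-\beta F_A) = \mathrm{Tr}_{a}\, \exp\Big[\frac{p\beta^2}{2M} \sum_{i} \sum_{\rho \neq \sigma} r_{\rho\sigma} a_i^{\rho} a_i^{\sigma} + \beta \sqrt{\frac{N}{M}} \sum_{i} \sum_{\rho} \sum_{\nu} m_\nu^{\rho} \xi_i^{\nu} a_i^{\rho}\Big] \tag{3.10}$$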


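and the analogue for B

$$\exp(-\beta F_B) = \mathrm{Tr}_{b}\, \exp\Big[\frac{p\beta^2}{2N} \sum_{j} \sum_{\rho \neq \sigma} \tilde{r}_{\rho\sigma} b_j^{\rho} b_j^{\sigma} + \beta \sqrt{\frac{M}{N}} \sum_{j} \sum_{\rho} \sum_{\nu} n_\nu^{\rho} \eta_j^{\nu} b_j^{\rho}\Big]. \tag{3.11}$$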

We define

$$\alpha = \frac{p}{\sqrt{MN}}, \qquad \gamma_M = \sqrt{\frac{M}{N}}, \qquad \gamma_N = \sqrt{\frac{N}{M}}. \tag{3.12}$$

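We now assume replica symmetry, implying that $r_{\rho\sigma} = r$, $q_{\rho\sigma} = q$, $m_\nu^{\rho} = m_\nu$ (and analogous relations for $\tilde{r}$, $\tilde{q}$ and $n_\nu^{\rho}$). This is expected to be a good approximation, just as in the Hopfield case [14]. The spin traces can be then explicitly computed to yield

$$\exp(-\beta F_A - \beta F_B) = \exp\Big\{ M\; \overline{\ln \int \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} \Big[2\cosh \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu} m_\nu \xi^{\nu}\Big)\Big]^{n}} + N\; \overline{\ln \int \frac{d\tilde{z}}{\sqrt{2\pi}}\, e^{-\tilde{z}^2/2} \Big[2\cosh \beta\Big(\tilde{z}\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu} n_\nu \eta^{\nu}\Big)\Big]^{n}} - \frac{n}{2}\, p \beta^2 (r + \tilde{r}) \Big\}. \tag{3.13}$$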

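On the other hand, an explicit evaluation yields

$$\mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big] = \big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big]^{n-1} \times \Big(\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big] + n\beta^2 (q + \tilde{q} - 2 q \tilde{q})\Big). \tag{3.14}$$

We thus obtain the limit for $n \to 0$:

$$\lim_{n \to 0} \frac{1}{n} \ln \mathrm{Det}\big[\beta^2 (Q + I)(\tilde{Q} + I) - I\big] = \ln\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big] + \frac{\beta^2 (q + \tilde{q} - 2 q \tilde{q})}{\beta^2 (1 - q)(1 - \tilde{q}) - 1}. \tag{3.15}$$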
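The determinant can be checked through the eigenvalues of the replica symmetric matrices. The following is a sketch of that check, where $\mathbb{1}$ denotes the $n \times n$ matrix with all entries equal to one and $\lambda_\perp$, $\lambda_\parallel$ are names introduced here for the two eigenvalues (neither is notation of the paper):

```latex
\begin{align*}
% \mathbb{1}: n x n matrix of ones, so that \mathbb{1}^2 = n \mathbb{1} (assumed notation)
Q + I &= (1-q)\, I + q\, \mathbb{1}, \qquad
\tilde{Q} + I = (1-\tilde{q})\, I + \tilde{q}\, \mathbb{1}, \\
(Q + I)(\tilde{Q} + I) &= (1-q)(1-\tilde{q})\, I
   + \left[ q + \tilde{q} - 2 q \tilde{q} + n\, q \tilde{q} \right] \mathbb{1}, \\
\lambda_\perp &= \beta^2 (1-q)(1-\tilde{q}) - 1
   \qquad (n-1 \text{ times}), \\
\lambda_\parallel &= \lambda_\perp
   + n \beta^2 \left[ q + \tilde{q} - 2 q \tilde{q} + n\, q \tilde{q} \right]
   \qquad (\text{once}), \\
\frac{1}{n} \ln \mathrm{Det}
  &= \ln \lambda_\perp
   + \frac{1}{n} \ln\!\left( 1 + \frac{\lambda_\parallel - \lambda_\perp}{\lambda_\perp} \right)
   \;\xrightarrow[n \to 0]{}\;
   \ln \lambda_\perp + \frac{\beta^2 (q + \tilde{q} - 2 q \tilde{q})}{\lambda_\perp}.
\end{align*}
```

The term $n\, q\tilde{q}$ inside the bracket is of higher order in $n$ and does not contribute to the limit.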

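By considering the straightforward limits

$$\sum_{\rho \neq \sigma} r_{\rho\sigma} q_{\rho\sigma} = n(n-1)\, r q \to -n\, r q, \qquad \sum_{\rho \neq \sigma} \tilde{r}_{\rho\sigma} \tilde{q}_{\rho\sigma} = n(n-1)\, \tilde{r} \tilde{q} \to -n\, \tilde{r} \tilde{q}, \qquad \beta\sqrt{MN} \sum_{\rho} m_\nu^{\rho} n_\nu^{\rho} \to n \beta \sqrt{MN}\, m_\nu n_\nu, \tag{3.16}$$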


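we obtain the following expression for the average free energy $G$:

$$G = -\lim_{n \to 0} \frac{1}{n\sqrt{MN}} \ln Z^n = -\gamma_M \Big\langle\!\Big\langle \ln 2\cosh \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu} m_\nu \xi^{\nu}\Big) \Big\rangle\!\Big\rangle - \gamma_N \Big\langle\!\Big\langle \ln 2\cosh \beta\Big(\tilde{z}\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu} n_\nu \eta^{\nu}\Big) \Big\rangle\!\Big\rangle + \beta \sum_{\nu} m_\nu n_\nu + \frac{\alpha}{2} \beta^2 r (1 - q) + \frac{\alpha}{2} \beta^2 \tilde{r} (1 - \tilde{q}) + \frac{\alpha}{2} \Big\{ \ln\big[\beta^2 (1 - q)(1 - \tilde{q}) - 1\big] + \frac{\beta^2 (q + \tilde{q} - 2 q \tilde{q})}{\beta^2 (1 - q)(1 - \tilde{q}) - 1} \Big\}. \tag{3.17}$$

Here we have introduced the shorthand

$$\langle\!\langle \cdots \rangle\!\rangle = \int \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}\; \overline{(\cdots)}.$$

The saddle point equations are obtained by taking the derivatives of $G$. The derivatives with respect to $m_\nu$ and $n_\nu$ are straightforward: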

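$$\Big\langle\!\Big\langle \xi^{\nu} \tanh \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu'} m_{\nu'} \xi^{\nu'}\Big) \Big\rangle\!\Big\rangle = n_\nu, \qquad \Big\langle\!\Big\langle \eta^{\nu} \tanh \beta\Big(\tilde{z}\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu'} n_{\nu'} \eta^{\nu'}\Big) \Big\rangle\!\Big\rangle = m_\nu. \tag{3.18}$$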

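The derivatives with respect to the Lagrangian multipliers $r$ and $\tilde{r}$ yield:

$$\Big\langle\!\Big\langle \tanh^2 \beta\Big(z\sqrt{\alpha \gamma_N r} + \gamma_N \sum_{\nu} m_\nu \xi^{\nu}\Big) \Big\rangle\!\Big\rangle = q, \qquad \Big\langle\!\Big\langle \tanh^2 \beta\Big(\tilde{z}\sqrt{\alpha \gamma_M \tilde{r}} + \gamma_M \sum_{\nu} n_\nu \eta^{\nu}\Big) \Big\rangle\!\Big\rangle = \tilde{q}. \tag{3.19}$$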

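On the other hand, the saddle points with respect to $q$ and $\tilde{q}$ yield

$$r = \frac{\tilde{q} + \beta^2 q (1 - \tilde{q})^2}{\big[1 - \beta^2 (1 - q)(1 - \tilde{q})\big]^2}, \qquad \tilde{r} = \frac{q + \beta^2 \tilde{q} (1 - q)^2}{\big[1 - \beta^2 (1 - q)(1 - \tilde{q})\big]^2}. \tag{3.20}$$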

It is convenient to define

$$C = \beta (1 - q), \qquad \tilde{C} = \beta (1 - \tilde{q}). \tag{3.21}$$