HAL Id: jpa-00246756
https://hal.archives-ouvertes.fr/jpa-00246756
Submitted on 1 Jan 1993
Short-term memory in a sparse clock neural network
S. Semenov, A. Plakhov
To cite this version:
S. Semenov, A. Plakhov. Short-term memory in a sparse clock neural network. Journal de Physique I, EDP Sciences, 1993, 3 (3), pp. 767-776. 10.1051/jp1:1993161. jpa-00246756
J. Phys. I France 3 (1993) 767-776, MARCH 1993, PAGE 767

Classification
Physics Abstracts: 05.90, 64.60, 87.10
Short-term memory in a sparse clock neural network

S. A. Semenov and A. Yu. Plakhov

Institute of Physics and Technology, Prechistenka Str. 13/7, Moscow 119034, Russia

(Received 7 September 1992, accepted 21 October 1992)

Abstract. We propose a model of short-term (working) memory in a clock neural network. The retrieval dynamics of a strongly diluted version of the model is solved. The related phase diagrams, reflecting the fixed-point retrieval behaviour of the system, are also obtained and discussed.

1. Introduction.
A direct and intuitively evident way to avoid overloading in Hopfield-type neural networks is to introduce non-additivity into the learning rules, which saturates the neuronal couplings and is, in fact, responsible for the palimpsestic behaviour of associative memory models [1-3]. In this regard, several learning schemes for binary networks were proposed [2, 4, 5], analysed [3-6] and discussed from a biological viewpoint [7]. Newly developed models have even closer relations with psychological phenomena [8].

It is therefore of interest to investigate the manifestation of short-term memory effects in other important attractor-like neural network models, such as, for instance, clock or Potts models. In recent years, a clock model which exploits a Hebb-like learning prescription has been treated [9, 10]. As was found, the system, when overloaded by learnt patterns, falls into a state of total memory confusion. It thus seems valuable to invent a tractable learning procedure that prevents memory overloading in a clock neural network.

In this paper we present and analyse a model of a clock neural network exhibiting the properties of short-term (working) memory. In our model, interneuronal couplings are given by complex numbers with fixed modulus, and the learning of any new pattern consists of a Hebb-like modification followed by a renormalization of each coupling. This effectively leads to the memorization of new patterns with simultaneous forgetting of the most ancient ones; as a result, the total length of the stored-pattern list remains constant. We conclude, in effect, that the main features inherent to the models of working memory designed for binary neural networks, such as, for example, the existence of critical and optimal embedding strengths, are preserved in our model.

The paper is organized as follows. In the next section we describe the general model, which is solved in the third section in the limit of extreme dilution. In the fourth section a stability analysis is carried out, providing phase diagrams in the parameter space of the model. Then, in section 5, we analyse in detail the deterministic-dynamics limit. The paper finishes with concluding remarks.

2. Definition of the model.
Let us consider a network of $N$ planar spins (neurons). The state of each neuron is given by a complex variable $s_k$, $|s_k| = 1$ ($k = 1, \dots, N$). The state of the system is then described by the vector $s = (s_1, \dots, s_N)$. The spin dynamics is governed by a local field of the form
$$ v_k(s) = \sum_{l=1}^{N} T_{kl}\, s_l $$
with complex couplings $T_{kl}$. We imply that the stochastic evolution of the system is given by either parallel or sequential heat-bath dynamics with time step $\Delta t$, by assuming the transition of neuron $k$ to the state $z$ at the next time moment $t + \Delta t$ to be given by the probability density $\mathcal{P}(s_k(t + \Delta t) = z) = p(z \mid v_k(t))$, where
$$ p(z \mid v) = \frac{\exp[\beta \operatorname{Re}(z^* v)]\, \delta(|z| - 1)}{\int \exp[\beta \operatorname{Re}(w^* v)]\, \delta(|w| - 1)\, dw}, \qquad (1) $$
$$ v_k(t) = \sum_l T_{kl}\, s_l(t). \qquad (2) $$
Here the asterisk marks complex conjugation, and the couplings $T_{kl} = K_{kl} J_{kl}$ combine the effects of learning ($J_{kl}$) and dilution ($K_{kl}$). The factors $K_{kl}$ are chosen to be independent random variables with the distribution
$$ \rho(K_{kl}) = \frac{C}{N}\, \delta\!\left(K_{kl} - C^{-1/2}\right) + \left(1 - \frac{C}{N}\right) \delta(K_{kl}), $$
where $C$ is the mean number of couplings of each neuron. Since $K_{kl}$ and $K_{lk}$ do not correlate with each other, the dilution is asymmetric. The parameter $\beta$ in (1) is the inverse temperature determining the level of stochastic noise. For parallel (synchronous) updating of the whole system, a time increment $\Delta t = 1$ is implied. Sequential dynamics consists in a random choice of a network neuron and updating it in the time interval $\Delta t = 1/N$ according to the stochastic dynamics (1)-(2). In the deterministic limit, $\beta = \infty$, the dynamics becomes simply
$$ s_k(t + \Delta t) = v_k(t)/|v_k(t)|. $$
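As an editorial illustration (not the authors' code), the single-neuron update is easy to sketch numerically: on the unit circle, the heat-bath density (1) is a von Mises distribution of the phase with mean direction $\arg v_k$ and concentration $\beta |v_k|$. The network size, the single-pattern Hebb couplings and all parameter values below are arbitrary assumptions made for the demonstration.

```python
import numpy as np

def heat_bath_step(s, T, beta, rng):
    """One parallel heat-bath sweep. In the phase variable, the density (1),
    p(z|v) ~ exp(beta * Re(z* v)) on |z| = 1, is a von Mises distribution
    with mean direction arg(v_k) and concentration beta * |v_k|."""
    v = T @ s                                    # local fields, eq. (2)
    phases = np.array([rng.vonmises(np.angle(vk), beta * abs(vk)) for vk in v])
    return np.exp(1j * phases)

def deterministic_step(s, T):
    """Zero-temperature (beta = infinity) limit: s_k <- v_k / |v_k|."""
    v = T @ s
    return v / np.abs(v)

rng = np.random.default_rng(0)
N = 200
pattern = np.exp(2j * np.pi * rng.random(N))          # one random planar-spin pattern
T = np.outer(pattern, pattern.conj()) / N             # single-pattern Hebb couplings
s = pattern * np.exp(0.6j * rng.standard_normal(N))   # noisy initial state
for _ in range(5):
    s = deterministic_step(s, T)
m = abs(np.vdot(pattern, s)) / N                      # overlap with the stored pattern
```

With this rank-one coupling matrix the deterministic dynamics recovers the stored pattern (up to a global rotation) in a single step, while at finite $\beta$ the von Mises sampling blurs the phases around it.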
During the learning process, random patterns, which are nominated configurations $s^\nu$, $\nu = \dots, -1, 0, 1, \dots, p$, chosen independently with the uniform distribution
$$ \rho(s) = \prod_{k=1}^{N} g(s_k), \qquad g(s) = (2\pi)^{-1}\, \delta(|s| - 1), $$
are sequentially presented to the network. If one tries to store the pattern sequence using the Hebb-like prescription
$$ J_{kl} = \sum_\nu s_k^\nu\, (s_l^\nu)^*, $$
the memory deteriorates at some critical loading and the system loses the ability to retrieve any pattern taught [9, 10]. To overcome the state of memory overloading, it seems reasonable to restrict the range of possible values of the complex couplings in a quite natural way, by assuming $J_{kl}$ to take values on the unit circle, i.e. $|J_{kl}| = 1$. Below we propose an embedding scheme enabling us to memorize the most recent patterns in the sequence learnt. This is achieved by a step-by-step «embedding and renormalization» of the couplings. That is, at the time of acquisition of the $\nu$-th pattern, the couplings are changed according to
$$ J_{kl}^{(\nu+1)} = \frac{J_{kl}^{(\nu)} + \varepsilon\, s_k^\nu (s_l^\nu)^*}{\bigl| J_{kl}^{(\nu)} + \varepsilon\, s_k^\nu (s_l^\nu)^* \bigr|}, \qquad (3) $$
where $J_{kl}^{(\nu)}$ denotes the coupling just before the acquisition of the $\nu$-th pattern and $\varepsilon$ is the embedding strength. In contrast to the Hebb-like additive learning proposed by Noest [9], in which both amplitudes and phases of the couplings $J_{kl}$ are modified, the rule (3) implies that the information is stored in the phases $\phi_{kl} = \arg J_{kl}$ only.
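A minimal numerical sketch of the embedding-and-renormalization rule (3) (ours, with arbitrary sizes and an arbitrary embedding strength) illustrates two of its stated properties: the couplings stay on the unit circle after every step, and the phases align better with the most recently learnt pattern than with the most ancient one.

```python
import numpy as np

def embed_pattern(J, xi, eps):
    """One step of rule (3): Hebb-like shift by eps * xi_k xi_l*, then
    renormalization back onto the unit circle, so only the phases
    arg(J_kl) carry the stored information."""
    J = J + eps * np.outer(xi, xi.conj())
    return J / np.abs(J)

rng = np.random.default_rng(1)
N, n_patterns, eps = 100, 20, 0.5
patterns = np.exp(2j * np.pi * rng.random((n_patterns, N)))

J = np.exp(2j * np.pi * rng.random((N, N)))   # random unit-modulus initial couplings
for xi in patterns:                           # oldest pattern first, newest last
    J = embed_pattern(J, xi, eps)

def alignment(J, xi):
    """Mean Re(J_kl (xi_k xi_l*)^*): how well the couplings point at pattern xi."""
    return float(np.mean((J * np.outer(xi, xi.conj()).conj()).real))

a_newest = alignment(J, patterns[-1])
a_oldest = alignment(J, patterns[0])
```

Each later embedding adds phase noise to the imprint of earlier patterns, which is the palimpsestic forgetting mechanism analysed below.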
Note that the learning procedure (3) and the network dynamics (1)-(2) are invariant with respect to a global rotation of the system, $s_k \to c\, s_k$, $|c| = 1$.

Following Noest [9], let us define an «overlap» between network configurations $s$ and $s'$ by
$$ m(s, s') = N^{-1} \sum_k s_k\, (s'_k)^*. $$
For random uncorrelated configurations we have $|m| \sim N^{-1/2}$, while configurations which coincide with each other after some global rotation give $|m| = 1$.
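These two regimes are easy to confirm numerically; the following sketch (not from the paper; the network size is arbitrary) checks the $N^{-1/2}$ scaling for random configurations and the invariance of $|m|$ under a global rotation.

```python
import numpy as np

def overlap(s, s_prime):
    """Overlap m(s, s') = N^{-1} sum_k s_k (s'_k)^* between configurations."""
    return complex(np.mean(s * s_prime.conj()))

rng = np.random.default_rng(2)
N = 4000
s1 = np.exp(2j * np.pi * rng.random(N))
s2 = np.exp(2j * np.pi * rng.random(N))

m_random = abs(overlap(s1, s2))        # O(N^{-1/2}) for uncorrelated configurations
c = np.exp(0.7j)                       # an arbitrary global rotation, |c| = 1
m_rotated = abs(overlap(s1, c * s1))   # equals 1 up to rounding
```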
3. Extremely diluted network.

If the mean number of unbroken bonds per neuron, $C$, becomes of order $\log N$ as $N \to \infty$, one may neglect correlations in the «ancestor tree», and an analytical treatment of the dynamics can be used (Derrida et al. [11]). In the following we solve the dynamics of such an extremely diluted network (1)-(3), with the embedding strength $\varepsilon$ in the learning rule (3) going to zero.

We will be interested in the storage and retrieval of the $(p+1)$-th pattern from the end of the pattern sequence $\dots, s^{-1}, s^0, s^1, \dots, s^p$, i.e. the pattern $s^0$. First of all, we fix the indices $k$ and $l$ and concentrate on the contribution of the pattern $s^0$ to the formation of the coupling $J_{kl}$.
It is easy to show that, in view of the independence and the uniform distribution of $\{s_k^\nu (s_l^\nu)^*,\ \nu = \dots, -1, 0, 1, \dots, p\}$ on the unit circle, every given increment $\Delta\phi_{kl}^{(\nu)} = \phi_{kl}^{(\nu+1)} - \phi_{kl}^{(\nu)}$ does not correlate with the previous ones. (We recall that $\phi_{kl}^{(\nu)} = \arg J_{kl}^{(\nu)}$.) Indeed, the quantity $J_{kl}^{(\nu)}\, s_k^\nu (s_l^\nu)^*$ is uniformly distributed on the unit circle and does not correlate with the quantities $s_k^\mu (s_l^\mu)^*$, $\mu \geq \nu$, and hence with $J_{kl}^{(\mu)}$, $\mu \leq \nu$. From this, according to (3), it follows that
$$ \frac{J_{kl}^{(\nu+1)}}{J_{kl}^{(\nu)}} = \frac{1 + \varepsilon\, (J_{kl}^{(\nu)})^* s_k^\nu (s_l^\nu)^*}{\bigl| 1 + \varepsilon\, (J_{kl}^{(\nu)})^* s_k^\nu (s_l^\nu)^* \bigr|} $$
does not correlate with all $J_{kl}^{(\mu)}$, $\mu \leq \nu$. In terms of the arguments this means the statement above. From this fact one immediately gets the mutual independence of all the increments $\{\Delta\phi_{kl}^{(\nu)},\ \nu = \dots, -1, 0, 1, \dots, p\}$.

As $\varepsilon \to 0$, by taking the linear part of expression (3) we get
$$ \frac{J_{kl}^{(\nu+1)}}{J_{kl}^{(\nu)}} = 1 + i\varepsilon \operatorname{Im}\!\left( (J_{kl}^{(\nu)})^* s_k^\nu (s_l^\nu)^* \right) + O(\varepsilon^2), \qquad (4) $$
or, in terms of the arguments,
$$ \Delta\phi_{kl}^{(\nu)} = \varepsilon \operatorname{Im}\!\left( (J_{kl}^{(\nu)})^* s_k^\nu (s_l^\nu)^* \right) + O(\varepsilon^2). $$
The variance of $\Delta\phi_{kl}^{(\nu)}$ is therefore estimated as $\varepsilon^2/2 + O(\varepsilon^3)$. The resulting change $\phi_{kl}^{(p+1)} - \phi_{kl}^{(1)}$, obtained after storing the $p$ most recent patterns $s^1, \dots, s^p$, is thus given by the sum of $p$ independent random quantities,
$$ \phi_{kl}^{(p+1)} - \phi_{kl}^{(1)} = \sum_{\nu=1}^{p} \Delta\phi_{kl}^{(\nu)}, $$
with total variance $p\varepsilon^2/2 + O(p\varepsilon^3)$. Consequently, for large $p$ the quantity $\xi_{kl} = (p\varepsilon^2/2)^{-1/2} \left( \phi_{kl}^{(p+1)} - \phi_{kl}^{(1)} \right)$ is approximately Gaussian with a zero mean and unit variance.
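This Gaussian accumulation of phase increments can be checked directly by evolving an ensemble of single couplings under rule (3) with independent uniform patterns; the sketch below (ours; the values of $\varepsilon$, $p$ and the ensemble size are arbitrary) compares the empirical variance of the total phase drift with the predicted $p\varepsilon^2/2$.

```python
import numpy as np

rng = np.random.default_rng(3)
eps, p, trials = 0.05, 400, 2000

theta0 = 2 * np.pi * rng.random(trials)
J = np.exp(1j * theta0)                 # an ensemble of single couplings, |J| = 1
for _ in range(p):
    u = np.exp(2j * np.pi * rng.random(trials))   # s_k^nu (s_l^nu)^*: uniform phase
    J = (J + eps * u) / np.abs(J + eps * u)       # rule (3)

drift = np.angle(J * np.exp(-1j * theta0))        # phi^(p+1) - phi^(1)
var_drift = float(np.var(drift))                  # predicted: p * eps^2 / 2 = 0.5
```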
Introducing the notation $\gamma \equiv p/C$ for the short-term storage capacity and $\tilde\varepsilon \equiv \varepsilon C^{1/2}$ for the reduced embedding strength (further on we will call $\tilde\varepsilon$ simply the embedding strength, to be more concise), we now impose the constraint that both $\gamma$ and $\tilde\varepsilon$ are kept fixed in the limit $p, C \to \infty$. This implies for $\varepsilon$ the scaling $\varepsilon \sim C^{-1/2}$. In these terms one gets
$$ \phi_{kl}^{(p+1)} - \phi_{kl}^{(1)} = (\gamma \tilde\varepsilon^2 / 2)^{1/2}\, \xi_{kl}. $$
Returning to complex notations and using formula (4), we can write
$$ J_{kl}^{(p+1)} = J_{kl}^{(1)}\, e^{i \Delta\phi_{kl}} = \left\{ J_{kl}^{(0)} + \frac{\tilde\varepsilon}{2\sqrt{C}} \left[ s_k^0 (s_l^0)^* - (J_{kl}^{(0)})^2 (s_k^0)^* s_l^0 \right] + O(\tilde\varepsilon^2/C) \right\} e^{i \Delta\phi_{kl}}, $$
where $\Delta\phi_{kl} \equiv \phi_{kl}^{(p+1)} - \phi_{kl}^{(1)}$.
We suppose next that the initial network state $s(0)$ has a macroscopic overlap only with the pattern $s^0$. Studying the retrieval process of the pattern $s^0$, we represent the local field at the moment $t$ in the form
$$ v_k(t) = \left[ B_k(t) + f_k(t) \right] s_k^0, \qquad (5) $$
where
$$ B_k(t) = \frac{\tilde\varepsilon}{2\sqrt{C}} \sum_{(l)} K_{kl}\, (s_l^0)^*\, s_l(t)\, e^{i \Delta\phi_{kl}} $$
and
$$ f_k(t) = (s_k^0)^* \sum_{(l)} K_{kl} \left[ J_{kl}^{(0)} - \frac{\tilde\varepsilon}{2\sqrt{C}}\, (J_{kl}^{(0)})^2 (s_k^0)^* s_l^0 + O(\tilde\varepsilon^2/C) \right] e^{i \Delta\phi_{kl}}\, s_l(t), \qquad (6) $$
where the sum $\sum_{(l)}$ runs over all $l$'s connected to the neuron $k$ ($K_{kl} \neq 0$). $B_k(t)$ in (5) can be interpreted as a signal from the pattern $s^0$. It is represented by a self-averaging quantity with mean equal to $B\, m(t)$; here $m(t)$ is the time-dependent overlap between the configurations $s(t)$ and $s^0$,
$$ m(t) = \frac{1}{N} \sum_{k=1}^{N} s_k(t)\, (s_k^0)^*, $$
and
$$ B = (\tilde\varepsilon/2) \left\langle e^{i (\gamma/2)^{1/2} \tilde\varepsilon\, \xi} \right\rangle_\xi = (\tilde\varepsilon/2)\, e^{-\gamma \tilde\varepsilon^2/4}, $$
where $\langle\, \cdot\, \rangle_\xi$ signs the average over the normalized Gaussian variable $\xi$. $f_k(t)$ in (5) plays the role of noise which, as can be shown using standard arguments [11], is represented by a sum of uncorrelated items that approaches, as $C \sim \log N \to \infty$, a complex spherically distributed Gaussian quantity with zero mean and unit variance, i.e. $\langle |f_k(t)|^2 \rangle = 1$.

Let us consider the case of parallel dynamics with $\Delta t = 1$.
By using the expression for the conditional transition probability (1), the thermal average of $s_k(t+1)\, (s_k^0)^*$ under fixed $f_k(t)$ equals
$$ \int (s_k^0)^*\, z\, p\!\left( z \mid s_k^0 \left[ B m(t) + f_k(t) \right] \right) dz. $$
Due to the rotational invariance of the function $p$, $p(z \mid v) \equiv p(\bar{s} z \mid \bar{s} v)$ provided $|\bar{s}| = 1$; after the substitution $w = (s_k^0)^* z$ we can rewrite the expression for the thermal average as
$$ \left\langle s_k(t+1)\, (s_k^0)^* \right\rangle = \int w\, p\!\left( w \mid B m(t) + f_k(t) \right) dw. $$
Taking the average over $f_k(t)$, we get immediately
$$ \left\langle s_k(t+1)\, (s_k^0)^* \right\rangle_f = F(m(t)) \qquad (7) $$
with
$$ F(m) = \left\langle \int w\, p(w \mid B m + f)\, dw \right\rangle_f, $$
where $\langle\, \cdot\, \rangle_f$ means averaging over the normalized complex spherically distributed Gaussian variable $f$.

Without any loss of generality we may assume the initial overlap $m(0)$ to be real and positive (this can always be achieved by some rotation of the whole system). Obviously, the function $F(m)$ takes real positive values for positive $m$, and hence the overlap dynamics occurs on the positive half-axis. As a result, taking into account that the overlap is a self-averaging quantity, it follows straightforwardly from (7) that the evolution equation in terms of the overlap is
$$ m(t+1) = F(m(t)). \qquad (8) $$
In the case of sequential dynamics a similar consideration gives an evolution of the form
$$ \frac{dm}{dt} = F(m(t)) - m(t). \qquad (9) $$
Evidently, the iterative (8) and differential (9) equations have the same fixed-point solution $m^* = F(m^*)$, which defines the retrieval overlap. In principle, this solution can be obtained by numerical integration, which appears to be time-consuming for nonzero temperatures. We therefore confine ourselves to the zero-temperature limit only, which is relegated to section 5. For arbitrary temperatures, the stability of the trivial solution $m = 0$ will be examined (Sect. 4).
4. Stability analysis and phase diagrams.

Since $F(m)$ is a strictly convex function (as can be checked through a routine calculation), the condition for the existence of a unique nontrivial solution $m^*$ of the evolution dynamics (8) or (9) is $F'(m = 0) > 1$. Thus, the critical relation
$$ B \left\langle \frac{\partial}{\partial x} \operatorname{Re}\!\int w\, p(w \mid f)\, dw \right\rangle_f = 1 \qquad (10) $$
(here $x = \operatorname{Re}(f)$ is denoted) defines a critical surface in the parameter space $(\tilde\varepsilon, \gamma, \beta)$ at which the second-order transition to memory deterioration takes place.
To obtain this relation in an explicit form, we first introduce the notation $f = x + iy$, $z = \exp(i\phi)$, in which the expression for the derivative in (10) takes the form
$$ \frac{\partial}{\partial x} \operatorname{Re}\!\int w\, p(w \mid f)\, dw = \frac{ \int_{-\pi}^{\pi}\! d\phi \int_{-\pi}^{\pi}\! d\psi\; e^{\beta(x\cos\phi + y\sin\phi)}\, e^{\beta(x\cos\psi + y\sin\psi)}\, \beta \cos\phi\, (\cos\phi - \cos\psi) }{ \left[ \int_{-\pi}^{\pi}\! d\psi\; e^{\beta(x\cos\psi + y\sin\psi)} \right]^{2} }. \qquad (11) $$
Then, using the polar coordinates $x + iy = r \exp(i\theta)$ and taking into account that
$$ \langle\, \cdot\, \rangle_f = (2\pi)^{-1} \int_0^{\infty} dr^2\, e^{-r^2} \int_{-\pi}^{\pi} d\theta\, (\,\cdot\,), $$
after substitution of (11) into (10) one obtains
$$ \frac{B\beta}{2\pi} \int_0^{\infty} dr^2\, e^{-r^2} \int_{-\pi}^{\pi} d\theta \int_{-\pi}^{\pi} d\phi\; R(r, \theta, \phi)\, \cos\phi = 1, $$
where
$$ R(r, \theta, \phi) = e^{\beta r \cos(\phi - \theta)} \int_{-\pi}^{\pi} d\psi\; e^{\beta r \cos(\psi - \theta)}\, (\cos\phi - \cos\psi) \left[ \int_{-\pi}^{\pi} d\psi\; e^{\beta r \cos(\psi - \theta)} \right]^{-2}. $$
Replacing $\phi - \theta$ by $\chi$ and performing the integration over $\phi$, $\chi$, we finally get
$$ \frac{\beta \tilde\varepsilon}{4}\, e^{-\gamma \tilde\varepsilon^2/4} \int_0^{\infty} dr^2\, e^{-r^2} \left( 1 - \frac{I_1^2(\beta r)}{I_0^2(\beta r)} \right) = 1; \qquad (12) $$
here
$$ I_n(x) = \pi^{-1} \int_0^{\pi} e^{x \cos\phi} \cos(n\phi)\, d\phi $$
is the modified Bessel function of the $n$-th order.
The solutions to equation (12) can be obtained numerically, providing the critical surface in the parameter space. In figures 1-3 we present phase diagrams corresponding to sections of this surface by planes with one of the parameters kept fixed.

The critical lines $\gamma_c(\tilde\varepsilon;\, \beta = \mathrm{Cst})$ are plotted in figure 1 for several values of the noise parameter $\beta$. The function $\gamma_c(\tilde\varepsilon;\, \beta = \mathrm{Cst})$ reaches its maximum at the point $\tilde\varepsilon_{\max}(\beta)$, which shifts towards larger values of $\tilde\varepsilon$ as the temperature increases, and becomes zero at some point $\tilde\varepsilon_c(\beta)$ (the critical embedding strength at a given temperature). $\tilde\varepsilon_c(\beta)$ increases with temperature from $\tilde\varepsilon_c = \tilde\varepsilon_c(\beta = \infty) = 4/\sqrt{\pi}$, behaving as $\beta^{-1}$ as $\beta \to 0$ (as can be straightforwardly deduced from (12)). In the limit $\tilde\varepsilon \to \infty$, at any temperature, the maximal storage capacity vanishes as $\gamma_c(\tilde\varepsilon;\, \beta = \mathrm{Cst}) \sim \tilde\varepsilon^{-2} \ln \tilde\varepsilon$. In fact, the storage capacity reduces significantly with a temperature increase (see Fig. 1), indicating a noticeable sensitivity of memory functioning to dynamical noise.
Fig. 1. Critical lines in the $(\tilde\varepsilon^{-1}, \gamma)$-plane at fixed noise parameter $\beta$ (from top to bottom $\beta = \infty$, 10, 4, 2, 1, 0.5). Nonzero retrieval regions are below the corresponding curves.
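As a sketch of how the critical surface can be evaluated (our code, not the authors'; grid sizes are arbitrary), relation (12) can be solved for $\gamma_c$ by quadrature, computing $I_1/I_0$ from the integral definition with the factor $e^{x(\cos\phi - 1)}$ taken out to avoid overflow. At low noise the result should approach the zero-temperature line quoted in section 5.

```python
import numpy as np

def trap(y, x):
    # simple trapezoidal quadrature (avoids version-specific numpy helpers)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def bessel_ratio(x, n_phi=2000):
    """I_1(x)/I_0(x) from the integral definition of I_n, with
    exp(x*(cos(phi)-1)) factored out so large x does not overflow."""
    phi = np.linspace(0.0, np.pi, n_phi)
    w = np.exp(x * (np.cos(phi) - 1.0))
    return trap(w * np.cos(phi), phi) / trap(w, phi)

def gamma_c(eps, beta, n_r=600, r_max=4.0):
    """Critical capacity from relation (12), solved for gamma:
    gamma_c = (4/eps^2) * ln(beta*eps*A/4), where
    A = int_0^inf dr^2 e^{-r^2} [1 - (I1/I0)^2(beta*r)]."""
    r = np.linspace(1e-4, r_max, n_r)
    ratio = np.array([bessel_ratio(beta * ri) for ri in r])
    A = trap(2 * r * np.exp(-r**2) * (1 - ratio**2), r)
    return (4 / eps**2) * np.log(beta * eps * A / 4)

eps0 = 4 * np.sqrt(np.e / np.pi)                          # optimal strength, ~3.721
gc_zero_T = (2 / eps0**2) * np.log(np.pi * eps0**2 / 16)  # zero-T line, = pi/(8e)
gc_low_T = gamma_c(eps0, beta=100.0)                      # relation (12) at low noise
```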
A family of critical curves in the $(\tilde\varepsilon^{-1}, \beta^{-1})$-plane is plotted for several fixed $\gamma$ in figure 2. For the choice $\gamma = 0$, the critical temperature goes to infinity as $\tilde\varepsilon \to \infty$. This implies that at arbitrarily high temperatures an intensive number of patterns, $p = o(C)$, can be stored for sufficiently high values of $\tilde\varepsilon$. Otherwise, if $\gamma > 0$ is assumed, there exists a critical temperature above which the patterns' list of length $\gamma C$ cannot be retrieved for any $\tilde\varepsilon$. As $\gamma$ increases, the area of nonzero retrieval drops and vanishes at $\gamma = \gamma_0 = 0.145$.
Fig. 2. Critical lines in the $(\tilde\varepsilon^{-1}, \beta^{-1})$-plane plotted for several values of the storage capacity $\gamma$ (from top to bottom $\gamma = 0$, 0.01, 0.05, 0.1). Nonzero retrieval regions are below the corresponding curves.

Critical lines below which nonzero retrieval appears are drawn for certain values of $\tilde\varepsilon$ in the $(\gamma, \beta^{-1})$-plane (Fig. 3). When the embedding strength exceeds $\tilde\varepsilon_c$, a finite area of nonzero retrieval arises near the origin. Figure 3 displays a characteristic change of the shape of the retrieval regions with an increase of $\tilde\varepsilon$. As $\tilde\varepsilon \to \infty$, the slope of the critical curve grows infinitely.
5. Zero-temperature limit.

In the most important case of the absence of dynamical noise, $\beta = \infty$, the distribution density $p$ becomes a delta function,
$$ p(z \mid v) = \delta\!\left( z - \frac{v}{|v|} \right), $$
and $F(m)$ takes the form
$$ F(m) = \left\langle \frac{Bm + f}{|Bm + f|} \right\rangle_f. \qquad (13) $$
Fig. 3. Critical lines in the $(\gamma, \beta^{-1})$-plane pictured for a set of $\tilde\varepsilon$ values (from top to bottom $\tilde\varepsilon = 6.0$, $\tilde\varepsilon_0 = 3.7$, 2.5). Nonzero retrieval regions are below the corresponding curves.

Using the polar coordinates $(r, \phi)$, $Bm + f = r \exp(i\phi)$, we can rewrite (13) as
$$ F(m) = (2\pi)^{-1} \int_0^{\infty} dr^2 \int_{-\pi}^{\pi} d\phi\, \cos\phi\, \exp\!\left( -r^2 + 2Bmr\cos\phi - B^2 m^2 \right), $$
and after integrating over $r$ we get the expression for $F(m)$:
$$ F(m) = (2\pi)^{-1} e^{-B^2 m^2} \int_{-\pi}^{\pi} d\phi \left[ 1 + \pi^{1/2}\, Q\, e^{Q^2} \left( 1 + W(Q) \right) \right] \cos\phi, \qquad (14) $$
where $Q = Bm\cos\phi$ and $W(x) = 2\pi^{-1/2} \int_0^x e^{-t^2}\, dt$.
The critical relation (12) is greatly simplified at zero temperature, giving a critical line of the form
$$ \gamma_c(\tilde\varepsilon) = 2\, \tilde\varepsilon^{-2} \ln(\pi \tilde\varepsilon^2 / 16). $$
The system is thus able to store $p = \gamma_c(\tilde\varepsilon)\, C > 0$ last patterns if the embedding strength exceeds the threshold value $\tilde\varepsilon_c = 4/\sqrt{\pi} = 2.257$. The function $\gamma_c(\tilde\varepsilon)$ reaches its maximum $\gamma_0 = \gamma_c(\tilde\varepsilon_0) = \pi/8e = 0.145$ (the optimal short-term storage capacity) at the optimal embedding strength $\tilde\varepsilon_0 = 4\sqrt{e/\pi} = 3.721$, and $\gamma_c$ vanishes ($p_c = o(C)$) as $\tilde\varepsilon \to \infty$ (see Fig. 1). Evidently, the qualitative picture of memory performance bears a strong resemblance to diluted versions of the Hopfield-Parisi model and similar ones [5]. It is interesting to note that the ratio between the doubled optimal short-term storage capacity and the critical storage capacity of the sparse phasor model by Noest [9] (the coupling $J_{kl}$ defined by (3) has only one degree of freedom, $\phi_{kl}$, instead of two in the phasor model) is exactly $2e^{-1}$, coinciding with the analogous ratio for the strongly diluted version of binary models [5].
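The quoted threshold and optimum follow directly from the closed-form critical line; a quick numerical cross-check (ours, not part of the paper):

```python
import numpy as np

# zero-temperature critical line gamma_c(eps) = 2 eps^-2 ln(pi eps^2 / 16)
gamma_c = lambda eps: (2 / eps**2) * np.log(np.pi * eps**2 / 16)

eps = np.linspace(2.0, 12.0, 200001)
g = gamma_c(eps)

eps_c = 4 / np.sqrt(np.pi)            # threshold: gamma_c vanishes here, ~2.257
eps_opt = float(eps[np.argmax(g)])    # numerically located optimum, ~3.721
gamma_0 = float(g.max())              # optimal short-term capacity, ~0.145
```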
Fixed-point solutions $m^* = F(m^*)$ of the retrieval dynamics at zero temperature have been obtained numerically for varied choices of the storage capacity $\gamma$ and the embedding strength $\tilde\varepsilon$. The retrieval overlap $m^*$ as a function of $\gamma$ is displayed in figure 4. The system exhibits a continuous transition to the zero-retrieval phase when the short-term storage capacity reaches its critical value $\gamma_c(\tilde\varepsilon) = \gamma_c(\tilde\varepsilon;\, \beta = \infty)$. The optimal overlap value is $m^*(\gamma = 0;\, \tilde\varepsilon_0) = 0.889$. If $\tilde\varepsilon$ grows from $\tilde\varepsilon_c$ to $\tilde\varepsilon_0$, the maximal storage capacity $\gamma_c(\tilde\varepsilon;\, \beta = \infty)$ enhances (see also Fig. 1). Otherwise, for high intensities of learning, $\tilde\varepsilon > \tilde\varepsilon_0$, it reduces but, simultaneously, the retrieval quality improves greatly at low storage capacities, so that $m^*(\gamma, \tilde\varepsilon) \to 1$ when $\tilde\varepsilon \to \infty$, $\gamma = O(\tilde\varepsilon^{-2})$. Thus, an increase of the embedding strength leads to an improvement of the recalling precision of the new patterns at the expense of the maximal storage capacity.
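A sketch of how such fixed points can be obtained (our code; the quadrature grid and iteration count are arbitrary choices): evaluate $F(m)$ from (14), with $W$ the error function, and iterate the parallel map (8) until convergence.

```python
import math
import numpy as np

def trap(y, x):
    # simple trapezoidal quadrature
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def F_map(m, eps, gamma, n_phi=4001):
    """Zero-temperature map (14), with B = (eps/2) exp(-gamma eps^2 / 4)
    and W(x) the error function."""
    B = 0.5 * eps * np.exp(-gamma * eps**2 / 4)
    phi = np.linspace(-np.pi, np.pi, n_phi)
    Q = B * m * np.cos(phi)
    W = np.array([math.erf(q) for q in Q])
    integrand = (1 + np.sqrt(np.pi) * Q * np.exp(Q**2) * (1 + W)) * np.cos(phi)
    return np.exp(-(B * m)**2) * trap(integrand, phi) / (2 * np.pi)

def retrieval_overlap(eps, gamma, m=1.0, iters=100):
    """Fixed point m* = F(m*) reached by iterating the parallel map (8)."""
    for _ in range(iters):
        m = F_map(m, eps, gamma)
    return m

m_star = retrieval_overlap(eps=3.721, gamma=0.0)   # text quotes m* ~ 0.889
```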
Fig. 4. Fixed-point solution $m^*$ of the retrieval dynamics (8), (14) versus the storage capacity $\gamma$ for certain values of the embedding strength $\tilde\varepsilon$ at $\beta = \infty$ (from top to bottom $\tilde\varepsilon = 6.0$, $\tilde\varepsilon_0 = 3.7$, 2.5).
The retrieval overlap versus the inverse embedding strength is drawn for selected values of the storage capacity $\gamma$ in figure 5. For $\gamma > 0$, the retrieval overlap becomes zero at sufficiently large $\tilde\varepsilon$, because the progressive increase in the embedding strength produces a more intensive forgetting of the most ancient patterns, so that the set of $p = \gamma C$ patterns cannot be retrieved as a whole.

6. Concluding remarks.

We have presented a model of working memory in a clock neural network which has been solved in the limit of extreme dilution. Our analysis shows that the system possesses a nonzero short-term storage capacity when the embedding strength exceeds some threshold value, and demonstrates a continuous transition to memory loss at a critical surface in the parameter space.
Fig. 5. Retrieval overlap $m^*$ as a function of the inverse embedding strength $\tilde\varepsilon^{-1}$ for several values of the storage capacity $\gamma$ at $\beta = \infty$ (from top to bottom $\gamma = 0$, 0.01, 0.05, 0.1).
It turns out that the qualitative picture of memory performance is essentially the same as was found for binary nets [5]. It is worth mentioning that the proposed learning algorithm seems to acquire the necessary simplicity to be implemented technologically on the basis of coherent-optics techniques [12].
Finally, it should be noted that neural network models using $n$-vector spins as neurons can be developed and treated in close analogy with the present consideration.

References
[1] HOPFIELD J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] PARISI G., J. Phys. A 19 (1986) L617.
[3] VAN HEMMEN J. L., KELLER G. and KUHN R., Europhys. Lett. 5 (1988) 663.
[4] NADAL J. P., TOULOUSE G., MÉZARD M., CHANGEUX J. P. and DEHAENE S., Europhys. Lett. 1 (1986) 535.
[5] DERRIDA B. and NADAL J. P., J. Stat. Phys. 49 (1987) 993.
[6] MÉZARD M., NADAL J. P. and TOULOUSE G., J. Phys. France 47 (1986) 1457.
[7] NADAL J. P., TOULOUSE G., MÉZARD M., CHANGEUX J. P. and DEHAENE S., in Computer Simulation in Brain Science (Cambridge: Cambridge University Press, 1988) p. 221.
[8] WONG K. Y. M., KAHN P. E. and SHERRINGTON D., J. Phys. A 24 (1991) 1119.
[9] NOEST A. J., Europhys. Lett. 6 (1988) 469.
[10] COOK J., J. Phys. A 22 (1989) 2057.
[11] DERRIDA B., GARDNER E. and ZIPPELIUS A., Europhys. Lett. 4 (1987) 167.
[12] ANDERSON D. Z., in Proceedings of the Conference on Neural network models for