HAL Id: jpa-00212435
https://hal.archives-ouvertes.fr/jpa-00212435
Submitted on 1 Jan 1990
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Dynamics of a multi-layered perceptron model : a
rigorous result
A.E. Patrick, V.A. Zagrebnov
To cite this version:
Dynamics
of
amulti-layered
perceptron
model :
arigorous
result
A. E. Patrick
(*)
and V. A.Zagrebnov
(*)
Centre de
Physique Théorique,
Luminy,
13288 MarseilleCedex 9,
France(Reçu
le 24 novembre 1989,accepté
le 26janvier 1990) .
Résumé. 2014 Nous dérivons exactement et
rigoureusement
lessystèmes
d’équations
dynamiques
pour un perceptron multi-couchesproposé
parDomany,
Meir et Kinzel(DMK-model).
Ceséquations
décrivent simultanément l’évolution des fonctions de recouvrement essentielles etrésiduelles.
Abstract. 2014 We derive
exactly
andrigorously
the system ofdynamical equations
for amulti-layered
perceptronproposed by Domany,
Meir and Kinzel(DMK-model).
They
describes both the main and the residualoverlaps
evolution.Classification
Physics
Abstracts 05.20 - 05.90 - 87.301. The DMK-model
[1]
is alayered
feed-forward neural network withoutcouplings
withinlayers.
In this sence it can be considered as ageneralization
of the idea of theperceptron
(multi-layered perceptron),
see e.g.[2].
Thedynamics
of theoriginal
DMK-model,
and of its variousextensions,
have been solved in[3-5]
by
a(hard controlled)
saddlepoint integration
method,
proposed
in[6]
for thestudying
of theparallel dynamics
infully
connected neural networks.The aim of the
present
paper is to derive(exactly
andrigorously)
thesystem
ofdynamics
equations
for DMK-modelexploiting
some main ideas of theprobability theory
and thetheory
ofdynamical
systems
with noise. As a result weget
notonly
rigorous
but even shorter(then
in[3-5])
way to exactsystem
ofdynamics equations.
Below we follow the line of
reasoning
formulated in[7]
for theHopfield
model. This program is focussed on the accurate calculations of stochasticproperties
of the intemal noise in the limit oflarge
neural networks. Thisapproach
have beenimproved
in thesubsequent
paper[8].
There weexplain
theimportance
to considertogether
with the mainoverlap
thecorresponding
stochasticequations
for evolution of the intemal noise(residual overlaps).
This allows to rederive(exactly,
rigorously
and in shorterway)
the Gardner-Derrida-Mottishawequation
[6]
for the mainoverlap
(in
theparallel
dynamics)
for the secondstep
1130
t = 2 and to derive the
corresponding equation
for t = 3. Thisapproach gives
a more clearview on a
possible
behaviour of the mainoverlap
forarbitrary t
and allows topresent
a puredynamical
few-lines derivation of theAmit-Gutfreund-Sompolinsky
formula
[9]
for the mainoverlap
at t = 00.Here we
exploit
thisapproach
to DMK-model where itgives,
besides the well-known exactresult for
dynamics
of the mainoverlap [3-5],
the exactequation
for the evolution of thenoisy
terms
(residual overlaps).
2. We recall that DMK-model can be formulated as follows. Each of the
layer t
( =
1, 2,
...,T )
contains Nbinary
variables(spin configurations) S (t)
={ Si (t )
= ±1 } N=
i
and the set of M fixed realizations
{ {P (t ) } :
= 1 of the sequence of
independent
identically
P =
distributed random variables of the
length
N.Therefore,
each{P (t)
en N (t)
={-
1, 1} N -
theprobability
space with the naturalalgebra
andprobability
I?N {{}
defined as theproduct-measure by
Pr{ ; _ ± 1 }
=1
2.
Then for the zero
temperature
(0=0)
evolution of the DMK-modelcorresponds
toupdating
of thespin configuration
at the(t
+ 1)-th
layer according
to the effective fieldsby
the lawHere fields
(1)
are definedby
thespin configuration
on theprevious
t-thlayer
andby
the(directed)
couplings
between this twolayers,
which have Hebbian form[1]
For non-zero
temperature
(0 :o 0)
evolution of the DMK-model is definedby
stochasticequation
Here i.i.d.r.v.
{;(t)}N-1
reproduce
a heat-bathtemperature
noise ifSo,
in contrast to thequenched
random vectors(key
patterns
at eachlayer)
the random variables{ (t ) ) i 1
areunquenched.
We have to average over this termal noisevariables to consider
dynamics
in the presence of the heat-bath.3. The
problem
of the neural networksdynamics
isusually specified
as follows. Let initialconfiguration
S(t
= 1 )
has amacroscopic overlap
with one of thekey
patterns,
e.g., withlimit N --+ oo. For the DMK-model this means that for a fixed
(quenched)
randomsystem
ofuncorrelated
key
pattems
n
one chooses initial
configuration
in such a way thathas non-zero limit
m q( t
=1)
for N --+ oo(main
overlap)
and forp ( # q )
=1, 2,
..., M even if M =
a N (a - lim).
From
equations (1), (3)
and(6)
onegets
where we introduce residual
overlaps
In contrast to
mKr(t),
the residualoverlaps
(8)
do not converge to any limits for thequenched
system
ofkey
patterns.
Butthey
have limits as sequences of random variables if we considerpatterns
as random vectors in the space(f2,
u,P).
Suppose
that at the moment t a - limm g (t)
=M q(t)
almostsurely (a. s.)
withrespect
tothe
probability
P.
Thenby
the individualergodic (Birkhoff-Khinchin)
theorem([10],
V,
Sect.3)
weget
that :(1) a -
lim for the arithmetic mean in therightihand
side of(7)
existsfor almost all
(by
the measureP)
quenched
systems
ofkey
patterns ;
(2)
this limit is a.s.independent
of the choice of the fixedsystem
ofpatterns
(self-averaging)
and(3)
it isequal
toexpectation
Here
noisy
termq (t + 1 ) = -q i (t + 1 )
andis the
stationary
(i = 1, 2, ... )
ergodic
sequence of random variables. The convergence in(10)
occures in the sense of distribution.
Therefore,
in the a - limby
ergodic
theorem one reduces the initialproblem
forquenched
key
patterns
to calculation theexpectation
withrespect
to distribution of the limit randomvariable
(10).
Hence,
to constructdynamics
for the mainoverlap
we have to considertogether
withequation
(7)
the evolution of the residualoverlaps
(8). Using equations (1), (3)
and(8)
1132
This is a stochastic
equation
(recurrence relation)
for random variablesgenerated by
the different realizations of thekey
pattems
and the noise In the a - lim relations(7)
and(11)
form asystem
ofcoupled
equations
whichdescribes
dynamics
of the DMK-model.4.
According
to above observation atfirst
we have to derive in a - lim the stochasticrecursion relations for evolution of the residual
overlaps
{rP(t)} p+q (8)
and then use thedistribution of the
noisy
term(10)
for calculation the recursion relation for the mainoverlap
mq(t ) (9).
We do thisby
induction in t.For t = 1 the initial
configuration
S(t
=1)
is chosen in such a way thatFor
unquenched
key
patterns
this means thatS(t = 1 )
andq (t = 1 )
are correlated :Pr
(Si (t
=1)
=U) =
(1
+ mq(1
=1),
butfor p #
qp t
= 1 Si(1
=1)}
°°
1 is ase-Pr(,(=l)==
+ ( )),
but for Pq{(=1)
)
( )Îm
is asé-quence of i.i.d.r.v. with Pr
{U(t
=1 ) Si(t
= 1 ) = ± 1 )
= 1.
Consequently, by
the central2
limit theorem
(CLT)
residualoverlaps (as
randomvariables)
converge in the sensé ofdistribution to the Gaussian random variable :
with mean 0 and variance 1.
Moreover,
for the differentp (# q)
they
areindependent,
i.e.,
{rP(t
=1)};= 1
is the sequence of i.i.d. Gaussian r.v. Note that for t = 2 random variables{?(t)}
areindependent
of {rÑ(t = 1 )}
(their
arguments
belong
to differentlayers
!).
Thenby
thesymmetry
ofp(t = 2 ) _ ± 1
and of the distribution forN (0, 1 )
weget
that(§f(t
= 2) rP(t
=1)}: = 1
isagain
a sequence of i.i.d. Gaussian r.v. This means thatby
theCLT random variables
(deviation
ofrÑ (t
=1 )
from
rP(t
=1 )
can be controlledby
theBerry-Esseen
theorem([10],
III,
Sect.6),
i.e.,
we useCLT in the scheme of
series)
converge in distribution :to the Gaussian random variable with the variance a.
Finally,
taking
into accountequations
(13), (14),
symmetry
of N(0, 1 ), independence
of random variablesVq
N(t
=2)} ,
Using
distribution(15)
andergodic
theorem onegets
(see
Eqs.
(7)
and(9))
5. To go to the next
step t
= 3 for themq(t)
we have first to derive theprobability
distribution for
rP(t = 2 ),
see(9), (10).
To this end we calculate the a - lim for stochasticrecurrence relation
(11),
i.e.,
stochastic characteristics of random variables{rp(t
=2)}‘°
p=conditioned
by arbitrarily
fixed realization{rP(t
= 1)};=1.
Now
application
of the CLT to the sequence of randomvariables
N (t = 2 ) (see (11)
and(14))
is not verystraightforward.
For fixed Gaussianreafizations (r§ (t
= 1 )}p
(see (13))
onegets
that{çJ(t = 2)
r(t = 1)}t;&q is
the sequence ofindependent
but notidentically
distributed random variables : ingeneral
rpN (t
=1 )
rpN (t
=1 )
for p :0 p’.
Therefore,
onehas either to check for this sequence the
Lindeberg
conditions or to useLyapunov’s
method ofcharacteristic functions for the direct
proof ([10],
III,
Sect.3).
From Pr
{ef(t
=2)
r’(t
=1)
= :L-r’(t
=1 )}
= 1/2
itfollows
that for the random variableone
gets
ESM
=0,
Var .So,
thecharac-teristic function
fM(T )
for the random variableSM/ vVar
SM
has the formNow,
by
theergodic
theorem(or
the law of thelarge
numbers)
for the almost all realizationswe
get
that a - lim , seeequation
(13)
and remark after it. This means thatby
definition(8)
andby
the iteratedlogarithm
law([10],
IV,
Sect. 4)
onegets
1134
i.e.,
SM/ y’Var SM -+
N(0,1)
in the a - lim in distribution. Then as acorollary
(cf.
(11)),
weget
Using
theindependence
{J(t
=2)}
of random variables(20)
andequations
(12), (20)
wecan calculate limit distribution for the random variables
(cf.
(11))
One gets
From the
independence
gf(t
=2)} P :ri: q
ofw-,§(t
=2)}
and thesymmetry
of distribution* q j,N
)}
N(22)
we obtain that distribution of itsproducts
(cf.
(11» {{j(t = 2)
wf.’(t = 2)}:=1
coincides with
(22) :
Therefore,
random variables(for a
afixed
realization {r(t
= 1)}t",’q!)
form Li.d.r.v. Now we canapply
CLT(again
in thescheme of series
([10],
III,
Sect.4))
to the sequence{signqp N
controlling
the rate ofconvergence in
(22)
by
theBerry-Esseen
theorem([10],
III,
Sect.6) :
Let us stress that it is
key
patterns
from the netlayer t
= 2 thatguarantee
theindependence (§f(t
=2)
wf,’(t
=2)}=1
of realization{rpN,(t
=1)}P
andfinally
theinde-j = 1 P
pendence
N (0, 1 ),
in(24),
ofrP(t
=1 ).
For theHopfield
modelthey
aredependent :
this créâtes main difficulties in dérivation ofdynamical équations
for this model[8].
Using equations
(22)
and(23)
we can estimateexpectation
in(24)
and(25) :
So,
using
equations (24), (26)
weget
the recurrence relation for residualoverlaps
(cf. (11)) :
Here,
in accordance to that we have stressedabove,
the random Gaussian variableN (0, 1 )
isindependent
ofrP (t
=1 ).
So,
onegets
for VarrP(t )
=D (t )
thefollowing
relations(see (13)
and(27)) :
6. From the
explicit
formula for’n 2 N
and the limit distribution(22)
it follows that random variables{Tlf}
p are
uncorrelated forp =1= p’:
E (e,» (t
=2)
eP’(t
=2))
=0,
and variables{rP(t
=1)} P
are evenindependent
forp :0
p’.
Therefore,
randomvariables {rP(t
=2)}
p are
uncorrelated. From
(11)
it follows that thisproperty
isreproducible.
If{rP(t)} P
areuncorrelated at the moment t, then
by
abovearguments
the sameproperty
has the sequenceTherefore,
we can use induction in t to dérive the recurrence relations for the main andresidual
overlaps
forarbitrary
t. The firststep
we haveproved
in 4 and 5.Now let
{rP(t)} p
be uncorrelatedidentically
distributed Gaussian random variables withzero mean and
Var rP(t)
=D(t).
From thesymmetry
and theindependence
eP(t
+1)
ofM j
rp(t )
weget
that)
isagain
Gaussian variable(as
a sum ofGaussians)
which in the a - lim is
equal
toN(O, D(t)).
Then,
taking
into accountrP(t =
a - lim r (t)
andequations (9), (10),
onegets
1136
for the almost all realizations. Let realization
{rP(t)}
P be
fixed.Then,
using
(29)
and the line ofreasoning
of the section5,
weget
(see (11)
and(20)-(22)) :
This means that we can copy the
proof
in the section 5 toget
(cf. (27)) :
where
again
the random variablesN(0, 1)
andrP(t)
areindependent,
i.e.,
This finishes the
proof
for the zerotemperature 0
= 0.For 0 #
0 one has to use evolution definedby
stochasticequation (4).
As far as theheat-bath
temperature
noise(5)
isindependent
of intemal evolution we havesimply
to averageequations (28)
and(30)
over this noise. As a result onegets
where random variables
N (0, 1 )
andrP(t), p :o
q, areuncorrelated,
rP(t
=1 ) = N (0, 1 )
andD(t)
= VarrP(t),
i.e.,
7. The
equations
(28), (32)
and(33), (35)
have been derived at the first time in[3, 4]
for thecase of linear Hebbian
couplings
(3)
and for the various extensions of the basic DMK-model in reference[5].
There one can find also a detailedanalysis
of the fixedpoint
equation
and the critical behavior of the model.Function
D (t )
in the above papers in anauxiliary
saddlepoint
orderparameter.
Theadvantage
of our methodis,
inaddition,
indecoding
of themeaning
of theparameter
D (t)
as the variance of the residualoverlaps responsible
for the intrinsic noise indynamics
of the mainoverlap.
Note added. -
After this paper has been finished
(Dubna,
July
1989)
we become aware of therecent work
by
the DMK-model inventors[11]
where very similar(except
of theequation
for the residualoverlaps
and the
rigorous proofs)
ideas wereproposed,
see also[12].
Acknowledgments.
and useful discussions. He also thanks R. Kühn and J.
Slawny
forhelpful
remarks andcomments. V. A. Z. thanks CPT-CNRS and the Faculté des Sciences de
Luminy,
Universitéd’Aix-Marseille II for kind
hospitality
and financialsupport.
The authors aregrateful
toAntoinette Sueur for efficient
typing
of themanuscript.
References
[1]
DOMANY E., MEIR R. and KINZEL W.,Europhys.
Lett. 2(1986)
175.[2]
MINSKY M. and PAPERT S.,Perceptrons
(Cambridge,
Mass. : MITPress)
1969.[3]
MEIR R. and DOMANY E.,Phys.
Rev. Lett. 59(1987)
359 ;Europhys.
Lett. 4(1987)
645.[4]
MEIR R. and DOMANY E.,Phys.
Rev. A 37(1988)
608.[5]
MEIR R., J.Phys.
France 49(1988)
201.[6]
GARDNER E., DERRIDA B. and MOTTISHAW P., J.Phys.
France 48(1987)
741.[7]
ZAGREBNOV V. A. and CHVYROV A. S., Sov.Phys.
JETP 68(1989)
153.[8]
ZAGREBNOV V. A. and PATRICK A. E., ProbabilisticApproach
to ParallelDynamics
forHopfield
Model,