Dynamics of a multi-layered perceptron model : a rigorous result

(1)

HAL Id: jpa-00212435

https://hal.archives-ouvertes.fr/jpa-00212435

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Dynamics of a multi-layered perceptron model : a

rigorous result

A.E. Patrick, V.A. Zagrebnov

To cite this version:

(2)

Dynamics

of

a

_{multi-layered}

_perceptron

model :

a

rigorous

result

A. E. Patrick

_(*)

and V. A.

_Zagrebnov

_(*)

Centre de

_{Physique Théorique,}

_Luminy,

13288 Marseille

_{Cedex 9,}

France

(Reçu

le 24 novembre 1989,

accepté

le 26

_{janvier 1990) .}

Résumé. 2014 Nous dérivons exactement et

rigoureusement

les

_systèmes

_{d’équations}

_dynamiques

pour un _perceptronmulti-couches

_proposé

par

Domany,

Meir et Kinzel

_(DMK-model).

Ces

équations

décrivent simultanément l’évolution des fonctions de recouvrement essentielles et

résiduelles.

Abstract. 2014 We derive

exactly

and

_rigorously

the _systemof

_{dynamical equations}

for a

multi-layered

perceptron

proposed by Domany,

Meir and Kinzel

_(DMK-model).

_They

describes both the main and the residual

_overlaps

evolution.

Classification

Physics

Abstracts 05.20 - 05.90 - 87.30

1. The DMK-model

_[1]

is a

_layered

feed-forward neural network without

_couplings

within

layers.

In this sence it can be considered as a

_{generalization}

of the idea of the

_perceptron

(multi-layered perceptron),

see _e.g.

_[2].

The

_dynamics

of the

_original

_DMK-model,

and of its various

extensions,

have been solved in

_[3-5]

_by

a

_{(hard controlled)}

saddle

_{point integration}

method,

proposed

in

_[6]

for the

_studying

of the

_{parallel dynamics}

in

_fully

connected neural networks.

The aim of the

_present

_{paper is}to derive

_(exactly

and

_rigorously)

the

_system

of

_dynamics

equations

for DMK-model

_exploiting

some main ideas of the

_{probability theory}

and the

theory

of

_dynamical

_systems

with noise. As a result we

_get

not

_only

_rigorous

but even shorter

(then

in

_[3-5])

_wayto exact

_system

of

_{dynamics equations.}

Below we follow the line of

_reasoning

formulated in

_[7]

for the

_Hopfield

model. This program is focussed on the accurate calculations of stochastic

_properties

of the intemal noise in the limit of

_large

neural networks. This

_approach

have been

_improved

in the

_subsequent

paper

[8].

There we

_explain

the

_importance

to consider

_together

with the main

_overlap

the

corresponding

stochastic

_equations

for evolution of the intemal noise

_{(residual overlaps).}

This allows to rederive

_(exactly,

_rigorously

and in shorter

_way)

the Gardner-Derrida-Mottishaw

equation

[6]

for the main

_overlap

_(in

the

_parallel

_dynamics)

for the second

step

(3)

1130

t = 2 and _toderive the

corresponding equation

for t = 3. This

approach gives

_{a more}clear

view on a

_possible

behaviour of the main

_overlap

for

_{arbitrary t}

and allows to

_present

a _pure

dynamical

few-lines derivation of the

_{Amit-Gutfreund-Sompolinsky}

_formula

_[9]

for the main

overlap

at t = 00.

Here we

_exploit

this

_approach

to DMK-model where it

_gives,

besides the well-known exact

result for

_dynamics

of the main

_{overlap [3-5],}

the exact

_equation

for the evolution of the

_noisy

terms

_{(residual overlaps).}

2. We recall that DMK-model can be formulated as follows. Each of the

_{layer t}

( =

1, 2,

...,

T )

contains N

binary

variables

(spin configurations) S (t)

=

{ Si (t )

= ±

1 } N=

i

and the set of M fixed realizations

_{{ {P (t ) } :}

= 1 of the sequence of

independent

identically

P =

distributed random variables of the

_length

N.

Therefore,

each

_{{P (t)}

e

_{n N (t)}

=

{-

1, 1} N -

the

probability

space with the natural

algebra

and

_probability

_{I?N {{}}

defined as the

_{product-measure by}

Pr

{ ; _ ± 1 }

=

1

2.

Then for the zero

_temperature

(0=0)

evolution of the DMK-model

_corresponds

to

updating

of the

_{spin configuration}

at the

_(t

+ 1

)-th

_{layer according}

to the effective fields

by

the law

Here fields

₍₁₎

are defined

_by

the

_{spin configuration}

on the

_previous

t-th

_layer

and

_by

the

(directed)

couplings

between this two

_layers,

which have Hebbian form

_[1]

For non-zero

_temperature

(0 :o 0)

evolution of the DMK-model is defined

_by

stochastic

equation

Here i.i.d.r.v.

_{;(t)}N-1

_reproduce

a heat-bath

_temperature

noise if

So,

in contrast to the

_quenched

random vectors

_(key

_patterns

at each

_layer)

the random variables

_{{ (t ) ) i 1}

are

_unquenched.

We have to _averageover this termal noise

variables to consider

_dynamics

_{in the presence of the heat-bath.}

3. The

_problem

of the neural networks

_dynamics

is

_{usually specified}

as follows. Let initial

configuration

S(t

= 1 )

has a

_{macroscopic overlap}

with one of the

_key

_patterns,

_e.g.,with

(4)

limit N --+ oo. For the DMK-model this means that for a fixed

_(quenched)

random

_system

of

uncorrelated

_key

_pattems

n

one chooses initial

configuration

in such a _{way that}

has non-zero limit

_{m q( t}

=

1)

for N --+ oo

(main

overlap)

and for

p ( # q )

=

1, 2,

..., M even if M =

a N (a - lim).

From

_{equations (1), (3)}

and

₍₆₎

one

_gets

where we introduce residual

overlaps

In contrast to

_mKr(t),

the residual

_overlaps

₍₈₎

do not _convergeto _anylimits for the

_quenched

system

of

_key

_patterns.

But

_they

have limits as _sequencesof random variables if we consider

patterns

as random vectors in the _space

_(f2,

u,

P).

Suppose

that at the moment t a - lim

_{m g (t)}

=

M q(t)

almost

surely (a. s.)

with

_respect

to

the

_probability

P.

Then

_by

the individual

_{ergodic (Birkhoff-Khinchin)}

theorem

_([10],

_V,

Sect.

₃₎

we

_get

that :

_{(1) a -}

lim for the arithmetic mean in the

rightihand

side of

(7)

exists

for almost all

_(by

the measure

P)

_quenched

_systems

of

_key

_{patterns ;}

(2)

this limit is a.s.

independent

of the choice of the fixed

_system

of

_patterns

_{(self-averaging)}

and

(3)

it is

_equal

to

expectation

Here

_noisy

term

_{q (t + 1 ) = -q i (t + 1 )}

and

is the

_stationary

_{(i = 1, 2, ... )}

_ergodic

_{sequence of random variables.}_{The convergence in}

₍₁₀₎

occures in the sense of distribution.

Therefore,

in the a - lim

_by

_ergodic

theorem one reduces the initial

_problem

for

_quenched

key

patterns

to calculation the

_expectation

with

_respect

to distribution of the limit random

variable

_(10).

_Hence,

to construct

_dynamics

for the main

_overlap

we have to consider

_together

with

_equation

₍₇₎

the evolution of the residual

_overlaps

_{(8). Using equations (1), (3)}

and

₍₈₎

(5)

1132

This is a stochastic

_equation

_{(recurrence relation)}

for random variables

generated by

the different realizations of the

_key

pattems

and the noise In the a - lim relations

₍₇₎

and

₍₁₁₎

form a

_system

of

_coupled

_equations

which

describes

_dynamics

of the DMK-model.

4.

_According

to above observation at

_first

we have to derive in a - lim the stochastic

recursion relations for evolution of the residual

_overlaps

{rP(t)} p+q (8)

and then use the

distribution of the

_noisy

term

₍₁₀₎

for calculation the recursion relation for the main

_overlap

mq(t ) (9).

We do this

_by

induction in t.

For t = 1 the initial

configuration

S(t

=

1)

is chosen in such a _{way that}

For

_unquenched

_key

_patterns

this means that

_{S(t = 1 )}

and

_{q (t = 1 )}

are correlated :

Pr

_{(Si (t}

=

1)

=

U) =

(1

+ mq(1

=

1),

but

for p #

_q

p t

= 1 Si(1

=

1)}

°°

1 is a

se-Pr(,(=l)==

+ ( )),

but for Pq

_{(=1)

)

_{( )Îm}

is a

sé-quence of i.i.d.r.v. with Pr

{U(t

=

1 ) Si(t

= 1 ) = ± 1 )

= 1.

Consequently, by

the central

2

limit theorem

_(CLT)

residual

_{overlaps (as}

random

_variables)

_{converge in the}sensé of

distribution to the Gaussian random variable :

with mean 0 and variance 1.

Moreover,

for the different

_{p (# q)}

_they

are

_independent,

_i.e.,

{rP(t

=

1)};= 1

is the _sequenceof i.i.d. Gaussian r.v. Note that for t = 2 random variables

{?(t)}

are

_independent

of {rÑ(t = 1 )}

_(their

_arguments

_belong

to different

_layers

_!).

Then

by

the

_symmetry

of

_{p(t = 2 ) _ ± 1}

and of the distribution for

_{N (0, 1 )}

we

_get

that

(§f(t

= 2) rP(t

=

1)}: = 1

is

_again

a _{sequence of}i.i.d. Gaussian r.v. This means that

_by

the

CLT random variables

_(deviation

of

_{rÑ (t}

=

1 )

from

_rP(t

=

1 )

_canbe controlled

by

the

Berry-Esseen

theorem

([10],

III,

Sect.

6),

i.e.,

we use

CLT in the scheme of

_series)

_{converge in}distribution :

to the Gaussian random variable with the variance a.

Finally,

_taking

into account

equations

(13), (14),

symmetry

of N

_{(0, 1 ), independence}

of random variables

_Vq

_N

_(t

=

2)} ,

(6)

Using

distribution

₍₁₅₎

and

_ergodic

theorem one

_gets

(see

_Eqs.

(7)

and

(9))

5. _{To go}to the next

_{step t}

= 3 for the

mq(t)

_wehave first to derive the

probability

distribution for

_{rP(t = 2 ),}

see

(9), (10).

To this end we calculate the a - lim for stochastic

recurrence relation

_(11),

i.e.,

stochastic characteristics of random variables

_{rp(t

=

2)}‘°

p=

conditioned

_{by arbitrarily}

fixed realization

_{rP(t

_{= 1)};=1.}

Now

_application

of the CLT to _{the sequence of random}

_variables

_{N (t = 2 ) (see (11)}

and

(14))

is not _very

_{straightforward.}

For fixed Gaussian

_{reafizations (r§ (t}

_{= 1 )}p}

_{(see (13))}

one

gets

that

_{{çJ(t = 2)}

_{r(t = 1)}t;&q is}

_{the sequence of}

_independent

but not

_identically

distributed random variables : in

_general

_{rpN (t}

=

1 )

rpN (t

=

1 )

for p :0 p’.

Therefore,

one

has either to check for this _sequencethe

_Lindeberg

conditions or to use

_Lyapunov’s

method of

characteristic functions for the direct

_{proof ([10],}

III,

Sect.

_3).

From Pr

{ef(t

=

2)

r’(t

=

1)

_{= :L-}

r’(t

=

1 )}

= 1/2

it

follows

that for the random variable

one

_gets

_ESM

=

0,

Var .

So,

the

charac-teristic function

_{fM(T )}

for the random variable

_{SM/ vVar}

_SM

has the form

Now,

by

the

_ergodic

theorem

_(or

the law of the

_large

_numbers)

for the almost all realizations

we

_get

that a - lim , see

equation

(13)

and remark after it. This means that

_by

definition

₍₈₎

and

_by

the iterated

_logarithm

law

([10],

IV,

Sect. 4)

one

_gets

(7)

1134

i.e.,

SM/ y’Var SM -+

N

_(0,1)

in the a - lim in distribution. Then as a

_corollary

(cf.

(11)),

we

get

Using

the

_independence

_{J(t

=

2)}

of random variables

₍₂₀₎

and

_equations

_{(12), (20)}

we

can calculate limit distribution for the random variables

(cf.

(11))

One gets

From the

_independence

_gf(t

=

2)} P :ri: q

of

_w-,§(t

=

2)}

and the

_symmetry

of distribution

* q j,N

)}

N

(22)

we obtain that distribution of its

_products

_(cf.

_{(11» {{j(t = 2)}

wf.’(t = 2)}:=1

coincides with

_{(22) :}

Therefore,

random variables

_{(for a}

a

fixed

_{realization {r(t}

_{= 1)}t",’q!)}

form Li.d.r.v. Now we can

_apply

CLT

(again

in the

scheme of series

_([10],

III,

Sect.

₄₎₎

to _{the sequence}

{signqp N

_controlling

the rate of

convergence in

(22)

by

the

Berry-Esseen

theorem

([10],

III,

Sect.

6) :

Let us stress that it is

_key

_patterns

from the net

_{layer t}

= 2 that

_guarantee

the

independence (§f(t

=

2)

wf,’(t

=

2)}=1

of realization

_{rpN,(t

=

1)}P

and

_finally

the

inde-j = 1 _P

pendence

N (0, 1 ),

in

_(24),

of

_rP(t

=

1 ).

For the

Hopfield

model

they

are

dependent :

this créâtes main difficulties in dérivation of

_{dynamical équations}

for this model

_[8].

(8)

Using equations

(22)

and

₍₂₃₎

we can estimate

expectation

in

(24)

and

(25) :

So,

using

equations (24), (26)

we

_get

the recurrence relation for residual

_overlaps

_{(cf. (11)) :}

Here,

in accordance to that we have stressed

above,

the random Gaussian variable

N (0, 1 )

is

_independent

of

_{rP (t}

=

1 ).

So,

_one

_gets

for Var

rP(t )

=

D (t )

the

following

relations

(see (13)

and

_{(27)) :}

6. From the

_explicit

formula for

_{’n 2 N}

and the limit distribution

₍₂₂₎

it follows that random variables

_{Tlf}

p are

uncorrelated for

p =1= p’:

E (e,» (t

=

2)

eP’(t

=

2))

=

0,

and variables

{rP(t

=

1)} P

are even

_independent

for

_{p :0}

_p’.

_Therefore,

random

variables {rP(t

=

2)}

p are

uncorrelated. From

₍₁₁₎

it follows that this

_property

is

_{reproducible.}

If

{rP(t)} P

are

uncorrelated at the moment _t,then

_by

above

_arguments

the same

_property

_{has the sequence}

Therefore,

we can use induction in t to dérive the recurrence relations for the main and

residual

_overlaps

for

_arbitrary

t. The first

_step

we have

_proved

in 4 and 5.

Now let

_{{rP(t)} p}

be uncorrelated

_identically

distributed Gaussian random variables with

zero mean and

Var rP(t)

=

D(t).

From the

_symmetry

and the

independence

eP(t

+

₁₎

of

M j

rp(t )

we

_get

that

)

is

_again

Gaussian variable

_(as

a sum of

_Gaussians)

which in the a - lim is

_equal

to

N(O, D(t)).

_Then,

_taking

into account

rP(t =

a - lim r (t)

and

_{equations (9), (10),}

one

_gets

(9)

1136

for the almost all realizations. Let realization

_{rP(t)}

_{P be}

fixed.

_Then,

_using

₍₂₉₎

and the line of

_reasoning

of the section

5,

we

_get

_{(see (11)}

and

_{(20)-(22)) :}

This means that we can _{copy the}

_proof

in the section 5 to

_get

_{(cf. (27)) :}

where

_again

the random variables

_{N(0, 1)}

and

_rP(t)

are

_independent,

_i.e.,

This finishes the

_proof

for the zero

_{temperature 0}

= 0.

For 0 #

0 one has to use evolution defined

_by

stochastic

_{equation (4).}

As far as the

heat-bath

_temperature

noise

₍₅₎

is

_independent

of intemal evolution we have

_simply

to _average

equations (28)

and

₍₃₀₎

over this noise. As a result one

_gets

where random variables

_{N (0, 1 )}

and

_{rP(t), p :o}

_q,are

_{uncorrelated,}

rP(t

=

1 ) = N (0, 1 )

and

D(t)

= Var

rP(t),

i.e.,

7. The

_equations

_{(28), (32)}

and

_{(33), (35)}

have been derived at the first time in

_{[3, 4]}

for the

case of linear Hebbian

_couplings

₍₃₎

and for the various extensions of the basic DMK-model in reference

_[5].

There one can find also a detailed

_analysis

of the fixed

_point

_equation

and the critical behavior of the model.

Function

_{D (t )}

in _{the above papers in}an

_auxiliary

saddle

_point

order

_parameter.

The

advantage

of our method

_is,

in

addition,

in

_decoding

of the

_meaning

of the

_parameter

D (t)

as the variance of the residual

_{overlaps responsible}

for the intrinsic noise in

_dynamics

of the main

overlap.

Note added. -

After this paper has been finished

_(Dubna,

_July

₁₉₈₉₎

we become aware of the

recent work

_by

the DMK-model inventors

_[11]

_{where very similar}

_(except

of the

_equation

for the residual

_overlaps

and the

_{rigorous proofs)}

ideas were

_proposed,

_[12].

Acknowledgments.

(10)

and useful discussions. He also thanks R. Kühn and J.

_Slawny

for

_helpful

remarks and

comments. V. A. Z. thanks CPT-CNRS and the Faculté des Sciences de

_Luminy,

Université

d’Aix-Marseille II for kind

_hospitality

and financial

_support.

The authors are

_grateful

to

Antoinette Sueur for efficient

_typing

of the

_manuscript.

References

[1]

DOMANY E., MEIR R. and KINZEL _W.,

_Europhys.

Lett. 2

₍₁₉₈₆₎

175.

[2]

MINSKY M. and PAPERT S.,

Perceptrons

(Cambridge,

Mass. : MIT

Press)

1969.

[3]

MEIR R. and DOMANY E.,

Phys.

Rev. Lett. 59

₍₁₉₈₇₎

359 ;

Europhys.

Lett. 4

₍₁₉₈₇₎

645.

[4]

MEIR R. and DOMANY E.,

Phys.

Rev. A 37

(1988)

608.

[5]

MEIR R., J.

_Phys.

France 49

₍₁₉₈₈₎

201.

[6]

GARDNER E., DERRIDA B. and MOTTISHAW P., J.

_Phys.

France 48

₍₁₉₈₇₎

741.

[7]

ZAGREBNOV V. A. and CHVYROV A. S., Sov.

_Phys.

JETP 68

₍₁₉₈₉₎

153.

[8]

ZAGREBNOV V. A. and PATRICK A. E., Probabilistic

_Approach

to Parallel

_Dynamics

for

_Hopfield

Model,

JINR-Dubna, Preprint

1989

_{(in Russian).}

[9]

AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Ann.

_Phys.

NY 173

₍₁₉₈₇₎

30.

[10]

SHIRYAYEV A. N.,

Probability

(New

York:

Springer-Verlag)

1984.

[11]

DOMANY E., MEIR R. and KINZEL W., J.

_Phys.

A : Math. Gen. 22

₍₁₉₈₉₎

2081.