Ronny Meir, Extensions of a solvable feed forward neural network. Journal de Physique, 1988, 49 (2), pp. 201-213. doi:10.1051/jphys:01988004902020100. HAL: jpa-00210685.

Extensions of a solvable feed forward neural network

Ronny Meir

Department of Electronics, Weizmann Institute of Science, Rehovot 76100, Israel

(Received 9 September 1987, accepted 21 October 1987)

Résumé. - I extend the class of neural-network models with unidirectional connections discussed in a previous publication. I propose three main modifications: a) weighted learning, b) learning of correlated patterns and c) the effect of synaptic noise. Each of the models studied is solved exactly and layer-to-layer recursion relations are obtained.

Abstract. - I extend the class of exactly solvable feed-forward neural networks discussed in a previous publication. Three basic modifications of the original model are proposed: a) learning with weights, b) learning biased patterns and c) the effect of static synaptic noise. Each of the models studied is solved exactly and layer-to-layer recursion relations are obtained.

Classification

Physics Abstracts: 05.20 - 05.40 - 75.10H - 87.30

1. Introduction.

The original Little [1] and Hopfield [2] models of neural networks have been much extended over the past few years, following the work of Amit et al. [3]. These extensions have gone in various directions. The first type of extension was a modification of the learning rules to incorporate effects such as forgetting [4, 5], the storage of correlated patterns [6-9] and more. These models still followed the original Hopfield paradigm in the sense that they consist of a network of symmetrically connected binary variables (spins) with 2-spin interactions of infinite-range type. The method of solution in these cases (when an exact solution exists) is the replica method, following the original work of Amit et al. [3].

Another class of models eliminates the restriction of symmetric bonds [10, 11] inherent to the Hopfield model and the extensions discussed above. Technically, once the bonds are made asymmetric the model is no longer Hamiltonian and the replica method cannot be used. Recently, Derrida and co-workers [12, 13] have shown that one can solve exactly the dynamics of a class of heavily diluted asymmetric neural networks. It turns out that the dilution and asymmetry make the model rather easily soluble.

Another extension we consider is that of layered architectures. This type of model, on which we will focus in what follows, has been studied extensively by computer scientists over the past few decades. The origin of this class of models can be traced back to the idea of the perceptron introduced by Rosenblatt [14] and studied in detail by Minsky and Papert [15], who demonstrated the limits of the single-layer perceptron. In the last few years much work has been done in generalizing the original ideas of Rosenblatt to multi-layered systems. The main feature of this class which distinguishes it from the previous classes is the existence of « hidden units ». These systems usually [16] consist of an input unit, an output unit and intermediate « hidden » units that do the processing. Contact with the external world is made only via the input and output units. No external constraints are placed on the hidden units, and they are used to construct good « internal representations » of the environment. Recently, Rumelhart et al. [17] have found an algorithm called « back propagation » which solves many of the problems encountered in the earlier perceptron models. Multi-layered models with couplings between and within layers were introduced into the physics literature by Huberman et al. [18]. Later, Domany et al. [19] introduced layered feed-forward networks with no couplings within layers.

The dynamics of a model feed-forward neural network have recently been solved by Meir and Domany [20, 21] (to be referred to as MD). This model will be briefly described in the next section.

Article published online by EDP Sciences and available at http://dx.doi.org/10.1051/jphys:01988004902020100

This paper generalizes our previous work in various directions: 1) incorporation of different learning schemes, namely the weighted learning of Nadal et al. [4]; 2) learning biased patterns [7], i.e. patterns whose level of activity differs from 50 %; 3) the effect of static noise in the synapses. The paper is organized as follows. In section 2 I give a detailed description of the network, its architecture and operation, and a detailed description of the extensions considered. The exact analytic solution of the various extensions to the original MD model is given in section 3 in the form of layer-to-layer recursion relations, while section 4 contains an analysis of the results. Section 5 summarizes our findings.

2. Definition of the model.

The original model we studied is the following. Consider L layers; each contains N cells (spins), with a binary variable S_i^l = ±1 associated with cell i of layer l. Each cell is connected to all cells of the neighbouring layers. The bonds are, however, unidirectional: the state of layer l+1 is determined by the state (at the previous time step) of layer l according to a probabilistic rule. The dynamic process is one which sets the layers sequentially: the input corresponds to setting the first layer in an initial state, S^1. At the next time step the second layer is set in state S^2, and so on. The probability that the i-th spin in the (l+1)-th layer has the value S_i^{l+1}, given that on the previous layer l the cells are in state S^l, is taken to be

P(S_i^{l+1} | S^l) = exp(β S_i^{l+1} h_i^{l+1}) / [2 cosh(β h_i^{l+1})] ,    (1)

where

h_i^{l+1} = Σ_j J_ij^l S_j^l    (2)

is the field produced by the spins of layer l at site i of layer l+1. The parameter β = 1/T governs the stochasticity of the dynamics, which is deterministic for T → 0 (or β → ∞) and becomes more stochastic as T increases. The couplings or bonds J_ij^l are chosen according to some prescription, which we took in our original solution [20, 21] to be of the outer-product [1, 2] type,

J_ij^l = (1/N) Σ_{ν=1}^{p_s} ξ_i^{l+1,ν} ξ_j^{l,ν} ,    (3)

where the ξ_i^{l,ν}, ν = 1, ..., p_s, are the stored key patterns.
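The dynamics defined by equations (1)-(3) are straightforward to simulate directly. The short Python sketch below (not from the original paper; all sizes and names are illustrative) propagates a noisy key pattern through one layer, using the outer-product couplings of Eq. (3) and the Glauber-type form assumed in Eq. (1).

```python
import numpy as np

# Minimal sketch of one stochastic layer-to-layer update of the MD network,
# assuming the standard Glauber form of Eq. (1) and the couplings of Eq. (3).
rng = np.random.default_rng(0)

N, p, beta = 500, 50, 20.0                       # cells per layer, patterns, inverse temperature
xi_l  = rng.choice([-1, 1], size=(p, N))         # key patterns on layer l
xi_lp = rng.choice([-1, 1], size=(p, N))         # key patterns on layer l+1 (independent set)

# Outer-product couplings, Eq. (3): J_ij = (1/N) sum_nu xi^{l+1,nu}_i xi^{l,nu}_j
J = xi_lp.T @ xi_l / N

def update_layer(S_l):
    """Set layer l+1 given layer l with P(S) = exp(beta*S*h) / (2*cosh(beta*h))."""
    h = J @ S_l                                  # local fields, Eq. (2)
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    return np.where(rng.random(N) < p_plus, 1, -1)

# Start from a noisy version of key pattern nu = 0 and propagate one layer.
S1 = xi_l[0] * np.where(rng.random(N) < 0.85, 1, -1)   # ~15 % of the spins flipped
S2 = update_layer(S1)
print("overlap with pattern 0 on layer 2:", S2 @ xi_lp[0] / N)
```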

The extensions we consider in this paper are the following:

1) Weighted learning schemes [4, 13]: here the original learning rule (Eq. (3)) is modified, and each pattern is learned with a weight. This models the effect that recently learned patterns are embedded with a larger weight than « old » patterns. A more detailed discussion of the « philosophy » of this modification can be found in reference [4]. The new couplings in this case take the form

J_ij^l = (1/N) Σ_{ν=1}^{p_s} A(ν/N) ξ_i^{l+1,ν} ξ_j^{l,ν} ,    (4a)

where A(ν/N) obeys the normalization condition [4]

(1/N) Σ_{ν=1}^{p_s} A(ν/N) = K ,    (4b)

where K is a normalization constant independent of N. With this notation, and assuming A(u) to be a decreasing function of u, the most recently stored pattern is the one with ν = 1, and the storage « ancestry » increases with ν. This modification was originally proposed by Mezard et al. [4], who solved the problem for Hamiltonian networks within the framework of replica theory. Derrida and Nadal [13] have also considered this modification of the learning rule for the diluted asymmetric neural networks [12]; they were able to give an exact solution of the dynamics in that case.

2) Biased patterns [7]: following Amit et al. [7], we study the properties of our network when the mean level of activity differs from the 50 % used in MD. Thus, every component ξ_i^{l,μ} of a learned pattern is chosen independently with a probability P(ξ_i^{l,μ}),

P(ξ_i^{l,μ}) = [(1 + a_l)/2] δ(ξ_i^{l,μ} - 1) + [(1 - a_l)/2] δ(ξ_i^{l,μ} + 1) .    (5)

We also adopt the modified form of the coupling proposed by Amit et al. [7],

J_ij^l = (1/N) Σ_{ν=1}^{p_s} (ξ_i^{l+1,ν} - a_{l+1}) (ξ_j^{l,ν} - a_l) ,    (6)

where a_l is the magnetization of the patterns on layer l.

3) Static noise in the synapses [22]: here the couplings are modified so as to include a random part which is not related to the learning process. The couplings in this case take the form

J_ij^l = (1/N) Σ_{ν=1}^{p_s} ξ_i^{l+1,ν} ξ_j^{l,ν} + Δ_ij^l ,    (7)

where the Δ_ij^l are independent, identically distributed Gaussian random variables with zero mean and width Δ/√N. This problem was treated by Sompolinsky [22] in the context of the Hopfield model. (A short illustrative construction of the three modified couplings of this section is sketched below.)
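The three extensions above differ from Eq. (3) only in how the couplings J_ij^l are built. The following sketch (a schematic construction under the stated assumptions, not the paper's code) shows one way to generate each of them; the exponential form used for A(u) is only one admissible decreasing choice, and the values of ε, a and Δ are arbitrary.

```python
import numpy as np

# Illustrative construction of the three modified couplings of this section.
rng = np.random.default_rng(1)
N, p = 500, 50

def hebb(xi_out, xi_in, weights=None):
    """Outer-product couplings; optional per-pattern weights A(nu/N) as in Eq. (4a)."""
    w = np.ones(len(xi_out)) if weights is None else weights
    return (xi_out * w[:, None]).T @ xi_in / N

xi_in  = rng.choice([-1, 1], size=(p, N))
xi_out = rng.choice([-1, 1], size=(p, N))

# 1) Weighted learning: A(u) decreasing in u, e.g. an exponential (illustrative choice).
eps = 1.0
A = np.exp(-eps * np.arange(1, p + 1) / N)
J_weighted = hebb(xi_out, xi_in, weights=A)

# 2) Biased patterns: P(xi = +1) = (1 + a)/2, couplings built from (xi - a), Eq. (6).
a = 0.4
xi_in_b  = np.where(rng.random((p, N)) < (1 + a) / 2, 1, -1)
xi_out_b = np.where(rng.random((p, N)) < (1 + a) / 2, 1, -1)
J_biased = (xi_out_b - a).T @ (xi_in_b - a) / N

# 3) Static synaptic noise: add i.i.d. Gaussian Delta_ij of width Delta/sqrt(N), Eq. (7).
Delta = 0.5
J_noisy = hebb(xi_out, xi_in) + rng.normal(0.0, Delta / np.sqrt(N), size=(N, N))
```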

It should be noted that in all the above extensions we retain the basic feature of our network, i.e. each pattern carries a layer index. This is a central feature that characterizes the class of model neural networks studied in [19]; it has conceptual as well as technical significance. The main point is that while the input representation of the key pattern ν, i.e. ξ_i^{1,ν}, is externally dictated, the network is free to choose the internal as well as the output representations ξ_i^{l,ν}, l > 1.

By exact solution of our model we mean the following. The network is presented with an initial configuration S^1 on the first layer (l = 1). This may be one of the key patterns, a noisy key pattern, a mixture state (one having a finite overlap with several key patterns) or just a random state. This state is characterized by its overlap m_ν^1 with each of the key patterns on the first layer. The overlap m_ν^l is defined by

m_ν^l = (1/N) Σ_i ξ_i^{l,ν} S_i^l

(this definition will be generalized when dealing with biased patterns in Sect. 3). Our solution consists of the calculation of m_ν^l on all subsequent layers. From the recursion relations for m_ν^l we will be able to learn a great deal about the performance of the network. This will be done in section 4 after the solution is presented.
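As a concrete illustration of this definition, the snippet below (illustrative names only) evaluates m_ν^1 = (1/N) Σ_i ξ_i^{1,ν} S_i^1 for a symmetric mixture state: the three condensed overlaps are finite, while the remaining ones are of order 1/√N.

```python
import numpy as np

# Overlap of an initial layer-1 state with each key pattern, m_nu^1 = (1/N) sum_i xi_i^{1,nu} S_i^1.
rng = np.random.default_rng(2)
N, p = 1000, 20
xi1 = rng.choice([-1, 1], size=(p, N))     # key patterns on the first layer

S1 = np.sign(xi1[0] + xi1[1] + xi1[2])     # a symmetric mixture of three key patterns
m1 = xi1 @ S1 / N                          # overlap with every key pattern
print("condensed overlaps:", np.round(m1[:3], 2), " remaining overlaps ~ O(1/sqrt(N))")
```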

3. Exact solution.

This section is divided into four parts. In the first I give the general framework for the solution of feed-forward type networks. We have given a detailed derivation of the original model in MD, but will recapitulate the main steps for the sake of completeness. Our formulation is similar to that proposed by Gardner et al. [24] for the case of the parallel dynamics of the Little and SK models. In each of the remaining three parts I consider one of the extensions of the basic model we have previously solved and derive the exact layer-to-layer recursion relations, which are then analysed in section 4.

3.1 GENERAL FRAMEWORK. - Consider a random assignment of ν = 1, 2, ..., p_s key patterns ξ_i^{l,ν} on each of the L layers of the network. Choose an initial state on the first layer, S^1. The question we ask is: what is the probability P(S^L | S^1) that the dynamic rules (1-2) produce on layer L a state S^L, given the initial state S^1? Note that we must average both over the random assignment of the ξ's and over the probability distribution given in equation (1). The conditional probability to get a configuration S^{l+1} on layer l+1, given the configuration S^l on the previous layer, is obtained by taking the product of equation (1) over all sites:

P_ξ(S^{l+1} | S^l) = Π_{i=1}^{N} exp(β S_i^{l+1} h_i^{l+1}) / [2 cosh(β h_i^{l+1})] ,    (9)

where h_i^{l+1} is given in equation (2). The subscript ξ denotes the dependence of P_ξ on all the key patterns ξ_i^{l,ν}. A sequence of configurations S^1, ..., S^L will be generated by our dynamic rules with the probability

P_ξ(S^2, ..., S^L | S^1) = Π_{l=1}^{L-1} P_ξ(S^{l+1} | S^l) .

In order to obtain the probability for a configuration S^L on layer L, given the initial state S^1, we must sum this over all intermediate layers:

P_ξ(S^L | S^1) = Σ_{S^2, ..., S^{L-1}} Π_{l=1}^{L-1} P_ξ(S^{l+1} | S^l) .

Finally, averaging this quantity, which refers to a given realization of the ξ's, over the probability distribution of the random variables ξ, we obtain the probability P of S^L given S^1:

P(S^L | S^1) = « P_ξ(S^L | S^1) » .

The double averaging sign « » indicates an average over the ξ's. Combining the above equations into a single expression for the probability distribution P(S^L | S^1), we obtain:

This expression is the starting point for the various models considered. We now derive the solution in each case. The basic idea in each solution is to bring the expression for P into the form of an integral of the type ∫ dx exp[N F(x)], from which the layer-to-layer recursion relations are derived from the saddle-point equations ∂F/∂x = 0. To do this we will have to introduce various order parameters in order to calculate the averages over the patterns and the trace over the spin variables.
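For a toy system the sum over intermediate layers can be carried out by brute force, which makes the structure of P(S^L | S^1) concrete before any order parameters are introduced. The sketch below assumes the Glauber form of Eq. (1) and enumerates all intermediate configurations; this is of course only feasible for very small N and L, which is precisely why the paper proceeds via the saddle-point method instead.

```python
import numpy as np
from itertools import product

# Brute-force check of the structure P(S^L | S^1) = sum_{S^2..S^{L-1}} prod_l P(S^{l+1} | S^l)
# for a toy system; couplings, sizes and names are illustrative.
rng = np.random.default_rng(3)
N, L, beta = 3, 4, 1.0
J = [rng.normal(0, 1 / np.sqrt(N), size=(N, N)) for _ in range(L - 1)]  # any fixed couplings

def P_layer(S_next, S_prev, J_l):
    h = J_l @ np.array(S_prev)
    return np.prod(np.exp(beta * np.array(S_next) * h) / (2 * np.cosh(beta * h)))

def P_final(S_L, S_1):
    total = 0.0
    for mids in product(product([-1, 1], repeat=N), repeat=L - 2):   # all intermediate layers
        path = [S_1, *mids, S_L]
        total += np.prod([P_layer(path[l + 1], path[l], J[l]) for l in range(L - 1)])
    return total

S1 = tuple(rng.choice([-1, 1], size=N))
# Summing over all 2^N final states must give 1 (a useful sanity check of the normalization).
print(sum(P_final(SL, S1) for SL in product([-1, 1], repeat=N)))
```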

3.2 WEIGHTED LEARNING SCHEME. - Here we use the form given in equation (4) for the couplings. In this case the expression for the probability distribution takes the form:

In this equation and in what follows we use the abbreviation A_μ = A(μ/N). In all subsequent discussion of the weighted learning scheme I will use the notation p_s = gN. As we show in section 4.1, the number of patterns embedded in the network is not equal to the number of effectively stored patterns, αN. That is, even though μ = 1, 2, ..., p_s patterns appear in the sum (4), only αN ≤ p_s patterns are effectively stored by the network [4]. (In the two other extensions I consider, these two numbers are the same and will be denoted by the standard notation αN.)

To proceed further we need to introduce new variables which will make the calculation of the averages possible. Thus, we introduce the two sets of variables m, m̂ and φ, φ̂ through the relations:

and

At this stage we make the assumption that the initial state S^1 has a finite overlap m_ν with pattern ν and an overlap of order 1/√N with all the others. Doing this we get the following expression for P:

In the expression above we have separated the terms with μ ≠ ν from the term with μ = ν. In this equation and in the remainder of this subsection, whenever μ appears it takes only the values μ ≠ ν. Y is defined in equation (19) below. At this stage I make the following observation. Because I have considered a system of L layers, the variables S^L and ξ_i^{L,μ} corresponding to the last layer appear in a different form from those corresponding to the layers l < L. In what follows I will be interested in the layer-to-layer recursion relations, and clearly the recursion relations for layers l < L do not depend on what goes on in layer L. In order to avoid this asymmetry I will implicitly assume that the size of the system L → ∞ in deriving the recursion relations. Thus, I define the probability distribution P by the equation:

With this proviso Y is given by

In this expression and in what follows, the upper limit of the summations over l will be assumed to be infinity, in accordance with what was said above. The lower limit will be 1, unless otherwise specified.

To proceed further we use the fact that our initial state S^1 has a finite overlap with the pattern ν and overlaps of order 1/√N with the others. We assume that this situation holds at all times (i.e. on all layers), and thus make the following rescaling of the variables in the expression Y for μ ≠ ν:

Using the new variables λ_l and λ̂_l we expand the expression for Y to lowest order in 1/N.

In order to separate the variables λ_l, which carry a pattern index, from the φ_i, which carry a site index, we need to introduce additional variables using the following identities:

Going back to equation (17), we note that we still need to calculate the average over the patterns ξ_i^{l,ν} and the trace over the variables S_i^l. This gives an additional contribution of the form

Combining all the above results into one formula, we obtain:

The integral over the variables m_ν can be done, and we finally obtain:

where

and

The function Z_l is given by

In the limit N → ∞ we can calculate the integral (23) using the saddle-point method. This corresponds to evaluating conditions of the form ∂F/∂x = 0, where x stands for any one of the integration variables. Doing this we obtain the saddle-point equations in the form of recursion relations for the various order parameters introduced. After manipulating these expressions, using techniques similar to those described in MD, we finally obtain the following recursion relations, which are derived in appendix A:

where a_{μ,l+1} is given by

In this equation the average is with respect to a Gaussian random variable z with zero mean and unit variance. The initial conditions are a_{μ,1} = 1, which implies q_1 = K (assuming the normalization condition (4b)).
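The saddle-point step used here is the standard Laplace argument: for N → ∞ an integral of the form ∫ dx exp[N F(x)] is dominated by the stationary point of F. The one-dimensional toy example below (with an illustrative F, not one of the functions of this paper) checks the leading-order formula numerically.

```python
import numpy as np

# Saddle-point (Laplace) evaluation of integral dx exp(N*F(x)): for large N the
# integral is dominated by x* with dF/dx = 0, giving exp(N*F(x*)) * sqrt(2*pi/(N*|F''(x*)|)).
F = lambda x: -(x - 1.0) ** 2 + 0.1 * np.sin(3.0 * x)      # toy F, purely illustrative

x = np.linspace(-5.0, 5.0, 200001)
dx = x[1] - x[0]
x_star = x[np.argmax(F(x))]                                 # stationary point of F
Fpp = (F(x_star + 1e-3) - 2 * F(x_star) + F(x_star - 1e-3)) / 1e-6   # numerical F''(x*)

for N in (10, 100, 1000):
    direct = np.sum(np.exp(N * F(x))) * dx                  # direct quadrature
    saddle = np.exp(N * F(x_star)) * np.sqrt(2 * np.pi / (N * abs(Fpp)))
    print(f"N = {N:5d}   direct/saddle-point = {direct / saddle:.4f}")   # -> 1 as N grows
```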

In the limit N → ∞ we can transform the sum in (29) into an integral. Doing this, and defining r_l = g q_l, we obtain the following recursion relations at zero temperature (β → ∞):

with a_1(u) = 1 and

These equations constitute the solution of this model and will be analysed in the next section. In particular, we will show that the number of effectively stored patterns may be smaller than the number of embedded patterns p_s.

3.3 LEARNING BIASED PATTERNS. - In this subsection we give the exact solution of the model as defined in section 2, with the coupling J_ij^l given in equation (6) and the probability distribution for the random variables ξ_i^{l,μ} given in equation (5). Following Amit et al. [7], I also define the order parameter m in this case to be:

where a_l is the magnetization on layer l. The full overlap m̃ is given by

Following the derivation given in the previous subsection for « learning with weights », and making the obvious generalizations appropriate here, we finally obtain the following expression for the probability distribution P (details of the derivation are given in appendix B):

where

and

The function Z is given by

In these equations α = p_s/N (see the discussion following Eq. (14)). As in the previous section, we calculate the integral by the saddle-point method. The general recursion relations and their detailed derivation are given in appendix B. For the sake of brevity, we give here only the recursion relations at T = 0 and for the case a_l = a, i.e. a independent of the layer. These recursions have the following form:

These equations, together with the initial conditions m_1 = m^1 and q_1 = 1 - a², constitute the solution of the model (at T = 0).

3.4 EFFECT OF STATIC NOISE. - In this subsection I give the solution of the basic MD model with the inclusion of a static (non-learned) component in the couplings. This generalization has been treated in the context of the Hopfield model by Sompolinsky [22], with results similar to those we obtain below. I will derive the recursion relations at zero temperature and analyse them in the next section. The only difference between the derivation at zero temperature and at a finite temperature T is that instead of the function P(S^{l+1} | S^l) appearing in equation (9) we have a theta-function constraining the dynamics. The probability distribution P(S^L | S^1) is given in this case by

In this equation the square brackets represent an average over the Gaussian random variables Δ_ij^l with zero mean and width Δ/√N. Transforming the theta-function into an integral by using the identity

we bring the function P into a form which can easily be evaluated. The integral over the distribution P(Δ_ij^l) is Gaussian and can easily be done. The remaining average over the patterns ξ_i^{l,ν} and the trace over S_i^l are done in a manner analogous to that described in the previous two subsections, and will not be repeated. The final layer-to-layer recursion relations I obtain are:

The initial conditions are, as before, given by m_1 = m^1 and q_1 = 1.

4. Analysis of the solution.

In the previous section we presented the recursion relations for the three generalizations of the basic MD model. This section contains an analysis of these recursion relations. The most important question I wish to address is the asymptotic behaviour of the system: given an initial state which has overlap m_1 with a given pattern ν, what is the behaviour as l → ∞? If the asymptotic overlap m* is finite (and close to 1), we say that the system has recognized pattern ν. If the asymptotic overlap is 0 (or of order 1/√N), the system has lost all trace of its initial state, and thus has not recognized pattern ν. In MD we found that the system undergoes a phase transition as the parameter α is varied. For example, at T = 0 we find that for α > 0.27 the asymptotic overlap is zero, whatever the initial state. This means that the system can no longer function as an associative memory.

I find that a similar type of behaviour persists in the models described above. In the following three subsections I discuss the behaviour of the network for the three generalizations treated above.
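The recursions of section 3 couple m_l to further order parameters such as q_l, but the fixed-point analysis described here can already be illustrated with a single-variable map. The sketch below iterates m_{l+1} = erf(m_l/√(2α)), the zero-temperature recursion of the extremely diluted asymmetric model of Derrida et al. [12], used purely as a stand-in for the MD recursions; the procedure (iterate to the asymptotic overlap, scan α for the loss of the retrieval solution) is the same.

```python
import numpy as np
from math import erf, sqrt

# Fixed-point analysis with a one-variable stand-in map, m_{l+1} = erf(m_l / sqrt(2*alpha))
# (the diluted-model recursion of Derrida et al. [12], NOT the MD recursions of section 3).
def asymptotic_overlap(alpha, m0=1.0, layers=2000):
    m = m0
    for _ in range(layers):
        m = erf(m / sqrt(2 * alpha))
    return m

for alpha in (0.3, 0.6, 0.64, 0.7):
    print(f"alpha = {alpha:4.2f}  ->  m* = {asymptotic_overlap(alpha):.3f}")
# For this particular map the retrieval solution m* > 0 disappears at alpha_c = 2/pi ~ 0.64;
# the MD recursions give the smaller critical value 0.27 quoted above.
```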

4.1 WEIGHTED LEARNING SCHEMES. - Starting from the general recursion relations given in equations (29), (30), we restrict ourselves in this section to the case of the so-called marginalistic learning scheme. This corresponds to the following choice of the function A_μ [4, 13]:

With this choice of A_μ I find two different types of behaviour as a function of the parameter ε. This behaviour is the same as that obtained by Mezard et al. for the case of Hamiltonian networks and by Derrida and Nadal for the case of the diluted asymmetric networks. Following Derrida and Nadal, I define the number of effectively stored patterns to be p_m, and the parameter α = p_m/N. Recall that the total number of patterns embedded in the system was p_s = gN. Similarly to the results of the above authors, I find two regimes as a function of ε. In the first regime (ε < ε_c) the behaviour is of the following nature (using the notation of Derrida and Nadal [13]):

(i) Good learning regime, for g ≤ g*(ε). In this regime all the embedded patterns are effectively stored, i.e. p_m = p_s.

(ii) Forgetting regime, for g*(ε) ≤ g ≤ g_c(ε). In this regime only the most recently stored patterns are effectively stored, while the rest cannot be retrieved; here p_m < p_s.

(iii) Above g_c(ε), no stored pattern is effectively memorized. In this regime α = 0.

In the second regime (ε > ε_c) there is never a complete deterioration regime. As found in the diluted model of Derrida and Nadal, there is an asymptotic finite capacity α as g → ∞. That is, for g < g* the capacity α is equal to g, while for g > g* the limiting capacity α(g → ∞) is finite. This behaviour can be seen in figure 2.

For the marginalistic type of learning described by equation (42), I find the critical value of ε to be ε_c = 1.68... In figure 1 I give the capacity α vs. g in the first regime (for ε = 1 < ε_c). The different types of behaviour described in (i), (ii) and (iii) above can be clearly seen in this figure. Figure 2 depicts the same graph for ε = 2 > ε_c, which is in the second regime in ε. As we can see from figure 2, the curve α(g) levels off to its asymptotic value already at g ≈ 1, and the complete deterioration regime α = 0 is never reached.

Fig. 1. - Weighted learning: effective capacity α vs. g = p_s/N, where p_s is the number of patterns embedded in the network, for ε = 1 < ε_c. Three regimes are seen in the figure, as discussed in the text.

Fig. 2. - Weighted learning: effective capacity α vs. g = p_s/N, where p_s is the number of patterns embedded in the network, for ε = 2 > ε_c. Here only two distinct regimes are observed, as discussed in the text.

4.2 BIASED PATTERNS. - Starting from the zero-temperature recursion relations given in equations (39), I address the problem of the asymptotic overlap m* and the critical value of α as a function of the magnetization a imposed on each layer. Solving equations (39) numerically, I find the curve α_c(a) given in figure 3. As can be expected, we see that the storage capacity decreases with the magnetization a and approaches zero as a → 1. This curve is similar in shape to that obtained by Amit et al. [7] for the case of the Hopfield model with biased patterns. Figure 4 depicts the asymptotic overlap m_c*(a), where m_c* = m*(α_c). Again, one observes that as the magnetization increases the asymptotic overlap decreases and approaches 0 as a → 1. The full overlap, however (Eq. (33b)), m̃ = m + a², is always close to 1, even in the vicinity of α_c.

Fig. 3. - Biased patterns: critical capacity α_c vs. a, the magnetization on each layer.

Fig. 4. - Biased patterns: asymptotic overlap m*(α_c) vs. a, the magnetization on each layer.

4.3 STATIC SYNAPTIC NOISE. - As for the original MD model, the saddle-point equations contain two stable solutions. The solution with m ≈ 1 disappears above a critical value α_c(Δ). I plot this critical value of α as a function of Δ in figure 5. As can be expected, α_c is a monotonically decreasing function of Δ. We observe that retrieval is possible only for Δ < 0.8, which is the same value as obtained by Sompolinsky in the case of the Hopfield model. In figure 6 I plot the asymptotic value of m_c* = m*(α_c) as a function of the noise level Δ. I find that it goes continuously to zero as Δ → Δ_c.

As was found by Sompolinsky [22] in the case of the Hopfield model, I find our layered model to be rather insensitive to static noise. In fact, the network performs rather well even when the width of the noise term is comparable to the width of the Hebb component, which is of order √α/√N.
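This comparison is easy to check numerically: the element-wise spread of the Hebb part of J_ij is √α/√N, so a noise width Δ/√N with Δ of order √α is indeed comparable to the learned part. The snippet below (illustrative parameters) makes the estimate explicit.

```python
import numpy as np

# Element-wise spread of the Hebb couplings is ~ sqrt(alpha)/sqrt(N), so noise of
# width Delta/sqrt(N) with Delta of order sqrt(alpha) is comparable to the learned part.
rng = np.random.default_rng(4)
N, alpha = 1000, 0.2
p = int(alpha * N)
xi_in, xi_out = (rng.choice([-1, 1], size=(p, N)) for _ in range(2))
J_hebb = xi_out.T @ xi_in / N

print("std of Hebb couplings     :", J_hebb.std())
print("sqrt(alpha)/sqrt(N)       :", np.sqrt(alpha / N))
print("noise width Delta/sqrt(N) :", 0.5 / np.sqrt(N), " (Delta = 0.5, illustrative)")
```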

Fig. 5. - Static synaptic noise: critical capacity α_c vs. Δ, the width of the distribution of the random variables Δ_ij^l.

Fig. 6. - Static synaptic noise: asymptotic overlap m*(α_c) vs. Δ, the width of the distribution of the random variables Δ_ij^l.

5. Discussion.

In this paper I have extended the solvable class of feed-forward neural networks defined by Domany et al. [19] and solved by Meir and Domany [21] to include:

a) A weighted learning scheme.

b) Learning of biased patterns.

c) Inclusion of static noise in the synapses.

My analysis has shown that all these modifications can be successfully incorporated into the basic model of MD, thus enlarging its domain of operation. As we have shown, the introduction of weighted learning prevents the abrupt decline in the performance of the network at a sharp value of α (≈ 0.27); the price we pay, of course, is that « anciently » learned patterns are forgotten. We have also shown that simply correlated patterns, as in the biased-pattern case, can be successfully learned and retrieved. Finally, static synaptic noise was found to have little effect on the performance of the network, even when it is comparable in magnitude to the learned component.

The mathematical techniques used in this paper and in MD should be applicable to many types of feed-forward neural networks, and allow one to obtain interesting analytical insight into these models which would not be available from computer simulations.

Of course, the most interesting part of the theory of feed-forward networks, namely the problem of learning, has not been addressed in this paper. We have previously introduced [19] a layered model possessing such a dynamical learning stage, which leads to perfect recall of all key patterns. We note in passing that much recent work is concerned with learning algorithms for feed-forward networks [16]. Numerical work has demonstrated the utility of such systems, but no convergence theorem has been proved for the learning stage, as has been done for the single-layer perceptron [15].

Two interesting unanswered questions in the theory of feed-forward neural networks, which can hopefully be attacked with the techniques described in this paper, are the following:

1) The maximal storage capacity of such systems. This question has recently been addressed by Gardner and Derrida [25] in the framework of Hopfield-type networks. It would be interesting to compare the capacity of the two types of systems.

2) The structure of the attractors, i.e. the topography of the state space. Questions such as the asymptotic overlap of two initially close patterns are of importance in answering this question.

We are currently studying the possibility of using such networks for storing sequences of different periods (in a single network).

I thank E. Domany and H. Orland for many helpful discussions, and E. Domany for his encouragement. This research was supported by the US-Israel Binational Science Foundation, the Israel Academy of Sciences and the Minerva Foundation.

Appendix A.

In this appendix I derive the recursion relations for the weighted learning scheme. The derivation is very similar to that given in appendix A of MD, but is repeated for completeness. The saddle-point equations derived from equation (22) are:

As in MD we need to make an ansatz concerning the solution of the saddle-point equations. Thus, I assume

With this assumption one can show that ρ_l must also vanish. To see this I note that ρ_l contains a term of the form ⟨i λ_{l_0} λ̂_{l_0-1}⟩_{z,φ}. This term is proportional to

Assuming q̂_L = 0 for some large L (which is the number of layers in the system, and which we implicitly assume tends to infinity) and integrating over λ_L gives δ(λ̂_L). In the last term of (A.3), λ̂_L multiplies λ_{L-1}; hence if λ̂_L = 0 and q̂_{L-1} = 0, there remains only one term with λ_{L-1}, and the integral over λ_{L-1} also yields δ(λ̂_{L-1}), and so on, until l = l_0 is reached, for which one gets

Thus q̂_l = 0 yields

However, for ρ_l = 0 it is easy to see that the f_l of equation (27) satisfy the relation

The equation for q_l is obtained by calculating the average ⟨(λ_l)²⟩_{z,φ} appearing in the first equation of (A.1). A straightforward evaluation of this integral yields

Now we must evaluate iφ̂_l. Using the expression for f given in (26), together with equations (A.1) and (A.2), we get

It should be noted that (A.5) holds for ρ_l = 0; in (A.7) we must first take the derivative and then set ρ_l = 0. From (26) it is easy to see that

and we find

Calculating the derivative and substituting this into (A.6), using the first of equations (A.1), yields the recursion relation for q_l. The recursion relation for m_l is obtained by using (A.5) and the last of equations (A.1) to get

Using equation (27) for f_l (with ρ_l = 0) yields the required equation. Putting everything together, I obtain the following recursion relations, which are also displayed in the main body of the paper:

where a_{μ,l+1} is given by

In this equation the average is with respect to a Gaussian random variable z with zero mean and unit variance. The initial conditions are a_{μ,1} = 1, which implies q_1 = K (see Eq. (4b)).

Finally, in order to check the self-consistency of the solution we must demand that indeed q̂_l = 0. To show this we need to evaluate ∂f_{l+1}/∂q_l, which can be shown to yield

Using this result it is simple to check that indeed q̂_l = 0, and our solution is consistent with the starting assumption that led to it.

Appendix B.

In this appendix I derive the recursion relations for the network with biased patterns. As usual, we start from the expression for the probability distribution given in equation (13). Inserting the explicit form for J_ij^l given in equation (6), and going through the same introduction of the variables m, m̂, φ, φ̂ as in equations (15), (16) of the main text, I obtain (note, however, that the definition of m_l has been modified in this case):

In the above expression we have separated the terms with μ > 1 from those with μ = 1. In what follows μ always takes values > 1. Using again the remarks made after equation (17), I implicitly assume L → ∞ and so neglect the boundary term resulting from the term with S^L. As mentioned before, since I am interested only in the recursion relations this makes no difference. Doing this, Y is given by:

which can be averaged over the probability distribution for ξ given in equation (5). This yields:

Rescaling the variables m_l^μ and m̂_l^μ as in equation (20), and defining the variables ρ_l and q_l as in equation (21), we are left with the following expression for P:

The trace over the variables S_i^l can easily be done. This yields the following expression:

Collecting all the terms containing the variables φ, φ̂, we have integrals of the following form:

The integral over the variable φ̂ can be done, and we are left with the following expression for J:

Combining all the above manipulations, we finally obtain the final expression for P given in the main text (Eqs. (34)-(38)).

As was mentioned in the text, the integral is calculated via the saddle-point method. To do this we must set to zero the derivatives of the function F with respect to each one of the integration variables. Doing this we obtain the saddle-point equations:

The equation for m_l can be seen to give

where the average appearing in this equation is given in equation (37). In order to solve the saddle-point equations we need to make an ansatz concerning the solution. Using the experience gained in MD, we assume:

which will be checked for self-consistency at the end of the calculation. With this assumption it is not difficult to check that

as well (see appendix A for details). The equation for q_l is:

With ρ_l = 0 one can also check that the following relation holds:

From (B.10), (B.11) and (B.13) one finds that

From (B.9) and (B.12), with the assumptions (B.10) and their consequences (B.11), (B.13) and (B.14), it is simple to obtain the following recursion relations after some algebra:

In these equations ⟨ ... ⟩ represents an average with respect to the Gaussian random variable z with zero mean and unit variance. Taking the zero-temperature limit β → ∞, we obtain the recursion relations given in equation (39) of the main text. Finally, one can check that the assumptions (B.10) indeed lead to a self-consistent solution. To do this I substitute the solution (B.10) and (B.11) into the saddle-point equations (B.8) and find that they are indeed satisfied.

References

[1] LITTLE, W. A., Math. Biosci. 19 (1975) 101.

[2] HOPFIELD, J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.

[3] AMIT, D. J., GUTFREUND, H. and SOMPOLINSKY, H., Phys. Rev. A 32 (1985) 1007; Phys. Rev. Lett. 55 (1985) 1530; Ann. Phys. 173 (1987) 30.

[4] MEZARD, M., NADAL, J. P. and TOULOUSE, G., J. Phys. France 47 (1986) 1457.

[5] PARISI, G., J. Phys. A 19 (1986) L617.

[6] PERSONNAZ, L., GUYON, I. and DREYFUS, G., J. Phys. Lett. France 46 (1985) L-359.

[7] AMIT, D. J., GUTFREUND, H. and SOMPOLINSKY, H., Phys. Rev. A 35 (1987) 2293.

[8] KANTER, I. and SOMPOLINSKY, H., Phys. Rev. A 35 (1987) 380.

[9] DIEDERICH, S. and OPPER, M., Phys. Rev. Lett. 58 (1987) 949.

[10] HERTZ, J., GRINSTEIN, G. and SOLLA, S., in L. Van Hemmen and I. Morgenstern (eds.), Glassy Dynamics (Berlin: Springer Verlag) 1987.

[11] SOMPOLINSKY, H. and KANTER, I., Phys. Rev. Lett. 57 (1986) 2861.

[12] DERRIDA, B., GARDNER, E. and ZIPPELIUS, A., to be published in Europhys. Lett.

[13] DERRIDA, B. and NADAL, J. P., submitted to J. Stat. Phys.

[14] ROSENBLATT, F., Principles of Neurodynamics (Washington D.C.: Spartan) 1961.

[15] MINSKY, M. and PAPERT, S., Perceptrons (Cambridge, Mass.: MIT Press) 1969.

[16] MCCLELLAND, J. L. and RUMELHART, D. E., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 2 vols. (Cambridge, Mass.: The MIT Press) 1986.

[17] RUMELHART, D., HINTON, G. and WILLIAMS, R., Nature 323 (1986) 533.

[18] HOGG, T. and HUBERMAN, B., J. Stat. Phys. 41 (1985) 115; Phys. Rev. Lett. 52 (1984) 1024.

[19] DOMANY, E., MEIR, R. and KINZEL, W., Europhys. Lett. 2 (1986) 175.

[20] MEIR, R. and DOMANY, E., Phys. Rev. Lett. 59 (1987) 359 and Europhys. Lett. 4 (1987) 645.

[21] MEIR, R. and DOMANY, E., Phys. Rev. A, in press.

[22] SOMPOLINSKY, H., Phys. Rev. A 34 (1986) 2571.

[23] GARDNER, E., J. Phys. A 19 (1986) L1047.

[24] GARDNER, E., DERRIDA, B. and MOTTISHAW, P., J. Phys. France 48 (1987) 741.

[25] GARDNER, E., Edinburgh University preprint 87/396 and GARDNER, E. and DERRIDA, B., Edinburgh University preprint 87/397.
