
Memory capacity of neural networks learning within bounds


HAL Id: jpa-00210652

https://hal.archives-ouvertes.fr/jpa-00210652

Submitted on 1 Jan 1987

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Mirta Gordon

To cite this version:

Mirta Gordon. Memory capacity of neural networks learning within bounds. Journal de Physique, 1987, 48 (12), pp.2053-2058. ⟨10.1051/jphys:0198700480120205300⟩. ⟨jpa-00210652⟩


Mirta B. Gordon

Centre d’Etudes Nucléaires de Grenoble, Département de Recherche Fondamentale/Service de Physique, Groupe Magnétisme et Diffraction Neutronique (*), 85 X, 38041 Grenoble Cedex, France

(Received 7 July 1987, accepted 12 August 1987)

Abstract. We present a model of long term memory: learning within irreversible bounds. The best bound values and the memory capacity are determined numerically. We show that it is possible in general to calculate the memory capacity analytically by solving the random walk problem associated with a given learning rule. Our estimates, obtained for several learning rules, are in excellent agreement with numerical and analytical statistical mechanics results.

Classification

Physics Abstracts

75.10H - 64.60 - 87.30

In the last few years, a great amount of work has been done on the properties of networks of formal neurons, proposed by Hopfield [1] as models of associative memories. In these models, each neuron $i$ is represented by a spin variable $\sigma_i$ which can take only two values, $\sigma_i = 1$ or $\sigma_i = -1$. Any state of the system is defined by the values $\{\sigma_1, \sigma_2, \ldots, \sigma_N\} \equiv \sigma$ taken by each one of the $N$ spins or neurons. Pairs of neurons $i$, $j$ interact with strengths $C_{ij}$, the synaptic efficacies, which are modified by learning. As usual, we denote by $\xi^\nu$ ($\nu = 1, 2, \ldots$) the learnt states or patterns. Retrieval of patterns is a dynamic process in which each spin takes the sign of the local field

$$ h_i = \sum_j{}' C_{ij}\, \sigma_j \tag{1} $$

acting on it. The primed sum means that terms $j = i$ should be ignored. A learnt state $\xi^\nu$ is said to be memorized or retrieved if, starting with the network in state $\xi^\nu$, it relaxes towards a final state close to $\xi^\nu$. In general, the final state can be very different from $\xi^\nu$, and will be denoted $\tilde{\xi}^\nu$. The overlap between both,

$$ q^\nu = \frac{1}{N} \sum_i \xi_i^\nu\, \tilde{\xi}_i^\nu \tag{2} $$

gives a measure of retrieval quality.

The simplest local learning prescription [2] for $p$ learnt patterns is Hebb’s rule:

$$ C_{ij} = \frac{1}{N} \sum_{\nu=1}^{p} \xi_i^\nu\, \xi_j^\nu \tag{3} $$

Assuming that the values of $\xi_i^\nu$ are random and uncorrelated, it has been shown [1-3] that the maximum number of patterns $p$ that can be memorized with Hebb’s learning rule is proportional to the number of neurons: $p = \alpha N$, with $\alpha = 0.145 \pm 0.009$. If more than $\alpha N$ patterns are learnt, memory breaks down and none of the learnt patterns is retrieved.
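The breakdown described above can be illustrated with a small numerical experiment. The following is only a minimal sketch, assuming NumPy, a network of N = 200 neurons and zero-temperature sequential dynamics; the function names (`hebb_couplings`, `relax`, `overlap`, `mean_overlap`) and the two loads tried are illustrative choices, not taken from the paper.

```python
import numpy as np

def hebb_couplings(patterns):
    """Hebb's rule: C_ij = (1/N) sum_nu xi_i^nu xi_j^nu, diagonal set to zero."""
    p, n = patterns.shape
    c = patterns.T @ patterns / n
    np.fill_diagonal(c, 0.0)  # the primed sum ignores terms j = i
    return c

def relax(c, state, max_sweeps=100):
    """Sequential dynamics: each spin in turn takes the sign of its local field."""
    s = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1 if c[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # every spin agrees with its field: relaxed
            break
    return s

def overlap(a, b):
    """Overlap q between two spin configurations."""
    return float(a @ b) / len(a)

rng = np.random.default_rng(0)
N = 200  # illustrative network size

def mean_overlap(p):
    """Mean overlap between each learnt pattern and the state it relaxes to."""
    xi = rng.choice([-1, 1], size=(p, N))
    c = hebb_couplings(xi)
    return float(np.mean([overlap(xi[k], relax(c, xi[k])) for k in range(p)]))

q_low = mean_overlap(10)   # p/N = 0.05, well below the critical load
q_high = mean_overlap(60)  # p/N = 0.30, well above it
print(q_low, q_high)
```

At the low load the patterns should be essentially fixed points of the dynamics, while at the high load the mean overlap is expected to drop markedly; the exact values depend on the seed and on N.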

In order to avoid this catastrophic effect, different modifications of Hebb’s rule were proposed [4-6]. The simplest one is the so-called learning within bounds [5]: synaptic efficacies are modified by learning in the same way as with Hebb’s rule, but their values are constrained to remain within some chosen range. In the version proposed by Parisi [4] bounds are reversible: once a $C_{ij}$ reaches a barrier, it remains at its value until a pattern is learnt that returns it inside the allowed range. This is a model of short term memory: only the last learnt patterns are retrieved, old memories are gradually erased by learning. With this learning rule no deterioration occurs, but the storage capacity is smaller than with Hebb’s rule.

Article published online by EDP Sciences and available at http://dx.doi.org/10.1051/jphys:0198700480120205300

In the first part of this paper, we present numerical simulations on a model of long term memory, which is an irreversible version of learning within bounds: those synaptic efficacies that reach a bound remain at that value forever [6]. The best bounds and the storage capacity are similar to those found with reversible bounds, but now the first, and not the last, learnt patterns are memorized.

In the second part of the paper, we show that a quantitative analysis of the random walk associated with each learning rule gives a very good estimate of the network’s memory capacity. We present results for the standard Hebb’s rule and for different variants of learning within bounds. Generalization to other learning rules is straightforward, and is presented in section 3.

1. Learning within irreversible bounds. Numerical simulations.

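The learning rule with irreversible bounds or barriers at $\pm m/N$ is

$$ C_{ij}(\nu) = \begin{cases} C_{ij}(\nu - 1) + \dfrac{1}{N}\, \xi_i^\nu\, \xi_j^\nu & \text{if } \nu \le s_{ij} \\[1ex] C_{ij}(s_{ij}) = \pm\, m/N & \text{if } \nu > s_{ij} \end{cases} \tag{4} $$

with $C_{ij}(0) = 0$. $s_{ij}$ is the pattern number for which $C_{ij}$ first reaches a bound. Patterns after $s_{ij}$ are not learnt and the synaptic efficacy is saturated. For $m \to \infty$, the standard Hebb’s rule is recovered. But, unlike in Hebbian learning, with rule (4) the number $\nu$ (the « time » at which $\xi^\nu$ is learnt) is relevant.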

In our numerical simulations, random patterns were learnt following (4). Each time a new pattern was added, the retrieval quality of all the previously stored patterns was tested: starting with the network in a learnt state, spins are allowed to flip with Monte Carlo sequential dynamics until relaxation to a state in which each spin takes the sign of the field (1) acting on it. A learnt pattern is considered as well memorized if its overlap $q$ with the relaxed state is $q > 0.97$. Any other threshold would give nearly the same results, because patterns are retrieved either almost without error ($q \simeq 1$) or with $q \ll 1$.
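The protocol just described can be sketched in a few lines. This is an illustrative reading of rule (4), assuming bounds at ±m/N and storing the integer counter k_ij = N·C_ij; the sizes N = 100, p = 40, m = 3 and all function names are hypothetical choices made for the example, not the parameters of the paper's simulations.

```python
import numpy as np

def learn_irreversible(patterns, m):
    """Learn patterns in order within bounds +-m/N; a bond reaching a bound freezes."""
    p, n = patterns.shape
    k = np.zeros((n, n), dtype=np.int64)   # k_ij = N * C_ij
    frozen = np.zeros((n, n), dtype=bool)  # True once |k_ij| has reached m
    for xi in patterns:
        step = np.outer(xi, xi)            # Hebbian increment, +1 or -1
        k[~frozen] += step[~frozen]        # saturated bonds no longer learn
        frozen |= np.abs(k) >= m
    c = k / n
    np.fill_diagonal(c, 0.0)               # no self-coupling
    return c

def relax(c, state, max_sweeps=100):
    """Sequential dynamics until each spin has the sign of its local field."""
    s = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1 if c[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

rng = np.random.default_rng(0)
N, p, m = 100, 40, 3                       # illustrative values only
xi = rng.choice([-1, 1], size=(p, N))
c = learn_irreversible(xi, m)

# Overlap of each learnt pattern with the state it relaxes to.
q = np.array([xi[nu] @ relax(c, xi[nu]) / N for nu in range(p)])
n_retrieved = int(np.sum(q > 0.97))        # "well memorized" criterion of the text
print(n_retrieved, np.round(q[:5], 2), np.round(q[-5:], 2))
```

With these settings the earliest patterns are expected to be retrieved with q close to 1 and the latest ones with small q, which is the qualitative behaviour reported for irreversible bounds.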

The bound value giving the maximal number of well retrieved patterns, $m_{opt}$, was determined for networks with $N = 100$, 150, 200 and 400 neurons by testing different values of $m$. Figure 1 shows the retrieval quality (2) as a function of the pattern number, for $N = 400$. With the best bounds ($m_{opt}$), the overlap jumps abruptly from 1 to a small value, showing that only the first learnt patterns are memorized. Figure 2 is a plot of the number of well retrieved patterns versus the number of learnt patterns. For $m < m_{opt}$, a smaller number of patterns are retrieved in the asymptotic regime ($p$ large), and for $m > m_{opt}$ the number of retrieved patterns vanishes for large $p$, as it should, because in the large $m$ limit the standard Hebb’s rule, with its memory deterioration, is recovered.

Fig. 1. - Overlap between the learnt pattern and the retrieved state vs. $\nu$, the number of the learnt pattern, once $p$ patterns were learnt with the best bound value $m = m_{opt}$.

Fig. 2. - Number of well retrieved patterns ($q > 0.97$) vs. number of learnt patterns.

Optimal bound values increase with the network size, but we do not have enough accuracy to establish numerically the law $m_{opt}(N)$. In the next section it is shown that $m_{opt} \simeq 0.3 \sqrt{N}$, and the numerical data are consistent with this prediction. With the optimal bounds, we find a storage capacity $\simeq 0.05\, N$. These results show that learning within irreversible bounds is a model of long term memory in the sense that only old learnt patterns are remembered. The catastrophic deterioration of Hebb’s rule is avoided by stopping the acquisition of new patterns once the memory is saturated. The capacity, and the « best » bound values, are similar to those of reversible learning [4], the « memory which forgets ».

2. Random walk analysis.

For uncorrelated random learnt patterns, the synaptic efficacies $C_{ij}$ perform random walks of steps $\pm 1/N$. In this section we show how a probabilistic analysis gives the maximum memory capacity of the network under a given learning rule. It is based on the following fact, observed in our numerical simulations: when the initial state of the network is a learnt state, then either it remains in this state upon relaxation (retrieval is then perfect, $q = 1$) or it moves away, from the very first Monte Carlo step, to a distant state ($q$ small). This suggests that an analysis based on the first Monte Carlo step should be able to predict the memory capacity of a network with a given learning rule. That this is the case is shown in this and the following sections. We first present the method on Hebb’s rule, for which analytic results and very accurate numerical simulations exist, to show how it works on a simple model, before applying it to learning within bounds.

2.1 HOPFIELD MODEL. - The learning rule is given by (3). When the network is in the learnt state $\xi^\nu$, the field acting on neuron $i$, averaged over all the learnt patterns (assumed random and uncorrelated), is

$$ \bar h_i = \xi_i^\nu \tag{5} $$

Therefore, when the network is allowed to relax, spins should, on average, remain in state $\xi^\nu$. Note that if the initial state is not a learnt state, then $\bar h_i = 0$.

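The second moment of the field distribution for $p$ learnt patterns is:

$$ \overline{h_i^2} = \sum_{j,k}{}' \overline{C_{ij} C_{ik}}\; \xi_j^\nu\, \xi_k^\nu = \frac{p}{N} + \bar h_i^2 \tag{6} $$

The first contribution to $\overline{h_i^2}$ comes from the terms $j = k$. It exists also if the network is not in a learnt state [1, 7]. The second contribution comes from terms $j \neq k$ and is $\bar h_i^2$ (neglecting terms of order $1/N$). The variance of the field acting on a given neuron is then

$$ \Delta = \overline{h_i^2} - \bar h_i^2 = \frac{p}{N} \tag{7} $$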

Therefore, even if the initial state is a learnt state, say $\xi^\nu$, when $p/N$ is large enough, there is some probability that the sign of the field acting on a neuron $i$ is opposite to $\xi_i^\nu$. This probability (we drop the subscript $i$, all neurons being equivalent) is a function of $x = \Delta / \bar h^2$:

$$ P(x) = \frac{1}{2}\, \mathrm{erfc}\!\left( \frac{1}{\sqrt{2x}} \right) $$

For small $x$, the function $P(x)$ vanishes like $\exp(-1/2x)$, and is linear in $x$ in the neighbourhood of $x^* = 1/3$, the inflexion point. It can be approximated (Fig. 3) by a straight line passing through $x^*$, $P(x^*) = 0.042$, of slope $\left. \frac{dP}{dx} \right|_{x^*} = \frac{1}{\sqrt{\pi}} \left( \frac{3}{2e} \right)^{3/2} = 0.2313$, which crosses the $x$ axis at $x_0 = 0.153$. For $x \ll x_0$, $P(x) \simeq 0$. Beyond the crossover point at 0.153, errors in retrieval are expected. From (5) and (7), $\Delta / \bar h^2 = p/N$; the maximum number of patterns that can be learnt before errors in retrieval become important is therefore $p = 0.153\, N$, in excellent agreement with theoretical [3] and numerical [1-2] results ($\alpha = 0.145 \pm 0.009$).
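The numbers quoted for the tangent construction can be checked directly. The sketch below assumes that the field is Gaussian with mean $\bar h$ and variance $\Delta$, so that the probability of a wrong sign is $\frac{1}{2}\mathrm{erfc}(1/\sqrt{2x})$ with $x = \Delta/\bar h^2$; that assumption reproduces the inflexion point, slope and crossover stated in the text. Variable names are illustrative.

```python
import math

def p_error(x):
    """Probability that a Gaussian field of mean h and variance x*h^2 has the wrong sign."""
    return 0.5 * math.erfc(1.0 / math.sqrt(2.0 * x))

x_star = 1.0 / 3.0                       # inflexion point of P(x)
p_star = p_error(x_star)                 # value of P at the inflexion point

# Analytic slope dP/dx at x*: (1/sqrt(pi)) * (3/(2e))^(3/2)
slope = (3.0 / (2.0 * math.e)) ** 1.5 / math.sqrt(math.pi)

# The tangent at x* crosses the x axis at x0.
x0 = x_star - p_star / slope

# Finite-difference checks of the slope and of the vanishing curvature at x*.
eps = 1e-4
slope_fd = (p_error(x_star + eps) - p_error(x_star - eps)) / (2 * eps)
curv_fd = (p_error(x_star + eps) - 2 * p_star + p_error(x_star - eps)) / eps**2

print(round(p_star, 3), round(slope, 4), round(x0, 3))
```

The crossover $x_0$ obtained this way is the constant 0.153 used in the rest of the analysis.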

The prescription for maximum storage capacity is then

$$ \frac{\Delta}{\bar h^2} = x_0 = 0.153 \tag{8} $$

Fig. 3. - Probability of $h_i \xi_i^\nu < 0$ as a function of $x = \Delta / \bar h^2$.

In what follows, the same argument is applied to other learning rules.

2.2 LEARNING WITHIN IRREVERSIBLE BOUNDS. - When the network is in state $\xi^\nu$ the average field acting on a neuron $i$ is (to lowest order in $1/N$)

$$ \bar h_i = \xi_i^\nu\, P(s > \nu) \tag{9} $$

where $P(s > \nu)$ is the probability to perform a random walk of more than $\nu$ steps between absorbing barriers at $m$ and $-m$, without absorption. For large $\nu$ (see Appendix Aa):

$$ P(s > \nu) \simeq \frac{4}{\pi} \cos^\nu\!\left( \frac{\pi}{2m} \right) \tag{10} $$

The variance of the field is easily seen to be:

$$ \Delta = \frac{1}{N} \sum_s s\, P(s) = \frac{\bar s}{N} \tag{11} $$

where $P(s)$ is the probability that absorption takes place in $s$ steps, so that $\bar s$ is the mean number of patterns learnt by a bond before its strength $C_{ij}$ sticks to the bounds. From the random walk problem (Appendix Aa):

$$ \bar s = m^2 \tag{12} $$

Unlike in the Hebbian scheme of learning, in the present case the dispersion of the field values is constant, limited by the bounds. Storage capacity is limited because the average field, constant with Hebb’s rule, now decreases with the pattern number. Therefore, only the first learnt patterns have a field on each neuron large enough to ensure good retrieval. Introducing (9) to (12) into (8) gives the maximum number $\nu$ of patterns expected to be memorized, for a given $m$. After maximization of $\nu$ with respect to $m$, we find $m_{opt} \simeq 0.3 \sqrt{N}$, in very good agreement with our numerical simulations.
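The maximization can be sketched numerically. The code below is not the paper's calculation: it assumes a mean field $\bar h = (4/\pi)\cos^\nu(\pi/2m)$ (the dominant term of the survival probability of Appendix Aa), a variance $\Delta = m^2/N$, and the crossover value 0.153 of prescription (8). The network size N = 10000 and the function name `max_patterns` are illustrative.

```python
import math

X0 = 0.153  # crossover value of Delta / h^2 used as the retrieval criterion

def max_patterns(m, n):
    """Largest nu such that Delta / h(nu)^2 <= X0.

    Assumed forms (a reconstruction, not quoted from the paper):
    h(nu) = (4/pi) * cos(pi/(2m))**nu  and  Delta = m**2 / n.
    """
    target = m / math.sqrt(n * X0)  # field amplitude at which Delta/h^2 = X0
    if target >= 4.0 / math.pi:     # even the first pattern fails the criterion
        return 0.0
    return math.log(math.pi * target / 4.0) / math.log(math.cos(math.pi / (2 * m)))

N = 10_000
best_m = max(range(2, 60), key=lambda m: max_patterns(m, N))
best_nu = max_patterns(best_m, N)

print(best_m / math.sqrt(N), best_nu / N)
```

Under these assumptions the optimum sits near $m \approx 0.30 \sqrt{N}$, the value quoted in the text, with a number of retrievable patterns of a few percent of N.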

2.3 LEARNING WITHIN REVERSIBLE BOUNDS. - With this learning scheme [4], the synaptic efficacies show reversible saturation effects. They stick to the bounds and do not learn those patterns that would make them take values beyond the allowed range. Let $s_{ij}$ be the pattern that produced the last saturation effect on bond $ij$. The values taken by $C_{ij}$ on learning the patterns that follow pattern $s_{ij}$ are all within the allowed range, as if barriers did not exist. After learning a large number of patterns:

The random walk between reversible barriers gets into an equilibrium distribution: $C_{ij}(p)$ takes any of the allowed values $n/N$ ($n = m, m-1, \ldots, -m$) with probability $\frac{1}{2m+1}$. When the network is in state $\xi^\nu$, the field averaged over all the learnt patterns, and its variance, are given by equation (15) (see Appendix Ab), where $\eta = p - \nu$ is the pattern number counted starting from the last learnt one, and $P(t > \eta)$ is the probability of a random walk of more than $\eta$ steps starting from $+m$ or $-m$, without sticking to the barriers. For $\eta \gg 1$ this probability takes the asymptotic form (16) (see Appendix Ab).

The field is now a decreasing function of $\eta$: the effect of learning new patterns is to lower the local fields acting on older patterns, and the variance of the field distribution remains constant. Introducing (15) and (16) into (8), and maximizing $\eta$ with respect to $m$, gives the result (17), in good agreement with numerical results [4]: $m_{opt} \simeq 0.35 \sqrt{N}$; $\eta(m_{opt}) \simeq 0.04\, N$.
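The same estimate can be sketched for reversible bounds. The forms used here are assumptions reconstructed from the random walk picture, not formulas quoted from the paper: a variance $\Delta = m^2/(3N)$ (uniform equilibrium distribution over the allowed values, large $m$) and a mean field $\bar h = (8/\pi^2)\exp(-\pi^2 \eta / 8 m^2)$ (dominant term of the survival probability for a uniform starting point). The size N = 10000 and the name `eta_max` are illustrative.

```python
import math

X0 = 0.153  # crossover value of Delta / h^2

def eta_max(m, n):
    """Largest eta with Delta / h(eta)^2 <= X0 under the assumed forms.

    Assumptions (reconstruction, not from the paper):
    h(eta) = (8/pi^2) * exp(-pi^2 * eta / (8 m^2)),  Delta = m^2 / (3 n).
    Solving Delta / h^2 = X0 for eta gives the expression below.
    """
    arg = 64.0 * 3.0 * X0 * n / (math.pi ** 4 * m * m)
    return 4.0 * m * m / math.pi ** 2 * math.log(arg)

N = 10_000
best_m = max(range(2, 80), key=lambda m: eta_max(m, N))
best_eta = eta_max(best_m, N)

print(best_m / math.sqrt(N), best_eta / N, best_eta / (4 * N))
```

With these assumed forms the optimum lies near $m \approx 0.33 \sqrt{N}$ with $\eta \approx 0.045\, N$, close to the numerical values 0.35 and 0.04 quoted from [4]; a quarter of that capacity is about 0.011 N.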

It is interesting to apply this analysis to learning without synaptic sign changes [5]. The learning rule is the same as (14), but half of the synaptic efficacies are constrained between $m/N$ and 0, the others between 0 and $-m/N$. From the corresponding random walk,

The field decreases faster with $\eta$ than (16), for a given $m$, because the allowed range for the $C_{ij}$ is half as before and therefore saturation effects appear in fewer steps. However, the variance is of the same order of magnitude

Therefore, memory capacity will be smaller than when synaptic sign changes are allowed. Indeed, one finds the same value (Eq. (17a)) for $m_{opt}$ as before (this value is only a function of $\Delta$) but $\eta$ is 4 times smaller, $\eta(m_{opt}) = 0.011\, N$, in fairly good agreement with numerical results [5], which with our

3. Generalization to other learning rules.

The results of section 2 can easily be generalized to learning rules with variable acquisition intensities:

$$ C_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \lambda_\mu\, \xi_i^\mu\, \xi_j^\mu \tag{18} $$

The average field on a neuron, when the network is in the learnt state $\xi^\nu$, is:

$$ \bar h_i = \lambda_\nu\, \xi_i^\nu \tag{19} $$

and the dispersion is given by:

$$ \Delta = \frac{1}{N} \sum_{\mu=1}^{p} \lambda_\mu^2 \tag{20} $$

With Hebb’s rule, $\lambda_\mu = 1$ and the results of section 2.1 are recovered. Here, the condition for pattern $\nu$ to be well retrieved is:

$$ \frac{1}{N \lambda_\nu^2} \sum_{\mu=1}^{p} \lambda_\mu^2 \le 0.153 \tag{21} $$

An example of such a rule is the marginalist learning [5, 9], in which weights increase exponentially in order to ensure good retrieval of the last learnt pattern. Introduction of $\lambda_\mu = e^{\varepsilon^2 \mu / 2N}$ in (19) and (20) shows that within this scheme, both the average field and its dispersion increase with learning. If good retrieval of only the last learnt pattern is imposed, then $\nu = p$ in (21), and the value of $\varepsilon^2$ that ensures this must satisfy (for $p \gg N/\varepsilon^2$)

$$ \frac{1}{\varepsilon^2} \le 0.153 \tag{22} $$

That is, $\varepsilon = 2.56$, which is the value estimated numerically in [5], and is in very good agreement with $\varepsilon = 2.465$, the replica symmetric solution of this model [9]. But it is possible to do better, and ask that the last $\eta$ learnt patterns be retrieved. Introducing $\nu = p - \eta$ into (21), we find:

$$ \eta \le \frac{N}{\varepsilon^2} \ln\!\left( 0.153\, \varepsilon^2 \right) \tag{23} $$

Maximising $\eta$ with respect to $\varepsilon^2$ gives $\varepsilon_{opt}$, the « best » $\varepsilon^2$, and the number of well retrieved states, again in excellent agreement with the theoretical predictions [9]: $\varepsilon_{opt} = 4.108$, $\eta(\varepsilon_{opt}) = 0.04895\, N$.

JOURNAL DE PHYSIQUE. - T. 48, 12, DÉCEMBRE 1987
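The marginalist estimates can be sketched as follows. The sketch assumes weights $\lambda_\mu = e^{\varepsilon^2 \mu / 2N}$, a retrieval criterion $\Delta / \bar h^2 \le 0.153$, and the large-$p$ limit in which the ratio for the pattern learnt $\eta$ steps before the last one is $e^{\varepsilon^2 \eta / N} / \varepsilon^2$; these forms are a reconstruction from the surrounding definitions rather than formulas quoted from the paper, and the grid search is just one simple way to locate the maximum.

```python
import math

X0 = 0.153  # crossover value of Delta / h^2

# Retrieval of the last learnt pattern only: 1/eps^2 = X0.
eps_last = 1.0 / math.sqrt(X0)

def eta_over_n(eps2):
    """eta/N allowed by the criterion, assuming (1/eps^2) exp(eps^2 eta/N) = X0."""
    return math.log(X0 * eps2) / eps2

# Simple grid search over eps^2 for the largest number of retrieved patterns.
grid = [6.6 + 0.001 * k for k in range(40_000)]
best_eps2 = max(grid, key=eta_over_n)
eps_opt = math.sqrt(best_eps2)
eta_opt = eta_over_n(best_eps2)

print(round(eps_last, 2), round(eps_opt, 2), round(eta_opt, 4))
```

Under these assumptions the first value reproduces $\varepsilon = 2.56$, and the optimum falls at $\varepsilon^2 = e/0.153$, that is $\varepsilon_{opt} \approx 4.2$ and $\eta \approx 0.056\, N$, to be compared with the replica values 4.108 and 0.04895 N cited from [9].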

Result (21) shows that the normalization of the $C_{ij}$ that consists of dividing it by $\left( \sum_{\mu=1}^{p} \lambda_\mu^2 \right)^{1/2}$ does not affect the memory capacity, and also suggests how other selective learning rules can be devised. It is possible, for example, to give stronger weights to the most « important » patterns, in order to keep them in memory even when other patterns are forgotten, or to reinforce [9] the memorization of a given pattern $\nu$ when it is at the limit of being erased (sign = in (21)), by learning it again.

Conclusion.

We analysed different schemes of learning sequences of uncorrelated patterns. When the network is in a learnt state, the average value $\bar h$ of the field acting on a given neuron, produced by all the others, has the same sign as the neuron’s spin. The network should remain in the learnt state. The probability to have a field of opposite sign is vanishingly small for a small number of stored patterns, but the crossover to a regime where this probability increases almost linearly sets an upper limit to storage capacity. The maximum storage capacity is attained when $\Delta / \bar h^2 = 0.153$, where $\Delta$ is the mean square width of the field distribution. We tested this prescription on several models of learning within bounds, proposed as models of short and long term memory. The estimated storage capacity and the best bound values are in excellent agreement with the numerical results.

With Hebb’s rule, $\bar h = 1$ and remains constant with pattern acquisition, while $\Delta$ increases. At crossover, because $\Delta$ and $\bar h$ are the same for all learnt patterns, all of them are « forgotten » together. In learning within bounds, $\Delta$ is constant and $\bar h$ decreases with the pattern number: memory is lost only of those patterns that have small values of $\bar h$. Generalization to other learning schemes is straightforward: the storage capacity with a given rule can be estimated once $\bar h$ and $\Delta$ are known.

The fact that our predictions, based on a first Monte Carlo step, are so successful, suggests that the size of the basins of attraction at maximum storage capacity is $\sim N / [2\,(\text{max. storage capacity})]$. The factor 2 is there because patterns $\xi$ and $-\xi$ cannot be distinguished in Hopfield’s networks. This extends to other learning rules a result that is exact with Hebb’s rule [10].

Finally, several authors [1, 3, 5, 7, 8] already pointed out that memory deterioration is due to the increasing noise on synaptic efficacies, produced by acquisition of new patterns. Our approach gives a quantitative estimate of storage capacity, until now only available by numerical simulations or, in some special cases, by statistical mechanics calculations.

Acknowledgments.

Useful discussions with Pierre Peretto, who suggested the model of learning within irreversible bounds, are gratefully acknowledged.

Appendix A.

The solution to the random walk between barriers and some intermediate results leading to formulae (10), (12) and (16) are summarized in this appendix.

A(a) ABSORBING BARRIERS. - For a random walk [11] between absorbing barriers at $+m$ and $-m$, the probability of performing a walk of $n$ steps from state $i$ to state $j$ is

$$ P_{ij}(n) = \sum_{k=1}^{2m-1} \lambda_k^n\, v_i^{(k)}\, v_j^{(k)} \tag{A.1} $$

where $\lambda_k = \cos(k\pi/2m)$ are the eigenvalues of the transition probability matrix, and $v_j^{(k)} = \sin[(j+m)k\pi/2m]/\sqrt{m}$ ($j = m-1, \ldots, -m+1$) the corresponding eigenvector components.

The probability of a random walk of more than $\nu$ steps without absorption starting from $i = 0$ is then

$$ P(s > \nu) = \sum_{j=-m+1}^{m-1} P_{0j}(\nu) $$

The dominant contribution to this sum is the term $k = 1$, which gives equation (10).
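The quality of the dominant-term approximation can be checked against the exact survival probability. The following is a minimal sketch assuming NumPy and a simple symmetric walk with unit steps; the prefactor $4/\pi$ is the large-$m$ limit of the $k = 1$ coefficient, and the values $m = 10$ and 100 steps are arbitrary illustrative choices.

```python
import numpy as np

def survival(m, steps):
    """Exact P(no absorption within `steps` steps), walk starting at 0, barriers at +-m."""
    size = 2 * m - 1                       # interior states -m+1, ..., m-1
    t = np.zeros((size, size))
    for a in range(size - 1):
        t[a, a + 1] = t[a + 1, a] = 0.5    # +-1 steps; mass leaving the interior is absorbed
    dist = np.zeros(size)
    dist[m - 1] = 1.0                      # index m-1 corresponds to state 0
    for _ in range(steps):
        dist = dist @ t
    return float(dist.sum())

def dominant_term(m, steps):
    """k = 1 term of the eigenmode expansion, with the large-m prefactor 4/pi."""
    return 4.0 / np.pi * np.cos(np.pi / (2 * m)) ** steps

m, steps = 10, 100
exact = survival(m, steps)
approx = dominant_term(m, steps)
print(exact, approx)
```

For a number of steps well beyond $m^2/\pi^2$ the two values should agree to within a few percent, the residual coming from the finite-$m$ prefactor and from the eigenvalue $-\lambda_1$.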

The mean number of patterns learnt by a given bond $C_{ij}$ before saturation is the mean time to absorption $\bar s$ in the random walk problem. It is the derivative of the generating function of the probability of absorption [11]

$$ f(x) = 2 \sum_s f_{0,m}(s)\, x^s = \frac{2}{A_+^m(x) + A_-^m(x)} $$

where $f_{0,m}(s)$ is the probability of first passage from state 0 to state $m$ in $s$ steps, and $A_\pm(x) = (1 \pm \sqrt{1 - x^2})/x$. It is then easy to check that $\bar s = \lim_{x \to 1} df/dx = m^2$, which gives equation (12).
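The result $\bar s = m^2$ can also be checked without generating functions, by solving the linear equations for the mean absorption time. This is a sketch assuming NumPy and a symmetric unit-step walk; the function name `mean_absorption_time` is illustrative.

```python
import numpy as np

def mean_absorption_time(m):
    """Mean number of steps before a walk starting at 0 first reaches +m or -m.

    The mean times tau_j from the interior states obey tau_j = 1 + (tau_{j-1} + tau_{j+1}) / 2
    with tau = 0 on the barriers, i.e. (I - T) tau = 1 for the interior transition matrix T.
    """
    size = 2 * m - 1                       # interior states -m+1, ..., m-1
    t = np.zeros((size, size))
    for a in range(size - 1):
        t[a, a + 1] = t[a + 1, a] = 0.5
    tau = np.linalg.solve(np.eye(size) - t, np.ones(size))
    return float(tau[m - 1])               # index m-1 corresponds to state 0

for m in (3, 10, 25):
    print(m, mean_absorption_time(m))
```

Each printed time should equal $m^2$ up to rounding, consistent with equation (12).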

A(b) NON ABSORBING BARRIERS. - The stationary probability distribution is given by the eigenvector of the eigenvalue 1 of the transition probability matrix. This gives the same probability for all the $2m + 1$ allowed states, namely $(2m + 1)^{-1}$.
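The uniform stationary distribution can be illustrated by iterating the transition matrix. The sketch assumes NumPy and models the reversible barriers as rejected steps: a bond sitting at $\pm m$ simply stays there when the next pattern would push it outside the allowed range. The value $m = 5$ and the number of iterations are arbitrary illustrative choices.

```python
import numpy as np

def reversible_walk_matrix(m):
    """Transition matrix on states -m..m; a step that would leave the range is rejected."""
    size = 2 * m + 1
    t = np.zeros((size, size))
    for a in range(size):
        for b in (a - 1, a + 1):
            if 0 <= b < size:
                t[a, b] = 0.5
            else:
                t[a, a] += 0.5             # the bond sticks to the barrier
    return t

m = 5
t = reversible_walk_matrix(m)
dist = np.zeros(2 * m + 1)
dist[m] = 1.0                              # start from C_ij = 0
for _ in range(5000):                      # many learnt patterns
    dist = dist @ t

print(np.round(dist, 4), 1.0 / (2 * m + 1))
```

Because this matrix is symmetric, the uniform vector is left invariant, and the iterated distribution converges to $(2m+1)^{-1}$ on every allowed state.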

We are interested in the walks of more than $\eta$ steps starting from $+m$ or $-m$ that do not stick to the barriers. Their probability $P(t > \eta)$ can be deduced from the random walk between absorbing barriers at $m + 1$ and $-(m + 1)$ as the sum of the following terms: 1) the random walks starting at $m$, making a first step of $-1$ and then $\eta - 1$ steps without absorption; 2) those starting at $-m$, making a first step of $+1$ and then $\eta - 1$ steps without absorption; 3) those walks starting at $n$ ($-m + 1 \le n \le m - 1$) performing $\eta$ steps without absorption.

Each of these terms enters in the sum multiplied by the probability $(2m + 1)^{-1}$ of starting the random walk at the corresponding point. The problem is therefore reduced to calculating sums of terms of the form (A.1) with $m + 1$ instead of $m$. The dominant term of the sum gives equation (16).

References

[1] HOPFIELD, J. J., Proc. Natl. Acad. Sci. (USA) 79 (1982) 2554.

[2] PERETTO, P., On learning rules and memory storage abilities of neural networks, preprint (1987).

[3] CRISANTI, A., AMIT, D. J., GUTFREUND, H., Europhys. Lett. 2 (1986) 337.

[4] PARISI, G., J. Phys. A 19 (1986) L 617.

[5] NADAL, J. P., TOULOUSE, G., CHANGEUX, J. P. and DEHAENE, S., Europhys. Lett. 1 (1986) 535.

[6] This model has been suggested by P. Peretto.

[7] PERETTO, P., NIEZ, J. J., Biol. Cybern. 54 (1986) 1.

[8] WEISBUCH, G., FOGELMAN-SOULIÉ, F., J. Physique Lett. 46 (1985) L 623.

[9] MÉZARD, M., NADAL, J. P. and TOULOUSE, G., J. Physique 47 (1986) 1457.

[10] COTTRELL, M., Preprint 1987.

[11] COX, D. R., MILLER, H. D., The theory of stochastic processes (Ed. Chapman and Hall Ltd., London) 1977.
