Memory capacity of neural networks learning within bounds


HAL Id: jpa-00210652

https://hal.archives-ouvertes.fr/jpa-00210652

Submitted on 1 Jan 1987



Mirta Gordon

To cite this version:

Mirta Gordon. Memory capacity of neural networks learning within bounds. Journal de Physique, 1987, 48 (12), pp.2053-2058. 10.1051/jphys:0198700480120205300. jpa-00210652


Memory capacity of neural networks learning within bounds

Mirta B. Gordon

Centre d’Etudes Nucléaires de Grenoble, Département de Recherche Fondamentale/Service de Physique, Groupe Magnétisme et Diffraction Neutronique (*), 85 X, 38041 Grenoble Cedex, France

(Received 7 July 1987, accepted 12 August 1987)

Abstract. - We present a model of long term memory: learning within irreversible bounds. The best bound values and memory capacity are determined numerically. We show that it is possible in general to calculate analytically the memory capacity by solving the random walk problem associated to a given learning rule. Our estimates, made for several learning rules, are in excellent agreement with numerical and analytical statistical mechanics results.

Classification

Physics Abstracts

75.10H - 64.60 - 87.30

In the last few years, a great amount of work has been done on the properties of networks of formal neurons, proposed by Hopfield [1] as models of associative memories. In these models, each neuron i is represented by a spin variable $\sigma_i$ which can take only two values, $\sigma_i = 1$ or $\sigma_i = -1$. Any state of the system is defined by the values $\{\sigma_1, \sigma_2, \ldots, \sigma_N\} \equiv \sigma$ taken by each one of the N spins or neurons. Pairs of neurons i, j interact with strengths $C_{ij}$, the synaptic efficacies, which are modified by learning. As usual, we denote $\xi^\nu$ ($\nu = 1, 2, \ldots$) the learnt states or patterns. Retrieval of patterns is a dynamic process in which each spin takes the sign of the local field:

$$h_i = {\sum_j}' \, C_{ij}\,\sigma_j \qquad (1)$$

acting on it. The primed sum means that terms j = i should be ignored. A learnt state $\xi^\nu$ is said to be memorized or retrieved if, starting with the network in state $\xi^\nu$, it relaxes towards a final state close to $\xi^\nu$. In general, the final state can be very different from $\xi^\nu$, and will be denoted $\tilde{\xi}^\nu$. The overlap between both:

$$q^\nu = \frac{1}{N}\sum_{i=1}^{N} \xi_i^\nu\,\tilde{\xi}_i^\nu \qquad (2)$$

gives a measure of retrieval quality.

The simplest local learning prescription [2] for p learnt patterns is Hebb’s rule:

$$C_{ij} = \frac{1}{N}\sum_{\mu=1}^{p} \xi_i^\mu\,\xi_j^\mu \qquad (3)$$

Assuming that the values of $\xi_i^\mu$ are random and uncorrelated, it has been shown [1-3] that the maximum number of patterns p that can be memorized with Hebb’s learning rule is proportional to the number of neurons: $p = \alpha N$, with $\alpha = 0.145 \pm 0.009$. If more than $\alpha N$ patterns are learnt, memory breaks down and none of the learnt patterns is retrieved.
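The retrieval test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the simulations reported here: it takes the Hebbian couplings with a 1/N normalization, uses zero-temperature sequential updates, and resolves a zero field towards +1; the function names and the sizes N = 80, P = 3 are hypothetical.

```python
import random

def hebb_couplings(patterns):
    """C_ij = (1/N) * sum_mu xi_i^mu xi_j^mu, with C_ii = 0 (primed sum)."""
    n = len(patterns[0])
    c = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    c[i][j] += xi[i] * xi[j] / n
    return c

def relax(c, state, max_sweeps=100):
    """Sequential dynamics: each spin takes the sign of its local field."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(c[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1   # assumption: a zero field maps to +1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:                 # fixed point reached
            break
    return s

def overlap(a, b):
    """q = (1/N) * sum_i a_i b_i."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

random.seed(0)
N, P = 80, 3                            # load P/N far below 0.145
patterns = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
C = hebb_couplings(patterns)
overlaps = [overlap(xi, relax(C, xi)) for xi in patterns]
```

At this low load every stored pattern should be a fixed point of the dynamics, so each entry of `overlaps` is expected to be close to 1; raising P towards 0.145 N is where retrieval would be expected to degrade.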

In order to avoid this catastrophic effect, different modifications of Hebb’s rule were proposed [4-6]. The simplest one is the so-called learning within bounds [5]: synaptic efficacies are modified by learning in the same way as with Hebb’s rule, but their values are constrained to remain within some chosen range. In the version proposed by Parisi [4], bounds are reversible: once a $C_{ij}$ reaches a barrier, it remains at that value until a pattern is learnt that returns it inside the allowed range. This is a model of short term memory: only the last learnt patterns are retrieved, and old memories are gradually erased by learning. With this learning rule no deterioration occurs, but the storage capacity is smaller than with Hebb’s rule.

Article published online by EDP Sciences and available at http://dx.doi.org/10.1051/jphys:0198700480120205300

In the first part of this paper, we present numerical simulations on a model of long term memory, which is an irreversible version of learning within bounds: those synaptic efficacies that reach a bound remain at that value for ever [6]. The best bounds and the storage capacity are similar to those found with reversible bounds, but now the first, and not the last, learnt patterns are memorized.

In the second part of the paper, we show that a quantitative analysis of the random walk associated to each learning rule gives a very good estimate of the network’s memory capacity. We present results for the standard Hebb’s rule and for different variants of learning within bounds. Generalization to other learning rules is straightforward, and is presented in section 3.

1. Learning within irreversible bounds. Numerical simulations.

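The learning rule with irreversible bounds or barriers at $\pm m/N$ is

$$C_{ij}(\mu) = \begin{cases} C_{ij}(\mu - 1) + \dfrac{1}{N}\,\xi_i^\mu\,\xi_j^\mu & \text{if } \mu \le s_{ij}, \\ C_{ij}(s_{ij}) & \text{if } \mu > s_{ij}, \end{cases} \qquad (4)$$

with $C_{ij}(0) = 0$. $s_{ij}$ is the pattern number for which $C_{ij}$ first reaches a bound. Patterns after $s_{ij}$ are not learnt and the synaptic efficacy is saturated. For $m \to \infty$, the standard Hebb’s rule is recovered. But, unlike in Hebbian learning, with rule (4) the number $\mu$, the « time » at which $\xi^\mu$ is learnt, is relevant.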
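A short sketch may help fix the bookkeeping of the irreversible rule. It is an illustration under an assumed reading of the rule, namely that each bond takes Hebbian steps of ±1/N until its value first reaches ±m/N and is frozen afterwards; the integer counts `k[i][j]` (equal to N times the efficacy) and the function name are hypothetical conveniences.

```python
import random

def learn_irreversible(patterns, m):
    """Return integer counts k_ij = N * C_ij after learning the patterns in order.

    Assumed reading of the rule: a bond takes Hebbian steps of +-1 (in units
    of 1/N) until |k_ij| first reaches m, and is frozen from then on.
    """
    n = len(patterns[0])
    k = [[0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(i + 1, n):
                if abs(k[i][j]) < m:          # bond not yet saturated
                    k[i][j] += xi[i] * xi[j]
                    k[j][i] = k[i][j]         # keep the matrix symmetric
    return k

random.seed(1)
N = 10
patterns = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(200)]
k_small = learn_irreversible(patterns, m=2)       # tight bounds: bonds saturate
k_large = learn_irreversible(patterns, m=10**6)   # bounds never reached: plain Hebb
```

With tight bounds the order of presentation matters, since only the patterns seen before saturation leave a trace on a bond; with very large bounds the counts reduce to the Hebbian sums.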

In our numerical simulations, random patterns were learnt following (4). Each time a new pattern was added, the retrieval quality of all the previously stored patterns was tested: starting with the network in a learnt state, spins are allowed to flip with Monte Carlo sequential dynamics until relaxation to a state in which each spin takes the sign of the field (1) acting on it. A learnt pattern is considered as well memorized if its overlap q with the relaxed state is q > 0.97. Any other threshold would give nearly the same results because patterns are retrieved either almost without error ($q \simeq 1$), or with $q \ll 1$.

The bound value giving the maximal number of well retrieved patterns, $m_{\rm opt}$, was determined for networks with N = 100, 150, 200 and 400 neurons by testing different values of m. Figure 1 shows the retrieval quality (2) as a function of the pattern number, for N = 400. With the best bounds ($m_{\rm opt}$), the overlap jumps abruptly from 1 to a small value, showing that only the first learnt patterns are memorized. Figure 2 is a plot of the number of well retrieved patterns versus the number of learnt patterns. For $m < m_{\rm opt}$, a smaller number of patterns are retrieved in the asymptotic regime (p large), and for $m > m_{\rm opt}$ the number of retrieved patterns vanishes for large p, as it should, because in the large m limit, the standard Hebb’s rule, with its memory deterioration, is recovered.

Fig. 1. - Overlap between the learnt pattern and the retrieved state vs. $\mu$, the number of the learnt pattern, once p patterns were learnt with the best bound value $m = m_{\rm opt}$.

Fig. 2. - Number of well retrieved patterns (q > 0.97) vs. number of learnt patterns.

Optimal bound values grow with the network size, but we do not have enough accuracy to establish numerically the law $m_{\rm opt}(N)$. In the next section it is shown that $m_{\rm opt} \simeq 0.3\sqrt{N}$, and numerical data are consistent with this prediction. With the optimal bounds, we find a storage capacity $\simeq 0.05\,N$. These results show that learning within irreversible bounds is a model of long term memory in the sense that only old learnt patterns are remembered. The catastrophic deterioration of Hebb’s rule is avoided by stopping the acquisition of new patterns once the memory is saturated. The capacity, and the « best » bound values, are similar to those of reversible learning [4], the « memory which forgets ».

2. Random walk analysis.

For uncorrelated random learnt patterns, the synaptic efficacies $C_{ij}$ perform random walks of steps $\pm 1/N$. In this section we show how a probabilistic analysis gives the maximum memory capacity of the network under a given learning rule. It is based on the following fact, observed in our numerical simulations: when the initial state of the network is a learnt state, then either it remains in this state upon relaxation (retrieval is then perfect, q = 1) or it moves away, from the very first Monte Carlo step, to a distant state (q small). This suggests that an analysis based on the first Monte Carlo step should be able to predict the memory capacity of a network with a given learning rule. That this is the case is shown in this and the following sections. We first present the method on Hebb’s rule, for which analytic results and very accurate numerical simulations exist, to show how it works on a simple model, before applying it to learning within bounds.

2.1 HOPFIELD MODEL. - The learning rule is given by (3). When the network is in the learnt state $\xi^\nu$, the field acting on neuron i, averaged over all the learnt patterns (assumed random and uncorrelated), is

$$\bar{h}_i = \xi_i^\nu \qquad (5)$$

Therefore, when the network is allowed to relax, spins should, on the average, remain in state $\xi^\nu$. Note that if the initial state is not a learnt state, then $\bar{h}_i = 0$.

The second moment of the field distribution for p learnt patterns is:

$$\overline{h_i^2} = \overline{{\sum_{j,k}}' \, C_{ij}\,C_{ik}\,\xi_j^\nu\,\xi_k^\nu} = \frac{p}{N} + \bar{h}_i^2 \qquad (6)$$

The first contribution to $\overline{h_i^2}$ comes from the terms j = k. It exists also if the network is not in a learnt state [1, 7]. The second contribution comes from terms j ≠ k and is $\bar{h}_i^2$ (neglecting terms of order 1/N). The variance of the field acting on a given neuron is then

$$\Delta \equiv \overline{h_i^2} - \bar{h}_i^2 = \frac{p}{N} \qquad (7)$$
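The two moments of the field under Hebb’s rule can be checked by direct sampling. The sketch below is an illustration only: it assumes random ±1 patterns and couplings with the 1/N normalization, and evaluates the aligned field $h_i \xi_i^\nu$ through the pattern overlaps instead of building the coupling matrix; the function name and the sizes used are hypothetical.

```python
import random

def field_stats(n, p, seed=0):
    """Mean and variance of h_i * xi_i^nu under Hebb's rule, over all i and nu."""
    rng = random.Random(seed)
    xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    values = []
    for nu in range(p):
        # M[mu] = sum_j xi_j^mu xi_j^nu, unnormalized overlap of patterns mu and nu
        M = [sum(a * b for a, b in zip(xi[mu], xi[nu])) for mu in range(p)]
        for i in range(n):
            # h_i xi_i^nu = (1/N) sum_mu (xi_i^mu xi_i^nu M[mu] - 1);
            # the -1 removes the excluded term j = i of the primed sum
            h = sum(xi[mu][i] * xi[nu][i] * M[mu] - 1 for mu in range(p)) / n
            values.append(h)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

mean, var = field_stats(n=400, p=40)   # p/N = 0.1
```

The sampled mean should sit close to 1 and the sampled variance close to p/N, up to corrections of order 1/N and finite-sample fluctuations.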

Therefore, even if the initial state is a learnt state, say $\xi^\mu$, when p/N is large enough, there is some probability that the sign of the field acting on a neuron i is opposite to $\xi_i^\mu$. This probability (we drop subscript i, all neurons being equivalent) is a function of $x = \Delta/\bar{h}^2$; for a Gaussian distribution of fields,

$$P(x) = \frac{1}{\sqrt{2\pi}} \int_{1/\sqrt{x}}^{\infty} e^{-t^2/2}\,dt$$

For small x, the function P(x) vanishes like $\exp(-1/(2x))$, and is linear in x in the neighbourhood of $x^* = 1/3$, the inflexion point. It can be approximated (Fig. 3) by a straight line passing through $x^*$, with $P(x^*) = 0.042$, of slope

$$\left.\frac{dP}{dx}\right|_{x^*} = \frac{1}{\sqrt{\pi}} \left(\frac{3}{2e}\right)^{3/2} = 0.2313,$$

which crosses the x axis at $x_0 = 0.153$. For $x \ll x_0$, $P(x) \simeq 0$. Beyond the crossover point at 0.153, errors in retrieval are expected. From (5) and (7), $\Delta/\bar{h}^2 = p/N$; the maximum number of patterns that can be learnt before errors in retrieval become important is therefore p = 0.153 N, in excellent agreement with theoretical [3] and numerical [1-2] results ($\alpha = 0.145 \pm 0.009$).
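The numbers quoted for the tangent construction can be reproduced with the standard library. The sketch assumes a Gaussian field distribution, so that the probability of a wrong sign is a complementary error function of $x = \Delta/\bar{h}^2$; the function names are hypothetical.

```python
import math

def p_wrong_sign(x):
    """P(x) = Prob(h * xi < 0) for a Gaussian field, with x = Delta / h_bar**2."""
    return 0.5 * math.erfc(1.0 / math.sqrt(2.0 * x))

def slope(x):
    """dP/dx = exp(-1/(2x)) / (2 * sqrt(2*pi) * x**1.5)."""
    return math.exp(-1.0 / (2.0 * x)) / (2.0 * math.sqrt(2.0 * math.pi) * x ** 1.5)

x_star = 1.0 / 3.0                   # inflexion point of P(x)
p_star = p_wrong_sign(x_star)        # value of P at the inflexion point
s_star = slope(x_star)               # slope of the tangent at the inflexion point
x0 = x_star - p_star / s_star        # where the tangent crosses the x axis
```

Evaluating these expressions gives values consistent with the 0.042, 0.2313 and 0.153 quoted in the text, to the number of digits given there.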

The prescription for maximum storage capacity is then

$$\frac{\Delta}{\bar{h}^2} = 0.153 \qquad (8)$$

Fig. 3. - Probability of $h_i \xi_i^\nu < 0$ as a function of $x = \Delta/\bar{h}^2$.

In what follows, the same argument is applied to other learning rules.

2.2 LEARNING WITHIN IRREVERSIBLE BOUNDS. - When the network is in state $\xi^\nu$, the average field acting on a neuron i is (to lowest order in 1/N)

$$\bar{h}_i = \xi_i^\nu \, P(s > \nu) \qquad (9)$$

where $P(s > \nu)$ is the probability to perform a random walk of more than $\nu$ steps between absorbing barriers at m and -m, without absorption. For large $\nu$ (see Appendix A(a)):

$$P(s > \nu) \simeq \frac{4}{\pi} \cos^{\nu}\!\left(\frac{\pi}{2m}\right) \qquad (10)$$

The variance of the field is easily seen to be:

$$\Delta = \frac{1}{N} \sum_s s\,P(s) = \frac{\bar{s}}{N} \qquad (11)$$

where P(s) is the probability that absorption takes place in s steps, so that $\bar{s}$ is the mean number of patterns learnt by a bond before its strength $C_{ij}$ sticks to the bounds. From the random walk problem (Appendix A(a)):

$$\bar{s} = m^2 \qquad (12)$$

Unlike in the Hebbian scheme of learning, in the present case the dispersion of the field values is constant, limited by the bounds. Storage capacity is limited because the average field, constant with Hebb’s rule, now decreases with the pattern number. Therefore, only the first learnt patterns have a field on each neuron large enough to ensure good retrieval. Introducing (9) to (12) into (8) gives the maximum number $\nu$ of patterns expected to be memorized, for a given m. After maximization of $\nu$ with respect to m, we find $m_{\rm opt} \simeq 0.3\sqrt{N}$, in very good agreement with our numerical simulations.
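The maximization over m can be illustrated numerically. The sketch below is not the calculation of the paper: it assumes the closed forms $\bar{h} = (4/\pi)\cos^{\nu}(\pi/2m)$ for the average field (dominant term of the absorbing-barrier walk) and $\Delta = m^2/N$ for the variance, inserts them in the prescription $\Delta/\bar{h}^2 = 0.153$, and scans integer values of m. The function names and the choice N = 10000 are hypothetical.

```python
import math

X0 = 0.153  # crossover value of Delta / h_bar**2 from section 2.1

def nu_max(m, n):
    """Largest nu with Delta / h_bar**2 <= X0, under the assumed closed forms
    h_bar = (4/pi) * cos(pi/(2m))**nu  and  Delta = m**2 / n."""
    target = math.pi ** 2 * m ** 2 / (16.0 * X0 * n)   # required cos(.)**(2*nu)
    if target >= 1.0:
        return 0.0                                      # bound too wide: no pattern passes
    return math.log(target) / (2.0 * math.log(math.cos(math.pi / (2.0 * m))))

def best_bound(n):
    """Scan integer bounds m >= 2 and return (m_opt, nu_max(m_opt))."""
    candidates = range(2, int(math.sqrt(n)) + 1)
    m_opt = max(candidates, key=lambda m: nu_max(m, n))
    return m_opt, nu_max(m_opt, n)

N = 10000
m_opt, capacity = best_bound(N)
```

Under these assumptions the optimal bound comes out close to $0.3\sqrt{N}$, the value quoted in the text, and the corresponding capacity is a few percent of N; the precise capacity depends on the approximations chosen for the field.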

2.3 LEARNING WITHIN REVERSIBLE BOUNDS. - With this learning scheme [4], the synaptic efficacies show reversible saturation effects. They stick to the bounds and do not learn those patterns that would make them take values beyond the allowed range. Let $s_{ij}$ be the pattern that produced the last saturation effect on bond ij. The values taken by $C_{ij}$ on learning the patterns that follow pattern $s_{ij}$ are all within the allowed range, as if barriers did not exist. After learning a large number of patterns:

$$C_{ij}(p) = C_{ij}(s_{ij}) + \frac{1}{N} \sum_{\mu = s_{ij} + 1}^{p} \xi_i^\mu\,\xi_j^\mu \qquad (14)$$

The random walk between reversible barriers gets into an equilibrium distribution: $C_{ij}(p)$ takes any of the allowed values n/N (n = m, m - 1, ..., -m) with probability $1/(2m + 1)$. When the network is in state $\xi^\nu$, the field averaged over all the learnt patterns and its variance follow from this distribution (equations (15), see Appendix A(b)). They involve $\eta = p - \nu$, the pattern number counted starting from the last learnt one, and $P(t > \eta)$, the probability of a random walk of more than $\eta$ steps starting from +m or -m, without sticking to the barriers. The form of this probability for $\eta \gg 1$ (equation (16)) is obtained in Appendix A(b).

The field is now a decreasing function of $\eta$: the effect of learning new patterns is to lower the local fields acting on older patterns, and the variance of the field distribution remains constant. Introducing (15) and (16) into (8), and maximizing $\eta$ with respect to m, gives values of $m_{\rm opt}$ and $\eta(m_{\rm opt})$ (equations (17)) in good agreement with numerical results [4]: $m_{\rm opt} \simeq 0.35\sqrt{N}$; $\eta(m_{\rm opt}) \simeq 0.04\,N$.
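The same maximization can be sketched for reversible bounds. The closed forms used here are assumptions, worked out from the description of the walk and not quoted from the text: a uniform stationary distribution over the 2m + 1 allowed values gives $\Delta = m(m+1)/3N$, and the dominant term of the walk between absorbing barriers at ±(m + 1) gives $\bar{h} \simeq (8/\pi^2)\cos^{\eta}(\pi/2(m+1))$. Function names and N = 10000 are hypothetical.

```python
import math

X0 = 0.153  # crossover value of Delta / h_bar**2 from section 2.1

def eta_max(m, n):
    """Largest eta with Delta / h_bar**2 <= X0, under the assumed forms
    h_bar = (8/pi**2) * cos(pi/(2(m+1)))**eta  and  Delta = m(m+1)/(3n)."""
    target = math.pi ** 4 * m * (m + 1) / (192.0 * X0 * n)   # required cos(.)**(2*eta)
    if target >= 1.0:
        return 0.0
    return math.log(target) / (2.0 * math.log(math.cos(math.pi / (2.0 * (m + 1)))))

def best_bound(n):
    """Scan integer bounds m >= 1 and return (m_opt, eta_max(m_opt))."""
    candidates = range(1, int(math.sqrt(n)) + 1)
    m_opt = max(candidates, key=lambda m: eta_max(m, n))
    return m_opt, eta_max(m_opt, n)

N = 10000
m_opt, capacity = best_bound(N)
```

With these assumed forms the optimum lies near $0.33\sqrt{N}$ with a capacity of roughly 0.045 N, of the same order as the numerical values of reference [4] quoted above.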

It is interesting to apply this analysis to learning without synaptic sign changes [5]. The learning rule is the same as (14), but half of the synaptic efficacies are constrained between m/N and 0, the others between 0 and -m/N. From the corresponding random walk, the field decreases faster with $\eta$ than (16), for a given m, because the allowed range for the $C_{ij}$ is half as wide as before and therefore saturation effects appear in fewer steps. However, the variance is of the same order of magnitude. Therefore, memory capacity will be smaller than when synaptic sign changes are allowed. Indeed, one finds the same value (Eq. (17a)) for $m_{\rm opt}$ as before (this value is only a function of $\Delta$) but $\eta$ is 4 times smaller, $\eta(m_{\rm opt}) = 0.011\,N$, in fairly good agreement with numerical results [5].

3. Generalization to other learning rules.

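The results of section 2 can easily be generalized to learning rules with variable acquisition intensities $\lambda_\mu$:

$$C_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \lambda_\mu\,\xi_i^\mu\,\xi_j^\mu$$

The average field on a neuron, when the network is in the learnt state $\xi^\nu$, is:

$$\bar{h}_i = \lambda_\nu\,\xi_i^\nu \qquad (19)$$

and the dispersion is given by:

$$\Delta = \frac{1}{N} \sum_{\mu=1}^{p} \lambda_\mu^2 \qquad (20)$$

With Hebb’s rule, $\lambda_\mu = 1$ and the results of section 2.1 are recovered. Here, the condition for pattern $\nu$ to be well retrieved is:

$$\frac{1}{N\,\lambda_\nu^2} \sum_{\mu=1}^{p} \lambda_\mu^2 \le 0.153 \qquad (21)$$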

An example of such a rule is the marginalist learning [5, 9], in which weights increase exponentially in order to ensure good retrieval of the last learnt pattern. Introduction of $\lambda_\mu = e^{\varepsilon^2 \mu / 2N}$ in (19) and (20) shows that within this scheme, both the average field and its dispersion increase with learning. If good retrieval of only the last learnt pattern is imposed, then $\nu = p$ in (21), and the value of $\varepsilon^2$ that ensures this must satisfy (for large p)

$$\frac{1}{\varepsilon^2} = 0.153 \qquad (22)$$

That is, $\varepsilon = 2.56$, which is the value estimated numerically in [5], and is in very good agreement with $\varepsilon = 2.465$, the replica symmetric solution of this model [9]. But it is possible to do better, and ask that the last $\eta$ learnt patterns be retrieved. Introducing $\nu = p - \eta$ into (21), we find:

$$\frac{e^{\varepsilon^2 \eta / N}}{\varepsilon^2} = 0.153 \qquad (23)$$

Maximising $\eta$ with respect to $\varepsilon^2$ gives $\varepsilon_{\rm opt}$, the « best » $\varepsilon$, and the number of well retrieved states:

$$\varepsilon_{\rm opt}^2 = \frac{e}{0.153}, \quad \varepsilon_{\rm opt} \simeq 4.21, \qquad \eta(\varepsilon_{\rm opt}) = \frac{N}{\varepsilon_{\rm opt}^2} \simeq 0.056\,N \qquad (24)$$

again in excellent agreement with the theoretical predictions [9]: $\varepsilon_{\rm opt} = 4.108$, $\eta(\varepsilon_{\rm opt}) = 0.04895\,N$.

JOURNAL DE PHYSIQUE. - T. 48, N° 12, DÉCEMBRE 1987
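The marginalist estimates can be checked with a few lines of arithmetic. The sketch assumes intensities $\lambda_\mu = e^{\varepsilon^2\mu/2N}$, an average field $\lambda_\nu$, a dispersion $(1/N)\sum_\mu \lambda_\mu^2$ and the threshold 0.153; in the large-p limit the ratio for the last pattern tends to $1/\varepsilon^2$. These closed forms are worked out here as an illustration, and the function name and sizes are hypothetical.

```python
import math

X0 = 0.153  # crossover value of Delta / h_bar**2 from section 2.1

def noise_to_signal(eps2, n, p, nu):
    """Delta / h_bar**2 for pattern nu with lambda_mu = exp(eps2 * mu / (2 n))."""
    total = sum(math.exp(eps2 * mu / n) for mu in range(1, p + 1))  # sum of lambda_mu**2
    return total / (n * math.exp(eps2 * nu / n))

# Last pattern only (nu = p), large p: the ratio tends to 1 / eps2.
eps_last = math.sqrt(1.0 / X0)

# Last eta patterns: exp(eps2 * eta / n) / eps2 = X0, hence
# eta / n = ln(X0 * eps2) / eps2, which is largest at eps2 = e / X0.
eps2_opt = math.e / X0
eps_opt = math.sqrt(eps2_opt)
eta_over_n = math.log(X0 * eps2_opt) / eps2_opt

# Direct finite-size check of the large-p limit for the last pattern.
check = noise_to_signal(eps_last ** 2, n=1000, p=3000, nu=3000)
```

The direct sum in `check` should land close to 0.153, which confirms that the large-p limit is already accurate for p equal to a few times N.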

Result (21) shows that the normalization of the $C_{ij}$ that consists of dividing it by $\sqrt{\sum_{\mu=1}^{p} \lambda_\mu^2}$ does not affect the memory capacity, and also suggests how other selective learning rules can be devised. It is possible, for example, to give stronger weights to the most « important » patterns, in order to keep them in memory even when other patterns are forgotten, or to reinforce [9] the memorization of a given pattern $\nu$ when it is at the limit of being erased (sign = in (21)), by learning it again.

Conclusion.

We analysed different schemes of learning sequences of uncorrelated patterns. When the network is in a learnt state, the average value $\bar{h}$ of the field acting on a given neuron, produced by all the others, has the same sign as the neuron’s spin. The network should remain in the learnt state. The probability of having a field of opposite sign is vanishingly small for a small number of stored patterns, but the crossover to a regime where this probability increases almost linearly sets an upper limit to storage capacity. The maximum storage capacity is attained when $\Delta/\bar{h}^2 = 0.153$, where $\Delta$ is the mean square width of the field distribution. We tested this prescription on several models of learning within bounds, proposed as models of short and long term memory. The estimated storage capacity and the best bound values are in excellent agreement with the numerical results.

With Hebb’s rule, $\bar{h} = 1$ and remains constant with pattern acquisition, while $\Delta$ increases. At crossover, because $\Delta$ and $\bar{h}$ are the same for all learnt patterns, all of them are « forgotten » together. In learning within bounds, $\Delta$ is constant and $\bar{h}$ decreases with the pattern number: memory is lost only of those patterns that have small values of $\bar{h}$. Generalization to other learning schemes is straightforward: the storage capacity with a given rule can be estimated once $\bar{h}$ and $\Delta$ are known.

The fact that our predictions, based on a first Monte Carlo step, are so successful suggests that the size of the basins of attraction at maximum storage capacity is ~ N / [2 (max. storage capacity)]. The factor 2 is there because patterns $\xi$ and $-\xi$ cannot be distinguished in Hopfield’s networks. This extends to other learning rules a result that is exact with Hebb’s rule [10].

Finally, several authors [1, 3, 5, 7, 8] already pointed out that memory deterioration is due to the increasing noise on synaptic efficacies, produced by acquisition of new patterns. Our approach gives a quantitative estimate of storage capacity, until now only available by numerical simulations or, in some special cases, by statistical mechanics calculations.

Acknowledgments.

Useful discussions with Pierre Peretto, who suggested the model of learning within irreversible bounds, are gratefully acknowledged.

Appendix A.

The solution to the random walk between barriers and some intermediate results leading to formulae (10), (12) and (16) are summarized in this appendix.

A(a) ABSORBING BARRIERS. - For a random walk [11] between absorbing barriers at +m and -m, the probability of performing a walk of n steps from state i to state j is

$$P_{ij}(n) = \sum_{k=1}^{2m-1} \lambda_k^{\,n}\, v_i(k)\, v_j(k) \qquad (A.1)$$

where $\lambda_k = \cos(k\pi/2m)$ are the eigenvalues of the transition probability matrix, and $v_j(k) = \sin[(j + m)k\pi/2m]/\sqrt{m}$ (j = m - 1, ..., -m + 1) the corresponding eigenvector components.

The probability of a random walk of more than $\nu$ steps without absorption starting from i = 0 is then

$$P(s > \nu) = \sum_{j=-m+1}^{m-1} P_{0j}(\nu) = \sum_{k=1}^{2m-1} \lambda_k^{\,\nu}\, v_0(k) \sum_{j=-m+1}^{m-1} v_j(k) \qquad (A.2)$$

The dominant contribution to this sum is the term k = 1, which gives equation (10).
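The spectral expression for the survival probability can be compared with a direct step-by-step evolution of the walk. The sketch below is an illustration: it assumes a symmetric ±1 walk started at 0 with absorbing barriers at ±m, uses the eigenvalues and eigenvectors quoted above, and also evaluates the large-m form $(4/\pi)\cos^{\nu}(\pi/2m)$ of the k = 1 term, which is an assumption of this sketch. Function names are hypothetical.

```python
import math

def survival_dp(m, steps):
    """Probability that a +-1 walk from 0 stays strictly inside (-m, m) for `steps` steps."""
    size = 2 * m - 1                       # interior states u = j + m = 1 .. 2m-1
    prob = [0.0] * (size + 2)              # indices 0 and 2m are the absorbing barriers
    prob[m] = 1.0                          # start at j = 0
    for _ in range(steps):
        new = [0.0] * (size + 2)           # barrier entries stay 0: absorbed mass is dropped
        for u in range(1, size + 1):
            new[u] = 0.5 * (prob[u - 1] + prob[u + 1])
        prob = new
    return sum(prob[1:size + 1])

def survival_spectral(m, steps):
    """Same probability from the eigen-decomposition of the transition matrix."""
    total = 0.0
    for k in range(1, 2 * m):
        lam = math.cos(k * math.pi / (2 * m))
        v0 = math.sin(m * k * math.pi / (2 * m)) / math.sqrt(m)
        vsum = sum(math.sin(u * k * math.pi / (2 * m)) for u in range(1, 2 * m)) / math.sqrt(m)
        total += lam ** steps * v0 * vsum
    return total

def survival_dominant(m, steps):
    """Large-m form of the k = 1 term (assumed): (4/pi) * cos(pi/(2m))**steps."""
    return (4.0 / math.pi) * math.cos(math.pi / (2 * m)) ** steps

exact = survival_dp(6, 40)
spectral = survival_spectral(6, 40)
approx = survival_dominant(6, 40)
```

The two exact evaluations should agree to rounding error, while the dominant-term form is only accurate to a few percent for small m, because the term k = 2m - 1 has an eigenvalue of the same magnitude and a small amplitude.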

The mean number of patterns learnt by a given bond $C_{ij}$ before saturation is the mean time to absorption $\bar{s}$ in the random walk problem. It is the derivative of the generating function of the probability of absorption [11]

$$f(x) = \sum_s \left[ f_{0,m}(s) + f_{0,-m}(s) \right] x^s = \frac{2}{\lambda_+^m(x) + \lambda_-^m(x)} \qquad (A.3)$$

where $f_{0,m}(s)$ is the probability of first passage from state 0 to state m in s steps, and $\lambda_\pm(x) = (1 \pm \sqrt{1 - x^2})/x$. It is then easy to check that $\bar{s} = \lim_{x \to 1} df/dx = m^2$, which gives equation (12).
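The value $\bar{s} = m^2$ can be verified without generating functions, by summing the survival probabilities of the walk. The sketch assumes a symmetric ±1 walk started at 0 with absorbing barriers at ±m and uses the identity $\bar{s} = \sum_{n \ge 0} P(s > n)$; the function name and the truncation parameters are hypothetical.

```python
def mean_absorption_time(m, max_steps=20000, tol=1e-14):
    """Mean number of steps before a +-1 walk from 0 first hits +m or -m.

    Uses s_bar = sum over n >= 0 of P(s > n), with P(s > n) obtained from the
    exact step-by-step evolution of the occupation probabilities.
    """
    prob = [0.0] * (2 * m + 1)      # index u = j + m; u = 0 and u = 2m absorb
    prob[m] = 1.0                   # start at j = 0
    total = 0.0
    for _ in range(max_steps):
        alive = sum(prob[1:2 * m])  # probability of not being absorbed yet
        if alive < tol:             # remaining tail is negligible
            break
        total += alive
        new = [0.0] * (2 * m + 1)
        for u in range(1, 2 * m):
            new[u] = 0.5 * (prob[u - 1] + prob[u + 1])
        prob = new
    return total

times = {m: mean_absorption_time(m) for m in (1, 2, 5, 8)}
```

Each entry of `times` should equal m squared up to the truncation tolerance.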

A(b) NON ABSORBING BARRIERS. - The stationary probability distribution is given by the eigenvector of the eigenvalue 1 of the transition probability matrix. This gives the same probability for all the 2m + 1 allowed states, namely $(2m + 1)^{-1}$.

We are interested in the walks of more than $\eta$ steps starting from +m or -m that do not stick to the barriers. Their probability $P(t > \eta)$ can be deduced from the random walk between absorbing barriers at m + 1 and -(m + 1) as the sum of the following terms: 1) the random walks starting at m, making a first step of -1 and then $\eta - 1$ steps without absorption; 2) those starting at -m, making a first step of +1 and then $\eta - 1$ steps without absorption; 3) those walks starting at n (-m + 1 ≤ n ≤ m - 1) performing $\eta$ steps without absorption.

Each of these terms enters in the sum multiplied by the probability $(2m + 1)^{-1}$ of starting the random walk at the corresponding point. The problem is therefore reduced to calculating sums of terms of the form (A.1) with m + 1 instead of m. The dominant term of the sum gives equation (16).
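The uniform stationary distribution can be checked by iterating the walk. The sketch assumes that a step which would carry the walker beyond a bound is simply not taken, so that the walker stays on the bound for that step; the function name and the number of iterations are hypothetical.

```python
def stationary_reversible(m, sweeps=5000):
    """Iterate the walk on n = -m..m where a step beyond a bound is not taken."""
    size = 2 * m + 1
    prob = [0.0] * size
    prob[m] = 1.0                                  # start at n = 0
    for _ in range(sweeps):
        new = [0.0] * size
        for u in range(size):
            for step in (-1, 1):
                v = u + step
                if v < 0 or v >= size:
                    v = u                          # blocked by the bound: stay put
                new[v] += 0.5 * prob[u]
        prob = new
    return prob

dist = stationary_reversible(3)                    # 7 allowed states
```

After enough iterations every entry of `dist` should be close to 1/(2m + 1), here 1/7.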

References

[1] HOPFIELD, J. J., Proc. Natl. Acad. Sci. (USA) 79 (1982) 2554.

[2] PERETTO, P., On learning rules and memory storage abilities of neural networks, preprint (1987).

[3] CRISANTI, A., AMIT, D. J., GUTFREUND, H., Europhys. Lett. 2 (1986) 337.

[4] PARISI, G., J. Phys. A 19 (1986) L 617.

[5] NADAL, J. P., TOULOUSE, G., CHANGEUX, J. P. and DEHAENE, S., Europhys. Lett. 1 (1986) 535.

[6] This model has been suggested by P. Peretto.

[7] PERETTO, P., NIEZ, J. J., Biol. Cybern. 54 (1986) 1.

[8] WEISBUCH, G., FOGELMAN-SOULIÉ, F., J. Physique Lett. 46 (1985) L 623.

[9] MÉZARD, M., NADAL, J. P. and TOULOUSE, G., J. Physique 47 (1986) 1457.

[10] COTTRELL, M., Preprint 1987.

[11] COX, D. R., MILLER, H. D., The theory of stochastic processes (Ed. Chapman and Hall Ltd., London) 1977.
