HAL Id: jpa-00210652
https://hal.archives-ouvertes.fr/jpa-00210652
Submitted on 1 Jan 1987
Memory capacity of neural networks learning within bounds
Mirta Gordon
To cite this version:
Mirta Gordon. Memory capacity of neural networks learning within bounds. Journal de Physique, 1987, 48 (12), pp.2053-2058. 10.1051/jphys:0198700480120205300. jpa-00210652
Mirta B. Gordon
Centre d’Etudes Nucléaires de Grenoble, Département de Recherche Fondamentale/Service de Physique, Groupe Magnétisme et Diffraction Neutronique (*), 85 X, 38041 Grenoble Cedex, France
(Received 7 July 1987, accepted 12 August 1987)
Abstract.
We present a model of long-term memory: learning within irreversible bounds. The best bound values and the memory capacity are determined numerically. We show that it is possible in general to calculate the memory capacity analytically by solving the random walk problem associated with a given learning rule. Our estimates, made for several learning rules, are in excellent agreement with numerical and analytical statistical mechanics results.
Classification
Physics Abstracts
75.10H - 64.60 - 87.30
In the last few years, a great deal of work has been done on the properties of networks of formal neurons, proposed by Hopfield [1] as models of associative memories. In these models, each neuron $i$ is represented by a spin variable $\sigma_i$ which can take only two values, $\sigma_i = 1$ or $\sigma_i = -1$. Any state of the system is defined by the values $\{\sigma_1, \sigma_2, \ldots, \sigma_N\} \equiv \sigma$ taken by each one of the $N$ spins or neurons. Pairs of neurons $i$, $j$ interact with strengths $C_{ij}$, the synaptic efficacies, which are modified by learning. As usual, we denote by $\xi^\nu$ ($\nu = 1, 2, \ldots$) the learnt states or patterns. Retrieval of patterns is a dynamic process in which each spin takes the sign of the local field:

$$h_i = \sum_j{}' C_{ij}\,\sigma_j \qquad (1)$$

acting on it. The primed sum means that terms $j = i$ should be ignored. A learnt state $\xi^\nu$ is said to be memorized or retrieved if, starting with the network in state $\xi^\nu$, it relaxes towards a final state close to $\xi^\nu$. In general, the final state can be very different from $\xi^\nu$, and will be denoted $\tilde{\xi}^\nu$. The overlap between both:

$$q^\nu = \frac{1}{N} \sum_i \xi_i^\nu\, \tilde{\xi}_i^\nu \qquad (2)$$

gives a measure of retrieval quality.
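The retrieval dynamics and the overlap described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's simulation code: the function names `local_field`, `relax` and `overlap` are hypothetical, and the deterministic sweep in index order is an assumption standing in for whatever sequential update order was actually used.

```python
import random

def local_field(C, sigma, i):
    """Field on neuron i: sum over j != i of C[i][j] * sigma[j]."""
    return sum(C[i][j] * sigma[j] for j in range(len(sigma)) if j != i)

def relax(C, sigma, max_sweeps=100):
    """Sequential dynamics: flip every spin that opposes its field,
    sweeping the network until no spin flips (assumed update order: 0..N-1)."""
    sigma = list(sigma)
    for _ in range(max_sweeps):
        flipped = False
        for i in range(len(sigma)):
            if local_field(C, sigma, i) * sigma[i] < 0:
                sigma[i] = -sigma[i]
                flipped = True
        if not flipped:          # every spin has the sign of its field
            break
    return sigma

def overlap(xi, final):
    """Overlap between a learnt pattern xi and the relaxed state."""
    return sum(a * b for a, b in zip(xi, final)) / len(xi)

# Usage: one pattern stored with couplings xi_i * xi_j / N.
rng = random.Random(0)
N = 20
xi = [rng.choice((-1, 1)) for _ in range(N)]
C = [[xi[i] * xi[j] / N for j in range(N)] for i in range(N)]
q = overlap(xi, relax(C, xi))
```

With a single stored pattern every field is aligned with the pattern, so the network stays where it started and the overlap is 1.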
The simplest local learning prescription [2] for $p$ learnt patterns is Hebb's rule:

$$C_{ij} = \frac{1}{N} \sum_{\nu=1}^{p} \xi_i^\nu\, \xi_j^\nu \qquad (3)$$

Assuming that the values of $\xi_i^\nu$ are random and uncorrelated, it has been shown [1-3] that the maximum number of patterns $p$ that can be memorized with Hebb's learning rule is proportional to the number of neurons: $p = \alpha N$, with $\alpha = 0.145 \pm 0.009$. If more than $\alpha N$ patterns are learnt, memory breaks down and none of the learnt patterns is retrieved.
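As an illustration of Hebb's rule and of the breakdown at high loading, the following sketch builds the couplings for random patterns and counts the neurons whose field opposes a stored pattern. The helper names (`hebb_couplings`, `misaligned_fraction`, `random_patterns`) and the sizes used are assumptions chosen for a quick run, not values taken from the paper.

```python
import random

def hebb_couplings(patterns):
    """C_ij = (1/N) * sum over patterns of xi_i * xi_j, with C_ii = 0."""
    N = len(patterns[0])
    C = [[0.0] * N for _ in range(N)]
    for xi in patterns:
        for i in range(N):
            for j in range(i + 1, N):
                C[i][j] += xi[i] * xi[j] / N
                C[j][i] = C[i][j]          # couplings are symmetric
    return C

def misaligned_fraction(C, xi):
    """Fraction of neurons whose local field opposes the stored pattern xi."""
    N = len(xi)
    bad = sum(1 for i in range(N)
              if xi[i] * sum(C[i][j] * xi[j] for j in range(N)) < 0)
    return bad / N

def random_patterns(p, N, rng):
    return [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]

rng = random.Random(1)
N = 60
light = random_patterns(3, N, rng)       # p/N = 0.05, below the quoted limit
heavy = random_patterns(3 * N, N, rng)   # p/N = 3, far above it
light_errors = misaligned_fraction(hebb_couplings(light), light[0])
heavy_errors = misaligned_fraction(hebb_couplings(heavy), heavy[0])
```

At light loading the stored pattern is stable on every neuron, while far above $\alpha N$ a sizeable fraction of the fields point the wrong way before any relaxation has taken place.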
In order to avoid this catastrophic effect, different modifications of Hebb's rule have been proposed [4-6]. The simplest one is the so-called learning within bounds [5]: synaptic efficacies are modified by learning in the same way as in Hebb's rule, but their values are constrained to remain within some chosen range. In the version proposed by Parisi [4], bounds are reversible: once a $C_{ij}$ reaches a barrier, it remains at that value until a pattern is learnt that returns it inside the allowed range. This is a model of short-term memory: only the last learnt patterns are retrieved, and old memories are gradually erased by learning. With this learning rule no deterioration occurs, but the storage capacity is smaller than with Hebb's rule.
In the first part of this paper, we present numerical simulations of a model of long-term memory, which is an irreversible version of learning within bounds: those synaptic efficacies that reach a bound remain at that value forever [6]. The best bounds and the storage capacity are similar to those found with reversible bounds, but now the first, and not the last, learnt patterns are memorized.
In the second part of the paper, we show that a quantitative analysis of the random walk associated with each learning rule gives a very good estimate of the network's memory capacity. We present results for the standard Hebb's rule and for different variants of learning within bounds. Generalization to other learning rules is straightforward, and is presented in section 3.
1. Learning within irreversible bounds. Numerical simulations.
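The learning rule with irreversible bounds or barriers is:

$$C_{ij}(\nu) = \begin{cases} C_{ij}(\nu-1) + \dfrac{1}{N}\,\xi_i^\nu\,\xi_j^\nu & \text{if } \nu \le s_{ij} \\ C_{ij}(s_{ij}) & \text{if } \nu > s_{ij} \end{cases} \qquad (4)$$

with $C_{ij}(0) = 0$. $s_{ij}$ is the pattern number for which $C_{ij}$ first reaches a bound. Patterns after $s_{ij}$ are not learnt and the synaptic efficacy is saturated. For $m \to \infty$, the standard Hebb's rule is recovered. But, unlike in Hebbian learning, with rule (4) the number $\nu$, the « time » at which $\xi^\nu$ is learnt, is relevant.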
In our numerical simulations, random patterns were learnt following (4). Each time a new pattern was added, the retrieval quality of all the previously stored patterns was tested: starting with the network in a learnt state, spins are allowed to flip with Monte Carlo sequential dynamics until relaxation to a state in which each spin takes the sign of the field (1) acting on it. A learnt pattern is considered well memorized if its overlap $q$ with the relaxed state satisfies $q > 0.97$. Any other threshold would give nearly the same results because patterns are either retrieved almost without error ($q \approx 1$), or with $q \ll 1$.
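The learning rule and the retrieval test just described can be put together in a short sketch. It is an illustration under stated assumptions rather than the code used for the paper: couplings are stored as integer counters $n_{ij}$ with $C_{ij} = n_{ij}/N$ (which leaves the sign of every field unchanged), a bond is taken to be frozen as soon as $|n_{ij}| = m$, spins are swept in index order, and the names `learn_irreversible`, `relax` and `well_retrieved` are hypothetical.

```python
import random

def learn_irreversible(patterns, m):
    """Learning within irreversible bounds on counters n_ij (C_ij = n_ij / N):
    a bond stops learning once |n_ij| has reached m."""
    N = len(patterns[0])
    n = [[0] * N for _ in range(N)]
    for xi in patterns:
        for i in range(N):
            for j in range(i + 1, N):
                if abs(n[i][j]) < m:            # bond not yet saturated
                    n[i][j] += xi[i] * xi[j]
                    n[j][i] = n[i][j]
    return n

def relax(n, sigma, max_sweeps=100):
    """Sequential dynamics until every spin has the sign of its field."""
    sigma = list(sigma)
    N = len(sigma)
    for _ in range(max_sweeps):
        flipped = False
        for i in range(N):
            h = sum(n[i][j] * sigma[j] for j in range(N) if j != i)
            if h * sigma[i] < 0:
                sigma[i] = -sigma[i]
                flipped = True
        if not flipped:
            break
    return sigma

def well_retrieved(patterns, m, q_min=0.97):
    """Numbers (1-based) of the learnt patterns retrieved with overlap > q_min."""
    n = learn_irreversible(patterns, m)
    N = len(patterns[0])
    good = []
    for nu, xi in enumerate(patterns, start=1):
        final = relax(n, xi)
        q = sum(a * b for a, b in zip(xi, final)) / N
        if q > q_min:
            good.append(nu)
    return good

# Usage: a small network; sizes are illustrative only.
rng = random.Random(2)
N, p = 60, 12
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
retrieved = well_retrieved(patterns, m=3)
```

Calling `well_retrieved` after each newly added pattern, for several values of `m`, reproduces the protocol of the text; a very large `m` never freezes a bond and therefore falls back on plain Hebbian learning.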
The bound value giving the maximal number of well retrieved patterns, $m_{\rm opt}$, was determined for networks with $N = 100$, 150, 200 and 400 neurons by testing different values of $m$. Figure 1 shows the retrieval quality (2) as a function of the pattern number, for $N = 400$. With the best bounds ($m_{\rm opt}$), the overlap jumps abruptly from 1 to a small value, showing that only the first learnt patterns are memorized. Figure 2 is a plot of the number of well retrieved patterns versus the number of learnt patterns. For $m < m_{\rm opt}$, a smaller number of patterns is retrieved in the asymptotic regime ($p$ large), and for $m > m_{\rm opt}$ the number of retrieved patterns vanishes for large $p$, as it should, because in the large $m$ limit the standard Hebb's rule, with its memory deterioration, is recovered.
Fig. 1. - Overlap between the learnt pattern and the retrieved state vs. $\nu$, the number of the learnt pattern, once $p$ patterns were learnt with the best bound value $m = m_{\rm opt}$.
Fig. 2. - Number of well retrieved patterns ($q > 0.97$) vs. number of learnt patterns.
Optimal bound values depend on the network size, but we do not have enough accuracy to establish numerically the law $m_{\rm opt}(N)$. In the next section it is shown that $m_{\rm opt} \simeq 0.3\sqrt{N}$, and the numerical data are consistent with this prediction. With the optimal bounds, we find a storage capacity $\simeq 0.05\,N$. These results show that learning within irreversible bounds is a model of long-term memory in the sense that only old learnt patterns are remembered. The catastrophic deterioration of Hebb's rule is avoided by stopping the acquisition of new patterns once the memory is saturated. The capacity, and the « best » bound values, are similar to those of reversible learning [4], the « memory which forgets ».
2. Random walk analysis.
For uncorrelated random learnt patterns, the synaptic efficacies $C_{ij}$ perform random walks of steps $\pm 1/N$. In this section we show how a probabilistic analysis gives the maximum memory capacity of the network under a given learning rule. It is based on the following fact, observed in our numerical simulations: when the initial state of the network is a learnt state, then either it remains in this state upon relaxation (retrieval is then perfect, $q = 1$) or it moves away, from the very first Monte Carlo step, to a distant state ($q$ small). This suggests that an analysis based on the first Monte Carlo step should be able to predict the memory capacity of a network with a given learning rule. That this is the case is shown in this and the following sections. We first present the method on Hebb's rule, for which analytic results and very accurate numerical simulations exist, to show how it works on a simple model, before applying it to learning within bounds.
2.1 HOPFIELD MODEL. - The learning rule is given by (3). When the network is in the learnt state $\xi^\nu$, the field acting on neuron $i$, averaged over all the learnt patterns (assumed random and uncorrelated), is

$$\bar{h}_i = \xi_i^\nu \qquad (5)$$

Therefore, when the network is allowed to relax, spins should, on average, remain in state $\xi^\nu$. Note that if the initial state is not a learnt state, then $\bar{h}_i = 0$.
The second moment of the field distribution for $p$ learnt patterns is:

$$\overline{h_i^2} = \frac{1}{N^2} \sum_{\mu,\mu'} \sum_{j,k}{}' \overline{\xi_i^\mu \xi_i^{\mu'} \xi_j^\mu \xi_k^{\mu'} \xi_j^\nu \xi_k^\nu} = \frac{p}{N} + \bar{h}_i^{\,2} \qquad (6)$$

The first contribution to $\overline{h_i^2}$ comes from the terms $j = k$. It exists also if the network is not in a learnt state [1, 7]. The second contribution comes from terms $j \neq k$ and is $\bar{h}_i^{\,2}$ (neglecting terms of order $1/N$). The variance of the field acting on a given neuron is then

$$\Delta h_i^2 = \overline{h_i^2} - \bar{h}_i^{\,2} = \frac{p}{N} \qquad (7)$$
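The mean and variance of the field under Hebb's rule can be checked by direct sampling. The sketch below is a rough numerical illustration: it collects the aligned field $h_i \xi_i^\nu$ over all neurons and all stored patterns of one random realization and compares its mean with 1 and its variance with $p/N$. The function name `aligned_fields` and the sizes $N = 100$, $p = 20$ are assumptions made for the example.

```python
import random

def aligned_fields(N, p, seed=0):
    """Samples of h_i * xi_i^nu for a Hebbian network in each learnt state."""
    rng = random.Random(seed)
    patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
    # Hebb's rule without self-coupling (C_ii stays 0)
    C = [[0.0] * N for _ in range(N)]
    for xi in patterns:
        for i in range(N):
            for j in range(N):
                if i != j:
                    C[i][j] += xi[i] * xi[j] / N
    samples = []
    for xi in patterns:                      # network placed in learnt state xi
        for i in range(N):
            h = sum(C[i][j] * xi[j] for j in range(N))
            samples.append(h * xi[i])        # positive when the spin is stable
    return samples

N, p = 100, 20
a = aligned_fields(N, p)
mean = sum(a) / len(a)
var = sum((x - mean) ** 2 for x in a) / len(a)
```

For these sizes the sampled mean stays close to 1 and the sampled variance close to $p/N = 0.2$, up to corrections of order $1/N$ and to sampling noise.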
Therefore, even if the initial state is a learnt state, say $\xi^\nu$, when $p/N$ is large enough there is some probability that the sign of the field acting on a neuron $i$ is opposite to $\xi_i^\nu$. This probability (we drop the subscript $i$, all neurons being equivalent) is a function of $x = \Delta h^2/\bar{h}^2$:

$$P(x) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{\sqrt{2x}}\right)$$

For small $x$, the function $P(x)$ vanishes like $\exp[-1/(2x)]$, and is linear in $x$ in the neighbourhood of $x^* = 1/3$, the inflexion point. It can be approximated (Fig. 3) by a straight line passing through $x^*$, $P(x^*) = 0.042$, of slope $\left.\dfrac{dP}{dx}\right|_{x^*} = \dfrac{1}{\sqrt{\pi}} \left(\dfrac{3}{2e}\right)^{3/2} \simeq 0.2313$, which crosses the $x$ axis at $x_0 = 0.153$. For $x \ll x_0$, $P(x) \simeq 0$. Beyond the crossover point at 0.153, errors in retrieval are expected. From (5) and (7), $\Delta h^2/\bar{h}^2 = p/N$; the maximum number of patterns that can be learnt before errors in retrieval become important is therefore $p = 0.153\,N$, in excellent agreement with theoretical [3] and numerical [1-2] results ($\alpha = 0.145 \pm 0.009$).
The prescription for maximum storage capacity is then

$$\frac{\Delta h^2}{\bar{h}^2} = x_0 = 0.153 \qquad (8)$$

Fig. 3. - Probability of $h_i \xi_i^\nu < 0$ as a function of $x = \Delta h^2/\bar{h}^2$.
In what follows, the same argument is applied to other learning rules.
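The numbers quoted for the linear approximation of $P(x)$ can be reproduced with the standard library. The sketch assumes that the field is Gaussian with mean $\bar{h}$ and variance $\Delta h^2$, so that the probability of a wrong sign is $\frac{1}{2}\,\mathrm{erfc}(1/\sqrt{2x})$; this form is an assumption, adopted because it has its inflexion point at $x^* = 1/3$. The function names are hypothetical.

```python
import math

def flip_probability(x):
    """P(x) for a Gaussian field with variance / mean^2 = x (assumed form)."""
    return 0.5 * math.erfc(1.0 / math.sqrt(2.0 * x))

def slope(x):
    """dP/dx = (2x)^(-3/2) * exp(-1/(2x)) / sqrt(pi)."""
    return (2.0 * x) ** -1.5 * math.exp(-1.0 / (2.0 * x)) / math.sqrt(math.pi)

x_star = 1.0 / 3.0                    # inflexion point of P(x)
p_star = flip_probability(x_star)     # value of P at the inflexion point
s_star = slope(x_star)                # slope of the tangent line there
x0 = x_star - p_star / s_star         # where the tangent crosses the x axis
```

Evaluating `p_star`, `s_star` and `x0` gives values that round to the 0.042, 0.2313 and 0.153 quoted in the text.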
2.2 LEARNING WITHIN IRREVERSIBLE BOUNDS. - When the network is in state $\xi^\nu$, the average field acting on a neuron $i$ is (to lowest order in $1/N$)

where $P(s > \nu)$ is the probability of performing a random walk of more than $\nu$ steps between absorbing barriers at $m$ and $-m$, without absorption. For large $\nu$ (see Appendix Aa):

The variance of the field is easily seen to be:

where $P(s)$ is the probability that absorption takes place in $s$ steps, so that $\bar{s}$ is the mean number of patterns learnt by a bond before its strength $C_{ij}$ sticks to the bounds. From the random walk problem (Appendix Aa):

Unlike in the Hebbian scheme of learning, in the present case the dispersion of the field values is constant, limited by the bounds. Storage capacity is limited because the average field, constant with Hebb's rule, now decreases with the pattern number. Therefore, only the first learnt patterns have a field on each neuron large enough to ensure good retrieval. Introducing (9) to (12) into (8) gives the maximum number $\nu$ of patterns expected to be memorized, for a given $m$.
After maximization of $\nu$ with respect to $m$, we find

in very good agreement with our numerical simulations.
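The optimization over $m$ can be illustrated numerically. The sketch below rests on assumptions that are not quoted from the text: the mean field on pattern $\nu$ is taken to be reduced by the large-$\nu$ survival probability of a symmetric walk between absorbing barriers at $\pm m$, $P(s > \nu) \simeq (4/\pi)\exp(-\pi^2\nu/8m^2)$, and the variance is taken to be $\bar{s}/N$ with $\bar{s} = m^2$, the mean absorption time of such a walk started at 0. The first function checks the value of $\bar{s}$ by exact propagation of the walk; all names are hypothetical.

```python
import math

X0 = 0.153  # crossover value of variance / mean^2 quoted in the text

def mean_absorption_time(m, max_steps=5000):
    """Mean number of steps before a symmetric +-1 walk started at 0 first
    hits +m or -m, computed as the sum over t >= 0 of P(s > t)."""
    prob = [0.0] * (2 * m + 1)          # index k stands for position k - m
    prob[m] = 1.0
    total = 0.0
    for _ in range(max_steps):
        total += sum(prob[1:2 * m])     # P(s > t): mass not yet absorbed
        new = [0.0] * (2 * m + 1)
        for k in range(1, 2 * m):       # only interior sites still move
            new[k - 1] += 0.5 * prob[k]
            new[k + 1] += 0.5 * prob[k]
        new[0] = new[2 * m] = 0.0       # discard absorbed walkers
        prob = new
    return total

def capacity(m, N, x0=X0):
    """Largest nu with variance / mean^2 <= x0, ASSUMING
    mean = (4/pi) exp(-pi^2 nu / (8 m^2)) and variance = m^2 / N."""
    arg = 16.0 * x0 * N / (math.pi ** 2 * m ** 2)
    return 4.0 * m ** 2 / math.pi ** 2 * math.log(arg) if arg > 1.0 else 0.0

def best_bound(N, x0=X0):
    """Integer bound m that maximizes the assumed capacity."""
    return max(range(1, N), key=lambda m: capacity(m, N, x0))

N = 400
m_opt = best_bound(N)
nu_max = capacity(m_opt, N)
```

Under these assumptions the scan for $N = 400$ selects $m = 6$, which coincides with $0.3\sqrt{N}$, and a capacity of roughly 15 patterns.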
2.3 LEARNING WITHIN REVERSIBLE BOUNDS. - With this learning scheme [4], the synaptic efficacies show reversible saturation effects. They stick to the bounds and do not learn those patterns that would make them take values beyond the allowed range. Let $s_{ij}$ be the pattern that produced the last saturation effect on bond $ij$. The values taken by $C_{ij}$ on learning the patterns that follow pattern $s_{ij}$ are all within the allowed range, as if barriers did not exist. After learning a large number of patterns:

The random walk between reversible barriers reaches an equilibrium distribution: $C_{ij}(p)$ takes any of the allowed values $n/N$ ($n = m, m-1, \ldots, -m$) with probability $1/(2m+1)$. When the network is in state $\xi^\nu$, the field averaged over all the learnt patterns, and its variance, are given by (see Appendix Ab):

where $\eta = p - \nu$ is the pattern number counted starting from the last learnt one, and $P(s > \eta)$ is the probability of a random walk of more than $\eta$ steps starting from $+m$ or $-m$, without sticking to the barriers. For $\eta \gg 1$ we get (see Appendix Ab)

The field is now a decreasing function of $\eta$: the effect of learning new patterns is to lower the local fields acting on older patterns, while the variance of the field distribution remains constant. Introducing (15) and (16) into (8), and maximizing $\eta$ with respect to $m$ gives

in good agreement with numerical results [4]: $m_{\rm opt} \simeq 0.35\sqrt{N}$; $\eta(m_{\rm opt}) \simeq 0.04\,N$.
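The uniform equilibrium distribution of the walk between reversible barriers can be verified by propagating the probability distribution of a single bond. The sketch assumes that a step which would carry the bond beyond $\pm m$ is simply not learnt, so the bond stays at the barrier, as described above; the function name `step_distribution` and the choice $m = 5$ are illustrative.

```python
def step_distribution(prob, m):
    """One learning step for a bond counter n in {-m, ..., m}; index k = n + m.
    A step that would leave the allowed range is not learnt."""
    new = [0.0] * (2 * m + 1)
    for k, p in enumerate(prob):
        up = min(k + 1, 2 * m)      # at +m an upward step leaves n unchanged
        down = max(k - 1, 0)        # at -m a downward step leaves n unchanged
        new[up] += 0.5 * p
        new[down] += 0.5 * p
    return new

m = 5
prob = [0.0] * (2 * m + 1)
prob[m] = 1.0                       # bond starts at n = 0
for _ in range(1000):               # learn a large number of patterns
    prob = step_distribution(prob, m)

uniform = 1.0 / (2 * m + 1)
max_deviation = max(abs(p - uniform) for p in prob)
```

After many steps every allowed value carries the same weight $1/(2m+1)$, which is why the variance of the field no longer grows with the number of learnt patterns.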
It is interesting to apply this analysis to learning without synaptic sign changes [5]. The learning rule is the same as (14), but half of the synaptic efficacies are constrained between $m/N$ and 0, the others between 0 and $-m/N$. From the corresponding random walk,

The field decreases faster with $\eta$ than (16), for a given $m$, because the allowed range for the $C_{ij}$ is half that of the previous case, and therefore saturation effects appear in fewer steps. However, the variance is of the same order of magnitude

Therefore, memory capacity will be smaller than when synaptic sign changes are allowed. Indeed, one finds the same value (Eq. (17a)) for $m_{\rm opt}$ as before (this value is only a function of $x_0$) but $\eta$ is 4 times smaller, $\eta(m_{\rm opt}) = 0.011\,N$, in fairly good agreement with numerical results [5], which with our
3. Generalization to other learning rules.
The results of section 2 can easily be generalized to learning rules with variable acquisition intensities :
The average field on a neuron, when the network is in the learnt state $\xi^\nu$, is:
- ....