HAL Id: hal-01957292
https://hal.archives-ouvertes.fr/hal-01957292
Submitted on 17 Dec 2018
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A mathematical approach on memory capacity of a simple synapses model
Pascal Helson
To cite this version:
Pascal Helson. A mathematical approach on memory capacity of a simple synapses model. International Conference on Mathematical NeuroScience (ICMNS), Jun 2018, Antibes, France. hal-01957292
A MATHEMATICAL APPROACH ON MEMORY CAPACITY OF A SIMPLE SYNAPSES MODEL
PASCAL HELSON
INRIA TEAMS: TOSCA & MATHNEURO
MAIL: PASCAL.HELSON@INRIA.FR
1. FRAMEWORK
Network models of Memory: The capacity of neural networks to memorise external inputs is a complex problem which has given rise to extensive research.
It is widely accepted that memory sits where communication between two neurons takes place: in synapses [1]. Memory involves a huge number of chemical reactions, cascades, ion flows, protein states and many more mechanisms, which makes it really complex. Such complexity stresses the need for simplifying models: this is what network models of memory do.
Problem: Most of these models do not take into account both synaptic plasticity and neural dynamics. Adding dynamics on the weights makes the analysis more difficult, which explains why most models consider either a neural dynamic [2, 3, 4] or a synaptic weight dynamic [5, 6, 7, 8]. We decided to study the binary synapses model of [9], a model we wish to complete with a neural network afterwards in order to come closer to biology.
Purpose: Propose a rigorous mathematical approach to the model of [9], as part of a more ambitious aim: a general mathematical framework adapted to many models of memory.
2. NETWORK MODELS OF MEMORY
Three main ingredients describe such models: a stimulus has a direct effect on neurons, which then modify the synaptic weight matrix, leading to a stable response of the network that carries the information sent by the stimulus.
3. AMIT-FUSI MODEL [9]
Discrete-time model with two coupled binary processes: stimuli (ξ^t)_{t≥0} ∈ {0,1}^N and synaptic weight matrix (J^t)_{t≥0} ∈ {0,1}^{N²}.

Stimuli: (ξ^t)_{t≥0} are i.i.d. random vectors ∼ Bernoulli(f)^⊗N.

Synaptic weights dynamic: ∀i, J^t_{ii} = 0, and at each time step a new stimulus ξ^t is received by the network. The components J^t_{ij}, i ≠ j, jump as follows:
• if (ξ^t_i, ξ^t_j) = (1, 1): J^t_{ij} = 0 → J^{t+1}_{ij} = 1 with probability q+,
• if (ξ^t_i, ξ^t_j) = (0, 1): J^t_{ij} = 1 → J^{t+1}_{ij} = 0 with probability q−_{01},
• if (ξ^t_i, ξ^t_j) = (1, 0): J^t_{ij} = 1 → J^{t+1}_{ij} = 0 with probability q−_{10},
• if (ξ^t_i, ξ^t_j) = (0, 0): J^{t+1}_{ij} = J^t_{ij}.
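The jump rules above can be sketched as one vectorised update of the weight matrix (a minimal sketch: the function name `amit_fusi_step`, the NumPy implementation and the choice of one uniform draw per synapse per step are mine, not from the poster):

```python
import numpy as np

def amit_fusi_step(J, xi, q_plus, q01, q10, rng):
    """One step of the Amit-Fusi binary synapse dynamics.

    J  : (N, N) 0/1 weight matrix with zero diagonal
    xi : (N,) 0/1 stimulus vector received at this time step
    Applies the four jump rules componentwise to all J_ij, i != j.
    """
    J = J.copy()
    U = rng.random(J.shape)      # one uniform draw per synapse
    pre = xi[:, None]            # xi_i, broadcast along columns
    post = xi[None, :]           # xi_j, broadcast along rows
    # compute all masks before mutating J
    pot = (pre == 1) & (post == 1) & (J == 0) & (U < q_plus)   # (1,1): 0 -> 1
    dep01 = (pre == 0) & (post == 1) & (J == 1) & (U < q01)    # (0,1): 1 -> 0
    dep10 = (pre == 1) & (post == 0) & (J == 1) & (U < q10)    # (1,0): 1 -> 0
    J[pot] = 1
    J[dep01 | dep10] = 0
    np.fill_diagonal(J, 0)       # (0,0) pairs and the diagonal stay unchanged
    return J
```

With all probabilities set to 1 the rules act deterministically, which makes the four cases easy to check by hand on a three-neuron network.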
Remark: The initial condition is not defined on the synaptic weights but on the synaptic input into neurons, defined as follows.
Synaptic input into neurons: (h^i_t)_{t≥0} is the field induced in neuron i at time t by ξ^0, the stimulus presented at time 0:

h^i_t = Σ_{j≠i} J^t_{ji} ξ^0_j

In the following, we are interested in the laws of h^i_t | ξ^0_i = 0 and h^i_t | ξ^0_i = 1, respectively called p^0_t and p^1_t.
Remark: The state space of h^i_t depends on the size of ξ^0: K = Σ_{j≠i} ξ^0_j. Moreover, as neurons are similar, we use the notation p^i_{t,K}. Finally, it is easy to show that p^i_{t,K} converges to a unique p_{∞,K}.
Initial condition: Initially, the synaptic input follows the stationary distribution: h^1_{0,K} ∼ p_{∞,K}.
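The conditional laws p^0_t and p^1_t can be illustrated by simulating only the K synapses onto neuron 1 (a sketch with hypothetical names; the Bernoulli(1/2) initial weights are a crude stand-in for the stationary law p_{∞,K} that the poster starts from):

```python
import numpy as np

rng = np.random.default_rng(1)
K, f, qp, q01, q10 = 30, 0.1, 0.5, 0.5, 0.5   # illustrative parameters
T, n_runs = 2, 3000

def synaptic_input(xi1_0):
    """Sample h^1_T = sum_j J^T_{j1} over the K presyn neurons active in xi^0."""
    J = (rng.random(K) < 0.5).astype(int)   # stand-in for the stationary law
    # time 0: tracked stimulus (all K presyn active, postsyn = xi1_0),
    # then T fresh i.i.d. Bernoulli(f) stimuli
    steps = [(np.ones(K, dtype=int), xi1_0)]
    steps += [((rng.random(K) < f).astype(int), int(rng.random() < f))
              for _ in range(T)]
    for pre, post in steps:
        u = rng.random(K)
        if post == 1:
            pot = (pre == 1) & (J == 0) & (u < qp)    # (1,1): potentiation
            dep = (pre == 0) & (J == 1) & (u < q01)   # (0,1): depression
        else:
            pot = np.zeros(K, dtype=bool)
            dep = (pre == 1) & (J == 1) & (u < q10)   # (1,0): depression
        J[pot] = 1
        J[dep] = 0
    return J.sum()   # h = sum_j J_{j1} xi^0_j, with all tracked xi^0_j = 1

h0 = np.array([synaptic_input(0) for _ in range(n_runs)])  # law ~ p^0_{T,K}
h1 = np.array([synaptic_input(1) for _ in range(n_runs)])  # law ~ p^1_{T,K}
```

At small t the two empirical laws are still well separated; as t grows both relax towards p_{∞,K}, which is the phenomenon the retrieval criterion below quantifies.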
4. RETRIEVAL CRITERIA
Many methods have been used to study the storage capacity of network models. The most intuitive is perhaps to see the stimuli to be learned as attractors of a neural dynamic [3]: the maximal number of attractors would then be the memory capacity of the model. Signal-to-Noise Ratio (SNR) analysis [8, 9] and the mean first passage time to a threshold [7] have also been proposed. The underlying idea of these methods is that the neural dynamic is ruled by a threshold on the synaptic input: a linear decision rule. In our case, we do not impose such a rule. Our retrieval criterion relies on the knowledge of the two distributions p^0_{t,K} and p^1_{t,K}.
Decision rule [10]: At fixed K, we aim at studying the minimal probability of error, assuming that neuron 1 knows p^i_t and observes h^1_t. As p^1_{t,K} and p^0_{t,K} both converge to p_{∞,K}, the error increases with time as the distributions get closer:
Fig: Distributions p^i_t when f = 0.1 and all transition probabilities q = 0.5. Left: at time t = 0, p^1 and p^0 are well separated from p_∞. Right: the densities p^1_t for t = 0, …, 4 get progressively closer to p_∞.
As long as the probability of error Pe(t), defined below, is less than a given ε, ξ^0_1 is considered retrievable from h^1_t through the following decision rule. We are interested in the maximal time for which such a condition is achieved, and in particular in its dependence on the different parameters. We then define Pe(t). Let G = {g : [[0, N]] → {0, 1}}:

Pe(t) = inf_{g∈G} P(g(h^1_t) ≠ ξ^0_1) = inf_{g∈G} [ f P(g(h^1_t) = 0 | ξ^0_1 = 1) + (1 − f) P(g(h^1_t) = 1 | ξ^0_1 = 0) ] =: inf_{g∈G} L(t, g)
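The infimum over G is attained by the Bayes rule, which picks the class with the larger posterior weight, so Pe(t) can be computed directly from the two conditional laws (a sketch; `bayes_error` is an illustrative name, not from the poster):

```python
import numpy as np

def bayes_error(p0, p1, f):
    """Minimal probability of error Pe over all decision rules g.

    p0, p1 : conditional laws of h^1_t given xi^0_1 = 0 / 1, on {0, ..., K}
    f      : coding level P(xi^0_1 = 1)
    The optimal g(k) answers 1 iff f*p1[k] > (1-f)*p0[k], so the Bayes
    risk is the sum over k of min(f*p1[k], (1-f)*p0[k]).
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    return float(np.minimum(f * p1, (1 - f) * p0).sum())
```

When the two laws coincide the error is min(f, 1 − f), and it vanishes when their supports are disjoint, consistent with the figure above where the error grows as p^1_t and p^0_t merge.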
Unlike the SNR analysis, which requires only the first two moments of the variable h^1_t, here we need the knowledge of both p^1_{t,K} and p^0_{t,K}. Although it is more costly, it is always valid, which is not the case of the SNR analysis: it requires p^0_t and p^1_t to be approximately Gaussian, and this fails for some parameters.

Fig: A: distributions p^i_3 and p_∞. B: trajectory of h_t. Parameters: f = 0.5, q+ = 0.9, q−_{10} = 0.1, q−_{01} = 0.9.

We see the difference between the two rules on the probability of error:

Fig: Probability of error over time for the linear (SNR) rule and the Bayes rule (the one we use), K ∈ {10, 100, 1000}. A: f = q+ = q−_{10} = q−_{01} = 0.5. B: q+ = q−_{01} = 0.9, q−_{10} = 0.1, f = 0.5.
SNR analysis: In [9], an analysis based on SNR_t = S_t² / R_t is proposed, with:
• Signal: S_t = E[h^1_t | ξ^0_1 = 1] − E[h^1_t | ξ^0_1 = 0]
• Noise: R_t = Var[h^1_t]
Because of correlations between the synapse dynamics, the SNR is only computed in a specific case, leading to the capacity P = max{t ∈ N : SNR_t > log(N)}:

f = log(N)/N, q−_{01}, q−_{10} ∝ f q+ ⇒ P ∝ (N / log(N))²   (1)
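For comparison with the Bayes rule, SNR_t can be computed from the same two conditional laws (a sketch; taking the unconditional variance R_t over the mixture f p^1_t + (1 − f) p^0_t is my reading of the definition above):

```python
import numpy as np

def snr(p0, p1, f):
    """SNR_t = S_t^2 / R_t computed from the two conditional laws of h.

    p0[k] = P(h = k | xi^0_1 = 0), p1[k] = P(h = k | xi^0_1 = 1).
    The signal is the difference of conditional means; the noise is the
    variance of h under the mixture f*p1 + (1-f)*p0.
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    k = np.arange(len(p0))
    S = k @ p1 - k @ p0              # signal S_t
    mix = f * p1 + (1 - f) * p0      # unconditional law of h
    m = k @ mix
    R = (k - m) ** 2 @ mix           # noise R_t = Var[h]
    return S**2 / R
```

For two point masses at 0 and 2 with f = 0.5 this gives S = 2 and R = 1, hence SNR = 4; note it only sees the first two moments, which is exactly why it can fail when the laws are far from Gaussian.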
5. FIRST RESULTS
Mathematical results: As obtaining the general forms of p^1_{t,K} and p^0_{t,K} is difficult, we first studied the spectrum of the transition matrix M_{h,K} of (h^1_t)_{t≥0} and got a first result:

Proposition 1. The spectrum of the transition matrix M_{h,K}, and that of the process (ξ^t, (J^t_{j1})_{1≤j≤K})_{t≥0} with transition matrix M_{ξ,J,K}, are the following:

Σ(M_{h,K}) = { μ_i = (1 − f) λ0^i + f λ1^i, 0 ≤ i ≤ K },  with λ0 = 1 − f q−_{01} and λ1 = 1 − (1 − f) q−_{10} − f q+,
Σ(M_{ξ,J,K}) = Σ(M_{h,K}) ∪ {0},  with multiplicity C(K−1, i) for μ_i and 2^K for 0.

In fact, the spectrum is linked to the speed at which stimuli are forgotten. However, the slower this speed, the less plastic the network. It is a classical compromise in optimising storage capacity.
Sketch of the proof for Σ(M_{ξ,J,K}): We can write M_{ξ,J,K} as a block matrix, with p_ξ = P(ξ^t = ξ) and M_ξ the transition matrix of (J^t_{j1})_j knowing that ξ^t = ξ:

M_{ξ,J,K} = [ p_{ξ_1} M_{ξ_1}      p_{ξ_2} M_{ξ_1}      …   p_{ξ_{2^K}} M_{ξ_1}     ]
            [ p_{ξ_1} M_{ξ_2}      p_{ξ_2} M_{ξ_2}      …   p_{ξ_{2^K}} M_{ξ_2}     ]
            [        …                    …              …          …                ]
            [ p_{ξ_1} M_{ξ_{2^K}}  p_{ξ_2} M_{ξ_{2^K}}  …   p_{ξ_{2^K}} M_{ξ_{2^K}} ]

(a 2^{2K−1} × 2^{2K−1} matrix).
It is not difficult to show that

Σ(M_{ξ,J,K}) = Σ(M_{J,K}) ∪ {0},  where M_{J,K} = Σ_{k=1}^{2^K} p_{ξ_k} M_{ξ_k}.

In particular, if π is an invariant measure of the process with transition matrix M_{J,K}, π M_{J,K} = π, then

π_K = [ p_{ξ_1} π   p_{ξ_2} π   …   p_{ξ_{2^K}} π ]

is an invariant measure for (ξ^t, (J^t_{j1})_{1≤j≤K})_{t≥0}. One can then compute M_ξ from the following 2 × 2 matrices:

M00 = I_2,  M01 = [ 1, 0 ; q−_{01}, 1 − q−_{01} ],  M10 = [ 1, 0 ; q−_{10}, 1 − q−_{10} ],  M11 = [ 1 − q+, q+ ; 0, 1 ].

Then, using Kronecker product properties, we have the lemma:
Lemma 1. With the notation ⊗^N M = M ⊗ M ⊗ … ⊗ M (N times),

M_{J,K} = (1 − f) ⊗^{K−1} M0 + f ⊗^{K−1} M1,

where M0 = (1 − f) M00 + f M01 and M1 = (1 − f) M10 + f M11.

We conclude on Σ(M_{ξ,J,K}) using v0 = [1 1]^T and e2 = [0 1]^T:

M0 v0 = v0 = M1 v0,  M0 e2 = λ0 e2  and  M1 e2 = λ1 e2 + f q+ v0.
Let u_{i,K} = (u^1_{i,K}, …, u^{C(K−1,i)}_{i,K}) be the vectors which can be written as the Kronecker product of i vectors e2 and (K − 1 − i) vectors v0, e.g. u^1_{i,K} = ⊗^i e2 ⊗^{K−1−i} v0. Then:

M_{J,K} u^j_{i,K} = ((1 − f) λ0^i + f λ1^i) u^j_{i,K} + Σ_{k=0}^{i−1} Σ_l α_{k,l} u^l_{k,K} = μ_i u^j_{i,K} + (terms of lower order).

In this basis, M_{J,K} is upper triangular with μ_i on the diagonal with multiplicity C(K−1, i), which ends the proof on Σ(M_{ξ,J,K}).
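Lemma 1 and the resulting eigenvalue formula can be checked numerically for small K by building M_{J,K} explicitly with Kronecker products (a verification sketch; the parameter values are my own arbitrary choices):

```python
import numpy as np
from math import comb

# arbitrary test parameters
f, qp, q01, q10 = 0.3, 0.7, 0.4, 0.2
K = 4

# the four 2x2 conditional matrices from the proof sketch
M00 = np.eye(2)
M01 = np.array([[1.0, 0.0], [q01, 1 - q01]])
M10 = np.array([[1.0, 0.0], [q10, 1 - q10]])
M11 = np.array([[1 - qp, qp], [0.0, 1.0]])
M0 = (1 - f) * M00 + f * M01
M1 = (1 - f) * M10 + f * M11

def kron_power(M, n):
    """n-fold Kronecker power of M."""
    out = np.eye(1)
    for _ in range(n):
        out = np.kron(out, M)
    return out

# Lemma 1: M_{J,K} = (1-f) (x)^{K-1} M0 + f (x)^{K-1} M1
MJK = (1 - f) * kron_power(M0, K - 1) + f * kron_power(M1, K - 1)

lam0 = 1 - f * q01
lam1 = 1 - (1 - f) * q10 - f * qp
# mu_i = (1-f) lam0^i + f lam1^i with multiplicity C(K-1, i)
predicted = sorted(
    (1 - f) * lam0**i + f * lam1**i
    for i in range(K) for _ in range(comb(K - 1, i))
)
eig = np.sort(np.linalg.eigvals(MJK).real)
assert np.allclose(eig, predicted)
```

For K = 4 this compares all 8 eigenvalues of the 8 × 8 matrix against the predicted spectrum; the largest one is 1, as it must be for a stochastic matrix.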
Simulation for shaping intuition: Thanks to simulations, we take a look at the case where f and q are small:
Fig: Invariant distributions of h (correlated model vs. independent model) when f = 0.1 and K = 1000. A: q+ = q−_{01} = q−_{10} = 0.5. B: f q+ = q−_{01} = q−_{10} = 0.001 (a birth-and-death approximation is also shown).
The independent case is the model in which every J^t_{ij} evolves independently, following the dynamic of one synapse in the model defined in Section 3. We can see in simulations that the behaviour of h^1_t is similar to the one in the independent case when f and q are small enough. Moreover, this model of independent synapses leads to results similar to (1) for the SNR analysis. Finally, under the assumption that synapses evolve independently, the π^i_t (the equivalent of p^i_t in the previous model) are binomial laws. In fact, J^t_{ij} converges in law to the invariant distribution π = (π−, π+) with speed λ^t = (1 − f² q+ − f (1 − f)(q−_{01} + q−_{10}))^t. Hence, we get the following results thanks to the generating function:

π^0_{t,K} ∼ B(K, π+ − π+ q−_{01} λ^t),  π^1_{t,K} ∼ B(K, π+ + π− q+ λ^t).

Therefore, it seems possible to extend result (1) to a larger range of parameters f and q, thanks to the similarities between the independent model and the one presented in Section 3 when f and q are small. However, such a similarity would reduce the model to a too simple one, losing its initial interest.
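The binomial success probabilities above follow from the two-state relaxation of a single synapse; this can be sketched numerically under the independence assumption (variable names are mine, and I follow the poster's indexing with q−_{01} in the ξ^0_1 = 0 case):

```python
import numpy as np

# Independent-synapse approximation: each synapse is a two-state Markov
# chain with up-rate a = f^2 q+ and down-rate b = f(1-f)(q01 + q10).
f, qp, q01, q10 = 0.1, 0.5, 0.5, 0.5
a = f**2 * qp
b = f * (1 - f) * (q01 + q10)
pi_plus = a / (a + b)   # stationary probability that J = 1
lam = 1 - a - b         # per-step relaxation speed lambda, as in the poster

def p_one(p_start, t):
    """P(J_t = 1) when relaxing from an initial probability p_start."""
    return pi_plus + (p_start - pi_plus) * lam**t

# starting points right after the tracked stimulus xi^0 is applied:
p_after_dep = pi_plus * (1 - q01)           # case xi^0_1 = 0 (depression)
p_after_pot = pi_plus + (1 - pi_plus) * qp  # case xi^0_1 = 1 (potentiation)

t = 5
# these match the binomial success probabilities stated above
assert np.isclose(p_one(p_after_dep, t), pi_plus - pi_plus * q01 * lam**t)
assert np.isclose(p_one(p_after_pot, t), pi_plus + (1 - pi_plus) * qp * lam**t)
```

Summing K such independent synapses then gives exactly the B(K, ·) laws stated above, with π− = 1 − π+.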
6. CONCLUSION
While previous studies have considered a small coding level f in order to get results on the storage capacity as a function of N, such an assumption seems to make the model lose its initial interest, which lies in the correlations between synapses. Our approach aims at studying the synaptic input into a neuron while taking correlations into account, through a decision rule. It enabled us to get a first result on the speed of forgetting stimuli thanks to a spectral analysis.
7. PERSPECTIVES
• Show that p_{∞,K} converges to π_{∞,K} when f and q are small.
• Control the probability of error thanks to the study of the independent case (f and q small).
• Study the probability of error taking into account the whole vector h_t.
• Add neural dynamics as a feedback to maintain the weight structure longer and enhance storage capacity.
REFERENCES
[1] G. Mongillo, O. Barak, and M. Tsodyks. Synaptic Theory of Working Memory. Science, 2008.
[2] D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins. Non-Holographic Associative Memory. Nature, 1969.
[3] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982.
[4] V. Gripon, J. Heusel, M. Löwe, and F. Vermet. A Comparative Study of Sparse Associative Memories. Journal of Statistical Physics, 2016.
[5] N. Brunel and F. Carusi. Slow stochastic Hebbian learning of classes in recurrent neural networks. Network: Computation in Neural Systems, 1998.
[6] S. Fusi and L. F. Abbott. Limits on the memory storage capacity of bounded synapses. Nature Neuroscience, 2007.
[7] T. Elliott. Memory Nearly on a Spring: A Mean First Passage Time Approach to Memory Lifetimes. Neural Computation, 2014.
[8] M. K. Benna and S. Fusi. Computational principles of synaptic memory consolidation. Nature Neuroscience, 2016.
[9] D. J. Amit and S. Fusi. Learning in Neural Networks with Material Synapses. Neural Computation, 1994.
[10] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.