
HAL Id: hal-01957292

https://hal.archives-ouvertes.fr/hal-01957292

Submitted on 17 Dec 2018


A mathematical approach on memory capacity of a simple synapses model

Pascal Helson

To cite this version:

Pascal Helson. A mathematical approach on memory capacity of a simple synapses model. International Conference on Mathematical NeuroScience (ICMNS), Jun 2018, Antibes, France. ⟨hal-01957292⟩


A MATHEMATICAL APPROACH ON MEMORY CAPACITY OF A SIMPLE SYNAPSES MODEL

PASCAL HELSON

INRIA teams: TOSCA & MathNeuro. Mail: pascal.helson@inria.fr

1. FRAMEWORK

Network models of memory: the capacity of neural networks to memorise external inputs is a complex problem which has given rise to numerous studies. It is widely accepted that memory sits where communication between two neurons takes place, in synapses [1]. It involves a huge number of chemical reactions, cascades, ion flows, protein states and even more mechanisms, which makes it really complex. Such complexity stresses the need for simplifying models: this is done in network models of memory.

Problem: most of these models do not take into account both synaptic plasticity and neural dynamics. Adding dynamics on the weights makes the analysis more difficult, which explains why most models consider either a neural [2, 3, 4] or a synaptic weight dynamics [5, 6, 7, 8]. We decided to study the binary synapses model of [9], a model we wish to complete with a neural network afterwards in order to get closer to biology.

Purpose: propose a rigorous mathematical approach to the model of [9], as part of a more ambitious aim which is to have a general mathematical framework adapted to many models of memory.

2. NETWORK MODELS OF MEMORY

Three main ingredients describe such models. A stimulus has a direct effect on neurons, which then modify the synaptic weight matrix, leading to a stable response of the network that carries the information sent by the stimulus.

3. AMIT-FUSI MODEL [9]

Discrete-time model with two coupled binary processes, the stimuli $(\xi_t)_{t\ge 0} \in \{0,1\}^N$ and the synaptic weight matrix $(J_t)_{t\ge 0} \in \{0,1\}^{N^2}$:

Stimuli: $(\xi_t)_{t\ge 0}$ i.i.d. random vectors $\sim \mathrm{Bernoulli}(f)^{\otimes N}$

Synaptic weights dynamics: $\forall i,\ J_t^{ii} = 0$, and at each time step a new stimulus $\xi_t$ is received by the network. The components $J_t^{ij}$, $i \neq j$, jump as follows:

• if $(\xi_t^i, \xi_t^j) = (1, 1)$, $J_t^{ij} = 0 \to J_{t+1}^{ij} = 1$ with probability $q^+$,
• if $(\xi_t^i, \xi_t^j) = (0, 1)$, $J_t^{ij} = 1 \to J_{t+1}^{ij} = 0$ with probability $q_{01}^-$,
• if $(\xi_t^i, \xi_t^j) = (1, 0)$, $J_t^{ij} = 1 \to J_{t+1}^{ij} = 0$ with probability $q_{10}^-$,
• if $(\xi_t^i, \xi_t^j) = (0, 0)$, $J_{t+1}^{ij} = J_t^{ij}$.
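For concreteness, here is a minimal simulation sketch of these jump rules (not from the poster; the array orientation, parameter names and default values are illustrative assumptions):

```python
import numpy as np

def simulate_weights(N=100, T=1000, f=0.1, q_plus=0.5, q01=0.5, q10=0.5, seed=0):
    """Sketch of the Amit-Fusi weight dynamics above; J[i, j] stands for J_t^{ij}."""
    rng = np.random.default_rng(seed)
    J = np.zeros((N, N), dtype=np.int8)
    for _ in range(T):
        xi = rng.random(N) < f                  # stimulus ~ Bernoulli(f)^{⊗N}
        xi_i, xi_j = xi[:, None], xi[None, :]   # xi_i[a, b] = xi^a, xi_j[a, b] = xi^b
        u = rng.random((N, N))
        J[(xi_i & xi_j) & (J == 0) & (u < q_plus)] = 1   # (1,1): 0 -> 1 w.p. q^+
        J[(~xi_i & xi_j) & (J == 1) & (u < q01)] = 0     # (0,1): 1 -> 0 w.p. q_01^-
        J[(xi_i & ~xi_j) & (J == 1) & (u < q10)] = 0     # (1,0): 1 -> 0 w.p. q_10^-
        np.fill_diagonal(J, 0)                           # J^{ii} = 0
    return J
```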

Remark: The initial condition is not defined on synaptic weights but on the synaptic input into neurons defined as follows.

Synaptic input into neurons: $(h_t^i)_{t\ge 0}$ is the field induced by $\xi_0$ in neuron $i$ at time $t$:

$$h_t^i = \sum_{j \neq i} J_t^{ji}\, \xi_0^j$$

In the following, we are interested in the laws of $h_t^i \mid \xi_0^i = 0$ and $h_t^i \mid \xi_0^i = 1$, respectively called $p_t^0$ and $p_t^1$.

Remark: the state space of $h_t^i$ depends on the size of $\xi_0$: $K = \sum_{j \neq i} \xi_0^j$. Moreover, as neurons are similar, we use the notation $h_{t,K}^1$. Finally, it is easy to show that $p_{t,K}^i$ converges to a unique $p_{\infty,K}$.

Initial condition: initially, the synaptic input follows the stationary distribution: $h_{0,K}^1 \sim p_{\infty,K}$.
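A Monte Carlo sketch of how $p_t^0$ and $p_t^1$ can be estimated from the dynamics (again an assumption-laden illustration, not the poster's procedure: a burn-in stands in for the stationary initial condition, samples are pooled over $K$, and `step` repeats the update of the previous sketch):

```python
import numpy as np

def estimate_p(N=200, f=0.1, q_plus=0.5, q01=0.5, q10=0.5,
               burn_in=2000, horizon=15, n_runs=200, seed=1):
    """Empirical samples of h_t^i split on xi_0^i = 0 / 1, for t = 0..horizon-1."""
    rng = np.random.default_rng(seed)

    def step(J, xi):                            # one plasticity update (see sketch above)
        xi_i, xi_j = xi[:, None], xi[None, :]
        u = rng.random(J.shape)
        J[(xi_i & xi_j) & (J == 0) & (u < q_plus)] = 1
        J[(~xi_i & xi_j) & (J == 1) & (u < q01)] = 0
        J[(xi_i & ~xi_j) & (J == 1) & (u < q10)] = 0
        np.fill_diagonal(J, 0)

    samples = {0: [[] for _ in range(horizon)], 1: [[] for _ in range(horizon)]}
    for _ in range(n_runs):
        J = np.zeros((N, N), dtype=np.int8)
        for _ in range(burn_in):                # let the weights relax to stationarity
            step(J, rng.random(N) < f)
        xi0 = rng.random(N) < f                 # stimulus to be retrieved
        step(J, xi0)                            # xi_0 is imprinted on the weights
        for t in range(horizon):
            h = xi0.astype(np.int32) @ J        # h_t^i = sum_j J_t^{ji} xi_0^j
            samples[1][t].extend(h[xi0].tolist())
            samples[0][t].extend(h[~xi0].tolist())
            step(J, rng.random(N) < f)          # later stimuli overwrite the memory
    return samples                              # histogram these to get p_t^0, p_t^1
```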

4. RETRIEVAL CRITERIA

Many methods have been used to study the storage capacity of network models. Perhaps the most intuitive is to see the stimuli to be learned as attractors of a neural dynamics [3]: the maximal number of attractors would then be the memory capacity of the model. Signal-to-Noise Ratio (SNR) analysis [8, 9] and the mean first passage time to a threshold [7] have also been proposed. The underlying idea of these methods is that the neural dynamics is ruled by a threshold on the synaptic input: a linear decision rule. In our case, we do not impose such a rule. Our retrieval criterion relies on the knowledge of the two distributions $p_{t,K}^0$ and $p_{t,K}^1$.

Decision rule [10]:

At fixed $K$, we aim at studying the minimal probability of error, assuming that neuron 1 knows $p_t^i$ and observes $h_t^1$. As $p_{t,K}^1$ and $p_{t,K}^0$ converge to $p_{\infty,K}$, the error increases with time as the distributions get closer:

Fig: Distributions $p_t^i$ (densities of $h$ over the dendritic sum) when $f = 0.1$ and all transition parameters $q^{+/-} = 0.5$. At time $t = 0$, the two distributions $p_1^{(0)}$ and $p_0^{(0)}$ are well separated (left); the successive $p_1^{(t)}$ then get closer to $p_\infty$ (right).

As long as the probability of error $P_e(t)$, defined below, is smaller than a given threshold, $\xi_0^1$ is considered to be retrievable from $h_t^1$ through the following decision rule. We are interested in the maximal time for which such a condition is achieved, and in particular we would like to know its dependence on the different parameters. We then define $P_e(t)$. Let $G = \{ g : [[0, N]] \to \{0,1\} \}$:

$$P_e(t) = \inf_{g \in G} \mathbb{P}\big( g(h_t^1) \neq \xi_0^1 \big) = \inf_{g \in G} \underbrace{\Big\{ f\, \mathbb{P}\big( g(h_t^1) = 0 \mid \xi_0^1 = 1 \big) + (1-f)\, \mathbb{P}\big( g(h_t^1) = 1 \mid \xi_0^1 = 0 \big) \Big\}}_{L(t,g)}$$
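The infimum is attained by the classical Bayes rule, which picks the class with the larger joint mass. A short sketch of the resulting error computation (function names and the threshold-rule comparison are illustrative, not taken from the poster):

```python
import numpy as np

def bayes_error(p1, p0, f):
    """P_e(t) given the conditional laws p1[h] = P(h_t^1 = h | xi_0^1 = 1) and
    p0[h] = P(h_t^1 = h | xi_0^1 = 0): sum over h of the smaller joint mass."""
    p1, p0 = np.asarray(p1, float), np.asarray(p0, float)
    return float(np.minimum(f * p1, (1.0 - f) * p0).sum())

def linear_rule_error(p1, p0, f, theta):
    """Error of a threshold rule g(h) = 1{h >= theta}, for comparison with SNR-style rules."""
    h = np.arange(len(p1))
    return float(f * p1[h < theta].sum() + (1.0 - f) * p0[h >= theta].sum())
```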

Unlike the SNR analysis, which requires only the first two moments of the variable $h_t^1$, here we need the knowledge of both $p_{t,K}^1$ and $p_{t,K}^0$. Although it is more costly, it is always valid, which is not the case of the SNR analysis: it requires $p_t^0$ and $p_t^1$ to be approximately Gaussian, and this is not the case for some parameters.

Fig: A: distributions $p_3^i$ and $p_\infty$. B: trajectory of $h_t$. Parameters: $f = 0.5$, $q^+ = 0.9$, $q_{10}^- = 0.1$, $q_{01}^- = 0.9$.

We see the difference between the two rules in the probability of error:

Fig: Probability of error as a function of time for the linear (SNR) rule and the Bayes rule (the one we use), for $K = 10, 100, 1000$. A: $f = 0.5 = q^+ = q_{10}^- = q_{01}^-$. B: $q^+ = q_{01}^- = 0.9$, $q_{10}^- = 0.1$, $f = 0.5$.

SNR analysis: in [9], an analysis is proposed based on $\mathrm{SNR}_t = \frac{S_t^2}{R_t}$:

• Signal: $S_t = \mathbb{E}\big[ h_t^1 \mid \xi_0^1 = 1 \big] - \mathbb{E}\big[ h_t^1 \mid \xi_0^1 = 0 \big]$
• Noise: $R_t = \mathrm{Var}\big[ h_t^1 \big]$

Because of correlations between the synaptic dynamics, the SNR is only computed in a specific case, leading to the capacity $P = \max\{ t \in \mathbb{N},\ \mathrm{SNR}_t > \log(N) \}$:

$$f = \frac{\log(N)}{N},\quad q_{01}^-,\, q_{10}^- \propto f q^+ \;\Rightarrow\; P \propto \left( \frac{N}{\log(N)} \right)^{2} \qquad (1)$$
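From samples such as those produced by the estimation sketch above, $\mathrm{SNR}_t$ and the capacity criterion can be evaluated as follows (a plain moment-based estimator, not necessarily the computation used in [9]):

```python
import numpy as np

def snr(samples1, samples0, f):
    """SNR_t = S_t^2 / R_t from samples of h_t^1 given xi_0^1 = 1 / 0.
    The unconditional variance R_t is estimated by mixing the two conditional
    samples with weights f and 1 - f."""
    s1, s0 = np.asarray(samples1, float), np.asarray(samples0, float)
    S = s1.mean() - s0.mean()
    m = f * s1.mean() + (1.0 - f) * s0.mean()
    R = f * (s1 ** 2).mean() + (1.0 - f) * (s0 ** 2).mean() - m ** 2
    return S ** 2 / R

def capacity(snr_by_time, N):
    """P = max{t : SNR_t > log N} (returns None if the condition never holds)."""
    ok = [t for t, s in enumerate(snr_by_time) if s > np.log(N)]
    return max(ok) if ok else None
```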

5. FIRST RESULTS

Mathematical results: as obtaining the general forms of $p_{t,K}^1$ and $p_{t,K}^0$ is difficult, we first studied the spectrum of the transition matrix $M_{h,K}$ of $(h_t^1)_{t\ge 0}$ and got a first result:

Proposition 1. The spectrum of the transition matrix $M_{h,K}$ and that of $\big(\xi_t, (J_t^{j1})_{1\le j\le K}\big)_{t\ge 0}$, denoted $M_{\xi,J,K}$, are the following:

$$\Sigma(M_{h,K}) = \Big\{ \mu_i = (1-f)\,\big(\underbrace{1 - f q_{01}^-}_{\lambda_0}\big)^{i} + f\,\big(\underbrace{1 - (1-f) q_{10}^- - f q^+}_{\lambda_1}\big)^{i},\ 0 \le i \le K \Big\}$$

$$\Sigma(M_{\xi,J,K}) = \Sigma(M_{h,K}) \cup \{0\}, \quad \text{with multiplicity } \binom{K-1}{i} \text{ for } \mu_i \text{ and } 2^K \text{ for } 0.$$

In fact, the spectrum is linked to the speed at which stimuli are forgotten. However, the slower this speed is, the less plastic the network is. It is a classical compromise in optimising storage capacity.

Sketch of the proof for $\Sigma(M_{\xi,J,K})$: we can write $M_{\xi,J,K}$ as a block matrix, with $p_\xi = \mathbb{P}(\xi_t = \xi)$ and $M_\xi$ the probability matrix of $(J_t^{j1})_j$ knowing that $\xi_t = \xi$:

$$M_{\xi,J,K} = \begin{pmatrix} p_{\xi_1} M_{\xi_1} & p_{\xi_2} M_{\xi_1} & \cdots & p_{\xi_{2^K}} M_{\xi_1} \\ p_{\xi_1} M_{\xi_2} & p_{\xi_2} M_{\xi_2} & \cdots & p_{\xi_{2^K}} M_{\xi_2} \\ \vdots & \vdots & \ddots & \vdots \\ p_{\xi_1} M_{\xi_{2^K}} & p_{\xi_2} M_{\xi_{2^K}} & \cdots & p_{\xi_{2^K}} M_{\xi_{2^K}} \end{pmatrix} \qquad (2^{2K-1} \times 2^{2K-1} \text{ matrix})$$

It is not difficult to show that

$$\Sigma(M_{\xi,J,K}) = \Sigma\Big( M_{J,K} = \sum_{k=1}^{2^K} p_{\xi_k} M_{\xi_k} \Big) \cup \{0\}.$$

In particular, if $\pi$ is an invariant measure of the process with transition matrix $M_{J,K}$, i.e. $\pi M_{J,K} = \pi$, then

$$\pi_K = \big[\, p_{\xi_1}\pi \;\; p_{\xi_2}\pi \;\; \dots \;\; p_{\xi_{2^K}}\pi \,\big]$$

is an invariant measure for $\big(\xi_t, (J_t^{j1})_{1\le j\le K}\big)_{t\ge 0}$. One can then compute $M_\xi$ from the following $2 \times 2$ matrices:

$$M_{00} = I_2, \quad M_{01} = \begin{pmatrix} 1 & 0 \\ q_{01}^- & 1 - q_{01}^- \end{pmatrix}, \quad M_{10} = \begin{pmatrix} 1 & 0 \\ q_{10}^- & 1 - q_{10}^- \end{pmatrix}, \quad M_{11} = \begin{pmatrix} 1 - q^+ & q^+ \\ 0 & 1 \end{pmatrix}$$

Then, using Kronecker product properties, we have the lemma:

Lemma 1. With the notation $\otimes^N M = \underbrace{M \otimes M \otimes \dots \otimes M}_{N \text{ times}}$,

$$M_{J,K} = (1-f)\, \otimes^{K-1}\big( \underbrace{(1-f) M_{00} + f M_{01}}_{M_0} \big) \;+\; f\, \otimes^{K-1}\big( \underbrace{(1-f) M_{10} + f M_{11}}_{M_1} \big)$$

We conclude on $\Sigma(M_{\xi,J,K})$ using $v_0 = [1\ 1]^T$ and $e_2 = [0\ 1]^T$:

$$M_0 v_0 = v_0 = M_1 v_0, \quad M_0 e_2 = \lambda_0 e_2 \quad \text{and} \quad M_1 e_2 = \lambda_1 e_2 + f q^+ v_0$$

Let $u_{i,K} = \big(u_{i,K}^1, \dots, u_{i,K}^{\binom{K-1}{i}}\big)$ be the vectors which can be written as the Kronecker product of $i$ vectors $e_2$ and $(K - i - 1)$ vectors $v_0$, e.g. $u_{i,K}^1 = \otimes^i e_2 \otimes \big(\otimes^{K-1-i} v_0\big)$; then:

$$M_{J,K}\, u_{i,K}^j = \underbrace{\big( (1-f)\lambda_0^{\,i} + f \lambda_1^{\,i} \big)}_{\mu_i}\, u_{i,K}^j \;+\; \sum_{k=0}^{i-1} \sum_l \alpha_{k,l}\, u_{k,K}^l$$

In this basis, $M_{J,K}$ is upper triangular with $\mu_i$ on the diagonal with multiplicity $\binom{K-1}{i}$, which ends the proof on $\Sigma(M_{\xi,J,K})$.
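As a numerical sanity check (not part of the poster), one can build $M_{J,K}$ from the Kronecker formula of Lemma 1 and verify the trace identities $\mathrm{tr}(M_{J,K}^p) = \sum_i \binom{K-1}{i} \mu_i^p$ implied by the claimed spectrum; the parameter values below are arbitrary:

```python
import numpy as np
from math import comb
from functools import reduce

def check_spectrum(K=5, f=0.3, qp=0.7, q01=0.4, q10=0.2):
    """Check Lemma 1 / Proposition 1 numerically on M_{J,K} (K - 1 Kronecker factors)."""
    M00 = np.eye(2)
    M01 = np.array([[1.0, 0.0], [q01, 1.0 - q01]])
    M10 = np.array([[1.0, 0.0], [q10, 1.0 - q10]])
    M11 = np.array([[1.0 - qp, qp], [0.0, 1.0]])
    M0 = (1 - f) * M00 + f * M01                     # M_0 as in Lemma 1
    M1 = (1 - f) * M10 + f * M11                     # M_1 as in Lemma 1
    n = K - 1
    MJK = (1 - f) * reduce(np.kron, [M0] * n) + f * reduce(np.kron, [M1] * n)

    lam0 = 1.0 - f * q01                             # lambda_0
    lam1 = 1.0 - (1 - f) * q10 - f * qp              # lambda_1
    mu = np.array([(1 - f) * lam0**i + f * lam1**i for i in range(n + 1)])
    mult = np.array([comb(n, i) for i in range(n + 1)])

    Mp = np.eye(2**n)
    for p in range(1, 2**n + 1):                     # traces of powers pin down the spectrum
        Mp = Mp @ MJK
        assert np.isclose(np.trace(Mp), np.sum(mult * mu**p))
    return mu

check_spectrum()
```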

Simulation for shaping intuition: using simulations, we look at the case where $f$ and $q$ are small:

Fig: Invariant distributions (density of $h$) when $f = 0.1$ and $K = 1000$, comparing the correlated and independent models. A: $q^+ = q_{01}^- = q_{10}^- = 0.5$. B: $f q^+ = q_{01}^- = q_{10}^- = 0.001$ (a birth-and-death approximation is also shown).

The independent case is the model in which every $J_t^{ij}$ evolves independently, following the dynamics of one synapse in the model defined in 3. We can see in simulations that the behaviour of $h_t^1$ is similar to the one in the independent case when $f$ and $q$ are small enough. Moreover, this model of independent synapses leads to results similar to (1) for the SNR analysis. Finally, under the assumption that synapses evolve independently, the $\pi_t^i$, the equivalent of the $p_t^i$ in the previous model, are binomial laws. In fact, $J_t^{ij}$ converges in law to the invariant distribution $\pi = (\pi^-, \pi^+)$ with speed $\lambda^t = \big(1 - f^2 q^+ - f(1-f)(q_{01}^- + q_{10}^-)\big)^t$. Hence, we get the following results thanks to the generating function:

$$\pi_{t,K}^0 \sim \mathcal{B}\big(K,\ \pi^+ - \pi^+ q_{01}^- \lambda^t\big), \qquad \pi_{t,K}^1 \sim \mathcal{B}\big(K,\ \pi^+ + \pi^- q^+ \lambda^t\big)$$

Therefore, it seems possible to extend the result (1) to a larger range of parameters $f$ and $q$ thanks to the similarities between the independent model and the one presented in 3 when $f$ and $q$ are small. However, such a similarity would reduce the model to one too simple, losing its initial interest.
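These binomial laws are easy to evaluate; below is a sketch combining them with the Bayes error of Section 4 (the expression for $\pi^+$ is derived from the per-synapse jump rates implied by $\lambda$ and is an assumption on my part, not a formula from the poster):

```python
import numpy as np
from scipy.stats import binom

def independent_laws(t, f, qp, q01, q10):
    """Success probabilities of pi_{t,K}^0 and pi_{t,K}^1 in the independent-synapse model."""
    a = f**2 * qp                        # per-step 0 -> 1 probability of one synapse
    b = f * (1 - f) * (q01 + q10)        # per-step 1 -> 0 probability of one synapse
    pi_plus, lam = a / (a + b), 1.0 - a - b
    p0 = pi_plus - pi_plus * q01 * lam**t
    p1 = pi_plus + (1.0 - pi_plus) * qp * lam**t
    return p0, p1

def bayes_error_binomial(t, K, f, qp, q01, q10):
    """Probability of error P_e(t) under the binomial approximation."""
    p0, p1 = independent_laws(t, f, qp, q01, q10)
    h = np.arange(K + 1)
    return float(np.minimum(f * binom.pmf(h, K, p1),
                            (1 - f) * binom.pmf(h, K, p0)).sum())
```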

6. CONCLUSION

While previous studies have considered a small coding level $f$ in order to get results on the storage capacity depending on $N$, such an assumption seems to make the model lose its initial interest, which lies in the correlations between synapses. Our approach aims at studying the synaptic input into a neuron while taking correlations into account, through a decision rule. It enabled us to get a first result on the speed of forgetting stimuli thanks to a spectral analysis.

7. PERSPECTIVES

• Show that $p_{\infty,K}$ converges to $\pi_{\infty,K}$ when $f$, $q$ are small
• Control the probability of error thanks to the study of the independent case: $f$, $q$ small
• Study the probability of error taking into account the whole vector $h_t$
• Add neural dynamics as a feedback to maintain the weight structure longer and enhance storage capacity

REFERENCES

[1] G. Mongillo, O. Barak, and M. Tsodyks. Synaptic Theory of Working Memory. Science, 2008.

[2] D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins. Non-Holographic Associative Memory. Nature, 1969.

[3] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982.

[4] V. Gripon, J. Heusel, M. Löwe, and F. Vermet. A Comparative Study of Sparse Associative Memories. Journal of Statistical Physics, 2016.

[5] N. Brunel and F. Carusi. Slow stochastic Hebbian learning of classes in recurrent neural networks. Network: Computation in Neural Systems, 1998.

[6] S. Fusi and L. F. Abbott. Limits on the memory storage capacity of bounded synapses. Nature Neuroscience, 2007.

[7] T. Elliott. Memory Nearly on a Spring: A Mean First Passage Time Approach to Memory Lifetimes. Neural Computation, 2014.

[8] M. K. Benna and S. Fusi. Computational principles of synaptic memory consolidation. Nature Neuroscience, 2016.

[9] D. J. Amit and S. Fusi. Learning in Neural Networks with Material Synapses. Neural Computation, 1994.

[10] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.
