HAL Id: jpa-00212356
https://hal.archives-ouvertes.fr/jpa-00212356
Submitted on 1 Jan 1990
Neural networks with many-neuron interactions

G. A. Kohring (*)

Physikalisches Institut der Universität Bonn, Nussallee 12, D-5300 Bonn 1, F.R.G.

(Received 2 October 1989, accepted 3 October 1989)
Abstract. — The static and dynamical properties of neural networks having many-neuron interactions are studied analytically and numerically. The storage capacity of such networks is found to be unchanged from that of the more widely studied case of two-neuron interactions, implying that these networks store information no more efficiently. The size of the basins of attraction in the many-neuron case is calculated exactly from a solution of the network dynamics at full connectivity, and reveals that networks with many-neuron interactions are better at pattern discrimination than the simpler networks with only two-neuron interactions.

J. Phys. France 51 (1990) 145-155, 15 January 1990

Classification: Physics Abstracts 87.30G - 75.10H - 64.60C
Introduction.
The static structure of the simplest single-layer neural networks is described by N two-state neurons, S_i, and a real-valued, N × N coupling matrix, T_ij, which contains information about the architecture of the network as well as determining the stable states. The two-dimensional character of the coupling matrix follows from the assumption that the neurons have only two-neuron interactions. However, it was realized quite early [1] that this assumption is not tenable for biological networks. Biological networks have a propensity for many-neuron interactions, although the reasons for this are far from clear. Several authors [1-3] have pointed out the increased storage capacity of networks using multibody interactions, and Baldi and Venkatesh [3] have estimated that this storage capacity increases like O(N^λ) for networks having (λ + 1)-neuron interactions. For the case of two-neuron interactions it is known from the work of Venkatesh [4] and Gardner [5] that the maximum number of uncorrelated patterns, P, which can be stored without error increases as P = 2N. If one defines the storage capacity, α, as the number of stored patterns per independent coupling,

α = P / C_N ,   (1.1)

where C_N is the number of independent couplings per neuron (C_N = N for two-neuron interactions), then the maximum storage capacity for networks having two-neuron interactions is α = 2. Hence, the important question to be asked about networks with many-neuron interactions is whether this storage capacity is increased, decreased or remains the same. In other words, do many-neuron connections store information more or less efficiently than two-neuron connections? The results derived here will show that the maximum obtainable value of α is α = 2 for all values of λ ≥ 1.
In addition to their static properties, neural networks are also defined by their dynamical properties. The most commonly studied dynamics, for two-neuron interactions, is defined at zero temperature (i.e. in the absence of noise) via the equation:

S_i(t+1) = sgn( Σ_j T_ij S_j(t) ).   (1.2)

For strongly connected networks with only two-body interactions, the only dynamical property known analytically is the first step in this time evolution [6]. However, many other properties, such as the size of the basins of attraction, are known from computer simulations [7-9, 15]. A natural generalization of the dynamics to the multi-body case is defined by:

S_i(t+1) = sgn( Σ_{j_1, ..., j_λ} T_{i j_1 ... j_λ} S_{j_1}(t) ... S_{j_λ}(t) ).   (1.3)
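To make equation (1.3) concrete, a minimal sketch of one parallel update for λ = 2 (three-neuron interactions) is given below, assuming the couplings are held in a NumPy tensor that is symmetric in its last two indices and zero whenever two indices coincide; the names here are illustrative, not part of the original formulation.

```python
import numpy as np

def update(S, T):
    """One parallel sweep of Eq. (1.3) for lambda = 2:
    S_i(t+1) = sgn( sum_{j<k; j,k != i} T[i,j,k] S_j(t) S_k(t) ).
    S: length-N array of +/-1 spins; T: (N,N,N) coupling tensor, assumed
    symmetric in (j,k) and zero for repeated indices and for j or k = i."""
    # The unrestricted sum over (j,k) counts every unordered pair twice,
    # so the restricted sum over j < k is recovered by the factor 1/2.
    h = 0.5 * np.einsum('ijk,j,k->i', T, S, S)
    return np.where(h >= 0, 1, -1)
```

For general λ the local field becomes a degree-λ form in the spins, with the restricted sum running over j_1 < ... < j_λ.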
It turns out that multi-body dynamics are somewhat easier to study than two-body dynamics because of a decrease in the number of macroscopic variables needed to describe a given state of the network. Because of this decrease, an ansatz can be made regarding the dynamics which allows certain parameters of the time evolution to be calculated exactly. These parameters, as will be shown, are in excellent agreement with computer simulations.

The plan of the paper is then as follows. In the next section the many-neuron networks will be defined in more detail and the static properties will be calculated analytically via a generalization of the method first employed by Gardner [5]. Then the first time step will be calculated analytically and compared to previous results. A conjecture will be made at this point regarding the analytic form of the exact solution to the dynamical evolution. Next, after presenting a learning rule suitable for all values of λ, numerical results will be discussed which support the validity of the foregoing analytical arguments.
Static properties.

The properties of a neural network will depend, of course, upon the network architecture. For many-neuron interacting networks a fully connected architecture has some symmetries not present in the two-neuron case. This is illustrated schematically in figure 1 for the simple case of a three-neuron interaction.

Fig. 1. — A schematic diagram of a three-neuron interaction.

The coupling for such an interaction is symmetric with respect to interchanges of the indices j and k, i.e., physically there exists only one three-neuron « synapse » labelled i-j-k. For a general (λ + 1)-neuron interaction the synaptic matrix T_{i j_1 ... j_λ} will be invariant under interchange of any of the j_1 to j_λ indices. This should be reflected in the calculation of the local field by summing only over the independent elements of the synaptic matrix, i.e., the sum in equation (1.3) should be further restricted to run over j_1 < j_2 < ... < j_λ in addition to the normal restriction of running over j_1, j_2, ..., j_λ ≠ i.
With these extra symmetries, it is evident from equation (1.3) that the stable states of the system (denoted by ξ) will be defined as those states with local stability greater than zero:

Λ_i^μ = (ξ_i^μ / e_N) Σ' T_{i j_1 ... j_λ} ξ_{j_1}^μ ... ξ_{j_λ}^μ > 0   (2.1)

for every neuron i, where μ ∈ {1, ..., P}. Here e_N is defined as the square root of the number of independent elements in the sum (e_N² ≈ N^λ/λ! at full connectivity), and Σ' denotes the restricted sum over only the independent elements of the synaptic matrix as described above.
In order to fix the scale for the couplings, a suitable generalization of the two-neuron interaction case to the many-neuron interaction case is:

Σ' T²_{i j_1 ... j_λ} = e_N² .   (2.2)

For λ = 1 this is seen to reduce to the usual normalization [5] and, as is the case there, it can be shown that this is not a real restriction on the network; rather, it is a description of the networks which emerge when constructed from some of the known learning rules (see Sect. 3). Experience with λ = 1 networks has also shown the need to adjust the stability condition in equation (2.1) from Λ_i^μ > 0 to Λ_i^μ > κ (where κ is a positive constant which is taken for convenience to be independent of μ and i), so as to produce stable states with non-negligible basins of attraction.
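With these conventions the local stabilities of equation (2.1) can be evaluated directly. The sketch below does so for λ = 2, taking e_N as the square root of the number of independent couplings per neuron; the function name and array layout are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def stabilities(T, xi):
    """Local stabilities of Eq. (2.1) for lambda = 2:
    Lambda[mu,i] = (xi[mu,i]/e_N) * sum_{j<k; j,k != i} T[i,j,k] xi[mu,j] xi[mu,k].
    T: (N,N,N) couplings (independent entries at j < k); xi: (P,N) patterns."""
    P, N = xi.shape
    eN = np.sqrt((N - 1) * (N - 2) / 2.0)   # sqrt of number of independent couplings
    pairs = list(combinations(range(N), 2))
    Lam = np.empty((P, N))
    for mu in range(P):
        for i in range(N):
            Lam[mu, i] = xi[mu, i] / eN * sum(
                T[i, j, k] * xi[mu, j] * xi[mu, k]
                for (j, k) in pairs if j != i and k != i)
    return Lam
```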
Following Gardner [5], the storage capacity is then obtained from the fractional volume of coupling space compatible with the stability conditions, the maximum capacity being defined by the vanishing of this volume. The fractional volume is simply V_T = Π_i V_i, where V_i is the normalized volume of couplings T_{i j_1 ... j_λ}, subject to the normalization above, for which Λ_i^μ > κ for every μ. The typical fractional volume is then found by averaging V_T over all realizations of the stored patterns. Since V_T factorizes over the index i, the assumption of self-averaging with respect to i should be valid. Therefore, write ln V_T = Σ_i ln V_i and perform the quenched average on ln V_T, i.e., on each subsystem ln V_i. This average can be done via the replica method:

⟨ln V_i⟩ = lim_{n→0} ( ⟨V_i^n⟩ − 1 ) / n .
For the definition of V_i given above, the calculation of the quenched average reduces to evaluating ⟨V_i^n⟩ over the random patterns. Using the standard integral representation of the θ-function, the integrals over the x-variables are restricted to the range [κ, ∞) while the other integrals remain unrestricted. In order to calculate the average over the patterns, the exponential is rewritten so that the sum over the patterns acts only on terms in square brackets; when these brackets are expanded out, for λ = 1 the extra terms vanish identically. For λ > 1 they do not vanish, but since the patterns are random and uncorrelated, a priori one expects that the T_{i j_1 ... j_λ} will also be uncorrelated. Hence, with on the order of N^{λ(λ+1)/2} terms in the sum and e_N^{λ+1} ~ N^{λ(λ+1)/2}, these sums are of order N^{-λ(λ+1)/4}. For large N these terms are therefore negligible and, to lowest order in 1/N, the quenched average takes the same form as in the two-neuron case. If the overlap between coupling replicas, q_ab, is now introduced, then equation (2.11) becomes identical to that found by Gardner for the case of two-neuron interactions. Hence, the rest of the analysis proceeds identically to her calculation and the interested reader is referred to her original paper for the details. The main result, obtained at the replica symmetric point, which can be shown to be stable, is that the storage capacity at saturation, i.e., the maximum allowed α for a given κ, behaves as:

α_c(κ) = [ ∫_{-κ}^{∞} Dz (z + κ)² ]^{-1} ,   (2.13)

where Dz denotes the unit Gaussian measure.
When κ = 0 the patterns are minimally stable and the storage capacity has its maximum value of α = 2. This proves that networks having many-neuron interactions store information no more efficiently than networks having only two-neuron interactions. They do store much more information than two-neuron connections, with the number of storable patterns growing as N^λ/λ!; however, they require of the same order of magnitude more connections to reach this higher level of storage.
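Equation (2.13) is straightforward to evaluate numerically. The following short check, a sketch assuming SciPy quadrature, reproduces the limiting value α = 2 at κ = 0 and the corresponding pattern count α N^λ/λ!:

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

def alpha_c(kappa):
    """Saturation capacity from Eq. (2.13):
    alpha_c(kappa) = 1 / int_{-kappa}^{inf} Dz (z + kappa)^2,
    with Dz the unit Gaussian measure; independent of lambda."""
    gauss = lambda z: np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)
    val, _ = quad(lambda z: gauss(z) * (z + kappa)**2, -kappa, np.inf)
    return 1.0 / val

print(alpha_c(0.0))                            # -> 2.0, the maximum capacity
N, lam = 64, 2
print(alpha_c(0.0) * N**lam / factorial(lam))  # storable patterns ~ alpha N^lam / lam!
```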
Dynamic properties.

The dynamical behaviour of these models is at least as important as the storage properties for application purposes. In order to function as an associative memory, the stable states must be able to attract almost all states within a « reasonable » Hamming distance, i.e., the distance must not be so large that states having spurious correlations with the stored state are attracted, but it also must not be so small that only states which are almost identical to the stored state are attracted. The determination of this distance is a very difficult problem analytically. For strongly connected networks with two-neuron interactions, only the first step of the time evolution is known exactly [6], although the attraction basins have been studied by computer simulations [7-9].
To gain a simple understanding of the dynamics for λ > 1, first compute the probability, P(m(1)|m(0)), that a state of the network at time t = 1 has overlap m(1) (where m(t) ≡ (1/N) Σ_i ξ_i^μ S_i(t)) with one of the stored states, given that it had overlap m(0) with the same stored state at time t = 0 and random overlap with all of the other stored states. The evaluation of this probability proceeds along the lines of Kepler and Abbott's [6] work for the two-neuron interaction network and, using the same argument as in section 2 for computing the sum over configurations, the result (equation (3.2), with Λ as defined in equation (2.1)) is quite simple. If the distribution of the Λ_i's is known, then (3.2) gives an explicit equation for m(1). However, the only restriction on the Λ_i's is Λ_i > κ ∀i, which is independent of λ; hence, they should have the same distribution as for λ = 1 [10]. (It is very easy to check this via a replica calculation similar to that of Sect. 2.) Thus, with probability one, m(1) is given by equation (3.3).
For λ = 1, the calculation of m(2) is very difficult because it depends not only upon m(1) but also upon the parameter p(1) := Σ_μ [m_μ(1)]², which measures the projection of the state of the network onto the subspace spanned by the stored states. (At the first time step this parameter does not enter explicitly, since the initial state of the network has been assumed to have macroscopic overlap with only one stored pattern and random overlap with all of the other stored patterns.) The calculation of the time evolution of p(t) appears to be an intractable problem [11].
When λ > 1, however, the analogous variable, p_λ(t) ∝ Σ_μ [m_μ(t)]², does not play such a fundamental role, because the complete configuration space is always spanned by the set of stored patterns (provided of course that α ≠ 0, and in the limit N → ∞), so that p_λ is independent of time. The effect of this time independence can be seen by considering the basins of attraction around the stored states.

A convenient measure of the « size » of the basins of attraction around a given stored state can be computed from equation (3.3). This measure is often called the radius of attraction and is denoted by R [15], with R being defined as R = 1 - m_u, where m_u is the unstable fixed point of (3.3) at a given value of κ (and, as a consequence of Eq. (2.13), an equivalent value of α). R is plotted in figure 2 as a function of α for several values of λ. If m(0) > m_u, then the network will iterate toward the stable fixed point, m = 1. If, on the other hand, m(0) < m_u, it will flow toward m = 0. For λ = 1, however, it has been noticed that the system may iterate in the first few time steps toward the fixed point m = 1 and then at later times reverse direction and flow toward m = 0 [12]. Once the system starts flowing toward m = 0, however, it has not been observed to reverse itself. (For the related case of feed-forward networks one can solve the dynamics exactly and prove rigorously that such an effect does indeed exist [13].)
Thus it can be conjectured, and is supported by the data (see next Sect.), that figure 2 gives the upper bound for the radius of attraction when λ = 1. The reversal of the network's direction of flow has previously been related to the time evolution of p(t) [11]. Since, for λ > 1, p_λ should be independent of time, it can be conjectured that no such reversal takes place and that the fixed points of equation (3.3) determine the radius of attraction exactly. In the next section, numerical simulations will show this in fact to be the case.

Fig. 2. — R = 1 - m_c, where m_c are the fixed points of equation (3.3), vs. α for several values of λ.
vs. a for several values of À.There is another
qualitative
difference between the flowdiagram
for À = 1 andÀ >
1,
namely
the nature of the transition to R = 1. For À =1,
equation
(3.3)
predicts
asecond order transition to R = 1 at the value of rc defined
by :
or a = 0.419.
While,
for À >1,
equation
(3.3)
predicts
a first orderjump
to R = 1 at a = 0. The size of thisjump
increases as À increases and goes to 1 as A - oo. In otherwords,
as A - oo, R --> 0 for all a > 0 and R = 1 at a = 0.
For the dynamical properties, then, there are quantitative as well as qualitative differences between networks with only two-neuron interactions and networks with many-neuron interactions.

Numerical simulations.
A learning algorithm to generate networks satisfying the constraints of section two for many-neuron interacting networks is easily constructed from a generalization of the learning rules developed for two-neuron interacting networks [5, 8, 14]. Although any of the proposed learning rules can be generalized, the generalization here is based upon the learning rule developed by the Edinburgh group. Starting with the first pattern and moving sequentially through all the patterns (and in a sequential or parallel manner through the neurons), define an « error mask », ε_i^μ, equal to 1 whenever the stability condition Λ_i^μ > κ is violated and 0 otherwise, then update the coupling matrix as:

T_{i j_1 ... j_λ} → T_{i j_1 ... j_λ} + (ε_i^μ / e_N) ξ_i^μ ξ_{j_1}^μ ... ξ_{j_λ}^μ .

After all the patterns have been checked in this manner, the procedure is repeated until the stability condition is satisfied for every pattern on every neuron. This learning algorithm can be shown to converge to a solution of the stability conditions provided such a solution exists. The proof makes no essential departures from that given in reference [5] for the case of two-neuron interactions and so will not be repeated here.
For finite size networks, one must compensate for the deviation of α from that given by equation (2.13). This is done by looking for the value of κ_N at which the learning time tends to infinity. κ_N will be somewhat less than the κ predicted by equation (2.13), but the finite size system will have properties which are very close to those of the infinite volume system [9].
As a test of the foregoing analytical calculations, m(1) was measured as a function of m(0) for α = 0.20 and α = 0.40 at λ = 2 and a network size of N = 64. As seen in figures 3a and 3b, the results agree quite nicely with the analytic predictions. As m(0) becomes small, the interference from the other stored states increases (a priori one expects interference to set in around m(0) ~ O(1/√N)) and the agreement with the theoretical calculations is predictably worse.
Now, in the previous section it was conjectured that the radius of attraction could be calculated from equation (3.3) for λ > 1. To show that this is indeed the case, a definition of R is needed which takes the finite size of the network into account. A working definition of the radius of attraction R is 1 - m_c, where m_c is the value of the overlap such that, as N → ∞, almost all of the states having m > m_c will evolve toward m = 1. For finite size systems, R is reduced due to the O(1/√N) overlap between the stored states. Hence, a better definition of R is [15]:

R = ⟨ (1 - m_c) / (1 - m_av) ⟩ ,

where m_av is the average overlap of the given state with all of the other stored states and the bracket indicates an average over all stored states and all starting configurations. (As N → ∞, the m_c defined here tends to the m_u defined in the previous section and m_av tends toward zero.)
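Numerically, m_c can be bracketed by preparing states at a prescribed initial overlap and iterating the dynamics until they settle; the sketch below does this for a single stored pattern, reusing an update function such as the one sketched in the introduction, with all parameter choices illustrative.

```python
import numpy as np

def basin_fraction(T, xi, m0, update, n_trials=20, t_max=50, rng=None):
    """Fraction of states at initial overlap m0 with the stored pattern xi
    that flow back to it: flip a random fraction (1 - m0)/2 of the spins
    and iterate.  Scanning m0 brackets m_c, from which
    R = <(1 - m_c)/(1 - m_av)> can be estimated."""
    rng = rng or np.random.default_rng()
    N = len(xi)
    n_flip = int(round(0.5 * (1.0 - m0) * N))   # overlap after flips is 1 - 2f
    hits = 0
    for _ in range(n_trials):
        S = xi.copy()
        flip = rng.choice(N, size=n_flip, replace=False)
        S[flip] = -S[flip]
        for _ in range(t_max):
            S = update(S, T)
        if np.mean(S * xi) > 0.95:   # converged essentially onto the pattern
            hits += 1
    return hits / n_trials
```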
The results obtained when calculating R using this prescription are plotted in figure 4. (The results for λ = 1 [9] are also plotted for comparison.) As can be seen, the fixed points of equation (3.3) agree very well with the numerical simulations for the radius of attraction when λ = 2. (Note also that for λ = 1, equation (3.3) becomes a very good predictor of R for large α, where p(t) is much more restricted.) Although it would certainly be desirable to have results for larger values of λ, such simulations become increasingly costly for fully connected networks near saturation, with the memory growing like O(N^λ/λ!). Hence, these simulations must await the next generation of parallel computers.

Fig. 3. — a) m(1) vs. m(0) at α = 0.20. The solid line is a plot of equation (3.3). b) m(1) vs. m(0) at α = 0.40. The solid line is a plot of equation (3.3). All the data was taken on a network with N = 64 neurons.
Discussion.

It has been shown that networks with many-neuron interactions do not store information more efficiently than networks with only two-neuron interactions, which points to the global nature of the limiting value of α = 2. Although the storage capacity of the original Hopfield model can be improved upon by deleting connections [16], or by using Ising (± 1) variables for the synapses [17], or by using restricted range interactions [18], none of these approaches leads to increasing α above 2 as long as random uncorrelated patterns are used. It would be interesting to know if α is limited simply by the two-state nature of the spin variables or by some more fundamental mechanism. Of course, one can consider correlated patterns, in which case the storage capacity defined above goes to infinity as all of the patterns become perfectly correlated; however, if one then looks at the quantity information/synaptic coupling, this decreases as the correlation between the patterns increases [5], implying once again some fundamental limit to the information storage capacity of neural networks.

Fig. 4. — The radius of attraction for λ = 1 (○) and λ = 2 (□). For λ = 1 the network size was N = 100 while for λ = 2 the network size was N = 64. Also shown are the theoretical predictions.
On the other hand, many-neuron interacting networks do have a different dynamical behaviour, as indicated by equation (3.3). This solution of the dynamics indicates that many-neuron interacting neural networks are far better at pattern discrimination than two-neuron interacting networks, i.e., when operating at a given radius of attraction, many-neuron interacting networks are able to store more patterns than the simple two-neuron interacting networks. For example, a reasonable radius of attraction for the purpose of associative memory is R = 0.20 (10 % wrong spins). Operating at this value of R, networks with two-neuron interactions can store approximately 0.52 N patterns, while a network with three-neuron interactions can store approximately 0.35 (N²/2!) patterns and a network with four-neuron interactions can store 0.18 (N³/3!) patterns. Hence, the real power of networks with many-neuron interactions will become apparent in applications where it is necessary to obtain a predetermined level of discrimination between large numbers of patterns. (For real world applications this will be important only if the cost of making many-neuron interactions is not significantly more than that for making two-neuron interactions. Such is certainly the case in fast ...) Near saturation, however, the radius of attraction shrinks toward zero, thus making the network useless for most associative memory applications independent of the value of λ. Instead of working at large α, then, these results imply that it would be better to go to a larger value of λ and then work at a smaller value of α.

Finally, it should be added that the calculations presented above can be extended to include networks of mixed type, i.e., networks having, for example, two-body and three-body interactions, etc., and to networks having less than full connectivity. In such cases, the efficiency of information storage is unchanged from that above, but the dynamics undergoes predictable changes depending upon the exact architecture one is considering.
Acknowledgments.

I would like to thank the theory group at the University of Bonn for their hospitality throughout the earlier phases of this work and the HLRZ for its hospitality while the computer calculations were in progress.

References
[1] PERETTO P. and NIEZ J. J., Biol. Cybern. 54 (1986) 53.
[2] PERSONNAZ L., GUYON I. and DREYFUS G., Europhys. Lett. 8 (1987) 863; GARDNER E., J. Phys. A 20 (1987) 3453; ABBOTT L. F. and YAIR ARIAN, Phys. Rev. A 36 (1987) 5091.
[3] BALDI P. and VENKATESH S. S., Phys. Rev. Lett. 58 (1987) 913.
[4] VENKATESH S. S., Neural Networks for Computing, Ed. J. S. Denker (American Inst. of Phys., New York) 1986.
[5] GARDNER E., Europhys. Lett. 4 (1987) 481; J. Phys. A 21 (1988) 257; GARDNER E. and DERRIDA B., J. Phys. A 21 (1988) 271.
[6] KEPLER T. B. and ABBOTT L. F., J. Phys. France 49 (1988) 1657; KRAUTH W., MÉZARD M. and NADAL J.-P., Complex Syst. 2 (1988) 387.
[7] WEISBUCH G., J. Phys. Lett. France 46 (1985) L623; KRAUTH W., NADAL J.-P. and MÉZARD M., J. Phys. A 21 (1988) 2995.
[8] FORREST B. M., J. Phys. A 21 (1988) 245.
[9] KRÄTZSCHMAR J. and KOHRING G. A. (to be published).
[10] KINZEL W. and OPPER M., Physics of Neural Networks, Eds. J. L. van Hemmen, E. Domany and K. Schulten (Springer Verlag; to be published).
[11] KOHRING G. A., Europhys. Lett. 8 (1989) 697.
[12] GARDNER E., DERRIDA B. and MOTTISHAW P., J. Phys. France 48 (1987) 741.
[13] MEIR R. and DOMANY E., Phys. Rev. Lett. 59 (1987) 359; Phys. Rev. 37 (1988) 608.
[14] KRAUTH W. and MÉZARD M., J. Phys. A 20 (1987) L745; ABBOTT L. F. and KEPLER T. B., J. Phys. A 22 (1989) L711.
[15] KANTER I. and SOMPOLINSKY H., Phys. Rev. 35 (1987) 380.
[16] CANNING A. and GARDNER E., J. Phys. A 21 (1988) 3275; DERRIDA B., GARDNER E. and ZIPPELIUS A., Europhys. Lett. 4 (1987) 167.
[17] AMALDI E. and NICOLIS S., J. Phys. France 50 (1989) 2333; KRAUTH W. and OPPER M., J. Phys. (to be published); KRAUTH W. and MÉZARD M., Laboratoire de