Quasi-optimized memorization and retrieval dynamics in sparsely connected neural network models

Karl E. Kürten

Institut für Theoretische Physik, Universität zu Köln, D-5000 Köln, F.R.G. and HLRZ c/o KFA Jülich, D-5170 Jülich 1, F.R.G.

(Received 29 March 1990, accepted 4 April 1990)
Abstract. - Several network topologies with sparse connectivity suitable for information processing in neural network models are studied. A trial and error scheme for quasi-optimal neighbour search is shown to successfully overcome instabilities giving rise to chaotic behaviour. Moreover, computer simulations reveal that for random unbiased patterns the connectivity of the network can be adapted to the specific structure of the information the network is asked to capture, such that it can be quasi-optimally stabilized. On the other hand, networks with random or purely nearest-neighbour interactions are not competitive candidates for the realization of associative memories. It is shown further that sparsely connected network models with quasi-optimal neighbour search techniques substantially outperform their fully connected counterparts.

J. Phys. France 51 (1990) 1585-1594, 1 August 1990

Classification Physics Abstracts: 87.30 - 75.10H - 64.60
1. Introduction.
There has been a tremendous amount of interest in dynamical neural network models exhibiting collective phenomena associated with information processing and cognitive behaviour: storage and retrieval of memories, learning and forgetting, association and abstraction [1-3]. One detrimental assumption of some of the currently popular neural network models is full connectivity of the neuron-like elements. On the one hand, high connectivity is rarely found in nature. On the other, highly connected networks of respectable size, suitable for such cognitive tasks as pattern recognition and image processing, raise immense wiring problems in hardware realization and make real-time software simulations computationally expensive. In fact, it is well known that realistic biological networks are rather sparsely interconnected, and it is also well known that systems with low connectivity are often amenable to analytical study [3, 5].
2. Formalism.
The network model consists of N sparsely interconnected logical units, only capable of the values σ_i = +1 and σ_i = -1, where each cell i is supposed to receive exactly K inputs from other units of the network. The dynamical time evolution of the system is then formulated in terms of the local fields h_i(t),

$$ h_i(t) = \frac{1}{c_i} \sum_{j} C_{ij}\,\sigma_j(t), \qquad (1) $$

where j runs over the arbitrary K neighbours of cell i, not necessarily nearest neighbours, and the normalization factor 1/c_i is defined by

$$ c_i = \Big( \sum_{j} C_{ij}^2 \Big)^{1/2}. \qquad (2) $$

The interaction matrix C is assumed to be real, and h_i can be interpreted as the net internal stimulus from cells synapsing onto cell i, represented by the usual linear sum of weighted inputs. The actual binary state is then determined via the deterministic threshold rule

$$ \sigma_i(t+1) = \mathrm{sgn}\big[ h_i(t) \big]. \qquad (3) $$
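For concreteness, the update defined by equations (1)-(3) can be written out in a few lines. The following is a minimal sketch, not the code used for the simulations reported in this paper; NumPy is assumed, and all names (step, nbr) are illustrative.

```python
# Sparse threshold dynamics of equations (1)-(3); a minimal sketch, not the
# paper's simulation code. Self-connections are not excluded, for brevity.
import numpy as np

rng = np.random.default_rng(0)
N, K = 625, 8                                  # cells and inputs per cell
nbr = np.array([rng.choice(N, size=K, replace=False) for _ in range(N)])
C = rng.normal(size=(N, K))                    # real couplings C_ij, one row per cell
sigma = rng.choice([-1, 1], size=N)            # binary states sigma_i

def step(sigma, C, nbr):
    """One parallel update: fields (1), normalization (2), threshold rule (3)."""
    c = np.sqrt((C ** 2).sum(axis=1))          # c_i = (sum_j C_ij^2)^(1/2), eq. (2)
    h = (C * sigma[nbr]).sum(axis=1) / c       # local fields h_i(t), eq. (1)
    return np.where(h >= 0, 1, -1)             # sigma_i(t+1) = sgn(h_i), eq. (3)

sigma = step(sigma, C, nbr)
```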
The network is now supposed to learn a series of p = αK random, unbiased configurations π^(μ) = (π_1^(μ), ..., π_N^(μ)), μ = 1, ..., p, with p being proportional to the connectivity parameter K. A necessary though minimal requirement is that the pN polarization parameters R_i^(μ), defined as

$$ R_i^{(\mu)} = \frac{\pi_i^{(\mu)}}{c_i} \sum_{j} C_{ij}\,\pi_j^{(\mu)}, \qquad (4) $$

are larger than zero, which guarantees that the patterns to be memorized are fixed points of the dynamics (3).
In order to engrave the information as strongly as possible one has to adjust the couplings such that the polarization parameters in (4) satisfy the inequality

$$ R_i^{(\mu)} \geq \kappa > 0, \qquad (5) $$

with the quantity κ as large as possible. In the case of random, uncorrelated patterns and random connectivity the upper storage capacity α_c for given κ has been shown to satisfy [2]

$$ \alpha_c(\kappa) = \left[ \int_{-\kappa}^{\infty} \frac{\mathrm{d}t}{\sqrt{2\pi}}\, e^{-t^2/2}\, (\kappa + t)^2 \right]^{-1} \qquad (6) $$

in the thermodynamic limit. Equation (6) implies that the critical α_c = 2 is reached when κ vanishes. In view of technical applications, however, where the total number of cells N is finite and some patterns might be strongly correlated, one is faced with the problem of producing a set of C_ij so as to satisfy condition (5) with κ as large as possible.
A straightforward generalization of the perceptron learning algorithm [6] suggests the following local dynamical learning rule:

$$ C_{ij} \rightarrow C_{ij} + \eta\, \pi_i^{(\mu)} \pi_j^{(\mu)}\, \Theta\big( \kappa - R_i^{(\mu)} \big), \qquad (7) $$

where Θ denotes the unit step function and η > 0 is a small learning step. Thus, patterns well embedded have no impact on the modification of the synaptic efficacies, whereas patterns not satisfying (5) become effective in modifying the couplings.
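As an illustration, one sweep of rule (7) over the pattern set might look as follows. This is a sketch under the assumptions made explicit above (step-function gating, normalized polarizations); the step size eta is an illustrative choice, not a value from the paper, and patterns is a p x N array of ±1 entries.

```python
# One pass of the margin-perceptron rule (7) over all stored patterns.
# A sketch, not the paper's code; eta is an illustrative learning step.
import numpy as np

def learning_sweep(C, nbr, patterns, kappa, eta=0.1):
    """Update the couplings of every cell whose polarization violates (5)."""
    updates = 0
    for pi in patterns:                                       # pi: shape (N,)
        c = np.maximum(np.sqrt((C ** 2).sum(axis=1)), 1e-12)  # c_i, eq. (2)
        R = pi * (C * pi[nbr]).sum(axis=1) / c                # R_i^(mu), eq. (4)
        weak = R < kappa                                      # condition (5) violated
        C[weak] += eta * pi[weak, None] * pi[nbr[weak]]       # rule (7)
        updates += int(weak.sum())
    return C, updates
```

Repeating such sweeps until updates stays at zero realizes the "tabula rasa" training described below when C is initialized to zeros.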
If then, on average, the magnitude of the polarization parameters R_i^(μ) in (4) is maximized, the sizes of the basins of attraction are also expected to be fairly close to their optimal magnitudes. Here it is useful to start from « tabula rasa » rather than from randomly chosen couplings.
3. Stability analysis.
Exact results can be derived in the thermodynamic limit for connectivity parameters K ∝ log N, if the connectivity as well as the patterns are chosen at random. The fractional Hamming distance H(t) between a spin configuration and an arbitrary stored pattern at time t can be given as a function of the distance at the previous time step and the polarization parameter κ, related to the critical value of α by (6) [10, 5]:

$$ H(t+1) = \Phi\big(H(t)\big) = \int \mathrm{d}\Delta\, \rho_\kappa(\Delta)\, \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \frac{\Delta\, \big(1 - 2H(t)\big)}{2 \sqrt{2\, H(t)\, \big(1 - H(t)\big)}} \right) \right], \qquad (8) $$

where ρ_κ(Δ) denotes the distribution of the polarization parameters at saturation, which carries all its weight at and above κ. Here, erf(z) denotes the error function defined as

$$ \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\, \mathrm{d}t. \qquad (9) $$
The time evolution of H(t) admits the natural fixed points H = 0 and H = 1/2, and the existence of a phase transition from an ordered to a chaotic behaviour depends only on the magnitude of the slope of Φ(H) evaluated at the fixed points.
Figure 1 shows that dΦ/dH at H = 0 exhibits a singularity at κ = 0, which allows no recognition for α_c = 2, since the fixed point H = 0 is unstable. On the contrary, for κ ≠ 0, dΦ/dH at H = 0 and all higher derivatives remain strictly zero. Hence, the fixed point H = 0 is always stable and any stored pattern can in principle be recovered if it shares a sufficiently large overlap with the initial condition. Moreover, as can be seen from figure 1, for 0 < κ < κ_G ≈ 1.2, dΦ/dH at H = 1/2 does not exceed the value one, which implies that in this parameter regime the fixed point H = 1/2 is also stable, giving rise to the existence of a strictly unstable fixed point H*. Hence, the memory of initial conditions can be completely erased, since the time evolution of the overlap of slightly different configurations with a stored pattern might evolve to zero. According to equation (6), κ_G corresponds to a critical capacity α_G = 0.42. In contrast to random model networks [5], the prevalence of the ordered and the chaotic phase not only depends on the network parameters, but also on the initial conditions. For κ above the critical κ_G the unstable fixed point H* disappears and the fixed point H = 1/2 changes its stability from attraction to repulsion. Then, any initial configuration with a Hamming distance H(t=0) < 1/2 with respect to a stored pattern will lead to perfect recognition.
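The two critical numbers quoted above can be checked numerically under the reconstruction of equations (6) and (8) given here. The sketch below, which assumes SciPy, finds the value of κ at which the slope of Φ at H = 1/2 reaches one and evaluates the corresponding capacity; it illustrates the stability criterion and is not code from the paper.

```python
# Locate kappa_G (slope of Phi at H = 1/2 equal to one) and the corresponding
# capacity alpha_G from eq. (6). A numerical sketch assuming SciPy.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def slope_at_half(kappa):
    """dPhi/dH at H = 1/2: sqrt(2/pi) times the mean stability at saturation."""
    return np.sqrt(2.0 / np.pi) * (kappa * norm.cdf(kappa) + norm.pdf(kappa))

def alpha_c(kappa):
    """Gardner capacity, eq. (6), via E[(kappa + t)^2; t > -kappa], t ~ N(0,1)."""
    return 1.0 / ((kappa ** 2 + 1.0) * norm.cdf(kappa) + kappa * norm.pdf(kappa))

kappa_G = brentq(lambda k: slope_at_half(k) - 1.0, 0.5, 3.0)
print(kappa_G, alpha_c(kappa_G))   # approximately 1.2 and 0.42
```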
Fig. 1. - The slope of Φ(H) for a) κ = 0, b) κ = 0.5, c) κ = 1.198 and d) κ = 4.
For H ≪ 1 and κ ≠ 0 equation (8) can be well approximated by

$$ \Phi(H) \simeq C(\kappa)\, \sqrt{H}\; e^{-\kappa^2/(8H)}, \qquad (10) $$

whereas for H close to the value 1/2 the linearization

$$ \Phi(H) \simeq \frac{1}{2} + \frac{\mathrm{d}\Phi}{\mathrm{d}H}\bigg|_{H=1/2} \left( H - \frac{1}{2} \right) \qquad (11) $$

is a good approximation. Equation (10) suggests rather fast local convergence to a stored pattern, since its right-hand side vanishes faster than any power of H.
Note that equation (8) is a scalar mean-field equation in the thermodynamic limit, and the two basins of attraction are well separated by the relative position of the unstable fixed point H*. However, for a finite number of cells a small fractional Hamming distance does not guarantee convergence to a stored pattern, since judgement involves a measure in N-dimensional space, where, depending on the number of stored memories, the basins of attraction can be degraded by large crevices and holes close to the stored information. These holes allow errors to be incurred for small Hamming distances from the stored states and degrade the network performance substantially.
4. Computer simulations.

In this section we give some results of our computer simulations for three network models.
We perform repeated learning passes and stop the learning procedure (7) when the average of all polarization parameters reaches a near-optimal value. Before the retrieval phase the original patterns are degraded by random noise and presented as initial conditions for the network to recall.
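A sketch of this retrieval protocol, reusing the illustrative step() update from section 2 (again an assumption about the implementation, not the paper's code):

```python
# Degrade a stored pattern with random noise, relax the network, and report
# the fraction of correctly recalled bits. A sketch built on step() above.
import numpy as np

def recall_fraction(pattern, C, nbr, flip_prob, t_max=500, seed=1):
    """Fraction of correct bits after relaxation from a noisy version of pattern."""
    rng = np.random.default_rng(seed)
    flip = rng.random(pattern.size) < flip_prob
    sigma = np.where(flip, -pattern, pattern)     # degraded initial condition
    for _ in range(t_max):
        nxt = step(sigma, C, nbr)
        if np.array_equal(nxt, sigma):            # a fixed point of the dynamics (3)
            break
        sigma = nxt
    return float(np.mean(sigma == pattern))
```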
4.1 NEXT-NEIGHBOUR SQUARE LATTICE MODEL. - We first study networks with cells residing on a two-dimensional square lattice, restricted to exclusively eight next-nearest-neighbour interactions. Figure 2 shows that the fraction of bits not satisfying condition (5), even with κ = 0, grows with increasing α. Thus, a portion of the patterns are not fixed points of the dynamics (3) and consequently cannot be recalled perfectly. Though for α = 0.375 and α = 0.5 figure 2 shows that the output-input ratio is essentially larger than one, the network is hardly capable of functioning as an associative memory. The results are in accord with Forrest's findings [7], who reports a somewhat smaller output-input ratio, explainable through the restriction of only allowing clipped coupling coefficients, which reduces the storage capacity and the performance of the network. For α = 0.625 the situation gets worse: the retrieval rate is in fact less than the input rate. These results are independent of N for N > 225; the simulations have been performed for N = 225, N = 625 and N = 2500, averaged over ten different initial configurations and some twenty independent systems. It is obvious, therefore, that models with exclusively nearest-neighbour interactions are not at all designed to process random patterns.
Fig. 2. - Mean fraction of correctly recalled bits as a function of the fraction of initially correct bits for K = 8 (square lattice), α = 0.375, α = 0.5 and α = 0.625 (from above). Stars represent Forrest's results for α = 0.5 based on clipped coupling coefficients (|c_ij| = 1).
4.2 RANDOM NEIGHBOUR LONG-RANGED MODEL. - In a next step we study a somewhat less restricted and more realistic class of networks, where the units choose their K arbitrary neighbours at random from the entire network.
As has been found for the next-nearest-neighbour model, computer simulations for K = 8 reveal that after the learning process a small portion of the patterns to be memorized are not fixed points of the dynamics (3), though this fraction decreases to zero with increasing N. Figure 3 shows the mean fraction of correctly retrieved spins as a function of the fraction of initially correct spins for α = 0.375 and different total numbers of cells N. Note the strong size dependence, clearly indicating that, for large N, an initial damage of less than fifty percent leads to an essentially perfect recall, as predicted by Gardner's theoretical results [10] in section 3.
Fig. 3. - Mean fraction of correctly recalled spins as a function of the fraction of initially correct spins for K = 8 (random connectivity), N = 225, N = 625 and N = 2500 cells for α = 0.375 (from below).
As is to be expected, the situation changes drastically for α = 0.5. The stability of the fixed point H = 1/2 reduces substantially the magnitude of the basins of attraction of the individual attractors. We observe that with increasing degree of noise the average transient length for parameter values above the critical α_G = 0.42 increases dramatically, being several orders of magnitude larger than below the critical α_G. Moreover, the system often does not lock into a fixed point after more than some five hundred time steps, wandering around in phase space, indicating that attractors corresponding to chaotic behaviour become dominant. Though the performance of random neighbour models is perfect for α = 0.375, it becomes very poor and even breaks down for capacities α above the critical value α_G. The sudden appearance of long transients and failures in reaching a cyclic mode is quite reminiscent of damage-spreading effects in networks with randomly chosen couplings, where above a critical parameter value disorder grows exponentially with the number of cells [8, 9] and chaotic dynamics prevails.
The aim is now to reach a storage capacity far beyond the critical capacity α_G = 0.42. Furthermore, all the patterns to be memorized should be stable fixed points of the dynamics (3) and their basins of attraction should be as large as possible.
As a firstimprovement,
weadopt
a trial and error schemeby
choosing
new randomneighbours
if the initial randomneighbour
choice fails to fulfill theembedding
condition(5).
Thestrategy
isrepeated
until the network has found itsquasi-optimal
connections.Figure
4 shows the recallperformance
of such a network f-or K = 8 and a = 0.5 for different numbers of cells N. The final connections have been obtainedafter a few trials on average,
however,
there arelarge
fluctuations,
since most of thepolarization
parameters
satisfy (6)
already
after the initial randomneighbour
choice.Fig.
4. - Mean fraction ofcorrectly
recalled bits as a function of the fraction of correctinput
bits forK = 8, N = 225, N = 625 and N = 2 500 for a = 0.5
(from above).
Crosses show the samequantity
forN = 225 cells without the trial and error strategy.
Note that the improved model for α = 0.5 performs somewhat better than the one without quasi-optimization for α = 0.375. Furthermore, computer simulations indicate that also for α far beyond 2 the trial and error scheme works successfully, though with increasing α the learning time increases rapidly due to larger numbers of trials until the embedding condition (5) is fulfilled. In figure 5 we compare the performance of these optimized sparsely connected networks to that of a fully connected model for α = 0.5. Though in all networks the amount of stored information in bits is alike (3200 bits), and all networks have the same number of 6400 couplings, it is quite clear from the figure that the sparsely connected networks outperform their fully connected counterparts. Note, however, that for structured information the information content is not the same as for random patterns.
Fig. 5. - Mean fraction of correctly recalled spins as a function of the fraction of correct input spins for sparsely connected networks consisting of N = 800, N = 400 and N = 200 cells with α = 0.5 and K = 8, K = 16 and K = 32 couplings, respectively (from the top). The dashed line shows the same quantity for a fully connected network consisting of N = 80 cells.
Due to stronger correlations within the individual patterns, the information content per bit decreases strongly with increasing resolution in the case of structured patterns. Thus, according to figure 5, depending on the technical task, one can choose between memorizing a few particular patterns with relatively high resolution (K smaller) and a larger number of patterns with relatively low resolution (K larger).
Let us now study the performance of the network with optimized connectivity, where the ratio of the number of connections and the number of cells, γ = K/N, remains constant. Figure 6 shows the retrieval performance of sparsely connected networks with γ = 0.16 compared to fully connected networks for different numbers of cells N and α = 0.5. As has been observed by Forrest [4] for the case γ = 1, the recall fraction exhibits a crossover effect for increasing N for γ < 1, too. As the number of cells N increases, there is a dramatic increase (decrease) in the level of errors when the fraction of correct input bits is reduced below (increased above) a critical value. It is quite evident that the smaller the quantity γ, the more the critical retrieval value is reduced, so that the performance of the network improves.
Our network model is not restricted to the storage of random information but is also able to process artificially structured patterns [11]. Here, we first connect each cell with its eight next-nearest neighbours, while additional suitable long-ranged connections emerge from a trial and error scheme. The resulting architecture is reminiscent of Braitenberg's picture [12] of the interconnectivity of cortical pyramidal cells, making short-range connections with basal dendrites and long-range connections with apical dendrites. According to the physiological rule that low-efficacy synapses degenerate, we suggest applying a trial and error scheme.
Fig. 6. - Mean fraction of correctly recalled patterns as a function of the fraction of initially correct spins for a) N = 100, N = 200 and N = 300 cells with γ = 0.16, and b) N = 40, N = 80 and N = 120 cells with γ = 1. The storage parameter α has been chosen as 0.5.
This biologically motivated strategy is highly beneficial for improving the convergence to near-optimal answers.

5. Conclusion.

We have shown that for randomly chosen information the performance of optimized sparsely connected networks is quite superior to that of short-ranged lattice models as well as to that of their fully connected counterparts. It is quite evident that our strategy for solving the combinatorial optimization problem can be largely improved. Here, parallel genetic algorithms [13], in the spirit of the discrete Mendelian genetic model, can be introduced with a view to getting closer to near-optimal answers. The general philosophy is that the specific neighbour choice, as used here, and in future work also other network parameters such as the degree of connectivity and thresholds, ought to be adapted to the specific information the network has to capture, and hence can be subject to modification during an optimization process.

Acknowledgments.
The author gratefully acknowledges financial support for this work by the German Federal Department of Research and Technology (BMFT) under grant Nr. ITR 8800K4. It is a pleasure to thank the von Seelen group for their hospitality at the Institute for Neuroinformatics in Bochum during several visits. The author appreciates many useful and stimulating discussions with J. W. Clark, J. Duarte, M. Husson, J. L. van Hemmen, U. Keller, G.

References