HAL Id: jpa-00246726
https://hal.archives-ouvertes.fr/jpa-00246726
Submitted on 1 Jan 1993
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Noise and competition in neural networks
D. Sherrington, K. Wong, A. Coolen
To cite this version:
D. Sherrington, K. Wong, A. Coolen. Noise and competition in neural networks. Journal de Physique
I, EDP Sciences, 1993, 3 (2), pp.331-337. �10.1051/jp1:1993134�. �jpa-00246726�
Classification Physics Abstracts
87.36 75.10N 05.40
Noise and competition in neural networks(*)
D.
Sherrington,
K.Y.M.Wong(**)
and A.C.C. CoolenDepartment of Physics, University of Oxford, i Keble Road, Oxford, OXI 3NP, G.B.
(Received
17 June1992, accepted 3July1992)
Abstract. A brief summary is given of recent results on the use of noise in the optimal training of neural networks for the retrieval of static patterns and on competition between the
retrieval of static patterns and of sequences in networks with asymmetric synapses.
1. Introduction.
A neural network is an
assembly
ofrelatively simple
units(neurons)
whichdynamically
driveone another
strongly
andcooperatively
via interactions(or
localrules)
which arecompetitive,
so as to lead to a
large
number ofpossible
basins ofdynamic
attraction of theglobal behaviour,
but also are coded(trained)
so that many of those basins are related to learnedactivity
patterns,sequences or associations.
Noise enters into the
operation
andtraining
of neural networks in several ways and can be either stochastic orquenched.
It can affect theability
of a network toretrieve,
thecompetition
between different types ofretrieval,
and thequality
of retrieval. It can also be valuable intraining
the system to achieve desiredperformance.
Retrieval
(or operation)
is the process ofdynamical
evolution of a network with agiven
set of interaction rules. If the system is to have many basins of
attraction, corresponding
to manypieces
of retrievable memorizedinformation,
then there must becompetition
between those attractor basins. The influence of other attractors on any desiredone represents a source
of
quenched
interference noise.Damage
to a network is another source ofquenched
noise.Stochastic noise arises in retrieval if the local rules are
probabalistic,
rather than deterministic.Such stochastic noise is often referred to as thermal noise.
Thus,
for a network ofIsing
neuronswith a
simple parallel synaptic updating
rule«I(t
+i)
=Sign(~ Jo
«>(t)
+Tz;(t)
+@), (i)
(°) This paper is based on a talk given by D. Sherrington at the memorial meeting for R. Rammal held in
Grenoble on May 22nd 1992.
(*°) Present address: Dept. of Physics, the Hong Kong Urtiversity of Science and Technology, Clear Water Bay, Kowlon, Hong Kong.
332 JOURNAL DE PHYSIQUE I N°2
where a
= +I represents the neural state
(firing/non-firing)
andz;(t)
is chosenrandomly
froma Gaussian distribution of zero mean,
quenched
noise resides in the synapses(J;j
andweights (lI§),
while stochastic noise resides in the(Tz;(t)).
Training
is the process ofchoosing
the localrules,
or(J;j)
and(W;)
in theexample above,
so as to achieve some
operational goal.
Noise enters in further aspects in thisregard.
The first such to be mentioned here lies in a technicalprocedure
to determine what is achievable.If the desired aim can be formulated in terms of
maximizing
someperformance
measure, thenthis can be viewed as
equivalent
tofinding
theground
state energy of a'Hamiltonian' in the space of rules and can be achieved(at least,
inprinciple) by
'simulatedannealing'in
which oneintroduces a
complementary quantity analagous
to a temperature, the'annealing temperature',
such that theground
state is obtainedby allowing
this effective system toperform
a stochasticdynamics
whichyields
Gibbs measures at anyannealing
temperatureTa,
andgradually cooling
to Ta
= 0
(or formally taking
thatlimit).
Ta is thus another stochastic noise.A second aspect of noise in the determination of the
(J;j )
lies intraining using distorted,
ornoisy,
patterns.Thus, given
anytraining algorithm
based on thepresentation
of clean patternsto the
network,
one canenvisage
a modification in which thepresented input
is arandomly
distorted version of the clean
data,
with thedegree
of distortion a measure of thetraining
noise. This
procedure
enables the networkto'spread
out' its information storage and can be used to tune thesubsequent
retrievalperformance.
In section 2 we summarize some recent progress on the
analysis
oftraining
with noise in networksstoring
static patterns.Networks with
appropriate asymmetric
synapses can also store pattern sequences in a re-trievable format. One such is to take the
J;j
to begiven by
the modified Hebb formj;
~_~
~
~"~~~
~i
PPeriodic. (2)
Under
parallel dynamics
the networkevolves,
p - p + I - p + 2 etc.By
contrast, the usualHopfield-Hebb
rulep
J..
jf-I £ ~p~p j~)
'J i j
p=1 retrieves static patterns
[I].
A combinationp p
J;j
=N~~v £ fffJ
+N~~(I v) £ ff+~fJ (4)
»=1 p=i
leads to
competition
between the retrieval of static patterns and sequences,depending
on v, T.In section 3 we summarize recent results ofsuch
competition
for intensive p.2.
Waining
with noise.In this section we
briefly
mention some recent work in Oxford ontraining
with noise. We concentrate onIsing
networksobeying
local rules(I)
with zero thresholds and(J;j) obeying
the
spherical
constraints£j J;j
~ = c, where c is theincoming connectivity
per neuron,storing
uncorrelated patterns.
One
training
criterion is to choose the(J;j)
to maximize the average increase inoverlap
with a nominated pattern in one sweep of the
network, starting
from a state ofmacroscopic
overlap
mt withonly
one ofthe patterns; theoverlap
with a pattern p is definedby
m~(t)
=N~~ L«;(t)ff, (5)
where the
if
=+I;p
=I,..
p are the pattern bits. mt is then thetraining overlap,
with dt =(I mt)/2
thetraining
noise. The choices of the distortedinput
patterns may beaveraged
in either an annealed or aquenched fashion;
in the former one maximizes the averageoverlap,
in the latter one averages over the maximumoverlaps.
Aslong
as the number ofnoisy training examples
is greater than the inversetraining
temperaturepa (see below),
thequenched
average over theseexamples yields
the same results as the annealed average.The annealed average output
overlap
isJm,(lAl)
=
(Np)~~ ~
gm,(Al) (6)
Wh~~~
g~, ~A~ = er~
jm~All(i ml
+2)1
~~~and
~P
~
~Pj_,~P (~)
i
~
C i ij jis the
aligning
field of pure patternf~
at site I. Maximization followsby
simulatedannealing;
the
'thermally' averaged performance (output overlap)
atannealing
temperature Ta isgiven by
1#m,ilAtl))Ta
=('n(j exp(fla#m,(lAl)) (9)
where
pa
=T~~
The maximumperformance
isgiven by
theTa
- 0 limit.
So
far,
this is for aparticular
set of patternsif"). Averaging
over thespecific
choicescan be achieved
by application
of thereplica method,
in directanalogy
with thetechniques developed
forspin glasses [5,7],
with the(J;j )
the annealedvariables, analagous
to thespins
in thespin glass problem,
and the patterns(ff)
thequenched
random parameters,analagous
to theexchange
interactionsand/or spin positions
in thespin glass.
Within areplica-symmetric
ansatz and for uncorrelated patterns this
gives
as theaveraged
maximumupdate performance
per pattern
J dApm,(A)gm,(A)
wherep(A)
is thealigning
field distributiongiven by
Pm,
(A)
=f Dtb(A lit)) (lo)
where Dt
= dt exp
(-t~ /2) /@
andI(t)
is the value of I which maximizes gm,(I)- (I-t)~ /27
where 7 is determined
by J Di(I(i) t)~
= o~~ where tY=
p/c
is the storage ratio.For mt " 0+ the network
aligning
field distribution is that of a network with Hebb synapsesJo
=/§~~ LfffJ,
Asmt is increased this evolves
until,
for T= 0 and mt
= l~, one obtains the field
dist~ibution
ofa
maximally
stable network [8]. There are severalinteresting
featuresof the
evolution,
discussed in detail elsewhere [4].The average one-step
update
from aninput overlap
m of networks trained withinput overlap
mt is
fm,(m)
=
f dAPm,(A)gm(A). (ii)
334 JOURNAL DE PHYSIQUE I N°2
For a dilute
asymmetric network,
withconnectivity
c «lnN,
correlations areunimportant
and
fm,(m) provides
an iterative map from which both theasymptotic
retrievaloverlap
and the basinboundary overlaps follow, respectively
from the stable and unstable fixedpoints
of the map m -fm,(m).
Their appearance anddisappearance
as a function of tY,T determine thephase
lines in a retrievalphase diagram. Varying
mtprovides
a means oftuning
thebasin structures. When the
training
noise isvaried,
thedepth
and width of the retrieval basinscorresponding
to the stored patternschange accordingly, leading
to avarying degree
of
competition
amongthem,
and hence various retrieval and non-retrievalphases.
The stable and unstable fixedpoints
of theenvelope mapping
m -fm(m) give
theoptimally
achievableasymptotic
retrievaloverlaps
and basin boundaries.An
analysis
of the normal modeenergies
ofreplica-symmetry breaking (RSB)
fluctuations inJ-space places stability
bounds on the ansatz used above. When thetraining
noise isvaried,
the RSB
region corresponds
to a number ofpossible
solutionscompeting
to be the network withoptimal performance.
However, RSB affectsonly
a smallregion
of the tY mt space ofretrieval as determined
by
the dilute network maps; it excludes alarger region
of mt tY space for values of abeyond
the retrieval limit.In
physics involving noisy signals,
one is used to animportance lying
in thesignal-tc-noise
ratio. Another such
example
is found in thetraining
with noise scenario in neural networks.Asymptotically,
for small tY at T= 0, the
training
noiseregion
mt =@,
at which thesignal strength (mt)
and the interference noise level due to other patterns(/@
areequal,
appearsto mark the convergence of several novel
features;
among them theclosing
of a band gap in thep(A) distribution,
theshrinking together
of upper and lower Almeida-Thouless limits on thereplica-symmetric approximation's stability against
smallsymmetry-breaking fluctuations,
the minimum of a
generalized J-susceptibility,
x=
(~~'~Tp~[(J;()T, (J;j)( ]av,
and the maximum deviation of thetrajectory
inweight
spacefrom
the Hebbian totie maximally
stable network [9]. The network behaviours are different for mt above and below
@,
in their way ofoptimizing
the networkperformance
constrainedby
thecompeting
tendencies of the patterns, as illustratedby
therespective
presence and absence of abandgap
in thep(A)
distribution.
Figure
I shows the loci of some of the above features for the full range of mt and for alarger
range of tY. The lines of minimum
J-susceptibility
and maximum deviation of theweight
spacetrajectory
lie close to one another but above the line of bandmerging
forlarger
tY [9].In the
above,
we have taken theannealing
temperature Ta to zero in theoptimization
exercise.It is also
possible
to extend the concept of noisetraining
to one in which Ta iskept
at a finite temperature and theresulting (J;j)
or I distribution is obtained in thethermodynamic
limit.Technically,
this isanalagous
to aspin glass problem
at finite temperature. This has been considered for cost functionsarising
from rulelearning
with undistortedexamples [10,11]
and could be extended to cover the present case ofrandomly
distorted patterns. In some cases, it has beenargued
that a non-zeroannealing
temperature mayimprove
thegeneralization performance
of the network.For more details the reader is referred to references [2, 4].
3.
Competition
between pattern reconstruction and sequenceprocessing.
In this section we
report briefly
on astudy
of theproperties
of recurrent neural networks withcompetitive synaptic
efficacies asgiven by equation (4),
with p intensive(I.e.
notscaling
with the size of the
system,
which iseffectively
taken asinfinite)
and the patterns random and uncorrelated [11]. We concentrate onparallel
stochastic local fielddynamics
in which themicrostate distribution evolves
according
to the Glauber ruleO.8
O.4
O.2
O.
O. 2. 4. 6. 8. lo
alpha
Fig-I-
Special lines in a noise-trained neural network; the de Almeida-Thouless lines(solid)
mark- ing the onset of instabilities against small replica-symmetry breaking fluctuations, the line markingthe closing of the band-gap in the aligning field distribution
(dashed),
and the retrieval-nonretrieval phase boundary(dotted).
P(«;
t +I)
=£ W(«'
-
«)P(«'; t) (12)
« = al,., aN
(14)
and
again
T is a measure of stochastic retrieval noise.The useful
macroscopic dynamical
measures are theoverlaps m"(t)
as defined inequation (6).
In thethermodynamic
limitthey satisfy
mit
+ii
=(t
tanh(fit.A.m(t))ie (is)
where
A~p
=vb~p
+ii v)b~,p+i IF
modp), (16)
m =
(m~,..., mP)
and )~ denotes an average over patterns(.
Manipulations
ofIS)
lead to usefulequalities
andinequalities
from which followimportant
consequences for the
possible
fixedpoint
solutions. Inparticular,
for T > I theonly
fixedpoint
is m= 0 and'for I > T > 2v -1 the
only
fixedpoint
isuniform,
m =m(I,
I,..I).
Thus,
in neither of theseregions
is there a retrieval fixedpoint.
Infact,
for p > 2 any suchHopfield-like
retrieval solutions are restrictedby
first-order transitions to v >vc(T)
which isgiven
to a(numerically) good approximation by
336 JOURNAL DE PHYSIQUE I N°2
vc(t)
= + TT~
+~T~ ~T~. (17)
Fixed
points
are notnecessarily
stable.Stability requirements
restrict further thestability
of the uniform fixed
point
toregions
close to T = Iand,
for podd,
T =0,
which shrinkrapidly
as p is increased. Without stable fixed
points
theonly
alternatives areperiodic sequential
solutions.
One such set of
sequential
solutions follows from ageneral
symmetrymapping.
Thismapping
relates solutions of
(IS)
for u tocorresponding
solutions for v -ii u).
For any solution M =m(I),m(2),..
there exists a solution with identicalstability given by
M'= D M
where
[D M](n)
eD(n)m(n)
andDin)
= S~K withS~p
=b~,p+i
andK~p
=b~,p+i-p.
In
particular,
therefore anyHopfield
solution for u > uc has as its counterpart ap-periodic sequential
solution for u - I u, I.e. in theregion
u <ii vc),
with first-order transitions at the limits.More
interesting sequential
solutions lie between these first order boundariesii uc(T))
anduc(T).
These evolvesequentially
andperiodically through
the p patterns moreslowly
than those discussed in the lastparagraph.
Infact,
in the limit oflarge
p, andnumerically
to agood approximation
even forrelatively
small p, the relative sequencespeed
w ep/Q,
where Q is theperiod,
isgiven by
lim wv
= I u
(18)
p-co
This is illustrated in
Figure
2 which shows values ofii w)
obtainedasymptotically
for systems with various u in simulations started in pureHopfield
states, withsuperimposed
the first order boundaries of theregions
ofHopfield
andp-periodic cyclical
solutions and also thestability boundary
of the uniform state. p = 10 in the case exhibited.Thus,
by varying
the values of u,T a system describedby equations (4)
and(12)
can be made to retrieve either static patterns or sequences, with the transitionspeed
of the sequencetunable.
Further details may be found in reference
Ill].
Since the above discussion is restricted to p
intensive,
there is nosignificant
effect ofinterfer-ence noise between the patterns. It would be
interesting
toinvestigate
how the above could be extended to an extensive number ofsets of(finite)
pattern sequences, when such interference could beexpected
toplay
a role.Similarly,
it would beinteresting
to consider the effect ofreplacing
the Hebbian synapses of eqn.(4) by corresponding
noise-trained formsanalagous
to those considered in section 2, with the added feature that different
training
noises could be used ingenerating
the pattern retrieval and sequencegenerating
synapses,using
chemicalmodulaters to switch on and off different modes of
training
[13].Acknowledgement.
The work
reported
above has benefitted from discussions with several other members of the condensed mattertheory
group at Oxford and has beenperformed
within the context of an EC SCIENCEtwinning
programme, "Statistical Mechanics and itsApplications
toComplex
Problems inPhysics, Engineering
andBiology".
Until hisuntimely death,
Rammal Rammalwas the coordinator of the Grenoble branch of that
twinning
and all members of the Oxford group honour his memory.H °
o
o
OO §
O
O
O O
c
o
o ~ i
Fig.2.
Superposition of the phasediagram
and the measure(i
w) of the relative sequence proce8s- ing speeds of a competitive network described by equations(4)
and(12)
for p = lo. H, M, C denote regions in u, T space, respectively for Hopfield-like pattern retrieval, the uniform mixture state, and p-periodic cycles; in the remaining region C' only sequences with periods different from pare possible.
The phase boundaries between H, C and C'
are first order, between M and C' they
are second order.
The 'points' show the values of
(i
w) reached in the asymptotic behaviour of systems started in pure pattern states for u= 0.1, 0.2 o.9
(from
bottom totop)
as a function of the retrieval temperature.References
[1] J.J. Hopfield, Proc. Nat). Acad. Sci. USA 79, 2554
(1982).
[2] K.Y.M. Wong and D. Sherrington, J. Phys A23, L175
(1990).
[3] K.Y.M. Wong and D. Sherrington, J. Phys A23, 4659
(1990).
[4] K.Y.M.
Wong
and D. Sherrington, "Retrieval behaviour and basin structures of noise-optimalneural networks", OUTP-91-365, to be published.
[5] R.Rammal and J. Souletie, in "Magnetism of Metals and Alloys"
(ed.
M.Cyrot),
p 379(North-
Holland
1982).
[6] K.H. Fischer and J-A- Hertz, "Spin Glasses"
(Cambridge
University Press1991).
[7] D. Sherrington, in "Electronic Phase Transitions"
(eds.
W. Hanke and Yu. V.Kopaev)
p 79(North-Holland 1992).
[8] E. Gardner and B. Derrida, J. Phys A21, 271
(1988).
[9] K.Y.M. Wong, A. Rau and D. Sherrington, "Weight Space Organization of Optimized Neural Networks", Europhys. Lett. 19, 559
(1992).
[lo]
H. Sompolinsky, N. Tishby and H.S. Seung, Phys. Rev. Lett. 65, 1683(1990).
[ii]
T.L.H. Watkin, A. Rau and M. Biehl, "The Statistical Mechanics ofLearning
a Rule", OUTP- 92-ios, to be published.[12] A.C.C. Coolen and D. Sherrington, "Competition between pattern reconstruction and sequence processing in nonsymmetric neural networks", J. Phys. A25, 5493
(1992).
[13] A.C.C. Coolen, A.J. Noest and G.B. de Vries. "Modelling Chemical Modulation of Neural Processes", OUTP-92-285, to be published.