Noise and competition in neural networks


HAL Id: jpa-00246726

https://hal.archives-ouvertes.fr/jpa-00246726

Submitted on 1 Jan 1993

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Noise and competition in neural networks

D. Sherrington, K. Wong, A. Coolen

To cite this version:

D. Sherrington, K. Wong, A. Coolen. Noise and competition in neural networks. Journal de Physique I, EDP Sciences, 1993, 3 (2), pp.331-337. 10.1051/jp1:1993134. jpa-00246726


Classification Physics Abstracts

87.36 75.10N 05.40

Noise and competition in neural networks(*)

D. Sherrington, K.Y.M. Wong(**) and A.C.C. Coolen

Department of Physics, University of Oxford, 1 Keble Road, Oxford, OX1 3NP, G.B.

(Received 17 June 1992, accepted 3 July 1992)

Abstract. A brief summary is given of recent results on the use of noise in the optimal training of neural networks for the retrieval of static patterns and on competition between the retrieval of static patterns and of sequences in networks with asymmetric synapses.

1. Introduction.

A neural network is an assembly of relatively simple units (neurons) which dynamically drive one another strongly and cooperatively via interactions (or local rules) which are competitive, so as to lead to a large number of possible basins of dynamic attraction of the global behaviour, but also are coded (trained) so that many of those basins are related to learned activity patterns, sequences or associations.

Noise enters into the operation and training of neural networks in several ways and can be either stochastic or quenched. It can affect the ability of a network to retrieve, the competition between different types of retrieval, and the quality of retrieval. It can also be valuable in training the system to achieve desired performance.

Retrieval (or operation) is the process of dynamical evolution of a network with a given set of interaction rules. If the system is to have many basins of attraction, corresponding to many pieces of retrievable memorized information, then there must be competition between those attractor basins. The influence of other attractors on any desired one represents a source of quenched interference noise. Damage to a network is another source of quenched noise.

Stochastic noise arises in retrieval if the local rules are probabilistic, rather than deterministic. Such stochastic noise is often referred to as thermal noise. Thus, for a network of Ising neurons with a simple parallel synaptic updating rule

$$\sigma_i(t+1) = \mathrm{sgn}\Big(\sum_j J_{ij}\,\sigma_j(t) + T z_i(t) + W_i\Big), \tag{1}$$

(*) This paper is based on a talk given by D. Sherrington at the memorial meeting for R. Rammal held in Grenoble on May 22nd 1992.

(**) Present address: Dept. of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.


where $\sigma_i = \pm 1$ represents the neural state (firing/non-firing) and $z_i(t)$ is chosen randomly from a Gaussian distribution of zero mean, quenched noise resides in the synapses $\{J_{ij}\}$ and weights $\{W_i\}$, while stochastic noise resides in the $\{T z_i(t)\}$.
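As an illustration of the parallel rule (1), the following is a minimal sketch rather than the authors' code. The function names, the toy single-pattern couplings, the unit variance of the Gaussian noise and the tie-break sgn(0) = +1 are all assumptions made here for the example.

```python
import random


def sign(x):
    """Sign function with the arbitrary tie-break sign(0) = +1 (an assumption)."""
    return 1 if x >= 0 else -1


def parallel_update(sigma, J, W, T, rng):
    """One synchronous sweep of rule (1): every neuron reads the old state."""
    n = len(sigma)
    new_sigma = []
    for i in range(n):
        field = sum(J[i][j] * sigma[j] for j in range(n))
        # Zero mean as in the text; unit variance is assumed for this sketch.
        z = rng.gauss(0.0, 1.0)
        new_sigma.append(sign(field + T * z + W[i]))
    return new_sigma


# Hypothetical toy network storing a single pattern xi with Hebb-like couplings.
rng = random.Random(0)
n = 50
xi = [rng.choice((-1, 1)) for _ in range(n)]
J = [[xi[i] * xi[j] / n if i != j else 0.0 for j in range(n)] for i in range(n)]
W = [0.0] * n  # zero thresholds

# Flip the first five bits, then apply one noiseless (T = 0) sweep.
corrupted = [-s if k < 5 else s for k, s in enumerate(xi)]
recovered = parallel_update(corrupted, J, W, T=0.0, rng=rng)
```

In this toy setting the quenched part of the noise sits in `J` and `W`, while raising `T` above zero adds the stochastic term $T z_i(t)$ to every local field.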

Training is the process of choosing the local rules, or $\{J_{ij}\}$ and $\{W_i\}$ in the example above, so as to achieve some operational goal. Noise enters in further aspects in this regard.

The first such to be mentioned here lies in a technical procedure to determine what is achievable. If the desired aim can be formulated in terms of maximizing some performance measure, then this can be viewed as equivalent to finding the ground state energy of a 'Hamiltonian' in the space of rules and can be achieved (at least, in principle) by 'simulated annealing', in which one introduces a complementary quantity analogous to a temperature, the 'annealing temperature', such that the ground state is obtained by allowing this effective system to perform a stochastic dynamics which yields Gibbs measures at any annealing temperature $T_a$, and gradually cooling to $T_a = 0$ (or formally taking that limit). $T_a$ is thus another stochastic noise.
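The annealing procedure just described can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the 'rules' are a hypothetical vector of 20 binary variables, the 'Hamiltonian' is the number of mismatches with a hidden target, and the Metropolis acceptance rule with a geometric cooling schedule is one common choice of stochastic dynamics whose stationary distribution is the Gibbs measure at fixed $T_a$. None of these specifics come from the paper.

```python
import math
import random


def anneal(energy, state, n_steps, t_start, t_end, rng):
    """Metropolis single-flip dynamics with geometric cooling of T_a."""
    current = list(state)
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for k in range(n_steps):
        # Annealing temperature decreases geometrically from t_start to t_end.
        t_a = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        i = rng.randrange(len(current))
        current[i] = -current[i]  # propose a single flip
        e_new = energy(current)
        accept = e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / t_a)
        if accept:
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[i] = -current[i]  # reject: undo the flip
    return best, e_best


rng = random.Random(1)
target = [rng.choice((-1, 1)) for _ in range(20)]  # hypothetical optimum


def mismatch_energy(s):
    """Toy 'Hamiltonian': number of entries that differ from the target."""
    return sum(1 for a, b in zip(s, target) if a != b)


start = [rng.choice((-1, 1)) for _ in range(20)]
best_state, best_energy = anneal(mismatch_energy, start, 2000, 2.0, 0.01, rng)
```

Maximizing a performance measure corresponds to minimizing its negative, so the same loop applies with `energy` set to minus the performance.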

A second aspect of noise in the determination of the $\{J_{ij}\}$ lies in training using distorted, or noisy, patterns. Thus, given any training algorithm based on the presentation of clean patterns to the network, one can envisage a modification in which the presented input is a randomly distorted version of the clean data, with the degree of distortion a measure of the training noise. This procedure enables the network to 'spread out' its information storage and can be used to tune the subsequent retrieval performance. In section 2 we summarize some recent progress on the analysis of training with noise in networks storing static patterns.

Networks with appropriate asymmetric synapses can also store pattern sequences in a retrievable format. One such is to take the $J_{ij}$ to be given by the modified Hebb form

$$J_{ij} = N^{-1}\sum_{\mu=1}^{p}\xi_i^{\mu+1}\xi_j^{\mu}; \qquad \mu\ \text{periodic}. \tag{2}$$

Under parallel dynamics the network evolves, $\mu \to \mu+1 \to \mu+2$ etc.

By contrast, the usual Hopfield-Hebb rule

$$J_{ij} = N^{-1}\sum_{\mu=1}^{p}\xi_i^{\mu}\xi_j^{\mu} \tag{3}$$

retrieves static patterns [1].

A combination

$$J_{ij} = N^{-1}\nu\sum_{\mu=1}^{p}\xi_i^{\mu}\xi_j^{\mu} + N^{-1}(1-\nu)\sum_{\mu=1}^{p}\xi_i^{\mu+1}\xi_j^{\mu} \tag{4}$$

leads to competition between the retrieval of static patterns and sequences, depending on $\nu$, $T$. In section 3 we summarize recent results of such competition for intensive $p$.
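The two limits of the combined rule (4) can be checked with a small noiseless simulation. This is a sketch under assumptions chosen here: $N = 100$ neurons, $p = 3$ random patterns, $T = 0$, self-couplings kept as in the formula, and the tie-break sgn(0) = +1. The helper names are hypothetical.

```python
import random


def build_synapses(patterns, nu):
    """Couplings of equation (4); the pattern index is taken periodic."""
    p, n = len(patterns), len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        cur, nxt = patterns[mu], patterns[(mu + 1) % p]
        for i in range(n):
            # nu weights the static (Hopfield) term, 1 - nu the sequence term.
            post = nu * cur[i] + (1.0 - nu) * nxt[i]
            for j in range(n):
                J[i][j] += post * cur[j] / n
    return J


def sweep(sigma, J):
    """One noiseless parallel update (T = 0, zero thresholds)."""
    return [1 if sum(w * s for w, s in zip(row, sigma)) >= 0 else -1 for row in J]


def overlap(sigma, pattern):
    """Overlap N^{-1} sum_i sigma_i xi_i between a state and a pattern."""
    return sum(s * x for s, x in zip(sigma, pattern)) / len(pattern)


rng = random.Random(2)
n, p = 100, 3
patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

# nu = 1: pure Hopfield-Hebb rule (3); the stored pattern should be stationary.
after_static = sweep(patterns[0], build_synapses(patterns, nu=1.0))
# nu = 0: pure sequence rule (2); the state should hop to the next pattern.
after_sequence = sweep(patterns[0], build_synapses(patterns, nu=0.0))
```

Intermediate values of `nu` mix the two tendencies; the sketch does not attempt to reproduce the phase boundaries discussed in section 3.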

2. Training with noise.

In this section we briefly mention some recent work in Oxford on training with noise. We concentrate on Ising networks obeying local rules (1) with zero thresholds and $\{J_{ij}\}$ obeying the spherical constraints $\sum_j J_{ij}^2 = C$, where $C$ is the incoming connectivity per neuron, storing uncorrelated patterns.

One training criterion is to choose the $\{J_{ij}\}$ to maximize the average increase in overlap with a nominated pattern in one sweep of the network, starting from a state of macroscopic overlap $m_t$ with only one of the patterns; the overlap with a pattern $\mu$ is defined by

$$m^{\mu}(t) = N^{-1}\sum_i \sigma_i(t)\,\xi_i^{\mu}, \tag{5}$$

where the $\xi_i^{\mu} = \pm 1$; $\mu = 1, \ldots, p$ are the pattern bits. $m_t$ is then the training overlap, with $d_t = (1 - m_t)/2$ the training noise. The choices of the distorted input patterns may be averaged in either an annealed or a quenched fashion; in the former one maximizes the average overlap, in the latter one averages over the maximum overlaps.
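The relation between the training overlap $m_t$ and the training noise $d_t = (1 - m_t)/2$ can be illustrated by generating distorted inputs. The sketch below assumes that each bit is flipped independently with probability $d_t$, which gives an expected overlap $m_t$ with the clean pattern; the function names and the pattern size are choices made for the example only.

```python
import random


def overlap(sigma, pattern):
    """Overlap of equation (5) between a state and a pattern."""
    return sum(s * x for s, x in zip(sigma, pattern)) / len(pattern)


def distort(pattern, m_t, rng):
    """Flip each bit independently with probability d_t = (1 - m_t) / 2."""
    d_t = (1.0 - m_t) / 2.0
    return [-bit if rng.random() < d_t else bit for bit in pattern]


rng = random.Random(3)
clean = [rng.choice((-1, 1)) for _ in range(10000)]

noisy = distort(clean, m_t=0.6, rng=rng)      # training noise d_t = 0.2
measured = overlap(noisy, clean)               # fluctuates around 0.6
untouched = distort(clean, m_t=1.0, rng=rng)   # d_t = 0: no bits flipped
```

For $N$ bits the measured overlap fluctuates around $m_t$ with a standard deviation of order $N^{-1/2}$, so the macroscopic overlap is sharply defined for large networks.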

As long as the number of noisy training examples is greater than the inverse training temperature $\beta_a$ (see below), the quenched average over these examples yields the same results as the annealed average.

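The annealed average output overlap is

$$f_{m_t}(\{\Lambda\}) = (Np)^{-1}\sum_{i\mu} g_{m_t}(\Lambda_i^{\mu}) \tag{6}$$

where

$$g_{m_t}(\Lambda) = \mathrm{erf}\left[\frac{m_t\Lambda}{\sqrt{2(1 - m_t^2 + T^2)}}\right] \tag{7}$$

and

$$\Lambda_i^{\mu} = \xi_i^{\mu}\,C^{-1/2}\sum_j J_{ij}\,\xi_j^{\mu} \tag{8}$$

is the aligning field of pure pattern $\xi^{\mu}$ at site $i$. Maximization follows by simulated annealing; the 'thermally' averaged performance (output overlap) at annealing temperature $T_a$ is given by

$$\langle f_{m_t}(\{\Lambda\})\rangle_{T_a} = \frac{\partial}{\partial\beta_a}\ln\int d\{J\}\,\exp\big(\beta_a f_{m_t}(\{\Lambda\})\big) \tag{9}$$

where $\beta_a = T_a^{-1}$. The maximum performance is given by the $T_a \to 0$ limit.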
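The quantities entering the annealed output overlap can be evaluated directly for a small network. The following sketch assumes the error-function form of the single-site update, $g_m(\Lambda) = \mathrm{erf}[m\Lambda/\sqrt{2(1 - m^2 + T^2)}]$, and aligning fields normalized by $C^{-1/2}$; the one-pattern couplings are a hypothetical toy case chosen so that the spherical constraint $\sum_j J_{ij}^2 = C$ holds exactly.

```python
import math


def g(m, lam, T=0.0):
    """Single-site output overlap for input overlap m and aligning field lam."""
    return math.erf(m * lam / math.sqrt(2.0 * (1.0 - m * m + T * T)))


def aligning_fields(J, patterns, C):
    """Lambda_i^mu = xi_i^mu * C^{-1/2} * sum_j J_ij xi_j^mu for all mu, i."""
    fields = []
    for xi in patterns:
        per_site = []
        for i, row in enumerate(J):
            local = sum(w * x for w, x in zip(row, xi))
            per_site.append(xi[i] * local / math.sqrt(C))
        fields.append(per_site)
    return fields


def annealed_output_overlap(J, patterns, m_t, T, C):
    """Average of g over all sites and patterns, as in equation (6)."""
    fields = aligning_fields(J, patterns, C)
    n, p = len(J), len(patterns)
    return sum(g(m_t, lam, T) for per_site in fields for lam in per_site) / (n * p)


# Hypothetical toy case: one pattern, fully connected without self-coupling,
# so that C = n - 1 and sum_j J_ij^2 = C.
xi = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, -1]
n = len(xi)
C = n - 1
J = [[float(xi[i] * xi[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]

fields = aligning_fields(J, [xi], C)
f_value = annealed_output_overlap(J, [xi], m_t=0.6, T=0.0, C=C)
```

In this toy case every aligning field equals $\sqrt{C}$, so the output overlap reduces to a single evaluation of `g`; with several stored patterns the fields spread out and the average over sites and patterns becomes non-trivial.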

So far, this is for a particular set of patterns $\{\xi^{\mu}\}$. Averaging over the specific choices can be achieved by application of the replica method, in direct analogy with the techniques developed for spin glasses [5,7], with the $\{J_{ij}\}$ the annealed variables, analogous to the spins in the spin glass problem, and the patterns $\{\xi^{\mu}\}$ the quenched random parameters, analogous to the exchange interactions and/or spin positions in the spin glass.

Within a replica-symmetric ansatz and for uncorrelated patterns this gives as the averaged maximum update performance per pattern

$$\int d\Lambda\,\rho_{m_t}(\Lambda)\,g_{m_t}(\Lambda)$$

where $\rho(\Lambda)$ is the aligning field distribution given by

$$\rho_{m_t}(\Lambda) = \int Dt\,\delta(\Lambda - \lambda(t)) \tag{10}$$

where $Dt = dt\,\exp(-t^2/2)/\sqrt{2\pi}$ and $\lambda(t)$ is the value of $\lambda$ which maximizes $g_{m_t}(\lambda) - (\lambda - t)^2/2\gamma$, where $\gamma$ is determined by $\int Dt\,(\lambda(t) - t)^2 = \alpha^{-1}$, where $\alpha = p/C$ is the storage ratio.

For $m_t \to 0^{+}$ the network aligning field distribution is that of a network with Hebb synapses $J_{ij} \propto \sum_{\mu}\xi_i^{\mu}\xi_j^{\mu}$. As $m_t$ is increased this evolves until, for $T = 0$ and $m_t \to 1^{-}$, one obtains the field distribution of a maximally stable network [8]. There are several interesting features of the evolution, discussed in detail elsewhere [4].

The average one-step update from an input overlap $m$ of networks trained with input overlap $m_t$ is

$$f_{m_t}(m) = \int d\Lambda\,\rho_{m_t}(\Lambda)\,g_m(\Lambda). \tag{11}$$


For a dilute asymmetric network, with connectivity $C \ll \ln N$, correlations are unimportant and $f_{m_t}(m)$ provides an iterative map from which both the asymptotic retrieval overlap and the basin boundary overlaps follow, respectively from the stable and unstable fixed points of the map $m \to f_{m_t}(m)$. Their appearance and disappearance as a function of $\alpha$, $T$ determine the phase lines in a retrieval phase diagram. Varying $m_t$ provides a means of tuning the basin structures. When the training noise is varied, the depth and width of the retrieval basins corresponding to the stored patterns change accordingly, leading to a varying degree of competition among them, and hence various retrieval and non-retrieval phases.
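The use of $f_{m_t}(m)$ as an iterative map can be sketched numerically in the Hebbian limit $m_t \to 0^{+}$. The example rests on an assumption that is not spelled out in the text: for Hebb synapses in the dilute limit the aligning field distribution is taken to be a Gaussian of mean $\alpha^{-1/2}$ and unit variance. For a noise-trained network the distribution from equation (10) would replace `rho_hebb`. Under this assumption the integral has the closed form $\mathrm{erf}[m/\sqrt{2\alpha(1+T^2)}]$, which the code uses only as a cross-check.

```python
import math


def g(m, lam, T):
    """Single-site update erf[m*lam / sqrt(2(1 - m^2 + T^2))]."""
    var = max(1.0 - m * m + T * T, 1e-12)  # guard against m = 1 at T = 0
    return math.erf(m * lam / math.sqrt(2.0 * var))


def rho_hebb(lam, alpha):
    """Assumed Hebb-limit field distribution: Gaussian, mean alpha^{-1/2}."""
    mean = 1.0 / math.sqrt(alpha)
    return math.exp(-0.5 * (lam - mean) ** 2) / math.sqrt(2.0 * math.pi)


def f_map(m, alpha, T, n_grid=2001, half_width=10.0):
    """Trapezoidal estimate of the integral of rho(lam) * g_m(lam) over lam."""
    mean = 1.0 / math.sqrt(alpha)
    h = 2.0 * half_width / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        lam = mean - half_width + k * h
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * rho_hebb(lam, alpha) * g(m, lam, T)
    return total * h


def iterate(m0, alpha, T, n_steps=100):
    """Apply the map m -> f(m) repeatedly, starting from overlap m0."""
    m = m0
    for _ in range(n_steps):
        m = f_map(m, alpha, T)
    return m


m_retrieval = iterate(0.5, alpha=0.2, T=0.0)     # settles at a large overlap
m_nonretrieval = iterate(0.5, alpha=1.0, T=0.0)  # decays towards zero
closed_form = math.erf(0.5 / math.sqrt(2.0 * 0.2))
```

The stable fixed point reached from `m0` is the asymptotic retrieval overlap; the unstable fixed point, when it exists, would mark the basin boundary and could be located by bisection on `f_map(m) - m`.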

The stable and unstable fixed points of the envelope mapping $m \to f_m(m)$ give the optimally achievable asymptotic retrieval overlaps and basin boundaries.

An analysis of the normal mode energies of replica-symmetry breaking (RSB) fluctuations in $J$-space places stability bounds on the ansatz used above. When the training noise is varied, the RSB region corresponds to a number of possible solutions competing to be the network with optimal performance. However, RSB affects only a small region of the $\alpha$-$m_t$ space of retrieval as determined by the dilute network maps; it excludes a larger region of $m_t$-$\alpha$ space for values of $\alpha$ beyond the retrieval limit.

In physics involving noisy signals, one is used to an importance lying in the signal-to-noise ratio. Another such example is found in the training with noise scenario in neural networks. Asymptotically, for small $\alpha$ at $T = 0$, the training noise region $m_t = \sqrt{\alpha}$, at which the signal strength ($m_t$) and the interference noise level due to other patterns ($\sqrt{\alpha}$) are equal, appears to mark the convergence of several novel features; among them the closing of a band gap in the $\rho(\Lambda)$ distribution, the shrinking together of upper and lower Almeida-Thouless limits on the replica-symmetric approximation's stability against small symmetry-breaking fluctuations, the minimum of a generalized $J$-susceptibility,

$\chi \propto T_a^{-1}\sum_{ij}\big[\langle J_{ij}^2\rangle_{T_a} - \langle J_{ij}\rangle_{T_a}^2\big]_{\mathrm{av}}$,

and the maximum deviation of the trajectory in weight space from the Hebbian to the maximally stable network [9]. The network behaviours are different for $m_t$ above and below $\sqrt{\alpha}$, in their way of optimizing the network performance constrained by the competing tendencies of the patterns, as illustrated by the respective presence and absence of a bandgap in the $\rho(\Lambda)$ distribution.

Figure 1 shows the loci of some of the above features for the full range of $m_t$ and for a larger range of $\alpha$. The lines of minimum $J$-susceptibility and maximum deviation of the weight space trajectory lie close to one another but above the line of band merging for larger $\alpha$ [9].

In the above, we have taken the annealing temperature $T_a$ to zero in the optimization exercise. It is also possible to extend the concept of noise training to one in which $T_a$ is kept at a finite temperature and the resulting $\{J_{ij}\}$ or $\Lambda$ distribution is obtained in the thermodynamic limit. Technically, this is analogous to a spin glass problem at finite temperature. This has been considered for cost functions arising from rule learning with undistorted examples [10,11] and could be extended to cover the present case of randomly distorted patterns. In some cases, it has been argued that a non-zero annealing temperature may improve the generalization performance of the network.

For more details the reader is referred to references [2, 4].

3. Competition between pattern reconstruction and sequence processing.

In this section we report briefly on a study of the properties of recurrent neural networks with competitive synaptic efficacies as given by equation (4), with $p$ intensive (i.e. not scaling with the size of the system, which is effectively taken as infinite) and the patterns random and uncorrelated [12]. We concentrate on parallel stochastic local field dynamics in which the microstate distribution evolves according to the Glauber rule

Fig. 1. Special lines in a noise-trained neural network: the de Almeida-Thouless lines (solid) marking the onset of instabilities against small replica-symmetry breaking fluctuations, the line marking the closing of the band-gap in the aligning field distribution (dashed), and the retrieval-nonretrieval phase boundary (dotted). Horizontal axis: alpha.

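$$P(\boldsymbol{\sigma}; t+1) = \sum_{\boldsymbol{\sigma}'} W(\boldsymbol{\sigma}' \to \boldsymbol{\sigma})\,P(\boldsymbol{\sigma}'; t) \tag{12}$$

$\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_N)$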

(14)

and again $T$ is a measure of stochastic retrieval noise.

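The useful macroscopic dynamical measures are the overlaps $m^{\mu}(t)$ as defined in equation (5). In the thermodynamic limit they satisfy

$$\mathbf{m}(t+1) = \big\langle \boldsymbol{\xi}\,\tanh\big(\beta\,\boldsymbol{\xi}\cdot A\,\mathbf{m}(t)\big)\big\rangle_{\boldsymbol{\xi}} \tag{15}$$

where

$$A_{\lambda\rho} = \nu\,\delta_{\lambda\rho} + (1-\nu)\,\delta_{\lambda,\rho+1} \quad (\rho\ \mathrm{mod}\ p), \tag{16}$$

$\mathbf{m} = (m^1, \ldots, m^p)$ and $\langle\,\cdot\,\rangle_{\boldsymbol{\xi}}$ denotes an average over patterns $\{\boldsymbol{\xi}\}$.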

Manipulations of (15) lead to useful equalities and inequalities from which follow important consequences for the possible fixed point solutions. In particular, for $T > 1$ the only fixed point is $\mathbf{m} = 0$ and for $1 > T > 2\nu - 1$ the only fixed point is uniform, $\mathbf{m} = m(1, 1, \ldots, 1)$. Thus, in neither of these regions is there a retrieval fixed point.
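The macroscopic map (15) with the matrix (16) can be iterated exactly for small $p$ by enumerating all $2^p$ equally likely values of the pattern vector $\boldsymbol{\xi}$. The sketch below is illustrative only; it assumes $\beta = 1/T$, and the function names and parameter values are chosen here rather than taken from the paper.

```python
import itertools
import math


def overlap_step(m, nu, T):
    """One step of map (15): m(t+1) = < xi tanh(beta xi . A m(t)) >_xi."""
    p = len(m)
    # (A m)_l = nu * m_l + (1 - nu) * m_{l-1}, indices taken mod p.
    drive = [nu * m[l] + (1.0 - nu) * m[(l - 1) % p] for l in range(p)]
    new_m = [0.0] * p
    for xi in itertools.product((-1, 1), repeat=p):
        th = math.tanh(sum(x * d for x, d in zip(xi, drive)) / T)  # beta = 1/T assumed
        for l in range(p):
            new_m[l] += xi[l] * th
    return [v / 2 ** p for v in new_m]


def iterate(m, nu, T, n_steps):
    """Apply overlap_step repeatedly."""
    for _ in range(n_steps):
        m = overlap_step(m, nu, T)
    return m


# T > 1: the overlaps decay to the trivial fixed point m = 0.
hot = iterate([1.0, 0.0, 0.0], nu=0.7, T=1.5, n_steps=200)
# nu = 1, low T: a Hopfield-like retrieval state persists.
static = iterate([1.0, 0.0, 0.0], nu=1.0, T=0.2, n_steps=50)
# nu = 0, low T: one step moves the overlap on to the next pattern.
one_hop = overlap_step([1.0, 0.0, 0.0], nu=0.0, T=0.2)
```

Exhaustive enumeration costs $2^p$ terms per step, so this is practical only for small $p$, which is consistent with the restriction to intensive $p$.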

In fact, for $p > 2$ any such Hopfield-like retrieval solutions are restricted by first-order transitions to $\nu > \nu_c(T)$ which is given to a (numerically) good approximation by


vc(t)

₌ + T

T~

₊

~T~ ~T~. (17)

Fixed points are not necessarily stable. Stability requirements restrict further the stability of the uniform fixed point to regions close to $T = 1$ and, for $p$ odd, $T = 0$, which shrink rapidly as $p$ is increased. Without stable fixed points the only alternatives are periodic sequential solutions.

One such set of sequential solutions follows from a general symmetry mapping. This mapping relates solutions of (15) for $\nu$ to corresponding solutions for $\nu \to (1 - \nu)$. For any solution $M = \{\mathbf{m}(1), \mathbf{m}(2), \ldots\}$ there exists a solution with identical stability given by $M' = D\,M$ where $[D\,M](n) \equiv D(n)\,\mathbf{m}(n)$ and $D(n) = S^n K$ with $S_{\lambda\rho} = \delta_{\lambda,\rho+1}$ and $K_{\lambda\rho} = \delta_{\lambda,p+1-\rho}$. In particular, therefore any Hopfield solution for $\nu > \nu_c$ has as its counterpart a $p$-periodic sequential solution for $\nu \to 1 - \nu$, i.e. in the region $\nu < (1 - \nu_c)$, with first-order transitions at the limits.
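The symmetry mapping can be verified numerically for small $p$. The sketch below generates a trajectory of the overlap map for a given $\nu$, applies $D(n) = S^n K$ (a cyclic shift by $n$ after a reversal of the pattern order), and checks that the result obeys the map for $1 - \nu$. It assumes $\beta = 1/T$ and evaluates the pattern average by exhaustive enumeration; the starting vector and parameter values are arbitrary choices for the example.

```python
import itertools
import math


def overlap_step(m, nu, T):
    """One step of the overlap map with A = nu * 1 + (1 - nu) * S."""
    p = len(m)
    drive = [nu * m[l] + (1.0 - nu) * m[(l - 1) % p] for l in range(p)]
    new_m = [0.0] * p
    for xi in itertools.product((-1, 1), repeat=p):
        th = math.tanh(sum(x * d for x, d in zip(xi, drive)) / T)  # beta = 1/T assumed
        for l in range(p):
            new_m[l] += xi[l] * th
    return [v / 2 ** p for v in new_m]


def shift(m, n):
    """S^n with S_{lambda rho} = delta_{lambda, rho+1}: (S^n m)_l = m_{l-n}."""
    p = len(m)
    return [m[(l - n) % p] for l in range(p)]


def reflect(m):
    """K with K_{lambda rho} = delta_{lambda, p+1-rho}: reverse the order."""
    return m[::-1]


def mapped(m, n):
    """D(n) m = S^n K m."""
    return shift(reflect(m), n)


nu, T = 0.3, 0.5
trajectory = [[0.6, 0.1, -0.2, 0.05]]  # arbitrary starting overlaps, p = 4
for _ in range(6):
    trajectory.append(overlap_step(trajectory[-1], nu, T))

# The mapped trajectory should satisfy the same map with nu replaced by 1 - nu.
max_error = 0.0
for n in range(6):
    predicted = overlap_step(mapped(trajectory[n], n), 1.0 - nu, T)
    expected = mapped(trajectory[n + 1], n + 1)
    max_error = max(max_error, max(abs(a - b) for a, b in zip(predicted, expected)))
```

The check works because the pattern average is invariant under permutations of the pattern labels and because $K = S K S$, so that $S^{n}K$ intertwines the maps for $\nu$ and $1 - \nu$. A fixed point for $\nu$ is mapped onto a state that shifts by one pattern per step, which is the $p$-periodic sequence described above.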

More interesting sequential solutions lie between these first order boundaries $(1 - \nu_c(T))$ and $\nu_c(T)$. These evolve sequentially and periodically through the $p$ patterns more slowly than those discussed in the last paragraph. In fact, in the limit of large $p$, and numerically to a good approximation even for relatively small $p$, the relative sequence speed $\omega \equiv p/Q$, where $Q$ is the period, is given by

$$\lim_{p\to\infty}\omega = 1 - \nu. \tag{18}$$

This is illustrated in Figure 2 which shows values of $(1 - \omega)$ obtained asymptotically for systems with various $\nu$ in simulations started in pure Hopfield states, with superimposed the first order boundaries of the regions of Hopfield and $p$-periodic cyclical solutions and also the stability boundary of the uniform state. $p = 10$ in the case exhibited.

Thus, by varying the values of $\nu$, $T$ a system described by equations (4) and (12) can be made to retrieve either static patterns or sequences, with the transition speed of the sequence tunable. Further details may be found in reference [12].

Since the above discussion is restricted to $p$ intensive, there is no significant effect of interference noise between the patterns. It would be interesting to investigate how the above could be extended to an extensive number of sets of (finite) pattern sequences, when such interference could be expected to play a role. Similarly, it would be interesting to consider the effect of replacing the Hebbian synapses of eqn. (4) by corresponding noise-trained forms analogous to those considered in section 2, with the added feature that different training noises could be used in generating the pattern retrieval and sequence generating synapses, using chemical modulators to switch on and off different modes of training [13].

Acknowledgement.

The work reported above has benefitted from discussions with several other members of the condensed matter theory group at Oxford and has been performed within the context of an EC SCIENCE twinning programme, "Statistical Mechanics and its Applications to Complex Problems in Physics, Engineering and Biology". Until his untimely death, Rammal Rammal was the coordinator of the Grenoble branch of that twinning and all members of the Oxford group honour his memory.

Fig. 2. Superposition of the phase diagram and the measure $(1 - \omega)$ of the relative sequence processing speeds of a competitive network described by equations (4) and (12) for $p = 10$. H, M, C denote regions in $\nu$, $T$ space, respectively for Hopfield-like pattern retrieval, the uniform mixture state, and $p$-periodic cycles; in the remaining region C' only sequences with periods different from $p$ are possible. The phase boundaries between H, C and C' are first order, between M and C' they are second order. The 'points' show the values of $(1 - \omega)$ reached in the asymptotic behaviour of systems started in pure pattern states for $\nu = 0.1, 0.2, \ldots, 0.9$ (from bottom to top) as a function of the retrieval temperature.

References

[1] J.J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).

[2] K.Y.M. Wong and D. Sherrington, J. Phys. A23, L175 (1990).

[3] K.Y.M. Wong and D. Sherrington, J. Phys. A23, 4659 (1990).

[4] K.Y.M. Wong and D. Sherrington, "Retrieval behaviour and basin structures of noise-optimal neural networks", OUTP-91-365, to be published.

[5] R. Rammal and J. Souletie, in "Magnetism of Metals and Alloys" (ed. M. Cyrot), p. 379 (North-Holland 1982).

[6] K.H. Fischer and J.A. Hertz, "Spin Glasses" (Cambridge University Press 1991).

[7] D. Sherrington, in "Electronic Phase Transitions" (eds. W. Hanke and Yu. V. Kopaev), p. 79 (North-Holland 1992).

[8] E. Gardner and B. Derrida, J. Phys. A21, 271 (1988).

[9] K.Y.M. Wong, A. Rau and D. Sherrington, "Weight Space Organization of Optimized Neural Networks", Europhys. Lett. 19, 559 (1992).

[10] H. Sompolinsky, N. Tishby and H.S. Seung, Phys. Rev. Lett. 65, 1683 (1990).

[11] T.L.H. Watkin, A. Rau and M. Biehl, "The Statistical Mechanics of Learning a Rule", OUTP-92-ios, to be published.

[12] A.C.C. Coolen and D. Sherrington, "Competition between pattern reconstruction and sequence processing in nonsymmetric neural networks", J. Phys. A25, 5493 (1992).

[13] A.C.C. Coolen, A.J. Noest and G.B. de Vries, "Modelling Chemical Modulation of Neural Processes", OUTP-92-285, to be published.