

On the storage of correlated patterns in Hopfield's model

J. F. Fontanari (1,*) and W. K. Theumann (2)

(1) Division of Chemistry, California Institute of Technology, Pasadena, CA 91125, U.S.A.
(2) Instituto de Fisica, Universidade Federal do Rio Grande do Sul, Caixa Postal 15051, 91500 Porto Alegre, RS, Brazil

(*) Present address: Instituto de Fisica e Quimica de Sao Carlos, Universidade de Sao Paulo, 13560 Sao Carlos SP, Brazil.

(Received 3 April 1989, revised 18 October 1989, accepted 21 November 1989)

Abstract. — The effects of storing p statistically independent but effectively correlated patterns in the Hopfield model of associative memory are studied. This leads us to propose a local learning rule which, by enhancing the differences among the patterns, allows the network to store them with an efficiency comparable to that of nonlocal learning rules.

Classification: Physics Abstracts 87.30G - 64.60C - 75.10H

1. Introduction.

The statistical mechanics analysis of feedback, fully connected neural networks has made it possible to unveil a variety of interesting features of these systems which would have been hard to detect solely through numerical simulations [1, 2].

The most studied neural network is the Hopfield model of associative memory: the states of the neurons are represented by Ising spins, S_i = +1 (active) or S_i = -1 (passive), and the system of N interacting neurons is governed by the Hamiltonian [3]

\[ H = -\frac{1}{2} \sum_{i \neq j} J_{ij} S_i S_j . \qquad (1.1) \]

The stored patterns \{\xi_i^\mu = \pm 1,\ \mu = 1, \dots, p\} are imprinted on the J_{ij}'s (synaptic connections) by the generalized Hebb rule

\[ J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu . \qquad (1.2) \]


The equilibrium states are characterized by an overlap vector m of p components defined by

\[ m_\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu S_i , \qquad \mu = 1, \dots, p . \qquad (1.3) \]

In this approach the neural network is viewed as a system of Ising spins in contact with a heat bath which simulates the biological synaptic noise. The tools developed for infinite-range spin-glasses [4] allow an analytical study of the equilibrium properties of neural networks [1, 2].

It has been shown that there exist so-called mixture states, i.e. states that are linear combinations of the stored patterns, in which m has several macroscopic, O(1), components [1]. That study uncovered, somewhat surprisingly, that the synaptic noise suppresses the mixture states in favour of the retrieval states, in which m has only one macroscopic component.

Storing correlated patterns in the Hopfield model is a problem that involves mixture states, since the network's state must have either a non-zero overlap with all the patterns or with none. In this paper we consider the symmetric and asymmetric mixture states. The former describes the situation in which the network confuses the patterns, i.e. it cannot perceive the individual details which make a pattern different from the others. This state is described by the vector

\[ \mathbf{m} = m_p (1, 1, \dots, 1) , \qquad (1.4) \]

where m_\mu = \frac{1}{N} \sum_i \xi_i^\mu S_i for \mu = 1, \dots, p. The asymmetric state we consider, for simplicity, distinguishes only one of the patterns from the other p - 1, and is described by the vector

\[ \mathbf{m} = (m_1, m_{p-1}, m_{p-1}, \dots, m_{p-1}) , \qquad (1.5) \]

where the first component, m_1, differs from the p - 1 equal components m_{p-1}.

These two states capture the essence of the problem of storing correlated patterns in Hopfield's model: the network's tendency to enhance the common part of the patterns makes it more difficult to retrieve the details that distinguish them.

Our study is restricted to a particular type of correlations in which the patterns are statistically independent random variables generated by the asymmetric distribution

\[ P(\xi_i^\mu) = b_1 \, \delta(\xi_i^\mu - 1) + b_2 \, \delta(\xi_i^\mu + 1) , \qquad (1.6) \]

with b_1 = (1 + a)/2, b_2 = (1 - a)/2, and a \in [-1, 1]. Though independent, the patterns are effectively correlated, since

\[ \langle \xi_i^\mu \xi_i^\nu \rangle = a^2 \qquad (1.7) \]

for \mu \neq \nu. Due to this particular choice of correlations the asymmetric state is p-fold degenerate; any component we choose to be m_1 leads to an equivalent asymmetric state.
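The effective correlation (1.7) is easy to check numerically. The following Python sketch (an editorial illustration under our own conventions, not code from the paper) draws independent biased patterns from the distribution (1.6) and verifies that the empirical pairwise overlap approaches a^2.

    import numpy as np

    rng = np.random.default_rng(0)

    def biased_patterns(p, N, a, rng):
        """Draw p independent N-bit patterns from the biased distribution
        (1.6): P(xi = +1) = (1 + a)/2, P(xi = -1) = (1 - a)/2."""
        return np.where(rng.random((p, N)) < (1 + a) / 2, 1, -1)

    p, N, a = 5, 100_000, 0.4
    xi = biased_patterns(p, N, a, rng)

    # Independent but effectively correlated, Eq. (1.7):
    # <xi^mu xi^nu> = <xi^mu><xi^nu> = a^2 for mu != nu.
    print((xi[0] * xi[1]).mean(), a**2)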

This paper is organized as follows. In section 2 we study the thermodynamics of the Hopfield model in the regime p/N → 0 as N → ∞. The simplicity of the model singles out the effects of the correlations among the patterns. In section 3 we propose a local learning rule,

\[ J_{ij} = \frac{1}{N} \sum_{\mu} \xi_i^\mu \xi_j^\mu - \frac{1}{N \tilde{p}} \sum_{\mu \neq \nu} \xi_i^\mu \xi_j^\nu , \qquad \tilde{p} = p - 1 , \qquad (1.8) \]

which enhances the asymmetric state by suppressing its rival, the symmetric state. The network's overall performance is comparable to that of the nonlocal modified Hebb rule proposed by Amit et al. [5],

\[ J_{ij} = \frac{1}{N} \sum_{\mu} (\xi_i^\mu - a)(\xi_j^\mu - a) , \qquad (1.9) \]

where the nonlocality is due to the parameter a, which stands for the average activity rate of the entire network. In section 4 we discuss our results and present some concluding remarks.

2. The generalized Hebb rule.

The design of associative memory models may be thought of as an optimization problem [6]. Given p patterns \{\xi_i^\mu,\ \mu = 1, \dots, p\} to be stored in the network, we must choose the connection matrix such that the local minima of the cost function or energy occur when the network's state \{S_i\} is near each one of the patterns. The simplest guess is

\[ H = -\frac{N}{2} \sum_{\mu=1}^{p} (m_\mu)^2 , \qquad (2.1) \]

where m_\mu is defined by equation (1.3). Notice that, except for the diagonal term, this equation is identical to equation (1.1).
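To make the optimization view concrete, here is a small Python sketch (our own illustration, with arbitrary parameter choices) that evaluates the cost function (2.1) for a stored pattern and for a random state; the stored pattern sits much deeper in the energy landscape.

    import numpy as np

    def hebb_energy(xi, S):
        """Cost function H = -(N/2) sum_mu m_mu^2 of Eq. (2.1), with the
        overlaps m_mu = (1/N) sum_i xi_i^mu S_i of Eq. (1.3)."""
        N = xi.shape[1]
        m = xi @ S / N
        return -0.5 * N * (m ** 2).sum()

    rng = np.random.default_rng(1)
    p, N, a = 3, 2_000, 0.3
    xi = np.where(rng.random((p, N)) < (1 + a) / 2, 1, -1)

    print(hebb_energy(xi, xi[0]))                   # near a local minimum
    print(hebb_energy(xi, rng.choice([-1, 1], N)))  # much higher energy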

A rather different criterion for the design of an effective associative memory model, which we do not pursue in this paper, is that the basins of attraction of the local minima be as large as possible [7], and some progress in this direction has already been made with correlated patterns [8].

The thermodynamics of the Hamiltonian (2.1) has been fully studied by Amit et al. when the stored patterns are uncorrelated [1, 2]. This section focuses on the problems of using the generalized Hebb rule, equation (1.2), to store correlated patterns.

Since the diagonal term in (2.1) does not affect the thermodynamics, we can straightforwardly take the average free energy density from reference [5],

\[ f = \frac{1}{2} \sum_{\mu} m_\mu^2 - \frac{1}{\beta} \left\langle\!\left\langle \ln \left[ 2 \cosh (\beta \, \mathbf{m} \cdot \boldsymbol{\xi}) \right] \right\rangle\!\right\rangle , \qquad (2.2) \]

where \mathbf{m} \cdot \boldsymbol{\xi} = \sum_{\mu=1}^{p} m_\mu \xi^\mu, and m_\mu is given by the saddle point equation

\[ m_\mu = \left\langle\!\left\langle \xi^\mu \tanh (\beta \, \mathbf{m} \cdot \boldsymbol{\xi}) \right\rangle\!\right\rangle \qquad (2.3) \]

for \mu = 1, \dots, p. The site subscripts were dropped since the self-averaging property of the free energy allows us to replace \frac{1}{N} \sum_{i=1}^{N} (\dots) by the averages \langle\langle \dots \rangle\rangle over the \xi's, and \beta^{-1} = T is a parameter measuring the amount of noise acting in the network. Next we consider two particular solutions of equation (2.3).

2.1 SYMMETRIC SOLUTIONS. — The symmetric solutions, equation (1.4), obey

\[ m_p = \left\langle\!\left\langle \xi^1 \tanh (\beta m_p z) \right\rangle\!\right\rangle , \qquad (2.4) \]

where z = \sum_{\mu=1}^{p} \xi^\mu. Expanding equation (2.4) in powers of m_p, one finds that the critical temperature T_c at which the symmetric state undergoes a continuous transition to the paramagnetic state (m_p = 0) is

\[ T_c = 1 + (p - 1) a^2 . \qquad (2.8) \]

This equation is an indication that the correlations among the patterns make the symmetric state more robust to noise effects. However, this transition only occurs if the symmetric solutions are stable near T_c, which is not the case for a = 0, since the odd-p symmetric solutions are unstable above a certain temperature 0 < T_p < 1 and the even-p ones are always unstable [1]. Next we show how these results change for a ≠ 0.
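The transition temperature (2.8) can be checked by iterating the saddle-point equation (2.4) directly. The Python sketch below is our own check, not part of the original analysis; the exhaustive average over all 2^p pattern configurations is feasible because p is small.

    import numpy as np
    from itertools import product

    def symmetric_rhs(mp, beta, p, a):
        """RHS of Eq. (2.4): << xi^1 tanh(beta m_p z) >>, z = sum_mu xi^mu,
        averaged over the biased distribution (1.6)."""
        b1, b2 = (1 + a) / 2, (1 - a) / 2
        out = 0.0
        for xis in product((1, -1), repeat=p):
            w = np.prod([b1 if x == 1 else b2 for x in xis])
            out += w * xis[0] * np.tanh(beta * mp * sum(xis))
        return out

    def solve_mp(T, p, a, iters=2000):
        mp = 0.5
        for _ in range(iters):
            mp = symmetric_rhs(mp, 1.0 / T, p, a)
        return mp

    p, a = 3, 0.4
    Tc = 1 + (p - 1) * a**2           # Eq. (2.8)
    print(solve_mp(0.9 * Tc, p, a))   # nonzero: symmetric phase
    print(solve_mp(1.1 * Tc, p, a))   # ~0: paramagnetic phase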

The elements of the matrix A whose eigenvalues determine the local stability of the saddle point solutions, equation (2.3), are

\[ A_{\mu\nu} = \delta_{\mu\nu} - \beta \left\langle\!\left\langle \xi^\mu \xi^\nu \left[ 1 - \tanh^2 (\beta \, \mathbf{m} \cdot \boldsymbol{\xi}) \right] \right\rangle\!\right\rangle , \qquad (2.9) \]

which for the symmetric solutions are reduced to

\[ A_{\mu\nu} = \delta_{\mu\nu} \left[ 1 - \beta (1 - q) \right] - (1 - \delta_{\mu\nu}) \, \beta (a^2 - Q) , \qquad (2.10) \]

where

\[ q = \langle\langle \tanh^2 (\beta m_p z) \rangle\rangle \qquad (2.11a) \]

and

\[ Q = \langle\langle \xi^1 \xi^2 \tanh^2 (\beta m_p z) \rangle\rangle . \qquad (2.11b) \]

There are two types of eigenvalues: a nondegenerate eigenvalue,

\[ \lambda_1 = 1 - \beta (1 - q) - (p - 1) \, \beta (a^2 - Q) , \qquad (2.12) \]

and a (p - 1)-fold degenerate eigenvalue,

\[ \lambda_2 = 1 - \beta (1 - q) + \beta (a^2 - Q) . \qquad (2.13) \]

The signs of these eigenvalues determine the local stability of the symmetric solutions. In the limit T → 0 one finds 1 - q = prob(z = 0), while Q approaches a^2 up to a term proportional to prob(z = 0). Since prob(z = 0) = \binom{p}{p/2} (b_1 b_2)^{p/2} is nonzero only for even p, the odd-p symmetric solutions remain stable in this limit.

Near T_c the expansion of equations (2.11a-b) in powers of m_p gives the leading behaviour of q and Q. Substituting these results into the expressions (2.12) and (2.13) for λ_1 and λ_2 yields their signs near the transition: λ_1 is always positive below T_c, and λ_2 becomes negative, signaling the instability of the symmetric solutions, only in the limit a^2 → 0 for T → T_c. For non-zero a, and T not too far below T_c, λ_2 is positive independently of the parity of p. This implies that the even-p solutions must become stable above a certain temperature 0 < T_p < T_c.

To identify the regions of stability of the symmetric solutions in the space of parameters a and T, we perform a numerical analysis of λ_2, equation (2.13), since it is the first eigenvalue to change sign. The results for several values of odd p are presented in figure 1, where the odd-p solutions are unstable inside the contours of T_p(a). Increasing p increases the region of stability. For each p there is a critical value of a above which these solutions are stable for all T < T_c.

The results for even p are presented in figure 2, where the even-p symmetric solutions become stable above a temperature T_p < T_c. This figure shows clearly that the role played by the synaptic noise changes when the patterns are correlated: it stabilizes the symmetric mixture state.

Fig. 1. — The odd-p symmetric solutions are unstable inside the contours of T_p (solid curves), shown for p = 3, 5, 7, 9, 21, and above T_c (broken curves), shown for p = 3 and 21.

Fig. 2. — The even-p symmetric solutions are unstable below T_p (solid curves), shown for p = 2, 4, 6, 12, and above T_c (broken curves), shown for p = 2 and 12.

So far we have studied the network's tendency to enhance the common information contained in the patterns, reflected in the increasing stability of the symmetric state. Next we show how this tendency jeopardizes the network's ability to retrieve individual details of the stored patterns.

2.2 ASYMMETRIC SOLUTIONS. — The asymmetric solutions, equation (1.5), obey the equations

\[ m_1 = \left\langle\!\left\langle \xi^1 \tanh (\beta S) \right\rangle\!\right\rangle , \qquad m_{p-1} = \left\langle\!\left\langle \xi^2 \tanh (\beta S) \right\rangle\!\right\rangle , \qquad (2.17) \]

where S = m_1 \xi^1 + m_{p-1} z_{p-1} and z_{p-1} = \sum_{\mu=2}^{p} \xi^\mu. For weak correlations, i.e. a ≪ 1, these equations can be solved perturbatively, yielding the expansions (2.18).


Thus, making a = 0 one recovers the Mattis solutions. Notice that the retrieval overlap m_1 is only slightly reduced by the presence of weak correlations among the patterns.

Next we consider the zero-temperature solutions of equations (2.17). Taking the limit β → ∞ in equations (2.18) one finds

\[ m_1 = 1 , \qquad m_{p-1} = a^2 , \qquad (2.19) \]

which clearly is the best the network can do for retrieving patterns correlated according to equation (1.7). However, it can be easily verified that this solution exists only if Σ_+ and Σ_- are positive. This condition is satisfied for a < a_c, with a_c given by equation (2.20). The behaviour for a > a_c depends on the parity of p. For odd p the model undergoes a discontinuous transition to the symmetric state, equation (2.21), where k = [p/2] stands for the integer part of p/2 [9]. Since for even p the symmetric solutions are unstable, the model must undergo a discontinuous transition to another asymmetric state, equation (2.22). This solution has m_{p-1} < m_1 < 1 and for large p it tends to a symmetric solution with m_1 = m_{p-1} = a.

It should be emphasized that equations (2.19), (2.21), and (2.22) are not the only zero-temperature solutions of equations (2.17) [10], but they are the relevant ones for our purpose, since their basins of attraction contain the input states with m_1 ≈ 1 and m_{p-1} ≈ 0 which bias the network to distinguish pattern 1 from the other p - 1. We also remark that the transition occurring at a = a_c is not a thermodynamic transition, because the energy of the states plays no role in it. Nevertheless, a_c clearly signals a change in the dynamical behaviour of the model, which is the important information for software and hardware implementations.
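The retrieval fixed point (2.19) can be observed directly in a simulation. The sketch below is our own illustration (the values of p, N, and a are arbitrary choices, with a assumed small enough for the retrieval solution to exist); it runs zero-temperature asynchronous dynamics under the Hebb couplings (1.2), starting from pattern 1.

    import numpy as np

    rng = np.random.default_rng(2)
    p, N, a = 3, 1_000, 0.3
    xi = np.where(rng.random((p, N)) < (1 + a) / 2, 1, -1)

    J = xi.T @ xi / N              # generalized Hebb rule, Eq. (1.2)
    np.fill_diagonal(J, 0.0)

    S = xi[0].copy()               # input state biased toward pattern 1
    for _ in range(20):            # zero-T dynamics: S_i <- sgn(sum_j J_ij S_j)
        for i in rng.permutation(N):
            h = J[i] @ S
            if h != 0.0:
                S[i] = 1 if h > 0 else -1

    print(xi @ S / N)              # compare with (m_1, m_{p-1}) of Eq. (2.19)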

To describe the behaviour of the model for T > 0, we associate the solutions (2.19), (2.21), and (2.22) with the phases asymmetric I (AI), symmetric (S), and asymmetric II (AII), respectively. The solution m_1 = m_{p-1} = 0 corresponds to the paramagnetic phase (P). For odd p the phase diagram has two transitions: a discontinuous one from phase AI to phase S, and a continuous one from phase S to phase P given by equation (2.8). Figure 3a illustrates these transitions for p = 3. In the case of even p we need to include the phase AII. This phase is very sensitive to noise and undergoes a continuous transition to phase S as soon as the symmetric solutions become stable, i.e. when λ_2 = 0. The remaining transitions are similar to the odd-p case.

Figure 3b shows the phase diagram for p = 4. Notice the robustness of the phase AI to noise and the large domain of the phase S in both diagrams.

Fig. 3. — Phase diagram for (a) p = 3 and (b) p = 4. The broken curves correspond to continuous transitions, while the solid curves correspond to discontinuous transitions.

3. A learning rule for correlated patterns.

An effective learning rule for storing correlated patterns should enhance their dissimilarities or, which is the same, penalize their similarities. To implement this idea we must design a network whose cost function or energy is maximized by the symmetric mixture state. A simple guess is

\[ H = -\frac{N}{2} \sum_{\mu} m_\mu \tilde{m}_\mu , \qquad (3.1) \]

where \tilde{m}_\mu = m_\mu - \frac{1}{\tilde{p}} \sum_{\nu \neq \mu} m_\nu and \tilde{p} = p - 1, with m_\mu defined by equation (1.3). A less transparent expression for H can be obtained by expanding the quadratic term in (3.1):

\[ H = -\frac{1}{2} \sum_{i \neq j} J_{ij} S_i S_j , \qquad (3.2) \]

where J_{ij} is a local learning rule given by

\[ J_{ij} = \frac{1}{N} \sum_{\mu} \xi_i^\mu \xi_j^\mu - \frac{1}{N \tilde{p}} \sum_{\mu \neq \nu} \xi_i^\mu \xi_j^\nu . \qquad (3.3) \]

Notice that in going from equation (3.1) to (3.2) we have omitted the diagonal term J_{ii}, since it plays no role in the statistical mechanics analysis of the model. However, it does affect the dynamical properties in a nontrivial way [11] and, to avoid future ambiguities, we define the model by equations (3.2)-(3.3).
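In matrix form the rule (3.3) is a one-line computation. The following Python sketch (our own illustration, not the authors' code) builds the couplings of equation (3.3) next to the nonlocal rule (1.9) of Amit et al. for later comparison.

    import numpy as np

    def local_rule_J(xi):
        """Local rule of Eq. (3.3): the Hebb term minus a penalty on the
        pairwise pattern similarities, with ptilde = p - 1."""
        p, N = xi.shape
        hebb = xi.T @ xi                               # sum_mu xi_i^mu xi_j^mu
        cross = np.outer(xi.sum(0), xi.sum(0)) - hebb  # sum over mu != nu
        J = (hebb - cross / (p - 1)) / N
        np.fill_diagonal(J, 0.0)                       # omit J_ii, cf. (3.2)
        return J

    def amit_rule_J(xi, a):
        """Nonlocal rule of Eq. (1.9): needs the global activity a."""
        N = xi.shape[1]
        J = (xi - a).T @ (xi - a) / N
        np.fill_diagonal(J, 0.0)
        return J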

The thermodynamics of the Hamiltonian (3.1) or (3.2) is straightforward. The partition function can be evaluated by introducing auxiliary variables t_\mu conjugate to the overlaps; deforming the integration contours so that they pass through the saddle point, and using the self-averaging property of the averaged free energy density, one finds a free energy in which m_\mu and t_\mu obey a set of saddle point equations.

Clearly the symmetric state, equation (1.4), is not a solution of these equations. This is the main consequence of our learning rule being different from that of Amit et al., equation (1.9), whose energy landscape is dominated by spurious symmetric states. We return to a comparison of the two rules below.

For the asymmetric solutions, equation (1.5), the saddle point equations simplify. After averaging over \xi^1, introducing the variable M = m_1 - m_{p-1}, and eliminating t_1 and t_{p-1}, one gets equations (3.10a) and (3.10b), in which the quantities Θ_+ and Θ_- depend on m_1, M, and z_{p-1}.

We now consider the zero-temperature solutions of these equations. Taking the limit β → ∞ in equation (3.10b) and multiplying both sides by M, one finds that M must be positive. Turning to equation (3.10a), we notice that Θ_+ and Θ_- are always positive except for z_{p-1} = -(p - 1) and z_{p-1} = p - 1, respectively, where they vanish. Hence, taking the limit β → ∞, the only solutions of equations (3.10) are

\[ m_1 = 1 - \Gamma , \qquad (3.13a) \]

\[ m_{p-1} = a^2 - \Gamma , \qquad (3.13b) \]

where Γ = b_1^p + b_2^p; these solutions rapidly approach (2.19) as p increases.

These equations give us a clue to understanding how the network works. Take a certain bit in pattern 1, say \xi_i^1. The probability that \xi_i^\mu = \xi_i^1 for all \mu = 2, \dots, p is Γ = b_1^p + b_2^p, since the patterns are independent. Thus the mean number of bits which are common to all the patterns is NΓ. Because the learning rule penalizes the symmetric state, the network's state must be such that half of these NΓ bits are reversed (if all of them were reversed, the network would in fact enhance the symmetric state, due to the symmetry \xi^\mu → -\xi^\mu), while the remaining N - NΓ/2 bits are equal to the \xi_i^1's. Hence

\[ m_1 = \frac{(N - N\Gamma/2) - N\Gamma/2}{N} = 1 - \Gamma , \]

in agreement with equation (3.13a).
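This counting argument is easy to verify numerically. The sketch below (our own check) measures the fraction of sites at which all p biased patterns coincide and compares it with Γ = b_1^p + b_2^p, together with the retrieval overlap m_1 = 1 - Γ predicted by (3.13a).

    import numpy as np

    rng = np.random.default_rng(3)
    p, N, a = 5, 200_000, 0.4
    b1, b2 = (1 + a) / 2, (1 - a) / 2
    xi = np.where(rng.random((p, N)) < b1, 1, -1)

    # A site belongs to the common background when all p patterns
    # carry the same bit there, i.e. |sum_mu xi_i^mu| = p.
    common = (np.abs(xi.sum(axis=0)) == p).mean()
    gamma = b1**p + b2**p
    print(common, gamma)           # empirical fraction vs. Gamma
    print(1 - gamma)               # predicted retrieval overlap, Eq. (3.13a)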

If one thinks of the patterns as pictures, then NΓ is the common background of the pictures, and what the network does is homogenize the background, leaving the principal features untouched.

It is interesting to compare our learning rule, equation (1.8), which for large p is essentially

\[ J_{ij} = \frac{1}{N} \sum_{\mu} (\xi_i^\mu - \bar{\xi}_i)(\xi_j^\mu - \bar{\xi}_j) , \qquad \bar{\xi}_i = \frac{1}{p} \sum_{\nu=1}^{p} \xi_i^\nu , \]

with the rule of Amit et al., equation (1.9). Noting that \langle \bar{\xi}_i \rangle = a and neglecting the fluctuations of O(p^{-1/2}), one recovers equation (1.9). Nevertheless, these fluctuations are strong enough to modify the energy landscape and, consequently, the thermodynamics of the model. To illustrate this, let us consider the T = 0 solutions of both models in the limit of large but finite p. For the rule (1.9) the symmetric solutions are those given in reference [5], while for the rule (1.8) one has m_p = H_S = 0. The asymmetric solutions are m_1 = 1, m_{p-1} = a^2 for both rules and possess the same energy. Thus the role of the O(p^{-1/2}) fluctuations is to destabilize the symmetric states. Notice that for rule (1.9) the symmetric solutions have lower energy than the retrieval states for a > (1 - 2/π)^{1/2}, though this serious drawback can be avoided by imposing a global constraint on the dynamics [5].

4. Discussion.

In this paper we proposed a local learning rule for storing correlated patterns in a neural network model of associative memory. The learning rule, equation (1.8), emerges as a natural result of posing the design of the neural network as an optimization problem for the energy function: find a learning rule that enhances the patterns' differences. Such a rule must necessarily be complex. A simple additive learning rule, like the generalized Hebb rule, treats each pattern as a new piece of information to be stored, even if the patterns are correlated.

Instead, equation (1.8) extracts and stores only the new information contained in the new pattern presented to the network. Since the process of selecting the new information involves a comparison of the new pattern with all the data already stored in the network, the learning rule cannot be local in the space of the stored patterns. A similar situation occurs in the storage of hierarchically correlated patterns, where in order to store a pattern (descendant) the network needs to recall information already stored (ancestor) [14, 15].

Our learning rule may have a promising application in pattern recognition problems where important features of the stored patterns are hidden in the background. We think that this point deserves further attention.

Throughout this paper we have considered only learning rules where all the patterns are embedded in the network at once through a prescription for J_{ij}. Although this unsupervised learning strategy does not guarantee the stability of the patterns, it allows an analytical study of the retrieval process. It should be emphasized that, in the context of supervised learning rules, there exists a local algorithm, the perceptron algorithm, which can generate the appropriate synaptic connections to stabilize a set of N correlated or uncorrelated patterns [16].

It should be remarked that the results of section 2 could be attributed to the statistical independence of the biased patterns, \langle \xi^\mu \xi^\nu \rangle = \langle \xi^\mu \rangle \langle \xi^\nu \rangle = a^2, instead of to a true correlation effect. We believe this is not the case, since those results agree with the intuitive expectation that the correlations should favour the symmetric mixture state and that this preference should be enhanced in the presence of noise. Further evidence is provided by comparing our results for p = 2 with the results of reference [13], where \langle \xi^\mu \rangle = \langle \xi^\nu \rangle = 0 and \langle \xi^\mu \xi^\nu \rangle = Q. The equations there reduce to equations (2.17) when one replaces Q by a^2.

Among the advantages of the learning rule (1.8) over equation (1.9) are the absence of the symmetric mixture state and its applicability to any set of correlated patterns without prior knowledge of the correlations. To compare these two learning rules in the limit of non-zero α = p/N, we have run simulations for α = 0.1 (N = 200) and measured the retrieval overlap m_1 as a function of a. Rule (1.9) has only a retrieval overlap
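A simulation of the kind just described can be set up in a few lines. The sketch below is our own reconstruction of the protocol (zero-temperature dynamics, a single noise realization, not the authors' code); it measures the retrieval overlap m_1 for both rules at α = 0.1, N = 200.

    import numpy as np

    rng = np.random.default_rng(4)
    alpha, N, a = 0.1, 200, 0.1
    p = int(alpha * N)
    xi = np.where(rng.random((p, N)) < (1 + a) / 2, 1, -1)

    hebb = xi.T @ xi
    cross = np.outer(xi.sum(0), xi.sum(0)) - hebb
    J_local = (hebb - cross / (p - 1)) / N        # rule (1.8) / (3.3)
    J_amit = (xi - a).T @ (xi - a) / N            # rule (1.9)

    def m1_after_dynamics(J, sweeps=30):
        """Zero-temperature dynamics started at pattern 1; returns m_1."""
        J = J.copy()
        np.fill_diagonal(J, 0.0)
        S = xi[0].copy()
        for _ in range(sweeps):
            for i in rng.permutation(N):
                h = J[i] @ S
                if h != 0.0:
                    S[i] = 1 if h > 0 else -1
        return float((xi[0] * S).mean())

    print(m1_after_dynamics(J_local), m1_after_dynamics(J_amit))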

Acknowledgments.

The research at Caltech was supported by contract N00014-87-K-0377 from the Office of Naval Research. J.F.F. thanks the kind hospitality of IF-UFRGS, where this work was started. W.K.T. thanks R. Erichsen Jr. for aid with some of the calculations in section 2. The research of W.K.T. was supported in part by Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq) and Financiadora de Estudos e Projetos (FINEP), Brazil. J.F.F. was partly supported by a CNPq fellowship.

References

[1] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Phys. Rev. A 32 (1985) 1007.
[2] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Ann. Phys. N.Y. 173 (1987) 30.
[3] HOPFIELD J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[4] BINDER K. and YOUNG A. P., Rev. Mod. Phys. 58 (1986) 801.
[5] AMIT D. J., GUTFREUND H. and SOMPOLINSKY H., Phys. Rev. A 35 (1987) 2293.
[6] HOPFIELD J. J. and TANK D. W., Biol. Cybern. 52 (1985) 141.
[7] BRUCE A. D., GARDNER E. and WALLACE D. J., J. Phys. A 20 (1987) 2909.
[8] GARDNER E., J. Phys. A 21 (1988) 257.
[9] See Appendix A of reference [1] for a similar calculation.
[10] THEUMANN W. K. and ERICHSEN R., Jr., unpublished.
[11] FONTANARI J. F. and KOEBERLE R., J. Phys. A 21 (1988) L667.
[12] Throughout this paper we have omitted the trivial solutions due to the symmetry m → -m.
[13] FONTANARI J. F. and KOEBERLE R., J. Phys. A 21 (1988) 2477.
[14] PARGA N. and VIRASORO M. A., J. Phys. France 47 (1986) 1857.
[15] FEIGELMAN M. V. and IOFFE L. B., Int. J. Mod. Phys. B 1 (1987) 51.
