
HAL Id: jpa-00246392

https://hal.archives-ouvertes.fr/jpa-00246392

Submitted on 1 Jan 1991

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Generalization in the Hopfield model: numerical results

E. Miranda

To cite this version:

E. Miranda. Generalization in the Hopfield model: numerical results. Journal de Physique I, EDP Sciences, 1991, 1 (7), pp. 999-1004. 10.1051/jp1:1991183. jpa-00246392

J. Phys. I France 1 (1991) 999-1004    JUILLET 1991, PAGE 999

Classification
Physics Abstracts
87.10, 64.60C

Short Communication

Generalization in the Hopfield model: numerical results

E. N. Miranda

HLRZ, KFA Jülich, D-5170 Jülich, Germany

(Received 11 April 1991, accepted 16 April 1991)

Abstract. The generalization capability of the Hopfield model is studied numerically. There is a critical number tc of noisy examples that should be presented before the system grasps the pure patterns. Above tc, the generalization error falls off as a power of the number of presented examples. The critical number of examples increases exponentially with the noise level in the examples and linearly with the number of patterns to be learned.

The Hopfield model [1] is the most popular neural network and is responsible for the present excitement around this subject in the theoretical physics community. It has been extensively studied as an associative memory model, and its phase space has been explored with the powerful tools of spin-glass mean-field theory [2]. However, little is known about its performance in more complex computational tasks like generalization. This problem is the focus of many recent papers [3-8], although these works deal with single- or multilayer perceptrons. Fontanari [9] has examined generalization in the Hopfield model with the usual mean-field techniques. The aim of this paper is to study the same problem numerically.

In the usual Hopfield model, patterns are learned by choosing the couplings according to Hebb's rule. In this way, the patterns become minima of a properly defined energy [2]. Now, suppose that several noisy examples of the pure patterns are stored in the system. Does it grasp the pure patterns? We say the system "generalizes" from the examples if the pure patterns (which have never been stored!) are energy minima, or are very close to energy minima. Our results show that a critical number of examples must be taught to the system in order to start generalization. Once the network is in the generalization regime, the retrieval error decreases as a power of the number of taught examples. The critical number of examples increases exponentially with the amount of noise in the taught examples and linearly with the number of patterns to be stored.

Consider a neural network with N neurons which can take the values Si = ±1; every site is connected to every other site. There is a set of p patterns {ξ_i^μ(0)} (μ = 1, ..., p; i = 1, ..., N) to be learned by the system. As usual, the relevant quantity is α = p/N. We assume two realistic hypotheses about learning: a) we never learn a pure pattern but a noisy version of it; b) learning never stops; we are always relearning and improving previously stored information. So, the learning procedure of our model is a continuous one. A "time step" at time t implies the following operations:


• Choose a set of noisy examples of the pure patterns to be learned; they are denoted by {ξ_i^μ(t)}. If r is the noise level, this means that ξ_i^μ(t) = -ξ_i^μ(0) with probability r.

• Store the set of noisy examples using the usual Hebb rule:

$$ J_{ij}(t) = J_{ij}(t-1) + \sum_{\mu=1}^{p} \xi_i^{\mu}(t)\,\xi_j^{\mu}(t), \qquad J_{ii} = 0 $$

Initially, the system starts from a "tabula rasa" state (all the J_ij are zero).
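As an illustration only (not the author's original code), the continuous learning procedure can be sketched in a few lines of Python/NumPy; the function and variable names are ours, and the per-step sum over all p noisy examples reflects the Hebb rule as reconstructed above.

```python
import numpy as np

def hebb_learning_step(J, pure_patterns, r, rng):
    """One "time step" of the continuous learning procedure: draw a fresh
    noisy copy of every pure pattern (each site flipped with probability r)
    and add its Hebbian contribution to the couplings."""
    p, N = pure_patterns.shape
    flips = np.where(rng.random((p, N)) < r, -1, 1)    # -1 marks a flipped site
    noisy = pure_patterns * flips                      # xi^mu(t)
    J += noisy.T @ noisy                               # J_ij += sum_mu xi_i^mu(t) xi_j^mu(t)
    np.fill_diagonal(J, 0.0)                           # keep J_ii = 0
    return J

# "Tabula rasa" start: all couplings are zero (illustrative sizes, alpha = p/N ~ 0.10).
N, p, r = 512, 51, 0.25
rng = np.random.default_rng(0)
pure = rng.choice([-1, 1], size=(p, N))                # pure patterns xi^mu(0)
J = np.zeros((N, N))
J = hebb_learning_step(J, pure, r, rng)
```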

It is well known [2] that an energy may be defined if the J_ij are symmetric. In our case, the energy is a time-dependent quantity:

$$ E(t) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} J_{ij}(t)\, S_i S_j $$
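A direct transcription of this energy, assuming the conventional -1/2 prefactor written above, is a one-liner; it is given here only as a sketch (the arrays J and S follow the conventions of the previous snippet).

```python
import numpy as np

def energy(J, S):
    """Time-dependent energy E(t) = -1/2 * sum_ij J_ij(t) S_i S_j
    for couplings J and a spin configuration S of +1/-1 values."""
    return -0.5 * S @ J @ S
```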

Generalization is achieved when the pure patterns are energy minima. This point is checked in the following way. A pure pattern is taken as the initial configuration; it evolves according to a sequential steepest-descent dynamics, i.e. each neuron should be parallel to its internal field. After a few iterations, the system reaches a fixed point. Its overlap with the initial pure pattern is given by:

$$ m^{\mu}(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu}(0)\, S_i $$

This quantity is averaged over the whole set of pure patterns. The generalization error is defined as:

$$ z(t) = \frac{1 - \overline{m(t)}}{2} $$

The bar means an average over the pattern set. If the system generalizes from the shown examples, one expects z ≈ 0.
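The fixed-point check can be sketched in the same spirit: each pure pattern is relaxed under zero-temperature sequential dynamics and the overlap of the resulting fixed point with the starting pattern is recorded. The error formula z = (1 - m̄)/2 used below is our reading of the partially legible definition above and should be treated as an assumption.

```python
import numpy as np

def descend(J, start, max_sweeps=100):
    """Sequential steepest-descent dynamics: sweep the sites in order,
    aligning each spin with its internal field, until a fixed point."""
    S = start.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            h = J[i] @ S                         # internal field at site i
            s_new = 1 if h >= 0 else -1
            if s_new != S[i]:
                S[i] = s_new
                changed = True
        if not changed:                          # fixed point reached
            break
    return S

def generalization_error(J, pure_patterns):
    """Average overlap of the fixed points with the pure patterns and the
    error z = (1 - mean overlap) / 2 (definition assumed, see text)."""
    N = pure_patterns.shape[1]
    m = [xi @ descend(J, xi) / N for xi in pure_patterns]
    return (1.0 - np.mean(m)) / 2.0
```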

Computer simulations were performed with neural networks of 256 and 512 neurons. These are rather modest sizes (see reference [10] for a review of large-scale simulations), but a qualitative change of the system behaviour is not expected for bigger sizes. The time evolution of the overlap was measured and averaged over 50 samples. We simulated over 50 to 250 time steps, depending on N, r and α. The simulation time was chosen such that the power-law decrease of the error (see below) was seen. In figure 1, the generalization error is plotted against time (i.e. the number of shown examples). These data correspond to r = 0.25, α = 0.10 and N = 512. It is clear that the generalization error is approximately constant during the first iterations; then it goes down with a power law. A critical number of examples (i.e. a critical time) tc may be defined as shown in figure 1 (see the construction with the dashed line). For t > tc, the generalization error falls off with a power law:

$$ z \sim t^{-b} $$
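As an illustration of how tc and the exponent b could be extracted from such a curve, the tail can be fitted as a straight line in log-log coordinates and intersected with the initial plateau; this mimics the dashed-line construction of figure 1 and is a stand-in for the author's purely graphical procedure (t_fit_min and z_plateau are quantities to be chosen by eye, not parameters from the paper).

```python
import numpy as np

def extract_b_and_tc(times, errors, t_fit_min, z_plateau):
    """Fit the power-law tail z ~ t**(-b) for t >= t_fit_min and take t_c
    as the point where the fitted line crosses the initial plateau value."""
    t = np.asarray(times, dtype=float)
    z = np.asarray(errors, dtype=float)
    mask = (t >= t_fit_min) & (z > 0)
    slope, intercept = np.polyfit(np.log10(t[mask]), np.log10(z[mask]), 1)
    b = -slope
    t_c = 10.0 ** ((intercept - np.log10(z_plateau)) / b)
    return b, t_c
```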

We now study the dependence of tc and b on the parameters involved in the model. In figure 2, the critical time is plotted against the noise level r for α = 0.05. The data fit very well an exponential law:

$$ t_c \sim e^{\lambda r} $$

with λ a fitted constant. This means that generalization becomes much more difficult as the patterns are less well defined (good advice for our teachers!). The exponent b certainly decreases with r (from b ≈ 3.5 for r = 0.1 to b ≈ 1.5 for r = 0.4), but the values fluctuate too much for a definitive statement.


Fig. 1. Log10-log10 plot of the generalization error vs. time in a 512-neuron network. The noise level is r = 0.20 and α = 0.10. A critical time tc may be defined as shown. If t > tc, the generalization error falls off with a power law z ~ t^(-b).

Fig. 2. The critical time tc as a function of the noise level r for N = 256 (circles) and N = 512 (squares). The data show that tc grows exponentially with the noise in the examples.

It should be remarked that for r = 0.5 there is no generalization at all. This is quite obvious: a pattern {ξ^μ(0)} with 50% of wrong sites may equally be considered as the "anti"-pattern {-ξ^μ(0)} with the same amount of noise. Therefore, one may expect a singularity at r = 0.5. In figure 3a the critical number of examples tc is plotted against α.


Fig. 3. The critical time tc (a) and the exponent b (b) as a function of α. For α = 0.15 there is no generalization at all. Symbols as in the previous figure.

We can see that tc increases linearly with the number of patterns to be learned. A finite tc is found for α = 0.14, but there is no generalization at all for α = 0.15. It is well known that there is a first-order transition in the model for its retrieval properties [2], and large-scale simulations [11] have shown that it takes place at αc = 0.143(1). So, it is reasonable to expect a jump in the generalization capability of the model


around α ≈ 0.14. In figure 3b the exponent b is plotted against α. We may conclude that

$$ t_c \sim \alpha $$

with b decreasing roughly linearly with α, provided that α < αc.

Thus: (i) there is a critical number of examples tc which should be shown to the network before it generalizes; (ii) in the generalization regime, the generalization error decreases with a power law ~ t^(-b); (iii) tc grows exponentially with the noise level in the examples shown to the network; (iv) tc grows linearly with α, and the exponent b decreases linearly with it, provided α < αc.

The first point is in full accordance with the mean-field results [9] for the Hopfield model. There are some analytical calculations for single-layer perceptrons [12] which predict no critical size tc for the training set. Most probably, the results depend strongly on the architecture; for this reason one should not expect agreement with those calculations. In any case, our numerical simulations support the claim of reference [9]. One should remember that our data have been obtained in finite systems and may depend on N. In fact, tc increases with N (compare the data for N = 256 with those for N = 512). Some runs with N = 640 were performed, and the results agree with those of 512 neurons within statistical errors (~5%). A careful study of finite-size effects (and bigger systems!) would be needed for a quantitative comparison with mean-field predictions.

Our second conclusion is in disagreement with previous simulations for multilayered neural networks [13, 14], which show an exponential decay of the generalization error. However, the behaviour of the error seems to depend on the details of the architecture even for multilayered perceptrons [14]. Therefore, it is not surprising that we get a completely different behaviour with a completely different architecture.

Finally, there are no analytical predictions about points (iii) and (iv). Perhaps a noise-to-signal analysis could be a useful tool to study such questions. It would also be very interesting to analyze generalization with Gardner's technique [15] in order to get results which are not bound to a particular architecture.

Acknowledgements.

The author wishes to thank H. J. Herrmann for a careful reading of the manuscript and the referees for many suggestions.

References

[1] HOPFIELD J.J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] AMIT D.J., Modeling Brain Function (Cambridge Univ. Press, 1989).
[3] GARDNER E. and DERRIDA B., J. Phys. A 22 (1989) 1983.
[4] VALLET F., CAILTON J. and RÉFRÉGIER P., Europhys. Lett. 9 (1989) 315.
[5] OPPER M., KINZEL W., KLEINZ J. and NEHL R., J. Phys. A 23 (1990) L581.
[6] GYÖRGYI G., Phys. Rev. Lett. 64 (1990) 2957.
[7] HERTZ J.A., KROGH A. and THORBERGSSON G.I., J. Phys. A 22 (1989) 2133.
[8] LEVIN E., TISHBY N. and SOLLA S.A., Proc. IEEE 78 (1990) 1568.
[9] FONTANARI J.F., J. Phys. France 51 (1990) 2421.


[10] KOHRING G., Int. J. Mod. Phys. C 1 (1990) 259.
[11] KOHRING G., J. Stat. Phys. 59 (1990) 1077.
[12] HANSEL D. and SOMPOLINSKY H., Europhys. Lett. 11 (1990) 687.
[13] DENKER J., SCHWARTZ D., WITTNER B., SOLLA S., HOWARD R., JACKEL L. and HOPFIELD J.J., Complex Syst. 1 (1987) 877.
[14] TISHBY N., LEVIN E. and SOLLA S., Proc. Int. Joint Conf. on Neural Networks, Washington (1989).
[15] GARDNER E., J. Phys. A 21 (1988) 257.
