
HAL Id: jpa-00246392

https://hal.archives-ouvertes.fr/jpa-00246392

Submitted on 1 Jan 1991

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Generalization in the Hopfield model: numerical results

E. Miranda

To cite this version:

E. Miranda. Generalization in the Hopfield model: numerical results. Journal de Physique I, EDP Sciences, 1991, 1 (7), pp. 999-1004. 10.1051/jp1:1991183. jpa-00246392

J. Phys. I France 1 (1991) 999-1004    JUILLET 1991, PAGE 999

Classification
Physics Abstracts
87.10, 64.60C

Short Communication

Generalization in the Hopfield model: numerical results

E. N. Miranda

HLRZ, KFA Jülich, D-5170 Jülich, Germany

(Received 11 April 1991, accepted 16 April 1991)

Abstract. The generalization capability of the Hopfield model is studied numerically. There is a critical number tc of noisy examples that should be presented before the system grasps the pure patterns. Above tc, the generalization error falls off as a power of the number of presented examples. The critical number of examples increases exponentially with the noise level in the examples and linearly with the number of patterns to be learned.

The Hopfield model [1] is the most popular neural network and is responsible for the present excitement around this subject in the theoretical physics community. It has been extensively studied as an associative memory model, and its phase space has been explored with the powerful tools of spin-glass mean-field theory [2]. However, little is known about its performance in more complex computational tasks like generalization. This problem is the focus of many recent papers [3-8], although these works deal with single- or multilayer perceptrons. Fontanari [9] has examined generalization in the Hopfield model with the usual mean-field techniques. The aim of this paper is to study the same problem numerically.

In the usual Hopfield model, patterns are learned by choosing the couplings according to Hebb's rule. In this way, the patterns become minima of a properly defined energy [2]. Now, suppose that several noisy examples of the pure patterns are stored in the system. Does it grasp the pure patterns? We say the system "generalizes" from the examples if the pure patterns (which have never been stored!) are energy minima, or are very close to energy minima. Our results show that a critical number of examples must be taught to the system in order to start generalization. Once the network is in the generalization regime, the retrieval error decreases as a power of the number of taught examples. The critical number of examples increases exponentially with the amount of noise in the taught examples and linearly with the number of patterns to be stored.

Consider a neural network with N neurons which can take the values Si = ±1; every site is connected to every other site. There is a set of p patterns {ξ_i^μ(0)} (μ = 1, ..., p; i = 1, ..., N) to be learned by the system. As usual, the relevant quantity is α = p/N. We assume two realistic hypotheses about learning: a) we never learn a pure pattern but a noisy version of it; b) learning never stops; we are always relearning and improving previously stored information. So, the learning procedure of our model is a continuous one. A "time step" at time t implies the following operations:


• Choose a set of noisy examples of the pure patterns to be learned; they are denoted by {ξ_i^μ(t)}. If r is the noise level, this means that ξ_i^μ(t) = -ξ_i^μ(0) with probability r.

• Store the set of noisy examples using the usual Hebb rule:

$$ J_{ij}(t) = J_{ij}(t-1) + \sum_{\mu=1}^{p} \xi_i^{\mu}(t)\,\xi_j^{\mu}(t), \qquad J_{ii} = 0 $$

Initially, the system starts from a "tabula rasa" state (all the J_ij are zero).
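As an illustration only (not the author's original code), the continuous learning procedure can be sketched in a few lines of Python/NumPy; the function and variable names are ours, and the per-step sum over all p noisy examples reflects the Hebb rule as reconstructed above.

```python
import numpy as np

def hebb_learning_step(J, pure_patterns, r, rng):
    """One "time step" of the continuous learning procedure: draw a fresh
    noisy copy of every pure pattern (each site flipped with probability r)
    and add its Hebbian contribution to the couplings."""
    p, N = pure_patterns.shape
    flips = np.where(rng.random((p, N)) < r, -1, 1)    # -1 marks a flipped site
    noisy = pure_patterns * flips                      # xi^mu(t)
    J += noisy.T @ noisy                               # J_ij += sum_mu xi_i^mu(t) xi_j^mu(t)
    np.fill_diagonal(J, 0.0)                           # keep J_ii = 0
    return J

# "Tabula rasa" start: all couplings are zero (illustrative sizes, alpha = p/N ~ 0.10).
N, p, r = 512, 51, 0.25
rng = np.random.default_rng(0)
pure = rng.choice([-1, 1], size=(p, N))                # pure patterns xi^mu(0)
J = np.zeros((N, N))
J = hebb_learning_step(J, pure, r, rng)
```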

It is well known [2] that an energy may be defined if the J_ij are symmetric. In our case, the energy is a time-dependent quantity:

$$ E(t) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} J_{ij}(t)\, S_i S_j $$
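A direct transcription of this energy, assuming the conventional -1/2 prefactor written above, is a one-liner; it is given here only as a sketch (the arrays J and S follow the conventions of the previous snippet).

```python
import numpy as np

def energy(J, S):
    """Time-dependent energy E(t) = -1/2 * sum_ij J_ij(t) S_i S_j
    for couplings J and a spin configuration S of +1/-1 values."""
    return -0.5 * S @ J @ S
```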

Generalization is achieved when the pure patterns are energy minima. This point is checked in the following way. A pure pattern is taken as the initial configuration; it evolves according to a sequential steepest-descent dynamics, i.e. each neuron should be parallel to its internal field. After a few iterations, the system reaches a fixed point. Its overlap with the initial pure pattern is given by:

$$ m^{\mu}(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu}(0)\, S_i $$

This quantity is averaged over the whole set of pure patterns. The generalization error is defined as:

$$ z(t) = \frac{1 - \overline{m(t)}}{2} $$

The bar means an average over the pattern set. If the system generalizes from the shown examples, one expects z ≈ 0.
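The fixed-point check can be sketched in the same spirit: each pure pattern is relaxed under zero-temperature sequential dynamics and the overlap of the resulting fixed point with the starting pattern is recorded. The error formula z = (1 - m̄)/2 used below is our reading of the partially legible definition above and should be treated as an assumption.

```python
import numpy as np

def descend(J, start, max_sweeps=100):
    """Sequential steepest-descent dynamics: sweep the sites in order,
    aligning each spin with its internal field, until a fixed point."""
    S = start.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            h = J[i] @ S                         # internal field at site i
            s_new = 1 if h >= 0 else -1
            if s_new != S[i]:
                S[i] = s_new
                changed = True
        if not changed:                          # fixed point reached
            break
    return S

def generalization_error(J, pure_patterns):
    """Average overlap of the fixed points with the pure patterns and the
    error z = (1 - mean overlap) / 2 (definition assumed, see text)."""
    N = pure_patterns.shape[1]
    m = [xi @ descend(J, xi) / N for xi in pure_patterns]
    return (1.0 - np.mean(m)) / 2.0
```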

Computer simulations were performed with neural networks of 256 and 512 neurons. These are rather modest sizes (see reference [10] for a review of large-scale simulations), but a qualitative change of the system behaviour is not expected for bigger sizes. The time evolution of the overlap was measured and averaged over 50 samples. We simulated over 50 to 250 time steps, depending on N, r and α. The simulation time was chosen such that the power-law decrease of the error (see below) was seen. In figure 1, the generalization error is plotted against time (i.e. the number of shown examples). These data correspond to r = 0.25, α = 0.10 and N = 512. It is clear that the generalization error is approximately constant during the first iterations; then it goes down with a power law. A critical number of examples (i.e. a critical time) tc may be defined as shown in figure 1 (see the construction with the dashed line). For t > tc, the generalization error falls off with a power law:

$$ z \sim t^{-b} $$
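As an illustration of how tc and the exponent b could be extracted from such a curve, the tail can be fitted as a straight line in log-log coordinates and intersected with the initial plateau; this mimics the dashed-line construction of figure 1 and is a stand-in for the author's purely graphical procedure (t_fit_min and z_plateau are quantities to be chosen by eye, not parameters from the paper).

```python
import numpy as np

def extract_b_and_tc(times, errors, t_fit_min, z_plateau):
    """Fit the power-law tail z ~ t**(-b) for t >= t_fit_min and take t_c
    as the point where the fitted line crosses the initial plateau value."""
    t = np.asarray(times, dtype=float)
    z = np.asarray(errors, dtype=float)
    mask = (t >= t_fit_min) & (z > 0)
    slope, intercept = np.polyfit(np.log10(t[mask]), np.log10(z[mask]), 1)
    b = -slope
    t_c = 10.0 ** ((intercept - np.log10(z_plateau)) / b)
    return b, t_c
```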

We now study the dependence of tc and b on the parameters involved in the model. In figure 2, the critical time is plotted against the noise level r for α = 0.05. The data fit very well an exponential law:

$$ t_c \sim e^{\lambda r} $$

with λ a fitted constant. This means that generalization becomes much more difficult as the patterns are less well defined (good advice for our teachers!). The exponent b certainly decreases with r (from b ≈ 3.5 for r = 0.1 to b ≈ 1.5 for r = 0.4), but the values fluctuate too much for a definitive statement.


Fig. 1. Log10-log10 plot of the generalization error vs. time in a 512-neuron network. The noise level is r = 0.20 and α = 0.10. A critical time tc may be defined as shown. If t > tc, the generalization error falls off with a power law z ~ t^(-b).

Fig. 2. The critical time tc as a function of the noise level r for N = 256 (circles) and N = 512 (squares). The data show that tc grows exponentially with the noise in the examples.

It should be remarked that for r = 0.5 there is no generalization at all. This is quite obvious: a pattern {ξ^μ(0)} with 50% of wrong sites may equally be considered as the "anti"-pattern {-ξ^μ(0)} with the same amount of noise. Therefore, one may expect a singularity at r = 0.5. In figure 3a the critical number of examples tc is plotted against α.


Fig. 3. The critical time tc (a) and the exponent b (b) as a function of α. For α = 0.15 there is no generalization at all. Symbols as in the previous figure.

We can see that tc increases linearly with the number of patterns to be learned. A finite tc is found for α = 0.14, but there is no generalization at all for α = 0.15. It is well known that there is a first-order transition in the model for its retrieval properties [2], and large-scale simulations [11] have shown that it takes place at αc = 0.143(1). So, it is reasonable to expect a jump in the generalization capability of the model


around α ≈ 0.14. In figure 3b the exponent b is plotted against α. We may conclude that

$$ t_c \sim \alpha $$

with b decreasing roughly linearly with α, provided that α < αc.

Thus: (i) there is a critical number of examples tc which should be shown to the network before it generalizes; (ii) in the generalization regime, the generalization error decreases with a power law ~ t^(-b); (iii) tc grows exponentially with the noise level in the examples shown to the network; (iv) tc grows linearly with α, and the exponent b decreases linearly with it, provided α < αc.

The first point is in full accordance with the mean-field results [9] for the Hopfield model. There are some analytical calculations for single-layer perceptrons [12] which predict no critical size tc for the training set. Most probably, the results depend strongly on the architecture; for this reason one should not expect agreement with those calculations. In any case, our numerical simulations support the claim of reference [9]. One should remember that our data have been obtained in finite systems and may depend on N. In fact, tc increases with N (compare the data for N = 256 with those for N = 512). Some runs with N = 640 were performed, and the results agree with those of 512 neurons within statistical errors (~5%). A careful study of finite-size effects (and bigger systems!) would be needed for a quantitative comparison with mean-field predictions.

Our second conclusion is in disagreement with previous simulations for multilayered neural networks [13, 14], which show an exponential decay of the generalization error. However, the behaviour of the error seems to depend on the details of the architecture even for multilayered perceptrons [14]. Therefore, it is not surprising that we get a completely different behaviour with a completely different architecture.

Finally, there are no analytical predictions about points (iii) and (iv). Perhaps a noise-to-signal analysis could be a useful tool to study such questions. It would also be very interesting to analyze generalization with Gardner's technique [15] in order to get results which are not bound to a particular architecture.

Acknowledgements.

The author wishes to thank H. J. Herrmann for a careful reading of the manuscript and the referees for many suggestions.

References

[1] HOPFIELD J.J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] AMIT D.J., Modeling Brain Function (Cambridge Univ. Press, 1989).
[3] GARDNER E. and DERRIDA B., J. Phys. A 22 (1989) 1983.
[4] VALLET F., CAILTON J. and RÉFRÉGIER P., Europhys. Lett. 9 (1989) 315.
[5] OPPER M., KINZEL W., KLEINZ J. and NEHL R., J. Phys. A 23 (1990) L581.
[6] GYÖRGYI G., Phys. Rev. Lett. 64 (1990) 2957.
[7] HERTZ J.A., KROGH A. and THORBERGSSON G.I., J. Phys. A 22 (1989) 2133.
[8] LEVIN E., TISHBY N. and SOLLA S.A., Proc. IEEE 78 (1990) 1568.
[9] FONTANARI J.F., J. Phys. France 51 (1990) 2421.


[10] KOHRING G., Int. J. Mod. Phys. C 1 (1990) 259.
[11] KOHRING G., J. Stat. Phys. 59 (1990) 1077.
[12] HANSEL D. and SOMPOLINSKY H., Europhys. Lett. 11 (1990) 687.
[13] DENKER J., SCHWARTZ D., WITTNER B., SOLLA S., HOWARD R., JACKEL L. and HOPFIELD J.J., Complex Syst. 1 (1987) 877.
[14] TISHBY N., LEVIN E. and SOLLA S., Proc. Int. Joint Conf. on Neural Networks, Washington (1989).
[15] GARDNER E., J. Phys. A 21 (1988) 257.
