

HAL Id: jpa-00246638

https://hal.archives-ouvertes.fr/jpa-00246638

Submitted on 1 Jan 1992

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


On the problems of neural networks with multi-state neurons

G.A. Kohring

To cite this version:

G.A. Kohring. On the problems of neural networks with multi-state neurons. Journal de Physique I, EDP Sciences, 1992, 2 (8), pp.1549-1552. 10.1051/jp1:1992226. jpa-00246638.


Classification Physics Abstracts

87.10 89.70 64.60C

Short Communication

On the problems of neural networks with multi-state neurons

G.A. Kohring

Institut für Theoretische Physik, Universität zu Köln, Zülpicherstrasse 77, D-5000 Köln, Germany

(Received 1 June 1992, accepted 11 June 1992)

Abstract. For realistic neural network applications the storage and recognition of gray-tone patterns, i.e., patterns where each neuron in the network can take one of $Q$ different values, is more important than the storage of black and white patterns, although the latter has been more widely studied. Recently, several groups have shown the former task to be problematic with current techniques, since the useful storage capacity, $\alpha$, generally decreases like $\alpha \sim Q^{-2}$. In this paper one solution to this problem is proposed, which leads to the storage capacity decreasing like $\alpha \sim (\log_2 Q)^{-1}$. For realistic situations, where $Q = 256$, this implies an increase of nearly four orders of magnitude in the storage capacity. The price paid is that the time needed to recall a pattern increases like $\log_2 Q$. This price can be partially offset by an efficient parallel program which runs at 1.4 Gflops on a 32-processor iPSC/860 Hypercube.

Attractor neural networks are usually defined as $N$ spins, $S_i$, fully coupled together via a connection matrix $J_{ij}$ and obeying a dynamical equation of the form:

$$S_i(t+1) = f\Big(\sum_j J_{ij}\, S_j(t)\Big) \qquad (1)$$

The fixed points of this system are determined by the connection matrix, $J$, and the function $f$. These fixed points may be preset by an appropriate choice of $J$.

By observation of biological systems, Hebb [1] proposed that the $P$ fixed points, or patterns, $\{\xi^\mu\}$, could be stored (i.e. preset) in this system by choosing $J$ to be:

$$J_{ij} = \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu \qquad (2)$$

To date, the most commonly studied attractor neural networks consist of neurons with two states [2], for which the dynamical function, $f$, in the noise-free limit is usually taken as $f(x) = \mathrm{sign}(x)$. The properties of these networks, which serve as simple models of pattern recognition in biological systems, have been studied extensively using both analytical [3] and numerical [4-6] methods. It is by now well established that the simplest such network, the Hopfield model (consisting of only the above ingredients), undergoes a (first order) phase transition in the quality of its retrieval properties [3-6] at a storage capacity, $\alpha \equiv P/N$, of approximately $\alpha \approx 0.142$ [6]. Quality of retrieval is meant here to include both the size of the basins of attraction and the location of the nearest fixed point to the stored pattern. This simple model, however, does not represent the best performance that can be achieved for these networks; indeed, the maximum storage capacity for uncorrelated patterns has been shown to be $\alpha = 2$ [7], and the quality of retrieval has been shown to degrade via a second order phase transition [4]. Achieving this larger storage capacity requires the use of different connections than those given by equation (2) and has been investigated by many different researchers (for a review see [8]).
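Equations (1) and (2), together with the choice $f(x) = \mathrm{sign}(x)$, fully specify this simplest model. As a minimal illustrative sketch (not taken from the paper; all sizes and the synchronous update schedule are arbitrary choices for the demo), the following Python code stores a few random two-state patterns with the Hebb rule and recalls one of them from a corrupted version:

```python
import numpy as np

def hebb_couplings(patterns):
    """Hebb rule of equation (2): J_ij = sum_mu xi_i^mu xi_j^mu, zero diagonal."""
    J = patterns.T @ patterns
    np.fill_diagonal(J, 0)                    # no self-coupling
    return J

def relax(J, S, max_sweeps=100):
    """Iterate equation (1) with f(x) = sign(x) until a fixed point is reached."""
    for _ in range(max_sweeps):
        S_new = np.where(J @ S >= 0, 1, -1)   # synchronous update of all spins
        if np.array_equal(S_new, S):
            break                              # fixed point reached
        S = S_new
    return S

rng = np.random.default_rng(0)
N, P = 500, 30                                 # alpha = P/N = 0.06, below ~0.14
patterns = rng.choice([-1, 1], size=(P, N))
J = hebb_couplings(patterns)
noisy = patterns[0] * rng.choice([1, -1], size=N, p=[0.9, 0.1])  # flip ~10% of bits
print("overlap:", relax(J, noisy) @ patterns[0] / N)             # close to 1.0
```

Well below the transition, the corrupted pattern flows back to a fixed point near the stored one, which is exactly the retrieval behavior discussed above.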

Although networks with two-state neurons are useful for building a theoretical understanding of the principles involved in neural computation, realistic applications in biological and artificial systems require the use of multi-state neurons, i.e., the storage of gray-tone patterns. An obvious starting point for studies of multi-state neurons is a simple extension of the Hopfield model. This was done by Rieger [9], who assumed that the neurons are allowed to occupy any of $Q$ different states, that the patterns are stored using equation (2), and that the dynamical function, $f$, is defined by:

$$f(x) = a_k \quad \text{for } x \in [R_{k-1}, R_k), \quad k = 1, \ldots, Q \qquad (3)$$

where the $a_k$ are the elementary neuron states and the $R_k$ are chosen to give optimal performance. He then found that the storage capacity falls off as $\alpha \sim Q^{-2}$.
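As an illustration of equation (3), the sketch below quantizes a local field onto $Q$ levels. The equally spaced states $a_k$ and the particular thresholds $R_k$ are arbitrary choices made for this demo, not the optimally chosen values of [9]:

```python
import numpy as np

# Q-state transfer function of equation (3): the local field x falls into the
# bin [R_{k-1}, R_k) and is mapped to that bin's level a_k.
def multistate_f(x, a, R):
    k = np.searchsorted(R, x, side="right")   # bin index from interior thresholds
    return a[k]

Q = 4
a = np.linspace(0.0, 1.0, Q)                  # elementary neuron states a_k
R = np.array([-0.5, 0.0, 0.5])                # Q - 1 interior thresholds R_k
print(multistate_f(np.array([-2.0, -0.3, 0.1, 1.7]), a, R))
# -> [0.0, 0.333..., 0.666..., 1.0]
```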

This result is somewhat surprising, since in the limit of analog neurons, $Q \to \infty$, it implies $\alpha \to 0$. However, a previous study by Marcus et al. [10] on analog neurons had predicted an $\alpha$ nearly the same as the two-state $\alpha$. This seeming contradiction can be resolved by looking more closely at the work of Marcus et al. In their papers they used analog neurons to store two-state patterns, whereas Rieger stores patterns whose neurons may occupy any of the possible $Q$ states. Since the latter requirement is more sensible, it can be concluded that the storage of gray-tone patterns using a simple extension of the Hopfield model is not practical for realistic applications, where $Q$ is typically of order $O(256)$.

In an effort to overcome the limitations of the Hopfield-type models, Kohring [11] calculated the maximum possible storage capacity of networks with $Q$-state neurons and found that $\alpha \to 1$ as $Q \to \infty$. This result would have meant it was only a matter of finding the correct learning rule in order to apply multi-state neurons. However, Mertens et al. [12] pointed out that this result is of little utility, since the basins of attraction shrink to zero size at the maximum storage capacity, i.e., at the maximum storage capacity the stored patterns are unstable with respect to a single spin flip. Mertens et al. [12] then calculated the storage capacity at a fixed size of the basins of attraction and found, as did Rieger, the storage capacity to decrease like $Q^{-2}$.

While these results may be encouraging to extremists who view the world in terms of green and non-green, they are discouraging for those who prefer to distinguish between the various gray levels. Furthermore, these results could very well dispel claims about the usefulness of neural networks.

One path out of this cauldron can be found by careful consideration of the relationship between two-state models and multi-state models. In two-state models the neurons can be described by a single bit, whereas $\log_2 Q$ bits are required to describe multi-state neurons. The bits of a single neuron interact with each other through equation (3). This, then, is the source of the difficulties: the interactions between the bits of a single neuron are not always constructive; in fact, the bits interfere with each other. As a first approximation to solving this problem, one can split the bits off into separate non-interacting networks, i.e., the neurons are processed so that the first bit of each neuron goes into one black-and-white network, the second bit goes into a second network, etc. When the separate networks have each reached a fixed point, the bits are recombined to form the final state of each neuron.
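As a concrete sketch of this bit-splitting scheme, the code below encodes a pattern of $Q = 256$-state neurons into $\log_2 Q = 8$ binary bit-plane patterns and recombines them; each plane would then be stored and relaxed in its own two-state network (e.g. via the Hebb/sign sketch above), with no interaction between planes. The $\{0,1\} \to \{-1,+1\}$ encoding is an assumption of the sketch, not a detail given in the paper:

```python
import numpy as np

def split_bits(pattern_q, n_bits):
    """Q-state values (integers 0..Q-1) -> list of +/-1 bit-plane patterns."""
    return [2 * ((pattern_q >> b) & 1) - 1 for b in range(n_bits)]

def recombine_bits(planes):
    """+/-1 bit planes -> Q-state integer values."""
    bits = [(plane + 1) // 2 for plane in planes]
    return sum(b << k for k, b in enumerate(bits))

Q, n_bits, N = 256, 8, 400
rng = np.random.default_rng(1)
pattern = rng.integers(0, Q, size=N)
planes = split_bits(pattern, n_bits)   # 8 independent two-state patterns
# each plane relaxes in its own network, in parallel; then recombine:
assert np.array_equal(recombine_bits(planes), pattern)
```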

Using this non-interacting-bit approximation requires $\log_2 Q$ more couplings than the straightforward approach, but one expects to achieve the same storage capacity as the two-state neuron models. In particular, for Hebb couplings in each subnetwork, the storage capacity is expected to be:

$$\alpha_Q = \frac{\alpha_2}{\log_2 Q} \qquad (4)$$

where $\alpha_2 \approx 0.142$. Hence, for a realistic value of $Q = 256$, the present approximation should increase the storage capacity by nearly four orders of magnitude compared with $\alpha_2/Q^2$.
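A quick check of this claim, comparing equation (4) against the $\alpha_2/Q^2$ scaling for $Q = 256$:

$$\frac{\alpha_Q}{\alpha_2/Q^{2}} = \frac{\alpha_2/\log_2 Q}{\alpha_2/Q^{2}} = \frac{Q^{2}}{\log_2 Q} = \frac{256^{2}}{8} = 8192 \approx 10^{3.9}$$

i.e., nearly four orders of magnitude, while the recall-time overhead grows only like $\log_2 256 = 8$.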

These expectations are borne out by simulations. Many of the simulations were performed on an Intel iPSC/860 Hypercube with 32 processors. The algorithm takes the input pattern, splits off the $\log_2 Q$ bits from each node and sends them to $\log_2 Q$ different groups of processors, where the networks relax in parallel. After convergence, the bits are recombined and compared to the stored pattern one is trying to recall. This algorithm is ideal for parallel processing, and on a 32-processor machine it runs at 1.4 Gflops.

For the gray-tone patterns, the neurons take on values in the interval $[0,1]$; hence, the error in the recall of a stored pattern is given by the Euclidean distance between the stored pattern and the final state of the network, i.e.,

$$D^\mu = \frac{1}{N} \sum_{i=1}^{N} \left( S_i - \xi_i^\mu \right)^2 \qquad (5)$$
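A direct transcription of this error measure; note that the $1/N$ normalization is an assumption of the reconstruction above, chosen so the error is independent of network size:

```python
import numpy as np

# Retrieval error of equation (5) for gray-tone patterns with values in [0, 1]:
# mean squared distance between the final state S and the stored pattern xi.
def retrieval_error(S, xi):
    return np.mean((S - xi) ** 2)

xi = np.array([0.0, 0.25, 0.5, 1.0])    # stored gray-tone pattern
S  = np.array([0.0, 0.25, 0.75, 1.0])   # final network state
print(retrieval_error(S, xi))           # 0.015625
```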

With this definition of the error, the retrieval qualities of gray-tone patterns were measured. Figure 1 shows the distance from the stored patterns to the nearest fixed point. As can be seen, at $\alpha \approx 0.14$ the system undergoes a first order transition from a phase of high retrieval to a phase of low retrieval. Previous high precision simulations of networks composed of two-state neurons showed that this transition proceeded in two steps [6]. There was an intermediate region, $0.138 < \alpha < 0.144$, where some fraction of the patterns had a fixed point nearby and others did not. Above $\alpha \approx 0.144$ there were no states with fixed points nearby. This intermediate region is washed out in the present study because it requires that a given pattern must have nearby fixed points on all $\log_2 Q$ networks simultaneously. As $\log_2 Q$ becomes large, this is increasingly unlikely.

The second figure shows the basins of attraction as a function of $\alpha$. Here again, a first order jump in the quality of retrieval can be seen near $\alpha \approx 0.14$, in agreement with previous calculations for the two-state neuron Hopfield model.

It should be mentioned that for the simple Hopfield model the use of real-valued couplings is inefficient [6]; however, since the above algorithm is not restricted to any particular form of the coupling matrix, it can be used when the couplings are set via more complicated learning processes. The properties of such networks are currently under investigation and will be reported upon elsewhere.

In summary, the problem of storing gray-tone patterns has been discussed, and one viable solution in terms of non-interacting bits has been proposed. This solution is well suited for parallel computers, and a performance of 1.4 Gflops was achieved on the iPSC/860 Hypercube with 32 nodes. For random uncorrelated patterns, the proposed solution works quite well; however, for correlated patterns it may be advantageous to consider interacting-bit models in order to reduce the redundant storage of information.

Fig. 1. The circles indicate the average distance to the nearest fixed point from a stored pattern as a function of the storage capacity, $\alpha$. The +'s indicate the minimum initial distance to a stored pattern such that a fixed point near the stored pattern is eventually reached, i.e., they indicate the basins of attraction.

Acknowledgments.

I would like to thank D. Stauffer for helpful comments, the HLRZ at KFA Jülich for a grant of time on their Intel iPSC/860 Hypercube and Cray-YMP, as well as the University of Cologne for a grant of time on their NEC-SX3. Financial support for this work came from the SFB-341.

References

[1] Hebb D.O., The Organization of Behavior (Wiley, New York, 1949).
[2] Hopfield J.J., Proc. Nat. Acad. Sci. USA 79 (1982) 2554.
[3] Amit D., Gutfreund H. and Sompolinsky H., Ann. Phys. 173 (1987) 30; Newman C.M., Neural Networks 1 (1988) 223; Komlós J. and Paturi R., Neural Networks 1 (1988) 239.
[4] Forrest B.M., J. Phys. A 21 (1988) 245; Kritzschmar J. and Kohring G.A., J. Phys. France 51 (1990) 223.
[5] Horner H., Bormann D., Frick M., Kinzelbach H. and Schmidt A., Z. Phys. B 76 (1989) 381.
[6] Kohring G.A., J. Stat. Phys. 59 (1990) 1077.
[7] Gardner E., J. Phys. A 21 (1988) 257.
[8] Abbott L.F., Network 1 (1990) 105.
[9] Rieger H., J. Phys. A 23 (1990) L1273.
[10] Marcus C.M., Waugh F.R. and Westervelt R.M., Phys. Rev. A 41 (1990) 3355.
[11] Kohring G.A., J. Stat. Phys. 62 (1991) 563.
[12] Mertens S., Köhler H.M. and Bös S., J. Phys. A 24 (1991) 4941.
