Fast recognition of real objects by an optimized hetero-associative neural network

(1)

HAL Id: jpa-00212358

https://hal.archives-ouvertes.fr/jpa-00212358

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Fast recognition of real objects by an optimized

hetero-associative neural network

H.J. Schmitz, G. Pöppel, F. Wünsch, U. Krey

To cite this version:

(2)

Fast

_recognition

of real

_objects

_by

an

optimized

hetero-associative neural network

H. J.

_Schmitz,

G.

_Pöppel,

F. Wünsch and U.

_Krey

University

of

_{Regensburg, Faculty}

of

_Physics,

D-8400

_Regensburg,

F.R.G.

(Reçu

le 3 août 1989,

accepté

le 25

_septembre

₁₉₈₉₎

Résumé. 2014 Un

concept très bien

_adapté

à la reconnaissance

_rapide

de structures fortement corrélées est

_développé

et réalisé. Nous utilisons comme mémoire hétéro-associative un code de

sortie

_optimisé

de

_façon

minimale. Nous construisons une arborescence dont

_{l’indiçage}

est

déterminé par recuit simulé. De cette

_{façon, l’algorithme}

de stabilisation des structures

mémorisées fonctionne de

_{façon optimale.}

La reconnaissance

_d’objets

« réels _»,tels des lettres,

est étudiée

_{soigneusement.}

Dans ce cas, les bruits

caractéristiques

sont fortement

_anisotropes.

Une

_légère

modification de la

stratégie

de recouvrement minimal de Krauth et Mézard, par entraînement à ce bruit

_spécifique,

_permetd’améliorer les

_performances

de notre réseau. Afin d’étudier le réseau et son _{comportement,}nous utilisons une mesure

_baptisée

« _{constructivité}»

_qui

met clairement en évidence les effets

d’anisotropie.

Un réseau est entraîné à reconnaître un texte

et à

_produire

le fichier

_{correspondant.}

Grâce à son _{architecture,}_{de nombreux processus}_peuvent

être traités en

_parallèle

et des _{« transputers}» sont _{uilisés pour}sa réalisation. Abstract. 2014

We have

_developed

and realized a _concept_{which is very well suited for}a

_quick

recognition

of

_highly

correlated _patterns.For a _{hetero-associative memory}we used a minimal

optimized

output code

_(index

_memory).

We constructed a tree structure in which the

assignment

of indices has been

_optimized

_by

simulated

_annealing.

Thus the

_algorithm

for

_optimal

_stability

of the learned _patternsworks most

_{effectively. Special}

care was taken of

_recognizing

« real »

_objects,

e.g. scanned letters. Here the characteristic noise is very

anisotropic.

We have

slightly

modified the minimal

_overlap

_strategyof Krauth and Mezard

_[1]

_{by training}

with this

_specific

noise, and could

_improve

the

performance

of our network. In order to _get

_insight

into the network and its behaviour we used a measure called

_{constructivity}

which shows

_clearly

the

_anisotropic

effects. We trained a network to

_recognize

a scanned text and to

_produce

the associated text file. Due to the architecture of the network many processes can be treated in

_parallel.

Therefore we used

transputers for the

_{implementation.}

Classification

Physics

Abstracts

75.10H - 05.90 - 87.10

1. Introduction.

In _{the last few years there}was a _{lot of progress}

_conceming

neural networks

_[2]

even in fields which have been believed to be

_{relatively completed,}

_{e.g. the}

_perceptron

_[3].

To be

mentioned is the enhanced

_{performance by improved learning}

rules for hetero- and

auto-associative memories

_[4-7].

_Especially

the minimal

_{overlap algorithm}

_[MO]

of Krauth and

Mezard

_[1]

achieves

_{optimal stability,}

a measure for the

_performance

of a network. Besides linear

_{separable problems}

there was also a lot _{of progress in}

_finding

solutions for hard

(3)

computational problems.

Some

_examples

are the introduction of

_higher

order correlations

[8-10]

and new

_concepts

to handle

_continuously

valued neurons

[11]

and hidden units

[12].

All

these

developments

led to a

_greater

_attractivity

for

_{applications.}

Nevertheless

_practical

_{limitations arise very often due}to the

_{huge computational}

_power needed for the simulation of real

_systems.

If one wants to _{process e.g.}

_images

with

_high

resolution in an associative memory, the number of

_couplings

and

_by

this the number of

operations

grows very

quickly.

In _{this paper}we want to show how this

_problem

can be treated

by

an index memory which has a minimal and

_optimized

_output

code. Our first version

_only

uses N.

_log2

_p

_couplings

(p

is the number of stored

_patterns

and N the number of

_neurons).

So we need

_only

N.

_log2

_p

_{multiplications}

and additions for the

_recognition.

Therefore both the memory

requirement

is low and the network is very

quick ;

additionally

the

_computations

can be

_separated

into

_parallel

_processes.

If minimal

storage

requirement

is not so

_crucial,

one can

_essentially

increase the

_optimal

stability

in our second version

_{by introducing}

an index tree, i.e. hierarchical

_arrangement.

This memory has

(p - 1 )N .

couplings

but nevertheless there are

_only

N.

_log2

_p

_operations

(mult

+

add)

_{necessary for the decision which}

_pattern

_belongs

to a

_{given input object.}

The value of the

_optimal

_{stability depends}

on the chosen

_{output code ;}

therefore we

_optimized

this

code

_by

simulated

_annealing

with a

_particular

cost function to achieve « best »

_optimal

stability.

For our index tree this means a skillfull

_{rearrangement}

of the indices. We used a

realistic test for the

_performance

of our network : we scanned a

_printed

_{text ; different}

_type

styles

were tested. After classical

_{preprocessing}

_(letters

with 1024

_pixels,

N = 1 024 and 64

patterns, p

=

64)

the text _was

transposed by

the hetero-associative network into a text file. Our

_input

patterns

are

_highly

_correlated,

the

percentage

of correlation

_depending

on the

type

style.

The

patterns

were disturbed

_{by anisotropic}

noise

_arising

from paper and

printing

quality

and from the resolution of the scanner. We extended the MO

_{algorithm by training}

with noise

_{[5, 7]}

for this

_specific

case. With this

_procedure

we

_got

an essential

_improvement

of

the retrieval

_quality.

Now we could even retrieve

_letters,

which

_got

_{by scanning}

errors

_higher

correlations to other

_patterns

than to the actual letter. For

_strongly

correlated and structured

patterns

the

_{stability gives only}

a hint for the basins of attraction. We therefore introduced the

constructivity,

a measure which

_gives

more detailed information about the net. A similar

measure can be found for other

_types

of nets

_{[13, 14].}

We want to

_emphasize

_{that all kinds of processes}were carried out in

_parallel

with two

coupled T800-transputers.

This included

_segmentation

of the

_pictures,

simulated

_annealing

in the index tree, and the

learning

process.

Especially

the

assignment

of indices to a

_given

set of

patterns

and the

_learning

process are

_{highly parallelisible.}

For

_{storing p}

_patterns

one

_might

use p - 1

transputers,

i.e. 63

_transputers

in our case.

The retrieval errors

_depend

on the

_type

_style

and the

_quality

of the scanned

script

and vary

between one

_promille

and some

_percent

in the worst case. Our results indicate that the index

memory, and

especially

its enhanced

version,

the

optimized

index tree memory, are in fact

very well suited for real

problems

in

practical applications.

2.

_Image

_{preprocessing}

for letters.

Before

_starting

the

_learning

_processand the text

_recognition

the

_patterns

have to be extracted

from the

_graphic

_background

and must be normalized. We used a 300

_dpi

scanner to

_produce

a

_binary

black/white

_pixel

file. There was also some

_{picture preprocessing}

with classical

methods : we

_applied

_{picture segmentation, scaling}

and shift in

_position

of the extracted

(4)

SEGMENTATION OF SCANNED PICTURES. - The information contained in the

picture

was

transformed into the so called

_{Run-Length-Code}

_[15].

From this we extracted

_by

a

_labelling

algorithm objects consisting

of continuous black

_regions.

In fact not all letters consist of

_only

one such

_region,

_{e.g. « i »}or the german « â ».

Using

a

_{special algorithm}

to detect such

structures we

_regarded

them as one

_object.

Furthermore we took care of

_separating

kerned

objects

e.g.

1è,

where the two letters have a vertical

overlap.

SHIFT AND SCALE INVARIANCES. - The extracted

objects

were now

_{adjusted according}

to

their « center of mass ». We looked for the relation between

height

and width of all letters and

assigned

the neurons to an area

_having

about the same

_proportion.

We

_computed

a common zoom factor for all letters in order to maximize the number of neurons _{which carry}

information. Because of the invariance

_against

shifts the information about the vertical

position

is

_{lost ;}

we handled this

_{problem by introducing}

additional neurons for this

_position,

which is useful to

_distinguish

« _{p » and}« P ».

3. The heteroassociative index memory.

3.1 OPTIMAL RECOGNITION TIME WITH AN INDEX MEMORY. -

_Handling

a

_completely

connected neural network needs a lot of number

crunching.

For

_practical

applications

a

resolution of N > 1 000 _{is necessary which}can

_only

be mastered

by

a

powerful

_computer.

We

want to reduce the time for

_learning

and

_recognition

without

_loosing

a

high

retrieval

quality.

Therefore we choose an associative index or address memory.

We

_assign

to each

_pattern

an index

_consisting

of a set of

_binary

values - 1 and

+ 1.

_{For p}

= 2x

patterns

we need therefore at least x =

log2 p

index bits. This results in _an

asymmetric coupling

matrix

_Jik

with rows i

= 1,

...,

log2 P

and columns k

= 1,

..., N

(see

Fig. 1).

The number of

_couplings

is

_{given by}

N .

_log2

_p.

The

_assignment

of a certain

_input

_pattern

to an index is realized

_by

the

_following

one

_step

dynamics :

with

_input

vector S

_(Sk

= _±

1 ),

index _vector_u

(ui

_{= ±}

1 )

and real

couplings Jik.

Fig.

1. - The two

(5)

One sees that for this

_one-step

relaxation

only log2 p .

N

multiplications

and additions are

necessary. This means an enormous reduction of

_{computational}

effort in

_comparison

to a

completely

connected net. The

_procedure

is also very economic

concerning

memory

requirement.

For realistic dimensions

_(p

=

64,

N

= 1024)

the

coupling

matrix needs

only

24

KBytes.

In table II we

_give

an

_example

for the

_assignment

of indices to the

_eight

letters « A » till « H ». Here

they

are

_{simply alphabetically}

_ordered,

_they

_may

_represent

_memory

_addresses,

where their ASCII-Code is stored. Whether one can find a more efficient

_ordering,

will be

discussed below

_(see

Tab.

_III).

3.2 LEARNING ALGORITHM AND CONSTRUCTIVITY. - We

use the minimal

_{overlap algorithm}

of Krauth and Mezard

_[1]

to

_adapt

the net to a

_given

set of

_patterns.

At first all

_couplings

are

cleared. In

_step

2 we

_present

all

_patterns

gl’

to the network and measure the

_stability

dw

for each index bit i :

with the row vector

_Ji

of the

_coupling

matrix. In

_step

3 we learn for each index bit

_only

one

pattern,

namely

one of those wich have the smallest

_stability

_(«

minimal

_overlap

_{») :}

Steps

2 and 3 are

_repeated

until

_min,,

(à;’ )

reaches a maximum

_(«

_{optimal stability}

_»).

At 3the end the row vector

_Ji

is

_{given by Ji =}

_{(1 /N ) £ gfi}

_{ur #L.}

The

_parameters

_gr

_indicate,

03BC how often bit i of

_{pattern g}

has been leamed.

For each index bit i

only

the row vector

_Ji

determines which

_pattern

has to be learned at a

certain

_learning

_step.

The

_{learning procedure}

for a bit i also involves no other index bits

j :0 i.

Therefore each row

_Ji

of the

_coupling

matrix can be learned

_{separately :}

the

learning

algorithm

can be carried out in

_{parallel using log2 p}

_processors

_{(e.g. transputers).}

Text

_recognition

is a hard task for this memory model. The

_patterns

have a mean

correlation between 0.6 and

_0.7,

i.e. 80 % till 85 % of the

_pixels

are

_equal

on an _average.

Between some

_{special pairs}

of

_patterns

(e.g.

« e » and « c

»)

there exists a correlation of more

than

0.9,

about 95 % of the

_pixels

are

_equal.

Furthermore we demand that the retrieval is

correct even for letters

which,

_{by scanning}

errors,

got

noisy

at the transitions between black and white

_regions

or are

_displaced

as a whole

_by

one

_pixel

(discretization error).

For this aim

very

anisotropic

basins of attraction are _necessary.

To characterize such

_{anisotropic regions}

the

_stability

_à;’

_{gives only}

a hint. The

_system

_may

have a

_{high stability}

but is affected

_{strongly by}

the

_specific

noise. A few

_large

_coupling

constants _may

_produce

a

_{high stability.}

But if the characteristic noise will influence

_just

the

neurons

_belonging

to these

_{couplings, recognition}

will not work. We therefore define a new

measure

_K!k(Sk)

called

constructivity :

It describes for a

_{given input}

vector S the contribution of one

_input

neuron k to the

(6)

Fig.

2. - The tree structure for 8

patterns.

Table I. - Retrieval test with 1022

patterns.

value works

_against

it. The

_stability

of a

_pattern’s

index

bit,

à;’,

is

_simply

_{given by}

the sum over constructivities :

It is useful to know _{which and how many}

_input

neurons of a

_pattern

are allowed to be

(7)

Kuik

by magnitude

answers this

_{question. E.g.}

the

sign

of the

largest

constructivities can be

changed

as

_long

as the sum over all

_Ktk

_stays

_positive

(see

also Sect.

5).

In

_figures

3 and 6

_examples

for the vector

_{Kfi (S)}

=

(Ktî(Sl)

...

KilN(SN»

for fixed

i, J.L

and S are

_plotted.

The range of the index k

= 1,

..., N is broken into lines

giving

a

two-dimensional

_picture

of the same kind as the

original

scanned

pattern,

however with nine different grey values for the

pixels.

Dark

_points

indicate constructive neurons, white

points

destructive ones. If the number of learned

_patterns

is small one

_recognizes

_{in these grey scale}

pictures

the

_patterns

stored in the

_Jik

_(e.g.

« i » and « m » in the case of

_figures

3 and 4 where the

_input

_pattern

S was « i

»).

_By

this method the decisive neurons with a

_{large positive}

or

negative constructivity

can be found and reasons for wrong

_assignment

can be studied. Neurons which are never or seldom affected

_by

noise can also be detected.

Figure

3 and

_figure

4 are obtained with the same

_input

vector S

_(«

i

_»),

but with different

coupling

matrices

_resulting

from

_training

without noise

_{(Fig. 3)}

and with noise

_{(Fig. 4).}

Details on the

_{training procedure}

will be discussed later. At

_present

we

_only

concentrate on

the differences of these two

_{figures : namely}

in

_figure

_3,

the constructivities are either zero or

Fig.

3. -

(8)

Fig.

4. -

Gray

scale

_picture

_{(lower part)}

and sorted constructivities

_{(upper part)}

for a matrix row within the lowest

layer ; learning

with

noise ;

stored _{patterns :}« i », « m » ; tested pattern : « i ».

take a constant

_positive

value

_(see

_{the upper}

_part

of the

_figure)

whereas in

_figure

4 the

distribution of the constructivities is more

_smooth,

_particularly

at the

_edge,

which

_implies

that

scanner errors, _areno

_longer

so- critical.

From these and other

_figures

one finds that for small number

_of

_patterns_many

constructivities are zero. In that case there is still

_place

for more information within the

coupling

matrix,

and the basins of attraction can be modelled without

_destroying

the

_stability

of the

patterns

themselves. A

_{good modelling}

can be realized

_{by learning}

also

_displaced

patterns

and

_patterns

with

_{noisy edges}

_(see

in Sect.

_4.4).

In connection with the index tree

(see

Sect.

_4),

these considerations are of

_great

_{importance especially}

for the lower

_part

of the tree, where the condition of a small number of

patterns

is fulfilled.

4. The index tree.

When we use

_{exactly x}

=

log2P

index bits

for p

= 2x

_patterns

_{we see}from table II that

L ur

= 0. For every index bit i there exist _twoclasses of

_patterns

with the _samenumber of

(9)

Table II. -

An

_{example for}

an

_assignment

o f

indices to 8 patterns.

elements. The members of the first class have a

_positive

_M/B

the members of the second one

have a

_negative

_ut.

One now of the

_coupling

matrix

_produces

one index bit and thus makes an

assignment

to one class. The

_complete

index vector of a

_pattern

results from the common

_part

of these

_{log2 p}

classes.

A

_{hierarchically}

structured index tree

_(see

_Fig.

₂₎

is a better way to

_recognize

a

_pattern

as

correctly

as

_possible.

The first

_step

is the same as above. We decide in this

_layer

1,

whether the most

_significant

index bit

_{(i = 1 )}

is + 1 or - 1.

_By

this we find out to which of two main classes the

_pattern

_belongs.

For this aim

_only

the lst row

_JI

of the

_coupling

matrix is needed.

Depending

on the result we use two branches for the next

_step,

i.e. for the determination of the index bit 2

_{(layer 2).}

Both branches use the 2nd row

_J2

of the

_coupling

matrix.

However,

the two versions of

_J2

are

_generally

_{different ;}

we therefore introduce a

_numbering

,J,!

with i as the number of the

_layer

and f

as the

_special

version of the matrix row within one

layer.

We continue this

_procedure

until - within the last

_layer

- there rests

only

a decision between two

_possible

_patterns.

The number of decision

_{(log2 p )}

is the same as in the unstructured index

_model,

however as we will see, with the hierarchical tree

_higher

stabilities will be obtained. The reason is that

more

_coupling

constants are available for

_{optimization, namely}

_e.g.

J1i

and

J?

instead of

_only

Ji.

A detailed

_proof

will be

_given

below.

The

_system

considered in the

_following

consists of

_{p - 1}

_{one-row-matrices,}

but for the retrieval of a

_special

_pattern

_{only log2 p}

of them are

_used,

_{e.g. for the letter}« A » in

figure

2

only

J11, J2

and

_Ji

3-4.1 SAME RECOGNITION TIME, BUT HIGHER STABILITIES. - In this section

we want to show that the minimal

_{stability A"}

=

mini

_{(à;’ )}

of a

_pattern

_generally

becomes

_greater

if we use

the tree model. Of course we assume a fixed

_assignment

of a set of

_patterns

to a set of index

vectors. The reason for this

_improvement

is rather obvious.

_Only

the

_top

_layer

has to decide between

_{all p}

_patterns

with rather small basins of attraction. Whereas the matrices of

_layer

i

_(i

>

1 )

have to store a reduced number of

_patterns,

_{namely p . 21 - i.}

As

_already

mentioned

in section

3.2,

a smaller number of

_patterns

can

_improve

the

_performance

of the network.

Generally,

the stabilities become

_larger

when we _godown the index tree.

We want to illustrate these remarks

_by

the

_example

of table II. In the

_simple

index model

J2

has to

_distinguish

between the set of

_{patterns {A,}

B, E,

F} (u2 = - 1 ) and {C,

D,

G,

_H

(u2 = + 1).

Using

the index tree

_J2

has

_only

to decide

_{between {A, B }}

_{and {C, D} ,}

(10)

patterns

which are known to the

Jli

for all

f.

Of course, when for each

layer i

all matrices

Jf

are

_equal,

the tree model and the

_simple

index net are

_equivalent,

however,

if

they

are

different

_stability

_{may be}

_gained.

In

fact,

for a fixed

_output

code,

each

_stability

of a

_pattern

within the

_index

tree is

_always

_greater

than or

_equal

to the

_{corresponding stability}

of the

simple

index net, since a

_given

matrix

Jf

has to store a smaller number of

_patterns

in the index

tree, as

_already

mentioned.

To prove

this,

we show that the minimal

_stability

cannot become

_greater

when new

_patterns

are

additionally

stored into one 1-row-matrix

J).

For these considerations we

_drop

the

subscripts i

and

f

and

_only

write for the matrix J and for the index of a

_pattern

u IL. Vectors

lq"

are defined for all index bits

by

These vectors are

_parallel

to the

_patterns

vectors

_gl,

but with reversed direction for

ug = - i.

We remind to the

_{geometrical point}

of view introduced

_by

Krauth and Mezard

_[1].

Their

learning algorithm adjusts J

in such a _{way that}J becomes the

_symmetry

axis of the smallest

cone

_enclosing

all vectors

_lq,".

We can understand this fact

by

the

following

consideration. At

each

_leaming

_step

the

_{pattern g,"}

with the smallest

_stability

is learned.

_According

to the definition of the

_stability

we have

The smallest

_stability

is

_{given by 2l}

=

min IL

{TIlL.

JI 1 JI}

which

_corresponds

to the

_largest

angle

between the vectors

_lq,"

and J. Therefore all

TIlL

lie within a cone with

_symmetry

axis J.

The

_angle

of the cone is determined

_by

the minimal

_stability

reached at a

_given

moment. Each

learning

step

turns J a little bit towards the vector

_lq,"

to be learned at the moment. When

everything

is stable

J

is the

_symmetry

axis of the cone mentioned. The

_assignment

of a

(noisy)

input

vector 1

will be correct if the

_angle

between its

_fi

and J is smaller than

7T 12.

From these considerations we

_clearly

see that the smallest cone

_{including p}

_patterns

cannot

get

smaller if a new

_pattern

nP+1

is added. The cone either remains the same, when

if ’

1 lies already

within

_it,

or will have a

_{larger angle.}

Therefore

_adding

a new

_pattern

cannot

increase minimal

_stability.

This is valid for an

_arbitrary

but fixed

arrangement

of

indices,

which

_completes

our

_proof.

The index tree increases the

_requirement

_{of memory. For 1024}

_pixels

and 64 letters 252

_Kbytes

are _{necessary instead of 24}

_Kbytes.

However as mentioned

_above,

the number of

computing

steps

for

_recognition

is not increased.

_{Only learning}

becomes a bit more

_{complex ;}

p - 1

instead of

_{log2 p}

matrix rows have to be

_learned,

but the mean number of

_patterns

_per

matrix row decreases. For

_learning

the whole tree,

_using

L

_learning

_steps

_according

to Krauth

and Mezard we need L. N.

_{(p. log2 P + P - 1)}

_{multiplications}

and additions instead of L. N.

_(p

+

_{1 ).}

_{log2 p.}

_{For p}= 64 that is

only

a factor of

1.15,

and _{for p -->}00 _agrees

asymptotically.

Furthermore numerical tests show that the number of

_learning

steps

can be

reduced in lower branches of the tree.

Each of the

_{p - 1}

nodes within the tree can be

computed

in

_parallel.

_{So in any}case

_learning

may be accelerated very much

by using

many processors.

4.2 OPTIMAL INDICES FOR HIGHER STABILITY. - For

_highly

correlated

_patterns

the

(11)

indices stabilities can be

_drastically

increased and errors reduced. Similar

patterns

should

_get

similar

indices,

for

_example.

For the

_top

matrix of the tree it then will be

_simple

to

_assign

these

_patterns

to the same class.

_Very

difficult decisions between

_high

correlated

_patterns

have to be made in the last

_layers,

where the matrices

_only

store a few

_patterns.

Mean stabilities for three different

_arrangements

of the same 64

_patterns

are shown in

table III. The mean correlation of the

_pattern

was

Table III. - Retrieval

of

a scanned text with 1022 letters and

_{after learning}

with 20 000

Krauth/Mezard

_steps

without

noise ;

an index tree was used with three

different assignments o f

patterns

to indices.

By

the different

_arrangements

there result different vectors

_"lIL

=

u," - gl.

The minimal

angles

of Krauth/Mezard’s cones are the

_different,

and the stabilities reached are not the

same. Therefore one should find an

_optimal

_arrangement

where the cone formed

_by

the

"lIL

has a minimal

_angle.

Then the minimal

_stability

can reach a _{maximum. There may exist} more such

_{arrangements.}

These « best »

_arrangements

cause a cone with the same

_angle

and

the same minimal

_stability.

However the mean stabilities for these

_arrangements

can _vary

because the

"lIL

may have different directions within the cone.

4.3 OPTIMIZED INDICES BY SIMULATED ANNEALING. - We looked for

_optimal

indices

_by

maximisation of the mean stabilities. For a first trial we used

_couplings

which were

constructed

_according

to the Hebb rule :

_Jk

=

(liN) L

UIL.

_{el’ (the}

index i is here

_dropped

IL

again).

The mean

_stability

is then

_{given by}

We

_performed

our simulated

_{annealing along}

the lines of standard technics

_[16]

with the

(12)

Taking

account of to the constraint that half of the indices must be +

_1,

we

_exchange

_always

two index bits with

_{opposite sign}

in course of the

_{annealing procedure.}

If the indices of

pattem 9"

and 90

would be

_exchanged,

then we

_get

The best of the above mentioned

_{configurations}

_(see

Tab.

_III)

has been found

by

this method.

_{Interestingly}

the

_procedure

is

_equivalent

to the search of index

_{configurations}

where the

_patterns

are

_grouped

into classes so that the mean correlations between these classes are

very small. In this case the cost function would be :

If there is an

_exchange

of the indices

which means mean = i.e. the above mentioned

equivalence.

The

_{rearrangement}

of the indices is done first in the

_top

_layer

of the index tree and then

step

by

step

in the lower

_layers.

Because the calculations for nodes of the same

_layer

are

independent they

could be done in

_parallel.

_{For every node of the}

_layer

i there are

configurations

of indices

_possible.

Note that the

_computing

time for the

rearrangement

of the indices needs

_only

a fraction of the

_computing

time for the

leaming

procedure.

Simulated

_annealing

with the above mentioned cost function

₍₆₎

leads to

_good

stabilities with the

_{MO-algorithm.}

From

_geometrical

considerations of the

_problem

we know that there must exist a cost function which takes

_exactly

_respect

of the

_learning

with the minimal

_{overlap algorithm}

_(this

is

_presently

under

_study).

_{In any}case we want to stress the fact that the

_{rearrangement}

of the indices leads also to a better

_performance

for the normal index memory, but

especially

for the index tree this

_procedure

shows its

_great

_effectivity.

Only

for the index tree _{memory it makes}sense to set

_priority

to the first index neuron where

corresponding

couplings

have to do the most difficult

_decisions,

whereas in the normal index memory all rows of the

_coupling

matrix have to store the same number of

_patterns.

Applying

the cost function

_(6),

the

_patterns

within one class of the tree exhibit

_higher

correlations than the

_patterns

of different classes. Lower in the tree there are more difficult

decisions,

but this effect is

_{compensated by}

the lower number of

_patterns

which have to be stored in the

_{corresponding}

matrices. We observed that within the

_top

node the

_algorithm

of Krauth and Mezard

_only

asked for a subset of all

_patterns.

At the end all

patterns

were

recognized by

the first

layer,

even those which were not learned

_explicitly

_[17].

_Obviously

the

algorithm

looks for «

typical

»

_patterns

to

_{learn ;}

other

_patterns

with a

_{high overlap}

to these

(13)

tree without

problems

and to increase the number of

_patterns

to be learned

_using

the same

number of neurons.

4.4 TRAINING WITH NOISE. -

During

the

_{training phase}

we used a

_learning

_procedure

consisting

of a series of two

_alternating

_{steps :}

At

first,

one

_learning

_step

with an MO

_algorithm

was

performed

after

presenting

the

original

patterns

with noise at the

_edges,

i.e. and

_Âlk

= 1 ug -

is the N

pattern

with the minimal

_overlap.

The

_noisy

_patterns

were

_{produced by flipping}

a fixed

percentage

of the neurons at the

_edges.

Then,

at the second

_step,

for each

pattern

one

_randomly

selects one of nine shift vectors

TIL =

_{(nl, n2 )}

_{with ni}

_0,

+ for each

pattern,

shifts the

_{corresponding}

_2D-image

as a

whole

_by

T and

_applies

the MO

_procedure

for this set of shifted

_original

patterns.

This noise used

_{during learning procedure}

reflects the

_typical

noise,

appearing during

recognition.

Both

_{training-steps}

are done

_{alternatingly,}

so _{that the pure}

_patterns

remained favorised. This method leads to the remarkable result that even such test

_patterns

are

_correctly

associated

_by

the

_network,

for which the classical method of

_{comparing hamming}

distances failed. For the future we want to introduce

_weighted

noise at different

_layers

of the

network,

because

_strong

noise

_destroys

_{the memory of the upper node but leads}

_probably

to a better formation of the basins of attraction at lower levels.

5. Results.

We tested our index tree with a scanned text and

_{good quality copies}

with different

_type

styles.

In front of the text we

_{always printed}

the 64 letters to be learned. We used these

training

patterns

for simulated

_annealing

and the

_learning

_{process itself. All in all the}text

contained 1 022 letters.

_Characters,

which were

_already

known to be critical were often

repeated.

The

_copies

were of

_{good quality.}

For further considerations we have chosen as

typical

the text

_printed

with the

type

style

« roman ». We observed little better or little worse

results with other

_type

_styles.

The

_{following examples}

were learned with the same

_assignment

of indices and with the same

_patterns.

For the

_top

node we used L

_learning

_steps

_according

to Krauth and

Mezard,

for lower

_{layers i only}

L .

_{(2/3 Y -1}

_steps

were

_applied.

We

_occupied

the nodes of the lowest

layer,

which have to store

_only

two

_patterns,

_according

to Hebb’s

_learning

rule.

We wanted that the

_system

itself

_judges

if the

_assignment

of an

_input

_pattern

S to an index is certain or uncertain. For this aim we use a

_quality

measure h defined

_by

For the

_special

case that the

_input

vector S is a learned

_pattern

tl,

is the minimum of the stabilities

_i!j

of the index bits.

_Assignments

with

_{h (S )}

1 were marked as uncertain.

5.1 RETRIEVAL QUALITY OF THE INDEX TREE.

Learning

without noise. - After 5 000

steps

the stabilities

_practically

did not

_change

_anymore

and

_learning

was

_stopped.

98.6 % of the text was

_{recognized correctly}

_(see

Tab.

_I)

_only

_highly

correlated

_pairs

of letters were _{mixed up, e.g.}

_{u/n, 1/I, !/l, ill.}

Due to the

_ordering

used these errors occurred in the nodes of

layer

3 till 6. We found out, that either the

_pattern

itself

(14)

Fig.

5. -

Gray

scale

_picture

_{(lower part)}

_{(upper part)}

for a matrix row of

layer

3 ;

learning

without noise ; stored _{patterns :}

_{(TY(y! l lf ) ;}

tested _{pattern :}« 1 ».

A

_problem

arises from the choice of the

_training

_patterns.

If

_they

are,

_by

chance,

bad then

recognition

cannot work well because these

_objects

form the

_only

basis for

_learning.

Tests with the mean value of 5 realisations of one

_pattern

did not lead to

_satisfactory

results. We suppose that we have to use _manymore scanned versions of one letter to form an ideal

training

pattern.

Nevertheless even

_{by learning}

without noise we

_got

a retrieval

_quality

which is as

_good

or even better as the classical method of

_comparing

correlations

(1.4 %

errors

_comparing

to

1.7 % of the classical

method,

see

_figure

8 for an

_example,

for an error of the classical method

and correct

_recognition

with the index

_tree).

However,

our

recognition

is much

_quicker.

The

ratio of

_computations

needed for the

_recognition

scales like

_{p/log2 p.}

For 64

_patterns

our

method is one order of

_magnitude

faster.

Learning

with noise. - The two kinds of noise

(shifting

and

_{noisy edges,}

see Sect.

_4.4)

were

applied during

the

_learning

_{process in order}to still

_improve

the retrieval

_quality.

For this aim there are more

_learning

_steps

necessary, e.g. 3(YOOO instead of 5 000. But we see from table 1

(15)

Fig.

6. -

Gray

scale

_picture

_{(lower part)}

_{(upper part)}

for a matrix row of

layer

3 ;

learning

with noise ; stored _{patterns :}

_{{TY(y! IIf} ;}

tested pattern : « 1 ».

decisions

got

smaller,

too.

_Moreover,

after

_learning

with noise the

_system

« knows » when it

has made a _wrong

_{assignment. Practically :}

all errors are marked to be uncertain decisions with h : 1.

A very

hard test for the

_system

is the

_recognition

of

_inputs

which have a

_higher

correlation to other trained

_patterns

than to the correct one. With our method more than the

half of such

_inputs

were

_{assigned correctly,}

but are marked as uncertain decisions

_(see

example

in

_Fig.

_7).

The

_special

_parameters

for the

_learning

with noise

_{depend strongly}

on the

_patterns,

i.e. on

the

type

style

of the letters. The

_type

_style

« roman » on which we

_report

here consists of letters with many thin lines.

_Flipping

more than 3 % of the

_pixels

on

_edges

will

_distroy

such letters. For this

_style

the

_{significant improvement}

of retrieval

_quality

was reached

_{by shifting,}

not

_by

_{noisy edges}

_{(see Tab. 1).}

For other

_styles

we observed the

_opposite

behavior.

A very

suitable method to

_optimize

the noise

_parameters

was the evaluation of the constructivities

plotted

as _{grey scale}

_pictures

(see

Table 1 also contains the mean value of the «

quality

»

_parameter

h over all scanned letters.

(16)

Original

text

Let us assume then that the

_persistence

or

_repetition

of a

_{reverberatory}

activity (or "trace")

tends to induce

_lasting

cellular

_changes

that add to its

stability.

The

_assumption

can be

_precisely

stated as follown: When an axon of

cell A is near

_enough

to excite a cell B and

_repeatedly

or

_persistently

takes

part in

_{f iring}

it, some

_growth

_processor metabolic

_change

takes

_place

in one or both cells such that A’s

_efficiency,

as one of the cells

_firing

_B,is increased.

D. 0. Hebb, The

_Organization

of Behavior

Recognition

by

tree learned with noise and

optimized

indizes:

A NEUROPHYSIOLOGICAL POSTULATE

Let us assume then that the

_persistence

or

_repetition

of a

_{reverberatory}

activity

_{(or !, !, trace,)}

tends to induce

_lasting

cellular

_changes

that add to its

_stability.

The

assumption

can be

precisely

stated as follows: When an axon of cell A is near

_enough

to excite a cell B and

_repeatedly

or

persistently

takes

_part

in

_firing

_it,

some

_growth

_processor metabolic

_change

takes

_place

in one or

both cells such that

_A,s

_efficiency,

as one of the cells

_firing

_B,

is increased.

D. 0.

Hebb,

The

_Organization

of Behavior

Récognition

by

tree learned without

_noise,and

with

_alphabetical

indices:

Fig.

7. - This is

an

_example

of

_recognition

of a scanned text of Donald 0. Hebb. We used several

training matrices,

and uncertain decisions are marked in _{steps :}letters are underlined if 0.5 h = 1 and have a frame for h 0.5. In case of errors made

_by

the index tree, the correct letter is

_printed

as a lower index. Note that at first all 64 letters are

_printed,

which should be

_{recognized by}

the _system.The

(17)

Fig.

8. -

Learning

an italic type, we had a _{very bad}

_training

_patternfor the letter « e ». From left to

_right

the

_pictures

show : 1. The test _patternin _{the text,}to be

_recognized

_{(an «}

e » as a humain

_being

will see at

once),

2. The

_{corresponding training}

_pattern,and 3. The

_training

pattern « c ». A

_{recognition by}

a

simple comparison

of

_hamming

distances to the

_training

_patternsfails - the

testpattem is

_mostly

correlated to the

_training

_pattern« c ». But our index tree

_recognizes

this _pattern

_correctly

as an « e »,

marking

its decision as uncertain.

decreases. But we observed that the standard deviation of the distribution was

_reduced,

too.

By

this the number of uncertain decisions

_(h

_{1 )}

becomes smaller. 5.2 CONSTRUCTIVITIES. - In

figures

3-6 some

_typical

results for the

_{constructivity}

are

presented.

For

_figure

3 and

_figure

5 we

_{applied simple learning,}

for

_figure

4 and

_figure

6

learning

with noise.

_Figure

3 and

_figure

_4,

which have

_already

been introduced in section 3

display

the

_{constructivity}

for one node of the lowest

_{layer, namely}

the one which contains the

letters « i » and « m ». We looked for the

constructivity

of « i ».

_{Many couplings belong}

to the

background

and contain no information. Without noise

only

the different

_pixels

between the

two letters are marked

_by

some constant

_positive

value of the

_{constructivity. Applying}

noise

« smoothes » the constructivities. The number of

big

constructivities

corresponding

to the number of decisive

_pixels

decreases. This is the case

_especially

at the

_edges

of the letters. We observed however that some few constructivities which

_lay

outside the

_regions

of noise

became

_bigger,

_{e.g. inside the lines of}« m ».

Figure

5 and

_figure

6

_present

one node of

_layer

3 which stores 8 letters. To this node

belongs

a matrix row which

_{distinguishes}

between {TY

(y}

and

_{{ !lIf } .}

Here we

_regarded

the

constructivity

of « 1 ». In the

_plot

without noise there are some discrete values of the

constructivity ; again they

become continuous when noise is

_applied.

6. Conclusion.

We

_developed

a

_perceptron

like network

_{especially optimized}

for

_{practical applications.}

It

offers a

_high

retrieval rate

_concerning

« real »

_objects,

_{works very}

_quickly

and does not

consume much

_computer

_{memory. We}showed that the

_assignment

of the

_output

vectors to a

given

set of

_patterns

_strongly

influences the

_effectivity

of a neural net. For out hierarchical

index tree structure this

_arrangement

is

_crucial,

so we

_optimized

it

_by

simulated

_annealing

with a cost function related to the

_{optimal stability.}

We modified a minimal

_{overlap learning}

algorithm according

to Krauth and Mezard.

_By

this we

_got

an enhanced retrieval of

_objects

disturbed

_by

a

_specific

noise

_appearing

in

_{practical applications.}

We are certain that there are

many more

_{possibilities}

to still

_improve

the

_performance

of our index tree, which will be done

(18)

References

[1]

KRAUTH W., MEZARD M., J.

_{Phys. A}

20

₍₁₉₈₇₎

L745.

[2]

For a recent review see VAN HEMMEN J. L., DOMANY E. and SCHULTEN K.,

Physics of

neural

networks, to be

_{published by Springer Verlag}

and references therein.

[3]

MINSKY M., PAPERT S.,

Perceptrons

(MIT-press, Cambridge Ma.)

1969.

[4]

DIEDERICH S., OPPER M.,

Phys.

Rev. Lett. 58

₍₁₉₈₇₎

949.

[5]

PÖPPEL G., KREY U.,

Europhys.

Lett. 4

₍₁₉₈₇₎

979.

[6]

GARDNER E., J.

_{Phys. A}

21

₍₁₉₈₈₎

257.

[7]

GARDNER E., STROUD N., WALLACE D. J., Neural

_Computers,

Eds. R. Eckmiller and Chr. V. D.

Malsburg

NATO ASI Series F

_{(Berlin, Springer)}

41

₍₁₉₈₈₎

251.

[8]

GARDNER E., J.

_{Phys. A}

19

₍₁₉₈₆₎

3453.

[9]

PERSONNAZ L., GUYON I., DREYFUS G.,

Europhys.

Lett. 4

₍₁₉₈₇₎

863.

[10]

KREY U., PÖPPEL G., Proc. of the conference Measures of

_Complexity

held from 29

_Sept.

till 2nd Oct. 1987 in Rome, Eds. L. Peliti and A.

_Vulpiani,

Lect. Notes

_Phys.

_{(Berlin, Springer)}

314

(1988)

35.

[11]

FUCHS A., HAKEN H., Biol.

_Cybern.

60

₍₁₉₈₈₎

17, 107 ; Erratum 60

₍₁₉₈₈₎

476.

[12]

MEZARD M., NADAL J. P., J.

_{Phys. A}

22

₍₁₉₈₉₎

2191.

[13]

BUHMANN J., SCHULTEN K.,

Biological Cybernetics

54

₍₁₉₈₆₎

319.

[14]

LINDEN A., KINDERMANN _J.,DANIP

_workshop

held from 24 to 25

April

1989, GMD SchloB

Birlinghoven,

D-5205 Sankt

_Augustin,

to be

_published.

[15]

YOUNG I. T., PEVERINI R. _C.,VERBEEK P. _W.,VAN OTTERLOO P. J., A new

_{Implementation}

for the

_Binary

and Minkowski

_{Operators, Computer Graphics}

and

_{Image Processing}

17

₍₁₉₈₁₎

189.

[16]

KIRKPATRICK S., GELATT C. D., Jr, VECCHI M. P.,

Optimization by

Simulated

_Annealing,

Science 220

₍₁₉₈₃₎

671.

Fast recognition of real objects by an optimized hetero-associative neural network

HAL Id: jpa-00212358

https://hal.archives-ouvertes.fr/jpa-00212358

Submitted on 1 Jan 1990

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Fast recognition of real objects by an optimized

hetero-associative neural network

H.J. Schmitz, G. Pöppel, F. Wünsch, U. Krey

To cite this version:

Fast

recognition

of real

objects

by

optimized

hetero-associative neural network

Schmitz,

Pöppel,

Krey

University

Regensburg, Faculty

Physics,

Regensburg,

(Reçu

accepté

septembre

1989)

adapté

rapide

développé

optimisé

façon

l’indiçage

façon, l’algorithme

façon optimale.

d’objets

soigneusement.

caractéristiques

anisotropes.

légère

stratégie

spécifique,

performances

baptisée

qui

d’anisotropie.

produire

correspondant.

parallèle

developed

quick

recognition

highly

optimized

(index

memory).

assignment

optimized

by

annealing.

algorithm

optimal

stability

effectively. Special

recognizing

objects,

anisotropic.

slightly

overlap

_recognition

_objects

_by

_Schmitz,

_Pöppel,

_Krey

_{Regensburg, Faculty}

_Physics,

_Regensburg,

_septembre

₁₉₈₉₎

_adapté

_rapide

_développé

_optimisé

_façon

_{l’indiçage}

_{façon, l’algorithme}

_{façon optimale.}

_d’objets

_{soigneusement.}

_anisotropes.

_légère

_spécifique,

_performances

_baptisée

_qui

_produire

_{correspondant.}

_parallèle

_developed

_quick

_highly

_(index

_memory).

_optimized

_by

_annealing.

_algorithm

_optimal

_stability

_{effectively. Special}

_recognizing

_objects,

_overlap

_[1]

_{by training}

_specific

_improve

_insight

_{constructivity}

_clearly

_anisotropic

_recognize

_produce

_parallel.

_{implementation.}

_conceming

_[2]

_{relatively completed,}

_perceptron

_[3].

_{performance by improved learning}

_[4-7].

_Especially

_{overlap algorithm}

_[MO]

_[1]

_{optimal stability,}

_performance

_{separable problems}

_finding

_examples

_higher

_concepts

_continuously

_greater

_attractivity

_{applications.}

_practical