HAL Id: jpa-00212358
https://hal.archives-ouvertes.fr/jpa-00212358
Submitted on 1 Jan 1990
Fast recognition of real objects by an optimized
hetero-associative neural network
H.J. Schmitz, G. Pöppel, F. Wünsch, U. Krey
University of Regensburg, Faculty of Physics, D-8400 Regensburg, F.R.G.
(Received 3 August 1989, accepted 25 September 1989)
Abstract. We have developed and implemented a concept which is very well suited for the quick recognition of highly correlated patterns. For a hetero-associative memory we used a minimal, optimized output code (index memory). We constructed a tree structure in which the assignment of indices has been optimized by simulated annealing, so that the algorithm for optimal stability of the learned patterns works most effectively. Special care was taken in recognizing « real » objects, e.g. scanned letters. Here the characteristic noise is very anisotropic. We slightly modified the minimal overlap strategy of Krauth and Mezard [1] by training with this specific noise, and could thereby improve the performance of our network. In order to gain insight into the network and its behaviour we used a measure called constructivity, which clearly shows the anisotropic effects. We trained a network to recognize a scanned text and to produce the associated text file. Owing to the architecture of the network many processes can be treated in parallel. Therefore we used transputers for the implementation.
Classification
Physics Abstracts 75.10H - 05.90 - 87.10
1. Introduction.
In the last few years there has been a lot of progress concerning neural networks [2], even in fields which had been believed to be relatively complete, e.g. the perceptron [3]. Worth mentioning is the enhanced performance obtained by improved learning rules for hetero- and auto-associative memories [4-7]. In particular the minimal overlap algorithm (MO) of Krauth and Mezard [1] achieves optimal stability, a measure for the performance of a network. Besides linearly separable problems there has also been a lot of progress in finding solutions for hard computational problems. Some examples are the introduction of higher order correlations [8-10] and new concepts to handle continuously valued neurons [11] and hidden units [12].

All these developments have made neural networks more attractive for applications. Nevertheless practical limitations arise very often because of the huge computational power needed for the simulation of real systems. If one wants to process, e.g., images with high resolution in an associative memory, the number of couplings, and with it the number of operations, grows very quickly.
In this paper we want to show how this problem can be treated by an index memory which has a minimal and optimized output code. Our first version uses only $N \log_2 p$ couplings ($p$ is the number of stored patterns and $N$ the number of neurons). So we need only $N \log_2 p$ multiplications and additions for the recognition. Therefore the memory requirement is low and the network is very quick; additionally the computations can be separated into parallel processes.

If a minimal storage requirement is not so crucial, one can substantially increase the optimal stability in our second version by introducing an index tree, i.e. a hierarchical arrangement. This memory has $(p - 1) N$ couplings, but nevertheless only $N \log_2 p$ operations (multiplication + addition) are necessary for the decision which pattern belongs to a given input object.

The value of the optimal stability depends on the chosen output code; therefore we optimized this code by simulated annealing with a particular cost function to achieve the « best » optimal stability.
For our index tree this means a skillful rearrangement of the indices.

We used a realistic test for the performance of our network: we scanned a printed text; different type styles were tested. After classical preprocessing (letters with 1024 pixels, $N = 1024$, and 64 patterns, $p = 64$) the text was transcribed by the hetero-associative network into a text file. Our input patterns are highly correlated, the degree of correlation depending on the type style. The patterns were disturbed by anisotropic noise arising from paper and printing quality and from the resolution of the scanner. We extended the MO algorithm by training with noise [5, 7] for this specific case. With this procedure we obtained a substantial improvement of the retrieval quality. We could now even retrieve letters which, through scanning errors, had higher correlations with other patterns than with the actual letter.

For strongly correlated and structured patterns the stability gives only a hint of the basins of attraction. We therefore introduced the constructivity, a measure which gives more detailed information about the net. A similar measure can be found for other types of nets [13, 14].

We want to emphasize that all kinds of processes were carried out in parallel with two coupled T800 transputers. This included segmentation of the pictures, simulated annealing in the index tree, and the learning process. In particular the assignment of indices to a given set of patterns and the learning process are highly parallelizable. For storing $p$ patterns one might use $p - 1$ transputers, i.e. 63 transputers in our case.

The retrieval errors depend on the type style and the quality of the scanned script and vary between one per mille and a few percent in the worst case. Our results indicate that the index memory, and especially its enhanced version, the optimized index tree memory, are in fact very well suited for real problems in practical applications.
2. Image preprocessing for letters.

Before starting the learning process and the text recognition, the patterns have to be extracted from the graphic background and must be normalized. We used a 300 dpi scanner to produce a binary black/white pixel file. There was also some picture preprocessing with classical methods: we applied picture segmentation, scaling and shift in position of the extracted objects.

SEGMENTATION OF SCANNED PICTURES. - The information contained in the picture was transformed into the so-called run-length code [15]. From this we extracted, by a labelling algorithm, objects consisting of continuous black regions. In fact not all letters consist of only one such region, e.g. « i » or the German « ä ». Using a special algorithm to detect such structures, we regarded them as one object. Furthermore we took care of separating kerned objects, where the two letters have a vertical overlap.
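The segmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `run_length_encode` and `label_regions`, the 4-connectivity rule and the union-find bookkeeping are assumptions chosen for illustration. Each image row is converted to runs of black pixels, and runs in adjacent rows that overlap in their columns are merged into one region.

```python
import numpy as np

def run_length_encode(row):
    """Return (start, end) column pairs (end exclusive) of the black runs in one binary row."""
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)    # transitions white -> black
    ends = np.flatnonzero(diff == -1)     # transitions black -> white
    return list(zip(starts.tolist(), ends.tolist()))

def label_regions(image):
    """Group runs into 4-connected black regions (assumed connectivity rule).

    Returns a list of regions, each a list of (row, start, end) runs.
    """
    parent = []                           # union-find forest over run indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    runs = []                             # all runs as (row, start, end)
    previous = []                         # indices of the runs in the previous row
    for r, row in enumerate(image):
        current = []
        for start, end in run_length_encode(row):
            idx = len(runs)
            runs.append((r, start, end))
            parent.append(idx)
            for j in previous:
                _, s2, e2 = runs[j]
                if start < e2 and s2 < end:       # column overlap: same region
                    parent[find(j)] = find(idx)
            current.append(idx)
        previous = current

    regions = {}
    for idx, run in enumerate(runs):
        regions.setdefault(find(idx), []).append(run)
    return list(regions.values())
```

A letter such as « i » yields two regions under this rule (dot and stem), which is why a separate grouping step, as described in the text, is needed afterwards.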
SHIFT AND SCALE INVARIANCES. - The extracted objects were then aligned according to their « center of mass ». We determined the ratio between height and width of all letters and assigned the neurons to an area having about the same proportion. We computed a common zoom factor for all letters in order to maximize the number of neurons which carry information. Because of the invariance against shifts the information about the vertical position is lost; we handled this problem by introducing additional neurons for this position, which is useful to distinguish « p » and « P ».

3. The heteroassociative index memory.
3.1 OPTIMAL RECOGNITION TIME WITH AN INDEX MEMORY. - Handling a completely connected neural network needs a lot of number crunching. For practical applications a resolution of $N > 1000$ is necessary, which can only be mastered by a powerful computer. We want to reduce the time for learning and recognition without losing a high retrieval quality. Therefore we choose an associative index or address memory.

We assign to each pattern an index consisting of a set of binary values $-1$ and $+1$. For $p = 2^x$ patterns we therefore need at least $x = \log_2 p$ index bits. This results in an asymmetric coupling matrix $J_{ik}$ with rows $i = 1, \ldots, \log_2 p$ and columns $k = 1, \ldots, N$ (see Fig. 1). The number of couplings is given by $N \log_2 p$.

The assignment of a certain input pattern to an index is realized by the following one-step dynamics:

$$u_i = \mathrm{sign}\Big( \sum_{k=1}^{N} J_{ik} S_k \Big),$$

with input vector $S$ ($S_k = \pm 1$), index vector $u$ ($u_i = \pm 1$) and real couplings $J_{ik}$.

Fig. 1. - The two [...]

One sees that for this one-step relaxation only $N \log_2 p$ multiplications and additions are necessary. This means an enormous reduction of computational effort in comparison to a completely connected net. The procedure is also very economical concerning the memory requirement. For realistic dimensions ($p = 64$, $N = 1024$) the coupling matrix needs only 24 KBytes.
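As a minimal sketch of the one-step retrieval just described (the helper names are hypothetical, and the storage size of 4 bytes per coupling is an assumption chosen because it reproduces the 24 KBytes quoted above), the index is obtained from a single matrix-vector product:

```python
import numpy as np

def retrieve_index(J, S):
    """One-step dynamics: u_i = sign(sum_k J_ik S_k); a zero field is mapped to +1 here."""
    return np.where(J @ S >= 0, 1, -1)

def index_to_address(u):
    """Read the +/-1 index bits as a binary address, most significant bit first."""
    address = 0
    for bit in u:
        address = 2 * address + (1 if bit > 0 else 0)
    return address

def coupling_memory_kbytes(p, N, bytes_per_coupling=4):
    """Memory of the log2(p) x N coupling matrix; 4 bytes per coupling is an assumption."""
    n_bits = p.bit_length() - 1           # log2(p) for p a power of two
    return n_bits * N * bytes_per_coupling / 1024
```

For $p = 64$ and $N = 1024$ the matrix has $6 \times 1024 = 6144$ couplings, so one recognition costs 6144 multiplications and additions.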
In table II we give an example for the assignment of indices to the eight letters « A » to « H ». Here they are simply ordered alphabetically; they may represent memory addresses where their ASCII code is stored. Whether one can find a more efficient ordering will be discussed below (see Tab. III).
3.2 LEARNING ALGORITHM AND CONSTRUCTIVITY. - We use the minimal overlap algorithm of Krauth and Mezard [1] to adapt the net to a given set of patterns. At first all couplings are cleared. In step 2 we present all patterns $\xi^\mu$ to the network and measure the stability $\Delta_i^\mu$ for each index bit $i$:

$$\Delta_i^\mu = u_i^\mu \, \frac{J_i \cdot \xi^\mu}{|J_i|},$$

with the row vector $J_i$ of the coupling matrix. In step 3 we learn for each index bit only one pattern, namely one of those which have the smallest stability (« minimal overlap »):

$$J_i \rightarrow J_i + \frac{1}{N} \, u_i^\mu \, \xi^\mu .$$

Steps 2 and 3 are repeated until $\min_\mu (\Delta_i^\mu)$ reaches a maximum (« optimal stability »). At the end the row vector $J_i$ is given by $J_i = (1/N) \sum_\mu g_i^\mu u_i^\mu \xi^\mu$. The parameters $g_i^\mu$ indicate how often bit $i$ of pattern $\xi^\mu$ has been learned.

For each index bit $i$ only the row vector $J_i$ determines which pattern has to be learned at a certain learning step. The learning procedure for a bit $i$ also involves no other index bits $j \neq i$. Therefore each row $J_i$ of the coupling matrix can be learned separately: the learning algorithm can be carried out in parallel using $\log_2 p$ processors (e.g. transputers).
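The three learning steps can be sketched for a single row $J_i$ as follows. This is an illustrative simplification, not the authors' transputer code: the function names and the fixed number of steps `n_steps` are assumptions, and the stopping criterion of Krauth and Mezard is replaced by simply running a given number of steps. Since the factor $1/|J_i|$ is common to all patterns, the pattern with the smallest overlap $u_i^\mu J_i \cdot \xi^\mu$ is also the one with the smallest stability.

```python
import numpy as np

def minover_row(patterns, targets, n_steps=2000):
    """Train one coupling row with a minimal-overlap rule (simplified sketch).

    patterns: (p, N) array with entries +/-1
    targets:  (p,) array with the +/-1 index bit of each pattern
    Returns the row J and the counts g^mu of how often each pattern was learned.
    """
    p, N = patterns.shape
    eta = targets[:, None] * patterns      # eta^mu = u^mu * xi^mu
    J = np.zeros(N)                        # step 1: clear all couplings
    counts = np.zeros(p, dtype=int)
    for _ in range(n_steps):               # assumed stopping rule: fixed step count
        overlaps = eta @ J                 # step 2: stabilities up to the factor 1/|J|
        mu = int(np.argmin(overlaps))      # pattern with minimal overlap
        J += eta[mu] / N                   # step 3: learn only this pattern
        counts[mu] += 1
    return J, counts

def stabilities(J, patterns, targets):
    """Delta^mu = u^mu (J . xi^mu) / |J| for all patterns."""
    return targets * (patterns @ J) / np.linalg.norm(J)
```

Because each row is trained from its own target bits only, calling `minover_row` once per index bit reproduces the row-wise independence that makes the parallel implementation possible.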
Text recognition is a hard task for this memory model. The patterns have a mean correlation between 0.6 and 0.7, i.e. 80 % to 85 % of the pixels are equal on average. Between some special pairs of patterns (e.g. « e » and « c ») there exists a correlation of more than 0.9, i.e. about 95 % of the pixels are equal. Furthermore we demand that the retrieval is correct even for letters which, through scanning errors, became noisy at the transitions between black and white regions or are displaced as a whole by one pixel (discretization error). For this purpose very anisotropic basins of attraction are necessary.

To characterize such anisotropic regions the stability $\Delta_i^\mu$ gives only a hint. The system may have a high stability but still be strongly affected by the specific noise. A few large coupling constants may produce a high stability. But if the characteristic noise influences just the neurons belonging to these couplings, recognition will not work. We therefore define a new measure $K_{ik}^\mu(S_k)$ called constructivity:

$$K_{ik}^\mu(S_k) = u_i^\mu \, \frac{J_{ik} S_k}{|J_i|} .$$

It describes, for a given input vector $S$, the contribution of one input neuron $k$ to the stability of index bit $i$: a positive value supports the correct assignment, a negative value works against it. The stability of a pattern's index bit, $\Delta_i^\mu$, is simply given by the sum over the constructivities:

$$\Delta_i^\mu = \sum_{k=1}^{N} K_{ik}^\mu(\xi_k^\mu) .$$

Fig. 2. - The tree structure for 8 patterns.

Table I. - Retrieval test with 1022 patterns.

It is useful to know which and how many input neurons of a pattern are allowed to be changed. Sorting the $K_{ik}^\mu$ by magnitude answers this question. E.g. the sign of the largest constructivities can be changed as long as the sum over all $K_{ik}^\mu$ stays positive (see also Sect. 5).
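A short sketch may make the use of the constructivity concrete. The function names are hypothetical, and `tolerated_flips` is only one possible reading of the sorting argument above: it counts how many of the largest constructive neurons could be flipped, in the worst case, before the sum over all constructivities stops being positive.

```python
import numpy as np

def constructivity(J_row, S, u_bit):
    """K_k = u * J_k * S_k / |J| for every input neuron k of one index bit."""
    return u_bit * J_row * S / np.linalg.norm(J_row)

def tolerated_flips(K):
    """Count how many of the largest positive constructivities can change sign
    while the sum over all K_k stays strictly positive (worst-case ordering)."""
    total = K.sum()
    flips = 0
    for value in np.sort(K[K > 0])[::-1]:      # largest constructive neurons first
        if total - 2.0 * value <= 0:           # flipping S_k turns K_k into -K_k
            break
        total -= 2.0 * value
        flips += 1
    return flips
```

Summing the output of `constructivity` for a learned pattern gives back its stability, which is the identity stated in the text.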
In figures 3 and 6 examples for the vector $K_i^\mu(S) = (K_{i1}^\mu(S_1), \ldots, K_{iN}^\mu(S_N))$ for fixed $i$, $\mu$ and $S$ are plotted. The range of the index $k = 1, \ldots, N$ is broken into lines, giving a two-dimensional picture of the same kind as the original scanned pattern, however with nine different grey values for the pixels. Dark points indicate constructive neurons, white points destructive ones. If the number of learned patterns is small, one recognizes in these grey scale pictures the patterns stored in the $J_{ik}$ (e.g. « i » and « m » in the case of figures 3 and 4, where the input pattern $S$ was « i »). By this method the decisive neurons with a large positive or negative constructivity can be found and reasons for wrong assignments can be studied. Neurons which are never or seldom affected by noise can also be detected.

Figure 3 and figure 4 are obtained with the same input vector $S$ (« i »), but with different coupling matrices resulting from training without noise (Fig. 3) and with noise (Fig. 4). Details of the training procedure will be discussed later. At present we concentrate only on the differences between these two figures: in figure 3 the constructivities are either zero or take a constant positive value (see the upper part of the figure), whereas in figure 4 the distribution of the constructivities is smoother, particularly at the edge, which implies that scanner errors are no longer so critical.

Fig. 3. - [...]

Fig. 4. - Gray scale picture (lower part) and sorted constructivities (upper part) for a matrix row within the lowest layer; learning with noise; stored patterns: « i », « m »; tested pattern: « i ».

From these and other figures one finds that for a small number of patterns many constructivities are zero. In that case there is still room for more information within the coupling matrix, and the basins of attraction can be modelled without destroying the stability of the patterns themselves. A good modelling can be realized by also learning displaced patterns and patterns with noisy edges (see Sect. 4.4). In connection with the index tree (see Sect. 4), these considerations are of great importance especially for the lower part of the tree, where the condition of a small number of patterns is fulfilled.

4. The index tree.
When we use exactly $x = \log_2 p$ index bits for $p = 2^x$ patterns we see from table II that $\sum_\mu u_i^\mu = 0$. For every index bit $i$ there exist two classes of patterns with the same number of elements. The members of the first class have a positive $u_i^\mu$, the members of the second one have a negative $u_i^\mu$. One row of the coupling matrix produces one index bit and thus makes an assignment to one class. The complete index vector of a pattern results from the common part of these $\log_2 p$ classes.

Table II. - An example for an assignment of indices to 8 patterns.

A hierarchically structured index tree (see Fig. 2) is a better way to recognize a pattern as correctly as possible. The first step is the same as above. We decide in this layer 1 whether the most significant index bit ($i = 1$) is $+1$ or $-1$. By this we find out to which of two main classes the pattern belongs. For this purpose only the 1st row $J_1$ of the coupling matrix is needed.

Depending on the result we use two branches for the next step, i.e. for the determination of the index bit 2 (layer 2). Both branches use the 2nd row $J_2$ of the coupling matrix. However, the two versions of $J_2$ are generally different; we therefore introduce a numbering $J_i^f$, with $i$ as the number of the layer and $f$ as the special version of the matrix row within one layer.

We continue this procedure until, within the last layer, only a decision between two possible patterns remains. The number of decisions ($\log_2 p$) is the same as in the unstructured index model; however, as we will see, higher stabilities will be obtained with the hierarchical tree. The reason is that more coupling constants are available for optimization, namely e.g. $J_i^1$ and $J_i^2$ instead of only $J_i$. A detailed proof will be given below.

The system considered in the following consists of $p - 1$ one-row matrices, but for the retrieval of a particular pattern only $\log_2 p$ of them are used, e.g. for the letter « A » in figure 2 only $J_1^1$, $J_2^1$ and $J_3^1$.

4.1 SAME RECOGNITION TIME, BUT HIGHER STABILITIES. - In this section
we want to show that the minimal stability $\Delta^\mu = \min_i (\Delta_i^\mu)$ of a pattern generally becomes greater if we use the tree model. Of course we assume a fixed assignment of a set of patterns to a set of index vectors. The reason for this improvement is rather obvious. Only the top layer has to decide between all $p$ patterns, with rather small basins of attraction, whereas the matrices of layer $i$ ($i > 1$) have to store a reduced number of patterns, namely $p \cdot 2^{1-i}$. As already mentioned in section 3.2, a smaller number of patterns can improve the performance of the network. Generally, the stabilities become larger when we go down the index tree.

We want to illustrate these remarks by the example of table II. In the simple index model $J_2$ has to distinguish between the sets of patterns {A, B, E, F} ($u_2 = -1$) and {C, D, G, H} ($u_2 = +1$). Using the index tree, $J_2^1$ has only to decide between {A, B} and {C, D}, patterns which are known to the $J_i^f$ for all $f$. Of course, when for each layer $i$ all matrices $J_i^f$ are equal, the tree model and the simple index net are equivalent; however, if they are different, stability may be gained. In fact, for a fixed output code, each stability of a pattern within the index tree is always greater than or equal to the corresponding stability of the simple index net, since a given matrix $J_i^f$ has to store a smaller number of patterns in the index tree, as already mentioned.

To prove this, we show that the minimal stability cannot become greater when new patterns are additionally stored in a one-row matrix $J_i^f$. For these considerations we drop the subscripts $i$ and $f$ and write only $J$ for the matrix and $u^\mu$ for the index of a pattern. Vectors $\eta^\mu$ are defined for all index bits by

$$\eta^\mu = u^\mu \, \xi^\mu .$$

These vectors are parallel to the pattern vectors $\xi^\mu$, but with reversed direction for $u^\mu = -1$.

We recall the geometrical point of view introduced by Krauth and Mezard [1]. Their learning algorithm adjusts $J$ in such a way that $J$ becomes the symmetry axis of the smallest cone enclosing all vectors $\eta^\mu$.
We can understand this fact by the following consideration. At each learning step the pattern $\xi^\mu$ with the smallest stability is learned. According to the definition of the stability we have

$$\Delta^\mu = \frac{\eta^\mu \cdot J}{|J|} = |\eta^\mu| \cos \theta^\mu ,$$

where $\theta^\mu$ denotes the angle between $\eta^\mu$ and $J$. The smallest stability is given by $\Delta = \min_\mu \{ \eta^\mu \cdot J / |J| \}$, which corresponds to the largest angle between the vectors $\eta^\mu$ and $J$. Therefore all $\eta^\mu$ lie within a cone with symmetry axis $J$. The angle of the cone is determined by the minimal stability reached at a given moment. Each learning step turns $J$ a little towards the vector $\eta^\mu$ to be learned at the moment. When everything is stable, $J$ is the symmetry axis of the cone mentioned. The assignment of a (noisy) input vector will be correct if the angle between its $\eta$ and $J$ is smaller than $\pi/2$.

From these considerations we clearly see that the smallest cone including $p$ patterns cannot get smaller if a new pattern $\eta^{p+1}$ is added. The cone either remains the same, when $\eta^{p+1}$ already lies within it, or will have a larger angle. Therefore adding a new pattern cannot increase the minimal stability. This is valid for an arbitrary but fixed arrangement of indices, which completes our proof.
The index tree increases the memory requirement. For 1024 pixels and 64 letters 252 Kbytes are necessary instead of 24 Kbytes. However, as mentioned above, the number of computing steps for recognition is not increased. Only learning becomes a bit more complex; $p - 1$ instead of $\log_2 p$ matrix rows have to be learned, but the mean number of patterns per matrix row decreases. For learning the whole tree, using $L$ learning steps according to Krauth and Mezard, we need $L \cdot N \cdot (p \log_2 p + p - 1)$ multiplications and additions instead of $L \cdot N \cdot (p + 1) \cdot \log_2 p$. For $p = 64$ that is only a factor of 1.15, and for $p \to \infty$ the two counts agree asymptotically. Furthermore numerical tests show that the number of learning steps can be reduced in lower branches of the tree.
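The retrieval through the tree and the counts quoted in this section can be checked with a small sketch. The heap-style storage of the $p - 1$ rows (node 0 at the top, children of node $n$ at $2n + 1$ and $2n + 2$), the function names and the 4 bytes per coupling are assumptions made for illustration only.

```python
import numpy as np

def tree_retrieve(rows, S):
    """Walk the index tree: one single-row decision per layer, log2(p) in total.

    rows: list of p-1 coupling rows in heap order (an assumed storage layout)
    Returns the number (0 .. p-1) of the leaf, i.e. of the recognized pattern.
    """
    p = len(rows) + 1
    node = 0
    for _ in range(p.bit_length() - 1):        # log2(p) layers for p a power of two
        bit = 1 if rows[node] @ S >= 0 else 0  # sign of the local field selects the branch
        node = 2 * node + 1 + bit
    return node - (p - 1)                      # leaves follow the p-1 inner nodes

def learning_cost_ratio(p):
    """(p log2 p + p - 1) / ((p + 1) log2 p): tree versus simple index memory."""
    x = p.bit_length() - 1
    return (p * x + p - 1) / ((p + 1) * x)

def memory_kbytes(n_rows, N, bytes_per_coupling=4):
    """Storage for n_rows coupling rows; 4 bytes per coupling is an assumption."""
    return n_rows * N * bytes_per_coupling / 1024
```

With these assumptions `memory_kbytes(63, 1024)` and `memory_kbytes(6, 1024)` give 252 and 24, and `learning_cost_ratio(64)` evaluates to about 1.146, in line with the factor of 1.15 stated above.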
Each of the $p - 1$ nodes within the tree can be computed in parallel. So in any case learning may be accelerated very much by using many processors.

4.2 OPTIMAL INDICES FOR HIGHER STABILITY. - For highly correlated patterns a suitable choice of the indices can drastically increase the stabilities and reduce errors. Similar patterns should get similar indices, for example. For the top matrix of the tree it will then be simple to assign these patterns to the same class. Very difficult decisions between highly correlated patterns have to be made in the last layers, where the matrices store only a few patterns.

Mean stabilities for three different arrangements of the same 64 patterns are shown in table III. The mean correlation of the patterns was [...]

Table III. - Retrieval of a scanned text with 1022 letters after learning with 20 000 Krauth/Mezard steps without noise; an index tree was used with three different assignments of patterns to indices.

The different arrangements result in different vectors $\eta^\mu = u^\mu \xi^\mu$. The minimal angles of Krauth/Mezard's cones are then different, and the stabilities reached are not the same. Therefore one should find an optimal arrangement where the cone formed by the $\eta^\mu$ has a minimal angle. Then the minimal stability can reach a maximum. There may exist several such arrangements. These « best » arrangements lead to a cone with the same angle and the same minimal stability. However, the mean stabilities for these arrangements can vary because the $\eta^\mu$ may have different directions within the cone.

4.3 OPTIMIZED INDICES BY SIMULATED ANNEALING. - We looked for
optimal indices by maximization of the mean stabilities. For a first trial we used couplings which were constructed according to the Hebb rule: $J_k = (1/N) \sum_\mu u^\mu \xi_k^\mu$ (the index $i$ is dropped here again). The mean stability is then given by

$$\bar{\Delta} = \frac{1}{p} \sum_\mu \frac{u^\mu \, \xi^\mu \cdot J}{|J|} = \frac{N}{p} \, |J| .$$

We performed our simulated annealing along the lines of standard techniques [16] with the [...]

Taking account of the constraint that half of the indices must be $+1$, we always exchange two index bits with opposite sign in the course of the annealing procedure. If the indices of patterns $\xi^\alpha$ and $\xi^\beta$ are exchanged, then we get [...]

The best of the above-mentioned configurations (see Tab. III) has been found by this method. Interestingly, the procedure is equivalent to the search for index configurations where the patterns are grouped into classes so that the mean correlations between these classes are very small. In this case the cost function would be: [...] If there is an exchange of the indices [...] which means [...], i.e. the above-mentioned equivalence.

The rearrangement of the indices is done first in the top layer of the index tree and then step by step in the lower layers. Because the calculations for nodes of the same layer are independent, they could be done in parallel. For every node of layer $i$ there are [...] configurations of indices possible. Note that the computing time for the rearrangement of the indices needs only a fraction of the computing time for the learning procedure.
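The annealing over index bits for a single node can be sketched as follows. The cost used here, the negative mean stability of Hebb couplings, follows from the Hebb rule quoted above, but the geometric cooling schedule, the parameter names and the function names are assumptions; the authors' actual cost function and schedule are not recoverable from this text. Each move exchanges the index bits of two patterns with opposite sign, so half of the bits stay $+1$.

```python
import math
import random
import numpy as np

def mean_hebb_stability(patterns, u):
    """Mean stability for Hebb couplings J = (1/N) sum_mu u^mu xi^mu, equal to (N/p) |J|."""
    p, N = patterns.shape
    J = (u @ patterns) / N
    return N * np.linalg.norm(J) / p

def anneal_indices(patterns, n_sweeps=200, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over one index bit (assumed schedule and parameters).

    Returns the best balanced +/-1 assignment found and its mean Hebb stability.
    """
    rng = random.Random(seed)
    p = patterns.shape[0]
    u = np.array([1] * (p // 2) + [-1] * (p - p // 2))
    cost = -mean_hebb_stability(patterns, u)
    best_u, best_cost = u.copy(), cost
    n_moves = n_sweeps * p
    for step in range(n_moves):
        T = t_start * (t_end / t_start) ** (step / max(1, n_moves - 1))
        a, b = rng.randrange(p), rng.randrange(p)
        if u[a] == u[b]:
            continue                           # only exchange bits of opposite sign
        u[a], u[b] = u[b], u[a]
        new_cost = -mean_hebb_stability(patterns, u)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / T):
            cost = new_cost                    # accept the exchange (Metropolis rule)
            if cost < best_cost:
                best_u, best_cost = u.copy(), cost
        else:
            u[a], u[b] = u[b], u[a]            # reject: undo the exchange
    return best_u, -best_cost
```

In an index tree this routine would be applied first to the top node and then, independently, to the pattern subsets of the nodes in each lower layer.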
Simulated annealing with the above-mentioned cost function (6) leads to good stabilities with the MO algorithm. From geometrical considerations of the problem we know that there must exist a cost function which takes exact account of learning with the minimal overlap algorithm (this is presently under study).

In any case we want to stress the fact that the rearrangement of the indices also leads to a better performance for the normal index memory, but it is for the index tree that this procedure is especially effective. Only for the index tree memory does it make sense to give priority to the first index neuron, where the corresponding couplings have to make the most difficult decisions, whereas in the normal index memory all rows of the coupling matrix have to store the same number of patterns.

Applying the cost function (6), the patterns within one class of the tree exhibit higher correlations than the patterns of different classes. Lower in the tree there are more difficult decisions, but this effect is compensated by the lower number of patterns which have to be stored in the corresponding matrices.

We observed that within the top node the algorithm of Krauth and Mezard only asked for a subset of all patterns. At the end all patterns were recognized by the first layer, even those which were not learned explicitly [17]. Obviously the algorithm looks for « typical » patterns to learn; other patterns with a high overlap to these [...] tree without problems and to increase the number of patterns to be learned using the same number of neurons.
4.4 TRAINING WITH NOISE. - During the training phase we used a learning procedure consisting of a series of two alternating steps.

At first, one learning step with the MO algorithm was performed after presenting the original patterns with noise at the edges, i.e. [...], where [...] is the pattern with the minimal overlap. The noisy patterns were produced by flipping a fixed percentage of the neurons at the edges.

Then, at the second step, one randomly selects for each pattern one of nine shift vectors $T^\mu = (n_1, n_2)$ with $n_i \in \{-1, 0, +1\}$, shifts the corresponding 2D image as a whole by $T^\mu$ and applies the MO procedure to this set of shifted original patterns.

The noise used during the learning procedure reflects the typical noise appearing during recognition.
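The two kinds of training noise can be sketched as follows. The definition of an edge pixel (a pixel with at least one 4-neighbour of opposite value), the reading of the "fixed percentage" as a fraction of the edge pixels, the background fill for shifted images and all function names are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def edge_mask(image):
    """True where a pixel has at least one 4-neighbour of opposite value (entries +/-1)."""
    padded = np.pad(image, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    mask = np.zeros(image.shape, dtype=bool)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        neighbour = padded[1 + dr:padded.shape[0] - 1 + dr,
                           1 + dc:padded.shape[1] - 1 + dc]
        mask |= neighbour != center
    return mask

def flip_edges(image, fraction, rng):
    """Flip an assumed fraction of the edge pixels, chosen at random."""
    noisy = image.copy()
    rows, cols = np.nonzero(edge_mask(image))
    n_flip = int(round(fraction * len(rows)))
    if n_flip == 0:
        return noisy
    chosen = rng.choice(len(rows), size=n_flip, replace=False)
    noisy[rows[chosen], cols[chosen]] *= -1
    return noisy

def shift(image, n1, n2, background=-1):
    """Shift the whole image by (n1, n2) pixels, filling vacated pixels with background."""
    H, W = image.shape
    shifted = np.full_like(image, background)
    src = image[max(0, -n1):H - max(0, n1), max(0, -n2):W - max(0, n2)]
    shifted[max(0, n1):H - max(0, -n1), max(0, n2):W - max(0, -n2)] = src
    return shifted

def random_shift(image, rng, background=-1):
    """Pick one of the nine shift vectors (n1, n2), n_i in {-1, 0, +1}, and apply it."""
    n1, n2 = rng.integers(-1, 2, size=2)
    return shift(image, int(n1), int(n2), background)
```

In a training loop one would alternate an MO step on `flip_edges` versions of the patterns with an MO step on `random_shift` versions, using a generator such as `np.random.default_rng()`.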
Both training steps are done alternately, so that the pure patterns remain favoured. This method leads to the remarkable result that the network correctly associates even those test patterns for which the classical method of comparing Hamming distances failed. For the future we want to introduce weighted noise at different layers of the network, because strong noise destroys the memory of the upper node but probably leads to a better formation of the basins of attraction at lower levels.

5. Results.

We tested our index tree with a scanned text and good quality copies with different type styles. At the beginning of the text we always printed the 64 letters to be learned. We used these training patterns for simulated annealing and the learning process itself. All in all the text contained 1 022 letters. Characters which were already known to be critical were often repeated. The copies were of good quality. For further considerations we have chosen as typical the text printed with the type style « roman ». We observed slightly better or slightly worse results with other type styles.

The following examples were learned with the same assignment of indices and with the same patterns. For the top node we used $L$ learning steps according to Krauth and Mezard; for lower layers $i$ only $L \cdot (2/3)^{i-1}$ steps were applied. We set the nodes of the lowest layer, which have to store only two patterns, according to Hebb's learning rule.

We wanted the system itself to judge whether the assignment of an input pattern $S$ to an index is certain or uncertain. For this purpose we use a quality measure $h$ defined by [...]

For the special case that the input vector $S$ is a learned pattern $\xi^\mu$, $h$ is the minimum of the stabilities $\Delta_i^\mu$ of the index bits. Assignments with $h(S) < 1$ were marked as uncertain.

5.1 RETRIEVAL QUALITY OF THE INDEX TREE.
Learning without noise. - After 5 000 steps the stabilities practically did not change any more and learning was stopped. 98.6 % of the text was recognized correctly (see Tab. I); only highly correlated pairs of letters were mixed up, e.g. u/n, 1/I, !/l, i/l. Due to the ordering used, these errors occurred in the nodes of layers 3 to 6. We found out that either the pattern itself [...]

Fig. 5. - Gray scale picture (lower part) and sorted constructivities (upper part) for a matrix row of layer 3; learning without noise; stored patterns: {T Y ( y ! l I f}; tested pattern: « 1 ».

A problem arises from the choice of the training patterns. If they are, by chance, bad, then recognition cannot work well because these objects form the only basis for learning. Tests with the mean value of 5 realisations of one pattern did not lead to satisfactory results. We suppose that we have to use many more scanned versions of one letter to form an ideal training pattern.

Nevertheless, even by learning without noise we obtained a retrieval quality which is as good as or even better than that of the classical method of comparing correlations (1.4 % errors compared to 1.7 % for the classical method; see figure 8 for an example of an error of the classical method and correct recognition with the index tree). However, our recognition is much quicker. The ratio of computations needed for the recognition scales like $p / \log_2 p$. For 64 patterns our method is one order of magnitude faster.

Learning with noise. - The two kinds of noise (shifting and noisy edges, see Sect. 4.4) were applied during the learning process in order to improve the retrieval quality further. For this purpose more learning steps are necessary, e.g. 30 000 instead of 5 000. But we see from table I [...] decisions got smaller, too.

Fig. 6. - Gray scale picture (lower part) and sorted constructivities (upper part) for a matrix row of layer 3; learning with noise; stored patterns: {T Y ( y ! l I f}; tested pattern: « 1 ».

Moreover, after learning with noise the system « knows » when it has made a wrong assignment. In practice, all errors are marked as uncertain decisions with $h \le 1$.

A very hard test for the system is the recognition of inputs which have a higher correlation with other trained patterns than with the correct one. With our method more than half of such inputs were assigned correctly, but are marked as uncertain decisions (see example in Fig. 7).
The special parameters for the learning with noise depend strongly on the patterns, i.e. on the type style of the letters. The type style « roman » on which we report here consists of letters with many thin lines. Flipping more than 3 % of the pixels on edges will destroy such letters. For this style the significant improvement of retrieval quality was reached by shifting, not by noisy edges (see Tab. I). For other styles we observed the opposite behavior.

A very suitable method to optimize the noise parameters was the evaluation of the constructivities plotted as grey scale pictures (see next section).

Table I also contains the mean value of the « quality » parameter
$h$ over all scanned letters. [...] decreases. But we observed that the standard deviation of the distribution was reduced, too. By this the number of uncertain decisions ($h < 1$) becomes smaller.

Fig. 7. - This is an example of recognition of a scanned text of Donald O. Hebb. Panels: original text; recognition by tree learned with noise and optimized indices; recognition by tree learned without noise and with alphabetical indices. The original text reads: « A NEUROPHYSIOLOGICAL POSTULATE. Let us assume then that the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its stability. The assumption can be precisely stated as follows: When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. » (D. O. Hebb, The Organization of Behavior). We used several training matrices, and uncertain decisions are marked in steps: letters are underlined if $0.5 < h \le 1$ and have a frame for $h < 0.5$. In case of errors made by the index tree, the correct letter is printed as a lower index. Note that at first all 64 letters are printed, which should be recognized by the system. The [...]

Fig. 8. - Learning an italic type, we had a very bad training pattern for the letter « e ». From left to right the pictures show: 1. the test pattern in the text, to be recognized (an « e », as a human being will see at once); 2. the corresponding training pattern; and 3. the training pattern « c ». A recognition by a simple comparison of Hamming distances to the training patterns fails: the test pattern is most strongly correlated with the training pattern « c ». But our index tree recognizes this pattern correctly as an « e », marking its decision as uncertain.

5.2 CONSTRUCTIVITIES. - In figures
3-6 some typical results for the constructivity are presented. For figure 3 and figure 5 we applied simple learning, for figure 4 and figure 6 learning with noise.

Figure 3 and figure 4, which have already been introduced in section 3, display the constructivity for one node of the lowest layer, namely the one which contains the letters « i » and « m ». We looked at the constructivity of « i ». Many couplings belong to the background and contain no information. Without noise only the pixels that differ between the two letters are marked by some constant positive value of the constructivity. Applying noise « smoothes » the constructivities. The number of large constructivities, corresponding to the number of decisive pixels, decreases. This is the case especially at the edges of the letters. We observed, however, that a few constructivities which lay outside the regions of noise became larger, e.g. inside the lines of « m ».

Figure 5 and figure 6 present one node of layer 3 which stores 8 letters. To this node belongs a matrix row which distinguishes between {T Y ( y} and {! l I f}. Here we looked at the constructivity of « 1 ». In the plot without noise there are some discrete values of the constructivity; again they become continuous when noise is applied.

6. Conclusion.

We
developed a perceptron-like network especially optimized for practical applications. It offers a high retrieval rate for « real » objects, works very quickly and does not consume much computer memory. We showed that the assignment of the output vectors to a given set of patterns strongly influences the effectiveness of a neural net. For our hierarchical index tree structure this arrangement is crucial, so we optimized it by simulated annealing with a cost function related to the optimal stability. We modified a minimal overlap learning algorithm according to Krauth and Mezard. By this we obtained an enhanced retrieval of objects disturbed by a specific noise appearing in practical applications.

We are certain that there are many more possibilities to improve the performance of our index tree further, which will be done [...]

References
[1] KRAUTH W., MEZARD M., J. Phys. A 20 (1987) L745.
[2] For a recent review see VAN HEMMEN J. L., DOMANY E. and SCHULTEN K., Physics of neural networks, to be