HAL Id: jpa-00246902
https://hal.archives-ouvertes.fr/jpa-00246902
Submitted on 1 Jan 1994
Neural networks : iterative unlearning algorithm converging to the projector rule matrix
A. Plakhov, S. Semenov
To cite this version:
A. Plakhov, S. Semenov. Neural networks: iterative unlearning algorithm converging to the projector rule matrix. Journal de Physique I, EDP Sciences, 1994, 4 (2), pp. 253-260. 10.1051/jp1:1994105. jpa-00246902
Classification Physics Abstracts
87.10 06.50 75.10
Neural networks: iterative unlearning algorithm converging to the projector rule matrix

A. Yu. Plakhov and S. A. Semenov

Institute of Physics and Technology, Prechistenka Str. 13/7, Moscow 119034, Russia

(Received 31 March 1993, received in final form 4 October 1993, accepted 12 October 1993)
Abstract. An iterative unlearning algorithm for connectivity self-correction is proposed. No presentation of patterns during the iteration process is required. Starting from the Hebbian connectivity, the convergence of the (rescaled) iterated connection matrix to the projector rule one is proven, for an arbitrary set of $p < N$ binary patterns.

1. Introduction.
Over a period of years, spin-glass-type neural network models functioning as associative memories have attracted considerable attention of physicists. Having considered the simple local Hebb prescription for the connection matrix, a deep understanding of the underlying mechanisms of neural network operation was achieved [1-3]. However, even for the case of unbiased random patterns, this learning rule provides a rather modest storage capacity (for large $N$, $p_{\max} \approx 0.14\,N$ [3], where $N$ is the system size) and, in addition, allows a noticeable fraction of errors in the retrieval. Moreover, it completely fails when a significant amount of correlation between the patterns occurs.
Efficient local iterative algorithms, which are capable of storing correlated as well as uncorrelated patterns, were developed [4-10]. They imply a local updating of the couplings whilst the patterns (or their noisy versions [7, 8]) are presented to the network. Locality is considered to be very desirable in the hardware context.
There have been many successful attempts at obtaining rigorous results concerning the convergence properties of these algorithms. In particular, convergence theorems for perceptron-type algorithms which enable one to stabilize up to $2N$ random uncorrelated patterns were proven [5, 6]. Simultaneously, the algorithm of Diederich and Opper was established to converge to the projector (pseudoinverse) rule matrix [11, 12], for sets of both linearly independent [4] and linearly dependent [13] patterns. The learning dynamics of a similar algorithm was solved by Opper [14] in the thermodynamical limit, for a set of extensively many random patterns. More recently, Blatt and Vergini [10] have proposed an algorithm operating with arbitrary correlated patterns which ensures fast convergence to the projector rule matrix.

254 JOURNAL DE PHYSIQUE I N° 2
All these learning procedures, in fact, make use of repeated presentation of patterns as a necessary ingredient. We regard an alternative situation: after a single presentation of $p$ patterns $\xi_i^\nu = \pm 1$, $i = 1, \dots, N$, $\nu = 1, \dots, p$, and a one-shot local prescription of the connection matrix, further access to the information content becomes impossible, and subsequent correction of the couplings, if needed, takes place without any use of the $\xi_i^\nu$'s.
It seems reasonable, as a zero approximation, to embed the information via the Hebb rule
$$J^{H}_{ij} = \frac{1}{N} \sum_{\nu=1}^{p} \xi_i^\nu \xi_j^\nu, \qquad i, j = 1, \dots, N, \eqno(1)$$
serving as a starting point for the further correction process. Only a few procedures are known providing the correction of the Hebbian connectivity without the pattern presentation. Among them, we first mention the so-called « unlearning » proposed by Hopfield et al. (1983) [15]. Extensive numerical studies [16, 17] reveal a quadrupled increase of the critical storage capacity ($p_c \approx 0.68\,N$ [17]) and a marked elimination of spurious metastable states inherent to the Hebb prescription (1). Furthermore, the unlearning can handle a set of patterns having different activities, contrary to the standard Hopfield model.
Unfortunately, it suffers from grave shortcomings: first, the existence of an optimal time of applying the procedure, after which the recognition properties of the network become worse; second, the resulting connection matrix usually does not provide a perfect storage of patterns; and third, the fully empirical level of its study.
Dotsenko et al. [18, 19] have recently proposed a thermally induced iterative redefinition of couplings, starting from the Hebb matrix, so as to improve the storage of a set of non-correlated random patterns. In their model, the iterated symmetric connection matrix possesses an intermediate form between the Hebb matrix and the projector rule one.

In the present paper, we propose a stochastic iterative algorithm of unlearning type for the correction of the initial Hebbian couplings without access to the information to be memorized. No conditions are imposed on the set of patterns. It is shown that, if the unlearning strength is chosen below a certain critical value, the iterated connection matrix, appropriately rescaled, converges with probability one. The resulting matrix is given by the projector rule for any maximal linearly independent subset of the given set of $p < N$ patterns, and memorization of the whole set of patterns is thus ensured [13].
The plan of the paper is the following. In the next section we describe the algorithm. The proof of its convergence is given in the third section. The paper ends with concluding remarks.

2. Algorithm.
The iterative algorithm is formulated as follows. At each iteration step, the state vector $S = (S_1, \dots, S_N)$ is chosen at random, with the components independently taking the values $\pm 1$ with equal probability 1/2. Afterwards, the local fields
$$h_i = \sum_{j=1}^{N} J_{ij} S_j$$
are calculated, and then the couplings are redefined by
$$J_{ij} \to J_{ij} - \frac{\varepsilon}{N}\, h_i h_j, \eqno(2)$$
where the positive parameter $\varepsilon$ represents the unlearning strength. The coupling updating is thus nothing else than the unlearning of the vector of local fields produced by the random configuration $S$. Self-interactions, $J_{ii}$, are involved in the iteration process. The algorithm starts from the matrix of Hebbian couplings $J^{H}_{ij}$, and the updating procedure (2) is repeated again and again, the random configurations being chosen independently at each step. The algorithm is local in the sense that the change of $J_{ij}$ only depends on the local fields on neurons $i$ and $j$ [20, 10].
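The procedure is easy to simulate. The following sketch is our own NumPy illustration, not code from the paper; the system size, pattern number, random seed, iteration count and $\varepsilon$ are arbitrary choices. It starts from the Hebb matrix (1), applies the update (2) with independent random configurations, and rescales the result by $\varepsilon m / N$ as in the convergence statement of the next section:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 16, 3                                 # system size and pattern number (illustrative)
xi = rng.choice([-1.0, 1.0], size=(N, p))    # binary patterns xi_i^nu = +-1

J = xi @ xi.T / N                            # Hebbian couplings, Eq. (1)
lam_max = np.linalg.eigvalsh(J).max()
eps = 0.5 / lam_max                          # unlearning strength below 1/lambda_max

m = 40000                                    # total number of iteration steps
for _ in range(m):
    S = rng.choice([-1.0, 1.0], size=N)      # random state vector
    h = J @ S                                # local fields
    J -= (eps / N) * np.outer(h, h)          # unlearning update, Eq. (2)

# rescaled matrix vs. the projector onto the pattern subspace
J_scaled = eps * m / N * J
P = xi @ np.linalg.inv(xi.T @ xi) @ xi.T     # projector rule matrix
rel_err = np.linalg.norm(J_scaled - P) / np.linalg.norm(P)
print(rel_err)                               # small for large m
```

With parameters of this order, the relative deviation from the projector matrix should be small (a few percent), and it decays only slowly with $m$, in line with the remark on convergence speed in the concluding section.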
3. Convergence of the algorithm.

Despite the stochastic nature of the iterative algorithm (2), it exhibits a remarkable convergence property, as we will show in this section. It turns out that, as long as $\varepsilon$ is chosen below some critical value $\varepsilon_c$, the connection matrix $J$, renormalized by a factor inversely proportional to the total number of iteration steps, converges to the projector rule matrix. To be more precise, for any preassigned pattern set, the convergence takes place with probability one. It should be emphasized here that, in our approach, the patterns are non-random $N$-vectors, $N$ being considered as a constant integer. The stochasticity is due only to the random choice of the state vector $S$ at each iteration step.
To start with, we choose a maximal subset of linearly independent patterns and relabel them as $\xi^1, \dots, \xi^s$, $s \le p$. The remaining vectors $\xi^\sigma$, $s+1 \le \sigma \le p$, can then be written as their linear combinations, $\xi^\sigma = \sum_{\mu=1}^{s} b_{\mu\sigma} \xi^\mu$. The Hebb matrix (1) is then given by
$$J^{H}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^\mu B^{(0)}_{\mu\nu} \xi_j^\nu \eqno(3)$$
with
$$B^{(0)}_{\mu\nu} = \delta_{\mu\nu} + \sum_{\sigma=s+1}^{p} b_{\mu\sigma} b_{\nu\sigma}. \eqno(4)$$
Remarkably, in the course of the iteration process the connection matrix preserves the form
$$J^{(m)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^\mu B^{(m)}_{\mu\nu} \xi_j^\nu \eqno(5)$$
with some symmetric $s \times s$ matrix $B^{(m)}$ (here and below we use an upper index $m$ in brackets in order to denote quantities related to the iteration step $m = 1, 2, \dots$). Indeed, before applying the algorithm one has (3), (4). Let us assume the validity of the form (5) for some iteration step $m-1$ and check it for step $m$. The local fields at the $m$-th step are
$$h_i^{(m)} = \sum_{j=1}^{N} J^{(m-1)}_{ij} S_j^{(m)} = \sum_{\mu,\nu=1}^{s} \xi_i^\mu B^{(m-1)}_{\mu\nu} g_\nu^{(m)}, \eqno(6)$$
with
$$g_\nu^{(m)} = \frac{1}{N} \sum_{j=1}^{N} \xi_j^\nu S_j^{(m)}$$
denoting the overlap of the random configuration $S^{(m)}$ with the pattern $\xi^\nu$, and, consequently, with (2), we obtain the expression for $J^{(m)}_{ij}$:
$$J^{(m)}_{ij} = J^{(m-1)}_{ij} - \frac{\varepsilon}{N}\, h_i^{(m)} h_j^{(m)} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^\mu \left( B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)} \right)_{\mu\nu} \xi_j^\nu,$$
where $G^{(m)}$ denotes the $s \times s$ matrix with elements $G^{(m)}_{\mu\nu} = g_\mu^{(m)} g_\nu^{(m)}$. Thus, $J^{(m)}_{ij}$ is of the form (5) with the symmetric matrix $B^{(m)} = B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}$.

Subsequent analysis can be greatly simplified in terms of $Q^{(m)} = (B^{(m)})^{-1}$. In this way it is necessary first to examine under what conditions the inverses of $B^{(m)}$ exist.
Using the definition (4) of $B^{(0)}$, it is easy to check that
$$\sum_{\mu,\nu=1}^{s} x_\mu B^{(0)}_{\mu\nu} x_\nu \ge \sum_{\mu=1}^{s} x_\mu^2 > 0$$
for any nonzero $s$-vector $x$, i.e. the matrix $B^{(0)}$ is positive definite and hence invertible. Next we will show that, if the matrix $B^{(m-1)}$ is invertible, $B^{(m)}$ is also invertible and its inverse $Q^{(m)}$ is given by
$$Q^{(m)} = Q^{(m-1)} + \varepsilon \Delta_m^{-1} G^{(m)}, \eqno(7)$$
provided the quantity
$$\Delta_m = 1 - \varepsilon \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} G^{(m)}_{\mu\nu} \eqno(8)$$
is not equal to zero. Indeed, by multiplying the R.H.S. of (7) by $B^{(m)}$ and taking into account that $B^{(m-1)} Q^{(m-1)} = I$, we obtain
$$\left( B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)} \right) \left( Q^{(m-1)} + \varepsilon \Delta_m^{-1} G^{(m)} \right) = I + \varepsilon B^{(m-1)} \left[ \Delta_m^{-1} G^{(m)} - G^{(m)} - \varepsilon \Delta_m^{-1} G^{(m)} B^{(m-1)} G^{(m)} \right], \eqno(9)$$
where $I$ is the unity $s \times s$ matrix. By substituting the expression (8) for $\Delta_m$ into (9) and using the relation
$$G^{(m)} B^{(m-1)} G^{(m)} = \left( \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} G^{(m)}_{\mu\nu} \right) G^{(m)},$$
which can be verified directly, we find that the expression in square brackets in (9) equals zero, and thus the R.H.S. of (7) is the inverse of $B^{(m)}$. So we have obtained by induction that the inverses of $B^{(m)}$ exist and are given by the recursion (7), provided at each iteration step $\Delta_m \ne 0$. The latter is fulfilled under the constraint $\varepsilon < \varepsilon_c = \lambda_{\max}^{-1}$ on the magnitude of the unlearning strength, where $\lambda_{\max}$ denotes the maximal eigenvalue of $J^{H}$ (for proof see the Appendix). Then one can write
$$B^{(m)}_{\mu\nu} = \left( Q^{(m)} \right)^{-1}_{\mu\nu}, \qquad m = 0, 1, 2, \dots \eqno(10)$$
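Since $G^{(m)} = g^{(m)} (g^{(m)})^{\mathrm T}$ has rank one, the recursion (7)-(8) is a Sherman-Morrison type inverse update, and it can be verified numerically on arbitrary data. A minimal NumPy sketch (our illustration; the matrix $B$ and vector $g$ below are random test data, not quantities from the iteration):

```python
import numpy as np

rng = np.random.default_rng(1)
s, eps = 4, 0.3                        # illustrative matrix order and strength

# a random symmetric positive definite B (stand-in for B^(m-1))
A = rng.standard_normal((s, s))
B = A @ A.T + np.eye(s)
Q = np.linalg.inv(B)

g = rng.standard_normal(s)
G = np.outer(g, g)                     # G_{mu nu} = g_mu g_nu, rank one

B_new = B - eps * B @ G @ B            # update of B derived below Eq. (5)
delta = 1.0 - eps * np.sum(B * G)      # Delta, Eq. (8)
Q_new = Q + eps / delta * G            # candidate inverse, Eq. (7)

print(np.allclose(Q_new @ B_new, np.eye(s)))   # True: (7) inverts the update
```

The identity is exact whenever $\Delta_m \ne 0$; no smallness of $\varepsilon$ is needed at this point.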
From (7), by induction, one gets
$$Q^{(m)}_{\mu\nu} = Q^{(0)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \Delta_k^{-1} G^{(k)}_{\mu\nu}, \eqno(11)$$
where $Q^{(0)}$ is the inverse of $B^{(0)}$ defined by (4).
Consider now the asymptotic behaviour of $Q^{(m)}_{\mu\nu}$ at large $m$. One can notice that, for any $\mu, \nu$, the $G^{(k)}_{\mu\nu}$, $k = 1, 2, \dots$, represent a sequence of independent identically distributed random variables (i.i.d.r.v.). Obviously $|G^{(k)}_{\mu\nu}| \le 1$. The averaging of $G^{(k)}_{\mu\nu}$ is readily performed to give
$$\left\langle G^{(k)}_{\mu\nu} \right\rangle = \frac{1}{N^2} \sum_{i,j=1}^{N} \xi_i^\mu \xi_j^\nu \left\langle S_i^{(k)} S_j^{(k)} \right\rangle = \frac{1}{N}\, C_{\mu\nu},$$
where
$$C_{\mu\nu} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \xi_i^\nu$$
is the overlap between the patterns $\mu$ and $\nu$.
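This average follows from $\langle S_i^{(k)} S_j^{(k)} \rangle = \delta_{ij}$ and can be checked by a quick Monte Carlo experiment; the NumPy sketch below uses arbitrary sizes and seed of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
N, s, M = 16, 3, 200_000                       # M = number of sampled configurations

xi = rng.choice([-1.0, 1.0], size=(N, s))      # patterns (linearly independent a.s.)
C = xi.T @ xi / N                              # overlap matrix C_{mu nu}

S = rng.choice([-1.0, 1.0], size=(M, N))       # M independent random state vectors
g = S @ xi / N                                 # overlaps g_mu, one row per sample
G_mean = g.T @ g / M                           # empirical average of G = g g^T

print(np.abs(G_mean - C / N).max())            # close to 0
```

The empirical mean of $G = g g^{\mathrm T}$ over many random configurations approaches $C/N$ at the usual $M^{-1/2}$ Monte Carlo rate.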
Then, by decomposing the R.H.S. of (11) into
$$Q^{(0)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \left( \Delta_k^{-1} - 1 \right) G^{(k)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \left( G^{(k)}_{\mu\nu} - \frac{1}{N} C_{\mu\nu} \right) + \varepsilon m \frac{1}{N} C_{\mu\nu},$$
one can rewrite (11) in the form
$$Q^{(m)}_{\mu\nu} = Q^{(0)}_{\mu\nu} + R^{(m)}_{\mu\nu} + W^{(m)}_{\mu\nu} + \varepsilon m \frac{1}{N} C_{\mu\nu} \eqno(12)$$
with
$$R^{(m)}_{\mu\nu} = \varepsilon \sum_{k=1}^{m} \left( \Delta_k^{-1} - 1 \right) G^{(k)}_{\mu\nu} \qquad \text{and} \qquad W^{(m)}_{\mu\nu} = \varepsilon \sum_{k=1}^{m} \left( G^{(k)}_{\mu\nu} - \frac{1}{N} C_{\mu\nu} \right).$$
We will show further that $W^{(m)}_{\mu\nu}$ and $R^{(m)}_{\mu\nu}$ are of $o(m)$, and hence the last term in the R.H.S. of (12) dominates when $m \to \infty$. For $W^{(m)}_{\mu\nu}$, this follows from the fact that it is the partial sum of a sequence of bounded i.i.d.r.v. with zero mean, and consequently
$$\left| W^{(m)}_{\mu\nu} \right| = O\!\left( m^{1/2+\delta} \right) \quad (0 < \delta < 1/2) \quad \text{as } m \to \infty$$
with probability one (see, e.g., [21]). In order to establish this for $R^{(m)}_{\mu\nu}$, it is sufficient to prove $\lim_{m\to\infty} \Delta_m = 1$. Then $R^{(m)}_{\mu\nu} = o(m)$, $m \to \infty$, since it is the $m$-th partial sum of a series with vanishing terms (namely, $\left| \left( \Delta_k^{-1} - 1 \right) G^{(k)}_{\mu\nu} \right| \le \Delta_k^{-1} - 1 \to 0$ as $k \to \infty$).
Three issues will be used in the proof.

(i) Because of the linear independence of the vectors $\xi^1, \dots, \xi^s$, the $s \times s$ overlap matrix $C$ is positive definite, i.e. its minimal eigenvalue is positive.

(ii) $R^{(m)}$ is a positive semidefinite matrix. This is a direct consequence of the positive semidefiniteness of the matrices $G^{(k)}$ and of the inequality $\Delta_k \le 1$ proven in the Appendix.

(iii) The fact that each matrix element $W^{(m)}_{\mu\nu} = o(m)$, $m \to \infty$, entails the same asymptotic behaviour for the minimal eigenvalue of $W^{(m)}$. (We recall that the matrix order $s$ is kept fixed.)

In view of (i)-(iii), from (12) one finds that the minimal eigenvalue of $Q^{(m)}$ goes to infinity in the limit $m \to \infty$, and hence the maximal eigenvalue of $B^{(m)}$, $b^{(m)}_{\max}$, vanishes in this limit. By virtue of
$$\Delta_m \ge 1 - \varepsilon \lambda_{\max} b^{(m-1)}_{\max}$$
(see Appendix), one straightforwardly gets the required limiting relation for $\Delta_m$.
So one can write
$$Q^{(m)}_{\mu\nu} = \varepsilon m \frac{1}{N} \left( C_{\mu\nu} + o(1) \right) \quad \text{as } m \to \infty,$$
and, by inverting, in view of (10), one finally obtains
$$\lim_{m \to \infty} \varepsilon m \frac{1}{N}\, J^{(m)}_{ij} = J^{P}_{ij}, \eqno(13)$$
where
$$J^{P}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^\mu \left( C^{-1} \right)_{\mu\nu} \xi_j^\nu$$
is the projector rule matrix for the patterns $\xi^1, \dots, \xi^s$. Since the remaining patterns $\xi^{s+1}, \dots, \xi^p$ (if they exist) lie in the linear subspace spanned by the vectors $\xi^1, \dots, \xi^s$, $J^P$ is the projection matrix onto the subspace spanned by the whole set of nominated patterns $\xi^1, \dots, \xi^p$, $p < N$. The relation (13) takes place with probability one.

4. Concluding remarks.

In this paper, we have proposed an unlearning algorithm for iterative self-correction of the Hebbian connectivity which operates without pattern presentation. We have proven that, for any prescribed set of $p < N$ patterns and sufficiently small unlearning strengths, the renormalized iterated connection matrix approaches the projector rule one designed by any maximal linearly independent subset of the whole set of patterns.

It is worth noting that, as one should expect, the convergence of our algorithm is much slower than that of iterative methods utilizing recurrent pattern presentation, which is actually supported by preliminary numerical simulations. An investigation of how the convergence rate depends upon the parameters of the model, and of how to optimize the unlearning strength, is beyond the scope of this paper. These problems will be examined in a forthcoming paper treating the model in the thermodynamical limit [22].
Finally, an efficient iterative algorithm which allows one to reach the matrix of optimal storage has been constructed [5]. In this connection, the intriguing question arises whether an algorithm of non-informational connectivity self-correction implementing the same function can be developed. As yet we have no answer to this question.

Acknowledgements.

We would like to thank V. Dotsenko for useful discussion and N. Plakhova for helpful comments on the manuscript.
Appendix.

PROPOSITION 1. If $\varepsilon < \varepsilon_c = \lambda_{\max}^{-1}$, then $0 < \Delta_m \le 1$, $m = 1, 2, \dots$

By using (2) and (6), for an arbitrary $N$-vector $f$ one has
$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j = \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j - \frac{\varepsilon}{N} \left( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i S_j^{(m)} \right)^2. \eqno(A1)$$
(Here $J^{(0)}_{ij}$ is taken to be $J^{H}_{ij}$.) As a consequence, the chain of inequalities holds:
$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \le \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \le \dots \le \sum_{i,j=1}^{N} J^{(0)}_{ij} f_i f_j \le \lambda_{\max} |f|^2.$$
In the special case $f = S^{(m+1)}$,
$$\sum_{i,j=1}^{N} J^{(m)}_{ij} S_i^{(m+1)} S_j^{(m+1)} \le N \lambda_{\max}. \eqno(A2)$$
We will now prove by induction that, provided $\varepsilon < \lambda_{\max}^{-1}$, the matrices $J^{(m)}$ are positive semidefinite, i.e. $\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \ge 0$ for any $f \ne 0$.

The Hebb matrix, $J^{(0)}$, is known to be positive semidefinite. Let us suppose positive semidefiniteness of $J^{(m-1)}$ for some step $m$ and prove that of $J^{(m)}$. For the symmetric positive semidefinite bilinear form associated with the matrix $J^{(m-1)}$, the Cauchy-Schwarz inequality can be written down as
$$\left( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i S_j^{(m)} \right)^2 \le \left( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \right) \left( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \right). \eqno(A3)$$
Substituting (A3) and (A2) into (A1), one gets
$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \ge \left( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \right) \left( 1 - \varepsilon \lambda_{\max} \right) \ge 0$$
for any nonzero $f$, and $J^{(m)}$ is thus positive semidefinite.

From the definition (8) of $\Delta_m$, one immediately obtains
$$\Delta_m = 1 - \frac{\varepsilon}{N} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)}.$$
On account of the positive semidefiniteness of $J^{(m-1)}$ and in view of (A2), one comes to
$$0 \le \frac{\varepsilon}{N} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \le \varepsilon \lambda_{\max} < 1,$$
what proves the statement.
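Proposition 1 lends itself to a direct numerical check along a simulated trajectory of the algorithm. The sketch below is our own NumPy illustration with arbitrary sizes and seed; it tracks $\Delta_m$ and the minimal eigenvalue of $J^{(m)}$ for $\varepsilon$ below the critical value:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, steps = 12, 3, 500                   # illustrative sizes

xi = rng.choice([-1.0, 1.0], size=(N, p))
J = xi @ xi.T / N                          # Hebb matrix J^(0)
lam_max = np.linalg.eigvalsh(J).max()
eps = 0.9 / lam_max                        # below the critical value 1/lambda_max

deltas, min_eigs = [], []
for _ in range(steps):
    S = rng.choice([-1.0, 1.0], size=N)
    delta = 1.0 - eps / N * S @ J @ S      # Delta_m in terms of J^(m-1)
    h = J @ S
    J -= eps / N * np.outer(h, h)          # update (2)
    deltas.append(delta)
    min_eigs.append(np.linalg.eigvalsh(J).min())

print(min(deltas), max(deltas))            # inside (0, 1] up to rounding
print(min(min_eigs))                       # ~0 or tiny negative rounding error
```

Within numerical precision, $\Delta_m$ stays in $(0, 1]$ and $J^{(m)}$ remains positive semidefinite along the whole trajectory, as the proposition asserts.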
PROPOSITION 2. If $\varepsilon < \lambda_{\max}^{-1}$, then $\Delta_m \ge 1 - \varepsilon \lambda_{\max} b^{(m-1)}_{\max}$, $m = 1, 2, \dots$, where $b^{(m)}_{\max}$ denotes the maximal eigenvalue of $B^{(m)}$.

For the $s$-vector $g^{(m)}$, one has
$$\sum_{\mu=1}^{s} \left( g_\mu^{(m)} \right)^2 = \frac{1}{N} \sum_{i,j=1}^{N} \left( \frac{1}{N} \sum_{\mu=1}^{s} \xi_i^\mu \xi_j^\mu \right) S_i^{(m)} S_j^{(m)} \le \lambda_{\max},$$
since the matrix $N^{-1} \sum_{\mu=1}^{s} \xi_i^\mu \xi_j^\mu$ does not exceed $J^{H}$ in the sense of quadratic forms. Hence
$$\Delta_m = 1 - \varepsilon \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} g_\mu^{(m)} g_\nu^{(m)} \ge 1 - \varepsilon\, b^{(m-1)}_{\max} \sum_{\mu=1}^{s} \left( g_\mu^{(m)} \right)^2 \ge 1 - \varepsilon \lambda_{\max} b^{(m-1)}_{\max},$$
which is required.
References

[1] Hopfield J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] Amit D. J., Gutfreund H. and Sompolinsky H., Phys. Rev. A 32 (1985) 1007.
[3] Amit D. J., Gutfreund H. and Sompolinsky H., Ann. Phys. 173 (1987) 30.
[4] Diederich S. and Opper M., Phys. Rev. Lett. 58 (1987) 949.
[5] Krauth W. and Mezard M., J. Phys. A 20 (1987) L745.
[6] Gardner E., J. Phys. A 21 (1988) 257.
[7] Pöppel G. and Krey U., Europhys. Lett. 4 (1987) 979.
[8] Gardner E., Stroud N. and Wallace D. J., J. Phys. A 22 (1989) 2019.
[9] Abbott L. F. and Kepler T. B., J. Phys. A 22 (1989) L711.
[10] Blatt M. G. and Vergini E. G., Phys. Rev. Lett. 66 (1991) 1793.
[11] Personnaz L., Guyon I. and Dreyfus G., J. Phys. (France) Lett. 46 (1985) L359.
[12] Kanter I. and Sompolinsky H., Phys. Rev. A 35 (1987) 380.
[13] Berryman K. W., Inchiosa M. E., Jaffe A. M. and Janowsky S. A., J. Phys. A 23 (1990) L223.
[14] Opper M., Europhys. Lett. 8 (1989) 389.
[15] Hopfield J. J., Feinstein D. I. and Palmer R. G., Nature 304 (1983) 158.
[16] Kleinfeld D. and Pendergraft D. B., Biophys. J. 51 (1987) 47.
[17] van Hemmen J. L., Ioffe L. B., Kühn R. and Vaas M., Physica A 163 (1990) 386.
[18] Dotsenko V. S., Yarunin N. D. and Dorotheyev E. A., J. Phys. A 24 (1991) 2419.
[19] Dotsenko V. S. and Tirozzi B., J. Phys. A 24 (1991) 5163.
[20] Forrest B. M., J. Phys. A 21 (1988) 245.
[21] Lamperti J., Probability (Benjamin, New York, 1966).
[22] Plakhov A. Yu. and Semenov S. A., in preparation.