
HAL Id: jpa-00246902

https://hal.archives-ouvertes.fr/jpa-00246902

Submitted on 1 Jan 1994


Neural networks : iterative unlearning algorithm converging to the projector rule matrix

A. Plakhov, S. Semenov

To cite this version:

A. Plakhov, S. Semenov. Neural networks : iterative unlearning algorithm converging to the projector rule matrix. Journal de Physique I, EDP Sciences, 1994, 4 (2), pp. 253-260. 10.1051/jp1:1994105. jpa-00246902


Classification Physics Abstracts
87.10 06.50 75.10

Neural networks: iterative unlearning algorithm converging to the projector rule matrix

A. Yu. Plakhov and S. A. Semenov

Institute of Physics and Technology, Prechistenka Str. 13/7, Moscow 119034, Russia

(Received 31 March 1993, received in final form 4 October 1993, accepted 12 October 1993)

Abstract. An iterative unlearning algorithm for connectivity self-correction is proposed. No presentation of patterns during the iteration process is required. Starting from the Hebbian connectivity, the convergence of the (rescaled) iterated connection matrix to the projector rule one is proven, for an arbitrary set of $p < N$ binary patterns.

1. Introduction.

Over a period of years, spin-glass-type neural network models functioning as associative memories have attracted considerable attention from physicists. Starting from the simple local Hebb prescription for the connection matrix, a deep understanding of the underlying mechanisms of neural network operation was achieved [1-3]. However, even for the case of unbiased random patterns, this learning rule provides a rather modest storage capacity (for large $N$, $p_c \approx 0.14\,N$ [3], where $N$ is the system size) and, in addition, allows a noticeable fraction of errors in the retrieval. Moreover, it fails completely when a significant amount of correlation between the patterns occurs.

Efficient local iterative algorithms, capable of storing correlated as well as uncorrelated patterns, were developed [4-10]. They imply a local updating of the couplings whilst the patterns (or their noisy versions [7, 8]) are presented to the network. Locality is considered to be very desirable in the hardware context.

There have been many successful attempts at obtaining rigorous results concerning the convergence properties of these algorithms. In particular, convergence theorems were proven for perceptron-type algorithms which enable one to stabilize up to $2N$ random uncorrelated patterns [5, 6]. Simultaneously, the algorithm of Diederich and Opper was established to converge to the projector (pseudoinverse) rule matrix [11, 12], for sets of both linearly independent [4] and linearly dependent [13] patterns. The learning dynamics of a similar algorithm was solved by Opper [14] in the thermodynamical limit, for a set of extensively many random patterns. More recently, Blatt and Vergini [10] have proposed an algorithm operating with arbitrary correlated patterns which ensures fast convergence to the projector rule matrix.


All these learning procedures, in fact, make use of repeated presentation of patterns as a necessary ingredient. We regard an alternative situation: after a single presentation of $p$ patterns $\xi_i^\nu = \pm 1$, $i = 1, \ldots, N$, $\nu = 1, \ldots, p$, and a one-shot local prescription of the connection matrix, further access to the information content becomes impossible, and any subsequent correction of the couplings takes place without any use of the $\xi_i^\nu$'s. It seems reasonable, in a zero approximation, to embed the information via the Hebb rule

$$J^{(0)}_{ij} = \frac{1}{N} \sum_{\nu=1}^{p} \xi_i^{\nu} \xi_j^{\nu}, \qquad i, j = 1, \ldots, N, \qquad (1)$$

serving as a starting point for the further correction process.
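For concreteness, here is a minimal numerical sketch of the one-shot prescription (1); NumPy, the sizes and the variable names are our own illustrative choices, not part of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 200, 10
    # p binary patterns xi_i^nu = +/-1, stored as the rows of a p x N array
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    # Hebb rule (1): J0_ij = (1/N) sum_nu xi_i^nu xi_j^nu
    J0 = xi.T @ xi / N
    # J0 is symmetric and positive semidefinite, as used in the Appendix
    assert np.allclose(J0, J0.T) and np.linalg.eigvalsh(J0).min() > -1e-12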

Only a few procedures are known that correct the Hebbian connectivity without pattern presentation. Among them, we first mention the so-called « unlearning » proposed by Hopfield et al. (1983) [15]. Extensive numerical studies [16, 17] reveal a quadrupled increase of the critical storage capacity ($p_c \to 0.68\,N$ [17]) and a marked elimination of the spurious metastable states inherent to the Hebb prescription (1). Furthermore, the unlearning can handle a set of patterns having different activities, contrary to the standard Hopfield model. Unfortunately, it suffers from grave shortcomings: first, there exists an optimal duration of the procedure, beyond which the recognition properties of the network become worse; second, the resulting connection matrix usually does not provide a perfect storage of the patterns; and third, it has so far been studied only at a fully empirical level.

Dotsenko et al. [18, 19] have recently proposed a thermally induced iterative redefinition of the couplings, starting from the Hebb matrix, so as to improve the storage of a set of non-correlated random patterns. In their model, the iterated symmetric connection matrix possesses an intermediate form between the Hebb matrix and the projector rule one.

In the present paper, we propose a stochastic iterative algorithm of unlearning type for the correction of the initial Hebbian couplings without access to the information to be memorized. No conditions are imposed on the set of patterns. It is shown that, if the unlearning strength is chosen below a certain critical value, the iterated connection matrix, appropriately rescaled, converges with probability one. The resulting matrix is given by the projector rule for any maximal linearly independent subset of the given set of $p < N$ patterns, and a memorization of the whole set of patterns is thus ensured [13].

The plan of the paper is the following. In the next section we describe the algorithm. The proof of its convergence is given in the third section. The paper ends with concluding remarks.

2. Algorithm.

The iterative algorithm is formulated as follows. At each iteration step, the state vector $S = (S_1, \ldots, S_N)$ is chosen at random, with the components taking the values $\pm 1$ independently with equal probability 1/2. Afterwards, the local fields

$$h_i = \sum_{j=1}^{N} J_{ij} S_j$$

are calculated, and then the couplings are redefined by

$$J_{ij} \to J_{ij} - \frac{\varepsilon}{N}\, h_i h_j, \qquad (2)$$

where the positive parameter $\varepsilon$ represents the unlearning strength.

The coupling updating is thus nothing else than the unlearning of the vector of local fields produced by the random configuration $S$. Self-interactions $J_{ii}$ are involved in the iteration process. The algorithm starts from the matrix of Hebbian couplings $J^{(0)}$, and the updating procedure (2) is repeated again and again, the random configurations being chosen independently at each step. The algorithm is local in the sense that the change of $J_{ij}$ only depends on the local fields on the neurons $i$ and $j$ [20, 10].
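As a sketch, the complete procedure in NumPy (the sizes, seed, stopping point and names are our illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 200, 10
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    J = xi.T @ xi / N                         # start from the Hebbian couplings (1)

    eps = 0.5 / np.linalg.eigvalsh(J).max()   # keep eps below eps_c = 1/lambda_max (Sec. 3)
    for _ in range(10_000):
        S = rng.choice([-1.0, 1.0], size=N)   # random state, components +/-1 with prob. 1/2
        h = J @ S                             # local fields h_i = sum_j J_ij S_j
        J -= (eps / N) * np.outer(h, h)       # unlearning step (2); self-interactions included

No pattern is presented inside the loop: the only information about the $\xi^\nu$'s enters through the initial matrix.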

3. Convergence of the algorithm.

Despite the stochastic nature of the iterative algorithm (2), it exhibits a remarkable convergence property, as we will show in this section. It turns out that, as long as $\varepsilon$ is chosen below some critical value $\varepsilon_c$, the connection matrix $J$, renormalized by a factor inversely proportional to the total number of iteration steps, converges to the projector rule matrix. To be more precise, for any preassigned pattern set, the convergence takes place with probability one. It should be emphasized here that, in our approach, the patterns are non-random $N$-vectors, $N$ being considered a constant integer. The stochasticity is due only to the random choice of the state vector $S$ at each iteration step.

To start with, we choose a maximal subset of linearly independent patterns and relabel them as $\xi^1, \ldots, \xi^s$, $s \le p$. The remaining vectors $\xi^\alpha$, $s + 1 \le \alpha \le p$, can then be written as their linear combinations, $\xi^\alpha = \sum_{\mu=1}^{s} b_{\alpha\mu} \xi^\mu$. The Hebb matrix (1) is then given by

$$J^{(0)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} B^{(0)}_{\mu\nu} \xi_j^{\nu} \qquad (3)$$

with

$$B^{(0)}_{\mu\nu} = \delta_{\mu\nu} + \sum_{\alpha=s+1}^{p} b_{\alpha\mu} b_{\alpha\nu}. \qquad (4)$$

Remarkably, in the course of the iteration process the connection matrix preserves the form

$$J^{(m)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} B^{(m)}_{\mu\nu} \xi_j^{\nu} \qquad (5)$$

with some symmetric $s \times s$ matrix $B^{(m)}$ (here and below we use an upper index in brackets in order to denote quantities related to the iteration step $m = 1, 2, \ldots$). Indeed, before applying the algorithm one has (3), (4). Let us assume the validity of the form (5) for the iteration step $m - 1$ and check it for step $m$. The local fields at the $m$-th step are

$$h_i^{(m)} = \sum_{j=1}^{N} J^{(m-1)}_{ij} S_j^{(m)} = \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} B^{(m-1)}_{\mu\nu} g_\nu^{(m)},$$

with

$$g_\nu^{(m)} = N^{-1} \sum_{j=1}^{N} \xi_j^{\nu} S_j^{(m)}$$

denoting the overlap of the random configuration $S^{(m)}$ with the pattern $\xi^\nu$, and, consequently, with (2), we obtain the expression for $J^{(m)}_{ij}$:

$$J^{(m)}_{ij} = J^{(m-1)}_{ij} - \frac{\varepsilon}{N}\, h_i^{(m)} h_j^{(m)} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} \big(B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}\big)_{\mu\nu} \xi_j^{\nu}, \qquad (6)$$


where $G^{(m)}$ denotes the $s \times s$ matrix with elements $G^{(m)}_{\mu\nu} = g_\mu^{(m)} g_\nu^{(m)}$. Thus $J^{(m)}_{ij}$ is of the form (5), with the symmetric matrix $B^{(m)} = B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}$.

The subsequent analysis is greatly simplified in terms of $Q^{(m)} = (B^{(m)})^{-1}$. It is therefore necessary first to examine under what conditions the inverses of $B^{(m)}$ exist.

Using the definition (4) of $B^{(0)}$, it is easy to check that

$$\sum_{\mu,\nu=1}^{s} B^{(0)}_{\mu\nu} x_\mu x_\nu = |x|^2 + \sum_{\alpha=s+1}^{p} \Big( \sum_{\mu=1}^{s} b_{\alpha\mu} x_\mu \Big)^2 > 0$$

for any nonzero $s$-vector $x$, i.e. the matrix $B^{(0)}$ is positive definite and hence invertible. Next we will show that, if the matrix $B^{(m-1)}$ is invertible, then $B^{(m)}$ is also invertible, and its inverse $Q^{(m)}$ is given by

$$Q^{(m)} = Q^{(m-1)} + \varepsilon \Delta_m^{-1} G^{(m)}, \qquad (7)$$

provided the quantity

$$\Delta_m = 1 - \varepsilon \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} G^{(m)}_{\mu\nu} \qquad (8)$$

is not equal to zero. Indeed, multiplying the R.H.S. of (7) by $B^{(m)}$ and taking into account that $B^{(m-1)} Q^{(m-1)} = I$, we obtain

$$\big(B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}\big)\big(Q^{(m-1)} + \varepsilon \Delta_m^{-1} G^{(m)}\big) = I + \varepsilon \big[ \Delta_m^{-1} B^{(m-1)} G^{(m)} - B^{(m-1)} G^{(m)} - \varepsilon \Delta_m^{-1} B^{(m-1)} G^{(m)} B^{(m-1)} G^{(m)} \big], \qquad (9)$$

where $I$ is the unity $s \times s$ matrix.

By substituting the expression (8) for $\Delta_m$ into (9), and using the relation

$$G^{(m)} B^{(m-1)} G^{(m)} = \Big( \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} G^{(m)}_{\mu\nu} \Big)\, G^{(m)},$$

which can be verified directly, we find that the expression in square brackets in (9) equals zero, and thus the R.H.S. of (7) is the inverse of $B^{(m)}$. So we have obtained by induction that the inverses of $B^{(m)}$ exist and are given by the recursion (7), if at each iteration step $\Delta_m \ne 0$. The latter is fulfilled provided the unlearning strength obeys the constraint $\varepsilon < \varepsilon_c = \lambda_{\max}^{-1}$, where $\lambda_{\max}$ denotes the maximal eigenvalue of $J^{(0)}$ (for the proof see the Appendix). Then one can write

$$J^{(m)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} \big(Q^{(m)}\big)^{-1}_{\mu\nu} \xi_j^{\nu}. \qquad (10)$$

From (7), by induction, one gets

$$Q^{(m)}_{\mu\nu} = Q^{(0)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \Delta_k^{-1} G^{(k)}_{\mu\nu}, \qquad (11)$$

where $Q^{(0)}$ is the inverse of $B^{(0)}$ defined by (4).
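The recursion (7), (8) is a Sherman-Morrison rank-one inverse update applied to $B^{(m)} = B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}$. Below is a short numerical sanity check of this bookkeeping (our own sketch; for simplicity it takes $s = p$, so that $B^{(0)} = I$ by (4)):

    import numpy as np

    rng = np.random.default_rng(1)
    N, s = 100, 4
    xi = rng.choice([-1.0, 1.0], size=(s, N))   # linearly independent patterns
    B = np.eye(s)                               # B^(0) of (4) when s = p
    Q = np.eye(s)                               # Q^(0) = (B^(0))^(-1)
    eps = 0.1

    for m in range(500):
        S = rng.choice([-1.0, 1.0], size=N)
        g = xi @ S / N                          # overlaps g_nu^(m) = N^(-1) sum_j xi_j^nu S_j^(m)
        G = np.outer(g, g)                      # rank-one matrix G^(m) = g g^T
        delta = 1.0 - eps * np.sum(B * G)       # Delta_m of (8)
        Q = Q + (eps / delta) * G               # recursion (7)
        B = B - eps * B @ G @ B                 # induced update of B^(m), cf. (6)

    assert np.allclose(Q @ B, np.eye(s))        # Q^(m) is indeed the inverse of B^(m)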

Consider now the asymptotic behaviour of $Q^{(m)}_{\mu\nu}$ at large $m$. One can notice that, for any $\mu$, $\nu$, the $G^{(k)}_{\mu\nu}$, $k = 1, 2, \ldots$, represent a sequence of independent identically distributed random variables (i.i.d.r.v.). Obviously, $G^{(k)}_{\nu\nu} \le 1$. The averaging of $G^{(k)}_{\mu\nu}$ is readily performed to give

$$\langle G^{(k)}_{\mu\nu} \rangle = N^{-2} \sum_{i,j=1}^{N} \xi_i^{\mu} \langle S_i^{(k)} S_j^{(k)} \rangle \xi_j^{\nu} = N^{-1} C_{\mu\nu},$$

where $C_{\mu\nu} = N^{-1} \sum_{i=1}^{N} \xi_i^{\mu} \xi_i^{\nu}$ is the overlap between the patterns $\mu$ and $\nu$.

Then, by decomposing the R.H.S. of (11) into

$$Q^{(0)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} (\Delta_k^{-1} - 1)\, G^{(k)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \big(G^{(k)}_{\mu\nu} - N^{-1} C_{\mu\nu}\big) + \varepsilon m N^{-1} C_{\mu\nu},$$

one can rewrite (11) in the form

$$Q^{(m)} = Q^{(0)} + R^{(m)} + W^{(m)} + \varepsilon m N^{-1} C \qquad (12)$$

with

$$R^{(m)}_{\mu\nu} = \varepsilon \sum_{k=1}^{m} (\Delta_k^{-1} - 1)\, G^{(k)}_{\mu\nu} \quad \text{and} \quad W^{(m)}_{\mu\nu} = \varepsilon \sum_{k=1}^{m} \big(G^{(k)}_{\mu\nu} - N^{-1} C_{\mu\nu}\big).$$

We will show below that $W^{(m)}_{\mu\nu}$ and $R^{(m)}_{\mu\nu}$ are $o(m)$, and hence the last term in the R.H.S. of (12) dominates when $m \to \infty$. For $W^{(m)}_{\mu\nu}$, this follows from the fact that it is the partial sum of a sequence of bounded i.i.d.r.v. with zero mean; consequently, $|W^{(m)}_{\mu\nu}| = O(m^{1/2+\delta})$ $(0 < \delta < 1/2)$ as $m \to \infty$ with probability one (see, e.g., [21]).

In order to establish the same for $R^{(m)}_{\mu\nu}$, it is sufficient to prove that $\lim_{m\to\infty} \Delta_m = 1$. Then $R^{(m)}_{\mu\nu} = o(m)$ as $m \to \infty$, since it is the $m$-th partial sum of a series with vanishing terms (namely, $|(\Delta_k^{-1} - 1)\, G^{(k)}_{\mu\nu}| \le \Delta_k^{-1} - 1 \to 0$ as $k \to \infty$).

Three facts will be used in the proof.

(i) Because of the linear independence of the vectors $\xi^1, \ldots, \xi^s$, the $s \times s$ overlap matrix $C$ is positive definite, i.e. its minimal eigenvalue is positive.

(ii) $R^{(m)}$ is a positive semidefinite matrix. This is a direct consequence of the positive semidefiniteness of the matrices $G^{(k)}$ and of the inequality $\Delta_k \le 1$ proven in the Appendix.

(iii) The fact that each matrix element $W^{(m)}_{\mu\nu} = o(m)$, $m \to \infty$, entails the same asymptotic behaviour for the minimal eigenvalue of $W^{(m)}$. (We recall that the matrix order $s$ is kept fixed.)

In view of (i)-(iii), from (12) one finds that the minimal eigenvalue of $Q^{(m)}$ goes to infinity in the limit $m \to \infty$, and hence the maximal eigenvalue of $B^{(m)}$, $b_{\max}^{(m)}$, vanishes in this limit. By virtue of $1 \ge \Delta_m \ge 1 - \varepsilon \lambda_{\max} b_{\max}^{(m-1)}$ (see the Appendix), one straightforwardly gets the required limiting relation for $\Delta_m$. So one can write

$$Q^{(m)}_{\mu\nu} = \varepsilon m N^{-1} \big(C_{\mu\nu} + o(1)\big) \quad \text{as } m \to \infty,$$


and, by inverting, in view of (10), one finally obtains

$$\lim_{m\to\infty} \varepsilon m N^{-1} J^{(m)}_{ij} = J^{P}_{ij}, \qquad (13)$$

where

$$J^{P}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} \big(C^{-1}\big)_{\mu\nu} \xi_j^{\nu}$$

is the projector rule matrix for the patterns $\xi^1, \ldots, \xi^s$. Since the remaining patterns $\xi^{s+1}, \ldots, \xi^p$ (if they exist) lie in the linear subspace spanned by the vectors $\xi^1, \ldots, \xi^s$, $J^P$ is the projection matrix onto the subspace spanned by the whole set of nominated patterns $\xi^1, \ldots, \xi^p$, $p < N$. The relation (13) holds with probability one.
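To illustrate the statement, here is a small end-to-end simulation of (13) (our own sketch; the sizes, the seed and the iteration count are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    N, p = 100, 5
    xi = rng.choice([-1.0, 1.0], size=(p, N))   # p < N random binary patterns
    J = xi.T @ xi / N                           # Hebb matrix J^(0), eq. (1)

    C = xi @ xi.T / N                           # overlap matrix C
    JP = xi.T @ np.linalg.inv(C) @ xi / N       # projector rule matrix J^P

    eps = 0.5 / np.linalg.eigvalsh(J).max()     # below eps_c = 1/lambda_max(J^(0))
    m_total = 100_000
    for _ in range(m_total):
        S = rng.choice([-1.0, 1.0], size=N)
        h = J @ S
        J -= (eps / N) * np.outer(h, h)         # unlearning step (2)

    # The rescaled matrix eps*m/N * J^(m) should approach J^P with probability one.
    dev = np.abs(eps * m_total / N * J - JP).max()
    print(f"max deviation from the projector rule matrix: {dev:.2e}")

For random patterns the full set is linearly independent with high probability, so $s = p$ here and $J^P$ can be computed directly from the inverse of the $p \times p$ overlap matrix.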

4. Concluding remarks.

In this paper, we have proposed an unlearning algorithm for the iterative self-correction of Hebbian connectivity which operates without pattern presentation. We have proven that, for any prescribed set of $p < N$ patterns and sufficiently small unlearning strengths, the renormalized iterated connection matrix approaches the projector rule matrix designed by any maximal linearly independent subset of the whole set of patterns.

It is worth noting that, as one should expect, the convergence of our algorithm is much slower than that of iterative methods utilizing recurrent pattern presentation; this is indeed supported by preliminary numerical simulations. An investigation of how the convergence rate depends upon the parameters of the model, and of how to optimize the unlearning strength, is beyond the scope of this paper. These problems will be examined in a forthcoming paper treating the model in the thermodynamical limit [22].

Finally, we recall that an efficient iterative algorithm which allows one to reach the matrix of optimal storage has been constructed [5]. In this connection, the intriguing question arises whether an algorithm of non-informational connectivity self-correction implementing the same function can be developed. As yet we have no answer to this question.

Acknowledgements.

We would like to thank V. Dotsenko for useful discussions and N. Plakhova for helpful comments on the manuscript.

Appendix.

PROPOSITION 1. If $\varepsilon < \varepsilon_c = \lambda_{\max}^{-1}$, then $0 < \Delta_m \le 1$, $m = 1, 2, \ldots$

By using (6), for an arbitrary $N$-vector $f$ one has

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j = \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j - \frac{\varepsilon}{N} \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i S_j^{(m)} \Big)^2. \qquad (A1)$$

(Here the $J^{(0)}_{ij}$ are taken to be the Hebbian couplings (1).) As a consequence, the chain of inequalities

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \le \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \le \cdots \le \sum_{i,j=1}^{N} J^{(0)}_{ij} f_i f_j \le \lambda_{\max} |f|^2$$

holds. In the special case $f = S^{(m)}$,

$$N^{-1} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \le \lambda_{\max}. \qquad (A2)$$


We will now prove by induction that, provided $\varepsilon < \lambda_{\max}^{-1}$, the matrices $J^{(m)}$ are positive semidefinite, i.e.

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \ge 0 \qquad \forall f \ne 0.$$

The Hebb matrix $J^{(0)}$ is known to be positive semidefinite. Let us suppose positive semidefiniteness of $J^{(m-1)}$ for some step $m$ and prove that of $J^{(m)}$. For the symmetric positive semidefinite bilinear form associated with the matrix $J^{(m-1)}$, the Cauchy-Schwarz inequality can be written down as

$$\Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i S_j^{(m)} \Big)^2 \le \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \Big) \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \Big). \qquad (A3)$$

Substituting (A3) and (A2) into (A1), one gets

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \ge \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \Big) \big(1 - \varepsilon \lambda_{\max}\big) \ge 0$$

for any nonzero $f$, and $J^{(m)}$ is thus positive semidefinite.

From the definition (8) of $\Delta_m$, one immediately obtains

$$\Delta_m = 1 - \varepsilon N^{-1} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)}.$$

On account of the positive semidefiniteness of $J^{(m-1)}$, and in view of (A2), one comes to

$$0 \le N^{-1} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \le \lambda_{\max},$$

which proves the statement.

PROPOSITION 2. If $\varepsilon < \lambda_{\max}^{-1}$, then $\Delta_m \ge 1 - \varepsilon \lambda_{\max} b_{\max}^{(m-1)}$, $m = 1, 2, \ldots$

For the $s$-vector $g^{(m)}$, one has

$$\sum_{\mu=1}^{s} \big(g_\mu^{(m)}\big)^2 \le \sum_{\mu=1}^{p} \big(g_\mu^{(m)}\big)^2 = N^{-1} \sum_{i,j=1}^{N} J^{(0)}_{ij} S_i^{(m)} S_j^{(m)} \le \lambda_{\max}.$$

Hence

$$\Delta_m = 1 - \varepsilon \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} g_\mu^{(m)} g_\nu^{(m)} \ge 1 - \varepsilon \lambda_{\max} b_{\max}^{(m-1)},$$

which is what was required.

References

[1] Hopfield J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] Amit D. J., Gutfreund H. and Sompolinsky H., Phys. Rev. A 32 (1985) 1007.
[3] Amit D. J., Gutfreund H. and Sompolinsky H., Ann. Phys. 173 (1987) 30.
[4] Diederich S. and Opper M., Phys. Rev. Lett. 58 (1987) 949.
[5] Krauth W. and Mezard M., J. Phys. A 20 (1987) L745.
[6] Gardner E., J. Phys. A 21 (1988) 257.
[7] Pöppel G. and Krey U., Europhys. Lett. 4 (1987) 979.
[8] Gardner E., Stroud N. and Wallace D. J., J. Phys. A 22 (1989) 2019.
[9] Abbott L. F. and Kepler T. B., J. Phys. A 22 (1989) L711.
[10] Blatt M. G. and Vergini E. G., Phys. Rev. Lett. 66 (1991) 1793.
[11] Personnaz L., Guyon I. and Dreyfus G., J. Phys. (France) Lett. 46 (1985) L359.
[12] Kanter I. and Sompolinsky H., Phys. Rev. A 35 (1987) 380.
[13] Berryman K. W., Inchiosa M. E., Jaffe A. M. and Janowsky S. A., J. Phys. A 23 (1990) L223.
[14] Opper M., Europhys. Lett. 8 (1989) 389.
[15] Hopfield J. J., Feinstein D. I. and Palmer R. G., Nature 304 (1983) 158.
[16] Kleinfeld D. and Pendergraft D. B., Biophys. J. 51 (1987) 47.
[17] van Hemmen J. L., Ioffe L. B., Kühn R. and Vaas M., Physica A 163 (1990) 386.
[18] Dotsenko V. S., Yarunin N. D. and Dorotheyev E. A., J. Phys. A 24 (1991) 2419.
[19] Dotsenko V. S. and Tirozzi B., J. Phys. A 24 (1991) 5163.
[20] Forrest B. M., J. Phys. A 21 (1988) 245.
[21] Lamperti J., Probability (Benjamin, New York, 1966).
[22] Plakhov A. Yu. and Semenov S. A., in preparation.
