
HAL Id: jpa-00246902

https://hal.archives-ouvertes.fr/jpa-00246902

Submitted on 1 Jan 1994


Neural networks : iterative unlearning algorithm converging to the projector rule matrix

A. Plakhov, S. Semenov

To cite this version:

A. Plakhov, S. Semenov. Neural networks : iterative unlearning algorithm converging to the projector rule matrix. Journal de Physique I, EDP Sciences, 1994, 4 (2), pp. 253-260. 10.1051/jp1:1994105. jpa-00246902


Classification Physics Abstracts
87.10 06.50 75.10

Neural networks: iterative unlearning algorithm converging to the projector rule matrix

A. Yu. Plakhov and S. A. Semenov

Institute of Physics and Technology, Prechistenka Str. 13/7, Moscow 119034, Russia

(Received 31 March 1993, received in final form 4 October 1993, accepted 12 October 1993)

Abstract. An iterative unlearning algorithm for connectivity self-correction is proposed. No presentation of patterns during the iteration process is required. Starting from the Hebbian connectivity, the convergence of the (rescaled) iterated connection matrix to the projector rule one is proven, for an arbitrary set of $p < N$ binary patterns.

1. Introduction.

Over a period of years, spin-glass-type neural network models functioning as associative memories have attracted considerable attention from physicists. Starting from the simple local Hebb prescription for the connection matrix, a deep understanding of the underlying mechanisms of neural network operation was achieved [1-3]. However, even for the case of unbiased random patterns, this learning rule provides a rather modest storage capacity (for large $N$, $p_c \approx 0.14\,N$ [3], where $N$ is the system size) and, in addition, allows a noticeable fraction of errors in the retrieval. Moreover, it fails completely when a significant amount of correlation between the patterns occurs.

Efficient local iterative algorithms, capable of storing correlated as well as uncorrelated patterns, were developed [4-10]. They imply a local updating of the couplings whilst the patterns (or their noisy versions [7, 8]) are presented to the network. Locality is considered to be very desirable in the hardware context.

There have been many successful attempts at obtaining rigorous results concerning the convergence properties of these algorithms. In particular, convergence theorems were proven for perceptron-type algorithms which enable one to stabilize up to $2N$ random uncorrelated patterns [5, 6]. Simultaneously, the algorithm of Diederich and Opper was established to converge to the projector (pseudoinverse) rule matrix [11, 12], for sets of both linearly independent [4] and linearly dependent [13] patterns. The learning dynamics of a similar algorithm was solved by Opper [14] in the thermodynamical limit, for a set of extensively many random patterns. More recently, Blatt and Vergini [10] have proposed an algorithm operating with arbitrary correlated patterns which ensures fast convergence to the projector rule matrix.


All these learning procedures, in fact, make use of repeated presentation of patterns as a necessary ingredient. We regard an alternative situation: after a single presentation of $p$ patterns $\xi_i^\nu = \pm 1$, $i = 1, \ldots, N$, $\nu = 1, \ldots, p$, and a one-shot local prescription of the connection matrix, further access to the information content becomes impossible, and any subsequent correction of the couplings takes place without any use of the $\xi_i^\nu$'s. It seems reasonable, in a zero approximation, to embed the information via the Hebb rule

$$J^{(0)}_{ij} = \frac{1}{N} \sum_{\nu=1}^{p} \xi_i^{\nu} \xi_j^{\nu}, \qquad i, j = 1, \ldots, N, \qquad (1)$$

serving as a starting point for the further correction process.
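For concreteness, here is a minimal numerical sketch of the one-shot prescription (1); NumPy, the sizes and the variable names are our own illustrative choices, not part of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 200, 10
    # p binary patterns xi_i^nu = +/-1, stored as the rows of a p x N array
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    # Hebb rule (1): J0_ij = (1/N) sum_nu xi_i^nu xi_j^nu
    J0 = xi.T @ xi / N
    # J0 is symmetric and positive semidefinite, as used in the Appendix
    assert np.allclose(J0, J0.T) and np.linalg.eigvalsh(J0).min() > -1e-12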

Only a few procedures are known that correct the Hebbian connectivity without pattern presentation. Among them, we first mention the so-called « unlearning » proposed by Hopfield et al. (1983) [15]. Extensive numerical studies [16, 17] reveal a quadrupled increase of the critical storage capacity ($p_c \to 0.68\,N$ [17]) and a marked elimination of the spurious metastable states inherent to the Hebb prescription (1). Furthermore, the unlearning can handle a set of patterns having different activities, contrary to the standard Hopfield model. Unfortunately, it suffers from grave shortcomings: first, there exists an optimal duration of the procedure, beyond which the recognition properties of the network become worse; second, the resulting connection matrix usually does not provide a perfect storage of the patterns; and third, it has so far been studied only at a fully empirical level.

Dotsenko et al. [18, 19] have recently proposed a thermally induced iterative redefinition of the couplings, starting from the Hebb matrix, so as to improve the storage of a set of non-correlated random patterns. In their model, the iterated symmetric connection matrix possesses an intermediate form between the Hebb matrix and the projector rule one.

In the present paper, we propose a stochastic iterative algorithm of unlearning type for the correction of the initial Hebbian couplings without access to the information to be memorized. No conditions are imposed on the set of patterns. It is shown that, if the unlearning strength is chosen below a certain critical value, the iterated connection matrix, appropriately rescaled, converges with probability one. The resulting matrix is given by the projector rule for any maximal linearly independent subset of the given set of $p < N$ patterns, and a memorization of the whole set of patterns is thus ensured [13].

The plan of the paper is the following. In the next section we describe the algorithm. The proof of its convergence is given in the third section. The paper ends with concluding remarks.

2. Algorithm.

The iterative algorithm is formulated as follows. At each iteration step, the state vector $S = (S_1, \ldots, S_N)$ is chosen at random, with the components taking the values $\pm 1$ independently with equal probability 1/2. Afterwards, the local fields

$$h_i = \sum_{j=1}^{N} J_{ij} S_j$$

are calculated, and then the couplings are redefined by

$$J_{ij} \to J_{ij} - \frac{\varepsilon}{N}\, h_i h_j, \qquad (2)$$

where the positive parameter $\varepsilon$ represents the unlearning strength.

The coupling updating is thus nothing else than the unlearning of the vector of local fields produced by the random configuration $S$. Self-interactions $J_{ii}$ are involved in the iteration process. The algorithm starts from the matrix of Hebbian couplings $J^{(0)}$, and the updating procedure (2) is repeated again and again, the random configurations being chosen independently at each step. The algorithm is local in the sense that the change of $J_{ij}$ only depends on the local fields on the neurons $i$ and $j$ [20, 10].
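As a sketch, the complete procedure in NumPy (the sizes, seed, stopping point and names are our illustrative choices):

    import numpy as np

    rng = np.random.default_rng(0)
    N, p = 200, 10
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    J = xi.T @ xi / N                         # start from the Hebbian couplings (1)

    eps = 0.5 / np.linalg.eigvalsh(J).max()   # keep eps below eps_c = 1/lambda_max (Sec. 3)
    for _ in range(10_000):
        S = rng.choice([-1.0, 1.0], size=N)   # random state, components +/-1 with prob. 1/2
        h = J @ S                             # local fields h_i = sum_j J_ij S_j
        J -= (eps / N) * np.outer(h, h)       # unlearning step (2); self-interactions included

No pattern is presented inside the loop: the only information about the $\xi^\nu$'s enters through the initial matrix.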

3. Convergence of the algorithm.

Despite the stochastic nature of the iterative algorithm (2), it exhibits a remarkable convergence property, as we will show in this section. It turns out that, as long as $\varepsilon$ is chosen below some critical value $\varepsilon_c$, the connection matrix $J$, renormalized by a factor inversely proportional to the total number of iteration steps, converges to the projector rule matrix. To be more precise, for any preassigned pattern set, the convergence takes place with probability one. It should be emphasized here that, in our approach, the patterns are non-random $N$-vectors, $N$ being considered a constant integer. The stochasticity is due only to the random choice of the state vector $S$ at each iteration step.

To start with, we choose a maximal subset of linearly independent patterns and relabel them as $\xi^1, \ldots, \xi^s$, $s \le p$. The remaining vectors $\xi^\alpha$, $s + 1 \le \alpha \le p$, can then be written as their linear combinations, $\xi^\alpha = \sum_{\mu=1}^{s} b_{\alpha\mu} \xi^\mu$. The Hebb matrix (1) is then given by

$$J^{(0)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} B^{(0)}_{\mu\nu} \xi_j^{\nu} \qquad (3)$$

with

$$B^{(0)}_{\mu\nu} = \delta_{\mu\nu} + \sum_{\alpha=s+1}^{p} b_{\alpha\mu} b_{\alpha\nu}. \qquad (4)$$

Remarkably, in the course of the iteration process the connection matrix preserves the form

$$J^{(m)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} B^{(m)}_{\mu\nu} \xi_j^{\nu} \qquad (5)$$

with some symmetric $s \times s$ matrix $B^{(m)}$ (here and below we use an upper index in brackets in order to denote quantities related to the iteration step $m = 1, 2, \ldots$). Indeed, before applying the algorithm one has (3), (4). Let us assume the validity of the form (5) for the iteration step $m - 1$ and check it for step $m$. The local fields at the $m$-th step are

$$h_i^{(m)} = \sum_{j=1}^{N} J^{(m-1)}_{ij} S_j^{(m)} = \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} B^{(m-1)}_{\mu\nu} g_\nu^{(m)},$$

with

$$g_\nu^{(m)} = N^{-1} \sum_{j=1}^{N} \xi_j^{\nu} S_j^{(m)}$$

denoting the overlap of the random configuration $S^{(m)}$ with the pattern $\xi^\nu$, and, consequently, with (2), we obtain the expression for $J^{(m)}_{ij}$:

$$J^{(m)}_{ij} = J^{(m-1)}_{ij} - \frac{\varepsilon}{N}\, h_i^{(m)} h_j^{(m)} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} \big(B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}\big)_{\mu\nu} \xi_j^{\nu}, \qquad (6)$$


where $G^{(m)}$ denotes the $s \times s$ matrix with elements $G^{(m)}_{\mu\nu} = g_\mu^{(m)} g_\nu^{(m)}$. Thus $J^{(m)}_{ij}$ is of the form (5), with the symmetric matrix $B^{(m)} = B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}$.

The subsequent analysis is greatly simplified in terms of $Q^{(m)} = (B^{(m)})^{-1}$. It is therefore necessary first to examine under what conditions the inverses of $B^{(m)}$ exist.

Using the definition (4) of $B^{(0)}$, it is easy to check that

$$\sum_{\mu,\nu=1}^{s} B^{(0)}_{\mu\nu} x_\mu x_\nu = |x|^2 + \sum_{\alpha=s+1}^{p} \Big( \sum_{\mu=1}^{s} b_{\alpha\mu} x_\mu \Big)^2 > 0$$

for any nonzero $s$-vector $x$, i.e. the matrix $B^{(0)}$ is positive definite and hence invertible. Next we will show that, if the matrix $B^{(m-1)}$ is invertible, then $B^{(m)}$ is also invertible, and its inverse $Q^{(m)}$ is given by

$$Q^{(m)} = Q^{(m-1)} + \varepsilon \Delta_m^{-1} G^{(m)}, \qquad (7)$$

provided the quantity

$$\Delta_m = 1 - \varepsilon \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} G^{(m)}_{\mu\nu} \qquad (8)$$

is not equal to zero. Indeed, multiplying the R.H.S. of (7) by $B^{(m)}$ and taking into account that $B^{(m-1)} Q^{(m-1)} = I$, we obtain

$$\big(B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}\big)\big(Q^{(m-1)} + \varepsilon \Delta_m^{-1} G^{(m)}\big) = I + \varepsilon \big[ \Delta_m^{-1} B^{(m-1)} G^{(m)} - B^{(m-1)} G^{(m)} - \varepsilon \Delta_m^{-1} B^{(m-1)} G^{(m)} B^{(m-1)} G^{(m)} \big], \qquad (9)$$

where $I$ is the unity $s \times s$ matrix.

By substituting the expression (8) for $\Delta_m$ into (9), and using the relation

$$G^{(m)} B^{(m-1)} G^{(m)} = \Big( \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} G^{(m)}_{\mu\nu} \Big)\, G^{(m)},$$

which can be verified directly, we find that the expression in square brackets in (9) equals zero, and thus the R.H.S. of (7) is the inverse of $B^{(m)}$. So we have obtained by induction that the inverses of $B^{(m)}$ exist and are given by the recursion (7), if at each iteration step $\Delta_m \ne 0$. The latter is fulfilled provided the unlearning strength obeys the constraint $\varepsilon < \varepsilon_c = \lambda_{\max}^{-1}$, where $\lambda_{\max}$ denotes the maximal eigenvalue of $J^{(0)}$ (for the proof see the Appendix). Then one can write

$$J^{(m)}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} \big(Q^{(m)}\big)^{-1}_{\mu\nu} \xi_j^{\nu}. \qquad (10)$$

From (7), by induction, one gets

$$Q^{(m)}_{\mu\nu} = Q^{(0)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \Delta_k^{-1} G^{(k)}_{\mu\nu}, \qquad (11)$$

where $Q^{(0)}$ is the inverse of $B^{(0)}$ defined by (4).
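The recursion (7), (8) is a Sherman-Morrison rank-one inverse update applied to $B^{(m)} = B^{(m-1)} - \varepsilon B^{(m-1)} G^{(m)} B^{(m-1)}$. Below is a short numerical sanity check of this bookkeeping (our own sketch; for simplicity it takes $s = p$, so that $B^{(0)} = I$ by (4)):

    import numpy as np

    rng = np.random.default_rng(1)
    N, s = 100, 4
    xi = rng.choice([-1.0, 1.0], size=(s, N))   # linearly independent patterns
    B = np.eye(s)                               # B^(0) of (4) when s = p
    Q = np.eye(s)                               # Q^(0) = (B^(0))^(-1)
    eps = 0.1

    for m in range(500):
        S = rng.choice([-1.0, 1.0], size=N)
        g = xi @ S / N                          # overlaps g_nu^(m) = N^(-1) sum_j xi_j^nu S_j^(m)
        G = np.outer(g, g)                      # rank-one matrix G^(m) = g g^T
        delta = 1.0 - eps * np.sum(B * G)       # Delta_m of (8)
        Q = Q + (eps / delta) * G               # recursion (7)
        B = B - eps * B @ G @ B                 # induced update of B^(m), cf. (6)

    assert np.allclose(Q @ B, np.eye(s))        # Q^(m) is indeed the inverse of B^(m)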

Consider now the asymptotic behaviour of $Q^{(m)}_{\mu\nu}$ at large $m$. One can notice that, for any $\mu$, $\nu$, the $G^{(k)}_{\mu\nu}$, $k = 1, 2, \ldots$, represent a sequence of independent identically distributed random variables (i.i.d.r.v.). Obviously, $G^{(k)}_{\nu\nu} \le 1$. The averaging of $G^{(k)}_{\mu\nu}$ is readily performed to give

$$\langle G^{(k)}_{\mu\nu} \rangle = N^{-2} \sum_{i,j=1}^{N} \xi_i^{\mu} \langle S_i^{(k)} S_j^{(k)} \rangle \xi_j^{\nu} = N^{-1} C_{\mu\nu},$$

where $C_{\mu\nu} = N^{-1} \sum_{i=1}^{N} \xi_i^{\mu} \xi_i^{\nu}$ is the overlap between the patterns $\mu$ and $\nu$.

Then, by decomposing the R.H.S. of (11) into

$$Q^{(0)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} (\Delta_k^{-1} - 1)\, G^{(k)}_{\mu\nu} + \varepsilon \sum_{k=1}^{m} \big(G^{(k)}_{\mu\nu} - N^{-1} C_{\mu\nu}\big) + \varepsilon m N^{-1} C_{\mu\nu},$$

one can rewrite (11) in the form

$$Q^{(m)} = Q^{(0)} + R^{(m)} + W^{(m)} + \varepsilon m N^{-1} C \qquad (12)$$

with

$$R^{(m)}_{\mu\nu} = \varepsilon \sum_{k=1}^{m} (\Delta_k^{-1} - 1)\, G^{(k)}_{\mu\nu} \quad \text{and} \quad W^{(m)}_{\mu\nu} = \varepsilon \sum_{k=1}^{m} \big(G^{(k)}_{\mu\nu} - N^{-1} C_{\mu\nu}\big).$$

We will show below that $W^{(m)}_{\mu\nu}$ and $R^{(m)}_{\mu\nu}$ are $o(m)$, and hence the last term in the R.H.S. of (12) dominates when $m \to \infty$. For $W^{(m)}_{\mu\nu}$, this follows from the fact that it is the partial sum of a sequence of bounded i.i.d.r.v. with zero mean; consequently, $|W^{(m)}_{\mu\nu}| = O(m^{1/2+\delta})$ $(0 < \delta < 1/2)$ as $m \to \infty$ with probability one (see, e.g., [21]).

In order to establish the same for $R^{(m)}_{\mu\nu}$, it is sufficient to prove that $\lim_{m\to\infty} \Delta_m = 1$. Then $R^{(m)}_{\mu\nu} = o(m)$ as $m \to \infty$, since it is the $m$-th partial sum of a series with vanishing terms (namely, $|(\Delta_k^{-1} - 1)\, G^{(k)}_{\mu\nu}| \le \Delta_k^{-1} - 1 \to 0$ as $k \to \infty$).

Three facts will be used in the proof.

(i) Because of the linear independence of the vectors $\xi^1, \ldots, \xi^s$, the $s \times s$ overlap matrix $C$ is positive definite, i.e. its minimal eigenvalue is positive.

(ii) $R^{(m)}$ is a positive semidefinite matrix. This is a direct consequence of the positive semidefiniteness of the matrices $G^{(k)}$ and of the inequality $\Delta_k \le 1$ proven in the Appendix.

(iii) The fact that each matrix element $W^{(m)}_{\mu\nu} = o(m)$, $m \to \infty$, entails the same asymptotic behaviour for the minimal eigenvalue of $W^{(m)}$. (We recall that the matrix order $s$ is kept fixed.)

In view of (i)-(iii), from (12) one finds that the minimal eigenvalue of $Q^{(m)}$ goes to infinity in the limit $m \to \infty$, and hence the maximal eigenvalue of $B^{(m)}$, $b_{\max}^{(m)}$, vanishes in this limit. By virtue of $1 \ge \Delta_m \ge 1 - \varepsilon \lambda_{\max} b_{\max}^{(m-1)}$ (see the Appendix), one straightforwardly gets the required limiting relation for $\Delta_m$. So one can write

$$Q^{(m)}_{\mu\nu} = \varepsilon m N^{-1} \big(C_{\mu\nu} + o(1)\big) \quad \text{as } m \to \infty,$$


and, by inverting, in view of (10), one finally obtains

$$\lim_{m\to\infty} \varepsilon m N^{-1} J^{(m)}_{ij} = J^{P}_{ij}, \qquad (13)$$

where

$$J^{P}_{ij} = \frac{1}{N} \sum_{\mu,\nu=1}^{s} \xi_i^{\mu} \big(C^{-1}\big)_{\mu\nu} \xi_j^{\nu}$$

is the projector rule matrix for the patterns $\xi^1, \ldots, \xi^s$. Since the remaining patterns $\xi^{s+1}, \ldots, \xi^p$ (if they exist) lie in the linear subspace spanned by the vectors $\xi^1, \ldots, \xi^s$, $J^P$ is the projection matrix onto the subspace spanned by the whole set of nominated patterns $\xi^1, \ldots, \xi^p$, $p < N$. The relation (13) holds with probability one.
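To illustrate the statement, here is a small end-to-end simulation of (13) (our own sketch; the sizes, the seed and the iteration count are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(2)
    N, p = 100, 5
    xi = rng.choice([-1.0, 1.0], size=(p, N))   # p < N random binary patterns
    J = xi.T @ xi / N                           # Hebb matrix J^(0), eq. (1)

    C = xi @ xi.T / N                           # overlap matrix C
    JP = xi.T @ np.linalg.inv(C) @ xi / N       # projector rule matrix J^P

    eps = 0.5 / np.linalg.eigvalsh(J).max()     # below eps_c = 1/lambda_max(J^(0))
    m_total = 100_000
    for _ in range(m_total):
        S = rng.choice([-1.0, 1.0], size=N)
        h = J @ S
        J -= (eps / N) * np.outer(h, h)         # unlearning step (2)

    # The rescaled matrix eps*m/N * J^(m) should approach J^P with probability one.
    dev = np.abs(eps * m_total / N * J - JP).max()
    print(f"max deviation from the projector rule matrix: {dev:.2e}")

For random patterns the full set is linearly independent with high probability, so $s = p$ here and $J^P$ can be computed directly from the inverse of the $p \times p$ overlap matrix.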

4. Concluding remarks.

In this paper, we have proposed an unlearning algorithm for the iterative self-correction of Hebbian connectivity which operates without pattern presentation. We have proven that, for any prescribed set of $p < N$ patterns and sufficiently small unlearning strengths, the renormalized iterated connection matrix approaches the projector rule matrix designed by any maximal linearly independent subset of the whole set of patterns.

It is worth noting that, as one should expect, the convergence of our algorithm is much slower than that of iterative methods utilizing recurrent pattern presentation; this is indeed supported by preliminary numerical simulations. An investigation of how the convergence rate depends upon the parameters of the model, and of how to optimize the unlearning strength, is beyond the scope of this paper. These problems will be examined in a forthcoming paper treating the model in the thermodynamical limit [22].

Finally, we recall that an efficient iterative algorithm which allows one to reach the matrix of optimal storage has been constructed [5]. In this connection, the intriguing question arises whether an algorithm of non-informational connectivity self-correction implementing the same function can be developed. As yet we have no answer to this question.

Acknowledgements.

We would like to thank V. Dotsenko for useful discussions and N. Plakhova for helpful comments on the manuscript.

Appendix.

PROPOSITION 1. If $\varepsilon < \varepsilon_c = \lambda_{\max}^{-1}$, then $0 < \Delta_m \le 1$, $m = 1, 2, \ldots$

By using (6), for an arbitrary $N$-vector $f$ one has

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j = \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j - \frac{\varepsilon}{N} \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i S_j^{(m)} \Big)^2. \qquad (A1)$$

(Here the $J^{(0)}_{ij}$ are taken to be the Hebbian couplings (1).) As a consequence, the chain of inequalities

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \le \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \le \cdots \le \sum_{i,j=1}^{N} J^{(0)}_{ij} f_i f_j \le \lambda_{\max} |f|^2$$

holds. In the special case $f = S^{(m)}$,

$$N^{-1} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \le \lambda_{\max}. \qquad (A2)$$


We will now prove by induction that, provided $\varepsilon < \lambda_{\max}^{-1}$, the matrices $J^{(m)}$ are positive semidefinite, i.e.

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \ge 0 \qquad \forall f \ne 0.$$

The Hebb matrix $J^{(0)}$ is known to be positive semidefinite. Let us suppose positive semidefiniteness of $J^{(m-1)}$ for some step $m$ and prove that of $J^{(m)}$. For the symmetric positive semidefinite bilinear form associated with the matrix $J^{(m-1)}$, the Cauchy-Schwarz inequality can be written down as

$$\Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i S_j^{(m)} \Big)^2 \le \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \Big) \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \Big). \qquad (A3)$$

Substituting (A3) and (A2) into (A1), one gets

$$\sum_{i,j=1}^{N} J^{(m)}_{ij} f_i f_j \ge \Big( \sum_{i,j=1}^{N} J^{(m-1)}_{ij} f_i f_j \Big) \big(1 - \varepsilon \lambda_{\max}\big) \ge 0$$

for any nonzero $f$, and $J^{(m)}$ is thus positive semidefinite.

From the definition (8) of $\Delta_m$, one immediately obtains

$$\Delta_m = 1 - \varepsilon N^{-1} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)}.$$

On account of the positive semidefiniteness of $J^{(m-1)}$, and in view of (A2), one comes to

$$0 \le N^{-1} \sum_{i,j=1}^{N} J^{(m-1)}_{ij} S_i^{(m)} S_j^{(m)} \le \lambda_{\max},$$

which proves the statement.

PROPOSITION 2. If $\varepsilon < \lambda_{\max}^{-1}$, then $\Delta_m \ge 1 - \varepsilon \lambda_{\max} b_{\max}^{(m-1)}$, $m = 1, 2, \ldots$

For the $s$-vector $g^{(m)}$, one has

$$\sum_{\mu=1}^{s} \big(g_\mu^{(m)}\big)^2 \le \sum_{\mu=1}^{p} \big(g_\mu^{(m)}\big)^2 = N^{-1} \sum_{i,j=1}^{N} J^{(0)}_{ij} S_i^{(m)} S_j^{(m)} \le \lambda_{\max}.$$

Hence

$$\Delta_m = 1 - \varepsilon \sum_{\mu,\nu=1}^{s} B^{(m-1)}_{\mu\nu} g_\mu^{(m)} g_\nu^{(m)} \ge 1 - \varepsilon \lambda_{\max} b_{\max}^{(m-1)},$$

which is what was required.

References

[1] Hopfield J. J., Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[2] Amit D. J., Gutfreund H. and Sompolinsky H., Phys. Rev. A 32 (1985) 1007.
[3] Amit D. J., Gutfreund H. and Sompolinsky H., Ann. Phys. 173 (1987) 30.
[4] Diederich S. and Opper M., Phys. Rev. Lett. 58 (1987) 949.
[5] Krauth W. and Mezard M., J. Phys. A 20 (1987) L745.
[6] Gardner E., J. Phys. A 21 (1988) 257.
[7] Pöppel G. and Krey U., Europhys. Lett. 4 (1987) 979.
[8] Gardner E., Stroud N. and Wallace D. J., J. Phys. A 22 (1989) 2019.
[9] Abbott L. F. and Kepler T. B., J. Phys. A 22 (1989) L711.
[10] Blatt M. G. and Vergini E. G., Phys. Rev. Lett. 66 (1991) 1793.
[11] Personnaz L., Guyon I. and Dreyfus G., J. Phys. (France) Lett. 46 (1985) L359.
[12] Kanter I. and Sompolinsky H., Phys. Rev. A 35 (1987) 380.
[13] Berryman K. W., Inchiosa M. E., Jaffe A. M. and Janowsky S. A., J. Phys. A 23 (1990) L223.
[14] Opper M., Europhys. Lett. 8 (1989) 389.
[15] Hopfield J. J., Feinstein D. I. and Palmer R. G., Nature 304 (1983) 158.
[16] Kleinfeld D. and Pendergraft D. B., Biophys. J. 51 (1987) 47.
[17] van Hemmen J. L., Ioffe L. B., Kühn R. and Vaas M., Physica A 163 (1990) 386.
[18] Dotsenko V. S., Yarunin N. D. and Dorotheyev E. A., J. Phys. A 24 (1991) 2419.
[19] Dotsenko V. S. and Tirozzi B., J. Phys. A 24 (1991) 5163.
[20] Forrest B. M., J. Phys. A 21 (1988) 245.
[21] Lamperti J., Probability (Benjamin, New York, 1966).
[22] Plakhov A. Yu. and Semenov S. A., in preparation.
