HAL Id: hal-00147406
https://hal.archives-ouvertes.fr/hal-00147406
Submitted on 16 May 2007
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
To cite this version:
Vincent Vigneron, Ludovic Aubry. More on stationary points in Independent Component Analysis. 9th European Symposium on Artificial Neural Networks, Apr 2001, Bruges, Belgium. 6 p. hal-00147406
More on stationary points in Independent Component Analysis

Vincent Vigneron¹, Ludovic Aubry²

¹ MATISSE-SAMOS UMR 8595, 90 rue de Tolbiac, 75634 Paris cedex 13
² CEMIF, 40 rue du Pelvoux, 91020 Evry Courcouronnes
Abstract. In this paper, we will focus on the problem of blind source separation for independent and identically distributed (iid) variables. The problem may be stated as follows: we observe a linear (unknown) mixture of k iid variables (the sources), and we want to recover either the sources or the linear mapping. We give online stability conditions of the algorithm using the eigenvalues of the hessian matrix of the pseudo-likelihood matching our set of observations.
1 Introduction

The process of discovering the linear mapping is called Blind Source Separation (BSS), because we don't want to make any assumption on the distribution of the sources, except that they are mutually independent. A review of several methods recovering the mapping can be found in [2]. Te-Won Lee shows that many of these methods are equivalent to finding the maximum of entropy of the observations assuming a given distribution of the sources [6]. Cardoso & Amari show that the mixture can be recovered even with wrong assumptions on the distributions, provided they satisfy a few stability conditions.
We consider the case where we observe a $d$-dimensional random vector $X$. The vector $X$ is obtained from a linear (non-random) mapping of a source variable $S$; thus we have $X(t) = AS(t)$ for all of our observations. Let $S(t) = [s_1(t), \dots, s_d(t)]$ be the sources such that for each instant $t$, the source $s_i(t)$ has the probability density function (pdf) $p_i$ (independent of $t$). With the (only) assumption that $S$ has independent components, its joint pdf is $p = \prod_{i=1}^d p_i$.
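As a concrete illustration of this generative model (a sketch, not taken from the paper; the Laplace sources and the random choice of $A$ are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 10_000

# iid sources with mutually independent components (Laplace, chosen arbitrarily)
S = rng.laplace(size=(d, n))

# unknown, non-random mixing matrix A (drawn once here for the demo)
A = rng.normal(size=(d, d))

# observations: X(t) = A S(t), one column per instant t = 1, ..., n
X = A @ S
```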
Let us observe $n$ realisations $x(t)$ of the variable $X$ such that $x(t) = As(t)$ for $t = 1, \dots, n$, with an unknown $d \times d$ matrix $A$. Further, we will propose a model distribution for $S$, noted $g = \prod_{i=1}^d g_i$. Several papers from Comon [5], Cardoso
[3], and Amari [1] show that it is possible to recover a satisfying estimation of the matrix $A^{-1}$ even when $g$ is different from $p$, under certain conditions (such as $g_i$ being sub-gaussian if $p_i$ is).
In order to obtain an estimator of $A$, we have to minimize a given criterion; most such criteria are based on entropy and mutual information. Typically we would minimize the criterion $H(BX)$, or $-E[\log g(BX)]$, with respect to $B$. The best choice here would be to take for $g$ the distribution of the sources². Such a criterion is minimal when the coordinates of the vector $BX$ are independent and the distribution of $G(BX)$ is uniform ($G$ is the cumulative density function of $g$).

Works from Cardoso [3] and Cardoso & Amari [4] give conditions on $g$ and $p$ so that the minimisation algorithm is stable. These conditions show that a broad range of functions $g$ can be used.
2 Maximum likelihood
Let us define a likelihood function matching our set of observations: let $\{A, (G_B)_{B \in GL_d(\mathbb{R})}\}$ be a statistical model of the sources $s$. In our case, we don't know the true distribution $P$ ($P(ds) = p(s)\,ds$) of the sources $s$, and we don't assume that $P$ belongs to our parametric model $(G_B)_{B \in GL_d(\mathbb{R})}$. That is, we are not doing maximum likelihood estimation. Nevertheless, we will define a pseudo log-likelihood with:
$$U_n(B) = -\frac{1}{n} \sum_{i=1}^n \log\big(|\det B|\, g(BX_i)\big),$$
where the variables $(X_i)_{i=1,\dots,n}$ are the observations, and $g(x) = \prod_{k=1}^d g_k(x_k)$ is the density of $G_B(dx) = |\det B|\, g(Bx)\, dx$. This sequence of functions of the observations is called a contrast process if it verifies some simple properties. The main condition is that it converges in probability toward a contrast function whose minimum is our solution. In fact, the requirement on $U_n(B)$ is a bit broader because, as we will see later, we only need its gradient to cancel at the solution point in order to define a sequence of estimators of $A^{-1}$.
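As a concrete reading of this definition (an illustration, not code from the paper; the sech-type model density $g_k(y) = 1/(\pi \cosh y)$ is an assumption):

```python
import numpy as np

def U_n(B, X):
    """Pseudo (negative) log-likelihood U_n(B) = -(1/n) sum_i log(|det B| g(B X_i))."""
    Y = B @ X                                    # Y[:, i] = B X_i, shape (d, n)
    _, logdet = np.linalg.slogdet(B)             # log |det B|
    log_g = -np.log(np.pi) - np.log(np.cosh(Y))  # log g_k, applied elementwise
    return -(logdet + log_g.sum(axis=0).mean())
```

Minimizing this quantity over invertible matrices $B$ is exactly the contrast estimation described below.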
Lemma 1. If $E\big[\,|\log(|\det B|\, g(BX))|\,\big] < \infty$, we have the convergence in probability $P$:
$$\lim_{n \to \infty} U_n(B) = -E[\log(|\det B|\, g(BX))] = -\int \log(|\det B|\, g(BAs))\, p(s)\, ds \equiv C(B, A^{-1}) = K(g_{BA}\,\|\,p) + H(p) + \log|\det A|. \qquad (1)$$
² Recently, quasi-optimal methods with online estimation of pdfs and of score functions have been proposed.
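One way to see where the decomposition in (1) comes from (a sketch under the sign conventions above, writing $p_Y$ for the density of $Y = BX$ and using $H(p_Y) = H(p) + \log|\det BA|$):
$$-E[\log(|\det B|\, g(BX))] = -\log|\det B| - E[\log g(Y)] = -\log|\det B| + K(p_Y \,\|\, g) + H(p_Y) = K(p_Y \,\|\, g) + H(p) + \log|\det A|,$$
since $-E[\log g(Y)] = \int p_Y \log(p_Y/g) - \int p_Y \log p_Y$. The divergence term vanishes exactly when the model distribution matches that of the recovered sources, which is the content of (1).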
$C(B; A^{-1})$ is called a contrast function if $B \mapsto C(B; A^{-1})$ has a strict minimum at the point $B = A^{-1}$. The process $U_n(B)$ is called a contrast process, and $\hat{B}_n = \arg\min_B U_n(B)$ is called the contrast estimate.
From equation (1), it is clear that $C(B; A^{-1}) \geq H(p) + \log|\det A|$, with equality only if the distributions $g_{BA}$ and $p$ are the same. Yet we have $g \neq p$, and we need to prove that $B = \Lambda A^{-1}$, where $\Lambda$ is the product of a scale matrix and a permutation matrix, is a minimum.

There is no general condition ensuring that
$$E\big[\,|\log(|\det B|\, g(BX))|\,\big] < \infty,$$
but we can enumerate several necessary conditions.

We show that the function $C(B; A^{-1})$ has several minima of the form $\Lambda A^{-1}$.
3 Stationary points

We call stationary points the matrices of $GL_d(\mathbb{R})$ such that $dU_n(B) = 0$. These points are good candidates for maxima and minima of our contrast function. Furthermore, we show that there exist such points that are always solutions to our problem.
Let us compute the total differential of our contrast process with respect to the inverse mixing matrix $B$. From
$$U_n(B) = -\frac{1}{n}\sum_{i=1}^n \log(|\det B|\, g(BX_i))$$
we obtain
$$dU_n(B) = -\mathrm{Trace}(dB\, B^{-1}) - \frac{1}{n}\sum_{i=1}^n \varphi^T(BX_i)\, d(BX_i),$$
with $\varphi(x_1, \dots, x_d) = \left[ \frac{g_1'(x_1)}{g_1(x_1)}, \dots, \frac{g_d'(x_d)}{g_d(x_d)} \right]^T$.
Remark 1. If we define the mapping $dW$ as $dW = dB\, B^{-1}$, and $Y_i = BX_i$, we have
$$dU_n(B) = -\mathrm{Trace}(dW) - \frac{1}{n}\sum_{i=1}^n \varphi^T(Y_i)\, dB\, B^{-1} B X_i = -\mathrm{Trace}(dW) - \frac{1}{n}\sum_{i=1}^n \varphi^T(Y_i)\, dW\, Y_i.$$
This mapping does not correspond to a change of variable, although it represents a local change of coordinates. As the only points of interest for us are those of the form $\Lambda A^{-1}$, we will see that with the change of parameters $W = BA^{-1}$, the hessian matrix has a block diagonal form at each stationary point.
From the differential $dU_n$ we have:
$$\frac{\partial U_n(B)}{\partial B} = -B^{-T} - \frac{1}{n}\sum_{i=1}^n \varphi(BX_i)\, X_i^T.$$
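For intuition (an illustrative sketch, not the authors' algorithm; the $-\tanh$ score matches the sech-type model density assumed earlier), this gradient can be descended directly, and the relative form $dW = dB\, B^{-1}$ of Remark 1 yields the familiar equivariant update:

```python
import numpy as np

def grad_U(B, X, phi):
    """Euclidean gradient: dU_n/dB = -B^{-T} - (1/n) sum_i phi(B X_i) X_i^T."""
    Y = B @ X
    return -np.linalg.inv(B).T - (phi(Y) @ X.T) / X.shape[1]

def relative_step(B, X, phi, lr=0.01):
    """One descent step in the relative parametrisation dW = dB B^{-1}:
    dU_n/dW = -I - (1/n) sum_i phi(Y_i) Y_i^T, then B <- B - lr (dU_n/dW) B."""
    Y = B @ X
    G = -np.eye(B.shape[0]) - (phi(Y) @ Y.T) / X.shape[1]
    return B - lr * (G @ B)

phi = lambda y: -np.tanh(y)  # hypothetical score g'/g of the sech model
```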
Setting this gradient to zero, the stationary points are obtained up to the product of a scale matrix and a permutation ($\delta_{i,\sigma(j)}$). This is the equivariance property as introduced by Comon [5]: we only need (and can) recover the mixing matrix up to a permutation; thus we only require unicity of the minima up to a permutation and scaling of the matrix $A$.
Let the set of $\lambda_{i,j}$ be solutions of the integral equations $1 + E[\varphi_i(\lambda_{i,j}\, s_j)\, \lambda_{i,j}\, s_j] = 0$. For any permutation $\sigma$ of $\{1,\dots,d\}$, we define the matrix $\Lambda_\sigma$ whose components are $\lambda_{i,\sigma(i)}\, \delta_{\sigma(i),j}$. Let $B_\sigma = \Lambda_\sigma A^{-1}$; then
$$I_d + E[\varphi(B_\sigma X)(B_\sigma X)^T] = I_d + E[\varphi(\Lambda_\sigma S)(\Lambda_\sigma S)^T],$$
that is, for each element $(i,j)$ we have $\delta_{i,j} + E[\varphi_i((\Lambda_\sigma S)_i)\, (\Lambda_\sigma S)_j] = 0$. Indeed, the left member of the equation is
$$D_{i,j} = \begin{cases} 0 & \text{if } i \neq j,\\ 1 + E[\varphi_i(\lambda_{i,\sigma(i)}\, s_{\sigma(i)})\, \lambda_{i,\sigma(i)}\, s_{\sigma(i)}] = 0 & \text{if } i = j. \end{cases}$$
It remains to prove the existence of such solutions (or to give conditions on the distributions $p$ and $g$) and their uniqueness.
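Numerically, the scalar integral equation is easy to solve for a given pair $(p_j, g_i)$ (an illustration with the assumptions used above, Monte Carlo plus root bracketing; not from the paper):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
s = rng.laplace(size=100_000)            # samples of one source s_j
phi = lambda y: -np.tanh(y)              # assumed model score g'/g

# F(lambda) = 1 + E[phi(lambda s) lambda s]; its root gives lambda_{i,j}
F = lambda lam: 1.0 + np.mean(phi(lam * s) * lam * s)
lam = brentq(F, 0.1, 10.0)               # scale fixed by the (p, g) pair
```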
3.1 Stability conditions
We just showed that $B_\sigma$ is a good candidate for a local minimum; we now need to prove that the hessian matrix $E[\nabla^2 U_n(B_\sigma)]$ is positive definite. This may not always be the case, however. Amari et al. [1] proposed a modification of the algorithm so that the hessian becomes positive definite. Let us first examine the one-dimensional case, for which
$$\frac{\partial U_n(B)}{\partial B} = -B^{-1} - \frac{1}{n}\sum_{i=1}^n \varphi(BX_i)\, X_i$$
and
$$\frac{\partial^2 U_n(B)}{\partial B^2} = \frac{1}{B^2} - \frac{1}{n}\sum_{i=1}^n \varphi'(BX_i)\, X_i^2.$$
For stability at a point $B^\star$ such that $\frac{1}{B^\star} + E[\varphi(B^\star X)\, X] = 0$, we need
$$\frac{1}{(B^\star)^2} - E[\varphi'(B^\star X)\, X^2] \geq 0.$$
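A quick numerical check of this one-dimensional condition (illustrative only; the $-\tanh$ score and Laplace data carry over from the sketches above):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
x = rng.laplace(size=100_000)            # 1-D observations (take A = 1)
phi  = lambda y: -np.tanh(y)             # assumed model score g'/g
dphi = lambda y: np.tanh(y)**2 - 1.0     # its derivative

# stationary point: 1/B + E[phi(B x) x] = 0
B_star = brentq(lambda B: 1.0 / B + np.mean(phi(B * x) * x), 0.1, 10.0)

# stability: 1/B*^2 - E[phi'(B* x) x^2] >= 0
stable = 1.0 / B_star**2 - np.mean(dphi(B_star * x) * x**2) >= 0
```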
3.2 Hessian matrix form
Noting that $\frac{\partial b^{-T}_{k\ell}}{\partial b_{ij}} = -b^{-1}_{jk}\, b^{-1}_{\ell i}$, we can compute the Hessian as follows:
$$H_{ijk\ell}(B) = \frac{\partial^2 U_n(B)}{\partial b_{ij}\, \partial b_{k\ell}} = \frac{\partial}{\partial b_{ij}}\left[ -b^{-T}_{k\ell} - \frac{1}{n}\sum_{r=1}^n \varphi_k(BX_r)\, X^r_\ell \right] = b^{-1}_{\ell i}\, b^{-1}_{jk} - \frac{1}{n}\sum_{r=1}^n \varphi_k'(BX_r)\, X^r_\ell\, X^r_j\, \delta_{ik}.$$
$B_\sigma$ is a strict minimum of $C(B; A^{-1})$ iff $E[H(B_\sigma)]$ is positive definite, i.e.,
$$E[H_{ijk\ell}(B_\sigma)] = (B_\sigma^{-1})_{\ell i}\, (B_\sigma^{-1})_{jk} - E[\varphi_k'((B_\sigma X)_k)\, X_\ell\, X_j]\, \delta_{ik}.$$
Assuming $\Lambda$ is a diagonal matrix ($\Lambda$ being a solution of $I_d + E[\varphi(\Lambda S)(\Lambda S)^T] = 0$):
$$E[H_{ijk\ell}(B_\sigma)] = \frac{a_{\ell i}\, a_{jk}}{\lambda_i \lambda_k} - \sum_p a_{\ell p}\, a_{jp}\, E[\varphi_k'(\lambda_k s_k)\, s_p^2]\, \delta_{ik},$$
because $p \neq q$ implies $E[\varphi_k'(\lambda_k s_k)\, s_p s_q] = 0$. We note that if $Q(B) = \sum_{ijk\ell} b_{ij}\, b_{k\ell}\, H_{ijk\ell}$ is a positive definite quadratic form, then so is $W \mapsto Q(WA^{-1})$, which can also be written
$$Q(WA^{-1}) = \sum_{ijk\ell} \sum_{pq} w_{ip}\, w_{kq}\, a^{-1}_{pj}\, a^{-1}_{q\ell}\, H_{ijk\ell} = \sum_{ipkq} w_{ip}\, w_{kq}\, U_{ipkq}, \qquad U_{ipkq} = \sum_{j\ell} a^{-1}_{pj}\, a^{-1}_{q\ell}\, H_{ijk\ell}.$$
So it is equivalent to prove that
$$U_{ijk\ell} = \sum_{u,v} a^{-1}_{ju}\, a^{-1}_{\ell v}\, E[H_{iukv}(B_\sigma)] = \frac{\delta_{jk}\, \delta_{i\ell}}{\lambda_i \lambda_j} - E[\varphi_k'(\lambda_k s_k)\, s_j^2]\, \delta_{j\ell}\, \delta_{ik}$$
is positive definite. If we rewrite the matrices $M_{ij}$ as the vector $[M_{12}, M_{21}, \dots, M_{ij}, M_{ji}, \dots, M_{dd}]^T$, the transformed hessian $U_{ijk\ell}$ takes the block diagonal matrix form
$$U = \begin{pmatrix} \ddots & & & \\ & \begin{matrix} U_{ijij} & U_{jiij} \\ U_{ijji} & U_{jiji} \end{matrix} & & 0 \\ & & \ddots & \\ 0 & & & U_{iiii} \end{pmatrix}$$
which leads us to define, for all $i < j$:
$$U_{ij} = \begin{pmatrix} \alpha_{ij} & \beta_{ij} \\ \beta_{ij} & \alpha_{ji} \end{pmatrix}, \qquad U_i = U_{iiii} = \alpha_{ii} + \beta_{ii},$$
where $\alpha_{ij} = -E[\varphi_i'(\lambda_i s_i)\, s_j^2]$ and $\beta_{ij} = \frac{1}{\lambda_i \lambda_j}$. The simplified solution where $\lambda_i = 1$ is
$$U_{ij} = \begin{pmatrix} -E[\varphi_i'(s_i)]\, E[s_j^2] & 1 \\ 1 & -E[\varphi_j'(s_j)]\, E[s_i^2] \end{pmatrix}, \qquad U_i = 1 - E[\varphi_i'(s_i)\, s_i^2].$$
From these two expressions we can now give the stability conditions: $U_i > 0$, and the eigenvalues of $U_{ij}$ positive. The eigenvalues of $U_{ij}$ are the solutions of
$$(\alpha_{ij} - x)(\alpha_{ji} - x) - \beta_{ij}^2 = 0,$$
i.e.
$$x_{1,2} = \frac{1}{2}\left[ \alpha_{ij} + \alpha_{ji} \pm \sqrt{(\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2} \right].$$
We need $x_1 > 0$ and $x_2 > 0$. In our case, $x_1$ and $x_2$ are real numbers because $U_{ij}$ is symmetric. Thus, since $x_1 > x_2$,
$$x_2 > 0 \iff (\alpha_{ij} + \alpha_{ji}) > \sqrt{(\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2} > 0 \iff (\alpha_{ij} + \alpha_{ji})^2 > (\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2 \iff \alpha_{ij}\, \alpha_{ji} > \beta_{ij}^2,$$
that is,
$$E[\varphi_i'(\lambda_i s_i)\, (\lambda_j s_j)^2]\; E[\varphi_j'(\lambda_j s_j)\, (\lambda_i s_i)^2] > 1.$$
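As a sanity check (illustrative, with the assumed score and $\lambda_i = 1$), the blocks $U_{ij}$ and diagonal terms $U_i$ can be estimated from source samples and the conditions tested directly:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 100_000
S = rng.laplace(size=(d, n))             # sources, taking lambda_i = 1
dphi = lambda y: np.tanh(y)**2 - 1.0     # derivative of phi(y) = -tanh(y)

kappa = -dphi(S).mean(axis=1)            # -E[phi'_i(s_i)]
sigma2 = (S**2).mean(axis=1)             # E[s_j^2]
alpha = np.outer(kappa, sigma2)          # alpha[i, j] = -E[phi'_i] E[s_j^2]

stable = all(1.0 - np.mean(dphi(S[i]) * S[i]**2) > 0 for i in range(d))
for i in range(d):
    for j in range(i + 1, d):
        U_ij = np.array([[alpha[i, j], 1.0], [1.0, alpha[j, i]]])
        stable = stable and bool(np.all(np.linalg.eigvalsh(U_ij) > 0))
```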
4 Conclusion
This last formula allows us to check the stability conditions online, by estimating the values $E[\varphi_i'(\lambda_i s_i)]$ and $E[(\lambda_j s_j)^2]$ with, respectively, $\frac{1}{n}\sum_{k=1}^n \varphi_i'\big((\hat{B}_n X_k)_i\big)$ and $\frac{1}{n}\sum_{k=1}^n \big((\hat{B}_n X_k)_j\big)^2$.
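A sketch of this online check (illustrative; $\varphi$ as before, with $\hat{B}_n$ the current unmixing estimate):

```python
import numpy as np

def stability_check(B_hat, X, dphi):
    """Estimate the stability quantities from the current estimate:
    kappa[i] ~ -E[phi'_i(y_i)], sigma2[j] ~ E[y_j^2], with y = B_hat x."""
    Y = B_hat @ X                        # recovered sources, d x n
    kappa = -dphi(Y).mean(axis=1)
    sigma2 = (Y**2).mean(axis=1)
    c = np.outer(kappa, sigma2)          # c[i, j] = -E[phi'_i] E[y_j^2]
    return c * c.T > 1.0                 # off-diagonal entries test each pair (i, j)
```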
References
[1] Amari, S., Chen, T.P., and Cichocki, A. (1997). Stability analysis of learning algorithms for blind source separation. Neural Networks, Vol. 10, No. 8, pp. 1345-1351.

[2] Cardoso, J.-F. (1997). Statistical principles of source separation. In Proc. of SYSID'97, 11th IFAC Symposium on System Identification, Fukuoka (Japan), pages 1837-1844.

[3] Cardoso, J.-F. (1998). On the stability of some source separation algorithms. In Proc. of the 1998 IEEE SP Workshop on Neural Networks for Signal Processing (NNSP'98), pages 13-22.

[4] Cardoso, J.-F. and Amari, S.-I. (1997). Maximum likelihood source separation: equivariance and adaptivity. In Proc. of SYSID'97, 11th IFAC Symposium on System Identification, Fukuoka (Japan), pages 1063-1068.

[5] Comon, P. (1994). Independent Component Analysis, a new concept? Signal Processing, 36, 287-314.

[6] Lee, T.-W. (1998). Independent Component Analysis and Applications. Kluwer Academic Publishers.