

HAL Id: hal-00147406

https://hal.archives-ouvertes.fr/hal-00147406

Submitted on 16 May 2007


To cite this version:
Vincent Vigneron, Ludovic Aubry. More on stationnary points in Independent Component Analysis. 9th European Symposium on Artificial Neural Networks (ESANN 2001), Apr 2001, Bruges, Belgium. D-Facto, ISBN 2-930307-01-3, pp. 39-44. ⟨hal-00147406⟩


More on stationary points in Independent Component Analysis

Vincent Vigneron¹, Ludovic Aubry²

¹ MATISSE-SAMOS UMR 8595, 90, rue de Tolbiac, 75634 Paris cedex 13
² CEMIF, 40, rue du Pelvoux, 91020 Evry Courcouronnes

Abstract. In this paper, we focus on the problem of blind source separation for independent and identically distributed (iid) variables. The problem may be stated as follows: we observe a linear (unknown) mixture of k iid variables (the sources), and we want to recover either the sources or the linear mapping. We give online stability conditions for the algorithm, using the eigenvalues of the Hessian matrix of the pseudo-likelihood matching our set of observations.

1 Introduction

The process of discovering the linear mapping is called Blind Source Separation (BSS), because we do not want to make any assumption on the distribution of the sources, except that they are mutually independent. A review of several methods recovering the mapping can be found in [2]. Te-Won Lee shows that many of these methods are equivalent to finding the maximum of entropy of the observations assuming a given distribution of the sources [6]. Cardoso & Amari show that the mixture can be recovered even with wrong assumptions on the distributions, provided they satisfy a few stability conditions.

We consider the case where we observe a $d$-dimensional random vector $X$. The vector $X$ is obtained from a linear (non-random) mapping of a source variable $S$; thus we have $X(t) = AS(t)$ for all of our observations. Let $S(t) = [s_1(t),\dots,s_d(t)]$ be the sources such that, for each instant $t$, the source $s_i(t)$ has the probability density function (pdf) $p_i$ (independent of $t$). With the (only) assumption that $S$ has independent components, its joint pdf is $p = \prod_{i=1}^d p_i$.
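For concreteness, the following sketch builds a toy instance of this model; the Laplacian sources, the dimensions, and the particular $A$ are illustrative assumptions of ours, not choices made in the paper:

```python
import numpy as np

# Toy instance of the model X(t) = A S(t): d iid zero-mean sources with joint
# pdf p = prod_i p_i, mixed by a fixed (in practice unknown) d x d matrix A.
rng = np.random.default_rng(0)
d, n = 2, 10_000
S = rng.laplace(size=(d, n))      # sources s_i(t), iid over t, independent over i
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])        # mixing matrix (ground truth of the experiment)
X = A @ S                         # observations x(t) = A s(t)
```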

We observe $n$ realisations $x(t)$ of the variable $X$, such that $x(t) = As(t)$ for $t = 1,\dots,n$, with an unknown $d \times d$ matrix $A$. Further, we will propose a model distribution for $S$, noted $g = \prod_{i=1}^d g_i$. Several papers by Comon [5], Cardoso [3], and Amari [1] show that it is possible to recover a satisfying estimation of the matrix $A^{-1}$ even when $g$ is different from $p$, under certain conditions (such as $g_i$ being sub-gaussian if $p_i$ is).

In order to obtain an estimator of $A$, we have to minimize a given criterion; most of these criteria are based on entropy and mutual information. Typically, we would minimize the criterion $H(BX)$, or $-\mathbb{E}[\log g(BX)]$, with respect to $B$. The best choice here would be to take for $g$ the distribution of the sources². Such a criterion is minimal when the coordinates of the vector $BX$ are independent and the distribution of $G(BX)$ is uniform ($G$ is the cumulative density function of $g$).

Works by Cardoso [3] and Cardoso & Amari [4] give conditions on $g$ and $p$ so that the minimisation algorithm is stable. These conditions show that a broad range of functions $g$ can be used.

2 Maximum likelihood

Let us define a likelihood function matching our set of observations: let $\{\mathcal{A}, (G_B)_{B \in \mathrm{GL}_d(\mathbb{R})}\}$ be a statistical model of the sources $s$. In our case, we do not know the true distribution $P$ ($P(ds) = p(s)\,ds$) of the sources $s$, and we do not assume that $P$ belongs to our parametric model $(G_B)_{B \in \mathrm{GL}_d(\mathbb{R})}$; that is, we are not doing maximum likelihood estimation. Nevertheless, we will define a pseudo log-likelihood with

$$U_n(B) = -\frac{1}{n}\sum_{i=1}^{n}\log\big(|\det B|\,g(BX_i)\big),$$

where the variables $(X_i)_{i=1,\dots,n}$ are the observations, and $g(x) = \prod_{k=1}^{d} g_k(x_k)$ is the density of $G_B(dx) = |\det B|\,g(Bx)\,dx$. This sequence of functions of the observations is called a contrast process if it verifies some simple properties. The main condition is that it converges in probability toward a contrast function whose minimum is our solution. In fact, the requirement on $U_n(B)$ is a bit broader because, as we will see later, we only need that its gradient cancels at the solution point in order to define a sequence of estimators of $A^{-1}$.
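As a concrete reading of this definition, here is a minimal numerical sketch of $U_n$; it is our own illustration, assuming a standard Laplacian model density $g_k(y) = \tfrac{1}{2}e^{-|y|}$ for every component:

```python
import numpy as np

def log_g(Y):
    """log g(y) = sum_k log g_k(y_k), for the Laplacian model g_k(y) = exp(-|y|)/2."""
    return (-np.abs(Y) - np.log(2.0)).sum(axis=0)

def U_n(B, X):
    """Contrast process U_n(B) = -(1/n) * sum_i log(|det B| * g(B x_i))."""
    n = X.shape[1]
    return -(n * np.log(abs(np.linalg.det(B))) + log_g(B @ X).sum()) / n
```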

Lemma 1. If $\mathbb{E}\big[\,\big|\log(|\det B|\,g(BX))\big|\,\big] < \infty$, we have the convergence in probability $P$:

$$\lim_{n\to\infty} U_n(B) = -\,\mathbb{E}[\log(|\det B|\,g(BX))] = -\int \log\big(|\det B|\,g(BAs)\big)\,p(s)\,ds \triangleq C(B,A^{-1}) = K(g_{BA}\,\|\,p) + H(p) + \log|\det B|. \qquad (1)$$

² Recently, quasi-optimal methods with online estimation of pdf's and of score functions …

$C(B,A^{-1})$ is called a contrast function if $B \mapsto C(B,A^{-1})$ has a strict minimum at the point $B = A^{-1}$. The process $U_n(B)$ is called a contrast process, and $\hat B_n = \arg\inf_B U_n(B)$ is called the contrast estimate.

From (1), it is clear that $C(B,A^{-1}) \ge H(p) + \log|\det B|$, with equality only if the distributions $g_{BA}$ and $p$ are the same. Yet we have $g \neq p$, and we need to prove that $B = \Lambda A^{-1}$, where $\Lambda$ is the product of a scale matrix and a permutation matrix, is a minimum.

There is no general condition ensuring that $\mathbb{E}[\,|\log(|\det B|\,g(BX))|\,] < \infty$, but we can enumerate several necessary conditions. We show that the function $C(B,A^{-1})$ has several minima of the form $\Lambda A^{-1}$.

3 Stationary points

We call stationary points the matrices of $\mathrm{GL}_d(\mathbb{R})$ such that $dU_n(B) = 0$. Those points are good candidates for the maxima and minima of our contrast function. Furthermore, we show that there exist such points that are always solutions to our problem.

Let us compute the total differential of our contrast process with respect to the inverse mixing matrix $B$. From

$$U_n(B) = -\frac{1}{n}\sum_{i=1}^{n}\log\big(|\det B|\,g(BX_i)\big),$$

then

$$dU_n(B) = -\operatorname{Tr}\big(dB\,B^{-1}\big) - \frac{1}{n}\sum_{i=1}^{n}\varphi^T(BX_i)\,d(BX_i),$$

with $\varphi(x_1,\dots,x_d) \triangleq \Big[\dfrac{g_1'(x_1)}{g_1(x_1)},\dots,\dfrac{g_d'(x_d)}{g_d(x_d)}\Big]^T$.

Remark 1. If we define the mapping $dW$ as $dW = dB\,B^{-1}$, and $Y_i = BX_i$, we have

$$dU_n(B) = -\operatorname{Tr}(dW) - \frac{1}{n}\sum_{i=1}^{n}\varphi^T(Y_i)\,dB\,B^{-1}BX_i = -\operatorname{Tr}(dW) - \frac{1}{n}\sum_{i=1}^{n}\varphi^T(Y_i)\,dW\,Y_i.$$

This mapping does not correspond to a change of variable, although it represents a local change of coordinates. As the only points of interest for us are those of the form $\Lambda A^{-1}$, we will see that, with the change of parameters $W = BA^{-1}$, the Hessian matrix has a block-diagonal form at each stationary point.
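The coordinates $dW = dB\,B^{-1}$ are exactly what makes a relative-gradient update natural. Here is a minimal sketch of one descent step on $U_n$, assuming the score $\varphi = g'/g$ of a $1/\cosh$ model density; both that score and the step size $\mu$ are illustrative choices of ours, not prescribed by the paper:

```python
import numpy as np

def phi(Y):
    """Score phi = g'/g for a model density g(y) proportional to 1/cosh(y)."""
    return -np.tanh(Y)

def relative_gradient_step(B, X, mu=0.01):
    """One relative-gradient descent step on U_n.

    In the coordinates dW = dB B^{-1}, the gradient of U_n is
    -(I + E_hat[phi(Y) Y^T]) with Y = B X, so a descent step is
    B <- B + mu * (I + E_hat[phi(Y) Y^T]) B; it is stationary exactly
    when I + E_hat[phi(Y) Y^T] = 0.
    """
    d, n = X.shape
    Y = B @ X
    G = np.eye(d) + (phi(Y) @ Y.T) / n   # I + empirical E[phi(Y) Y^T]
    return B + mu * (G @ B)
```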

From the differential of $dU_n$ we have:

$$\frac{\partial U_n(B)}{\partial B} = -B^{-T} - \frac{1}{n}\sum_{i=1}^{n}\varphi(BX_i)\,X_i^T.$$


[…] matrix and a permutation $\lambda_{i,\sigma(j)}$). This is the equivariance property introduced by Comon [5]: we only need (and can) recover the mixing matrix up to a permutation; thus we only require unicity of the minima up to a permutation and a scaling of the matrix $A$.

Let the set of $\lambda_{i,j}$ be solutions of the integral equations

$$1 + \mathbb{E}[\varphi_i(\lambda_{i,j}\,s_j)\,\lambda_{i,j}\,s_j] = 0.$$

For any permutation $\sigma$ of $\{1,\dots,d\}$, we define the matrix $\Lambda_\sigma$ whose components are $\lambda_{i,\sigma(i)}\,\delta_{\sigma(i),j}$. Let $B_\sigma = \Lambda_\sigma A^{-1}$; then

$$I_d + \mathbb{E}[\varphi(B_\sigma X)(B_\sigma X)^T] = I_d + \mathbb{E}[\varphi(\Lambda_\sigma S)(\Lambda_\sigma S)^T],$$

that is, for each element $(i,j)$ we have $\delta_{i,j} + \mathbb{E}[\varphi_i((\Lambda_\sigma S)_i)\,(\Lambda_\sigma S)_j] = 0$. The left member of the equation is

$$D_{i,j} = \begin{cases} 0 & \text{if } i \neq j,\\ 1 + \mathbb{E}[\varphi_i(\lambda_{i,\sigma(i)}\,s_{\sigma(i)})\,\lambda_{j,\sigma(j)}\,s_{\sigma(j)}] = 0 & \text{if } i = j.\end{cases}$$

We have yet to prove the existence of such solutions (or give some conditions on the distributions $p$ and $g$), and their uniqueness.
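Existence aside, each $\lambda_{i,j}$ solves a one-dimensional moment equation, so it can be approximated numerically. A sketch of such a search (the grid bounds and the sample size are arbitrary assumptions):

```python
import numpy as np

def solve_lambda(phi_i, s_j, lams=np.linspace(0.05, 5.0, 500)):
    """Grid search for a root of 1 + E[phi_i(lam * s_j) * lam * s_j] = 0,
    the expectation being replaced by a Monte-Carlo average over samples s_j."""
    vals = np.array([1.0 + np.mean(phi_i(lam * s_j) * (lam * s_j)) for lam in lams])
    k = int(np.argmin(np.abs(vals)))
    return lams[k], vals[k]

# Example: for phi_i(y) = -sign(y) (Laplacian model) and standard Laplacian
# samples, the equation reads 1 - lam * E|s_j| = 0, so lam should be close to 1.
rng = np.random.default_rng(1)
lam, residual = solve_lambda(lambda y: -np.sign(y), rng.laplace(size=100_000))
```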

3.1 Stability conditions

We just showed that $B_\sigma$ is a good candidate for a local minimum; we need to prove that the Hessian matrix $\mathbb{E}[\nabla^2 U_n(B_\sigma)]$ is positive definite. This may not always be the case, however. Amari et al. [1] proposed a modification of the algorithm so that the Hessian becomes positive definite. Let us first examine the one-dimensional case, for which

$$\frac{\partial U_n(B)}{\partial B} = -\frac{1}{B} - \frac{1}{n}\sum_{i=1}^{n}\varphi(BX_i)\,X_i \qquad\text{and}\qquad \frac{\partial^2 U_n(B)}{\partial B^2} = \frac{1}{B^2} - \frac{1}{n}\sum_{i=1}^{n}\varphi'(BX_i)\,X_i^2.$$

For stability at a point $B$ such that $\frac{1}{B} + \mathbb{E}[\varphi(BX)X] = 0$, we need that $\frac{1}{B^2} - \mathbb{E}[\varphi'(BX)X^2] \ge 0$.
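Both one-dimensional conditions are plain moment conditions, so they can be checked by Monte-Carlo averages. A sketch, assuming a differentiable score (here $\varphi = -\tanh$, $\varphi' = \tanh^2 - 1$, an illustrative model choice):

```python
import numpy as np

def check_1d(B, x, phi=lambda y: -np.tanh(y), dphi=lambda y: np.tanh(y)**2 - 1.0):
    """Monte-Carlo check of the 1-D conditions: stationarity 1/B + E[phi(BX)X] = 0
    and curvature 1/B**2 - E[phi'(BX) X**2] >= 0 (local minimum of U_n)."""
    y = B * x
    stationarity = 1.0 / B + np.mean(phi(y) * x)
    curvature = 1.0 / B**2 - np.mean(dphi(y) * x**2)
    return stationarity, curvature
```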


3.2 Hessian matrix form

Noting that $\frac{\partial (B^{-T})_{k\ell}}{\partial b_{ij}} = -\,b^{-1}_{\ell i}\,b^{-1}_{jk}$, we can compute the Hessian as follows:

$$H_{ijk\ell}(B) = \frac{\partial^2 U_n(B)}{\partial b_{ij}\,\partial b_{k\ell}} = \frac{\partial}{\partial b_{ij}}\Big[-(B^{-T})_{k\ell} - \frac{1}{n}\sum_{r=1}^{n}\varphi_k\big((BX_r)_k\big)\,X_r^\ell\Big] = b^{-1}_{\ell i}\,b^{-1}_{jk} - \frac{1}{n}\sum_{r=1}^{n}\varphi_k'\big((BX_r)_k\big)\,X_r^\ell\,X_r^j\,\delta_{ik}.$$

$B_\sigma$ is a strict minimum of $C(B,A^{-1})$ iff $\mathbb{E}[H(B_\sigma)]$ is positive definite, i.e.

$$\mathbb{E}[H_{ijk\ell}(B_\sigma)] = (A\Lambda^{-1})_{\ell i}\,(A\Lambda^{-1})_{jk} - \mathbb{E}\big[\varphi_k'\big((\Lambda S)_k\big)\,X_\ell\,X_j\big]\,\delta_{ik}.$$

Assuming $\Lambda$ is a diagonal matrix ($\Lambda$ is a solution of $I_d + \mathbb{E}[\varphi(\Lambda S)(\Lambda S)^T] = 0$):

$$\mathbb{E}[H_{ijk\ell}(B_\sigma)] = \frac{a_{\ell i}\,a_{jk}}{\lambda_i\,\lambda_k} - \sum_p a_{\ell p}\,a_{jp}\,\mathbb{E}[\varphi_k'(\lambda_k s_k)\,s_p^2]\,\delta_{ik},$$

because $p \neq q$ implies $\mathbb{E}[\varphi_k'(\lambda_k s_k)\,s_p\,s_q] = 0$ for independent, zero-mean sources. We note that if $Q(B) = \sum_{ijk\ell} b_{ij}\,b_{k\ell}\,H_{ijk\ell}$ is a positive definite quadratic form, then so is $W \mapsto Q(WA^{-1})$, which can also be written

$$Q(WA^{-1}) = \sum_{ijk\ell}\sum_{pq} w_{ip}\,w_{kq}\,a^{-1}_{pj}\,a^{-1}_{q\ell}\,H_{ijk\ell} = \sum_{ipkq} w_{ip}\,w_{kq}\,U_{ipkq}, \qquad\text{with } U_{ipkq} = \sum_{j\ell} a^{-1}_{pj}\,a^{-1}_{q\ell}\,H_{ijk\ell}.$$

So it is equivalent to prove that

$$\sum_{j,\ell} a^{-1}_{uj}\,a^{-1}_{v\ell}\,\mathbb{E}[H_{ijk\ell}(\Lambda A^{-1})] = \sum_{j,\ell} a^{-1}_{uj}\,a^{-1}_{v\ell}\,\frac{a_{\ell i}\,a_{jk}}{\lambda_i\,\lambda_k} - \sum_{j,\ell} a^{-1}_{uj}\,a^{-1}_{v\ell}\sum_p a_{\ell p}\,a_{jp}\,\mathbb{E}[\varphi_k'(\lambda_k s_k)\,s_p^2]\,\delta_{ik},$$

i.e. that

$$U_{ijk\ell} = \frac{\delta_{jk}\,\delta_{i\ell}}{\lambda_i\,\lambda_j} - \mathbb{E}[\varphi_i'(\lambda_i s_i)\,s_j^2]\,\delta_{j\ell}\,\delta_{ik}$$

is positive definite. If we rewrite the matrices $M_{ij}$ as the vector $[M_{12}, M_{21}, \dots, M_{ij}, M_{ji}, \dots, M_{dd}]^T$, the transformed Hessian $U_{ijk\ell}$ takes the matrix form

$$U = \begin{bmatrix}
\ddots & & & & \\
& U_{ijij} & U_{jiij} & & 0\\
& U_{ijji} & U_{jiji} & & \\
& & & \ddots & \\
0 & & & & U_{iiii}\\
& & & & & \ddots
\end{bmatrix}$$

which leads us to define, for all $i < j$:

$$U_{ij} = \begin{pmatrix} \alpha_{ij} & \beta_{ij}\\ \beta_{ij} & \alpha_{ji}\end{pmatrix}, \qquad U_i = U_{iiii} = \alpha_{ii} + \beta_{ii},$$

with $\alpha_{ij} = -\,\mathbb{E}[\varphi_i'(\lambda_i s_i)\,s_j^2]$ and $\beta_{ij} = \frac{1}{\lambda_i\,\lambda_j}$. The simplified solution where $\lambda_i = 1$ is

$$U_{ij} = \begin{pmatrix} -\,\mathbb{E}[\varphi_i'(s_i)]\,\mathbb{E}[s_j^2] & 1\\ 1 & -\,\mathbb{E}[\varphi_j'(s_j)]\,\mathbb{E}[s_i^2]\end{pmatrix}, \qquad U_i = 1 - \mathbb{E}[\varphi_i'(s_i)\,s_i^2].$$

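In the $\lambda_i = 1$ case the blocks depend only on one-dimensional moments, so each pair of sources can be tested numerically. A sketch, with the same illustrative tanh-based score as above:

```python
import numpy as np

def pair_block_eigs(s_i, s_j, dphi=lambda y: np.tanh(y)**2 - 1.0):
    """Eigenvalues of the 2x2 block U_ij (lambda_i = 1 case): the pair (i, j)
    passes the stability test when both eigenvalues are positive."""
    a_ij = -np.mean(dphi(s_i)) * np.mean(s_j**2)
    a_ji = -np.mean(dphi(s_j)) * np.mean(s_i**2)
    return np.linalg.eigvalsh(np.array([[a_ij, 1.0],
                                        [1.0, a_ji]]))
```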

From the expressions for $U_{ij}$ and $U_i$ above, we can now give the stability conditions: $U_i > 0$, and the eigenvalues of $U_{ij}$ must be positive. The eigenvalues of $U_{ij}$ are the solutions of

$$(\alpha_{ij} - x)(\alpha_{ji} - x) - \beta_{ij}^2 = 0,$$

i.e.

$$x_{1,2} = \frac{1}{2}\Big[\,\alpha_{ij} + \alpha_{ji} \pm \sqrt{(\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2}\,\Big].$$

We need $\operatorname{Re}(x_1) > 0$ and $\operatorname{Re}(x_2) > 0$. In our case, $x_1$ and $x_2$ are real numbers because $U_{ij}$ is symmetric. Thus, since $x_1 > x_2$,

$$x_2 > 0 \iff (\alpha_{ij} + \alpha_{ji}) > \sqrt{(\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2} > 0 \iff (\alpha_{ij} + \alpha_{ji})^2 > (\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2 \iff \alpha_{ij}\,\alpha_{ji} > \beta_{ij}^2,$$

that is,

$$\mathbb{E}[\varphi_i'(\lambda_i s_i)\,(\lambda_j s_j)^2]\;\mathbb{E}[\varphi_j'(\lambda_j s_j)\,(\lambda_i s_i)^2] > 1.$$

4 Conclusion

This last formula allows us to check the stability conditions online, by estimating the values $\mathbb{E}[\varphi_i'(\lambda_i s_i)]$ and $\mathbb{E}[(\lambda_j s_j)^2]$ with, respectively, $\frac{1}{n}\sum_{k=1}^{n}\varphi_i'\big((\hat B_n X_k)_i\big)$ and $\frac{1}{n}\sum_{k=1}^{n}\big((\hat B_n X_k)_j\big)^2$.
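A sketch of such an online check, combining these empirical estimators into the pairwise product test of the previous section; the function and the tanh-based score are our illustrative assumptions:

```python
import numpy as np

def online_stability(B_hat, X, dphi=lambda y: np.tanh(y)**2 - 1.0):
    """Estimates E[phi_i'(y_i)] and E[y_j^2] from Y = B_hat X by empirical means,
    then tests E[phi_i'] E[y_j^2] * E[phi_j'] E[y_i^2] > 1 for every pair i < j."""
    Y = B_hat @ X
    m_dphi = dphi(Y).mean(axis=1)   # per-component estimate of E[phi_i'(y_i)]
    m_sq = (Y ** 2).mean(axis=1)    # per-component estimate of E[y_j^2]
    d = Y.shape[0]
    return {(i, j): (m_dphi[i] * m_sq[j]) * (m_dphi[j] * m_sq[i]) > 1.0
            for i in range(d) for j in range(i + 1, d)}
```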

References

[1] Amari, S., Chen, T.-P., and Cichocki, A. (1997). Stability analysis of learning algorithms for blind source separation. Neural Networks, Vol. 10, No. 8, pp. 1345-1351.

[2] Cardoso, J.-F. (1997). Statistical principles of source separation. In Proc. of SYSID'97, 11th IFAC Symposium on System Identification, Fukuoka (Japan), pages 1837-1844.

[3] Cardoso, J.-F. (1998). On the stability of some source separation algorithms. In Proc. of the 1998 IEEE SP Workshop on Neural Networks for Signal Processing (NNSP'98), pages 13-22.

[4] Cardoso, J.-F. and Amari, S.-I. (1997). Maximum likelihood source separation: equivariance and adaptivity. In Proc. of SYSID'97, 11th IFAC Symposium on System Identification, Fukuoka (Japan), pages 1063-1068.

[5] Comon, P. (1994). Independent Component Analysis, a new concept? Signal Processing, 36, 287-314.

[6] Lee, T.-W. (1998). Independent Component Analysis and Applications. Kluwer Academic Publishers.
