HAL Id: hal-00147406
https://hal.archives-ouvertes.fr/hal-00147406
Submitted on 16 May 2007
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
To cite this version:
Vincent Vigneron, Ludovic Aubry. More on stationary points in Independent Component Analysis. 9th European Symposium on Artificial Neural Networks, Apr 2001, Bruges, Belgium. 6 p. hal-00147406
More on stationary points in Independent Component Analysis

Vincent Vigneron¹, Ludovic Aubry²

¹ MATISSE-SAMOS UMR 8595, 90 rue de Tolbiac, 75634 Paris cedex 13
² CEMIF, 40 rue du Pelvoux, 91020 Evry Courcouronnes
Abstract. In this paper, we will focus on the problem of blind source separation for independent and identically distributed (iid) variables. The problem may be stated as follows: we observe a linear (unknown) mixture of k iid variables (the sources), and we want to recover either the sources or the linear mapping. We give online stability conditions of the algorithm using the eigenvalues of the hessian matrix of the pseudo-likelihood matching our set of observations.
1 Introduction

The process of discovering the linear mapping is called Blind Source Separation (BSS), because we don't want to make any assumption on the distribution of the sources, except that they are mutually independent. A review of several methods recovering the mapping can be found in [2]. Te-Won Lee shows that many of these methods are equivalent to finding the maximum of entropy of the observations assuming a given distribution of the sources [6]. Cardoso & Amari show that the mixture can be recovered even with wrong assumptions on the distributions, provided they satisfy a few stability conditions.
We consider the case where we observe a $d$-dimensional random vector $X$. The vector $X$ is obtained from a linear (non-random) mapping of a source variable $S$; thus we have $X(t) = AS(t)$ for all of our observations. Let $S(t) = [s_1(t), \dots, s_d(t)]$ be the sources such that for each instant $t$, the source $s_i(t)$ has the probability density function (pdf) $p_i$ (independent of $t$). With the (only) assumption that $S$ has independent components, its joint pdf is $p = \prod_{i=1}^d p_i$.
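As a concrete illustration of this generative model (a sketch, not taken from the paper; the Laplace sources and the random choice of $A$ are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 10_000

# iid sources with mutually independent components (Laplace, chosen arbitrarily)
S = rng.laplace(size=(d, n))

# unknown, non-random mixing matrix A (drawn once here for the demo)
A = rng.normal(size=(d, d))

# observations: X(t) = A S(t), one column per instant t = 1, ..., n
X = A @ S
```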
Let us observe $n$ realisations $x(t)$ of the variable $X$ such that $x(t) = As(t)$ for $t = 1, \dots, n$, with an unknown $d \times d$ matrix $A$. Further, we will propose a model distribution for $S$, noted $g = \prod_{i=1}^d g_i$. Several papers from Comon [5], Cardoso
[3], and Amari [1] show that it is possible to recover a satisfying estimation of the matrix $A^{-1}$ even when $g$ is different from $p$, under certain conditions (such as $g_i$ being sub-gaussian if $p_i$ is).
In order to obtain an estimator of $A$, we have to minimize a given criterion; most such criteria are based on entropy and mutual information. Typically we would minimize the criterion $H(BX)$, or $-E[\log g(BX)]$, with respect to $B$. The best choice here would be to take for $g$ the distribution of the sources². Such a criterion is minimal when the coordinates of the vector $BX$ are independent and the distribution of $G(BX)$ is uniform ($G$ is the cumulative density function of $g$).

Works from Cardoso [3] and Cardoso & Amari [4] give conditions on $g$ and $p$ so that the minimisation algorithm is stable. These conditions show that a broad range of functions $g$ can be used.
2 Maximum likelihood
Let us define a likelihood function matching our set of observations: let $\{A, (G_B)_{B \in GL_d(\mathbb{R})}\}$ be a statistical model of the sources $s$. In our case, we don't know the true distribution $P$ ($P(ds) = p(s)\,ds$) of the sources $s$, and we don't assume that $P$ belongs to our parametric model $(G_B)_{B \in GL_d(\mathbb{R})}$. That is, we are not doing maximum likelihood estimation. Nevertheless, we will define a pseudo log-likelihood with:
$$U_n(B) = -\frac{1}{n} \sum_{i=1}^n \log\big(|\det B|\, g(BX_i)\big),$$
where the variables $(X_i)_{i=1,\dots,n}$ are the observations, and $g(x) = \prod_{k=1}^d g_k(x_k)$ is the density of $G_B(dx) = |\det B|\, g(Bx)\, dx$. This sequence of functions of the observations is called a contrast process if it verifies some simple properties. The main condition is that it converges in probability toward a contrast function whose minimum is our solution. In fact, the requirement on $U_n(B)$ is a bit broader because, as we will see later, we only need its gradient to cancel at the solution point in order to define a sequence of estimators of $A^{-1}$.
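As a concrete reading of this definition (an illustration, not code from the paper; the sech-type model density $g_k(y) = 1/(\pi \cosh y)$ is an assumption):

```python
import numpy as np

def U_n(B, X):
    """Pseudo (negative) log-likelihood U_n(B) = -(1/n) sum_i log(|det B| g(B X_i))."""
    Y = B @ X                                    # Y[:, i] = B X_i, shape (d, n)
    _, logdet = np.linalg.slogdet(B)             # log |det B|
    log_g = -np.log(np.pi) - np.log(np.cosh(Y))  # log g_k, applied elementwise
    return -(logdet + log_g.sum(axis=0).mean())
```

Minimizing this quantity over invertible matrices $B$ is exactly the contrast estimation described below.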
Lemma 1. If $E\big[\,|\log(|\det B|\, g(BX))|\,\big] < \infty$, we have the convergence in probability $P$:
$$\lim_{n \to \infty} U_n(B) = -E[\log(|\det B|\, g(BX))] = -\int \log(|\det B|\, g(BAs))\, p(s)\, ds \equiv C(B, A^{-1}) = K(g_{BA}\,\|\,p) + H(p) + \log|\det A|. \qquad (1)$$
² Recently, quasi-optimal methods with online estimation of pdfs and of score functions have been proposed.
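One way to see where the decomposition in (1) comes from (a sketch under the sign conventions above, writing $p_Y$ for the density of $Y = BX$ and using $H(p_Y) = H(p) + \log|\det BA|$):
$$-E[\log(|\det B|\, g(BX))] = -\log|\det B| - E[\log g(Y)] = -\log|\det B| + K(p_Y \,\|\, g) + H(p_Y) = K(p_Y \,\|\, g) + H(p) + \log|\det A|,$$
since $-E[\log g(Y)] = \int p_Y \log(p_Y/g) - \int p_Y \log p_Y$. The divergence term vanishes exactly when the model distribution matches that of the recovered sources, which is the content of (1).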
$C(B; A^{-1})$ is called a contrast function if $B \mapsto C(B; A^{-1})$ has a strict minimum at the point $B = A^{-1}$. The process $U_n(B)$ is called a contrast process, and $\hat{B}_n = \arg\min_B U_n(B)$ is called the contrast estimate.
From equation (1), it is clear that $C(B; A^{-1}) \geq H(p) + \log|\det A|$, with equality only if the distributions $g_{BA}$ and $p$ are the same. Yet we have $g \neq p$, and we need to prove that $B = \Lambda A^{-1}$, where $\Lambda$ is the product of a scale matrix and a permutation matrix, is a minimum.

There is no general condition ensuring that
$$E\big[\,|\log(|\det B|\, g(BX))|\,\big] < \infty,$$
but we can enumerate several necessary conditions.

We show that the function $C(B; A^{-1})$ has several minima of the form $\Lambda A^{-1}$.
3 Stationary points

We call stationary points the matrices of $GL_d(\mathbb{R})$ such that $dU_n(B) = 0$. These points are good candidates for maxima and minima of our contrast function. Furthermore, we show that there exist such points that are always solutions to our problem.
Let us compute the total differential of our contrast process with respect to the inverse mixing matrix $B$. From
$$U_n(B) = -\frac{1}{n}\sum_{i=1}^n \log(|\det B|\, g(BX_i))$$
we obtain
$$dU_n(B) = -\mathrm{Trace}(dB\, B^{-1}) - \frac{1}{n}\sum_{i=1}^n \varphi^T(BX_i)\, d(BX_i),$$
with $\varphi(x_1, \dots, x_d) = \left[ \frac{g_1'(x_1)}{g_1(x_1)}, \dots, \frac{g_d'(x_d)}{g_d(x_d)} \right]^T$.
Remark 1. If we define the mapping $dW$ as $dW = dB\, B^{-1}$, and $Y_i = BX_i$, we have
$$dU_n(B) = -\mathrm{Trace}(dW) - \frac{1}{n}\sum_{i=1}^n \varphi^T(Y_i)\, dB\, B^{-1} B X_i = -\mathrm{Trace}(dW) - \frac{1}{n}\sum_{i=1}^n \varphi^T(Y_i)\, dW\, Y_i.$$
This mapping does not correspond to a change of variable, although it represents a local change of coordinates. As the only points of interest for us are those of the form $\Lambda A^{-1}$, we will see that with the change of parameters $W = BA^{-1}$, the hessian matrix has a block diagonal form at each stationary point.
From the differential $dU_n$ we have:
$$\frac{\partial U_n(B)}{\partial B} = -B^{-T} - \frac{1}{n}\sum_{i=1}^n \varphi(BX_i)\, X_i^T.$$
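For intuition (an illustrative sketch, not the authors' algorithm; the $-\tanh$ score matches the sech-type model density assumed earlier), this gradient can be descended directly, and the relative form $dW = dB\, B^{-1}$ of Remark 1 yields the familiar equivariant update:

```python
import numpy as np

def grad_U(B, X, phi):
    """Euclidean gradient: dU_n/dB = -B^{-T} - (1/n) sum_i phi(B X_i) X_i^T."""
    Y = B @ X
    return -np.linalg.inv(B).T - (phi(Y) @ X.T) / X.shape[1]

def relative_step(B, X, phi, lr=0.01):
    """One descent step in the relative parametrisation dW = dB B^{-1}:
    dU_n/dW = -I - (1/n) sum_i phi(Y_i) Y_i^T, then B <- B - lr (dU_n/dW) B."""
    Y = B @ X
    G = -np.eye(B.shape[0]) - (phi(Y) @ Y.T) / X.shape[1]
    return B - lr * (G @ B)

phi = lambda y: -np.tanh(y)  # hypothetical score g'/g of the sech model
```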
Setting this gradient to zero, the stationary points are obtained up to the product of a scale matrix and a permutation ($\delta_{i,\sigma(j)}$). This is the equivariance property as introduced by Comon [5]: we only need (and can) recover the mixing matrix up to a permutation; thus we only require unicity of the minima up to a permutation and scaling of the matrix $A$.
Let the set of $\lambda_{i,j}$ be solutions of the integral equations $1 + E[\varphi_i(\lambda_{i,j}\, s_j)\, \lambda_{i,j}\, s_j] = 0$. For any permutation $\sigma$ of $\{1,\dots,d\}$, we define the matrix $\Lambda_\sigma$ whose components are $\lambda_{i,\sigma(i)}\, \delta_{\sigma(i),j}$. Let $B_\sigma = \Lambda_\sigma A^{-1}$; then
$$I_d + E[\varphi(B_\sigma X)(B_\sigma X)^T] = I_d + E[\varphi(\Lambda_\sigma S)(\Lambda_\sigma S)^T],$$
that is, for each element $(i,j)$ we have $\delta_{i,j} + E[\varphi_i((\Lambda_\sigma S)_i)\, (\Lambda_\sigma S)_j] = 0$. Indeed, the left member of the equation is
$$D_{i,j} = \begin{cases} 0 & \text{if } i \neq j,\\ 1 + E[\varphi_i(\lambda_{i,\sigma(i)}\, s_{\sigma(i)})\, \lambda_{i,\sigma(i)}\, s_{\sigma(i)}] = 0 & \text{if } i = j. \end{cases}$$
It remains to prove the existence of such solutions (or to give conditions on the distributions $p$ and $g$) and their uniqueness.
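Numerically, the scalar integral equation is easy to solve for a given pair $(p_j, g_i)$ (an illustration with the assumptions used above, Monte Carlo plus root bracketing; not from the paper):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
s = rng.laplace(size=100_000)            # samples of one source s_j
phi = lambda y: -np.tanh(y)              # assumed model score g'/g

# F(lambda) = 1 + E[phi(lambda s) lambda s]; its root gives lambda_{i,j}
F = lambda lam: 1.0 + np.mean(phi(lam * s) * lam * s)
lam = brentq(F, 0.1, 10.0)               # scale fixed by the (p, g) pair
```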
3.1 Stability conditions
We just showed that $B_\sigma$ is a good candidate for a local minimum; we now need to prove that the hessian matrix $E[\nabla^2 U_n(B_\sigma)]$ is positive definite. This may not always be the case, however. Amari et al. [1] proposed a modification of the algorithm so that the hessian becomes positive definite. Let us first examine the one-dimensional case, for which
$$\frac{\partial U_n(B)}{\partial B} = -B^{-1} - \frac{1}{n}\sum_{i=1}^n \varphi(BX_i)\, X_i$$
and
$$\frac{\partial^2 U_n(B)}{\partial B^2} = \frac{1}{B^2} - \frac{1}{n}\sum_{i=1}^n \varphi'(BX_i)\, X_i^2.$$
For stability at a point $B^\star$ such that $\frac{1}{B^\star} + E[\varphi(B^\star X)\, X] = 0$, we need
$$\frac{1}{(B^\star)^2} - E[\varphi'(B^\star X)\, X^2] \geq 0.$$
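A quick numerical check of this one-dimensional condition (illustrative only; the $-\tanh$ score and Laplace data carry over from the sketches above):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
x = rng.laplace(size=100_000)            # 1-D observations (take A = 1)
phi  = lambda y: -np.tanh(y)             # assumed model score g'/g
dphi = lambda y: np.tanh(y)**2 - 1.0     # its derivative

# stationary point: 1/B + E[phi(B x) x] = 0
B_star = brentq(lambda B: 1.0 / B + np.mean(phi(B * x) * x), 0.1, 10.0)

# stability: 1/B*^2 - E[phi'(B* x) x^2] >= 0
stable = 1.0 / B_star**2 - np.mean(dphi(B_star * x) * x**2) >= 0
```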
3.2 Hessian matrix form
Noting that $\frac{\partial b^{-T}_{k\ell}}{\partial b_{ij}} = -b^{-1}_{jk}\, b^{-1}_{\ell i}$, we can compute the Hessian as follows:
$$H_{ijk\ell}(B) = \frac{\partial^2 U_n(B)}{\partial b_{ij}\, \partial b_{k\ell}} = \frac{\partial}{\partial b_{ij}}\left[ -b^{-T}_{k\ell} - \frac{1}{n}\sum_{r=1}^n \varphi_k(BX_r)\, X^r_\ell \right] = b^{-1}_{\ell i}\, b^{-1}_{jk} - \frac{1}{n}\sum_{r=1}^n \varphi_k'(BX_r)\, X^r_\ell\, X^r_j\, \delta_{ik}.$$
$B_\sigma$ is a strict minimum of $C(B; A^{-1})$ iff $E[H(B_\sigma)]$ is positive definite, i.e.,
$$E[H_{ijk\ell}(B_\sigma)] = (B_\sigma^{-1})_{\ell i}\, (B_\sigma^{-1})_{jk} - E[\varphi_k'((B_\sigma X)_k)\, X_\ell\, X_j]\, \delta_{ik}.$$
Assuming $\Lambda$ is a diagonal matrix ($\Lambda$ being a solution of $I_d + E[\varphi(\Lambda S)(\Lambda S)^T] = 0$):
$$E[H_{ijk\ell}(B_\sigma)] = \frac{a_{\ell i}\, a_{jk}}{\lambda_i \lambda_k} - \sum_p a_{\ell p}\, a_{jp}\, E[\varphi_k'(\lambda_k s_k)\, s_p^2]\, \delta_{ik},$$
because $p \neq q$ implies $E[\varphi_k'(\lambda_k s_k)\, s_p s_q] = 0$. We note that if $Q(B) = \sum_{ijk\ell} b_{ij}\, b_{k\ell}\, H_{ijk\ell}$ is a positive definite quadratic form, then so is $W \mapsto Q(WA^{-1})$, which can also be written
$$Q(WA^{-1}) = \sum_{ijk\ell} \sum_{pq} w_{ip}\, w_{kq}\, a^{-1}_{pj}\, a^{-1}_{q\ell}\, H_{ijk\ell} = \sum_{ipkq} w_{ip}\, w_{kq}\, U_{ipkq}, \qquad U_{ipkq} = \sum_{j\ell} a^{-1}_{pj}\, a^{-1}_{q\ell}\, H_{ijk\ell}.$$
So it is equivalent to prove that
$$U_{ijk\ell} = \sum_{u,v} a^{-1}_{ju}\, a^{-1}_{\ell v}\, E[H_{iukv}(B_\sigma)] = \frac{\delta_{jk}\, \delta_{i\ell}}{\lambda_i \lambda_j} - E[\varphi_k'(\lambda_k s_k)\, s_j^2]\, \delta_{j\ell}\, \delta_{ik}$$
is positive definite. If we rewrite the matrices $M_{ij}$ as the vector $[M_{12}, M_{21}, \dots, M_{ij}, M_{ji}, \dots, M_{dd}]^T$, the transformed hessian $U_{ijk\ell}$ takes the block diagonal matrix form
$$U = \begin{pmatrix} \ddots & & & \\ & \begin{matrix} U_{ijij} & U_{jiij} \\ U_{ijji} & U_{jiji} \end{matrix} & & 0 \\ & & \ddots & \\ 0 & & & U_{iiii} \end{pmatrix}$$
which leads us to define, for all $i < j$:
$$U_{ij} = \begin{pmatrix} \alpha_{ij} & \beta_{ij} \\ \beta_{ij} & \alpha_{ji} \end{pmatrix}, \qquad U_i = U_{iiii} = \alpha_{ii} + \beta_{ii},$$
where $\alpha_{ij} = -E[\varphi_i'(\lambda_i s_i)\, s_j^2]$ and $\beta_{ij} = \frac{1}{\lambda_i \lambda_j}$. The simplified solution where $\lambda_i = 1$ is
$$U_{ij} = \begin{pmatrix} -E[\varphi_i'(s_i)]\, E[s_j^2] & 1 \\ 1 & -E[\varphi_j'(s_j)]\, E[s_i^2] \end{pmatrix}, \qquad U_i = 1 - E[\varphi_i'(s_i)\, s_i^2].$$
From these two expressions we can now give the stability conditions: $U_i > 0$, and the eigenvalues of $U_{ij}$ positive. The eigenvalues of $U_{ij}$ are the solutions of
$$(\alpha_{ij} - x)(\alpha_{ji} - x) - \beta_{ij}^2 = 0,$$
i.e.
$$x_{1,2} = \frac{1}{2}\left[ \alpha_{ij} + \alpha_{ji} \pm \sqrt{(\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2} \right].$$
We need $x_1 > 0$ and $x_2 > 0$. In our case, $x_1$ and $x_2$ are real numbers because $U_{ij}$ is symmetric. Thus, since $x_1 > x_2$,
$$x_2 > 0 \iff (\alpha_{ij} + \alpha_{ji}) > \sqrt{(\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2} > 0 \iff (\alpha_{ij} + \alpha_{ji})^2 > (\alpha_{ij} - \alpha_{ji})^2 + 4\beta_{ij}^2 \iff \alpha_{ij}\, \alpha_{ji} > \beta_{ij}^2,$$
that is,
$$E[\varphi_i'(\lambda_i s_i)\, (\lambda_j s_j)^2]\; E[\varphi_j'(\lambda_j s_j)\, (\lambda_i s_i)^2] > 1.$$
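As a sanity check (illustrative, with the assumed score and $\lambda_i = 1$), the blocks $U_{ij}$ and diagonal terms $U_i$ can be estimated from source samples and the conditions tested directly:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 100_000
S = rng.laplace(size=(d, n))             # sources, taking lambda_i = 1
dphi = lambda y: np.tanh(y)**2 - 1.0     # derivative of phi(y) = -tanh(y)

kappa = -dphi(S).mean(axis=1)            # -E[phi'_i(s_i)]
sigma2 = (S**2).mean(axis=1)             # E[s_j^2]
alpha = np.outer(kappa, sigma2)          # alpha[i, j] = -E[phi'_i] E[s_j^2]

stable = all(1.0 - np.mean(dphi(S[i]) * S[i]**2) > 0 for i in range(d))
for i in range(d):
    for j in range(i + 1, d):
        U_ij = np.array([[alpha[i, j], 1.0], [1.0, alpha[j, i]]])
        stable = stable and bool(np.all(np.linalg.eigvalsh(U_ij) > 0))
```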
4 Conclusion
This last formula allows us to check the stability conditions online, by estimating the values $E[\varphi_i'(\lambda_i s_i)]$ and $E[(\lambda_j s_j)^2]$ with, respectively, $\frac{1}{n}\sum_{k=1}^n \varphi_i'\big((\hat{B}_n X_k)_i\big)$ and $\frac{1}{n}\sum_{k=1}^n \big((\hat{B}_n X_k)_j\big)^2$.
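A sketch of this online check (illustrative; $\varphi$ as before, with $\hat{B}_n$ the current unmixing estimate):

```python
import numpy as np

def stability_check(B_hat, X, dphi):
    """Estimate the stability quantities from the current estimate:
    kappa[i] ~ -E[phi'_i(y_i)], sigma2[j] ~ E[y_j^2], with y = B_hat x."""
    Y = B_hat @ X                        # recovered sources, d x n
    kappa = -dphi(Y).mean(axis=1)
    sigma2 = (Y**2).mean(axis=1)
    c = np.outer(kappa, sigma2)          # c[i, j] = -E[phi'_i] E[y_j^2]
    return c * c.T > 1.0                 # off-diagonal entries test each pair (i, j)
```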
References
[1] Amari, S., Chen, T.P., and Cichocki, A. (1997). Stability analysis of learning algorithms for blind source separation. Neural Networks, Vol. 10, No. 8, pp. 1345-1351.

[2] Cardoso, J.-F. (1997). Statistical principles of source separation. In Proc. of SYSID'97, 11th IFAC Symposium on System Identification, Fukuoka (Japan), pages 1837-1844.

[3] Cardoso, J.-F. (1998). On the stability of some source separation algorithms. In Proc. of the 1998 IEEE SP Workshop on Neural Networks for Signal Processing (NNSP'98), pages 13-22.

[4] Cardoso, J.-F. and Amari, S.-I. (1997). Maximum likelihood source separation: equivariance and adaptivity. In Proc. of SYSID'97, 11th IFAC Symposium on System Identification, Fukuoka (Japan), pages 1063-1068.

[5] Comon, P. (1994). Independent Component Analysis, a new concept? Signal Processing, 36, 287-314.

[6] Lee, T.-W. (1998). Independent Component Analysis and Applications. Kluwer Academic Publishers.