
HAL Id: hal-00339522

https://hal.archives-ouvertes.fr/hal-00339522

Submitted on 18 Nov 2008


Gradient based Approximate Joint Diagonalization by

Orthogonal Transforms

Mikael Sorensen, Sylvie Icart, Pierre Comon, Luc Deneire

To cite this version:

Mikael Sorensen, Sylvie Icart, Pierre Comon, Luc Deneire. Gradient based Approximate Joint Diagonalization by Orthogonal Transforms. 16th European Signal Processing Conference (Eusipco'08), Aug 2008, Lausanne, Switzerland. pp. session L11-1. ⟨hal-00339522⟩


GRADIENT BASED APPROXIMATE JOINT DIAGONALIZATION BY

ORTHOGONAL TRANSFORMS

Mikael Sørensen, Sylvie Icart, Pierre Comon and Luc Deneire

Laboratoire I3S, CNRS/UNSA

Les Algorithmes - Euclide-B, 06903 Sophia Antipolis, France. Phone: +33-4 9294 2751

{sorensen, icart, pcomon, deneire}@i3s.unice.fr

ABSTRACT

Approximate Joint Diagonalization (AJD) of a set of symmetric matrices by an orthogonal transform is a popular problem in Blind Source Separation (BSS). In this paper we propose a gradient based algorithm which maximizes the sum of squares of the diagonal entries of all the transformed symmetric matrices. Our main contribution is to transform the orthogonality constrained optimization problem into an unconstrained problem. This transform is performed in two steps: first, by parameterizing the orthogonal transform matrix by the matrix exponential of a skew-symmetric matrix; second, by introducing an isomorphism between the vector space of skew-symmetric matrices and the Euclidean vector space of appropriate dimension. This transform is then applied to a gradient based algorithm called GAEX to perform joint diagonalization of a set of symmetric matrices.

1. INTRODUCTION

Joint diagonalization of a set of symmetric matrices is a popular problem in BSS [2], [3], [5], [18].

A necessary and sufficient condition for the existence of an orthogonal matrix that will diagonalize all matrices in a finite set of symmetric matrices is that all the matrices contained in the set mutually commute with each other [10], and hence only an approximate solution to the joint diagonalization problem can in general be obtained.

One of the first AJD algorithms to handle a set of symmetric matrices was proposed by De Leeuw in [6]. Since in general there does not exist an analytical solution to this problem, a Jacobi procedure was proposed to solve it numerically. Another Jacobi procedure for this problem, known as JADE, was later proposed by Cardoso in [3], [4].

Another way to try to impose orthogonality of the transform matrix is the penalty method, and Joho proposed a gradient method in [7] to diagonalize a set of symmetric matrices with the aid of the penalty method. Furthermore, Joho proposed a gradient and a Newton method in [8] to diagonalize a set of symmetric matrices by applying the projection based method proposed by Manton in [12], which projects every update onto the manifold of orthogonal matrices.

Afsari [1] and Tanaka [16] proposed gradient based methods by exploiting the geometry of matrix Lie groups. Furthermore, Yamada proposed in [17] to apply the Cayley transform as a global parameterization of the set of orthogonal matrices with positive determinant and eigenvalues different from minus one.

(The work of Mikael Sørensen is supported by the EU by a Marie-Curie Fellowship (EST-SIGNAL program: http://est-signal.i3s.unice.fr) under contract no. MEST-CT-2005-021175.)

The approach taken in this paper is also to make use of a proper parameterization of the set of orthogonal matrices, which transforms the constrained optimization problem into an unconstrained problem by the use of appropriate transforms, so that the transform matrix remains orthogonal after each update. The difference between the approach in [17] and the one presented in this paper is that a local parameterization will also be considered here, and that the matrix exponential will be applied as a parameterization of the orthogonal matrices. Before presenting a gradient based AJD algorithm, the notation used throughout the paper and the problem formulation will be presented.

1.1 Notations

Let $\mathbb{R}$ and $\mathbb{R}_+$ denote the set of real and nonnegative real numbers respectively. Furthermore, let $SO_m(\mathbb{R})$, $S_m(\mathbb{R})$ and $S_m^{\perp}(\mathbb{R})$ denote the set of $m \times m$ orthogonal matrices with determinant equal to one, the symmetric matrices and the skew-symmetric matrices respectively. Moreover, let $(\cdot)^T$, $(\cdot)^H$, $\mathrm{Vec}(\cdot)$, $\mathrm{Unvec}(\cdot)$, $\otimes$ and $\|\cdot\|$ denote the transpose, the conjugate transpose, the matrix vectorization operator, the inverse matrix vectorization operator, the Kronecker product of two matrices and the Frobenius norm of a matrix respectively. Let $A \in \mathbb{R}^{m \times n}$; then let $A_{ij}$ denote the $i$th row, $j$th column entry of $A$, and let $e_i \in \mathbb{R}^m$ denote the unit vector with one in entry $i$ and zero elsewhere. Let $\delta_{ij}$ be the Kronecker delta, which is zero except when the two indices are equal, in which case $\delta_{ii} = 1$.

1.2 Problem Formulation

Given a finite set of symmetric matrices $\{T_k\}_{k=1}^n \subset S_m(\mathbb{R})$, let the transform matrix be $U \in SO_m(\mathbb{R})$ and the transformed matrices be $K_k = U T_k U^T$ $\forall k \in \{1,\dots,n\}$; the matrix $U$ which makes $K_k$ as diagonal as possible for every $k \in \{1,\dots,n\}$ is sought.

A measure of how diagonal the set of transformed matrices is, is given by the functional $\Psi : SO_m(\mathbb{R}) \to \mathbb{R}$ given by
$$\Psi(U) = \frac{1}{2} \sum_{k=1}^{n} \left\| U T_k U^T - \mathrm{diag}\!\left( U T_k U^T \right) \right\|^2, \qquad (1)$$


where $\mathrm{diag}(A) \in \mathbb{R}^{m \times m}$ and $\mathrm{diag}(A)_{ij} = A_{ij}\,\delta_{ij}$.

Due to the invariance of the Frobenius norm w.r.t. any orthogonal operator, minimizing (1) is equivalent to maximizing the functional $J : SO_m(\mathbb{R}) \to \mathbb{R}$ given as
$$J(U) = \frac{1}{2} \sum_{k=1}^{n} \left\| \mathrm{diag}\!\left( U T_k U^T \right) \right\|^2.$$
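As a quick numerical illustration of this equivalence (a hedged sketch of mine, not part of the paper), the following snippet checks that $\Psi(U) + J(U)$ equals the constant $\frac{1}{2}\sum_k \|T_k\|^2$ for random orthogonal $U$, so minimizing $\Psi$ is the same as maximizing $J$:

```python
# Sketch (assumed helper code, not from the paper): check that Psi(U) + J(U)
# is constant over orthogonal U, so minimizing Psi maximizes J.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
m = 4
Ts = []
for _ in range(3):
    X = rng.standard_normal((m, m))
    Ts.append(X + X.T)                                   # random symmetric T_k

def psi(U, Ts):
    return 0.5 * sum(np.linalg.norm(U @ T @ U.T - np.diag(np.diag(U @ T @ U.T))) ** 2 for T in Ts)

def J(U, Ts):
    return 0.5 * sum(np.sum(np.diag(U @ T @ U.T) ** 2) for T in Ts)

const = 0.5 * sum(np.linalg.norm(T) ** 2 for T in Ts)    # half the total Frobenius energy
for _ in range(3):
    X = rng.standard_normal((m, m))
    U = expm(X - X.T)                                    # exp of a skew-symmetric matrix is orthogonal
    print(np.isclose(psi(U, Ts) + J(U, Ts), const))      # True for every orthogonal U
```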

A gradient based solution to the constrained maximization problem
$$\arg\max_{U \in SO_m(\mathbb{R})} J(U) \qquad (2)$$
will be proposed in this paper.

2. GRADIENT BASED AJD

2.1 Parameterization and Update of an Orthogonal Matrix

It is well known that the orthogonality property of the transform matrix can be preserved by parameterizing it by the matrix exponential $e^X \triangleq \sum_{n=0}^{\infty} \frac{X^n}{n!}$. Moreover, since $SO_m(\mathbb{R})$ is a multiplicative group and the mapping $e : S_m^{\perp}(\mathbb{R}) \to SO_m(\mathbb{R})$ given by $X \mapsto e^X$ is surjective [9], the following local parameterization of the transform matrix will ensure that the transform matrix remains orthogonal after each update:

$$U^{(n+1)} = U^{(n)} e^{\mu \widetilde{U}},$$
where $\widetilde{U} \in S_m^{\perp}(\mathbb{R})$, $U^{(n)} \in SO_m(\mathbb{R})$, $\mu \in \mathbb{R}_+$ and the upper index denotes the iteration number. Alternatively, the following global parameterization could be used:

$$U^{(n+1)} = e^{\widetilde{U}^{(n)} + \mu \widetilde{U}},$$
where $\widetilde{U}^{(n)}, \widetilde{U} \in S_m^{\perp}(\mathbb{R})$, $\mu \in \mathbb{R}_+$ and the upper index denotes the iteration number. By the use of the matrix exponential, the problem has turned from finding an orthogonal matrix into finding a skew-symmetric matrix. In general the computation of the matrix exponential must be considered computationally demanding, but here only the matrix exponential of a skew-symmetric matrix is required, and this can for instance be computed by an EigenValue Decomposition (EVD).
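For concreteness, here is a small NumPy/SciPy sketch (mine, with assumed helper names, not the authors' code) of the local update $U^{(n+1)} = U^{(n)} e^{\mu \widetilde{U}}$, computing the matrix exponential of a skew-symmetric matrix from an EVD (of the Hermitian matrix $\mathrm{i}\widetilde{U}$) and checking that orthogonality is preserved:

```python
# Sketch (mine): matrix exponential of a skew-symmetric matrix via an EVD,
# and the local update U^(n+1) = U^(n) exp(mu * U_tilde).
import numpy as np
from scipy.linalg import expm

def skew_expm(U_tilde, mu=1.0):
    # 1j*U_tilde is Hermitian, so eigh returns a unitary R and real eigenvalues;
    # the eigenvalues of U_tilde itself are -1j * lam (purely imaginary).
    lam, R = np.linalg.eigh(1j * U_tilde)
    return np.real(R @ np.diag(np.exp(-1j * mu * lam)) @ R.conj().T)

rng = np.random.default_rng(1)
m = 4
X = rng.standard_normal((m, m))
U_tilde = X - X.T                                  # skew-symmetric search direction
U = np.eye(m)                                      # U^(0) = I_m
U_next = U @ skew_expm(U_tilde, mu=0.05)           # one local update

print(np.allclose(U_next.T @ U_next, np.eye(m)))        # True: the update stays orthogonal
print(np.allclose(skew_expm(U_tilde), expm(U_tilde)))   # agrees with scipy.linalg.expm
```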

The following subsection will show that it is possible to turn the constrained problem (2) into an unconstrained problem by the use of a second transform.

2.2 Isomorphism between $S_n^{\perp}(\mathbb{R})$ and $\mathbb{R}^{\frac{n(n-1)}{2}}$

In [14] an explicit expression of an isomorphism between $S_n(\mathbb{R})$ and $\mathbb{R}^{\frac{(n+1)n}{2}}$ was derived. By a similar procedure, an explicit expression of an isomorphism between $S_n^{\perp}(\mathbb{R})$ and $\mathbb{R}^{\frac{(n-1)n}{2}}$ can be found.

Let $A \in S_n^{\perp}(\mathbb{R})$ and let $w(A)$ denote the $\frac{(n-1)n}{2} \times 1$ vector that is obtained by eliminating all the diagonal and supradiagonal elements from $A$ and thereafter stacking the infradiagonal elements into the vector
$$w(A) = \left[ A_{12}, \dots, A_{1n}, A_{23}, \dots, A_{2n}, \dots, A_{(n-1)n} \right]^T.$$

Definition 2.1. (Skew-Symmetric Elimination Matrix) The skew-symmetric elimination matrix $\widetilde{L} \in \mathbb{R}^{\frac{(n-1)n}{2} \times n^2}$ is defined to be the matrix representation of the linear mapping $\widetilde{L} : S_n^{\perp}(\mathbb{R}) \to \mathbb{R}^{\frac{(n-1)n}{2}}$, and it satisfies $\widetilde{L}\,\mathrm{Vec}(A) = w(A)$ $\forall A \in S_n^{\perp}(\mathbb{R})$.
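A small, hypothetical NumPy sketch of this construction (the helper names are mine; it uses the infradiagonal $i > j$ ordering made explicit in Proposition 2.3 below) builds $\widetilde{L}$ row by row and checks the defining property $\widetilde{L}\,\mathrm{Vec}(A) = w(A)$:

```python
# Sketch (mine): the skew-symmetric elimination matrix L_tilde and the check
# L_tilde Vec(A) = w(A) for a skew-symmetric A.
import numpy as np

def vec(A):
    """Column-major vectorization Vec(A)."""
    return A.reshape(-1, order="F")

def w(A):
    """Stack the infradiagonal elements of A column by column (i > j ordering)."""
    n = A.shape[0]
    return np.concatenate([A[j + 1:, j] for j in range(n)])

def elimination_matrix(n):
    """Each row of L_tilde is Vec(E_ij)^T with E_ij = e_i e_j^T and i > j."""
    rows = []
    for j in range(n):
        for i in range(j + 1, n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            rows.append(vec(E))
    return np.array(rows)

n = 5
rng = np.random.default_rng(2)
X = rng.standard_normal((n, n))
A = X - X.T                                        # skew-symmetric test matrix
L = elimination_matrix(n)
print(L.shape)                                     # (10, 25), i.e. n(n-1)/2 by n^2
print(np.allclose(L @ vec(A), w(A)))               # True
```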

Definition 2.2. (Inverse Skew-Symmetric Elimination Matrix) Let the inverse skew-symmetric elimination matrix $\widetilde{L}^{-1} \in \mathbb{R}^{n^2 \times \frac{(n-1)n}{2}}$ be defined to be the matrix representation of the linear mapping $\widetilde{L}^{-1} : \mathbb{R}^{\frac{(n-1)n}{2}} \to S_n^{\perp}(\mathbb{R})$, and it satisfies $\mathrm{Vec}(A) = \widetilde{L}^{-1} w(A)$ $\forall A \in S_n^{\perp}(\mathbb{R})$.

Before giving an explicit expression of the skew-symmetric elimination matrix, let
$$I_{\frac{(n-1)n}{2}} = \left[ \tilde{u}_{21}, \tilde{u}_{31}, \dots, \tilde{u}_{n1}, \tilde{u}_{32}, \dots, \tilde{u}_{n2}, \tilde{u}_{43}, \dots, \tilde{u}_{n(n-1)} \right],$$
where $I_m$ is the $m \times m$ identity matrix and $\tilde{u}_{ij} = e_{(j-1)n + i - \frac{1}{2} j (j+1)} \in \mathbb{R}^{\frac{(n-1)n}{2}}$.

Proposition 2.3. Let $\tilde{u}_{ij} = e_{(j-1)n + i - \frac{1}{2} j (j+1)} \in \mathbb{R}^{\frac{(n-1)n}{2}}$ and let $\mathbb{R}^{n \times n} \ni E_{ij} = e_i e_j^T$; then
$$\widetilde{L} = \sum_{i>j} \tilde{u}_{ij}\,\mathrm{Vec}\!\left( E_{ij} \right)^T = \sum_{i>j} \tilde{u}_{ij} \otimes e_j^T \otimes e_i^T.$$

Proof. By making use of the identities $\mathrm{Vec}\!\left( y x^T \right) = x \otimes y$ and $x y^T = x \otimes y^T$ we get
$$w(A) = \sum_{i>j} A_{ij}\,\tilde{u}_{ij} = \sum_{i>j} \tilde{u}_{ij}\, e_i^T A e_j = \sum_{i>j} \tilde{u}_{ij}\,\mathrm{Vec}\!\left( E_{ij} \right)^T \mathrm{Vec}(A) = \sum_{i>j} \left( \tilde{u}_{ij} \otimes e_j^T \otimes e_i^T \right) \mathrm{Vec}(A). \qquad \blacksquare$$

To obtain the matrix expression of the linear mapping $\widetilde{L}^{-1}$ the following lemma will be used.

Lemma 2.4. Let $\widetilde{A}$ denote the lower triangular matrix consisting of the infradiagonal elements of $A$; then $\widetilde{L}^T w(A) = \mathrm{Vec}\!\left( \widetilde{A} \right)$.


Proof. Since $\tilde{u}_{ij}^T \tilde{u}_{hk} = \delta_{ih}\,\delta_{jk}$ we get
$$\widetilde{L}^T \widetilde{L}\,\mathrm{Vec}(A) = \sum_{i>j} \sum_{h>k} \left( \tilde{u}_{ij}^T \otimes e_j \otimes e_i \right) \left( \tilde{u}_{hk} \otimes e_k^T \otimes e_h^T \right) \mathrm{Vec}(A) = \sum_{i>j} \sum_{h>k} \left( \tilde{u}_{ij}^T \tilde{u}_{hk} \otimes e_j e_k^T \otimes e_i e_h^T \right) \mathrm{Vec}(A) = \sum_{i>j} \left( e_j e_j^T \otimes e_i e_i^T \right) \mathrm{Vec}(A) = \sum_{i>j} \mathrm{Vec}\!\left( e_i e_i^T A e_j e_j^T \right) = \mathrm{Vec}\!\left( \sum_{i>j} A_{ij} E_{ij} \right) = \mathrm{Vec}\!\left( \widetilde{A} \right).$$
Hence $\mathrm{Vec}\!\left( \widetilde{A} \right) = \widetilde{L}^T \widetilde{L}\,\mathrm{Vec}(A) = \widetilde{L}^T w(A)$. $\blacksquare$

By using the above lemma it is possible to obtain an explicit expression of $\widetilde{L}^{-1}$, but before reaching this result the commutation matrix will be introduced.

Definition 2.5. (Commutation Matrix [13], [15]) Let $\mathbb{R}^{m \times n} \ni H_{ij} \triangleq e_{i,m} e_{j,n}^T$, where $e_{i,m}$ is the $i$th column of $I_m$ and $e_{j,n}$ is the $j$th column of $I_n$. Then the commutation matrix $C_{mn} \in \mathbb{R}^{mn \times mn}$ is defined to be
$$C_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} H_{ij} \otimes H_{ij}^T.$$
The commutation matrix relates $\mathrm{Vec}(A)$ and $\mathrm{Vec}\!\left( A^T \right)$ by the equation $C_{mn}\,\mathrm{Vec}(A) = \mathrm{Vec}\!\left( A^T \right)$ $\forall A \in \mathbb{R}^{m \times n}$ [13].

Proposition 2.6. The matrix representation of the linear map $\widetilde{L}^{-1}$ is given by the expression $\widetilde{L}^{-1} = \widetilde{L}^T - C_{nn} \widetilde{L}^T$.

Proof. By Lemma 2.4 and the commutation matrix we get
$$\left( \widetilde{L}^T - C_{nn} \widetilde{L}^T \right) w(A) = \widetilde{L}^T w(A) - C_{nn} \widetilde{L}^T w(A) = \mathrm{Vec}\!\left( \widetilde{A} \right) - \mathrm{Vec}\!\left( \widetilde{A}^T \right) = \mathrm{Vec}(A) \quad \forall A \in S_n^{\perp}(\mathbb{R}). \qquad \blacksquare$$

By the use of the previously introduced isomorphism, the search for a skew-symmetric matrix has now turned into a search for a vector in $\mathbb{R}^{\frac{(n-1)n}{2}}$. Hence any conventional iterative method operating in a vector space can now be applied to the problem; a small numerical sketch of this construction is given below, and a gradient ascent method will be described next.
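The following self-contained sketch (mine; the helper names are assumptions, and `vec`, `w` and `elimination_matrix` repeat the previous snippet) builds $C_{nn}$ from Definition 2.5, forms $\widetilde{L}^{-1} = \widetilde{L}^T - C_{nn}\widetilde{L}^T$ and verifies both the commutation property and $\mathrm{Vec}(A) = \widetilde{L}^{-1} w(A)$:

```python
# Sketch (mine): commutation matrix C_nn and the inverse skew-symmetric
# elimination matrix from Proposition 2.6.
import numpy as np

vec = lambda A: A.reshape(-1, order="F")                                  # Vec(A)
w = lambda A: np.concatenate([A[j + 1:, j] for j in range(A.shape[0])])   # infradiagonal stack

def elimination_matrix(n):
    rows = []
    for j in range(n):
        for i in range(j + 1, n):
            E = np.zeros((n, n)); E[i, j] = 1.0
            rows.append(vec(E))
    return np.array(rows)

def commutation_matrix(m, n):
    """C_mn = sum_ij H_ij kron H_ij^T with H_ij = e_{i,m} e_{j,n}^T (Definition 2.5)."""
    C = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            H = np.zeros((m, n)); H[i, j] = 1.0
            C += np.kron(H, H.T)
    return C

n = 5
rng = np.random.default_rng(3)
X = rng.standard_normal((n, n))
A = X - X.T                                            # skew-symmetric test matrix

L = elimination_matrix(n)
C = commutation_matrix(n, n)
L_inv = L.T - C @ L.T                                  # Proposition 2.6

print(np.allclose(C @ vec(A), vec(A.T)))               # C_nn Vec(A) = Vec(A^T)
print(np.allclose(L_inv @ w(A), vec(A)))               # Vec(A) = L_tilde^{-1} w(A)
```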

2.3 Calculation of the Gradient

Now a matrix based formulation of the partial derivatives will be given. Let $X \in \mathbb{R}^{m \times m}$; then we have the identities $\|X\|^2 = \mathrm{Vec}(X)^T \mathrm{Vec}(X)$ and $\mathrm{Vec}\!\left( \mathrm{diag}(X) \right) = \mathrm{Diag}\!\left( \mathrm{Vec}(I_m) \right) \mathrm{Vec}(X)$, where $\mathrm{Diag}\!\left( \mathrm{Vec}(I_m) \right)$ is a diagonal matrix with $\mathrm{Diag}\!\left( \mathrm{Vec}(I_m) \right)_{ii} = \mathrm{Vec}(I_m)_i$, and the matrix $P \triangleq \mathrm{Diag}\!\left( \mathrm{Vec}(I_m) \right)$ is idempotent. By the use of these identities we will derive $\frac{\partial J(U)}{\partial w(\widetilde{U})^T}$ from the differential $d(J(U))$.

In [11] the differential of the matrix exponential w.r.t. a symmetric matrix was found. By a slight modification of their derivation, the differential of the matrix exponential w.r.t. any square matrix can be found to be
$$d\!\left( e^A \right) = \int_0^1 e^{At}\, d(A)\, e^{A(1-t)}\, dt. \qquad (3)$$
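Equation (3) can be checked numerically; the sketch below (mine, not from the paper) approximates the integral with a midpoint rule and compares it to SciPy's Fréchet derivative of the matrix exponential in the same direction:

```python
# Sketch (mine): numerical check of d(e^A) = int_0^1 e^{At} d(A) e^{A(1-t)} dt
# by comparing a midpoint-rule quadrature with scipy's expm_frechet.
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(4)
m = 4
A = 0.5 * rng.standard_normal((m, m))                  # base point
dA = rng.standard_normal((m, m))                       # perturbation direction

N = 2000
ts = (np.arange(N) + 0.5) / N                          # midpoint rule on [0, 1]
quad = sum(expm(A * t) @ dA @ expm(A * (1.0 - t)) for t in ts) / N

frechet = expm_frechet(A, dA, compute_expm=False)      # exact directional derivative
print(np.max(np.abs(quad - frechet)))                  # small (quadrature error only)
```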

Let $U$ be fixed and let us recall the identity $\mathrm{Vec}(ABC) = \left( C^T \otimes A \right) \mathrm{Vec}(B)$; then by applying the introduced isomorphism and by equation (3) we get
$$d\!\left( \mathrm{Vec}\!\left( U e^{\widetilde{U}} \right) \right) = (I_m \otimes U)\, d\!\left( \mathrm{Vec}\!\left( e^{\widetilde{U}} \right) \right) = (I_m \otimes U) \int_0^1 e^{\widetilde{U}(t-1)} \otimes e^{\widetilde{U} t}\, dt \; \mathrm{Vec}\!\left( d\!\left( \widetilde{U} \right) \right) = (I_m \otimes U) \int_0^1 e^{\widetilde{U}(t-1)} \otimes e^{\widetilde{U} t}\, dt \; \widetilde{L}^{-1}\, d\!\left( w\!\left( \widetilde{U} \right) \right), \qquad (4)$$
where $\widetilde{L}^{-1}$ denotes the inverse skew-symmetric elimination matrix.

Let $U^{(n+1)} = U^{(n)} e^{\widetilde{U}}$ and $K_k = U T_k U^T$; then the partial derivatives for the gradient ascent method can be found via the following differential and the use of (4):
$$\begin{aligned}
d(J(U)) &= \frac{1}{2} \sum_{k=1}^{n} d\!\left( \left\| \mathrm{diag}\!\left( U T_k U^T \right) \right\|^2 \right) \\
&= \sum_{k=1}^{n} \mathrm{Vec}(K_k)^T P\, d\!\left( \mathrm{Vec}\!\left( U T_k U^T \right) \right) \\
&= \sum_{k=1}^{n} \mathrm{Vec}(K_k)^T P \left( (U T_k \otimes I_m) + (I_m \otimes U T_k) C_{mm} \right) d\!\left( \mathrm{Vec}(U) \right) \\
&= \sum_{k=1}^{n} \mathrm{Vec}(K_k)^T P \left( (U T_k \otimes I_m) + (I_m \otimes U T_k) C_{mm} \right) (I_m \otimes U) \int_0^1 e^{\widetilde{U}(t-1)} \otimes e^{\widetilde{U} t}\, dt \; \widetilde{L}^{-1}\, d\!\left( w\!\left( \widetilde{U} \right) \right).
\end{aligned}$$
From the differential we can deduce that the gradient, when evaluated at iteration $n$, reduces to
$$\frac{\partial J(U)}{\partial w(\widetilde{U})^T} = \sum_{k=1}^{n} \mathrm{Vec}(K_k)^T P \left( (U T_k \otimes I_m) + (I_m \otimes U T_k) C_{mm} \right) (I_m \otimes U)\, \widetilde{L}^{-1},$$
since $U^{(n)} = U^{(n)} e^{0}$ at iteration $n$. For the global parameterization $U^{(n+1)} = e^{\widetilde{U}^{(n)} + \mu \widetilde{U}}$, the gradient vector can be found by similar calculations to be
$$\frac{\partial J(U)}{\partial w(\widetilde{U})^T} = \sum_{k=1}^{n} \mathrm{Vec}(K_k)^T P \left( (U T_k \otimes I_m) + (I_m \otimes U T_k) C_{mm} \right) \int_0^1 e^{\widetilde{U}(t-1)} \otimes e^{\widetilde{U} t}\, dt \; \widetilde{L}^{-1}.$$
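To make the bookkeeping concrete, here is a hypothetical NumPy sketch (mine; the helper names are assumptions) that assembles the local-parameterization gradient at $\widetilde{U}=0$ exactly as written above and cross-checks it against a finite-difference approximation of $J(U e^{\widetilde{U}})$:

```python
# Sketch (mine): the gradient of J w.r.t. w(U_tilde) at U_tilde = 0, assembled
# from Kronecker products, checked against finite differences.
import numpy as np
from scipy.linalg import expm

def vec(A): return A.reshape(-1, order="F")

def elimination_matrix(n):
    rows = []
    for j in range(n):
        for i in range(j + 1, n):
            E = np.zeros((n, n)); E[i, j] = 1.0
            rows.append(vec(E))
    return np.array(rows)

def commutation_matrix(m, n):
    C = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            H = np.zeros((m, n)); H[i, j] = 1.0
            C += np.kron(H, H.T)
    return C

def J(U, Ts):
    return 0.5 * sum(np.sum(np.diag(U @ T @ U.T) ** 2) for T in Ts)

m, n_mat = 4, 3
rng = np.random.default_rng(5)
Ts = []
for _ in range(n_mat):
    X = rng.standard_normal((m, m))
    Ts.append(X + X.T)                                 # random symmetric T_k
U = np.linalg.qr(rng.standard_normal((m, m)))[0]       # current orthogonal iterate

I = np.eye(m)
L = elimination_matrix(m)
C = commutation_matrix(m, m)
L_inv = L.T - C @ L.T                                  # inverse elimination matrix
P = np.diag(vec(I))                                    # P = Diag(Vec(I_m))

grad = sum(vec(U @ T @ U.T) @ P @ (np.kron(U @ T, I) + np.kron(I, U @ T) @ C)
           @ np.kron(I, U) @ L_inv for T in Ts)        # row vector of length m(m-1)/2

h, p = 1e-6, m * (m - 1) // 2
fd = np.zeros(p)
for i in range(p):                                     # finite differences in each coordinate of w
    e = np.zeros(p); e[i] = h
    Up = U @ expm((L_inv @ e).reshape(m, m, order="F"))
    Um = U @ expm((L_inv @ (-e)).reshape(m, m, order="F"))
    fd[i] = (J(Up, Ts) - J(Um, Ts)) / (2 * h)
print(np.allclose(grad, fd))                           # True
```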

Given the EVD $\widetilde{U} = R \Lambda R^H$, the above integral can be expressed as
$$\int_0^1 e^{\widetilde{U}(t-1)} \otimes e^{\widetilde{U} t}\, dt = (R \otimes R) \int_0^1 e^{\Lambda(t-1)} \otimes e^{\Lambda t}\, dt\, (R \otimes R)^H.$$
Furthermore, the diagonal elements of $\int_0^1 e^{\Lambda(t-1)} \otimes e^{\Lambda t}\, dt$ can be found to be
$$\int_0^1 e^{\lambda_i (t-1)} e^{\lambda_j t}\, dt = \int_0^1 e^{(\lambda_i + \lambda_j) t - \lambda_i}\, dt =
\begin{cases}
\dfrac{e^{\lambda_j} - e^{-\lambda_i}}{\lambda_i + \lambda_j}, & \lambda_i \neq -\lambda_j \\
e^{-\lambda_i}, & \lambda_i = -\lambda_j
\end{cases}$$
where $\lambda_i$ is the $i$th eigenvalue of $\widetilde{U}$.
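The closed-form evaluation of this integral can be sketched as follows (a hypothetical snippet of mine; the names are assumptions), comparing the EVD-based expression with a brute-force quadrature:

```python
# Sketch (mine): evaluating int_0^1 exp(U_tilde (t-1)) kron exp(U_tilde t) dt
# from the EVD of the skew-symmetric U_tilde, versus a midpoint-rule reference.
import numpy as np
from scipy.linalg import expm

m = 4
rng = np.random.default_rng(6)
X = rng.standard_normal((m, m))
Ut = X - X.T                                           # skew-symmetric U_tilde

lam, R = np.linalg.eigh(1j * Ut)                       # 1j*Ut is Hermitian: R unitary, lam real
lam = -1j * lam                                        # eigenvalues of U_tilde (purely imaginary)

# Entrywise integral for the diagonal of int exp(Lambda(t-1)) kron exp(Lambda t) dt.
li = np.repeat(lam, m)                                 # lambda_i in Kronecker ordering
lj = np.tile(lam, m)                                   # lambda_j in Kronecker ordering
s = li + lj
safe = np.where(np.abs(s) > 1e-12, s, 1.0)             # avoid 0/0 in the lambda_i = -lambda_j branch
d = np.where(np.abs(s) > 1e-12, (np.exp(lj) - np.exp(-li)) / safe, np.exp(-li))
closed_form = np.kron(R, R) @ np.diag(d) @ np.kron(R, R).conj().T

N = 2000
ts = (np.arange(N) + 0.5) / N                          # midpoint-rule reference
reference = sum(np.kron(expm(Ut * (t - 1.0)), expm(Ut * t)) for t in ts) / N
print(np.max(np.abs(closed_form - reference)))         # small (quadrature error only)
```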

Applying the local parameterization, the search direction is now set to
$$\widetilde{U} = \mathrm{Unvec}\!\left( \widetilde{L}^{-1}\, \frac{\partial J(U)}{\partial w(\widetilde{U})^T} \Big/ \left\| \frac{\partial J(U)}{\partial w(\widetilde{U})^T} \right\| \right).$$

Next, a simple and inexact line search procedure will be applied to solve the maximization problem
$$\arg\max_{\mu \in \mathbb{R}_+} J\!\left( U^{(n-1)} e^{\mu \widetilde{U}} \right). \qquad (5)$$

By selecting $\mu$ sufficiently small, convergence to a critical point is guaranteed. More specifically, let $\omega = \{0, \frac{1}{20}, \frac{2}{20}, \dots, 1\}$; then
$$\mu^{(n)} = \arg\max_{\mu \in \omega} J\!\left( U^{(n-1)} e^{\mu \widetilde{U}} \right)$$
will be used as an approximation to (5) in the following simulation section. It should be pointed out that a guarantee of convergence to a global maximum is still an open problem.
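A minimal sketch of this inexact line search (mine; the function and variable names are assumptions, not the authors' code) could look as follows:

```python
# Sketch (mine): grid line search over omega = {0, 1/20, ..., 1} for the
# local parameterization U^(n) = U^(n-1) exp(mu * U_tilde).
import numpy as np
from scipy.linalg import expm

def J(U, Ts):
    return 0.5 * sum(np.sum(np.diag(U @ T @ U.T) ** 2) for T in Ts)

def line_search(U, U_tilde, Ts, grid_points=21):
    omega = np.linspace(0.0, 1.0, grid_points)         # {0, 1/20, 2/20, ..., 1}
    scores = [J(U @ expm(mu * U_tilde), Ts) for mu in omega]
    return float(omega[int(np.argmax(scores))])

# Toy usage with random symmetric matrices and a normalized skew-symmetric direction.
rng = np.random.default_rng(7)
m = 4
Ts = []
for _ in range(3):
    X = rng.standard_normal((m, m))
    Ts.append(X + X.T)
X = rng.standard_normal((m, m))
U_tilde = X - X.T
mu = line_search(np.eye(m), U_tilde / np.linalg.norm(U_tilde), Ts)
print(mu)
```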

3. SIMULATION

The performance of the proposed gradient based algorithm, which will be called GAEX, will be compared with the JADE algorithm proposed in [3].

To compare the different orthogonal simultaneous matrix diagonalization algorithms, the performance will be measured on a set of matrices $\{T_k\}_{k=1}^n \subset \mathbb{R}^{m \times m}$, where $T_k = U D_k U^T + \beta E$, $U$ is a randomly generated orthogonal matrix, $D_k$ is a random diagonal matrix, $E \in \mathbb{R}^{m \times m}$ is a random matrix and $\beta \in \mathbb{R}$ is a gain factor.

Figure 1: The diagonalization of a set of random matrices where $\alpha$ is varying.

Figure 2: The convergence of the GAEX algorithm for different trials when $\beta = 0$.

Figure 3: The convergence of the GAEX algorithm for different trials when $\alpha = 0.5$.

Let $x$ be uniformly distributed,
$$U_{(a,b)}(x) \sim
\begin{cases}
\frac{1}{b-a}, & a < x < b \\
0, & \text{elsewhere.}
\end{cases}$$
Then let $D_{k,ii}$ and $E_{ij}$ be randomly drawn from $U_{(-100,100)}(x)$, and let $U$ be an orthogonal basis of the subspace spanned by the columns of a random matrix in $\mathbb{R}^{m \times m}$ whose entries are randomly drawn from $U_{(-100,100)}(x)$. Moreover, in all simulations $m = n = 5$ and the proposed gradient based method is initialized with $U^{(0)} = I_m$.
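A hypothetical sketch (mine) of this test-data generation, with the same $U_{(-100,100)}$ draws, $T_k = U D_k U^T + \beta E$ and $m = n = 5$:

```python
# Sketch (mine): generate the test matrices T_k = U D_k U^T + beta * E
# with entries drawn from the uniform distribution on (-100, 100).
import numpy as np

def make_test_set(m=5, n=5, beta=0.0, seed=0):
    rng = np.random.default_rng(seed)
    # Orthogonal basis of the column span of a random matrix with U(-100,100) entries.
    U, _ = np.linalg.qr(rng.uniform(-100.0, 100.0, size=(m, m)))
    E = rng.uniform(-100.0, 100.0, size=(m, m))            # common random perturbation
    Ts = []
    for _ in range(n):
        D = np.diag(rng.uniform(-100.0, 100.0, size=m))    # random diagonal matrix D_k
        Ts.append(U @ D @ U.T + beta * E)
    return Ts, U

Ts, U_true = make_test_set(beta=0.1, seed=8)
print(len(Ts), Ts[0].shape)                                # 5 (5, 5)
```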

A measure of how diagonal a set of matrices is, is the following:
$$\gamma = \frac{\sum_{k=1}^{n} \left\| \mathrm{diag}(T_k) \right\|^2}{\sum_{k=1}^{n} \left\| T_k \right\|^2}.$$
This is indeed a measure of how diagonal the set of matrices is, since $0 \le \gamma \le 1$ and, when orthogonal transforms are applied, $\gamma$ decreases continuously as $T_k$ deviates continuously from a diagonal form. Let $\alpha = \frac{\sum_{k=1}^{n} \|D_k\|^2}{\|\beta E\|^2}$ and let $\alpha$ vary from 0 to 0.9 in steps of 0.1; then a comparison between the mentioned procedures as a function of $\alpha$ can be seen in Figure 1.

Furthermore, the convergence of the GAEX algorithm for different trials as a function of the iteration number when $\beta = 0$ and $\alpha = 0.5$ can be seen in Figure 2 and Figure 3 respectively.
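For completeness, a short sketch (mine) of the diagonality measure $\gamma$ defined above; undoing the mixing orthogonal matrix brings $\gamma$ to one in the noise-free case:

```python
# Sketch (mine): the diagonality measure gamma = sum_k ||diag(T_k)||^2 / sum_k ||T_k||^2.
import numpy as np

def gamma(Ts):
    num = sum(np.sum(np.diag(T) ** 2) for T in Ts)
    den = sum(np.sum(T ** 2) for T in Ts)
    return num / den

rng = np.random.default_rng(9)
m = 5
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))           # a mixing orthogonal matrix
Ts = [Q @ np.diag(rng.standard_normal(m)) @ Q.T for _ in range(5)]

print(gamma(Ts))                                           # well below 1 for the mixed T_k
print(gamma([Q.T @ T @ Q for T in Ts]))                    # 1.0 (up to rounding) after undoing Q
```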

4. SUMMARY

We have proposed a joint diagonalization algorithm which transforms the constrained optimization problem into an unconstrained problem. This means that any conventional iterative method operating in a vector space could be applied to the problem.

An explicit expression for an isomorphism between $S_n^{\perp}(\mathbb{R})$ and $\mathbb{R}^{\frac{n(n-1)}{2}}$ was introduced. Furthermore, expressions for the first order derivatives of the proposed method were also presented.

Simulations indicate that the gradient based method converges to the same point as the JADE algorithm does. Hence a potential application of the introduced method is in the field of adaptive BSS, and this topic is left open for future investigation.

REFERENCES

[1] B. Afsari and P. S. Krishnaprasad, "Some Gradient Based Joint Diagonalization Methods for ICA," in Proceedings of the 5th International Conference on Independent Component Analysis and Blind Source Separation, Granada, Spain, September 22-24, 2004, pp. 437–444.

[2] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso and E. Moulines, "A blind source separation technique based on second order statistics," IEEE Trans. Signal Process., vol. 45, pp. 434–444, 1997.

[3] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proceedings-F, vol. 140, pp. 362–370, 1993.

[4] J.-F. Cardoso and A. Souloumiac, "Jacobi angles for simultaneous diagonalization," SIAM J. Mat. Anal. Appl., vol. 17, pp. 161–164, 1996.

[5] A. Cichocki and S.-I. Amari, Adaptive Blind Signal and Image Processing. John Wiley & Sons, New York, 2002.

[6] J. De Leeuw, "A New Computational Method to Fit the Weighted Euclidian Distance Model," Psychometrika, vol. 43, pp. 479–489, 1978.

[7] M. Joho and K. Rahbar, "Joint diagonalization of correlation matrices by using gradient methods with application to blind signal separation," in Proc. IEEE SAM, Rosslyn, VA, August 4-6, 2002, pp. 273–277.

[8] M. Joho and H. Mathis, "Joint diagonalization of correlation matrices by using Newton methods with application to blind signal separation," in Proc. IEEE SAM, Rosslyn, VA, August 4-6, 2002, pp. 403–407.

[9] J. Gallier, Geometric Methods and Applications: for Computer Science and Engineering. Springer, New York, 2000.

[10] R. A. Horn and C. Johnson, Matrix Analysis. Cambridge University Press, New York, 1985.

[11] O. Linton and R. McCrorie, "Differentiation of an Exponential Matrix Function," Econometric Theory, vol. 11, pp. 1182–1185, 1995.

[12] J. Manton, "Optimization Algorithms Exploiting Unitary Constraints," IEEE Trans. Signal Process., vol. 50, pp. 635–650, 2002.

[13] J. R. Magnus and H. Neudecker, "The Commutation Matrix: Some Properties and Applications," The Annals of Statistics, vol. 7, pp. 381–394, 1979.

[14] J. R. Magnus and H. Neudecker, "The Elimination Matrix: Some Lemmas and Applications," SIAM J. Alg. Disc. Meth., vol. 1, pp. 422–449, 1980.

[15] H. Neudecker and T. Wansbeek, "Some Results on Commutation Matrices, with Statistical Applications," The Canadian Journal of Statistics, vol. 11, pp. 221–231, 1983.

[16] T. Tanaka and S. Fiori, "Least Squares Approximate Joint Diagonalization on the Orthogonal Group," in Proc. ICASSP, Honolulu, Hawaii, USA, April 15-20, 2007, pp. 649–652.

[17] I. Yamada and T. Ezaki, "An Orthogonal Matrix Optimization by Dual Cayley Parameterization Technique," in Proceedings of the 4th International Conference on Independent Component Analysis and Blind Source Separation, Nara, Japan, April 1-4, 2003, pp. 35–40.

[18] A. Ziehe and K.-R. Müller, "TDSEP – an efficient algorithm for blind separation using time structure," in Proc. Int. Conf. on Artificial Neural Networks, Skövde, Sweden, September 2-4, 1998, pp. 273–277.

