
Dépôt Institutionnel de l’Université libre de Bruxelles / Université libre de Bruxelles Institutional Repository

Thèse de doctorat/ PhD Thesis Citation APA:

Verdebout, T. (2008). Optimal inference for one-sample and multisample principal component analysis (Unpublished doctoral dissertation). Université libre de Bruxelles, Faculté des Sciences – Mathématiques, Bruxelles.

Disponible à / Available at permalink : https://dipot.ulb.ac.be/dspace/bitstream/2013/210448/4/0713fafd-33cb-4a08-b633-92f8b1c25855.txt


This Ph.D. thesis has been digitized by Université libre de Bruxelles. The author who would disagree on its online availability in DI-fusion is invited to contact the University ([email protected]).

If a native electronic version of the thesis exists, the University can guarantee neither that the present digitized version is identical to the native electronic version, nor that it is the definitive official version of the thesis.

DI-fusion is the Institutional Repository of Université libre de Bruxelles; it collects the research output of the University, available on open access as much as possible. The works included in DI-fusion are protected by the Belgian legislation relating to authors’ rights and neighbouring rights.

Any user may, without prior permission from the authors or copyright owners, for private usage or for educational or scientific research purposes, to the extent justified by the non-profit activity, read, download or reproduce on paper or on any other media, the articles or fragments of other works, available in DI-fusion, provided:

The authors, title and full bibliographic details are credited in any copy;

The unique identifier (permalink) for the original metadata page in DI-fusion is indicated;

The content is not changed in any way.

It is not permitted to store the work in another database in order to provide access to it; the unique identifier (permalink) indicated above must always be used to provide access to the work. Any other use not mentioned above requires the authors’ or copyright owners’ permission.


D 03538

Université Libre de Bruxelles

Faculté des Sciences, Département de Mathématique

OPTIMAL INFERENCE FOR ONE-SAMPLE AND MULTISAMPLE PRINCIPAL COMPONENT ANALYSIS

Thesis submitted for the degree of Doctor of Science (Statistics)

Supervisor: Marc HALLIN
Co-supervisor: Davy PAINDAVEINE

Academic year 2008-2009
Thomas VERDEBOUT


I would like to express my most sincere thanks to Professors Marc Hallin and Davy Paindaveine. I thank them for everything they have given me, both in terms of knowledge of Mathematical Statistics and in terms of the support and help provided in carrying out this work.

I would like to thank the team of statisticians at ULB, who offered me the assistant position I hold today and who welcomed me very warmly, in a more than positive working atmosphere. Thanks in particular to Professors Catherine Dehon and Catherine Vermandele, who supported me a great deal during these years of work.

I wish to thank all the members of the jury for their interest in this work. I thank in particular Professor Guy Mélard for agreeing to chair it.

I thank the ECARES and LMTD teams, who provided more than appreciable help. Thanks in particular to Professors David Veredas and Jean-Jacques Droesbeke, to Hendrika, Claude, Nancy, Romy and all the others. I thank Professor Masanobu Taniguchi as well as all the Japanese colleagues who made my stay in Tokyo and Hakone an unforgettable time.

Constant help came from my office colleagues, who are also friends. So thank you to Nézar, the two Charles, Christophe, Yvik, Héléna and all the others. Thanks also for the support given by friends from the university and elsewhere: thank you Dave, Sarah, Greg, Aurélie, the two Quents, Fanny, Johan, Sophie, Dude, Tom, Justine and all the others.

Thank you to my parents, grandparents, my brother, Amélie, Anne, Nico, Martine, Lulu and the whole family for their encouragement during these years.

Finally, I would particularly like to thank Marie for her daily help and for the admirable patience she showed during the difficult moments of these thesis years.


sis (PCA). More specifically, we develop pseudo-Gaussian and signed-rank procedures for various testing problems in one-sample and multisample PCA. Our tests are valid under general ellipticity assumptions and achieve Le Cam optimality at correctly specified densities. Although it is mainly of a theoretical nature, this work is also driven by the objective of designing methods useful for practitioners, which is why numerical results are presented throughout.

1 Principal Component Analysis.

1.1 Definition.

The earliest description of the multivariate tool known as PCA can be traced back to Pearson (1901) and Hotelling (1933). In many applications, the random vector of interest, $X$ say, collects a large number $k$ of variables, which obscures the structure of the corresponding distribution and hence makes it difficult to study the properties of this random vector. The main idea of PCA, in this context, is to replace the $k$ marginals of $X = (X_1, \ldots, X_k)'$ with (typically, a few) appropriately chosen random variables $U_i$, called the principal components (PCs), in such a way that most of the information present in $X$ sits in the first PCs. Actually, PCs are uncorrelated linear combinations of the original variables $X_i$ achieving the latter information objective.

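More precisely, PCs are defined as follows. Assuming that the random vector $X$ has finite second-order moments, the first PC $U_1$ is obtained by looking for the linear combination $U_1 = \beta_1' X$ of the original variables which has maximum variance, subject to the constraint $\beta_1 \in \mathcal{S}^{k-1} := \{\beta \in \mathbb{R}^k : \|\beta\| = 1\}$, that is,

$$U_1 := \beta_1' X, \quad \text{with } \beta_1 := \arg\max_{\beta \in \mathcal{S}^{k-1}} \mathrm{Var}(\beta' X).$$

For any $\ell = 2, \ldots, k$, the $\ell$th PC is then defined as the linear combination $U_\ell = \beta_\ell' X$ of the original variables that still has maximum variance, but now subject to the constraint that $\beta_\ell \in \mathcal{S}^{k-1}$ and that $U_\ell$ is uncorrelated with $U_i$, $i = 1, \ldots, \ell - 1$, that is,

$$U_\ell := \beta_\ell' X, \quad \text{with } \beta_\ell := \arg\max_{\beta \in \mathcal{S}^{k-1},\ \mathrm{Cov}(\beta' X,\, \beta_i' X) = 0,\ i = 1, \ldots, \ell - 1} \mathrm{Var}(\beta' X).$$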

If the $k$ original variables are to be replaced with a collection of $q < k$ variables $Y_1, \ldots, Y_q$, then the first $q$ PCs provide an “optimal” choice for those $q$ variables, in the sense that they minimize the “information” that is lost when performing this replacement. What makes PCA interesting in practice is that most of the variability in $X$ is usually contained in the first few PCs.

Denoting by $\Sigma$ the covariance matrix of $X$, the PCs associated with $X$ are usually computed as follows. Defining the $k$-dimensional special orthogonal group as

$$SO(k) := \{ O \in GL(k) \mid O'O = I_k,\ \det(O) = 1 \},$$

where $GL(k)$ stands for the collection of $(k \times k)$ invertible matrices, let

$$\Sigma = \beta \Lambda_\Sigma \beta' \qquad (1)$$

be “the” spectral decomposition of $\Sigma$, chosen so that (i) $\beta = (\beta_1, \ldots, \beta_k) \in SO(k)$ and (ii) the diagonal entries of

$$\Lambda_\Sigma := \mathrm{diag}(\lambda_{1;\Sigma}, \ldots, \lambda_{k;\Sigma})$$

are in decreasing order. It can then easily be shown that the $\ell$th PC is given by $U_\ell = \beta_\ell' X$, with corresponding variance $\lambda_{\ell;\Sigma}$. Therefore it is clear that inference on eigenvectors and eigenvalues of the covariance matrix is of primary interest in PCA.
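The computation just described can be sketched numerically as follows. This is only an illustration under stated assumptions: the matrix `sigma` is an arbitrary made-up covariance matrix, and the function name `principal_components` is hypothetical (it does not come from the thesis). The sketch returns the eigenvalues in decreasing order and flips the sign of one eigenvector if needed so that $\beta$ has determinant one, as required in (1).

```python
import numpy as np

def principal_components(sigma):
    """Spectral decomposition sigma = beta @ diag(lambdas) @ beta.T with
    eigenvalues in decreasing order and det(beta) = +1 (illustrative helper)."""
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing
    lambdas = eigvals[order]
    beta = eigvecs[:, order]                   # fancy indexing returns a copy
    if np.linalg.det(beta) < 0:                # move beta into SO(k)
        beta[:, -1] *= -1.0
    return lambdas, beta

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
sigma = a @ a.T + 4.0 * np.eye(4)              # hypothetical covariance matrix
lambdas, beta = principal_components(sigma)

# The covariance matrix of the PCs U = beta' X is beta' Sigma beta,
# which should be diagonal with the eigenvalues on its diagonal.
cov_u = beta.T @ sigma @ beta
```

The columns of `beta` are the weights $\beta_\ell$, and `lambdas[l]` is the variance of the corresponding PC.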

PCs can also be interpreted geometrically. If one considers the family of k-dimensional ellipsoids

$$(x - \theta)' \Sigma^{-1} (x - \theta) = c, \quad c > 0 \qquad (2)$$

(for the sake of simplicity, we assume here that $\Sigma$ is positive definite), the vectors $\beta_\ell$, $\ell = 1, \ldots, k$, which weight the original variables in the computation of the PCs, are parallel to the principal axes of these ellipsoids. The size of each principal axis is directly related to the corresponding variance $\lambda_{\ell;\Sigma}$; the first principal axis of such ellipsoids therefore determines the direction in which variability is largest.

1.2 Gaussian inference in Principal Component Analysis.

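As often in multivariate analysis, standard inference procedures assume that the underlying distribution is multinormal, i.e., that the $k$-dimensional observations $X_1, \ldots, X_n$ are mutually independent with a common $\mathcal{N}_k(\theta, \Sigma)$ distribution.

In this Gaussian parametric model, it is well known that the Gaussian maximum likelihood estimators $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_k)$ and $\hat\Lambda_\Sigma$ for the population eigenvectors matrix $\beta$ and eigenvalues matrix $\Lambda_\Sigma$ are respectively given by the eigenvectors and eigenvalues of $S := \frac{1}{n} \sum_{j=1}^n (X_j - \bar X)(X_j - \bar X)'$, with $\bar X := \frac{1}{n} \sum_{j=1}^n X_j$. Anderson (1963) derived the asymptotic joint distribution of $\hat\beta$ and $\hat\Lambda_\Sigma$. More precisely, denoting by $\mathrm{vec}(A)$ (resp., $\mathrm{dvec}(A)$) the vector stacking the entries (resp., the diagonal entries) of $A$ on top of each other, and assuming the population eigenvalues are pairwise distinct, we have that, at the multinormal,

$$\begin{pmatrix} n^{1/2}\, \mathrm{dvec}(\hat\Lambda_\Sigma - \Lambda_\Sigma) \\ n^{1/2}\, \mathrm{vec}(\hat\beta - \beta) \end{pmatrix} \xrightarrow{\ \mathcal{L}\ } \mathcal{N}_{k(k+1)}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2\Lambda_\Sigma^2 & 0 \\ 0 & \Gamma_\beta \end{pmatrix} \right)$$

as $n \to \infty$, where $\xrightarrow{\mathcal{L}}$ denotes convergence in distribution, and $\Gamma_\beta$ stands for the $(k^2 \times k^2)$ matrix with block

$$\delta_{ij} \sum_{\ell \neq i} \frac{\lambda_{i;\Sigma} \lambda_{\ell;\Sigma}}{(\lambda_{i;\Sigma} - \lambda_{\ell;\Sigma})^2}\, \beta_\ell \beta_\ell' \; - \; (1 - \delta_{ij}) \frac{\lambda_{i;\Sigma} \lambda_{j;\Sigma}}{(\lambda_{i;\Sigma} - \lambda_{j;\Sigma})^2}\, \beta_j \beta_i'$$

in position $(i, j)$, $i, j = 1, \ldots, k$ ($\delta_{ij}$ is the usual Kronecker symbol).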

Of course, this distributional result can be used to make inference about the population PCs, provided that the classical Gaussian assumption is fulfilled. We briefly provide two examples, in the context of hypothesis testing.

1.2.1 Example 1.

First, we consider the problem of testing that the first eigenvector is equal to a fixed unit vector $\beta^0$, that is, the null hypothesis is $\mathcal{H}_0 : \beta_1 = \beta^0$. Anderson (1963) introduced the test $\phi_{\beta;\mathrm{Anderson}}$, which rejects the null (at asymptotic level $\alpha$) when

$$Q_{\mathrm{Anderson}} := n \left( \hat\lambda_{1;\Sigma}\, \beta^{0\prime} S^{-1} \beta^0 + \hat\lambda_{1;\Sigma}^{-1}\, \beta^{0\prime} S \beta^0 - 2 \right) > \chi^2_{k-1, 1-\alpha}, \qquad (3)$$

where $\hat\lambda_{1;\Sigma}$ of course denotes the upper left entry of $\hat\Lambda_\Sigma$, and $\chi^2_{k-1, 1-\alpha}$ stands for the $\alpha$-upper quantile of the chi-square distribution with $k - 1$ degrees of freedom. We stress that Anderson’s test strictly requires multinormality. To improve on that, a more robust procedure has been introduced by Tyler (1981, 1983) for the same problem; Tyler’s tests are valid (in the sequel, a test is said to be valid if it asymptotically meets the nominal level constraint $\alpha$) under any elliptical distribution with finite fourth-order moments, while retaining the properties of Anderson’s test at the multinormal model.
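As a rough numerical illustration of the statistic in (3), the sketch below computes $Q_{\mathrm{Anderson}}$ on simulated Gaussian data. The data, the sample size, and the function name `anderson_eigvec_stat` are assumptions made for the example only. The critical value is not computed here; for $k = 3$ and $\alpha = 0.05$ one would compare the statistic with the 0.95 quantile of the chi-square distribution with 2 degrees of freedom, which is approximately 5.991.

```python
import numpy as np

def anderson_eigvec_stat(x, beta0):
    """Q = n (l1 * b' S^{-1} b + b' S b / l1 - 2), with S the (1/n) sample
    covariance matrix and l1 its largest eigenvalue (illustrative helper)."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / n
    l1 = np.linalg.eigvalsh(s)[-1]             # eigvalsh returns ascending order
    b = beta0 / np.linalg.norm(beta0)          # make sure beta0 is a unit vector
    return n * (l1 * (b @ np.linalg.solve(s, b)) + (b @ s @ b) / l1 - 2.0)

rng = np.random.default_rng(1)
# Hypothetical Gaussian sample whose first population eigenvector is e1.
x = rng.standard_normal((500, 3)) * np.sqrt(np.array([4.0, 2.0, 1.0]))

q_null = anderson_eigvec_stat(x, np.array([1.0, 0.0, 0.0]))

# Sanity check: plugging in the first sample eigenvector gives Q = 0.
xc = x - x.mean(axis=0)
first_sample_eigvec = np.linalg.eigh(xc.T @ xc / x.shape[0])[1][:, -1]
q_at_estimate = anderson_eigvec_stat(x, first_sample_eigvec)
```

The statistic is nonnegative for any unit vector and vanishes at the first sample eigenvector, so it measures how far $\beta^0$ is from being an eigenvector of $S$ associated with its largest eigenvalue.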

1.2.2 Example 2.

Keeping in mind that the main objective of PCA is dimension reduction, it is crucial to be able to decide (in a rational way) about the number $q$ of PCs that should be kept to perform the subsequent analysis of the data. Various rules have been proposed in the literature. Perhaps the most standard criterion for choosing $q$ is to select the smallest value of $q$ that ensures the proportion of the total variability “explained” by the last $k - q$ PCs, namely

$$p := \frac{\sum_{j=q+1}^{k} \lambda_{j;\Sigma}}{\sum_{j=1}^{k} \lambda_{j;\Sigma}},$$

falls below some fixed target level $p_0$, say.

This motivates our second example, in which we want to test the null $\mathcal{H}_0 : p = p_0$ against the alternative $\mathcal{H}_1 : p < p_0$, where $p_0 \in (0, 1)$ is fixed. Again, the classical Gaussian test is due to Anderson (1963). This test, $\phi_{\Lambda;\mathrm{Anderson}}$ say, rejects the null at asymptotic level $\alpha$ when

$$T_{\mathrm{Anderson}} := \frac{(n/2)^{1/2} \left[ (1 - p_0) \sum_{j=q+1}^{k} \hat\lambda_{j;\Sigma} - p_0 \sum_{j=1}^{q} \hat\lambda_{j;\Sigma} \right]}{\left[ p_0^2 \sum_{j=1}^{q} \hat\lambda_{j;\Sigma}^2 + (1 - p_0)^2 \sum_{j=q+1}^{k} \hat\lambda_{j;\Sigma}^2 \right]^{1/2}} < z_\alpha,$$

where $z_\alpha$ stands for the $\alpha$-quantile of the standard normal distribution. As in our first example, this test requires Gaussian assumptions. Robustified tests, in the same spirit as Tyler’s tests mentioned above (ensuring validity under any elliptical distribution with finite fourth-order moments and retaining the properties of Anderson’s test at the multinormal), were proposed by Davis (1977).
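The test above can be sketched as follows, under assumptions that are purely illustrative: the simulated Gaussian sample, the choices $q = 2$ and $p_0 = 0.3$, and the function name `anderson_proportion_stat` are not taken from the thesis. The normal quantile $z_\alpha$ is obtained from Python's standard library.

```python
import numpy as np
from statistics import NormalDist

def anderson_proportion_stat(x, q, p0):
    """T = (n/2)^{1/2} [(1-p0) sum_{j>q} l_j - p0 sum_{j<=q} l_j]
           / [p0^2 sum_{j<=q} l_j^2 + (1-p0)^2 sum_{j>q} l_j^2]^{1/2},
    with l_1 >= ... >= l_k the eigenvalues of the (1/n) sample covariance."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(xc.T @ xc / n))[::-1]
    num = (1.0 - p0) * lam[q:].sum() - p0 * lam[:q].sum()
    den = np.sqrt(p0**2 * (lam[:q]**2).sum() + (1.0 - p0)**2 * (lam[q:]**2).sum())
    return np.sqrt(n / 2.0) * num / den

rng = np.random.default_rng(2)
# Hypothetical sample with population eigenvalues (6, 3, 1): with q = 2,
# the last PC explains a proportion p = 0.1 of the total variability.
x = rng.standard_normal((2000, 3)) * np.sqrt(np.array([6.0, 3.0, 1.0]))

alpha, q, p0 = 0.05, 2, 0.3
t_stat = anderson_proportion_stat(x, q, p0)
z_alpha = NormalDist().inv_cdf(alpha)          # alpha-quantile of N(0, 1)
reject = t_stat < z_alpha                      # reject H0: p = p0 in favour of p < p0

# Sanity check: at p0 equal to the sample proportion, the numerator vanishes.
xc = x - x.mean(axis=0)
lam = np.sort(np.linalg.eigvalsh(xc.T @ xc / x.shape[0]))[::-1]
p_hat = lam[q:].sum() / lam.sum()
t_at_p_hat = anderson_proportion_stat(x, q, p_hat)
```

Since the true proportion (0.1) is well below $p_0 = 0.3$ in this made-up setting, the statistic is strongly negative and the null is rejected.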

The testing problems in Examples 1 and 2 above are very important in practice. The procedures proposed by Tyler and Davis, however, can be improved both in terms of robustness and in terms of efficiency. Defining robust and efficient tests for those problems will be one of the main goals of this work.

2 Common Principal Components.

2.1 Definition.

Multisample extensions of PCA were proposed by Flury (1984), in which the so-called Common Principal Components (CPC) model is introduced. Let $(X_{i1}, \ldots, X_{in_i})$, $i = 1, \ldots, m$, be a collection of $m$ mutually independent samples of i.i.d. $k$-dimensional random vectors with covariance matrix $\Sigma_i = \beta_i \Lambda_{\Sigma_i} \beta_i'$, $i = 1, \ldots, m$; these $m$ spectral decompositions here are as in (1).

Flury’s original goal was to model (possibly strong) relationships between covariance matrices.

Similarities among covariance matrices may take many forms, among which:

Level 1. All $\Sigma_i$’s are equal: $\Sigma_1 = \ldots = \Sigma_m$. This is usually called homogeneity of covariances.

Level 2. The $\Sigma_i$’s are proportional to each other: for each $i = 2, \ldots, m$, there exists a positive constant $\rho_i$ such that $\Sigma_i = \rho_i \Sigma_1$.

Level 3. The $\Sigma_i$’s share the same eigenvectors: there exist $\beta \in SO_k$ and diagonal matrices $\Lambda_{\Sigma_i}$, $i = 1, \ldots, m$, such that $\Sigma_i = \beta \Lambda_{\Sigma_i} \beta'$, $i = 1, \ldots, m$. This is what is called the CPC model.

Level 4. The $\Sigma_i$’s are arbitrary symmetric and positive definite matrices.

The introduction of CPC, which is an obvious generalization of PCA to several populations, can be motivated in various ways, one of them being the so-called parsimony principle, which recommends introducing extra parameters only when the data indicate they are required. In this respect, the CPC model is more parsimonious than the general model in Level 4, while providing much more flexibility than the models associated with Levels 1 and 2. Most importantly, the CPC model is relevant in numerous applications, especially in biology. The CPC assumption typically holds when the same variables are measured, for instance, on several animal species.

Hence, CPC can be regarded as an interesting level in the hierarchy above (note that if $m$ covariance matrices satisfy the constraint in level $k$, then they satisfy the constraint in level $k + 1$, $k = 1, 2, 3$).
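To make the hierarchy concrete, here is a small sketch that builds two covariance matrices satisfying Level 3 (CPC) but neither Level 1 nor Level 2. The matrices and the helper name `off_diagonal_norm` are made up for the illustration. The check relies on a standard linear algebra fact, not stated in the text above: symmetric matrices share a common orthogonal basis of eigenvectors if and only if they commute.

```python
import numpy as np

def off_diagonal_norm(beta, sigma):
    """Frobenius norm of the off-diagonal part of beta' sigma beta;
    it is zero when the columns of beta are eigenvectors of sigma."""
    m = beta.T @ sigma @ beta
    return np.linalg.norm(m - np.diag(np.diag(m)))

rng = np.random.default_rng(3)
beta, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # hypothetical common eigenvectors
if np.linalg.det(beta) < 0:
    beta[:, -1] *= -1.0                               # put beta in SO(3)

sigma_1 = beta @ np.diag([5.0, 3.0, 1.0]) @ beta.T
sigma_2 = beta @ np.diag([2.0, 4.0, 0.5]) @ beta.T    # same eigenvectors, other eigenvalues

level_1 = np.allclose(sigma_1, sigma_2)                       # equality
rho = np.trace(sigma_2) / np.trace(sigma_1)
level_2 = np.allclose(sigma_2, rho * sigma_1)                 # proportionality
level_3 = np.allclose(sigma_1 @ sigma_2, sigma_2 @ sigma_1)   # common eigenvectors
```

In this example `level_3` holds while `level_1` and `level_2` fail, which is exactly the situation the CPC model is meant to capture.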

2.2 Gaussian inference.

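Assume $(X_{i1}, \ldots, X_{in_i})$, $i = 1, \ldots, m$, are $m$ mutually independent samples of i.i.d. $k$-dimensional Gaussian random vectors with location parameters $\theta_i$ and positive definite covariance matrices $\Sigma_i$, $i = 1, \ldots, m$. Flury (1984) considered inference for the CPC model in this Gaussian context.

More specifically, he first considered maximum likelihood estimation of the parameters of the CPC model, that is, estimation of the common eigenvectors $\beta = (\beta_1, \ldots, \beta_k)$ and of the (possibly heterogeneous) eigenvalue matrices

$$\Lambda_{\Sigma_i} := \mathrm{diag}(\lambda_{1;\Sigma_i}, \ldots, \lambda_{k;\Sigma_i}), \quad i = 1, \ldots, m;$$

see the previous section. Denoting the sample means and covariance matrices by

$$\bar X_i := \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij} \quad \text{and} \quad S_i := \frac{1}{n_i} \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)(X_{ij} - \bar X_i)',$$

respectively, $\hat\beta$ and the $\hat\Lambda_{\Sigma_i}$’s are solutions of the likelihood equations

$$\hat\beta_j' S_i \hat\beta_j = \hat\lambda_{j;\Sigma_i}, \quad i = 1, \ldots, m, \ j = 1, \ldots, k,$$
$$\hat\beta_j' \left( \sum_{i=1}^{m} n_i\, \frac{\hat\lambda_{j;\Sigma_i} - \hat\lambda_{l;\Sigma_i}}{\hat\lambda_{j;\Sigma_i} \hat\lambda_{l;\Sigma_i}}\, S_i \right) \hat\beta_l = 0, \quad j \neq l, \qquad (4)$$
$$\hat\beta_j' \hat\beta_l = \delta_{jl}, \quad j, l = 1, \ldots, k,$$

where $\delta_{jl}$ is the usual Kronecker symbol. An explicit solution of the likelihood equations (4) does not exist, but an algorithm for solving them numerically has been proposed by Flury and Gautschi (1986).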

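Flury (1986) studied the asymptotic behavior of those MLEs at the Gaussian model above. More precisely, assuming that $\beta$ is properly identified, he showed that, as $n := \sum_{i=1}^{m} n_i \to \infty$,

$$\begin{pmatrix} n_1^{1/2}\, \mathrm{dvec}(\hat\Lambda_{\Sigma_1} - \Lambda_{\Sigma_1}) \\ \vdots \\ n_m^{1/2}\, \mathrm{dvec}(\hat\Lambda_{\Sigma_m} - \Lambda_{\Sigma_m}) \\ n^{1/2}\, \mathrm{vec}(\hat\beta - \beta) \end{pmatrix} \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\left( \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2\Lambda_{\Sigma_1}^2 & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & 2\Lambda_{\Sigma_m}^2 & 0 \\ 0 & \cdots & 0 & \Gamma_\beta^{\mathrm{CPC}} \end{pmatrix} \right), \qquad (5)$$

where $\Gamma_\beta^{\mathrm{CPC}}$ denotes the $(k^2 \times k^2)$ matrix with block

$$\sum_{l \neq j} \left( \sum_{i=1}^{m} \frac{n_i}{n}\, \frac{(\lambda_{l;\Sigma_i} - \lambda_{j;\Sigma_i})^2}{\lambda_{l;\Sigma_i} \lambda_{j;\Sigma_i}} \right)^{-1} \beta_l \beta_l' \ \ (h = j), \qquad - \left( \sum_{i=1}^{m} \frac{n_i}{n}\, \frac{(\lambda_{h;\Sigma_i} - \lambda_{j;\Sigma_i})^2}{\lambda_{h;\Sigma_i} \lambda_{j;\Sigma_i}} \right)^{-1} \beta_j \beta_h' \ \ (h \neq j)$$

in position $(h, j)$, $h, j = 1, \ldots, k$.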

In this work, we do not deal with point estimation in CPC, but rather with the problem of testing the null hypothesis of CPC. That is, we consider the problem of testing $\mathcal{H}_0 : \Sigma_i = \beta \Lambda_{\Sigma_i} \beta'$, $i = 1, \ldots, m$. Flury (1984) showed that the Gaussian likelihood ratio test for this problem rejects the null of CPC (at asymptotic level $\alpha$) whenever

$$-2 \log \Lambda_{\mathrm{Flury}} = \sum_{i=1}^{m} n_i \log \frac{\det(\mathrm{diag}(\hat\beta' S_i \hat\beta))}{\det(\hat\beta' S_i \hat\beta)} > \chi^2_{(m-1)k(k-1)/2,\, 1-\alpha}, \qquad (6)$$

where $\mathrm{diag}(A)$ stands for the diagonal matrix with the same diagonal as the matrix $A$. Note that $-2 \log \Lambda_{\mathrm{Flury}}$ has a very intuitive interpretation: this statistic will be small (close to zero) if $\hat\beta' S_i \hat\beta$ is nearly diagonal for all $i = 1, \ldots, m$, hence can be regarded as a measure of how well the $S_i$’s can be simultaneously diagonalized.
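The “simultaneous diagonalization” reading of (6) can be illustrated with the sketch below. It is not an implementation of Flury's test: the orthogonal matrix `beta` is supplied by hand instead of being the maximum likelihood estimator (which would require the Flury and Gautschi algorithm), and the matrices, sample sizes, and function name `flury_type_statistic` are assumptions made for the example.

```python
import numpy as np

def flury_type_statistic(cov_list, n_list, beta):
    """sum_i n_i * log( det(diag(B' S_i B)) / det(B' S_i B) ) for a GIVEN
    orthogonal matrix B (here B is not the MLE; illustrative helper)."""
    total = 0.0
    for s, n in zip(cov_list, n_list):
        m = beta.T @ s @ beta
        log_det_diag = np.sum(np.log(np.diag(m)))
        log_det = np.linalg.slogdet(m)[1]      # log|det(m)|, m positive definite
        total += n * (log_det_diag - log_det)
    return total

rng = np.random.default_rng(4)
beta, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # hypothetical common eigenvectors
s1 = beta @ np.diag([5.0, 3.0, 1.0]) @ beta.T
s2 = beta @ np.diag([2.0, 4.0, 0.5]) @ beta.T         # shares eigenvectors with s1

# A perturbed matrix that generically does not share the eigenvectors of s1.
perturbation = np.zeros((3, 3))
perturbation[0, 1] = perturbation[1, 0] = 0.3
s3 = s2 + perturbation

n_list = [100, 150]
stat_cpc = flury_type_statistic([s1, s2], n_list, beta)      # exactly diagonalized
stat_non_cpc = flury_type_statistic([s1, s3], n_list, beta)  # not diagonalized
```

By Hadamard's inequality each term of the sum is nonnegative and vanishes only when the corresponding matrix `beta.T @ s @ beta` is diagonal, which is why the first value is numerically zero and the second is strictly positive.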

Again, this Gaussian test is valid only if the observations have a multinormal distribution. However, to the best of our knowledge, no robustification of Flury’s test is available in the literature, not even under the (quite strong) assumption that the observations have an elliptical distribution with finite fourth-order moments. Another objective of this thesis is to define such robust tests, which should remain valid even under heterokurtic elliptical distributions. As we will explain later in this introduction, a further goal is to develop tests for the null of CPC that are both validity- and efficiency-robust.

3 Elliptical densities, signs and ranks.

Most classical statistical procedures in multivariate analysis are based on strong multinormality and moment assumptions. As we have seen above, inference in one-sample and several-sample PCA problems is no exception to this rule; see Sections 1 and 2, respectively. This section briefly presents the basic concepts and tools that will be used in this work in order to relax Gaussian and moment assumptions.

3.1 Elliptical distributions.

In this thesis, the main distributional framework will be that of elliptical distributions, which were introduced in the last decades to extend multinormal models by allowing non-Gaussian tail behaviors. Elliptical densities (in the sequel, we restrict to distributions that are absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^k$) are characterized by a probability density function of the form

$$x \mapsto f(x) \propto \frac{1}{\sigma^k}\, f_1\!\left( \frac{1}{\sigma} \left( (x - \theta)' V^{-1} (x - \theta) \right)^{1/2} \right) \qquad (7)$$

for some $k$-dimensional vector $\theta$ (location), some scale parameter $\sigma \in \mathbb{R}_0^+$, some symmetric and positive definite $(k \times k)$ matrix $V$ with determinant one (the shape matrix), and some (duly standardized: see below) function $f_1 : \mathbb{R}_0^+ \to \mathbb{R}^+$. This function $f_1$, which is called the radial density in the sequel, actually is not a probability density function since it does not integrate to one. Let ($V^{1/2}$ throughout denotes the symmetric root of $V$)

$$d_i(\theta, V) := \| V^{-1/2} (X_i - \theta) \| \qquad (8)$$

be the Euclidean norm of the centered and sphericized observation $V^{-1/2}(X_i - \theta)$, $i = 1, \ldots, n$. If $X_1, \ldots, X_n$ are i.i.d. with common density (7), the $d_i$’s in (8) are i.i.d. with probability density function and cumulative distribution function

$$r \mapsto \frac{1}{\sigma} \tilde f_{1k}\!\left( \frac{r}{\sigma} \right) := \frac{1}{\sigma \mu_{k-1;f_1}} \left( \frac{r}{\sigma} \right)^{k-1} f_1\!\left( \frac{r}{\sigma} \right) I_{[r > 0]}$$

and

$$r \mapsto \tilde F_{1k}\!\left( \frac{r}{\sigma} \right) := \int_0^{r/\sigma} \tilde f_{1k}(s)\, ds, \qquad (9)$$

respectively, provided that $\mu_{k-1;f_1} := \int_0^\infty r^{k-1} f_1(r)\, dr < \infty$, an assumption which is henceforth always made on $f_1$. To avoid moment assumptions, we will furthermore assume that $f_1$ belongs to the set of standardized radial densities

$$\mathcal{F}_1 := \left\{ f_1 \in \mathcal{F} : (\mu_{k-1;f_1})^{-1} \int_0^1 r^{k-1} f_1(r)\, dr = \frac{1}{2} \right\},$$

where $\mathcal{F} := \{ f_1 > 0 \text{ a.e.} : \mu_{k-1;f_1} < \infty \}$ is the set of radial densities. That is, $\mathcal{F}_1$ is defined in such a way that the $d_i$’s have common median $\sigma$.

Special instances of elliptical densities are the $k$-variate multinormal distribution, with radial density $f_1(r) = \phi_1(r) := \exp(-a_k r^2 / 2)$, the $k$-variate Student distributions, with radial densities (for $\nu \in \mathbb{R}_0^+$ degrees of freedom) $f_1(r) = f^t_{1,\nu}(r) := (1 + a_{k,\nu} r^2 / \nu)^{-(k+\nu)/2}$, and the $k$-variate power-exponential distributions, with radial densities $f_1(r) = f^e_{1,\eta}(r) := \exp(-b_{k,\eta} r^{2\eta})$, $\eta \in \mathbb{R}_0^+$; the positive constants $a_k$, $a_{k,\nu}$, and $b_{k,\eta}$ are such that $f_1 \in \mathcal{F}_1$.

At this point, a clarification is needed. As the very definition of PCA does involve covariance matrices, hence also finite second-order moment assumptions, it may seem puzzling at first sight that we want to avoid any moment assumption in this work. However, the geometric interpretation of PCA (see the end of Section 1.1) makes clear that, in an elliptical context, PCA can be based on the scatter matrix $\Sigma := \sigma^2 V$ as well as on standard covariance matrices. The advantage of such scatter matrices is clearly that they make sense without any finite moment.

Scatter matrices, which have therefore become classical tools in robust statistics, are, under ellipticity and finite second-order moments, proportional to the covariance matrix, hence can be considered as extensions of the traditional notion of covariance matrix. Most importantly, all testing problems described in the previous sections naturally extend to the setup where the observations would have an elliptically symmetric distribution, even in the absence of any moment assumption. Again, covariance matrices then need to be replaced with scatter (or shape) matrices.

Clearly, an elliptical density (see (7)) is characterized by a location $\theta$, a scale parameter $\sigma^2$, a shape matrix $V$, and a standardized radial density function $f_1$. Above, the shape matrix $V$ was normalized so that it has determinant one. Alternative normalizations, consisting e.g. in imposing that $(V)_{11} = 1$ or that $\mathrm{tr}(V) = k$, are also used in the literature. We adopt the determinant-based normalization in the sequel, since Paindaveine (2008) showed that it is the only normalization that implies the block-diagonality of the resulting Fisher information matrix (for any other definition of shape/scale, information matrices will have non-zero shape/scale blocks). Considering then the spectral decomposition $\Sigma = \beta \Lambda_\Sigma \beta'$ of the scatter matrix, where $\beta \in SO_k$ and where the diagonal entries of

$$\Lambda_\Sigma := \mathrm{diag}(\lambda_{1;\Sigma}, \ldots, \lambda_{k;\Sigma})$$

are in decreasing order, the eigenvalues decompose into $\Lambda_\Sigma =: \sigma^2 \Lambda_V$, where the diagonal matrix $\Lambda_V$ collects the eigenvalues of the shape matrix $V$. With this notation, the natural parameter for the elliptical densities in this work will be

$$\vartheta := (\theta, \sigma^2, \Lambda_V, \beta),$$

which will also lead to block-diagonal information matrices, provided that the determinant-based definition of scale/shape is adopted. The latter block-diagonality highly simplifies inference on eigenvalues and eigenvectors in the sequel.

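We end this section by announcing that optimality of the tests we propose in this thesis requires some mild regularity conditions on $f_1$. More precisely, optimality at radial density $f_1$ will actually require $f_1$ to belong to the collection $\mathcal{F}_a$ of absolutely continuous densities in $\mathcal{F}_1$ for which

$$\mathcal{I}_k(f_1) := \int_0^1 \varphi_{f_1}^2\!\left( \tilde F_{1k}^{-1}(u) \right) du < \infty \quad \text{and} \quad \mathcal{J}_k(f_1) := \int_0^1 \varphi_{f_1}^2\!\left( \tilde F_{1k}^{-1}(u) \right) \left( \tilde F_{1k}^{-1}(u) \right)^2 du < \infty, \qquad (10)$$

where, denoting by $\dot f_1$ the a.e. derivative of $f_1$, we let $\varphi_{f_1} := -\dot f_1 / f_1$. We stress, however, that the validity of our tests does not require such assumptions, but only that $f_1 \in \mathcal{F}_1$.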

3.2 Invariance, signs and ranks.

Rank-based methods today form a substantial body of techniques providing robust alternatives to the classical parametric methods. A full-scale development of methods based on ranks seems to have been sparked by a paper of Wilcoxon (1945) on the comparison between two treatments.

The popularity of nonparametric (rank-based) methods is mostly explained by the fact that they are valid under extremely mild assumptions. But for a long time, it was thought that the price to pay to achieve such good robustness properties was a severe lack of efficiency. This is however not at all the case, as was shown in Hodges and Lehmann (1956) and Chernoff and Savage (1958); see below.

In hypothesis testing, it often happens that the submodel associated with the null hypothesis $\mathcal{H}_0$ is invariant with respect to some group of transformations $\mathcal{G}$. To illustrate this, let $Y_1, \ldots, Y_n$ be i.i.d. random variables admitting a (common) probability density function of the form $x \mapsto f(x - \theta)$, where $f$ belongs to the family $\mathcal{F}$ of symmetric (positive) densities on the real line.

Consider the testing problem

$$\mathcal{H}_0 : \theta = 0 \quad \text{against} \quad \mathcal{H}_1 : \theta \neq 0, \qquad (11)$$

for which the underlying density $f$ plays the role of an infinite-dimensional nuisance. Of course, if $f$ is supposed to be known, one can derive optimal procedures at this fixed $f$. For instance, if this specified $f$ is Gaussian, then Student tests are obtained.

In a semiparametric approach, the underlying density $f \in \mathcal{F}$ is not specified anymore. A way to bypass this nuisance is to adopt the invariance principle, which states that

If the null hypothesis $\mathcal{H}_0$ is invariant with respect to a group of transformations $\mathcal{G}$, one should restrict to tests that are invariant with respect to $\mathcal{G}$.

Now, consider the group of transformations $\mathcal{G}_+ := \{ G_g \}$ defined by

$$G_g(Y_1, \ldots, Y_n) := (g(Y_1), \ldots, g(Y_n)),$$

where $g$ is an odd, continuous, and strictly monotone increasing function that satisfies $\lim_{x \to \infty} g(x) = \infty$. The null hypothesis in (11) is clearly invariant with respect to $\mathcal{G}_+$. Therefore, the invariance principle leads to restricting to tests that are constant along the orbits of $\mathcal{G}_+$, or equivalently, to tests that are measurable with respect to the maximal invariant associated with $\mathcal{G}_+$. It is easy to show that such a maximal invariant is given by the vector

$$(s_1, \ldots, s_n, R_1, \ldots, R_n),$$

where $s_i$ and $R_i$ denote the sign of $Y_i$ and the rank of $|Y_i|$ among $|Y_1|, \ldots, |Y_n|$, respectively. This shows that the invariance principle naturally brings rank-based tests (here, actually signed-rank tests) into the picture.
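The invariance of signs and ranks can be checked numerically. In the sketch below, the sample and the particular transformation $g(t) = t^3 + 2t$ (odd, continuous, strictly increasing, unbounded) are arbitrary choices made for illustration, and the helper name `signs_and_ranks` is hypothetical; ties are assumed absent, which holds almost surely for continuous data.

```python
import numpy as np

def signs_and_ranks(y):
    """Signs of the y_i and ranks of |y_i| among |y_1|, ..., |y_n| (no ties assumed)."""
    signs = np.sign(y)
    ranks = np.argsort(np.argsort(np.abs(y))) + 1   # double argsort gives 1-based ranks
    return signs, ranks

def g(t):
    # Hypothetical member of the group: odd, continuous, strictly increasing.
    return t**3 + 2.0 * t

rng = np.random.default_rng(5)
y = rng.standard_normal(10)

signs_before, ranks_before = signs_and_ranks(y)
signs_after, ranks_after = signs_and_ranks(g(y))
```

Both the signs and the ranks are left unchanged by `g`, so any statistic that depends on the data only through them is automatically invariant under the whole group.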

Some of the resulting rank-based tests enjoy very good efficiency properties, as shown by the following results. Hodges and Lehmann (1956) showed that, still for the one-sample problem above, the asymptotic relative efficiencies of the Wilcoxon signed-rank test $\phi_W$ with respect to the traditional Student test $\phi_{\mathrm{Stu}}$ satisfy

$$\inf_{f \in \mathcal{F}} \mathrm{ARE}_f(\phi_W / \phi_{\mathrm{Stu}}) = 0.864,$$

where the infimum is computed over all densities $f \in \mathcal{F}$ with finite second-order moments. In other words, Wilcoxon’s test, in the worst case, requires only about 15.7% more observations than Student’s (since $1/0.864 \approx 1.157$) to achieve the same asymptotic (local) powers. We stress that this says something about the worst case, and that there is no best case, in the sense that it is possible to exhibit sequences of densities $f_k$, $k = 1, 2, \ldots$ such that

$$\lim_{k \to \infty} \mathrm{ARE}_{f_k}(\phi_W / \phi_{\mathrm{Stu}}) = \infty,$$

which means that Student’s test may be arbitrarily more demanding than Wilcoxon’s in terms of sample sizes to achieve the same power.
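The bound 0.864 can be reproduced numerically. The sketch below relies on the classical closed-form expression $\mathrm{ARE}_f(\phi_W / \phi_{\mathrm{Stu}}) = 12\, \sigma_f^2 \left( \int f^2 \right)^2$, which is not stated in the text above and is brought in here as an assumption, together with the known fact that the infimum is attained at a parabolic density. The grids and the function name are illustrative choices.

```python
import numpy as np

def are_wilcoxon_vs_student(unnormalized_density, grid):
    """12 * var_f * (integral of f^2)^2, computed with Riemann sums on a
    regular grid after normalizing the density (illustrative helper)."""
    dx = grid[1] - grid[0]
    f = unnormalized_density(grid)
    f = f / (np.sum(f) * dx)                       # normalize to integrate to one
    mean = np.sum(grid * f) * dx
    var = np.sum((grid - mean) ** 2 * f) * dx
    return 12.0 * var * (np.sum(f ** 2) * dx) ** 2

# Gaussian density: the ARE should be 3 / pi, roughly 0.955.
grid_normal = np.linspace(-12.0, 12.0, 200001)
are_normal = are_wilcoxon_vs_student(lambda t: np.exp(-t ** 2 / 2.0), grid_normal)

# Parabolic density on [-sqrt(5), sqrt(5)]: the worst case, 108 / 125 = 0.864.
grid_parabolic = np.linspace(-3.0, 3.0, 200001)
are_parabolic = are_wilcoxon_vs_student(
    lambda t: np.maximum(1.0 - t ** 2 / 5.0, 0.0), grid_parabolic
)
```

Heavier-tailed densities push the value above one, in line with the absence of any finite upper bound.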

The Chernoff and Savage (1958) result concerns the Van der Waerden (i.e., normal-score) signed-rank test $\phi_{\mathrm{vdW}}$, and states that (the infimum ranges over the same collection of densities as in the Hodges and Lehmann result)

$$\inf_{f \in \mathcal{F}} \mathrm{ARE}_f(\phi_{\mathrm{vdW}} / \phi_{\mathrm{Stu}}) = 1,$$

and that this infimum is achieved under Gaussian densities only. In other words, the Van der Waerden signed-rank test always performs strictly better (asymptotically) than the Student test when the underlying distribution is non-Gaussian. Here again, there is no finite upper bound for the corresponding asymptotic relative efficiencies.

It turns out that, in many testing problems of a semiparametric nature, the invariance principle leads to considering signed-rank tests, and the (elliptical versions of the) testing problems introduced in Sections 1.1 and 1.2 are no exception to this rule. To show this, let $X_1, \ldots, X_n$ be $k$-variate i.i.d. observations, with (common) elliptical density $f$ in (7), and denote the corresponding hypothesis by $\mathrm{P}^{(n)}_{\vartheta; f_1}$; we use the notation introduced in the previous section. Now, decompose $X_i$ into

$$X_i = \theta + d_i(\theta, V)\, V^{1/2}\, U_i(\theta, V),$$

where

$$d_i(\theta, V) = \| V^{-1/2} (X_i - \theta) \| \quad \text{and} \quad U_i(\theta, V) = \frac{V^{-1/2} (X_i - \theta)}{\| V^{-1/2} (X_i - \theta) \|}.$$

Under $\mathrm{P}^{(n)}_{\vartheta; f_1}$, the multivariate signs $U_1(\theta, V), \ldots, U_n(\theta, V)$ are i.i.d. uniformly distributed on the unit sphere in $\mathbb{R}^k$, and the random distances $d_1(\theta, V), \ldots, d_n(\theta, V)$, which are i.i.d. with the probability density function $\tilde f_{1k}$ in (9), are independent of the $U_i(\theta, V)$’s.

Consider then the group $\mathcal{G}_{\theta, V} := \{ g_h \}$ of continuous monotone radial transformations defined by

$$g_h(X_1, \ldots, X_n) = g_h\!\left( \theta + d_1(\theta, V) V^{1/2} U_1(\theta, V), \ldots, \theta + d_n(\theta, V) V^{1/2} U_n(\theta, V) \right)$$
$$:= \left( \theta + h(d_1(\theta, V)) V^{1/2} U_1(\theta, V), \ldots, \theta + h(d_n(\theta, V)) V^{1/2} U_n(\theta, V) \right),$$

where $h : \mathbb{R}^+ \to \mathbb{R}^+$ is a continuous, strictly monotone increasing function that satisfies $h(0) = 0$ and $\lim_{r \to \infty} h(r) = \infty$. Clearly, this group $\mathcal{G}_{\theta, V}$ generates the family of distributions $\bigcup_{\sigma^2} \bigcup_{f_1} \{ \mathrm{P}^{(n)}_{\theta, \sigma^2, \Lambda_V, \beta; f_1} \}$. That is, it generates the family of elliptical distributions with fixed values of $\theta$, $\Lambda_V$, and $\beta$. For all testing problems we consider in this work (the generalization to multisample problems is obvious), it can be checked that the invariance principle leads to restricting to tests that are measurable with respect to the maximal invariant associated with this group, namely to the vector of signs and ranks

$$\left( R_1(\theta, V), \ldots, R_n(\theta, V), U_1(\theta, V), \ldots, U_n(\theta, V) \right), \qquad (12)$$

where $R_i(\theta, V)$ denotes the rank of $d_i(\theta, V)$ among $d_1(\theta, V), \ldots, d_n(\theta, V)$. The resulting tests will be automatically distribution-free under $\bigcup_{\sigma^2} \bigcup_{f_1} \{ \mathrm{P}^{(n)}_{\theta, \sigma^2, \Lambda_V, \beta; f_1} \}$. In practice, of course, those signs and ranks need to be replaced with estimates since the population values of $\theta$ and $V$ are unknown.
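The following sketch computes the signs and ranks in (12) for a simulated sample and checks that they are unchanged by a radial transformation. Everything specific here is an assumption made for illustration: the Gaussian sample, the known values of $\theta$ and $V$ (in practice they would be estimated), the choice $h(r) = r^2 + r$, and the helper names.

```python
import numpy as np

def sym_power(v, p):
    """Symmetric matrix power v^p via the spectral decomposition."""
    w, q = np.linalg.eigh(v)
    return q @ np.diag(w ** p) @ q.T

def signs_and_ranks(x, theta, v):
    """Distances d_i, multivariate signs U_i and ranks R_i of the d_i."""
    z = (x - theta) @ sym_power(v, -0.5)      # rows are V^{-1/2}(X_i - theta)
    d = np.linalg.norm(z, axis=1)
    u = z / d[:, None]
    ranks = np.argsort(np.argsort(d)) + 1     # 1-based ranks, no ties assumed
    return d, u, ranks

def radial_transform(x, theta, v, h):
    """Map X_i to theta + h(d_i) V^{1/2} U_i for a monotone h with h(0) = 0."""
    d, u, _ = signs_and_ranks(x, theta, v)
    return theta + (h(d)[:, None] * u) @ sym_power(v, 0.5)

rng = np.random.default_rng(6)
k, n = 3, 50
a = rng.standard_normal((k, k))
v = a @ a.T + np.eye(k)
v = v / np.linalg.det(v) ** (1.0 / k)         # shape matrix with determinant one
theta = np.array([1.0, -2.0, 0.5])            # hypothetical location
x = theta + rng.standard_normal((n, k)) @ sym_power(v, 0.5)

_, u_before, r_before = signs_and_ranks(x, theta, v)
x_new = radial_transform(x, theta, v, lambda r: r ** 2 + r)
_, u_after, r_after = signs_and_ranks(x_new, theta, v)
```

Only the distances change under the transformation; the signs and the ranks of the distances are preserved, which is the invariance property exploited by the signed-rank tests.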

4 Le Cam’s theory of asymptotic experiments.

4.1 Local asymptotic normality and the construction of locally and asymptotically optimal tests.

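Consider a sequence of statistical “experiments”

$$\mathcal{E}^{(n)} := \left\{ \mathbb{R}^n, \mathcal{B}^n, \mathcal{P}^{(n)} := \{ \mathrm{P}^{(n)}_\varsigma : \varsigma \in \Xi \} \right\},$$

where the parametric space $\Xi$ (which does not depend on $n$) is an open subset of $\mathbb{R}^m$. This sequence is said to be locally and asymptotically normal (LAN) iff, for any bounded sequence $\tau^{(n)}$ in $\mathbb{R}^m$, the log-likelihood ratio satisfies

$$\log \frac{d\mathrm{P}^{(n)}_{\varsigma + n^{-1/2} \tau^{(n)}}}{d\mathrm{P}^{(n)}_\varsigma} = (\tau^{(n)})' \Delta^{(n)}_\varsigma - \frac{1}{2} (\tau^{(n)})' \Gamma_\varsigma \tau^{(n)} + o_{\mathrm{P}}(1), \qquad (13)$$

as $n \to \infty$ under $\mathrm{P}^{(n)}_\varsigma$, where the central sequence $\Delta^{(n)}_\varsigma$ is asymptotically normally distributed with mean zero and covariance matrix $\Gamma_\varsigma$.

Under the LAN assumption, the sequence of local experiments (at any fixed $\varsigma$)

$$\mathcal{E}^{(n)}_\varsigma := \left\{ \mathbb{R}^n, \mathcal{B}^n, \{ \mathrm{P}^{(n)}_{\varsigma + n^{-1/2} \tau} : \tau \in \mathbb{R}^m \} \right\}$$

actually converges to the $m$-dimensional Gaussian shift model

$$\mathcal{E}_\varsigma := \left\{ \mathbb{R}^m, \mathcal{B}^m, \{ \mathcal{N}(\Gamma_\varsigma \tau, \Gamma_\varsigma) : \tau \in \mathbb{R}^m \} \right\},$$

where the convergence is based on the Le Cam distance between the sets of all achievable risk functions (from $\mathbb{R}^m$ to $\mathbb{R}^+$) in the considered statistical experiments (for bounded loss functions).

This convergence in particular implies that when $n \to \infty$, the power curves that can be achieved in $\mathcal{E}^{(n)}_\varsigma$ converge (pointwise in $\tau$ but uniformly in the set of all possible statistical procedures) to the power curves that can be achieved in $\mathcal{E}_\varsigma$. Vice versa, for any achievable risk function $R$ in the Gaussian shift model, there exists a sequence of risk functions associated with $\mathcal{E}^{(n)}_\varsigma$ that converges to $R$.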


Therefore it is clear that LAN and the convergence of statistical experiments above have crucial consequences on the construction of locally and asymptotically optimal tests in the original sequence of experiments. More specifically, it can be shown that, if $\phi(\Delta)$ is a test that enjoys some optimality property in the limiting experiment (here, $\Delta$ stands for the unique observation associated with $\mathcal{E}_\varsigma$), then the sequence of tests $\phi(\Delta^{(n)}_\varsigma)$ inherits, locally and asymptotically, the same optimality property in the original sequence of experiments; see Le Cam (1986) for more details.

4.2 Le Cam’s theory of asymptotic experiments and PCA.

In the (elliptical versions of the) PCA problems we consider, the relevant sequences of experiments are

$$\mathcal{P}^{(n)}_{f_1} := \{ P^{(n)}_{\vartheta; f_1} : \vartheta \in \Theta \}, \qquad (14)$$

where the parametrization $\vartheta := (\theta, \sigma^2, \Lambda_V, \beta)$ includes location, scale, and eigenvalues and eigenvectors of the shape matrix, and where the standardized radial density $f_1$ is fixed. Provided that the latter satisfies some mild regularity conditions, Hallin and Paindaveine (2006a) established LAN (actually, a reinforcement ULAN of LAN) for the families of distributions in (14) when parametrized by the location $\theta$ and the scatter matrix $\Sigma$; see Section 3.1. LAN or ULAN however are properties of the parametrization of a family of distributions, not of the family itself.

Since the Hallin and Paindaveine (2006a) $(\theta, \Sigma)$ parametrization is not convenient in the present context (due to the complicated relation between $\Sigma$ and the quantities of interest $\Lambda_V$ and $\beta$ in this thesis), we had to show how the ULAN result of Hallin and Paindaveine (2006a) carries over to the parametrization used in (14).

At first, one might think that establishing ULAN for this $\vartheta$-parametrization (which is the natural one in the PCA setup considered) was the main difficulty in the construction of locally and asymptotically optimal tests. Unfortunately, this is not the case, since this new parametrization, where $\beta$ ranges over the set $SO_k$ of $k \times k$ real orthogonal matrices with positive determinant, raises problems of another nature. The subparameter $\mathrm{vec}(\beta)$ indeed takes its values in $\mathrm{vec}(SO_k)$, a nonlinear manifold of $\mathbb{R}^{k^2}$, resulting in a curved ULAN experiment where the traditional local asymptotic optimality results (associated with local Gaussian shifts) do not apply.

The solution consists in relying on a third parametrization, which can be constructed from the fact that $\beta$ is in $SO_k$ iff it can be expressed as the exponential of a $k \times k$ skew-symmetric matrix $\iota$. Denoting by $\mathrm{vech}^{+}(\iota)$ the vector resulting from stacking the upper off-diagonal elements of $\iota$ yields a parametrization involving location, scale, eigenvalues and $\mathrm{vech}^{+}(\iota)$; the latter subparameter ranges freely over (an open set of) $\mathbb{R}^{k(k-1)/2}$, yielding a well-behaved ULAN parametrization where local experiments converge to the classical Gaussian shifts, thereby allowing (as above) for the classical construction of locally asymptotically optimal tests. A last difficulty sits in the fact that translating the null hypotheses of interest into the $\iota$-space is, in practice, intractable. We will therefore have to show how optimality can be translated from the parametrization based on skew-symmetric matrices to the nonlinear manifold $SO_k$. The various steps above constitute one of the main contributions of the first chapter of this thesis.
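The correspondence between free vectors of $\mathbb{R}^{k(k-1)/2}$ and elements of $SO_k$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `skew_from_vector` and `expm_skew` are hypothetical, the stacking order of the upper off-diagonal entries is row by row (the thesis may use another order), and the exponential is computed through the Hermitian matrix $iA$ rather than by any method used in the thesis.

```python
import numpy as np

def skew_from_vector(v, k):
    """Fill the strictly upper triangle of a k x k matrix with v, then antisymmetrize."""
    A = np.zeros((k, k))
    A[np.triu_indices(k, 1)] = v       # row-by-row order (an assumption)
    return A - A.T

def expm_skew(A):
    """Matrix exponential of a real skew-symmetric A.

    iA is Hermitian, so iA = Q diag(w) Q^H with real w, hence
    A = Q diag(-i w) Q^H and exp(A) = Q diag(exp(-i w)) Q^H.
    """
    w, Q = np.linalg.eigh(1j * A)
    return (Q @ np.diag(np.exp(-1j * w)) @ Q.conj().T).real

k = 4
v = np.array([0.3, -1.2, 0.5, 0.8, -0.1, 2.0])   # k(k-1)/2 = 6 free coordinates
beta = expm_skew(skew_from_vector(v, k))
identity_image = expm_skew(skew_from_vector(np.zeros(6), k))
```

Any choice of the six coordinates yields an orthogonal matrix with determinant one, which is what makes this parametrization unconstrained.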


5 The Work, chapter by chapter.

5.1 Chapter 1.

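In this first chapter, we want to develop tests for (we use the same notation as in Section 1.2)

$$\begin{cases} \mathcal{H}_0 : \beta_1 = \beta^0 \\ \mathcal{H}_1 : \beta_1 \neq \beta^0 \end{cases} \qquad \text{and} \qquad \begin{cases} \mathcal{H}_0 : \sum_{j=q+1}^{k} \lambda_j = p_0 \sum_{j=1}^{k} \lambda_j \\ \mathcal{H}_1 : \sum_{j=q+1}^{k} \lambda_j < p_0 \sum_{j=1}^{k} \lambda_j \end{cases}$$

or equivalently, for

$$\begin{cases} \mathcal{H}_0 : \beta_1 = \beta^0 \\ \mathcal{H}_1 : \beta_1 \neq \beta^0 \end{cases} \qquad \text{and} \qquad \begin{cases} \mathcal{H}_0 : \sum_{j=q+1}^{k} \lambda_{j;V} = p_0 \sum_{j=1}^{k} \lambda_{j;V} \\ \mathcal{H}_1 : \sum_{j=q+1}^{k} \lambda_{j;V} < p_0 \sum_{j=1}^{k} \lambda_{j;V}, \end{cases}$$

on the basis of $n$ i.i.d. observations with joint distribution $P^{(n)}_{\vartheta; f_1}$. The curved ULAN property of the previous section and our theory for constructing locally optimal tests in such curved ULAN setups can readily be used in this context.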

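More precisely, we show that the test that rejects $\mathcal{H}_0 : \beta_1 = \beta^0$ at asymptotic level $\alpha$ when

$$Q^{(n)}_{f_1} := \frac{nk(k+2)}{\mathcal{J}_k(f_1)} \sum_{j=2}^{k} \left( \beta_j' S_{\vartheta; f_1} \beta^0 \right)^2 > \chi^2_{k-1;1-\alpha}, \qquad (15)$$

where

$$S_{\vartheta; f_1} := \frac{1}{n} \sum_{i=1}^{n} \varphi_{f_1}\!\left( \frac{d_i(\theta, V)}{\sigma} \right) \frac{d_i(\theta, V)}{\sigma} \, U_i(\theta, V) U_i(\theta, V)',$$

is locally and asymptotically optimal (actually, locally and asymptotically most stringent) at radial density $f_1$. We study the Gaussian version $Q^{(n)}_{\mathcal{N}}$ of $Q^{(n)}_{f_1}$, which turns out to be asymptotically equivalent (under Gaussian densities) to the Anderson test based on (3).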

The Gaussian test based on $Q^{(n)}_{\mathcal{N}}$ however is a highly parametric procedure, since it has the undesirable property of being valid (in the sense of meeting the asymptotic $\alpha$-level constraint) at the Gaussian only. We therefore introduce a so-called pseudo-Gaussian version $Q^{(n)}_{\mathcal{N}*}$ of $Q^{(n)}_{\mathcal{N}}$, which is based on a statistic of the form

$$Q^{(n)}_{\mathcal{N}*} := (1 + \hat\kappa_k)^{-1} \, Q^{(n)}_{\mathcal{N}},$$

where $\hat\kappa_k$ consistently estimates the kurtosis of the underlying elliptical distribution. We show that the test based on $Q^{(n)}_{\mathcal{N}*}$ inherits (at the Gaussian) the optimality properties of $Q^{(n)}_{\mathcal{N}}$ while remaining valid under any elliptical distribution with finite fourth-order moments.
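To make the kurtosis correction concrete, here is a minimal sketch. It assumes the usual elliptical kurtosis parameter $\kappa_k = k\,E[d^4] / ((k+2)(E[d^2])^2) - 1$, which vanishes at the Gaussian, and a plain moment estimator built from the sample mean and covariance. The thesis's own estimator is not shown in this section, so both the formula and the names `elliptical_kurtosis` and `pseudo_gaussian` are assumptions.

```python
import numpy as np

def elliptical_kurtosis(X):
    """Moment estimate of kappa_k = k E[d^4] / ((k+2) E[d^2]^2) - 1 (assumed definition)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n
    # Squared Mahalanobis distances d_i^2 = (X_i - mean)' S^{-1} (X_i - mean).
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    return k * np.mean(d2 ** 2) / ((k + 2) * np.mean(d2) ** 2) - 1.0

def pseudo_gaussian(q_gauss, kappa_hat):
    """Rescale a Gaussian statistic by (1 + kappa_hat)^{-1}."""
    return q_gauss / (1.0 + kappa_hat)

rng = np.random.default_rng(0)
kappa_gauss = elliptical_kurtosis(rng.standard_normal((20000, 3)))
```

Under heavy tails the estimate is positive, so the Gaussian statistic is shrunk before being compared with the chi-square quantile.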

The performance of this pseudo-Gaussian test, however, quickly deteriorates when going away from the Gaussian case, particularly so under heavy tails. In other words, the test is validity-robust, but not efficiency-robust. To improve on this, we define signed-rank versions of the parametric $f_1$-optimal tests. Resorting to signed-rank tests, in the present context, is again motivated by invariance arguments, since the null hypothesis $\mathcal{H}_0 : \beta_1 = \beta^0$ is invariant with respect to the group of monotone and continuous transformations introduced in Section 3.2. The resulting signed-rank tests reject $\mathcal{H}_0$ at asymptotic level $\alpha$ when

$$Q^{(n)}_{K_{f_1}} := \frac{nk(k+2)}{\mathcal{J}_k(f_1)} \sum_{j=2}^{k} \left( \beta_j' S_{\vartheta; K_{f_1}} \beta^0 \right)^2 > \chi^2_{k-1;1-\alpha},$$

where, defining $K_{f_1}(u) := \varphi_{f_1}(\tilde F^{-1}_{1k}(u)) \, \tilde F^{-1}_{1k}(u)$ for all $u \in (0,1)$, we let

$$S_{\vartheta; K_{f_1}} := \frac{1}{n} \sum_{i=1}^{n} K_{f_1}\!\left( \frac{R_i(\theta, V)}{n+1} \right) U_i(\theta, V) U_i(\theta, V)'.$$

These tests are valid under the whole class of elliptical distributions, without any moment assumption. Still, they are locally and asymptotically most stringent at the target density $f_1$.
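The rank-weighted matrix entering these statistics is an average of outer products of signs, weighted by scores of the normalized ranks. The sketch below illustrates that construction for a generic score function. The name `rank_score_matrix` is hypothetical, the example takes $\theta = 0$ and $V = I_k$ for simplicity, and the two scores used (identity and constant) are illustrative choices rather than the optimal scores of the thesis.

```python
import numpy as np

def rank_score_matrix(U, R, score):
    """(1/n) * sum_i score(R_i / (n + 1)) * U_i U_i'."""
    U = np.asarray(U, dtype=float)
    R = np.asarray(R, dtype=float)
    n = R.size
    weights = score(R / (n + 1.0))
    return (U * weights[:, None]).T @ U / n

rng = np.random.default_rng(3)
Z = rng.standard_normal((50, 3))      # theta = 0 and V = I assumed here
d = np.linalg.norm(Z, axis=1)
U = Z / d[:, None]
R = np.argsort(np.argsort(d)) + 1

S_wilcoxon = rank_score_matrix(U, R, lambda u: u)   # linear scores
S_sign = rank_score_matrix(U, R, np.ones_like)      # constant scores
```

Because the signs have unit norm, the trace of the matrix equals the average score: exactly 1 for constant scores and exactly 1/2 for linear scores, whatever the data.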

Of course, $Q^{(n)}_{K_{f_1}}$ is not a genuine test statistic as it involves the underlying value of $\vartheta$. We therefore replace $\vartheta$ with some appropriate estimator. Keeping in mind that we want to avoid moment assumptions, we base this estimation on the Hettmansperger and Randles (2002) estimator of location $\theta$ and the Tyler (1987) estimator of shape $V$; actually, starting from the eigenvectors of $\hat V$, we use the Gram-Schmidt technique to obtain null eigenvector estimates (for all principal directions but the first one) which are root-$n$ consistent, mutually orthogonal, and orthogonal to $\beta^0$. Most importantly, we show that the replacement of $\beta$ with the resulting estimate $\hat\beta$ does not affect the behavior of $Q^{(n)}_{K_{f_1}}$ under the null (nor under sequences of local alternatives).
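The Gram-Schmidt step can be sketched as follows. This is one plausible reading of the construction, stated as an assumption: the null value $\beta^0$ is kept as first direction, and the estimated eigenvectors of ranks 2 to $k$ (in decreasing eigenvalue order) are orthogonalized against it one after the other. The exact scheme used in the thesis may differ, and `constrained_eigenvectors` is a hypothetical name.

```python
import numpy as np

def constrained_eigenvectors(beta0, V_hat):
    """Orthonormal basis whose first column is beta0 / ||beta0||.

    Remaining columns come from Gram-Schmidt applied to the eigenvectors of
    V_hat of ranks 2..k. Generic position is assumed: no residual vanishes.
    """
    _, P = np.linalg.eigh(V_hat)
    P = P[:, ::-1]                                  # decreasing eigenvalue order
    basis = [np.asarray(beta0, dtype=float) / np.linalg.norm(beta0)]
    for j in range(1, P.shape[1]):
        v = P[:, j].copy()
        for b in basis:
            v -= (b @ v) * b                        # remove components along earlier vectors
        basis.append(v / np.linalg.norm(v))
    return np.column_stack(basis)

beta0 = np.array([1.0, 1.0, 0.0])
V_hat = np.diag([3.0, 2.0, 1.0])
B = constrained_eigenvectors(beta0, V_hat)
```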

For the problem of testing $\mathcal{H}_0 : \sum_{j=q+1}^{k} \lambda_{j;V} = p_0 \sum_{j=1}^{k} \lambda_{j;V}$, we similarly determine the parametric locally and asymptotically optimal tests (for any density $f_1$), and derive their Gaussian, pseudo-Gaussian, and signed-rank versions.

For both testing problems considered, we further determine the asymptotic distributions of the proposed tests under local alternatives, by applying Le Cam’s third Lemma. This allows for computing the asymptotic relative efficiencies (AREs) of the proposed optimal signed-rank tests with respect to their pseudo-Gaussian competitors. Quite interestingly, we obtain the same AREs as in the problem of testing that the underlying shape matrix is equal to some fixed value (see Hallin and Paindaveine 2006a). This implies that the Chernoff-Savage result of Paindaveine (2006) applies in this context, and shows that the Gaussian-score versions of our signed-rank tests uniformly dominate, in terms of AREs, their pseudo-Gaussian counterparts.

5.2 Chapter 2.

Consider again the CPC model where the covariance matrices $\Sigma_i$, $i = 1, \ldots, m$, of $m$ populations of interest share the same matrix of eigenvectors $\beta = (\beta_1, \ldots, \beta_k)$. As mentioned in Section 2.2, Flury (1986) derived the asymptotic normality of the Gaussian MLEs $\hat\beta$ and $\hat\lambda_{ij;\Sigma}$, $i = 1, \ldots, m$, $j = 1, \ldots, k$; see (5). His result is restricted to the multinormal case, and the proof, which is a bit sketchy, relies on a disputable use of Slutsky’s Lemma.


In this chapter, we therefore derive a rigorous proof of Flury’s result and extend the result to the family of elliptical distributions with finite fourth-order moments. The idea of our proof is to derive an asymptotic representation result for

$$C^{(n)} := \begin{pmatrix} n_1^{1/2} \, \mathrm{dvec}(\hat\Lambda_{1;\Sigma} - \Lambda_{1;\Sigma}) \\ \vdots \\ n_m^{1/2} \, \mathrm{dvec}(\hat\Lambda_{m;\Sigma} - \Lambda_{m;\Sigma}) \\ n^{1/2} \, \mathrm{vec}(\hat\beta - \beta) \end{pmatrix}.$$

More precisely, we show that, under the family of elliptical distributions with finite fourth-order moments, there is a matrix $A$ (see Chapter 2 for an explicit expression) such that

$$C^{(n)} - A \begin{pmatrix} n_1^{1/2} \, \mathrm{vec}(S_1 - \Sigma_1) \\ \vdots \\ n_m^{1/2} \, \mathrm{vec}(S_m - \Sigma_m) \end{pmatrix} = o_P(1),$$

as $n \to \infty$, where $S_i := (n_i - 1)^{-1} \sum_{j=1}^{n_i} (X_{ij} - \bar X_i)(X_{ij} - \bar X_i)'$, with $\bar X_i := n_i^{-1} \sum_{j=1}^{n_i} X_{ij}$. That is, the random vector $C^{(n)}$ is shown to be asymptotically equivalent to a linear transformation of the traditional sample covariance matrices. Since it is well-known that, under elliptical densities with finite fourth-order moments, the stacked vector of centered sample covariance matrices above is asymptotically normal (see, e.g., Bilodeau and Brenner 1999), the asymptotic distribution of $C^{(n)}$ (still under elliptical densities with finite fourth-order moments) follows as a trivial corollary.

As an application of the asymptotic normality result for $C^{(n)}$, we define a pseudo-Gaussian test for the null hypothesis $\mathcal{H}_0$ under which some specified $(k - q)$-tuple of common principal components accounts for a proportion of the total variance that is smaller than a fixed proportion $p_0 \in (0,1)$ in all populations. More precisely, we consider the null hypothesis

$$\mathcal{H}_0 : \max_{i=1,\ldots,m} \left( \sum_{j=q+1}^{k} \lambda_{ij;\Sigma} \Big/ \sum_{j=1}^{k} \lambda_{ij;\Sigma} \right) < p_0.$$

Our pseudo-Gaussian test, contrary to the Flury (1986) Gaussian test, is valid under arbitrary $m$-tuples of possibly heterokurtic elliptical densities with finite fourth-order moments, while remaining asymptotically equivalent to Flury’s test under multinormal densities.
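The quantity constrained by this null hypothesis is easy to compute at the population level. The sketch below assumes that the common eigenvectors are stored as the columns of `beta`, that the specified components are the last $k - q$ columns, and that each eigenvalue is obtained as $\beta_j' \Sigma_i \beta_j$. The function name `cpc_tail_proportions` and the numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def cpc_tail_proportions(cov_list, beta, q):
    """For each population, share of total variance carried by components q+1..k."""
    props = []
    for S in cov_list:
        lam = np.diag(beta.T @ S @ beta)   # lambda_ij = beta_j' Sigma_i beta_j
        props.append(lam[q:].sum() / lam.sum())
    return props

beta = np.eye(4)                            # common eigenvectors (illustrative)
covs = [np.diag([5.0, 3.0, 1.0, 1.0]), np.diag([8.0, 1.0, 0.5, 0.5])]
props = cpc_tail_proportions(covs, beta, q=2)
p0 = 0.25
null_holds = max(props) < p0
```

The null holds when the largest of these proportions stays below $p_0$; a test replaces the population quantities with estimates and accounts for their sampling variability.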

5.3 Chapter 3.

In (6), we described the Gaussian likelihood ratio test (LRT) for the null hypothesis of CPC.

The objective of the third chapter is to extend the validity of this test to the class of elliptical distributions with finite fourth-order moments.

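Denoting by $\mathrm{ovec}(A)$ the vector stacking the $k(k-1)/2$ entries lying strictly above the diagonal of the $k \times k$ matrix $A$, we introduce the quadratic form

$$Q^{(n)}_{\mathcal{N}} := \frac{1}{2} \sum_{i=1}^{m} \left( \mathrm{tr}[T_i^2] - \mathrm{tr}[(\mathrm{diag}(T_i))^2] \right) = \sum_{i=1}^{m} (\mathrm{ovec}\, T_i)'(\mathrm{ovec}\, T_i), \qquad (16)$$

where we let $T_i := n_i^{1/2} \, \hat\Lambda_{i;\Sigma}^{-1/2} \hat\beta' S_i \hat\beta \hat\Lambda_{i;\Sigma}^{-1/2}$. We first show that this quadratic form is asymptotically equivalent to the likelihood ratio test statistic under the null, that is

$$Q^{(n)}_{\mathcal{N}} = -2 \log \Lambda_{\mathrm{Flury}} + o_P(1)$$

as $n \to \infty$ under the null (actually, under any null distribution with finite fourth-order moments).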
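The equality between the trace form and the `ovec` form in (16) only relies on the symmetry of each matrix involved: for symmetric $T$, the trace of $T^2$ is the sum of all squared entries, and removing the squared diagonal leaves twice the sum over the strictly upper triangle. A short numerical check, with a hypothetical helper `ovec` and an arbitrary symmetric matrix:

```python
import numpy as np

def ovec(A):
    """Stack the k(k-1)/2 entries lying strictly above the diagonal of A."""
    A = np.asarray(A)
    return A[np.triu_indices(A.shape[0], 1)]

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
T = B + B.T                                  # arbitrary symmetric matrix

trace_form = 0.5 * (np.trace(T @ T) - np.sum(np.diag(T) ** 2))
ovec_form = ovec(T) @ ovec(T)
```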

Then, we assume that $(X_{i1}, \ldots, X_{in_i})$, $i = 1, \ldots, m$, are $m$ mutually independent samples of i.i.d. $k$-dimensional elliptical random vectors with positive definite covariance matrices $\Sigma_i$, $i = 1, \ldots, m$, respectively. To improve on Flury’s LRT (equivalently, on the test based on $Q^{(n)}_{\mathcal{N}}$), which does not meet the asymptotic level constraint under non-Gaussian distributions, we turn $Q^{(n)}_{\mathcal{N}}$ into a pseudo-Gaussian test statistic whose validity holds under the class of elliptical distributions with finite fourth-order moments. The derivation of such a test is anything but easy because of the possible heterokurticity of the $m$ elliptical populations (the assumption of a common kurtosis parameter is not made here).

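Letting $T_i^{*} := (1 + \hat\kappa_i)^{-1/2} \, T_i$ (where $\hat\kappa_i$ is a consistent estimator for the kurtosis of the $i$th population), the proposed pseudo-Gaussian version of $Q^{(n)}_{\mathcal{N}}$ is given by

$$Q^{(n)}_{\mathcal{N}*} := \sum_{i,i'=1}^{m} (\mathrm{ovec}\, T_i^{*})' \left[ \delta_{i,i'} I_{k(k-1)/2} - (\hat D^{(i)})^{-1/2} \hat D (\hat D^{(i')})^{-1/2} \right] (\mathrm{ovec}\, T_{i'}^{*}), \qquad (17)$$

where $\delta_{i,i'}$ is the usual Kronecker symbol and where, defining

$$\hat d^{(i)}_{jl} := \frac{n (1 + \hat\kappa_i) \, \hat\lambda_{ij;\Sigma} \hat\lambda_{il;\Sigma}}{n_i (\hat\lambda_{ij;\Sigma} - \hat\lambda_{il;\Sigma})^2}, \qquad j, l = 1, \ldots, k, \; j < l,$$

we let

$$\hat D^{(i)} := \mathrm{diag}\left( \hat d^{(i)}_{12}, \hat d^{(i)}_{13}, \ldots, \hat d^{(i)}_{k-1,k} \right) \qquad \text{and} \qquad \hat D := \Big( \sum_{i=1}^{m} (\hat D^{(i)})^{-1} \Big)^{-1}.$$

We establish that $Q^{(n)}_{\mathcal{N}*}$ is asymptotically chi-square with $(m-1)k(k-1)/2$ degrees of freedom under the null and any possibly heterokurtic $m$-tuple of elliptical densities with finite fourth-order moments, which shows that the test based on $Q^{(n)}_{\mathcal{N}*}$ is the desired pseudo-Gaussian test.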
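The block matrix appearing in the quadratic form (17) has a simple structure that is consistent with the stated degrees of freedom. Whenever the matrix $\hat D$ is the inverse of the sum of the inverses of the $\hat D^{(i)}$, that block matrix is a symmetric projection of rank $(m-1)k(k-1)/2$. The sketch below checks this numerically for arbitrary positive diagonal entries. The function names are hypothetical, the diagonal values are random placeholders rather than estimates, and this is only an illustration of the algebra, not the proof given in Chapter 3.

```python
import numpy as np

def projection_blocks(d_list):
    """Block matrix with (i, i') block  delta_{ii'} I - D_i^{-1/2} D D_{i'}^{-1/2}.

    d_list holds the diagonals of the D^(i); D is the inverse of the summed inverses.
    """
    d_list = [np.asarray(d, dtype=float) for d in d_list]
    m, p = len(d_list), d_list[0].size
    d_bar = 1.0 / sum(1.0 / d for d in d_list)          # diagonal of D
    M = np.eye(m * p)
    for i, di in enumerate(d_list):
        for j, dj in enumerate(d_list):
            M[i*p:(i+1)*p, j*p:(j+1)*p] -= np.diag(d_bar / np.sqrt(di * dj))
    return M

def quadratic_form(t_list, d_list):
    """Value of the quadratic form for stacked ovec-type vectors t_1, ..., t_m."""
    t = np.concatenate([np.asarray(t, dtype=float) for t in t_list])
    return float(t @ projection_blocks(d_list) @ t)

m, k = 3, 3
p = k * (k - 1) // 2
rng = np.random.default_rng(2)
d_list = [rng.uniform(0.5, 2.0, size=p) for _ in range(m)]
M = projection_blocks(d_list)
# Vectors of the form D_i^{-1/2} c, with the same c in every population, are annihilated.
common = [np.ones(p) / np.sqrt(d) for d in d_list]
q_common = quadratic_form(common, d_list)
```

The trace of the projection equals $(m-1)k(k-1)/2$, matching the number of degrees of freedom of the limiting chi-square distribution.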

Finally, we investigate the finite-sample properties of (i) Flury’s LRT, (ii) the Gaussian test based on $Q^{(n)}_{\mathcal{N}}$, and (iii) the pseudo-Gaussian test based on $Q^{(n)}_{\mathcal{N}*}$ through a Monte-Carlo experiment.


5.4 Chapter 4.

The simulations of Chapter 3 actually reveal that the pseudo-Gaussian procedure introduced there for testing the CPC hypothesis, while being valid under any elliptical population with finite fourth-order moments, is poorly efficient away from the Gaussian case. The objective of this last chapter is to improve on this by using the same methodology as in Chapter 1.

More precisely, we now consider mutually independent samples $(X_{i1}, \ldots, X_{in_i})$, $i = 1, \ldots, m$, of i.i.d. $k$-variate elliptically symmetric observations with scatter matrices $\Sigma_i$. Clearly, the CPC model can be formulated in terms of scatter matrices instead of covariance matrices. The null of CPC, in this general elliptical case, still takes the form $\mathcal{H}_0 : \Sigma_i = \beta \Lambda_i \beta'$, $i = 1, \ldots, m$, where $\beta$ is a matrix of common eigenvectors. Parallel to Chapter 1, working with scatter matrices rather than covariance matrices allows for avoiding moment assumptions.

We start by stating the uniform local asymptotic normality (ULAN) property of this multisample elliptical model, which directly follows from that of the one-sample case. Based on this ULAN structure, we construct the optimal (in the Le Cam sense) pseudo-Gaussian test for the null of CPC and check that it coincides with the pseudo-Gaussian test obtained in Chapter 3.

To combine validity- and efficiency-robustness, we define optimal signed-rank tests for the null of CPC. Denoting by $R_{ij}$ the rank of $d_{ij}(\hat\theta_i, \hat\beta \hat\Lambda_{V_i} \hat\beta')$ among $d_{i1}(\hat\theta_i, \hat\beta \hat\Lambda_{V_i} \hat\beta'), \ldots, d_{in_i}(\hat\theta_i, \hat\beta \hat\Lambda_{V_i} \hat\beta')$, where $\hat\theta_i$, $\hat\beta$ and $\hat\Lambda_{V_i}$ (with diagonal elements $\hat\lambda_{i1;V}, \ldots, \hat\lambda_{ik;V}$), $i = 1, \ldots, m$, stand for adequate estimators of the CPC model parameters, define

$$S^{(i)}_{K_i} := \frac{1}{n_i} \sum_{j=1}^{n_i} K_i\!\left( \frac{R_{ij}}{n_i + 1} \right) U_{ij}(\hat\theta_i, \hat\beta \hat\Lambda_{V_i} \hat\beta') \, U_{ij}'(\hat\theta_i, \hat\beta \hat\Lambda_{V_i} \hat\beta'), \qquad i = 1, \ldots, m, \qquad (18)$$

for some appropriate $m$-tuple of score functions $(K_1, \ldots, K_m)$. The proposed signed-rank tests then reject the null hypothesis of CPC for large values of a quadratic statistic $Q^{(n)}_{K}$ […]; here, the $\hat{\mathcal{J}}_k(K_i, g_i)$'s are some estimated cross-information coefficients, and the $\hat D_{K,g}$'s are defined […] where, denoting by $\hat d^{(i)}_{jl} := \hat\lambda_{ij;V} \hat\lambda_{il;V} / (\hat\lambda_{ij;V} - \hat\lambda_{il;V})^2$, $\hat D^{(i)}$ is the $k(k-1)/2 \times k(k-1)/2$ diagonal matrix with diagonal elements $\hat d^{(i)}_{12}, \ldots, \hat d^{(i)}_{k-1,k}$.

We study the asymptotic behavior of these signed-rank tests, both under the null and under sequences of local alternatives, and show that they indeed combine validity- and efficiency-robustness. In terms of validity, they are valid under any $m$-tuple of possibly heterogeneous elliptical densities without any moment assumption. As for efficiency, we establish that, for (essentially) any $m$-tuple of radial densities $f = (f_1, \ldots, f_m)$, there exists an $m$-tuple of score functions $(K_{f_1}, \ldots, K_{f_m})$ such that the resulting signed-rank tests are optimal (in the Le Cam sense) under radial density $f$; in particular, when based on Gaussian scores, our signed-rank tests achieve the same asymptotic performances as the optimal Flury test at the multinormal. As we show (i) by computing their asymptotic relative efficiencies (with respect to the pseudo-Gaussian test of Chapter 3) and (ii) by performing a Monte-Carlo study, the proposed signed-rank tests exhibit a uniformly good power behavior, so that they indeed may be considered efficiency-robust.


Optimal Rank-Based Testing for Principal Components

This paper provides parametric and rank-based optimal tests for eigenvectors and eigenvalues of covariance or scatter matrices in elliptical families. The parametric tests extend the Gaussian likelihood ratio tests of Anderson (1963) and their pseudo-Gaussian robustifications by Tyler (1981, 1983) and Davis (1977), with which their Gaussian versions are shown to coincide, asymptotically, under Gaussian or finite fourth-order moment assumptions, respectively. Such assumptions however restrict the scope to covariance-based principal component analysis. The rank-based tests we are proposing remain valid without such assumptions, and in the absence of any finite moments, hence address a much broader context, where covariance matrices need not exist, and where principal components are associated with more general scatter matrices. Asymptotic relative efficiencies moreover show that those rank-based tests are quite powerful; when based on van der Waerden or normal scores, they even uniformly dominate Anderson’s Gaussian procedures and their pseudo-Gaussian versions. The tests we are proposing thus outperform daily practice both from the point of view of validity and from the point of view of efficiency. The main methodological tool throughout is Le Cam’s theory of locally asymptotically normal experiments, in the nonstandard context, however, of a curved parametrization. The results we derive for curved experiments are of independent interest, and likely to apply in other setups.

1.1 Introduction.

1.1.1 Hypothesis testing for principal components.

Principal components are probably the most popular and widely used device in the traditional multivariate analysis toolkit. Introduced by Pearson (1901), principal component analysis (PCA) was rediscovered by Hotelling (1933), and ever since has been an essential part of daily statistical practice, basically in all domains of application.

The general objective of PCA is to reduce the dimension of some observed $k$-dimensional random vector $X$ while preserving most of its total variability. This is achieved by considering an adequate number $q$ of linear combinations of the form $\beta_1' X, \ldots, \beta_q' X$, where $\beta_j$, $j = 1, \ldots, k$, are the eigenvectors associated with the eigenvalues $\lambda_1, \ldots, \lambda_k$ of $X$’s covariance matrix $\Sigma_{\mathrm{cov}}$, ranked in decreasing order of magnitude. Writing $\beta$ for the orthogonal $k \times k$ matrix with
