• Aucun résultat trouvé

in fulfilment of the requirements of the PhD Degree in Science (“Docteur en Sciences”)

N/A
N/A
Protected

Academic year: 2021

Partager "in fulfilment of the requirements of the PhD Degree in Science (“Docteur en Sciences”) "

Copied!
135
0
0

Texte intégral

(1)

Testing uniformity against rotationally symmetric alternatives on high-dimensional spheres

Thesis submitted by Christine CUTTING

in fulfilment of the requirements of the PhD Degree in Science (“Docteur en Sciences”)

Academic year 2019-2020

Supervisor: Professor Davy PAINDAVEINE Co-supervisor: Professor Thomas VERDEBOUT

ECARES and Département de Mathématique

(2)
(3)

Remerciements

Dans un premier temps, je tiens à remercier mon promoteur Davy Paindaveine, qui m’a permis d’effectuer cette thèse avec lui. Il me l’a proposé à un moment où j’étais un peu perdue dans mes études et je lui en suis infiniment reconnaissante. J’ai aussi eu la chance d’avoir Thomas Verdebout comme co-promoteur, avec lequel Davy forme une sacrée équipe. J’ai beaucoup apprécié sa personnalité très accessible et détendue. Merci, Davy, pour le cours de statistique de BA2 qui fut une révélation, et merci à tous les deux pour votre grande disponibilité et votre encadrement pendant ces six ans. De manière générale, je suis ravie de mon parcours à l’ULB et de la qualité des enseignements et des TP que j’y ai reçus.

Ensuite, je remercie mes parents qui m’ont donné les moyens de poursuivre les études que je voulais et m’ont toujours poussée vers le haut. Thank you Jo for being such an amazing aunt and for letting me confide in you. Merci bro d’être un chouette frère (mon préféré ;)).

Ces remerciements ne pourraient pas exister sans mentionner les gens merveilleux rencontrés à l’ULB qui m’ont permis de m’épanouir : Cédric, François, Mitia, Alexia, Thierry, Sylvie, Hoan-Phung, Patrick, Tania, Jérémie, Keno, Lancelot, Dimitri, Anna, An- toine, Florence, Julien, et beaucoup d’autres. Julie, je suis contente d’avoir quelqu’un comme toi sur qui je peux compter et avec qui j’ai passé autant d’années en bachelier, en colocation, comme collègues de bureau et partenaires de pâtisserie. Matthieu, je ne te remercierai jamais assez pour tout ce que tu as fait pour moi. J’ai une dette infinie de Triple Karmeliet envers toi. Nora et Clade Max, merci de partager avec moi la joie de passer des week-ends entiers en tournoi de badminton et de me rappeler que le plus im- portant, c’est de participer ! Dibo, je ne pensais pas rencontrer quelqu’un comme toi un jour, avec qui on se comprend tellement bien. Maintenant qu’on a commencé à partir en rando ensemble tous les ans, ça doit continuer au moins pour toujours ! Dong-Yan, je suis contente que tu aies été une étudiante impertinente prête à se lier d’amitié avec ses assistants. Tu es ma plus belle rencontre amicale pendant cette thèse, ta présence, même à distance, m’est indispensable et je chéris notre relation qui doit continuer aussi longtemps que les randos avec Dibo.

Enfin, merci, Alex, de me supporter tous les jours, notamment ces derniers mois. Tu es mon pilier et mon meilleur ami avec qui je ne me lasse pas d’être confinée, et ce n’est pas peu dire. Je t’aime fort mon “homme du moment”.

(4)
(5)

Contents

Notations 7

Introduction 9

1 Useful notions 13

1.1 Some distributions on the sphere . . . 14

1.2 Le Cam’s theory . . . 17

1.3 Invariance . . . 22

1.4 Billingsley’s Theorem . . . 24

2 Testing uniformity against a semiparametric extension of the FVML distribu- tions 27 2.1 Introduction . . . 28

2.2 Optimal testing under specified modal location . . . 30

2.3 Optimal testing under unspecified modal location . . . 32

2.4 Simulations . . . 37

2.5 Proofs . . . 39

3 Testing uniformity against a semiparametric extension of the Watson distribu- tions 43 3.1 Introduction . . . 44

3.2 Testing uniformity under specified location . . . 45

3.3 Testing uniformity under unspecified location: the Bingham test . . . 49

3.4 Testing uniformity under unspecified location: single-spiked tests . . . 52

3.5 Finite-sample comparisons . . . 58

3.6 Applications . . . 60

3.7 Proofs . . . 63

4 High-dimensional behaviour of the Rayleigh and Bingham tests under general rotationally symmetric distributions 77 4.1 Asymptotic non-null behaviour of the Rayleigh test . . . 78

4.2 Asymptotic non-null behaviour of the Bingham test . . . 83

4.3 Proofs . . . 92

Conclusion 113

A Asymptotic distribution underPθθθ(n)

nn,f, with κn = τp

p/n, of the Rayleigh test

statistic in the fixed-pcase 117

B Universality of the asymptotic high-dimensional distribution of the Rayleigh test

in the FvML case 119

(6)

C Consistency of the Bingham test whenn1/4κn/p3/4n → ∞in the high-dimensional

FvML case 125

Bibliography 129

(7)

Notations

⊗ the usual Kronecker product

D convergence in distribution

Lr

→ convergence in the Lr-norm

0 the null vector

A the Moore–Penrose generalized inverse of matrixA

B(Ω) the Borelσ-algebra onΩ

cp p Γ(p/2)

πΓ((p−1)/2)

χ2d the chi-square distribution withddegrees of freedom χ2d,α theα-quantile of the chi-square distribution withddegrees of

freedom

χ2d(λ) the noncentral chi-square distribution withddegrees of freedom and noncentrality parameterλ

e` the`th vector of the canonical basis ofRp F the class of functions©

f :R→R+ª

such that f is monotone increasing, twice differentiable at 0 andf(0)=f0(0)=1

Γ(·) the Euler Gamma function

Ip thep×pidentity matrix

Iν(·) the order-νmodified Bessel function of the first kind

IA the indicator function of condition A

iid independent and identically distributed

Jp the matrix¡

vecIp

¢ ¡vecIp

¢0 Kp thep2×p2commutation matrixPp

i,j=1

³ eie0j´

⊗¡ eje0i¢ λˆni theith largest eigenvalue of the sample covariance matrixSn

LAN Locally Asymptotically Normal

N ¡ µ,σ2¢

the univariate normal distribution with meanµand varianceσ2 Np

¡µµµ,ΣΣΣ¢ the multivariate normal distribution with meanµµµand covariance matrixΣΣΣ

(8)

P(n)0 theXni’s,i=1, . . . ,nare iid and uniformly distributed onSpn−1 Pθθθ(n)

n,Fn

theXni’s,i=1, . . . ,nare iid, rotationally symmetric aboutθθθnon Sp1(x) andX0niθθθnhas cumulative distribution Fn

P(n)F

n

a special case of Pθθθ(n)

n,Fn withθθθnchosen as the first vector of the canonical basis ofRpn

P(n)θθθ

nn,f

theXni’s,i=1, . . . ,nare independent and identically distributed (iid) with densityx7→cpnn,f f ¡

κnx0θθθn

¢1Sp−1(x)(n)θθθ

nn,f

theXni’s,i=1, . . . ,nare independent and identically distributed (iid) with densityx7→cpnn,ff ¡

κn(x0θθθn)2¢

1Sp1(x)

φ(n)Bing the Bingham test, see (3.1)

φ(n)Ray the Rayleigh test, see (2.2)

Φ(·) the cumulative distribution function of the standard normal distribution

ψp(·) the cumulative distribution function of the chi-squared distribution withpdegrees of freedom

ψp(·;λ)

the cumulative distribution function of the noncentral chi-squared distribution withpdegrees of freedom and

noncentrality parameterλ Qn the Bingham test statistic, npn(p2n+2)

³ tr£

S2n¤

p1n

´

QStn the high-dimensional Bingham test statistic, Qpn−dpn

2dpn , where dpn=pn(pn+1)/2−1

Rn the Rayleigh test statistic,npnkX¯nk2 RStn the high-dimensional Rayleigh test statistic, Rpn−pn

2pn

sa the sign of real numbera

Sn the covariance matrix of the observations,n1Pn

i=1XniX0ni Sp−1 the unit sphere inRp

x∈Rp:kxk =1ª

tr(A) the trace of matrixA

X¯n the mean vector of the observations,n−1Pn i=1Xni

vecA the vector obtained by stacking the columns of matrixAon top of each other

zα the upperα-quantile of the standard normal distributionN (0, 1)

(9)

Introduction

In directional statistics, inference is based onp-variate observations lying on the unit sphere Sp1:={x∈Rp:kxk =p

x0x=1}. This is relevant in various situations.

(i) First, the original data themselves may belong to Sp−1like the times of day at which thunderstorms occur (p=2) or epicentres of earthquakes (p=3) (see [Mardia and Jupp, 2000] for more examples).

(ii) Second, some fields by nature are so that only the relative magnitude of the obser- vations is important, which leads to projecting observations onto Sp−1. In shape analysis, for instance, this projection only gets rid of an overall scale factor related to the (irrelevant) object size.

(iii) Finally, even in inference problems where the full (Euclidean) observations in prin- ciple need to be considered, a common practice in nonparametric statistics is to restrict to sign procedures, that is, to procedures that are measurable with respect to the projections of the observations onto Sp−1; see, e.g., [Oja, 2010] and the refer- ences therein.

While (i) is obviously restricted to small dimensionsp, (ii)-(iii) nowadays increasingly involve high-dimensional data. For (ii), high-dimensional directional data were consid- ered in [Dryden, 2005], with applications in brain shape modeling; in text mining, [Baner- jee et al., 2003] and [Banerjee and Ghosh, 2004] project high-dimensional data on unit spheres to remove the bias arising due to the length of a text when performing clustering.

As for (iii), the huge interest raised by high-dimensional statistics in the last decade has made it natural to consider high-dimensionalsigntests. In particular, [Zou et al., 2014]

recently considered the high-dimensional version of the [Hallin and Paindaveine, 2006]

sign tests of sphericity, whereas an extension to the high-dimensional case of the location sign test from [Chaudhuri, 1992] and [Möttönen and Oja, 1995] was recently proposed in [Wang et al., 2015]. Considering (iii) in high dimensions is particularly appealing since for moderate-to-largep, sign tests show excellent (fixed-p) efficiency properties.

We consider the problem of testing uniformity on the unit sphere Sp−1. This is a very classical problem in multivariate analysis that can be traced back to [Bernoulli, 1735]. It is discussed at length in strictly all textbooks in the field; see, among many others, [Fisher et al., 1987], [Ley and Verdebout, 2017], and [Mardia and Jupp, 2000]. As explained in the review paper [García-Portugués and Verdebout, 2018], the topic has recently received a lot of attention: to cite only a few contributions, [Jupp, 2008] proposed data-driven Sobolev tests, [Cuesta-Albertos et al., 2009] proposed tests based on random projections, [Sun and Lockhart, 2019] obtained Bayesian optimality properties of some tests while [García- Portugués et al., 2019] transformed some uniformity tests into tests of rotational sym- metry. Possible applications of testing uniformity on high-dimensional spheres include outlier detection; see [Juan and Prieto, 2001]. Other natural applications are related with testing for sphericity inRp, in the spirit of (iii) above.

(10)

Aspn can diverge to infinity withn, it is necessary to consider triangular arrays of observations Xni, i =1, . . . ,n, n = 1, 2, . . . such that for anyn, Xni ∈Spn1 (some other works, including [Chikuse, 1991,Chikuse, 1993], actually rather consider a fixed-nlarge- pasymptotic scenario). Results to test uniformity on high-dimensional spheres already exist but they often impose conditions on the way pn andn diverge to infinity. For ex- ample, [Cai et al., 2013] study the asymptotic behaviours of the pairwise angles amongn uniformly distributed vectors on Spn−1asn→ ∞, whilepn is either fixed or growing to infinity withn. Let O≤Θi j≤πdenote the angle between the unit vectors−→

OXni and−→

OXn j for all 1≤i <jn, andΘmin:=min©

Θi j: 1≤i<jnª

. Whenpn→ ∞asn→ ∞, three different regimes have to be considered:

• the sub-exponential case: lognpn →0 asn→ ∞. Then

2pnlog sinΘmin+4 logn−log logn→D Y, asn→ ∞, where Y has cumulative distribution function

F(y)=1−exp µ

ey/2 4p

, y∈R.

• the exponential case: logpn n →β∈(0,∞) asn→ ∞. Then 2pnlog sinΘmin+4 logn−log logn→D Y, asn→ ∞, where Y has cumulative distribution function

F(y)=1−exp Ã

s β

8π(1−e4β)e(y+8β)/2

!

, y∈R.

• the super-exponential case: lognpn → ∞asn→ ∞. Then 2pnlog sinΘmin+ 4pn

pn−1logn−logpnD Y, asn→ ∞, where Y has cumulative distribution function

F(y)=1−exp µ

ey/2 2p

, y∈R.

In each case, uniformity is rejected if the statistic based onΘminis too small and the critical value can be found thanks to the cumulative distribution functions above. In prac- tice, though, if only one sample is available, it is not possible to know which regime applies and therefore which test to use.

On the contrary, test statistics that work in low dimensions and that do not need to be corrected to be valid in high-dimensions are called high-dimension robust as defined by [Paindaveine and Verdebout, 2016]. This notion arises first in [Ledoit and Wolf, 2002]

who showed that the Gaussian sphericity test from [John, 1972] is robust against high dimensionality. LetZn1, . . . ,Znn be Gaussian observations inRpn1. Whenpn=pis fixed, the Gaussian sphericity test from [John, 1972] rejects the null when

np

2 U>χ2dp;1−α,

(11)

where

U :=p

µ tr(S2n) tr(Sn)2−1

p

¶ , Sn:=n−1Pn

i=1ZniZ0ni is the covariance matrix of the observations,χ2d,1−αdenotes the up- perα-quantile of theχ2ddistribution anddp:=p(p+1)/2−1. In high dimensions,dpn→ ∞ aspn→ ∞and it is expected that

npnU/2−dpn q

2dpn

D N (0, 1).

This is indeed the case whenpn/n→c∈(0,∞) as shown in [Ledoit and Wolf, 2002].

In an even more general framework, pn → ∞ asn → ∞without any condition on the waypngoes to infinity withn, [Paindaveine and Verdebout, 2016] showed the high- dimension robustness of several sign test statistics thanks to a martingale theorem that will be stated in Chapter1and used in Chapter4. In particular, they studied the most classical test of uniformity, the [Rayleigh, 1919] test, which rejects the null of uniformity for large values of Rn:=npnkX¯nk2, where ¯Xn:=n1Pn

i=1Xni. In low dimensions (pn=p for alln), the test is based on the null asymptotic chi-square distribution withpdegrees of freedom,χ2p, of Rn. In the high-dimensional setup (pn→ ∞asn→ ∞), [Paindaveine and Verdebout, 2016] showed that

RStn :=Rnpn p2pn

D N (0, 1).

Therefore, denoting byΦ(·) the cumulative distribution function of the standard normal and definingzα:=Φ−1(1−α), the high-dimensional Rayleigh test rejects the null of uni- formity when

RStn2pn,1−αpn

p2pn (=zα+o(1))

has asymptotic sizeαunder the null hypothesis, irrespective of the rate at whichpndi- verges to infinity withn, and is therefore, in that sense, high-dimension robust.

In low dimensions the Rayleigh test is the score test of uniformity within the popu- lar Fisher–Von Mises–Langevin (FvML) model. It is then natural to consider an extension of this model to study its non-null behaviour. This is why we will focus in Chapter2on mutually independent observations with a common rotationally symmetric density pro- portional tox∈Spn−17→ f ¡

κnx0θθθn

¢, whereθθθn∈Spn−1is a location parameter,κn>0 is a concentration parameter and f is monotone strictly increasing. Ifκnis too small, the al- ternatives are too close to the null of uniformity and no test can be consistent. We say that the alternatives are contiguous to the null and we will see how smallκnmust be. Whenθθθn

is known, we will show thanks to Local Asymptotic Normality (LAN) that the Rayleigh test is not optimal even though it shows some power against contiguous alternatives in low di- mensions. However, whenθθθnis unknown, it becomes optimal against general monotone alternatives in low dimensions and against FvML alternatives in high dimensions.

Despite its nice optimality properties, the Rayleigh test will detect only asymmetric deviations from uniformity. It will show no power when the common distribution of theXni’s attributes the same probability to antipodal regions. Far from being the excep- tion, such antipodally symmetric distributions are actually those that need be considered when practitioners are facing axial data, that is, when one does not observe genuine lo- cations on the sphere but rather axes (a typical example of axial data relates to the direc- tions of optical axes in quartz crystals; see, e.g., [Mardia and Jupp, 2000]). Models and

(12)

inference for axial data have been considered a lot in the literature: to cite a few, [Tyler, 1987b] and more recently [Paindaveine et al., 2018] considered inference on the distribu- tion of the spatial sign of a Gaussian vector, [Watson, 1965], [Bijral et al., 2007] and [Sra and Karp, 2013] considered inference for Watson distributions, [Dryden, 2005] obtained distributions on high-dimensional spheres while [Anderson and Stephens, 1972], [Bing- ham, 1974] and [Jupp, 2001] considered uniformity tests against antipodally symmetric alternatives.

In low dimensions, the textbook test of uniformity for axial data is the [Bingham, 1974]

test, that rejects the null hypothesis of uniformity whenever Qn=npn(pn+2)

2

³ tr£

S2n¤

− 1 pn

´

2dpn,1−α,

whereSnis the covariance matrix of the observations on Spn−1anddp is defined as above.

In high dimensions, Theorem 2.5 from [Paindaveine and Verdebout, 2016] implies that, under the null hypothesis of uniformity,

QStn =Qndpn

q2dpn

=pn n

n

X

i<j

i,j=1

X0niXn j

¢2

− 1 pn

o D

→N (0, 1),

as soon aspndiverges to infinity withn. Therefore the Bingham test is also robust to high- dimensionality. Again, this property is not enough to justify its use and we are interested in its nonnull behaviour. In this case it is natural to consider an extension of the Wat- son distributions, namely axial rotationally symmetric distributions. In Chapter3we will assume that the observations share a density proportional tox∈Spn−17→ f ¡

κn(x0θθθn)2¢ , whereθθθn and f are defined as above andκn∈Ris still a concentration parameter. The contiguity rate in this model is higher than in the monotone one and suggests the prob- lem is more complicated. Whenθθθn is known, LAN yields optimal tests and enables to show that the Bingham test is not optimal whenθθθn is unknown but shows some power against contiguous alternatives in low dimensions. Consequently we turn to single-spiked tests based on the extreme eigenvalues of the covariance matrix and derive in the low- dimensional case their limiting behaviour under contiguous alternatives.

In Chapter4we will prove with a martingale central limit theorem that the null high- dimensional asymptotic distributions of the Rayleigh and Bingham tests, after appropri- ate standardisation, actually hold under broad classes of rotationally symmetric distribu- tions. This entails that the Bingham test, which is blind to contiguous alternatives in the monotone and axial models, can detect more severe alternatives in both cases. Through- out, simulations confirm our findings.

In Chapters2-4 proofs are to be found in the last section. To make this document as accessible as possible, theorems that require long technical proofs are followed by sketches of proofs for readers who want the general idea without having to go into too many details or need help getting their bearings. For theorems which are proven in a similar way, the outline of the proof is given after the first one.

This thesis is based on two published and one submitted articles:

• Cutting, Christine; Paindaveine, Davy; Verdebout, Thomas. Testing uniformity on high-dimensional spheres against monotone rotationally symmetric alternatives.

Ann. Statist. 45 (2017), no. 3, 1024–1058. https://projecteuclid.org/euclid.

aos/1497319687

(13)

• Cutting, Christine; Paindaveine, Davy; Verdebout, Thomas. On the power of axial tests of uniformity. Electronic Journal of Statistics 14, 2123-2154 (2020). https:

//projecteuclid.org/euclid.ejs/1589335309

• Cutting, Christine; Paindaveine, Davy; Verdebout, Thomas. Testing uniformity on high-dimensional spheres: the non-null behaviour of the Bingham test.

(14)
(15)

Chapter 1

Useful notions

Contents

1.1 Some distributions on the sphere . . . 14

1.1.1 Uniform distribution . . . . 14

1.1.2 Rotationally symmetric distributions . . . . 15

1.1.3 “Monotone” rotationally symmetric distributions . . . . 15

1.1.4 “Axial” rotationally symmetric distributions . . . . 16

1.2 Le Cam’s theory . . . 17

1.2.1 Convergence of experiments . . . . 17

1.2.2 Local Asymptotic Normality (LAN). . . . 18

1.2.3 Contiguity . . . . 20

1.2.4 Le Cam’s First Lemma . . . . 21

1.2.5 Le Cam’s Third Lemma. . . . 21

1.3 Invariance . . . 22

1.4 Billingsley’s Theorem. . . 24

(16)

In this chapter we introduce various unrelated notions that arise in subsequent chap- ters. We start in Section1.1with some distributions on the sphere: the uniform distribu- tion and rotationally symmetric distributions. We then focus on two special cases, mono- tone and axial distributions. In Section1.2we define Local Asymptotic Normality which can be used to find optimal tests and compute asymptotic powers under suitable alterna- tives thanks to Le Cam’s lemmas. Section1.3is devoted to the invariance principle and the reduction by invariance. Finally, Section1.4is about Billingsley’s Theorem, a martingale central limit theorem that will be applied to find the non-null behaviour of the Rayleigh and Bingham tests.

1.1 Some distributions on the sphere

1.1.1 Uniform distribution

IfXis uniformly distributed over Sp−1, then its density is Γ³p

−1 2

´ cp

(p1)/2 I{x∈Sp−1}, withcp:= Γ¡p

2

¢ pπΓ³p−1

2

´, (1.1)

whereΓ(·) is the Euler Gamma function andIAis the indicator function of condition A. As OXhas the same distribution asXfor any orthogonal matrixO, E [X]=0and Var [X] must be proportional to the identity matrixIp. ForkXk =1 almost surely,

Var [X]=E£ XX0¤

= 1

pIp. (1.2)

Besides we can prove that (see [Tyler, 1987a], page 244) Eh

vec¡ XX0¢ ¡

vec¡

XX0¢¢0i

= 1 p(p+2)

³

Ip2+Kp+Jp

(1.3) whereJp

vecIp¢ ¡

vecIp¢0

, vecAis the vector obtained by stacking the columns of ma- trixAon top of each other andKpis the commutation matrix (see [Magnus and Neudecker, 2007]).

For anyθθθ∈Sp−1, the distribution ofθθθ0Xis symmetric about 0 and¡ θθθ0X¢2

∼Beta³

1 2,p21´ (see, e.g., [Muirhead, 1982], Theorem 1.5.7(ii)). Therefore,θθθ0Xhas cumulative distribution function

Fp(t) :=cp Z t

1

(1−s2)(p−3)/2d s, fort∈[−1, 1] , and

E£ θθθ0X¤

=cp

Z 1

1

s(1s2)(p3)/2d s=0, (1.4) Eh

¡θθθ0X¢2i

=cp Z 1

−1

s2(1−s2)(p−3)/2d s= 1

p, (1.5)

E h¡

θθθ0X¢4i

=cp Z 1

−1

s4(1−s2)(p3)/2d s= 3

p(p+2). (1.6)

Consider triangular arrays of observations Xni, i =1, . . . ,n, n = 1, 2, . . .; we denote by P(n)0 the hypothesis that for anyn,Xn1,Xn2, . . . ,Xnnare mutually independent and uni- formly distributed on Spn1.

(17)

1.1.2 Rotationally symmetric distributions

A p-dimensional vectorXis said to be rotationally symmetric aboutθθθ∈Sp−1 if and only ifOXis equal in distribution toXfor any orthogonalp×pmatrixOsatisfyingOθθθ=θθθ. Such distributions are fully characterized by the location parameterθθθand the cumulative distribution function F ofX0θθθ. Popular examples of rotationally distributions are the FvML and Watson distributions described in Sections1.1.3and1.1.4.

IfXis rotationally symmetric aboutθθθ∈Sp1then we can show that (see [Saw, 1978])

E [X]=e1θθθ, (1.7)

XX0¤

=e2θθθθθθ0+1−e2 p−1

¡Ip−θθθθθθ0¢

, (1.8)

withe1:=E£ X0θθθ¤

ande2:=Eh

¡X0θθθ¢2i

. Moreover,Xcan be decomposed into

X=uθθθ+vS (1.9)

whereu:=X0θθθ,v:=p

1−u2and S:=

( X−uθθθ

kXuθθθk ifX6=θθθ,

0 otherwise.

The multivariate sign vectorSis uniformly distributed on Sp−1¡ θθθ¢

:=©

v∈Sp−1:v0θθθ=0ª and is independent ofu(see (9.3.32) in [Mardia and Jupp, 2000]).

o θθθ uθθθ

X

vS S

Figure 1.1 – The tangent-normal decomposition ofX, rotationally symmetric aboutθθθSp−1 Consider triangular arrays of observations Xni, i = 1, . . . ,n, n = 1, 2, . . .; we denote by Pθθθ(n)

n,Fn the hypothesis that for anyn,Xn1,Xn2, . . . ,Xnn are on Spn1, mutually indepen- dent and rotationally symmetric aboutθθθn, with cumulative distribution function Fn.

In the two following sections we will be interested in particular cases of rotationally symmetric distributions: monotone and axial ones.

1.1.3 “Monotone” rotationally symmetric distributions

We define “monotone” rotationally symmetric densities (with respect to the surface area measure on Sp−1) as densities of the form

x7→

Γ³p

1 2

´ cp,κ,f(p1)/2 f ¡

κx0θθθ¢

, x∈Sp1, (1.10)

where

(18)

1. θθθ∈Sp−1is a location parameter;

2. κ>0 is a concentration parameter (irrespective of f,κ=0 corresponds to the uni- form distribution and negative values ofκare discarded in this model because they only swap the roles ofθθθand−θθθ);

3. the function f belongs to the class of functionsF :=©

f :R→R+:f monotone in- creasing, twice differentiable at 0, with f(0)=f0(0)=1ª

; 4. the constantcp,κ,f is such that

cp,κ,f Z 1

1

(1−s2)(p3)/2f(κs)d s=1.

These conditions on f guarantee identifiability of θθθ, κand f: clearly, the strict mono- tonicity off implies thatθθθis the modal location on Sp−1, whereas the constraint f0(0)=1 allows to identifyκandf.

This family of distributions is an extension of the Fisher–von Mises–Langevin (FvML) distributions; indeed, choosing f(·)=exp(·) in (1.10) provides

x7→cFvMLp,κ exp¡ κx0θθθ¢

I{x∈Sp−1}, withcp,FvMLκ := (κ/2)p/21

p/2Ip/21(κ), (1.11) whereIν(·) is the order-νmodified Bessel function of the first kind; see Section 9.3.2. in [Mardia and Jupp, 2000] for more details.

Note that ifXis a random vector with density (1.10), thenX0θθθhas density s7→cp,κ,f ¡

1−s2¢(p−3)/2

fs)I{|s|≤1}. For anyθθθnnandf defined as above, we will denote as Pθθθ(n)

nn,f the hypothesis under which theXni’s,i =1, . . . ,n, are mutually independent and share the common density

x7→

Γ³p

n1 2

´

cpnn,f(pn1)/2 f ¡

κnx0θθθn

¢.

1.1.4 “Axial” rotationally symmetric distributions

We define “axial” rotationally symmetric densities (with respect to the surface area measure on Sp1) as densities of the form

x7→

Γ³p

1 2

´

˘ cp,κ,f(p1)/2 f

³κ¡ x0θθθ¢2´

, x∈Sp1, (1.12)

where

1. θθθ∈Sp−1is a location parameter;

2. κ∈Ris a concentration parameter: the larger|κ|, the more the probability mass will be concentrated: symmetrically about the poles±θθθfor positive values ofκ(bipolar case) or symmetrically about the hyperspherical equator Sp−1¡

θθθ¢

for negative val- ues ofκ(girdle case). The valueκ=0 (irrespective off) corresponds to the uniform distribution over the sphere;

(19)

3. the function f belongs to the class of functionsF:=©

f :R→R+: f monotone in- creasing, twice differentiable at 0, with f (0)=f0(0)=1ª

. Ifκ6=0, then f and the pair©

±θθθª

are identifiable, butθθθitself is not (which is natural for axial distributions).

Ifκ=0, the location parameterθθθis unidentifiable, even up to a sign.

4. the constant ˘cp,κ,f is such that

˘ cp,κ,f

Z 1

−1(1−s2)(p3)/2f ¡ κs2¢

d s=1.

Density (1.12) is a symmetric function ofx, it attributes the same probability to an- tipodal regions on the sphere, hence is suitable for axial data. This family of distributions is actually an extension of the Watson distributions (the rotationally symmetric/single- spiked [Bingham, 1974] distributions) obtained by choosing f(·)=exp(·) in (1.12); see Section 9.4.2. in [Mardia and Jupp, 2000] for more details.

Note that ifXis a random vector with density (1.12), thenX0θθθhas density s7→c˘p,κ,f ¡

1−s2¢(p−3)/2 f ¡

κs2¢ I{|s|≤1}. For anyθθθnnandf defined as above, we will denote as ˘P(n)θθθ

nn,f the hypothesis under which theXni’s,i=1, . . . ,n, are mutually independent and share the common density

x7→

Γ³p

n−1 2

´

˘ cpnn,f(pn1)/2 f

³κn

¡x0θθθn

¢2´ .

1.2 Le Cam’s theory

1.2.1 Convergence of experiments

Astatistical experimentorstatistical modelξis a measurable space (Ω,A), called the sample space, equipped with a collection of probability measuresP :=©

Pθθθ:θθθ∈Θ⊂Rpª . The notion of convergence for statistical models is challenging and the sequel offers a glimpse. For more details, see Sections 6.1 and 6.2 in [Liese and Miescke, 2008] or Sections 2.1 and 2.2 in [Le Cam and Yang, 2000].

Fix a decision spaceD equipped with aσ-fieldBD, a loss function L :D×Θ→[0, 1] : (d,θθθ)7→Lθθθ(d) and a mappingδthat attributes to every x∈Ωa probability measure on (D,BD). The average loss over all decisionsdwhen a valuex∈Ωis observed is

Z

DLθθθ(d)dδx(d).

The risk function associated withδis then Rδ(θθθ) :=

Z

µZ

DLθθθ(d)dδx(d)

dPθθθ(x)

and represents the overall average loss whenxis picked according to Pθθθ. Since one crite- rion to judge the quality of decision procedures is their risk functions, we are led to study more closely the space of risk functions available for an experimentξand a loss function L. Define

R(P,D, L) := n

r:Θ→[0, 1] : there existsδsuch that Rδ(θθθ)≤r(θθθ)∀θθθo ,

(20)

and its pointwise closure R¯(P,D, L) :=

½

r:Θ→[0, 1] :r(θθθ)= lim

i→∞ri(θθθ)∀θθθ, whereri∈R(P,D, L)

¾ . Definition 1.2.1.

Letξ1:=¡

1,A1,P1

P1,θθθ|θθθ∈Θ⊂Rpª¢

and ξ2 :=¡

2,A2,P2

P2,θθθ|θθθ∈Θ⊂Rpª¢

. The deficiency ofξ1with respect toξ2is

∆(ξ12) :=infn

ε∈[0, 1] : for every finite decision spaceD,∀L :D×Θ→[0, 1] and

r2∈R(P2,D, L),∃r1∈R(P¯ 1,D, L) such thatr1(θθθ)≤r2(θθθ)+ε∀θθθ∈Θo . In other words,∆(ξ12) is the smallest number such that for every L :D×Θ→[0, 1], the set ¯R(P1,D, L) contains {R(P2,D, L)+∆(ξ12)} :={r2+∆(ξ12)|r2∈R(P2,D, L)}.

Thedistancebetweenξ1andξ2is

dist12) :=max (∆(ξ12),∆(ξ21)) .

The equality∆dist12)=0 does not imply that ξ1 andξ2 are the same but rather are equivalent or of the same type, like two different experiments to measure the gravity of Earth for example. If ∆dist12)=δ, it means that any risk function you can have on one of the two experiments can be matched withinδby a risk function on the other experiment.

In this definition only loss functions L such that 0≤Lθθθ(d)≤1 are allowed. This is because multiplying the loss function by a constantc≥0 would multiply the distance by the same constantc and it could be made as large as one pleases unless the distance is zero.

We can finally give a meaning to the convergence of statistical models.

Definition 1.2.2.

Letξ(n):=

³Ω(n),A(n),P(n):=

n P(n)θθθ ¯

¯θθθ∈Θ⊂Rp

andξ:=¡

Ω,A,P :=© Pθθθ¯

¯θθθ∈Θ⊂Rpª¢

. We say that (ξ(n)) converges weakly toξasn→ ∞if and only if, for every finite subsetΘ0⊂Θ,

dist³

ξ(n)Θ0Θ0´n→∞

−→ 0.

The restriction to finite subsets in the definition stems from the fact that, in the finite- parameter case, equivalence of convergence in the sense of Le Cam’s distance and of con- vergence of distributions of likelihood ratios has been proven. The infinite-parameter case is much more complex but this weak form of convergence is enough to have impor- tant consequences in the sequel.

1.2.2 Local Asymptotic Normality (LAN)

Intuitively, a sequence of statistical models is locally and asymptotically normal if it converges (in the sense of the previous section) to a Gaussian model after a suitable reparametrisation. IfX1, . . . ,Xnis an iid sample from a distribution Pθθθon (Ω,A), define

ξ(n):=

³Ω(n),A(n),P(n):= n

Pθθθ(n)¯

¯θθθ∈Θ⊂Rp

,

a sequence of experiments associated withX1, . . . ,Xn. If, forν(n)→0 asn→ ∞, ξ(n)θθθ :=³

(n),A(n),Pθθθ(n):=n

P(n)θθθ+ν(n)τττ

¯

¯τττ∈Rp

→ξθθθ:=¡

Ω,A,Pθθθ:=© Pθθθ,τττ¯

¯τττ∈Rpª¢

, the optimal procedure inξθθθis asymptotically optimal inξ(n)θθθ hence locally and asymptot- ically optimal inξ(n).

(21)

Definition 1.2.3. A sequence of experimentsξ(n)isLocally Asymptotically Normal (LAN) if for allθθθ∈Θ, there exist:

1. a sequenceννν(n) of p×p full rank, non-random, symmetric and positive definite1 matrices, called the contiguity rate, such that°

°ννν(n)°

°→0 asn→ ∞, where°

°ννν(n)°

° denotes the Frobenius norm ofννν(n);

2. a sequence of randomp-vectors∆∆∆θθθ(n) measurable with respect toX1, . . . ,Xn, called the central sequence;

3. a non-randomp×pmatrixΓΓΓ(θθθ), calledthe Fisher information matrix;

such that for every bounded sequence of vectorsτττ(n)∈Rp, we have under Pθθθ(n) (i) logdP(n)

θθθ+ννν(n)τττ(n)

dP(n)θθθ =¡ τττ(n)¢0

∆∆∆(n)θθθ −1 2

¡τττ(n)¢0

ΓΓΓ(θθθ)τττ(n)+oP(n)

θθθ (1), (1.13) (ii)∆∆∆(n)θθθD Np

¡0,ΓΓΓ(θθθ)¢ . Consider the Gaussian shift experiment

ξθθθ:=¡

Rp,B(Rp),Pθθθ:=©

Pθθθ,τττ:=Np

¡ΓΓΓ(θθθ)τττ,ΓΓΓ(θθθ)¢ ¯

¯τττ∈Rpª¢

with a single observation that we denote as∆∆∆, whereB(Rp) is the Borelσ-field onRp. The log-likelihood ratio inξθθθis

logdPθθθ,τττ(∆∆∆)

dPθθθ,0(∆∆∆)=τττ0∆∆∆−1

2τττ0ΓΓΓ(θθθ)τττ. (1.14) We see that the log-likelihood ratio in (1.13) behaves asymptotically like the one in (1.14).

The similarity is not just a coincidence; define ξθθθ(n):=

³Ω(n),A(n),Pθθθ(n):= n

P(n)

θθθ+ννν(n)τττ

¯

¯τττ∈Rp

.

It can be shown thatξθθθ(n)converges weakly toξθθθ(see for example Corollary 6.66 in [Liese and Miescke, 2008] or Theorem 9.4 in [van der Vaart, 1998]).

Example 1.2.4(One-parameter exponential family).

Let X1, . . . , Xnbe iid from (Ω,A,P :={Pθ|θ∈Θ0⊂R}) such thatP is absolutely contin- uous with respect to someσ-finite measureµand

fθ(x) :=dPθ

=exp (θT(x)−A(θ)) for allθ∈Θ0. Note that X(n):=(X1, . . . , Xn) comes from

³Ω(n),A(n),P(n):=

n P(n)θ ¯

¯θ∈Θ0⊂Ro´

. The setΘ0is thenatural parameter spacedefined as

½

θ∈Rsuch that Z

pθ(x)dµ(x)< ∞

¾ . Letθ0be a point insideΘ0. Then

log dP(n)θ

0+n−1/2τ

dP(n)θ

0

=log

n

Y

i=1

fθ0+n−1/2τ(Xi) fθ0(Xi) = τ

pn

n

X

i=1

T(Xi)−n µ

A µ

θ0+ τ pn

−A(θ0)

. (1.15)

1This condition is not necessary but is added for convenience ; see [Le Cam and Yang, 2000]

(22)

It can be shown (see for example Problem 2.16 in [Lehmann and Romano, 2005]) that Eθ0[T(Xi)]=A00),

Varθ0[T(Xi)]=A000), so that, by a Taylor expansion, asn→ ∞,

n µ

A µ

θ0+ τ pn

−A(θ0)

=p

nτA00)+1

2A000)+o(1).

The log-likelihood ratio (1.15) becomes log

dP(n)

θ0+n−1/2τ

dP(n)θ

0

=τ∆(n)θ0 −1

2A000)+o(1), (1.16) where A000) is the Fisher information matrix and where, underθ0, the central sequence

(n)θ0 is such that

(n)θ

0 := 1

pn

n

X

i=1

¡T(Xi)−Eθ0[T(Xi)]¢ D

→N (0, A000)), asn→ ∞.

An approximate result like (1.16) can actually be obtained for much more general fam- ilies like iid observations, either by setting conditions on the second derivative of the log- likelihood or under the single condition of differentiability in quadratic mean of fθθθ1/2atθθθ (see Section 7.2 of [van der Vaart, 1998] or Section 12.2 in [Lehmann and Romano, 2005]).

1.2.3 Contiguity

The concept of contiguity was introduced by [Le Cam, 1960] as an equivalent to "asymp- totic absolute continuity". Contiguity implies a strong relation between the asymptotic behaviour of a sequence of statistics under laws Q(n)(typically an alternative hypothesis) and its asymptotic behaviour under laws P(n)(typically a null distribution) and enables to find the former from the latter.

Definition 1.2.5. Let (Ωn,An) be measurable spaces equipped with two sequences of probability distributions P(n)and Q(n).

The sequence Q(n)iscontiguousto P(n), denoted as Q(n)/P(n), if P(n)(An)→0 implies Q(n)(An)→0 for every sequence of measurable sets An∈An. If Q(n)/P(n)and P(n)/Q(n), we say that Q(n)and P(n)aremutually contiguous: Q(n)./P(n).

To get an idea of what contiguity is, consider the problem of testing (H0(n): {P(n)}

H1(n): {Q(n)}

and suppose P(n)and Q(n)are mutually contiguous. Consider a sequence of nonrandom- ized tests2φ(n). Then

P(n)(n)=1]→0⇔Q(n)(n)=1]→0

2A nonrandomized test is a test statistic that can have only two values: 0 (the null hypothesis is not rejected) and 1 (the null hypothesis is rejected).

(23)

and

Q(n)(n)=1]→1⇔P(n)(n)=1]→1.

If the size of the tests converges to zero, then the power of the tests converges to zero too.

Conversely if the power of the tests converges to one, then the size of the tests converges to one too. The sequences of probability distributions P(n) and Q(n)are so close that no test has a power converging to one.

1.2.4 Le Cam’s First Lemma

Contiguity can be proven thanks to the following result known as Le Cam’s first lemma (see Lemma 6.4 in [van der Vaart, 1998]).

Lemma 1.2.6(Le Cam’s First Lemma). Let(Ωn,An)be measurable spaces equipped with two sequences of probability distributionsP(n)andQ(n). Then the following statements are equivalent :

(i) Q(n)/P(n).

(ii) If dP(n)/dQ(n)D UunderQ(n)along a subsequence, thenP[U>0]=1.

(iii) If dQ(n)/dP(n)D VunderP(n)along a subsequence, thenE[V]=1.

(iv) For any statisticsTn:Ωn7→Rk: ifTn→0underP(n), thenTn→0underQ(n).

To prove mutual contiguity, one therefore needs to prove that ifdQ(n)/dP(n)D U un- der P(n)along a subsequence, then P[U>0]=1 and E[U]=1. For example, if P(n)and Q(n) are two probability measures on arbitrarily measurable spaces such that, under P(n),

dQ(n) dP(n)

D eN(µ,σ2),

then Q(n)and P(n)are mutually contiguous if and only ifµ= −σ2/2 (like Pθθθ+ννν(n) (n)τττ(n)and Pθθθ(n) whenξ(n)=

³Ω(n),A(n),P(n)= n

Pθθθ(n)¯

¯θθθ∈Θ⊂Rp

is Locally Asymptotically Normal).

1.2.5 Le Cam’s Third Lemma

This section allows computing the asymptotic distribution of a statisticXnunder Q(n) from its asymptotic distribution under P(n)(see Lemma 6.6 in [van der Vaart, 1998]).

Lemma 1.2.7. LetP(n)andQ(n)be sequences of probability measures on probability spaces (Ωn,An)and letXn:Ωn7→Rk be a sequence of random vectors. Suppose thatQ(n)/P(n) and

µ

Xn,dQ(n) dP(n)

D

→(X, V) underP(n). ThenL(B) :=E£

I{X∈B}

defines a probability measure, andXnD LunderQ(n). The following lemma is a famous special case that will be useful for computing the asymptotic power of tests under sequences of experiments that are locally and asymptot- ically normal (see Example 6.7 in [van der Vaart, 1998]).

Références

Documents relatifs

The expression of monomer availability can be obtained by multiply the monomer density and ratio of partition functions between surface (Z Ÿ ) and bulk (Z Ô ) [21]. This ratio

The numerical results are from the simulations with the GRI2.11 scheme, the dynamic PaSR model and: a single κ for all the species (blue solid lines); κ NO = 1 and the “standard” κ

The first phase of the method generates some association rules using the Apriori algorithm which is then used in the second phase that runs a Genetic Algorithm to find the best set

The literature reviewed as applicable to this thesis can be grouped according to the following sections: sensor aiming, path following, aircraft control methods, aircraft

This paper investigates in-depth: (i) the application of Fog Computing in perish- able produce supply chain management using blackberry fruit as a case study; (ii) the

In this thesis, the water entry problems for 2-D and 3-D objects entering both calm water and regular waves are investigated numerically with a CIP-based method.. Emphasis is put

If we quickly re- view how vorticity confinement works, we would find that curl is calculated from the macroscopic velocity, and the N-S based solver uses the velocity to advect

Key words: Orlicz affine and geominimal surface areas, Orlicz-Brunn-Minkowski theory, Orlicz-Petty bodies, L p projection body, p-capacity, p-affine capacity, p-integral affine