• Aucun résultat trouvé

Testing for equality of means of a Hilbert space valued random variable

N/A
N/A
Protected

Academic year: 2023

Partager "Testing for equality of means of a Hilbert space valued random variable"

Copied!
5
0
0

Texte intégral

(1)

Statistics/Probability Theory

Testing for equality of means of a Hilbert space valued random variable

Jean Gérard Aghoukeng Jiofack

a,b

, Guy Martial Nkiet

a

aUniversité des sciences et techniques de Masuku, URMI, BP 943, Franceville, Gabon

bUniversité de Yaoundé 1, faculté des sciences, département de mathématiques, BP 812, Yaoundé, Cameroun Received 10 May 2009; accepted after revision 23 September 2009

Available online 30 October 2009 Presented by Paul Deheuvels

Abstract

We propose a test for equality of means of a random variable valued into a real separable Hilbert space. The test statistic is based on projections of empirical means onto spaces spanned by principal directions obtained from principal component analysis of the random variable. The asymptotic distribution of this test statistic is derived under the null hypothesis and the consistency of the obtained test is proved. An application to the case of functional variables is indicated.To cite this article: J.G. Aghoukeng Jiofack, G.M. Nkiet, C. R. Acad. Sci. Paris, Ser. I 347 (2009).

©2009 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

Résumé

Un test d’égalité des moyennes d’une variable aléatoire à valeurs dans un espace de Hilbert.Nous proposons un test d’égalité des moyennes pour une variable aléatoire à valeurs dans un espace de Hilbert réel séparable. La statistique de test est basée sur des projections des moyennes empiriques sur des directions obtenues à partir de l’analyse en composantes principale de la variable aléatoire. La loi limite, sous hypothèse nulle, de cette statistique de test est obtenue et la convergence du test obtenu est prouvée. L’application au cas de variables fonctionnelles est indiquée.Pour citer cet article : J.G. Aghoukeng Jiofack, G.M. Nkiet, C. R. Acad. Sci. Paris, Ser. I 347 (2009).

©2009 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

Version française abrégée

SoientXetY deux variables aléatoires (v.a.) à valeurs respectivement dans(H,BH)et(F,P(F )), oùH est un espace de Hilbert réel séparable, BH est sa tribu borélienne, F est l’ensemble{1, . . . , q}et P(F ) la tribu de ses parties. Notant · la norme induite par le produit scalaire, supposant queE(X4) <+∞et considérantm=E(X), m=E(X|Y =), nous proposons dans cette note un test de l’hypothèse nulleH0définie en (1) contre l’hypothèse alternative notée H1. Pour cela, on suppose que les valeurs propres décroissantesλi de l’opérateur de covariance V de X sont distinctes, et on utilise la statistique définie en (2) sur la base d’un échantillon i.i.d.{(Xi, Yi)}1in

E-mail addresses:[email protected] (J.G. Aghoukeng Jiofack), [email protected] (G.M. Nkiet).

1631-073X/$ – see front matter ©2009 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

doi:10.1016/j.crma.2009.10.010

(2)

de(X, Y ). Considérant les matrices diagonalesDV(p)=diag(λi;1ip)et=diag(p;1q), oùp= P (Y=) >0, la matrice carréeΣd’ordrepqdéfinie en (3), et notant⊗Kle produit matriciel de Kronecker, on a : Théorème 2.1.SousH0,nTn(p)L Q=UTΓ U(n→ +∞), oùΓ =(DV(p))1K1etUNRpq(0, Σ ).

Pour un niveauα∈ ]0,1[donné,H0est rejetté lorsqueFQ(nTn(p)) >1−α, oùFQdésigne la fonction de répar- tition deQ. Celle-ci peut être calculée en utilisant des formules d’approximation données dans [13] et qui nécessitent le calcul des valeurs propres deΣ1/2Γ Σ1/2. Les matricesΣandΓ étant inconnues en pratique, on les remplace par les estimateurs convergents obtenus en remplaçant, dans les formules qui les définissent, chaque paramètre par son analogue empirique. Le théorème suivant montre la convergence du test proposé, pour des valeurs depsuffisamment grandes.

Théorème 2.2.SousH1, il existep0∈Ntel que pour touspp0etα∈ ]0,1[, on a:

n→+∞lim P

FQnTn(p)

>1−α

=1.

Ce test peut être appliqué au cas d’une variable fonctionnelle. En effet, lorsqueX=(X(t ))t∈[0,1]est à valeurs dans L2([0,1]), et si l’on suppose que chaqueXiest observé en des pointst1, . . . , tLtels que 0t1< t2<· · ·< tL1, on obtient la matriceX=(X(i)(tj))sur la base de laquelle on peut calculer lesλˆiet les composantes principalescˆi(k)= Xi,uˆk en utilisant, par exemple, la fonction pca.fd disponible dans la librairie fda de R. Cela permet d’obtenir les matricesΓˆ etΣˆ ; cette dernière étant construite comme en (3) avec les termesσˆij kdonnés en (6).

1. Introduction

Letting(Ω,A, P )be a probability space, we consider two random variablesX andY defined on this space and valued respectively into(H,BH)and(F,P(F )), whereH is a real separable Hilbert space,BH is the related Borel σ-field,F is the set{1, . . . , q}andP(F )denotes theσ-field of all subsets ofF. Denoting by·,· the inner product of H and by·the related norm, we suppose thatE(X4) <+∞. Then we consider the meanm=E(X)and, for any F, the conditional meanm=E(X|Y=)and the probabilityp=P (Y=)which is assumed, without loss of generality, to be non-null. We are interested in testing for the hypothesis

H0:m1= · · · =mq (1)

against the alternative H1: ∃F, τ=0, whereτ=mm. This problem is within a field, calledFunctional Data Analysis, that has emerged in the last decade. It consists of statistical methods allowing analysis of data that are curves, including finely sampled records over some time interval. The book by Ramsay and Silverman [14] gives a comprehensive introduction to this field of statistics, whereas nonparametric approaches are tackled in [9]. Several recent papers deal with statistical testing procedures for functional variables, mainly in the context of functional linear regression models for which tests for no effect are considered (see, e.g., [3,4,11]). However, a few papers propose tests on the means of functional variables. More precisely, Mas [12] introduced a procedure for testing the equality of the mean to a fixed value, whereas tests for comparison of means, that is the hypothesisH0given in (1), were proposed in [2,5] and [8]. The practical interest of using these methods was illustrated in [2] on electroencephalogram data, and in [5] on data from experimental cardiology. Recent works also show the interest of making statistical inference on others location parameters in the context of functional variables. In this vein, mode and median were tackled in [9], whereas a general depth measure is proposed in [6] which also contains a discussion about the applicability of this idea for inference in functional data analysis. Our purpose in this paper is to introduce a new method for testing for H0; this method is based on projections of theτ’s onto principal directions, that is lines spanned by unit eigenvectors of the covariance operator ofX. Our approach is described in the general framework of Hilbert space valued random variables and is easily applied when functional variables are considered.

2. Main results

The covariance operator ofXis defined byV =E((Xm)(Xm)), where⊗denotes the tensor product such that, for any(x, y),xyis the linear map:hx, hy. It is known that, under the above assumption,V is a Hilbert–

(3)

Schmidt self-adjoint operator acting fromHto itself. Lettingi)i1be the decreasing sequence of eigenvalues ofV, we consider an orthonormal basis(ui)i1 ofH such thatui is an eigenvalue ofV associated withλi. We assume throughout the paper that the following assumption holds:

(A)i1, λi> λi+1>0.

Given an i.i.d. sample {(Xi, Yi)}1in of (X, Y ), we consider the empirical counterpart Vˆn of V, defined as Vˆn=n1n

i=1(Xi− ¯X(n))(Xi− ¯X(n)), where X¯(n)=n1n

i=1Xi, and we denote by(ˆλi,uˆi)i1a family of eigenelements ofVˆnsuch that(ˆλi)i1is the decreasing sequence of eigenvalues ofVˆnand(uˆi)i1is an orthonormal system of related eigenvectors,uˆi being associated withλˆi. In addition, forF, we put

ˆ n=

n i=1

1{Yi=}, pˆ=nˆ

n, X¯(n) = 1 ˆ n

n i=1

1{Yi=}Xi and τˆn= ¯X(n) − ¯X(n). For testingH0againstH1, we make use of the statistic given by

Tn(p)= p i=1

q =1

ˆ λi 1pˆ

ˆ τn,uˆi2

, (2)

wherepis sufficiently large. This statistic is a consistent estimator ofT (p)=p i=1

q

=1λi 1pτ, ui 2. Indeed, an obvious application of the strong law of large numbers gives the almost sure convergence ofpˆ (respectivelyτˆn; respectively Vˆn) to p (respectively τ; respectively V in L(H )) as n→ +∞. Further, Lemma 1 in [10] gives

λiλi| ˆVnV and ˆuiui2√

2 ˆui⊗ ˆuiuiui. Then, using Proposition 3 in [7] which ensures the almost sure convergence inL(H )of uˆi⊗ ˆui touiui asn→ +∞, we deduce from the preceding inequalities thatλˆi anduˆi converge almost surely, asn→ +∞, toλi andui respectively. Consequently,Tn(p)converges almost surely toT (p)asn→ +∞. The asymptotic distribution underH0of this statistic is given in the theorem that is stated below. Let us introduce the operators

V()=E

(Xm)(Xm) Y =

and Θj =δj pjV(j )+pjp

VV(j )V() , where(j, )F2andδj denotes the Kronecker delta. This permits to consider thepq×pqmatrix

Σ=

⎜⎜

⎜⎝

σ1111 σ1112 . . . σ11pq σ1211 σ1212 . . . σ12pq

... ... . . . ... σpq11 σpq12 . . . σpqpq

⎟⎟

⎟⎠, whereσij k= ui, Θj uk . (3)

Considering the diagonal matricesDV(p)=diag(λi;1ip)and=diag(p;1q), and denoting byK the Kronecker matrix product, we have:

Theorem 2.1.UnderH0,nTn(p)converges in distribution, asn→ +∞, toQ=UTΓ U, whereΓ =(DV(p))1K 1andUNRpq(0, Σ ).

Sketch of the proof. We first express the test statistic as Tn(p)=tr(Rnp), where Rnp=(Vpn)1Bn with Vpn = p

i=1λˆiuˆi ⊗ ˆui and Bn =q

=1pˆn τˆn⊗ ˆτn. Then, letting {f1, . . . , fq} be the canonical basis of Rq, putting W=q

=11{Y=}f,Wi=q

=11{Yi=}fand Z=

Xm W

, Zi=

Xi− ¯X(n) Wi

, VZ=E(ZZ), VZ(n)=1 n

n i=1

ZiZi,

we show thatnRnp=ϕn(Kn), whereKn=√

n(VZ(n)VZ)andϕnis the random map fromL(H×Rq)toL(H )given byϕn(S)=q

=1(pˆn)112(S)f)((Vpn)112(S)f)). Here,Π12 is defined byΠ12(S)=P1SP2 whereP1 (respectivelyP2) is the canonical projection fromH×RqtoH(respectivelyRq). On the one hand, we show thatKn

(4)

converges in distribution asn→ +∞to a r.v.Khaving a centered normal distribution inL(H×Rq)with covariance operator equal to that of

S 1

2

Z0Z0−E(Z0Z0)

(Z0μ)Π0(μ)Π0(Z0μ)μ+Π0(Z0μ)Π0(μ)

,

whereS:AL(H×Rq)A+Aand Z0=

X W

, μ=E(Z0), Π0: u

v

H×Rq−→

u 0

H×Rq.

On the other hand, using this convergence result and the almost sure convergences of pˆn and (Vpn)1 top and (Vp)1respectively, whereVp=p

i=1λiuiui, it is easy to check thatϕn(Kn)ϕ(Kn)converges in probability to 0, ϕ being defined byϕ(S)=q

=1(p)112(S)f)((Vp)112(S)f)). Therefore,ϕn(Kn)has the same asymptotic distribution thanϕ(Kn). From the continuity ofϕwe deduce (see [1]) that this asymptotic distribution is the distribution of ϕ(K), that is a centered normal distribution. Consequently,nTn(p)converges in distribution, as n→ +∞, toQ=tr(ϕ(K)). From some easy calculations, we finally obtainQ=UTΓ U where, denoting by vec the usual vectorization operator which stacks the columns of a matrix into one vector, and considering thep×q matrixM(K)=(ui, Π12(K)fj )1ip,1jq,U is defined byU=vec(M(K)T). This random vector is a linear function ofKand has, therefore, a centered normal distribution inRpq. Its covariance matrix is obtained from further calculations that show that it is equal toΣ. 2

For a given significance level α∈ ]0,1[, the hypothesisH0 will be rejected ifFQ(nTn(p)) >1−α, where FQ denotes the cumulative distribution function of Q. Since Qis a quadratic form of a normally distributed random vector,FQ can be computed or approximated by using formulas given in [13] and which involve the eigenvalues of Σ1/2Γ Σ1/2. In practiceΣandΓ are unknown; so, they are to be replaced by consistent estimators. For estimatingΓ, we have to consider Γˆ =(DˆV(p))1Kˆ1whereDˆV(p)=diag(ˆλi;1ip)andˆ =diag(pˆ;1q).

A consistent estimator ofΣis obtained by taking thepq×pqmatrixΣˆ constructed as in (3) but with elements given byσˆij k= ˆuiˆj uˆk , where

Vˆn()= 1 ˆ n

n i=1

1I{Yi=}

Xi− ¯X(n)

Xi− ¯X(n)

and Θˆj =δj pˆjVˆn(j )+ ˆpjpˆVˆn− ˆVn(j )− ˆVn() . (4) The following theorem shows the consistency of the test for a sufficiently largep:

Theorem 2.2.UnderH1, there existsp0∈Nsuch that, for anypp0and anyα∈ ]0,1[, we have:

n→+∞lim P

FQnTn(p)

>1−α

=1.

Proof. Since τ2=+∞

i=1τ, ui 2 then, under H1, there exists (0, p0)F ×N such that τ0, up0 2>0.

Therefore, we have for anypp0,T (p)λp1

0p0τ0, up0 2>0. Forpp0, let us consider a realεsuch that 0< ε < T (p). From the inequality

P Tn(p)T (p) < ε

PTn(p) > T (p)ε

and from the convergence in probability ofTn(p)toT (p)asn→ +∞, we deduce that

n→+∞lim PTn(p) > T (p)ε

=1. (5)

Letn0be an integer such that∀nn0, n1FQ1(1α) < T (p)ε. Then, for allnn0, we have PTn(p) > T (p)ε

PTn(p) > n1FQ1(1α)

and the required result is obtained from (5) and this later inequality. 2

(5)

3. Application to functional variables

Here we show how the test introduced in the previous section can be applied in order to deal with functional vari- ables, so introducing a new approach for testing for homogeneity of means for such variables. LetX=(X(t ))t∈[0,1]be a random variable valued intoL2([0,1])andY be as previously defined. We consider an i.i.d. sample{(Xi, Yi)}1in

of(X, Y ), assuming that each functionXi is observed on fine grids of pointst1, . . . , tLsatisfying 0t1< t2<· · ·<

tL1. This leads to then×LmatrixX=(Xi(tj))1in,1jLthat contains all observations of X. For showing how the previous test forH0 can be applied here, we just have to explain how to obtain the matricesΓˆ andΣˆ in practice. First note that, theλˆi’s and the scorescˆi(k)= Xi,uˆk are output of the function pca.fd that is available in the R package fda, and which require the matrixXas argument. So, by using this function the matrixΓˆ can be easily computed. Moreover, it is easy to check that ˆui,Vˆnuˆk = ˆβikn and ˆui,Vˆn(j )uˆk = ˆβikn|j, where

βˆikn =1 n

n r=1

cˆr(i)− ¯c(i) ˆ

cr(k)− ¯c(k)

and βˆikn|j= 1 ˆ nj

n r=1

1I{Yr=j} ˆ

cr(i)− ¯c(i) ˆ

cr(k)− ¯c(k) ,

withc(i)¯ =n1n

r=1cˆr(i). Therefore, from (4) we deduce thatΣˆ is constructed as in (3) with entries given by:

ˆ

σij k=δj pˆjβˆikn|j+ ˆpjpˆβˆikn − ˆβikn|j− ˆβikn|

. (6)

Acknowledgements

J.G. Aghoukeng Jiofack is grateful for financial support from the Agence Universitaire de la Francophonie (AUF).

G.M. Nkiet’s work was partly supported by the academy of sciences for the developing world (TWAS) through the grant 05-154 RG/MATHS/AF/AC.

References

[1] P. Billingsley, Convergence of Probability Measures, Wiley, 1968.

[2] C. Bugli, P. Lambert, Functional ANOVA with random functional effects: An application to event-related potentials modelling for electroen- cephalograms analysis, Statistics in Medicine 25 (2006) 3718–3739.

[3] H. Cardot, F. Ferraty, A. Mas, P. Sarda, Testing hypotheses in the linear functional model, Scand. J. Statist. 30 (2003) 241–255.

[4] H. Cardot, A. Goia, P. Sarda, Testing for no effect in functional linear regression models, some computational approaches, Commun. Stat.

Simulation Comput. 33 (2004) 179–199.

[5] A. Cuevas, M. Febrero, R. Fraiman, An ANOVA test for functional data, Comput. Statist. Data Anal. 47 (2004) 111–122.

[6] A. Cuevas, R. Fraiman, On depth measures and dual statistics. A methodology for dealing with general data, J. Multivariate Anal. 100 (2009) 753–766.

[7] S. Dossou-Gbete, A. Pousse, Asymptotic study of eigenelements of a sequence of random selfadjoint operators, Statistics 22 (1991) 479–491.

[8] J. Fan, S.K. Lin, Test of significance when data are curves, J. Amer. Statist. Assoc. 93 (1998) 1007–1021.

[9] F. Ferraty, Ph. Vieu, Nonparametric Functional Data Analysis: Theory and Practice, Springer, 2006.

[10] L. Ferré, A.F. Yao, Functional sliced inverse regression analysis, Statistics 37 (2003) 475–488.

[11] P. Kokoszka, I. Maslova, J. Sojka, L. Zhu, Testing for lack of dependence in the functional linear model, Can. J. Statist. 36 (2008) 1–16.

[12] A. Mas, Testing for the mean of random curves: A penalization approach, Stat. Inference Stoch. Process. 10 (2007) 147–163.

[13] A.M. Mathai, S.B. Provost, Quadratic Forms in Random Variables: Theory and Applications, Dekker, 1992.

[14] J.O. Ramsay, B.W. Silverman, Functional Data Analysis, Springer, 2005.

Références

Documents relatifs

In this article we give a necessary and sufficient condition in order that a measure on a Hilbert space be Gaussian.. We also

In the case when the gauge group is of the form C ∞ (M, G) this study was developed by Gelfand, Graev and Vershik ([8] and [9]), Wallach ([27]), Pressley ([24]) and Albeve- rio

— We give a relation between the sign of the mean of an integer-valued, left bounded, random variable X and the number of zeros of 1 − Φ(z) inside the unit disk, where Φ is

According to the TCGA batch effects tool (MD Anderson Cancer Center, 2016), the between batch dispersion (DB) is much smaller than the within batch dis- persion (DW) in the RNA-Seq

The key difference is that while those bounds are typically used to bound the deviation between empirical and expected means under the assumption that the data are drawn from the

Our purpose is to examine whether the distribution of the SBP changed during time, and which type of transformation it underwent. Following our notations, we will study

Index Terms— Consistency in uniform metric, classification, functional data analysis, pattern recognition, uniform consistency, universal consistency.. The quality of a classifier

The above theorem states that the clustering scheme is strongly stable for F 2 provided the absolute margin condition holds.. Here, we briefly argue that this result is op- timal in