www.imstat.org/aihp 2008, Vol. 44, No. 6, 1096–1127

DOI: 10.1214/07-AIHP148

© Association des Publications de l’Institut Henri Poincaré, 2008

Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models

Christian Genest (a) and Bruno Rémillard (b)

(a) Département de mathématiques et de statistique, Université Laval, Québec (Québec), Canada G1K 7P4. E-mail: Christian.Genest@mat.ulaval.ca

(b) Service d’enseignement des méthodes quantitatives de gestion, HEC Montréal, 3000, chemin de la Côte-Sainte-Catherine, Montréal (Québec), Canada H3T 2A7. E-mail: Bruno.Remillard@hec.ca

Received 24 May 2006; revised 3 July 2007; accepted 25 September 2007

Abstract. In testing that a given distribution P belongs to a parameterized family P, one is often led to compare a nonparametric estimate A_n of some functional A of P with an element A_{θ_n} corresponding to an estimate θ_n of θ. In many cases, the asymptotic distribution of goodness-of-fit statistics derived from the process n^{1/2}(A_n − A_{θ_n}) depends on the unknown distribution P. It is shown here that if the sequences A_n and θ_n of estimators are regular in some sense, a parametric bootstrap approach yields valid approximations for the P-values of the tests. In other words, if A*_n and θ*_n are analogs of A_n and θ_n computed from a sample from P_{θ_n}, the empirical processes n^{1/2}(A_n − A_{θ_n}) and n^{1/2}(A*_n − A_{θ*_n}) then converge jointly in distribution to independent copies of the same limit. This result is used to establish the validity of the parametric bootstrap method when testing the goodness-of-fit of families of multivariate distributions and copulas. Two types of tests are considered: certain procedures compare the empirical version of a distribution function or copula and its parametric estimation under the null hypothesis; others measure the distance between a parametric and a nonparametric estimation of the distribution associated with the classical probability integral transform. The validity of a two-level bootstrap is also proved in cases where the parametric estimate cannot be computed easily.

The methodology is illustrated using a new goodness-of-fit test statistic for copulas based on a Cramér–von Mises functional of the empirical copula process.

Résumé. To test that a given distribution P belongs to a parametric family P, one is often led to compare a nonparametric estimate A_n of a functional A of P with an element A_{θ_n} corresponding to an estimate θ_n of θ. In many cases, the asymptotic distribution of test statistics built from the process n^{1/2}(A_n − A_{θ_n}) depends on the unknown distribution P. It is shown here that if the sequences A_n and θ_n of estimators are regular in a precise sense, parametric resampling leads to valid approximations of the P-values of the tests. In other words, if A*_n and θ*_n are analogs of A_n and θ_n computed from a sample drawn from P_{θ_n}, the empirical processes n^{1/2}(A_n − A_{θ_n}) and n^{1/2}(A*_n − A_{θ*_n}) then converge jointly in distribution to independent copies of the same limit. This result is used to validate the parametric resampling approach in the setting of goodness-of-fit tests for families of multivariate distributions and copulas. Two types of tests are considered: some compare the empirical version of a distribution or copula with its parametric estimate under the null hypothesis; the others measure the distance between the parametric and nonparametric estimates of the distribution associated with the classical probability integral transform. The validity of two-level resampling is also established in cases where the parametric estimate is difficult to compute. The methodology is illustrated by means of a new goodness-of-fit test for copulas based on a Cramér–von Mises functional of the empirical copula process.

MSC: 62F05; 62F40; 62H15

Keywords: Copula; Goodness-of-fit test; Monte Carlo simulation; Parametric bootstrap; P-values; Semiparametric estimation


1. Introduction

Given independent copies X_1, …, X_n of a random vector X with cumulative distribution function F: R^d → R, suppose that it is desired to test

H_0: F ∈ F = {F_θ: θ ∈ O},

the hypothesis that F comes from a parametric family of distributions whose members are indexed by a parameter θ belonging to an open set O ⊂ R^p. To achieve this goal, a natural way to proceed consists of measuring the difference between the empirical distribution function, defined for all x ∈ R^d by

F_n(x) = (1/n) Σ_{i=1}^n 1(X_i ≤ x),  (1)

and a parametric estimate F_{θ_n} of F derived under H_0 from some consistent estimate θ_n = T_n(X_1, …, X_n) of the true parameter value θ_0. Here and in the sequel, inequalities between vectors are taken to hold componentwise.

Cramér–von Mises, Kolmogorov–Smirnov and many other standard goodness-of-fit procedures are based on statistics expressed as continuous functionals S_n = φ(G_{F_n}) of the empirical process

G_{F_n} = n^{1/2}(F_n − F_{θ_n}).

Formal tests, however, require knowledge of the asymptotic null distribution of S_n, which often depends on the unknown value of θ.

1.1. The parametric bootstrap

To solve this problem, Stute et al. [26] suggest the following “parametric bootstrap” procedure.

For some large integer N and every k ∈ {1, …, N}, repeat the steps below:

(a) Given θ_n = T_n(X_1, …, X_n), generate n independent observations X*_{1,k}, …, X*_{n,k} from the distribution F_{θ_n}.

(b) Compute θ*_{n,k} = T_n(X*_{1,k}, …, X*_{n,k}) and for each x ∈ R^d, let

F*_{n,k}(x) = (1/n) Σ_{i=1}^n 1(X*_{i,k} ≤ x).

(c) Compute S*_{n,k} = φ(G_{F*_{n,k}}), where

G_{F*_{n,k}} = n^{1/2}(F*_{n,k} − F_{θ*_{n,k}}).

With the convention that large values of S_n lead to the rejection of H_0, Stute et al. [26] show that under appropriate regularity conditions, an approximate P-value for the test is given by

(1/N) Σ_{k=1}^N 1(S*_{n,k} > S_n).
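As a concrete illustration (not taken from the paper), steps (a)–(c) can be sketched in Python for the univariate Gaussian family, with a Cramér–von Mises functional playing the role of φ; the function names and the choice of family are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import norm

def fit(x):
    """theta_n = T_n(X_1, ..., X_n): moment estimates (mu, sigma)."""
    return x.mean(), x.std(ddof=0)

def cvm_statistic(x):
    """Cramér–von Mises functional of n^{1/2}(F_n - F_{theta_n})
    for the Gaussian family, with theta estimated from x."""
    n = len(x)
    mu, sigma = fit(x)
    u = norm.cdf(np.sort(x), mu, sigma)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)

def bootstrap_pvalue(x, N=1000, rng=np.random.default_rng(0)):
    """One-level parametric bootstrap: resample from F_{theta_n},
    re-estimate theta on each resample, recompute the statistic."""
    n = len(x)
    s_n = cvm_statistic(x)
    mu, sigma = fit(x)
    s_star = np.array([cvm_statistic(rng.normal(mu, sigma, n))
                       for _ in range(N)])
    return np.mean(s_star > s_n)  # (1/N) sum 1(S*_{n,k} > S_n)
```

The same skeleton applies to any family: only `fit` and the parametric sampler change.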

Henze [20] obtained a similar result in the univariate discrete case. In both papers, the validity of the parametric bootstrap stems from the fact that under H_0 and as n → ∞, (S_n, S*_{n,1}, …, S*_{n,N}) converges weakly to a vector (S, S_1, …, S_N) of mutually independent and identically distributed random variables.


1.2. Motivation for the present work

This investigation was motivated by the need to test the appropriateness of various dependence structures on the basis of a random sample

X_1 = (X_{11}, …, X_{1d}), …, X_n = (X_{n1}, …, X_{nd})

from a continuous random vector X with cumulative distribution function F. Specifically, denote by F_1, …, F_d the univariate margins of X and let C: [0,1]^d → [0,1] be the copula for which Sklar’s representation

F(x_1, …, x_d) = C{F_1(x_1), …, F_d(x_d)}

holds for all x_1, …, x_d ∈ R. In fact, C is simply the cumulative distribution function of U = ξ(X), where ξ: R^d → R^d is defined for all x_1, …, x_d ∈ R by

ξ(x_1, …, x_d) = (F_1(x_1), …, F_d(x_d)).  (2)

Unless the margins are known, the vectors U_1 = ξ(X_1), …, U_n = ξ(X_n) cannot be observed. However, a consistent estimate of F_j is defined for all t ∈ R and j ∈ {1, …, d} by

F_{jn}(t) = (1/(n+1)) Σ_{i=1}^n 1(X_{ij} ≤ t).

This uncommon choice of normalization is used because F_{jn} serves later as an argument in score functions and pseudo-likelihoods that could blow up at 1. Letting

ξ_n(x_1, …, x_d) = (F_{1n}(x_1), …, F_{dn}(x_d)),  (3)

for all x_1, …, x_d ∈ R, one could thus base a test of the hypothesis

H_0: C ∈ C = {C_θ: θ ∈ O}  (4)

on the pseudo-observations Û_1 = ξ_n(X_1), …, Û_n = ξ_n(X_n). Various options are possible; two of them are briefly described below.

Tests based on the empirical copula

Hypothesis (4) could be tested using a Cramér–von Mises or Kolmogorov–Smirnov statistic S_n = φ(G_{C_n}) with

G_{C_n} = n^{1/2}(C_n − C_{θ_n}),

where C_{θ_n} is a parametric estimate of C_θ derived from the estimate θ_n = T(X_1, …, X_n) of θ under H_0, while C_n is the empirical copula, defined for all u ∈ [0,1]^d by

C_n(u) = (1/n) Σ_{i=1}^n 1(Û_i ≤ u).  (5)

This possibility is raised but quickly dismissed by Fermanian [10], due to the complexity of the weak limit of G_{C_n}. See, e.g., [11,12,27] for derivations of the limit of the related empirical copula process n^{1/2}(C_n − C_{θ_0}).
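For illustration, the pseudo-observations of Eq. (3) and the empirical copula of Eq. (5) can be computed in a few lines; the helper names below are hypothetical, and continuous margins (no ties) are assumed.

```python
import numpy as np

def pseudo_observations(x):
    """x: (n, d) data matrix -> (n, d) matrix of U-hat_i = xi_n(X_i),
    i.e. the componentwise ranks rescaled by 1/(n+1) as in Eq. (3)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..n
    return ranks / (n + 1.0)

def empirical_copula(u_hat, u):
    """C_n(u) = (1/n) * sum_i 1(U-hat_i <= u) for a point u in [0,1]^d,
    with the inequality taken componentwise (Eq. (5))."""
    return np.mean(np.all(u_hat <= u, axis=1))
```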


Tests based on Kendall’s distribution

Another avenue, explored by Wang and Wells [29] and Genest et al. [15], is to construct a test of hypothesis (4) on Kendall’s distribution, i.e., the distribution function K of the probability integral transform W = F(X). Using the fact that one can also write W = C(U), Genest and Rivest [17] and Barbe et al. [1] show that a consistent estimate of K is given by the empirical distribution K_n of the pseudo-observations Ŵ_1 = C_n(Û_1), …, Ŵ_n = C_n(Û_n). The latter is defined for all w ∈ [0,1] by

K_n(w) = (1/n) Σ_{i=1}^n 1(Ŵ_i ≤ w).  (6)

Thus if K_θ denotes the distribution of W when C = C_θ ∈ C, and if K_{θ_n} is a parametric estimate of K_θ derived from θ_n = T(X_1, …, X_n) under the subsidiary hypothesis

H_0: K ∈ K = {K_θ: θ ∈ O},  (7)

a goodness-of-fit test could rely on a continuous functional S_n = φ(G_{K_n}) of

G_{K_n} = n^{1/2}(K_n − K_{θ_n}).

Whether hypothesis (4) is tested using G_{C_n} or the subsidiary hypothesis (7) is tested using G_{K_n}, the limiting distribution of the test statistic S_n does not only depend on the unknown parameter θ but also possibly on the nuisance parameters F_1, …, F_d. Therefore, while the use of a parametric bootstrap may very well yield valid P-values, this conclusion cannot be reached on the basis of the results reported by Stute et al. [26], because of the presence of dependence among the sets of pseudo-observations Û_1, …, Û_n and Ŵ_1, …, Ŵ_n.
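The Kendall sample Ŵ_i = C_n(Û_i) and its empirical cdf K_n of Eq. (6) are equally mechanical to compute; this sketch (an illustration with hypothetical helper names, assuming continuous margins) builds both directly from the data matrix.

```python
import numpy as np

def kendall_sample(x):
    """W-hat_i = C_n(U-hat_i): for each pseudo-observation, the
    proportion of pseudo-observations componentwise <= it."""
    n = x.shape[0]
    u_hat = (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / (n + 1.0)
    return np.array([np.mean(np.all(u_hat <= u, axis=1)) for u in u_hat])

def kendall_cdf(w_hat, w):
    """K_n(w) = (1/n) sum_i 1(W-hat_i <= w), Eq. (6)."""
    return np.mean(w_hat <= w)
```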

1.3. Objective and outline of the paper

The purpose of this work is to establish the validity of the parametric bootstrap in situations where the hypothesis to be tested concerns the distribution P of an unobservable s-variate random vector U, viz.

H_0: P ∈ P = {P_θ: θ ∈ O},

where O is an open subset of R^p. Although U cannot be seen, it is assumed that U = ξ(X) for some function ξ: R^d → R^s of an observable d-variate random vector X, and that a consistent estimator ξ_n of ξ can be constructed from independent copies X_1, …, X_n of X.

In order to encompass procedures based on G_{C_n} and G_{K_n} as special cases, suppose that a test of H_0 is to be derived from a continuous functional S_n = φ(G_{A_n}) of an abstract empirical process of the form

G_{A_n} = n^{1/2}(A_n − A_{θ_n}).

Here, A_{θ_n} and A_n stand respectively for a parametric and a nonparametric estimate of an abstract quantity A that depends on P. More specifically, A is taken to be a function mapping a closed rectangle T ⊂ [−∞, ∞]^r into R^s, and A_θ denotes the form taken by A when P = P_θ for some θ ∈ O. Thus for the test based on G_{C_n}, one has T = [0,1]^d, r = s = d and A_θ = C_θ; similarly, T = [0,1], r = s = 1 and A_θ = K_θ for the test based on G_{K_n}.

The result to be shown here is that the parametric bootstrap yields a valid approximation to the null distribution of the empirical process G_{A_n} under appropriate conditions. The main requirements concern the large-sample behavior of the estimators A_n of A and θ_n of θ that are constructed from the pseudo-observations Û_1 = ξ_n(X_1), …, Û_n = ξ_n(X_n).

In particular, the process Θ_n = n^{1/2}(θ_n − θ) needs to converge weakly, as n → ∞, to a centered random variable Θ. This is denoted symbolically

Θ_n = n^{1/2}(θ_n − θ) ⇝ Θ.  (8)

Similarly, it must be that, as n → ∞,

𝔸_n = n^{1/2}(A_n − A) ⇝ 𝔸,  (9)


i.e., 𝔸_n converges weakly to a centered process 𝔸 in the space D(T; R^s) of càdlàg processes from T to R^s, equipped with the Skorohod topology.

Additional regularity conditions needed for the result are stated in Section 2. Although these conclusions could possibly be derived within a different framework considered by Bickel and Ren [4], the conditions given here are adapted to the current context and easier to verify than theirs. The present proofs are also different and yield interesting insights. The two-level parametric bootstrap introduced in Section 3 also appears to be novel; it is required in many applications where A_{θ_n} cannot be computed easily but can be approximated through a parametric bootstrap of its own.

The goodness-of-fit tests for copula models introduced above are revisited in Section 4. Also given there is a multivariate extension of a procedure designed by Durbin [9] for checking the fit of a univariate distribution. As a practical illustration, testing for a Gaussian copula structure is considered in Section 5 on the basis of the empirical copula (5). An explicit algorithm is also provided which can be adapted easily to test for other copula families via one- or two-level parametric bootstrapping. For a more extensive comparison of this procedure with alternative tests for copula models, see Genest et al. [16].

To avoid interrupting the flow of the presentation, most technical arguments are relegated to a series of appendices.

2. Validity of the one-level parametric bootstrap

Let U_1, …, U_n be a random sample from some distribution P, and assume that it is desired to test the hypothesis

H_0: P ∈ P = {P_θ: θ ∈ O},

where P is a family of probability measures on R^d indexed by a parameter θ living in an open set O ⊂ R^p. The family is assumed to be identifiable, i.e., θ ≠ θ′ ⇒ P_θ ≠ P_{θ′}.

As discussed in the Introduction, let T ⊂ [−∞, ∞]^r be a closed rectangle and suppose that the test of H_0 is to be based on an abstract mapping A: T → R^s that depends on the true distribution P of U_1, …, U_n. In particular, suppose that A = A_θ when P = P_θ, and write A = {A_θ: θ ∈ O}. In this general context, identifiability is ensured if for every ε > 0,

inf{ sup_{t∈T} |A_θ(t) − A_{θ_0}(t)| : θ ∈ O and |θ − θ_0| > ε } > 0.

This condition is assumed throughout, as one might otherwise have A_θ = A_{θ′} for some θ ≠ θ′ and problems could arise; see, e.g., [24]. Furthermore, the mapping θ ↦ A_θ is assumed to be Fréchet differentiable with derivative Ȧ, i.e., for all θ_0 ∈ O,

lim_{h→0} sup_{t∈T} |A_{θ_0+h}(t) − A_{θ_0}(t) − Ȧ(t)h| / ‖h‖ = 0.  (10)

Finally, let θ_n = T_n(U_1, …, U_n) be a consistent estimate of θ and assume that the D(T; R^s)-valued process A_n = Υ_n(U_1, …, U_n) estimates A consistently. Suppose specifically that the processes Θ_n = n^{1/2}(θ_n − θ) and 𝔸_n = n^{1/2}(A_n − A) have centered Gaussian limits when n → ∞, as per (8) and (9).

The purpose of this section is to state additional regularity conditions on the families P, A and on the sequences A_n and θ_n of estimators. These requirements will ensure that a parametric bootstrap algorithm approximates correctly the limiting behavior of the empirical process

G_{A_n} = n^{1/2}(A_n − A_{θ_n}).

Consequently, the parametric bootstrap will also provide a suitable approximation of the asymptotic distribution of goodness-of-fit test statistics expressed as continuous functionals S_n = φ(G_{A_n}).

The validity of the parametric bootstrap first depends on smoothness and integrability conditions on the parametric family of distributions.

Definition 1. A family P = {P_θ: θ ∈ O} is said to belong to the class S(λ) for a given reference measure λ (independent of θ) if:


1.1. The measure P_θ is absolutely continuous with respect to λ for all θ ∈ O.

1.2. The density p_θ = dP_θ/dλ admits first and second order derivatives with respect to all components of θ ∈ O. The gradient (row) vector with respect to θ is denoted ṗ_θ, and the Hessian matrix is represented by p̈_θ.

1.3. For arbitrary u ∈ R^d and every θ_0 ∈ O, the mappings θ ↦ ṗ_θ(u)/p_θ(u) and θ ↦ p̈_θ(u)/p_θ(u) are continuous at θ_0, P_{θ_0}-almost surely.

1.4. For every θ_0 ∈ O, there exist a neighborhood N of θ_0 and a λ-integrable function h: R^d → R such that for all u ∈ R^d, sup_{θ∈N} ‖ṗ_θ(u)‖ ≤ h(u).

1.5. For every θ_0 ∈ O, there exist a neighborhood N of θ_0 and P_{θ_0}-integrable functions h_1, h_2: R^d → R such that for every u ∈ R^d,

sup_{θ∈N} ‖ṗ_θ(u)/p_θ(u)‖² ≤ h_1(u) and sup_{θ∈N} ‖p̈_θ(u)/p_θ(u)‖ ≤ h_2(u).

In the sequel, θ_0 represents the true (unknown) value of θ and P = P_{θ_0}. Furthermore,

p = p_{θ_0}, ṗ = ṗ_{θ_0}, p̈ = p̈_{θ_0}.

Remark 1. Using Condition 1.4 with the continuity of ṗ_θ as a function of θ and Lebesgue’s dominated convergence theorem, one may conclude that

(∂/∂θ) ∫ p_θ(u) g(u) λ(du) = ∫ ṗ_θ(u) g(u) λ(du)  (11)

for any bounded measurable function g: R^d → R not depending on θ. In particular, ∫ ṗ(u) λ(du) = 0. Furthermore, if F_θ denotes the distribution function associated with P_θ, the mapping θ ↦ F_θ is then Fréchet differentiable and its derivative Ḟ_θ satisfies the following identity for all x ∈ R^d:

Ḟ_θ(x) = ∫ ṗ_θ(u) 1(u ≤ x) λ(du).  (12)

Remark 2. When P ∈ S(λ), the multivariate central limit theorem implies that if U_1, …, U_n form a random sample from P = P_{θ_0}, then as n → ∞,

W_{P,n} = n^{−1/2} Σ_{i=1}^n ṗ(U_i)/p(U_i) ⇝ W_P ~ N(0, I_P),  (13)

where E(W_P) = 0 by Remark 1 and I_P is the Fisher information matrix, viz.

I_P = ∫ ṗ(u)^⊤ ṗ(u) / p(u) λ(du).  (14)

The validity of the parametric bootstrap also relies on the following general notion of P-regularity of estimators. It is cast below in terms of A_n, but it applies also to many other sequences in the sequel, e.g., in the case A_n = θ_n.

Definition 2. Let U_1, …, U_n be a random sample from P = P_{θ_0} and let W_{P,n} be defined as in (13). A sequence A_n is said to be P_{θ_0}-regular for A = A_{θ_0} if, as n → ∞, the process (𝔸_n, W_{P,n}) with 𝔸_n = n^{1/2}(A_n − A) converges weakly in D(T; R^s) × R^p to a centered Gaussian pair (𝔸, W_P), and the Fréchet derivative Ȧ of A defined in (10) satisfies Ȧ(t) = E{𝔸(t) W_P^⊤} for every t ∈ T. The sequence is said to be P-regular for A if it is P_{θ_0}-regular for A_{θ_0} at all θ_0 ∈ O.

Remark 3. The P-regularity of a sequence of estimators θ_n = T_n(U_1, …, U_n) for θ ∈ O implies that Θ_n = n^{1/2}(θ_n − θ) ⇝ Θ as n → ∞, where Θ is a centered Gaussian random vector and E(Θ W_P^⊤) = I is the identity matrix.


Now let U*_1, …, U*_n be a bootstrap sample from P_{θ_n}, and set

θ*_n = T_n(U*_1, …, U*_n),  Θ*_n = n^{1/2}(θ*_n − θ),
A*_n = Υ_n(U*_1, …, U*_n),  𝔸*_n = n^{1/2}(A*_n − A).

The following result, whose proof is given in Appendix B, gives conditions under which the weak limits of the processes

G_{A_n} = n^{1/2}(A_n − A_{θ_n}) and G*_{A_n} = n^{1/2}(A*_n − A_{θ*_n})

are independent and identically distributed. This guarantees that a parametric bootstrap based on the process A*_n is valid.

Theorem 1. Assume that P ∈ S(λ) and that as n → ∞,

(𝔸_n, Θ_n, W_{P,n}) ⇝ (𝔸, Θ, W_P)  (15)

in D(T; R^s) × (R^p)^2, where the limit is a centered Gaussian process. Let Γ = E(Θ W_P^⊤) and set a(t) = E{𝔸(t) W_P^⊤} for every t ∈ T. Then, as n → ∞,

(𝔸_n, 𝔸*_n, Θ_n, Θ*_n) ⇝ (𝔸, 𝔸*, Θ, Θ*)

in D(T; R^s)^2 × (R^p)^2. In the limit, 𝔸* = 𝔸^⊥ + aΘ and Θ* = Θ^⊥ + ΓΘ are defined in terms of an independent copy (𝔸^⊥, Θ^⊥) of (𝔸, Θ). If in addition (A_n, θ_n) is P-regular for A × O, then

(G_{A_n}, G*_{A_n}) ⇝ (G_A, G*_A) = (𝔸 − ȦΘ, 𝔸^⊥ − ȦΘ^⊥)

in D(T; R^s)^2, as n → ∞, and G*_A is an independent copy of G_A.

3. A two-level parametric bootstrap

To perform a goodness-of-fit test based on a continuous functional S_n = φ(G_{A_n}) of the process

G_{A_n} = n^{1/2}(A_n − A_{θ_n}),

one must compute A_{θ_n} at various points, but this is not always easily done.

For tests based on the empirical copula, for instance, one has A_{θ_n} = C_{θ_n}, and many copula families are not algebraically closed. In this case, a simple way to circumvent the problem is to generate a random sample V_1, …, V_m from the probability measure Q_{θ_n} with distribution function C_{θ_n} and, for u ∈ [0,1]^d, to approximate C_{θ_n}(u) by

Č_n(u) = (1/m) Σ_{j=1}^m 1(V_j ≤ u).

It is typical to take m = ⌈γn⌉ for some γ ∈ (0, ∞), but it will only be assumed here that m is a function of n such that m/n → γ ∈ (0, ∞) as n → ∞.

More generally, the strategy proposed here consists of replacing A_{θ_n} by an approximation Ǎ*_n = Ψ_m(V*_1, …, V*_m) built from a random sample V*_1, …, V*_m from Q_{θ_n} ∈ Q = {Q_θ: θ ∈ O}. In order for this approach to make sense, it must be assumed that if A = A_{θ_0} and Ǎ_n = Ψ_m(V_1, …, V_m) for a random sample V_1, …, V_m from Q = Q_{θ_0}, then

𝔸ˇ_n = n^{1/2}(Ǎ_n − A) ⇝ 𝔸ˇ

in D(T; R^s), as n → ∞ (and hence m → ∞).

Given that such a process exists, here is a natural way to circumvent the lack of a closed form for A_{θ_n} in the computation of the test statistic S_n:


(a) Compute θ_n = T_n(U_1, …, U_n) and let A_n = Υ_n(U_1, …, U_n).

(b) Given U_1, …, U_n, generate a random sample V_1, …, V_m from Q_{θ_n}.

(c) Let Ǎ_n = Ψ_m(V_1, …, V_m) and compute S_n = φ(G_{Ǎ_n}), in which G_{Ǎ_n} = n^{1/2}(A_n − Ǎ_n).

Now in order to approximate the distribution of S_n, a second parametric bootstrap procedure is necessary. To this end, pick N large and repeat the following steps for every k ∈ {1, …, N}:

(a) Given U_1, …, U_n and V_1, …, V_m, generate a random sample U*_{1,k}, …, U*_{n,k} from P_{θ_n}.

(b) Compute θ*_{n,k} = T_n(U*_{1,k}, …, U*_{n,k}) and let A*_{n,k} = Υ_n(U*_{1,k}, …, U*_{n,k}).

(c) Given U_1, …, U_n, V_1, …, V_m and U*_{1,k}, …, U*_{n,k}, generate a random sample V**_{1,k}, …, V**_{m,k} from Q_{θ*_{n,k}}.

(d) Let Ǎ**_{n,k} = Ψ_m(V**_{1,k}, …, V**_{m,k}) and compute S*_{n,k} = φ(G_{Ǎ**_{n,k}}), in which G_{Ǎ**_{n,k}} = n^{1/2}(A*_{n,k} − Ǎ**_{n,k}).

With the convention that large values of S_n lead to the rejection of H_0, and under regularity conditions stated below, a valid approximation to the P-value for the test based on S_n = φ(G_{Ǎ_n}) is given by

(1/N) Σ_{k=1}^N 1(S*_{n,k} > S_n).

As for the standard parametric bootstrap, the validity of the above two-level extension is ensured, provided that one can show that, as n → ∞, (G_{Ǎ_n}, G*_{Ǎ**_{n,1}}) converges weakly in D(T; R^s)^2 to a pair of independent and identically distributed limiting processes.

Assume that Q ∈ S(ν) for some reference measure ν (independent of θ). Write q_θ for the density of Q_θ, let q̇_θ be the gradient (row) vector with respect to θ, and denote the Hessian matrix by q̈_θ. When Q = Q_{θ_0}, write by extension

q = q_{θ_0}, q̇ = q̇_{θ_0}, q̈ = q̈_{θ_0}.

Note that when Q ∈ S(ν), the multivariate central limit theorem implies that if V_1, …, V_m form a random sample from Q = Q_θ, then, as n → ∞,

W_{Q,n} = n^{−1/2} Σ_{i=1}^m q̇(V_i)/q(V_i) ⇝ W_Q ~ N(0, I_Q),  (16)

where, in view of the fact that m/n → γ ∈ (0, ∞) as n → ∞,

I_Q = γ ∫ q̇(u)^⊤ q̇(u) / q(u) ν(du).  (17)

Now let U_1, …, U_n and V_1, …, V_m be two mutually independent random samples from P = P_{θ_0} ∈ P and Q = Q_{θ_0} ∈ Q, respectively. Let W_{P,n} and W_{Q,n} be defined as in (13) and (16), respectively. Conditionally on U_1, …, U_n and V_1, …, V_m, make the following additional assumptions:

(a) Given θ_n = T_n(U_1, …, U_n), the random vectors U*_1, …, U*_n and V*_1, …, V*_m are mutually independent random samples from P_{θ_n} and Q_{θ_n}, respectively.

(b) Given U*_1, …, U*_n, V*_1, …, V*_m and θ*_n = T_n(U*_1, …, U*_n), the random vectors V**_1, …, V**_m are a random sample from Q_{θ*_n}.

Finally, introduce the additional notations

Ǎ_n = Ψ_m(V_1, …, V_m), Ǎ*_n = Ψ_m(V*_1, …, V*_m), Ǎ**_n = Ψ_m(V**_1, …, V**_m)

and

𝔸ˇ_n = n^{1/2}(Ǎ_n − A), 𝔸ˇ*_n = n^{1/2}(Ǎ*_n − A), 𝔸ˇ**_n = n^{1/2}(Ǎ**_n − A).


The following result, whose proof is given in Appendix C, gives conditions under which the weak limits of the processes

G_{Ǎ_n} = n^{1/2}(A_n − Ǎ_n) and G*_{Ǎ**_n} = n^{1/2}(A*_n − Ǎ**_n)

are independent and identically distributed. This proves the validity of a two-level parametric bootstrap based on the process A*_n.

Theorem 2. Assume that P ∈ S(λ), Q ∈ S(ν) and that as n → ∞,

(𝔸_n, 𝔸ˇ_n, Θ_n, W_{P,n}, W_{Q,n}) ⇝ (𝔸, 𝔸ˇ, Θ, W_P, W_Q)

and that the limit is a centered Gaussian process in D(T; R^s)^2 × (R^p)^3. Let Γ = E(Θ W_P^⊤) and set a(t) = E{𝔸(t) W_P^⊤} and ǎ(t) = E{𝔸ˇ(t) W_Q^⊤} for every t ∈ T. Then, as n → ∞,

(𝔸_n, 𝔸*_n, 𝔸ˇ_n, 𝔸ˇ*_n, 𝔸ˇ**_n, Θ_n, Θ*_n) ⇝ (𝔸, 𝔸*, 𝔸ˇ, 𝔸ˇ*, 𝔸ˇ**, Θ, Θ*)

in D(T; R^s)^5 × (R^p)^2. In the limit,

𝔸* = 𝔸^⊥ + aΘ, Θ* = Θ^⊥ + ΓΘ, 𝔸ˇ* = 𝔸ˇ^⊥ + ǎΘ, 𝔸ˇ** = 𝔸ˇ^⊥⊥ + ǎΘ*,

where (𝔸^⊥, Θ^⊥) is an independent copy of (𝔸, Θ). In addition, the processes 𝔸ˇ, 𝔸ˇ^⊥ and 𝔸ˇ^⊥⊥ are mutually independent and identically distributed, as well as independent of 𝔸, 𝔸^⊥, Θ and Θ^⊥. Moreover, if (A_n, θ_n) is P-regular for A × O and Ǎ_n is Q-regular for A, then

(G_{Ǎ_n}, G*_{Ǎ**_n}) ⇝ (G_Ǎ, G*_Ǎ) = (𝔸 − 𝔸ˇ − ȦΘ, 𝔸^⊥ − 𝔸ˇ^⊥⊥ − ȦΘ^⊥)

in D(T; R^s)^2, as n → ∞, and G*_Ǎ is an independent copy of G_Ǎ.

4. Examples of application

In this section, the validity of the one- and two-level parametric bootstrap is established in four common goodness-of-fit testing contexts. The first example considers classical tests for parametric families of random vectors; it is discussed here because the conditions under which Theorems 1 and 2 are established seem easier to verify than the requirements imposed by Stute et al. [26]. The second and the third examples are about goodness-of-fit for copula models, while the last application revisits the approach of Durbin [9] for goodness-of-fit testing of parametric families of random vectors using the probability integral transformation.

4.1. Goodness-of-fit tests for parametric families

Let X be a d-variate random vector with continuous distribution function F. Suppose that it is desired to test the null hypothesis

H_0: F ∈ F = {F_θ: θ ∈ O},

i.e., F = F_{θ_0} for some θ_0 ∈ O. Given a random sample X_1, …, X_n from F, a natural procedure is to compare the empirical distribution function (1) to F_{θ_n}, where θ_n = T_n(X_1, …, X_n) is an estimate of the unknown parameter θ ∈ R^p. The test could be based, e.g., on a Cramér–von Mises or on a Kolmogorov–Smirnov functional S_n = φ(G_{F_n}) of the empirical process

G_{F_n} = n^{1/2}(F_n − F_{θ_n}).


To establish the validity of the parametric bootstrap for such statistics, one can use Theorems 1 and 2 with A_θ = F_θ and P_θ standing for the unique probability measure associated with F_θ and density f_θ. Assume that P = {P_θ: θ ∈ O} ∈ S(λ), where λ is Lebesgue measure. Introduce the following notation:

f = f_{θ_0}, ḟ = ḟ_{θ_0}, f̈ = f̈_{θ_0}.

To check the P-regularity of F_n, let 𝔽_n = n^{1/2}(F_n − F) and

W_{F,n} = n^{−1/2} Σ_{i=1}^n ḟ(X_i)/f(X_i).  (18)

Results from [5] imply that as n → ∞, (𝔽_n, W_{F,n}) ⇝ (𝔽, W_F) in D([−∞, ∞]^d; R) × R^p, where W_F is a centered Gaussian variable with variance

I_F = ∫ ḟ(x)^⊤ ḟ(x) / f(x) λ(dx)

and 𝔽 is an F-Brownian bridge, i.e., 𝔽 is a continuous centered Gaussian process with covariance function

cov{𝔽(x), 𝔽(y)} = F(x ∧ y) − F(x)F(y),

where x ∧ y = min(x, y) componentwise for all x, y ∈ R^d. The following result is a consequence of these observations and of the fact that for all x ∈ R^d,

E{𝔽(x) W_F^⊤} = ∫ ḟ(y) 1(y ≤ x) λ(dy) = Ḟ(x)

in view of Eq. (12).

Proposition 1. Let X_1, …, X_n be a random sample from the distribution F = F_{θ_0} for some θ_0 ∈ O. If P ∈ S(λ), then the canonical empirical distribution function F_n defined in (1) is P-regular for F.

Next, assume that θ_n is a P-regular sequence for O such that, as n → ∞,

(𝔽_n, Θ_n, W_{F,n}) ⇝ (𝔽, Θ, W_F)

in D([−∞, ∞]^d; R) × (R^p)^2. Suppose further that the limit is Gaussian, so that condition (15) is satisfied with 𝔸_n = 𝔽_n. It then follows that (F_n, θ_n) is P-regular for F × O, because E(𝔽 W_F^⊤) = Ḟ = Ḟ_{θ_0} by Proposition 1 and E(Θ W_F^⊤) = I by the regularity hypothesis on θ_n.

Finally, all the conditions of Theorems 1 and 2 are met with A = F, A_n = F_n and Ǎ_n = F̌_n, where the latter is defined for all x ∈ R^d by

F̌_n(x) = (1/m) Σ_{i=1}^m 1(Y_i ≤ x)

in terms of a random sample Y_1, …, Y_m from P_θ that is independent of X_1, …, X_n. Therefore, the one- and two-level parametric bootstraps yield valid approximations of the distribution of any continuous functional S_n = φ(G_{F_n}).

In this context, the class of estimators that are P-regular for O is broad, as shown below.

Definition 3. An estimator θ_n = T_n(X_1, …, X_n) for θ ∈ O is said to belong to the class R if n^{1/2}(θ_n − θ) = Θ_n + o_P(1), where

Θ_n = n^{−1/2} Σ_{i=1}^n J_θ(X_i)  (19)

is expressed in terms of a score function J_θ: R^d → R^p that is square integrable with respect to P_θ and such that for all θ ∈ O, one has both

E_θ{J_θ(X)} = ∫ J_θ(x) f_θ(x) λ(dx) = 0 and ∫ J_θ(x) ḟ_θ(x) λ(dx) = I.  (20)

Proposition 2. Let θ_n = T_n(X_1, …, X_n) be an estimator of θ ∈ O from the class R. If P ∈ S(λ), then (F_n, θ_n) is P-regular for F × O.

To establish this result, first note that each component of the vector (𝔽_n, Θ_n, W_{F,n}) is tight and that the finite-dimensional distributions converge by the classical multivariate central limit theorem, because each term is a sum of independent and identically distributed centered random variables. In addition, observe that E(Θ W_F^⊤) = I by Eq. (20).

Example 1. When it is uniquely defined and I_F is non-singular, the maximum likelihood estimator belongs to R. For, in that case, relation (19) holds with J_θ = I_F^{−1} ḟ_θ^⊤ / f_θ. Furthermore, this function satisfies conditions (20) because of identity (11) and from the fact that under P = P_{θ_0},

E(Θ W_F^⊤) = I_F^{−1} ∫ ḟ(x)^⊤ ḟ(x) / f(x) λ(dx) = I_F^{−1} I_F = I.

Example 2. Moment estimators also belong to R. Assume that

θ = g(μ) and μ = ∫ M(x) f_θ(x) λ(dx)

for some integrable function M: R^d → R^d that does not depend on θ. Suppose also that g is continuously differentiable and that the matrix ġ of derivatives is non-singular. Then g^{−1} exists and is continuously differentiable by the inverse function theorem. Furthermore, Slutsky’s theorem implies that for all x ∈ R^d,

J_θ(x) = ġ{g^{−1}(θ)} {M(x) − g^{−1}(θ)}.

This score function meets the appropriate requirements because of (11) and the fact that under P,

E(Θ W_F^⊤) = ġ{g^{−1}(θ_0)} ∫ {M(x) − g^{−1}(θ_0)} ḟ(x) λ(dx)
= ġ{g^{−1}(θ_0)} [∂/∂θ ∫ M(x) f_θ(x) λ(dx)]_{θ=θ_0}
= ġ{g^{−1}(θ_0)} [∂ g^{−1}(θ)/∂θ]_{θ=θ_0} = I.

Example 3. When it is uniquely defined, the estimator θ_n minimizing

ℓ_n(θ) = ∫ {F_n(x) − F_θ(x)}² dF_n(x)

between F_n and F_θ also belongs to R, provided that

Σ_θ = ∫ Ḟ_θ(x)^⊤ Ḟ_θ(x) f_θ(x) λ(dx)

is non-singular for every θ ∈ O. In this case, representation (19) holds with

J_θ(x) = Σ_θ^{−1} ∫ {1(x ≤ y) − F_θ(y)} Ḟ_θ(y)^⊤ f_θ(y) λ(dy).
