• Aucun résultat trouvé

Rates of convergence for the weighted bootstrap of empirical and raking-ratio processes

N/A
N/A
Protected

Academic year: 2021

Partager "Rates of convergence for the weighted bootstrap of empirical and raking-ratio processes"

Copied!
45
0
0

Texte intégral

(1)

HAL Id: hal-01979703

https://hal.archives-ouvertes.fr/hal-01979703

Preprint submitted on 13 Jan 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Rates of convergence for the weighted bootstrap of empirical and raking-ratio processes

Mickael Albertus, Philippe Berthet

To cite this version:

Mickael Albertus, Philippe Berthet. Rates of convergence for the weighted bootstrap of empirical and

raking-ratio processes. 2019. �hal-01979703�

(2)

Rates of convergence for the weighted bootstrap of empirical and raking-ratio processes

M. Albertus and P. Berthet January 13, 2019

Abstract

We study the weighted bootstrap of the empirical process indexed by a class of functions, when the weights are allowed to be data dependent.

In addition to the classical one, we also consider three weighted bootstrap new methods based on the raking-ratio process using an auxiliary infor- mation on N partitions. Assuming entropy conditions like VC dimension, we use nonasymptotic strong approximation arguments to characterize the joint limiting Gaussian processes of b

n

bootstrap experiments and to eval- uate the rate of weak uniform convergence as b

n

Ñ +8 with the initial sample size n Ñ +8. Berry-Esseen bounds for bootstrapped statistics follows. This justifies the weighted bootstrap methodology to estimate the distribution of raked statistics, in particular their lower variance and smaller confident bands.

Keywords: Weighted bootstrap, Uniform central limit theorems, Non- parametric statistics, empirical processes, raking ratio process.

MSC Classification: 62G09, 62G20, 60F17, 60F05.

Contents

1 Introduction 2

1.1 The classical bootstrap . . . . 2

1.2 The weighted bootstrap . . . . 4

1.3 Weighted bootstraps . . . . 6

2 Main results 9 2.1 The class F . . . . 9

2.2 Strong approximations . . . . 10

2.3 Estimation of variance and distribution function . . . . 12

2.4 Rates of weak convergence . . . . 15

(3)

3 Raking-Ratio results 15

3.1 Strong approximation of α

˚n(N)

. . . . 15

3.2 Strong approximation of α

(N)n ˚

. . . . 17

4 Proofs 18 4.1 Decomposition of α

˚n

. . . . 18

4.2 Proof of Propositions 1, 4 and 5 . . . . 20

4.3 Construction of limit Gaussian processes . . . . 23

4.4 Proof of Theorem 2.1 and 2.2 . . . . 26

4.5 Proof of Theorem 3.1 and 3.2 . . . . 30

4.6 Proof of Proposition 6 . . . . 33

4.7 Proof of Corollaries 1 and 2 . . . . 36

4.8 Proof of Corollaries (3) and (4) . . . . 39

1 Introduction

1.1 The classical bootstrap

Presentation. The bootstrap is a very popular method of statistical infer- ence introduced by Efron [15, 16] that could be viewed as a generalization of the older jackknife method or leave k-out methods. Given n ě 1 inde- pendent random variables X

1

, ..., X

n

with common law P on a measurable space ( X , A ) let P

n

=

1n

ř

n

i=1

δ

Xi

denote the associated empirical measure where δ

Xi

are the Dirac measures. Any statistic of interest S

n

(X

1

, ..., X

n

) be- ing symmetric in its arguments can be written φ( P

n

). Whenever S

n

is not known to satisfy good estimation or test properties, it is unfortunately observed only once. The classical bootstrap aims to learn about unobserved proper- ties of S

n

by re-sampling at will X

1˚

, ..., X

n˚

among X

1

, ..., X

n

uniformly with replacement then estimating by Monte-Carlo methods the bias, variance or dis- tribution of S

n˚

= S

n

(X

1˚

, ..., X

n˚

) = φ( P

˚n

) centered at S

n

(X

1

, ..., X

n

), with P

˚n

=

1n

ř

n

i=1

δ

X˚

i

. The paradigm of Efron is that without any information on P the best way to mimic the unknown product measure (P )

n

= P ˆ ¨ ¨ ¨ ˆ P of the original sample is to use the product empirical measure ( P

n

)

n

= P

n

ˆ ¨ ¨ ¨ ˆ P

n

and to center P

˚n

at P

n

instead of P.

Motivation. From the mathematical statistics viewpoint a crucial question that has not been investigated in general is to quantify the information one really gets about the distributions of S

n

when bootstrapping b

n

times, with b

n

Ñ +8 as n Ñ +8. Towards this aim we address two unusual problems.

The first problem is to find a sufficient condition on how to choose b

n

to control the joint weak convergence of bootstrapped experiments. The second problem is how to add auxiliary information on P while bootstrapping.

Monte-Carlo bootstrap. The mathematical justification of the bootstrap

methodology is not obvious, even for a single explicit targeted statistic S

n

since

it strongly depends on φ itself. Strictly speaking, one should evaluate how

close the random experiments φ( P

˚n

) and φ( P

n

) are as n Ñ +8 and derive

(4)

the consequences on the subsequent estimation procedure from b

n

conditionally independent bootstraps. The known answers are mostly asymptotic and don’t involve n and b

n

together. Usually b

n

= 1 when n Ñ +8 and, for fixed n, it is implicit that b

n

Ñ +8 allows to numerically learn φ( P

˚n

). However, the learned probability distribution is conditional to the initial sample, and the difference with φ( P

n

) could be misleading. Thus letting b

n

Ñ +8 is not so useful due to the overfitting phenomenon. In other words, a rigorous compromise with the bias involving jointly the initial sample and the bootstrap samples is missing in the theory. We look for b

n

small enough to provide non-asymptotic joint results.

Asymptotic justification. The statistics S

n

we consider are sensitive to deviations between empirical and true expectations over a class of functions F Ă L

2

(P ). They are determined by α

n

(f ) = ?

n( P

n

(f ) ´ P (f)) where P

n

(f ) = n

´1

ř

n

i=1

f (X

i

) and P (f ) = E (f (X)), f P F . The collection α

n

( F ) = tα

n

(f ) : f P Fu is called the empirical process α

n

indexed by F . Its boot- strapped version is α

˚n

( F ) = tα

˚n

(f ) : f P Fu, α

˚n

(f ) = ?

n( P

˚n

(f )´P

n

(f )) where P

˚n

is the bootstrapped empirical measure introduced below. What is usually established is that conditionally to X

1

, ..., X

n

the bootstrap process α

˚n

( F ) has the same behavior as the empirical process α

n

( F ) for n large. Basically, with probability one α

˚n

weakly converges to the weak limit of α

n

as n Ñ +8, most often the P -Brownian bridge G ( F ) indexed by F if F is a Donsker class. The bootstrap method is therefore justified at the first order if S

n

= φ( P

n

) P R

d

, S

n˚

= φ( P

˚n

) and φ is Fréchet-differentiable at P since then the distribution of Y

n

= ?

n(S

n

´φ(P )) can be estimated by the distribution of Y

n˚

= ?

n(S

n˚

´S

n

) which is in smooth cases asymptotically the same random vector φ

1

(P )¨ G as for Y

n

provided that the differential distortion φ

1

( P

n

) ´ φ

1

(P ) vanishes. Whenever Y

n˚

is simulated b

n

times, the distortion generates a bias and b

n

should be cali- brated to avoid overfitting and learning the bias through φ

1

( P

n

). In particular cases the weak distance between the distributions of Y

n

, Y

n˚

and φ

1

(P ) ¨ G or between the distributions of S

n˚

and S

n

are known to vanish. No general esti- mate is available at fixed n, hence the balanced choice of b

n

to guaranty well controlled Monte-Carlo estimates can not be discussed. This is the main moti- vation of this paper, however we work with the weighted bootstrap version of α

˚n

and S

n˚

together with three new variants exploiting some auxiliary information.

Weak convergence. Giné and Zinn [17] proved that for any class of functions

F with envelope in L

2

(P ) the weak convergence of α

n

( F ) to a - Gaussian or not

- process G indexed by F is necessary and sufficient for the Efron’s bootstrap

empirical process α

˚n

( F ) to almost surely converge weakly to G ( F ) also. This

very nice statement is one of the most general results of the huge literature

on the bootstrap methodology. For a single real valued and regular statistic

S

n

a common approach is through Edgeworth expansions, which exploits the

cumulant expansion of the distribution function, see e.g. Hall [19] or Shao and

Tu [29]. Other approaches rely on Berry-Esseen bounds, like in Singh [31] or

Mallows distances, as in Bickel and Freedman [9]. Under the name of Bayesian

bootstrap, Rubin [28] defined an analogue of Efron’s bootstrap by resampling

according to exchangeable weights that are independent of X

1

, ..., X

n

rather

(5)

than uniformly according to P

n

. In the case X = R , Mason and Newton [26]

further generalized the bootstrap by independently assigning self-normalized random weights to the original data. If the weights are drawn independently from a multinomial distribution this reduces to the Efron’s bootstrap whereas if the weights come independently from a Dirichlet distribution this reduces to the Bayesian bootstrap. They established the weak convergence of this weighted real empirical process to a Brownian bridge provided the positive exchangeable weights satisfy a weak convergence condition. Unlike Giné and Zinn [17], their result does not handle the case indexed by F . Præstgaard and Wellner [27] fills this gap, by still assuming that the exchangeable weights are independent from the data, with again a common weak limit G ( F ) for α

n

( F ) and α

˚n

( F ).

About rates. The above weak convergence of Efron’s and weighted bootstrap processes are usually formulated in the sense of Hoffmann-Jørgensen to handle carefully measurability problems – see [4, 20]. The main results are assembled in chapter 3.6 of Van der Vaart and Wellner [35] – see also Kosorok [23]. The weak convergence is usually quantified in terms of the bounded Lipschitz norm be- tween the processes α

˚n

( F ) and G ( F ), with no explicit rate. Obviously the rates could be arbitrarily slow for large F or inadequate couples (P, F ). A nice feature of our approach is that it provides general and explicit rates at the empirical process level for typical P -Donsker or universal Donsker classes F , with quanti- fied statistical consequences. A few authors considered the distance between the probability measures themselves, like P

˚n

, P

n

or P . For instance Shao [30] proved that the bounded Lipschitz distance between the uniform empirical measure P

n

on the d-dimensional unit cube and the Efron’s bootstrap empirical measure P

˚n

is bounded by O(n

´1/d

) if d ą 2 and O(n

´1/2

(log n)

(d´1)/2

) if d = 1, 2. This improved Beran, Le Cam and Millar’s result [7] which only implies the con- vergence to zero, however in the more general indexed by sets setting. Other metrics have been studied in the indexed by F setting. For instance, Barbe and Bertail [6] showed that various supremum type distances between the weighted bootstrap measure P

˚n

and P on F are O(n

´1/2

(log n)

1/2

) in probability where the extra log n term can be removed by following [27]. Likewise, we derive al- most sure first order rates n

´1/2

together with second order rates, so that the weak distance between the distribution of S

n˚

and S

n

may be evaluated.

1.2 The weighted bootstrap

Our primary goal is to sharpen the above results for the self-normalized weighted bootstrap empirical measure in terms of distance in distribution. As a conse- quence, general answers to the two problems of the initial motivation follow.

Extensions to Efron’s bootstrap will be studied elsewhere.

A strong approximation approach. By strong approximation we mean a

coupling of the process α

˚n

( F ) and its Gaussian limiting process. This avoids

cumbersome weak convergence measurability considerations while providing a

sufficient control over the weak convergence metrics and being easy to use. Such

a Brownian coupling has been established in the very specific setting where

(6)

KMT [22] can be applied, that is when P is the uniform law on (0, 1). Alvarez- Andrade and Bouzebda [3] derived the usual almost sure rate O(n

´1/2

log n) of strong approximation by a sequence of Brownian bridges for the weighted bootstrap process α

˚n

(u) = ?

n( F

˚n

(u) ´ F

n

(u)), where F

n

(u) = 1

n

n

ÿ

i=1

1

tXiďuu

, F

˚n

(u) = 1 n

n

ÿ

i=1

W

i,n

1

tXiďuu

, u P (0, 1), and, for independent and identically distributed (i.i.d.) weights Z

1

, ..., Z

n

also independent from X

1

, ..., X

n

,

W

i,n

= Z

i

T

n

, T

n

=

n

ÿ

i=1

Z

i

, (1.1)

provided E (Z) = V (Z ) = 1 and Z

1

has a Laplace transform in a neighborhood of 0. Following the forthcoming arguments this induces a distance O(n

´1/2

log n) between P

n

and P

˚n

in various weak convergence metrics.

Main result in the classical setting. In this paper we revisit the above mentioned results for the self-normalized weighted bootstrap empirical process

α

n˚

(f ) = ?

n( P

˚n

(f ) ´ P

n

(f )), P

˚n

(f ) = 1 n

n

ÿ

i=1

W

i,n

f (X

i

), f P F , (1.2) with the weights W

i,n

from (1.1). Clearly for F = t1

t¨ďtu

: t P Ru and the uniform distribution on (0, 1) we recover the bootstrapped empirical process α

˚n

(u) of [3] defined above, howewer we allow any distribution P on R for this class F , not only absolutely continuous ones. Furthermore we relax the usual assumption that the resampling weights Z

i

/T

n

are independent from the original sample by allowing (X

i

, Z

i

) to be i.i.d. with some distribution P

(X,Z)

while still controlling the marginal laws P of X and P

Z

of Z. Our main result is a nonasymptotic joint strong approximation of b

n

bootstrap iterations of α

˚n

( F ) by independent versions of the same Gaussian process G ( F ) as the weak limit of α

n

( F ), jointly to the approximation of α

n

( F ) itself. This allows to turn these conditionally independent, and orthogonal, empirical processes into independent Gaussian processes. It ensues an uniform central limit theorem for the bootstrap procedure (1.2) with rates in various weak convergence metrics, including uniform Berry-Esseen type results and distances between distributions of S

n˚

and S

n

. The rigorous control of the bootstrap Monte-Carlo procedure itself is our most innovative contribution.

Bootstrapping under auxiliary information. Our secondary goal is to

extend the bootstrap procedure to a less classical setting where an auxiliary in-

formation is known or learned about P . The motivation comes from the hasty

development of distributed data. In this context it is realistic to consider a

global statistical model where several sources learn deeply about one partial

aspect of P and only communicate their conclusions rather than their too large

(7)

or confidential samples. In the next section we define the raking-ratio empirical process that combines these informations through N iterations of the procedure in order to improve the inference from X

1

, ..., X

n

as studied in details in [1]. The underlying intuition connecting the bootstrap and the raking-ratio is twofold.

On the one hand a better knowledge of P may help the bootstrap by either improving the initial P

n

and/or the redrawn P

˚n

. On the other hand bootstrap- ping the raked empirical measure denoted below P

(N)n

may give access to the distribution of a raked statistic observed only once, in particular to evaluate its lower variance, small bias and reduced risk.

Organization. The paper is organized as follows. The four bootstrap proce- dures to be studied are presented in section 1.3 together with our paradigm of auxiliary information from partitions. We present the main results in parts 2 and 3. More precisely, section 2.2 provides the strong approximation of the weighted bootstrap empirical process iterated b

n

times, then statistical conse- quences are derived in sections 2.3 and 2.4. Results of section 3.1 show that we can apply the Raking-Ratio method after bootstrapping a sample in order to simulate the asymptotic law of the empirical Raking-Ratio process. The results stated in sections 3.2 and ?? show how the performances of the basic bootstrap are improved by using a true information on P . Finally, the proof of all results are postponed until section 4, focusing mainly on the case without information then avoiding straightforward details.

1.3 Weighted bootstraps

The weighted bootstrap empirical process. Let (X

1

, Z

1

), . . . , (X

n

, Z

n

) be independent random variables with unknown law P

(X,Z)

on ( X , A ) ˆ( R , B ( R )).

The conditional distributions P

(Z|X=x)

are assumed to exist and satisfy E (Z|X = x) = Var(Z|X = x) = 1, x P X . (1.3) For sake of simplicity we shall assume

P (0 ď Z ď M

Z

) = 1, M

Z

ă +8. (1.4) This is not restrictive when n is fixed since for M

Z

= F

Z´1

(1´1/n

θ+1

b

n

) the ran- dom variable 1

t|Z|ďMZu

Z + 1

t|Z|ąMZu

M

Z

behaves like Z over all bootstrapped samples with probability of order 1 ´ 1/n

θ

. However, as n Ñ +8, truncation arguments would be cumbersome.

The first weighted bootstrap process α

˚n

to be considered is defined at (1.1)

and (1.2). We study the joint convergence of (α

n

( F ), α

˚n

( F )) with rates. Eval-

uating the weak distance between α

˚n

and its limit is crucial since each new

bootstrap sample X

1˚

, ..., X

n˚

is affected by it. Likewise, in order to control the

global distortion in play by not using P

n

when bootstrapping a collection of

estimators S

n

with law φ( P

n

) we shall approximate jointly the b

n

bootstrapped

empirical processes. The coupling error being quantified in the very strong sup-

norm over F , b

n

has to be sufficiently small to guaranty that the confident bands

for infinitely many estimated parameters are uniformly not over-biased.

(8)

Beyond the classical one, let us introduce three other weighted bootstraps - only the first two are studied, the third one satisfies similar results however with heavier notation.

The raking-ratio empirical process. What we call the raking-ratio algo- rithm was introduced by Deming and Stephan [14] and rectified by Stephan [34]

then justified by Lewis [25], Brown [12], Sinkhorn [32, 33] and finally Ireland and Kullback [21]. This procedure consists in changing iteratively the weights of each X

i

to match known probabilities of discrete marginals. Special cases or closely related methods are stratification, calibration, fitting after sampling, iterative proportions or matrix scaling. A rather general study at the empirical measure level was conducted in [1] from the viewpoint of auxiliary information of partitions. Let us briefly introduce the suitable notation and a few results in our setting where entries of the algorithm are random.

For all N P N

˚

assume that m

N

ě 1 and A

(N1 )

, . . . , A

(NmN)

Ă A is a partition of X such that the discrete marginal P ( A

(N)

) = (P (A

(1)1

), . . . , P (A

(NmN)

)) is known and p

N

= min

1ďjďmN

P(A

(Nj )

) ą 0. Let P

(N)n

and α

(Nn )

denote respectively the empirical measure and process associated with the raking-ratio method, defined to be P

(0)n

= P

n

, α

(0)n

= α

n

and, for N ě 1 and on the event B

n1,N

= tmin

1ďjďmN

P

(Nn ´1)

(A

(Nj )

) ą 0u,

P

(Nn )

(f ) =

mN

ÿ

j=1

P(A

(N)j

)

P

(N´1)n

(A

(N)j

) P

(Nn ´1)

(f 1

A(N) j

), (1.5)

α

(Nn )

(f ) = ?

n( P

(Nn )

(f ) ´ P(f )), f P F . (1.6) In particular, P

(Nn )

(A

(N)j

) = P(A

(N)j

) and α

(N)n

(A

(Nj )

) = 0 for all N P N and j = 1, . . . , m

N

. Notice that P

(Nn )

is the Kullback projection of P

(Nn ´1)

satisfying the N-th step constraint P ( A

(N)

) according to Proposition 1 of [1]. We proved that under classical entropy conditions on F , α

(Nn )

( F ) converges weakly as n Ñ +8 to a centered Gaussian process G

(N)

defined similarly to α

(N)n

– see Proposition 4 of [1]. More precisely, write E [f |A] = P(f 1

A

)/P(A) and set G

(0)

= G to be the P -Brownian bridge indexed by F then define the P -raked Brownian bridge to be, for N P N

˚

,

G

(N)

(f ) = G

(N´1)

(f ) ´

mN

ÿ

j=1

E [f |A

(N)j

] G

(N´1)

(A

(N)j

), f P F . (1.7)

We established that the covariance of the limiting process G

(N)

is

Cov( G

(N)

(f ), G

(N)

(g)) = Cov( G

(0)

(f ), G

(0)

(g)) ´ R

N

(P, f, g) (1.8)

where R

N

(P, f, g) has the following closed form expression given at Proposition

(9)

7 in [1]:

R

N

(P, f, g) =

N

ÿ

k=1

Φ

(Nk )

(P, f ) ¨ Var( G [ A

(k)

]) ¨ Φ

(Nk )

(P, g), (1.9)

Φ

(N)k

(P, f ) =

N

ÿ

l=1

P(P, l) ¨ P[f|A

(l)

],

where P [f |A

(l)

] = (P (f |A

(l)1

), . . . , P (f |A

(l)ml

)), P(P, l) P R

ml

The asymptotic uniform variance reduction is induced by the fact that R

N

(P, f, f) ě 0 for all N ě 1, f P F . Likewise all finite dimensional covariance matrices of G

(N)

( F ) are decreasing compared to the initial one. The strong approximation of α

(Nn )

by G

(N)

established by Theorem 2.1 of [1] further shows that the bias E ( P

(Nn )

(f )) ´ P(f ) is uniformly small and provides rates of uniform quadratic risk reduction over F . If recursive loops are performed among p partitions with known probabilities, for n sufficiently large the finite covariance matrices of α

(kp)n

decrease at each loop k. We also compute – see Theorem 2.2 of [1] – a simple expression for G

(N)

as N Ñ +8 when raking with two partitions alternatively – the basic case of a two-way contingency table.

Raking the bootstrapped empirical process. According to the bootstrap paradigm, in order to estimate the distribution of α

(Nn )

one has to re-sample according to the weighted P

n

then apply the N -th order raking-ratio procedure to P

˚n

. This gives access by Monte-Carlo approaches to the unknown distribution of G

(N)

– useful since P is unknown. Define the raked bootstrapped empirical measure to be P

˚(0)n

= P

˚n

then, recursively and conditionally to X

1

, ..., X

n

on the event B

n2,N

= tmin

1ďjďmN

P

˚(N´1)n

(A

(Nj )

) ą 0u,

P

˚n(N)

(f ) =

mN

ÿ

j=1

P

n

(A

(N)j

)

P

˚(Nn ´1)

(A

(Nj )

) P

˚n(N´1)

(f 1

A(N) j

), (1.10)

α

˚(N)n

(f ) = ?

n( P

˚(N)n

(f ) ´ P

n

(f)), f P F . (1.11) The centering with respect to P

n

in (1.11) should be discussed. In (1.2) the centering P

n

stands for the conditional expectation of P

˚n

and plays the role of the expectation P of P

n

in α

n

. On the opposite, there is a bias inherent to the raking-ratio procedure so that P is no more the expectation of P

(N)n

in (1.6).

This bias was established by Proposition 5 of [1] to be uniformly small, hence P is confirmed as the targeted probability measure and the limiting process is centered. In order to simulate the influence of this bias we center the boot- strap (1.11) on P

n

. Therefore P

˚n(N)

and α

˚n(N)

use the auxiliary information P

n

( A

(N)

) = ( P

n

(A

(N1 )

), . . . , P

n

(A

(Nmn)

)) instead of the original P ( A

(N)

) and sat- isfy

P

˚(Nn )

(A

(N)j

) = P

n

(A

(Nj )

), α

˚(Nn )

(A

(Nj )

) = 0, j = 1, . . . , m

N

.

Bootstrapping the raked empirical process. A way to exploit directly the

information of partitions is to bootstrap by using a probability that is possibly

(10)

closer to P than P

n

is, namely P

(N)n

. Let T

n(N)

/n denote the mean of Z

1

, ..., Z

n

under the discrete measure P

(Nn )

, that is T

n(N)

= ř

n

i=1

n P

(Nn )

(tX

i

u)Z

i

. In par- ticular, T

n(0)

= T

n

. Given N P N , define the bootstrapped N -th order raked empirical measure and process to be, on the event B

n1,N

,

P

(N)˚n

(f ) =

n

ÿ

i=1

n P

(N)n

(tX

i

u)Z

i

T

n(N)

f (X

i

), (1.12)

α

(N)n ˚

(f ) = ?

n( P

(Nn )˚

(f ) ´ P

(Nn )

(f )), f P F . This reproducible imitation of α

n

is a variant of (1.2).

2 Main results

2.1 The class F

In all this paper, one assume that X is measurable with σ-field A . For ap- proximations results X is not required to be metric separable nor A be Borel sets, however this may helps differentiability questions for the statistics to be bootstrapped. Let M be the set of measurable real valued functions on ( X , A ) and F r Ă F Ă M be such that sup

fPF

|f | ď M

F

ă +8, F r is countable and each f P F is the point-wise limit of a sequence belonging to F r . This condition ensures that the empirical process α

n

and the variants defined in the previous section are point-wise separable and hence ball measurable, which allows to restrict their weak convergence to ball measurable test maps and avoid outer probabilities – see example 2.3.4 of [35].

For a probability measure Q on ( X , A ) we endow M with the semi-metric dQ defined by dQ

2

(f, g) = ş

X

(f ´ h)

2

dQ. Let N ( F , ε, d) be the minimum number of balls of radius ε for the semi-distance d needed to cover F . Let N

[ ]

( F , ε, d) be the minimum number of ε-brackets for d necessary to cover F . One assume throughout the paper that F satisfies either (VC) or (BR) below, which encom- pass many useful examples. Typically (VC) concerns small classes well behaved with respect to all P , like VC-classes. On the opposite (BR) concerns classes that are very large or well behaved only with respect to a few P.

Hypothesis (VC). There exists c

0

ą 0, ν

0

ą 0 such that sup

Q

N ( F , ε, dQ) ď c

0

ε

´ν0

where the supremum is taken over all probability measure Q on ( X , A ).

Define α

0

= 1/(2 + 5ν

0

) P (0, 1/2) and β

0

= (4 + 5ν

0

)/(4 + 10ν

0

).

Hypothesis (BR). There exists b

0

ą 0, 0 ă r

0

ă 1 such that N

[ ]

( F , ε, dP) ď exp(b

20

ε

´2r0

). Define γ

0

= (1 ´ r

0

)/2r

0

.

Under (VC) or (BR) the process α

n

( F ) converges weakly to the P -Brownian bridge G ( F ) in

8

( F ) endowed with the sup-norm ||H ||

F

= sup

fPF

|H(f )|.

According to Propositions 1 and 2 of Berthet and Mason [8] the rate of weak

convergence is of order at most n

´α0

(log n)

β0

under (VC) and (log n)

´γ0

under

(11)

(BR). A slightly improvement and interpolation of these rates and conditions are possible, however we focus on our main topic. Thus we keep the original α

0

, β

0

, γ

0

and uniform boundedness – instead of square integrable envelope function – and it would suffice to substitute improved rates of approximation with no change in our statements. Recall that G ( F ) is a centered Gaussian linear process with covariance

Cov( G (f ), G (g)) = P (f g) ´ P (f )P(g), f, g P F . Write σ

2f

= Var(f (X)) = P(f

2

) ´ P(f )

2

and σ

F2

= sup

fPF

σ

2f

ă +8.

2.2 Strong approximations

For n P N , we write LL(n) = L(L(n)) with L(x) = max(1, log(x)).

Proposition 1. Assume (1.3), (1.4) and either (VC) or (BR). There exists a finite K = K( F , P

(X,Z)

) ą 0 such that

lim sup

nÑ+8

c n

LL(n) ||P

˚n

´ P ||

F

ď K a.s.

Next define u

n

= n, v

n

= n

´α0

(log n)

β0

if F satisfies (VC) and u

n

= log n, v

n

= (log n)

´γ0

if ( F , P ) satisfy (BR). Berthet and Mason (see Propositions 1 and 2 of [8]) proved that we can construct a probability space on which the sequence of empirical process α

n

( F ) can be defined together with a coupling sequence of P-Brownian bridges G

n

( F ) such that, almost surely, ||α

n

´ G

n

||

F

ď Cv

n

for some C ą 0 and all n large enough. By applying this strong approximation to P

(X,Z)

we obtain the following version for the weighted bootstrap empirical process α

˚n

, which we formulate in the same way as in [8].

Theorem 2.1. Assume (1.3), (1.4) and either (VC) or (BR). For all θ ą 0 there exists C

θ

ą 0, n

θ

ą 0 and a probability space supporting a sequence t(X

n

, Z

n

)u of i.i.d. random variables with distribution P

(X,Z)

and a sequence t( G

n

( F ), G

˚n

( F ))u of pairs of independent P-Brownian bridges such that for all n ą n

θ

,

P (t α

n

´ G

n

F

ą C

θ

v

n

u Y t α

˚n

´ G

˚n

F

ą C

θ

v

n

u) ď 1

n

θ

. (2.1) On the same probability space as above there also exists independent sequences tG

1n

( F )u and tG

˚n1

( F )u of independent versions of G ( F ) such that

P (

? 1 n max

1ďkďn

?

k

´

k

ÿ

i=1

G

1i

F

ą C

θ

v

n

) ď 1

u

θn

, (2.2) P

(

? 1

n max

kn,θďkďn

?

˚k

´

k

ÿ

i=1

G

˚i1

F

ą C

θ

v

n

)

ď 1

u

θn

, (2.3)

for all n ě max(n

θ

, k

n,θ

), where k

n,θ

= r2M

Z2

(ln(8) + (1 + θ) ln n)s.

(12)

The independence of G

n

and G

˚n

comes from Lemma 2. The bridges G

˚n

(resp.

G

n

) are not pair-wise independent, whereas by construction the G

1n˚

(resp. G

1n

) are mutually independent. Notice that Theorem 2.1 applied with θ ą 1 not only implies that α

n˚

almost surely weakly converge to G but also provides rates of weak convergence. In particular this establishes the asymptotic independence of the processes α

n

and α

˚n

. Moreover, by substituting G

n

and G

˚n

their residual orthogonality is quantified through

lim sup

nÑ+8

1 v

n

sup

f,gPF

|Cov(α

n

(f ), α

˚n

(g))| ă +8 a.s.

Consider now the Monte-Carlo experiment of b

n

iterated bootstraps. Let P

˚n,(j)

denote the j

th

independent bootstrapped empirical measure built from weights (Z

1,(j)

, . . . , Z

n,(j)

) drawn from P

(Z|X=Xi)

ˆ ¨ ¨ ¨ ˆ P

(Z|X=Xn)

conditionally to X

1

, . . . , X

n

. Write α

˚n,(j)

(f ) = ?

n( P

˚n,(j)

(f ) ´ P

n

(f )) the associated empirical process. For b

n

= 1 the following result reduces to Theorem 2.1, otherwise the rates of approximation are slowed down. If F satisfies (VC) then define

w

n

= ( b

5n

n (log n)

2

)

α0

(

log ( n

b

5n

))

5v/(4+10v)

ď ( b

5n

n )

α0

(log n)

β0

. If ( F , P ) satisfy (BR) then define

w

n

=

( 1 log(n/b

5n

)

)

γ0

ě ( 1

log n )

γ0

.

Theorem 2.2. Assume (1.3), (1.4) and either (VC) or (BR). Let b

n

P N

˚

be such that b

n

/n

1/5

Ñ 0. For all θ ą 0 there exists C

θ

ą 0, n

θ

ą 0 and a probability space supporting a triangular array t(X

n

, Z

n,(1)

, ..., Z

n,(bn)

)u of i.i.d. random vectors distributed as P

(X1,Z1,(1),...,Z1,(bn))

and a triangular array t( G

˚n,(0)

( F ), G

˚n,(1)

( F ), ..., G

˚n,(bn)

( F ))u of (b

n

+ 1)-uplets of mutually indepen- dent P -Brownian bridges such that, for n ě n

θ

,

P (

t∥ α

n

´ G

˚n,(0)

F

ą C

θ

w

n

u Y

bn

ď

j=1

t∥ α

˚n,(j)

´ G

˚n,(j)

F

ą C

θ

w

n

u )

ď 1 n

θ

.

(2.4) The coupling we perform implies that G

˚m,(j)

and G

˚n,(k)

are dependent if mn and independent if m = n and jk. The following uniform central limit theorem immediately follows, if b

n

= b is fixed. For b bootstraps define the R

b+1

-valued empirical process indexed by F

b+1

to be

Λ

n,b

( F

b+1

) = t(α

n

(f

0

), α

˚n,(1)

(f

1

), . . . , α

n,(b)˚

(f

b

)) : f = (f

0

, f

1

, . . . , f

b

) P F

b+1

u.

Consider any norm ||.|| on R

b+1

then endow

8

( F

b+1

Ñ R

b+1

) with the distance

associated to the sup-norm ||Λ||

b+1

= sup

fPFb+1

||Λ(f )||.

(13)

Proposition 2. Under the assumptions of Theorem 2.2, for any fixed b P N

˚

the sequence Λ

n,b

( F

b+1

) converges weakly in

8

( F

b+1

Ñ R

b+1

) to

G

b

( F

b+1

) = t( G

˚(0)

(f

0

), G

˚(1)

(f

1

), . . . , G

˚(b)

(f

b

)) : f = (f

0

, f

1

, . . . , f

b

) P F

b+1

u where G

˚(0)

, G

˚(1)

, . . . , G

˚(b)

are mutually independent P -Brownian bridges.

Theorem 2.2 also implies that the distance in distribution between Λ

n,bn

( F

bn+1

) and G

n,bn

( F

bn+1

) = t( G

˚(0)

(f

0

), G

˚(1)

(f

1

), . . . , G

˚(b)

(f

b

)) : f P F

b+1

u is at most O(w

n

), which is severely impacted by b

n

and requires that b

n

/n

1/5

Ñ 0. Let d

P L,n

(µ, ν) denote the Prokhorov-Lévy distance between two probability mea- sures (µ, ν) on

8

( F

bn+1

Ñ R

bn+1

).

Proposition 3. Under the assumptions of Theorem 2.2 we have d

P L,n

n,bn

( F

bn+1

), G

n,bn

( F

bn+1

)) = O(w

n

).

Comments. Proposition 2 shows that for one bootstrap experiment the biased empirical process ?

n( P

˚n,(1)

(f ) ´ P (f )) is asymptotically distributed as G

˚(0)

+ G

˚(1)

and hence ?

2 G so that its asymptotic variance is 2Var(f ). Moreover, thanks to Theorem 2.2 the study of weak convergence of functionals estimated by b

n

bootstrap experiments which are conditionally independent versions of α

˚n

is made easier by substituting the G

˚n,(j)

to the α

˚n,(j)

. Furthermore, when n is large the distributions of the statistics of interest are typically very concentrated and nearly Gaussian, thus bootstrapping only b

n

times with b

n

/n

1/5

Ñ 0 is not so penalizing once the uniform performance is guarantied by Theorem 2.2.

The assumption (1.4) could be relaxed in Theorems 2.1 and 2.2 by taking into account the tail behavior of Z through the function ψ

Z

(z) = ´ log P (Z ą z), z ą 0, however at the cost of additional technicalities.

The above coupling results can be applied to control in general estimators boot- strapped b

n

times and expressed as smooth transforms of P

˚n

( F ). In 2.3 we study in particular the variance and the distribution function uniformly over a class of such estimators. In 2.4 we derive uniform over F Berry-Esseen type bounds.

2.3 Estimation of variance and distribution function

Motivation. In a bootstrap Monte-Carlo procedure the P

˚n

(f ) have to be

reevaluated b

n

times by redrawing the n weights Z

i

according to the conditional

distribution P

(Z|X=Xi)

. Classically b

n

= 1 and the bootstrapped moments of a

smooth transform of several mean estimators P

˚n

(f) is evaluated by Edgeworth

expansions. When b

n

= b ą 1 estimators of the moments of P

˚n

(f ) have also been

studied, but typically not jointly. For instance, Booth and Sarkar [11] showed

that the distribution of the bootstrap estimator of the variance of a statistic is

approximately a chi-squared distribution with b ´ 1 degrees of freedom. They

deduce b ą 1 necessary to obtain a relative error less than a fixed bound with

some probability, assuming n large – for an error less than 10% with probability

(14)

0.95 about b = 800 are required. Chandra and Ghosh [13] studied the distribu- tions of smooth functions of the empirical mean by using Edgeworth expansions and proved that these statistics converge to a centered Chi-squared distribution.

In the same spirit, Babu [5] showed that the bootstrapped version of smooth transforms of a single empirical mean has a similar weak asymptotic behavior by assuming that the functional is three times continuously differentiable.

Let us extend these results to the case of functionals of the empirical measure itself instead of one or several empirical means, under a generic assumption of first order differentiability on the space of measures.

Hypothesis. We study the bootstrap estimation of moments and distribution of a statistic S

n

= φ( P

n

) with φ differentiable in the following weak sense. Let φ :

8

( F ) Ñ R be a real-valued function such that, for all Q P

8

( F ),

φ(P + Q) = φ(P ) + φ

1

(P) ¨ Q + R(Q), (2.5) where φ

1

(P) P L (ℓ

8

( F ), R ) is a linear application satisfying |φ

1

(P) ¨Q| ď ||Q||

F

, R :

8

( F ) Ñ R is an application such that in a ball B centered on the zero function |R(Q)| ď ||Q||

qF

for some q ą 1. The local expansion (2.5) includes Frechet differentiability and examples pages 11-16 in Barbe and Bertail [6].

The key factor showing up in this setting is σ

2

= Var(φ

1

(P) ¨ G ), with G the P-Brownian bridge. One could refer to σ

2

as a conditional coefficient of variation. The following table makes (2.5) explicit for some classical estimators, in particular the associated finite classes and the differential φ

1

(P ) ¨ G .

S

n

= φ( P

n

) F φ

1

(P ) ¨ G σ

2

Mean

P

n

(f

0

) tf

0

u G (f

0

) Var(f

0

(X ))

Variance

P

n

(f

02

) ´ P

n

(f

0

)

2

tf

02

, f

0

u G (f

02

) Var(f

02

(X )) Inverse mean

Pn1(f0)

tf

0

u

P2´1(f0)

G (f

0

)

Var(fP4(f0(X))0)

Conditional mean

Pn(f01A) Pn(A)

tf

0

1

A

, 1

A

u

G(f01A)´P(fP(A)0|A)G(A) Var((f0(X)´EP(A)[f0(X)|A])12 XPA)

Variance and distribution estimations by bootstrap. Write S

n,(j)˚

= φ( P

˚n,(j)

), for 1 ď j ď b

n

. Whenever S

n

is asymptotically normal, or regular in terms of Edgeworth expansion, Var(S

n

) is crucial to provide confident bands.

The variance and distribution function of S

n

could be estimated by the following bootstrap estimators

Var(S y

n˚

) = 1 b

n

bn

ÿ

j=1

(S

n,(j)˚

´ S

n

)

2

= 1 b

n

bn

ÿ

j=1

(

φ( P

˚n,(j)

) ´ φ( P

n

) )

2

,

F p

S˚

n

(x) = 1 b

n

n

ÿ

j=1

1

tS˚

n,(j)ďxu

.

(15)

Results. The next statement provides a confidence interval of Var(S

n

) from the value Var(S y

˚n

). In particular the statistic (nb

n

2

)y Var(S

n˚

) is asymptotically close to a χ

2

(b

n

).

Corollary 1. Assume (1.3), (1.4) and (VC). Let b

n

P N

˚

be such that b

n

/n

1/5

Ñ 0. There exists C

0

, n

0

ą 0 such that for all n ą n

0

,

P ( n

σ

2

| Var(S y

n˚

) ´ Var(S

n

)| ě δ + C

0

w

n

n

α0/2

)

ď α + 1

n

2

, (2.6) with α, δ such that P (ˇ

ˇ ˇ

χ2(bn) bn

´ 1

ˇ ˇ ˇ ě δ

)

ď α. In particular, almost surely there exists C

01

ą 0, n

10

= n

10

(ω) ą 0 such that for all n ą n

10

,

| Var(S y

n˚

) ´ Var(S

n

)| ď C

01

? b

n

. (2.7)

Remark. If b

n

= b is fixed and the class F is finite we get the same approxima- tion for b as Booth and Sarkar [11] to have a relative error for Var(S y

n˚

) less than δ with a probability greater than 1 ´ α, namely b » 2|Φ

´1

(α/2)|

2

2

. Moreover Corollary 1 provides a second order control of both the width and probability of the confident interval for the variance, uniformly in n and for infinite classes.

The following result is a DKW-type inequality for bootstrap statistics. It evalu- ates the uniform deviation between the unknown distribution function of S

n

and the estimated empirical distribution function F p

S˚

n

by also taking into account the shift due to the unavoidable bias B

n

= S

n

´ φ(P ).

Corollary 2. Assume (1.3), (1.4) and either (BR) or (VC). Let b

n

P N

˚

be such that b

n

/n

1/5

Ñ 0. Almost surely there exists C

0

ą 0, n

0

= n

0

(ω) ą 0 such that for all n ą n

0

,

sup

xPACn

| F p

S˚

n

(x + B

n

) ´ F

Sn

(x)| ď C

0

c log n

b

n

, (2.8)

with A

n

= [φ(P ) ´ a

n

; φ(P ) + a

n

], a

n

= σ a

log(log n/b

n

w

n2

).

Remark. Let Λ denote the class of strictly increasing continuous mappings of R onto itself. The J

1

-topology is defined by the Skorokhod metric (see Billings- ley [10])

d(F, G) = inf

λ

max (

sup

xPR

|λ(x) ´ x|, sup

xPR

|G ˝ λ(x) ´ F (x)|

) .

Let d

A

(F, G) denote the above distance when the second supremum is only evaluated on x P A. Since ?

nB

n

Ñ

weak

ϕ

1

(P ) ¨ G then B

n

= O

p.s.

( a

log(n)/n) and it follows from (2.8) that

P (

d

AC

n

(F

Sn

, F p

S˚

n

) ą C c log n

b

n

) ď 1

n

2

.

(16)

2.4 Rates of weak convergence

Local Berry-Esseen bounds. Let Φ be the distribution function of the stan- dard normal law. Singh[31] – see also section 3.1.3. of [29] - established several Berry-Esseen type inequalities for the distribution of the sum of bootstrapped variables of a given sample with respect to the distribution function of the sum of the variables of this sample and then with respect to Φ. In particular, under certain conditions he established that the uniform deviation between these dis- tributions is almost-surely at most O(n

´1/2

). With Edgeworth expansion tech- niques, Hall [18, 19] studied the leading term of the expansion of the uniform deviation conditionally to the sample and found the same rate as the statistic n

´3/2

| ř

n

j=1

X

j3

| + n

´2

ř

n

j=1

X

j4

converges to 0 – that is also O(n

´1/2

) if the fourth moment exists. In this section, we derive a Berry-Esseen inequality for the bootstrapped empirical process indexed by functions, i.e. uniform results among large classes of statistics, however with slower rates than the O(n

´1/2

) for a single smooth statistic.

Uniform Berry-Esseen bounds. Let L be a set of Lipschitz functions defined on

8

( F ) such that all ϕ P L has a Lipschitz constant bounded by C

0

ă +8 and the density ϕ( G (f )) is bounded by C

1

ă +8 for all f P F .

Corollary 3. Assume (1.3), (1.4) and either (VC) or (BR). There exists C ą 0, n

0

ą 0 such that for all n ą n

0

,

sup

ϕPL

sup

fPF

sup

xPR

|P (ϕ (α

n˚

(f )) ď x) ´ P (ϕ( G (f )) ď x)| ď CC

0

C

1

v

n

, (2.9) sup

ϕPL

sup

fPF

sup

xPR

|P (ϕ (α

˚n

(f )) ď x) ´ P (ϕ(α

n

(f )) ď x)| ď CC

0

C

1

v

n

. (2.10) In particular, if r σ

F2

= inf

fPF

Var(f (X)) ą 0 then

sup

xPR

sup

fPF

ˇ ˇ ˇ ˇ ˇ P

(

α

˚n

(f )

a Var(f (X )) ď x )

´ Φ(x) ˇ ˇ ˇ ˇ ˇ

ď C

? 2π r σ

F

v

n

, sup

xPR

sup

fPF

ˇ ˇ ˇ ˇ ˇ P

(

α

˚n

(f )

a Var(f (X)) ď x )

´ P (

α

n

(f)

a Var(f (X)) ď x

ˇ ˇ ˇ ˇ

ď C

? 2π r σ

F

v

n

.

3 Raking-Ratio results

3.1 Strong approximation of α

˚(Nn )

Fix N

0

P N

˚

and denote P

N0

= ś

N0

N=1

p

N

, M

N0

= ś

N0

N=1

m

N

. The boot- strapped empirical measure associated with the Raking-Ratio method assumes the following law of iterated logarithm:

Proposition 4. If F satisfies (VC) or (BR) then there exists K = K( F , Z) ą 0 such that,

lim sup

nÑ+8

max

1ďNďN0

c n

LL(n) ||P

˚(Nn )

´ P||

F

ď Kb

N0

a.s.,

Références

Documents relatifs

The impact of HWFC was therefore assessed at the facility level, at the local health system level in which the facility operated, and at the client level in terms of how the quality

From an early view that unavailable carbohydrate (as Southgate dietary fibre) contributes no energy to the human diet (Southgate & Durnin, 1970) has evolved a current view

We propose a boot- strap approach based on the Moving Block Bootstrap method to construct pointwise and simultaneous confidence intervals for the Fourier coefficients of

To get the lower bound, the strategy is to couple our process with a family of other stable-like processes whose Hausdorff dimension is known and then compare the Hausdorff dimension

The power of tests based on symmetric and equal-tail bootstrap P values against certain alternatives may be quite different, as is shown in Section 8... these tests may have

Moreover, the consistency of the Moving Block Bootstrap (MBB) for the Fourier coefficients of the shifted autocovariance function for PC or APC time series was studied in [31] (see

Sufficient conditions for the continuity of stationary gaussian processes and applications to random series of functions.. Annales de l’institut Fourier, tome 24, n o 2

We study the empirical measure associated to a sample of size n and modified by N iterations of the raking-ratio method.. This empirical mea- sure is adjusted to match the