• Aucun résultat trouvé

Processus empiriques en sondages. Application `a l’estimation du taux de pauvret´e.

N/A
N/A
Protected

Academic year: 2022

Partager "Processus empiriques en sondages. Application `a l’estimation du taux de pauvret´e."

Copied!
39
0
0

Texte intégral

(1)

Processus empiriques en sondages.

Application ` a l’estimation du taux de pauvret´ e.

H´el`eneBoistard(1), Rik Lopuha¨a(2) and Anne Ruiz-Gazen(1)

(1) TSE, Universit´e Toulouse 1 Capitole, contact: [email protected] (2) Delft University of Technology, The Netherlands.

9`eme colloque francophone sur les sondages Gatineau, Canada

(2)

Motivation

For a parameter θ and an estimator ˆθ,

θˆ−1.96 q

dVar(ˆθ) ; ˆθ+ 1.96 q

Vard(ˆθ)

In survey sampling, confidence intervalsfor finite population or infinite population parameters are making use of the asymptotic normality of the point estimators.

Even when estimating a total with the Horvitz-Thompson estimator, proving the asymptotic normality is not an easy task.

Even more difficult forcomplex parameters :

For some regular functionsg, possible to use the Delta methodto derive the asymptotic normality of g(ˆθ) given that ˆθis asymptotically normal.

For parameters like the poverty rate, more difficult.

(3)

Introduction

Poverty rate

For a distribution function F,

φ(F) =F βF−1(α)

(1) for fixed 0< α, β <1, where F−1(α) = inf{t:F(t)≥α}.

Typical choices areα= 0.5 and β = 0.5 (INSEE) orβ = 0.6 (EUROSTAT).

Functional definition.

In classical statistics, use empirical process theory to derive the asymptotic normality of functionals that are regular in the sense of Hadamard

differentiable using thefunctional Delta method (van der Vaart, 1998).

(4)

Motivation

Aims :

Use the empirical processes theory in order to prove functional limit theorems for relevant empirical processesin survey sampling.

Derive some asymptotic properties of estimators in the survey sampling framework using the functional delta method for Hadamard differentiable functionals.

(van der Vaart, 1998, van der Vaart and Wellner, 1996)

Example : derive asymptotic normalityof estimators of complex parameters such as the poverty rate.

(5)

Introduction

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(6)

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(7)

Context

Context

We follow Rubin-Bleuer and Schiopu Kratina (2005) and consider a product probability spacethat includes the super-population and the design space, assuming that sample selection and model characteristic are independent given the design variables (non-informativesampling).

Consider a sequence of nested finite populations associated to a set of indicesUN ={1,2, . . . ,N} of sizes N= 1,2, . . ..

For each indexi ∈UN, we have the variable of interestyi ∈R.

(8)

Simplified context (no design variable)

Super-population : we assume that the values in each finite population are realizations of the independent random variables Yi ∈Rfor

i = 1,2, . . . ,N, on a common probability space (Ω,F,Pm).

Design (without replacement) :

for all N = 1,2, . . .,SN ={s :s ⊂UN} : collection of subsets ofUN and AN =σ(SN) : σ-algebra generated bySN.

We define a probability measure Pd on the design space (SN,AN).

Product :

let (SN×Ω,AN ×F) be the product space with probability measure : Pd,m({s} ×E) =Pd({s})Pm(E).

(9)

Context

Context

Samples wheren denotes the expectation of the size of the sample under the design.

ξi =1{i∈s}, πi =Pdi = 1)>0

πi1i2...ik =Pdi1 = 1, ξi2 = 1, . . . , ξik = 1).

Note thatn and the inclusion probabilities are considered asfixed in the present talk but could be random (dependent on the design variables).

(10)

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(11)

Empirical processes under study

Horvitz-Thompson processes

Let F be the c.d.f. ofYi and FN(t) = 1

N

N

X

i=1

1{Yi≤t},t∈R the empirical (finite population) c.d.f. This function is cadlag.

(12)

Horvitz-Thompson processes

The Horvitz-Thompson (HT) empirical processes obtained from the HT empirical c.d.f. :

FHTN (t) = 1 N

N

X

i=1

ξi1{Yi≤t}

πi

, t ∈R. (2)

where ξi is the indicator1{s3i}.

√n(FHTN −FN), centered around theempirical c.d.f. FN of theYi’s,

√n(FHTN −F), centered around c.d.f. F of theYi’s.

(13)

Empirical processes under study

H´ ajek processes

The H´ajek empirical processes

obtained from the H´ajek empirical c.d.f. : FHJN (t) = 1

Nb

N

X

i=1

ξi1{Yi≤t}

πi

, t∈R, (3)

where Nb=PN

i=1ξii is the HT estimator for the population totalN.

√n(FHJN −FN)

√n(FHJN −F).

In this presentation, we will only consider the Horvitz-Thompson processes.

(14)

Poverty rate

For a distribution function F,

φ(F) =F βF−1(α)

(4) for fixed 0< α, β <1, where F−1(α) = inf{t:F(t)≥α}.

The finite population parameter is φ(FN).

φ(FHTN ) is the HT estimator, φ(FHJN ) is the H´ajek estimator.

(15)

Empirical processes under study

Roughly speaking, if we are able to derive the limiting distribution of the

empirical process √

n(FHTN −FN) then for any functional φHadamard differentiable, we can get the normality of

√n(φ(FHTN )−φ(FN)).

And the same for

√n(FHTN −F) and√

n(φ(FHTN )−φ(F)),

√n(FHJN −FN) and√

n(φ(FHJN )−φ(FN))

√n(FHJN −F) and√

n(φ(FHJN )−φ(F))

(16)

A functional central limit theorem for √

n(FHTN −FN) and√

n(FHTN −F) is obtained using Theorem 13.5 in Billingsley, 1999.

This requires

weak convergence of all finite dimensional distributions and a tightness condition (see (13.14) in Billingsley, 1999).

(17)

Theorems Assumptions

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(18)

Assumptions on the design

In order to prove the tightness condition, we make assumptions on the design.

Let Dν,N =

n

(i1,i2, . . . ,iν)∈ {1,2, . . . ,N}ν :i1,i2, . . . ,iν all different o

, (5) for the integers 1≤ν≤4.

(19)

Theorems Assumptions

Assumptions on the design

(C1) there exist constantsK1,K2, such that for alli = 1,2, . . . ,N, 0<K1 ≤ Nπi

n ≤K2 <∞, ω−a.s.

There exists a constant K3>0, such that for allN = 1,2, . . .: (C2) max(i,j)∈D2,N

Edi −πi)(ξj−πj)

<K3n/N2, (C3) max(i,j,k)∈D3,N

Edi −πi)(ξj −πj)(ξk−πk)

<K3n2/N3, (C4) max(i,j,k,l)∈D4,N

Edi−πi)(ξj −πj)(ξk−πk)(ξl−πl)

<K3n2/N4, ω-almost surely.

(20)

Assumptions on the design

These conditions on higher order correlations are commonly used in the literature on survey sampling in order to derive asymptotic properties of estimators, e.g., Breidt and Opsomer (2000), and Cardot et al.(2010).

Breidt and Opsomer (2000) proved that they hold for simple random sampling without replacement andstratified simple random sampling without replacement, whereas Boistard et al.(2012) proved that they hold also for rejective sampling.

(21)

Theorems Assumptions

Assumptions on the HT estimator

To establish the convergence of finite dimensional distributions, for sequences of bounded i.i.d. random variables V1,V2, . . . on (Ω,F,Pm), we need a Central Limit Theorem for the HT estimator in the design space, conditionally on the Vi’s.

Let SN2 be the (design-based) variance of the HT estimator of the population mean, i.e.,

SN2 = 1 N2

N

X

i=1 N

X

j=1

πij −πiπj

πiπj ViVj. (6)

(22)

Assumptions on the HT estimator

(HT1) LetV1,V2, . . . be a sequence of bounded i.i.d. random variables, not identical to zero, and such there exists an M >0, such that |Vi| ≤M ω-almost surely, for all i = 1,2, . . ..

Suppose that for N sufficiently large,SN >0 and 1

SN 1 N

N

X

i=1

ξiVi

πi − 1 N

N

X

i=1

Vi

!

→N(0,1), ω−a.s., in distribution underPd.

(23)

Theorems Assumptions

Assumptions on the HT estimator

We also need that nSN2 converges in probability under Pm to a constant : (HT2)there exist constants µπ1π2 ∈Rsuch that

(i) lim

N→∞

n N2

N

X

i=1

1 πi

−1

π1, (ii) lim

N→∞

n N2

X X

i6=j

πij −πiπj

πiπjπ2. (HT3)n/N →λ, where λ∈[0,1] ω-a.s.

(HT4)µπ1 >0.

HT3 and HT4 only needed for the process centered by the super-population c.d.f.

(24)

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(25)

Theorems Results

HT process centered by F

N

Let D(R) be the space of c`adl`ag functions onR equipped with the Skorohod topology.

Theorem 1

Suppose that conditions (C1)-(C4) and (HT1)-(HT2) hold.

Then √

n(FHTN −FN) converges weakly inD(R) to a mean zero Gaussian process GHT with covariance kernel EmGHT(s)GHT(t) =µπ1F(s∧t) +µπ2F(s)F(t), for s,t ∈R

(26)

HT process centered by F

Theorem 2

Suppose that conditions (C1)-(C4) and (HT1)-(HT4) hold.

Then √

n(FHTN −F) converges weakly in D(R) to a mean zero Gaussian process GHTF with covariance kernel Ed,mGHTF (s)GHTF (t) = (µπ1+λ)F(s∧t) + (µπ2−λ)F(s)F(t), for s,t ∈R.

(27)

Theorems Proof sketch

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(28)

Process centered by F

N

Tightness condition Lemma 1

Let XN =√

n(FHTN −FN) and suppose that (C1)-(C4) hold.

Then there exists a constant K >0 independent of N, such that for any t1,t2 and−∞<t1 ≤t≤t2<∞,

Ed,m

h

(XN(t)−XN(t1))2(XN(t2)−XN(t))2i

≤K

F(t2)−F(t1)2

.

(29)

Theorems Proof sketch

Use of Billingsley (1999) theorem 13.5

First for theYis’ uniformly distributed on [0; 1],

Then using the proof of Theorem 14.3 (Billingsley, 1999) to generalize to any distribution.

(30)

Process centered by F

Remarks

More terms in the tightness condition Finite dimensional convergence

√n 1 N

N

X

i=1

ξiVi πi

−µV

!

=√ n 1

N

N

X

i=1

ξiVi πi

− 1 N

N

X

i=1

Vi

! +

√n

√ N ×√

N 1

N

N

X

i=1

Vi−µV

! .

+ Theorem 5.1(iii) from Rubin-Bleuer and Schiopu Kratina (2005) with (HT4) in order to have the design-based variance converging to a strictly positive constant.

(31)

Application to the poverty rate estimation problem

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(32)

Definition

Let D(R) be the space of c`adl`ag functions onR equipped with the Skorohod topology. Let Dφ⊂D(R) consist ofF ∈D(R) that are non-decreasing. Then for F ∈Dφ, the poverty rate is defined as

φ(F) =F βF−1(α)

(7) for fixed 0< α, β <1, whereF−1(α) = inf{t :F(t)≥α}. Typical choices are α= 0.5 and β = 0.5 (INSEE) orβ = 0.6 (EUROSTAT). Its Hadamard derivative is given by

φ0F(h) =−βf(βF−1(α))

f(F−1(α)) h(F−1(α)) +h(βF−1(α)). (8)

(33)

Application to the poverty rate estimation problem

Result

Corollary

Under conditions defined previously, if F is differentiable at F−1(α)with positive derivative f(F−1(α)), the random variables √

n(φ(FHTN )−φ(FN)) and √

n(φ(FHTN )−φ(F))converge in distribution to a mean zero normal random variable with variance

σ2HT,α,β2f(βF−1(α))2

f(F−1(α))2 γπ1α+γπ2α2

π1φ(F) +γπ2φ(F)2−2βf(βF−1(α))

f(F−1(α)) φ(F) γπ1π2α , (9) where γπ1π1+λand γπ2π2−λ.

(34)

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(35)

Short comparison with other results

Comparison

F´evrier and Ragache (2001) : similar to us.

Breslow and Wellner (2007) and Saegusa and Wellner (2013) : equal probabilities.

Wang (2012) : similar to us when centered by F but assumptions missing for tightness.

Conti (2014) : similar to us.

Bertail, Chautru and Cl´emen¸con (2016) : more general empirical processes, results for Poisson and high entropy designs.

(36)

1 Introduction

2 Context

3 Empirical processes under study

4 Theorems Assumptions Results Proof sketch

Process centered byFN

Process centered byF

5 Application to the poverty rate estimation problem

6 Short comparison with other results

7 Conclusion and perspectives

8 The end

(37)

Conclusion and perspectives

Conclusions :

Results also for H´ajek.

Results for random n and inclusion probabilities.

Results for high entropy designs with conditions on the rate at which dN =P

i∈Upi(1−pi) tends to infinity, compared toN andn.

Perspectives :

Take into account auxiliary information (at the design stage or at the estimation stage)

Consider stratified sampling designs,. . .

(38)

Thank you for your attention !

(39)

The end

Some papers from the literature

Berger, Y. G. and Skinner, C. J. (2003). Variance estimation for a low income proportion. J. Roy. Statist. Soc. Ser. C 52 457-468.

Bertail, P. , Chautru, E. and Cl´emen¸con, S. (2014). Empirical processes in survey sampling. Accepted in Scand. J.

Statist.

Billingsley, P. (1999). Convergence of probability measures, second ed. Wiley Series in Probability and Statistics : Probability and Statistics. John Wiley & Sons, Inc., New York A Wiley-Interscience Publication.

Boistard, H., Lopuha¨a, H. P. and Ruiz-Gazen, A. (2016). Functional central limit theorems for one-stage sampling designs. Accepted in Ann. Statist.

Breslow, N. E. and Wellner, J. A. (2007). Weighted likelihood for semiparametric models and two-phase strati ?ed samples, with application to Cox regression. Scand. J. Statist. 34 86-102.

Conti, P. L. (2014). On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B 76 234-259.

Dell, F. and d’Haultfœuille, X. (2008). Measuring the evolution of complex indicators : Theory and application to the poverty rate in France. Ann. Econom. Statist. 90 259-290.

Rubin-Bleuer, S. and Schiopu Kratina, I. (2005). On the two-phase framework for joint model and design-based inference. Ann. Statist. 33 2789-2810.

van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge Series in Sta- tistical and Probabilistic Mathematics 3.

Cambridge University Press, Cambridge.

van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes. Springer Series in Statistics.

Springer-Verlag, New York With applications to statistics.

Wang, J. C. (2012). Sample distribution function based goodness-of-fit test for complex surveys. Comput. Statist. Data Anal. 56 664-679.

Références

Documents relatifs

We prove that a class of weighted semilinear reaction diffusion equations on R&#34; generates gradient-like semiflows on the Banach space of bounded uniformly continuous functions

We study the asymptotics of the survival probability for the critical and decomposable branching processes in random environment and prove Yaglom type limit theorems for

Our main motivation to consider such a generalization comes from the second reason : these types of random measures play a role in the modeling of almost surely discrete priors, such

In the sequel, we will build again the spectral measure for invertible transformation and reversible Markov chain and then apply to Ergodic theorem and Central limit theorem..

An empirical process approach to the uniform consistency of kernel-type function estimators. Uniform in bandwidt consistency of kernel-type

In Section 4.1 we derive upper functions for L p -norm of some Wiener integrals (Theorem 9) and in Section 4.2 we study the local modulus of continuity of gaussian functions defined

We consider the estimation of the variance and spatial scale parameters of the covariance function of a one-dimensional Gaussian process with xed smoothness pa- rameter s.. We study

The proof is based on a new central limit theorem for dependent sequences with values in smooth Banach spaces, which is given in Section 3.1.1.. In Section 2.2, we state the