Plan de sondage informatif, distribution pondérée, et maximum de vraisemblance.

(1)

Plan de sondage informatif, distribution pondérée, et maximum de vraisemblance.

D. Bonnéry, F. Coquet, J. Breidt

* Crest (Ensai)

** Colorado State University

5 novembre 2012

(2)

1 Informative selection and asymptotic framework Informative selection mechanism

Sample distribution

2 Parametric estimation

Maximum pseudo-likelihood estimator Convergence

Simulations

(3)

Motivation and background Inference on a parametric model References

Informative selection mechanism Sample distribution

An observation is the outcome of two random processes:

Thepopulation realization:

variables are generated for each element of a population U of size N.

Theselection mechanism: a sample is drawn fromU. The observation usually consists of the values of the study variables for each element in the sample.

pΩ,A ,P^q

b b

b b b

b b

b

b bbb

b b

b b b

b b

b

b bbb

b

b bbb

(4)

Define:

pY1, . . . , YNq: study variable, pZ1, . . . , ZNq: design variables,

Π: design measure and symmetric function of pZ₁, . . . , Z_Nq, pJ₁, . . . , J_Nq: sample (vector of N^N),

P^Π,Y^,Za.s.pp, y, zq,P^J^|Πp,Z^z,Y^y p.

π_kΠptJ_k¥1uq: inclusion probability, n°

kPUJ_k: sample size, Assume:

lim_N_Ñ8N¹n¡0.

D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 4/23

(5)

Definition

The weight function is:

ρ:yÞÑ lim

NÑ8ErJ_k|Y_k ys {ErJ_ks

Denotep`qthe `th drawn element.

Does P^p^Y^p^`^q^qⁿ^`¹ ρ.P^Y¹_bn

?

(6)

Observations behave like iidρ.f Zk

Y_k

1

Observations do not behave like iid ρ.f Z_k

Y_k

1

Population cdf, empirical sample cdf and

sample cdf

f,ρ.f and sample kernel density estimator

(7)

Results:

Under asymptotic independance of draws in the one-dimensinal case:

the empirical sample cdf converges to αÞÑ pρ.P^Yqpp8, αsq (Bonnéry, Breidt and Coquet, 2011)

a kernel density estimator (kde) of the pdf converges to

ρ.d P^Y dλ Goal:

Parametric estimation.

(8)

Maximum pseudo-likelihood estimator Simulations

Inference on population model

(9)

Consider the population model pY_kq_k_Pt_1,...,N_u pf_θ.λq^bN,

pY_k, Z_kq_k_Pt_1,...,N_u are iid realizations, P^Z^k^|^Y^k is parametrized byξ PΞ The target of the inference isθ.

Definition

ρθ,ξ :yÞÑ lim

NÑ8EθξrJk|Ykys {EθξrJks.

(10)

Following Pfeffermann and Krieger (1992) we define:

θˆpξq arg max

θPΘ

# _n

¸

`1

ln ρθξfθ Y_p_`_q+ .

AssumeA0. Let

ξˆbe a consistent estimator ofξ, L

Y_p_`_q

`Pt1,...,nu, θ, ξ n¹°_n

k1lnpρθξfθq Y_p_`_q, θ, ξ Definition

The maximum pseudo-likelihood estimator associated toξˆis:

θˆarg max

θPΘ

!L Y_p_`_q

`Pt1,...,nu, θ,ξˆ )

,

(11)

Definition Define

mpyq ErJ₁|Y₁ ys

m¹py1, y2q ErJ2|Y1 y1, Y2 y2s vpyq VarrJ1|Y1 ys

cpy1, y2q CovrJ1, J2|Y1y1, Y2y2s δpy₁, y₂q m¹py₁, y₂qm¹py₂, y₁q mpy₁qmpy₂q

Y ρ_θξf_θ.λ J11 Eθ,ξ

Blnpρ_θξf_θq

Bθ pY, θ, ξq ₂

J

B p q B p q

p q

(12)

Assumption

Standard conditions on ρθ,ξf, asymptotic independance of draws:

@gP

# 1,

Blnp^ρ^θξ^f^θq

Bθ p., θ, ξq 2

,

Blnp^ρ^θξ^f^θq

Bθ p., θ, ξq^B^lnp^ρ^θξ^f^θq

Bξ p., θ, ξq +

,

Eθξr|gpY1qgpY2q|c,θ,ξpY1, Y2qs oNÑ8p1q, E_θξr|gpY₁qgpY₂q|δ_,θ,ξpY₁, Y₂qs o_N_Ñ8p1q, E_θξ

g²pv_θ,ξ m²_θ,ξq pY₁q

opNq.

(13)

Theorem Suppose

?n B

BθL¯ Y_p_`_q

`Pt1,...,nu, θ, ξ ξˆξ

ÝÑNL

0,Σ

Σ₁₁ Σ₁₂ Σ₂₂

.

Then ?

n

θˆθ {σ ÝÑ^L N p0,1q,

with

σ² Σ₁₁ J₁₁²

J12

J₁₁² pΣ₂₂J122Σ₁₂q.

(14)

Consider:

YNpθ.1, Id_Nq,

εNp0, IdNq,εandY are independent,

Z ξ.Y ηε,η known, N5000,

N1

N 0.7, ^N_N²0.3,

n1

N11{70, _Nⁿ²

24{30.

Z_k

Yk ZγpN_γ1q

(15)

Z_k

Y_k

Z_k

Y_k

Z_k

Y_k

η10 η1 η0.1

(16)

Let

t₁lim_N_Ñ8^N_N¹ 0.7 ζ φ¹pt₁q

τ_hlim_N_Ñ8 nh

Nh , τ₁ 1{70,τ₂ 4{30 ppyq

P

ε ^ζ?

ξ² η² ξpθyq η

We get:

ρ_θ,ξpyq τ1ppyq τ2p1ppyqq τ₁t₁ τ₂p1t₁q

Figure: Plot ofρ forθ1.5, ξ2,ηP t.1,1,10u

y ρ₈_,θ,ξpyq

1 2

(17)

Z_k

Y_k

Z_k

Y_k

Z_k

Y_k

η10 η1 η0.1

(18)

To estimateξ, we use ξˆ

°_n

`1Z_p_`_qY_p_`_q{π_p_`_q

°_n

`1Y_p²_`_q{π_p_`_q . Compareθˆto

θ˜°_n

`1 Y_p_`_q

π_p`q arg max_θ_P_Θ!°_N

k1

lnpfθpYkqqJk

πk

) ., θ¯n¹°

`Pt1,...,nuY_p`q.

(19)

Table: Calculus of mean and mean square error on 1000 simulations

θ ξ σ Meanr.s MSEr.s b

MSE MSEpˆθq

1

n_γlimγÑ8nγVarr.s 1.5 2 0.1 θˆ 1.502 7.643 10⁴ 1 6.962 10⁴

θ˜ 1.5 4.811 10³ 2.509 4.523 10³ θ¯ 2.329 6.887 10¹ 30.02 3.979 10³

1.5 2 1 θˆ 1.5 1.975 10³ 1 2.975 10³

θ˜ 1.501 5.583 10³ 1.681 6.024 10³ θ¯ 2.241 5.509 10¹ 16.7 3.971 10³

1.5 2 10 θˆ 1.497 5.501 10³ 1 2.943 10³ θ˜ 1.5 1.030 10² 1.368 1.030 10²

¯ 2 3

(20)

Summary:

Definition of sample pdf and limit sample pdf, applicable to with or without replacement and fixed or random size samples, Simple and verifiable conditions on the sequence of sample schemes,

Pertinence: The sample behaves like an independent sample (Uniform cdf convergence),

The limit sample pdf can be used for inference (Convergence of the maximum pseudo likelihood estimator),

Asymptotic normality under stratified sampling and fixed number of strata.

(21)

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3).

Sugden, R. A. and Smith, T. M. F. (1984). Ignorable and informative designs in survey sampling inference. Biometrika, 71(3).

(22)

Bonnéry, D., Breidt, F. J., and Coquet, F. (2011). Uniform convergence of the empirical cumulative distribution function under informative selection from a finite population. To appear in Bernoulli.

Pfeffermann, D. and Krieger, A. M. (1992). Maximum likelihood estimation for complex sample surveys. Survey Methodology, 18(1):225–255.

Gong, G. and Samaniego, F. J. (1981). Pseudomaximum likelihood estimation: theory and applications. The Annals of Statistics, 9(4).

Pfeffermann, D., Krieger, A. M., and Rinott, Y. (1998a). Parametric distributions of complex survey data under informative probability sampling. Statistica Sinica, 8(4).

(23)

Thank you for your attention.