Plan de sondage informatif, distribution pondérée, et maximum de vraisemblance.
D. Bonnéry, F. Coquet, J. Breidt
* Crest (Ensai)
** Colorado State University
5 novembre 2012
1 Informative selection and asymptotic framework Informative selection mechanism
Sample distribution
2 Parametric estimation
Maximum pseudo-likelihood estimator Convergence
Simulations
Motivation and background Inference on a parametric model References
Informative selection mechanism Sample distribution
An observation is the outcome of two random processes:
Thepopulation realization:
variables are generated for each element of a population U of size N.
Theselection mechanism: a sample is drawn fromU. The observation usually consists of the values of the study variables for each element in the sample.
pΩ,A ,Pq
b b
b b
b b b
b b
b b
b
b bbb
b b
b b
b b b
b b
b b
b
b bbb
b
b
b bbb
Motivation and background Inference on a parametric model References
Informative selection mechanism Sample distribution
Define:
pY1, . . . , YNq: study variable, pZ1, . . . , ZNq: design variables,
Π: design measure and symmetric function of pZ1, . . . , ZNq, pJ1, . . . , JNq: sample (vector of NN),
PΠ,Y,Za.s.pp, y, zq,PJ|Πp,Zz,Yy p.
πkΠptJk¥1uq: inclusion probability, n°
kPUJk: sample size, Assume:
limNÑ8N1n¡0.
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 4/23
Motivation and background Inference on a parametric model References
Informative selection mechanism Sample distribution
Definition
The weight function is:
ρ:yÞÑ lim
NÑ8ErJk|Yk ys {ErJks
Denotep`qthe `th drawn element.
Does PpYp`qqn`1 ρ.PY1bn
?
Observations behave like iidρ.f Zk
Yk
1
Observations do not behave like iid ρ.f Zk
Yk
1
Population cdf, empirical sample cdf and
sample cdf
f,ρ.f and sample kernel density estimator
Motivation and background Inference on a parametric model References
Informative selection mechanism Sample distribution
Results:
Under asymptotic independance of draws in the one-dimensinal case:
the empirical sample cdf converges to αÞÑ pρ.PYqpp8, αsq (Bonnéry, Breidt and Coquet, 2011)
a kernel density estimator (kde) of the pdf converges to
ρ.d PY dλ Goal:
Parametric estimation.
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Inference on population model
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 8/23
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Consider the population model pYkqkPt1,...,Nu pfθ.λqbN,
pYk, ZkqkPt1,...,Nu are iid realizations, PZk|Yk is parametrized byξ PΞ The target of the inference isθ.
Definition
ρθ,ξ :yÞÑ lim
NÑ8EθξrJk|Ykys {EθξrJks.
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Following Pfeffermann and Krieger (1992) we define:
θˆpξq arg max
θPΘ
# n
¸
`1
ln ρθξfθ Yp`q+ .
AssumeA0. Let
ξˆbe a consistent estimator ofξ, L
Yp`q
`Pt1,...,nu, θ, ξ n1°n
k1lnpρθξfθq Yp`q, θ, ξ Definition
The maximum pseudo-likelihood estimator associated toξˆis:
θˆarg max
θPΘ
!L Yp`q
`Pt1,...,nu, θ,ξˆ )
,
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 10/23
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Definition Define
mpyq ErJ1|Y1 ys
m1py1, y2q ErJ2|Y1 y1, Y2 y2s vpyq VarrJ1|Y1 ys
cpy1, y2q CovrJ1, J2|Y1y1, Y2y2s δpy1, y2q m1py1, y2qm1py2, y1q mpy1qmpy2q
Y ρθξfθ.λ J11 Eθ,ξ
Blnpρθξfθq
Bθ pY, θ, ξq 2
J
B p q B p q
p q
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Assumption
Standard conditions on ρθ,ξf, asymptotic independance of draws:
@gP
# 1,
Blnpρθξfθq
Bθ p., θ, ξq 2
,
Blnpρθξfθq
Bθ p., θ, ξqBlnpρθξfθq
Bξ p., θ, ξq +
,
Eθξr|gpY1qgpY2q|c,θ,ξpY1, Y2qs oNÑ8p1q, Eθξr|gpY1qgpY2q|δ,θ,ξpY1, Y2qs oNÑ8p1q, Eθξ
g2pvθ,ξ m2θ,ξq pY1q
opNq.
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 12/23
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Theorem Suppose
?n B
BθL¯ Yp`q
`Pt1,...,nu, θ, ξ ξˆξ
ÝÑNL
0,Σ
Σ11 Σ12 Σ22
.
Then ?
n
θˆθ {σ ÝÑL N p0,1q,
with
σ2 Σ11 J112
J12
J112 pΣ22J122Σ12q.
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Consider:
YNpθ.1, IdNq,
εNp0, IdNq,εandY are independent,
Z ξ.Y ηε,η known, N5000,
N1
N 0.7, NN20.3,
n1
N11{70, Nn2
24{30.
Zk
Yk ZγpNγ1q
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 14/23
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Zk
Yk
Zk
Yk
Zk
Yk
η10 η1 η0.1
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Let
t1limNÑ8NN1 0.7 ζ φ1pt1q
τhlimNÑ8 nh
Nh , τ1 1{70,τ2 4{30 ppyq
P
ε ζ?
ξ2 η2 ξpθyq η
We get:
ρθ,ξpyq τ1ppyq τ2p1ppyqq τ1t1 τ2p1t1q
Figure: Plot ofρ forθ1.5, ξ2,ηP t.1,1,10u
y ρ8,θ,ξpyq
1 2
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 16/23
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Zk
Yk
Zk
Yk
Zk
Yk
η10 η1 η0.1
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
To estimateξ, we use ξˆ
°n
`1Zp`qYp`q{πp`q
°n
`1Yp2`q{πp`q . Compareθˆto
θ˜°n
`1 Yp`q
πp`q arg maxθPΘ!°N
k1
lnpfθpYkqqJk
πk
) ., θ¯n1°
`Pt1,...,nuYp`q.
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 18/23
Motivation and background Inference on a parametric model References
Maximum pseudo-likelihood estimator Simulations
Table: Calculus of mean and mean square error on 1000 simulations
θ ξ σ Meanr.s MSEr.s b
MSE MSEpˆθq
1
nγlimγÑ8nγVarr.s 1.5 2 0.1 θˆ 1.502 7.643 104 1 6.962 104
θ˜ 1.5 4.811 103 2.509 4.523 103 θ¯ 2.329 6.887 101 30.02 3.979 103
1.5 2 1 θˆ 1.5 1.975 103 1 2.975 103
θ˜ 1.501 5.583 103 1.681 6.024 103 θ¯ 2.241 5.509 101 16.7 3.971 103
1.5 2 10 θˆ 1.497 5.501 103 1 2.943 103 θ˜ 1.5 1.030 102 1.368 1.030 102
¯ 2 3
Motivation and background Inference on a parametric model References
Summary:
Definition of sample pdf and limit sample pdf, applicable to with or without replacement and fixed or random size samples, Simple and verifiable conditions on the sequence of sample schemes,
Pertinence: The sample behaves like an independent sample (Uniform cdf convergence),
The limit sample pdf can be used for inference (Convergence of the maximum pseudo likelihood estimator),
Asymptotic normality under stratified sampling and fixed number of strata.
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 20/23
Motivation and background Inference on a parametric model References
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3).
Sugden, R. A. and Smith, T. M. F. (1984). Ignorable and informative designs in survey sampling inference. Biometrika, 71(3).
Motivation and background Inference on a parametric model References
Bonnéry, D., Breidt, F. J., and Coquet, F. (2011). Uniform convergence of the empirical cumulative distribution function under informative selection from a finite population. To appear in Bernoulli.
Pfeffermann, D. and Krieger, A. M. (1992). Maximum likelihood estimation for complex sample surveys. Survey Methodology, 18(1):225–255.
Gong, G. and Samaniego, F. J. (1981). Pseudomaximum likelihood estimation: theory and applications. The Annals of Statistics, 9(4).
Pfeffermann, D., Krieger, A. M., and Rinott, Y. (1998a). Parametric distributions of complex survey data under informative probability sampling. Statistica Sinica, 8(4).
D. Bonnéry- Crest ( ) 7e colloque francophone sur les sondages 22/23
Motivation and background Inference on a parametric model References
Thank you for your attention.