Sequential procedure for SRSWR - Springer Series in Statistics

j = 0;
For k = 1, ..., N do
    select the kth unit $s_k$ times according to the binomial distribution $\mathcal{B}\left(n - \sum_{i=1}^{k-1} s_i,\ \frac{1}{N-k+1}\right)$;
EndFor.
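The sequential procedure above can be sketched in Python as follows. This is a minimal illustration, not the book's implementation: the function name `srswr_sequential` is hypothetical, and the binomial draw is simulated as a sum of Bernoulli trials so that only the standard library is needed.

```python
import random

def srswr_sequential(N, n, rng=random):
    """Draw an SRSWR sample of fixed size n from a population of N units.

    Returns a list s where s[k] is the number of times unit k is selected.
    """
    s = []
    remaining = n  # n minus the draws already allocated to earlier units
    for k in range(1, N + 1):
        p = 1.0 / (N - k + 1)
        # Binomial(remaining, p) drawn as a sum of Bernoulli trials
        s_k = sum(rng.random() < p for _ in range(remaining))
        s.append(s_k)
        remaining -= s_k
    return s

random.seed(1)
sample = srswr_sequential(N=10, n=4)
print(sample, sum(sample))
```

For the last unit the success probability is 1, so all remaining draws are allocated and the sample size is always exactly n.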

4.7 Links Between the Simple Sampling Designs

The following relations of conditioning can be proved using conditioning with respect to a sampling design:

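• $p_{\mathrm{BERNWR}}(\mathbf{s}, \mu \mid \mathcal{R}_n) = p_{\mathrm{SRSWR}}(\mathbf{s}, n)$, for all $\mu \in \mathbb{R}_+^*$,

• $p_{\mathrm{BERNWR}}(\mathbf{s}, \theta \mid \mathcal{S}) = p_{\mathrm{BERN}}\left(\mathbf{s}, \pi = \frac{\theta}{1+\theta}\right)$, for all $\theta \in \mathbb{R}_+^*$,

• $p_{\mathrm{BERNWR}}(\mathbf{s}, \theta \mid \mathcal{S}_n) = p_{\mathrm{SRSWOR}}(\mathbf{s}, n)$, for all $\theta \in \mathbb{R}_+^*$,

• $p_{\mathrm{SRSWR}}(\mathbf{s}, n \mid \mathcal{S}_n) = p_{\mathrm{SRSWOR}}(\mathbf{s}, n)$,

• $p_{\mathrm{BERN}}(\mathbf{s}, \pi \mid \mathcal{S}_n) = p_{\mathrm{SRSWOR}}(\mathbf{s}, n)$, for all $\pi \in\ ]0,1[$.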

These relations are summarized in Figure 4.1 and Table 4.1, page 62.
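As a small numerical illustration of the last relation, the following sketch enumerates the fixed-size support for a tiny population and checks that conditioning a Bernoulli design on sample size n gives the uniform probability 1/C(N, n), whatever the value of π. The function names are hypothetical and the enumeration is only practical for small N.

```python
from itertools import product
from math import comb, isclose

def bern_pmf(s, pi):
    """Bernoulli design: each unit is selected independently with probability pi."""
    n_s = sum(s)
    return pi ** n_s * (1 - pi) ** (len(s) - n_s)

def condition_on_size(N, n, pi):
    """Return p_BERN(s, pi | S_n) for every s in S_n (fixed-size samples)."""
    support = [s for s in product((0, 1), repeat=N) if sum(s) == n]
    total = sum(bern_pmf(s, pi) for s in support)
    return {s: bern_pmf(s, pi) / total for s in support}

N, n = 5, 2
for pi in (0.1, 0.5, 0.9):
    cond = condition_on_size(N, n, pi)
    # Every fixed-size sample gets the SRSWOR probability 1 / C(N, n)
    assert all(isclose(p, 1 / comb(N, n)) for p in cond.values())
```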

[Figure 4.1: diagram connecting BERNWR, BERN, SRSWR, and SRSWOR with arrows labeled "reduction", "conditioning on S", "conditioning on S_n", and "conditioning on R_n with 0 ≤ n ≤ N".]

Fig. 4.1. Links between the main simple sampling designs

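Table 4.1. Main simple random sampling designs

| Notation | BERNWR | SRSWR | BERN | SRSWOR |
|---|---|---|---|---|
| Design $p(\mathbf{s})$ | $\frac{\mu^{n(\mathbf{s})}}{e^{N\mu}} \prod_{k \in U} \frac{1}{s_k!}$ | $\frac{n!}{N^n} \prod_{k \in U} \frac{1}{s_k!}$ | $\pi^{n(\mathbf{s})}(1-\pi)^{N-n(\mathbf{s})}$ | $\binom{N}{n}^{-1}$ |
| Support $\mathcal{Q}$ | $\mathcal{R}$ | $\mathcal{R}_n$ | $\mathcal{S}$ | $\mathcal{S}_n$ |
| Char. function $\phi(\mathbf{t})$ | $\exp\left\{\mu \sum_{k \in U} (e^{it_k} - 1)\right\}$ | $\left(\frac{1}{N} \sum_{k \in U} e^{it_k}\right)^n$ | $\prod_{k \in U} \left\{1 + \pi (e^{it_k} - 1)\right\}$ | $\binom{N}{n}^{-1} \sum_{\mathbf{s} \in \mathcal{S}_n} e^{i\mathbf{t}^\top \mathbf{s}}$ |
| Replacement WOR/WR | with repl. | with repl. | without repl. | without repl. |
| Sample size $n(S)$ | random | fixed | random | fixed |
| Expectation $\mu_k$ | $\mu$ | $\frac{n}{N}$ | $\pi$ | $\frac{n}{N}$ |
| Inclusion probability $\pi_k$ | $1 - e^{-\mu}$ | $1 - \left(\frac{N-1}{N}\right)^n$ | $\pi$ | $\frac{n}{N}$ |
| Variance $\Sigma_{kk}$ | $\mu$ | $\frac{n(N-1)}{N^2}$ | $\pi(1-\pi)$ | $\frac{n(N-n)}{N^2}$ |
| Joint expectation $\mu_{k\ell}$, $k \neq \ell$ | $\mu^2$ | $\frac{n(n-1)}{N^2}$ | $\pi^2$ | $\frac{n(n-1)}{N(N-1)}$ |
| Covariance $\Sigma_{k\ell}$, $k \neq \ell$ | $0$ | $-\frac{n}{N^2}$ | $0$ | $-\frac{n(N-n)}{N^2(N-1)}$ |
| Basic est. $\hat{Y}$ | $\sum_{k \in U} \frac{y_k S_k}{\mu}$ | $\frac{N}{n} \sum_{k \in U} y_k S_k$ | $\sum_{k \in U} \frac{y_k S_k}{\pi}$ | $\frac{N}{n} \sum_{k \in U} y_k S_k$ |
| Variance $\mathrm{var}(\hat{Y})$ | $\sum_{k \in U} \frac{y_k^2}{\mu}$ | $N^2 \frac{\sigma_y^2}{n}$ | $\frac{1-\pi}{\pi} \sum_{k \in U} y_k^2$ | $N^2 \frac{N-n}{N} \frac{V_y^2}{n}$ |
| Variance est. $\widehat{\mathrm{var}}(\hat{Y})$ | $\sum_{k \in U} \frac{y_k^2 S_k}{\mu^2}$ | $N^2 \frac{v_y^2}{n}$ | $\frac{1-\pi}{\pi^2} \sum_{k \in U} y_k^2 S_k$ | $N^2 \frac{N-n}{N} \frac{v_y^2}{n}$ |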

5 Unequal Probability Exponential Designs

5.1 Introduction

A sampling design is a multivariate discrete distribution, and an exponential design is thus an exponential multivariate discrete distribution. Exponential designs form a large family that includes simple random sampling, multinomial sampling, Poisson sampling, unequal probability sampling with replacement, and conditional or rejective Poisson sampling. A large part of the posthumous book of Hájek (1981) is dedicated to the exponential family. Hájek advocated the use of Poisson rejective sampling, but the link between the inclusion probabilities of Poisson sampling and those of conditional Poisson sampling was not yet clearly elucidated. Chen et al. (1994) took a big step forward by linking the question of conditional Poisson sampling to the general theory of the exponential family.

The fundamental point developed by Chen et al. (1994) is the link between the parameter of the exponential family and its mean; that is, its vector of inclusion probabilities. Once this link is clarified, the implementation of the classical algorithms follows quite easily. Independently, Jonasson and Nerman (1996), Aires (1999, 2000a), Bondesson et al. (2004), and Traat et al. (2004) have investigated the question of conditional Poisson sampling. Chen and Liu (1997), Chen (1998), and Deville (2000) have improved the algorithms and the techniques for computing inclusion probabilities. A large part of the material of this chapter was developed during informal conversations with Jean-Claude Deville. In this chapter, we attempt to present a coherent theory of exponential designs. A single definition is given, and all exponential designs can be derived by changes of support. Applying the classical algorithms presented in Chapter 3 yields nine sampling algorithms.

5.2 General Exponential Designs

5.2.1 Minimum Kullback-Leibler Divergence

In order to identify a unique sampling design that satisfies a predetermined vector of means $\boldsymbol{\mu}$, a general idea is to minimize the Kullback-Leibler divergence (see Kullback, 1959):

$$H(p, p_r) = \sum_{\mathbf{s} \in \mathcal{Q}} p(\mathbf{s}) \log \frac{p(\mathbf{s})}{p_r(\mathbf{s})},$$

where $p_r(\mathbf{s})$ is a reference design on $\mathcal{Q}$. The divergence $H(p, p_r)$ (also called relative entropy) is always nonnegative, and $H(p, p_r) = 0$ when $p(\mathbf{s}) = p_r(\mathbf{s})$ for all $\mathbf{s} \in \mathcal{Q}$. The objective is thus to identify the sampling design closest to $p_r(\mathbf{s})$ (in the sense of the Kullback-Leibler divergence) that satisfies fixed inclusion probabilities.
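For a finite support, the divergence can be computed directly from its definition. The following is a minimal sketch with a hypothetical three-sample support; the dictionary keys `"s1"`, `"s2"`, `"s3"` are placeholder labels, not notation from the text.

```python
from math import log

def kl_divergence(p, p_r):
    """H(p, p_r) = sum_s p(s) log(p(s) / p_r(s)) over a finite support.

    p and p_r are dicts mapping each sample s to its probability.
    Terms with p(s) == 0 contribute 0 by the usual convention.
    """
    return sum(ps * log(ps / p_r[s]) for s, ps in p.items() if ps > 0)

# Toy support with three samples (hypothetical labels)
p_r = {"s1": 1 / 3, "s2": 1 / 3, "s3": 1 / 3}
p = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
print(kl_divergence(p, p_r))    # positive: p differs from the reference
print(kl_divergence(p_r, p_r))  # zero: a design has no divergence from itself
```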

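In this chapter, the study of minimum Kullback-Leibler divergence designs is restricted to the case where the reference sampling design is simple; that is,

$$p_r(\mathbf{s}) = p_{\mathrm{SIMPLE}}(\mathbf{s}, \theta, \mathcal{Q}) = \frac{\theta^{n(\mathbf{s})} \prod_{k \in U} \frac{1}{s_k!}}{\sum_{\mathbf{s} \in \mathcal{Q}} \theta^{n(\mathbf{s})} \prod_{k \in U} \frac{1}{s_k!}}. \qquad (5.1)$$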

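Because $H(p, p_r)$ is strictly convex with respect to $p$, the minimization of $H(p, p_r)$ under linear constraints provides a unique solution. The Kullback-Leibler divergence $H(p, p_r)$ is minimized under the constraints

$$\sum_{\mathbf{s} \in \mathcal{Q}} p(\mathbf{s}) = 1, \qquad (5.2)$$

and

$$\sum_{\mathbf{s} \in \mathcal{Q}} \mathbf{s}\, p(\mathbf{s}) = \boldsymbol{\mu}. \qquad (5.3)$$

The vector $\boldsymbol{\mu}$ belongs to $\mathcal{Q}^\circ$ (the interior of $\mathcal{Q}$) (see Remark 4, page 15), which ensures that there exists at least one sampling design on $\mathcal{Q}$ with mean $\boldsymbol{\mu}$.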

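The Lagrangian function is

$$L(p(\mathbf{s}), \beta, \boldsymbol{\gamma}) = \sum_{\mathbf{s} \in \mathcal{Q}} p(\mathbf{s}) \log \frac{p(\mathbf{s})}{p_r(\mathbf{s})} - \beta \left( \sum_{\mathbf{s} \in \mathcal{Q}} p(\mathbf{s}) - 1 \right) - \boldsymbol{\gamma}^\top \left( \sum_{\mathbf{s} \in \mathcal{Q}} \mathbf{s}\, p(\mathbf{s}) - \boldsymbol{\mu} \right),$$

where $\beta \in \mathbb{R}$ and $\boldsymbol{\gamma} \in \mathbb{R}^N$ are the Lagrange multipliers. By setting the derivatives of $L$ with respect to $p(\mathbf{s})$ equal to zero, we obtain

$$\frac{\partial L(p(\mathbf{s}), \beta, \boldsymbol{\gamma})}{\partial p(\mathbf{s})} = \log \frac{p(\mathbf{s})}{p_r(\mathbf{s})} + 1 - \beta - \boldsymbol{\gamma}^\top \mathbf{s} = 0,$$

which gives

$$p(\mathbf{s}) = p_r(\mathbf{s}) \exp(\boldsymbol{\gamma}^\top \mathbf{s} + \beta - 1).$$

By using the constraint (5.2), we get

$$p(\mathbf{s}) = \frac{p_r(\mathbf{s}) \exp \boldsymbol{\gamma}^\top \mathbf{s}}{\sum_{\mathbf{s} \in \mathcal{Q}} p_r(\mathbf{s}) \exp \boldsymbol{\gamma}^\top \mathbf{s}}.$$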

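Now, by replacing $p_r(\mathbf{s})$ by (5.1), we get the exponential designs

$$p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q}) = \frac{\exp(\boldsymbol{\lambda}^\top \mathbf{s}) \prod_{k \in U} \frac{1}{s_k!}}{\sum_{\mathbf{s} \in \mathcal{Q}} \exp(\boldsymbol{\lambda}^\top \mathbf{s}) \prod_{k \in U} \frac{1}{s_k!}},$$

where $\boldsymbol{\lambda} = \boldsymbol{\gamma} + \mathbf{1} \log \theta$. The vector $\boldsymbol{\lambda}$ is identified by means of the constraint (5.3). We show that the problem of deriving $\boldsymbol{\lambda}$ from $\boldsymbol{\mu}$ is one of the most intricate questions of exponential designs.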

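5.2.2 Exponential Designs (EXP)

We refer to the following definition.

Definition 45. A sampling design $p_{\mathrm{EXP}}(\cdot)$ on a support $\mathcal{Q}$ is said to be exponential if it can be written

$$p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q}) = g(\mathbf{s}) \exp\left\{ \boldsymbol{\lambda}^\top \mathbf{s} - \alpha(\boldsymbol{\lambda}, \mathcal{Q}) \right\},$$

where $\boldsymbol{\lambda} \in \mathbb{R}^N$ is the parameter,

$$g(\mathbf{s}) = \prod_{k \in U} \frac{1}{s_k!},$$

and $\alpha(\boldsymbol{\lambda}, \mathcal{Q})$ is called the normalizing constant and is given by

$$\alpha(\boldsymbol{\lambda}, \mathcal{Q}) = \log \sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) \exp \boldsymbol{\lambda}^\top \mathbf{s}.$$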

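As is the case for all exponential families, the expectation can be obtained by differentiating the normalizing constant:

$$\boldsymbol{\mu}(\boldsymbol{\lambda}) = \frac{\partial \alpha(\boldsymbol{\lambda}, \mathcal{Q})}{\partial \boldsymbol{\lambda}} = \frac{\sum_{\mathbf{s} \in \mathcal{Q}} \mathbf{s}\, g(\mathbf{s}) \exp \boldsymbol{\lambda}^\top \mathbf{s}}{\sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) \exp \boldsymbol{\lambda}^\top \mathbf{s}} = \sum_{\mathbf{s} \in \mathcal{Q}} \mathbf{s}\, p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q}).$$

By differentiating the normalizing constant twice, we get the variance-covariance operator:

$$\boldsymbol{\Sigma}(\boldsymbol{\lambda}) = \frac{\partial^2 \alpha(\boldsymbol{\lambda}, \mathcal{Q})}{\partial \boldsymbol{\lambda}\, \partial \boldsymbol{\lambda}^\top} = \sum_{\mathbf{s} \in \mathcal{Q}} (\mathbf{s} - \boldsymbol{\mu})(\mathbf{s} - \boldsymbol{\mu})^\top p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q}).$$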
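The relation between the normalizing constant and the expectation can be checked numerically on a small support. The sketch below assumes g(s) = ∏ 1/s_k! and the fixed-size with-replacement support R_n, enumerates the design, and compares the mean vector with a finite-difference gradient of α. All function names are hypothetical, and brute-force enumeration is only feasible for very small N and n.

```python
from itertools import product
from math import exp, factorial, log, prod

def support_Rn(N, n):
    """All with-replacement samples s in N^N with sum(s) == n."""
    return [s for s in product(range(n + 1), repeat=N) if sum(s) == n]

def g(s):
    return prod(1 / factorial(sk) for sk in s)

def alpha(lam, Q):
    """Normalizing constant: log sum_{s in Q} g(s) exp(lam' s)."""
    return log(sum(g(s) * exp(sum(l * sk for l, sk in zip(lam, s))) for s in Q))

def design(lam, Q):
    """p_EXP(s, lam, Q) = g(s) exp(lam' s - alpha) for every s in Q."""
    a = alpha(lam, Q)
    return {s: g(s) * exp(sum(l * sk for l, sk in zip(lam, s)) - a) for s in Q}

def mean(lam, Q):
    p = design(lam, Q)
    return [sum(s[k] * ps for s, ps in p.items()) for k in range(len(lam))]

def grad_alpha(lam, Q, h=1e-6):
    """Central finite-difference gradient of alpha with respect to lam."""
    out = []
    for k in range(len(lam)):
        up = [l + h * (i == k) for i, l in enumerate(lam)]
        dn = [l - h * (i == k) for i, l in enumerate(lam)]
        out.append((alpha(up, Q) - alpha(dn, Q)) / (2 * h))
    return out

Q = support_Rn(N=3, n=2)
lam = [0.2, -0.1, 0.5]
print(mean(lam, Q))        # expectation computed from the design
print(grad_alpha(lam, Q))  # should agree up to finite-difference error
```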

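The characteristic function of an exponential design is given by

$$\phi_{\mathrm{EXP}}(\mathbf{t}) = \sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) \exp\left\{ (i\mathbf{t} + \boldsymbol{\lambda})^\top \mathbf{s} - \alpha(\boldsymbol{\lambda}, \mathcal{Q}) \right\} = \exp\left[ \alpha(i\mathbf{t} + \boldsymbol{\lambda}, \mathcal{Q}) - \alpha(\boldsymbol{\lambda}, \mathcal{Q}) \right].$$

Simple random sampling designs are a particular case of exponential designs. Indeed, we have

$$p_{\mathrm{EXP}}(\mathbf{s}, \mathbf{1} \log \theta, \mathcal{Q}) = p_{\mathrm{SIMPLE}}(\mathbf{s}, \theta, \mathcal{Q}), \quad \theta \in \mathbb{R}_+^*,$$

where $\mathbf{1}$ is a vector of $N$ ones.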

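The parameter $\boldsymbol{\lambda}$ can be split into two parts: $\boldsymbol{\lambda}_a$, the orthogonal projection of $\boldsymbol{\lambda}$ onto $\overrightarrow{\mathcal{Q}}$ (the direction of $\mathcal{Q}$, see Definition 9, page 12), and $\boldsymbol{\lambda}_b$, the orthogonal projection of $\boldsymbol{\lambda}$ onto $\mathrm{Invariant}\ \mathcal{Q}$ (see Definition 11, page 12). We thus have $\boldsymbol{\lambda} = \boldsymbol{\lambda}_a + \boldsymbol{\lambda}_b$ and $\boldsymbol{\lambda}_a^\top \boldsymbol{\lambda}_b = 0$. Moreover, we have the following result:

Result 18. Let $\boldsymbol{\lambda}_a$ and $\boldsymbol{\lambda}_b$ be, respectively, the orthogonal projections of $\boldsymbol{\lambda}$ onto $\overrightarrow{\mathcal{Q}}$ and $\mathrm{Invariant}\ \mathcal{Q}$. Then

$$p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q}) = p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}_a + \boldsymbol{\lambda}_b, \mathcal{Q}) = p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}_a, \mathcal{Q}) = p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}_a + b\boldsymbol{\lambda}_b, \mathcal{Q}),$$

for any $b \in \mathbb{R}$.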

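Proof.

$$p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}_a + b\boldsymbol{\lambda}_b, \mathcal{Q}) = \frac{g(\mathbf{s}) \exp (\boldsymbol{\lambda}_a + b\boldsymbol{\lambda}_b)^\top \mathbf{s}}{\sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) \exp (\boldsymbol{\lambda}_a + b\boldsymbol{\lambda}_b)^\top \mathbf{s}} = \frac{g(\mathbf{s}) (\exp \boldsymbol{\lambda}_a^\top \mathbf{s})(\exp b\boldsymbol{\lambda}_b^\top \mathbf{s})}{\sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) (\exp \boldsymbol{\lambda}_a^\top \mathbf{s})(\exp b\boldsymbol{\lambda}_b^\top \mathbf{s})}.$$

Because $\boldsymbol{\lambda}_b \in \mathrm{Invariant}\ \mathcal{Q}$, we have $(\mathbf{s} - \boldsymbol{\mu})^\top \boldsymbol{\lambda}_b = 0$, and thus $\boldsymbol{\lambda}_b^\top \mathbf{s} = c$, where $c = \boldsymbol{\mu}^\top \boldsymbol{\lambda}_b$, for all $\mathbf{s} \in \mathcal{Q}$. Thus,

$$p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}_a + b\boldsymbol{\lambda}_b, \mathcal{Q}) = \frac{g(\mathbf{s}) (\exp \boldsymbol{\lambda}_a^\top \mathbf{s})(\exp bc)}{\sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) (\exp \boldsymbol{\lambda}_a^\top \mathbf{s})(\exp bc)} = \frac{g(\mathbf{s}) \exp \boldsymbol{\lambda}_a^\top \mathbf{s}}{\sum_{\mathbf{s} \in \mathcal{Q}} g(\mathbf{s}) \exp \boldsymbol{\lambda}_a^\top \mathbf{s}} = p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}_a, \mathcal{Q}). \qquad \Box$$

Example 8. If the support is

$$\mathcal{R}_n = \left\{ \mathbf{s} \in \mathbb{N}^N \,\Big|\, \sum_{k \in U} s_k = n \right\},$$

then

$$\mathrm{Invariant}\ \mathcal{R}_n = \left\{ \mathbf{u} \in \mathbb{R}^N \mid \mathbf{u} = a\mathbf{1},\ a \in \mathbb{R} \right\},$$

$$\overrightarrow{\mathcal{R}}_n = \left\{ \mathbf{u} \in \mathbb{R}^N \,\Big|\, \sum_{k \in U} u_k = 0 \right\},$$

$$\boldsymbol{\lambda}_b = \mathbf{1} \frac{1}{N} \sum_{k \in U} \lambda_k = \mathbf{1} \bar{\lambda},$$

and

$$\boldsymbol{\lambda}_a = (\lambda_1 - \bar{\lambda} \ \cdots \ \lambda_k - \bar{\lambda} \ \cdots \ \lambda_N - \bar{\lambda})^\top,$$

where $\mathbf{1}$ is a vector of ones of $\mathbb{R}^N$.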
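Result 18 and Example 8 can be illustrated numerically: on a fixed-size support, adding any multiple of the constant vector to the parameter leaves the design unchanged. The following sketch uses a hypothetical helper `exp_design` and assumes g(s) = ∏ 1/s_k! with a tiny support R_n enumerated by brute force.

```python
from itertools import product
from math import exp, factorial, isclose, prod

def exp_design(lam, Q):
    """p_EXP(s, lam, Q) with g(s) = prod_k 1/s_k!, normalized over Q."""
    w = {s: prod(exp(l * sk) / factorial(sk) for l, sk in zip(lam, s)) for s in Q}
    total = sum(w.values())
    return {s: ws / total for s, ws in w.items()}

N, n = 3, 2
R_n = [s for s in product(range(n + 1), repeat=N) if sum(s) == n]

lam = [0.7, -0.3, 1.1]
lam_bar = sum(lam) / N
lam_a = [l - lam_bar for l in lam]  # projection onto the direction of R_n
lam_b = [lam_bar] * N               # projection onto Invariant R_n

p = exp_design(lam, R_n)
for b in (-2.0, 0.0, 3.5):
    shifted = [a + b * c for a, c in zip(lam_a, lam_b)]
    q = exp_design(shifted, R_n)
    # The design depends on lam only through its centered part lam_a
    assert all(isclose(p[s], q[s]) for s in R_n)
```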

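The main result of the theory of exponential families establishes that the mapping between the parameter $\boldsymbol{\lambda}$ and the expectation $\boldsymbol{\mu}$ is bijective.

Theorem 5. Let $p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q})$ be an exponential design on support $\mathcal{Q}$. Then

$$\boldsymbol{\mu}(\boldsymbol{\lambda}) = \sum_{\mathbf{s} \in \mathcal{Q}} \mathbf{s}\, p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{Q})$$

is a homeomorphism from $\overrightarrow{\mathcal{Q}}$ (the direction of $\mathcal{Q}$) onto $\mathcal{Q}^\circ$ (the interior of $\mathcal{Q}$); that is, $\boldsymbol{\mu}(\boldsymbol{\lambda})$ is a continuous and bijective function from $\overrightarrow{\mathcal{Q}}$ to $\mathcal{Q}^\circ$ whose inverse is also continuous.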

Theorem 5 is an application to exponential designs of a well-known result of exponential families. The proof is given, for instance, in Brown (1986, p. 74).
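Because the mapping is bijective, a parameter λ in the direction of the support can be recovered numerically from a target mean vector. The sketch below is one simple way to do this and is not an algorithm from the text: it runs plain gradient ascent on the concave function λ'µ − α(λ, Q), whose gradient is µ − µ(λ), on the fixed-size without-replacement support S_n for a tiny population. The function names, step size, and iteration count are illustrative assumptions.

```python
from itertools import combinations
from math import exp

def mean_vector(lam, Q):
    """mu(lam) = sum_{s in Q} s p_EXP(s, lam, Q) for 0/1 samples (g(s) = 1)."""
    w = [exp(sum(l * sk for l, sk in zip(lam, s))) for s in Q]
    total = sum(w)
    return [sum(s[k] * ws for s, ws in zip(Q, w)) / total for k in range(len(lam))]

def solve_lambda(mu_target, Q, step=1.0, iters=5000):
    """Find lam in the direction of Q such that mu(lam) = mu_target.

    Plain gradient ascent on lam' mu_target - alpha(lam, Q); the gradient
    is mu_target - mu(lam). Starting from 0 keeps sum(lam) at 0 because
    both mean vectors sum to n.
    """
    lam = [0.0] * len(mu_target)
    for _ in range(iters):
        mu = mean_vector(lam, Q)
        lam = [l + step * (t - m) for l, t, m in zip(lam, mu_target, mu)]
    return lam

N, n = 4, 2
S_n = [tuple(1 if k in c else 0 for k in range(N)) for c in combinations(range(N), n)]
target = [0.2, 0.4, 0.6, 0.8]  # sums to n, each entry strictly between 0 and 1
lam = solve_lambda(target, S_n)
print(lam, sum(lam))
print(mean_vector(lam, S_n))
```

Enumerating the support is only practical for very small N; the point of the sketch is the one-to-one correspondence between λ and µ, not efficiency.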

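Example 9. If the support is $\mathcal{R}_n$, then $\boldsymbol{\mu}(\boldsymbol{\lambda})$ is bijective from

$$\overrightarrow{\mathcal{R}}_n = \left\{ \mathbf{u} \in \mathbb{R}^N \,\Big|\, \sum_{k \in U} u_k = 0 \right\}$$

to

$$\mathcal{R}_n^\circ = \left\{ \mathbf{u} \in\ ]0, n[^N \,\Big|\, \sum_{k \in U} u_k = n \right\}.$$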

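Another important property of the exponential design is that the parameter does not change while conditioning with respect to a subset of the support.

Result 19. Let $\mathcal{Q}_1$ and $\mathcal{Q}_2$ be two supports such that $\mathcal{Q}_2 \subset \mathcal{Q}_1$. Also, let

$$p_1(\mathbf{s}) = g(\mathbf{s}) \exp\left\{ \boldsymbol{\lambda}^\top \mathbf{s} - \alpha(\boldsymbol{\lambda}, \mathcal{Q}_1) \right\}$$

be an exponential design with support $\mathcal{Q}_1$. Then

$$p_2(\mathbf{s}) = p_1(\mathbf{s} \mid \mathcal{Q}_2)$$

is also an exponential sampling design with the same parameter.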

The proof is obvious.
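One way to spell out the argument, as a sketch using only the definition of conditioning on a support (the summation variable z is notation introduced here for readability):

```latex
p_2(\mathbf{s})
= \frac{p_1(\mathbf{s})}{\sum_{\mathbf{z} \in \mathcal{Q}_2} p_1(\mathbf{z})}
= \frac{g(\mathbf{s}) \exp \boldsymbol{\lambda}^\top \mathbf{s}}
       {\sum_{\mathbf{z} \in \mathcal{Q}_2} g(\mathbf{z}) \exp \boldsymbol{\lambda}^\top \mathbf{z}}
= g(\mathbf{s}) \exp\left\{ \boldsymbol{\lambda}^\top \mathbf{s} - \alpha(\boldsymbol{\lambda}, \mathcal{Q}_2) \right\},
\quad \mathbf{s} \in \mathcal{Q}_2 .
```

The factor exp{−α(λ, Q_1)} cancels between numerator and denominator, so only the normalizing constant changes while λ stays the same.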

5.3 Poisson Sampling Design With Replacement (POISSWR)

5.3.1 Sampling Design

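Definition 46. An exponential design defined on $\mathcal{R}$ is called a Poisson sampling with replacement (POISSWR).

The Poisson sampling with replacement can be derived from Definition 45:

$$p_{\mathrm{POISSWR}}(\mathbf{s}, \boldsymbol{\lambda}) = p_{\mathrm{EXP}}(\mathbf{s}, \boldsymbol{\lambda}, \mathcal{R}) = \left( \prod_{k \in U} \frac{1}{s_k!} \right) \exp\left\{ \boldsymbol{\lambda}^\top \mathbf{s} - \alpha(\boldsymbol{\lambda}, \mathcal{R}) \right\},$$

for all $\mathbf{s} \in \mathcal{R}$, with $\boldsymbol{\lambda} \in \mathbb{R}^N$. The normalizing constant is

$$\alpha(\boldsymbol{\lambda}, \mathcal{R}) = \log \sum_{\mathbf{s} \in \mathcal{R}} \exp(\boldsymbol{\lambda}^\top \mathbf{s}) \prod_{k \in U} \frac{1}{s_k!} = \log \prod_{k \in U} \sum_{j=0}^{\infty} \frac{(\exp \lambda_k)^j}{j!} = \sum_{k \in U} \exp \lambda_k,$$

which allows reformulating the definition of Poisson sampling with replacement.

Definition 47. A sampling design $p_{\mathrm{POISSWR}}(\cdot, \boldsymbol{\lambda})$ on $\mathcal{R}$ is said to be a Poisson sampling with replacement if it can be written

$$p_{\mathrm{POISSWR}}(\mathbf{s}, \boldsymbol{\lambda}) = \prod_{k \in U} \frac{\mu_k^{s_k} e^{-\mu_k}}{s_k!},$$

for all $\mathbf{s} \in \mathcal{R}$, where $\mu_k = \exp \lambda_k$.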

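5.3.2 Estimation

The Hansen-Hurwitz estimator is

$$\hat{Y}_{HH} = \sum_{k \in U} \frac{y_k S_k}{\mu_k},$$

where $\mu_k = \mathrm{E}(S_k)$. Its variance is

$$\mathrm{var}(\hat{Y}_{HH}) = \sum_{k \in U} \frac{y_k^2}{\mu_k},$$

and can be estimated by

$$\widehat{\mathrm{var}}(\hat{Y}_{HH}) = \sum_{k \in U} \frac{y_k^2 S_k}{\mu_k^2}.$$

Because $\mathrm{E}(S_k \mid S_k > 0) = \mu_k / (1 - e^{-\mu_k})$, the improved Hansen-Hurwitz estimator can be computed

$$\hat{Y}_{IHH} = \sum_{k \in U} \frac{y_k\, r(S_k)}{1 - e^{-\mu_k}},$$

where $r(S_k) = 1$ if $S_k > 0$ and $r(S_k) = 0$ otherwise.

The improvement brings an important decrease of the variance with respect to the Hansen-Hurwitz estimator.

Finally, the Horvitz-Thompson estimator is

$$\hat{Y}_{HT} = \sum_{k \in U} \frac{y_k\, r(S_k)}{\pi_k} = \sum_{k \in U} \frac{y_k\, r(S_k)}{1 - e^{-\mu_k}},$$

and is equal to the improved Hansen-Hurwitz estimator.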
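The two estimators can be compared on a small hypothetical sample. The sketch below assumes the forms Ŷ_HH = Σ y_k S_k / µ_k and Ŷ_HT = Σ y_k r(S_k) / (1 − e^{−µ_k}), with r(S_k) = 1 when unit k is selected at least once; the data values are made up for illustration.

```python
from math import exp

def hansen_hurwitz(y, S, mu):
    """Y_HH = sum_k y_k S_k / mu_k, with mu_k = E(S_k)."""
    return sum(yk * sk / mk for yk, sk, mk in zip(y, S, mu))

def horvitz_thompson(y, S, mu):
    """Y_HT = sum_k y_k r(S_k) / pi_k, with pi_k = 1 - exp(-mu_k)."""
    return sum(yk * (sk > 0) / (1 - exp(-mk)) for yk, sk, mk in zip(y, S, mu))

# Hypothetical population of three units
y = [10.0, 20.0, 30.0]   # variable of interest
mu = [0.5, 1.0, 1.5]     # expected selection counts
S = [2, 0, 1]            # observed selection counts

print(hansen_hurwitz(y, S, mu))
print(horvitz_thompson(y, S, mu))
```

The Hansen-Hurwitz estimator uses the multiplicities S_k, whereas the Horvitz-Thompson estimator only uses whether each unit appears in the sample.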

Algorithm 5.1 Sequential procedure for POISSWR
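The body of the algorithm is not reproduced here. As a hedged sketch of what a sequential procedure could look like, the code below assumes that under POISSWR the counts S_k are independent Poisson variables with means µ_k = exp λ_k, and draws them one unit at a time. The function names are hypothetical, and the Poisson generator is Knuth's multiplication method, which is adequate only for small means.

```python
import random
from math import exp

def poisson_draw(mu, rng=random):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit = exp(-mu)
    count, running = 0, rng.random()
    # Multiply uniforms until the running product drops to exp(-mu) or below
    while running > limit:
        count += 1
        running *= rng.random()
    return count

def poisswr_sequential(lam, rng=random):
    """For k = 1, ..., N, draw s_k ~ Poisson(mu_k) with mu_k = exp(lam_k)."""
    return [poisson_draw(exp(l), rng) for l in lam]

random.seed(0)
print(poisswr_sequential([0.0, -1.0, 0.5]))
```

The sample size is random under this design, so repeated calls return vectors with different totals.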
