7.4 Sampford Rejective Procedure
7.4.2 Fundamental Results
In order to prove that

Σ_{s∈S_n} p_SAMPFORD(s) = 1,

and that the sampling design reaches the required inclusion probabilities, Sampford first proved the following lemma:
Lemma 1.(Sampford, 1967) If
g(n, r, k) =
132 7 More on Unequal Probability Designs
by Expressions (7.6) and (7.8), we obtain the recursive relation

g(n, r, k) = r(1 − π_k) D_{n−r} + g(n, r+1, k),

with the initial condition g(n, n, k) = 1 − π_k.
Expression (7.5) satisfies this recurrence relation. □

Lemma 1 also holds when π_k > 1. In the case where π_k = 1, Sampford (1967) pointed out that a modified form can be stated:

g(n, r, k) = π_k Σ_{t=r}^{n−1} D_{n−t−1}(k̄).    (7.9)
Now, the most fundamental results can be derived from Lemma 1.
Result 31 (Sampford, 1967). For the Sampford design defined in (7.4),
(i) Suppose that a phantom unit z has a null inclusion probability; that is, π_z = ω_z = 0. Then, by Lemma 1:
Now suppose that units k and ℓ are aggregated in a new unit α; that is, the size of this new population Ũ is N − 1, π_α = π_k + π_ℓ, and ω_α = π_α/(1 − π_α). Moreover, define g̃ and D̃ as the functions g and D defined over the modified population Ũ. Then

Ψ_kℓ =
The case where π_α = 1 can be treated by using Expression (7.9). Reverting to the original population, we obtain

Ψ_kℓ = Σ_{t=2}^{n} (t − π_k + π_ℓ) D_{n−t}(k̄, ℓ̄).    (7.11)

The proof is completed by inserting (7.11) in (7.10). □

7.4.3 Technicalities for the Computation of the Joint Inclusion Probabilities
In order to compute the joint inclusion probabilities, it is not necessary to enumerate all the possible samples. Sampford pointed out that the following relation, which is proved in Result 25, page 88, can be used:

D_m = (1/m) Σ_{i=1}^{m} (−1)^{i−1} D_{m−i} Σ_{k∈U} ω_k^i,

with the initial condition D_0 = 1. Next, a recursive relation can be used to compute D_m(k̄, ℓ̄):

D_m(k̄, ℓ̄) = D_m − (ω_k + ω_ℓ) D_{m−1}(k̄, ℓ̄) − ω_k ω_ℓ D_{m−2}(k̄, ℓ̄).
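These recursions are straightforward to implement. The sketch below is ours, not the book's code, and the function names are made up. It computes D_m by the elementary-symmetric-function recursion D_m ← D_m + ω_k D_{m−1} obtained when a unit is added one at a time (which is equivalent to the relation above), and then applies the stated recursion for D_m(k̄, ℓ̄):

```python
import numpy as np

def d_poly(omega, n):
    """D_0, ..., D_n for the odds omega: D_m is the sum, over all
    subsets of size m, of the products of the omega_k.  Computed by
    the dynamic-programming recursion D_m <- D_m + omega_k D_{m-1}
    applied when each unit is added to the population."""
    d = np.zeros(n + 1)
    d[0] = 1.0
    for w in omega:
        for m in range(n, 0, -1):
            d[m] += w * d[m - 1]
    return d

def d_poly_without(omega, n, k, l):
    """D_m(k-bar, l-bar) for m = 0, ..., n, by the recursion
    D_m(kl) = D_m - (w_k + w_l) D_{m-1}(kl) - w_k w_l D_{m-2}(kl)."""
    d = d_poly(omega, n)
    dkl = np.zeros(n + 1)
    dkl[0] = 1.0
    for m in range(1, n + 1):
        two_back = dkl[m - 2] if m >= 2 else 0.0
        dkl[m] = (d[m] - (omega[k] + omega[l]) * dkl[m - 1]
                  - omega[k] * omega[l] * two_back)
    return dkl
```

A built-in consistency check: D_m(k̄, ℓ̄) obtained from the recursion must equal D_m computed directly on the population with units k and ℓ removed.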
Gabler (1981) has shown that Sampford’s procedure has a smaller variance than multinomial sampling; that is, sampling with unequal probabilities with replacement.
Example 23. With Sampford's method, for N = 6, n = 3, and π = (0.07, 0.17, 0.41, 0.61, 0.83, 0.91), the matrix of joint inclusion probabilities is given by

Π =
⎛ 0.07  0.004 0.012 0.022 0.046 0.057 ⎞
⎜ 0.004 0.17  0.029 0.054 0.114 0.139 ⎟
⎜ 0.012 0.029 0.41  0.145 0.289 0.345 ⎟
⎜ 0.022 0.054 0.145 0.61  0.466 0.533 ⎟
⎜ 0.046 0.114 0.289 0.466 0.83  0.745 ⎟
⎝ 0.057 0.139 0.345 0.533 0.745 0.91  ⎠
7.4.4 Implementation of the Sampford Design

General result

The most powerful algorithms for implementing Sampford's design are based on the following result.
Result 32. Let Q be a support such that S_{n−1} ⊂ Q. Let S1 and S2 be two independent random samples such that
• S1 = (S_11 · · · S_1N) is a random sample of size 1 with inclusion probabilities q_k = π_k/n from U,
• S2 = (S_21 · · · S_2N) is an exponential random sample with support Q and parameter λ = (λ_1 · · · λ_N), where λ_k = log(ω_k) = log[π_k/(1 − π_k)].
Then Pr[S1 + S2 = s | (S1 + S2) ∈ S_n] is a Sampford sampling design with inclusion probabilities π_k, k ∈ U.
Proof. Because

Pr[S1 = s] = ∏_{k∈U} q_k^{s_k}, for all s ∈ S_1,

and

Pr[S2 = s] = exp(λ's) / Σ_{s̃∈Q} exp(λ's̃) = ∏_{k∈U} ω_k^{s_k} / Σ_{s̃∈Q} ∏_{k∈U} ω_k^{s̃_k}, for all s ∈ Q,

we get

Pr[S1 + S2 = s] = Σ_{v∈S_1} Pr[S2 = s − v] Pr[S1 = v] = Σ_{ℓ∈U} s_ℓ q_ℓ Pr[S2 = s − a_ℓ],

for all s ∈ {r + v, for all r ∈ Q, v ∈ S_1}, where a_ℓ ∈ S_1 is such that its ℓth component equals 1 and a_ℓk = 0 when k ≠ ℓ. By conditioning on S_n, we obtain

Pr[S1 + S2 = s | (S1 + S2) ∈ S_n] = C_n Σ_{ℓ∈U} π_ℓ s_ℓ ∏_{k∈U, k≠ℓ} ω_k^{s_k},

which is the Sampford sampling design. □
Multinomial rejective Sampford's procedure

If Q = R_n, we obtain a simple implementation by means of the multinomial rejective procedure presented in Algorithm 7.3. This procedure can be very slow, especially when the sample size is large with respect to the population size.
Algorithm 7.3 Multinomial rejective Sampford's procedure
1. Select n units with replacement, the first drawing being made with probabilities q_k = π_k/n and all the subsequent ones with probabilities proportional to π_k/(1 − π_k).
2. If any unit is selected more than once, reject the sample and restart the procedure.
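The two steps above can be sketched as follows (our code, not the book's; the function name is made up). The sample size n is recovered from the fact that the π_k sum to n:

```python
import random

def sampford_multinomial(pi, rng=random):
    """Algorithm 7.3 sketch: first draw with probabilities pi_k/n,
    then n-1 draws with replacement with probabilities proportional
    to pi_k/(1 - pi_k); restart as soon as a unit appears twice."""
    n = round(sum(pi))                       # sum of pi_k equals n
    q_first = [p / n for p in pi]
    q_next = [p / (1 - p) for p in pi]       # proportional weights
    units = range(len(pi))
    while True:
        s = rng.choices(units, weights=q_first, k=1)
        s += rng.choices(units, weights=q_next, k=n - 1)
        if len(set(s)) == n:                 # all n units distinct
            return sorted(s)
```

The rejection loop makes the procedure slow when n/N is large, since drawing n distinct units with replacement then becomes unlikely.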
Poisson rejective Sampford's procedure

If Q = S, we obtain a simple implementation by means of the Poisson rejective procedure presented in Algorithm 7.4.

Algorithm 7.4 Poisson rejective Sampford's procedure
1. Select the first unit with probabilities q_k = π_k/n.
2. Select a sample by means of a Poisson sampling design with inclusion probabilities

π̃_k = c π_k / [1 − (1 − c) π_k], for any c ∈ R+.

(Of course, one can choose c = 1, which gives π̃_k = π_k.)
3. If any unit is selected twice (that is, in both steps 1 and 2) or if the sample size is not equal to n, reject the sample and restart the procedure.
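A sketch of this procedure follows (our code and our reading of the rejection rule: the first unit must not reappear in the Poisson sample, and the combined sample must have size n). The function name is made up:

```python
import random

def sampford_poisson(pi, c=1.0, rng=random):
    """Algorithm 7.4 sketch: draw one unit with probabilities pi_k/n,
    then an independent Poisson sample with inclusion probabilities
    c pi_k / (1 - (1 - c) pi_k); accept when no unit is selected
    twice and the total sample size is n."""
    n = round(sum(pi))
    q_first = [p / n for p in pi]
    pi_mod = [c * p / (1 - (1 - c) * p) for p in pi]
    units = range(len(pi))
    while True:
        first = rng.choices(units, weights=q_first, k=1)[0]
        poisson = [k for k in units if rng.random() < pi_mod[k]]
        if first not in poisson and len(poisson) == n - 1:
            return sorted([first] + poisson)
```

With c = 1, π̃_k = π_k, so the Poisson step reproduces the target inclusion probabilities before conditioning.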
CPS rejective Sampford's procedure

If Q = S_n, we obtain an implementation by means of a CPS rejective procedure. Traat et al. (2004) have suggested the ingenious implementation presented in Algorithm 7.5.

Algorithm 7.5 CPS rejective Sampford's procedure
1. Select the first unit with probabilities q_k = π_k/n. Let i be the selected unit.
2. Exchange the places of units i and 1 in the list.
3. Perform a CPS design of size n − 1 by means of sequential procedure 5.8, page 92, with parameter λ_k = log(ω_k) = log[π_k/(1 − π_k)].
4. If unit 1 (formerly i) is selected twice, reject the sample and restart the procedure.
7.5 Variance Approximation and Estimation
7.5.1 Approximation of the Variance
Each one of the unequal probability sampling methods presented in Chapters 5 through 7 has particular joint inclusion probabilities. The variance of the estimator can theoretically be estimated by the Horvitz-Thompson estimator (see Expression 2.18, page 28) or the Sen-Yates-Grundy estimator of the variance (see Expression 2.19, page 28). In practice, the use of joint inclusion probabilities is often unrealistic because they are difficult to compute and n² terms must be summed to compute the estimate.
Ideally, a practical estimator should be computed by a simple sum over the sample, and the computation of the joint inclusion probabilities should be avoided. Such an estimator exists for the multinomial sampling design (see Expression (5.5), page 72), but multinomial sampling is not satisfactory. Indeed, sampling with replacement can always be improved (see Section 5.4.2, page 72). It is, however, possible to accurately approximate the variance for several sampling methods by using a quite simple estimator. On this matter, Berger (1998a) showed that if the sampling design does not diverge too much from the exponential sampling design, it is possible to construct an approximation.
Numerous publications (Hájek, 1981; Hartley and Chakrabarty, 1967; Deville, 1993; Brewer, 1999; Aires, 2000b; Brewer, 2001; Brewer and Donadio, 2003; Matei and Tillé, 2006) are devoted to the derivation of an approximation of the variance. A simple reasoning can provide an approximation of the variance for exponential designs without replacement: Result 26, page 89, shows that a CPS design p(s) is obtained by conditioning a Poisson design p̃(s) on its sample size ñ_S being fixed. If var_POISSON(·) denotes the variance and cov_POISSON(·) the covariance under the Poisson sampling design p̃(s), and var(·) the variance under the design p(·), we can write

var(Ŷ_HT) = var_POISSON(Ŷ_HT | ñ_S = n).

If we suppose that the pair (Ŷ_HT, ñ_S) has a bivariate normal distribution (on this topic, see Hájek, 1964; Berger, 1998a), we obtain

var_POISSON(Ŷ_HT | ñ_S = n) = var_POISSON[Ŷ_HT + (n − ñ_S)β],

where

β = cov_POISSON(ñ_S, Ŷ_HT) / var_POISSON(ñ_S),    var_POISSON(ñ_S) = Σ_{k∈U} π̃_k(1 − π̃_k),
exponential sampling design (see Deville and Tillé, 2005; Tillé, 2001, p. 117):

var_APPROX(Ŷ_HT) = Σ_{k∈U} b_k (y_k/π_k − B)²,    where B = Σ_{ℓ∈U} b_ℓ y_ℓ/π_ℓ / Σ_{ℓ∈U} b_ℓ.    (7.13)

According to the values given to b_k, we obtain numerous variants of this approximation. Indeed, Expression (7.13) can also be written

var_APPROX(Ŷ_HT) = y̌' ∆_APPROX y̌, with y̌ = (y_1/π_1 · · · y_N/π_N)' and ∆_APPROX = diag(b) − bb'/Σ_{ℓ∈U} b_ℓ,

from which it is possible to derive an approximation of Π by Π_APPROX = ∆_APPROX + ππ'.
Hájek approximation 1

The most common value for b_k has been proposed by Hájek (1981):

b_k = π_k(1 − π_k) N/(N − 1)    (7.14)

(on this topic, see also Rosén, 1997; Tillé, 2001).
Example 24. With N = 6, n = 3, and π = (0.07, 0.17, 0.41, 0.61, 0.83, 0.91), we obtain

b1 = (0.07812, 0.16932, 0.29028, 0.28548, 0.16932, 0.09828),

Π_APPROX1 =
⎛ 0.077 0     0.008 0.022 0.046 0.057 ⎞
⎜ 0     0.172 0.025 0.059 0.115 0.139 ⎟
⎜ 0.008 0.025 0.381 0.174 0.295 0.347 ⎟
⎜ 0.022 0.059 0.174 0.583 0.462 0.529 ⎟
⎜ 0.046 0.115 0.295 0.462 0.832 0.740 ⎟
⎝ 0.057 0.139 0.347 0.529 0.740 0.918 ⎠

Note that the diagonal of Π_APPROX1 is not equal to the inclusion probabilities.
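Example 24 can be reproduced with a few lines of code. This is our sketch (function names are ours), writing the approximated operator as ∆_APPROX = diag(b) − bb'/Σb, a form that matches the matrix displayed above:

```python
import numpy as np

def hajek_b(pi):
    """Hajek's coefficients (7.14): b_k = pi_k (1 - pi_k) N/(N - 1)."""
    big_n = len(pi)
    return pi * (1 - pi) * big_n / (big_n - 1)

def pi_approx(pi, b):
    """Pi_APPROX = Delta_APPROX + pi pi', with Delta_APPROX written
    as diag(b) - b b' / sum(b) (our reading of the approximation)."""
    delta = np.diag(b) - np.outer(b, b) / b.sum()
    return delta + np.outer(pi, pi)
```

For the π of Example 24 this reproduces, after rounding, the vector b1 and the matrix Π_APPROX1 shown above, including the 0.077 in the upper-left corner instead of π_1 = 0.07.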
Fixed-point approximation
The general approximation given in (7.13) can also be written

var_APPROX(Ŷ_HT) = Σ_{k∈U} (b_k − b_k²/Σ_{ℓ∈U} b_ℓ)(y_k/π_k)² − ΣΣ_{k≠ℓ} (b_k b_ℓ / Σ_{j∈U} b_j)(y_k/π_k)(y_ℓ/π_ℓ).

The exact variance can be written

var(Ŷ_HT) = Σ_{k∈U} π_k(1 − π_k)(y_k/π_k)² + ΣΣ_{k≠ℓ} (π_kℓ − π_k π_ℓ)(y_k/π_k)(y_ℓ/π_ℓ).

Deville and Tillé (2005) proposed to compute b_k in such a way that the coefficients of the y_k²'s are exact, which amounts to solving the following equation system to find another approximation of b_k:

b_k − b_k² / Σ_{ℓ∈U} b_ℓ = π_k(1 − π_k).    (7.16)

In Matei and Tillé (2006), a large set of simulations shows that this approximation is very accurate. Because the equation system (7.16) is not linear, the coefficients b_k can be obtained by the fixed-point technique, using the following recurrence equation until convergence:
b_k^{(i)} = [b_k^{(i−1)}]² / Σ_{ℓ∈U} b_ℓ^{(i−1)} + π_k(1 − π_k), for i = 1, 2, 3, …,

with the initialization

b_k^{(0)} = π_k(1 − π_k) N/(N − 1), k ∈ U.
Unfortunately, Equation (7.16) has no solution unless (see Deville and Tillé, 2005)

π_k(1 − π_k) / Σ_{ℓ∈U} π_ℓ(1 − π_ℓ) ≤ 1/2, for all k in U.

If the method is not convergent, one can use the following variant, which consists of using one iteration:

b_k^{(1)} = π_k(1 − π_k) [ N π_k(1 − π_k) / ((N − 1) Σ_{ℓ∈U} π_ℓ(1 − π_ℓ)) + 1 ].
With the data of Example 24, the fixed-point method gives

b2 = (0.06922, 0.1643, 0.3431, 0.3335, 0.1643, 0.08866),

Π_APPROX2 =
⎛ 0.07  0.002 0.008 0.023 0.048 0.058 ⎞
⎜ 0.002 0.17  0.021 0.057 0.118 0.142 ⎟
⎜ 0.008 0.021 0.41  0.152 0.292 0.347 ⎟
⎜ 0.023 0.057 0.152 0.61  0.459 0.530 ⎟
⎜ 0.048 0.118 0.292 0.459 0.83  0.743 ⎟
⎝ 0.058 0.142 0.347 0.530 0.743 0.91  ⎠

Note that the diagonal of Π_APPROX2 is now equal to the inclusion probabilities.
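The fixed-point iteration itself is a few lines. The following sketch is ours (the function name is made up), started from Hájek's value and assuming the iteration converges, which it does whenever the condition on π_k(1 − π_k) stated above holds:

```python
import numpy as np

def fixed_point_b(pi, tol=1e-12, max_iter=1000):
    """Solve b_k - b_k^2 / sum(b) = pi_k (1 - pi_k), Equation (7.16),
    by fixed-point iteration from Hajek's initial value (7.14)."""
    big_n = len(pi)
    target = pi * (1 - pi)
    b = target * big_n / (big_n - 1)          # initialization b^(0)
    for _ in range(max_iter):
        b_new = b ** 2 / b.sum() + target     # recurrence step
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b
```

On the π of Example 24, the limit agrees with the vector b2 shown above and satisfies Equation (7.16) to machine precision.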
7.5.2 Estimators Based on Approximations
From the general approximation given in (7.13), it is possible to construct a class of estimators that depend only on the first-order inclusion probabilities π_k, k ∈ S. A general variance estimator can be derived (see Deville and Tillé, 2005; Tillé, 2001, p. 117):

var̂(Ŷ_HT) = Σ_{k∈S} c_k (y_k/π_k − B̂)²,    where B̂ = Σ_{ℓ∈S} c_ℓ y_ℓ/π_ℓ / Σ_{ℓ∈S} c_ℓ.    (7.19)

A large set of values has been proposed for c_k.
Estimator 1
A simple value for c_k is (see, for instance, Deville, 1993):

c_k = (1 − π_k) n/(n − 1).    (7.20)
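Assuming the general estimator (7.19) has the weighted-residual form var̂(Ŷ) = Σ_{k∈S} c_k (y_k/π_k − B̂)², an assumption on our part consistent with the coefficients discussed here, a sketch with Deville's simple weights is:

```python
import numpy as np

def approx_variance_estimator(y, pi):
    """Variance estimator with Deville's simple coefficients
    c_k = (1 - pi_k) n/(n - 1), Expression (7.20); y and pi contain
    the values observed on the sample S only."""
    n = len(y)
    c = (1 - pi) * n / (n - 1)
    y_check = y / pi                          # expanded values y_k/pi_k
    b_hat = np.sum(c * y_check) / np.sum(c)   # weighted mean of y_check
    return np.sum(c * (y_check - b_hat) ** 2)
```

As a sanity check, when y_k is proportional to π_k every expanded value y_k/π_k is constant, so the estimator is exactly zero, as it should be for a fixed-size design.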
Estimator 2 of Deville
In the same manuscript, Deville (1993) suggested a more complex value (see also Deville, 1999):
Estimator based on the approximation
When the b_k are defined by solving Equation (7.16), the following coefficient can be constructed:
In the case where the inclusion probabilities are equal, we obtain the variance of simple random sampling without replacement. Unfortunately, the inclusion probabilities must be known for the entire population in order to compute the c_k.
Fixed-point estimator
Deville and Tillé (2005) proposed to use the following development to derive a value for c_k. The estimator defined in Expression (7.19) can also be written as
Using Formula (2.18), we can look for the c_k that satisfy the equation

c_k − c_k² / Σ_{ℓ∈S} c_ℓ = 1 − π_k.    (7.23)

These coefficients can be obtained by the fixed-point technique, using the following recurrence equation until convergence:
c_k^{(i)} = [c_k^{(i−1)}]² / Σ_{ℓ∈S} c_ℓ^{(i−1)} + (1 − π_k), for i = 1, 2, 3, …,

and using the initialization

c_k^{(0)} = (1 − π_k) n/(n − 1), k ∈ S.
Unfortunately, Equation (7.23) has no solution unless the following inequality is satisfied (see Deville and Tillé, 2005):

(1 − π_k) / Σ_{ℓ∈S} (1 − π_ℓ) ≤ 1/2, for all k in S.

If the method is not convergent, one can use the previous variant, which uses one iteration:

c_k^{(1)} = (1 − π_k) [ n(1 − π_k) / ((n − 1) Σ_{ℓ∈S} (1 − π_ℓ)) + 1 ].
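The iteration for the c_k mirrors the one for the b_k, but runs over the sample only. A sketch (our code, made-up function name), assuming the condition above guarantees convergence:

```python
import numpy as np

def fixed_point_c(pi_s, tol=1e-12, max_iter=1000):
    """Solve c_k - c_k^2 / sum(c) = 1 - pi_k, Equation (7.23), for
    the inclusion probabilities pi_s observed on the sample S."""
    n = len(pi_s)
    target = 1 - pi_s
    c = target * n / (n - 1)                  # c^(0), Expression (7.20)
    for _ in range(max_iter):
        c_new = c ** 2 / c.sum() + target     # recurrence step
        if np.max(np.abs(c_new - c)) < tol:
            return c_new
        c = c_new
    return c
```

At convergence, every c_k exceeds 1 − π_k, since c_k = (1 − π_k) + c_k²/Σc with a nonnegative correction term.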
It is difficult to conclude that one particular estimator is better than the others. Nevertheless, a large set of simulations (see Deville, 1993; Brewer and Donadio, 2003; Matei and Tillé, 2006; Brewer, 2002, Chapter 9) shows that the estimator based on the coefficients given in Expression (7.20) gives very satisfactory results and is almost always better than the Horvitz-Thompson and the Sen-Yates-Grundy estimators.
7.6 Comparisons of Methods with Unequal Probabilities
There exist a large number of sampling methods with unequal probabilities and fixed sample size. When the same inclusion probabilities are used, none of the methods without replacement provides a better accuracy than the other ones, as shown in the following result.
Result 33. Let p1(·) and p2(·) be two sampling designs without replacement with the same first-order inclusion probabilities π_1, …, π_N, and with variance-covariance operators denoted ∆1 and ∆2. If, for some u ∈ R^N,

u'∆1u < u'∆2u,

then there exists at least one v ∈ R^N such that v'∆1v > v'∆2v.
Proof (by contradiction, as given by Lionel Qualité).
Suppose that u'∆1u < u'∆2u for some u, and that x'∆1x ≤ x'∆2x for all x ∈ R^N. It follows that x'(∆2 − ∆1)x ≥ 0 for all x; the matrix ∆2 − ∆1 is thus positive semi-definite. Now, p1(·) and p2(·) have the same inclusion probabilities, which implies that ∆1 and ∆2 have the same trace; thus trace(∆2 − ∆1) = 0. Because ∆2 − ∆1 is positive semi-definite with null trace, all its eigenvalues are null, so x'(∆2 − ∆1)x = 0 for all x, which contradicts u'∆1u < u'∆2u. □

Result 33 is quite disappointing because it does not allow choosing a sampling design. The variances of different sampling designs, however, can be compared with each other. An interesting indicator consists of computing the largest possible deviation (LPD) of variance between two sampling designs:
LPD(2|1) = max_{u : u'π = 0} [u'∆2u / u'∆1u] − 1.

Note that the LPD is equal to 0 if ∆1 is equal to ∆2, but the LPD is not symmetrical; that is, LPD(1|2) ≠ LPD(2|1). Practically, LPD(2|1) can be computed as the largest eigenvalue of the matrix ∆1⁺∆2 minus one, where ∆1⁺ is the Moore-Penrose inverse of ∆1.
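This eigenvalue computation can be sketched in a few lines (our code; the function name is made up). We assume here that the constraint u'π = 0 is absorbed by the null space of ∆1, which holds for fixed-size designs since the rows of ∆1 then sum to zero:

```python
import numpy as np

def lpd(delta1, delta2):
    """LPD(2|1) computed as the largest eigenvalue of
    pinv(Delta1) @ Delta2, minus one.  Delta1 is singular for
    fixed-size designs, hence the Moore-Penrose inverse."""
    m = np.linalg.pinv(delta1) @ delta2
    return float(np.max(np.linalg.eigvals(m).real)) - 1.0
```

As a check, lpd(∆, ∆) = 0 for any design, and doubling ∆2 shifts the LPD to 1, consistent with the ratio definition above.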
Example 26. The LPD has been computed to compare several sampling designs for which it is possible to compute the matrix of joint inclusion probabilities:
• The CPS design (see Expression (5.26), page 86),
• Sampford’s method (see Expression (7.12), page 134),
• Tillé's eliminatory method (see Expression (6.6), page 116),
• The random systematic method (see Expression (7.3), page 128).
These four matrices of inclusion probabilities are next compared to
• The Hájek approximation (Approx 1) (see Expression (7.15), page 139),
• The fixed-point approximation (Approx 2) (see Expression (7.18), page 140).
Table 7.4 contains the LPD computed among the four sampling designs and the two approximations.
Figure 7.3 contains a multidimensional scaling of the matrix of the LPD given in Table 7.4. First, Table 7.4 and its transpose are added in order to obtain a symmetric matrix. Next, the optimal Euclidian representation in R² is obtained by multidimensional scaling.
Although the inclusion probabilities are very uneven, Table 7.4 shows that the CPS design and the Sampford design are very close and are well approximated by the fixed-point approximation (Approx 2). For instance, the use of the fixed-point approximation (Approx 2) in place of the true variance for a CPS provides a maximum overestimation of the variance equal to 8.5% and a maximum underestimation of 14.5%. For the standard deviation, the maximum overestimation is 4.1% and the maximum underestimation is 7.0%.
Table 7.4. Largest possible deviation for the CPS, Sampford, Tillé, and random systematic designs and two approximations of ∆ for Example 26; for instance, column 1 and row 2 contain LPD(CPS|Sampford)

           CPS   Sampford Tillé Systematic Approx 1 Approx 2
CPS        0     0.032    0.157 0.385      0.134    0.085
Sampford   0.040 0        0.122 0.439      0.125    0.052
Tillé      0.374 0.322    0     0.876      0.228    0.204
Systematic 0.694 0.747    0.958 0          0.773    0.837
Approx 1   0.240 0.214    0.190 0.583      0        0.175
Approx 2   0.142 0.099    0.069 0.574      0.123    0
Fig. 7.3. Multidimensional scaling (MDS) of the symmetrized matrix of the LPD from Example 26
Example 27. With another vector of inclusion probabilities,

π = (0.15, 0.3, 0.35, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75),

N = 10 and n = 5, Table 7.5 contains the matrix of the LPD. The vector π is less uneven, and the LPDs are thus further reduced.
Table 7.5. Largest possible deviation for the CPS, Sampford, Tillé, and random systematic designs and two approximations of ∆ for Example 27

           CPS   Sampford Tillé Systematic Approx 1 Approx 2
CPS        0     0.003    0.111 0.166      0.048    0.007
Sampford   0.010 0        0.108 0.177      0.047    0.004
Tillé      0.306 0.293    0     0.511      0.294    0.277
Systematic 0.141 0.144    0.151 0          0.151    0.147
Approx 1   0.023 0.019    0.079 0.189      0        0.018
Approx 2   0.024 0.013    0.105 0.191      0.046    0
Figure 7.4 contains a multidimensional scaling of the symmetrized matrix of the LPD given in Table 7.5. The optimal Euclidian representation in R² is obtained by multidimensional scaling. Although the distances are smaller, the configuration is very similar to that of the previous example.
Fig. 7.4. Multidimensional scaling (MDS) of the symmetrized matrix of the LPD for Example 27
7.7 Choice of a Method of Sampling with Unequal Probabilities
Which sampling method with unequal probabilities should one choose? We advocate the CPS design or the Sampford procedure. The CPS method is exponential and is linked with the Poisson sampling design and multinomial sampling.
Sampford's method is obviously a good alternative that is simple to implement. Random systematic sampling is already very much used due to its extreme simplicity, but its variance is less well approximated and estimated. Nevertheless, it is now possible to select balanced samples with unequal probabilities (see Chapter 8), which provides more accurate estimators.
8
Balanced Sampling
8.1 Introduction
A balanced sampling design has the important property that the Horvitz-Thompson estimators of the totals for a set of auxiliary variables are equal to the totals we want to estimate. Therefore, the variances of all the variables of interest are reduced, depending on their correlations with the auxiliary variables. Yates (1949) had already insisted on the idea of respecting the means of known variables in probabilistic samples because the variance is then reduced. Yates (1946) and Neyman (1934) described methods of balanced sampling limited to one variable and to equal inclusion probabilities.
Royall and Herson (1973a) stressed the importance of balancing a sample for protecting inference against a misspecified model. They called this idea
‘robustness’. Since no method existed for achieving a multivariate balanced sample, they proposed the use of simple random sampling, which is mean-balanced with large samples. More recently, several partial solutions were proposed by Deville et al. (1988), Deville (1992), Ardilly (1991), and Hedayat and Majumdar (1995). Valliant et al. (2000) surveyed some existing methods.
In this chapter, after surveying the most usual methods for equal inclusion probabilities, we present the cube method proposed by Deville and Tillé (2004, 2005) that allows selecting balanced samples with equal or unequal inclusion probabilities.
8.2 The Problem of Balanced Sampling
8.2.1 Definition of Balanced Sampling

The aim is always to estimate the total

Y = Σ_{k∈U} y_k.
Suppose also that the vectors of values

x_k = (x_k1 · · · x_kj · · · x_kp)'

taken by p auxiliary variables are known for all units of the population. The vector of population totals of the balancing variables is thus also known:

X = Σ_{k∈U} x_k,

and can be estimated by

X̂_HT = Σ_{k∈U} x_k S_k / π_k.
The aim is to construct a balanced sampling design, defined as follows.
Definition 53. A sampling design p(s) is said to be balanced with respect to the auxiliary variables x_1, …, x_p if and only if it satisfies the balancing equations given by

X̂_HT = X,    (8.1)

which can also be written

Σ_{k∈U} x_kj S_k / π_k = Σ_{k∈U} x_kj,

for all s ∈ S such that p(s) > 0 and for all j = 1, …, p; or, in other words, var(X̂_HT) = 0.
Balanced sampling can thus be viewed as a restriction on the support. Indeed, only the samples that satisfy the balancing equations have a strictly positive probability; that is, the support is

Q = { s ∈ S : Σ_{k∈U} x_k s_k / π_k = X }.
8.2.2 Particular Cases of Balanced Sampling
Some particular cases of balanced sampling are well-known. Consider the following two particular cases.
Particular case 1: Fixed sample size

A sampling design of fixed sample size n is balanced on the variable x_k = π_k, k ∈ U, because

Σ_{k∈U} x_k S_k / π_k = Σ_{k∈U} S_k = n = Σ_{k∈U} π_k = Σ_{k∈U} x_k.
Remark 12. In sampling with unequal probabilities, when all the inclusion probabilities are different, the Horvitz-Thompson estimator of the population size N is generally random:
N̂_HT = Σ_{k∈U} S_k / π_k.

When the population size is known before selecting the sample, it could be important to select a sample such that

Σ_{k∈U} S_k / π_k = N.    (8.2)

Equation (8.2) is a balancing equation in which the balancing variable is x_k = 1, k ∈ U. We show that balancing equation (8.2) can be satisfied by means of the cube method.
Particular case 2: Stratification

Suppose that the population can be split into H nonoverlapping groups (or strata), denoted U_h, h = 1, …, H, of sizes N_h, h = 1, …, H. A sampling design is said to be stratified if a sample is selected in each stratum with simple random sampling of fixed sample sizes n_1, …, n_h, …, n_H, the samples being selected independently in each stratum.
Now, let δ_k1, …, δ_kH, where

δ_kh = 1 if k ∈ U_h, and 0 if k ∉ U_h.

A stratified design is balanced on the variables δ_k1, …, δ_kH; that is,

Σ_{k∈U} S_k δ_kh / π_k = Σ_{k∈U} δ_kh = N_h,

for h = 1, …, H. In other words, the Horvitz-Thompson estimators of the N_h's are exactly equal to the N_h. Stratification is thus a particular case of balanced sampling.
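This property is easy to verify numerically. The sketch below is ours (function name made up): it draws a stratified simple random sample, with units numbered consecutively stratum by stratum, and one can then check that the Horvitz-Thompson estimator of each N_h equals N_h exactly, whatever the realized sample:

```python
import random

def stratified_srs(strata_sizes, sample_sizes, rng=random):
    """Stratified sampling: in stratum h, a simple random sample of
    n_h of the N_h units, drawn independently across strata.  Units
    are numbered consecutively stratum by stratum.  Returns the
    sample and the inclusion probabilities pi_k = n_h / N_h."""
    sample, pi, start = [], [], 0
    for n_h, big_n_h in zip(sample_sizes, strata_sizes):
        sample += rng.sample(range(start, start + big_n_h), n_h)
        pi += [n_h / big_n_h] * big_n_h
        start += big_n_h
    return sorted(sample), pi
```

Within stratum h, each sampled unit contributes 1/π_k = N_h/n_h to the estimator, and exactly n_h units are sampled, so the estimate of N_h is N_h with zero variance, as the balancing equations require.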
8.2.3 The Rounding Problem
In most cases, the balancing equations (8.1) cannot be exactly satisfied, as Example 28 shows.
Example 28. Suppose that N = 10, n = 7, π_k = 7/10, k ∈ U, and the only auxiliary variable is x_k = k, k ∈ U. Then a balanced sample satisfies

Σ_{k∈U} k S_k / π_k = Σ_{k∈U} k,

so that Σ_{k∈U} k S_k has to be equal to 55 × 7/10 = 38.5, which is impossible because 38.5 is not an integer. The problem arises because 1/π_k is not an integer and the population size is small.
The rounding problem also exists for simple random sampling and stratification. If the sum of the inclusion probabilities is not an integer, it is not possible to select a simple random sample of fixed sample size. In proportional stratification, all the inclusion probabilities are equal; in this case, it is generally impossible to define integer-valued sample sizes in the strata in such a way that the inclusion probabilities are equal across strata.
The rounding problem exists in almost all practical cases of balanced sampling. For this reason, our objective is to construct a sampling design that satisfies the balancing equations (8.1) exactly if possible, and that finds the best approximation if this cannot be achieved. The rounding problem becomes negligible when the expected sample size is large.
8.3 Procedures for Equal Probabilities
8.3.1 Neyman’s Method
Neyman (1934, p. 572) (see also Thionet, 1953, pp. 203-207) was probably the first author to propose a sampling procedure based on the use of small strata:

Thus, we see that the method of purposive selection consists, (a) in dividing the population of districts into second-order strata according to values of y and x and (b) in selecting randomly from each stratum a definite number of districts. The number of samplings is determined by the condition of maintenance of the weighted average of the y.
8.3.2 Yates’s Procedure
Yates (1946, p. 16) also proposed a method based on the random substitution of units:

A random sample is first selected (stratified or other factors if required). Further members are then selected by the same random process, the first member being compared with the first member of the original sample, the second with the second member, and so on, the new member being substituted for the original member if balancing is thereby improved.
The Neyman and Yates procedures are thus limited to equal probability sampling with only one balancing variable.
8.3.3 Deville, Grosbras, and Roth Procedure
Deville et al. (1988) proposed a method for the selection of samples balanced on several variables with equal inclusion probabilities. This method is based on the construction of groups according to the means of all the auxiliary variables. If p is not too large, we can construct 2^p groups according to the possible vectors of signs

sign(x_k1 − X̄_1 · · · x_kj − X̄_j · · · x_kp − X̄_p),

where

X̄_j = (1/N) Σ_{k∈U} x_kj, j = 1, …, p.

The objective of Algorithm 8.1 is to select a sample balanced on (X̄_1 · · · X̄_j · · · X̄_p).
Unfortunately, Algorithm 8.1 provides an equal probability sampling design. The inclusion probabilities are approximately satisfied. Moreover, the