

From Springer Series in Statistics (pages 173–183)

$$\sum_{k\in U} \alpha_k = 1.$$

The bound for the estimation of the mean can thus be written:

$$\frac{\left|\hat{X}_{jHT} - X_j\right|}{N} \le \frac{p}{nN} \max_{k\in U}\left|\frac{x_{kj}}{\alpha_k} - X_j\right| = O(p/n),$$

where $O(1/n)$ is a quantity that remains bounded when multiplied by $n$. The bound thus very quickly becomes negligible if the sample size is large with respect to the number of balancing variables.

For comparison, note that with a single-stage sampling design such as simple random sampling or Bernoulli sampling, we generally have

$$\frac{\left|\hat{X}_{jHT} - X_j\right|}{N} = O_p(1/\sqrt{n})$$

(see, for example, Rosén, 1972; Isaki and Fuller, 1982).

Despite the overstatement of the bound, the gain obtained by balanced sampling is very important. The rate of convergence is much faster for balanced sampling than for a usual sampling design. In practice, except in the case of very small sample sizes, the rounding problem is thus negligible. Furthermore, the rounding problem also arises in stratification with very small sample sizes. In addition, this bound corresponds to the "worst case," whereas the landing phase is used to find the best one.

8.7 Application of the Cube Method to Particular Cases

8.7.1 Simple Random Sampling

Simple random sampling is a particular case of the cube method. Suppose that $\boldsymbol{\pi} = (n/N \; \cdots \; n/N \; \cdots \; n/N)'$ and that the balancing variable is $x_k = n/N$, $k \in U$. We thus have $\mathbf{A} = (1 \; \cdots \; 1) \in \mathbb{R}^N$ and

$$\operatorname{Ker}\mathbf{A} = \left\{ \mathbf{v} \in \mathbb{R}^N \;\Big|\; \sum_{k\in U} v_k = 0 \right\}.$$

There are at least three ways to select a simple random sample without replacement.

1. The first way consists of beginning the first step by using

$$\mathbf{u}(1) = \left( \frac{N-1}{N},\; -\frac{1}{N},\; \cdots,\; -\frac{1}{N} \right)'.$$

Then, $\lambda_1(1) = (N-n)/(N-1)$, $\lambda_2(1) = n/(N-1)$, and

$$\boldsymbol{\pi}(1) = \begin{cases} \left(1,\; \dfrac{n-1}{N-1},\; \cdots,\; \dfrac{n-1}{N-1}\right)' & \text{with probability } q_1(1) \\[2mm] \left(0,\; \dfrac{n}{N-1},\; \cdots,\; \dfrac{n}{N-1}\right)' & \text{with probability } q_2(1), \end{cases}$$

where $q_1(1) = \pi_1 = n/N$ and $q_2(1) = 1 - \pi_1 = (N-n)/N$. This first step corresponds exactly to the selection-rejection method for SRSWOR described in Algorithm 4.3, page 48.
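The selection-rejection method that this first step reproduces can be sketched directly. The following is a minimal illustration, not the book's code; the function name and interface are ours.

```python
import random

def selection_rejection(N, n, rng=random):
    """Draw an SRSWOR of size n from {0, ..., N-1}: examine the units in
    order and keep unit k with probability (n - j) / (N - k), where j is
    the number of units already kept (cf. Algorithm 4.3)."""
    sample, j = [], 0
    for k in range(N):
        if rng.random() < (n - j) / (N - k):
            sample.append(k)
            j += 1
    return sample
```

Each run returns exactly $n$ distinct units, and every unit has inclusion probability $n/N$.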

2. The second way consists of sorting the data randomly before applying the cube method with any vectors $\mathbf{v}(t)$. Indeed, any choice of $\mathbf{v}(t)$ provides a fixed-size sampling with inclusion probabilities $\pi_k = n/N$. A random sort applied before any equal-probability sampling provides a simple random sampling (see Algorithm 4.5, page 50).

3. The third way consists of using a random vector $\mathbf{v} = (v_k)$, where the $v_k$ are $N$ independent identically distributed variables. Next, this vector is projected onto $\operatorname{Ker}\mathbf{A}$, which gives

$$u_k = v_k - \frac{1}{N} \sum_{\ell\in U} v_\ell.$$

Note that, for such $v_k$, it is obvious that a preliminary sorting of the data will not change the sampling design, which is thus a simple random sampling design.
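The projection in the third way is just a centering of the i.i.d. vector, as this small sketch (names ours, standard normal components assumed for illustration) shows:

```python
import random

def centered_direction(N, rng=random):
    """Project an i.i.d. vector v onto Ker A = {v : sum(v) = 0} by
    subtracting the mean: u_k = v_k - (1/N) * sum(v)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(N)]
    vbar = sum(v) / N
    return [vk - vbar for vk in v]
```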

An interesting problem occurs when the design has equal inclusion probabilities $\pi_k = \pi$, $k \in U$, such that $N\pi$ is not an integer. If the only constraint is a fixed sample size, that is, $x_k = 1$, $k \in U$, then the balancing equation can only be approximately satisfied. Nevertheless, the flight phase of the cube method works until $N - p = N - 1$ elements of $\boldsymbol{\pi}(N-1)$ are integers. The landing phase consists of randomly deciding whether the last unit is drawn. The sample size is therefore equal to one of the two integers nearest to $N\pi$.

8.7.2 Stratification

Stratification can be achieved by taking $x_{kh} = \delta_{kh}\, n_h/N_h$, $h = 1, \ldots, H$, where $N_h$ is the size of stratum $U_h$, $n_h$ is the sample stratum size, and

$$\delta_{kh} = \begin{cases} 1 & \text{if } k \in U_h \\ 0 & \text{if } k \notin U_h. \end{cases}$$

In the first step, we use

$$u_k(1) = v_k(1) - \frac{1}{N_h} \sum_{\ell\in U_h} v_\ell(1), \quad k \in U_h.$$

The three strategies described in Section 8.7.1 for simple random sampling allow us to obtain directly a stratified random sample with simple random sampling within the strata. If the sums of the inclusion probabilities are not integers within the strata, the cube method randomly rounds the sample sizes of the strata so as to ensure that the given inclusion probabilities are exactly satisfied.

The interesting aspect of the cube method is that the stratification can be generalized to overlapping strata, which can be called "quota random design" or "cross-stratification". Suppose that two stratification variables are available, for example, in a business survey, "activity sector" and "region". The strata defined by the first variable are denoted by $U_{h.}$, $h = 1, \ldots, H$, and the strata defined by the second variable are denoted by $U_{.i}$, $i = 1, \ldots, K$.

Next, define the $p = H + K$ balancing variables

$$x_{kj} = \pi_k \times \begin{cases} I\left[k \in U_{j.}\right] & j = 1, \ldots, H \\ I\left[k \in U_{.(j-H)}\right] & j = H+1, \ldots, H+K, \end{cases}$$

where $I[\cdot]$ is an indicator variable that takes the value 1 if the condition is true and 0 otherwise. The sample can now be selected directly by means of the cube method. The generalization to a multiple quota random design follows immediately. It can be shown (Deville and Tillé, 2000) that the quota random design can be exactly satisfied.
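The $p = H + K$ balancing variables can be assembled mechanically from the two stratum labels. The sketch below is our own illustration; it assumes hypothetical integer labels $0, \ldots, H-1$ and $0, \ldots, K-1$ for "activity sector" and "region".

```python
def quota_balancing_variables(pik, sector, region, H, K):
    """Build x_kj = pi_k * I[k in U_j.] for j < H and
    x_kj = pi_k * I[k in U_.(j-H)] for j >= H (one row per unit k)."""
    X = []
    for k, p in enumerate(pik):
        row = [p if sector[k] == h else 0.0 for h in range(H)]
        row += [p if region[k] == i else 0.0 for i in range(K)]
        X.append(row)
    return X

# Balancing on these variables fixes (up to rounding) the sample count of
# each row stratum and each column stratum: the j-th balancing total is
# the sum of pi_k over the corresponding stratum.
```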

8.7.3 Unequal Probability Sampling with Fixed Sample Size

The unequal inclusion probability problem can be solved by means of the cube method. Suppose that the objective is to select a sample of fixed size $n$ with inclusion probabilities $\pi_k$, $k \in U$, such that $\sum_{k\in U} \pi_k = n$. In this case, the only balancing variable is $x_k = \pi_k$. In order to satisfy this constraint, we must have

$$\mathbf{u} \in \operatorname{Ker}\mathbf{A} = \left\{ \mathbf{v} \in \mathbb{R}^N \;\Big|\; \sum_{k\in U} v_k = 0 \right\},$$

and thus

$$\sum_{k\in U} u_k(t) = 0. \qquad (8.11)$$

Each choice, random or not, of vectors $\mathbf{u}(t)$ that satisfy (8.11) produces another unequal probability sampling method. Nearly all existing methods, except the rejective ones and the variations of systematic sampling, can easily be expressed by means of the cube method. In this case, the cube method is identical to the splitting method based on the choice of a direction described in Section 6.2.3, page 102.
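One splitting/flight-phase step under the fixed-size constraint can be sketched as follows. This is a simplified illustration of our own, assuming no component of $\boldsymbol{\pi}$ is yet 0 or 1:

```python
import random

def flight_step(pik, u, rng=random):
    """One cube/splitting step for a direction u with sum(u) = 0, so that
    sum(pik) = n is preserved.  lambda1 and lambda2 are the largest steps
    keeping every component of pik in [0, 1]; the choice
    q1 = lambda2 / (lambda1 + lambda2) preserves E[pik] (martingale)."""
    lam1 = min((1.0 - p) / uk if uk > 0 else p / (-uk)
               for p, uk in zip(pik, u) if uk != 0)
    lam2 = min(p / uk if uk > 0 else (1.0 - p) / (-uk)
               for p, uk in zip(pik, u) if uk != 0)
    if rng.random() < lam2 / (lam1 + lam2):
        return [p + lam1 * uk for p, uk in zip(pik, u)]
    return [p - lam2 * uk for p, uk in zip(pik, u)]
```

After each step, at least one more component is rounded to 0 or 1 while $\sum_{k} \pi_k = n$ is kept exactly.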

The techniques of unequal probability sampling can always be improved.

Indeed, in all the available unequal probability sampling methods with fixed sample size, the design is only balanced on a single variable. Nevertheless, two balancing variables are always available, namely, $x_{k1} = \pi_k$, $k \in U$, and $x_{k2} = 1$, $k \in U$. The first variable implies a fixed sample size and the second one implies that

$$\hat{N}_{HT} = \sum_{k\in U} \frac{S_k}{\pi_k} = N.$$

In all methods, the sample is balanced on $x_{k1}$ but not on $x_{k2}$. The balanced cube method allows us to satisfy both constraints approximately.


8.8 Variance Approximations in Balanced Sampling

8.8.1 Construction of an Approximation

The variance of the Horvitz-Thompson estimator is

$$\operatorname{var}\left(\hat{Y}_{HT}\right) = \sum_{k\in U}\sum_{\ell\in U} \frac{y_k}{\pi_k}\frac{y_\ell}{\pi_\ell}\,\Delta_{k\ell} = \check{\mathbf{y}}'\boldsymbol{\Delta}\check{\mathbf{y}}, \qquad (8.12)$$

where $\check{y}_k = y_k/\pi_k$, $\Delta_{k\ell} = \pi_{k\ell} - \pi_k\pi_\ell$, and $\boldsymbol{\Delta} = [\Delta_{k\ell}]$. Matrix $\boldsymbol{\Delta}$ is called the variance-covariance operator. Thus, the variance of $\hat{Y}_{HT}$ can theoretically be expressed and estimated by using the joint inclusion probabilities. Unfortunately, even in very simple cases like fixed sample sizes, the computation of the $\pi_{k\ell}$ is practically impossible.

Deville and Tillé (2005) have proposed approximating the variance by supposing that the balanced sampling can be viewed as a conditional Poisson sampling. A similar idea was also developed by Hájek (1981, p. 26; see also Section 7.5.1, page 139) for sampling with unequal probabilities and fixed sample size. In the case of Poisson sampling, which is a sampling design with no balancing variables, the variance of $\hat{Y}_{HT}$ is easy to derive and can be estimated because only first-order inclusion probabilities are needed. If $\tilde{S}$ is the random sample selected by a Poisson sampling design and $\tilde{\pi}_k$, $k \in U$, are the first-order inclusion probabilities of the Poisson design, then

$$\operatorname{var}_{\text{POISSON}}\left(\hat{Y}_{HT}\right) = \sum_{k\in U} \frac{y_k^2}{\pi_k^2}\, \tilde{\pi}_k (1 - \tilde{\pi}_k). \qquad (8.13)$$

Note that Expression (8.13) contains $\pi_k$ and $\tilde{\pi}_k$ because the variance of the usual estimator (a function of the $\pi_k$'s) is computed under Poisson sampling (a function of the $\tilde{\pi}_k$'s). The $\pi_k$'s are always known, but the $\tilde{\pi}_k$'s are not necessarily known.
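Expression (8.13) is a plain weighted sum and is straightforward to compute; a minimal sketch (the function name is ours):

```python
def var_poisson(y, pik, pik_tilde):
    """Poisson variance (8.13) of the HT estimator built with the pi_k's,
    evaluated under a Poisson design with probabilities pi~_k."""
    return sum((yk / pk) ** 2 * pt * (1.0 - pt)
               for yk, pk, pt in zip(y, pik, pik_tilde))
```

When $\tilde{\pi}_k = \pi_k$, this reduces to $\sum_k y_k^2 (1-\pi_k)/\pi_k$, the usual Poisson variance.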

If we suppose that, under Poisson sampling, the vector $(\hat{Y}_{HT} \;\; \hat{\mathbf{X}}'_{HT})'$ has approximately a multinormal distribution, we obtain

$$\operatorname{var}_{\text{POISSON}}\left(\hat{Y}_{HT} \,\Big|\, \hat{\mathbf{X}}_{HT} = \mathbf{X}\right) = \operatorname{var}_{\text{POISSON}}\left(\hat{Y}_{HT}\right) - \operatorname{cov}\left(\hat{Y}_{HT}, \hat{\mathbf{X}}_{HT}\right)' \left[\operatorname{var}\left(\hat{\mathbf{X}}_{HT}\right)\right]^{-1} \operatorname{cov}\left(\hat{\mathbf{X}}_{HT}, \hat{Y}_{HT}\right), \qquad (8.14)$$

where again $\pi_k$ and $\tilde{\pi}_k$ both appear because we compute the variance of the usual Horvitz-Thompson estimator (function of $\pi_k$) under the Poisson sampling design (function of $\tilde{\pi}_k$).

If $b_k = \tilde{\pi}_k(1 - \tilde{\pi}_k)$, Expression (8.14) can also be written

$$\operatorname{var}_{\text{APPROX}}\left(\hat{Y}_{HT}\right) = \sum_{k\in U} b_k \left(\check{y}_k - \check{y}_k^*\right)^2, \qquad (8.15)$$

where

$$\check{y}_k^* = \check{\mathbf{x}}'_k \left( \sum_{\ell\in U} b_\ell \check{\mathbf{x}}_\ell \check{\mathbf{x}}'_\ell \right)^{-1} \sum_{\ell\in U} b_\ell \check{\mathbf{x}}_\ell \check{y}_\ell.$$

When the only balancing variable is $x_k = \pi_k$, balanced sampling amounts to sampling with unequal probabilities and fixed sample size. The approximation of variance given in (8.15) is then equal to the approximation given in Expression (7.13), page 138. In this case, $\check{y}_k^*$ is simply the mean of the $\check{y}_k$'s with the weights $b_k$.
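Approximation (8.15) is a $b$-weighted regression residual sum of squares, which can be sketched as follows (a helper of our own, using NumPy):

```python
import numpy as np

def var_approx(y, pik, X, b):
    """Approximation (8.15): regress ycheck = y/pi on xcheck = x/pi with
    weights b, then sum the b-weighted squared residuals."""
    pik = np.asarray(pik, float)
    b = np.asarray(b, float)
    yc = np.asarray(y, float) / pik                 # ycheck_k
    Xc = np.asarray(X, float) / pik[:, None]        # xcheck_k
    A = Xc.T @ (b[:, None] * Xc)                    # sum_l b_l xcheck xcheck'
    beta = np.linalg.solve(A, Xc.T @ (b * yc))
    return float(np.sum(b * (yc - Xc @ beta) ** 2))
```

With stratum indicators as balancing variables and constant weights within strata, $\check{y}_k^*$ reduces to the per-stratum mean of the $\check{y}_k$'s, as noted above.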

The weights $b_k$ unfortunately are unknown because they depend on the $\tilde{\pi}_k$'s, which are not exactly equal to the $\pi_k$'s. We thus propose to approximate the $b_k$'s. Note that Expression (8.15) can also be written

$$\operatorname{var}_{\text{APPROX}}\left(\hat{Y}_{HT}\right) = \check{\mathbf{y}}'\boldsymbol{\Delta}_{\text{APPROX}}\check{\mathbf{y}},$$

where $\boldsymbol{\Delta}_{\text{APPROX}} = [\Delta_{k\ell}^{\text{app}}]$ is the approximated variance-covariance operator and

$$\Delta_{k\ell}^{\text{app}} = b_k \delta_{k\ell} - b_k \check{\mathbf{x}}'_k \left( \sum_{i\in U} b_i \check{\mathbf{x}}_i \check{\mathbf{x}}'_i \right)^{-1} \check{\mathbf{x}}_\ell b_\ell, \qquad (8.16)$$

with $\delta_{k\ell} = 1$ if $k = \ell$ and 0 otherwise. Four variance approximations can be obtained by various definitions of the $b_k$'s. These four definitions are denoted $b_{k1}, b_{k2}, b_{k3}$, and $b_{k4}$ and permit the definition of four variance approximations denoted $V_\alpha$, $\alpha = 1, 2, 3, 4$, and four variance-covariance operators denoted $\boldsymbol{\Delta}_\alpha$, $\alpha = 1, 2, 3, 4$, by replacing $b_k$ in (8.15) and (8.16) with, respectively, $b_{k1}, b_{k2}, b_{k3}$, and $b_{k4}$.

1. The first approximation is obtained by considering that, at least for large sample sizes, $\pi_k \approx \tilde{\pi}_k$, $k \in U$. Thus, we take $b_{k1} = \pi_k(1 - \pi_k)$.

2. The second approximation is obtained by applying a correction for the loss of degrees of freedom:

$$b_{k2} = \pi_k(1 - \pi_k) \frac{N}{N - p}.$$

This correction allows obtaining the exact expression for simple random sampling with fixed sample size.

3. The third approximation is derived from the fact that the diagonal elements of the variance-covariance operator of the true variance are always known and are equal to $\pi_k(1 - \pi_k)$. Thus, by defining

$$b_{k3} = \pi_k(1 - \pi_k) \frac{\operatorname{trace}\boldsymbol{\Delta}}{\operatorname{trace}\boldsymbol{\Delta}_1},$$

we can define the approximated variance-covariance operator $\boldsymbol{\Delta}_3$ that has the same trace as $\boldsymbol{\Delta}$.

4. Finally, the fourth approximation is derived from the fact that the diagonal elements of $\boldsymbol{\Delta}_{\text{APPROX}}$ can be computed and are given in (8.16). The $b_{k4}$'s are constructed in such a way that $\Delta_{kk} = \Delta_{kk}^{\text{app}}$, or in other words, that

$$\pi_k(1 - \pi_k) = b_k - b_k \check{\mathbf{x}}'_k \left( \sum_{\ell\in U} b_\ell \check{\mathbf{x}}_\ell \check{\mathbf{x}}'_\ell \right)^{-1} \check{\mathbf{x}}_k b_k, \quad k \in U. \qquad (8.17)$$

The determination of the $b_{k4}$'s then requires the resolution of a nonlinear equation system. This fourth approximation is the only one that provides the exact variance expression for stratification.

In Deville and Tillé (2005), a set of simulations is presented which shows that $b_{k4}$ is indisputably the most accurate approximation.
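System (8.17) can be attacked with a simple fixed-point iteration, $b \leftarrow \pi(1-\pi) + b^2\, \check{\mathbf{x}}'(\sum_\ell b_\ell \check{\mathbf{x}}_\ell\check{\mathbf{x}}'_\ell)^{-1}\check{\mathbf{x}}$. This scheme is our own suggestion, not the book's, and is only a sketch for well-behaved cases:

```python
import numpy as np

def solve_bk4(pik, X, n_iter=200):
    """Iterate b <- pi(1-pi) + b_k^2 * xcheck_k' A(b)^{-1} xcheck_k,
    whose fixed point satisfies (8.17); started from b_k1 = pi(1-pi)."""
    pik = np.asarray(pik, float)
    Xc = np.asarray(X, float) / pik[:, None]
    b = pik * (1.0 - pik)
    for _ in range(n_iter):
        A = Xc.T @ (b[:, None] * Xc)
        h = np.einsum('ki,ij,kj->k', Xc, np.linalg.inv(A), Xc)
        b = pik * (1.0 - pik) + b ** 2 * h
    return b
```

For stratification this converges to $b_{k4} = n_h(N_h-n_h)/(N_h(N_h-1))$, the value that makes the approximation exact.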

8.8.2 Application of the Variance Approximation to Stratification

Suppose that the sampling design is stratified; that is, the population can be split into $H$ nonoverlapping strata denoted $U_h$, $h = 1, \ldots, H$, of sizes $N_h$, $h = 1, \ldots, H$. The balancing variables are

$$x_{k1} = \delta_{k1}, \ldots, x_{kH} = \delta_{kH},$$

where

$$\delta_{kh} = \begin{cases} 1 & \text{if } k \in U_h \\ 0 & \text{if } k \notin U_h. \end{cases}$$

If a simple random sample is selected in each stratum with sizes $n_1, \ldots, n_H$, then the variance can be computed exactly:

$$\operatorname{var}\left(\hat{Y}_{HT}\right) = \sum_{h=1}^{H} N_h^2\, \frac{N_h - n_h}{N_h}\, \frac{V_{yh}^2}{n_h},$$

where $V_{yh}^2$ is the population variance of $y$ in stratum $U_h$:

$$V_{yh}^2 = \frac{1}{N_h - 1} \sum_{k\in U_h} \left( y_k - \bar{Y}_h \right)^2, \quad \text{with } \bar{Y}_h = \frac{1}{N_h} \sum_{k\in U_h} y_k.$$

It is thus interesting to compute the four approximations given in Section 8.8.1 in this particular case.

1. The first approximation gives

$$b_{k1} = \pi_k(1 - \pi_k) = \frac{n_h(N_h - n_h)}{N_h^2}, \quad k \in U_h.$$

2. The second approximation gives

$$b_{k2} = \frac{n_h(N_h - n_h)}{N_h^2} \frac{N}{N - H}.$$

3. The third approximation gives

$$b_{k3} = \pi_k(1 - \pi_k) \frac{\operatorname{trace}\boldsymbol{\Delta}}{\operatorname{trace}\boldsymbol{\Delta}_1} = \frac{n_h(N_h - n_h)}{N_h^2} \times \frac{\sum_{i=1}^{H} n_i(N_i - n_i)/N_i}{\sum_{i=1}^{H} (N_i - 1)\, n_i(N_i - n_i)/N_i^2}.$$

4. The fourth approximation gives

$$b_{k4} = \frac{n_h(N_h - n_h)}{N_h(N_h - 1)}, \quad k \in U_h.$$

Although the differences between $\operatorname{var}_{\text{APPROX1}}$, $\operatorname{var}_{\text{APPROX2}}$, $\operatorname{var}_{\text{APPROX3}}$, and $\operatorname{var}_{\text{APPROX4}}$ are small relative to the population size, $\operatorname{var}_{\text{APPROX4}}$ is the only approximation that gives the exact variance of a stratified sampling design.

8.9 Variance Estimation

8.9.1 Construction of an Estimator of Variance

Because Expression (8.15) is a function of totals, we can substitute each total with its Horvitz-Thompson estimator (see, for instance, Deville, 1999) in order to obtain an estimator of (8.15). The resulting estimator is:

$$\widehat{\operatorname{var}}\left(\hat{Y}_{HT}\right) = \sum_{k\in S} c_k \left( \check{y}_k - \hat{\check{y}}_k^* \right)^2, \qquad (8.18)$$

where

$$\hat{\check{y}}_k^* = \check{\mathbf{x}}'_k \left( \sum_{\ell\in S} c_\ell \check{\mathbf{x}}_\ell \check{\mathbf{x}}'_\ell \right)^{-1} \sum_{\ell\in S} c_\ell \check{\mathbf{x}}_\ell \check{y}_\ell$$

is the estimator of the regression predictor of $\check{y}_k$.

Note that (8.18) can also be written

$$\widehat{\operatorname{var}}\left(\hat{Y}_{HT}\right) = \check{\mathbf{y}}'\mathbf{D}\check{\mathbf{y}},$$

where $\mathbf{D} = [D_{k\ell}]$. Five definitions of the $c_k$'s allow defining five variance estimators by replacing $c_k$ in Expression (8.18) with, respectively, $c_{k1}, c_{k2}, c_{k3}, c_{k4}$, and $c_{k5}$.

1. The first estimator is obtained by taking $c_{k1} = (1 - \pi_k)$.

2. The second estimator is obtained by applying a correction for the loss of degrees of freedom:

$$c_{k2} = (1 - \pi_k) \frac{n}{n - p}.$$

This correction for the loss of degrees of freedom gives the unbiased estimator in simple random sampling with fixed sample size.

3. The third estimator is derived from the fact that the diagonal elements $\Delta_{kk}/\pi_k$ of the true matrix are always known and are equal to $1 - \pi_k$. Thus, we can use

$$c_{k3} = (1 - \pi_k) \frac{\sum_{k\in U} (1 - \pi_k) S_k}{\sum_{k\in U} D_{kk1} S_k},$$

where $D_{kk1}$ is obtained by plugging $c_{k1}$ into $D_{kk}$.

4. The fourth estimator can be derived from the $b_{k4}$'s obtained by solving the equation system (8.17):

$$c_{k4} = \frac{b_{k4}}{\pi_k}\, \frac{n}{n - p}\, \frac{N - p}{N}.$$

5. Finally, the fifth estimator is derived from the fact that the diagonal elements $D_{kk}$ are known. The $c_{k5}$'s are constructed in such a way that

$$1 - \pi_k = D_{kk}, \quad k \in U, \qquad (8.19)$$

or, in other words, that

$$1 - \pi_k = c_k - c_k \check{\mathbf{x}}'_k \left( \sum_{i\in U} c_i S_i \check{\mathbf{x}}_i \check{\mathbf{x}}'_i \right)^{-1} \check{\mathbf{x}}_k c_k, \quad k \in U.$$

A necessary condition for the existence of a solution to equation system (8.19) is that

$$\max_k \frac{1 - \pi_k}{\sum_{i\in U} S_i (1 - \pi_i)} < \frac{1}{2}.$$
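This necessary condition is cheap to check on the sampled units before attempting to solve (8.19); a small helper of our own:

```python
def ck5_condition(pik_sample):
    """Check max_k (1 - pi_k) / sum_{i in S} (1 - pi_i) < 1/2 over the
    sampled units, a necessary condition for system (8.19) to be solvable."""
    r = [1.0 - p for p in pik_sample]
    return max(r) / sum(r) < 0.5
```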

The choice of the weights $c_k$ is tricky. Although they are very similar, an evaluation by means of a set of simulations should still be run.

8.9.2 Application to Stratification of the Estimators of Variance

The case of stratification is interesting because the unbiased estimator of variance in a stratified sampling design (with a simple random sampling in each stratum) is known and is equal to

$$\widehat{\operatorname{var}}\left(\hat{Y}_{HT}\right) = \sum_{h=1}^{H} N_h^2\, \frac{N_h - n_h}{N_h}\, \frac{v_{yh}^2}{n_h},$$

where $v_{yh}^2$ is the sample variance of $y$ within stratum $h$.

It is thus interesting to compute the five estimators in the stratification case.

1. The first estimator gives

$$c_{k1} = (1 - \pi_k) = \frac{N_h - n_h}{N_h}, \quad k \in U_h.$$

2. The second estimator gives

$$c_{k2} = \frac{N_h - n_h}{N_h} \frac{n}{n - H}.$$

3. The third estimator gives

$$c_{k3} = \frac{N_h - n_h}{N_h} \times \frac{\sum_{i=1}^{H} n_i(N_i - n_i)/N_i}{\sum_{i=1}^{H} (n_i - 1)(N_i - n_i)/N_i}.$$

4. The fourth estimator gives

$$c_{k4} = \frac{N_h - n_h}{N_h - 1}\, \frac{n}{n - H}\, \frac{N - H}{N}, \quad k \in U_h.$$

5. The fifth estimator gives

$$c_{k5} = \frac{n_h(N_h - n_h)}{(n_h - 1)N_h}, \quad k \in U_h.$$

The five estimators are very similar, but $\widehat{\operatorname{var}}_{\text{APPROX5}}$ is the only one that coincides with the classical unbiased variance estimator of a stratified sampling design.
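A quick numerical check that $c_{k5}$ reproduces the classical stratified estimator; the function below implements (8.18) on the sampled units (helper names and the toy data are ours):

```python
import numpy as np

def var_est(y_s, pik_s, X_s, c):
    """Estimator (8.18): c-weighted regression of ycheck on xcheck over
    the sample, then the c-weighted sum of squared residuals."""
    pik_s = np.asarray(pik_s, float)
    c = np.asarray(c, float)
    yc = np.asarray(y_s, float) / pik_s
    Xc = np.asarray(X_s, float) / pik_s[:, None]
    A = Xc.T @ (c[:, None] * Xc)
    beta = np.linalg.solve(A, Xc.T @ (c * yc))
    return float(np.sum(c * (yc - Xc @ beta) ** 2))

# Two strata, (N1, n1) = (4, 2) and (N2, n2) = (3, 2), sampled y values;
# c_k5 = n_h (N_h - n_h) / ((n_h - 1) N_h) within each stratum.
y_s = [1.0, 3.0, 2.0, 5.0]
pik_s = [0.5, 0.5, 2 / 3, 2 / 3]
X_s = [[1, 0], [1, 0], [0, 1], [0, 1]]
c5 = [1.0, 1.0, 2 / 3, 2 / 3]
# Classical estimator: sum_h N_h^2 (N_h - n_h)/N_h * v_yh^2 / n_h
#                    = 8 + 6.75 = 14.75, matched by var_est with c5.
```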
