Deville, Grosbras, and Roth balanced procedure for equal

1. Initialization. First, assume by convention that the population mean X = ( ¯X₁ · · · X¯j · · · X¯p) is a null vector ofR^p. Split the population intoGgroups, whereG≥pandpis the number of balancing variables. LetXg, g= 1, . . . , G be the vectors ofR^p containing the means of thegth group. The groups must be constructed in such a way that the sizesNg of the groups are nearly equal.

Next, computeng=nNg/N.Note thatng is not necessarily integer.

2. Rounding the sample group size. Theng’s are rounded by means of an unequal probability algorithm (see Section 5.6 and Chapters 6 and 7) with ﬁxed sample size and without replacement. Select a sample denoted (I₁ · · · Ig · · · IG) from population {1, . . . , g, . . . , G} with unequal inclusion probabilities (n₁ − n₁ · · · ng− ng · · · nG− nG),wherengis the greatest integer number less than or equal to ng. The rounded group sizes are mg = ng+Ig, g = 1, . . . , G.

3. Computation of the sample mean. Sett= 1 and ¯x(1) =

gµgx¯g.

4. Computation of ν. Search a vector ν = (ν₁ · · · νg · · · νG) by solving the following program

minimize G g=1

ν_g² Ng, subject to

G g=1

νg= 0 and G g=1

νgXg=−n¯x(t).

5. Rounding ν. The νg will be rounded by means of an unequal probability al-gorithm (see Section 5.6 and Chapters 6 and 7) with ﬁxed sample size and without replacement. Select a sample denoted (I₁ · · · Ig · · · IG) from popula-tion{1, . . . , g, . . . , G}with unequal inclusion probabilities (ν₁− ν₁ · · · νg− νg · · · νG− νG).The roundedνg areµg=ng+Ig, g= 1, . . . , G.

6. Stopping rule. If

g|µg| ≥A,retain the sample and stop; otherwise sett=t+1 and computeA=

g|µg|.

7. Updating the sample. Ifµg >0 drawµg units from the sample by simple random sampling and drop them. Ifµg <0 draw|µg|new units from groupgby simple random sampling and add them to the sample.

8. Computation of the sample mean. Compute the new sample mean ¯x(t) and Goto Step4.

an aﬃne subspace Qin R^N of dimensionN −p. Note that Q=π+ KerA, where

KerA=

v∈R^N|Av=0

is the kernel of thep×N matrixA given byA= (ˇx₁ · · · xˇ_k · · · xˇ_N). The main idea in obtaining a balanced sample is to choose a vertex of theN-cube that remains in the subspaceQor nearQif that is not possible.

IfC= [0,1]^N denotes theN-cube inR^N whose vertices are the samples of U, the intersection betweenCandQis nonempty becauseπis in the interior

8.4 Cube Representation of Balanced Samples 153 of C and belongs toQ. The intersection between anN-cube and a subspace deﬁnes a convex polytope K=C∩Qwhich has a dimensionN−pbecause it is the intersection of anN-cube and a plane of dimensionN−pthat has a point in the interior ofC.

8.4.2 Geometrical Representation of the Rounding Problem

The fact that the balancing equation system is exactly, approximately, or sometimes satisﬁed depends on the values of ˇx_k and X.

Deﬁnition 54.LetD be a convex polytope. A vertex, or extremal point, ofD is deﬁned as a point that cannot be written as a convex linear combination of the other points ofD. The set of all the vertices ofD is denoted byExt(D).

Deﬁnition 55.A samplesis said to be exactly balanced if s∈Ext(C)∩Q.

Note that a necessary condition for ﬁnding an exactly balanced sample is that Ext(C)∩Q=∅.

Definition 56.A balancing equation system is (i) Exactly satisfied if Ext(C)∩Q= Ext(C∩Q), (ii)Approximately satisfied ifExt(C)∩Q=∅,

(iii) Sometimes satisﬁed if Ext(C)∩Q= Ext(C∩Q)andExt(C)∩Q=∅.

A balancing system can thus be exactly satisﬁed if the extremal point of the intersection of the cube and of the subspace Qare also vertices of the cube.

This case is rather exceptional. However, the following result shows that the rounding problem concerns only a limited number of units.

Result 34.If r= [r_k] is a vertex ofK=C∩Q then card{k|0< r_k<1} ≤p,

where p is the number of auxiliary variables and card(U) denotes the cardi-nality of U.

Proof. LetA^∗ be the matrixA= (ˇx₁ · · · ˇx_k · · · xˇ_N) restricted to the non-integer components of vector r; that is, restricted to U^∗ ={k|0 < r_k <1}. If q= card(U^∗)> p, then KerA^∗ has dimension q−p >0, and ris not an

extreme point ofK. 2

The following three examples show that the rounding problem can be viewed geometrically. Indeed, the balancing equations cannot be exactly sat-isﬁed when the vertices of Kare not vertices ofC; that is, whenq >0.

Example 29.In Figure 8.1, a sampling design in a population of size N = 3 is considered. The only constraint consists of ﬁxing the sample size n = 2, and thus p= 1 and x_k =π_k, k ∈U. The inclusion probabilities satisfy π₁+ π₂+π₃= 2,so that the balancing equation is exactly satisﬁed. We thus have A= (1,1,1), Aπ= 2, Q={g₁, g₂, g₃ ∈R|g₁+g₂+g₃ = 2}, Ext(Q∩C) = {(1,1,0),(1,0,1),(0,1,1)},and Ext(C)∩Q={(1,1,0),(1,0,1),(0,1,1)}.

(000) (100)

(101) (001)

(010) (110)

(111) (011)

Fig. 8.1. Fixed size constraint of Example 29: all the vertices ofK are vertices of the cube

Example 30.Figure 8.2 exempliﬁes the case where the constraint subspace does not pass through any vertices of the cube. The inclusion probabilities are π₁=π₂=π₃= 0.5.The only constraint,p= 1, is given by the auxiliary variablex₁= 0, x₂= 6π₂,andx₃= 4π₃.It is then impossible to satisfy exactly the balancing equation; the balancing equation is always approximately satis-ﬁed. We thus have A= (0,6,4),Aπ= 5,Q={g₁, g₂, g₃∈R|6g₂+ 4g₃= 5}, Ext(Q∩C) ={(0,5/6,0),(0,1/6,1),(1,5/6,0),(1,1/6,0)},and Ext(C)∩Q=

∅.

(000) (100)

(101) (001)

(010) (110)

(111) (011)

Fig. 8.2.Example 30: none of vertices ofK are vertices of the cube

8.5 Enumerative Algorithm 155 Example 31.Figure 8.3 exempliﬁes the case where the constraint subspace passes through two vertices of the cube but one vertex of the intersection is not a vertex of the cube. The inclusion probabilities areπ₁=π₂ =π₃= 0.8.

The only constraint,p= 1,is given by the auxiliary variablex₁=π₁, x₂= 3π₂ andx₃=π₃.The balancing equation is only sometimes satisﬁed. In this case, there exist balanced samples, but there does not exist an exactly balanced sampling design for the given inclusion probabilities. In other words, although exact balanced samples exist, one must accept selecting only approximately balanced samples in order to satisfy the given inclusion probabilities. We thus have A = (1,3,1), Aπ = 4, Q = {g₁, g₂, g₃ ∈ R|g₁+ 3g₂+g₃ = 4}, Ext(Q∩C) ={(1,1,0),(0,1,1),(1,2/3,1)}, Ext(C)∩Q={(1,1,0),(0,1,1)}.

(000) (100)

(101) (001)

(010)

(110) (111) (011)

Fig. 8.3. Example 31: some vertices of K are vertices of the cube and others are not

8.5 Enumerative Algorithm

A first way to find a balanced sampling design is to define a cost function Cost(s) for all possible samples of S. The choice of the cost function is an arbitrary decision that depends on the priorities of the survey manager. The cost must be such that

• Cost(s)>0,for alls∈ S.

• Cost(s) = 0,ifsis balanced.

A simple cost could be deﬁned by the sum of squares function

Cost₁(s) = p j=1

X!_jHT(s)−X_j ₂

X_j² ,

where X!_jHT(s) is the value taken byX!_jHT on sample s. Instead, we might take

Cost₂(s) = (s−π^∗)A(AA)⁻¹A(s−π^∗),

where A= (ˇx₁ · · · xˇ_k · · · xˇ_N). The choice of Cost₂(.) has a natural inter-pretation as a distance in R^N,as shown by the following result.

Result 35.The square of the distance between a sample sand its Euclidean projection onto the constraint subspace is given by

Cost₂(s) = (s−π)A(AA)⁻¹A(s−π). (8.4) Proof. The projection of a samplesonto the constraint subspace is

s−A(AA)⁻¹A(s−π).

The square of the Euclidean distance betweensand its projection is thus (s−π)A(AA)⁻¹A(s−π) = (s−π^∗+π^∗−π)A(AA)⁻¹A(s−π^∗+π^∗−π) and, becauseA(π−π^∗) =0,(8.4) follows directly. 2

The sample can then be selected by means of Algorithm 8.2.

Algorithm 8.2Balanced sampling by linear programming

Dans le document Springer Series in Statistics (Page 159-163)