• Aucun résultat trouvé

Deville, Grosbras, and Roth balanced procedure for equal

Dans le document Springer Series in Statistics (Page 159-163)

1. Initialization. First, assume by convention that the population mean X = ( ¯X1 · · · X¯j · · · X¯p) is a null vector ofRp. Split the population intoGgroups, whereG≥pandpis the number of balancing variables. LetXg, g= 1, . . . , G be the vectors ofRp containing the means of thegth group. The groups must be constructed in such a way that the sizesNg of the groups are nearly equal.

Next, computeng=nNg/N.Note thatng is not necessarily integer.

2. Rounding the sample group size. Theng’s are rounded by means of an unequal probability algorithm (see Section 5.6 and Chapters 6 and 7) with fixed sample size and without replacement. Select a sample denoted (I1 · · · Ig · · · IG) from population {1, . . . , g, . . . , G} with unequal inclusion probabilities (n1 n1 · · · ng− ng · · · nG− nG),wherengis the greatest integer number less than or equal to ng. The rounded group sizes are mg = ng+Ig, g = 1, . . . , G.

3. Computation of the sample mean. Sett= 1 and ¯x(1) =

gµgx¯g.

4. Computation of ν. Search a vector ν = (ν1 · · · νg · · · νG) by solving the following program

minimize G g=1

νg2 Ng, subject to

G g=1

νg= 0 and G g=1

νgXg=−n¯x(t).

5. Rounding ν. The νg will be rounded by means of an unequal probability al-gorithm (see Section 5.6 and Chapters 6 and 7) with fixed sample size and without replacement. Select a sample denoted (I1 · · · Ig · · · IG) from popula-tion{1, . . . , g, . . . , G}with unequal inclusion probabilities (ν1− ν1 · · · νg νg · · · νG− νG).The roundedνg areµg=ng+Ig, g= 1, . . . , G.

6. Stopping rule. If

gg| ≥A,retain the sample and stop; otherwise sett=t+1 and computeA=

gg|.

7. Updating the sample. Ifµg >0 drawµg units from the sample by simple random sampling and drop them. Ifµg <0 drawg|new units from groupgby simple random sampling and add them to the sample.

8. Computation of the sample mean. Compute the new sample mean ¯x(t) and Goto Step4.

an affine subspace Qin RN of dimensionN −p. Note that Q=π+ KerA, where

KerA=

vRN|Av=0

is the kernel of thep×N matrixA given byA= (ˇx1 · · · xˇk · · · xˇN). The main idea in obtaining a balanced sample is to choose a vertex of theN-cube that remains in the subspaceQor nearQif that is not possible.

IfC= [0,1]N denotes theN-cube inRN whose vertices are the samples of U, the intersection betweenCandQis nonempty becauseπis in the interior

8.4 Cube Representation of Balanced Samples 153 of C and belongs toQ. The intersection between anN-cube and a subspace defines a convex polytope K=C∩Qwhich has a dimensionN−pbecause it is the intersection of anN-cube and a plane of dimensionN−pthat has a point in the interior ofC.

8.4.2 Geometrical Representation of the Rounding Problem

The fact that the balancing equation system is exactly, approximately, or sometimes satisfied depends on the values of ˇxk and X.

Definition 54.LetD be a convex polytope. A vertex, or extremal point, ofD is defined as a point that cannot be written as a convex linear combination of the other points ofD. The set of all the vertices ofD is denoted byExt(D).

Definition 55.A samplesis said to be exactly balanced if sExt(C)∩Q.

Note that a necessary condition for finding an exactly balanced sample is that Ext(C)∩Q=∅.

Definition 56.A balancing equation system is (i) Exactly satisfied if Ext(C)∩Q= Ext(C∩Q), (ii)Approximately satisfied ifExt(C)∩Q=∅,

(iii) Sometimes satisfied if Ext(C)∩Q= Ext(C∩Q)andExt(C)∩Q=∅.

A balancing system can thus be exactly satisfied if the extremal point of the intersection of the cube and of the subspace Qare also vertices of the cube.

This case is rather exceptional. However, the following result shows that the rounding problem concerns only a limited number of units.

Result 34.If r= [rk] is a vertex ofK=C∩Q then card{k|0< rk<1} ≤p,

where p is the number of auxiliary variables and card(U) denotes the cardi-nality of U.

Proof. LetA be the matrixA= (ˇx1 · · · ˇxk · · · xˇN) restricted to the non-integer components of vector r; that is, restricted to U ={k|0 < rk <1}. If q= card(U)> p, then KerA has dimension q−p >0, and ris not an

extreme point ofK. 2

The following three examples show that the rounding problem can be viewed geometrically. Indeed, the balancing equations cannot be exactly sat-isfied when the vertices of Kare not vertices ofC; that is, whenq >0.

Example 29.In Figure 8.1, a sampling design in a population of size N = 3 is considered. The only constraint consists of fixing the sample size n = 2, and thus p= 1 and xk =πk, k ∈U. The inclusion probabilities satisfy π1+ π2+π3= 2,so that the balancing equation is exactly satisfied. We thus have A= (1,1,1), Aπ= 2, Q={g1, g2, g3 R|g1+g2+g3 = 2}, Ext(Q∩C) = {(1,1,0),(1,0,1),(0,1,1)},and Ext(C)∩Q={(1,1,0),(1,0,1),(0,1,1)}.

(000) (100)

(101) (001)

(010) (110)

(111) (011)

Fig. 8.1. Fixed size constraint of Example 29: all the vertices ofK are vertices of the cube

Example 30.Figure 8.2 exemplifies the case where the constraint subspace does not pass through any vertices of the cube. The inclusion probabilities are π1=π2=π3= 0.5.The only constraint,p= 1, is given by the auxiliary variablex1= 0, x2= 6π2,andx3= 4π3.It is then impossible to satisfy exactly the balancing equation; the balancing equation is always approximately satis-fied. We thus have A= (0,6,4),Aπ= 5,Q={g1, g2, g3R|6g2+ 4g3= 5}, Ext(Q∩C) ={(0,5/6,0),(0,1/6,1),(1,5/6,0),(1,1/6,0)},and Ext(C)∩Q=

∅.

(000) (100)

(101) (001)

(010) (110)

(111) (011)

Fig. 8.2.Example 30: none of vertices ofK are vertices of the cube

8.5 Enumerative Algorithm 155 Example 31.Figure 8.3 exemplifies the case where the constraint subspace passes through two vertices of the cube but one vertex of the intersection is not a vertex of the cube. The inclusion probabilities areπ1=π2 =π3= 0.8.

The only constraint,p= 1,is given by the auxiliary variablex1=π1, x2= 3π2 andx3=π3.The balancing equation is only sometimes satisfied. In this case, there exist balanced samples, but there does not exist an exactly balanced sampling design for the given inclusion probabilities. In other words, although exact balanced samples exist, one must accept selecting only approximately balanced samples in order to satisfy the given inclusion probabilities. We thus have A = (1,3,1), Aπ = 4, Q = {g1, g2, g3 R|g1+ 3g2+g3 = 4}, Ext(Q∩C) ={(1,1,0),(0,1,1),(1,2/3,1)}, Ext(C)∩Q={(1,1,0),(0,1,1)}.

(000) (100)

(101) (001)

(010)

(110) (111) (011)

Fig. 8.3. Example 31: some vertices of K are vertices of the cube and others are not

8.5 Enumerative Algorithm

A first way to find a balanced sampling design is to define a cost function Cost(s) for all possible samples of S. The choice of the cost function is an arbitrary decision that depends on the priorities of the survey manager. The cost must be such that

Cost(s)>0,for alls∈ S.

Cost(s) = 0,ifsis balanced.

A simple cost could be defined by the sum of squares function

Cost1(s) = p j=1

X!jHT(s)−Xj 2

Xj2 ,

where X!jHT(s) is the value taken byX!jHT on sample s. Instead, we might take

Cost2(s) = (sπ)A(AA)−1A(sπ),

where A= (ˇx1 · · · xˇk · · · xˇN). The choice of Cost2(.) has a natural inter-pretation as a distance in RN,as shown by the following result.

Result 35.The square of the distance between a sample sand its Euclidean projection onto the constraint subspace is given by

Cost2(s) = (sπ)A(AA)−1A(sπ). (8.4) Proof. The projection of a samplesonto the constraint subspace is

sA(AA)−1A(sπ).

The square of the Euclidean distance betweensand its projection is thus (s−π)A(AA)−1A(s−π) = (s−π+π−π)A(AA)−1A(s−π+π−π) and, becauseA(ππ) =0,(8.4) follows directly. 2

The sample can then be selected by means of Algorithm 8.2.

Algorithm 8.2Balanced sampling by linear programming

Dans le document Springer Series in Statistics (Page 159-163)