
Consider model (3.1). To compare the performance of our proposals with existing bias-calibration methods for estimating/predicting the Gini index, we run a series of simulations. We generate population data for $d = 40$ small areas, indexed by $j = 1, \dots, d$, each with population size $N_j = 300$.

Next, we draw a sample of size $n_j = 15$ from each area by simple random sampling without replacement (SRSWOR). The $x_{ij}$ (the only auxiliary variable) are i.i.d. with $x_{ij} \sim \text{logNorm}(\text{mean} = 1, \text{sd} = 0.5)$. The elements of the outcome vector $Y$ are generated as $y_{ij} = 100 + 5x_{ij} + u_j + \epsilon_{ij}$ for individual $i$ in area $j$. Since we want to study the effect of heavy-tailed and right-skewed distributions, the error terms $\epsilon_{ij}$ are generated from the skewed $t$ distribution with 3 degrees of freedom (St-3), with different degrees of skewness, to roughly represent the (expected) inequality in developing and developed countries.
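The data-generating process described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the code behind the reported results: the helper names `rskewed_t` and `generate_population` are hypothetical; `logNorm(mean = 1, sd = 0.5)` is read as log-scale parameters (as in R's `rlnorm`); and the skewed $t$ is drawn with the two-piece construction of Fernandez and Steel [1998], with the skewness parameter $k$ as their skewness factor. The errors are drawn raw, i.e. not centered.

```python
import numpy as np


def rskewed_t(size, nu, k, rng):
    """Fernandez-Steel skewed t draws (assumed construction; k > 0, k = 1 is symmetric).

    With probability k^2 / (1 + k^2) the draw is k * |T|, otherwise -|T| / k,
    where T follows a symmetric t distribution with nu degrees of freedom.
    """
    abs_t = np.abs(rng.standard_t(nu, size=size))
    positive = rng.random(size) < k**2 / (1.0 + k**2)
    return np.where(positive, k * abs_t, -abs_t / k)


def generate_population(d=40, N=300, nu=3, k=40.0, seed=0):
    """Hypothetical helper: one finite population of d areas with N units each."""
    rng = np.random.default_rng(seed)
    # Assumption: mean = 1 and sd = 0.5 are the log-scale (meanlog, sdlog) parameters.
    x = rng.lognormal(mean=1.0, sigma=0.5, size=(d, N))
    # Area random effects u_j ~ N(0, 1), one per area, broadcast over its units.
    u = rng.normal(0.0, 1.0, size=(d, 1))
    # Raw (non-centered) skewed t errors eps_ij.
    eps = rskewed_t((d, N), nu, k, rng)
    y = 100.0 + 5.0 * x + u + eps
    return x, y
```

Because the population is generated once per scenario, one call to `generate_population` with a fixed seed plays the role of one scenario's finite population.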

The scenarios are also designed to distinguish between the case in which the mean of the heavily skewed error terms is zero, referred to as centered error terms, and the case in which the mean is clearly away from zero, referred to as non-centered error terms⁶. The following scenarios are therefore created:

1. Centered error terms:

(a) $u_j \sim N(0,1)$ and $\epsilon_{ij} \sim St(3, k = 40)$, where $St$ denotes a skewed $t$ distribution with right skewness and $k$ is the skewness parameter introduced in Fernandez and Steel [1998], which takes values greater than 0; the symmetric $t$ distribution corresponds to $k = 1$.
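For the centered scenarios, the raw skewed $t$ errors have to be shifted by their mean. A minimal sketch, assuming the Fernandez and Steel parameterization with skewness factor $k$: their mean is $E|T|\,(k - 1/k)$, where $E|T| = 2\sqrt{\nu}\,\Gamma((\nu+1)/2) / (\sqrt{\pi}\,(\nu-1)\,\Gamma(\nu/2))$ for $T \sim t_\nu$. The helper names below are hypothetical.

```python
import math


def abs_t_mean(nu):
    """E|T| for T ~ t_nu; finite only for nu > 1."""
    if nu <= 1:
        raise ValueError("E|T| is infinite for nu <= 1")
    return (2.0 * math.sqrt(nu) * math.gamma((nu + 1) / 2)
            / (math.sqrt(math.pi) * (nu - 1) * math.gamma(nu / 2)))


def fs_skewed_t_mean(nu, k):
    """Mean of the Fernandez-Steel skewed t with skewness factor k > 0 (zero when k = 1)."""
    return abs_t_mean(nu) * (k - 1.0 / k)


def center_errors(errors, nu, k):
    """Shift raw skewed t draws so that their theoretical mean is zero."""
    shift = fs_skewed_t_mean(nu, k)
    return [e - shift for e in errors]
```

For $\nu = 3$ the formula reduces to $E|T| = 2\sqrt{3}/\pi$, so with strong right skewness (large $k$) the uncentered errors have a mean far from zero, which is what separates the non-centered scenarios from the centered ones.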

Here the data for the whole population are generated once under each scenario (design-based simulation), and then 100 samples are drawn at random, $h = 1, \dots, 100$.
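In this design-based setup only the sample indices change across replicates. A minimal sketch of the repeated SRSWOR draws, assuming NumPy; `draw_samples` is a hypothetical helper returning, for each replicate $h$ and area $j$, the indices of the $n_j = 15$ sampled units out of $N_j = 300$.

```python
import numpy as np


def draw_samples(d=40, N=300, n=15, n_reps=100, seed=0):
    """Indices of SRSWOR samples, shape (n_reps, d, n), drawn independently per area."""
    rng = np.random.default_rng(seed)
    samples = np.empty((n_reps, d, n), dtype=int)
    for h in range(n_reps):
        for j in range(d):
            # Simple random sampling without replacement within area j.
            samples[h, j] = rng.choice(N, size=n, replace=False)
    return samples
```

The fixed population is then indexed with `samples[h, j]` to obtain the observed units of area $j$ in replicate $h$; the remaining units form the unobserved part whose outcomes are predicted.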

We then consider the standard predictor (EBLUP), the robust predictor (REBLUP), REBLUP with symmetric (REBLUP-SBC) and asymmetric (REBLUP-ABC) bias calibration, and two robust bias-calibrated methods based on the linearized index: IF-SBC (symmetric calibration) and IF-ABC (asymmetric calibration). The Gini index of each area, $\mathrm{Gini}_j$, is then predicted with each of these techniques. The predicted values are compared with their true counterparts in relative terms, giving the relative prediction error in each sampling round $h$:

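$$\text{relative prediction error}\big(\mathrm{Gini}_j^{(h)}\big) = \frac{\text{predicted value}\big(\mathrm{Gini}_j^{(h)}\big) - \text{true value}\big(\mathrm{Gini}_j\big)}{\text{true value}\big(\mathrm{Gini}_j\big)}.$$

The expected value of these relative errors over the repeated sampling provides an estimate of the relative bias in each area:

$$\text{Relative Bias}_j = \frac{1}{100} \sum_{h=1}^{100} \text{relative prediction error}\big(\mathrm{Gini}_j^{(h)}\big).$$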
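The two formulas above can be computed as follows. This is a minimal sketch, assuming NumPy; the Gini estimator used here, $\sum_i (2i - n - 1)\,x_{(i)} / (n \sum_i x_i)$ over the sorted values, is one common formula and is an assumption, since the chapter does not state which version it uses. The helper names are hypothetical.

```python
import numpy as np


def gini(values):
    """Gini index from sorted values: sum_i (2i - n - 1) x_(i) / (n * sum x)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * np.sum(x))


def relative_errors(predicted, true_value):
    """Relative prediction errors (predicted - true) / true, one per replicate h."""
    return (np.asarray(predicted, dtype=float) - true_value) / true_value


def relative_bias(predicted, true_value):
    """Average of the relative prediction errors over the replicates."""
    return float(np.mean(relative_errors(predicted, true_value)))
```

In the simulation, `predicted` would hold the 100 predicted Gini values of one area and `true_value` the Gini index of that area's full population.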

⁶ In these simulations we keep the distribution of the random effects as $u_j \sim N(0,1)$. We have also performed our analysis for scenarios where the random effects come from a heavy-tailed distribution.

However, this deviation from the model does not seem to have much negative impact on the estimation of the areas' Gini index, see McCulloch and Neuhaus [2011]. In that paper the authors argue that in a simple linear mixed model with only a random intercept, where there is no association between the random effects, the error term and the covariates, and the cluster sizes are not informative, misspecifying the shape of the random-effects distribution introduces no or negligible bias in the estimation of the model parameters and random components.

⁷ In these extreme cases we must acknowledge that even the basic assumptions needed for a consistent estimate of the model parameters are not satisfied, e.g. $E(\psi(\epsilon_{ij})) = 0$. However, we emphasize that the focus of this chapter is mainly on the prediction of area-specific non-linear parameters, rather than on consistent estimation of the model parameters. The novelty of this proposal lies in the bias-calibration techniques, which can be adapted to any appropriate robust estimation technique.

The prediction of the Gini index is generally downward biased in small samples. An indirect estimate of the Gini index, obtained by replacing the unobserved outcomes with their predictions, suffers from the same problem. This is expected, since the predicted outcomes (for the unobserved part of the population) vary less than the true outcomes.

Using the predictive distribution instead of the point prediction, together with skewed truncation, corrects this bias to some extent.
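The mechanism in the two paragraphs above can be seen in a deliberately simple toy setting, which is not the chapter's estimator and omits skewed truncation: outcomes are a common mean of 100 plus normal noise (an assumption chosen for illustration). Plugging in the point prediction for every unit removes all dispersion and drives the Gini index to zero, while plugging in draws from the predictive distribution (here the known error law) recovers a Gini index close to the true one.

```python
import numpy as np


def gini(values):
    """Gini index from sorted values: sum_i (2i - n - 1) x_(i) / (n * sum x)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * np.sum(x))


rng = np.random.default_rng(42)
n_units, sigma = 100_000, 20.0

# Toy "true" outcomes: common mean 100 plus unit-level noise.
y_true = 100.0 + rng.normal(0.0, sigma, n_units)
# Point prediction: every unit receives the fitted mean, so the spread is lost.
y_point = np.full(n_units, 100.0)
# Predictive draws: fitted mean plus a fresh draw from the (assumed known) error law.
y_draw = 100.0 + rng.normal(0.0, sigma, n_units)

g_true, g_point, g_draw = gini(y_true), gini(y_point), gini(y_draw)
print(round(g_true, 3), round(g_point, 3), round(g_draw, 3))
```

In this toy case the true Gini index is close to $2\sigma/(\sqrt{\pi}\cdot 2 \cdot 100) \approx 0.113$, the point-prediction Gini is zero, and the predictive-draw Gini lands near the true value.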

In addition, to illustrate the efficiency gain of our proposed method, we compare the root mean squared relative prediction errors (RRMSE): we average the squared relative prediction errors over the repeated samples and take the square root of this average.

$$\text{RRMSE}_j = \sqrt{\frac{1}{100} \sum_{h=1}^{100} \Big[\text{relative prediction error}\big(\mathrm{Gini}_j^{(h)}\big)\Big]^2}.$$
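Both per-area measures, and the medians over areas that are later summarized, can be computed in one pass. A minimal sketch, assuming NumPy and a hypothetical layout where `pred` has shape $(H, d)$ (replicates by areas) and `truth` has shape $(d,)$; `area_summaries` is a hypothetical helper.

```python
import numpy as np


def area_summaries(pred, truth):
    """Per-area Relative Bias and RRMSE, plus their medians over the d areas."""
    pred = np.asarray(pred, dtype=float)    # shape (H, d): replicate h, area j
    truth = np.asarray(truth, dtype=float)  # shape (d,): true Gini of each area
    rel = (pred - truth) / truth            # relative prediction errors, broadcast over rows
    rel_bias = rel.mean(axis=0)
    rrmse = np.sqrt((rel ** 2).mean(axis=0))
    return rel_bias, rrmse, float(np.median(rel_bias)), float(np.median(rrmse))
```

A useful identity when reading the results: $\text{RRMSE}_j^2 = \text{Relative Bias}_j^2 + \text{Var}_h(\text{relative prediction error})$, with the variance taken with divisor 100, so a method can trade a larger bias for a smaller spread and still attain a lower RRMSE.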

In what follows we show the Relative Bias and RRMSE of the 40 areas as box plots for each bias-calibrated method, namely REBLUP-SBC, REBLUP-ABC, IF-SBC and IF-ABC, under the 6 scenarios introduced before.

To summarize the results, Table 3.1 reports the median over the 40 areas of the Relative Bias and of the RRMSE, under each scenario and for each technique.

From the box plots above and the summary in Table 3.1, we see that the asymmetric calibration methods outperform their symmetrically calibrated counterparts. Among the four bias-calibrated methods, IF-ABC gives the best results in terms of Relative Bias, whereas REBLUP-ABC attains the minimum RRMSE in all scenarios. This is not surprising, since linearization with the IF already performs an implicit bias correction. With the desired bias/variance trade-off in mind, one can choose between the two approaches.

Table 3.1: Median of the areas' Relative Bias and RRMSE for REBLUP and the different calibration methods (i.e. partial calibration).

Centered error terms

                     Scenario (1.a)      Scenario (1.b)      Scenario (1.c)
median(true Gini)    0.20                0.34                0.49
Method               Rel. Bias  RRMSE    Rel. Bias  RRMSE    Rel. Bias  RRMSE
REBLUP               -0.816     0.816    -0.893     0.894    -0.925     0.926
REBLUP-SBC           -0.136     0.229    -0.119     0.200    -0.097     0.171
REBLUP-ABC           -0.131     0.198    -0.068     0.165    -0.061     0.159
MQ-SBC               -0.132     0.231    -0.115     0.204    -0.094     0.173
MQ-ABC               -0.110     0.220    -0.087     0.181    -0.052     0.161
IF-SBC               -0.146     0.339    -0.097     0.290    -0.076     0.275
IF-ABC               -0.023     0.210    -0.059     0.218    -0.058     0.239

Non-centered error terms

                     Scenario (2.a)      Scenario (2.b)      Scenario (2.c)
median(true Gini)    0.20                0.30                0.40
Method               Rel. Bias  RRMSE    Rel. Bias  RRMSE    Rel. Bias  RRMSE
REBLUP               -0.898     0.899    -0.956     0.956    -0.971     0.971
REBLUP-SBC           -0.139     0.235    -0.129     0.215    -0.116     0.192
REBLUP-ABC           -0.136     0.206    -0.084     0.174    -0.063     0.164
MQ-SBC               -0.135     0.238    -0.124     0.218    -0.111     0.195
MQ-ABC               -0.109     0.232    -0.130     0.208    -0.069     0.172
IF-SBC               -0.143     0.357    -0.117     0.315    -0.100     0.289
IF-ABC               -0.027     0.213    -0.050     0.208    -0.039     0.229

Table 3.1 also summarizes the results for calibration based on MQ, an alternative robust method to REBLUP.
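The comparative statements about the four bias-calibrated methods can be checked directly against the reported numbers. The sketch below only re-tabulates values copied from Table 3.1 (no new results); `TABLE` and `best_method` are hypothetical names, and "best" means the smallest absolute Relative Bias or the smallest RRMSE.

```python
# (Rel. Bias, RRMSE) per scenario 1.a, 1.b, 1.c, 2.a, 2.b, 2.c, copied from Table 3.1.
TABLE = {
    "REBLUP-SBC": [(-0.136, 0.229), (-0.119, 0.200), (-0.097, 0.171),
                   (-0.139, 0.235), (-0.129, 0.215), (-0.116, 0.192)],
    "REBLUP-ABC": [(-0.131, 0.198), (-0.068, 0.165), (-0.061, 0.159),
                   (-0.136, 0.206), (-0.084, 0.174), (-0.063, 0.164)],
    "IF-SBC": [(-0.146, 0.339), (-0.097, 0.290), (-0.076, 0.275),
               (-0.143, 0.357), (-0.117, 0.315), (-0.100, 0.289)],
    "IF-ABC": [(-0.023, 0.210), (-0.059, 0.218), (-0.058, 0.239),
               (-0.027, 0.213), (-0.050, 0.208), (-0.039, 0.229)],
}
SCENARIOS = ["1.a", "1.b", "1.c", "2.a", "2.b", "2.c"]


def best_method(scenario_idx, metric):
    """metric 0 = |Rel. Bias|, metric 1 = RRMSE; returns the method with the smallest value."""
    return min(TABLE, key=lambda m: abs(TABLE[m][scenario_idx][metric]))


for s, name in enumerate(SCENARIOS):
    print(name, best_method(s, 0), best_method(s, 1))
```

Across all six scenarios this selects IF-ABC for Relative Bias and REBLUP-ABC for RRMSE, and each asymmetric variant beats its symmetric counterpart on both measures.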

3.5 Choice of tuning parameters via estimating the