
Choice of tuning parameters via estimating the MSE

The main task that remains is to find the optimum values of the two tuning constants, namely $c$ and $\gamma$, in formula (3.10) for the asymmetric calibration. We consider the best choice of the tuning constants to be the one that minimizes the MSE or, in our case, the RRMSE for each area. Therefore, we first need an estimate of the RRMSE that depends on the choice of these two constants, and we then find their optimum values by minimizing the RRMSE estimates (see footnote 8). Analytical formulas for estimating the MSE of linear population parameters have been proposed in the existing literature, either by first-order Taylor expansion (Prasad and Rao [1990]), by defining the estimator as a pseudo-linear parameter (Chandra et al. [2011]), or by other approximation methods (Chambers et al. [2014]). However, no general closed form exists for non-linear population parameters. In the SAE literature it is very common to use resampling methods to estimate the MSE of small area parameters, especially when a closed-form formula or an appropriate approximation does not exist; see, among many others, Hall and Maiti [2006b], Hall and Maiti [2006a], and Pfeffermann and Correa [2012]. The bootstrap procedures proposed in the SAE literature take the dependence structure of the outcome into account and can use complex settings to achieve higher-order asymptotic accuracy. Among the non-parametric bootstrap techniques, the (x,y)-pair bootstrap is not suitable because of the clustering of the data, whereas the block residual bootstrap can perform fairly well if the shrinkage factor is considered when drawing the random effects.

Here, we propose a non-parametric empirical bootstrap method that is not computationally expensive and can help us determine the tuning constants. Since we focus on the conditional distribution for each area (see equation 3.11), we sample only from the error terms and not from the random effects: once the random effects are predicted, we treat them as fixed. This disregards the between-area variation and leads to an underestimation of the MSE. However, our aim is not to provide a precise estimate of the MSE or the RRMSE per se, but to find the optimum constants. We therefore only compare the optimum constants obtained from the “true” values of the RRMSE with those deduced from the bootstrap estimate of the RRMSE. The 3D plots in Figure 3.2 show the “true” RRMSE and the “true” relative bias over a mesh of the two constants, under scenario (2.b), using the REBLUP-ABC method, in area 24. The results for other areas and for the other asymmetric bias calibration methods are also available.
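The argument that an underestimated MSE can still identify the right constants can be illustrated with a small sketch. The two surfaces below are purely hypothetical stand-ins (they are not results from the simulations), and the function names are illustrative: the "bootstrap" surface is assumed to understate the level of the "true" one while keeping its shape, so the minimizer over the mesh is unchanged.

```python
from itertools import product


def argmin_over_grid(c_values, gamma_values, rrmse):
    """Return the (c, gamma) pair with the smallest RRMSE on the mesh."""
    return min(product(c_values, gamma_values), key=lambda pair: rrmse(*pair))


# Hypothetical surface standing in for the "true" RRMSE of one area.
def true_rrmse(c, gamma):
    return 0.10 + (c - 2.0) ** 2 + (gamma - 1.5) ** 2


# Hypothetical bootstrap surface: the level is understated (factor 0.8),
# mimicking the neglected between-area variation, but the shape is kept.
def boot_rrmse(c, gamma):
    return 0.8 * true_rrmse(c, gamma)


c_grid = [1.0, 1.5, 2.0, 2.5, 3.0]
gamma_grid = [0.5, 1.0, 1.5, 2.0]

best_true = argmin_over_grid(c_grid, gamma_grid, true_rrmse)
best_boot = argmin_over_grid(c_grid, gamma_grid, boot_rrmse)
print(best_true, best_boot)
```

Whether the bootstrap surface really preserves the location of the minimum is an empirical question, which is what the comparison with the “true” RRMSE is meant to check.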

This figure illustrates the importance of the skewness factor ($\gamma$) in calibration.

Figure 3.2: 3D plots of the “true” RRMSE and the bias with respect to the two constants of the asymmetric influence function used for calibration. Only the result for the REBLUP-ABC method in area 24 is shown, as an example. The optimum values of the constants are those that minimize the RRMSE.

Footnote 8: Wang et al. [2007] proposed a method for choosing a data-driven tuning constant in the robust estimation of linear model parameters. No such method exists for linear mixed models, nor can this approach be extended in a straightforward way to choose the tuning constants for the calibration phase. This topic can be the subject of future research.

It is well known that outliers in the residuals can harm the non-parametric residual bootstrap and lead to erroneous inferences when the outlying values are overrepresented in the bootstrap sample. To avoid this problem, we draw the bootstrap sample from the pool of winsorized residuals; see Singh [1998] for the breakdown theory of bootstrap quantiles. Our aim here is to find the optimum constants for the skewed bias-calibration function. Therefore, each time we choose a pair of constants $(c_{boot}, \gamma_{boot})$ from the grid over their domains in order to estimate the corresponding RRMSE, we need a more relaxed pair to winsorize the residuals prior to the bootstrap sampling. In this setting, a more relaxed pair can be $(c_2, 1)$, where $c_2 > c_{boot}$. As discussed earlier, once the random intercepts are predicted we can treat them as fixed and resample with replacement from the area-specific block of residuals, since it is then plausible to regard these residuals as independent.
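The winsorization of one area's residuals can be sketched as follows. This is a minimal illustration under two assumptions: that the influence function with its second constant set to 1 reduces to the symmetric Huber clipping function (the asymmetric form of formula (3.10) is not reproduced here), and that the scale is the median-based estimate 1.4826 × median(|residual|) mentioned in step 4 of the algorithm below. All function names are hypothetical.

```python
from statistics import median


def mad_scale(residuals):
    """Median-based robust scale: 1.4826 * median(|residual|)."""
    return 1.4826 * median([abs(r) for r in residuals])


def huber_psi(u, c):
    """Symmetric Huber clipping: identity on [-c, c], constant outside."""
    return max(-c, min(c, u))


def winsorize(residuals, c2):
    """Standardize by the robust scale, clip at +/- c2, and rescale."""
    w = mad_scale(residuals)
    return [huber_psi(r / w, c2) * w for r in residuals]


# Illustrative residuals for one area, with a single gross outlier.
residuals = [-0.5, 0.2, -0.1, 0.4, 6.0]
c_boot = 1.5
c2 = c_boot + 0.5  # more relaxed constant, as in the simulations
wins = winsorize(residuals, c2)
print(wins)
```

Only the outlying residual is pulled back, to $c_2$ times the robust scale; the remaining residuals pass through unchanged, so the bootstrap pool keeps its bulk while limiting the weight of extreme draws.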

This bootstrap algorithm has the following steps:

1. Fit the model with the Robust EBLUP method and obtain the estimates of the fixed and random parameters.

2. Pick a combination of the two values from the mesh over the two constants, namely $c_{boot}$ and $\gamma_{boot}$.

3. Following the new bias-calibrated methods explained in Section 3.3, estimate the bias-calibrated Gini (or any other inequality index) for each area, e.g. $\widehat{\mathrm{Gini}}^{BC}_j$ for area $j$. These estimates are called the original estimates hereafter (see footnote 9).

4. Get the winsorized residuals for each area from the original fit,

$$res_{ij} = \psi_{c_2,1}\left( (y_{ij} - \hat{y}_{ij}) / \hat{w}_j \right) \times \hat{w}_j,$$

where $\hat{w}_j$ is a robust estimate of the scale of the residuals in area $j$, e.g. $\hat{w}_j = Q_{n_j}(y_j - \hat{y}_j)$ or $\hat{w}_j = 1.4826 \times \mathrm{median}(|y_{ij} - \hat{y}_{ij}|)$, and $c_2 > c_{boot}$. In the simulations of this chapter we use $c_2 = c_{boot} + 0.5$.

5. Sample randomly with replacement, separately within each area, from the set of winsorized residuals, and stack the draws to build the vector of bootstrap residuals, $res^*_{ij}$.

6. Construct the bootstrap sample: $y^*_{ij} = \hat{y}_{ij} + res^*_{ij}$.

7. Using this new sample of bootstrap outcomes and the same design matrices for the observed and unobserved units, fit the bootstrap model and predict the unobserved units, $y^*_{kj}$, for $k \in r_j$.

8. Build the bootstrap population outcome set:

$$U^{*(b)}_j = \{y_{ij}\} \cup \{y^*_{kj}\}.$$

Estimate the parameter of interest, e.g. $\mathrm{Gini}^{*(b)}_j$, which is based on the bootstrap population and calibrated by means of the bootstrap model residuals.

9. Repeat steps 4-8 $B$ times. In each round, calculate the relative error:

$$\left( \frac{\mathrm{Gini}^{*(b)}_j - \widehat{\mathrm{Gini}}^{BC}_j}{\widehat{\mathrm{Gini}}^{BC}_j} \right).$$

Footnote 9: The original estimated/predicted population values for area $j$ can be expressed as $\{y_{ij}\} \cup \{\hat{y}_{kj}\}$, for $i \in s_j$ and $k \in r_j$.

10. Estimated RRMSE and Bias are:
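The resampling loop for a single area can be sketched as follows. This is a rough illustration, not the procedure used in the simulations: the model refit, the prediction of unobserved units and the bias calibration (steps 7 and 8) are collapsed into a hypothetical callable `estimate`, for which a plain sample Gini coefficient is used as a stand-in. The aggregation in `rrmse_and_bias` is also an assumption: it takes the root mean square and the mean of the relative errors from step 9, which is the conventional choice.

```python
import math
import random


def gini(values):
    """Plain sample Gini coefficient (stand-in for the calibrated estimator)."""
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)


def bootstrap_relative_errors(fitted, wins_resid, original_est, estimate, B, seed=0):
    """Steps 5 to 9 for one area; `estimate` replaces steps 7 and 8."""
    rng = random.Random(seed)
    errors = []
    for _ in range(B):
        # Step 5: resample the winsorized residuals with replacement.
        res_star = rng.choices(wins_resid, k=len(wins_resid))
        # Step 6: bootstrap outcomes for the sampled units.
        y_star = [f + r for f, r in zip(fitted, res_star)]
        # Steps 7 and 8 (placeholder): estimate the index from the bootstrap data.
        est_star = estimate(y_star)
        # Step 9: relative error with respect to the original estimate.
        errors.append((est_star - original_est) / original_est)
    return errors


def rrmse_and_bias(errors):
    """Assumed aggregation: root mean square and mean of the relative errors."""
    B = len(errors)
    bias = sum(errors) / B
    rrmse = math.sqrt(sum(e * e for e in errors) / B)
    return rrmse, bias


# Illustrative inputs for one area (not taken from the simulations).
fitted = [10.0, 12.0, 15.0, 20.0, 30.0]
wins_resid = [-1.0, 0.5, 0.0, 1.0, -0.5]
errors = bootstrap_relative_errors(fitted, wins_resid, gini(fitted), gini, B=200, seed=1)
rrmse, bias = rrmse_and_bias(errors)
print(rrmse, bias)
```

Running this for every pair on the mesh and keeping the pair with the smallest estimated RRMSE would give the bootstrap choice of the constants for that area.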

3.6 Application: Acid Neutralizing Capacity of the