5 Conclusions and Discussion

We have proposed an implementation of the RWLB that addresses the estimation of uncertainty in prediction for the entire class of GLMMs, either by estimating the MSEP or by correcting the CV.

We have performed a simulation study showing that this algorithm can be successfully applied to both Gaussian and non-Gaussian GLMMs, and providing evidence of the good properties of our proposal when compared with competitors.

In the context of an LMM inspired by the Orthodontic data, the RWLB-based methods that (i) are based on resampling while accounting for the shrinkage of predictions (RWLB.V2) and (ii) rely on the simulation strategy (RWLB.V3), described in Section 3.3.3, display a relative Bias and Efficiency on par with the other Bootstrap-based methods and with classic estimators based on second-order correct approximations, e.g. Prasad & Rao (1990).

Moreover, these proposals seem well suited for the estimation of the uncertainty of Fitted Values.

In the simulated Mixed Logit Model inspired by the Toenail data, the method relying on the RWLB scheme that uses the simulation strategy with EP obtained with Monte-Carlo approximations (RWLBE.v3.ep) produces Normal-based Confidence Intervals with Coverage Ratios that approximate the nominal levels, especially when this level is “low” (i.e. 90% or 95%) and the number of observations per cluster increases. There is apparently a slight gain when using perturbed EP to create a correction for the effect of the uncertainty due to parameter estimation (GBC.v2.ep).

Further refinements of our approach could include the use of RWLB replicates of the fixed effects to construct the samples at the basis of the RWLB methods, see Equations (27) and (28). Moreover, the theoretical connections with the double bootstrap procedure of Hall & Maiti (2006) would be worth studying. Finally, it could be of interest to provide LMM simulations with other settings for the Variance Components parameters.

Acknowledgments

The computations were performed at the University of Geneva on the Baobab cluster. DF acknowledges the Travelling Support of the Société Académique de Genève.

6 Appendix

6.1 Coverage Ratios for the Linear Predictor in GLMM study

                 0.90                    0.95                    0.99
Method  v   û_i  7       14      21      7       14      21      7       14      21
RWLBE   v1  cm   0.7203  0.8481  0.8898  0.7796  0.8979  0.9386  0.8537  0.9475  0.9795
            ep   0.8889  0.9204  0.9147  0.9185  0.9483  0.9549  0.9557  0.9726  0.9853
        v2  cm   0.7922  0.8673  0.8949  0.8402  0.9122  0.9421  0.8977  0.9548  0.9808
            ep   0.7957  0.8682  0.8958  0.8521  0.9154  0.9429  0.9202  0.9618  0.9820
        v3  cm   0.8160  0.8732  0.8982  0.8688  0.9191  0.9446  0.9308  0.9637  0.9827
            ep   0.8923  0.9121  0.9099  0.9301  0.9468  0.9523  0.9666  0.9774  0.9856
REBE    v1  cm   0.9201  0.9497  0.9862  0.9409  0.9652  0.9919  0.9650  0.9816  0.9969
            ep   0.9847  0.9545  0.9873  0.9903  0.9692  0.9927  0.9954  0.9841  0.9973
        v2  cm   0.9841  0.9558  0.9868  0.9898  0.9705  0.9924  0.9951  0.9857  0.9973
            ep   0.9906  0.9573  0.9873  0.9941  0.9718  0.9928  0.9973  0.9865  0.9975
PBE         cm   0.8869  0.9194  0.9139  0.9170  0.9478  0.9543  0.9544  0.9723  0.9851
            ep   0.8903  0.9114  0.9093  0.9284  0.9463  0.9519  0.9658  0.9772  0.9855

Table 6: Average Coverage Ratios of Prediction Intervals for η with n = 300.

                 0.90                    0.95                    0.99
Method  v   û_i  7       14      21      7       14      21      7       14      21
RWLBE   v1  cm   0.7149  0.8454  0.8888  0.7750  0.8961  0.9377  0.8499  0.9458  0.9790
            ep   0.8885  0.9198  0.9145  0.9182  0.9476  0.9545  0.9551  0.9721  0.9849
        v2  cm   0.7855  0.8643  0.8939  0.8343  0.9098  0.9411  0.8930  0.9530  0.9803
            ep   0.7797  0.8660  0.8948  0.8390  0.9140  0.9422  0.9110  0.9603  0.9816
        v3  cm   0.7936  0.8707  0.8973  0.8509  0.9173  0.9439  0.9192  0.9620  0.9823
            ep   0.8918  0.9119  0.9097  0.9291  0.9461  0.9520  0.9661  0.9767  0.9853
REBE    v1  cm   0.8507  0.9314  0.9770  0.8837  0.9493  0.9846  0.9247  0.9704  0.9928
            ep   0.9783  0.9347  0.9782  0.9856  0.9528  0.9856  0.9930  0.9734  0.9935
        v2  cm   0.9766  0.9367  0.9775  0.9844  0.9542  0.9851  0.9922  0.9745  0.9933
            ep   0.9886  0.9377  0.9781  0.9928  0.9552  0.9856  0.9967  0.9753  0.9936
PBE         cm   0.8874  0.9195  0.9142  0.9174  0.9472  0.9542  0.9545  0.9721  0.9849
            ep   0.8907  0.9116  0.9095  0.9284  0.9459  0.9518  0.9656  0.9765  0.9852

Table 7: Coverage Ratios of Prediction Intervals for η with n = 600.
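For readers who wish to reproduce the type of summary reported in Tables 6 and 7, the following is a minimal sketch of how a Coverage Ratio for Normal-based intervals can be computed. It assumes intervals of the form prediction ± z·sqrt(MSEP estimate), which is our reading of "Normal-based" and not a statement of the exact implementation used in the study. The function names `coverage_ratio` and `simulate`, and the toy data they produce, are hypothetical and serve only as an illustration.

```python
import math
import random
from statistics import NormalDist


def coverage_ratio(truth, pred, msep, level=0.95):
    """Share of true values falling inside pred +/- z * sqrt(msep)."""
    # Two-sided standard Normal quantile for the nominal level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = 0
    for t, p, m in zip(truth, pred, msep):
        half_width = z * math.sqrt(m)
        if p - half_width <= t <= p + half_width:
            hits += 1
    return hits / len(truth)


def simulate(n_clusters=300, msep=0.25, seed=1):
    """Toy data (assumption): prediction errors are Normal with variance msep."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n_clusters)]
    pred = [t + rng.gauss(0.0, math.sqrt(msep)) for t in truth]
    return truth, pred, [msep] * n_clusters


if __name__ == "__main__":
    truth, pred, msep = simulate()
    for level in (0.90, 0.95, 0.99):
        print(level, round(coverage_ratio(truth, pred, msep, level), 4))
```

In this toy setting the MSEP is known, so the empirical ratios should lie close to the nominal levels; in practice the MSEP would be replaced by an estimate, and departures from the nominal level reflect the quality of that estimate.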

References

Bates, D., Mächler, M., Bolker, B., & Walker, S. 2015. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48.

Booth, J. G., & Hobert, J. P. 1998. Standard errors of prediction in generalized linear mixed models. Journal of the American Statistical Association, 93(441), 262–272.

Boubeta, Miguel, Lombardía, María José, & Morales, Domingo. 2016. Empirical best prediction under area-level Poisson mixed models. Test, 25(3), 548–569.

Butar, F. B., & Lahiri, P. 2003. On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112(1), 63–76.

Carpenter, J. R., Goldstein, H., & Rasbash, J. 2003. A novel bootstrap procedure for assessing the relationship between class size and achievement. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(4), 431–443.

Chatterjee, S., Lahiri, P., & Li, H. 2008. Parametric bootstrap approximation to the distribution of EBLUP and related prediction intervals in linear mixed models. The Annals of Statistics, 36(3), 1221–1245.

Das, K., Jiang, J., & Rao, JNK. 2004. Mean squared error of empirical predictor. The Annals of Statistics, 32(2), 818–840.

Datta, G.S., & Lahiri, P. 2000. A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica, 10(2), 613–628.

Davison, A.C., & Hinkley, D.V. 1997. Bootstrap methods and their application. Cambridge University Press.

De Backer, M, De Vroey, C, Lesaffre, Emmanuel, Scheys, I, & De Keyser, P. 1998. Twelve weeks of continuous oral therapy for toenail onychomycosis caused by dermatophytes: a double-blind comparative trial of terbinafine 250 mg/day versus itraconazole 200 mg/day. Journal of the American Academy of Dermatology, 38(5), S57–S63.

Ding, Pauline, & Welsh, Alan H. 2017. Bootstrapping Longitudinal Data with Multiple Levels of Variation. Working Paper.

Field, C.A., Pang, Z., & Welsh, Alan H. 2008. Bootstrapping data with multiple levels of variation. Canadian Journal of Statistics, 36(4), 521–539.

Field, C.A., Pang, Z., & Welsh, Alan H. 2010. Bootstrapping Robust Estimates for Clustered Data. Journal of the American Statistical Association, 105(492), 1606–1616.

Flores-Agreda, Daniel. 2017. On the Inference of Random Effects in Generalized Linear Mixed Models. Ph.D. thesis, Université de Genève.

Hall, Peter, & Maiti, Tapabrata. 2006. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2), 221–238.

Harville, D.A. 1985. Decomposition of prediction error. Journal of the American Statistical Association, 132–138.

Henderson, C.R. 1950. Estimation of genetic parameters. Biometrics, 6, 186–187.

Jiang, J. 1998a. Consistent estimators in generalized linear mixed models. Journal of the American Statistical Association, 93(442), 720–729.

Jiang, J. 2003. Empirical best prediction for small-area inference based on generalized linear mixed models. Journal of Statistical Planning and Inference, 111(1-2), 117–127.

Jiang, J. 2007. Linear and generalized linear mixed models and their applications. Springer Verlag.

Jiang, J., & Lahiri, P. 2001. Empirical best prediction for small area inference with binary data. Annals of the Institute of Statistical Mathematics, 53(2), 217–243.

Jiang, J., Lahiri, P., & Wan, S.M. 2002. A unified jackknife theory for empirical best prediction with M-estimation. Annals of Statistics, 1782–1810.

Jiang, Jiming. 1998b. Asymptotic properties of the empirical BLUP and BLUE in mixed linear models. Statistica Sinica, 861–885.

Kackar, R N, & Harville, D. 1984. Approximations for Standard Errors of Estimators of Fixed and Random Effect in Mixed Linear Models. Journal of the American Statistical Association, 853–862.

Kass, Robert E, & Steffey, Duane. 1989. Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association, 84(407), 717–726.

Kristensen, Kasper, Nielsen, Anders, Berg, Casper, Skaug, Hans, & Bell, Bradley. 2016. TMB: Automatic Differentiation and Laplace Approximation.

Kubokawa, Tatsuya, & Nagashima, Bui. 2012. Parametric bootstrap methods for bias correction in linear mixed models. Journal of Multivariate Analysis, 106, 1–16.

McCullagh, Peter. 2000. Resampling and exchangeable arrays. Bernoulli, 285–301.

McCulloch, C.E., & Searle, S.R. 2001. Generalized, Linear, and Mixed Models. New York: Wiley.

Morris, C N. 1983. Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association, 47–55.

Morris, Jeffrey S. 2002. The BLUPs are not “best” when it comes to bootstrapping. Statistics & Probability Letters, 56(4), 425–430.

Pang, Zhen, & Welsh, A. H. 2014. The Generalised Bootstrap for Clustered Data. Int. J. Data Anal. Tech. Strateg., 6(4), 407–415.

Pinheiro, J.C., & Bates, D.M. 2009. Mixed-effects models in S and S-PLUS. Springer Verlag.

Potthoff, Richard F, & Roy, SN. 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 313–326.

Prasad, N., & Rao, J. 1990. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409), 163–171.

R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Robinson, G.K. 1991. That BLUP is a good thing: The estimation of random effects. Statistical Science, 15–32.

Searle, S.R., Casella, G., & McCulloch, C.E. 1992. Variance components. Wiley Online Library.

Shao, Jun, & Tu, Dongsheng. 2012. The jackknife and bootstrap. Springer Science & Business Media.

Singh, Avinash C, Stukel, DM, & Pfeffermann, D. 1998. Bayesian versus frequentist measures of error in small area estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(2), 377–396.

Tierney, Luke, & Kadane, Joseph B. 1986. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81(393), 82–86.

Tierney, Luke, Kass, Robert E, & Kadane, Joseph B. 1989. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84(407), 710–716.

Verbeke, G., & Molenberghs, G. 2009. Linear mixed models for longitudinal data. Springer Verlag.
