### Thesis

### Reference

### On the Inference of Random Effects in Generalized Linear Mixed Models

FLORES AGREDA, Daniel Antonio

**Abstract**

In the first chapter, the problem of Bootstrap inference for the parameters of a GLMM is addressed. We formulate a bootstrapping strategy consisting of the random weighting of the contributions to the Joint Likelihood of Outcomes and Random Effects. Applying the Laplace Approximation method for integrals to this function yields a Random Weighted Log-Likelihood that produces the desired bootstrap replicates after optimization. In order to assess the properties of this procedure, which we name the Random Weighted Likelihood Bootstrap (RWLB), we compare analytically the resulting EE to those of the Generalized Cluster Bootstrap for Gaussian LMM and conduct simulation studies in both LMM and Mixed Logit regression contexts. The second chapter explores adaptations of the RWLB to the estimation of the uncertainty in the prediction of random effects in a GLMM, as measured by the Mean Squared Error of the Predictors (MSEP).

FLORES AGREDA, Daniel Antonio. *On the Inference of Random Effects in Generalized Linear Mixed Models*. Thèse de doctorat : Univ. Genève, 2017, no. GSEM 44

DOI : 10.13097/archive-ouverte/unige:102003 URN : urn:nbn:ch:unige-1020037

Available at:

http://archive-ouverte.unige.ch/unige:102003

Disclaimer: layout of this document may differ from the published version.


## On the Inference of Random Effects in Generalized Linear Mixed Models

by

### Daniel Antonio Flores Agreda

A thesis submitted to the

Geneva School of Economics and Management, University of Geneva, Switzerland,

in fulfillment of the requirements for the degree of PhD in Statistics

Members of the thesis committee:

Prof. Stefan Sperlich, Chair, University of Geneva

Prof. Eva Cantoni, Adviser, University of Geneva

Prof. Alan Welsh, Jury, College of Physical and Mathematical Sciences, Australian National University

Thesis No. 44, July 2017

**Acknowledgements**

This dissertation would never have been accomplished without the outstanding guidance of Professor Eva Cantoni. As an instructor throughout my undergraduate and graduate studies, she ignited in me a deep interest in the field of Statistics through her lectures. Later, during the preparation of this PhD Thesis, she consistently contributed countless profound ideas, insightful comments and productive intuitions. Aside from these professional qualities, she has proven to be a greatly empathetic colleague and a remarkably humorous friend.

Secondly, I would like to thank Prof. Stefan Sperlich and Prof. Alan H. Welsh for their helpful criticisms and inquisitive remarks. The points they raised during the last stages of the writing of this dissertation have, in my view, provided a much needed perspective for this project to achieve the required degree of accomplishment.

Throughout these years, all of my colleagues at the GSEM have provided me with various types of help, be it through our extensive academic discussions or our (even lengthier) moments of laughter. Among them, I would especially like to mention my good friends Ingrid, Marco, Mauro, Rose, Marie-Claire, Elise, Anja, Mattia, Sam, Setareh and Haotian.

Finally, I must acknowledge my family in my home country, who even at a distance have provided me with solid support and understanding throughout the years.

To all of you, all my gratitude.

**Abstract**

This dissertation aims to provide a contribution to the literature on inference in Generalized Linear Mixed Models (GLMM) by means of Bootstrapping techniques.

In the first chapter, the problem of Bootstrap inference for the parameters of a GLMM is addressed. Inspired by procedures based on the *random weighting* of the Estimating Equations (EE), see e.g. Chatterjee and Bose [2005] and, in particular, the Generalized Cluster Bootstrap (GCB) for Linear Mixed Models (LMM) [Field et al., 2010], we formulate a bootstrapping strategy consisting of the random weighting of the contributions to the Joint Likelihood of Outcomes and Random Effects. Using the Laplace Approximation method for integrals, see e.g. De Bruijn [1970], to 'integrate out' the random effects in the resulting weighted joint likelihood yields a *Random Weighted log-Likelihood* function that produces the desired bootstrap replicates after optimization. In order to assess the validity of the procedure, which we name the *Random Weighted Likelihood Bootstrap* (RWLB), we compare the resulting EE to the expressions used in the GCB for a *Random Effect* LMM, showing that both schemes agree up to the weighting of a term in the EE of the Variance Components. Simulation studies conducted in a LMM with random slope and in a Mixed Logit regression with one random effect show that the distribution of the bootstrap replicates provides a fairly good approximation of the asymptotic distribution of the Model Parameters.

The second chapter addresses the estimation of the Mean Squared Error of the Predictors (MSEP) of Random Effects in GLMM by means of non-parametric bootstrap methods. In the frequentist paradigm, the MSEP is used as a measure of the uncertainty in prediction and has been shown to be affected by the estimation of the model parameters. In the particular case of Linear Mixed Models (LMM), two solutions are provided to practitioners: on one hand, second-order correct approximations which yield estimators of this quantity and, on the other hand, several Parametric Bootstrap algorithms. We propose a non-parametric bootstrap scheme, consisting of an adaptation of the RWLB, that can be used in the entire class of GLMM. In a first stage, the RWLB is used to generate bootstrap replicates of the parameters while, in a second stage, simulation is used to generate bootstrap samples of standardized Random Effects. We conduct a first simulation study in the framework of Gaussian LMM to contrast the quality of our approach with respect to: (i) analytical estimators of the MSEP based on approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes, on the grounds of their relative bias, relative efficiency and the coverage ratios of the resulting prediction intervals. A second simulation study serves to illustrate the use and benefits of our proposal against other feasible alternatives in a pure, Non-Gaussian, GLMM setting.

**Résumé**

This thesis is a contribution to the literature on statistical inference in Generalized Linear Mixed Models (GLMM) by means of *Bootstrap* techniques.

In the first chapter, we treat the problem of Bootstrap inference for the parameters of a GLMM. Drawing inspiration from procedures based on a *random weighting* of the Estimating Equations (EE) (Chatterjee and Bose [2005]), and more specifically from the method named *Generalized Cluster Bootstrap* (GCB, Field et al. [2010]) for Linear Mixed Models (LMM), we propose a Bootstrap strategy that consists in randomly weighting the individual contributions to the joint likelihood. Using the Laplace method for the approximation of integrals (De Bruijn [1970]) to integrate out the random effects, the weighted joint likelihood induces a *Randomly Weighted Log-Likelihood* function that produces the desired Bootstrap replicates after optimization. In order to highlight the properties of this procedure, which we call the *Random Weighted Likelihood Bootstrap* (RWLB), we compare, in the LMM setting, the resulting EE with those of the GCB. We show, in particular, that the two procedures coincide up to one term in the equation concerning the parameters associated with the Variance Components. Simulation studies conducted on a LMM with a random *slope* as well as on a mixed logistic regression with one random effect show that the bootstrap distribution provides a good approximation of the asymptotic distribution of the model parameters.

The second chapter treats the estimation of the Mean Squared Error of the Predictors (MSEP) of the Random Effects in a GLMM. In the *frequentist* paradigm, the MSEP is used as a measure of the uncertainty of predictions, which is, moreover, affected by the estimation of the model parameters. In the particular case of LMM, two solutions are offered to practitioners: on one hand, estimators resulting from second-order approximations and, on the other hand, Parametric Bootstrap algorithms. We propose a new bootstrap scheme based on an adaptation of the RWLB, which can therefore be applied to the entire class of GLMM. This algorithm relies on the RWLB to generate the bootstrap replicates of the estimators of the model parameters, and on a simulation procedure to produce bootstrap samples of standardized Random Effects. We conduct a simulation study in the LMM framework to compare the quality of our proposal with: (i) estimators of the MSEP based on analytical approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes. We evaluate the relative bias, the relative efficiency and the coverage rates of the resulting prediction intervals. An additional simulation helps us illustrate the use and benefits of our proposal compared to other feasible alternatives in the context of a Non-Gaussian GLMM.

**Contents**

**Acknowledgements**

**Abstract**

**Résumé**

**Introduction**

**1 Bootstrapping GLMM via a Random Weighted Laplace Approximation**

1.1 Introduction
1.2 Model and Notation
1.2.1 The General Formulation of a GLMM
1.2.2 Special Case: Linear Mixed Models
1.3 Approximate Likelihood Inference via Laplace Approximation
1.4 Bootstrapping Mixed Models
1.4.1 Bootstraps based on Random Effect Predictions
1.4.2 Bootstraps based on Distributional Assumptions
1.4.3 Bootstraps based on Data Clusters
1.4.4 Discussion
1.5 A Random Weighted Bootstrap for GLMM
1.5.1 Relation with the GCB for a Random Effect LMM
1.5.2 Properties in GLMM with an Individual Random Effect
1.5.3 Implementation
1.6 Simulation Study
1.6.1 Gaussian Responses
1.6.2 Binary Response
1.7 Conclusions and Discussion
1.8 Appendix
1.8.1 Comparison of the Properties of RWLB vs. GCB
1.8.2 Plots LMM
1.8.3 Tables LMM
1.8.4 Tables GLMM
1.8.5 TMB templates for LAWLL

**2 Bootstrapping GLMM to Estimate the Uncertainty in Prediction**

2.1 Introduction
2.2 Model and Notation
2.2.1 Special Case: Linear Mixed Models
2.2.2 Point Predictors of Random Effects
2.3 Measures of Uncertainty in Prediction of Random Effects
2.3.1 Estimation of MSEP via Approximations
2.3.2 Computation and Correction of CV
2.3.3 Bootstrap-Based Measures of the Uncertainty
2.3.4 Prediction Intervals
2.3.5 Implementation
2.4 Simulation Studies
2.4.1 Gaussian Responses
2.4.2 Binary Response
2.5 Conclusions and Discussion
2.6 Appendix
2.6.1 Coverage Ratios for the Linear Predictor in GLMM study

**References**

*To my family and my friends.*

**Introduction**

Over the years, Linear Models (LM) have proven to be powerful and versatile tools in model-based research. Initially limited to a Gaussian response context with a single random error term, they were first extended to include various sources of variation through the addition of random effects to their classic specification, yielding the class of Linear Mixed Models, see e.g. Henderson [1950].

Similarly, *Generalized Linear Models* (GLM) [Nelder and Wedderburn, 1972], initially developed to model qualitative and quantitative variables independently drawn from exponential families, were also extended to account for repeated observations in various manners. One such proposal, the framework of *Generalized Estimating Equations* (GEE) [Liang and Zeger, 1986], can be seen as an extension of quasi-likelihood inference in GLM, with matrix equivalents of the variances of the response depending on a *working correlation* matrix to account for the covariance structure among the elements within an observational unit. While simple to implement and intuitive to use, the resulting inferential procedures are limited by the lack of an expression for the Likelihood function. Moreover, the introduction of an additional degree of subjectivity, in the shape of the working correlations, limits their application.

The extension of GLM that will interest us in this dissertation is the class of *Generalized Linear Mixed Models* (GLMM), a generalization of the mixed-effects approach to non-Gaussian responses, hence differing from the basic GLM in the addition of unobservable random effects at the level of the linear predictor. Contrary to the GEE approach, the Marginal Likelihood of a GLMM can be defined by the integration of the joint density of outcomes and random effects, albeit with an added complication, as the integrals often lack a closed-form expression and hence need to be approximated.

Like GLM, Bootstrapping techniques [Efron, 1979] have become essential elements in the toolbox of any statistician. By means of an algorithm that is both intuitive and easy to implement, researchers and practitioners alike have been given a simple method to draw inference for complex model features. In the case of Linear Models, typical implementations of this procedure rely on resampling of fitted *residuals*, resampling of data *pairs*, or *simulation* to construct the bootstrap samples of outcomes and the bootstrap replicates of the parameters required to infer on the features of the sampling distribution. To the best of our knowledge, however, extensions of the method to the GLMM framework have been scarce and rely, for the most part, on simulation. This is due partly to the complexity of the random structure, which reduces the exchangeability of observations necessary in both the *pairs* and *residuals* approaches, and partly to the supplementary obstacle entailed by the Likelihood approximation methods, which impedes straightforward extensions of said approaches.

This dissertation aims to contribute to the discussion on the inference of GLMM by means of Bootstrapping techniques. We focus our attention on two subjects, namely the Bootstrap applied to GLMM for inference on the parameter estimates, and its application to the estimation of the uncertainty of Random Effect Predictions. As a consequence, the manuscript comprises two self-contained chapters treating each of these subjects.

*Chapter 1* addresses the problem of bootstrap-based inference in GLMM. After a review of the bootstrap schemes currently available in the literature, and inspired in part by procedures based on the *random weighting* of the Estimating Equations, see e.g. Chatterjee and Bose [2005], Field et al. [2010], we formulate a strategy based on (i) the random weighting of the contributions to the Joint Likelihood of outcomes and Random Effects and (ii) the Laplace method for the approximation of integrals.

*Chapter 2* deals with the measure of uncertainty in predictions of random effects via Bootstrap methods. The manuscript overviews the literature on the measurement of said uncertainty, which is customarily carried out via approximations to both *Frequentist* measures, such as the Mean Squared Error of Prediction, and others inherited from the *Bayesian* paradigm, such as the Conditional Variances. In this work, we propose an implementation of the method developed in the previous chapter as an alternative way of computing these measures.

**Chapter 1**

**Bootstrapping GLMM via a Random Weighted Laplace Approximation**

**1.1** **Introduction**

Ever since the introduction of the Bootstrap in the classic paper by Efron [1979], applications of the method to a wide array of problems have been developed and refined at an increasingly accelerated pace. Among its many uses, one can mention the applications in Regression Models, where the Bootstrap helps researchers perform inference on the sampling distribution of the model parameters by means of bootstrap estimates of their mean, bias, variance or density. For this purpose, a great number of bootstrap schemes and strategies have been developed over time.

Consider for example the Gaussian Linear Model (LM): $y_i = \mathbf{x}_i^T\boldsymbol{\beta} + \epsilon_i$, $i \in \{1,\dots,n\}$, where the $\mathbf{x}_i$ ($p\times 1$) are covariate vectors and the $\epsilon_i$ are independent draws of a $N(0,\sigma^2)$ distribution. The first and most widespread of bootstrap strategies, known as the *Residual Bootstrap* [Efron, 1979], consists in creating replicates of the random errors by resampling with replacement from the vector of predicted residuals $\hat{\boldsymbol{\epsilon}} = [\hat{\epsilon}_i]_{i=1}^{n}$, $\hat{\epsilon}_i = y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}$, and creating Bootstrap replicates using the assumed data-generating mechanism with an estimate of the parameters $\hat{\boldsymbol{\beta}}$, i.e. $y_i^* = \mathbf{x}_i^T\hat{\boldsymbol{\beta}} + \hat{\epsilon}_i^*$. A second common approach, known as the *External* or *Wild Bootstrap* [Wu, 1986], creates the bootstrap samples via simulation of standardized residuals $u_i^*$ from a distribution with $\mathrm{E}[u_i^*] = 0$ and $\operatorname{Var}[u_i^*] = 1$, therefore constructing $\epsilon_i^* = \hat{\sigma}u_i^*$ and then operating as in the Residual Bootstrap. When the distribution of the $u_i^*$ is standard Normal, thus coinciding with the assumed data-generating process, this procedure is known as the *Parametric Bootstrap*. The third scheme often proposed in the literature is the *Pairs Bootstrap* procedure, which consists in resampling outcomes and covariates per observational unit, or *pairs* of data $[y_i, \mathbf{x}_i^T]^T$, with replacement to create samples $[y_i^*, \mathbf{x}_i^{*T}]^T$, yielding Bootstrap replicates of the parameters by optimizing a cost function such as the *Likelihood* or the *Residual Sum of Squares*.
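To make these schemes concrete, here is a minimal R sketch of the *Residual Bootstrap* for a simulated Gaussian LM; the sample size, parameter values and number of replicates are arbitrary illustrative choices.

```r
## Residual Bootstrap for a Gaussian Linear Model (illustrative sketch)
set.seed(1)
n <- 100
x <- rnorm(n)
beta <- c(1, 2)                        # arbitrary "true" parameters
y <- beta[1] + beta[2] * x + rnorm(n)  # sigma = 1

fit  <- lm(y ~ x)
res  <- residuals(fit)                 # predicted residuals
yhat <- fitted(fit)

B <- 999
boot_beta <- replicate(B, {
  y_star <- yhat + sample(res, n, replace = TRUE)  # resample residuals
  coef(lm(y_star ~ x))                             # bootstrap replicate of beta
})
apply(boot_beta, 1, sd)   # bootstrap standard errors of (beta0, beta1)
```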

Over time, these basic procedures have been widely studied and extended to encompass a variety of deviations from the basic linear model assumptions, yet they all rely critically on the assumption of a lack of correlation among observations and therefore on their exchangeability. As has been argued, this assumption can have spurious effects when dealing with clustered data with repeated observations, where the random structure is more complex due to the inclusion of *Random Effects* (RE). For example, McCullagh [2000] shows that different resampling strategies yield different bootstrap estimators of the sampling variance, and that none of them estimates this feature consistently. Other theoretical works study the conditions under which resampling procedures yield good asymptotic approximations of the sampling distribution of the parameters. For instance, Field and Welsh [2007] show that the higher-order moment structure of the random effects, in conjunction with the assumptions on the data generating process and the choice of the resampling scheme, has an impact on the bootstrap estimates of the sampling variance of the parameters for a one-way array. More generally, Field et al. [2008] show that these structures and assumptions affect the quality of the bootstrap approximation of the sampling distribution of all the parameters in a Longitudinal Linear Mixed Model.

In order to provide a general procedure, and to avoid the considerations that entail a burden of choice, Field et al. [2010] propose what is perhaps the most interesting of these schemes, the *Generalized Cluster Bootstrap* (GCB), following many propositions of bootstrapping via random weighting (e.g. Rubin et al. 1981, Newton and Raftery 1994) and via the estimating equations (e.g. Hu and Kalbfleisch 1997 and Chatterjee and Bose 2005). What's more, Field et al. [2010] and Samanta and Welsh [2012] provide evidence of the usefulness of the method by applying it to the fit of a Longitudinal Linear Mixed Model via estimating equations with bounded Influence Functions, showing ways of extending the procedure with considerations on its robustness, while more recent works [Ding and Welsh, 2017] explore the weighting schemes by comparing the resulting bootstraps across model assumptions.

To the best of our knowledge, however, the question of a general non-parametric bootstrap for a Non-Gaussian Generalized Linear Mixed Model is still open. It is clear that in this context one of the main problems comes from the difficulty of defining a *residual* to be resampled, making *parametric* or *wild* procedures more appealing. Another burden comes from the difficulties in approximating the multivariate integrals that compose the likelihood, which potentially induce systematic biases.

We feel that any attempt to address the question of a general bootstrap method in GLMM could be useful in various applications, such as the computation of the asymptotic variances of parameter estimates, the correction of small-sample biases and the construction of confidence intervals. More inherent to GLMM is the estimation of the uncertainty of the predictions of the random effects as measured, for instance, by the *Mean Squared Error of Prediction* (MSEP), see e.g. Kackar and Harville [1984], Prasad and Rao [1990], which often requires expensive computations that can potentially be bypassed with the use of bootstrap methods. Hence, this manuscript attempts to contribute to this discussion by proposing a Weighted Bootstrap applied to the Laplace Approximation for Longitudinal Non-Gaussian GLMM in the spirit of the GCB [Field et al., 2010, Pang and Welsh, 2014, Ding and Welsh, 2017].

This chapter is structured as follows. First, we set up the general notation and features of the GLMM while providing the expressions for the important example that is the class of LMM (Section 1.2). After an overview of some other methods for approximating the Likelihood of these models, we describe the key elements of the inference based on the Laplace approximation, all while providing the link between said methods and exact ML inference in LMM (Section 1.3). We overview the bootstrapping methods for Gaussian and Non-Gaussian GLMM in Section 1.4, then formulate our proposal and expose its links to other schemes based on random weighting (Section 1.5). Finally, we provide evidence of the finite-sample properties of the method in two simulated examples, a LMM with random slope and intercept inspired by the Orthodontic dataset and a Mixed Logit resulting from the fit of the Toenail data (Section 1.6).


**1.2** **Model and Notation**

Let $i = 1,\dots,n$ denote the index of the *observational unit* and $j = 1,\dots,n_i$ the index of an observation within this unit. Write $\boldsymbol{\theta} = [\boldsymbol{\beta}^T, \boldsymbol{\sigma}^T]^T$ ($(p+s)\times 1$) to denote the vector of model parameters, where $\boldsymbol{\beta}$ ($p\times 1$) represents the *fixed effect* parameters and $\boldsymbol{\sigma}$ ($s\times 1$) contains the parameters associated with the random effects, which we will (informally) refer to as *variance components*.

**1.2.1** **The General Formulation of a GLMM**

In our formulation, we write $y_{ij}$ to denote the observed outcomes, assumed to be independently drawn from an exponential family when conditioned on a vector of covariates $\mathbf{x}_{ij}$ ($p\times 1$) and a vector of random effects $\boldsymbol{\gamma}_i$ ($q\times 1$) following a $N_q(\mathbf{0}, \boldsymbol{\Delta}_\sigma)$ distribution, endowed with a positive-definite symmetric covariance matrix $\boldsymbol{\Delta}_\sigma$. For simplicity of exposition, and following the usual practice in the literature, we consider the reparametrization $\boldsymbol{\gamma}_i = \mathbf{D}_\sigma\mathbf{u}_i$ resulting from the Cholesky decomposition of the covariance matrix of the random effects, $\boldsymbol{\Delta}_\sigma = \mathbf{D}_\sigma\mathbf{D}_\sigma^T$, where the $\mathbf{u}_i$ are multivariate standard normal vectors (see e.g. Bates 2010). We shall write $\mu_{ij}$ to denote the conditional expectation of the outcome, $\mathbf{z}_{ij}$ ($q\times 1$) a *design* vector for the random effects and $\eta_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T\mathbf{D}_\sigma\mathbf{u}_i$ the *linear predictor*. With $g$ representing a monotonic *link* function that maps the linear predictor to the conditional expectation of the outcome, a general formulation of the GLMM can be written as follows:

$$g(\mu_{ij}) = \eta_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T\mathbf{D}_\sigma\mathbf{u}_i. \tag{1.1}$$

This general formulation encompasses a wide array of models, such as Mixed versions of the Poisson, Gamma and Inverse Gaussian regressions. In particular, assuming the $y_{ij}$ are independent draws of a conditional Bernoulli distribution $\mathcal{B}(1, \mu_{ij})$, and via a *logit* link for $\mu_{ij}$, it is possible to formulate the popular *Mixed Logit Model* as follows:

$$\operatorname{logit}(\mu_{ij}) = \log\left(\frac{\mu_{ij}}{1-\mu_{ij}}\right) = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T\mathbf{D}_\sigma\mathbf{u}_i. \tag{1.2}$$

In order to characterize the expressions of the Likelihood, let us write $f$ to denote the probability density function (PDF) or probability mass function (PMF) evaluated at the observed outcomes $y_{ij}$, conditioned on the vectors $\mathbf{x}_{ij}$, $\mathbf{u}_i$ and assumed to follow conditional exponential families, therefore conferred with the following structure:

$$f_\theta(y_{ij}\,|\,\mathbf{u}_i) := f(y_{ij}\,|\,\mathbf{x}_{ij},\mathbf{u}_i;\boldsymbol{\theta},\phi) = \exp\left[\frac{y_{ij}\,\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij},\phi)\right], \tag{1.3}$$

for $\phi$ a nuisance *dispersion* parameter, $\xi_{ij} = \xi(\eta_{ij})$ the so-called *canonical parameter* (when $\phi$ is known) and with $b$ and $c$ denoting some specific functions. Moreover, it is known that $b$ is a function that identifies the conditional mean and variance of the outcomes, i.e. $\mu_{ij} = \mathrm{E}_\theta[Y_{ij}\,|\,\mathbf{u}_i] = b'(\xi_{ij})$ and $\nu_{ij} = v(\mu_{ij}) = \operatorname{Var}_\theta[Y_{ij}\,|\,\mathbf{u}_i] = \phi\, b''(\xi_{ij})$. In what follows, we consider the link $g$ to be the *canonical link*, in other words $\mu_{ij} = b'(\eta_{ij})$, implying $\xi_{ij} = \eta_{ij}$.
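To fix ideas, the following R sketch simulates outcomes from the Mixed Logit Model (1.2) with a single standardized random intercept, so that $\mathbf{D}_\sigma$ reduces to a scalar $\sigma$; all numerical values are arbitrary illustrative choices.

```r
## Simulating from the Mixed Logit Model (1.2) with one random intercept
set.seed(2)
n  <- 50                        # observational units
ni <- 10                        # observations per unit
beta  <- c(-0.5, 1)             # fixed effects (arbitrary)
sigma <- 0.8                    # sd of the random intercept: D_sigma = sigma

id  <- rep(1:n, each = ni)
x   <- rnorm(n * ni)
u   <- rnorm(n)                 # standardized random effects u_i ~ N(0, 1)
eta <- beta[1] + beta[2] * x + sigma * u[id]          # linear predictor eta_ij
y   <- rbinom(n * ni, size = 1, prob = plogis(eta))   # y_ij | u_i ~ B(1, mu_ij)
```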

**1.2.2** **Special Case: Linear Mixed Models**

A widely studied GLMM is the class of Gaussian Linear Mixed Models (LMM), a case that arises when the outcomes $y_{ij}$ are conditionally distributed as $N(\mu_{ij}, \phi)$ with the identity function as link, yielding $\mu_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T\mathbf{D}_\sigma\mathbf{u}_i$. In order to facilitate the exposition of its properties and computational methods, we will use the matrix representation of the LMM, which mirrors that of the Linear Model. After defining the *random error* component $\epsilon_{ij}$, following a $N(0,\phi)$ distribution, model (1.1) can be represented as follows:

$$y_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T\mathbf{D}_\sigma\mathbf{u}_i + \epsilon_{ij}. \tag{1.4}$$

Outcomes can then be concatenated into conditionally independent Normal vectors $\mathbf{y}_i = [y_{ij}]_{j=1}^{n_i}$ following $N_{n_i}(\boldsymbol{\mu}_i, \phi\mathbf{I}_{n_i})$ distributions with means $\boldsymbol{\mu}_i = [\mu_{ij}]_{j=1}^{n_i}$, by the properties of the Multivariate Normal Distribution, yielding a matrix characterization per observational unit:

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{D}_\sigma\mathbf{u}_i + \boldsymbol{\epsilon}_i, \tag{1.5}$$

where $\mathbf{X}_i = [\mathbf{x}_{ij}^T]_{j=1}^{n_i}$ ($n_i\times p$) represents the matrix of covariates for the $i$-th observational unit and $\mathbf{Z}_i = [\mathbf{z}_{ij}^T]_{j=1}^{n_i}$ ($n_i\times q$) denotes the respective design matrix for the random effects. The last expression is sometimes known as the *Longitudinal* LMM, since it is assumed that the random structure, grouping up to the level of an observational unit, implies repeated observations through time.

A particular case, which we will call the *Random Effect* LMM throughout this manuscript, arises when the random structure of the model can be decomposed into various independent sources of variation up to the level of the observational unit, which implies the partition of the design matrix $\mathbf{Z}_i$ into $\mathbf{Z}_i = [\mathbf{Z}_{i1},\dots,\mathbf{Z}_{iq}]$ and the corresponding partition of the vector of random effects $\mathbf{u}_i = [\mathbf{u}_{ri}]_{r=1}^{q}$, where the $\mathbf{Z}_{ir}$ ($n_i\times q_{ir}$) are the design matrices *per unit* and the $\mathbf{u}_{ri}$ ($q_{ir}\times 1$) their associated random effects. With the conventions $\sigma_0 = \phi^{1/2}$ and $\mathbf{Z}_{i0} = \mathbf{I}_{n_i}$, model (1.5) can be written as follows:

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \sum_{r=0}^{q}\sigma_r\,\mathbf{Z}_{ir}\,\mathbf{u}_{ir}. \tag{1.6}$$

By means of a careful arrangement of the random effects according to the source of variation, $\mathbf{u}_r = [\mathbf{u}_{ri}]_{i=1}^{n}$ ($q_r\times 1$) where $q_r = \sum_{i=1}^{n} q_{ir}$, as well as of the design matrices $\mathbf{Z}_r = \operatorname{blockdiag}[\mathbf{Z}_{ir}]_{i=1}^{n}$ ($N\times q_r$) for $N = \sum_{i=1}^{n} n_i$, with $\mathbf{X} = [\mathbf{X}_i]_{i=1}^{n}$ and $\mathbf{y} = [\mathbf{y}_i]_{i=1}^{n}$, the model can be seen as a particular case of the so-called *Variance Components* Model [Searle et al., 1992], with the following representation:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{r=0}^{q}\sigma_r\,\mathbf{Z}_r\,\mathbf{u}_r. \tag{1.7}$$

It is straightforward to see that equations (1.5) and (1.7) can be deemed cases of a *General* Linear Mixed Model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{D}\mathbf{u} + \boldsymbol{\epsilon}, \tag{1.8}$$

differing only in the structure of $\mathbf{Z}$ and $\boldsymbol{\Delta} = \mathbf{D}\mathbf{D}^T$ (e.g. $\mathbf{D} = \operatorname{blockdiag}(\mathbf{D}_\sigma)_{i=1}^{n}$, $\mathbf{Z} = \operatorname{blockdiag}(\mathbf{Z}_i)_{i=1}^{n}$ in the *Longitudinal* LMM). As pointed out in the literature, see for example McCulloch and Searle [2001], the assumptions of normality for the conditional distribution of the outcomes imply that their marginal distribution is also multivariate normal, $N_N(\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\Sigma})$, with $\boldsymbol{\Sigma} = \mathbf{Z}\boldsymbol{\Delta}\mathbf{Z}^T + \phi\mathbf{I}_N$.
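As a small illustration of this structure, the sketch below assembles $\boldsymbol{\Sigma} = \mathbf{Z}\boldsymbol{\Delta}\mathbf{Z}^T + \phi\mathbf{I}_N$ explicitly for a toy random-intercept *Longitudinal* LMM with two units of three observations each, using the block-diagonal construction above; the variance values are arbitrary.

```r
## Marginal covariance Sigma = Z Delta Z^T + phi I_N for a toy
## random-intercept LMM with two units of three observations each
library(Matrix)
ni  <- c(3, 3)
phi <- 0.5; sigma2 <- 1.2                       # arbitrary variance components
Zi  <- lapply(ni, function(m) matrix(1, m, 1))  # Z_i: random-intercept design
Z   <- bdiag(Zi)                                # Z = blockdiag(Z_i)
Delta <- sigma2 * Diagonal(length(ni))          # blockdiag of Delta_sigma (q = 1)
Sigma <- Z %*% Delta %*% t(Z) + phi * Diagonal(sum(ni))
Sigma   # compound-symmetric blocks, as expected for a random intercept
```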


**1.3** **Approximate Likelihood Inference via Laplace Approximation**

In this manuscript we will limit the discussion of estimation methods to those based on the likelihood, hence the need to provide a characterization of this function. As in many latent variable problems, the expressions of the marginal PDF/PMF $f_\theta(y_{ij})$ of the observed outcomes, which are at the basis of the likelihood function, are obtained after integration of the random effects out of the joint distribution of $[y_{ij}, \mathbf{u}_i]$. Using $\varphi$ to denote the density of the multivariate standard normal random vector $\mathbf{u}_i$, and with the assumption of independence between observational units, the likelihood for model (1.1) is the result of a product of multivariate integrals:

$$\mathcal{L}(\boldsymbol{\theta}) = \prod_{i=1}^{n}\mathcal{L}_i(\boldsymbol{\theta}) = \prod_{i=1}^{n}\int_{\mathbb{R}^q}\prod_{j=1}^{n_i} f_\theta(y_{ij}\,|\,\mathbf{u}_i)\,\varphi(\mathbf{u}_i)\,\mathrm{d}\mathbf{u}_i. \tag{1.9}$$

Moreover, taking advantage of the expression of the multivariate standard normal density and the structure of the exponential family, we define functions $\ell_i$ as:

$$\ell_i(\boldsymbol{\theta},\mathbf{u}_i) := \ell_i(\mathbf{u}_i,\boldsymbol{\theta};\phi) = -\frac{1}{n_i}\left\{\sum_{j=1}^{n_i}\left[\frac{y_{ij}\,\eta_{ij} - b(\eta_{ij})}{\phi} + c(y_{ij},\phi)\right] - \frac{1}{2}\mathbf{u}_i^T\mathbf{u}_i\right\}, \tag{1.10}$$

so that, using the conventions on integral notation, the product (1.9) is equal to either of the following integrals:

$$\mathcal{L}(\boldsymbol{\theta}) = (2\pi)^{-nq/2}\prod_{i=1}^{n}\int_{\mathbb{R}^q}\exp\left[-n_i\,\ell_i(\boldsymbol{\theta},\mathbf{u}_i)\right]\mathrm{d}\mathbf{u}_i = (2\pi)^{-nq/2}\int_{\mathbb{R}^{nq}}\exp\left[\ell(\boldsymbol{\theta},\mathbf{u})\right]\mathrm{d}\mathbf{u}, \tag{1.11}$$

where $\ell(\boldsymbol{\theta},\mathbf{u}) = -\sum_{i=1}^{n} n_i\,\ell_i(\boldsymbol{\theta},\mathbf{u}_i)$. It is known, however, that aside from the case of some particular models, e.g. the LMM of Section 1.2.2, these integrals are non-analytic and therefore need to be approximated numerically.

One such approximation results from the use of *Gaussian Quadrature* methods. Rabe-Hesketh et al. [2002], for instance, show that the likelihood contributions can be written as products of univariate integrals by exploiting the independence of the standardized random effects. These integrals can then be approximated via weighted sums of Hermite polynomials evaluated at *quadrature points* determined by a quadrature rule. Additional accuracy can be obtained using the *Adaptive* version of Gaussian Quadrature (AGQ), which consists in allocating more weight and more points to the regions with more density. Many studies have shown the good properties of the inference based on these approximations when the number of quadrature points increases, see e.g. Rabe-Hesketh et al. [2002], Rabe-Hesketh et al. [2005], Rabe-Hesketh and Skrondal [2008], Pinheiro and Chao [2012], but implementations are limited to models with simple random structures because of the excessive amount of computational resources required. Moreover, even very efficient implementations take a long time to return accurate estimates, see e.g. Huber et al. [2004], making them unappealing for repeated fits in the spirit of bootstrap inference.
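For reference, both AGQ with a chosen number of quadrature points and the Laplace approximation discussed below are exposed, for models with a single scalar random effect, through the nAGQ argument of lme4::glmer; a brief usage sketch on the cbpp data shipped with lme4 (the choice of 25 points is arbitrary):

```r
## Laplace (nAGQ = 1) vs. Adaptive Gaussian Quadrature fits in lme4,
## illustrated on the cbpp data shipped with the package
library(lme4)
f <- cbind(incidence, size - incidence) ~ period + (1 | herd)
fit_laplace <- glmer(f, data = cbpp, family = binomial, nAGQ = 1)
fit_agq     <- glmer(f, data = cbpp, family = binomial, nAGQ = 25)
cbind(Laplace = fixef(fit_laplace), AGQ = fixef(fit_agq))  # compare fixed effects
```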

Another possibility is to consider a *Quasi-Likelihood* approach. Following Green [1987], Breslow and Clayton [1993] highlight the fact that the exponents $n_i\,\ell_i(\boldsymbol{\theta},\mathbf{u}_i)$ in (1.10) can be written as the sum of the conditional log-likelihood contributions and a *penalty* term on the Euclidean norm of $\mathbf{u}_i$. Hence, they propose replacing the first term in this sum with a characterization of the relationship between the conditional expectation $\mu_{ij}$ and the variance $v(\mu_{ij})$ by means of the derivative of the *integrated quasi-likelihood* function $d_{ij}(y_{ij},\mu_{ij}) := -2\int_{y_{ij}}^{\mu_{ij}}[y_{ij}-x]/v(x)\,\mathrm{d}x$, yielding a *Penalized Quasi-Likelihood* (PQL) objective function. In its classic implementation, the PQL is optimized sequentially with respect to $\mathbf{u}_i$ and $\boldsymbol{\beta}$, yielding values of the linear predictor and the conditional expectation evaluated at the optima, $\hat{\eta}_{ij}$ and $\hat{\mu}_{ij}$. On the basis of these quantities, the estimation problem can be written as a LMM $y_{ij}^{w} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T\mathbf{D}_\sigma\mathbf{u}_i + \epsilon_{ij}$ for a *working* response $y_{ij}^{w} = g(\hat{\mu}_{ij}) + (y_{ij}-\hat{\mu}_{ij})\,g'(\hat{\mu}_{ij})$ (a linearization of $g(y_{ij})$ around $\hat{\mu}_{ij}$), with $\epsilon_{ij}\sim N(0, w_{ij})$ and $w_{ij} = v(\hat{\mu}_{ij})[g'(\hat{\mu}_{ij})]^2$, which can then be fit with appropriate methods for LMM, known for being less computationally intensive and easier to implement. However, in spite of this convenience, it has been assessed that the resulting estimates present systematic biases, especially for the variance component parameters and in the presence of very discrete outcomes, see e.g. Breslow and Lin [1995], Jang and Lim [2006]. Owing to this inconsistency, we shall not base our proposals on this method.

A widespread alternative consists in applying the Laplace Approximation Method for Integrals (see e.g. De Bruijn 1970, Barndorff-Nielsen and Cox 1989, Ch. 6), which is a popular way of approximating multiple integrals of the form:

$$I(\boldsymbol{\alpha}) = \int_{\mathbb{R}^d} s(\mathbf{v})\exp\left[-n\,t(\mathbf{v},\boldsymbol{\alpha})\right]\mathrm{d}\mathbf{v} = \int_{\mathbb{R}^d}\exp\left[-n\,h(\mathbf{v},\boldsymbol{\alpha})\right]\mathrm{d}\mathbf{v},$$

when $n$ is large and the function $t$ (equivalently: $h$) has a unique minimum $\tilde{\mathbf{v}}$ for a given $\boldsymbol{\alpha}$. In such a situation, the integral can be approximated by the value of the integrand around $\tilde{\mathbf{v}}$, a procedure that can be seen as applying AGQ after choosing only one quadrature point.
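A minimal numerical illustration of the method, for an assumed toy function $h(v) = \cosh(v) - 1$ whose integral has the known closed form $\int_{\mathbb{R}}\exp[-n\,h(v)]\,\mathrm{d}v = 2e^{n}K_0(n)$: the sketch locates the minimum numerically, as one would in the GLMM setting, and compares the Laplace approximation $\sqrt{2\pi/(n\,h''(\tilde{v}))}\exp[-n\,h(\tilde{v})]$ with the exact value.

```r
## Laplace approximation of I(n) = int exp(-n h(v)) dv, with h(v) = cosh(v) - 1
h    <- function(v) cosh(v) - 1
n    <- 10
vtil <- optimize(h, interval = c(-5, 5))$minimum   # mode v~ (here v~ = 0)
eps  <- 1e-4
h2   <- (h(vtil + eps) - 2 * h(vtil) + h(vtil - eps)) / eps^2   # h''(v~)
laplace <- sqrt(2 * pi / (n * h2)) * exp(-n * h(vtil))
exact   <- 2 * besselK(n, 0, expon.scaled = TRUE)  # = 2 exp(n) K_0(n)
c(laplace = laplace, exact = exact)                # close for moderate n
```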
In order to characterize the approximation of the Likelihood of GLMM with this method, we shall use the notational conventions of Magnus et al. [1995] and Raudenbush et al. [2000]. Specifically, let $\ell_i^{(k)}(\boldsymbol{\theta},\mathbf{u}_i) = \partial_{u_i}^{(k)}\ell_i(\boldsymbol{\theta},\mathbf{u}_i) = \partial\operatorname{vec}\ell_i^{(k-1)}(\boldsymbol{\theta},\mathbf{u}_i)/\partial\mathbf{u}_i^T$ denote the Partial Derivative, or *Jacobian*, of $k$-th order of the function $\ell_i$ with respect to $\mathbf{u}_i$. Under this convention, the *Gradient* of $\ell_i$ with respect to $\mathbf{u}_i$ is the transpose of the first Jacobian, $\nabla_{\mathbf{u}_i}\ell_i(\boldsymbol{\theta},\mathbf{u}_i) = [\ell_i^{(1)}(\boldsymbol{\theta},\mathbf{u}_i)]^T$, and a multivariate Taylor Series expansion of $\ell_i$ around a point $\mathbf{u}_0$ can be written as follows:

$$\ell_i(\boldsymbol{\theta},\mathbf{u}_i) = \ell_i(\boldsymbol{\theta},\mathbf{u}_0) + \ell_i^{(1)}(\boldsymbol{\theta},\mathbf{u}_0)\,[\mathbf{u}_i-\mathbf{u}_0] + \frac{1}{2}[\mathbf{u}_i-\mathbf{u}_0]^T\,\ell_i^{(2)}(\boldsymbol{\theta},\mathbf{u}_0)\,[\mathbf{u}_i-\mathbf{u}_0] + \sum_{k=3}^{\infty} T_{ik}, \tag{1.12}$$
with the following characterization for the terms of order $k\geq 3$:

$$T_{ik} = T_{ik}(\boldsymbol{\theta},\mathbf{u}_0) := \frac{1}{k!}\left[\otimes^{k-1}(\mathbf{u}_i-\mathbf{u}_0)\right]^T\ell_i^{(k)}(\boldsymbol{\theta},\mathbf{u}_0)\,(\mathbf{u}_i-\mathbf{u}_0), \tag{1.13}$$

where $\otimes^{k}\mathbf{v} = \mathbf{v}\otimes\mathbf{v}\otimes\cdots\otimes\mathbf{v}$ ($k$ times) represents the $k$-fold Kronecker product of a vector $\mathbf{v}$. Writing $\tilde{\mathbf{u}}_i$ to designate the minimum of $\ell_i$, sometimes called the *Mode* of the joint PMF/PDF of $[\mathbf{y}_i^T,\mathbf{u}_i^T]^T$, i.e. $\tilde{\mathbf{u}}_i := \tilde{\mathbf{u}}_i(\boldsymbol{\theta}) = \operatorname{argmax}_{\mathbf{u}_i}\left[-n_i\,\ell_i(\boldsymbol{\theta},\mathbf{u}_i)\right]$, and carrying out the expansion around this value, the second term of the expansion (1.12) vanishes, while the quadratic form in the third term recalls the exponent in the density of a normal random vector, yielding the following characterization after exponentiation of the series and subsequent integration:

$$\mathcal{L}_i(\boldsymbol{\theta}) = (2\pi)^{q/2}\,|\mathbf{V}_i(\boldsymbol{\theta})|^{1/2}\exp\left[-n_i\,\tilde{\ell}_i(\boldsymbol{\theta})\right]\exp\left[\varepsilon_i(\boldsymbol{\theta})\right], \tag{1.14}$$

where $\tilde{\ell}_i^{(k)}(\boldsymbol{\theta}) := \ell_i^{(k)}(\boldsymbol{\theta},\tilde{\mathbf{u}}_i;\phi)$, $\mathbf{V}_i(\boldsymbol{\theta}) := [n_i\,\tilde{\ell}_i^{(2)}(\boldsymbol{\theta})]^{-1}$, $R_i := -n_i\sum_{k=3}^{\infty}\tilde{T}_{ik}$ for $\tilde{T}_{ik} = T_{ik}(\boldsymbol{\theta},\tilde{\mathbf{u}}_i)$, and $\varepsilon_i(\boldsymbol{\theta}) := \log\mathrm{E}[\exp(R_i)]$, with the expectation taken over the density of a $N[\tilde{\mathbf{u}}_i, \mathbf{V}_i(\boldsymbol{\theta})]$ random vector. With these considerations, the Laplace-approximated contributions can be formulated as follows:

$$\log\mathcal{L}_i(\boldsymbol{\theta}) = \frac{q}{2}\log(2\pi) + \frac{1}{2}\log|\mathbf{V}_i(\boldsymbol{\theta})| - n_i\,\tilde{\ell}_i(\boldsymbol{\theta}) + \varepsilon_i(\boldsymbol{\theta}), \tag{1.15}$$

which, after neglecting the approximation error in the contributions, yields the *Laplace-approximated log-Likelihood* (LALL):

$$\log\tilde{\mathcal{L}}(\boldsymbol{\theta}) = \frac{nq}{2}\log(2\pi) + \frac{1}{2}\sum_{i=1}^{n}\log|\mathbf{V}_i(\boldsymbol{\theta})| + \tilde{\ell}(\boldsymbol{\theta}), \tag{1.16}$$

where $\tilde{\ell}(\boldsymbol{\theta}) = -\sum_{i=1}^{n} n_i\,\tilde{\ell}_i(\boldsymbol{\theta})$. Customarily, this approximation is then optimized with respect to $\boldsymbol{\theta}$, in the spirit of the Maximum Likelihood (ML) approach, to obtain what we shall call the *Laplace-approximated Maximum Likelihood Estimators* (LAMLE) of the model parameters. Hence, this strategy entails the following two-step procedure:

• Step 1: Optimization of $\ell_i(\boldsymbol{\theta},\mathbf{u}_i)$ with $\boldsymbol{\theta}$ fixed at $\hat{\boldsymbol{\theta}}$ to obtain the modes $\tilde{\mathbf{u}}_i(\boldsymbol{\theta})$.

• Step 2: Optimization of $\log\tilde{\mathcal{L}}(\boldsymbol{\theta})$ to update the values of the estimates,

which, in implementations, can be performed as two separate routines to accelerate the estimation. Moreover, it is of course possible to improve the approximation (and therefore the inference) by taking into account higher-order terms in the Taylor expansion of $\ell_i(\boldsymbol{\theta},\mathbf{u}_i)$ in equation (1.12), yielding a variety of higher-order approximations, see e.g. Lindley [1980], Liu and Pierce [1993], Raudenbush et al. [2000], yet most modern implementations rely on the first-order approximation for computational simplicity.

It is important to point out that the asymptotics of the procedure have long been assessed, both theoretically, see e.g. Shun and McCullagh [1995], and via simulations, see e.g. Joe [2008]. For instance, Douc et al. [2004] show that the estimates obtained on the grounds of an approximate log-likelihood such as (1.16) will have the correct asymptotic distribution as long as the approximation error $\varepsilon_n(\boldsymbol{\theta}) := \sum_{i=1}^{n}\varepsilon_i(\boldsymbol{\theta})$ converges to zero in probability. Other works, such as a recent preprint by Ogden [2016], show that this inference can have first-order accuracy under conditions on the *absolute* error of the Score functions, which in the case of the LALL is given by the uniform norm of the gradient of the approximation error, i.e. $\delta_n(\boldsymbol{\theta}) := \sup_{\boldsymbol{\theta}\in\Theta}\|\nabla_\theta\,\varepsilon_n(\boldsymbol{\theta})\|$. More specifically, they show that when $\delta_n(\boldsymbol{\theta}) = o_p(r_n)$, among other conditions, the LAMLE converge in distribution to the MLE at a rate $r_n^{1/2}$ which will depend on the conditional distribution of the outcome [Ogden, 2016, Theorem 2].

The only case where the likelihood (1.9) has a closed-form expression is when the outcome is Gaussian, i.e. in the context of LMM. To see this, rewrite model (1.5) as $\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\varepsilon}_i$, where the $\boldsymbol{\varepsilon}_i = \mathbf{Z}_i\mathbf{D}_\sigma\mathbf{u}_i + \boldsymbol{\epsilon}_i$ are drawn independently from a $N_{n_i}(\mathbf{0},\boldsymbol{\Sigma}_i)$ distribution with $\boldsymbol{\Sigma}_i = \phi\mathbf{I}_{n_i} + \mathbf{Z}_i\boldsymbol{\Delta}_\sigma\mathbf{Z}_i^T$, because of multivariate normality and the independence between the vectors $\boldsymbol{\epsilon}_i$ and $\mathbf{u}_i$. On the basis of this consideration, the literature on LMM proposes two competing likelihood-based estimation methods, namely the *Maximum Likelihood* (ML) and the *Residual* or *Restricted* Maximum Likelihood (REML) approaches. While the ML estimates can be obtained directly by optimizing the closed-form likelihood with respect to the model parameters using a gradient-based algorithm, it is possible to obtain the same estimates with the LALL, since the Laplace approximation is exact in the context of a Gaussian response. To illustrate this, let us define $\rho^2$ from the terms in the exponential of the integrand in equation (1.11), as in:

$$\ell(\boldsymbol{\theta},\mathbf{u}) \propto \sum_{i=1}^{n}\left\{\|\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\beta} - \mathbf{Z}_i\mathbf{D}_\sigma\mathbf{u}_i\|_2^2 + \phi\,\|\mathbf{u}_i\|_2^2\right\} = \rho^2(\mathbf{u},\boldsymbol{\beta};\phi,\boldsymbol{\sigma}). \tag{1.17}$$

It is straightforward to see that the optimization with respect to $\mathbf{u}$ and $\boldsymbol{\beta}$, in the spirit of the Laplace Method, implies the optimization of $\rho$, an operation that is, in the words of Bates [2010], a *Penalized Least Squares* problem yielding *Henderson's Estimating Equations* [Henderson, 1950] for fixed $\boldsymbol{\sigma}$ and $\phi$. This procedure is at the core of implementations such as the R packages nlme [Pinheiro and Bates, 2009] and lme4 [Bates et al., 2015].
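For reference, this penalized least squares machinery is what lme4::lmer performs internally; a brief usage sketch on the sleepstudy data shipped with lme4, extracting the two blocks of $\boldsymbol{\theta}$ and the conditional modes:

```r
## ML fit of a Longitudinal LMM via lme4 (penalized least squares at its core)
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
fixef(fit)     # fixed effects beta
VarCorr(fit)   # variance components sigma (and residual sd phi^{1/2})
ranef(fit)     # conditional modes u~_i, the inner-step solutions
```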

**1.4** **Bootstrapping Mixed Models**

In what follows, we review some of the bootstrapping procedures for Mixed Models, categorized by the analogies one can make to the schemes available for the LM. First, we focus on proposals having links to the *Residual Bootstrap* for LM, yet accounting for the random structure that is particular to LMM. A second set of schemes, relying on simulation and therefore mimicking the *Wild* and *Parametric* procedures, is then described, while a third batch, consisting in resampling data clusters, an analogue of the *Pairs* procedure, is studied in the last subsection.

**1.4.1** **Bootstraps based on Random Effect Predictions**

If the random structure of a LMM can be decomposed as in equation (1.6), one can consider resampling random errors and *Predictions* of the random effects to create the bootstrap samples. This procedure, often named the *Random Effect Bootstrap* (REB), see e.g. Davison and Hinkley [1997, Chapter 3.8], requires an initial fit of the model providing parameter estimates $\hat{\boldsymbol{\beta}}$ and Empirical Predictions (EP) of the Random Effects $\hat{\mathbf{u}}_r$, as well as the Residuals $\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} - \sum_{r=1}^{q}\hat{\sigma}_r\mathbf{Z}_r\hat{\mathbf{u}}_r$. Then, one can construct bootstrap samples $\mathbf{u}_r^*$ and $\boldsymbol{\epsilon}^*$ via simple random sampling with replacement from the vectors of EP, yielding bootstrap samples of the outcome, as in:

$$\mathbf{y}^* = \mathbf{X}\hat{\boldsymbol{\beta}} + \sum_{r=1}^{q}\hat{\sigma}_r\mathbf{Z}_r\mathbf{u}_r^* + \boldsymbol{\epsilon}^*. \tag{1.18}$$

In practice, it is useful to consider a supplementary step, since the predictions are *shrunk* towards 0, implying that they are less spread out than the random effects [Robinson, 1991]. To illustrate this point, and following Morris [2002], let us consider the very simple example of a model with one random effect for the intercept. In this case, the expression of the *Best Linear Unbiased Predictor* (BLUP) is given by $\hat{u}_i = \frac{n_i\sigma^2}{n_i\sigma^2+\phi}\,(\bar{y}_i - \bar{\mathbf{x}}_i^T\boldsymbol{\beta})$, where $\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, with a variance $\operatorname{Var}(\hat{u}_i) = \frac{n_i\sigma^2}{n_i\sigma^2+\phi}\left(1-\frac{1}{n}\right)\sigma^2$, showing two shrinkage factors with respect to $\sigma^2$, the variance of the RE. To counter this effect, one can consider *reflating* the vectors of RE by dividing them by an empirical estimate of their dispersion, e.g. $v_r = \hat{\mathbf{u}}_r^T\hat{\mathbf{u}}_r/q_r$, thus ensuring that the resulting predictions have a standard variance, a suggestion by Carpenter et al. [2003]. The resulting *reflated* predictions $\mathbf{u}_r^{e} = \hat{\mathbf{u}}_r/\sqrt{v_r}$ are subsequently resampled to create the bootstrap samples.
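A minimal sketch of the REB with the reflation step for a random-intercept LMM, using lme4 for the initial fit and its refit function for the bootstrap refits; the data and the number of replicates are arbitrary illustrative choices.

```r
## Random Effect Bootstrap with reflation for a random-intercept LMM
library(lme4)
fit  <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
uhat <- ranef(fit)$Subject[, 1]       # empirical predictions of the RE
ehat <- residuals(fit)
ue   <- uhat / sqrt(mean(uhat^2))     # reflated predictions (Carpenter et al., 2003)
sig  <- as.data.frame(VarCorr(fit))$sdcor[1]  # sd of the random intercept
Xb   <- as.vector(model.matrix(fit) %*% fixef(fit))
id   <- sleepstudy$Subject

B <- 199
boot_beta <- replicate(B, {
  u_star <- sample(ue, replace = TRUE)[id]   # resample reflated predictions
  y_star <- Xb + sig * u_star + sample(ehat, replace = TRUE)
  fixef(refit(fit, y_star))                  # bootstrap replicate of beta
})
apply(boot_beta, 1, sd)   # REB standard errors of the fixed effects
```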

Another scheme that generalizes the REB is the *Transformation Bootstrap*. Write model (1.8) as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\Sigma}^{1/2}\boldsymbol{\delta}$, where $\boldsymbol{\delta}$ is drawn from a multivariate standard normal distribution and $\boldsymbol{\Sigma} = \mathbf{Z}\boldsymbol{\Delta}\mathbf{Z}^T + \phi\mathbf{I}_N$. In this setting, one can define predictions of the standardized random components of the model, $\hat{\boldsymbol{\delta}} = \hat{\boldsymbol{\Sigma}}^{-1/2}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$, and resample with replacement on this vector to construct bootstrap samples of the residuals $\boldsymbol{\delta}^*$, and consequently $\mathbf{y}^*$, via the relationship:

$$\mathbf{y}^* = \mathbf{X}\hat{\boldsymbol{\beta}} + \boldsymbol{\Sigma}_{\hat{\theta}}^{1/2}\,\boldsymbol{\delta}^*. \tag{1.19}$$

This procedure can be improved by the same *reflation* step as for the REB, to avoid the effects of *shrinkage*.

Outside of the framework of LMM, it is not obvious how to define a residual/prediction-based bootstrap. On one hand, there is the problem of the choice of the prediction method for the random effects: either *Conditional Modes* (CM) or *Empirical Best Predictors* (EBP). On the other hand, predictions of the random effects tend to fail to reproduce the normal distribution, since they depend on raw residuals that will present discreteness for some types of outcomes (e.g. Mixed Logit). For the sake of comparison, we could define the following bootstrap procedure following a suggestion found in Shao and Tu [2012]. Consider *Pearson* Residuals $\hat{\epsilon}_{ij} = [y_{ij}-\hat{\mu}_{ij}]/\sqrt{\phi\,v(\hat{\mu}_{ij})}$, centered around an empirical mean for the whole sample and concatenated into a vector $\hat{\mathbf{e}}_P$, and a prediction of the random effects (either CM or EBP) $\hat{\mathbf{u}}_i$, subsequently concatenated in a vector $\hat{\mathbf{u}}$. After resampling with replacement over both vectors, and creating $\eta_{ij}^* = \mathbf{x}_{ij}^T\hat{\boldsymbol{\beta}} + \mathbf{z}_{ij}^T\mathbf{D}_{\hat{\sigma}}\mathbf{u}_i^*$ and $\mu_{ij}^* = g^{-1}(\eta_{ij}^*)$, bootstrap samples of the outcome can be constructed as follows:

$$y_{ij}^* = \mu_{ij}^* + \left[\hat{\phi}\,v(\mu_{ij}^*)\right]^{1/2} e_{ij}^*. \tag{1.20}$$
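A hedged sketch of scheme (1.20) for a fitted Mixed Logit on simulated data of the form of Section 1.2.1; as noted above, for discrete outcomes the resulting $y_{ij}^*$ are not confined to $\{0,1\}$, which is precisely the limitation of this comparison scheme.

```r
## Sketch of the prediction-based scheme (1.20) for a fitted Mixed Logit;
## note y*_ij is not confined to {0, 1} for Bernoulli outcomes
library(lme4)
set.seed(4)
d <- data.frame(id = factor(rep(1:50, each = 10)), x = rnorm(500))
d$y <- rbinom(500, 1, plogis(-0.5 + d$x + 0.8 * rnorm(50)[d$id]))

fit  <- glmer(y ~ x + (1 | id), data = d, family = binomial)
e_p  <- as.vector(scale(residuals(fit, type = "pearson"), scale = FALSE))
sig  <- as.data.frame(VarCorr(fit))$sdcor[1]
u_cm <- ranef(fit)$id[, 1] / sig               # standardized conditional modes

u_star <- sample(u_cm, replace = TRUE)[d$id]   # resampled random effects
eta_s  <- as.vector(model.matrix(fit) %*% fixef(fit)) + sig * u_star
mu_s   <- plogis(eta_s)                        # mu*_ij; v(mu) = mu(1 - mu), phi = 1
y_star <- mu_s + sqrt(mu_s * (1 - mu_s)) * sample(e_p, replace = TRUE)
```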

**1.4.2** **Bootstraps based on Distributional Assumptions**

If there is enough confidence in the assumptions of the data-generating process, one can consider generating replicates of random effects and errors using the estimates of the model parameters via simulation, in the spirit of the *Parametric Bootstrap* for LM. In LMM, bootstrap samples for random effects (and sampling errors) are generated via simulation using the estimated model parameters, i.e. $\mathbf{u}_i^*$ and $\boldsymbol{\epsilon}_i^*$ are drawn from $N_q(\mathbf{0},\mathbf{I}_q)$ and $N_{n_i}(\mathbf{0},\hat{\phi}\mathbf{I}_{n_i})$ respectively, allowing the construction of the following bootstrap samples for the outcome vectors:

$$\mathbf{y}_i^* = \mathbf{X}_i\hat{\boldsymbol{\beta}} + \mathbf{Z}_i\mathbf{D}_{\hat{\sigma}}\mathbf{u}_i^* + \boldsymbol{\epsilon}_i^*. \tag{1.21}$$

This intuitive scheme has been used in many contexts with fair results, see e.g. Butar and Lahiri [2003], Lahiri et al. [2003], González-Manteiga et al. [2007] for illustrations of its use in the estimation of the uncertainty of predictions of random effects.

The extension to a non-Gaussian context is straightforward, since it is apparent that the link between the estimates $\hat{\boldsymbol{\theta}}$, the predictions $\hat{\mathbf{u}}_i$ and the conditional CDF of the outcome is mediated through the fitted linear predictor $\hat{\eta}_{ij}$, and thus the fitted mean $\hat{\mu}_{ij}$, via the link function $g$. It follows that bootstrap samples of the outcomes $y_{ij}^*$ can be drawn by simulation from the conditional distribution $F_{\hat{\mu}_{ij}^*}$, where $g(\hat{\mu}_{ij}^*) = \mathbf{x}_{ij}^T\hat{\boldsymbol{\beta}} + \mathbf{z}_{ij}^T\mathbf{D}_{\hat{\sigma}}\mathbf{u}_i^*$ with replicates $\mathbf{u}_i^*$ drawn from a $N(\mathbf{0},\mathbf{I}_q)$. Variations of the method, such as a *Wild Bootstrap* procedure which does not require the multivariate normality of the $\mathbf{u}_i^*$, can also be considered, see e.g. González-Manteiga et al. [2008].
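In lme4 this simulation-based scheme is available directly: simulate() draws outcome replicates from the fitted conditional distribution with fresh $\mathbf{u}_i^*$, and bootMer() wraps the full parametric bootstrap loop; a brief usage sketch on the cbpp data (nsim values kept small for illustration):

```r
## Parametric bootstrap of a GLMM fit via lme4's built-in machinery
library(lme4)
fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
y_star <- simulate(fit, nsim = 2)   # draws y* from F_mu* with u* ~ N(0, I_q)
pb <- bootMer(fit, FUN = fixef, nsim = 99, type = "parametric")
apply(pb$t, 2, sd)                  # parametric bootstrap SEs of the fixed effects
```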