On the Inference of Random Effects in Generalized Linear Mixed Models


FLORES AGREDA, Daniel Antonio


In the first chapter, the problem of Bootstrap inference for the parameters of a GLMM is addressed. We formulate a bootstrapping strategy consisting of the random weighting of the contributions to the Joint Likelihood of Outcomes and Random Effects. Applying the Laplace Approximation method for integrals to this function yields a Random Weighted Log-Likelihood that produces the desired bootstrap replicates after optimization. In order to assess the properties of this procedure, which we name Random Weighted Likelihood Bootstrap (RWLB), we compare analytically its resulting Estimating Equations (EE) to those of the Generalized Cluster Bootstrap for Gaussian LMM and conduct simulation studies in both LMM and Mixed Logit regression contexts. The second chapter explores adaptations of the RWLB to the estimation of the uncertainty in the prediction of random effects in a GLMM, as measured by the Mean Squared Error for the Predictors (MSEP).

FLORES AGREDA, Daniel Antonio. On the Inference of Random Effects in Generalized Linear Mixed Models. Thèse de doctorat : Univ. Genève, 2017, no. GSEM 44

DOI : 10.13097/archive-ouverte/unige:102003 URN : urn:nbn:ch:unige-1020037



On the Inference of Random Effects in Generalized Linear Mixed Models


Daniel Antonio Flores Agreda

A thesis submitted to the

Geneva School of Economics and Management, University of Geneva, Switzerland,

in fulfillment of the requirements for the degree of PhD in Statistics

Members of the thesis committee:

Prof. Stefan Sperlich, Chair, University of Geneva

Prof. Eva Cantoni, Adviser, University of Geneva

Prof. Alan Welsh, Jury, College of Physical and Mathematical Sciences, Australian National University

Thesis No. 44 July 2017



Acknowledgements

This dissertation would never have been accomplished without the outstanding guidance of Professor Eva Cantoni. In her role of instructor throughout my undergraduate and graduate studies, she ignited in me a deep interest in the field of Statistics through her lectures. Later, during the preparation of this PhD Thesis, she consistently contributed countless profound ideas, insightful comments and productive intuitions. Aside from these professional qualities, she has proven to be a greatly empathetic colleague and a remarkably humorous friend.

Secondly, I would like to thank Prof. Sperlich and Prof. Alan H. Welsh for their helpful criticisms and inquisitive remarks. The points they raised during the last stages of the writing of this dissertation have, in my view, provided a much needed perspective for this project to achieve the required degree of accomplishment.

Throughout these years all of my colleagues at the GSEM have provided me with various types of help, be it through our extensive academic discussions or our (even lengthier) moments of laughter. Among them, I would especially like to mention my good friends Ingrid, Marco, Mauro, Rose, Marie-Claire, Elise, Anja, Mattia, Sam, Setareh and Haotian.

Finally, I must acknowledge my family in my home country, who even from a distance have provided me with solid support and understanding throughout the years.

To all of you, all my gratitude.



Abstract

This dissertation aims to provide a contribution to the literature on inference in Generalized Linear Mixed Models (GLMM) by means of Bootstrapping techniques.

In the first chapter, the problem of Bootstrap inference for the parameters of a GLMM is addressed. Inspired by procedures based on the random weighting of the Estimating Equations (EE), see e.g. Chatterjee and Bose [2005] and, in particular, the Generalized Cluster Bootstrap (GCB) for Linear Mixed Models (LMM) [Field et al., 2010], we formulate a bootstrapping strategy consisting of the random weighting of the contributions to the Joint Likelihood of Outcomes and Random Effects. Using the Laplace Approximation method for integrals, see e.g. De Bruijn [1970], to 'integrate out' the random effects in the resulting weighted joint likelihood yields a Random Weighted log-Likelihood function that produces the desired bootstrap replicates after optimization. In order to assess the validity of the procedure, which we name Random Weighted Likelihood Bootstrap (RWLB), we compare its resulting EE to the expressions used in the GCB for a Random Effect LMM, showing that both schemes agree up to the weighting of a term in the EE of the Variance Components. Simulation studies conducted in an LMM with random slope and a Mixed Logit regression with one random effect show that the distribution of the bootstrap replicates provides a fairly good approximation of the asymptotic distribution for the Model Parameters.

The second chapter addresses the estimation of the Mean Squared Error for the Predictors (MSEP) of Random Effects in GLMM by means of non-parametric bootstrap methods. In the frequentist paradigm, the MSEP is used as a measure of the uncertainty in prediction and has been shown to be affected by the estimation of the model parameters. In the particular case of linear mixed models (LMM), two solutions are provided to practitioners: on the one hand, second-order correct approximations which yield estimators of this quantity and, on the other hand, several Parametric Bootstrap algorithms. We propose a non-parametric bootstrap scheme, consisting of an adaptation of the RWLB, that can be used in the entire class of GLMM. In a first stage, the RWLB is used to generate bootstrap replicates of the parameters while, in a second stage, simulation is used to generate bootstrap samples of standardized Random Effects. We conduct a first simulation study in the framework of Gaussian LMM to contrast the quality of our approach with respect to: (i) analytical estimators of MSEP based on approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes, on the grounds of their relative bias, relative efficiency and the coverage ratios of the resulting prediction intervals. A second simulation study serves to illustrate the use and benefits of our proposal against other feasible alternatives in a pure, Non-Gaussian, GLMM setting.


Résumé

This thesis is a contribution to the literature on statistical inference in Generalized Linear Mixed Models (GLMM) by means of Bootstrap techniques.

In the first chapter, we address the problem of Bootstrap inference for the parameters of a GLMM. Drawing inspiration from procedures based on a random weighting of the Estimating Equations (EE) (Chatterjee and Bose [2005]), and more specifically from the method known as the Generalized Cluster Bootstrap (GCB, Field et al. [2010]) for Linear Mixed Models (LMM), we propose a Bootstrap strategy that consists in randomly weighting the individual contributions to the joint likelihood. Using the Laplace method for approximating integrals (De Bruijn [1970]) to integrate out the random effects, the weighted joint likelihood function induces a Random Weighted Log-Likelihood function that produces the desired Bootstrap replicates after optimization. In order to highlight the properties of this procedure, which we call Random Weighted Likelihood Bootstrap (RWLB), in the framework of an LMM, we compare the expressions of the resulting EE with those of the GCB. We show, in particular, that the two procedures coincide up to one term in the equation concerning the parameters associated with the Variance Components. Simulation studies carried out on an LMM with a random slope, as well as on a mixed logistic regression with one random effect, show that the bootstrap distribution provides a good approximation of the asymptotic distribution of the model parameters.

The second chapter deals with the estimation of the Mean Squared Error of the Predictors (MSEP) for the Random Effects in a GLMM. In the frequentist paradigm, the MSEP is used as a measure of the uncertainty of the predictions, which is, moreover, affected by the estimation of the model parameters. In the particular case of LMM, two solutions are offered to practitioners: on the one hand, estimators resulting from second-order approximations and, on the other hand, Parametric Bootstrap algorithms. We propose a new bootstrap scheme based on an adaptation of the RWLB, which can therefore be applied to the whole class of GLMM. This algorithm relies on the RWLB to generate the bootstrap replicates of the estimators of the model parameters, and on a simulation procedure to produce bootstrap samples of standardized Random Effects. We conduct a simulation study in the LMM framework in order to compare the quality of our proposal with: (i) estimators of the MSEP based on analytical approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes. We evaluate the relative bias, the relative efficiency and the coverage rate of the resulting prediction intervals. An additional simulation helps us illustrate the use and benefits of our proposal compared with other feasible alternatives in the context of a Non-Normal GLMM.



Acknowledgements
Abstract
Résumé
Introduction

1 Bootstrapping GLMM via a Random Weighted Laplace Approximation
1.1 Introduction
1.2 Model and Notation
1.2.1 The General Formulation of a GLMM
1.2.2 Special Case: Linear Mixed Models
1.3 Approximate Likelihood Inference via Laplace Approximation
1.4 Bootstrapping Mixed Models
1.4.1 Bootstraps based on Random Effect Predictions
1.4.2 Bootstraps based on Distributional Assumptions
1.4.3 Bootstraps based on Data Clusters
1.4.4 Discussion
1.5 A Random Weighted Bootstrap for GLMM
1.5.1 Relation with the GCB for a Random Effect LMM
1.5.2 Properties in GLMM with an Individual Random Effect
1.5.3 Implementation
1.6 Simulation Study
1.6.1 Gaussian Responses
1.6.2 Binary Response
1.7 Conclusions and Discussion
1.8 Appendix
1.8.1 Comparison of the Properties of RWLB vs. GCB
1.8.2 Plots LMM
1.8.3 Tables LMM
1.8.4 Tables GLMM
1.8.5 TMB templates for LAWLL

2 Bootstrapping GLMM to Estimate the Uncertainty in Prediction
2.1 Introduction
2.2 Model and Notation
2.2.1 Special Case: Linear Mixed Models
2.2.2 Point Predictors of Random Effects
2.3 Measures of Uncertainty in Prediction of Random Effects
2.3.1 Estimation of MSEP via Approximations
2.3.2 Computation and Correction of CV
2.3.3 Bootstrap-Based Measures of the Uncertainty
2.3.4 Prediction Intervals
2.3.5 Implementation
2.4 Simulation Studies
2.4.1 Gaussian Responses
2.4.2 Binary Response
2.5 Conclusions and Discussion
2.6 Appendix
2.6.1 Coverage Ratios for the Linear Predictor in GLMM study

References


To my family and my friends.



Introduction

Over the years, Linear Models (LM) have proven to be powerful and versatile tools in model-based research. Initially limited to a Gaussian response context with one random error, they were first extended to include various sources of variation by the addition of random effects to their classic specification, yielding the class of Linear Mixed Models, see e.g. Henderson [1950].

Similarly, Generalized Linear Models (GLM) [Nelder and Wedderburn, 1972], initially developed to model qualitative and quantitative variables independently drawn from exponential families, were also extended to account for repeated observations in various manners. One such proposal, the framework of Generalized Estimating Equations (GEE) [Liang and Zeger, 1986], can be seen as an extension of the quasi-likelihood inference in GLM, with matrix equivalents to the variances of the response depending on a working correlation matrix to account for the covariance structure among the elements within an observational unit. While simple to implement and intuitive to use, its inferential procedures are limited by the lack of an expression for the Likelihood function. Moreover, the introduction of an additional degree of subjectivity, in the shape of the working correlations, induces limitations in their application.

The extension of GLM that will interest us in this dissertation is the class of Generalized Linear Mixed Models (GLMM), a generalization of the mixed-effects approach to non-Gaussian responses, hence differing from the basic GLM in the addition of unobservable random effects at the level of the linear predictor. Contrary to the GEE approach, the Marginal Likelihood of a GLMM can be defined by the integration of the joint density of outcomes and random effects, albeit with an added complication, as the integrals often lack a closed-form expression and hence need approximations.

Like GLM, Bootstrapping techniques [Efron, 1979] have become essential elements in the toolbox of any statistician. By means of an algorithm that is both intuitive and easy to implement, researchers and practitioners alike have been given a simple method to draw inference for complex model features. In the case of Linear Models, typical implementations of this procedure rely on either resampling of fitted residuals, resampling of data pairs, or simulation to construct the bootstrap samples of outcomes and the bootstrap replicates of the parameters required to infer on the features of the sampling distribution. To the best of our knowledge, however, extensions of the method to the GLMM framework have been scarce and rely, for the most part, on simulation. This is partly due to the complexity of the random structure, which reduces the exchangeability of observations necessary in both the pairs and residuals approaches, and partly to the supplementary obstacle entailed by the Likelihood approximation methods, which impedes straightforward extensions of said approaches.


This dissertation aims to contribute to the discussion on the inference of GLMM by means of Bootstrapping techniques. We focus our attention on two subjects, namely the Bootstrap applied to GLMM for inference on the parameter estimates and its application in the estimation of the uncertainty of Random Effect Predictions. As a consequence, the manuscript comprises two self-contained chapters treating each of these subjects.

Chapter 1 addresses the problem of bootstrap-based inference in GLMM. After a review of bootstrap schemes currently available in the literature, and inspired in part by procedures based on the random weighting of the Estimating Equations, see e.g. Chatterjee and Bose [2005], Field et al. [2010], we formulate a strategy based on (i) random weighting of the contributions to the Joint Likelihood of outcomes and Random Effects and (ii) the Laplace method for the approximation of integrals.

Chapter 2 deals with the measure of uncertainty in predictions for random effects via Bootstrap methods. The manuscript overviews the literature on the measure of said uncertainty, which is customarily carried out via approximations to both Frequentist measures, such as the Mean Squared Error of Prediction, and others inherited from the Bayesian paradigm, such as the Conditional Variances. In this work, we propose an implementation of the method developed in the previous chapter as an alternative way of computing these measures.


Chapter 1

Bootstrapping GLMM via a Random Weighted Laplace Approximation

1.1 Introduction

Ever since the introduction of the Bootstrap in the classic paper by Efron [1979], applications of the method to a wide array of problems have been developed and refined at an increasingly rapid pace. Among its many uses, one can mention the applications in Regression Models, where the Bootstrap helps researchers perform inference on the sampling distribution of the model parameters by means of bootstrap estimates for their mean, bias, variance or density. For this purpose, a great number of bootstrap schemes and strategies have been developed over time.

Consider for example the Gaussian Linear Model (LM): $y_i = x_i^T\beta + \epsilon_i$, $i \in \{1, \dots, n\}$, where $x_i$ $(p \times 1)$ are covariate vectors and $\epsilon_i$ are independent draws from a $N(0, \sigma^2)$ distribution. The first and most widespread of the bootstrap strategies, known as Residual Bootstrap [Efron, 1979], consists in creating replicates of the random errors $\epsilon_i^*$ by resampling with replacement from the vector of predicted residuals $\hat\epsilon = [\hat\epsilon_i]_{i=1}^n$, $\hat\epsilon_i = y_i - x_i^T\hat\beta$, and creating Bootstrap replicates of the outcomes using the assumed data-generating mechanism with an estimate of the parameters $\hat\beta$, i.e. $y_i^* = x_i^T\hat\beta + \epsilon_i^*$. A second common approach, known as External or Wild Bootstrap [Wu, 1986], creates the bootstrap samples via simulation of standardized residuals $u_i^*$ from a distribution with $E[u_i^*] = 0$ and $\mathrm{Var}[u_i^*] = 1$, therefore constructing $\epsilon_i^* = \hat\sigma u_i^*$ and then operating as in the Residual Bootstrap. When the distribution of $u_i^*$ is standard Normal, thus coinciding with the assumed data generating process, this procedure is known as Parametric Bootstrap. The third scheme often proposed in the literature is the Pairs Bootstrap procedure, which consists in resampling outcomes and covariates per observational unit, or pairs of data $[y_i, x_i^T]^T$, with replacement to create samples $[y_i^*, x_i^{*T}]^T$, yielding Bootstrap replicates of the parameters by optimizing an objective function such as the Likelihood or the Residual Sum of Squares.
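The three schemes above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names (`residual_bootstrap`, `parametric_bootstrap`, `pairs_bootstrap`, `bootstrap_se`), the simulated data and the number of replicates $B = 500$ are hypothetical choices, and ordinary least squares stands in for the optimization step.

```python
import numpy as np

def ols(X, y):
    """Least-squares estimate of beta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def residual_bootstrap(X, y, rng):
    # Resample the fitted residuals with replacement, rebuild y*, refit.
    beta_hat = ols(X, y)
    resid = y - X @ beta_hat
    eps_star = rng.choice(resid, size=len(y), replace=True)
    return ols(X, X @ beta_hat + eps_star)

def parametric_bootstrap(X, y, rng):
    # Simulate standard Normal u*, set eps* = sigma_hat * u*, rebuild y*, refit.
    n, p = X.shape
    beta_hat = ols(X, y)
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))
    u_star = rng.standard_normal(n)
    return ols(X, X @ beta_hat + sigma_hat * u_star)

def pairs_bootstrap(X, y, rng):
    # Resample whole pairs (y_i, x_i) with replacement, refit.
    idx = rng.integers(0, len(y), size=len(y))
    return ols(X[idx], y[idx])

def bootstrap_se(scheme, X, y, B=500, seed=0):
    """Bootstrap standard errors from B replicates of beta."""
    rng = np.random.default_rng(seed)
    reps = np.array([scheme(X, y, rng) for _ in range(B)])
    return reps.std(axis=0, ddof=1)

# Simulated Gaussian LM with intercept and one covariate (illustrative values).
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

se = {f.__name__: bootstrap_se(f, X, y)
      for f in (residual_bootstrap, parametric_bootstrap, pairs_bootstrap)}
```

In this independent-observations setting the three sets of standard errors should be of comparable size; it is precisely this agreement that is no longer guaranteed once observations are clustered.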

Over time, these basic procedures have been widely studied and extended to encompass a variety of deviations from the basic linear model assumptions, yet they all rely critically on the assumption of lack of correlation among observations and therefore on their exchangeability. As has been argued, this assumption could have spurious effects when dealing with clustered data or repeated observations, where the random structure is more complex due to the inclusion of Random Effects (RE). For example, McCullagh [2000] shows that different resampling strategies yield different bootstrap estimators of the sampling variance, and that none of them estimates this feature consistently. Other theoretical works study the conditions under which resampling procedures yield good asymptotic approximations of the sampling distribution of the parameters. For instance, Field and Welsh [2007] show that the higher-order moment structure of the random effects, in conjunction with the assumptions on the data generating process and the choice of the resampling scheme, has an impact on the bootstrap estimates of the sampling variance of the parameters for a one-way array. More generally, Field et al. [2008] show that these structures and assumptions have an effect on the quality of the bootstrap approximation of the sampling distribution for all the parameters in a Longitudinal Linear Mixed Model.

In order to provide a general procedure, and avoid the considerations that entail a burden of choice, Field et al. [2010] propose what is perhaps the most interesting of these schemes, the Generalized Cluster Bootstrap (GCB), building on many proposals for bootstrapping via random weighting (e.g. Rubin et al. 1981, Newton and Raftery 1994) and via the estimating equations (e.g. Hu and Kalbfleisch 1997 and Chatterjee and Bose 2005). What is more, Field et al. [2010] and Samanta and Welsh [2012] provide evidence of the usefulness of the method by applying it to the fit of a Longitudinal Linear Mixed Model via estimating equations with bounded Influence Functions, showing ways of extending the procedure with considerations on its robustness, while more recent work [Ding and Welsh, 2017] explores the weighting schemes by comparing the resulting bootstraps across model assumptions.

To the best of our knowledge, however, the question of a general non-parametric bootstrap for a Non-Gaussian Generalized Linear Mixed Model is still open. It is clear that in this context, one of the main problems comes from the difficulty of defining a residual to be resampled, making parametric or wild procedures more appealing. Another burden comes from the difficulties in approximating the multivariate integrals that compose the likelihood, which potentially induce systematic biases.

We feel that any attempt to address the question of a general bootstrap method in GLMM could be useful in various applications such as the computation of the asymptotic variances for parameter estimates, the correction of small-sample biases and the construction of confidence intervals. More inherent to GLMM is the estimation of the uncertainty of the predictions of the random effects as measured, for instance, by the Mean Squared Error of Prediction (MSEP), see e.g. Kackar and Harville [1984], Prasad and Rao [1990], which often requires expensive computations that can potentially be bypassed with the use of bootstrap methods. Hence, this manuscript attempts to contribute to this discussion by proposing a Weighted Bootstrap applied to the Laplace Approximation for Longitudinal Non-Gaussian GLMM in the spirit of GCB [Field et al., 2010, Pang and Welsh, 2014, Ding and Welsh, 2017].

This chapter is structured as follows. First, we set up the general notation and features of the GLMM while providing the expressions for the important example that is the class of LMM (Section 1.2). After an overview of some other methods for approximating the Likelihood of these models, we describe the key elements of the inference based on the Laplace approximation, while providing the link between said methods and the exact ML inference in LMM (Section 1.3). We overview the bootstrapping methods for Gaussian and Non-Gaussian GLMM in Section 1.4, then formulate our proposal and expose its links to other schemes based on random weighting (Section 1.5). Finally, we provide evidence of the finite sample properties of the method in two simulated examples, an LMM with random slope and intercept inspired by the Orthodontic dataset and a Mixed Logit resulting from the fit of the Toenail data (Section 1.6).


1.2 Model and Notation

Let $i = 1, \dots, n$ denote the index of the observational unit and $j = 1, \dots, n_i$ the index for an observation within this unit. Write $\theta = [\beta^T, \sigma^T]^T$ $((p+s) \times 1)$ to denote the vector of model parameters, where $\beta$ $(p \times 1)$ represents the fixed effect parameters and $\sigma$ $(s \times 1)$ contains the parameters associated with the random effects that we will (informally) refer to as variance components.

1.2.1 The General Formulation of a GLMM

In our formulation, we write $y_{ij}$ to denote the observed outcomes, assumed to be independently drawn from an exponential family when conditioned on a vector of covariates $x_{ij}$ $(p \times 1)$ and a vector of random effects $\gamma_i$ $(q \times 1)$ following a $N_q(0, \Delta_\sigma)$ distribution, endowed with a positive-definite symmetric covariance matrix $\Delta_\sigma$. For simplicity of exposition and following the usual practice in the literature, we consider the reparametrization $\gamma_i = D_\sigma u_i$ resulting from the Cholesky decomposition of the covariance matrix of the random effects $\Delta_\sigma = D_\sigma D_\sigma^T$, where $u_i$ are multivariate standard normal vectors (see e.g. Bates 2010). We shall write $\mu_{ij}$ to denote the conditional expectation of the outcome, $z_{ij}$ $(q \times 1)$ a design vector for the random effects and $\eta_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i$ the linear predictor. With $g$ representing a monotonic link function that maps the conditional expectation of the outcome to the linear predictor, a general formulation of the GLMM can be written as follows:

$$g(\mu_{ij}) = \eta_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i. \tag{1.1}$$

This general formulation encompasses a wide array of models, such as Mixed versions of Poisson, Gamma and Inverse Gaussian regression. In particular, assuming $y_{ij}$ are independent draws from a conditional Bernoulli distribution $B(1, \mu_{ij})$, and via a logit link for $\mu_{ij}$, it is possible to formulate the popular Mixed Logit Model as follows:

$$\mathrm{logit}(\mu_{ij}) = \log\left(\frac{\mu_{ij}}{1 - \mu_{ij}}\right) = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i. \tag{1.2}$$

In order to characterize the expressions of the Likelihood, let us write $f$ to denote the probability density function (PDF) or probability mass function (PMF) evaluated at the observed outcomes $y_{ij}$, conditioned on vectors $x_{ij}$, $u_i$ and assumed to follow conditional exponential families, therefore conferred with the following structure:

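$$f_\theta(y_{ij} \mid u_i) := f(y_{ij} \mid x_{ij}, u_i; \theta, \phi) = \exp\left\{ \frac{y_{ij}\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij}, \phi) \right\} \tag{1.3}$$

for $\phi$ a nuisance dispersion parameter, $\xi_{ij} = \xi(\eta_{ij})$ the so-called canonical parameter (when $\phi$ is known) and with $b$ and $c$ denoting some specific functions. Moreover, it is known that $b$ is a function that identifies the conditional mean and variance of the outcomes, i.e. $\mu_{ij} = E_\theta[Y_{ij} \mid u_i] = b'(\xi_{ij})$ and $\nu_{ij} = v(\mu_{ij}) = \mathrm{Var}_\theta[Y_{ij} \mid u_i] = \phi\, b''(\xi_{ij})$. In what follows, we consider the link $g$ to be the canonical link, in other words $\mu_{ij} = b'(\eta_{ij})$, implying $\xi_{ij} = \eta_{ij}$.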
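As a quick numerical illustration of these identities, consider the Bernoulli family underlying the Mixed Logit Model, for which $b(\xi) = \log(1 + e^{\xi})$, $\phi = 1$ and $c(y, \phi) = 0$. The sketch below, in which the helper names and the finite-difference step sizes are our own assumptions, checks that $b'(\xi)$ returns the conditional mean and $\phi\, b''(\xi)$ the conditional variance.

```python
import numpy as np

def b(xi):
    """Cumulant function of the Bernoulli family: b(xi) = log(1 + exp(xi))."""
    return np.logaddexp(0.0, xi)

def b_prime(xi, h=1e-5):
    """Central finite-difference approximation of b'(xi)."""
    return (b(xi + h) - b(xi - h)) / (2.0 * h)

def b_second(xi, h=1e-4):
    """Central finite-difference approximation of b''(xi)."""
    return (b(xi + h) - 2.0 * b(xi) + b(xi - h)) / h**2

def log_density(y, xi, phi=1.0):
    """Log of the exponential-family density; c(y, phi) = 0 for Bernoulli."""
    return (y * xi - b(xi)) / phi

# Canonical link: xi equals the linear predictor eta (illustrative values).
xi = np.array([-2.0, 0.0, 0.7])
mu = 1.0 / (1.0 + np.exp(-xi))      # inverse logit of the linear predictor

mean_from_b = b_prime(xi)           # should match mu
var_from_b = b_second(xi)           # should match mu * (1 - mu) since phi = 1
```

The same check applies to other members of the family once $b$, $c$ and $\phi$ are replaced accordingly.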

1.2.2 Special Case : Linear Mixed Models

A widely studied GLMM is the class of Gaussian Linear Mixed Models (LMM), a case that arises when the outcomes $y_{ij}$ are conditionally distributed as $N(\mu_{ij}, \phi)$ with the identity function as link, yielding $\mu_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i$. In order to facilitate the exposition of its properties and computational methods we will use the matrix representation of the LMM, which mirrors that of the Linear Model. After defining the random error component $\epsilon_{ij}$ following a $N(0, \phi)$ distribution, model (1.1) can be represented as follows:

$$y_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i + \epsilon_{ij}. \tag{1.4}$$

Outcomes can then be concatenated into conditionally independent Normal vectors $y_i = [y_{ij}]_{j=1}^{n_i}$ following $N_{n_i}(\mu_i, \phi I_{n_i})$ distributions with means $\mu_i = [\mu_{ij}]_{j=1}^{n_i}$ by the properties of the Multivariate Normal Distribution, yielding a matrix characterization per observational unit:

$$y_i = X_i\beta + Z_i D_\sigma u_i + \epsilon_i, \tag{1.5}$$

where $X_i = [x_{ij}^T]_{j=1}^{n_i}$ $(n_i \times p)$ represents the matrix of covariates for the $i$-th observational unit and $Z_i = [z_{ij}^T]_{j=1}^{n_i}$ $(n_i \times q)$ denotes the respective design matrix for the random effects. The last expression is sometimes known as the Longitudinal LMM, since it is assumed that the random structure grouping up to the level of an observational unit implies repeated observations through time.

A particular case, which we will call Random Effect LMM throughout this manuscript, arises when the random structure of the model can be decomposed into various independent sources of variation up to the level of the observational unit, which implies the partition of the design matrix $Z_i$ into $Z_i = [Z_{i1}, \dots, Z_{iq}]$ and the corresponding partition of the vector of random effects $u_i = [u_{ir}]_{r=1}^{q}$, where $Z_{ir}$ $(n_i \times q_{ir})$ are the design matrices per unit and $u_{ir}$ $(q_{ir} \times 1)$ their associated random effects. With the convention $\sigma_0 = \phi^{1/2}$ and $Z_{i0} = I_{n_i}$, model (1.5) can be written as follows:

$$y_i = X_i\beta + \sum_{r=0}^{q} \sigma_r Z_{ir} u_{ir}. \tag{1.6}$$

By means of a careful arrangement of the random effects according to source of variation, $u_r = [u_{ir}]_{i=1}^{n}$ $(q_r \times 1)$ where $q_r = \sum_{i=1}^{n} q_{ir}$, as well as of the design matrices $Z_r = \mathrm{blockdiag}[Z_{ir}]_{i=1}^{n}$ $(N \times q_r)$ for $N = \sum_{i=1}^{n} n_i$, with $X = [X_i]_{i=1}^{n}$ and $y = [y_i]_{i=1}^{n}$, the model can be seen as a particular case of the so-called Variance Components Model [Searle et al., 1992], with the following representation:

$$y = X\beta + \sum_{r=0}^{q} \sigma_r Z_r u_r. \tag{1.7}$$

It is straightforward to see that equations (1.5) and (1.7) can be deemed as cases of a General Linear Mixed Model:

$$y = X\beta + ZDu + \epsilon \tag{1.8}$$

differing only in the structure of $Z$ and $\Delta = DD^T$ (e.g. $D = \mathrm{blockdiag}(D_\sigma)_{i=1}^{n}$, $Z = \mathrm{blockdiag}(Z_i)_{i=1}^{n}$ in the Longitudinal LMM). As pointed out in the literature, see for example McCulloch and Searle [2001], the assumptions of normality for the conditional distribution of the outcomes imply that their marginal distribution is also multivariate normal $N_N(X\beta, \Sigma)$, with $\Sigma = Z\Delta Z^T + \phi I_N$.
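To make the structure of $\Sigma$ concrete, the following sketch assembles $Z$, $D$ and $\Sigma = Z\Delta Z^T + \phi I_N$ for a toy Longitudinal LMM with a random intercept and slope. The helper names (`block_diag`, `marginal_covariance`) and all numerical values are hypothetical and serve only as an illustration.

```python
import numpy as np

def block_diag(blocks):
    """Stack matrices along the diagonal of a larger zero matrix."""
    rows = sum(blk.shape[0] for blk in blocks)
    cols = sum(blk.shape[1] for blk in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for blk in blocks:
        out[r:r + blk.shape[0], c:c + blk.shape[1]] = blk
        r += blk.shape[0]
        c += blk.shape[1]
    return out

def marginal_covariance(Z_list, D_sigma, phi):
    """Sigma = Z Delta Z^T + phi I_N with Delta = D D^T (Longitudinal LMM)."""
    Z = block_diag(Z_list)                       # blockdiag(Z_i), i = 1..n
    D = block_diag([D_sigma] * len(Z_list))      # blockdiag(D_sigma), i = 1..n
    Delta = D @ D.T
    return Z @ Delta @ Z.T + phi * np.eye(Z.shape[0])

# Toy design: n = 3 units, n_i = 4 occasions, random intercept and slope.
t = np.arange(4.0)
Z_i = np.column_stack([np.ones(4), t])
D_sigma = np.array([[1.0, 0.0],
                    [0.3, 0.5]])                 # lower-triangular Cholesky factor
Sigma = marginal_covariance([Z_i] * 3, D_sigma, phi=0.25)
```

The resulting matrix is block diagonal, with one $n_i \times n_i$ block per observational unit: observations are correlated within a unit and independent across units.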


1.3 Approximate Likelihood Inference via Laplace Approximation

In this manuscript we will limit the discussion on estimation methods to those based on the likelihood, hence the need to provide a characterization of this function. As in many latent variable problems, the expressions of the marginal PDF/PMF $f_\theta(y_{ij})$ of the observed outcomes, which are at the basis of the likelihood function, are obtained after integration of the random effects from the joint distribution of $[y_{ij}, u_i]$. Using $\varphi$ to denote the density of the multivariate standard normal random vector $u_i$ and with the assumption of independence between observational units, the likelihood for model (1.1) is the result of a product of multivariate integrals:

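$$L(\theta) = \prod_{i=1}^{n} L_i(\theta) = \prod_{i=1}^{n} \int_{\mathbb{R}^q} \left[ \prod_{j=1}^{n_i} f_\theta(y_{ij} \mid u_i) \right] \varphi(u_i)\, du_i. \tag{1.9}$$

Moreover, taking advantage of the expression of the multivariate standard normal density and the structure of the exponential family, defining functions $\ell_i$ as: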

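$$\ell_i(\theta, u_i) := \ell_i(u_i, \theta; \phi) = -\frac{1}{n_i} \left\{ \sum_{j=1}^{n_i} \left[ \frac{y_{ij}\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij}, \phi) \right] - \frac{1}{2} u_i^T u_i \right\}, \tag{1.10}$$

and using the conventions on integral notation, the product (1.9) is equal to either of the following integrals: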

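$$L(\theta) = (2\pi)^{-nq/2} \prod_{i=1}^{n} \int_{\mathbb{R}^q} \exp\left[-n_i \ell_i(\theta, u_i)\right] du_i = (2\pi)^{-nq/2} \int_{\mathbb{R}^{nq}} \exp\left[\ell(\theta, u)\right] du, \tag{1.11}$$

where $\ell(\theta, u) = -\sum_{i=1}^{n} n_i \ell_i(\theta, u_i)$. It is known, however, that aside from the case of some particular models, e.g. the LMM of Section 1.2.2, these integrals have no closed-form expression and therefore need to be approximated numerically.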

One such approximation results from the use of Gaussian Quadrature methods. Rabe-Hesketh et al. [2002], for instance, show that the likelihood contributions can be written as products of univariate integrals by exploiting the independence of standardized random effects. These integrals can then be approximated via weighted sums of the integrand evaluated at quadrature points, the roots of Hermite polynomials, determined by a quadrature rule. Additional accuracy can be obtained using the Adaptive version of Gaussian Quadrature (AGQ), which consists in allocating more weight and more points to the regions with more density. Many studies have shown the good properties of the inference based on these approximations when the number of quadrature points increases, see e.g. Rabe-Hesketh et al. [2002], Rabe-Hesketh et al. [2005], Rabe-Hesketh and Skrondal [2008], Pinheiro and Chao [2012], but the implementations are limited to models with simple random structures because of the excessive amount of computational resources required. Moreover, even very efficient implementations take a long time to return accurate estimates, see e.g. Huber et al. [2004], making them unappealing for repeated fits in the spirit of bootstrap inference.
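For intuition, the (non-adaptive) Gauss-Hermite rule can be applied to a single likelihood contribution $L_i(\theta)$ of a Mixed Logit with one random intercept. The sketch below is a minimal illustration under our own assumptions: the function name `cluster_likelihood`, the data and the parameter values are hypothetical, and the change of variable $u = \sqrt{2}\,x$ maps the standard normal density to the Hermite weight $e^{-x^2}$.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_likelihood(y, eta_fixed, sigma, n_points=20):
    """Gauss-Hermite approximation of L_i for a random-intercept logit model.

    y         : binary outcomes of one observational unit, shape (n_i,)
    eta_fixed : fixed part of the linear predictor x_ij^T beta, shape (n_i,)
    sigma     : standard deviation of the random intercept
    """
    nodes, weights = hermgauss(n_points)
    u = np.sqrt(2.0) * nodes                        # nodes on the N(0, 1) scale
    eta = eta_fixed[:, None] + sigma * u[None, :]   # shape (n_i, n_points)
    # Conditional Bernoulli log-density with canonical link, summed over j.
    loglik = (y[:, None] * eta - np.logaddexp(0.0, eta)).sum(axis=0)
    return float(np.sum(weights * np.exp(loglik)) / np.sqrt(np.pi))

# One unit with five binary outcomes (illustrative values).
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
eta_fixed = np.array([0.2, -0.1, 0.4, 0.0, 0.3])

approx = {k: cluster_likelihood(y, eta_fixed, sigma=1.0, n_points=k)
          for k in (5, 10, 20)}
```

With $q$ random effects per unit the rule requires a grid of `n_points ** q` nodes, which illustrates why quadrature quickly becomes expensive when the fit has to be repeated for every bootstrap replicate.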

Another possibility is to consider a Quasi-Likelihood approach. Following Green [1987], Breslow and Clayton [1993] highlight the fact that the exponents $-n_i \ell_i(\theta, u_i)$ in (1.10) can be written as the sum of the conditional log-likelihood contributions and a penalty term on the Euclidean norm of $u_i$. Hence, they propose replacing the first term in this sum with a characterization of the relationship between the conditional expectation $\mu_{ij}$ and variance $v(\mu_{ij})$ by means of the derivative of the integrated quasi-likelihood function $d_{ij}(y_{ij}, \mu_{ij}) := -2\int_{y_{ij}}^{\mu_{ij}} [y_{ij} - x]/v(x)\,dx$, yielding a Penalized Quasi-Likelihood (PQL) objective function. In its classic implementation, the PQL is optimized sequentially with respect to $u_i$ and $\beta$, yielding values of the linear predictor and the conditional expectation evaluated at the optima, $\hat\eta_{ij}$ and $\hat\mu_{ij}$. On the basis of these quantities, the estimation problem can be written as a LMM $y^w_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i + \epsilon_{ij}$ for a working response $y^w_{ij} = g(\hat\mu_{ij}) + (y_{ij} - \hat\mu_{ij})\,g'(\hat\mu_{ij})$, the first-order expansion of $g(y_{ij})$ around $\hat\mu_{ij}$, with $\epsilon_{ij} \sim N(0, w_{ij})$ and $w_{ij} = v(\hat\mu_{ij})[g'(\hat\mu_{ij})]^2$, which can then be fit with appropriate methods for LMM, known for being less computationally intensive and easier to implement. However, in spite of this convenience, the resulting estimates have been found to present systematic biases, especially for the variance component parameters and in the presence of very discrete outcomes, see e.g. Breslow and Lin [1995], Jang and Lim [2006]. Owing to this inconsistency, we shall not base our proposals on this method.
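To make the working-response construction concrete, the short sketch below computes $y^w_{ij}$ and $w_{ij}$ for a logit link, for which $g(\mu) = \log[\mu/(1-\mu)]$, $g'(\mu) = 1/[\mu(1-\mu)]$ and $v(\mu) = \mu(1-\mu)$. The function name `pql_working_response` and the numerical values are assumptions made for the illustration; the subsequent LMM fit is not shown.

```python
import numpy as np

def pql_working_response(y, mu_hat):
    """Working response and working variances for a logit link
    (illustrative helper, not taken from any package)."""
    g = np.log(mu_hat / (1.0 - mu_hat))        # g(mu_hat)
    g_prime = 1.0 / (mu_hat * (1.0 - mu_hat))  # g'(mu_hat)
    v = mu_hat * (1.0 - mu_hat)                # variance function v(mu_hat)
    y_work = g + (y - mu_hat) * g_prime        # first-order expansion of g(y)
    w = v * g_prime ** 2                       # variance of the working error
    return y_work, w

# Hypothetical outcomes and fitted conditional means.
y = np.array([1.0, 0.0, 1.0])
mu_hat = np.array([0.5, 0.5, 0.8])
y_work, w = pql_working_response(y, mu_hat)
```

For this link the working variance reduces to $1/[\hat\mu_{ij}(1-\hat\mu_{ij})]$, so observations with fitted means close to 0 or 1 receive very large working variances, which gives some intuition for the difficulties reported with very discrete outcomes.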

A widespread alternative consists in applying the Laplace Approximation Method for Integrals (see e.g. De Bruijn 1970, Barndorff-Nielsen and Cox 1989, Ch. 6), which is a popular way of approximating multiple integrals of the form:

$$I(\alpha) = \int s(v)\exp[-n\,t(v,\alpha)]\,dv = \int \exp[-n\,h(v,\alpha)]\,dv \tag{1.11}$$



when $n$ is large and the function $t$ (equivalently: $h$) has a unique minimum $\tilde v$ for a given $\alpha$. In such a situation, the integral can be approximated by the value of the integrand around $\tilde v$, a procedure that can be seen as applying AGQ after choosing only one quadrature point. In order to characterize the approximation of the Likelihood of GLMM with this method, we shall use the notational conventions of Magnus et al. [1995] and Raudenbush et al. [2000]. Specifically, let $\ell_i^{(k)}(\theta,u_i) = \partial_{u_i}^{(k)}\ell_i(\theta,u_i) = \partial\,\mathrm{vec}\,\ell_i^{(k-1)}(\theta,u_i)/\partial u_i^T$ denote the Partial Derivative, or Jacobian, of $k$-th order of the function $\ell_i$ with respect to $u_i$. Under this convention, the Gradient of $\ell_i$ with respect to $u_i$ is the transpose of the first Jacobian, $\nabla_{u_i}\ell_i(\theta,u_i) = [\ell_i^{(1)}(\theta,u_i)]^T$, and a multivariate Taylor Series expansion of $\ell_i$ around a point $u_0$ can be written as follows:

$$\ell_i(\theta,u_i) = \ell_i(\theta,u_0) + \ell_i^{(1)}(\theta,u_0)\,[u_i - u_0] + \frac{1}{2}[u_i - u_0]^T\,\ell_i^{(2)}(\theta,u_0)\,[u_i - u_0] + \sum_{k=3}^{\infty} T_{ik} \tag{1.12}$$

with the following characterization for terms of order $k \geq 3$:

$$T_{ik} = T_{ik}(\theta,u_0) := \frac{1}{k!}\left[\otimes^{k-1}(u_i - u_0)^T\right]\ell_i^{(k)}(\theta,u_0)\,(u_i - u_0), \tag{1.13}$$

where $\otimes^k v = v \otimes v \otimes \cdots \otimes v$ ($k$ times) represents a $k$-fold Kronecker product of a vector $v$. By writing $\tilde u_i$ to designate the minimizer of $\ell_i$, sometimes called the Mode of the joint PMF/PDF for $[y_i^T, u_i^T]^T$, $\tilde u_i := \tilde u_i(\theta) = \arg\max_{u_i}[-n_i\ell_i(\theta,u_i)]$, and carrying out the expansion around this value, the second term of the expansion (1.12) vanishes, while the quadratic form in the third term recalls the exponent in the density of a normal random vector, yielding the following characterization after exponentiation of the series and subsequent integration:

$$L_i(\theta) = (2\pi)^{q/2}\,|V_i(\theta)|^{1/2}\exp\left[-n_i\tilde\ell_i(\theta)\right]\exp\left[\varepsilon_i(\theta)\right] \tag{1.14}$$


where $\tilde\ell_i^{(k)}(\theta) := \ell_i^{(k)}(\theta,\tilde u_i;\phi)$, $V_i(\theta) := [n_i\tilde\ell_i^{(2)}(\theta)]^{-1}$, $R_i := -n_i\sum_{k=3}^{\infty}\tilde T_{ik}$ for $\tilde T_{ik} = T_{ik}(\theta,\tilde u_i)$, and $\varepsilon_i(\theta) := \log E[\exp(R_i)]$ with the expectation taken over the density of a $N[\tilde u_i, V_i(\theta)]$ random vector. With these considerations, the Laplace-approximated contributions can be formulated as follows:

$$\log L_i(\theta) = \frac{q}{2}\log(2\pi) + \frac{1}{2}\log|V_i(\theta)| - n_i\tilde\ell_i(\theta) + \varepsilon_i(\theta), \tag{1.15}$$

which, after neglecting the approximation error in the contributions, yields the Laplace-approximated log-Likelihood (LALL):

$$\log\tilde L(\theta) = \frac{nq}{2}\log(2\pi) + \frac{1}{2}\sum_{i=1}^{n}\log|V_i(\theta)| + \tilde\ell(\theta), \tag{1.16}$$

where $\tilde\ell(\theta) = -\sum_{i=1}^{n} n_i\tilde\ell_i(\theta)$. Customarily, this approximation is then optimized with respect to $\theta$, in the spirit of the Maximum Likelihood (ML) approach, to obtain what we shall call Laplace-approximated Maximum Likelihood Estimators (LAMLE) of the model parameters. Hence, this strategy entails the following two-step procedure:

• Step 1: Optimization of $\ell_i(\theta,u_i)$ with respect to $u_i$, with $\theta$ fixed at $\hat\theta$, to obtain the modes $\tilde u_i(\theta)$.

• Step 2: Optimization of $\log\tilde L(\theta)$ to update the values of the estimates,

which, in the implementations, can be performed as two separate routines to reduce estimation times. Moreover, it is of course possible to improve the approximation (and therefore inference) by taking into account higher-order terms in the Taylor expansion of $\ell_i(\theta,u_i)$ in equation (1.12), yielding a variety of higher-order approximations, see e.g. Lindley [1980], Liu and Pierce [1993], Raudenbush et al. [2000], yet most modern implementations rely on the first-order approximation for computational simplicity.
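The inner step of this procedure can be sketched for a single cluster as follows. The example assumes a logit model with one standardized random intercept ($q = 1$), uses a plain Newton iteration to locate the mode, and then evaluates the contribution (1.15) with the error term dropped; all function names and numerical values are hypothetical. A brute-force trapezoidal integration is included only as a reference value, and the outer step, which would pass the sum of such contributions to a generic optimizer over $\theta$, is not shown.

```python
import numpy as np

def neg_joint_loglik(u, y, eta_fixed, sigma):
    """n_i * l_i(theta, u): minus the joint log-density of (y_i, u_i)
    for an assumed random-intercept logit model with u ~ N(0, 1)."""
    eta = eta_fixed + sigma * u
    cond = np.sum(y * eta - np.logaddexp(0.0, eta))
    return -cond + 0.5 * u ** 2 + 0.5 * np.log(2.0 * np.pi)

def derivatives(u, y, eta_fixed, sigma):
    """First and second derivatives of neg_joint_loglik with respect to u."""
    p = 1.0 / (1.0 + np.exp(-(eta_fixed + sigma * u)))
    grad = -sigma * np.sum(y - p) + u
    hess = sigma ** 2 * np.sum(p * (1.0 - p)) + 1.0   # always >= 1
    return grad, hess

def laplace_contribution(y, eta_fixed, sigma, max_iter=100, tol=1e-12):
    """Step 1 (Newton search for the mode), then the Laplace-approximated
    log-likelihood contribution of the cluster for q = 1."""
    u = 0.0
    for _ in range(max_iter):
        grad, hess = derivatives(u, y, eta_fixed, sigma)
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    _, hess = derivatives(u, y, eta_fixed, sigma)     # V_i(theta) = 1 / hess
    log_li = (0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(hess)
              - neg_joint_loglik(u, y, eta_fixed, sigma))
    return u, log_li

# Hypothetical cluster of eight binary outcomes.
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0])
eta_fixed = np.linspace(-0.5, 0.5, 8)
u_mode, log_li = laplace_contribution(y, eta_fixed, sigma=1.0)

# Reference value: trapezoidal rule on a fine grid over u.
grid = np.linspace(-10.0, 10.0, 4001)
vals = np.array([np.exp(-neg_joint_loglik(u, y, eta_fixed, 1.0)) for u in grid])
log_ref = np.log(np.sum(0.5 * (vals[1:] + vals[:-1])) * (grid[1] - grid[0]))
```

The gap between `log_li` and `log_ref` plays the role of the neglected error term $\varepsilon_i(\theta)$ for this toy cluster.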

It is important to point out that the asymptotics of the procedure have long been studied both theoretically, see e.g. Shun and McCullagh [1995], and via simulations, see e.g. Joe [2008]. For instance, Douc et al. [2004] show that the estimates obtained on the grounds of an approximate log-likelihood such as (1.16) will have the correct asymptotic distribution as long as the approximation error $\varepsilon_n(\theta) := \sum_{i=1}^{n}\varepsilon_i(\theta)$ converges to zero in probability. Other works, such as a recent preprint by Ogden [2016], show that this inference can have first-order accuracy provided that conditions on the absolute error of the Score functions hold; in the case of the LALL, this error is given by the uniform norm of the gradient of the approximation error, i.e. $\delta_n(\theta) := \sup_{\theta\in\Theta}\|\nabla_\theta\varepsilon_n(\theta)\|$. More specifically, they show that when $\delta_n(\theta) = o_p(r_n)$, among other conditions, the distribution of the LAMLE converges to that of the MLE at a rate $r_n^{1/2}$ which will depend on the conditional distribution of the outcome [Ogden, 2016, Theorem 2].

The only case where the likelihood (1.9) has a closed-form expression is when the outcome is Gaussian, i.e. in the context of LMM. To see this, rewrite model (1.5) as $y_i = X_i\beta + \varepsilon_i$, where the $\varepsilon_i = Z_iD_\sigma u_i + \epsilon_i$ are drawn independently from a $N_{n_i}(0,\Sigma_i)$ distribution with $\Sigma_i = \phi I_{n_i} + Z_i\Delta_\sigma Z_i^T$ because of multivariate normality and independence between the vectors $\epsilon_i$ and $u_i$. On the basis of this consideration, the literature of LMM proposes two competing likelihood-based estimation methods, namely the Maximum Likelihood (ML) and the Residual or Restricted Maximum Likelihood (REML) approaches. While the ML estimates can be obtained directly by optimizing the closed-form likelihood with respect to the model parameters using a gradient-based algorithm, it is possible to obtain the same estimates with the LALL, since the Laplace approximation is exact in the context of Gaussian response. To illustrate this point, let us define $\rho^2$ from the terms in the exponential of the integrand in equation (1.11), as in:

$$\sum_{i=1}^{n}\left\{\|y_i - X_i\beta - Z_iD_\sigma u_i\|_2^2 + \phi\,\|u_i\|_2^2\right\} = \rho^2(u,\beta;\phi,\sigma). \tag{1.17}$$

It is straightforward to see that the optimization with respect to $u$ and $\beta$, in the spirit of the Laplace Method, implies optimization of $\rho^2$, an operation that is, in the words of Bates [2010], a Penalized Least Squares problem yielding Henderson's Estimating Equations [Henderson, 1950] for fixed $\sigma$ and $\phi$. This procedure is at the core of some implementations such as the R packages nlme [Pinheiro and Bates, 2009] and lme4 [Bates et al., 2015].
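The exactness of the Laplace approximation for Gaussian responses can be checked numerically. The sketch below assumes a random-intercept LMM for one cluster, $y_i = X_i\beta + \sigma u_i 1_{n_i} + \epsilon_i$ with $u_i \sim N(0,1)$ and $\epsilon_i \sim N(0,\phi I_{n_i})$, and compares the Laplace-approximated log-likelihood, built from the penalized least squares criterion, with the closed-form multivariate normal log-density. Function names, data and parameter values are hypothetical.

```python
import numpy as np

def laplace_loglik_gaussian(y, X, beta, sigma, phi):
    """Laplace-approximated log-likelihood of one cluster in an assumed
    random-intercept LMM (q = 1)."""
    r = y - X @ beta
    n_i = r.size
    hess = n_i * sigma ** 2 / phi + 1.0          # second derivative in u
    u_mode = (sigma * r.sum() / phi) / hess      # penalized least squares solution
    rho2 = np.sum((r - sigma * u_mode) ** 2) + phi * u_mode ** 2
    neg_joint = (0.5 * rho2 / phi + 0.5 * n_i * np.log(2.0 * np.pi * phi)
                 + 0.5 * np.log(2.0 * np.pi))    # minus joint log-density at the mode
    return 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(hess) - neg_joint

def exact_loglik_gaussian(y, X, beta, sigma, phi):
    """Closed-form log-density of N(X beta, phi I + sigma^2 1 1')."""
    r = y - X @ beta
    n_i = r.size
    Sigma = phi * np.eye(n_i) + sigma ** 2 * np.ones((n_i, n_i))
    _, logdet = np.linalg.slogdet(Sigma)
    quad = r @ np.linalg.solve(Sigma, r)
    return -0.5 * (n_i * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical cluster of six observations.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
beta = np.array([1.0, -0.5])
y = X @ beta + 0.7 * rng.normal() + rng.normal(size=6)
gap = abs(laplace_loglik_gaussian(y, X, beta, 0.7, 1.3)
          - exact_loglik_gaussian(y, X, beta, 0.7, 1.3))
```

Since the joint log-density is quadratic in $u_i$, all terms of order $k \geq 3$ in (1.12) vanish and `gap` is zero up to floating-point error.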

1.4 Bootstrapping Mixed Models

In what follows, we review some of the bootstrapping procedures for Mixed Models, categorized by the analogies one can make to the schemes available in the LM. First, we focus on proposals having links to the Residual Bootstrap for LM, yet accounting for the random structure that is particular to LMM. A second set of schemes relying on simulation, and therefore mimicking the Wild and Parametric procedures, is then described, while a third batch consisting in resampling data clusters, an analogue to the Pairs procedure, is studied in the last subsection.

1.4.1 Bootstraps based on Random Effect Predictions

If the random structure of a LMM can be decomposed as in equation (1.6), one can consider resampling random errors and Predictions of the random effects to create the bootstrap samples. This procedure, often named Random Effect Bootstrap (REB), see e.g. Davison and Hinkley [1997, Chapter 3.8], requires an initial fit of the model providing parameter estimates $\hat\beta$ as well as Empirical Predictions (EP) for the Random Effects $\hat u_r$ and the Residuals $\hat\epsilon = y - X\hat\beta - \sum_{r=1}^{q}\hat\sigma_r Z_r\hat u_r$. Then, one can construct bootstrap samples $u_r^*$ and $\epsilon^*$ via simple random sampling with replacement from the vectors of EP, yielding bootstrap samples of the outcome, as in:

$$y^* = X\hat\beta + \sum_{r=1}^{q}\hat\sigma_r Z_r u_r^* + \epsilon^*. \tag{1.18}$$

In practice, it is useful to consider a supplementary step, since predictions are shrunk towards 0, implying that they are less spread than the random effects [Robinson, 1991]. To illustrate this point, and following Morris [2002], let us consider the very simple example of a model with one random effect for the intercept. In this case, the expression of the Best Linear Unbiased Predictor (BLUP) is given by $\hat u_i = \frac{n_i\sigma^2}{\phi + n_i\sigma^2}(\bar y_i - \bar x_i^T\beta)$, where $\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, with a variance $\mathrm{Var}(\hat u_i) = \frac{n_i\sigma^2}{\phi + n_i\sigma^2}\left(1 - \frac{1}{n}\right)\sigma^2$, showing two shrinkage factors with respect to $\sigma^2$, the variance of the RE. To counter this effect, one can consider reflating the vectors of RE by dividing them by an empirical estimate of their dispersion, e.g. $v_r = \hat u_r^T\hat u_r/q_r$, thus ensuring that the resulting predictions have a standard variance, a suggestion by Carpenter et al. [2003]. The resulting reflated predictions $\tilde u_r = \hat u_r/\sqrt{v_r}$ are subsequently resampled to create the bootstrap samples.
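A minimal sketch of one REB replicate with the reflation step is given below, for a model with a single vector of random effects (one term in the sum of (1.18)). The predictions and residuals are placeholder numbers standing in for the output of an initial fit, and the function names `reflate` and `reb_sample` are hypothetical.

```python
import numpy as np

def reflate(u_hat):
    """Rescale predictions so that their empirical second moment is 1."""
    v = u_hat @ u_hat / u_hat.size           # v_r = u_hat' u_hat / q_r
    return u_hat / np.sqrt(v)

def reb_sample(X, Z, beta_hat, sigma_hat, u_hat, eps_hat, rng):
    """One Random Effect Bootstrap replicate of the outcome vector."""
    u_tilde = reflate(u_hat)
    # Simple random sampling with replacement from predictions and residuals.
    u_star = rng.choice(u_tilde, size=u_tilde.size, replace=True)
    eps_star = rng.choice(eps_hat, size=eps_hat.size, replace=True)
    return X @ beta_hat + sigma_hat * (Z @ u_star) + eps_star

# Hypothetical design: 4 clusters of 3 observations, random intercepts.
rng = np.random.default_rng(1)
n_clusters, n_i = 4, 3
N = n_clusters * n_i
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = np.kron(np.eye(n_clusters), np.ones((n_i, 1)))
beta_hat, sigma_hat = np.array([1.0, 0.5]), 0.8
u_hat = np.array([0.3, -0.5, 0.1, 0.2])        # placeholder predictions
eps_hat = rng.normal(scale=0.5, size=N)        # placeholder residuals
y_star = reb_sample(X, Z, beta_hat, sigma_hat, u_hat, eps_hat, rng)
```

Repeating the call to `reb_sample` and refitting the model on each `y_star` would produce the bootstrap distribution of the estimates.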


Another scheme that generalizes the RB is the Transformation Bootstrap. Write model (1.8) as $y = X\beta + \Sigma^{1/2}\delta$, where $\delta$ is drawn from a multivariate standard normal distribution and $\Sigma = Z\Delta Z^T + \phi I_N$. In this setting, one can define predictions of the standardized random components of the model, $\hat\delta = \hat\Sigma^{-1/2}(y - X\hat\beta)$, and resample with replacement from this vector to construct bootstrap samples of the residuals $\delta^*$ and consequently $y^*$ via the relationship:

$$y^* = X\hat\beta + \Sigma_{\hat\theta}^{1/2}\,\delta^*. \tag{1.19}$$

This procedure can be improved by the same Reflation step as for the REB to avoid the effects of shrinkage.
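The Transformation Bootstrap can be sketched as follows. The text does not specify which matrix square root is used; the example assumes the symmetric square root obtained from an eigendecomposition (a Cholesky factor would be another option). The estimates, the covariance structure and the function names are hypothetical, and the Reflation step is omitted.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root and inverse square root of an SPD matrix."""
    vals, vecs = np.linalg.eigh(S)
    root = (vecs * np.sqrt(vals)) @ vecs.T
    inv_root = (vecs / np.sqrt(vals)) @ vecs.T
    return root, inv_root

def transformation_bootstrap(y, X, beta_hat, Sigma_hat, rng):
    """One replicate of y from resampled standardized residuals."""
    root, inv_root = sym_sqrt(Sigma_hat)
    delta_hat = inv_root @ (y - X @ beta_hat)          # standardized residuals
    delta_star = rng.choice(delta_hat, size=delta_hat.size, replace=True)
    return X @ beta_hat + root @ delta_star, delta_hat

# Hypothetical example: 2 clusters of 3 observations, random intercepts.
rng = np.random.default_rng(2)
N = 6
X = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
Z = np.kron(np.eye(2), np.ones((3, 1)))
Sigma_hat = 0.6 * Z @ Z.T + 1.0 * np.eye(N)   # Z Delta Z' + phi I, Delta = 0.6 I
beta_hat = np.array([0.5, 0.2])
y = rng.normal(size=N)
y_star, delta_hat = transformation_bootstrap(y, X, beta_hat, Sigma_hat, rng)
```

By construction, plugging the unresampled `delta_hat` back into (1.19) returns the original `y`, which is a convenient sanity check of the two square roots.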

Outside of the framework of LMM, it is not obvious how to define a residual/prediction-based bootstrap. On the one hand, there is a problem of choice of the prediction method for the random effects: either Conditional Modes (CM) or Empirical Best Predictors (EBP). On the other hand, predictions of the random effects tend to fail to reproduce the normal distribution, since they depend on raw residuals that will present discreteness for some types of outcomes (e.g. Mixed Logit). For the sake of comparison, we could define the following bootstrap procedure following a suggestion found in Shao and Tu [2012].

Consider Pearson Residuals $\hat\epsilon_{ij} = [y_{ij} - \hat\mu_{ij}]/\sqrt{\hat\phi\,v(\hat\mu_{ij})}$, centered around an empirical mean for the whole sample and concatenated into a vector $\hat e^P$, and a prediction of the random effects (either CM or EBP) $\hat u_i$, subsequently concatenated in a vector $\hat u$. After resampling with replacement over both vectors, creating $\eta_{ij}^* = x_{ij}^T\hat\beta + z_{ij}^T D_{\hat\sigma} u_i^*$ and $\mu_{ij}^* = g^{-1}(\eta_{ij}^*)$, bootstrap samples of the outcome can be constructed as follows:

$$y_{ij}^* = \mu_{ij}^* + \left[\hat\phi\,v(\mu_{ij}^*)\right]^{1/2} e_{ij}^*. \tag{1.20}$$
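For comparison purposes, a replicate from this Pearson-residual scheme could be generated along the following lines. The sketch assumes a Mixed Logit model with random intercepts, so that $v(\mu) = \mu(1-\mu)$ and $\hat\phi = 1$; the fitted quantities are placeholders and the function name `pearson_bootstrap` is hypothetical.

```python
import numpy as np

def pearson_bootstrap(y, X, Z, beta_hat, sigma_hat, u_hat, mu_hat, rng,
                      phi_hat=1.0):
    """One replicate following (1.20) for an assumed random-intercept logit."""
    def v(m):
        return m * (1.0 - m)                  # Bernoulli variance function
    e = (y - mu_hat) / np.sqrt(phi_hat * v(mu_hat))   # Pearson residuals
    e = e - e.mean()                                  # centering
    u_star = rng.choice(u_hat, size=u_hat.size, replace=True)
    e_star = rng.choice(e, size=e.size, replace=True)
    eta_star = X @ beta_hat + sigma_hat * (Z @ u_star)
    mu_star = 1.0 / (1.0 + np.exp(-eta_star))         # inverse logit link
    return mu_star + np.sqrt(phi_hat * v(mu_star)) * e_star

# Hypothetical data: 3 clusters of 4 binary outcomes.
rng = np.random.default_rng(4)
N = 12
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = np.kron(np.eye(3), np.ones((4, 1)))
beta_hat, sigma_hat = np.array([0.2, 0.5]), 0.9
u_hat = np.array([0.4, -0.6, 0.1])                    # placeholder predictions
mu_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat + sigma_hat * (Z @ u_hat))))
y = rng.binomial(1, mu_hat).astype(float)
y_star = pearson_bootstrap(y, X, Z, beta_hat, sigma_hat, u_hat, mu_hat, rng)
```

Note that the replicates `y_star` produced by (1.20) are in general not binary, which illustrates why residual-based schemes are delicate for discrete outcomes.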

1.4.2 Bootstraps based on Distributional Assumptions

If there is enough confidence in the assumptions of the data generating process, one can consider generating replicates of random effects and errors using the estimates of the model parameters via simulation, in the spirit of the Parametric Bootstrap for LM.

In LMM, bootstrap samples for random effects (and sampling errors) are generated via simulation using the estimated model parameters, i.e. $u_i^*$ and $\epsilon_i^*$ are drawn from $N_q(0,I_q)$ and $N_{n_i}(0,\hat\phi I_{n_i})$ respectively, allowing the construction of the following bootstrap samples for the outcome vectors:

$$y_i^* = X_i\hat\beta + Z_iD_{\hat\sigma}u_i^* + \epsilon_i^*. \tag{1.21}$$

This intuitive scheme has been used in many contexts with fair results, see e.g. Butar and Lahiri [2003], Lahiri et al. [2003], González-Manteiga et al. [2007] for an illustration of its use in the estimation of the uncertainty of predictions of random effects.
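The construction in (1.21) translates directly into a few lines of code. In the sketch below, $D_{\hat\sigma}$ is assumed to be a diagonal matrix of estimated standard deviations, and the design matrices, estimates and function name are hypothetical.

```python
import numpy as np

def parametric_bootstrap_lmm(X_i, Z_i, beta_hat, D_sigma_hat, phi_hat, rng):
    """One parametric bootstrap replicate of the outcome vector of a cluster."""
    q = Z_i.shape[1]
    n_i = X_i.shape[0]
    u_star = rng.standard_normal(q)                          # u* ~ N_q(0, I_q)
    eps_star = np.sqrt(phi_hat) * rng.standard_normal(n_i)   # eps* ~ N(0, phi I)
    return X_i @ beta_hat + Z_i @ D_sigma_hat @ u_star + eps_star

# Hypothetical cluster with random intercept and slope.
rng = np.random.default_rng(3)
X_i = np.column_stack([np.ones(5), np.linspace(0.0, 1.0, 5)])
Z_i = X_i.copy()
beta_hat = np.array([1.0, 2.0])
D_sigma_hat = np.diag([0.5, 0.3])
y_star = parametric_bootstrap_lmm(X_i, Z_i, beta_hat, D_sigma_hat, 0.4, rng)
```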

The extension to a non-Gaussian context is straightforward, since it is apparent that the link between estimates $\hat\theta$, predictions $\hat u_i$ and the conditional CDF of the outcome is mediated through the fitted linear predictor $\hat\eta_{ij}$, and thus the fitted mean $\hat\mu_{ij}$ via the link function $g$. It follows that bootstrap samples of the outcomes $y_{ij}^*$ can be drawn by simulation from the conditional distribution $F_{\hat\mu_{ij}^*}$, where $g(\hat\mu_{ij}^*) = x_{ij}^T\hat\beta + z_{ij}^T D_{\hat\sigma}u_i^*$ with replicates $u_i^*$ drawn from a $N(0,I_q)$. Variations of the method, such as a Wild Bootstrap procedure which does not require the multivariate normality of $u_i$, can also be considered, see e.g. González-Manteiga et al. [2008].
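For a Mixed Logit model, this simulation scheme could look as follows: the conditional distribution is Bernoulli and $g$ is the logit link. As before, $D_{\hat\sigma}$ is assumed diagonal, and all names and values are hypothetical.

```python
import numpy as np

def parametric_bootstrap_logit(X_i, Z_i, beta_hat, D_sigma_hat, rng):
    """One parametric bootstrap replicate of the binary outcomes of a cluster."""
    u_star = rng.standard_normal(Z_i.shape[1])        # u* ~ N(0, I_q)
    eta_star = X_i @ beta_hat + Z_i @ D_sigma_hat @ u_star
    mu_star = 1.0 / (1.0 + np.exp(-eta_star))         # inverse logit link
    return rng.binomial(1, mu_star)                   # y* ~ Bernoulli(mu*)

# Hypothetical cluster with a random intercept only.
rng = np.random.default_rng(5)
X_i = np.column_stack([np.ones(5), np.linspace(-1.0, 1.0, 5)])
Z_i = np.ones((5, 1))
beta_hat = np.array([0.3, 1.2])
D_sigma_hat = np.array([[0.7]])
y_star = parametric_bootstrap_logit(X_i, Z_i, beta_hat, D_sigma_hat, rng)
```

Other members of the exponential family would only require replacing the inverse link and the sampling call by their counterparts.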



