On the Inference of Random Effects in Generalized Linear Mixed Models


FLORES AGREDA, Daniel Antonio

Abstract

In the first chapter, the problem of Bootstrap inference for the parameters of a Generalized Linear Mixed Model (GLMM) is addressed. We formulate a bootstrapping strategy consisting of the random weighting of the contributions to the Joint Likelihood of Outcomes and Random Effects. Applying the Laplace Approximation method for integrals to this function yields a Random Weighted Log-Likelihood that produces the desired bootstrap replicates after optimization. In order to assess the properties of this procedure, which we name Random Weighted Likelihood Bootstrap (RWLB), we analytically compare its resulting Estimating Equations (EE) to those of the Generalized Cluster Bootstrap for Gaussian Linear Mixed Models (LMM), and we conduct simulation studies in both an LMM and a Mixed Logit regression context. The second chapter explores adaptations of the RWLB to the estimation of the uncertainty in the prediction of random effects in a GLMM, as measured by the Mean Squared Error for the Predictors (MSEP).

FLORES AGREDA, Daniel Antonio. On the Inference of Random Effects in Generalized Linear Mixed Models. Doctoral thesis: Univ. Genève, 2017, no. GSEM 44

DOI : 10.13097/archive-ouverte/unige:102003 URN : urn:nbn:ch:unige-1020037

Available at:

http://archive-ouverte.unige.ch/unige:102003


On the Inference of Random Effects in Generalized Linear Mixed Models

by

Daniel Antonio Flores Agreda

A thesis submitted to the

Geneva School of Economics and Management, University of Geneva, Switzerland,

in fulfillment of the requirements for the degree of PhD in Statistics

Members of the thesis committee:

Prof. Stefan Sperlich, Chair, University of Geneva

Prof. Eva Cantoni, Adviser, University of Geneva

Prof. Alan Welsh, Jury, College of Physical and Mathematical Sciences, Australian National University

Thesis No. 44 July 2017


Acknowledgements

This dissertation would never have been accomplished without the outstanding guidance of Professor Eva Cantoni. In her role as instructor throughout my undergraduate and graduate studies, she ignited in me, through her lectures, a deep interest in the field of Statistics. Later, during the preparation of this PhD Thesis, she consistently contributed countless profound ideas, insightful comments and productive intuitions. Aside from these professional qualities, she has proven to be a greatly empathetic colleague and a remarkably humorous friend.

Secondly, I would like to thank Prof. Sperlich and Prof. Alan H. Welsh for their helpful criticisms and inquisitive remarks. The points they raised during the last stages of the writing of this dissertation have, in my view, provided a much-needed perspective for this project to achieve the required degree of accomplishment.

Throughout these years all of my colleagues at the GSEM have provided me with various types of help, be it through our extensive academic discussions or our (even lengthier) moments of laughter. Among them, I would especially like to mention my good friends Ingrid, Marco, Mauro, Rose, Marie-Claire, Elise, Anja, Mattia, Sam, Setareh and Haotian.

Finally, I must acknowledge my family in my home country, who, even from a distance, have provided me with solid support and understanding throughout the years.

To all of you, all my gratitude.


Abstract

This dissertation aims to provide a contribution to the literature on the inference in Generalized Linear Mixed Models (GLMM) by means of Bootstrapping techniques.

In the first chapter, the problem of Bootstrap inference for the parameters of a GLMM is addressed. Inspired by procedures based on the random weighting of the Estimating Equations (EE), see e.g. Chatterjee and Bose [2005], and, in particular, by the Generalized Cluster Bootstrap (GCB) for Linear Mixed Models (LMM) [Field et al., 2010], we formulate a bootstrapping strategy consisting of the random weighting of the contributions to the Joint Likelihood of Outcomes and Random Effects. Using the Laplace Approximation method for integrals, see e.g. De Bruijn [1970], to 'integrate out' the random effects from the resulting weighted joint likelihood yields a Random Weighted log-Likelihood function that produces the desired bootstrap replicates after optimization. In order to assess the validity of the procedure, which we name Random Weighted Likelihood Bootstrap (RWLB), we compare its resulting EE to the expressions used in the GCB for a Random Effect LMM, showing that both schemes agree up to the weighting of a term in the EE of the Variance Components. Simulation studies conducted in an LMM with random slope and in a Mixed Logit regression with one random effect show that the distribution of the bootstrap replicates provides a fairly good approximation of the asymptotic distribution of the Model Parameters.

The second chapter addresses the estimation of the Mean Squared Error for the Predictors (MSEP) of Random Effects in GLMM by means of non-parametric bootstrap methods. In the frequentist paradigm, the MSEP is used as a measure of the uncertainty in prediction and has been shown to be affected by the estimation of the model parameters. In the particular case of linear mixed models (LMM), two solutions are available to practitioners: on the one hand, second-order correct approximations that yield estimators of this quantity and, on the other hand, several Parametric Bootstrap algorithms. We propose a non-parametric bootstrap scheme, consisting of an adaptation of the RWLB, that can be used in the entire class of GLMM. In a first stage, the RWLB is used to generate bootstrap replicates of the parameters while, in a second stage, simulation is used to generate bootstrap samples of standardized Random Effects. We conduct a first simulation study in the framework of Gaussian LMM to contrast the quality of our approach with respect to: (i) analytical estimators of MSEP based on approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes, on the grounds of their relative bias, relative efficiency and the coverage ratios of the resulting prediction intervals. A second simulation study serves to illustrate the use and benefits of our proposal against other feasible alternatives in a purely Non-Gaussian GLMM setting.


Résumé

This thesis is a contribution to the literature on statistical inference in Generalized Linear Mixed Models (GLMM) by means of Bootstrap techniques.

In the first chapter, we address the problem of Bootstrap inference for the parameters of a GLMM. Drawing inspiration from procedures based on a random weighting of the Estimating Equations (EE) (Chatterjee and Bose [2005]), and more specifically from the method called Generalized Cluster Bootstrap (GCB, Field et al. [2010]) for Linear Mixed Models (LMM), we propose a Bootstrap strategy that consists in randomly weighting the individual contributions to the joint likelihood. Using the Laplace method for approximating integrals (De Bruijn [1970]) to integrate out the random effects, the weighted joint likelihood function induces a Random Weighted Log-Likelihood function that produces the desired Bootstrap replicates after optimization. In order to highlight the properties of this procedure, which we call Random Weighted Likelihood Bootstrap (RWLB), in the framework of an LMM, we compare the resulting EE expressions with those of the GCB. In particular, we show that the two procedures coincide up to one term in the equation concerning the parameters associated with the Variance Components. Simulation studies carried out on an LMM with a random slope, as well as on a mixed logistic regression with one random effect, show that the bootstrap distribution provides a good approximation of the asymptotic distribution of the model parameters.

The second chapter deals with the estimation of the Mean Squared Error of the Predictors (MSEP) of the Random Effects in a GLMM. In the frequentist paradigm, the MSEP is used as a measure of the uncertainty of predictions, which is, moreover, affected by the estimation of the model parameters. In the particular case of LMM, two solutions are available to practitioners: on the one hand, estimators resulting from second-order approximations and, on the other hand, parametric Bootstrap algorithms. We propose a new bootstrap scheme based on an adaptation of the RWLB, which can therefore be applied to the entire class of GLMM. This algorithm relies on the RWLB to generate bootstrap replicates of the estimators of the model parameters, and on a simulation procedure to produce bootstrap samples of standardized Random Effects. We conduct a simulation study in the LMM framework in order to compare the quality of our proposal with: (i) MSEP estimators based on analytical approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes.

We evaluate the relative bias, the relative efficiency and the coverage rate of the resulting prediction intervals. An additional simulation helps us illustrate the use and benefits of our proposal compared with other feasible alternatives in the context of a Non-Normal GLMM.


Introduction

Over the years, Linear Models (LM) have proven to be powerful and versatile tools in model-based research. Initially limited to a Gaussian response with a single random error term, they were first extended to include various sources of variation by adding random effects to their classic specification, yielding the class of Linear Mixed Models, see e.g. Henderson [1950].

Similarly, Generalized Linear Models (GLM) [Nelder and Wedderburn, 1972], initially developed to model qualitative and quantitative variables independently drawn from exponential families, were also extended in various ways to account for repeated observations. One such proposal, the framework of Generalized Estimating Equations (GEE) [Liang and Zeger, 1986], can be seen as an extension of quasi-likelihood inference in GLM, in which the variance of the response is replaced by a matrix that depends on a working correlation matrix accounting for the covariance structure among the elements within an observational unit. While simple to implement and intuitive to use, GEE offer limited inferential procedures, because no likelihood function is specified. Moreover, the additional degree of subjectivity introduced by the choice of the working correlations limits their application.

The extension of GLM that interests us in this dissertation is the class of Generalized Linear Mixed Models (GLMM), which generalizes the mixed-effects approach to non-Gaussian responses and hence differs from the basic GLM in the addition of unobservable random effects to the linear predictor. Contrary to the GEE approach, the Marginal Likelihood of a GLMM can be defined by integrating the joint density of outcomes and random effects, albeit with an added complication: these integrals often lack a closed-form expression and therefore need to be approximated.

Like GLM, Bootstrapping techniques [Efron, 1979] have become essential elements of the toolbox of any statistician. Through an algorithm that is both intuitive and easy to implement, researchers and practitioners alike have been given a simple method to draw inference on complex model features. In the case of Linear Models, typical implementations of this procedure rely on resampling of fitted residuals, resampling of data pairs, or simulation to construct bootstrap samples of the outcomes, from which the bootstrap replicates of the parameters needed to infer the features of their sampling distribution are computed. To the best of our knowledge, however, extensions of the method to the GLMM framework have been scarce and rely, for the most part, on simulation. This is partly due to the complexity of the random structure, which reduces the exchangeability of observations required by both the pairs and the residuals approaches, and partly to the additional obstacle posed by the likelihood approximation methods, which prevents straightforward extensions of these approaches.


This dissertation aims to contribute to the discussion on inference in GLMM by means of Bootstrapping techniques. We focus our attention on two subjects, namely the Bootstrap applied to GLMM for inference on the parameter estimates, and its application to the estimation of the uncertainty of Random Effect predictions. Accordingly, the manuscript comprises two self-contained chapters, each treating one of these subjects.

Chapter 1 addresses the problem of bootstrap-based inference in GLMM. After a review of the bootstrap schemes currently available in the literature, and inspired in part by procedures based on the random weighting of the Estimating Equations, see e.g. Chatterjee and Bose [2005], Field et al. [2010], we formulate a strategy based on (i) the random weighting of the contributions to the Joint Likelihood of outcomes and Random Effects and (ii) the Laplace method for the approximation of integrals.

Chapter 2 deals with the measurement of the uncertainty in predictions of random effects via Bootstrap methods. The chapter reviews the literature on the measurement of this uncertainty, which is customarily carried out via approximations to both Frequentist measures, such as the Mean Squared Error of Prediction, and measures inherited from the Bayesian paradigm, such as the Conditional Variances. In this work, we propose an implementation of the method developed in the previous chapter as an alternative way of computing these measures.


Chapter 1 Bootstrapping GLMM via a Random Weighted Laplace Approximation

1.1 Introduction

Ever since the introduction of the Bootstrap in the classic paper by Efron [1979], applications of the method to a wide array of problems have been developed and refined at an increasingly accelerated pace. Among its many uses, one can mention applications to Regression Models, where the Bootstrap helps researchers perform inference on the sampling distribution of the model parameters by means of bootstrap estimates of their mean, bias, variance or density. For this purpose, a great number of bootstrap schemes and strategies have been developed over time.

Consider for example the Gaussian Linear Model (LM): $y_i = x_i^T\beta + \epsilon_i$, $i \in \{1,\dots,n\}$, where $x_i$ ($p\times 1$) are covariate vectors and the $\epsilon_i$ are independent draws from a $N(0,\sigma^2)$ distribution. The first and most widespread bootstrap strategy, known as the Residual Bootstrap [Efron, 1979], consists in creating replicates of the random errors by resampling with replacement from the vector of fitted residuals $\hat\epsilon = [\hat\epsilon_i]_{i=1}^{n}$, $\hat\epsilon_i = y_i - x_i^T\hat\beta$, and building bootstrap samples through the assumed data-generating mechanism with an estimate $\hat\beta$ of the parameters, i.e. $y_i^* = x_i^T\hat\beta + \epsilon_i^*$. A second common approach, known as the External or Wild Bootstrap [Wu, 1986], creates the bootstrap samples via simulation of standardized residuals $u_i^*$ from a distribution with $E[u_i^*] = 0$ and $\mathrm{Var}[u_i^*] = 1$, thereby constructing $\epsilon_i^* = \hat\sigma u_i^*$, and then proceeds as in the Residual Bootstrap. When the distribution of $u_i^*$ is standard Normal, thus coinciding with the assumed data-generating process, this procedure is known as the Parametric Bootstrap. The third scheme often proposed in the literature is the Pairs Bootstrap, which consists in resampling with replacement the outcomes and covariates per observational unit, or pairs of data $[y_i, x_i^T]^T$, to create samples $[y_i^*, x_i^{*T}]^T$, and yields bootstrap replicates of the parameters by optimizing a cost function such as the likelihood or the residual sum of squares.
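A minimal Python/NumPy sketch of the three schemes for the Gaussian LM above may help fix ideas. The simulated data, the number of replicates $B$ and the Rademacher alternative for $u_i^*$ are illustrative assumptions, and $\hat\sigma$ is taken to be the usual residual standard deviation with $n-p$ degrees of freedom.

```python
import numpy as np


def ols(X, y):
    """Least-squares estimate of beta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def residual_bootstrap(X, y, B, rng):
    """Resample fitted residuals with replacement: y* = X beta_hat + eps*."""
    beta_hat = ols(X, y)
    fitted = X @ beta_hat
    resid = y - fitted
    reps = np.empty((B, X.shape[1]))
    for b in range(B):
        eps_star = rng.choice(resid, size=len(y), replace=True)
        reps[b] = ols(X, fitted + eps_star)
    return reps


def external_bootstrap(X, y, B, rng, normal=True):
    """eps*_i = sigma_hat * u*_i with E[u*] = 0, Var[u*] = 1.
    normal=True gives the Parametric Bootstrap; otherwise Rademacher draws (assumed choice)."""
    beta_hat = ols(X, y)
    fitted = X @ beta_hat
    n, p = X.shape
    sigma_hat = np.sqrt(np.sum((y - fitted) ** 2) / (n - p))
    reps = np.empty((B, p))
    for b in range(B):
        u_star = rng.standard_normal(n) if normal else rng.choice([-1.0, 1.0], size=n)
        reps[b] = ols(X, fitted + sigma_hat * u_star)
    return reps


def pairs_bootstrap(X, y, B, rng):
    """Resample (y_i, x_i) pairs with replacement and refit."""
    n, p = X.shape
    reps = np.empty((B, p))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        reps[b] = ols(X[idx], y[idx])
    return reps


# Hypothetical data: intercept 1, slope 2, error sd 0.5
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

ses = {f.__name__: f(X, y, 500, rng).std(axis=0, ddof=1)
       for f in (residual_bootstrap, external_bootstrap, pairs_bootstrap)}
print(ses)  # bootstrap standard errors of (intercept, slope)
```

In this well-specified, homoscedastic setting the three schemes should give similar standard errors; they differ in which assumptions they lean on, which is precisely what becomes delicate once random effects are introduced.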

Over time, these basic procedures have been widely studied and extended to encompass a variety of deviations from the basic linear model assumptions, yet they all rely critically on the assumption of uncorrelated observations and hence on their exchangeability. As has been argued, this assumption can have spurious effects when dealing with clustered data or repeated observations, where the random structure is more complex due to the inclusion of Random Effects (RE). For example, McCullagh [2000] shows that different resampling strategies yield different bootstrap estimators of the sampling variance, and none of them estimates this feature consistently. Other theoretical works study the conditions under which resampling procedures yield good asymptotic approximations of the sampling distribution of the parameters. For instance, Field and Welsh [2007] show that the higher-order moment structure of the random effects, in conjunction with the assumptions on the data-generating process and the choice of the resampling scheme, has an impact on the bootstrap estimates of the sampling variance of the parameters in a one-way array. More generally, Field et al. [2008] show that these structures and assumptions affect the quality of the bootstrap approximation of the sampling distribution of all the parameters in a Longitudinal Linear Mixed Model.

In order to provide a general procedure and avoid the burden of choosing among these alternatives, Field et al. [2010] propose what is perhaps the most interesting of these schemes, the Generalized Cluster Bootstrap (GCB), which builds on many earlier proposals of bootstrapping via random weighting (e.g. Rubin et al. 1981, Newton and Raftery 1994) and via the random weighting of estimating equations (e.g. Hu and Kalbfleisch 1997 and Chatterjee and Bose 2005). Moreover, Field et al. [2010] and Samanta and Welsh [2012] provide evidence of the usefulness of the method by applying it to the fit of a Longitudinal Linear Mixed Model via estimating equations with bounded Influence Functions, thereby showing ways of extending the procedure with robustness considerations, while more recent work [Ding and Welsh, 2017] explores the weighting schemes by comparing the resulting bootstraps across model assumptions.
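The random-weighting idea behind these schemes can be illustrated on the simplest estimating equation. The sketch below is our own simplification, not the full GCB of Field et al. [2010]: it draws one weight per cluster from an Exp(1) distribution (mean and variance 1, an assumed choice) and solves the weighted fixed-effect equation $\sum_i w_i X_i^T(y_i - X_i\beta) = 0$ under a working-independence model, whereas the GCB also weights the equations of the variance components. The cluster sizes and data-generating values are hypothetical.

```python
import numpy as np


def weighted_ee_solution(Xs, ys, w):
    """Solve sum_i w_i X_i^T (y_i - X_i beta) = 0 for beta (cluster-level weights)."""
    p = Xs[0].shape[1]
    A = np.zeros((p, p))
    c = np.zeros(p)
    for Xi, yi, wi in zip(Xs, ys, w):
        A += wi * Xi.T @ Xi
        c += wi * Xi.T @ yi
    return np.linalg.solve(A, c)


def random_weight_bootstrap(Xs, ys, B, rng):
    """One random weight per cluster, so within-cluster dependence is preserved."""
    n = len(Xs)
    reps = np.empty((B, Xs[0].shape[1]))
    for b in range(B):
        w = rng.exponential(1.0, size=n)  # E[w] = 1, Var[w] = 1 (assumed weight law)
        reps[b] = weighted_ee_solution(Xs, ys, w)
    return reps


# Hypothetical clustered data: 50 units, 5 observations each, random intercept
rng = np.random.default_rng(1)
n, ni = 50, 5
Xs, ys = [], []
for i in range(n):
    Xi = np.column_stack([np.ones(ni), rng.normal(size=ni)])
    ui = rng.normal()  # cluster-level random intercept
    yi = Xi @ np.array([1.0, 0.5]) + 0.8 * ui + rng.normal(scale=0.5, size=ni)
    Xs.append(Xi)
    ys.append(yi)

beta_hat = weighted_ee_solution(Xs, ys, np.ones(n))  # unit weights give the original fit
reps = random_weight_bootstrap(Xs, ys, 1000, rng)
print(beta_hat, reps.std(axis=0, ddof=1))
```

Because resampling is replaced by weighting whole clusters, no residual needs to be defined and every replicate uses all the observations, which is the feature that makes this family of schemes attractive for mixed models.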

To the best of our knowledge, however, the question of a general non-parametric bootstrap for Non-Gaussian Generalized Linear Mixed Models is still open. In this context, one of the main problems is the difficulty of defining a residual to be resampled, which makes parametric or wild procedures more appealing. Another obstacle comes from the difficulty of approximating the multivariate integrals that compose the likelihood, since these approximations can induce systematic biases.

We believe that any attempt to address the question of a general bootstrap method in GLMM could be useful in various applications, such as the computation of the asymptotic variances of parameter estimates, the correction of small-sample biases and the construction of confidence intervals. More specific to GLMM is the estimation of the uncertainty of the predictions of the random effects, as measured, for instance, by the Mean Squared Error of Prediction (MSEP), see e.g. Kackar and Harville [1984], Prasad and Rao [1990], which often requires expensive computations that can potentially be bypassed with bootstrap methods. Hence, this manuscript attempts to contribute to this discussion by proposing a Weighted Bootstrap applied to the Laplace Approximation for Longitudinal Non-Gaussian GLMM in the spirit of the GCB [Field et al., 2010, Pang and Welsh, 2014, Ding and Welsh, 2017].

This chapter is structured as follows. First, we set up the general notation and features of the GLMM, providing the expressions for the important special case of the LMM (Section 1.2). After an overview of some other methods for approximating the likelihood of these models, we describe the key elements of inference based on the Laplace approximation, while establishing the link between these methods and exact ML inference in the LMM (Section 1.3). We review bootstrapping methods for Gaussian and Non-Gaussian GLMM in Section 1.4, then formulate our proposal and expose its links to other schemes based on random weighting. Finally, we provide evidence of the finite-sample properties of the method in two simulated examples: an LMM with random slope and intercept inspired by the Orthodontic dataset, and a Mixed Logit resulting from the fit of the Toenail data (Section 1.6).


1.2 Model and Notation

Let $i = 1,\dots,n$ denote the index of the observational unit and $j = 1,\dots,n_i$ the index of an observation within this unit. Write $\theta = [\beta^T, \sigma^T]^T$ ($(p+s)\times 1$) to denote the vector of model parameters, where $\beta$ ($p\times 1$) represents the fixed-effect parameters and $\sigma$ ($s\times 1$) contains the parameters associated with the random effects, which we will (informally) refer to as variance components.

1.2.1 The General Formulation of a GLMM

In our formulation, we write $y_{ij}$ to denote the observed outcomes, assumed to be independently drawn from an exponential family when conditioned on a vector of covariates $x_{ij}$ ($p\times 1$) and a vector of random effects $\gamma_i$ ($q\times 1$) following a $N_q(0,\Delta_\sigma)$ distribution with a symmetric positive-definite covariance matrix $\Delta_\sigma$. For simplicity of exposition, and following the usual practice in the literature, we consider the reparametrization $\gamma_i = D_\sigma u_i$ resulting from the Cholesky decomposition of the covariance matrix of the random effects, $\Delta_\sigma = D_\sigma D_\sigma^T$, where the $u_i$ are multivariate standard normal vectors (see e.g. Bates 2010). We write $\mu_{ij}$ to denote the conditional expectation of the outcome, $z_{ij}$ ($q\times 1$) a design vector for the random effects and $\eta_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i$ the linear predictor. With $g$ a monotonic link function relating the conditional expectation of the outcome to the linear predictor, a general formulation of the GLMM can be written as follows:

$$g(\mu_{ij}) = \eta_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i. \tag{1.1}$$

This general formulation encompasses a wide array of models, such as mixed versions of Poisson, Gamma and Inverse Gaussian regression. In particular, assuming that the $y_{ij}$ are independent draws from a conditional Bernoulli distribution $B(1,\mu_{ij})$ and using a logit link for $\mu_{ij}$, it is possible to formulate the popular Mixed Logit Model as follows:

$$\mathrm{logit}(\mu_{ij}) = \log\left(\frac{\mu_{ij}}{1-\mu_{ij}}\right) = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i. \tag{1.2}$$

In order to characterize the expressions of the likelihood, let us write $f$ to denote the probability density function (PDF) or probability mass function (PMF) evaluated at the observed outcomes $y_{ij}$, conditioned on the vectors $x_{ij}$ and $u_i$, and assumed to belong to conditional exponential families, therefore endowed with the following structure:

$$f_\theta(y_{ij}\mid u_i) := f(y_{ij}\mid x_{ij}, u_i; \theta, \phi) = \exp\left[\frac{y_{ij}\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij},\phi)\right], \tag{1.3}$$

for $\phi$ a nuisance dispersion parameter, $\xi_{ij} = \xi(\eta_{ij})$ the so-called canonical parameter (when $\phi$ is known) and $b$ and $c$ some specific functions. Moreover, it is known that $b$ identifies the conditional mean and variance of the outcomes, i.e. $\mu_{ij} = E_\theta[Y_{ij}\mid u_i] = b'(\xi_{ij})$ and $\nu_{ij} = v(\mu_{ij}) = \mathrm{Var}_\theta[Y_{ij}\mid u_i] = \phi\, b''(\xi_{ij})$. In what follows, we take $g$ to be the canonical link, in other words $\mu_{ij} = b'(\eta_{ij})$, implying $\xi_{ij} = \eta_{ij}$.
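The exponential-family structure (1.3) can be checked numerically for two common cases. The Python sketch below is our illustration, assuming the canonical link and $\phi = 1$: it takes $b(\eta) = \log(1+e^\eta)$ with $c = 0$ for the Bernoulli distribution and $b(\eta) = e^\eta$ with $c(y) = -\log y!$ for the Poisson distribution, verifies by central finite differences that $b'$ and $b''$ return the conditional mean and variance, and checks that $y\eta - b(\eta) + c(y)$ matches the log-PMF computed by SciPy.

```python
import numpy as np
from scipy import stats
from scipy.special import expit, gammaln

# Canonical-form pieces for two families (phi = 1 for both)
FAMILIES = {
    "bernoulli": dict(
        b=lambda eta: np.logaddexp(0.0, eta),           # b(eta) = log(1 + e^eta)
        c=lambda y: np.zeros_like(y, dtype=float),      # c(y) = 0
        mean=expit,                                     # b'(eta)
        var=lambda eta: expit(eta) * (1 - expit(eta)),  # b''(eta)
        logpmf=lambda y, eta: stats.bernoulli.logpmf(y, expit(eta)),
    ),
    "poisson": dict(
        b=np.exp,                                       # b(eta) = e^eta
        c=lambda y: -gammaln(y + 1.0),                  # c(y) = -log y!
        mean=np.exp,
        var=np.exp,
        logpmf=lambda y, eta: stats.poisson.logpmf(y, np.exp(eta)),
    ),
}


def check_family(fam, eta, y, h=1e-4):
    """Max abs errors of: b' vs mean, b'' vs variance, canonical form vs log-PMF."""
    b = fam["b"]
    d1 = (b(eta + h) - b(eta - h)) / (2 * h)
    d2 = (b(eta + h) - 2 * b(eta) + b(eta - h)) / h**2
    loglik = y * eta - b(eta) + fam["c"](y)
    return (np.max(np.abs(d1 - fam["mean"](eta))),
            np.max(np.abs(d2 - fam["var"](eta))),
            np.max(np.abs(loglik - fam["logpmf"](y, eta))))


eta = np.linspace(-2, 2, 9)
errs = {
    "bernoulli": check_family(FAMILIES["bernoulli"], eta, (eta > 0).astype(float)),
    "poisson": check_family(FAMILIES["poisson"], eta, np.arange(9.0)),
}
print(errs)
```

With the canonical link, the conditional log-PMF is linear in $\eta_{ij}$ apart from $b(\eta_{ij})$, which is what makes the derivatives with respect to the random effects tractable in what follows.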

1.2.2 Special Case : Linear Mixed Models

A widely studied GLMM is the class of Gaussian Linear Mixed Models (LMM), which arises when the outcomes $y_{ij}$ are conditionally distributed as $N(\mu_{ij},\phi)$ with the identity function as link, yielding $\mu_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i$. To facilitate the exposition of its properties and computational methods, we use the matrix representation of the LMM, which mirrors that of the Linear Model. After defining the random error component $\epsilon_{ij}$ following a $N(0,\phi)$ distribution, model (1.1) can be represented as follows:

$$y_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i + \epsilon_{ij}. \tag{1.4}$$

By the properties of the multivariate normal distribution, the outcomes can then be concatenated into conditionally independent normal vectors $y_i = [y_{ij}]_{j=1}^{n_i}$ following $N_{n_i}(\mu_i, \phi I_{n_i})$ distributions with means $\mu_i = [\mu_{ij}]_{j=1}^{n_i}$, yielding a matrix characterization per observational unit:

$$y_i = X_i\beta + Z_i D_\sigma u_i + \epsilon_i, \tag{1.5}$$

where $X_i = [x_{ij}^T]_{j=1}^{n_i}$ ($n_i\times p$) is the matrix of covariates for the $i$-th observational unit and $Z_i = [z_{ij}^T]_{j=1}^{n_i}$ ($n_i\times q$) denotes the corresponding design matrix for the random effects. The last expression is sometimes known as the Longitudinal LMM, since grouping the random structure at the level of the observational unit typically corresponds to repeated observations of that unit over time.

A particular case, which we will call the Random Effect LMM throughout this manuscript, arises when the random structure of the model can be decomposed into several independent sources of variation at the level of the observational unit. This implies the partition of the design matrix $Z_i$ into $Z_i = [Z_{i1},\dots,Z_{iq}]$ and the corresponding partition of the vector of random effects $u_i = [u_{ir}]_{r=1}^{q}$, where $Z_{ir}$ ($n_i\times q_{ir}$) are the design matrices per unit and $u_{ir}$ ($q_{ir}\times 1$) their associated random effects. With the conventions $\sigma_0 = \phi^{1/2}$, $Z_{i0} = I_{n_i}$ and $u_{i0} = \phi^{-1/2}\epsilon_i$, model (1.5) can be written as follows:

$$y_i = X_i\beta + \sum_{r=0}^{q}\sigma_r Z_{ir} u_{ir}. \tag{1.6}$$

By means of a careful arrangement of the random effects according to their source of variation, $u_r = [u_{ir}]_{i=1}^{n}$ ($q_r\times 1$) with $q_r = \sum_{i=1}^{n} q_{ir}$, as well as of the design matrices $Z_r = \mathrm{blockdiag}[Z_{ir}]_{i=1}^{n}$ ($N\times q_r$) for $N = \sum_{i=1}^{n} n_i$, and with $X = [X_i]_{i=1}^{n}$ and $y = [y_i]_{i=1}^{n}$, the model can be seen as a particular case of the so-called Variance Components Model [Searle et al., 1992], with the following representation:

$$y = X\beta + \sum_{r=0}^{q}\sigma_r Z_r u_r. \tag{1.7}$$

It is straightforward to see that equations (1.5) and (1.7) can be viewed as special cases of a General Linear Mixed Model:

$$y = X\beta + ZDu + \epsilon, \tag{1.8}$$

which differ only in the structure of $Z$ and $\Delta = DD^T$ (e.g. $D = \mathrm{blockdiag}(D_\sigma)_{i=1}^{n}$ and $Z = \mathrm{blockdiag}(Z_i)_{i=1}^{n}$ in the Longitudinal LMM). As pointed out in the literature, see for example McCulloch and Searle [2001], the normality assumptions for the conditional distribution of the outcomes imply that their marginal distribution is also multivariate normal, $N_N(X\beta, \Sigma)$, with $\Sigma = Z\Delta Z^T + \phi I_N$.
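To make the correspondence between (1.7) and (1.8) concrete, the sketch below builds, for a hypothetical Longitudinal LMM with independent random intercepts and slopes (so that $D_\sigma$ is diagonal), the marginal covariance $\Sigma = Z\Delta Z^T + \phi I_N$ and its variance-components form $\phi I_N + \sum_{r\geq 1}\sigma_r^2 Z_r Z_r^T$. It also checks that the $N$-dimensional marginal log-likelihood factorizes into per-unit contributions. All dimensions and parameter values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n, ni = 4, 3                    # hypothetical: 4 units, 3 visits each
t = np.arange(ni, dtype=float)  # hypothetical visit times
sigma = np.array([1.2, 0.4])    # assumed sd of random intercept and slope
phi = 0.25                      # assumed residual variance
N = n * ni

# Longitudinal form (1.8): Z = blockdiag(Z_i), Delta = blockdiag(D_sigma D_sigma^T)
Zi = np.column_stack([np.ones(ni), t])
D_sigma = np.diag(sigma)
Z = block_diag(*[Zi] * n)
Delta = block_diag(*[D_sigma @ D_sigma.T] * n)
Sigma = Z @ Delta @ Z.T + phi * np.eye(N)

# Variance-components form (1.7): Z_0 = I_N with sigma_0^2 = phi, one Z_r per source
Z1 = block_diag(*[Zi[:, [0]]] * n)  # intercept columns
Z2 = block_diag(*[Zi[:, [1]]] * n)  # slope columns
Sigma_vc = phi * np.eye(N) + sigma[0] ** 2 * Z1 @ Z1.T + sigma[1] ** 2 * Z2 @ Z2.T

# Marginal log-likelihood: full N-dimensional normal vs sum over units
X = np.column_stack([np.ones(N), np.tile(t, n)])
beta = np.array([2.0, 0.5])
y = rng.multivariate_normal(X @ beta, Sigma)

ll_full = multivariate_normal.logpdf(y, mean=X @ beta, cov=Sigma)
Sigma_i = Zi @ D_sigma @ D_sigma.T @ Zi.T + phi * np.eye(ni)
ll_units = sum(
    multivariate_normal.logpdf(y[i * ni:(i + 1) * ni],
                               mean=X[i * ni:(i + 1) * ni] @ beta, cov=Sigma_i)
    for i in range(n)
)
print(np.allclose(Sigma, Sigma_vc), ll_full, ll_units)
```

The block-diagonal structure of $\Sigma$ is what allows the likelihood to be handled unit by unit, a property that the integral representation of the next section exploits for general GLMM.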


1.3 Approximate Likelihood Inference via Laplace Approximation

In this manuscript we limit the discussion of estimation methods to those based on the likelihood, hence the need to characterize this function. As in many latent variable problems, the expressions of the marginal PDF/PMF $f_\theta(y_{ij})$ of the observed outcomes, which are at the basis of the likelihood function, are obtained by integrating the random effects out of the joint distribution of $[y_{ij}, u_i]$. Using $\varphi$ to denote the density of the multivariate standard normal random vector $u_i$, and assuming independence between observational units, the likelihood for model (1.1) is a product of multivariate integrals:

$$L(\theta) = \prod_{i=1}^{n} L_i(\theta) = \prod_{i=1}^{n}\int_{\mathbb{R}^q}\left[\prod_{j=1}^{n_i} f_\theta(y_{ij}\mid u_i)\right]\varphi(u_i)\,du_i. \tag{1.9}$$

Moreover, taking advantage of the expression of the multivariate standard normal density and of the structure of the exponential family, and defining the functions $\ell_i$ as

$$\ell_i(\theta,u_i) := \ell_i(u_i,\theta;\phi) = -\frac{1}{n_i}\left\{\sum_{j=1}^{n_i}\left[\frac{y_{ij}\eta_{ij} - b(\eta_{ij})}{\phi} + c(y_{ij},\phi)\right] - \frac{1}{2}u_i^T u_i\right\}, \tag{1.10}$$

and using the usual conventions on integral notation, the product (1.9) is equal to either of the following integrals:

$$L(\theta) = (2\pi)^{-nq/2}\prod_{i=1}^{n}\int_{\mathbb{R}^q}\exp[-n_i\ell_i(\theta,u_i)]\,du_i = (2\pi)^{-nq/2}\int_{\mathbb{R}^{nq}}\exp[\ell(\theta,u)]\,du, \tag{1.11}$$

where $\ell(\theta,u) = -\sum_{i=1}^{n} n_i\ell_i(\theta,u_i)$ and $u = [u_1^T,\dots,u_n^T]^T$. It is known, however, that aside from some particular models, e.g. the LMM of Section 1.2.2, these integrals have no closed form and therefore need to be approximated numerically.

One such approximation results from the use of Gaussian Quadrature methods.

Rabe-Hesketh et al. [2002], for instance, show that, by exploiting the independence of the standardized random effects, the likelihood contributions can be written as nested univariate integrals. These integrals can then be approximated by weighted sums of the integrand evaluated at quadrature points given by a quadrature rule, typically the roots of Hermite polynomials (Gauss-Hermite quadrature). Additional accuracy can be obtained with the Adaptive version of Gaussian Quadrature (AGQ), which centres and rescales the quadrature points according to the mode and curvature of the integrand, so that they fall in the region where the integrand has most of its mass. Many studies have shown the good properties of inference based on these approximations as the number of quadrature points increases, see e.g. Rabe-Hesketh et al. [2002], Rabe-Hesketh et al. [2005], Rabe-Hesketh and Skrondal [2008], Pinheiro and Chao [2012], but the implementations are limited to models with simple random structures because of the excessive computational resources they require, the number of points growing exponentially with the dimension $q$ of the random effects. Moreover, even very efficient implementations take a long time to return accurate estimates, see e.g. Huber et al. [2004], which makes them unappealing for the repeated fits required by bootstrap inference.
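The sketch below contrasts non-adaptive and adaptive Gauss-Hermite quadrature on a single likelihood contribution of a hypothetical random-intercept Mixed Logit; the data and parameters are made up, and the linear part $x_{ij}^T\beta$ is taken as given. It relies on NumPy's `hermgauss` nodes and weights. The adaptive version centres the nodes at the mode $\hat u$ of the log-integrand $h$ and scales them by $(-h''(\hat u))^{-1/2}$, which is one common choice. With a single adaptive node the rule reduces to the Laplace approximation discussed later in this section, and the last lines check this.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.integrate import quad
from scipy.special import expit

# Hypothetical unit: eta_ij = a_ij + sigma * u_i, with a_ij = x_ij^T beta taken as given
y = np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=float)
a = np.linspace(-0.5, 1.0, len(y))
sigma = 1.5


def h(u):
    """Log-integrand of (1.9): sum_j log f(y_ij | u) + log phi(u)."""
    e = a + sigma * u
    return np.sum(y * e - np.logaddexp(0.0, e)) - 0.5 * u**2 - 0.5 * np.log(2 * np.pi)


def h_prime(u):
    return sigma * np.sum(y - expit(a + sigma * u)) - u


def h_second(u):
    p = expit(a + sigma * u)
    return -sigma**2 * np.sum(p * (1 - p)) - 1.0


def gauss_hermite(K, center=0.0, scale=1.0):
    """K-node Gauss-Hermite rule for the integral of exp(h(u)) over the real line,
    after the change of variables u = center + sqrt(2) * scale * x."""
    x, w = hermgauss(K)
    vals = np.array([h(center + np.sqrt(2.0) * scale * xk) for xk in x])
    return np.sqrt(2.0) * scale * np.sum(w * np.exp(x**2 + vals))


def mode(u=0.0, tol=1e-12, max_iter=100):
    """Newton iterations for the maximiser of the (concave) log-integrand h."""
    for _ in range(max_iter):
        step = h_prime(u) / h_second(u)
        u -= step
        if abs(step) < tol:
            break
    return u


u_hat = mode()
s_hat = (-h_second(u_hat)) ** -0.5  # curvature-based scale
exact = quad(lambda u: np.exp(h(u)), -np.inf, np.inf, epsabs=1e-13, epsrel=1e-10)[0]
laplace = np.exp(h(u_hat)) * np.sqrt(2 * np.pi) * s_hat

for K in (1, 3, 5, 10):
    plain = gauss_hermite(K)                   # nodes centred at 0, unit scale
    adaptive = gauss_hermite(K, u_hat, s_hat)  # nodes centred at the mode
    print(K, plain / exact - 1, adaptive / exact - 1)
```

In one dimension all of this is cheap; the cost issue mentioned above arises because a product rule needs $K^q$ evaluations per unit, which must be repeated at every step of the optimization over $\theta$ and, in a bootstrap, for every replicate.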

Another possibility is to consider a Quasi-Likelihood approach. Following Green [1987], Breslow and Clayton [1993] highlight the fact that the exponents $n_i\ell_i(\theta,u_i)$ in (1.10) can be written as the sum of the conditional log-likelihood contributions and a penalty term on the Euclidean norm of $u_i$. Hence, they propose replacing the first term in this sum with a characterization of the relationship between the conditional expectation $\mu_{ij}$ and variance $v(\mu_{ij})$ by means of the integrated quasi-likelihood (quasi-deviance) function $d_{ij}(y_{ij},\mu_{ij}) := -2\int_{y_{ij}}^{\mu_{ij}}\frac{y_{ij}-x}{v(x)}\,dx$, yielding a Penalized Quasi-Likelihood (PQL) objective function. In its classic implementation, the PQL is optimized sequentially with respect to $u_i$ and $\beta$, yielding values of the linear predictor and of the conditional expectation evaluated at the optima, $\hat\eta_{ij}$ and $\hat\mu_{ij}$. On the basis of these quantities, the estimation problem can be written as an LMM $y^w_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i + \epsilon_{ij}$ for a working response $y^w_{ij} = g(y_{ij}) \approx g(\hat\mu_{ij}) + (y_{ij}-\hat\mu_{ij})g'(\hat\mu_{ij})$, with $\epsilon_{ij}\sim N(0,w_{ij})$ and $w_{ij} = v(\hat\mu_{ij})[g'(\hat\mu_{ij})]^2$, which can then be fitted with methods for LMM, known for being less computationally intensive and easier to implement. However, in spite of this convenience, the resulting estimates have been found to present systematic biases, especially for the variance component parameters and in the presence of highly discrete outcomes, see e.g. Breslow and Lin [1995], Jang and Lim [2006]. Owing to this inconsistency, we shall not base our proposals on this method.
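The working response and working variances of the PQL linearization are simple to compute once $\hat\eta_{ij}$ is available. The sketch below is an illustration under assumed values of $\hat\eta_{ij}$, not a full PQL fit: the LMM step and the iteration between steps are omitted. It implements $y^w_{ij} = \hat\eta_{ij} + (y_{ij}-\hat\mu_{ij})g'(\hat\mu_{ij})$ and $w_{ij} = v(\hat\mu_{ij})[g'(\hat\mu_{ij})]^2$ for the logit link with Bernoulli variance and for the log link with Poisson variance, both with $\phi = 1$.

```python
import numpy as np
from scipy.special import expit


def working_response(y, eta_hat, inv_link, dlink, variance):
    """Linearised response y^w = eta_hat + (y - mu_hat) g'(mu_hat)
    and working error variance w = v(mu_hat) [g'(mu_hat)]^2."""
    mu_hat = inv_link(eta_hat)
    gprime = dlink(mu_hat)
    y_w = eta_hat + (y - mu_hat) * gprime
    w = variance(mu_hat) * gprime**2
    return y_w, w


# Logit link with Bernoulli variance: g'(mu) = 1 / (mu (1 - mu)), v(mu) = mu (1 - mu)
logit_parts = dict(inv_link=expit,
                   dlink=lambda m: 1.0 / (m * (1 - m)),
                   variance=lambda m: m * (1 - m))
# Log link with Poisson variance: g'(mu) = 1 / mu, v(mu) = mu
log_parts = dict(inv_link=np.exp,
                 dlink=lambda m: 1.0 / m,
                 variance=lambda m: m)

eta_hat = np.array([-1.0, 0.0, 0.5, 2.0])  # hypothetical linear-predictor values
yw_bin, w_bin = working_response(np.array([0.0, 1.0, 1.0, 0.0]), eta_hat, **logit_parts)
yw_poi, w_poi = working_response(np.array([0.0, 2.0, 1.0, 9.0]), eta_hat, **log_parts)
print(yw_bin, w_bin)
print(yw_poi, w_poi)
```

For binary data the working variances $1/[\hat\mu_{ij}(1-\hat\mu_{ij})]$ become very large when $\hat\mu_{ij}$ approaches 0 or 1, which gives some intuition for why the Gaussian approximation behind PQL deteriorates with highly discrete outcomes.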

A widespread alternative consists in applying the Laplace Approximation Method for Integrals (see e.g. De Bruijn 1970, Barndorff-Nielsen and Cox 1989, Ch. 6), a popular way of approximating multiple integrals of the form

$$I(\alpha) = \int_{\mathbb{R}^d} s(v)\exp[-n\,t(v,\alpha)]\,dv = \int_{\mathbb{R}^d}\exp[-n\,h(v,\alpha)]\,dv,$$

when $n$ is large and the function $t$ (equivalently, $h$) has a unique minimum $\tilde v$ for a given $\alpha$. In such a situation, the integral can be approximated through the behaviour of the integrand around $\tilde v$, a procedure that can be seen as applying AGQ with a single quadrature point. In order to characterize the approximation of the likelihood of a GLMM with this method, we use the notational conventions of Magnus et al. [1995] and Raudenbush et al. [2000]. Specifically, let $\ell^{(k)}_i(\theta,u_i) = \partial^{k}_{u_i}\ell_i(\theta,u_i) = \partial\,\mathrm{vec}\,\ell^{(k-1)}_i(\theta,u_i)/\partial u_i^T$ denote the partial derivative, or Jacobian, of order $k$ of the function $\ell_i$ with respect to $u_i$. Under this convention, the gradient of $\ell_i$ with respect to $u_i$ is the transpose of the first Jacobian, $\nabla_{u_i}\ell_i(\theta,u_i) = [\ell^{(1)}_i(\theta,u_i)]^T$, and a multivariate Taylor series expansion of $\ell_i$ around a point $u_0$ can be written as follows:

$$\ell_i(\theta,u_i) = \ell_i(\theta,u_0) + \ell^{(1)}_i(\theta,u_0)[u_i-u_0] + \frac{1}{2}[u_i-u_0]^T\ell^{(2)}_i(\theta,u_0)[u_i-u_0] + \sum_{k=3}^{\infty}T_{ik} \tag{1.12}$$

with the following characterization for the terms of order $k\geq 3$:

$$T_{ik} = T_{ik}(\theta,u_0) := \frac{1}{k!}\left[\otimes^{k-1}(u_i-u_0)\right]^T\ell^{(k)}_i(\theta,u_0)\,(u_i-u_0), \tag{1.13}$$

where $\otimes^k v = v\otimes v\otimes\cdots\otimes v$ ($k$ times) represents the $k$-fold Kronecker product of a vector $v$. By writing $\tilde u_i$ to designate the minimum of $\ell_i$, sometimes called the Mode of the joint PMF/PDF of $[y_i^T,u_i^T]^T$, i.e. $\tilde u_i := \tilde u_i(\theta) = \arg\max_{u_i}[-n_i\ell_i(\theta,u_i)]$, and carrying out the expansion around this value, the second term of the expansion (1.12) vanishes, while the quadratic form in the third term recalls the exponent in the density of a normal random vector, yielding the following characterization after exponentiation of the series and subsequent integration:

$$L_i(\theta) = (2\pi)^{q/2}|V_i(\theta)|^{1/2}\exp\left[-n_i\tilde\ell_i(\theta)\right]\exp\left[\varepsilon_i(\theta)\right], \tag{1.14}$$

where $\tilde\ell^{(k)}_i(\theta) := \ell^{(k)}_i(\theta,\tilde u_i;\phi)$, $V_i(\theta) := [n_i\tilde\ell^{(2)}_i(\theta)]^{-1}$, $R_i := -n_i\sum_{k=3}^{\infty}\tilde T_{ik}$ for $\tilde T_{ik} = T_{ik}(\theta,\tilde u_i)$, and $\varepsilon_i(\theta) := \log E[\exp(R_i)]$, with the expectation taken over the density of a $N[\tilde u_i,V_i(\theta)]$ random vector. With these considerations, the Laplace-approximated contributions can be formulated as follows:

$$\log L_i(\theta) = \frac{q}{2}\log(2\pi) + \frac{1}{2}\log|V_i(\theta)| - n_i\tilde\ell_i(\theta) + \varepsilon_i(\theta), \tag{1.15}$$

which, after neglecting the approximation error in the contributions, yields the Laplace-approximated log-Likelihood (LALL):

$$\log\tilde L(\theta) = \frac{nq}{2}\log(2\pi) + \frac{1}{2}\sum_{i=1}^{n}\log|V_i(\theta)| + \tilde\ell(\theta), \tag{1.16}$$

where $\tilde\ell(\theta) = -\sum_{i=1}^{n}n_i\tilde\ell_i(\theta)$. Customarily, this approximation is then optimized with respect to $\theta$, in the spirit of the Maximum Likelihood (ML) approach, to obtain what we shall call Laplace-approximated Maximum Likelihood Estimators (LAMLE) of the model parameters. Hence, this strategy entails the following two-step procedure:

• Step 1: Minimization of $\ell_i(\theta,u_i)$, with $\theta$ fixed at its current value $\hat\theta$, to obtain the modes $\tilde u_i(\hat\theta)$.

• Step 2: Maximization of $\log\tilde L(\theta)$ to update the values of the estimates.

In practical implementations, the two steps can be performed by separate routines to reduce computation time. Moreover, it is of course possible to improve the approximation (and therefore inference) by taking into account higher-order terms in the Taylor expansion of $\ell_i(\theta,u_i)$ in equation (1.12), yielding a variety of higher-order approximations, see e.g. Lindley [1980], Liu and Pierce [1993], Raudenbush et al. [2000]; yet most modern implementations rely on the first-order approximation for computational simplicity.
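The two-step procedure can be sketched in Python for the simplest non-Gaussian case. The sketch assumes a random-intercept logistic model ($q=1$, $D_\sigma=\sigma$, $u_i\sim N(0,1)$), so that $n_i\ell_i(\theta,u_i) = -\log[f(y_i\mid u_i)\varphi(u_i)]$. Step 1 uses Newton's method for the mode and Step 2 uses SciPy's Nelder-Mead optimizer on $-\log\tilde L(\theta)$ with $\theta=(\beta,\log\sigma)$; both optimizers, the data-generating values and all function names are assumptions for illustration, not the implementation used in this thesis. A non-adaptive Gauss-Hermite value of $\log L_i$ is included as a reference, so that the gap between the two gives a rough idea of the neglected term $\varepsilon_i(\theta)$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_joint_log_density(u, y, eta0, sigma):
    """n_i * ell_i(theta, u): minus log[f(y_i | u) phi(u)] for a Bernoulli-logit
    model with a standardized random intercept u ~ N(0, 1)."""
    eta = eta0 + sigma * u
    log_f = np.sum(y * eta - np.logaddexp(0.0, eta))    # Bernoulli log-likelihood
    log_phi = -0.5 * u**2 - 0.5 * np.log(2.0 * np.pi)   # standard normal log-density
    return -(log_f + log_phi)


def laplace_loglik_i(y, eta0, sigma, tol=1e-10, max_iter=100):
    """Step 1 (Newton search for the mode u_tilde) followed by the first-order
    Laplace contribution (1.15) with q = 1 and epsilon_i neglected."""
    u = 0.0
    for _ in range(max_iter):
        mu = expit(eta0 + sigma * u)
        grad = u - sigma * np.sum(y - mu)                  # d/du of n_i * ell_i
        hess = 1.0 + sigma**2 * np.sum(mu * (1.0 - mu))   # d2/du2, always >= 1
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    mu = expit(eta0 + sigma * u)
    V = 1.0 / (1.0 + sigma**2 * np.sum(mu * (1.0 - mu)))  # V_i = [n_i ell_i^(2)]^{-1}
    return 0.5 * np.log(2.0 * np.pi) + 0.5 * np.log(V) - neg_joint_log_density(u, y, eta0, sigma)


def ghq_loglik_i(y, eta0, sigma, n_nodes=50):
    """Reference value of log L_i by (non-adaptive) Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    eta = eta0[None, :] + sigma * np.sqrt(2.0) * x[:, None]   # u = sqrt(2) x
    log_f = np.sum(y * eta - np.logaddexp(0.0, eta), axis=1)
    return np.log(np.sum(w * np.exp(log_f)) / np.sqrt(np.pi))


def neg_lall(params, clusters):
    """Step 2 objective: minus the LALL (1.16), with theta = (beta, log sigma)."""
    beta, sigma = params[:-1], np.exp(params[-1])
    return -sum(laplace_loglik_i(y, X @ beta, sigma) for y, X in clusters)


# Simulated data: 30 clusters of 8 Bernoulli outcomes (illustrative values only).
rng = np.random.default_rng(1)
clusters = []
for _ in range(30):
    X = np.column_stack([np.ones(8), rng.normal(size=8)])
    eta = X @ np.array([-0.5, 1.0]) + rng.normal()         # sigma = 1
    clusters.append((rng.binomial(1, expit(eta)).astype(float), X))

fit = minimize(neg_lall, x0=np.zeros(3), args=(clusters,), method="Nelder-Mead")
print(fit.x)  # (beta_0, beta_1, log sigma) maximizing the LALL for these data
```

Parametrizing $\sigma$ on the log scale keeps the outer search unconstrained, while the inner Newton iteration is well behaved because the second derivative of $n_i\ell_i$ is bounded below by one in this model.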

It is important to point out that the asymptotics of the procedure have long been studied, both theoretically, see e.g. Shun and McCullagh [1995], and via simulations, see e.g. Joe [2008]. For instance, Douc et al. [2004] show that the estimates obtained on the grounds of an approximate log-likelihood such as (1.16) have the correct asymptotic distribution as long as the approximation error $\varepsilon_n(\theta) := \sum_{i=1}^{n}\varepsilon_i(\theta)$ converges to zero in probability. Other works, such as a recent preprint by Ogden [2016], show that this inference can have first-order accuracy under conditions on the absolute error of the Score functions which, in the case of the LALL, is given by the uniform norm of the gradient of the approximation error, i.e. $\delta_n := \sup_{\theta\in\Theta}\|\nabla_\theta\varepsilon_n(\theta)\|$. More specifically, Ogden [2016, Theorem 2] shows that when $\delta_n = o_p(r_n)$, among other conditions, the LAMLE converges in distribution to that of the MLE at a rate $r_n^{1/2}$, which depends on the conditional distribution of the outcome.

The only case where the likelihood (1.9) has a closed-form expression is when the outcome is Gaussian, i.e. in the context of LMM. To see this, rewrite model (1.5) as $y_i = X_i\beta + \varepsilon_i$, where the $\varepsilon_i = Z_iD_\sigma u_i + \epsilon_i$ are drawn independently from a $N_{n_i}(0,\Sigma_i)$ distribution with $\Sigma_i = \phi I_{n_i} + Z_i\Delta_\sigma Z_i^T$, because of multivariate normality and independence between the vectors $\epsilon_i$ and $u_i$. On the basis of this consideration, the literature on LMM proposes two competing likelihood-based estimation methods, namely the Maximum Likelihood (ML) and the Residual or Restricted Maximum Likelihood (REML) approaches. While the ML estimates can be obtained directly by optimizing the closed-form likelihood with respect to the model parameters using a gradient-based algorithm, it is also possible to obtain them with the LALL, since the Laplace approximation is exact in the context of a Gaussian response. To illustrate this point, let us define $\rho^2$ from the terms in the exponent of the integrand in equation (1.11), as in:

$$\ell(\theta,u) \propto \sum_{i=1}^{n}\left(\|y_i - X_i\beta - Z_iD_\sigma u_i\|_2^2 + \phi\|u_i\|_2^2\right) = \rho^2(u,\beta;\phi,\sigma). \tag{1.17}$$

It is straightforward to see that the optimization with respect to $u$ and $\beta$, in the spirit of the Laplace Method, amounts to the minimization of $\rho^2$, an operation that is, in the words of Bates [2010], a Penalized Least Squares problem yielding Henderson's Estimating Equations [Henderson, 1950] for fixed $\sigma$ and $\phi$. This procedure is at the core of implementations such as the R packages nlme [Pinheiro and Bates, 2009] and lme4 [Bates et al., 2015].
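The exactness of the Laplace approximation in the Gaussian case can be checked numerically. In the Python sketch below, assuming $u_i\sim N(0,I_q)$ and a diagonal $D_\sigma$, the mode $\tilde u_i$ is obtained from the Penalized Least Squares criterion in (1.17) by solving the augmented least-squares problem $\|[r_i;0] - [Z_iD_\sigma;\sqrt\phi I_q]\,u\|_2^2$, and the Laplace contribution (1.15) with $\varepsilon_i=0$ is compared with the closed-form $N_{n_i}(X_i\beta,\Sigma_i)$ log-density. Function names and numerical values are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal


def laplace_loglik_lmm(y, X, Z, beta, sigma, phi):
    """Laplace-approximated log L_i for a Gaussian LMM with u_i ~ N(0, I_q)
    and D_sigma = diag(sigma); the mode solves the PLS problem of (1.17)."""
    D = np.diag(sigma)
    q = D.shape[0]
    r = y - X @ beta
    # ||[r; 0] - [Z D; sqrt(phi) I] u||^2 equals ||r - Z D u||^2 + phi ||u||^2.
    A = np.vstack([Z @ D, np.sqrt(phi) * np.eye(q)])
    b = np.concatenate([r, np.zeros(q)])
    u_mode = np.linalg.lstsq(A, b, rcond=None)[0]
    H = A.T @ A / phi                          # Hessian of n_i * ell_i, i.e. V_i^{-1}
    resid = r - Z @ D @ u_mode
    neg_joint = (0.5 * y.size * np.log(2 * np.pi * phi) + resid @ resid / (2 * phi)
                 + 0.5 * q * np.log(2 * np.pi) + 0.5 * u_mode @ u_mode)
    _, logdet_H = np.linalg.slogdet(H)
    return 0.5 * q * np.log(2 * np.pi) - 0.5 * logdet_H - neg_joint


def exact_loglik_lmm(y, X, Z, beta, sigma, phi):
    """Closed form: y_i ~ N(X_i beta, phi I + Z_i Delta_sigma Z_i^T)."""
    D = np.diag(sigma)
    Sigma = phi * np.eye(y.size) + Z @ D @ D.T @ Z.T
    return multivariate_normal(mean=X @ beta, cov=Sigma).logpdf(y)


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
Z = np.column_stack([np.ones(6), rng.normal(size=6)])   # random intercept and slope
y = rng.normal(size=6)
args = (y, X, Z, np.array([0.3, -1.0]), np.array([0.8, 0.5]), 1.3)
print(laplace_loglik_lmm(*args), exact_loglik_lmm(*args))  # agree up to rounding
```

Because the integrand is exactly Gaussian in $u_i$, all terms $T_{ik}$ with $k\geq 3$ vanish and $\varepsilon_i(\theta)=0$, which is why the two values coincide for any choice of $\beta$, $\sigma$ and $\phi$.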

1.4 Bootstrapping Mixed Models

In what follows, we review some of the bootstrapping procedures for Mixed Models, categorized by the analogies one can draw with the schemes available for the LM. First, we focus on proposals linked to the Residual Bootstrap for LM, yet accounting for the random structure that is particular to LMM. A second set of schemes, relying on simulation and therefore mimicking the Wild and Parametric procedures, is then described, while a third batch, consisting in resampling data clusters as an analogue to the Pairs procedure, is studied in the last subsection.

1.4.1 Bootstraps based on Random Effect Predictions

If the random structure of a LMM can be decomposed as in equation (1.6), one can consider resampling random errors and Predictions of the random effects to create the bootstrap samples. This procedure, often named Random Effect Bootstrap (REB), see e.g. Davison and Hinkley [1997, Chapter 3.8], requires an initial fit of the model providing parameter estimates $\hat\beta$, Empirical Predictions (EP) for the Random Effects $\hat u_r$, and the Residuals $\hat\epsilon = y - X\hat\beta - \sum_{r=1}^{q}\hat\sigma_rZ_r\hat u_r$. Then, one can construct bootstrap samples $u^*_r$ and $\epsilon^*$ via simple random sampling with replacement from the vectors of EP and residuals, yielding bootstrap samples of the outcome, as in:

$$y^* = X\hat\beta + \sum_{r=1}^{q}\hat\sigma_rZ_ru^*_r + \epsilon^*. \tag{1.18}$$

In practice, it is useful to consider a supplementary step, since predictions are shrunk towards 0, implying that they are less spread out than the random effects [Robinson, 1991]. To illustrate this point, and following Morris [2002], let us consider the very simple example of a model with one random effect for the intercept. In this case, the Best Linear Unbiased Predictor (BLUP) is given by $\hat u_i = \frac{n_i\sigma^2}{n_i\sigma^2+\phi}(\bar y_i - \bar x_i^T\hat\beta)$, where $\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i}y_{ij}$ (and $\bar x_i$ is defined analogously), with a variance $\mathrm{Var}(\hat u_i) = \frac{n_i\sigma^2}{n_i\sigma^2+\phi}\left(1-\frac{1}{n}\right)\sigma^2$, showing two shrinkage factors with respect to $\sigma^2$, the variance of the RE. To counter this effect, one can consider reflating the vectors of RE by dividing them by the square root of an empirical estimate of their dispersion, e.g. $v_r = \hat u_r^T\hat u_r/q_r$, thus ensuring that the resulting predictions have unit variance, as suggested by Carpenter et al. [2003]. The resulting reflated predictions $\breve u_r = \hat u_r/\sqrt{v_r}$ are subsequently resampled to create the bootstrap samples.
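A single REB draw with reflation can be sketched in Python as follows. The sketch assumes that the fitted quantities $\hat\beta$, $\hat\sigma_r$, the EP $\hat u_r$ and the residuals $\hat\epsilon$ are already available from an initial fit (the values in the usage lines are illustrative placeholders, not output of a real fit), and the helper names `reflate` and `reb_sample` are hypothetical.

```python
import numpy as np


def reflate(u_hat):
    """Divide a vector of EP by sqrt(v_r), with v_r = u_r^T u_r / q_r."""
    v_r = u_hat @ u_hat / u_hat.size
    return u_hat / np.sqrt(v_r)


def reb_sample(X, Z_list, beta_hat, sigma_hat, u_hat_list, e_hat, rng, reflate_re=True):
    """One Random Effect Bootstrap draw y* as in (1.18): each vector of
    (optionally reflated) predictions and the residual vector are resampled
    with replacement, independently of each other."""
    y_star = X @ beta_hat
    for Z_r, s_r, u_r in zip(Z_list, sigma_hat, u_hat_list):
        pool = reflate(u_r) if reflate_re else u_r
        u_star = rng.choice(pool, size=pool.size, replace=True)
        y_star = y_star + s_r * (Z_r @ u_star)
    e_star = rng.choice(e_hat, size=e_hat.size, replace=True)
    return y_star + e_star


rng = np.random.default_rng(0)
cluster = np.repeat(np.arange(4), 2)              # 4 clusters of 2 observations
X = np.column_stack([np.ones(8), rng.normal(size=8)])
Z = np.eye(4)[cluster]                            # cluster-incidence matrix Z_1
u_hat = np.array([0.3, -0.1, 0.4, -0.5])          # placeholder EP
e_hat = rng.normal(scale=0.5, size=8)             # placeholder residuals
y_star = reb_sample(X, [Z], np.array([1.0, 2.0]), [0.7], [u_hat], e_hat, rng)
```

After reflation the resampled pool has mean square exactly one, so the scale of the bootstrap random effects is carried entirely by $\hat\sigma_r$, as in (1.18).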

Another scheme that generalizes the RB is the Transformation Bootstrap. Write model (1.8) as $y = X\beta + \Sigma^{1/2}\delta$, where $\delta$ is drawn from a multivariate standard normal distribution and $\Sigma = Z\Delta Z^T + \phi I_N$. In this setting, one can define predictions of the standardized random components of the model, $\hat\delta = \hat\Sigma^{-1/2}(y - X\hat\beta)$, and resample with replacement from this vector to construct bootstrap samples of the residuals $\delta^*$ and, consequently, $y^*$ via the relationship:

$$y^* = X\hat\beta + \Sigma^{1/2}_{\hat\theta}\,\delta^*. \tag{1.19}$$

This procedure can be improved by the same Reflation step as for the REB, to avoid the effects of shrinkage.
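The Transformation Bootstrap (1.19) can be sketched in Python as below. The text does not prescribe a particular square root of $\hat\Sigma$; this sketch assumes the symmetric square root obtained from the spectral decomposition (a Cholesky factor would be another option), and the function names and the optional `idx` argument are hypothetical conveniences. The reflation step is not included.

```python
import numpy as np


def sym_power(S, power):
    """Symmetric matrix power S^power via the spectral decomposition (S positive definite)."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**power) @ vecs.T


def transformation_bootstrap(y, X, beta_hat, Sigma_hat, rng, idx=None):
    """One draw of (1.19): standardize, resample the standardized components,
    and map back with Sigma_hat^{1/2}. A fixed idx reproduces a given resampling pattern."""
    delta_hat = sym_power(Sigma_hat, -0.5) @ (y - X @ beta_hat)
    if idx is None:
        idx = rng.integers(0, y.size, size=y.size)   # sampling with replacement
    return X @ beta_hat + sym_power(Sigma_hat, 0.5) @ delta_hat[idx]


rng = np.random.default_rng(2)
y = np.array([1.0, -0.5, 0.2])
X = np.ones((3, 1))
Sigma = np.array([[1.5, 0.5, 0.0], [0.5, 1.5, 0.0], [0.0, 0.0, 1.0]])
y_star = transformation_bootstrap(y, X, np.array([0.1]), Sigma, rng)
```

Passing `idx=np.arange(y.size)` (no resampling) returns the original $y$, which is a convenient check that the forward and backward transformations are consistent.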

Outside of the framework of LMM, it is not obvious how to define a residual/prediction-based bootstrap. On one hand, there is a problem of choice of the prediction method for the random effects: either Conditional Modes (CM) or Empirical Best Predictors (EBP).

On the other hand, predictions of the random effects tend not to reproduce the normal distribution, since they depend on raw residuals that are discrete for some types of outcomes (e.g. Mixed Logit). For the sake of comparison, we could define the following bootstrap procedure, following a suggestion found in Shao and Tu [2012].

Consider the Pearson Residuals $\hat e_{ij} = [y_{ij}-\hat\mu_{ij}]/\sqrt{\hat\phi\,v(\hat\mu_{ij})}$, centered around their empirical mean over the whole sample and concatenated into a vector $\hat e_P$, and a prediction of the random effects (either CM or EBP) $\hat u_i$, subsequently concatenated into a vector $\hat u$. After resampling with replacement from both vectors and creating $\eta^*_{ij} = x^T_{ij}\hat\beta + z^T_{ij}D_{\hat\sigma}u^*_i$ and $\mu^*_{ij} = g^{-1}(\eta^*_{ij})$, bootstrap samples of the outcome can be constructed as follows:

$$y^*_{ij} = \mu^*_{ij} + \left[\hat\phi\,v(\mu^*_{ij})\right]^{1/2}e^*_{ij}. \tag{1.20}$$
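A Python sketch of one draw of (1.20) is given below for a Poisson random-intercept GLMM with log link, $v(\mu)=\mu$ and $\hat\phi=1$; the family, the use of one prediction per cluster and all names are assumptions for illustration, and the fitted values passed in the usage lines are placeholders rather than the result of an actual fit.

```python
import numpy as np


def pearson_bootstrap_sample(y, X, cluster, beta_hat, sigma_hat, u_hat, phi_hat, rng):
    """One draw of (1.20) for a Poisson log-link random-intercept model, v(mu) = mu.
    cluster[j] is the cluster of observation j; u_hat holds one prediction
    (CM or EBP) per cluster."""
    mu_hat = np.exp(X @ beta_hat + sigma_hat * u_hat[cluster])
    e_p = (y - mu_hat) / np.sqrt(phi_hat * mu_hat)     # Pearson residuals
    e_p = e_p - e_p.mean()                             # centered on the sample mean
    u_star = rng.choice(u_hat, size=u_hat.size, replace=True)
    e_star = rng.choice(e_p, size=e_p.size, replace=True)
    mu_star = np.exp(X @ beta_hat + sigma_hat * u_star[cluster])
    return mu_star + np.sqrt(phi_hat * mu_star) * e_star


rng = np.random.default_rng(3)
cluster = np.repeat(np.arange(5), 4)                   # 5 clusters of 4 observations
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = rng.poisson(2.0, size=20).astype(float)
u_hat = np.array([0.2, -0.3, 0.1, 0.0, -0.1])          # placeholder predictions
y_star = pearson_bootstrap_sample(y, X, cluster, np.array([0.5, 0.2]), 0.6, u_hat, 1.0, rng)
```

The resulting $y^*_{ij}$ need not be integers, and can even be negative, which illustrates the difficulty of transferring residual-based schemes to discrete outcomes discussed above.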

1.4.2 Bootstraps based on Distributional Assumptions

If there is enough confidence in the assumptions on the data generating process, one can consider generating replicates of random effects and errors via simulation, using the estimates of the model parameters, in the spirit of the Parametric Bootstrap for LM.

In LMM, bootstrap samples for the random effects (and sampling errors) are generated via simulation using the estimated model parameters, i.e. $u^*_i$ and $\epsilon^*_i$ are drawn from $N_q(0,I_q)$ and $N_{n_i}(0,\hat\phi I_{n_i})$ respectively, allowing the construction of the following bootstrap samples for the outcome vectors:

$$y^*_i = X_i\hat\beta + Z_iD_{\hat\sigma}u^*_i + \epsilon^*_i. \tag{1.21}$$

This intuitive scheme has been used in many contexts with fair results, see e.g. Butar and Lahiri [2003], Lahiri et al. [2003], González-Manteiga et al. [2007] for an illustration of its use in the estimation of the uncertainty of predictions of random effects.
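The parametric scheme (1.21) translates almost line by line into code. The Python sketch below assumes a diagonal $D_{\hat\sigma}$ and lists of per-cluster design matrices; the function name and the numerical values in the usage lines are hypothetical. With a random intercept, $\hat\sigma=1$ and $\hat\phi=0.5$, the simulated outcomes should have variance close to $\hat\sigma^2+\hat\phi = 1.5$ and within-cluster covariance close to $\hat\sigma^2 = 1$.

```python
import numpy as np


def parametric_lmm_sample(X_list, Z_list, beta_hat, sigma_hat, phi_hat, rng):
    """One parametric bootstrap draw of (1.21) for every cluster i."""
    D = np.diag(sigma_hat)
    samples = []
    for X_i, Z_i in zip(X_list, Z_list):
        u_star = rng.standard_normal(D.shape[0])                       # N_q(0, I_q)
        e_star = np.sqrt(phi_hat) * rng.standard_normal(X_i.shape[0])  # N(0, phi I)
        samples.append(X_i @ beta_hat + Z_i @ D @ u_star + e_star)
    return samples


rng = np.random.default_rng(4)
X_list = [np.ones((2, 1))] * 20000       # 20000 clusters of size 2, intercept only
Z_list = [np.ones((2, 1))] * 20000       # random intercept
draws = np.array(parametric_lmm_sample(X_list, Z_list, np.array([1.0]),
                                       np.array([1.0]), 0.5, rng))
print(np.var(draws[:, 0]), np.cov(draws[:, 0], draws[:, 1])[0, 1])
```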

The extension to a non-Gaussian context is straightforward, since the link between the estimates $\hat\theta$, the predictions $\hat u_i$ and the conditional CDF of the outcome is mediated through the fitted linear predictor $\hat\eta_{ij}$, and thus through the fitted mean $\hat\mu_{ij}$ via the link function $g$. It follows that bootstrap samples of the outcomes $y^*_{ij}$ can be drawn by simulation from the conditional distribution $F_{\hat\mu^*_{ij}}$, where $g(\hat\mu^*_{ij}) = x^T_{ij}\hat\beta + z^T_{ij}D_{\hat\sigma}u^*_i$, with replicates $u^*_i$ drawn from a $N(0,I_q)$. Variations of the method, such as a Wild Bootstrap procedure which does not require the multivariate normality of $u^*_i$, can also be considered, see e.g. González-Manteiga et al. [2008].
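For a non-Gaussian outcome, only the last step changes: the outcome is drawn from $F_{\hat\mu^*_{ij}}$ instead of adding a Gaussian error. The Python sketch below assumes a Bernoulli random-intercept GLMM with logit link ($q=1$); the function name and the parameter values in the usage lines are hypothetical.

```python
import numpy as np


def parametric_glmm_sample(X, cluster, beta_hat, sigma_hat, n_clusters, rng):
    """Parametric bootstrap for a Bernoulli-logit random-intercept GLMM:
    u*_i ~ N(0, 1), g(mu*_ij) = x_ij^T beta_hat + sigma_hat u*_i, y*_ij ~ F_{mu*_ij}."""
    u_star = rng.standard_normal(n_clusters)
    eta_star = X @ beta_hat + sigma_hat * u_star[cluster]
    mu_star = 1.0 / (1.0 + np.exp(-eta_star))          # inverse logit link
    return rng.binomial(1, mu_star), u_star


rng = np.random.default_rng(5)
cluster = np.repeat(np.arange(10), 6)                  # 10 clusters of 6 observations
X = np.column_stack([np.ones(60), rng.normal(size=60)])
y_star, u_star = parametric_glmm_sample(X, cluster, np.array([-0.5, 1.0]), 0.8, 10, rng)
```

Unlike the Pearson-residual scheme, the simulated outcomes here respect the support of the conditional distribution, since they are generated from it directly.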
