
1.8 Appendix

1.8.5 TMB templates for LAWLL

LMM

Listing 1.1: TMB template for the Weighted Joint Likelihood of the LMM

// Gaussian Linear Mixed Model with Random Slope and Random Intercept.
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_FACTOR(group);
  DATA_VECTOR(y);
  DATA_MATRIX(X);
  DATA_VECTOR(t);          // Variable associated with the Random Slope
  DATA_VECTOR(w);          // Random Weight
  PARAMETER_VECTOR(u);     // Intercept for a given individual
  PARAMETER_VECTOR(v);     // Slope for a given individual
  PARAMETER_VECTOR(beta);  // length-four, one for each of the covariates
  PARAMETER_VECTOR(sigma); // one for intercept, one for slope, one for the residual

  int nobs = y.size();
  int ngroups = u.size();
  Type res(0.0);

  vector<Type> mu = X * beta;

  /* Weighted Distribution of Intercept and Slope */
  for (int j = 0; j < ngroups; j++) {
    res -= w[j] * dnorm(u[j], Type(0), sigma[0], 1);
    res -= w[j] * dnorm(v[j], Type(0), sigma[1], 1);
  }

  /* Joint Weighted Distribution */
  for (int i = 0; i < nobs; i++) {
    int j = group[i];
    res -= w[j] * dnorm(y[i], Type(mu[i] + u[j] + t[i] * v[j]), sigma[2], 1);
  }
  return res;
}

GLMM

Listing 1.2: TMB template for the Weighted Joint Likelihood of the Mixed Logit Model

// Mixed Logit Regression model with Random Intercept.
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_FACTOR(group);
  DATA_VECTOR(y);
  DATA_MATRIX(X);
  DATA_VECTOR(w);          // Random Weight
  PARAMETER_VECTOR(u);     // Intercept for a given individual
  PARAMETER_VECTOR(beta);  // length-four, one for each of the covariates
  PARAMETER(sigma);        // one for the intercept

  int nobs = y.size();
  int ngroups = u.size();
  Type res(0.0);

  vector<Type> mu = X * beta;

  /* Weighted Distribution of the Intercept */
  for (int j = 0; j < ngroups; j++) {
    res -= w[j] * dnorm(u[j], Type(0), sigma, 1);
  }

  /* Joint Weighted Distribution */
  for (int i = 0; i < nobs; i++) {
    int j = group[i];
    res -= w[j] * dbinom(y[i], Type(1), Type(1 / (1 + exp(-mu[i] - u[j]))), 1);
  }

  return res;
}

Chapter 2

Bootstrapping GLMM to Estimate the Uncertainty in Prediction

2.1 Introduction

In many applications of Mixed Models, it is of interest to provide an “estimate” of the value of the random effects, be it for forecasting purposes or to assess the quality of a particular fit, e.g. by performing some sort of residual analysis post-estimation. The process of providing such values is customarily called Point Prediction of the Random Effects, a denomination used to distinguish it from the Estimation of the Model Parameters.

The problem of Prediction of Random Effects has been widely explored in the literature on Gaussian GLMM, or Linear Mixed Models (LMM), a setting in which theoretical results have led to the determination of the Best Linear Unbiased Predictor (BLUP) under full knowledge of the model parameters, and its Empirical version (EBLUP) when the parameters are estimated. Naturally, this approach has analogues in the Non-Gaussian framework in the form of the Best Predictor (BP) and the EBP, often approximated by Conditional Modes obtained with an Empirical Bayes approach, see e.g. Morris [1983], Tierney and Kadane [1986].
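As an illustration of the BLUP in the simplest setting (a standard textbook result, stated here for a random-intercept LMM with known parameters, not a formula taken from this manuscript), the predictor shrinks the unit-level average residual towards zero:

\[
\tilde{u}_i \;=\; \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2/n_i}\,\frac{1}{n_i}\sum_{j=1}^{n_i}\bigl(y_{ij} - \mathbf{x}_{ij}^T\boldsymbol{\beta}\bigr),
\]

so units with few observations (small $n_i$) are shrunk more aggressively towards zero; the EBLUP is obtained by replacing $(\boldsymbol{\beta}, \sigma_u^2, \sigma_e^2)$ with their estimates.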

Similarly to the estimation problem, where point estimates are provided alongside their standard errors for inferential purposes, it is useful to retrieve a measure of uncertainty of the point predictions, e.g. to classify observational units according to “significant” differences in their predicted response or to construct prediction intervals for new observations drawn from a given unit. In LMM, this translates into the computation of the Mean Squared Error of the Prediction (MSEP), often estimated by means of second-order correct approximations that take into account the uncertainty due to the parameter estimation, such as those proposed by Kackar and Harville [1984], Prasad and Rao [1990], Datta and Lahiri [2000] and Das et al. [2004]. In a more general framework, it is customary to report estimates of the Conditional Variances (CV) resulting from the Bayesian outlook on the GLMM, with the addition of corrections that account for the added variability of the estimation of the model parameters, see e.g. Kass and Steffey [1989], Booth and Hobert [1998], Singh et al. [1998]. The computation of these measures can also be undertaken with resampling methods such as the Jackknife approach to the computation of the MSEP [Jiang et al., 2002] or the more widespread Parametric Bootstrap (PB) method, used to produce estimates of the MSEP, see for instance Butar and Lahiri [2003], or to build Prediction Intervals, as in the works of Butar and Lahiri [2003], Hall and Maiti [2006], Chatterjee et al. [2008].
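To fix ideas, the MSEP of a predictor $\tilde{u}_i$ of the random effect $u_i$, and the decomposition exploited by the second-order corrections cited above, can be written as follows (a standard formulation in the spirit of Kackar and Harville [1984], spelled out here for illustration):

\[
\mathrm{MSEP}(\tilde{u}_i) = \mathbb{E}\bigl[(\tilde{u}_i - u_i)^2\bigr],
\qquad
\mathrm{MSEP}\bigl(\tilde{u}_i(\hat{\theta})\bigr) = \mathrm{MSEP}\bigl(\tilde{u}_i(\theta)\bigr) + \mathbb{E}\bigl[\bigl(\tilde{u}_i(\hat{\theta}) - \tilde{u}_i(\theta)\bigr)^2\bigr],
\]

where the second term captures the added uncertainty due to parameter estimation; the decomposition holds exactly under normality and suitable symmetry conditions on the estimator of $\theta$.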

This manuscript contributes to the literature on uncertainty estimation in prediction by means of non-parametric bootstrap methods, specifically by proposing algorithms based on the Random Weighted Laplace Bootstrap (RWLB), applicable to the entire GLMM class, for the computation and correction of the MSEP and CV. This proposal is then compared to adaptations of other schemes such as the so-called Random Effect Bootstrap (REB), see e.g. Davison and Hinkley [1997], Carpenter et al. [2003], Field et al. [2008], and the more widespread PB alternatives.

The article is structured as follows: in Section 2.2, we set up the notation of the GLMM, characterize the special case of the LMM (Section 2.2.1) and summarize the problem of prediction of random effects (Section 2.2.2). Section 2.3 contains an overview of two methods for the evaluation and estimation of the uncertainty in prediction, namely the approach via the MSEP (Section 2.3.1) and the Empirical Bayes approach relying on the CV (Section 2.3.2). We briefly review some resampling schemes for the LMM in Section 2.3.3, highlight or propose adaptations for the estimation of uncertainty in the Non-Gaussian context, and formulate our proposals based on the RWLB scheme. Finally, Section 2.4 contains two simulation studies as a basis of comparison of the different methods, one carried out on an LMM (Section 2.4.1) and a second one in a Mixed Logit context (Section 2.4.2).

2.2 Model and Notation

Let i = 1, . . . , n denote the index of the observational unit and j = 1, . . . , n_i the index of an observation within this unit. Write θ = [β^T, σ^T]^T (d × 1) to denote the vector of model parameters, where β (p × 1) represents the fixed effect parameters and σ (s × 1) contains the parameters associated with the random effects, sometimes referred to as Variance Components, and d = p + s. Write y_ij to denote the observed outcomes, assumed to be independently drawn from an exponential family when conditioned on a vector of covariates x_ij (p × 1) and a vector of random effects γ_i (q × 1) following a N_q(0, Σ_σ) distribution, endowed with a positive-definite symmetric covariance matrix Σ_σ. For notational simplicity, we will consider the reparametrization γ_i = D_σ u_i resulting from the Cholesky decomposition Σ_σ = D_σ D_σ^T, where the u_i are multivariate standard normal vectors. Let μ_ij denote the conditional expectation of the outcome, z_ij (q × 1) a design vector for the random effects and η_ij = x_ij^T β + z_ij^T D_σ u_i the Linear Predictor. With g representing a monotonic link function that maps the linear predictor to the conditional expectation of the outcome, the GLMM can be formulated as follows:

\[
g(\mu_{ij}) = \eta_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T D_\sigma \mathbf{u}_i.
\]

Let f denote the Probability Density Function (PDF) or Probability Mass Function (PMF) evaluated at the observed outcomes y_ij, conditioned on the vectors x_ij, u_i and assumed to follow conditional exponential families:

\[
f_\theta(\mathbf{y}_i \mid \mathbf{u}_i) := \prod_{j=1}^{n_i} f(y_{ij} \mid \mathbf{u}_i, \mathbf{x}_{ij}; \theta, \phi) = \exp\left\{\sum_{j=1}^{n_i}\left[\frac{y_{ij}\,\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij}, \phi)\right]\right\}
\]

for φ a nuisance dispersion parameter, ξ_ij = ξ(η_ij) the so-called canonical parameter (when φ is known), and with b, the cumulant function, characterizing the conditional means and variances of the outcomes, e.g. μ_ij = E[Y_ij | u_i] = b'(ξ_ij) and ν_ij = v(μ_ij) = Var[Y_ij | u_i] = φ b''(ξ_ij), and c denoting a specific function. In what follows, and without loss of generality, we consider the link g to be the canonical link, in other words μ_ij = b'(η_ij), implying ξ_ij = η_ij.
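As a concrete instance (standard exponential-family algebra, spelled out here for illustration), the Bernoulli case underlying the Mixed Logit model of Listing 1.2 has φ = 1 and cumulant b(ξ) = log(1 + e^ξ), so that:

\[
\mu_{ij} = b'(\xi_{ij}) = \frac{1}{1 + e^{-\eta_{ij}}},
\qquad
v(\mu_{ij}) = b''(\xi_{ij}) = \mu_{ij}\,(1 - \mu_{ij}),
\]

recovering the inverse logit mapping used in the TMB template.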

The expressions of the marginal PDF/PMF f_θ(y_ij) are obtained after integration of the random effects from the joint distribution of [y_ij, u_i^T]^T. Using ϕ to denote the density of the standard multivariate normal random vector u_i, and with the assumptions on the independence between observational units, the Likelihood contributions are given by multivariate integrals of the form:

\[
f_\theta(\mathbf{y}_i) = \int_{\mathbb{R}^q} f_\theta(\mathbf{y}_i \mid \mathbf{u}_i)\,\varphi(\mathbf{u}_i)\,d\mathbf{u}_i.
\]

Moreover, taking advantage of the expression of the multivariate standard normal density and the structure of the exponential family, defining the functions ℓ_i as:

\[
\ell_i(\theta, \mathbf{u}_i) := -\frac{1}{n_i}\left[\sum_{j=1}^{n_i} \log f_\theta(y_{ij} \mid \mathbf{u}_i) - \frac{1}{2}\,\mathbf{u}_i^T\mathbf{u}_i\right]
\]

and using the conventions on integral notation, the joint PDF/PMF contributions can be written in the following form:

\[
L_i(\theta, \mathbf{u}_i) = (2\pi)^{-q/2} \exp\left[-n_i\,\ell_i(\theta, \mathbf{u}_i)\right]. \tag{2.3}
\]
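Expression (2.3) is the form on which a Laplace approximation of the marginal likelihood operates. As a standard sketch (a well-known result, stated here for context rather than taken from this excerpt), with $\hat{\mathbf{u}}_i = \arg\min_{\mathbf{u}} \ell_i(\theta, \mathbf{u})$ and $H_i = \partial^2 \ell_i(\theta, \mathbf{u})/\partial\mathbf{u}\,\partial\mathbf{u}^T$ evaluated at $\hat{\mathbf{u}}_i$:

\[
f_\theta(\mathbf{y}_i) = (2\pi)^{-q/2}\int_{\mathbb{R}^q} e^{-n_i\,\ell_i(\theta, \mathbf{u})}\,d\mathbf{u}
\;\approx\; n_i^{-q/2}\,\bigl|H_i\bigr|^{-1/2}\, e^{-n_i\,\ell_i(\theta, \hat{\mathbf{u}}_i)},
\]

which is the approximation automated by TMB for templates such as Listings 1.1 and 1.2 when the random effects are integrated out.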