
1.8 Appendix

1.8.5 TMB templates for LAWLL

LMM

Listing 1.1: TMB template for the Weighted Joint Likelihood of the LMM

// Gaussian Linear Mixed Model with Random Slope and Random Intercept.
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_FACTOR(group);
  DATA_VECTOR(y);
  DATA_MATRIX(X);
  DATA_VECTOR(t);          // Variable associated with the Random Slope
  DATA_VECTOR(w);          // Random Weight
  PARAMETER_VECTOR(u);     // Intercept for a given individual
  PARAMETER_VECTOR(v);     // Slope for a given individual
  PARAMETER_VECTOR(beta);  // length-four, one for each of the covariates
  PARAMETER_VECTOR(sigma); // one for intercept, one for slope, one for the residual

  int nobs = y.size();
  int ngroups = u.size();
  Type res(0.0);

  vector<Type> mu = X * beta;

  /* Weighted Distribution of Intercept and Slope */
  for (int j = 0; j < ngroups; j++) {
    res -= w[j] * dnorm(u[j], Type(0), sigma[0], 1);
    res -= w[j] * dnorm(v[j], Type(0), sigma[1], 1);
  }

  /* Joint Weighted Distribution */
  for (int i = 0; i < nobs; i++) {
    int j = group[i];
    res -= w[j] * dnorm(y[i], Type(mu[i] + u[j] + t[i] * v[j]), sigma[2], 1);
  }
  return res;
}

GLMM

Listing 1.2: TMB template for the Weighted Joint Likelihood of the Mixed Logit Model

// Mixed Logit Regression model with Random Intercept.
#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_FACTOR(group);
  DATA_VECTOR(y);
  DATA_MATRIX(X);
  DATA_VECTOR(w);          // Random Weight
  PARAMETER_VECTOR(u);     // Intercept for a given individual
  PARAMETER_VECTOR(beta);  // length-four, one for each of the covariates
  PARAMETER(sigma);        // one for the intercept

  int nobs = y.size();
  int ngroups = u.size();
  Type res(0.0);

  vector<Type> mu = X * beta;

  /* Weighted Distribution of the Intercept */
  for (int j = 0; j < ngroups; j++) {
    res -= w[j] * dnorm(u[j], Type(0), sigma, 1);
  }

  /* Joint Weighted Distribution */
  for (int i = 0; i < nobs; i++) {
    int j = group[i];
    res -= w[j] * dbinom(y[i], Type(1), Type(1 / (1 + exp(-mu[i] - u[j]))), 1);
  }

  return res;
}

Chapter 2

Bootstrapping GLMM to Estimate the Uncertainty in Prediction

2.1 Introduction

In many applications of Mixed Models, it is of interest to provide an “estimate” of the value of the random effects, be it for forecasting purposes or to assess the quality of a particular fit, e.g. by performing some sort of residual analysis post-estimation. The process of providing such values is customarily called Point Prediction of the Random Effects, a denomination used to distinguish it from the Estimation of the Model Parameters.

The problem of Prediction of Random Effects has been widely explored in the literature on Gaussian GLMM, or Linear Mixed Models (LMM), a setting in which theoretical results have led to the determination of the Best Linear Unbiased Predictor (BLUP) under full knowledge of the model parameters, and its Empirical version (EBLUP) when the parameters are estimated. Naturally, this approach has analogues in the Non-Gaussian framework in the form of the Best Predictor (BP) and the EBP, often approximated by Conditional Modes obtained with an Empirical Bayes approach, see e.g. Morris [1983], Tierney and Kadane [1986].
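As an illustration of the BLUP in the simplest setting (a standard textbook result, stated here for a random-intercept LMM with known parameters, not a formula taken from this manuscript), the predictor shrinks the unit-level average residual towards zero:

\[
\tilde{u}_i \;=\; \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2/n_i}\,\frac{1}{n_i}\sum_{j=1}^{n_i}\bigl(y_{ij} - \mathbf{x}_{ij}^T\boldsymbol{\beta}\bigr),
\]

so units with few observations (small $n_i$) are shrunk more aggressively towards zero; the EBLUP is obtained by replacing $(\boldsymbol{\beta}, \sigma_u^2, \sigma_e^2)$ with their estimates.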

Similarly to the estimation problem, where point estimates are provided alongside their standard errors for inferential purposes, it is useful to retrieve a measure of uncertainty of the point predictions, e.g. to classify observational units according to “significant” differences in their predicted response or to construct prediction intervals for new observations drawn from a given unit. In LMM, this translates into the computation of the Mean Squared Error of the Prediction (MSEP), often estimated by means of second-order correct approximations that take into account the uncertainty due to the parameter estimation, such as those proposed by Kackar and Harville [1984], Prasad and Rao [1990], Datta and Lahiri [2000] and Das et al. [2004]. In a more general framework, it is customary to report estimates of the Conditional Variances (CV) resulting from the Bayesian outlook on the GLMM, with the addition of corrections that account for the added variability of the estimation of the model parameters, see e.g. Kass and Steffey [1989], Booth and Hobert [1998], Singh et al. [1998]. The computation of these measures can also be undertaken with resampling methods such as the Jackknife approach to the computation of the MSEP [Jiang et al., 2002] or the more widespread Parametric Bootstrap (PB) method, used to produce estimates of the MSEP, see for instance Butar and Lahiri [2003], or to build Prediction Intervals, as in the works of Butar and Lahiri [2003], Hall and Maiti [2006], Chatterjee et al. [2008].
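To fix ideas, the MSEP of a predictor $\tilde{u}_i$ of the random effect $u_i$, and the decomposition exploited by the second-order corrections cited above, can be written as follows (a standard formulation in the spirit of Kackar and Harville [1984], spelled out here for illustration):

\[
\mathrm{MSEP}(\tilde{u}_i) = \mathbb{E}\bigl[(\tilde{u}_i - u_i)^2\bigr],
\qquad
\mathrm{MSEP}\bigl(\tilde{u}_i(\hat{\theta})\bigr) = \mathrm{MSEP}\bigl(\tilde{u}_i(\theta)\bigr) + \mathbb{E}\bigl[\bigl(\tilde{u}_i(\hat{\theta}) - \tilde{u}_i(\theta)\bigr)^2\bigr],
\]

where the second term captures the added uncertainty due to parameter estimation; the decomposition holds exactly under normality and suitable symmetry conditions on the estimator of $\theta$.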

This manuscript contributes to the literature on uncertainty estimation in prediction by means of non-parametric bootstrap methods, specifically by proposing algorithms based on the Random Weighted Laplace Bootstrap (RWLB), applicable to the entire GLMM class, for the computation and correction of the MSEP and CV. This proposal is then compared to adaptations of other schemes such as the so-called Random Effect Bootstrap (REB), see e.g. Davison and Hinkley [1997], Carpenter et al. [2003], Field et al. [2008], and the more widespread PB alternatives.

The article is structured as follows: in Section 2.2, we set up the notation of the GLMM, characterize the special case of the LMM (Section 2.2.1) and summarize the problem of prediction of random effects (Section 2.2.2). Section 2.3 contains an overview of two methods for the evaluation and estimation of the uncertainty in prediction, namely the approach via the MSEP (Section 2.3.1) and the Empirical Bayes approach relying on the CV (Section 2.3.2). We briefly review some resampling schemes for the LMM in Section 2.3.3, highlight or propose adaptations for the estimation of uncertainty in the Non-Gaussian context, and formulate our proposals based on the RWLB scheme. Finally, Section 2.4 contains two simulation studies as a basis of comparison of the different methods, one carried out on an LMM (Section 2.4.1) and a second one in a Mixed Logit context (Section 2.4.2).

2.2 Model and Notation

Let i = 1, . . . , n denote the index of the observational unit and j = 1, . . . , n_i the index of an observation within this unit. Write θ = [β^T, σ^T]^T (d × 1) to denote the vector of model parameters, where β (p × 1) represents the fixed effect parameters and σ (s × 1) contains the parameters associated with the random effects, sometimes referred to as Variance Components, and d = p + s. Write y_ij to denote the observed outcomes, assumed to be independently drawn from an exponential family when conditioned on a vector of covariates x_ij (p × 1) and a vector of random effects γ_i (q × 1) following a N_q(0, Σ_σ) distribution, endowed with a positive-definite symmetric covariance matrix Σ_σ. For notational simplicity, we will consider the reparametrization γ_i = D_σ u_i resulting from the Cholesky decomposition Σ_σ = D_σ D_σ^T, where the u_i are multivariate standard normal vectors. Let μ_ij denote the conditional expectation of the outcome, z_ij (q × 1) a design vector for the random effects and η_ij = x_ij^T β + z_ij^T D_σ u_i the Linear Predictor. With g representing a monotonic link function that maps the linear predictor to the conditional expectation of the outcome, the GLMM can be formulated as follows:

\[
g(\mu_{ij}) = \eta_{ij} = \mathbf{x}_{ij}^T\boldsymbol{\beta} + \mathbf{z}_{ij}^T D_\sigma \mathbf{u}_i.
\]

Let f denote the Probability Density Function (PDF) or Probability Mass Function (PMF) evaluated at the observed outcomes y_ij, conditioned on the vectors x_ij, u_i and assumed to follow conditional exponential families:

\[
f_\theta(\mathbf{y}_i \mid \mathbf{u}_i) := \prod_{j=1}^{n_i} f(y_{ij} \mid \mathbf{u}_i, \mathbf{x}_{ij}; \theta, \phi) = \exp\left\{\sum_{j=1}^{n_i}\left[\frac{y_{ij}\,\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij}, \phi)\right]\right\}
\]

for φ a nuisance dispersion parameter, ξ_ij = ξ(η_ij) the so-called canonical parameter (when φ is known), and with b, the cumulant function, characterizing the conditional means and variances of the outcomes, e.g. μ_ij = E[Y_ij | u_i] = b'(ξ_ij) and ν_ij = v(μ_ij) = Var[Y_ij | u_i] = φ b''(ξ_ij), and c denoting a specific function. In what follows, and without loss of generality, we consider the link g to be the canonical link, in other words μ_ij = b'(η_ij), implying ξ_ij = η_ij.
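As a concrete instance (standard exponential-family algebra, spelled out here for illustration), the Bernoulli case underlying the Mixed Logit model of Listing 1.2 has φ = 1 and cumulant b(ξ) = log(1 + e^ξ), so that:

\[
\mu_{ij} = b'(\xi_{ij}) = \frac{1}{1 + e^{-\eta_{ij}}},
\qquad
v(\mu_{ij}) = b''(\xi_{ij}) = \mu_{ij}\,(1 - \mu_{ij}),
\]

recovering the inverse logit mapping used in the TMB template.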

The expressions of the marginal PDF/PMF f_θ(y_ij) are obtained after integration of the random effects from the joint distribution of [y_ij, u_i^T]^T. Using ϕ to denote the density of the standard multivariate normal random vector u_i, and with the assumptions on the independence between observational units, the Likelihood contributions are given by multivariate integrals of the form:

\[
f_\theta(\mathbf{y}_i) = \int_{\mathbb{R}^q} f_\theta(\mathbf{y}_i \mid \mathbf{u}_i)\,\varphi(\mathbf{u}_i)\,d\mathbf{u}_i.
\]

Moreover, taking advantage of the expression of the multivariate standard normal density and the structure of the exponential family, defining the functions ℓ_i as:

\[
\ell_i(\theta, \mathbf{u}_i) := -\frac{1}{n_i}\left[\sum_{j=1}^{n_i} \log f_\theta(y_{ij} \mid \mathbf{u}_i) - \frac{1}{2}\,\mathbf{u}_i^T\mathbf{u}_i\right]
\]

and using the conventions on integral notation, the joint PDF/PMF contributions can be written in the following form:

\[
L_i(\theta, \mathbf{u}_i) = (2\pi)^{-q/2} \exp\left[-n_i\,\ell_i(\theta, \mathbf{u}_i)\right]. \tag{2.3}
\]
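Expression (2.3) is the form on which a Laplace approximation of the marginal likelihood operates. As a standard sketch (a well-known result, stated here for context rather than taken from this excerpt), with $\hat{\mathbf{u}}_i = \arg\min_{\mathbf{u}} \ell_i(\theta, \mathbf{u})$ and $H_i = \partial^2 \ell_i(\theta, \mathbf{u})/\partial\mathbf{u}\,\partial\mathbf{u}^T$ evaluated at $\hat{\mathbf{u}}_i$:

\[
f_\theta(\mathbf{y}_i) = (2\pi)^{-q/2}\int_{\mathbb{R}^q} e^{-n_i\,\ell_i(\theta, \mathbf{u})}\,d\mathbf{u}
\;\approx\; n_i^{-q/2}\,\bigl|H_i\bigr|^{-1/2}\, e^{-n_i\,\ell_i(\theta, \hat{\mathbf{u}}_i)},
\]

which is the approximation automated by TMB for templates such as Listings 1.1 and 1.2 when the random effects are integrated out.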