Working Paper

Reference

FLORES AGREDA, Daniel Antonio, CANTONI, Eva. Bootstrap estimation of uncertainty in prediction for generalized linear mixed models. 2017

Available at: http://archive-ouverte.unige.ch/unige:100298

Bootstrap Estimation of Uncertainty in Prediction for Generalized Linear Mixed Models

Daniel Flores Agreda, Eva Cantoni

Research Center for Statistics and Geneva School of Economics and Management, University of Geneva, Switzerland

November 17, 2017

Abstract

In this article, we focus on the estimation of the Mean Squared Error for the Predictors (MSEP) of Random Effects (RE) in Generalized Linear Mixed Models (GLMM) by means of non-parametric bootstrap methods. In the frequentist paradigm, the MSEP is used as a measure of the uncertainty in prediction and has been shown to be affected by the estimation of the model parameters. In the particular case of linear mixed models (LMM), two solutions are provided to practitioners: on one hand, second-order correct approximations which yield estimators of this quantity and, on the other hand, several Parametric Bootstrap algorithms. We propose a non-parametric bootstrap scheme, consisting of an adaptation of the Random Weighted Laplace Bootstrap (RWLB), that can be used in the entire class of GLMM. In a first stage, the RWLB is used to generate bootstrap replicates of the parameters while, in a second stage, simulation is used to generate bootstrap samples of standardized RE. We conduct a first simulation study in the framework of Gaussian LMM to contrast the quality of our approach with respect to: (i) analytical estimators of MSEP based on approximations, (ii) Conditional Variances obtained with a Bayesian representation and (iii) other bootstrap schemes, on the grounds of their relative bias, relative efficiency and the coverage ratios of resulting prediction intervals. A second simulation study serves to illustrate the use and benefits of our proposal against other feasible alternatives in a pure, Non-Gaussian, GLMM setting.

Keywords: Bootstrap; GLMM; Prediction; Random Effects; MSEP; Laplace Approximation


1 Introduction

In many applications of Mixed Models, it is of interest to provide an “estimate” of the value of the random effects, be it for forecasting purposes or to assess the quality of a particular fit, e.g. by performing some sort of residual analysis post-estimation. The process of providing such values is customarily called Point Prediction of the Random Effects, a denomination used to distinguish it from the Estimation of the Model Parameters.

The problem of Prediction of Random Effects has been widely explored in the literature of Gaussian GLMM or Linear Mixed Models (LMM), a setting for which theoretical results have led to the determination of the Best Linear Unbiased Predictor (BLUP) in full knowledge of the model parameters, and its Empirical version (EBLUP) when the parameters are estimated. Naturally, this approach has analogues in the Non-Gaussian framework in the form of the Best Predictor (BP) and EBP, often approximated by Conditional Modes obtained with an Empirical Bayes approach, see e.g. Morris (1983), Tierney & Kadane (1986).

Similarly to the estimation problem, where point estimates are provided alongside their standard errors for inferential purposes, it is useful to retrieve a measure of uncertainty of the point predictions, e.g. to classify observational units according to “significant” differences in their predicted response or to construct prediction intervals for new observations drawn from a given unit. In LMM, this translates into the computation of the Mean Squared Error of the Prediction (MSEP), often estimated by means of second-order correct approximations that take into account the uncertainty due to the parameter estimation, such as those proposed by Kackar & Harville (1984), Prasad & Rao (1990), Datta & Lahiri (2000) and Das et al. (2004). In a more general framework, it is customary to report estimates of the Conditional Variances (CV) resulting from the Bayesian outlook on the GLMM, with the addition of corrections that account for the added variability of the estimation of the model parameters, see e.g. Kass & Steffey (1989), Booth & Hobert (1998), Singh et al. (1998). The computation of these measures could also be undertaken with the use of resampling methods, such as the Jackknife approach to the computation of MSEP (Jiang et al., 2002) or the more widespread Parametric Bootstrap (PB) method, used to produce estimates of MSEP, see for instance Butar & Lahiri (2003), or to build Prediction Intervals, as seen in works by Butar & Lahiri (2003); Hall & Maiti (2006); Chatterjee et al. (2008).

This manuscript provides a contribution to the literature on uncertainty estimation in prediction by means of non-parametric bootstrap methods, specifically by proposing algorithms based on the Random Weighted Laplace Bootstrap (RWLB), applicable in the entire GLMM class, for the computation and correction of MSEP and CV. This proposal is then compared to adaptations of other schemes such as the so-called Random Effect Bootstrap (REB), see e.g. Davison & Hinkley (1997); Carpenter et al. (2003); Field et al. (2008), and the more widespread PB alternatives.

The article is structured as follows: In Section 2, we set up the notation of the GLMM, characterize the special case of LMM (Section 2.1) and summarize the problem of prediction of random effects (Section 2.2). Section 3 contains an overview of two methods for the evaluation and estimation of the uncertainty in prediction, namely the approach via the MSEP (Section 3.1) and the Empirical Bayes approaches relying on CV (Section 3.2). We briefly review some resampling schemes for LMM in Section 3.3, highlight or propose adaptations to the estimation of uncertainty in the Non-Gaussian context and formulate our proposals based on the RWLB scheme. Finally, Section 4 contains two simulation studies as a basis of comparison of the different methods, one carried out on a LMM (Section 4.1) and a second one in a Mixed Logit context (Section 4.2).

2 Model and Notation

Let $i = 1, \dots, n$ denote the index of the observational unit and $j = 1, \dots, n_i$ the index of an observation within this unit. Write $\theta = [\beta^T, \sigma^T]^T$ $(d \times 1)$ to denote the vector of model parameters, where $\beta$ $(p \times 1)$ represents the fixed effect parameters and $\sigma$ $(s \times 1)$ contains the parameters associated with the random effects, sometimes referred to as Variance Components, and $d = p + s$. Write $y_{ij}$ to denote the observed outcomes, assumed to be independently drawn from an exponential family when conditioned on a vector of covariates $x_{ij}$ $(p \times 1)$ and a vector of random effects $\gamma_i$ $(q \times 1)$ following a $\mathcal{N}_q(0, \Delta_\sigma)$ distribution, endowed with a positive-definite symmetric covariance matrix $\Delta_\sigma$. For notational simplicity, we will consider the reparametrization $\gamma_i = D_\sigma u_i$ resulting from the Cholesky decomposition $\Delta_\sigma = D_\sigma D_\sigma^T$, where the $u_i$ are multivariate standard normal vectors. Let $\mu_{ij}$ denote the conditional expectation of the outcome, $z_{ij}$ $(q \times 1)$ a design vector for the random effects and $\eta_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i$ the Linear Predictor. With $g$ representing a monotonic link function that maps the linear predictor to the conditional expectation of the outcome, the GLMM can be formulated as follows:

$$g(\mu_{ij}) = \eta_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i.$$

Let $f$ denote the Probability Density Function (PDF) or Probability Mass Function (PMF) evaluated at the observed outcomes $y_{ij}$, conditioned on the vectors $x_{ij}$, $u_i$ and assumed to follow conditional exponential families:

$$f_\theta(y_{ij} \mid u_i) := f(y_{ij} \mid u_i, x_{ij}; \theta, \phi) = \exp\left\{ \frac{y_{ij}\xi_{ij} - b(\xi_{ij})}{\phi} + c(y_{ij}, \phi) \right\},$$

for $\phi$ a nuisance dispersion parameter, $\xi_{ij} = \xi(\eta_{ij})$ the so-called canonical parameter (when $\phi$ is known) and with $b$, the cumulant function, characterizing the conditional means and variances of the outcomes, e.g. $\mu_{ij} = E[Y_{ij} \mid u_i] = b'(\xi_{ij})$ and $\nu_{ij} = v(\mu_{ij}) = \mathrm{Var}[Y_{ij} \mid u_i] = \phi\, b''(\xi_{ij})$, and $c$ denoting a specific function. In what follows, and without loss of generality, we consider the link $g$ to be the canonical link, in other words $\mu_{ij} = b'(\eta_{ij})$, implying $\xi_{ij} = \eta_{ij}$.
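As a concrete instantiation of this notation (our own illustration, not an example from the original text), consider conditionally Poisson outcomes with the canonical log link: there $\phi = 1$, $b(\xi) = e^{\xi}$ and $c(y, \phi) = -\log(y!)$, so that $\mu_{ij} = b'(\eta_{ij}) = e^{\eta_{ij}}$ and $\nu_{ij} = v(\mu_{ij}) = b''(\eta_{ij}) = \mu_{ij}$.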

The expressions of the marginal PDF/PMF $f_\theta(y_{ij})$ are obtained after integration of the random effects from the joint distribution of $[y_{ij}, u_i^T]^T$. Using $\varphi$ to denote the density of the standard multivariate normal random vector $u_i$ and with the assumptions on the independence between observational units, the Likelihood contributions are given by multivariate integrals of the form:

$$L_i(\theta) := \int_{\mathbb{R}^q} \left[ \prod_{j=1}^{n_i} f_\theta(y_{ij} \mid u_i) \right] \varphi(u_i)\, du_i = \int_{\mathbb{R}^q} \mathcal{L}_i(\theta, u_i)\, du_i. \tag{1}$$
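Since the integral in (1) rarely has a closed form, a plain Monte-Carlo average over draws of $u_i$ can make the definition concrete. The following minimal R sketch is our own illustration (the cluster data, trial parameter values and number of draws are invented) and estimates one likelihood contribution for a random-intercept logit GLMM with $q = 1$:

```r
# Monte-Carlo estimate of L_i(theta), equation (1), for one cluster of a
# random-intercept logit GLMM; all values below are illustrative.
set.seed(42)
y_i  <- c(1, 0, 1, 1)                  # outcomes of cluster i
x_i  <- c(-1.2, 0.3, 0.8, 1.5)         # a single covariate
beta <- c(0.2, 0.5); sigma <- 0.9      # trial parameter values
M    <- 1e5
u    <- rnorm(M)                       # draws of the standardized random effect
eta  <- outer(u, x_i, function(u, x) beta[1] + beta[2] * x + sigma * u)
p    <- plogis(eta)                    # M x n_i conditional probabilities
logf <- dbinom(matrix(y_i, M, length(y_i), byrow = TRUE), 1, p, log = TRUE)
L_i  <- mean(exp(rowSums(logf)))       # MC average of prod_j f(y_ij | u_i)
```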

Moreover, taking advantage of the expression of the multivariate standard normal density and the structure of the exponential family, defining functions $\ell_i$ as:

$$\ell_i(\theta, u_i) := -\frac{1}{n_i} \left\{ \sum_{j=1}^{n_i} \left[ \frac{y_{ij}\eta_{ij} - b(\eta_{ij})}{\phi} + c(y_{ij}, \phi) \right] - \frac{1}{2} u_i^T u_i \right\}, \tag{2}$$

and using the conventions on integral notation, the joint PDF/PMF contributions can be written in the following form:

$$\mathcal{L}_i(\theta, u_i) = (2\pi)^{-q/2} \exp\left[-n_i \ell_i(\theta, u_i)\right]. \tag{3}$$

2.1 Special Case: Linear Mixed Models

We will exemplify some of the procedures in the framework of LMM, a case that arises when the outcomes $y_{ij}$ are conditionally distributed as $\mathcal{N}(\mu_{ij}, \phi)$ with the identity function as link, yielding $\mu_{ij} = x_{ij}^T\beta + z_{ij}^T D_\sigma u_i$. In order to simplify the notation, we shall often refer to the matrix representation of the LMM, which requires the definition of the random error components $\epsilon_{ij}$ following a $\mathcal{N}(0, \phi)$ distribution. Outcomes can then be concatenated into conditionally independent Normal vectors $y_i = [y_{ij}]_{j=1}^{n_i}$ following $\mathcal{N}_{n_i}(\mu_i, \phi I_{n_i})$ with means $\mu_i = [\mu_{ij}]_{j=1}^{n_i}$, yielding the characterization:

$$y_i = X_i\beta + Z_i D_\sigma u_i + \epsilon_i, \tag{4}$$

where $X_i = [x_{ij}^T]_{j=1}^{n_i}$ $(n_i \times p)$ and $Z_i = [z_{ij}^T]_{j=1}^{n_i}$ $(n_i \times q)$, a representation that is often referred to as the Longitudinal LMM.

A particular LMM, that we shall call Random Effect Model throughout this manuscript, arises when the random structure of the model can be decomposed into various independent sources of variation up to the level of the observational unit, implying the partition of both $Z_i$ into $Z_i = [Z_{i1}, \dots, Z_{iq}]$ and $u_i = [u_{ir}]_{r=1}^{q}$, where $Z_{ir}$ is $(n_i \times q_{ir})$ and $u_{ir}$ is $(q_{ir} \times 1)$. By employing the conventions $\sigma_0 = \phi^{1/2}$ and $Z_{i0} = I_{n_i}$, Model (4) can be written as follows:

$$y_i = X_i\beta + \sum_{r=0}^{q} \sigma_r Z_{ir} u_{ir}, \tag{5}$$

which can be carefully arranged according to source of variation $u_r = [u_{ir}]_{i=1}^{n}$ $(q_r \times 1)$ for $q_r = \sum_{i=1}^{n} q_{ir}$, as well as the design matrices $Z_r = \mathrm{diag}[Z_{ir}]_{i=1}^{n}$ $(N \times q_r)$ with $N = \sum_{i=1}^{n} n_i$, and $X = [X_i]_{i=1}^{n}$, to yield a representation which can be seen as a particular case of the so-called Variance Components Model (Searle et al., 1992):

$$y = X\beta + \sum_{r=0}^{q} \sigma_r Z_r u_r. \tag{6}$$

It is straightforward to see that representations (4) and (6) can be deemed as cases of a General LMM,

$$y = X\beta + ZDu + \epsilon, \tag{7}$$

differing only in the structure of $Z$ and $\Delta = DD^T$ (e.g. $D = \mathrm{blockdiag}(D_\sigma)_{i=1}^{n}$, $Z = \mathrm{blockdiag}(Z_i)_{i=1}^{n}$ in the Longitudinal LMM). As pointed out in the literature (see for example McCulloch & Searle (2001)), the assumptions of normality for the conditional distribution of the outcomes imply that their marginal distribution is also $\mathcal{N}_N(X\beta, \Sigma)$, with $\Sigma = Z\Delta Z^T + \phi I_N$.

2.2 Point Predictors of Random Effects

Following classic references on the subject (e.g. McCulloch & Searle 2001, Ch. 9), we shall write $u(\theta, y)$ to denote a Predictor of the random effects $u$, i.e. a function of the data and the parameters $\theta$ providing the point prediction of the vector of random effects. With this notation we can recall that the function that minimizes the Mean Squared Error of Prediction, defined as $\mathrm{MSEP}[u(\theta, y), u] := E\{[u(\theta, y) - u]^T[u(\theta, y) - u]\}$, is the conditional expectation of the random effects given the outcome, and this regardless of the distribution of the latter (see e.g. Jiang 2007, Sec. 2.3). This function hence constitutes the Best Predictor (BP) of $u$, which we shall represent throughout this manuscript as follows:

$$\hat{u}(\theta, y) = E[u \mid y]. \tag{8}$$

Since clusters in a GLMM are considered independent, it suffices to use the data contained in a cluster to compute a prediction for $u_i$. This consideration entails the following expression of the BP as a ratio of integrals:

$$\hat{u}_i(\theta, y_i) = \frac{1}{L_i(\theta)} \int_{\mathbb{R}^q} u_i\, \mathcal{L}_i(\theta, u_i)\, du_i, \tag{9}$$

for $L_i$ and $\mathcal{L}_i$ defined in equations (1) and (3) respectively, a representation that lacks a closed-form expression aside from particular models. One of the latter cases is the framework of LMM, where the properties of Gaussian vectors allow to determine that the joint distribution of $[y_i^T, u_i^T]^T$ is Gaussian, implying that the conditional distribution of $u_i \mid y_i$ is also Gaussian and endowed with the following expectation and variance:

$$E[u_i \mid y_i] = D_\sigma Z_i^T \Sigma_i^{-1}[y_i - X_i\beta] \tag{10}$$
$$\mathrm{Var}[u_i \mid y_i] = I - D_\sigma^T Z_i^T \Sigma_i^{-1} Z_i D_\sigma, \tag{11}$$

yielding therefore the formulation of the Best Linear Predictor or BLP of $u_i$ (see e.g. Henderson 1950, Robinson 1991):

$$\hat{u}_i(\theta, y_i) := D_\sigma Z_i^T \Sigma_i^{-1}[y_i - X_i\beta]. \tag{12}$$

In other cases, the integrals composing the BP (9) are approximated by means of various methods, such as Monte Carlo (MC) or Adaptive Gaussian Quadrature (AGQ). In particular, the approximation via the Laplace method for integrals leads to simplifications that allow the approximation of the BP by the Conditional Mode (CM), defined as:

$$\tilde{u}_i(\theta, y_i) := \operatorname*{argmax}_{u} \{-n_i \ell_i(\theta, u_i)\}, \tag{13}$$

with a relative approximation error of $O(n_i^{-1})$ (see e.g. Kass & Steffey 1989; Tierney et al. 1989; Booth & Hobert 1998).
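To fix ideas, when $q = 1$ the CM in (13) can be obtained by a one-dimensional optimization of the joint log-density. The R sketch below is our own toy illustration, with invented data and parameter values (not code from the paper), for a random-intercept logit GLMM:

```r
# Conditional-mode prediction (13) for one cluster of a random-intercept
# logit GLMM (q = 1); data and parameter values are illustrative.
y_i  <- c(1, 0, 1, 1)
x_i  <- c(-1.2, 0.3, 0.8, 1.5)
beta <- c(0.2, 0.5); sigma <- 0.9

# h(u) = minus the log joint density of (y_i, u_i), up to additive constants,
# i.e. n_i * l_i(theta, u_i) of equation (2)
h <- function(u)
  -sum(dbinom(y_i, 1, plogis(beta[1] + beta[2] * x_i + sigma * u), log = TRUE)) +
  u^2 / 2
u_tilde <- optimize(h, c(-10, 10))$minimum   # the CM, maximizing -n_i * l_i
```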

In practical applications, the parameters in expressions (12), (13) or other approximations of (9) are customarily replaced with estimates after model fit, yielding Empirical versions of the respective predictors. In the case of the BP, the result of this procedure is customarily named the Empirical Best Predictor (EBP), yet we shall refer to it as the Empirical Predictor (EP), since there is no guarantee that it minimizes the MSEP (hence, no guarantee of it being “Best”).

Throughout this manuscript, and like much of the literature on the subject, we shall focus on the prediction of a linear combination of the fixed and random effects, or mixed effect, $\vartheta_i(\theta) = \lambda_\beta^T\beta + \lambda_u^T D_\sigma u_i$, which is naturally provided by the combination of estimates $\hat{\theta}$ and predictions $\hat{u}_i(\hat{\theta}, y_i)$.

3 Measures of Uncertainty in Prediction of Random Effects

In what follows, we review two competing methods to account for the uncertainty in the prediction of random effects, namely, the estimation of the Mean Squared Error of the Predictor (MSEP) and the computation of Conditional Variances (CV).

3.1 Estimation of MSEP via Approximations

In the Gaussian LMM context it is straightforward to show that the MSEP can be decomposed into various terms according to the degree of knowledge of the model parameters (Harville, 1985). Let $\hat{\vartheta}_i(\sigma)$ denote the Best Linear Unbiased Predictor (BLUP), i.e. the value of the BLP when $\sigma$ is known and $\beta$ is replaced by its Best Linear Unbiased Estimator (BLUE) $\hat{\beta}(\sigma) = [X^T\Sigma^{-1}X]^{-1}X^T\Sigma^{-1}y$. With this notational consideration, Kackar & Harville (1984) show that the MSEP of the EP can be characterized as:

$$\mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big] = \mathrm{MSEP}\big[\hat{\vartheta}_i(\sigma)\big] + E\big[\hat{\vartheta}_i(\hat{\sigma}) - \hat{\vartheta}_i(\sigma)\big]^2 + 2E\big\{\big[\hat{\vartheta}_i(\sigma) - \vartheta_i(\sigma)\big]\big[\hat{\vartheta}_i(\hat{\sigma}) - \hat{\vartheta}_i(\sigma)\big]\big\}, \tag{14}$$

and that the last term in (14) vanishes when $\hat{\sigma}$ is an even and translation-invariant estimator.

The first term in this characterization, which corresponds to the MSEP of the BLUP, can be written as a sum of two contributions:

$$g_{1i}(\sigma) := \lambda_u^T \Delta \lambda_u - s(\sigma)^T Z \Delta \lambda_u \tag{15}$$
$$g_{2i}(\sigma) := \big[\lambda_\beta - X^T s(\sigma)\big]^T \big[X^T \Sigma^{-1} X\big]^{-1} \big[\lambda_\beta - X^T s(\sigma)\big], \tag{16}$$

where $s(\sigma) = \Sigma^{-1} Z \Delta \lambda_u$. By defining $g_{3i}(\sigma) := E[\hat{\vartheta}_i(\hat{\sigma}) - \hat{\vartheta}_i(\sigma)]^2$, which is the contribution to the MSEP due to the estimation of $\sigma$, the MSEP of the EP can be written as a threefold decomposition:

$$\mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\sigma})\big] = g_{1i}(\sigma) + g_{2i}(\sigma) + g_{3i}(\sigma).$$
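As a concrete numerical check (our own toy illustration under invented parameter values, with a single cluster so that $X$, $Z$ and $\Sigma$ reduce to their cluster-level blocks), the contributions (15) and (16) can be computed directly in R:

```r
# Plug-in computation of g_1i (15) and g_2i (16) for a random-intercept LMM;
# one cluster of size 4 and toy parameter values.
ni    <- 4
X     <- cbind(1, 1:ni)               # fixed-effect design of the cluster
Z     <- matrix(1, ni, 1)             # random-intercept design
Delta <- matrix(2.0, 1, 1)            # Delta_sigma = Var(gamma_i)
phi   <- 1.5
Sigma <- Z %*% Delta %*% t(Z) + phi * diag(ni)
lam_b <- c(0, 0); lam_u <- 1          # target: the random intercept itself
s     <- solve(Sigma, Z %*% Delta * lam_u)      # s(sigma) = Sigma^{-1} Z Delta lam_u
g1    <- lam_u * Delta * lam_u - t(s) %*% Z %*% Delta * lam_u
d     <- lam_b - t(X) %*% s
g2    <- t(d) %*% solve(t(X) %*% solve(Sigma, X)) %*% d
```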

Kackar & Harville (1984) show that $g_{3i}$ is a $O_p(n^{-1})$ term that lacks a general closed-form expression and propose to approximate it by means of a linear expansion of $\hat{\vartheta}_i(\hat{\sigma})$ around $\sigma$, yielding:

$$g_{3i}^{KH}(\sigma) = E\big\{\big[\partial_\sigma \hat{\vartheta}_i(\sigma)\big](\hat{\sigma} - \sigma)\big\}^2,$$

where $\partial_\sigma \hat{\vartheta}_i(\sigma)$ denotes the partial derivative of $\hat{\vartheta}_i(\sigma)$ with respect to $\sigma$. Further conditions on the estimator $\hat{\sigma}$ allow Prasad & Rao (1990) to justify the independence between the terms within the latter expectation and thus to propose the alternative approximation:

$$g_{3i}^{PR}(\sigma) = \mathrm{tr}\big\{\mathrm{Var}\big[\partial_\sigma \hat{\vartheta}_i(\sigma)\big]\,\mathrm{Var}(\hat{\sigma})\big\},$$

endowed with a $o_p(n^{-1})$ error in the Fay-Herriot Model when $\sigma$ is estimated via the method of Fitting Constants (FC), or Henderson Method 3, which corresponds to the well-known ANOVA estimators. For estimates obtained via Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML) in a Longitudinal LMM, Datta & Lahiri (2000) propose an analogous approximation with $o_p(n^{-1})$ error:

$$g_{3i}^{DL}(\sigma) = n^{-1}\,\mathrm{tr}\big\{[\partial_\sigma s(\sigma)]^T\, \Sigma\, [\partial_\sigma s(\sigma)]\, \mathcal{I}_\sigma^{-1}(\sigma)\big\},$$

where $\mathcal{I}_\sigma(\sigma)$ is the portion of the information matrix $\mathcal{I}_\theta(\theta) = -E[\partial^2_\theta \log L]$ corresponding to the parameters $\sigma$, i.e. $\mathcal{I}_\theta(\theta) = \mathrm{blockdiag}[\mathcal{I}_\beta(\theta), \mathcal{I}_\sigma(\theta)]$. Later works by Das et al. (2004) provide the full set of conditions for estimators based on estimating equations for various cases of LMM and show the validity of the Datta & Lahiri (2000) formula.

On the basis of these approximations, several plug-in estimators of MSEP have been proposed. A first approach, formulated by Prasad & Rao (1990), consists in a plug-in estimator based on their FC estimate $\hat{\sigma}_{FC}$, obtained after determining that the plug-in estimators $g_{2i}(\hat{\sigma}_{FC})$, $g_{3i}(\hat{\sigma}_{FC})$ have a bias of order $o(n^{-1})$ and that $E[g_{1i}(\hat{\sigma}_{FC})] = g_{1i}(\sigma) - g_{3i}(\sigma) + o(n^{-1})$, yielding the estimator:

$$\mathrm{msep}^{PR}\big[\hat{\vartheta}_i(\hat{\sigma}_{FC})\big] = g_{1i}(\hat{\sigma}_{FC}) + g_{2i}(\hat{\sigma}_{FC}) + 2 g_{3i}^{PR}(\hat{\sigma}_{FC}). \tag{17}$$

Datta & Lahiri (2000) assess similar properties of the bias in the plug-in estimates of the contributions $g_{2i}(\hat{\sigma}_{ML})$ and $g_{3i}(\hat{\sigma}_{ML})$, all while showing that $E[g_{1i}(\hat{\sigma}_{ML})] = g_{1i}(\sigma) - g_{3i}(\sigma) + O(n^{-1})$ due to the small-sample bias of $\hat{\sigma}_{ML}$. They further propose to correct this bias by means of the expectation of a linearization of the contribution $g_{1i}$ around $\hat{\sigma}$, thus constructing the following estimator:

$$\mathrm{msep}^{DL}\big[\hat{\vartheta}_i(\hat{\sigma})\big] = g_{1i}(\hat{\sigma}) + g_{2i}(\hat{\sigma}) + 2 g_{3i}^{DL}(\hat{\sigma}) - \big[\partial_\sigma g_{1i}(\hat{\sigma})\big]^T \hat{b}[\hat{\sigma}, \sigma], \tag{18}$$

where $\hat{b}$ is an estimate of the bias $b[\hat{\sigma}, \sigma] = E[\hat{\sigma}] - \sigma$. The validity of these decomposition-based methods and of the resulting approximations and estimators has further been asserted in the context of the general LMM of Equation (7) by Das et al. (2004).

We feel it important to point out that this approximation-based approach has inspired similar proposals in the framework of non-Gaussian GLMM. For instance, in a Mixed Logit setting, Jiang & Lahiri (2001) provide the conditions to approximate the MSEP with the same linearization of the contribution due to the estimation of the model parameters, in the spirit of Kackar & Harville (1984). Simplifications to the expressions of these approximations, using arguments similar to those of Prasad & Rao (1990), can be achieved using estimates of $\theta$ obtained with the method of Simulated Moments (Jiang, 1998a). We refer the interested reader to Jiang (2003) and Jiang (2007) for a summary of the extensions of this approach to the general case, and to Boubeta et al. (2016) for an illustration in a Mixed Poisson setting. We will not provide further treatment of these procedures in this manuscript: implementations of the method of Simulated Moments as an inferential procedure for the parameters of GLMM are scarce, which makes the use of these approximations to the MSEP hardly widespread.

3.2 Computation and Correction of CV

A more predominant approach to the estimation of uncertainty in prediction results from an Empirical Bayes (EB) outlook (see e.g. Morris 1983). Under this approach, and when $\theta$ is known, the uncertainty of the BP can be measured through the Conditional Variance $v_i(\theta) = \mathrm{Var}[\vartheta_i \mid y_i]$, which in the LMM has a closed-form expression resulting from equation (11) and coincides with the contribution $g_{1i}$ of formula (15). The uncertainty due to the estimation of the model parameters $\theta$ can be taken into account by means of prior distributions, a procedure that yields the following decomposition:

$$\mathrm{Var}[\vartheta_i(\theta) \mid y] = E_\theta\{\mathrm{Var}[\vartheta_i(\theta) \mid y]\} + \mathrm{Var}_\theta\{E[\vartheta_i(\theta) \mid y]\}, \tag{19}$$

where the use of subscripts in the expectation and the variance indicates that these values are obtained on the basis of the posterior densities of the model parameters. Per Kass & Steffey (1989), these terms can be approximated as follows:

$$E_\theta\{\mathrm{Var}[\vartheta_i(\theta) \mid y]\} = v_i(\hat{\theta}_{MP}) - \widehat{pb}(\hat{\theta}_{MP}) + o_p(n^{-1}) \tag{20}$$

$$\mathrm{Var}_\theta\{E[\vartheta_i(\theta) \mid y]\} = \mathrm{Var}_\theta\big\{\big[\partial_\theta \hat{\vartheta}_i(\hat{\theta}_{MP})\big]^T(\theta - \hat{\theta}_{MP})\big\} + o_p(n^{-1}) = \big[\partial_\theta \hat{\vartheta}_i(\hat{\theta}_{MP})\big]^T \mathrm{Var}_\theta\big[\hat{\theta}_{MP}\big] \big[\partial_\theta \hat{\vartheta}_i(\hat{\theta}_{MP})\big] + o_p(n^{-1}), \tag{21}$$

where $\hat{\theta}_{MP}$ denotes the Maximum a Posteriori (MP) estimator of $\theta$ and $\widehat{pb}(\hat{\theta}_{MP}) = v_i(\hat{\theta}_{MP}) - E_\theta[v_i(\theta)]$ represents a Posterior Bias of order $O_p(n^{-1})$, in the words of Booth & Hobert (1998). Naturally, assuming non-informative priors allows the substitution of the MP estimates by the ML estimates in (20) and (21), yielding a second-order approximation of the conditional variance (19), provided that the posterior bias in (20) can be corrected.

In LMM, this procedure has analogies with the computation of MSEP, as pointed out by Singh et al. (1998), who show that, when only $\sigma$ is unknown and with uninformative priors:

$$E_\sigma\{\mathrm{Var}[\vartheta_i(\sigma) \mid y]\} = E_\sigma\{g_{1i}(\sigma) + g_{2i}(\sigma)\} = v_i(\hat{\sigma}_{ML}) + O(n^{-1}) = g_{1i}(\hat{\sigma}_{ML}) + g_{2i}(\hat{\sigma}_{ML}) + O(n^{-1}), \tag{22}$$

while (21) has the same interpretation as the third contribution $g_{3i}$, namely, that it computes the contribution due to the uncertainty in the parameters $\sigma$, and has an expression similar to the approximation via linearization in the spirit of Kackar & Harville (1984). In this context, the correction term $\widehat{pb}$ can be computed using a modification of the log-Likelihood, as suggested by Kass & Steffey (1989) and inspired by Tierney & Kadane (1986).


To finalize this overview, we feel it important to point out that this outlook has also inspired a frequentist measure. Booth & Hobert (1998) propose a Conditional version of the MSEP, which can be decomposed into two terms when $\theta$ is known, as in:

$$\mathrm{CMSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big] := E\big\{\big[\hat{\vartheta}_i(\hat{\theta}) - \vartheta_i\big]^2 \mid y_i\big\} = \lambda_u^T D_\sigma v_i(\theta) D_\sigma^T \lambda_u + c_i(\theta, y_i),$$

where the last term $c_i(\theta, y_i) := E\{[\hat{\vartheta}_i(\hat{\theta}) - \hat{\vartheta}_i(\theta)]^2 \mid y_i\}$ is a $O(n^{-1})$ correction term accounting for the contribution of the parameter estimation. They further show the equivalence of their measure and the MSEP in the framework of LMM.

3.3 Bootstrap-Based Measures of the Uncertainty

Given the complex expressions of the estimators of the uncertainty for point predictors and their corrections, a natural alternative way of dealing with this problem is via resampling methods.

3.3.1 Parametric Bootstrap

Parametric Bootstrap (PB) procedures constitute the most intuitive and widespread scheme for bootstrapping GLMM and have therefore been extensively implemented to tackle the estimation of uncertainty measures. To summarize, this scheme relies on parameter estimates and simulation to create bootstrap samples of the random effects, while exploiting the connection between the linear predictor and the expectation of the conditional distribution of the outcome to generate the bootstrap samples $y^*_{ij}$. To illustrate how this scheme has been adapted to the estimation of the uncertainty, we describe the steps of a PB estimation algorithm as follows:

1. Estimation of $\hat{\theta}$;

2. Generation of bootstrap samples $u^*_i$ by simulation from a $\mathcal{N}_q(0, I_q)$ distribution;

3. Construction of bootstrap samples of $\eta^*_{ij} = x_{ij}^T\hat{\beta} + z_{ij}^T D_{\hat{\sigma}} u^*_i$ and $\vartheta^*_i = \lambda_\beta^T\hat{\beta} + \lambda_u^T D_{\hat{\sigma}} u^*_i$;

4. Construction of $\mu^*_{ij} = g^{-1}(\eta^*_{ij})$;

5. Generation of $y^*_{ij}$ by simulation from the conditional distribution $F_{\mu^*_{ij}}$;

6. Model fit over the bootstrap samples, yielding replicates $\hat{\theta}^*$ and $\hat{\vartheta}^*_i$;

7. Repetition of Steps 2–6 $B$ times and estimation of the Uncertainty Measure or the Correction.

Implementations of this basic algorithm in LMM are even more straightforward because of the identity link, which implies $\mu^*_{ij} = \eta^*_{ij}$, and the conditional normality of the response, entailing $y^*_i \mid u^*_i \sim \mathcal{N}_{n_i}(\mu^*_i, \hat{\phi} I_{n_i})$, simplifying Steps 3, 4 and 5 of this general algorithm. Hence, a parametric bootstrap estimate of MSEP can be obtained from the expected squared differences of bootstrap replicates and samples, i.e.

$$\mathrm{msep}^{PB}\big[\hat{\vartheta}_i(\hat{\sigma})\big] = E^*\big[\hat{\vartheta}^*_i - \vartheta^*_i\big]^2, \tag{23}$$

where $E^*$ denotes the Expectation over the Bootstrap distribution.
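To make the algorithm concrete, the following R sketch implements estimator (23) for a random-intercept LMM fitted with lme4 on the Orthodont data of Section 4.1. This is our own minimal illustration (the model formula, the value of B and the variable names are chosen for brevity), not the authors' implementation:

```r
# Parametric-bootstrap MSEP estimate (23) for a random-intercept LMM.
library(lme4)

fit   <- lmer(distance ~ age + (1 | Subject), data = nlme::Orthodont)
beta  <- fixef(fit)
sig_u <- sqrt(unlist(VarCorr(fit))[1])   # random-intercept standard deviation
sig_e <- sigma(fit)                      # residual standard deviation

X   <- model.matrix(fit)
Z   <- model.matrix(~ 0 + Subject, data = nlme::Orthodont)  # RE design matrix
n   <- ncol(Z)                                              # number of clusters
B   <- 200
dif <- matrix(NA_real_, B, n)

for (b in seq_len(B)) {
  u_star <- rnorm(n)                                # Step 2: standardized REs
  y_star <- drop(X %*% beta) + sig_u * drop(Z %*% u_star) +
            rnorm(nrow(X), 0, sig_e)                # Steps 3-5 (identity link)
  fit_b  <- refit(fit, newresp = y_star)            # Step 6: refit the model
  u_hat  <- ranef(fit_b)$Subject[, 1]               # EPs of the random intercepts
  dif[b, ] <- u_hat - sig_u * u_star                # replicate minus sampled effect
}
msep_pb <- colMeans(dif^2)                          # equation (23), per cluster
```

Using `refit()` rather than a full `lmer()` call avoids re-parsing the model at each bootstrap iteration, which is the main computational cost of the scheme.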

Still within the LMM framework and under an Empirical Bayes perspective, Butar & Lahiri (2003) propose an alternative estimator that corrects the $O(n^{-1})$ Posterior Bias of the estimate in equation (22). The procedure consists in generating PB replicates $\hat{\sigma}^*$ of the variance components and plugging them into the expressions of $g_{1i}$, $g_{2i}$, hence providing the following correction:

$$\widehat{pb}(\hat{\sigma}) = E^*\big[g_{1i}(\hat{\sigma}^*) + g_{2i}(\hat{\sigma}^*)\big] - \big[g_{1i}(\hat{\sigma}) + g_{2i}(\hat{\sigma})\big],$$

while the final contribution is estimated by computing the mean of the squared differences between the plug-in replicates of the predictions and the EP, which equates to a bootstrap estimate of the contribution $g_{3i}$, as in:

$$g_{3i}^{BL}(\hat{\sigma}) = \mathrm{Var}^*\big\{\hat{\vartheta}_i(\hat{\sigma}^*)\big\} := E^*\big[\hat{\vartheta}_i(\hat{\sigma}^*) - \hat{\vartheta}_i(\hat{\sigma})\big]^2.$$

The resulting estimator has, therefore, the following expression:

$$\mathrm{var}^{*BL}\big[\hat{\vartheta}_i(\hat{\sigma})\big] = 2\big[g_{1i}(\hat{\sigma}) + g_{2i}(\hat{\sigma})\big] - E^*\big[g_{1i}(\hat{\sigma}^*) + g_{2i}(\hat{\sigma}^*)\big] + g_{3i}^{BL}(\hat{\sigma}). \tag{24}$$

Under some regularity conditions, Butar & Lahiri (2003) conclude that the bias of their estimator is $o(n^{-1})$ (Butar & Lahiri, 2003, Theorem A.2), making it a comparable alternative to the approximation-based estimators of MSEP. Among the more recent proposals that pursue this methodology, we can mention, for instance, the work of Kubokawa & Nagashima (2012), who provide conditions entailing a third-order correct estimator of the contribution $g_{3i}$.

3.3.2 Resampling-based Bootstraps

Procedures relying on resampling with replacement from the vectors of predicted random effects, such as the Random Effect Bootstrap (REB) or the Transformation Bootstrap (TB), have been proposed, see e.g. Davison & Hinkley (1997); McCullagh (2000); Field et al. (2010). To make a concise rundown of these procedures in the LMM, we can mention that the REB requires that the random structure of the model be decomposed into independent sources of variation as in equation (6), whereas the TB reparametrizes the general LMM (7) as $y = X\beta + \Sigma^{1/2}\delta$, where $\Sigma = Z\Delta Z^T + \phi I_N$ and $\delta$ is drawn from a multivariate standard normal distribution.

These schemes could of course be adapted for the problem of estimation of MSEP through the steps of the general algorithm described in Section 3.3.1, albeit with the following modifications:

• At Step 1, the estimation of $\hat{\theta}$ should produce predictions of the random effects $\hat{u}(\hat{\theta})$. In the case of the REB and TB for LMM, these predictions are unambiguously found via the EP, whereas in a non-Gaussian framework one can choose between Monte Carlo approximations of the integral ratios (9) or $O(n_i^{-1})$ approximations via the CM, which are by-products of the LAML procedure.


• After Step 1, an optional intermediate reflation procedure could be performed to avoid the effects of shrinkage on the predictions, see e.g. Morris (2002). For instance, one can use the suggestion of Carpenter et al. (2003), who propose to reflate the predictions using an estimate of their variance, e.g. in the REB $\hat{v}_r = \hat{u}_r^T\hat{u}_r/q_r$, therefore yielding: $\hat{u}_r = \hat{u}_r/\sqrt{\hat{v}_r}$.

• At Step 2, bootstrap samples are constructed via Simple Random Sampling With Replacement (SRSWR) from each of the vectors $\hat{u}_r$, instead of by simulation for each observational unit.

• Step 5 is replaced with a version that does not require simulation from the conditional distributions $F_{\mu^*_{ij}}$.

In the LMM framework this modification does not have critical implications, again because of the identity link. As an example for the REB, consider the representation (6) and notice that, after construction of bootstrap samples $u^*_r$ and $\epsilon^*$ via SRSWR from $\hat{u}_r$ and $\hat{\epsilon}$, one can obtain bootstrap samples of the outcome via the following relationship:

$$y^* = X\hat{\beta} + \sum_{r=1}^{q} \hat{\sigma}_r Z_r u^*_r + \epsilon^*,$$

and this without any simulation step, whereas in a non-Gaussian framework this procedure is less obvious. Therefore, and inspired in part by the TB, we define a bootstrap procedure following a suggestion found in Shao & Tu (2012) (a minimal R sketch of these sub-steps follows the list):

– Construct Pearson Residuals $\hat{\epsilon}_{ij} = [y_{ij} - \hat{\mu}_{ij}]/\sqrt{\hat{\phi}\, v(\hat{\mu}_{ij})}$;

– Center these residuals around their empirical mean $\bar{\epsilon} = [\sum_{i=1}^{n}\sum_{j=1}^{n_i} \hat{\epsilon}_{ij}]/\sum_{i=1}^{n} n_i$, as in $e_{ij} = \hat{\epsilon}_{ij} - \bar{\epsilon}$, and “stock them up” in vectors $e = [[e_{ij}]_{j=1}^{n_i}]_{i=1}^{n}$;

– Concatenate predictions of the random effects, obtained via either CM or Monte-Carlo EP, into a vector $\hat{u} = [\hat{u}_i]_{i=1}^{n}$;

– Create bootstrap samples $e^* = \mathrm{SRSWR}(e)$, $u^* = \mathrm{SRSWR}(\hat{u})$ and $u^*_i$ by partitioning $u^*$;

– With $\eta^*_{ij} = x_{ij}^T\hat{\beta} + z_{ij}^T D_{\hat{\sigma}} u^*_i$ and $\mu^*_{ij} = g^{-1}(\eta^*_{ij})$, bootstrap samples of the outcome can be obtained as follows:

$$y^*_{ij} = \mu^*_{ij} + \big[\hat{\phi}\, v(\mu^*_{ij})\big]^{1/2} e^*_{ij}.$$

To close the discussion on this type of scheme, we believe it important to point out that the quality of the procedures depends on the capability of the predictions to replicate the features of the Data Generating Process. In the Gaussian case, theory shows a degree of asymptotic convergence of the distribution of the EP to the assumed Gaussian distribution, see Jiang (1998b); however, this feature is not guaranteed in models for other types of outcomes, where the discreteness of the response could entail systematic discrepancies.


3.3.3 Random Weighted Laplace Bootstrap

A third bootstrap scheme for LMM relies on random weighting of the Estimating Equations, such as the Generalized Cluster Bootstrap (GCB), see Field et al. (2010); Pang & Welsh (2014); Ding & Welsh (2017). Inspired by this approach, we propose the Random Weighted Laplace Bootstrap (RWLB), a procedure that consists in inserting random weights at the level of the exponent of the joint PDF of outcomes and random effects of equation (3), which translates into weighting the contributions $\ell_i$ of equation (2), as in:

$$\ell^*_i(\theta, u_i) = w_i \ell_i(\theta, u_i). \tag{25}$$

Optimizing these weighted contributions (25) in the spirit of the Laplace method for integrals yields what we call the Laplace-approximated Weighted log-Likelihood function (LAWLL):

$$\log \tilde{L}^*(\theta) = \frac{nq}{2}\log 2\pi + \frac{1}{2}\sum_{i=1}^{n} \log|V^*_i(\theta)| - \sum_{i=1}^{n} n_i \ell^*_i(\theta), \tag{26}$$

for $V^*_i(\theta) := [w_i n_i \ell^{(2)}_i(\theta)]^{-1}$, which is optimized with respect to $\theta$ to obtain bootstrap replicates $\hat{\theta}^*$. In Flores-Agreda (2017), we show some of the theoretical links between our procedure and the GCB for a Random Effect LMM of Equation (5), while simulation studies displayed evidence of the good quality of the resulting bootstrap distributions as approximations of the asymptotic distribution of the (fixed effects) parameters.

Once the replicates of the model parameters are available, an algorithm for the estimation of MSEP based on the RWLB would follow the steps described in Section 3.3.1, yet with some additional considerations, namely:

• At Step 6: unlike other schemes, the RWLB does not require bootstrap samples $y^*_{ij}$ to generate the bootstrap replicates of $\hat{\theta}$. They can be obtained by optimizing Formula (26), which could even be performed right after Step 1.

• Step 2 of the procedure requires the generation of bootstrap samples of the random effect vectors $u^*$. For this purpose, we consider two possibilities:

– Simulation Strategy, generating $u^*_i$ from a standard Multivariate Gaussian distribution, i.e. $u^*_i \sim \mathcal{N}_q(0, I_q)$, mimicking (to a point) the PB;

– Resampling Strategy, which constructs $u^*_i$ from the partitions of a vector of resampled concatenated predictions, echoing the REB and TB schemes described in Section 3.3.2.

• At Step 3, we propose to construct the samples taking advantage of the availability of the replicates $\hat{\theta}^*$ to generate the samples of $\eta^*_{ij}$ and $\vartheta^*_i$, e.g. as in:

$$\eta^*_{ij} = x_{ij}^T\hat{\beta} + z_{ij}^T D_{\hat{\sigma}^*} u^*_i \tag{27}$$
$$\vartheta^*_i = \lambda_\beta^T\hat{\beta} + \lambda_u^T D_{\hat{\sigma}^*} u^*_i. \tag{28}$$

This approach is in part motivated by an analogy to the PB correction methods relying on double bootstrap procedures, see e.g. Hall & Maiti (2006). We have employed only the replicates of $\sigma$ since, as we have seen in the literature review of Sections 3.1 and 3.2, it is their contribution to the uncertainty that requires correction.


• Finally, the maximization of the LAWLL provides as by-products “perturbed” CM at convergence, i.e. predictions where the parameters have been replaced by the replicates $\hat{\theta}^*$. These values could be helpful in estimating the contribution to the uncertainty due to the estimation of the model parameters, similarly to the Butar & Lahiri (2003) estimator of $g_{3i}$ reviewed in Section 3.3.1.

3.4 Prediction Intervals

There has been a wide variety of suggestions on the use of uncertainty estimators, one of them being the construction of Prediction Intervals (PI), often obtained by making analogies to asymptotic Gaussian Confidence Intervals for the parameters, see e.g. Morris (1983), as in:

$$\mathrm{PI}_{1-\alpha}\big[\hat{\vartheta}_i(\hat{\theta}), \vartheta_i\big] = \left\{ \hat{\vartheta}_i(\hat{\theta}) \pm \Phi^{-1}(1-\alpha/2)\sqrt{\mathrm{msep}\big[\hat{\vartheta}_i(\hat{\theta})\big]} \right\}, \tag{29}$$

where $\Phi^{-1}(1-\alpha/2)$ denotes the $(1-\alpha/2)$ quantile of a standard normal distribution. As pointed out by Hall & Maiti (2006), such proposals generally have first-order coverage and, in the LMM context, can be justified by considerations on the distribution of the EP and the consistency of the estimators of the parameters.
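In R, interval (29) is a one-liner once a point prediction and an MSEP estimate are available; the numbers below are invented placeholders, for illustration only:

```r
# Gaussian prediction interval (29); the point prediction and the MSEP
# estimate below are invented placeholders.
vartheta_hat <- 1.8; msep_hat <- 0.36; alpha <- 0.05
pi_29 <- vartheta_hat + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(msep_hat)
```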

To improve on this coverage, PB-based intervals or corrections have also been proposed. For the general LMM of equation (7), one such suggestion has been provided by Chatterjee et al. (2008), who give conditions to refine the approximation of the distribution of combinations:

$$m(\theta) := \lambda^T[X\beta + ZDu].$$

Under full knowledge of the parameters, these mixed effects can be predicted using combinations $t(\theta, y) := \lambda^T[X\beta + ZD\hat{u}(\theta, y)]$, where $\hat{u}(\theta, y) := D^T Z^T \Sigma^{-1}[y - X\beta]$ denotes the BLUP of the vector of random effects. As exposed in Section 2.2, the covariance matrix of the predictions is given by:

$$G(\theta) := \lambda^T Z\big[\Delta - \Delta Z^T \Sigma^{-1} Z \Delta\big] Z^T \lambda,$$

hence yielding the normally distributed pivot:

$$Z(\theta) := [G(\theta)]^{-1/2}\big[t(\theta, y) - m(\theta)\big].$$

With these considerations, Chatterjee et al. (2008) provide conditions under which the PB distribution of the bootstrap analogue $Z^*$ of this pivot approximates the distribution of $Z(\hat{\theta})$ with an error of order $O(d^3 n^{-3/2})$, where $d = p + s$, and use the quantiles $q^*_{\alpha/2}, q^*_{1-\alpha/2}$, say, of the bootstrap distribution to construct bootstrap confidence sets for the mixed effects, i.e.:

$$\mathrm{PI}^{CL}_{1-\alpha}\big[m(\hat{\theta}), m\big] = \Big( m(\hat{\theta}) - q^*_{\alpha/2}\big[G(\hat{\theta})\big]^{1/2},\; m(\hat{\theta}) + q^*_{1-\alpha/2}\big[G(\hat{\theta})\big]^{1/2} \Big).$$


3.5 Implementation

We use the R environment (R Core Team, 2015) for our implementations, where we rely on the packages nlme (Pinheiro & Bates, 2009) and lme4 (Bates et al., 2015) for ML estimates of the model parameters and CM predictions of the random effects. These packages also provide other useful elements, such as the design matrices for the fixed and random effects, that we have exploited in order to construct the corrections to MSEP described in Section 3.2.

In the LMM context the CM predictions are equivalent to the EP, yet for non-Gaussian outcomes we have also considered constructing Monte-Carlo approximations to the EP (MCEP). The estimates of the uncertainty measures obtained with these packages are just plug-in estimates, i.e. lacking the corrections that we have mentioned in Section 3.2. Therefore, we have used the package TMB (Kristensen et al., 2016), which provides the corrections and has the added flexibility to accommodate the RWLB.
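For instance, the uncorrected plug-in CVs can be extracted directly from an lme4 fit; the sketch below uses the Orthodont model of Section 4.1 for illustration (our own snippet, not the paper's code):

```r
# Plug-in (uncorrected) conditional variances of the CM predictions in lme4.
library(lme4)
fit <- lmer(distance ~ age * Sex + (age | Subject), data = nlme::Orthodont)
re  <- ranef(fit, condVar = TRUE)     # conditional modes with conditional variances
pv  <- attr(re$Subject, "postVar")    # q x q x n array: one plug-in CV per cluster
```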

4 Simulation Studies

To illustrate the performance of the adapted Bootstrap procedures and how they compare to the methods and corrections currently available, we have carried out two simulation studies: a first one in a LMM setting inspired by the Orthodontic data, where we can compare the many schemes to the analytical approximations, and a second batch of simulations, motivated by the Toenail example, which considers a Mixed Logit Model, for which only the CV and the Bootstrap procedures are available.

4.1 Gaussian Responses

We illustrate the methods for Gaussian LMM via an example inspired by the Orthodont dataset (Potthoff & Roy, 1964), which is available through the R package nlme and contains information on the measurement of the distance between the pituitary and the pterygomaxillary fissure via x-ray measurements taken from 27 young girls and boys at four age points (8, 10, 12, 14 years). The objective of the original study consisted in estimating the trend of this measure over time, all while taking into account the repeated observations on a same individual. The analyses found in the literature often consider modelling the outcomes by means of a LMM for distance, including the variation among individuals via a random intercept and a random slope for age, as in:

$$y_{ij} = \beta_0 + \beta_1 a_{ij} + \beta_2 s_{ij} + \beta_3 (sa)_{ij} + \sigma_1 u_{1i} + \sigma_2 u_{2i} a_{ij} + \epsilon_{ij}, \tag{30}$$

where $a_{ij}$, $s_{ij}$ denote the age and sex of the individual and $(sa)_{ij}$ the interaction between these covariates, see e.g. Pinheiro & Bates (2009).

The values of the parameters in our simulations, set at $\beta = [17.37, 0.48, -1.03, 0.9]^T$, $\sigma_1 = 7.5$, $\sigma_2 = 0.44$ and $\phi = 1.82$, come from an estimation on the original data with the model described in (30), yet they have been slightly adapted for numerical stability purposes. We have hence generated $u_i = [u_{1i}, u_{2i}]^T \sim \mathcal{N}_2(0, I_2)$, $\epsilon_{ij} \sim \mathcal{N}(0, \phi)$, the $a_{ij}$ from sequences of values $a_j = \{8, \dots, 8 + 4(j-1)\}$ for $j \in \{1, \dots, n_i\}$, centered about their means as in $\tilde{a}_{ij} = a_j - \bar{a}$, and the $s_{ij}$ from a vector $s = [0^T_{n/2}, 1^T_{n/2}]^T$, for $1_{n/2}$ $(n/2 \times 1)$ a vector of ones. Overall we have generated $M = 1000$ simulated samples, set the number of bootstrap replicates at $B = 1000$ and considered four configurations according to combinations of number of individuals $n \in \{100, 250\}$ and observations per individual $n_i \in \{4, 12\}$.
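The following R sketch is our own rendering of the stated design (under the reading that the age sequence has step 4 and that the sex indicator splits the clusters in half) and generates one simulated dataset:

```r
# One simulated dataset from design (30) with the parameter values above.
set.seed(123)
n <- 100; ni <- 4
beta <- c(17.37, 0.48, -1.03, 0.9)
sig1 <- 7.5; sig2 <- 0.44; phi <- 1.82
a   <- seq(8, by = 4, length.out = ni); a <- a - mean(a)  # centered ages
id  <- rep(1:n, each = ni)                                # cluster labels
aij <- rep(a, times = n)
sij <- rep(rep(c(0, 1), each = n / 2), each = ni)         # sex indicator s
u1  <- rnorm(n); u2 <- rnorm(n)
y   <- beta[1] + beta[2] * aij + beta[3] * sij + beta[4] * sij * aij +
       sig1 * u1[id] + sig2 * u2[id] * aij + rnorm(n * ni, 0, sqrt(phi))
```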

Throughout this study we have estimated uncertainty measures for the predictions of three types of effects: (i) Fitted Values (FV): $\eta_{ij}$ ($\lambda_\beta = x_{ij}$, $\lambda_u = z_{ij}$); (ii) Random Intercepts (RI): $\sigma_1 u_{1i}$ ($\lambda_\beta = 0_p$, $\lambda_u = [1, 0]^T$); and (iii) Random Slopes (RS): $\sigma_2 u_{2i}$ ($\lambda_\beta = 0_p$, $\lambda_u = [0, 1]^T$), using the following estimators:

1. Conditional Variances (CV) in two versions:

(a) V1 : without any correction;

(b) V2 : sporting the correction of Kass & Steffey (1989), Equation (21);

For these estimators, we do not report the computations for the predictions of the fitted values $\eta_{ij}$, since they are not retrieved by default with the methods available in TMB.

2. Approximation-Based Estimators (PRE) of MSEP, in the spirit of Kackar & Harville (1984), available in three versions:

(a) V1 : only comprising plug-in estimates of g1i and g2i, as described in formulae (15) and (16);

(b) V2 (Prasad & Rao, 1990): including the estimate of the contribution due to the estimation of $\sigma$, as in (17);

(c) V3 (Datta & Lahiri, 2000): same as V2 but accounting for the small sample bias of $\hat{\sigma}_{ML}$, as in (18);

3. Random Weighted Laplace Bootstrap Estimator (RWLBE), in three versions:

(a) V1 : with the Simulation Strategy;

(b) V2 : with the Resampling Strategy without Reflation;

(c) V3 : with the Resampling Strategy with Reflation;

4. Random Effect Bootstrap Estimator (REBE) in two versions:

(a) V1 : without Reflation;

(b) V2 : with the Reflation step;

5. Parametric Bootstrap Estimator (PBE), in two versions:

(a) V1 : PB corrected CV (Butar & Lahiri, 2003), as in Equation (24);

(b) V2 : by direct estimation as in formula (23).

As in Singh et al. (1998), we compare the quality of the methods on the grounds of three features, namely, their capability to estimate the MSEP as measured by their Relative Bias, a measure of their efficiency in this estimation via the Relative Root-MSE and the Coverage Ratios when contrasted to the nominal level.


4.1.1 Relative Bias

The relative bias (RB) in the estimation of the MSEP of a method denoted by the superscript $(m)$ can be computed as follows:

$$\mathrm{RB}^{(m)}_i = \frac{\mathrm{msep}^{(m)}\big[\hat{\vartheta}_i(\hat{\theta})\big] - \mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big]}{\mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big]} \times 100\%,$$

where $\mathrm{msep}^{(m)}$ denotes the mean of the estimate over all the samples, i.e.

$$\mathrm{msep}^{(m)}\big[\hat{\vartheta}_i(\hat{\theta})\big] = \frac{1}{M}\sum_{k=1}^{M} \mathrm{msep}^{(m,k)}\big[\hat{\vartheta}_i(\hat{\theta})\big],$$

and the MSEP results from the mean squared differences of the predictions and effects averaged throughout the samples, as in:

$$\mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big] = \frac{1}{M}\sum_{k=1}^{M}\big\{\hat{\vartheta}^{(k)}_i(\hat{\theta}) - \vartheta^{(k)}_i\big\}^2. \tag{31}$$

In Table 1 we display the relative bias averaged over the observation index. As a first remark, we can point out that the CV without correction (CV.V1) seriously underestimates the MSEP for RI and RS when compared to the Naive MSEP estimator PRE.V1. Secondly, we can assess the improvements brought by the corrections, both Bayesian and Frequentist, for RE predictions, as the relative bias gets reduced when the corrections are added in most scenarios, with the exception of the second correction for the PRE (PRE.V3) when $n = 250$, which sports small positive biases for the RS, and of the setting with $n_i = 12$ for the RI and RS predictions. These remarks do not translate to the case where the effects of interest are the fitted values, where the corrections seem to increase a positive bias for the PRE in all settings.

With regard to the Bootstrap-based methods, we can see that the proposals based on resampling of predictions without reflation (RWLBE.V1, REBE.V1) severely underestimate the uncertainty, whereas the versions based on reflated predictions (RWLBE.V2, REBE.V2) reduce this bias to the point of making them comparable with the approximation-based methods, specifically when the effects of interest are FV and RI. The RWLBE seems to be at its advantage when estimating the uncertainty of the FV with increases in cluster size, as RWLBE.V2 and RWLBE.V3 sport less of a positive bias after such increments, while improving upon the approximation-based corrected estimators. On the other hand, when trying to predict random effects, this expansion does not have a systematic effect, as can be seen from the increases in bias when $n = 100$ and the reductions when $n = 250$, indicating that this method requires joint increases to improve over the approximation-based methods.
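To make the bookkeeping of these comparisons concrete, the relative bias can be computed from the simulation output in a couple of lines; this is a hypothetical sketch in which `msep_mat` and `sqerr_mat` are $M \times n$ matrices of per-sample MSEP estimates and squared prediction errors:

```r
# Relative bias (in %) of an MSEP estimator across M simulated samples;
# msep_mat and sqerr_mat are hypothetical M x n matrices.
MSEP_emp <- colMeans(sqerr_mat)            # empirical MSEP, equation (31)
RB <- (colMeans(msep_mat) - MSEP_emp) / MSEP_emp * 100
mean(RB)                                   # averaged over the observation index
```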

4.1.2 Efficiency

In a second comparison, we contrast the different methods on the grounds of their relative efficiency, as measured by the Relative Root Mean Squared Error of estimation, defined as:

$$\mathrm{RRMSE}^{(m)} = \frac{\Big[\frac{1}{M}\sum_{k=1}^{M}\big\{\mathrm{msep}^{(m,k)}\big[\hat{\vartheta}_i(\hat{\theta})\big] - \mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big]\big\}^2\Big]^{1/2}}{\mathrm{MSEP}\big[\hat{\vartheta}_i(\hat{\theta})\big]} \times 100\%,$$

                 ηij              σ1u1i            σ2u2i
  n              100     250      100     250      100     250

  ni = 4
  CV    V1         –       –   -63.57  -43.17   -16.16   -6.54
        V2         –       –    -1.00   -1.18    -0.88    0.08
  PRE   V1     -0.02    0.31    -1.20   -1.24    -1.22   -0.06
        V2      0.72    0.43    -0.77   -1.13    -1.21   -0.06
        V3      1.39    0.70    -0.58   -1.01    -0.43    0.28
  RWLBE V1    -45.16  -45.65   -23.97  -31.95   -39.75  -42.34
        V2      1.53    0.91    -1.38   -1.05    -0.49    0.36
        V3      1.51    0.90    -1.43   -1.10    -0.50    0.33
  REBE  V1    -45.98  -45.96   -23.51  -31.84   -39.97  -42.53
        V2      0.17    0.37    -0.87   -1.24    -0.82    0.07
  PBE   V1      0.47    0.49    -1.20   -1.24    -1.21   -0.06
        V2      0.16    0.38    -0.98   -1.22    -0.82    0.07

  ni = 12
  CV    V1         –       –   -79.82  -61.78   -72.28  -51.92
        V2         –       –    -2.93   -0.06    -0.74    0.39
  PRE   V1      0.17    0.33    -2.93   -0.06    -0.75    0.39
        V2      0.89    0.44    -2.68    0.01    -0.75    0.39
        V3      1.28    0.59    -2.61    0.07    -0.64    0.46
  RWLBE V1    -16.34  -16.27    -7.61   -7.71    -6.73   -8.39
        V2      0.24    0.37    -3.68   -0.42    -1.48    0.21
        V3      0.24    0.35    -3.78   -0.34    -1.45    0.08
  REBE  V1    -16.43  -16.29    -6.79   -7.24    -5.92   -8.50
        V2      0.18    0.32    -2.80   -0.10    -0.80    0.38
  PBE   V1      0.37    0.40    -2.93   -0.06    -0.75    0.39
        V2      0.16    0.34    -2.84   -0.03    -0.84    0.40

Table 1: Average relative bias of estimators of MSEP ("–": not reported).

where the MSEP is computed as in equation (31) and the superscript denotes the method. Table 2 displays these measures for all the methods in all of our considered scenarios.

It is apparent from Table 2 that, again, CV.V1 is the least efficient method of all those based on approximations, while the “naive” PRE.V1 does not sport a noticeable difference in terms of efficiency with respect to the corrected estimators. Moreover, all of the estimators based on a decomposition according to contributions (PRE.V1, V2, V3 and even PBE.V1) display the least RRMSE in each of the scenarios. The Bootstrap-based methods, again with the exception of those based on resampling without reflation (RWLBE.V1 and REBE.V1), are more or less of the same order as those obtained with the approximations and corrections, as can be assessed from the differences in RRMSE, ranging between 1 and 2%.

When reading through the effects of increases in the number of clusters, all methods show reductions, with the exception of RWLBE.V1 and REBE.V1, which display an increased inefficiency when $n_i = 4$. On the other hand, increases in cluster size alone seem to improve the efficiency of estimators for FV, whereas we can report increases in the RRMSE of estimators for RI and RS for all methods except RWLBE.V1 and REBE.V1, as a consequence of the reduced shrinkage effect, and for PBE.V1, since it is based on the $o(n^{-1})$ approximation. Overall, RWLBE.V2 and RWLBE.V3 fare comparably with the other methods on the grounds of efficiency.

                 ηij              σ1u1i            σ2u2i
  n              100     250      100     250      100     250

  ni = 4
  CV    V1         –       –    63.68   43.32    18.14    9.07
        V2         –       –     9.72    6.60     9.51    6.90
  PRE   V1     10.68    7.32     9.72    6.60     9.49    6.89
        V2     10.71    7.32     9.73    6.59     9.49    6.89
        V3     10.81    7.35     9.73    6.58     9.49    6.91
  RWLBE V1     45.59   45.86    25.40   32.32    40.18   42.55
        V2     11.89    8.79    10.72    7.98    10.55    8.36
        V3     11.82    8.70    10.69    8.09    10.52    8.30
  REBE  V1     46.40   46.16    25.03   32.22    40.39   42.73
        V2     11.69    8.71    10.71    8.04    10.56    8.33
  PBE   V1     10.79    7.34     9.74    6.61    12.77    7.77
        V2     11.61    8.63    10.77    8.02    10.49    8.26

  ni = 12
  CV    V1         –       –    79.82   61.79    72.29   51.93
        V2         –       –    11.54    6.51    10.57    5.94
  PRE   V1      6.16    4.92    11.54    6.51    10.57    5.94
        V2      6.23    4.93    11.52    6.52    10.57    5.94
        V3      6.30    4.94    11.50    6.52    10.56    5.94
  RWLBE V1     17.27   16.92    13.98   10.50    13.00   10.73
        V2      7.71    6.81    12.36    7.88    11.65    7.53
        V3      7.69    6.80    12.56    7.98    11.55    7.39
  REBE  V1     17.35   16.94    13.70   10.31    12.76   10.79
        V2      7.70    6.79    12.43    7.91    11.53    7.50
  PBE   V1      6.19    4.93    11.54    6.52    10.91    6.30
        V2      7.68    6.79    12.31    7.88    11.32    7.43

Table 2: Average RRMSE of estimators of MSEP ("–": not reported).

4.1.3 Coverage Ratios of Prediction Intervals

As a final contrast, we measure the capability of the methods to create accurate prediction intervals by looking at the Coverage Ratios (CR) of their Prediction Intervals (PI), obtained with the widespread Gaussian-inspired suggestion of equation (29), and contrasting them with the nominal level $1-\alpha$, i.e.

$$\mathrm{CR}^{(m)} = \frac{1}{M}\sum_{k=1}^{M} \mathbf{1}\Big\{\vartheta^{(k)}_i \in \mathrm{PI}^{(m)}_{1-\alpha}\big[\hat{\vartheta}^{(k)}_i(\hat{\theta}), \vartheta^{(k)}_i\big]\Big\},$$

where $\mathbf{1}$ represents a Kronecker delta or indicator function. For the purpose of being exhaustive, we have also included the Chatterjee et al. (2008) suggestion in the study, by constructing PI for the FV using this proposal. In the tables, we indicate the CR of this last proposal with the label PBE.V3.

                 ηij                σ1u1i              σ2u2i
  n              100      250       100      250       100      250

  ni = 4
  CV    V1         –        –    0.7620   0.8587    0.9257   0.9416
        V2         –        –    0.9483   0.9484    0.9476   0.9500
  PRE   V1    0.9487   0.9495    0.9043   0.9632    0.9474   0.9499
        V2    0.9495   0.9496    0.9486   0.9485    0.9474   0.9499
        V3    0.9503   0.9499    0.9489   0.9486    0.9482   0.9502
  RWLBE V1    0.8508   0.8500    0.9120   0.8926    0.8696   0.8620
        V2    0.9499   0.9500    0.9476   0.9482    0.9481   0.9500
        V3    0.9501   0.9499    0.9475   0.9482    0.9480   0.9498
  REBE  V1    0.8476   0.8489    0.9126   0.8928    0.8692   0.8612
        V2    0.9486   0.9493    0.9484   0.9479    0.9474   0.9497
  PBE   V1    0.9492   0.9497    0.9481   0.9483    0.9459   0.9495
        V2    0.9485   0.9493    0.9483   0.9478    0.9477   0.9497
        V3    0.9481   0.9485         –        –         –        –

  ni = 12
  CV    V1         –        –    0.6287   0.7737    0.7047   0.8247
        V2         –        –    0.9447   0.9494    0.9461   0.9504
  PRE   V1    0.9495   0.9501    0.9447   0.9494    0.9461   0.9504
        V2    0.9503   0.9503    0.9450   0.9495    0.9461   0.9504
        V3    0.9507   0.9504    0.9451   0.9496    0.9462   0.9505
  RWLBE V1    0.9258   0.9264    0.9386   0.9395    0.9385   0.9388
        V2    0.9493   0.9499    0.9434   0.9488    0.9444   0.9498
        V3    0.9492   0.9499    0.9430   0.9487    0.9447   0.9495
  REBE  V1    0.9256   0.9263    0.9398   0.9402    0.9394   0.9385
        V2    0.9492   0.9498    0.9446   0.9492    0.9454   0.9500
  PBE   V1    0.9497   0.9502    0.9447   0.9494    0.9459   0.9504
        V2    0.9491   0.9499    0.9446   0.9494    0.9454   0.9499
        V3    0.9483   0.9481         –        –         –        –

Table 3: Average coverage ratios for Prediction Intervals of level 1 − α = 0.95 ("–": not reported).

Table 3 contains a summary of the CR for all methods and a nominal level of $1-\alpha = 0.95$, again averaged by index for presentation purposes. Once more, the approximation-based methods that lack the corrections, be they Frequentist or Bayesian (CV.V1, PRE.V1, V2), and the bootstrap methods based on predictors without reflation (RWLBE.V1, REBE.V1) tend to miss this nominal level, sometimes quite dramatically. On the other hand, the remaining methods sport good CR, very close to the nominal level. In particular, RWLBE.V2 and RWLBE.V3 present a slight advantage in the computation of CR for FV in each of the scenarios, with CR that are very close to 0.95. Overall, increases in the number of clusters seem to improve the CR for all methods, whereas increases in cluster size alone do not seem to have a positive effect, as was the case for the efficiency.
