

2.3 Measures of Uncertainty in Prediction of Random Effects

2.3.3 Bootstrap-Based Measures of the Uncertainty

Given the complex expressions of the estimators of the uncertainty for point predictors and their corrections, a natural alternative way of dealing with this problem is via resampling methods.

Parametric Bootstrap

Parametric Bootstrap (PB) procedures constitute the most intuitive and widespread scheme for bootstrapping GLMMs and have therefore been extensively implemented to tackle the estimation of uncertainty measures. In short, this scheme relies on parameter estimates and simulation to create bootstrap samples of the random effects, while exploiting the connection between the linear predictor and the expectation of the conditional distribution of the outcome to generate the bootstrap samples $y_{ij}^*$. To illustrate how this scheme has been adapted to the estimation of the uncertainty, we describe the steps of a PB estimation algorithm as follows:

1. Estimation of $\hat\theta$;

2. Generation of bootstrap samples $u_i^*$ by simulation from a $\mathcal{N}_q(0, I_q)$ distribution;

3. Construction of bootstrap samples $\eta_{ij}^* = x_{ij}^T\hat\beta + \lambda_u^T D_{\hat\sigma}\, u_i^*$ and $\vartheta_i^* = \lambda_\beta^T\hat\beta + \lambda_u^T D_{\hat\sigma}\, u_i^*$;

4. Construction of $\mu_{ij}^* = g^{-1}(\eta_{ij}^*)$;

5. Generation of $y_{ij}^*$ by simulation from the conditional distribution $F_{\mu_{ij}^*}$;

6. Model fit over the bootstrap samples $y_{ij}^*$, yielding replicates $\hat\theta^*$ and $\hat\vartheta_i^*$;

7. Repeat Steps 2.-6. $B$ times and estimate the uncertainty measure or the correction.

Implementations of this basic algorithm for the LMM are even more straightforward because of the identity link, which implies $\mu_{ij} = \eta_{ij}$, and the conditional normality of the response, entailing $y_i \mid u_i \sim \mathcal{N}_{n_i}(\mu_i, \hat\phi I_{n_i})$, which simplifies Steps 3, 4 and 5 of this general algorithm. Hence, a parametric bootstrap estimate of the MSEP can be obtained from the expected squared differences between bootstrap replicates and samples, i.e.

$$\text{msep}_{PB}\left[\hat\vartheta_i(\hat\sigma)\right] = E^*\left[\hat\vartheta_i^* - \vartheta_i^*\right]^2, \qquad (2.23)$$

where $E^*$ denotes the expectation over the bootstrap distribution.
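Steps 1.-7. and the estimate (2.23) can be sketched numerically. The snippet below is purely illustrative and makes several simplifying assumptions not present in the text: a balanced Gaussian random-intercept model, a crude moment-based `fit` standing in for a full ML/REML fit, and a shrinkage predictor standing in for the EP.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, B = 20, 5, 200          # groups, obs per group, bootstrap replicates

def fit(y):
    """Crude moment estimators for y_ij = beta + u_i + e_ij (stand-in for ML).
    y has shape (n, m)."""
    beta = y.mean()
    gm = y.mean(axis=1)                                   # group means
    phi = ((y - gm[:, None]) ** 2).sum() / (n * (m - 1))  # within-group variance
    sig2 = max(gm.var(ddof=1) - phi / m, 1e-8)            # between minus within/m
    return beta, sig2, phi

def predict(y, beta, sig2, phi):
    """Shrinkage predictor of theta_i = beta + u_i (stand-in for the EP)."""
    w = sig2 / (sig2 + phi / m)
    return beta + w * (y.mean(axis=1) - beta)

# Step 1: fit the original data (true beta = 1, sigma^2 = 1, phi = 1)
u = rng.normal(0.0, 1.0, n)
y = 1.0 + u[:, None] + rng.normal(0.0, 1.0, (n, m))
beta_h, sig2_h, phi_h = fit(y)

# Steps 2-7: parametric bootstrap estimate of MSEP, eq. (2.23)
sq = np.zeros((B, n))
for b in range(B):
    u_s = rng.normal(0.0, np.sqrt(sig2_h), n)             # step 2 (already scaled)
    theta_s = beta_h + u_s                                # step 3: bootstrap target
    y_s = theta_s[:, None] + rng.normal(0.0, np.sqrt(phi_h), (n, m))  # steps 4-5
    b_h, s_h, p_h = fit(y_s)                              # step 6: refit
    sq[b] = (predict(y_s, b_h, s_h, p_h) - theta_s) ** 2
msep_pb = sq.mean(axis=0)                                 # one estimate per area
```

In the LMM the identity link makes steps 3-5 collapse into direct Gaussian simulation, which is what the loop above exploits.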

Still within the LMM framework and under an Empirical Bayes perspective, Butar and Lahiri [2003] propose an alternative estimator that corrects the $O(n^{-1})$ posterior bias of the estimate in equation (2.22). The procedure consists in generating PB replicates $\hat\sigma^*$ of the variance components and plugging them into the expressions of $g_{1i}$, $g_{2i}$, hence providing the following correction:

$$pb(\hat\sigma) = E^*\left[g_{1i}(\hat\sigma^*) + g_{2i}(\hat\sigma^*)\right] - \left[g_{1i}(\hat\sigma) + g_{2i}(\hat\sigma)\right],$$

while the final contribution is estimated by computing the mean of the squared differences between plug-in replicates of the predictions and the EP, which equates to a bootstrap estimate of the contribution g3i, as in:

$$g_{3i}^{BL}(\hat\sigma) = \text{Var}^*\left\{\hat\vartheta_i(\hat\sigma^*)\right\} := E^*\left[\hat\vartheta_i(\hat\sigma^*) - \hat\vartheta_i(\hat\sigma)\right]^2.$$

The resulting estimator therefore has the following expression:

$$\text{msep}^*_{BL}\left[\hat\vartheta_i(\hat\sigma)\right] = 2\left[g_{1i}(\hat\sigma) + g_{2i}(\hat\sigma)\right] - E^*\left[g_{1i}(\hat\sigma^*) + g_{2i}(\hat\sigma^*)\right] + g_{3i}^{BL}(\hat\sigma). \qquad (2.24)$$

Under some regularity conditions, Butar and Lahiri [2003] conclude that the bias of their estimator is $o(n^{-1})$ [Butar and Lahiri, 2003, Theorem A.2], making it a comparable alternative to the approximation-based estimators of the MSEP. Among the more recent proposals that pursue this methodology, we can mention for instance the work of Kubokawa and Nagashima [2012], who provide conditions entailing a third-order correct estimator of the contribution $g_{3i}$.
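The Butar-Lahiri correction (2.24) can be sketched as follows. The forms chosen for $g_{1i}$ and $g_{2i}$ correspond to a balanced random-intercept model with known residual variance; they, along with all numerical values, are illustrative assumptions rather than the expressions used in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, B = 20, 5, 500
phi = 1.0                     # residual variance, assumed known for simplicity
sig2_h = 0.8                  # sigma^2 estimated from the data (given here)

def g1(sig2):                 # leading MSEP term of the shrinkage predictor
    return sig2 * (phi / m) / (sig2 + phi / m)

def g2(sig2):                 # term for estimating beta (illustrative form)
    gam = sig2 / (sig2 + phi / m)
    return (1.0 - gam) ** 2 * (sig2 + phi / m) / n

gam_h = sig2_h / (sig2_h + phi / m)
ybar_i = rng.normal(0.0, np.sqrt(sig2_h + phi / m), n)    # centred group means
theta_hat = gam_h * ybar_i                                # plug-in prediction

# PB replicates of sigma^2 and of the plug-in predictor theta_i(sigma*)
sig2_star = np.empty(B)
theta_star = np.empty((B, n))
for b in range(B):
    u_s = rng.normal(0.0, np.sqrt(sig2_h), n)
    yb = u_s + rng.normal(0.0, np.sqrt(phi / m), n)
    sig2_star[b] = max(yb.var(ddof=1) - phi / m, 1e-8)    # refitted sigma^2*
    gam_s = sig2_star[b] / (sig2_star[b] + phi / m)
    theta_star[b] = gam_s * ybar_i                        # original data, sigma*

g12_hat = g1(sig2_h) + g2(sig2_h)
g12_star = np.mean(g1(sig2_star) + g2(sig2_star))         # E*[g1 + g2]
g3_bl = np.mean((theta_star - theta_hat) ** 2, axis=0)    # bootstrap g3i
msep_bl = 2.0 * g12_hat - g12_star + g3_bl                # eq. (2.24)
```

Note that the replicates of the prediction reuse the original data and only swap in the bootstrap variance components, which is what makes `g3_bl` an estimate of $g_{3i}$.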

Resampling-based Bootstraps

Procedures relying on resampling with replacement from vectors of the predicted random effects, such as the Random Effect Bootstrap (REB) or the Transformation Bootstrap (TB), have been proposed; see e.g. Davison and Hinkley [1997], McCullagh [2000], Field et al. [2010]. To give a concise rundown of the procedures in the LMM, we can mention that the REB requires that the random structure of the model be decomposed into independent sources of variation as in equation (2.6), whereas the TB reparametrizes the general LMM (2.7) as $y = X\beta + \Sigma^{1/2}\delta$, where $\Sigma = Z\Delta Z^T + \phi I_N$ and $\delta$ is drawn from a multivariate standard normal distribution.
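The TB reparametrization can be sketched in a few lines. The design, the fitted values and the choice of the Cholesky factor as the square root of $\Sigma$ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, q = 30, 6                                       # observations, clusters
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Z = np.kron(np.eye(q), np.ones((N // q, 1)))       # one random intercept per cluster
beta_h, delta_h, phi_h = np.array([1.0, 0.5]), 0.8, 1.0   # fitted values (given)

# TB: y* = X beta^ + Sigma^{1/2} delta*, with delta* ~ N(0, I_N)
Sigma = delta_h * Z @ Z.T + phi_h * np.eye(N)      # here Delta = delta_h * I_q
L = np.linalg.cholesky(Sigma)                      # one valid square root of Sigma
y_star = X @ beta_h + L @ rng.standard_normal(N)
```

Any matrix square root of $\Sigma$ would do; the Cholesky factor is simply the cheapest standard choice.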

These schemes could of course be adapted for the problem of estimation of MSEP through the steps of the general algorithm described in Section 2.3.3, albeit with the following modifications:

• At Step 1, the estimation of $\hat\theta$ should produce predictions of random effects $\hat u(\hat\theta)$. In the case of the REB and TB for the LMM, these predictions are unambiguously found via the EP, whereas in a non-Gaussian framework one can choose between Monte Carlo approximations of the integral ratios (2.9) or $O(n_i^{-1})$ approximations via the CM, which are by-products of the LAML procedure.

• After Step 1, an optional intermediate reflation procedure could be performed to avoid the effects of shrinkage on the predictions, see e.g. Morris [2002].

For instance, one can use the suggestion by Carpenter et al. [2003], who propose to reflate the predictions using an estimate of their variance, e.g. in the REB $v_r = \hat u_r^T \hat u_r / q_r$, therefore yielding: $\hat{\hat u}_r = \hat u_r / \sqrt{v_r}$.
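The reflation step amounts to a one-line rescaling; a minimal sketch, with an arbitrary shrunken vector standing in for the predictions:

```python
import numpy as np

rng = np.random.default_rng(2)
u_hat = rng.normal(0.0, 0.7, 50)          # shrunken predictions, variance < 1

v_r = u_hat @ u_hat / u_hat.size          # empirical (raw) variance of predictions
u_reflated = u_hat / np.sqrt(v_r)         # rescale so that variance equals 1
```

After reflation the empirical second moment of the vector is exactly 1, countering the shrinkage of the predictions toward zero.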

• At Step 2, bootstrap samples are constructed via Simple Random Sampling with Replacement (SRSWR) from each of the vectors ubr, instead of simulation for each observational unit.

• Step 5 is replaced with a version that does not require simulation from the conditional distributions $F_{\mu_{ij}^*}$.

In the LMM framework this modification does not have critical implications, again because of the identity link. As an example for the REB, consider the representation (2.6) and notice that, after construction of bootstrap samples $u_r^*$ and $\epsilon^*$ via SRSWR from $\hat u_r$ and $\hat\epsilon$, one can obtain bootstrap samples of the outcome via the following relationship:

$$y^* = X\hat\beta + \sum_{r=1}^{q} \hat\sigma_r Z_r u_r^* + \epsilon^*,$$

and this without any simulation step, whereas in a non-Gaussian framework this procedure is less obvious. Therefore, and inspired in part by the TB, we define a bootstrap procedure following a suggestion found in Shao and Tu [2012].

1. Construct Pearson residuals $\hat\varepsilon_{ij} = [y_{ij} - \hat\mu_{ij}]/\sqrt{\hat\phi\, v(\hat\mu_{ij})}$;

2. Center these residuals around their empirical mean $\bar\varepsilon = [\sum_{i=1}^{n}\sum_{j=1}^{n_i} \hat\varepsilon_{ij}] / \sum_{i=1}^{n} n_i$, as in $e_{ij} = \hat\varepsilon_{ij} - \bar\varepsilon$, and "stock them up" in a vector $e = [[e_{ij}]_{j=1}^{n_i}]_{i=1}^{n}$;

3. Concatenate predictions of the random effects, obtained via either CM or Monte Carlo EP, into a vector $\hat u = [\hat u_i]_{i=1}^{n}$;

4. Create bootstrap samples $e^* = \text{SRSWR}(e)$, $u^* = \text{SRSWR}(\hat u)$, and $u_i^*$ by partitioning $u^*$;

5. With $\eta_{ij}^* = x_{ij}^T\hat\beta + z_{ij}^T D_{\hat\sigma}\, u_i^*$ and $\mu_{ij}^* = g^{-1}(\eta_{ij}^*)$, bootstrap samples of the outcome can be obtained as follows:

$$y_{ij}^* = \mu_{ij}^* + \left[\hat\phi\, v(\mu_{ij}^*)\right]^{1/2} e_{ij}^*.$$
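These five steps can be sketched for a Poisson random-intercept GLMM with log link, where $v(\mu) = \mu$ and $\phi = 1$. The fitted values and the stand-in predictions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 8                                   # clusters, obs per cluster
x = rng.normal(size=(n, m))
beta_h, sig_h = 0.3, 0.5                       # fitted values (given here)
u_hat = 0.8 * rng.normal(0.0, 1.0, n)          # stand-in for CM/EP predictions

mu_hat = np.exp(beta_h * x + sig_h * u_hat[:, None])   # log link, v(mu) = mu
y = rng.poisson(mu_hat)

# Steps 1-2: Pearson residuals, centred around their overall mean
eps = (y - mu_hat) / np.sqrt(mu_hat)
e = (eps - eps.mean()).ravel()

# Steps 3-4: SRSWR from residuals and from the concatenated predictions
e_s = rng.choice(e, size=n * m, replace=True).reshape(n, m)
u_s = rng.choice(u_hat, size=n, replace=True)

# Step 5: bootstrap outcomes y*_ij = mu*_ij + sqrt(phi v(mu*_ij)) e*_ij
mu_s = np.exp(beta_h * x + sig_h * u_s[:, None])
y_star = mu_s + np.sqrt(mu_s) * e_s
```

As the snippet makes visible, `y_star` is generally neither integer-valued nor nonnegative, which is one concrete face of the discreteness issue discussed below.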

To close the discussion on this type of scheme, we believe it is important to point out that the quality of the procedures depends on the capability of the predictions to replicate the features of the Data Generating Process. In the Gaussian case, theory shows a degree of asymptotic convergence in distribution of the EP to the assumed Gaussian distribution, see Jiang [1998b]; however, this feature is not guaranteed in models for other types of outcomes, where the discreteness of the response could entail systematic discrepancies.

Random Weighted Laplace Bootstrap

A third bootstrap scheme for the LMM relies on random weighting of the Estimating Equations, such as the Generalized Cluster Bootstrap (GCB), see Field et al. [2010], Pang and Welsh [2014], Ding and Welsh [2017]. Inspired by this approach, we propose the Random Weighted Laplace Bootstrap (RWLB), a procedure that consists in inserting random weights at the level of the exponent of the joint PDF of outcomes and random effects of equation (2.3), which translates into weighting the contributions $\ell_i$ of equation (2.2), as in:

$$\ell_i^*(\theta, u_i) = w_i\, \ell_i(\theta, u_i). \qquad (2.25)$$

Optimizing these weighted contributions (2.25) in the spirit of the Laplace method for integrals yields what we call the Laplace-approximated Weighted log-Likelihood function (LAWLL), formula (2.26), whose optimization produces replicates $\theta^*$. In the previous Chapter, we showed some of the theoretical links between our procedure and the GCB for a Random Effect LMM of equation (2.5), while simulation studies displayed evidence of the good quality of the resulting bootstrap distributions as approximations of the asymptotic distribution of the (fixed effects) parameters.

Once the replicates of the model parameters are available, an algorithm for the estimation of the MSEP based on the RWLB would follow the steps described in Section 2.3.3, yet with some additional considerations, namely:

• At Step 6: unlike other schemes, the RWLB does not require bootstrap samples $y_{ij}^*$ to generate the bootstrap replicates of $\hat\theta$. They can be obtained by optimizing formula (2.26), which could even be performed right after Step 1.

• Step 2 of the procedure requires the generation of bootstrap samples of the random effect vectors $u^*$. For this purpose, we consider two possibilities:

– a Simulation Strategy, generating $u_i^*$ from a standard multivariate Gaussian distribution, i.e. $u_i^* \sim \mathcal{N}_q(0, I_q)$, mimicking (to a point) the PB;

– a Resampling Strategy, which constructs $u_i^*$ from the partitions of a vector of resampled concatenated predictions, echoing the REB and TB schemes described in Section 2.3.3.

• At Step 3, we propose to construct the samples taking advantage of the availability of the replicates of $\theta$ to generate the samples of $\eta_{ij}^*$ and $\vartheta_i^*$, e.g. as in:

$$\eta_{ij}^* = x_{ij}^T\hat\beta + \lambda_u^T D_{\hat\sigma^*}\, u_i^*, \qquad \vartheta_i^* = \lambda_\beta^T\hat\beta + \lambda_u^T D_{\hat\sigma^*}\, u_i^*.$$

This approach is in part motivated by an analogy to the PB correction methods relying on double bootstrap procedures, see e.g. Hall and Maiti [2006]. We have employed only the replicates of $\sigma$ since, as we have seen in the literature review of Sections 2.3.1 and 2.3.2, it is their contribution to the uncertainty that requires correction.

• Finally, the maximization of the LAWLL provides as a by-product "perturbed" CM at convergence, i.e. predictions where the parameters have been replaced by replicates $\theta^*$. These values could be helpful in estimating the contribution to the uncertainty due to the estimation of the model parameters, similarly to the Butar and Lahiri [2003] estimator of $g_{3i}$ reviewed in Section 2.3.3.
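The random-weighting idea behind the RWLB can be illustrated on the fixed-effects part of the estimating equations alone. This is a deliberately reduced sketch: it weights per-cluster least-squares contributions rather than the full Laplace-approximated objective, and the exponential weight distribution is one common choice in weighted bootstraps, not necessarily the one used in the text.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, B = 15, 6, 300                     # clusters, obs per cluster, replicates
X = rng.normal(size=(n, m, 2))
beta = np.array([1.0, -0.5])
# random-intercept LMM data: cluster effect + residual noise
y = X @ beta + rng.normal(0.0, 1.0, (n, 1)) + rng.normal(0.0, 1.0, (n, m))

beta_reps = np.empty((B, 2))
for b in range(B):
    w = rng.exponential(1.0, n)          # positive random weights, mean 1
    # each cluster's estimating-equation contribution is scaled by w_i,
    # mirroring the weighting of the l_i contributions in eq. (2.25)
    XtWX = sum(w[i] * X[i].T @ X[i] for i in range(n))
    XtWy = sum(w[i] * X[i].T @ y[i] for i in range(n))
    beta_reps[b] = np.linalg.solve(XtWX, XtWy)
```

Note that, as pointed out for Step 6, no bootstrap outcomes are generated: the replicates come entirely from re-solving the randomly weighted equations on the original data.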