The Gaussian process emulator can also be interpreted as an approximation of the computer model by kernel interpolation with radial basis functions, as in Schaback (1995, 2007). This framework provides point-wise control of the approximation error. Hence, we are able to bound the distance between the maximum likelihood estimates in the approximate mixed meta-models and the maximum likelihood estimates obtained with the exact computer model. This bound decreases to zero as the design of numerical experiments becomes more space-filling. The paper is organized as follows. Section 2 introduces the standard non-linear mixed model and Section 3 recalls the principles and main results of Gaussian process emulation. Section 4 introduces three mixed models approximated by a Gaussian process emulator. In Section 5, three versions of the SAEM algorithm coupled to a Gaussian process emulator are proposed. Theoretical results are given in Section 6. A simulation study illustrates these results (Section 7). Section 8 concludes the paper with some extensions. Proofs are gathered in the Appendix.
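The kernel-interpolation view of the emulator, and the way its error shrinks as the design fills the input space, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-dimensional `computer_model` is a hypothetical stand-in for an expensive simulator, the Gaussian kernel and its length scale are arbitrary choices, and the small `jitter` term is added only for numerical stability.

```python
import numpy as np

def gaussian_kernel(x, y, length_scale=0.15):
    """Gaussian radial basis kernel between two 1-D point sets."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def fit_interpolant(design, values, length_scale=0.15, jitter=1e-8):
    """Return the kernel interpolant through (design, values)."""
    K = gaussian_kernel(design, design, length_scale)
    # Solve K w = values; the jitter only stabilises the linear solve.
    weights = np.linalg.solve(K + jitter * np.eye(len(design)), values)
    return lambda x: gaussian_kernel(x, design, length_scale) @ weights

def computer_model(x):
    """Hypothetical stand-in for the expensive computer model."""
    return np.sin(2.0 * np.pi * x)

def max_error(n_design):
    """Largest point-wise emulation error on a fine grid of [0, 1]."""
    design = np.linspace(0.0, 1.0, n_design)  # equally spaced design
    emulator = fit_interpolant(design, computer_model(design))
    grid = np.linspace(0.0, 1.0, 501)
    return float(np.max(np.abs(emulator(grid) - computer_model(grid))))

# A denser (more space-filling) design gives a smaller worst-case error.
for n in (4, 8, 16):
    print(n, max_error(n))
```

In this toy setting the worst-case error on the grid drops as the number of design points grows, which is the qualitative behaviour the point-wise error control relies on.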
We presented two approaches to fit linear mixed models accounting for left-censoring of the response, and we showed with an example that they gave the same results. Thus, from a practical point of view, to fit mixed models for left-censored repeated measures, one can choose between NLMIXED and CENSAD. The main criteria for choosing between the approaches are the structure of the data and the model used. CENSAD will be limited when many measures are censored, while too many random effects will limit NLMIXED because of the numerical integration. Another point is the potential extension of the estimation to more general models. CENSAD allows including a Gaussian process in the error term, such as a first-order auto-regressive process or a Brownian motion. The extension to a bivariate model is direct with NLMIXED (see section 3.2) and possible using the other method. Using NLMIXED, the main limitation is then the number of random effects. In our experience, the procedure was reliable up to four random effects, which allows, for example, a bivariate model with two random intercepts and two random slopes.
Mixed models with random effects are often used, for example in pharmacokinetics. They describe both individual behavior and variability between individuals. The distribution of the random effects is of special interest. It makes it possible, for example, to describe the heterogeneity of the drug kinetics in the population of individuals. Mixed models have been widely studied, often with parametric strategies and Gaussian random effects and noise (see Pinheiro and Bates, 2000). However, it is not clear that this normality assumption on the random effects is truly satisfied in practice. The aim of this paper is to produce a nonparametric estimate of the density of the random effects from the observations Y_{k,j}.
Sarah Ola Moreira 1 , Karin Tesch Kuhlcamp 2 , Fabíola Lacerda de Souza Barros 2 ,
Moises Zucoloto 3 , Alyce Carla Rodrigues Moitinho 4
Abstract – Few cultivars of papaya from the Formosa group are available to producers, and the development of new genotypes is indispensable. Thus, effective selection strategies are needed to obtain more productive cultivars and better quality fruits. The aim of this study was to select half-sib families (HSF) of papaya using the mixed models methodology. Nineteen HSFs from Incaper's papaya breeding program were evaluated in a randomized block design with five replicates and nine plants per plot. The selection was based on fruit mass (FM), pulp thickness (PT), soluble solids content in pulp (SS) and number of fruits (NF). The genetic parameters and genotypic values were estimated by the REML/BLUP procedure. The selected HSFs increased FM by 26.1%, PT by 10.5%, SS by 7.5% and NF by 13.0%. The additive heritability within the progenies and the individual variation coefficient obtained indicate that selection between and within the HSFs can provide greater genetic gains. Selection based on the REML/BLUP methodology was efficient in obtaining simultaneous genetic gains for all variables under study, despite the negative correlation between them.
An interesting direction for further research would be to develop the statistical methodology for semi-Markov switching generalized linear mixed models. Since the hidden semi-Markov chain likelihood cannot be written as a simple product of matrices, the MCEM algorithm proposed by Altman (2007) for the MS-GLMM cannot be directly extended to the semi-Markovian case. In our MCEM-like algorithm proposed for MS-LMM and SMS-LMM, the difficulty lies mainly in the prediction of the random effects.
In experimental sciences (agronomy, biology, experimental psychology, ...), analysis of variance (ANOVA) is often used to explain one continuous response with respect to different experimental conditions, assuming homoscedastic errors. In studies where individuals contribute more than one observation, such as longitudinal or repeated-measures studies, classical ANOVA is no longer appropriate since the assumption of data independence is not valid. The linear mixed model (Laird and Ware, 1982) then provides a better framework to take the correlation between these observations into account. By introducing random effects, mixed models make it possible to account for the variability of the response among the different individuals and for the possible within-individual correlation. Published case studies using a mixed model approach (Baayen et al., 2008; Onyango,
Applications of linear mixed models (LMMs) to problems in genomics include phenotype prediction, correction for confounding in genome-wide association studies, estimation of narrow sense heritability, and testing sets of variants (e.g., rare variants) for association. In each of these applications, the LMM uses a genetic similarity matrix, which encodes the pairwise similarity between every two individuals in a cohort. Although ideally these similarities would be estimated using strictly variants relevant to the given phenotype, the identity of such variants is typically unknown. Consequently, relevant variants are excluded and irrelevant variants are included, both having deleterious effects. For each application of the LMM, we review known effects and describe new effects showing how variable selection can be used to mitigate them.
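To make the role of the genetic similarity matrix concrete, the following minimal sketch builds one from a matrix of allele counts. It assumes one common construction (standardise each variant, then average the cross-products over variants); the function name, the simulated genotypes and the choice of the first 50 variants as a "selected" subset are all hypothetical and only illustrate how variable selection changes the matrix.

```python
import numpy as np

def genetic_similarity(genotypes):
    """Similarity matrix from an (individuals x variants) array of 0/1/2 counts.

    Assumed construction: centre and scale each variant, then average the
    cross-products over variants. Monomorphic variants carry no information
    and are dropped.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)
    sd = G.std(axis=0)
    keep = sd > 0
    Z = G[:, keep] / sd[keep]
    return Z @ Z.T / Z.shape[1]

# Simulated allele counts for 6 individuals and 200 variants (illustrative).
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(6, 200))

K_all = genetic_similarity(genotypes)             # every variant included
K_subset = genetic_similarity(genotypes[:, :50])  # hypothetical selected subset
print(K_all.shape, K_subset.shape)
```

Both matrices are symmetric with one row and column per individual; which variants enter the computation is exactly the choice that variable selection is meant to inform.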
and the regression function is the solution of an ODE. The estimation problem in the case where the ODE has no analytical solution has already been solved in .
However, most of the time, the biological process under study is not fully understood or is too complex to be modeled deterministically. So, to account for time-dependent or serially correlated residual errors and to handle real-life variations in model parameters occurring over time, mixed models described by stochastic differential equations (SDEs) have been introduced in the literature (see  or  for instance). These models are a natural extension of the models defined by ODEs, making it possible to account for errors associated with misspecifications and approximations in the dynamic system.
Bivariate linear mixed models are useful when analyzing longitudinal data of two associated markers. In this paper, we present a bivariate linear mixed model including random effects or a first-order auto-regressive process and independent measurement error for both markers. Codes and tricks to fit these models using SAS Proc MIXED are provided. Limitations of this program are discussed and an example in the field of HIV infection is shown. Despite some limitations, SAS Proc MIXED is a useful tool that may be easily extended to multivariate responses in longitudinal studies.
Nonlinear mixed effects models (NLMM) are increasingly used for the analysis of longitudinal data and repeated measurements in pharmacokinetics, growth and other studies. Compared to linear mixed models, the parameters of such models have a better biological interpretation in terms of the mechanisms involved, and the corresponding models are also more parsimonious. The main interest of this paper is to obtain good parameter estimates using maximum likelihood estimation in nonlinear mixed effects models. Several procedures have already been proposed to estimate the parameters of NLMM. The first ones were based on linearization of the log-likelihood, such as the first-order (FO) and first-order conditional expectation (FOCE) approximations (Sheiner and Beal, 1980; Lindstrom and Bates, 1990). Since errors can be large in the approximation of the observed log-likelihood (Davidian and Giltinan, 1995), some methods based on exact maximum likelihood (ML) were proposed, such as Gaussian quadrature and Monte Carlo methods. However, integration via Gaussian quadrature can be difficult and inaccurate in high dimensions, so stochastic tools may be a powerful alternative. Wei and Tanner (1990) proposed the MCEM algorithm, in which the E-step of the EM algorithm is approximated using a large sample of simulated data, which makes it highly time consuming. For instance, Booth and Hobert (1999) reported results from a study on a real data set in which they simulated around 60,000 samples for the final iteration. Delyon et al. (1999) proposed a method which promises convergence with fewer simulations: the SAEM algorithm. In this method, the E-step of the EM algorithm is replaced by a Simulation step and a Stochastic Approximation step.
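The Simulation and Stochastic Approximation steps can be sketched on a deliberately simple toy model. This is an illustration under stated assumptions, not the algorithm as implemented in any cited work: the model is a linear random-intercept model y_ij = phi_i + e_ij with known residual variance, chosen because the conditional distribution of phi_i is Gaussian and can be sampled exactly; the function name, step-size schedule and simulated data are all hypothetical.

```python
import numpy as np

def saem_random_intercept(y, sigma2, n_iter=300, burn_in=100, seed=0):
    """Toy SAEM for y_ij = phi_i + e_ij, phi_i ~ N(mu, omega2), e_ij ~ N(0, sigma2).

    sigma2 is assumed known; mu and omega2 are estimated.
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_obs = y.shape
    mu, omega2 = 0.0, 1.0   # arbitrary starting values
    s1, s2 = 0.0, 0.0       # approximated sufficient statistics
    for k in range(1, n_iter + 1):
        # Simulation step: draw phi_i from its conditional distribution
        # given y_i and the current parameter values (Gaussian here).
        v = 1.0 / (n_obs / sigma2 + 1.0 / omega2)
        m = v * (y.sum(axis=1) / sigma2 + mu / omega2)
        phi = rng.normal(m, np.sqrt(v))
        # Stochastic Approximation step: step size 1 during burn-in,
        # then decreasing as 1 / (k - burn_in).
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s1 += gamma * (phi.sum() - s1)
        s2 += gamma * ((phi ** 2).sum() - s2)
        # Maximisation step: closed-form update from the statistics.
        mu = s1 / n_subjects
        omega2 = max(s2 / n_subjects - mu ** 2, 1e-8)
    return mu, omega2

# Simulated data: 200 subjects, 5 observations each (illustrative values).
rng = np.random.default_rng(1)
phi_true = rng.normal(2.0, 0.5, size=200)
y = phi_true[:, None] + rng.normal(0.0, 0.3, size=(200, 5))
mu_hat, omega2_hat = saem_random_intercept(y, sigma2=0.09)
print(mu_hat, omega2_hat)
```

Only one draw of the random effects is simulated per iteration; the decreasing step size averages these draws over iterations, which is what lets the method work with far fewer simulations than MCEM.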
When the conditional distribution of the missing effects given the observations is unknown, Kuhn and Lavielle (2004, 2005) combined the SAEM algorithm with an MCMC procedure, such as the Metropolis-Hastings algorithm, yielding the so-called SAEM-MCMC algorithm. In practice, the main difficulty with this method is calibrating its tuning parameters adequately so as to obtain good parameter estimates.
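When exact sampling is not available, the simulation step can be replaced by a few Metropolis-Hastings iterations targeting the conditional distribution of the random effect. The sketch below assumes a hypothetical one-parameter model y_j = exp(-phi t_j) + e_j with a Gaussian population distribution for phi and a random-walk proposal; the names, the proposal scale `step` and the number of inner iterations `n_steps` are illustrative tuning choices, not values taken from the cited papers.

```python
import numpy as np

def log_target(phi, y, t, mu, omega2, sigma2):
    """Log of p(y | phi) p(phi) up to a constant, for y_j = exp(-phi t_j) + e_j."""
    resid = y - np.exp(-phi * t)
    return -0.5 * np.sum(resid ** 2) / sigma2 - 0.5 * (phi - mu) ** 2 / omega2

def mh_simulation_step(phi, y, t, mu, omega2, sigma2, rng, n_steps=5, step=0.05):
    """Run a few random-walk Metropolis-Hastings iterations from phi."""
    current = log_target(phi, y, t, mu, omega2, sigma2)
    for _ in range(n_steps):
        proposal = phi + step * rng.normal()
        candidate = log_target(proposal, y, t, mu, omega2, sigma2)
        # Symmetric proposal: accept with probability min(1, target ratio).
        if np.log(rng.uniform()) < candidate - current:
            phi, current = proposal, candidate
    return phi

# Illustration for one subject with true phi = 0.5 (simulated data).
rng = np.random.default_rng(2)
t = np.linspace(0.5, 5.0, 10)
y = np.exp(-0.5 * t) + rng.normal(0.0, 0.05, size=t.size)

phi, draws = 0.4, []
for _ in range(2000):
    phi = mh_simulation_step(phi, y, t, mu=0.4, omega2=0.04, sigma2=0.0025, rng=rng)
    draws.append(phi)
posterior_mean = float(np.mean(draws[500:]))
print(posterior_mean)
```

The proposal scale and the number of inner iterations are examples of the tuning parameters whose calibration affects the quality of the resulting estimates.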
The estimation algorithms proposed in this paper can be directly transposed to other families of hidden Markov models, such as hidden Markov tree models; see Durand et al. (2005) and references therein. Another interesting direction for further research would be to develop the statistical methodology for semi-Markov switching generalized linear mixed models to take into account non-normally distributed response variables (for instance, number of growth units, apex death/life, non-flowering/flowering character in the plant architecture context). Since the conditional expectation of the random effects given the state sequences cannot be derived analytically, the proposed MCEM-like algorithm for the semi-Markov switching linear mixed model cannot be transposed to the case of non-normally distributed observed data, and other conditional restoration steps, for instance based on a Metropolis-Hastings algorithm, have to be derived for the random effects.
Besides, regularisation methods have already been developed for GLMM, in which the random effects make it possible to model complex dependence structures. Eliot et al. [3] proposed to extend the classical ridge regression to Linear Mixed Models (LMM). The Expectation-Maximisation algorithm they suggest includes a new step to find the best shrinkage parameter, in the Generalised Cross-Validation (GCV) sense, at each iteration. More recently, Groll and Tutz [4] proposed an L1-penalised algorithm for fitting a high-dimensional
corresponding curves on Figure 1. Such a data set has been previously analyzed by Mignon-Grasteau et al. (1999), Jaffrézic et al. (2006) and Meza et al. (2007), who concluded that, among the standard growth models, the monotonic mixed Gompertz model is the most appropriate one. This model suits most subjects; however, it fails to model the
Comparison of the estimated Gaussian hidden semi-Markov chain (GHSMC) parameters (i.e. where the influence of covariates and the inter-individual heterogeneity are not taken into account) with the estimated semi-Markov switching linear mixed model (SMS-LMM) parameters (state occupancy distributions and marginal observation distributions). The regression parameters, the cumulative rainfall effect and the variability decomposition are given
[Figure 3 about here.] [Figure 4 about here.]
The main original element of this study is the development of the SAEM algorithm for two-level non-linear mixed effects models. We extend the SAEM algorithm developed by Kuhn and Lavielle (16), which was not yet adapted to the case of MNLMEMs with two levels of random effects. This algorithm will be implemented in the 3.1 version of the monolix software, freely available on the following website: http://monolix.org. The two levels of random effects are the between-subject variance and the within-subject (or between-unit) variance, with N subjects and K units, with no restriction on N or K. We show that the SAEM algorithm is split into two parts: an explicit EM algorithm and a stochastic EM part. The integration of the term p(b|φ; θ) in the likelihood results in the derivation of two additional sufficient statistics compared to the original algorithm. Furthermore, it uses two intermediate quantities, the conditional expectation and variance of the between-subject random effects b. The addition of higher levels of variability would therefore require other extensions of the algorithm.
Bootstrapping the raw random effects and residuals does not take into account variance underestimation, which leads to shrinkage in the individual parameter estimates. To account for this issue, we employed a correction using the ratio between the estimated and empirical variance-covariance matrices for the random effects and the residuals. It was shown to be an appropriate method for linear mixed-effects models because it improves the estimation of variance components. These ratios account for the degree of two shrinkages: η-shrinkage and ϵ-shrinkage, which quantify the amount of information in the individual data about the parameters [27, 28, 29]. When the data are not informative, the random effects and residuals are shrunk toward 0 and a high degree of η-shrinkage and ϵ-shrinkage is obtained. Sampling in the raw distribution will therefore underestimate the actual level of variability in the data, while correcting both empirical random effects and residuals for shrinkage restores this level. This idea of accounting for the difference between the estimated and empirical variance of residuals through an estimate of the shrinkage was proposed in bootstrapping ordinary linear models, and was extended for the two levels of variability found in mixed models by Wang et al.
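A minimal sketch of such a correction is given below. It assumes one particular way of forming the "ratio" between the estimated and empirical variance-covariance matrices, namely through their Cholesky factors; the function name and the simulated shrunk effects are hypothetical, and the same transformation could be applied to residuals with a scalar variance.

```python
import numpy as np

def correct_for_shrinkage(effects, estimated_cov):
    """Rescale empirical random effects so their covariance equals estimated_cov.

    Assumed construction: with Cholesky factors L_est and L_emp of the
    estimated and empirical covariance matrices, apply A = L_est L_emp^{-1}
    to the centred effects.
    """
    centered = effects - effects.mean(axis=0)
    empirical_cov = centered.T @ centered / len(centered)
    L_est = np.linalg.cholesky(estimated_cov)
    L_emp = np.linalg.cholesky(empirical_cov)
    A = L_est @ np.linalg.inv(L_emp)
    return centered @ A.T

# Simulated empirical effects, shrunk toward 0 relative to omega_hat.
rng = np.random.default_rng(3)
omega_hat = np.array([[0.25, 0.05],
                      [0.05, 0.10]])
shrunk = 0.6 * rng.multivariate_normal(np.zeros(2), omega_hat, size=100)

corrected = correct_for_shrinkage(shrunk, omega_hat)
# Bootstrap step: resample the corrected effects with replacement.
bootstrap_sample = corrected[rng.integers(0, len(corrected), size=len(corrected))]
print(np.cov(shrunk.T, bias=True), np.cov(corrected.T, bias=True))
```

After the correction, the empirical covariance of the effects matches the estimated matrix, so resampling from them no longer understates the variability.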