Generalized linear latent variable models with flexible distributions
IRINCHEEVA, Irina
Abstract
We consider first a semi-nonparametric specification for the density of latent variables in Generalized Linear Latent Variable Models (GLLVM). This specification is flexible enough to allow for an asymmetric, multi-modal, heavy- or light-tailed smooth density. The degree of flexibility required by many applications can be achieved through this semi-nonparametric specification with a finite number of parameters. Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show that the estimated density of the latent variables captures the true one with good accuracy. In the second part we consider a spatial generalized linear latent variable model with and without the assumption of normality on the latent variables. The Laplace approximation is applicable when the latent variables are multivariate normal. Otherwise the assumption of marginal normality is relaxed in favor of a mixture of normals. The properties of the pairwise likelihood estimators are explored by simulations.
IRINCHEEVA, Irina. Generalized linear latent variable models with flexible distributions. Doctoral thesis: Univ. Genève, 2011, no. SES 758
URN : urn:nbn:ch:unige-169962
DOI : 10.13097/archive-ouverte/unige:16996
Available at:
http://archive-ouverte.unige.ch/unige:16996
GENERALIZED LINEAR LATENT VARIABLE MODELS WITH FLEXIBLE DISTRIBUTIONS
Thesis presented to the Faculty of Economic and Social Sciences of the Université de Genève
by Irina Irincheeva
for the degree of
Doctor of Economic and Social Sciences, specialization: Statistics
Members of the thesis committee:
Prof. Eva CANTONI-RENAUD, thesis co-supervisor, Université de Genève
Prof. Marc GENTON, thesis co-supervisor, Texas A&M University, USA
Prof. Irini MOUSTAKI, London School of Economics, London, UK
Prof. Maria-Pia VICTORIA-FESER, president of the committee, Université de Genève
Thesis No. 758
Geneva, 15 August 2011
The Faculty of Economic and Social Sciences, on the recommendation of the committee, has authorized the printing of this thesis, without thereby expressing any opinion on the propositions stated in it, which are the sole responsibility of their author.
Geneva, 15 August 2011
The Dean
Bernard MORARD
Printed from the author's manuscript
Generalized Linear Latent Variable Models with Flexible Distributions
THESIS
presented to the Faculty of Economic and Social Sciences of the Université de Genève
by
Irina IRINCHEEVA
under the supervision of
Prof. Eva Cantoni and Prof. Marc G. Genton
for the degree of
Doctor of Economic and Social Sciences, specialization: Statistics
Members of the thesis committee:
Ms. Eva CANTONI, Professor
Marc G. GENTON, Professor, Texas A&M University, USA
Irini MOUSTAKI, Professor, London School of Economics, UK
Maria-Pia VICTORIA-FESER, Professor, president of the committee
Thesis No. 758, Geneva, 15 August 2011
Contents
1 Introduction
2 Generalized linear latent variable models with flexible distribution of latent variables
2.1 Generalized linear latent variable models
2.2 Semi-nonparametric GLLVM
2.2.1 Parametrization of $P_L(z)$
2.2.2 Identifiability and constraints
2.2.3 Multi-modality and flexibility of SNP
2.3 Inference in SNP-GLLVM
2.3.1 Conditionally normal manifest variables
2.3.2 Mixture of conditionally binary and normal manifest variables
2.3.3 Tuning the flexibility of the SNP density
2.4 Monte Carlo simulations
2.5 Data analysis
2.5.1 Swineford-Holzinger data analysis
2.5.2 Swiss consumption data analysis
2.6 Residuals and latent scores
2.6.1 Latent scores prediction
2.6.2 Residuals
2.6.3 Illustrations on simulations
2.6.4 Real data examples
2.7 Discussion
3 Spatial Generalized Linear Latent Variable Models
3.1 Introduction
3.2 Spatial GLLVM with Gaussian Latent Variables
3.2.1 Conditionally Gaussian manifest variables
3.2.2 Mixed-scale manifest variables
3.3 Spatial GLLVM with mixture of Gaussians latent variables
3.3.1 Assumptions on the latent variables distribution
3.3.2 Pairwise likelihood
3.3.3 Adaptive integration
3.3.4 Pairwise Monte-Carlo EM algorithm
3.4 Implementation and Simulations
3.5 Analysis of Precision Agriculture Data
3.6 Discussion
4 Conclusion
A Gradient computation: normal manifest and multivariate latent
B Hessian computation: normal manifest and multivariate latent
C Gradient computation: mixed manifest and multivariate latent
D Proof of Proposition 2.2.1 and Corollary 2.2.2
E Model selection for mixture of normals latent variable
F Swiss consumption data analysis, one latent variable
G SNP-GLLVM log-h-likelihood: Hessian matrix elements
References
Summary
In the first part of this thesis (Chapter 2) we consider a semi-nonparametric specification for the density of the latent variables in generalized linear latent variable models (GLLVM). This specification is flexible enough to allow a sufficiently smooth density to be asymmetric or multi-modal, and to have heavy or light tails. The degree of flexibility needed for many applications of GLLVM is available through this semi-nonparametric specification, with a finite number of parameters estimated by maximum likelihood. Even with this additional flexibility, we obtain an explicit expression of the likelihood in the case of conditionally normal manifest variables.
We show by simulations that the estimated density of the latent variables captures the true density with good accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of the latent variables is a good tool for exploring the fit of a GLLVM in practice.
In the second part (Chapter 3) we consider the spatial generalized linear latent variable model with and without the normality assumption on the latent variables. It is shown that the Laplace approximation can be applied when the latent variables follow a multivariate normal distribution. Otherwise we abandon the assumption of marginal normality in favor of a mixture of normals with a common covariance matrix. We show how to construct a multivariate density with Gaussian spatial dependence and given non-overlapping marginal densities, and we use the pairwise likelihood to estimate the spatial generalized linear latent variable model. The properties of the estimators are explored by simulations.
Abstract
In the first part (Chapter 2) of this thesis we consider a semi-nonparametric specification for the density of latent variables in Generalized Linear Latent Variable Models (GLLVM). This specification is flexible enough to allow for an asymmetric, multi-modal, heavy- or light-tailed smooth density. The degree of flexibility required by many applications of GLLVM can be achieved through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood.
Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show by simulations that the estimated density of the latent variables captures the true one with a good degree of accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of latent variables is a useful tool for exploring the adequacy of the GLLVM in practice.
In the second part (Chapter 3) we consider a spatial generalized linear latent variable model with and without the assumption of normality on the latent variables. As shown, a Laplace approximation can be applied when the latent variables are assumed multivariate normal. Otherwise the assumption of marginal normality is relaxed in favor of a mixture of normals with a common covariance matrix. We show how to construct a multivariate density with Gaussian spatial dependence and given nonoverlapping multivariate margins, and we use the pairwise likelihood to estimate the corresponding spatial generalized linear latent variable model. The properties of the obtained estimators are explored by simulations.
Acknowledgments
This thesis would not have been possible without Eva Cantoni, Marc G. Genton, Irini Moustaki, Maria-Pia Victoria-Feser, Elvezio Ronchetti, Philippe Huber, Christian, Erna and André Meylan, Laura and Alexandra Irincheeva, Konstantin Irincheev, the Swiss Government scholarship for foreign students, a grant of the Paul Moriaud foundation and the “Tremplin” grant of the “Commission de l’Egalité SES”. Thank you!
Chapter 1
Introduction
Latent variables are ubiquitous in science and applications but cannot be measured directly; they are usually assessed through latent variable models that use observed variables as proxies. Probably the most popular of such models is the Generalized Linear Latent Variable Model (GLLVM)¹. The principal uses of latent variables are heterogeneity modeling, dimension reduction and index construction. Essentially, they make it possible to uncover meaningful structure in noisy multivariate data.
The development of latent variable models is usually associated with psychology, where, starting with the pioneering work of Spearman (1904) on factor analysis, researchers use latent variables to assess and make inference about intelligence, anxiety and other useful theoretical concepts. Later, latent variables came to be used by researchers in other social sciences and in economics to model constructs such as racism, quality of life and poverty. The notion of a latent variable has much in common with the disease-related biomarker (a proxy that summarizes various particular signs of illness), which is why latent variable models are now beginning to be used to study illness markers jointly, as, for example, in Martins et al. (2006).
¹This class of models is known under two names: GLLVM, as in Huber, Ronchetti, and Victoria-Feser (2004), and GLTM (Generalized Latent Trait Model), as in Moustaki and Knott (2000).
High-dimensional and/or large data sets are now routine in many applied domains. There is a pressing need to develop statistical tools for analyzing and interpreting such data, for which the normality assumption is often inappropriate, as noted for example in the very promising work on sparse factor analysis by Carvalho et al. (2008). However, the restriction to continuous manifest variables is clearly a limitation of sparse factor analysis models. In this work we relax the normality assumption on the latent variables in the framework of the GLLVM, which allows for categorical manifest variables (with distributions belonging to the exponential family).
This introduction is followed by Chapter 2, where we introduce the semi-nonparametric specification of Gallant and Nychka (1987) for the latent variable distribution. This specification is flexible enough to allow for an asymmetric, multi-modal, heavy- or light-tailed smooth density. The degree of flexibility required by many applications of GLLVM can be achieved through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood.
Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show by simulations that the estimated density of the latent variables captures the true one with a good degree of accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of latent variables is a useful tool for exploring the adequacy of the GLLVM in practice.
In Chapter 3 we explore the spatial GLLVM from the frequentist point of view, with and without the assumption of marginal normality on the latent variables. As shown, a Laplace approximation can be applied when the latent variables are assumed multivariate normal. Otherwise the assumption of marginal normality is relaxed in favor of a mixture of normals with a common covariance matrix. We show how to construct a multivariate density with Gaussian spatial dependence and given nonoverlapping multivariate margins, and we use the pairwise likelihood to estimate the corresponding spatial generalized linear latent variable model. The properties of the obtained estimators are explored by simulations. A final chapter (Chapter 4) concludes this manuscript.
Chapter 2
Generalized linear latent variable models with flexible distribution of latent variables
In this chapter we consider a semi-nonparametric specification for the density of latent variables in Generalized Linear Latent Variable Models (GLLVM). This specification is flexible enough to allow for an asymmetric, multi-modal, heavy- or light-tailed smooth density. The degree of flexibility required by many applications of GLLVM can be achieved through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood. Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show by simulations that the estimated density of the latent variables captures the true one with a good degree of accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of latent variables is a useful tool for exploring the adequacy of the GLLVM in practice.
2.1 Generalized linear latent variable models
Latent variables, as hypothetical constructs, are present in almost all sciences and in daily life. Indeed, constructs such as quality of life, physical health or disease are widespread in research and applications but cannot be measured directly. Usually scientists make and validate inference on those constructs with the help of latent variable models that use observable variables as proxies. In the aforementioned examples, we can imagine quality of life being modeled through economic wealth and access to drinking water; physical health can be assessed through cholesterol and hemoglobin levels, body mass index, eyesight, hearing and the presence of chronic diseases; a viral infection or other disease can be revealed by fever, the level of particular antibodies, the erythrocyte sedimentation rate or the level of C-reactive protein. The principal aim of Generalized Linear Latent Variable Models (GLLVM, a concept introduced by Bartholomew, 1980 and 1984), as of any factor analysis model, is to explain most of the variability of $p$ observed (manifest) variables $X_1,\dots,X_p$ by constructing $q<p$ latent variables $Z_1,\dots,Z_q$. To this aim, the GLLVM assumes that $\Gamma Z$, with unknown parameter matrix $\Gamma\in\mathbb{R}^{p\times q}$ and $Z=(Z_1,\dots,Z_q)^T$, explains all the systematic variability of the manifest variables via the conditional probability density, or mass, function
$$g(x\mid z;\mu,\Gamma)=\prod_{j=1}^{p}g_j(x_j\mid\mu_j+\gamma_j^Tz), \tag{2.1}$$
where $x=(x_1,\dots,x_p)^T\in\mathbb{R}^p$, $z=(z_1,\dots,z_q)^T\in\mathbb{R}^q$, $\mu_j$ is the location parameter of $x_j$, $\gamma_j\in\mathbb{R}^q$ is the $j$th row of the $p\times q$ parameter matrix $\Gamma$, and $g_j(\cdot)$ is a probability density or mass function of a distribution from the exponential family.
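The marginal probability density or mass function of the manifest variables is
$$\begin{aligned} f(x\mid\mu,\Gamma,\psi)&=\int_{\mathbb{R}^q}g(x\mid z)\,h(z)\,dz\\ &=\int_{\mathbb{R}^q}\left[\prod_{j=1}^{p}\exp\left\{\frac{x_j(\mu_j+\gamma_j^Tz)-b_j(\mu_j+\gamma_j^Tz)}{\psi_j}+c_j(x_j,\psi_j)\right\}\right]h(z)\,dz, \end{aligned}\tag{2.2}$$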
with functions $b_j(\cdot)$ and $c_j(\cdot,\cdot)$ and, in some cases, an additional scale parameter $\psi_j\in\mathbb{R}$, where $\psi=(\psi_1,\dots,\psi_p)^T$. The linear combination $\mu_j+\gamma_j^Tz$ is related to the expected value of $X_j\mid z$ through the inverse link function, denoted here $\theta_j(\cdot)$: $E(X_j\mid z)=\theta_j(\mu_j+\gamma_j^Tz)$. If needed, observable covariates $Y_1,\dots,Y_m$ possibly explaining the manifest variables $X_1,\dots,X_p$ can be introduced by setting $E(X_j\mid z,y)=\theta_j(\mu_j+\beta_j^Ty+\gamma_j^Tz)$, where $y,\beta_j\in\mathbb{R}^m$, and $\mu=(\mu_1,\dots,\mu_p)^T$, $\beta=(\beta_1,\dots,\beta_p)^T$ and $\Gamma$ are the parameters to be estimated. Model (2.2) then becomes a generalization (for responses with a distribution from the exponential family) of the response equation in a structural equation model, as in Rabe-Hesketh & Skrondal (2004, page 78). Although the structural relation among latent variables is not the focus of the GLLVM, it can be modeled straightforwardly through an additional structural equation as in Liu et al. (2005). Model (2.2) with observable covariates can be generalized to two or more levels by adding subscripts to the manifest and latent variables as in Rabe-Hesketh & Skrondal (2004, page 99), which makes it possible to model, for example, multivariate longitudinal data as in Cagnone et al. (2009) or Dunson (2003). Thus the GLLVM with observable covariates can be seen as an approach to a multivariate generalized mixed effects model.
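To make (2.1) and (2.2) concrete, the Python sketch below evaluates the marginal density of a single observation for a GLLVM with one standard normal latent variable ($q=1$, so $h=\phi$) and conditionally normal manifest variables. The integral over $z$ is approximated with NumPy's Gauss-Hermite rule `hermegauss`, and the result is checked against the known closed form $X\sim N_p\{\mu,\mathrm{diag}(\psi)+\gamma\gamma^T\}$. The function names, the quadrature rule and all parameter values are illustrative assumptions rather than the implementation used in this thesis, which is written in R.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def log_g_normal(x, z, mu, gamma, psi):
    """log g(x | z) in (2.1) for conditionally normal manifest variables and q = 1."""
    eta = mu + gamma * z                              # linear predictors mu_j + gamma_j z
    return np.sum(-0.5 * np.log(2 * np.pi * psi) - 0.5 * (x - eta) ** 2 / psi)

def marginal_density_gh(x, mu, gamma, psi, n_nodes=60):
    """f(x) = int g(x | z) phi(z) dz in (2.2), by Gauss-Hermite quadrature.

    hermegauss uses the weight exp(-z^2 / 2), whose total mass is sqrt(2 pi);
    dividing by sqrt(2 pi) turns the rule into an expectation under N(0, 1).
    """
    nodes, weights = hermegauss(n_nodes)
    g_vals = np.array([np.exp(log_g_normal(x, z, mu, gamma, psi)) for z in nodes])
    return np.sum(weights * g_vals) / np.sqrt(2 * np.pi)

def marginal_density_exact(x, mu, gamma, psi):
    """Closed form for this case: X ~ N_p(mu, diag(psi) + gamma gamma^T)."""
    sigma = np.diag(psi) + np.outer(gamma, gamma)
    x0 = x - mu
    _, logdet = np.linalg.slogdet(2 * np.pi * sigma)
    return np.exp(-0.5 * logdet - 0.5 * x0 @ np.linalg.solve(sigma, x0))

mu = np.array([0.0, 1.0, -0.5])
gamma = np.array([0.6, 0.4, 0.5])
psi = np.array([1.0, 1.0, 0.8])
x = np.array([0.3, 1.4, -1.0])
print(marginal_density_gh(x, mu, gamma, psi), marginal_density_exact(x, mu, gamma, psi))
```

A fixed quadrature rule is adequate here because the integrand is a single Gaussian bump; the SNP densities introduced below can be multi-modal, which is why adaptive integration is preferred for them in Section 2.3.2.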
Traditionally, the density $h(z)$ of the latent variables is assumed to be multivariate normal. Bartholomew (1988) advocated the adequacy of the normal distribution for two principal reasons. The first reason is the “arbitrariness about the direction of measurement of a latent scale”, for example, the convention that high customer satisfaction is given a high score on the corresponding latent variable. Bartholomew (1988) suggested that only symmetric distributions of latent variables can overcome this arbitrariness. This statement was refuted by Montanari and Viroli (2010), who showed that any distribution of latent variables would overcome this arbitrariness. The second reason given by Bartholomew (1988) is that an incorrect specification of the latent variable distribution would not affect the estimates. On the contrary, Ma and Genton (2010) described settings of the GLLVM where misspecifying an asymmetric latent variable distribution biases the estimates.
Using alternatives to normality for the latent variables is not new in the statistical literature on the GLLVM and its submodels. For instance, in factor analysis Montanari and Viroli (2010) introduced skew-normal latent variables, alluding to the frequent asymmetry of appreciations; Yung (1997) modeled latent variables via a mixture of normals in order to handle heterogeneity of clusters; Wedel and Kamakura (2001) assumed the latent variables to have a continuous distribution from the exponential family in order to construct test statistics. To date, an explicit expression of the integral in (2.2) exists only when $g(x\mid z)$ is multivariate normal and the distribution of the latent variables $h(z)$ is multivariate normal or a mixture of normals as in Yung (1997). For the other cited cases, a numerical approximation of the integral is required. The model we propose approximates arbitrarily closely a wider class of latent variable distributions (and hence of manifest variable distributions too) than those proposed by Montanari and Viroli (2010) and Wedel and Kamakura (2001), yet yields an explicit expression of the integral in (2.2) in the case of conditionally normal manifest variables.
In another submodel of the GLLVM, the latent trait model with binary manifest variables, Knott and Tzamourani (2007) estimated the latent variable distribution by bootstrap combined with nonparametric maximum likelihood and concluded that the usual normality assumption on the latent variables “is not always justified”. Unlike nonparametric maximum likelihood estimation (Laird, 1978), the semi-nonparametric approach has an appealing smoothness property. A smooth density of the latent variable is easier to grasp: we do not have to take into account the possibility of other, differently defined supports, as in the case of a discrete mass-point distribution.
In the case of univariate responses, the appropriateness of the normal distribution for other latent variable (or random effects) models has been studied: for structural measurement error models by Huang et al. (2006), and for generalized mixed effects models by Rabe-Hesketh et al. (2003), who estimated the latent variable distribution with nonparametric likelihood, and by Chen et al. (2002), who used the semi-nonparametric approach as in the present chapter.
In addition to concluding that the normality assumption for the latent variables is inappropriate, Rabe-Hesketh et al. (2003) highlighted the importance of correct distributional assumptions for the prediction of latent scores. The estimated density of latent scores is simply the estimated density of latent variables. An inappropriate specification of this density, and the resulting misleading visualization, can lead to overlooked clusters and outliers and to misinterpretation of the estimation results.
In some cases the inadequacy of normally distributed latent variables can be due to a nonlinear dependence on the latent variables, as explored for structural equation models by Wall and Amemiya (2000) and for generalized latent variable models by Rizopoulos and Moustaki (2008).
In this chapter we consider GLLVM with both discrete and continuous manifest variables and propose that $h(z)$ in (2.2) have the semi-nonparametric (SNP) specification introduced by Gallant and Nychka (1987):
$$h(z)=P_L^2(z)\,\phi(z),\qquad P_L(z)=\sum_{0\le i_1+\cdots+i_q\le L}a_{i_1\dots i_q}\,z_1^{i_1}\cdots z_q^{i_q}, \tag{2.3}$$
where $(i_1,\dots,i_q)$ is a tuple of integers with $i_1,\dots,i_q\ge0$ and $\phi(z)$ is the $q$-variate standard normal $N_q(0,I)$ density. It is straightforward to see that $L=0$ corresponds to the case $Z\sim N_q(0,I)$. We show in Section 2.2.3 how the flexibility and the number of modes of the SNP density increase with the degree $L$ of the polynomial $P_L$.
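The following sketch evaluates the SNP density (2.3) for one latent variable ($q=1$), with $P_L$ stored as its coefficient vector $(a_0,\dots,a_L)$ in increasing degree. It checks numerically that $L=0$ with $a_0=1$ gives the standard normal density and that the illustrative choice $P_2(z)=(z^2-1)/\sqrt2$ integrates to one, since $E\{(W^2-1)^2\}=2$ for $W\sim N(0,1)$. The helper name `snp_density_1d` and the example coefficients are hypothetical; the normalization of general coefficients is the subject of Section 2.2.1.

```python
import numpy as np
from scipy.integrate import quad

def snp_density_1d(z, a):
    """h(z) = P_L(z)^2 * phi(z) from (2.3) with q = 1.

    a = (a_0, ..., a_L) are the coefficients of P_L(z) = sum_i a_i z^i.
    Whether h integrates to one depends on a (see Section 2.2.1).
    """
    p = np.polynomial.polynomial.polyval(z, a)         # P_L(z)
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)    # standard normal density
    return p ** 2 * phi

a_normal = [1.0]                                        # L = 0: standard normal
a_bimodal = [-1 / np.sqrt(2), 0.0, 1 / np.sqrt(2)]      # L = 2: P_2 vanishes at z = -1, 1

for a in (a_normal, a_bimodal):
    total, _ = quad(snp_density_1d, -np.inf, np.inf, args=(a,))
    print("L =", len(a) - 1, "integral =", round(total, 6))
```

The second density is zero at $z=\pm1$ and has three modes, a first glimpse of the multi-modality discussed in Section 2.2.3.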
Combining the GLLVM setting (2.2) and the SNP specification (2.3) results in what we call an SNP-GLLVM, where the marginal probability density or mass function of the manifest variables $X$ is
$$f(x\mid\mu,\Gamma,\psi,P_L)=\frac{1}{(2\pi)^{q/2}}\int_{\mathbb{R}^q}g(x\mid z)\,P_L^2(z)\exp\left\{-\tfrac12 z^Tz\right\}dz, \tag{2.4}$$
where the expression for $g(x\mid z)$ is given by (2.1).
In what follows we propose a necessary and sufficient condition for the identifiability of (2.4) and define the estimators $\hat\mu,\hat\Gamma,\hat\psi$ via maximum likelihood. For conditionally normal manifest variables, the integral in (2.4) can be computed explicitly. One of our main results is the demonstration by simulations that in some GLLVM settings an incorrect specification of the latent variable distribution biases the estimators $\hat\mu$ and $\hat\Gamma$.
The estimated SNP density of the latent variables (or latent score distribution) is easy to visualize, which is an advantage compared to the semiparametric GLLVM estimator proposed recently by Ma and Genton (2010). An obvious non-normality (multi-modality and/or skewness) of the latent score distribution can indicate the presence of outliers, a possible nonlinear dependence on the latent variables, heterogeneity of the population, or simply the inadequacy of the normal latent density for the particular data.
2.2 Semi-nonparametric GLLVM
2.2.1 Parametrization of $P_L(z)$
Restrictions must be imposed on the coefficients of $P_L(z)$ in order for $h(z)$ in (2.3) to be a density. This can be done as in Gallant and Tauchen (1989) by introducing a proportionality constant $1/\int P_L^2(z)\phi(z)\,dz$ and setting the constant term of the polynomial equal to one. Here we choose another parametrization of $P_L(z)$ that avoids the difficulties of constrained optimization. This parametrization, proposed by Zhang and Davidian (2001), consists in rewriting the validity condition on $h(z)$ as
$$1=\int_{\mathbb{R}^q}P_L^2(z)\phi(z)\,dz=E\{P_L^2(W)\}=a^TE(\tilde W\tilde W^T)\,a=a^TAa, \tag{2.5}$$
with $W\sim N_q(0,I)$, $P_L(W)=a^T\tilde W$ and $\tilde W=(1,W_1,\dots,W_q,W_1^2,W_1W_2,\dots,W_q^L)^T$, so that $A$ is a positive definite matrix by construction. Therefore there exists a nonsingular matrix $B$ (for example, the transposed Cholesky factor of $A$) such that $A=B^TB$. Defining $c=Ba$, (2.5) becomes $c^Tc=1$.
Hence $c=(c_1,\dots,c_d)^T$ can be represented in polar coordinates: $c_1=\sin\varphi_1$, $c_2=\cos\varphi_1\sin\varphi_2$, ..., $c_{d-1}=\cos\varphi_1\cdots\cos\varphi_{d-2}\sin\varphi_{d-1}$, $c_d=\cos\varphi_1\cos\varphi_2\cdots\cos\varphi_{d-2}\cos\varphi_{d-1}$, with angles $-\pi/2<\varphi_t\le\pi/2$, $t=1,\dots,d-1$, so that $c$ takes values only on half of the unit sphere in $\mathbb{R}^d$. More details on the polar coordinates transformation can be found in Scott (1992). Note that $d=\sum_{k=0}^{L}\binom{q+k-1}{k}$ according to Stetter (2004, page 228).
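Thus the density (2.3) can be rewritten as
$$h(z\mid\varphi,L)=P_L^2(z)\phi(z)=\left(a^T\tilde z\right)^2\phi(z), \tag{2.6}$$
where $a=B^{-1}c$, $\tilde z=(1,z_1,\dots,z_q,z_1^2,z_1z_2,z_2^2,\dots,z_q^L)^T$ and $\varphi=(\varphi_1,\dots,\varphi_{d-1})^T$. For example, when $q=1$ (one latent variable), $L=2$ and $P_L(z)=a_0+a_1z+a_2z^2$, we obtain $a_0=\sin\varphi_1-\frac{1}{\sqrt2}\cos\varphi_1\cos\varphi_2$, $a_1=\cos\varphi_1\sin\varphi_2$ and $a_2=\frac{1}{\sqrt2}\cos\varphi_1\cos\varphi_2$.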
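For $q=1$ the whole construction of this section (moment matrix $A$, factor $B$, polar coordinates, $a=B^{-1}c$) fits in a few lines. The Python sketch below assumes that $B$ is taken as the transposed Cholesky factor of $A$, which is one valid factorization and reproduces the closed-form coefficients given above for $L=2$; the function names and the angles $\varphi_1=0.3$, $\varphi_2=-0.9$ are illustrative.

```python
import numpy as np
from math import comb
from scipy.integrate import quad

def n_coefficients(q, L):
    """d = sum_{k=0}^{L} C(q+k-1, k): number of monomials of degree <= L in q variables."""
    return sum(comb(q + k - 1, k) for k in range(L + 1))

def polar_to_unit(angles):
    """c_1 = sin f_1, c_t = cos f_1 ... cos f_{t-1} sin f_t, c_d = cos f_1 ... cos f_{d-1}."""
    c = np.empty(len(angles) + 1)
    cos_prod = 1.0                                   # running product of cosines
    for t, f in enumerate(angles):
        c[t] = cos_prod * np.sin(f)
        cos_prod *= np.cos(f)
    c[-1] = cos_prod
    return c

def normal_moment(k):
    """E(W^k) for W ~ N(0, 1): 0 for odd k, (k-1)!! for even k."""
    return 0.0 if k % 2 else float(np.prod(np.arange(k - 1, 0, -2)))

def snp_coefficients_1d(angles):
    """a = B^{-1} c for q = 1, where A = E(W~ W~^T) = B^T B and W~ = (1, W, ..., W^L)."""
    L = len(angles)                                  # q = 1: d = L + 1, hence L angles
    A = np.array([[normal_moment(i + j) for j in range(L + 1)] for i in range(L + 1)])
    B = np.linalg.cholesky(A).T                      # upper triangular, A = B^T B
    return np.linalg.solve(B, polar_to_unit(angles))

f1, f2 = 0.3, -0.9
a = snp_coefficients_1d([f1, f2])
a_closed = np.array([np.sin(f1) - np.cos(f1) * np.cos(f2) / np.sqrt(2),
                     np.cos(f1) * np.sin(f2),
                     np.cos(f1) * np.cos(f2) / np.sqrt(2)])
h = lambda z: np.polynomial.polynomial.polyval(z, a) ** 2 * np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
print(n_coefficients(1, 2), np.allclose(a, a_closed), quad(h, -np.inf, np.inf)[0])
```

Any angles give a valid density, so the optimizer can work on $\varphi$ without constraints other than the range of the angles.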
2.2.2 Identifiability and constraints
As noted by many researchers, for example Rabe-Hesketh and Skrondal (2001, 2004), a major difficulty of all models with latent variables is identifiability. According to Hastie et al. (2001, page 494): “this aspect has left many analysts skeptical of factor analysis, and may account for its lack of popularity in contemporary statistics”.
A parametric statistical model is said to be identified if distinct values of the parameters correspond to distinct probability density or mass functions of the response variables. With this definition we investigate how any affine transformation of the latent variables $Z$ affects the probability density or mass function (2.4) of the random vector $X$.
Proposition 2.2.1 For any $P_L^2(z)\neq0$, the orthogonal transformation $Z_1=CZ$ ($CC^T=C^TC=I_q$) is the only affine transformation of the random vector $Z$ that leaves the probability density or mass function (2.4) of the random vector $X$ unchanged.
Corollary 2.2.2 The loadings matrix $\Gamma$ is defined only up to the orthogonal transformation $Z_1=CZ$ ($CC^T=C^TC=I_q$).
The proofs of these results are in Appendix D.
Huber et al. (2004) demonstrated that if the loadings matrix is defined only up to an orthogonal transformation, then a sufficient condition for the identifiability of a GLLVM is that $q(q-1)/2$ elements of the matrix $\Gamma$ are set to zero. In other words, after permutations, the elements of the upper triangle of $\Gamma$ should be constrained to zero. The same authors proved that if, in addition, at least one of the elements of the loadings matrix is constrained to be either smaller or larger than zero, then the loadings matrix is necessarily identified. Similar conclusions can be found in Ma and Genton (2010). The same number of $q(q-1)/2$ constraints is used by Jöreskog (1967) in order to obtain a unique solution in factor analysis and avoid rotation.
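The rotational indeterminacy behind Corollary 2.2.2, and one way of imposing the $q(q-1)/2$ zero constraints, can be checked numerically. The Python sketch below assumes normal latent variables ($L=0$), so that the manifest covariance is $\Psi+\Gamma\Gamma^T$; the QR-based rotation that zeroes the upper triangle of $\Gamma$ is an illustrative choice of ours, not necessarily the procedure used by Huber et al. (2004), and the random loadings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 6, 3
Gamma = rng.normal(size=(p, q))                       # illustrative loadings
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))          # illustrative uniquenesses

# Rotational indeterminacy: Z1 = C Z with C orthogonal gives loadings Gamma C^T
# and leaves the manifest covariance Psi + Gamma Gamma^T unchanged.
C, _ = np.linalg.qr(rng.normal(size=(q, q)))
Gamma_rot = Gamma @ C.T
print(np.allclose(Psi + Gamma @ Gamma.T, Psi + Gamma_rot @ Gamma_rot.T))

# One way to impose the q(q-1)/2 zero constraints: Gamma^T = Q R (QR decomposition),
# so Gamma Q = R^T has zeros above its diagonal; flipping column signs makes the diagonal positive.
Q, R = np.linalg.qr(Gamma.T)
Gamma_id = (Gamma @ Q) * np.sign(np.diag(R))
print(np.round(Gamma_id[:q], 3))
```

The sign flip corresponds to the additional inequality constraints mentioned above: without it, $\Gamma$ would still be determined only up to the signs of its columns.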
2.2.3 Multi-modality and flexibility of SNP
Similar to the distributions considered by Ma and Genton (2004), the number of modes of the semi-nonparametric density increases with the degree $L$ of the polynomial $P_L(z)$ and the number $q$ of latent variables. Indeed, a necessary condition for a mode (local extremum) at the point $z_0$ is a null gradient:
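$$\frac{\partial}{\partial z}P_L^2(z)\phi(z)\Big|_{z=z_0}=\left[\left\{2\frac{\partial P_L(z)}{\partial z}-zP_L(z)\right\}P_L(z)\phi(z)\right]_{z=z_0}=0,$$
i.e., either
$$P_L(z)\big|_{z=z_0}=0 \tag{2.7}$$
or
$$\left\{2\frac{\partial P_L(z)}{\partial z}-zP_L(z)\right\}_{z=z_0}=0. \tag{2.8}$$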
The set of real solutions of (2.7) can contain from 0 to $L$ distinct manifolds of dimension $q-1$ or less. It is easy to see that a solution of (2.7), if it exists, always corresponds to a local minimum of the density $P_L^2(z)\phi(z)$, since the density is nonnegative and vanishes there. Thus, if (2.7) has up to $L$ different solutions, $P_L^2(z)\phi(z)$ can have up to $L+1$ different modes. Independently of the fitted data, if $L$ is odd, the SNP density is equal to zero on a manifold of dimension $q-1$ (e.g., if $q=2$ and $L$ is odd, the SNP density has to be zero on a curve in $\mathbb{R}^2$). For this reason, high odd degrees $L$ should preferably be avoided.
Equation (2.8) is a system of $q$ polynomial equations, each of degree $L+1$ in $q$ variables. In a regular case, i.e., without assuming that some coefficients in (2.6) are null, the system (2.8) can have up to $(L+1)^q$ different isolated point solutions (i.e., solutions consisting of single points rather than manifolds such as curves), according to Stetter (2004, page 228).
Determining the number of points where both (2.7) and (2.8) hold is not trivial. But assuming that (2.7) has $L$ different isolated point solutions at which $\partial P_L(z)/\partial z=0$ and that (2.8) has $(L+1)^q$ different isolated point solutions, we obtain that (2.7) and (2.8) together define at most $(L+1)^q$ isolated point solutions. This implies that an SNP density can have at most three modes when $L=2$, $q=1$; four modes when $L=2$, $q=2$; and thirteen modes when $L=2$, $q=3$. Our practical experience with $L=2$, $q=1$ and $L=2$, $q=2$ confirms these conclusions.
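A sufficient condition for a local maximum (minimum) at $z=z_0$ is that the Hessian matrix $\partial^2\{P_L^2(z)\phi(z)\}/\partial z\,\partial z^T$ at this point is negative definite (respectively, positive definite):
$$\begin{aligned}\frac{\partial^2}{\partial z\,\partial z^T}P_L^2(z)\phi(z)={}&2\frac{\partial P_L(z)}{\partial z}\frac{\partial P_L(z)}{\partial z^T}\phi(z)+2\frac{\partial^2P_L(z)}{\partial z\,\partial z^T}P_L(z)\phi(z)+zz^TP_L^2(z)\phi(z)-2\frac{\partial P_L(z)}{\partial z}z^TP_L(z)\phi(z)\\&-2z\frac{\partial P_L(z)}{\partial z^T}P_L(z)\phi(z)-I_qP_L^2(z)\phi(z).\end{aligned}\tag{2.9}$$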
Once a density of latent variables is estimated, the solutions of (2.8) can be found numerically and the expression (2.9) evaluated at these points. Hence the number of modes can be established.
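For one latent variable the recipe above reduces to root finding for a univariate polynomial: the candidates are the real roots of $2P_L'(z)-zP_L(z)$ in (2.8), of degree $L+1$, and each is kept if the scalar version of (2.9) is negative. The Python sketch below assumes $q=1$ and uses NumPy's `Polynomial` class; the function name and the example coefficients are hypothetical. For $P_2(z)=(z^2-1)/\sqrt2$, (2.8) becomes $z(5-z^2)/\sqrt2=0$, so the modes should be $0$ and $\pm\sqrt5$, while the zeros $\pm1$ of $P_2$ are the minima described by (2.7).

```python
import numpy as np
from numpy.polynomial import Polynomial

def snp_modes_1d(a):
    """Modes of h(z) = P(z)^2 phi(z) for q = 1, following (2.8) and (2.9).

    Candidates are the real roots of 2 P'(z) - z P(z); a candidate is a mode
    when the scalar Hessian (2.9) is negative there.
    """
    P = Polynomial(a)
    dP, d2P = P.deriv(1), P.deriv(2)
    z = Polynomial([0.0, 1.0])
    crit = (2 * dP - z * P).roots()
    crit = np.real(crit[np.abs(np.imag(crit)) < 1e-10])    # keep real roots only
    modes = []
    for z0 in crit:
        p, p1, p2 = P(z0), dP(z0), d2P(z0)
        phi = np.exp(-0.5 * z0 ** 2) / np.sqrt(2 * np.pi)
        # (2.9) with q = 1: (2 P'^2 + 2 P'' P + z^2 P^2 - 4 z P' P - P^2) phi
        hess = (2 * p1 ** 2 + 2 * p2 * p + z0 ** 2 * p ** 2 - 4 * z0 * p1 * p - p ** 2) * phi
        if hess < 0:
            modes.append(z0)
    return np.sort(np.array(modes))

a_bimodal = [-1 / np.sqrt(2), 0.0, 1 / np.sqrt(2)]
print(snp_modes_1d(a_bimodal))    # three modes
print(snp_modes_1d([1.0]))        # standard normal: single mode at 0
```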
2.3 Inference in SNP-GLLVM
2.3.1 Conditionally normal manifest variables
Suppose that the conditional density of the manifest variables given the latent ones is multivariate normal, with the structural scale parameter given by a diagonal positive definite matrix $\Psi=\mathrm{diag}(\psi)\in\mathbb{R}^{p\times p}$. Then the marginal density of $x$ can be written as:
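$$f(x;\mu,\Gamma,\Psi,P_L)=|2\pi(\Psi+\Gamma\Gamma^T)|^{-1/2}\exp\left\{-\tfrac12 x_0^T(\Psi+\Gamma\Gamma^T)^{-1}x_0\right\}E_{z\mid x_0}\{P_L^2(z)\}, \tag{2.10}$$
where $x_0=x-\mu$, $B=I_q+\Gamma^T\Psi^{-1}\Gamma$ and
$$E_{z\mid x_0}\{P_L^2(z)\}=|2\pi B^{-1}|^{-1/2}\int_{\mathbb{R}^q}P_L^2(z)\exp\left\{-\tfrac12(z-B^{-1}\Gamma^T\Psi^{-1}x_0)^TB(z-B^{-1}\Gamma^T\Psi^{-1}x_0)\right\}dz. \tag{2.11}$$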
As all the moments of the multivariate normal distribution are known and completely determined by its first two moments, $E_{z\mid x_0}\{P_L^2(z)\}$ exists in explicit form and is a polynomial of degree $2L$ in $x_0$. Hence the marginal density of $x$ exists in closed form. When $P_L(z)\equiv1$ we obtain the classical factor analysis model for normally distributed manifest variables, as described, for example, by Mardia et al. (1979).
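The closed form (2.10) and (2.11) can be sketched for $q=1$: conditionally on $x_0$, $z\sim N(m,s^2)$ with $m=B^{-1}\Gamma^T\Psi^{-1}x_0$ and $s^2=B^{-1}$, so $E_{z\mid x_0}\{P_L^2(z)\}$ follows from the moments of a normal variable. The Python code below, whose function names and parameter values are illustrative assumptions, computes it by a binomial expansion and checks the resulting marginal density against direct numerical integration of (2.4).

```python
import numpy as np
from math import comb
from scipy.integrate import quad

def normal_moment(k):
    """E(W^k) for W ~ N(0, 1): 0 for odd k, (k-1)!! for even k."""
    return 0.0 if k % 2 else float(np.prod(np.arange(k - 1, 0, -2)))

def expect_P2(a, m, s):
    """Exact E{P_L(z)^2} for z ~ N(m, s^2), expanding E{(m + s W)^k} binomially."""
    c = np.polynomial.polynomial.polymul(a, a)           # coefficients of P_L^2 (degree 2L)
    return sum(ck * sum(comb(k, i) * m ** (k - i) * s ** i * normal_moment(i) for i in range(k + 1))
               for k, ck in enumerate(c))

def snp_density_closed_form(x, mu, gamma, psi, a):
    """(2.10) with q = 1: N_p(x; mu, Psi + gamma gamma^T) times E_{z|x0}{P_L^2(z)}."""
    x0 = x - mu
    sigma = np.diag(psi) + np.outer(gamma, gamma)
    _, logdet = np.linalg.slogdet(2 * np.pi * sigma)
    normal_part = np.exp(-0.5 * logdet - 0.5 * x0 @ np.linalg.solve(sigma, x0))
    B = 1.0 + np.sum(gamma ** 2 / psi)                   # B = I_q + Gamma^T Psi^{-1} Gamma
    m = np.sum(gamma * x0 / psi) / B                     # B^{-1} Gamma^T Psi^{-1} x0
    return normal_part * expect_P2(a, m, 1.0 / np.sqrt(B))

def snp_density_numeric(x, mu, gamma, psi, a):
    """Direct numerical integration of (2.4), used only as a check."""
    def integrand(z):
        g = np.prod(np.exp(-0.5 * (x - mu - gamma * z) ** 2 / psi) / np.sqrt(2 * np.pi * psi))
        return g * np.polynomial.polynomial.polyval(z, a) ** 2 * np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return quad(integrand, -np.inf, np.inf)[0]

mu, gamma, psi = np.array([0.0, 1.0, -0.5]), np.array([0.6, 0.4, 0.5]), np.array([1.0, 1.0, 0.8])
x = np.array([0.3, 1.4, -1.0])
a = np.array([-1 / np.sqrt(2), 0.0, 1 / np.sqrt(2)])    # P_2(z) = (z^2 - 1)/sqrt(2)
print(snp_density_closed_form(x, mu, gamma, psi, a), snp_density_numeric(x, mu, gamma, psi, a))
```

No numerical integration is needed in the closed-form path, which is what makes the likelihood (2.12) cheap to evaluate in this case.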
Using (2.10) we obtain the following log-likelihood function, with $x_{0i}=x_i-\mu$:
$$\ell(\mu,\Gamma,\Psi,\varphi,L\mid x_1,\dots,x_n)=-\frac{n}{2}\log|2\pi(\Psi+\Gamma\Gamma^T)|-\frac12\sum_{i=1}^{n}x_{0i}^T(\Psi+\Gamma\Gamma^T)^{-1}x_{0i}+\sum_{i=1}^{n}\log\left[E_{z\mid x_{0i}}\{P_L^2(z)\}\right]. \tag{2.12}$$
The parameters of interest are those inherited from factor analysis, namely $\mu,\Gamma,\Psi$, with the additional parameters $\varphi$ and $L$ responsible for the shape of the latent variable density. In practice $L$ is fixed by the rule discussed in Section 2.3.3. The final estimators are defined as
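$$\hat\mu=\mu^*+\Gamma^*\tilde E(Z),\qquad\hat\Gamma=\Gamma^*\,\widetilde{\mathrm{cov}}^{1/2}(Z),\qquad\hat\Psi=\Psi^*, \tag{2.13}$$
where $(\mu^*,\Gamma^*,\Psi^*,\varphi^*)=\arg\max_{\mu,\Gamma,\Psi,\varphi}\ell(\mu,\Gamma,\Psi,\varphi\mid L,x_1,\dots,x_n)$, and $\tilde E(Z)$ and $\widetilde{\mathrm{cov}}^{1/2}(Z)$ are computed from $\varphi^*$ and the SNP density (2.6). Thus $\hat\mu$ and $\hat\Gamma$ are the estimators corresponding to uncorrelated latent variables with zero expectation and unit variance.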
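The standardization step (2.13) only needs the mean and variance of the fitted SNP density. For $q=1$ these are linear combinations of standard normal moments, as in the Python sketch below; the fitted values $\mu^*$, $\gamma^*$ and the coefficients are made up for illustration, and for $q>1$ a matrix square root of the covariance would take the place of the standard deviation. With $P_1(z)=\cos t+z\sin t$ and $t=\pi/6$, one gets $E(Z)=\sin 2t=\sqrt3/2$ and $\mathrm{var}(Z)=0.75$.

```python
import numpy as np

def normal_moment(k):
    """E(W^k) for W ~ N(0, 1): 0 for odd k, (k-1)!! for even k."""
    return 0.0 if k % 2 else float(np.prod(np.arange(k - 1, 0, -2)))

def snp_mean_var(a):
    """Mean and variance of Z with density P_L(z)^2 phi(z), q = 1, assuming E{P_L^2(W)} = 1."""
    c = np.polynomial.polynomial.polymul(a, a)           # coefficients of P_L^2
    m1 = sum(ck * normal_moment(k + 1) for k, ck in enumerate(c))
    m2 = sum(ck * normal_moment(k + 2) for k, ck in enumerate(c))
    return m1, m2 - m1 ** 2

def standardize_estimates(mu_star, gamma_star, a):
    """(2.13) with q = 1: mu_hat = mu* + gamma* E(Z), gamma_hat = gamma* sd(Z)."""
    mean, var = snp_mean_var(a)
    return mu_star + gamma_star * mean, gamma_star * np.sqrt(var)

t = np.pi / 6
a = np.array([np.cos(t), np.sin(t), 0.0])                 # E{P_1^2(W)} = cos^2 t + sin^2 t = 1
mu_hat, gamma_hat = standardize_estimates(np.array([0.2, -0.1]), np.array([1.0, 0.5]), a)
print(snp_mean_var(a), mu_hat, gamma_hat)
```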
In the optimization of $\ell(\mu,\Gamma,\Psi,\varphi\mid L,x_1,\dots,x_n)$ we use an analytically computed gradient and Hessian matrix (the gradient is given in Appendix A and the Hessian, whose expression is lengthy, in Appendix B). It should be stressed that the Hessian is computed in matrix form, which offers a considerable advantage in an R implementation compared to existing Hessian matrix computations such as those in Lawley (1967), Jennrich and Thayer (1973) and Ramsey (2010). The optimization is done in R with the nlminb function and is sensitive to the choice of initial values. We discuss how to cope with this problem at the end of the next section.
2.3.2 Mixture of conditionally binary and normal manifest variables
In practice, the presence of both continuous and binary (or discrete) responses is more frequent than exclusively continuous responses. Suppose that among the $p$ manifest variables the first $p_1$ are normal conditionally on the latent variables, and the last $p-p_1$ are conditionally Bernoulli, i.e., the joint conditional probability density and mass function from (2.1) is:
$$g(x\mid z)=\prod_{j=1}^{p_1}\left[\frac{1}{\sqrt{2\pi\psi_j}}\exp\left\{-\frac{1}{2\psi_j}(x_j-\mu_j-\gamma_j^Tz)^2\right\}\right]\prod_{j=p_1+1}^{p}\left\{\frac{\exp(x_j\mu_j+x_j\gamma_j^Tz)}{1+\exp(\mu_j+\gamma_j^Tz)}\right\}, \tag{2.14}$$
where the expression in the last braces is obtained by setting $\mathrm{pr}(X_j=1\mid z)=p_j$ and choosing the logit link, i.e., $\log\{p_j/(1-p_j)\}=\mu_j+\gamma_j^Tz$. The marginal density of the manifest variables for the SNP-GLLVM is then obtained straightforwardly by plugging (2.14) into (2.4). We approximate the corresponding log-likelihood function $\ell(\mu,\Gamma,\Psi,\varphi\mid L,x_1,\dots,x_n)$ with one latent variable by computing the integral with the R command integrate. The latter uses multiple algorithms, including different adaptive integration algorithms for which “the evaluation points are clustered in the neighborhood of difficult spot of each integrand” (Piessens et al., 1983).
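The mixed likelihood with one latent variable can be sketched in Python as follows. SciPy's `quad` plays the role of R's `integrate` here (both build on QUADPACK adaptive routines); the function names, the data matrix and the parameter values are hypothetical. The Bernoulli part uses `np.logaddexp(0, eta)` for $\log\{1+\exp(\eta)\}$ to avoid overflow.

```python
import numpy as np
from scipy.integrate import quad

def log_g_mixed(x, z, mu, gamma, psi, p1):
    """log g(x | z) in (2.14), q = 1: first p1 variables normal, the rest Bernoulli (logit link)."""
    eta = mu + gamma * z                                      # mu_j + gamma_j z
    xn, en, xb, eb = x[:p1], eta[:p1], x[p1:], eta[p1:]
    log_norm = np.sum(-0.5 * np.log(2 * np.pi * psi) - 0.5 * (xn - en) ** 2 / psi)
    log_bern = np.sum(xb * eb - np.logaddexp(0.0, eb))        # log{exp(x eta) / (1 + exp(eta))}
    return log_norm + log_bern

def snp_mixed_marginal(x, mu, gamma, psi, p1, a):
    """(2.4) with q = 1: integral of g(x | z) P_L(z)^2 phi(z) by adaptive quadrature."""
    def integrand(z):
        h = np.polynomial.polynomial.polyval(z, a) ** 2 * np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
        return np.exp(log_g_mixed(x, z, mu, gamma, psi, p1)) * h
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

def loglik_mixed(X, mu, gamma, psi, p1, a):
    """Approximate log-likelihood: sum of log f(x_i) over the rows x_i of X."""
    return sum(np.log(snp_mixed_marginal(x, mu, gamma, psi, p1, a)) for x in X)

a_snp = np.array([-1 / np.sqrt(2), 0.0, 1 / np.sqrt(2)])      # P_2(z) = (z^2 - 1)/sqrt(2)
X = np.array([[0.5, -0.2, 1.0, 0.0],
              [1.1, 0.4, 1.0, 1.0]])                          # two normal, then two binary columns
print(loglik_mixed(X, mu=np.array([0.0, 0.1, 0.2, -0.3]), gamma=np.array([0.8, 0.5, 1.0, 0.7]),
                   psi=np.array([0.5, 0.6]), p1=2, a=a_snp))
```

A quick sanity check of such code is that, with only binary variables, the marginal probabilities of all response patterns must sum to one.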
In a similar GLLVM setting but with normally distributed latent variables, Huber et al. (2004) implemented a Laplace approximation of the integral. We stress that the Laplace approximation is designed for integrands with a single absolute maximum (De Bruijn, 1981, page 63) and cannot be used to approximate the integral in (2.4), whose integrand can have multiple local maxima.
Similarly, Gaussian or Gauss-Hermite quadrature rules will perform poorly. Other alternatives for computing the integral would be a Monte Carlo EM algorithm, as implemented by Chen et al. (2002), or an adaptive quadrature algorithm for numerical integration.
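The failure of the Laplace approximation for multimodal integrands is easy to see on a toy example (not the integrand of (2.4)): for an equal mixture of N(−2, 0.5²) and N(2, 0.5²), whose total mass is 1, expanding around the highest maximum captures only one mode. The sketch below, with hypothetical function names, compares the Laplace approximation with a fine trapezoidal rule.

```python
import math

def mixture_integrand(z):
    """Toy integrand: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2); its integral is 1."""
    s = 0.5
    c = 1.0 / (s * math.sqrt(2.0 * math.pi))
    return 0.5 * c * (math.exp(-0.5 * ((z + 2.0) / s) ** 2) + math.exp(-0.5 * ((z - 2.0) / s) ** 2))

def laplace_approx(f, lo=-6.0, hi=6.0, grid=12001, h=1e-4):
    """Laplace approximation f(z0) * sqrt(2*pi / -(log f)''(z0)) around the highest grid point z0."""
    zs = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    z0 = max(zs, key=f)
    logf = lambda z: math.log(f(z))
    d2 = (logf(z0 + h) - 2.0 * logf(z0) + logf(z0 - h)) / h ** 2  # numerical second derivative
    return f(z0) * math.sqrt(2.0 * math.pi / -d2)

def trapezoid(f, lo=-10.0, hi=10.0, n=20000):
    """Composite trapezoidal rule; accurate here because f is smooth and negligible at lo and hi."""
    step = (hi - lo) / n
    return step * (0.5 * f(lo) + 0.5 * f(hi) + sum(f(lo + i * step) for i in range(1, n)))
```

The Laplace approximation returns about 0.5, half of the true value, whereas the quadrature recovers 1.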
The estimators µ̂ and Γ̂ are defined as in (2.13), with the log-likelihood function ℓ(µ, Γ, Ψ, φ | L, x1, . . . , xn) replaced by its approximation ℓ̃(µ, Γ, Ψ, φ | L, x1, . . . , xn) (due to the numerical integration). The optimization is carried out via nlminb with 10⁻⁴ as absolute and relative tolerance and with an analytically computed gradient and Hessian (the gradient is given in Appendix 2; the analytical expression of the Hessian is available upon request). As before, the objective function has multiple local optima, so an appropriate set of initial values is essential for a reliable optimization. As initial values for the parameters µ∗, Γ∗ and Ψ∗ we use their maximum likelihood estimates under the normality assumption for the latent variables. Initial values for the φ parameters are taken from a grid constructed by the R command cover.design (Furrer et al., 2009) in the space [−π/2 − π/10, π/2 + π/10]^{d−1}. The number of initial values depends on d − 1 and is chosen empirically: we stop increasing the number of initial values when the best value of the objective function does not change after a few successive increments. In our experience, this approach is more stable, faster and more reliable than the genetic optimization algorithm (with a very large population size) implemented in the package rgenoud (Mebane & Sekhon, 2010) or any of the optimization methods implemented in the R command optim.
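The multistart strategy described above can be sketched as follows. This is a stand-in under explicit assumptions, not the thesis implementation: a regular grid of cell midpoints replaces the space-filling design of cover.design, a simple compass (pattern) search replaces nlminb, and the objective passed in is any hypothetical function of the angles φ. The grid is enlarged until the best value has not improved for `patience` successive enlargements, mirroring the stopping rule in the text.

```python
import itertools
import math

LO, HI = -math.pi / 2 - math.pi / 10, math.pi / 2 + math.pi / 10  # box for the phi angles

def compass_search(f, x0, lo=LO, hi=HI, step=0.1, tol=1e-9):
    """Derivative-free local maximizer: try +/- step on each coordinate, halve the step when stuck."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] = min(hi, max(lo, y[i] + delta))  # stay inside the box
                fy = f(y)
                if fy > fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2.0
    return x, fx

def grid_starts(m, dim, lo=LO, hi=HI):
    """Regular m^dim grid of cell midpoints (hypothetical stand-in for a space-filling design)."""
    pts = [lo + (hi - lo) * (k + 0.5) / m for k in range(m)]
    return [list(p) for p in itertools.product(pts, repeat=dim)]

def multistart(f, dim, patience=3, tol=1e-8, max_m=30):
    """Enlarge the grid of starting values until the best optimum is unchanged `patience` times."""
    best_x, best_f = None, -math.inf
    m, stale = 1, 0
    while stale < patience and m < max_m:
        m += 1
        improved = False
        for x0 in grid_starts(m, dim):
            x, fx = compass_search(f, x0)
            if fx > best_f + tol:
                best_x, best_f, improved = x, fx, True
        stale = 0 if improved else stale + 1
    return best_x, best_f, m
```

For instance, with the hypothetical objective `lambda x: math.sin(5 * x[0]) + 0.3 * x[0]`, which has several local maxima in the box, the procedure returns the global maximum near φ ≈ 1.58.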
2.3.3 Tuning the flexibility of the SNP density
The flexibility of the SNP density is controlled by the degree L of the polynomial P_L(z) in (2.6). Several ways of choosing L have been explored: the original work of Gallant and Nychka (1987) proposed to fix L by a deterministic rule L = n^α, 0 < α < 1, where n is the sample size. Davidian and Gallant (1993) and Fenton and Gallant (1996) explored, in different settings, whether an adaptive rule for the choice of L can be applied. Following these authors, we select L on the basis of an information criterion of the form
$$-\ell(\mu,\Gamma,\Psi,\phi \mid L, x_1,\dots,x_n) + C(n)\,k/n,$$
where k is the number of unconstrained parameters in the model with fixed L, and C(n) = 1 for the Akaike Information Criterion (AIC, Akaike, 1974), C(n) = 0.5 log n for the Schwarz Information Criterion (BIC, Schwarz, 1978), and C(n) = log log n for the Hannan-Quinn criterion (HQ, Hannan, 1987).
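The selection rule above can be written in a few lines. The sketch assumes that ℓ in the criterion is the maximized log-likelihood divided by n (which is what makes the penalty C(n)k/n comparable), and it counts parameters for the design of Section 2.4 as 4 means, 4 loadings, 3 uniquenesses and L angles φ. Both are our assumptions, and the log-likelihood values in the example are hypothetical.

```python
import math

def information_criteria(ell, k, n):
    """-ell + C(n) k / n for AIC, BIC and HQ; ell is assumed to be the log-likelihood divided by n."""
    penalty = {"AIC": 1.0, "BIC": 0.5 * math.log(n), "HQ": math.log(math.log(n))}
    return {name: -ell + c * k / n for name, c in penalty.items()}

def select_degree(ell_by_L, n, n_params):
    """Degree L chosen by each criterion; ell_by_L maps L to the maximized (scaled) log-likelihood."""
    scores = {L: information_criteria(ell, n_params(L), n) for L, ell in ell_by_L.items()}
    return {name: min(scores, key=lambda L: scores[L][name]) for name in ("AIC", "BIC", "HQ")}

def k_sim(L):
    """Hypothetical parameter count for the design of Section 2.4: 4 mu, 4 gamma, 3 psi, L angles."""
    return 4 + 4 + 3 + L
```

For instance, `select_degree({0: -5.000, 1: -4.995, 2: -4.992}, 500, k_sim)` returns `{'AIC': 2, 'BIC': 0, 'HQ': 1}`: the heavier BIC penalty favors the smallest L and HQ sits in between.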
As an alternative to AIC in models focused on inference about the latent variables, Vaida and Blanchard (2005) proposed a conditional Akaike information criterion (cAIC) for mixed effects models. For the factor analysis model on large samples, the cAIC would be approximated by
$$\mathrm{cAIC} = -2\log g(x \mid \hat\mu, \hat\Gamma\hat z) + 2(\rho + p),$$
where µ̂ and Γ̂ are the parameters estimated on the data, ẑ = E(z | µ̂, φ̂, L, x) is the empirical Bayes prediction, p counts the unknown parameters in ψ, and ρ counts the degrees of freedom as introduced by Hodges and Sargent (2001). This number of degrees of freedom "is often much smaller than the number of parameters" in models with latent variables. The intuitive idea of cAIC is appealing: we favor models where µ̂ and Γ̂ẑ induce a greater density of the observed x, with an appropriate penalty for the complexity of those models. However, to date cAIC has not been developed for generalized mixed effects models. Lu et al. (2007) suggested a method for counting degrees of freedom in generalized linear hierarchical models based on the Laplace approximation of the likelihood function. Unfortunately, because of the potential presence of multiple maxima, the method of Lu et al. (2007) is not applicable to the likelihood induced by (2.14).
As another alternative to model selection criteria, we explored likelihood ratio tests of the hypotheses L = 0 (i.e., φ1 = π/2) and L = 1 (i.e., φ2 = π/2). However, in all simulated cases the distribution of the likelihood ratio statistic is far from the assumed chi-squared distribution. This is due to the irregularities discussed by Drton (2009), together with the fact that φ1 = π/2 is a boundary point. While this topic is beyond the scope of this paper, we expect that a resampling technique for obtaining the distribution of the likelihood ratio statistic could give good results.
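The resampling idea can be illustrated on a toy boundary problem rather than on the SNP-GLLVM itself. For X ∼ N(θ, 1) with θ ≥ 0, the likelihood ratio statistic for H0: θ = 0 is n·max(0, x̄)², whose null distribution is an equal mixture of a point mass at 0 and a χ²₁, not a χ²₁. A parametric bootstrap, sketched below with hypothetical function names, approximates this distribution by simulating data sets under H0.

```python
import random

def lr_stat(xs):
    """LR statistic for H0: theta = 0 against theta >= 0 in N(theta, 1): n * max(0, mean)^2."""
    n = len(xs)
    theta_hat = max(0.0, sum(xs) / n)  # MLE constrained to the parameter space
    return n * theta_hat ** 2

def bootstrap_null(n, B=10000, seed=1):
    """Parametric bootstrap: simulate B samples of size n under H0, return sorted LR statistics."""
    rng = random.Random(seed)
    return sorted(lr_stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(B))

def bootstrap_pvalue(observed, null_stats):
    """Monte Carlo p-value with the usual +1 correction."""
    return (1 + sum(s >= observed for s in null_stats)) / (1 + len(null_stats))
```

The bootstrap 95% quantile should be close to 2.71, the 95% quantile of the mixture, rather than the χ²₁ value 3.84, so a naive chi-squared reference would be conservative here.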
Given that the above alternatives are not applicable to our setting, we restrict ourselves to AIC, BIC and HQ for choosing L. When the exact log-likelihood function ℓ(µ, Γ, Ψ, φ | L, x1, . . . , xn) cannot be computed, as in the case of the Bernoulli distribution, we approximate it by ℓ̃(µ, Γ, Ψ, φ | L, x1, . . . , xn) as described in Section 2.3.2.
2.4 Monte Carlo simulations
We explore the finite-sample performance of the proposed method by simulating 600 samples of size 500 from the GLLVM with four manifest variables and one latent variable: three manifest variables are conditionally normal and one is conditionally Bernoulli.
The univariate scores of the latent variable are drawn from three different distributions: 1) the symmetric unimodal normal N(0,1); 2) the asymmetric trimodal SNP distribution $h(z) = \left(-\cos 0.7/\sqrt{2} + z\sin 0.7 + z^2\cos 0.7/\sqrt{2}\right)^2\varphi(z)$; and 3) the asymmetric bimodal mixture of normals 0.9N(2,1) + 0.1N(−2,0.25). For simulating from the SNP distribution we use the algorithm proposed by Gallant and Tauchen (1992).
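For readers who want to reproduce draws from the SNP2 density, a simple alternative to the Gallant and Tauchen (1992) algorithm is rejection sampling with a wider normal proposal, since h(z)/q(z) is bounded when q is the N(0, s²) density with s > 1. The sketch below uses that swapped-in technique, not the algorithm used in the thesis; the envelope constant is found on a grid, the orthonormal Hermite form of P_L is an assumption, and the names are hypothetical.

```python
import math
import random

def snp_poly(z, coef):
    """P_L(z) = sum_k coef[k] He_k(z)/sqrt(k!) (assumed orthonormal Hermite form)."""
    he_prev, he = 1.0, z
    total = coef[0] + (coef[1] * z if len(coef) > 1 else 0.0)
    for k in range(1, len(coef) - 1):
        he_prev, he = he, z * he - k * he_prev  # He_{k+1} = z He_k - k He_{k-1}
        total += coef[k + 1] * he / math.sqrt(math.factorial(k + 1))
    return total

def sample_snp(size, coef, s=2.0, seed=0):
    """Rejection sampling from h(z) = P_L(z)^2 phi(z) with an N(0, s^2) proposal (s > 1)."""
    rng = random.Random(seed)
    # h(z) / q(z) for the N(0, s^2) proposal density q
    ratio = lambda z: s * snp_poly(z, coef) ** 2 * math.exp(-0.5 * z * z * (1.0 - 1.0 / s ** 2))
    # Envelope constant: grid maximum of h/q on [-15, 15], inflated by 5% to cover grid error
    bound = 1.05 * max(ratio(-15.0 + 0.001 * i) for i in range(30001))
    draws = []
    while len(draws) < size:
        z = rng.gauss(0.0, s)
        if rng.random() * bound <= ratio(z):
            draws.append(z)
    return draws
```

For the SNP2 coefficients (0, sin 0.7, cos 0.7) about one proposal in four is accepted, and the sample mean and variance should be close to 1.394 and 2.228.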
For each simulated data set we estimate the coefficients of the SNP-GLLVM by the methodology of Sections 2.2 and 2.3 for L = 2 (SNP2), L = 1 (SNP1) and L = 0 (SNP0). The latter corresponds to the traditional maximum likelihood estimation under the normality assumption for the latent variable. Theoretically, as proved by Gallant and Nychka (1987), the parameters µ, Γ and Ψ are estimated consistently if L is sufficiently large and, together with φ, generates a density (2.6) close enough to the true one. In practice, as illustrated later in this section, an unduly large value of L can result in overfitting and in bias due to the integral approximation, while L = 1 or 2 is usually sufficient to detect the departure from normality of the latent variable densities considered here.
To make the simulation results comparable, we constrain the variance of the latent variable to its true value, that is: 1) var(Z) = 1 for the normal density; 2) var(Z) = 2.228 for the SNP density; and 3) var(Z) = 4.135 for the mixture of normals density. These choices give a slight advantage to the SNP0 estimator. We use grids of 6 initial values for the SNP1 optimization and of 13 initial values for the SNP2 optimization (constructed as discussed in Section 2.3.2).
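The value var(Z) = 2.228 for the SNP density can be checked by hand. Writing the polynomial of the SNP2 density in orthonormal Hermite form, with a₁ = sin 0.7 and a₂ = cos 0.7, and using E(z²) = 1, E(z⁴) = 3 and E(z⁶) = 15 for z ∼ N(0,1), the odd terms vanish and:

$$\begin{aligned}
P_2(z) &= -\frac{\cos 0.7}{\sqrt{2}} + z\sin 0.7 + z^2\frac{\cos 0.7}{\sqrt{2}} = a_1 z + a_2\,\frac{z^2-1}{\sqrt{2}},\\
E(Z) &= \int z\,P_2(z)^2\varphi(z)\,dz = \sqrt{2}\,a_1a_2\,E(z^4 - z^2) = 2\sqrt{2}\,a_1a_2 \approx 1.3936,\\
E(Z^2) &= 3a_1^2 + \frac{a_2^2}{2}\,E(z^6 - 2z^4 + z^2) = 3a_1^2 + 5a_2^2 \approx 4.1700,\\
\operatorname{var}(Z) &= E(Z^2) - E(Z)^2 \approx 4.1700 - 1.3936^2 \approx 2.228.
\end{aligned}$$

Likewise, 1.4 × 1.3936 ≈ 1.95 and 0.7 + 2 × 1.3936 ≈ 3.49, which is consistent with the true values of µ1 and µ4 reported for the SNP2 latent variable.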
We compute the AIC, BIC and HQ information criteria discussed in Section 2.3.3 for the SNP2, SNP1 and SNP0 estimates on each data set. In Tables 2.1 and 2.2 we report detailed simulation results for normal and SNP2 generated latent variables.
For the normal latent variable all estimates are nearly unbiased, even though three of the estimated SNP2 densities are trimodal (none of these three is selected by any information criterion; all other estimated SNP1 and SNP2 densities are unimodal and nearly symmetric). When the true latent variable distribution is SNP2, all estimates of the parameters corresponding to the conditionally normal manifest variables (i.e., µ1, µ2, µ3, γ1, γ2 and γ3) are nearly unbiased. The insensitivity of conditionally normal manifest variables to a misspecified latent variable distribution has already been observed in the literature, theoretically by Anderson and Amemiya (1988) and in simulations by Ma and Genton (2010). However, the SNP0 and SNP1 estimates of the loading of the Bernoulli manifest variable (γ4) present biases, though small ones. The bias is clear for the SNP0 estimates, where the assumed latent variable density is far from the true one, and diminishes as the estimated density gets closer to the true one. Surprisingly, the bias in the SNP2 estimates of µ4 is greater than the SNP1 bias and almost equal to the SNP0 bias for the same parameter. A closer inspection of the estimates shows that the medians of the µ4 estimates for both SNP1 and SNP2 are exactly at the true value, and that the biases of the means are due to a few extreme µ4 estimates (fewer for SNP1). The median of the µ4 estimates by SNP0 is equal to the mean despite the presence of a few extreme values. Similarly, a few extreme values are found when inspecting the SNP1 and SNP2 estimates of γ4 (the median of γ4 estimates by SNP1 is equal
Normal latent variable
Parameter (true) | SNP0: AVE MC SD SE | SNP1: AVE MC SD SE | SNP2: AVE MC SD SE
µ1 (0) | 0.06 0.08 0.08 | 0.00 0.08 0.08 | 0.00 0.08 0.08
µ2 (0) | 0.07 0.09 0.08 | 0.00 0.09 0.08 | 0.00 0.09 0.08
µ3 (0) | 0.06 0.08 0.08 | 0.00 0.08 0.08 | 0.00 0.08 0.08
µ4 (0.7) | 0.70 0.16 0.16 | 0.70 0.16 0.17 | 0.70 0.16 0.16
γ1 (1.4) | 1.40 0.07 0.07 | 1.40 0.07 0.07 | 1.40 0.07 0.07
γ2 (1.6) | 1.60 0.07 0.07 | 1.60 0.07 0.07 | 1.60 0.07 0.07
γ3 (1.4) | 1.40 0.07 0.07 | 1.40 0.07 0.07 | 1.40 0.07 0.07
γ4 (2) | 2.02 0.23 0.24 | 2.02 0.23 0.25 | 2.01 0.23 0.24
ψ1 (1) | 1.00 0.09 0.09 | 1.00 0.09 0.09 | 1.00 0.09 0.09
ψ2 (1) | 1.00 0.11 0.10 | 1.00 0.11 0.10 | 1.00 0.11 0.10
ψ3 (1) | 1.00 0.09 0.09 | 1.00 0.09 0.09 | 1.00 0.09 0.09
AIC preferred SNP0 79.2% of the time, SNP1 9.8% and SNP2 1%.
BIC preferred SNP0 99.7% of the time, SNP1 0.15% and SNP2 0.15%.
HQ preferred SNP0 95.5% of the time, SNP1 2.5% and SNP2 2%.
Table 2.1: Simulation results on 600 samples of size 500. The true latent variable distribution is the standard normal N(0,1). Mean parameters are denoted by (µ1, µ2, µ3, µ4)^T, loadings by (γ1, γ2, γ3, γ4)^T and uniquenesses by (ψ1, ψ2, ψ3)^T, with true values in parentheses. AVE MC, average of estimates; SD, standard deviation; SE, average standard errors estimated by the sandwich covariance matrix.
SNP2 latent variable
Parameter (true) | SNP0: AVE MC SD SE | SNP1: AVE MC SD SE | SNP2: AVE MC SD SE
µ1 (1.95) | 1.96 0.10 0.10 | 1.96 0.10 0.10 | 1.96 0.10 0.10
µ2 (2.22) | 2.23 0.12 0.12 | 2.23 0.12 0.12 | 2.23 0.12 0.12
µ3 (1.95) | 1.95 0.10 0.10 | 1.95 0.10 0.10 | 1.95 0.10 0.10
µ4 (3.49) | 3.62 0.43 0.46 | 3.56 0.45 0.44 | 3.59 0.48 0.48
γ1 (1.4) | 1.40 0.08 0.08 | 1.39 0.06 0.09 | 1.40 0.06 0.09
γ2 (1.6) | 1.60 0.08 0.09 | 1.59 0.07 0.10 | 1.60 0.07 0.10
γ3 (1.4) | 1.39 0.08 0.08 | 1.39 0.06 0.09 | 1.39 0.06 0.09
γ4 (2) | 2.15 0.36 0.42 | 1.90 0.25 0.35 | 2.04 0.30 0.41
ψ1 (1) | 0.98 0.09 0.09 | 0.98 0.08 0.08 | 0.98 0.08 0.08
ψ2 (1) | 1.00 0.11 0.11 | 0.99 0.10 0.09 | 0.99 0.10 0.09
ψ3 (1) | 1.00 0.09 0.09 | 1.00 0.09 0.08 | 1.00 0.09 0.08
AIC, BIC and HQ preferred SNP2 100% of the time.
Table 2.2: Simulation results on 600 samples of size 500. The true latent variable distribution is the asymmetric trimodal SNP2 distribution $h(z) = \left(-\cos 0.7/\sqrt{2} + z\sin 0.7 + z^2\cos 0.7/\sqrt{2}\right)^2\varphi(z)$. Mean parameters are denoted by (µ1, µ2, µ3, µ4)^T, loadings by (γ1, γ2, γ3, γ4)^T and uniquenesses by (ψ1, ψ2, ψ3)^T, with true values in parentheses. AVE MC, average of estimates; SD, standard deviation; SE, average standard errors estimated by the sandwich covariance matrix.
to the mean, while the median of γ4 estimates by SNP2 is equal to the true value). This phenomenon illustrates the bias induced by approximate integration, as discussed by Pinheiro and Chao (2006). It seems plausible that this bias is greater for integrands with more difficult spots. The integral approximation bias is an additional reason for using large values of L sparingly in our implementation.
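The mean-versus-median diagnosis used above can be made systematic when summarizing Monte Carlo replicates. The sketch below reports the bias of the mean and of the median and counts outlying estimates; the replicate values and the threshold of 5 median absolute deviations are hypothetical choices, not taken from the simulations of this section.

```python
import statistics

def mc_summary(estimates, true_value, k=5.0):
    """Bias of the mean and of the median, MC standard deviation, and count of outlying estimates."""
    med = statistics.median(estimates)
    mad = statistics.median([abs(e - med) for e in estimates])  # median absolute deviation
    return {
        "mean_bias": statistics.mean(estimates) - true_value,
        "median_bias": med - true_value,
        "mc_sd": statistics.stdev(estimates),
        "n_extreme": sum(abs(e - med) > k * mad for e in estimates),  # beyond k MADs (hypothetical rule)
    }
```

With 99 well-behaved estimates around 3.49 and a single value at 8.0, the median stays at the true value while the mean is shifted by about 0.045, the pattern described for µ4 and γ4.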
Table 2.3 summarizes the simulation results for the mixture of normals latent variable. As before, the SNP0 estimates of the parameters corresponding to the continuous manifest variables are all nearly unbiased, while the SNP0 estimates related to the Bernoulli manifest variable present biases. We conclude that for GLLVM with discrete manifest variables a wrong specification of the latent variable distribution induces a bias in the estimation. Similar conclusions can be found in Ma and Genton (2010). The differences between the SNP1 and SNP2 estimates of µ4 and γ4 confirm the integral approximation bias. The table with the estimation results selected by AIC, BIC and HQ for the mixture of normals latent variable is available as Table S.1 in the supplementary material. For normal and SNP2 latent variables a similar table is not of interest, because AIC, BIC and HQ almost always choose the right model.
From the results of Tables 2.1 and 2.3 we infer that AIC prefers models with greater L, BIC prefers models with smaller L, and HQ choices are intermediate. Our conclusions agree with those of Zhang and Davidian (2001) and Chen et al. (2002).
The advantage of using the proposed method can be appreciated by looking at the shape of the estimated densities in Figure 2.1. The figure illustrates that the SNP1 and SNP2 specifications selected by HQ capture the main features of the true density of the latent variable.
Mixture of normals latent variable
Parameter (true) | SNP0: AVE MC SD SE | SNP1: AVE MC SD SE | SNP2: AVE MC SD SE
µ1 (2.24) | 2.25 0.10 0.11 | 2.25 0.10 0.11 | 2.25 0.10 0.11
µ2 (2.57) | 2.57 0.12 0.12 | 2.57 0.12 0.12 | 2.57 0.12 0.12
µ3 (2.24) | 2.25 0.11 0.11 | 2.25 0.11 0.11 | 2.25 0.11 0.11
µ4 (3.90) | 4.36 0.56 0.58 | 3.91 0.43 0.44 | 3.95 0.46 0.65
γ1 (1.4) | 1.39 0.08 0.09 | 1.39 0.06 0.10 | 1.39 0.06 0.10
γ2 (1.6) | 1.59 0.09 0.10 | 1.59 0.07 0.11 | 1.59 0.07 0.11
γ3 (1.4) | 1.40 0.08 0.09 | 1.39 0.06 0.10 | 1.39 0.06 0.10
γ4 (2) | 2.22 0.40 0.53 | 2.00 0.29 0.45 | 2.05 0.29 0.63
ψ1 (1) | 0.99 0.09 0.09 | 1.00 0.08 0.09 | 1.00 0.09 0.09
ψ2 (1) | 0.99 0.11 0.11 | 1.01 0.11 0.10 | 1.01 0.11 0.10
ψ3 (1) | 0.99 0.09 0.09 | 1.00 0.09 0.09 | 1.00 0.09 0.09
AIC preferred SNP0 0% of the time, SNP1 11.4% and SNP2 88.6%.
BIC preferred SNP0 0% of the time, SNP1 69.7% and SNP2 30.3%.
HQ preferred SNP0 0% of the time, SNP1 33.4% and SNP2 66.6%.
Table 2.3: Simulation results on 600 samples of size 500. The true latent variable distribution is the asymmetric mixture of normals 0.9N(2,1) + 0.1N(−2,0.25). Mean parameters are denoted by (µ1, µ2, µ3, µ4)^T, loadings by (γ1, γ2, γ3, γ4)^T and uniquenesses by (ψ1, ψ2, ψ3)^T, with true values in parentheses. AVE MC, average of estimates; SD, standard deviation; SE, average standard errors estimated by the sandwich covariance matrix.
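The SE columns in Tables 2.1 to 2.3 are averages of sandwich standard errors. For a scalar parameter the sandwich variance is A⁻¹BA⁻¹, with A the negative sum of second derivatives of the per-observation log-likelihoods and B the sum of squared scores; the multivariate version replaces these by the Hessian and the outer product of the gradients. The following minimal sketch uses finite differences and hypothetical names; it is not the thesis implementation, which relies on analytic derivatives.

```python
import math

def sandwich_se(loglik_i, theta_hat, data, h=1e-4):
    """Sandwich standard error sqrt(A^{-1} B A^{-1}) for a scalar estimate.

    A = -sum of second derivatives of the per-observation log-likelihood,
    B = sum of squared scores, both by central finite differences at theta_hat.
    """
    A = B = 0.0
    for x in data:
        up, mid, down = loglik_i(theta_hat + h, x), loglik_i(theta_hat, x), loglik_i(theta_hat - h, x)
        score = (up - down) / (2.0 * h)
        A -= (up - 2.0 * mid + down) / h ** 2
        B += score ** 2
    return math.sqrt(B) / A
```

For a correctly specified Bernoulli model it reproduces the usual √(p̂(1 − p̂)/n); for a normal working model with the wrong variance it uses the empirical spread of the scores instead, which is what makes it robust to misspecification.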
[Figure 2.1, panels (a) and (b): densities h(z) plotted against z, for −4 ≤ z ≤ 4]
Figure 2.1: The dashed line is the average of the estimated densities for the fits preferred by HQ, the shaded area is the pointwise estimated confidence envelope, and the dotted line is the true density for (a) the mixture of normals 0.9N(2,1) + 0.1N(−2,0.25) and (b) the SNP2 density $h(z) = \left(-\cos 0.7/\sqrt{2} + z\sin 0.7 + z^2\cos 0.7/\sqrt{2}\right)^2\varphi(z)$ (the dashed and dotted lines coincide for SNP2).
2.5 Data analysis
2.5.1 Swineford-Holzinger data analysis
This data set, introduced by Holzinger and Swineford (1939), is well known in the statistical literature on factor analysis and contains the scores of 145 individuals on nine psychological tests. These nine tests are: "visual perception", "cubes", "paper form board", "general information", "sentence completion", "word classification", "figure recognition", "object-number" and "number-figure". For this data set, Jöreskog (1971) and Sörbom (1974) inferred the presence of four groups of individuals