Generalized linear latent variable models with flexible distributions


Thesis

Reference


IRINCHEEVA, Irina

Abstract

We first consider a semi-nonparametric specification for the density of latent variables in Generalized Linear Latent Variable Models (GLLVM). This specification is flexible enough to allow for an asymmetric, multi-modal, heavy or light tailed smooth density. The degree of flexibility required by many applications can be achieved through this semi-nonparametric specification with a finite number of parameters. Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show that the estimated density of latent variables captures the true one with good accuracy. In the second part we consider a spatial generalized linear latent variable model with and without the assumption of normality on latent variables. The Laplace approximation is applicable when latent variables are multivariate normal. Otherwise the assumption of marginal normality is relaxed in favor of a mixture of normals. The pairwise likelihood estimators are explored by simulations.

IRINCHEEVA, Irina. Generalized linear latent variable models with flexible distributions. Doctoral thesis: Univ. Genève, 2011, no. SES 758

URN : urn:nbn:ch:unige-169962

DOI : 10.13097/archive-ouverte/unige:16996

Available at:

http://archive-ouverte.unige.ch/unige:16996

Disclaimer: layout of this document may differ from the published version.


GENERALIZED LINEAR LATENT VARIABLE MODELS WITH FLEXIBLE DISTRIBUTIONS

Thesis presented to the Faculty of Economics and Social Sciences of the University of Geneva (Université de Genève)

By Irina Irincheeva

for the degree of

Doctor of Economic and Social Sciences, specialization: Statistics

Thesis committee members:

Prof. Eva CANTONI-RENAUD, co-supervisor, University of Geneva
Prof. Marc GENTON, co-supervisor, Texas A&M University, USA
Prof. Irini MOUSTAKI, London School of Economics, London, UK

Prof. Maria-Pia VICTORIA-FESER, committee chair, University of Geneva

Thesis No. 758

Geneva, 15 August 2011


The Faculty of Economics and Social Sciences, on the recommendation of the thesis committee, has authorized the printing of this thesis, without thereby expressing any opinion on the propositions stated in it, which are the sole responsibility of their author.

Geneva, 15 August 2011

The Dean

Bernard MORARD

Printed from the author's manuscript


Generalized Linear Latent Variable Models with Flexible Distributions

THESIS

presented to the Faculty of Economics and Social Sciences of the University of Geneva

by

Irina IRINCHEEVA

under the supervision of

Prof. Eva Cantoni and Prof. Marc G. Genton

for the degree of

Doctor of Economic and Social Sciences, specialization in Statistics

Thesis committee members:

Ms. Eva CANTONI, Professor

Marc G. GENTON, Professor, Texas A&M University, USA
Irini MOUSTAKI, Professor, London School of Economics, UK

Maria-Pia VICTORIA-FESER, Professor, committee chair

Thesis No. 758, Geneva, 15 August 2011


In the first part of this thesis (Chapter 2) we consider the semi-nonparametric specification for the density of latent variables in generalized linear latent variable models (GLLVM). This specification is flexible enough to allow a sufficiently smooth density to be asymmetric, multi-modal, and heavy or light tailed. The degree of flexibility needed for many applications of GLLVM is available through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood. Even with this additional flexibility, we obtain an explicit expression of the likelihood in the case of conditionally normal manifest variables.

We show by simulations that the estimated density of the latent variables captures the true density with good accuracy and is easy to visualize. By analyzing two real data sets, we show that a flexible distribution of the latent variables is a good tool for exploring the fit of the GLLVM in practice.

In the second part (Chapter 3) we consider the spatial generalized linear latent variable model with and without the assumption of normality on the latent variables. It is shown that the Laplace approximation can be applied when the latent variables follow a multivariate normal distribution. Otherwise, we abandon the assumption of marginal normality in favor of a mixture of normals with a common covariance matrix. We show how to construct a multivariate density with Gaussian spatial dependence and given non-overlapping marginal densities, and we use the pairwise likelihood to estimate the spatial generalized linear latent variable model. The properties of the estimators are explored by simulations.


Abstract

In the first part (Chapter 2) of this thesis we consider a semi-nonparametric specification for the density of latent variables in Generalized Linear Latent Variable Models (GLLVM). This specification is flexible enough to allow for an asymmetric, multi-modal, heavy or light tailed smooth density. The degree of flexibility required by many applications of GLLVM can be achieved through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood.

Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show by simulations that the estimated density of latent variables captures the true one with a good degree of accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of latent variables is a useful tool for exploring the adequacy of the GLLVM in practice.

In the second part (Chapter 3) we consider a spatial generalized linear latent variable model with and without the assumption of normality on latent variables. As shown, a Laplace approximation can be applied when latent variables are assumed multivariate normal. Otherwise the assumption of marginal normality is relaxed in favor of a mixture of normals with a common covariance matrix. We show how to construct a multivariate density with Gaussian spatial dependence and given nonoverlapping multivariate margins, and we use the pairwise likelihood to estimate the corresponding spatial generalized linear latent variable model. The properties of the obtained estimators are explored by simulations.


Acknowledgments

This thesis would not have been possible without Eva Cantoni, Marc G. Genton, Irini Moustaki, Maria-Pia Victoria-Feser, Elvezio Ronchetti, Philippe Huber, Christian, Erna and André Meylan, Laura and Alexandra Irincheeva, Konstantin Irincheev, the Swiss Government scholarship for foreign students, a grant from the Paul Moriaud foundation and the “Tremplin” grant of the “Commission de l’Egalité SES”. Thank you!


Chapter 1 Introduction

Latent variables are ubiquitous in science and applications but cannot be measured directly; they are usually assessed through latent variable models that use observed variables as proxies. Probably the most popular of such models is the Generalized Linear Latent Variable Model (GLLVM)¹. The principal uses of latent variables are heterogeneity modeling, dimension reduction and index construction. Essentially, they make it possible to uncover meaningful structure in multivariate, noisy data.

The development of latent variable models is usually associated with psychology, where, starting with the pioneering work of Spearman (1904) on factor analysis, researchers use latent variables to assess and make inference about intelligence, anxiety and other useful theoretical concepts. Later, latent variables were adopted by researchers in other social sciences and in economics to model constructs such as racism, quality of life, poverty, etc. The notion of a latent variable has much in common with that of a disease-related biomarker (a proxy that summarizes various particular signs of an illness), which is why latent variable models are now starting to be used to study illness markers jointly, as, for example, in Martins et al. (2006).

¹ This class of models is known under two names: GLLVM, as in Huber, Ronchetti, and Victoria-Feser (2004), and GLTM (Generalized Latent Trait Model), as in Moustaki and Knott (2000).

High-dimensional and/or large data sets are now routine in many applied domains. There is a pressing need to develop statistical tools for analyzing and interpreting such data, for which the normality assumption is often inappropriate, as noted, for example, in the very promising work on sparse factor analysis by Carvalho et al. (2008). However, the restriction to continuous manifest variables is clearly a limitation of sparse factor analysis models. In this work we relax the normality assumption on latent variables in the framework of the GLLVM, which allows for categorical manifest variables (with distributions belonging to the exponential family).

This introduction is followed by Chapter 2, where we introduce the semi-nonparametric specification of Gallant and Nychka (1987) for the latent variable distribution. This specification is flexible enough to allow for an asymmetric, multi-modal, heavy or light tailed smooth density. The degree of flexibility required by many applications of GLLVM can be achieved through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood.

Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show by simulations that the estimated density of latent variables captures the true one with a good degree of accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of latent variables is a useful tool for exploring the adequacy of the GLLVM in practice.

In Chapter 3 we explore the spatial GLLVM from the frequentist point of view, with and without the assumption of marginal normality on latent variables. As shown, a Laplace approximation can be applied when latent variables are assumed multivariate normal. Otherwise the assumption of marginal normality is relaxed in favor of a mixture of normals with a common covariance matrix. We show how to construct a multivariate density with Gaussian spatial dependence and given nonoverlapping multivariate margins, and we use the pairwise likelihood to estimate the corresponding spatial generalized linear latent variable model. The properties of the obtained estimators are explored by simulations. A final section concludes this manuscript.


Chapter 2 Generalized linear latent variable models with flexible distribution of latent variables

In this chapter we consider a semi-nonparametric specification for the density of latent variables in Generalized Linear Latent Variable Models (GLLVM). This specification is flexible enough to allow for an asymmetric, multi-modal, heavy or light tailed smooth density. The degree of flexibility required by many applications of GLLVM can be achieved through this semi-nonparametric specification with a finite number of parameters estimated by maximum likelihood. Even with this additional flexibility, we obtain an explicit expression of the likelihood for conditionally normal manifest variables. We show by simulations that the estimated density of latent variables captures the true one with a good degree of accuracy and is easy to visualize. By analyzing two real data sets we show that a flexible distribution of latent variables is a useful tool for exploring the adequacy of the GLLVM in practice.


2.1 Generalized linear latent variable models

Latent variables, as hypothetical constructs, are present in almost all sciences and in daily life. Indeed, constructs such as quality of life, physical health or disease are widespread in research and applications but cannot be measured directly. Usually, scientists make and validate inferences on those constructs with the help of latent variable models that use observable variables as proxies. In the aforementioned examples, quality of life could be modeled through economic wealth and access to drinking water; physical health can be assessed through cholesterol and hemoglobin levels, body mass index, eyesight, hearing and the presence of chronic diseases; a viral infection or other diseases can be revealed by fever, the level of some particular antibodies, the erythrocyte sedimentation rate, or the level of C-reactive protein. The principal aim of Generalized Linear Latent Variable Models (GLLVM, a concept due to Bartholomew, 1980 and 1984), as of any factor analysis model, is to explain most of the variability of $p$ observed (manifest) variables $X_1,\dots,X_p$ by constructing $q<p$ latent variables $Z_1,\dots,Z_q$. To this aim, the GLLVM assumes that $\Gamma Z$, with unknown parameter matrix $\Gamma\in\mathbb{R}^{p\times q}$ and $Z=(Z_1,\dots,Z_q)^T$, explains all the systematic variability of the manifest variables via the conditional probability density, or mass, function

$$g(x\mid z;\mu,\Gamma)=\prod_{j=1}^{p}g_j(x_j\mid\mu_j+\gamma_j^Tz), \tag{2.1}$$

where $x=(x_1,\dots,x_p)^T\in\mathbb{R}^p$, $z=(z_1,\dots,z_q)^T\in\mathbb{R}^q$, $\mu_j$ is the location parameter of $x_j$, $\gamma_j\in\mathbb{R}^q$ is the $j$th row of the $p\times q$ parameter matrix $\Gamma$, and $g_j(\cdot)$ is a probability density or mass function of a distribution from the exponential family.


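The marginal probability density or mass function of the manifest variables is

$$f(x\mid\mu,\Gamma,\psi)=\int_{\mathbb{R}^q}g(x\mid z)\,h(z)\,dz=\int_{\mathbb{R}^q}\left[\prod_{j=1}^{p}\exp\left\{\frac{x_j(\mu_j+\gamma_j^Tz)-b_j(\mu_j+\gamma_j^Tz)}{\psi_j}+c_j(x_j,\psi_j)\right\}\right]h(z)\,dz \tag{2.2}$$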

with functions $b_j(\cdot)$, $c_j(\cdot,\cdot)$ and, in some cases, an additional scale parameter $\psi_j\in\mathbb{R}$, with $\psi=(\psi_1,\dots,\psi_p)^T$. The linear combination $\mu_j+\gamma_j^Tz$ is related to the expected value of $X_j\mid z$ through a link function; denoting its inverse by $\theta_j(\cdot)$, we have $E(X_j\mid z)=\theta_j(\mu_j+\gamma_j^Tz)$. If needed, observable covariates $Y_1,\dots,Y_m$ possibly explaining the manifest variables $X_1,\dots,X_p$ can be introduced by setting $E(X_j\mid z,y)=\theta_j(\mu_j+\beta_j^Ty+\gamma_j^Tz)$, where $y,\beta_j\in\mathbb{R}^m$, and $\mu=(\mu_1,\dots,\mu_p)^T$, $\beta=(\beta_1,\dots,\beta_p)^T$ and $\Gamma$ are parameters to be estimated. Model (2.2) then becomes a generalization (for responses with distribution from the exponential family) of the response equation in a structural equation model, as in Rabe-Hesketh & Skrondal (2004, page 78). Although the structural relation among latent variables is not the focus of the GLLVM, it can be modeled straightforwardly through an additional structural equation, as in Liu et al. (2005). Model (2.2) with observable covariates can be generalized to two or more levels by adding subscripts to the manifest and latent variables, as in Rabe-Hesketh & Skrondal (2004, page 99), which makes it possible to model, for example, multivariate longitudinal data as in Cagnone et al. (2009) or Dunson (2003). Thus the GLLVM with observable covariates can be seen as an approach to a multivariate generalized mixed effects model.
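To make the exponential-family form inside (2.2) concrete, the Python sketch below (helper names are our own, not from the thesis) writes one factor $\exp\{(x\eta-b(\eta))/\psi+c(x,\psi)\}$, with $\eta$ standing for $\mu_j+\gamma_j^Tz$, for the two manifest-variable types used later in this chapter: Bernoulli with logit link and normal with identity link. It then checks both against the familiar pmf and pdf. For these canonical links the mean function is $\theta_j=b_j'$, which gives the inverse logit and the identity, respectively.

```python
import math


def expfam_density(x, eta, b, c, psi=1.0):
    """exp{(x * eta - b(eta)) / psi + c(x, psi)}: one factor of the product in (2.2).

    eta plays the role of the linear predictor mu_j + gamma_j^T z.
    """
    return math.exp((x * eta - b(eta)) / psi + c(x, psi))


# Bernoulli manifest variable (logit link): b(eta) = log(1 + e^eta), psi = 1, c = 0.
def bernoulli_b(eta):
    return math.log1p(math.exp(eta))


def bernoulli_c(x, psi):
    return 0.0


# Normal manifest variable (identity link): b(eta) = eta^2 / 2,
# c(x, psi) = -x^2 / (2 psi) - log(2 pi psi) / 2.
def normal_b(eta):
    return 0.5 * eta ** 2


def normal_c(x, psi):
    return -x ** 2 / (2.0 * psi) - 0.5 * math.log(2.0 * math.pi * psi)


# Bernoulli: P(X = 1 | z) should equal the inverse logit of eta.
eta = 0.3
print(expfam_density(1, eta, bernoulli_b, bernoulli_c), 1.0 / (1.0 + math.exp(-eta)))

# Normal: should equal the N(eta, psi) density evaluated at x.
x, psi = 1.2, 0.5
direct = math.exp(-(x - eta) ** 2 / (2 * psi)) / math.sqrt(2 * math.pi * psi)
print(expfam_density(x, eta, normal_b, normal_c, psi), direct)
```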

Traditionally, it is assumed that the density $h(z)$ of the latent variables is multivariate normal. Bartholomew (1988) advocated the adequacy of the normal distribution for two principal reasons. The first reason is the “arbitrariness about the direction of measurement of a latent scale”, for example, the convention that high customer satisfaction is given a high score on the corresponding latent variable.

Bartholomew (1988) suggested that only symmetric distributions of latent variables can overcome this arbitrariness. This statement was refuted by Montanari and Viroli (2010), who showed that any distribution of latent variables would overcome this arbitrariness. The second reason, in Bartholomew's (1988) view, is that an incorrect specification of the latent variable distribution would not affect the estimates. On the contrary, Ma and Genton (2010) described settings of the GLLVM where an inappropriate specification of an asymmetric latent variable distribution biases the estimates.

Using alternatives to normality for the latent variables is not new in the statistical literature on the GLLVM and its submodels. For instance, in factor analysis, Montanari and Viroli (2010) introduced skew-normal latent variables, alluding to the frequent asymmetry of ratings; Yung (1997) modeled latent variables via a mixture of normals in order to handle heterogeneity of clusters; Wedel and Kamakura (2001) assumed the latent variables to have a continuous distribution from the exponential family in order to construct test statistics. To date, an explicit expression of the integral in (2.2) exists only when $g(x\mid z)$ is multivariate normal and the latent variable distribution $h(z)$ is multivariate normal or a mixture of normals, as in Yung (1997). For the other cited cases, a numerical approximation of the integral is required. The model we propose approximates arbitrarily closely a wider class of latent variable distributions (and hence of manifest variable distributions too) than those proposed by Montanari and Viroli (2010) and Wedel and Kamakura (2001), yet yields an explicit expression of the integral in (2.2) in the case of conditionally normal manifest variables.

In another submodel of the GLLVM, the latent trait model with binary manifest variables, Knott and Tzamourani (2007) estimated the latent variable distribution by bootstrap combined with nonparametric maximum likelihood, and concluded that the usual normality assumption for the latent variables “is not always justified”. Unlike nonparametric maximum likelihood estimation (Laird, 1978), the semi-nonparametric approach has an appealing smoothness property. A smooth density of the latent variable is easier to grasp: we do not have to take into account the possibility of other, differently defined supports, as in the case of a discrete mass-point distribution.

For univariate responses, the appropriateness of the normal distribution has been studied for other latent variable (or random effects) models: for structural measurement error models in Huang et al. (2006), and for generalized mixed effects models in Rabe-Hesketh et al. (2003), who estimated the latent variable distribution with nonparametric likelihood, and in Chen et al. (2002), who used the semi-nonparametric approach as in the present paper.

In addition to concluding that the normality assumption for the latent variables is inappropriate, Rabe-Hesketh et al. (2003) highlighted the importance of correct distributional assumptions for the prediction of latent scores.

The estimated density of latent scores is simply the estimated density of latent variables. An inappropriate specification and visualization of this density lead to overlooked clusters and outliers and to misinterpretation of the estimation results.

In some cases, the inadequacy of normally distributed latent variables can be due to a nonlinear dependence on the latent variables, as explored for structural equation models by Wall and Amemiya (2000) and for generalized latent variable models by Rizopoulos and Moustaki (2008).

In this paper we consider the GLLVM with both discrete and continuous manifest variables and propose that $h(z)$ in (2.2) have the semi-nonparametric (SNP) specification introduced by Gallant and Nychka (1987)

$$h(z)=P_L^2(z)\,\varphi(z),\qquad P_L(z)=\sum_{0\le i_1+\dots+i_q\le L}a_{i_1\dots i_q}\,z_1^{i_1}\cdots z_q^{i_q}, \tag{2.3}$$

where $(i_1,\dots,i_q)$ is a tuple such that $i_1,\dots,i_q\ge 0$ and $\varphi(z)$ is the $q$-variate standard normal $N_q(0,I)$ density. It is straightforward to see that $L=0$ corresponds to the case $Z\sim N_q(0,I)$. We demonstrate further in the paper (Section 2.2.3) how the flexibility and the number of modes of the SNP density increase with the degree $L$ of the polynomial $P_L$.

Combining the GLLVM setting (2.2) and the SNP specification (2.3) results in what we call an SNP-GLLVM, where the marginal probability density or mass function of the manifest variables $X$ is

$$f(x\mid\mu,\Gamma,\psi,P_L)=\frac{1}{(2\pi)^{q/2}}\int_{\mathbb{R}^q}g(x\mid z)\,P_L^2(z)\exp\left\{-\tfrac12 z^Tz\right\}dz, \tag{2.4}$$

where the expression for $g(x\mid z)$ is given by (2.1).

In what follows, we propose a necessary and sufficient condition for the identifiability of (2.4) and define the estimators $\hat\mu$, $\hat\Gamma$, $\hat\psi$ via maximum likelihood. For conditionally normal manifest variables, the integral in (2.4) can be computed explicitly.

One of our main results is the demonstration, by simulations, that in some GLLVM settings an incorrect specification of the latent variable distribution biases the estimators $\hat\mu$ and $\hat\Gamma$.

The estimated SNP density of the latent variables (or latent score distribution) is easy to visualize, which is an advantage over the semiparametric GLLVM estimator proposed recently by Ma and Genton (2010). An obvious non-normality (multi-modality and/or skewness) of the latent score distribution can indicate the presence of outliers, a possible nonlinear dependence on the latent variables, heterogeneity of the population, or simply the inadequacy of the normal latent density for the particular data.

2.2 Semi-nonparametric GLLVM

2.2.1 Parametrization of $P_L(z)$

Restrictions must be imposed on the coefficients of $P_L(z)$ in order for $h(z)$ in (2.3) to be a density. This can be done as in Gallant and Tauchen (1989) by introducing a proportionality constant $1/\int P_L^2(z)\varphi(z)\,dz$ and setting the constant term of the polynomial equal to one. Here we choose another parametrization of $P_L(z)$ that avoids the difficulties of constrained optimization. This parametrization was proposed by Zhang and Davidian (2001) and consists in rewriting the validity condition on $h(z)$ as

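$$1=\int_{\mathbb{R}^q}P_L^2(z)\varphi(z)\,dz=E\{P_L^2(W)\}=a^TE(\tilde W\tilde W^T)\,a=a^TAa, \tag{2.5}$$

with $W\sim N_q(0,I)$, $P_L(W)=a^T\tilde W$ and $\tilde W=(1,W_1,\dots,W_q,W_1^2,W_1W_2,\dots,W_q^L)^T$, so that $A$ is a positive definite matrix (the monomials in $\tilde W$ being linearly independent). Therefore, there exists a nonsingular matrix $B$ (for example, the upper-triangular Cholesky factor of $A$) such that $A=B^TB$. Defining $c=Ba$, (2.5) becomes $c^Tc=1$.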

Hence, $c=(c_1,\dots,c_d)^T$ can be represented in polar coordinates:

$$c_1=\sin\phi_1,\quad c_2=\cos\phi_1\sin\phi_2,\quad\dots,\quad c_{d-1}=\cos\phi_1\cdots\cos\phi_{d-2}\sin\phi_{d-1},\quad c_d=\cos\phi_1\cos\phi_2\cdots\cos\phi_{d-2}\cos\phi_{d-1},$$

with angles $-\pi/2<\phi_t\le\pi/2$, $t=1,\dots,d-1$, so that $c$ takes values only on one half of the unit sphere in $\mathbb{R}^d$ (the half with $c_d\ge 0$; no generality is lost, because $P_L^2$ is unchanged when $a$ is replaced by $-a$). More details on the polar coordinate transformation can be found in Scott (1992). Note that, according to Stetter (2004, page 228),

$$d=\sum_{k=0}^{L}\binom{q+k-1}{k}.$$
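As a check on the dimension $d$ and on the matrix $A$ in (2.5), the Python/NumPy sketch below (our own helper names, not from the thesis) enumerates the monomial exponents in the order used for $\tilde W$, compares their count with the formula for $d$, and builds $A$ from standard normal moments. Because the coordinates of $W$ are independent, each entry of $A$ factorizes into univariate moments $E(W^k)$, which are zero for odd $k$ and $(k-1)!!$ for even $k$. A successful Cholesky factorization confirms that $A$ is positive definite and supplies one valid choice of $B$.

```python
import itertools
import math

import numpy as np


def monomial_exponents(q, L):
    """Exponent tuples (i_1, ..., i_q) with i_1 + ... + i_q <= L, in the graded order
    1, z_1, ..., z_q, z_1^2, z_1 z_2, ..., z_q^L used for W~ and z~."""
    exps = [e for e in itertools.product(range(L + 1), repeat=q) if sum(e) <= L]
    return sorted(exps, key=lambda e: (sum(e), tuple(-i for i in e)))


def d_formula(q, L):
    """d = sum_{k=0}^{L} binom(q + k - 1, k)."""
    return sum(math.comb(q + k - 1, k) for k in range(L + 1))


def std_normal_moment(k):
    """E[W^k] for W ~ N(0, 1): 0 for odd k, (k - 1)!! for even k."""
    return 0.0 if k % 2 else float(np.prod(np.arange(k - 1, 0, -2)))


def moment_matrix(q, L):
    """A = E(W~ W~^T) for W ~ N_q(0, I); entries factorize over independent coordinates."""
    exps = monomial_exponents(q, L)
    return np.array([[np.prod([std_normal_moment(r[k] + s[k]) for k in range(q)])
                      for s in exps] for r in exps])


q, L = 2, 2
print(monomial_exponents(q, L))   # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
A = moment_matrix(q, L)
B = np.linalg.cholesky(A).T       # upper triangular with A = B^T B; raises if A is not positive definite
```

The count also equals $\binom{q+L}{L}$, the number of monomials of degree at most $L$ in $q$ variables.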

Thus, the density (2.3) can be rewritten as

$$h(z\mid\phi,L)=P_L^2(z)\varphi(z)=\left(a^T\tilde z\right)^2\varphi(z), \tag{2.6}$$

where $a=B^{-1}c$, $\tilde z=(1,z_1,\dots,z_q,z_1^2,z_1z_2,z_2^2,\dots,z_q^L)^T$ and $\phi=(\phi_1,\dots,\phi_{d-1})^T$. For example, when $q=1$ (one latent variable), $L=2$ and $P_L(z)=a_0+a_1z+a_2z^2$, we obtain $a_0=\sin\phi_1-\frac{1}{\sqrt2}\cos\phi_1\cos\phi_2$, $a_1=\cos\phi_1\sin\phi_2$ and $a_2=\frac{1}{\sqrt2}\cos\phi_1\cos\phi_2$.
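The chain from angles to polynomial coefficients, $\phi\mapsto c\mapsto a=B^{-1}c$, can be sketched for one latent variable in Python/NumPy as follows. We assume $B$ is the upper-triangular Cholesky factor of $A$; for $L=2$ this choice reproduces the closed-form coefficients of the example above. The helper names are our own, and the code is an illustration of the parametrization, not the author's R implementation.

```python
import numpy as np


def polar_to_unit(phi):
    """c_1 = sin(phi_1), c_t = cos(phi_1)...cos(phi_{t-1}) sin(phi_t), c_d = product of all cosines."""
    phi = np.asarray(phi, dtype=float)
    c = np.empty(phi.size + 1)
    running = 1.0                        # cos(phi_1) * ... * cos(phi_{t-1})
    for t, angle in enumerate(phi):
        c[t] = running * np.sin(angle)
        running *= np.cos(angle)
    c[-1] = running
    return c


def moment_matrix_1d(L):
    """A = E(W~ W~^T) for q = 1, W~ = (1, W, ..., W^L), W ~ N(0, 1)."""
    def m(k):
        return 0.0 if k % 2 else float(np.prod(np.arange(k - 1, 0, -2)))
    return np.array([[m(i + j) for j in range(L + 1)] for i in range(L + 1)])


def snp_coefficients(phi):
    """a = B^{-1} c, with B the upper-triangular Cholesky factor of A (A = B^T B)."""
    c = polar_to_unit(phi)
    A = moment_matrix_1d(c.size - 1)
    B = np.linalg.cholesky(A).T          # numpy returns lower-triangular G with A = G G^T
    return np.linalg.solve(B, c)


# L = 2: compare with the closed-form coefficients of the example.
phi = np.array([0.4, -1.1])
a = snp_coefficients(phi)
closed_form = np.array([np.sin(phi[0]) - np.cos(phi[0]) * np.cos(phi[1]) / np.sqrt(2),
                        np.cos(phi[0]) * np.sin(phi[1]),
                        np.cos(phi[0]) * np.cos(phi[1]) / np.sqrt(2)])
print(np.allclose(a, closed_form), a @ moment_matrix_1d(2) @ a)

# L = 4: any angle vector yields a polynomial with a^T A a = 1, i.e. a valid density.
a4 = snp_coefficients([0.2, -0.5, 1.0, 0.3])
```

Since $c^Tc=1$ holds by construction, every angle vector produces a proper density. This is why the optimization over $\phi$ is unconstrained.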

2.2.2 Identifiability and constraints

As noted by many researchers, for example Rabe-Hesketh and Skrondal (2001) and Rabe-Hesketh and Skrondal (2004), the major difficulty with all models with latent variables is identifiability. According to Hastie et al. (2001, page 494), “this aspect has left many analysts skeptical of factor analysis, and may account for its lack of popularity in contemporary statistics”.

A parametric statistical model is said to be identified if distinct values of the parameters correspond to distinct probability density or mass functions of the response variables. With this definition, we investigate how any affine transformation of the latent variables $Z$ affects the probability density or mass function (2.4) of the random vector $X$.


Proposition 2.2.1 For any $P_L^2(z)\neq 0$, the orthogonal transformation $Z_1=CZ$ ($CC^T=C^TC=I_q$) is the one and only affine transformation of the random vector $Z$ leaving the probability density or mass function (2.4) of the random vector $X$ unchanged.

Corollary 2.2.2 The loadings matrix $\Gamma$ is undefined with respect to the orthogonal transformation $Z_1=CZ$ ($CC^T=C^TC=I_q$).

The proofs of these results are in Appendix 3.

Huber et al. (2004) demonstrated that if the loadings matrix is undefined with respect to the orthogonal transformation, then a sufficient condition for the identifiability of a GLLVM is that $q(q-1)/2$ elements of the matrix $\Gamma$ are set to zero. In other words, after permutations, the elements of the upper triangle of $\Gamma$ should be constrained. The same authors proved that if, in addition, at least one of the elements of the loadings matrix is constrained to be either smaller or larger than zero, then the loadings matrix is necessarily identified. Similar conclusions can be found in Ma and Genton (2010). The same number of $q(q-1)/2$ constraints is used by Jöreskog (1967) in order to obtain a single solution in factor analysis and avoid rotation.
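The rotational freedom behind these constraints can be illustrated numerically. In the Python/NumPy sketch below (our own construction), a QR decomposition $\Gamma^T=QR$ supplies an orthogonal $C=Q^T$. The rotated loadings $\Gamma C^T=\Gamma Q=R^T$ then have exactly $q(q-1)/2$ zeros above the diagonal, while $\Gamma\Gamma^T$ is unchanged. Any loadings matrix can therefore be brought to this zero pattern by a rotation, which is why fixing those entries removes the indeterminacy. The QR route is a convenient choice for this illustration, not the procedure of Huber et al. (2004), and the random $\Gamma$ is hypothetical.

```python
import numpy as np


def rotate_loadings(gamma):
    """Return an orthogonal C and Gamma C^T, whose entries above the diagonal are zero.

    Gamma^T = Q R (reduced QR: Q is q x q orthogonal, R is q x p upper trapezoidal),
    so Gamma Q = R^T is lower trapezoidal. With C = Q^T and Z_1 = C Z,
    Gamma Z = (Gamma C^T) Z_1.
    """
    Q, R = np.linalg.qr(gamma.T)
    C = Q.T
    return C, gamma @ C.T


rng = np.random.default_rng(1)
p, q = 6, 3
gamma = rng.normal(size=(p, q))          # hypothetical unconstrained loadings
C, gamma_rot = rotate_loadings(gamma)
print(np.round(gamma_rot, 3))            # entries (1,2), (1,3), (2,3) are zero: q(q-1)/2 = 3
```

Flipping the sign of a column of the rotated matrix is also an orthogonal transformation. This is the residual ambiguity that the additional sign constraint of Huber et al. (2004) addresses.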

2.2.3 Multi-modality and flexibility of SNP

Similarly to the distributions considered by Ma and Genton (2004), the number of modes of the semi-nonparametric density increases with the degree $L$ of the polynomial $P_L(z)$ and the number $q$ of latent variables. Indeed, a necessary condition for a mode (local extremum) at the point $z_0$ is a null gradient:

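$$\left.\frac{\partial}{\partial z}P_L^2(z)\varphi(z)\right|_{z=z_0}=\left[\left\{2\frac{\partial P_L(z)}{\partial z}-zP_L(z)\right\}P_L(z)\varphi(z)\right]_{z=z_0}=0,$$

i.e., either

$$P_L(z)\big|_{z=z_0}=0 \tag{2.7}$$

or

$$\left\{2\frac{\partial P_L(z)}{\partial z}-zP_L(z)\right\}\bigg|_{z=z_0}=0. \tag{2.8}$$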

The set of real solutions of (2.7) can contain from 0 to $L$ distinct manifolds of dimension $q-1$ or less. It is easy to see that any solution of (2.7), if it exists, corresponds to a local minimum of the density $P_L^2(z)\varphi(z)$, since the density vanishes there and is nonnegative everywhere. Thus, if (2.7) has up to $L$ different solutions, $P_L^2(z)\varphi(z)$ has up to $L+1$ different modes. Independently of the fitted data, if $L$ is odd, the SNP density is equal to zero on a manifold of dimension $q-1$ (e.g., if $q=2$ and $L$ is odd, then the SNP density has to be equal to zero on a curve in $\mathbb{R}^2$). For this reason, high odd degrees $L$ should be avoided if possible.

Equation (2.8) is a system of $q$ polynomial equations, each of degree $L+1$ in $q$ variables. In the regular case, i.e., without assuming that some coefficients in (2.6) are null, the system (2.8) can have up to $(L+1)^q$ different isolated point solutions (i.e., solutions consisting of single points rather than manifolds such as curves), according to Stetter (2004, page 228).

Determining the number of points where both (2.7) and (2.8) hold is not trivial. But assuming that (2.7) has $L$ different isolated point solutions at which $\partial P_L(z)/\partial z=0$, and that (2.8) has $(L+1)^q$ different isolated point solutions, we obtain that (2.7) and (2.8) together define at most $(L+1)^q$ isolated point solutions. This implies that an SNP density can have at most three modes when $L=2$, $q=1$; four modes when $L=2$, $q=2$; and thirteen modes when $L=2$, $q=3$. Our practical experience with $L=2$, $q=1$ and $L=2$, $q=2$ confirms these conclusions.
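The bound for $L=2$, $q=1$ can be checked empirically with a brute-force count, sketched below in Python/NumPy (our own helper). It counts strict local maxima of $P_L^2(z)\varphi(z)$ on a fine grid. The coefficient vectors need not be normalized, because rescaling $P_L$ does not move the modes. For $P_2(z)=z^2-1$, (2.8) reduces to $z(5-z^2)=0$, so there are three modes, at $0$ and $\pm\sqrt5$. A grid count can miss modes outside the grid but cannot create extra ones, so it only provides a sanity check of the upper bound.

```python
import numpy as np


def count_modes_on_grid(a, lo=-8.0, hi=8.0, num=16001):
    """Count strict local maxima of h(z) = P_L(z)^2 phi(z), q = 1, on a regular grid.

    a holds the coefficients of P_L in increasing order of degree.
    """
    z = np.linspace(lo, hi, num)
    h = np.polynomial.polynomial.polyval(z, a) ** 2 * np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    interior = h[1:-1]
    return int(np.sum((interior > h[:-2]) & (interior > h[2:])))


# P_2(z) = z^2 - 1: three modes, at 0 and +/- sqrt(5).
print(count_modes_on_grid(np.array([-1.0, 0.0, 1.0])))

# Random coefficient vectors with L = 2: never more than three modes.
rng = np.random.default_rng(2)
counts = [count_modes_on_grid(rng.normal(size=3)) for _ in range(200)]
print(max(counts))
```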


A sufficient condition for a local maximum (minimum) at $z=z_0$ is that the Hessian matrix $\partial^2P_L^2(z)\varphi(z)/\partial z\,\partial z^T$ at this point is negative definite (respectively, positive definite):

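$$\frac{\partial^2}{\partial z\,\partial z^T}P_L^2(z)\varphi(z)=2\frac{\partial P_L(z)}{\partial z}\frac{\partial P_L(z)}{\partial z^T}\varphi(z)+2\frac{\partial^2P_L(z)}{\partial z\,\partial z^T}P_L(z)\varphi(z)+zz^TP_L^2(z)\varphi(z)-2\frac{\partial P_L(z)}{\partial z}z^TP_L(z)\varphi(z)-2z\frac{\partial P_L(z)}{\partial z^T}P_L(z)\varphi(z)-I_qP_L^2(z)\varphi(z). \tag{2.9}$$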

Once a density of latent variables is estimated, the solutions of (2.8) can be found numerically and the Hessian (2.9) evaluated at these points. Hence, the number of modes can be established.
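For one latent variable, the procedure just described is short enough to sketch in Python with NumPy's Polynomial class (the helper name is ours). Equation (2.8) becomes the univariate polynomial $2P_L'(z)-zP_L(z)$ of degree $L+1$. Its real roots are the critical points off the zero set of $P_L$. Each root is then classified with the scalar form of (2.9), $\{2P_L'^2+2P_L''P_L+z^2P_L^2-4zP_L'P_L-P_L^2\}\varphi(z)$, where the two cross terms of (2.9) coincide when $q=1$.

```python
import numpy as np
from numpy.polynomial import Polynomial


def snp_modes_1d(a):
    """Modes of h(z) = P_L(z)^2 phi(z) for q = 1 (a: coefficients of P_L, increasing degree).

    Critical points off the zero set of P_L solve (2.8): 2 P_L'(z) - z P_L(z) = 0.
    Each real root is kept if the scalar version of the Hessian (2.9) is negative there.
    """
    P = Polynomial(a)
    dP, d2P = P.deriv(), P.deriv(2)
    eq28 = 2 * dP - Polynomial([0.0, 1.0]) * P
    roots = eq28.roots()
    real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
    phi = np.exp(-0.5 * real ** 2) / np.sqrt(2 * np.pi)
    p, dp, d2p = P(real), dP(real), d2P(real)
    # scalar form of (2.9): {2P'^2 + 2P''P + z^2 P^2 - 4 z P' P - P^2} phi(z)
    hess = (2 * dp ** 2 + 2 * d2p * p + real ** 2 * p ** 2 - 4 * real * dp * p - p ** 2) * phi
    return real[hess < 0]


# P_2(z) = z^2 - 1: (2.8) reads 5z - z^3 = 0, and all three roots are maxima.
print(snp_modes_1d([-1.0, 0.0, 1.0]))
```

For $q>1$, the same idea requires solving a polynomial system and checking the definiteness of the full Hessian matrix, which is considerably harder to do robustly.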

2.3 Inference in SNP-GLLVM

2.3.1 Conditionally normal manifest variables

Suppose that the conditional density of the manifest variables given the latent ones is multivariate normal, with the structural scale parameter given by a diagonal positive definite matrix $\Psi=\mathrm{diag}(\psi)\in\mathbb{R}^{p\times p}$. Then the marginal density of $x$ can be written as:

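$$f(x;\mu,\Gamma,\Psi,P_L)=|2\pi(\Psi+\Gamma\Gamma^T)|^{-1/2}\exp\left\{-\tfrac12 x_0^T(\Psi+\Gamma\Gamma^T)^{-1}x_0\right\}E_{z\mid x_0}\{P_L^2(z)\}, \tag{2.10}$$

where $x_0=x-\mu$, $B=I_q+\Gamma^T\Psi^{-1}\Gamma$ and

$$E_{z\mid x_0}\{P_L^2(z)\}=|2\pi B^{-1}|^{-1/2}\int_{\mathbb{R}^q}P_L^2(z)\exp\left\{-\tfrac12\left(z-B^{-1}\Gamma^T\Psi^{-1}x_0\right)^TB\left(z-B^{-1}\Gamma^T\Psi^{-1}x_0\right)\right\}dz. \tag{2.11}$$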


As all the moments of the multivariate normal distribution are known and completely determined by its first two moments, $E_{z\mid x_0}\{P_L^2(z)\}$ exists in explicit form and is a polynomial of degree $2L$ in $x_0$. Hence, the marginal density of $x$ exists in closed form. When $P_L(z)\equiv 1$ we obtain the classical factor analysis model for normally distributed manifest variables, as described, for example, by Mardia et al. (1979).
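The closed form can be checked numerically for one latent variable. The Python sketch below (NumPy/SciPy; helper names and parameter values are hypothetical) evaluates (2.10)-(2.11). Here $z\mid x_0$ is normal with mean $B^{-1}\Gamma^T\Psi^{-1}x_0$ and variance $B^{-1}$, and its raw moments follow from the recursion $M_k=mM_{k-1}+(k-1)vM_{k-2}$. The result is compared with direct adaptive quadrature of (2.4). The two agree for any coefficient vector, normalized or not.

```python
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal, norm


def normal_raw_moments(m, v, kmax):
    """E[Z^k], k = 0..kmax, for Z ~ N(m, v), via M_k = m M_{k-1} + (k - 1) v M_{k-2}."""
    M = np.zeros(kmax + 1)
    M[0] = 1.0
    if kmax >= 1:
        M[1] = m
    for k in range(2, kmax + 1):
        M[k] = m * M[k - 1] + (k - 1) * v * M[k - 2]
    return M


def density_closed_form(x, mu, gamma, psi, a):
    """(2.10)-(2.11) with q = 1: gamma is the p-vector of loadings, a the coefficients of P_L."""
    x0 = x - mu
    sigma = np.diag(psi) + np.outer(gamma, gamma)            # Psi + Gamma Gamma^T
    normal_part = multivariate_normal(mean=np.zeros(x.size), cov=sigma).pdf(x0)
    b = 1.0 + np.sum(gamma ** 2 / psi)                       # B = I_q + Gamma^T Psi^{-1} Gamma
    m, v = np.sum(gamma * x0 / psi) / b, 1.0 / b             # mean and variance of z | x0
    p2 = np.polynomial.polynomial.polymul(a, a)              # coefficients of P_L^2
    return normal_part * np.dot(p2, normal_raw_moments(m, v, p2.size - 1))


def density_quadrature(x, mu, gamma, psi, a):
    """Direct numerical evaluation of (2.4) with q = 1, for comparison."""
    def integrand(z):
        g = np.prod(norm.pdf(x, loc=mu + gamma * z, scale=np.sqrt(psi)))
        return g * np.polynomial.polynomial.polyval(z, a) ** 2 * norm.pdf(z)
    return integrate.quad(integrand, -np.inf, np.inf)[0]


phi1, phi2 = 0.3, -0.8                                       # hypothetical angles, L = 2
s = np.cos(phi1) * np.cos(phi2) / np.sqrt(2)
a = np.array([np.sin(phi1) - s, np.cos(phi1) * np.sin(phi2), s])
mu, gamma, psi = np.array([0.5, -0.2, 1.0]), np.array([1.0, 0.7, -0.4]), np.array([0.5, 0.8, 0.6])
x = np.array([1.1, 0.3, 0.4])
closed = density_closed_form(x, mu, gamma, psi, a)
numeric = density_quadrature(x, mu, gamma, psi, a)
print(closed, numeric)
```

With $a=(1)$ the closed form reduces to the $N_p(\mu,\Psi+\Gamma\Gamma^T)$ density of classical factor analysis.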

Using (2.10) we obtain the following log-likelihood function:

$$\ell(\mu,\Gamma,\Psi,\phi,L\mid x_1,\dots,x_n)=-\frac n2\log|2\pi(\Psi+\Gamma\Gamma^T)|-\frac12\sum_{i=1}^n x_{0i}^T(\Psi+\Gamma\Gamma^T)^{-1}x_{0i}+\sum_{i=1}^n\log\left[E_{z\mid x_{0i}}\{P_L^2(z)\}\right]. \tag{2.12}$$

The parameters of interest are those inherited from factor analysis, namely $\mu$, $\Gamma$, $\Psi$, with additional parameters $\phi$ and $L$ responsible for the shape of the latent variable density. In practice, $L$ is fixed by the rule discussed in Section 2.3.3. The final estimators are defined as

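$$\hat\mu=\mu^*+\Gamma^*\tilde E(Z),\qquad\hat\Gamma=\Gamma^*\,\widetilde{\mathrm{cov}}^{1/2}(Z),\qquad\hat\Psi=\Psi^*, \tag{2.13}$$

where $(\mu^*,\Gamma^*,\Psi^*,\phi^*)=\arg\max_{\mu,\Gamma,\Psi,\phi}\ell(\mu,\Gamma,\Psi,\phi\mid L,x_1,\dots,x_n)$, and $\tilde E(Z)$ and $\widetilde{\mathrm{cov}}^{1/2}(Z)$ are computed given $\phi^*$ and the SNP density (2.6). Thus, $\hat\mu$ and $\hat\Gamma$ are the estimators corresponding to uncorrelated latent variables with zero expectation and unit variance.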

In the optimization of $\ell(\mu,\Gamma,\Psi,\phi\mid L,x_1,\dots,x_n)$ we use an analytically computed gradient and Hessian matrix (the gradient can be found in Appendix 1; the Hessian expression is formidable and is available upon request). It should be stressed that the Hessian is computed in matrix form, offering a considerable advantage in the R implementation compared to existing Hessian matrix computations such as those in Lawley (1967), Jennrich and Thayer (1973) and Ramsey (2010). The optimization is done in R with the nlminb function and is sensitive to the choice of initial values. We discuss how to cope with this problem at the end of the next section.

2.3.2 Mixture of conditionally binary and normal manifest variables

In practice, the presence of both continuous and binary (or discrete) responses is more frequent than exclusively continuous responses. Suppose that among the $p$ manifest variables the first $p_1$ are normal conditionally on the latent variables, and the last $p-p_1$ are conditionally Bernoulli, i.e., the joint conditional probability density or mass function from (2.1) is:

$$g(x\mid z)=\prod_{j=1}^{p_1}\left[\frac{1}{\sqrt{2\pi\psi_j}}\exp\left\{-\frac{1}{2\psi_j}(x_j-\mu_j-\gamma_j^Tz)^2\right\}\right]\prod_{j=p_1+1}^{p}\left\{\frac{\exp(x_j\mu_j+x_j\gamma_j^Tz)}{1+\exp(\mu_j+\gamma_j^Tz)}\right\}, \tag{2.14}$$

where the expression in the last braces is obtained by setting $\mathrm{pr}(x_j=1)=p_j$ and choosing the logit link, i.e., $\log\{p_j/(1-p_j)\}=\mu_j+\gamma_j^Tz$. The marginal density of the manifest variables for the SNP-GLLVM is then obtained straightforwardly by plugging (2.14) into (2.4). We approximate the corresponding log-likelihood function $\ell(\mu,\Gamma,\Psi,\phi\mid L,x_1,\dots,x_n)$ with one latent variable by computing the integral with the R command integrate. The latter uses several algorithms, including different adaptive integration algorithms for which “the evaluation points are clustered in the neighborhood of difficult spot of each integrand” (Piessens et al., 1983).
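The same computation can be sketched in Python. R's integrate builds on the QUADPACK routines, and so does scipy.integrate.quad, which we use here as a stand-in. The sketch (our own helper names; all parameter values hypothetical) evaluates (2.14) for one latent variable, integrates it against the SNP density as in (2.4), and sums log marginal densities into a log-likelihood. A convenient self-check: with only binary manifest variables, the marginal probabilities of all $2^p$ response patterns must add up to one whenever $P_L$ is normalized.

```python
import itertools

import numpy as np
from scipy import integrate
from scipy.stats import norm


def conditional_density(x, z, mu, gamma, psi, p1):
    """g(x | z) from (2.14), q = 1: first p1 entries normal, remaining ones Bernoulli (logit link)."""
    eta = mu + gamma * z                                     # linear predictors mu_j + gamma_j z
    normal = norm.pdf(x[:p1], loc=eta[:p1], scale=np.sqrt(psi[:p1]))
    xb, etab = x[p1:], eta[p1:]
    bernoulli = np.exp(xb * etab - np.logaddexp(0.0, etab))  # exp(x eta) / (1 + exp(eta))
    return np.prod(normal) * np.prod(bernoulli)


def marginal_density(x, mu, gamma, psi, p1, a):
    """Plug (2.14) into (2.4) and integrate over z with adaptive quadrature."""
    def integrand(z):
        return (conditional_density(x, z, mu, gamma, psi, p1)
                * np.polynomial.polynomial.polyval(z, a) ** 2 * norm.pdf(z))
    return integrate.quad(integrand, -np.inf, np.inf, limit=200)[0]


def log_likelihood(data, mu, gamma, psi, p1, a):
    return sum(np.log(marginal_density(x, mu, gamma, psi, p1, a)) for x in data)


phi1, phi2 = 0.5, 1.0
s = np.cos(phi1) * np.cos(phi2) / np.sqrt(2)
a = np.array([np.sin(phi1) - s, np.cos(phi1) * np.sin(phi2), s])    # normalized P_2

# Purely binary case (p1 = 0): probabilities over all 2^3 patterns should sum to one.
mu_b, gamma_b, psi_b = np.array([0.2, -0.5, 1.0]), np.array([1.5, 0.8, -1.2]), np.ones(3)
total = sum(marginal_density(np.array(x, dtype=float), mu_b, gamma_b, psi_b, 0, a)
            for x in itertools.product([0, 1], repeat=3))

# Mixed case: one conditionally normal and two conditionally Bernoulli variables.
mu_m, gamma_m, psi_m = np.array([0.0, 0.3, -0.4]), np.array([1.0, 1.2, 0.6]), np.array([0.5, 1.0, 1.0])
data = [np.array([0.4, 1.0, 0.0]), np.array([-1.1, 0.0, 0.0]), np.array([2.0, 1.0, 1.0])]
ll = log_likelihood(data, mu_m, gamma_m, psi_m, 1, a)
print(total, ll)
```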

In a similar GLLVM setting, but with a normal distribution for the latent variables, Huber et al. (2004) implemented a Laplace approximation of the integral. We highlight here that the Laplace approximation is conceived for integrands with only one absolute maximum (De Bruijn, 1981, page 63) and cannot be used to approximate the integral in (2.4), whose integrand can have multiple local maxima.

Similarly, Gaussian or Gauss-Hermite quadratures will perform poorly. Other alternatives for computing the integral would be to consider a Monte Carlo EM algorithm as implemented by Chen et al. (2002) or an adaptive quadrature algorithm for numerical integration.

The estimators $\hat\mu$ and $\hat\Gamma$ are defined as in (2.13), with the approximation $\tilde\ell(\mu,\Gamma,\Psi,\phi\mid L,x_1,\dots,x_n)$ (due to the numerical integration) of the log-likelihood function $\ell(\mu,\Gamma,\Psi,\phi\mid L,x_1,\dots,x_n)$. The optimization is carried out via nlminb with $10^{-4}$ as absolute and relative tolerance and an analytically computed gradient and Hessian (the gradient is given in Appendix 2; the analytical expression of the Hessian is available upon request). As before, the optimized function has multiple local optima. Thus, an appropriate set of initial values is essential for a reliable optimization. As initial values for the parameters $\mu^*$, $\Gamma^*$ and $\Psi^*$, we use their maximum likelihood estimates under the normality assumption on the latent variables. Initial values for the $\phi$ parameters are taken from the grid of initial values constructed by the R command cover.design (Furrer et al., 2009) in the space $[-\pi/2-\pi/10,\;\pi/2+\pi/10]^{d-1}$. The number of initial values depends on $d-1$ and is determined empirically (we stop increasing the number of initial values if the best value of the optimized function does not change after a few successive increments). In our experience, this approach is more stable, faster and more reliable than the genetic optimization algorithm (with a very large population size) implemented in the package rgenoud (Mebane & Sekhon, 2010) or any other optimization method implemented in the R command optim.
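The multi-start idea can be sketched in Python on a deliberately simplified problem. The objective is the SNP log-likelihood of directly observed values $z$ with $L=2$, $q=1$, not the full GLLVM likelihood. A plain regular grid over $[-\pi/2-\pi/10,\pi/2+\pi/10]^{2}$ stands in for the space-filling design of cover.design, and scipy's Nelder-Mead stands in for nlminb. The angles $(\pi/2,0)$, which give $P_2\equiv 1$ (the normal case), are added to the starting set, mirroring the use of normal-theory estimates as starting values. Every choice here (grid size, optimizer, toy data) is an assumption of the sketch.

```python
import itertools

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def snp_coefficients_l2(phi):
    """(a0, a1, a2) of P_2(z) for q = 1, from the polar parametrization of Section 2.2.1."""
    s = np.cos(phi[0]) * np.cos(phi[1]) / np.sqrt(2.0)
    return np.array([np.sin(phi[0]) - s, np.cos(phi[0]) * np.sin(phi[1]), s])


def neg_loglik(phi, z):
    """Toy objective: minus the SNP log-likelihood of directly observed values z."""
    a = snp_coefficients_l2(phi)
    dens = np.polynomial.polynomial.polyval(z, a) ** 2 * norm.pdf(z)
    return -np.sum(np.log(dens + 1e-300))      # small guard against log(0) at zeros of P_2


def multistart_fit(z, points_per_axis=5):
    """Run a local optimizer from every point of a regular grid of angles (plus the N(0, 1) case)."""
    lo, hi = -np.pi / 2 - np.pi / 10, np.pi / 2 + np.pi / 10
    axis = np.linspace(lo, hi, points_per_axis)
    starts = [np.array(s) for s in itertools.product(axis, repeat=2)]
    starts.append(np.array([np.pi / 2, 0.0]))   # these angles give P_2(z) = 1, i.e. h = N(0, 1)
    best = None
    for start in starts:
        res = minimize(neg_loglik, start, args=(z,), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best


# Hypothetical bimodal sample, used only to exercise the search.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(-1.5, 0.5, 200), rng.normal(1.5, 0.5, 200)])
best = multistart_fit(z)
print(best.x, best.fun, neg_loglik(np.array([np.pi / 2, 0.0]), z))
```

Because the normal case is among the starts and Nelder-Mead never returns a worse value than its starting point, the retained fit is at least as good as the normal one. The grid simply improves the chance of reaching the global optimum.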


2.3.3 Tuning the flexibility of the SNP density

The flexibility of the SNP density is controlled by the degree L of the polynomial P_L(z) in (2.6). Different approaches have been explored for choosing L: the original work by Gallant and Nychka (1987) proposed to fix L by a deterministic rule L = n^α, 0 < α < 1, where n is the sample size. Davidian and Gallant (1993) and Fenton and Gallant (1996) explored, under different settings, whether an adaptive rule for the choice of L can be applied. Following these authors, we select L on the basis of an information criterion of the form

$$-\ell(\mu, \Gamma, \Psi, \phi \mid L, x_1, \ldots, x_n) + C(n)\,k/n,$$

where k is the number of unconstrained parameters in the model with fixed L, and C(n) = 1 for the Akaike Information Criterion (AIC, Akaike, 1974), C(n) = 0.5 log n for the Schwarz Information Criterion (BIC, Schwarz, 1978), and C(n) = log log n for the Hannan-Quinn criterion (HQ, Hannan, 1987). The value of L that minimizes the chosen criterion is selected.
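The selection rule can be written out in a few lines. This is a sketch assuming, as the k/n scaling suggests, that ℓ denotes the log-likelihood averaged over the n observations; the fitted values below are made up and only show how the three penalty constants C(n) act.

```python
import math


def penalty(criterion, n):
    """Penalty constant C(n) of Section 2.3.3."""
    if criterion == "AIC":
        return 1.0
    if criterion == "BIC":
        return 0.5 * math.log(n)
    if criterion == "HQ":
        return math.log(math.log(n))
    raise ValueError(f"unknown criterion: {criterion}")


def criterion_value(avg_loglik, k, n, criterion):
    """-l + C(n) k / n, with l assumed to be the log-likelihood averaged over n observations."""
    return -avg_loglik + penalty(criterion, n) * k / n


def select_degree(fits, n, criterion):
    """Return the degree L minimizing the criterion; fits maps L -> (avg_loglik, k)."""
    return min(fits, key=lambda L: criterion_value(fits[L][0], fits[L][1], n, criterion))


# Hypothetical (made-up) fits of SNP0, SNP1 and SNP2 on one sample of size n = 500.
fits = {0: (-5.000, 12), 1: (-4.996, 13), 2: (-4.993, 14)}
for crit in ("AIC", "HQ", "BIC"):
    print(crit, round(penalty(crit, 500), 3), select_degree(fits, 500, crit))
```

With these made-up numbers AIC picks L = 2, HQ picks L = 1 and BIC picks L = 0, because for n = 500 the penalty constants are ordered 1 < log log 500 ≈ 1.83 < 0.5 log 500 ≈ 3.11.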

As an alternative to AIC in models focused on inference about the latent variables, Vaida and Blanchard (2005) proposed a conditional Akaike information criterion (cAIC) for mixed effects models. For the factor analysis model on large samples, the cAIC would be approximated by

$$\mathrm{cAIC} = -2\log g(x \mid \hat\mu, \hat\Gamma, \hat z) + 2(\rho + p),$$

where µ̂ and Γ̂ are the parameters estimated on the data, ẑ = E(z | µ̂, φ̂, L, x) is the empirical Bayes prediction, p counts the unknown parameters in ψ, and ρ counts the degrees of freedom as introduced by Hodges and Sargent (2001). The number of these degrees of freedom “is often much smaller than the number of parameters” in models with latent variables. The intuitive idea of cAIC is appealing: we favor models where µ̂, Γ̂ and ẑ induce a greater density of the observed x, with an appropriate penalty for the complexity of those models. However, cAIC has not yet been developed for generalized mixed effects models. Lu et al. (2007) suggested a method for counting degrees of freedom in generalized linear hierarchical models based on the Laplace approximation of the likelihood function. Unfortunately, because of the potential presence of multiple maxima, the method of Lu et al. (2007) is not applicable to the likelihood induced by (2.14).

As another alternative to model selection criteria, we explored likelihood ratio tests of the hypotheses L = 0 (i.e. φ₁ = π/2) and L = 1 (i.e. φ₂ = π/2). However, in all simulated cases the distribution of the likelihood ratio statistic is far from the assumed chi-squared distribution. This is due to the irregularities discussed by Drton (2009), together with the fact that φ₁ = π/2 is a boundary point. While this topic is beyond the scope of this paper, we expect that a resampling technique for obtaining the distribution of the likelihood ratio statistic could give good results.
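The resampling idea can be illustrated on a deliberately simple toy problem with a boundary parameter, not on the SNP-GLLVM itself: observations x_i ~ N(θ, 1) with θ ≥ 0 and H0: θ = 0. The minimal sketch below obtains the null distribution of the likelihood ratio statistic by parametric bootstrap. Because θ = 0 lies on the boundary, the statistic follows the known 50:50 mixture of χ²₀ and χ²₁ (95% quantile about 2.71) rather than χ²₁ (3.84). Sample size, number of bootstrap replicates, seed and helper names are assumptions.

```python
import random
import statistics


def lrt_statistic(x):
    """LR statistic for H0: theta = 0 against theta >= 0 when x_i ~ N(theta, 1)."""
    theta_hat = max(0.0, statistics.fmean(x))  # MLE on the constrained space theta >= 0
    # 2 * [loglik(theta_hat) - loglik(0)] simplifies to n * theta_hat**2 for unit variance.
    return len(x) * theta_hat ** 2


def bootstrap_null_distribution(n, n_boot=4000, seed=1):
    """Parametric bootstrap: simulate data under the null model and recompute the statistic."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(lrt_statistic(sample))
    return sorted(draws)


def upper_quantile(sorted_draws, level):
    index = min(len(sorted_draws) - 1, int(level * len(sorted_draws)))
    return sorted_draws[index]


null_draws = bootstrap_null_distribution(n=50)
critical = upper_quantile(null_draws, 0.95)
share_zero = sum(d == 0.0 for d in null_draws) / len(null_draws)
print(f"bootstrap 5% critical value: {critical:.2f} (chi-squared(1) would give 3.84)")
print(f"share of exact zeros: {share_zero:.2f}")
```

In a real application the null model would itself be estimated from the data (here the SNP0 fit), and each bootstrap sample would be refitted under both hypotheses.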

Given that the above alternatives are not applicable to our setting, we restrict ourselves to the use of AIC, BIC and HQ for choosing L. When the exact log-likelihood function ℓ(µ, Γ, Ψ, φ | L, x₁, …, x_n) cannot be computed, as in the case of the Bernoulli distribution, we approximate it by ℓ̃(µ, Γ, Ψ, φ | L, x₁, …, x_n) as described in Section 2.3.2.


2.4 Monte Carlo simulations

We explore the performance of the proposed method in finite samples by simulating 600 samples of size 500 from the GLLVM with four manifest variables and one latent variable: three conditionally normal manifest variables and one conditionally Bernoulli manifest variable.

The univariate scores of the latent variable are drawn from three different distributions: 1) the symmetric unimodal normal N(0,1); 2) the asymmetric trimodal SNP distribution $h(z) = \left(-\cos 0.7/\sqrt{2} + z\sin 0.7 + z^2\cos 0.7/\sqrt{2}\right)^2 \varphi(z)$; and 3) the asymmetric bimodal mixture of normals 0.9N(2,1) + 0.1N(−2,0.25). To simulate from the SNP distribution we use the algorithm proposed by Gallant and Tauchen (1992).
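The sketch below does not reproduce the algorithm of Gallant and Tauchen (1992). As a simpler, swapped-in technique it draws from the SNP2 density h(z) of the simulation design by numerically inverting its cumulative distribution function on a fine grid, with linear interpolation between grid points. The grid limits, grid size, seed and helper names are assumptions.

```python
import bisect
import math
import random

A = -math.cos(0.7) / math.sqrt(2.0)
B = math.sin(0.7)
C = math.cos(0.7) / math.sqrt(2.0)


def snp2_density(z):
    """h(z) = (A + B z + C z^2)^2 phi(z), the SNP2 density of the simulation design."""
    p = A + B * z + C * z * z
    return p * p * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def tabulate_cdf(a=-12.0, b=12.0, m=24000):
    """Cumulative trapezoidal integral of h on a grid, renormalized to end at 1."""
    h = (b - a) / m
    grid = [a + i * h for i in range(m + 1)]
    dens = [snp2_density(z) for z in grid]
    cdf = [0.0]
    for i in range(m):
        cdf.append(cdf[-1] + 0.5 * h * (dens[i] + dens[i + 1]))
    return grid, [c / cdf[-1] for c in cdf]


def sample_snp2(n, seed=0):
    """Inverse-CDF sampling with linear interpolation between grid points."""
    grid, cdf = tabulate_cdf()
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()
        j = min(max(bisect.bisect_left(cdf, u), 1), len(grid) - 1)
        lo, hi = cdf[j - 1], cdf[j]
        t = (u - lo) / (hi - lo) if hi > lo else 0.0
        draws.append(grid[j - 1] + t * (grid[j] - grid[j - 1]))
    return draws


z = sample_snp2(20000)
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The sample mean and variance should be close to √2 sin 1.4 ≈ 1.39 and 2.228, the moments of h.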

For each simulated data set we estimate the coefficients of the SNP-GLLVM by the methodology of Sections 2.2 and 2.3 for L = 2 (SNP2), L = 1 (SNP1) and L = 0 (SNP0). The latter corresponds to traditional maximum likelihood estimation under the normality assumption for the latent variable. Theoretically, as proved by Gallant and Nychka (1987), the parameters µ, Γ and Ψ are estimated consistently if L is sufficiently large and, together with φ, generates a density (2.6) close enough to the true one. In practice, as illustrated later in this section, an unduly large value of L can result in overfitting and in bias due to the integral approximation, while L = 1 or 2 is usually sufficient to detect the departure from normality of the latent variable densities considered here.

To make the simulation results comparable, we constrain the latent variable to have a variance equal to the true one, that is: 1) var(Z) = 1 for the normal density; 2) var(Z) = 2.228 for the SNP density; and 3) var(Z) = 4.135 for the mixture of normals density. These choices give a slight advantage to the SNP0 estimator. We use grids of 6 initial values for SNP1 optimization and of 13 initial values for SNP2 optimization (constructed as discussed in Section 2.3.2).
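The value var(Z) = 2.228 for the SNP density can be checked directly. Writing P(z) = a + bz + cz² with a = −cos 0.7/√2, b = sin 0.7 and c = cos 0.7/√2, the normal moments E(Z²) = 1, E(Z⁴) = 3 and E(Z⁶) = 15 give ∫h = a² + b² + 3c² + 2ac = 1, E(Z) = 2b(a + 3c) = √2 sin 1.4 and E(Z²) = 3 + 2cos² 0.7, hence var(Z) = 3 + 2cos² 0.7 − 2sin² 1.4 ≈ 2.228. The short sketch below confirms these closed forms by numerical integration; the integration range and grid size are assumptions.

```python
import math

A = -math.cos(0.7) / math.sqrt(2.0)
B = math.sin(0.7)
C = math.cos(0.7) / math.sqrt(2.0)


def snp2_density(z):
    """h(z) = (A + B z + C z^2)^2 phi(z)."""
    p = A + B * z + C * z * z
    return p * p * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def moment(k, a=-15.0, b=15.0, m=30000):
    """k-th raw moment of h by the composite trapezoidal rule on [a, b]."""
    h = (b - a) / m
    total = 0.0
    for i in range(m + 1):
        z = a + i * h
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * z ** k * snp2_density(z)
    return total * h


mass, mean = moment(0), moment(1)
var = moment(2) - mean ** 2
# Closed forms derived from the normal moments E(Z^2) = 1, E(Z^4) = 3, E(Z^6) = 15.
mean_exact = math.sqrt(2.0) * math.sin(1.4)
var_exact = 3.0 + 2.0 * math.cos(0.7) ** 2 - mean_exact ** 2
print(f"mass {mass:.6f}, mean {mean:.4f}, variance {var:.4f} (closed form {var_exact:.4f})")
```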

We compute the AIC, BIC and HQ information criteria discussed in Section 2.3.3 for the SNP2, SNP1 and SNP0 estimations on each data set. In Tables 2.1 and 2.2 we report detailed simulation results for normal and SNP2 generated latent variables.

For the normal latent variable the estimates are all nearly unbiased, despite three estimated SNP2 densities being trimodal (none of these fits was selected by any information criterion; all other estimated SNP1 and SNP2 densities are unimodal and nearly symmetric). When the true latent variable distribution is SNP2, all estimates of the parameters corresponding to the conditionally normal manifest variables (i.e. µ₁, µ₂, µ₃, γ₁, γ₂ and γ₃) are nearly unbiased. The insensitivity of the conditionally normal observable variables to a misspecified latent variable distribution has already been observed in the literature, theoretically by Anderson and Amemiya (1988) and in simulations by Ma and Genton (2010). However, the SNP0 and SNP1 estimates of the loading of the Bernoulli manifest variable (γ₄) present biases, though not large ones. The bias is clear for the SNP0 estimates, when the assumed latent variable density is far from the true one, and diminishes when the estimated density gets closer to the true one. Surprisingly, the bias of the SNP2 estimates of µ₄ is greater than the SNP1 bias and almost equal to the SNP0 bias for the same parameter. A closer inspection of the estimates shows that the medians of the µ₄ estimates for both SNP1 and SNP2 are exactly at the true value, and that the biases of the mean are both due to a few extreme values (fewer for SNP1) among the µ₄ estimates. The median of the µ₄ estimates by SNP0 is equal to the mean despite the presence of a few extreme values. Similarly, a few extreme values are found when inspecting the SNP1 and SNP2 estimates of γ₄ (the median of the γ₄ estimates by SNP1 is equal to the mean, while the median of the γ₄ estimates by SNP2 is equal to the true value).

Normal latent variable
              SNP0                 SNP1                 SNP2
              AVE MC  SD    SE     AVE MC  SD    SE     AVE MC  SD    SE
µ₁ (0)        0.06    0.08  0.08   0.00    0.08  0.08   0.00    0.08  0.08
µ₂ (0)        0.07    0.09  0.08   0.00    0.09  0.08   0.00    0.09  0.08
µ₃ (0)        0.06    0.08  0.08   0.00    0.08  0.08   0.00    0.08  0.08
µ₄ (0.7)      0.70    0.16  0.16   0.70    0.16  0.17   0.70    0.16  0.16
γ₁ (1.4)      1.40    0.07  0.07   1.40    0.07  0.07   1.40    0.07  0.07
γ₂ (1.6)      1.60    0.07  0.07   1.60    0.07  0.07   1.60    0.07  0.07
γ₃ (1.4)      1.40    0.07  0.07   1.40    0.07  0.07   1.40    0.07  0.07
γ₄ (2)        2.02    0.23  0.24   2.02    0.23  0.25   2.01    0.23  0.24
ψ₁ (1)        1.00    0.09  0.09   1.00    0.09  0.09   1.00    0.09  0.09
ψ₂ (1)        1.00    0.11  0.10   1.00    0.11  0.10   1.00    0.11  0.10
ψ₃ (1)        1.00    0.09  0.09   1.00    0.09  0.09   1.00    0.09  0.09

AIC preferred SNP0 79.2% of the time, SNP1 9.8% and SNP2 1%.
BIC preferred SNP0 99.7% of the time, SNP1 0.15% and SNP2 0.15%.
HQ preferred SNP0 95.5% of the time, SNP1 2.5% and SNP2 2%.

Table 2.1: Simulation results on 600 samples of size 500. The true latent variable distribution is the standard normal N(0,1). Mean parameters are denoted by (µ₁, µ₂, µ₃, µ₄)^T, loadings by (γ₁, γ₂, γ₃, γ₄)^T and uniquenesses by (ψ₁, ψ₂, ψ₃)^T, with true values in parentheses. AVE MC, average of estimates; SD, standard deviation of estimates; SE, average of standard errors estimated by the sandwich covariance matrix.

SNP2 latent variable
              SNP0                 SNP1                 SNP2
              AVE MC  SD    SE     AVE MC  SD    SE     AVE MC  SD    SE
µ₁ (1.95)     1.96    0.10  0.10   1.96    0.10  0.10   1.96    0.10  0.10
µ₂ (2.22)     2.23    0.12  0.12   2.23    0.12  0.12   2.23    0.12  0.12
µ₃ (1.95)     1.95    0.10  0.10   1.95    0.10  0.10   1.95    0.10  0.10
µ₄ (3.49)     3.62    0.43  0.46   3.56    0.45  0.44   3.59    0.48  0.48
γ₁ (1.4)      1.40    0.08  0.08   1.39    0.06  0.09   1.40    0.06  0.09
γ₂ (1.6)      1.60    0.08  0.09   1.59    0.07  0.10   1.60    0.07  0.10
γ₃ (1.4)      1.39    0.08  0.08   1.39    0.06  0.09   1.39    0.06  0.09
γ₄ (2)        2.15    0.36  0.42   1.90    0.25  0.35   2.04    0.30  0.41
ψ₁ (1)        0.98    0.09  0.09   0.98    0.08  0.08   0.98    0.08  0.08
ψ₂ (1)        1.00    0.11  0.11   0.99    0.10  0.09   0.99    0.10  0.09
ψ₃ (1)        1.00    0.09  0.09   1.00    0.09  0.08   1.00    0.09  0.08

AIC, BIC and HQ preferred SNP2 100% of the time.

Table 2.2: Simulation results on 600 samples of size 500. The true latent variable distribution is the asymmetric trimodal SNP2 distribution $h(z) = \left(-\cos 0.7/\sqrt{2} + z\sin 0.7 + z^2\cos 0.7/\sqrt{2}\right)^2 \varphi(z)$. Mean parameters are denoted by (µ₁, µ₂, µ₃, µ₄)^T, loadings by (γ₁, γ₂, γ₃, γ₄)^T and uniquenesses by (ψ₁, ψ₂, ψ₃)^T, with true values in parentheses. AVE MC, average of estimates; SD, standard deviation of estimates; SE, average of standard errors estimated by the sandwich covariance matrix.

This phenomenon illustrates the bias induced by approximate integration, as discussed by Pinheiro and Chao (2006). It seems plausible that this bias is larger for integrands that are harder to approximate, for instance integrands with several local maxima. The integral approximation bias is an additional reason to use high values of L only sparingly in our implementation.
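The summaries reported in the tables, together with the median used above to detect a few extreme estimates, can be computed from the Monte Carlo replicates as in the minimal sketch below. The replicate values are made up for illustration, and the n − 1 divisor for SD is an assumption, since the divisor used in the simulations is not stated.

```python
import statistics


def mc_summary(estimates, standard_errors, true_value):
    """Monte Carlo summaries in the spirit of Tables 2.1 to 2.3, plus the median."""
    ave = statistics.fmean(estimates)
    med = statistics.median(estimates)
    return {
        "AVE MC": ave,
        "SD": statistics.stdev(estimates),  # n - 1 divisor (an assumption)
        "SE": statistics.fmean(standard_errors),
        "median": med,
        "bias of mean": ave - true_value,
        "bias of median": med - true_value,
    }


# Made-up replicates: 18 estimates at the true value 1.0 and two extreme ones.
estimates = [1.0] * 18 + [3.0, 4.0]
standard_errors = [0.1] * 20
summary = mc_summary(estimates, standard_errors, true_value=1.0)
for name, value in summary.items():
    print(f"{name:>15}: {value:.3f}")
```

Two extreme replicates out of twenty are enough to shift the average by 0.25 while leaving the median exactly at the true value, which is the pattern described for the µ₄ and γ₄ estimates.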

Table 2.3 summarizes the simulation results for the mixture of normals latent variable. As before, the SNP0 estimates of the parameters corresponding to the continuous manifest variables are all nearly unbiased, while the SNP0 estimates related to the Bernoulli manifest variable present biases. We conclude that for a GLLVM with discrete manifest variables, misspecification of the latent variable distribution induces a bias in the estimation. Similar conclusions can be found in Ma and Genton (2010). The differences in the µ₄ and γ₄ estimates confirm the integral approximation bias. The table with the estimation results selected by AIC, BIC and HQ for the mixture of normals latent variable is available as Table S.1 in the supplementary material.

For normal and SNP2 latent variables a similar table is not of interest, because AIC, BIC and HQ almost always choose the right model.

From the results of Tables 2.1 and 2.3 we infer that AIC prefers models with greater L, BIC prefers models with smaller L, and HQ choices are intermediate. Our conclusions agree with those of Zhang and Davidian (2001) and Chen et al. (2002).

The advantage of the proposed method can be appreciated by looking at the shape of the estimated densities in Figure 2.1. The figure illustrates that the SNP1 and SNP2 specifications selected by HQ capture the main features of the true density of the latent variable.

Mixture of normals latent variable
              SNP0                 SNP1                 SNP2
              AVE MC  SD    SE     AVE MC  SD    SE     AVE MC  SD    SE
µ₁ (2.24)     2.25    0.10  0.11   2.25    0.10  0.11   2.25    0.10  0.11
µ₂ (2.57)     2.57    0.12  0.12   2.57    0.12  0.12   2.57    0.12  0.12
µ₃ (2.24)     2.25    0.11  0.11   2.25    0.11  0.11   2.25    0.11  0.11
µ₄ (3.90)     4.36    0.56  0.58   3.91    0.43  0.44   3.95    0.46  0.65
γ₁ (1.4)      1.39    0.08  0.09   1.39    0.06  0.10   1.39    0.06  0.10
γ₂ (1.6)      1.59    0.09  0.10   1.59    0.07  0.11   1.59    0.07  0.11
γ₃ (1.4)      1.40    0.08  0.09   1.39    0.06  0.10   1.39    0.06  0.10
γ₄ (2)        2.22    0.40  0.53   2.00    0.29  0.45   2.05    0.29  0.63
ψ₁ (1)        0.99    0.09  0.09   1.00    0.08  0.09   1.00    0.09  0.09
ψ₂ (1)        0.99    0.11  0.11   1.01    0.11  0.10   1.01    0.11  0.10
ψ₃ (1)        0.99    0.09  0.09   1.00    0.09  0.09   1.00    0.09  0.09

AIC preferred SNP0 0% of the time, SNP1 11.4% and SNP2 88.6%.
BIC preferred SNP0 0% of the time, SNP1 69.7% and SNP2 30.3%.
HQ preferred SNP0 0% of the time, SNP1 33.4% and SNP2 66.6%.

Table 2.3: Simulation results on 600 samples of size 500. The true latent variable distribution is the asymmetric mixture of normals 0.9N(2,1) + 0.1N(−2,0.25). Mean parameters are denoted by (µ₁, µ₂, µ₃, µ₄)^T, loadings by (γ₁, γ₂, γ₃, γ₄)^T and uniquenesses by (ψ₁, ψ₂, ψ₃)^T, with true values in parentheses. AVE MC, average of estimates; SD, standard deviation of estimates; SE, average of standard errors estimated by the sandwich covariance matrix.
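The SE columns rely on the sandwich covariance matrix A⁻¹BA⁻¹, where A is minus the Hessian of the log-likelihood at the estimate and B is the sum of the outer products of the per-observation scores. The one-parameter sketch below shows the computation for the rate λ of an exponential sample, chosen only because its score and Hessian have closed forms; it is not the SNP-GLLVM likelihood, and the four data values are made up.

```python
import math


def exponential_rate_ses(x):
    """MLE of an exponential rate with model-based and sandwich standard errors."""
    n = len(x)
    lam = n / sum(x)
    scores = [1.0 / lam - xi for xi in x]  # derivative of log(lam) - lam * x_i in lam
    a = n / lam ** 2                       # minus the summed second derivatives
    b = sum(s * s for s in scores)         # sum of squared (outer-product) scores
    se_model = math.sqrt(1.0 / a)          # inverse information
    se_sandwich = math.sqrt(b / a ** 2)    # A^{-1} B A^{-1} in one dimension
    return lam, se_model, se_sandwich


lam, se_model, se_sandwich = exponential_rate_ses([0.5, 1.0, 1.5, 2.0])
print(f"rate {lam:.3f}, model SE {se_model:.3f}, sandwich SE {se_sandwich:.3f}")
```

For these toy values, which are far less dispersed than an exponential sample would be, the sandwich SE (about 0.18) is well below the model-based one (0.40); the sandwich form stays valid when the assumed model is wrong, which is why it suits misspecified latent distributions.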

[Figure 2.1, panels (a) and (b): densities h(z) plotted against z, for z from −4 to 4.]

Figure 2.1: The dashed line is the average of the estimated densities for the fits preferred by HQ, the shaded area is the pointwise estimated confidence envelope, and the dotted line is the true density for (a) the mixture of normals 0.9N(2,1) + 0.1N(−2,0.25) and (b) the SNP2 density $h(z) = \left(-\cos 0.7/\sqrt{2} + z\sin 0.7 + z^2\cos 0.7/\sqrt{2}\right)^2 \varphi(z)$ (the dashed and dotted lines coincide for SNP2).
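The caption does not state how the pointwise envelope is built. A minimal sketch, assuming the envelope is given by the empirical 2.5% and 97.5% pointwise quantiles of the replicate density estimates evaluated on a common grid of z values, computes the average curve and the band as follows; the quantile convention and the toy input are assumptions.

```python
import statistics


def empirical_quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    i = int(pos)
    if i + 1 >= len(sorted_values):
        return sorted_values[-1]
    frac = pos - i
    return sorted_values[i] + frac * (sorted_values[i + 1] - sorted_values[i])


def pointwise_envelope(curves, lower=0.025, upper=0.975):
    """Pointwise average and quantile band of replicate densities on a common z grid."""
    average, low, high = [], [], []
    for values in zip(*curves):  # values of all replicates at one grid point
        ordered = sorted(values)
        average.append(statistics.fmean(ordered))
        low.append(empirical_quantile(ordered, lower))
        high.append(empirical_quantile(ordered, upper))
    return average, low, high


# Toy input: 101 replicate "curves" evaluated at two grid points.
curves = [[float(k), 2.0 * k] for k in range(101)]
average, low, high = pointwise_envelope(curves)
print(average, low, high)
```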

2.5 Data analysis

2.5.1 Swineford-Holzinger data analysis

This data set, introduced by Holzinger and Swineford (1939), is well known in the statistical literature on factor analysis and contains the scores of 145 individuals on nine psychological tests. These nine tests are: “visual perception”, “cubes”, “paper form board”, “general information”, “sentence completion”, “word classification”, “figure recognition”, “object-number” and “number-figure”. For this data set, Jöreskog (1971) and Sörbom (1974) concluded the presence of four groups of individuals
