
Goodness-of-fit for generalized linear latent variables models


CONNE, David

Abstract

Generalized Linear Latent Variables Models (GLLVM) enable the modelling of relationships between manifest and latent variables. These models are widely used in the social sciences.

In a latent variable framework, one works with several unobservable quantities (latent scores, parameters) and it is therefore essential to choose a model as close as possible to the data.

To test the appropriateness of a particular model, one needs to define a Goodness-of-fit test statistic (GFI). Available GFI can be separated into two groups: first, GFI based on a comparison between the sample covariance and the model covariance of the manifest variables, which implies reducing the information that is contained in the data to their covariance structure, and secondly Pearson-type statistics when manifest variables are binary. In this work, we propose an alternative Goodness-of-fit statistic based on some distance comparison between the latent scores and the original data. This GFI takes into account the nature of each manifest variable and can in principle be applied in various situations and in particular with models with both discrete and [...]

CONNE, David. Goodness-of-fit for generalized linear latent variables models . Thèse de doctorat : Univ. Genève, 2008, no. SES 681

URN : urn:nbn:ch:unige-67331

DOI : 10.13097/archive-ouverte/unige:6733

Available at:

http://archive-ouverte.unige.ch/unige:6733

Goodness-of-fit for Generalized Linear Latent Variables Models

David Conne

Submitted for the degree of Ph.D. in Econometrics and Statistics

Department of Econometrics

University of Geneva, Switzerland

Accepted on the recommendation of:

Dr. Eva Cantoni, University of Geneva

Prof. Sophia Rabe-Hesketh, University of California, Berkeley

Prof. Elvezio Ronchetti, co-advisor, University of Geneva

Prof. Maria-Pia Victoria-Feser, co-advisor, University of Geneva

Thesis no. 681

[...] authorized the printing of this thesis, without thereby expressing any opinion on the propositions stated in it, which are the sole responsibility of their author.

Geneva, 9 October 2008

The Dean

Bernard MORARD

Generalized Linear Latent Variables Models (GLLVM) enable the modelling of relationships between manifest and latent variables. These models are widely used in the social sciences. In a latent variable framework, one works with several unobservable quantities (latent scores, parameters) and it is therefore essential to choose a model as close as possible to the data. To test the appropriateness of a particular model, one needs to define a Goodness-of-fit test statistic (GFI). Available GFI can be separated into two groups: first, GFI based on a comparison between the sample covariance and the model covariance of the manifest variables, which implies reducing the information that is contained in the data to their covariance structure, and secondly Pearson-type statistics when manifest variables are binary. In this work, we propose an alternative Goodness-of-fit statistic based on some distance comparison between the latent scores and the original data. This GFI takes into account the nature of each manifest variable and can in principle be applied in various situations and in particular with models with both discrete and continuous manifest variables. We propose two procedures to compute the p-values of our GFI. The first one is based on the asymptotic distribution of a U-statistic and appears to be quite difficult to implement numerically. The second one is based on resampling techniques and requires a consistent estimator of the loadings, the scores, and a corresponding asymptotic covariance [...] good performance in terms of empirical level and empirical power, especially compared to the one proposed by Satorra and Bentler (2001).

Finally, a real dataset is analyzed to highlight the application of the methodology. In most health surveys the state of health of individuals is measured through several qualitative, discrete quantitative or dichotomic variables. From these variables, one aims at building univariate indicators of health that summarize the information. To do so, we propose to use a GLLVM, in which the latent variables are the health indicators. We consider the data from the 1997 Swiss Health Survey and we define a new model with two health indicators. The first one describes the health status induced merely by the age of the subject, while the second one captures another dimension of the health status. This latter model is not rejected by our GFI and gives another insight into the understanding of the health status of the population.

Generalized linear latent variable models define a link between manifest variables and latent variables. This type of model is widely used in the human sciences. In a latent variable model, the number of unknown quantities (parameters, scores) is very large, so it is essential to choose a model as close as possible to the original data. To test whether a particular model is appropriate, a goodness-of-fit test (GFI) must be defined. A large number of goodness-of-fit tests for latent variable models are available in the literature. They can be separated into two groups: first, GFI based on the comparison between the variance-covariance matrix under the latent variable model and under the saturated model, which amounts to reducing the information contained in the data to a covariance matrix, and second, Pearson-type statistics when all manifest variables are binary. In this work, we propose a new goodness-of-fit test based on the comparison of the distances between observations in the raw sample and those given by the model.

This test can in principle be applied with manifest variables of different types, in particular with discrete and continuous manifest variables. We propose two techniques to evaluate the p-values of this GFI. The first is based on the asymptotic distribution of a U-statistic and [...] consistent estimator of the loadings, of the scores, and a corresponding asymptotic covariance matrix.

A simulation study of the behaviour of this statistic reveals that it performs well in terms of empirical level and power, in particular compared with the one proposed by Satorra and Bentler (2001). Finally, an application to a real dataset is presented to highlight the potential use of this procedure. In health surveys, the health of individuals is measured through different types of variables, such as ordinal or dichotomous variables. From these variables, one seeks to build one or more health indices. We propose here to use generalized linear latent variable models, which make it possible to estimate one or more continuous latent variables from a group of observable variables. We consider the data from the 1997 Swiss Health Survey. We propose a new model with two health indices: the first describes the state of health linked to the age of the subject, and the second captures a dimension of health independent of age. This model is not rejected by our goodness-of-fit test and makes it possible to assess health from a new angle.

Contents

1 Introduction

2 Generalized Linear Latent Variable Models (GLLVM)

3 Goodness-of-fit for Generalized Linear Latent Variable Models (GLLVM)

3.1 Test Statistic

3.2 Derivation of the p-value of the Test

3.3 Computing the p-value

4 Other GFI for GLLVM

4.1 Likelihood Ratio Test

4.2 Satorra and Bentler (S&B) GFI

5 Simulation study

5.1 Discussion of the Results

Appendix A: Simulation Parameters

6 Asymptotic distribution

6.1 Asymptotics of $\omega_1$

6.2 Asymptotic distribution of $S$

6.2.1 Calculation of $E[h(X_{i_1}, X_{i_2})]$

6.2.2 Estimation of $E[h(X_{i_1}, X_{i_2})]$

6.2.3 Estimation of $\zeta_1$

6.3 Discussion of the Results

Appendix B: U-statistic of order 2

Appendix C: Expectation estimators

7 Application: A Latent Variable Approach for the Construction of Continuous Health Indicators

7.1 Introduction

7.2 Generalized Linear Latent Variable Models

7.3 Health Indicators for the 1997 Swiss Health Survey

7.4 Conclusion

Appendix D: Model and Estimated Parameters

References

1 Introduction

In many scientific fields, theoretical concepts that cannot be measured directly are defined by means of observable indicators. Some examples include intelligence in psychology or welfare in economics. In these situations, one selects observable variables supposed to be linked to the unobservable variables. The objective is to link the unobservable variables, called latent variables, to the observed ones, called manifest variables. Jöreskog (1969) proposed a model based on a linear link between the latent variables, assumed to be normal, and the manifest variables, assumed to be conditionally normal given the latent ones. This condition makes it possible to compute exact maximum likelihood estimators using the covariance matrix of the manifest variables. This model is implemented in well-known and widely used software such as LISREL (Jöreskog, 1990) or Mplus (Muthen and Muthen, 2001). However, in economics and social sciences, researchers very often work with surveys which contain responses measured on ordinal or binary scales. When the manifest variables are not normally distributed, the methods implemented in these software packages suppose that the manifest variables are indirect observations [...]

[...] variable approach, does not take directly into account the actual distribution of the variables, except when all manifest variables are binary with the probit link.

Bartholomew (1984) and Moustaki and Knott (2000) proposed to drop the assumption of multivariate normal distribution with the specification of the Generalized Linear Latent Variables Model (GLLVM). This model makes it possible to consider all distributions belonging to the exponential family, which includes both continuous and discrete distributions, such as normal, binomial, Poisson, etc. With GLLVM, the likelihood function depends on multivariate integrals which cannot be calculated analytically. To overcome this problem, Moustaki and Knott (2000) define estimators based on a Gauss-Hermite approximation of these integrals. Unfortunately, this method is often infeasible when the number of latent variables is larger than 2. Huber, Ronchetti, and Victoria-Feser (2004) propose instead to use a Laplace approximation and define new estimators which can be viewed as M-estimators (Huber, 1981). This allows consistent estimation and inference even in the presence of a large number of latent and manifest variables. Furthermore, the Laplace approximation makes it possible to define the h-likelihood scores on each latent variable, the asymptotic properties of which can be found in Lee and Nelder (1996). These estimated scores can also be viewed as penalized quasi-likelihood estimators (Green, 1987). Moreover, in the Bayesian approach, the estimated scores are the mode of the posterior distribution of the scores with estimated parameters. These estimators are called the Empirical Bayes modal (EBM) by Skrondal and Rabe-Hesketh (2004), or modal a posteriori (MAP) by Bock (1985). For a general overview on latent variable modeling, see Skrondal and [...]

In a latent variable framework, one works with several unobservable quantities (latent scores, parameters) and it is therefore essential to choose a model as close as possible to the data. More specifically, the number of latent variables is clearly unknown, as well as the fact that a particular manifest variable is linked or not to a particular latent variable. To compare models with a different number of latent variables, one could use the Akaike (1973) criterion, which is a powerful measure of relative fit; see Conne (2003) for some numerical comparison in this context. However, this criterion suffers from important drawbacks when applied directly to latent variables models, including the definition of the likelihood and the role of the scores as parameters. For a discussion and a proposition of a corrected Akaike criterion for mixed-effects models, see e.g. Vaida and Blanchard (2005). Even if one could extend this criterion to latent variables models, it would not describe absolute fit and would not test the appropriateness of a particular model. To do so, a Goodness-of-fit index (GFI) is necessary.

Several GFI are available for latent variable models in the literature. In the case of multivariate normal manifest variables, a natural choice is the likelihood ratio test (LRT) on the covariance matrix, see e.g. Bartholomew and Knott (1999). More precisely, the LRT is defined as the difference between the log-likelihood evaluated with the covariance matrix constrained by the latent model and that evaluated at the unconstrained covariance matrix (saturated model). When manifest variables are not multivariate normal, one can use a function of the covariance matrix of the underlying normally distributed manifest variables to define a Goodness-of-fit. Indeed, the unconstrained covariance matrix can be estimated with polychoric correlations [...]

[...] method suffers from two drawbacks: firstly, using polychoric or polyserial correlations implies reducing the information that is contained in the data to their covariance structure. Since the covariance matrix is not a sufficient statistic when the variables are ordinal or binary, this implies a loss of information. Moreover, the estimation of polychoric or polyserial correlations implies the estimation of a number of parameters that increases rapidly with the number of manifest variables. Secondly, this method is strongly dependent on the normality assumption of the underlying manifest variables. Therefore, a GFI based on a function of the covariance matrix of the underlying normally distributed manifest variables in a non-normal setting tests simultaneously the normality assumption of the underlying variable and the fit of the GLLVM structure, except when all manifest variables are binary, see e.g. Muthen (1993).

In the binary case, the data can be represented by a contingency table and a Pearson-type statistic can be derived. However, for a moderate sample size and a large number of variables, one is faced with the curse of dimensionality and this method becomes unfeasible. In the ordinal case, the number of cells can be large and the problem is even worse. To overcome this problem, Glas (1988), Reiser (1996) or Bartholomew and Leung (2002) propose to use only information on lower order margins (usually first and second order). This method can only be used if all variables are binary or ordinal, and it is unclear how to extend it to mixtures of different types of manifest variables.

Other types of goodness-of-fit statistics available in software like LISREL and MPLUS are the RNI, RMSEA or SRMR. All of them are functions of [...]

[...] Moreover, if the manifest variables are not multivariate normal, their null asymptotic distributions are not available. They are used with empirical cut-off criteria and their p-values are not uniformly distributed under the null hypothesis, see Marsh, Hau, and Wen (2004).

We propose an alternative goodness-of-fit statistic which is not based on the comparison between the correlation matrices computed respectively directly from the data and through the estimated GLLVM, but on some distance comparison between the latent scores and the original data. The concept of distance is widely used in the context of cluster analysis to find subjects that are similar, see e.g. Kaufman and Rousseeuw (1990). Our GFI is based on the idea that two subjects that are very dissimilar in the data space should also be very dissimilar in the latent scores space, if the chosen model (under the null hypothesis) is correct.

Our GFI can in principle be applied in various situations and in particular with models with both discrete and continuous manifest variables. For simplicity we develop here our GFI in the case of independent latent variables. However, the procedure can be extended to the case of correlated latent variables by using the estimator of the parameters and the scores which was proposed by Huber, Ronchetti, and Victoria-Feser (2004), p. 900. The latter is an extension of the independent case presented in Chapter 2 to the correlated case. The asymptotic distribution of our GFI under the null hypothesis is derived but it appears to be quite difficult to implement numerically. We propose instead to compute p-values using different resampling techniques that we optimize for computational speed. Our simulations show that the p-values obtained by this procedure have a uniform distribution under the [...]

[...] alternatives that are near the null model. Finally, we compare our procedure in terms of empirical levels and powers to a GFI implemented in the software MPLUS and show that our approach improves considerably goodness-of-fit testing within GLLVM.

The thesis is organized as follows: in Chapter 2, we briefly review the GLLVM and the estimation technique based on the Laplace approximation developed by Huber, Ronchetti, and Victoria-Feser (2004). In Chapter 3, we propose a new GFI based on distance comparison and a procedure to compute associated p-values. Different available GFI are reviewed and compared in Chapter 4. A simulation study comparing our GFI and the one proposed by Satorra and Bentler (2001) is presented in Chapter 5. This shows that our GFI has better performance in terms of empirical level and empirical power. In Chapter 6, we present a procedure to approximate the asymptotic distribution of this Goodness-of-fit test statistic. This method is based on the asymptotic distribution of a U-statistic which requires the calculation and the estimation of the first and second moments of the test statistic. Finally, a small simulation study is presented in Section 6.3. From a practical point of view, the asymptotic approximation is not accurate enough to be used for typical sample sizes encountered in real data sets and so this approximation should not be used in practice. In Chapter 7, as an application, we consider the data from the 1997 Swiss Health Survey and build two health indicators. A latent model is defined and tested with our GFI. This latter model gives another insight into the understanding of the health status of the Swiss population.

2 Generalized Linear Latent Variable Models (GLLVM)

The relationship between $p$ manifest variables $x^{(j)}$, $j = 1, \ldots, p$, and $q$ latent variables $z^{(k)}$, $k = 1, \ldots, q$, $q < p$, is formalized in the same manner as in generalized linear models (McCullagh and Nelder, 1989) by means of conditional distributions $g_j(x^{(j)} \mid z)$ belonging to the exponential family, i.e.

$$g_j(x^{(j)} \mid z) = \exp\left\{ \frac{x^{(j)} \alpha^{(j)T} z - b(\alpha^{(j)T} z)}{\phi_j} + c(x^{(j)}, \phi_j) \right\},$$

where $\alpha^{(j)} \in \Re^{q+1}$ are the loadings, $\phi_j$ is a scale parameter and the functions $b(\alpha^{(j)T} z)$ and $c(x^{(j)}, \phi_j)$ depend on the specific distribution $g_j(x^{(j)} \mid z)$. We define a linear link between a function of the conditional expectation of $x^{(j)} \mid z$ and $z$ as $\gamma^{(j)}\left(E(x^{(j)} \mid z)\right) = \alpha^{(j)T} z$, where

$$\alpha^{(j)T} = (\alpha^{(j)}_l)_{l=0,1,\ldots,q}, \tag{2.1}$$

$$z = (1, z_{(2)})^T, \quad z_{(2)} = (z^{(1)}, \ldots, z^{(q)})^T. \tag{2.2}$$

We give here the specific functions $b$, $c$ and $\gamma$ and the scale parameter $\phi$ for normal and ordinal conditional distributions $g_j(x^{(j)} \mid z)$.

Normal manifest variables

Let $x^{(j)} \mid z$ have a normal distribution with mean $\mu$ and variance $\sigma^2$. The link function $\gamma(\cdot)$ is the identity function

$$\gamma^{(j)}\left(E(x^{(j)} \mid z)\right) = E(x^{(j)} \mid z) = \mu = \alpha^{(j)T} z,$$

$$b(\alpha^{(j)T} z) = \frac{(\alpha^{(j)T} z)^2}{2},$$

$$c(x^{(j)}, \phi_j) = -\frac{1}{2}\left[ \frac{x^2}{\phi} + \log(2\pi\phi) \right]$$

and $\phi = \sigma^2$.
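As a quick sanity check of the normal case, the following minimal Python sketch plugs the functions $b$ and $c$ given above into the exponential-family form and compares the result with the usual normal density. The function names `expfam_normal_density` and `normal_pdf` are illustrative only and do not come from the thesis.

```python
import math


def expfam_normal_density(x, eta, phi):
    """Exponential-family form exp{(x*eta - b(eta))/phi + c(x, phi)} for the
    normal case, with b(eta) = eta^2 / 2 and
    c(x, phi) = -0.5 * (x^2 / phi + log(2*pi*phi))."""
    b = eta ** 2 / 2.0
    c = -0.5 * (x ** 2 / phi + math.log(2.0 * math.pi * phi))
    return math.exp((x * eta - b) / phi + c)


def normal_pdf(x, mu, sigma2):
    """Textbook N(mu, sigma2) density, used only as a reference."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# With eta = alpha^T z = mu and phi = sigma^2 the two forms should coincide.
print(expfam_normal_density(1.3, 0.4, 2.0), normal_pdf(1.3, 0.4, 2.0))
```

Expanding the exponent gives $-(x - \mu)^2 / (2\phi) - \frac{1}{2}\log(2\pi\phi)$, which is why the two functions agree.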

Ordinal manifest variables

Let $x^{(j)} \mid z$ follow an ordered multinomial distribution with categories going from 1 to $M^{(j)}$. The link function can be chosen as a logit function

$$\gamma^{(j)}\left( \frac{p_s}{1 - p_s} \right) = \alpha^{(j)T}_s z,$$

where $p_s$ is defined as the cumulative probability of a response $x^{(j)} \mid z$ to be $s$ or less, where $s = 1, \ldots, M^{(j)}$. The $s$ index on $\alpha^{(j)}_s$ indicates that the first component $\alpha^{(j)}_{0,s}$ of the vector $\alpha^{(j)}$ is a threshold that is related to each category.

$$b(\alpha^{(j)T} z) = \log\left( \frac{p_{s+1}}{p_{s+1} - p_s} \right), \quad c(x^{(j)}, \phi_j) = 0$$

The main assumption in the GLLVM is the conditional independence of the manifest variables given the latent ones. Hence, the joint conditional distribution is $\prod_{j=1}^{p} g_j(x^{(j)} \mid z)$. Assuming further that the latent variables are independent and distributed as standard normal variables, the joint marginal distribution of the manifest variables is given by

$$f_{\alpha,\phi}(x) = \int \left[ \prod_{j=1}^{p} g_j(x^{(j)} \mid z) \right] h(z_{(2)}) \, dz_{(2)}, \tag{2.3}$$

where $h(\cdot)$ is the density of a $q$-dimensional standard normal variable. Given a sample of $n$ observations $x_i = (x^{(1)}_i, \ldots, x^{(p)}_i)^T$, $i = 1, \ldots, n$, the log-likelihood of the $(q + 1) \times p$ loadings matrix $\alpha$ and the $p$-vector of scale parameters $\phi$ is $\ell(\alpha, \phi \mid x) = \sum_{i=1}^{n} \log f_{\alpha,\phi}(x_i)$. This expression contains a multidimensional integral which cannot be computed explicitly, except when $x \mid z$ is distributed according to a multivariate normal. Huber, Ronchetti, and Victoria-Feser (2004) propose to use a Laplace approximation of (2.3) and define an estimator called LAMLE. This idea has been used in several fields, including the Bayesian approach to approximate the posterior distribution (see e.g. Tierney and Kadane, 1986) and in a simplified form in generalized linear mixed models (GLMM) (see Breslow and Clayton, 1993).

Since our GFI will be based on the LAMLE, we give here the definition and the properties of the LAMLE. For more details, see Huber, Ronchetti, and Victoria-Feser (2004). By rewriting the density of $x_i$ in (2.3) as

$$f_{\alpha,\phi}(x_i) = \int e^{p \cdot Q(\alpha, z, x_i)} \, dz_{(2)}, \tag{2.4}$$

where

$$Q(\alpha, z, x_i) = \frac{1}{p} \left[ \sum_{j=1}^{p} \left( \frac{x^{(j)}_i \alpha^{(j)T} z - b_j(\alpha^{(j)T} z)}{\phi_j} + c_j(x^{(j)}_i, \phi_j) \right) - \frac{z_{(2)}^T z_{(2)}}{2} - \frac{q}{2} \log(2\pi) \right], \tag{2.5}$$

and by applying the Laplace approximation to (2.4), we obtain

$$f_{\alpha,\phi}(x_i) = \left( \frac{2\pi}{p} \right)^{q/2} \det\left( -W(\hat z_{i(2)}) \right)^{-1/2} e^{p Q(\alpha, \hat z_i, x_i)} \left( 1 + O(p^{-1}) \right), \tag{2.6}$$

where

$$W(z) = \frac{\partial^2 Q(\alpha, z, x_i)}{\partial z_{(2)}^T \partial z_{(2)}} = -\frac{1}{p} \Gamma(\alpha, \phi, z), \quad \Gamma(\alpha, \phi, z) = \sum_{j=1}^{p} \frac{1}{\phi_j} \frac{\partial^2 b_j(\alpha^{(j)T} z)}{\partial z^T \partial z} + I_q.$$

(2.6) depends on the unknown quantity $\hat z_i$, the maximum of the function $Q$ defined through

$$\frac{\partial Q(\alpha, \hat z_i, x_i)}{\partial \hat z_i} = 0, \tag{2.7}$$

and can be estimated iteratively by means of

$$\hat z_{i(2)} = \hat z_{i(2)}(\alpha, \phi, x_i) = \sum_{j=1}^{p} \frac{1}{\phi_j} \left( x^{(j)}_i - \frac{\partial b_j(\alpha^{(j)T} \hat z_i)}{\partial \alpha^{(j)T} \hat z_i} \right) \alpha^{(j)}_{(2)}, \tag{2.8}$$

where $\alpha^{(j)} = (\alpha^{(j)}_0, \alpha^{(j)T}_{(2)})^T$. It should be noted that $\hat z_{i(2)}$ can be interpreted as the maximum likelihood estimators of the latent scores. Indeed, if the $z_{i(2)}$ were considered as parameters, the first derivative of the likelihood with respect to $z_{i(2)}$ for fixed $\alpha$ and $\phi$ leads exactly to the expression (2.7).

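[...] who show that, as $n \to \infty$,

$$n^{1/2} (\hat z_{i(2)} - z_{i(2)}) \xrightarrow{D} N\left( 0, \Gamma(\alpha, \phi, z_i)^{-1} \right). \tag{2.9}$$

From (2.5) and (2.6), we obtain the Laplace approximated log-likelihood function

$$\tilde\ell(\alpha, \phi \mid x) = \sum_{i=1}^{n} \left( -\frac{1}{2} \log\det\{ \Gamma(\alpha, \phi, \hat z_i) \} - \frac{\hat z_{i(2)}^T \hat z_{i(2)}}{2} + \sum_{j=1}^{p} \left\{ \frac{x^{(j)}_i \alpha^{(j)T} \hat z_i - b_j(\alpha^{(j)T} \hat z_i)}{\phi_j} + c_j(x^{(j)}_i, \phi_j) \right\} \right). \tag{2.10}$$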
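For normal manifest variables the integrand in (2.3) is Gaussian in $z_{(2)}$, so the Laplace approximation is exact and the summand of (2.10) can be checked against the closed-form multivariate normal log-density. The sketch below does this for a single observation. It assumes $b_j(u) = u^2/2$, stores the intercepts in `alpha0` and the slopes $\alpha^{(j)}_{(2)}$ as rows of `A`, and solves (2.7) directly as a linear system rather than by iterating (2.8). All function and variable names are illustrative and are not taken from the thesis.

```python
import numpy as np


def laplace_loglik_normal(x, alpha0, A, phi):
    """Score z_hat and Laplace-approximated log-density (one summand of (2.10))
    for normal manifest variables, where b_j(u) = u^2 / 2."""
    D = 1.0 / phi
    # Gamma = sum_j alpha_(2)^(j) alpha_(2)^(j)T / phi_j + I_q (b'' = 1 here).
    Gamma = A.T @ (D[:, None] * A) + np.eye(A.shape[1])
    # Setting the gradient of Q to zero gives a linear system in the normal case.
    z_hat = np.linalg.solve(Gamma, A.T @ (D * (x - alpha0)))
    eta = alpha0 + A @ z_hat  # alpha^(j)T z_hat for every j
    _, logdet = np.linalg.slogdet(Gamma)
    fit = np.sum((x * eta - 0.5 * eta ** 2) / phi
                 - 0.5 * (x ** 2 / phi + np.log(2.0 * np.pi * phi)))
    return z_hat, -0.5 * logdet - 0.5 * z_hat @ z_hat + fit


def exact_loglik_normal(x, alpha0, A, phi):
    """Exact marginal log-density: x ~ N(alpha0, A A^T + diag(phi))."""
    Sigma = A @ A.T + np.diag(phi)
    r = x - alpha0
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(Sigma, r))


# Small illustrative example with p = 5 manifest and q = 2 latent variables.
rng = np.random.default_rng(1)
p, q = 5, 2
alpha0 = rng.normal(size=p)
A = rng.normal(size=(p, q))
phi = rng.uniform(0.5, 2.0, size=p)
x = rng.normal(size=p)
z_hat, ll_laplace = laplace_loglik_normal(x, alpha0, A, phi)
ll_exact = exact_loglik_normal(x, alpha0, A, phi)
print(ll_laplace, ll_exact)
```

For non-normal manifest variables the linear solve would have to be replaced by an iterative scheme, and the two log-densities would agree only up to the $O(p^{-1})$ term in (2.6).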

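The resulting loadings estimator of $\alpha$, called the LAMLE, is the solution of

$$\frac{\partial \tilde\ell(\alpha, \phi \mid x)}{\partial \alpha_{kl}} = \sum_{i=1}^{n} \psi(x_i; \alpha, \hat z_i) = 0, \quad k = 1, \ldots, p, \tag{2.11}$$

where

$$\psi(x_i; \alpha, \hat z_i) = -\frac{1}{2} \mathrm{tr}\left\{ \Gamma(\alpha, \phi, \hat z_i)^{-1} \frac{\partial \Gamma(\alpha, \phi, \hat z_i)}{\partial \alpha_{kl}} \right\} + \frac{1}{\phi_k} \left( x^{(k)}_i - \frac{\partial b_k(\alpha_k^T \hat z_i)}{\partial \alpha_k^T \hat z_i} \right) \hat z_{il}, \tag{2.12}$$

with $\hat z_i$ given implicitly by (2.8). Equation (2.11), which defines the LAMLE, may have multiple solutions. If $q > 1$, it is necessary to impose $\frac{q(q-1)}{2}$ constraints on the parameters $\alpha$ to obtain a unique solution.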

The LAMLE $\hat\alpha$ belongs to the class of M-estimators (Huber, 1981), and under the conditions given in Huber (1981), as $n \to \infty$,

$$n^{1/2} (\hat\alpha - \alpha) \xrightarrow{D} N(0, V(\alpha)), \tag{2.13}$$

where

$$V(\alpha) = B(\alpha)^{-1} A(\alpha) B(\alpha)^{-T}, \tag{2.14}$$

$$A(\alpha) = E\left[ \psi(x; \alpha, \hat z) \psi^T(x; \alpha, \hat z) \right], \quad B(\alpha) = -E\left[ \frac{\partial \psi(x; \alpha, \hat z)}{\partial \alpha} \right],$$

and the expectations are taken under the GLLVM model. For more details, specifically for the specific estimation equations in the normal and ordinal cases, see Huber, Ronchetti, and Victoria-Feser (2004) and Huber (2004).
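The sandwich form (2.14) can be illustrated with sample averages in place of the expectations. The sketch below is a generic plug-in version for any M-estimator, not the LAMLE-specific computation: the function name `sandwich_covariance` and the argument layout (one row of $\psi$ values and one Jacobian per observation) are assumptions made for the illustration.

```python
import numpy as np


def sandwich_covariance(psi_values, psi_jacobians):
    """Plug-in version of V = B^{-1} A B^{-T}.

    psi_values:    (n, d) array, row i holds psi(x_i; alpha_hat).
    psi_jacobians: (n, d, d) array, slice i holds d psi(x_i; alpha) / d alpha
                   evaluated at alpha_hat.
    """
    n = psi_values.shape[0]
    A = psi_values.T @ psi_values / n   # sample analogue of E[psi psi^T]
    B = -psi_jacobians.mean(axis=0)     # sample analogue of -E[d psi / d alpha]
    B_inv = np.linalg.inv(B)
    return B_inv @ A @ B_inv.T


# Toy check with the sample mean: psi(x; a) = x - a, so d psi / d a = -1
# and V reduces to the (1/n) empirical variance of x.
x = np.array([1.0, 2.0, 3.0, 4.0])
psi = (x - x.mean()).reshape(-1, 1)
jac = -np.ones((x.size, 1, 1))
V = sandwich_covariance(psi, jac)
```

In the toy example the estimator is the sample mean, for which the sandwich collapses to the empirical variance, which makes the function easy to verify by hand.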

3 Goodness-of-fit for Generalized Linear Latent Variable Models (GLLVM)

3.1 Test Statistic

The objective of a GFI is to measure the distance between a suitable quantity computed from the sample and its estimated counterpart using the assumed model. The basic idea of our GFI is to compare the distance among the latent scores and the corresponding distance among the original observations. The latent scores represent in a way the mapping of the observations on the latent variable space. Hence, if the model is adequate, a distance between two observations in the original data space should correspond to a similar distance on the latent space. Clearly, we need to define a general distance measure on the latent space and on the data space while taking into account the nature of the different variables. We propose here to use the distances developed [...]

[...] according to the GLLVM, each observation $x_i$ has a corresponding (unknown) latent score $z_i$ estimated by $\hat z_i$ by means of (2.7). Let $d_q(\hat z_{i_1}, \hat z_{i_2})$ be a distance function on the scores space and $\tilde d_p(x_{i_1}, x_{i_2})$ a distance function on the data space. Since $\hat z$ is continuous, a natural choice for $d_q(\cdot, \cdot)$ is the Euclidean distance standardized by the standard deviation of $\hat z_i$, i.e.

$$d_q(\hat z_{i_1}, \hat z_{i_2}) = \frac{1}{q} \sqrt{ \sum_{j=1}^{q} \left( \frac{\hat z^{(j)}_{i_1} - \hat z^{(j)}_{i_2}}{\tilde\sigma^{(j)}_z} \right)^2 }, \tag{3.1}$$

where $\tilde\sigma^{(j)}_z = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat z^{(j)}_i - \bar{\hat z}^{(j)} \right)^2 }$ is the empirical standard deviation of the $\hat z^{(j)}_i$, the $j$th component of the vector $\hat z_i$. In the sample space, if $x^{(j)}$ is normally distributed, the Euclidean distance function is also suitable for $\tilde d_p(\cdot, \cdot)$. When $x^{(j)}$ is ordinal, a standard choice is the Manhattan distance ($L_1$ distance) on the ranks $r^{(j)}_i$ of the observations. Hence for a model with $p_1$ normal manifest variables and $p_2$ ordinal manifest variables, we have

$$\tilde d_p(x_{i_1}, x_{i_2}) = \frac{1}{p_1} \sqrt{ \sum_{j=1}^{p_1} \left( \frac{x^{(j)}_{i_1} - x^{(j)}_{i_2}}{\tilde\sigma^{(j)}_x} \right)^2 } + \frac{1}{p_2} \sum_{j=1}^{p_2} \frac{\left| r^{(j)}_{i_1} - r^{(j)}_{i_2} \right|}{n_2}, \tag{3.2}$$

where $\tilde\sigma^{(j)}_x$ is the empirical standard deviation of $x^{(j)}$ and $n_2$ is a scale factor for the ranks corresponding to the maximum of the differences $r^{(j)}_{i_1} - r^{(j)}_{i_2}$, which has the same order as the variance of $r^{(j)}_i$. Consequently, a natural GFI is defined by

$$S(x, \hat z \mid \hat\alpha) = \frac{1}{\frac{n^2 - n}{2}} \sum_{i_1=1}^{n} \sum_{\substack{i_2=1 \\ i_1 > i_2}}^{n} \left[ d_q(\hat z_{i_1}, \hat z_{i_2}) - \tilde d_p(x_{i_1}, x_{i_2}) \right]^2$$

$$= \frac{1}{\frac{n^2 - n}{2}} \sum_{i_1=1}^{n} \sum_{\substack{i_2=1 \\ i_1 > i_2}}^{n} \left[ \frac{1}{q} \sqrt{ \sum_{j=1}^{q} \left( \frac{\hat z^{(j)}_{i_1} - \hat z^{(j)}_{i_2}}{\tilde\sigma^{(j)}_z} \right)^2 } - \frac{1}{p_1} \sqrt{ \sum_{j=1}^{p_1} \left( \frac{x^{(j)}_{i_1} - x^{(j)}_{i_2}}{\tilde\sigma^{(j)}_x} \right)^2 } - \frac{1}{p_2} \sum_{j=1}^{p_2} \frac{\left| r^{(j)}_{i_1} - r^{(j)}_{i_2} \right|}{n_2} \right]^2. \tag{3.3}$$

Basically, this GFI is an average squared difference between a general distance on the sample space and its estimated counterpart on the latent space. It implies that only the latent scores matrix is used in the model part of our GFI. Other distances can be specified for $d_q(\cdot, \cdot)$ and $\tilde d_p(\cdot, \cdot)$. However, Conne (2005) shows in a simulation study that $S$ has a good performance in terms of empirical power compared to the GFI based on other distances.
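The statistic $S$ can be sketched in a few lines of NumPy. The version below is a minimal illustration under stated assumptions, not the author's implementation: ranks are computed as mid-ranks (ties share their average rank), the scale factor $n_2$ is taken per ordinal variable as the largest possible rank difference, and the names `midranks` and `gfi_S` are hypothetical.

```python
import numpy as np


def midranks(col):
    """Average ranks of a 1-D array; tied values share the same rank."""
    _, inverse, counts = np.unique(col, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)  # largest rank inside each tie group
    return (upper - (counts - 1) / 2.0)[inverse]


def gfi_S(z_hat, x_normal, x_ordinal):
    """Average squared difference between the score-space distance (3.1) and
    the data-space distance (3.2) over all pairs i1 > i2.

    z_hat:     (n, q) estimated scores
    x_normal:  (n, p1) normal manifest variables
    x_ordinal: (n, p2) ordinal manifest variables (each column non-constant)
    """
    n, q = z_hat.shape
    p1, p2 = x_normal.shape[1], x_ordinal.shape[1]
    i1, i2 = np.tril_indices(n, k=-1)  # all pairs with i1 > i2

    # (3.1): standardized Euclidean distance on the scores (1/n std, ddof=0).
    zs = z_hat / z_hat.std(axis=0)
    d_q = np.sqrt(((zs[i1] - zs[i2]) ** 2).sum(axis=1)) / q

    # (3.2), normal part: standardized Euclidean distance.
    xs = x_normal / x_normal.std(axis=0)
    d_norm = np.sqrt(((xs[i1] - xs[i2]) ** 2).sum(axis=1)) / p1

    # (3.2), ordinal part: Manhattan distance on ranks, scaled by n2.
    ranks = np.column_stack([midranks(col) for col in x_ordinal.T])
    n2 = ranks.max(axis=0) - ranks.min(axis=0)  # assumption: max rank difference
    d_ord = (np.abs(ranks[i1] - ranks[i2]) / n2).sum(axis=1) / p2

    # The mean over n(n-1)/2 pairs equals the 1 / ((n^2 - n)/2) normalization.
    return np.mean((d_q - (d_norm + d_ord)) ** 2)
```

Because both distances are standardized, rescaling the scores or reordering the observations leaves $S$ unchanged, which gives a simple way to test an implementation.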

Since the distribution of $S$ depends on $\hat\alpha$ through $\hat z$, in order to obtain correct inference, $\hat\alpha$ is integrated out using its asymptotic distribution given by (2.13). It turns out that this corresponds to making a correction on $S$ based on a distance between two estimated asymptotic covariance matrices of $\hat\alpha$, namely $\hat V(\tilde\alpha)$ and $\hat V(\hat\alpha)$. This leads to the following GFI

$$\Omega = 2 \left( \frac{\det \hat V(\tilde\alpha)}{\det \hat V(\hat\alpha)} \right)^{1/2} \cdot S(x, \hat z \mid \hat\alpha). \tag{3.4}$$

This correction factor will be derived in the next section.
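Numerically, the determinant ratio in (3.4) is best evaluated on the log scale, since determinants of covariance matrices of many loadings can underflow or overflow. The following sketch assumes both covariance estimates are already available as positive definite arrays; the name `omega_statistic` is illustrative.

```python
import numpy as np


def omega_statistic(S, V_tilde, V_hat):
    """Corrected GFI of (3.4): 2 * (det V_tilde / det V_hat)^(1/2) * S,
    computed through log-determinants for numerical stability."""
    sign_t, logdet_t = np.linalg.slogdet(V_tilde)
    sign_h, logdet_h = np.linalg.slogdet(V_hat)
    if sign_t <= 0 or sign_h <= 0:
        raise ValueError("covariance estimates must have positive determinants")
    return 2.0 * np.exp(0.5 * (logdet_t - logdet_h)) * S
```

When the two covariance estimates coincide, the function returns `2 * S`, matching the remark in the text that the correction factor then reduces to two.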

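3.2 Derivation of the p-value of the Test

[...] p-values computed using $\nu_{S \mid \hat\alpha}(s \mid \hat\alpha)$, the conditional density of $S$ given $\hat\alpha$, will depend on $\hat\alpha$. In order to obtain correct unconditional inference, we consider $\hat\alpha$ as a nuisance parameter, and we integrate out $\hat\alpha$ using its asymptotic normal distribution given by (2.13), i.e.

$$f_S(s) = \int \nu_{S \mid \hat\alpha}(s \mid \hat\alpha) \cdot h\left( V^{-1/2}(\alpha) (\mathrm{vec}(\hat\alpha) - \mathrm{vec}(\alpha)) \right) |\det(V(\alpha))|^{-1/2} \, d\hat\alpha$$

$$= \frac{1}{(2\pi)^{\tilde p / 2}} \frac{1}{|\det(V(\alpha))|^{1/2}} \cdot \int \nu_{S \mid \hat\alpha}(s \mid \hat\alpha) \cdot \exp\left( \tilde p \cdot \kappa(\hat\alpha) \right) d\hat\alpha, \tag{3.5}$$

where

$$\kappa(\hat\alpha) = -\frac{1}{2\tilde p} \cdot (\mathrm{vec}(\hat\alpha) - \mathrm{vec}(\alpha))^T V^{-1}(\alpha) (\mathrm{vec}(\hat\alpha) - \mathrm{vec}(\alpha)),$$

$h(\cdot)$ is the density function of the standard normal and $\tilde p = \dim(\mathrm{vec}(\hat\alpha))$.

The term outside the integral depends on an unknown matrix $V(\alpha)$, which will be estimated by $\hat V(\tilde\alpha)$ with $\tilde\alpha$ defined below.

Moreover, the maximum of the function $\kappa(\hat\alpha)$ is achieved at $\hat\alpha = \alpha$ with $\kappa(\alpha) = 0$. Applying the $\tilde p$-dimensional Laplace approximation to the integral in (3.5), we obtain

$$\int \nu_{S \mid \hat\alpha}(s \mid \hat\alpha) \cdot \exp\left( \tilde p \cdot \kappa(\hat\alpha) \right) d\hat\alpha = \frac{1}{2 \sqrt{ \left| \det\left( -\frac{1}{\tilde p} V^{-1}(\alpha) \right) \right| }} \cdot \nu_{S \mid \hat\alpha}(s \mid \alpha) \cdot \left( \frac{2\pi}{\tilde p} \right)^{\tilde p / 2} \left\{ 1 + O(\tilde p^{-1}) \right\} \tag{3.6}$$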

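[...]

$$f_S(s) = \frac{1}{(2\pi)^{\tilde p / 2}} \frac{1}{|\det(V(\tilde\alpha))|^{1/2}} \cdot \frac{1}{2 \sqrt{ \left| \det\left( -\frac{1}{\tilde p} V^{-1}(\alpha) \right) \right| }} \, \nu_{S \mid \hat\alpha}(s \mid \alpha) \left( \frac{2\pi}{\tilde p} \right)^{\tilde p / 2} \left\{ 1 + O(\tilde p^{-1}) \right\}$$

$$= \frac{1}{2} \left( \frac{|\det(V(\alpha))|}{|\det(V(\tilde\alpha))|} \right)^{1/2} \cdot \nu_{S \mid \hat\alpha}(s \mid \alpha) \left\{ 1 + O(\tilde p^{-1}) \right\}. \tag{3.7}$$

$\alpha$ is unknown and will be estimated by $\hat\alpha$, see (2.11). Finally, we obtain

$$f_S(s) = \frac{1}{2} \left( \frac{|\det(V(\hat\alpha))|}{|\det(V(\tilde\alpha))|} \right)^{1/2} \cdot \nu_{S \mid \hat\alpha}(s \mid \hat\alpha) \left\{ 1 + O(\tilde p^{-1}) \right\} \tag{3.8}$$

which defines the correction factor of the GFI $S$, leading to $\Omega$ in (3.4).

For reasons of numerical stability and since the log-likelihood function is approximated by $\tilde\ell$ in (2.10), we use

$$\hat V(\alpha) = \left( \frac{1}{n^2} \sum_{i=1}^{n} \left[ \left( \frac{\partial \tilde\ell(\alpha, \phi \mid x_i)}{\partial \alpha} \right)^T \cdot \frac{\partial \tilde\ell(\alpha, \phi \mid x_i)}{\partial \alpha} \right] \right)^{-1} \tag{3.9}$$

instead of $\frac{1}{n} V(\alpha)$, the asymptotic covariance matrix given by (2.14). To get $\hat V(\hat\alpha)$, $\alpha$ is replaced by $\hat\alpha$ in (3.9). (3.9) is preferred to $n^{-1} V(\alpha)$ because the derivative of $\psi$ appears to be very unstable in simulations.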

Note that if $\hat\alpha$ and $\tilde\alpha$ are the same estimator, the correction factor becomes simply two. Since our empirical experience shows that a correction is decisive in having a correct inference, we propose to consider two different estimators $\hat\alpha$ and $\tilde\alpha$ in $\hat V(\alpha)$, where $\tilde\alpha$ has a smaller variance. For that, several techniques could in principle be considered, but we propose to use the bagging procedure (Breiman, 1996). Our simulation study shows that this choice is adequate at least for the model we have investigated. The complete algorithm is presented in the next section.

[...] the distribution of $\nu_{S \mid \hat\alpha}(s \mid \hat\alpha)$. In Chapter 6, the null asymptotic distribution of the test statistic is derived, which is shown to be normal with rather complicated expressions for the mean and variance. To compute the latter, one needs numerical approximations which make inference quite unstable if not inappropriate. We prefer instead to approximate $\nu_{S \mid \hat\alpha}(s \mid \hat\alpha)$ by means of resampling methods. Parametric bootstrap has been widely used in goodness-of-fit testing, as for example by Romano (1988). However, a direct parametric bootstrap would be too computer intensive, because $\hat\alpha$ and $\hat z_i$ need to be computed at each bootstrapped sample. Therefore, following a similar idea as in Salibian-Barrera and Zamar (2002), we propose a fast parametric bootstrap that avoids the computation of $\hat\alpha$ at each bootstrapped sample.

3.3 Computing the p-value

First, we need to estimate $\hat V(\tilde\alpha)$ using the bagging procedure, which can be summarized in the following way. Let $y = (x, \hat z)$ be a data set with corresponding estimated scores.

Repeat for $b = 1, \ldots, B$:

1. Generate a random sample $y_b = (x_b, \hat z_b)$ of size $n$ from $y$ with replacement.

2. Estimate the loadings $\tilde\alpha_b$ corresponding to the sample $y_b$ using (2.11) with $\hat z_b$ fixed.

3. Evaluate $\hat V_b(\tilde\alpha_b \mid y_b)$ with (3.9).

$$\hat V(\tilde\alpha) = \frac{1}{B} \sum_{b=1}^{B} \hat V_b(\tilde\alpha_b \mid y_b)$$

Note that in step 2 we compute the loadings $\tilde\alpha_b$ conditionally on the original $\hat z_b$. One could reestimate both $\tilde\alpha_b$ and $\hat z_i$, but our procedure is much faster and more stable.
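The bagging loop can be sketched generically. In the code below the two callables `fit_loadings` and `cov_estimator` are placeholders for the estimating equation (2.11) with fixed scores and for the covariance estimate (3.9). They are hypothetical hooks, not functions from any existing package, and the toy versions used in the demonstration (least squares on the scores and an inverse cross-product matrix) only stand in for the real LAMLE computations.

```python
import numpy as np


def bagged_covariance(x, z_hat, fit_loadings, cov_estimator, B=50, seed=0):
    """Average of B covariance estimates, each computed on a resample of the
    pairs (x_i, z_hat_i) drawn with replacement (steps 1 to 3 of the text)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    total = None
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # step 1: resample rows jointly
        x_b, z_b = x[idx], z_hat[idx]
        alpha_b = fit_loadings(x_b, z_b)          # step 2: scores kept fixed
        V_b = cov_estimator(alpha_b, x_b, z_b)    # step 3: covariance estimate
        total = V_b if total is None else total + V_b
    return total / B


# Toy stand-ins (assumptions for illustration only).
def toy_fit_loadings(x_b, z_b):
    Z = np.column_stack([np.ones(len(z_b)), z_b])
    return np.linalg.lstsq(Z, x_b, rcond=None)[0]  # shape (q + 1, p)


def toy_cov_estimator(alpha_b, x_b, z_b):
    Z = np.column_stack([np.ones(len(z_b)), z_b])
    return np.linalg.inv(Z.T @ Z)


rng = np.random.default_rng(42)
z_demo = rng.normal(size=(200, 2))
x_demo = z_demo @ rng.normal(size=(2, 4)) + rng.normal(size=(200, 4))
V_bag = bagged_covariance(x_demo, z_demo, toy_fit_loadings, toy_cov_estimator, B=20)
```

Resampling rows of `x` and `z_hat` with the same index vector keeps each observation paired with its own score, which is what allows step 2 to skip the re-estimation of the scores.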

The fast parametric bootstrap we propose can be summarized in the following way. Let $x$ be the data set supposed to be generated by a GLLVM model. Let also $\hat{z}$, $\hat{\alpha}$ and $\hat{\phi}$ be the corresponding estimated scores, loadings and scale parameters respectively, and $\hat{V}(\hat{\alpha})$ the covariance matrix evaluated with (3.9), using $\hat{\alpha}$, $\hat{\phi}$ and $\hat{z}_i$, $i = 1, \ldots, n$.

Repeat for $b = 1, \ldots, B$:

1. Generate one $\alpha_b^{\star}$ from its estimated asymptotic distribution $\alpha_b^{\star} \sim N\big(\hat{\alpha}, \hat{V}(\hat{\alpha})\big)$.

2. Generate $q$ independent standard normal vectors $z$ of size $n$.

3. Generate a vector $\mu = E[x \mid z]$ of conditional means of all responses defined by

$$\gamma(\mu) = \alpha_b^{\star T} z. \qquad (3.10)$$

4. Generate the bootstrapped sample of manifest variables $x_b$ based upon the means that were calculated in (3.10) as well as the scale parameters $\hat{\phi}$ for the normal responses.

5. Given the bootstrapped sample, estimate $\hat{z}_b$ conditionally on $\alpha_b^{\star}$ with (2.7).

6. Evaluate $\hat{V}(\alpha_b^{\star} \mid x_b, \hat{z}_b)$ with (3.9).

$$\Omega_b = \big(2 \det(\hat{V}_b^{\star})\big)^{1/2} \cdot S(x_b, \hat{z}_b \mid \alpha_b^{\star})$$

(29)

$$p\text{-value} = \frac{1}{B}\, \#\{\Omega_b > w\},$$

where $w$ is the observed value of $\Omega$ computed on the original sample. Note that in step 5 both $\hat{z}_b$ and $\alpha_b^{\star}$ could be reestimated, but this would increase the computational time without improvement on the performance in terms of p-values. The variability of $\hat{\alpha}$ is taken into account in the first step. One could also use $\hat{V}(\tilde{\alpha})$ as a covariance matrix estimate to simulate $\alpha_b^{\star}$ in step 1. However, under the null hypothesis, the p-values associated with this procedure do not seem to be as close to uniform as the ones presented above.

Finally, it should be stressed that this statistic and the procedure to evaluate its p-value are widely applicable. Indeed, if one uses another consistent estimator of $\alpha$ and $z$ and a corresponding asymptotic covariance matrix, one could apply the same procedure to define a goodness-of-fit index and its corresponding p-value.
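The loop of the fast parametric bootstrap and the resulting p-value can be sketched as below. This is a hedged outline only: `simulate_x`, `estimate_scores` and `statistic` are hypothetical callables standing in for the model-specific parts (the link function in (3.10), the score estimator (2.7), and the statistic itself), and the loadings are assumed to be stacked into a single vector.

```python
import numpy as np

def fast_bootstrap_pvalue(w, alpha_hat, v_hat, simulate_x, estimate_scores,
                          statistic, n, q, B=100, seed=0):
    """Monte Carlo p-value: share of bootstrap statistics exceeding the observed w.

    alpha_hat : 1-D array of estimated loadings (stacked into a vector).
    v_hat     : covariance matrix of alpha_hat.
    simulate_x(alpha_b, z, rng) -> bootstrap data (steps 3-4).
    estimate_scores(x_b, alpha_b) -> scores with the loadings held fixed (step 5).
    statistic(x_b, z_hat_b, alpha_b) -> scalar test statistic.
    The three callables are hypothetical placeholders for the model-specific parts.
    """
    rng = np.random.default_rng(seed)
    omega = np.empty(B)
    for b in range(B):
        alpha_b = rng.multivariate_normal(alpha_hat, v_hat)   # step 1: draw loadings
        z = rng.standard_normal((n, q))                        # step 2: latent scores
        x_b = simulate_x(alpha_b, z, rng)                      # steps 3-4: bootstrap sample
        z_hat_b = estimate_scores(x_b, alpha_b)                # step 5: loadings not refitted
        omega[b] = statistic(x_b, z_hat_b, alpha_b)            # step 6 and the statistic
    return float(np.mean(omega > w))                           # (1/B) * #{omega_b > w}
```

The point of the design is visible in the loop body: the expensive estimation of the loadings never appears, because their sampling variability enters only through the draw in step 1.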

(30)

Other GFI for GLLVM

In this chapter, we present the LRT in the GLLVM framework and the Satorra and Bentler (S&B) goodness-of-fit statistic (Satorra and Bentler, 2001). Cut-off criteria such as RMSEA or SRMR are not presented because they are not comparable to our GFI outside the case of normal manifest variables.

In the binary case, Pearson-type statistics are used. They are based on a comparison between the empirical frequencies and the estimated frequencies under the model. Pearson-type statistics require a large number of observations in each cell of the contingency table for their asymptotic distribution to hold. To avoid this problem of sparsity, statistics similar to Pearson's but using only information from lower margins, usually bivariate margins, are available; see e.g. Bartholomew and Leung (2002) or Maydeu-Olivares and Joe (2005). However, if there are significant interactions among higher order margins, the GFI may not reject false models. Moreover, in the ordinal case, this method is only applicable when the number of observations is large. Since the models we use in our simulation are too complex for GFIs based on Pearson-type statistics, they will not be compared to our GFI in

(31)

Suppose first that the manifest variables are conditionally normal given the latent ones, i.e.

$$x \mid z \sim N_p(\alpha^T z, \zeta), \qquad (4.1)$$

where $\zeta$ is a diagonal matrix. Let $\mathrm{var}(x) = \Sigma$; then $\Sigma = \alpha^T \alpha + \zeta$ and the likelihood function is

$$l(\Sigma^{-1}) \propto \frac{n}{2} \left\{ \log |\Sigma^{-1}| - \mathrm{trace}[\Sigma^{-1} C] \right\},$$

where $C$ is the empirical covariance or correlation matrix. Suppose we have estimators $\hat{\alpha}$ and $\hat{\zeta}$ for $\alpha$ and $\zeta$. Then, if the number of latent variables $q$ is known or fixed, the likelihood ratio statistic for the null hypothesis $H_0 : \Sigma = \alpha_0^T \alpha_0 + \zeta_0$, or equivalently $H_0 : x \sim N_p(\mu, \alpha_0^T \alpha_0 + \zeta_0)$, against the alternative that $\Sigma$ is unconstrained is

$$n \left[ -\log |\hat{\Sigma}^{-1} C| + \mathrm{trace}[\hat{\Sigma}^{-1} C] - p \right], \qquad (4.2)$$

where $\hat{\Sigma} = \hat{\alpha}^T \hat{\alpha} + \hat{\zeta}$. If the manifest variables are conditionally normal, this statistic is asymptotically distributed as a $\chi^2$ with $\frac{1}{2}[(p - q)^2 - (p + q)]$ degrees of freedom.
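As a small numerical companion to (4.2), the statistic and its degrees of freedom can be computed as follows. The function names are illustrative assumptions; the loadings are taken to be stored as a $(q \times p)$ array and $\hat{\zeta}$ as a $(p \times p)$ diagonal matrix, matching the dimensions used in the text.

```python
import numpy as np

def lrt_statistic(C, alpha_hat, zeta_hat, n):
    """Likelihood ratio statistic (4.2): n * (-log|S^-1 C| + trace(S^-1 C) - p)."""
    p = C.shape[0]
    sigma_hat = alpha_hat.T @ alpha_hat + zeta_hat   # model covariance, p x p
    M = np.linalg.solve(sigma_hat, C)                # Sigma_hat^{-1} C without explicit inverse
    _, logdet = np.linalg.slogdet(M)                 # log |Sigma_hat^{-1} C|
    return n * (-logdet + np.trace(M) - p)

def lrt_df(p, q):
    """Degrees of freedom ((p - q)^2 - (p + q)) / 2 of the asymptotic chi-square."""
    return ((p - q) ** 2 - (p + q)) // 2
```

The statistic is zero when the fitted covariance reproduces $C$ exactly and grows as the two matrices move apart; the resulting value would then be compared to a $\chi^2$ quantile with `lrt_df(p, q)` degrees of freedom.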

If the manifest variables are not (conditionally) normal, the problem of estimating the likelihood under the alternative arises. In the binary case, the multivariate density has $2^p - 1$ parameters which can be estimated using moment estimators of $E[x^{(j_1)} \cdot x^{(j_2)}]$, $j_1 \neq j_2$; see e.g. Teugels (1990). Even

(32)

suffers from two drawbacks. First, this procedure cannot be applied to a mixture of continuous and discrete manifest variables, because the likelihood under the alternative would be almost impossible to derive. Secondly, in the ordinal case, the number of observations needs to be very large if the number of manifest variables is moderate to high. Indeed, the number of parameters increases very quickly with the number of manifest variables.

4.2 Satorra and Bentler (S&B) GFI

This statistic is based on a comparison between the sample covariance $C$ and the model covariance $\Sigma$ of the manifest variables $x = (x^{(1)}, \ldots, x^{(p)})$. If the manifest variables are a mixture of ordinal and normal variables, the usual estimator of the sample covariance matrix $C$ is replaced by the polychoric or polyserial covariance matrix; see e.g. Qu, Piedmonte, and Medendorp (1995). Let $s$ and $\sigma$ be the vectors containing all the distinct values of $C$ and $\Sigma$ respectively, and $\Upsilon$ the asymptotic covariance matrix of $\sqrt{n}(s - \sigma)$. Let $\theta$ be the vector containing all the parameters of the GLLVM. Consider $\sigma(\theta)$ and the function

$$F(\theta) := (s - \sigma(\theta))^T \hat{\Upsilon}^{-1} (s - \sigma(\theta)), \qquad (4.3)$$

where $\hat{\Upsilon}^{-1}$ is a consistent estimator of $\Upsilon^{-1}$. Then the S&B statistic is

$$T = \frac{n F(s, \sigma(\hat{\theta}))}{c}, \qquad (4.4)$$

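where $\hat{\theta}$ is the weighted least squares (WLS) estimator which minimizes (4.3) and

$$c = \frac{\dim(\theta)}{r_0},$$

where $r_0 = \dim(\sigma) - \dim(\theta)$. Then

$$T \xrightarrow{D} \sum_{j=1}^{r_0} \lambda_j (\chi_1^2)_j, \qquad (4.5)$$

as $n \to \infty$, where the $(\chi_1^2)_j$ are independent chi-square variables with one df and the $\lambda_j$ are the nonnull eigenvalues of the matrix $U_0 \Upsilon$, with

$$U_0 = \hat{\Upsilon}^{-1} - \hat{\Upsilon}^{-1} \Delta (\Delta^T \hat{\Upsilon}^{-1} \Delta)^{-1} \Delta^T \hat{\Upsilon}^{-1},$$

and $\Delta = \frac{\partial}{\partial \theta^T} \sigma(\theta)$.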
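The quantities entering (4.3) and (4.5) are plain matrix expressions, so a short numerical sketch may help. The function names below are illustrative assumptions, and the inputs (the vectors $s$ and $\sigma(\hat{\theta})$, the matrices $\Upsilon$, $\hat{\Upsilon}^{-1}$ and $\Delta$) are taken as given arrays rather than estimated from data.

```python
import numpy as np

def wls_discrepancy(s, sigma, upsilon_hat_inv):
    """Quadratic form (4.3): (s - sigma)^T Upsilon_hat^{-1} (s - sigma)."""
    r = s - sigma
    return float(r @ upsilon_hat_inv @ r)

def sb_weights(upsilon, upsilon_hat_inv, delta, tol=1e-8):
    """Return U0 and the nonnull eigenvalues of U0 @ Upsilon used as weights in (4.5)."""
    A = upsilon_hat_inv
    middle = np.linalg.inv(delta.T @ A @ delta)        # (Delta^T A Delta)^{-1}
    U0 = A - A @ delta @ middle @ delta.T @ A
    eig = np.linalg.eigvals(U0 @ upsilon).real         # eigenvalues are real in theory
    return U0, np.sort(eig[np.abs(eig) > tol])         # drop the numerically null ones
```

A useful sanity check: when $\hat{\Upsilon}$ equals $\Upsilon$, the matrix $U_0 \Upsilon$ is a projection, so the function returns $\dim(\sigma) - \dim(\theta)$ weights all equal to one and the limit in (4.5) reduces to an ordinary chi-square.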

In the case of discrete manifest variables, the problem that the information drawn from the sample is reduced to the estimated covariance matrix still remains. If we compare our statistic with (4.2) or (4.4), both measure a distance consisting of a model part and a sample part. The model part of $\Omega$ in (3.4) depends on the $(n \times q)$ matrix of $\hat{z}$, while the model part of $T$ in (4.4) or of the LRT based on (4.2) depends only on the $(q \times p)$ matrix of the estimated loadings. Since the S&B statistic is widely used in practice, we will compare its performance to $\Omega$ in the simulation presented in Chapter 5.

(34)

Simulation study

In this chapter, we study the behavior in terms of level and power of our GFI $\Omega$ and compare it to the behavior of the S&B GFI $T$. $T$ and its associated p-value, computed by means of the asymptotic distribution given in (4.5), are computed with Mplus. We consider models containing 2 and 3 latent variables. 10000 samples of size 100 (for the model with 2 latent variables) and 5000 samples of size 200 (for the model with 3 latent variables) were simulated. They contain respectively 5 ordinal manifest variables and a mixture of 5 ordinal manifest variables and 5 normal manifest variables. 30 random samples were generated in the bagging procedure and 100 bootstrap samples in the fast parametric bootstrap.

The samples of size $n$ are generated in S-Plus using the following procedure:

1. Initialize all the parameters:

• $p(q + 1)$ elements of the matrix $\alpha$,

• $p_1$ variances defining the vector $\phi$,

• (s − 1)p 2 α s

(35)

2. Generate $q$ independent standard normal vectors $z_j$ of size $n$.

3. Generate a vector $\mu = E[x \mid z]$ of conditional means of all responses given by

$$\mu = \gamma^{-1}(\alpha^T z).$$

4. Generate all responses $x$ based upon the means $\mu$ that were calculated in step 3 as well as the scale parameters $\phi$ for the normal responses.
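The generation steps above can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the S-Plus code used for the study: the non-normal responses are taken to be binary with a logit link instead of ordinal, the intercepts are passed separately as `alpha0` rather than as part of the $p(q+1)$ matrix, and all names are hypothetical.

```python
import numpy as np

def simulate_gllvm(alpha0, alpha, phi, n, is_normal, rng):
    """Simulate n observations of p manifest variables driven by q latent variables.

    alpha0    : (p,) intercepts (assumed layout, kept separate from the slopes).
    alpha     : (q, p) loadings.
    phi       : (p,) variances, used only for the normal responses.
    is_normal : length-p booleans; False means binary with a logit link (assumption).
    """
    q, p = alpha.shape
    z = rng.standard_normal((n, q))                 # step 2: latent scores
    eta = alpha0 + z @ alpha                        # linear predictor alpha^T z, shape (n, p)
    x = np.empty((n, p))
    for j in range(p):
        if is_normal[j]:
            mu = eta[:, j]                          # step 3: identity link
            x[:, j] = rng.normal(mu, np.sqrt(phi[j]))   # step 4: normal response
        else:
            mu = 1.0 / (1.0 + np.exp(-eta[:, j]))   # step 3: inverse logit link
            x[:, j] = rng.binomial(1, mu)           # step 4: binary response
    return x, z
```

For ordinal responses with $s$ categories one would replace the binary branch by cumulative probabilities built from the $s - 1$ thresholds per item; that extension is omitted here to keep the sketch short.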

We tried to use our simulated data sets directly to compute $T$ with Mplus, but even under $H_0$ (the correct model is estimated) only 60% of the simulated samples could provide an estimated value for $T$. The situation was even worse under the alternative hypothesis (where an incorrect model is estimated), where in some cases no $T$ is estimated. The situation is not much better if the sample size becomes large ($n = 10000$). Indeed, only 77% of the simulated samples could provide an estimated value for $T$ in that case. This might be due to the fact that the estimation in Mplus is based on the underlying variable approach. This approach assumes that the manifest variables are indirect observations of normal underlying variables. When this is not the case, the model might not be identified. To avoid this problem, new data were simulated directly from Mplus with the same parameters, and residual variance of the underlying variables defined as $\zeta = I_{pq} - \tilde{\alpha}^T \tilde{\alpha}$, where $\tilde{\alpha}$ are the loadings standardized by their Euclidean norm. These data are constrained to follow a less general model than the one generated in S-Plus. Indeed, in the underlying approach, new parameters, the residual variances, have to be fixed and then the model is constrained to have a particular form. When one works with ordinal data and uses $T$ as a goodness-of-fit statistic, one has to test first if the underlying approach is adequate; see Muthen (1993).

(36)

alternatives in the two latent variables models and two in the three latent variables models. Under $H_A$, the models are larger (more general) than under $H_0$. More precisely, models under $H_A$ are defined so that each has one additional nonnull loading compared with the previous model. For example, in models containing two latent variables, $M_3$ has the same loadings as $M_2$ except $\alpha_1^{(5)}$, which is 0. In the two latent variables models, $M_1$ has three latent variables, but only one additional nonnull loading compared with $M_2$, which is equivalent to a model with the same parameters but one additional latent variable with corresponding loadings equal to 0. We have the following results in terms of empirical level and power.

Hypothesis    Nominal level    Empirical level      Empirical power
                               Ω        T           Ω       T
H_0           0.1              0.099    0.16
H_0           0.05             0.050    0.11
H_0           0.01             0.014    0.04
H_A (M1)      0.05                                  0.75    0.57
H_A (M2)      0.05                                  0.42    0.25
H_A (M3)      0.05                                  0.12    0.22

Table 5.1: Empirical level and power for the model with 2 latent variables

Hypothesis    Nominal level    Empirical level      Empirical power
                               Ω        T           Ω       T
H_0           0.1              0.103    0.045
H_0           0.05             0.052    0.021
H_0           0.01             0.018    0.002
H_A (M1)      0.05                                  0.21    0.09
H_A (M2)      0.05                                  0.10    0.06

Table 5.2: Empirical level and power for the model with 3 latent variables

(37)

In the two latent variables model, $\Omega$ outperforms the S&B statistic in terms of empirical level and power. The only exception is the empirical power with the M2 model, but this is difficult to compare because of the liberal behavior of the empirical level of $T$. $\Omega$ also has the expected behavior in terms of power when the tested model is further away from the null. The 3 latent variables model simulations confirm these results. Notice that in this case the power is lower because the alternative is closer to the null. Indeed, in the 3 latent variables model, we add a nonnull loading in a loading matrix of size $(10 \times 3)$, while the size of the loading matrix in the 2 latent variables model is $(5 \times 2)$. Moreover, the level of $T$ is much smaller than the prescribed nominal level 0.05 and this makes this test too conservative. The result might be due to the small sample size used in this simulation study. Indeed, the p-value of $T$ is estimated using its asymptotic distribution. But, even when the sample size is larger, one has to be sure that the assumptions of the underlying variables approach are valid to be able to compute $T$ in practice.

Clearly, the new statistic improves goodness-of-fit testing within GLLVM. At present, $\Omega$ and its associated p-values are estimated using R code by calling C code to compute the loadings and the scores using the algorithm presented in Huber, Ronchetti, and Victoria-Feser (2004). Both use a quasi-Newton procedure (Dennis and Schnabel, 1983) where compatible stopping rules have to be defined and can be computationally intensive. One potential improvement would be to develop a stand-alone R library to avoid this two-step procedure, and more efficient algorithms to reduce the computing time.

This procedure can be naturally extended to latent variables with covariates

(38)

between the distance among the latent scores minus the estimated function of the covariates and the corresponding distance among the original observations. Further research directions include the definition of new distance functions for nominal discrete variables and non-normally distributed variables. Finally, our GFI and the procedure to evaluate its p-value could be extended to multilevel models, structural equations models or combined measurement models. Indeed, the GFI can be naturally extended to that case using the generalized factor formulation (GF) of Skrondal and Rabe-Hesketh (2004), but much work needs to be done to extend the estimation procedure and the resampling techniques to these cases.

Appendix A: Simulation Parameters

The loadings with the value 0 for the model under the null hypothesis are actually not estimated. This constraint ensures a unique solution.

A.1: Model with 2 latent variables

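Using (2.1), we have the following model

$$
\gamma =
\begin{pmatrix}
\gamma^{(1)}\big(E(x^{(1)} \mid z)\big) \\
\vdots \\
\gamma^{(5)}\big(E(x^{(5)} \mid z)\big)
\end{pmatrix}
=
\begin{pmatrix}
\log \dfrac{p_s^{(1)}}{1 - p_s^{(1)}} \\
\vdots \\
\log \dfrac{p_s^{(5)}}{1 - p_s^{(5)}}
\end{pmatrix}
=
\begin{pmatrix}
\alpha_{0,s}^{(1)} & \alpha_{0,s}^{(2)} & \alpha_{0,s}^{(3)} & \alpha_{0,s}^{(4)} & \alpha_{0,s}^{(5)} \\
\alpha_1^{(1)} & \alpha_1^{(2)} & \alpha_1^{(3)} & \alpha_1^{(4)} & \alpha_1^{(5)} \\
\alpha_2^{(1)} & \alpha_2^{(2)} & \alpha_2^{(3)} & \alpha_2^{(4)} & \alpha_2^{(5)}
\end{pmatrix}^{T} z
$$

with the null hypothesis

$$H_0 : \alpha_2^{(1)} = \alpha_1^{(3)} = \alpha_1^{(5)} = 0.$$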
