
MOUSTAKI, Irini and VICTORIA-FESER, Maria-Pia. Robust Estimation and Inference for Generalised Latent Trait Models. London School of Economics, 2002. Available at: http://archive-ouverte.unige.ch/unige:6508



Robust Estimation and Inference for Generalised Latent Trait Models

Irini Moustaki (London School of Economics) and Maria-Pia Victoria-Feser (University of Geneva)

Statistics Research Report LSERR76, LSE March 2002

Abstract

The paper discusses the effect of model deviations such as data contamination on the maximum likelihood estimator (MLE) for a general class of latent trait models (Moustaki and Knott 2000). This is done with the use of the influence function (Hampel 1968, 1974), a mathematical tool for assessing the robustness properties of any statistic, such as an estimator. Simulation studies show that the MLE can be seriously biased by model deviations. We therefore propose alternative robust estimators that are less influenced by data contamination.

The performance of the robust estimators in terms of bias and variance is compared to that of the MLE, both analytically and through simulation studies.

Keywords: Generalized latent trait, mixed items, influence function, robust estimation


1 Introduction

The study of relationships among variables is a common research problem in the social sciences. Theoretical constructs such as intelligence, ability, emotion and stress are not directly measurable; they can only be measured indirectly through observed variables that are supposed to be indicators of those unobserved constructs.

Therefore, models such as factor analysis or structural equation models that allow measurements of unobservable variables or latent variables by means of observable or manifest variables (also called items) are very important.

In practice any type of manifest variable can be collected: multiple-choice or binary data (e.g. correct/incorrect), ordinal data such as Likert-type scales (e.g. opinions), metric data (e.g. scores obtained on a test), etc. To analyse the relationships between these different types of variables, one needs the appropriate methodology. In the literature there are two main approaches for analyzing the interrelationships among a set of observed variables. The first, the underlying variable approach, assumes that underlying each discretely observed (binary or ordinal) variable there is a normal variable that, for practical reasons, is observed only on a discrete scale. Tetrachoric, polychoric (Olsson 1979) or polyserial (Olsson, Drasgow, and Dorans 1982) correlations are computed among those underlying continuous variables. Poon and Lee (1987) developed a method for finding the maximum likelihood estimates of the parameters of a multivariate normal distribution in these situations. Their method can be used for structural equation modelling, as proposed in Muthén (1984) and Lee, Poon, and Bentler (1992). This is not the approach we consider here, because of the relatively strong assumptions on the underlying distributions.

The other approach, the item response theory approach, models the observed variables as they are, by postulating distributions on the observed variables. A generalized model framework for any type of observed data in the exponential family is discussed in Moustaki and Knott (2000). Their paper gives an estimation procedure for the maximum likelihood (ML) estimator of generalized latent trait (GLT) models. They extend the work of Moustaki (1996) for mixed binary and metric variables and Bartholomew and Knott (1999) for categorical variables. O'Muircheartaigh and Moustaki (1999) also consider the case of missing values.

However, a classical ML approach makes the fundamental assumption that the data are generated exactly from the model and, in particular, that there are no errors in the set of responses. For example, in the case of normal variables, a subject with a response more than 3 standard deviations away from the mean has an unexpected response under the normal model, which is considered to be either an error (e.g. a recording error) or just an unusual subject not representative of the sampled population. For binary variables it is harder to define when a data set is contaminated. For example, if the assumed model is a Guttman model, then any positive/correct response that is followed by a negative/wrong response does not comply with the assumed model. As we deviate from the deterministic nature of the response patterns under the Guttman model, it becomes more difficult to detect response patterns that are not generated by the assumed model. Subjects that indicate the presence of model deviation, i.e. that are highly improbable under the assumed model, might have been generated by another (unassumed) model.

The question addressed in this paper is: what is the effect of these unexpected response patterns on the ML estimator? Do the parameter estimates change radically if subjects that do not "fit the model" are present in the sample? In other words, is the ML estimator for the GLT model robust?

If the ML estimator is not robust, then in principle one subject can change the conclusions drawn from the data analysis. This is obviously an undesirable property of the estimation procedure. In that case, a robust estimator built to be resistant to model deviations should first be developed and then used in practice. The aims of the paper are first to investigate the robustness properties of the ML estimator for the GLT model and then to propose robust estimators.

General robustness theory can be found in Huber (1981) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986), who have set the foundations. We adopt here the approach based on the influence function (IF) (Hampel 1968, 1974), a mathematical tool for assessing the robustness properties of any statistic, such as an estimator or a test statistic. Let F_θ denote a parametric model such as the GLT model, where θ is a vector of parameters. Let also T be an estimator of θ which can be written as a functional of any distribution F, i.e. T(F). The theory of robust statistics starts by enlarging the hypothesis about the distribution of the data: it is supposed that the data-generating distribution lies in a neighbourhood (1−ε)F_θ + εG of the true model. In other words, a large proportion (1−ε) of the data is generated by the model F_θ, whereas a small fraction ε > 0 comes from an arbitrary distribution G.

To assess the behaviour of T in the neighbourhood of the hypothetical model, with usually infinitesimal values of ε, one uses the IF. Formally, the IF is defined as

\[
\mathrm{IF}(x; T, F_\theta) = \lim_{\varepsilon \downarrow 0} \frac{T(F_\varepsilon) - T(F_\theta)}{\varepsilon}
\]

with F_ε = (1−ε)F_θ + ε∆_x, where ∆_x is the distribution that assigns probability 1 to an arbitrary point x. Data generated from F_ε are usually said to be generated under model contamination. The IF then measures the influence of an infinitesimal amount of contaminated data at the arbitrary point x on


the value of the statistic T. In fact, Hampel et al. (1986) show that the IF gives information on the behaviour of T for any "contamination" distribution G, since one has that

\[
\sup_G \left\| T\big((1-\varepsilon)F_\theta + \varepsilon G\big) - T(F_\theta) \right\|
\approx \varepsilon \, \sup_x \left\| \mathrm{IF}(x; T, F_\theta) \right\|
\]

The IF can also be seen as a first-order approximation of the asymptotic bias of T (see Hampel et al. 1986). This means that the IF can be used to assess the robustness properties of T: if it is unbounded, then the asymptotic bias of T can be infinite (or very large) under model contamination, meaning that T is not robust. If it can take large values, then although the bias is in this case finite, it might nevertheless be large. In the latter case, in order to measure the maximal size of the bias, one can use the self-standardized sensitivity (Hampel et al. 1986) given by

\[
\gamma(T, F_\theta) = \sup_x \left[ \mathrm{IF}(x; T, F_\theta)^T \, V(T, F_\theta)^{-1} \, \mathrm{IF}(x; T, F_\theta) \right]^{1/2}
\qquad (1)
\]

where V(T, F_θ) is the asymptotic variance of T. Hence the upper bound of the asymptotic bias is measured in the metric given by the asymptotic covariance matrix of the estimator. We also have the result that

\[
\gamma(T, F_\theta)^2 \ge s
\]

where s = dim(θ) is the number of parameters. The IF and γ will be used to assess the robustness properties of the ML estimator for the GLT model.
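As a toy illustration of these ideas outside the latent trait setting (our own sketch, not from the paper), the finite-sample analogue of the IF, the sensitivity curve, contrasts an estimator with unbounded IF (the sample mean) with one whose IF is bounded (the median):

```python
import numpy as np

def sensitivity_curve(estimator, sample, x):
    """Finite-sample analogue of the IF: n * (T(sample + {x}) - T(sample))."""
    n = len(sample)
    return n * (estimator(np.append(sample, x)) - estimator(sample))

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=200)

# The mean's influence grows linearly in the contamination point x ...
sc_mean_near = sensitivity_curve(np.mean, sample, 5.0)
sc_mean_far = sensitivity_curve(np.mean, sample, 50.0)
# ... while the median's influence is bounded: any point far above the
# median shifts it by the same small amount.
sc_med_near = sensitivity_curve(np.median, sample, 5.0)
sc_med_far = sensitivity_curve(np.median, sample, 50.0)

print(sc_mean_near, sc_mean_far)   # roughly 5 and 50: unbounded influence
print(sc_med_near, sc_med_far)     # identical, small values: bounded influence
```

Moving the contamination point ten times further multiplies the mean's influence by roughly ten but leaves the median's influence unchanged, which is exactly the bounded/unbounded distinction the IF formalises.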

The paper is organised as follows. The GLT model and the ML estimator of its parameters are presented in Section 2. In Section 3, the robustness properties of the ML estimator are studied by means of the IF and the self-standardized sensitivity. Several robust estimators are presented in Section 4, and their robustness, efficiency and consistency properties are studied. In Section 5, the behaviour of the ML and a robust estimator under model contamination is studied through a simulation study. Finally, Section 6 concludes.

2 The generalized latent trait model

A latent variable model aims to explain the interrelationships among p manifest response variables x_1, ..., x_p with q latent variables z_1, ..., z_q, where q is much smaller than p. The conditional distribution of x_i|z (z = [z_1, ..., z_q]) is taken from the exponential family, i.e.

\[
g_i(x_i \mid z, \alpha_i) = \exp\left\{ \frac{x_i \theta_i - b(\theta_i)}{\phi_i} + c(x_i, \phi_i) \right\}
\]


where θ_i = θ_i(z, α_i) is called a canonical parameter, and b(θ_i) and c(x_i, φ_i) are specific functions whose form depends on the distribution of the response variable x_i (see Moustaki and Knott 2000). φ_i is a scale parameter, usually estimated separately from the rest of the parameters α_i. The link between the manifest and latent variables is defined through the p link functions b(θ_i) such that we have the following relationship

\[
\theta_i(z, \alpha_i) = \alpha_{i0} + \sum_{j=1}^{q} \alpha_{ij} z_j = \alpha_i z
\]

where α_i = [α_{i0}, ..., α_{iq}] and z = [1, z_1, ..., z_q]^T. The functions b(θ_i) = b(θ_i(z, α_i)) are defined for the different types of data and distributions as follows:

Binary responses with logit link: b(θ_i) = log(1 + exp(θ_i))

Poisson responses with log link: b(θ_i) = exp(θ_i)

Normal responses with identity link: b(θ_i) = θ_i²/2

Gamma responses with reciprocal link: b(θ_i) = −log(−θ_i)

In the ordinal case, the canonical parameter θi(z, αi) is not linear with respect to the latent variables. For a generalized framework for modelling ordinal data see Moustaki (2000).
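As a numerical sanity check on the four b(θ_i) above (our own sketch, not code from the paper), the derivative ḃ(θ_i) should reproduce the conditional mean E[x_i|z] in each case: the logistic function for the logit link, exp(θ) for the log link, θ itself for the identity link, and −1/θ for the reciprocal link:

```python
import math

# b(theta) for each response type (theta < 0 required in the gamma case)
b_funcs = {
    "binary_logit":     lambda t: math.log(1.0 + math.exp(t)),
    "poisson_log":      lambda t: math.exp(t),
    "normal_identity":  lambda t: t * t / 2.0,
    "gamma_reciprocal": lambda t: -math.log(-t),
}
# The implied conditional mean E[x|z] = db/dtheta for each case
mean_funcs = {
    "binary_logit":     lambda t: 1.0 / (1.0 + math.exp(-t)),  # sigmoid
    "poisson_log":      lambda t: math.exp(t),
    "normal_identity":  lambda t: t,
    "gamma_reciprocal": lambda t: -1.0 / t,
}

def num_deriv(f, t, h=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

theta = {"binary_logit": 0.7, "poisson_log": 0.7,
         "normal_identity": 0.7, "gamma_reciprocal": -0.7}
for name, b in b_funcs.items():
    t = theta[name]
    assert abs(num_deriv(b, t) - mean_funcs[name](t)) < 1e-5
print("all four mean functions match the derivative of b")
```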

The aim of a GLT model is to reduce the p manifest variables to a smaller number of latent variables while taking into account the correlation structure of the x_i's. An assumption made for a GLT model is that of conditional independence, which says that the manifest variables are conditionally independent given the latent variables, i.e.

\[
g(x \mid z, \alpha) = \prod_{i=1}^{p} g_i(x_i \mid z, \alpha_i)
\]

with x = [x_1, ..., x_p], α = [α_1^T, ..., α_p^T]^T and g the conditional distribution of x given z. The joint distribution of the manifest variables can thus be written as

\[
f(x; \alpha) = \int \cdots \int \left[ \prod_{i=1}^{p} g_i(x_i \mid z, \alpha_i) \right] h(z)\, dz
\]

where the z_j in z are assumed to follow independent standard normal distributions, i.e. h(z) = \prod_{j=1}^{q} \varphi(z_j).

For a sample of size n, the log-likelihood is then

\[
L(\alpha, \phi) = \frac{1}{n} \sum_{h=1}^{n} \log f(x_h; \alpha)
\]


The partial derivatives are

\[
\frac{\partial L(\alpha,\phi)}{\partial \alpha_i^T}
= \frac{1}{n}\sum_{h=1}^{n} s_i(x_h;\alpha)
= \frac{1}{n}\sum_{h=1}^{n} \frac{1}{f(x_h;\alpha)} \int \cdots \int g(x_h \mid z,\alpha)\,
\mathrm{vec}\!\left[\frac{x_{ih} - \dot b(\theta_i)}{\phi_i}\, z\right] h(z)\, dz
\qquad (2)
\]

where \(\dot b(\theta_i) = \partial b(\theta_i)/\partial \theta_i\). The derivative of the specific function b(θ_i) with respect to the canonical parameter θ_i is equal to E[x_i|z]. Note also that the second derivative of b(θ_i) with respect to θ_i, multiplied by φ_i, is equal to var[x_i|z]. The roots of (2) define the ML estimator \(\hat\alpha_i\), ∀i.

Differentiating the log-likelihood with respect to the scale parameter leads to

\[
\frac{\partial L(\alpha,\phi)}{\partial \phi_i}
= \frac{1}{n}\sum_{h=1}^{n} s_i(x_h;\phi)
= \frac{1}{n}\sum_{h=1}^{n} \frac{1}{f(x_h;\alpha)} \int \cdots \int g(x_h \mid z,\alpha)
\left[ -\frac{x_{ih}\theta_i - b(\theta_i)}{\phi_i^2} + \dot c(\phi_i, x_{ih}) \right] h(z)\, dz
\qquad (3)
\]

For the binomial, multinomial and Poisson distributions the scale parameter is φ = 1. For the normal distribution, we have

\[
\dot c(\phi_i, x_i) = 0.5\left( \frac{x_i^2}{\phi_i^2} - \frac{1}{\phi_i} \right)
\]

so that

\[
s(x;\phi_i) = \frac{1}{f(x;\alpha)} \int \cdots \int g(x \mid z,\alpha)\,
\frac{0.5}{\phi_i^2}\left[ (x_i - \theta_i)^2 - \phi_i \right] h(z)\, dz
\]

In order to find the ML estimator, one has to rely on an iterative process described by Moustaki and Knott (2000), who propose to approximate the integrals by Gauss–Hermite quadrature with k weights ϕ(z_{tj}) and abscissae z_{tj} for each latent variable j = 1, ..., q, giving

\[
f(x; \alpha) = \sum_{t_1=1}^{k} \cdots \sum_{t_q=1}^{k} h(z_t) \left[ \prod_{i=1}^{p} g_i(x_i \mid z_t, \alpha_i) \right]
\]

and

\[
s_i(x;\alpha) = \sum_{t_1=1}^{k} \cdots \sum_{t_q=1}^{k}
\frac{h(z_t)\, g(x \mid z_t, \alpha)}{f(x;\alpha)}\,
\mathrm{vec}\!\left[ \frac{x_i - \dot b_i(\theta_{it})}{\phi_i}\, z_t \right]
\qquad (4)
\]

with h(z_t) = \prod_{j=1}^{q} \varphi(z_{t_j}) and z_t = [z_{t_1}, \ldots, z_{t_q}].
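The quadrature step can be illustrated with a minimal sketch (one latent variable, two binary logit items; the α values are the illustrative ones used in Section 3, while the implementation details are ours, not the authors'). Probabilists' Gauss–Hermite nodes are used so that the N(0,1) density h(z) is absorbed into the weights, and the check is that the four response-pattern probabilities sum to one:

```python
import numpy as np
from itertools import product

# One latent variable, two binary (logit) items; rows are [alpha_i0, alpha_i1].
alpha = np.array([[1.0, 0.7],
                  [0.8, 1.0]])

k = 20                                         # number of quadrature points
z_t, w_t = np.polynomial.hermite_e.hermegauss(k)
w_t = w_t / np.sqrt(2.0 * np.pi)               # weights now integrate against N(0,1)

def f_approx(x):
    """Quadrature version of f(x; alpha) = int prod_i g_i(x_i|z) phi(z) dz."""
    theta = alpha[:, [0]] + alpha[:, [1]] * z_t[None, :]   # p x k canonical params
    p1 = 1.0 / (1.0 + np.exp(-theta))                      # P(x_i = 1 | z_t)
    g = np.where(np.array(x)[:, None] == 1, p1, 1.0 - p1)  # g_i(x_i | z_t)
    return float(np.sum(w_t * np.prod(g, axis=0)))

total = sum(f_approx(x) for x in product([0, 1], repeat=2))
print(total)   # the four pattern probabilities sum to ~1
```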

3 Robustness properties of the ML estimator for the GLT model

In this section we study the robustness properties of the ML estimator for the GLT model by means of the IF and the self-standardized sensitivity. The features of the GLT model are relatively complicated, which is why both approaches are necessary. We study in turn the model parameters α and the scale parameters, and restrict the study to the case of a mixture of normal and binary variables with one latent variable.

3.1 Model parameters α

To study the robustness properties of the ML estimator for the GLT model as defined by (2), we use the IF. For ML estimators with score function s, it is given by

\[
\mathrm{IF}(x; \hat\alpha, F) = M(s, F)^{-1}\, s(x, \alpha)
\]

where

\[
M(s, F) = \int \cdots \int s(x, \alpha)\, s^T(x, \alpha)\, f(x; \alpha)\, dx
\]

and s(x;α)^T = [s_i(x;α)^T]_{i=1,...,p} (see Hampel et al. 1986). It is therefore proportional to the score function. For the GLT model, the score function, given in (4), depends on the point of contamination x through the quantities f(x;α), g(x|z,α) = \prod_{i=1}^{p} g_i(x_i|z,α_i) and x_i. For example, an extreme value for the ith manifest variable influences not only the ML estimator of α_i corresponding to that manifest variable, but also the other estimates; the ML estimator of the whole vector α can, in principle, be influenced by extreme data. What is not clear is the size of the IF for different types of variables. Indeed, the quantity

\[
\frac{x_i - \dot b_i(\theta_i)}{\phi_i}
\]

can be very large if x_i is far away from its expectation, but at the same time its density g_i(x_i|z,α_i) becomes very small, and the behaviour of g(x|z,α)/f(x;α) is not straightforward to study. One could also expect the IF to be bounded,


since for extreme values in x the corresponding conditional density g(x|z, α) should be very small or even zero.

In order to investigate this point, we computed the IF for each parameter as a function of one of the x_i in x. The model we chose is a one-factor model fitted to two binary (i = 1, 2) and three normal (i = 3, 4, 5) manifest variables, with parameter values

α1 = [1.0,0.7]

α2 = [0.8,1.0]

α3 = [2.0,0.6] and φ3 = 1

α4 = [2.5,0.7] and φ4 = 1

α5 = [3.0,0.8] and φ5 = 1

Figure 1 shows the IF for each parameter of the model when the third manifest variable (i.e. the first normal variable) takes values between -50 and 50 (the other manifest variables are set to a value of 1). We suppose the scale parameters known. One can see that the IF is bounded in a natural way when x_3 becomes really extreme, which can be explained by the conditional density g(x|z) going to 0. However, the size of the bias, which is proportional to the IF, can be quite large for all parameters. It is largest for the parameters corresponding to the first normal variable, i.e. the one that carries the contamination.

In order to have an idea of the size of the (asymptotic) bias of the ML estimator, we can compute the self-standardized sensitivity given in (1) under different contamination settings and also different parameter values. The asymptotic covariance matrix of the ML estimates is the inverse of the information matrix

\[
\left[ \int \cdots \int s(x;\alpha)\, s(x;\alpha)^T f(x;\alpha)\, dx \right]^{-1}
\]

Therefore, γ is

\[
\gamma(T, F_\theta) = \sup_x \left[ s(x, \alpha)^T
\left( \int \cdots \int s(x, \alpha)\, s^T(x, \alpha)\, f(x;\alpha)\, dx \right)^{-1}
s(x, \alpha) \right]^{1/2}
\]

For different combinations of contaminations (i.e. the 1st, 2nd and/or 3rd normal variable taking extreme values), and for the case when the scale parameters are known, we found that γ ≈ 388, which means that for a small amount of contamination, say 1%, the bias on the ML estimates can be as large as 3.88!


3.2 Scale parameter

The scale parameter φi is also estimated via the ML equation given by (3).

As for the α parameters, it is difficult to study the impact of model deviations such as extreme observations on the scale estimates by just looking at the expression of the IF. As before, and for the same model, we computed the IF for the scale parameter when the value of one of the manifest variables is varied (here x_3). As one can see in Figure 2, the score function is bounded for extreme values in x_3, probably because the conditional density g(x|z) goes to 0. However, the value of the IF can be very large, especially for the scale estimate corresponding to the variable whose values are varied.

The self-standardized sensitivity when the scale parameter is also estimated by means of the ML estimator is this time γ ≈ 1402, i.e. more than 3 times larger than in the case where the scale parameters are known.

The study of the IF and the self-standardized sensitivity gives an idea of the asymptotic bias of the ML estimator. To make the point even stronger, we perform simulation studies in the next section, together with simulation studies for robust estimators.

4 Robust estimation for the GLT model

4.1 Robust M-estimators

Several classes of estimators in which one can find robust estimators have been defined (see e.g. Hampel et al. 1986). The best-known class is that of M-estimators, defined by Huber (1964) as a generalisation of the ML estimator: a relatively general function ψ (see Huber 1981) replaces the score function, leading to an M-estimator defined implicitly as the solution in α of

\[
\sum_{h=1}^{n} \psi(x_h; \alpha) = 0 \qquad (5)
\]

It is known that the IF of M-estimators is proportional to ψ (see Hampel et al. 1986) so that choosing a bounded ψ or controlling the bound on ψ defines a robust estimator. For GLT models, we can easily generalize the ML estimator to M-estimators. In the following subsections we present and analyze several proposals of M-estimators for the α parameters.
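Since the IF of an M-estimator is proportional to ψ, bounding ψ bounds the influence. The classic special case (our own illustration, outside the GLT setting) is Huber's location estimator, solved here by iteratively reweighted averaging; c = 1.345 is a conventional tuning constant, not a value from the paper:

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=200):
    """Solve sum_h psi(x_h - mu) = 0 with psi(r) = clip(r, -c, c),
    via iteratively reweighted averaging (weights w = psi(r)/r)."""
    mu = np.median(x)                      # robust starting point
    for _ in range(max_iter):
        r = x - mu
        w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))  # Huber weights
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(1)
contaminated = rng.normal(0.0, 1.0, 500)
contaminated[:15] = 20.0                   # 3% gross errors at the value 20

print(np.mean(contaminated))               # pulled far toward 20
print(huber_location(contaminated))        # stays near the true value 0
```

The bounded ψ caps each observation's contribution at c, so the fifteen gross errors move the estimate by at most a small amount, exactly as the IF argument predicts.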


4.1.1 Optimal bias-robust estimator

Among the M-estimators, the one which has the smallest covariance matrix under the constraint of a bounded IF (see Hampel et al. 1986) is the optimal bias-robust estimator (OBRE). The latter is defined for any score function as

\[
\frac{1}{n}\sum_{h=1}^{n} A(\alpha)\left[ s(x_h;\alpha) - a(\alpha) \right] w_c(x_h) = 0
\]

where the weight function w_c is deduced from the Huber function with parameter c and is given by

\[
w_c(x;\alpha) = \min\left\{ 1\,;\; \frac{c}{\left\| A(\alpha)\left[ s(x;\alpha) - a(\alpha) \right] \right\|} \right\}
\]

and the p(q+1) × p(q+1) matrix A(α) and the p(q+1) vector a(α) are implicitly defined through

\[
\int \cdots \int \left[ s(x;\alpha) - a(\alpha) \right]\left[ s(x;\alpha) - a(\alpha) \right]^T w_c(x)^2 f(x;\alpha)\, dx
= \left[ A(\alpha)^T A(\alpha) \right]^{-1}
\]

\[
\int \cdots \int \left[ s(x;\alpha) - a(\alpha) \right] w_c(x)\, f(x;\alpha)\, dx = 0 \qquad (6)
\]

Needless to say, solving the implicit equations for A(α) and a(α) is a rather complicated procedure. The parameter space becomes very large even in rather simple problems. For example, if we have p = 10 and we fit only one latent variable, then A(α) is of dimension 20×20 (and even bigger if the scale parameters need to be estimated as well)! We therefore propose to explore other, simpler M-estimators.

4.1.2 Residual-robust estimator

If one looks at each element of the score function defining the ML estimator, i.e.

\[
\frac{1}{n}\sum_{h=1}^{n} \frac{1}{f(x_h;\alpha)} \int \cdots \int g(x_h \mid z,\alpha)\,
\mathrm{vec}\!\left[ \frac{x_{ih} - \dot b_i(\theta_{it})}{\phi_i}\, z \right] h(z)\, dz,
\]

one notices that each observation of a variable i contributes to the score function proportionally to the typical deviation (x_{ih} − ḃ_i(θ_{it}))/φ_i. If the observed value x_{ih} for the ith variable is large with respect to its conditional expectation ḃ_i(θ_{it}), then the ML estimator could be attracted by this observation, in that the estimated values would be biased towards it. This is then a


possible quantity to bound. However, in order to be able to choose a value for the bounding constant, it is better to standardize the deviations by the square root of the conditional variances, var[x_i|z] = \ddot b_i(\theta_{it})\phi_i, where \ddot b_i(\theta_{it}) = \partial^2 b(\theta_{it})/\partial \theta_i^2. We therefore propose to bound the following quantity: ∀ x_i, z_t,

\[
\frac{\left| x_i - \dot b_i(\theta_{it}) \right|}{\sqrt{\ddot b_i(\theta_{it})\,\phi_i}} \le c \qquad (7)
\]

so that a unique value for the bounding parameter can be set. The M-estimator we propose is then given implicitly by

\[
\sum_{h=1}^{n} \psi(x_h;\alpha) = \sum_{h=1}^{n} \left[ \psi_i(x_h;\alpha) \right]_{i=1,\ldots,p} = 0 \qquad (8)
\]

with

\[
\psi_i(x;\alpha) = \int \cdots \int \frac{g(x \mid z,\alpha)}{f(x;\alpha)}\,
\mathrm{vec}\!\left[ \frac{x_i - \dot b_i(\theta_{it})}{\phi_i}\, w_c(x_i,\alpha_i)\, z \right] h(z)\, dz
\]

and where

\[
w_c(x_i,\alpha_i) =
\begin{cases}
1 & \text{if } \left| x_i - \dot b_i(\theta_{it}) \right| \le c \sqrt{\ddot b_i(\theta_{it})\,\phi_i} \\[1ex]
\dfrac{c\sqrt{\ddot b_i(\theta_{it})\,\phi_i}}{\left| x_i - \dot b_i(\theta_{it}) \right|} & \text{otherwise}
\end{cases}
\qquad (9)
\]

It should be stressed that the residual-robust estimator is not defined for the scale parameters. If the scale parameters need to be estimated, one has to look for a robust estimator of them as well. We do not propose one here, because later in the paper another robust estimator will be proposed for all the model parameters, including the scale.
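For a normal item with identity link, ḃ(θ) = θ and b̈(θ) = 1, so the weight in (9) reduces to a Huber weight on the residual measured in conditional standard deviations; a minimal sketch (the numerical values are illustrative, not from the paper):

```python
import math

def w_c(x_i, bdot, bddot, phi_i, c=2.0):
    """Huber-type residual weight of equation (9): full weight inside
    c conditional standard deviations, downweighted outside."""
    sd = math.sqrt(bddot * phi_i)          # sqrt of var[x_i|z] = bddot * phi_i
    resid = abs(x_i - bdot)
    return 1.0 if resid <= c * sd else c * sd / resid

# Normal item with identity link: bdot = theta, bddot = 1; say theta = 2, phi = 1.
print(w_c(2.5, bdot=2.0, bddot=1.0, phi_i=1.0))   # small residual -> weight 1
print(w_c(20.0, bdot=2.0, bddot=1.0, phi_i=1.0))  # gross outlier -> 2/18 ~ 0.11
```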

To see whether the estimator really downweights extreme values, one can look at the IF, presented in Figure 3 for the parameters of the mixed GLT model with two binary and three normal manifest variables, when the third manifest variable (i.e. the first normal variable) takes values between -50 and 50 (the other manifest variables are set to a value of 1). It should be compared to Figure 1. One can see that, overall, the influence of extreme data is much more limited than with the ML estimator. It is, however, not certain that the maximal bias is small, and hence the self-standardized sensitivity needs to be computed. We found γ ≈ 69 (when the scale parameters are known), which means that the bias can be as large as 0.69 for 1% of contaminated data. Is this a large quantity? In several simulation studies, we have found that the residual-robust estimator can be biased even with bounding constants c as low as 2. The reason is that the weights in (9) do not take into account the ratio of the conditional to the joint probability, g(x_h|z,α)/f(x_h;α), which can in practice take very large values for some atypical observations. It is therefore important to propose an M-estimator that limits the influence of all elements of the score function.

4.1.3 Globally weighted robust estimator

It is becoming clear that in order to limit the influence of an extreme observation on the resulting estimator, one has to bound the whole score function. This is actually what the OBRE does but, as we have seen, it is probably too complicated to implement for GLT models. What we propose here is a similar version given by

\[
\frac{1}{n}\sum_{h=1}^{n} \psi_c(x_h;\alpha)
= \frac{1}{n}\sum_{h=1}^{n} \left[ s_i(x_h;\alpha)\, w_i(x_h, c) \right]_{i=1,\ldots,p} = 0
\]

where the weight function w_i is the Huber function with parameter c, given by

\[
w_i(x; c) = \min\left\{ 1\,;\; \frac{c}{\left\| s_i(x;\alpha) \right\|} \right\}
\]

and

\[
s_i(x;\alpha) = \frac{1}{f(x;\alpha)} \int \cdots \int g(x \mid z,\alpha)\,
\mathrm{vec}\!\left[ \frac{x_i - \dot b(\theta_i)}{\phi_i}\, z \right] h(z)\, dz
\]

Note that one could choose different bounding constants c, one for each manifest variable. We did not, however, explore this possibility because we want to keep the robust estimator as simple as possible. Note also that this robust estimator is also defined for the scale parameters. The IF of this globally weighted robust (GWR) estimator with bounding constant c = 4 for the parameters of the mixed GLT model with two binary and three normal manifest variables, when the third manifest variable (i.e. the first normal variable) takes values between -50 and 50 (the other manifest variables are set to a value of 1), are given in Figure 4. One can notice that the IF are now bounded at a lower level, so that the bias should be smaller. One can also set the bounding constant to a lower value, say c = 2, and get the IF given in Figure 5. The information we can extract from these IF is that, globally, the bias should be limited, except maybe for the parameters of the binary variables α_{11} and α_{12}.

Finally, the self-standardized sensitivity for the GWR estimator with c = 4 and c = 2 is γ ≈ 23.78 and γ ≈ 11.89, respectively. The bias can then be limited to about 0.1 or 0.3 for 1% or 3% of contaminated data.


4.2 Consistency

For an M-estimator defined generally through a ψ-function as

\[
\sum_{h=1}^{n} \psi(x_h;\alpha) = 0,
\]

Fisher consistency implies

\[
\int \cdots \int \psi(x;\alpha)\, f(x;\alpha)\, dx = 0
\]

When this is not the case, one can make the M-estimator Fisher consistent by adding a proper correction term to its definition, i.e.

\[
\frac{1}{n}\sum_{h=1}^{n} \psi(x_h;\alpha) - a(\alpha) = 0
\]

such that

\[
a(\alpha) = \int \cdots \int \psi(x;\alpha)\, f(x;\alpha)\, dx
\]

For the GWR estimator, we have

\[
a(\alpha) = \left[ \int \cdots \int g(x \mid z,\alpha)\,
\mathrm{vec}\!\left[ \frac{x_i - \dot b_i(\theta_i(z))}{\phi_i}\, w_i(x, c)\, z \right] dx\, h(z)\, dz
\right]_{i=1,\ldots,p}
\]

This quantity is not obvious to compute but, as we will see from our simulations, if there is a bias (under the true model), then it is very small.

4.3 Efficiency

The asymptotic covariance matrix of M-estimators defined in (5) is given by

\[
V(\psi, \alpha) = M^{-1}(\psi, \alpha)\, Q(\psi, \alpha)\, M^{-T}(\psi, \alpha) \qquad (10)
\]

where

\[
M(\psi, \alpha) = \int \cdots \int \psi(x;\alpha)\, s^T(x;\alpha)\, f(x;\alpha)\, dx
\]

and

\[
Q(\psi, \alpha) = \int \cdots \int \psi(x;\alpha)\, \psi^T(x;\alpha)\, f(x;\alpha)\, dx
\]

These quantities are very complicated to compute unless some simplifications are made. First, the quadrature points are used to compute the integrals involved in s(x;α) and in ψ(x;α), as given in (4) and (8) respectively. Then we propose to compute the sample version of V(ψ, α), obtained with

\[
M(\psi, \alpha) = \frac{1}{n}\sum_{h=1}^{n} \psi(x_h;\alpha)\, s^T(x_h;\alpha)
\]

and

\[
Q(\psi, \alpha) = \frac{1}{n}\sum_{h=1}^{n} \psi(x_h;\alpha)\, \psi^T(x_h;\alpha)
\]

The sample is simulated given the values of the model parameters. With the parameter values used in our previous simulation studies and a simulated (uncontaminated) sample of 1000 observations, we found the relationship between the efficiency of the GWR estimator and the bounding constant c given in Figure 6. In particular, for an efficiency ratio of 95%, one can use a bounding constant of c = 1.1, whereas a bounding constant of c = 2 leads to an efficiency ratio of 98.7%. It should be noted that in principle the efficiency depends on the parameter values. A strategy that is often adopted in such cases is to try different bounding constants c and compute the efficiency given the values of the estimates.

Note that inference for each estimated parameter can be performed, since the asymptotic covariance matrix for any M-estimator is given by (10). It can be estimated in the same way as is done for computing the efficiency.
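The sample sandwich formula can be illustrated in the simplest scalar case, Huber's location estimator at a standard normal model (a sketch under our own simplifications; the score there is s(x) = x − μ, and c = 1.345 is the classical tuning constant, not a value from the paper):

```python
import numpy as np

def sandwich_variance(x, mu, c=1.345):
    """Sample version of V = M^{-1} Q M^{-T} for Huber's location
    estimator at the normal model, where the score is s(x) = x - mu."""
    r = x - mu
    psi = np.clip(r, -c, c)               # bounded psi-function
    M = np.mean(psi * r)                  # (1/n) sum psi(x_h) s(x_h)
    Q = np.mean(psi ** 2)                 # (1/n) sum psi(x_h)^2
    return Q / M ** 2                     # scalar case of M^{-1} Q M^{-T}

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 100_000)
V = sandwich_variance(x, mu=0.0)
print(V)   # close to 1/0.95 ~ 1.053, the well-known 95%-efficiency value
```

Here the sample average of ψ(r)·s(x) estimates M: for the normal location score s(x) = x − μ, Stein's identity gives E[ψ(r) r] = E[ψ′(r)], the usual denominator of the Huber asymptotic variance.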

5 Simulation study

In this section, we present a small simulation study that should enable one to confirm the results found theoretically. In particular, we would like to check that the bias under contaminated data is smaller with the GWR estimator and that in some settings the ML estimator can be seriously biased.

Moreover, we would like to have an idea of the bias of the GWR estimator under no contamination, i.e. at the true model. To that end, we simulated 100 samples of size 500 from the mixed GLT model used previously. We also contaminated the data in two different ways. In one case we chose 3% of the observations of the first normal variable (i.e. x_3) at random and set them to an arbitrary value (20), whereas in the second case we chose 3% of the subjects at random and set their responses on all the normal variables to the arbitrary value (20). We then estimated the 10 parameters of the GLT model; the results are presented as boxplots in Figure 7 for the binary items, in Figure 8 for the means (α_{0i}) of the normal items, and in Figure 9 for the latent variable parameters (α_{1i}) of the normal items. The horizontal lines correspond to the true parameter values. The first two boxplots in each graph are the distributions of the estimators under no model contamination, the following two when only one normal variable is contaminated (3%), and the last two when all three normal variables are contaminated (3%). One can first notice that at the model, i.e. when there is no contamination, the GWR estimator is unbiased or its bias is very small. When there is contamination, the ML estimator is biased, not always in the same manner, but the GWR estimator is either unbiased or at least less biased. Therefore, for a small efficiency loss (less than 2%), there is a clear gain in using the GWR estimator we proposed.
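The two contamination schemes can be sketched as follows (a minimal sketch; the seed and variable names are ours, and the clean part uses the normal-item means α_{i0} = 2.0, 2.5, 3.0 with unit scale, as in Section 3):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps, spike = 500, 0.03, 20.0
m = int(round(eps * n))                    # 15 contaminated cases

# Clean normal items x3, x4, x5 with the means used previously.
X = rng.normal(loc=[2.0, 2.5, 3.0], scale=1.0, size=(n, 3))

# Scheme 1: set 3% of the observations of x3 only to the value 20.
X1 = X.copy()
X1[rng.choice(n, size=m, replace=False), 0] = spike

# Scheme 2: set all normal responses of 3% of the subjects to the value 20.
X2 = X.copy()
X2[rng.choice(n, size=m, replace=False), :] = spike

print((X1 == spike).sum(), (X2 == spike).sum())   # 15 vs 45 contaminated cells
```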

We also tried a more robust estimator by setting the bounding constant to c = 1.1. The results are presented in Figures 10, 11 and 12. With this lower bounding constant, the GWR estimator for the binary parameters becomes unbiased under contamination. It behaves similarly to the c = 2 version for the parameters (α_{1i}) of the normal items, except that under the model it seems more biased. A correction for the bias therefore seems necessary.

6 Conclusion

In this paper we have shown that the ML estimator for the GLT model, at least when binary and normal manifest variables are mixed, can be biased when the data are not exactly generated by the postulated model. This is for example the case when there are extreme observations, i.e. subjects not behaving like the majority. We have investigated this robustness problem by means of the IF and the self-standardized sensitivity. We have also proposed some robust alternative estimators for the α parameters, one of which seems satisfactory in terms of limited bias, efficiency and computational complexity. What we have not yet investigated is the estimation of the scale parameters, which are needed to compute confidence intervals for the α parameters. This problem is left for future research.


References

Bartholomew, D. J. and M. Knott (1999). Latent Variable Models and Factor Analysis. Kendall's Library of Statistics 7. London: Arnold.

Hampel, F. R. (1968). Contribution to the Theory of Robust Estimation. Ph.D. thesis, University of California, Berkeley.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association 69, 383–393.

Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley.

Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics 35, 73–101.

Huber, P. J. (1981). Robust Statistics. New York: John Wiley.

Lee, S.-Y., W.-Y. Poon, and P. M. Bentler (1992). Structural equation models with continuous and polytomous variables. Psychometrika 57, 89–105.

Moustaki, I. (1996). A latent trait and a latent class model for mixed observed variables. British Journal of Mathematical and Statistical Psychology 49, 313–334.

Moustaki, I. (2000). A latent variable model for ordinal variables. Applied Psychological Measurement 24, 211–223.

Moustaki, I. and M. Knott (2000). Generalized latent trait models. Psychometrika 65, 391–411.

Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical and continuous latent variable indicators. Psychometrika 49, 115–132.

Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika 44, 443–460.

Olsson, U., F. Drasgow, and N. Dorans (1982). The polyserial correlation coefficient. Psychometrika 47, 337–347.

O'Muircheartaigh, C. and I. Moustaki (1999). Symmetric pattern models: A latent variable approach to item non-response in attitude scales. Journal of the Royal Statistical Society, Series A 162, 177–194.

Poon, W.-Y. and S.-Y. Lee (1987). Maximum likelihood estimation of multivariate polyserial and polychoric correlation coefficients (corr: V53 p301). Psychometrika 52, 409–430.

Figure 1: IF for the ML estimator of a mixed GLT model with two binary and three normal manifest variables.

Figure 2: IF for the ML estimator of the scale parameter of a mixed GLT model with two binary and three normal manifest variables.

Figure 3: IF for the residual-robust estimator (c = 4) of a mixed GLT model with two binary and three normal manifest variables.

Figure 4: IF for the GWR estimator (c = 4) of a mixed GLT model with two binary and three normal manifest variables.

Figure 5: IF for the GWR estimator (c = 2) of a mixed GLT model with two binary and three normal manifest variables.

Figure 6: Efficiency versus bounding constant c for the GWR estimator.

Figure 7: Distribution of the ML and GWR (c = 2) estimators for the binary parameters under different data contamination.

Figure 8: Distribution of the ML and GWR (c = 2) estimators for the means (α_{0i}) of the normal items under different data contamination.

Figure 9: Distribution of the ML and GWR (c = 2) estimators for the latent variable parameters (α_{1i}) of the normal items under different data contamination.

Figure 10: Distribution of the ML and GWR (c = 1.1) estimators for the binary parameters under different data contamination.

Figure 11: Distribution of the ML and GWR (c = 1.1) estimators for the means (α_{0i}) of the normal items under different data contamination.

Figure 12: Distribution of the ML and GWR (c = 1.1) estimators for the latent variable parameters (α_{1i}) of the normal items under different data contamination.
