Bounded-Influence Robust Estimation in Generalized Linear Latent Variable Models
MOUSTAKI, Irini, VICTORIA-FESER, Maria-Pia
Abstract
Latent variable models are used for analyzing multivariate data. Recently, generalized linear latent variable models for categorical, metric, and mixed-type responses estimated via maximum likelihood (ML) have been proposed. Model deviations, such as data contamination, are shown analytically, using the influence function and through a simulation study, to seriously affect ML estimation. This article proposes a robust estimator that is made consistent using the basic principle of indirect inference and can be easily numerically implemented. The performance of the robust estimator is significantly better than that of the ML estimators in terms of both bias and variance. A real example from a consumption survey is used to highlight the consequences in practice of the choice of the estimator.
MOUSTAKI, Irini, VICTORIA-FESER, Maria-Pia. Bounded-Influence Robust Estimation in Generalized Linear Latent Variable Models. Journal of the American Statistical Association, 2006, vol. 101, no. 474, p. 644-653
DOI : 10.1198/016214505000001320
Available at:
http://archive-ouverte.unige.ch/unige:6460
KEY WORDS: Indirect inference; Influence function; Latent variable models; Mixed variables; Robust estimation.
1. INTRODUCTION
Latent variable models are widely used in social sciences for studying the interrelationships among observed variables. More specifically, latent variable models are used for reducing the dimensionality of multivariate data, for assigning scores to sample members on the latent dimensions identified by the model, and for constructing measurement scales (e.g., in educational testing and psychometrics). Moustaki and Knott (2000) proposed a generalized linear latent variable model (GLLVM) framework for any type of observed data (metric or categorical) in the exponential family. They extended the work of Moustaki (1996) and Sammel, Ryan, and Legler (1997) for mixed binary and metric variables (the latter with covariate effects as well) and Bartholomew and Knott (1999) for categorical variables. A similar framework, which includes multilevel models (random-effects models) as a special case, was also discussed by Skrondal and Rabe-Hesketh (2004).
In the literature, the parameters of GLLVM are estimated using a classical maximum likelihood (ML) approach. However, ML estimation is based on the fundamental assumptions that the data are generated exactly from the model and, in particular, that there are no gross errors in the set of responses. For example, in the case of normal variables, a subject with a response more than 3 standard deviations away from the mean is considered an unexpected response under the normal model, which can be either an error (e.g., recording error) or just an unusual subject not representative of the sampled population.
For binary variables, an unexpected response occurs when the associated probability is low under the true model. To illustrate the proposed methodology, in Section 5 we attempt to construct a measurement scale for the construct “wealth” using five indicators (data collected from Swiss households). Two of these indicators are binary, recording the possession of a dishwasher and a car, and three are continuous, measuring expenditures on food, clothing, and housing. When a standard maximum likelihood estimator (MLE) is used, three of the five indicators are found to have significant loadings on the construct “wealth.” When a robust estimator that accounts for the presence of a few extreme observations in the data is used, one more indicator is added to the “wealth” scale.

Irini Moustaki is Associate Professor, Department of Statistics, Athens University of Economics and Business, 104-34 Athens, Greece (E-mail: [email protected]). Maria-Pia Victoria-Feser is Professor, Faculty of Economics and Social Sciences (HEC), University of Geneva 40, Geneva 4, Switzerland (E-mail: [email protected]). This work was supported in part by the Swiss National Science Foundation (grants 610-057883.99 and PP001-106465). The authors thank the two anonymous referees and the associate editor for their comments, which helped improve the quality of the article.

In this article we investigate the effect of an unexpected (unusual) set of responses on the MLE with respect to bias and efficiency. We show theoretically and through a simulation study that MLEs may change significantly if subjects that do not “fit the model” are present in the sample. That makes the MLE less stable (not robust), and therefore, in principle, one subject may change the conclusions drawn from the data analysis. This is a rather undesirable property of the estimation procedure. In that case, a robust estimator built to be resistant to model deviations should be developed and used in practice. The aim of this article is therefore twofold: to investigate the robustness properties of the MLE for the GLLVM both theoretically and through a simulation, and to propose a robust estimator.
The foundations of general robustness theory were set by Huber (1981) and Hampel, Ronchetti, Rousseeuw, and Stahel (1986). To assess some of the robustness properties of any statistic, such as an estimator or a test statistic, one can use the influence function (IF) (Hampel 1968, 1974). Hampel et al. (1986) showed that the asymptotic bias of an estimator is proportional to its IF. To build a robust estimator, one can consider a general class of estimators, such as M-estimators (Huber 1964), and choose one that has a bounded IF. The optimal bias robust estimator (OBRE), defined by Hampel et al. (1986), is the most efficient M-estimator with bounded IF for general parametric models. But the OBRE is very hard to compute when the models are complicated, like GLLVM. Other robust estimators like those based on weighted score functions, such as weighted MLEs (WMLEs) (see Dupuis and Morgenthaler 2002), can be used, but if the model is not based on symmetric models (such as the normal model), then care must be taken to avoid inconsistent estimators. To correct WMLEs for bias, Dupuis and Morgenthaler (2002) proposed a first-order approximation correction term. In this article we propose a simple M-estimator based on weighted score functions (i.e., a WMLE), and adapt indirect estimation (Gouriéroux, Monfort, and Renault 1993; Gallant and Tauchen 1996; Genton and Ronchetti 2003) to make the resulting estimator consistent.
The article is organized as follows. The GLLVM and the MLE of the model parameters are presented in Section 2.1. A robust estimator is presented in Sections 2.2 and 2.3, and its robustness, efficiency, and consistency properties are studied in Section 3 along with the robustness properties of the MLE. In Section 4 the behavior of the MLE and the robust estimator under model contamination is studied through a simulation study, and in Section 5 the consumption dataset is analyzed using both methods.
2. ESTIMATION OF GENERALIZED LINEAR LATENT VARIABLE MODELS
2.1 Approximate Maximum Likelihood Estimator
The basic idea of latent variable analysis is as follows. For a given set of response variables $x_1, \ldots, x_p$, one wants to find a set of latent variables or factors $z_1, \ldots, z_q$ that are fewer in number than the observed variables, but that contain essentially the same information. The factors are supposed to account for the dependencies among the response variables in the sense that if the factors are held fixed, then the observed variables are independent. This is known as the assumption of conditional or local independence.
The conditional distribution of $x_m \mid z$ ($z = [z_1, \ldots, z_q]$) is taken from the exponential family (with canonical link functions)

$$g_m(x_m \mid z, \theta_m) = \exp\left\{ \frac{x_m \alpha_m z^* - b(\alpha_m z^*)}{\phi_m} + c(x_m, \phi_m) \right\}, \qquad m = 1, \ldots, p,$$

with $\alpha_m = [\alpha_{m0}, \ldots, \alpha_{mq}]$, $z^* = [1, z_1, \ldots, z_q]^T$, and $\theta_m = (\alpha_m^T, \phi_m)^T$. The functions $b(\alpha_m z^*)$ and $c(x_m, \phi_m)$ take different forms depending on the distribution of the response variable $x_m$ (see McCullagh and Nelder 1989). Under the assumption of conditional independence, the joint marginal distribution of the manifest variables is

$$f(x; \theta) = \int \cdots \int \left[ \prod_{m=1}^{p} g_m(x_m \mid z, \theta_m) \right] \varphi(z)\, dz, \tag{1}$$

with $x = [x_1, \ldots, x_p]$, $\theta = [\theta_1^T, \ldots, \theta_p^T]^T$, and where $z$ is multivariate standard normal, that is, $\varphi(z) = \prod_{j=1}^{q} \varphi(z_j)$. The independence assumption for the latent variables can be relaxed. Moreover, Bartholomew (1988) showed that the choice of the latent variable distribution has a negligible effect on the interpretation of the results. He suggested using the normal distribution because it has rotational advantages when it comes to more than one latent variable.
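For a sample of size $n$, the log-likelihood is then $L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log f(x_i; \theta)$ with partial derivatives

$$\frac{\partial L(\theta)}{\partial \alpha_m^T} = \frac{1}{n} \sum_{i=1}^{n} s_m^{(1)}(x_i; \theta) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{f(x_i; \theta)} \int \cdots \int g(x_i \mid z, \theta)\, \frac{x_{mi} - b'(\alpha_m z^*)}{\phi_m}\, z^*\, \varphi(z)\, dz, \tag{2}$$

where $b'(x) = \frac{\partial}{\partial x} b(x)$. Note that $b'(\alpha_m z^*) = E[x_m \mid z]$ and that $b''(\alpha_m z^*) \phi_m = \mathrm{var}[x_m \mid z]$. The roots of (2) define the MLE $\hat{\alpha}_m$, $\forall m$. Differentiating the log-likelihood with respect to the scale parameter leads to

$$\frac{\partial L(\theta)}{\partial \phi_m} = \frac{1}{n} \sum_{i=1}^{n} s_m^{(2)}(x_i; \theta) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{f(x_i; \theta)} \int \cdots \int g(x_i \mid z, \theta) \left[ -\frac{x_{mi} \alpha_m z^* - b(\alpha_m z^*)}{\phi_m^2} + \frac{\partial}{\partial \phi_m} c(\phi_m, x_{mi}) \right] \varphi(z)\, dz. \tag{3}$$

The scale parameter $\phi$ for the case of binomial, multinomial, and Poisson distributed variables is 1. For the normal distribution, we have

$$\frac{\partial}{\partial \phi_m} c(\phi_m, x_{mi}) = .5 \left( \frac{x_{mi}^2}{\phi_m^2} - \frac{1}{\phi_m} \right).$$
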
To compute the MLE, one must solve (2) and (3). The integrals in (2) and (3) can be approximated using fixed Gauss–Hermite quadrature (see, e.g., Bock and Lieberman 1970), adaptive quadrature points (see, e.g., Bock and Schilling 1997; Schilling and Bock 2005), Monte Carlo approximations (see, e.g., Sammel et al. 1997), or Laplace approximation (see, e.g., Huber, Ronchetti, and Victoria-Feser 2004). All of these approximations lead to approximate MLEs. The models that we consider here are one-factor models, and even though it is known that the Gauss–Hermite rule can give biased estimators in some situations, we nevertheless use it to compute the integrals.
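As an illustration of the fixed Gauss–Hermite approach for a one-factor model, the marginal density (1) can be approximated as in the following minimal sketch. It assumes a logit link for binary items and an identity link for normal items (the canonical links); the function names, the `kinds` labels, and the default of 21 nodes are hypothetical choices made for this example, not taken from the article.

```python
import numpy as np

def gauss_hermite_normal(n_nodes=21):
    """Nodes and weights such that sum(w * h(z)) approximates E[h(Z)], Z ~ N(0, 1)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    return nodes, weights / np.sqrt(2.0 * np.pi)

def conditional_density(x, z, alpha, phi, kinds):
    """g(x|z) = prod_m g_m(x_m|z) for one response vector x and a scalar factor z."""
    dens = 1.0
    for x_m, (a0, a1), phi_m, kind in zip(x, alpha, phi, kinds):
        eta = a0 + a1 * z                      # linear predictor alpha_m z*
        if kind == "binary":                   # Bernoulli item, logit link, scale fixed at 1
            p = 1.0 / (1.0 + np.exp(-eta))
            dens *= p if x_m == 1 else 1.0 - p
        else:                                  # normal item with mean eta and variance phi_m
            dens *= np.exp(-0.5 * (x_m - eta) ** 2 / phi_m) / np.sqrt(2.0 * np.pi * phi_m)
    return dens

def marginal_density(x, alpha, phi, kinds, n_nodes=21):
    """Approximate f(x; theta) in (1) by a weighted sum over quadrature nodes."""
    nodes, weights = gauss_hermite_normal(n_nodes)
    return sum(w * conditional_density(x, z, alpha, phi, kinds)
               for z, w in zip(nodes, weights))

# One binary and one normal item (illustrative parameter values)
alpha = [(1.0, 0.7), (2.0, 0.6)]
f_val = marginal_density([1, 2.5], alpha, [1.0, 1.0], ["binary", "normal"])
```

For a single normal item the integral has a closed form (a normal density with variance equal to the squared loading plus the scale), which gives a convenient check of the quadrature.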
2.2 Robust M-Estimator
As we show in Section 3, the MLE is not robust to small model deviations (e.g., presence of outliers). Therefore, here we propose a robust estimator that belongs to the class of M-estimators (see Huber 1981), which has well-known properties. Given a relatively general function $\psi$ (see Huber 1981), an M-estimator is defined implicitly as the solution in $\theta$ of

$$\sum_{i=1}^{n} \psi(x_i; \theta) = 0. \tag{4}$$

It is known that choosing a bounded $\psi$ or controlling the bound on $\psi$ defines a robust estimator. A simple choice for $\psi$ is given by a weighted score function leading to a WMLE, with smaller weights when the score function becomes too large, that is,

$$\frac{1}{n} \sum_{i=1}^{n} \psi_c(x_i; \theta) = \frac{1}{n} \sum_{i=1}^{n} s(x_i; \theta)\, w(x_i; c) = 0. \tag{5}$$

The weight function $w$ can be defined through the Huber function with parameter $c$ given by

$$w(x; c) = \min\left\{ 1;\ \frac{c}{\| s(x; \theta) \|} \right\}, \tag{6}$$

where $\| \cdot \|$ denotes the Euclidean norm. Such weights guarantee a bounded IF for the corresponding estimator. For GLLVM, we have $s(x; \theta) = [s_m(x; \theta)^T]^T_{m=1,\ldots,p}$ with $s_m(x; \theta) = [s_m^{(1)}(x; \theta)^T\ s_m^{(2)}(x; \theta)^T]^T$, and the score functions $s_m^{(1)}$ and $s_m^{(2)}$ are given in (2) and (3). We call the resulting estimator a globally weighted robust (GWR) estimator.
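The weighting scheme in (5) and (6) can be sketched as follows, assuming the score vectors have already been evaluated and stacked row-wise in an array. The function name `huber_weights` and the example values are hypothetical.

```python
import numpy as np

def huber_weights(scores, c):
    """Weights w_i = min(1, c / ||s_i||) for an (n, d) array of score vectors."""
    norms = np.linalg.norm(scores, axis=1)
    with np.errstate(divide="ignore"):         # a zero score gets c / 0 = inf, hence weight 1
        return np.minimum(1.0, c / norms)

# Three illustrative score vectors with norms 5, 0.5, and 0
scores = np.array([[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]])
w = huber_weights(scores, c=2.5)
psi = scores * w[:, None]                      # weighted scores psi_c = s * w
```

By construction the norm of each weighted score is at most `c`, which is the boundedness property that the weights are meant to deliver.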
When an M-estimator defined generally through a $\psi$-function given in (4) is not consistent, one can make the M-estimator Fisher-consistent by adding a proper quantity in its definition, that is, $\frac{1}{n} \sum_{i=1}^{n} \psi(x_i; \theta) - a(\theta) = 0$ such that $a(\theta) = \int \cdots \int \psi(x; \theta) f(x; \theta)\, dx$. For the GWR estimator, we have that

$$a(\theta) = \int \cdots \int g(x \mid z, \theta) \left[ \frac{x_m - b'(\alpha_m z^*)}{\phi_m}\, z^{*T},\ \ -\frac{x_m \alpha_m z^* - b(\alpha_m z^*)}{\phi_m^2} + \frac{\partial}{\partial \phi_m} c(\phi_m, x_m) \right]^T_{m=1,\ldots,p} w(x; c)\, dx\, \varphi(z)\, dz.$$

The double integration with respect to $z$ and $x$ makes $a(\theta)$ hard to compute, and so we adopt a different approach that makes the GWR consistent, namely indirect inference.
2.3 Robust Indirect Estimator
Indirect estimation (see Gouriéroux et al. 1993; Gallant and Tauchen 1996) has been proposed as an estimation procedure for a complex model $F_\theta$ with intractable likelihood functions. It involves the computation of an estimator $\hat{\pi}(F_n)$ (where $F_n$ is the empirical distribution) for the parameters of an auxiliary model $F_\pi$ that does not provide a consistent estimator of $\theta$. In particular, let $\hat{\pi}(F_n)$ be an M-estimator defined implicitly by

$$\frac{1}{n} \sum_{i=1}^{n} \psi(x_i; \hat{\pi}(F_n)) = 0 \tag{7}$$

for a sample $\{x_1, \ldots, x_n\}$ supposedly generated from $F_\theta$. Here $\hat{\pi}(F_n)$ is a Fisher-consistent estimator of $\pi$ in that the $\psi$-function satisfies $\int \psi(x; \pi)\, dF_\pi(x) = 0$. When $\pi \neq \theta$, we suppose that a locally injective binding function $h$ exists such that $\theta = h^{-1}(\pi)$ and such that a consistent estimator of $\theta$ is given by $\hat{\theta}(F_n) = h^{-1}(\hat{\pi}(F_n))$ (see also Genton and de Luna 2000), with $\hat{\pi}(F_n)$ given by (7). The estimator $\hat{\theta}(F_n)$ is obtained implicitly as the solution in $\theta$ of

$$\int \psi(x; \hat{\pi}(F_n))\, dF_\theta(x) = 0. \tag{8}$$

When $F_n \to F_\theta$, we have

$$\int \psi(x; h(\theta))\, dF_\theta(x) = 0. \tag{9}$$

In other words, to make the estimator of $\theta$ Fisher-consistent, indirect estimation implicitly allows us to transform $\psi(x; \theta)$ into $\tilde{\psi}(x; \theta) = \psi(x; h(\theta))$.
The indirect estimator given in (8) results as a particular case of the general minimization problem defining indirect estimators, that is,

$$\hat{\theta} = \arg\min_{\theta} \left( \hat{\pi}(F_n) - h(\theta) \right)^T \Omega \left( \hat{\pi}(F_n) - h(\theta) \right), \tag{10}$$

in which $\Omega = I$. First-order conditions imply that

$$\frac{\partial}{\partial \theta} h(\theta)^T \left( \hat{\pi}(F_n) - h(\theta) \right) = 0, \tag{11}$$

and hence the solution is $h^{-1}(\hat{\pi}(F_n))$, if $\frac{\partial}{\partial \theta} h(\theta)^T$ is invertible, and it is obtained using (8).
For the model considered in this article, $\psi$ is given in (5) and (6), $\hat{\pi}(F_n)$ is the GWR estimator [i.e., the solution of (5) with $\theta$ replaced by $\pi$], and $F_\theta$ is the GLLVM with density given in (1) and with unknown $\theta$. Note that if the weights in (6) are all equal to 1 [i.e., $c := \infty$ in (6)], then the solution of (8) is $\hat{\theta}(F_n) = \hat{\pi}(F_n)$. Indeed, in that case $\psi(x; \pi) = s(x; \pi)$ and $\hat{\pi}(F_n)$ is the MLE that is unbiased, and therefore no correction is required.
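One can estimate the integral in (8) by simulating $n^*$ observations $x_i(\theta)$ from $F_\theta$ for a given $\theta$. To solve (8) with respect to $\theta$, we can use a Newton step given by

$$\hat{\theta}^{(k+1)} = \hat{\theta}^{(k)} - S^{-1}\left( \hat{\pi}, \hat{\theta}^{(k)} \right) \sum_{i=1}^{n^*} \psi\left( x_i\left( \hat{\theta}^{(k)} \right); \hat{\pi} \right), \tag{12}$$

with $\hat{\theta} := \hat{\theta}(F_n)$ and $\hat{\pi} := \hat{\pi}(F_n)$, and where

$$S(\pi, \theta) = \sum_{i=1}^{n^*} \psi\left( x_i(\theta); \pi \right) s^T\left( x_i(\theta); \theta \right) \tag{13}$$

is the sample approximation of the derivative of (8) with respect to $\theta$. Note that we can take $\hat{\theta}^{(1)} = \hat{\pi}$, and that the seed of the random number generator should be set to the same fixed value for all values of $\theta$ to ensure successful optimization. For efficiency reasons, $n^*$ should be chosen as large as possible (see, e.g., Genton and Ronchetti 2003) and can be set equal to $n^* = n \cdot l$, where $l$ is chosen a priori.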
Based on the foregoing, for the GLLVM, we propose as a robust estimator the converged $\hat{\theta}$ given in (12) [which is the solution of (8)], with $\hat{\pi}$ and $\psi$-function defined in (7) and (5). We call that estimator an indirect globally weighted robust (IGWR) estimator. In the Appendix we present an iterative procedure to compute it.
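The Newton iteration (12) with the simulated derivative (13) can be illustrated on a deliberately simple toy problem rather than the GLLVM: an exponential model with mean $\theta$, whose score is $(x - \theta)/\theta^2$. This is a sketch under that assumption only; the function names, the bisection solver for the auxiliary estimator, and the values of `c`, the sample size, and the seeds are all hypothetical choices.

```python
import numpy as np

def score(x, theta):
    """Score of the exponential model with mean theta."""
    return (x - theta) / theta ** 2

def psi(x, par, c):
    """Huber-weighted scalar score: s * min(1, c / |s|), i.e. s clipped to [-c, c]."""
    return np.clip(score(x, par), -c, c)

def gwr(x, c, iters=200):
    """Auxiliary estimator: a root in pi of sum_i psi(x_i; pi) = 0, by bisection."""
    lo, hi = 1e-8, float(x.max())              # the sum is positive at lo, negative at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(x, mid, c).sum() > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def indirect(pi_hat, c, n_star, seed=0, steps=50, tol=1e-10):
    """Solve (8) by the Newton iteration (12), reusing the same random draws."""
    e = np.random.default_rng(seed).exponential(1.0, n_star)  # fixed across all theta
    theta = pi_hat                             # starting value: the auxiliary estimate
    for _ in range(steps):
        xs = theta * e                         # x_i(theta), simulated from F_theta
        p = psi(xs, pi_hat, c)
        S = np.sum(p * score(xs, theta))       # scalar version of (13)
        step = p.sum() / S
        theta -= step
        if abs(step) < tol:
            break
    return theta

x = np.random.default_rng(1).exponential(2.0, 20_000)   # data with true mean 2
c = 1.0
pi_hat = gwr(x, c)                             # inconsistent: downweights the right tail
theta_hat = indirect(pi_hat, c, n_star=10 * len(x))
```

Because the exponential model is asymmetric, the weighted estimate `pi_hat` settles below the true mean, and the indirect step moves it back up. Reusing the same draws `e` for every value of `theta` is what the fixed-seed remark in the text amounts to in this sketch.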
We note that an alternative asymptotically equivalent estimator to (10) is defined as

$$\hat{\theta} = \arg\min_{\theta} \left[ \sum_{i=1}^{n^*} \psi(x_i(\theta); \hat{\pi}) \right]^T \Omega \left[ \sum_{i=1}^{n^*} \psi(x_i(\theta); \hat{\pi}) \right]$$

(see Gouriéroux et al. 1993; Gallant and Tauchen 1996). When $\Omega = I$, this quadratic form admits a unique solution at $\hat{\theta}$ given implicitly as the solution in $\theta$ of $\sum_{i=1}^{n^*} \psi(x_i(\theta); \hat{\pi}) = 0$. The Newton iterative procedure described in (12) produces this solution. We also note that Cabrera and Fernholz (1999) called the indirect estimator given in (8) a target estimator and studied it for M-estimators of location and scale.
Finally, it should be stressed that the IGWR estimator is general because it can be used in principle for any parametric model $F_\theta$ and can be extended to other types of $\psi$-functions.
3. STATISTICAL PROPERTIES OF THE ESTIMATORS
In this section we investigate the robustness properties of the MLE and the IGWR estimator for the GLLVM by means of the IF. We then look into the efficiency properties of the IGWR to develop guidelines for choosing the constant $c$ in (6).
3.1 Robustness Properties
The robustness properties of the MLE and IGWR estimator are investigated using the IF. For a multidimensional functional $\hat{\theta}$ at a model $F_\theta$, the IF is defined by

$$\mathrm{IF}(y, \hat{\theta}, F_\theta) = \lim_{\varepsilon \downarrow 0} \frac{\hat{\theta}(F_\varepsilon) - \hat{\theta}(F_\theta)}{\varepsilon},$$

where $F_\varepsilon = (1 - \varepsilon) F_\theta + \varepsilon \Delta_y$, with $\Delta_y$ the probability distribution with point mass of 1 at an arbitrary location $y$. When it exists, we can use $\mathrm{IF}(y, \hat{\theta}, F_\theta) = \frac{\partial}{\partial \varepsilon} \hat{\theta}(F_\varepsilon) \big|_{\varepsilon = 0}$. For MLEs $\hat{\theta}_{ML}$, the IF is proportional to the score function (see Hampel et al. 1986). For the GLLVM, the score function is given in (2) and (3) and is affected by the point of contamination $y$ through the quantities $f(y; \theta)$, $g(y \mid z, \theta) = \prod_{m=1}^{p} g_m(y_m \mid z, \theta_m)$, and $y_m$. Therefore, an extreme value in the $m$th manifest variable influences not only the MLE of $\alpha_m$ and $\phi_m$, corresponding to the contaminated manifest variable, but also the other estimates of the model. Actually, in principle, the MLE $\hat{\theta}$ of all model parameters can be influenced by extreme data. What is not clear is the size of the IF for different types of variables. Indeed, the quantity $(y_m - b'(\alpha_m z^*))/\phi_m$ can be very large if $y_m$ is far away from its expectation, but at the same time its conditional density $g_m(y_m \mid z, \theta_m)$ becomes very small and the behavior of $g(y \mid z, \theta)/f(y; \theta)$ is not straightforward to study. Moustaki and Victoria-Feser (2004) studied the behavior of the IF by means of a numerical example.
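To make the statement about the MLE concrete, its IF can be written out in the standard form found in Hampel et al. (1986), followed by a textbook special case that is not part of the GLLVM itself: a single normal variable with mean $\mu$ and known variance $\sigma^2$. The symbol $J(\theta)$ for the Fisher information matrix is introduced here only for this illustration.

```latex
\mathrm{IF}(y, \hat{\theta}_{ML}, F_\theta) = J(\theta)^{-1} s(y; \theta),
\qquad
J(\theta) = \int s(x; \theta)\, s^T(x; \theta)\, dF_\theta(x).

% Normal location with known variance \sigma^2:
s(y; \mu) = \frac{y - \mu}{\sigma^2}, \qquad
J(\mu) = \frac{1}{\sigma^2}, \qquad
\mathrm{IF}(y, \hat{\mu}_{ML}, F_\mu) = y - \mu .
```

In this special case the IF grows linearly in $y$ and is therefore unbounded, which is the sense in which a single extreme observation can carry the sample mean arbitrarily far.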
The following proposition provides the IF of the IGWR estimator $\hat{\theta}_{IGWR}$.
Proposition 1. Let $\hat{\theta}_{IGWR}(F_n) := \hat{\theta}_{IGWR}$ be the IGWR, defined implicitly as the solution of (11) based on the GWR $\hat{\pi}(F_n)$, a consistent estimator of $\pi$ of the auxiliary model $F_\pi$, and the $\psi$-function defined in (5) and (6). Suppose that the binding function $h$ exists and that it is locally injective and such that $\hat{\theta}(F_n) = h^{-1}(\hat{\pi}(F_n))$ is a consistent estimator of $\theta$. The IF of $\hat{\theta}_{IGWR}$ is then

$$\mathrm{IF}(y, \hat{\theta}_{IGWR}, F_\theta) = S^{-1}(\pi, \theta)\, \psi(y; \pi), \tag{14}$$

with $S(\pi, \theta) = \int \psi(x; \pi)\, s^T(x; \theta)\, dF_\theta(x)$.
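Proof. Writing $\hat{\theta}_{IGWR}$ as a functional $\hat{\theta}_{IGWR}(F_\varepsilon)$ of $F_\varepsilon$, (11) becomes

$$\left. \frac{\partial}{\partial \theta} h(\theta)^T \right|_{\theta = \hat{\theta}_{IGWR}(F_\varepsilon)} \left( \hat{\pi}(F_\varepsilon) - h(\hat{\theta}_{IGWR}(F_\varepsilon)) \right) = 0.$$

Taking derivatives with respect to $\varepsilon$ at $\varepsilon = 0$, we get (see also Genton and de Luna 2000)

$$\mathrm{IF}(y, \hat{\theta}_{IGWR}, F_\theta) = \left[ \frac{\partial}{\partial \theta} h(\theta)^T \frac{\partial}{\partial \theta} h(\theta) \right]^{-1} \frac{\partial}{\partial \theta} h(\theta)^T\, \mathrm{IF}(y, \hat{\pi}, F_\theta) = B \left[ -\int \frac{\partial}{\partial \pi} \psi(x; \pi)\, dF_\theta(x) \right]^{-1} \psi(y; \pi), \tag{15}$$

with $B = \left[ \frac{\partial}{\partial \theta} h(\theta)^T \frac{\partial}{\partial \theta} h(\theta) \right]^{-1} \frac{\partial}{\partial \theta} h(\theta)^T$. Moreover, we can deduce $\frac{\partial}{\partial \theta} h(\theta)$ from (9) by taking derivatives with respect to $\theta$ [i.e., $\int \frac{\partial}{\partial \pi} \psi(x; \pi)\, dF_\theta(x)\, \frac{\partial}{\partial \theta} h(\theta) + \int \psi(x; \pi)\, s^T(x; \theta)\, dF_\theta(x) = 0$], so that $\frac{\partial}{\partial \theta} h(\theta) = -M^{-1}(\pi, \theta)\, S(\pi, \theta)$ with $M(\pi, \theta) = \int \frac{\partial}{\partial \pi} \psi(x; \pi)\, dF_\theta(x)$. Then

$$B = -\left[ S(\pi, \theta)^T M^{-T}(\pi, \theta)\, M^{-1}(\pi, \theta)\, S(\pi, \theta) \right]^{-1} S(\pi, \theta)^T M^{-T}(\pi, \theta), \tag{16}$$

and by replacing in (15), we get (14).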
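The final substitution can be spelled out as follows, under the assumption (implicit in the statement of (14)) that $S(\pi, \theta)$ and $M(\pi, \theta)$ are square and invertible. The arguments $(\pi, \theta)$ are dropped to lighten the notation.

```latex
\mathrm{IF}(y, \hat{\theta}_{IGWR}, F_\theta)
  = B\,(-M)^{-1}\,\psi(y; \pi)
  = \left[ S^T M^{-T} M^{-1} S \right]^{-1} S^T M^{-T} M^{-1}\, \psi(y; \pi)
  = S^{-1} M M^T S^{-T}\, S^T M^{-T} M^{-1}\, \psi(y; \pi)
  = S^{-1}\, \psi(y; \pi).
```

The two minus signs, one from (16) and one from the bracket in (15), cancel, and the factors involving $M$ drop out, so that only $S$ remains in the IF.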
Because $\psi$ given in (5) is bounded, the IF of $\hat{\theta}_{IGWR}$ is also bounded. Although in our context the $\psi$ defines the GWR, the result of the proposition applies to other M-estimators as well.
The IF measures directional effects of model deviations on the estimator. A more global measure is self-standardized sensitivity (Hampel et al. 1986), which is taken as the supremum in $y$ of a function of the IF. It is a measure of the asymptotic bias of the estimator $\hat{\theta}$ due to small model deviations (see Hampel et al. 1986, p. 175). Genton and Ronchetti (2003, prop. 1) showed that the indirect estimator has a self-standardized sensitivity smaller than or equal to the self-standardized sensitivity of the estimator of the auxiliary model. Because the IF of the latter is based on the $\psi$-function (5), which is bounded, the self-standardized sensitivity of the IGWR is then also bounded and so is the asymptotic bias of the IGWR estimator under small model deviations. That is not the case with the MLE, however. To make this point even more strongly, we perform simulation studies in Section 4 and compare the performance in terms of bias of the MLE to the IGWR estimator under small model deviations or data contamination.
3.2 Efficiency
To compute the IGWR estimator, we must choose the bound c in the weight function (6). Obviously, the smaller its value, the more robust (but also the less efficient) the estimator.
A strategy commonly used (for choosing an appropriate value for c) is to fix a degree of efficiency loss for the robust estimator compared to the MLE and choose c accordingly.
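From work of Genton and Ronchetti (2003), we can deduce the asymptotic covariance matrix of the IGWR estimator $\hat{\theta}$ and obtain $V_{\hat{\theta}} = B V_{\hat{\pi}} B^T + \frac{1}{l} B V_{\pi^*}(F_\theta) B^T$. Note that for an M-estimator as in (7), we have $V_{\hat{\pi}} = M^{-1}(\pi, \theta)\, Q(\pi, \theta)\, M^{-T}(\pi, \theta)$, with $Q(\pi, \theta) = \int \psi(x; \pi)\, \psi^T(x; \pi)\, dF_\theta(x)$. When $l$ is sufficiently large and using (16), we get $V_{\hat{\theta}} \cong S^{-1}(\pi, \theta)\, Q(\pi, \theta)\, S^{-T}(\pi, \theta)$, which can be estimated by

$$\hat{V}_{\hat{\theta}} = S^{-1}(\hat{\pi}, \hat{\theta})\, Q(\hat{\pi}, \hat{\theta})\, S^{-T}(\hat{\pi}, \hat{\theta}), \tag{17}$$

where $Q(\hat{\pi}, \hat{\theta})$ is computed as in (13).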
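A sandwich estimate of the form (17) can be sketched as below, assuming the values of $\psi$ and of the score have already been evaluated on simulated observations and stacked row-wise. Two choices here are assumptions of this sketch rather than statements from the article: sample means are used instead of sums, and efficiency is summarized by a ratio of traces. The usage example is a scalar normal location model with the clipping constant 1.345, a textbook setting rather than the GLLVM.

```python
import numpy as np

def sandwich_cov(psi_vals, score_vals):
    """S^{-1} Q S^{-T} from (n, d) arrays of psi and score values (sample means)."""
    n = psi_vals.shape[0]
    S = psi_vals.T @ score_vals / n            # approximates the integral of psi s^T
    Q = psi_vals.T @ psi_vals / n              # approximates the integral of psi psi^T
    S_inv = np.linalg.inv(S)
    return S_inv @ Q @ S_inv.T

def efficiency(psi_vals, score_vals):
    """Trace ratio of the MLE covariance to the robust covariance (assumed summary)."""
    v_mle = sandwich_cov(score_vals, score_vals)   # psi = s reduces to the inverse of Q
    v_rob = sandwich_cov(psi_vals, score_vals)
    return np.trace(v_mle) / np.trace(v_rob)

# Scores of N(mu, 1) at the true mu, and their Huber-clipped version
rng = np.random.default_rng(0)
s = rng.standard_normal((200_000, 1))
psi_vals = np.clip(s, -1.345, 1.345)
eff = efficiency(psi_vals, s)
```

Repeating the last three lines over a grid of clipping constants gives an efficiency curve of the same kind as the one described for the IGWR estimator, although the numbers for this toy model need not match those of the GLLVM.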
For a fixed value of $\theta$, we can use (17) to compute the efficiency of the IGWR estimator (vs. the MLE) as a function of the bounding constant $c$. Taking the same model and parameter values as in the simulation study (see Sec. 4), we simulated an (uncontaminated) sample of 1,000 observations and calculated the relationship between the efficiency of the IGWR estimator and the bounding constant $c$. This relationship is illustrated in Figure 1. In particular, for an efficiency ratio of 95%, we can use a bounding constant of approximately $c = 3.5$, whereas a bounding constant of $c = 2$ leads to an efficiency ratio of approximately 82%. It should be noted that in principle, efficiency depends on the parameter values. A strategy that is often adopted in practice is to first compute the robust estimator using a low value of $c$, compute the efficiency given the values of the estimates, and then increase the value of $c$ to achieve a required level of efficiency. This procedure may take a few steps to achieve the desired level of efficiency. We illustrate this procedure with a real example in Section 5.
Finally, (17) can be used to test the significance of the parameters.

Figure 1. Efficiency versus Bounding Constant c for the IGWR Estimator.

4. SIMULATION STUDY
In this section we present a simulation study that allows us to test the results that we found theoretically. In particular, we want to check whether the bias under contaminated data is smaller with the IGWR estimator and whether the MLE can be seriously biased in some settings. We simulated 150 samples of size 200 from the mixed GLLVM. The model that we chose is a one-factor model fitted to two binary ($m = 1, 2$) and three normal ($m = 3, 4, 5$) manifest variables with the following parameter values:
• $\alpha_1 = [1.0, .7]^T$
• $\alpha_2 = [.8, 1.0]^T$
• $\alpha_3 = [2.0, .6]^T$ and $\phi_3 = 1$
• $\alpha_4 = [2.5, .7]^T$ and $\phi_4 = 1$
• $\alpha_5 = [3.0, .8]^T$ and $\phi_5 = 1$.
We contaminated the data in three different ways. In one setting we randomly chose 3% of the observations of the first normal variable ($x_3$) and set them to an arbitrary value (20); in the second setting we randomly chose 3% of the subjects and set their responses on all three normal variables to an arbitrary value (20); in a third setting, we generated 3% of the data from the mixed GLLVM with $\alpha_5 = [3.0, 8]^T$ instead of $\alpha_5 = [3.0, .8]^T$. The first two types of contamination are known as point-mass contamination, where observed values are replaced by arbitrary values (labeled point-mass contamination 1 and point-mass contamination 3, according to the number of contaminated variables). The third type of contamination is known as typical model deviation, where a small proportion of the subjects do not belong to the same population. The 13 parameters of the GLLVM were estimated using three different estimators: the MLE, the IGWR estimator, and one-step IGWR (IGWR1), with bounding constant $c = 3.5$. The latter estimator is given by one iteration of (12), that is, $\hat{\theta}_{IGWR1} = \hat{\pi} - S^{-1}(\hat{\pi}, \hat{\pi}) \sum_{i=1}^{n^*} \psi(x_i(\theta = \hat{\pi}); \hat{\pi})$ with $\hat{\pi}$ being the GWR estimator. The simulation results are presented in the form of boxplots, with horizontal lines corresponding to the true parameter values. We also tried a different set of parameter values and found very similar results.
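The data-generating design and the three contamination schemes described above can be sketched as follows. The sketch assumes a logit link for the two binary items (the canonical link) and treats $\phi_m$ as the conditional variance of the normal items; the function names and the scheme labels `"point1"`, `"point3"`, and `"model"` are hypothetical.

```python
import numpy as np

# Rows: items m = 1..5; columns: intercept alpha_m0 and loading alpha_m1
ALPHA = np.array([[1.0, 0.7], [0.8, 1.0], [2.0, 0.6], [2.5, 0.7], [3.0, 0.8]])
PHI = np.array([1.0, 1.0, 1.0])                # scale parameters of the normal items

def simulate_gllvm(n, alpha=ALPHA, phi=PHI, rng=None):
    """Draw n subjects from the one-factor model: 2 binary items, then 3 normal items."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)                 # latent factor
    eta = alpha[:, 0] + np.outer(z, alpha[:, 1])          # (n, 5) linear predictors
    x = np.empty((n, 5))
    prob = 1.0 / (1.0 + np.exp(-eta[:, :2]))
    x[:, :2] = rng.random((n, 2)) < prob       # Bernoulli responses stored as 0/1
    x[:, 2:] = eta[:, 2:] + np.sqrt(phi) * rng.standard_normal((n, 3))
    return x

def contaminate(x, scheme, frac=0.03, value=20.0, rng=None):
    """Return a contaminated copy of x and the indices of the affected subjects."""
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    k = int(round(frac * x.shape[0]))
    idx = rng.choice(x.shape[0], size=k, replace=False)
    if scheme == "point1":                     # first normal item only
        x[idx, 2] = value
    elif scheme == "point3":                   # all three normal items
        x[idx, 2:] = value
    elif scheme == "model":                    # loading of the fifth item set to 8
        alpha_dev = ALPHA.copy()
        alpha_dev[4, 1] = 8.0
        x[idx] = simulate_gllvm(k, alpha=alpha_dev, rng=rng)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return x, idx

rng = np.random.default_rng(0)
clean = simulate_gllvm(200, rng=rng)
dirty, hit = contaminate(clean, "point3", rng=rng)
```

With samples of size 200 and a 3% rate, each scheme affects six subjects per sample.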
The effect of data contamination on the MLE is rather surprising. First, the estimated means of the normal variables ($\alpha_{30}$, $\alpha_{40}$, $\alpha_{50}$) are biased when the corresponding normal variable is contaminated (see Fig. 2 for $\alpha_{30}$; the graphs for the other two continuous variables are not presented here). Second, the loadings of the normal variables ($\alpha_{31}$, $\alpha_{41}$, $\alpha_{51}$) are biased only when all three normal variables are simultaneously contaminated and in the model deviation case (see Fig. 3 for $\alpha_{51}$; the other graphs are not presented here). Third, although the contamination is on the normal variables, the MLE of the loadings for the binary variables ($\alpha_{11}$, $\alpha_{21}$) (not the means $\alpha_{10}$ and $\alpha_{20}$) are biased when the contamination occurs on all normal variables (see Fig. 4 for $\alpha_{11}$; the other graphs are not presented here). Fourth, the scale parameter of the contaminated continuous variable is biased (see Fig. 5), as is that for the normal variable corresponding to the model deviation case (see Fig. 6).

Figure 2. Estimator Distributions for $\alpha_{30}$ (3% contamination). (a) No contamination; (b) point-mass contamination 1; (c) point-mass contamination 3; (d) model deviation.

Figure 3. Estimator Distributions for $\alpha_{51}$ (3% contamination). (a) No contamination; (b) point-mass contamination 1; (c) point-mass contamination 3; (d) model deviation.

In contrast, the proposed robust estimators are not biased (or are significantly less biased) under any type of contamination studied here. It also seems that the IGWR1 (one-step) performs as well as the IGWR in terms of robustness properties. Nonetheless, a more thorough study is needed before we can conclude that the two robust estimators are equivalent in practice. We also compared the behavior of the robust estimators for different values of the bounding constant $c$ when there is no data contamination. In summary, we found that there is no apparent difference in terms of bias and efficiency between the estimators for the binary parameters and the normal mean parameters. The behavior changes for the loadings of the normal observed variables and the scale parameters. For those parameters, it appears that the IGWR is in general less variable and less biased than the IGWR1 (see Fig. 7 for $\phi_5$; the other graphs are not presented here). The bias and efficiency loss increase when the bounding constant $c$ decreases. The best compromise between bias and robustness seems to be achieved with $c = 3$, which corresponds to an efficiency ratio compared with the MLE of about 92% (see Fig. 1).

Figure 4. Estimator Distributions for $\alpha_{11}$ (3% contamination). (a) No contamination; (b) point-mass contamination 1; (c) point-mass contamination 3; (d) model deviation.

Figure 5. Estimator Distributions for $\phi_3$ (3% contamination). (a) No contamination; (b) point-mass contamination 1; (c) point-mass contamination 3; (d) model deviation.

5. ANALYSIS OF WEALTH DATA
To study the impact of potential (small) model deviations in practice and to compare possible differences between a classical and a robust approach, we analyze a subsample of 100 households of the 1990 Switzerland consumption survey, provided by the Swiss Federal Statistical Office. The aim here is to construct a measurement scale for the level of wealth. For the purpose of this exercise, five variables were selected:
• Purchase of a dishwasher (1/0) (Dishwasher)
• Purchase of a car (1/0) (Car)
• Equivalent food expenditure in logarithm (Food)
• Equivalent expenditures for clothing in logarithm (Clothing)
• Equivalent expenditures for housing in logarithm (Housing).
The continuous variables are treated as normal variables. Variables from the same survey were analyzed using the GLLVM by Bartholomew and Knott (1999) and Huber et al. (2004). We fit a one-factor model using both the MLE and the IGWR estimator. The bounding constant $c$ was set to 5, corresponding to an efficiency level of 94% (computed on the parameter values provided by the IGWR). Table 1 presents the parameter values estimated by the MLE and the IGWR estimator together with their standard errors (with values in bold corresponding to significant variables at the 5% level).
The MLE shows that only the variables Food, Clothing, and Housing are indicators of wealth, whereas the IGWR adds the variable Dishwasher. Both analyses exclude the variable Car. Both methods find the variables Food and Housing to be indicators of the latent variable, whereas the association is stronger with the Clothing variable. For a diagnostic analysis, the weights given in (6) are computed for each observation at the IGWR values and plotted in Figure 8. Apparently there are (only) five outliers. If we were to draw scatterplots of the normal variables (not presented here), these outliers would also appear as outliers in the scatterplots, but there would also be other observations away from the bulk of the data that were not downweighted by the IGWR. This is not contradictory, because the IGWR gives weights according to departures of the data from the GLLVM, not necessarily from the correlation structure. This example shows that using a robust estimator can produce scales with different sets of indicators, and that this can be traced back to the data through the analysis of the robust weights.

Figure 6. Estimator Distributions for $\phi_5$. (a) No contamination; (b) point-mass contamination 1; (c) point-mass contamination 3; (d) model deviation.

Figure 7. Estimator Distributions for $\phi_5$. (a) IGWR; (b) IGWR1.

Table 1. Parameter Estimates and Standard Errors for the GLLVM, Wealth Data

                      MLE                        IGWR, c = 5
Parameter     Estimate   Standard error   Estimate   Standard error
Constants
  α10          −.506        .23            −.589        .26
  α20          −.623        .23            −.537        .23
  α30          6.922        .23            6.887        .28
  α40          5.353        .32            5.332        .32
  α50          7.087        .33            7.140        .29
Loadings
  α11           .466        .26             .679        .28
  α21          −.167        .24             .216        .25
  α31          1.021        .18            1.098        .21
  α41          1.412        .31            1.415        .28
  α51          1.044        .33            1.064        .27
Variances
  φ3            .289        .16             .426        .17
  φ4           1.280        .27            1.056        .20
  φ5           1.475        .22             .935        .14

6. CONCLUSION
In this article we have shown that the MLE for the GLLVM, at least when binary and normal manifest variables are analyzed, can be biased when the data are not exactly generated by the postulated model. We have investigated this robustness problem by means of the IF and a simulation study. We have then proposed a robust estimator that is found to perform satisfactorily in terms of mean squared error and computational complexity. The analysis of the real dataset has also shown that an alternative robust analysis can result in a different solution and interpretation of the analysis.
Although the simulations and the real example concern a GLLVM with a mixture of binary and metric responses and
Figure 8. IGWR Weights Against Observation Number for the Wealth Data.