Robust estimation of constrained covariance matrices for Confirmatory Factor Analysis


DUPUIS LOZERON, Elise, VICTORIA-FESER, Maria-Pia


DUPUIS LOZERON, Elise, VICTORIA-FESER, Maria-Pia. Robust estimation of constrained covariance matrices for Confirmatory Factor Analysis. 2009

Available at:

http://archive-ouverte.unige.ch/unige:5714


2009.02

FACULTE DES SCIENCES

ECONOMIQUES ET SOCIALES

HAUTES ETUDES COMMERCIALES


Elise Dupuis Lozeron and Maria-Pia Victoria-Feser, University of Geneva

November 28, 2008

Abstract

Confirmatory factor analysis (CFA) is a data analysis procedure that is widely used in the social and behavioral sciences and in other applied sciences that deal with large quantities of data (variables). The underlying model links a set of latent factors, which are supposed to correspond to latent concepts, to a larger set of observed (manifest) variables through linear regression equations. With CFA, it is not necessary that all manifest variables be linked to all latent factors, and the method is particularly useful for the construction of so-called measurement scales, such as depression scales in psychology. The classical estimation (and inference) procedures are based on either the maximum likelihood (ML) or the generalized least squares (GLS) approach. Unfortunately, these methods are known to be non-robust to model misspecification, which in the case of factor analysis in general, and of CFA in particular, is misspecification with respect to the multivariate normal model. A natural robust estimator is obtained by first estimating the (mean and) covariance matrix of the manifest variables robustly and then plugging this statistic into the ML or GLS estimating equations. This two-stage method, however, does not fully take into account the covariance structure implied by the CFA model. In this paper, we propose an S-estimator for the parameters of the CFA model that is computed directly from the data. We derive the estimating equations and an iterative procedure. The two estimators have different asymptotic properties, in that their asymptotic covariance matrices are not the same, and both depend on the model and the parameter values. We perform a simulation study to compare the finite-sample properties of both estimators and find that the direct estimator we propose is more stable (smaller MSE) than the two-stage estimator.


1 Introduction

Factor analysis (FA) is an old technique that is widely used not only as a data reduction technique, but also for the construction of measurement scales, as is done for example in psychology. It is also used for the analysis of data in social and behavioral sciences in general and other applied sciences that deal with large quantities of data (variables). Given a vector of observed (manifest) variables $X = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$ with mean vector $\mu = (\mu_1, \ldots, \mu_p)^T$ and covariance matrix $\Sigma$, the FA model states that

$$X - \mu = \Gamma f + \varepsilon$$

where $f = (f_1, \ldots, f_k)^T$ is the vector containing $k < p$ factors, $\Gamma$ is the $p \times k$ matrix of factor loadings and $\varepsilon$ is the error term. It is also assumed that $\mathrm{cov}(f, \varepsilon) = 0$, $E(f) = 0_k$, $E(\varepsilon) = 0_p$, $E(ff^T) = \Phi_{k \times k}$, $E(\varepsilon\varepsilon^T) = \Psi_{p \times p}$ and $\Psi = \mathrm{diag}\{\psi_1^2, \ldots, \psi_p^2\}$, where the $\psi_j^2$ are the so-called uniqueness. Moreover, since $E[(X-\mu)(X-\mu)^T] = E[(\Gamma f + \varepsilon)(\Gamma f + \varepsilon)^T]$, the FA model is also often presented as

$$\Sigma = \Gamma\Phi\Gamma^T + \Psi \qquad (1.1)$$

The loadings matrix $\Gamma$, containing say $q$ unknown factor loadings $\gamma_1, \ldots, \gamma_q$, can be either full for so-called exploratory FA (i.e. $q = p \cdot k$) or constrained (typically some loadings are set to 0) for so-called confirmatory FA (i.e. $q < p \cdot k$). In what follows, we will assume that the factors are independent, i.e. $\Phi = I_k$, but this hypothesis could be relaxed.
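As an illustration of (1.1) under the assumption $\Phi = I_k$, the following minimal sketch builds the implied covariance matrix from a constrained loadings matrix. The function name `implied_covariance` is hypothetical; the numerical values are those of the first simulation model (high signal to noise ratio) listed in Appendix B.1.

```python
import numpy as np

def implied_covariance(Gamma, psi2, Phi=None):
    """Return Sigma = Gamma Phi Gamma^T + Psi with Psi = diag(psi2), as in (1.1)."""
    Gamma = np.asarray(Gamma, dtype=float)
    k = Gamma.shape[1]
    # Assumption used in the text: independent factors, Phi = I_k.
    Phi = np.eye(k) if Phi is None else np.asarray(Phi, dtype=float)
    return Gamma @ Phi @ Gamma.T + np.diag(psi2)

# p = 6 manifest variables, k = 2 factors, q = 6 free loadings:
# each variable loads on exactly one factor (a confirmatory pattern).
Gamma = np.array([[-1.5, 0.0], [1.8, 0.0], [2.0, 0.0],
                  [0.0, 1.93], [0.0, -1.6], [0.0, 1.4]])
Sigma = implied_covariance(Gamma, np.ones(6))
```

Variables that load on different factors have zero implied covariance, which is the kind of structure that an unconstrained covariance estimate does not exploit.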

Given a sample of $n$ observations on the $p$ manifest variables, classical FA estimates $\mu$ by the sample mean $\bar{x}$ and $\Sigma$ by the sample covariance matrix $S$. Estimates for $\Psi$ and $\Gamma$ are then found by maximum likelihood (MLE), as $S$ is supposed to have a Wishart distribution if $X$ is multivariate normal (see Jöreskog 1967). However, it is well known that the sample mean and covariance are not robust estimators of their corresponding population parameters, in that small departures from the normality assumption on the manifest variables can seriously bias the FA parameter estimates (see e.g. Yuan and Bentler 1998, Yuan and Bentler 2001, Pison, Rousseeuw, Filzmoser, and Croux 2003). This is also true for alternative estimators to the MLE, such as generalized least squares (GLS) methods (see Browne 1984).

To avoid potential biases due to the violation of the normality assumption, one can simply estimate the mean and covariance of the manifest variables in a robust fashion and “plug in” the estimates in a classical objective function $L$ (to be minimized), such as the one for the MLE based on the Wishart distribution, i.e.

$$L_{ML}(\theta) = \log|\Sigma(\theta)| + \mathrm{tr}[S\Sigma(\theta)^{-1}] - \log|S| - p \qquad (1.2)$$

or the one for the GLS

$$L_{GLS}(\theta) = (\hat\sigma - \sigma(\theta))^T W (\hat\sigma - \sigma(\theta)) \qquad (1.3)$$

where $W$ is some weight matrix, $\theta$ is the vector of parameters of interest (the $\gamma_j$ and $\psi_j^2$), $\Sigma(\theta)$ depends on $\theta$ through (1.1), $\sigma(\theta) = \mathrm{vec}(\Sigma(\theta))$ and $\hat\sigma = \mathrm{vec}(S)$, where the operator $\mathrm{vec}(\cdot)$ stacks the columns of a matrix on top of each other.
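The two objective functions (1.2) and (1.3) can be sketched as follows. The function names are hypothetical, and taking $W$ as the identity by default is only an assumption made for illustration; in a two-stage robust procedure, `S` would be replaced by a robust covariance estimate.

```python
import numpy as np

def vec(A):
    """Stack the columns of A on top of each other."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

def ml_discrepancy(S, Sigma):
    """Objective (1.2): log|Sigma| + tr[S Sigma^{-1}] - log|S| - p."""
    p = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    # tr[Sigma^{-1} S] equals tr[S Sigma^{-1}]; solve avoids an explicit inverse.
    return logdet_Sigma + np.trace(np.linalg.solve(Sigma, S)) - logdet_S - p

def gls_discrepancy(S, Sigma, W=None):
    """Objective (1.3); W defaults to the identity (illustrative assumption)."""
    r = vec(S) - vec(Sigma)
    W = np.eye(r.size) if W is None else W
    return float(r @ W @ r)
```

Both functions are zero when `Sigma` equals `S` and positive otherwise (for positive definite arguments and positive definite `W`), so either can be minimized over $\theta$ through $\Sigma(\theta)$.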

This type of procedure, which has been formalized in e.g. Yuan and Bentler (1998) and Pison, Rousseeuw, Filzmoser, and Croux (2003), provides unbiased estimators of the FA model's parameters. What is less clear, however, is the efficiency of the resulting estimators. Indeed, this two-stage method implies that a robust (mean and) covariance matrix is first estimated without using the fact that, under the FA model, the covariance matrix is constrained by (1.1). Yuan and Bentler (1998) propose to estimate the FA model's parameters directly using a robust estimator but do not implement the method. In this paper, we also propose to estimate the parameters of the FA model directly in a robust fashion. For that, we use a constrained biweight S-estimator (CBS) in the same spirit as the constrained translated biweight S-estimator proposed by Copt and Victoria-Feser (2006) for mixed linear models. We develop an iterative algorithm to compute it in the framework of confirmatory FA. We also investigate its large and small sample properties in terms of variance (and bias) and compare its performance to that of a two-stage robust estimator based on the same S-estimator. We do this analytically and by means of simulations in different models and settings.

2 Robust estimators for FA

2.1 Two-stage estimator

To estimate $\mu$ and $\Sigma$, we use an S-estimator with a high breakdown point. Very generally, an S-estimator of multivariate location and scale is defined as the solution for $\mu$ and $\Sigma$ that minimizes the determinant of $\Sigma$, i.e. $|\Sigma|$, subject to

$$\frac{1}{n}\sum_{i=1}^n \rho\left(\sqrt{(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)}\right) = \frac{1}{n}\sum_{i=1}^n \rho(d_i) = b_0 \qquad (2.1)$$

where $\rho$ is a symmetric, continuously differentiable function with $\rho(0) = 0$ that is strictly increasing on $[0, c]$ and constant on $[c, \infty)$, and $c$ is such that $E_\Phi[\rho]/\rho(c) = \varepsilon^*$, where $\Phi$ is the standard normal and $\varepsilon^*$ is the chosen breakdown point (see Rousseeuw and Yohai 1984). $b_0$ is a parameter chosen to determine the breakdown point by means of $b_0 = \varepsilon^* \max_d \rho(d)$.

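For $\rho$, we use the biweight $\rho$-function of Tukey. Hence, the biweight S-estimator (BS) is defined by the minimum of $|\Sigma|$ subject to (2.1), with

$$\rho(d;c) = \begin{cases} \dfrac{d^2}{2} - \dfrac{d^4}{2c^2} + \dfrac{d^6}{6c^4}, & 0 \le d \le c \\[1ex] \dfrac{c^2}{6}, & d > c \end{cases} \qquad (2.2)$$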

Estimating equations for the BS-estimator are

$$\mu = \frac{1}{\sum_{i=1}^n u(d_i;c)} \sum_{i=1}^n u(d_i;c)\,x_i \qquad (2.3)$$

$$\Sigma = \frac{p}{\sum_{i=1}^n u(d_i;c)\,d_i^2} \sum_{i=1}^n u(d_i;c)\,(x_i-\mu)(x_i-\mu)^T,$$

with weights

$$u(d;c) = \frac{1}{d}\frac{\partial}{\partial d}\rho(d;c) = \begin{cases} \left(1-(d/c)^2\right)^2, & 0 \le d \le c \\ 0, & d > c \end{cases}$$

The BS-estimator of $\Sigma$ can then be used to replace $S$ or $\hat\sigma$ in the objective functions (1.2) or (1.3); this defines the two-stage estimator (MLE or GLS).
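The biweight $\rho$-function (2.2), its weight function, and a single pass through the estimating equations can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the distances are computed from the previous values of $\mu$ and $\Sigma$, and the constraint (2.1) that fixes the scale through $b_0$ is not enforced, so this is not a complete S-estimation algorithm.

```python
import numpy as np

def rho_biweight(d, c):
    """Tukey biweight rho of (2.2); constant (= c^2/6) beyond c."""
    d = np.minimum(np.abs(d), c)
    return d**2 / 2 - d**4 / (2 * c**2) + d**6 / (6 * c**4)

def u_biweight(d, c):
    """Weights u(d; c) = rho'(d; c) / d."""
    d = np.asarray(d, dtype=float)
    return np.where(np.abs(d) <= c, (1 - (d / c) ** 2) ** 2, 0.0)

def bs_reweighting_step(X, mu, Sigma, c):
    """One reweighting pass through (2.3) and the equation for Sigma."""
    n, p = X.shape
    R = X - mu
    # Squared Mahalanobis distances d_i^2 at the current (mu, Sigma).
    d2 = np.einsum("ij,ij->i", R @ np.linalg.inv(Sigma), R)
    u = u_biweight(np.sqrt(d2), c)
    mu_new = (u[:, None] * X).sum(axis=0) / u.sum()
    R_new = X - mu_new
    Sigma_new = p * (R_new.T * u) @ R_new / (u * d2).sum()
    return mu_new, Sigma_new
```

When $c$ is very large all weights are close to 1, and starting from the sample mean and the (divisor $n$) sample covariance the step returns those same values, which is a convenient sanity check.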

2.2 Direct estimator

The FA model implies that the data have been generated by a multivariate normal distribution with a covariance matrix constrained according to (1.1). For this type of problem, one can use the results of Copt and Victoria-Feser (2006), who have developed such an estimator for mixed linear models. The trick is to be able to write the covariance matrix $\Sigma$ as

$$\Sigma = \sum_{j=0}^r \sigma_j^2 z_j z_j^T \qquad (2.4)$$

where $z_j$ is a known design matrix of 0s and 1s and $\sigma_j^2$ is the variance of the $j$th random effect in the mixed linear model. The resulting elements of $\Sigma$ are linear combinations of the $\sigma_j^2$. In the FA model, the elements of $\Sigma$ are not linear combinations of the model's parameters $\gamma_j$ and $\psi_j^2$. Therefore it is not possible (at least not in an easy manner) to use the algorithm proposed by Copt and Victoria-Feser (2006) to find estimates for $\gamma_j$ and $\psi_j^2$ from $\sigma_j^2$. In Appendix A, we derive estimating equations for the FA parameters directly from the objective function of the S-estimator, using the Lagrangian of the minimization of $|\Sigma|$ subject to (2.1), given by

$$L = \log|\Sigma| + \lambda\left[\frac{1}{n}\sum_{i=1}^n \rho(d_i;c) - b_0\right] \qquad (2.5)$$

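Let $P_0 = [\psi_1^2, \ldots, \psi_p^2]^T$ and $G_0 = [\gamma_1, \ldots, \gamma_q]^T$. An iterative system for the $\gamma_j$ and $\psi_j^2$ is given by

$$P_0^{(t+1)} = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u\big(d_i^{(t)};c\big)\big(d_i^{(t)}\big)^2\right]^{-1} \big(C^{(t)}\big)^{-1} D^{(t)}$$

$$G_0^{(t+1)} = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u\big(d_i^{(t)};c\big)\big(d_i^{(t)}\big)^2\right]^{-1} \big(M^{(t)}\big)^{-1} N^{(t)}$$

with the matrices $C$, $D$, $M$ and $N$ given in (A.14), (A.15), (A.17) and (A.18), respectively. Note that $\mu$ is calculated at each step with equation (2.3) and $\Sigma$ from $G_0$ and $P_0$.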

As the starting point of this iterative procedure, we carry out an exploratory factor analysis in which we use the orthogonalized Gnanadesikan-Kettenring (OGK) estimator proposed by Maronna and Zamar (2002) as a robust “plug-in” covariance matrix. The OGK has the advantage of being fast to compute. The resulting loadings ($\gamma_{ij}$) and uniqueness ($\psi_j^2$) are then used as the starting point in our algorithm, which is as follows.

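1. Compute $\mu_{\mathrm{start}} = \mu_{OGK}$ and $\Sigma_{OGK}$.

2. Compute $\Psi_{\mathrm{start}}$ and $\Gamma_{\mathrm{start}}$ by means of an exploratory factor analysis based on $\Sigma_{OGK}$.

3. Compute $\Sigma_{\mathrm{start}} = \Gamma_{\mathrm{start}}\Gamma_{\mathrm{start}}^T + \Psi_{\mathrm{start}}$.

4. Compute the Mahalanobis distances $d_i^{(1)} = \sqrt{(x_i-\mu_{\mathrm{start}})^T\Sigma_{\mathrm{start}}^{-1}(x_i-\mu_{\mathrm{start}})}$.

5. Compute the weights $u(d_i^{(1)};c)$.

6. Compute the mean vector

$$\mu^{(1)} = \frac{\sum_{i=1}^n u(d_i^{(1)};c)\,x_i}{\sum_{i=1}^n u(d_i^{(1)};c)}$$

7. Compute the vector of loadings $G_0$ and the vector of uniqueness $P_0$

$$G_0^{(1)} = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u\big(d_i^{(1)};c\big)\big(d_i^{(1)}\big)^2\right]^{-1} \big(M^{(1)}\big)^{-1} N^{(1)}$$

$$P_0^{(1)} = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u\big(d_i^{(1)};c\big)\big(d_i^{(1)}\big)^2\right]^{-1} \big(C^{(1)}\big)^{-1} D^{(1)}$$

8. Using $P_0^{(1)}$ and $G_0^{(1)}$, compute $\Sigma^{(1)}$.

9. Check a convergence criterion; if the conditions of convergence are met, stop; otherwise start again from step 4 with $\mu_{\mathrm{start}} = \mu^{(1)}$ and $\Sigma_{\mathrm{start}} = \Sigma^{(1)}$. Repeat the procedure until convergence is reached.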
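Steps 4 to 8 above can be sketched in code as follows. This is a minimal illustration, not the authors' R implementation: the names `cbs_update` and `pattern` are hypothetical, one uniqueness per manifest variable is assumed ($r = p$, so that $Z_j$ has a single 1 at position $(j, j)$), the matrices $C$, $D$, $M$ and $N$ are evaluated with the residuals from the incoming $\mu$, and the OGK start and the convergence loop of steps 1 to 3 and 9 are left out.

```python
import numpy as np

def biweight_u(d, c):
    return np.where(d <= c, (1 - (d / c) ** 2) ** 2, 0.0)

def cbs_update(X, mu, G0, P0, pattern, c):
    """One pass of steps 4-8; pattern[j] = (row, column) of loading gamma_j."""
    n, p = X.shape
    k = 1 + max(col for _, col in pattern)
    E = []                                    # E[j] = dGamma / dgamma_j
    for row, col in pattern:
        Ej = np.zeros((p, k))
        Ej[row, col] = 1.0
        E.append(Ej)
    Z = [np.diag(np.eye(p)[j]) for j in range(p)]   # Z[j] = dPsi / dpsi_j^2
    Gamma = sum(g * Ej for g, Ej in zip(G0, E))     # (A.1)
    Psi = sum(v * Zj for v, Zj in zip(P0, Z))       # (A.2) with r = p
    Si = np.linalg.inv(Gamma @ Gamma.T + Psi)
    R = X - mu
    Y = R @ Si                                # row i is Sigma^{-1}(x_i - mu)
    d2 = np.einsum("ij,ij->i", Y, R)          # squared Mahalanobis distances
    u = biweight_u(np.sqrt(d2), c)
    mu_new = (u[:, None] * X).sum(axis=0) / u.sum()
    B = (Y.T * u) @ Y                         # sum_i u_i y_i y_i^T
    scale = (p / n) / np.mean(u * d2)
    Pinv = np.linalg.inv(Psi)
    C = np.array([[np.trace(Si @ Zj @ Pinv @ Zk) for Zk in Z] for Zj in Z])
    D = np.array([np.trace(Zj @ B) for Zj in Z])
    M = np.array([[np.trace(Si @ (Ej @ Ek.T + Ek @ Ej.T)) for Ek in E]
                  for Ej in E])
    N = np.array([np.trace((Ej @ Gamma.T + Gamma @ Ej.T) @ B) for Ej in E])
    P0_new = scale * np.linalg.solve(C, D)
    G0_new = scale * np.linalg.solve(M, N)
    return mu_new, G0_new, P0_new
```

The quadratic forms $\sum_i u_i\, y_i^T A\, y_i$ are computed as $\mathrm{tr}(A B)$ with $B = \sum_i u_i y_i y_i^T$, which avoids a loop over observations. In use, the function would be called repeatedly, feeding its outputs back in, until the change in $(G_0, P_0)$ falls below a chosen tolerance.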


3 Inference based on the CBS for the direct and two-stage estimators

3.1 Direct estimator

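To compute the asymptotic covariance $V_{G_0}$ of the loadings estimator $\hat{G}_0$ and $V_{P_0}$ of the variances estimator $\hat{P}_0$, we use the results given by Lopuhaä (1989) for general S-estimators together with the delta method. The asymptotic covariance of $\sqrt{n}\,\mathrm{vec}(\hat\Sigma)$ is

$$V_\Sigma = e_3 (I_{p^2} + K_{p,p})(\Sigma \otimes \Sigma) + e_4\,\mathrm{vec}(\Sigma)\mathrm{vec}(\Sigma)^T \qquad (3.1)$$

where $\otimes$ denotes the Kronecker product and $K_{p,p}$ is a $(p^2 \times p^2)$ block matrix with $(i, j)$-block being a $p \times p$ matrix with 1 at entry $(j, i)$ and 0 elsewhere. The constants $e_3$ and $e_4$ are given by

$$e_3 = \frac{p(p+2)\,E_\Phi[\psi^2(d;c)\,d^2]}{\left(E_\Phi[\psi'(d;c)\,d^2 + (p+1)\,\psi(d;c)\,d]\right)^2} \qquad (3.2)$$

$$e_4 = -\frac{2}{p}\,e_3 + \frac{4\,E_\Phi\big[(\rho(d;c) - b_0)^2\big]}{\left(E_\Phi[\psi(d;c)\,d]\right)^2} \qquad (3.3)$$

with $p$ the dimension of $\Sigma$, $\psi(d;c) = (\partial/\partial d)\rho(d;c)$ and $\psi'(d;c) = (\partial/\partial d)\psi(d;c)$. The expectations are taken at the standardized $p$-variate normal distribution $\Phi$. The values of $e_3$ and $e_4$ can be obtained by numerical simulation.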
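A possible Monte Carlo approximation of $e_3$ and $e_4$ for the biweight is sketched below. It assumes that the denominator of (3.2) involves the derivative $\psi'$ of $\psi$, and that $b_0 = E_\Phi[\rho(d;c)]$, which follows from the two conditions on $c$ and $b_0$ given in Section 2.1. The function name, the number of draws and the seed are arbitrary choices.

```python
import numpy as np

def biweight_constants(p, c, n_draws=200_000, seed=0):
    """Monte Carlo estimates of e3 (3.2), e4 (3.3) and b0 = E[rho]."""
    rng = np.random.default_rng(seed)
    # d is the Euclidean norm of a standard p-variate normal vector.
    d = np.linalg.norm(rng.standard_normal((n_draws, p)), axis=1)
    t = np.minimum(d, c) / c              # d/c inside [0, c], 1 beyond c
    inside = d <= c
    rho = c**2 * (t**2 / 2 - t**4 / 2 + t**6 / 6)
    psi = np.where(inside, d * (1 - t**2) ** 2, 0.0)
    dpsi = np.where(inside, (1 - t**2) * (1 - 5 * t**2), 0.0)
    b0 = rho.mean()
    e3 = (p * (p + 2) * np.mean(psi**2 * d**2)
          / np.mean(dpsi * d**2 + (p + 1) * psi * d) ** 2)
    e4 = -2.0 / p * e3 + 4 * np.mean((rho - b0) ** 2) / np.mean(psi * d) ** 2
    return e3, e4, b0
```

As a check on the formulas, letting $c$ grow large gives $\rho(d) \approx d^2/2$, and the estimates approach $e_3 = 1$ and $e_4 = 0$, so that (3.1) reduces to the familiar asymptotic covariance of the sample covariance matrix under normality.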

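Because of (1.1), $\Sigma$ is a function of $G_0$ and $P_0$, i.e. $\Sigma(G_0, P_0)$, and using the delta method, we have that

$$V_{G_0} = \left(D_\Gamma^T D_\Gamma\right)^{-1} D_\Gamma^T V_\Sigma D_\Gamma \left(D_\Gamma^T D_\Gamma\right)^{-1} \qquad (3.4)$$

and

$$V_{P_0} = \left(D_\Psi^T D_\Psi\right)^{-1} D_\Psi^T V_\Sigma D_\Psi \left(D_\Psi^T D_\Psi\right)^{-1} \qquad (3.5)$$

with $D_\Gamma = \frac{\partial}{\partial G_0^T}\mathrm{vec}\big(\Sigma(G_0, P_0)\big)$ and $D_\Psi = \frac{\partial}{\partial P_0^T}\mathrm{vec}\big(\Sigma(G_0, P_0)\big)$. For $D_\Gamma$, we have

$$\begin{aligned}
D_\Gamma &= \left[\frac{\partial}{\partial\gamma_1}\mathrm{vec}(\Sigma) \;\ldots\; \frac{\partial}{\partial\gamma_q}\mathrm{vec}(\Sigma)\right] = \left[\frac{\partial}{\partial\gamma_1}\mathrm{vec}(\Gamma\Gamma^T) \;\ldots\; \frac{\partial}{\partial\gamma_q}\mathrm{vec}(\Gamma\Gamma^T)\right] \\
&= \left[\mathrm{vec}\left(\frac{\partial}{\partial\gamma_1}\Gamma\Gamma^T\right) \;\ldots\; \mathrm{vec}\left(\frac{\partial}{\partial\gamma_q}\Gamma\Gamma^T\right)\right] \\
&= \left[\mathrm{vec}\left(\frac{\partial\Gamma}{\partial\gamma_1}\Gamma^T\right) + \mathrm{vec}\left(\Gamma\frac{\partial\Gamma^T}{\partial\gamma_1}\right) \;\ldots\; \mathrm{vec}\left(\frac{\partial\Gamma}{\partial\gamma_q}\Gamma^T\right) + \mathrm{vec}\left(\Gamma\frac{\partial\Gamma^T}{\partial\gamma_q}\right)\right]
\end{aligned}$$

Because $\mathrm{vec}(AB) = (I_q \otimes A)\mathrm{vec}(B) = (B^T \otimes I_m)\mathrm{vec}(A)$, where $A$ is an $m \times n$ matrix and $B$ is an $n \times q$ matrix (see e.g. Magnus and Neudecker (1988)), we obtain that

$$D_\Gamma = \left[\left(I_p \otimes \frac{\partial\Gamma}{\partial\gamma_1}\right)\mathrm{vec}(\Gamma^T) + \left(\left(\frac{\partial\Gamma^T}{\partial\gamma_1}\right)^T \otimes I_p\right)\mathrm{vec}(\Gamma) \;\ldots\; \left(I_p \otimes \frac{\partial\Gamma}{\partial\gamma_q}\right)\mathrm{vec}(\Gamma^T) + \left(\left(\frac{\partial\Gamma^T}{\partial\gamma_q}\right)^T \otimes I_p\right)\mathrm{vec}(\Gamma)\right] \qquad (3.6)$$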
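The Jacobian $D_\Gamma$ can be assembled numerically either from the definition (each column is the vec of $\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}$) or from the Kronecker form (3.6). The sketch below builds both so they can be compared; the function name and the `pattern` encoding of the free loadings as (row, column) pairs are hypothetical.

```python
import numpy as np

def vec(A):
    """Stack the columns of A on top of each other."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

def d_gamma(Gamma, pattern):
    """Return D_Gamma built two ways: directly and via the Kronecker form (3.6)."""
    p, k = Gamma.shape
    cols_direct, cols_kron = [], []
    for row, col in pattern:
        E = np.zeros((p, k))          # E = dGamma / dgamma_j
        E[row, col] = 1.0
        cols_direct.append(vec(E @ Gamma.T + Gamma @ E.T))
        cols_kron.append(np.kron(np.eye(p), E) @ vec(Gamma.T)
                         + np.kron(E, np.eye(p)) @ vec(Gamma))
    return np.column_stack(cols_direct), np.column_stack(cols_kron)

Gamma = np.array([[-1.5, 0.0], [1.8, 0.0], [2.0, 0.0],
                  [0.0, 1.93], [0.0, -1.6], [0.0, 1.4]])
pattern = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
D_direct, D_kron = d_gamma(Gamma, pattern)
```

Both constructions give the same $p^2 \times q$ matrix; the direct form is cheaper, while the Kronecker form mirrors (3.6) term by term.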

3.2 Two-stage estimator

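The two-stage estimator $\hat\theta$ is the result of the minimization of (1.3), i.e. the solution in $\theta$ of

$$\left(\frac{\partial\sigma(\theta)}{\partial\theta^T}\right)^T W(\hat\sigma - \sigma(\theta)) = 0 \qquad (3.7)$$

The asymptotic variance of $\hat\theta$ is equal to (see e.g. Yuan and Bentler (1998))

$$\Omega = A^{-1}\Pi A^{-1} \qquad (3.8)$$

with $A = \left(\frac{\partial\sigma(\theta)}{\partial\theta^T}\right)^T W \frac{\partial\sigma(\theta)}{\partial\theta^T}$, $\Pi = \left(\frac{\partial\sigma(\theta)}{\partial\theta^T}\right)^T W V_\Sigma W \left(\frac{\partial\sigma(\theta)}{\partial\theta^T}\right)$ and $V_\Sigma$ being the asymptotic variance of $\sqrt{n}\,\hat\sigma = \sqrt{n}\,\mathrm{vec}(\hat\Sigma)$. Equation (3.7) actually defines a class of GLS estimators within which the most efficient (and equivalent to the MLE) is obtained when $W = \hat{V}_\Sigma^{-1}$, where $\hat{V}_\Sigma$ is a sample estimate of $V_\Sigma$. In this case, (3.8) simplifies to

$$\Omega = \left[\left(\frac{\partial\sigma(\theta)}{\partial\theta^T}\right)^T V_\Sigma^{-1}\, \frac{\partial\sigma(\theta)}{\partial\theta^T}\right]^{-1} \qquad (3.9)$$

When $\theta$ contains only the loadings, (3.9) can be written as

$$V_{G_0} = \left(D_\Gamma^T V_\Sigma^{-1} D_\Gamma\right)^{-1} \qquad (3.10)$$

with $V_\Sigma$ given in (3.1). One can see that (3.4) and (3.10) are equal if $D_\Gamma$ is a square matrix, but not in the other cases. It is however not clear which of the asymptotic covariance matrices is smaller, and hence which of the two-stage or direct estimators is the most efficient.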

4 Simulation study

The aim of the simulation study is to compare the (direct) CBS estimator with, respectively, the classical two-stage estimator (i.e. based on the Pearson correlations) and the two-stage estimator using the robust BS-estimator as covariance matrix estimator, under different kinds of models and data contamination. For the classical and robust two-stage estimators we use the software Mplus, whereas we use the statistical package R (R Development Core Team 2008) for the (direct) CBS, with the OGK estimator as a starting point, as well as for computing the BS-estimator. We try both the MLE and the GLS objective functions in Mplus for the classical and robust two-stage estimators, but the latter does not seem to be reliable in relatively small samples ($n = 40$, $p = 20$), as will be shown below with Figure 1. Therefore, we finally compare three estimators: the classical and robust two-stage estimators (with, respectively, the Pearson or the BS covariance estimates) with the ML objective function (ML(P) and ML(BS)), and the (direct) CBS.

The data are generated from two different models: a model with a relatively small number of manifest variables ($p = 6$), for which we simulate relatively large samples ($n = 100$), and a model with a relatively large number of manifest variables ($p = 20$), for which we simulate relatively small samples ($n = 40$). In both models, we set the number of latent variables to 2, and for each of them, two sets of values for $\Gamma$ were chosen in order to create a strong and a weak signal to noise ratio. The values of the parameters for all models can be found in Appendix B. For the model with $p = 6$, the number of loadings different from 0 ($q$) is equal to $p$, and for the other model we choose 2 different values for $q$, namely $q = p = 20$ and $q = 26$.

The contamination consists in generating a proportion $(1-\varepsilon)$ of the data from a multivariate normal distribution with parameters given by the model, and a proportion $\varepsilon$ of the data from a multivariate normal distribution in which one third of the loadings different from 0 are multiplied by 10. Different amounts of contamination are chosen, but we present here only the $\varepsilon = 5\%$ case.
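The contamination scheme can be sketched as follows. Several details are assumptions made only for illustration, since the text does not specify them: which third of the nonzero loadings is inflated (here the first third in row-major order), and whether the contaminated fraction is exact or random (here each observation is contaminated independently with probability $\varepsilon$). The function name is hypothetical.

```python
import numpy as np

def simulate_contaminated(n, Gamma, psi2, eps, rng, factor=10.0):
    """Draw n observations; a fraction of about eps comes from the inflated model."""
    Gamma = np.asarray(Gamma, dtype=float)
    p = Gamma.shape[0]
    rows, cols = np.nonzero(Gamma)
    n_bad = len(rows) // 3
    Gamma_bad = Gamma.copy()
    # Assumption: the first third of the nonzero loadings is multiplied by 10.
    Gamma_bad[rows[:n_bad], cols[:n_bad]] *= factor
    Sigma = Gamma @ Gamma.T + np.diag(psi2)
    Sigma_bad = Gamma_bad @ Gamma_bad.T + np.diag(psi2)
    # Assumption: independent contamination indicators with probability eps.
    is_bad = rng.random(n) < eps
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    if is_bad.any():
        X[is_bad] = rng.multivariate_normal(np.zeros(p), Sigma_bad,
                                            size=int(is_bad.sum()))
    return X, is_bad
```

For example, `simulate_contaminated(100, Gamma, np.ones(6), 0.05, np.random.default_rng(0))` would produce one sample for the first model, with `Gamma` taken from Appendix B.1.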

For each model, we compute the asymptotic efficiency, measured by the ratio of the traces of the covariance matrices of the robust two-stage relative to the direct CBS estimators, using (3.4) and (3.10) with the true values of the parameters. With our chosen set of parameters, when $q = p$ the efficiency is approximately one, but when $q > p$ the two-stage estimator is asymptotically more efficient (see values in Appendix B).

Figure 1 presents the bias distributions of the loadings' and uniqueness' estimators for the model with $n = 40$ ($q = p = 20$) and a high signal to noise ratio, without data contamination. The estimators are the two-stage estimators using respectively the ML or the GLS objective function, with the Pearson correlations or the BS estimator, as well as the direct CBS estimator. As mentioned previously, using the GLS objective function seems to produce unreliable results, while the other estimators (ML(P), ML(BS) and CBS) behave well (unbiased) and in the same manner (same variances).

Figure 1: Bias distribution of the loadings' and uniqueness' estimators for the model with n = 40, no contamination and a high signal to noise ratio (panels: γ3, γ16, ψ3, ψ16; estimators: ML(P), GLS(P), ML(BS), GLS(BS), CBS)

When there is data contamination, not surprisingly, the simulations show that the two-stage estimator based on the Pearson covariance matrix is biased, whereas this is not the case with the CBS and the two-stage estimator based on the BS (see Figure 2).

Figure 2: Bias distribution of the loadings' and uniqueness' estimators for the model with p = 6, data contamination and a high signal to noise ratio (panels: γ3, γ5, ψ3, ψ5; estimators: ML(P), ML(BS), CBS)

Overall, the two robust estimators (ML(BS) and CBS) seem to be quite similar in terms of bias and variance. However, in one case ($n = 100$, $p = 6$ and low signal to noise ratio) the two-stage estimator produces extreme estimates for the loadings and negative ones for the uniqueness, as can be seen in Figures 3 and 4. The mean squared error (MSE) of the two-stage estimator is considerably larger than that of the CBS in that case (see Table 1). This could be due to the Mplus implementation or to the fact that the two-stage estimator needs the estimation of a full covariance matrix.

In the case where $q > p$, although the asymptotic variance of the CBS is larger than that of the two-stage estimator, in finite samples the CBS does not seem to be more variable than the two-stage estimator, with or without data contamination, as illustrated in Figure 5.

To conclude, the simulation study shows that, in finite samples, the CBS is not less efficient than the indirect (two-stage) estimator, and it is more stable in some cases where the latter produces extreme estimates.

Figure 3: Bias distribution of the loadings' and uniqueness' estimators for the model with p = 6, without data contamination and with a low signal to noise ratio (panels: γ2, γ6, ψ1, ψ6; estimators: ML(P), ML(BS), CBS)

Figure 4: Bias distribution of the loadings' and uniqueness' estimators for the model with p = 6, with data contamination and with a low signal to noise ratio (panels: γ2, γ6, ψ1, ψ6; estimators: ML(P), ML(BS), CBS)

5 Conclusion

To estimate the parameters of a FA model in a robust fashion, one can choose between two strategies. One can compute a two-stage estimator, by which a robust covariance matrix of the manifest variables is first computed and then used in the MLE or GLS estimating equations, possibly using commercially available software such as Mplus. Alternatively, one can directly estimate the FA parameters (loadings and uniqueness) from a robust objective function such as that of an S-estimator, in the same spirit as is done in Copt and Victoria-Feser (2006) for the mixed linear model. In this paper we derive the estimating equations (and an iterative procedure) for the S-estimator of the FA model, i.e. the CBS-estimator. The two estimators have different asymptotic properties, in that their asymptotic covariance matrices are not the same. With the models used in the simulation study, we found that the (trace of the) asymptotic covariance matrix of the two-stage estimator is about the same as that of the CBS-estimator when $q = p$ and smaller when $q > p$. However, in finite samples, the CBS-estimator seems to be more stable than the two-stage estimator (smaller MSE).

Table 1: MSE ×100 for the model with p = 6, low signal to noise ratio (ε: amount of contamination)

               ε = 0%                        ε = 5%
       ML(P)    ML(BS)     CBS       ML(P)    ML(BS)     CBS
γ1      5.45      4.82     9.24       7.72      6.57    10.91
γ2     17.14     31.74    16.19      63.09      9.74    18.32
γ3      9.35     23.39    15.67      22.86     15.99    16.53
γ4     10.63     54.25    13.02      31.96     35.76    13.88
γ5      1.24      1.31     3.60       1.35      1.72     4.16
γ6      6.86      8.78     9.82      44.43     28.16    12.09
ψ1     13.45      5.53     9.79      13.45      5.53     9.79
ψ2    174.50    692.83    26.61     174.50    692.83    26.61
ψ3     56.35    275.53    25.17      56.35    275.53    25.17
ψ4     62.63   4658.36    30.62      62.63   4658.36    30.62
ψ5      1.51      1.76     5.40       1.51      1.76     5.40
ψ6     30.47     69.43    23.58      30.47     69.43    23.58

Figure 5: Bias distribution of the loadings' and uniqueness' estimators for the model with n = 40, q > p, data contamination and with a low signal to noise ratio (panels: γ4, γ23, ψ4, ψ17; estimators: ML(P), ML(BS), CBS)

References

Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology 37, 62–83.

Copt, S. and M.-P. Victoria-Feser (2006). High breakdown inference in the mixed linear model. Journal of the American Statistical Association 101, 292–300.

Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika 32, 443–482.

Lopuhaä, H. P. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariance. The Annals of Statistics 17, 1662–1683.

Magnus, J. R. and H. Neudecker (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley & Sons.

Maronna, R. A. and R. H. Zamar (2002). Robust estimates of location and dispersion for high-dimensional datasets. Technometrics 44, 307–317.

Pison, G., P. J. Rousseeuw, P. Filzmoser, and C. Croux (2003). Robust factor analysis. Journal of Multivariate Analysis 84(1), 145–172.

R Development Core Team (2008). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

Rousseeuw, P. J. and V. J. Yohai (1984). Robust regression by means of S-estimators. In J. Franke, W. Härdle, and R. D. Martin (Eds.), Robust and Nonlinear Time Series Analysis, pp. 256–272. New York: Springer-Verlag.

Yuan, K.-H. and P. M. Bentler (1998). Robust mean and covariance structure analysis. British Journal of Mathematical and Statistical Psychology 51, 63–88.

Yuan, K.-H. and P. M. Bentler (2001). Effect of outliers on estimators and tests in covariance structure analysis. British Journal of Mathematical and Statistical Psychology 54, 161–175.

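A Derivation of the estimating equations for the CBS-estimator

It is useful to notice that for the FA model we have

$$\Gamma = \sum_{j=1}^q \frac{\partial\Gamma}{\partial\gamma_j}\,\gamma_j \qquad (A.1)$$

$$\Psi = \sum_{j=1}^r Z_j\,\psi_j^2 \qquad (A.2)$$

with $q \ge p$ and $r \le p$. We also have that

$$Z_j = \frac{\partial}{\partial\psi_j^2}\Sigma = \frac{\partial}{\partial\psi_j^2}(\Gamma\Gamma^T + \Psi) \qquad (A.3)$$

and

$$\frac{\partial}{\partial\gamma_j}\Sigma = \frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j} \qquad (A.4)$$

Using

$$\frac{\partial}{\partial\theta_j}\log|\Sigma| = \mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\right]$$

and

$$\frac{\partial}{\partial\theta_j}\Sigma^{-1} = -\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_j}\Sigma^{-1}$$

with $\theta_j$ any element of $\theta$, the vector of the FA parameters, the partial derivatives of the Lagrangian (2.5) are then

$$\frac{\partial}{\partial\mu}L = -\lambda\frac{1}{n}\sum_{i=1}^n u(d_i;c)\,\Sigma^{-1}(x_i-\mu) = 0 \qquad (A.5)$$

$$\frac{\partial}{\partial\psi_j^2}L = \mathrm{tr}[\Sigma^{-1}Z_j] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}Z_j\Sigma^{-1}(x_i-\mu) = 0 \qquad (A.6)$$

$$\frac{\partial}{\partial\gamma_j}L = \mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Sigma^{-1}\Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\right] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\left[\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\right]\Sigma^{-1}(x_i-\mu) = 0 \qquad (A.7)$$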

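From (A.5) we get the estimating equation for $\mu$ given in (2.3). Then, multiplying (A.6) by $\psi_j^2$ and using the properties of the trace, we get

$$\begin{aligned}
\mathrm{tr}[\Sigma^{-1}Z_j\psi_j^2] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}Z_j\psi_j^2\,\Sigma^{-1}(x_i-\mu) &= 0 \\
\sum_{j=1}^r \mathrm{tr}[\Sigma^{-1}Z_j\psi_j^2] - \frac{\lambda}{2n}\sum_{j=1}^r\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}Z_j\psi_j^2\,\Sigma^{-1}(x_i-\mu) &= 0 \\
\mathrm{tr}\Big[\Sigma^{-1}\sum_{j=1}^r Z_j\psi_j^2\Big] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\Big(\sum_{j=1}^r Z_j\psi_j^2\Big)\Sigma^{-1}(x_i-\mu) &= 0 \\
\mathrm{tr}[\Sigma^{-1}\Psi] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\Psi\Sigma^{-1}(x_i-\mu) &= 0 \qquad (A.8)
\end{aligned}$$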

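Similarly for (A.7) we get

$$\begin{aligned}
\mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\gamma_j\Gamma^T + \Sigma^{-1}\Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\gamma_j\right] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\left[\frac{\partial\Gamma}{\partial\gamma_j}\gamma_j\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\gamma_j\right]\Sigma^{-1}(x_i-\mu) &= 0 \\
\sum_{j=1}^q \mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\gamma_j\Gamma^T + \Sigma^{-1}\Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\gamma_j\right] - \frac{\lambda}{2n}\sum_{j=1}^q\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\left[\frac{\partial\Gamma}{\partial\gamma_j}\gamma_j\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\gamma_j\right]\Sigma^{-1}(x_i-\mu) &= 0 \\
\mathrm{tr}\left[\Sigma^{-1}\Big(\sum_{j=1}^q \frac{\partial\Gamma}{\partial\gamma_j}\gamma_j\Big)\Gamma^T + \Sigma^{-1}\Gamma\Big(\sum_{j=1}^q \frac{\partial\Gamma^T}{\partial\gamma_j}\gamma_j\Big)\right] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\left[\Big(\sum_{j=1}^q \frac{\partial\Gamma}{\partial\gamma_j}\gamma_j\Big)\Gamma^T + \Gamma\Big(\sum_{j=1}^q \frac{\partial\Gamma^T}{\partial\gamma_j}\gamma_j\Big)\right]\Sigma^{-1}(x_i-\mu) &= 0 \\
2\,\mathrm{tr}[\Sigma^{-1}\Gamma\Gamma^T] - \frac{\lambda}{2n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}(2\Gamma\Gamma^T)\Sigma^{-1}(x_i-\mu) &= 0 \qquad (A.9)
\end{aligned}$$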

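Multiplying (A.8) by 2 and adding the result to (A.9) we obtain

$$\begin{aligned}
2\,\mathrm{tr}[\Sigma^{-1}\Psi] + 2\,\mathrm{tr}[\Sigma^{-1}\Gamma\Gamma^T] - \frac{\lambda}{n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\Psi\Sigma^{-1}(x_i-\mu) - \frac{\lambda}{n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}(\Gamma\Gamma^T)\Sigma^{-1}(x_i-\mu) &= 0 \\
2\,\mathrm{tr}[\Sigma^{-1}\Psi + \Sigma^{-1}\Gamma\Gamma^T] - \frac{\lambda}{n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}(\Psi + \Gamma\Gamma^T)\Sigma^{-1}(x_i-\mu) &= 0 \\
2\,\mathrm{tr}[\Sigma^{-1}(\Psi + \Gamma\Gamma^T)] - \frac{\lambda}{n}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}(\Sigma)\Sigma^{-1}(x_i-\mu) &= 0 \\
2p - \frac{\lambda}{n}\sum_{i=1}^n u(d_i;c)\,d_i^2 &= 0
\end{aligned}$$

so that we finally get

$$\lambda = 2p\left[\frac{1}{n}\sum_{i=1}^n u(d_i;c)\,d_i^2\right]^{-1} \qquad (A.10)$$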

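Using (A.10) in (A.6) and (A.7) respectively yields

$$\mathrm{tr}[\Sigma^{-1}Z_j] = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u(d_i;c)\,d_i^2\right]^{-1}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}Z_j\Sigma^{-1}(x_i-\mu) \qquad (A.11)$$

and

$$\mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Sigma^{-1}\Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\right] = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u(d_i;c)\,d_i^2\right]^{-1}\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\left[\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\right]\Sigma^{-1}(x_i-\mu) \qquad (A.12)$$

(A.11) and (A.12) are the estimating equations for the parameters $\psi_j^2$ and $\gamma_j$, respectively.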

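To define an iterative system, we now need to extract the $\psi_j^2$ and $\gamma_j$ from (A.11) and (A.12). First, we use (A.2) to reformulate the left hand side of (A.11) as

$$\mathrm{tr}[\Sigma^{-1}Z_j] = \mathrm{tr}[\Sigma^{-1}Z_j\Psi^{-1}\Psi] = \mathrm{tr}\Big[\Sigma^{-1}Z_j\Psi^{-1}\sum_{k=1}^r Z_k\psi_k^2\Big] = \sum_{k=1}^r \mathrm{tr}[\Sigma^{-1}Z_j\Psi^{-1}Z_k]\,\psi_k^2 \qquad (A.13)$$

which defines an iterative system for each $\psi_k^2$, $k = 1, \ldots, r$. Let

$$C = \Big(\mathrm{tr}[\Sigma^{-1}Z_j\Psi^{-1}Z_k]\Big)_{j,k=1,\ldots,r} \qquad (A.14)$$

and

$$D = \Big(\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}Z_j\Sigma^{-1}(x_i-\mu)\Big)_{j=1,\ldots,r} \qquad (A.15)$$

Then an expression for the $\psi_j^2$ is given by

$$P_0 = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u(d_i;c)\,d_i^2\right]^{-1} C^{-1}D \qquad (A.16)$$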

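In the same way, we use (A.1) to reformulate the left hand side of (A.12) as

$$\begin{aligned}
\mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Sigma^{-1}\Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\right] &= \mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\sum_{k=1}^q \frac{\partial\Gamma^T}{\partial\gamma_k}\gamma_k + \Sigma^{-1}\sum_{k=1}^q \frac{\partial\Gamma}{\partial\gamma_k}\gamma_k\,\frac{\partial\Gamma^T}{\partial\gamma_j}\right] \\
&= \sum_{k=1}^q \mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\frac{\partial\Gamma^T}{\partial\gamma_k} + \Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_k}\frac{\partial\Gamma^T}{\partial\gamma_j}\right]\gamma_k
\end{aligned}$$

Letting

$$M = \left(\mathrm{tr}\left[\Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_j}\frac{\partial\Gamma^T}{\partial\gamma_k} + \Sigma^{-1}\frac{\partial\Gamma}{\partial\gamma_k}\frac{\partial\Gamma^T}{\partial\gamma_j}\right]\right)_{j,k=1,\ldots,q} \qquad (A.17)$$

and

$$N = \left(\sum_{i=1}^n u(d_i;c)\,(x_i-\mu)^T\Sigma^{-1}\left[\frac{\partial\Gamma}{\partial\gamma_j}\Gamma^T + \Gamma\frac{\partial\Gamma^T}{\partial\gamma_j}\right]\Sigma^{-1}(x_i-\mu)\right)_{j=1,\ldots,q} \qquad (A.18)$$

we obtain

$$G_0 = \frac{p}{n}\left[\frac{1}{n}\sum_{i=1}^n u(d_i;c)\,d_i^2\right]^{-1} M^{-1}N \qquad (A.19)$$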

B Parameter values for the simulations

B.1 First Model: n = 100, p = 6

We first choose the parameter values $\mu$, $\Gamma$ and $\Psi$ so that the signal to noise ratio is high. They are

$$\mu = [0, 0, 0, 0, 0, 0]^T, \quad \Gamma = \begin{bmatrix} -1.5 & 0 \\ 1.8 & 0 \\ 2 & 0 \\ 0 & 1.93 \\ 0 & -1.6 \\ 0 & 1.4 \end{bmatrix}, \quad \Psi = I_p$$

The efficiency of the ML(BS) relative to the CBS is 0.996.

For a low signal to noise ratio case, we choose for $\mu$, $\Gamma$ and $\Psi$ the following values

$$\mu = [0, 0, 0, 0, 0, 0]^T, \quad \Gamma = \begin{bmatrix} 0.1 & 0 \\ 0.54 & 0 \\ -0.45 & 0 \\ 0 & 0.83 \\ 0 & 0.26 \\ 0 & -0.6 \end{bmatrix}, \quad \Psi = I_p$$

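B.2 Second Model: n = 40, p = 20

For a high signal to noise ratio and $p = q$, the values for $\mu$, $\Gamma$ and $\Psi$ are

$$\mu = [0, \ldots, 0]^T, \quad \Gamma = \begin{bmatrix}
-1.2 & 0 \\ -1.3 & 0 \\ -1.6 & 0 \\ 1.3 & 0 \\ 0.9 & 0 \\
2.2 & 0 \\ 1.2 & 0 \\ 2.1 & 0 \\ 1.5 & 0 \\ 0.3 & 0 \\
0 & -1.1 \\ 0 & 1.5 \\ 0 & -2.1 \\ 0 & -0.7 \\ 0 & 2.2 \\
0 & 1.5 \\ 0 & -2 \\ 0 & 0.9 \\ 0 & 0.7 \\ 0 & -0.8
\end{bmatrix}, \quad \Psi = I_p$$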

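For the low signal to noise ratio, the values for $\mu$, $\Gamma$ and $\Psi$ are

$$\mu = [0, \ldots, 0]^T, \quad \Gamma = \begin{bmatrix}
-0.7 & 0 \\ -0.4 & 0 \\ 0.1 & 0 \\ -0.6 & 0 \\ -0.62 & 0 \\
-0.43 & 0 \\ 0.4 & 0 \\ 1 & 0 \\ 0.47 & 0 \\ -0.3 & 0 \\
0 & 0.1 \\ 0 & 0.8 \\ 0 & -0.6 \\ 0 & 0.5 \\ 0 & -0.9 \\
0 & -0.95 \\ 0 & -0.8 \\ 0 & 0.3 \\ 0 & 0.03 \\ 0 & 0.09
\end{bmatrix}, \quad \Psi = I_p$$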

For a high signal to noise ratio and 26 loadings different from zero, the values for $\mu$, $\Gamma$ and $\Psi$ are

$$\mu = [0, \ldots, 0]^T, \quad \Gamma = \begin{bmatrix}
-1.2 & 0 \\ -1.3 & 0 \\ -1.6 & -1.2 \\ 1.3 & 0 \\ 0.9 & -0.9 \\
2.2 & 0 \\ 1.2 & 0 \\ 2.1 & 0 \\ 1.5 & 1.4 \\ 0.3 & 0 \\
0 & -1.1 \\ 0 & 1.5 \\ 1.3 & -2.1 \\ 0 & -0.7 \\ 0 & 2.2 \\
0 & 1.5 \\ -1 & -2 \\ 1.6 & 0.9 \\ 0 & 0.7 \\ 0 & -0.8
\end{bmatrix}, \quad \Psi = I_p$$

For a low signal to noise ratio and 26 loadings different from zero, the values for $\mu$, $\Gamma$ and $\Psi$ are