Book Chapter / Reference

VICTORIA-FESER, Maria-Pia. Robust Statistics for Multivariate Methods. In: Brian S. Everitt & David C. Howell, eds. Encyclopedia of Statistics in Behavioral Science. Chichester, U.K.: John Wiley & Sons, Ltd, 2005, pp. 1764–1768.

Available at: http://archive-ouverte.unige.ch/unige:6503

Disclaimer: layout of this document may differ from the published version.
Robust Statistics for Multivariate Methods
MARIA-PIA VICTORIA-FESER

Volume 4, pp. 1764–1768 in
Encyclopedia of Statistics in Behavioral Science
ISBN-13: 978-0-470-86080-9
ISBN-10: 0-470-86080-4
Editors: Brian S. Everitt & David C. Howell
John Wiley & Sons, Ltd, Chichester, 2005
Robust statistics, as a concept, probably dates back to the prehistory of statistics. It was, however, formalized in the 1960s by the pioneering work of Huber [9, 10] and Hampel [6, 7]. Robust statistics extends classical statistics by taking into account the fact that the models assumed to have generated the data at hand are only approximate. It provides tools to investigate the robustness properties of a statistic T (such as an estimator or a test statistic), as well as robust estimators and robust testing procedures (see Robust Testing Procedures).
Although one would readily agree that models can only describe reality approximately, it is harder to grasp the effect of this fact on the properties of classical statistics T, for which the model is assumed to be exact. Suppose that the hypothetical (multivariate) model is denoted by F, but that the data at hand have been generated by the general mixture F_ε = (1 − ε)F + εH, with H a contamination distribution. Assuming F_ε means that the data have been generated by the model F with probability 1 − ε and by the contamination distribution H with probability ε. Note that a particular case for H is a distribution assigning probability one to an arbitrary point, that is, producing so-called outliers. If ε is large, the contamination distribution has an important weight in the mixture distribution, and an analysis based on F_ε (assuming F as the central model) is meaningless. On the other hand, if ε is small, an analysis based on F_ε should not be entirely determined by the contamination. It is, therefore, important to find or construct statistics T that are not entirely determined by data contamination, that is, that are robust under slight model deviations (see Finite Mixture Distributions).
A well-known tool to assess the effect (on the bias of T) of an infinitesimal amount ε of contamination is the influence function (IF), introduced by Hampel [6, 7] and further developed in [8]. Another tool is the breakdown point (BDP), which measures the maximal amount ε of (any type of) contamination that T can withstand before it 'breaks down', that is, gives unreliable results (see, for example, [8]). A statistic T with bounded IF is said to be robust (in the infinitesimal sense). It should be stressed that almost all classical procedures for models based on the multivariate normal distribution (see Catalogue of Probability Density Functions) are not robust. This is the case, for example, for regression models (see Multiple Linear Regression), factor analysis, structural equation models, and linear multilevel models (which include repeated measures analysis of variance).
In practice, detecting observations from a contamination distribution (i.e., contaminated data) is not an obvious task. For models based on the p-variate normal distribution F_{µ,Σ}, a useful measure is the Mahalanobis distance d_i, defined for each (multivariate) observation x_i by

d_i^2 = (x_i − µ)^T Σ^{−1} (x_i − µ).    (1)

The d_i takes into account the covariance structure of the data, which is very important in multivariate settings (see Multivariate Analysis: Overview).
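A minimal NumPy sketch of (1) illustrates the point made below: an observation can have a large Mahalanobis distance while being unremarkable marginally. The function name, toy µ, and toy Σ here are illustrative, not from the chapter.

```python
import numpy as np

def mahalanobis_sq(X, mu, Sigma):
    """Squared Mahalanobis distances: d_i^2 = (x_i - mu)' Sigma^{-1} (x_i - mu)."""
    diff = X - mu
    # solve Sigma z = diff' rather than forming the inverse explicitly
    z = np.linalg.solve(Sigma, diff.T)
    return np.einsum('ij,ji->i', diff, z)

# Two points with comparable marginal magnitudes under strong correlation:
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
mu = np.zeros(2)
X = np.array([[1.0, 1.2],     # consistent with the correlation structure
              [1.0, -1.0]])   # extreme only jointly, not marginally
d2 = mahalanobis_sq(X, mu, Sigma)  # second distance is far larger
```

The second observation violates the positive correlation and gets a much larger d_i, even though neither of its coordinates is extreme on its own.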
Indeed, as an example, we consider scores on psychological tests collected for a study of age differences in working memory (see [1] for more details), presented as a multiscatter plot in Figure 1. A close look at the scatterplot between the variables ML1TOT and ML2TOT reveals a minority of subjects not 'fitting' the covariance structure described by the bulk of the data (i.e., the majority). On the other hand, at the univariate level, that is, when looking at the scores on each variable separately, this minority of subjects does not have such extreme scores. The point here is that, when dealing with multivariate models, screening the data at the univariate level is not sufficient to detect contaminated data (see Multivariate Outliers).
Unfortunately, scatter plots show only the behavior of the data at the bivariate level, and (exact) bivariate normality does not imply (exact) normality in higher dimensions. It is, therefore, important to be able to rely on general measures such as (1). However, (1) supposes the parameters µ, Σ to be known, which, in practice, is never the case. If nonrobust estimators are used, they are biased in the presence of data contamination, which means that the d_i will, in the best case, fail to reveal the truly contaminated data (masking effect), and, in the worst case, flag uncontaminated data as contaminated.
Robust statistics for multivariate models were first used for the estimation of the multivariate mean (location) and covariance (scatter). In this setting, it is desirable for robust estimators to
[Figure 1 about here]
Figure 1 Multiscatter plot of the working memory study data (panels: BCXS, BCXD, ML1TOT, ML2TOT, CMTTMSS, CMTTMSD, CMTTMCS, CMTTMCD)
be affine equivariant (a linear transformation of the data results in a known transformation of the estimates), to have a relatively high BDP (see [13]), and to be computationally efficient. The first high-BDP affine equivariant estimator is the minimum volume ellipsoid (MVE) proposed by [17]: the ellipsoid (of dimension p) of minimum volume containing at least half of the data is found, and the sample mean and covariance of the data it contains define the MVE. The MVE is computationally very intensive and is known to have poor efficiency. It is used, however, for example to detect contaminated data or as a starting point for more efficient estimators based on weighted means and covariances.
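In practice, the MVE is approximated by resampling rather than computed exactly. The following is a minimal sketch of the standard random-subset heuristic, under the assumptions just described (cover at least half of the data, minimize volume, then take the mean and covariance of the covered points); the function name and trial count are illustrative.

```python
import numpy as np

def mve_approx(X, n_trials=500, seed=0):
    """Approximate MVE location/scatter via random (p+1)-subsets
    (the exact MVE is combinatorially hard to compute)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2                  # "at least half of the data"
    best_vol, best_mu, best_S = np.inf, None, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False)
        if np.linalg.det(S) <= 0:         # degenerate subset: skip
            continue
        diff = X - mu
        d2 = np.einsum('ij,ji->i', diff, np.linalg.solve(S, diff.T))
        m2 = np.sort(d2)[h - 1]           # inflate ellipsoid to cover h points
        vol = np.sqrt(np.linalg.det(m2 * S))   # proportional to its volume
        if vol < best_vol:
            best_vol, best_mu, best_S = vol, mu, m2 * S
    if best_mu is None:                   # no valid subset found
        return X.mean(axis=0), np.cov(X, rowvar=False)
    # mean and covariance of the points inside the best ellipsoid
    diff = X - best_mu
    inside = np.einsum('ij,ji->i', diff, np.linalg.solve(best_S, diff.T)) <= 1.0
    return X[inside].mean(axis=0), np.cov(X[inside], rowvar=False)
```

With, say, 20% of observations from a distant contamination cluster, the approximate MVE mean stays near the bulk of the data while the classical mean is pulled toward the contamination.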
A general class of estimators in which one can find robust ones is the class of M-estimators (see [10]), which generalize maximum likelihood estimators (MLE).
M-estimators (see M Estimators of Location) are defined for general parametric models F_θ as the solution in θ of

(1/n) ∑_{i=1}^n ψ(x_i, θ) = 0.    (2)
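As an illustration of solving (2), here is a minimal univariate location sketch with Huber's ψ(u) = max(−c, min(c, u)), solved by iterative reweighting. The clipping constant c = 1.345 and the MAD-based scale are standard choices; the function name is ours.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Solve (1/n) sum_i psi((x_i - theta)/s) = 0 for theta, with Huber's
    psi(u) = max(-c, min(c, u)) and the scale s fixed at the normalized MAD."""
    s = 1.4826 * np.median(np.abs(x - np.median(x)))  # robust scale estimate
    theta = np.median(x)                              # robust starting value
    for _ in range(max_iter):
        u = (x - theta) / s
        w = c / np.maximum(np.abs(u), c)   # Huber weights: 1 if |u| <= c, else c/|u|
        new = np.sum(w * x) / np.sum(w)    # weighted-mean fixed-point update
        if abs(new - theta) < tol:
            break
        theta = new
    return theta
```

On a sample with one gross outlier, the estimate stays near the bulk of the data, whereas the sample mean (the MLE under normality, ψ(u) = u) is dragged toward the outlier.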
When the ψ-function is the score function s(x, θ) = (∂/∂θ) log f(x, θ), one obtains the MLE. Under very mild conditions, such estimators have known asymptotic properties that can be used for inference (see, e.g., [8]). For the multivariate normal model, another popular class of estimators is the class of S-estimators (see [18]), which can be computed iteratively by means of
(1/n) ∑_{i=1}^n w_i^µ (µ − x_i) = 0    (3)

(1/n) ∑_{i=1}^n [w_i^δ Σ − w_i^η (x_i − µ)(x_i − µ)^T] = 0    (4)
where the weights w_i^µ, w_i^η, w_i^δ are decreasing functions of the Mahalanobis distances d_i. Note that when these weights are all equal to 1, one recovers the classical sample mean and covariance. The choice of weights defines different estimators (see, e.g., [16]). When there are missing data, [1] proposed an adaptation of (3) and (4) as an alternative to the EM algorithm (see [4]). For the working memory data (which include missing data), the correlation between ML1TOT and ML2TOT was found to be 0.84 with robust estimation, whereas it equals 0.20 when using the EM algorithm. Other robust estimators of multivariate location and scatter (and their statistical properties) can be found in, for example, [3], [5], [11], [12], [14], [19], [20], [21], [22], and [23].
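The fixed-point structure behind (3) and (4) can be sketched as follows. This is not a full S-estimator: for illustration it assumes a simple hard-rejection weight (w_i = 1 if d_i ≤ c, else 0, applied as a common weight in both equations) and a coordinatewise median/MAD start in place of a high-breakdown starting estimator such as the MVE; names and constants are ours.

```python
import numpy as np

def reweighted_mean_cov(X, c=3.0, n_iter=20):
    """Iteratively reweighted mean/covariance: weights are a decreasing
    (here: hard-rejection) function of the Mahalanobis distances, in the
    spirit of the estimating equations (3)-(4)."""
    n, p = X.shape
    # robust starting values (a crude stand-in for the MVE)
    mu = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - mu), axis=0)
    S = np.diag(mad ** 2)
    for _ in range(n_iter):
        diff = X - mu
        d = np.sqrt(np.einsum('ij,ji->i', diff, np.linalg.solve(S, diff.T)))
        w = (d <= c).astype(float)            # hard-rejection weights
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        S = (w[:, None] * diff).T @ diff / w.sum()
    return mu, S
```

When all weights equal 1, one iteration reproduces the classical sample mean and covariance, matching the remark above; with contaminated data, the distant points receive weight 0 and stop influencing the estimates.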
Although the multivariate normal distribution (see Catalogue of Probability Density Functions) is the central distribution for several models, the covariance matrix is not always present in a free form. Indeed, as in structural equation models or in mixed linear models (see Linear Multilevel Models), the true covariance matrix is often structured. For example, the variances could be supposed all equal, and the covariances all equal (one-way ANOVA with repeated measures). In these cases, it is important to estimate the covariance matrix by taking its structure into account, rather than estimating it freely and then 'plugging' the estimate into the model to estimate the other parameters. [2] proposed a general class of S-estimators for constrained covariance matrices that can be used, for example, with mixed linear models.
When models are not based on the multivariate normal distribution, robust statistics become more complex. The Mahalanobis distance no longer plays a role, and another measure for detecting contaminated data needs to be specified. For M-estimators, [10] proposed a weighting scheme based on the score function itself, that is,
ψ(x, θ) = w_c(x, θ) s(x, θ)    (5)

with

w_c(x, θ) = min{1; c/‖s(x, θ)‖},    (6)

where ‖x‖ = (∑_{j=1}^p x_j^2)^{1/2} denotes the Euclidean norm. Observations corresponding to large (absolute) values of the score function are hence downweighted.
The score function in a sense replaces the Mahalanobis distance of multivariate normal models. The parameter c can be chosen on efficiency grounds. With nonsymmetric models, (5) leads to inconsistent estimators; a shift therefore needs to be added to (5) to make the M-estimator consistent (see, for example, [8] and [15]). This can make the robust estimator computationally nearly infeasible. With nonnormal multivariate models, robust statistics therefore still need to be developed further.
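A minimal sketch of the weighting in (5) and (6): score vectors with Euclidean norm at most c pass through unchanged, while larger ones are shrunk to norm c. The consistency shift discussed above is omitted, and the function name and toy score values are illustrative.

```python
import numpy as np

def huberized_score(s, c):
    """psi = w_c * s with w_c = min(1, c / ||s||), as in (5)-(6):
    scores of norm <= c are kept; larger ones are shrunk to norm c."""
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    # guard against a zero score vector before dividing
    w = np.minimum(1.0, c / np.maximum(norm, np.finfo(float).tiny))
    return w * s

# Two score vectors: one moderate (kept), one extreme (downweighted)
s = np.array([[0.5, 0.0],
              [10.0, -10.0]])
psi = huberized_score(s, c=2.0)
```

The resulting ψ is bounded in norm by c, which is precisely what bounds the influence function of the corresponding M-estimator.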
References
[1] Cheng, T.-C. & Victoria-Feser, M.-P. (2002). High breakdown estimation of multivariate mean and covariance with missing observations, British Journal of Mathematical and Statistical Psychology 55, 317–335.
[2] Copt, S. & Victoria-Feser, M.-P. (2003). High Breakdown Inference in the Mixed Linear Model. Cahiers du département d'économétrie no 2003.6, University of Geneva.
[3] Davies, P.L. (1987). Asymptotic behaviour of S-estimators of multivariate location parameters and dispersion matrices, The Annals of Statistics 15, 1269–1292.
[4] Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39, 1–22.
[5] Donoho, D.L. (1982). Breakdown properties of multivariate location estimators, Ph.D. qualifying paper, Department of Statistics, Harvard University.
[6] Hampel, F.R. (1968). Contribution to the theory of robust estimation, Ph.D. thesis, University of California, Berkeley.
[7] Hampel, F.R. (1974). The influence curve and its role in robust estimation, Journal of the American Statistical Association 69, 383–393.
[8] Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. & Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions, John Wiley, New York.
[9] Huber, P.J. (1964). Robust estimation of a location parameter, Annals of Mathematical Statistics 35, 73–101.
[10] Huber, P.J. (1981). Robust Statistics, John Wiley, New York.
[11] Kent, J.T. & Tyler, D.E. (1996). Constrained M-estimation for multivariate location and scatter, The Annals of Statistics 24, 1346–1370.
[12] Lopuhaä, H.P. (1991). τ-estimators for location and scatter, Canadian Journal of Statistics 19, 307–321.
[13] Maronna, R.A. (1976). Robust M-estimators of multivariate location and scatter, The Annals of Statistics 4, 51–67.
[14] Maronna, R.A. & Zamar, R.H. (2002). Robust estimates of location and dispersion for high-dimensional datasets, Technometrics 44, 307–317.
[15] Moustaki, I. & Victoria-Feser, M.-P. (2004). Bounded-Bias Robust Inference for Generalized Linear Latent Variable Models. Cahiers du département d'économétrie no 2004.02, University of Geneva.
[16] Rocke, D.M. & Woodruff, D.L. (1996). Identification of outliers in multivariate data, Journal of the American Statistical Association 91, 1047–1061.
[17] Rousseeuw, P.J. (1984). Least median of squares regression, Journal of the American Statistical Association 79, 871–880.
[18] Rousseeuw, P.J. & Yohai, V.J. (1984). Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis, J. Franke, W. Härdle & R.D. Martin, eds, Springer-Verlag, New York, pp. 256–272.
[19] Stahel, W.A. (1981). Breakdown of covariance estimators, Technical Report 31, Fachgruppe für Statistik, ETH, Zürich.
[20] Tamura, R. & Boos, D. (1986). Minimum Hellinger distance estimation for multivariate location and covariance, Journal of the American Statistical Association 81, 223–229.
[21] Tyler, D.E. (1983). Robustness and efficiency properties of scatter matrices, Biometrika 70, 411–420.
[22] Tyler, D.E. (1994). Finite sample breakdown points of projection based multivariate location and scatter statistics, Annals of Statistics 22, 1024–1044.
[23] Woodruff, D.L. & Rocke, D.M. (1994). Computable robust estimation of multivariate location and shape in high dimension using compound estimators, Journal of the American Statistical Association 89, 888–896.
MARIA-PIA VICTORIA-FESER