Book Chapter / Reference

VICTORIA-FESER, Maria-Pia. Robust Statistics for Multivariate Methods. In: Brian S. Everitt & David C. Howell, eds. Encyclopedia of Statistics in Behavioral Science. Chichester, U.K.: John Wiley & Sons, Ltd, 2005, pp. 1764–1768.

Available at: http://archive-ouverte.unige.ch/unige:6503

Disclaimer: layout of this document may differ from the published version.
Robust Statistics for Multivariate Methods
MARIA-PIA VICTORIA-FESER

Volume 4, pp. 1764–1768 in
Encyclopedia of Statistics in Behavioral Science
ISBN-13: 978-0-470-86080-9
ISBN-10: 0-470-86080-4
Editors: Brian S. Everitt & David C. Howell
John Wiley & Sons, Ltd, Chichester, 2005
Robust statistics, as a concept, probably dates back to the prehistory of statistics. It was, however, formalized in the 1960s by the pioneering work of Huber [9, 10] and Hampel [6, 7]. Robust statistics extends classical statistics by taking into account the fact that the models assumed to have generated the data at hand are only approximate. It provides tools to investigate the robustness properties of a statistic T (such as an estimator or a test statistic), as well as robust estimators and robust testing procedures (see Robust Testing Procedures).
Although one would readily agree that models can only describe reality approximately, it is harder to grasp the effect of this fact on the properties of classical statistics T, for which the model is assumed to be exact. Suppose that the hypothetical (multivariate) model is denoted by F, but that the data at hand have been generated by the general mixture F_ε = (1 − ε)F + εH, with H a contamination distribution. Assuming F_ε means that the data have been generated by the model F with probability 1 − ε and by the contamination distribution H with probability ε. Note that a particular case for H is a distribution assigning probability one to an arbitrary point, that is, producing so-called outliers. If ε is large, the contamination distribution has an important weight in the mixture distribution, and an analysis based on F_ε (assuming F as the central model) is meaningless. On the other hand, if ε is small, an analysis based on F_ε should not be entirely determined by the contamination. It is, therefore, important to find or construct statistics T that are not entirely determined by data contamination, that is, that are robust under slight model deviations (see Finite Mixture Distributions).
A well-known tool to assess the effect (on the bias of T) of an infinitesimal amount ε of contamination is the influence function (IF), introduced by Hampel [6, 7] and further developed in [8]. Another tool is the breakdown point (BDP), which measures the maximal amount ε of (any type of) contamination that T can withstand before it 'breaks down', that is, gives unreliable results (see, for example, [8]). A statistic T with bounded IF is said to be robust (in the infinitesimal sense). It should be stressed that almost all classical procedures for models based on the multivariate normal distribution (see Catalogue of Probability Density Functions) are not robust. This is the case, for example, for regression models (see Multiple Linear Regression), factor analysis, structural equation models, and linear multilevel models (which include repeated measures analysis of variance).
In practice, detecting observations from a contamination distribution (i.e., contaminated data) is not an obvious task. For models based on the p-variate normal distribution F_{µ,Σ}, a useful measure is the Mahalanobis distance d_i, defined for each (multivariate) observation x_i by

d_i^2 = (x_i − µ)^T Σ^{−1} (x_i − µ).    (1)

The d_i takes into account the covariance structure of the data, which is very important in multivariate settings (see Multivariate Analysis: Overview).
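A minimal NumPy sketch of (1) illustrates the point made below: an observation can have a large Mahalanobis distance while being unremarkable marginally. The function name, toy µ, and toy Σ here are illustrative, not from the chapter.

```python
import numpy as np

def mahalanobis_sq(X, mu, Sigma):
    """Squared Mahalanobis distances: d_i^2 = (x_i - mu)' Sigma^{-1} (x_i - mu)."""
    diff = X - mu
    # solve Sigma z = diff' rather than forming the inverse explicitly
    z = np.linalg.solve(Sigma, diff.T)
    return np.einsum('ij,ji->i', diff, z)

# Two points with comparable marginal magnitudes under strong correlation:
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
mu = np.zeros(2)
X = np.array([[1.0, 1.2],     # consistent with the correlation structure
              [1.0, -1.0]])   # extreme only jointly, not marginally
d2 = mahalanobis_sq(X, mu, Sigma)  # second distance is far larger
```

The second observation violates the positive correlation and gets a much larger d_i, even though neither of its coordinates is extreme on its own.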
Indeed, as an example, we consider scores on psychological tests collected for a study of age differences in working memory (see [1] for more details), presented as a multiscatter plot in Figure 1. A close look at the scatterplot between the variables ML1TOT and ML2TOT reveals a minority of subjects not 'fitting' the covariance structure described by the bulk of the data (i.e., the majority). On the other hand, at the univariate level, that is, when looking at the scores on each variable separately, this minority of subjects does not have such extreme scores. The point here is that, when dealing with multivariate models, screening the data at the univariate level is not sufficient to detect contaminated data (see Multivariate Outliers).
Unfortunately, scatter plots show only the behavior of the data at the bivariate level, and (exact) bivariate normality does not imply (exact) normality in higher dimensions. It is, therefore, important to be able to rely on general measures such as (1). However, (1) supposes the parameters µ, Σ to be known, which, in practice, is never the case. If nonrobust estimators are used, they are biased in the presence of data contamination, which means that the d_i will, in the best case, fail to reveal the truly contaminated data (masking effect), and, in the worst case, flag uncontaminated data as contaminated.
Robust statistics for multivariate models were first used for the estimation of the multivariate mean (location) and covariance (scatter). In this setting, it is desirable for robust estimators to
[Figure 1 about here]
Figure 1 Multiscatter plot of the working memory study data (panels: BCXS, BCXD, ML1TOT, ML2TOT, CMTTMSS, CMTTMSD, CMTTMCS, CMTTMCD)
be affine equivariant (a linear transformation of the data results in a known transformation of the estimates), to have a relatively high BDP (see [13]), and to be computationally efficient. The first high-BDP affine equivariant estimator is the minimum volume ellipsoid (MVE) proposed by [17]: the ellipsoid (of dimension p) of minimum volume containing at least half of the data is found, and the sample mean and covariance of the data it contains define the MVE. The MVE is computationally very intensive and is known to have poor efficiency. It is used, however, for example to detect contaminated data or as a starting point for more efficient estimators based on weighted means and covariances.
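In practice, the MVE is approximated by resampling rather than computed exactly. The following is a minimal sketch of the standard random-subset heuristic, under the assumptions just described (cover at least half of the data, minimize volume, then take the mean and covariance of the covered points); the function name and trial count are illustrative.

```python
import numpy as np

def mve_approx(X, n_trials=500, seed=0):
    """Approximate MVE location/scatter via random (p+1)-subsets
    (the exact MVE is combinatorially hard to compute)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2                  # "at least half of the data"
    best_vol, best_mu, best_S = np.inf, None, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False)
        if np.linalg.det(S) <= 0:         # degenerate subset: skip
            continue
        diff = X - mu
        d2 = np.einsum('ij,ji->i', diff, np.linalg.solve(S, diff.T))
        m2 = np.sort(d2)[h - 1]           # inflate ellipsoid to cover h points
        vol = np.sqrt(np.linalg.det(m2 * S))   # proportional to its volume
        if vol < best_vol:
            best_vol, best_mu, best_S = vol, mu, m2 * S
    if best_mu is None:                   # no valid subset found
        return X.mean(axis=0), np.cov(X, rowvar=False)
    # mean and covariance of the points inside the best ellipsoid
    diff = X - best_mu
    inside = np.einsum('ij,ji->i', diff, np.linalg.solve(best_S, diff.T)) <= 1.0
    return X[inside].mean(axis=0), np.cov(X[inside], rowvar=False)
```

With, say, 20% of observations from a distant contamination cluster, the approximate MVE mean stays near the bulk of the data while the classical mean is pulled toward the contamination.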
A general class of estimators in which one can find robust ones is the class of M-estimators (see [10]), which generalize maximum likelihood estimators (MLE).
M-estimators (see M Estimators of Location) are defined for general parametric models F_θ as the solution in θ of

(1/n) ∑_{i=1}^n ψ(x_i, θ) = 0.    (2)
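As an illustration of solving (2), here is a minimal univariate location sketch with Huber's ψ(u) = max(−c, min(c, u)), solved by iterative reweighting. The clipping constant c = 1.345 and the MAD-based scale are standard choices; the function name is ours.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Solve (1/n) sum_i psi((x_i - theta)/s) = 0 for theta, with Huber's
    psi(u) = max(-c, min(c, u)) and the scale s fixed at the normalized MAD."""
    s = 1.4826 * np.median(np.abs(x - np.median(x)))  # robust scale estimate
    theta = np.median(x)                              # robust starting value
    for _ in range(max_iter):
        u = (x - theta) / s
        w = c / np.maximum(np.abs(u), c)   # Huber weights: 1 if |u| <= c, else c/|u|
        new = np.sum(w * x) / np.sum(w)    # weighted-mean fixed-point update
        if abs(new - theta) < tol:
            break
        theta = new
    return theta
```

On a sample with one gross outlier, the estimate stays near the bulk of the data, whereas the sample mean (the MLE under normality, ψ(u) = u) is dragged toward the outlier.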
When the ψ-function is the score function s(x, θ) = (∂/∂θ) log f(x, θ), one obtains the MLE. Under very mild conditions, such estimators have known asymptotic properties that can be used for inference (see, e.g., [8]). For the multivariate normal model, another popular class of estimators is the class of S-estimators (see [18]), which can be computed iteratively by means of
(1/n) ∑_{i=1}^n w_i^µ (µ − x_i) = 0    (3)

(1/n) ∑_{i=1}^n [w_i^δ Σ − w_i^η (x_i − µ)(x_i − µ)^T] = 0    (4)
where the weights w_i^µ, w_i^η, w_i^δ are decreasing functions of the Mahalanobis distances d_i. Note that when these weights are all equal to 1, one recovers the classical sample mean and covariance. The choice of weights defines different estimators (see, e.g., [16]). When there are missing data, [1] proposed an adaptation of (3) and (4) as an alternative to the EM algorithm (see [4]). For the working memory data (which include missing data), the correlation between ML1TOT and ML2TOT was found to be 0.84 with robust estimation, whereas it equals 0.20 when using the EM algorithm. Other robust estimators of multivariate location and scatter (and their statistical properties) can be found in, for example, [3], [5], [11], [12], [14], [19], [20], [21], [22], and [23].
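The fixed-point structure behind (3) and (4) can be sketched as follows. This is not a full S-estimator: for illustration it assumes a simple hard-rejection weight (w_i = 1 if d_i ≤ c, else 0, applied as a common weight in both equations) and a coordinatewise median/MAD start in place of a high-breakdown starting estimator such as the MVE; names and constants are ours.

```python
import numpy as np

def reweighted_mean_cov(X, c=3.0, n_iter=20):
    """Iteratively reweighted mean/covariance: weights are a decreasing
    (here: hard-rejection) function of the Mahalanobis distances, in the
    spirit of the estimating equations (3)-(4)."""
    n, p = X.shape
    # robust starting values (a crude stand-in for the MVE)
    mu = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - mu), axis=0)
    S = np.diag(mad ** 2)
    for _ in range(n_iter):
        diff = X - mu
        d = np.sqrt(np.einsum('ij,ji->i', diff, np.linalg.solve(S, diff.T)))
        w = (d <= c).astype(float)            # hard-rejection weights
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        S = (w[:, None] * diff).T @ diff / w.sum()
    return mu, S
```

When all weights equal 1, one iteration reproduces the classical sample mean and covariance, matching the remark above; with contaminated data, the distant points receive weight 0 and stop influencing the estimates.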
Although the multivariate normal distribution (see Catalogue of Probability Density Functions) is the central distribution for several models, the covariance matrix is not always present in a free form. Indeed, as in structural equation models or in mixed linear models (see Linear Multilevel Models), the true covariance matrix is often structured. For example, the variances could be supposed all equal, and the covariances all equal (one-way ANOVA with repeated measures). In these cases, it is important to estimate the covariance matrix by taking its structure into account, rather than estimating it freely and then 'plugging' the estimate into the model to estimate the other parameters. [2] proposed a general class of S-estimators for constrained covariance matrices that can be used, for example, with mixed linear models.
When models are not based on the multivariate normal distribution, robust statistics become more complex. The Mahalanobis distance no longer plays a role, and another measure for detecting contaminated data needs to be specified. For M-estimators, [10] proposed a weighting scheme based on the score function itself, that is,
ψ(x, θ) = w_c(x, θ) s(x, θ)    (5)

with

w_c(x, θ) = min{1; c/‖s(x, θ)‖},    (6)

where ‖x‖ = (∑_{j=1}^p x_j^2)^{1/2} denotes the Euclidean norm. Observations corresponding to large (absolute) values of the score function are hence downweighted.
The score function in a sense replaces the Mahalanobis distance of multivariate normal models. The parameter c can be chosen on efficiency grounds. With nonsymmetric models, (5) leads to inconsistent estimators; a shift therefore needs to be added to (5) to make the M-estimator consistent (see, for example, [8] and [15]). This can make the robust estimator computationally nearly infeasible. With nonnormal multivariate models, robust statistics therefore still need to be developed further.
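A minimal sketch of the weighting in (5) and (6): score vectors with Euclidean norm at most c pass through unchanged, while larger ones are shrunk to norm c. The consistency shift discussed above is omitted, and the function name and toy score values are illustrative.

```python
import numpy as np

def huberized_score(s, c):
    """psi = w_c * s with w_c = min(1, c / ||s||), as in (5)-(6):
    scores of norm <= c are kept; larger ones are shrunk to norm c."""
    norm = np.linalg.norm(s, axis=-1, keepdims=True)
    # guard against a zero score vector before dividing
    w = np.minimum(1.0, c / np.maximum(norm, np.finfo(float).tiny))
    return w * s

# Two score vectors: one moderate (kept), one extreme (downweighted)
s = np.array([[0.5, 0.0],
              [10.0, -10.0]])
psi = huberized_score(s, c=2.0)
```

The resulting ψ is bounded in norm by c, which is precisely what bounds the influence function of the corresponding M-estimator.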
References
[1] Cheng, T.-C. & Victoria-Feser, M.-P. (2002). High breakdown estimation of multivariate mean and covariance with missing observations, British Journal of Mathematical and Statistical Psychology 55, 317–335.
[2] Copt, S. & Victoria-Feser, M.-P. (2003). High Breakdown Inference in the Mixed Linear Model. Cahiers du département d'économétrie no 2003.6, University of Geneva.
[3] Davies, P.L. (1987). Asymptotic behaviour of S-estimators of multivariate location parameters and dispersion matrices, The Annals of Statistics 15, 1269–1292.
[4] Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39, 1–22.
[5] Donoho, D.L. (1982). Breakdown properties of multivariate location estimators, Ph.D. qualifying paper, Department of Statistics, Harvard University.
[6] Hampel, F.R. (1968). Contribution to the theory of robust estimation, Ph.D. thesis, University of California, Berkeley.
[7] Hampel, F.R. (1974). The influence curve and its role in robust estimation, Journal of the American Statistical Association 69, 383–393.
[8] Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. & Stahel, W.A. (1986). Robust Statistics: The Approach Based on Influence Functions, John Wiley, New York.
[9] Huber, P.J. (1964). Robust estimation of a location parameter, Annals of Mathematical Statistics 35, 73–101.
[10] Huber, P.J. (1981). Robust Statistics, John Wiley, New York.
[11] Kent, J.T. & Tyler, D.E. (1996). Constrained M-estimation for multivariate location and scatter, The Annals of Statistics 24, 1346–1370.
[12] Lopuhaä, H.P. (1991). τ-estimators for location and scatter, Canadian Journal of Statistics 19, 307–321.
[13] Maronna, R.A. (1976). Robust M-estimators of multivariate location and scatter, The Annals of Statistics 4, 51–67.
[14] Maronna, R.A. & Zamar, R.H. (2002). Robust estimates of location and dispersion for high-dimensional datasets, Technometrics 44, 307–317.
[15] Moustaki, I. & Victoria-Feser, M.-P. (2004). Bounded-Bias Robust Inference for Generalized Linear Latent Variable Models. Cahiers du département d'économétrie no 2004.02, University of Geneva.
[16] Rocke, D.M. & Woodruff, D.L. (1996). Identification of outliers in multivariate data, Journal of the American Statistical Association 91, 1047–1061.
[17] Rousseeuw, P.J. (1984). Least median of squares regression, Journal of the American Statistical Association 79, 871–880.
[18] Rousseeuw, P.J. & Yohai, V.J. (1984). Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis, J. Franke, W. Härdle & R.D. Martin, eds, Springer-Verlag, New York, pp. 256–272.
[19] Stahel, W.A. (1981). Breakdown of covariance estimators, Technical Report 31, Fachgruppe für Statistik, ETH, Zürich.
[20] Tamura, R. & Boos, D. (1986). Minimum Hellinger distance estimation for multivariate location and covariance, Journal of the American Statistical Association 81, 223–229.
[21] Tyler, D.E. (1983). Robustness and efficiency properties of scatter matrices, Biometrika 70, 411–420.
[22] Tyler, D.E. (1994). Finite sample breakdown points of projection based multivariate location and scatter statistics, Annals of Statistics 22, 1024–1044.
[23] Woodruff, D.L. & Rocke, D.M. (1994). Computable robust estimation of multivariate location and shape in high dimension using compound estimators, Journal of the American Statistical Association 89, 888–896.
MARIA-PIA VICTORIA-FESER