Book Chapter
Reference
Fast algorithms for computing high breakdown covariance matrices with missing data
COPT, Samuel, VICTORIA-FESER, Maria-Pia
Abstract
Robust estimation of covariance matrices when some of the data at hand are missing is an important problem. It has been studied by Little and Smith (1987) and more recently by Cheng and Victoria-Feser (2002). The latter propose the use of high breakdown estimators and so-called hybrid algorithms (see, e.g., Woodruff and Rocke, 1994). In particular, the minimum volume ellipsoid of Rousseeuw (1984) is adapted to the case of missing data. To compute it, they use (a modified version of) the forward search algorithm (see e.g. Atkinson, 1994). In this paper, we propose to use instead a modification of the C-step algorithm proposed by Rousseeuw and Van Driessen (1999) which is actually a lot faster. We also adapt the orthogonalized Gnanadesikan-Kettenring (OGK) estimator proposed by Maronna and Zamar (2002) to the case of missing data and use it as a starting point for an adapted S-estimator.
Moreover, we conduct a simulation study to compare different robust estimators in terms of their efficiency and breakdown.
COPT, Samuel, VICTORIA-FESER, Maria-Pia. Fast algorithms for computing high breakdown covariance matrices with missing data. In: Hubert, M., Pison, G., Struyf, A. and Van Aelst, S. Theory and Applications of Recent Robust Methods. Basel: Birkhäuser, 2004. p. 71-82
Available at:
http://archive-ouverte.unige.ch/unige:6502
Mathematics Subject Classification (2000). Primary 62G35; Secondary 62H12.
Keywords. C-step algorithm, minimum covariance determinant, outliers, robust statistics, S-estimators, orthogonalized Gnanadesikan-Kettenring robust estimator.
1. Introduction
The statistical literature contains several proposals for high breakdown estimators of the mean and covariance of multivariate data when the data are suspected to contain outliers or extreme observations. A well-known one is the minimum covariance determinant (MCD) of Rousseeuw (1984). However, little has been done for the case of missing data, which is very common in practice. Only Little and Smith (1987) and Cheng and Victoria-Feser (2002) propose solutions. In this paper we concentrate on robust estimators with missing data. We propose faster algorithms for their computation and compare the estimators through extensive simulations, both in terms of their robustness properties when the data are contaminated and in terms of the speed of two different algorithms used to compute them. In particular, we adapt the orthogonalized Gnanadesikan-Kettenring (OGK) estimator proposed by Maronna and Zamar (2002) to the case of missing data and use it as a starting point for an adapted S-estimator. All our programs are available upon request in the form of an Splus library, which was used to produce the results and graphics presented in this paper.

Both authors acknowledge the financial support of the Swiss National Science Foundation (grant 610-057883.99).
2. A General Class of Estimators with Missing Data
The aim is to estimate the parameters $\mu$ and $\Sigma$, i.e., the mean and covariance of an underlying multivariate variable $Y = (Y_1, \ldots, Y_p)$ that is assumed to have generated the sample $y_i$, $i = 1, \ldots, n$, at hand. As often happens in practice, some of the observations may be incomplete: for each $i$, the components $y_{ij}$ are observed for some $j \in \{1, \ldots, p\}$ and missing for the others. In other terms, $y_i = [y_{[oi]}^T, y_{[mi]}^T]^T$, so that a distinction is made between the observed ($oi$) and the missing ($mi$) parts. We suppose that the data are missing at random (see Rubin, 1976), a sufficient condition for correct likelihood-based inference. Most known estimators of mean and covariance with missing data fall in the class proposed by Cheng and Victoria-Feser (2002), i.e.,
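$$\frac{1}{n}\sum_{i=1}^{n} w_i^{\mu}\,(\mu - \hat{y}_i) = 0 \tag{2.1}$$
$$\frac{1}{n}\sum_{i=1}^{n} \left[ w_i^{\delta}\,\Sigma - w_i^{\eta}\left((\hat{y}_i - \mu)(\hat{y}_i - \mu)^T - C_i\right) \right] = 0 \tag{2.2}$$
where
$$\hat{y}_i = \left[ y_{[oi]}^T,\; E\left(y_{[mi]} \mid y_{[oi]}, \mu, \Sigma\right)^T \right]^T = \left[ y_{[oi]}^T,\; \mu_{[mi]}^T + (y_{[oi]} - \mu_{[oi]})^T \Sigma_{[ooi]}^{-1} \Sigma_{[omi]} \right]^T, \tag{2.3}$$
and
$$C_i = \begin{bmatrix} 0 & 0 \\ 0 & \mathrm{cov}\left(y_{[mi]}, y_{[mi]}^T \mid y_{[oi]}, \mu, \Sigma\right) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & \Sigma_{[mmi]} - \Sigma_{[moi]} \Sigma_{[ooi]}^{-1} \Sigma_{[omi]} \end{bmatrix} \tag{2.4}$$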
where, for example, $\Sigma_{[ooi]}$ denotes the partition of $\Sigma$ corresponding to the observed part of $y_i$, etc. The different estimators are defined through the data weighting system given by $w_i^{\mu}$, $w_i^{\delta}$ and $w_i^{\eta}$ in (2.1) and (2.2), which in turn also depend on the parameters $\mu$ and $\Sigma$ (see below). To compute the estimators, one can use an iterative procedure in which, given current estimates $\mu^{(t)}$ and $\Sigma^{(t)}$, the $\hat{y}_i$, the $C_i$ and the weights are first computed, and the values are then updated by
$$\mu^{(t+1)} = \frac{\frac{1}{n}\sum_{i=1}^{n} w_i^{\mu}\,\hat{y}_i}{\frac{1}{n}\sum_{i=1}^{n} w_i^{\mu}} \tag{2.5}$$
$$\Sigma^{(t+1)} = \frac{\frac{1}{n}\sum_{i=1}^{n} w_i^{\eta}\left((\hat{y}_i - \mu^{(t+1)})(\hat{y}_i - \mu^{(t+1)})^T - C_i\right)}{\frac{1}{n}\sum_{i=1}^{n} w_i^{\delta}}. \tag{2.6}$$
We have not worked out the conditions under which (2.1) and (2.2) have a unique solution, nor those under which (2.5) and (2.6) converge to that unique solution. However, the reader is referred to Davies (1987) for general conditions for S-estimators.
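The completion step behind (2.3) and (2.4) can be illustrated for a single observation. The following is a minimal sketch, assuming NumPy and that missing entries are coded as NaN; the function name `complete_observation` is hypothetical and is not taken from the authors' Splus library.

```python
import numpy as np

def complete_observation(y, mu, sigma):
    """Return (y_hat, C) as in (2.3) and (2.4) for one observation.

    Missing entries of y are assumed to be coded as NaN (an assumption of
    this sketch, not something prescribed by the paper).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    m = np.isnan(y)   # missing coordinates [mi]
    o = ~m            # observed coordinates [oi]

    y_hat = y.copy()
    C = np.zeros_like(sigma)
    if not m.any():               # fully observed: nothing to complete
        return y_hat, C
    if not o.any():               # nothing observed: fall back on mu and Sigma
        return mu.copy(), sigma.copy()

    s_oo = sigma[np.ix_(o, o)]
    s_mo = sigma[np.ix_(m, o)]
    s_mm = sigma[np.ix_(m, m)]
    # B = Sigma_[moi] Sigma_[ooi]^{-1}, computed with a linear solve
    B = np.linalg.solve(s_oo, s_mo.T).T
    # Conditional mean of the missing part given the observed part (2.3)
    y_hat[m] = mu[m] + B @ (y[o] - mu[o])
    # Conditional covariance of the missing part, zero elsewhere (2.4)
    C[np.ix_(m, m)] = s_mm - B @ s_mo.T
    return y_hat, C
```

For instance, with $\mu = 0$, unit variances, correlation 0.5 and $y = (2, \text{NaN})$, the missing coordinate is completed by 1 and the only nonzero entry of $C$ is 0.75.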
The classical MLE is obtained when $w_i^{\mu} = w_i^{\eta} = w_i^{\delta} = 1\ \forall i$, and (2.5) and (2.6) then define the EM algorithm (Dempster et al., 1977). However, with complete data it is well known that the MLE of mean and covariance is not robust. When there are missing data, the situation does not change; see Cheng and Victoria-Feser (2002). Little and Rubin (1987) propose to base the M-step on a robust estimator belonging to the general class of M-estimators (Huber, 1981). They call the resulting procedure the ER algorithm. Their estimator is defined by (2.5) and (2.6) with
$$(w_i^{\mu})^2 = w_i^{\eta} = w_i^{\delta} = w_i = \omega(d_{oi})/d_{oi}^2 \tag{2.7}$$
where
$$d_{oi}^2 = d_{oi}^2(\mu, \Sigma) = \left(y_{[oi]} - \mu_{[oi]}\right)^T \Sigma_{[ooi]}^{-1} \left(y_{[oi]} - \mu_{[oi]}\right) \tag{2.8}$$
is the squared Mahalanobis distance corresponding to the observed part of $y_i$. See Little and Smith (1987) for the choice of the weight function $\omega$. The iteration step for the covariance matrix (2.6) does not exactly correspond to the same step in the ER algorithm, in that there the weights $w_i^{\eta}$ are not applied to the correction matrix $C_i$. In what follows we will, however, consider this slight modification of the ER algorithm (Cheng and Victoria-Feser, 2002). If case $i$ is uncontaminated, the data are normal and the missing values are missing at random, then (2.8) is asymptotically $\chi^2_{p_i}$, where $p_i = \dim(y_{[oi]})$. The Wilson-Hilferty transformation of the chi-squared distribution yields $(d_{oi}^2/p_i)^{1/3} \sim N(1 - 2/(9p_i),\, 2/(9p_i))$. Following Little and Smith (1987), we also propose a probability plot of
$$Z_i = \frac{(d_{oi}^2/p_i)^{1/3} - 1 + 2/(9p_i)}{\sqrt{2/(9p_i)}} \tag{2.9}$$
versus standard normal order statistics, which should reveal atypical observations.
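As an illustration, the distance (2.8) and the standardized score (2.9) can be computed as follows. This is a minimal sketch, assuming NumPy and NaN-coded missing values; the function names are hypothetical.

```python
import numpy as np

def observed_mahalanobis_sq(y, mu, sigma):
    """Squared Mahalanobis distance (2.8) on the observed part of y.

    Returns (d2, p_i), where p_i is the number of observed coordinates.
    Missing entries are assumed to be coded as NaN.
    """
    y = np.asarray(y, dtype=float)
    o = ~np.isnan(y)
    r = y[o] - np.asarray(mu, dtype=float)[o]
    s_oo = np.asarray(sigma, dtype=float)[np.ix_(o, o)]
    d2 = float(r @ np.linalg.solve(s_oo, r))
    return d2, int(o.sum())

def wilson_hilferty_score(d2, p_i):
    """Standardized score Z_i of (2.9) for a squared distance d2."""
    center = 1.0 - 2.0 / (9.0 * p_i)
    scale = np.sqrt(2.0 / (9.0 * p_i))
    return ((d2 / p_i) ** (1.0 / 3.0) - center) / scale
```

Because the score accounts for the number $p_i$ of observed coordinates, observations with different missingness patterns can be placed on the same probability plot.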
Little and Smith (1987) proposed as starting point of the ER algorithm the MLE computed on the data in which the missing values have been replaced by the median of the corresponding observations. Although the ER algorithm is relatively simple to implement, it suffers from an important drawback: its breakdown point is at most $1/(p+1)$, because it is based on a weighting scheme that is not redescending. This means that if the proportion of outliers exceeds this value (or is even near it), the estimator is no longer robust. This drawback will be highlighted by the simulation results.
To construct a high breakdown estimator of the mean and covariance matrix of multivariate data when some observations are missing, Cheng and Victoria-Feser (2002) propose two strategies. The first is to provide a high breakdown estimator such as the MCD estimator as starting point for the ER algorithm; the second is to also adapt a high breakdown estimator such as an S-estimator (Rousseeuw and Yohai, 1984) to incomplete data. The resulting estimator, called the ERTBS, is then defined through (2.5) and (2.6) with
$$w_i^{\mu} = w_i^{\eta}/p = w_i^{\delta}\,\tilde{d}_i^2 = \frac{\psi(\tilde{d}_i)}{\tilde{d}_i}, \tag{2.10}$$
$\tilde{d}_i = d_{oi}(\mu, \Sigma)/k$, with $\mu$ and $\Sigma$ being the current values of the high breakdown point estimator, and
$$k = \frac{d_{[q]}}{\sqrt{(\chi^2_p)^{-1}(q/(n+1))}}, \tag{2.11}$$
where $d_{[q]}$ denotes the $q$th ordered distance (based on the $d_{oi}(\mu, \Sigma)$), $q = \lfloor (n+p+1)/2 \rfloor$ with $\lfloor x \rfloor$ denoting the integer part of $x$, and
$$\psi(d; c, M) = \begin{cases} d & 0 \le d < M \\ d\left(1 - \left(\dfrac{d - M}{c}\right)^2\right)^2 & M \le d \le M + c \\ 0 & d > M + c. \end{cases}$$
This $\psi$ function defines the translated biweight S-estimator proposed by Rocke (1996). The parameters $M$ and $c$ control the breakdown point $\varepsilon^*$ and the asymptotic rejection probability (ARP) $\alpha$ of the ERTBS. The ARP can be interpreted as the probability, in large samples under a reference distribution, that the estimator gives a null (or nearly null) weight to an observation. $M$ and $c$ are found implicitly by
$$\varepsilon^* \max_d \rho(d; c, M) = E_{\chi^2_p}\left[\rho(d; c, M)\right], \qquad M + c = \sqrt{(\chi^2_p)^{-1}(1 - \alpha)};$$
where $\rho$ is the primitive of $\psi$ (see Rocke, 1996). The choices of $\varepsilon^*$ and $\alpha$ are to be made by the analyst. The former is the suspected maximal proportion of contaminated data, and for the latter Cheng and Victoria-Feser (2002) propose choices between 0.1% and 1%.
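The redescending shape of the translated biweight is easy to see in code. The sketch below implements the piecewise definition of $\psi(d; c, M)$ given above and the ratio $\psi(d)/d$ that appears in (2.10); it does not solve for $M$ and $c$, and the values used in the usage note are purely illustrative.

```python
def psi_translated_biweight(d, c, M):
    """Translated biweight psi function psi(d; c, M) for a distance d >= 0."""
    if d < 0:
        raise ValueError("d must be non-negative")
    if d < M:
        return d                      # identity below M
    if d <= M + c:
        u = (d - M) / c
        return d * (1.0 - u * u) ** 2  # smooth descent on [M, M + c]
    return 0.0                        # complete rejection beyond M + c

def biweight_ratio(d, c, M):
    """psi(d)/d, the ratio on the right-hand side of (2.10).

    The value at d = 0 is taken as the limit of psi(d)/d, which is 1.
    """
    if d == 0:
        return 1.0
    return psi_translated_biweight(d, c, M) / d
```

With the illustrative values $M = 1$ and $c = 1$, observations at scaled distance below 1 receive ratio 1, the ratio decreases smoothly between 1 and 2, and any observation beyond 2 receives weight 0, which is what makes the scheme redescending.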
As Rocke (1996) noted, it is very important to choose a good starting point for any algorithm defining a high breakdown point estimator, otherwise the latter can lose its high breakdown properties. For the ERTBS, Cheng and Victoria-Feser (2002) therefore propose an adaptation of the MCD estimator as a starting point, as well as an algorithm to compute it. However, to compute the MCD one needs algorithms that are based on random starting subsamples. This can lead to situations in which the MCD is very slow to compute, if not impossible. Therefore, in the following section, we propose a fast algorithm to compute the MCD by adapting the FAST-MCD of Rousseeuw and Van Driessen (1999) and, as an even faster alternative, we propose a modified version of the OGK estimator adapted to the case of missing data, to be used as a starting point for the ERTBS.
3. Starting Point Robust Estimators with Missing Data
3.1. The Modified MCD
The objective of the MCD estimator is to find $h$ observations (out of $n$) whose covariance matrix has the lowest determinant. The MCD mean estimator is then the sample mean of those $h$ points, and the MCD covariance estimator is their sample covariance matrix. To compute the MCD, one needs an algorithm for finding the best subset of $h$ points, which usually involves the repeated computation of the sample mean and covariance as well as Mahalanobis distances. When some observations are missing, Cheng and Victoria-Feser (2002) propose to use the EM algorithm to compute the sample means and covariances at all steps of the algorithm and to base the Mahalanobis distances on the observed part of the observation as in (2.8). The latter are standardized by means of the Wilson-Hilferty transformation given in (2.9), so that one takes into account the unequal number of missing values for each observation.
A choice needs to be made for $h$, and one way is to choose it such that the MCD has the highest breakdown point. In this case, the minimal value of $h$ is (Rousseeuw and Leroy, 1987) $h := \lfloor (n+p+1)/2 \rfloor$. But this is also the choice that gives the largest efficiency loss. So when we suspect that the sample is not heavily contaminated, we can reasonably choose a larger proportion of points, say 75% or 80%, and take $h := \lfloor 0.75n \rfloor$ or $h := \lfloor 0.80n \rfloor$.
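The two ways of choosing $h$ can be written down directly. This is a small sketch; the function name is hypothetical, and clamping a user-supplied fraction to the minimal admissible value is a design choice of the sketch rather than something stated in the text.

```python
import math

def mcd_subset_size(n, p, fraction=None):
    """Subset size h for the MCD.

    With fraction=None, return the minimal value floor((n + p + 1) / 2),
    which gives the highest breakdown point. Otherwise return
    floor(fraction * n), never going below that minimal value.
    """
    h_min = (n + p + 1) // 2
    if fraction is None:
        return h_min
    return max(h_min, math.floor(fraction * n))
```

For $n = 100$ and $p = 10$ this gives $h = 55$ for the highest breakdown point and $h = 75$ when 75% of the points are retained.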
To compute the MCD, Cheng and Victoria-Feser (2002) adapt the forward search algorithm proposed by Atkinson (1993, 1994). However, more recently Rousseeuw and Van Driessen (1999) have proposed a new algorithm called FAST-MCD, which is expected to be even faster than the forward search algorithm and able to deal with very large data sets. In this paper, we propose to adapt it to compute the MCD when there are missing data.
A key idea of the FAST-MCD algorithm is that, starting from any approximation to the MCD, it is possible to find an approximation with a lower determinant. Indeed, Rousseeuw and Van Driessen (1999) observed that from a subset $H_k$ of size $h$, on which $\mu$, $\Sigma$ and the Mahalanobis distances are computed, one can create a subset $H_{k+1}$ by taking, among the $n$ observations, the $h$ ones with the smallest Mahalanobis distances, with the property that the determinant of $\Sigma$ based on $H_{k+1}$ is smaller. Each such step is called a C-step. The initial subset is created by randomly choosing $p+1$ observations, from which the Mahalanobis distances used to order the $n$ observations are computed. The first $h$ ones define the initial subset $H_1$. If the determinant of $\Sigma$ based on the randomly chosen $p+1$ observations is null, one adds one randomly chosen observation at a time until the determinant becomes positive. If a subset $H_k$ contains missing values, we compute $\mu_k$ and $\Sigma_k$ with the EM algorithm. The Mahalanobis distances are then computed on the observed parts as in (2.8) and standardized using the Wilson-Hilferty transformation (2.9). The absolute value of the latter is used to order the observations. With missing data, the initial subset is created by randomly choosing $p+1$ observations among the fully observed ones. For more details, see Copt and Victoria-Feser (2003).
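The C-step described above can be sketched for the complete-data case. This is only an illustration, assuming NumPy and fully observed data: with missing values, the mean and covariance of each subset would instead come from the EM algorithm and the ordering would use the absolute Wilson-Hilferty scores, neither of which is implemented here. All function names are hypothetical.

```python
import numpy as np

def mean_cov(X):
    """Sample mean and covariance matrix of the rows of X."""
    return X.mean(axis=0), np.atleast_2d(np.cov(X, rowvar=False))

def sq_distances(X, mu, sigma):
    """Squared Mahalanobis distances of all rows of X to (mu, sigma)."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)

def c_step(X, subset, h):
    """One C-step: keep the h observations closest to the current subset fit."""
    mu, sigma = mean_cov(X[subset])
    return np.sort(np.argsort(sq_distances(X, mu, sigma))[:h])

def concentrate(X, h, rng, max_iter=100):
    """Iterate C-steps from a random (p + 1)-subset until the subset is stable."""
    n, p = X.shape
    idx = rng.choice(n, size=p + 1, replace=False)
    # If the starting covariance is singular, add random observations one at a time
    while np.linalg.matrix_rank(mean_cov(X[idx])[1]) < p and idx.size < n:
        rest = np.setdiff1d(np.arange(n), idx)
        idx = np.append(idx, rng.choice(rest))
    subset = c_step(X, idx, h)            # this is H_1
    for _ in range(max_iter):
        new_subset = c_step(X, subset, h)
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    return subset
```

In a FAST-MCD style procedure, `concentrate` would be run from many random starts and the subset with the smallest covariance determinant retained; the single run shown here only illustrates the concentration mechanism.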
Through extensive simulations we compare, in Section 4, the forward search algorithm and the FAST-MCD algorithm for the computation of the MCD with missing data.
3.2. The Modified OGK
Maronna and Zamar (2002) base their OGK on the robust estimator of the covariances $\sigma_{jk}$ proposed by Gnanadesikan and Kettenring (1972), which is very simple to compute. Indeed, the latter is defined for a pair of random variables (i.e., $p = 2$) as
$$\frac{1}{4}\left(\sigma(Y_j + Y_k)^2 - \sigma(Y_j - Y_k)^2\right)$$
where $\sigma()$ is a standard deviation function applied to its argument. A robust estimator of $\sigma_{jk}$ is obtained when $\sigma()$ is a robust function. When $p > 2$, the covariance matrix $\Sigma$ is estimated by replacing each of its elements by the corresponding pairwise estimate. It is well known that such an estimator may produce non positive definite matrices and that it is not affine equivariant. To overcome the lack of positive definiteness, Maronna and Zamar (2002) propose an estimator defined by the following four steps:
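1. Let $D = \mathrm{diag}(\sigma(Y_j))|_{j=1,\ldots,p}$ and define $x_i = D^{-1} y_i$, $i = 1, \ldots, n$, i.e., realizations from $X = (X_1, \ldots, X_p)$.
2. Compute the matrix $U = (u_{jk})$ with
$$u_{jk} = \begin{cases} \frac{1}{4}\left(\sigma(X_j + X_k)^2 - \sigma(X_j - X_k)^2\right) & j \neq k \\ 1 & j = k. \end{cases} \tag{3.1}$$
3. Decompose $U$ as $U = E \Lambda E^T$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$.
4. Define $z_i = E^T x_i$, i.e., realizations from $Z = (Z_1, \ldots, Z_p)$, and $A = DE$. The estimator of $\Sigma$ is $A \Gamma A^T$ with $\Gamma = \mathrm{diag}(\sigma(Z_j)^2)|_{j=1,\ldots,p}$.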
A location estimator for $\mu$ is given by $A\nu$ with $\nu = (m(Z_j))|_{j=1,\ldots,p}$, $m()$ being a (robust) mean function. The procedure can be iterated by replacing $U$ in step 2 by $E \Gamma E^T$ until convergence. For the choice of $\sigma()$ and $m()$, see Maronna and Zamar (2002). The latter argue that to improve the efficiency of the OGK, one could use it as a hard rejection tool, in that a reweighted estimator as in (2.1) is used in which $\hat{y}_i = y_i\ \forall i$ and $w_i^{\mu} = w_i^{\eta} = w_i^{\delta} = w_i$ with
$$w_i = \begin{cases} 1 & (y_i - \mu_{OGK})^T \Sigma_{OGK}^{-1} (y_i - \mu_{OGK}) \le \chi^2_p(.9) \\ 0 & \text{otherwise.} \end{cases}$$
The resulting estimator will be called the reweighted OGK (rOGK). Note that this strategy is also used most of the time with the MCD, but with the quantile 0.975 (instead of 0.9) of the $\chi^2_p$. We will call the resulting estimator the rMCD. We have not derived the formal conditions for convergence to a unique solution of the algorithm used to compute the rOGK, but in our simulations we found that it converged in all our samples.
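The four OGK steps, the location estimate $A\nu$ and the hard rejection weights can be sketched for complete data as follows. This is a minimal illustration assuming NumPy; it uses the normalized MAD for $\sigma()$ and the median for $m()$, which are assumptions of the sketch and not the scale and location functions recommended by Maronna and Zamar (2002). The function names are hypothetical, and the $\chi^2_p$ quantile is passed in as `cutoff` rather than computed.

```python
import numpy as np

def mad_scale(x):
    """Normalized median absolute deviation (an illustrative choice of sigma())."""
    med = np.median(x)
    return 1.4826 * np.median(np.abs(x - med))

def ogk(Y, scale=mad_scale, loc=np.median):
    """One pass of the four OGK steps; returns (mu, Sigma). Complete data only."""
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    # Step 1: x_i = D^{-1} y_i with D = diag(sigma(Y_j)); scales must be positive
    d = np.array([scale(Y[:, j]) for j in range(p)])
    X = Y / d
    # Step 2: pairwise Gnanadesikan-Kettenring estimates, as in (3.1)
    U = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            u = 0.25 * (scale(X[:, j] + X[:, k]) ** 2 - scale(X[:, j] - X[:, k]) ** 2)
            U[j, k] = U[k, j] = u
    # Step 3: spectral decomposition U = E Lambda E^T
    _, E = np.linalg.eigh(U)
    # Step 4: z_i = E^T x_i (rows of Z), A = D E, Gamma = diag(sigma(Z_j)^2)
    Z = X @ E
    A = d[:, None] * E
    gamma = np.array([scale(Z[:, j]) ** 2 for j in range(p)])
    nu = np.array([loc(Z[:, j]) for j in range(p)])
    return A @ nu, (A * gamma) @ A.T

def hard_rejection_weights(Y, mu, sigma, cutoff):
    """0/1 weights: 1 when the squared Mahalanobis distance is at most cutoff.

    cutoff is meant to be a chi-squared quantile with p degrees of freedom,
    e.g. obtained from scipy.stats.chi2.ppf(0.9, p).
    """
    diff = np.asarray(Y, dtype=float) - mu
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    return (d2 <= cutoff).astype(float)
```

Since $\Gamma$ has positive diagonal entries and $A$ is invertible whenever all scales are positive, the returned matrix is positive definite by construction, which is the point of the orthogonalization.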
To extend the OGK or rOGK to the case of missing data, we propose to impute the missing values by means of the $\hat{y}_i$ in (2.3) obtained by the EM algorithm, i.e., with $\mu$ and $\Sigma$ estimated by (2.1) where all weights are equal to 1. The reason is that the EM algorithm is very fast, and although it leads to biased estimates of $\mu$ and $\Sigma$ and therefore of the imputed values $\hat{y}_i$, we found through extensive simulation studies that in practice the rOGK is not, or only very little, affected (see Section 4). In future research, we will seek a better adaptation of the OGK to the case of missing data.
4. Simulation Study
4.1. The Design
The model is the multivariate normal distribution $N(\mu, \Sigma)$. Because the OGK is not affine equivariant, following Maronna and Zamar (2002), we choose to simulate correlated data, i.e., $\Sigma = R(\rho)^2$, where $R(\rho)$ has elements $\rho$ for $j \neq k$ and 1 for $j = k$, with $\rho = 0.2$. We also chose $\mu = 0$. The data were contaminated by so-called shift outliers (Woodruff and Rocke, 1993), i.e., a proportion $\epsilon$ of the data was generated from $N\left(\sqrt{p + \beta/\sqrt{2}}\; e_p, \Sigma\right)$, where $e_p$ is a $p$-dimensional vector of ones. We set $\beta = 1.6$ and $\epsilon = 0$, 0.02, 0.05 and 0.1. The missing data, if any, were chosen randomly among the mixture of the good and the bad data.
We chose proportions of missing data of 0.1, 0.2 and 0.3. Table 1 shows the different values of $n$ and $p$ used in the simulations.

Table 1. Values for n and p.
         p = 10   p = 20   p = 50
  n =      50      100      200
  n =     100      200      400
  n =     500      500      600

Each robust estimator requires a decision on its initialization parameters. For the MCD estimator, $h = [0.6n]$ was chosen. For the OGK, $c_1 = 4.5$ and $c_2 = 3$ were chosen (see Maronna and Zamar, 2002). For the ERTBS estimator we chose for our simulations the breakdown point $\varepsilon^* = 0.3$ and the ARP $\alpha = 0.001$. All computational experiments were run on an Athlon 1900 MHz with 512 MB of memory. The core of the program was written in Fortran 77, and Splus was used as a front end (to produce the various graphics). For all combinations of parameters, 1000 samples were generated.
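A data set following this design can be generated along the following lines. This is a hedged sketch assuming NumPy, not the authors' Fortran/Splus code: the size of the mean shift is left as a free parameter `shift` (a scalar multiplying $e_p$), missing cells are drawn completely at random over the whole data matrix, and missing values are coded as NaN. All names are hypothetical.

```python
import numpy as np

def simulate_dataset(n, p, eps, miss, shift, rho=0.2, rng=None):
    """Generate one contaminated sample with missing cells.

    Returns (Y, is_outlier): Y is n x p with NaN for missing cells, and
    is_outlier flags the rows drawn from the shifted distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sigma = R(rho)^2 with R(rho) having 1 on the diagonal and rho elsewhere
    R = np.full((p, p), rho)
    np.fill_diagonal(R, 1.0)
    sigma = R @ R
    Y = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    # Shift outliers: same covariance, mean moved to shift * e_p
    n_out = int(round(eps * n))
    is_outlier = np.zeros(n, dtype=bool)
    is_outlier[:n_out] = True
    Y[is_outlier] += shift
    # Missing cells chosen at random among good and bad observations alike
    n_miss = int(round(miss * n * p))
    cells = rng.choice(n * p, size=n_miss, replace=False)
    Y.flat[cells] = np.nan
    return Y, is_outlier
```

Note that drawing the missing cells over the whole matrix can, in principle, leave a row with no observed value; a production version would need to decide how to handle such rows.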
4.2. Computational Times
The time needed to compute the ER and the ERTBS depends essentially on the choice of the starting point. Therefore, we report here the time (as a function of the sample size $n$) needed to compute the rOGK or the rMCD, when the latter is computed using the adapted FAST-MCD algorithm (rMCD/FAST) or the forward search algorithm (rMCD/FWD). We chose the reweighted versions of the two starting point estimators because, as shown in Copt and Victoria-Feser (2003), the non-reweighted versions can lead to biased estimates. For each proportion of missing data, each proportion of contamination and the values of Table 1, a time in seconds was computed. Figure 1 shows the results (on a log scale) for the datasets with 10% of contamination and 30% of missing data (for other combinations the results are comparatively similar).

[Figure 1: three panels, rMCD/FAST, rOGK and rMCD/FWD, each plotting log(time in seconds) against the number of observations for p = 10, p = 20 and p = 50.]
Figure 1. Log of mean time in seconds needed to compute the rOGK and the rMCD by means of the forward search (FWD) algorithm and the FAST-MCD algorithm, as a function of the sample size and the data dimension p.

We notice the following features. As expected, the rMCD/FWD is slower than the rMCD/FAST, with a difference that increases with the sample size. The rMCD/FAST can be up to 150 times faster than the rMCD/FWD. When the rOGK is used as a starting point, the computational times decrease drastically, sometimes by a factor of 18 compared with the rMCD/FAST. However, the speed of the rMCD/FAST does not depend very much on the sample size $n$, whereas that of the rOGK does quite substantially.
4.3. Comparing Estimators
The aim of this subsection is to study, by means of simulations, the robustness properties (bias versus efficiency) of the different estimators proposed for incomplete data. For the MCD, all calculations were made using the modified FAST-MCD for missing data. It should be stressed that this exercise was not carried out in Cheng and Victoria-Feser (2002). Copt and Victoria-Feser (2003) compare the behavior of the MCD, rMCD, OGK and rOGK. They conclude that both the rMCD and the rOGK behave nicely (in terms of bias) even with correlated data ($\Sigma = R(.2)^2$), whereas the OGK estimates of the variances and covariances can be biased with contaminated data.
The estimators we consider here are the final estimators, namely the MLE computed via the EM algorithm (which is taken as a benchmark), the ER algorithm with the MLE as starting point (ER/MLE), the ER algorithm with the rMCD and the rOGK as starting points (ER/rMCD and ER/rOGK), and the ERTBS algorithm with the rMCD and the rOGK as starting points (ERTBS/rMCD and ERTBS/rOGK). The data were generated using the designs presented in Section 4.1. The percentage of missing observations and the sizes $n$ and $p$ do not seem to have an influence on the behavior of the different estimators. The influential factors are the percentage of contamination and, potentially, the covariance structure when the rOGK is chosen as starting point. We therefore chose the $N(0, R(.2)^2)$ as data generating model. We use boxplots to compare the estimators. They are built on the estimated biases of one of the elements of the mean vector, one of the diagonal elements of the covariance matrix and one of the off-diagonal elements of the covariance matrix. Only the results for $\mu_1$, $\sigma_{11}$ and $\sigma_{12}$ are represented, since the same pattern is found for the other parameters.
Figures 2 and 3 show the boxplots of the sampling distributions of the final estimators for different amounts of data contamination. The MLE clearly fails even if the contamination is small. It is the most efficient estimator with no contamination, but the efficiency loss of the robust estimators seems to be quite small. The ER/MLE breaks down at 5% of data contamination. Finally, the ER/rMCD, ER/rOGK, ERTBS/rMCD and ERTBS/rOGK are very robust and can withstand at least 10% of data contamination.
If we want to see a difference between the ER and the ERTBS with the same high breakdown starting point, we have to push the percentage of contamination up to 30%. We have not fully covered such situations, since it is very unlikely that someone will want to study data sets with such a percentage of contamination. However, Copt and Victoria-Feser (2003) show a simulated example with 30% of contaminated data in which the ER/rOGK (and also the ER/rMCD) clearly fails to detect the outliers, whereas the ERTBS/rOGK is not affected by their presence in the sample.
[Figure 2: boxplots for $\mu_1$, $\sigma_{11}$ and $\sigma_{12}$ of the EM, ER/EM, ER/rMCD, ER/rOGK, ERTBS/rMCD and ERTBS/rOGK estimators, in two panels (a) and (b).]
Figure 2. Sampling distribution of the final high breakdown estimators with missing data for (a) 0% and (b) 2% of data contamination.
5. Conclusion
In this paper, we have considered (efficient) high breakdown estimators of the mean and covariance of a multivariate normal distribution with missing data. The computational speed of the estimators depends on the computational speed of their starting point. The fastest is the modified OGK, although its speed depends on the sample size, which is much less the case for the modified MCD computed with the FAST-MCD algorithm. Regarding performance in terms of bias, efficiency and breakdown point, the conclusion is that a high breakdown estimator is crucial as starting point; among these, the (modified) OGK can be biased when the data are correlated but the (modified) rOGK is not. Moreover, the ERTBS has a larger breakdown point than the ER. The EM is not robust and the ER/EM has a very small breakdown point. Finally, under no contamination, the EM is the most efficient estimator but the ERTBS has a very small efficiency loss.
[Figure 3: boxplots for $\mu_1$, $\sigma_{11}$ and $\sigma_{12}$ of the EM, ER/EM, ER/rMCD, ER/rOGK, ERTBS/rMCD and ERTBS/rOGK estimators, in two panels (a) and (b).]
Figure 3. Sampling distribution of the final high breakdown estimators with missing data for (a) 5% and (b) 10% of data contamination.
References
[1] A.C. Atkinson, Stalactite Plots and Robust Estimation for the Detection of Multivariate Outliers. In S. Morgenthaler, E. Ronchetti, and W.A. Stahel, editors, New Directions in Statistical Data Analysis and Robustness. Birkhäuser, Basel, 1993.
[2] A.C. Atkinson, Fast Very Robust Methods for the Detection of Multiple Outliers. J. Amer. Statist. Assoc. 89 (1994), 1329–1339.
[3] T.-C. Cheng and M.-P. Victoria-Feser, High Breakdown Estimation of Multivariate Location and Scale with Missing Observations. British J. Math. Statist. Psych. 55 (2002), 317–335.
[4] S. Copt and M.-P. Victoria-Feser, Fast Algorithms for Computing High Breakdown Covariance Matrices with Missing Data. Cahiers du département d’Econométrie no 2003.04, University of Geneva, CH-1211 Geneva, 2003, available at http://www.unige.ch/ses/metri.
[5] P.L. Davies, Asymptotic Behaviour of S-estimators of Multivariate Location Parameters and Dispersion Matrices. Ann. Statist. 15 (1987), 1269–1292.
[6] A.P. Dempster, M.N. Laird, and D.B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B 39 (1977), 1–22.
[7] R. Gnanadesikan and J.R. Kettenring, Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics 28 (1972), 81–124.
[8] P.J. Huber, Robust Statistics. Wiley, New York, 1981.
[9] R.J.A. Little and D.B. Rubin, Statistical Analysis with Missing Data. Wiley, New York, 1987.
[10] R.J.A. Little and P.J. Smith, Editing and Imputing for Quantitative Survey Data. J. Amer. Statist. Assoc. 82 (1987), 58–68.
[11] R.A. Maronna and R.H. Zamar, Robust Multivariate Estimates for High-Dimensional Datasets. Technometrics 44 (2002), 307–317.
[12] D.M. Rocke, Robustness Properties of S-estimators of Multivariate Location and Shape in High Dimension. Ann. Statist. 24 (1996), 1327–1345.
[13] P.J. Rousseeuw, Least Median of Squares Regression. J. Amer. Statist. Assoc. 79 (1984), 871–880.
[14] P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection. Wiley, New York, 1987.
[15] P.J. Rousseeuw and K. Van Driessen, A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics 41 (1999), 212–223.
[16] P.J. Rousseeuw and V.J. Yohai, Robust Regression by means of S-estimators. In J. Franke, W. Härdle, and D. Martin, editors, Robust and Nonlinear Time Series Analysis, pages 256–272. Springer-Verlag, New York, 1984.
[17] D.B. Rubin, Inference and Missing Data. Biometrika 63 (1976), 581–592.
[18] D.L. Woodruff and D.M. Rocke, Heuristic Search Algorithm for the Minimum Volume Ellipsoid. J. Comput. Graph. Statist. 2 (1993), 69–95.
[19] D.L. Woodruff and D.M. Rocke, Computable Robust Estimation of Multivariate Location and Shape in High Dimension using Compound Estimators. J. Amer. Statist. Assoc. 89 (1994), 888–896.
Acknowledgements
Both authors are grateful to two anonymous referees for their constructive comments.
S. Copt and M.-P. Victoria-Feser
Department of Econometrics and HEC
University of Geneva
40, Bd. du Pont d’Arve
CH 1211 Geneva
Switzerland
e-mail: {maria-pia.victoriafeser, SamuelCopt}@hec.unige.ch