Fast algorithms for computing high breakdown covariance matrices with missing data


Book Chapter


COPT, Samuel, VICTORIA-FESER, Maria-Pia

Abstract

Robust estimation of covariance matrices when some of the data at hand are missing is an important problem. It has been studied by Little and Smith (1987) and more recently by Cheng and Victoria-Feser (2002). The latter propose the use of high breakdown estimators and so-called hybrid algorithms (see, e.g., Woodruff and Rocke, 1994). In particular, the minimum volume ellipsoid of Rousseeuw (1984) is adapted to the case of missing data. To compute it, they use (a modified version of) the forward search algorithm (see e.g. Atkinson, 1994). In this paper, we propose to use instead a modification of the C-step algorithm proposed by Rousseeuw and Van Driessen (1999) which is actually a lot faster. We also adapt the orthogonalized Gnanadesikan-Kettenring (OGK) estimator proposed by Maronna and Zamar (2002) to the case of missing data and use it as a starting point for an adapted S-estimator.

Moreover, we conduct a simulation study to compare different robust estimators in terms of their efficiency and breakdown.

COPT, Samuel, VICTORIA-FESER, Maria-Pia. Fast algorithms for computing high breakdown covariance matrices with missing data. In: Hubert, M., Pison, G., Struyf, A. and Van Aelst, S. Theory and Applications of Recent Robust Methods. Basel: Birkhäuser, 2004. p. 71-82

Available at:

http://archive-ouverte.unige.ch/unige:6502



Mathematics Subject Classification (2000). Primary 62G35; Secondary 62H12.

Keywords. C-step algorithm, minimum covariance determinant, outliers, robust statistics, S-estimators, orthogonalized Gnanadesikan-Kettenring robust estimator.

1. Introduction

The statistical literature contains several proposals for high breakdown estimators of the mean and covariance of multivariate data when it is suspected that the data contain outliers or extreme observations. A well-known one is the minimum covariance determinant (MCD) of Rousseeuw (1984). However, little has been done to also consider the case of missing data, which is very common in practice. Only Little and Smith (1987) and Cheng and Victoria-Feser (2002) propose solutions. In this paper we concentrate on robust estimators with missing data. We propose faster algorithms for their computation and compare them through extensive simulations, both in terms of their robustness properties when data are contaminated and in terms of the speed of two different algorithms used to compute the robust estimators. In particular, we adapt the orthogonalized Gnanadesikan-Kettenring (OGK) estimator proposed by Maronna and Zamar (2002) to the case of missing data and use it as a starting point for an adapted S-estimator. All our programs are available (upon request) in the form of an Splus library, which has been used to produce the results and graphics presented in this paper.

Both authors acknowledge the financial support of the Swiss National Science Foundation (grant 610-057883.99).

2. A General Class of Estimators with Missing Data

The aim is to estimate the parameters $\mu$ and $\Sigma$, i.e., the mean and covariance of an underlying multivariate variable $Y = (Y_1, \ldots, Y_p)$ that has supposedly generated the sample $y_i$, $i = 1, \ldots, n$ at hand. As often happens in practice, we suppose that some of the observations might be missing, in that for each $i$ the $y_{ij}$ are observed for some $j \in \{1, \ldots, p\}$ and missing for the other $j$'s. In other terms, $y_i = [y_{[oi]}^T, y_{[mi]}^T]^T$, so that a distinction is made between the observed ($oi$) and the missing ($mi$) data. We suppose that the data are missing at random (see Rubin, 1976), a sufficient condition for correct likelihood-based inferences. Most known estimators of mean and covariance with missing data fall in the class proposed by Cheng and Victoria-Feser (2002), i.e.,

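$$\frac{1}{n}\sum_{i=1}^{n} w_i^{\mu}\,(\mu - \hat{y}_i) = 0 \tag{2.1}$$

$$\frac{1}{n}\sum_{i=1}^{n} \left[ w_i^{\delta}\,\Sigma - w_i^{\eta}\left( (\hat{y}_i - \mu)(\hat{y}_i - \mu)^T - C_i \right) \right] = 0 \tag{2.2}$$

where

$$\hat{y}_i = \left[ y_{[oi]}^T,\; E\left( y_{[mi]} \mid y_{[oi]}, \mu, \Sigma \right)^T \right]^T = \left[ y_{[oi]}^T,\; \mu_{[mi]}^T + (y_{[oi]} - \mu_{[oi]})^T \Sigma_{[ooi]}^{-1} \Sigma_{[omi]} \right]^T, \tag{2.3}$$

and

$$C_i = \begin{bmatrix} 0 & 0 \\ 0 & \mathrm{cov}\left( y_{[mi]}, y_{[mi]}^T \mid y_{[oi]}, \mu, \Sigma \right) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & \Sigma_{[mmi]} - \Sigma_{[moi]} \Sigma_{[ooi]}^{-1} \Sigma_{[omi]} \end{bmatrix} \tag{2.4}$$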

where, for example, $\Sigma_{[ooi]}$ denotes the partition of $\Sigma$ corresponding to the observed part of $y_i$, etc. The different estimators are defined through the data weighting system given by $w_i^{\mu}$, $w_i^{\delta}$ and $w_i^{\eta}$ in (2.1) and (2.2), which in turn also depend on the parameters $\mu$ and $\Sigma$ (see below). To compute the estimators, one can use an iterative procedure in which, given current estimates $\mu^{(t)}$ and $\Sigma^{(t)}$, the $\hat{y}_i$, $C_i$ and the weights are first computed, and the values are then updated by

$$\mu^{(t+1)} = \frac{\frac{1}{n}\sum_{i=1}^{n} w_i^{\mu}\,\hat{y}_i}{\frac{1}{n}\sum_{i=1}^{n} w_i^{\mu}} \tag{2.5}$$

$$\Sigma^{(t+1)} = \frac{\frac{1}{n}\sum_{i=1}^{n} w_i^{\eta}\left( (\hat{y}_i - \mu^{(t+1)})(\hat{y}_i - \mu^{(t+1)})^T - C_i \right)}{\frac{1}{n}\sum_{i=1}^{n} w_i^{\delta}}. \tag{2.6}$$

We have not worked out the conditions under which (2.1) and (2.2) have a unique solution and (2.5) and (2.6) converge to that unique solution. However, the reader is referred to Davies (1987) for general conditions for S-estimators.
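As an illustration of the quantities in (2.3) and (2.4), the following Python sketch computes the imputed vector and the correction matrix for a single observation, given values of the mean and covariance. It is only a sketch under stated assumptions: it relies on NumPy, missing entries are coded as NaN, and the function name `impute_conditional` is hypothetical and not part of the Splus library mentioned in the Introduction.

```python
import numpy as np

def impute_conditional(y, mu, sigma):
    """Return (y_hat, C) for one observation y, with NaN marking missing entries.

    y_hat follows (2.3): observed entries are kept, missing entries are
    replaced by their conditional mean given the observed part.
    C follows (2.4): zero everywhere except the missing/missing block, which
    holds the conditional covariance of the missing part.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    miss = np.isnan(y)
    obs = ~miss
    y_hat = y.copy()
    C = np.zeros_like(sigma)
    if not miss.any():
        return y_hat, C                      # complete case: nothing to impute
    if not obs.any():
        return mu.copy(), sigma.copy()       # nothing observed: unconditional moments
    s_oo = sigma[np.ix_(obs, obs)]
    s_om = sigma[np.ix_(obs, miss)]
    s_mm = sigma[np.ix_(miss, miss)]
    # B = Sigma_oo^{-1} Sigma_om, obtained with a linear solve rather than an inverse
    B = np.linalg.solve(s_oo, s_om)
    y_hat[miss] = mu[miss] + (y[obs] - mu[obs]) @ B
    C[np.ix_(miss, miss)] = s_mm - s_om.T @ B
    return y_hat, C
```

For instance, with mean zero, variances 1 and 2 and covariance 0.5, an observation whose first coordinate equals 2 and whose second coordinate is missing is imputed by 0.5 × 2 = 1, with conditional variance 2 − 0.25 = 1.75.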

The classical MLE is obtained when $w_i^{\mu} = w_i^{\eta} = w_i^{\delta} = 1\ \forall i$, and (2.5) and (2.6) then define the EM algorithm (Dempster et al., 1977). However, with complete data it is well known that the MLE of mean and covariance is not robust. When there are missing data, the situation does not change; see Cheng and Victoria-Feser (2002). Little and Rubin (1987) propose to base the M-step on a robust estimator belonging to the general class of M-estimators (Huber, 1981). They call the resulting procedure the ER algorithm. Their estimator is defined by (2.5) and (2.6) with

$$(w_i^{\mu})^2 = w_i^{\eta} = w_i^{\delta} = w_i = \omega(d_{oi})/d_{oi}^2 \tag{2.7}$$

where

$$d_{oi}^2 = d_{oi}^2(\mu, \Sigma) = (y_{[oi]} - \mu_{[oi]})^T \Sigma_{[ooi]}^{-1} (y_{[oi]} - \mu_{[oi]}) \tag{2.8}$$

is the squared Mahalanobis distance corresponding to the observed part of $y_i$. See Little and Smith (1987) for the choice of the weight function $\omega$. The iteration step for the covariance matrix (2.6) does not exactly correspond to the same step in the ER algorithm, in that in the latter the weights $w_i^{\eta}$ are not applied to the correction matrix $C_i$. In what follows we will nevertheless consider this slight modification of the ER algorithm (Cheng and Victoria-Feser, 2002). If case $i$ is uncontaminated, the data are normal and missing values are missing at random, then (2.8) is asymptotically $\chi^2_{p_i}$, where $p_i = \dim(y_{[oi]})$. The Wilson-Hilferty transformation of the chi-squared distribution yields $(d_{oi}^2/p_i)^{1/3} \sim N(1 - 2/(9p_i),\, 2/(9p_i))$. Following Little and Smith (1987), we also propose a probability plot of

$$Z_i = \frac{(d_{oi}^2/p_i)^{1/3} - 1 + 2/(9p_i)}{\sqrt{2/(9p_i)}} \tag{2.9}$$

versus standard normal order statistics, which should reveal atypical observations.

Little and Smith (1987) proposed as starting point of the ER algorithm the MLE on the data where the missing values have been replaced by the median of the corresponding observations. Although the ER algorithm is relatively simple to implement, it suffers from an important drawback: its breakdown point is at most $1/(p+1)$ because it is based on a weighting scheme that is not redescending. This means that if the proportion of outliers exceeds this value (or is even near it), the estimator is no longer robust. This drawback will be highlighted by the simulation results.
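The standardization in (2.9) can be sketched in a few lines of Python. The snippet below takes a squared distance on the observed coordinates and the number of observed coordinates, and returns the Wilson-Hilferty score; it divides by the standard deviation implied by the normal approximation stated above. The function name `wilson_hilferty_z` is hypothetical and only the standard library is assumed.

```python
import math

def wilson_hilferty_z(d2, p_obs):
    """Standardized score (2.9) for a squared distance d2 on p_obs observed coordinates."""
    if p_obs < 1:
        raise ValueError("at least one coordinate must be observed")
    if d2 < 0:
        raise ValueError("a squared distance cannot be negative")
    v = 2.0 / (9.0 * p_obs)  # variance of (chi2_p / p)^(1/3) under the approximation
    # centre at the approximate mean 1 - v and scale by the standard deviation sqrt(v)
    return ((d2 / p_obs) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
```

Because the score is centred and scaled separately for each value of the number of observed coordinates, observations with different numbers of missing values can be placed on the same probability plot.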

To construct a high breakdown estimator of the mean and covariance matrix in multivariate data when some observations are missing, Cheng and Victoria-Feser (2002) propose two strategies. The first is to provide a high breakdown estimator such as the MCD estimator as starting point for the ER algorithm, and the second is to also adapt a high breakdown estimator such as an S-estimator (Rousseeuw and Yohai, 1984) to incomplete data. The resulting estimator, which is called the ERTBS, is then defined through (2.5) and (2.6) with

$$w_i^{\mu} = w_i^{\eta}/p = w_i^{\delta}\,\tilde{d}_i^2 = \frac{\psi(\tilde{d}_i)}{\tilde{d}_i}, \tag{2.10}$$

$\tilde{d}_i = d_{oi}(\hat{\mu}, \hat{\Sigma})/k$, with $\hat{\mu}$ and $\hat{\Sigma}$ being the current values of the high breakdown point estimator, and

$$k = \frac{d_{[q]}}{\sqrt{(\chi^2_p)^{-1}(q/(n+1))}}, \tag{2.11}$$

where $d_{[q]}$ denotes the $q$th ordered distance (based on the $d_{oi}(\hat{\mu}, \hat{\Sigma})$), $q = \lfloor (n+p+1)/2 \rfloor$ with $\lfloor x \rfloor$ denoting the integer part of $x$, and

$$\psi(d; c, M) = \begin{cases} d & 0 \le d < M \\ d\left(1 - \left(\dfrac{d-M}{c}\right)^2\right)^2 & M \le d \le M+c \\ 0 & d > M+c. \end{cases}$$

This $\psi$ function defines the translated biweight S-estimator proposed by Rocke (1996). The parameters $M$ and $c$ control the breakdown point $\varepsilon^*$ and the asymptotic rejection probability (ARP) $\alpha$ of the ERTBS. The ARP can be interpreted as the probability for an estimator, in large samples under a reference distribution, to give a null (or nearly null) weight. $M$ and $c$ are found implicitly by

$$\varepsilon^* \max_d \rho(d; c, M) = E_{\chi^2_p}[\rho(d; c, M)], \qquad M + c = \sqrt{(\chi^2_p)^{-1}(1-\alpha)};$$

where $\rho$ is the primitive of $\psi$ (see Rocke, 1996). The choices for $\varepsilon^*$ and $\alpha$ are to be made by the analyst. The former is the suspected maximal proportion of contaminated data, and for the latter Cheng and Victoria-Feser (2002) propose choices between 0.1% and 1%.
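To make the shape of the translated biweight concrete, here is a minimal Python sketch of the ψ function defined above, together with the ratio ψ(d)/d that appears on the right-hand side of (2.10). The function names are hypothetical, and the values of c and M used in the example are arbitrary illustrations, not values solved from the breakdown point and ARP conditions.

```python
def psi_translated_biweight(d, c, M):
    """Translated biweight psi(d; c, M): identity up to M, smooth descent on [M, M + c], then 0."""
    if d < 0:
        raise ValueError("d is a distance and cannot be negative")
    if c <= 0:
        raise ValueError("c must be positive")
    if d < M:
        return float(d)
    if d <= M + c:
        u = (d - M) / c
        return d * (1.0 - u * u) ** 2
    return 0.0

def weight_translated_biweight(d, c, M):
    """psi(d)/d, set to 1 at d = 0 by continuity (assumes M > 0)."""
    if d == 0:
        return 1.0
    return psi_translated_biweight(d, c, M) / d
```

With c = 2 and M = 3, for example, observations at distance below 3 receive weight 1, an observation at distance 4 receives weight (1 − 0.25)² = 0.5625, and observations at distance 5 or more receive weight 0, which is the redescending behaviour that the ER weights lack.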

As Rocke (1996) noted, it is very important to choose a good starting point for any algorithm defining a high breakdown point estimator, otherwise the latter can lose its high breakdown properties. For the ERTBS, Cheng and Victoria-Feser (2002) therefore propose an adaptation of the MCD estimator as a starting point, as well as an algorithm to compute it. However, to compute the MCD one needs algorithms that are based on random starting subsamples. This can lead to situations in which the MCD is very slow to compute, if not impossible. Therefore, in the following section, we propose a fast algorithm to compute the MCD by adapting the FAST-MCD of Rousseeuw and Van Driessen (1999) and, as an even faster alternative, we propose a modified version of the OGK estimator adapted to the case of missing data, to be used as a starting point for the ERTBS.

3. Starting Point Robust Estimators with Missing Data

3.1. The Modified MCD

The objective of the MCD estimator is to find $h$ observations (out of $n$) whose covariance matrix has the lowest determinant. The MCD mean estimator is then the sample mean of those $h$ points, and the MCD covariance estimator is their sample covariance matrix. To compute the MCD, one needs an algorithm for finding the best subset of $h$ points, which usually involves the repeated computation of the sample mean and covariance as well as Mahalanobis distances. When some observations are missing, Cheng and Victoria-Feser (2002) propose to use the EM algorithm to compute the sample means and covariances at all steps of the algorithm and to base the Mahalanobis distances on the observed part of the observation as in (2.8). The latter are standardized by means of the Wilson-Hilferty transformation given in (2.9), so that one takes into account the unequal number of missing values for each observation.

A choice needs to be made for $h$, and one way is to choose it such that the MCD has the highest breakdown point. In this case, the minimal value of $h$ is (Rousseeuw and Leroy, 1987) $h := \lfloor (n+p+1)/2 \rfloor$. But this is also the choice that gives the largest efficiency loss. So when we suspect that the sample is not heavily contaminated, we can reasonably choose a larger proportion of points, say 75% or 80%, and take $h := 0.75n$ or $h := 0.80n$.
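The arithmetic behind the choice of h is simple enough to spell out. The Python sketch below returns the highest-breakdown value by default and a fixed fraction of n otherwise; rounding the fraction down to an integer is an assumption of this sketch, and the function name `mcd_subset_size` is hypothetical.

```python
import math

def mcd_subset_size(n, p, fraction=None):
    """Size h of the MCD subset.

    With fraction=None, return the integer part of (n + p + 1) / 2, the value
    associated with the highest breakdown point. Otherwise return
    floor(fraction * n), e.g. fraction=0.75 for a less contaminated sample.
    """
    if fraction is None:
        return (n + p + 1) // 2
    return math.floor(fraction * n)
```

For n = 100 and p = 10 this gives h = 55 for the highest breakdown choice and h = 75 when 75% of the points are retained.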

To compute the MCD, Cheng and Victoria-Feser (2002) adapt the forward search algorithm proposed by Atkinson (1993, 1994). However, more recently Rousseeuw and Van Driessen (1999) have proposed a new algorithm called FAST-MCD, which is supposed to be even faster than the forward search algorithm and able to deal with very large data sets. In this paper, we propose to adapt it to compute the MCD when there are missing data.

A key idea of the FAST-MCD algorithm is that, starting from any approximation to the MCD, it is possible to find an approximation with a lower determinant. Indeed, Rousseeuw and Van Driessen (1999) observed that from a subset $H_k$ of size $h$, on which $\hat{\mu}$, $\hat{\Sigma}$ and the Mahalanobis distances are computed, one can create a subset $H_{k+1}$ by taking among the $n$ observations the $h$ ones with the smallest Mahalanobis distances, with the property that the determinant of $\hat{\Sigma}$ based on $H_{k+1}$ is smaller or equal. Each such step is called a C-step. The initial subset is created by randomly choosing $p+1$ observations, on which the Mahalanobis distances are computed to order the $n$ observations. The first $h$ ones define the initial subset $H_1$. If the determinant of $\hat{\Sigma}$ based on the randomly chosen $p+1$ observations is null, one adds one randomly chosen observation at a time until the determinant becomes positive. If for any subset $H_k$ there are missing values, we compute $\hat{\mu}_k$ and $\hat{\Sigma}_k$ with the EM algorithm. The Mahalanobis distances are also changed as in (2.8) and standardized using the Wilson-Hilferty transformation. The absolute value of the latter is used to order the observations. With missing data, the initial subset is created by randomly choosing $p+1$ observations among the fully observed ones. For more details, see Copt and Victoria-Feser (2003).
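The C-step itself can be sketched as follows in Python for complete data. This is deliberately not the missing-data version described above: the sample mean and covariance stand in for the EM estimates, and plain squared Mahalanobis distances stand in for the Wilson-Hilferty standardized distances on the observed parts. NumPy is assumed, at least two variables are assumed, and all function names are hypothetical.

```python
import numpy as np

def subset_logdet(X, subset):
    """Log-determinant of the sample covariance of the rows of X indexed by subset."""
    sign, logdet = np.linalg.slogdet(np.cov(X[subset], rowvar=False))
    if sign <= 0:
        raise ValueError("covariance of the subset is singular")
    return logdet

def c_step(X, subset, h):
    """One C-step: fit mean and covariance on subset, keep the h closest of all n points."""
    mu = X[subset].mean(axis=0)
    sigma = np.cov(X[subset], rowvar=False)
    diff = X - mu
    # squared Mahalanobis distances of all n observations to the current fit
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    return np.sort(np.argsort(d2, kind="stable")[:h])

def iterate_c_steps(X, subset, h, max_iter=100):
    """Repeat C-steps until the subset no longer changes (or max_iter is reached)."""
    subset = np.sort(np.asarray(subset))
    for _ in range(max_iter):
        new_subset = c_step(X, subset, h)
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    return subset
```

In a full algorithm these iterations would be started from several random initial subsets and the solution with the smallest determinant retained; the sketch only shows the monotone improvement obtained from a single start.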

Through extensive simulations we compare, in Section 4, the forward search algorithm and the FAST-MCD algorithm for the computation of the MCD with missing data.

3.2. The Modified OGK

Maronna and Zamar (2002) base their OGK on the robust estimator for covariances $\sigma_{jk}$ proposed by Gnanadesikan and Kettenring (1972), which is very simple to compute. Indeed, the latter is defined for a pair of random variables (i.e., $p = 2$) as

$$\frac{1}{4}\left[ \sigma(Y_j + Y_k)^2 - \sigma(Y_j - Y_k)^2 \right]$$

where $\sigma()$ is a standard deviation function applied to its argument. A robust estimator for $\sigma_{jk}$ is obtained when $\sigma()$ is a robust function. When $p > 2$, the covariance matrix $\Sigma$ is estimated by replacing all its elements by their pairwise estimates. It is well known that such an estimator may produce non positive definite matrices and that it is not affine equivariant. To overcome the lack of positive definiteness, Maronna and Zamar (2002) propose an estimator defined by the following four steps:

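1. Let $D = \mathrm{diag}(\sigma(Y_j))_{j=1,\ldots,p}$ and define $x_i = D^{-1} y_i$, $i = 1, \ldots, n$, i.e., realizations from $X = (X_1, \ldots, X_p)$.

2. Compute the matrix $U = (u_{jk})$ with

$$u_{jk} = \begin{cases} \frac{1}{4}\left[ \sigma(X_j + X_k)^2 - \sigma(X_j - X_k)^2 \right] & j \neq k \\ 1 & j = k. \end{cases} \tag{3.1}$$

3. Decompose $U$ as $U = E \Lambda E^T$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$.

4. Define $z_i = E^T x_i$, i.e., realizations from $Z = (Z_1, \ldots, Z_p)$, and $A = DE$. The estimator of $\Sigma$ is $A \Gamma A^T$ with $\Gamma = \mathrm{diag}(\sigma(Z_j)^2)_{j=1,\ldots,p}$.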

A location estimator for $\mu$ is given by $A\nu$ with $\nu = (m(Z_j))_{j=1,\ldots,p}$, $m()$ being a (robust) mean function. The procedure can be iterated by replacing $U$ in step 2 by $E \Gamma E^T$ until convergence. For the choice of $\sigma()$ and $m()$, see Maronna and Zamar (2002). The latter argue that to improve the efficiency of the OGK, one could use it as a hard rejection tool, in that a reweighted estimator as in (2.1) is used in which $\hat{y}_i = y_i\ \forall i$ and $w_i^{\mu} = w_i^{\eta} = w_i^{\delta} = w_i$ with

$$w_i = \begin{cases} 1 & (y_i - \hat{\mu}_{OGK})^T \hat{\Sigma}_{OGK}^{-1} (y_i - \hat{\mu}_{OGK}) \le \chi^2_p(.9) \\ 0 & \text{otherwise.} \end{cases}$$

The resulting estimator will be called the reweighted OGK (rOGK). Note that this strategy is also used most of the time with the MCD, but with the quantile 0.975 (instead of 0.9) of the $\chi^2_p$. We will call the resulting estimator the rMCD. We have not derived the formal conditions for convergence to a unique solution of the algorithm used to compute the rOGK, but in our simulations we found that it converged in all our samples.
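The four OGK steps and the associated location estimator can be sketched in Python for complete data as follows. The sketch assumes NumPy and, purely for illustration, uses the normalized median absolute deviation for the scale function and the median for the location function; these are stand-ins and not the choices made by Maronna and Zamar (2002). Neither the iteration of the procedure nor the reweighting step is included, and the function names are hypothetical.

```python
import numpy as np

def mad_scale(x):
    """Normalized median absolute deviation, used here as the robust sigma()."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def ogk(Y, scale=mad_scale, loc=np.median):
    """Single pass of the four OGK steps; returns (location, covariance)."""
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    # Step 1: D = diag(sigma(Y_j)), x_i = D^{-1} y_i
    d = np.array([scale(Y[:, j]) for j in range(p)])
    X = Y / d
    # Step 2: pairwise Gnanadesikan-Kettenring estimates on the scaled data
    U = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            u = 0.25 * (scale(X[:, j] + X[:, k]) ** 2 - scale(X[:, j] - X[:, k]) ** 2)
            U[j, k] = U[k, j] = u
    # Step 3: spectral decomposition U = E Lambda E^T
    _, E = np.linalg.eigh(U)
    # Step 4: z_i = E^T x_i (rows of Z), A = D E, Gamma = diag(sigma(Z_j)^2)
    Z = X @ E
    A = d[:, None] * E
    gamma = np.array([scale(Z[:, j]) ** 2 for j in range(p)])
    nu = np.array([loc(Z[:, j]) for j in range(p)])
    return A @ nu, (A * gamma) @ A.T
```

Since the returned covariance has the form AΓAᵀ with positive diagonal Γ and invertible A, it is positive definite by construction, which is the point of the orthogonalization.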

To extend the OGK or rOGK to the case of missing data, we propose to impute the missing values by means of the $\hat{y}_i$ in (2.3) obtained by the EM algorithm, i.e., with $\mu$ and $\Sigma$ estimated by (2.1) where all weights are equal to 1. The reason is that the EM algorithm is very fast, and although it leads to biased estimates of $\mu$ and $\Sigma$ and therefore of the imputed values $\hat{y}_i$, we found through extensive simulation studies that in practice the rOGK is not, or only very little, affected (see Section 4). In future research, we will seek a better adaptation of the OGK to the case of missing data.

4. Simulation Study

4.1. The Design

The model is the multivariate normal distribution $N(\mu, \Sigma)$. Because the OGK is not affine equivariant, following Maronna and Zamar (2002), we choose to simulate correlated data, i.e., $\Sigma = R(\rho)^2$ where $R(\rho)$ has elements $\rho$ for $j \neq k$ and 1 for the others, with $\rho = 0.2$. We also chose $\mu = 0$. The data were contaminated by so-called shift-outliers (Woodruff and Rocke, 1993), i.e., an ε-proportion was generated using N(√p + β/√2 e_p, Σ), where e_p is a p-dimensional vector of ones. We set β = 1.6 and ε = 0, 0.02, 0.05 and 0.1. The missing data, if any, were chosen at random from the mixture of good and contaminated data. We chose proportions miss = 0.1, 0.2 and 0.3. Table 1 shows the different values for n and p used in the simulations.

Table 1. Values for n and p.

        p = 10    p = 20    p = 50
n =     50        100       200
n =     100       200       400
n =     500       500       600

Each robust estimator requires a decision on its initialization parameters. For the MCD estimator, h = [0.6n] was chosen. For the OGK, c1 = 4.5 and c2 = 3 were chosen (see Maronna and Zamar, 2002). For the ERTBS estimator we chose for our simulations the breakdown point ε* = 0.3 and the ARP α = 0.001. All computational experiments were done on an Athlon 1900 MHz with 512 MB of memory. The core of the program was written in Fortran 77 and Splus was used as a front-end (to produce the various graphics). For all combinations of parameters, 1000 samples were generated.

4.2. Computational Times

The time needed to compute the ER and the ERTBS depends essentially on the choice of the starting point. Therefore, we report here the time (as a function of the sample size n) needed to compute the rOGK or the rMCD, when the latter is computed using the adapted FAST-MCD algorithm (rMCD/FAST) or the forward search algorithm (rMCD/FWD). We chose the reweighted version of the two starting point estimators because, as is shown in Copt and Victoria-Feser (2003), the non-reweighted versions can lead to biased estimates. For each value of miss and ε and the values of Table 1, a time in seconds has been computed. Figure 1 shows the results (in a log-scale) for the datasets with ε = 10% and miss = 30% (for other combinations the results are comparatively similar).

[Figure 1: three panels (rMCD/FAST, rOGK, rMCD/FWD), each plotting log(time in seconds) against the number of observations for p = 10, p = 20 and p = 50.]

Figure 1. Log of mean time in seconds needed to compute the rOGK and the rMCD by means of the forward search (FWD) algorithm and the FAST-MCD algorithm, as a function of the sample size and the data dimension p.

We notice the following features. As expected, the rMCD/FWD is slower than the rMCD/FAST, with an increasing difference as the sample size increases. The rMCD/FAST can be up to 150 times faster than the rMCD/FWD. Moreover, when the rOGK is used as a starting point, the computational times decrease drastically, with sometimes a ratio of 18 compared with the rMCD/FAST. However, the speed of the rMCD/FAST does not depend very much on the sample size n, whereas that of the rOGK does quite substantially.

4.3. Comparing Estimators

The aim of this subsection is to study, by means of simulations, the robustness properties (bias versus efficiency) of the different estimators proposed for incomplete data. For the MCD, all calculations were made using the modified FAST-MCD for missing data. It should be stressed that this exercise has not been done in Cheng and Victoria-Feser (2002). Copt and Victoria-Feser (2003) compare the behavior of the MCD, rMCD, OGK and rOGK. They conclude that both the rMCD and rOGK behave nicely (in terms of bias) even with correlated data ($\Sigma = R(.2)^2$), whereas the OGK for the variances and covariances can be biased with contaminated data.

The estimators we consider here are the final estimators, namely the MLE computed via the EM algorithm (which is taken as a benchmark), the ER algorithm with the MLE as starting point (ER/MLE), the ER algorithm with the rMCD and rOGK as starting points (ER/rMCD and ER/rOGK), and the ERTBS algorithm with the rMCD and rOGK as starting points (ERTBS/rMCD and ERTBS/rOGK). The data were generated using the designs presented in Section 4.1. The percentage of missing observations and the sizes n and p do not seem to have an influence on the behavior of the different estimators. The influential factor is the percentage of contamination and, potentially, the covariance structure when the rOGK is chosen as starting point. We therefore chose the $N(0, R(.2)^2)$ as data generating model. We use boxplots to compare the estimators. They are built on the estimated biases of one of the elements of the mean vector, one of the diagonal elements of the covariance matrix and one of the off-diagonal elements of the covariance matrix. Only the results for $\mu_1$, $\sigma_{11}$ and $\sigma_{12}$ are represented, since for other parameters the same pattern is found.

Figures 2 and 3 show the boxplots of the sampling distributions of the final estimators for different amounts of data contamination. The MLE clearly fails even if the contamination is small. It is the most efficient with no contamination, but the efficiency loss for the robust estimators seems to be quite small. The ER/MLE breaks down at 5% of data contamination. Finally, the ER/rMCD, ER/rOGK, ERTBS/rMCD and ERTBS/rOGK are very robust and can withstand at least 10% of data contamination.

If we want to see a difference between the ER and ERTBS with the same high breakdown starting point, we have to push the percentage of contamination up to 30%. We have not fully covered this situation, since it is very unlikely that someone will want to study data sets with such a percentage of contamination. However, Copt and Victoria-Feser (2003) show a simulated example with 30% of contaminated data in which the ER/rOGK (and also the ER/rMCD) clearly fails to detect the outliers, whereas the ERTBS/rOGK is not affected by their presence in the sample.

[Figure 2: boxplots for μ1, σ11 and σ12 for the estimators EM, ER/EM, ER/rMCD, ER/rOGK, ERTBS/rMCD and ERTBS/rOGK, in panels (a) and (b).]

Figure 2. Sampling distribution of the final high breakdown estimators with missing data for (a) 0% and (b) 2% of data contamination.

5. Conclusion

In this paper, we have considered (efficient) high breakdown estimators of the mean and covariance of a multivariate normal distribution with missing data. The computational speed of the estimators depends on the computational speed of their starting point. The fastest is the modified OGK, although its speed depends on the sample size, which is not so strongly the case for the modified MCD computed with the FAST-MCD algorithm. Regarding performance in terms of bias, efficiency and breakdown point, the conclusion is that a high breakdown estimator is crucial as a starting point; among the candidates, the (modified) OGK can be biased when the data are correlated but the (modified) rOGK is not. Moreover, the ERTBS has a larger breakdown point than the ER. The EM is not robust and the ER/EM has a very small breakdown point. Finally, under no contamination, the EM is the most efficient estimator but the ERTBS has a very small efficiency loss.

[Figure 3: boxplots for μ1, σ11 and σ12, in panels (a) and (b).]

Figure 3. Sampling distribution of the final high breakdown estimators with missing data for (a) 5% and (b) 10% of data contamination.

References

[1] A.C. Atkinson, Stalactite Plots and Robust Estimation for the Detection of Multivariate Outliers. In S. Morgenthaler, E. Ronchetti, and W.A. Stahel, editors, New Directions in Statistical Data Analysis and Robustness. Birkhäuser, Basel, 1993.

[2] A.C. Atkinson, Fast Very Robust Methods for the Detection of Multiple Outliers. J. Amer. Statist. Assoc. 89 (1994), 1329–1339.

[3] T.-C. Cheng and M.-P. Victoria-Feser, High Breakdown Estimation of Multivariate Location and Scale with Missing Observations. British J. Math. Statist. Psych. 55 (2002), 317–335.

[4] S. Copt and M.-P. Victoria-Feser, Fast Algorithms for Computing High Breakdown Covariance Matrices with Missing Data. Cahiers du département d'Économétrie no 2003.04, University of Geneva, CH-1211 Geneva, 2003, available at http://www.unige.ch/ses/metri.

[5] P.L. Davies, Asymptotic Behaviour of S-estimators of Multivariate Location Parameters and Dispersion Matrices. Ann. Statist. 15 (1987), 1269–1292.

[6] A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B 39 (1977), 1–22.

[7] R. Gnanadesikan and J.R. Kettenring, Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. Biometrics 28 (1972), 81–124.

[8] P.J. Huber, Robust Statistics. Wiley, New York, 1981.

[9] R.J.A. Little and D.B. Rubin, Statistical Analysis with Missing Data. Wiley, New York, 1987.

[10] R.J.A. Little and P.J. Smith, Editing and Imputing for Quantitative Survey Data. J. Amer. Statist. Assoc. 82 (1987), 58–68.

[11] R.A. Maronna and R.H. Zamar, Robust Multivariate Estimates for High-Dimensional Datasets. Technometrics 44 (2002), 307–317.

[12] D.M. Rocke, Robustness Properties of S-estimators of Multivariate Location and Shape in High Dimension. Ann. Statist. 24 (1996), 1327–1345.

[13] P.J. Rousseeuw, Least Median of Squares Regression. J. Amer. Statist. Assoc. 79 (1984), 871–880.

[14] P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection. Wiley, New York, 1987.

[15] P.J. Rousseeuw and K. Van Driessen, A Fast Algorithm for the Minimum Covariance Determinant Estimator. Technometrics 41 (1999), 212–223.

[16] P.J. Rousseeuw and V.J. Yohai, Robust Regression by means of S-estimators. In J. Franke, W. Härdle, and D. Martin, editors, Robust and Nonlinear Time Series Analysis, pages 256–272. Springer-Verlag, New York, 1984.

[17] D.B. Rubin, Inference and Missing Data. Biometrika 63 (1976), 581–592.

[18] D.L. Woodruff and D.M. Rocke, Heuristic Search Algorithm for the Minimum Volume Ellipsoid. J. Comput. Graph. Statist. 2 (1993), 69–95.

[19] D.L. Woodruff and D.M. Rocke, Computable Robust Estimation of Multivariate Location and Shape in High Dimension using Compound Estimators. J. Amer. Statist. Assoc. 89 (1994), 888–896.

Acknowledgements

Both authors are grateful to two anonymous referees for their constructive comments.

S. Copt and M.-P. Victoria-Feser
Department of Econometrics and HEC
University of Geneva
40, Bd. du Pont d'Arve
CH 1211 Geneva
Switzerland
e-mail: {maria-pia.victoriafeser, SamuelCopt}@hec.unige.ch