Vol. 153, No. 1 (2012)
Tail-behavior of estimators and of their one-step versions.
Jana Jurečková*,1
Abstract: The finite-sample breakdown points and finite-sample tail behavior are studied for a class of equivariant estimators in the linear regression model under a fixed design. The same is considered for the one-step and k-step versions of the estimators, starting with an initial estimator. It is shown that the tail behavior of the one- and k-step versions of an estimator is determined mainly by that of the initial estimator.
Résumé: Breakdown points and finite-sample tail behavior are studied for a class of equivariant estimators in the linear model with a fixed design. Results of the same type are obtained for Newton-Raphson iterations of an initial estimator. It is shown that the tail behavior of these iterated estimators is determined mainly by that of the initial estimator.
Keywords: breakdown point, equivariant estimator, tail behavior
Mots-clés: point de rupture, estimateur équivariant, comportement des queues
AMS 2000 subject classifications: 35L05, 35L70
1. Introduction
Consider the linear regression model
Y = Xβ + U    (1.1)

with the vector of observations Y = (Y_1, ..., Y_n)^T, the matrix X = (x_1, ..., x_n)^T of order n × p with full rank p, the parameter β = (β_1, ..., β_p)^T, and the vector U = (U_1, ..., U_n)^T of i.i.d. errors, distributed according to a distribution function F with density f, which is absolutely continuous and has a non-vanishing characteristic function φ.
The problem is to estimate β under a loss function L(b, β) = L(b − β). The corresponding risk R(T, β) is invariant with respect to the group of transformations G = {Y + Xb, b ∈ R^p}, and one possible maximal invariant for G is the vector Z = Y − Ŷ with Ŷ = Xβ̂, where β̂ = (X^T X)^{-1} X^T Y is the least squares estimator (LSE) of β. Under an invariant risk it is natural to restrict attention to equivariant estimators T_n satisfying T_n(Y + Xb) = T_n(Y) + b. Every equivariant estimator T_n can be obtained from an initial equivariant estimator T_n^0 with finite risk by adding an invariant statistic, i.e. some (vector) function of Z:

T_n = T_n^0 + v(Z).
* The research was supported by Grant GAČR 201/12/0083.
1 Department of Probability and Statistics, Charles University in Prague, Sokolovská 83, CZ-186 75 Prague 8, Czech Republic.
E-mail: jurecko@karlin.mff.cuni.cz
For instance, the minimum risk equivariant estimator with respect to the quadratic loss ‖b − β‖² is T*_n = T_n^0 − E_0(T_n^0 | Z). To compute it we must know F, and even then the calculation is rather technical. Possible approximations of T*_n were studied in [4].
Many robust estimators in the linear model, such as M-, L- and R-estimators, correspond to an invariant loss and so are regression equivariant. Their computation is also often technical, and they are frequently approximated by their one-step versions. These are the first Newton-Raphson iterations of the estimating equation, starting from an initial estimator, even if the estimator itself is not exactly a root. The k-step versions are considered as well, and many results in the literature show the asymptotic closeness of T_n and its one-step version.
However, even though these estimators are asymptotically normal, some of their properties are intrinsically finite-sample, contrary to the widely held view that many properties are inherited from asymptotic normality. The admissibility of an estimator with respect to a specified loss function is a typical example. It was generally accepted that, while classical procedures are sensitive to heavy-tailed distributions, this problem is solved by using robust procedures. This is true only to some extent. It was shown in [5] that the distribution of an equivariant estimator of location is heavy-tailed for any finite n provided the parent distribution is heavy-tailed. Although the Pareto index increases with n, it remains finite and the distribution never becomes exponentially tailed.
We shall illustrate this problem using the tail-behavior measure for regression estimators proposed in [1]. Under some conditions we derive a lower bound for the tail behavior of M-estimators in model (1.1). We then consider the one-step versions in model (1.1) and show that the tail behavior of the Newton-Raphson iteration is determined mainly by that of the initial estimator; this holds for the k-step version for any finite k ≥ 1.
2. Tail behavior of equivariant estimators
A possible measure of the tail performance of an estimator T_n in the location model, with Y_1, ..., Y_n independent and identically distributed according to a symmetric F(y − θ), is

B(a, T_n) = [−log P_θ(T_n − θ > a)] / [−log(1 − F(a))] for fixed n as a → ∞    (2.1)

(see [2]). Then 1 ≤ liminf_{a→∞} B(a, T_n) ≤ limsup_{a→∞} B(a, T_n) ≤ n for any equivariant T_n satisfying weak conditions, and both bounds are attainable by the sample mean T_n = X̄_n: the upper one for exponentially tailed distributions, the lower one for heavy-tailed distributions.
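The measure (2.1) can be explored numerically. The following Monte Carlo sketch (our own illustration, not from the paper; the sample size n, the threshold a, and the replication count are arbitrary choices) estimates B(a, X̄_n) for normal and for Cauchy errors:

```python
import math
import numpy as np

def B_hat(draw, sf, a, n, reps=200_000, seed=0):
    """Monte Carlo estimate of B(a, T_n) for T_n = sample mean and theta = 0:
    B = [-log P_0(T_n > a)] / [-log(1 - F(a))]."""
    rng = np.random.default_rng(seed)
    means = draw(rng, (reps, n)).mean(axis=1)
    p = (means > a).mean()
    return math.log(p) / math.log(sf(a))          # both logs are negative

n, a = 4, 1.2
norm_sf = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))   # 1 - Phi(x)
cauchy_sf = lambda x: 0.5 - math.atan(x) / math.pi

B_normal = B_hat(lambda rng, s: rng.standard_normal(s), norm_sf, a, n)
B_cauchy = B_hat(lambda rng, s: rng.standard_cauchy(s), cauchy_sf, a, n)
print(B_normal, B_cauchy)
```

With Cauchy errors the mean of n observations is again standard Cauchy, so the estimate stays near the lower bound 1, while for normal errors it lies well above 1 and approaches n as a grows.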
Recall the close link between tail performance and the breakdown point for a large class of location estimators, proved in [1]:
Theorem 2.1 (He et al. (1990)). Let T(Y_1, ..., Y_n) be a location equivariant estimator of θ, nondecreasing in each argument Y_i. Then T_n has a universal breakdown point m* and

m* ≤ liminf_{a→∞} B(a, T_n) ≤ limsup_{a→∞} B(a, T_n) ≤ n − m* + 1    (2.2)

for any symmetric, absolutely continuous F of Y_1 satisfying

lim_{z→∞} log(1 − F(z + c)) / log(1 − F(z)) = 1 for any fixed c > 0.    (2.3)
Theorem 2.1 holds both for exponential and for algebraic tails of F. For example, the median T_n has breakdown point m* = ⌈n/2⌉, and for n odd, P_θ(T_n − θ > a) tends to zero as a → ∞ exactly (n+1)/2 times faster than the tails of the underlying error distribution.
The measure of tail performance (2.1) was extended in [1] to the linear model (1.1) in the following way:

B(a, T_n) = [−ln P_β(max_{1≤i≤n} |x_i^T (T_n − β)| > a)] / [−ln(1 − F(a))] for a > 0.    (2.4)
He et al. (1990) showed that limsup_{a→∞} B(a, T_n) ≤ n for a regression equivariant estimator under a weak condition, and that limsup_{a→∞} B(a, β̂_n) ≤ n/p for the LSE under the normal distribution. However, this upper bound holds only under a balanced design, while generally

lim_{a→∞} B(a, β̂_n) = 1/ĥ_n for normal F,
lim_{a→∞} B(a, β̂_n) = 1 for heavy-tailed F,

where ĥ_n is the maximal diagonal element of the hat matrix H = X_n(X_n^T X_n)^{-1} X_n^T. This shows that the matrix X_n seriously affects the tail behavior of an estimator in model (1.1). A low value of B(a, T_n) is bad news for an estimator, hinting that it utilizes only a small proportion of the data. He et al. (1990) derived the lower bound of the tail behavior for the L_1-estimator and for some M-estimators of β with criterion function ρ(x) close to |x|.
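The influence of the design on the bound 1/ĥ_n is easy to inspect directly. The two designs below are our own toy examples, one balanced and one containing a single high-leverage row:

```python
import numpy as np

def max_leverage(X):
    """Maximal diagonal element h_n of the hat matrix H = X (X'X)^{-1} X'."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H).max()

n, p = 20, 2
# Balanced design: intercept plus an evenly spread regressor.
Xb = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
# Unbalanced design: the same, but with one far-out leverage point.
Xu = Xb.copy()
Xu[0, 1] = 25.0

print(1 / max_leverage(Xb), 1 / max_leverage(Xu))
```

For the balanced design the bound 1/ĥ_n stays well above 1, while the single leverage point drives ĥ_n toward 1 and hence the tail-behavior bound down to about 1: a single observation then dominates the fit.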
We shall derive lower bounds on the tail behavior for a more general class of M-estimators, including the Huber-type estimators and some redescending M-estimators, defined as

T_n = argmin_{b ∈ R^p} { Σ_{i=1}^n ρ(Y_i − x_i^T b) }.    (2.5)

We shall show that the tail behavior of such estimators has a lower bound > 1 for both heavy-tailed and light-tailed F satisfying

0 < F(z) < 1, F(z) + F(−z) = 1, z ∈ R^1,    (2.6)

and

lim_{a→∞} [−ln(1 − F(a + c))] / [−ln(1 − F(a))] = 1 for all c > 0.
Following Mizera and Müller (1999), we shall impose the following conditions on the criterion function ρ, discussed in detail in [8]:

(i) ρ is absolutely continuous, nondecreasing on [0, ∞), and ρ(z) ≥ 0, ρ(z) = ρ(−z), z ∈ R^1.
(ii) ρ(z) is unbounded and its derivative ψ(z) is bounded for z ∈ R^1.
(iii) ρ is subadditive in the sense that there exists L > 0 such that ρ(z_1 + z_2) ≤ ρ(z_1) + ρ(z_2) + L for z_1, z_2 ≥ 0.
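As a concrete instance (our own check, not from the paper), the Huber criterion ρ(z) = z²/2 for |z| ≤ c and c|z| − c²/2 otherwise satisfies (i)–(iii): ψ = ρ′ is bounded by c, and convexity with slope at most c gives subadditivity with L = c²/2. The sketch below verifies this numerically on a grid, with an arbitrary tuning constant:

```python
import numpy as np

C = 1.345  # tuning constant; any c > 0 works

def rho(z):
    z = np.abs(z)
    return np.where(z <= C, 0.5 * z**2, C * z - 0.5 * C**2)

def psi(z):
    return np.clip(z, -C, C)  # derivative of rho, bounded by C

g = np.linspace(0.0, 10.0, 801)

# (i) nonnegativity, symmetry, monotonicity on [0, infinity)
assert np.all(rho(g) >= 0) and np.all(rho(g) == rho(-g))
assert np.all(np.diff(rho(g)) >= 0)

# (ii) rho unbounded, psi bounded
assert rho(1e8) > 1e7 and abs(float(psi(1e8))) <= C

# (iii) subadditivity: rho(z1 + z2) <= rho(z1) + rho(z2) + L with L = C**2 / 2
z1, z2 = np.meshgrid(g, g)
assert np.all(rho(z1 + z2) <= rho(z1) + rho(z2) + 0.5 * C**2 + 1e-12)
print("Huber rho passes (i)-(iii) on the grid")
```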
Given ρ satisfying (i)–(iii), define

m* = m*(n, X, ρ) = min { card M : Σ_{i∈M} ρ(x_i^T b) ≥ Σ_{i∉M} ρ(x_i^T b) for some b ≠ 0 },    (2.7)

where M runs over the subsets of N = {1, 2, ..., n}. The following theorem shows that m* is a lower bound for the tail behavior of the M-estimator generated by ρ:
Theorem 2.2. Under F satisfying (2.6) and ρ satisfying (i)–(iii), the tail behavior of the M-estimator defined in (2.5) satisfies

liminf_{a→∞} B(a, T_n) ≥ m*    (2.8)

with m* defined in (2.7).
Proof. By the equivariance of the M-estimators (2.5), we can assume β = 0 without loss of generality; L will denote a generic constant. By (i)–(iii),

Σ_N ρ(y_i − x_i^T b) ≥ Σ_{N\M} ρ(x_i^T b) − Σ_{N\M} ρ(y_i) − (Σ_M ρ(x_i^T b) − Σ_M ρ(y_i)) − L
  = Σ_{N\M} ρ(x_i^T b) − Σ_M ρ(x_i^T b) + Σ_N ρ(y_i) − 2 Σ_{N\M} ρ(y_i) − L.
By the definition of m*, there exists ε > 0 such that

Σ_M ρ(x_i^T b) ≤ (1 − ε) Σ_{N\M} ρ(x_i^T b)    (2.9)

for every M ⊂ N of size m = m* − 1 and every b ∈ R^p. Assume that all but m = m* − 1 of the ρ(y_i)'s are uniformly bounded. Because (2.9) implies

Σ_N ρ(y_i − x_i^T b) − Σ_N ρ(y_i) ≥ ε Σ_{N\M} ρ(x_i^T b) − 2 Σ_{N\M} ρ(y_i) − L,
we deduce that there exists C > 0 such that if ‖b‖ > C then Σ_N ρ(y_i − x_i^T b) − Σ_N ρ(y_i) > 0 and b cannot be a minimizer of Σ_N ρ(y_i − x_i^T t). Hence, for max_i ρ(x_i^T T_n) ≫ 1 we need the ρ(y_i) to be unbounded for i ∈ N\M, and max_i ρ(x_i^T T_n) ≤ K ρ(|y|_{n:n−m*+1}), where |y|_{n:n−m*+1} is the (n − m* + 1)-th order statistic of |y_1|, ..., |y_n|. Thus,

P(max_i |x_i^T T_n| > a) ≤ P{|y|_{n:n−m*+1} ≥ K_0 a}.

Using the distribution of the order statistics of |Y_1|, ..., |Y_n| for F satisfying (2.6), we conclude that liminf_{a→∞} B(a, T_n) ≥ m*. □
Remark 2.1. The lower bound in (2.8) coincides with the lower bound derived by Mizera and Müller (1999) for the finite-sample breakdown point of the M-estimator T_n. Moreover, if ρ is regularly varying at infinity with an exponent r ≥ 0, the lower bound in [8] takes a modified form depending on r.
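For very small designs, m* in (2.7) can be approximated by brute force. The sketch below is our own illustration: it takes ρ(z) = |z|, for which the defining inequality is homogeneous in b, so the search over b ≠ 0 can be approximated by random unit directions; the design matrix is an arbitrary toy example with one high-leverage row.

```python
import itertools
import numpy as np

def m_star(X, n_dirs=2000, seed=0):
    """Approximate m* of (2.7) for rho(z) = |z|: the smallest card(M) such that
    sum_{i in M} |x_i'b| >= sum_{i not in M} |x_i'b| for some b != 0, with the
    search over b approximated by random unit directions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    dirs = rng.standard_normal((n_dirs, p))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    A = np.abs(dirs @ X.T)                      # |x_i'b|, shape (n_dirs, n)
    for m in range(1, n + 1):
        for M in itertools.combinations(range(n), m):
            mask = np.zeros(n, dtype=bool)
            mask[list(M)] = True
            if np.any(A[:, mask].sum(axis=1) >= A[:, ~mask].sum(axis=1)):
                return m
    return n

# Toy design with one high-leverage row.
X = np.array([[1.0, 0.1], [0.9, -0.2], [1.1, 0.0], [1.0, 0.2], [8.0, 5.0]])
print(m_star(X))
```

Here the high-leverage row already yields m* = 1, whereas a one-dimensional design of five identical rows gives m* = 3 = ⌈5/2⌉, mirroring the median-type bound of Section 2.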
3. One-step version of T_n
A broad class of estimators T_n of β admit a representation

T_n(Y) = β + γ^{-1} (X_n^T X_n)^{-1} Σ_{i=1}^n x_i ψ(Y_i − x_i^T β) + R_n,  ‖R_n‖ = o_p(‖X_n^T X_n‖^{-1/2}),    (3.1)

with a suitable function ψ and a functional γ = γ(ψ, F). The representation (3.1) is valid, e.g., for an M-estimator (2.5). It holds as an identity if T_n is the least squares estimator β̂_n and ψ is linear. By Kagan et al. (1973), the admissibility of β̂_n with respect to the quadratic loss implies that E_0(β̂_n | Z_n) = 0, and this in turn implies the normality of the distribution.
The one-step version of T_n with representation (3.1) is defined as the first Newton-Raphson iteration of the system of equations

Σ_{i=1}^n x_i ψ(Y_i − x_i^T b) = 0,    (3.2)

even when the global minimum of (2.5) is not a root of (3.2), as in the case of the L_1-estimator or of other M-estimators with discontinuous ψ.
For simplicity, standardize the sequence {X_n} so that

lim_{n→∞} Q*_n = Q*,  Q*_n = n^{-1} X_n^T X_n,    (3.3)

and assume that Q* is a positive definite p × p matrix.
Start with an initial estimator T_n^{(0)} of β satisfying n^{1/2}(T_n^{(0)} − β) = O_p(1). Assume that γ ≠ 0 and let γ̂_n be a consistent estimator of γ such that 1 − (γ/γ̂_n) = O_p(n^{-1/2}). Then the one-step version of T_n is defined as

T_n^{(1)} = T_n^{(0)} + (n γ̂_n)^{-1} (Q*_n)^{-1} Σ_{i=1}^n x_i ψ(Y_i − x_i^T T_n^{(0)})  if γ̂_n ≠ 0,
T_n^{(1)} = T_n^{(0)}  otherwise.    (3.4)
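A direct implementation of (3.4) for the Huber ψ, starting from the LSE, might look as follows. This is our own sketch: γ is estimated by the simple plug-in γ̂_n = n^{-1} Σ_i ψ′(Y_i − x_i^T T_n^{(0)}) rather than by a difference quotient, and the design and error distribution are arbitrary choices.

```python
import numpy as np

C = 1.345  # Huber tuning constant

def psi(z):
    return np.clip(z, -C, C)

def one_step(X, y, T0):
    """One-step version (3.4): T0 + (n*gamma_hat)^{-1} (Q_n*)^{-1} sum_i x_i psi(r_i),
    with the plug-in estimate gamma_hat = mean of psi'(r_i)."""
    n = len(y)
    r = y - X @ T0
    gamma_hat = np.mean(np.abs(r) <= C)         # psi'(z) = 1{|z| <= C}
    if gamma_hat == 0:
        return T0                                # fall back to the initial estimator
    Qn = X.T @ X / n
    return T0 + np.linalg.solve(Qn, X.T @ psi(r) / n) / gamma_hat

rng = np.random.default_rng(1)
n, beta = 200, np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ beta + rng.standard_cauchy(n)            # heavy-tailed errors
T0 = np.linalg.lstsq(X, y, rcond=None)[0]        # initial estimator: the LSE
T1 = one_step(X, y, T0)
print(T0, T1)
```

One can check directly that T1 is regression equivariant whenever T0 is: replacing y by y + Xb and T0 by T0 + b leaves the residuals, and hence the Newton step, unchanged.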
The two-step and the k-step versions of T_n are defined analogously for k = 2, 3, .... For a studentized M-estimator T_n of β defined as

T_n = argmin { Σ_{i=1}^n ρ((Y_i − x_i^T b)/S_n), b ∈ R^p },
its one-step version is the first Newton-Raphson iteration of the system of equations

Σ_{i=1}^n x_i ψ((Y_i − x_i^T b)/S_n) = 0,  b ∈ R^p,    (3.5)

where S_n(Y_n) is a studentizing scale statistic such that

S_n(y) > 0,  S_n(c(y + Xb)) = c S_n(y),  c > 0, y ∈ R^n, b ∈ R^p,

and for which there exists a functional S = S(F) > 0 such that n^{1/2}(S_n − S(F)) = O_p(1) as n → ∞.
Under some conditions on ψ and on F, the one-step version of T_n was modified in [6] so that T_n^{(1)} inherits the breakdown point of the initial estimator T_n^{(0)} and the asymptotic efficiency of T_n. Namely, for F symmetric and ψ = ρ′ skew-symmetric, absolutely continuous, non-decreasing and bounded, and under some further smoothness conditions, we define the one-step version in the following way:
T_n^{(1)} = T_n^{(0)} + γ̂_n^{-1} W_n  if ‖γ̂_n^{-1} W_n‖ ≤ c, 0 < c < ∞,
T_n^{(1)} = T_n^{(0)}  otherwise,    (3.6)

where

W_n = n^{-1} Q*_n^{-1} Σ_{i=1}^n x_i ψ((Y_i − x_i^T T_n^{(0)})/S_n)

and

γ̂_n = (2√n)^{-1} Σ_{i=1}^n x_{i1} [ψ((Y_i − x_i^T T_n^{(0)} + n^{-1/2} x_i^T q_n^{(1)})/S_n) − ψ((Y_i − x_i^T T_n^{(0)} − n^{-1/2} x_i^T q_n^{(1)})/S_n)],    (3.7)

where q_n^{(1)} is the first column of Q*_n^{-1}. Then
Theorem 3.1 (Jurečková and Portnoy 1987). Under the above conditions,
(i) ‖T_n^{(1)} − T_n‖ = O_p(n^{-1}).
(ii) If T_n^{(0)} has finite-sample breakdown point m_n, then T_n^{(1)} has the same breakdown point m_n.
(iii) If T_n^{(0)} is affine equivariant, then P{T_n^{(1)}(X_n A) ≠ A^{-1} T_n^{(1)}(X_n)} → 0 as n → ∞ for any regular p × p matrix A.
Remark 3.1. The results remain true even for an initial estimator satisfying ‖T_n^{(0)} − β‖ = O_p(n^{-τ}) for τ with 1/4 < τ ≤ 1/2.
Remark 3.2. Starting from an estimator with a low rate of consistency requires many observations to achieve a desired precision. In a model with a scalar parameter, it was proved in [7] that |T_n − T_n^{(1)}| = o_p(n^{-1}) only in a symmetric location model (F symmetric and ψ skew-symmetric), or generally only if the initial T_n^{(0)} has the same influence function as T_n. The rate of approximation of the k-step version T_n^{(k)}, k ≥ 2, to T_n depends on the smoothness of ψ: while |T_n − T_n^{(k)}| = O_p(n^{-(1+k)/2}) for an absolutely continuous ψ, it is only |T_n − T_n^{(k)}| = O_p(n^{-1+2^{-k-1}}) for ψ with jump discontinuities (see [3]).
We shall show that in the location model not only the breakdown point but also the tail behavior of the one-step version is determined by that of the initial estimator. This leads to the conjecture that the finite-sample properties of T_n^{(1)} depend on those of T_n^{(0)}, while the asymptotic properties depend on those of the non-iterated estimator T_n.
3.1. Tail behavior of one-step and k-step versions in the location model
Let us now consider an equivariant estimator T_n of the location parameter satisfying the representation (3.1), and its modified one-step version with an equivariant initial estimator T_n^{(0)}:

T_n^{(1)} = T_n^{(0)} + γ̂_n^{-1} W_n  if |γ̂_n^{-1} W_n| ≤ c, 0 < c < ∞,
T_n^{(1)} = T_n^{(0)}  otherwise,    (3.8)

where W_n = n^{-1} Σ_{i=1}^n ψ(Y_i − T_n^{(0)}) = O_p(n^{-1/2}) and γ̂_n is of type (3.7). Define the k-step version of T_n analogously. Then P_θ(T_n^{(1)} ≠ T_n^{(0)} + γ̂_n^{-1} W_n) → 0 as n → ∞, hence T_n^{(1)} − T_n = o_p(n^{-1/2}). If T_n^{(0)} is equivariant, so is T_n^{(1)}, because both γ̂_n and W_n are invariant. Surprisingly, the tail behavior of T_n^{(1)} and of T_n^{(k)} depends more on that of T_n^{(0)} than on the tail behavior of the non-iterated T_n. This situation is described in the following theorem:
Theorem 3.2. Let Y_1, ..., Y_n be a sample from a population with d.f. F(y − θ), F symmetric and satisfying (2.3). Let T_n be an equivariant estimator of θ admitting the representation (3.1) with a bounded, skew-symmetric, non-decreasing ψ. Then

liminf_{a→∞} B(a, T_n^{(0)}) ≤ liminf_{a→∞} B(a, T_n^{(k)}) ≤ limsup_{a→∞} B(a, T_n^{(k)}) ≤ limsup_{a→∞} B(a, T_n^{(0)})

for k = 1, 2, ....
Proof. Let us prove the proposition for k = 1; the case k > 1 is analogous. For 0 < δ < 1 and sufficiently large a, the tail probability of T_n^{(1)} satisfies

P_0(T_n^{(1)} > a) ≤ P_0(T_n^{(0)} > (1−δ)a) + P_0(γ̂_n^{-1} W_n > (1−δ)a) + P_0([T_n^{(0)} > δa] ∩ [γ̂_n^{-1} W_n > δa]) = P_0(T_n^{(0)} > (1−δ)a).

Letting δ ↓ 0, we obtain P_0(T_n^{(1)} > a) ≤ P_0(T_n^{(0)} > a) for sufficiently large a. On the other hand,

P_0(T_n^{(1)} > a) ≥ P_0(T_n^{(0)} > a + c, γ̂_n^{-1} W_n > −c) = P_0(T_n^{(0)} > a + c).

Hence,

[−ln P_0(T_n^{(0)} > a)] / [−ln(1 − F(a))] ≤ [−ln P_0(T_n^{(1)} > a)] / [−ln(1 − F(a))] ≤ [−ln P_0(T_n^{(0)} > a + c)] / [−ln(1 − F(a))]

for sufficiently large a; this proves the theorem. □
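The truncated update (3.8) is short to implement in the location model. The sketch below is our own: γ̂_n is a plug-in mean of ψ′ rather than the difference quotient (3.7), and the truncation constant c is an arbitrary choice; the k-step version is started from the sample median.

```python
import numpy as np

C_PSI, C_TRUNC = 1.345, 10.0   # psi tuning constant and truncation constant c

def psi(z):
    return np.clip(z, -C_PSI, C_PSI)

def k_step_location(y, k=1):
    """k-step version of a Huber-type location M-estimator with the truncated
    update (3.8), started from the sample median T_n^{(0)}."""
    t = float(np.median(y))
    for _ in range(k):
        r = y - t
        gamma_hat = np.mean(np.abs(r) <= C_PSI)   # plug-in estimate of gamma
        if gamma_hat == 0:
            break
        step = psi(r).mean() / gamma_hat          # gamma_hat^{-1} W_n
        if abs(step) <= C_TRUNC:                  # the truncation in (3.8)
            t += step
    return t

rng = np.random.default_rng(2)
y = 5.0 + rng.standard_cauchy(300)
print(k_step_location(y, 1), k_step_location(y, 3))
```

Since the median is equivariant and both γ̂_n and the step are invariant, each k-step version is again equivariant: shifting the sample by b shifts the estimate by b.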
Corollary 3.1. (i) Let T_n^{(0)} = X̃_n be the sample median, n odd. Let T_n be an equivariant estimator and T_n^{(k)} its k-step version starting from X̃_n. Then, under the conditions of Theorem 3.2,

lim_{a→∞} B(a, T_n^{(k)}) = (n + 1)/2 for k = 1, 2, ....

(ii) Let T_n^{(0)} = X̄_n be the sample mean. Let T_n be an equivariant estimator and T_n^{(k)} its k-step version starting from X̄_n. Then, under the conditions of Theorem 3.2, for k = 1, 2, ...,

lim_{a→∞} B(a, T_n^{(k)}) = n if F is exponentially tailed, and = 1 if F is heavy-tailed,

where the exponentially tailed and heavy-tailed F satisfy

lim_{a→∞} [−ln(1 − F(a))] / (b a^r) = 1, b > 0, r ≥ 1, and lim_{a→∞} [−ln(1 − F(a))] / (m ln a) = 1, m > 0,

respectively (see [2] for more details).
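The contrast in Corollary 3.1 shows up already in a plain simulation (our own sketch, again with a plug-in γ̂_n and arbitrary n, a): under Cauchy errors, the one-step version started from the median has far lighter tails than the one started from the mean, which keeps the heavy tail of X̄_n.

```python
import numpy as np

C = 1.345

def one_step_rows(Y, t0):
    """Row-wise one-step Huber update t0 + gamma_hat^{-1} W_n (plug-in gamma_hat);
    rows with gamma_hat = 0 keep the initial value, as in (3.8)."""
    r = Y - t0[:, None]
    g = (np.abs(r) <= C).mean(axis=1)
    W = np.clip(r, -C, C).mean(axis=1)
    return t0 + np.where(g > 0, W / np.where(g > 0, g, 1.0), 0.0)

rng = np.random.default_rng(3)
reps, n, a = 200_000, 9, 20.0
Y = rng.standard_cauchy((reps, n))               # theta = 0, heavy-tailed F

t_med = one_step_rows(Y, np.median(Y, axis=1))   # started from the median
t_mean = one_step_rows(Y, Y.mean(axis=1))        # started from the mean
p_med = (np.abs(t_med) > a).mean()
p_mean = (np.abs(t_mean) > a).mean()
print(p_med, p_mean)
```

The mean of n standard Cauchy variables is again standard Cauchy, so P(|X̄_n| > a) stays of the order of the error tail itself, and the one-step iteration cannot repair this; the median-started version makes such excursions vanishingly rare.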
References
[1] X. He, J. Jurečková, R. Koenker, and S. Portnoy. Tail behavior of regression estimators and their breakdown points. Econometrica, 58:1195–1214, 1990.
[2] J. Jurečková. Tail-behavior of location estimators. Ann. Statist., 9:578–585, 1981.
[3] J. Jurečková and M. Malý. The asymptotics for studentized k-step M-estimators of location. Sequential Analysis, 14:229–245, 1995.
[4] J. Jurečková and J. Picek. Minimum risk equivariant estimators in linear regression model. Statistics & Decisions, 27:37–59, 2009.
[5] J. Jurečková and J. Picek. Finite-sample behavior of robust estimators. In S. Chen et al., editor, Recent Researches in Instrumentation, Measurement, Circuits and Systems, pages 15–20. WSEAS, 2011.
[6] J. Jurečková and S. Portnoy. Asymptotics for one-step M-estimators in regression with application to combining efficiency and high breakdown point. Commun. Statist. A, 16:2187–2199, 1987.
[7] J. Jurečková and P. K. Sen. Effect of the initial estimator on the asymptotic behavior of one-step M-estimator. Ann. Inst. Statist. Math., 42:345–357, 1990.
[8] I. Mizera and C. H. Müller. Breakdown points and variation exponents of robust M-estimators in linear models. Ann. Statist., 27:1164–1177, 1999.