Plug-in estimators for higher-order transition densities in autoregression


DOI:10.1051/ps:2008001 www.esaim-ps.org

PLUG-IN ESTIMATORS FOR HIGHER-ORDER TRANSITION DENSITIES IN AUTOREGRESSION∗

Anton Schick¹ and Wolfgang Wefelmeyer²

Abstract. In this paper we obtain root-n consistency and functional central limit theorems in weighted L₁-spaces for plug-in estimators of the two-step transition density in the classical stationary linear autoregressive model of order one, assuming essentially only that the innovation density has bounded variation. We also show that plugging in a properly weighted residual-based kernel estimator for the unknown innovation density improves on plugging in an unweighted residual-based kernel estimator. These weights are chosen to exploit the fact that the innovations have mean zero. If an efficient estimator for the autoregression parameter is used, then the weighted plug-in estimator for the two-step transition density is efficient. Our approach generalizes to invertible linear processes.

Mathematics Subject Classification. 62M05, 62M10.

Received July 24, 2007. Revised November 6, 2007.

Introduction

Suppose we observe $X_0, \dots, X_n$ coming from a stationary linear autoregressive process of order one:

$$X_t = \rho X_{t-1} + \varepsilon_t, \quad t \in \mathbb{Z},$$

with $\rho \in (-1,1)$ and independent and identically distributed innovations $\{\varepsilon_t, t \in \mathbb{Z}\}$ with mean zero, finite variance $\sigma^2$ and a density $f$. We are interested in estimating the two-step transition density $q$ of $X_{t+2}$ given $X_t = x$ for some fixed $x$. We could estimate $q(z)$ nonparametrically by writing it as $h(x,z)/g(x)$ with $g$ and $h$ the stationary densities of $X_t$ and $(X_t, X_{t+2})$, respectively, and plugging in kernel estimators for $h$ and $g$; see Roussas [16,17], Nguyen [13] and Athreya and Atuncar [1]. The rates would be those for estimating a two-dimensional density. We could use the Markov structure and write $q(z)$ as $\int s(x,y)s(y,z)\,dy$ with $s$ the one-step transition density, and plug in nonparametric estimators for $s$. This would give better rates, because the problem reduces to that of estimating one-dimensional functions, namely conditional expectations. This is work in progress. Functional central limit theorems cannot be expected (except locally, on intervals shrinking as the bandwidth) since for different values of $z$ the estimators eventually use essentially different parts of the observations.

Keywords and phrases. Empirical likelihood, Owen estimator, least dispersed regular estimator, efficient influence function, stochastic expansion of residual-based kernel density estimator.

∗The research of A. Schick was partially supported by NSF Grant DMS0405791.

1 Department of Mathematical Sciences, Binghamton University, Binghamton, NY 13902-6000, USA.

2 Mathematisches Institut, Universität zu Köln, Weyertal 86-90, 50931 Köln, Germany;[email protected]

Article published by EDP Sciences. © EDP Sciences, SMAI 2009

Using the autoregressive structure, we can express the two-step transition density given $x$ as

$$q(z) = \int f(z - \rho y - \rho^2 x)\, f(y)\,dy, \quad z \in \mathbb{R}.$$

This is a smooth functional of $f$ and $\rho$, and we shall show that plugging in appropriate estimators for $f$ and $\rho$ leads to a root-$n$ consistent estimator of $q$. We must of course exclude the degenerate case $\rho = 0$.
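The convolution representation of $q$ can be checked numerically. The following is a minimal sketch, assuming standard normal innovations (an assumption made here only because the answer is then known in closed form: $X_{t+2}$ given $X_t = x$ is normal with mean $\rho^2 x$ and variance $1 + \rho^2$). The helper names `normal_pdf` and `two_step_density` and the integration grid are illustrative choices, not part of the paper.

```python
import math

def normal_pdf(y, sd=1.0):
    """Density of N(0, sd^2); stands in for the innovation density f."""
    return math.exp(-0.5 * (y / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def two_step_density(z, x, rho, f, lo=-12.0, hi=12.0, steps=4800):
    """Riemann-sum approximation of q(z) = int f(z - rho*y - rho^2*x) f(y) dy."""
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * dy
        total += f(z - rho * y - rho ** 2 * x) * f(y)
    return total * dy

rho, x, z = 0.5, 1.0, 0.3
q_numeric = two_step_density(z, x, rho, normal_pdf)
# Closed form under the Gaussian assumption: N(rho^2 * x, 1 + rho^2).
q_exact = normal_pdf(z - rho ** 2 * x, sd=math.sqrt(1.0 + rho ** 2))
print(q_numeric, q_exact)
```

The two printed values should agree to high accuracy, since the integrand is smooth and decays rapidly.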

First we discuss limitations and possible extensions of our approach. For the one-step transition density $f(\cdot - \rho x)$ of $X_{t+1}$ given $X_t = x$, the rate of the estimator $\hat f(\cdot - \hat\rho x)$ equals the rate of $\hat f$, and we do not have root-$n$ consistency. Our results generalize however to estimation of $m$-step transition densities of $X_{t+m}$ given $X_t = x$, which can be expressed as

$$\int \cdots \int f(\cdot - \rho y_{m-1} - \cdots - \rho^{m-1} y_1 - \rho^m x)\, f(y_1) \cdots f(y_{m-1})\,dy_1 \cdots dy_{m-1}.$$

Our results also generalize to higher-order autoregression. To keep the paper readable, we consider only the simplest case.

Root-$n$ consistent estimation of $q$ is possible because $q$ can be written as a convolution. This idea has already been used for other estimation problems in time series. Saavedra and Cao [18] obtain pointwise root-$n$ consistency of a plug-in estimator for the stationary density $p$ of a moving-average process of order one, $X_t = \rho\varepsilon_{t-1} + \varepsilon_t$, which can be expressed as $p(y) = \int f(y - \rho x) f(x)\,dx$. Schick and Wefelmeyer [20] prove that versions of this estimator are asymptotically normal and efficient. Schick and Wefelmeyer [22] show that such estimators obey functional central limit theorems in $L_1$ and $C_0$; they also consider higher-order moving average processes. Estimating $q$ is similar to estimating the stationary density $p$ of $X_t$, which can be expressed as $p(y) = \int f(y - \rho x) p(x)\,dx$. Root-$n$ consistent estimators for the stationary density of general invertible linear processes are derived in Schick and Wefelmeyer [24,26].

Similar results exist for i.i.d. observations $X_1, \dots, X_n$ and functionals not involving unknown parameters like $\rho$. Frees [4] shows that his local U-statistic estimators for densities of certain functions $q(X_1, \dots, X_m)$ with $m \ge 2$ are pointwise root-$n$ consistent. Saavedra and Cao [19] consider the special case $X_1 + aX_2$ with $a$ known. Schick and Wefelmeyer [21,25] obtain functional central limit theorems for plug-in estimators of densities of $u_1(X_1) + \cdots + u_m(X_m)$ and $X_1 + X_2$ in $L_1$ and $C_0$. Giné and Mason [5,6] prove functional central limit theorems and laws of the iterated logarithm in $L_p$, $1 \le p \le \infty$, for local U-statistic estimators of densities of functions $q(X_1, \dots, X_m)$, under minimal conditions, and uniformly in the bandwidth.

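To describe our plug-in estimator, let $\hat\rho$ denote the least squares estimator

$$\hat\rho = \frac{\sum_{j=1}^n X_{j-1}X_j}{\sum_{j=1}^n X_{j-1}^2}.$$

Note that $\hat\rho$ has the martingale approximation

$$\hat\rho - \rho = \frac{\sum_{j=1}^n X_{j-1}\varepsilon_j}{\sum_{j=1}^n X_{j-1}^2} = \frac{1}{E[X_0^2]}\,\frac{1}{n}\sum_{j=1}^n X_{j-1}\varepsilon_j + o_p(n^{-1/2})$$

and $E[X_0^2] = \sigma^2/(1-\rho^2)$. In particular, $n^{1/2}(\hat\rho - \rho)$ is asymptotically normal with variance $1-\rho^2$. We mimic the innovation $\varepsilon_j$ by the residual

$$\hat\varepsilon_j = X_j - \hat\rho X_{j-1}, \quad j = 1, \dots, n.$$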
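The least squares estimator and the residuals are straightforward to compute. The sketch below simulates a path with standard normal innovations (an assumption made for illustration; the paper only requires mean zero and finite variance), then forms $\hat\rho$ and $\hat\varepsilon_1, \dots, \hat\varepsilon_n$. The function names, the seed and the burn-in length are hypothetical choices.

```python
import random

def simulate_ar1(n, rho, seed=0, burn_in=500):
    """Return X_0, ..., X_n from X_t = rho * X_{t-1} + eps_t with N(0, 1) innovations."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(burn_in):  # discard a burn-in so the path is close to stationary
        x = rho * x + rng.gauss(0.0, 1.0)
    path = [x]
    for _ in range(n):
        path.append(rho * path[-1] + rng.gauss(0.0, 1.0))
    return path

def least_squares_rho(path):
    """rho_hat = sum X_{j-1} X_j / sum X_{j-1}^2 over j = 1, ..., n."""
    num = sum(path[j - 1] * path[j] for j in range(1, len(path)))
    den = sum(path[j - 1] ** 2 for j in range(1, len(path)))
    return num / den

def residuals(path, rho_hat):
    """eps_hat_j = X_j - rho_hat * X_{j-1} for j = 1, ..., n."""
    return [path[j] - rho_hat * path[j - 1] for j in range(1, len(path))]

path = simulate_ar1(n=2000, rho=0.5)
rho_hat = least_squares_rho(path)
eps_hat = residuals(path, rho_hat)
print(rho_hat, len(eps_hat))
```

By construction the residuals satisfy the normal equation $\sum_j X_{j-1}\hat\varepsilon_j = 0$, which is a convenient sanity check.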

We can then estimate the innovation density $f$ by the kernel estimator based on these residuals,

$$\hat f(y) = \frac{1}{n}\sum_{j=1}^n k_b(y - \hat\varepsilon_j), \quad y \in \mathbb{R},$$

where $k_b(y) = k(y/b)/b$ for some density $k$ and some bandwidth $b = b_n$. Substituting these estimators for $f$ and $\rho$ in the expression for $q$ yields the following plug-in estimator of $q$:

$$\hat q(z) = \int \hat f(z - \hat\rho y - \hat\rho^2 x)\,\hat f(y)\,dy, \quad z \in \mathbb{R}.$$
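When $k$ is the standard normal density (the choice made in Theorem 1), the integral defining $\hat q$ can be evaluated in closed form, because a convolution of rescaled normal kernels is again normal: $\hat q(z)$ is the average over all pairs $(i,j)$ of the $N(0, b^2(1+\hat\rho^2))$ density evaluated at $z - \hat\rho^2 x - \hat\varepsilon_i - \hat\rho\hat\varepsilon_j$. The sketch below implements this identity and compares it with a direct quadrature of the defining integral. The residual values, $\hat\rho$, $x$ and $b$ are hypothetical inputs chosen for illustration.

```python
import math

def normal_pdf(y, sd):
    """Density of N(0, sd^2)."""
    return math.exp(-0.5 * (y / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kernel_density(y, eps_hat, b):
    """Residual-based kernel estimator f_hat(y) with a Gaussian kernel."""
    return sum(normal_pdf(y - e, b) for e in eps_hat) / len(eps_hat)

def plug_in_closed_form(z, x, rho_hat, eps_hat, b):
    """q_hat(z) as a double average of normal densities (Gaussian kernel only)."""
    sd = b * math.sqrt(1.0 + rho_hat ** 2)
    shift = rho_hat ** 2 * x
    total = 0.0
    for e_i in eps_hat:
        for e_j in eps_hat:
            total += normal_pdf(z - shift - e_i - rho_hat * e_j, sd)
    return total / len(eps_hat) ** 2

def plug_in_by_quadrature(z, x, rho_hat, eps_hat, b, lo=-10.0, hi=10.0, steps=4000):
    """Riemann-sum evaluation of int f_hat(z - rho_hat*y - rho_hat^2*x) f_hat(y) dy."""
    dy = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * dy
        total += (kernel_density(z - rho_hat * y - rho_hat ** 2 * x, eps_hat, b)
                  * kernel_density(y, eps_hat, b))
    return total * dy

eps_hat = [-1.2, -0.4, 0.1, 0.6, 0.9]  # hypothetical residuals
rho_hat, x, b, z = 0.5, 1.0, 0.3, 0.4   # hypothetical estimate, state, bandwidth, point
q_closed = plug_in_closed_form(z, x, rho_hat, eps_hat, b)
q_quad = plug_in_by_quadrature(z, x, rho_hat, eps_hat, b)
print(q_closed, q_quad)
```

The closed form costs $O(n^2)$ kernel evaluations per point $z$ and avoids numerical integration; for other kernels the quadrature route still applies.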

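In place of the kernel estimator $\hat f$ we can use a weighted kernel estimator

$$\hat f_w(y) = \frac{1}{n}\sum_{j=1}^n w_j k_b(y - \hat\varepsilon_j), \quad y \in \mathbb{R}.$$

This results in the plug-in estimator

$$\hat q_w(z) = \int \hat f_w(z - \hat\rho y - \hat\rho^2 x)\,\hat f_w(y)\,dy, \quad z \in \mathbb{R}.$$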

Since the innovations have mean zero, we choose weights for which the weighted empirical distribution of the residuals has mean zero. Motivated by Owen [14,15], we take weights of the form

$$w_j = \frac{1}{1 + \hat\lambda\hat\varepsilon_j}, \tag{0.1}$$

where $\hat\lambda$ is chosen such that the weights $w_1, \dots, w_n$ are positive and $\sum_{j=1}^n w_j\hat\varepsilon_j = 0$. This is possible on the event $\{\min_{1\le j\le n}\hat\varepsilon_j < 0 < \max_{1\le j\le n}\hat\varepsilon_j\}$, which has probability tending to one; otherwise we set $\hat\lambda = 0$.
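One way to compute $\hat\lambda$ is by bisection. The map $\lambda \mapsto \sum_j \hat\varepsilon_j/(1+\lambda\hat\varepsilon_j)$ is strictly decreasing on the interval $(-1/\max_j\hat\varepsilon_j,\, -1/\min_j\hat\varepsilon_j)$ on which all weights are positive, and it runs from $+\infty$ to $-\infty$ there, so it has a unique root. The sketch below is one possible implementation under these observations; the function name, tolerance and sample residuals are hypothetical, and the paper does not prescribe a numerical method.

```python
def mean_zero_weights(eps_hat, tol=1e-12, max_iter=200):
    """Return (weights, lam) with w_j = 1 / (1 + lam * eps_j) and sum w_j * eps_j = 0.

    Falls back to lam = 0 (all weights one) when the residuals do not straddle zero.
    """
    lo_eps, hi_eps = min(eps_hat), max(eps_hat)
    if not (lo_eps < 0.0 < hi_eps):
        return [1.0] * len(eps_hat), 0.0

    def g(lam):
        return sum(e / (1.0 + lam * e) for e in eps_hat)

    # Positivity of all weights requires -1/hi_eps < lam < -1/lo_eps.
    left, right = -1.0 / hi_eps, -1.0 / lo_eps
    shrink = 1e-10 * (right - left)  # stay strictly inside the open interval
    left, right = left + shrink, right - shrink
    for _ in range(max_iter):
        mid = 0.5 * (left + right)
        if g(mid) > 0.0:   # g is decreasing, so the root lies to the right of mid
            left = mid
        else:
            right = mid
        if right - left < tol:
            break
    lam = 0.5 * (left + right)
    return [1.0 / (1.0 + lam * e) for e in eps_hat], lam

eps_hat = [-1.2, -0.4, 0.1, 0.6, 1.5]  # hypothetical residuals with positive mean
weights, lam = mean_zero_weights(eps_hat)
weighted_mean = sum(w * e for w, e in zip(weights, eps_hat)) / len(eps_hat)
print(lam, weighted_mean, sum(weights) / len(weights))
```

A useful by-product: whenever $\sum_j w_j\hat\varepsilon_j = 0$, the identity $w_j = 1 - \hat\lambda w_j\hat\varepsilon_j$ gives $n^{-1}\sum_j w_j = 1$, so $\hat f_w$ integrates to one without further normalization.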

We shall show that the estimators $\hat q$ and $\hat q_w$ are root-$n$ consistent in the $L_1$-norm and obey functional central limit theorems in $L_1$, with the weighted estimator resulting in a smaller asymptotic variance structure. To describe these results let us define functions $\chi_0$, $\chi_1$ and $\chi_2$ by

$$\chi_0(z) = \int f(z-\rho y) f(y)\,dy, \quad \chi_1(z) = \int f(z-\rho y)\, y f(y)\,dy, \quad \chi_2(z) = \int (z-\rho y) f(z-\rho y) f(y)\,dy, \quad z \in \mathbb{R}.$$

We shall require that $f$ is of bounded variation. Then the first two of these functions will be shown to be absolutely continuous with integrable a.e. derivatives $\chi_0'$ and $\chi_1'$. Now set $\chi = \chi_1 + \chi_2$ and

$$\dot q(z) = -2\rho x\,\chi_0'(z) - \chi_1'(z), \quad z \in \mathbb{R}.$$

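In the following, let $\varepsilon$ and $X$ be distributed as $\varepsilon_t$ and $X_t$. Let $f_\rho$ denote the density of $\rho\varepsilon$, i.e. $f_\rho(y) = f(y/\rho)/|\rho|$. Define functions $\psi$ and $\psi_*$ by

$$\psi(y,z) = f_\rho(z-y) + f(z-\rho y) \quad\text{and}\quad \psi_*(y,z) = \psi(y,z) - \sigma^{-2} y\,\chi(z), \quad y, z \in \mathbb{R}.$$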

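These can be used to define processes

$$\Psi(z) = \frac{1}{n}\sum_{j=1}^n \big(\psi(\varepsilon_j, z) - E[\psi(\varepsilon, z)]\big), \quad z \in \mathbb{R},$$

$$\Psi_*(z) = \frac{1}{n}\sum_{j=1}^n \big(\psi_*(\varepsilon_j, z) - E[\psi_*(\varepsilon, z)]\big), \quad z \in \mathbb{R}.$$

Finally, for $t \in \mathbb{R}$, introduce the shift operator $S_t$ which assigns to an integrable function $h$ the shifted version $S_t h$ defined by $S_t h(z) = h(z-t)$.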

Theorem 1. Let $\rho \neq 0$. Suppose $f$ has bounded variation and a finite moment of order greater than $16/7$. Let $k$ be the standard normal density and $b \sim (n\log n)^{-1/4}$. Then

$$\|\hat q - q - S_{\rho^2 x}\Psi - (\hat\rho - \rho) S_{\rho^2 x}\dot q\|_1 = o_p(n^{-1/2}),$$

$$\|\hat q_w - q - S_{\rho^2 x}\Psi_* - (\hat\rho - \rho) S_{\rho^2 x}\dot q\|_1 = o_p(n^{-1/2}).$$

Moreover, $n^{1/2}(\hat q - q)$ converges in distribution in $L_1$ to $S_{\rho^2 x}(G_* + \sigma^{-1} Z_1\chi + \sqrt{1-\rho^2}\, Z_2\dot q)$, while $n^{1/2}(\hat q_w - q)$ converges in distribution in $L_1$ to $S_{\rho^2 x}(G_* + \sqrt{1-\rho^2}\, Z_2\dot q)$, where $Z_1$ and $Z_2$ are independent standard normal random variables independent of the centered $L_1$-valued Gaussian process $G_*$ that has the same covariance structure as $\psi_*(\varepsilon, \cdot)$.

The rest of the paper is organized as follows. In Section 1 we describe our main result, a version of Theorem 1 for certain weighted $L_1$-norms. Theorem 1 then follows by taking the weight function to be constant. Section 2 gives conditions under which $\hat q_w$ is efficient for all finite-dimensional marginals if an efficient estimator $\hat\rho$ is used. Section 3 derives stochastic approximations for the residual-based density estimators $\hat f$ and $\hat f_w$ used for plug-in. The proof of the stochastic approximations (1.6) and (1.7) for $\hat q$ and $\hat q_w$ is in Section 4.

1. A version of Theorem 1 in weighted L₁-spaces

Let $V$ be a positive measurable function. The $V$-norm of a measurable function $h$ is defined by

$$\|h\|_V = \int V(x)\,|h(x)|\,dx.$$

This is the usual $L_1$-norm if $V = 1$. Let $L_V$ denote the space of all (equivalence classes of) measurable functions with finite $V$-norm. In other words, $L_V$ is simply the $L_1$-space for the measure that has density $V$ with respect to the Lebesgue measure. Throughout we impose the following assumptions on the function $V$.

Assumption 1. The function $V$ is continuous at $0$ with $V(0) = 1$ and satisfies

$$V(s+t) \le V(s)V(t), \quad s, t \in \mathbb{R}, \tag{1.1}$$

$$V(st) \le V(t), \quad |s| \le 1,\ t \in \mathbb{R}. \tag{1.2}$$

Examples of such functions are $V(y) = (1+|y|)^r$ and $V(y) = \exp(r|y|)$ for non-negative $r$. The class of functions satisfying Assumption 1 is closed under multiplication and under taking positive powers. Consequently, the following functions $V_*$ and $W_\alpha$ share Assumption 1 with $V$:

$$V_*(y) = (1+|y|)V(y) \quad\text{and}\quad W_\alpha(y) = (1+|y|)^\alpha V^2(y), \quad y \in \mathbb{R}, \text{ with } \alpha \ge 0.$$
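Properties (1.1) and (1.2) are easy to probe numerically for a candidate weight function. The sketch below checks them on a finite grid for the two examples above and for $V(y) = 1 + y^2$, which fails (1.1) (take $s = t = 1$). A grid check can only refute the assumption, never prove it; the function names, exponents and grid are illustrative choices.

```python
import math

def v_poly(y, r=1.5):
    return (1.0 + abs(y)) ** r

def v_exp(y, r=0.7):
    return math.exp(r * abs(y))

def v_quadratic(y):
    return 1.0 + y * y  # not submultiplicative: V(2) = 5 > V(1) * V(1) = 4

def passes_grid_check(V, points, scales, slack=1e-12):
    """Check V(0) = 1, (1.1) and (1.2) on a finite grid, up to rounding slack."""
    if abs(V(0.0) - 1.0) > slack:
        return False
    for s in points:
        for t in points:
            if V(s + t) > V(s) * V(t) * (1.0 + slack):   # violates (1.1)
                return False
    for s in scales:
        for t in points:
            if V(s * t) > V(t) * (1.0 + slack):          # violates (1.2)
                return False
    return True

points = [-5.0 + 0.25 * i for i in range(41)]   # grid on [-5, 5]
scales = [-1.0 + 0.1 * i for i in range(21)]    # grid on [-1, 1]
results = {V.__name__: passes_grid_check(V, points, scales)
           for V in (v_poly, v_exp, v_quadratic)}
print(results)
```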

Assumption 1 was introduced by Schick and Wefelmeyer [25]. Property (1.1) is useful when dealing with shifts and convolutions, while (1.2) is useful when dealing with rescaling. More precisely, for functions $g$ and $h$ of finite $V$-norm we have

$$\|S_t h\|_V \le V(t)\|h\|_V \quad\text{and}\quad \|g * h\|_V \le \|g\|_V\|h\|_V$$

and, with $h_s(y) = h(y/s)/|s|$ for $y \in \mathbb{R}$,

$$\|h_s\|_V \le \|h\|_V, \quad 0 < |s| \le 1.$$

Furthermore, for each $\alpha > 1$ we have the inequality

$$\|h\|_V^2 \le C_\alpha \|h^2\|_{W_\alpha} \tag{1.3}$$

with $C_\alpha = \int (1+|y|)^{-\alpha}\,dy$. See Schick and Wefelmeyer [25] for details.
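Inequality (1.3) is an instance of the Cauchy-Schwarz inequality, applied to the factorization $V|h| = (1+|y|)^{-\alpha/2}\cdot(1+|y|)^{\alpha/2}V|h|$, and $C_\alpha = 2/(\alpha-1)$ in closed form. The following sketch evaluates both sides for one concrete choice, $h$ the standard normal density, $V(y) = 1 + |y|$ and $\alpha = 3/2$; these choices and the quadrature grid are assumptions made purely for illustration.

```python
import math

def integrate(func, lo=-15.0, hi=15.0, steps=30000):
    """Plain Riemann sum; adequate for the rapidly decaying integrands used here."""
    dy = (hi - lo) / steps
    return sum(func(lo + i * dy) for i in range(steps + 1)) * dy

alpha, r = 1.5, 1.0

def V(y):
    return (1.0 + abs(y)) ** r

def W(y):
    return (1.0 + abs(y)) ** alpha * V(y) ** 2

def h(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

c_alpha = 2.0 / (alpha - 1.0)                       # int (1 + |y|)^(-alpha) dy
lhs = integrate(lambda y: V(y) * abs(h(y))) ** 2    # ||h||_V squared
rhs = c_alpha * integrate(lambda y: W(y) * h(y) ** 2)  # C_alpha * ||h^2||_{W_alpha}
print(lhs, rhs)
```

For this choice the left-hand side equals $(1 + \sqrt{2/\pi})^2$, which gives an independent check on the quadrature.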

To state our result for the space L_V we need to strengthen the assumption that f has bounded variation.

Let $h$ be an integrable function of bounded variation. Then there are finite measures $\mu_1$ and $\mu_2$ with equal mass $\mu_1(\mathbb{R}) = \mu_2(\mathbb{R})$ such that

$$h(x) = \mu_1((-\infty, x]) - \mu_2((-\infty, x])$$

for all but countably many $x$. This motivates the following definition. We say that an integrable function $h$ has bounded $V$-variation if $h$ has bounded variation and the measures $\mu_1$ and $\mu_2$ can be chosen such that $\int V\,d\mu_1$ and $\int V\,d\mu_2$ are finite.

In this section we no longer insist that $\hat\rho$ is the least squares estimator. Instead we consider more generally an estimator $\hat\rho$ that satisfies a martingale approximation

$$\hat\rho - \rho = \frac{1}{n}\sum_{j=1}^n X_{j-1}\varphi(\varepsilon_j) + o_p(n^{-1/2}) \tag{1.4}$$

for some measurable function $\varphi$ such that $E[\varphi(\varepsilon)] = 0$ and $E[\varphi^2(\varepsilon)]$ is finite. The least squares estimator satisfies (1.4) with $\varphi(y) = y/E[X^2] = (1-\rho^2)y/\sigma^2$, $y \in \mathbb{R}$.

Theorem 2. Let $\rho \neq 0$. Suppose the density $k$ has mean zero and is four times continuously differentiable with bounded derivatives and $k$ and its four derivatives have finite $V_*^2$-norms. Let $b \sim (n\log n)^{-1/4}$. Suppose $f$ has bounded $V$-variation and

$$\int \big((1+|y|)^\alpha V^2(y) + y^2 V(y) + |y|^\xi\big) f(y)\,dy < \infty \tag{1.5}$$

for some $\alpha > 1$ and some $\xi > 16/7$. Let (1.4) hold and let $\gamma$ denote the standard deviation of $X_0\varphi(\varepsilon_1)$. Then

$$\|\hat q - q - S_{\rho^2 x}\Psi - (\hat\rho - \rho) S_{\rho^2 x}\dot q\|_V = o_p(n^{-1/2}), \tag{1.6}$$

$$\|\hat q_w - q - S_{\rho^2 x}\Psi_* - (\hat\rho - \rho) S_{\rho^2 x}\dot q\|_V = o_p(n^{-1/2}). \tag{1.7}$$

Moreover, $n^{1/2}(\hat q - q)$ converges in distribution in $L_V$ to $S_{\rho^2 x}(G_* + \sigma^{-1} Z_1\chi + \gamma Z_2\dot q)$, while $n^{1/2}(\hat q_w - q)$ converges in distribution in $L_V$ to $S_{\rho^2 x}(G_* + \gamma Z_2\dot q)$, where $Z_1$ and $Z_2$ are independent standard normal random variables independent of the centered $L_V$-valued Gaussian process $G_*$ that has the same covariance structure as $\psi_*(\varepsilon, \cdot)$.

We obtain Theorem 1 by taking $V = 1$, $k$ the standard normal density, and $\hat\rho$ the least squares estimator, for which $\gamma = \sqrt{1-\rho^2}$. If $V(x) = (1+|x|)^r$ for some non-negative $r$, then (1.5) amounts to a moment condition. For example, for $r \ge 1$, (1.5) is equivalent to $f$ having a finite moment of order greater than $1 + 2r$.

We shall defer the proof of the expansions (1.6) and (1.7) to Section 4. A proof of the weak convergence of $n^{1/2}(\hat q - q)$ and $n^{1/2}(\hat q_w - q)$ is given next. For this let $Z_1$, $Z_2$ and $G_*$ be as in Theorem 2. In view of the expansions (1.6) and (1.7) and the continuity of shifts and addition in $L_V$, it suffices to prove that $A_n = n^{1/2}(\Psi_*, \Psi - \Psi_*, (\hat\rho - \rho)\dot q)$ converges in distribution in $L_V^3$ to $A = (G_*, \sigma^{-1} Z_1\chi, \gamma Z_2\dot q)$. For this it is enough to show that the sequence $A_n$ is tight in $L_V^3$ and that $M_h(A_n)$ converges to $M_h(A)$ for every triplet $h = (h_1, h_2, h_3)$ of bounded measurable functions, where $M_h$ is the operator from $L_V^3$ to $\mathbb{R}$ defined by

$$M_h(g) = \int \big(h_1(z)g_1(z) + h_2(z)g_2(z) + h_3(z)g_3(z)\big)\,dz, \quad g = (g_1, g_2, g_3) \in L_V^3.$$

Here we made use of the fact that the family $M_h$ is a probability determining class on $L_V^3$. Since $\Psi - \Psi_* = \sigma^{-2}\chi\, n^{-1}\sum_{j=1}^n \varepsilon_j$, we obtain in view of (1.4) the expansion

$$M_h(A_n) = n^{-1/2}\sum_{j=1}^n \Big(a_1(\varepsilon_j) - E[a_1(\varepsilon)] + \frac{a_2\varepsilon_j}{\sigma^2} + a_3 X_{j-1}\varphi(\varepsilon_j)\Big) + o_p(1)$$

with $a_1(y) = \int h_1(z)\psi_*(y,z)\,dz$, $a_2 = \int h_2(z)\chi(z)\,dz$ and $a_3 = \int h_3(z)\dot q(z)\,dz$. Note also that $E[X] = 0$ and $E[a_1(\varepsilon)\varepsilon] = 0$, the latter in view of (1.8). Thus the martingale central limit theorem yields that $M_h(A_n)$ is asymptotically normal with mean zero and variance

$$\mathrm{Var}(a_1(\varepsilon)) + \frac{a_2^2}{\sigma^2} + a_3^2\gamma^2.$$

This is also the variance of the centered normal random variable $M_h(A)$. Thus we have shown that $M_h(A_n)$ converges to $M_h(A)$ for each triplet $h = (h_1, h_2, h_3)$ of bounded measurable functions. Hence we are left to establish tightness of $A_n$. For this we recall the central limit theorem for the space $L_V$; see e.g. Ledoux and Talagrand [11], Theorem 10.10.

Theorem 3. Let $Y, Y_1, Y_2, \dots$ be independent and identically distributed $L_V$-valued random variables with mean zero. Then $n^{-1/2}\sum_{j=1}^n Y_j$ converges in distribution in the space $L_V$ to a centered Gaussian process (with the same covariance structure as $Y$) if and only if

$$\lim_{t\to\infty} t^2 P(\|Y\|_V > t) = 0 \quad\text{and}\quad \int V(z)\big(E[Y^2(z)]\big)^{1/2}\,dz < \infty.$$

A sufficient condition for this central limit theorem is that $E[Y^2]$ has finite $W_\alpha$-norm for some $\alpha > 1$; see Schick and Wefelmeyer [25].

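Abbreviate $W_\alpha$ by $W$ for the $\alpha$ of Theorem 2. For each $z \in \mathbb{R}$, we calculate

$$E[\psi(\varepsilon, z)\varepsilon] = \int f_\rho(z-y)\, y f(y)\,dy + \chi_1(z) = \int (z-y) f(z-y) f_\rho(y)\,dy + \chi_1(z) = \chi(z) \tag{1.8}$$

and thus find

$$E[\psi_*^2(\varepsilon, z)] = E[\psi^2(\varepsilon, z)] - \sigma^{-2}\chi^2(z) \le E[\psi^2(\varepsilon, z)] \le 2 f_\rho^2 * f(z) + 2 f^2 * f_\rho(z).$$

Since $f$ has finite $W$-norm and is bounded under the assumptions of Theorem 2, we obtain that

$$\|E[\psi_*^2(\varepsilon, \cdot)]\|_W \le \|E[\psi^2(\varepsilon, \cdot)]\|_W \le 2\|f_\rho^2 * f\|_W + 2\|f^2 * f_\rho\|_W \le 2\|f_\rho^2\|_W\|f\|_W + 2\|f^2\|_W\|f_\rho\|_W \le 4D\|f\|_W^2$$

for some constant $D$. Thus $n^{1/2}\Psi$ and $n^{1/2}\Psi_*$ obey a central limit theorem in $L_V$. Since $n^{1/2}(\hat\rho - \rho)$ is tight, so is $n^{1/2}(\hat\rho - \rho)\dot q$ in view of the continuity of the map $(s, h) \mapsto sh$ from $\mathbb{R} \times L_V$ into $L_V$. This shows that the sequence $A_n$ is tight in $L_V^3$.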

We end this section by stating some expansions in $L_V$. We say a function $h$ is $L_V$-Lipschitz if there is a constant $C$ such that

$$\|S_t h - h\|_V \le C|t|V(t), \quad t \in \mathbb{R}.$$

It was shown in Schick and Wefelmeyer [25], Lemma 8, that functions of bounded $V$-variation are $L_V$-Lipschitz.

Let $\mathcal{A}$ denote the set of functions $g$ with finite $V$-norm that are absolutely continuous and whose a.e. derivatives $g'$ have finite $V$-norms. Let $\mathcal{A}_L = \{g \in \mathcal{A} : g' \text{ is } L_V\text{-Lipschitz}\}$ and $\mathcal{A}_2 = \{g \in \mathcal{A} : g' \in \mathcal{A}\}$.

Lemma 1. Let $K$ be a measurable function with finite $V_*$-norm. Then $\nu_i = \int u^i K(u)\,du$ is finite for $i = 0, 1$, and the following are true.

(1) If $g$ is $L_V$-Lipschitz, then $\|g * K_b - \nu_0 g\|_V = O(b)$.

(2) If $g$ belongs to $\mathcal{A}$, then $\|g * K_b - \nu_0 g + b\nu_1 g'\|_V = o(b)$.

(3) If $g$ belongs to $\mathcal{A}_L$ and $\int V(u)u^2|K(u)|\,du$ is finite, then $\|g * K_b - \nu_0 g + b\nu_1 g'\|_V = O(b^2)$.

Proof. The first assertion follows from Lemma 7(1) of Schick and Wefelmeyer [25] with $\gamma = 1$; the last two assertions follow from their Lemma 6.
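Part (1) of Lemma 1 can be illustrated in the unweighted case $V = 1$ with a function that has bounded variation but is not absolutely continuous. The sketch below takes $g$ to be the uniform density on $[0,1]$ and $K$ the standard normal density (so $\nu_0 = 1$); both choices are assumptions made for illustration. Then $g * K_b(x) = \Phi(x/b) - \Phi((x-1)/b)$, and for small $b$ the $L_1$ error is approximately $4b/\sqrt{2\pi}$, that is, of exact order $b$.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def uniform_density(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def smoothed_uniform(x, b):
    """g * K_b for g uniform on [0, 1] and K standard normal."""
    return std_normal_cdf(x / b) - std_normal_cdf((x - 1.0) / b)

def l1_smoothing_error(b, lo=-2.0, hi=3.0, steps=10000):
    """Riemann-sum approximation of the L1 norm of g * K_b - g."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        total += abs(smoothed_uniform(x, b) - uniform_density(x))
    return total * dx

err_large = l1_smoothing_error(0.1)
err_small = l1_smoothing_error(0.05)
print(err_large, err_small, err_large / err_small)
```

Halving the bandwidth should roughly halve the error, in line with the $O(b)$ rate; a smoother $g$ would give a faster rate, as parts (2) and (3) indicate.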

2. Efficiency

In this section we recall a characterization of efficient estimators in autoregressive models and use it to prove that $\hat q_w$ is efficient if an efficient estimator $\hat\rho$ is used. Fix $\rho \in (-1,1)$ and an innovation density $f$. Assume that $f$ is absolutely continuous with a.e. derivative $f'$, and that the Fisher information for location, $J = E[\ell^2(\varepsilon)]$, is finite, where $\ell = -f'/f$. Introduce perturbations $\rho_{nr} = \rho + n^{-1/2}r$, $r \in \mathbb{R}$, and $f_{nh} = f(1 + n^{-1/2}h)$ with $h$ in the space $H$ of bounded measurable functions such that $E[h(\varepsilon)] = E[\varepsilon h(\varepsilon)] = 0$. Let $P_{n+1}$ and $P_{n+1,rh}$ denote the joint stationary law of $(X_0, \dots, X_n)$ under $(\rho, f)$ and $(\rho_{nr}, f_{nh})$, respectively. Then we have local asymptotic normality under $P_{n+1}$,

$$\log\frac{dP_{n+1,rh}}{dP_{n+1}} = n^{-1/2}\sum_{j=1}^n \big(r X_{j-1}\ell(\varepsilon_j) + h(\varepsilon_j)\big) - \frac{1}{2}\big(r^2 E[X^2] J + E[h^2(\varepsilon)]\big) + o_p(1);$$

see e.g. Koul and Schick [8]. Let $\bar H$ denote the closure of $H$ in $L_2(f)$. The squared norm on the right-hand side defines an inner product on $\mathbb{R} \times \bar H$. A real-valued functional $\kappa$ of $(\rho, f)$ is called differentiable at $(\rho, f)$ with gradient $(r_\kappa, h_\kappa) \in \mathbb{R} \times \bar H$ if

$$n^{1/2}\big(\kappa(\rho_{nr}, f_{nh}) - \kappa(\rho, f)\big) \to r_\kappa r E[X^2] J + E[h_\kappa(\varepsilon)h(\varepsilon)], \quad (r, h) \in \mathbb{R} \times H.$$

An estimator $\hat\kappa$ of $\kappa$ is called regular at $(\rho, f)$ with limit $L$ if $L$ is a random variable such that

$$n^{1/2}\big(\hat\kappa - \kappa(\rho_{nr}, f_{nh})\big) \Rightarrow L \quad\text{under } P_{n+1,rh}, \quad (r, h) \in \mathbb{R} \times H.$$

The convolution theorem of Hájek and LeCam says that $L$ is distributed as the convolution of some random variable with a normal random variable $N$ that has mean $0$ and variance $r_\kappa^2 E[X^2] J + E[h_\kappa^2(\varepsilon)]$. This justifies calling $\hat\kappa$ efficient if $L$ is distributed as $N$.

An estimator $\hat\kappa$ of $\kappa$ is called asymptotically linear at $(\rho, f)$ with influence function $g$ if $g \in L_2(P_2)$ with $E(g(X_0, X_1) \mid X_0) = 0$ and

$$n^{1/2}\big(\hat\kappa - \kappa(\rho, f)\big) = n^{-1/2}\sum_{j=1}^n g(X_{j-1}, X_j) + o_p(1).$$

It follows from a version of the convolution theorem that an estimator $\hat\kappa$ is regular and efficient if and only if it is asymptotically linear with influence function $g(X_0, X_1) = r_\kappa X_0\ell(\varepsilon_1) + h_\kappa(\varepsilon_1)$. We refer to Bickel et al. [2] for these results.

We read off immediately the well-known result that an estimator for $\rho$ is regular and efficient if (and only if) it is asymptotically linear with influence function $r_\rho X_0\ell(\varepsilon_1)$, where $1/r_\rho = E[X^2]J$, i.e. expansion (1.4) holds with $\varphi(y) = r_\rho\ell(y)$. Efficient estimators for parameters of time series are constructed in Kreiss [9,10], Jeganathan [7], Drost et al. [3] and Koul and Schick [8]. If we use an efficient estimator $\hat\rho$ for $\hat q_w$, then, by Theorem 2, $\hat q_w(z)$ has influence function $r_z X_0\ell(\varepsilon_1) + S_{\rho^2 x}(\psi_*(\varepsilon_1, z) - E[\psi_*(\varepsilon, z)])$ with $r_z = r_\rho S_{\rho^2 x}\dot q(z)$. To prove that $\hat q_w(z)$ is efficient, we need only check that $(r_z, S_{\rho^2 x}(\psi_*(\varepsilon_1, z) - E[\psi_*(\varepsilon, z)]))$ is the gradient of $\kappa(\rho, f) = q(z)$. Note first that for $(r, h) \in \mathbb{R} \times H$ we can write

$$n^{1/2}\big(\kappa(\rho_{nr}, f_{nh}) - \kappa(\rho, f)\big) = n^{1/2}\Big(\int f_{nh}(z - \rho_{nr}y - \rho_{nr}^2 x) f_{nh}(y)\,dy - \int f(z - \rho y - \rho^2 x) f(y)\,dy\Big).$$

Write $u = z - \rho y - \rho^2 x$. By Taylor expansion this converges to

$$r\,2\rho x\int f(u)\ell(u) f(y)\,dy + r\int f(u)\, y\,\ell(u) f(y)\,dy + \int f(u) h(u) f(y)\,dy + \int f(u) h(y) f(y)\,dy,$$

which, using $f\ell = -f'$, can be rewritten as

$$-r\big(2\rho x\, S_{\rho^2 x}\chi_0'(z) + S_{\rho^2 x}\chi_1'(z)\big) + S_{\rho^2 x}\int f_\rho(z-y) h(y) f(y)\,dy + S_{\rho^2 x}\int f(z-\rho y) h(y) f(y)\,dy$$

$$= r S_{\rho^2 x}\dot q(z) + E[S_{\rho^2 x}\psi(\varepsilon, z) h(\varepsilon)] = r_z r E[X^2] J + E\big[S_{\rho^2 x}\big(\psi_*(\varepsilon, z) - E[\psi_*(\varepsilon, z)]\big) h(\varepsilon)\big].$$

This shows that $\kappa(\rho, f) = q(z)$ has the desired gradient. In the last step we have used that $\psi_*(\varepsilon, z) - E[\psi_*(\varepsilon, z)]$ is the projection of $\psi(\varepsilon, z)$ onto $\bar H$.

3. The behavior of the density estimators

In this section we derive convergence rates and stochastic expansions for residual-based density estimators.

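Throughout the section we let $K$ denote a measurable function with finite $V$-norm and set $\nu_K = \int K(y)\,dy$ and $K_b(y) = K(y/b)/b$, $y \in \mathbb{R}$. For $z \in \mathbb{R}$ let

$$\hat H_w(z) = \frac{1}{n}\sum_{j=1}^n w_j K_b(z - \hat\varepsilon_j), \quad \hat H(z) = \frac{1}{n}\sum_{j=1}^n K_b(z - \hat\varepsilon_j),$$

$$\tilde H(z) = \frac{1}{n}\sum_{j=1}^n K_b(z - \varepsilon_j), \quad \bar H(z) = E[\tilde H(z)] = K_b * f(z).$$

Now fix an $\alpha$ with $1 < \alpha \le 2$ and set $W = W_\alpha$. Since $f$ has a finite second moment, so do the stationary random variables $X_t$, and this yields

$$\max_{1\le j\le n}|X_{j-1}| = o_p(n^{1/2}). \tag{3.1}$$

The root-$n$ consistency of $\hat\rho$ then implies that

$$\tau_n := \max_{1\le j\le n}|\hat\varepsilon_j - \varepsilon_j| = |\hat\rho - \rho|\max_{1\le j\le n}|X_{j-1}| = o_p(1). \tag{3.2}$$

From Lemma 1 we obtain the following result.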

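Lemma 2. Let $f$ have finite $V$-norm and be $L_V$-Lipschitz. Let $K$ have finite $V_*$-norm. Then

$$\|\bar H - \nu_K f\|_V = O(b).$$

Lemma 3. Let $f$ and $K^2$ have finite $W$-norms. Then

$$\|\tilde H - \bar H\|_V = O_p(n^{-1/2}b^{-1/2}).$$

Proof. We calculate

$$n\,\mathrm{Var}(\tilde H(y)) \le E[K_b^2(y - \varepsilon)] = K_b^2 * f(y), \quad y \in \mathbb{R},$$

and thus obtain in view of (1.3) the bound

$$nE[\|\tilde H - \bar H\|_V^2] \le C_\alpha\|K_b^2 * f\|_W \le C_\alpha\|f\|_W\|K_b^2\|_W.$$

This yields the desired result as $\|K_b^2\|_W \le b^{-1}\|K^2\|_W$ for $b \le 1$.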

Lemma 4. Suppose that $f$ is $L_V$-Lipschitz and has finite $W$-norm. Let $K$ be twice continuously differentiable, let $(K')^2$ and $(K'')^2$ have finite $W$-norms, and let $K'$ and $K''$ have finite $V_*$-norms. Let $nb^4 \to 0$. Then

$$\|\hat H - \tilde H\|_V = O_p(n^{-1}b^{-2}).$$

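Proof. We may assume that $b \le 1$. In view of $V \le V_* \le W$, the functions $f$, $K'$ and $K''$ have finite $V$-norms. Let $K_b'$ and $K_b''$ denote the first and second derivative of $K_b$. Then $K_b'(x) = K'(x/b)/b^2$ and $K_b''(x) = K''(x/b)/b^3$. Using this it is easy to see that

$$\|K_b'\|_V = O(b^{-1}), \quad \|K_b''\|_V = O(b^{-2}), \quad \|(K_b')^2\|_W = O(b^{-3}), \quad \|(K_b'')^2\|_W = O(b^{-5}).$$

Since $f$ is $L_V$-Lipschitz and $K'$ and $K''$ have finite $V_*$-norms, and since the integrals $\int K'(u)\,du$ and $\int K''(u)\,du$ equal zero, part (1) of Lemma 1 implies that

$$\|f * K_b'\|_V = O(1) \quad\text{and}\quad \|f * K_b''\|_V = O(b^{-1}). \tag{3.3}$$

Set

$$\Gamma(y) = \frac{1}{n}\sum_{j=1}^n X_{j-1}K_b'(y - \varepsilon_j), \quad \bar\Gamma(y) = \frac{1}{n}\sum_{j=1}^n X_{j-1}K_b' * f(y), \quad y \in \mathbb{R}.$$

Then we have

$$\|\bar\Gamma\|_V = O_p\Big(\Big|\frac{1}{n}\sum_{j=1}^n X_{j-1}\Big|\Big) = O_p(n^{-1/2}) = o_p(n^{-1}b^{-2}).$$

Note that $nE[(\Gamma(y) - \bar\Gamma(y))^2] \le E[X^2](K_b')^2 * f(y)$. Using this and (1.3) we obtain that

$$nE[\|\Gamma - \bar\Gamma\|_V^2] \le C_\alpha E[X^2]\|(K_b')^2 * f\|_W \le C_\alpha E[X^2]\|f\|_W\|(K_b')^2\|_W = O(b^{-3}).$$

The above shows that

$$\|(\hat\rho - \rho)\Gamma\|_V = O_p(n^{-1}b^{-3/2}).$$

Thus it suffices to show that

$$\|\hat H - \tilde H - (\hat\rho - \rho)\Gamma\|_V = O_p(n^{-1}b^{-2}).$$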

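A Taylor expansion shows that $\hat H(y) - \tilde H(y) - (\hat\rho - \rho)\Gamma(y)$ equals

$$\frac{1}{n}\sum_{j=1}^n (\hat\rho - \rho)^2 X_{j-1}^2\int_0^1 K_b''\big(y - \varepsilon_j + u(\hat\rho - \rho)X_{j-1}\big)(1-u)\,du.$$

Using $\|S_t h\|_V \le V(t)\|h\|_V$ and $V(s+t) \le V(s)V(t)$ we obtain

$$\|\hat H - \tilde H - (\hat\rho - \rho)\Gamma\|_V \le \frac{1}{n}\sum_{j=1}^n (\hat\rho - \rho)^2 X_{j-1}^2 V(\varepsilon_j)V(\tau_n)\|K_b''\|_V.$$

This gives the desired result in view of the bounds

$$(\hat\rho - \rho)^2\|K_b''\|_V = O_p(n^{-1}b^{-2}), \quad V(\tau_n) = 1 + o_p(1), \quad \frac{1}{n}\sum_{j=1}^n X_{j-1}^2 V(\varepsilon_j) = O_p(1),$$

the latter by the finiteness of $E[X_0^2 V(\varepsilon_1)]$.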

A stronger result is possible under an additional moment assumption.

Lemma 5. Let the assumptions of Lemma 4 hold with $b \sim (n\log n)^{-1/4}$. Let $f$ have a finite moment of order $\xi > 16/7$. Then

$$\|\hat H - \tilde H\|_V = o_p(n^{-1/2}).$$

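Proof. We may assume that $b \le 1$ and $\xi < 4$. It follows from the moment assumption on $f$ that the stationary random variables $X_t$ also have finite moments of order $\xi$. This implies that

$$\max_{1\le j\le n}|X_{j-1}| = o_p(n^{1/\xi}). \tag{3.4}$$

In view of the results in the proof of Lemma 4 it suffices to show that

$$\|\hat H - \tilde H - (\hat\rho - \rho)\Gamma\|_V = o_p(n^{-1/2}).$$

Set $r_{nj} = n^{-1/2}X_{j-1}$ and $\hat\Delta = n^{1/2}(\hat\rho - \rho)$. Then $\hat\varepsilon_j - \varepsilon_j = -(\hat\rho - \rho)X_{j-1} = -\hat\Delta r_{nj}$. This allows us to write $\hat H(y) - \tilde H(y) - (\hat\rho - \rho)\Gamma(y)$ as $R_{\hat\Delta}(y)$, where

$$R_\Delta(y) = \frac{1}{n}\sum_{j=1}^n \big(K_b(y - \varepsilon_j + \Delta r_{nj}) - K_b(y - \varepsilon_j) - \Delta r_{nj}K_b'(y - \varepsilon_j)\big).$$

In view of the $n^{1/2}$-consistency of $\hat\rho$, it suffices to show that, for each (large) constant $C$,

$$\sup_{|\Delta|\le C}\|R_\Delta\|_V = o_p(n^{-1/2}). \tag{3.5}$$

Now fix such a $C$. A Taylor expansion shows that

$$R_\Delta(y) = \frac{1}{n}\sum_{j=1}^n (\Delta r_{nj})^2\int_0^1 K_b''(y - \varepsilon_j + u\Delta r_{nj})(1-u)\,du$$

and that

$$R_{\Delta+\tilde\Delta}(y) - R_\Delta(y) = \frac{1}{n}\sum_{j=1}^n \tilde\Delta r_{nj}\int_0^1 (\Delta + v\tilde\Delta) r_{nj}\int_0^1 K_b''\big(y - \varepsilon_j + u(\Delta + v\tilde\Delta) r_{nj}\big)\,du\,dv.$$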

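Let $a = a_n$ be a sequence of positive numbers such that $C \ge a$ and $a \sim (\log n)^{-1}$. It follows from $\|S_t h\|_V \le V(t)\|h\|_V$ and the properties of $V$ that

$$\sup_{|\Delta|\le C}\,\sup_{|\tilde\Delta|\le a}\|R_{\Delta+\tilde\Delta} - R_\Delta\|_V \le a(C+a)\,V\Big((C+a)\max_{1\le j\le n}|r_{nj}|\Big)\|K_b''\|_V\,\frac{1}{n^2}\sum_{j=1}^n X_{j-1}^2 V(\varepsilon_j).$$

Since $\max_{1\le j\le n}|r_{nj}| = o_p(1)$ and $E[X_{j-1}^2 V(\varepsilon_j)] = E[X^2]\|f\|_V$, we see that

$$\sup_{|\Delta|\le C}\,\sup_{|\tilde\Delta|\le a}\|R_{\Delta+\tilde\Delta} - R_\Delta\|_V = O_p(a n^{-1}b^{-2}) = O_p((n\log n)^{-1/2}). \tag{3.6}$$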

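Let now $R_\Delta^*(y)$ be defined as $R_\Delta(y)$, but with $r_{nj}$ replaced by $r_{nj}^* = r_{nj}\mathbf{1}[|X_{j-1}| \le n^{1/\xi}]$. Then $R_\Delta^*(y)$ and $R_\Delta(y)$ can differ only on the event $\{\max_{1\le j\le n}|X_{j-1}| > n^{1/\xi}\}$, which has probability tending to zero by (3.4). This shows that

$$\sup_{|\Delta|\le C}\|R_\Delta^* - R_\Delta\|_V = o_p(n^{-1/2}). \tag{3.7}$$

Now set

$$\bar R_\Delta^*(y) = \frac{1}{n}\sum_{j=1}^n (\Delta r_{nj}^*)^2\int_0^1\int K_b''\big(y - z + u\Delta r_{nj}^*\big) f(z)\,dz\,(1-u)\,du.$$

In view of (3.3) and (1.1) we obtain

$$\sup_{|\Delta|\le C}\|\bar R_\Delta^*\|_V \le \sup_{|\Delta|\le C}\frac{1}{n}\sum_{j=1}^n (\Delta r_{nj}^*)^2\,V\Big(C\max_{1\le j\le n}|r_{nj}|\Big)\|K_b'' * f\|_V = O_p(n^{-1}b^{-1}) = o_p(n^{-1/2}). \tag{3.8}$$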

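Next,

$$E[\|(R_\Delta^* - \bar R_\Delta^*)^2\|_W] \le \int_0^1\int W(y)E[\Gamma_\Delta^2(y, u)]\,dy\,du,$$

with

$$\Gamma_\Delta(y, u) = \frac{1}{n}\sum_{j=1}^n (\Delta r_{nj}^*)^2\Big(K_b''\big(y - \varepsilon_j + u\Delta r_{nj}^*\big) - \int K_b''\big(y - z + u\Delta r_{nj}^*\big) f(z)\,dz\Big)$$

a martingale. Since

$$E[\Gamma_\Delta^2(y, u)] \le n^{-3}|\Delta|^4 E\big[|X_0|^4\mathbf{1}[|X_0| \le n^{1/\xi}]\big(K_b''(y - \varepsilon_1 + u\Delta n^{-1/2}X_0)\big)^2\big],$$

we obtain from (1.1) with $W$ in place of $V$ that

$$n\sup_{|\Delta|\le C}E[\|(R_\Delta^* - \bar R_\Delta^*)^2\|_W] \le W(Cn^{-1/2+1/\xi})\,C^4 n^{-2+(4-\xi)/\xi}\,b^{-5}E[|X|^\xi]\,\|f\|_W\|(K'')^2\|_W.$$

Since $-2 + (4-\xi)/\xi + 5/4 < 0$ in view of $\xi > 16/7$, we arrive at the rate

$$n\sup_{|\Delta|\le C}E[\|(R_\Delta^* - \bar R_\Delta^*)^2\|_W] = O(n^{-\zeta})$$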

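for some $\zeta > 0$. In view of (1.3) we then have, for every finite subset $D_n$ of the interval $(-C, C)$ with $M_n$ elements,

$$P\Big(\max_{\Delta\in D_n} n^{1/2}\|R_\Delta^* - \bar R_\Delta^*\|_V > \eta\Big) \le \sum_{\Delta\in D_n}P\big(n^{1/2}\|R_\Delta^* - \bar R_\Delta^*\|_V > \eta\big)$$

$$\le \sum_{\Delta\in D_n}P\big(nC_\alpha\|(R_\Delta^* - \bar R_\Delta^*)^2\|_W > \eta^2\big) \le \sum_{\Delta\in D_n}\frac{nC_\alpha}{\eta^2}E[\|(R_\Delta^* - \bar R_\Delta^*)^2\|_W], \quad \eta > 0.$$

This shows that, for every $\eta > 0$ and every finite subset $D_n$ as above,

$$P\Big(\max_{\Delta\in D_n} n^{1/2}\|R_\Delta^* - \bar R_\Delta^*\|_V > \eta\Big) = O(M_n n^{-\zeta}). \tag{3.9}$$

Now take $D_n$ to be such that the intervals of length $a_n$ centered at elements of $D_n$ cover the interval $[-C, C]$. We can choose $D_n$ such that $M_n = O(a_n^{-1}) = O(\log n)$. The desired (3.5) follows from (3.6)–(3.9).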

Lemma 6. Let the assumptions on $f$ and $K$ of Lemmas 2 to 4 hold. Let $nb^4 \to 0$ and $nb^{8/3} \to \infty$. Then we have

$$\|\hat H - \nu_K f\|_V = o_p(n^{-1/4}).$$

In addition, if $K$ has finite $W$-norm and $\int u^2|K(u)|\,du$ is finite, then

$$\|\hat H - \nu_K f\|_{V_*} = o_p(n^{-1/16}).$$

If also $\int y^2 V(y) f(y)\,dy$ is finite, then

$$\|\hat H - \nu_K f\|_{V_*} = o_p(n^{-1/8}).$$
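The two bandwidth conditions of Lemma 6 are both met by the rate $b \sim (n\log n)^{-1/4}$ used in Theorems 1 and 2: then $nb^4 = 1/\log n \to 0$ and $nb^{8/3} = n^{1/3}(\log n)^{-2/3} \to \infty$. The short sketch below tabulates both quantities for a few sample sizes, taking the constant in $b \sim (n\log n)^{-1/4}$ to be one (an assumption made for illustration).

```python
import math

def bandwidth(n):
    """b_n = (n log n)^(-1/4), with the proportionality constant set to one."""
    return (n * math.log(n)) ** -0.25

sample_sizes = [10 ** 3, 10 ** 5, 10 ** 7, 10 ** 9]
nb4 = [n * bandwidth(n) ** 4 for n in sample_sizes]             # equals 1 / log n
nb83 = [n * bandwidth(n) ** (8.0 / 3.0) for n in sample_sizes]  # n^(1/3) (log n)^(-2/3)

for n, small, large in zip(sample_sizes, nb4, nb83):
    print(f"n={n:>10d}  n*b^4={small:.4f}  n*b^(8/3)={large:.2f}")
```

The first column decreases only logarithmically, which reflects how close this bandwidth is to the boundary $nb^4 \to 0$.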

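Proof. The first conclusion is a consequence of Lemmas 2 to 4. Since $\alpha > 1$, we have $(1+|y|)V^{3/2}(y) \le W(y)$, $y \in \mathbb{R}$, and hence, by the Cauchy-Schwarz inequality,

$$\|h\|_{V_*}^2 \le \int (1+|y|)V^{3/2}(y)|h(y)|\,dy\int (1+|y|)V^{1/2}(y)|h(y)|\,dy \le \|h\|_W\|h\|_U^{1/2}\|h\|_V^{1/2},$$

where $U(y) = (1+|y|)^2$, $y \in \mathbb{R}$. The second conclusion now follows from the facts that $\|f\|_W$ and $\|f\|_U$ are finite and that $\|\hat H\|_W = O_p(1)$ and $\|\hat H\|_U = O_p(1)$. For example, using $\|S_t h\|_W \le W(t)\|h\|_W$ and $W(x+y) \le W(x)W(y)$, we find

$$\|\hat H\|_W \le \frac{1}{n}\sum_{j=1}^n\int |K_b(z - \hat\varepsilon_j)|W(z)\,dz \le \|K_b\|_W\frac{1}{n}\sum_{j=1}^n W(\hat\varepsilon_j) \le \|K\|_W W(\tau_n)\frac{1}{n}\sum_{j=1}^n W(\varepsilon_j)$$

for $b \le 1$ and thus $\|\hat H\|_W = O_p(1)$ in view of (3.2) and finiteness of $E[W(\varepsilon)] = \|f\|_W$. The third conclusion is proved similarly using instead the inequality

$$\|h\|_{V_*}^2 \le \int (1+|y|)^2 V(y)|h(y)|\,dy\int V(y)|h(y)|\,dy.$$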

Let $\iota$ denote the identity map on $\mathbb{R}$.