Donsker results for the smoothed empirical process of dependent observations

Eric Beutner, Maastricht University
Joint work with Henryk Zähle

Besançon, May 13

Overview

Introduction
Extending the benchmark
Free lunch? Yes, free lunch!
One more free lunch? No (too much asked).

Introduction

■ Let Z₁, …, Z_n be identically distributed with cdf F₀ and density f₀, and consider the kernel-based estimator of f₀

$$\hat f_n(z) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h_n} K\Big(\frac{z - Z_i}{h_n}\Big) = \int \frac{1}{h_n} K\Big(\frac{z - y}{h_n}\Big)\, d\hat F_n(y),$$

where $\hat F_n$ is the empirical distribution function based on Z₁, …, Z_n and h_n is the bandwidth.
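A minimal Python sketch of this estimator, assuming the Epanechnikov kernel K(u) = ¾(1 − u²) on [−1, 1] (an illustrative choice), a simulated standard normal sample and the bandwidth h_n = n^{−1/5}. The helper names `epanechnikov` and `kde` are hypothetical. As a sanity check, the estimate should integrate to one.

```python
import numpy as np

def epanechnikov(u):
    """Illustrative kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(z, sample, h):
    """Kernel estimate f_hat_n(z) = (1/n) * sum_i (1/h) * K((z - Z_i) / h) at each point z."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (z[:, None] - sample[None, :]) / h     # shape (len(z), n)
    return epanechnikov(u).mean(axis=1) / h

rng = np.random.default_rng(0)
sample = rng.standard_normal(500)              # simulated Z_1, ..., Z_n (assumption)
h = len(sample) ** (-1 / 5)                    # bandwidth h_n = n^(-1/5)

grid = np.linspace(-6.0, 6.0, 4001)
f_hat = kde(grid, sample, h)
mass = float(np.sum((f_hat[1:] + f_hat[:-1]) / 2 * np.diff(grid)))  # trapezoid rule
print(f"integral of f_hat_n over the grid: {mass:.4f}")
```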


■ Let the kernel K be symmetric and, for some C > 0,

$$\int K(y)\,dy = 1, \qquad \int y K(y)\,dy = 0, \qquad \int y^2 K(y)\,dy = C.$$
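For a concrete kernel, the three conditions can be checked numerically. The sketch below assumes the Epanechnikov kernel K(y) = ¾(1 − y²) on [−1, 1]; for it, ∫y²K(y)dy = ¾(2/3 − 2/5) = 1/5, so C = 0.2.

```python
import numpy as np

def trapezoid(values, x):
    """Composite trapezoid rule, written out to avoid depending on a particular NumPy version."""
    return float(np.sum((values[1:] + values[:-1]) / 2 * np.diff(x)))

y = np.linspace(-1.0, 1.0, 200001)         # support of the illustrative Epanechnikov kernel
K = 0.75 * (1.0 - y**2)                    # symmetric: K(y) = K(-y)

mass = trapezoid(K, y)                     # int K(y) dy, expected 1
first_moment = trapezoid(y * K, y)         # int y K(y) dy, expected 0 by symmetry
C = trapezoid(y**2 * K, y)                 # int y^2 K(y) dy, exactly 1/5 for this kernel
print(mass, first_moment, C)
```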


■ For f₀ twice continuously differentiable, the bandwidth rate h_n = n^{−1/5} is optimal for

$$\mathrm{MISE}(\hat f_n) = \int E\big[(\hat f_n(z) - f_0(z))^2\big]\,dz.$$

Introduction (cont’d)

■ Ideally, we could use this estimate of f₀ to estimate, for instance, moments of F₀. The plug-in estimate of the second moment would be

$$\int z^2 \hat f_n(z)\,dz = \sum_{i=1}^{n} \int z^2 \frac{1}{n h_n} K\Big(\frac{z - Z_i}{h_n}\Big)\,dz.$$


■ Substituting z by z h_n + Z_i, this equals

$$\frac{1}{n}\sum_{i=1}^{n} \int z^2 h_n^2 K(z)\,dz + \frac{2}{n}\sum_{i=1}^{n} Z_i h_n \int z K(z)\,dz + \frac{1}{n}\sum_{i=1}^{n} Z_i^2.$$
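A numerical sketch of this identity, assuming the Epanechnikov kernel (for which ∫zK(z)dz = 0 and C = ∫z²K(z)dz = 0.2), simulated standard normal data and h_n = n^{−1/5}: integrating z²f̂_n(z) on a fine grid should reproduce (1/n)ΣZ_i² + C h_n² up to quadrature error.

```python
import numpy as np

def trapezoid(values, x):
    """Composite trapezoid rule."""
    return float(np.sum((values[1:] + values[:-1]) / 2 * np.diff(x)))

def epanechnikov(u):
    """Illustrative kernel K(u) = 0.75 (1 - u^2) on [-1, 1]; its constant C = int u^2 K(u) du is 0.2."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

C = 0.2
rng = np.random.default_rng(1)
sample = rng.standard_normal(300)          # simulated Z_1, ..., Z_n (assumption)
n = sample.size
h = n ** (-1 / 5)

# f_hat_n on a grid wide enough to contain its support [min Z_i - h, max Z_i + h].
grid = np.linspace(sample.min() - 1.0, sample.max() + 1.0, 40001)
f_hat = np.zeros_like(grid)
for z_i in sample:                         # accumulate one kernel bump per observation
    f_hat += epanechnikov((grid - z_i) / h)
f_hat /= n * h

plug_in = trapezoid(grid**2 * f_hat, grid)        # int z^2 f_hat_n(z) dz
closed_form = np.mean(sample**2) + C * h**2       # (1/n) sum Z_i^2 + C h_n^2
print(plug_in, closed_form)
```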


■ Since $\int zK(z)\,dz = 0$ and $\int z^2K(z)\,dz = C$, the difference between this estimate and $\int z^2 f_0(z)\,dz$ equals

$$\frac{1}{n}\sum_{i=1}^{n} Z_i^2 - \int z^2 f_0(z)\,dz + C h_n^2.$$


■ ⇒ $\int z^2 \hat f_n(z)\,dz$ is not √n-consistent for E[Z₁²], since with h_n = n^{−1/5} the bias term satisfies $\sqrt{n}\,C h_n^2 = C n^{1/10} \to \infty$.
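Plugging in numbers makes the failure concrete. Under the illustrative assumption C = 0.2 (the Epanechnikov constant) and h_n = n^{−1/5}, the scaled bias √n·C·h_n² = C·n^{1/10} grows without bound:

```python
import numpy as np

C = 0.2                                    # illustrative kernel constant (Epanechnikov)
ns = np.array([1e2, 1e4, 1e6, 1e8])
h = ns ** (-1 / 5)                         # h_n = n^(-1/5)
scaled_bias = np.sqrt(ns) * C * h**2       # sqrt(n) * C * h_n^2 = C * n^(1/10)
for n, b in zip(ns, scaled_bias):
    print(f"n = {n:.0e}: sqrt(n) * C * h_n^2 = {b:.3f}")
```

The divergence is slow: even at n = 10⁸ the scaled bias is only about 1.26 for C = 0.2.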

Introduction (cont’d)

■ Can we find K such that the plug-in estimator $\hat f_n \mapsto \int z^2 \hat f_n(z)\,dz$ is √n-consistent, i.e.

$$\sqrt{n}\Big(\int z^2 \hat f_n(z)\,dz - \int z^2 f_0(z)\,dz\Big) = O_P(1),$$

while retaining MISE-optimality at the same time?


■ More generally, for which functions g and kernels K can we have MISE-optimality and

$$\sqrt{n}\Big(\int g(z)\hat f_K(z)\,dz - \int g(z) f_0(z)\,dz\Big) = O_P(1)$$

at the same time?

Introduction (cont’d)

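■ Comparing this last question with what is known for the plug-in estimator based on $\hat F_n$, where even

$$\sup_{\tilde g \in \tilde{\mathcal G}} \sqrt{n}\,\Big|\int \tilde g(z)\,d\hat F_n(z) - \int \tilde g(z)\,dF_0(z)\Big| = O_P(1)$$

for several classes of functions $\tilde{\mathcal G}$ (the process even converges weakly),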


■ we may ask whether, for some kernels, we can have MISE-optimality and

$$\sup_{\tilde g \in \tilde{\mathcal G}} \sqrt{n}\,\Big|\int \tilde g(z)\hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz\Big| = O_P(1)$$

for the same classes of functions (or even weak convergence).

Introduction (cont’d)

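■ Already known: if Z₁, …, Z_n are iid, then, for instance, for $\tilde{\mathcal G} = BV = \{\tilde g: \mathbb R \to \mathbb R \mid \text{total variation of } \tilde g \le c\}$ we have

$$\sup_{\tilde g \in BV} \sqrt{n}\,\Big|\int \tilde g(z)\,d\hat F_n(z) - \int \tilde g(z)\,dF_0(z)\Big| = O_P(1).$$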
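For the class BV, integration by parts (under the usual continuity conventions) gives |∫g̃ d(F̂_n − F₀)| ≤ V(g̃)·sup_x|F̂_n(x) − F₀(x)|, and indicators 1_{(−∞,x]} have variation 1, so for c ≥ 1 this benchmark is essentially the statement that the Kolmogorov–Smirnov statistic √n·sup_x|F̂_n(x) − F₀(x)| is O_P(1). The simulation below, which assumes standard normal data and 200 replications per n (both arbitrary choices), shows its average staying of order one as n grows; the limit law is the Kolmogorov distribution, whose mean is √(π/2)·ln 2 ≈ 0.87.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def normal_cdf(x):
    """Standard normal cdf F_0, via math.erf."""
    return 0.5 * (1.0 + _erf(np.asarray(x) / math.sqrt(2.0)))

def scaled_ks(sample):
    """sqrt(n) * sup_x |F_hat_n(x) - F_0(x)|, computed exactly from the order statistics."""
    x = np.sort(sample)
    n = x.size
    F = normal_cdf(x)
    i = np.arange(1, n + 1)
    d = max(np.max(i / n - F), np.max(F - (i - 1) / n))
    return math.sqrt(n) * float(d)

rng = np.random.default_rng(2)
means = {}
for n in (100, 1000, 10000):
    stats = [scaled_ks(rng.standard_normal(n)) for _ in range(200)]
    means[n] = float(np.mean(stats))
    print(n, round(means[n], 3))
```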

■ Giné and Nickl (2008) proved that, for this class of functions and f₀ bounded and m-times continuously differentiable, one has with $h_n = n^{-1/(2m+1)}$ (i.e. MISE-optimality)

$$\sup_{\tilde g \in BV} \sqrt{n}\,\Big|\int \tilde g(z)\hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz\Big| = O_P(1),$$


if $\sqrt{n}\,h_n^{m+k} \to 0$ and if, for r = 1, …, m and some k > 1/2,

$$\int K(z)\,dz = 1, \qquad \int z^r K(z)\,dz = 0, \qquad \int |z|^{m+k}|K(z)|\,dz < \infty.$$
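Two quick checks of these conditions, as a sketch. With h_n = n^{−1/(2m+1)} one gets √n·h_n^{m+k} = n^{(1−2k)/(2(2m+1))}, which tends to 0 exactly when k > 1/2, so the conditions are compatible with MISE-optimality. For m ≥ 2 the moment conditions require a higher-order kernel; the code uses the polynomial kernel K(u) = (15/32)(3 − 10u² + 7u⁴) on [−1, 1] purely as an illustration (our choice, not taken from the talk). Its moments of order r = 1, 2, 3 vanish, so it qualifies with m = 3, and being compactly supported it has ∫|z|^{m+k}|K(z)|dz < ∞ for every k. Such kernels necessarily take negative values, since ∫z²K(z)dz = 0 is impossible for K ≥ 0.

```python
from fractions import Fraction

def rate_exponent(m, k):
    """Exponent e with sqrt(n) * h_n**(m + k) = n**e when h_n = n**(-1 / (2m + 1))."""
    return Fraction(1, 2) - (m + Fraction(k)) / (2 * m + 1)

# Illustrative fourth-order kernel K(u) = (15/32) * (3 - 10 u^2 + 7 u^4) on [-1, 1],
# stored as {power of u: coefficient}.
KERNEL = {0: Fraction(45, 32), 2: Fraction(-150, 32), 4: Fraction(105, 32)}

def kernel_moment(r):
    """Exact int_{-1}^{1} u^r K(u) du, using int_{-1}^{1} u^j du = 2/(j+1) for even j and 0 for odd j."""
    total = Fraction(0)
    for power, coeff in KERNEL.items():
        j = r + power
        if j % 2 == 0:
            total += coeff * Fraction(2, j + 1)
    return total

print([kernel_moment(r) for r in range(5)])   # moments of order r = 0, ..., 4
print(rate_exponent(3, Fraction(3, 5)))       # m = 3, k = 3/5: negative, so sqrt(n) h_n^(m+k) -> 0
```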

Introduction (cont’d)

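■ Giné and Nickl (2008) proved two more results in the same spirit with $\tilde{\mathcal G} \subset C(\mathbb R)$, where

$$C(\mathbb R) = \{f: \mathbb R \to \mathbb R \mid f \text{ continuous},\ |f(x)| \le M \text{ for all } x \in \mathbb R\}.$$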

■ Clearly, our example g(z) = z² from the beginning neither belongs to $C(\mathbb R)$ nor is of bounded variation on $\mathbb R$.


■ First, one might ask whether the above results can be extended, for instance, to the set

$$BV_{\mathrm{loc}} = \{\tilde g: \mathbb R \to \mathbb R \mid \tilde g \text{ is locally of bounded variation}\}.$$


■ Second, one might ask whether the results can be extended to time-series settings, where Z_i is an ARMA(p, q) or a GARCH(p, q) process.

Extending the benchmark

Locally bounded variation: iid

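■ First recall the benchmark for the kernel-based density estimator, i.e. the result for the empirical distribution function:

$$\sup_{\tilde g \in BV} \sqrt{n}\,\Big|\int \tilde g(z)\,d\hat F_n(z) - \int \tilde g(z)\,dF_0(z)\Big| = O_P(1).$$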

■ Hence, we first need to extend this result in two directions:

1. replace the supremum over $\tilde g \in BV$ by the supremum over $\tilde g \in BV_{\mathrm{loc}}$;

2. replace $\hat F_n$ based on iid Z₁, …, Z_n by $\hat F_n$ based on an ARMA(p, q) or GARCH(p, q) process Z_i (or, more generally, on observations satisfying some weak dependence concept).


■ Let $\varphi: \mathbb R \to [1, \infty)$ be a weight function and put

$$BV_{1/\varphi,\le c} = \Big\{\tilde g \in BV_{\mathrm{loc}} : \int \frac{1}{\varphi(z)}\,|d\tilde g|(z) \le c\Big\}.$$


■ Note that for φ ≡ 1 we recover the functions of bounded variation (with total variation at most c).


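■ If we take $\varphi(z) = (1 + |z|)^{2+\epsilon}$, ε > 0, then our example g(z) = z² from the beginning (dg(z) = 2z dz) is included, for c large enough, in

$$BV_{1/(1+|z|)^{2+\epsilon},\le c} = \Big\{\tilde g \in BV_{\mathrm{loc}} : \int \frac{1}{(1 + |z|)^{2+\epsilon}}\,|d\tilde g|(z) \le c\Big\}.$$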
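A quick numerical check of this membership, as a sketch: for g(z) = z², ∫(1 + |z|)^{−(2+ε)}|dg|(z) = ∫2|z|(1 + |z|)^{−(2+ε)}dz = 4/(ε(1 + ε)) (a Beta-integral computation), so g lies in the class whenever c ≥ 4/(ε(1 + ε)); for ε = 0 the integral diverges logarithmically. The truncation radius R and the log-spaced grid are arbitrary numerical choices.

```python
import numpy as np

def trapezoid(values, x):
    """Composite trapezoid rule."""
    return float(np.sum((values[1:] + values[:-1]) / 2 * np.diff(x)))

def weighted_variation_of_square(eps, R=1e6, points=400001):
    """Approximate int_{-R}^{R} |g'(z)| / (1 + |z|)^(2 + eps) dz for g(z) = z^2, i.e. g'(z) = 2z."""
    z = np.concatenate(([0.0], np.geomspace(1e-8, R, points)))   # log-spaced grid on [0, R]
    integrand = 2.0 * z / (1.0 + z) ** (2.0 + eps)
    return 2.0 * trapezoid(integrand, z)                        # integrand is even in z

for eps in (1.0, 2.0):
    print(eps, weighted_variation_of_square(eps), 4.0 / (eps * (1.0 + eps)))

# eps = 0: the truncated integral keeps growing with R (logarithmic divergence).
print(weighted_variation_of_square(0.0, R=1e3), weighted_variation_of_square(0.0, R=1e6))
```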


■ Theorem: Let Z₁, …, Z_n be iid and φ be a weight function. Then, for F₀ with $\int \varphi^2\,dF_0 < \infty$, we have

$$\sup_{\tilde g \in BV_{1/\varphi,\le c}} \sqrt{n}\,\Big|\int \tilde g(z)\,d\hat F_n(z) - \int \tilde g(z)\,dF_0(z)\Big| = O_P(1).$$

Dependent data

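■ Theorem: Let Z₁, …, Z_n be strictly stationary and α-mixing with α(n) = O(n^{−θ}) for some θ > 1 + √2. Let $\varphi_\lambda(x) := (1 + |x|)^\lambda$, λ ≥ 0, and assume that $\int_{\mathbb R} |x|^\gamma\,dF_0(x) < \infty$ for some $\gamma > \frac{2\theta\lambda}{\theta - 1}$. Then

$$\sup_{\tilde g \in BV_{1/\varphi_\lambda,\le c}} \sqrt{n}\,\Big|\int \tilde g(z)\,d\hat F_n(z) - \int \tilde g(z)\,dF_0(z)\Big| = O_P(1).$$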
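The trade-off in this theorem can be read off directly, taking the moment condition to be γ > 2θλ/(θ − 1): faster mixing (larger θ) lowers the required moment, and as θ → ∞ the threshold approaches 2λ, the moment order behind the iid condition ∫φ_λ² dF₀ < ∞. The helper names below are hypothetical.

```python
import math

def mixing_rate_admissible(theta):
    """The theorem assumes alpha(n) = O(n^(-theta)) with theta > 1 + sqrt(2)."""
    return theta > 1.0 + math.sqrt(2.0)

def moment_threshold(theta, lam):
    """Lower bound on gamma in int |x|^gamma dF_0 < inf, read as gamma > 2*theta*lam/(theta - 1)."""
    if not mixing_rate_admissible(theta):
        raise ValueError("theta must exceed 1 + sqrt(2)")
    return 2.0 * theta * lam / (theta - 1.0)

for theta in (2.5, 3.0, 5.0, 50.0):
    print(theta, moment_threshold(theta, lam=1.0))   # decreases towards 2 * lam = 2
```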


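■ Theorem: Let $Z_t := \sum_{s=0}^{\infty} a_s \varepsilon_{t-s}$, t ∈ ℕ, with $(\varepsilon_i)_{i \in \mathbb Z}$ i.i.d. Assume $a_s = s^{-\beta}\ell(s)$, β ∈ (1/2, 1), s ∈ ℕ. Then Cov(Z₀, Z_k) ∼ k^{1−2β}, which is not summable; thus (Z_t) has long memory. With φ_λ as above and E[|ε₀|^{2+2λ}] < ∞, we have

$$\sup_{\tilde g \in BV_{1/\varphi_\lambda,\le c}} r_n\,\Big|\int \tilde g(z)\,d\hat F_n(z) - \int \tilde g(z)\,dF_0(z)\Big| = O_P(1), \qquad \text{where } r_n = n^{\beta - 1/2}.$$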
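The covariance decay can be checked numerically, as a sketch under simplifying assumptions: ℓ ≡ 1, a_0 = 1 (the formula s^{−β} is only used for s ≥ 1), unit innovation variance, β = 0.75, and the infinite sum Cov(Z₀, Z_k) = Σ_s a_s a_{s+k} truncated after S = 4,000,000 terms. The log-log slope between k = 10³ and k = 10⁴ should be close to 1 − 2β; truncation and lower-order terms shift it slightly.

```python
import numpy as np

def coefficients(beta, S):
    """a_0 = 1 and a_s = s^(-beta) for 1 <= s <= S (slowly varying factor l taken as 1)."""
    a = np.ones(S + 1)
    a[1:] = np.arange(1, S + 1, dtype=float) ** (-beta)
    return a

def autocov(a, k):
    """Truncated Cov(Z_0, Z_k) = sum_{s=0}^{S-k} a_s a_{s+k} for unit-variance innovations."""
    return float(np.dot(a[:-k], a[k:]))

beta = 0.75
a = coefficients(beta, S=4_000_000)
k1, k2 = 1_000, 10_000
slope = np.log(autocov(a, k2) / autocov(a, k1)) / np.log(k2 / k1)
print(f"estimated exponent {slope:.3f}, predicted 1 - 2*beta = {1 - 2 * beta:.3f}")
```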

Free lunch? Yes, free lunch!

Intro

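■ First note that

$$\begin{aligned}
\int \tilde g(z)\hat f_n(z)\,dz - \int \tilde g(z) f_0(z)\,dz
&= \iint \tilde g(z)\frac{1}{h_n}K\Big(\frac{z-y}{h_n}\Big)\,d\hat F_n(y)\,dz - \iint \tilde g(z)\frac{1}{h_n}K\Big(\frac{z-y}{h_n}\Big)\,dF_0(y)\,dz \\
&\quad + \iint \tilde g(z)\frac{1}{h_n}K\Big(\frac{z-y}{h_n}\Big)\,dF_0(y)\,dz - \int \tilde g(z)\,dF_0(z).
\end{aligned}$$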

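■ Rewriting the first two terms on the right-hand side as

$$\int \underbrace{\int \tilde g(z)\frac{1}{h_n}K\Big(\frac{z-y}{h_n}\Big)\,dz}_{=:\ \bar g_n(y)}\,d\hat F_n(y) - \int \underbrace{\int \tilde g(z)\frac{1}{h_n}K\Big(\frac{z-y}{h_n}\Big)\,dz}_{=\ \bar g_n(y)}\,dF_0(y),$$

we see that the above benchmark result applies if all the $\bar g_n$ belong to $BV_{1/\varphi,\le c}$.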
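For the running example g̃(z) = z², the smoothed function can be computed explicitly: the substitution z = y + h_n u gives ḡ_n(y) = y² + 2y h_n∫uK(u)du + h_n²∫u²K(u)du = y² + C h_n² for a symmetric kernel. So the first two terms reduce to (1/n)ΣZ_i² − ∫z²dF₀ (the constant C h_n² cancels), and the whole bias C h_n² sits in the remaining deterministic term. The sketch below confirms the formula for ḡ_n numerically, assuming the Epanechnikov kernel (C = 0.2) and an arbitrary bandwidth h = 0.3; `g_bar` is a hypothetical helper name.

```python
import numpy as np

def trapezoid(values, x):
    """Composite trapezoid rule."""
    return float(np.sum((values[1:] + values[:-1]) / 2 * np.diff(x)))

def epanechnikov(u):
    """Illustrative kernel K(u) = 0.75 (1 - u^2) on [-1, 1]; C = int u^2 K(u) du = 0.2."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def g_bar(y, h, g=lambda z: z**2, points=20001):
    """g_bar_n(y) = int g(z) (1/h) K((z - y)/h) dz, integrating over the kernel's support [y - h, y + h]."""
    z = np.linspace(y - h, y + h, points)
    return trapezoid(g(z) * epanechnikov((z - y) / h) / h, z)

C, h = 0.2, 0.3
ys = np.array([-3.0, -0.5, 0.0, 1.0, 2.5])
numeric = np.array([g_bar(y, h) for y in ys])
print(np.column_stack([ys, numeric, ys**2 + C * h**2]))   # columns: y, numeric g_bar_n(y), y^2 + C h^2
```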


■ Hence, it only remains to consider

$$\sup_{\tilde g \in \tilde{\mathcal G}} \Big|\iint \tilde g(z)\frac{1}{h_n}K\Big(\frac{z-y}{h_n}\Big)\,dF_0(y)\,dz - \int \tilde g(z)\,dF_0(z)\Big|.$$

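■ Consider this for

$$\tilde g \in \tilde{\mathcal G} := \{g_x : \mathbb R \to \mathbb R \mid g_x(z) = \varphi(x)\mathbb 1_{(-\infty,x)}(z) \text{ for } x \le 0, \text{ and } g_x(z) = -\varphi(x)\mathbb 1_{[x,\infty)}(z) \text{ for } x > 0\}.$$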

■ Then the above becomes

$$\sup_{x} \varphi(x)\,\Big|\int K(z)\big(F_0(x + z h_n) - F_0(x)\big)\,dz\Big|.$$
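A numerical sketch of this bias term, under assumptions chosen for convenience: F₀ is the logistic distribution function (closed form, exponential tails), K is the compactly supported Epanechnikov kernel, φ(x) = (1 + |x|)², and the supremum is taken over a finite grid. For a symmetric second-order kernel, a Taylor expansion suggests the term behaves like (C/2)·h_n²·sup_x φ(x)|f₀′(x)|, so its ratio to h_n² should be nearly constant as h_n shrinks.

```python
import numpy as np

def logistic_cdf(x):
    """F_0(x) = 1 / (1 + exp(-x)); an illustrative choice with a closed-form cdf."""
    return 1.0 / (1.0 + np.exp(-x))

def bias_sup(h, lam=2.0):
    """Grid approximation of sup_x phi(x) |int K(z) (F_0(x + z h) - F_0(x)) dz| with phi(x) = (1 + |x|)^lam."""
    x = np.linspace(-15.0, 15.0, 3001)
    z = np.linspace(-1.0, 1.0, 401)                     # support of the Epanechnikov kernel
    K = 0.75 * (1.0 - z**2)
    diff = logistic_cdf(x[:, None] + z[None, :] * h) - logistic_cdf(x)[:, None]
    integrand = K[None, :] * diff
    inner = np.sum((integrand[:, 1:] + integrand[:, :-1]) / 2 * np.diff(z), axis=1)  # trapezoid in z
    phi = (1.0 + np.abs(x)) ** lam
    return float(np.max(phi * np.abs(inner)))

ratios = {h: bias_sup(h) / h**2 for h in (0.2, 0.1, 0.05)}
print(ratios)   # roughly constant: the term is O(h^2) for this second-order kernel
```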

Free lunch

■ If the benchmark result holds for F₀, we have φ(x)F₀(x) → 0 as x → −∞ and φ(x)(1 − F₀(x)) → 0 as x → ∞.


■ Hence, for compactly supported K and x sufficiently small (or large), F₀(x + zh_n) − F₀(x) is small for all z in the support of K, even after multiplication by φ(x).


■ Theorem: We have the following extension of the above result:

$$\sup_{\tilde g \in BV_{1/\varphi,\le c}} \sqrt{n}\,\Big|\int \tilde g(z)\hat f_n(z)\,dz - \int \tilde g(z) f_0(z)\,dz\Big| = O_P(1),$$


if $\sqrt{n}\,h_n^{m+k} \to 0$, if $\sup_z \varphi(z) f_0^{(m)}(z) \le C$, and if, for r = 1, …, m and some k > 1/2,

$$\int K(z)\,dz = 1, \qquad \int z^r K(z)\,dz = 0, \qquad \int |z|^{m+k}|K(z)|\,dz < \infty.$$

One more free lunch? No (too much asked).

Intro

■ Now consider K with non-compact support. Then the above reasoning, namely that F₀(x + zh_n) − F₀(x) is small whenever F₀(x) is small, no longer applies: by taking z large we can make F₀(x + zh_n) almost equal to 1.


■ Yet, intuitively, if K does not put too much weight on such z, then $\int K(z)\big(F_0(x + z h_n) - F_0(x)\big)\,dz$ should still be small for small x.


■ Impose the following: f₀ is m-times continuously differentiable, and for all t ∈ [0, 1] and all y ∈ ℝ we have

$$\sup_x \big|\varphi(x) f_0^{(m)}(x + t y)\big| \le L_2 |y|^{p(y)},$$

where p is a bounded function.

Too much asked

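■ Theorem: Let f₀ be as above and let K have non-compact support. Then

$$\sup_{\tilde g \in BV_{1/\varphi,\le c}} \sqrt{n}\,\Big|\int \tilde g(z)\hat f_n(z)\,dz - \int \tilde g(z) f_0(z)\,dz\Big| = O_P(1),$$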


if $\sqrt{n}\,h_n^{m+s} \to 0$ and if, for r = 1, …, ⌊m + s⌋,

$$\int K(z)\,dz = 1, \qquad \int z^r K(z)\,dz = 0, \qquad \int |z|^{m+s}|K(z)|\,dz < \infty,$$

where s = sup_y p(y).

Example

■ K compactly supported: for the double-exponential density f₀, $\sup_z \varphi(z) f_0(z)$ is clearly finite for any polynomial weight function φ.


■ K with non-compact support and the same f₀: for a polynomial weight of the form (1 + |z|)^λ, the above condition holds with p > λ.

■ Thus, the order of the kernel has to be increased beyond what the smoothness of f₀ requires, and the size of this increase depends on the weight.
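A quick check of the compact-kernel example, assuming the standard double-exponential (Laplace) density f₀(z) = ½e^{−|z|} and φ(z) = (1 + |z|)^λ: for λ > 1 the supremum of φ(z)f₀(z) is attained at |z| = λ − 1 and equals ½λ^λe^{1−λ} (for λ ≤ 1 it is ½, attained at 0). It is finite for every λ but grows quickly with λ. The function names are hypothetical.

```python
import numpy as np

def sup_weighted_laplace_density(lam, R=200.0, points=2_000_001):
    """Grid approximation of sup_z (1 + |z|)^lam * 0.5 * exp(-|z|); the function is even, so z >= 0 suffices."""
    z = np.linspace(0.0, R, points)
    values = (1.0 + z) ** lam * 0.5 * np.exp(-z)
    return float(values.max())

def closed_form_sup(lam):
    """Maximiser z* = lam - 1 when lam > 1; otherwise the function is decreasing and the max is at z = 0."""
    if lam <= 1.0:
        return 0.5
    return 0.5 * lam**lam * np.exp(1.0 - lam)

for lam in (0.5, 3.0, 10.0):
    print(lam, sup_weighted_laplace_density(lam), closed_form_sup(lam))
```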

That’s all.