Donsker results for the smoothed empirical process of dependent observations
Eric Beutner, Maastricht University
Joint work with Henryk Zähle
Besancon, May 13
Overview
■ Introduction
■ Extending the benchmark
■ Free lunch? Yes free lunch!
■ One more free lunch? No (too much asked).
Introduction
■ Let $Z_1, \dots, Z_n$ be identically distributed with cdf $F_0$ and density $f_0$, and consider the kernel-based estimator for $f_0$
\[
\hat f_n(z) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} K\!\left(\frac{z - Z_i}{h_n}\right) = \int \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) d\widehat F_n(y),
\]
where $\widehat F_n$ is the empirical distribution function based on $Z_1, \dots, Z_n$ and $h_n$ is the bandwidth.
■ Let the kernel $K$ be symmetric and, for some $C > 0$,
\[
\int K(y)\,dy = 1, \qquad \int y K(y)\,dy = 0, \qquad \text{and} \qquad \int y^2 K(y)\,dy = C.
\]
■ For $f_0$ twice continuously differentiable, $h_n = n^{-1/5}$ is optimal for
\[
\mathrm{MISE}(\hat f_n) = \int E\bigl[(\hat f_n(z) - f_0(z))^2\bigr]\,dz.
\]
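The estimator above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the Gaussian kernel (for which $C = 1$), the simulated standard normal sample, the seed and the names `gaussian_kernel` and `kde` are all assumptions made here.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: symmetric, integral 1, first moment 0, second moment C = 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(z, sample, h, kernel=gaussian_kernel):
    """Evaluate f_hat_n(z) = (1/n) * sum_i (1/h) * K((z - Z_i) / h) for each entry of z."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (z[:, None] - sample[None, :]) / h      # shape (len(z), n)
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)                   # arbitrary seed (assumption)
sample = rng.standard_normal(500)                # illustrative data (assumption)
h = len(sample) ** (-1 / 5)                      # bandwidth h_n = n^(-1/5)

grid = np.linspace(-8.0, 8.0, 4001)
density = kde(grid, sample, h)
total_mass = np.sum(density) * (grid[1] - grid[0])   # Riemann sum, should be close to 1
```

The check on `total_mass` only confirms that the estimate integrates to one, as it must when $\int K(y)\,dy = 1$.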
Introduction (cont’d)
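■ Ideally, we could use this estimate for $f_0$ to estimate, for instance, moments of $F_0$. The second moment plug-in estimate would be
\[
\int z^2 \hat f_n(z)\,dz = \sum_{i=1}^{n} \int z^2 \frac{1}{n h_n} K\!\left(\frac{z - Z_i}{h_n}\right) dz.
\]
■ Substituting $z$ by $z h_n + Z_i$, this equals
\[
\frac{1}{n} \sum_{i=1}^{n} \int z^2 h_n^2 K(z)\,dz + \frac{2}{n} \sum_{i=1}^{n} Z_i h_n \int z K(z)\,dz + \frac{1}{n} \sum_{i=1}^{n} Z_i^2.
\]
■ The difference between this estimate and $\int z^2 f_0(z)\,dz$ equals
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i^2 - \int z^2 f_0(z)\,dz + C h_n^2.
\]
■ $\Rightarrow$ $\int z^2 \hat f_n(z)\,dz$ is not $\sqrt{n}$-consistent for $E[Z_1^2]$, as $\sqrt{n}\,h_n^2 = n^{1/10} \to \infty$.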
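The identity behind this bias term can be checked numerically. The sketch below is an illustration under assumptions that are not in the talk: a Gaussian kernel (so $C = 1$), a simulated standard normal sample, and a plain Riemann sum for the integral. It compares $\int z^2 \hat f_n(z)\,dz$ with $\frac{1}{n}\sum_i Z_i^2 + C h_n^2$ and tabulates $\sqrt{n}\,C h_n^2$ for a few sample sizes.

```python
import numpy as np

rng = np.random.default_rng(1)      # arbitrary seed, for reproducibility only
n = 200
sample = rng.standard_normal(n)     # illustrative data (assumption)
h = n ** (-1 / 5)                   # bandwidth h_n = n^(-1/5)

# Evaluate the Gaussian-kernel density estimate on a wide, fine grid.
grid = np.linspace(-12.0, 12.0, 6001)
dz = grid[1] - grid[0]
u = (grid[:, None] - sample[None, :]) / h
f_hat = (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

# Plug-in second moment by a Riemann sum versus the closed form (C = 1 here).
plug_in = np.sum(grid**2 * f_hat) * dz
closed_form = np.mean(sample**2) + h**2

# The deterministic bias C * h_n^2, multiplied by sqrt(n), grows like n^(1/10).
scaled_bias = {m: np.sqrt(m) * (m ** (-1 / 5)) ** 2 for m in (10**2, 10**4, 10**6)}
```

The two numbers `plug_in` and `closed_form` agree up to quadrature error, while the entries of `scaled_bias` increase with the sample size, which is the failure of $\sqrt{n}$-consistency noted above.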
Introduction (cont’d)
■ Can we find $K$ such that we have $\sqrt{n}$-consistency for the plug-in estimator
\[
\hat f_n \mapsto \int z^2 \hat f_n(z)\,dz,
\]
i.e.
\[
\sqrt{n} \left( \int z^2 \hat f_n(z)\,dz - \int z^2 f_0(z)\,dz \right) = O_P(1),
\]
and MISE-optimality at the same time?
■ More generally, for which functions $g$ and kernels $K$ can we have MISE-optimality and
\[
\sqrt{n} \left( \int g(z) \hat f_K(z)\,dz - \int g(z) f_0(z)\,dz \right) = O_P(1)
\]
at the same time?
Introduction (cont’d)
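■ Compare this last question with what is known for the plug-in estimator based on $\widehat F_n$, where even
\[
\sup_{\tilde g \in \tilde{\mathcal G}} \sqrt{n} \left| \int \tilde g(z)\,d\widehat F_n(z) - \int \tilde g(z)\,dF_0(z) \right| = O_P(1)
\]
holds for several classes of functions $\tilde{\mathcal G}$ (the process even converges weakly).
■ We may therefore ask whether, for some kernels, we can have MISE-optimality and
\[
\sup_{\tilde g \in \tilde{\mathcal G}} \sqrt{n} \left| \int \tilde g(z) \hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz \right| = O_P(1)
\]
for the same classes of functions (or even weak convergence).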
Introduction (cont’d)
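■ Already known: if $Z_1, \dots, Z_n$ are iid then, for instance, for $\tilde{\mathcal G} = BV = \{\tilde g : \mathbb{R} \to \mathbb{R} \mid \text{variation of } \tilde g \le c\}$ we have
\[
\sup_{\tilde g \in BV} \sqrt{n} \left| \int \tilde g(z)\,d\widehat F_n(z) - \int \tilde g(z)\,dF_0(z) \right| = O_P(1).
\]
■ Giné and Nickl (2008) proved that for this class of functions and $f_0$ bounded and $m$-times continuously differentiable one has, with $h_n = n^{-1/(2m+1)}$ (i.e. MISE-optimality),
\[
\sup_{\tilde g \in BV} \sqrt{n} \left| \int \tilde g(z) \hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz \right| = O_P(1),
\]
if $\sqrt{n}\,h_n^{m+k} \to 0$ and if, with $r = 1, \dots, m$ and $k > 1/2$,
\[
\int K(z)\,dz = 1, \qquad \int z^r K(z)\,dz = 0, \qquad \int |z|^{m+k} |K(z)|\,dz < \infty.
\]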
Introduction (cont’d)
■ Giné and Nickl (2008) proved two more results in the same spirit with $\tilde{\mathcal G} \subset C(\mathbb{R})$, where
\[
C(\mathbb{R}) = \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ continuous},\ |f(x)| \le M \text{ for all } x \in \mathbb{R} \}.
\]
■ Clearly, our example from the beginning, $g(z) = z^2$, neither belongs to $C(\mathbb{R})$ nor is of bounded variation on $\mathbb{R}$.
■ First, one might ask whether the above results can be extended, for instance, to the set
\[
BV_{loc} = \{ \tilde g : \mathbb{R} \to \mathbb{R} \mid \tilde g \text{ is locally of bounded variation} \}.
\]
■ Second, one might ask whether we can extend the results to time series settings, where $Z_i$ is ARMA$(p, q)$ or $Z_i$ is GARCH$(p, q)$.
Extending the benchmark
Locally bounded variation: iid
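■ First recall the benchmark for the kernel-based density estimator:
\[
\sup_{\tilde g \in BV} \sqrt{n} \left| \int \tilde g(z)\,d\widehat F_n(z) - \int \tilde g(z)\,dF_0(z) \right| = O_P(1).
\]
■ Hence, we first need to extend this result in two directions:
1. Replace the sup w.r.t. $\tilde g \in BV$ by the sup w.r.t. $\tilde g \in BV_{loc}$;
2. Replace $\widehat F_n$ based on iid $Z_1, \dots, Z_n$ by $\widehat F_n$ based on dependent observations, e.g. $Z_i$ ARMA$(p, q)$ or $Z_i$ GARCH$(p, q)$ (or, more generally, observations satisfying some weak dependence concept).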
Locally bounded variation: iid
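■ Let $\phi : \mathbb{R} \to [1, \infty)$ be a weight function and put
\[
BV_{(1/\phi),\le c} = \left\{ \tilde g \in BV_{loc} : \int \frac{1}{\phi(z)}\,|d\tilde g|(z) \le c \right\}.
\]
■ Note that for $\phi \equiv 1$ we get the functions of bounded variation.
■ If we take $\phi(z) = (1 + |z|)^{2+\epsilon}$, $\epsilon > 0$, then our example from the beginning, $g(z) = z^2$ ($dg(z) = 2z\,dz$), is included in
\[
BV_{(1/(1+|z|)^{2+\epsilon}),\le c} = \left\{ \tilde g \in BV_{loc} : \int \frac{1}{(1 + |z|)^{2+\epsilon}}\,|d\tilde g|(z) \le c \right\}.
\]
■ Theorem: Let $Z_1, \dots, Z_n$ be iid and $\phi$ be a weight function. Then for $F_0$ with $\int \phi^2\,dF_0 < \infty$, we have
\[
\sup_{\tilde g \in BV_{(1/\phi),\le c}} \sqrt{n} \left| \int \tilde g(z)\,d\widehat F_n(z) - \int \tilde g(z)\,dF_0(z) \right| = O_P(1).
\]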
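As a short worked check (added for illustration, not taken from the talk), the weighted variation of the example $g(z) = z^2$ under $\phi(z) = (1 + |z|)^{2+\epsilon}$ can be computed in closed form, using the substitution $w = 1 + z$:

```latex
\[
\int \frac{|dg|(z)}{(1+|z|)^{2+\epsilon}}
= \int_{-\infty}^{\infty} \frac{2|z|}{(1+|z|)^{2+\epsilon}}\,dz
= 4 \int_{1}^{\infty} \bigl( w^{-1-\epsilon} - w^{-2-\epsilon} \bigr)\,dw
= 4 \left( \frac{1}{\epsilon} - \frac{1}{1+\epsilon} \right)
= \frac{4}{\epsilon(1+\epsilon)} .
\]
```

So $g(z) = z^2$ lies in the weighted class as soon as $c \ge 4/(\epsilon(1+\epsilon))$, and the integral diverges for $\epsilon = 0$, which is why the exponent has to exceed 2.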
Dependent data
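■ Theorem: Let $Z_1, \dots, Z_n$ be strictly stationary and $\alpha$-mixing with $\alpha(n) = O(n^{-\theta})$ for some $\theta > 1 + \sqrt{2}$. Let $\phi_\lambda(x) := (1 + |x|)^\lambda$, $\lambda \ge 0$, and assume that $\int_{\mathbb{R}} |x|^\gamma\,dF_0(x) < \infty$, where $\gamma > \frac{2\theta\lambda}{\theta - 1}$. Then we have
\[
\sup_{\tilde g \in BV_{(1/\phi_\lambda),\le c}} \sqrt{n} \left| \int \tilde g(z)\,d\widehat F_n(z) - \int \tilde g(z)\,dF_0(z) \right| = O_P(1).
\]
■ Theorem: Let $Z_t := \sum_{s=0}^{\infty} a_s \varepsilon_{t-s}$, $t \in \mathbb{N}$, with $(\varepsilon_i)_{i \in \mathbb{Z}}$ i.i.d. Assume $a_s = s^{-\beta} \ell(s)$, $\beta \in (\tfrac{1}{2}, 1)$, $s \in \mathbb{N}$. Then $\mathrm{Cov}(Z_0, Z_k) \sim k^{1-2\beta}$, hence the covariances are non-summable, thus long memory.
With $\phi_\lambda$ as above and $E[|\varepsilon_0|^{2+2\lambda}] < \infty$, we have
\[
\sup_{\tilde g \in BV_{(1/\phi_\lambda),\le c}} r_n \left| \int \tilde g(z)\,d\widehat F_n(z) - \int \tilde g(z)\,dF_0(z) \right| = O_P(1),
\]
where $r_n = n^{\beta - 1/2}$.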
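A small worked computation may help to read the two conditions. The values $\theta = 4$ and $\lambda = 5/2$ are chosen here purely for illustration and do not come from the talk:

```latex
\[
\gamma > \frac{2\theta\lambda}{\theta - 1} = \frac{2 \cdot 4 \cdot \tfrac{5}{2}}{3} = \frac{20}{3},
\qquad
\frac{2\theta\lambda}{\theta - 1} \downarrow 2\lambda \quad \text{as } \theta \to \infty,
\]
\[
r_n = n^{\beta - 1/2}, \quad \beta \in (\tfrac{1}{2}, 1)
\;\Longrightarrow\; 0 < \beta - \tfrac{1}{2} < \tfrac{1}{2}
\;\Longrightarrow\; \frac{r_n}{\sqrt{n}} \to 0 .
\]
```

The limiting value $2\lambda$ corresponds to the iid moment condition $\int \phi_\lambda^2\,dF_0 < \infty$, so weaker mixing (smaller $\theta$) is paid for with more moments, and under long memory the rate $r_n$ is strictly slower than $\sqrt{n}$.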
Free lunch? Yes free lunch!
Intro
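■ First note that we have
\[
\begin{aligned}
\int \tilde g(z) \hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz
={}& \iint \tilde g(z) \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) d\widehat F_n(y)\,dz - \iint \tilde g(z) \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) dF_0(y)\,dz \\
&+ \iint \tilde g(z) \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) dF_0(y)\,dz - \int \tilde g(z)\,dF_0(z).
\end{aligned}
\]
■ Rewriting the first line of the right-hand side (the stochastic term) as
\[
\int \underbrace{\int \tilde g(z) \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) dz}_{\bar g_n(y)}\,d\widehat F_n(y) - \int \underbrace{\int \tilde g(z) \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) dz}_{\bar g_n(y)}\,dF_0(y),
\]
we see that the above benchmark result applies if all the $\bar g_n$ belong to $BV_{(1/\phi),\le c}$.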
Intro
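■ Hence, it only remains to consider the deterministic (bias) term
\[
\sup_{\tilde g \in \tilde{\mathcal G}} \left| \iint \tilde g(z) \frac{1}{h_n} K\!\left(\frac{z - y}{h_n}\right) dF_0(y)\,dz - \int \tilde g(z)\,dF_0(z) \right|.
\]
■ Consider this for
\[
\tilde g \in \tilde{\mathcal G} := \{ g_x : \mathbb{R} \to \mathbb{R} \mid g_x(z) = \phi(x) 1_{(-\infty, x)}(z),\ x \le 0, \text{ and } g_x(z) = -\phi(x) 1_{[x, \infty)}(z),\ x > 0 \}.
\]
■ Then the above becomes
\[
\sup_x \phi(x) \left| \int K(z) \bigl( F_0(x + z h_n) - F_0(x) \bigr)\,dz \right|.
\]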
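The size of this weighted bias term can be explored numerically. The sketch below is an illustration under assumptions not made in the talk: $F_0$ is the standard normal cdf, $K$ is the Epanechnikov kernel on $[-1, 1]$, the weight is $\phi(x) = (1 + |x|)^2$, and the supremum is approximated on a finite grid. The function name `weighted_bias` is hypothetical.

```python
import math
import numpy as np

# Standard normal cdf via math.erf, vectorised for arrays (assumed choice of F_0).
norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def weighted_bias(h, lam=2.0):
    """Grid approximation of sup_x phi(x) * | integral of K(z) (F0(x + z h) - F0(x)) dz |."""
    x = np.linspace(-8.0, 8.0, 801)             # grid standing in for the sup over x
    z = np.linspace(-1.0, 1.0, 401)             # support of the Epanechnikov kernel
    dz = z[1] - z[0]
    weights = np.full(z.shape, dz)              # trapezoidal quadrature weights
    weights[0] = weights[-1] = dz / 2.0
    kernel = 0.75 * (1.0 - z**2)                # Epanechnikov kernel values
    diff = norm_cdf(x[:, None] + h * z[None, :]) - norm_cdf(x)[:, None]
    inner = (diff * kernel[None, :]) @ weights  # integral over z for each x
    phi = (1.0 + np.abs(x)) ** lam              # polynomial weight function
    return float(np.max(phi * np.abs(inner)))

bias_coarse = weighted_bias(0.10)
bias_fine = weighted_bias(0.05)
ratio = bias_coarse / bias_fine                 # close to 4 if the term is of order h^2
```

For this second-order kernel, halving the bandwidth divides the weighted bias by roughly four, i.e. the term behaves like $h_n^2$ even after multiplication by the weight.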
Free lunch
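■ If the benchmark result holds for $F_0$ we have
\[
\phi(x) F_0(x) \to 0 \text{ for } x \to -\infty \qquad \text{and} \qquad \phi(x) (1 - F_0(x)) \to 0 \text{ for } x \to \infty.
\]
■ Hence, for $K$ with compact support and $x$ small (or large), $F_0(x + z h_n) - F_0(x)$ will also be small for all $z$ in the support of $K$, even after multiplication by $\phi(x)$.
■ Theorem: We have the following extension of the above result:
\[
\sup_{\tilde g \in BV_{(1/\phi),\le c}} \sqrt{n} \left| \int \tilde g(z) \hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz \right| = O_P(1),
\]
if $\sqrt{n}\,h_n^{m+k} \to 0$, if $\sup_z \phi(z) f_0^{(m)}(z) \le C$, and if, with $r = 1, \dots, m$ and $k > 1/2$,
\[
\int K(z)\,dz = 1, \qquad \int z^r K(z)\,dz = 0, \qquad \int |z|^{m+k} |K(z)|\,dz < \infty.
\]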
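To make the kernel conditions concrete, here is a hedged example that is not from the talk: the polynomial kernel $K(z) = \tfrac{15}{32}(3 - 10 z^2 + 7 z^4)$ on $[-1, 1]$, a commonly used fourth-order kernel. The sketch computes its moments exactly with rational arithmetic; the names `COEFFS` and `kernel_moment` are made up for this illustration.

```python
from fractions import Fraction

# Coefficients of K(z) = (15/32) * (3 - 10 z^2 + 7 z^4) on [-1, 1], keyed by power of z.
COEFFS = {0: Fraction(45, 32), 2: Fraction(-150, 32), 4: Fraction(105, 32)}

def kernel_moment(r):
    """Exact value of the integral of z^r * K(z) over [-1, 1]."""
    total = Fraction(0)
    for power, coeff in COEFFS.items():
        p = power + r
        if p % 2 == 0:                          # odd powers integrate to zero on [-1, 1]
            total += coeff * Fraction(2, p + 1)
    return total

# Moments of order 0 to 4: the kernel integrates to one and moments 1 to 3 vanish.
moments = {r: kernel_moment(r) for r in range(5)}
```

This kernel satisfies $\int z^r K(z)\,dz = 0$ for $r = 1, 2, 3$, so it would match the conditions with $m = 3$, and its compact support makes the absolute moment condition automatic. The requirement $k > 1/2$ is exactly what makes the MISE-type bandwidth admissible: with $h_n = n^{-1/(2m+1)}$ one gets $\sqrt{n}\,h_n^{m+k} = n^{1/2 - (m+k)/(2m+1)}$, which tends to zero if and only if $k > 1/2$.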
One more free lunch? No (too much asked).
Intro
■ Now consider $K$ non-compact (i.e. without compact support). Then the above reasoning that $F_0(x + z h_n) - F_0(x)$ will be small if $F_0(x)$ is small does not apply anymore (we can make the first term (almost) equal to 1 by making $z$ large).
■ Yet, intuitively, if $K$ does not put too much weight on these $z$, then
\[
\int K(z) \bigl( F_0(x + z h_n) - F_0(x) \bigr)\,dz
\]
should still be small if $x$ is small.
■ Impose the following: $f_0$ is $m$-times continuously differentiable and for all $t \in [0, 1]$ and all $y \in \mathbb{R}$ we have
\[
\sup_x |\phi(x) f_0^{(m)}(x + t y)| \le L_2\,|y|^{p(y)},
\]
where $p$ is a bounded function.
Too much asked
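■ Theorem: Let $f_0$ be as above and $K$ be non-compact. Then
\[
\sup_{\tilde g \in BV_{(1/\phi),\le c}} \sqrt{n} \left| \int \tilde g(z) \hat f_K(z)\,dz - \int \tilde g(z) f_0(z)\,dz \right| = O_P(1),
\]
if $\sqrt{n}\,h_n^{m+s} \to 0$, and if, with $r = 1, \dots, \lfloor m + s \rfloor$,
\[
\int K(z)\,dz = 1, \qquad \int z^r K(z)\,dz = 0, \qquad \int |z|^{m+s} |K(z)|\,dz < \infty,
\]
where $s = \sup_y p(y)$.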
Example
■ $K$ compact: Then for $f_0$ the density of the double exponential distribution we clearly have that
\[
\sup_z \phi(z) f_0(z)
\]
is finite for any polynomial weight function.
■ $K$ non-compact and same $f_0$: Then for a polynomial weight of the form $(1 + |z|)^\lambda$ the above condition holds with $p > \lambda$.
■ Thus, we have to increase the order of the kernel beyond what is needed by the smoothness, and that increase is a function of the weight.
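A rough sketch of why the exponent scales with the weight, written here for $f_0$ itself rather than for its $m$-th derivative and therefore only a heuristic, not the argument of the talk: for the double exponential density $f_0(u) = \tfrac{1}{2} e^{-|u|}$, the weight $\phi_\lambda(x) = (1 + |x|)^\lambda$ and $t \in [0, 1]$,

```latex
\[
1 + |x| \le 1 + |x + t y| + |t y| \le (1 + |x + t y|)(1 + |y|),
\]
\[
\sup_x \phi_\lambda(x) f_0(x + t y)
\le (1 + |y|)^{\lambda} \, \sup_u \tfrac{1}{2} (1 + |u|)^{\lambda} e^{-|u|} < \infty .
\]
```

The bound grows polynomially in $|y|$ with exponent $\lambda$, which is in line with the statement that the condition holds with $p > \lambda$: a heavier weight forces more kernel moments.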
That’s all.