Adaptation via $L_p$-norm in the regression model.
NGUYEN Ngoc Bien
Laboratoire d’Analyse, Topologie et Probabilités, Université de Provence
CIRM, April 16, 2012
NGUYEN Ngoc Bien Adaptive estimation
Outline
1 Introduction
Model
Conditions
Collection of estimators and the selection rule
2 Results
3 Adaptation
Anisotropic Hölder classes
Adaptation
Introduction.
Regression model
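Let $Y_i = f(X_i) + \xi_i$, $i = 1, \dots, n$, where:
1 $(X_i, Y_i)$ are the observations.
2 The $X_i$ are i.i.d. and uniformly distributed on $[0,1]^d$.
3 The $\xi_i$ are i.i.d. and independent of $(X_i)$.
4 $f \in \mathcal{F} := \{ g : [0,1]^d \to \mathbb{R},\ \|g\|_\infty \le f_\infty \}$.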
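A minimal sketch of how data from this model could be simulated. The regression function `f_example` and the bounded noise law are hypothetical choices made only for illustration; the talk does not specify either.

```python
import random

def simulate_regression(f, n, d, noise, seed=0):
    """Draw (X_i, Y_i), i = 1..n, with X_i uniform on [0,1]^d and Y_i = f(X_i) + xi_i."""
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(d)] for _ in range(n)]
    ys = [f(x) + noise(rng) for x in xs]
    return xs, ys

# Hypothetical regression function, bounded by f_infty = 0.75 on [0,1]^2.
def f_example(x):
    return 0.5 * x[0] + 0.25 * x[1]

# Hypothetical noise: i.i.d. uniform on [-1, 1], drawn independently of the design.
def uniform_noise(rng):
    return rng.uniform(-1.0, 1.0)

xs, ys = simulate_regression(f_example, n=200, d=2, noise=uniform_noise)
```

Bounded noise is used here because it has both an exponentially decaying tail and finite moments of every order.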
Statistical models. Regression.
Our goal: estimate the function $f$ from the observations $(X_i, Y_i)$, $i = 1, \dots, n$.
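Risk:
$$R_s^{(q)}(\hat f, f) := \left( E_f \|\hat f - f\|_{s,\nu}^q \right)^{1/q}, \qquad R_s^{(q)}(\hat f, \mathcal{F}) := \sup_{f \in \mathcal{F}} \left( E_f \|\hat f - f\|_{s,\nu}^q \right)^{1/q},$$
where the measure is $\nu(dx) = \mathbf{1}_{[\delta, 1-\delta]^d}(x)\,dx$ with $0 < \delta < 1/2$ given.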
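The loss inside the risk is an $L_s$-norm restricted to the cube $[\delta, 1-\delta]^d$. As a rough sketch for $d = 1$ only, it can be approximated by a midpoint rule; the function name `ls_nu_norm` and the number of grid cells `m` are illustrative assumptions.

```python
def ls_nu_norm(g, s, delta, m=1000):
    """Midpoint-rule approximation of (int_delta^{1-delta} |g(x)|^s dx)^(1/s), for d = 1."""
    width = (1.0 - 2.0 * delta) / m
    total = 0.0
    for j in range(m):
        x = delta + (j + 0.5) * width  # midpoint of the j-th cell
        total += abs(g(x)) ** s * width
    return total ** (1.0 / s)
```

Applied to `g = lambda x: f_hat(x) - f(x)` for some estimate `f_hat`, this gives the loss for one sample; averaging its $q$-th power over simulated samples and taking the $1/q$-th power would approximate the risk.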
Statistical models. Regression.
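We need an assumption on the noise. Assumption N (one of the following two conditions, referred to as N1 and N2):
1 There exist $c > 0$ and $\alpha > 0$ such that $P\{|\xi_1| > x\} \le c \exp\{-x^\alpha\}$ for all $x > 0$.
2 There exist $p > 1$ and $P > 0$ such that $E|\xi_1|^p \le P$.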
Collection of estimators.
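Linear estimator.
$$\hat f_h(t) := \frac{1}{n} \sum_{i=1}^n K_h(X_i - t)\, Y_i, \quad h \in \mathcal{H},$$
where
$h := (h_1, \dots, h_d)$ is the bandwidth, $K : \mathbb{R}^d \to \mathbb{R}$ is the kernel,
$K_h(\cdot) := V_h^{-1} K(\cdot / h)$, $V_h := h_1 \cdots h_d$, and for $x, h \in \mathbb{R}^d$, $x/h := (x_1/h_1, \dots, x_d/h_d)$,
$$\mathcal{H} := \{ h \in [h_{\min}, 1]^d : V_h \le V_{\max} \},$$
where $0 < h_{\min} < 1$ and $V_{\max} > 0$.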
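The estimator and the bandwidth set above can be sketched as follows. The product triangular kernel and the finite geometric grid standing in for the continuum of bandwidths are assumptions made for illustration; the talk fixes neither.

```python
import itertools
import math

def triangular_kernel(u):
    """Hypothetical kernel: product of 1-D triangles on [-1/2, 1/2], each integrating to 1."""
    value = 1.0
    for c in u:
        value *= max(0.0, 2.0 - 4.0 * abs(c))
    return value

def kernel_estimate(t, xs, ys, h, kernel=triangular_kernel):
    """f_hat_h(t) = (1/n) * sum_i K_h(X_i - t) * Y_i, with K_h(u) = K(u / h) / V_h."""
    v_h = math.prod(h)  # V_h = h_1 * ... * h_d
    total = 0.0
    for x, y in zip(xs, ys):
        u = [(xj - tj) / hj for xj, tj, hj in zip(x, t, h)]
        total += kernel(u) * y
    return total / (len(xs) * v_h)

def bandwidth_grid(h_min, v_max, d, levels):
    """Finite geometric grid in [h_min, 1]^d (levels >= 2), keeping only h with V_h <= v_max."""
    side = [h_min ** (j / (levels - 1)) for j in range(levels)]
    return [h for h in itertools.product(side, repeat=d) if math.prod(h) <= v_max]
```

A geometric grid is a common discretisation because the stochastic term depends on $h$ only through $V_h$, which it covers evenly on a logarithmic scale.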
Assumptions.
We need an assumption on the kernel $K$. Assumption K
1 $\operatorname{supp} K \subset [-1/2, 1/2]^d$.
2 There exists $k > 0$ such that, for all $x, y \in \mathbb{R}^d$, $|K(x) - K(y)| \le k \|x - y\|$, where $\|\cdot\|$ is the Euclidean norm.
3 $\int K(x)\,dx = 1$.
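As an illustration of Assumption K, one possible kernel (an example chosen here, not one specified in the talk) is the product triangular kernel. A short check of the three conditions:

```latex
K(x) := \prod_{j=1}^{d} 2\,\bigl(1 - 2|x_j|\bigr)_+ , \qquad x = (x_1, \dots, x_d) \in \mathbb{R}^d .

% Condition 1: each factor vanishes when |x_j| \ge 1/2, so
\operatorname{supp} K \subset [-1/2, 1/2]^d .

% Condition 3: the one-dimensional factor integrates to one, so
\int_{-1/2}^{1/2} 2\,(1 - 2|u|)\,du = 1
\quad\Longrightarrow\quad \int_{\mathbb{R}^d} K(x)\,dx = 1 .

% Condition 2: each factor is bounded by 2 and is 4-Lipschitz, so
|K(x) - K(y)| \le 2^{d-1} \cdot 4 \sum_{j=1}^{d} |x_j - y_j|
\le 2^{d+1} \sqrt{d}\, \|x - y\| ,
\quad\text{i.e. one may take } k = 2^{d+1}\sqrt{d} .
```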
Selection rule.
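Put
$$\hat f_{h,\eta}(t) := \frac{1}{n} \sum_{i=1}^n (K_h * K_\eta - K_h)(X_i - t)\, Y_i, \quad h, \eta \in \mathcal{H},$$
where $*$ denotes convolution.
Selection rule
$$\hat R_h := \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_{\eta,h} - \hat f_\eta\|_{s,\nu} - C \frac{1}{\sqrt{n V_\eta}} \right]_+ + C \frac{1}{\sqrt{n V_h}},$$
$$\hat h := \arg\inf_{h \in \mathcal{H}} \hat R_h, \qquad \hat f := \hat f_{\hat h},$$
where $[a]_+ := \max(a, 0)$, $C := C(k, d, s, q, \alpha, f_\infty)$ under N1, and $C := C(k, d, s, q, p, P, f_\infty)$ under N2.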
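The selection rule can be sketched on a finite grid of bandwidths as below. This is an illustration under several assumptions: the distances $\|\hat f_{\eta,h} - \hat f_\eta\|_{s,\nu}$ are taken as precomputed inputs, the constant $C$ is supplied by the user (the talk gives only its dependence on the parameters, not its value), and the term subtracted inside the supremum is taken at $\eta$, as in the proof of Theorems 1 and 2. The function name `select_bandwidth` is hypothetical.

```python
import math

def select_bandwidth(volumes, dist, n, c):
    """Return (index of the selected bandwidth, list of R_hat values) on a finite grid.

    volumes[j] : V_h for the j-th bandwidth of the grid
    dist[e][j] : || f_hat_{eta_e, h_j} - f_hat_{eta_e} ||_{s,nu}, computed elsewhere
    c          : the constant C of the rule, treated here as a tuning input
    """
    m = len(volumes)
    penalty = [c / math.sqrt(n * v) for v in volumes]
    r_hat = []
    for j in range(m):
        # sup over eta of the positive part [dist - C / sqrt(n V_eta)]_+
        excess = max(max(dist[e][j] - penalty[e], 0.0) for e in range(m))
        r_hat.append(excess + penalty[j])
    best = min(range(m), key=lambda j: r_hat[j])
    return best, r_hat
```

When all distances fall below their thresholds, the rule simply picks the bandwidth with the largest volume, that is, the smallest stochastic term.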
Selection rule
Remark
Since $\mathcal{H}$ is compact and $\hat f_h(\cdot)$ is continuous a.s., $\hat h$ is measurable.
Results.
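Theorem 1 If Assumption N1 holds, then for all $f \in \mathcal{F}$,
$$R_s^{(q)}(\hat f, f) \le (1 + 2k) \inf_{h \in \mathcal{H}} \left[ R_s^{(q)}(\hat f_h, f) + C_1 \frac{1}{\sqrt{n V_h}} \right] + C_2\, n \ln^{3d}(h_{\min}^{-1}) \exp\left\{ -\frac{1}{4q} V_{\max}^{-\frac{\alpha}{s(\alpha+1)}} \right\},$$
where $C_1, C_2$ depend only on $f_\infty, k, s, \alpha, d$ and $q$.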
Results.
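Theorem 2 If Assumption N2 holds, then for all $f \in \mathcal{F}$,
$$R_s^{(q)}(\hat f, f) \le (1 + 2k) \inf_{h \in \mathcal{H}} \left[ R_s^{(q)}(\hat f_h, f) + C_3 \frac{1}{\sqrt{n V_h}} \right] + C_4\, n \ln^{3d}(h_{\min}^{-1})\, V_{\max}^{p/(3qs)},$$
where $C_3, C_4$ depend only on $f_\infty, k, s, p, P, d$ and $q$.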
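Adaptation: anisotropic Hölder classes
Definition
Let $\beta = (\beta_1, \dots, \beta_d)$ with $\beta_i > 0$, and let $L > 0$. We say that a function $f : \mathbb{R}^d \to \mathbb{R}$ belongs to the anisotropic Hölder class $H_d(\beta, L)$ if, for all $i = 1, \dots, d$ and all $t \in \mathbb{R}$,
$$\sup_{(x_1, \dots, x_d) \in \mathbb{R}^d} \left| D_i^{\lfloor \beta_i \rfloor} f(x_1, \dots, x_i + t, \dots, x_d) - D_i^{\lfloor \beta_i \rfloor} f(x_1, \dots, x_i, \dots, x_d) \right| \le L |t|^{\beta_i - \lfloor \beta_i \rfloor}.$$
Here $D_i^k f$ denotes the $k$th order partial derivative of $f$ with respect to the variable $x_i$, and $\lfloor t \rfloor$ is the largest integer strictly less than $t$.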
Adaptation.
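We also define $\varphi_n(\beta) = n^{-\bar\beta/(2\bar\beta + 1)}$, where $1/\bar\beta = \sum_{i=1}^d 1/\beta_i$, and
$$\mathbb{H} := \{ H_d(\beta, L) : 0 < \beta_i < l,\ i = 1, \dots, d,\ L > 0 \},$$
where $l > 0$ is fixed.
Additional assumption on $K$:
$$\int_{\mathbb{R}^d} K(t)\, t^k\, dt = 0 \quad \forall\, |k| = 1, \dots, \lfloor l \rfloor - 1,$$
where $k = (k_1, \dots, k_d)$ is a multi-index, $|k| = k_1 + \cdots + k_d$, and $t^k = t_1^{k_1} \cdots t_d^{k_d}$ for $t = (t_1, \dots, t_d)$.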
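To make the rate defined on this slide concrete, here is a small numerical sketch of the effective smoothness $\bar\beta$ and of $n^{-\bar\beta/(2\bar\beta+1)}$. The function names are illustrative, not from the talk.

```python
def effective_smoothness(beta):
    """beta_bar, defined by 1 / beta_bar = sum_i 1 / beta_i."""
    return 1.0 / sum(1.0 / b for b in beta)

def minimax_rate(n, beta):
    """phi_n(beta) = n ** (-beta_bar / (2 * beta_bar + 1))."""
    b = effective_smoothness(beta)
    return n ** (-b / (2.0 * b + 1.0))
```

For example, with $d = 2$ and $\beta = (1, 1)$ one gets $\bar\beta = 1/2$ and a rate of $n^{-1/4}$, whereas $d = 1$ and $\beta = 2$ gives the familiar $n^{-2/5}$. The rate deteriorates as the dimension grows or as any single $\beta_i$ shrinks.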
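Adaptation. Theorem
We set $h_{\min} = 1/n$ and $V_{\max} = n^{-d/(2l+d)}$. Then we have:
Theorem 3
For all $s > 1$ and all $H_d(\beta, L) \in \mathbb{H}$, assuming in addition that $p > 9qs(l + 1/2)$ under N2,
$$\limsup_{n \to \infty} \left[ \varphi_n^{-1}(\beta)\, R_s^{(q)}(\hat f, H_d(\beta, L)) \right] < +\infty.$$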
Remark
It is well known that $\varphi_n(\beta)$ is the minimax rate over the function class $H_d(\beta, L)$. The preceding theorem therefore shows that the estimator $\hat f$ is adaptive over the collection $\mathbb{H}$.
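A heuristic calculation (a sketch with constants omitted, not the argument of the talk) indicates why the choices $h_{\min} = 1/n$ and $V_{\max} = n^{-d/(2l+d)}$ are compatible with this rate. Assuming a bias of order $\sum_i L h_i^{\beta_i}$ and balancing it against the stochastic term $1/\sqrt{n V_h}$ of the oracle inequalities gives a bandwidth $h^*$ that belongs to the bandwidth set:

```latex
\begin{aligned}
& h_i^{*} := \varphi_n(\beta)^{1/\beta_i}, \quad i = 1, \dots, d
  \;\Longrightarrow\;
  (h_i^{*})^{\beta_i} = \varphi_n(\beta), \quad
  V_{h^*} = \varphi_n(\beta)^{1/\bar\beta} = n^{-1/(2\bar\beta+1)}, \\[2pt]
& \frac{1}{\sqrt{n V_{h^*}}}
  = n^{-\frac{1}{2}\left(1 - \frac{1}{2\bar\beta+1}\right)}
  = n^{-\bar\beta/(2\bar\beta+1)} = \varphi_n(\beta), \\[2pt]
& \beta_i < l \ \ \forall i
  \;\Longrightarrow\; \bar\beta\, d < l
  \;\Longrightarrow\; \frac{1}{2\bar\beta+1} > \frac{d}{2l+d}
  \;\Longrightarrow\; V_{h^*} < n^{-d/(2l+d)} = V_{\max}, \\[2pt]
& \bar\beta \le \beta_i
  \;\Longrightarrow\; h_i^{*} = n^{-\bar\beta/((2\bar\beta+1)\beta_i)} \ge n^{-1} = h_{\min}.
\end{aligned}
```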
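Proof of Theorems 1 and 2
For any $h \in \mathcal{H}$,
$$\begin{aligned}
\|\hat f - f\|_s
&\le \|\hat f_{\hat h} - \hat f_{\hat h, h}\|_s + \|\hat f_{\hat h, h} - \hat f_h\|_s + \|\hat f_h - f\|_s \\
&\le \|\hat f_h - f\|_s
 + \left[ \|\hat f_{\hat h} - \hat f_{\hat h, h}\|_s - C \frac{1}{\sqrt{n V_{\hat h}}} \right]_+ + C \frac{1}{\sqrt{n V_{\hat h}}}
 + \left[ \|\hat f_{h, \hat h} - \hat f_h\|_s - C \frac{1}{\sqrt{n V_h}} \right]_+ + C \frac{1}{\sqrt{n V_h}} \\
&\le \|\hat f_h - f\|_s
 + \left\{ \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_\eta - \hat f_{\eta, h}\|_s - C \frac{1}{\sqrt{n V_\eta}} \right]_+ + C \frac{1}{\sqrt{n V_h}} \right\}
 + \left\{ \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_\eta - \hat f_{\eta, \hat h}\|_s - C \frac{1}{\sqrt{n V_\eta}} \right]_+ + C \frac{1}{\sqrt{n V_{\hat h}}} \right\} \\
&= \|\hat f_h - f\|_s + \hat R_h + \hat R_{\hat h} \\
&\le \|\hat f_h - f\|_s + 2 \hat R_h.
\end{aligned}$$
The last inequality uses $\hat R_{\hat h} \le \hat R_h$, which holds by the definition of $\hat h$.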
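Proof of Theorems 1 and 2
To bound $\hat R_h$ we have to bound
$$M_{s,h}(f) := \left( E_f \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_\eta - \hat f_{\eta,h}\|_s - C \frac{1}{\sqrt{n V_\eta}} \right]_+^q \right)^{1/q}.$$
Writing
$$\hat f_{\eta,h}(t) - \hat f_\eta(t) = \text{bias} + \text{stochastic error} =: A_{h,\eta}(t) + B_{h,\eta}(t),$$
the hardest part is to bound
$$M_h^{(1)}(f) := \left( E_f \sup_{\eta \in \mathcal{H}} \left[ \|B_{h,\eta}\|_s - C^{(1)} \frac{1}{\sqrt{n V_\eta}} \right]_+^q \right)^{1/q}.$$
The main technical tools used in our derivations are uniform bounds on $L_p$-norms of empirical processes developed by Goldenshluger and Lepski [2010].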
References.
A. Goldenshluger and O. Lepski: Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. To appear in Ann. Stat.
A. Goldenshluger and O. Lepski: Structural adaptation via $L_p$-norm oracle inequalities. Probab. Theory Relat. Fields 143, 41-71.
A. Goldenshluger and O. Lepski: Uniform bounds for norms of independent random functions. Ann. Probab. 39, 2318-2384.
A. Goldenshluger and O. Lepski: Universal estimation routines in nonparametric statistics. Manuscript.
G. Kerkyacharian, O. Lepski and D. Picard: Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Relat. Fields 121, 137-170.
O. Lepski and B. Y. Levit: Universal pointwise selection rule in multivariate function estimation. Bernoulli 14, 1150-1190.