Adaptation via Lp

(1)

Adaptation via $L_p$-norm in the regression model.

NGUYEN Ngoc Bien

Laboratoire d'Analyse, Topologie et Probabilités, Université de Provence

CIRM, April 16, 2012

NGUYEN Ngoc Bien Adaptive estimation

(2)

Outline

1. Introduction
   - Model
   - Conditions
   - Collection of estimators and the selection rule

2. Results

3. Adaptation
   - Anisotropic Hölder classes
   - Adaptation

(3)

Introduction.

Regression model

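Let $Y_i = f(X_i) + \xi_i$, $i = 1, \dots, n$, where:

1. $(X_i, Y_i)$ are the observations.

2. The $X_i$ are i.i.d. and uniformly distributed on $[0,1]^d$.

3. The $\xi_i$ are i.i.d. and independent of the $X_i$.

4. $f \in \mathcal{F} := \{ g : [0,1]^d \to \mathbb{R},\ \|g\|_\infty \le f_\infty \}$.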
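To make the model concrete, here is a minimal simulation sketch. The regression function `f_example`, the Gaussian noise, and the values of `n`, `d` and `noise_sd` are illustrative assumptions, not choices made in the talk.

```python
import math
import random


def simulate(n, d, f, noise_sd=0.5, seed=0):
    """Draw (X_i, Y_i), i = 1..n, with X_i uniform on [0,1]^d and Y_i = f(X_i) + xi_i."""
    rng = random.Random(seed)
    xs = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    # Gaussian noise drawn independently of the design points (illustrative choice).
    ys = [f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def f_example(x):
    """Hypothetical regression function; its sup-norm is at most 1."""
    return math.prod(math.sin(2.0 * math.pi * xj) for xj in x)


xs, ys = simulate(n=200, d=2, f=f_example)
print(len(xs), len(ys))
```

Fixing the seed makes the sample reproducible, which is convenient when comparing estimators on the same data.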

(4)

Statistical models. Regression.

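Our goal: estimate the function $f$ from the observations $(X_i, Y_i)$, $i = 1, \dots, n$.

Risk:

$$R^q_s(\hat f, f) := \left( \mathbb{E}_f \|\hat f - f\|_{s,\nu}^q \right)^{1/q}$$

$$R^q_s(\hat f, \mathcal{F}) := \sup_{f \in \mathcal{F}} \left( \mathbb{E}_f \|\hat f - f\|_{s,\nu}^q \right)^{1/q}$$

where the measure is $\nu(dx) = \mathbf{1}_{[\delta, 1-\delta]^d}(x)\,dx$, with $0 < \delta < 1/2$ given.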
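The loss inside the risk is the $L_s$-norm with respect to $\nu$, that is, an integral over the inner cube $[\delta, 1-\delta]^d$ only. A rough numerical sketch, assuming a midpoint rule on a regular grid (the function name `norm_s_nu` and the grid size `m` are hypothetical):

```python
import itertools


def norm_s_nu(g, d, s, delta, m=50):
    """Approximate (integral over [delta, 1-delta]^d of |g(x)|^s dx)^(1/s) by a midpoint rule."""
    width = (1.0 - 2.0 * delta) / m
    # Midpoints of the m cells in each coordinate of the inner cube.
    pts = [delta + (j + 0.5) * width for j in range(m)]
    total = sum(abs(g(x)) ** s for x in itertools.product(pts, repeat=d))
    return (total * width ** d) ** (1.0 / s)


# For g = 1 the norm is the nu-measure of the cube raised to the power 1/s.
print(norm_s_nu(lambda x: 1.0, d=2, s=2, delta=0.1))
```

The grid has $m^d$ points, so this is only practical for small $d$; Monte Carlo integration would be the usual substitute in higher dimension.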

(5)

Statistical models. Regression.

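We need an assumption on the noise. Assumption N: one of the following holds.

N1. There exist $c > 0$ and $\alpha > 0$ such that $\mathbb{P}\{|\xi_1| > x\} \le c \exp\{-x^\alpha\}$ for all $x > 0$.

N2. There exist $p > 1$ and $P > 0$ such that $\mathbb{E}|\xi_1|^p \le P$.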

(6)

Collection of estimators.

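Linear estimators.

$$\hat f_h(t) := \frac{1}{n} \sum_{i=1}^n K_h(X_i - t)\, Y_i, \quad h \in \mathcal{H},$$

where

$h := (h_1, \dots, h_d)$ is the bandwidth, $K : \mathbb{R}^d \to \mathbb{R}$ is the kernel,

$K_h(\cdot) := V_h^{-1} K(\cdot / h)$, $V_h := h_1 \cdots h_d$, and for $x, h \in \mathbb{R}^d$ we write $x/h := (x_1/h_1, \dots, x_d/h_d)$,

$$\mathcal{H} := \left\{ h \in [h_{\min}, 1]^d : V_h \le V_{\max} \right\}, \quad \text{where } 0 < h_{\min} < 1 \text{ and } V_{\max} > 0.$$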
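The estimator $\hat f_h$ can be sketched directly from its definition. The product triangular kernel below is an assumption made for illustration: it is supported on $[-1/2, 1/2]^d$, Lipschitz, and integrates to 1, but the talk does not specify a kernel. The function names are hypothetical.

```python
import math


def triangle(u):
    """One-dimensional triangular density supported on [-1/2, 1/2]."""
    return max(0.0, 2.0 * (1.0 - 2.0 * abs(u)))


def K(x):
    """Product kernel on R^d (illustrative choice)."""
    return math.prod(triangle(u) for u in x)


def K_h(x, h):
    """Rescaled kernel K_h(x) = K(x / h) / V_h with V_h = h_1 * ... * h_d."""
    V_h = math.prod(h)
    return K([xj / hj for xj, hj in zip(x, h)]) / V_h


def f_hat(t, h, xs, ys):
    """Linear kernel estimator: (1/n) * sum_i K_h(X_i - t) * Y_i."""
    n = len(xs)
    total = 0.0
    for xi, yi in zip(xs, ys):
        diff = [a - b for a, b in zip(xi, t)]
        total += K_h(diff, h) * yi
    return total / n


# Tiny example: one observation located exactly at the evaluation point.
print(f_hat((0.5,), (0.5,), [(0.5,)], [3.0]))
```

Because the design is uniform on $[0,1]^d$, no division by a density estimate is needed, which is why the estimator is a plain weighted sum of the $Y_i$.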

(7)

Assumptions.

We need an assumption on the kernel $K$. Assumption K:

1. $\operatorname{supp} K \subset [-1/2, 1/2]^d$.

2. There exists $k > 0$ such that, for all $x, y \in \mathbb{R}^d$,

$|K(x) - K(y)| \le k \|x - y\|$, where $\|\cdot\|$ is the Euclidean norm.

3. $\int K(x)\,dx = 1$.

(8)

Selection rule.

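For $h, \eta \in \mathcal{H}$, put

$$\hat f_{h,\eta}(t) := \frac{1}{n} \sum_{i=1}^n (K_h * K_\eta - K_h)(X_i - t)\, Y_i,$$

where $*$ denotes convolution.

Selection rule:

$$\hat R_h := \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_{\eta,h} - \hat f_\eta\|_{s,\nu} - \frac{C}{\sqrt{n V_\eta}} \right]_+ + \frac{C}{\sqrt{n V_h}}, \qquad \hat h := \arg\inf_{h \in \mathcal{H}} \hat R_h, \qquad \hat f := \hat f_{\hat h},$$

where $C := C(k, d, s, q, \alpha, f_\infty)$ under N1, and $C := C(k, d, s, q, p, P, f_\infty)$ under N2.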
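The selection rule can be sketched over a finite grid of bandwidths. This is a simplification under stated assumptions: the set of bandwidths is replaced by a finite list, the caller supplies `dist(eta, h)` standing for $\|\hat f_{\eta,h} - \hat f_\eta\|_{s,\nu}$, the threshold inside the supremum is taken at $\eta$ (as in the proof slides), and `C` is treated as a tuning constant because its theoretical value is not given explicitly. All function names are hypothetical.

```python
import math


def volume(h):
    """V_h = h_1 * ... * h_d."""
    return math.prod(h)


def r_hat(h, grid, dist, C, n):
    """R_hat(h) = max over eta of [dist(eta, h) - C/sqrt(n V_eta)]_+  +  C/sqrt(n V_h)."""
    def threshold(b):
        return C / math.sqrt(n * volume(b))

    sup_term = max(max(0.0, dist(eta, h) - threshold(eta)) for eta in grid)
    return sup_term + threshold(h)


def select_bandwidth(grid, dist, C, n):
    """Return the bandwidth in the finite grid that minimises R_hat."""
    return min(grid, key=lambda h: r_hat(h, grid, dist, C, n))


# Toy illustration: a made-up discrepancy that grows with the bandwidth.
grid = [(0.01,), (0.1,), (1.0,)]
print(select_bandwidth(grid, dist=lambda eta, h: 10.0 * h[0], C=1.0, n=100))
```

The first term acts as a data-driven proxy for the bias of $\hat f_h$ and the second as a bound on its stochastic error, so minimising their sum mimics the bias-variance trade-off without knowing the smoothness of $f$.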

(9)

Selection rule

Remark

Because $\mathcal{H}$ is compact and $\hat f_h(\cdot)$ is continuous a.s., $\hat h$ is measurable.

(10)

Results.

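Theorem 1. If assumption N1 holds, then for all $f \in \mathcal{F}$,

$$R^q_s(\hat f, f) \le (1 + 2k) \inf_{h \in \mathcal{H}} \left\{ R^q_s(\hat f_h, f) + \frac{C_1}{\sqrt{n V_h}} \right\} + C_2\, n \ln^{3d}(h_{\min}^{-1}) \exp\left\{ -\frac{1}{4q} V_{\max}^{-\alpha/(s(\alpha+1))} \right\},$$

where $C_1, C_2$ depend only on $f_\infty, k, s, \alpha, d$ and $q$.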

(11)

Results.

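Theorem 2. If assumption N2 holds, then for all $f \in \mathcal{F}$,

$$R^q_s(\hat f, f) \le (1 + 2k) \inf_{h \in \mathcal{H}} \left\{ R^q_s(\hat f_h, f) + \frac{C_3}{\sqrt{n V_h}} \right\} + C_4\, n \ln^{3d}(h_{\min}^{-1})\, V_{\max}^{p/(3qs)},$$

where $C_3, C_4$ depend only on $f_\infty, k, s, p, P, d$ and $q$.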

(12)

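Adaptation: anisotropic Hölder classes

Definition

Let $\beta = (\beta_1, \dots, \beta_d)$ with $\beta_i > 0$, and let $L > 0$. We say that a function $f : \mathbb{R}^d \to \mathbb{R}$ belongs to the anisotropic Hölder class $\mathbb{H}_d(\beta, L)$ if, for all $i = 1, \dots, d$ and all $t \in \mathbb{R}$,

$$\sup_{(x_1, \dots, x_d) \in \mathbb{R}^d} \left| D_i^{\lfloor \beta_i \rfloor} f(x_1, \dots, x_i + t, \dots, x_d) - D_i^{\lfloor \beta_i \rfloor} f(x_1, \dots, x_i, \dots, x_d) \right| \le L |t|^{\beta_i - \lfloor \beta_i \rfloor}.$$

Here $D_i^k f$ denotes the $k$th order partial derivative of $f$ with respect to the variable $x_i$, and $\lfloor t \rfloor$ is the largest integer strictly less than $t$.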

(13)

Adaptation.

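We also define $\varphi_n(\beta) := n^{-\bar\beta/(2\bar\beta + 1)}$, where $1/\bar\beta = \sum_{i=1}^d 1/\beta_i$, and

$$\mathfrak{H} := \left\{ \mathbb{H}_d(\beta, L) : 0 < \beta_i < l,\ i = 1, \dots, d,\ L > 0 \right\}, \quad \text{where } l > 0 \text{ is fixed.}$$

Additional assumption on $K$:

$$\int_{\mathbb{R}^d} K(t)\, t^k\, dt = 0 \quad \forall\, |k| = 1, \dots, \lfloor l \rfloor - 1,$$

where $k = (k_1, \dots, k_d)$ is a multi-index, $|k| = k_1 + \cdots + k_d$, and $t^k = t_1^{k_1} \cdots t_d^{k_d}$ for $t = (t_1, \dots, t_d)$.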
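As a quick numerical illustration of the rate $\varphi_n(\beta)$, the sketch below computes the effective smoothness $\bar\beta$ and the resulting rate. The function names and the sample values of `n` and `beta` are hypothetical.

```python
def effective_smoothness(beta):
    """beta_bar defined by 1 / beta_bar = sum_i 1 / beta_i."""
    return 1.0 / sum(1.0 / b for b in beta)


def rate(n, beta):
    """phi_n(beta) = n^(-beta_bar / (2 * beta_bar + 1))."""
    bbar = effective_smoothness(beta)
    return n ** (-bbar / (2.0 * bbar + 1.0))


# Same sample size, different smoothness profiles in d = 2.
for beta in [(2.0, 2.0), (1.0, 4.0), (0.5, 0.5)]:
    print(beta, effective_smoothness(beta), rate(1000, beta))
```

In the isotropic case $\beta_1 = \cdots = \beta_d = b$ one gets $\bar\beta = b/d$, so the rate reduces to the familiar $n^{-b/(2b+d)}$.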

(14)

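Adaptation. Theorem

We set $h_{\min} = 1/n$ and $V_{\max} = n^{-d/(2l+d)}$. Then we have:

Theorem 3. For all $s > 1$ and all $\mathbb{H}_d(\beta, L) \in \mathfrak{H}$, and assuming $p > 9qs(l + 1/2)$ under N2,

$$\limsup_{n \to \infty} \left[ \varphi_n^{-1}(\beta)\, R^q_s\big(\hat f, \mathbb{H}_d(\beta, L)\big) \right] < +\infty.$$

Remark

It is well known that $\varphi_n(\beta)$ is the minimax rate over the function class $\mathbb{H}_d(\beta, L)$. The preceding theorem therefore shows that the estimator $\hat f$ is adaptive over the collection $\mathfrak{H}$.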

(15)

Proof of theorem 1 and 2

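For any fixed $h \in \mathcal{H}$, with $\hat f = \hat f_{\hat h}$:

$$
\begin{aligned}
\|\hat f - f\|_s
&\le \|\hat f_{\hat h} - \hat f_{\hat h, h}\|_s + \|\hat f_{\hat h, h} - \hat f_h\|_s + \|\hat f_h - f\|_s \\
&\le \|\hat f_h - f\|_s
 + \left[ \|\hat f_{\hat h} - \hat f_{\hat h, h}\|_s - \frac{C}{\sqrt{n V_{\hat h}}} \right]_+ + \frac{C}{\sqrt{n V_{\hat h}}}
 + \left[ \|\hat f_{h, \hat h} - \hat f_h\|_s - \frac{C}{\sqrt{n V_h}} \right]_+ + \frac{C}{\sqrt{n V_h}} \\
&\le \|\hat f_h - f\|_s
 + \left( \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_\eta - \hat f_{\eta, h}\|_s - \frac{C}{\sqrt{n V_\eta}} \right]_+ + \frac{C}{\sqrt{n V_h}} \right)
 + \left( \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_\eta - \hat f_{\eta, \hat h}\|_s - \frac{C}{\sqrt{n V_\eta}} \right]_+ + \frac{C}{\sqrt{n V_{\hat h}}} \right) \\
&= \|\hat f_h - f\|_s + \hat R_h + \hat R_{\hat h} \\
&\le \|\hat f_h - f\|_s + 2 \hat R_h .
\end{aligned}
$$

The last step uses $\hat R_{\hat h} \le \hat R_h$, which holds because $\hat h$ minimizes $\hat R$ over $\mathcal{H}$.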

(16)

Proof of theorem 1 and 2

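To bound $\hat R_h$ we have to bound

$$M_{s,h}(f) := \left( \mathbb{E}_f \sup_{\eta \in \mathcal{H}} \left[ \|\hat f_\eta - \hat f_{\eta, h}\|_s - \frac{C}{\sqrt{n V_\eta}} \right]_+^q \right)^{1/q}.$$

Writing

$$\hat f_{\eta, h}(t) - \hat f_\eta(t) = \text{bias} + \text{stochastic error} := A_{h,\eta}(t) + B_{h,\eta}(t),$$

the hardest part is to bound

$$M_h^{(1)}(f) := \left( \mathbb{E}_f \sup_{\eta \in \mathcal{H}} \left[ \|B_{h,\eta}\|_s - \frac{C^{(1)}}{\sqrt{n V_\eta}} \right]_+^q \right)^{1/q}.$$

The main technical tools used in our derivations are uniform bounds on $L_p$-norms of empirical processes developed by Goldenshluger and Lepski [2010].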

(17)

References.

A. Goldenshluger and O. Lepski: Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. To appear in Ann. Stat.

A. Goldenshluger and O. Lepski: Structural adaptation via $L_p$-norm oracle inequalities. Probab. Theory Relat. Fields 143, 41-71.

A. Goldenshluger and O. Lepski: Uniform bounds for norms of sums of independent random functions. Ann. Probab. 39, 2318-2384.

A. Goldenshluger and O. Lepski: Universal estimation routines in nonparametric statistics. Manuscript.

G. Kerkyacharian, O. Lepski and D. Picard: Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Relat. Fields 121, 137-170.

O. Lepski and B. Y. Levit: Universal pointwise selection rule in multivariate function estimation. Bernoulli 14, 1150-1190.