
Cross-validation for estimator selection or aggregation

Sylvain Arlot (joint works with Alain Celisse, Matthieu Lerasle, Nelo Magalhães, Guillaume Maillard)

Université Paris-Sud, Laboratoire de Mathématiques d’Orsay

Data and Analytics for Short-Term Operations, Cambridge, February 27, 2019

Survey: arXiv:0907.4728 (& arXiv:1703.03167)
CV in $L^2$ density estimation: arXiv:1210.5830

Aggregated hold-out: arXiv:1709.03702


Outline

1 Estimator selection

2 Cross-validation

3 Cross-validation for risk estimation

4 Cross-validation for estimator selection

5 Conclusion on CV

6 Combining cross-validation with aggregation


Regression: data $(X_1, Y_1), \ldots, (X_n, Y_n)$

[Figure: scatter plot of the $n$ data points, with $X_i \in [0, 1]$.]


Goal: predict $Y$ given $X$, i.e., denoising

[Figure: the same data, illustrating the denoising goal.]


General setting: prediction

Data: $D_n = (X_i, Y_i)_{1 \le i \le n} \in (\mathcal{X} \times \mathcal{Y})^n$, assumed i.i.d. with distribution $P$.
Predictor: $f : \mathcal{X} \to \mathcal{Y}$ ($\mathcal{F}$: set of all predictors).
Risk (prediction error): $\mathcal{R}(f) = \mathbb{E}\big[c\big(f(X), Y\big)\big]$, minimal for $f = f^\star$.
LS regression: $c(y, y') = (y - y')^2$, $f^\star(X) = \mathbb{E}[Y \mid X]$ and $\mathcal{R}(f) - \mathcal{R}(f^\star) = \mathbb{E}\big[\big(f(X) - f^\star(X)\big)^2\big]$.
Goal: from $D_n$ only, find $f \in \mathcal{F}$ with $\mathcal{R}(f)$ minimal.
Examples: regression, classification.
A more general setting is possible, including density estimation with least-squares or Kullback-Leibler risk.
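A minimal numerical sketch of these definitions under the least-squares cost (toy data and helper names of my choosing, not prescribed by the slides): the empirical counterpart of $\mathcal{R}(f)$ is simply an average of squared errors.

```python
import numpy as np

# Empirical counterpart of R(f) = E[c(f(X), Y)] with c(y, y') = (y - y')^2.
def empirical_risk(f, X, Y):
    return np.mean((f(X) - Y) ** 2)

# Toy model: Y = sin(2*pi*X) + noise, so f*(x) = E[Y | X = x] = sin(2*pi*x).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=500)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.5, size=500)
f_star = lambda x: np.sin(2 * np.pi * x)
print(empirical_risk(f_star, X, Y))  # approximately the noise variance 0.25
```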


Estimator selection (regression): regular regressograms

[Figure: four regular regressogram fits of the same data, for different numbers of bins.]


Estimator selection (regression): kernel ridge

[Figure: four kernel ridge regression fits of the same data, for different values of the regularization/kernel parameters.]


Estimator selection (regression): k nearest neighbours

[Figure: four $k$-nearest-neighbour fits of the same data, for different values of $k$.]


Estimator selection

Estimator / learning algorithm: $\widehat{f} : D_n \mapsto \widehat{f}(D_n) \in \mathcal{F}$.
Example: least-squares estimator on some model $S_m \subset \mathcal{F}$:
$$\widehat{f}_m \in \operatorname*{argmin}_{f \in S_m} \big\{ \widehat{\mathcal{R}}_n(f) \big\} \quad \text{where} \quad \widehat{\mathcal{R}}_n(f) := \frac{1}{n} \sum_{(X_i, Y_i) \in D_n} c\big(f(X_i), Y_i\big).$$
Examples of models: histograms, $\operatorname{span}\{\varphi_1, \ldots, \varphi_D\}$.
Estimator collection $(\widehat{f}_m)_{m \in \mathcal{M}}$ $\Rightarrow$ choose $\widehat{m} = \widehat{m}(D_n)$?
Examples:
model selection
calibration of tuning parameters (choosing $k$ or the distance for $k$-NN, choice of a regularization parameter, etc.)
choice between different methods, e.g., random forests vs. SVM?
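As a concrete instance of a least-squares estimator on a model $S_m$, here is a small sketch of the regular regressogram fit pictured earlier (the helper name `fit_regressogram` and the assumption that $X$ lives in $[0, 1]$ are mine, not the talk's). Varying the number of bins gives an estimator collection $(\widehat{f}_m)_{m \in \mathcal{M}}$.

```python
import numpy as np

def fit_regressogram(X, Y, n_bins):
    """Least-squares estimator on the model S_m of piecewise-constant functions
    (regular regressogram with n_bins bins on [0, 1]): minimizing the empirical
    risk gives the mean of the Y_i falling in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(X, edges) - 1, 0, n_bins - 1)
    means = np.array([Y[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(n_bins)])

    def f_hat(x):
        j = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
        return means[j]

    return f_hat

# Estimator collection (f_hat_m)_{m in M}: here m = number of bins.
# fits_on_Dn = {m: fit_regressogram(X, Y, m) for m in (2, 5, 10, 50)}
```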


Estimator selection: two possible goals

Estimation goal: minimize the risk of the final estimator, i.e., oracle inequality (in expectation or with a large probability):
$$\mathcal{R}\big(\widehat{f}_{\widehat{m}}\big) - \mathcal{R}(f^\star) \le C \inf_{m \in \mathcal{M}} \big\{ \mathcal{R}(\widehat{f}_m) - \mathcal{R}(f^\star) \big\} + R_n.$$
Identification goal: select the (asymptotically) best model/estimator, assuming it is well defined, i.e., selection consistency:
$$\mathbb{P}\big(\widehat{m}(D_n) = m^\star\big) \xrightarrow[n \to \infty]{} 1.$$
Equivalent to estimation in the parametric setting.
Both goals with the same procedure (AIC-BIC dilemma)? No in general (Yang, 2005). Sometimes possible.


Estimation goal: Bias-variance trade-off

$$\mathbb{E}\big[\mathcal{R}(\widehat{f}_m)\big] - \mathcal{R}(f^\star) = \text{Bias} + \text{Variance}$$
Bias, or approximation error: $\mathcal{R}(f_m^\star) - \mathcal{R}(f^\star) = \inf_{f \in S_m} \mathcal{R}(f) - \mathcal{R}(f^\star)$.
Variance, or estimation error: for OLS in regression, $\sigma^2 \dim(S_m) / n$.
Bias-variance trade-off $\Leftrightarrow$ avoid overfitting and underfitting.
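A toy computation of this trade-off (the constants and the $C/D^2$ approximation rate are illustrative assumptions, e.g. a Lipschitz regression function approximated by regular histograms of dimension $D$):

```python
import numpy as np

# Approximation error ~ C / D^2 (assumed) plus estimation error sigma^2 * D / n
# (as on the slide); the best dimension balances the two terms.
n, sigma2, C = 1000, 1.0, 4.0
D = np.arange(1, 201)
excess_risk = C / D**2 + sigma2 * D / n   # E[R(f_hat_m)] - R(f*), up to constants
print("dimension minimizing the bound:", D[np.argmin(excess_risk)])  # -> 20
```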


Outline

1 Estimator selection

2 Cross-validation

3 Cross-validation for risk estimation

4 Cross-validation for estimator selection

5 Conclusion on CV

6 Combining cross-validation with aggregation

Validation principle: data splitting

[Figures: the data set is split into a training/learning sample and a validation sample; successive plots highlight each subsample.]

Cross-validation

Data splitting: $(X_1, Y_1), \ldots, (X_{n_t}, Y_{n_t})$ form the training set $D_n^{(t)}$, used to train $\widehat{f}_m^{(t)} = \widehat{f}_m\big(D_n^{(t)}\big)$; $(X_{n_t+1}, Y_{n_t+1}), \ldots, (X_n, Y_n)$ form the validation set $D_n^{(v)}$, used to evaluate the risk.

Hold-out estimator of the risk:
$$\widehat{\mathcal{R}}_n^{(v)}\big(\widehat{f}_m^{(t)}\big) = \frac{1}{n_v} \sum_{(X_i, Y_i) \in D_n^{(v)}} c\big(\widehat{f}_m^{(t)}(X_i), Y_i\big), \qquad n_v = \big|D_n^{(v)}\big| = n - n_t.$$

Cross-validation: average several hold-out estimators,
$$\widehat{\mathcal{R}}^{\mathrm{CV}}\big(\widehat{f}_m; D_n; (I_j^{(t)})_{1 \le j \le V}\big) = \frac{1}{V} \sum_{j=1}^{V} \widehat{\mathcal{R}}_n^{(v,j)}\big(\widehat{f}_m^{(t,j)}\big), \qquad D_n^{(t,j)} = (X_i, Y_i)_{i \in I_j^{(t)}}.$$

Estimator selection:
$$\widehat{m}^{\mathrm{CV}}\big(D_n; (I_j^{(t)})_{1 \le j \le V}\big) \in \operatorname*{argmin}_{m \in \mathcal{M}} \big\{ \widehat{\mathcal{R}}^{\mathrm{CV}}\big(\widehat{f}_m; D_n\big) \big\}, \qquad \text{final estimator } \widehat{f}_{\widehat{m}^{\mathrm{CV}}(D_n; (I_j^{(t)})_{1 \le j \le V})}(D_n).$$
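These definitions translate almost line by line into code. The sketch below (illustrative names, least-squares cost assumed, splits given as index sets $I_j^{(t)}$) computes the hold-out and CV risk estimators and performs the final selection; it is a minimal sketch, not the talk's own implementation.

```python
import numpy as np

def holdout_risk(fit, X, Y, train_idx):
    """Hold-out estimator of the risk: train f_hat_m on D_n^(t), then average the
    least-squares cost on the validation sample D_n^(v)."""
    val_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    f_t = fit(X[train_idx], Y[train_idx])          # f_hat_m^(t) = f_hat_m(D_n^(t))
    return np.mean((f_t(X[val_idx]) - Y[val_idx]) ** 2)

def cv_risk(fit, X, Y, train_sets):
    """CV estimator: average of the hold-out estimators over the splits I_j^(t)."""
    return np.mean([holdout_risk(fit, X, Y, I) for I in train_sets])

def select_by_cv(fits, X, Y, train_sets):
    """Estimator selection: m_hat_cv minimizes the CV criterion over M,
    and the selected learning rule is refitted on the whole sample D_n."""
    scores = {m: cv_risk(fit, X, Y, train_sets) for m, fit in fits.items()}
    m_hat = min(scores, key=scores.get)
    return fits[m_hat](X, Y), m_hat

# Usage sketch: fits maps m to a learning rule D_n -> f_hat_m(D_n), e.g. the
# regressogram fitter sketched earlier with m = number of bins:
# fits = {m: (lambda Xt, Yt, m=m: fit_regressogram(Xt, Yt, m)) for m in (2, 5, 10, 50)}
# train_sets = list of index arrays I_j^(t), one per split.
```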


Cross-validation: examples

Exhaustive data splitting: all possible subsets of size $n_t$.
Leave-one-out ($n_t = n - 1$):
$$\widehat{\mathcal{R}}^{\mathrm{LOO}}\big(\widehat{f}_m; D_n\big) = \frac{1}{n} \sum_{j=1}^{n} c\big(\widehat{f}_m^{(-j)}(X_j), Y_j\big).$$
Leave-$p$-out ($n_t = n - p$).
$V$-fold cross-validation: $\mathcal{B} = (B_j)_{1 \le j \le V}$ a partition of $\{1, \ldots, n\}$:
$$\widehat{\mathcal{R}}^{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big) = \frac{1}{V} \sum_{j=1}^{V} \widehat{\mathcal{R}}_n^{j}\big(\widehat{f}_m^{(-j)}\big).$$
Monte-Carlo CV / repeated learning-testing: $I_1^{(t)}, \ldots, I_V^{(t)}$ i.i.d. uniform.
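These splitting schemes are all available in scikit-learn's `model_selection` module; the sketch below only generates the index sets (with `ShuffleSplit` standing in for Monte-Carlo CV / repeated learning-testing), which can then be fed to a CV routine such as the one sketched above.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, LeavePOut, ShuffleSplit

n, p, V = 30, 2, 5
X_dummy = np.zeros((n, 1))          # the splitters only look at the sample size

splitters = {
    "leave-one-out": LeaveOneOut(),                              # n_t = n - 1
    "leave-p-out": LeavePOut(p),                                 # exhaustive, n_t = n - p
    "V-fold": KFold(n_splits=V, shuffle=True, random_state=0),   # partition B_1, ..., B_V
    "Monte-Carlo CV": ShuffleSplit(n_splits=V, test_size=p, random_state=0),
}
for name, sp in splitters.items():
    train_sets = [train for train, _ in sp.split(X_dummy)]
    print(f"{name}: {len(train_sets)} splits, n_t = {len(train_sets[0])}")
```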


Outline

1 Estimator selection

2 Cross-validation

3 Cross-validation for risk estimation

4 Cross-validation for estimator selection

5 Conclusion on CV

6 Combining cross-validation with aggregation


Bias of cross-validation

In this talk, we always assume that $\operatorname{Card}\big(D_n^{(t,j)}\big) = n_t$ for all $j$. For $V$-fold CV, $\operatorname{Card}(B_j) = n/V$, so $n_t = n(V-1)/V$.
Ideal criterion: $\mathcal{R}\big(\widehat{f}_m(D_n)\big)$.
General analysis for the bias:
$$\mathbb{E}\Big[\widehat{\mathcal{R}}^{\mathrm{CV}}\big(\widehat{f}_m; D_n; (I_j^{(t)})_{1 \le j \le V}\big)\Big] = \mathbb{E}\Big[\mathcal{R}\big(\widehat{f}_m(D_{n_t})\big)\Big],$$
so everything depends on the map $n \mapsto \mathbb{E}\big[\mathcal{R}\big(\widehat{f}_m(D_n)\big)\big]$.
Note: the bias can be corrected in some settings (Burman, 1989).
Note: the learning rule $D_n \mapsto \widehat{f}_m(D_n)$ must be fixed before seeing any data; otherwise (e.g., a data-driven model $m$), the bias is stronger.


Bias of cross-validation: generic example

Assume
$$\mathbb{E}\Big[\mathcal{R}\big(\widehat{f}_m(D_n)\big)\Big] = \alpha(m) + \frac{\beta(m)}{n}$$
(e.g., LS/ridge/$k$-NN regression, LS/kernel density estimation). Then
$$\mathbb{E}\Big[\widehat{\mathcal{R}}^{\mathrm{CV}}\big(\widehat{f}_m; D_n; (I_j^{(t)})_{1 \le j \le V}\big)\Big] = \alpha(m) + \frac{n}{n_t} \cdot \frac{\beta(m)}{n}.$$
Bias: decreases as a function of $n_t$, minimal for $n_t = n - 1$, negligible if $n_t \sim n$.
$V$-fold: the bias decreases when $V$ increases and vanishes as $V \to +\infty$.
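Plugging toy values of $n$ and $\beta(m)$ into this formula shows how the $V$-fold bias $\beta(m)/n_t - \beta(m)/n = \beta(m)/(n(V-1))$ shrinks with $V$ (the numbers are made up; only the formula comes from the slide):

```python
# Bias of V-fold CV under the model above: beta(m)/n_t - beta(m)/n with n_t = n(V-1)/V.
n, beta = 1000, 50.0
for V in (2, 5, 10, 100):
    n_t = n * (V - 1) / V
    print(f"V = {V:3d}: bias = {beta / n_t - beta / n:.5f}")
# The bias decreases with V and would vanish as V -> infinity (n_t -> n).
```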


Variance of cross-validation: general case

Hold-out (Nadeau & Bengio, 2003):
$$\operatorname{var}\Big(\widehat{\mathcal{R}}_n^{(v)}\big(\widehat{f}_m^{(t)}\big)\Big) = \frac{1}{n_v} \mathbb{E}\Big[\operatorname{var}\big(c(f(X), Y)\big) \,\Big|\, f = \widehat{f}_m^{(t)}\Big] + \operatorname{var}\Big(\mathcal{R}\big(\widehat{f}_m(D_{n_t})\big)\Big).$$
Monte-Carlo CV and the number of splits ($p = n - n_t$):
$$\operatorname{var}\Big(\widehat{\mathcal{R}}^{\mathrm{CV}}\big(\widehat{f}_m; D_n; (I_j^{(t)})_{1 \le j \le V}\big)\Big) = \operatorname{var}\Big(\widehat{\mathcal{R}}^{\mathrm{LPO}}\big(\widehat{f}_m; D_n\big)\Big) + \underbrace{\frac{1}{V} \mathbb{E}\Big[\operatorname{var}_{I^{(t)}}\Big(\widehat{\mathcal{R}}_n^{(v)}\big(\widehat{f}_m^{(t)}\big) \,\Big|\, D_n\Big)\Big]}_{\text{permutation variance}}.$$
$V$-fold CV: $V$, $n_t$, $n_v$ are related.
Leave-one-out: related to stability? (empirical results)


Variance of the V-fold CV criterion

Least-squares density estimation (A. & Lerasle, 2016), exact non-asymptotic computation:
$$\operatorname{var}\Big(\widehat{\mathcal{R}}^{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big)\Big) = \frac{1 + o(1)}{n} \operatorname{var}_P(f_m^\star) + \frac{2}{n^2}\Big(1 + \frac{4}{V-1} + O\big(\tfrac{1}{V} + \tfrac{1}{n}\big)\Big) A(m)$$
(simplified formula, histogram model with bin size $d_m^{-1}$, $A(m) \asymp d_m$).
Linear regression, asymptotic formula (Burman, 1989):
$$\operatorname{var}\Big(\widehat{\mathcal{R}}^{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big)\Big) = \frac{2\sigma^2}{n} + \frac{4\sigma^4}{n^2}\Big(4 + \frac{4}{V-1} + \frac{2}{(V-1)^2} + \frac{1}{(V-1)^3}\Big) + o(n^{-2}).$$
The variance is decreasing with $V$, with a dependence on $V$ only in second-order terms.
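To see the size of the $V$-dependence in Burman's formula, one can evaluate the bracketed factor for a few values of $V$ (a direct numerical reading of the display above):

```python
# Bracketed factor of Burman's asymptotic formula: increasing V only shrinks a
# second-order (O(1/n^2)) term.
for V in (2, 3, 5, 10, 100):
    bracket = 4 + 4 / (V - 1) + 2 / (V - 1) ** 2 + 1 / (V - 1) ** 3
    print(f"V = {V:3d}: bracket = {bracket:.3f}")
# From 11.0 at V = 2 down to 4.0 as V -> infinity: at most a constant-factor gain.
```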


Outline

1 Estimator selection

2 Cross-validation

3 Cross-validation for risk estimation

4 Cross-validation for estimator selection

5 Conclusion on CV

6 Combining cross-validation with aggregation


Risk estimation and estimator selection are different goals

$$\widehat{m} \in \operatorname*{argmin}_{m \in \mathcal{M}} \big\{ \widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_m) \big\} \qquad \text{vs.} \qquad m^\star \in \operatorname*{argmin}_{m \in \mathcal{M}} \big\{ \mathcal{R}\big(\widehat{f}_m(D_n)\big) \big\}$$
For any $Z$ (deterministic or random),
$$\widehat{m} \in \operatorname*{argmin}_{m \in \mathcal{M}} \big\{ \widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_m) + Z \big\},$$
so the bias and variance of the criterion are meaningless by themselves.
Perfect ranking among $(\widehat{f}_m)_{m \in \mathcal{M}}$:
$$\forall m, m' \in \mathcal{M}, \quad \operatorname{sign}\Big(\widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_m) - \widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_{m'})\Big) = \operatorname{sign}\Big(\mathcal{R}(\widehat{f}_m) - \mathcal{R}(\widehat{f}_{m'})\Big).$$
Hence $\mathbb{E}\big[\widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_m) - \widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_{m'})\big]$ should have the right sign (unbiased risk estimation heuristic: AIC, $C_p$, leave-one-out, ...),
and $\operatorname{var}\big(\widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_m) - \widehat{\mathcal{R}}^{\mathrm{CV}}(\widehat{f}_{m'})\big)$ should be minimal (detailed heuristic: A. & Lerasle, 2016).


CV with an estimation goal: the big picture ($\mathcal{M}$ “small”)

At first order, the bias drives the performance of leave-$p$-out, $V$-fold CV, and Monte-Carlo CV with $V \gg n^2$, or more generally whenever $n_v$ is large enough (including the hold-out). CV then performs similarly to
$$\operatorname*{argmin}_{m \in \mathcal{M}} \Big\{ \mathbb{E}\Big[\mathcal{R}\big(\widehat{f}_m(D_{n_t})\big)\Big] \Big\},$$
hence first-order optimality if $n_t \sim n$, and suboptimality otherwise (e.g., $V$-fold CV with $V$ fixed).
Theoretical results exist for least-squares regression and density estimation at least.


Bias-corrected VFCV / V-fold penalization

Bias-corrected $V$-fold CV (Burman, 1989):
$$\widehat{\mathcal{R}}^{\mathrm{VF,corr}}\big(\widehat{f}_m; D_n; \mathcal{B}\big) := \widehat{\mathcal{R}}^{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big) + \widehat{\mathcal{R}}_n\big(\widehat{f}_m\big) - \frac{1}{V} \sum_{j=1}^{V} \widehat{\mathcal{R}}_n\big(\widehat{f}_m^{(-j)}\big) = \widehat{\mathcal{R}}_n\big(\widehat{f}_m(D_n)\big) + \underbrace{\mathrm{pen}_{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big)}_{V\text{-fold penalty (A., 2008)}}.$$
In least-squares density estimation (A. & Lerasle, 2016):
$$\widehat{\mathcal{R}}^{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big) = \widehat{\mathcal{R}}_n\big(\widehat{f}_m(D_n)\big) + \underbrace{\Big(1 + \frac{1}{2(V-1)}\Big)}_{\text{overpenalization factor}} \mathrm{pen}_{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}\big),$$
$$\widehat{\mathcal{R}}^{\mathrm{LPO}}\big(\widehat{f}_m; D_n\big) = \widehat{\mathcal{R}}_n\big(\widehat{f}_m(D_n)\big) + \Big(1 + \frac{1}{2(n/p - 1)}\Big) \mathrm{pen}_{\mathrm{VF}}\big(\widehat{f}_m; D_n; \mathcal{B}_{\mathrm{LOO}}\big).$$
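A sketch of Burman's bias-corrected criterion in code (least-squares cost and a regression-style learning rule assumed, folds given as blocks $B_j$ of validation indices; illustrative names, not the authors' implementation):

```python
import numpy as np

def vfold_corrected_risk(fit, X, Y, folds):
    """Burman's bias-corrected V-fold criterion for a learning rule `fit`:
    R_vf + R_n(f_hat(D_n)) - (1/V) sum_j R_n(f_hat^(-j)),
    rewritten as empirical risk + V-fold penalty."""
    n = len(X)
    emp_risk = lambda f: np.mean((f(X) - Y) ** 2)    # R_n(.) on the full sample D_n
    f_full = fit(X, Y)                               # f_hat_m(D_n)
    ho_risks, full_sample_risks = [], []
    for val in folds:                                # val = indices of block B_j
        train = np.setdiff1d(np.arange(n), val)
        f_j = fit(X[train], Y[train])                # f_hat_m^(-j)
        ho_risks.append(np.mean((f_j(X[val]) - Y[val]) ** 2))
        full_sample_risks.append(emp_risk(f_j))
    pen_vf = np.mean(ho_risks) - np.mean(full_sample_risks)   # V-fold penalty
    return emp_risk(f_full) + pen_vf                 # bias-corrected criterion
```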


Variance and estimator selection

$$\Delta(m, m', V) = \widehat{\mathcal{R}}^{\mathrm{VF,corr}}\big(\widehat{f}_m\big) - \widehat{\mathcal{R}}^{\mathrm{VF,corr}}\big(\widehat{f}_{m'}\big)$$

Theorem (A. & Lerasle, 2016, least-squares density estimation)
$$\operatorname{var}\big(\Delta(m, m', V)\big) = 4\Big(1 + \frac{2}{n} + \frac{1}{n^2}\Big) \frac{\operatorname{var}_P\big(f_m^\star - f_{m'}^\star\big)}{n} + 2\Big(1 + \frac{4}{V-1} - \frac{1}{n}\Big) \underbrace{\frac{B(m, m')}{n^2}}_{> 0}.$$
If $S_m \subset S_{m'}$ are two histogram models with constant bin sizes $d_m^{-1}$, $d_{m'}^{-1}$, then $B(m, m') \propto \|f_m^\star - f_{m'}^\star\|\, d_m$.
The two terms are of the same order if $\|f_m^\star - f_{m'}^\star\| \approx d_m / n$.
