Estimator selection Cross-validation CV for risk estimation CV for estimator selection Conclusion on CV Agghoo
Cross-validation for estimator selection or aggregation
Sylvain Arlot (joint works with Alain Celisse, Matthieu Lerasle, Nelo Magalhães, Guillaume Maillard)
Université Paris-Sud, Laboratoire de Mathématiques d’Orsay
Data and Analytics for Short-Term Operations, Cambridge February 27, 2019
Survey: arXiv:0907.4728 (& arXiv:1703.03167). CV in $L^2$ density estimation: arXiv:1210.5830
Aggregated hold-out: arXiv:1709.03702
Outline
1 Estimator selection
2 Cross-validation
3 Cross-validation for risk estimation
4 Cross-validation for estimator selection
5 Conclusion on CV
6 Combining cross-validation with aggregation
Regression: data $(X_1, Y_1), \dots, (X_n, Y_n)$

[Figure: scatter plot of a noisy sample $(X_i, Y_i)_{1 \le i \le n}$ with $X_i \in [0, 1]$]
Goal: predict $Y$ given $X$, i.e., denoising

[Figure: the same noisy sample]
General setting: prediction
Data: $D_n = (X_i, Y_i)_{1 \le i \le n} \in (\mathcal{X} \times \mathcal{Y})^n$, assumed i.i.d. $\sim P$

Predictor: $f : \mathcal{X} \to \mathcal{Y}$ ($\mathcal{F}$: set of all predictors)

Risk (prediction error): $\mathcal{R}(f) = \mathbb{E}\bigl[c\bigl(f(X), Y\bigr)\bigr]$, minimal for $f = f^\star$

LS regression: $c(y, y') = (y - y')^2$, $f^\star(X) = \mathbb{E}[Y \mid X]$ and $\mathcal{R}(f) - \mathcal{R}(f^\star) = \mathbb{E}\bigl[(f(X) - f^\star(X))^2\bigr]$

Goal: from $D_n$ only, find $f \in \mathcal{F}$ with $\mathcal{R}(f)$ (nearly) minimal.
Examples: regression, classification
More general setting possible, including density estimation with LS or KL risk.
Estimator selection (regression): regular regressograms
[Figure: four regressogram fits on regular partitions with different numbers of bins]
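The regular regressogram above can be sketched in a few lines of numpy. This is an illustrative reimplementation, not the code behind the slides; the function name and the restriction of $X$ to $[0, 1]$ are assumptions.

```python
import numpy as np

def regressogram(x_train, y_train, x_new, n_bins):
    """Least-squares fit that is piecewise constant on a regular partition
    of [0, 1] into n_bins cells: predict, in each cell, the mean of the
    training responses falling in it (0 for empty cells)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Cell index of each point; clip so that x == 1 lands in the last cell.
    bin_of = lambda x: np.clip(np.searchsorted(edges, x, side="right") - 1,
                               0, n_bins - 1)
    means = np.zeros(n_bins)
    idx_train = bin_of(x_train)
    for b in range(n_bins):
        in_cell = idx_train == b
        if in_cell.any():
            means[b] = y_train[in_cell].mean()
    return means[bin_of(x_new)]
```

Varying `n_bins` produces the four panels above, from underfitting (few bins) to overfitting (many bins).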
Estimator selection (regression): kernel ridge
[Figure: four kernel ridge regression fits with different tuning parameters]
Estimator selection (regression): k nearest neighbours
[Figure: four $k$-nearest-neighbours fits with different values of $k$]
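A minimal $k$-nearest-neighbours regressor in the same spirit (an illustrative brute-force 1-D sketch; the function name is an assumption):

```python
import numpy as np

def knn_regress(x_train, y_train, x_new, k):
    """Predict, at each query point, the mean response of its k nearest
    training points (brute-force pairwise distances, 1-D inputs)."""
    dist = np.abs(x_new[:, None] - x_train[None, :])   # all pairwise |x - x'|
    nearest = np.argsort(dist, axis=1)[:, :k]          # k closest indices
    return y_train[nearest].mean(axis=1)
```

Here $k$ tunes the bias-variance trade-off: $k = 1$ interpolates the data, $k = n$ predicts the global mean everywhere.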
Estimator selection

Estimator / learning algorithm: $\hat f : D_n \mapsto \hat f(D_n) \in \mathcal{F}$

Example: least-squares estimator on some model $S_m \subset \mathcal{F}$:
$$\hat f_m \in \operatorname*{arg\,min}_{f \in S_m} \bigl\{ \widehat{\mathcal{R}}_n(f) \bigr\} \quad \text{where} \quad \widehat{\mathcal{R}}_n(f) := \frac{1}{n} \sum_{(X_i, Y_i) \in D_n} c\bigl(f(X_i), Y_i\bigr)$$

Examples of models: histograms, $\operatorname{span}\{\varphi_1, \dots, \varphi_D\}$

Estimator collection $(\hat f_m)_{m \in \mathcal{M}}$ ⇒ choose $\hat m = \hat m(D_n)$?

Examples:
model selection
calibration of tuning parameters (choosing $k$ or the distance for $k$-NN, choice of a regularization parameter, etc.)
choice between different methods (e.g., random forests vs. SVM?)
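The least-squares estimator on a model $S_m = \operatorname{span}\{\varphi_1, \dots, \varphi_D\}$ reduces to an ordinary linear least-squares problem. A sketch, where the polynomial family used as the collection $(\hat f_m)_{m}$ is an assumed example:

```python
import numpy as np

def ls_on_model(X, Y, basis):
    """Empirical risk minimizer over S_m = span{phi_1, ..., phi_D}:
    solve min_f (1/n) sum_i (f(X_i) - Y_i)^2 by ordinary least squares."""
    design = lambda x: np.column_stack([phi(x) for phi in basis])
    coef, *_ = np.linalg.lstsq(design(X), Y, rcond=None)
    return lambda x: design(x) @ coef

# An estimator collection (f_hat_m)_{m in M}: polynomial models of degree m.
collection = {m: [lambda x, d=d: x**d for d in range(m + 1)] for m in range(4)}
```

Each `collection[m]` is one model $S_m$; selecting $m$ is exactly the estimator-selection problem of this slide.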
Estimator selection: two possible goals

Estimation goal: minimize the risk of the final estimator, i.e., oracle inequality (in expectation or with a large probability):
$$\mathcal{R}\bigl(\hat f_{\hat m}\bigr) - \mathcal{R}(f^\star) \le C \inf_{m \in \mathcal{M}} \bigl\{ \mathcal{R}(\hat f_m) - \mathcal{R}(f^\star) \bigr\} + R_n$$

Identification goal: select the (asymptotically) best model/estimator, assuming it is well defined, i.e., selection consistency:
$$\mathbb{P}\bigl(\hat m(D_n) = m^\star\bigr) \xrightarrow[n \to \infty]{} 1.$$
Equivalent to estimation in the parametric setting.

Both goals with the same procedure (AIC-BIC dilemma)? No in general (Yang, 2005). Sometimes possible.
Estimation goal: bias-variance trade-off

$$\mathbb{E}\bigl[\mathcal{R}(\hat f_m)\bigr] - \mathcal{R}(f^\star) = \text{Bias} + \text{Variance}$$

Bias, or approximation error: $\mathcal{R}(f_m^\star) - \mathcal{R}(f^\star) = \inf_{f \in S_m} \mathcal{R}(f) - \mathcal{R}(f^\star)$

Variance, or estimation error; for OLS in regression: $\sigma^2 \dim(S_m) / n$

Bias-variance trade-off ⇔ avoid overfitting and underfitting
Validation principle: data splitting

[Figure sequence: the sample is split into a training/learning sample, on which the estimator is fit, and a validation sample, on which its risk is evaluated]
Cross-validation

Split the data: $(X_1, Y_1), \dots, (X_{n_t}, Y_{n_t})$ = training set $D_n^{(t)}$ ⇒ $\hat f_m^{(t)} = \hat f_m\bigl(D_n^{(t)}\bigr)$; $(X_{n_t+1}, Y_{n_t+1}), \dots, (X_n, Y_n)$ = validation set $D_n^{(v)}$ ⇒ evaluate the risk.

Hold-out estimator of the risk:
$$\widehat{\mathcal{R}}_n^{(v)}\bigl(\hat f_m^{(t)}\bigr) = \frac{1}{n_v} \sum_{(X_i, Y_i) \in D_n^{(v)}} c\bigl(\hat f_m^{(t)}(X_i), Y_i\bigr), \qquad n_v = \bigl|D_n^{(v)}\bigr| = n - n_t$$

Cross-validation: average several hold-out estimators:
$$\widehat{\mathcal{R}}^{\mathrm{cv}}\bigl(\hat f_m; D_n; (I_j^{(t)})_{1 \le j \le V}\bigr) = \frac{1}{V} \sum_{j=1}^{V} \widehat{\mathcal{R}}_n^{(v,j)}\bigl(\hat f_m^{(t,j)}\bigr), \qquad D_n^{(t,j)} = (X_i, Y_i)_{i \in I_j^{(t)}}$$

Estimator selection:
$$\hat m^{\mathrm{cv}}\bigl(D_n; (I_j^{(t)})_{1 \le j \le V}\bigr) \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \bigl\{ \widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_m; D_n) \bigr\} \;\Rightarrow\; \hat f_{\hat m^{\mathrm{cv}}(D_n; (I_j^{(t)})_{1 \le j \le V})}(D_n)$$
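The hold-out and CV risk estimators translate directly into code. A generic sketch, where `learn` stands for the learning rule $\hat f_m$ and `loss` for the contrast $c$ (both hypothetical callables, not an API from the talk):

```python
import numpy as np

def holdout_risk(learn, X, Y, train_idx, loss):
    """Train on D^(t) = (X_i, Y_i)_{i in train_idx}, then average the loss
    over the remaining n_v = n - n_t validation points."""
    mask = np.zeros(len(X), dtype=bool)
    mask[train_idx] = True
    f_t = learn(X[mask], Y[mask])                       # f_hat^(t)
    return float(np.mean(loss(f_t(X[~mask]), Y[~mask])))

def cv_risk(learn, X, Y, train_sets, loss):
    """Cross-validation: average the hold-out estimators over the V splits."""
    return float(np.mean([holdout_risk(learn, X, Y, I, loss)
                          for I in train_sets]))

# A trivial learner (always predicts 0) and the squared loss:
const_zero = lambda X, Y: (lambda x: np.zeros_like(x, dtype=float))
sq = lambda a, b: (a - b) ** 2
X, Y = np.arange(6.0), np.ones(6)
print(cv_risk(const_zero, X, Y,
              [np.array([0, 1, 2]), np.array([3, 4, 5])], sq))  # → 1.0
```

Selecting $\hat m^{\mathrm{cv}}$ is then just the arg-min of `cv_risk` over the collection.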
Cross-validation: examples

Exhaustive data splitting: all possible subsets of size $n_t$:
leave-one-out ($n_t = n - 1$): $\widehat{\mathcal{R}}^{\mathrm{loo}}\bigl(\hat f_m; D_n\bigr) = \frac{1}{n} \sum_{j=1}^{n} c\bigl(\hat f_m^{(-j)}(X_j), Y_j\bigr)$
leave-$p$-out ($n_t = n - p$)

$V$-fold cross-validation: $B = (B_j)_{1 \le j \le V}$ partition of $\{1, \dots, n\}$:
$$\widehat{\mathcal{R}}^{\mathrm{vf}}\bigl(\hat f_m; D_n; B\bigr) = \frac{1}{V} \sum_{j=1}^{V} \widehat{\mathcal{R}}_n^{j}\bigl(\hat f_m^{(-j)}\bigr)$$

Monte-Carlo CV / repeated learning-testing: $I_1^{(t)}, \dots, I_V^{(t)}$ i.i.d. uniform
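The three schemes differ only in how the training index sets $I_j^{(t)}$ are drawn; a sketch (function names are assumptions):

```python
import numpy as np

def loo_splits(n):
    """Leave-one-out: the n training sets of size n - 1."""
    return [np.delete(np.arange(n), j) for j in range(n)]

def vfold_splits(n, V, rng):
    """V-fold: complements of the V blocks of a random partition of {0..n-1}."""
    blocks = np.array_split(rng.permutation(n), V)
    return [np.setdiff1d(np.arange(n), B) for B in blocks]

def mccv_splits(n, n_t, V, rng):
    """Monte-Carlo CV: V i.i.d. uniform training sets of size n_t."""
    return [rng.choice(n, size=n_t, replace=False) for _ in range(V)]
```

Any of these lists can be fed to a generic CV routine that averages hold-out risk estimates over the splits.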
Bias of cross-validation

In this talk, we always assume: $\forall j$, $\operatorname{Card}\bigl(D_n^{(t,j)}\bigr) = n_t$. For $V$-fold CV: $\operatorname{Card}(B_j) = n / V$ ⇒ $n_t = n(V - 1)/V$.

Ideal criterion: $\mathcal{R}\bigl(\hat f_m(D_n)\bigr)$

General analysis of the bias:
$$\mathbb{E}\Bigl[\widehat{\mathcal{R}}^{\mathrm{cv}}\bigl(\hat f_m; D_n; (I_j^{(t)})_{1 \le j \le V}\bigr)\Bigr] = \mathbb{E}\Bigl[\mathcal{R}\bigl(\hat f_m(D_{n_t})\bigr)\Bigr]$$
⇒ everything depends on the map $n \mapsto \mathbb{E}\bigl[\mathcal{R}\bigl(\hat f_m(D_n)\bigr)\bigr]$

Note: the bias can be corrected in some settings (Burman, 1989).
Note: the learning rule $D_n \mapsto \hat f_m(D_n)$ must be fixed before seeing any data; otherwise (e.g., a data-driven model $m$), the bias is stronger.
Bias of cross-validation: generic example

Assume:
$$\mathbb{E}\bigl[\mathcal{R}\bigl(\hat f_m(D_n)\bigr)\bigr] = \alpha(m) + \frac{\beta(m)}{n}$$
(e.g., LS/ridge/$k$-NN regression, LS/kernel density estimation). Then
$$\mathbb{E}\Bigl[\widehat{\mathcal{R}}^{\mathrm{cv}}\bigl(\hat f_m; D_n; (I_j^{(t)})_{1 \le j \le V}\bigr)\Bigr] = \alpha(m) + \frac{n}{n_t} \cdot \frac{\beta(m)}{n}$$

⇒ Bias: decreases as a function of $n_t$, minimal for $n_t = n - 1$, negligible if $n_t \sim n$.
⇒ $V$-fold: the bias decreases when $V$ increases, and vanishes as $V \to +\infty$.
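The $n/n_t$ inflation of the $\beta(m)/n$ term can be checked on the simplest learner, the sample mean of $Y = \mu + \varepsilon$ under squared loss, for which $\alpha(m) = \beta(m) = \sigma^2$ exactly (an illustrative simulation, not from the talk):

```python
import numpy as np

# Sample mean fitted on n_t points: the exact risk of the constant c is
# (c - mu)^2 + sigma^2, so E[risk] = sigma^2 + sigma^2/n_t
#                                  = alpha + (n/n_t) * beta/n.
rng = np.random.default_rng(0)
sigma, n, n_t, reps = 1.0, 10, 5, 200_000

c_hat = (sigma * rng.standard_normal((reps, n_t))).mean(axis=1)
mc_risk = (c_hat ** 2 + sigma ** 2).mean()              # Monte-Carlo estimate
predicted = sigma ** 2 + (n / n_t) * (sigma ** 2 / n)   # alpha + (n/n_t)*beta/n
print(round(mc_risk, 2), predicted)                     # both close to 1.2
```

With $n_t = n/2$ the $\beta/n$ term is doubled, exactly the bias of 2-fold CV in this toy model.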
Variance of cross-validation: general case

Hold-out (Nadeau & Bengio, 2003):
$$\operatorname{var}\Bigl(\widehat{\mathcal{R}}_n^{(v)}\bigl(\hat f_m^{(t)}\bigr)\Bigr) = \frac{1}{n_v} \mathbb{E}\Bigl[\operatorname{var}\Bigl(c\bigl(f(X), Y\bigr) \Bigm| f = \hat f_m^{(t)}\Bigr)\Bigr] + \operatorname{var}\Bigl(\mathcal{R}\bigl(\hat f_m(D_{n_t})\bigr)\Bigr)$$

Monte-Carlo CV and number of splits ($p = n - n_t$):
$$\operatorname{var}\Bigl(\widehat{\mathcal{R}}^{\mathrm{cv}}\bigl(\hat f_m; D_n; (I_j^{(t)})_{1 \le j \le V}\bigr)\Bigr) = \operatorname{var}\Bigl(\widehat{\mathcal{R}}^{\ell\mathrm{po}}\bigl(\hat f_m; D_n\bigr)\Bigr) + \underbrace{\frac{1}{V} \mathbb{E}\Bigl[\operatorname{var}_{I^{(t)}}\Bigl(\widehat{\mathcal{R}}_n^{(v)}\bigl(\hat f_m^{(t)}\bigr) \Bigm| D_n\Bigr)\Bigr]}_{\text{permutation variance}}$$

$V$-fold CV: $V$, $n_t$, $n_v$ are related.
Leave-one-out: related to stability? (empirical results)
Variance of the $V$-fold CV criterion

Least-squares density estimation (A. & Lerasle, 2016), exact non-asymptotic computation:
$$\operatorname{var}\Bigl(\widehat{\mathcal{R}}^{\mathrm{vf}}\bigl(\hat f_m; D_n; B\bigr)\Bigr) = \frac{1 + o(1)}{n} \operatorname{var}_P(f_m^\star) + \frac{2}{n^2} \Bigl(1 + \frac{4}{V - 1} + O\Bigl(\frac{1}{V} + \frac{1}{n}\Bigr)\Bigr) A(m)$$
(simplified formula, for a histogram model with bin size $d_m^{-1}$, $A(m) \approx d_m$)

Linear regression, asymptotic formula (Burman, 1989):
$$\operatorname{var}\Bigl(\widehat{\mathcal{R}}^{\mathrm{vf}}\bigl(\hat f_m; D_n; B\bigr)\Bigr) = \frac{2\sigma^2}{n} + \frac{4\sigma^4}{n^2} \Bigl(4 + \frac{4}{V - 1} + \frac{2}{(V - 1)^2} + \frac{1}{(V - 1)^3}\Bigr) + o(n^{-2})$$

⇒ decreasing with $V$; the dependence on $V$ appears only in second-order terms.
Risk estimation and estimator selection are different goals

$$\hat m \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \bigl\{ \widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_m) \bigr\} \qquad \text{vs.} \qquad m^\star \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \bigl\{ \mathcal{R}\bigl(\hat f_m(D_n)\bigr) \bigr\}$$

For any $Z$ (deterministic or random), $\hat m \in \operatorname*{arg\,min}_{m \in \mathcal{M}} \bigl\{ \widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_m) + Z \bigr\}$ ⇒ the bias and variance of the risk estimator are meaningless by themselves.

Perfect ranking among $(\hat f_m)_{m \in \mathcal{M}}$ ⇔ $\forall m, m' \in \mathcal{M}$,
$$\operatorname{sign}\Bigl(\widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_m) - \widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_{m'})\Bigr) = \operatorname{sign}\Bigl(\mathcal{R}(\hat f_m) - \mathcal{R}(\hat f_{m'})\Bigr)$$
⇒ $\mathbb{E}\bigl[\widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_m) - \widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_{m'})\bigr]$ should have the right sign (unbiased risk estimation heuristic: AIC, $C_p$, leave-one-out, ...)
⇒ $\operatorname{var}\bigl(\widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_m) - \widehat{\mathcal{R}}^{\mathrm{cv}}(\hat f_{m'})\bigr)$ should be minimal (detailed heuristic: A. & Lerasle, 2016)
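The first point, that any shift $Z$ of the criterion leaves the selected $\hat m$ unchanged, so the estimator's own bias and variance are not what matters, is one line of code (illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
cv_crit = rng.standard_normal(10)     # CV risk estimates of 10 estimators
Z = rng.standard_normal()             # any shift, deterministic or random
# The arg-min, i.e. the selected estimator m_hat, is shift-invariant:
print(np.argmin(cv_crit) == np.argmin(cv_crit + Z))   # → True
```

Only the differences between criterion values, their signs and fluctuations, affect which estimator is picked.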
CV with an estimation goal: the big picture ($\mathcal{M}$ “small”)

At first order, the bias drives the performance of: leave-$p$-out; $V$-fold CV; Monte-Carlo CV if $V \gg n^2$; or any CV with $n_v$ large enough (including hold-out). CV then performs similarly to
$$\operatorname*{arg\,min}_{m \in \mathcal{M}} \Bigl\{ \mathbb{E}\bigl[\mathcal{R}\bigl(\hat f_m(D_{n_t})\bigr)\bigr] \Bigr\}$$
⇒ first-order optimality if $n_t \sim n$
⇒ suboptimal otherwise, e.g., $V$-fold CV with $V$ fixed.

Theoretical results at least for least-squares regression and density estimation.
Bias-corrected VFCV / $V$-fold penalization

Bias-corrected $V$-fold CV (Burman, 1989):
$$\widehat{\mathcal{R}}^{\mathrm{vf,corr}}\bigl(\hat f_m; D_n; B\bigr) := \widehat{\mathcal{R}}^{\mathrm{vf}}\bigl(\hat f_m; D_n; B\bigr) + \widehat{\mathcal{R}}_n\bigl(\hat f_m\bigr) - \frac{1}{V} \sum_{j=1}^{V} \widehat{\mathcal{R}}_n\bigl(\hat f_m^{(-j)}\bigr) = \widehat{\mathcal{R}}_n\bigl(\hat f_m(D_n)\bigr) + \underbrace{\operatorname{pen}_{\mathrm{VF}}\bigl(\hat f_m; D_n; B\bigr)}_{V\text{-fold penalty (A., 2008)}}$$

In least-squares density estimation (A. & Lerasle, 2016):
$$\widehat{\mathcal{R}}^{\mathrm{vf}}\bigl(\hat f_m; D_n; B\bigr) = \widehat{\mathcal{R}}_n\bigl(\hat f_m(D_n)\bigr) + \underbrace{\Bigl(1 + \frac{1}{2(V - 1)}\Bigr)}_{\text{overpenalization factor}} \operatorname{pen}_{\mathrm{VF}}\bigl(\hat f_m; D_n; B\bigr)$$
$$\widehat{\mathcal{R}}^{\ell\mathrm{po}}\bigl(\hat f_m; D_n\bigr) = \widehat{\mathcal{R}}_n\bigl(\hat f_m(D_n)\bigr) + \Bigl(1 + \frac{1}{2(n/p - 1)}\Bigr) \operatorname{pen}_{\mathrm{VF}}\bigl(\hat f_m; D_n; B_{\mathrm{loo}}\bigr)$$
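Burman's corrected criterion can be sketched generically; `learn` and `loss` again stand for the learning rule and the contrast $c$ (hypothetical callables, not code from the talk):

```python
import numpy as np

def vfold_corrected(learn, X, Y, blocks, loss):
    """Bias-corrected V-fold CV (Burman, 1989):
    R_vf + R_n(f_hat) - (1/V) * sum_j R_n(f_hat^(-j)),
    equivalently R_n(f_hat) plus the V-fold penalty."""
    n, V = len(X), len(blocks)
    emp_risk = lambda f: float(np.mean(loss(f(X), Y)))  # R_n on the full sample
    r_vf = r_fold = 0.0
    for B in blocks:                      # B: validation indices of fold j
        mask = np.ones(n, dtype=bool)
        mask[B] = False
        f_j = learn(X[mask], Y[mask])     # f_hat^(-j)
        r_vf += float(np.mean(loss(f_j(X[B]), Y[B])))    # hold-out term
        r_fold += emp_risk(f_j)                          # R_n(f_hat^(-j))
    return r_vf / V + emp_risk(learn(X, Y)) - r_fold / V
```

Sanity check of the correction: for a learning rule that ignores the data, every term coincides with $\widehat{\mathcal{R}}_n$ and the criterion reduces exactly to the empirical risk, as it should.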
Variance and estimator selection

$$\Delta(m, m', V) = \widehat{\mathcal{R}}^{\mathrm{vf,corr}}\bigl(\hat f_m\bigr) - \widehat{\mathcal{R}}^{\mathrm{vf,corr}}\bigl(\hat f_{m'}\bigr)$$

Theorem (A. & Lerasle, 2016, least-squares density estimation):
$$\operatorname{var}\bigl(\Delta(m, m', V)\bigr) = 4 \Bigl(1 + \frac{2}{n} + \frac{1}{n^2}\Bigr) \frac{\operatorname{var}_P(f_m^\star - f_{m'}^\star)}{n} + 2 \Bigl(1 + \frac{4}{V - 1} - \frac{1}{n}\Bigr) \underbrace{\frac{B(m, m')}{n^2}}_{> 0}$$

If $S_m \subset S_{m'}$ are two histogram models with constant bin sizes $d_m^{-1}$, $d_{m'}^{-1}$, then $B(m, m') \propto \|f_m^\star - f_{m'}^\star\| \, d_m$.
The two terms are of the same order if $\|f_m^\star - f_{m'}^\star\| \approx d_m / n$.