• Aucun résultat trouvé

Regression: data (X

N/A
N/A
Protected

Academic year: 2022

Partager "Regression: data (X"

Copied!
56
0
0

Texte intégral

(1)

1/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Analyse du risque de forêts purement aléatoires

Sylvain Arlot1 (collaboration avec Robin Genuer2)

1Université Paris-Sud

2ISPED, Université Bordeaux 2

Séminaire AgroParisTech, Paris 1er Octobre 2018

Références: arXiv:1407.3939 arXiv:1604.01515

(2)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Outline

1 Random forests

2 Purely random forests

3 Toy forests in one dimension

4 Hold-out random forests

(3)

3/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Outline

1 Random forests

2 Purely random forests

3 Toy forests in one dimension

4 Hold-out random forests

(4)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Regression: data (X

1

, Y

1

), . . . , (X

n

, Y

n

)

−3

−2

−1 0 1 2 3 4

(5)

5/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Goal: find the signal (denoising)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

−3

−2

−1 0 1 2 3 4

(6)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Regression

Data Dn: (X1,Y1), . . . ,(Xn,Yn)∈Rd×R (i.i.d. ∼P) Yi =s?(Xi) +εi

with s?(X) =E[Y|X] (regression function).

Goal: learn f measurable function X →Rs.t. the quadratic risk

E(X,Y)∼P

h

f(X)−s?(X)2i is minimal.

(7)

6/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Regression

Data Dn: (X1,Y1), . . . ,(Xn,Yn)∈Rd×R (i.i.d. ∼P) Yi =s?(Xi) +εi

with s?(X) =E[Y|X] (regression function).

Goal: learn f measurable function X →Rs.t. the quadratic risk

E(X,Y)∼P

h f(X)−s?(X)2i is minimal.

(8)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Regression tree (Breiman et al, 1984)

Tree: piecewise-constant

predictor, obtained by partitioning recursivelyRd.

Restriction: splits parallel to the axes.

1 Choice of the partition U (tree structure)

2 For eachλ∈U(tree leaf), choice of the estimationβbλ ofs?(x) whenxλ. Here,βbλ=Yλ average of the (Yi)Xi∈λ.

(9)

7/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Regression tree (Breiman et al, 1984)

Tree: piecewise-constant

predictor, obtained by partitioning recursivelyRd.

1 Choice of the partition U (tree structure)

Usually, at each step, one looks for the best split of the data into two groups

(minimize sum of

within-group variances)Dn.

2 For eachλ∈U(tree leaf), choice of the estimationβbλ ofs?(x) whenxλ. Here,βbλ=Yλ average of the (Yi)Xi∈λ.

(10)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Regression tree (Breiman et al, 1984)

Tree: piecewise-constant

predictor, obtained by partitioning recursivelyRd.

1 Choice of the partition U (tree structure)

2 For eachλ∈U(tree leaf), choice of the estimationβbλ ofs?(x) whenxλ.

Here,βbλ=Yλ average of the (Yi)Xi∈λ.

(11)

8/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Random forest (Breiman, 2001)

Definition (Random forest (Breiman, 2001)) n

bsΘj,16j 6qo collection of tree predictors, (Θj)16j6q i.i.d. r.v.

independent fromDn.

Random forest predictorbs obtained by aggregating the tree collection.

bs(x) = 1 q

q

X

j=1

bsΘj(x)

ensemble method (Dietterich, 1999, 2000)

powerfulstatistical learningalgorithm, for both classification andregression.

(12)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Bagging (“bootstrap aggregating”)

Bootstrap(Efron, 1979): draw n i.i.d. r.v., uniform over (Xi,Yi)/i = 1, . . . ,n (sampling with replacement)

⇒ resampleDnb

Bootstrapping a tree: bstreeb =bstree(Dnb)

Bagging: bootstrap (q independent resamples) then aggregation

bsbagging(x) = 1 q

q

X

j=1

bstreeb,j (x)

(13)

10/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Random Forest-Random Inputs (Breiman, 2001)

Definition (RI tree)

In a RI tree, at each node,mtry variables are randomly chosen.

Then, the best cut direction is chosen only among the chosen variables.

Definition (Random forest RI)

A random forest RI (RF-RI) is obtained byaggregating RI trees built on independentbootstrap resamples.

RF-RI ⇔ bagging on RI trees

(14)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Random Forest-Random Inputs

Dn

Bootstrap

uu {{ ((

Dnb,1

RI tree

Dnb,2

. . . . . . Dnb,q

. . . . . .

bsΘ1

Aggregation )) bsΘ2

##

. . . . . . bsΘq

vv

bsRF−RI

(15)

12/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Example of application of random forests: Kinect

Depth image ⇒ depth comparison features at each pixel

⇒body part at each pixel ⇒body part positions ⇒ · · ·

Figures from Shotton et al (2011)

(16)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Theoretical results on RF-RI

Few theoretical results on Breiman’s original RF-RI Most results:

focus on aspecific partof the algorithm (resampling, split criterion),

modifythe algorithm (eg, subsampling instead of resampling) makestrong assumptionsons?

References (see survey paperby Biau and Scornet, 2016):

Mentch & Hooker (2014), Scornet, Biau & Vert (2015), Wager & Athey (2015), ...

⇒ Here, we consider simplified RF models, for which a precise analysis is possible: purely random forests

(17)

13/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Theoretical results on RF-RI

Few theoretical results on Breiman’s original RF-RI Most results:

focus on aspecific partof the algorithm (resampling, split criterion),

modifythe algorithm (eg, subsampling instead of resampling) makestrong assumptionsons?

References (see survey paperby Biau and Scornet, 2016):

Mentch & Hooker (2014), Scornet, Biau & Vert (2015), Wager & Athey (2015), ...

⇒ Here, we consider simplified RF models, for which a precise analysis is possible: purely random forests

(18)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Outline

1 Random forests

2 Purely random forests

3 Toy forests in one dimension

4 Hold-out random forests

(19)

15/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Purely random forests

Definition (Purely random tree) bsU(x) =X

λ∈U

Yλ(Dn)1x∈λ

whereYλ(Dn) is the average of (Yi)Xi∈λ ,(Xi,Yi)∈Dn and the partitionUis independent fromDn.

Definition (Purely random forest)

bs(x) = 1 q

q

X

j=1

bsUj(x) withU1, . . . ,Uq i.i.d., independent fromDn.

Example (“hold-out RF” model): use someextra dataD0nfor building the trees: Uj =URI(Dn0?j)) (can be done by splitting the sample into two subsamplesDn andDn0).

B

From now on, Dn is the sample used for computing the Yλ(Dn), and we assume its size is n.

(20)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Purely random forests

Definition (Purely random forest)

bs(x) = 1 q

q

X

j=1

bsUj(x) = 1 q

q

X

j=1

X

λ∈Uj

Yλ(Dn)1x∈λ

withU1, . . . ,Uq i.i.d., independent fromDn.

Example (“hold-out RF” model): use someextra data Dn0 for building the trees: Uj =URI(Dn0?j)) (can be done by splitting the sample into two subsamplesDn andDn0).

B

From now on, Dn is the sample used for computing the Yλ(Dn), and we assume its size is n.

(21)

15/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Purely random forests

Definition (Purely random forest)

bs(x) = 1 q

q

X

j=1

bsUj(x) = 1 q

q

X

j=1

X

λ∈Uj

Yλ(Dn)1x∈λ

withU1, . . . ,Uq i.i.d., independent fromDn.

Example (“hold-out RF” model): use someextra data Dn0 for building the trees: Uj =URI(Dn0?j)) (can be done by splitting the sample into two subsamplesDn andDn0).

B

From now on, Dn is the sample used for computing the Yλ(Dn), and we assume its size is n.

(22)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Purely random forests

U1

U2

. . . . . . Uq

UsingDn, with or without resampling

Independent fromDn

. . . . . .

. . . . . .

bsU1

Aggregation ((

bsU2

""

. . . . . . bsUq

vv

bsPRF

(23)

17/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Purely random forests: theory

Consistency: Biau, Devroye & Lugosi (2008), Scornet (2014)

Rates of convergence: Breiman (2004), Biau (2012)

Some adaptivity to dimension reduction(sparse framework):

Biau (2012)

Forests decrease the estimation error(Biau, 2012; Genuer, 2012)

⇒ What about approximation error? Almost the same for a forest and a tree?

(24)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Purely random forests: theory

Consistency: Biau, Devroye & Lugosi (2008), Scornet (2014)

Rates of convergence: Breiman (2004), Biau (2012)

Some adaptivity to dimension reduction(sparse framework):

Biau (2012)

Forests decrease the estimation error(Biau, 2012; Genuer, 2012)

⇒ What about approximation error?

Almost the same for a forest and a tree?

(25)

18/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk of a single tree (regressogram)

Given the partitionU, regressogram estimator bsU(x) := X

λ∈U

Yλ1x∈λ

whereYλ is the average of (Yi)Xi∈λ.

bsU∈argmin

f∈SU

(1 n

n

X

i=1

Yif(Xi)2 )

whereSU is the vector space of functions which are constant over eachλ∈U.

Define:

˜sU(x) := X

λ∈U

βλ1x∈λ whereβλ :=E[s?(X)|Xλ] .

⇒˜sU∈argminf∈S

UE

h f(X)−s?(X)2i and ˜sU(x) =EbsU(x)|U

(26)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk of a single tree (regressogram)

Given the partitionU, regressogram estimator bsU(x) := X

λ∈U

Yλ1x∈λ

whereYλ is the average of (Yi)Xi∈λ.

bsU∈argmin

f∈SU

(1 n

n

X

i=1

Yif(Xi)2 )

whereSU is the vector space of functions which are constant over eachλ∈U.

Define:

˜sU(x) := X

λ∈U

βλ1x∈λ whereβλ :=E[s?(X)|Xλ] .

(27)

19/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk decomposition: single tree

E h

bsU(X)−s?(X)2i

=E

h ˜sU(X)−s?(X)2i+E h

bsU(X)−˜sU(X)2i

= Approximation error + Estimation error

Ifs? is smooth,X ∼ U([0,1]) andUregular partition intoD pieces, then

E

h ˜sU(X)−s?(X)2i∝ 1 D2

If var(Y|X) =σ2 does not depend onX, then E

h

bsU(X)−˜sU(X)2iσ2D n

(28)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk decomposition: single tree

E h

bsU(X)−s?(X)2i

=E

h ˜sU(X)−s?(X)2i+E h

bsU(X)−˜sU(X)2i

= Approximation error + Estimation error Ifs? is smooth,X ∼ U([0,1]) andUregular partition intoD pieces, then

E

h ˜sU(X)−s?(X)2i∝ 1 D2

If var(Y|X) =σ2 does not depend onX, then E

h

bsU(X)−˜sU(X)2iσ2D n

(29)

19/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk decomposition: single tree

E h

bsU(X)−s?(X)2i

=E

h ˜sU(X)−s?(X)2i+E h

bsU(X)−˜sU(X)2i

= Approximation error + Estimation error Ifs? is smooth,X ∼ U([0,1]) andUregular partition intoD pieces, then

E

h ˜sU(X)−s?(X)2i∝ 1 D2

If var(Y|X) =σ2 does not depend onX, then E

h

bsU(X)−˜sU(X)2iσ2D n

(30)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Approximation and estimation errors

0 20 40 60 80 100

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35

Approx. error Estim. error E[Risk]

(31)

21/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk decomposition: purely random forest

(Uj)16j6q finite partitions, i.i.d. ∼ U Estimator (forest): bsU1···q(x) := 1

q

q

X

j=1

bsUj(x) Ideal forest: ˜sU1···q(x) := 1

q

q

X

j=1

˜sUj(x) =EbsU1···q(x)|U1···q

Quadratic risk decomposition (givenX =x) E

h

bsU1···q(x)−s?(x)2i=E

h ˜sU1···q(x)−s?(x)2i +E

h

bsU1···q(x)−˜sU1···q(x)2i Approximation error: BU,q(x) :=E

h

˜

sU1···q(x)−s?(x)2i

(32)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Risk decomposition: purely random forest

(Uj)16j6q finite partitions, i.i.d. ∼ U Estimator (forest): bsU1···q(x) := 1

q

q

X

j=1

bsUj(x) Ideal forest: ˜sU1···q(x) := 1

q

q

X

j=1

˜sUj(x) =EbsU1···q(x)|U1···q

Quadratic risk decomposition (givenX =x) E

h

bsU1···q(x)−s?(x)2i=E

h ˜sU1···q(x)−s?(x)2i +E

h

bsU1···q(x)−˜sU1···q(x)2i Approximation error: B (x) :=E

h

˜

s (x)−s?(x)2i

(33)

22/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Bias decomposition (given X = x )

BU,q(x) =BU,∞(x) + VU(x) q where BU,∞(x) :=E˜sU(x)s?(x)2

and VU(x) := var ˜sU(x)

BU,∞(x) is the approx. error of the infinite forest: ˜sU,∞(x) :=E˜sU(x) to be compared with theapproximation error of a single tree

BU,1(x) =BU,∞(x) +VU(x)

(34)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Outline

1 Random forests

2 Purely random forests

3 Toy forests in one dimension

4 Hold-out random forests

(35)

24/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Toy forests in one dimension

Assume: X = [0,1) andX uniform over [0,1)

U∼ Uktoy defined by:

U=

0,1−T k

,

1−T

k ,2−T k

, . . . ,

kT

k ,1

whereT has uniform distribution over [0,1].

(36)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Interpretation of the ideal infinite forest

Proposition (A. & Genuer, 2014)

For any xh1k,1−1ki, the ideal infinite forest at x satisfies:

˜

sU,∞(x) = (s?hk)(x) = Z 1

0

s?(t)hk(x−t)dt where

hk(u) =

k(1−ku) if 06u6 1k k(1 +ku) if1k 6u 60 0 if |u|> 1k

(37)

26/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Interpretation of the ideal infinite forest: proof

IU(x) := the interval ofU to whichx belongs

˜

sU(x) = 1

|IU(x)|

Z

IU(x)

s?(t)dt

Ifxh1k,1−1ki, IU(x) =hx+Vxk−1,x+Vkx whereVx has uniform distribution over [0,1].

˜

sU,∞(x) =EU

˜sU(x)

=k Z 1

0

s?(t)P

x+Vx−1

k 6t<x+ Vx

k

dt

=k Z 1

0

s?(t)P k(tx)<Vx 6k(tx) + 1dt

(38)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Interpretation of the ideal infinite forest: proof

IU(x) := the interval ofU to whichx belongs

˜

sU(x) = 1

|IU(x)|

Z

IU(x)

s?(t)dt

Ifxh1k,1−1ki, IU(x) =hx+Vxk−1,x+Vkx whereVx has uniform distribution over [0,1].

˜

sU,∞(x) =EU

˜sU(x)

=k Z 1

0

s?(t)P

x+Vx−1

k 6t<x+ Vx

k

dt

=k Z 1

0

s?(t)P k(tx)<Vx 6k(tx) + 1dt

(39)

27/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Analysis of the approximation error

(H2) s? twice differentiable over (0,1) ands?00 bounded Taylor-Lagrange formula: for everyt ∈(0,1), some ct,x ∈(0,1) exists such that

s?(t)−s?(x) =s?0(x)(t−x) +1

2s?00(ct,x)(t−x)2

Therefore,

˜

sU(x)−s?(x) =k Z x+Vxk

x+Vxk−1

(s?(t)−s?(x))dt

=k s?0(x) Z x+Vxk

x+Vxk−1

(t−x)dt+R1(x)

= s?0(x) k

Vx−1

2

+R1(x) whereR1(x) = k2 Rx+

Vx k

x+Vx−1k s?00(ct,x)(t−x)2dt

(40)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Analysis of the approximation error

(H2) s? twice differentiable over (0,1) ands?00 bounded Taylor-Lagrange formula: for everyt ∈(0,1), some ct,x ∈(0,1) exists such that

s?(t)−s?(x) =s?0(x)(t−x) +1

2s?00(ct,x)(t−x)2 Therefore,

˜

sU(x)−s?(x) =k Z x+Vxk

x+Vxk−1

(s?(t)−s?(x))dt

=k s?0(x) Z x+Vxk

x+Vx−1k

(t−x)dt+R1(x)

= s?0(x) k

Vx−1

2

+R1(x)

x+Vx

(41)

28/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Analysis of the approximation error

EU˜sU(x)−s?(x)2 6

k4 VU(x) ∼

k→+∞

k2 Proposition (A. & Genuer, 2014)

Assuming (H2), for every xhk1,1−1ki, BUtoy

k ,1(x) ∼

k→+∞

k2 BUtoy

k ,∞(x)6

k4 Z 1−1

k 1 k

BUtoy

k ,1(x)dx ∼

k→+∞

k2

Z 1−1

k 1 k

BUtoy

k ,∞(x)dx 6

k4 Ratek−4 is tight assuming:

(H3) s? three times differentiable over (0,1) ands?000 bounded

(42)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Estimation error

General fact (Jensen’s inequality):

E h

bsU,(X)−˜sU,(X)2i6E h

bsU(X)−˜sU(X)2i

For the toy forest, without any resampling for computing labels and assuming that var(Y|X) =σ2:

E h

bsU(X)−˜sU(X)2iσ2k n E

h

bsU,∞(X)−˜sU,(X)2i≈ 2 3

σ2k n (A. & Genuer, 2016)

(43)

29/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Estimation error

General fact (Jensen’s inequality):

E h

bsU,(X)−˜sU,(X)2i6E h

bsU(X)−˜sU(X)2i For the toy forest, without any resampling for computing labels and assuming that var(Y|X) =σ2:

E h

bsU(X)−˜sU(X)2iσ2k n E

h

bsU,∞(X)−˜sU,(X)2i≈ 2 3

σ2k n (A. & Genuer, 2016)

(44)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Summary: risk analysis

Single tree Infinite forest

(q= 1) (q=∞)

E h

bsU1···q(x)−s?(x)2ic1(s?,x) k2 +σ2k

n

c2(s?,x)

k4 +2σ2k 3n

where c1(s?,x) =s?0(x)2

12 and c2(s?,x) =s?00(x)2 144 . Assumptions:

x ∈(0,1) far from boundary

(H3) s? three times differentiable over (0,1) ands?000 bounded X uniform over [0,1]

var(Y|X) =σ2

(45)

31/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Rates of convergence

Corollary: risk convergence rates (far from boundaries, withk =kn? optimal):

Tree> n−2/3

Infinite forest6 n−4/5 ⇒ minimax C2

Remarks:

q > (kn?)2 is sufficient to get an “infinite” forest with subsampling a out ofn for computing labels: estimation error of a single tree σa2k instead of σn2k; no change for infinite forest

(46)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Rates of convergence

Corollary: risk convergence rates (far from boundaries, withk =kn? optimal):

Tree> n−2/3

Infinite forest6 n−4/5 ⇒ minimax C2

Remarks:

q > (kn?)2 is sufficient to get an“infinite” forest with subsampling a out ofn for computing labels:

estimation error of a single tree σa2k instead of σn2k; no change for infinite forest

(47)

32/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Outline

1 Random forests

2 Purely random forests

3 Toy forests in one dimension

4 Hold-out random forests

(48)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Definition (Biau, 2012)

SplitDn into Dn1 andDn2 U1

U2

. . . Uq

UsingDn2, no resampling here

RI partitions, usingDn1

. . . . . .

bsU1

Aggregation )) bsU2

##

. . . bsUq

{{

bsHO−RF

(49)

34/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Numerical experiments: framework

Data generation:

Xi ∼ U([0,1]d) Yi =s?(Xi) +εi

εi ∼ N(0, σ2) σ2= 1/16 s? :x∈[0,1]d 7→ 1

10×h10 sin(πx1x2) + 20(x3−0.5)2+ 10x4+ 5x5i. Data split: n1 = 1 280 n2= 25 600

Forests definition:

nodesize= 1 k ∈ {25,26,27,28}

“Large” forests are made ofq =k trees.

Compute integrated approximation/estimation errors

(50)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Numerical experiments: results (d = 5)

Single tree Large forest No bootstrap

mtry=d

0.13

k0.17 +1.04σ2k n2

0.13

k0.17+ 1.04σ2k n2 Bootstrap

mtry=d

0.14

k0.17 +1.06σ2k n2

0.15

k0.29+ 0.08σ2k n2

No bootstrap mtry=bd/3c

0.23

k0.19 +1.01σ2k n2

0.06

k0.31+ 0.06σ2k n2

Bootstrap mtry=bd/3c

0.25

k0.20 +1.02σ2k n2

0.06

k0.34+ 0.05σ2k n2

2

d + 2 ≈0.286 4

d+ 4 ≈0.444

(51)

36/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Numerical experiments: results (d = 10)

Single tree Large forest No bootstrap

mtry=d

0.11

k0.12 +1.03σ2k n2

0.11

k0.12+ 1.03σ2k n2 Bootstrap

mtry=d

0.11

k0.11 +1.05σ2k n2

0.10

k0.19+ 0.04σ2k n2

No bootstrap mtry=bd/3c

0.21

k0.18 +1.08σ2k n2

0.08

k0.25+ 0.04σ2k n2

Bootstrap mtry=bd/3c

0.20

k0.16 +1.05σ2k n2

0.07

k0.26+ 0.03σ2k n2

2

d + 2 ≈0.167 4

d+ 4 ≈0.286

(52)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Conclusion

Forests improve the order of magnitudeof the approximation error, compared to a single tree

Estimation error seems to change only by a constant factor (at least for toy forests);

not contradictory with literature: here, we fix k; different picture if nodesizeis fixed (+subsampling)

Randomization:

randomization of labels seems to have no impact;

strong impact of randomization of partitions(hold-out RF: both bootstrap and mtry)

(53)

37/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Conclusion

Forests improve the order of magnitudeof the approximation error, compared to a single tree

Estimation error seems to change only by a constant factor (at least for toy forests);

not contradictory with literature: here, we fix k; different picture if nodesizeis fixed (+subsampling)

Randomization:

randomization of labels seems to have no impact;

strong impact of randomization of partitions(hold-out RF:

both bootstrap and mtry)

(54)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Approximation error: generalization

General result on the approximation errorunder (H2)/(H3):

e.g., roughly, if x iscentered in its cell (on average overU), tree approx. error∝M2 infinite forestapprox. error∝M22 whereM2 ≈averagesquare distance fromx to the boundary of its cell (∝k−2 for toy forests)

toy forests in dimension d: approximation error∝k−2/d vs. k−4/d (infinite forest reachesminimaxC2 rates)

purely uniformly random forests in dimension 1(split a random cell, chosen with probability equal to its volume): ≈toy balanced purely random forests(full binary tree, uniform splits) in dimension d: k−α (tree) vs. k−2α (forest) where α=−log21− 2d1 ⇒ not minimax rates!

Mondrian forests(Mourtada, Gaïffas & Scornet 2018).

(55)

38/39

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Approximation error: generalization

General result on the approximation errorunder (H2)/(H3):

e.g., roughly, if x iscentered in its cell (on average overU), tree approx. error∝M2 infinite forestapprox. error∝M22 whereM2 ≈averagesquare distance fromx to the boundary of its cell (∝k−2 for toy forests)

toy forests in dimension d: approximation error∝k−2/d vs.

k−4/d (infinite forest reachesminimaxC2 rates)

purely uniformly random forests in dimension 1(split a random cell, chosen with probability equal to its volume): ≈toy balanced purely random forests(full binary tree, uniform splits) in dimension d: k−α (tree) vs. k−2α (forest) where α=−log21− 2d1 ⇒ not minimax rates!

Mondrian forests(Mourtada, Gaïffas & Scornet 2018).

(56)

Random forests Purely random forests Toy forests Hold-out random forests Conclusion

Open problems / future work

Theory on approximation error of hold-out RF?

⇒ understand the typical shape of the cell that contains x, for a RI tree

(x centered on average? square distance to boundary?)

Theory on estimation errorof other models (beyond toy)?

of hold-out RF?

Extensive numerical experiments? (other functions s?, ...)

Références

Documents relatifs

Forests improve the order of magnitude of the approximation error, compared to a single tree. Estimation error seems to change only by a constant factor (at least for

In contrast to traditional methods of neural network model building based on feed- forward layered networks [3, 4], which build a single model for the entire feature

Such simulations extend the training grammar, and, although the simulations are not wholly correct or incorrect—as ob- served from the results with different weightings for

Optimal control, reduced basis method, a posteriori error estimation, model order reduction, parameter- dependent systems, partial differential equations, parabolic problems.. ∗

Also, this section strives to critically review techniques designed to boost the L 2 -convergence rate without excessive additional computational effort An a priori and an a

As there are certain limitations due to the small-sized and heavily skewed datasets and error functions, we utilize a simula- tion technique, namely the bootstrap method in order

Key Words: Extreme quantiles estimation, relative approximation er- ror, asymptotic properties,

This paper investigates the lag length selection problem of a Vector Error Correction Model (VECM) by using a convergent information criterion and tools based on the