Pruning randomized trees with L1-norm regularization


Method: pruning with L1-regularization

Conclusion and perspectives

Results

Supervised learning is often applied to high-dimensional data:

- 3D image segmentation (Kinect),

- stability prediction in electrical networks,

- genome studies,

- ...

A robust supervised learning method: ensembles of randomized decision trees (e.g., extremely randomized trees, Geurts et al., 2006). Each decision tree is a hierarchical set of questions that leads to a prediction.

High-dimensional data often requires huge ensembles of randomized trees to achieve good performance, i.e.

- a high number of trees,

- each tree fully developed (no pruning).

Consequence: a huge amount of storage is required.

Variable selection is performed by applying L1-norm regularization (LASSO) to the new training samples, i.e. to the samples re-expressed through the node characteristic functions.
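
The optimization problem itself did not survive in this text. As a hedged reconstruction, the standard constrained form of the LASSO applied to node indicators could be written as below. All symbols are assumptions introduced for illustration: n training pairs (x_i, y_i), M trees, N_m nodes in tree m, w_{m,l} the prediction weight of node l in tree m, I_{m,l} its node characteristic function, and t the L1 budget (the quantity on the horizontal axis of the result plots).

```latex
\begin{equation}
\min_{w} \; \sum_{i=1}^{n} \Bigl( y_i - \sum_{m=1}^{M} \sum_{l=1}^{N_m} w_{m,l}\, I_{m,l}(x_i) \Bigr)^{2}
\quad \text{subject to} \quad
\sum_{m=1}^{M} \sum_{l=1}^{N_m} \lvert w_{m,l} \rvert \le t
\end{equation}
```

A smaller t forces more weights to exactly zero, which is what makes pruning possible.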

Regularization preserves accuracy and can even improve it.

Drastic pruning.

Our experiments show that it is possible to drastically prune ensembles of randomized trees while preserving or even improving their accuracy.

Interested? Email me at [email protected] or visit www.ajoly.org

The original forest has 30000 test nodes.

The method is robust with respect to the number of trees and the pre-pruning parameter.

Each tree node is associated with a prediction weight and a node characteristic function, equal to 1 if the sample reaches the node and 0 otherwise.

Example for one tree and one sample:

[Figure: a decision tree with nodes numbered 1 to 5.]
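
To make the node characteristic function concrete, here is a minimal pure-Python sketch for one tree and one sample. The tree layout (node 1 as root, nodes 2 and 3 as its children, nodes 4 and 5 as children of node 3), the split tests, and the weights are all hypothetical values chosen for illustration. They are not taken from the poster's figure.

```python
# Hypothetical 5-node tree: internal (test) nodes map to a split test.
# Nodes missing from TREE (2, 4, 5) are leaves.
TREE = {
    1: {"feature": 0, "threshold": 0.5, "left": 2, "right": 3},
    3: {"feature": 1, "threshold": 0.2, "left": 4, "right": 5},
}
# Illustrative prediction weights, one per node.
WEIGHTS = {1: 0.0, 2: 1.5, 3: 0.0, 4: -0.7, 5: 2.0}


def characteristic_vector(x):
    """Return [I_1(x), ..., I_5(x)]: 1 if x reaches the node, 0 otherwise."""
    reached = {node: 0 for node in WEIGHTS}
    node = 1  # start at the root
    while True:
        reached[node] = 1
        if node not in TREE:  # leaf: stop
            break
        test = TREE[node]
        go_left = x[test["feature"]] <= test["threshold"]
        node = test["left"] if go_left else test["right"]
    return [reached[n] for n in sorted(reached)]


def predict(x):
    """Prediction = sum of the weights of the nodes reached by x."""
    indicators = characteristic_vector(x)
    return sum(WEIGHTS[n] * i for n, i in zip(sorted(WEIGHTS), indicators))


x = [0.9, 0.1]
print(characteristic_vector(x))  # [1, 0, 1, 1, 0]
print(predict(x))                # -0.7
```

The sample goes right at node 1 and left at node 3, so only nodes 1, 3 and 4 have an indicator of 1.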

Solving this problem both prunes and reweights the randomized tree ensemble. A test node can be deleted if the weights of all its descendants are equal to zero.
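
The pruning rule can be sketched with scikit-learn, which is an assumption of this example and not necessarily the authors' implementation. `Lasso` solves the penalized form of the problem (parameter `alpha`) rather than the constrained form, and it fits an intercept by default. The dataset size, `alpha=0.05`, and the helper `count_kept_test_nodes` are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import Lasso

# Small Friedman1 sample; sizes and hyperparameters are illustrative only.
X, y = make_friedman1(n_samples=300, n_features=10, random_state=0)
forest = ExtraTreesRegressor(n_estimators=10, min_samples_split=10,
                             random_state=0).fit(X, y)

# Node characteristic functions of every node of every tree (sparse 0/1 matrix).
indicator, node_ptr = forest.decision_path(X)

# Penalized LASSO on the node indicators: one weight per tree node.
lasso = Lasso(alpha=0.05, max_iter=10000).fit(indicator, y)
weights = lasso.coef_


def count_kept_test_nodes(tree, w):
    """Count test nodes that have at least one descendant with nonzero weight."""
    left, right = tree.children_left, tree.children_right
    live_below = np.zeros(tree.node_count, dtype=bool)
    # scikit-learn stores children after their parent, so a reverse scan
    # visits every child before its parent.
    for node in range(tree.node_count - 1, -1, -1):
        if left[node] != -1:  # internal (test) node
            for child in (left[node], right[node]):
                if w[child] != 0 or live_below[child]:
                    live_below[node] = True
    return int(live_below.sum())


original = sum(int((e.tree_.children_left != -1).sum())
               for e in forest.estimators_)
kept = sum(
    count_kept_test_nodes(e.tree_, weights[node_ptr[i]:node_ptr[i + 1]])
    for i, e in enumerate(forest.estimators_)
)
print(f"test nodes: {original} before pruning, {kept} after")
```

A test node whose descendants all have zero weight is not counted, matching the deletion rule stated above. Larger values of `alpha` play the role of a smaller budget t and remove more nodes.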

[Figure: Friedman1 problem, M=100, n_min=1, K=10. Quadratic error versus t for regularised ET and ET, and number of test nodes versus t for regularised ET.]

[Figure, panel "Forest size": Friedman1 problem, n_min=10, K=10. Error and number of test nodes for regularised ET and ET.]

[Figure, panel "Pre-pruning effect": Friedman1 problem, M=100, K=10. Error and number of test nodes for regularised ET and ET.]


Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel

Systems and modeling, Department of EE and CS, University of Liège

Motivation

Goal of supervised learning: from a dataset of input-output pairs, learn a function that predicts the output for any (new) input.

The function thus embeds the original input space into a new space whose dimension is the total number of tree nodes.
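
As a rough illustration of this embedding, the sketch below uses scikit-learn's `decision_path` (an assumption of this example, not necessarily the tooling used by the authors) to map each input to a binary vector with one coordinate per tree node. The dataset and forest sizes are arbitrary.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import ExtraTreesRegressor

# Illustrative data and forest; the sizes are arbitrary.
X, y = make_friedman1(n_samples=200, n_features=10, random_state=0)
forest = ExtraTreesRegressor(n_estimators=5, random_state=0).fit(X, y)

# Row i holds the node characteristic functions of sample i for all trees;
# node_ptr[m]:node_ptr[m + 1] are the columns belonging to tree m.
indicator, node_ptr = forest.decision_path(X)

total_nodes = sum(est.tree_.node_count for est in forest.estimators_)
print("original input space:", X.shape[1], "dimensions")
print("embedded space:", indicator.shape[1], "dimensions")
```

Each sample follows exactly one root-to-leaf path per tree, so every row of the embedded matrix is sparse.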

Future research:

- How can the generation of a huge randomized tree ensemble prior to regularization be avoided?

- Study the links with compressed sensing.
