
AdaBoost

Jiri Matas and Jan Šochman

Centre for Machine Perception, Czech Technical University, Prague

http://cmp.felk.cvut.cz

Presentation Outline

AdaBoost algorithm
• Why is it of interest?
• How does it work?
• Why does it work?

AdaBoost variants

AdaBoost with a Totally Corrective Step (TCS)

Experiments with a Totally Corrective Step


Introduction

1990 – Boost-by-majority algorithm (Freund)
1995 – AdaBoost (Freund & Schapire)
1997 – Generalized version of AdaBoost (Schapire & Singer)
2001 – AdaBoost in Face Detection (Viola & Jones)

Interesting properties:

• AB is a linear classifier with all its desirable properties.
• AB output converges to the logarithm of the likelihood ratio.
• AB has good generalization properties.
• AB is a feature selector with a principled strategy (minimisation of an upper bound on the empirical error).
• AB is close to sequential decision making (it produces a sequence of gradually more complex classifiers).


What is AdaBoost?

AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination

$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

of "simple", "weak" classifiers $h_t(x)$.

Terminology

• $h_t(x)$ ... "weak" or basis classifier, hypothesis, "feature"
• $H(x) = \mathrm{sign}(f(x))$ ... "strong" or final classifier/hypothesis

Comments

• The $h_t(x)$'s can be thought of as features.
• Often (typically) the set $\mathcal{H} = \{h(x)\}$ is infinite.
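A minimal illustration of this definition (not from the slides; the weak classifiers and weights below are arbitrary examples): the strong classifier is simply a weighted vote over the weak classifiers.

```python
# Sketch: evaluating H(x) = sign(sum_t alpha_t * h_t(x)) for a scalar input x.
import numpy as np

def strong_classifier(x, alphas, weak_classifiers):
    f = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return np.sign(f)

# Hypothetical weak classifiers (simple thresholds) and weights:
hs = [lambda x: 1 if x > 0.0 else -1,
      lambda x: 1 if x > 0.5 else -1]
alphas = [0.8, 0.4]
print(strong_classifier(0.3, alphas, hs))   # weighted vote 0.8 - 0.4 > 0 -> 1.0
```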


(Discrete) AdaBoost Algorithm – Schapire & Singer (1997)

Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, 1\}$

Initialize weights $D_1(i) = 1/m$.

For $t = 1, \ldots, T$:

1. Call WeakLearn, which returns the weak classifier $h_t : X \to \{-1, 1\}$ with minimum error w.r.t. distribution $D_t$;
2. Choose $\alpha_t \in \mathbb{R}$,
3. Update

$$D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$

where $Z_t$ is a normalization factor chosen so that $D_{t+1}$ is a distribution.

Output the strong classifier:

$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
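For concreteness, here is a minimal sketch of this training loop in Python/NumPy (illustrative, not the authors' code). It assumes a `weak_learn(X, y, D)` routine that returns a classifier h (a function mapping one sample to −1 or +1) together with its weighted error; one possible weak learner is sketched after the WeakLearn slide below. The step size uses the optimal choice derived later in the presentation.

```python
import numpy as np

def adaboost_train(X, y, weak_learn, T=40):
    m = len(y)
    D = np.full(m, 1.0 / m)                    # D_1(i) = 1/m
    ensemble = []                              # [(alpha_t, h_t), ...]
    for t in range(T):
        h, eps = weak_learn(X, y, D)           # step 1: minimum weighted error
        if eps >= 0.5:                         # prerequisite eps_t < 1/2
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # step 2 (optimal alpha, see below)
        pred = np.array([h(x) for x in X])
        D *= np.exp(-alpha * y * pred)         # step 3: reweighting
        D /= D.sum()                           # normalization by Z_t
        ensemble.append((alpha, h))
    return ensemble

def adaboost_predict(ensemble, x):
    f = sum(alpha * h(x) for alpha, h in ensemble)
    return np.sign(f)                          # H(x) = sign(f(x))
```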


WeakLearn

Loop step: call WeakLearn, given distribution $D_t$; it returns a weak classifier $h_t : X \to \{-1, 1\}$ from $\mathcal{H} = \{h(x)\}$.

Select the weak classifier with the smallest weighted error:

$$h_t = \arg\min_{h_j \in \mathcal{H}} \varepsilon_j, \qquad \varepsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$$

Prerequisite: $\varepsilon_t < 1/2$ (otherwise stop).

WeakLearn examples:
• Decision tree builder, perceptron learning rule – $\mathcal{H}$ infinite
• Selecting the best one from a given finite set $\mathcal{H}$

Demonstration example: training set (shown on the slide); weak classifier = perceptron.
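One concrete WeakLearn, sketched here for illustration only (the slides use perceptrons in the demonstration and stumps in the experiments): exhaustively select the decision stump with the smallest D-weighted error. It can be plugged into the `adaboost_train` sketch above.

```python
import numpy as np

def stump_weak_learn(X, y, D):
    """Return (h, eps): the axis-aligned stump minimizing the weighted error, and that error."""
    m, n = X.shape
    best_err, best = np.inf, None
    for j in range(n):                                # feature index
        for thr in np.unique(X[:, j]):                # candidate thresholds
            for s in (+1, -1):                        # polarity
                pred = np.where(X[:, j] <= thr, s, -s)
                err = np.sum(D[pred != y])            # sum_i D(i) [y_i != h(x_i)]
                if err < best_err:
                    best_err, best = err, (j, thr, s)
    j, thr, s = best
    h = lambda x, j=j, thr=thr, s=s: s if x[j] <= thr else -s
    return h, best_err
```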


AdaBoost as a Minimiser of an Upper Bound on the Empirical Error

The main objective is to minimize the training error

$$\varepsilon_{tr} = \frac{1}{m}\left|\{i : H(x_i) \neq y_i\}\right|.$$

It can be upper bounded by

$$\varepsilon_{tr}(H) \le \prod_{t=1}^{T} Z_t.$$

How to set $\alpha_t$?

• Select $\alpha_t$ to greedily minimize $Z_t(\alpha)$ in each step.
• $Z_t(\alpha)$ is a convex differentiable function with one extremum.
• For $h_t(x) \in \{-1, 1\}$ the optimal value is

$$\alpha_t = \frac{1}{2}\log\frac{1+r_t}{1-r_t}, \qquad \text{where } r_t = \sum_{i=1}^{m} D_t(i)\,h_t(x_i)\,y_i.$$

• For the optimal $\alpha_t$, $Z_t = 2\sqrt{\varepsilon_t(1-\varepsilon_t)} \le 1$ ⇒ justification of the selection of $h_t$ according to $\varepsilon_t$.

Comments

• The process of selecting $\alpha_t$ and $h_t(x)$ can be interpreted as a single optimization step minimising the upper bound on the empirical error.
• Improvement of the bound is guaranteed, provided that $\varepsilon_t < 1/2$.
• The process can be interpreted as a component-wise local optimization.
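The following small numeric check (illustrative only, arbitrary value of $\varepsilon_t$) confirms the relations above: with $r_t = 1 - 2\varepsilon_t$ for a binary $h_t$, the two forms of the optimal step size coincide and the minimum of $Z_t(\alpha)$ equals $2\sqrt{\varepsilon_t(1-\varepsilon_t)}$.

```python
import numpy as np

eps = 0.3                            # hypothetical weighted error of h_t
r = 1.0 - 2.0 * eps                  # r_t = sum_i D_t(i) h_t(x_i) y_i for binary h_t
alpha = 0.5 * np.log((1 + r) / (1 - r))
print(np.isclose(alpha, 0.5 * np.log((1 - eps) / eps)))       # True

# Z_t(alpha) = (1 - eps) * exp(-alpha) + eps * exp(alpha) for h_t in {-1, +1}
Z = lambda a: (1 - eps) * np.exp(-a) + eps * np.exp(a)
print(np.isclose(Z(alpha), 2 * np.sqrt(eps * (1 - eps))))     # True
print(Z(alpha) < min(Z(alpha - 0.1), Z(alpha + 0.1)))         # True: alpha is the minimizer
```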


Reweighting: Effect on the Training Set

Reweighting formula:

$$D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}
            = \frac{\exp\!\left(-y_i \sum_{q=1}^{t} \alpha_q h_q(x_i)\right)}{m \prod_{q=1}^{t} Z_q}$$

$$\exp(-\alpha_t y_i h_t(x_i)) \;\left\{\begin{array}{ll} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \neq h_t(x_i) \end{array}\right.$$

⇒ Increase (decrease) the weight of wrongly (correctly) classified examples. The weight is the upper bound on the error of a given example!

[Figure: err as a function of $y f(x)$.]

Effect on $h_t$:

$\alpha_t$ minimizes $Z_t$ ⇒

$$\sum_{i:\, h_t(x_i) = y_i} D_{t+1}(i) = \sum_{i:\, h_t(x_i) \neq y_i} D_{t+1}(i),$$

i.e. the error of $h_t$ on $D_{t+1}$ is 1/2.
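A small numeric illustration of the last claim (toy labels and predictions, not from the slides): after updating with the optimal $\alpha_t$, the weighted error of $h_t$ under $D_{t+1}$ comes out as exactly 1/2, so $h_t$ carries no further information for the next round.

```python
import numpy as np

y    = np.array([+1, +1, -1, -1, +1])   # hypothetical labels
pred = np.array([+1, -1, -1, +1, +1])   # hypothetical predictions of h_t
D    = np.full(5, 0.2)                  # current distribution D_t

eps = np.sum(D[pred != y])              # weighted error of h_t under D_t (= 0.4)
alpha = 0.5 * np.log((1 - eps) / eps)   # optimal step size
D_next = D * np.exp(-alpha * y * pred)  # reweighting
D_next /= D_next.sum()                  # division by Z_t

print(np.sum(D_next[pred != y]))        # 0.5: error of h_t on D_{t+1}
```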


Summary of the Algorithm

Initialization (as above): $D_1(i) = 1/m$.

For $t = 1, \ldots, T$:

• Find $h_t = \arg\min_{h_j \in \mathcal{H}} \varepsilon_j$, with $\varepsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
• If $\varepsilon_t \ge 1/2$ then stop
• Set $\alpha_t = \frac{1}{2}\log\frac{1+r_t}{1-r_t}$
• Update $D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}$

Output the final classifier:

$$H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$

[Figure sequence: the state of the demonstration example after t = 1, 2, ..., 7 and t = 40 rounds.]


Does AdaBoost generalize?

Margins in SVM:

$$\max \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_2}$$

Margins in AdaBoost:

$$\max \min_{(x,y) \in S} \frac{y\,(\vec{\alpha} \cdot \vec{h}(x))}{\|\vec{\alpha}\|_1}$$

Maximizing margins in AdaBoost:

$$P_S[y f(x) \le \theta] \le 2^T \prod_{t=1}^{T} \sqrt{\varepsilon_t^{\,1-\theta} (1-\varepsilon_t)^{1+\theta}}, \qquad \text{where } f(x) = \frac{\vec{\alpha} \cdot \vec{h}(x)}{\|\vec{\alpha}\|_1}$$

Upper bounds based on margin:

$$P_D[y f(x) \le 0] \le P_S[y f(x) \le \theta] + O\!\left(\frac{1}{\sqrt{m}} \left(\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)\right)^{1/2}\right)$$
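To make the margin quantity concrete, here is a tiny illustrative computation (arbitrary weights and weak-classifier outputs): the normalized margin of a training example is $y f(x)$ with $f(x) = (\vec{\alpha} \cdot \vec{h}(x)) / \|\vec{\alpha}\|_1 \in [-1, 1]$, and the bound above concerns the fraction of examples whose margin falls below $\theta$.

```python
import numpy as np

alphas = np.array([0.9, 0.5, 0.3])            # hypothetical alpha vector
H_out  = np.array([[+1, +1, -1],              # rows: examples
                   [+1, -1, -1],              # cols: h_1(x), h_2(x), h_3(x)
                   [-1, -1, +1]])
y = np.array([+1, +1, -1])

f = H_out @ alphas / np.abs(alphas).sum()     # f(x) = (alpha . h(x)) / ||alpha||_1
margins = y * f
print(margins)                                # per-example normalized margins
print(np.mean(margins <= 0.2))                # empirical P_S[y f(x) <= theta] for theta = 0.2
```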


AdaBoost variants

Freund & Schapire 1995:
• Discrete ($h : X \to \{0, 1\}$)
• Multiclass AdaBoost.M1 ($h : X \to \{0, 1, \ldots, k\}$)
• Multiclass AdaBoost.M2 ($h : X \to [0, 1]^k$)
• Real-valued AdaBoost.R ($Y = [0, 1]$, $h : X \to [0, 1]$)

Schapire & Singer 1997:
• Confidence-rated prediction ($h : X \to \mathbb{R}$, two-class)
• Multilabel AdaBoost.MR, AdaBoost.MH (different formulation of the minimized loss)

... and many other modifications since then (Totally Corrective AdaBoost, Cascaded AdaBoost).


Pros and cons of AdaBoost

Advantages:
• Very simple to implement
• Feature selection on very large sets of features
• Fairly good generalization

Disadvantages:
• Suboptimal solution for $\bar{\alpha}$ (the weights are chosen greedily, one at a time)
• Can overfit in the presence of noise


AdaBoost with a Totally Corrective Step (TCA)

Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in X$, $y_i \in \{-1, 1\}$

Initialize weights $D_1(i) = 1/m$.

For $t = 1, \ldots, T$:

1. Call WeakLearn, which returns the weak classifier $h_t : X \to \{-1, 1\}$ with minimum error w.r.t. distribution $D_t$;
2. Choose $\alpha_t \in \mathbb{R}$,
3. Update $D_{t+1}$,
4. Totally corrective step: call WeakLearn on the set of $h$'s with non-zero $\alpha$'s, update the $\alpha$'s, update $D_{t+1}$; repeat until $|\varepsilon_t - 1/2| < \delta$ for all $t$.

Comments

• All weak classifiers have $\varepsilon_t \approx 1/2$, therefore the classifier selected at $t + 1$ is "independent" of all classifiers selected so far.
• It can easily be shown that the totally corrective step reduces the upper bound on the empirical error without increasing classifier complexity.
• The TCA was first proposed by Kivinen and Warmuth, but their $\alpha_t$ is set as in standard AdaBoost.
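One way to read step 4, sketched below as a coordinate-descent interpretation (an assumption, not the authors' implementation): keep revisiting the already selected weak classifiers, each time adding the usual AdaBoost step size for the classifier whose current weighted error is farthest from 1/2, until every selected classifier has error within δ of 1/2.

```python
import numpy as np

def totally_corrective_step(preds, y, alphas, delta=1e-3, max_iter=1000):
    """preds: (T, m) array of h_q(x_i) in {-1,+1}; alphas: length-T weight vector."""
    for _ in range(max_iter):
        D = np.exp(-y * (alphas @ preds))            # weights implied by current f(x)
        D /= D.sum()
        errs = np.array([np.sum(D[p != y]) for p in preds])
        q = np.argmax(np.abs(errs - 0.5))            # classifier farthest from 1/2
        if abs(errs[q] - 0.5) < delta:               # all errors ~ 1/2: corrected
            break
        alphas[q] += 0.5 * np.log((1 - errs[q]) / errs[q])   # AdaBoost-style update
    return alphas
```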


Experiments with TCA on the IDA Database

• Discrete AdaBoost, Real AdaBoost, and Discrete and Real TCA evaluated
• Weak learner: stumps
• Data from the IDA repository (Ratsch:2000):

Dataset          Input dimension   Training patterns   Testing patterns   Number of realizations
Banana           2                 400                 4900               100
Breast cancer    9                 200                 77                 100
Diabetes         8                 468                 300                100
German           20                700                 300                100
Heart            13                170                 100                100
Image segment    18                1300                1010               20
Ringnorm         20                400                 7000               100
Flare solar      9                 666                 400                100
Splice           60                1000                2175               20
Thyroid          5                 140                 75                 100
Titanic          3                 150                 2051               100
Twonorm          20                400                 7000               100


Results with TCA on the IDA Database

• Training error (dashed line), test error (solid line)
• Discrete AdaBoost (blue), Real AdaBoost (green), Discrete AdaBoost with TCA (red), Real AdaBoost with TCA (cyan)
• Black horizontal line: the error of AdaBoost with RBF-network weak classifiers from (Ratsch-ML:2000)

[Figure panels: error curves for the IMAGE, FLARE, GERMAN, RINGNORM, SPLICE, THYROID, TITANIC, BANANA, BREAST, DIABETIS and HEART datasets.]


Conclusions

• The AdaBoost algorithm was presented and analysed.
• A modification of the Totally Corrective AdaBoost was introduced.
• Initial tests show that the TCA outperforms AB on some standard data sets.
