Slides


Francisco J. Ferrer Troyano

Jesús S. Aguilar, José C. Riquelme

University of Seville

FACIL:

Incremental Rule Learning from Data Streams

Motivation

Why Classify? Why Rules?

Heterogeneous data sources → noise, missing values, inconsistency

High-speed data → fast algorithms, approximate answers

Open-ended data → on-line learning… complex models?

FACIL – The hybrid knowledge model

[Figure: rules drawn as intervals over two attributes, Pressure (0-122 mm Hg; marks at 25, 35, 75, 90; rule interval 25-90) and Heart rate (50-210 beats/min; marks at 60, 80, 130, 150; rule interval 60-150). "Friend" examples a1-a3 and "enemy" examples e1-e3 mark the decision boundary. Annotations: first enemy received; friend obtained for the second enemy; Sop-Pos = 96%, Sop.Rel. = 20%, Prec. = 98.5% (positive support, relative support, precision). A further panel shows an Age attribute split into the intervals 12-20, 21-29, 30-38, 39-47, 48-56.]

Border examples inside rules


FACIL – The Algorithm

Three user parameters

Maximum number of rules per label → necessary to bound the computational complexity

Minimum accuracy (purity) per rule: p/(p+n) → anytime, approximate answers

Maximum growth per dimension
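The three parameters and the purity ratio p/(p+n) can be sketched as follows. This is only an illustration: the class name `FacilParams`, its field names, and the sample values are assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacilParams:
    """Hypothetical container for the three user parameters."""
    max_rules_per_label: int    # bounds the size of the model
    min_purity: float           # minimum p / (p + n) required of a rule
    max_growth_per_dim: float   # largest growth allowed in one dimension


def purity(p: int, n: int) -> float:
    """Purity of a rule covering p positive and n negative examples."""
    if p + n == 0:
        raise ValueError("rule covers no examples")
    return p / (p + n)


def is_pure_enough(p: int, n: int, params: FacilParams) -> bool:
    """True when the rule meets the minimum purity threshold."""
    return purity(p, n) >= params.min_purity


# Illustrative values only.
params = FacilParams(max_rules_per_label=25, min_purity=0.9, max_growth_per_dim=0.5)
print(purity(9, 1))                  # 0.9
print(is_pure_enough(3, 1, params))  # False: 0.75 is below 0.9
```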

FACIL – IIL updating

IIL: every example → one of three cases:

1. Success

2. Anomaly

3. Novelty
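The three cases can be sketched as a simple dispatch. The slides do not define the cases in text, so the reading used here is an assumption: success means the example is covered by a rule of its own label, anomaly means it is covered only by rules of another label, and novelty means no rule covers it. The `Rule` class and `case_for` function are hypothetical names.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


class Rule:
    """Hypothetical rule: one closed interval per attribute plus a class label."""

    def __init__(self, intervals: List[Interval], label: str) -> None:
        self.intervals = intervals
        self.label = label

    def covers(self, x: Sequence[float]) -> bool:
        """True when every attribute value lies inside its interval."""
        return all(lo <= v <= hi for v, (lo, hi) in zip(x, self.intervals))


def case_for(rules: List[Rule], x: Sequence[float], label: str) -> str:
    """Classify the arrival of (x, label) as success, anomaly, or novelty."""
    covering = [r for r in rules if r.covers(x)]
    if any(r.label == label for r in covering):
        return "success"   # assumed: covered by a rule of the same label
    if covering:
        return "anomaly"   # assumed: covered only by rules of another label
    return "novelty"       # assumed: not covered by any rule


rules = [
    Rule([(0.1, 0.5), (0.2, 0.6)], "A"),
    Rule([(0.6, 0.9), (0.6, 0.9)], "B"),
]
print(case_for(rules, (0.3, 0.4), "A"))
print(case_for(rules, (0.7, 0.7), "A"))
print(case_for(rules, (0.55, 0.1), "A"))
```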

General IL Approach

Successive episodes → updating: M_t = f(V_t, M_{t-1})

Window of examples → V_t: new [and] past examples

[Figure: success case (labelled "Éxito", Success): example e with examples a and rule A; axes scaled to 0-1.]

FACIL – IIL / Success

[Figure: new example X plotted with stored examples a1-a4 and e1-e3; axes scaled to 0-1.]

FACIL – IIL / Anomaly

[Figure: anomaly case; axes scaled to 0-1.]

FACIL – Learning Approach / Case 3

Selection: the rule implying the minimum growth

FACIL – IIL / Novelty: The Growth Distance

[Figure: two panels showing the growth distance between a rule R and an example x on axes scaled to 0-1. In the first, d(R,x) = 0.2 + 0.4; in the second, with rules A-D, d(R,x) = 0.25 + 0.2.]
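The sums d(R,x) = 0.2 + 0.4 and d(R,x) = 0.25 + 0.2 suggest that the growth distance adds, over the attributes, how far each interval of R must be stretched to reach x. The sketch below follows that reading and assumes attributes scaled to [0, 1]. The function names and the interval values are hypothetical, chosen only to reproduce the 0.2 + 0.4 case.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]


def growth_per_dim(rule: List[Interval], x: Sequence[float]) -> List[float]:
    """Stretch needed in each dimension to reach x (0 when x is already inside)."""
    return [max(lo - v, 0.0, v - hi) for v, (lo, hi) in zip(x, rule)]


def growth_distance(rule: List[Interval], x: Sequence[float]) -> float:
    """Assumed form of d(R, x): the sum of the per-dimension growths."""
    return sum(growth_per_dim(rule, x))


def select_rule(rules: Dict[str, List[Interval]], x: Sequence[float]) -> str:
    """Pick the rule implying the minimum growth."""
    return min(rules, key=lambda name: growth_distance(rules[name], x))


# Hypothetical rules over two attributes scaled to [0, 1].
rules = {
    "R1": [(0.1, 0.5), (0.1, 0.5)],   # needs 0.2 + 0.4 to cover x
    "R2": [(0.45, 0.8), (0.5, 0.7)],  # needs 0.0 + 0.2 to cover x
}
x = (0.7, 0.9)
print(round(growth_distance(rules["R1"], x), 2))
print(select_rule(rules, x))
```

Summing per-dimension growths is only one possible rule-to-example distance, and other distances could be substituted in `growth_distance`.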

[Figure: two rules R_a and R_b at the same growth from example x, G(R_a,x) = 0.2 and G(R_b,x) = 0.2, with Max. Dif. = 15%; axes scaled to 0-1.]

FACIL – IIL / Novelty: The Moderate Growth Distance
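One plausible reading of the moderate growth idea, combining the "maximum growth per dimension" parameter with the figure values G(R_a,x) = G(R_b,x) = 0.2 and a 15% limit, is that a rule is a candidate only if no single dimension has to grow beyond the limit. The sketch below illustrates that reading. The rule coordinates and function names are hypothetical, and attributes are assumed scaled to [0, 1].

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]


def growth_per_dim(rule: List[Interval], x: Sequence[float]) -> List[float]:
    """Stretch needed in each dimension to reach x (0 when x is already inside)."""
    return [max(lo - v, 0.0, v - hi) for v, (lo, hi) in zip(x, rule)]


def is_moderate(rule: List[Interval], x: Sequence[float], max_growth: float) -> bool:
    """Assumed test: every per-dimension growth stays within max_growth."""
    return all(g <= max_growth for g in growth_per_dim(rule, x))


def moderate_candidates(
    rules: Dict[str, List[Interval]], x: Sequence[float], max_growth: float
) -> List[str]:
    """Names of the rules that may grow to cover x."""
    return [name for name, r in rules.items() if is_moderate(r, x, max_growth)]


# Hypothetical rules: both are at total growth 0.2 from x.
rules = {
    "R_a": [(0.2, 0.5), (0.2, 0.5)],  # 0.1 + 0.1: each dimension within 15%
    "R_b": [(0.2, 0.4), (0.5, 0.7)],  # 0.2 + 0.0: one dimension exceeds 15%
}
x = (0.6, 0.6)
print(moderate_candidates(rules, x, max_growth=0.15))
```

Under this reading the total growth alone cannot separate the two rules, while the per-dimension limit can.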

FACIL – Implicit and Explicit Forgetting Heuristics

[Figure: rules for labels A and B at t=16 and at t=34, and the examples at t=35 (e16, e19, e28, e29, e34, with the last example marked), with the interval t=[17,33] indicated. Caption: multiple windows.]


FACIL – Multi-strategy Classification

[Figure: query examples marked "?" shown against rules labelled A and B.]

Experiments – General Purpose Classifier Evaluation

FACIL as a UCI Repository

10-fold cross-validation, repeated 10 times; Student's t-test (0.05)

Implicit Forgetting Heuristic

Synthetic Data Streams

Change magnitude = ±0.1 for 40% of the attributes every 10⁴ examples

Input: 10⁵ examples, 5% class noise; 900 training / 100 testing

Both implicit and explicit forgetting heuristics

UCI Repository – Accuracy

[Figure: two bar charts of accuracy (%), C4.5Rules vs. FACIL. Numerical attributes: Balance, Breast C., Glass, Heart S., Ionosphere, Iris, Diabetes, Sonar, Vehicle, Vowel, Waveform, Wine. Numerical and symbolic attributes: Anneal, Autos, Cleveland, Credit Rating, German Credit, Hepatitis, Horse Colic, Zoo, Audiology, Breast Cancer, Mushroom, Primary Tumor, Soybean, Vote.]

UCI Repository – Number of rules

[Figure: two bar charts of the number of rules, C4.5Rules vs. FACIL. Numerical attributes: Balance, Breast C., Glass, Heart S., Ionosphere, Iris, Diabetes, Sonar, Vehicle, Vowel, Waveform, Wine. Numerical and symbolic attributes: Anneal, Autos, Cleveland, Credit Rating, German Credit, Hepatitis, Horse Colic, Zoo, Audiology, Breast Cancer, Mushroom, Primary Tumor, Soybean, Vote.]

UCI Repository – Parametric Sensitivity

[Figure: three line charts over limit values (10-100) of the three parameters, showing Time (seconds), Accuracy (%), and Number of rules. Series: as a function of the maximum number of rules; as a function of the maximum growth allowed; as a function of the minimum purity threshold.]

Moving Hyperplane – Accuracy, time, and number of rules

                 Max. growth = 50%                       Max. growth = 75%
No. attributes   Accuracy   Time   No. rules (≤25)       Accuracy   Time   No. rules (≤25)
10               96.36      14     10                    98.32      20     7
20               93.25      63     9                     94.57      35     12
30               91.04      124    3                     89.26      53     10
40               89.7       221    2                     89.19      67     7
50               83.88      277    4                     87.72      77     10

Throughput annotations on the slide: >180, >3500, >2500, and >650 examples/second.

Moving Hyperplane – Sensitivity to the number of attributes

[Figure: Time vs. number of attributes (10-50). Series: max. growth = 50%, max. no. of rules = 50; max. growth = 75%, max. no. of rules = 100.]

Moving Hyperplane – Sensitivity to the number of examples

[Figure: Accuracy vs. number of examples (1000-50000). Series: max. growth = 75%, max. no. of rules = 25; max. growth = 100%, max. no. of rules = 10.]

Future Work

Sensitivity to attributes: removing / recovering attributes

To adapt the growth distance:

Changing attribute weights → improve the selection of rules

Reordering attributes according to influence → speed up the updating process

Processing streams with a variable number of attributes

To evaluate alternative distances between a rule and an example

Preprocessing streams? → Multiple classification

Francisco Ferrer

LSI - University of Seville


Thank you very much