• Objective
  • determine the discriminant functions directly
    • linear: $g(x) = w_0 + \sum_{i=1}^{d} w_i x_i = w^t x + w_0$
    • generalized linear: $g(x) = \sum_{i=1}^{\hat d} a_i y_i(x) = a^t y$
  • by minimizing the empirical risk

Linear discriminant functions

• Justification
  • sometimes optimal
  • easy to compute
  • candidates for initial classifiers
  • introduce some important principles

• decision function:

$$f(x) = \begin{cases} C_1 & \text{if } g(x) > 0 \\ C_2 & \text{if } g(x) < 0 \end{cases} \;=\; \begin{cases} C_1 & \text{if } w^t x > -w_0 \\ C_2 & \text{if } w^t x < -w_0 \end{cases}$$
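The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function names `g` and `decide` and the example weights are assumptions, not part of the slides.

```python
def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def decide(x, w, w0):
    """Return 'C1' if g(x) > 0, 'C2' if g(x) < 0, None on the boundary."""
    value = g(x, w, w0)
    if value > 0:
        return "C1"
    if value < 0:
        return "C2"
    return None  # g(x) = 0: the slides leave this case undefined


# Illustrative weights (assumed values, not from the slides).
w, w0 = [1.0, -2.0], 0.5
print(decide([3.0, 1.0], w, w0))  # g = 3 - 2 + 0.5 = 1.5
print(decide([0.0, 1.0], w, w0))  # g = -2 + 0.5 = -1.5
```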

[Figure: the linear discriminant drawn as a single-layer network. The input units $x_1, \dots, x_d$ and a bias unit $x_0 = 1$ are connected through the weights $w_1, \dots, w_d$ and $w_0$ to an output unit that computes $g(x)$.]

• Geometry: two classes
  • the decision boundary $H$ is a hyperplane: $g(x) = 0$
  • for $x_1, x_2 \in H$: $w^t (x_1 - x_2) = 0$, so $w$ is normal to $H$
  • decision regions: $R_1$ is the positive side, $R_2$ the negative side
  • $r$ = signed distance from $x$ to $H$, with $x_p$ the projection of $x$ onto $H$:

$$x = x_p + r \frac{w}{\|w\|}, \qquad g(x) = w^t x + w_0 = r \|w\| \;\Rightarrow\; r = \frac{g(x)}{\|w\|}$$
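As a quick numerical check of $r = g(x)/\|w\|$, the sketch below computes the signed distance and the projection $x_p$ for an assumed hyperplane. The helper names and the numbers are illustrative only.

```python
import math


def signed_distance(x, w, w0):
    """Signed distance r = g(x) / ||w|| from x to the hyperplane g(x) = 0."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return g / math.sqrt(sum(wi * wi for wi in w))


def project_onto_hyperplane(x, w, w0):
    """Projection x_p = x - r * w / ||w||, so that g(x_p) = 0."""
    r = signed_distance(x, w, w0)
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return [xi - r * wi / norm_w for xi, wi in zip(x, w)]


# Assumed example: w = (3, 4), w0 = -5, so ||w|| = 5.
w, w0 = [3.0, 4.0], -5.0
x = [3.0, 4.0]
r = signed_distance(x, w, w0)            # g(x) = 20, so r = 4
x_p = project_onto_hyperplane(x, w, w0)  # approximately (0.6, 0.8)
print(r, x_p)
```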

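[Figure: geometry of the two-class case. The hyperplane $H$ ($g(x) = 0$), with normal vector $w$, separates $R_1$ from $R_2$; $x_p$ is the projection of $x$ onto $H$, $r$ is the signed distance from $x$ to $H$, and $w_0 / \|w\|$ is the signed distance from the origin to $H$.]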

• Geometry: multiclass
  • $C_i$ versus not $C_i$

[Figure: four classes $\omega_1, \dots, \omega_4$, each separated from the rest by an "$\omega_i$ / not $\omega_i$" boundary; an ambiguous region remains.]

• Geometry: multiclass
  • $N(N-1)/2$ discriminant functions, one per pair of classes

[Figure: four classes $\omega_1, \dots, \omega_4$ with the pairwise boundaries $H_{12}, H_{13}, H_{14}, H_{23}, H_{24}, H_{34}$; an ambiguous region remains.]

• Linear discriminant functions
  • linear machine: $g_j(x) = w_j^t x + w_{j0}$, $j = 1, \dots, N$
  • decision boundaries $H_{i,j}$: $g_i(x) = g_j(x)$
  • $(w_i - w_j)$ is orthogonal to $H_{i,j}$
  • $r(x, H_{i,j}) = \dfrac{g_i(x) - g_j(x)}{\|w_i - w_j\|}$
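A linear machine is usually applied by assigning $x$ to the class whose discriminant $g_j(x)$ is largest. The sketch below assumes that rule; the function name `linear_machine`, the 0-based class indices, and the weights are illustrative and not taken from the slides.

```python
def linear_machine(x, W, w0):
    """Evaluate g_j(x) = w_j^t x + w_j0 for every class and pick the largest.

    W is a list of weight vectors (one per class), w0 the list of biases.
    Returns (index of the winning class, list of scores). Ties go to the
    lowest index, which is an arbitrary choice made for this sketch.
    """
    scores = [sum(wi * xi for wi, xi in zip(w_j, x)) + b_j
              for w_j, b_j in zip(W, w0)]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return winner, scores


# Three classes in two dimensions (assumed weights).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w0 = [0.0, 0.0, 0.0]
print(linear_machine([2.0, 1.0], W, w0))    # scores 2, 1, -3
print(linear_machine([-1.0, -2.0], W, w0))  # scores -1, -2, 3
```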

[Figure: decision regions of a linear machine. One panel shows three classes ($R_1, R_2, R_3$ for $\omega_1, \omega_2, \omega_3$) and the other five classes ($R_1, \dots, R_5$ for $\omega_1, \dots, \omega_5$), delimited by the pairwise boundaries $H_{ij}$.]

• Generalized linear discriminant functions:

$$g(x) = \sum_{i=1}^{\hat d} a_i y_i(x) = a^t y$$

  • example: quadratic discriminant function:

$$g(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j$$

  • decision boundary: a hyperquadric
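To see why the quadratic discriminant is still linear in the mapped vector $y$, here is a small sketch. It assumes one product feature per pair $i \le j$, which merges the symmetric terms $w_{ij}$ and $w_{ji}$; the names and coefficients are illustrative.

```python
def quadratic_features(x):
    """Map x to y(x) = (1, x_1, ..., x_d, x_i * x_j for i <= j)."""
    d = len(x)
    y = [1.0] + list(x)
    y += [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return y


def g(x, a):
    """Generalized linear discriminant g(x) = a^t y(x)."""
    return sum(ai * yi for ai, yi in zip(a, quadratic_features(x)))


# Assumed example: g(x) = 1 - x1^2 - x2^2, whose boundary is the unit circle.
# Feature order for d = 2: (1, x1, x2, x1^2, x1*x2, x2^2).
a = [1.0, 0.0, 0.0, -1.0, 0.0, -1.0]
print(g([0.0, 0.0], a))  # inside the circle: positive
print(g([2.0, 0.0], a))  # outside the circle: negative
```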

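• Generalized linear discriminant functions
  • example: $g(x) = a_1 + a_2 x + a_3 x^2$, $y = (1, x, x^2)^t$

[Figure: the mapping $y = (1, x, x^2)^t$ sends the $x$ axis to a curve in the three-dimensional space $(y_1, y_2, y_3)$. A plane in $y$-space splits that space into $\hat R_1$ and $\hat R_2$; the corresponding regions $R_1$ and $R_2$ on the $x$ axis need not be connected.]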

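• example: $y = (x_1, x_2, \alpha x_1 x_2)^t$

[Figure: the mapping $y = (x_1, x_2, \alpha x_1 x_2)^t$ sends the $(x_1, x_2)$ plane to a surface in the space $(y_1, y_2, y_3)$. A plane $\hat H$ in $y$-space induces the regions $R_1$ and $R_2$ in the $(x_1, x_2)$ plane.]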

• Augmented vector
  • $g(x) = w_0 + \sum_{i=1}^{d} w_i x_i = \sum_{i=0}^{d} w_i x_i$ (with $x_0 = 1$)
  • $g(x) = \sum_{i=1}^{\hat d} a_i y_i$, with $\hat d = d + 1$ and

$$y = \begin{pmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{pmatrix} = \begin{pmatrix} 1 \\ x \end{pmatrix}, \qquad a = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{pmatrix} = \begin{pmatrix} w_0 \\ w \end{pmatrix}$$
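The augmentation is a one-line transformation in code. The sketch below, with illustrative helper names and values, checks that $a^t y$ equals $w^t x + w_0$.

```python
def augment(x):
    """Augmented feature vector y = (1, x_1, ..., x_d)."""
    return [1.0] + list(x)


def augment_weights(w, w0):
    """Augmented weight vector a = (w_0, w_1, ..., w_d)."""
    return [w0] + list(w)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Assumed example values.
w, w0, x = [1.0, -2.0], 0.5, [3.0, 1.0]
a, y = augment_weights(w, w0), augment(x)
print(dot(a, y), dot(w, x) + w0)  # both equal g(x)
```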

[Figure: augmented vector. In the augmented space $(y_0, y_1, y_2)$ the samples lie in the plane $y_0 = 1$; the decision boundary defined by $a$ passes through the origin and cuts that plane into $R_1$ and $R_2$.]

• Linear separability
  • $D_n = \big((y_1, z_1), \dots, (y_n, z_n)\big)$, where

$$z_i = \begin{cases} 1 & \text{if } y_i \text{ is labeled } C_1 \\ -1 & \text{if } y_i \text{ is labeled } C_2 \end{cases}$$

  • $g(x) = a^t y$ separates $D_n$ without error if $a^t y_i z_i > 0$, $i = 1, \dots, n$
  • $a$: separating vector, or solution vector
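The separability condition translates directly into a check over the training set. The following sketch uses a small assumed data set of augmented vectors; `separates` is a hypothetical helper name.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def separates(a, samples):
    """True if a^t y_i z_i > 0 for every pair (y_i, z_i) in samples."""
    return all(z * dot(a, y) > 0 for y, z in samples)


# Assumed toy data: augmented vectors y_i with labels z_i in {+1, -1}.
samples = [
    ([1.0, 2.0, 2.0], +1),
    ([1.0, 1.0, 3.0], +1),
    ([1.0, -1.0, -1.0], -1),
    ([1.0, -2.0, 0.0], -1),
]
print(separates([0.0, 1.0, 1.0], samples))   # a solution vector
print(separates([0.0, 1.0, -1.0], samples))  # not a solution vector
```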

[Figure: linear separability. Two panels in the $(y_1, y_2)$ plane, each showing a solution vector $a$ and the solution region; the left panel shows the separating plane, the right panel the "separating" plane.]

• Separation margin:

$$m_i = g(x_i)\, z_i = a^t y_i z_i$$

• Separation with a margin $b$:

$$m_i = a^t y_i z_i > b, \quad i = 1, \dots, n$$

[Figure: separation margin. Two panels in the weight space $(a_1, a_2)$ for three samples $y_1, y_2, y_3$: with a margin $b$, each constraint boundary is shifted by $b / \|y_i\|$, which shrinks the solution region.]

• Gradient descent procedures
  • criterion function: $J(a)$, minimized when $a$ is a solution
  • $a(k+1) = a(k) - \eta(k) \nabla J(a(k))$
  • $\eta(k)$: learning rate

SimpleGradientDescent($\Theta$, $\eta(\cdot)$, $a_0$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $a \leftarrow a - \eta(k) \nabla J(a)$
    until $\|\eta(k) \nabla J(a)\| < \Theta$
    return $a$
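The procedure above can be sketched in Python as follows. The criterion used in the example, the constant learning rate, and the `max_iter` safety cap are assumptions added for illustration; they do not come from the slides.

```python
import math


def gradient_descent(grad, a0, eta, theta, max_iter=10_000):
    """Iterate a <- a - eta(k) * grad(a) until ||eta(k) * grad(a)|| < theta.

    grad: function returning the gradient of J at a (list of floats)
    eta:  function k -> learning rate
    max_iter: safety cap (an addition, not part of the procedure above)
    """
    a = list(a0)
    for k in range(1, max_iter + 1):
        a = [ai - eta(k) * gi for ai, gi in zip(a, grad(a))]
        # Stopping test, evaluated at the updated a as in the procedure.
        if math.sqrt(sum((eta(k) * gi) ** 2 for gi in grad(a))) < theta:
            break
    return a


# Assumed criterion: J(a) = (a1 - 1)^2 + 2 (a2 + 3)^2, minimized at (1, -3).
def grad_J(a):
    return [2.0 * (a[0] - 1.0), 4.0 * (a[1] + 3.0)]


a_min = gradient_descent(grad_J, [0.0, 0.0], eta=lambda k: 0.1, theta=1e-8)
print(a_min)  # close to [1, -3]
```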

• Newton descent
  • $J(a) \approx J(a(k)) + \nabla J^t (a - a(k)) + \frac{1}{2} (a - a(k))^t H (a - a(k))$
  • Hessian matrix: $H_{ij} = \dfrac{\partial^2 J}{\partial a_i \, \partial a_j}$
  • $a(k+1) = a(k) - H^{-1} \nabla J$

NewtonDescent($\Theta$, $a_0$)
    $a \leftarrow a_0$
    do
        $a \leftarrow a - H^{-1} \nabla J(a)$
    until $\|H^{-1} \nabla J(a)\| < \Theta$
    return $a$
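For a two-dimensional weight vector the Newton step can be written with an explicit $2 \times 2$ inverse. The sketch below is restricted to that case and assumes a non-singular Hessian; the quadratic criterion in the example is an assumption chosen so that a single step reaches the minimum.

```python
import math


def newton_step(a, grad, hess):
    """Return H^{-1} grad J(a) for a 2-D weight vector (explicit 2x2 inverse)."""
    g1, g2 = grad(a)
    (h11, h12), (h21, h22) = hess(a)
    det = h11 * h22 - h12 * h21  # assumed non-zero
    return [(h22 * g1 - h12 * g2) / det, (h11 * g2 - h21 * g1) / det]


def newton_descent(grad, hess, a0, theta, max_iter=100):
    """Iterate a <- a - H^{-1} grad J(a) until the next step is shorter than theta."""
    a = list(a0)
    for _ in range(max_iter):  # max_iter is a safety cap added for this sketch
        s = newton_step(a, grad, hess)
        a = [a[0] - s[0], a[1] - s[1]]
        if math.hypot(*newton_step(a, grad, hess)) < theta:
            break
    return a


# Assumed criterion: J(a) = a1^2 + a1*a2 + 2*a2^2 - 3*a1, minimum at (12/7, -3/7).
def grad_J(a):
    return [2.0 * a[0] + a[1] - 3.0, a[0] + 4.0 * a[1]]


def hess_J(a):
    return [[2.0, 1.0], [1.0, 4.0]]


print(newton_descent(grad_J, hess_J, [0.0, 0.0], theta=1e-10))
```

Because the example criterion is exactly quadratic, the second-order expansion is exact and one Newton step lands on the minimum, whereas plain gradient descent would need many iterations.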

[Figure: Newton descent. Descent trajectories on the criterion surface $J(a)$ plotted over the $(a_1, a_2)$ plane.]

• The perceptron
  • $J_p(a) = \sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} \, (-a^t y_i z_i)$
  • $\nabla J_p = \sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} \, (-y_i z_i)$
  • $a(k+1) = a(k) + \eta(k) \sum_{i=1}^{n} I\{a^t(k) y_i z_i \le 0\} \, y_i z_i$

• The perceptron

BatchPerceptron($\Theta$, $\eta(\cdot)$, $a_0$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $a \leftarrow a + \eta(k) \sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} \, y_i z_i$
    until $\big\| \eta(k) \sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} \, y_i z_i \big\| < \Theta$
    return $a$
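A minimal Python sketch of the batch perceptron follows. The toy data set, the constant learning rate, and the `max_iter` cap are assumptions made for the example.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def correction(a, samples):
    """Sum of y_i z_i over the samples with a^t y_i z_i <= 0."""
    total = [0.0] * len(a)
    for y, z in samples:
        if z * dot(a, y) <= 0:
            total = [t + z * yi for t, yi in zip(total, y)]
    return total


def batch_perceptron(samples, a0, eta, theta, max_iter=1000):
    """Batch perceptron; max_iter is a safety cap added for this sketch."""
    a = list(a0)
    for k in range(1, max_iter + 1):
        a = [ai + eta(k) * ci for ai, ci in zip(a, correction(a, samples))]
        # Stopping test on the size of the next correction.
        if math.sqrt(sum((eta(k) * ci) ** 2 for ci in correction(a, samples))) < theta:
            break
    return a


# Assumed toy data: augmented vectors with labels +1 (C1) and -1 (C2).
samples = [
    ([1.0, 2.0, 2.0], +1),
    ([1.0, 1.0, 3.0], +1),
    ([1.0, -1.0, -1.0], -1),
    ([1.0, -2.0, 0.0], -1),
]
a = batch_perceptron(samples, [0.0, 0.0, 0.0], eta=lambda k: 1.0, theta=1e-9)
print(a)  # [0.0, 6.0, 6.0] after a single batch correction
```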

[Figure: criterion functions. Four surfaces plotted over the weight space $(a_1, a_2)$ for three samples $y_1, y_2, y_3$, with the solution region marked: $J(a)$, $J_p(a)$, $J_q(a)$ and $J_r(a)$.]

OnlinePerceptron($a_0$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow (k + 1) \bmod n$
        if $a^t y_k z_k \le 0$ then    ($y_k$ is misclassified)
            $a \leftarrow a + y_k z_k$
    until $\sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} = 0$    (no errors)
    return $a$

(Indices are taken modulo $n$, so $k = 0$ refers to sample $n$.)

• Theorem
  • If the training set is linearly separable, the OnlinePerceptron algorithm terminates at a solution vector after a finite number of corrections.
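The fixed-increment online rule can be sketched as below. The data set and the `max_corrections` guard are assumptions: the guard is only there so that the loop also stops on a non-separable set, where the theorem does not apply.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def online_perceptron(samples, a0, max_corrections=10_000):
    """Cycle through the samples and add y_k z_k whenever y_k is misclassified.

    Returns (a, number of corrections). Uses 0-based sample indices.
    """
    a = list(a0)
    n = len(samples)
    k = 0
    corrections = 0
    while any(z * dot(a, y) <= 0 for y, z in samples):
        y, z = samples[k]
        if z * dot(a, y) <= 0:
            a = [ai + z * yi for ai, yi in zip(a, y)]
            corrections += 1
            if corrections >= max_corrections:
                raise RuntimeError("no solution found; data may not be separable")
        k = (k + 1) % n
    return a, corrections


# Assumed toy data: augmented vectors with labels +1 (C1) and -1 (C2).
samples = [
    ([1.0, 2.0, 2.0], +1),
    ([1.0, 1.0, 3.0], +1),
    ([1.0, -1.0, -1.0], -1),
    ([1.0, -2.0, 0.0], -1),
]
print(online_perceptron(samples, [0.0, 0.0, 0.0]))  # ([1.0, 2.0, 2.0], 1)
```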

• The online perceptron with margin and variable increment

OnlinePerceptronVariableMargin($\eta(\cdot)$, $a_0$, $b$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $k' \leftarrow k \bmod n$
        if $a^t y_{k'} z_{k'} \le b$ then
            $a \leftarrow a + \eta(k) \, y_{k'} z_{k'}$
    until $\sum_{i=1}^{n} I\{a^t y_i z_i \le b\} = 0$    (no errors with respect to the margin $b$)
    return $a$

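• Convergence conditions
  • $\eta(k) \ge 0$
  • $\displaystyle \lim_{m \to \infty} \sum_{k=1}^{m} \eta(k) = \infty$
  • $\displaystyle \lim_{m \to \infty} \frac{\sum_{k=1}^{m} \eta^2(k)}{\left( \sum_{k=1}^{m} \eta(k) \right)^2} = 0$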
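As a worked check of these conditions, two common schedules (chosen here for illustration, not taken from the slides) both satisfy them: a constant rate $\eta(k) = \eta > 0$ and the decreasing rate $\eta(k) = 1/k$.

```latex
% Constant rate \eta(k) = \eta > 0:
\sum_{k=1}^{m} \eta(k) = m\,\eta \xrightarrow[m \to \infty]{} \infty,
\qquad
\frac{\sum_{k=1}^{m} \eta^2(k)}{\left(\sum_{k=1}^{m} \eta(k)\right)^2}
  = \frac{m\,\eta^2}{m^2\,\eta^2} = \frac{1}{m} \xrightarrow[m \to \infty]{} 0 .

% Decreasing rate \eta(k) = 1/k:
\sum_{k=1}^{m} \frac{1}{k} \ge \ln(m+1) \xrightarrow[m \to \infty]{} \infty,
\qquad
\frac{\sum_{k=1}^{m} 1/k^2}{\left(\sum_{k=1}^{m} 1/k\right)^2}
  \le \frac{\pi^2/6}{\ln^2(m+1)} \xrightarrow[m \to \infty]{} 0 .
```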

• The batch perceptron with variable increment
  • $y^{(k)} = \sum_{i=1}^{n} I\{a^t(k) y_i z_i \le 0\} \, y_i z_i$

BatchPerceptronVariable($\eta(\cdot)$, $a_0$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $a \leftarrow a + \eta(k) \sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} \, y_i z_i$
    until $\sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} = 0$
    return $a$

• Relaxation procedures
  • $J_q(a) = \sum_{i=1}^{n} I\{a^t y_i z_i \le 0\} \, (a^t y_i z_i)^2$
  • $J_r(a) = \dfrac{1}{2} \sum_{i=1}^{n} I\{a^t y_i z_i \le b\} \, \dfrac{(a^t y_i z_i - b)^2}{\|y_i z_i\|^2}$
  • $\nabla J_r = \sum_{i=1}^{n} I\{a^t y_i z_i \le b\} \, \dfrac{a^t y_i z_i - b}{\|y_i z_i\|^2} \, y_i z_i$
  • $a(k+1) = a(k) + \eta(k) \sum_{i=1}^{n} I\{a^t y_i z_i \le b\} \, \dfrac{b - a^t y_i z_i}{\|y_i z_i\|^2} \, y_i z_i$

• Relaxation procedures

BatchRelaxationMargin($\eta(\cdot)$, $a_0$, $b$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $a \leftarrow a + \eta(k) \sum_{i=1}^{n} I\{a^t y_i z_i \le b\} \, \dfrac{b - a^t y_i z_i}{\|y_i z_i\|^2} \, y_i z_i$
    until $\sum_{i=1}^{n} I\{a^t y_i z_i \le b\} = 0$
    return $a$

• Online relaxation

OnlineRelaxationMargin($\eta(\cdot)$, $a_0$, $b$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $k' \leftarrow k \bmod n$
        if $a^t y_{k'} z_{k'} \le b$ then
            $a \leftarrow a + \eta(k) \, \dfrac{b - a^t y_{k'} z_{k'}}{\|y_{k'} z_{k'}\|^2} \, y_{k'} z_{k'}$
    until $\sum_{i=1}^{n} I\{a^t y_i z_i \le b\} = 0$
    return $a$
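The online relaxation rule can be sketched as follows. For simplicity the sketch assumes a constant rate $\eta = 1.5$ (over-relaxation, so that a corrected sample ends strictly beyond the margin) and a `max_passes` cap; both are choices made for the example, as is the toy data set.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def online_relaxation(samples, a0, b, eta=1.5, max_passes=1000):
    """Single-sample relaxation with margin b and a constant rate eta."""
    a = list(a0)
    n = len(samples)
    for k in range(max_passes * n):
        if all(z * dot(a, y) > b for y, z in samples):
            return a  # every sample satisfies the margin
        y, z = samples[k % n]
        margin = z * dot(a, y)
        if margin <= b:
            # ||y z||^2 = ||y||^2 because z is +1 or -1.
            scale = eta * (b - margin) / dot(y, y)
            a = [ai + scale * z * yi for ai, yi in zip(a, y)]
    raise RuntimeError("margin not reached within max_passes")


# Assumed toy data: augmented vectors with labels +1 (C1) and -1 (C2).
samples = [
    ([1.0, 2.0, 2.0], +1),
    ([1.0, 1.0, 3.0], +1),
    ([1.0, -1.0, -1.0], -1),
    ([1.0, -2.0, 0.0], -1),
]
a = online_relaxation(samples, [0.0, 0.0, 0.0], b=1.0)
print(a, [z * dot(a, y) for y, z in samples])
```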

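• Online relaxation
  • $r(k) = \dfrac{b - a^t(k) \, y_k z_k}{\|y_k z_k\|}$: the distance from $a(k)$ to the hyperplane $a^t y_k z_k = b$

[Figure: in the weight space, $a(k)$ lies at distance $r(k)$ from the hyperplane $a^t y_k = b$; the update moves $a(k)$ toward that hyperplane along $y_k$ by a fraction $\eta$ of $r(k)$.]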

• $\eta > 1$: over-relaxation
• $\eta < 1$: under-relaxation
• convergence condition: $0 < \eta < 2$

[Figure: two panels plotting $J(a)$ against $a_1$, illustrating under-relaxation and over-relaxation.]

• Behavior in the non-separable case
  • error-correction procedures
    • work well if
      • the Bayes decision boundary is approximately linear
      • the Bayes error is small
    • if $2d > n$, the probability of non-separability is small

• Fixed increment
  • infinite loop
  • generates a finite-state process
  • remedy: average the weight vectors
• Variable increment
  • converges if $\eta(k) \to 0$

• The squared-error approach (regression)
  • let $Y$ be the $n \times \hat d$ matrix whose $i$-th row is $y_i^t$, and let $b = (z_1, \dots, z_n)^t$
  • ideally we would find $a$ such that $Ya = b$
  • in general there is an error $e = Ya - b$
  • $J_s(a) = \|Ya - b\|^2 = \sum_{i=1}^{n} (a^t y_i - b_i)^2$
  • $\nabla J_s = \sum_{i=1}^{n} 2 (a^t y_i - b_i) \, y_i = 2 Y^t (Ya - b)$
  • setting the gradient to zero: $Y^t Y a = Y^t b$
  • $a = (Y^t Y)^{-1} Y^t b = Y^{\dagger} b$
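The closed-form solution can be computed by solving the normal equations $Y^t Y a = Y^t b$. The sketch below does this in plain Python with Gaussian elimination, assuming $Y^t Y$ is non-singular; the data set and helper names are illustrative.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(Y, b):
    """Return a minimizing ||Y a - b||^2 via the normal equations."""
    d = len(Y[0])
    YtY = [[sum(row[i] * row[j] for row in Y) for j in range(d)] for i in range(d)]
    Ytb = [sum(row[i] * bi for row, bi in zip(Y, b)) for i in range(d)]
    return solve(YtY, Ytb)


# Assumed toy data: rows of Y are augmented vectors, b holds the labels z_i.
Y = [[1.0, 2.0, 2.0], [1.0, 1.0, 3.0], [1.0, -1.0, -1.0], [1.0, -2.0, 0.0]]
b = [1.0, 1.0, -1.0, -1.0]
print(least_squares(Y, b))  # approximately [-1/3, 1/3, 1/3]
```

For this data set the system $Ya = b$ happens to be consistent, so the least-squares solution fits every target exactly.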

• Widrow-Hoff procedure (LMS)
  • batch: $a(k+1) = a(k) + \eta(k) \, Y^t (b - Y a(k))$
  • online: $a(k+1) = a(k) + \eta(k) \, y_k (b_k - a^t y_k)$

LMS($\Theta$, $\eta(\cdot)$, $a_0$)
    $a \leftarrow a_0$
    $k \leftarrow 0$
    do
        $k \leftarrow k + 1$
        $k' \leftarrow k \bmod n$
        $a \leftarrow a + \eta(k) \, y_{k'} (b_{k'} - a^t y_{k'})$
    until $\|\eta(k) \, y_{k'} (b_{k'} - a^t y_{k'})\| < \Theta$
    return $a$
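A minimal sketch of the online LMS rule is given below. Two simplifications are assumptions of this sketch: the stopping test uses the step just taken rather than re-evaluating it at the updated $a$, and a `max_iter` cap is added. The data and the constant rate are illustrative.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def lms(Y, b, a0, eta, theta, max_iter=100_000):
    """Online Widrow-Hoff updates a <- a + eta(k) * y (b_k - a^t y)."""
    a = list(a0)
    n = len(Y)
    for k in range(1, max_iter + 1):
        y, target = Y[k % n], b[k % n]
        error = target - dot(a, y)
        step = [eta(k) * error * yi for yi in y]
        a = [ai + si for ai, si in zip(a, step)]
        if math.sqrt(dot(step, step)) < theta:
            break
    return a


def squared_error(a, Y, b):
    return sum((dot(a, y) - bi) ** 2 for y, bi in zip(Y, b))


# Assumed toy data: rows of Y are augmented vectors, b holds the labels z_i.
Y = [[1.0, 2.0, 2.0], [1.0, 1.0, 3.0], [1.0, -1.0, -1.0], [1.0, -2.0, 0.0]]
b = [1.0, 1.0, -1.0, -1.0]
a = lms(Y, b, [0.0, 0.0, 0.0], eta=lambda k: 0.05, theta=1e-6)
print(a, squared_error(a, Y, b))
```

Note that the stopping test looks at a single sample, so the procedure can stop as soon as one update is small even if other samples still have a large error.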

• Widrow-Hoff procedure (LMS)
  • behaves well in the non-separable case
  • does not necessarily converge to a separating hyperplane in the separable case

• The support vector machine (SVM)
  • objective: find a separating hyperplane with a large margin, where $z_i \, g(y_i) = z_i \, a^t y_i$
  • maximize $b$ subject to $\dfrac{z_i \, g(y_i)}{\|a\|} \ge b$, $i = 1, \dots, n$

[Figure: in the $(y_1, y_2)$ plane, the optimal hyperplane separates $R_1$ from $R_2$ with the maximal margin on each side.]
