Linear discriminant functions

• Objective
  • determine the discriminant functions directly
    • linear: $g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i = \mathbf{w}^t\mathbf{x} + w_0$
    • generalized linear: $g(\mathbf{x}) = \sum_{i=1}^{\hat{d}} a_i y_i(\mathbf{x}) = \mathbf{a}^t\mathbf{y}$
  • by minimizing the empirical risk
• Justifications
  • sometimes optimal
  • easy to compute
  • candidates for initial classifiers
  • a way to introduce some important principles

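• Geometry: two classes
  • decision function:
    $f(\mathbf{x}) = \begin{cases} C_1 & \text{if } g(\mathbf{x}) > 0 \\ C_2 & \text{if } g(\mathbf{x}) < 0 \end{cases} = \begin{cases} C_1 & \text{if } \mathbf{w}^t\mathbf{x} > -w_0 \\ C_2 & \text{if } \mathbf{w}^t\mathbf{x} < -w_0 \end{cases}$
  [Figure: network view of a linear discriminant: the input units $x_1, \dots, x_d$ and the bias unit $x_0 = 1$ feed, through the weights $w_1, \dots, w_d$ and $w_0$, an output unit that computes $g(\mathbf{x})$.]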
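The decision function above can be sketched in a few lines of Python. This is only an illustration: the weights `w`, `w0` and the test points are made-up values, and returning `None` on the boundary is an assumption, since the slide does not say what happens when $g(\mathbf{x}) = 0$.

```python
def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def decide(x, w, w0):
    """Return 'C1' if g(x) > 0, 'C2' if g(x) < 0, and None on the boundary."""
    value = g(x, w, w0)
    if value > 0:
        return "C1"
    if value < 0:
        return "C2"
    return None  # g(x) = 0: not covered by the slide, handled here by assumption


# Hypothetical weights: the boundary is the line x1 + x2 = 1.
w, w0 = [1.0, 1.0], -1.0
print(decide([2.0, 2.0], w, w0))  # g = 3, positive side
print(decide([0.0, 0.0], w, w0))  # g = -1, negative side
```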

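• Geometry: two classes
  • the decision boundary $H$ is a hyperplane: $g(\mathbf{x}) = 0$
  • $\mathbf{x}_1, \mathbf{x}_2 \in H$: $\mathbf{w}^t(\mathbf{x}_1 - \mathbf{x}_2) = 0$
  • decision regions: $R_1$: positive side, $R_2$: negative side
  • $r$ = signed distance from $\mathbf{x}$ to $H$:
    $\mathbf{x} = \mathbf{x}_p + r\,\frac{\mathbf{w}}{\|\mathbf{w}\|}$
    $g(\mathbf{x}) = \mathbf{w}^t\mathbf{x} + w_0 = r\,\|\mathbf{w}\|$
    $r = \frac{g(\mathbf{x})}{\|\mathbf{w}\|}$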
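A small numerical check of the distance formula, as a sketch only: the hyperplane and the point are hypothetical, and `projection` is a helper name introduced here for $\mathbf{x}_p = \mathbf{x} - r\,\mathbf{w}/\|\mathbf{w}\|$.

```python
import math


def g(x, w, w0):
    """Linear discriminant g(x) = w^t x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def signed_distance(x, w, w0):
    """r = g(x) / ||w||: positive on the R1 side, negative on the R2 side."""
    return g(x, w, w0) / math.sqrt(sum(wi * wi for wi in w))


def projection(x, w, w0):
    """x_p = x - r w / ||w||, the point of H closest to x."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    r = g(x, w, w0) / norm
    return [xi - r * wi / norm for xi, wi in zip(x, w)]


# Hypothetical hyperplane 3 x1 + 4 x2 - 5 = 0 and test point.
w, w0 = [3.0, 4.0], -5.0
x = [3.0, 4.0]
r = signed_distance(x, w, w0)
xp = projection(x, w, w0)
print(r, xp, g(xp, w, w0))  # g(x_p) should vanish up to rounding
```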

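• Geometry: two classes
  [Figure: in the space $(x_1, x_2, x_3)$, the hyperplane $H$ ($g(\mathbf{x}) = 0$) with normal vector $\mathbf{w}$ separates the regions $R_1$ and $R_2$; a point $\mathbf{x}$ lies at signed distance $r$ from its projection $\mathbf{x}_p$ on $H$, and $H$ lies at distance $w_0/\|\mathbf{w}\|$ from the origin.]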

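• Geometry: multiclass
  • $C_i$ / not $C_i$
  [Figure: four boundaries of the form $\omega_i$ / not $\omega_i$ ($i = 1, \dots, 4$) split the plane into regions assigned to $\omega_1, \dots, \omega_4$ and leave an ambiguous region.]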

• Geometry: multiclass
  • $N(N-1)/2$ discriminant functions (one per pair of classes)
  [Figure: the pairwise boundaries $H_{12}, H_{13}, H_{14}, H_{23}, H_{24}, H_{34}$ split the plane into regions assigned to $\omega_1, \dots, \omega_4$ and leave an ambiguous region.]

• Linear discriminant functions
  • linear machine: $g_j(\mathbf{x}) = \mathbf{w}_j^t\mathbf{x} + w_{j0}$, $j = 1, \dots, N$
  • decision boundaries $H_{ij}$: $g_i(\mathbf{x}) = g_j(\mathbf{x})$
  • $(\mathbf{w}_i - \mathbf{w}_j)$ is orthogonal to $H_{ij}$
  • $r(\mathbf{x}, H_{ij}) = \frac{g_i(\mathbf{x}) - g_j(\mathbf{x})}{\|\mathbf{w}_i - \mathbf{w}_j\|}$
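A minimal sketch of a linear machine, assuming the usual rule that $\mathbf{x}$ is assigned to the class whose discriminant is largest (the slide only gives the boundaries $g_i = g_j$). The weight matrix `W`, the biases `w0` and the 0-based class indices are choices made for this example.

```python
import math


def discriminants(x, W, w0):
    """g_j(x) = w_j^t x + w_j0 for every class j."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(W, w0)]


def classify(x, W, w0):
    """Index (0-based) of the largest discriminant; ties go to the lowest index."""
    values = discriminants(x, W, w0)
    return max(range(len(values)), key=lambda j: values[j])


def distance_to_boundary(x, W, w0, i, j):
    """Signed distance r(x, H_ij) = (g_i(x) - g_j(x)) / ||w_i - w_j||."""
    values = discriminants(x, W, w0)
    diff = [a - b for a, b in zip(W[i], W[j])]
    return (values[i] - values[j]) / math.sqrt(sum(d * d for d in diff))


# Hypothetical three-class machine in two dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w0 = [0.0, 0.0, 0.0]
x = [2.0, 1.0]
print(classify(x, W, w0), distance_to_boundary(x, W, w0, 0, 1))
```

Because every point gets the class of its largest discriminant, this scheme leaves no ambiguous region apart from the boundaries themselves.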

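• Linear discriminant functions
  [Figure: decision regions produced by linear machines: a three-class problem with regions $R_1, R_2, R_3$ and boundaries $H_{12}, H_{13}, H_{23}$, and a five-class problem with regions $R_1, \dots, R_5$ and boundaries including $H_{14}, H_{15}, H_{24}, H_{25}, H_{34}, H_{35}$.]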

• Generalized linear discriminant functions:
  $g(\mathbf{x}) = \sum_{i=1}^{\hat{d}} a_i y_i(\mathbf{x}) = \mathbf{a}^t\mathbf{y}$
  • example: quadratic discriminant function:
    $g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d}\sum_{j=1}^{d} w_{ij} x_i x_j$
  • decision boundary: hyperquadric

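• Generalized linear discriminant functions
  • example: $g(x) = a_1 + a_2 x + a_3 x^2$, $\mathbf{y} = (1, x, x^2)^t$
  [Figure: the mapping $\mathbf{y} = (1, x, x^2)^t$ sends the $x$ axis to a curve in $(y_1, y_2, y_3)$ space; a plane in that space defines regions $\hat{R}_1$ and $\hat{R}_2$, which correspond to regions $R_1$ and $R_2$ on the $x$ axis.]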
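The example can be tried numerically. In the sketch below the coefficient vector `a` is hypothetical (it gives $g(x) = x^2 - 1$); it is chosen so that the region where $g > 0$ consists of two separate intervals, something a discriminant that is linear in $x$ cannot produce.

```python
def phi(x):
    """Map a scalar x to y = (1, x, x^2)."""
    return [1.0, x, x * x]


def g(x, a):
    """Generalized linear discriminant g(x) = a^t y(x)."""
    return sum(ai * yi for ai, yi in zip(a, phi(x)))


# Hypothetical coefficients: g(x) = x^2 - 1.
a = [-1.0, 0.0, 1.0]
for x in (-2.0, 0.0, 2.0):
    print(x, "R1" if g(x, a) > 0 else "R2")
```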

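• Generalized linear discriminant functions
  • example: $\mathbf{y} = (x_1, x_2, \alpha x_1 x_2)^t$
  [Figure: the mapping $\mathbf{y} = (x_1, x_2, \alpha x_1 x_2)^t$ sends the $(x_1, x_2)$ plane to a surface in $(y_1, y_2, y_3)$ space; a plane $\hat{H}$ in that space defines regions $\hat{R}_1$ and $\hat{R}_2$, which correspond to regions $R_1$ and $R_2$ in the $(x_1, x_2)$ plane.]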

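• Augmented vector
  • $g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i = \sum_{i=0}^{d} w_i x_i$ $\quad (x_0 = 1)$
  • $g(\mathbf{x}) = \sum_{i=1}^{\hat{d}} a_i y_i$, $\hat{d} = d + 1$,
    $\mathbf{y} = \begin{pmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{pmatrix}$, $\mathbf{a} = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{pmatrix} = \begin{pmatrix} w_0 \\ \mathbf{w} \end{pmatrix}$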
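As a quick check that the augmented form computes the same value, here is a sketch with hypothetical numbers; `augment` and `augment_weights` are helper names introduced for the example.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def augment(x):
    """y = (1, x_1, ..., x_d)."""
    return [1.0] + list(x)


def augment_weights(w, w0):
    """a = (w_0, w_1, ..., w_d)."""
    return [w0] + list(w)


# Hypothetical weights and input.
w, w0 = [2.0, -1.0], 0.5
x = [1.0, 3.0]
y, a = augment(x), augment_weights(w, w0)
print(dot(w, x) + w0, dot(a, y))  # the two forms of g(x) agree
```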







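• Augmented vector
  [Figure: augmented space $(y_0, y_1, y_2)$: the samples lie in the plane $y_0 = 1$; the hyperplane with normal vector $\mathbf{a}$ passes through the origin and cuts that plane along the decision boundary between $R_1$ and $R_2$.]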

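• Linear separability
  • $D_n = \{(\mathbf{y}_1, z_1), \dots, (\mathbf{y}_n, z_n)\}$, $z_i = \begin{cases} 1 & \text{if } \mathbf{y}_i \text{ is labeled } C_1 \\ -1 & \text{if } \mathbf{y}_i \text{ is labeled } C_2 \end{cases}$
  • $g(\mathbf{x}) = \mathbf{a}^t\mathbf{y}$ separates $D_n$ without error:
    $\mathbf{a}^t\mathbf{y}_i z_i > 0$, $i = 1, \dots, n$
  • $\mathbf{a}$: separating vector, solution vector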
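The separability condition is easy to test for a given candidate vector. The sketch below uses a small made-up sample `D` of augmented vectors (first component 1) with labels $z_i = \pm 1$; neither the data nor the candidate vectors come from the slides.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def separates(a, samples):
    """True if a^t y_i z_i > 0 for every (y_i, z_i) in the sample."""
    return all(z * dot(a, y) > 0 for y, z in samples)


# Hypothetical augmented samples (y_0 = 1) with labels z = +1 (C1) or -1 (C2).
D = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
     ([1.0, 0.0, 0.0], -1), ([1.0, -1.0, 1.0], -1)]

print(separates([-2.0, 1.0, 1.0], D))  # a separating vector for D
print(separates([0.0, 1.0, 0.0], D))   # fails: a^t y z = 0 on the third sample
```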

• Linear separability
  [Figure: two panels in the $(y_1, y_2)$ plane, each showing a solution vector $\mathbf{a}$ and its solution region; panel labels: separating plane, "separating" plane.]

• Separation margin:
  $m_i = g(\mathbf{x}_i) z_i = \mathbf{a}^t\mathbf{y}_i z_i$
  • separation with a margin $b$:
    $m_i = \mathbf{a}^t\mathbf{y}_i z_i > b$, $i = 1, \dots, n$
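The margins $m_i$ can be computed directly. The sketch below reuses a hypothetical sample `D` and a hypothetical separating vector; it also shows that $m_i$ scales with $\mathbf{a}$, so a margin requirement $b$ only constrains $\mathbf{a}$ together with its length.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def margins(a, samples):
    """m_i = a^t y_i z_i for every sample."""
    return [z * dot(a, y) for y, z in samples]


def separates_with_margin(a, samples, b):
    """True if m_i > b for every sample."""
    return all(m > b for m in margins(a, samples))


# Hypothetical augmented samples and separating vector.
D = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
     ([1.0, 0.0, 0.0], -1), ([1.0, -1.0, 1.0], -1)]
a = [-2.0, 1.0, 1.0]

print(margins(a, D))
print(separates_with_margin(a, D, 1.0), separates_with_margin(a, D, 2.0))
# Doubling a doubles every margin, so the larger requirement is now met.
print(separates_with_margin([2 * ai for ai in a], D, 2.0))
```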

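• Separation margin
  [Figure: two panels in weight space $(a_1, a_2)$ with the samples $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3$; requiring a margin $b$ shrinks the solution region by moving each bounding hyperplane a distance $b/\|\mathbf{y}_i\|$.]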

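• Gradient descent procedures
  • criterion function: $J(\mathbf{a})$, minimized if $\mathbf{a}$ is a solution
  • $\mathbf{a}(k+1) = \mathbf{a}(k) - \eta(k)\nabla J(\mathbf{a}(k))$
  • $\eta(k)$: learning rate

  DESCENTEDEGRADIENTSIMPLE($\theta$, $\eta(\cdot)$, $\mathbf{a}_0$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $\mathbf{a} \leftarrow \mathbf{a} - \eta(k)\nabla J(\mathbf{a})$
    until $|\eta(k)\nabla J(\mathbf{a})| < \theta$
    return $\mathbf{a}$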
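A minimal Python sketch of the procedure, under a few assumptions: the criterion is a hypothetical quadratic with minimum at $(1, -2)$, the gradient is supplied as a function, the stopping test is applied to the step just taken, and `max_iter` is a safety cap that is not part of the pseudocode.

```python
import math


def gradient_descent(grad, a0, eta, theta, max_iter=10_000):
    """Iterate a <- a - eta(k) grad(a) until the step is shorter than theta."""
    a = list(a0)
    for k in range(1, max_iter + 1):
        step = [eta(k) * gi for gi in grad(a)]
        a = [ai - si for ai, si in zip(a, step)]
        if math.sqrt(sum(s * s for s in step)) < theta:
            break
    return a


# Hypothetical criterion J(a) = (a1 - 1)^2 + (a2 + 2)^2 and its gradient.
def grad_J(a):
    return [2.0 * (a[0] - 1.0), 2.0 * (a[1] + 2.0)]


a = gradient_descent(grad_J, [0.0, 0.0], eta=lambda k: 0.1, theta=1e-8)
print(a)  # close to the minimizer (1, -2)
```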

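• Newton descent
  • $J(\mathbf{a}) \simeq J(\mathbf{a}(k)) + \nabla J^t(\mathbf{a} - \mathbf{a}(k)) + \frac{1}{2}(\mathbf{a} - \mathbf{a}(k))^t\mathbf{H}(\mathbf{a} - \mathbf{a}(k))$
  • Hessian matrix: $H_{ij} = \frac{\partial^2 J}{\partial a_i\,\partial a_j}$
  • $\mathbf{a}(k+1) = \mathbf{a}(k) - \mathbf{H}^{-1}\nabla J$

  DESCENTEDENEWTON($\theta$, $\mathbf{a}_0$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    do
      $\mathbf{a} \leftarrow \mathbf{a} - \mathbf{H}^{-1}\nabla J(\mathbf{a})$
    until $|\mathbf{H}^{-1}\nabla J(\mathbf{a})| < \theta$
    return $\mathbf{a}$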
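The Newton update can be sketched without forming $\mathbf{H}^{-1}$ explicitly, by solving $\mathbf{H}\mathbf{s} = \nabla J$ for the step $\mathbf{s}$. Everything specific below is an assumption made for illustration: the quadratic criterion, the small Gaussian-elimination helper `solve` (which assumes a nonsingular matrix), and the `max_iter` cap. On a quadratic criterion the second-order expansion is exact, so a single step reaches the minimum.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def newton_descent(grad, hessian, a0, theta, max_iter=100):
    """Iterate a <- a - H^{-1} grad(a) until the step is shorter than theta."""
    a = list(a0)
    for _ in range(max_iter):
        step = solve(hessian(a), grad(a))
        a = [ai - si for ai, si in zip(a, step)]
        if math.sqrt(sum(s * s for s in step)) < theta:
            break
    return a


# Hypothetical criterion J(a) = a1^2 + a1 a2 + 2 a2^2 - 3 a1.
def grad_J(a):
    return [2.0 * a[0] + a[1] - 3.0, a[0] + 4.0 * a[1]]


def hessian_J(a):
    return [[2.0, 1.0], [1.0, 4.0]]


a = newton_descent(grad_J, hessian_J, [0.0, 0.0], theta=1e-10)
print(a)  # the minimizer is (12/7, -3/7)
```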

• Newton descent
  [Figure: the criterion surface $J(\mathbf{a})$ over the $(a_1, a_2)$ plane.]

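• The perceptron
  • $J_p(\mathbf{a}) = \sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,(-\mathbf{a}^t\mathbf{y}_i z_i)$
  • $\nabla J_p = \sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,(-\mathbf{y}_i z_i)$
  • $\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,\mathbf{y}_i z_i$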

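• The perceptron

  PERCEPTRONBATCH($\theta$, $\eta(\cdot)$, $\mathbf{a}_0$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $\mathbf{a} \leftarrow \mathbf{a} + \eta(k)\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,\mathbf{y}_i z_i$
    until $|\eta(k)\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,\mathbf{y}_i z_i| < \theta$
    return $\mathbf{a}$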
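A sketch of the batch perceptron in Python, run on a small hypothetical separable sample `D` (augmented vectors with labels $\pm 1$). The stopping test is applied to the step just taken, which is zero exactly when no sample is misclassified; `max_iter` is a safety cap added for the example.

```python
import math


def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def batch_perceptron(samples, a0, eta, theta, max_iter=1000):
    """Add eta(k) times the sum of z_i y_i over misclassified samples."""
    a = list(a0)
    for k in range(1, max_iter + 1):
        total = [0.0] * len(a)
        for y, z in samples:
            if z * dot(a, y) <= 0:  # misclassified (or on the boundary)
                total = [t + z * yi for t, yi in zip(total, y)]
        step = [eta(k) * t for t in total]
        a = [ai + si for ai, si in zip(a, step)]
        if math.sqrt(sum(s * s for s in step)) < theta:
            break
    return a


# Hypothetical augmented samples (y_0 = 1) with labels z = +1 or -1.
D = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
     ([1.0, 0.0, 0.0], -1), ([1.0, -1.0, 1.0], -1)]

a = batch_perceptron(D, [0.0, 0.0, 0.0], eta=lambda k: 1.0, theta=1e-12)
print(a, [z * dot(a, y) for y, z in D])
```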

• Criterion functions
  [Figure: four criterion functions plotted over the weight space $(a_1, a_2)$ for the samples $\mathbf{y}_1, \mathbf{y}_2, \mathbf{y}_3$; panel titles: $J(\mathbf{a})$, $J_p(\mathbf{a})$, $J_q(\mathbf{a})$, $J_r(\mathbf{a})$.]

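• The online perceptron

  PERCEPTRONENLIGNE($\mathbf{a}_0$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow (k \bmod n) + 1$
      if $\mathbf{a}^t\mathbf{y}_k z_k \le 0$ then    ▷ $\mathbf{y}_k$ is misclassified
        $\mathbf{a} \leftarrow \mathbf{a} + \mathbf{y}_k z_k$
    until $\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}} = 0$    ▷ no errors
    return $\mathbf{a}$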

• Theorem
  • If the training set is linearly separable, the PERCEPTRONENLIGNE algorithm terminates at a solution vector after a finite number of corrections.
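The finite number of corrections can be observed on a toy problem. The sketch below is an illustration, not a proof: the sample `D` is hypothetical and separable, samples are visited cyclically with 0-based indices, and `max_corrections` is a safety cap for data that might not be separable.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def online_perceptron(samples, a0, max_corrections=10_000):
    """Fixed-increment single-sample perceptron; returns (a, corrections)."""
    a = list(a0)
    k = 0  # 0-based index of the sample to examine next
    corrections = 0
    while any(z * dot(a, y) <= 0 for y, z in samples):
        y, z = samples[k]
        if z * dot(a, y) <= 0:  # y_k is misclassified
            a = [ai + z * yi for ai, yi in zip(a, y)]
            corrections += 1
            if corrections > max_corrections:
                raise RuntimeError("no separating vector found")
        k = (k + 1) % len(samples)
    return a, corrections


# Hypothetical augmented samples (y_0 = 1) with labels z = +1 or -1.
D = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
     ([1.0, 0.0, 0.0], -1), ([1.0, -1.0, 1.0], -1)]

a, corrections = online_perceptron(D, [0.0, 0.0, 0.0])
print(a, corrections)
```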

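• The online perceptron with margin and variable increment

  PERCEPTRONENLIGNEMARGEVARIABLE($\eta(\cdot)$, $\mathbf{a}_0$, $b$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $k' \leftarrow ((k - 1) \bmod n) + 1$
      if $\mathbf{a}^t\mathbf{y}_{k'} z_{k'} \le b$ then
        $\mathbf{a} \leftarrow \mathbf{a} + \eta(k)\,\mathbf{y}_{k'} z_{k'}$
    until $\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}} = 0$    ▷ no errors with respect to the margin $b$
    return $\mathbf{a}$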
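A sketch of the margin variant in Python. The sample `D`, the margin `b = 1` and the constant schedule `eta(k) = 1` are choices made for the example (any schedule can be passed as a function of `k`); `max_iter` is an added safety cap.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def online_perceptron_margin(samples, a0, eta, b, max_iter=100_000):
    """Single-sample perceptron: correct whenever a^t y z <= b."""
    a = list(a0)
    n = len(samples)
    for k in range(1, max_iter + 1):
        y, z = samples[(k - 1) % n]  # cycle through the samples
        if z * dot(a, y) <= b:       # margin violated
            a = [ai + eta(k) * z * yi for ai, yi in zip(a, y)]
        if all(zi * dot(a, yi) > b for yi, zi in samples):
            return a
    raise RuntimeError("margin not reached within max_iter steps")


# Hypothetical augmented samples (y_0 = 1) with labels z = +1 or -1.
D = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
     ([1.0, 0.0, 0.0], -1), ([1.0, -1.0, 1.0], -1)]

a = online_perceptron_margin(D, [0.0, 0.0, 0.0], eta=lambda k: 1.0, b=1.0)
print(a, [z * dot(a, y) for y, z in D])
```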

• Convergence conditions
  • $\eta(k) \ge 0$
  • $\lim_{m \to \infty} \sum_{k=1}^{m} \eta(k) = \infty$
  • $\lim_{m \to \infty} \frac{\sum_{k=1}^{m} \eta^2(k)}{\left(\sum_{k=1}^{m} \eta(k)\right)^2} = 0$
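Partial sums give a feel for these conditions, although they can only suggest the limits, not prove them. The two schedules below are examples chosen here: $\eta(k) = 1/k$ is consistent with all three conditions, while $\eta(k) = 1/k^2$ has a bounded sum and so violates the second one.

```python
def partial_sums(eta, m):
    """Return (sum of eta(k), sum of eta(k)^2 / (sum of eta(k))^2) for k = 1..m."""
    s1 = sum(eta(k) for k in range(1, m + 1))
    s2 = sum(eta(k) ** 2 for k in range(1, m + 1))
    return s1, s2 / s1 ** 2


def harmonic(k):
    return 1.0 / k


def inverse_square(k):
    return 1.0 / k ** 2


for m in (10, 1000, 100000):
    print(m, partial_sums(harmonic, m), partial_sums(inverse_square, m))
```

For `harmonic` the first number keeps growing and the second keeps shrinking; for `inverse_square` the first number levels off.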

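• The variable-increment batch perceptron
  • $\mathbf{y}^{(k)} = \sum_{i=1}^{n} I_{\{\mathbf{a}^t(k)\mathbf{y}_i z_i \le 0\}}\,\mathbf{y}_i z_i$

  PERCEPTRONBATCHVARIABLE($\eta(\cdot)$, $\mathbf{a}_0$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $\mathbf{a} \leftarrow \mathbf{a} + \eta(k)\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,\mathbf{y}_i z_i$
    until $\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}} = 0$
    return $\mathbf{a}$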

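• Relaxation procedures
  • $J_q(\mathbf{a}) = \sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le 0\}}\,(\mathbf{a}^t\mathbf{y}_i z_i)^2$
  • $J_r(\mathbf{a}) = \frac{1}{2}\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}}\,\frac{(\mathbf{a}^t\mathbf{y}_i z_i - b)^2}{\|\mathbf{y}_i z_i\|^2}$
  • $\nabla J_r = \sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}}\,\frac{\mathbf{a}^t\mathbf{y}_i z_i - b}{\|\mathbf{y}_i z_i\|^2}\,\mathbf{y}_i z_i$
  • $\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}}\,\frac{b - \mathbf{a}^t\mathbf{y}_i z_i}{\|\mathbf{y}_i z_i\|^2}\,\mathbf{y}_i z_i$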

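• Relaxation procedures

  RELAXATIONBATCHMARGE($\eta(\cdot)$, $\mathbf{a}_0$, $b$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $\mathbf{a} \leftarrow \mathbf{a} + \eta(k)\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}}\,\frac{b - \mathbf{a}^t\mathbf{y}_i z_i}{\|\mathbf{y}_i z_i\|^2}\,\mathbf{y}_i z_i$
    until $\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}} = 0$
    return $\mathbf{a}$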
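A sketch of batch relaxation with margin, on a tiny hypothetical one-dimensional problem (augmented samples $(1, x)$). The schedule `eta(k) = 1.5` and the margin `b = 1` are example values, the termination test is done before each update rather than after it, and `max_iter` is an added safety cap.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def batch_relaxation(samples, a0, eta, b, max_iter=10_000):
    """Batch relaxation with margin b; stops when every a^t y z exceeds b."""
    a = list(a0)
    for k in range(1, max_iter + 1):
        violators = [(y, z) for y, z in samples if z * dot(a, y) <= b]
        if not violators:
            return a
        total = [0.0] * len(a)
        for y, z in violators:
            # ||y z||^2 equals ||y||^2 because z is +1 or -1
            coeff = (b - z * dot(a, y)) / dot(y, y)
            total = [t + coeff * z * yi for t, yi in zip(total, y)]
        a = [ai + eta(k) * t for ai, t in zip(a, total)]
    raise RuntimeError("margin not reached within max_iter updates")


# Hypothetical data: x = 2 in C1 and x = -1 in C2, augmented with y_0 = 1.
D1 = [([1.0, 2.0], 1), ([1.0, -1.0], -1)]

a = batch_relaxation(D1, [0.0, 0.0], eta=lambda k: 1.5, b=1.0)
print(a, [z * dot(a, y) for y, z in D1])
```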

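• Online relaxation

  RELAXATIONENLIGNEMARGE($\eta(\cdot)$, $\mathbf{a}_0$, $b$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $k' \leftarrow ((k - 1) \bmod n) + 1$
      if $\mathbf{a}^t\mathbf{y}_{k'} z_{k'} \le b$ then
        $\mathbf{a} \leftarrow \mathbf{a} + \eta(k)\,\frac{b - \mathbf{a}^t\mathbf{y}_{k'} z_{k'}}{\|\mathbf{y}_{k'} z_{k'}\|^2}\,\mathbf{y}_{k'} z_{k'}$
    until $\sum_{i=1}^{n} I_{\{\mathbf{a}^t\mathbf{y}_i z_i \le b\}} = 0$
    return $\mathbf{a}$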
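The single-sample version can be sketched the same way. The data, the margin `b = 1` and the schedule `eta(k) = 1.5` are example values. The example deliberately uses a value above 1: with `eta(k) = 1` a correction would place $\mathbf{a}$ exactly on the hyperplane $\mathbf{a}^t\mathbf{y}_{k'} z_{k'} = b$, which the test $\le b$ still counts as a violation, so the loop would not end on that sample.

```python
def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def online_relaxation(samples, a0, eta, b, max_iter=100_000):
    """Single-sample relaxation with margin b."""
    a = list(a0)
    n = len(samples)
    for k in range(1, max_iter + 1):
        y, z = samples[(k - 1) % n]  # cycle through the samples
        m = z * dot(a, y)
        if m <= b:
            # ||y z||^2 equals ||y||^2 because z is +1 or -1
            coeff = eta(k) * (b - m) / dot(y, y)
            a = [ai + coeff * z * yi for ai, yi in zip(a, y)]
        if all(zi * dot(a, yi) > b for yi, zi in samples):
            return a
    raise RuntimeError("margin not reached within max_iter steps")


# Hypothetical data: x = 2 in C1 and x = -1 in C2, augmented with y_0 = 1.
D1 = [([1.0, 2.0], 1), ([1.0, -1.0], -1)]

a = online_relaxation(D1, [0.0, 0.0], eta=lambda k: 1.5, b=1.0)
print(a, [z * dot(a, y) for y, z in D1])
```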

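• Online relaxation
  • $r(k) = \frac{b - \mathbf{a}^t\mathbf{y}_{k'} z_{k'}}{\|\mathbf{y}_{k'} z_{k'}\|}$
  [Figure: in weight space, $\mathbf{a}(k)$ lies at distance $r(k)$ from the hyperplane $\mathbf{a}^t\mathbf{y}^k = b$; an update moves it a fraction $\eta$ of that distance toward the hyperplane.]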

• Online relaxation
  • $\eta > 1$: over-relaxation
  • $\eta < 1$: under-relaxation
  • convergence condition: $0 < \eta < 2$
  [Figure: two panels plotting $J(\mathbf{a})$ against $a_1$.]

• Behavior in the nonseparable case
  • error-correction procedures
    • work well if
      • the Bayes decision is approximately linear
      • the Bayes error is small
    • if $2\hat{d} > n$, the probability of non-separability is small

• Fixed increment
  • infinite loop
  • generates a finite-state process
  • average the weight vectors
• Variable increment
  • converges if $\eta(k) \to 0$

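• The squared-error approach (regression)
  • let $\mathbf{b} = (z_1, \dots, z_n)^t$
  • ideally we would like to find $\mathbf{a}$ such that $\mathbf{Y}\mathbf{a} = \mathbf{b}$
  • but we make errors $\mathbf{e} = \mathbf{Y}\mathbf{a} - \mathbf{b}$
  • $J_s(\mathbf{a}) = \|\mathbf{Y}\mathbf{a} - \mathbf{b}\|^2 = \sum_{i=1}^{n} (\mathbf{a}^t\mathbf{y}_i - b_i)^2$
  • $\nabla J_s(\mathbf{a}) = \sum_{i=1}^{n} 2(\mathbf{a}^t\mathbf{y}_i - b_i)\,\mathbf{y}_i = 2\mathbf{Y}^t(\mathbf{Y}\mathbf{a} - \mathbf{b})$
  • $\mathbf{Y}^t\mathbf{Y}\mathbf{a} = \mathbf{Y}^t\mathbf{b}$
  • $\mathbf{a} = (\mathbf{Y}^t\mathbf{Y})^{-1}\mathbf{Y}^t\mathbf{b} = \mathbf{Y}^{\dagger}\mathbf{b}$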
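The closed-form solution can be sketched by solving the normal equations $\mathbf{Y}^t\mathbf{Y}\mathbf{a} = \mathbf{Y}^t\mathbf{b}$ directly. The data matrix `Y` (rows are hypothetical augmented samples) and the helper `solve` are introduced for this example, and the sketch assumes $\mathbf{Y}^t\mathbf{Y}$ is nonsingular.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(Y, b):
    """Return a solving the normal equations Y^t Y a = Y^t b."""
    d = len(Y[0])
    YtY = [[sum(row[i] * row[j] for row in Y) for j in range(d)] for i in range(d)]
    Ytb = [sum(row[i] * bi for row, bi in zip(Y, b)) for i in range(d)]
    return solve(YtY, Ytb)


# Hypothetical augmented samples (rows of Y) and targets b_i = z_i.
Y = [[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, 0.0, 0.0], [1.0, -1.0, 1.0]]
b = [1.0, 1.0, -1.0, -1.0]

a = least_squares(Y, b)
errors = [sum(ai * yi for ai, yi in zip(a, y)) - bi for y, bi in zip(Y, b)]
print(a, errors)
```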

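• Widrow-Hoff procedure (LMS)
  • batch: $\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\,\mathbf{Y}^t(\mathbf{b} - \mathbf{Y}\mathbf{a}(k))$
  • online: $\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\,\mathbf{y}_{k'}(b_{k'} - \mathbf{a}^t(k)\mathbf{y}_{k'})$

  LMS($\theta$, $\eta(\cdot)$, $\mathbf{a}_0$)
    $\mathbf{a} \leftarrow \mathbf{a}_0$
    $k \leftarrow 0$
    do
      $k \leftarrow k + 1$
      $k' \leftarrow ((k - 1) \bmod n) + 1$
      $\mathbf{a} \leftarrow \mathbf{a} + \eta(k)\,\mathbf{y}_{k'}(b_{k'} - \mathbf{a}^t\mathbf{y}_{k'})$
    until $|\eta(k)\,\mathbf{y}_{k'}(b_{k'} - \mathbf{a}^t\mathbf{y}_{k'})| < \theta$
    return $\mathbf{a}$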
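A sketch of the online LMS rule in Python. The data, the constant schedule `eta(k) = 0.05` and the threshold are example values; the stopping test is applied to the step just taken, and `max_iter` is an added safety cap.

```python
import math


def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def lms(Y, b, a0, eta, theta, max_iter=100_000):
    """Widrow-Hoff rule: a <- a + eta(k) y_k (b_k - a^t y_k), cycling over samples."""
    a = list(a0)
    n = len(Y)
    for k in range(1, max_iter + 1):
        y, target = Y[(k - 1) % n], b[(k - 1) % n]
        residual = target - dot(a, y)
        step = [eta(k) * residual * yi for yi in y]
        a = [ai + si for ai, si in zip(a, step)]
        if math.sqrt(sum(s * s for s in step)) < theta:
            break
    return a


# Hypothetical augmented samples (rows of Y) and targets b_i = z_i.
Y = [[1.0, 2.0, 2.0], [1.0, 3.0, 1.0], [1.0, 0.0, 0.0], [1.0, -1.0, 1.0]]
b = [1.0, 1.0, -1.0, -1.0]

a = lms(Y, b, [0.0, 0.0, 0.0], eta=lambda k: 0.05, theta=1e-6)
print(a, [dot(a, y) - bi for y, bi in zip(Y, b)])
```

The rule never checks whether samples are correctly classified; it only reduces the squared error, which is why it can end away from a separating hyperplane on other data.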

• Widrow-Hoff procedure (LMS)
  • behaves well in the nonseparable case
  • does not necessarily converge to a separating hyperplane in the separable case

• The support vector machine (SVM)
  • objective: find a separating hyperplane with a large margin $z_i g(\mathbf{y}_i) = z_i \mathbf{a}^t\mathbf{y}_i$
  • maximize $b$ subject to $\frac{z_i g(\mathbf{y}_i)}{\|\mathbf{a}\|} \ge b$, $i = 1, \dots, n$
  [Figure: samples in the $(y_1, y_2)$ plane with regions $R_1$ and $R_2$, the optimal hyperplane, and the maximal margin on either side of it.]
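The quantity being maximized can at least be evaluated for a given vector. The sketch below does not solve the SVM optimization problem; it only computes $\min_i z_i \mathbf{a}^t\mathbf{y}_i / \|\mathbf{a}\|$, the largest $b$ that a candidate $\mathbf{a}$ satisfies, for two hypothetical separating vectors of a made-up sample.

```python
import math


def dot(u, v):
    """Inner product u^t v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normalized_margin(a, samples):
    """min_i z_i a^t y_i / ||a||: the largest b satisfied by a."""
    return min(z * dot(a, y) for y, z in samples) / math.sqrt(dot(a, a))


# Hypothetical augmented samples (y_0 = 1) with labels z = +1 or -1.
D = [([1.0, 2.0, 2.0], 1), ([1.0, 3.0, 1.0], 1),
     ([1.0, 0.0, 0.0], -1), ([1.0, -1.0, 1.0], -1)]

# Two hypothetical separating vectors for D.
a_wide = [-2.0, 1.0, 1.0]
a_narrow = [-1.0, 6.0, 2.0]
print(normalized_margin(a_wide, D), normalized_margin(a_narrow, D))
```

Both vectors separate the sample, but the first leaves a larger normalized margin, which is the sense in which the slide prefers one separating hyperplane over another.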