(1)

The Lasso

The component-wise solution

S.Canu

APPS ITI /INSA de Rouen Normandie

October 11, 2021

(2)

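The Lasso

Data
- $X \in \mathbb{R}^{n \times p}$: the $p$ variables, observed $n$ times
- $y \in \mathbb{R}^n$: the response variable

Unknowns
- $\beta \in \mathbb{R}^p$

Assumptions: no bias (no constant term)
- mean(X) = 0
- $\|X_j\|^2 = 1$
- mean(y) = 0

The penalized cost

$$J_\lambda(\beta) = \tfrac{1}{2}\|X\beta - y\|^2 + \lambda\|\beta\|_1$$

The optimality conditions

$$0 \in \partial J_\lambda(\beta) = X^\top(X\beta - y) + \lambda\,\partial(\|\beta\|_1)$$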
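The assumptions and the penalized cost above can be sketched in NumPy as follows. This is only an illustrative sketch: the helper names `standardize` and `lasso_cost` and the random data are assumptions, not part of the slides.

```python
import numpy as np

def standardize(X, y):
    """Center X and y, and scale each column of X to unit Euclidean norm."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)  # assumes no column is constant
    yc = y - y.mean()
    return Xc, yc

def lasso_cost(X, y, beta, lam):
    """J_lambda(beta) = 1/2 ||X beta - y||^2 + lam * ||beta||_1."""
    residual = X @ beta - y
    return 0.5 * residual @ residual + lam * np.abs(beta).sum()

# Made-up data, for illustration only
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 5))
y_raw = rng.normal(size=50)
X, y = standardize(X_raw, y_raw)
cost_at_zero = lasso_cost(X, y, np.zeros(5), lam=1.0)
```

At $\beta = 0$ the penalty vanishes, so `cost_at_zero` equals $\tfrac{1}{2}\|y\|^2$.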

(3)

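The notion of regularization path

$$\min_{\beta \in \mathbb{R}^p} J_\lambda(\beta) \quad \text{with} \quad J_\lambda(\beta) = \tfrac{1}{2}\|X\beta - y\|^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

$$\partial J(\beta) = X^\top(X\beta - y) + \lambda\, s(\beta) \quad \text{with} \quad s_j(\beta) = \begin{cases} 1 & \text{if } \beta_j > 0 \\ [-1, 1] & \text{if } \beta_j = 0 \\ -1 & \text{if } \beta_j < 0 \end{cases}$$

$\beta$ is optimal if $0 \in \partial J(\beta)$, that is, with $g(\beta) = X^\top(X\beta - y)$, if

$$\begin{cases} g_j(\beta) + \lambda = 0 & \text{if } \beta_j > 0 \\ -\lambda \le g_j(\beta) \le \lambda & \text{if } \beta_j = 0 \\ g_j(\beta) - \lambda = 0 & \text{if } \beta_j < 0 \end{cases}$$

An interesting case: at $\beta = 0$ we have $g(0) = -X^\top y$, so

$$\text{if } \lambda \ge \max_j |(X^\top y)_j| \text{ then } \beta = 0.$$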
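The optimality conditions can be checked numerically. The sketch below assumes hypothetical helper names (`lambda_max`, `kkt_violation`) and random data. It returns the largest violation of the three cases, which is zero at an optimal $\beta$.

```python
import numpy as np

def lambda_max(X, y):
    """Smallest lambda for which beta = 0 is optimal: max_j |(X^T y)_j|."""
    return np.max(np.abs(X.T @ y))

def kkt_violation(X, y, beta, lam):
    """Largest violation of the optimality conditions (0 when beta is optimal)."""
    g = X.T @ (X @ beta - y)
    viol = np.where(beta > 0, np.abs(g + lam),                  # g_j + lam = 0
           np.where(beta < 0, np.abs(g - lam),                  # g_j - lam = 0
                    np.maximum(np.abs(g) - lam, 0.0)))          # |g_j| <= lam
    return viol.max()

# Made-up data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
lam = lambda_max(X, y)
violation_at_zero = kkt_violation(X, y, np.zeros(4), lam)        # beta = 0 is optimal
violation_below = kkt_violation(X, y, np.zeros(4), 0.5 * lam)    # beta = 0 is not optimal
```

With $\lambda = \lambda_{\max}$ the zero vector satisfies the conditions. With half that value it no longer does, so at least one component must become nonzero.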

(4)

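The notion of regularization path

The regularization path is the map $\lambda \mapsto \beta(\lambda)$:

- for $\lambda$ large, $\beta = 0$, which requires $|(X^\top y)_j| \le \lambda$ for every $j$;
- the first nonzero component of the vector $\beta$ appears at $\lambda = \max_j |(X^\top y)_j|$;
- the regularization path is piecewise linear.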
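A short derivation shows why the path is piecewise linear. It assumes that, on some interval of $\lambda$, the active set $A = \{j : \beta_j \neq 0\}$ and the signs $s_A = \mathrm{sign}(\beta_A)$ do not change, and that $X_A$ (the columns of $X$ indexed by $A$) has full column rank. The notation $A$, $s_A$, $X_A$ is introduced here for illustration.

```latex
\begin{aligned}
X_A^\top (X_A \beta_A - y) + \lambda\, s_A &= 0
  && \text{(optimality conditions on the active set)} \\
\beta_A(\lambda) &= (X_A^\top X_A)^{-1} \left( X_A^\top y - \lambda\, s_A \right)
  && \text{(affine in } \lambda \text{)}
\end{aligned}
```

Under these assumptions $\beta(\lambda)$ is affine on the interval. The slope changes only when a component enters or leaves the active set.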

(5)

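CW algorithm for the Lasso

$$J_\lambda(\beta) = \tfrac{1}{2}\|X\beta - y\|^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Iterate until convergence:
    For $j = 1, \dots, p$:
    - fix all the variables except $\beta_j$;
    - solve the Lasso in one dimension:

$$\hat\beta_j = \arg\min_{\beta \in \mathbb{R}} J_\lambda^{(j)}(\beta) \qquad (1)$$

The cost of the Lasso in one dimension:

$$J_\lambda^{(j)}(\beta) = \tfrac{1}{2}\left\| X_j \beta - \left( y - X\beta^{(-j)} \right) \right\|^2 + \lambda |\beta|$$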

(6)

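The Lasso in 1d

$$\min_{\beta \in \mathbb{R}} J_\lambda^{(1d)}(\beta) \quad \text{with} \quad J_\lambda^{(1d)}(\beta) = \tfrac{1}{2}\|x\beta - r\|^2 + \lambda|\beta|$$

$$\partial_\beta J_\lambda^{(1d)}(\beta) = x^\top(x\beta - r) + \begin{cases} \lambda\alpha, \ \alpha \in [-1, 1] & \text{if } \beta = 0 \\ \lambda\,\mathrm{sign}(\beta) & \text{else.} \end{cases}$$

$$0 \in \partial_\beta J_\lambda^{(1d)}(\beta) \iff \begin{cases} \exists\, \alpha \in [-1, 1],\ -x^\top r + \lambda\alpha = 0 & \text{if } \beta = 0 \\ \|x\|^2 \beta - x^\top r + \lambda\,\mathrm{sign}(\beta) = 0 & \text{else.} \end{cases}$$

The differentiable part:

- if $\beta > 0$: $\beta = \dfrac{x^\top r - \lambda}{\|x\|^2}$, which works when $x^\top r > \lambda$;
- if $\beta < 0$: $\beta = \dfrac{x^\top r + \lambda}{\|x\|^2}$, which works when $x^\top r < -\lambda$.

There remains the non-differentiable part: when $-\lambda \le x^\top r \le \lambda$, then $\beta = 0$.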

(7)

The Lasso in 1d

If $\|x\|^2 = 1$, the differentiable and non-differentiable cases combine into

$$\hat\beta^{\,Lasso1d} = \begin{cases} 0 & \text{if } |x^\top r| \le \lambda \\ \mathrm{sign}(x^\top r)\left( |x^\top r| - \lambda \right) & \text{else.} \end{cases}$$

Or in one line:

$$\hat\beta^{\,Lasso1d} = \mathrm{sign}(x^\top r)\, \max\left( |x^\top r| - \lambda,\ 0 \right)$$
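The one-line formula can be sketched and sanity-checked as follows. The function name `lasso_1d` and the test data are assumptions. The division by $\|x\|^2$ covers the general case and is a no-op when $\|x\|^2 = 1$. The brute-force grid search is only there to confirm the closed form.

```python
import numpy as np

def lasso_1d(x, r, lam):
    """Minimizer over scalar b of 1/2 ||x*b - r||^2 + lam*|b| (soft thresholding)."""
    xr = x @ r
    return np.sign(xr) * max(abs(xr) - lam, 0.0) / (x @ x)

# Made-up data, for illustration only
rng = np.random.default_rng(1)
x = rng.normal(size=20)
x /= np.linalg.norm(x)          # unit norm, as assumed in the slides
r = rng.normal(size=20)
lam = 0.3
b_closed = lasso_1d(x, r, lam)

# Brute-force check: evaluate the 1d cost on a fine grid and take the best point
grid = np.linspace(-5.0, 5.0, 200001)
costs = 0.5 * ((x @ x) * grid**2 - 2.0 * (x @ r) * grid + r @ r) + lam * np.abs(grid)
b_grid = grid[np.argmin(costs)]
```

The two values agree up to the grid resolution. For any $\lambda \ge |x^\top r|$ the function returns zero.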

(8)

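CW algorithm for the Lasso

For $j = 1, \dots, p$: $\hat\beta_j = \arg\min_{\beta \in \mathbb{R}} J^{(j)}(\beta)$

$$J^{(j)}(\beta) = \tfrac{1}{2}\|y - X\beta^{(-j)} - X_j\beta\|^2 + \lambda|\beta| = \tfrac{1}{2}\|r - X_j\beta\|^2 + \lambda|\beta| \qquad (2)$$

with $r = y - X\beta^{(-j)}$ and $\beta^{(-j)} = (\beta_1, \dots, \beta_{j-1}, 0, \beta_{j+1}, \dots, \beta_p)$.

Maths                                                                   | Code
For $j = 1, \dots, p$                                                   | for j in range(p):
$\beta^{(-j)} = (\beta_1, \dots, \beta_{j-1}, 0, \beta_{j+1}, \dots, \beta_p)$ |     bj = beta.copy(); bj[j] = 0
$r = y - X\beta^{(-j)}$                                                 |     r = y - X @ bj
$\hat\beta_j = \arg\min_{\beta \in \mathbb{R}} J^{(j)}(\beta, X_j, r)$  |     beta[j] = sign(x.T @ r) * max(abs(x.T @ r) - lam, 0), with x = X[:, j]
end for                                                                 | (end of loop)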
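The Maths/Code correspondence above can be turned into a runnable sketch. The function names, the fixed number of sweeps (used here in place of a convergence test) and the synthetic data are assumptions. The update is valid as written only when every column of `X` has unit norm.

```python
import numpy as np

def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cw_naive(X, y, lam, n_sweeps=200):
    """Component-wise Lasso, assuming every column of X has unit norm."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_sweeps):              # fixed number of sweeps (assumption)
        for j in range(p):
            bj = beta.copy()               # copy, so that beta itself is untouched
            bj[j] = 0.0                    # beta^(-j)
            r = y - X @ bj                 # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, lam)
    return beta

# Made-up data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = rng.normal(size=40)
y -= y.mean()
lam = 0.3 * np.max(np.abs(X.T @ y))
beta = lasso_cw_naive(X, y, lam)
```

Recomputing `r = y - X @ bj` for each component costs a full matrix-vector product per update. This keeps the sketch close to the mathematics but is not efficient.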

(9)

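Coding

1. Begin with $\beta = 0$.
2. In that case $r \leftarrow y$, since $X\beta = 0$.
3. Fit a single component (the $j$-th) of the vector $\beta$: $\beta[j] = \mathrm{sign}(x^\top r)\,\max(|x^\top r| - \lambda,\ 0)$, with $x = X[:, j]$.
4. Update $r$: $r \leftarrow r - X[:, j]\,\beta[j]$.
5. Choose another component $j'$.
6. $r \leftarrow r + X[:, j']\,\beta[j']$.
7. Fit the other component $j'$: $\beta[j'] = \mathrm{sign}(x^\top r)\,\max(|x^\top r| - \lambda,\ 0)$, with $x = X[:, j']$.
8. Update $r$: $r \leftarrow r - X[:, j']\,\beta[j']$.
9. Iterate (go to 5).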
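The nine steps above can be sketched with an explicit residual update, so that each component update costs two vector operations instead of a full matrix product. The function name `lasso_cw`, the cyclic order of the components and the stopping rule (largest coefficient change in a sweep below `tol`) are assumptions. Unit-norm columns are assumed, as in the slides.

```python
import numpy as np

def soft_threshold(z, lam):
    """sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cw(X, y, lam, tol=1e-10, max_sweeps=1000):
    """Component-wise Lasso with residual updates (unit-norm columns assumed)."""
    p = X.shape[1]
    beta = np.zeros(p)                    # step 1: beta = 0
    r = y.astype(float)                   # step 2: r = y (astype makes a copy)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):                # step 5: cyclic choice (assumption)
            xj = X[:, j]
            old = beta[j]
            r += xj * old                 # step 6: put component j back in r
            beta[j] = soft_threshold(xj @ r, lam)   # steps 3 and 7
            r -= xj * beta[j]             # steps 4 and 8
            max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:              # stopping rule (assumption)
            break
    return beta, r

# Made-up data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = rng.normal(size=40)
y -= y.mean()
lam = 0.3 * np.max(np.abs(X.T @ y))
beta, r = lasso_cw(X, y, lam)
```

On the first pass `old` is zero, so step 6 changes nothing. This is why the slides start with steps 3 and 4 only.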

(10)

Sklearn and the Lasso

- Lasso, LassoCV, lasso_path
- LassoLars, LassoLarsCV, lars_path
- sklearn.decomposition.sparse_encode
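A minimal usage sketch for `sklearn.linear_model.Lasso`, on made-up data. One point needs care: scikit-learn documents its objective as $\frac{1}{2n}\|y - Xw\|^2 + \alpha\|w\|_1$, so matching the cost $J_\lambda$ of these slides requires $\alpha = \lambda / n$. Setting `fit_intercept=False` reflects the no-constant-term assumption. The data and the value of `lam` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Made-up data, centered and with unit-norm columns as in the slides
rng = np.random.default_rng(0)
n, p = 60, 8
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=n)
y -= y.mean()

lam = 0.2
# scikit-learn scales the quadratic term by 1/(2n), hence alpha = lam / n
model = Lasso(alpha=lam / n, fit_intercept=False, tol=1e-10, max_iter=100000)
model.fit(X, y)
beta = model.coef_

# Gradient of the quadratic part, to compare with the optimality conditions
g = X.T @ (X @ beta - y)
```

With this scaling, `g` should satisfy $|g_j| \le \lambda$ up to solver tolerance, with equality on the nonzero components.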

(11)
(12)

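Lasso and Elastic Net

The Lasso

$$J_\lambda(\beta) = \tfrac{1}{2}\|X'\beta - y'\|^2 + \lambda\|\beta\|_1$$

Elastic Net

$$J_{el}(\beta) = \tfrac{1}{2}\|X\beta - y\|^2 + \lambda\|\beta\|_1 + \tfrac{\gamma}{2}\|\beta\|_2^2$$

- Lasso = Elastic Net with $\gamma = 0$, $X = X'$, $y = y'$
- Elastic Net = Lasso with $X' = (X^\top, \sqrt{\gamma}\, I_p)^\top$, $y' = (y^\top, 0_p^\top)^\top$

The ridge term carries the factor $\tfrac{\gamma}{2}$ so that, once the common factor $\tfrac{1}{2}$ is taken out, the two costs match term by term:

$$\|X\beta - y\|_2^2 + \underbrace{\gamma\|\beta\|_2^2}_{\|\sqrt{\gamma}\,\beta\|_2^2} = \left\| \underbrace{\begin{pmatrix} X \\ \sqrt{\gamma}\, I_p \end{pmatrix}}_{=:X'} \beta - \underbrace{\begin{pmatrix} y \\ 0_p \end{pmatrix}}_{=:y'} \right\|_2^2 = \|X'\beta - y'\|_2^2.$$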

Regularization and variable selection via the elastic net H Zou, T Hastie, 2005
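The augmentation identity can be verified numerically. The sketch below uses a hypothetical helper `augment` and random data. It only checks that the squared norm on the augmented data equals the squared residual plus $\gamma\|\beta\|_2^2$.

```python
import numpy as np

def augment(X, y, gamma):
    """Stack sqrt(gamma) * I_p under X and p zeros under y."""
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(gamma) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug

# Made-up data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
beta = rng.normal(size=3)
gamma = 0.7

X_aug, y_aug = augment(X, y, gamma)
lhs = np.sum((X @ beta - y) ** 2) + gamma * np.sum(beta ** 2)
rhs = np.sum((X_aug @ beta - y_aug) ** 2)
```

Even when the columns of `X` have unit norm, the columns of `X_aug` have squared norm $1 + \gamma$. A component-wise solver applied to the augmented problem therefore has to divide by $\|x\|^2$, as in the general 1d formula.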

(13)

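Reweighted least squares (Lasso and Ridge)

$$J_\lambda(\beta) = \tfrac{1}{2}\|X\beta - y\|^2 + \lambda \sum_{j=1}^{p} |\beta_j|, \qquad |\beta_j| = \frac{\beta_j^2}{|\beta_j|}$$

The idea: iterate the weighted ridge towards a fixed point,

$$\beta^{(k+1)} = \arg\min_\beta\ \tfrac{1}{2}\|X\beta - y\|^2 + \frac{\lambda}{2} \sum_{j=1}^{p} \frac{\beta_j^2}{|\beta_j^{(k)}|}, \qquad \frac{\beta_j^2}{|\beta_j^{(k)}|} = w_j \beta_j^2,$$

with $w_j = 1/|\beta_j^{(k)}|$, that is

$$\beta^{(k+1)} = (X^\top X + \lambda W)^{-1} X^\top y \quad \text{with} \quad \mathrm{diag}(W)_j = \frac{1}{|\beta_j^{(k)}|}.$$

The factor $\tfrac{1}{2}$ on the penalty is what makes the closed form involve $\lambda W$ (and not $2\lambda W$). At a fixed point it gives $X^\top(X\beta - y) + \lambda\,\mathrm{sign}(\beta) = 0$ on the nonzero components, which is the Lasso optimality condition.

W = Identity
While not converged
    $\beta = \arg\min\ \tfrac{1}{2}\|X\beta - y\|^2 + \tfrac{\lambda}{2}\,\beta^\top W \beta$
    $W = \mathrm{diag}(1/|\beta|)$
end while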
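The loop above can be sketched as follows, using the closed form $(X^\top X + \lambda W)^{-1} X^\top y$ at each iteration. The function name `lasso_irls`, the fixed iteration count and the small floor `eps` (which avoids dividing by zero when a component vanishes) are assumptions added for this sketch. The demo uses a design with orthonormal columns, so that the exact Lasso solution is known by soft thresholding and can serve as a reference.

```python
import numpy as np

def lasso_irls(X, y, lam, n_iter=500, eps=1e-10):
    """Iteratively reweighted ridge for the Lasso (sketch)."""
    p = X.shape[1]
    G = X.T @ X
    Xty = X.T @ y
    w = np.ones(p)                                   # W = identity
    for _ in range(n_iter):                          # fixed iteration count (assumption)
        beta = np.linalg.solve(G + lam * np.diag(w), Xty)
        w = 1.0 / np.maximum(np.abs(beta), eps)      # eps floor: safeguard, not in the slides
    return beta

# Demo on a design with orthonormal columns (special case chosen for checking)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(30, 4)))        # Q.T @ Q = I
z = np.array([3.0, -2.0, 0.5, 0.1])
y = Q @ z                                            # so that Q.T @ y = z
lam = 1.0

beta_irls = lasso_irls(Q, y, lam)
beta_exact = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # soft thresholding
```

The reweighted iterates only approach zero, up to the `eps` floor, and never reach it exactly. In practice small components are thresholded afterwards.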

(14)

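The adaptive Lasso

$$J_\lambda(\beta) = \tfrac{1}{2}\|X\beta - y\|^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j|, \qquad w_j = \frac{1}{|\hat\beta_j^{(ls)}|^{\gamma}}$$

where $\hat\beta^{(ls)}$ is the least-squares estimate. With the change of variable $c_j = w_j \beta_j$ this is a plain Lasso on rescaled columns. In code, with $\gamma = 1/2$ and `mon_lasso` standing for any plain Lasso solver:

    import numpy as np
    beta_ls = np.linalg.solve(X.T @ X, X.T @ y)  # least-squares estimate
    w = 1 / np.sqrt(np.abs(beta_ls))             # weights w_j, here gamma = 1/2
    X_c = X / w                                  # column j divided by w_j (no longer unit norm)
    c = mon_lasso(X_c, y, lam)                   # "lambda" is a reserved word in Python
    beta = c / w                                 # back to the original variables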

The Adaptive Lasso and Its Oracle Properties, H. Zou, JASA, 2006
