SVM


Academic year: 2022


(1)

Classification

Penalization, Optimisation and SVM

Ana Karina Fermin

ISEFAR

(2)

Penalization

Penalized Loss Minimization of

\[
\operatorname{argmin}_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} \ell(Y_i, f_\theta(X_i)) + \operatorname{pen}(\theta)
\]

where pen(θ) is a penalty.

Penalties

Upper bound on the optimism of the empirical loss.

Depends on the loss and the framework!
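The penalized criterion can be sketched numerically. A minimal pure-Python example (the 1-D toy data and the ridge-type choice pen(θ) = λθ² are assumptions for illustration, not from the slides): minimizing loss + penalty over a grid shrinks the solution below the unpenalized least-squares slope.

```python
def penalized_risk(theta, xs, ys, lam):
    # (1/n) * sum of squared losses, plus a ridge-type penalty lam * theta^2
    n = len(xs)
    loss = sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / n
    return loss + lam * theta ** 2

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]                  # roughly y = 2x, made-up data
grid = [i / 100 for i in range(0, 401)]

best = min(grid, key=lambda t: penalized_risk(t, xs, ys, lam=0.5))
# the penalty pulls the minimizer below the unpenalized slope (about 2.04)
```

The same grid search with lam=0 recovers the ordinary least-squares slope; the penalty trades a little empirical loss for a smaller coefficient.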

(3)

Variable Selection

Setting: generalized linear model = prediction of Y by h(Xᵗβ).

Model coefficients

Model entirely specified by β.

Coefficientwise:

β_i = 0 means that the i-th covariate is not used.

β_i ≈ 0 means that the i-th covariate has a low influence...

If some covariates are useless, better to use a simpler model...

Submodels

Simplify the model through a constraint on β!

Examples:

Support: impose that β_i = 0 for i ∉ I.

Support size: impose that ‖β‖₀ = Σ_{i=1}^d 1_{β_i ≠ 0} < C.

Norm: impose that ‖β‖_p < C with 1 ≤ p (often p = 2 or p = 1).
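The quantities behind these constraints are easy to compute directly. A small sketch (the coefficient vector is a made-up example):

```python
# Hypothetical coefficient vector: only two covariates are active
beta = [0.0, 1.5, 0.0, -0.5, 0.0]

l0 = sum(1 for b in beta if b != 0)       # support size ||beta||_0
l1 = sum(abs(b) for b in beta)            # ||beta||_1
l2 = sum(b * b for b in beta) ** 0.5      # ||beta||_2
# l0 counts nonzero coefficients; the support constraint bounds exactly this
```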

(4)

Constraint and Penalization

Constrained Optimization

Choose a constant C. Compute β as

\[
\operatorname{argmin}_{\beta \in \mathbb{R}^d, \|\beta\|_p \le C} \frac{1}{n} \sum_{i=1}^{n} \ell(Y_i, h(\beta^t X_i))
\]

Lagrangian Reformulation

Choose λ = λ_C and compute β as

\[
\operatorname{argmin}_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} \ell(Y_i, h(\beta^t X_i)) + \lambda \|\beta\|_{p'}^{p'}
\]

with p′ = p, except if p = 0 where p′ = 1.
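The constrained and Lagrangian forms can be seen to agree on a toy problem. A sketch assuming 1-D least squares with p = 2 (data, λ and the grid are made up): solving the penalized problem, then re-solving the constrained problem with C set to the norm of the penalized solution, recovers the same minimizer.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
grid = [i / 100 for i in range(0, 401)]

def risk(theta):
    # empirical squared loss (1/n) * sum (y_i - theta * x_i)^2
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

lam = 0.5
theta_pen = min(grid, key=lambda t: risk(t) + lam * t * t)   # Lagrangian form

C = abs(theta_pen)                                           # matching radius
theta_con = min((t for t in grid if abs(t) <= C), key=risk)  # constrained form
# theta_con == theta_pen: both formulations pick the same coefficient
```

The empirical risk decreases toward the unpenalized optimum, so the constrained solution sits on the boundary ‖β‖ = C, exactly where the penalized solution is.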

(5)

Penalization

Penalized Linear Model

Minimization of

\[
\operatorname{argmin}_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} \ell(Y_i, h(\beta^t X_i)) + \operatorname{pen}(\beta)
\]

Variable selection if β is sparse.

Classical Penalties

AIC: pen(β) = λ‖β‖₀ (non convex / sparsity)

Ridge: pen(β) = λ‖β‖₂² (convex / no sparsity)

Lasso: pen(β) = λ‖β‖₁ (convex / sparsity)

Elastic net: pen(β) = λ₁‖β‖₁ + λ₂‖β‖₂² (convex / sparsity)

Easy optimization if pen (and the loss) is convex...

Need to specify λ!
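The sparsity behavior of these penalties is easiest to see in the orthonormal-design case, where Ridge and Lasso both have closed forms. A sketch under that assumption (loss ½‖y − Xβ‖² with XᵗX = I; the OLS coefficients below are made up): the Lasso soft-thresholds coefficients to exact zeros, while Ridge only shrinks them.

```python
def soft_threshold(z, lam):
    # Lasso solution per coordinate for an orthonormal design:
    # shrink toward 0 by lam, producing exact zeros for small coefficients
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

ols = [2.0, 0.3, -1.2, 0.05]                      # assumed OLS coefficients
lam = 0.5
lasso = [soft_threshold(b, lam) for b in ols]     # sparse: zeros appear
ridge = [b / (1 + lam) for b in ols]              # shrunk, never exactly 0
```

This is why the slide tags Lasso with "sparsity" and Ridge with "no sparsity": the L1 ball has corners on the axes, the L2 ball does not.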

(6)

Logistic Revisited

Ideal solution:

\[
\widehat{f} = \operatorname{argmin}_{f \in \mathcal{S}} \frac{1}{n} \sum_{i=1}^{n} \ell^{0/1}(y_i, f(x_i))
\]

Logistic regression

Use f(x) = ⟨β, x⟩ + b.

Use the logistic loss ℓ(y, f) = log₂(1 + e^{−yf}), i.e. the negative log-likelihood.

A different vision from the statistician's, but the same algorithm!
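The logistic loss above can be evaluated directly; a minimal sketch with made-up scores, using the ±1 label convention of these slides:

```python
import math

def logistic_loss(y, f):
    # l(y, f) = log2(1 + exp(-y * f)), labels y in {-1, +1}, f = <beta, x> + b
    return math.log2(1 + math.exp(-y * f))

# a confident correct score costs little, a confident wrong one costs a lot
small = logistic_loss(1, 3.0)
large = logistic_loss(1, -3.0)
boundary = logistic_loss(1, 0.0)   # at the decision boundary the loss is 1
```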

(7)

Methods

Statistical point of view

1. k Nearest-Neighbors ✓

2. Generative Modeling (Naive Bayes, LDA, QDA) ✓

3. Logistic Modeling ✓

Optimisation point of view

1. Logistic Modeling ✓

2. SVM

3. ...

(8)

Ideal Separable Case

Linear classifier: sign(⟨β, x⟩ + b)

Separable case: ∃(β, b), ∀i, y_i(⟨β, x_i⟩ + b) > 0!

How to choose (β, b) so that the separation is maximal?

Strict separation: ∃(β, b), ∀i, y_i(⟨β, x_i⟩ + b) ≥ 1.

Maximize the distance between ⟨β, x⟩ + b = 1 and ⟨β, x⟩ + b = −1.

Equivalent to the minimization of ‖β‖².
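The equivalence rests on the fact that the distance between the hyperplanes ⟨β, x⟩ + b = 1 and ⟨β, x⟩ + b = −1 is 2/‖β‖. A 2-D sketch with a made-up β:

```python
beta = (3.0, 4.0)                               # ||beta|| = 5
norm = (beta[0] ** 2 + beta[1] ** 2) ** 0.5
margin = 2 / norm                               # distance between the planes

# halving beta doubles the margin: small ||beta|| means a wide separating strip
half = (beta[0] / 2, beta[1] / 2)
half_norm = (half[0] ** 2 + half[1] ** 2) ** 0.5
wide_margin = 2 / half_norm
```

So maximizing the margin and minimizing ‖β‖² are the same problem, up to the monotone map t ↦ 2/√t.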

(9)

Non Separable Case

What about the non separable case?

Relax the assumption that ∀i, y_i(⟨β, x_i⟩ + b) ≥ 1.

Naive attempt:

\[
\operatorname{argmin} \|\beta\|^2 + C \frac{1}{n} \sum_{i=1}^{n} 1_{y_i(\langle \beta, x_i \rangle + b) \le 1}
\]

Non convex minimization.

SVM: better convex relaxation!

\[
\operatorname{argmin} \|\beta\|^2 + C \frac{1}{n} \sum_{i=1}^{n} \max(1 - y_i(\langle \beta, x_i \rangle + b), 0)
\]

(10)

SVM as a Penalized Convex Relaxation

Convex relaxation:

\[
\operatorname{argmin} \|\beta\|^2 + C \frac{1}{n} \sum_{i=1}^{n} \max(1 - y_i(\langle \beta, x_i \rangle + b), 0)
= \operatorname{argmin} \frac{1}{n} \sum_{i=1}^{n} \max(1 - y_i(\langle \beta, x_i \rangle + b), 0) + \frac{1}{C} \|\beta\|^2
\]

Prop: ℓ^{0/1}(y_i, sign(⟨β, x_i⟩ + b)) ≤ max(1 − y_i(⟨β, x_i⟩ + b), 0)

Penalized convex relaxation (Tikhonov!):

\[
\frac{1}{n} \sum_{i=1}^{n} \ell^{0/1}(y_i, \operatorname{sign}(\langle \beta, x_i \rangle + b))
\le \frac{1}{n} \sum_{i=1}^{n} \max(1 - y_i(\langle \beta, x_i \rangle + b), 0) + \frac{1}{C} \|\beta\|^2
\]
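The Prop can be checked numerically: the hinge loss dominates the 0/1 loss for every score f = ⟨β, x⟩ + b. A small sketch (the convention that f = 0 counts as a misclassification of y = +1 is an assumption; the bound holds either way):

```python
def hinge(y, f):
    # convex surrogate max(1 - y*f, 0), labels y in {-1, +1}
    return max(1 - y * f, 0.0)

def zero_one(y, f):
    # 0/1 loss of the classifier sign(f); f = 0 predicts -1 by convention here
    pred = 1 if f > 0 else -1
    return 0.0 if pred == y else 1.0

# the hinge loss upper-bounds the 0/1 loss on a grid of scores in [-5, 5]
fs = [i / 10 for i in range(-50, 51)]
ok = all(zero_one(y, f) <= hinge(y, f) for y in (-1, 1) for f in fs)
```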

(11)

SVM

(12)

The Kernel Trick

Non linear separation: just replace x by a non linear Φ(x)...

Kernel trick

Computing k(x, x′) = ⟨Φ(x), Φ(x′)⟩ may be easier than computing Φ(x), Φ(x′) and then the scalar product!

Φ can be specified through its positive definite kernel k. Examples:

Linear kernel: k(x, x′) = ⟨x, x′⟩

Polynomial kernel: k(x, x′) = (1 + ⟨x, x′⟩)^d

Gaussian kernel: k(x, x′) = e^{−‖x−x′‖²/2}, ...
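For the polynomial kernel with d = 2 in R², the trick can be verified explicitly: the kernel value equals the inner product of a 6-dimensional feature map, but costs only one dot product and a square (the test points are made up):

```python
import math

def poly_kernel(x, xp):
    # k(x, x') = (1 + <x, x'>)^2 for x, x' in R^2
    dot = x[0] * xp[0] + x[1] * xp[1]
    return (1 + dot) ** 2

def phi(x):
    # explicit feature map whose inner product reproduces the kernel:
    # (1 + u)^2 = 1 + 2u + u^2 expanded over the coordinates of x
    r2 = math.sqrt(2)
    return (1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1])

x, xp = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, xp)                              # one dot product
rhs = sum(a * b for a, b in zip(phi(x), phi(xp)))     # six explicit features
# lhs == rhs: the kernel computes the lifted inner product implicitly
```

For degree d in dimension p the explicit map has O(p^d) features, while the kernel evaluation stays O(p); that gap is the point of the trick.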

(13)

SVM

(14)

SVM
