• Aucun résultat trouvé

Modèle Logistique (Partie 2)

N/A
N/A
Protected

Academic year: 2022

Partager "Modèle Logistique (Partie 2)"

Copied!
18
0
0

Texte intégral

(1)

Classification

Logistic Model

Ana Karina Fermin

ISEFAR

http://fermin.perso.math.cnrs.fr/

(2)

Ouline:

1 k Nearest-Neighbors X

2 K-Fold cross validation, Model selection X

3 Generative Modeling (Naive Bayes, LDA, QDA) X

4 Logistic Modeling

5 SVM

6 Tree Based Methods (M. Zetlaoui course, Apprentissage)

(3)

Logistic Modeling

Direct modeling ofY|x.

The Binary logistic model (Y ∈ {−1,1})

p+1(x) = eβtϕ(x) 1 +eβtϕ(x) whereϕ(x) is a transformation of the individualx

In this model, one verifies that

p+1(x)≥p−1(x) ⇔ βtϕ(x)≥0 True Y|x may not belong to this model⇒ maximum likelihood of β only finds a good approximation!

Binary Logistic classifier:

bfL(x) =

(+1 ifβbtϕ(x)≥0

−1 otherwise whereβbis estimated by maximum likelihood.

(4)

Logistic Modeling

Logistic model: approximation of B(p1(x)) byB(h(βtϕ(x))) with h(t) = 1+eett.

Opposite of the log-likelihood formula

− 1 n

n

X

i=1

1yi=1log(h(βtϕ(x))) +1yi=−1log(1−h(βtϕ(x)))

=−1 n

n

X

i=1

1yi=1log eβtϕ(x)

1 +eβtϕ(x) +1yi=−1log 1 1 +eβtϕ(x)

!

= 1 n

n

X

i=1

log1 +e−yitϕ(x)) Convex function in β!

Remark: You can also use your favorite parametric model instead of the logistic one...

(5)

Example: TwoClass Dataset

Synthetic Dataset

Two features/covariates.

Two classes.

Dataset from Applied Predictive Modeling, M. Kuhn and K.

Johnson, Springer

Numerical experiments withR.

(6)

Example: Logistic

(7)

Example: Quadratic Logistic

(8)

Model Selection

Methods

1 Model Selection

2 Prediction Errors

(9)

Model Selection

Model Selection

1 Given Y a variable to explain by d variables X1, . . . ,Xd , how to select (systematically) the most interesting subset of variables to do the prediction?

Variable selection

Find automatically a sub-group of variables to explainY.

2 More generaly, given k modelsM1, . . . ,Mk, which one to use?

Model selection

Criterion to compare the performance of different models.

(10)

Model Selection

Embedded model testing

Assume we have two competing modelsMp (with p parameters) and Mq (withq parameters) such that Mp⊂ Mq .

Can we test if Mp is sufficient?

Example (p=2 and q=4) Models:

Mp : logitpβ(p)(x) =β0+β1X1+β2X2

Mq : logitpβ(q)(x) =β0+β1X1+β2X2+β3X3+β4X4 Test

H0:β3=β4= 0 against H1 :β3 6= 0 or β4 6=0

(11)

Model Selection

Embedded model testing

Deviance (residual deviance sous R)

Log-likelihood Mp: Lp= log(Lp( ˆβ,Dn))

Log-likelihood of modelMq: Lq= log(Lq( ˆβ,Dn)) Deviance between the two models:

Dq−p = 2(Lq− Lp) = 2(log(Lq( ˆβ,Dn))−log(Lp( ˆβ,Dn))) Asymptotically underH0 : (Dq−p)∼χ2(q−p)

Under R :IfWandVare two objects obtained withglmsuch thatW is a submodel ofV, the command

anova(W,V,test="Chisq") performs this test.

(12)

Model Selection

AIC, BIC

Let Mbe a generic logistic model and denotep its number of parameters.

Let ˆβ be the ML estimate in this model M.

The AIC and BIC consist in minimizing

−2×log(L( ˆβ,Dn)) +κ(n)×p over all models.

Different choices for the factorκ(n):

AIC :κ(n) = 2.

BIC :κ(n) = logn.

The BIC criterion leads to the selection of a model with a smaller dimension than AIC.

(13)

Prediction Errors

Methods

1 Model Selection

2 Prediction Errors

(14)

Prediction Errors

How to draw benefits from the estimated model?

Prediction for a new individual:

A new individualxnew appears and we want to predict if he has the disease (ynew= 1) or not (ynew= 0).

We have estimated the logistic coefficients βb so that P{Y = 1|X} ≈ eβbTX

1 +eβbTX

. (1)

Any ideas to predictynew?

(15)

Prediction Errors

Prediction Error (CHD example)

Prediction (threshold = 0.5) Then,

IfPb(y=Yes|X)>0.5, we predicty =Yes;

IfPb(y=Yes|X)≤0.5, we predicty =No.

Confusion Matrix : Cross table of the prediction vs the truth

## pred

## CHD No Yes

## No 45 12

## Yes 14 29

Prediction error : (14 + 12)/100 = 0.26

(16)

Prediction Errors

Classical Metrics

Score

Accuracy = TP + TN TP + TN + FP + FN

(17)

Prediction Errors

Classical Metrics Scores

Accuracy = TP + TN TP + TN + FP + FN True Positive Rate

Sensitivity Recall

)

= TP

#(real P) = TP TP + FN

False Positive Rate 1−Specificity

o= FP

#(real N) = FP FP + TN Precision = TP

#(predicted P) = TP TP + FP F-Score = 2Precision×Recall

Precision + Recall Many other metrics...

(18)

Prediction Errors

Confusion matrix for multiclass data

To label digits:

True label yi ∈ {0, . . . ,9}, Predicted label ybi ∈ {0, . . . ,9},

Confusion matrix is thus of size 10×10.

Confusion matrix

Références

Documents relatifs

Est-ce que les électeurs de minorités ethniques (vs les Blancs), les gens moins favorisés économiquement (vs les plus favorisés) et les femmes (plus que les hommes) ont plus

Selon I'AFNOR, la traqabilitk se dtfinit comrne : " I'aptitude 1 retrouver l'historique, l'utilisation ou la localisation d'un article ou d'une activit4 ou

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

This chapter focuses on large-scale multiple crops nurseries, with high-frequency irrigation requirements under limited water availability, with the objective of

Si les données sont unidimensionnelles, la corrélation entre deux items s’annule quand les sujets ont la même mesure dans la dimension latente et donc, dans le modèle de

But then the incommensurability thesis does not, on its own, represent any threat to scientific realism (which is the issue under discussion), for the realist can accept that there is

> 1. Même question avec.. On suppose que la fonction P ainsi définie est dérivable et strictement positive sur 0 ; +∞ et qu’il existe des constantes k et M strictement

Afin de donner du relief ` a ce r´esultat, observons que la cat´egorie U poss`ede une infinit´e d’objets qui sont dans un cran fini de la filtration de Krull et qui ne sont