• Aucun résultat trouvé

k plus proches voisins

N/A
N/A
Protected

Academic year: 2022

Partager "k plus proches voisins"

Copied!
14
0
0

Texte intégral

(1)

Intégration de données

k Nearest-Neighbors

Ana Karina Fermin

M2 Miage APP

http://fermin.perso.math.cnrs.fr/

(2)

1 k Nearest-Neighbors

2 Generative Modeling (Naive Bayes, LDA, QDA)

3 Logistic Modeling

4 Neural Network

5 Tree Based Methods

6 Boosting

7 SVM

(3)

Methods

1 k Nearest-Neighbors (knn)

(4)

Example: kNearest-Neighbors (withk= 3)

1 2

3 4

(5)

Example: kNearest-Neighbors (withk= 4)

(6)

kNearest-Neighbors

Neighborhood Vx ofx: k closest from x learning samples.

k-NN as local conditional density estimate

bp+1(x) = P

xi∈Vx1{yi=+1}

k

KNN Classifier:

fbKNN(x) =

(+1 ifpb+1(x)≥pb−1(x)

−1 otherwise

(7)

Example: TwoClass Dataset

Synthetic Dataset

Two features/covariates.

Two classes.

Dataset from Applied Predictive Modeling, M. Kuhn and K.

Johnson, Springer

Numerical experiments withR.

(8)

Example: KNN

(9)

Example: KNN

(10)

Example: KNN

(11)

Under-fitting / Over-fitting Issue

Different behavior for different model complexity

Under-fit : Low complexity modelsare easily learned but too simple to explain the truth.

Over-fit: High complexity models are memorizing the data they have seen and are unable to generalize to unseen examples.

(12)

Under-fitting / Over-fitting Issue

(13)

Over-fitting Issue

Error behaviour

Learning/training error (error made on the learning/training set) decays when the complexity of the model increases.

Quite different behavior when the error is computed on new observations (generalization error).

Overfit for complex models: parameters learned are too specific to the learning set!

General situation! (Think of polynomial fit...)

Need to use an other criterion than the training error!

(14)

Example: KNN (bk= 25 using cross-validation)

Références

Documents relatifs

Multilateral organizations such as the World Health Organization (WHO), The World Customs Organization (WCO), and the International Criminal Police Organization

Application : model selection using

Martin n'est pas le seul à s'intéresser à la façon dont les humains traitent les animaux - il échange ses pensées avec Mark Rowlands, philosophe des droits des animaux et

Commanders of high performing teams have in shift 4 a stronger inhibition that a New Message is followed by a Show Information or that a Show Information is fol- lowed by a

Commanders of high performing teams have in shift 4 a stronger inhibition that a New Message is followed by a Show Information or that a Show Information is fol- lowed by a

Table 114 Results of multiple regression, predicting performance shift 3, with input and summary level process variables, and task adaptive behavior variables controlling

When the number n of data points increases the nearest neighbor of X gets closer to X (this.. has to be made rigorous since X is a random variable).. This result was first proved

Les distributions correspondantes à un téta donné sont Fq et Gq.On utilise les échantillons pour estimer q par une q^ et on utilise alors les distributions Fq^ et Gq^ comme dans