• Aucun résultat trouvé

k plus proches voisins

N/A
N/A
Protected

Academic year: 2022

Partager "k plus proches voisins"

Copied!
14
0
0

Texte intégral

(1)

Intégration de données

k Nearest-Neighbors

Ana Karina Fermin

M2 Miage APP

http://fermin.perso.math.cnrs.fr/

(2)

1 k Nearest-Neighbors

2 Generative Modeling (Naive Bayes, LDA, QDA)

3 Logistic Modeling

4 Neural Network

5 Tree Based Methods

6 Boosting

7 SVM

(3)

Methods

1 k Nearest-Neighbors (knn)

(4)

Example: kNearest-Neighbors (withk= 3)

1 2

3 4

(5)

Example: kNearest-Neighbors (withk= 4)

(6)

kNearest-Neighbors

Neighborhood Vx ofx: k closest from x learning samples.

k-NN as local conditional density estimate

bp+1(x) = P

xi∈Vx1{yi=+1}

k

KNN Classifier:

fbKNN(x) =

(+1 ifpb+1(x)≥pb−1(x)

−1 otherwise

(7)

Example: TwoClass Dataset

Synthetic Dataset

Two features/covariates.

Two classes.

Dataset from Applied Predictive Modeling, M. Kuhn and K.

Johnson, Springer

Numerical experiments withR.

(8)

Example: KNN

(9)

Example: KNN

(10)

Example: KNN

(11)

Under-fitting / Over-fitting Issue

Different behavior for different model complexity

Under-fit : Low complexity modelsare easily learned but too simple to explain the truth.

Over-fit: High complexity models are memorizing the data they have seen and are unable to generalize to unseen examples.

(12)

Under-fitting / Over-fitting Issue

(13)

Over-fitting Issue

Error behaviour

Learning/training error (error made on the learning/training set) decays when the complexity of the model increases.

Quite different behavior when the error is computed on new observations (generalization error).

Overfit for complex models: parameters learned are too specific to the learning set!

General situation! (Think of polynomial fit...)

Need to use an other criterion than the training error!

(14)

Example: KNN (bk= 25 using cross-validation)

Références

Documents relatifs

When the number n of data points increases the nearest neighbor of X gets closer to X (this.. has to be made rigorous since X is a random variable).. This result was first proved

Martin n'est pas le seul à s'intéresser à la façon dont les humains traitent les animaux - il échange ses pensées avec Mark Rowlands, philosophe des droits des animaux et

Multilateral organizations such as the World Health Organization (WHO), The World Customs Organization (WCO), and the International Criminal Police Organization

Application : model selection using

Les distributions correspondantes à un téta donné sont Fq et Gq.On utilise les échantillons pour estimer q par une q^ et on utilise alors les distributions Fq^ et Gq^ comme dans

Commanders of high performing teams have in shift 4 a stronger inhibition that a New Message is followed by a Show Information or that a Show Information is fol- lowed by a

Commanders of high performing teams have in shift 4 a stronger inhibition that a New Message is followed by a Show Information or that a Show Information is fol- lowed by a

Table 114 Results of multiple regression, predicting performance shift 3, with input and summary level process variables, and task adaptive behavior variables controlling