• Aucun résultat trouvé

Naive Bayes Classifier

N/A
N/A
Protected

Academic year: 2022

Partager "Naive Bayes Classifier"

Copied!
3
0
0

Texte intégral

(1)

M1 LI

Naive Bayes Classifier

Guillaume Wisniewski [email protected]

September 2021

1 Goal

We are aiming to train a classifier, implementing the Naive Bayes approach, to predict whether a mail is a spam or a ham. To train and evaluate our classifier we will consider theling-spam dataset that contains spam messages and messages from the Linguist list.1

2 Dataset

1. Write a function that takes as parameter the name of the directory containing theling-spam and returns a list of examples. Each examples is made of an observation (the list of words extracted from the mail) and a label (eitherhamorspam).

2. How many examples are there in the corpus? Represent the label distribution.

3. Write a function that split the corpus into a train and a test sets. The function will take as parameters the list of examples and the proportion of examples that must be included into the train set.

3 Naive Bayes classifier

In the Naive Bayes classifier the conditional distribution over the class variableyis estimated by:

p(y|x)∝ p(y)·

k

Y

i=1

p(xi|y) (1)

1Seethis articlefor further details.

1

(2)

where∝denotes proportionality andx = (xi)ki=1 is an observation represented by a vector ofk components.2

The computation ofp(y|x) involves the multiplication of a large number of tiny numbers (there as many terms in the multiplication as words in an observation, each of the terms is a probability and, consequently, smaller than 1), which can result in numerical stability issues (computers are not able to represent small real numbers). That is why, you have to manipulate log-probability in our program rather than probabilities.

4. Use Equation 1to express logp(y|x) as a function of p(xi|y) and p(y). Why can we use logp(y|x) in the MAP decision rule instead of p(y|x).

5. Write a function that takes as input a list of examples a returns a dictionary of dictionar- ies that maps every pairs (word,label) to the number of occurrences ofword in mails labeled bylabel

6. Write down a class3with:

• afitmethod that will estimate the parameters of the Naive Bayes classifier;

• apredictmethod that will take as input an observation (a list of words) and return the predicted label (eitherspamorham);

• ascoremethod that will take as input a list of annotated examples and return the proportion of correctly predicted labels. This corresponds to the classifieraccuracy.

7. What is the accuracy of a naive Bayes classifier when the train set is made of 80% of all the examples. How is the accuracy related to the 0/1 loss?

4 Evaluation

8. Compute theconfusion matrixof the classifier trained in the previous section. A confusion matrix is an×n matrix, wherenis the number of possible label. An entrymi,j of the confusion matrix indicates the number of times an examples with gold labelihas been classified as j. What is a confusion matrix useful for? Interpret.

9. Compute the accuracy of the classifier for different size of the training set (e.g. starting with a 80:20 train-test split, consider 10%, 20%, 30%, ... of the training data to estimate the classifier parameters and estimate the classifier performance on the test set). Why do we have to consider a single test set (rather than, for instance, consider all the remaining data to evaluate the classifier performance). Plot the classifier performance with respect to the size to the train set. Interpret.

2In the next classes we will see that these components are usually called ‘features’.

3Defining a class is a simple way to ‘link’ the different methods together. If you are not at ease with OOP, you can write two separated methods.

2

(3)

10. Generate 100 train-test splits (all with a 80:20 proportion), train and evaluate a classifier on each of this split. What is the smallest and largest accuracies achieved? Plot the resulting accuracy distribution. Interpret.

11. Find, for each labely, the ten word with the largest probability p(w|y). Interpret.

3

Références

Documents relatifs

The sparse regularized likelihood being non convex, we propose an online gradient algorithm using mini-batches and a post- optimization to avoid local minima.. In our experiments we

We shall say that two power series f, g with integral coefficients are congruent modulo M (where M is any positive integer) if their coefficients of the same power of z are

Figures 7 and 8 detail the latency for a broadcast message to be delivered by all correct processes in systems of size 25, 49, and 73, using either RSA or EC: PISTIS delivers

This article is inspired from the YouTube video “Ogni numero `e l’inizio di una potenza di 2” by MATH-segnale (www.youtube.com/mathsegnale) and from the article “Elk natuurlijk

Adult Spanish second language (L2) learners of English and native speakers of English participated in an English perception task designed to investigate their ability to use

We also obtain results for the mean and variance of the number of overlapping occurrences of a word in a finite discrete time semi-Markov sequence of letters under certain

Lurch is a free word processor that can check the steps of a mathe- matical proof, and we have focused the current version on the needs of introduction-to-proof courses, covering

Using a simple word matching I show that the new word list may perform better than ANEW, though not as good as the more elaborate approach found in SentiStrength..