Academic year: 2022

Partager "Advanced Human-Machine Interaction Interaction Data Analysis TD04: Text Mining"

Copied!
1
0
0

Texte intégral

(1)

Advanced Human-Machine Interaction - Interaction Data Analysis

TD04: Text Mining

INSA Rouen Normandie - Normandie University

Practical session: Positive / Negative classification of messages

Data collection and processing

1. Download and analyze the dataset at http://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences. It contains a set of sentences posted by users and labeled as 0 (negative) or 1 (positive).

2. Merge the three files and shuffle the sentences.

3. If necessary, reformat the lines so they can be easily processed.
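The loading step can be sketched as follows. The file names are an assumption based on the usual layout of the UCI archive (three tab-separated files, one "sentence&lt;TAB&gt;label" pair per line); adjust them to what you actually downloaded.

```python
import random

# Assumed file names from the UCI Sentiment Labelled Sentences archive.
FILES = ["amazon_cells_labelled.txt", "imdb_labelled.txt", "yelp_labelled.txt"]

def load_sentences(paths):
    """Merge the labelled files into one shuffled list of (sentence, label) pairs."""
    data = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Each line is "sentence<TAB>label"; split on the last tab
                # in case a sentence itself contains a tab.
                sentence, label = line.rsplit("\t", 1)
                data.append((sentence, int(label)))
    random.shuffle(data)  # mix the sentences from the three sources
    return data
```

Shuffling before the train/test split avoids keeping all sentences from one source (e.g. all Yelp reviews) in the same subset.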

Baseline (no pre-processing, bag of words, simple presence of word, no cross validation)

1. Divide the dataset into 2 sets: a learning set (2/3) and a test set (1/3).

2. Construct the learning matrix (some toolboxes provide their own function, others require you to build it yourself): each line is a sentence and each column a word; a 1 means the word occurs in the sentence, a 0 that it does not.

3. Learn a model (e.g. SVM with a linear kernel -if you don't want to wait too long-, Naive Bayes, ...) on the learning set.

4. Test the model on the test set and compute the F-measure.
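The four steps above can be sketched with scikit-learn (one possible toolbox; the function names below are sklearn's, not prescribed by this handout). `CountVectorizer(binary=True)` builds exactly the 0/1 presence matrix of step 2.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def baseline_f1(sentences, labels):
    """Binary bag-of-words + linear SVM baseline; returns the test F-measure."""
    X_train, X_test, y_train, y_test = train_test_split(
        sentences, labels, test_size=1/3, random_state=0)
    vec = CountVectorizer(binary=True)        # 1 = word present, 0 = absent
    X_train = vec.fit_transform(X_train)      # vocabulary from the learning set only
    X_test = vec.transform(X_test)            # unseen words are silently ignored
    clf = LinearSVC().fit(X_train, y_train)   # SVM with a linear kernel
    return f1_score(y_test, clf.predict(X_test))
```

Note that the vectorizer is fitted on the learning set only: fitting it on the whole dataset would leak vocabulary statistics from the test set.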

Baseline (no pre-processing, bag of words, simple presence of word, cross validation)

1. Divide the dataset into k subsets (folds).

2. Do a k-fold cross validation (learn on folds 1...k-1 and test on fold k; learn on folds 1...k-2 and k and test on fold k-1; ...; learn on folds 2...k and test on fold 1).

3. Compute the mean F-measure over the k folds.
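With scikit-learn (again, one possible toolbox), `cross_val_score` performs the rotate-the-test-fold loop of step 2. Putting the vectorizer inside a pipeline matters: each fold then refits its own vocabulary on its own learning folds, instead of leaking the test fold's words.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def cv_mean_f1(sentences, labels, k=5):
    """Mean F-measure of the binary bag-of-words baseline over k folds."""
    model = make_pipeline(CountVectorizer(binary=True), LinearSVC())
    scores = cross_val_score(model, sentences, labels, cv=k, scoring="f1")
    return scores.mean()
```

For a classifier, `cv=k` uses stratified folds by default, so each fold keeps roughly the same positive/negative ratio as the full dataset.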

Advanced models (pre-processing, bag of words, cross validation)

1. Evaluate the efficiency of various pre-processing steps: stop-word filtering, stemming, lemmatization...

2. Evaluate the efficiency of various bag of word representations (word counting, tf.idf).

3. Compare various ML models (SVM, Naive Bayes, Random Forests, ...).

Neural network approaches (pre-processing, word2vec, cross validation)

1. Evaluate the efficiency of the word2vec representation with classic ML algorithms (SVM, Naive Bayes, ...).

2. Evaluate various neural network models (CNN, LSTM, etc.).

3. Evaluate novel NN representations such as BERT.

