Récemment recherché

Aucun résultat trouvé

Étiquettes

Aucun résultat trouvé

Document

Aucun résultat trouvé

Accueil Écoles Thèmes

Connexion

Non-standard texts: from theoretical positions to Natural Language Processing normalisation

Partager "Non-standard texts: from theoretical positions to Natural Language Processing normalisation"

N/A

N/A

Protected

Année scolaire: 2021

Info

Protected

Academic year: 2021

Partager "Non-standard texts: from theoretical positions to Natural Language Processing normalisation"

Copied!

1

0

0

1

0

0

Chargement.... (Voir le texte intégral maintenant)

Télécharger maintenant ( 1 Page )

Texte intégral

(1)

› annotation is not neutral

› annotation is linked to interpretative frameworks

› researchers should not be trapped

› researchers need to conduct own annotation

› full corpus (88,000 sms) & samples available for download

theoretical position

alignment algorithms

for transcoding

88milSMS corpus

SMS writing

(eSMS)

knowledge

building

http://88milsms.huma-num.fr/

Non-standard texts: from theoretical positions to

Natural Language Processing normalisation

Cédric Lopez* Mathieu Roche Rachel Panckhurst*

*R&D, Viseo, Grenoble [email protected] **UMR TETIS,Cirad, Irstea, AgroParisTech, Montpellier;

LIRMM, UMR 5506 CNRS & Université Montpellier [email protected] ***Praxiling UMR 5267 CNRS & Université Paul-Valéry Montpellier 3

[email protected]

Case 1

plus

svt

possible

plus

souvent possible

Case 2

Vasi

lâche

moi

Vas-y

lâche-moi

Case 3

T’as

eu

Tu

as

eu

language classification

_{C1.2: LEFF no accents}

C1.1: LEFF

C2.1 (sole letters)

a, c, f, j, p, v…

C2.2 (time)

8heures, 10minaperdre, 6-7h

C2.3 (repetition)

Mdrrr, Loool, tkkkkt, Huuuummm

C2.4 (special)

**Conn*rd, désannule, resto+ciné**

C2.5 (numbers)

numb3rs, mc2, 106ounette, 3615ma-vie

C2.6 (smileys)

^^ :p ;) :d <3 :-) xd :( :/

C3 (INSO)

tkt, jte, cc, voituration, cinglicité, tetrangle

‘unknown’ non-standard items (INSO)

› new typology of detected ‘mistakes’

› normalisation based on most frequent errors

› confrontation with:

traditional automatic translation,

speech recognition,

spelling/grammatical checker principles

› comparison between different types of instant media (SMS, forums, tweets)

automatic normalisation techniques

PLIN Day,

Louvain-la-Neuve, 12 May 2016

ARTS, LETTRES, LANGUES, SCIENCES HUMAINES ET SOCIALES

Références

Télécharger maintenant ( PDF - 1 Page - 517.41 KB )

Documents relatifs

Natural Language Processing for aviation safety reports: from classification to interactive analysis

We describe in Section 4 an experiment with probabilistic topic models on a large collection of reports, with mixed results that cannot conclude on the utility of this method in

Digital Humanities and Natural Language Processing: “Je t’aime... Moi non plus”

In spite of the increasingly large textual datasets humanities researchers are confronted with, and the need for automatic tools to extract information from them,

Applying Light Natural Language Processing to Ad-Hoc Cross Language Information Retrieval

In the CLEF 2005 Ad-Hoc Track we experimented with language-specific morphosyntactic processing and light Natural Language Processing (NLP) for the retrieval of Bulgarian,

Natural Language Processing with Python

This module searches a file of code or documentation for blocks of text that look like an interactive Python session, of the form you have already seen many times in this book.

Natural Language Processing (Almost) from Scratch

We have described how we induced useful word embeddings by applying our architecture to a language modeling task trained using a large amount of unlabeled text data.. These

Using Polyadic Formal Contexts for Information Extraction from Natural Language Texts

Consider some experimental results. Associations were created on both formal contexts regarding the position of the subject of the action − the first position for the three-element

Quantum Logic and Natural Language Processing

Keywords: Natural Language Processing, Compositional Distributional Model of Meaning, Quantum Logic, Similarity Measures..

Framework for Conceptual Modeling on Natural Language Texts

We add additional tags if special words (extreme, obligately, etc.) recognized in the texts. A BioNLP data is always domain-specific. All the texts in the corpus [4] are about

Documents relatifs

The British Family Between Change and Continuity

The British Family Between Change and Continuity

60

0

0

From Requirements Engineering to UML using Natural Language Processing – Survey Study

From Requirements Engineering to UML using Natural Language Processing – Survey Study

7

0

0

AUTOMATED ANALYSIS OF NATURAL-LANGUAGE REQUIREMENTS USING NATURAL LANGUAGE PROCESSING

AUTOMATED ANALYSIS OF NATURAL-LANGUAGE REQUIREMENTS USING NATURAL LANGUAGE PROCESSING

191

0

0

Non-size increasing Graph Rewriting for Natural Language Processing

Non-size increasing Graph Rewriting for Natural Language Processing

37

0

0

Deep-sea manefishes (Perciformes: Caristiidae) from oceanic islands and seamounts off northeastern Brazil, with comments on the caristiids previously reported in Brazilian waters

Deep-sea manefishes (Perciformes: Caristiidae) from oceanic islands and seamounts off northeastern Brazil, with comments on the caristiids previously reported in Brazilian waters

16

0

0

The SNEMO and SUGAR Companion Data Sets

The SNEMO and SUGAR Companion Data Sets

5

0

0

Ochratoxin A in Roasted Coffee from French Supermarkets and Transfer in Coffee Beverages: Comparison of Analysis Methods

Ochratoxin A in Roasted Coffee from French Supermarkets and Transfer in Coffee Beverages: Comparison of Analysis Methods

17

0

0

Le plan de sauvegarde et de mise en valeur

Le plan de sauvegarde et de mise en valeur

3

0

0