Récemment recherché

Aucun résultat trouvé

Étiquettes

Aucun résultat trouvé

Document

Aucun résultat trouvé

Accueil Écoles Thèmes

Connexion

Non-standard texts: from theoretical positions to Natural Language Processing normalisation

Partager "Non-standard texts: from theoretical positions to Natural Language Processing normalisation"

N/A

N/A

Protected

Année scolaire: 2021

Info

Protected

Academic year: 2021

Partager "Non-standard texts: from theoretical positions to Natural Language Processing normalisation"

Copied!

2

0

0

2

0

0

Chargement.... (Voir le texte intégral maintenant)

Télécharger maintenant ( 2 Page )

Texte intégral

(1)

HAL Id: hal-01487025

https://hal.archives-ouvertes.fr/hal-01487025

Submitted on 10 Mar 2017

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Non-standard texts: from theoretical positions to

Natural Language Processing normalisation

Cédric Lopez, Mathieu Roche, Rachel Panckhurst

To cite this version:

Cédric Lopez, Mathieu Roche, Rachel Panckhurst. Non-standard texts: from theoretical positions to Natural Language Processing normalisation. PLIN-Day, May 2016, Louvain-la-Neuve, Belgium. 2016. �hal-01487025�

(2)

› annotation is not neutral

› annotation is linked to interpretative frameworks

› researchers should not be trapped

› researchers need to conduct own annotation

› full corpus (88,000 sms) & samples available for download

theoretical position

alignment algorithms

for transcoding

88milSMS corpus

SMS writing

(eSMS)

knowledge

building

http://88milsms.huma-num.fr/

Non-standard texts: from theoretical positions to

Natural Language Processing normalisation

Cédric Lopez* Mathieu Roche Rachel Panckhurst*

*R&D, Viseo, Grenoble

[email protected]

**UMR TETIS,Cirad, Irstea, AgroParisTech, Montpellier;

LIRMM, UMR 5506 CNRS & Université Montpellier

[email protected]

***Praxiling UMR 5267 CNRS & Université Paul-Valéry Montpellier 3

[email protected]

Case 1

plus

svt

possible

plus

souvent possible

Case 2

Vasi

lâche

moi

Vas-y

lâche-moi

Case 3

T’as

eu

Tu

as

eu

language classification

_{C1.2: LEFF no accents}

C1.1: LEFF

C2.1 (sole letters)

a, c, f, j, p, v…

C2.2 (time)

8heures, 10minaperdre, 6-7h

C2.3 (repetition)

Mdrrr, Loool, tkkkkt, Huuuummm

C2.4 (special)

**Conn*rd, désannule, resto+ciné**

C2.5 (numbers)

numb3rs, mc2, 106ounette, 3615ma-vie

C2.6 (smileys)

^^ :p ;) :d <3 :-) xd :( :/

C3 (INSO)

tkt, jte, cc, voituration, cinglicité, tetrangle

‘unknown’ non-standard items (INSO)

› new typology of detected ‘mistakes’

› normalisation based on most frequent errors

› confrontation with:

traditional automatic translation,

speech recognition,

spelling/grammatical checker principles

› comparison between different types of instant media (SMS, forums, tweets)

automatic normalisation techniques

PLIN Day,

Louvain-la-Neuve, 12 May 2016

ARTS, LETTRES, LANGUES, SCIENCES HUMAINES ET SOCIALES

Références

Télécharger maintenant ( PDF - 2 Page - 699.65 KB )

Documents relatifs

Applying Light Natural Language Processing to Ad-Hoc Cross Language Information Retrieval

In the CLEF 2005 Ad-Hoc Track we experimented with language-specific morphosyntactic processing and light Natural Language Processing (NLP) for the retrieval of Bulgarian,

Natural Language Processing for aviation safety reports: from classification to interactive analysis

We describe in Section 4 an experiment with probabilistic topic models on a large collection of reports, with mixed results that cannot conclude on the utility of this method in

Digital Humanities and Natural Language Processing: “Je t’aime... Moi non plus”

In spite of the increasingly large textual datasets humanities researchers are confronted with, and the need for automatic tools to extract information from them,

Natural Language Processing with Python

This module searches a file of code or documentation for blocks of text that look like an interactive Python session, of the form you have already seen many times in this book.

Natural Language Processing (Almost) from Scratch

We have described how we induced useful word embeddings by applying our architecture to a language modeling task trained using a large amount of unlabeled text data.. These

Using Polyadic Formal Contexts for Information Extraction from Natural Language Texts

Consider some experimental results. Associations were created on both formal contexts regarding the position of the subject of the action − the first position for the three-element

Quantum Logic and Natural Language Processing

Keywords: Natural Language Processing, Compositional Distributional Model of Meaning, Quantum Logic, Similarity Measures..

Framework for Conceptual Modeling on Natural Language Texts

We add additional tags if special words (extreme, obligately, etc.) recognized in the texts. A BioNLP data is always domain-specific. All the texts in the corpus [4] are about

Documents relatifs

From Requirements Engineering to UML using Natural Language Processing – Survey Study

From Requirements Engineering to UML using Natural Language Processing – Survey Study

7

0

0

AUTOMATED ANALYSIS OF NATURAL-LANGUAGE REQUIREMENTS USING NATURAL LANGUAGE PROCESSING

AUTOMATED ANALYSIS OF NATURAL-LANGUAGE REQUIREMENTS USING NATURAL LANGUAGE PROCESSING

191

0

0

Non-size increasing Graph Rewriting for Natural Language Processing

Non-size increasing Graph Rewriting for Natural Language Processing

37

0

0

Le plan de sauvegarde et de mise en valeur

Le plan de sauvegarde et de mise en valeur

3

0

0

The British Family Between Change and Continuity

The British Family Between Change and Continuity

60

0

0

Non-standard texts: from theoretical positions to Natural Language Processing normalisation

Non-standard texts: from theoretical positions to Natural Language Processing normalisation

1

0

0

The SNEMO and SUGAR Companion Data Sets

The SNEMO and SUGAR Companion Data Sets

5

0

0

Ochratoxin A in Roasted Coffee from French Supermarkets and Transfer in Coffee Beverages: Comparison of Analysis Methods

Ochratoxin A in Roasted Coffee from French Supermarkets and Transfer in Coffee Beverages: Comparison of Analysis Methods

17

0

0