COMPARISON OF TEXTUAL DATA PROCESSING SOFTWARE
by Sofia Triantafyllidou
Internship supervisor: Antoine Spinakis
Supervising professor: Nicolas Loménie

(1)

Network of Excellence in Text Mining & its application in Statistics

(2)

TEXT MINING OR TEXT DATA MINING

• Extract the relevant information from unstructured textual data.

• Starting from the unstructured texts, build an intermediate form (IF) of the data, which then serves to extract the desired information.
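As an illustration of what an intermediate form can look like, here is a minimal sketch that turns raw texts into a document-term count matrix (one row per document, one column per term). This is one common choice of document-based intermediate form, not necessarily the one used by any of the tools compared here; the function names and the sample sentences are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    # [^\W\d_]+ matches runs of letters only (accented letters included).
    return re.findall(r"[^\W\d_]+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample corpus, for illustration only.
docs = [
    "Text mining extracts information from text.",
    "Unstructured text needs an intermediate form.",
]
vocabulary, matrix = document_term_matrix(docs)
```

Each row of `matrix` can then be passed to downstream analysis steps, which work on the counts rather than on the raw text.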

(3)

THE TEXT MINING FRAMEWORK

• Document-based intermediate form: classification, categorization, visualization

• Concept-based intermediate form: predictive modeling, visualization

(4)

OVERVIEW OF THE COMPARISON PROCESS

Phase 1: preparation of the evaluation process

• Step 1: Selection of the text mining software to compare

• Step 2: General description of the selected software

• Step 3: Presentation of the evaluation criteria

Phase 2: comparison of the selected text mining software

• Step 4: Comparison of the text mining tools according to the evaluation criteria

(5)

SELECTED SOFTWARE

• ALCESTE

• ATLAS.ti

• Hyperbase

• IBM Intelligent Miner for Text

• Intex

• Lexico

• NUD*IST

• SAS Text Miner

• SPAD

• Sphinx Lexica

• SPSS

• STING

(6)

EVALUATION CRITERIA

• TECHNICAL CHARACTERISTICS

• TEXTUAL DATA PROCESSING

• ANALYSIS METHODS

• PRESENTATION OF RESULTS

• VISUALIZATION METHODS

• AUTOMATION FEATURES
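One way to keep a comparison consistent across tools is to record, for each tool, one note per criterion and to check which criteria are still unfilled. The sketch below is only an illustration of that bookkeeping: the class name, method names, tool name, and note text are all hypothetical, and only the six criteria themselves come from the list above.

```python
from dataclasses import dataclass, field

# The six evaluation criteria listed above.
CRITERIA = (
    "technical characteristics",
    "textual data processing",
    "analysis methods",
    "presentation of results",
    "visualization methods",
    "automation features",
)


@dataclass
class ToolEvaluation:
    """Notes gathered for one tool, keyed by evaluation criterion."""

    tool: str
    notes: dict = field(default_factory=dict)

    def record(self, criterion, note):
        """Store a note, rejecting criteria outside the agreed list."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.notes[criterion] = note

    def missing(self):
        """Return the criteria that have no note yet, in list order."""
        return [c for c in CRITERIA if c not in self.notes]


# Hypothetical tool name and note, for illustration only.
evaluation = ToolEvaluation("Example Tool")
evaluation.record("technical characteristics", "placeholder note")
```

Calling `evaluation.missing()` then lists the criteria that still need to be documented before the tool can be compared with the others.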

(7)

EXAMPLE OF RESULTS PRESENTATION: TECHNICAL CHARACTERISTICS

Columns: Text Mining Tools, Platform, Databases/Data Sources, Web Access

ALCESTE

• Platform: WIN 95, WIN 98, WIN NT4, Power Macintosh and UNIX.

• Databases/Data Sources: Data published from CNRS (Centre National de la Recherche Scientifique) with the support of ANVAR (French Agency for Innovation). Text from interview transcriptions, newspaper articles (perhaps downloaded from 'Lexis Nexis Professional' at LSE), books or any other source.

ATLAS.ti

• Platform: Windows 98 SE, Windows ME, Windows NT (SP 6), Windows 2000 (SP 3) and Windows XP. The latter is the preferred system.

• Databases/Data Sources: All relevant entities are stored in a container, the so-called "Hermeneutic Unit" (HU).

Hyperbase

• Platform: Apple Macintosh with at least 4 MB RAM and optional CD-ROM drive.

• Databases/Data Sources: Predominantly applicable to literary corpora (e.g. Rabelais, Diderot, etc.), but also to historical, advertising and legal texts, or even to polls and surveys.

IBM Intelligent Miner for Text

• Platform: Microsoft® Windows NT® V4.0 and Service Pack 3, Sun Solaris V2.5.1, or OS/390 V2.4-2.6. The Web search solution requires an Internet connection server, such as [...]. The OS/390® version requires the Text Search component of the OS/390 operating system, which is available at no cost for downloading. The OS/390 version also requires a Web server and [...].

• Databases/Data Sources: Full-text retrieval components.

• Web Access: Web access tools.