• Aucun résultat trouvé

Maîtriser la complexité des systèmes: apport de la fouille de données

N/A
N/A
Protected

Academic year: 2021

Partager "Maîtriser la complexité des systèmes: apport de la fouille de données"

Copied!
10
0
0

Texte intégral

(1)

HAL Id: hal-01414117

https://hal.inria.fr/hal-01414117

Preprint submitted on 12 Dec 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Maîtriser la complexité des systèmes: apport de la fouille de données

Jean-François Mari, Amedeo Napoli

To cite this version:

Jean-François Mari, Amedeo Napoli. Maîtriser la complexité des systèmes: apport de la fouille de données. 2016. �hal-01414117�

(2)

Maîtriser la complexité des systèmes:

apport de la fouille de données

Equipe Orpailleur

LORIA (CNRS INRIA Nancy Grand-Est Université de Lorraine) B.P. 239, 54506 Vandoeuvre les Nancy, France

http://orpailleur.loria.fr/

Comité de l'Administration Régionale (CAR) Châlons en Champagne, 23 novembre 2016

(3)

Apport de l'Intelligence Articielle en fouille de données

Nettoyer les données

diminuer le bruit de mesure

cerner la variabilité dûe aux processus vivants parmodélisation stochastique

Agréger et classer les données : élémentaire: binaires,nombres

ou complexes : ensembles,séquences, relations

Décrire les ressemblances / diérences entre familles (les classes) Visualiser: les résultats agrègés (la photo de famille !)

Les formes découvertes peuvent être interprétées comme des unités de connaissance pour un univers à explorer.

Orpailleur Team KDD at INria NGE/LORIA ACAL-CAR 23/11/2016 2 / 9

(4)

Nature des données d'utilisation du territoire

Données temporelles car elles sont issues d'un calendrier de travail lié aux saisons

Données spatialescar elles sont issues de territoires mis en valeur par les agriculteurs ;

Données hétérogènes à diérentes échelles

Enjeux : rendre visibles ces données par des moyens numériques Permettre leur réutilisation

Rendre ces données inter-opérables Développer des nouveaux services

(5)

Dynamiques territoriales dans le bassin du Yar (

analyse sur 12 ans à partir d'images satellite à l'aide du logiciel ARPENTAge)

region 1 : forêts et zones pérennes

region 2 : urbain

region 3 : disparition des prairies

region 4 : maintien des prairies

region 5 : sans dynamique

Orpailleur Team KDD at INria NGE/LORIA ACAL-CAR 23/11/2016 4 / 9

(6)

Disparition des prairies dans le bassin du Yar : bassin à

algues vertes

(7)

Knowledge Discovery in Databases (KDD)

KDDis applied to large volumes of complex data for discovering patternswhich can besignicant andreusable.

KDD is based onthree main steps: data preparation,data mining, and interpretation of the discovered units.

KDD isiterativeandinteractive, i.e. it can bereplayed and it is guided by ananalyst.

Data

Prepared data

Discovered patterns

Interpreted patterns Selection and preparation

Data mining

Interpretation / Evaluation

Orpailleur Team KDD at INria NGE/LORIA ACAL-CAR 23/11/2016 6 / 9

(8)

Research tracks in the Orpailleur Team

Knowledge Discovery:

pattern mining, FCA and extensions, association rules, mining big data, visualization

mining complex data: sequences, trees, graphs, linked open data text mining, information retrieval, recommendation

preferencesin KD: skylines, skycubes, aggregation measures privacyandreputationin KD

KD in Life sciences:

data integration, mining complex data in biology and pharmacology Knowledge engineering:

knowledge mining, ontology engineering Inductive Logic Programming

decision making, dimensionality reduction

vulnerability management: detection and representation

(9)

Towards Exploratory Knowledge Discovery

Knowledge Discovery and Knowledge Engineering are complementary.

Use adeclarative approach for problem solving: i.e. describe the problem and the solver will take care of the solution.

Dene interactive mechanisms to identify seeds for pattern space exploration consistent w.r.t. domain knowledge (interaction, constraints, dimensionality reduction).

Propose mechanisms to explore the neighborhood of the seeds until initial constraints are satised by query reformulation (redescription).

Address threshold issues w.r.t. analyst queries thanks to the skyline analysis of the pattern space (preferences).

(10)

An Ordinal Approach to EKD: Small is Beautiful. . .

Classication is apolymorphic process,

and a good candidate for bridging discoveryandrepresentation of patterns.

Partial Orders and their properties can be used forrevisiting classication:

Discoveryof classes for understanding data.

Organizationof classes into a partial order.

Representation and Reasoning: instantiation, class denition, information retrieval. . .

Formal Concept Analysis can play some of these roles. . .

Références

Documents relatifs

(However, since these applications use metadata associated with text documents, rather than the text directly, it is unclear if it should be considered text data mining or standard

We have shown that mining multiple linked data sources improves classification performance of lung cancer ICD-10 codes from textual data, as compared to using a

The best results are achieved when using the Decision Trees classifier with all features, scoring an accuracy of 97% on the training data.. In comparison, the Na¨ıve Bayes

The boundaries between information retrieval, informa- tion extraction, and data mining are blurring; bringing them together, in an activity commonly referred to as text mining,

This book is about the tools and techniques of machine learning used in practical data mining for finding, and describing, structural patterns in data.. As with any burgeoning

Research on this problem in the late 1970s found that these diagnostic rules could be generated by a machine learning algorithm, along with rules for every other disease category,

For example, for a statistical dataset on cities, Linked Open Data sources can provide relevant background knowledge such as population, climate, major companies, etc.. Having

We have also tested our tool by retrieving information about movies from 1991 to 2001 by DBpedia and Linked Movie Data Base from various countries and integrated that with