• Aucun résultat trouvé

Optimizing the Knowledge Discovery Process through Semantic Meta-Mining

N/A
N/A
Protected

Academic year: 2022

Partager "Optimizing the Knowledge Discovery Process through Semantic Meta-Mining"

Copied!
1
0
0

Texte intégral

(1)

Optimizing the Knowledge Discovery Process through Semantic Meta-Mining

Melanie Hilario

Computer Science Department University of Geneva

Geneva, Switzerland

Abstract. I will describe a novel meta-learning approach to optimizing the knowledge discovery or data mining (DM) process. This approach has three features that distinguish it from its predecessors. First, previous meta-learning research has focused exclusively on improving the learning phase of the DM process. More specifically, the goal of meta-learning has typically been to select the most appropriate algorithm and/or parame- ter settings for a given learning task. We adopt a more process-oriented approach whereby meta-learning is applied to design choices at different stages of the complete data mining process or workflow (hence the term meta-mining). Second, meta-learning for algorithm or model selection has consisted mainly in mapping dataset properties to the observed per- formance of algorithms viewed as black boxes. While several generations of researchers have worked intensively on characterizing datasets, little has been done to understand the internal mechanisms of the algorithms used. At best, a few have considered perceptible features of algorithms like their ease of implementation or their robustness to noise, or the in- terpretability of the models they produce. In contrast, our meta-learning approach complements dataset descriptions with an in-depth analysis and characterization of algorithms - their underlying assumptions, opti- mization goals and strategies, together with the structure and complexity of the models and patterns they generate. Third, previous meta-learning approaches have been strictly (meta) data-driven. To make sense of the intricate relationships between tasks, data and algorithms at different stages of the data mining process, our meta-miner relies on extensive background knowledge concerning knowledge discovery itself. For this reason we have developed a data mining ontology, which defines the es- sential concepts and relations needed to represent and analyse data min- ing objects and processes. In addition, a DM knowledge base gathers as- sertions concerning data preprocessing and machine learning algorithms as well as their implementations in several open-source software pack- ages. The DM ontology and knowledge base are domain-independent;

they can be exploited in any application area to build databases de- scribing domain-specific data analysis tasks, datasets and experiments.

Aside from their direct utility in their respective target domains, such databases are the indispensable source of training and evaluation data for the meta-miner. These three features together lay the groundwork for semantic meta-mining, the process of mining DM meta-data on the basis of data mining expertise distilled in an ontology and knowledge base.

Références

Documents relatifs

The new field of Semantic Data Mining, which has emerged in the past few years, has suggested a possible solution to this problem: Domain knowledge can be derived from semantic

For example, for a statistical dataset on cities, Linked Open Data sources can provide relevant background knowledge such as population, climate, major companies, etc.. Having

The align- ments are stored in the Field Mapping Language (FML) [?], expressing that a data field (column) in a PMML model semantically corresponds to an abstract ‘field’ in a

Implemented data mining algorithms cover all typically mined kinds of knowledge, such as frequent patterns, association rules; and classification, prediction and clustering

To make sense of the intricate relationships between tasks, data and algorithms at different stages of the data mining process, our meta-miner relies on extensive background

Query-independent techniques, for example, were applied to represent temporal data, and query-dependent techniques are commonly used to represent continuous quantitative data

The field of process mining [1] provides tools and tech- niques to increase the overall knowledge of a (business) process, by means of analyzing the event data stored during

The outcome of our approach is a Pareto front of process models based on the constraints specified by the domain expert and common quality dimensions of process mining.. Keywords: