Chapter 4 Meta-mining as a Classification Problem

4.4 Meta-mining Framework

Symbol                            Meaning
$o$                               a workflow operator.
$e$                               a workflow data type.
$w_l = [o_1, \dots, o_l]$         a ground DM workflow as a sequence of $l$ operators.
$\mathbf{w}_l = (I(p_1\,t\,w_l), \dots, I(p_{|P|}\,t\,w_l))^T$
                                  the fixed-length $|P|$-dimensional vector description of a workflow $w_l$.
$\mathbf{x} = (x_1, \dots, x_d)^T$  the $d$-dimensional vector description of a dataset.
$y(\mathbf{x}, \mathbf{w})$       the relative performance score of workflow $w$ on dataset $x$.
$g$                               a DM goal.
$t$                               an HTN task.
$m$                               an HTN method.
$\hat{O} = \{o_1, \dots, o_n\}$   an HTN abstract operator with $n$ possible operators.
$C_l$                             a set of candidate workflows at some abstract operator $\hat{O}$.
$S_l$                             a set of candidate workflows selected from $C_l$.

Table 4.1: Summary of notations used.

match, like classification accuracy, of the $\mathbf{x}_i$ and $\mathbf{w}_j$ instances. Table 4.1 summarizes the most important notations.

4.4 Meta-mining Framework

This section gives the general description of our framework. We first define the task of algorithm selection (Hilario, 2002; Smith-Miles, 2008), which is best described with the Rice model and is one of the two main tasks in meta-learning, the other being model selection (Ali and Smith-Miles, 2006, 2007). We then extend this model to account for workflow descriptors. From this new model, we describe in Section 4.4.3 the three meta-mining tasks that we will address with our framework.

4.4.1 Rice Model

The task of algorithm selection traces back to the mid-seventies, when J. Rice formalized it in a seminal paper (Rice, 1976). In his model, the task is decomposed into four main components: the problem space $\mathcal{X}$, the algorithm space $\mathcal{A}$, the performance space $\mathcal{Y}$ relating problems and algorithms, and a problem feature space which describes the original problem space. Basically, the task of algorithm selection amounts to extracting relevant features $\mathbf{x} \in \mathbb{R}^d$ from different problems $x \in \mathcal{X}$ in order to find the best selection mapping $S(\mathbf{x})$, which selects the algorithm $a \in \mathcal{A}$ that maximizes the performance metric $y(x, a)$. Figure 4.1 shows the Rice model and its four components.

[Figure 4.1: Rice model and its four components. Panels: Problem Space $\mathcal{X}$ ($x \in \mathcal{X}$); Problem Feature Space ($\mathbf{x} \in \mathbb{R}^d$); Algorithm Space $\mathcal{A}$ ($a \in \mathcal{A}$); Performance Space $\mathcal{Y}$ ($y \in \mathcal{Y}$). Arrows: Selection Mapping, $a = \arg\max_{a \in \mathcal{A}} S(\mathbf{x})$; Performance Mapping, $y(x, a)$.]

In meta-learning, we typically proceed as follows. First, we extract training meta-data, i.e. a set of meta-examples of the form $\langle \mathbf{x}, y(x, a) \rangle$, from a set of learning problems $x \in X$ and a set of algorithms $A \subseteq \mathcal{A}$. With these training data, we train one or more meta-learning models of the form $H : \mathcal{X} \times \mathcal{A} \to \mathcal{Y}$, which are used to predict, from the pool $A$ of base-level learners, the best algorithm $a \in A$ that maximizes a given performance metric $y \in \mathcal{Y}$ on a given learning problem $x \in \mathcal{X}$:

$$a = S(\mathbf{x}) = \arg\max_{a \in A} h(\mathbf{x}, a) \tag{4.1}$$

where $h \in H$ is a meta-learning model that predicts the performance of a given algorithm $a \in A$ on a given learning problem $x$. Several variants for building a meta-learning model exist in the literature: one can build one meta-learning model per algorithm and select the algorithm that delivers the highest performance on a given dataset (King et al., 1995; Michie et al., 1994); build one meta-learning model per pair of algorithms and then rank each algorithm according to its relative performance on the dataset (Kalousis and Theoharis, 1999; Peng et al., 2002; Sun and Pfahringer, 2013); or build a single model to predict the performance of similarly performing algorithms on the dataset (Aha, 1992). Overall, the main assumption behind the algorithm selection task in meta-learning is that problems characterized by similar features map to the same algorithm and exhibit similar performance (Giraud-Carrier, 2008).
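The first variant above (one performance predictor per algorithm, followed by an arg max as in Eq. (4.1)) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method used in the cited works: the meta-data values, the algorithm names and the 1-nearest-neighbour predictor standing in for $h$ are all hypothetical.

```python
from math import dist

# Hypothetical meta-data: dataset feature vectors x and, per algorithm a,
# the measured performance y(x, a). All values are made up for illustration.
meta_features = {
    "d1": (0.1, 0.9),
    "d2": (0.8, 0.2),
    "d3": (0.7, 0.3),
}
performance = {
    "tree": {"d1": 0.70, "d2": 0.90, "d3": 0.88},
    "knn": {"d1": 0.85, "d2": 0.60, "d3": 0.65},
}


def h(x, algorithm):
    """Predict y(x, a) as the score of `algorithm` on the nearest known dataset."""
    nearest = min(meta_features, key=lambda d: dist(x, meta_features[d]))
    return performance[algorithm][nearest]


def select_algorithm(x, pool):
    """Selection mapping S(x): arg max over the pool of the predicted performance."""
    return max(pool, key=lambda a: h(x, a))


best = select_algorithm((0.78, 0.2), performance)
```

Note that `h` can only score algorithms that appear as keys of `performance`; an algorithm outside the pool has no meta-model at all in this sketch.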

However, the main drawback of meta-learning is that this framework is restricted to modeling only the specific algorithm entities given in the pool $A$ of base-level learners; the meta-learner cannot generalize the learned meta-learning models to algorithms that have not been experimented with in the original pool $A$. As a result, in order to account for new algorithms, the meta-learner has to run new base-level experiments and re-train a meta-learning model on these new data points.

[Figure 4.2: The new Rice model and its five components. Panels: Problem Space $\mathcal{X}$ ($x \in \mathcal{X}$); Problem Feature Space ($\mathbf{x} \in \mathbb{R}^d$); Workflow Space $\mathcal{W}$ ($w \in \mathcal{W}$); Workflow Feature Space ($\mathbf{w} \in \mathbb{R}^{|P|}$); Performance Space $\mathcal{Y}$ ($y \in \mathcal{Y}$). Arrows: Selection Mapping, $w = \arg\max_{w} S(\mathbf{x}, \mathbf{w})$; Performance Mapping, $y(\mathbf{x}, \mathbf{w})$.]

4.4.2 Revisiting the Rice Model

We propose to extend the Rice model so that the selection mapping $S$ is no longer restricted to dataset characteristics but also includes workflow characteristics. The proposed revision of the Rice model is illustrated in Figure 4.2. We now have a fifth component, the workflow feature space $\mathbb{R}^{|P|}$, which describes the workflow space $\mathcal{W}$ in the same manner as the problem feature space $\mathbb{R}^d$ describes the problem space $\mathcal{X}$. In addition, we now have a mapping function $S(\mathbf{x}, \mathbf{w})$ which takes into account both dataset and workflow characteristics.

With the additional use of workflow descriptors $\mathbf{w}$ in the mapping function $S$, we are no longer restricted to building a model that only considers the specific entities of the training algorithms, as in the original Rice model. Instead, we will build a meta-mining model $h \in H$ that combines dataset and workflow characteristics, so that the model can associate the two with respect to some performance measure of the workflows applied to the datasets. More precisely, we now look for a predictive model $h \in H$ of the form:

$$H : \mathcal{X} \times \mathcal{W} \to \mathcal{Y} \tag{4.2}$$

which will associate the two feature spaces $X \subseteq \mathcal{X}$ and $W \subseteq \mathcal{W}$ with respect to the performance space $Y \subseteq \mathcal{Y}$. A model of the form given in Eq. (4.2) is typically said to operate on dyadic data in machine learning, where we have two finite sets of objects and observations are made for dyads, i.e. pairs with one element from each set (Hofmann et al., 1999). Finally, note that our model is closely related to recent work on automatic algorithm configuration for algorithm selection, like AutoWeka (Thornton et al., 2013) and AutoFolio (Lindauer, Hoos, Hutter, and Schaub, 2015), which treat workflows as "templates" and optimize the components of the templates, i.e. the algorithms to be selected within a workflow, using a Bayesian procedure. However, as already mentioned in Section 2.4, our framework encompasses those approaches in the sense that it is the first one able to generalize the meta-models to unseen datasets and workflows, thanks to the learned association between the dataset and workflow feature spaces.
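To make the dyadic form of Eq. (4.2) concrete, the sketch below trains nothing more than a 1-nearest-neighbour predictor over the concatenation of a dataset vector and a workflow vector. The predictor, the training dyads and their scores are assumptions chosen for illustration; they are not the meta-mining models proposed in this chapter.

```python
from math import dist

# Hypothetical training dyads (x, w, y): dataset features, workflow pattern
# indicators, and the observed performance. Values are illustrative only.
dyads = [
    ((0.1, 0.9), (1, 0, 1), 0.85),
    ((0.1, 0.9), (0, 1, 0), 0.60),
    ((0.8, 0.2), (1, 0, 1), 0.55),
    ((0.8, 0.2), (0, 1, 0), 0.90),
]


def h(x, w):
    """Predict y(x, w) from the nearest training dyad in the joint feature space."""
    query = tuple(x) + tuple(w)
    _, _, y = min(dyads, key=lambda d: dist(query, d[0] + d[1]))
    return y


# Both the dataset vector and the workflow vector below are absent from `dyads`.
predicted = h((0.2, 0.8), (1, 0, 0))
```

Because the model is indexed by feature vectors rather than by algorithm identity, it returns a prediction for a dataset-workflow pair in which neither element was seen during training.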

4.4.3 Meta-mining Tasks

From the above model, we will be able to address three novel selection tasks. The first one is similar to the algorithm selection task in meta-learning. However, as already said, we now use the notion of workflow instead of algorithm, so we will call this task learning workflow preferences. In addition, we will address the task of dataset selection, or learning dataset preferences, which consists in selecting for a given workflow the best dataset, i.e. the one on which the given workflow will achieve the best performance. Furthermore, we will go one step further by trying to address both algorithm and dataset selection together, a task that we will call learning dataset-workflow preferences, and which consists in finding the best match, or selection mapping $S$, between a new problem and a new algorithm or workflow, considering that neither has been experimented with in the past.

Task 1: Learning workflow preferences

In the case of workflow selection, the optimization problem can be re-formulated as:

$$w = S(\mathbf{x}, \mathbf{w}) = \arg\max_{w \in \mathcal{W}} h(\mathbf{x}, \mathbf{w}) \tag{4.3}$$


for some $x \in \mathcal{X}$. Note that, compared to the original model, this new model is no longer restricted to selecting workflows from a fixed pool of base-level workflows $W \subseteq \mathcal{W}$: because we can now characterize any workflow $w \in \mathcal{W}$ by its characteristics, namely frequent workflow patterns, $\mathbf{w} \in \mathbb{R}^{|P|}$, the model can generate suggestions for workflows that were never seen or experimented with during the model generation process.
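The selection in Eq. (4.3) can be sketched as below. Several elements are assumptions made purely for illustration: the pattern set `PATTERNS`, the operator names, the match relation (a pattern is taken to match when it occurs as a contiguous run of operators, which may differ from the relation used in this thesis), and the bilinear score with made-up weights `M` that stands in for a learned model $h$.

```python
# Hypothetical frequent patterns P, each a short operator sequence.
PATTERNS = [("normalize",), ("select_features", "svm"), ("tree",)]


def contains(workflow, pattern):
    """Assumed match relation: `pattern` occurs as a contiguous run in `workflow`."""
    n = len(pattern)
    return any(
        tuple(workflow[i:i + n]) == pattern
        for i in range(len(workflow) - n + 1)
    )


def describe(workflow):
    """|P|-dimensional indicator vector of the patterns found in the workflow."""
    return tuple(int(contains(workflow, p)) for p in PATTERNS)


# Stand-in for a learned meta-mining model h(x, w): a bilinear score x^T M w
# with made-up weights M (d = 2 dataset features, |P| = 3 patterns).
M = [[0.2, 0.9, 0.1],
     [0.3, 0.1, 0.8]]


def h(x, w):
    return sum(x[i] * M[i][j] * w[j] for i in range(len(x)) for j in range(len(w)))


def select_workflow(x, candidates):
    """Task 1: arg max over candidate workflows of h(x, w)."""
    return max(candidates, key=lambda wf: h(x, describe(wf)))


candidates = [
    ["normalize", "select_features", "svm"],
    ["tree"],
    ["normalize", "tree"],
]
best = select_workflow((0.0, 1.0), candidates)
```

Any operator sequence can be passed in `candidates`, whether or not it took part in training, since only its pattern description is used for scoring.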

Task 2: Learning dataset preferences

In the same manner, for the case of dataset selection, the optimization problem is now given by:

$$x = S(\mathbf{x}, \mathbf{w}) = \arg\max_{x \in \mathcal{X}} h(\mathbf{x}, \mathbf{w}) \tag{4.4}$$

for some $w \in \mathcal{W}$. As in Task 1, we are not restricted here to selecting the best problem $x$ from the training set of learning problems $X$; that is, the model can generate suggestions for datasets that were never seen or experimented with during the model generation process.

Even though selecting datasets for workflows is not a typical use case in machine learning, we think that this task is worth mentioning and experimenting with in the context of meta-mining. More precisely, because a meta-mining model $h$ takes into account both dataset and workflow characteristics, we can address Task 2 using the same model learned in Task 1. However, because each task focuses on the performance prediction of a different learning entity, we will obtain different predictive accuracies in our experiments; see Section 4.6 for more details.

Task 3: Learning dataset-workflow preferences

Finally, in the case of the joint dataset and workflow selection task, the optimization problem can be formulated as the combination of the two optimization problems formulated above:

$$(x, w) = S(\mathbf{x}, \mathbf{w}) = \arg\max_{x \in \mathcal{X},\, w \in \mathcal{W}} h(\mathbf{x}, \mathbf{w}) \tag{4.5}$$

As before, we can use the same meta-mining model $h$ learned in Tasks 1 and 2 to address Task 3. However, the objective now is to find the pair of dataset and workflow that maximizes the performance metric $y(\mathbf{x}, \mathbf{w})$ under consideration. Note that this task is much more difficult than the two other tasks, since here we are trying to predict the performance of a new learning pair and not only the performance of a single learning entity.
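Since Tasks 2 and 3 reuse the same model and differ only in what is searched over, they can be sketched side by side. The function names, the candidate vectors and the bilinear `h` below are hypothetical stand-ins for a learned meta-mining model, and the exhaustive search over a finite candidate list is an assumption of this sketch.

```python
from itertools import product


def best_dataset(h, datasets, w):
    """Task 2 (Eq. 4.4): arg max over datasets for one fixed workflow vector w."""
    return max(datasets, key=lambda x: h(x, w))


def best_pair(h, datasets, workflows):
    """Task 3 (Eq. 4.5): arg max over all (x, w) pairs of the predicted h(x, w)."""
    return max(product(datasets, workflows), key=lambda pair: h(*pair))


# Stand-in for a learned model: a made-up bilinear score, for illustration only.
def h(x, w):
    return x[0] * w[0] + x[1] * w[1]


datasets = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]
workflows = [(1, 0), (0, 1)]

chosen_dataset = best_dataset(h, datasets, (0, 1))
chosen_pair = best_pair(h, datasets, workflows)
```

The joint search evaluates `len(datasets) * len(workflows)` pairs, against `len(datasets)` evaluations for Task 2.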

Overall, we will propose in this thesis different methods to address the three meta-mining tasks above. Note that none of these three tasks has been considered in the meta-learning literature since, in meta-learning, workflow characteristics are not used to build a meta-learning model; as such, meta-learning cannot predict the performance of new workflows. For the first task, learning workflow preferences, we will compare the performance of a meta-learning model against a meta-mining model in classifying workflows for a real-world learning problem, i.e. a set of real-world datasets on which we applied a number of different workflows. In both approaches, we will use the same meta-classifier: for meta-learning, we will train the meta-learner using only dataset characteristics, and for the meta-miner we will use dataset and workflow characteristics together, with respect to some performance measure of the workflows applied to the datasets. Before proceeding to the experiments, we describe in the next section the learning problem from which we will extract dataset characteristics and workflow performances to build the baseline meta-learner, and from which we will also extract workflow characteristics to build our meta-miner.
