
3 The ADX Algorithm—Creation of a Ruleset for One Class

Let us begin with a loose description of the algorithm. The idea of the ADX (Apriori–Decision rules eXtraction) algorithm is based on the observation that a conjunction of selectors cannot have a larger cov than the minimal cov of each of its selectors. The main goal of the algorithm is to find, for each of the classes, the best ruleset (the set of complexes such that each of them has a possibly large pcov and a possibly small ncov). The rules are built by an iterative process: in each iteration, we lengthen complexes of high quality by one selector. The quality measure is based on pcov and ncov and, roughly speaking, it is high if pcov is high and ncov is low. After creating all the rulesets, classification is done by measuring the similarity of a new unlabeled event to each ruleset. The ruleset most similar to the considered event is chosen as the classifier's decision. We allow ambiguities, noise, and missing values in input data.
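This coverage bound is easy to check on a toy example; the following sketch (Python, with an illustrative dataset and helper names of our own choosing, not taken from the original implementation) demonstrates it directly:

```python
# Toy illustration of the observation behind ADX: the coverage of a
# conjunction of selectors never exceeds the smallest coverage of its
# individual selectors.  Dataset and names are illustrative only.
events = [
    {"A": 1, "B": 3},
    {"A": 1, "B": 4},
    {"A": 2, "B": 3},
    {"A": 1, "B": 3},
]

def cov(selectors, data):
    """Fraction of events satisfying every (attribute, value) selector."""
    hits = [e for e in data if all(e.get(a) == v for a, v in selectors)]
    return len(hits) / len(data)

s1, s2 = ("A", 1), ("B", 3)
print(cov([s1], events))        # 0.75
print(cov([s2], events))        # 0.75
print(cov([s1, s2], events))    # 0.5 <= min(0.75, 0.75)
```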

The ADX algorithm does not try to create very long and complicated rules with ncov = 0 (strong rules). The idea is to create a set of rather simple rules with a possibly large pcov and a possibly small ncov. Sometimes a set of strong rules can perform badly during later prediction (e.g. for noisy data), and therefore ADX does not treat strong rules in any special way. In the algorithm, there is no need to truncate the finally obtained rules, since the way they are constructed keeps them simple, short and general. We assume in the sequel that the input data are discretized beforehand, and hence that ADX works on nominal values. However, a very good discretization technique, as proposed by Fayyad and Irani [9], is implemented in ADX.

3.1 First Step—Finding Selectors Base

For a given class and for each value of each attribute in D, the algorithm finds pcov and ncov. This means that in the first step, the algorithm builds a set of contingency tables for all attributes (excluding d). The set of all simple selectors thus obtained, termed the selectors base, can be sorted by a quality measure Q which combines pcov and ncov (the definition of Q will be given later).
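A rough sketch of this step is given below (Python; the event representation and function names are our own illustration rather than the original implementation, and the measure Q1 defined later in (6) is plugged in for Q):

```python
from collections import Counter

def selectors_base(positive, negative):
    """Build all simple selectors (attribute, value) with their pcov and ncov.

    `positive` and `negative` are lists of dicts mapping attribute -> value,
    holding the events of the considered class and of the remaining classes.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for e in positive:
        for attr, val in e.items():
            pos_counts[(attr, val)] += 1
    for e in negative:
        for attr, val in e.items():
            neg_counts[(attr, val)] += 1

    base = []
    for sel in set(pos_counts) | set(neg_counts):
        pcov = pos_counts[sel] / len(positive)
        ncov = neg_counts[sel] / len(negative)
        base.append((sel, pcov, ncov))
    # sort by quality, best first (Q1 from (6) stands in for Q here)
    base.sort(key=lambda t: (t[1] - t[2]) * (1 - t[2]), reverse=True)
    return base
```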

3.2 Second Step—Creation of Complexes Set

We can say that the selectors base is a set of complexes of length 1. Longer complexes are created via an iterative process. Each iteration consists of:

• Creation of candidates (each of them being one simple selector longer than the complexes of the previous iteration).

• Estimation of candidates’ quality.

• Deleting useless complexes.

• Selection of parents set.

• Checking if stop criteria have been reached.

Creation of Candidates Complexes whose quality has not been estimated yet are called candidates. Creation of such complexes with length 2 is very simple: these are all possible pairs of selectors, excluding pairs where both selectors are based on the same attribute. Notice that a complex of length 2, c1 = (s1, s2), equals c2 = (s2, s1), because each complex is a conjunction of selectors. There is an important issue about the creation of complexes longer than 2. To create a new candidate complex of length n + 1, the algorithm uses two complexes shorter by 1, which can be called parents. Parent complexes need to have a common part of length n − 1 (where n denotes the length of the parent complexes). For example, complexes c1 = (s1, s2, s3) and c2 = (s2, s3, s5) can be used to create c3 = (s1, s2, s3, s5). Creation of complexes with length 2 is based on simple selectors, complexes with length 3 are based on selected parent complexes with length 2, and so on.
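The join of parents sharing a common part can be sketched as follows (Python; representing a complex as a sorted tuple of (attribute, value) selectors is our own choice for illustration):

```python
from itertools import combinations

def initial_candidates(selectors):
    """Length-2 candidates: all pairs of simple selectors on different attributes."""
    return {tuple(sorted((s1, s2)))
            for s1, s2 in combinations(selectors, 2) if s1[0] != s2[0]}

def candidates_from_parents(parents):
    """Join parent complexes of length n into candidates of length n + 1.

    Two parents are joined only if they share a common part of length n - 1,
    and a candidate may not contain two selectors on the same attribute.
    """
    candidates = set()
    for c1, c2 in combinations(parents, 2):
        merged = tuple(sorted(set(c1) | set(c2)))
        if len(merged) != len(c1) + 1:
            continue          # common part is shorter than n - 1
        attributes = [a for a, _ in merged]
        if len(attributes) != len(set(attributes)):
            continue          # two selectors on the same attribute
        candidates.add(merged)
    return candidates
```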

Evaluation of Candidates' Quality After creation of the candidates set, the quality of each newly created complex needs to be evaluated separately. During the evaluation process, the positive and negative coverage (pcov and ncov) on the training set is calculated for each candidate complex. Based on these coverages, the quality measure Q1 is calculated and used to evaluate the complexes' usability.

Q1 = (pcov − ncov)(1 − ncov)   (6)

Deleting Useless Candidates Complexes that do not cover any positive event (pcov = 0) are deleted during this step (to save memory). Such complexes do not describe the class and are useless for the next steps. However, the rest of the complexes, including those that will not be used for the creation of new candidates, are stored for the future final selection step.
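A sketch of the evaluation and pruning of candidates (Python, continuing the illustrative representation of the previous sketches; `e.get` is used so that a missing attribute value simply fails the selector):

```python
def evaluate_candidates(candidates, positive, negative):
    """Attach pcov, ncov and Q1 to every candidate and drop those with pcov = 0."""
    def cov(complex_, data):
        hits = sum(all(e.get(a) == v for a, v in complex_) for e in data)
        return hits / len(data)

    evaluated = []
    for c in candidates:
        pcov, ncov = cov(c, positive), cov(c, negative)
        if pcov == 0:
            continue                          # covers no positive event: useless
        q1 = (pcov - ncov) * (1 - ncov)       # quality measure (6)
        evaluated.append((c, pcov, ncov, q1))
    return evaluated
```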

Selection of Parents Within this step, ADX selects the complexes to serve as parents, from which the next candidates (longer by 1) are created. The size of the parents set is defined by the parameter searchBeam, and the selection of parents is based on the measure Q1 (higher is better). If the number of complexes which have the same (and the highest) value of Q1 is larger than searchBeam, then exactly searchBeam parents are selected at random. Complexes which were not selected to be parents in the next iteration are stored to be used in the final selection process. While the parameter searchBeam controls the scope of exploration, it also affects the learning time: with increasing searchBeam the scope of exploration increases, but unfortunately, the learning time grows too. Loosely speaking, searchBeam should be set in such a way that possibly the best results are obtained within acceptable time. Its default value is set to 50.

In this phase, the selection of complexes with ncov = 0 as parents is allowed. Experiments have shown that this leads to building longer complexes, which can play a positive role during the classification process (the ruleset becomes more specific).
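Parent selection can then be sketched as follows (Python; the random choice among complexes tied at the beam boundary follows the description above, while the remaining details are our own reading):

```python
import random

def select_parents(evaluated, search_beam=50):
    """Keep at most `search_beam` complexes with the highest Q1 as parents.

    `evaluated` is a list of (complex, pcov, ncov, q1) tuples.  Complexes tied
    at the cut-off value of Q1 are chosen at random.
    """
    ranked = sorted(evaluated, key=lambda t: t[3], reverse=True)
    if len(ranked) <= search_beam:
        return [t[0] for t in ranked]
    cutoff = ranked[search_beam - 1][3]
    above = [t for t in ranked if t[3] > cutoff]
    tied = [t for t in ranked if t[3] == cutoff]
    chosen = above + random.sample(tied, search_beam - len(above))
    return [t[0] for t in chosen]
```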

Stop Criteria Creation of candidates has two possible stop criteria. According to the first criterion, the creation of new candidates is stopped when the complex length equals the number of attributes (without the decision attribute). Creation of the ruleset has to be finished, because each selector in a complex must describe a different attribute (the complex cannot get any longer).

According to the second criterion, the algorithm is stopped when the set of new candidates is empty (there are no parents that have a common part of length n − 1, so new candidates cannot be created).
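Both criteria amount to a simple check at the end of each iteration (a minimal sketch; `num_attributes` excludes the decision attribute d):

```python
def should_stop(candidates, complex_length, num_attributes):
    """Stop when complexes span all attributes or no new candidates were created."""
    return complex_length >= num_attributes or len(candidates) == 0
```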

3.3 Third Step—Merging of Complexes

In this step, the number of created and evaluated complexes is decreased and their quality improved. If some complexes are based on the same attribute set and only one selector has a different value, then it is possible to merge such complexes. Merging induces the creation of a new selector and the removal of the merged ones. The new selector is a disjunction of the values of the corresponding base selectors. For example: (A = 1, B = 3, C = 4) ⊕ (A = 1, B = 3, C = 7) ⇒ (A = 1, B = 3, C = [4, 7]), where the new selector is C = [4, 7]. For the resulting complex, pcov and ncov are the sums of the corresponding coverages of the removed complexes (the summed coverages do not overlap, since a single event cannot match two different values of the merged attribute).
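Under a representation where each selector carries a set of admissible values (our own choice for this step), merging can be sketched as:

```python
def try_merge(c1, c2):
    """Merge two complexes that differ only in the values of one selector.

    A complex is a dict mapping attribute -> set of admissible values,
    e.g. {"A": {1}, "B": {3}, "C": {4}}.  Returns the merged complex, or None
    if the two complexes cannot be merged.
    """
    if set(c1) != set(c2):
        return None                      # different attribute sets
    differing = [a for a in c1 if c1[a] != c2[a]]
    if len(differing) != 1:
        return None                      # must differ on exactly one selector
    attr = differing[0]
    merged = dict(c1)
    merged[attr] = c1[attr] | c2[attr]   # disjunction of values, e.g. C = [4, 7]
    return merged

# example from the text: (A=1, B=3, C=4) merged with (A=1, B=3, C=7)
print(try_merge({"A": {1}, "B": {3}, "C": {4}},
                {"A": {1}, "B": {3}, "C": {7}}))
# -> {'A': {1}, 'B': {3}, 'C': {4, 7}}; its pcov/ncov are the sums of the parents'
```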

3.4 Fourth Step—Final Selection of the Ruleset

The final selection step is critical for the ruleset creation and is similar to the truncation of a decision tree; essentially it has the same effect. From the entire set of stored rules, we have to select the most appropriate subset that can later be used for successful classification.

In this phase, from the entire set of complexes, at most finalBeam complexes (a number not larger than the fixed value of this parameter) are selected. The selection is based on the prediction ability of the already selected rules. First of all, all the complexes discovered so far are sorted by the quality measure Q2 (7). The measure Q2 is similar to Q1 but is more focused on complexes with a lower ncov.

Q2 = (pcov − ncov)(1 − ncov)^2   (7)

In order to describe the final selection process, a measure Qr of the prediction ability of a ruleset has to be introduced:

Qr = (Sp − Sn)(1 − Sn)   (8)

where

Sp = Σ_{e ∈ Dp} S(e)   (9)

Sn = Σ_{e ∈ Dn} S(e)   (10)

and S(e) is a measure of event e, to be defined in the next section.

The idea of the measure Qr is to estimate the prediction ability of a ruleset on a given subset of events. The factor Sp denotes the sum of the scores over all positive events in the subset. Loosely speaking, the score S(e) of an event e expresses its level of belonging to the given ruleset. Analogously, the factor Sn is calculated only for the negative events from the selected training subset.
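Taking the score S(e) as a given (it is defined in the next section), Qr from (8)–(10) reduces to a few lines (Python; `score(event, ruleset)` is a placeholder for S(e)):

```python
def ruleset_quality(ruleset, positive, negative, score):
    """Qr = (Sp - Sn)(1 - Sn), where Sp and Sn sum the scores S(e) of the
    positive and negative events of the considered subset, as in (8)-(10)."""
    sp = sum(score(e, ruleset) for e in positive)
    sn = sum(score(e, ruleset) for e in negative)
    return (sp - sn) * (1 - sn)
```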

After sorting all the complexes, we can start from the complex with the highest Q2 and add complexes, one by one, to a temporary final ruleset. If the measure Qr calculated for the entire temporary ruleset does not decrease, the temporarily added rule joins the final ruleset. If not, the rule is removed from the temporary ruleset and the next rule from the ranking (with a slightly lower Q2) is temporarily added. If the training set is very large, then to speed up the final selection process we can estimate Qr on a randomly selected subset of training events. The size of the selected subset is defined by the parameter maxEventsForSelection.

The measure Qr has a high value if the rules contained in the temporary ruleset have a possibly high score for all positive events in the data subset and a possibly low score for the negative events in the same subset. Generally speaking, the selection of rules for the final ruleset is based on optimizing successful reclassification of the events from the training subset.
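Putting the pieces together, the final selection can be sketched as a greedy loop (Python; `ruleset_quality` is the sketch above, the parameters mirror finalBeam and maxEventsForSelection, and the loop structure is our reading of the description rather than the original code):

```python
import random

def final_selection(complexes, positive, negative, score,
                    final_beam=50, max_events_for_selection=None):
    """Greedily build the final ruleset, keeping rules that do not decrease Qr.

    `complexes` is a list of (complex, pcov, ncov) tuples gathered over all
    iterations; rules are tried in decreasing order of Q2 (7).
    """
    if max_events_for_selection is not None:
        positive = random.sample(positive, min(len(positive), max_events_for_selection))
        negative = random.sample(negative, min(len(negative), max_events_for_selection))

    def q2(pcov, ncov):
        return (pcov - ncov) * (1 - ncov) ** 2

    ranked = sorted(complexes, key=lambda t: q2(t[1], t[2]), reverse=True)

    ruleset, best_qr = [], float("-inf")
    for c, _, _ in ranked:
        if len(ruleset) == final_beam:
            break
        qr = ruleset_quality(ruleset + [c], positive, negative, score)
        if qr >= best_qr:        # Qr does not decrease: the rule joins the ruleset
            ruleset, best_qr = ruleset + [c], qr
        # otherwise the rule is dropped and the next one in the ranking is tried
    return ruleset
```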

Notice that the number and quality of the selected rules have a high impact on the speed and quality of later classification. Therefore, the maximal number of rules in the final ruleset (the parameter finalBeam) can be set larger than its default value for extremely large and complicated data. However, for some datasets, if Qr cannot be increased by adding further complexes, the size of the final ruleset can be smaller than the default finalBeam value (e.g. if one rule in a given ruleset is sufficient to successfully predict the class). The default setting for finalBeam is 50.