
Association Analysis with Microsoft Decision Trees


One of the unique features of Microsoft Decision Trees is that it can be used for association analysis. A mining model may contain a forest of trees. If a model contains a nested table and the nested table is predictable, all the nested keys are considered to be predictable attributes. The Microsoft Decision Trees algorithm builds a tree for each of them, more precisely, for each nested key that is a selected feature.
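For example, a model of this shape can be sketched in DMX as follows; the model name, case key, and nested column names here are hypothetical. The PREDICT flag on the nested Movies table is what makes every nested key (each movie) a predictable attribute and thus a candidate tree.

CREATE MINING MODEL MovieAssociationDT   // hypothetical model name
(
    CustomerID LONG KEY,
    Gender     TEXT DISCRETE,
    Movies TABLE PREDICT                 // predictable nested table: one tree per selected movie
    (
        MovieName TEXT KEY
    )
)
USING Microsoft_Decision_Trees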

MICROSOFT LINEAR REGRESSION ALGORITHM

To make the linear regression feature of the Microsoft Decision Trees algorithm more visible, SQL Server Analysis Services 2005 added a new algorithm: Microsoft Linear Regression. It is actually based on the Microsoft Decision Trees algorithm, but it never splits the data; the regression formula is based on the entire dataset.
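A minimal sketch of such a model in DMX, assuming hypothetical column names (Age and IQ as regressors, Income as the continuous target):

CREATE MINING MODEL IncomeRegression     // hypothetical model name
(
    CustomerID LONG KEY,
    Age        LONG CONTINUOUS REGRESSOR,
    IQ         LONG CONTINUOUS REGRESSOR,
    Income     LONG CONTINUOUS PREDICT   // single regression formula over all cases
)
USING Microsoft_Linear_Regression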

Figure 5.3 illustrates a set of trees to predict movie relationships. The top-left tree predicts the popularity of Stargate. The dark bar in the histogram represents the probability of a viewer not liking Stargate, while the white bar represents the probability of a viewer liking Stargate. The first split is on the Star Wars attribute: if a person likes Star Wars, he or she is much likelier to like Stargate. The second split shows that a person who likes Star Trek also has a high probability of liking Stargate.

Figure 5.3 Association using Microsoft Decision Trees. (The figure shows a Stargate tree splitting on Star Wars, Star Trek, and Matrix, and a Terminator tree splitting on Matrix, E.T., Star Trek, and Star Wars.)

There are multiple trees in the model. From each tree, we can find a set of movies that is correlated with the predictable movie. For example, based on the Stargate tree, we can say that fans of Star Wars and Star Trek are likely to enjoy Stargate with certain weights (calculated based on the probability gain).

Based on the Terminator tree, we can predict that Matrix and E.T. fans will also like Terminator. By going over the entire forest of trees, we can derive all the relationships among the movies. These relationships are, in fact, association rules and can be used for making associated predictions. For example, if a person likes Star Wars, we can recommend Stargate and Matrix to him or her.
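Such an associated prediction can be expressed as a DMX query. The sketch below assumes the hypothetical MovieAssociationDT model shown earlier and asks for up to five recommendations for a viewer who likes Star Wars:

SELECT Predict(Movies, 5)                // up to 5 associated movies
FROM MovieAssociationDT
NATURAL PREDICTION JOIN
(
    SELECT (SELECT 'Star Wars' AS MovieName) AS Movies
) AS NewViewer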

Using Microsoft Decision Trees for association analysis is very interesting; associated items are displayed in both tree form and dependency network form.

However, there are also limitations to this association task. Because the algorithm builds a decision tree for each item, training may take significant time and resources when there are lots of items. The default maximum number of trees is 255. If there are more than 255 items, the algorithm uses feature selection techniques to select the most important ones.

T I P The Microsoft Decision Trees algorithm does association analysis by combining all the trees and deriving the correlations among the tree roots. It is best when the number of items for associative analysis is limited; otherwise, the algorithm has to build a large number of trees, which is time- and resource-consuming.

The other issue is that Microsoft Decision Trees doesn't return itemsets and rules as an association algorithm does. The user has to figure out the relationships using a content viewer. Our recommendation is to build models with both the decision tree and association algorithms; you may find complementary information. If you have a large number of items, you should use an association algorithm.
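As a sketch, a companion model over the same hypothetical nested structure could use the Microsoft Association Rules algorithm; the parameter values shown are illustrative only:

CREATE MINING MODEL MovieAssociationAR   // hypothetical companion model
(
    CustomerID LONG KEY,
    Movies TABLE PREDICT
    (
        MovieName TEXT KEY
    )
)
USING Microsoft_Association_Rules (MINIMUM_SUPPORT = 0.01, MINIMUM_PROBABILITY = 0.30)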

Algorithm Parameters

There are a number of parameters for Microsoft Decision Trees. These parameters are used to control the tree growth, tree shape, and the input/output attribute settings. By adjusting these parameter settings, you can fine-tune the model's accuracy. The following is the list of decision tree algorithm parameters; a short DMX sketch showing how some of them are set follows the list.

■■ Complexity_Penalty is used to control the tree growth. It is a floating-point number with range [0,1]. When its value is set close to 0, there is a lower penalty for tree growth; thus, you may see a large tree after model training. When its value is set close to 1, tree growth is penalized, and the final tree is relatively small. Generally speaking, large trees tend to have overtraining issues, whereas small trees may miss some patterns. The recommended way to tune the model is to try multiple trees with different settings and then use a lift chart to verify the model's accuracy on testing data in order to pick the best one. The default setting is related to the number of input attributes: if there are fewer than 10 input attributes, the value is set to 0.5; if there are between 10 and 100 input attributes, the value is set to 0.9; if there are more than 100 attributes, the value is set to 0.99.

■■ Minimum_Support is used to specify the minimum size of each leaf node in the tree. For example, if this value is set to 20, any tree split that would produce a child node containing fewer than 20 cases is not accepted. The default value for Minimum_Support (Minimum_Leaf_Cases in SQL Server 2000) is 10. Usually, if the training dataset contains lots of cases, you will need to raise the value of this parameter to avoid oversplitting (overtraining).

■■ Score_Method is a parameter of integer type. It is used to specify the method for measuring a tree split score during tree growth. We have discussed the concept of entropy in this chapter; to use an entropy score for tree growth, you set Score_Method = 1. There are a few other score methods supported by Microsoft Decision Trees: Bayesian K2 (BK2, Score_Method = 3) and Bayesian Dirichlet Equivalent with Uniform prior (BDEU, Score_Method = 4). BK2 adds a constant for each state of the predictable attribute in a tree node, regardless of the node level in the tree. BDEU adds weighted support to each predictable state based on the node level; the weight of the root node is higher than that of a leaf node, and thus the assigned prior (knowledge) is larger. The default value for Score_Method is 4, the BDEU method. Score_Method = 2 (orthogonal) is no longer supported in SQL Server 2005.

■■ Split_Method is a parameter of integer type. It is used to specify the tree shape, for example, whether the tree shape is binary or bushy. Split_Method = 1 means the tree is split only in a binary way. For example, Education is an attribute with three states: high school, undergraduate, and graduate. If the tree split is set to be binary, the algorithm may split the tree into two nodes with the criterion "Education = Undergraduate?" If the tree split is set to be complete (Split_Method = 2), the split on the Education attribute produces three nodes, one corresponding to each educational state. When Split_Method is set to 3 (the default setting), the decision tree will automatically choose the better of the first two methods to create the split.

■■ Maximum_Input_Attributes is a threshold parameter for feature selection. When the number of input attributes is greater than this parameter value, feature selection is invoked implicitly to select the most significant input attributes.

■■ Maximum_Output_Attributes is a threshold parameter for feature selection. When the number of predictable attributes is greater than this parameter value, feature selection is invoked implicitly to select the most significant attributes. A tree is built for each of the selected attributes.

■■ Force_Regressor is a parameter for regression trees. It forces the algorithm to use the specified attributes as regressors. Suppose that you have a model to predict Income using Age, IQ, and other attributes. If you specify Force_Regressor = {Age, IQ}, you get regression formulas using Age and IQ in each leaf node of the tree.
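As mentioned before the list, here is a minimal DMX sketch of how several of these parameters are set at model creation time; the model, columns, and parameter values are hypothetical and only illustrate the syntax:

CREATE MINING MODEL MemberCardDT         // hypothetical model
(
    CustomerID LONG KEY,
    Age        LONG CONTINUOUS,
    Income     LONG CONTINUOUS,
    Education  TEXT DISCRETE,
    MemberCard TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees
(
    COMPLEXITY_PENALTY = 0.7,            // penalize growth more than the default for < 10 inputs
    MINIMUM_SUPPORT    = 20,             // each leaf must cover at least 20 cases
    SCORE_METHOD       = 1,              // entropy scoring
    SPLIT_METHOD       = 3               // let the algorithm choose binary or complete splits
)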
