
Exploring a Naïve Bayes Model

From the document Data Mining with (Pages 163-168)

When exploring a Naïve Bayes model, it is easier to think of the process as simply exploring your data. Since the Naïve Bayes algorithm does not perform any kind of advanced analysis on your data, the views into the model really are simply a new way of looking at the data you always had.

SQL Server Data Mining provides four different views on Naïve Bayes models that help provide insight into your data. The viewer is accessed through either the BI Development Studio or SQL Server Management Studio by right-clicking on the model and selecting "Browse." The views are:

■■ Dependency Net

■■ Attribute Profiles

■■ Attribute Characteristics

■■ Attribute Discrimination

Dependency Net

The first tab of the Naïve Bayes viewer is the dependency net. The dependency net (see Figure 4.3) provides a quick display of how all of the attributes in your model are related. Each node in the graph represents an attribute, whereas each edge represents a relationship. If a node has an outgoing edge, as indicated by the arrow, it is predictive of the attribute in the node at the end of the edge. Likewise, if a node has an incoming edge, it is predicted by the other node. Edges can also be bidirectional, indicating that the attributes in the corresponding nodes predict and are predicted by each other.
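The viewer derives link strengths from the model's internal statistics, which it does not expose directly. As a rough illustration only, the strength of a dependency between two categorical attributes can be sketched as their mutual information; the voting records below are invented, and this measure is a stand-in rather than the algorithm's actual scoring:

```python
from collections import Counter
from math import log2

# Hypothetical (party, vote) records -- invented for illustration only.
records = [
    ("Republican", "Yea"), ("Republican", "Yea"), ("Republican", "Nay"),
    ("Democrat", "Nay"), ("Democrat", "Nay"), ("Democrat", "Yea"),
]

def mutual_information(pairs):
    """Mutual information between two categorical attributes -- a rough
    stand-in for the link strength a dependency net edge represents."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

print(round(mutual_information(records), 3))
```

A value near zero means the attributes are nearly independent, which is the kind of weak link the slider filters out first.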

You can easily home in on the attributes that interest you by using the Find Node feature. Clicking the Find Node button displays a list of all attributes, whether shown in the graph or hidden. Selecting a node from the list causes that node to become selected in the graph. The selected node is highlighted, and all connected nodes are shaded with a color representing their relationship to the selection. Figure 4.3 shows a portion of the dependency net for the Congressional Voting model with the Party node selected. From this view, it is easy to see the relationships that Party has with the other attributes in the model.

In addition to displaying the relationships and their directions, the dependency net can also tell you the strength of those relationships. Moving the slider from top to bottom will filter out the weaker links, leaving the strong relationships.

NOTE You will not see all of the possible relationships in your model unless all columns are checked as both input and predictable in the Mining Model Wizard or marked Predict in the Mining Model Editor. Additionally, some links may be missing if you raise the MINIMUM_NODE_SCORE parameter.

Attribute Profiles

The second tab, the Attribute Profile viewer, provides you with an exhaustive report of how each input attribute corresponds to each output attribute, one attribute at a time. At the top of the Attribute Profile viewer, you select which output to look at, and the rest of the view shows how all of the input attributes correlate to the states of the output attribute.
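Under the hood, the profile for an output state is essentially a set of counts: for each state of the output, how often each input value occurs. A minimal sketch of that tally, using an invented four-row voting data set:

```python
from collections import Counter, defaultdict

# Hypothetical voting rows -- invented for illustration only.
rows = [
    {"Party": "Republican", "Child Abduction Prevention Act": "Yea"},
    {"Party": "Republican", "Child Abduction Prevention Act": "Yea"},
    {"Party": "Democrat",   "Child Abduction Prevention Act": "Yea"},
    {"Party": "Democrat",   "Child Abduction Prevention Act": "Nay"},
]

def attribute_profile(rows, output="Party"):
    """For each output state, tally how each input attribute's values are
    distributed -- the same counts an attribute-profile chart displays."""
    profile = defaultdict(Counter)
    for row in rows:
        state = row[output]
        for attr, value in row.items():
            if attr != output:
                profile[state][(attr, value)] += 1
    return profile

profile = attribute_profile(rows)
print(profile["Republican"][("Child Abduction Prevention Act", "Yea")])
```

Each cell in the profile grid is one such count (or the proportion it implies) for one input value under one output state.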

Figure 4.4 shows the attribute profiles for the party attribute. You can see that the Abortion Non-Discrimination Act vote was approximately even, with Republicans voting Yea and Democrats Nay. At the same time, you can see the almost unanimous support for the Child Abduction Prevention Act.

Figure 4.3 Naïve Bayes Dependency Net viewer with the Party node selected

Figure 4.4 Attribute profiles for the party attribute

You can also use this view to organize your data to be presented the way you see fit. You can rearrange columns by clicking and dragging on their headers, or you can even remove a column altogether by right-clicking the column header and selecting Hide Column. Additionally, if the alphabetical order doesn't suit you, simply click the header for the attribute state you are interested in, and the row ordering changes based on how important that attribute is in predicting that state.

Attribute Characteristics

The third tab allows you to select an output attribute and value and shows you a description of the cases where that attribute and value occur. Essentially, this provides answers to the question “what are people who _____ like?” For example, Figure 4.5 shows the characteristics of Democrats. You can see that these representatives in general voted No on the health care, class action, and rental purchase acts, but voted Yes on the Child Abduction Act.
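Conceptually, the characteristics list ranks each (issue, vote) pair by its probability among the cases in the selected group. A small sketch with invented vote tallies for the chosen group:

```python
from collections import Counter

# Hypothetical votes cast by one group on each issue -- invented data.
votes = {
    "Health Care Act": ["Nay", "Nay", "Nay", "Yea"],
    "Child Abduction Prevention Act": ["Yea", "Yea", "Yea", "Yea"],
}

def characteristics(votes_by_issue):
    """Probability of each (issue, vote) pair within the selected group,
    sorted descending -- the ordering an attribute-characteristics list uses."""
    probs = []
    for issue, cast in votes_by_issue.items():
        n = len(cast)
        for vote, count in Counter(cast).items():
            probs.append((issue, vote, count / n))
    return sorted(probs, key=lambda t: t[2], reverse=True)

print(characteristics(votes)[0])
```

The top entries answer "what are people in this group like?" -- here, near-unanimous support for one act outranks a 75 percent Nay on another.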

When viewing the attribute characteristics, there are two issues you should keep in mind. First, an attribute characteristic does not imply predictive power. For instance, if most representatives voted for the Child Abduction Prevention Act, then it is likely to characterize Republicans as well as Democrats.

Second, inputs that fall below the minimum node score set in the algorithm parameters are not displayed.

Figure 4.5 Characteristics of attributes, values, and probability

Attribute Discrimination

The last tab, Attribute Discrimination, provides answers to the most interesting question: what is the difference between A and B? With this viewer, you choose the attribute you are interested in and select the states you want to compare, and the viewer displays a modified tornado chart indicating which factors favor each state.
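One common way to order such a chart is a signed log-ratio of the conditional probabilities under the two states: positive scores favor one state, negative scores the other, and the largest magnitudes sit at the top of the tornado. The probabilities below are invented, and this score is an illustration rather than the viewer's exact calculation:

```python
from math import log

# Hypothetical P(vote = Yea | party) per issue -- invented for illustration.
p_yea = {
    "Death Tax Repeal Act":    {"Republican": 0.95, "Democrat": 0.15},
    "Low Cost Healthcare Act": {"Republican": 0.40, "Democrat": 0.60},
}

def discrimination_scores(cond, a, b):
    """Signed log-ratio per issue: positive favors state `a`, negative favors
    state `b`; sorted by magnitude, like the bars of a tornado chart."""
    return sorted(
        ((issue, log(p[a] / p[b])) for issue, p in cond.items()),
        key=lambda t: abs(t[1]),
        reverse=True,
    )

for issue, score in discrimination_scores(p_yea, "Republican", "Democrat"):
    side = "Republican" if score > 0 else "Democrat"
    print(f"{issue}: favors {side}")
```

Note that a large ratio says only which group a factor favors, not that the other group never votes that way.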

Figure 4.6 shows the results distinguishing Republicans and Democrats.

Republicans tended to vote for most issues, while Democrats voted against them. When reading this view, you also need to take care in your interpretation. The chart does not imply that no Democrats voted for the Death Tax Repeal Act; rather, these factors favor one group over the other.

TIP You can determine the unique characteristics of a group by comparing a state to "all other states." This will give you a view of what separates that particular group from the rest of the crowd.

When interpreting this view, you have to be careful to consider the support level of the attribute before making judgments. Figure 4.7 shows the discrimination between Independents and all other congresspersons. Looking at this figure, you could say that a strong differentiator between Independents and Democrats is support for the Low Cost Healthcare Act. Unfortunately, you would be wrong. When examining the Mining Legend for that issue, you see that there are actually only two Independents in your data set. Obviously, it is not prudent to draw conclusions based on such limited support.

NOTE If the Mining Legend is not visible, you can display it by right-clicking on the view and selecting "Show Legend."

Figure 4.6 Distinguishing between Republicans and Democrats

Figure 4.7 Discrimination between Independents and all other congresspersons

Summary

Naïve Bayes is a machine implementation of Bayes' Rule, formulated by the Reverend Thomas Bayes in the eighteenth century, which has become the foundation for many machine-learning and data mining methods. It is a quick, approachable data mining algorithm that you can use to perform predictions and do advanced exploration of your data. The visualizations provided for Naïve Bayes are easy for a wide audience to understand and are particularly suitable for inclusion in reports.
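The prediction step that makes the algorithm "naïve" can be sketched in a few lines: score each class by its prior times the product of per-attribute likelihoods, as if the attributes were independent. The training rows below are invented, and the sketch omits the smoothing a real implementation would need for unseen values:

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Tally class priors and per-attribute conditional counts."""
    priors = Counter()
    cond = defaultdict(Counter)
    for row in rows:
        label = row[target]
        priors[label] += 1
        for attr, value in row.items():
            if attr != target:
                cond[label][(attr, value)] += 1
    return priors, cond

def predict(priors, cond, case):
    """Score each class by P(class) * product of P(attr=value | class);
    multiplying the likelihoods is the 'naive' independence assumption."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for label, prior in priors.items():
        score = prior / total
        for attr_value in case.items():
            score *= cond[label][attr_value] / prior  # no smoothing, for brevity
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical training data -- invented for illustration only.
rows = [
    {"Vote A": "Yea", "Party": "Republican"},
    {"Vote A": "Yea", "Party": "Republican"},
    {"Vote A": "Nay", "Party": "Democrat"},
    {"Vote A": "Nay", "Party": "Democrat"},
]
priors, cond = train(rows, "Party")
print(predict(priors, cond, {"Vote A": "Yea"}))
```

The same counts that drive this prediction are what the four viewer tabs present graphically.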

Put yourself in the place of a loan officer at a bank. A young couple walks in to request a loan. Young, you think: not a good sign. You talk to them. They're married, and that's a plus. He's worked the same job for three years; job stability is another good sign. A look at their credit reports shows they've missed three payments in the last 12 months, a big negative. From your experience, you've created a tree in your mind that allows you to determine how you rank each loan application. The question remains: does this couple get the loan? Let's see how decision trees can help to solve this puzzle.
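The officer's mental checklist can be written out as the nested tests a decision tree encodes. Every threshold below is invented for illustration, not taken from any trained model:

```python
def approve_loan(age, married, years_on_job, missed_payments):
    """Hand-built decision tree mirroring the loan officer's reasoning;
    the split points are hypothetical, not learned from data."""
    if missed_payments >= 3:      # the big negative outweighs everything else
        return False
    if years_on_job >= 3:         # job stability is a strong positive
        return True
    if married and age >= 25:     # marriage plus some maturity
        return True
    return False

# The couple from the story: young, married, stable job, but three
# missed payments -- the first split denies them.
print(approve_loan(age=24, married=True, years_on_job=3, missed_payments=3))
```

A decision tree algorithm learns splits like these from data instead of intuition, which is exactly what the rest of this chapter explores.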

In this chapter you will learn about:

■■ The principles of the Microsoft Decision Trees Algorithm

■■ Using the Microsoft Decision Trees algorithm

■■ Interpreting the tree model content
