
is more sensitive to variations in dividend yield and that therefore dividend yield is a more important factor for predicting stock prices than is price–earnings ratio.

APPLICATION OF NEURAL NETWORK MODELING

Next, we apply a neural network model using Insightful Miner on the same adult data set [3] from the UCI Machine Learning Repository that we analyzed in Chapter 6. The Insightful Miner neural network software was applied to a training set of 24,986 cases, using a single hidden layer with eight hidden nodes. The algorithm iterated 47 epochs (runs through the data set) before termination. The resulting neural network is shown in Figure 7.8. The squares on the left represent the input nodes.

For the categorical variables, there is one input node per class. The eight dark circles represent the hidden layer. The light gray circles represent the constant inputs. There is only a single output node, indicating whether or not the record is classified as having income less than $50,000.
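Although the analysis in the text was carried out with Insightful Miner, a reader working in Python can sketch a comparable topology with scikit-learn. This is only an illustrative sketch, not the software used here: it assumes the training cases have already been loaded into a pandas DataFrame named adult, with the class label in a column named income, and that the column names follow the UCI documentation.

from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Assumption: the training cases are in a pandas DataFrame named adult,
# with the class label in a column named "income". Column names follow
# the UCI documentation; adjust them to your local copy of the file.
numeric_cols = ["age", "education-num", "capital-gain",
                "capital-loss", "hours-per-week"]
categorical_cols = ["workclass", "marital-status", "occupation",
                    "relationship", "race", "sex"]

# One-hot encoding mirrors "one input node per class" for the categorical
# variables; min-max scaling keeps the numeric inputs in [0, 1].
preprocess = ColumnTransformer([
    ("num", MinMaxScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("prep", preprocess),
    ("net", MLPClassifier(hidden_layer_sizes=(8,),  # one hidden layer, eight nodes
                          max_iter=200,
                          random_state=0)),
])

X = adult[numeric_cols + categorical_cols]
y = adult["income"]
model.fit(X, y)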

Figure 7.8 Neural network for the adult data set generated by Insightful Miner.

In this algorithm, the weights are centered at zero. An excerpt of the computer output showing the weight values is provided in Figure 7.9. The columns in the first table represent the input nodes: 1 = age, 2 = education-num, and so on, while the rows represent the hidden layer nodes: 22 = first (top) hidden node, 23 = second hidden node, and so on. For example, the weight on the connection from age to the topmost hidden node is −0.97, while the weight on the connection from Race: American Indian/Eskimo (the sixth input node) to the last (bottom) hidden node is −0.75. The lower section of Figure 7.9 displays the weights from the hidden nodes to the output node.

Figure 7.9 Some of the neural network weights for the income example.
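If the model was fit with the hypothetical scikit-learn sketch above rather than Insightful Miner, the analogous weight tables can be read directly from the fitted estimator; the coefs_ attribute stores one weight matrix per layer of connections. The values will of course differ from those in Figure 7.9.

# Pull the weight tables from the fitted scikit-learn model sketched earlier.
net = model.named_steps["net"]

# coefs_ holds one matrix per layer of connections:
# input-to-hidden first, then hidden-to-output.
input_to_hidden, hidden_to_output = net.coefs_
print("input-to-hidden shape:", input_to_hidden.shape)    # (number of inputs, 8)
print("hidden-to-output shape:", hidden_to_output.shape)  # (8, 1)

# For example, the weight on the connection from the first input column
# to the first (top) hidden node:
print("weight[input 0 -> hidden 0] =", input_to_hidden[0, 0])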

The estimated prediction accuracy using this very basic model is 82%, which is in the ballpark of the accuracies reported by Kohavi [4]. Since over 75% of the subjects have incomes below $50,000, simply predicting "less than $50,000" for every person would provide a baseline accuracy of about 75%.
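That baseline is simply the relative frequency of the majority class. Continuing the hypothetical scikit-learn sketch, the comparison can be made explicit as follows (training-set accuracy only; a proper estimate would use a holdout set, as noted below).

# Compare the majority-class baseline with the fitted network's accuracy,
# reusing the adult DataFrame and the model, X, y objects from the sketch above.
majority_class = adult["income"].mode()[0]
baseline = (adult["income"] == majority_class).mean()
print(f"baseline accuracy (always predict '{majority_class}'): {baseline:.1%}")
print(f"neural network accuracy on the same cases: {model.score(X, y):.1%}")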

However, we would like to know which variables are most important for predicting (classifying) income. We therefore perform a sensitivity analysis using Clementine, with results shown in Figure 7.10. Clearly, the amount of capital gains is the best predictor of whether a person has income less than $50,000, followed by the number of years of education. Other important variables include the number of hours worked per week and marital status. A person's gender does not seem to be highly predictive of income.

Figure 7.10 Most important variables: results from sensitivity analysis.
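Clementine is likewise a GUI product; readers without it can approximate a sensitivity analysis with permutation importance, which is not the same algorithm but ranks variables in a similar spirit. This sketch again assumes the model, X, and y objects from the earlier example.

# A rough stand-in for Clementine's sensitivity analysis: permutation importance.
# Shuffle one input column at a time and record how much accuracy drops.
from sklearn.inspection import permutation_importance

result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean),
                key=lambda pair: pair[1], reverse=True)
for name, mean_drop in ranked:
    print(f"{name:20s} {mean_drop:.4f}")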

Of course, there is much more involved with developing a neural network classification model. For example, further data preprocessing may be called for; the model would need to be validated using a holdout validation data set, and so on. For a start-to-finish application of neural networks to a real-world data set, from data preparation through model building and sensitivity analysis, see Reference 5.

REFERENCES

1. Tom M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.

2. Russell D. Reed and Robert J. Marks II, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, MIT Press, Cambridge, MA, 1999.

3. C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/mlearn/MLRepository.html, University of California, Department of Information and Computer Science, Irvine, CA, 1998.

4. Ronny Kohavi, Scaling up the accuracy of naïve Bayes classifiers: A decision tree hybrid, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, 1996.

5. Daniel Larose, Data Mining Methods and Models, Wiley-Interscience, Hoboken, NJ (to appear 2005).

EXERCISES

1. Suppose that you need to prepare the data in Table 6.10 for a neural network algorithm. Define the indicator variables for the occupation attribute.

2. Clearly describe each of these characteristics of a neural network:

a. Layered

b. Feedforward

c. Completely connected

3. What is the sole function of the nodes in the input layer?

4. Should we prefer a large hidden layer or a small one? Describe the benefits and drawbacks of each.

5. Describe how neural networks function nonlinearly.

6. Explain why the updating term for the current weight includes the negative of the sign of the derivative (slope).

7. Adjust the weights W0B, W1B, W2B, and W3B from the example on back-propagation in the text.

8. Refer to Exercise 7. Show that the adjusted weights result in a smaller prediction error.

9. True or false: Neural networks are valuable because of their capacity for always finding the global minimum of the SSE.

10. Describe the benefits and drawbacks of using large or small values for the learning rate.

11. Describe the benefits and drawbacks of using large or small values for the momentum term.

Hands-on Analysis

For the following exercises, use the data set churn located at the book series Web site. Normalize the numerical data, recode the categorical variables, and deal with the correlated variables.
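A hedged preparation sketch in Python is given below; the file name churn.txt and the handling of specific columns are placeholders to be adapted to the actual churn file downloaded from the book series Web site.

# Preparation sketch for the hands-on exercises. The file name is a placeholder;
# adjust it and the column handling to match the downloaded churn data set.
import pandas as pd

churn = pd.read_csv("churn.txt")

# Min-max normalize every numeric column to the interval [0, 1].
numeric_cols = churn.select_dtypes(include="number").columns
churn[numeric_cols] = (churn[numeric_cols] - churn[numeric_cols].min()) / (
    churn[numeric_cols].max() - churn[numeric_cols].min()
)

# Recode the categorical variables as 0/1 indicator (dummy) variables.
churn = pd.get_dummies(churn, columns=list(churn.select_dtypes(include="object").columns))

# Inspect pairwise correlations to decide which correlated variables to drop
# (minutes and charge fields, for example, are typically proportional).
print(churn.corr().round(2))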

12. Generate a neural network model for classifying churn based on the other variables. Describe the topology of the model.

13. Which variables, in order of importance, are identified as most important for classifying churn?

14. Compare the neural network model with the CART and C4.5 models for this task in Chapter 6. Describe the benefits and drawbacks of the neural network model compared to the others. Is there convergence or divergence of results among the models?
