• Aucun résultat trouvé

Applying HMM to the Classification of Customers

4.5 Applying HMM to the Classification of Customers

In this section, the HMM discussed in Sect.4.4is applied to the classification of the customers of a computer service company. We remark that there are a number of classification methods such as machine learning [16, 143, 179] and Bayesian learning, interested readers can consult the book by Young and Calvert [212]. In this problem, HMM is an efficient and effective classification method but we make no claim that HMM is the best one.

A computer service company offers four types of long distance call services I, II, III and IV (four different periods of a day). From the customer database, the information of the expenditure distribution of 71 randomly chosen customers is obtained. A longitudinal study was then carried out for half a year to investigate the customers. Their behavior and responses were captured and monitored during the period of investigation. For simplicity of discussion, the customers are classified into two groups. Among them22customers are known to be loyal customers (GroupA) and the other49customers are not loyal customers (GroupB). This classification is useful to marketing managers when they plan any promotions. The customers in Group A will be given promotions on new services and products. While the customers in GroupBwill be offered discounts on the current services to prevent them from switching to the competitor companies.

Two-thirds of the data are used to build the HMM and the remaining data are used to validate the model. Therefore,16candidates are randomly taken (these customers are labeled as the first16customers in Table4.2) from GroupAand37candidates from groupB. The remaining6candidates (the first6customers in Table4.4) from GroupAand12candidates from GroupBare used for validating the constructed HMM. A HMM having four observable states (I, II, III and IV) and two hidden states (Group A and Group B) is then built.

From the information of the customers in GroupAand GroupBin Table4.2, the average expenditure distributions for both groups are computed in Table4.3. This means that a customer in Group A (Group B) is characterized by the expenditure distribution in the first (second) row of Table4.3.

An interesting problem is the following: Given the expenditure distribution of a customer, how does one classify the customer correctly (Group A or Group B) based on the information in Table4.4? To tackle this problem, one can apply the method discussed in the previous section to compute the transition probability˛in the hidden states. This value of˛can be used to classify a customer. If˛is close to1 then the customer is likely to be a loyal customer. If˛is close to0then the customer is likely to be an unloyal customer.

The values of˛for all the53customers are listed in Table4.2. It is interesting to note that the values of˛of all the first16customers (Group A) lie in the interval Œ0:83; 1:00. While the values of˛of all the other customers (Group B) lie in the intervalŒ0:00; 0:69. Based on the values of˛obtained, the two groups of customers can be clearly separated by setting the cutoff valueˇto be0:75. A possible decision

Table 4.2 Two-third of the data are used to build the HMM

Customer I II III IV ˛ Customer I II III IV ˛

1 1.00 0.00 0.00 0.00 1.00 2 1.00 0.00 0.00 0.00 1.00

3 0.99 0.01 0.00 0.00 1.00 4 0.97 0.03 0.00 0.00 1.00

5 0.87 0.06 0.04 0.03 0.98 6 0.85 0.15 0.00 0.00 0.92

7 0.79 0.18 0.02 0.01 0.86 8 0.77 0.00 0.23 0.00 0.91

9 0.96 0.01 0.00 0.03 1.00 10 0.95 0.00 0.02 0.03 1.00

11 0.92 0.08 0.00 0.00 1.00 12 0.91 0.09 0.00 0.00 1.00

13 0.83 0.00 0.17 0.00 0.97 14 0.82 0.18 0.00 0.00 0.88

15 0.76 0.04 0.00 0.20 0.87 16 0.70 0.00 0.00 0.30 0.83

17 0.62 0.15 0.15 0.08 0.69 18 0.57 0.14 0.00 0.29 0.62

19 0.56 0.00 0.39 0.05 0.68 20 0.55 0.36 0.01 0.08 0.52

21 0.47 0.52 0.00 0.01 0.63 22 0.46 0.54 0.00 0.00 0.36

23 0.25 0.75 0.00 0.00 0.04 24 0.22 0.78 0.00 0.00 0.00

25 0.21 0.01 0.78 0.00 0.32 26 0.21 0.63 0.00 0.16 0.03

27 0.18 0.11 0.11 0.60 0.22 28 0.18 0.72 0.00 0.10 0.00

29 0.15 0.15 0.44 0.26 0.18 30 0.07 0.93 0.00 0.00 0.00

31 0.04 0.55 0.20 0.21 0.00 32 0.03 0.97 0.00 0.00 0.00

33 0.00 0.00 1.00 0.00 0.10 34 0.00 1.00 0.00 0.00 0.00

35 0.00 0.00 0.92 0.08 0.10 36 0.00 0.94 0.00 0.06 0.00

37 0.03 0.01 0.96 0.00 0.13 38 0.02 0.29 0.00 0.69 0.00

39 0.01 0.97 0.00 0.02 0.00 40 0.01 0.29 0.02 0.68 0.00

41 0.00 0.24 0.00 0.76 0.00 42 0.00 0.93 0.00 0.07 0.00

43 0.00 1.00 0.00 0.00 0.00 44 0.00 1.00 0.00 0.00 0.00

45 0.00 0.98 0.02 0.00 0.00 46 0.00 0.00 0.00 1.00 0.06

47 0.00 1.00 0.00 0.00 0.00 48 0.00 0.96 0.00 0.04 0.00

49 0.00 0.91 0.00 0.09 0.00 50 0.00 0.76 0.03 0.21 0.00

51 0.00 0.00 0.32 0.68 0.07 52 0.00 0.13 0.02 0.85 0.01

53 0.00 0.82 0.15 0.03 0.00

Table 4.3 The average expenditures of Group A and Group B

Group I II III IV

A 0.8806 0.0514 0.0303 0.0377 B 0.1311 0.5277 0.1497 0.1915

rule can therefore be defined as follows: Classify a customer to GroupAif˛ˇ, otherwise classify the customer to GroupB. Referring to Fig.4.1, it is clear that the customers are separated by the hyperplaneHˇ. The hyperplaneHˇis parallel to the two hyperplanesH1andH2such that it has a perpendicular distance ofˇfromH2. The decision rule is applied to the remaining22 captured customers. Among them,6customers (the first six customers in Table4.4) belong to GroupAand12 customers belong to GroupB. Their˛values are computed and listed in Table4.4.

It is clear that if the value ofˇis set to be0:75, all the customers will be classified correctly.

4.7 Exercises 105

Table 4.4 The remaining one-third of the data for validation of the HMM

Customer I II III IV ˛ Customer I II III IV ˛

1’ 0.98 0.00 0.02 0.00 1.00 2’ 0.88 0.01 0.01 0.10 1.00

3’ 0.74 0.26 0.00 0.00 0.76 4’ 0.99 0.01 0.00 0.00 1.00

5’ 0.99 0.01 0.00 0.00 1.00 6’ 0.89 0.10 0.01 0.00 1.00

7’ 0.00 0.00 1.00 0.00 0.10 8’ 0.04 0.11 0.68 0.17 0.08

9’ 0.00 0.02 0.98 0.00 0.09 10’ 0.18 0.01 0.81 0.00 0.28 11’ 0.32 0.05 0.61 0.02 0.41 12’ 0.00 0.00 0.97 0.03 0.10 13’ 0.12 0.14 0.72 0.02 0.16 14’ 0.00 0.13 0.66 0.21 0.03 15’ 0.00 0.00 0.98 0.02 0.10 16’ 0.39 0.00 0.58 0.03 0.50 17’ 0.27 0.00 0.73 0.00 0.38 18’ 0.00 0.80 0.07 0.13 0.00 Table 4.5 Probability

distributions of dice A and dice B

Dice 1 2 3 4 5 6

A 1/6 1/6 1/6 1/6 1/6 1/6

B 1/12 1/12 1/4 1/4 1/4 1/12

Table 4.6 Observed

distributions of dots Dice 1 2 3 4 5 6

A 1/8 1/8 1/4 1/4 1/8 1/8

4.6 Summary

In this chapter, we propose a simple HMM with estimation methods. The framework of the HMM is simple and the model parameters can be estimated efficiently.

Application to customer classification with practical data taken from a computer service company is presented and analyzed. Further discussions on new HMMs and applications will be given in Chap. 8.

4.7 Exercises

1. Solve the minimization problem (4.1) to get the optimal˛.

2. Consider the process of choosing a die again. Suppose that this time we have two dice A and B, and that each of them has six faces (1; 2; 3; 4; 5 and 6).

The probability distribution of dots obtained by throwing dice A and B are given in Table4.5. Each time a die is to be chosen, we assume that with probability˛, Die A is chosen, and with probability.1˛/, Die B is chosen. This process is hidden as we don’t know which die is chosen. After a long-run observation, the observed distribution of dots is given in Table4.6. Use the method in Sect.4.2to estimate the value of˛.

3. Prove Proposition4.1.

4. Prove Proposition4.2.

Table 4.7 The new average expenditures of Group A and Group B

Group I II III IV

A 0.6000 0.1000 0.0500 0.2500 B 0.1500 0.5000 0.1500 0.2000

5. Consider the classification problem in Sect. 4.5. Suppose the average expenditures of Groups A and Group B have been changed and are recorded in Table4.7, then find the new˛for the classification rule.

Chapter 5

Markov Decision Processes for Customer Lifetime Value

5.1 Introduction

In this chapter a stochastic dynamic programming model with a Markov chain is proposed to capture customer behavior. The advantage of using Markov chains is that the model can take into account the customers switching between the company and its competitors. Therefore customer relationships can be described in a probabilistic way, see for instance Pfeifer and Carraway [170]. Stochastic dynamic programming is then applied to solve the optimal allocation of the promotion budget for maximizing the Customer Lifetime Value (CLV). The proposed model is then applied to practical data in a computer services company.

The customer equity should be measured in making the promotion plan so as to achieve an acceptable and reasonable budget. A popular approach is the CLV.

Kotler and Armstrong [134] define a profitable customer as “a person, household, or company whose revenues over time exceeds, by an acceptable amount, the company costs of attracting, selling, and servicing that customer.” This excess is called the CLV. In some literature, CLV is also referred to as “customer equity” [14]. In fact, some researchers define CLV as the customer equity less the acquisition cost.

Nevertheless, in this chapter CLV is defined as the present value of the projected net cash flow that a firm expects to receive from the customer over time [35].

Recognizing its importance in decision making, CLV has been successfully applied to the problems of pricing strategy [13], media selection [121] and setting optimal promotion budget [20, 61].

To calculate the CLV, a company should estimate the expected net cash flow that they expect to receive from the customer over time. The CLV is the present value of that stream of cash flow. However, it is a difficult task to estimate the net cash flow to be received from the customer. In fact, one needs to answer, the following questions:

1. How many customers one can attract given a specific advertising budget?

2. What is the probability that the customer will stay with the company?

3. How does this probability change with respect to the promotion budget?

W.-K. Ching et al., Markov Chains, International Series in Operations Research

& Management Science 189, DOI 10.1007/978-1-4614-6312-2 5,

© Springer Science+Business Media New York 2013

107

To answer the first question, there are a number of advertising models found in the book by Lilien, Kotler and Moorthy [148]. The second and the third questions give rise to an important concept, the retention rate. The retention rate [123] is defined as “the chance that the account will remain with the vendor for the next purchase, provided that the customer has bought from the vendor on each previous purchase”. Jackson [123] proposed an estimation method for the retention rate based on historical data. Other models for the retention rate can also be found in [82, 148].

Blattberg and Deighton [20] proposed a formula for the calculation of CLV (customer equity). The model is simple and deterministic. Using their notations (see also [13, 14]), the CLV is the sum of two net present values: the return from acquisition spending and the return from retention spending. In their model, CLV is defined as

CLVD am„ ƒ‚ …A

acquisition

C X1 kD1

a.mR

r/Œr.1Cd /1k

„ ƒ‚ …

retention

DamACa.mR

r/ r

.1Cd r/

(5.1)

wherea is the acquisition rate,A is the level of acquisition spending,m is the margin on a transaction,Ris the retention spending per customer per year,ris the yearly retention rate (a proportion), anddis the yearly discount rate appropriate for marketing investment. Moreover, they also assume that the acquisition ratea and retention raterare functions ofAandRrespectively, and are given by

a.A/Da0.1eK1A/ and

r.R/Dr0.1eK2R/

wherea0 andr0 are the estimated ceiling rates andK1 andK2 are two positive constants. In this chapter, a stochastic model (Markov decision process) is proposed for the calculation of CLV and promotion planning.

The rest of the chapter is organized as follows. In Sect.5.2, the Markov chain model for modelling the behavior of customers is presented. Section5.4extends the model to consider multi-period promotions. In Sect. 5.3, stochastic dynamic programming is then used to calculate the CLV of the customers for three different scenarios:

1. infinite horizon without constraint (without limit in the number of promotions), 2. finite horizon (with a limited number of promotions), and

3. infinite horizon with constraints (with a limited number of promotions).

In Sect.5.5, we consider higher-order Markov decision processes with applications to the CLV problem. Finally a summary is given to conclude the chapter in Sect.5.5.