• Aucun résultat trouvé

Understanding Telecom Customer Churn with Machine Learning: From Prediction to Causal Inference

N/A
N/A
Protected

Academic year: 2022

Partager "Understanding Telecom Customer Churn with Machine Learning: From Prediction to Causal Inference"

Copied!
2
0
0

Texte intégral

(1)

Understanding telecom customer churn with machine learning: from prediction to causal

inference

?

Th´eo Verhelst1, Olivier Caelen2, Jean-Christophe Dewitte2, Bertrand Lebichot1, and Gianluca Bontempi1

1 Machine Learning Group, Computer Science Department, Universit´e Libre de Bruxelles, Brussels, Belgium

{tverhels,gbonte}@ulb.ac.be

2 Data science team, Orange Belgium

{olivier.caelen,jean-christophe.dewitte}@orange.be

Telecommunication companies are evolving in a highly competitive market where attracting new customers is much more expensive than retaining existing ones [3]. Retention campaigns can be used to prevent customer churn, but their effectiveness depends on the availability of accurate prediction models. Churn prediction is notoriously a difficult problem because of the large amount of data, non-linearity, imbalance and low separability between the classes of churners and non-churners. In this paper, we discuss a real case of churn prediction based on Orange Belgium customer data.

In the first part of the paper we focus on the design of an accurate predic- tion model. The large class imbalance between the two classes is handled with the EasyEnsemble algorithm [4] using a random forest classifier. The dataset contains 73 variables and about 7.6 million entries, covering a 5 months time window in 2018. The classification model is trained on the first 4 months of data and evaluated on the last month. We also assess the impact of different data preprocessing techniques including feature selection and engineering. Re- sults show that feature selection can be used to reduce computation time and memory requirements, though engineering variables does not necessarily improve performance.

In the second part of the paper we explore the application of data-driven causal inference, which allows to infer causal relationships between variables purely from observational data. More specifically, we applied 5 different causal inference methods, namely PC [6], Grow-shrink (GS) [5], Incremental Associ- ation Markov Blanket (IAMB) [7], Minimum interaction maximum relevance (mIMR) [2] and D2C [1]. PC infers the set of causal graphs faithful to the dataset, GS and IAMB infer the Markov blanket of the churn variable, and mIMR and D2C return the direct causes of churn. Two implementations of the mIMR al- gorithm are used: one based on histograms to estimate mutual information, and another assuming Gaussian variables, thus allowing a closed-form formula for the computation of the mutual information. The results of these algorithms (summa-

?Copyright c2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0).

(2)

2 T. Verhelst et al.

rized in figure 1) are varied and are consistent with prior knowledge of the causes of churn. We conclude that the bill shock and the wrong tariff plan positioning are putative causes of churn.

Churn

Tenure Tariff plan

Out of bundle

Data usage Number contracts

Age

Messages, voice calls Province

GS, mIMR 1 & 2 IAMB, mIMR 1, D2C

GS mIMR 2 mIMR 1, D2C

GS, mIMR 1, D2C GS, mIMR 2

GS, mIMR 2

Fig. 1.Summary of results of causal infer- ence. mIMR 1 stands for the histogram- based estimator, and mIMR 2 for the es- timator with Gaussian assumption.

Finally, we present a novel method to evaluate, in terms of the direction and magnitude, the impact of causally relevant variables on churn. We eval- uate the average probability of churn predicted by the learning algorithm on the dataset, before and after a shift of the values of the variable of interest.

The difference between these two av- erage probabilities is a measure of the effect of a manipulation of the variable on churn. This method is based on the assumption that no latent variables are confounding factors of churn and the variable under inspection. Results show that, on the one hand, some vari- ables such as the tenure and the num- ber of contracts are observed to be

monotonically associated with the churn probability. On the other hand, some variables have a non-monotonic causal influence on churn. For example, variables related to the amount paid by the customer and the data usage cause more churn when they are increased, but the opposite is not true.

References

1. Bontempi, G., Flauder, M.: From dependency to causality: a machine learning ap- proach. The Journal of Machine Learning Research16(1), 2437–2457 (2015) 2. Bontempi, G., Meyer, P.E.: Causal filter selection in microarray data. In: Proceed-

ings of the 27th international conference on machine learning (icml-10). pp. 95–102 (2010)

3. Hadden, J., Tiwari, A., Roy, R., Ruta, D.: Computer assisted customer churn man- agement: State-of-the-art and future trends. Computers & Operations Research 34(10), 2902–2917 (2007)

4. Liu, X.Y., Wu, J., Zhou, Z.H.: Exploratory undersampling for class-imbalance learn- ing. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39(2), 539–550 (2009). https://doi.org/10.1109/tsmcb.2008.2007853

5. Margaritis, D., Thrun, S.: Bayesian network induction via local neighborhoods. In:

Advances in neural information processing systems. pp. 505–511 (2000)

6. Spirtes, P., Glymour, C.: An algorithm for fast recovery of sparse causal graphs.

Social science computer review9(1), 62–72 (1991)

7. Tsamardinos, I., Aliferis, C.F., Statnikov, A.R., Statnikov, E.: Algorithms for large scale markov blanket discovery. In: FLAIRS conference. vol. 2, pp. 376–380 (2003)

Références

Documents relatifs

Ensuite, baissez vos yeux en direction de votre jambe, qui tremble encore, et là faites comme si vous veniez de comprendre, ayez l’air gênée puis refermez les yeux.. Ce jeu de

In practice, comparisons between regression and propensity- score methods suggest they usually yield similar results.[15, 19, 20] Like standard regression methods, a

In the case of churn, evaluating a model based on a traditional measure such as accuracy or predictive power, does not yield to the best results when measured by the actual

The final part of this model includes optimization of the original meta-metric, which is designed to reflect the real-life experience in usage of prediction models that rely on a

The structure of a relational model can be depicted by super- imposing the dependencies on the ER diagram of the re- lational schema, as shown in Figure 1, and labeling each arrow

ﺔﺟﻣرﺑﻣ رﯾﻏ وأ ﺔﺟﻣرﺑﻣ (. - تارارﻘﻟا ذﺎﺧﺗا بﯾﻟﺎﺳأ ثﯾﺣ نﻣ ) ﺔﯾﻣﻛ وأ ﺔﯾﻔﯾﻛ. ﺔﻌﺑرأ ﻰﻠﻋ مﯾﻘﻟا ﻰﻠﻋ ﺔﻣﺋﺎﻘﻟا ةدﺎﯾﻘﻟا زﻛﺗرﺗ ﺔﯾﺳﺎﺳأ ئدﺎﺑﻣ. ﺎﻬﯾﻟإ نوﻌﺳﯾو

The GPScan.VI program is a general purpose LabVIEW program for the control of National Instruments high-speed data acquisition boards, allowing to design scanning imaging

Cela trouve encore plus son sens dans le cadre de l’apprentissage fédéré, puisque les clients ne disposant chacun que d’un nombre limité d’exemples des données, ceux-ci