By: Eugen Stripling, Seppe vanden Broucke, Bart Baesens
This article first appeared in Data Science Briefings, the DataMiningApps newsletter.
In saturated markets, a company's success depends on continuously improving its customer service, with the ultimate goal of preventing customers from switching to a competitor. Customer retention management has therefore become increasingly important in industries such as telecommunications and finance [1]. Research suggests that attracting a new customer costs five to six times more than preventing an existing one from leaving [2], which makes retention vital. Yet identifying the individuals who are likely to churn in a customer base that may comprise several million customers is a challenging task. Companies therefore need sophisticated analytical models that single out customers with a high churn risk while still allowing retention campaigns to be run profitably. The profit of such a campaign can be measured by considering, for example, the customer lifetime value (CLV) of the retained customers and the costs of contacting them and offering an incentive. The business requirements are thus twofold: obtain reliable churn predictions and maximize the profit of the retention campaign. This creates the need for profit-sensitive customer churn prediction models.
A churn model has to meet these requirements within the resources available to the retention campaign, i.e., it should select only the customers who are most valuable to the company. Next, the model should be highly interpretable: the analyst should be able to gain insight into the complex variable interactions and communicate them to stakeholders and other parties involved. High interpretability is achieved with so-called white-box models, which disclose the variable interaction structure and help explain which variables correlate with the target. Typically, only a handful of features are required to predict churn [1]. In line with the principle of Occam’s razor, the model should therefore employ a feature selection mechanism so that it stays simple and uses only the most important features. As a side effect, this often enhances interpretability, since only a small number of features are used to predict churn. Finally, the model should make it easy to incorporate business knowledge, which in turn strengthens the justification of the model and increases the likelihood of an actionable outcome.
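To make the notion of campaign profit more concrete, the following minimal Python sketch computes the profit of contacting the highest-scoring fraction of a customer base. It is a simplified illustration, not the EMP metric of [3]. The function name, the parameter names, and all default values (CLV, contact cost, incentive cost, acceptance rate) are assumptions chosen for illustration only, as is the accounting rule that a non-churner who is contacted always takes the incentive.

```python
def campaign_profit(y_true, scores, fraction, clv=200.0, contact_cost=1.0,
                    incentive_cost=10.0, accept_rate=0.3):
    """Profit of targeting the top `fraction` of customers ranked by score.

    Assumed, simplified accounting (illustrative only):
      - a targeted churner accepts the offer with probability `accept_rate`
        and then stays, yielding clv - incentive_cost - contact_cost;
      - a targeted churner who declines only costs contact_cost;
      - a targeted non-churner takes the incentive, costing
        incentive_cost + contact_cost.
    """
    n_target = int(round(fraction * len(scores)))
    # Rank customers from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    targeted = [label for _, label in ranked[:n_target]]
    churners = sum(targeted)
    non_churners = n_target - churners
    gain = churners * accept_rate * (clv - incentive_cost - contact_cost)
    loss = (churners * (1 - accept_rate) * contact_cost
            + non_churners * (incentive_cost + contact_cost))
    return gain - loss


# Toy data: 1 = churner, 0 = non-churner (hypothetical values).
labels = [1, 0, 1, 0, 0]
churn_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
profit_top_40 = campaign_profit(labels, churn_scores, fraction=0.4)
print(profit_top_40)  # roughly 45.0 under the assumptions above
```

Under these assumptions, a model that ranks true churners higher lets the campaign reach the same number of churners while contacting fewer non-churners, which is exactly what a profit-sensitive model is meant to reward.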
All of these requirements can be met by, for example, combining a white-box model such as logistic regression with a profit-based metric and maximizing that metric with a genetic algorithm. The expected maximum profit (EMP) is such a profit-based metric. The EMP for customer churn explicitly takes the costs of contact and offer as well as the CLV into account. It permits a profit-based evaluation of classification performance and unambiguously identifies the most profitable churn model [3]. Constructing a logistic model whose regression coefficients are optimized with respect to the EMP by means of a genetic algorithm yields a profit-sensitive model. Genetic algorithms are highly flexible optimization techniques: the search space in which they operate can easily be extended and constrained, so that feature selection and business knowledge can be integrated with little effort. Moreover, the logistic regression model structure is highly interpretable, so valuable insights can be derived from it. A churn model that maximizes profit during model construction is likely to be more profitable than one that is merely selected on the EMP criterion in the evaluation step; it is thus better aligned with business requirements.
In this article, we focused only on customer churn prediction, but profit-maximizing modeling can be applied similarly in other domains such as risk analytics, using for instance the EMP for credit scoring [4]. In principle, other base models such as decision trees can be used instead of logistic regression. They are also highly interpretable but can cope more easily with nonlinearities.
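The idea of tuning logistic regression coefficients with a genetic algorithm against a profit measure can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The fitness function `max_profit` is a simplified stand-in for the EMP of [3] with fixed, illustrative cost parameters. The genetic operators (keep the best half, uniform crossover, Gaussian mutation), the function names, and the synthetic data are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, X):
    # weights[0] is the intercept; the rest are feature coefficients.
    return [sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))
            for row in X]


def max_profit(y, scores, clv=200.0, contact=1.0, incentive=10.0, accept=0.3):
    """Best average profit per customer over all targeting cutoffs.

    Simplified stand-in for a profit metric; parameter values are assumed.
    """
    ranked = [label for _, label in sorted(zip(scores, y), key=lambda p: -p[0])]
    best = profit = 0.0
    for label in ranked:
        if label == 1:  # churner: retained with probability `accept`
            profit += accept * (clv - incentive - contact) - (1 - accept) * contact
        else:           # non-churner: incentive and contact are wasted
            profit -= incentive + contact
        best = max(best, profit)
    return best / len(y)


def evolve(X, y, pop_size=30, generations=40, sigma=0.5, seed=0):
    """Search for coefficients that maximize max_profit with a simple GA."""
    rng = random.Random(seed)
    n_coef = len(X[0]) + 1

    def fitness(weights):
        return max_profit(y, predict(weights, X))

    pop = [[rng.gauss(0, 1) for _ in range(n_coef)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [w + rng.gauss(0, sigma) if rng.random() < 0.2 else w
                     for w in child]                          # Gaussian mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Synthetic data (hypothetical): churn is more likely when feature 0 is large.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [1 if data_rng.random() < sigmoid(2 * row[0] - 1.5) else 0 for row in X]

best_weights = evolve(X, y)
print("coefficients:", best_weights)
print("profit per customer:", max_profit(y, predict(best_weights, X)))
```

Because the fitness is just a function of the coefficient vector, constraints are easy to add in such a setup. For example, one could force a coefficient to zero to drop a feature, or restrict its sign to encode business knowledge, by adjusting how candidate vectors are generated and mutated.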
To conclude, the main takeaways of this article are, in a nutshell:
- Corporations are interested in executing profit-maximized customer retention campaigns by identifying and retaining customers that are the most valuable to them rather than in models with just high prediction accuracy;
- Profit-sensitive churn models better align with business requirements than conventional models;
- The model should provide a high degree of interpretability, which can be further improved by a feature selection mechanism; and, finally,
- It should be possible for the analyst to incorporate valuable business knowledge into the model not only for justification purposes but also to increase the likelihood of an actionable outcome.
Endnotes
[1] B. Baesens, Analytics in a Big Data World: The Essential Guide to Data Science and its Applications. Wiley, 2014.
[2] A. D. Athanassopoulos, “Customer satisfaction cues to support market segmentation and explain switching behavior,” Journal of Business Research, vol. 47, no. 3, pp. 191–207, 2000.
[3] T. Verbraken, W. Verbeke, and B. Baesens, “A novel profit maximizing metric for measuring classification performance of customer churn prediction models,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, pp. 961–973, 2013.
[4] T. Verbraken, C. Bravo, R. Weber, and B. Baesens, “Development and application of consumer credit scoring models using profit-based classification measures,” European Journal of Operational Research, vol. 238, no. 2, pp. 505–513, 2014.