By: Bart Baesens, Seppe vanden Broucke
This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter. Want to submit your own question? Just tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.
You asked: How can we establish cross-fertilization between the various business units investing in analytics?
Our answer:
Big data and analytics have matured differently across the various business units of an organization. Triggered by the introduction of regulatory guidelines (e.g. Basel II/III, Solvency II), many firms (especially financial institutions) have been investing in big data and analytics for risk management for quite some time now. Years of analytical experience and refinement have contributed to very sophisticated models for insurance risk, credit risk, operational risk, market risk and fraud risk. The most advanced analytical techniques, such as survival analysis, random forests, neural networks and (social) network learning, have been used in these applications. Furthermore, these analytical models have been complemented with powerful model monitoring frameworks and stress testing procedures to fully leverage their potential.
Marketing analytics is still somewhat less mature, with many firms starting to deploy their first models for churn prediction, response modeling or customer segmentation. These are typically based on simpler analytical techniques such as logistic regression, decision trees or k-means clustering. Other application areas, such as HR and supply chain analytics, are starting to gain traction, although not many successful case studies have been reported yet.
The disparity in maturity creates tremendous potential for cross-fertilization of model development and monitoring experiences. After all, classifying whether a customer is creditworthy or not in risk management is analytically the same as classifying a customer as a responder or not in marketing analytics, or classifying an employee as a churner or not in HR analytics. The data preprocessing issues (e.g. missing values, outliers, categorization), classification techniques (e.g. logistic regression, decision trees, random forests) and evaluation measures (e.g. AUC, lift curves) are all similar. Only the interpretation and usage of the models differ. The cross-fertilization also applies to model monitoring, since most of the challenges and approaches are essentially the same. Finally, gauging the effect of macro-economic scenarios using stress testing (a common practice in credit risk analytics) is another example of experience that can usefully be shared across applications.
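To make the point about shared evaluation measures concrete, here is a minimal sketch in plain Python. It assumes each unit already has binary labels and model scores; the function names (`auc`, `lift_at`, `evaluate`) and the toy numbers are purely illustrative and not taken from any real portfolio or campaign. The same two functions score a credit default model and a marketing response model without any change.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by comparing every
    positive/negative pair (a tie counts as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def lift_at(labels: Sequence[int], scores: Sequence[float],
            fraction: float = 0.1) -> float:
    """Positive rate among the top-scored fraction of cases,
    divided by the overall positive rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


def evaluate(labels: Sequence[int], scores: Sequence[float],
             fraction: float = 0.2) -> Dict[str, float]:
    """One evaluation routine, reusable by any business unit."""
    return {"auc": auc(labels, scores),
            "lift": lift_at(labels, scores, fraction)}


# Hypothetical toy data: 1 = defaulter (risk) / 1 = responder (marketing).
credit_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
credit_scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.3, 0.8, 0.6, 0.2, 0.5]

response_labels = [0, 1, 0, 0, 1, 0, 0, 0]
response_scores = [0.1, 0.8, 0.3, 0.5, 0.4, 0.2, 0.6, 0.1]

credit_report = evaluate(credit_labels, credit_scores)
response_report = evaluate(response_labels, response_scores)

print("credit risk:", credit_report)
print("marketing:  ", response_report)
```

Only the meaning of the positive class changes between the two calls, which is the sense in which the interpretation differs while the analytics stay the same.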
To summarize, less mature analytical applications (e.g. marketing, HR and supply chain analytics) can benefit substantially from the lessons learned by more mature applications (e.g. risk management), thereby avoiding many rookie mistakes and expensive beginner traps. This underlines the importance of rotational deployment in generating maximum economic value and return.