By: Bart Baesens, Seppe vanden Broucke
This article first appeared in Data Science Briefings, the DataMiningApps newsletter.
Big data and analytics: terms that frequently pop up in newspapers and magazines, at airports, and even in pub conversations. These days everybody talks about them, but only a few firms actually use them successfully. One reason is that firms often lack clear insight into the critical success factors for building actionable analytical models. In this column, we therefore share some recent research insights based on partnerships we have initiated with firms worldwide.
In short, these are:
- The business relevance of the analytical model should be guaranteed.
- Statistical performance and validity need to be balanced against interpretability and justifiability.
- Operational efficiency and economic cost need to be taken into account.
- Regulatory compliance is becoming increasingly important.
Business relevance
To be successful, an analytical model needs to satisfy various requirements. A first key requirement is business relevance: the analytical model should solve the business problem it was developed for. A high-performing analytical model is of little use if it has been sidetracked from the original business problem. In other words, if the business problem is detecting insurance fraud, then the analytical model must actually detect insurance fraud. This requires thorough business knowledge and a good understanding of the problem before any analysis can start. Example kick-off questions are how to define fraud (what is it?), how to measure it (how do we recognize it?) and how to manage it (what do we do about it?).
Statistical performance and validity
Another important success factor is statistical performance and validity. In other words, the analytical model should make sense statistically: it should be statistically significant and provide good predictive or descriptive performance.
Depending on the type of analytics, different performance metrics can be used. In customer segmentation, statistical evaluation measures contrast intra-cluster similarity with inter-cluster dissimilarity. Churn prediction models are evaluated on their ability to assign high churn scores to the customers most likely to churn, and so on.
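The Python sketch below illustrates the two kinds of metrics just mentioned. It computes the mean silhouette coefficient, one common measure that contrasts how close a point is to its own cluster with how close it is to the nearest other cluster. It also computes the top-decile lift, one common way to check whether a churn model places actual churners at the top of its ranking. The choice of these two metrics, and the function names `silhouette` and `top_decile_lift`, are our own illustrative assumptions. Many other segmentation and churn metrics exist.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient (assumes at least two clusters).

    For each point, a = mean distance to the other members of its own cluster
    (intra-cluster similarity) and b = mean distance to the members of the
    nearest other cluster (inter-cluster dissimilarity); s = (b - a) / max(a, b).
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        same = []
        other = {}  # cluster label -> distances from point i to that cluster's members
        for j in range(n):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if labels[j] == labels[i]:
                same.append(d)
            else:
                other.setdefault(labels[j], []).append(d)
        if not same:
            continue  # convention: a point in a singleton cluster contributes s = 0
        a = sum(same) / len(same)
        b = min(sum(ds) / len(ds) for ds in other.values())
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n

def top_decile_lift(scores, churned):
    """Churn rate among the 10% highest-scored customers divided by the overall churn rate."""
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)  # size of the top decile (at least one customer)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(churned) / len(churned)
    return top_rate / base_rate

# Illustrative toy inputs (not real data)
print(round(silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]), 2))  # → 0.9
print(top_decile_lift(list(range(20, 0, -1)), [1, 1] + [0] * 8 + [1, 1] + [0] * 8))  # → 5.0
```

A silhouette close to 1 indicates compact, well-separated segments. Negative values suggest that points lie closer to another segment than to their own. A top-decile lift of 1 means the model ranks churners no better than random selection would.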
Statistical performance is not the only criterion, however. Interpretability means that the analytical model should be comprehensible to the decision maker (e.g. a marketer, fraud analyst or credit expert). Justifiability means that the model is in line with the expectations and business knowledge of that expert.
Both interpretability and justifiability are subjective and depend on the knowledge and experience of the decision maker. Both often need to be balanced against statistical performance, because complex, non-interpretable models (e.g. neural networks, random forests) often perform better in a statistical sense. In settings like credit risk modeling, interpretability and justifiability are very important because of the societal impact of these models. In settings like fraud detection and marketing response modeling, however, they are typically less of an issue.
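In credit risk modeling, one common way to check justifiability is to compare the sign of each coefficient in a regression-based model with the direction the credit expert expects. The sketch below assumes a hypothetical model that predicts default. The variable names, coefficient values and expert expectations are all made up for illustration. It shows the idea and is not a validated procedure.

```python
def sign_violations(coefficients: dict[str, float], expected_signs: dict[str, int]) -> list[str]:
    """Return the variables whose fitted coefficient contradicts the expert's expected direction.

    expected_signs maps a variable to +1 (default risk should rise with it)
    or -1 (default risk should fall with it); variables without an
    expectation are skipped.
    """
    violations = []
    for name, coef in coefficients.items():
        expected = expected_signs.get(name)
        if expected is not None and coef * expected < 0:
            violations.append(name)
    return violations

# Hypothetical coefficients of a default-prediction model (illustrative only)
coefficients = {"debt_to_income": 1.8, "years_at_address": -0.3, "income": 0.05}
expert_view = {"debt_to_income": +1, "years_at_address": -1, "income": -1}
print(sign_violations(coefficients, expert_view))  # → ['income']
```

A flagged variable does not necessarily mean the model is wrong, because correlated inputs can flip coefficient signs. It is, however, exactly the kind of result a credit expert would want explained before accepting the model.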
Operational efficiency and economic cost
Operational efficiency relates to the effort needed to evaluate, monitor, backtest or rebuild the model. From this perspective, a neural network or random forest is clearly less efficient than, e.g., a plain vanilla regression model or decision tree. In settings like credit card fraud detection, operational efficiency is very important because a decision must be made within a few seconds after the credit card transaction is initiated.
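The sketch below illustrates why a plain regression model is cheap to evaluate. It scores a single transaction with a hypothetical logistic regression fraud model, then uses `time.perf_counter` to check the elapsed time against a latency budget. The feature names, weights and bias are all assumptions. So is the 2-second budget, which stands in for "a few seconds".

```python
import math
import time

# Hypothetical weights of a plain logistic regression fraud model (illustrative only)
WEIGHTS = {"amount_zscore": 1.2, "foreign_merchant": 0.8, "night_time": 0.5}
BIAS = -4.0
LATENCY_BUDGET_S = 2.0  # assumption: "a few seconds" taken as a 2-second budget

def fraud_score(tx: dict[str, float]) -> float:
    """Fraud probability from one weighted sum plus a sigmoid; cost grows only with the number of features."""
    z = BIAS + sum(w * tx.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def score_within_budget(tx: dict[str, float], budget_s: float = LATENCY_BUDGET_S) -> tuple[float, bool]:
    """Score one transaction and report whether model evaluation met the latency budget."""
    start = time.perf_counter()
    score = fraud_score(tx)
    elapsed = time.perf_counter() - start
    return score, elapsed <= budget_s

score, on_time = score_within_budget({"amount_zscore": 2.0, "foreign_merchant": 1.0, "night_time": 1.0})
print(f"score={score:.3f}, within budget: {on_time}")
```

This only times model evaluation. In practice, gathering the transaction's inputs and acting on the score also count against the budget.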
Economic cost refers to the cost of gathering the model inputs, running the model and processing its outcome(s). The cost of any external data and/or models should also be taken into account. Accounting for all of these costs enables you to calculate the economic return on the analytical model, which is typically not a straightforward exercise.
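The deliberately simplified sketch below sums the cost components listed above and compares them with an estimated benefit, such as fraud losses avoided, to obtain a simple return on investment. The class and function names, the one-year horizon and all figures are hypothetical. A real calculation would also need to handle discounting, uncertainty in the benefit estimate and how much of that benefit can be attributed to the model.

```python
from dataclasses import dataclass

@dataclass
class ModelCosts:
    # Hypothetical annual cost components, mirroring the ones listed above
    input_gathering: float
    running: float
    outcome_processing: float
    external_data: float = 0.0  # external data and/or models, if any

    def total(self) -> float:
        return self.input_gathering + self.running + self.outcome_processing + self.external_data

def return_on_model(annual_benefit: float, costs: ModelCosts) -> float:
    """Simple ROI = (benefit - cost) / cost; ignores discounting, risk and attribution."""
    total = costs.total()
    if total <= 0:
        raise ValueError("total cost must be positive")
    return (annual_benefit - total) / total

# Illustrative figures only (not real data)
costs = ModelCosts(input_gathering=20_000, running=15_000, outcome_processing=10_000, external_data=5_000)
print(return_on_model(annual_benefit=80_000, costs=costs))  # → 0.6
```

In this simplified form, the hard part is not the arithmetic but estimating the benefit credibly.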
Regulatory compliance
Finally, regulatory compliance is becoming more and more important. This refers to the extent to which the model complies with regulation and legislation. In credit risk modeling, the models must comply with the Basel II and Basel III accords. In insurance analytics, the Solvency II directive must be respected. In marketing, regulations on privacy and ethical data governance are becoming increasingly important.
To conclude, in this article we briefly discussed the critical success factors for building analytical models. As the examples above illustrate, the relative importance of each factor depends on the application field in which you are working.