By: Bart Baesens, Seppe vanden Broucke
This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter. Also want to submit your question? Just tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.
You asked: Can you summarize the findings of your updated benchmarking study for credit scoring?
Our answer:
Our study consolidates previous work in PD modeling and provides a holistic picture of the state of the art in predictive modeling for retail scorecard development. From an academic point of view, an important question is whether efforts to develop novel scoring techniques are worthwhile. Our study provides some support but also raises concerns. We find that some advanced methods perform extremely well on our credit scoring data sets, but we never observe the most recent classifiers excelling: neural networks perform better than extreme learning machines, random forests perform better than rotation forests, and dynamic selective ensembles perform worse than almost all other classifiers. This may indicate that progress in the field has stalled, and that attention should shift from PD models to other modeling problems in the credit industry, including data quality, scorecard recalibration, variable selection, and LGD/EAD modeling.

On the other hand, we do not expect the desire to develop better, more accurate scorecards to end any time soon. Future papers will likely propose novel classifiers, and the “search for the silver bullet” will continue. An implication of our study is that such efforts must be accompanied by a rigorous assessment of the proposed method vis-à-vis challenging benchmarks. In particular, we recommend random forests as a benchmark against which to compare new classification algorithms.
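To make the recommendation concrete, here is a minimal sketch of benchmarking a newly proposed classifier against a random forest on a binary default-prediction task. Everything here is an illustrative assumption rather than part of the study: the synthetic data stands in for a retail credit data set, the scikit-learn workflow is one possible setup, and the logistic regression is just a placeholder for whatever new classifier is being evaluated.

```python
# Sketch: compare a candidate classifier against a random forest benchmark
# on a binary PD (probability-of-default) task, using AUC as the metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, class-imbalanced stand-in for a credit data set
# (features X, binary default flag y); a real scorecard data set
# would slot in the same way.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Benchmark: the random forest recommended in our study.
benchmark = RandomForestClassifier(n_estimators=500, random_state=42)
benchmark.fit(X_train, y_train)
auc_benchmark = roc_auc_score(y_test, benchmark.predict_proba(X_test)[:, 1])

# Candidate: replace with the novel classifier under evaluation.
candidate = LogisticRegression(max_iter=1000)
candidate.fit(X_train, y_train)
auc_candidate = roc_auc_score(y_test, candidate.predict_proba(X_test)[:, 1])

print(f"Random forest AUC: {auc_benchmark:.3f}")
print(f"Candidate AUC:     {auc_candidate:.3f}")
```

A single train/test split is only the starting point; a rigorous assessment in the spirit of the study would repeat this comparison across multiple data sets and resampling runs, with appropriate statistical tests on the performance differences.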