QA: I noticed the update of your credit scoring benchmarking paper which you published together with Stefan Lessmann in EJOR. Can you highlight some of the key findings?

By: Bart Baesens, Seppe vanden Broucke

This QA first appeared in Data Science Briefings, the DataMiningApps newsletter, as a “Free Tweet Consulting Experience”, where we answer a data science or analytics question of 140 characters maximum. Want to submit your own question? Just Tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.


You asked: I noticed the update of your credit scoring benchmarking paper which you published together with Stefan Lessmann in EJOR. Can you highlight some of the key findings?

Our answer:

Sure thing! Here we go:

  • Among the individual classifiers, neural networks usually perform best in terms of percentage correctly classified (PCC), AUC and H-measure, but poorly on the Brier Score (which is a calibration measure).
  • Among the homogeneous ensemble models, random forests consistently perform best. This finding also generalizes to other settings such as churn prediction, fraud detection, etc.
  • Heuristic search works well for heterogeneous ensemble selection. We found Bagged Hill-Climbing to work especially well. Very complex, dynamic heterogeneous ensemble selection methods perform poorly.
  • AUC and the H-measure are strongly correlated, with correlations above 95%. The Brier Score behaves differently from the other measures.
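
To make the last point a bit more tangible, here is a minimal sketch of why a ranking measure such as AUC and a calibration measure such as the Brier Score can tell different stories about the same model. It is not code from the paper: the labels, the predicted probabilities, the helper names (`pcc`, `auc`, `brier`) and the 0.5 cut-off are all hypothetical and chosen purely for illustration. The H-measure is left out for brevity.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the benchmarking study.

def pcc(y_true, scores, threshold=0.5):
    """Percentage correctly classified at a fixed cut-off (assumed 0.5)."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, scores))
    return hits / len(y_true)

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, scores):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((s - y) ** 2 for y, s in zip(y_true, scores)) / len(y_true)

# Hypothetical labels (1 = default) and predicted default probabilities.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]

# A strictly increasing distortion keeps the ranking of the applicants intact
# but inflates every predicted probability.
p_distorted = [s ** 0.25 for s in p]

for name, scores in [("original", p), ("distorted", p_distorted)]:
    print(name,
          "PCC:", round(pcc(y, scores), 3),
          "AUC:", round(auc(y, scores), 3),
          "Brier:", round(brier(y, scores), 3))
```

Because the distortion preserves the ordering of the scores, AUC stays the same, whereas the Brier Score gets worse since the probabilities drift away from the observed outcomes. PCC also changes because it depends on the chosen cut-off.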