By: Bart Baesens, Seppe vanden Broucke
This QA first appeared in Data Science Briefings, the DataMiningApps newsletter.
You asked: What package can we use to calculate weights of evidence and information value in R?
Our answer:
I suggest the Information package, written by Kim Larsen. Here is a small example that uses the hmeq (home equity loans) data set, which can be downloaded from http://www.creditriskanalytics.net/:
library(Information)  # Information package, written by Kim Larsen
hmeq <- read.csv("c:/temp/hmeq.csv")
IV <- create_infotables(data = hmeq, y = "BAD")
print(head(IV$Summary))
MultiPlot(IV, "LOAN")
The result is as follows:
Variable   IV
DEBTINC    1.8771930
DELINQ     0.5653247
VALUE      0.4703797
DEROG      0.3471889
CLAGE      0.2301126
LOAN       0.1630072
You can see that DEBTINC is the most predictive variable in terms of information value.
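For reference, the quantities behind this ranking can be written down as follows. This is the common credit-scoring formulation, not something taken from the package documentation. The symbols are introduced here for illustration: g_i and b_i are the numbers of goods (BAD = 0) and bads (BAD = 1) in bin i of a variable, G and B are the total numbers of goods and bads, and k is the number of bins.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{g_i / G}{b_i / B}\right),
\qquad
\mathrm{IV} = \sum_{i=1}^{k} \left(\frac{g_i}{G} - \frac{b_i}{B}\right) \mathrm{WOE}_i
```

Each term of the sum is non-negative, so a variable whose bins separate goods from bads more strongly gets a larger IV. Swapping goods and bads in the ratio flips the sign of every WOE but leaves the IV unchanged.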
The weights of evidence (WOE) plot for the LOAN variable looks as follows:
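Remember that, under the usual credit-scoring convention where the weight of evidence of a bin is ln(share of goods / share of bads), positive (negative) weights of evidence mean less (more) risk. If a tool puts the bads in the numerator instead, the signs are reversed, so check which convention applies before reading the plot.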
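To see where the sign comes from, here is a minimal base R sketch that computes the weights of evidence and the information value by hand. It rests on several assumptions: the data frame toy and its two bins are made up for illustration and are not the hmeq data, and the goods-over-bads convention is used, which may not match the sign convention the Information package applies internally.

```r
# Toy data, made up for illustration (not the hmeq data set).
# "low" bin: 3 goods, 1 bad; "high" bin: 3 goods, 3 bads.
toy <- data.frame(
  bin = c(rep("low", 4), rep("high", 6)),
  BAD = c(0, 0, 0, 1,  0, 0, 0, 1, 1, 1)
)

# Cross-tabulate: rows are bins, columns are "0" (good) and "1" (bad)
counts <- table(toy$bin, toy$BAD)

# Share of all goods and share of all bads that fall in each bin
dist_good <- counts[, "0"] / sum(counts[, "0"])
dist_bad  <- counts[, "1"] / sum(counts[, "1"])

# Assumed convention: WOE = ln(share of goods / share of bads)
woe <- log(dist_good / dist_bad)

# Information value: sum over bins of (share goods - share bads) * WOE
iv <- sum((dist_good - dist_bad) * woe)

print(round(woe, 4))
print(round(iv, 4))
```

In this toy example the "low" bin holds a larger share of the goods than of the bads, so its weight of evidence is positive, while the "high" bin gets a negative one. Note that a bin with no goods or no bads would produce an infinite weight of evidence in this sketch, so real implementations typically handle such bins separately.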