Contributed by: Sebastiaan Höppner, Tim Verdonck
This article first appeared in Data Science Briefings, the DataMiningApps newsletter.
In the column Analytics in the Battle Against Fraud, we announced that BNP Paribas Fortis and KU Leuven joined forces in a research program to combat transactional fraud through analytics. BNP Paribas Fortis financially supports the KU Leuven Fraud Analytics chair to stimulate co-creation and to invest in fraud analytics expertise. In this post, we share some general findings and high-level insights.
Fraud Detection Challenges
There are many different types of fraud, such as insurance fraud, money laundering and healthcare fraud. However, all kinds of fraud, including credit transfer fraud, share the following key characteristics (Van Vlasselaer et al., 2015): fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime which appears in many types and forms.
Fraudulent transactions are relatively uncommon. Typically, fewer than 0.001% (a rough estimate) of all credit transfers are related to fraud. Despite their rarity, fraud cases cause large losses for the bank and may cause clients to lose faith in the bank as a trusted institution. From the data scientist’s point of view, detecting the rare fraudulent transfers within large stacks of data is arguably the biggest challenge. On average, 100 to 500 transfers are executed each second over various channels. As a result, huge volumes of data need to be processed and stored in (near) real time. The fraud detection systems therefore need to be operationally efficient, since the decision time is often limited to a few seconds.
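To make the scale of this imbalance concrete, the figures quoted above can be combined in a short back-of-the-envelope calculation. The Python sketch below assumes, purely for illustration, that the average rate of 100 to 500 transfers per second holds around the clock and that the rough 0.001% estimate is an upper bound on the fraud rate; the function and variable names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# 0.001% expressed as a fraction (the rough upper bound quoted in the text)
FRAUD_RATE = 0.001 / 100


def expected_daily_fraud(transfers_per_second: float, fraud_rate: float) -> float:
    """Expected number of fraudulent transfers per day at a constant transfer rate."""
    return transfers_per_second * SECONDS_PER_DAY * fraud_rate


for tps in (100, 500):
    daily_transfers = tps * SECONDS_PER_DAY
    daily_fraud = expected_daily_fraud(tps, FRAUD_RATE)
    print(f"{tps} transfers/s: {daily_transfers:,} transfers/day, "
          f"at most about {daily_fraud:.0f} fraudulent")

# A trivial model that labels every transfer as legitimate is wrong only on
# the fraud cases, so its accuracy is 1 - fraud rate.
always_legit_accuracy = 1 - FRAUD_RATE
print(f"Accuracy of 'never fraud': {always_legit_accuracy:.5%}")
```

Under these assumptions, several million transfers per day contain at most a few hundred fraudulent ones, and a model that never raises an alert would still be more than 99.99% accurate. Plain accuracy is therefore a poor yardstick for fraud detection models.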
Fraudsters try to steal money and personal credentials through different modi operandi (MOs) such as phishing or hacking. Each MO is a well-considered and carefully planned fraud structure in which fraudsters try to conceal their activities by imitating normal behavior as closely as possible. In some reported cases, the fraudster stole a victim’s credentials but only took the money from the account a few months later. Continuous monitoring and updating is thus of utmost importance. Fraudsters learn from their mistakes and those of their predecessors, so they adapt and refine their strategies to avoid being detected. In turn, the bank is forced to continuously improve its fraud detection systems, since machine learning models become inadequate if they fail to adapt to new fraud strategies, as happens with static models that are never updated. This leads to a persistent cat-and-mouse game between the fraudsters and the fraud fighters.
Profiling Money Mules
An important aspect of each transaction is the beneficiary. If the bank recognizes a beneficiary as untrustworthy, the transfer is considered suspicious and less likely to be legitimate. One particular kind of beneficiary is the so-called money mule. A money mule is an intermediary who transfers stolen money on behalf of criminals. The money is transferred from the mule’s account to the scam operator, typically in another country. Machine learning techniques are used to predict an account’s propensity to be a money mule or to become one in the near future. Most money mules known to BNP Paribas Fortis are not customers of the bank, so only limited data is available on this type of fraudulent account. As a result, profiling money mules through data-analytical techniques may lead to false positives, causing regular accounts to be incorrectly flagged as illegitimate. Of course, one should avoid harassing good customers by blocking their transactions and accounts. One possible approach to mine extra data on external mule accounts is to use (social) network analysis in order to discover potential relationships between customers and fraudsters.
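As a rough illustration of the network idea, the sketch below links two accounts whenever a transfer took place between them and then computes how many hops separate each account from the nearest known mule. This is only a minimal example under stated assumptions, not the method used in the research program: the account identifiers, the transfer list and the function names are all hypothetical, and the graph is treated as undirected for simplicity.

```python
from collections import defaultdict, deque


def build_graph(transfers):
    """Undirected adjacency map: two accounts are linked if money moved between them."""
    graph = defaultdict(set)
    for sender, beneficiary in transfers:
        graph[sender].add(beneficiary)
        graph[beneficiary].add(sender)
    return graph


def hops_to_known_mules(graph, known_mules, max_hops=2):
    """Multi-source breadth-first search from all known mules.

    Returns the fewest hops from each reachable account to any known mule,
    exploring no further than max_hops.
    """
    distance = {mule: 0 for mule in known_mules if mule in graph}
    queue = deque(distance)
    while queue:
        account = queue.popleft()
        if distance[account] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[account]:
            if neighbour not in distance:
                distance[neighbour] = distance[account] + 1
                queue.append(neighbour)
    return distance


# Hypothetical (sender, beneficiary) pairs, for illustration only
transfers = [
    ("cust_A", "mule_X"),
    ("cust_B", "cust_A"),
    ("cust_C", "cust_D"),
    ("cust_B", "shop_1"),
    ("cust_D", "shop_1"),
]
known_mules = {"mule_X"}

graph = build_graph(transfers)
distances = hops_to_known_mules(graph, known_mules, max_hops=2)
for account, hops in sorted(distances.items(), key=lambda item: item[1]):
    print(account, hops)
```

Here cust_A is one hop and cust_B two hops away from the known mule, while the remaining accounts lie outside the two-hop limit. Such a hop count would serve as one input feature among many rather than as evidence on its own, since most accounts near a mule belong to ordinary customers.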
A Cost Driven Approach
Credit transfer fraud detection is by definition a cost-sensitive problem, in the sense that the cost of a false positive differs from the cost of a false negative. When a transaction is predicted to be fraudulent but is in fact legitimate, the bank incurs an administrative cost. On the other hand, when a fraud goes undetected, the amount of that transaction is lost. Moreover, it is not enough to assume a constant cost difference between false positives and false negatives: transaction amounts vary considerably, so the financial impact of a missed fraud depends on the individual transaction. When building and evaluating fraud detection models, we should therefore focus on the costs related to classification and incorporate them in the construction of the models. One such cost-based measure has been proposed by Correa Bahnsen et al. (2013) to evaluate credit card fraud detection models, taking into account the different financial costs incurred by the fraud detection process.
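The following Python sketch shows what such a transaction-dependent cost measure might look like. It follows the description above: a flagged transaction costs a fixed administrative amount and a missed fraud costs the transaction amount. Charging the administrative cost for correctly flagged frauds as well is an assumption of this sketch, and the exact measure in Correa Bahnsen et al. (2013) may differ in detail. The decision rule that flags a transfer when its expected loss exceeds the administrative cost is likewise an illustrative assumption, and all numbers and names are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Total financial cost of a set of fraud predictions.

    Flagged transfer (assumed for true and false alarms alike): admin_cost.
    Missed fraud: the full transaction amount is lost.
    Legitimate transfer that is let through: no cost.
    """
    total = 0.0
    for is_fraud, flagged, amount in zip(y_true, y_pred, amounts):
        if flagged:
            total += admin_cost
        elif is_fraud:
            total += amount
    return total


def flag_by_expected_loss(fraud_probs, amounts, admin_cost):
    """Flag a transfer when its expected loss (p * amount) exceeds the admin cost."""
    return [p * amount > admin_cost for p, amount in zip(fraud_probs, amounts)]


# Hypothetical transfers: true label, amount, and a model's fraud probability
y_true = [0, 0, 1, 1]
amounts = [50.0, 2000.0, 20.0, 5000.0]
fraud_probs = [0.05, 0.02, 0.60, 0.30]
ADMIN_COST = 5.0  # assumed cost of investigating one alert

# Conventional rule: flag when the predicted probability is at least 0.5
fixed_flags = [p >= 0.5 for p in fraud_probs]
fixed_cost = total_cost(y_true, fixed_flags, amounts, ADMIN_COST)

# Cost-sensitive rule: the effective threshold depends on each amount
risk_flags = flag_by_expected_loss(fraud_probs, amounts, ADMIN_COST)
risk_cost = total_cost(y_true, risk_flags, amounts, ADMIN_COST)

print("fixed threshold:", fixed_flags, fixed_cost)
print("expected loss:  ", risk_flags, risk_cost)
```

In this toy example the fixed threshold misses the large fraudulent transfer of 5000 and ends up with a total cost of 5005, whereas the cost-sensitive rule accepts one extra false alarm and pays only 15 in administrative costs. Both rules make exactly one classification error, which is why an error count alone cannot distinguish them.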
References
- [1] Alejandro Correa Bahnsen, Aleksandar Stojanovic, Djamila Aouada, and Bjorn Ottersten. Cost sensitive credit card fraud detection using Bayes minimum risk. In 2013 12th International Conference on Machine Learning and Applications (ICMLA), volume 1, pages 333–338. IEEE, 2013.
- [2] Veronique Van Vlasselaer, Cristian Bravo, Olivier Caelen, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, and Bart Baesens. APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems, 75:38–48, 2015.