By: Bart Baesens, Seppe vanden Broucke
This Q&A first appeared in Data Science Briefings, the DataMiningApps newsletter, as a “Free Tweet Consulting Experience”, in which we answer a data science or analytics question of 140 characters maximum. Want to submit your own question? Just Tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.
You asked: What is the betweenness in social network analytics and how can it be used for fraud detection?
Our answer:
The betweenness measures the extent to which a node lies on the shortest paths connecting any two nodes in the network. This can be interpreted as the extent to which information passes through this node. A node with a high betweenness possibly connects communities (i.e., subgraphs in the network) with each other. This is depicted in the figure below:
In the figure, the shaded node connects the three communities with each other and has the highest betweenness. If this node is “infected” by fraud from one community, fraud can easily pass on to the other communities. One option is, for example, to remove this node from the network to prevent fraud from contaminating the other communities. Let $g_{jk}$ be the number of shortest paths between node $j$ and node $k$, and $g_{jk}(v_i)$ the number of those shortest paths that pass through node $v_i$. The betweenness of node $v_i$ then becomes

$$\text{betweenness}(v_i) = \sum_{j<k} \frac{g_{jk}(v_i)}{g_{jk}}$$
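To make the shortest-path counting concrete, here is a minimal Python sketch that computes betweenness directly from the definition: for every pair of nodes, it takes the fraction of shortest paths that pass through a given node and sums those fractions. The toy network (three triangle-shaped communities joined by a hub called `H`) and all names such as `bfs_counts`, `betweenness` and `toy_graph` are our own assumptions for illustration; they do not reproduce the network in the figure. The sketch assumes an undirected, unweighted graph.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Return (dist, sigma): hop distance from source to each reachable
    node, and the number of shortest paths from source to that node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph,
    summing over unordered node pairs (j, k)."""
    info = {v: bfs_counts(graph, v) for v in graph}
    scores = {v: 0.0 for v in graph}
    for j, k in combinations(graph, 2):
        dist_j, sigma_j = info[j]
        dist_k, sigma_k = info[k]
        if k not in dist_j:              # j and k are not connected
            continue
        for v in graph:
            if v == j or v == k or v not in dist_j:
                continue
            # v lies on a shortest j-k path iff the distances add up
            if dist_j[v] + dist_k[v] == dist_j[k]:
                scores[v] += sigma_j[v] * sigma_k[v] / sigma_j[k]
    return scores


# Hypothetical toy network: three triangles (A, B, C) joined through hub H.
edges = [
    ("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
    ("B1", "B2"), ("B2", "B3"), ("B1", "B3"),
    ("C1", "C2"), ("C2", "C3"), ("C1", "C3"),
    ("H", "A1"), ("H", "B1"), ("H", "C1"),
]
toy_graph = {}
for a, b in edges:
    toy_graph.setdefault(a, set()).add(b)
    toy_graph.setdefault(b, set()).add(a)

scores = betweenness(toy_graph)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, score)
```

In this toy network the hub `H` gets the highest score, because every shortest path between nodes in different communities runs through it. The pairwise loop is only meant to mirror the definition; for large networks a dedicated algorithm (for example Brandes' algorithm, as implemented in common graph libraries) would be the more practical choice.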