Contributed by: Maria Óskarsdóttir, Jan Vanthienen, Bart Baesens
This article first appeared in Data Science Briefings, the DataMiningApps newsletter. Subscribe now for free if you want to be the first to receive our feature articles, or follow us @DataMiningApps. Do you also wish to contribute to Data Science Briefings? Shoot us an e-mail over at briefings@dataminingapps.com and let’s get in touch!
In most data mining applications, data sets are composed of variables with exact values and strict boundaries. A churn dataset in retail banking could have features such as age of customers, their account balance and how often they use their online bank. The values of these features are real numbers, which can be used as such in the model building. However, it might be practical to group the values together, and use for example age brackets instead of age or even a dummy variable indicating whether a person is young or not. Then what does it mean to be young? Indeed that depends on the definition for this specific application and changing it might greatly affect the insights generated by the dataset.
In contrast to exact and strict boundaries, representing them more vaguely might be convenient. This is what fuzzy set theory is used for. In this case, the instances don’t have to be either one thing or the other, black or white. Instead, it is possible to specify to which degree they are something and thus incorporate gradual transition.
Mathematically a fuzzy set ~A is defined as
~A = { (x, µA(x)) | x ∈ A, µA(x) ∈ [0,1] }
where A is any regular set. The function µA(x) is called a membership function and describes to which extent an element belongs to ~A. As such, a larger value of the membership function indicates a higher degree of membership. Fuzzy sets are a generalizations of classical set theory and allow elements to belong to many sets at the same time. With the transition from regular sets to fuzzy ones, it is possible to represent terms which don’t have an absolute meaning such as young, tall and rich.
Fuzzy sets have been widely applied in data mining and machine learning, leading to fuzzy data mining. This term can be understood in two different ways. Either the datasets are fuzzy or the mining itself, the model building, is fuzzy.
Fuzzy datasets arise when fuzziness is used in data selection and preparation, such as converting variables to young and rich, where the intervals don’t have exact endpoints. Instead, every observation comes with a membership value. Subsequently the fuzzy data can be analysed with extended versions of standard data mining techniques or the analyses can be carried out in fuzzy spaces.
The second approach to fuzzy data mining is to employ the principle of fuzziness in the model building itself. Various application exist and we will discuss a few.
- Clustering: k-means clustering is a common unsupervised data mining technique. This method assigns each observation to one of the predefined k clusters which has the closest mean. Thus, each instance of the dataset belongs to only one cluster and there are crisp boundaries between them. In fuzzy clustering, an observation can belong to many clusters, to a certain extent. This partial belonging is described with the membership function.
- Rule-Based Systems: Systems which store and manipulate knowledge in order to interpret information in a useful way are called rule-based systems. These are, for example, expert systems which can be used by doctors to assist in disease diagnosis. Fuzziness has a long history in rule-based systems, mainly because the knowledge that goes into them is usually not based on exact facts but rather statements like OLD houses in SOUTH England are EXPENSIVE. In order to accurately represent the knowledge, fuzziness is introduced in the model and the induction mechanism.
- Decision Trees: Decision trees are frequently used for building predictive models. When training the models, splitting in inner nodes is usually based on strict rules, i.e. a person is either younger than 25 or older than 25. Changing the boundaries can affect the final model greatly, which leads to unstable solutions.
Applying fuzzy sets in this case allows for more flexibility. Instead of using an absolute value, it is possible to make the split based on a fuzzy set. As a result, an instance can belong to more than one terminal node. - Association Rules: The mining of association rules comprises discovering relations between variables in a data set. A well known example of association rules is the mining of frequent item sets, i.e. finding products that are often bought together. Thus rules such as everyone who buys ketchup and fries also buys mayonnaise, can be discovered. Introducing fuzziness in this case can lead to increased support of rules. However, applying the fussiness is not straight forward and different ways to generalize the evaluation metrics of association rules have been proposed. This specific field of combining fuzziness with data mining is particularly widely studied, with more than 500 papers published since 1997.
Fuzzy frameworks are abstract mathematical entities which have been successfully applied in various data mining applications. It is an active field of research in machine learning and AI since fuzzy sets gives a more representative description of the world we live in, where not everything is either black or white.
References
- Bojadziev and M. Bojadziev. Fuzzy logic for business, finance, and management, volume 12. World Scientific, 1997
- Hüllermeier. Fuzzy sets in machine learning and data mining. Applied Soft Computing, 11(2):1493–1505, 2011
- Marín, M. Ruiz, and D. Sánchez. Fuzzy frameworks for mining data associations: fuzzy association rules and beyond. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2016
- Pedrycz. Fuzzy set technology in knowledge discovery. Fuzzy Sets and Systems, 98(3):279–290, 1998