Isaac Kega Mwangi1*; Lawrence Nderu2; Ronald Waweru Mwangi3; Dennis Gitari Njagi4
1. School of ICT, Media and Engineering, Zetech University, P.O. Box 2768 – 00200 Nairobi, Kenya
* E-mail of the corresponding author: isaac.kega@zetech.ac.ke, Mobile Number: +254 720935632.
2. School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000-00200 Nairobi, Kenya. Email: lnderu@jkuat.ac.ke
3. School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000-00200 Nairobi, Kenya. Email: waweru_mwangi@icsit.jkuat.ac.ke
4. School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000-00200 Nairobi, Kenya. Email: dennis.njagi@jkuat.ac.ke
ABSTRACT
Interpretability is a critical concern in the machine-learning realm. Detecting interactions in the data is one fundamental way in which intrinsic models demonstrate their interpretability. A Generalized Linear Model (GLM) is an example of an intrinsically interpretable model that uses interaction detection in its operation. However, Generalized Linear Models do not search the whole sample space for variable interactions, assuming that all variables interact with the target variable(s) in the same way. This paper proposes a hybrid model, RAGL, which uses Rough Set theory to detect interaction terms through information granulation and then derives decision rules from these detected terms via association rule mining. These rules contain the potential high- and low-level interactions, which are then selected and used as new variables in the GLM. This study showed that the proposed model could detect high- and low-level interactions within the whole sample space of a given dataset and ultimately use the important interaction terms for prediction. Weather data for Kariki_farm in Juja was used to train and test the proposed model and to evaluate it against the classical GLM. Interaction detection using the proposed model outperformed the classical GLM in terms of accuracy and interpretability.
1.0 INTRODUCTION
The task of explaining the outcome of a machine-learning process in a way humans can understand is known as interpretability (Doshi-Velez, 2017). Interpretability is vital to satisfy human curiosity about the tasks or aspects being handled, among other reasons.
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4367406
Interpretable machine learning techniques can be grouped into intrinsic interpretability and post-hoc interpretability. Intrinsic models incorporate interpretability into their underlying structure, whereas post-hoc interpretability requires creating a second model to explain an existing one. Examples of intrinsic machine learning models are decision trees, rule-based models, linear models, and attention models; post-hoc methods include Permutation Feature Importance, Shapley values, LIME (local surrogate models), and others. The main difference between these two groups lies in the trade-off between model accuracy and explanation fidelity. Inherently interpretable models can provide accurate and undistorted explanations but may sacrifice prediction performance, while post-hoc methods are limited by their approximate nature but keep the underlying model's accuracy intact (Du, Liu, & Hu, 2019).
Furthermore, interpretability can be classified as global or local. Global interpretability means understanding the structure and parameters of the complex model as it performs predictions globally, giving a holistic view without going deeper. Local interpretability can help uncover the causal relations between a specific input and its corresponding model prediction (Du, Liu, & Hu, 2019); it inspects an individual prediction of the model while trying to make sense of how the model arrived at that prediction (C. Molnar, 2019). Global interpretability can illuminate the inner workings of machine learning models and thus increase their transparency.
Interaction detection identifies and analyzes the interactions between different input variables in a given model. By understanding the interactions between the input variables, one better understands how the model uses them to make predictions. This, in turn, aids in understanding how the model makes decisions and thus provides insights into the model's strengths and weaknesses.
Any suitable explanation method must include interactions between the features. The two main objectives in the domain of feature interaction detection are (1) to find a group of features that depend on one another, known as Feature Interaction Detection, and (2) to interpret in what way the detected group of features interacts, known as Feature Interaction Interpretation (Tsang, Enouen, & Liu, 2021).
Assessing the interpretability of a machine learning method is also very important. There are three ways of determining it. (1) Application-grounded evaluation involves conducting human experiments within a natural application environment; for example, a researcher with a concrete application in mind will use domain experts to test the workability of their solution within that environment. This metric tests the workability of the model built for that environment but is expensive and not easy to carry out. (2) Human-grounded evaluation uses lay humans instead of domain experts to test the workability of the model. This allows a vast pool of people to test the model, and at the same time the experiment is less expensive than the application-grounded metric. (3) The functional-grounded metric uses a formal definition of interpretability as a proxy for explanation quality (Doshi-Velez & Kim, 2017).
Zhou et al. (2021) added that, in functional-grounded interpretability, some definition of interpretability is used as a proxy to evaluate explanation quality. They further divided the functional-grounded metric into three types: model-based, attribution-based, and example-based explanations. Model-based explanations use an intrinsically interpretable model to explain the original task model; quantitative metrics here include model size, runtime operation counts, and interaction strength, which looks at a feature's effect depending on other features' values. These can be used to evaluate the simplicity of local and global model-based explanations. Attribution-based explanations measure the explanatory power of the input features and use it to explain the task model; metrics here include the mutual information between original samples and the corresponding features extracted for explanations, which monitors the simplicity and broadness of reasons, and the mutual information between extracted features and related targets, which monitors the fidelity of explanations. Example-based explanations explain the task model by selecting instances from the training/testing dataset or creating new instances.
The functional-grounded metric will be used as the basis for evaluating the interpretability of our proposed model, because the model is a hybrid and thus still in its development phases, and because of the expense associated with the other two metrics for evaluating the interpretability of machine learning methods.
2.0 PROBLEM STATEMENT
The Generalized Linear Model is an intrinsically interpretable model: it belongs to the subset of machine learning models whose simple internal structure is used to interpret the model's prediction results. Though it is an intrinsic model, the GLM falls short in the way it detects interactions among the input variables. The model assumes that all interactions are the same and, as such, does not search the whole sample space for them (Changpetch & Lin, 2013).
An ideal interpretable model should not only provide accurate predictions but also provide a means of explaining how it arrived at those predictions. As such, this work proposes a hybrid model that uses Rough Set theory together with the association rule mining method to detect interactions in a Generalized Linear Model (RAGL). The advantage of RAGL lies in its ability to use Rough Set theory's information granulation procedures to separate indiscernible features from discernible ones through the greedy heuristic reduct method, which formulates a reduct containing all the essential attributes. The reduct is then used to generate decision rules via the Apriori method for association rule mining. The work's main contributions can be summarized as follows: (1) RAGL uses information granulation from Rough Set theory to detect the interactions, reduce the sample space, and obtain a reduct containing the detected interactions necessary for prediction; (2) RAGL generates decision rules from the generated reduct — rule analysis is a robust process for exploring relationships that ultimately decomposes the interactions into their most refined form for easy understanding; the rules selected are those with the highest confidence and support, and redundant or duplicated rules are discarded; (3) the RAGL framework converts the rules into binary values, which then serve as the interactions used for prediction.
2.1 RELATED WORK
This section reviews work on the use of interaction detection to aid machine learning models' interpretability and predictive power. In Tsang, Enouen, and Liu (2021), the authors made a case for feature interaction in interpretable AI. They noted that most post-hoc methods applied to complex black-box models such as neural networks do not consider the shared importance of groups of features in a dataset and instead look at the effect of each feature individually, which hampers the interpretability of the models. They proposed two objectives for feature interaction in the interpretability of machine learning models: (1) Feature Interaction Detection, which is to find a group of features that depend on one another, and (2) Feature Interaction Interpretation, which deals with understanding how a group of features interact with one another, through interpreting coefficients or interaction attributions, among other methods.
Adding interaction terms to the GLM is one popular method for interaction detection in GLMs. It involves creating a new variable that is the product of two or more predictors and including it as an additional predictor in the model. The coefficient associated with the interaction term measures the strength of the interaction effect. This method is simple to implement and provides a straightforward way of detecting interaction effects. However, it can be computationally intensive when the number of predictors is large, and it can lead to overfitting and poor generalization. Another issue is selecting the appropriate interaction terms to include in the model, which may require domain knowledge or exploratory data analysis to identify the interactions before modeling them (McCabe et al., 2022).
Another method for interaction detection in GLMs is using polynomial or spline-based functions, which involves transforming the predictors into polynomial or spline functions and including them in the model. Zhang et al. (2019) used this method to analyze the relationship between age and body mass index in a large cohort of women. An issue with this method is that increasing the degree of the polynomial functions, or the number of knots in the spline functions, can produce highly complex models, making it difficult to interpret the results and leading to computational challenges. Selecting the appropriate polynomial degree or number of knots can also be challenging and may require domain knowledge or exploratory data analysis to determine the most appropriate model (Perperoglou et al., 2019).
Tsang et al. (2020) proposed GLIDER (Global Interaction Detection and Encoding for Recommendation), which uses Neural Interaction Detection with LIME for data-instance perturbation over a batch of data samples and then explicitly encodes the collected global interactions into a target model via sparse feature crossing. Their proposed model improved the target model's prediction performance, and the detected global interactions were explainable. However, GLIDER had issues with computational complexity and interpretation of the results. The interactions detected in GLIDER were confined to local interactions and hence could not provide a global understanding of the interactions in the model. Despite the use of LIME, interpreting GLIDER's results can be challenging, especially for people without knowledge of deep learning, and training a deep neural network is computationally expensive, consuming substantial time and system resources.
Sumalatha, Uma Sankar, and Sujatha (2016) proposed the use of Rough Set theory to find behavioral patterns of bank customers. Their method started by computing a decision reduct using the discernibility matrix and the degree-of-dependency computation between attributes to find the critical attributes. They then used the selected attributes to mine decision rules. They stated that the advantages of their proposed method were (1) dimensionality reduction and (2) 90% accuracy on customers' deposit behavior from the generated decision rules.
In their research, Xun, Xu, and Qi (2012) used Rough Set theory as a basis for mining association rules from a given dataset. Their work combines the idea of the Apriori algorithm with a decision table. The method has three advantages: it eliminates redundant attributes, reduces the number of attributes, and can produce decision attribute sets at the cost of only one decision-table scan. The method removed redundant attributes through the simplified decision table and then generated frequent itemsets using the improved Apriori algorithm.
Slimani (2015) used the Rough Set method to mine class association rules, which contain classes as their consequents. The paper discussed an efficient algorithm, inspired by the Apriori algorithm, for discovering these rules. It is based on a principle from Rough Set theory that involves looking at the elementary sets of the lower approximations included in rough sets. This approach is simpler and more effective than other classic methods of finding association rules. The proposed algorithm works by first making multiple passes over the data and counting the support of each 1-rule item (each rule item contains one item in its condition set). A particular expression denotes the set of all rule items, and the algorithm's goal is to identify the frequent candidate k-rule items. The unique difference between the C_Apriori algorithm and the Apriori algorithm is that in C_Apriori, rule items are joined to create the same class.
In Ong, Huang, and Tzeng (2004), the authors proposed detecting interactions between different factors through information granulation, which is the basis of Rough Set theory. To simplify the problem, they applied decision rules based on Rough Set theory to reduce the number of factors they had to consider. They then used stepwise selection to determine the significant interaction effects that had the most extensive influence on the model. Finally, the authors found that when the logit model incorporated these interactions, it performed better than other methods.
rin
Rough Set theory is a knowledge discovery method highly applied in relational databases.
Professor Pawlak first introduced it in 1982. It uses Information granulation to group similar
objects into a single, collective entity (granules) to simplify information representation and
ep
In Rough Set theory, we use an information system 𝐼 = (𝑈, 𝐴) to represent data with its associated attributes, where U is the universe of discourse and A is the set of attributes. We can then calculate an indiscernibility relation 𝑈/𝐼𝑁𝐷(𝐴), and the elements of each indiscernibility class are known as granules. A granule represents a set of conditional attributes that affect a decisional attribute and can be represented as 𝐺 = {𝐶₁, 𝐶₂, 𝐶₃, …, 𝐶ₙ} satisfying a decision 𝐶ᵢ → 𝐷ᵢ, where 𝐷ᵢ is the decision attribute represented by the detected information granule. Rough Set theory uses the concept of approximations in the approximation space 𝑎𝑝𝑟 = (𝑈, 𝐴) to define the features that certainly belong to the target set and those that possibly belong to it, known as the lower and upper approximations, respectively (Raza & Qamar, 2017).
The upper approximation is the set of objects that may belong to the target set, while the lower approximation is the set of objects that certainly belong to it. The lower approximations are used to determine the reduct, which is the Rough Set theory process of removing irrelevant features from the decision table/information system (Raza & Qamar, 2017).
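As an illustrative sketch (not the authors' implementation), the indiscernibility granules and the lower/upper approximations can be computed in a few lines of Python; the toy decision table below is hypothetical:

```python
from itertools import groupby

def granules(universe, attrs, row):
    """Partition the universe into indiscernibility granules:
    objects with identical values on `attrs` fall in one granule."""
    key = lambda x: tuple(row(x)[a] for a in attrs)
    return [set(g) for _, g in groupby(sorted(universe, key=key), key=key)]

def lower_upper(granules_, target):
    """Lower approximation: granules fully inside the target set.
    Upper approximation: granules that intersect the target set."""
    lower = set().union(*([g for g in granules_ if g <= target] or [set()]))
    upper = set().union(*([g for g in granules_ if g & target] or [set()]))
    return lower, upper

# Hypothetical toy decision table: objects 0..4 with two condition attributes.
table = {0: {"humidity": "high", "wind": "low"},
         1: {"humidity": "high", "wind": "low"},
         2: {"humidity": "low",  "wind": "low"},
         3: {"humidity": "low",  "wind": "high"},
         4: {"humidity": "high", "wind": "high"}}
rain = {0, 4}  # objects where the decision attribute RAIN = 1

parts = granules(table.keys(), ["humidity", "wind"], lambda x: table[x])
low, up = lower_upper(parts, rain)
```

Here objects 0 and 1 are indiscernible yet have different decisions, so object 4 is the only element certainly in the target set (lower approximation), while 0, 1, and 4 possibly belong to it (upper approximation).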
The generation of the reduct is the next pivotal step, and the method used is the greedy heuristic reduct generation method, which uses a greedy algorithm to compute decision reducts. The algorithm relies on a function 𝑄𝑑: 𝐴 × 2^𝐴 → ℝ⁺ ∪ {0} corresponding to a monotonic attribute quality measure, in that it decreases with the increasing size of the set given as its second argument. This function must also have the property that it equals 0 if the second argument is a superreduct, i.e., a collection of attributes that discerns all objects from different decision classes (Janusz & Ślęzak, 2014).
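A minimal sketch of such a greedy attribute selection, using as a stand-in quality measure the number of object pairs with different decisions that a candidate attribute set fails to discern (it is monotone and reaches 0 exactly on superreducts); the decision table is hypothetical:

```python
from itertools import combinations

def undiscerned_pairs(table, decision, attrs):
    """Q_d-style quality: pairs of objects with different decisions
    that the attribute set `attrs` cannot tell apart (0 => superreduct)."""
    return sum(
        1
        for x, y in combinations(table, 2)
        if decision[x] != decision[y]
        and all(table[x][a] == table[y][a] for a in attrs)
    )

def greedy_reduct(table, decision, all_attrs):
    """Greedily add the attribute that lowers the measure most,
    until every pair with different decisions is discerned."""
    red = []
    while undiscerned_pairs(table, decision, red) > 0:
        best = min(all_attrs,
                   key=lambda a: undiscerned_pairs(table, decision, red + [a]))
        red.append(best)
    return red

# Hypothetical toy decision table.
table = {0: {"h": "hi", "w": "lo", "t": "hi"},
         1: {"h": "hi", "w": "hi", "t": "hi"},
         2: {"h": "lo", "w": "lo", "t": "hi"},
         3: {"h": "lo", "w": "hi", "t": "lo"}}
decision = {0: 1, 1: 1, 2: 0, 3: 0}
red = greedy_reduct(table, decision, ["h", "w", "t"])
```

On this toy table the single attribute "h" already separates the two decision classes, so the greedy search stops after one step.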
Association rule mining is a data mining technique that identifies relationships between items in large datasets. It is based on finding associations between items in a transaction database and computing the support, confidence, and lift of the rules generated from the associations. These rules can be used for various purposes, such as market basket analysis, recommendation systems, and fraud detection. A rule takes the form A → B, where A is the antecedent part and B the consequent part (Abdel-Basset et al., 2018), and is primarily represented as an IF…THEN… statement. The most commonly used algorithms for association rule mining are the Apriori algorithm and the FP-Growth algorithm. Association rules are mined and filtered based on three metrics:
Support is the number of transactions containing a particular itemset divided by the total number of transactions. It indicates how often the association rule appears in the dataset.

Confidence is the proportion of transactions containing itemset A that also contain itemset B; it measures how reliable a rule is.

Lift indicates the strength of the dependence between the antecedent and the consequent of the association rule:

𝐿𝑖𝑓𝑡(𝐴→𝐵) = 𝑃(𝐴 ∪ 𝐵) / (𝑃(𝐴)𝑃(𝐵)) = 𝑃(𝐵│𝐴) / 𝑃(𝐵) …….(iv)

where 𝑃(𝐴 ∪ 𝐵) is the probability that A and B coincide in the data being analyzed, 𝑃(𝐵│𝐴) is the conditional probability of B given A, 𝑃(𝐴) is the probability that A appears in the dataset, and 𝑃(𝐵) is the probability that B appears in the dataset (Santoso, 2021).
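These three metrics can be illustrated with a small self-contained Python sketch over a hypothetical transaction list:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, a, b):
    """support(A ∪ B) / support(A): reliability of the rule A -> B."""
    return support(transactions, a | b) / support(transactions, a)

def lift(transactions, a, b):
    """support(A ∪ B) / (support(A) * support(B)): dependence strength."""
    return (support(transactions, a | b)
            / (support(transactions, a) * support(transactions, b)))

# Hypothetical transactions (e.g. discretized weather observations).
tx = [{"hum_hi", "rain"}, {"hum_hi", "rain"}, {"hum_lo"}, {"hum_hi"}]
s = support(tx, {"hum_hi"})                   # 3 of 4 transactions
c = confidence(tx, {"hum_hi"}, {"rain"})      # 2 of the 3 hum_hi transactions
l = lift(tx, {"hum_hi"}, {"rain"})            # > 1 means positive dependence
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would under independence.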
The Apriori method first identifies frequent individual items in a given database. It then extends them to larger itemsets, checking that each itemset appears sufficiently often in the database. The aim is thus to identify frequent itemsets that satisfy a minimum support, such that the generated rules satisfy a minimum confidence. The algorithm first determines the frequent itemsets against a minimum support threshold. Next, rules are generated from the frequent itemsets by computing their confidence, after which the rules are pruned to remove redundant ones. The process iterates until all the frequent itemsets have been used to find rules.
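A compact sketch of the frequent-itemset stage of Apriori (level-wise candidate generation plus support pruning), again over hypothetical transactions:

```python
from itertools import combinations

def apriori_frequent(transactions, min_support):
    """Level-wise search: keep itemsets whose support meets min_support,
    build (k+1)-candidates only from surviving k-itemsets."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent, level = {}, [frozenset([i]) for i in items]
    while level:
        # Count and prune the current level.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = [c for c, cnt in counts.items() if cnt / n >= min_support]
        frequent.update({c: counts[c] / n for c in survivors})
        # Join step: union pairs of survivors into candidates one item larger.
        size = len(level[0]) + 1
        level = sorted({a | b for a, b in combinations(survivors, 2)
                        if len(a | b) == size}, key=sorted)
    return frequent

tx = [{"hum_hi", "rain"}, {"hum_hi", "rain"}, {"hum_lo"}, {"hum_hi"}]
freq = apriori_frequent(tx, min_support=0.5)
```

Rules are then read off the frequent itemsets and filtered by confidence, as described above.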
The FP-Growth (Frequent Pattern growth) algorithm builds conditional pattern bases on an FP-tree structure to generate the complete set of frequent patterns. The tree structure maintains the associations between the frequent itemsets, which are then analyzed for patterns of association (Gupta, 2019).

In this work, we focused on using the Apriori method to generate the decision rules from the reduct produced by the Rough Set theory method.
Linear regression is one of the most interpretable models in the machine learning realm today. It predicts a target as a weighted sum of inputs. Linear regression makes interpretability easy by exposing the linearity of the learned relationships, achieved by modeling the dependence of the target variable y on some features X. Generalized linear models are an offshoot of the interpretable linear regression model and are mainly used by statisticians and computer scientists to tackle quantitative problems (C. Molnar, 2019).

A GLM is an extension of the linear model that can model non-linear outcomes. Its key defining feature is that it allows non-Gaussian outcome distributions and connects those distributions to the weighted sum through a non-linear link function. Thus, a GLM can model a categorical outcome or a count outcome, which a linear model cannot produce (C. Molnar, 2019).
A GLM consists of three essential parts: (1) the random component, which refers to the probability distribution of the response variable (Y), e.g., a normal distribution for Y in linear regression or a binomial distribution for Y in binary logistic regression (also called the noise or error model); (2) the systematic component, which specifies the explanatory variables (X1, X2, …, Xk) in the model, precisely their linear combination in the so-called linear predictor, e.g., β0 + β1x1 + β2x2, as seen in linear regression; and (3) the link function, η or g(μ), which specifies the link between the random and systematic components. It states how the expected value of the response relates to the linear predictor of the explanatory variables, e.g., η = g(E(Yi)) = E(Yi) for linear regression or η = logit(π) for logistic regression.
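As a small illustration of the logit link (a sketch with hypothetical coefficients, not the paper's fitted model), the linear predictor maps to a rain probability through the inverse logit:

```python
import math

def logit(p):
    """Link function: maps a probability to the linear-predictor scale."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients beta0 + beta1*x1 + beta2*x2 for P(rain).
beta = [-1.0, 2.0, 0.5]
x = [1.0, 1.0]  # two input features
eta = beta[0] + beta[1] * x[0] + beta[2] * x[1]  # linear predictor, eta = 1.5
p = inv_logit(eta)  # probability of rain implied by the model
```

Applying the link to the probability recovers the linear predictor, which is what makes the coefficients directly interpretable on the log-odds scale.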
Table 1 below lists a few of the GLM models, their link functions, and the types of problems and data they handle.

Table 1: GLM models and their link functions
Model                  Distribution   Link Function       Data
ANOVA                  Normal         Identity            Categorical
ANCOVA                 Normal         Identity            Mixed
Logistic Regression    Binomial       Logit               Mixed
Log-linear             Poisson        Log                 Categorical
Poisson Regression     Poisson        Log                 Mixed
Multinomial response   Multinomial    Generalized Logit   Mixed
The link function in a GLM represents a distribution from the exponential family (C. Molnar, 2019). There is no predefined way of choosing the proper link function (C. Molnar, 2019); one must consider the target's distribution and how well the model fits the data. For this research, the binomial family with the logit link was used because the target variable in our dataset was categorical with two categories: Rain (1) and No Rain (0).
The decision rules generated from the association rule mining process and Rough Set theory are converted into binary values before being fitted into the GLM. The outcome should show whether the proposed approach has better prediction capabilities and interpretability than a classical GLM.
2.2 PROPOSED RAGL (Hybrid Rough Set Theory–Association Rule Mining–Generalized Linear Model)

Feature selection and dimensionality reduction are handled by the Rough Set theory method, which employs the greedy heuristic method to select the best features within the dataset. The association rule mining method generates decision rules from the selected features, and these rules in turn form the interactions between the selected features. The decision rules are converted into a data frame with binary value representations, which is then modeled with the Generalized Linear Model employing the logit function for classification. Because the data was in continuous form, the dataset must first be discretized.
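A minimal sketch of the kind of discretization this implies; the bin edges and labels below are hypothetical, loosely mirroring variable names such as Humidity_High.31.80. seen later:

```python
import bisect

def discretize(value, edges, labels):
    """Map a continuous reading to the label of the bin it falls into."""
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical humidity bins: (-inf,31], (31,80], (80,89], (89,inf).
edges = [31, 80, 89]                      # bin boundaries
labels = ["lt31", "31-80", "80-89", "gt89"]
band = discretize(85.0, edges, labels)    # "80-89"
```

Each continuous weather reading thus becomes a categorical band suitable for Rough Set granulation and itemset mining.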
The algorithm is described below:

ALGORITHM: RAGL

Input: Decision Table T = (U, A, C, D), where U is the universe of discourse, A is a family of equivalence relations over U, and C and D are the conditional and decisional attributes, respectively.

Output: RED of T — the output is a decision reduct RED derived from the input Decision Table T.

1. Rough Set theory greedy heuristic method to get the essential attributes and formulate them into a decision reduct RED:

    RED ← { }
    Q_max ← −∞
    while Q_max ≠ 0 do
        for each c ∈ C do
            Q_c ← Q_d(c, T)
            if Q_c > Q_max then
                Q_max ← Q_c
                c_best ← c
            end
        end
        RED ← RED ∪ {c_best}
    end
    return RED
Input: RED = {(c, d) | c ∈ C and d ∈ D}, k = 0 — the input is the decision reduct RED generated in step 1 of the RAGL algorithm, and k indexes the transactions obtained by converting the reduct into a transaction set of frequent items to be converted into decision rules.

2. Generate the decision rules from RED with the Apriori method:

    RED_k ← init()
    S ← { s ⊆ RED*(c) ∪ RED*(d) : |s| / |k| ≥ minsupp }
    C ← { rules over RED*(c) ∪ RED*(d) with confidence ≥ minconf }
    C_i ← candidate-gen(S_{i−1})
    for each transaction k ⊆ K do
        S_i ← { c ∈ C_i : c.support ≥ minsupp }
        if k.condset is included in RED then
            k.condsupport++
        CA_i ← { f ∈ F : f.confidence ≥ minconf }
    end for
    return CA ← ∪_i CA_i
3. We now have the decision rules in the form if 𝑋ᵢ = a and 𝑋ₙ = b then 𝑌 = c, where 𝑋ᵢ and 𝑋ₙ are the conditional attributes generated from steps 1 and 2 of RAGL, while Y is the decision attribute. We generate binary decision variables from the rules using the following mapping of the conditional attributes to their binary values:

    𝑋₁(0), 𝑋₂(1), 𝑋₃(0), …, 𝑋ₙ(1) = { 1 if 𝑋₁ = 0, 𝑋₂ = 1, 𝑋₃ = 0, …, 𝑋ₙ = 1; 0 otherwise }

4. Having converted the decision rules into binary values, we fit them into a GLM model.
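Step 3 can be sketched as follows: each selected rule becomes a new binary variable that is 1 exactly when the rule's condition set holds for an observation (the attribute names and values below are hypothetical):

```python
def rule_to_binary(observations, condition):
    """Turn one decision rule's condition set into a 0/1 column:
    1 where every (attribute, value) pair in the rule matches."""
    return [int(all(obs.get(a) == v for a, v in condition.items()))
            for obs in observations]

# Hypothetical discretized observations and one mined rule condition.
obs = [{"hum": "80-89", "wind": "lo"},
       {"hum": "31-80", "wind": "lo"},
       {"hum": "80-89", "wind": "hi"}]
rule = {"hum": "80-89", "wind": "lo"}  # IF hum=80-89 AND wind=lo THEN RAIN=1
col = rule_to_binary(obs, rule)        # one new GLM input variable per rule
```

Stacking one such column per selected rule yields the binary data frame that is then fitted with the logit-link GLM in step 4.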
3. EXPERIMENT SETUP
The experiment was performed on the Kariki Farm weather data scraped from the wunderground.com website. Kariki Farm is 22 hectares of cultivated land in the Juja area belonging to the Marginpar Group, which deals primarily with growing flowers and other horticultural crops. The farm houses a personal weather station (PWS) that relays its weather readings to the wunderground.com website. After scraping, the data was stored in a Google Sheet, from where it was loaded into RStudio for preprocessing and then modeled with the proposed RAGL model. The objective of the experiments was to assess whether combining Rough Set theory and association rule mining to detect interactions in a Generalized Linear Model can improve the accuracy and interpretability of the Generalized Linear Model.
Rough Set theory was used as the feature selection model. It selected the most critical of the 17 features in the dataset, leaving only the ten features that impacted the target variable, Rain. From the chosen features, association rule mining using the Apriori method was used to mine frequent itemsets and ultimately generate decision rules. The support was set at 0.01 and the confidence at 0.8, generating a total of 1431 rules. The rules were pruned to remove redundant and duplicate rules, leaving 54 of the 1431 rules. These were later converted into binary values by choosing the top 30 rules with the highest lift/confidence/support.
The experiments were carried out in two scenarios, one with interaction detection and the other without. The results were analyzed on two fronts. For prediction capabilities, the metrics of accuracy, precision, recall, and pseudo R-squared were used. For interpretability capabilities, the model-based explanation sub-metric under the functional-level metric proposed by Doshi-Velez and Kim (2017) was used. Here we compared the complexity of the RAGL and classical logit models, using the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) to check each model's complexity and which model fit the data well and gave good predictions. We also studied the interaction strength of the features in the two models and compared which model gave a comprehensive cause-and-effect account of the features.
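For reference, the two complexity criteria used in this comparison are defined, for a model with k estimated parameters, maximized likelihood L̂, and n observations, as:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Lower values indicate a better complexity-adjusted fit, with BIC penalizing additional parameters more heavily as n grows.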
4. RESULTS AND DISCUSSION
This section discusses the experimental results of the classical logit model and the proposed RAGL model in terms of prediction and interpretability capabilities.
4.1.1 Experiment Prediction Metrics: Prediction and interpretation using the classical GLM model without interaction detection on the weather dataset against the results of our proposed model, RAGL. The experiment showed that the proposed model performed better in both prediction and interpretability capabilities.
Table 2: Classical GLM vs. RAGL results
McFadden's pseudo R-squared was used. The results show that our proposed model performed better in prediction than the classical GLM. Reviewing the pseudo R-squared measurement shows that RAGL scored higher than the classical GLM without interaction detection; the higher value indicates a better fit of the RAGL model to the data. Regarding classification accuracy, RAGL showed a 6% increase over the classical GLM. These results are represented graphically in Figure 1, which shows that the proposed RAGL model performed better in classification accuracy, precision, recall, and pseudo R-squared than the classical GLM.
Figure 1: Classification accuracy, precision, recall/sensitivity, and pseudo R-squared for the classical GLM model without interaction detection vs. the RAGL model.
Complexity metrics (AIC and BIC) for the classical GLM model without interaction detection vs. the RAGL model.
The AIC and BIC values for the Classical GLM and the proposed RAGL significantly differ. The
AIC and BIC values for RAGL are 22 and 80. To find the attributes that led to a significant drop
in the AIC values, we ran stepwise regression on the RAGL model to determine which attributes
er
led to this significant drop. The attributes determined from the RAGL model are as depicted in
table 3:
Step 4: AIC = 38
RAIN ~ High_Temp.15.7.23.7. + High_Temp.25.3.30.3. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

Step 5: AIC = 36
RAIN ~ High_Temp.15.7.23.7. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

Step 6: AIC = 34
RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

Step 7: AIC = 32
RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + … + High.Hpa..1024.1100.

Step 10: AIC = 26
RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

Step 12: AIC = 22
RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + High.Hpa..1023.1024. + High.Hpa..1024.1100.
We obtain the lowest AIC for our proposed model when considering the ten variables deduced from the RAGL model on the dataset. These attributes reveal that RAGL split the detected interactions according to their respective value ranges and their relation to the RAIN target variable, in contrast to the classical GLM model, which assumes that all predictors interact with the target variable in the same way. Using these predictors, the model can be explained easily as a whole.
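The stepwise procedure above greedily removes, at each step, the predictor whose elimination yields the largest drop in AIC, stopping when no removal improves it. A generic sketch of this backward-elimination loop (illustrative only; `fit_aic` stands in for refitting the GLM on a subset of predictors and is not part of the study's code):

```python
def backward_stepwise(features, fit_aic):
    """Backward elimination by AIC: repeatedly drop the feature whose
    removal lowers AIC the most; stop when no removal helps."""
    current = list(features)
    best_aic = fit_aic(current)
    while current:
        # Try dropping each feature in turn and score the resulting subsets
        candidates = [[f for f in current if f != drop] for drop in current]
        scored = sorted((fit_aic(c), c) for c in candidates)
        cand_aic, cand_set = scored[0]
        if cand_aic < best_aic:
            best_aic, current = cand_aic, cand_set
        else:
            break
    return current, best_aic
```

With a toy AIC that charges 2 per parameter kept and heavily penalizes dropping a truly predictive feature, the loop prunes the noise features and stops at the signal set.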
4.1.2 Experiment Interpretability Metrics: Interpretability of the classical GLM model vs. interpretability of the RAGL model.
Table 2 shows that the RAGL model had better AIC and BIC scores than the classical GLM model, indicating a better fit to the data. The complexity of the model was also significantly reduced, because RAGL built the model only on the detected interactions instead of also considering non-important features. Regarding the attribute coefficients, the RAGL model gave a better account of the features because it discretized the continuous features before generating decision rules on them. The analysis of the deviance table of the classical GLM model in Table 4 vs. that of RAGL in Table 5 shows that RAGL had a better fit and allowed a better understanding of the model parameters and coefficients. This can be seen in how our proposed model estimated the coefficients of the predictors. For example, compare the High_Temp predictor in the classical GLM model with the High_Temp(15.7;23.7) predictor in the RAGL model, where 15.7 to 23.7 is the temperature range in question. The p-values of these two coefficients differ: the RAGL predictor High_Temp(15.7;23.7) had a much lower p-value than the classical GLM predictor and was therefore more significantly associated with the outcome variable. The classical GLM model treats the predictor High_Temp as if its effect were the same across all ranges of high temperature, which we know is not the case.
Table 4: ANOVA (deviance) table for the classical GLM model (excerpt)

                 Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
Dewpoint_low      1      1.07       1614      1191.8  0.3001695
Humidity_High     1     13.17       1613      1178.6  0.000284 ***
Humidity_Avg      1     37.63       1612      1141.0  8.532e-10 ***
Humidity_Low      1      4.62       1611      1136.4  0.0315486 *
Windspeed_High    1      0.02       1610      1136.3  0.8903394
Windspeed_Avg     1     33.73       1609      1102.6  6.320e-09 ***
High(Hpa)         1     46.26       1608      1056.4  1.038e-11 ***
Low(Hpa)          1      1.92       1607      1054.4  0.1659203
Table 5: ANOVA (deviance) table for RAGL (excerpt)

                       Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
NULL                                        36       49.61
High_Temp.15.7.23.7.    1   15.1656         35       34.795  9.848e-05 ***
High_Temp.23.7.25.3.    1    1.2918         34       33.503  0.255714
High_Temp.25.3.30.3     1    0.6877         33       32.815  0.406964
Low_Temp..40.14.5.      1    0.7189         32       32.097  0.396514
Low_Temp.14.5.15.9.     1    2.3137         31       29.783  0.128236
Low_Temp.15.9.26.3.     1    1.1520         30       28.631  0.283125
5. DISCUSSION
Rough set theory and association rule mining can improve the interpretability of logistic regression by identifying the most critical factors or rules contributing to the model's predictions. Rough set theory helped identify the most critical factors driving the model's predictions and reduced the number of input variables that needed to be considered, making the model more interpretable; it was used to identify a minimal subset of input variables sufficient to predict the output variable accurately. Association rule mining was used to identify patterns or rules within the data associated with a particular outcome. These rules were used to explain the model's predictions and to help understand how the input variables relate to the output variable.
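The core rough-set notion at work here is approximating the decision class through the equivalence classes (information granules) induced by the condition attributes. A toy sketch of lower and upper approximations using Python sets (illustrative only, not the study's implementation):

```python
def approximations(equiv_classes, target):
    """Lower approximation: granules entirely inside the target set.
    Upper approximation: granules that overlap the target at all."""
    lower, upper = set(), set()
    for granule in equiv_classes:
        if granule <= target:   # granule fully contained in the target
            lower |= granule
        if granule & target:    # granule intersects the target
            upper |= granule
    return lower, upper
```

Objects in the upper but not the lower approximation form the boundary region, where the granulation cannot decide class membership; attributes that shrink this boundary are the ones rough set theory retains.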
Rough set theory and association rule mining helped identify relevant relationships between variables, which were used to improve the model's accuracy. Generalized linear models were then used to model these relationships in a more sophisticated and precise manner, resulting in a hybrid model that was more accurate than a single-model approach. Used in combination with logistic regression, both methods improved interpretability by providing a more comprehensive understanding of the data and of the factors driving the model's predictions.
RAGL had better prediction metrics and interpretability than the classical GLM model.
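The rule-quality measures driving the association-rule step are support and confidence. A minimal sketch over itemset "transactions" (here, discretized weather observations; the item names are illustrative):

```python
def rule_metrics(transactions, antecedent, consequent):
    # support: fraction of transactions containing both sides of the rule;
    # confidence: of those matching the antecedent, the fraction also matching
    # the consequent
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence
```

Rules with high support and confidence, such as a high-humidity range implying RAIN, become the candidate interaction terms passed on to the GLM.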
6. CONCLUSION
The research combined rough set theory, association rule mining, and the generalized linear model into one hybrid model. Through this hybrid model, we saw an increase in interpretability: rough set theory and association rule mining provided insights into the relationships between variables and helped identify critical features, and combining the two with the generalized linear model resulted in a hybrid model that was more interpretable and easier to understand. Accuracy also improved, as feature selection via rough set theory and association rule mining identified the critical features in the data, and the generalized linear model was then used for prediction on these detected features. The generalized linear model was able to handle these detected features more efficiently and provide better predictions.