HYBRID INTERPRETABLE MODEL USING ROUGH SET THEORY AND ASSOCIATION RULE MINING TO DETECT INTERACTION TERMS IN A GENERALIZED LINEAR MODEL (RAGL)

Isaac Kega Mwangi1*; Lawrence Nderu2; Ronald Waweru Mwangi3; Dennis Gitari Njagi4
1. School of ICT, Media and Engineering, Zetech University, P.O. Box 2768 – 00200
Nairobi, Kenya
* E-mail of the Corresponding author: isaac.kega@zetech.ac.ke, Mobile Number: +254
720935632.

2. School of Computing and Information Technology, Jomo Kenyatta University of
Agriculture and Technology, P.O. Box 62000-00200 Nairobi, Kenya.
Email: lnderu@jkuat.ac.ke
3. School of Computing and Information Technology, Jomo Kenyatta University of
Agriculture and Technology, P.O. Box 62000-00200 Nairobi, Kenya.
Email: waweru_mwangi@icsit.jkuat.ac.ke
4. School of Computing and Information Technology, Jomo Kenyatta University of
Agriculture and Technology, P.O. Box 62000-00200 Nairobi, Kenya.
Email: dennis.njagi@jkuat.ac.ke
ABSTRACT
Interpretability is a critical concern in the machine-learning realm. Detecting interactions in the data is one fundamental way in which intrinsic models exhibit their interpretability. A Generalized Linear Model (GLM) is an example of an intrinsically interpretable model that uses interaction detection in its operation. However, Generalized Linear Models do not search the whole sample space for variable interactions, assuming instead that all variables interact with the target variable(s) in the same way. This paper proposes a hybrid model, RAGL, which uses Rough Set theory to detect interaction terms through information granulation and then derives decision rules from these detected terms via association rule mining. These rules contain the potential high and low interactions, which are then selected and used as new variables in the GLM. This study showed that the proposed model could detect high- and low-level interactions within the whole sample space of a given dataset and ultimately use the important interaction terms for prediction purposes. Weather data for Kariki_farm in Juja was used to train and test the proposed model and to evaluate it against the classical GLM. Interaction detection using the proposed model performed better than the classical GLM in terms of accuracy and interpretability.

1.0 INTRODUCTION
The task of explaining the outcome of a machine-learning model in a way humans can understand is known as interpretability (Doshi-Velez, 2017). Interpretability is vital both to satisfy human curiosity about the tasks being handled and to manage uncertainty and bias in the decisions being made (C. Molnar, 2019).
Interpretable machine learning techniques can be grouped into intrinsic interpretability and post-hoc interpretability. Intrinsic models incorporate interpretability into their underlying structure, whereas post-hoc interpretability requires creating a second model to explain an existing one. Examples of intrinsic machine learning models are decision trees, rule-based models, linear models, and attention models. In contrast, post-hoc methods include Permutation Feature Importance, Shapley values, and LIME (local surrogate models). The main difference between these two groups lies in the trade-off between model accuracy and explanation fidelity. Inherently interpretable models can provide accurate and undistorted explanations but may sacrifice prediction performance. Post-hoc methods are limited by their approximate nature but leave the underlying model's accuracy intact (Du, Liu, & Hu, 2019).

Furthermore, interpretability can be classified as global or local. Global interpretability means understanding the structure and parameters of the model as it makes predictions across the whole input space, giving a holistic view without examining individual cases. Local interpretability can help uncover the causal relations between a specific input and its corresponding model prediction (Du, Liu, & Hu, 2019); it inspects an individual prediction of the model and tries to make sense of how the model arrived at that prediction (C. Molnar, 2019). Global interpretability can illuminate the inner workings of machine learning models and thus increase their transparency.
Interaction detection identifies and analyzes the interactions between different input variables in a given model. By understanding the interactions between the input variables, one better understands how the model uses them to make predictions. This, in turn, helps one understand how the model makes decisions and provides insights into the model's strengths and weaknesses.
Any suitable explanation method must account for interactions between features. The two main objectives in the domain of feature interaction detection are (1) to find a group of features that depend on one another, known as feature interaction detection, and (2) to interpret in what way the detected group of features interact with one another, known as feature interaction interpretation (Tsang, Enouen, & Liu, 2021).
Assessing the interpretability of a machine learning method is also very important. There are three ways of determining the interpretability of a machine learning method: (1) Application-grounded evaluation involves conducting human experiments within a natural application environment; for example, a researcher with a concrete application in mind will use domain experts to test the workability of the solution within that environment. This metric tests the workability of the model built for that environment but is expensive and not easy to carry out. (2) Human-grounded evaluation uses lay humans instead of domain experts to test the workability of the model. This allows a vast pool of people to test the model, and at the same time the experiment is less expensive than application-grounded evaluation. (3) The functionally grounded metric uses a formal definition of interpretability as a proxy for explanation quality (Doshi-Velez & Kim, 2017).
Zhou et al. (2021) added that, in functionally grounded interpretability, some definition of interpretability is used as a proxy to evaluate explanation quality. They further divided the functionally grounded metric into three types, namely model-based, attribution-based, and example-based explanations. Model-based explanations use an intrinsically interpretable model to explain the original task model. Examples of quantitative metrics in this method are the model size, the runtime operation count, and the interaction strength, which looks at how a feature's effect depends on other features' values; these can be used to evaluate the simplicity of local and global model-based explanations. Attribution-based explanations measure the explanatory power of the input features and use them to explain the task model. Example metrics here include the mutual information between original samples and the corresponding features extracted for explanations, which monitors the simplicity and broadness of the explanations, and the mutual information between extracted features and related targets, which monitors the fidelity of the explanations. Example-based explanations explain the task model by selecting instances from the training/testing dataset or by creating new instances.

The functionally grounded metric will be used as the basis for evaluating the interpretability of our proposed model, both because the model is a hybrid and thus still in its development phase and because of the expense associated with the other two metrics.

2.0 PROBLEM STATEMENT
The Generalized Linear Model is an intrinsically interpretable model: it belongs to the subset of machine learning models whose simple internal structure can be used to interpret the prediction results of the model. Though it is an intrinsic model, the GLM falls short in the way it detects interactions among the input variables. The model assumes that all interactions are the same and, as such, does not search the whole sample space for these interactions (Changpetch & Lin, 2013). An ideal interpretable model should not only provide accurate predictions but also provide a means of explaining how the model arrived at a prediction. One key ingredient of this is interaction detection.


Association rule analysis is a robust methodology for exploring relationships among items in the form of decision rules. This article presents a hybrid model that uses Rough Set theory and association rule mining to detect interactions in a Generalized Linear Model (RAGL). The advantage of RAGL lies in its ability to use the information granulation procedures of Rough Set theory to separate indiscernible features from discernible ones through the greedy heuristic reduct method, which formulates a reduct containing all the essential attributes. The reduct is then used to generate decision rules with the Apriori method for association rule mining. The work's main contributions can be summarized as follows: (1) RAGL uses information granulation from Rough Set theory to detect interactions, reduce the sample space, and obtain a reduct containing the detected interactions necessary for prediction; (2) RAGL generates decision rules from this reduct, a robust process for exploring relationships that ultimately decomposes the interactions into their most refined form for easy understanding. The selected rules are those with the highest confidence and support, and redundant or duplicated rules are discarded; (3) RAGL converts the selected rules into binary variables, which become the interactions used for prediction purposes.

2.1 RELATED WORK

This section reviews work on the use of interaction detection to aid the interpretability and prediction prowess of machine learning models. In Tsang, Enouen, and Liu (2021), the authors made a case for feature interaction in interpretable AI. The authors noted that most post-hoc methods used on complex black-box models like neural networks do not consider the shared importance of groups of features in a dataset and instead look at the effect of each feature individually, which hampers the interpretability of the models. They proposed two objectives for feature interaction in the interpretability of machine learning models: (1) feature interaction detection, which is to find a group of features that depend on one another, and (2) feature interaction interpretation, which deals with understanding how a group of features interact with one another through interpretation of coefficients or interaction attributions, among other methods.

One popular method for interaction detection in GLMs is adding interaction terms to the model. It involves creating a new variable that is the product of two or more predictors and including it as an additional predictor in the model. The coefficient associated with the interaction term measures the strength of the interaction effect. This method is simple to implement and provides a straightforward way of detecting interaction effects. However, it can be computationally intensive when the number of predictors is large, and it can lead to overfitting and poor generalization. Another issue relates to selecting the appropriate interaction terms to include in the model, which may require domain knowledge or exploratory data analysis to identify the interactions before modeling them (McCabe et al., 2022).
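To make this concrete, the following is a minimal sketch of fitting a GLM with an explicit product interaction term using Python's statsmodels (the column names rain, temp, and humidity are hypothetical; the paper's own experiments were run in R):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical weather data: 'rain' is a 0/1 target,
# 'temp' and 'humidity' are continuous predictors.
df = pd.DataFrame({
    "rain":     [0, 1, 1, 0, 1, 0, 1, 0],
    "temp":     [25.1, 18.3, 19.0, 27.6, 17.2, 26.4, 20.5, 24.9],
    "humidity": [55, 88, 91, 40, 95, 48, 82, 60],
})

# 'temp * humidity' expands to temp + humidity + temp:humidity,
# i.e., both main effects plus their product interaction term.
model = smf.glm("rain ~ temp * humidity",
                data=df,
                family=sm.families.Binomial()).fit()

# The coefficient of temp:humidity measures the interaction strength.
print(model.summary())
```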
Another method for interaction detection in GLMs is the use of polynomial or spline-based functions, which involves transforming the predictors into polynomial or spline functions and including them in the model. Zhang et al. (2019) used this method to analyze the relationship between age and body mass index in a large cohort of women. An issue with this method is that increasing the degree of the polynomial functions, or the number of knots in spline functions, can lead to models with a high degree of complexity, making the results difficult to interpret and leading to computational challenges. Selecting the appropriate polynomial degree or number of knots can also be challenging and may require domain knowledge or exploratory data analysis to determine the most appropriate model (Perperoglou et al., 2019).
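For illustration, a spline term can be included in a GLM formula via patsy's bs() basis in statsmodels (a sketch under hypothetical synthetic data, not the cited study's code):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: model bmi as a smooth function of age.
rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 200)
bmi = 22 + 0.002 * (age - 45) ** 2 + rng.normal(0, 1, 200)
df = pd.DataFrame({"age": age, "bmi": bmi})

# bs(age, df=4) builds a cubic B-spline basis with 4 degrees of
# freedom; the choice of df (or knots) controls model complexity.
model = smf.glm("bmi ~ bs(age, df=4)",
                data=df,
                family=sm.families.Gaussian()).fit()
print(model.summary())
```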

Tsang et al. (2020) proposed GLIDER (Global Interaction Detection and Encoding for Recommendation), a model that uses neural interaction detection with LIME for data-instance perturbation over a batch of data samples and then explicitly encodes the collected global interactions into a target model via sparse feature crossing. Their proposed model improved the target model's prediction performance, and the detected global interactions were explainable. However, GLIDER had issues with computational complexity and interpretation of the results. The interactions detected in GLIDER were confined to local interactions and hence could not provide a global understanding of interactions in the model. Despite using LIME, interpreting GLIDER's results can be challenging, especially for people without knowledge of deep learning, and training a deep neural network is computationally expensive, consuming both time and system resources.

Sumalatha, Uma Sankar, and Sujatha (2016) proposed the use of Rough Set Theory to find behavioral patterns of bank customers. Their method started by computing a decision reduct using the discernibility matrix and the degree of dependency between the attributes to find the critical attributes. They then used the selected attributes to mine decision rules. They stated that the advantages of their method were (1) dimensionality reduction and (2) 90% accuracy on customer deposit behavior using the generated decision rules.

Xun, Xu, and Qi (2012) used Rough Set Theory as a basis for mining association rules from a given dataset. Their research combined the idea of the Apriori algorithm with a decision table. This method has three advantages: it eliminates redundant attributes, reduces the number of attributes, and can produce decision attribute sets at the cost of only one decision table scan. The method removed redundant attributes through the simplified decision table and then generated frequent itemsets with the improved Apriori algorithm.

Slimani (2015) used the rough set method to mine class association rules, which contain classes as their consequents. The paper discussed an efficient algorithm, inspired by the Apriori algorithm, for discovering these rules. It is based on a principle from rough set theory that involves examining the elementary sets of the lower approximations included in rough sets. This approach is simpler and more effective than other classic methods of finding association rules. The proposed algorithm works by first making multiple passes over the data and counting the support of each 1-rule item (each rule item contains one item in its condition set). A particular expression denotes the set of all rule items, and the algorithm's goal is to identify the frequent candidate k-rule items. The unique difference between the C_Apriori algorithm and the Apriori algorithm is that in C_Apriori, rule items belonging to the same class are joined.
Ong, Huang, and Tzeng (2004) proposed detecting interactions between different factors through information granulation, which is the basis of Rough Set theory. To simplify the problem, they applied decision rules based on Rough Set theory to reduce the number of factors to consider. After this, they used stepwise selection to determine the significant interaction effects with the largest influence on the model. The authors found that when the logit model incorporated these interactions, it performed better than other methods.

2.1.1 Rough Set Theory

Rough Set theory is a knowledge discovery method widely applied to relational databases. Professor Pawlak first introduced it in 1982. It uses information granulation to group similar objects into single collective entities (granules) in order to simplify information representation and decision-making. Information granulation reduces the complexity of the information while preserving the essential properties of, and relationships between, objects. This allows for more efficient and effective decision-making based on the simplified information.

In rough set theory, we use an information system $I = (U, A)$ to represent data with its associated attributes. We can then calculate an indiscernibility relation $U/IND(A)$, and the elements of each indiscernibility class are known as granules. A granule represents a set of conditional attributes that affect a decisional attribute and can be represented as $G = \{C_1, C_2, C_3, \ldots, C_n\}$ satisfying a decision $C_i \to D_i$, where $D_i$ is the decision attribute represented by the detected information granule. Rough Set theory uses the concept of approximations, $apr = (U, A)$, where $U$ is the universe of discourse and $A$ is the set of attributes, to define the features that certainly belong to the target set (the lower approximation) and those that may belong to it (the upper approximation) (Raza & Qamar, 2017).

The lower approximation defines the objects that certainly belong to the target set $X$:

$$\underline{apr}(A) = \{x \mid x \in U,\ [x]_{IND(A)} \subseteq X\} \quad \text{(i)}$$

The upper approximation is the set of objects that may belong to the target set $X$:

$$\overline{apr}(A) = \{x \mid x \in U,\ [x]_{IND(A)} \cap X \neq \emptyset\} \quad \text{(ii)}$$

The lower approximations are used to determine the reduct, the Rough Set theory construct obtained by removing irrelevant features from the decision table/information system (Raza & Qamar, 2017).
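A toy sketch of these definitions in Python (illustrative only, not the authors' implementation): objects are grouped into granules by their attribute values, and the two approximations of a target set X are computed from those granules.

```python
from collections import defaultdict

# Toy information system I = (U, A): each object maps to its values
# on the conditional attributes (e.g., humidity level, wind).
objects = {
    "o1": ("high", "windy"),
    "o2": ("high", "windy"),
    "o3": ("low", "calm"),
    "o4": ("low", "calm"),
    "o5": ("high", "calm"),
}
X = {"o1", "o3", "o5"}  # target set, e.g., days on which it rained

# U/IND(A): granules are equivalence classes of objects that are
# indiscernible, i.e., share identical attribute values.
granules = defaultdict(set)
for obj, values in objects.items():
    granules[values].add(obj)

# Lower approximation (eq. i): granules wholly contained in X.
lower = {o for g in granules.values() if g <= X for o in g}
# Upper approximation (eq. ii): granules that intersect X.
upper = {o for g in granules.values() if g & X for o in g}

print(sorted(lower))  # ['o5']: certainly in the target set
print(sorted(upper))  # all five objects: possibly in the target set
```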

The generation of the reduct is the next pivotal step, and the method used is the greedy heuristic reduct generation method, which computes decision reducts with a greedy algorithm. The algorithm relies on a function $Q_d : A \times 2^A \to \mathbb{R}^+ \cup \{0\}$ corresponding to a monotonic attribute quality measure, in the sense that it decreases as the size of the set in its second argument increases. This function must also equal 0 whenever the second argument is a superreduct, i.e., a collection of attributes that discerns all objects from different decision classes (Janusz & Ślęzak, 2014).

2.1.2 Association Rule Mining

Association rule mining is a data mining technique that identifies relationships between items in large datasets. It is based on finding associations between items in a transaction database and quantifying the rules generated from those associations by their support, confidence, and lift. These rules can be used for various purposes, such as market basket analysis, recommendation systems, and fraud detection. The rules take the form $A \to B$, where A is the antecedent and B is the consequent of the rule (Abdel-Basset et al., 2018), primarily expressed as IF...THEN... statements. The most commonly used algorithms for association rule mining are the Apriori algorithm and the FP-Growth algorithm. Association rules are mined and filtered based on three metrics:
Support is the number of transactions containing a particular itemset divided by the total number of transactions. It indicates how often the association rule appears in the dataset.

$$\mathrm{support}(A \to B) = P(A \cup B) \quad \text{(iii)}$$

Confidence is the proportion of transactions containing itemset A that also contain itemset B; typically, it measures how reliable a rule is.

$$\mathrm{confidence}(A \to B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)} \quad \text{(iv)}$$

Lift indicates the strength of the dependence between the antecedent and consequent of the association rule.

$$\mathrm{lift}(A \to B) = \frac{P(A \cup B)}{P(A)\,P(B)} \quad \text{(v)}$$

Here $P(A \cup B)$ is the probability that A and B occur together in the data being analyzed, $P(B \mid A)$ is the conditional probability of B given A, $P(A)$ is the probability that A appears in the dataset, and $P(B)$ is the probability that B appears in the dataset (Santoso, 2021).
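As a concrete (hypothetical) illustration: if itemsets A and B appear together in 20 of 100 transactions, then support(A→B) = 20/100 = 0.2; if A appears in 25 transactions, confidence(A→B) = 0.2/0.25 = 0.8; and if B appears in 40 transactions, lift(A→B) = 0.2/(0.25 × 0.4) = 2, meaning A's presence makes B twice as likely as its baseline frequency.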

The Apriori method identifies frequent individual items in a given database and then extends them to larger itemsets, checking that each itemset appears sufficiently often in the database. The aim is to identify frequent itemsets that satisfy a minimum support, so that the generated rules satisfy a minimum confidence. The algorithm first determines the frequent itemsets by applying a minimum support threshold. Next, rules are generated from the frequent itemsets by computing their confidence, after which the rules are pruned to remove redundant ones. The process iterates until all the frequent itemsets have been used to find rules.

The FP-Growth (Frequent Pattern growth) algorithm compresses the database into an FP-tree structure and recursively builds conditional pattern bases from it to generate the complete set of frequent itemsets. The tree structure maintains the associations between the frequent itemsets, which are then analyzed for patterns of association (Gupta, 2019).
In this work, we focused on using the Apriori method to generate decision rules from the reduct produced by the Rough Set theory step.
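A sketch of this rule-generation step in Python using the mlxtend library (the experiments themselves were run in R; the transactions below are hypothetical discretized weather records):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each "transaction" is one day's discretized attribute values from the reduct.
transactions = [
    ["Humidity=high", "Temp=low",  "Rain=yes"],
    ["Humidity=high", "Temp=low",  "Rain=yes"],
    ["Humidity=low",  "Temp=high", "Rain=no"],
    ["Humidity=high", "Temp=high", "Rain=no"],
    ["Humidity=high", "Temp=low",  "Rain=yes"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Mine frequent itemsets, then derive rules filtered by confidence.
frequent = apriori(onehot, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)

print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```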

2.1.3 GLM (Generalized Linear Model)


Linear regression is one of the most interpretable models in the machine learning realm today. It predicts a target as a weighted sum of the inputs. Linear regression makes interpretation easy because of the linearity of the learned relationships, achieved by modeling the dependence of the target variable y on the features X. Generalized linear models are an offshoot of linear regression, used mainly by statisticians and computer scientists to tackle quantitative problems (C. Molnar, 2019).

A GLM is an extension of the linear model that can model non-linear outcomes. Its key defining feature is that it allows non-Gaussian outcome distributions and connects those distributions to the weighted sum of the inputs through a (possibly non-linear) link function. Thus, a GLM can model categorical and count outcomes, among others, which a linear model cannot produce (C. Molnar, 2019).
A GLM consists of three essential parts: (1) the random component, which refers to the probability distribution of the response variable Y, e.g., a normal distribution for Y in linear regression or a binomial distribution for Y in binary logistic regression (also called the noise or error model); (2) the systematic component, which specifies the explanatory variables (X1, X2, ..., Xk) in the model, precisely their linear combination in creating the so-called linear predictor, e.g., β0 + β1x1 + β2x2, as seen in linear regression; and (3) the link function, η or g(μ), which specifies the link between the random and systematic components. It says how the expected value of the response relates to the linear predictor of the explanatory variables, e.g., η = g(E(Yi)) = E(Yi) for linear regression or η = logit(π) for logistic regression.
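For the binary logistic case used later in this paper, the link function and its inverse can be written out explicitly (a standard identity, stated here for reference):

$$\eta = \operatorname{logit}(\pi) = \log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \qquad \pi = \frac{1}{1 + e^{-\eta}}$$

so the expected probability of the positive class is recovered from the linear predictor through the inverse link.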
The table below lists a few of the models covered by the GLM framework, their link functions, and the types of data they handle.

Table 1: GLM models and their link functions

Model                  Random        Link               Systematic
Linear Regression      Normal        Identity           Continuous
ANOVA                  Normal        Identity           Categorical
ANCOVA                 Normal        Identity           Mixed
Logistic Regression    Binomial      Logit              Mixed
Log-linear             Poisson       Log                Categorical
Poisson Regression     Poisson       Log                Mixed
Multinomial Response   Multinomial   Generalized Logit  Mixed
Each distribution from the exponential family has a representative link function in the GLM (C. Molnar, 2019). There is no predefined way of choosing the proper link function (C. Molnar, 2019); one must consider the target's distribution and how well the model fits the data. For this research, the binomial family with the logit link was used because the target variable in our dataset was categorical with two categories: Rain (1) and No rain (0).
The decision rules generated by the rough set theory and association rule mining process are converted into binary values before being fitted into the GLM. The outcome to be assessed is whether the proposed approach has better prediction capabilities and interpretability than a classical GLM.

2.2 PROPOSED RAGL (Hybrid Rough Set Theory - Association Rule Mining - Generalized Linear Model) MODEL


The proposed hybrid model provides (1) feature interaction detection leading to dimensionality reduction, (2) model selection for the Generalized Linear Model, and (3) improved classification performance using the detected features. Feature selection and dimensionality reduction are handled by the Rough Set theory method, which employs the greedy heuristic method to select the best features within the dataset. The association rule mining method generates decision rules from the selected features, and these rules in turn form the interactions between the features. The decision rules are converted into a data frame of binary-valued representations, which are then modeled with the Generalized Linear Model using the logit function for classification. Because the data was in continuous form, the dataset must first be discretized.
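As an illustration of this discretization step, a minimal pandas sketch (the bin edges are taken from the High_Temp ranges that appear later in Table 3; the helper column name is hypothetical):

```python
import pandas as pd

# Hypothetical continuous readings for one attribute.
df = pd.DataFrame({"High_Temp": [16.2, 24.1, 28.9, 22.0, 26.7]})

# Bin the continuous attribute into labeled intervals, e.g. the
# (15.7, 23.7], (23.7, 25.3], (25.3, 30.3] ranges seen in Table 3.
df["High_Temp_bin"] = pd.cut(
    df["High_Temp"],
    bins=[15.7, 23.7, 25.3, 30.3],
    labels=["High_Temp(15.7;23.7)",
            "High_Temp(23.7;25.3)",
            "High_Temp(25.3;30.3)"],
)
print(df)
```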

The algorithm is described below:

ALGORITHM: RAGL

Input: Decision table T = (U, A, C, D), where U is the universe of discourse, A is a family of equivalence relations over U, and C and D are the conditional and decisional attributes, respectively; and a quality measure $Q_d : C \times 2^C \to \mathbb{R}^+ \cup \{0\}$.

Output: A decision reduct RED of T.

Step 1: Rough Set theory greedy heuristic to select the essential attributes and formulate them into a decision reduct RED.

    RED ← { }
    Q_max ← −∞
    while Q_max ≠ 0 do
        for each c ∈ C do
            Q_c ← Q_d(c, RED)
            if Q_c > Q_max then
                Q_max ← Q_c
                c_best ← c
            end
        end
        if Q_max > 0 then
            RED ← RED ∪ {c_best}
            Q_max ← −∞
        end
    end
    for each c ∈ RED do
        if Q_d(c, RED \ {c}) = 0 then
            RED ← RED \ {c}
        end
    end
    return RED

Step 2: Generate decision rules from the reduct with the Apriori method.

Input: $RED = \{(c, d) \mid c \in C \text{ and } d \in D\}$, the decision reduct generated in Step 1, viewed as a set of transactions K containing the frequent items to be converted into decision rules, together with thresholds minsupp and minconf. A candidate rule item $c \to d$ is retained when

$$\frac{|RED^*(c) \cup RED^*(d)|}{|K|} \geq minsupp \qquad \text{and} \qquad \frac{|RED^*(c) \cup RED^*(d)|}{|RED^*(c)|} \geq minconf$$

    S_1 ← the frequent 1-rule items of RED
    for (i = 2; S_{i−1} ≠ ∅; i++) do
        C_i ← candidate-gen(S_{i−1})
        for each transaction k ∈ K do
            if k.condset is included in RED then
                k.condsupport++
        end for
        S_i ← {c ∈ C_i | c.support ≥ minsupp}
        CA_i ← {f ∈ S_i | f.confidence ≥ minconf}
    end for
    return CA ← ∪_i CA_i
Step 3: We now have decision rules of the form: if $X_i = a$ and $X_n = b$ then $Y = c$, where $X_i$ and $X_n$ are the conditional attributes generated in Steps 1 and 2 of RAGL and Y is the decision attribute. We generate binary decision variables from the rules using the following mapping of the conditional attributes to their binary representation:

$$X_1(0), X_2(1), X_3(0), \ldots, X_n(1) = \begin{cases} 1 & \text{if } X_1 = 0,\ X_2 = 1,\ X_3 = 0, \ldots, X_n = 1 \\ 0 & \text{otherwise} \end{cases}$$

Step 4: Having converted the decision rules into binary values, we fit them into a GLM.
3. EXPERIMENT SETUP
The experiment was performed on Kariki Farm weather data scraped from the wunderground.com website. Kariki Farm is 22 hectares of cultivated land in the Juja area belonging to the Marginpar Group, which deals primarily with growing flowers and other horticultural crops. The farm houses a personal weather station (PWS) that relays its weather readings to the wunderground.com website. After scraping, the data was stored in a Google Sheet, loaded into RStudio for preprocessing, and then modeled with the proposed RAGL model. The objective of the experiments was to assess whether combining Rough Set theory and association rule mining to detect interactions in a generalized linear model can improve the accuracy and interpretability of the model. Rough Set theory was used as the feature selection model. It selected the most critical features from the 17 features in the dataset, leaving only ten features that impacted the target variable, Rain. From the chosen features, association rule mining using the Apriori method was used to mine frequent itemsets and ultimately generate decision rules. The support threshold was set at 0.01 and the confidence at 0.8, generating a total of 1431 rules. The rules were pruned to remove redundant and duplicate rules, leaving 54 rules, and the top 30 rules with the highest lift/confidence/support were then converted into binary values.
The experiments were carried out in two scenarios, one with interaction detection and the other without. The results were analyzed on two fronts. For prediction capability, the metrics of accuracy, precision, recall, and pseudo R-squared were used. For interpretability, the model-based explanation sub-metric under the functionally grounded metric proposed by Doshi-Velez and Kim (2017) was used. Here we compared the complexity of the RAGL model and the classical logit model, using the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) to check the complexity of the two models and which model fits the data well and gives good predictions. We also studied the interaction strength of the features in the two models and compared how well each model could give a comprehensive account of cause and effect between the features.
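For reference, a sketch of how these evaluation quantities can be obtained from a fitted binomial GLM in Python's statsmodels (hypothetical data; the actual evaluation was done in R):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical binary outcome with two predictors.
rng = np.random.default_rng(1)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = (X["x1"] + 0.5 * X["x2"] + rng.normal(size=200) > 0).astype(int)

res = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()

print("AIC:", res.aic)                         # Akaike Information Criterion
print("BIC:", res.bic)                         # Bayesian Information Criterion
print("McFadden R2:", 1 - res.llf / null.llf)  # pseudo R-squared, as in Table 2
```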
4. RESULTS AND DISCUSSION
This section discusses the results of the experiments using the classical logit model and the proposed RAGL model, covering both prediction and interpretability capabilities.
4.1.1 Experiment Prediction Metrics: Prediction and Interpretation Using the Classical GLM Model vs. the RAGL Model

Table 2 below shows the results of running the classical GLM model without interaction detection on the weather dataset against the results of our proposed model, RAGL. The experiment showed that the proposed model performed better in both prediction and interpretability.
Table 2: Classical GLM vs. RAGL results

Metric                   RAGL model   Classical GLM model
Classification accuracy  0.905        0.8592763
Precision                0.905        0.844
Recall/Sensitivity       0.905        0.844
Area Under Curve         0.973        0.886
AIC                      22           819.65
BIC                      80           9838.642
Pseudo R2                1.000        0.4816824
McFadden's pseudo R-squared was used. The results show that our proposed model performed better in prediction than the classical GLM model. The pseudo R-squared measurement shows that RAGL scored higher than the classical GLM model without interaction detection; the higher value indicates that the RAGL model fits the data better than the classical GLM model.

Regarding classification accuracy, RAGL shows an increase of roughly five percentage points over the classical GLM model. These results are shown graphically in Figure 1, which shows that the proposed RAGL model outperformed the classical GLM model in classification accuracy, precision, recall, and pseudo R-squared.

[Figure: bar chart titled "Prediction Metrics for Classical GLM vs RAGL" comparing classification accuracy, precision, recall/sensitivity, and pseudo R-squared for the classical GLM model without interaction detection and the RAGL model.]
Figure 1: Prediction performance Classical GLM Vs. RAGL


Comparing the complexity of the proposed model vs. the classical GLM model, we see that our proposed model had better AIC and BIC values than the classical GLM model, indicating a better fit to the data.

[Figure: bar chart titled "Complexity metrics for classical GLM vs RAGL" comparing the AIC and BIC of the classical GLM model without interaction detection and the RAGL model.]
Figure 2: Complexity metrics for Classical GLM Vs. RAGL

The AIC and BIC values for the classical GLM and the proposed RAGL differ significantly; for RAGL they are 22 and 80, respectively. To find the attributes that led to this significant drop in AIC, we ran stepwise regression on the RAGL model. The attributes determined from the RAGL model are depicted in Table 3.
Table 3: Candidate models for the weather dataset

1. AIC = 44
   RAIN ~ High_Temp.15.7.23.7. + High_Temp.23.7.25.3. + High_Temp.25.3.30.3. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Dewpoint_High..15.6.22.4. + Humidity_High.31.80. + Humidity_High..80.89. + Humidity_High..89.99. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

2. AIC = 42
   RAIN ~ High_Temp.15.7.23.7. + High_Temp.23.7.25.3. + High_Temp.25.3.30.3. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Humidity_High..89.99. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

3. AIC = 40
   RAIN ~ High_Temp.15.7.23.7. + High_Temp.23.7.25.3. + High_Temp.25.3.30.3. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

4. AIC = 38
   RAIN ~ High_Temp.15.7.23.7. + High_Temp.25.3.30.3. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

5. AIC = 36
   RAIN ~ High_Temp.15.7.23.7. + Low_Temp..40.14.5. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

6. AIC = 34
   RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High.7.3.13.1. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

7. AIC = 32
   RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High..13.1.15.6. + Humidity_High.31.80. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

8. AIC = 30
   RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Dewpoint_High..13.1.15.6. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

9. AIC = 28
   RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..849.3.1023. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

10. AIC = 26
    RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + Windspeed_Avg.5.4.18.5. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

11. AIC = 24
    RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Low_Temp.15.9.26.3. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + High.Hpa..1023.1024. + High.Hpa..1024.1100.

12. AIC = 22
    RAIN ~ High_Temp.15.7.23.7. + Low_Temp.14.5.15.9. + Humidity_High..80.89. + Windspeed_High.0.18. + Windspeed_High..18.28. + Windspeed_High.28.59.7. + Windspeed_Avg.0.2.4. + Windspeed_Avg.2.4.5.4. + High.Hpa..1023.1024. + High.Hpa..1024.1100.
We get the lowest AIC for our proposed model when considering the ten variables deduced from the RAGL model on the dataset. These attributes reveal that RAGL split the detected interactions according to their respective value ranges and their relation to the RAIN target variable, in contrast with the classical GLM model, which assumes that each predictor has the same interaction with the target variable across its whole range. We can easily explain the model using these predictors as a whole.
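In R this backward elimination is what step() performs; a rough Python equivalent of the procedure, given here only as a sketch with hypothetical rule predictors, is:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_step_aic(y, X):
    """Greedy backward elimination on AIC, similar in spirit to R's step():
    repeatedly drop the predictor whose removal lowers AIC the most."""
    cols = list(X.columns)
    best_aic = sm.GLM(y, sm.add_constant(X[cols]),
                      family=sm.families.Binomial()).fit().aic
    while len(cols) > 1:
        # AIC of each candidate model with one predictor removed.
        trials = {c: sm.GLM(y, sm.add_constant(X[[k for k in cols if k != c]]),
                            family=sm.families.Binomial()).fit().aic
                  for c in cols}
        drop = min(trials, key=trials.get)
        if trials[drop] >= best_aic:
            break  # no single removal improves AIC
        best_aic = trials[drop]
        cols = [k for k in cols if k != drop]
    return cols, best_aic

# Hypothetical binary rule indicators as predictors.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.integers(0, 2, size=(300, 5)),
                 columns=[f"rule{i}" for i in range(1, 6)])
y = (X["rule1"] + X["rule2"] + rng.normal(0, 0.5, size=300) > 1).astype(int)

kept, aic = backward_step_aic(y, X)
print("kept:", kept, "AIC:", round(aic, 1))
```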
4.1.2 Experiment Interpretability Metrics: Interpretability of the Classical GLM Model vs. the RAGL Model

Table 2 shows that the RAGL model had better AIC and BIC scores than the classical GLM model, indicating that our proposed model fit the data better. The complexity of the model was also significantly reduced, because RAGL used only the detected interactions to build the model instead of considering non-important features as well. Regarding the attribute coefficients, the RAGL model gave a better account of the features because the continuous features were discretized before decision rules were generated from them. The analysis-of-deviance table of the classical GLM model in Table 4 vs. that of RAGL in Table 5 shows that RAGL had a better fit and allowed a better understanding of the model parameters and coefficients. This can be seen in how RAGL estimated the coefficients of the predictors. Take, for example, the High_Temp predictor in the classical GLM model vs. the High_Temp(15.7;23.7) predictor in the RAGL model, where 15.7 to 23.7 represents the temperature range in question. The p-values of these two coefficients differ: the RAGL predictor High_Temp(15.7;23.7) had a much lower p-value than the classical GLM predictor, showing that High_Temp(15.7;23.7) was more significantly associated with the outcome variable. The classical GLM model treats the predictor High_Temp as behaving the same across all ranges of high temperature, and we know that is not the case, since it is specifically High_Temp(15.7;23.7) that contributes significantly to rain occurring.


Table 4: ANOVA (analysis-of-deviance) table for the classical GLM model

Term            Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
NULL                          1620       2003.1
High_Temp       1   39.00     1619       1964.1      4.248e-10 ***
Avg_Temp        1   0.18      1618       1963.9      0.6718811
Low_Temp        1   333.56    1617       1630.4      <2.2e-16 ***
Dewpoint_High   1   408.77    1616       1221.6      <2.2e-16 ***
Dewpoint_Avg    1   28.72     1615       1192.9      8.369e-08 ***
Dewpoint_low    1   1.07      1614       1191.8      0.3001695
Humidity_High   1   13.17     1613       1178.6      0.000284 ***
Humidity_Avg    1   37.63     1612       1141.0      8.532e-10 ***
Humidity_Low    1   4.62      1611       1136.4      0.0315486 *
Windspeed_High  1   0.02      1610       1136.3      0.8903394
Windspeed_Avg   1   33.73     1609       1102.6      6.320e-09 ***
High(Hpa)       1   46.26     1608       1056.4      1.038e-11 ***
Low(Hpa)        1   1.92      1607       1054.4      0.1659203

Table 5: ANOVA (analysis-of-deviance) table for the RAGL model

Term                       Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
NULL                                     36         49.61
High_Temp.15.7.23.7.       1   15.1656   35         34.795      9.848e-05 ***
High_Temp.23.7.25.3.       1   1.2918    34         33.503      0.255714
High_Temp.25.3.30.3        1   0.6877    33         32.815      0.406964
Low_Temp..40.14.5.         1   0.7189    32         32.097      0.396514
Low_Temp.14.5.15.9.        1   2.3137    31         29.783      0.128236
Low_Temp.15.9.26.3.        1   1.1520    30         28.631      0.283125
Dewpoint_High.7.3.13.1.    1   0.6304    29         28.000      0.427217
Dewpoint_High..13.1.15.6.  1   6.1785    28         21.822      0.012931 *
Dewpoint_High..15.6.22.4.  1   9.041     27         12.781      0.002640 **
Humidity_High.31.80.       1   0.4733    26         12.307      0.491456
Humidity_High..80.89.      1   2.8046    25         9.503       0.093995
Humidity_High..89.99.      1   9.5027    24         0.000       0.002052 **
Windspeed_High.0.18.       1   0.0000    23         0.000       1.000000
Windspeed_High..18.28.     1   0.0000    22         0.000       1.000000
Windspeed_High.28.59.7.    1   0.0000    21         0.000       0.999999
Windspeed_Avg.0.2.4.       1   0.0000    20         0.000       0.999999
Windspeed_Avg.2.4.5.4.     1   0.0000    19         0.000       0.999999
Windspeed_Avg.5.4.18.5.    1   0.0000    18         0.000       0.999999
High.Hpa..849.3.1023.      1   0.0000    17         0.000       1.000000
High.Hpa..1023.1024.       1   0.0000    16         0.000       0.999997
High.Hpa..1024.1100        1   0.0000    15         0.000       0.999998

5. DISCUSSION

Rough set theory and association rule mining can improve the interpretability of logistic regression by identifying the most critical factors or rules contributing to the model's predictions. Rough Set Theory helped identify the most critical factors driving the model's predictions and reduced the number of input variables that needed to be considered, making the model more interpretable; it was used to identify a minimal subset of input variables sufficient to predict the output variable accurately. Association rule mining was used to identify patterns or rules within the data associated with a particular outcome. These rules were used to explain the model's predictions and to help understand how the different input variables relate to the output variable.

Together, rough set theory and association rule mining helped identify relevant relationships between variables that were used to improve the model's accuracy. A generalized linear model was then used to model these relationships in a more sophisticated and precise manner, resulting in a hybrid model that was more accurate than a single-model approach.

Both rough set theory and association rule mining were thus used in combination with logistic regression to improve interpretability by providing a more comprehensive understanding of the data and of the factors driving the model's predictions. Overall, RAGL had better prediction metrics and interpretability than the classical GLM model.
6. CONCLUSION
The research combined Rough Set theory, association rule mining, and the Generalized Linear Model into one hybrid model. Through this hybrid model, we saw an increase in interpretability: Rough Set theory and association rule mining provided insights into the relationships between variables and helped identify critical features, and combining the two with the generalized linear model resulted in a hybrid model that was more interpretable and easier to understand. Accuracy also improved, with the Rough Set theory and association rule mining feature-selection steps identifying the critical features in the data; the generalized linear model was then used for prediction on these detected features, modeling them more efficiently and providing better prediction and interpretability results.

In the future, additional areas to be investigated include applying the proposed framework to a multinomial Generalized Linear Model and ensembling the hybrid RAGL model using bagging and boosting methods.

REFERENCES

1. Abdel-Basset, M., Mohamed, M., Smarandache, F., & Chang, V. (2018). Neutrosophic association rule mining algorithm for big data analysis. Symmetry, 10(4), 106.
2. Bello, R., & Falcon, R. (2017). Rough sets in machine learning: A review. In Thriving Rough Sets (pp. 87-118). Springer, Cham.
3. Bühlmann, P. (2012). Bagging, boosting, and ensemble methods. In Handbook of Computational Statistics (pp. 985-1022). Springer, Berlin, Heidelberg.
4. Carvalho, D. V., Pereira, E. M., & Cardoso, J. S. (2019). Machine learning interpretability: A survey on methods and metrics. Electronics, 8(8), 832.
5. Changpetch, P., & Lin, D. K. (2013). Model selection for logistic regression via association rules analysis. Journal of Statistical Computation and Simulation, 83(8), 1415-1428.
6. Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
7. Du, M., Liu, N., & Hu, X. (2019). Techniques for interpretable machine learning. Communications of the ACM, 63(1), 68-77.
8. Frost, J. (2018). How to interpret R-squared in regression analysis. Statistics By Jim. Retrieved November 20, 2020, from https://statisticsbyjim.com/regression/interpret-r-squared-regression/
9. George, J., Letha, J., & Jairaj, P. G. (2016). Daily rainfall prediction using a generalized linear bivariate model: A case study. Procedia Technology, 24, 31-38.
10. Gupta, A. (2019, June 12). ML: Eclat algorithm. GeeksforGeeks. Retrieved September 26, 2022, from https://www.geeksforgeeks.org/ml-eclat-algorithm/
11. Haykin, S. S. (2009). Neural networks and learning machines (Vol. 3). Upper Saddle River: Pearson.
12. Hassanien, A. E., Abdelhafez, M. E., & Own, H. S. (2008). Rough sets data analysis in knowledge discovery: A case of Kuwaiti diabetic children patients. Advances in Fuzzy Systems.
13. Huang, Y., Zhao, H., & Huang, X. (2019, February). A prediction scheme for daily maximum and minimum temperature forecasts using recurrent neural network and rough set. In IOP Conference Series: Earth and Environmental Science (Vol. 237, No. 2, p. 022005). IOP Publishing.
14. Janusz, A., & Ślęzak, D. (2014). Rough set methods for attribute clustering and selection. Applied Artificial Intelligence, 28(3), 220-242.
15. Jayasingh, S. K., & Mantri, J. K. (2019). Soft computing approaches on climate modeling and weather predictions. International Journal of Engineering and Advanced Technology.
16. Liakos, K., Busato, P., Moshou, D., Pearson, S., & Bochtis, D. (2018). Machine learning in agriculture: A review. Sensors, 18(8), 2674.
17. Liu, D., Li, T., & Liang, D. (2014). Incorporating logistic regression to decision-theoretic rough sets for classifications. International Journal of Approximate Reasoning, 55(1), 197-210.
18. McCabe, C. J., Halvorson, M. A., King, K. M., Cao, X., & Kim, D. S. (2022). Interpreting interaction effects in generalized linear models of non-linear probabilities and counts. Multivariate Behavioral Research, 57(2-3), 243-263.
19. Misiani, H. (2021). How can artificial intelligence help improve climate forecasting and risk information? ICPAC. Available at: https://www.icpac.net/news/how-can-artificial-intelligence-help-improve-climate-forecasting-and-risk-information/ [Accessed November 8, 2022].
20. Mohankumar, S., & Balasubramanian, V. (2016). Identifying effective features and classifiers for short-term rainfall forecast using rough sets maximum frequency weighted feature reduction technique. Journal of Computing and Information Technology, 24(2), 181-194.
21. Molnar, C. (2019). Interpretable machine learning. Lulu.com.
22. Molnar, C., Casalicchio, G., & Bischl, B. (2020). Interpretable machine learning: A brief history, state-of-the-art, and challenges. arXiv preprint arXiv:2010.09337.
23. Nguyen, H. S. (2001). On efficient handling of continuous attributes in large databases. Fundamenta Informaticae, 48(1), 61-81.
24. Ong, C. S., Huang, J. J., & Tzeng, G. H. (2004, June). Using rough set theory for detecting the interaction terms in a generalized logit model. In International Conference on Rough Sets and Current Trends in Computing (pp. 624-629). Springer, Berlin, Heidelberg.
25. Pawlak, Z. (1982). Rough sets. International Journal of Computer & Information Sciences, 11(5), 341-356.
26. Perperoglou, A., Sauerbrei, W., Abrahamowicz, M., & Schmid, M. (2019). A review of spline function procedures in R. BMC Medical Research Methodology, 19(1), 1-16.
27. Raza, M. S., & Qamar, U. (2017). Understanding and using rough set based feature selection: Concepts, techniques, and applications. Springer Singapore.
28. Rissino, S., & Lambert-Torres, G. (2009). Rough set theory: Fundamental concepts, principals, data extraction, and applications. In Data Mining and Knowledge Discovery in Real Life Applications. IntechOpen.
29. Ruzgar, B., & Ruzgar, N. S. (2008). Rough sets and logistic regression analysis for loan payment. International Journal of Mathematical Models and Methods in Applied Sciences, 2(1), 65-73.
30. Santoso, M. H. (2021). Application of association rule method using Apriori algorithm to find sales patterns: Case study of Indomaret Tanjung Anom. Brilliance: Research of Artificial Intelligence, 1(2), 54-66.
31. Singh, M. K., Akhtar, Z., & Sharma, D. K. (2006). Challenges and research issues in association rule mining. International Journal of Electronics and Computer Science Engineering (IJECSE), V1N2, 767-774.
32. Slimani, T. (2015). Class association rules mining based rough set method. arXiv preprint arXiv:1509.05437.
33. Sumalatha, L., Uma Sankar, P., & Sujatha, B. (2016). Rough set based decision rule generation to find behavioural patterns of customers. Sādhanā, 41(9), 985-991.
34. Tsang, M., Enouen, J., & Liu, Y. (2021). Interpretable artificial intelligence through the lens of feature interaction. arXiv preprint arXiv:2103.03103.
35. Whittingham, M. J., Stephens, P. A., Bradbury, R. B., & Freckleton, R. P. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75(5), 1182-1189.
36. Widz, S., & Ślęzak, D. (2012). Rough set based decision support: Models easy to interpret. In Rough Sets: Selected Methods and Applications in Management and Engineering (pp. 95-112). Springer, London.
37. Xun, J., Xu, L. C., & Qi, L. (2012, August). Association rules mining algorithm based on rough set. In 2012 International Symposium on Information Technologies in Medicine and Education (Vol. 1, pp. 361-364). IEEE.
38. Zhou, J., Gandomi, A. H., Chen, F., & Holzinger, A. (2021). Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5), 593.
