Research Challenges
Neda Abdelhamid
Information Technology
Auckland Institute of Studies
Auckland, New Zealand
nedah@ais.ac.nz

Ahmad Abdul Jabbar
Ebusiness department
Canadian University of Dubai
Dubai, UAE
121200344@students.cud.ac.ae

Fadi Thabtah
Applied Business
Nelson Marlborough Institute of Technology
Auckland, New Zealand
Fadi.Fayez@nmit.ac.nz

Abstract. Association rule mining discovers concealed correlations among variables, often in sales
transactions, to help managers make key business decisions involving item shelving, sales and planning. In the
last decade, association rule mining methods have been employed to derive rules from classification datasets in
different business domains. This has resulted in the emergence of a new classification approach called Associative
Classification (AC), which often produces more accurate classifiers than classic approaches such as decision
trees, greedy and rule induction. Nevertheless, AC suffers from noticeable challenges, some of which are
inherited from association rule mining while others result from the classifier-building phase. These challenges
include, but are not limited to, the massive number of candidate ruleitems found, the very large classifiers
derived, the inability to handle multi-label datasets, and the design of rule pruning, ranking and prediction
procedures. This article highlights and critically analyzes common challenges that AC algorithms still face.
Hence, it opens the door for interested researchers to investigate these challenges further, in the hope of
enhancing the overall performance of this approach and increasing its applicability in research domains.
Row #   Att1   Att2   Class
1       a1     b1     c2
2       a1     b1     c2
3       a2     b1     c1
4       a1     b2     c1
5       a3     b1     c1
6       a1     b1     c2
7       a2     b2     c1
8       a1     b2     c1
9       a1     b2     c1
10      a1     b2     c2

Table 2: Frequent items derived by CBA from Table 2.1

Frequent attribute value   Support   Confidence
<a1>, c2                   40%       57.10%
<a1>, c1                   30%       42.85%
<b1>, c2                   30%       60%
<b2>, c1                   40%       80%
<a1,b1>, c2                30%       100%
<a1,b2>, c1                30%       75%

discovered in the previous scan (n) in order to derive the possible frequent (n+1)-ruleitems, and so on. Once all
frequent ruleitems are derived, the algorithms generate the set of candidate rules from the frequent ruleitems that
pass the minconf threshold. Overall, generating the frequent ruleitems is a computationally demanding step,
because the support of every possible ruleitem must be counted in each iteration (Abdelhamid, et al.,
2012b).
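
The level-wise search described above can be illustrated with a minimal Python sketch. It is not the CBA implementation: the function names (`count`, `frequent_ruleitems`, `candidate_rules`), the data layout, and the thresholds are assumptions chosen for illustration. With a minimum support of 30%, the sketch recovers the six frequent ruleitems listed in Table 2 from the ten training rows; the computed confidences agree with the table up to rounding.

```python
from collections import Counter
from itertools import combinations

# Training rows from the example table: (Att1, Att2, Class).
ROWS = [
    ("a1", "b1", "c2"), ("a1", "b1", "c2"), ("a2", "b1", "c1"),
    ("a1", "b2", "c1"), ("a3", "b1", "c1"), ("a1", "b1", "c2"),
    ("a2", "b2", "c1"), ("a1", "b2", "c1"), ("a1", "b2", "c1"),
    ("a1", "b2", "c2"),
]


def count(rows, bodies):
    """One scan of the data: count each body and each (body, class) pair."""
    body_hits, item_hits = Counter(), Counter()
    for row in rows:
        for body in bodies:
            if all(row[i] == value for i, value in body):
                body_hits[body] += 1
                item_hits[(body, row[-1])] += 1
    return body_hits, item_hits


def frequent_ruleitems(rows, minsupp):
    """Return {(body, class): (support, confidence)} for all frequent ruleitems."""
    n_rows = len(rows)
    n_attrs = len(rows[0]) - 1
    # Level 1 candidates: every single attribute value seen in the data.
    bodies = {((i, row[i]),) for row in rows for i in range(n_attrs)}
    result = {}
    while bodies:
        body_hits, item_hits = count(rows, bodies)
        level = {
            item: (hits / n_rows, hits / body_hits[item[0]])
            for item, hits in item_hits.items()
            if hits / n_rows >= minsupp
        }
        result.update(level)
        # Join step: frequent n-ruleitems of the same class seed the
        # (n+1)-candidates; a body may use each attribute only once.
        bodies = set()
        for (body_a, cls_a), (body_b, cls_b) in combinations(level, 2):
            merged = tuple(sorted(set(body_a) | set(body_b)))
            attrs = [i for i, _ in merged]
            if (cls_a == cls_b and len(merged) == len(body_a) + 1
                    and len(set(attrs)) == len(attrs)):
                bodies.add(merged)
    return result


def candidate_rules(ruleitems, minconf):
    """Keep only the frequent ruleitems whose confidence passes minconf."""
    return {item: stats for item, stats in ruleitems.items()
            if stats[1] >= minconf}


if __name__ == "__main__":
    for (body, cls), (supp, conf) in frequent_ruleitems(ROWS, 0.3).items():
        print(body, cls, f"support={supp:.0%}", f"confidence={conf:.2%}")
```

Every level triggers a full pass over the training data for all surviving candidates, which is where the processing cost mentioned above comes from.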
3. Associative Classification Possible New Challenges
3.1 Multi-label Rules Generation
One of the challenges in AC is that most current algorithms are unable to generate all class labels associated with
an attribute value in the dataset. Commonly, an AC algorithm finds only the most frequent class linked with
the attribute value. Nevertheless, more than one class may be linked with the rule's body in different rows of the
training dataset with high frequency, which makes choosing only one class questionable. For instance, consider
attribute value <x1, x2> in a training dataset of 100 examples and two classes (c2, c3). Assume that <x1, x2> is
associated with classes c2 and c3 10 and 9 times respectively. A typical AC algorithm will induce the rule
x1 ∧ x2 → c2 and not consider the rule x1 ∧ x2 → c3, since <x1, x2> appeared 10 times with class c2, which is
only one training example more than with class c3. However, class c3 should be included in the rule rather than
discarded, since it carries crucial information for the decision maker and has a large frequency. Favoring class
c2 over c3 because of one additional training example is therefore not justified. In fact, not generating all
possible class labels for each rule can be seen as ignoring useful knowledge that may be important to the
classifier's accuracy. The research question raised by this problem is: would deriving additional
useful knowledge (rules) from single-label data improve the predictive performance of the classifier?
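
The example can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the helper name `class_labels_for_body`, the thresholds (5% support, 40% confidence), and the premise that <x1, x2> occurs in exactly 19 of the 100 rows are hypothetical and not taken from any published algorithm.

```python
from collections import Counter


def class_labels_for_body(class_counts, n_rows, minsupp, minconf):
    """Return every class that passes both thresholds for one rule body.

    class_counts maps each class to the number of training rows in which
    it appears together with the body.
    """
    body_total = sum(class_counts.values())
    kept = []
    for cls, hits in class_counts.most_common():
        support = hits / n_rows
        confidence = hits / body_total
        if support >= minsupp and confidence >= minconf:
            kept.append((cls, support, confidence))
    return kept


# <x1, x2> appears 10 times with c2 and 9 times with c3 (assumed: 19 rows in total).
counts = Counter({"c2": 10, "c3": 9})

# Single-label behaviour: keep only the most frequent class.
single_label = counts.most_common(1)[0][0]

# Multi-label behaviour: keep every class that survives the thresholds.
multi_label = class_labels_for_body(counts, n_rows=100, minsupp=0.05, minconf=0.40)
```

Under these assumed thresholds both c2 (confidence 10/19) and c3 (confidence 9/19) survive, whereas the single-label choice discards c3 on the basis of one training example.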
Two promising research directions have been proposed in AC to handle the discovery of multi-label rules from
single-label classification datasets. The first, proposed in (Thabtah, et al., 2006), derives single-label
classifiers from the training dataset and then merges them to form multi-label rules. The authors introduced a
separate phase during rule induction called recursive learning, which produces the first single-label classifier,
removes all data examples covered by that classifier, then produces the next classifier from the remaining
uncovered training examples, and so forth. One obvious shortcoming of this approach is that the classifiers are
produced from parts of the training dataset rather than from all data examples at once. The second is eMCAC, a
recently modified version of MMAC developed by (Abdelhamid, 2014). The author relaxed the recursive learning and
was able to induce multi-label rules directly from the training dataset by keeping conflicting rules and then
merging them later while building the classifier. Experiments on UCI datasets, a trainer scheduling dataset and
website phishing classification datasets showed competitive error rates when comparing eMCAC with MMAC and other
rule induction algorithms.
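
The recursive learning idea (learn a single-label classifier, remove the examples it covers, learn again from the remainder, then merge) can be sketched as follows. This is a toy illustration and not the published algorithm: the learner simply treats the full attribute tuple as the rule body, and the names `learn_single_label`, `recursive_learning` and the `min_count` threshold are assumptions introduced here.

```python
from collections import Counter, defaultdict


def learn_single_label(rows, min_count):
    """Toy learner: per body, keep the most frequent class if it is frequent enough."""
    counts = defaultdict(Counter)
    for *attrs, cls in rows:
        counts[tuple(attrs)][cls] += 1
    rules = {}
    for body, class_counts in counts.items():
        cls, hits = class_counts.most_common(1)[0]
        if hits >= min_count:
            rules[body] = cls
    return rules


def recursive_learning(rows, min_count):
    """Repeat learning on the uncovered rows and merge rules that share a body."""
    remaining = list(rows)
    merged = defaultdict(list)
    while True:
        rules = learn_single_label(remaining, min_count)
        if not rules:
            break
        for body, cls in rules.items():
            merged[body].append(cls)
        # Drop every example that is correctly covered by the current classifier.
        remaining = [r for r in remaining if rules.get(tuple(r[:-1])) != r[-1]]
    return dict(merged)


# The <x1, x2> example: 10 rows with c2, 9 rows with c3, plus one rare class.
rows = ([("x1", "x2", "c2")] * 10
        + [("x1", "x2", "c3")] * 9
        + [("x1", "x2", "c1")])
multi_label_rules = recursive_learning(rows, min_count=5)
```

The second pass sees only the rows left over from the first, which reflects the shortcoming noted above: later classifiers are learned from a part of the training data rather than from all of it.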
4. Conclusions
Association rule mining and classification are two major data mining tasks that have been studied extensively by
researchers in the last two decades. Using association rule mining in the training phase has resulted in a promising
classification approach in data mining called associative classification (AC). In the last decade, several AC
techniques have been proposed in the literature and applied to different application datasets, including medical
diagnosis, website phishing classification, email classification and others. Still, there are many challenges
associated with current AC techniques that, if investigated, may improve the overall performance of this family of
algorithms. This paper has investigated possible challenges linked to AC algorithms that can be tackled by scholars
in the data mining and machine learning communities. Extending current AC algorithms to handle multi-label data,
avoiding reliance on the association rule candidate generation function, reducing the number of candidate rules
induced, and minimising classifier size are among the important issues this paper has highlighted. Moreover,
the test data prediction step and the need for new tie-breaking criteria in rule ranking are other areas
that require deep investigation. We have also directed researchers to possible starting points for solving the
abovementioned problems. In the near future, we intend to develop a new AC algorithm with no candidate
generation function, in the hope of resolving a major deficiency in AC, namely the exponential growth of rules.
References
1. Abdelhamid N., Ayesh A., Thabtah F. (2015) Emerging Trends in Associative Classification Data Mining. International
Journal of Electronics and Electrical Engineering, Vol. 3, No. 1, pp. 50-53, February, 2015.
2. Abdelhamid N. (2014) Multi-label rules for phishing classification. Applied Computing and Informatics 11 (1), 29-46.
3. Abdelhamid N., Ayesh A., Thabtah F., Ahmadi S., Hadi W. (2012) MAC: A multiclass associative classification algorithm.
To be published in the Journal of Information and Knowledge Management (JIKM). 11 (2), pp. 1250011-1 - 1250011-10.
WorldScinet.
4. Abdelhamid N., Ayesh A., Thabtah F. (2012a) An Experimental Study of Three Different Rule Ranking Formulas in
Associative Classification Mining. Proceedings of the 7th IEEE International Conference for Internet Technology and
Secured Transactions (ICITST-2012), pp. 795-800, UK.
5. Agrawal R., Srikant R. (1994) Fast algorithms for mining association rule. Proceedings of the 20th International
Conference on Very Large Data Bases - VLDB, 487-499.
6. Ayyat S., Lu J., Thabtah F. (2014) Class Strength Prediction Method for Associative Classification. Proceedings of
the IMMM 2014, The Fourth International Conference on Advances in Information Mining and Management, pp. 5-10.
Paris, France, July 2014. Best Paper Award.
7. Jabez C. (2011) A statistical approach for associative classification. European Journal of Scientific Research, Vol. 58
(No. 2), 140-147.
8. Li J., Zaiane O. (2015) Associative Classification with Statistically Significant Positive and Negative Rule. CIKM '15
Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 633-642.
9. Li W., Han J., Pei J. (2001) CMAR: Accurate and efficient classification based on multiple-class association rule.
Proceedings of the IEEE International Conference on Data Mining - ICDM, 369-376.
10. Liu B., Hsu W., Ma Y. (1998) Integrating classification and association rule mining. Proceedings of the Knowledge
Discovery and Data Mining Conference - KDD, 80-86. New York.
11. Qabajeh I., Thabtah F., Chiclana F. (2015) A dynamic rule-induction method for classification in data mining. Journal of
Management Analytics, Volume 2, Issue 3, pp. 233-253. Wiley.
12. Sasirekha D., Punitha A. (2015) A Comprehensive Analysis on Associative Classification in Medical Datasets. Indian
Journal of Science and Technology, Volume 8, Issue 33, December 2015.
13. Schmid M. R., Iqbal F., Fung B. (2015) E-mail authorship attribution using customized associative classification. The
Proceedings of the Fifteenth Annual DFRWS Conference, Volume 14, Supplement 1, August 2015, pp. S116–S126.
14. Thabtah F., Hammoud H., Abdel-Jaber H. (2015) Parallel Associative Classification Data Mining Frameworks Based
MapReduce. Parallel Processing Letters, Volume 25 (2).
15. Thabtah F., Hadi W., Abdelhamid N., Issa A. (2011) Prediction Phase in Associative Classification. Journal of Knowledge
Engineering and Software Engineering, Volume 21, Issue 6, pp. 855-876. WorldScinet.
16. Thabtah F., Mahmood Q., McCluskey L., Abdel-Jaber H. (2010) A new Classification based on Association Algorithm.
Journal of Information and Knowledge Management, Vol. 9, No. 1, 55-64. World Scientific.
17. Thabtah F. (2007) Review on Associative Classification Mining. Journal of Knowledge Engineering Review, Vol. 22:1,
37-65. Cambridge Press.
18. Thabtah F., Cowling P., Peng Y. (2006) Multiple Label Classification Rules Approach. Journal of Knowledge and
Information System, Volume 9: 109-129. Springer.
19. Yoon Y., Lee G. (2008) Efficient implementation of associative classifiers for document classification. Information
Processing Management, An International Journal, 43(2): 393-405.
20. Yu K., Wu X., Ding W., Wang H. (2011) Causal Associative Classification. Proceedings of the 11th IEEE International
Conference on Data Mining (ICDM '11), 914-923.