Abstract- Significant developments in data collection and data storage technologies have allowed transactional data to accumulate in the data warehouses of companies and public sector organizations. As the data grows day by day, there has to be a mechanism that can analyze such large volumes of data. Data mining is a way of extracting hidden predictive information from those data warehouses without revealing their sensitive information. Privacy preserving data mining (PPDM) is a recent research area that deals with the problem of hiding sensitive information while analyzing data. Association Rule Hiding is one of the techniques of PPDM; it hides association rules generated by Association Rule Generation Algorithms. In this paper we provide a comparative theoretical analysis of algorithms that have been developed for Association Rule Hiding.

Keywords- Data Mining, Privacy Preserving, Association Rule Hiding

I. INTRODUCTION

Association rules are an important method of finding regularities or patterns in raw data, and association rule mining is one of the most important models developed and extensively studied by the database and data mining communities. Association mining finds application across many domains. One of the best known applications of association rule mining is in the business field, where the discovery of purchase behaviours or associations between products is very useful for decision making and for developing effective marketing strategies. In the last few years there has been significant development in the area of association rule mining.

Some recent applications of association rule mining include finding patterns in biological databases, extraction of knowledge from software engineering metrics, web personalization and text mining. Association rule mining can also play an important role in discovering knowledge from agricultural databases, survey data from agricultural research, data about soil and cultivation, and data containing information linking geographical conditions and crop production, to name a few. Such knowledge can assist in making decisions regarding the selection of crops to be grown in a particular area based on certain geographical conditions, and in increasing crop production by selecting proper resources and a proper environment for crop production.

As databases grow day by day, the organizations that maintain them are concerned about the importance of such huge transaction databases. The hidden knowledge patterns can provide insight to the data holders and can be invaluable in important decision making and strategic planning. Sometimes organizations are interested in combining their datasets with those of organizations in similar fields in order to analyse their databases for mutual benefit. For example, a few hospitals may want to share the diagnosis details of their patients to perform research without revealing the details to each other or to some third party. Similarly, if some banks want to generate criteria for a credit card policy or scheme and do not want to leak the details of their customers to other banks, then they require a technique that can analyse their data while maintaining data privacy.

II. ASSOCIATION RULE HIDING

Let I = {i1, …, in} be a set of items from a database. Let D be a set of transactions. Each transaction t ∈ D is an item set such that t is a subset of I. A transaction t supports A, a set of items in I, if A is a subset of t. An association rule has the form A→B, where A and B are subsets of I and A∩B = Ø. The support, denoted σ, of rule A→B is computed by the following equation:

Support(A→B) = |A∪B| / |D|,

where |A∪B| denotes the number of transactions in the database that contain the item set AB, and |D| denotes the number of transactions in the database D; that is, σ% of the transactions in D support item set AB. The confidence, denoted τ, of rule A→B is calculated by the following equation:

Confidence(A→B) = |A∪B| / |A|,

where |A| is the number of transactions in database D that contain item set A; that is, τ% of the transactions in D that support A also support B. A rule A→B is strong if Support(A→B) ≥ min_support and Confidence(A→B) ≥ min_confidence, where min_support and min_confidence are two given minimum thresholds [7].

Association rule hiding algorithms prevent sensitive rules from being disclosed. The problem of association rule hiding can be stated as follows: “We are given a transactional database D with minimum confidence, minimum support and a set R of rules which
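
The support and confidence measures defined in Section II, together with the strong-rule test, can be illustrated with a minimal Python sketch. The function names, the representation of transactions as frozensets, and the toy database are assumptions made for illustration only; they do not come from any of the surveyed papers.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count_containing(db: List[Transaction], itemset: Transaction) -> int:
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def support(db: List[Transaction], lhs: Transaction, rhs: Transaction) -> float:
    """Support(A→B) = |A∪B| / |D|."""
    return count_containing(db, lhs | rhs) / len(db)


def confidence(db: List[Transaction], lhs: Transaction, rhs: Transaction) -> float:
    """Confidence(A→B) = |A∪B| / |A|; defined as 0.0 here if no transaction supports A."""
    lhs_count = count_containing(db, lhs)
    if lhs_count == 0:
        return 0.0
    return count_containing(db, lhs | rhs) / lhs_count


def is_strong(db: List[Transaction], lhs: Transaction, rhs: Transaction,
              min_support: float, min_confidence: float) -> bool:
    """A rule is strong if both thresholds are met."""
    return (support(db, lhs, rhs) >= min_support
            and confidence(db, lhs, rhs) >= min_confidence)


# Hypothetical toy database of five transactions over the items a, b, c.
D = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
     frozenset("bc"), frozenset("a")]
A, B = frozenset("a"), frozenset("b")

print(support(D, A, B))              # 2 of 5 transactions contain both a and b
print(confidence(D, A, B))           # 2 of the 4 transactions containing a also contain b
print(is_strong(D, A, B, 0.3, 0.5))  # both thresholds are met
```

In this toy database the rule a→b has support 2/5 and confidence 2/4, so it is strong for min_support = 0.3 and min_confidence = 0.5 but not for min_confidence = 0.6.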
Advantages | Limitations
Heuristic Based Approaches (Distortion techniques)
Efficiency, scalability and quick responses, due to which it is getting focus by majority of the researchers. | Produce undesirable side effects in new database (i.e. lost rules and new rules).
Totally takes best decision.
Heuristic Based Approaches (Blocking technique)

In [11] the authors present ISL (Increase Support of LHS) and DSR (Decrease Support of RHS). Item sets are given as input to both algorithms to automatically hide sensitive association rules without pre-mining and selection of hidden rules. In [12] two algorithms, DCIS (Decrease Confidence by Increase Support) and DCDS (Decrease Confidence by Decrease Support), were introduced to automatically hide association rules without pre-mining and selection of hidden rules.
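
The two strategies named by ISL and DSR, increasing the support of the LHS and decreasing the support of the RHS, both lower the confidence |A∪B| / |A| of a rule A→B: the first enlarges the denominator, the second shrinks the numerator. The following Python sketch illustrates that idea only. It is a simplified illustration under stated assumptions, not the algorithms of [11] or [12]: the function names, the order in which transactions are visited, and the choice of which RHS item to delete are all hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def confidence(db: List[Transaction], lhs: Transaction, rhs: Transaction) -> float:
    """Confidence(A→B) = |A∪B| / |A|, counted over the transactions in db."""
    lhs_count = sum(1 for t in db if lhs <= t)
    both_count = sum(1 for t in db if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0


def increase_lhs_support(db: List[Transaction], lhs: Transaction,
                         rhs: Transaction, min_confidence: float) -> List[Transaction]:
    """ISL-style idea: add the LHS items to transactions that contain neither
    the full LHS nor the full RHS, so |A| grows while |A∪B| stays fixed."""
    sanitized = list(db)  # work on a copy; the input list is not modified
    for i, t in enumerate(sanitized):
        if confidence(sanitized, lhs, rhs) < min_confidence:
            break  # the rule is no longer strong on confidence
        if not lhs <= t and not rhs <= t:
            sanitized[i] = t | lhs
    return sanitized


def decrease_rhs_support(db: List[Transaction], lhs: Transaction,
                         rhs: Transaction, min_confidence: float) -> List[Transaction]:
    """DSR-style idea: delete one RHS item from transactions that support
    both sides, so |A∪B| shrinks while |A| stays fixed."""
    sanitized = list(db)
    victim = sorted(rhs)[0]  # assumption: any single RHS item will do
    for i, t in enumerate(sanitized):
        if confidence(sanitized, lhs, rhs) < min_confidence:
            break
        if (lhs | rhs) <= t:
            sanitized[i] = t - {victim}
    return sanitized


# Hypothetical toy database; the sensitive rule is a→b with confidence 2/3.
D = [frozenset("ab"), frozenset("ab"), frozenset("a"),
     frozenset("c"), frozenset("c")]
A, B = frozenset("a"), frozenset("b")

isl_db = increase_lhs_support(D, A, B, min_confidence=0.5)
dsr_db = decrease_rhs_support(D, A, B, min_confidence=0.5)
print(confidence(isl_db, A, B), confidence(dsr_db, A, B))
```

On this toy input the first function inserts item a into the two transactions that contained only c, and the second deletes item b from one transaction. Neither function guarantees that the rule ends up hidden if too few suitable transactions exist, and neither measures lost or newly generated rules.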
ISL and DCIS algorithms try to increase the support of the left hand side of the association rule, while algorithms DSR and DCDS try to decrease the support of the right hand side of the association rule. It is observed that the running time of ISL is more than that of DSR. Also, both algorithms exhibit contrasting side effects. In [13] an algorithm DSC (Decrease Support and Confidence) is proposed in which a pattern-inversion tree is used to store related information so that only one scan of the database is required. The proposed algorithm can automatically sanitize informative rule sets without pre-mining and selection of a class of rules under one database scan.

In [1] the authors discussed a heuristic algorithm, DSRRC (Decrease Support of R.H.S. item of Rule Clusters), which provides privacy for sensitive rules at a certain level while maintaining data quality. The DSRRC algorithm clusters the sensitive association rules based on the R.H.S. of the rules and hides all possible rules at a time by modifying fewer transactions, which helps maintain data quality. The DSRRC algorithm cannot hide rules having multiple RHS items. In [9] the authors discussed four heuristic algorithms: Naïve, MinFIA (Minimum Frequency Item Algorithm), MaxFIA (Maximum Frequency Item Algorithm) and IGA (Item Grouping Algorithm). The Naïve algorithm removes all of the items with the highest frequency in the database from the selected transactions. In the MinFIA algorithm, the item with the smallest support in the pattern is chosen as the sensitive item and is removed from the sensitive transactions. Unlike MinFIA, the MaxFIA algorithm selects the item with the maximum support in the pattern as the sensitive item and removes it from the transactions. The IGA algorithm groups the restrictive patterns into groups of patterns sharing the same item sets, so that all sensitive patterns in a group are hidden in a single step.

In [2] the authors introduced an efficient algorithm known as FHSAR (Fast Hiding Sensitive Association Rules) for fast hiding of sensitive association rules. The algorithm can hide any given sensitive association rule by scanning the database a single time, which significantly reduces the execution time. In [6] a Hybrid algorithm is proposed that combines the ISL and DSR techniques and hides the association rules by modifying the database transactions so that the confidence of the association rules is reduced. Such an approach is expected to provide better results than using either ISL or DSR alone. In [7] the proposed algorithm does not modify the database transactions, so the support and confidence of the association rules remain unchanged. It scans the database fewer times and prunes more hidden rules.

IV. COMPARATIVE ANALYSIS OF ASSOCIATION RULE HIDING ALGORITHMS

In this section the table shows the comparative analysis of the various association rule hiding algorithms on the basis of theoretical study. In the table, the algorithms discussed above are classified on the basis of insertion and deletion of the sensitive item set. The table also shows the analysis of the algorithms on the basis of parameters such as hiding failure, new rules generated and lost rules.

Table II Comparative Analysis of various ARH Algorithms

Method of Rule Hiding | Name of Algorithm | Rule Hiding Algorithm | Item Hiding (LHS or RHS) | Hiding Failure % | New Rule Generation % | Lost Rule %
By Adding the Sensitive Item Set | ISL | | LHS | 13 | 33 | 0
 | DCIS | | RHS | 0 | 75 | 0
 | Algorithm 1 | YES | | | |
By Deletion of Sensitive Item Set | DSR | | LHS | 0 | 5 | 11
 | DCDS | | RHS | 0 | 1 | 4
 | DSC | | BOTH | 4 | 9 |
 | NAÏVE | | BOTH | | |
 | MinFIA | | BOTH | | |
 | MaxFIA | | BOTH | | |
 | IGA | | BOTH | | |
 | FHSAR | YES | | | |
 | Algorithm 2 | YES | | | |
 | Algorithm 3 | YES | | | |
 | DSRRC | YES | | | |
Both (Insertion and deletion) | Hybrid Algorithm | | BOTH | | |
 | In [9] | | BOTH | | |

V. CONCLUSION

The comparative analysis of the algorithms is discussed using the tables in Sections III and IV. It is found that the algorithms used for association rule hiding hide the rules by using either the left hand side (LHS) or the right hand side (RHS) of the association rule. Some algorithms introduce new association rules, while others face the consequence of lost rules. All the algorithms discussed in Section III focus on hiding the sensitive item set or association rule but do not emphasize the number of rules while hiding the association rules. In the near future the focus will be on developing an algorithm that produces a minimum number of rules when applying the association rule hiding algorithms.

VI. REFERENCES

[1] C.N. Modi, U.P. Rao & D.R. Patel, “Maintaining
privacy and data quality in privacy preserving
association rule mining”, International
Conference on Computing Communication and
Networking Technologies (ICCCNT), 2010.
[2] Chih-Chia Weng, Shan-Tai Chen & Hung-Che
Lo “A Novel Algorithm for Completely Hiding
Sensitive Association Rules” IEEE Intelligent
Systems Design and Applications, 2008.
[3] E. Dasseni, V. Verykios, A. Elmagarmid & E.
Bertino, “Hiding association rules by using
confidence and support” In Proceedings of 4th
information hiding workshop, Pittsburgh, 2001.
[4] Khyati B. Jadav, Jignesh Vania, Dhiren R. Patel
“A Survey on Association Rule Hiding Methods”
International Journal of Computer Applications,
November 2013.
[5] Komal Shah, Amit Thakkar, Amit Ganatra, “A
Study on Association Rule Hiding Approaches”
International Journal of Engineering and
Advanced Technology (IJEAT), February 2012.
[6] Niteen Dhutraj, Siddhart Sasane, Vivek
Kshirsagar “Hiding Sensitive Association Rule
for Privacy Preservation”, IEEE Transactions on
Knowledge and Data Engineering, 2013.
[7] Padam Gulwani “Association Rule Hiding by
Positions Swapping of Support and Confidence”
International journal of Information Technology
and Computer Science, 2012.
[8] R. Agrawal & R. Srikant, “Privacy preserving
data mining” In ACM SIGMOD conference on
management of data, Dallas, Texas, May 2000.
[9] Stanley R. M. Oliveira, Osmar R. Zaiane,
“Privacy Preserving Frequent Itemset Mining”,
IEEE International Conference on Data Mining
Workshop on Privacy, Security, and Data Mining,
Maebashi City, Japan. Conferences in Research
and Practice in Information Technology, 2002.
[10] S. Oliveira & O. Zaiane, “Algorithms for
balancing privacy and knowledge discovery in
association rule mining” In Proceedings of 7th
international database engineering and
applications symposium (IDEAS03), Hong Kong,
July 2003.
[11] Shyue-Liang Wang, Bhavesh Parikh, Ayat Jafari
“Hiding informative association rule sets”,
ELSEVIER, Expert Systems with Applications
2007.
[12] Shyue-Liang Wang, Dipen Patel, Ayat Jafari &
Tzung-Pei Hong “Hiding collaborative
recommendation association rules” Springer
Science+Business Media, LLC 2007.
[13] Shyue-Liang Wang, Rajeev Maskey, Ayat Jafari
& Tzung-Pei Hong “Efficient sanitization of
informative association rules” ACM, Expert
Systems with Applications: An International
Journal, July 2008.