A Survey of Association Rule Hiding Algorithms

2014 Fourth International Conference on Communication Systems and Network Technologies
Vikram Garg, Stdt, Department of CSE, Barkatullah University Institute of Technology, Bhopal, vikramgarg85@gmail.com
Anju Singh, Department of CSE, Barkatullah University Institute of Technology, Bhopal, asingh0123@rediffmail.com
Divakar Singh, Department of CSE, Barkatullah University Institute of Technology, Bhopal, divakar_singh@rediffmail.com
978-1-4799-3070-8/14 $31.00 © 2014 IEEE
DOI 10.1109/CSNT.2014.86

Abstract-The significant development in the field of data collection and data storage technologies has allowed transactional data to grow in data warehouses that reside in companies and public sector organizations. As the data grows day by day, there has to be a mechanism that can analyze such a large volume of data. Data mining is a way of extracting hidden predictive information from those data warehouses without revealing their sensitive information. Privacy preserving data mining (PPDM) is a recent research area that deals with the problem of hiding sensitive information while analyzing data. Association Rule Hiding is one of the techniques of PPDM, used to hide association rules generated by Association Rule Generation Algorithms. In this paper we provide a comparative theoretical analysis of the algorithms that have been developed for Association Rule Hiding.

Keywords- Data Mining, Privacy Preserving, Association Rule Hiding

I. INTRODUCTION

Association rules are an important method of finding regularities or patterns in raw data, and they are one of the most important models developed and extensively studied by the database and data mining communities. Association mining finds application across many domains. One of the best known applications of association rule mining is in business, where the discovery of purchase behaviours or associations between products is very useful for decision making and for developing effective marketing strategies. In the last few years there has been significant development in the area of association rule mining.

Some recent applications of association rule mining include finding patterns in biological databases, extraction of knowledge from software engineering metrics, web personalization and text mining. Association rule mining can also play an important role in discovering knowledge from agricultural databases, survey data from agricultural research, data about soil and cultivation, and data linking geographical conditions and crop production, to name a few. Such knowledge can assist in making decisions regarding the selection of crops to be grown in a particular area based on certain geographical conditions, increasing the production of crops by selecting proper resources, and selecting a proper environment for crop production.

As databases grow day by day, the organizations that maintain them are concerned about the importance of such huge transaction databases. The hidden knowledge patterns can provide insight to the data holders and can be invaluable in important decision making and strategic planning. Sometimes organizations are interested in combining their datasets with those of organizations in similar fields in order to analyse their databases for mutual benefit. For example, a few hospitals may want to share the diagnosis details of their patients to perform research without revealing the details to each other or to a third party. Some banks may want to derive a criterion for a credit card policy or scheme without leaking the details of their customers to other banks. They then require a technique that can analyse their data while maintaining data privacy.

II. ASSOCIATION RULE HIDING

Let I = {i1, …, in} be a set of items from a database. Let D be a set of transactions. Each transaction t ∈ D is an item set such that t ⊆ I. A transaction t supports A, a set of items in I, if A ⊆ t. An association rule is an implication of the form A→B, where A and B are subsets of I and A∩B = Ø. The support, denoted σ, of rule A→B is computed by the following equation:

Support(A→B) = |A∪B| / |D|

where |A∪B| denotes the number of transactions in the database that contain the item set AB, and |D| denotes the number of transactions in the database D, which means that σ% of the transactions in D support item set AB. The confidence, denoted τ, of rule A→B is calculated by the following equation:

Confidence(A→B) = |A∪B| / |A|

where |A| is the number of transactions in database D that contain item set A, which means that τ% of the transactions in D that support A also support B. A rule A→B is strong if Support(A→B) ≥ min_support and Confidence(A→B) ≥ min_confidence, where min_support and min_confidence are two given minimum thresholds [7].

Association rule hiding algorithms prevent sensitive rules from being disclosed. The problem of association rule hiding can be stated as follows: “We are given a transactional database D with minimum confidence, minimum support and a set R of rules which
have been mined from database D. A subset RH of R is denoted as the set of sensitive association rules that must be prevented from being disclosed. The objective of association rule hiding is to transform D into a database D’ in such a way that nobody will be able to mine the association rules that belong to RH, while all non-sensitive rules in R remain unaffected” [7].

III. LITERATURE SURVEY

The concept of privacy preservation in data mining came into existence in response to concerns about protecting the private information that is produced as a result of data mining algorithms [8]. Two types of privacy concern have been raised with reference to data mining. The first type, termed output privacy, is that the data is minimally altered so that the mining result will preserve privacy. Many techniques have been proposed for this type of privacy; blocking, perturbation, aggregation, swapping and sampling are examples. For hiding specific rules or patterns under output privacy, techniques are available for association rules, classification rules and clustering rules. For hiding association rules, two approaches have been proposed. The first approach hides one rule at a time [5]. It first selects the transactions that contain the items in a given rule. It then modifies these transactions one by one until the support or the confidence of the rule falls below the minimum support or the minimum confidence. The modification is done either by deleting items from a transaction or by adding new items to it.

The second type of privacy concern, input privacy, is that the data is altered in such a way that the mining result is not affected or is affected only minimally [3]. Examples are cryptography-based techniques, which allow users access to only a subset of the data while global data mining results can still be discovered; multiparty computation is one such example. The second approach to hiding association rules deals with groups of restricted patterns or association rules at a time [10]. It first selects the transactions that contain the intersecting patterns of a group of restricted patterns. Then, on the basis of a disclosure threshold supplied by the user, it hides the restricted patterns by sanitizing a percentage of the selected transactions. In [4] the authors summarize the advantages and limitations of association rule hiding approaches.

Table I. Summary of association rule hiding approaches [4]

Approach | Advantages | Limitations
Heuristic Based Approaches (Distortion techniques) | Efficiency, scalability and quick responses, due to which it is getting focus by the majority of researchers. Totally takes best decision. | Produce undesirable side effects in the new database (i.e. lost rules and new rules).
Heuristic Based Approaches (Blocking technique) | Maintains truthfulness of the underlying data. Minimizes side effects. | Difficult to reproduce the original dataset.
Border Based Approaches | Maintains data quality by greedily selecting the modification with minimal side effects. Improvement over the pure heuristic approach. | Unable to identify the optimal hiding solution. Still dependent on heuristics to decide upon the item modification.
Exact Approaches | Guarantees better quality for hiding sensitive information than other approaches. | Requires very high time complexity due to integer programming.
Reconstruction Approaches | Create a privacy-aware database by extracting sensitive characteristics from the original database. Lesser side effects in the database than the heuristic approach. | The open problem is to restrict the number of transactions in the new database.
Cryptographic Approaches | Secure mining of association rules over a partitioned database. | Do not protect the output of the computation. Falls short of providing a complete answer to the problem of privacy preserving data mining. Communication and computation cost should be low.

In [8] the authors discussed three algorithms for hiding sensitive association rules. Algorithm 1 hides association rules by increasing the support of the rule's antecedent until the confidence of the rule falls below the minimum confidence threshold. Algorithm 2 hides sensitive rules by decreasing the frequency of the consequent until either the confidence or the support of the rule falls below its threshold. Algorithm 3 decreases the support of the sensitive rules until either their confidence falls below the minimum confidence threshold or their support falls below the minimum support threshold. Algorithm 1 introduces a large number of new frequent item sets and therefore generates an increasing number of new rules. Algorithms 2 and 3 affect a number of non-sensitive rules in the database because items are removed from transactions.

In [11] the authors present ISL (Increase Support of LHS) and DSR (Decrease Support of RHS). Item sets are given as input to both algorithms, which automatically hide sensitive association rules without pre-mining and selection of hidden rules. In [12] the authors proposed two algorithms, DCIS (Decrease Confidence by Increase Support) and DCDS (Decrease Confidence by Decrease Support), to automatically hide association rules without pre-mining and selection of hidden rules.
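As a minimal sketch of the two strategies named above, raising the support of a rule's left-hand side versus lowering the support of its right-hand side, the following Python fragment computes support and confidence as defined in Section II and applies each strategy to a toy database. The function names, the toy transactions and the thresholds are illustrative assumptions; this is not the published procedure of [8], [11] or [12], only an instance of the general idea.

```python
def support(db, items):
    """Fraction of transactions in db that contain every item in items."""
    items = set(items)
    return sum(items <= t for t in db) / len(db)


def confidence(db, lhs, rhs):
    """Support(lhs ∪ rhs) divided by Support(lhs); 0.0 if lhs never occurs."""
    s_lhs = support(db, lhs)
    return support(db, set(lhs) | set(rhs)) / s_lhs if s_lhs else 0.0


def is_strong(db, lhs, rhs, min_sup, min_conf):
    """A rule is strong when it meets both minimum thresholds."""
    return (support(db, set(lhs) | set(rhs)) >= min_sup
            and confidence(db, lhs, rhs) >= min_conf)


def hide_by_increasing_lhs(db, lhs, rhs, min_conf):
    """Hypothetical ISL-style pass: insert lhs into transactions that contain
    neither lhs nor rhs in full, until confidence drops below min_conf."""
    db = [set(t) for t in db]          # work on a copy of the database
    lhs, rhs = set(lhs), set(rhs)
    for t in db:
        if confidence(db, lhs, rhs) < min_conf:
            break                      # rule is hidden, stop modifying
        if not lhs <= t and not rhs <= t:
            t |= lhs                   # raises Support(lhs) only
    return db


def hide_by_decreasing_rhs(db, lhs, rhs, min_sup, min_conf):
    """Hypothetical DSR-style pass: delete one rhs item from transactions
    that fully support the rule, until the rule is no longer strong."""
    db = [set(t) for t in db]
    lhs, rhs = set(lhs), set(rhs)
    victim = sorted(rhs)[0]            # arbitrary choice of item to delete
    for t in db:
        if not is_strong(db, lhs, rhs, min_sup, min_conf):
            break
        if lhs | rhs <= t:
            t.discard(victim)          # lowers Support(lhs ∪ rhs)
    return db


# Toy database (assumed data): the sensitive rule is a -> b.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"a", "c"}, {"c"}, {"c", "d"}]
MIN_SUP, MIN_CONF = 0.3, 0.7

isl_db = hide_by_increasing_lhs(db, {"a"}, {"b"}, MIN_CONF)
dsr_db = hide_by_decreasing_rhs(db, {"a"}, {"b"}, MIN_SUP, MIN_CONF)

print(is_strong(db, {"a"}, {"b"}, MIN_SUP, MIN_CONF))
print(confidence(isl_db, {"a"}, {"b"}), confidence(dsr_db, {"a"}, {"b"}))
```

In this sketch the first strategy only inserts items, so it can create new frequent item sets, whereas the second only deletes items, so it can cause non-sensitive rules to be lost. If either loop runs out of transactions before the threshold is crossed, the rule stays strong, which corresponds to a hiding failure.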
ISL and DCIS algorithms try to increase the support of the left-hand side of the association rule, whereas DSR and DCDS try to decrease the support of the right-hand side. It is observed that the running time of ISL is longer than that of DSR. The two algorithms also exhibit contrasting side effects. In [13] an algorithm called DSC (Decrease Support and Confidence) is proposed, in which a pattern-inversion tree is used to store related information so that only one scan of the database is required. The proposed algorithm can automatically sanitize informative rule sets without pre-mining and selection of a class of rules under one database scan.

In [1] the authors discuss a heuristic algorithm, DSRRC (Decrease Support of R.H.S. item of Rule Clusters), which provides privacy for sensitive rules at a certain level while maintaining data quality. DSRRC clusters the sensitive association rules based on the R.H.S. of the rules and hides all possible rules at a time by modifying fewer transactions, which helps maintain data quality. DSRRC cannot hide rules having multiple R.H.S. items. In [9] the authors discuss four heuristic algorithms: Naïve, MinFIA (Minimum Frequency Item Algorithm), MaxFIA (Maximum Frequency Item Algorithm) and IGA (Item Grouping Algorithm). The Naïve algorithm removes all the items with the highest frequency in the database from the selected transaction. In MinFIA the item with the smallest support in the pattern is chosen as the sensitive item and is removed from the sensitive transactions. Unlike MinFIA, MaxFIA selects the item with the maximum support in the pattern as the sensitive item and removes it from the transaction. IGA groups the restricted patterns into groups of patterns sharing the same item sets, so that all sensitive patterns in a group are hidden in a single step.

In [2] the authors introduced an efficient algorithm known as FHSAR (Fast Hiding Sensitive Association Rules) for fast hiding of sensitive association rules. The algorithm can hide any given sensitive association rule by scanning the database a single time, which helps significantly in reducing the execution time. In [6] a hybrid algorithm is proposed that combines the ISL and DSR techniques and hides the association rules by modifying the database transactions so that the confidence of the association rules is reduced. Such an approach is expected to provide better results than using either ISL or DSR alone. In [7] the proposed algorithm does not modify the database transactions, so the support and confidence of the association rules remain unchanged. It scans the database fewer times and prunes a larger number of hidden rules.

IV. COMPARATIVE ANALYSIS OF ASSOCIATION RULE HIDING ALGORITHMS

In this section the table shows the comparative analysis of the various association rule hiding algorithms on the basis of theoretical study. In the table the algorithms discussed above are classified on the basis of insertion and deletion of the sensitive item set. The table also analyses the algorithms on the basis of parameters such as hiding failure, new rules generated and lost rules.

Table II. Comparative Analysis of various ARH Algorithms

Method of Hiding | Name of Algorithm | Item Hiding Algorithm | Rule Hiding (LHS or RHS) | Hiding Failure % | New Rule Generation % | Lost Rule %
By Adding the Sensitive Item Set | ISL | | LHS | 13 | 33 | 0
By Adding the Sensitive Item Set | DCIS | | RHS | 0 | 75 | 0
By Adding the Sensitive Item Set | Algorithm 1 | YES | | | |
By Deletion of Sensitive Item Set | DSR | | LHS | 0 | 5 | 11
By Deletion of Sensitive Item Set | DCDS | | RHS | 0 | 1 | 4
By Deletion of Sensitive Item Set | DSC | | BOTH | see note | see note | see note
By Deletion of Sensitive Item Set | NAÏVE | | BOTH | | |
By Deletion of Sensitive Item Set | MinFIA | | BOTH | | |
By Deletion of Sensitive Item Set | MaxFIA | | BOTH | | |
By Deletion of Sensitive Item Set | IGA | | BOTH | | |
By Deletion of Sensitive Item Set | FHSAR | YES | | | |
By Deletion of Sensitive Item Set | Algorithm 2 | YES | | | |
By Deletion of Sensitive Item Set | Algorithm 3 | YES | | | |
By Deletion of Sensitive Item Set | DSRRC | YES | | | |
Both (Insertion and Deletion) | Hybrid Algorithm | | BOTH | | |
Both (Insertion and Deletion) | In [9] | | BOTH | | |

Note: for DSC only two percentage values, 4 and 9, are listed, without an indication of which of the three percentage columns they belong to.

V. CONCLUSION

The algorithms were discussed and compared using the tables in Sections III and IV. It is found that the algorithms used for association rule hiding hide the rules by using either the left-hand side (LHS) or the right-hand side (RHS) of the association rule. Some algorithms introduce new association rules, while others suffer from lost rules. All the algorithms discussed in Section III focus on hiding the sensitive item set or association rule but do not emphasize the number of rules while hiding the association rules. In the near future the focus will be on developing an algorithm that produces a minimum number of rules when association rule hiding is applied.

VI. REFERENCES

[1] C. N. Modi, U. P. Rao & D. R. Patel, “Maintaining privacy and data quality in privacy preserving association rule mining”, International Conference on Computing Communication and Networking Technologies (ICCCNT), 2010.
[2] Chih-Chia Weng, Shan-Tai Chen & Hung-Che Lo, “A Novel Algorithm for Completely Hiding Sensitive Association Rules”, IEEE Intelligent Systems Design and Applications, 2008.
[3] E. Dasseni, V. Verykios, A. Elmagarmid & E. Bertino, “Hiding association rules by using confidence and support”, in Proceedings of 4th Information Hiding Workshop, Pittsburgh, 2001.
[4] Khyati B. Jadav, Jignesh Vania & Dhiren R. Patel, “A Survey on Association Rule Hiding Methods”, International Journal of Computer Applications, November 2013.
[5] Komal Shah, Amit Thakkar & Amit Ganatra, “A Study on Association Rule Hiding Approaches”, International Journal of Engineering and Advanced Technology (IJEAT), February 2012.
[6] Niteen Dhutraj, Siddhart Sasane & Vivek Kshirsagar, “Hiding Sensitive Association Rule for Privacy Preservation”, IEEE Transactions on Knowledge and Data Engineering, 2013.
[7] Padam Gulwani, “Association Rule Hiding by Positions Swapping of Support and Confidence”, International Journal of Information Technology and Computer Science, 2012.
[8] R. Agrawal & R. Srikant, “Privacy preserving data mining”, in ACM SIGMOD Conference on Management of Data, Dallas, Texas, May 2000.
[9] Stanley R. M. Oliveira & Osmar R. Zaiane, “Privacy Preserving Frequent Itemset Mining”, IEEE International Conference on Data Mining Workshop on Privacy, Security, and Data Mining, Maebashi City, Japan, Conferences in Research and Practice in Information Technology, 2002.
[10] S. Oliveira & O. Zaiane, “Algorithms for balancing privacy and knowledge discovery in association rule mining”, in Proceedings of 7th International Database Engineering and Applications Symposium (IDEAS03), Hong Kong, July 2003.
[11] Shyue-Liang Wang, Bhavesh Parikh & Ayat Jafari, “Hiding informative association rule sets”, Elsevier, Expert Systems with Applications, 2007.
[12] Shyue-Liang Wang, Dipen Patel, Ayat Jafari & Tzung-Pei Hong, “Hiding collaborative recommendation association rules”, Springer Science+Business Media, LLC, 2007.
[13] Shyue-Liang Wang, Rajeev Maskey, Ayat Jafari & Tzung-Pei Hong, “Efficient sanitization of informative association rules”, ACM, Expert Systems with Applications: An International Journal, July 2008.
