Professional Documents
Culture Documents
Example 1 Assuming that the min_supp = 33% and Table 5 Databases before and after hiding item C and item
min_conf = 70%, the result of hiding item C and then item B using DSR
B using ISL algorithm is as follows. To hide item C, the TID D D3
rule C => B (50%, 75%) will be hidden if transaction T5 is T1 111 001
modified from 100 to 101 using ISL (Increase Support of T2 111 111
LHS). The new database D1 is shown in Table 3. The rule T3 111 111
C => B will have support = 50% and confidence = 60%. T4 110 010
However, rules C => A, B => A, B => C cannot be hidden T5 100 100
by ISL algorithm. T6 101 001
Table 3 Databases before and after hiding item C and item Example 4 As in example 3, reversing the order of hiding
B using ISL items, the result of hiding item B and then item C using
TID D D1 DSR algorithm is as follows. To hide item B, the rule B
T1 111 111 => A (60%, 100%), B => C (50%, 75%), C => A (60%,
T2 111 111 100%), C => B (50%, 75%) will be hidden if transaction T4
T3 111 111 is modified from 110 to 010, T1 is modified from 111 to
T4 110 110 011, T1 is modified from 011 to 010, T6 is modified from
T5 100 101 101 to 001, using DSR. The new database D4 is shown in
Table 6.
T6 101 101
Table 6 Databases before and after hiding item B and item
C using DSR
Example 2 As in example 1, reversing the order of hiding TID D D4
items, the result of hiding item B and then item C using ISL T1 111 010
algorithm is as follows. To hide item B, the rule B => C T2 111 111
(50%, 75%) will be hidden if transaction T5 is modified T3 111 111
from 100 to 110 using ISL. The new database D2 is shown T4 110 010
in Table 4. The rule B => C will have support = 50% and T5 100 100
confidence = 60%. However, rules B => A, C => A, C => T6 101 001
B cannot be hidden by ISL algorithm.
One observation is that different sequences of hiding
One observation we can make is that different sequences of items will result in different transformed databases, i.e., D3
hiding items will result in different transformed databases, and D4 for DSR algorithm.
i.e., D1 and D2 for ISL algorithm.
Table 4 Databases before and after hiding item B and item 5 Analysis
C using ISL This section analyzes some of the characteristics of
TID D D2 the proposed algorithms and compares with the algorithms
T1 111 111 proposed in Verykios etc’s [9,23]. The first characteristic
T2 111 111 we observe is the item ordering effect. The transformed
T3 111 111 databases are different under different ordering of hiding
T4 110 110 items, even though the same set of sensitive items is
T5 100 110 specified. The second characteristic we observe is the
T6 101 101 algorithm effect. The transformed databases will be
different under different algorithm. These characteristics
are demonstrated in the four examples in section 4 and
summarized in Table 7. Databases D1 and D2 are resulting The fourth characteristic we analyze is efficiency
databases using ISL algorithm and D3 and D4 are resulting comparison of the ISL and DSR algorithms. One
databases using DSR algorithm. observation we conclude from the examples in section four
is that DSR algorithm seems to be more effective when the
Table 7 Databases before and after hiding items B and C support count of the hidden item is large. This is due to
using ISL and DSR when support of right hand side of the rule is large;
TID D D1 D2 D3 D4 increase support of left hand side usually does not reduce
T1 111 111 111 001 010 the confidence of the rule. However, decrease support of
T2 111 111 111 111 111 right hand side usually decreases the confidence of the rule.
T3 111 111 111 111 111
T4 110 110 110 010 010 6 Conclusions
T5 100 101 110 100 100
T6 101 101 101 001 001 In this work, we have studied the database privacy
problems caused by data mining technology and proposed
two naïve algorithms for hiding sensitive predictive
The third characteristic we analyze is the efficiency of
association rules in association rules mining. The proposed
the proposed algorithm compared with the Verykios etc
algorithms are based on modifying the database
algorithms. Even though it is the hidden rules, instead of
transactions so that the confidence of the association rules
hidden items, that are specified in [9,23], we compare the
can be reduced. Examples demonstrating the proposed
number of database scanning and the number of rules
algorithms are shown. The item ordering and algorithm
pruned between the two approaches. Table 8 summarizes
ordering characteristics of the proposed algorithms are
the results.
analyzed. The efficiency of the proposed approach is
further compared with Verykios etc’s [9,23]. It was shown
For ISL algorithm, the number of database scanning
that our approach required less number of database
comes from the calculation of large one itemsets, large two
scanning and prune more number of hidden rules.
itemsets, and transactions TL. The rules pruned are AC =>
However, our approach must hide all rules containing the
B and C => AB. For Verykios etc’s 1a algorithm, the
hidden items on the left hand side, where Verykios etc’s
number of database scanning comes from the calculation of
approach can hide any specified rule. Currently we are
large one itemsets, large two itemsets, large three itemsets,
performing numerical simulation on the time effects and
and partial support transactions T. No rules are pruned in
side effects (the number of lost rules and new rules due to
the Verykios etc’s algorithm. It can be seen that the ISL
alternation of the database) and will be included in this
algorithm requires less database scanning and prune more
work. In the future, we will examine and compare with
number of association rules. Similar results are obtained
other alternation techniques for hiding predictive
for comparing DSR algorithm and Verykios etc’s 1b
association rules based on current approach
algorithm.
[9] E. Dasseni, V. Verykios, A. Elmagarmid and E. [21] J. Vaidya and C.W. Clifton. “Privacy preserving
Bertino, “Hiding Association Rules by Using Confidence association rule mining in vertically partitioned data”, In
and Support” in Proceedings of 4th Information Hiding Proc. of the 8th ACM SIGKDD Int’l Conference on
Workshop, 369-383, Pittsburgh, PA, 2001. Knowledge Discovery and Data Mining, Edmonton,
Canada, July 2002.
[10] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke,
“Privacy preserving mining of association rules”, In Proc. [22] V. Verykios, E. Bertino, I.G. Fovino, L.P. Provenza,
Of the 8th ACM SIGKDD Int’l Conference on Knowledge Y. Saygin, and Y. Theodoridis, “State-of-the-art in Privacy
Discovery and Data Mining, Edmonton, Canada, July 2002. Preserving Data Mining”, SIGMOD Record, Vol. 33, No. 1,
50-57, March 2004.
[11] Alexandre Evfimievski, “Randomization in Privacy
Preserving Data Mining”, SIGKDD Explorations, 4(2), [23] V. Verykios, A. Elmagarmid, E. Bertino, Y. Saygin,
Issue 2, 43-48, Dec. 2002. and E. Dasseni, “Association Rules Hiding”, IEEE
Transactions on Knowledge and Data Engineering, Vol. 16,
[12] Alexandre Evfimievski, Johannes Gehrke and No. 4, 434-447, April 2004.
Ramakrishnan Srikant, “Limiting Privacy Breaches in
Privacy Preserving Data Mining”, PODS 2003, June 9-12,
2003, San Diego, CA.