
A Novel Approach for Privacy Preserving Publication of Data

V. Valli Kumari¹, S. Ram Prasad Reddy¹, M. Aruna Sowjanya¹, B. Jhansi Vazram², KVSVN Raju¹

¹Andhra University College of Engineering, Visakhapatnam-530003, India, {vallikumari,reddysadi,arunasowjanya,kvsvn.raju}@gmail.com
²Narasaraopet Engineering College, Narasaraopet, India, jhansi_bolla@gmail.com

Abstract

Protecting the privacy of an individual while revealing knowledge about that person's sensitive information is a much-discussed concern in the data mining world. Several solutions to this concern have been reported in the literature, but most of the solutions that claim to preserve privacy end up with less information in the resulting data: the information traded away for privacy is very large. In this paper, we present a fuzzy and taxonomy-tree based solution in which optimal and useful knowledge about an individual's information can be retained while still preserving that individual's privacy. Our approach allows the owner of the data to prescribe privacy levels and disclosure levels. We also compare the performance of our model with others such as k-anonymity and l-diversity.

Keywords: Privacy preserving, data privacy, fuzzy information, anonymity, diversity

1. Introduction

Organizations maintain person-specific data in large volumes for their own purposes: banks maintain customer information, hospitals maintain patient records, and so on. These organizations share their data with many research institutes for various uses, and today's technologies make such information sharing easy. However, sharing data with outsiders should not reveal the identity of any individual [10]. Care must be taken to protect the privacy of person-specific data when publishing personal information for research purposes.

Data is stored in tables, so a common way to provide privacy is to release person-specific data after removing all identifiers that explicitly identify individuals, such as Name, SSN, and Address. But this approach does not provide complete security. Consider, for example, a scenario where a hospital releases its patient data after removing all explicit identifiers. Anyone can obtain the voter registration list, which is public. By carefully linking the two, the released hospital data and the voter registration list, one can easily find out sensitive information about a person. Studies show that 87% of the US population can be uniquely identified by the combination of three attributes: date of birth, gender, and 5-digit ZIP code. The attributes that in combination, when linked with external data, uniquely identify individuals are called quasi-identifiers [1].

Several approaches have evolved to provide data privacy, such as k-anonymity, l-diversity, t-closeness, and personalized privacy preservation. Fuzzy-based privacy preservation addresses the privacy problem more attractively than the above approaches in terms of both computational effort and privacy level.

Contributions: The idea of using fuzzy sets for preserving privacy was discussed in [9] and [4]. This paper shows that fuzzy-based privacy-preserving models are better than k-anonymity and l-diversity, as they preserve information along with privacy. It reports the results of our experiments on the Adult data set of the UCI Machine Learning Repository [8]. Implementations were done in Java for k-anonymity, l-diversity, and our fuzzy-based model, for both numerical and categorical attributes.

Outline of the paper: Section 2 presents related work in the privacy area. Section 3 gives a brief overview of the proposed system. Section 4 discusses implementation issues and the issues covered by the fuzzy-based model. Section 5 presents the experimental results. Section 6 concludes the paper and outlines future work.

2. Related work

k-anonymity is the basic model for providing data privacy [1]. A release provides k-anonymity protection if the information for each person contained in the release cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release [1, 3]. In other words, a table is k-anonymous if every record in it is indistinguishable from at least k-1 other records with respect to the set of quasi-identifiers. k-anonymity provides data privacy by suppressing the explicit identifiers and generalizing the quasi-identifiers. Generalization is the process of replacing an attribute value with a more general value [3], for example a range of values instead of a single value. Tables 1 and 2 show an original table and its corresponding 2-anonymous table.
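To make the generalization step concrete, the following is a minimal sketch in Java (our illustration, not from the paper; the bucket width of 10 simply mirrors the Age ranges of Table 2) of how a numerical quasi-identifier is replaced by a containing range.

import java.util.*;

public class Generalizer {

    // Generalize an age into a decade range such as "1-10" or "11-20",
    // matching the Age column of Table 2.
    static String generalizeAge(int age) {
        int lower = ((age - 1) / 10) * 10 + 1;  // ranges are 1-10, 11-20, 21-30, ...
        int upper = lower + 9;
        return lower + "-" + upper;
    }

    public static void main(String[] args) {
        for (int age : new int[] {5, 9, 12, 19, 26, 28}) {
            System.out.println(age + " -> " + generalizeAge(age));
        }
    }
}

Suppression works at the level of whole values rather than ranges: an explicit identifier such as Name is dropped entirely instead of being widened.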

Table 1: Original Patient Table

Age  Income  Zipcode  Disease
5    12000   12000    gastric ulcer
9    14000   14000    dyspepsia
6    18000   18000    pneumonia
8    19000   19000    bronchitis
12   22000   22000    pneumonia
19   24000   24000    pneumonia
56   33000   33000    flu
26   36000   36000    gastritis
28   37000   37000    pneumonia
21   58000   58000    flu

Table 3: 2-Diversity Table

Age    Salary       Zipcode      Disease
1-10   10001-20000  10001-20000  gastric ulcer
1-10   10001-20000  10001-20000  dyspepsia
1-10   10001-20000  10001-20000  pneumonia
1-10   10001-20000  10001-20000  bronchitis
11-30  20001-30000  20001-30000  pneumonia
11-30  20001-30000  20001-30000  pneumonia
11-30  20001-40000  20001-40000  flu
11-30  20001-40000  20001-40000  gastritis
11-30  20001-40000  20001-40000  pneumonia
11-30  20001-40000  20001-40000  flu

Table 2: 2-Anonymous Table

Age    Income       Zipcode      Disease
1-10   10001-20000  10001-20000  gastric ulcer
1-10   10001-20000  10001-20000  dyspepsia
1-10   10001-20000  10001-20000  pneumonia
1-10   10001-20000  10001-20000  bronchitis
11-20  20001-30000  20001-30000  pneumonia
11-20  20001-30000  20001-30000  pneumonia
21-30  30001-40000  30001-40000  flu
21-30  30001-40000  30001-40000  gastritis
21-30  30001-40000  30001-40000  pneumonia
21-30  30001-40000  30001-40000  flu

k-anonymity has several drawbacks [6]. First, it only prevents association between individuals and tuples, not between individuals and sensitive values. Second, knowledge derived from a k-anonymous table relates poorly to accurate results, and the model does not provide privacy based on personal priorities [6]. Several attacks are possible on a k-anonymous table, such as the homogeneity attack and the background knowledge attack [2]. A k-anonymous table may allow an adversary to derive the sensitive information of an individual with 100% confidence [6]: when the records in a k-anonymous table sharing one quasi-identifier combination all carry the same sensitive attribute value, each individual record is definitely associated with that single value. In short, k-anonymity lacks diversity in its sensitive attributes.

To overcome this problem, another approach called l-diversity was proposed. l-diversity provides privacy beyond k-anonymity. Once we have a k-anonymous table, we can make it l-diverse by dividing the table into equivalence classes and ensuring diversity of the sensitive attribute within each equivalence class. l-diversity thus also counters the background knowledge attack on k-anonymity. An equivalence class is l-diverse if it contains at least l well-represented values for the sensitive attribute; a table is l-diverse if every equivalence class is l-diverse [2]. Table 3 gives a 2-diverse version of Table 1.

k-anonymity and l-diversity provide data privacy by treating the entire data set as a single entity, with the same priority for each tuple. They protect every person at the same level and do not take personal anonymity requirements into account [6]. This may lead to insufficient protection for one subset of people while applying excessive privacy control to another. To resolve these problems, a new model named personalized privacy preservation was proposed. In this method, the sensitive attributes are also generalized when preserving the privacy of published data, and a person can specify the degree of privacy protection for his or her sensitive values. The motive is to preserve as much information from the original data as possible without violating any privacy constraint. The personalized privacy preservation model addresses the direct association between individuals and their sensitive values. It requires two generalizations: it first generalizes the quasi-identifiers and then, based on personal specifications, generalizes the sensitive attributes [6]. It uses a greedy algorithm for generalization, which is not optimal and so does not achieve minimal loss. Computational effort also increases, since the sensitive attributes need generalization as well.

t-Closeness is another way of providing privacy, developed to overcome the limitations of l-diversity. t-closeness formalizes the idea of global background knowledge by requiring that the distribution of a sensitive attribute in any equivalence class be close to the distribution of that attribute in the overall table. An equivalence class has t-closeness if the distance between the distribution of a sensitive attribute in the class and its distribution in the whole table is no more than a threshold t. A table has t-closeness if all of its equivalence classes have t-closeness [5].
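Returning to the l-diversity definition, the sketch below (our illustration, not the authors' code) checks its simplest instantiation, distinct l-diversity: tuples are grouped by their quasi-identifier values, and each equivalence class must contain at least l distinct sensitive values. "Well represented" also has stronger formalizations, such as entropy l-diversity [2], which this sketch does not cover.

import java.util.*;

public class LDiversityCheck {

    // Each tuple is a pair: t[0] = quasi-identifier signature, t[1] = sensitive value.
    static boolean isLDiverse(List<String[]> tuples, int l) {
        Map<String, Set<String>> classes = new HashMap<>();
        for (String[] t : tuples)
            classes.computeIfAbsent(t[0], k -> new HashSet<>()).add(t[1]);
        // Every equivalence class must hold at least l distinct sensitive values.
        return classes.values().stream().allMatch(s -> s.size() >= l);
    }

    public static void main(String[] args) {
        List<String[]> table = List.of(
                new String[] {"1-10|10001-20000", "gastric ulcer"},
                new String[] {"1-10|10001-20000", "dyspepsia"},
                new String[] {"11-30|20001-40000", "flu"},
                new String[] {"11-30|20001-40000", "gastritis"});
        System.out.println(isLDiverse(table, 2));  // true: both classes hold 2 distinct diseases
        System.out.println(isLDiverse(table, 3));  // false
    }
}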

3. Fuzzy based model

The model uses fuzzy sets to provide personalized privacy. Fuzzy set theory is used because it permits a gradual assessment of the membership of elements in relation to a set, described with the aid of a membership function [7]:

μ: X → [0, 1]

Instead of the binary values 0 and 1, we assign weights in the range 0 to 1, which convey more information about the original values. For example, if Jane and James belong to a fuzzy set called 'old' and the membership values associated with them are 0.7 and 0.8 respectively, then James is taken to be older than Jane. Both belong to the same set, yet we can still make out the difference in their ages from the membership values. In spite of being able to make out this difference, we cannot conclude anything about their exact ages.

The main goal of our model is to provide personalized privacy for both numerical and categorical attributes. The model protects the entire data set either at the same level or according to individual preferences. At the same time, providing personalized privacy should not increase the computational effort. To reduce this effort, we introduce two new attributes into the table before generalizing: PR (Privacy Reveal) and DL (Disclosure Level). PR corresponds to a row and specifies whether or not to reveal that tuple. DL corresponds to a sensitive attribute and specifies whether or not to transform its value; DL is attached only to categorical attributes.

If PR is set to true, the user is willing to reveal the data; otherwise the user is not willing to release his or her data. If DL is set to true, the user does not care about disclosure of the attribute value; otherwise the value of the sensitive attribute must be transformed. A taxonomy tree is assumed for each sensitive attribute. After the user specifies his or her preferences, a table is derived from the original table containing only the tuples whose PR values are set to true. This table is then generalized to provide privacy; since the size of the table is reduced, it takes less time to compute.

The fuzzy-based privacy-preserving approach is practically feasible for achieving maximum privacy with more information and minimum overhead (only the necessary tuples are transformed). Addressing the data privacy problem with fuzzy sets is a paradigm shift and a new perspective on the privacy problem in data publishing. Domain generalization [11] based solutions completely disassociate the sensitive values from the identifying attributes. Our practically feasible domain generalization method additionally allows personalized privacy preservation and is useful for both numerical and categorical attributes. Thus, this model ensures i) complete privacy, ii) more information, iii) less computational effort, iv) personalized privacy preservation, and v) privacy for both numerical and categorical attributes.

[Figure 1. System Architecture. The DB feeds the Dimensionality Reducer (DR); the reduced data, the business rules (BR), and the user privacy policy (PR) pass through a privacy compliance checker to the Categorizer, which separates the sensitive attributes to be transformed into numerical and categorical ones. Together with the user disclosure policy (DL), the numerical attributes go to the Fuzzifier and the categorical ones to the taxonomy-tree transformer (TT) inside the privacy preserving module, which produces the publishable data.]

4. System Architecture

Our privacy-preserving model primarily has two objectives: i) preserving privacy while revealing useful information for both sensitive numerical attributes and sensitive categorical (non-numerical) attributes, and ii) finding a generalized table T* that includes all the attributes of T, such that no individual tuple of T is identifiable in T*. Let T be a relation holding information about a set of individuals, each associated with a tuple t. The system architecture is given in Figure 1. The steps involved are listed below; a skeleton of the pipeline follows the list.

1. Dimensionality reduction (DR)
2. Identifying sensitive attributes through business rules (BR)
3. Categorizing the attributes (Categorizer)
4. Inducting the personalized privacy reveal and disclosure level (PR and DL)
5. Fuzzy-based transformation for sensitive numerical attributes (Fuzzifier)
6. Taxonomy-based transformation for sensitive categorical attributes (TT)
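The skeleton below is our reading of Figure 1 in Java, not the authors' code; every class, method, and attribute name (including the reserved "PR" key) is an illustrative assumption, and steps 2 and 3 are represented simply by the two sets of sensitive attribute names handed to the pipeline.

import java.util.*;
import java.util.stream.*;

/* Hypothetical skeleton of the Figure 1 pipeline. A tuple is modelled as
   an attribute-name -> value map; the key "PR" carries the per-tuple
   privacy-reveal flag. The reveal filter is applied up front, as Section 3
   describes, so later stages touch only publishable tuples. */
public class PrivacyPipeline {

    static List<Map<String, Object>> publish(List<Map<String, Object>> table,
                                             Set<String> keptAttributes,       // step 1: DR
                                             Set<String> numericalSensitive,   // steps 2-3: BR + Categorizer
                                             Set<String> categoricalSensitive) {
        return table.stream()
                .filter(t -> Boolean.TRUE.equals(t.get("PR")))  // step 4: keep PR = t tuples only
                .map(t -> project(t, keptAttributes))           // step 1: dimensionality reduction
                .map(t -> fuzzify(t, numericalSensitive))       // step 5: Fuzzifier
                .map(t -> taxonomize(t, categoricalSensitive))  // step 6: taxonomy tree TT
                .collect(Collectors.toList());
    }

    // Projection onto the kept attribute set (Axiom 1 in Section 4.1).
    static Map<String, Object> project(Map<String, Object> t, Set<String> keep) {
        Map<String, Object> out = new LinkedHashMap<>(t);
        out.keySet().retainAll(keep);
        return out;
    }

    // Deliberately trivial stubs; Sections 4.4 and 4.5 define the real rules.
    static Map<String, Object> fuzzify(Map<String, Object> t, Set<String> attrs) { return t; }
    static Map<String, Object> taxonomize(Map<String, Object> t, Set<String> attrs) { return t; }
}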

4.1 Dimensionality reduction

Dimensionality reduction and attribute selection aim at choosing a subset of attributes sufficient to describe the data set. The goal of dimensionality reduction methods is to map d-dimensional objects to k-dimensional objects, where k < d. Dimensionality reduction is beneficial only when the loss of information is not critical to the solution of the problem, or when more is gained from the simplified view of the problem than is lost. Reduction from dimension d to k (k < d) reduces complexity, reduces communication cost, and improves privacy, since extra attributes may help in re-identifying individuals or leak sensitive information vulnerable to secondary use. Let d be the number of attributes in T and D the set of these attributes. T1 is the relation obtained after dimensionality reduction, holding the k attributes of a set K.

Axiom 1: T1 = π_K(T), where K ⊆ D.

4.2 Categorizing attributes

The attributes in T1 are classified as identifier attributes, sensitive attributes, and quasi-identifiers.

1. Identifier attributes (A_i): These attributes uniquely identify the individual associated with the tuple; anonymization requires that the data be disassociated from them. A specific example is the Name attribute.

2. Sensitive attributes (A_s): These attributes should not be disclosed to the public, or may be disclosed only after their values are disassociated from the individual's other information. Examples are Disease and Income, as shown in Tables 1 and 4.

3. Quasi-identifiers (A_qi): These values may be published, but a combination of them may identify an individual. For instance, Age and Zipcode together might disclose an identity.

4.3 Sensitive attribute and individual privacy

Though name and address are suppressed in medical records, the key point is that the data, when joined with data from other sources, should not reveal the individual. The objective of privacy-preserving mining [10] is that the published data should not link back to the individual. One possible solution is to protect the sensitive attribute value itself, by either generalizing it or transforming it by some method. But such a solution applies to all tuples with the same priority, without any concern for individual privacy. The advantages of considering individual privacy are twofold:

i) giving priority to the individual who owns the tuple and allowing graded levels of privacy, and
ii) reducing the computational effort.

To reduce the computational effort and increase the owner's control over his data, an attribute PR (Privacy Reveal) is introduced into the table. The value of PR tells whether the data is to be released or not, in other words whether the sensitive attribute needs to be transformed. If the user prefers to release the data, he can decide on the level of disclosure by setting the parameter DL (Disclosure Level). The value of DL decides whether the data is to be released partially or in full. DL is valid only for categorical attributes. Both PR and DL are Boolean attributes. Thus two more attributes, PR (one per tuple) and DL (one per sensitive categorical attribute), are added to relation T1. We call the new relation T2.

Axiom 2: Two attributes PR and DL are added to relation T1 to form T2.

Privacy Reveal (PR): The user is given the chance to select his level of privacy by setting PR to true (t) or false (f). Setting PR to true means the user is willing to disclose the data; if it is false, the user is not willing to give the data, and the whole row pertaining to the user is suppressed. This increases personalized privacy and also reduces the computational effort. T3 is the relation containing only the publishable tuples.

Axiom 3: T3 = σ_(PR=t)(T2).

Disclosure Level (DL): A DL attribute is attached to each categorical attribute in the table, which allows the user to choose graded personalized privacy. A taxonomy tree with three levels is assumed for each sensitive attribute; an example taxonomy tree is given in Figure 2.

4.4 Transformation for numerical attributes

Assume the data in Table 4 is to be published and that the user-specified sensitive attribute is Income, so A_s = {Income}. The following procedure transforms the table into publishable form. As Income is sensitive and numerical, Rule 1 is applied to transform its values. L is the linguistic term set with {l1, l2, ..., ln} as the linguistic values, A_s^i is the i-th sensitive attribute in the set A_s, and n is the number of linguistic values. We select all tuples of T3 into T4: let T4 = {t : t ∈ T3}.

Rule 1: Given L = {l1, l2, ..., ln}, then for all t ∈ T4: F(t.A_s^i) → l, such that l ∈ L.

Suppose the linguistic term set for the variable Income is L(A_s=Income) = {High, Medium, Low}, with membership functions defined below. The minimum and maximum values of Income according to the business organization are min and max respectively; a1, a2, ..., ak are the midpoints of the fuzzy sets, where k is the number of fuzzy sets.
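Before the membership functions can be written down, the midpoints a1, ..., ak must be fixed. The paper leaves their placement to the business organization; the helper below assumes the simplest possible choice, equally spaced midpoints between min and max, purely for illustration.

public class FuzzyMidpoints {

    // Hypothetical helper: k equally spaced midpoints over [min, max].
    // Equal spacing is our assumption; the paper only says that min, max
    // and the midpoints come from the business organization.
    static double[] midpoints(double min, double max, int k) {
        double[] a = new double[k];
        double step = (max - min) / (k + 1);
        for (int i = 0; i < k; i++) a[i] = min + (i + 1) * step;
        return a;
    }

    public static void main(String[] args) {
        // Income range of Table 4: min = 10000, max = 94000, k = 3 sets.
        for (double m : midpoints(10000, 94000, 3))
            System.out.println(m);  // prints 31000.0, 52000.0, 73000.0
    }
}

These values are reused by the Fuzzifier sketch that follows the membership function definitions below.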

The k fuzzy sets have the ranges {min-a2}, {a1-a3}, ..., {a(i-1)-a(i+1)}, ..., {a(k-1)-max}. For the fuzzy sets with midpoints a1, a2, ..., ak, the membership functions are f1, f2, and f3 for Low, Medium, and High respectively (for k = 3).

f1(x) = 1.0                          if x = min
      = (x - a2) / (min - a2)        if x < a2
      = 0                            if x >= a2

For a fuzzy set with midpoint ai, 2 <= i <= k-1, the membership function is

fi(x) = 0                                if x <= a(i-1)
      = (x - a(i-1)) / (ai - a(i-1))     if a(i-1) < x < ai
      = 1.0                              if x = ai
      = (a(i+1) - x) / (a(i+1) - ai)     if ai < x < a(i+1)
      = 0                                if x >= a(i+1)

For the fuzzy set with midpoint ak, the membership function is

fk(x) = 0                                if x <= a(k-1)
      = (x - a(k-1)) / (max - a(k-1))    if x > a(k-1)
      = 1.0                              if x = max

Table 4. Microdata of employee table

Name    Age  Gender  Zipcode  Income
Arun    52   M       12000    10000
Keller  60   M       18000    23000
Mani    81   M       19000    20000
Joe     42   M       22000    58000
Syam    19   M       24000    85000
Rama    21   F       58000    94000

Table 5. Transformed values

Income      10000  23000  58000   85000  94000
Weight      1.0    0.71   0.73    0.66   0.86
Changed to  low    low    medium  high   high

The Income attribute values of Table 4 after applying the above transformations, along with the weights (f1, f2, f3), are given in Table 5. The weights help the end user of the data distinguish between two attribute values even when they are mapped to the same linguistic term. For instance, in Table 5 both 10000 and 23000 are mapped to low, yet the relativeness (informativeness) is maintained by the weight: the low associated with 10000 is still lower than the low associated with 23000. The data in publishable form carries a weight with every transformed value, as in Table 5; the Income values themselves are not published. This is how we retain informativeness in the data while preserving privacy.

4.5 Transformation for categorical attributes

For categorical attributes like Disease, the following taxonomy tree is used.

[Figure 2. Taxonomy tree for Disease: Disease branches into Respiratory Problem (Flu, Pneumonia, Bronchitis) and Digestive Problem (Gastric Ulcer, Dyspepsia, Gastritis).]

In the taxonomy tree pertaining to a specific attribute, we associate a sensitivity level with each attribute value, and the user has the choice of defining the sensitivity level for each sensitive attribute. For Disease, for instance, we have DL = {t, f}. If the selected level is false, the ancestor of the attribute value is returned in response to a query on that tuple; if it is true, the attribute value itself is returned.

For each row we have PR, as fixed by the user, with possible values t/f: the user sets it to 't' to have the data revealed and to 'f' otherwise. For each sensitive attribute we assign a Disclosure Level (DL) specifier: if the user does not mind revealing the sensitive data as it is, DL is set to 't'; if one wants to disclose the data but disassociate it from oneself, DL is set to 'f'. Let TT be the 3-level taxonomy tree for a given sensitive attribute. The transformation of the categorical attribute is done if and only if PR = t.

Rule 2: For all t1 in T4 with t1.DL = f: F(t1.A_s^i) → parent(TT, t1.A_s^i), if and only if PR = t, for each sensitive categorical attribute A_s^i.

Rule 3: For all t1 in T4 with t1.DL = t: F(t1.A_s^i) → t1.A_s^i, if and only if PR = t, for each sensitive categorical attribute A_s^i.

Table 6. Transformed values for a categorical attribute

PR  Disease        DL  Transformed to
t   Gastric Ulcer  t   Gastric Ulcer
f   Pneumonia      t   Not published
f   Pneumonia      f   Not published
t   Gastric Ulcer  f   Digestive Problem

Other attributes: For other attributes like Zipcode and Gender, a taxonomy tree can likewise be constructed.
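As a concrete rendering of the membership functions above, here is a minimal sketch of the Fuzzifier (our illustration in Java, not the authors' implementation; the equally spaced midpoints from the earlier helper and the rule of picking the term with the highest membership are assumptions): it maps an Income value to a linguistic term and keeps the winning membership degree as the published weight, in the manner of Table 5.

import java.util.*;

public class Fuzzifier {

    final double min, max;
    final double[] a;      // midpoints a1..ak
    final String[] terms;  // one linguistic term per fuzzy set

    Fuzzifier(double min, double max, double[] midpoints, String[] terms) {
        this.min = min; this.max = max; this.a = midpoints; this.terms = terms;
    }

    // f1: weight 1.0 at x = min, falling to 0 at a2.
    double f1(double x) {
        return x >= a[1] ? 0.0 : (x - a[1]) / (min - a[1]);
    }

    // fi, 2 <= i <= k-1: triangular peak at a_i between a_(i-1) and a_(i+1).
    double fi(int i, double x) {
        double lo = a[i - 2], mid = a[i - 1], hi = a[i];
        if (x <= lo || x >= hi) return 0.0;
        return x <= mid ? (x - lo) / (mid - lo) : (hi - x) / (hi - mid);
    }

    // fk: rising from 0 at a_(k-1) to 1.0 at x = max.
    double fk(double x) {
        double prev = a[a.length - 2];
        return x <= prev ? 0.0 : (x - prev) / (max - prev);
    }

    double membership(int set, double x) {  // set is 1-based
        if (set == 1) return f1(x);
        if (set == a.length) return fk(x);
        return fi(set, x);
    }

    // Rule 1: map x to the linguistic term with the highest membership,
    // publishing that membership as the weight (as in Table 5).
    String fuzzify(double x) {
        int best = 1;
        double bestW = membership(1, x);
        for (int s = 2; s <= a.length; s++) {
            double w = membership(s, x);
            if (w > bestW) { best = s; bestW = w; }
        }
        return terms[best - 1] + " (weight " + String.format("%.2f", bestW) + ")";
    }

    public static void main(String[] args) {
        Fuzzifier f = new Fuzzifier(10000, 94000,
                new double[] {31000, 52000, 73000},
                new String[] {"low", "medium", "high"});
        for (double income : new double[] {10000, 23000, 58000, 85000, 94000})
            System.out.println((long) income + " -> " + f.fuzzify(income));
    }
}

With these assumed midpoints the program prints weights close to, but not identical with, those of Table 5 (for example 23000 -> low with weight 0.69 instead of 0.71); the paper does not state the exact midpoints it used. A matching sketch of the taxonomy-tree transformation TT of Rules 2 and 3 follows, with the parent map encoding Figure 2 (again illustrative; the publish method and the Optional-based "not published" convention are our choices).

import java.util.*;

public class TaxonomyTransform {

    // Child -> parent edges of the 3-level tree in Figure 2.
    static final Map<String, String> PARENT = Map.of(
            "Flu", "Respiratory Problem",
            "Pneumonia", "Respiratory Problem",
            "Bronchitis", "Respiratory Problem",
            "Gastric Ulcer", "Digestive Problem",
            "Dyspepsia", "Digestive Problem",
            "Gastritis", "Digestive Problem");

    // PR = f suppresses the tuple entirely; DL = t keeps the value (Rule 3);
    // DL = f releases the parent instead (Rule 2).
    static Optional<String> publish(boolean pr, boolean dl, String disease) {
        if (!pr) return Optional.empty();  // not published
        return Optional.of(dl ? disease : PARENT.get(disease));
    }

    public static void main(String[] args) {
        System.out.println(publish(true,  true,  "Gastric Ulcer"));  // Gastric Ulcer
        System.out.println(publish(false, true,  "Pneumonia"));      // empty: not published
        System.out.println(publish(false, false, "Pneumonia"));      // empty: not published
        System.out.println(publish(true,  false, "Gastric Ulcer"));  // Digestive Problem
    }
}

The four calls in main reproduce the four rows of Table 6.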

Thus, T* is a copy of T4 containing only those tuples of T for which the owner has set PR to t, with the sensitive numerical attributes fuzzified and the sensitive categorical attributes transformed according to Rules 2 and 3. These selected tuples carry sensitive categorical and numerical attribute values transformed according to the preferences set by the owner of the data.

5. Experimental results

We implemented the k-anonymity and l-diversity approaches in Java along with our fuzzy-based model, for both numerical and categorical attributes. The motive for implementing all of these approaches was to show that our fuzzy-based model takes less execution time than the other models. The programs were tested on data taken from the UCI Machine Learning Repository; the Adult data set and a patient data set were used for our experiments [8]. For comparison we took data sets of different sizes: 10k, 20k, 30k, 40k, and 50k tuples. Graphs were plotted with the number of tuples on the x-axis and the computational effort on the y-axis. The graphs clearly demonstrate that the fuzzy-based model is very feasible in terms of execution effort.

Plots were also drawn for different privacy reveal levels: all users willing to reveal their data (no privacy), 20% of users choosing not to reveal their information (20% privacy), and 50% of users choosing not to reveal their information (50% privacy). Figure 3 shows the computational effort for different numbers of tuples with five clusters (k = 5, where k is the number of fuzzy sets); Figure 4 shows the plots for ten clusters and Figure 5 for fifteen clusters.

[Figure 3. Computational effort for five clusters: effort in milliseconds versus number of tuples (10k-50k), for no privacy, 20% privacy, and 50% privacy.]

[Figure 4. Computational effort for ten clusters: effort in milliseconds versus number of tuples, same three privacy levels.]

[Figure 5. Computational effort for fifteen clusters: effort in milliseconds versus number of tuples, same three privacy levels.]

[Figure 6. Computational effort for k-anonymity, l-diversity, and the fuzzy model with 100% privacy reveal (PR-1, DL-1), versus number of tuples.]

[Figure 7. Computational effort for privacy reveal levels of 100%, 50%, and 20%, each with DL-1 and DL-0, versus number of tuples.]

[Figure 8. Computational effort for k-anonymity, l-diversity, and the fuzzy model at privacy reveal levels of 100%, 50%, and 20% (DL-1 and DL-0), versus number of tuples.]

Figure 6 compares the computational efforts of k-anonymity, l-diversity, and fuzzy-based personalized privacy without any individual privacy specifications, i.e., 100% of the owners wish to reveal sensitive information. Figure 7 compares different levels of personalized privacy, with plots for 100% privacy reveal for all tuples, 50% privacy (privacy for only 50% of the tuples), and 20% privacy (privacy for only 20% of the tuples). Figure 8 compares all the models, i.e., k-anonymity, l-diversity, and personalized privacy at the different privacy levels.

6. Conclusion and future work

Our model ensures i) complete privacy, ii) more information, iii) less computational effort, iv) personalized privacy preservation, and v) privacy for both numerical and categorical attributes. The fuzzy model is feasible to implement and provides maximum privacy with little information loss. It yields more information for research or statistical purposes while maintaining maximum privacy; it satisfies the individual privacy need, that is, personalized privacy preservation, with little overhead; and it provides privacy for both numerical and categorical attributes. The model has been compared with k-anonymity and l-diversity and shown to take less computational effort than those models. Future work is to extend the model to protect privacy for multiple sensitive attributes and to find metrics for measuring privacy and information loss.

7. References

[1] L. Sweeney. "k-Anonymity: A Model for Protecting Privacy". Intl. Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557-570, 2002.

[2] A. Machanavajjhala, J. Gehrke, D. Kifer. "l-Diversity: Privacy Beyond k-Anonymity". Proceedings of the 22nd IEEE Intl. Conf. on Data Engineering, 2006.

[3] L. Sweeney. "Achieving k-Anonymity Privacy Protection Using Generalization and Suppression". Intl. Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):571-588, 2002.

[4] P. Kusuma Kumari, KVSVN Raju, S. Srinivasa Rao. "Privacy Preserving in Clustering using Fuzzy Sets". WORLDCOMP'06, The 2006 International Conference on Data Mining (DMIN'06), June 26-29, 2006, Las Vegas, USA, pp. 290-295.

[5] Ninghui Li, Tiancheng Li, Suresh Venkatasubramanian. "t-Closeness: Privacy Beyond k-Anonymity and l-Diversity". ICDE 2007, 23rd IEEE Intl. Conf. on Data Engineering, 2007.

[6] X. Xiao and Y. Tao. "Personalized Privacy Preservation". Proceedings of the ACM SIGMOD International Conference on Management of Data, June 2006.

[7] L. A. Zadeh. "Fuzzy Sets". Information and Control, vol. 8, pp. 338-353, 1965.

[8] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. "UCI Repository of Machine Learning Databases". Available at www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Irvine, 1998.

[9] K. Sridevi, KVSVN Raju, V. Valli Kumari and S. Srinivasa Rao. "Privacy Preserving in Clustering by Categorizing Attributes using Privacy and Non Privacy Disclosure Sets". WORLDCOMP'07, The 2007 Intl. Conf. on Data Mining, pp. 301-307, June 2007, Las Vegas, USA.

[10] J. Vaidya, C. Clifton, M. Zhu. Privacy Preserving Data Mining. Springer, 2006.

[11] Kristen LeFevre, David J. DeWitt and Raghu Ramakrishnan. "Incognito: Efficient Full-Domain K-Anonymity". In Proceedings of ACM SIGMOD'05, USA, 2005.
