10 1109@SNPD 2016 7515960

Fuzzy Set based Data Publishing For Privacy
Preservation
Meng-bo Xie, Quan Qian*

School of Computer Engineering & Science, Shanghai University, Shanghai 200444, China
Materials Genome Institute, Shanghai University, Shanghai 200444, China
Corresponding Author Email: qqian@shu.edu.cn
Abstract—K-anonymity and its successors, like l-diversity conjunction with fuzzy draft rate. The experimental result
and t-closeness, are the most popular approaches for privacy shows that, comparing with the classical k-anonymity method,
preserving data publishing. However, each method has relatively the new algorithm performs better in the information loss and
high information loss and computational complexity. In order execution performance.
to solve this problem, this paper presents a fuzzy set based
anonymity algorithm, where numerical data are transformed to
linguistic data and sensitive data are published in conjunction II. R ELATED W ORK
with fuzzy draft rate. The experimental results show that the
fuzzy based algorithm performs better than that of the k- Currently, numerous methods have been proposed for pri-
anonymity method from the points of information loss and vacy preservation. Samarati and Sweeney [3][4] introduced
execution performance. The information loss of the fuzzy based k-anonymity with a attractive property that each tuple in
algorithm has been reduced by 40%∼50% and the execution the private table being released cannot be identified from
time reduced by 48%∼59%. at least k records. In other words, k-anonymity claims that
Keywords—privacy preservation, fuzzy sets, k-anonymity. each equivalence class contains at least k records with respect
to the quasi-identifier. The main contribution of paper [5]
is p-sensitive k-anonymity, which requires, in addition to
I. I NTRODUCTION
k-anonymity, that for each group of tuples with identical
Along with the development of computer network, dis- combination of quasi-identifier values, the number of distinct
tributed computing, data mining and big data, huge amounts sensitive attributes values must be at least p. Although k-
of data can be collected and analysed efficiently. But when anonymity protects identity disclosure, it is insufficient to
we excavate the potential value of the large amounts of data, prevent attribute disclosure. Nergiz et al. [6] proposed a
it will face privacy leakage inevitably. Therefore, in big data privacy model called multi-relational k-anonymity to ensure
era, how to protect privacy has become one of the hottest IT- k-anonymity on multiple relational tables. K-anonymity is
related research areas. popular and classical because of its simplicity in definition and
also easy implementation to process the anonymization. But
In daily life, most organizations often need to publish their it faces many attacks when background knowledge is known
micro data, e.g, medical data, census data, etc., for research and to outsiders. Such attacks include: (1)“Homogeneity Attack”
other purposes. Typically, such data are stored in databases, leverages the case where all the values for a sensitive value
and each record or a row corresponds to one individual. Each within a set of k records are identical. In such cases, even
record has a number of attributes [1], which can be divided into though the data has been k-anonymized, the sensitive value for
three categories: Identifier attribute, Quasi-identifier attribute the set of k records may be exactly predicted; (2)“Background
and Sensitive attribute. (1)Identifier attribute can uniquely Knowledge Attack” leverages an association between one or
identify an individual, for instance, identity card number, social more quasi-identifier attributes with the sensitive attribute to
security number, and etc. (2)Quasi-identifier attributes that reduce the set of possible values of the sensitive attribute.
may be known by outsiders, such as zip-code, birthday date,
and gender, that can be joined with external information to To address the limitations of k-anonymity, Machanavajjhala
re-identify individual records. (3)Sensitive attributes that are et al. [7] introduced a new notion of privacy, called L-diversity,
assumed to be unknown to outsiders and need to be protected, which requires that the distribution of a sensitive attribute
such as health condition or salaries. in each equivalence class has at least l “well represented”
values. A t-closeness model [8] was proposed which depends
To preserve privacy, a number of techniques have been
on the property that the earth mover distance(EMD) between
proposed for modifying or transforming the original data. The
the distribution of the sensitive attribute within an anonymized
techniques are done through cryptography, data mining and
group should not be different from the global distribution by
information hiding [2]. But in general, these techniques have
more than a threshold t. A table is said to have t-closeness
too much information losses with high execution complexity.
if all equivalence classes have t-closeness. A (l, t)-closeness
In this paper, we address the data privacy problem by using model [9] was proposed based on the partition of the sensitive
fuzzy sets, a new perspective of looking at privacy problem levels. This model relaxes the equivalence class constrains of
in data publishing. In fuzzy set based anonymity algorithm, t-closeness model, that the distance between the distribution
numerical data are fuzzily processed in order to transform of sensitive levels in the equivalence class and the whole data
into linguistic data and sensitive data are also published in table are no more than a threshold t. It depends on the Hellinger
978-1-5090-2239-7/16/$31.00 copyright 2016 IEEE

SNPD 2016, May 30-June 1, 2016, Shanghai, China
distance to measure the distance between the two distributions i = {1, 2...n}. Based on the following maximum membership
to avoid the problems that setting standard distance and principle, if we have:
high execution time of EMD distance computation. Ref. [10]
discusses a new generalization principle called m-invariance, n
[
which effectively reduces the risk of privacy-disclosure in re- Ai (x0 ) = Ai (x0 ) (2)
publication. Several models such as (ω; γ, k)-anonymity [11] k=1
and L-cover [12] anonymization are also proposed in the
literature in order to deal with the problem of above mentioned Then x0 is maximally subordinated to fuzzy set Ai , and
anonymity models. this character is recognized as the “target character”.
Fuzzy mathematics based privacy preserving in data pub- C. Membership Function
lishing with available alternatives has been derived in [13].
This method allows personalized privacy preservation being The membership function of a fuzzy set is a generalization
useful for both numerical and categorical attributes and only of the indicator function in classical sets. In fuzzy logic, it rep-
those necessary tuples are transformed. Paper [14] transforms a resents the degree of truth as an extension of valuation. Three
part of quasi-identifier and personalizes the sensitive attribute common methods used to determine membership function are
values so that to eliminate the privacy leakage (how much as follows [18]:
an adversary learn from the published data) and increase 1) Fuzzy statistical method: Fuzzy statistical method, is
utility (accuracy of data mining task) of a released database. an objective method, which is based on the fuzzy statistical
Paper [15] discusses fuzzy based approach for numerical experiment, and according to the objective existence of the
attributes and provides a taxonomy tree based mapping table degree of membership. The steps are corresponding to the
approach for categorical attribute. Ref. [16] analyses how probability and statistical random experiments.
can classical techniques of data privacy, like k-anonymity, be
developed within a framework of fuzzy information. 2) Assignment method: The assignment method, is a sub-
jective method, which is based on the practical experience so
The rest of the paper is organized as follows. Section that people can determine the membership function for fuzzy
III presents a overview of fuzzy sets, and fuzzy-anonymity sets. If the fuzzy sets are defined in the real number field R,
algorithm will be discussed in Section IV. A small example then the fuzzy set membership function can be regarded as
will be discussed in Section V. Section VI gives the whole fuzzy distribution. Assignment method is used to subjectively
experimental results and finally summarizes the whole paper select some fuzzy distribution, according to the nature of the
in Section VII. problem and then the parameters are determined according to
the actual measurement data.
III. P REREQUISITE K NOWLEDGE 3) Existing scale method: In economic management and
The main propose of privacy preservation in data publish- social science field, researchers can directly use the existing
ing is to maintain data utility, meanwhile, protecting privacy. In scale as a membership function for fuzzy sets. For instance, if
other words, the techniques with higher data utility and greater we want to define a fuzzy set “equipment in good condition”,
efficiency are our expected. the “equipment perfectness rate” can be directly used as a
membership to represent the fuzzy set of equipment in good
condition.
A. Fuzzy Sets
Fuzziness is a means to symbolize improbable, prospect D. Fuzzy Draft Rate
and approximation. These sets are different from classical set The fuzzy membership degree is great for representing
theory and are applied in fuzzy logic. In classical set theory, the degree of linguistic variable that subjectively describes
the associated elements in relation to a set is assessed in characteristics of an object or a situation, such as “hot” and
binary terms with respect to a crisp condition to the set. In “cold”, or how far they are from a goal value as “near” or
contradiction, fuzzy set theory allows the slow assessment of “far”.
the associated elements in relation to a set [17].
However, the weakness of the membership degree is to
Given a fuzzy subset A of discourse domain U , any describe whether the object close to the fuzzy critical point
arbitrary fuzzy set of this type is defined by a membership from the left side or the right side. So, paper [15] proposes
function: the concept of the fuzzy draft rate, which can be defined as:
µA : U → [0, 1] (1) P (x) = ±(1 − µA (x)) (3)
Assigning a value µA (x) ∈ [0, 1] to each element x ∈ U , For example, let U be the universe of discourse and
we call µA (x) ∈ [0, 1] is the membership degree that x belongs U = {2, 4, 6, 8, 10}. If we want to describe “the number which
A. closer to 7”, it is obviously known that 6 and 8 are all closer
to 6, the membership degree of 6 and 8 are both 0.8. When
B. Maximum Membership Principle use the fuzzy offset, we might calculate the fuzzy draft rate
of 6 is −0.2 and the fuzzy draft rate of 8 is +0.2. The above
Let U be the discourse domain, with n fuzzy subsets mentioned statement could be interpreted that 6 is close to 7
{A1 , A2 ...An }, we can identify any given x0 ∈ U and from the left side and 8 is close to 6 from the right side.
E. Information Loss Algorithm 1 Fuzzy-anonymity Algorithm
1: Input: Original Table T(with n records),
The concept of data utility is a standard to measure the
2: Output: Private Table T 0 which satisfies fuzzy-anonymity
degree of the original data characteristics after anonymization.
3: Begin
The definition of information loss is consistent with the data
4: for i = 1 to n do
utility. The smaller the information loss, the better the data
utility. So the information loss can be used to measure the data 5: Select the required attributes from the table T.
utility after anonymization. We propose a method that calculate 6: Categorize the type of attributes.
the homogeneity of data sets to measure the information 7: Identifier attributes, for instance, identification card
loss [19], which can be defined as: number, are generally replaced with auto generated
numbers.
8: Choose the number of fuzzy sets.
IL = SSE/SST (4) 9: while quasi-identifier attributes is numerical do
10: Calculate fuzzy membership degree by choosing
In Eq.(4) IL, SSE and SST are abbreviated for informa- membership function based on experts experience
tion loss, squares of error, and sum of squares respectively. mentioned in section II (In this step, the calculated
SSE represents the homogeneity measurement in a class, membership degree is not released in order to resist
which can be defined as: the linking attack since this membership function is
vulnerable to disclose).
ni
a X
X 11: end while
SSE = (Xij − X i )T (Xij − X i ) (5) 12: while sensitive attribute is numerical do
i=1 j=1 13: Transform the actual values to the fuzzy draft rate by
Algorithm2.
In this formula, Xij refers to the j th tuple on the ith 14: end while
equivalence class. X i refers to the average value of each tuple 15: end for
in the equivalence class. a is the number of the equivalence 16: For all the records, the algorithm will terminate until all
0
classes. ni is the number of tuples in the equivalence classes i. the numerical attributes are transformed and table T is
The smaller the value of SSE, the smaller is the information generated.
loss for anonymized data. 17: End
SST represents the whole data table, which can be defined

as: B. Calculate the fuzzy draft rate
X ni
a X Observing the above mentioned algorithm, we can find that
SST = (Xij − X)T (Xij − X) (6) the core of our work is to calculate the fuzzy draft rate for
i=1 j=1 sensitive numerical attributes. In this section we would like to
introduce the steps to calculate the fuzzy draft rate.
In this formula, X refers to the average value of each tuple
in the whole data table. n refers to the number of tuples in Let L be the linguistic term set, and n is the number of
the whole table. As for a already known table, the value of fuzzy sets. The linguistic values of L are {L1 , L2 ...Ln }. The
SST is generally constant. The smaller the value of SST , the n fuzzy sets have ranges of:
higher the degree of similarity among tuples.
IV. F UZZY-A NONYMITY A LGORITHM {min − a2 }, {a1 − a2 }, {ai−1 − ai+1 }...{an−1 − max} (7)
In this section, we propose a fuzzy-anonymity algorithm
to protect the privacy. The main idea is that the numerical For example, supposing L = {Low, M edium, High},
data are fuzzily processed and transformed into linguistic sets. then the number of fuzzy set is three. The minimum and max-
Especially, sensitive data are published in conjunction with imum values of income according to the business organization
fuzzy draft rate to maintain the utility of data mining. During are min and max respectively, and {a1 , a2 , a3 } are the mid-
the whole process, the membership function need to remain points of each fuzzy set.
confidential.
Since triangle membership function has the benefit of
The central idea of fuzzy mathematics is that membership simple calculation and easily achievable, we use triangle mem-
degree indicated by a value on the range [0, 1], where ‘0’ bership function to transfer the numerical data to linguistic
represents absolute false and ‘1’ the absolute true with respect data, which belongs to ‘assignment method’ mentioned in
to a given linguistic term. The linguistic terms are the words section II. For the fuzzy set with mid-point a1 , the membership
that describe the magnitude of the linguistic variable, such as function is given as Eq.8.
low, high and medium.

A. Algorithm Overview 0.99
 x = min
a2 −x
The pseudo-code of the fuzzy-based algorithm is shown as A1 (x) = min < x < a2 (8)
 a2 −min
Algorithm 1. 0 x ≥ a2
TABLE I. T HE INITIAL MICRO DATA TO BE PUBLISHED
For the fuzzy set with mid-point ai , 2 ≤ i ≤ n − 1, the
membership function is given as Eq.9. Name Age Gender Income
 Alice 42 female 10K

0 x ≤ ai−1 Bob 38 male 48K
Lily 24 female 25K

x−ai−1

ai−1 < x < ai


 ai −ai−1
 Lee 56 male 57K
Ann 27 female 94K
Ai (x) = 0.99 x = ai (9) Simon 60 male 51K
 ai+1 −x
 ai < x < ai+1
 ai+1 −ai



0 x ≥ ai+1
• Firstly, classify the type of attributes. Notice that
For the fuzzy set with mid-point an , the membership quasi-identifier attributes are ‘age’ and ‘gender’. The
function is given as Eq.10. numerical data of ‘Income’ is the sensitive attribute.
Identifier attributes such as ‘name’ should be trans-
 formed into auto generated id numbers.
0
 x ≤ an−1
max−x
an−1 < x < max • For quasi-identifier attribute ‘age’, since it is numeri-
An (x) = max−an−1 (10)
 cal. Let the linguistic fuzzy set is {F1 = Y oung, F2 =
0.99 x ≥ max M idlif e, F3 = Old}. Then the number of fuzzy set
is three. Choose the existing scale as a membership
The numerical attribute values are transformed into lin- function to calculate the fuzzy membership degree.
guistic term set based on the above mentioned membership The final membership degree is not released in order
functions and the maximum membership principle. The fuzzy to resist the “linking attack” since this membership
draft rate mentioned in section II is used to measure the actual function is vulnerable to leak. The membership func-
value of objects in the same linguistic set. tion can be defined as:
Let x be the value of the numerical data need to be (
transformed. The value of ai refers to the mid-points of the 1 0 ≤ x ≤ 25
membership function. When calculate x by using above men- F1 (x) = −1
x−25 2
(11)
tioned membership function and the maximum membership 1+ 5 25 < x ≤ 100
principle, we can know the number of fuzzy set that x belongs
to. Supposing j is this fuzzy set number, and the draft rate p (
0 0 ≤ x ≤ 45
is +0.01 when x is minimum value or maximum value. The F2 (x) = −1
x−45 −2
algorithm for fuzzy draft rate is shown as Algorithm 2. 1+ 5 45 < x ≤ 100
(12)
Algorithm 2 Fuzzy draft rate p
Input: x ∨ j ∨ ai (
0 0 ≤ x ≤ 65
Output: p F3 (x) = −1
x−65 −2
1: Begin 1+ 5 65 < x ≤ 100
2: if j = 1 then (13)
3: p ← +(1 − max)
4: end if • For sensitive attribute of income, supposing the fuzzy
5: for each j ∈ [2, n − 1] do set is {A1 = Low, A2 = M edium, A3 = High}. The
6: if x < ai then membership degree can be calculated according to the
7: p ← −(1 − max) membership function that mentioned in section III. In
8: end if this case, min = 10K, max = 94K, a2 = (min +
9: if x > ai then max)/2, a1 = (min + a2 )/2, a3 = (a2 + max)/2.
10: p ← +(1 − max) • Based on the maximum membership principle men-
11: end if tioned in section II and algorithm 1 and 2 transformed
12: end for the numerical attribute income can be converted to
13: if j = n then publish the linguistic set and the fuzzy draft rate. After
14: p ← −(1 − max) applying the transformations, TABLE II is created and
15: end if can be released.
16: End
• In TABLE II, a single numerical attribute ‘in-
come’ is replaced with two different attributes, in-
V. E XAMPLE come l(linguistic term) and the fuzzy draft rate. The
sensitive numerical attribute can be transformed with
Assuming that the micro data table I need to publish. The the numerical value which preserve the privacy with-
attributes in the original table may be classified as identifier out changing the original table structure.
attribute, quasi-identifier attributes and sensitive attributes.
A new modified data table will be generated according to • Supposing the fuzzy linguistic set
the above mentioned fuzzy-anonymity algorithm. The specific {Low, M edium, High} equals to the corresponding
process are shown as follows: numerical set {1, 2, 3}. For example, if the value of
TABLE II. T HE INTERMEDIATE ANONYMIZED TABLE
Name Age Gender Income l draft rate
1 Midlife female Low +0.01

2 Midlife male Medium -0.2
3 Young female Low +0.12
4 Old male High -0.12
5 Young female High +0.01
6 Old male Medium +0.05
TABLE III. T HE FINAL ANONYMIZED MICRO DATA TABLE
Name Age Gender Income f
1 Midlife female +1.01

2 Midlife male -2.2
3 Young female +1.12
4 Old male -3.12
5 Young female +3.01
6 Old male +2.05 Fig. 1. Comparisons of execution performance
sensitive attribute income is 10K, we can transform

10k to a fuzzy draft rate 0.01, which belongs to
linguistic set number ‘l’ and can be simplified as
‘1.01’. Finally, the TABLE II can be anonymized
further as shown in TABLE III.
Comparing TABLE I with TABLE II, the data sets of ‘age’
and ‘income’ are transformed into the fuzzy set by the above
mentioned approach. Table II shows that the features of each
object are still clear, but the actual numerical value is hidden,
which can prevent the linking attack.
Comparing TABLE II with TABLE III, the fuzzy draft rate
is proposed to measure the actual value of objects in the same
fuzzy set. For example, in table III, we can obviously notice
that person ‘1’ is poorer than person ‘3’ according to the fuzzy Fig. 2. Comparisons of information loss
offset, even if they are in the same linguistic set number ‘1’,
which can increase the data utility.
B. Information loss
VI. E XPERIMENTS AND A NALYSIS
In this part, the formula IL = SSE/SST mentioned
The experiments are carried out based on adult train-
in section II was used to measure the information loss for
ing data set from UCI machine learning repository [20].
fuzzy-anonymity and k-anonymity. The smaller the value of
It contains about 32,560 rows with 4 numerical columns.
IL, the smaller the information loss is for anonymized data.
Delete some broken records by preprocessing the data set so
Since the conception of equivalent class is diluted in the fuzzy-
that 30,000 tuples will be used. Moreover, the attribute set
anonymity method, we assume that the number of equivalent
{age, gender, income} are selected for publishing.
class is equal to the number of tuples. Then the number of
The programs are written in java and the hardware environ- each equivalent class is 1. Sequentially, selecting 5K, 10K,
ment is Intel(R) Core(TM) i5-3230M CPU @2.60GHZ, 6GB 15K, 20K, 25K, 30K records to test. According to the above
Memory, Microsoft Windows 7, MySQL 5.6.18. formula, Fig. 2 shows the information loss of k-anonymity and
fuzzy-anonymity.
A. Comparisons of execution performance From Fig. 2, it says that the information loss of data sets
We compared the execution time between fuzzy-anonymity after processed by fuzzy-anonymity is less than that of k-
and k-anonymity based generalization [4]. In addition, let the anonymity. Meanwhile, it also shows that the information loss
number of the fuzzy set be three and the value of k is also three. is also highly related to the data volume. When the data volume
Sequentially select 5K, 10K, 15K, 20K, 25K, 30K records to is increasing, the information loss will get smaller because of
test. Fig. 1 shows the comparisons of execution performance the increasing similarity of tuples in the class, which satisfies
among these methods. the same constraints.
Obviously, in Fig. 1, it shows that the execution time of In k-anonymity, numerical attributes are usually general-
fuzzy-anonymity based method is less than that of the k- ized. For example, supposing ‘income’ has values of 10K,
anonymity (k is 3). From the result, we can see that fuzzy- 12K, 18K, 19K and 60K, then the generalized range would
anonymity performs efficiently, because as a global generaliza- be [10K, 60K]. It can be seen that although four of them are
tion algorithm, k-anonymity has a great computation overhead less than 20K, only one is 60K, the range upper bound is
on frequent distance computing and class merging. still 60K. But in the fuzzy based privacy preservation, when
the above set values are mapped to fuzzy set ‘low’, it can be [11] X. Huang, J. Liu, Z. Han, and J. Yang, “A new anonymity model for
seen that 60K is associated with less membership degree in the privacy-preserving data publishing,” Wireless Communication Over Zig-
‘low’ and relatively the others are mapped to the same set with bee for Automotive Inclination Measurement China Communications,
vol. 11, no. 9, pp. 47–59, 2014.
higher membership degrees, which improves the data utility.
[12] L. Zhang, L. Wang, S. Jajodia, and A. Brodsky, “L-cover: Preserving
It is the membership degree that preserves information and diversity by anonymity,” in Proceedings of the 6th VLDB Workshop on
informativeness of the proposed method when compared with Secure Data Management, 2009, pp. 158–171.
perturbation methods and other k-anonymity related methods. [13] V. V. Kumari, S. S. Rao, K. Raju, K. V. Ramana, and B. Avadhani,
“Fuzzy based approach for privacy preserving publication of data,” IJC-
SNS International Journal of Computer Science and Network Security,
VII. C ONCLUSION VOL.8 No.1, no. 1, pp. 115–121, 2008.
[14] E. Poovammal and M. Ponnavaikko, “Preserving micro data release:
In this paper, we propose a new method to protect privacy Categorical and numerical data,” IEEE Setit, 2009.
based on fuzzy logic. The main idea is to transform the [15] G. R. Zhang, “Privacy preserving using fuzzy sets,” Computer Engi-
numerical attributes to linguistic set. Experiments show that, neering Applications, 2010.
comparing with k-anonymity, the execution performance of [16] W. Qian, Y. Chuandong, and L. Hong, “Fuzzy-based methods for
fuzzy based anonymity has been reduced by almost 40% to privacy preserving,” Application Research of Computers, vol. 30, no. 2,
50%. In addition, the information loss is reduced by almost pp. 518–520, 2013.
48% to 59%. [17] R. A. Aliev, “Fundamentals of the fuzzy logic-based generalized theory
of decisions,” Studies in Fuzziness Soft Computing, vol. 293, 2013.
In the future, the proposed algorithm should be extended [18] J. N. Mordeson, “Fuzzy mathematics,” Physica Verlag, vol. 4, no. 4,
in a certain distributed computing framework for processing pp. 95–125, 2011.
the big data efficiently. Furthermore, combining the fuzzy set [19] J. M. Chen and J. M. Han, “Evaluation model for quality of k-anonymity
based method with k-anonymity or other privacy preserving data oriented to microaggregation,” Application Research of Computers,
vol. 27, no. 6, pp. 2344–2347, 2010.
methods are also very meaningful research directions.
[20] K. Bache and M. Lichman, “Uci machine learning repository,” Univer-
sity of California Irvine School of Information Andcomputer Sciences,
2013.
ACKNOWLEDGMENT
This work is partially supported by Shanghai Munic-
ipal Natural Science Foundation (Grant No.13ZR1416100)
and Shanghai Municipal Science and Technology Commis-
sion(Grant No. 15DZ2260300, 15DZ2260301).
R EFERENCES
[1] P. Samarati and L. Sweeney, “Generalizing data to provide anonymity
when disclosing information,” in Proceedings of the seventeenth ACM
SIGACT-SIGMOD-SIGART symposium on Principles of database sys-
tems, 1998, p. 188.
[2] B. C. M. Fung, K. Wang, W. C. Fu, and P. S. Yu, “Introduction to
privacy-preserving data publishing : concepts and techniques,” A Survey
on Recent Developments, Computing, 2011.
[3] L. SWEENEY, “k-anonymity: A model for protecting privacy,” Interna-
tional Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,
vol. 10, no. 5, pp. 557–570, 2012.
[4] L.SWEENEY, “Achieving k -anonymity privacy protection using gen-
eralization and suppression,” International Journal of Uncertainty,
Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 571–588,
2002.
[5] J. Domingoferrer and V. Torra, “A critique of k-anonymity and some
of its enhancements.” in 2012 Seventh International Conference on
Availability, Reliability and Security, 2008, pp. 990–993.
[6] M. E. Nergiz, C. Clifton, and A. E. Nergiz, “Multirelational k-
anonymity,” IEEE Transactions on Knowledge & Data Engineering,
vol. 21, no. 8, pp. 1104–1117, 2009.
[7] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam,
“l-diversity: Privacy beyond k-anonymity,” Acm Transactions on Knowl-
edge Discovery from Data, vol. 1, no. 1, p. 24, 2007.
[8] N. Li, T. Li, and S. Venkatasubramanian, “t-closeness: Privacy beyond
k-anonymity and l-diversity,” Icde, pp. 106 – 115, 2007.
[9] J. Yang, B. Zhang, J. Zhang, and J. Xie, “A(l,t)-closeness anonymization
method based on sensitive levels partition,” Journal of Huazhong
University of Science & Technology, 2014.
[10] X. Xiao and Y. Tao, “m-invariance: Towards privacy preserving re-
publication of dynamic datasets,” in Proceedings of the 2007 ACM
SIGMOD international conference on Management of data, 2007, pp.
689–700.

10 1109@SNPD 2016 7515960

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

10 1109@SNPD 2016 7515960

Uploaded by

Copyright:

Available Formats

Fuzzy Set based Data Publishing For Privacy

Meng-bo Xie, Quan Qian*

978-1-5090-2239-7/16/$31.00 copyright 2016 IEEE

µA : U → [0, 1] (1) P (x) = ±(1 − µA (x)) (3)

SST represents the whole data table, which can be defined

 Alice 42 female 10K

Name Age Gender Income l draft rate

1 Midlife female Low +0.01

TABLE III. T HE FINAL ANONYMIZED MICRO DATA TABLE

Name Age Gender Income f

1 Midlife female +1.01

sensitive attribute income is 10K, we can transform

You might also like