K-anonymity
Abstract—With the rise of data mining and the emergence of data stream and uncertain data technologies, individual and enterprise data can be leaked at any moment, so data security has become a main topic of information security. A common way to protect privacy in data publishing is K-anonymity. This paper comprehensively analyses the current state of research on the K-anonymity model for preventing privacy leaks in data publishing, introduces the techniques behind K-anonymity (generalization and suppression), describes K-anonymity evaluation criteria, and assesses a number of algorithms currently in use. Finally, future directions in this field are discussed.

Keywords- K-anonymity; privacy protection; evaluation criterion; algorithm classification

I. INTRODUCTION

Database technology has developed quickly and has been applied extensively since the 1960s. With the development of network technology, database security problems have become increasingly prominent; in particular, the theft, tampering and destruction of information endanger the safety of information systems. With the rise of data mining and the emergence of data stream and uncertain data technologies, individual and enterprise data can be leaked at any moment, so data security has become a main topic of information security.

Enterprises, organizations and governments store a great deal of information, such as employees' salaries, medical records, criminal records and credit records, all of which is kept in various databases. The database servers of some departments hold sensitive financial data, including transaction records, business data and account data, which should be protected to prevent competitors and other lawbreakers from obtaining them. In this global information-based environment, people can obtain useful information from massive data through knowledge discovery, which in turn threatens privacy; as a result, people are required to share data and protect it at the same time.

Database security is mostly achieved through authorization or encryption technology [1], [2], which are very important for information and data protection: they prevent external threats and unauthorized access by means of identity authentication mechanisms and data encoding. Many current international conferences analyse this problem, call for stronger encryption technology as research on information security advances, establish corresponding protection models, and enforce access authorization policies by setting up server-based protective devices, client-based protective models and, later, logic-framed protective models. Such methods can prevent the disclosure of sensitive information to some extent, but they cannot stop a user from combining non-sensitive data with other external knowledge to infer the sensitive information indirectly. Sensitive data, which cannot be publicized, comprise private and national confidential data. The leak of such data may lead to the logical disclosure of the identity of the data subject. To solve the problem of the unintended disclosure of private information and to protect individual identity, many research institutes have committed a great deal of manpower and material resources to this research, and the problem has been raised at many current international conferences, such as SIGMOD, VLDB, ICDE and PODS.

K-anonymity [3], a model put forward by Samarati P and Sweeney L in 1998 to avoid privacy leaks, requires that the published data contain a certain number of mutually indistinguishable individuals, so that an attacker cannot single out the specific individual a record belongs to, which prevents the leak of individual privacy. K-anonymity has attracted wide attention in academia, and many scholars have researched and developed the technology at different levels.

Samarati P [4] realizes k-anonymity by adopting generalization and suppression techniques to protect individual private information, and introduces the concept of minimal generalization. Iyengar V [5] explores preserving anonymity through generalization and suppression of the potentially identifying portions of the data. In particular, he investigates the privacy transformation in the context of data mining applications such as building classification and regression models. This is combined with a more thorough exploration of the solution space using the genetic algorithm framework. Yao C et al. [6] illustrate the identification of k-anonymity in views. Machanavajjhala A et al. [7] extended the k-anonymity concept to the l-diversity concept, which requires l different
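The K-anonymity requirement described in this introduction, and the way generalization helps to reach it, can be illustrated with a minimal Python sketch. It is not taken from any of the cited works. The function names `generalize_zip` and `is_k_anonymous`, the sample table, and the choice of the ZIP code as the only potentially identifying column are all assumptions made for illustration.

```python
from collections import Counter


def generalize_zip(zip_code: str, level: int) -> str:
    """Replace the last `level` characters of a ZIP code with '*'.

    Hypothetical helper: level 0 leaves the value unchanged, and a level
    at least as large as the code length masks it completely.
    """
    if level <= 0:
        return zip_code
    level = min(level, len(zip_code))
    return zip_code[: len(zip_code) - level] + "*" * level


def is_k_anonymous(rows, identifying_columns, k):
    """Return True if every combination of values in the identifying
    columns is shared by at least k rows."""
    groups = Counter(
        tuple(row[col] for col in identifying_columns) for row in rows
    )
    return all(count >= k for count in groups.values())


# Illustrative table (assumed data, not from the paper's experiments).
records = [
    {"zip": "311578", "disease": "flu"},
    {"zip": "311579", "disease": "asthma"},
    {"zip": "311588", "disease": "flu"},
    {"zip": "311589", "disease": "diabetes"},
]

# Every raw ZIP code is unique, so the table is not 2-anonymous.
raw_ok = is_k_anonymous(records, ["zip"], k=2)

# Masking the last digit merges the rows into two groups of two.
generalized = [{**r, "zip": generalize_zip(r["zip"], 1)} for r in records]
generalized_ok = is_k_anonymous(generalized, ["zip"], k=2)
```

In this sketch `raw_ok` is `False` and `generalized_ok` is `True`: one level of generalization is enough for k = 2 on this small table, at the price of a less precise ZIP code.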
Generalization mainly includes DGH_A (domain generalization hierarchy on A) and VGH_A (value generalization hierarchy on A) [22]. In DGH_A, a given set of attribute values is generalized into a set of more general attribute values; for example, the ZIP codes {311578, 311579, 311588, 311589} can be generalized to {31157*, 31158*}, so that the set semantically indicates a bigger range. VGH_A can be shown as a tree.

Z3 = {******}
Z2 = {3115**}

$$\mathrm{Cost}(RT) = 1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} HM(A_{ij}, A'_j)}{n \times m} \qquad (2)$$

$HM(A_{ij}, A'_j)$ denotes the Hamming distance incurred when $A_{ij}$ is generalized to $A'_j$, where $A_{ij}$ is the attribute value in the i-th tuple and j-th column.

3) Based on Partition

The cost measure of the anonymity method, a classification measure, is the total sum of every row's penalty coefficient:
n
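As a worked illustration of Equation (2), the following Python sketch evaluates the cost for a small table. It rests on several assumptions: HM is taken to be the plain character-wise Hamming distance between an original value and its generalized form of equal length, the generalized value is stored per cell, and the names `hamming` and `generalization_cost` as well as the sample data are hypothetical. The source does not state whether HM is normalized, so the result here is only meaningful under this reading.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def generalization_cost(original, generalized):
    """Evaluate Equation (2): 1 - sum_ij HM(A_ij, A'_j) / (n * m).

    `original` and `generalized` are lists of n rows with m string
    values each (assumed layout for this sketch).
    """
    n = len(original)
    m = len(original[0]) if n else 0
    if n == 0 or m == 0:
        raise ValueError("table must have at least one row and one column")
    total = sum(
        hamming(original[i][j], generalized[i][j])
        for i in range(n)
        for j in range(m)
    )
    return 1 - total / (n * m)


# Assumed sample: column 0 is a ZIP code, column 1 is left unchanged.
original = [
    ["311578", "M"],
    ["311579", "F"],
    ["311588", "M"],
    ["311589", "F"],
]
generalized = [
    ["31157*", "M"],
    ["31157*", "F"],
    ["31158*", "M"],
    ["31158*", "F"],
]

cost = generalization_cost(original, generalized)
```

Here each ZIP code differs from its generalized form in one position and the second column is unchanged, so the sum of distances is 4 over n × m = 8 cells and `cost` evaluates to 0.5. An ungeneralized table gives 1.0.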