Professional Documents
Culture Documents
2021 IJIEMR.Personal use of this material is permitted. Permission from IJIEMR must
be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new
collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted
component of this work in other works. No Reprint should be done to this paper, all copy
right is authenticated to Paper Authors
IJIEMR Transactions, online available on 20th Jan 2021. Link
:http://www.ijiemr.org/downloads.php?vol=Volume-10&issue=ISSUE-01
DOI: 10.48047/IJIEMR/V10/I01/32
Title: Privacy preservation for Abstracting Anonymization Techniques using
Generalization Algorithm
Volume 10, Issue 01, Pages: 157-161.
Paper Authors
B.M. Iran, Dr. K.Bhavana Raj, Deepu J.J.Lazarus
Abstract— The ongoing improvements around open information featured the significant issue of
anonymization with regards to information distributing. Many research endeavors were given to the
meaning of strategies performing such an anonymization. Anyway the determination of the most
applicable method and the satisfactory calculation is perplexing. Effective choice relies right off the
bat upon the capacity of information distributers to comprehend the anonymization systems and their
related calculations. In this paper, we center around the decision of a calculation among the various
ones executing one of the anonymization systems, to be specific speculation. Through a reflection
procedure displayed in this paper, we give information distributers improved depictions for the
speculation system and its calculations. These depictions encourage the comprehension of the
calculations by information distributers having low programming aptitudes. We present additionally
some other use instances of these deliberations just as an experimentation directed to approve them.
Keywords : Anonymization, Programming aptitude.
INTRODUCTION
The open information activity is an open door prompting execution assessment. In addition, a
for making an immense measure of data few reviews proposed in the writing are
available to end clients. In any case, information utilization arranged. They for the most part
distributers are confronting the issue of break down various anonymization procedures
discharging helpful information without trading featuring their focal points and downsides so as
off protection. The eventual fate of their to propose explore headings. Others are method
business relies upon their capacity both to offer situated. Their correlations of calculations are
helpful information to open, to examiners, to for the most part devoted to scientists wishing
scientists, and so on and to pick up the trust of to take a shot at information anonymization and
information proprietors. The last requires in this manner are not open to information
executing forms that forestall the abuse of their distributers having commonly low aptitudes in
delicate data. Thusly, information on information anonymization. Moreover, existing
anonymization methods gets essential for devices are obscure. Regardless of whether they
information distributers. In specific, through propose a few procedures, they execute, more
this information, they should pick the suitable often than not, just a single calculation for every
calculation given their unique circumstance. strategy without portraying it. Additionally, the
vast majority of these apparatuses don't give
Scrambling calculations are various. They are direction to the selection of calculations or
commonly portrayed gratitude to their launch in procedures.
a given setting. Papers portraying these
calculations center around the experimentations
algorithms, we conductedan inductive process, Output: 𝑀𝐺𝑇, a generalization of 𝑃𝑇𝑄𝐼 with respect to 𝑘
Assumes: 𝑃𝑇 ≥ 𝑘
also called generalization in Navrat et al that Method:
presents all these algorithms using a 1. 𝒇𝒓𝒆𝒒 ← a frequency list contains distinct sequences of values of 𝑷𝑻[𝑸𝑰], along with the number
of occurrences of each sequence.
commondescription. These two sub-processes 2. 𝑾𝒉𝒊𝒍𝒆 𝒕𝒉𝒆𝒓𝒆 𝒆𝒙𝒊𝒔𝒕𝒔 sequences in 𝑓𝑟𝑒𝑞 occurring less than 𝑘 times that account for more than
𝑘 tuples 𝒅𝒐
are described in the following paragraphs. 2.1. 𝑳𝒆𝒕 𝐴i be attribute in 𝑓𝑟𝑒𝑞 having the most number of distinct values
2.2. 𝒇𝒓𝒆𝒒 ← generalize the values of 𝐴i in 𝒇𝒓𝒆𝒒
3. 𝒇𝒓𝒆𝒒 ← suppress sequences in 𝑓𝑟𝑒𝑞 occurring less than 𝑘 times.
4. 𝒇𝒓𝒆𝒒 ← enforce 𝑘 requirement on suppressed tuples in 𝒇𝒓𝒆𝒒
3.1. Abstraction by parametrization of the 5. 𝑅𝑒𝑡𝑢𝑟𝑛 𝑀𝐺𝑇 ← construct table from 𝒇𝒓𝒆𝒒
generalization algorithms
In most papers, the generalization algorithms (a)
are presented in such a way that they can be
directly translated intoa program. Usually, they 1. 𝑨𝒍𝒈𝒐𝒓𝒊𝒕𝒉𝒎 𝑻𝑫𝑺
2. Initialize every value in 𝑻 to the top most value.
are usually partially instantiated using an 3. Initialize Cut i to include the top most value.
example of a table to be anonymized. Their 4. 𝒘𝒉𝒊𝒍𝒆 some 𝑥 ∈ ⋃Cut i is valid and beneficial 𝒅𝒐
basicprinciples are textually described. 5. Find the Best specialization from ⋃Cut i .
6. Perform Best on 𝑻 and update ⋃Cut i .
Therefore they are dedicated to computer 7. Update 𝑆𝑐𝑜𝑟𝑒(𝑥) and validity for 𝑥 ∈ ⋃Cut i .
scientists or to professionals having 8. 𝒆𝒏𝒅 𝒘𝒉𝒊𝒍𝒆
aprogramming background. To produce a more 9. 𝒓𝒆𝒕𝒖𝒓𝒏 Generalized 𝑻 and ⋃Cut i .
abstract description of the generalization (b)
algorithms, we have filteredaway all irrelevant
information and sometimes added information Fig. 2. (a) Datafly Algorithm13; (b) Top-Down
in order to facilitate the extraction of Specialization Algorithm (TDS)
content.Since an algorithm is a dynamic artifact,
we chose to represent it via a flowchart. The The abstractpresentation at Fig. 3a
latter is quite helpful inunderstanding the logic highlights the basic principle and therefore
of complex problems. For lack of space, we facilitates the understanding. In this
present the results of our abstraction process abstraction,Datafly generalizes one attribute
foronly three algorithms: Datafly, Top Down
at a time. At each iteration, it selects (as
specialization (TDS) and Median Mondrian
algorithms. Datafly was thefirst algorithm able
mentioned in step 2) the one that has thebest
to meet the k-anonymity requirement for a big score (the highest number of distinct values
set of real data. It combines generalization of in the current table). The execution stops
dataand suppression of tuples in order to avoid when the k-anonymity isreached. The
an excessive generalization which would reduce second example describes the TDS
data usefulness. At eachiteration, DataFly (a) algorithm extracted from Fung et al.16 (Fig.
generalizes the attributes having the highest 2b). It browses thegeneralization hierarchy
G>0 G=0
Then,the individuals of the same group will
have the same value for their QI via the
generalization process. Moreprecisely,
individuals (tuples of the original table) are
represented, thanks to the values of their QI, in
amultidimensional space where each dimension
corresponds to an attribute of the QI (Fig. 4a).
(b) The splitting of thespace into areas generates
the groups. It is performed using the median
Fig. 3. (a) Abstraction of Datafly algorithm; (b)
value of the attribute.
Abstraction of TDS algorithm
Age
30
original table preserves k-anonymitybut can 27
Datafly
Fig. 4. Principle of Median Mondrian and its
algorithm
The function find_median(fs) returns the value The experiment lasted about four hours. First
“splitVal” of the median. “t.dim” is the value of the participants had to fill a questionnaire about
a given dimension“dim” for the tuple “t”. The their level ofknowledge in anonymization
summary consists of the generalization of a set techniques (they had to evaluate their level
of values belonging to the samepartition. It is using a scale of 1 to 10) and theirprogramming
defined by a value range where the lower limit skills (they had to say if they have ancient,
(resp. upper limit) corresponds to the smallest recent or very recent programming skills).
value(resp. to the largest value) in the partition. Second, allparticipants were given a brief
Our abstraction of Median Mondrian leads to presentation on anonymization with emphasis
the activity diagram at Fig.5. on the generalization technique. Thenthey
received a copy of the slides and sheets of
papers for taking down notes. Third, they were
Datafly of Fig. 3a Datafly of Fig. 2a Median Mondrian of Fig. 5 Median Mondrian of Fig. 4b
Class Erroneous Correct Partial Erroneous Correct Partial Erroneous Correct Partial Erroneous Correct Partial
All the participants who have executed our Mondrian. They unanimously mentioned
abstractions didn’t face blocking in the post questionnaire that
difficulties. Moreover, only thosethat had ourabstraction helped them understanding
ancient knowledge obtained erroneous the original algorithms. Moreover they
executions and they are few. Only those have ranked the original algorithmsas more
who first performed theexecution of difficult to understand than our
Datafly abstraction have then proposed a abstractions. All the participants that met a
correct execution of the original Datafly. blockage in the original Datafly(40%) have
The sameobservation emerged for Median not executed our abstraction. It is the same
User
Top down
otherwise
algorithm
Datafly