
D. Divya Sesha Sree
19261A1214
presents…

Data
Anonymization
Under the guidance of
Mrs. J. Hima Bindu
Data Anonymization
Contents
01 What is it?
02 Why do we need it?
03 Where do we actually use it?
04 Pseudonymization
05 Types of Data
06 Relational Data anonymization
07 Graph Data anonymization
08 Methodologies
09 Disadvantages
10 Future Scope
11 Conclusion
What is it?
• Privacy protection: it is a process of protecting publicly obtained private data.
• Removing PII: it is a process of removing personally identifiable information (PII).
Why anonymize?
• Data misuse: to safeguard against data misuse by unreliable employees and third parties.
• PPDP (privacy-preserving data publishing): before data is published, it has to be anonymized to preserve privacy.
• GDPR: companies must follow the EU's General Data Protection Regulation (GDPR) to run businesses in the region.
Where do we use it?
• Banking and trading
• Medical research
• E-marketing
• Search engines
• Social media
• Employee monitoring
• Behavior tracking
• Analytics
Pseudonymization vs Anonymization
• Pseudonymized data can be restored to its original state with the addition of information which allows individuals to be re-identified.
• In contrast, anonymization is intended to prevent re-identification of individuals within the dataset.
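The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`pseudonymize`, `reidentify`, `anonymize`), the sample records, and the use of random tokens with a separately stored lookup table are hypothetical choices, not something taken from the slides. Note also that dropping a direct identifier is only the simplest form of anonymization; quasi-identifiers left in the data may still allow re-identification.

```python
import secrets

def pseudonymize(records, key_field):
    """Replace key_field with a random token; return (new_records, lookup)."""
    lookup = {}    # token -> original value; this is the "additional information"
    reverse = {}   # original value -> token, so repeated values share one token
    out = []
    for rec in records:
        original = rec[key_field]
        if original not in reverse:
            token = secrets.token_hex(8)
            reverse[original] = token
            lookup[token] = original
        out.append({**rec, key_field: reverse[original]})
    return out, lookup

def reidentify(records, key_field, lookup):
    """Restore the original values; only possible with the lookup table."""
    return [{**rec, key_field: lookup[rec[key_field]]} for rec in records]

def anonymize(records, key_field):
    """Drop the identifying field entirely; nothing is kept to restore it."""
    return [{k: v for k, v in rec.items() if k != key_field} for rec in records]

# Hypothetical sample data
patients = [
    {"name": "Alice", "diagnosis": "flu"},
    {"name": "Bob", "diagnosis": "asthma"},
]
pseudo, lookup = pseudonymize(patients, "name")
restored = reidentify(pseudo, "name", lookup)
anon = anonymize(patients, "name")
```

In this sketch, whoever holds `lookup` can undo the pseudonymization, while the output of `anonymize` carries no key that would reverse it.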
Types of Data (to anonymize)

Relational Graph

Homogeneous Heterogeneous
Relational Data anonymization
AIM: Produce the anonymous table T′ from an original table T.
Most data collected from sources other than social networks (SN) is in tabular form, such as customer data in supermarkets.
Thus, we must anonymize the SAs (Sensitive Attributes) and QIs (Quasi-Identifiers) in the tables.
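One common way to derive T′ from T is to drop direct identifiers and generalize the QIs. The sketch below assumes a hypothetical table with `name`, `age`, `zip`, and `disease` columns, treats `age` and `zip` as QIs and `disease` as the SA, and leaves the SA unchanged for simplicity; the helper names and the bucket widths are illustrative assumptions.

```python
def generalize_age(age, width=10):
    """Map an exact age to a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def anonymize_table(table):
    """Build T' from T: drop the name, generalize the QIs, keep the SA."""
    t_prime = []
    for row in table:
        t_prime.append({
            "age": generalize_age(row["age"]),
            "zip": generalize_zip(row["zip"]),
            "disease": row["disease"],  # SA kept as-is in this sketch
        })
    return t_prime

# Hypothetical original table T
T = [
    {"name": "Alice", "age": 34, "zip": "50001", "disease": "flu"},
    {"name": "Bob", "age": 37, "zip": "50017", "disease": "asthma"},
]
T_prime = anonymize_table(T)
```

After generalization both rows share the QI values `30-39` and `500**`, which is what lets them fall into the same equivalence class.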
Methodology
k-Anonymity
• The k-anonymity model protects users' privacy by placing at least k users in an equivalence class (EC) with the same QI values.
• Hence, the probability of re-identifying someone from T′ becomes at most 1/k.
• However, k-anonymity alone does not prevent SA disclosure: in the example ECs C2 and C3, SA disclosure based on auxiliary information is 100%.
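A k-anonymity check can be sketched as follows, assuming a table stored as a list of dictionaries. The function names and the sample table are hypothetical. The second equivalence class in the sample holds a single SA value, which illustrates how the SA can still be disclosed even when the table is k-anonymous.

```python
from collections import defaultdict

def equivalence_classes(table, qi_fields):
    """Group rows that share the same values for all QI fields."""
    classes = defaultdict(list)
    for row in table:
        key = tuple(row[f] for f in qi_fields)
        classes[key].append(row)
    return dict(classes)

def is_k_anonymous(table, qi_fields, k):
    """True if every equivalence class contains at least k rows."""
    classes = equivalence_classes(table, qi_fields)
    return all(len(rows) >= k for rows in classes.values())

# Hypothetical anonymized table T'
T_prime = [
    {"age": "30-39", "zip": "500**", "disease": "flu"},
    {"age": "30-39", "zip": "500**", "disease": "asthma"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
]
qi = ["age", "zip"]
two_anonymous = is_k_anonymous(T_prime, qi, 2)
three_anonymous = is_k_anonymous(T_prime, qi, 3)
```

Here `two_anonymous` is true and `three_anonymous` is false, yet anyone known to be in the `40-49` class is revealed to have the same disease.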
Methodology
ℓ-Diversity
• The ℓ-diversity privacy model was proposed to address the limitations of the k-anonymity model.
• According to this model, an EC satisfies the ℓ-diversity property if there are at least ℓ "well-represented" values for the SA.
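The property can be sketched with the simplest reading of "well-represented", namely at least ℓ distinct SA values per EC (often called distinct ℓ-diversity). That reading, the function name, and the sample tables are assumptions made for illustration; stricter variants exist.

```python
from collections import defaultdict

def is_l_diverse(table, qi_fields, sa_field, l):
    """True if every equivalence class has at least l distinct SA values."""
    sa_values = defaultdict(set)
    for row in table:
        key = tuple(row[f] for f in qi_fields)
        sa_values[key].add(row[sa_field])
    return all(len(values) >= l for values in sa_values.values())

# Hypothetical tables: the first has a homogeneous class, the second does not
homogeneous = [
    {"age": "30-39", "zip": "500**", "disease": "flu"},
    {"age": "30-39", "zip": "500**", "disease": "asthma"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
]
diverse = [
    {"age": "30-39", "zip": "500**", "disease": "flu"},
    {"age": "30-39", "zip": "500**", "disease": "asthma"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
    {"age": "40-49", "zip": "501**", "disease": "flu"},
]
qi = ["age", "zip"]
```

With these inputs, `is_l_diverse(homogeneous, qi, "disease", 2)` fails because one class holds only `cancer`, while the same check on `diverse` passes.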
Methodology
t-Closeness
• The t-closeness privacy model requires that the distribution of SA values in any EC of T′ differs from the overall distribution of SA values in T by at most a threshold t.
• The t-closeness privacy model significantly improves users' privacy, but it severely reduces the utility of the released data.
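A t-closeness check can be sketched by comparing each EC's SA distribution with the overall one. The slides do not say which distance is used; this sketch assumes the variational distance, which is a simple choice for categorical attributes, whereas the t-closeness literature typically uses the Earth Mover's Distance. All names and the sample table are hypothetical.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def max_ec_distance(table, qi_fields, sa_field):
    """Largest distance between any EC's SA distribution and the overall one."""
    overall = distribution([row[sa_field] for row in table])
    classes = defaultdict(list)
    for row in table:
        classes[tuple(row[f] for f in qi_fields)].append(row[sa_field])
    return max(
        variational_distance(distribution(vals), overall)
        for vals in classes.values()
    )

def is_t_close(table, qi_fields, sa_field, t):
    """True if no EC deviates from the overall SA distribution by more than t."""
    return max_ec_distance(table, qi_fields, sa_field) <= t

# Hypothetical anonymized table T'
T_prime = [
    {"age": "30-39", "zip": "500**", "disease": "flu"},
    {"age": "30-39", "zip": "500**", "disease": "asthma"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
    {"age": "40-49", "zip": "501**", "disease": "cancer"},
]
qi = ["age", "zip"]
worst = max_ec_distance(T_prime, qi, "disease")
```

A small t forces every EC to look like the whole table, which is why the model tends to cost so much utility.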
Graph Data anonymization
Graph data span many different domains,
ranging from online social network data
from networks like Facebook to
epidemiological data used to study the
spread of infectious diseases.

• SN data is mainly represented in two ways: matrices and graphs.
• Graphs can have their nodes and edges labeled or unlabeled, undirected or directed, weighted or unweighted.
• Preserving users' privacy in SN data publishing is very challenging compared to relational data.
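The two representations are interchangeable, as the following sketch shows by turning an edge list into an adjacency matrix. The function name, the `(source, target, weight)` edge format, and the sample graph are assumptions for illustration only.

```python
def to_adjacency_matrix(nodes, edges, directed=False):
    """Convert (source, target, weight) edges into an adjacency matrix."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v, w in edges:
        matrix[index[u]][index[v]] = w
        if not directed:
            # An undirected edge appears in both directions
            matrix[index[v]][index[u]] = w
    return matrix

# Hypothetical three-node graph; weight 1 stands in for "unweighted"
nodes = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "c", 1)]
undirected = to_adjacency_matrix(nodes, edges)
directed = to_adjacency_matrix(nodes, edges, directed=True)
```

The undirected matrix is symmetric, while the directed one only records each edge in its stated direction.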
Methodology
k-Anonymization
• It is a graph modification technique in which edges, vertices, or both are modified to achieve k-anonymity.
• The modification techniques are: (i) edge add, (ii) edge delete, (iii) edge add/del, (iv) simple edge switch, (v) double edge switch, and (vi) node addition.
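As an illustration of the edge-add technique, the sketch below uses k-degree anonymity, where every degree value must be shared by at least k nodes. The slides do not state which notion of graph k-anonymity they mean, so this choice, the function names, and the sample graph are all assumptions.

```python
from collections import Counter

def degrees(adj):
    """Degree of each node in an undirected graph given as node -> neighbor set."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def is_k_degree_anonymous(adj, k):
    """True if every degree value is shared by at least k nodes."""
    counts = Counter(degrees(adj).values())
    return all(count >= k for count in counts.values())

def add_edge(adj, u, v):
    """Return a copy of the graph with the undirected edge (u, v) added."""
    new_adj = {node: set(neighbors) for node, neighbors in adj.items()}
    new_adj[u].add(v)
    new_adj[v].add(u)
    return new_adj

# Hypothetical path graph a - b - c: node b is the only node with degree 2
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = add_edge(path, "a", "c")
```

In the path graph, node `b` is uniquely identifiable by its degree; after adding the edge between `a` and `c`, all three nodes have degree 2 and can no longer be told apart by degree alone.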
Methodology
Graph generalization/clustering
• The generalization/clustering-based approaches anonymize SN data by partitioning it into different clusters and generalizing the clusters into super nodes/edges.
• The cluster sizes and generalization degrees are determined in such a way that maximal information is retained in the clustered network.
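The generalization step can be sketched as follows, assuming the partition into clusters is already given. How the clusters are chosen is the hard part and is not shown here. The function name, the cluster labels, and the sample graph are hypothetical.

```python
from collections import Counter

def generalize_graph(edges, cluster_of):
    """Collapse nodes into super nodes and count edges within and between clusters."""
    sizes = Counter(cluster_of.values())   # super node -> number of members
    super_edges = Counter()                # (cluster, cluster) -> edge count
    for u, v in edges:
        key = tuple(sorted((cluster_of[u], cluster_of[v])))
        super_edges[key] += 1
    return dict(sizes), dict(super_edges)

# Hypothetical undirected graph and a given partition into two clusters
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
cluster_of = {"a": "C1", "b": "C1", "c": "C2", "d": "C2"}
sizes, super_edges = generalize_graph(edges, cluster_of)
```

Only the cluster sizes and edge counts are published in this sketch, so individual nodes and their exact links are hidden while coarse structure is retained.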
Disadvantages
• Loss of meaning: data may become less coherent and meaningful. Thus, the quality of insights decreases.
• User permission: you will have to request permission, and that takes time.
Challenges / Future Scope
The latest technologies can process any kind of data, using advanced analytics tools to extract insights from collected data. Thus, security concerns stay relevant.

1. Privacy issues of user groups
2. Controlling excessive information loss caused by over-generalization of the QIs
3. Generic solutions for social graph anonymization
4. Personalized privacy preservation in SNs
5. Devising privacy-friendly mechanisms for exceptional situations (COVID-19)
Conclusion
• It is important to provide privacy for both tabular and SN data.
• Data anonymization techniques help prevent many malicious attacks on data, or at least make sure that stolen data is rendered useless.
• Tailoring the anonymization toward privacy objectives can adversely affect the utility of the anonymized data, and vice versa.
References
1. "Anonymization Techniques for Privacy Preserving Data Publishing: A Comprehensive Survey," in IEEE Access, vol. 9, pp. 8512-8545, 2021, doi: 10.1109/ACCESS.2020.3045700.
Thank You!
