A QUICK INFORMATION GUIDE
Achieving
Data Privacy Using
Machine Un-learning
Presented by: K. Priyanshu Manoj (2020BCS063, A-30)
Under the guidance of: Mrs. P. G. Kolapwar, Asst. Professor, SGGSIE&T
Agenda
KEY TOPICS DISCUSSED IN THIS PRESENTATION:
• Why is Data Privacy important?
• How Machine Learning potentially endangers the data and privacy of users across the world
• What is the Unlearning process?
• What is Machine Un-Learning?
• Methods used to achieve Machine Unlearning
• Impact of Machine Unlearning technology on Data Privacy
• New trends to watch out for…
Data Privacy?
Why is it necessary?
• In 2020, the amount of data on the internet reached 64 zettabytes
(where a zettabyte is a trillion gigabytes).
• Users have realized that this data is collected and is both used and sold.
WHY So?
Example:
In 2014, a European court ruled in favor of a Spanish man who asked
that certain information be removed from Google search results,
establishing the "right to be forgotten".
But can a model trained on that data simply forget it?
WELL…
OF COURSE NOT!!!!!
We have come here with a solution…
MACHINE UN-LEARNING…!!
WHAT?
METHODS
Error Minimization Approach
Gradient-Based Method
Is it DIFFICULT to UNLEARN the
machines?
In general, it is very difficult
to provably unlearn a data point.
Key design points of the aggregation (see the sketch after this list):
1. The aggregation strategy is intimately linked to how data is
partitioned to form shards: the goal of
aggregation is to maximize the joint predictive
performance of the constituent models.
2. The aggregation strategy should not involve
the training data (otherwise the aggregation
mechanism itself would have to be unlearned
in some cases).
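To make this concrete, here is a minimal Python sketch of the SISA idea (Sharded, Isolated, Sliced, Aggregated training). It is a simplification under stated assumptions: it uses scikit-learn's LogisticRegression as a stand-in constituent model, skips SISA's slicing and checkpointing, and the helper names (train_shards, predict, unlearn) are mine, not from the paper; the reference implementation is in the cleverhans-lab repository cited in the references.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_shards(X, y, n_shards=5, seed=0):
    # Partition the data into disjoint shards and train one model per shard.
    rng = np.random.default_rng(seed)
    shards = np.array_split(rng.permutation(len(X)), n_shards)
    models = [LogisticRegression(max_iter=1000).fit(X[s], y[s]) for s in shards]
    return shards, models

def predict(models, X):
    # Aggregate by majority vote; the aggregation never touches training data,
    # so the vote itself never has to be unlearned. Assumes integer class labels.
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

def unlearn(X, y, shards, models, point_idx):
    # Unlearn one point by retraining only the shard that contained it;
    # the other shards (and their models) are left untouched.
    for i, s in enumerate(shards):
        if point_idx in s:
            shards[i] = s[s != point_idx]
            models[i] = LogisticRegression(max_iter=1000).fit(X[shards[i]], y[shards[i]])
    return shards, models

Because only one constituent model is retrained per deletion request, the expected cost of unlearning drops roughly in proportion to the number of shards, at some cost in per-shard accuracy that the majority vote is meant to recover.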
New trends to watch out for…
SISA was just one of many research efforts
being conducted on this topic.
There are many other interesting
ways to achieve Machine
Unlearning…
Error Minimization
Approach
Thus,
THE TIME HAS COME…
when we can Scream Aloud!!!
Do you have any
questions?
I hope you all have learned something new.
REFERENCES
• https://www.cse.iitd.ac.in/index.php/2011-12-29-23-14-40/cse-seminar-talks
• https://medium.com/syncedreview/machine-unlearning-fighting-for-the-right-to-be-forgotten-c381f8a4acf5
• https://arxiv.org/pdf/1912.03817.pdf
• https://www.youtube.com/watch?v=xUnMkCB0Gns
• https://www.wired.com/story/machines-can-learn-can-they-unlearn/
• https://towardsdatascience.com/machine-unlearning-the-duty-of-forgetting-3666e5b9f6e5
• https://github.com/cleverhans-lab/machine-unlearning
• https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal
• And various online resources for gathering the statistics
Impact of Machine Unlearning technology on Data
Privacy
In unlearning research, we aim to develop an algorithm that can take as its input a trained machine learning
model and output a new one such that any requested training data (i.e., any data originally used to create the
machine learning model) has now been removed. A naive strategy is to retrain the model from scratch without
the training data that needs to be unlearned. However, this comes at a high computational cost; unlearning
research seeks to make this process more efficient.
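As a point of reference, the naive strategy is easy to state in code. A minimal sketch, assuming a scikit-learn-style estimator; the factory make_model and the index-based interface are illustrative names, not from any library:

import numpy as np

def naive_unlearn(make_model, X, y, unlearn_idx):
    # Retrain from scratch on everything except the rows being unlearned.
    # Correct by construction, but costs a full training run per request.
    keep = np.setdiff1d(np.arange(len(X)), unlearn_idx)
    return make_model().fit(X[keep], y[keep])

Every efficient unlearning method is, in effect, trying to match the output of this procedure at a fraction of its cost.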
Let’s describe the problem a bit more formally so we can see how it differs from other definitions of privacy. We
denote the user data that is requested to be unlearned as d_u. We need to develop an unlearning algorithm that
outputs the same distribution of models as retraining without d_u (the naive solution), which is our
(strict) deterministic definition that we explore to better align with the goals of new privacy legislation; in this
setting, we certainly unlearn the entirety of d_u’s contributions. If these distributions do not match, then there is
necessarily some influence from d_u that has led to this difference. Settings where an unlearning algorithm only
approximately matches the retraining distribution can be viewed as a (relaxed) probabilistic setting, where we
unlearn most (but not all) of d_u’s contributions.
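In symbols (notation mine, matching the description above): write A for the training algorithm, D for the full training set, and U for the unlearning algorithm. The strict definition asks that the two model distributions be equal:

% Strict, distribution-matching unlearning (notation as defined above):
% the unlearned model must be indistinguishable from a fresh retrain.
U\big(A(D),\, D,\, d_u\big) \;\overset{d}{=}\; A\big(D \setminus \{d_u\}\big)

The relaxed, probabilistic setting replaces this equality in distribution with an approximate closeness between the two distributions.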
An example of such a probabilistic definition of privacy is already found in the seminal work on
differential privacy, which addresses a different but related definition of privacy than what unlearning seeks to
achieve. For readers familiar with the definition of differential privacy, one can think of satisfying the strict
privacy definition behind unlearning through differential privacy as requiring that we learn a model with an
algorithm that satisfies ε=0. Of course, this would prevent any learning and destroy the utility of the learned
model. Research in this relaxed setting of unlearning may be able to further reduce computational load, but the
guarantee is difficult for non-experts to grasp and may not comply with all regulatory needs.
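For readers who want the definition on the page (this is the standard epsilon-differential-privacy guarantee, not something specific to these slides): for any two datasets D and D' differing in one record, and any set S of possible output models,

% epsilon-differential privacy. Setting epsilon = 0 forces the two output
% distributions to be identical, so the model cannot depend on any individual
% record: strict unlearning holds trivially, but the model also cannot learn
% anything from the data.
\Pr[A(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[A(D') \in S]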
LEARNING V/S UNLEARNING
Machine learning is perceived to exacerbate the
problem by collecting and analyzing all this data (from
emails to medical records) and by holding the information
forever. Furthermore, using this information in
insurance, medical, and loan-application models can lead
to obvious harm and amplify bias.
Switching to a researcher's perspective, a concern is
that when a data point is actually removed from
an ML training set, downstream models may need to be
retrained from scratch.