
Has Approximate Machine Unlearning been evaluated properly?

From Auditing to Side Effects

Cheng-Long Wang1 , Qi Li2 , Zihang Xiang1 , and Di Wang1


1 King Abdullah University of Science and Technology
2 National University of Singapore

Abstract

The growing concerns surrounding data privacy and security have underscored the critical necessity for machine unlearning — aimed at fully removing data lineage from machine learning models. MLaaS providers expect this to be their ultimate safeguard for regulatory compliance. Despite its critical importance, the pace at which privacy communities have been developing and implementing strong methods to verify the effectiveness of machine unlearning has been disappointingly slow, with this vital area often receiving insufficient focus. This paper seeks to address this shortfall by introducing well-defined and effective metrics for black-box unlearning auditing tasks. We transform the auditing challenge into a question of non-membership inference and develop efficient metrics for auditing. By relying exclusively on the original and unlearned models — eliminating the need to train additional shadow models — our approach simplifies the evaluation of unlearning at the individual data point level. Utilizing these metrics, we conduct an in-depth analysis of current approximate machine unlearning algorithms, identifying three key directions where these approaches fall short: utility, resilience, and equity. Our aim is that this work will greatly improve our understanding of approximate machine unlearning methods, taking a significant stride towards converting the theoretical right to data erasure into an auditable reality.

1 Introduction

The widespread integration of machine learning models, like the advanced ChatGPT, into digital services and IoT devices has escalated data privacy concerns [5, 9]. These AI systems are integrated into various applications, from streaming recommendations to smart homes and connected vehicles, leading to the collection of vast amounts of personal data and thus increasing the risk of privacy breaches [6, 14, 27]. To safeguard user privacy, it is crucial to establish methods that empower individuals to exert control over their personal data, including the ability to request its removal from AI systems. Legal frameworks such as the General Data Protection Regulation (GDPR) have been instituted to reinforce data subject rights, encapsulating the "right to be forgotten" [3, 28].

As one way of implementing the "right to be forgotten" in machine learning, machine unlearning offers an approach to adapt trained models by removing selected data points, facilitating updates without full retraining [4, 7]. This technique is expected to balance efficiency for Machine-Learning-as-a-Service (MLaaS) providers with user satisfaction, complying with practical and legal expectations at the same time.

The trustworthiness of machine unlearning hinges on the deployment of a robust auditing mechanism to prevent misalignments that could lead to privacy risks across the design, implementation, and auditing stages [32]. For example, design flaws may fail to account for all data impacts, leaving remnants behind. Implementation issues, such as coding errors, could similarly allow data to persist, violating privacy. Inadequate auditing may overlook such problems, falsely signaling successful data removal. Therefore, a strong auditing process is critical for ensuring thorough data deletion and avoiding unfounded claims of unlearning efficacy by providers.

The most stringent audit would entail retraining the model without the targeted data and comparing the outcomes with those from the model that underwent unlearning, or providing evidence of the training procedure, as suggested by [21]. However, this approach runs counter to the very purpose of machine unlearning, which seeks to avoid the impracticality of full retraining in light of resource and time constraints. Current research into machine unlearning evaluation typically focuses on the performance of the unlearned model on the test dataset and the efficiency of the unlearning process. An existing study by [31] introduced an 'unlearning error' metric, devised by breaking down a single Stochastic Gradient Descent (SGD) learning update. An algorithm capable of minimizing this unlearning error achieves what is referred to as 'unlearning.' However, their analysis only offers a loose bound for one-time unlearning. This overlooks the potential subsequent errors produced by continuous sample unlearning requests.

Another common technique employed by researchers to ascertain the effectiveness of unlearning protocols involves the use of Membership Inference Attacks (MIAs) [30]. These attacks typically aim at determining whether the target samples were included in a model's training set. MIAs provide a direct measure of whether data's influence persists, ensuring compliance with privacy requirements. In MIA-based unlearning auditing research, [24] adopted the Kolmogorov-Smirnov distance to overcome the drawback of MIAs that they return false positives when data sources overlap. [19] introduced an Ensembled Membership Auditing (EMA) method and applied it to a medical dataset. However, both [24] and [19] concentrate on dataset-level auditing, operating under the assumption that all samples within the queried dataset possess an identical membership status, i.e., all dataset samples are uniformly considered as either included in or excluded from the training set, with no individual distinction.

Furthermore, while MIAs can be insightful, their core evaluation criteria may not align seamlessly with the goals of unlearning auditing. A clear disconnect exists between the prevalent scenarios for MIAs and the necessity for auditing machine unlearning. As pointed out by [8]: "If a membership inference attack can reliably violate the privacy of even just a few users in a sensitive dataset, it has succeeded." An effective membership inference attacker should achieve a high true-positive rate (TPR, the rate with which a test can identify true positives from among all those who should test positive) at a low false-positive rate (FPR, the rate at which a test incorrectly identifies negatives as positives, showing its error in falsely raising alarms) when predicting membership, while a competent machine unlearning auditor should achieve a high TPR at a low FPR when predicting non-membership. If an auditor incorrectly asserts that a sample has been unlearned from a model, it could potentially leave undetected privacy risks looming in the future. It is reasonable for MLaaS providers to refuse an unlearning request for a sample that isn't present in their training dataset. We will explore this issue in greater detail, highlighting that a successful MIA does not necessarily mean the unlearning process has failed.

To bridge the misalignment between the existing privacy evaluation metrics and the objectives of unlearning auditing, we introduce a refined black-box auditing framework aimed at advancing the field of machine unlearning with evaluations that are both comprehensive and nuanced. Our contributions are as follows:

1) Enhanced Sample-Level Auditing Techniques: We craft effective auditing methods for sample-level black-box unlearning, framing it as a non-membership inference challenge. This method enables a thorough analysis of how individual samples are unlearned without the need to train additional shadow models. Instead, it leverages non-membership inference based on comparisons between the original model and the model post-unlearning. This strategy reveals nuanced shifts in model behavior, greatly enhancing our comprehension of the unlearning process and offering insights beyond traditional performance metrics.

2) Analysis of Unlearning Resilience: We explore how black-box models react to multiple unlearning requests, focusing on the effect of successive unlearning attempts on previous unlearning results. Additionally, we delve into the relationship between unlearning resilience and memory correlation. This resilience evaluation is crucial for determining the long-term effectiveness of these unlearning methods.

3) Advocating Equity in Machine Unlearning: Our metric spotlights the variations in unlearning effectiveness, distinguishing it from the conventional fairness concept in machine learning. The differing complexities of unlearning requests highlight the necessity for equitable unlearning processes and impartial auditing results.

Our framework aspires to move beyond average performance evaluations and to eliminate the reliance on comparisons with retrained models, which are at odds with the core tenets of machine unlearning. We thus advocate for the establishment of standardized metrics that will enable consistent and fair evaluation of unlearning methods across diverse algorithms within black-box settings. This initiative is crucial for ensuring that evaluations remain meaningful and true to the foundational goals of machine unlearning.

2 Related Works

Related work on machine unlearning evaluation can be categorized into dataset auditing [19, 24], Membership Inference Attacks (MIAs) [12, 20], and unlearning metrics [2, 15, 31].

2.1 Dataset Auditing

Calibrating: A recent work [24] considers a dataset auditing problem in which the auditor, having access to a query dataset DQ and a model f(x; D*, θ*) trained on data D*, needs to determine whether f retains information about DQ. It points out that MIAs always return false positives when DQ and D* overlap, which frequently occurs in the real world. To overcome this drawback of MIAs for data auditing, it first creates a calibrated model trained on a calibration dataset DC drawn from D* but with no overlap with DQ. Based on the output distributions of the shadow models trained on DQ and DC, it introduces the Kolmogorov-Smirnov (K-S) distance to detect whether the target model has used/forgotten the query dataset DQ.

EMA: Ensembled Membership Auditing (EMA) [19] is designed to verify whether a trained model memorizes a query dataset by ensembling the MIA results of each query sample with various metrics. The filter threshold for the sample-wise prediction is selected to maximize the balanced accuracy. EMA assumes a similar black-box setting as Calibrating for data auditing, where 1) the classification model parameters are all unknown, and 2) the posteriors of the query dataset on the classification model are available. The auditor does not have access to the training dataset or the network parameters.

These methods, while innovative, primarily address dataset membership as a collective problem, potentially overlooking the nuanced requirements of auditing at the individual sample level. This generalized approach contrasts with the granular challenges faced by auditors of machine unlearning, who must consider each sample individually. Here, the auditor's advantage lies in knowing the target sample's initial inclusion in the training dataset, aiming to verify if it has been effectively unlearned. This sample-specific focus introduces a more intricate auditing task, emphasizing the need for tailored strategies that go beyond dataset-wide assessments to accurately evaluate machine unlearning processes.

2.2 Unlearning-related MIAs

UnLeak: Chen et al. [12] developed an MIA method designed to extract private information by querying both the unlearned and original models. For convenience, we refer to this method as UnLeak. Their investigation into privacy leakage in unlearning settings successfully demonstrated that they could determine whether a target sample was part of the original model's training set and, at the same time, not part of the training set of the unlearned model. However, it's important to ask: can UnLeak be considered an unlearning auditing method? The answer is unequivocally no. This is because the assumptions underpinning UnLeak are entirely different from those in an auditing setting. In the case of UnLeak, it is not known whether a target sample appears in the training set of the original model. Conversely, in an auditing setting, the auditor clearly knows that the target sample is part of the training set, and the goal is to determine whether this sample has been unlearned from the original model.

Moreover, the UnLeak method is predicated on the assumption of an honest unlearned model. It surmises that any alterations in the target sample's outputs should be the consequence of an unlearning process. However, our empirical evidence suggests this may not always hold true. In particular, we demonstrate that UnLeak can be fooled by a dishonest unlearned model, one that simply continues fine-tuning the original model on the target sample to produce the updates that UnLeak seeks, all the while bypassing any genuine unlearning process.

UpdateMIA: Another related MIA is [20], which puts forth an innovative approach that integrates MIAs on multiple updated machine learning models. They introduce a new MIA, which we call UpdateMIA for simplicity, designed to disclose information about specific training examples in the update set. UpdateMIA executes its attack by observing certain aspects of the model both before and after the updates have been applied. Viewing machine unlearning as the 'reverse operation' of their model update setting, they adapt UnLeak and draw a comparison between UnLeak and UpdateMIA within their model update setting. The comparison illustrates that the two score features of UpdateMIA outperform UnLeak in terms of efficiency and effectiveness. Regrettably, their investigations only encompass the model update setting, where it is known that the target sample is not part of the model's training set prior to the update. This neglects the machine unlearning scenario, a situation we have previously discussed, where the attacker possesses different knowledge.

2.3 Unlearning Metrics

Unlearning Error: Thudi et al. [31] identify verification error, the l2 distance between the weights of an approximately unlearned model and the corresponding naively retrained model, as an approximate unlearning metric that should be optimized. Based on their theoretical analysis, they derive the Unlearning Error as a proxy for verification error. Minimizing this Unlearning Error during training could improve the ability to later unlearn with a smaller verification error. However, such a weight-dependent metric cannot serve as an auditing criterion. Besides, [32] proves that the claim that an approximately unlearned model is close to an exactly retrained model is incorrect. We suggest readers refer to [32] for more detailed information.

Interclass Confusion: The Interclass Confusion (IC) test [15] injects a strong differentiating influence specific to the forgetting dataset into the training dataset via label manipulations. Despite reporting fairly good unlearning evaluation results, such an IC test requires specifying the forgetting set prior to training, and this could potentially compromise the model's performance on the forgetting dataset.

Epistemic Uncertainty: Epistemic Uncertainty [2] requires a white-box setting to access the model's parameter gradients for calculating the Fisher Information Matrix (FIM). This requirement introduces additional computational demands, potentially increasing the resource consumption and complexity of implementing such an approach in practical settings. The added computational burden may be significant, especially for large-scale models or when real-time performance is critical.

3 Preliminaries

3.1 (Approximate) Machine Unlearning

Machine unlearning responds to the 'right to be forgotten' legal requirement by ensuring that once a data subject requests the deletion of their data, its influence is also expunged from any predictive or generative models in which it was used. The process, therefore, is not just about data deletion, but about preserving the privacy of individuals by ensuring their data leaves no residual footprint in the models. Initially proposed by [7], unlearning transforms some or all learning algorithms in a system into a summation form. To forget a training data sample, unlearning updates a small number of summations and is asymptotically faster than retraining from scratch. Following this, the field saw an evolution towards 'approximate machine unlearning.' This concept, focusing on sufficiently reducing the influence of data to meet privacy standards, emerged as a practical approach amidst computational efficiency and storage concerns.

Table 1: Statistics of approximate unlearning evaluation pipeline elements

Evaluation element Test Set ACC Forgetting Set ACC Retaining Set ACC MIA Model Inversion Retrain Model
Amnesiac [18] ✓ ✓ ✓ ✓ ✓ ✓
L-Codec [26] ✓ ✓ ✓ ✓
DeltaGrad [34] ✓ ✓
Forsaken [25] ✓ ✓ ✓ ✓
Fisher [16] ✓ ✓ ✓ ✓
NTK-Fisher [17] ✓ ✓ ✓ ✓ ✓

Let D = {x1, ..., xn} denote a dataset comprising n data points, with each data point xi ∈ R^d. Consider a (possibly randomized) learning algorithm, A : D → H, mapping a dataset D to a model in the hypothesis space H. An unlearning method R is a function of a model A(D), the dataset D, and a dataset Df ⊆ D to be removed from the model in H.

Definition 3.1 (Approximate Unlearning). An unlearning algorithm R is an (ε, δ)-unlearning for a learning algorithm A if

P(R(A(D), Df, D)) ≤ e^ε · P(A(D \ Df)) + δ,   (1)

and

P(A(D \ Df)) ≤ e^ε · P(R(A(D), Df, D)) + δ.   (2)

Recent developments in this area have introduced innovative techniques, such as Fisher Forgetting [16], NTK-Fisher Forgetting [17], Amnesiac Machine Unlearning [18], Hessian-based Deep Unlearning [26], DeltaGrad [34], and Neuron Masking [25], as well as applying differential privacy to enhance the unlearning process [29], particularly in complex and large-scale models.

Despite notable advancements, machine unlearning still grapples with significant challenges, including balancing computational efficiency with utility while also ensuring compliance with legal requirements such as the GDPR. While approximate unlearning offers enhanced computational efficiency and utility compared to exact unlearning, demonstrating its effectiveness in unlearning remains a complex issue, as pointed out by [32]. This paper zeroes in on the auditing challenges unique to approximate machine unlearning.

A common approach in the literature for evaluating approximate machine unlearning methods involves comparisons with models that have been retrained from scratch, as indicated in Table 1. This approach leans heavily on dataset performance metrics, which may not provide an accurate measure of unlearning effectiveness. This discrepancy arises because dataset performance metrics are tailored to evaluate overall model performance, not the detailed process of unlearning. These metrics might fail to identify remaining traces of data that should have been forgotten, thereby missing minor yet crucial privacy violations. Additionally, the difficulty auditors face in retraining models from scratch, mainly due to limited resources, exacerbates the problem. Such limitations underscore the critical need for auditing mechanisms that assess a model's outputs to deduce the success of the unlearning process without necessitating model retraining. Establishing effective auditing mechanisms is crucial for the wider acceptance and implementation of approximate machine unlearning techniques.

4 Methodology

4.1 Formalizing Approximate Machine Unlearning Auditing

Approximate machine unlearning methods pose significant challenges for auditability due to their inherent complexity and opacity. These methods often involve intricate algorithms that make it difficult to directly observe or measure the extent to which data has been effectively unlearned. While Jia et al. [21] were able to construct a verifiable training history that attests to proof of learning, this does not automatically translate to conclusive evidence of successful unlearning for the targeted dataset. The core uncertainty tied to these approximate methods creates ambiguity when it comes to verifying the true efficacy of the unlearning process. As emphasized by [1, 33], hyperparameters can also highly impact the effectiveness of algorithms designed for approximate unlearning. Consequently, the mere execution of an unlearning algorithm does not ensure the utility of the unlearning process, emphasizing the need for iterative testing and objective metrics after unlearning to confirm that the algorithm's intended effects are achieved.

Understanding how a model behaves after unlearning is key to this refinement. To address the limitations of internal audits, black-box auditing methods are necessary to assess unlearning by analyzing model outputs from an external perspective, offering a uniform and independent auditing framework. This is particularly critical when internal model details cannot be disclosed due to proprietary or privacy concerns.

4.1.1 Approximate Unlearning Auditing Problem

Now, let's discuss whether the issue of black-box unlearning auditing can be translated into the problem of (non-)membership inference attacks. Let event A represent the scenario where a target sample x is included in the training set of the original model θori. Conversely, let event B indicate that the same target sample x is not part of the training set of the model after unlearning, θunl. Utilizing Bayes' Theorem, we can calculate the conditional probability of event B—the successful unlearning of sample x from the original model—given that event A is true. This allows us to assess the likelihood that sample x has been effectively removed from θori. The formula provided by Bayes' Theorem is as follows:

P(B|A) = P(A|B) · P(B) / P(A).   (3)

Here, P(B|A) quantifies the probability that sample x has been unlearned (event B) given that it was originally part of the model's training set (event A). P(A|B) is the likelihood of sample x being in the training set of θori given it is absent from the unlearned model θunl's training set.

In the context of an attacker who lacks knowledge about whether a target sample x is a member of the training set of the original model θori, the attacker is tasked with estimating each of the three terms on the right side of Bayes' theorem: P(A|B), P(B), and P(A).

4.1.2 Streamlining into Non-membership Inference

It is notable that, unlike in MIAs, in machine unlearning the server should reject the unlearning request if the target sample x was not in the training set, indirectly confirming its prior inclusion (A). This makes P(B|A)—the chance of effectively unlearning x—a key metric for auditors, especially when A's occurrence is confirmed, shifting their focus to this probability as a measure of unlearning success. With P(A) = 1, Bayes' Theorem simplifies to

P(B|A) = P(A|B) · P(B),   (4)

i.e., reducing P(B|A) to a direct function of P(A|B) and P(B). Further, underpinned by Theorem 1, this simplification allows us to equate P(B|A) with P(B), effectively framing the auditing of approximate unlearning as a non-membership inference challenge; that is, it clarifies that upon confirming A, analyzing unlearning auditing shifts seamlessly to evaluating non-membership, streamlining the audit process. The proof of Theorem 1 is provided in Appendix A.

Theorem 1 (Conditional Probability of Unlearning Auditing). The conditional probability P(B|A) equals the marginal probability P(B) when event A—the inclusion of sample x in the original model θori's training set—is confirmed (P(A) = 1), simplifying the assessment of unlearning's efficacy.

4.2 Problem Statement

Figure 1 outlines a detailed pipeline for auditing machine unlearning, depicting the systematic process involved in assessing unlearning effectiveness within a model. It illustrates the stages from identifying the data targeted for unlearning and applying unlearning methods to evaluating the model's ability to forget the specified data. Our research concentrates on refining the auditing process highlighted in this pipeline, with a particular focus on developing a method to quantify the unlearning process through a measurable unlearning score.

The auditing game, designed to assess non-membership at the sample level, unfolds as presented in Algorithm 1.

Algorithm 1 Sample-level Non-membership Auditing Game
1: Input: Data distribution πn, training algorithm T, analysis technique A
2: Output: Success rate of auditing
3: Generate dataset D using seed sD, following distribution πn
4: Train original model θori on D using seed sori with T
5: Select data record z = (x, y) from D using seed sz
6: Retrain model θF on D using seed sfake with T
7: Train target model θT on D \ z using seed sori with T
8: Flip an unbiased coin b ∈ {F, T} to choose between θF and θT
9: Present auditor with (θori, θb, z)
10: Auditor predicts b̂ using A(θori, θb, z)
11: if b̂ = b then output T (success)
12: else output F (failure)
13: end if
14: return Average success rate over multiple iterations

Consider (θ, z) as random vectors drawn from the joint distribution of the target model and the target data point under one of two (non-)membership hypotheses:

HF : D ∼ i.i.d.(π, n), z ∼ D, θ ← T(D)   (5)
HT : D ∼ i.i.d.(π, n), z ∼ D, θ ← T(D \ z)   (6)

An effective sample-level machine unlearning auditor should be able to make confident decisions between hypotheses HF and HT for target samples.
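For concreteness, the sketch below plays out this auditing game as a Monte-Carlo simulation. It is only an illustration of Algorithm 1: `train` and `audit` are caller-supplied stand-ins for the training algorithm T and the analysis technique A, not an implementation shipped with this paper.

```python
import random

def non_membership_auditing_game(train, audit, dataset, n_trials=100, seed=0):
    """Estimate the auditor's success rate in the game of Algorithm 1.

    train(data, seed)            -> model    (stand-in for the training algorithm T)
    audit(theta_ori, theta_b, z) -> 'F'/'T'  (stand-in for the analysis technique A)
    dataset                      -> list of records z = (x, y)
    """
    rng = random.Random(seed)
    theta_ori = train(dataset, seed=seed)  # original model trained on D
    successes = 0
    for _ in range(n_trials):
        z = rng.choice(dataset)                                         # target record
        theta_F = train(dataset, seed=rng.randrange(2**31))             # retrained on D: z is still a member
        theta_T = train([r for r in dataset if r is not z], seed=seed)  # trained on D \ z
        b = rng.choice(['F', 'T'])                                      # unbiased coin
        theta_b = theta_F if b == 'F' else theta_T
        successes += int(audit(theta_ori, theta_b, z) == b)             # did the auditor guess the world?
    return successes / n_trials
```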

Figure 1: Approximate Unlearning Auditing and Evaluation Framework. (The approximate unlearning process maps the original model to the unlearned model by applying an unlearning algorithm to the unlearning request over the training and retaining sets; the auditing process queries both models with target samples and a non-member set, estimates posteriors, and produces unlearning scores used to evaluate unlearning utility, resilience, and equity.)

4.3 Proposed Auditing Techniques

Let Mθ(z) be the membership indicator function for a model with parameters θ, where Mθ(z) = 1 if z is a member of the model (that is, in the training set on which the model θ was trained) and Mθ(z) = 0 otherwise. We can categorize all samples into three subsets:

Dmm = {z ∈ D | Mθori(z) = 1 and Mθunl(z) = 1},
Dmn = {z ∈ D | Mθori(z) = 1 and Mθunl(z) = 0},   (7)
Dnn = {z ∼ π | Mθori(z) = 0 and Mθunl(z) = 0},

where Dmm denotes the retaining set, consisting of all samples retained after unlearning, Dmn is the forgetting set, and Dnn includes samples that are non-members of θori and θunl at the same time.

Building upon the following Lemma 2, which is generally accepted, we derive the corresponding outputs for the datasets defined earlier:

Lemma 2. For a target sample z = (x, y), the model θ has a loss Lθ(z) and assigns a confidence score fθ(x)y to its ground-truth label. The logit-scaled confidence is defined as gθ(z) = log(fθ(x)y / (1 − fθ(x)y)). A model trained to minimize training sample loss has lower expected loss and higher confidence scores for training set samples than for non-training samples.

Based on Lemma 2 and Eq. (7), we have

Lθ(z0) < Lθ(z1),  fθ(x0)y0 > fθ(x1)y1,  gθ(z0) > gθ(z1),
  where z0 ∈ Dmm ∪ Dmn, z1 ∈ Dnn, θ = θori,   (8)
  or where z0 ∈ Dmm, z1 ∈ Dmn ∪ Dnn, θ = θunl.

Leveraging these inequality relationships, we will design two auditing scores to efficiently assess the presence of non-member data, thereby enhancing our ability to observe and quantify the unlearning process. Let us now introduce how we extract non-member information from the original model and the unlearned model to conduct effective unlearning auditing.

Likelihood's Difference Score: The first involves estimating the outputs of a model for non-member data. When provided with the original model θori and its non-member set, we estimate a Gaussian distribution, characterized by Gori ∼ N(µori, σ²ori), based on the logit-scaled model confidences of θori within the non-member set. Similarly, for the unlearned model, we obtain another Gaussian distribution Gunl ∼ N(µunl, σ²unl). Note that, since the samples in Dnn are non-members of both θori and θunl, we use the samples of Dnn to estimate Gori and Gunl. Given a target sample z, it's natural to calculate two separate non-membership likelihoods according to Eq. (8):

hθori(z) = 1 − Pr[gθori(z) > Gori],
hθunl(z) = 1 − Pr[gθunl(z) > Gunl].   (9)

We then normalize the difference between these likelihoods to obtain our first unlearning score:

L-Diff(z) = (1 + hθunl(z) − hθori(z)) / 2.   (10)
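To make Eqs. (9)-(10) concrete, here is a minimal sketch of the L-Diff computation under Gaussian fits of the non-member logit-scaled confidences; the argument layout and helper names are illustrative, not this paper's released code.

```python
import numpy as np
from scipy.stats import norm

def logit_scale(confidence, eps=1e-6):
    """g_theta(z) = log(f / (1 - f)) for the ground-truth-class confidence f."""
    c = np.clip(confidence, eps, 1 - eps)
    return np.log(c / (1 - c))

def l_diff(conf_ori_z, conf_unl_z, conf_ori_nn, conf_unl_nn):
    """Likelihood's Difference Score of a target z (Eqs. (9)-(10)).

    conf_ori_z, conf_unl_z: ground-truth confidences of z under the original
    and unlearned models; conf_ori_nn, conf_unl_nn: arrays of confidences of
    the non-member set D_nn under each model, used to fit G_ori and G_unl.
    """
    g_ori_nn = logit_scale(np.asarray(conf_ori_nn))
    g_unl_nn = logit_scale(np.asarray(conf_unl_nn))
    # h(z) = 1 - Pr[g(z) > G] is the Gaussian survival function at g(z).
    h_ori = norm.sf(logit_scale(conf_ori_z), loc=g_ori_nn.mean(), scale=g_ori_nn.std())
    h_unl = norm.sf(logit_scale(conf_unl_z), loc=g_unl_nn.mean(), scale=g_unl_nn.std())
    return (1.0 + h_unl - h_ori) / 2.0   # Eq. (10): normalized to [0, 1]
```

For a sample that was truly unlearned, hθunl grows toward non-member levels while hθori stays small, pushing L-Diff toward 1.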

Difference's Likelihood Score: The second method aims to directly calculate the unlearning score. It tracks the changes in the logit-scaled model confidences from the original model to the unlearned model for the provided samples.

Define Dpos = Dmn as the set of samples being unlearned, and Dneg = Dmm ∪ Dnn as the set of samples whose membership status remains unchanged throughout the learning and unlearning process. It is logical to deduce that samples in Dneg exhibit an identical distribution of confidence changes. In contrast, samples in Dmn display a distinct distribution of confidence changes. For a given dataset Dneg, the Difference's Likelihood Score involves calculating the confidence changes of its samples from θori to θunl and fitting these to a Gaussian distribution Gneg, characterized by mean µneg and variance σ²neg.

However, given that model confidence is bounded to the range [0, 1], the confidence change is constrained to [−1, 1]. This range implies a non-normal distribution. To address this, we employ a three-step boosted strategy: 1) apply two distinct adjustment methods to parameterize the non-normal confidence change distribution; 2) calculate the likelihoods of a target sample's confidence change relative to both distributions, separately; 3) combine the two likelihoods using a boost function to obtain our final unlearning score. The two adjustment methods are logit scaling and the Median Absolute Deviation (MAD, [23]). We provide details in the following.

With logit scaling, for a sample z, we compute its logit-scaled confidence change as φA(z) = gθunl(z) − gθori(z). Using Dneg, we can approximate the corresponding mean µA and variance σ²A of these logit-scaled confidence changes. To estimate a target sample's likelihood as a non-member, we use its logit-scaled confidence change and calculate its likelihood relative to the estimated Gaussian, which serves as the unlearning score:

DA_Lik(z) = 1 − Pr[φA(z) > Gneg], where Gneg ∼ N(µA, σ²A).

For MAD, we compute the sample's confidence change as φB(z) = fθunl(x)y − fθori(x)y, where z = (x, y). Then we directly calculate the mean µB of the model confidence changes, replacing the variance calculation with MAD. Thus we have varB = c · MAD, where c is a constant that makes MAD consistent with the standard deviation of a normal distribution. MAD, defined as the median of the absolute deviations from the data's median, offers a more robust estimate, being less influenced by outliers compared to the standard deviation or variance. When estimating the likelihood of a target sample as a non-member, we directly use its model confidence change and calculate its likelihood relative to the estimated Gaussian as the unlearning score:

DB_Lik(z) = 1 − Pr[φB(z) > Gneg], where Gneg ∼ N(µB, varB).

For a sample z in Dpos, which is unlearned from the model, φ(z) is expected to exhibit a larger amplitude compared to samples in Dneg. The second unlearning score can be calculated as follows:

D-Liks(z) = Boost(DA_Lik(z), DB_Lik(z)).   (11)

Considering that Dmm is not available to an unlearning auditor, we simply use Dnn to calculate the approximated mean µnn and variance σ²nn. For the sake of simplicity, we choose to use the arithmetic average as our boost function.
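Putting the two adjustments and the boost together, a minimal sketch of D-Liks might look as follows. As above, it assumes that only the non-member set Dnn is available to estimate the Dneg statistics, and the MAD scaling constant is the usual normal-consistency factor, stated here as an assumption rather than taken from this paper.

```python
import numpy as np
from scipy.stats import norm

MAD_TO_STD = 1.4826  # assumed constant c making MAD consistent with a normal std

def d_liks(conf_ori_z, conf_unl_z, conf_ori_nn, conf_unl_nn, eps=1e-6):
    """Difference's Likelihood Score (Eq. (11)) with the arithmetic mean as Boost."""
    def logit(c):
        c = np.clip(c, eps, 1 - eps)
        return np.log(c / (1 - c))

    conf_ori_nn, conf_unl_nn = np.asarray(conf_ori_nn), np.asarray(conf_unl_nn)

    # Adjustment A: logit-scaled confidence change phi_A(z) = g_unl(z) - g_ori(z).
    phi_a_nn = logit(conf_unl_nn) - logit(conf_ori_nn)
    phi_a_z = logit(conf_unl_z) - logit(conf_ori_z)
    da_lik = norm.sf(phi_a_z, loc=phi_a_nn.mean(), scale=phi_a_nn.std())

    # Adjustment B: raw confidence change phi_B(z) with a MAD-based spread.
    phi_b_nn = conf_unl_nn - conf_ori_nn
    phi_b_z = conf_unl_z - conf_ori_z
    mad = np.median(np.abs(phi_b_nn - np.median(phi_b_nn)))
    db_lik = norm.sf(phi_b_z, loc=phi_b_nn.mean(), scale=MAD_TO_STD * mad + eps)

    return 0.5 * (da_lik + db_lik)  # Boost = arithmetic average
```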

Note that our auditing framework generates scores that represent confidence values, rather than straightforward binary decisions, scaled within the range of [0, 1]. This approach is grounded in the rationale that a continuous score offers a more nuanced and detailed understanding of the model's behavior, specifically in the context of approximate unlearning. We prioritize the true positive rate at a low false positive rate (TPR@lowFPR) as our primary metric for evaluating the quality of our auditing scores. Additionally, we report the balanced area under the curve (AUC) in our experimental analyses, following the implementation of [8].

It is significant that, compared to the LiRA method proposed by Carlini et al. [8], which is designed for MIAs and requires training shadow models to accurately estimate the non-member set distribution, our two unlearning auditing methods simplify the process. These methods leverage the outputs of the original and unlearned models when applied to non-member sets, which are easy to obtain. This approach not only streamlines the auditing process but also reduces the computational effort by eliminating the need for shadow model training, offering a more practical and efficient way to gauge the effectiveness of data unlearning techniques.

5 Experimental Evaluation Setup

5.1 Data and Setup

5.1.1 Datasets

We use 5 datasets in our evaluation benchmark, consisting of two image classification datasets (Cifar10, Cifar100), a shopping record dataset (Purchase100), a hospital record dataset (Texas100), and a location dataset (Location30). These datasets provide a varied testing ground for machine learning models, balancing the need for diverse data types with privacy considerations.

Cifar10: This dataset contains a diverse set of 60,000 small, 32x32 pixel color images categorized into 10 distinct classes, each represented by 6,000 images. It is organized into 50,000 training images and 10,000 test images. The classes in Cifar10 are exclusive, featuring a range of objects like birds, cats, and trucks, making it ideal for basic tasks.

Cifar100: Cifar100 is similar to Cifar10 in its structure, consisting of 60,000 32x32 color images. However, it expands the complexity with 100 unique classes, which can be further organized into 20 superclasses. Each image in Cifar100 is associated with two types of labels: a 'fine' label identifying its specific class, and a 'coarse' label indicating the broader superclass it belongs to. This dataset is suited for more nuanced evaluation.

Purchase100: It contains 197,324 anonymized records of customer purchases across 100 different product categories. Each record in the dataset represents an individual purchase transaction and includes details such as product category, quantity, and transaction time, useful for analyzing consumer behavior patterns.

Texas100: This dataset comprises 67,330 hospital discharge records from the state of Texas. It includes anonymized patient data such as diagnosis, procedure, length of stay, and other relevant clinical information. The data is grouped into 100 classes, and used for healthcare data analysis and predictive modeling.

Location30: This dataset includes 5,010 location "check-in" records of different individuals. It is organized into 30 distinct categories, representing different types of geosocial behavior. The 446 binary attributes correspond to various regions or location types, denoting whether or not the individual has visited each area. The primary classification task involves using these 446 binary features to accurately predict an individual's geosocial type.

5.1.2 Data Processing

In our experiments, we divide the datasets into three distinct sets: a training set, a test set, and a shadow set. The training set is employed for training the original model, the test set for assessing the performance of the trained model, and the shadow set for the development of attack models. For the Cifar10 and Cifar100 datasets, we have randomly chosen 20,000 images from their training datasets to form the shadow set for each. The rest of the images are utilized for training classifiers, while the predefined test images make up the test set. In the case of the Purchase100, Texas100, and Location30 datasets, we randomly assign 20% of the records to the test set for each. Subsequently, we select 40,000, 20,000, and 1,000 records as the shadow set for the Purchase100, Texas100, and Location30 datasets, respectively, leaving the remaining records as the training set for each dataset.

5.1.3 Original Models

We employ the Resnet18 model as the original model for learning tasks on the Cifar10 and Cifar100 datasets. For classification tasks involving the Purchase100, Texas100, and Location30 datasets, we have implemented a four-layer fully connected neural network as the original model. This architecture comprises hidden layers with 1024, 512, 256, and 128 neurons, respectively.

5.2 Unlearning Baselines

We implement 7 approximate machine unlearning methods in our evaluation framework, using exact retraining as the benchmark for ground truth. Depending on the type of unlearning required, the retaining set is formed by excluding the selected unlearning samples from the original training set.

Exact Retraining: The model is initialized and retrained on the retained dataset, utilizing the same random seeds and hyperparameters as those used for the original model.

Fine Tuning: We fine-tune the originally trained model on the retained set for 5 epochs with a small learning rate.

Gradient Ascent: Initially, we train the initial model on the unlearning set to record the accumulated gradients. Subsequently, we update the original trained model by adding the recorded gradients as the inverse of the gradient descent learning process (a simplified sketch appears after this list of baselines).

Fisher Forgetting: As per [16], we utilize the Fisher Information Matrix (FIM) of samples related to the retaining set to calculate optimal noise for erasing information of the unlearning samples. Given the huge memory requirement of the original Fisher Forgetting implementation, we employ an elastic weight consolidation (EWC) technique (as suggested by [22]) for a more efficient FIM estimation.

Forsaken: We implement the Forsaken [25] method by masking the neurons of the original trained model with gradients (called mask gradients) that are trained to eliminate the memorization of the unlearning samples.

L-Codec: Similar to Fisher Forgetting, L-Codec uses optimization-based updates to achieve approximate unlearning. To make the Hessian computation process scalable with the model dimension, [26] leverages a variant of a new conditional independence coefficient to identify a subset of the model parameters with the most semantic overlap at an individual sample level.

Boundary Unlearning: Targeting class-level unlearning, this method [11] shifts the decision boundary of the original trained model to imitate the decision behavior of the model retrained from scratch.

SSD: Selective Synaptic Dampening (SSD) [13] is a fast, approximate unlearning method. SSD employs the first-order FIM to assess the importance of parameters associated with the unlearning samples. It then induces forgetting by proportionally dampening these parameters according to their relative importance to the unlearning set in comparison to the broader training dataset.
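As an illustration of how such a baseline can be realized, the sketch below performs gradient ascent directly on the forgetting set in PyTorch. This is a common simplification of the Gradient Ascent description above (which records the accumulated gradients from training on the unlearning set and adds them back); the hyperparameters are placeholders, not the values used in our experiments.

```python
import torch

def gradient_ascent_unlearn(model, forget_loader, loss_fn, lr=1e-4, epochs=1):
    """Sketch of a gradient-ascent unlearning pass: negate the loss on the
    forgetting set so each step inverts a descent update that memorized it."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in forget_loader:
            optimizer.zero_grad()
            loss = -loss_fn(model(x), y)  # ascent on the forgetting-set loss
            loss.backward()
            optimizer.step()
    return model
```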

Figure 2: Metric utility on 3 types of exact unlearning (random sample, partial class, and total class unlearning). Top row: TPR@FPR=0.01‰; bottom row: AUC; each panel covers CIFAR10, CIFAR100, Purchase, Texas, and Location for UnLeak, UnLeak+LS, LiRA, UpdateRatio, UpdateDiff, L-Diff, and D-Liks.

5.3 Metrics Baselines

We compare our designed metrics with state-of-the-art privacy leakage metrics related to machine unlearning, focusing on exact retraining auditing tasks to demonstrate the utility of our metrics.

UnLeak: Following the instructions of [12], we train a baseline model, referred to as the 'shadow original model', on the entire shadow dataset. Subsequently, we train 16 separate 'shadow exact retraining models' on 16 distinct subsets of the shadow dataset, each termed a 'shadow retaining set'. These shadow retaining sets, randomly sampled from 80% of the shadow dataset, each constituted 50% of the shadow dataset's size. The adversary then processes the remaining 20% of the shadow dataset samples, feeding them into their corresponding shadow retraining models and the shadow original model to obtain their posterior outputs. We train the attack model using the features constructed from these posteriors. Finally, the attack model can make predictions for target samples based on their constructed features.

UnLeak+LS: To enhance the performance of UnLeak, we further apply logit scaling to the obtained posteriors, as implemented in LiRA [8]. Then we train the attack model using the new features constructed from the scaled logits.

LiRA: We implement the offline LiRA on θunl to predict whether the target point is a non-member of θunl. The shadow models replicate the architectural design of the original models and use the same random seeds. Additionally, the hyperparameters employed in training the shadow models are aligned with those used in training the original models. Similarly, we train 128 shadow models on randomly sampled shadow datasets. For each target sample, we can query its non-member outputs on those shadow datasets, estimate the corresponding Gaussian distribution, and calculate the LiRA score.

UpdateRatio, UpdateDiff: Following the procedure outlined by Jagielski et al. [20], we calculate the LiRA scores for a given target sample using both the original trained model and the unlearned model. For estimating the LiRA scores, 128 shadow models were employed. We then combined the two scores into a single score using the 'ScoreRatio' and 'ScoreDiff' methods used in [20], separately. Based on the scoring function used, we name the two adversarial methods 'UpdateRatio' and 'UpdateDiff' for convenience.

It's important to note that LiRA, UpdateRatio, and UpdateDiff are originally designed for membership inference purposes. In our experiments, we adapt their outputs from indicating membership probability to reflecting non-membership probability by flipping the outcomes.
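The following sections report all metrics primarily via the TPR at a fixed low FPR of 0.01‰ (i.e., 1e-5) and the AUC. As a reference, here is a minimal sketch of how such numbers can be computed from unlearning scores with scikit-learn; it mirrors standard ROC evaluation rather than any project-specific code, and the label convention is an assumption stated in the docstring.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def tpr_at_low_fpr(labels, scores, target_fpr=1e-5):
    """TPR at a fixed low FPR (1e-5 = 0.01 per mille) and the AUC.

    labels: 1 for targets that were truly unlearned (non-members of the
    unlearned model), 0 otherwise; scores: unlearning scores in [0, 1].
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    mask = fpr <= target_fpr
    low_fpr_tpr = tpr[mask].max() if np.any(mask) else 0.0
    return low_fpr_tpr, roc_auc_score(labels, scores)
```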

Figure 3: Auditing results on exact random sample unlearning. Log-log ROC curves (TPR vs. FPR, with FPR ranging from 1e-5 to 1) on CIFAR10, CIFAR100, Purchase, Texas, and Location for UnLeak, UnLeak+LS, LiRA, UpdateRatio, UpdateDiff, L-Diff, and D-Liks.

6 Auditing Metric Validation

In this section, we start by validating the effectiveness of our newly designed unlearning auditing metrics. This is achieved through a comparative analysis between the original machine learning model and the corresponding exactly retrained model, which excludes certain selected samples. Our findings demonstrate that the utility of unlearning can be quantifiably measured in black-box settings, applicable to three distinct types of unlearning requests, each presenting a distinct challenge. These include:

• Random Sample Unlearning. This process targets samples selected at random, without focusing on any specific class or data characteristic.

• Partial Class Unlearning. This approach involves unlearning a portion of the samples from a specific class in the original model.

• Total Class Unlearning. The model is required to unlearn all instances belonging to a specified class.

6.1 Evaluating Metric Effectiveness

For random sample unlearning, we remove 500 randomly chosen samples from the training set and retrain the model from scratch for each dataset. For partial class unlearning, we simplify this by setting the portion to 50%. For both partial class unlearning and total class unlearning, we select the first 10 classes from each dataset and perform unlearning for each class individually, averaging the results across these 10 classes. The measured results of L-Diff, D-Liks and the other 5 related MIAs are reported in Figure 2. We prioritize the TPR at a 0.01‰ FPR as the primary metric for interpreting the audit results. The AUC scores are also provided for reference. Clearly, L-Diff and D-Liks—particularly D-Liks—demonstrate superior performance over the other baselines by a significant margin. For LiRA, UpdateRatio, and UpdateDiff, although they achieve competitive AUC scores, their TPR at an FPR of 0.01‰ significantly lags behind that of L-Diff and D-Liks. This underscores our previous discussion: a high TPR at a low FPR in membership inference does not necessarily translate to a high TPR at a low FPR in non-membership inference.

We detail the variations in TPRs as the FPR shifts from 1e-5 to 1e-1 for auditing metrics applied to exact random sample unlearning, as shown in Figure 3. On a logarithmic scale, L-Diff and D-Liks maintain a high TPR, offering an order of magnitude improvement over other baselines, particularly at lower FPRs. In fact, the ROC curves for LiRA, UpdateRatio, and UpdateDiff only begin to align with those of L-Diff and D-Liks when the FPR exceeds 0.5—a considerably high FPR threshold, as shown in Appendix B. The discrepancy between L-Diff and D-Liks is small. Unless stated otherwise, we will utilize the average outcomes of L-Diff and D-Liks for the analysis presented in the subsequent sections.

6.2 Observations

Direct vs. Shadow Model-based Auditing. L-Diff and D-Liks achieve higher TPRs at low FPRs by directly using outputs from the original and unlearned models, relying only on non-member data for accurate non-membership estimations. Conversely, UpdateRatio and UpdateDiff show lower TPRs due to their dependence on per-example difficulty scores. Specifically, these methods attempt to gauge the likelihood of membership by estimating a Gaussian distribution of processed outputs under the assumption that the target sample is absent from a collection of shadow models, which complicates the estimation process and reduces TPRs at low FPRs. This highlights the effectiveness of direct model output analysis over methods that introduce additional model complexity, underscoring the value of leveraging the distinct characteristics of the original and unlearned models for precise auditing.

Auditability Trends Across Unlearning Tasks. It is worth emphasizing that the auditability of unlearning tasks exhibits a distinct trend, closely tied to the specificity with which samples are unlearned and subsequently distinguished from those retained. Random sample unlearning, as the most challenging unlearning task, results in the forgetting set potentially sharing a similar distribution with the retaining set. It introduces considerable hurdles in its auditing process as well: not a single auditing metric attains a 100% AUC score. For datasets like CIFAR10, Purchase, and Texas, L-Diff and D-Liks significantly outshine other MIAs by achieving a ten times higher TPR at the 0.01‰ FPR, yet they manage to secure only a 10% TPR. This situation underscores the intrinsic difficulties faced when auditing sample-level unlearning tasks. In scenarios of partial class unlearning, there is a noticeable improvement in the performance of auditing metrics, indicating a positive shift in the auditing's effectiveness. However, the most surprising results occur with the auditing of total class unlearning, where L-Diff and D-Liks manage to achieve a 100% TPR at an FPR of 0.01‰. This significant advancement highlights the critical role of distinguishability between unlearned and retained data in simplifying the audit process, proving that greater clarity in differentiation directly facilitates smoother auditing.

7 Auditing of Approximate Unlearning

In this section, we employ the proposed auditing metrics to assess the performance of existing unlearning baselines. Our approach involves the implementation and evaluation of seven approximate unlearning methods for their unlearning utility, resilience, and equity. The unlearning utility is precisely quantified using our unique set of black-box unlearning audit metrics. We then examine the resilience of these unlearning methods, focusing on how the unlearning utility responds to variations in the number of unlearning requests. In addition, we assess unlearning equity by investigating the disparity in the difficulty of unlearning between various samples or classes, showing that some classes might be unlearned more easily than others. As with the exact retraining method, we apply three types of unlearning requests to these baselines.

Table 2: Unlearning results (TPR@FPR=0.01‰) of 7 approximate unlearning baselines

Dataset     Retrain        FT             Ascent         Forsaken       Fisher        L-Codec       Boundary       SSD

Random Sample Unlearning
CIFAR10     13.73±0.54     0.30±0.12      0.32±0.19      0.19±0.15      0.09±0.05     0.05±0.04     -              0.37±0.21
CIFAR100    59.18±0.79     0.52±0.25      0.18±0.11      0.20±0.11      0.20±0.13     0.10±0.07     -              0.38±0.09
Purchase    11.34±0.27     4.04±0.21      0.20±0.02      0.18±0.03      0.11±0.04     0.11±0.04     -              0.27±0.03
Texas       13.92±0.38     8.22±0.20      0.23±0.02      0.10±0.02      0.15±0.02     0.19±0.03     -              4.85±0.14
Location    56.49±0.36     15.41±0.83     0.02±0.01      0.01±0.01      0.08±0.02     0.20±0.05     -              0.42±0.06

Partial Class Unlearning
CIFAR10     21.06±0.67     0.24±0.06      1.32±0.22      1.33±0.24      0.08±0.03     1.23±0.27     1.35±0.21      0.17±0.05
CIFAR100    76.34±1.14     6.44±1.11      55.36±2.69     55.23±3.29     0.58±0.30     23.54±1.57    49.51±2.63     3.17±0.56
Purchase    29.04±0.81     19.36±1.06     0.21±0.02      0.11±0.01      0.03±0.00     0.00±0.00     40.35±2.49     12.29±0.93
Texas       41.63±1.08     47.41±1.29     57.18±1.88     1.74±0.21      0.07±0.02     -             73.44±2.23     60.17±2.28
Location    75.49±0.52     60.95±0.95     35.44±1.03     2.47±0.28      3.12±0.31     1.75±0.21     70.61±3.00     42.13±2.00

Total Class Unlearning
CIFAR10     99.98±0.01     21.12±1.86     75.45±0.88     9.79±0.80      0.04±0.02     46.02±1.62    44.37±2.43     25.35±1.52
CIFAR100    100.00±0.00    71.39±1.81     69.19±1.49     58.62±1.89     0.26±0.12     15.28±0.27    65.10±1.75     2.06±0.41
Purchase    100.00±0.00    100.00±0.00    0.10±0.01      0.02±0.00      0.02±0.00     0.01±0.00     99.96±0.00     17.04±0.52
Texas       91.64±0.31     93.09±0.15     56.70±0.97     0.61±0.09      29.16±0.03    -             87.21±0.14     59.50±1.27
Location    100.00±0.00    83.84±0.36     41.73±0.46     0.93±0.11      11.37±0.21    2.67±0.14     100.00±0.00    59.56±1.36

7.1 Unlearning Utility

Following the metric validation in the last section, we measure the approximate unlearning results of the various baseline algorithms. Table 2 presents the detailed results of TPR at a fixed low FPR across different datasets, with exact retraining serving as the ground truth. The corresponding AUC scores are available in Table 3, located in Appendix C. Notably, the effectiveness of unlearning algorithms varies considerably based on the dataset and unlearning scenario, with none of the baseline methods consistently reaching the performance benchmark set by exact retraining.

Specifically, in the random sample unlearning scenario, the Finetune algorithm notably excels, surpassing the performance of the other baseline methods. This indicates the complexity of creating a general unlearning algorithm that can adeptly handle cases where the forgetting set and the retaining set share similar distributions. In the partial class unlearning scenario, achieving results that compete with the ground truth remains challenging. The FT, Ascent, and SSD algorithms demonstrate the potential for over-unlearning on the Texas dataset—a phenomenon that appears to be dataset-specific, as it is not consistently observed across other datasets. The implementation of the Boundary algorithm in this context serves as a case study of over-unlearning, given its original design for total class scenarios. It's important to note that over-unlearning does not necessarily translate to improved auditing outcomes. When it comes to total class unlearning, almost every algorithm shows improved performance, yet no single method consistently outperforms across all datasets, including the Boundary algorithm, which is tailored for class-level unlearning.

7.2 Unlearning Resilience

Figure 4: Total class unlearning resilience results (TPR@FPR=0.01‰) of baselines. Panels: (a) CIFAR10, with time steps t0 (airplane), t1 (car), t2 (bird), t3 (cat), t4 (deer); (b) CIFAR100, with time steps t0 (apple), t1 (fish), t2 (baby), t3 (bear), t4 (beaver); each panel plots TPR@FPR=0.01‰ over time for Retrain, FT, Ascent, Forsaken, Fisher, L-Codec, Boundary, and SSD.

In this section, we explore a scenario of continual unlearning, where unlearning requests arrive sequentially, necessitating the unlearning algorithm to execute the unlearning process multiple times. To simulate this, for each dataset, we randomly select five distinct subsets from the training set. For random sample unlearning, we randomly draw five groups without crossover from the training set, each comprising 100 samples. For partial class unlearning and total class unlearning, we randomly select 5 classes to make up the unlearning queue. The chosen unlearning algorithms are then tasked with conducting the unlearning process five times in sequence on a given original model. This means that, by the end of the process, all five distinct subsets have been unlearned. To evaluate the impact of successive unlearning processes on previously unlearned groups, we track and report the auditing scores of the first unlearning group after each unlearning iteration. This approach allows us to observe the cumulative effects of the unlearning process on the model's performance and integrity over time. Figure 4 provides the results on two image datasets, CIFAR10 and CIFAR100; the results of the other datasets and of random sample unlearning are provided in Appendix D.

Unlearning resilience vs. privacy onion effect. The CIFAR10 dataset's unlearning sequence starts with the 'airplane' class, followed by the 'automobile' class, denoted as 'car' in Figure 4. Initially, when only the 'airplane' class is removed, algorithms such as FT, Ascent, L-Codec, and Boundary maintain a TPR@FPR=0.01‰ above 30%. The scenario significantly shifts after the unlearning of the 'car' class, which shares similar textures with 'airplane'; a marked decrease in the unlearning scores for 'airplane' is observed. This emphasizes the profound impact on the model's ability to disassociate from 'airplane' following the removal of 'car'. Notably, the unlearning of classes less similar to 'airplane', such as 'bird', 'cat', and 'deer', leads to a progressive improvement in the unlearning score for 'airplane'. Meanwhile, CIFAR100, with its diverse range of classes ('apple', 'aquarium fish', 'baby', 'bear', 'beaver'), presents a more intricate scenario where the impacts of unlearning are less predictable due to the varied nature of class features and relationships. This variation, where the impact of unlearning specific classes is contingent on their similarity to previously unlearned classes, echoes the 'privacy onion effect', as illustrated by [10]. This concept underscores the intricate, layered approach to memorization and forgetting within machine learning models, demonstrating how data privacy is managed through unlearning.

However, acknowledging the 'privacy onion effect' does not eliminate concerns about the significant impact that subsequent unlearning may have on the outcomes of earlier unlearning efforts. This is highlighted by the observation that exact retraining is able to maintain near-perfect TPR scores consistently throughout the process. This consistency indicates that the 'privacy onion effect' becomes evident primarily after the model has initially been trained on related samples. Therefore, effective approximate machine unlearning should aim to achieve a level of resilience comparable to that of exact retraining, ensuring that the process of unlearning does not detrimentally affect the model's unlearning ability on previously unlearned samples.

7.3 Unlearning Equity

We continued our analysis and assessed the variations in unlearning equity across different unlearning groups for each unlearning baseline. We focused on two key tasks: partial class unlearning and total class unlearning. Our findings, illustrated through clear visualizations, highlight the unlearning equity among groups. Specifically, Figure 5 shows the TPR at 0.01‰ FPR for total class unlearning, a task noted for its strong auditability. Figure 13 reports the results of partial class unlearning. For a coherent comparison, we use the highest TPR observed as a benchmark to determine the polar chart's radius. This strategy scales the performance of the other groups against this peak value, providing a graphical depiction of unlearning equity. This method not only showcases the unlearning equity across groups but also offers an accessible, comparative analysis of our results. Additional AUC scores are provided in Appendix E for a more comprehensive understanding.

It is evident from our analysis that significant disparities exist in the unlearning equity of various unlearning methods. These discrepancies highlight that even specialized class-level unlearning techniques like Boundary expanding exhibit variances in their unlearning capabilities. This finding raises concerns regarding unlearning equality for other methods such as SSD, Fisher, Ascent, and L-Codec. These methods demonstrate considerable challenges in achieving uniform unlearning across different data classes, pointing to a significant issue of unlearning equality. This inconsistency suggests that while some groups may be effectively unlearned, others retain residual information, leading to potential privacy breaches and compliance risks.

Does unlearning indeed incur an equity cost? This pivotal question emerges from our analysis. Figure 5 shows no clear pattern of equity cost tied to the dataset itself. Instead, the capacity for achieving equity varies significantly across different algorithms. For instance, algorithms like Boundary and Finetune (FT) on the Purchase dataset successfully demonstrate the possibility of equitable unlearning. In contrast, algorithms such as Ascent, Fisher, and SSD exhibit substantial equity issues. The challenge in unlearning certain classes varies across different algorithms, suggesting that equity issues are more closely linked to the algorithms used rather than to the datasets' inherent characteristics. Notably, exact retraining demonstrates a capacity to achieve almost 100% TPR@FPR=0.01‰.

8 Discussions

8.1 Limitations

In this study, we presented two black-box unlearning auditing metrics integrated within a standardized evaluation framework.

12
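As a small illustration of the scaling used for the polar charts in Figures 5 and 13, the snippet below divides each group's score by that of the best-performing group, so that the strongest group sits at radius 1.0; the per-class TPR values here are hypothetical.

```python
# Minimal sketch of the relative scaling behind the polar charts: each class's
# TPR@FPR=0.01‰ is divided by the highest observed TPR, so the best class maps
# to radius 1.0. The numbers below are hypothetical.
def relative_equity(tpr_per_class):
    peak = max(tpr_per_class.values())
    return {cls: tpr / peak for cls, tpr in tpr_per_class.items()}

example = {"class 1": 0.92, "class 2": 0.35, "class 3": 0.88}
print(relative_equity(example))   # class 1 -> 1.0, the rest scaled to the peak
```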
Figure 5: Total class unlearning relative results (TPR@FPR=0.01‰). [Polar charts for each dataset and method pair (CIFAR10, CIFAR100, Location, Purchase, Texas; Retrain, FT, Ascent, Forsaken, SSD, Fisher, Boundary, L-Codec, with no L-Codec panel for Texas), with one axis per class (Class 1 to Class 10).]
8 Discussions

8.1 Limitations

In this study, we presented two black-box unlearning auditing metrics integrated within a standardized evaluation framework. A key challenge identified relates to the data distribution characteristics of the forgetting set and the retaining set: overlap between these distributions can notably affect the framework's ability to accurately audit the unlearning process. This observation emphasizes the urgent need for continued refinement of unlearning auditing methodologies.

8.2 Identified Side Effects

In our analysis presented in Section 7, we uncover a notable discrepancy between the efficacy of current approximate unlearning algorithms and the ground-truth benchmarks, especially in terms of unlearning resilience and equity. Many backdoor-removal works rely on approximate machine unlearning methods to remove backdoors from trained models, and our results hint at the potential risks of such approaches. To mitigate these side effects, it is essential to incorporate robustness evaluations, ensuring that unlearning algorithms function fairly across diverse datasets and are capable of handling various complexities and data distributions. Furthermore, the development of adaptive unlearning frameworks is imperative. Such frameworks should dynamically adjust their methods in response to the specific attributes of the data and the inherent memorization patterns of the model, enhancing the long-term effectiveness and fairness of unlearning processes.

8.3 Privacy Leakage during Auditing

Using non-membership inference for unlearning auditing can inadvertently reveal whether certain data was used for training; it is essentially the other side of the membership-inference coin, focusing on data that should no longer be present. To avoid these privacy risks, auditing must be carefully separated from privacy attacks. Privacy attacks usually examine a single released model, whereas unlearning audits compare the model before and after data removal. One viable strategy to reduce direct exposure is to introduce a temporal delay in the availability of these models for auditing, preventing attackers from simultaneously accessing both the original and unlearned models.

Privacy protection methods can also help. Differential privacy adds calibrated randomness to released model information, and cryptographic techniques such as secure multi-party computation (SMPC) or homomorphic encryption allow auditors to run checks without seeing sensitive data directly. These methods help keep the model's information safe while still allowing thorough unlearning audits.
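To illustrate the differential-privacy direction, here is a minimal sketch that releases an aggregate auditing statistic through the standard Laplace mechanism instead of exposing raw per-sample scores. The epsilon value, the sensitivity bound, and the scores themselves are placeholder assumptions rather than settings used in our experiments.

```python
# Minimal sketch, not our implementation: release the mean auditing score over
# the forgetting set with Laplace noise so that individual scores are protected.
import random

def laplace_noise(scale):
    # A Laplace(0, scale) draw as the difference of two exponential draws.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_release_mean(scores, epsilon=1.0):
    """Noisy mean of auditing scores, assuming each score lies in [0, 1].

    Changing one score shifts the mean by at most 1/len(scores), which is used
    as the sensitivity bound; epsilon is a placeholder privacy budget.
    """
    mean = sum(scores) / len(scores)
    sensitivity = 1.0 / len(scores)
    return mean + laplace_noise(sensitivity / epsilon)

audit_scores = [0.91, 0.88, 0.95, 0.79]       # hypothetical per-sample scores
print(dp_release_mean(audit_scores, epsilon=0.5))
```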
References

[1] Samyadeep Basu, Phillip Pope, and Soheil Feizi. Influence functions in deep learning are fragile. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

[2] Alexander Becker and Thomas Liebig. Evaluating machine unlearning via epistemic uncertainty. CoRR, abs/2208.10836, 2022.

[3] Theo Bertram, Elie Bursztein, Stephanie Caro, Hubert Chao, Rutledge Chin Feman, Peter Fleischer, Albin Gustafsson, Jess Hemerly, Chris Hibbert, Luca Invernizzi, Lanah Kammourieh Donnelly, Jason Ketover, Jay Laefer, Paul Nicholas, Yuan Niu, Harjinder Obhi, David Price, Andrew Strait, Kurt Thomas, and Al Verney. Five years of the right to be forgotten. In Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang, and Jonathan Katz, editors, Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 959–972. ACM, 2019.

[4] Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pages 141–159. IEEE, 2021.

[5] Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. What does it mean for a language model to preserve privacy? In FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21-24, 2022, pages 2280–2292. ACM, 2022.

[6] Joseph A. Calandrino, Ann Kilzer, Arvind Narayanan, Edward W. Felten, and Vitaly Shmatikov. "You might also like:" Privacy risks of collaborative filtering. In 32nd IEEE Symposium on Security and Privacy, SP 2011, 22-25 May 2011, Berkeley, California, USA, pages 231–246. IEEE Computer Society, 2011.

[7] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17-21, 2015, pages 463–480. IEEE Computer Society, 2015.

[8] Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, and Florian Tramèr. Membership inference attacks from first principles. In 43rd IEEE Symposium on Security and Privacy, SP 2022, San Francisco, CA, USA, May 22-26, 2022, pages 1897–1914. IEEE, 2022.

[9] Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramèr, and Chiyuan Zhang. Quantifying memorization across neural language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.

[10] Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, and Florian Tramèr. The privacy onion effect: Memorization is relative. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022.

[11] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 7766–7775. IEEE, 2023.

[12] Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. When machine unlearning jeopardizes privacy. In Yongdae Kim, Jong Kim, Giovanni Vigna, and Elaine Shi, editors, CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15-19, 2021, pages 896–911. ACM, 2021.

[13] Jack Foster, Stefan Schoepf, and Alexandra Brintrup. Fast machine unlearning without retraining through selective synaptic dampening. CoRR, abs/2308.07707, 2023.

[14] Daniel Frassinelli, Sohyeon Park, and Stefan Nürnberger. I know where you parked last summer: Automated reverse engineering and privacy analysis of modern cars. In 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020, pages 1401–1415. IEEE, 2020.

[15] Shashwat Goel, Ameya Prabhu, Amartya Sanyal, Ser-Nam Lim, Philip Torr, and Ponnurangam Kumaraguru. Towards adversarial evaluations for inexact machine unlearning. 2023.
[16] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 9301–9309. Computer Vision Foundation / IEEE, 2020.

[17] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXIX, volume 12374 of Lecture Notes in Computer Science, pages 383–398. Springer, 2020.

[18] Laura Graves, Vineel Nagisetty, and Vijay Ganesh. Amnesiac machine learning. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pages 11516–11524. AAAI Press, 2021.

[19] Yangsibo Huang, Xiaoxiao Li, and Kai Li. EMA: auditing data removal from trained models. In Marleen de Bruijne, Philippe C. Cattin, Stéphane Cotin, Nicolas Padoy, Stefanie Speidel, Yefeng Zheng, and Caroline Essert, editors, Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27 - October 1, 2021, Proceedings, Part V, volume 12905 of Lecture Notes in Computer Science, pages 793–803. Springer, 2021.

[20] Matthew Jagielski, Stanley Wu, Alina Oprea, Jonathan R. Ullman, and Roxana Geambasu. How to combine membership-inference attacks on multiple updated machine learning models. Proc. Priv. Enhancing Technol., 2023(3):211–232, 2023.

[21] Hengrui Jia, Mohammad Yaghini, Christopher A. Choquette-Choo, Natalie Dullerud, Anvith Thudi, Varun Chandrasekaran, and Nicolas Papernot. Proof-of-learning: Definitions and practice. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pages 1039–1056. IEEE, 2021.

[22] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.

[23] Christophe Leys, Christophe Ley, Olivier Klein, Philippe Bernard, and Laurent Licata. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4):764–766, 2013.

[24] Xiao Liu and Sotirios A. Tsaftaris. Have you forgotten? A method to assess if machine learning models have forgotten data. In Anne L. Martel, Purang Abolmaesumi, Danail Stoyanov, Diana Mateus, Maria A. Zuluaga, S. Kevin Zhou, Daniel Racoceanu, and Leo Joskowicz, editors, Medical Image Computing and Computer Assisted Intervention - MICCAI 2020 - 23rd International Conference, Lima, Peru, October 4-8, 2020, Proceedings, Part I, volume 12261 of Lecture Notes in Computer Science, pages 95–105. Springer, 2020.

[25] Zhuo Ma, Yang Liu, Ximeng Liu, Jian Liu, Jianfeng Ma, and Kui Ren. Learn to forget: Machine unlearning via neuron masking. IEEE Trans. Dependable Secur. Comput., 20(4):3194–3207, 2023.

[26] Ronak Mehta, Sourav Pal, Vikas Singh, and Sathya N. Ravi. Deep unlearning via randomized conditionally independent hessians. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 10412–10421. IEEE, 2022.

[27] Pardis Emami Naeini, Janarth Dheenadhayalan, Yuvraj Agarwal, and Lorrie Faith Cranor. Which privacy and security attributes most impact consumers' risk perception and willingness to purchase IoT devices? In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pages 519–536. IEEE, 2021.

[28] Stuart L. Pardau. The California Consumer Privacy Act: Towards a European-style privacy regime in the United States. J. Tech. L. & Pol'y, 23:68, 2018.

[29] Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. Remember what you want to forget: Algorithms for machine unlearning. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 18075–18086, 2021.

[30] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 3–18. IEEE Computer Society, 2017.
[31] Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. Unrolling SGD: Understanding factors influencing machine unlearning. In 7th IEEE European Symposium on Security and Privacy, EuroS&P 2022, Genoa, Italy, June 6-10, 2022, pages 303–319. IEEE, 2022.

[32] Anvith Thudi, Hengrui Jia, Ilia Shumailov, and Nicolas Papernot. On the necessity of auditable algorithmic definitions for machine unlearning. In 31st USENIX Security Symposium (USENIX Security 22), pages 4007–4022, 2022.

[33] Alexander Warnecke, Lukas Pirch, Christian Wressnegger, and Konrad Rieck. Machine unlearning of features and labels. 2023.

[34] Yinjun Wu, Edgar Dobriban, and Susan B. Davidson. DeltaGrad: Rapid retraining of machine learning models. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 10355–10366. PMLR, 2020.

A Proof of Theorem 1

Proof. Given events A and B, where A represents the inclusion of a target sample x in the training set of an original model θ_ori, and B represents the absence of sample x from the training set after the model has undergone a process of so-called ‘unlearning’ to become θ_unl, the conditional probability P(B|A) is equal to the marginal probability P(B) once we have validated the occurrence of event A (P(A) = 1). We analyze the relationship between A and B under two distinct scenarios.

Case 1: The events A and B are statistically independent, which can occur if the server performing the unlearning process has access to sample x. Under this condition, the presence or absence of x in the original model θ_ori has no bearing on its presence in the unlearned model θ_unl. For example, a dishonest server could incrementally include x in the training data for the unlearned model θ_unl without it having been in the original model. Formally, this independence implies that:

P(B|A) = P(B). (12)

Case 2: The events A and B are not statistically independent, which can occur if the server lacks access to sample x during unlearning. Here, the probability P(B) is conditioned by P(A); for example, P(B) = 1 when P(A) = 0. Applying Bayes' Theorem under the assumption that P(A) = 1, we have:

P(B|A) = P(A|B) · P(B) / P(A)
       = P(A|B) · P(B) / (P(A|B) · P(B) + P(A|B^c) · P(B^c))
       = (1 − P(A|B^c) · P(B^c)) / 1
       = 1 − P(A|B^c) · P(B^c). (13)

Therefore, P(B|A) = P(B) if and only if P(A|B^c) = 1. It is clear that when sample x is part of the training set of the unlearned model θ_unl, it must also be part of the training set of the original model θ_ori, i.e., P(A|B^c) = 1. Thus, the equality P(B|A) = P(B) holds, just as in the independent case. In other words, leveraging non-membership inference allows us to perform unlearning auditing seamlessly.
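As a quick numerical check of Eq. (13), with purely illustrative values: take P(B) = 0.7, hence P(B^c) = 0.3, and suppose the server has no access to x, so that P(A|B^c) = 1. Then

```latex
% Illustrative numbers only; they do not correspond to any measured quantity.
\begin{aligned}
P(B \mid A) &= 1 - P(A \mid B^{c})\, P(B^{c}) \\
            &= 1 - 1 \times 0.3 \\
            &= 0.7 = P(B),
\end{aligned}
```

so conditioning on the validated event A leaves the non-membership probability of B unchanged, matching the conclusion of the proof.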
B Additional results of metric validation
Event, volume 119 of Proceedings of Machine Learning
Research, pages 10355–10366. PMLR, 2020. Figure 6 and Figure 7 showcase how the TPRs of the different
privacy metrics vary with changes in the FPR for the exact
unlearning tasks.
A Proof of Theorem 1
C Additional results of unlearning utility
Proof. Given events A and B where A represents the inclu-
sion of a target sample x in the training set of an original Table 3 shows the AUC scores of 7 approximate unlearning
model θori , and B represents the absence of sample x from baselines.
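For readers reproducing these curves, the snippet below shows one standard way to read a TPR at a fixed FPR (0.01‰ corresponds to a target of 1e-5) from per-sample auditing scores. The score values, and the split into positives (samples that were truly unlearned) and negatives (held-out non-members), are hypothetical and not tied to any particular metric in this paper.

```python
# Minimal sketch: sweep thresholds from high to low and keep the largest TPR
# whose false positive rate on the negatives does not exceed the target FPR.
def tpr_at_fpr(pos_scores, neg_scores, target_fpr=1e-5):
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        if fpr > target_fpr:
            break
        best_tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
    return best_tpr

# Hypothetical scores: higher means "more confidently judged as removed".
positives = [0.97, 0.91, 0.88, 0.45]   # truly unlearned samples
negatives = [0.40, 0.35, 0.22, 0.10]   # held-out non-members
print(tpr_at_fpr(positives, negatives))
```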
C Additional results of unlearning utility

Table 3 shows the AUC scores of 7 approximate unlearning baselines.

D Additional results of unlearning resilience

We provide the AUC scores and TPR@FPR=0.01‰ results of the unlearning baselines on 5 datasets.

E Additional results of unlearning equity

Figures 13, 14, and 15 present the unlearning equity results of the different approximate unlearning methods.
Figure 6: Auditing TPR-FPR curves on exact random sample unlearning. [TPR versus FPR curves on CIFAR10, CIFAR100, Purchase, Texas, and Location for the UnLeak, UnLeak+LS, LiRA, UpdateRatio, UpdateDiff, L-Diff, and D-Liks metrics.]

Figure 7: Auditing TPR-FPR curves on exact partial class unlearning. [Same panels and metrics as Figure 6.]

Figure 8: Total class unlearning resilience results (AUC) of baselines on CIFAR10 and CIFAR100. [AUC over time steps t0(airplane) through t4(deer) for CIFAR10 and t0(apple) through t4(beaver) for CIFAR100, comparing Retrain, FT, Ascent, Forsaken, Fisher, L-Codec, Boundary, and SSD.]
Table 3: Unlearning results (AUC) of 7 approximate unlearning baselines

Dataset     Retrain        FT             Ascent         Forsaken       Fisher         L-Codec        Boundary       SSD

Random
CIFAR10     65.48±0.89     53.17±0.70     51.82±0.84     52.98±0.83     50.63±0.57     48.31±0.49     -              53.48±0.41
CIFAR100    86.31±0.61     57.83±0.77     50.84±0.59     53.24±0.30     50.97±0.77     56.48±0.20     -              55.82±0.32
Purchase    65.19±0.16     64.31±0.07     49.24±0.13     50.33±0.07     49.96±0.15     48.72±0.08     -              47.98±0.14
Texas       72.46±0.16     68.81±0.23     50.08±0.19     48.21±0.19     49.61±0.17     49.74±0.09     -              56.02±0.21
Location    85.47±0.18     79.08±0.22     49.19±0.13     46.41±0.22     51.88±0.09     51.74±0.16     -              50.21±0.08

Partial Class
CIFAR10     74.02±0.33     36.91±0.34     97.31±0.05     96.54±0.05     49.94±0.52     94.22±0.06     96.45±0.04     77.80±0.23
CIFAR100    91.33±0.57     77.22±0.94     98.90±0.11     99.29±0.06     49.94±1.25     70.45±0.62     98.91±0.08     89.61±0.35
Purchase    93.81±0.05     95.29±0.04     59.70±0.12     50.94±0.14     49.43±0.19     27.55±0.11     99.75±0.01     98.91±0.03
Texas       89.20±0.17     91.43±0.18     93.53±0.13     50.92±0.32     39.23±0.27     -              96.23±0.10     97.56±0.09
Location    94.96±0.16     95.66±0.14     87.30±0.22     51.70±0.41     53.40±0.45     32.98±0.45     99.23±0.09     97.04±0.16

Total Class
CIFAR10     100.00±0.00    97.01±0.07     99.94±0.00     98.56±0.04     49.99±0.25     99.31±0.02     99.18±0.02     91.57±0.07
CIFAR100    100.00±0.00    99.52±0.04     99.20±0.06     99.42±0.04     50.30±1.11     62.34±0.62     99.22±0.05     91.26±0.26
Purchase    100.00±0.00    100.00±0.00    57.48±0.10     52.60±0.09     49.11±0.10     56.23±0.11     100.00±0.00    99.14±0.02
Texas       99.52±0.01     98.58±0.04     93.72±0.09     54.22±0.27     52.74±0.18     -              96.27±0.03     97.97±0.06
Location    100.00±0.00    99.20±0.04     88.23±0.15     48.66±0.37     56.54±0.37     50.95±0.34     100.00±0.00    98.50±0.06
Figure 9: Total class unlearning resilience results (TPR@FPR=0.01‰) of baselines on Location, Purchase, and Texas. [TPR@FPR=0.01‰ over time steps t0–t4 for (a) Location, (b) Purchase, and (c) Texas.]

Figure 10: Total class unlearning resilience results (AUC) of baselines on Location, Purchase, and Texas. [AUC over time steps t0–t4 for (a) Location, (b) Purchase, and (c) Texas.]
Figure 11: Random sample unlearning resilience results (TPR@FPR=0.01‰) of baselines. [TPR@FPR=0.01‰ over time steps t0–t4 for (a) CIFAR10, (b) CIFAR100, (c) Location, (d) Purchase, and (e) Texas.]

Figure 12: Random sample unlearning resilience results (AUC) of baselines. [AUC over time steps t0–t4 for the same five datasets.]
Figure 13: Partial class unlearning relative results (TPR@FPR=0.01‰). [Polar charts with the same dataset and method layout as Figure 5, one axis per class (Class 1 to Class 10).]

Figure 14: Total class unlearning results (AUC) of 10 classes. [Polar charts with the same dataset and method layout as Figure 5.]
Figure 15: Partial class unlearning results (AUC) of 10 classes. [Polar charts with the same dataset and method layout as Figure 5.]
