Towards Neural Networks Robust Against Sparse Attacks
Francisco Ferrari
April 2022
Objectives
• Verify whether projected gradient ascent (PGA) attacks can be used in
adversarial training to obtain competitive robust models
• Explore whether there are other bounded attack methods that can find
L1 and L0 perturbations in an efficient way
Adversarial Examples

[Figure: an image x classified 89% as "panda"; after adding a small perturbation, the image x + Δx is classified 97% as "cow".]
Projected Gradient Ascent (PGA): iteratively take a gradient ascent step on the loss and project the perturbation δ back onto the allowed set S:

δ_{t+1} = Π_S( δ_t + η ∇_δ L(f_θ(x + δ_t), y) )
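A single PGA step of this form can be sketched in NumPy (a minimal illustrative sketch, assuming an L-inf budget eps and images normalized to [0, 1]; all names are placeholders, not the implementation used in this work):

```python
import numpy as np

def pga_step_linf(x, delta, grad, step_size, eps):
    """One projected gradient ascent step under an L-inf budget.

    Illustrative sketch: take an ascent step along the sign of the loss
    gradient, project delta back onto the L-inf ball of radius eps, and
    keep x + delta inside the assumed [0, 1] dynamic range.
    """
    delta = delta + step_size * np.sign(grad)   # gradient ascent step
    delta = np.clip(delta, -eps, eps)           # project onto the L-inf ball
    delta = np.clip(x + delta, 0.0, 1.0) - x    # keep the image in its box
    return delta
```

In a full attack this step would be repeated for a fixed number of iterations, with grad recomputed from the model's loss at x + delta each time.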
Automatic Projected Gradient Ascent (APGD):
Works in a similar fashion to PGD, but the step size and the sparsity level (for l1) are automatically adjusted at each iteration,
depending on the evolution of the loss over previous iterations.
Mind the box: l1-APGD for sparse adversarial attacks on image classifiers (Croce and Hein)
The Constraint Problem

When computing adversarial perturbations we must ensure the images stay within their dynamic range (e.g. [0, 255]).
For l-inf and l-2 perturbations, almost all pixels are distorted with small magnitude, so the constraints are usually
not enforced, or clipping values that exceed the dynamic range has only a minor impact.
Sparse perturbations (L0 and L1) instead distort few pixels with high magnitude.
Clipping values after the adversarial attack therefore has a significant impact on the success of the attack.
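The effect can be illustrated numerically (an illustrative NumPy sketch with an assumed [0, 1] dynamic range and made-up budgets; it measures what fraction of the perturbation mass survives naive clipping):

```python
import numpy as np

# Illustrative comparison: clipping barely changes a dense, small L-inf
# perturbation, but can destroy much of a sparse, high-magnitude L1 one.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)

# Dense L-inf perturbation: every pixel moves by at most 0.03.
d_inf = 0.03 * rng.choice([-1.0, 1.0], size=1000)
kept_inf = np.abs(np.clip(x + d_inf, 0, 1) - x).sum() / np.abs(d_inf).sum()

# Sparse L1-style perturbation: 10 pixels move by +/- 0.8 each.
d_l1 = np.zeros(1000)
idx = rng.choice(1000, size=10, replace=False)
d_l1[idx] = 0.8 * rng.choice([-1.0, 1.0], size=10)
kept_l1 = np.abs(np.clip(x + d_l1, 0, 1) - x).sum() / np.abs(d_l1).sum()

# kept_inf stays close to 1, while kept_l1 loses a large share of its mass.
print(kept_inf, kept_l1)
```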
Min-Max Problem

Adversarial training solves a min-max problem: the inner maximization (the PGA adversary) finds a worst-case perturbation, and the outer minimization trains the model parameters θ on the perturbed inputs:

min_θ E_{(x, y)} [ max_{δ ∈ S} L(f_θ(x + δ), y) ]

Adversarial training can only rely on approximate methods to solve the inner maximization problem.
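As a toy illustration of this min-max structure, here is adversarial training for a linear (logistic regression) model, where the inner maximization under an L-inf budget has a closed form. This is a hedged sketch with made-up names and budgets, not the training setup used in this work:

```python
import numpy as np

def adv_train_logreg(X, y, eps=0.1, lr=0.5, steps=200):
    """Adversarially train binary logistic regression (labels y in {-1, +1}).

    Toy sketch of the min-max problem: for a linear score y * (x @ w), the
    worst-case L-inf perturbation of budget eps is exactly
    -eps * y * sign(w), so the inner maximization is solved in closed form;
    the outer minimization is plain gradient descent on the logistic loss.
    """
    rng = np.random.default_rng(0)
    w = 0.01 * rng.normal(size=X.shape[1])
    for _ in range(steps):
        # Inner max: move each example against its label along sign(w).
        X_adv = X - eps * y[:, None] * np.sign(w)[None, :]
        margins = y * (X_adv @ w)
        s = 0.5 * (1.0 - np.tanh(margins / 2.0))   # sigmoid(-margins), stable
        # Outer min: gradient of the mean logistic loss log(1 + exp(-margins)).
        grad = -(s[:, None] * y[:, None] * X_adv).mean(axis=0)
        w -= lr * grad
    return w
```

On linearly separable toy data this tends to yield a classifier whose margin exceeds eps * ||w||_1, i.e. one that resists any L-inf perturbation within the budget.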
Speaker
• We want to modify the smallest number of pixels needed to change the decision
• The L-infinity and L-2 balls are easy to project onto, so projected gradient descent (PGD) can
typically obtain consistent and satisfying results in these cases.
• However, the performance of PGD degrades significantly in the L-1 and L-0 cases, which
leaves much room for improvement
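One reason the L-1 case is harder is the projection step itself: Euclidean projection onto the L-1 ball soft-thresholds the iterate and yields sparse updates. The standard sort-based projection can be sketched as follows (illustrative NumPy, not tied to any specific attack implementation):

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball of the given radius.

    Standard sort-based algorithm (illustrative; this is the projection
    a PGD-style L1 attack would need after each gradient update).
    """
    if np.abs(v).sum() <= radius:
        return v.copy()                          # already inside the ball
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - radius) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)    # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Note that the result is typically sparse: small coordinates are thresholded to exactly zero, which is what makes plain PGD iterates behave differently in the L-1 geometry.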
Gradient Regularization

Impose structural constraints on how the gradient of the loss function with respect to the input features x behaves.

Regularizer:
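The specific regularizer is not shown here, but a common form penalizes the input gradient of the loss: for L1-bounded (sparse) perturbations, the first-order loss change is bounded by ||∇_x L||_inf · ||δ||_1, so a max-abs input-gradient penalty is one natural choice. A sketch for a linear model, where the input gradient has a closed form (the penalty form is an assumption, not necessarily the regularizer used in this work):

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    """Logistic loss plus an input-gradient penalty (illustrative sketch).

    For a linear model the per-example input gradient has the closed form
    grad_x L = -sigmoid(-margin) * y * w; we penalize its L-inf norm, which
    controls first-order sensitivity to L1-bounded perturbations.
    """
    margins = y * (X @ w)
    loss = np.log1p(np.exp(-margins)).mean()          # mean logistic loss
    s = 0.5 * (1.0 - np.tanh(margins / 2.0))          # sigmoid(-margins)
    grad_x = -s[:, None] * y[:, None] * w[None, :]    # per-example grad_x
    penalty = np.abs(grad_x).max(axis=1).mean()       # mean L-inf norm
    return loss + lam * penalty
```

Training would then minimize this combined objective instead of the plain loss, trading some clean accuracy for smaller input gradients.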
Initial Accuracy Results

PGD L1: 90.9% / 81.8% / 34.9%
[Figure: results for PGD-L1 (no regularization) vs. APGD-L1 (no regularization)]
Future Work

§ Continue to look into regularization as a means to improve robust
accuracy
§ Continue to investigate whether PGA-based methods can be used to find L1
and L0 attacks with simpler solutions
§ Explore whether there are other bounded attack methods that can find
L1 and L0 perturbations in an efficient way
Thank you!