
Towards Neural Networks Robust Against Sparse Attacks

Francisco Ferrari

April 2022
Objectives 2

• Verify if projected gradient ascent (PGA) attacks can be used in adversarial training to obtain competitive robust models

• Optimize PGA-based methods to find L1 and L0 attacks with lower computational cost or with simpler solutions

• Explore whether there are other bounded attack methods that can find L1 and L0 perturbations in an efficient way
Adversarial Examples 3

Adversarial Example: small perturbations to the input data that can cause large changes in the model output.

[Figure: an image classified as "Panda" with 89% confidence is classified as "cow" with 97% confidence after the perturbation.]
Adversarial examples 4

[Figure: "Panda" image passed through a neural network]

§ NNs have a lot of parameters, act as black boxes & are highly non-linear

§ NNs work in a different way than humans → they are less robust to input changes than humans are

Neural Network: Grant Sanderson (www.3blue1brown.com)


Adversarial examples 5

[Figure: the perturbed input x + Δx is an adversarial example — the same network now predicts "Cow".]

Neural Network: Grant Sanderson (www.3blue1brown.com)


Finding Adversarial Examples 7

Projected Gradient Ascent: iteratively update the perturbation δ to find a local maximum of the loss.

• Coordinate ascent → slow convergence
• Improve by taking the top-k directions → however, choosing k is tricky
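A minimal PyTorch sketch of the top-k idea above (assuming a classifier `model`, cross-entropy loss, an L1 budget `eps`, and inputs in [0, 1]; the function name, step size, and the crude L1 projection are illustrative assumptions, not the exact attack from these slides):

import torch
import torch.nn.functional as F

def topk_pga_l1(model, x, y, eps=10.0, step=1.0, k=10, iters=50):
    # Projected gradient ascent that only updates the k coordinates with the
    # largest gradient magnitude, then (roughly) projects back onto the L1 ball
    # and the valid pixel range [0, 1].
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        flat = grad.view(grad.size(0), -1)
        topk_idx = flat.abs().topk(k, dim=1).indices
        mask = torch.zeros_like(flat).scatter_(1, topk_idx, 1.0)
        d = delta.view(flat.shape) + step * flat.sign() * mask
        # crude projection: rescale onto the L1 ball of radius eps
        l1 = d.abs().sum(dim=1, keepdim=True).clamp(min=1e-12)
        d = d * (eps / l1).clamp(max=1.0)
        delta = d.view_as(x).detach()
        delta = (x + delta).clamp(0.0, 1.0) - x   # keep x + delta in the valid range
        delta.requires_grad_(True)
    return (x + delta).detach()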
Finding Adversarial Examples 8

Automatic Projected Gradient Ascent (APGD):
Works in a similar fashion to PGD, but the step size and the sparsity (for L1) are automatically adjusted at each step depending on the evolution of the loss over previous iterations.

• APGD is stronger than PGD
• Better projection onto the intersection of the L1 ball and the box constraints
• Automatic step-size adjustment → hyper-parameter free
• Possibility of using multi-epsilon to explore different values of the adversarial budget

Mind the box: l1-APGD for sparse adversarial attacks on image classifiers (Croce and Hein)
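A simplified sketch of the kind of loss-driven step-size schedule referred to above (a plain-Python paraphrase; the actual APGD rule in Croce and Hein also uses checkpoints, momentum, and a sparsity schedule for the L1 version):

def adjust_step(step, loss_history, window=5, min_fraction=0.75):
    # Halve the step size when too few of the recent iterations increased the loss.
    # loss_history holds one loss value per attack iteration (ascent: higher is better).
    if len(loss_history) <= window:
        return step
    recent = loss_history[-(window + 1):]
    improved = sum(1 for prev, cur in zip(recent, recent[1:]) if cur > prev)
    if improved / window < min_fraction:
        step = step / 2.0
    return step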
The constraint problem 9

When computing adversarial perturbations we have to respect the images' dynamic range (e.g. [0, 255]).

For L-inf and L2 perturbations almost all pixels are distorted with a small magnitude, so usually the constraints are not enforced, or, if values exceed the dynamic range, clipping them has a minor impact.

For sparse perturbations (L0 & L1) the result is a few distorted pixels of high magnitude, and clipping values after the adversarial attack has an impact on the success of the attack.

Several methods take these limitations into consideration:

• SparseFool (Modas et al.): linearize the constraint by approximating the decision hyperplane
• APGD-L1 (Croce and Hein): closed-form solution under the assumption that the perturbation is sufficiently small
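A toy NumPy illustration of this point (the numbers are made up): a sparse perturbation that pushes a few bright pixels far outside the valid range loses most of its magnitude when clipped back to [0, 1], while a small dense L-inf-style perturbation is almost unaffected.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.7, 1.0, size=100)                    # a patch of bright pixels in [0, 1]

sparse = np.zeros_like(x)                              # L0-style: 3 pixels pushed by a large amount
sparse[:3] = 2.0
clipped_sparse = np.clip(x + sparse, 0.0, 1.0) - x
print(np.abs(sparse).sum(), np.abs(clipped_sparse).sum())   # most of the budget is lost

dense = np.full_like(x, 0.01)                          # Linf-style: every pixel pushed slightly
clipped_dense = np.clip(x + dense, 0.0, 1.0) - x
print(np.abs(dense).sum(), np.abs(clipped_dense).sum())     # almost unchanged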
Adversarial Training 10

Min-max problem, with a PGA adversary for the inner maximization.

Optimization problem: non-convex, non-concave min-max problem → the exact solution is NP-hard.

Adversarial training can only rely on approximate methods to solve the inner maximization problem.

• L2 & L-inf → the approximation is good
• L1 → probably not the case
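For reference, this is the standard min-max formulation of adversarial training (written here in the usual notation, with the L1 threat model of this talk; the slide's own equation is not reproduced):

\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[ \max_{\|\delta\|_1 \le \epsilon,\; x+\delta \in [0,1]^d}
\ell\big(f_\theta(x+\delta),\, y\big) \Big]

The PGA adversary approximates the inner maximization over δ at every training step.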
Sparse Attacks 11

• We want to modify the smallest number of pixels needed to change the decision
• Due to the convexity of the L-inf and L2 norms, projected gradient descent (PGD) can typically obtain consistent and satisfying results in these cases
• However, the performance of PGD degrades significantly in the L1 and L0 cases, which leaves much room for improvement

PGA: perturbations are updated in the steepest ascent direction of the loss, which depends on the norm used to bound the attack (see the worked example below).
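As a worked example (a standard result, not specific to this deck): maximizing the linearized loss g^T δ, with g the input gradient, under a norm budget ε gives

\delta^{\star}_{\infty} = \epsilon\,\mathrm{sign}(g), \qquad
\delta^{\star}_{1} = \epsilon\,\mathrm{sign}(g_i)\, e_i, \quad i = \arg\max_j |g_j|

so the pure L1 steepest ascent step puts the whole budget on a single coordinate, which is why top-k variants are used to spread the update over several pixels.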
Gradient Regularization 12

We impose structural constraints on how the gradient of the loss with respect to the input features x behaves.

For L1-bounded attacks → the support norm is L-inf (the p-q dual norm).

Regularizer: penalize the input gradient at a random point sampled from the L1 ball.

We can accelerate convergence by taking the top-k directions.

Regularizer (top-k variant): see the sketch below.
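A minimal PyTorch sketch of this kind of input-gradient regularizer (assuming the penalty is the L-inf norm, the dual of L1, of the gradient taken at a random point near x inside the L1 ball; the function name, sampling scheme, and weighting are assumptions, since the exact formula is not shown on the slide):

import torch
import torch.nn.functional as F

def loss_with_grad_reg(model, x, y, eps=10.0, lam=0.1):
    # Sample a random point x_tilde inside the L1 ball of radius eps around x (crude sampling).
    direction = torch.randn_like(x)
    dims = tuple(range(1, x.dim()))
    direction = direction / direction.abs().sum(dim=dims, keepdim=True).clamp(min=1e-12)
    radius = eps * torch.rand(x.size(0), *([1] * (x.dim() - 1)), device=x.device)
    x_tilde = (x + radius * direction).clamp(0.0, 1.0).requires_grad_(True)

    # Input gradient at x_tilde, kept in the graph so the penalty is trainable (double backprop).
    grad, = torch.autograd.grad(F.cross_entropy(model(x_tilde), y), x_tilde, create_graph=True)
    penalty = grad.view(grad.size(0), -1).abs().max(dim=1).values.mean()  # Linf norm (dual of L1)

    return F.cross_entropy(model(x), y) + lam * penalty

During adversarial or standard training this loss would simply replace the plain cross-entropy term.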
Initial Accuracy Results 13

PGD L1:                   90.9%   81.8%   34.9%
PGD L1 (regularization):  91.8%   82.7%   35.9%
APGD L1:                  90.4%   77.3%   43.1%
APGD L1 (regularization): 88.3%   74.6%   46.3%
Loss surface plots 14

[Figure: loss surface plots for PGD-L1 (no regularization), APGD-L1 (no regularization), PGD-L1 (with regularization), and APGD-L1 (with regularization)]
What has been done until now… 15

§ Explored different sparse L1-norm bounded attacks

§ Adversarially trained different networks with different adversarial attacks

§ Introduced gradient regularization into adversarially trained networks

§ Studied how different adversarial attacks affect the loss landscape of neural networks
Future work 16

§ Continue to look into regularization as a means to improve robust accuracy

§ Continue to investigate whether PGA-based methods can be used to find L1 and L0 attacks with simpler solutions

§ Explore whether there are other bounded attack methods that can find L1 and L0 perturbations in an efficient way
Thank you!
