
Towards Neural Networks Robust Against Sparse Attacks

Francisco Ferrari

April 2022
Objectives 2

• Verify if projected gradient ascent (PGA) attacks can be used in adversarial training to obtain competitive robust models

• Optimize PGA-based methods to find L1 and L0 attacks with lower computational cost or with simpler solutions

• Explore whether there are other bounded attack methods that can find L1 and L0 perturbations in an efficient way
Adversarial Examples 3

Adversarial Example: small perturbations to the input data that can cause large changes in the model output.

[Figure: an image classified as "Panda" with 89% confidence is classified as "cow" with 97% confidence after the perturbation.]
Adversarial examples 4

[Figure: "Panda" image passed through a neural network]

§ NNs have a lot of parameters, act as black boxes & are highly non-linear

§ NNs work in a different way than humans → they are less robust to input changes than humans are

Neural Network: Grant Sanderson (www.3blue1brown.com)


Adversarial examples 5

[Figure: the perturbed input x + Δx is an adversarial example — the same network now predicts "Cow".]

Neural Network: Grant Sanderson (www.3blue1brown.com)


Finding Adversarial Examples 7

Projected Gradient Ascent: iteratively update the perturbation δ to find a local maximum of the loss.

• Coordinate ascent → slow convergence
• Improve by taking the top-k directions → however, choosing k is tricky
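A minimal PyTorch sketch of the top-k idea above (assuming a classifier `model`, cross-entropy loss, an L1 budget `eps`, and inputs in [0, 1]; the function name, step size, and the crude L1 projection are illustrative assumptions, not the exact attack from these slides):

import torch
import torch.nn.functional as F

def topk_pga_l1(model, x, y, eps=10.0, step=1.0, k=10, iters=50):
    # Projected gradient ascent that only updates the k coordinates with the
    # largest gradient magnitude, then (roughly) projects back onto the L1 ball
    # and the valid pixel range [0, 1].
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        flat = grad.view(grad.size(0), -1)
        topk_idx = flat.abs().topk(k, dim=1).indices
        mask = torch.zeros_like(flat).scatter_(1, topk_idx, 1.0)
        d = delta.view(flat.shape) + step * flat.sign() * mask
        # crude projection: rescale onto the L1 ball of radius eps
        l1 = d.abs().sum(dim=1, keepdim=True).clamp(min=1e-12)
        d = d * (eps / l1).clamp(max=1.0)
        delta = d.view_as(x).detach()
        delta = (x + delta).clamp(0.0, 1.0) - x   # keep x + delta in the valid range
        delta.requires_grad_(True)
    return (x + delta).detach()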
Finding Adversarial Examples 8

Automatic Projected Gradient Ascent (APGD):
Works in a similar fashion to PGD, but the step size and the sparsity (for L1) are automatically adjusted at each step depending on the evolution of the loss over previous iterations.

• APGD is stronger than PGD
• Better projection onto the intersection of the L1 ball and the box constraints
• Automatic step-size adjustment → hyper-parameter free
• Possibility of using multi-epsilon to explore different values of the adversarial budget

Mind the box: l1-APGD for sparse adversarial attacks on image classifiers (Croce and Hein)
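A simplified sketch of the kind of loss-driven step-size schedule referred to above (a plain-Python paraphrase; the actual APGD rule in Croce and Hein also uses checkpoints, momentum, and a sparsity schedule for the L1 version):

def adjust_step(step, loss_history, window=5, min_fraction=0.75):
    # Halve the step size when too few of the recent iterations increased the loss.
    # loss_history holds one loss value per attack iteration (ascent: higher is better).
    if len(loss_history) <= window:
        return step
    recent = loss_history[-(window + 1):]
    improved = sum(1 for prev, cur in zip(recent, recent[1:]) if cur > prev)
    if improved / window < min_fraction:
        step = step / 2.0
    return step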
The constraint problem 9

When computing adversarial perturbations we have to respect the images' dynamic range (e.g. [0, 255]).

For L-inf and L2 perturbations almost all pixels are distorted with a small magnitude, so usually the constraints are not enforced, or, if values exceed the dynamic range, clipping them has a minor impact.

For sparse perturbations (L0 & L1) the result is a few distorted pixels of high magnitude, and clipping values after the adversarial attack has an impact on the success of the attack.

Several methods take these limitations into consideration:

• SparseFool (Modas et al.): linearize the constraint by approximating the decision hyperplane
• APGD-L1 (Croce and Hein): closed-form solution under the assumption that the perturbation is sufficiently small
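A toy NumPy illustration of this point (the numbers are made up): a sparse perturbation that pushes a few bright pixels far outside the valid range loses most of its magnitude when clipped back to [0, 1], while a small dense L-inf-style perturbation is almost unaffected.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.7, 1.0, size=100)                    # a patch of bright pixels in [0, 1]

sparse = np.zeros_like(x)                              # L0-style: 3 pixels pushed by a large amount
sparse[:3] = 2.0
clipped_sparse = np.clip(x + sparse, 0.0, 1.0) - x
print(np.abs(sparse).sum(), np.abs(clipped_sparse).sum())   # most of the budget is lost

dense = np.full_like(x, 0.01)                          # Linf-style: every pixel pushed slightly
clipped_dense = np.clip(x + dense, 0.0, 1.0) - x
print(np.abs(dense).sum(), np.abs(clipped_dense).sum())     # almost unchanged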
Adversarial Training 10

Min-max problem, with a PGA adversary for the inner maximization.

Optimization problem: non-convex, non-concave min-max problem → the exact solution is NP-hard.

Adversarial training can only rely on approximate methods to solve the inner maximization problem.

• L2 & L-inf → the approximation is good
• L1 → probably not the case
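For reference, this is the standard min-max formulation of adversarial training (written here in the usual notation, with the L1 threat model of this talk; the slide's own equation is not reproduced):

\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[ \max_{\|\delta\|_1 \le \epsilon,\; x+\delta \in [0,1]^d}
\ell\big(f_\theta(x+\delta),\, y\big) \Big]

The PGA adversary approximates the inner maximization over δ at every training step.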
Sparse Attacks 11

• We want to modify the smallest number of pixels needed to change the decision
• Due to the convexity of the L-inf and L2 norms, projected gradient descent (PGD) can typically obtain consistent and satisfying results in these cases
• However, the performance of PGD degrades significantly in the L1 and L0 cases, which leaves much room for improvement

PGA: perturbations are updated in the steepest ascent direction of the loss, which depends on the norm used to bound the attack (see the worked example below).
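As a worked example (a standard result, not specific to this deck): maximizing the linearized loss g^T δ, with g the input gradient, under a norm budget ε gives

\delta^{\star}_{\infty} = \epsilon\,\mathrm{sign}(g), \qquad
\delta^{\star}_{1} = \epsilon\,\mathrm{sign}(g_i)\, e_i, \quad i = \arg\max_j |g_j|

so the pure L1 steepest ascent step puts the whole budget on a single coordinate, which is why top-k variants are used to spread the update over several pixels.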
Gradient Regularization 12

We impose structural constraints on how the gradient of the loss with respect to the input features x behaves.

For L1-bounded attacks → the support norm is L-inf (the p-q dual norm).

Regularizer: penalize the input gradient at a random point sampled from the L1 ball.

We can accelerate convergence by taking the top-k directions.

Regularizer (top-k variant): see the sketch below.
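A minimal PyTorch sketch of this kind of input-gradient regularizer (assuming the penalty is the L-inf norm, the dual of L1, of the gradient taken at a random point near x inside the L1 ball; the function name, sampling scheme, and weighting are assumptions, since the exact formula is not shown on the slide):

import torch
import torch.nn.functional as F

def loss_with_grad_reg(model, x, y, eps=10.0, lam=0.1):
    # Sample a random point x_tilde inside the L1 ball of radius eps around x (crude sampling).
    direction = torch.randn_like(x)
    dims = tuple(range(1, x.dim()))
    direction = direction / direction.abs().sum(dim=dims, keepdim=True).clamp(min=1e-12)
    radius = eps * torch.rand(x.size(0), *([1] * (x.dim() - 1)), device=x.device)
    x_tilde = (x + radius * direction).clamp(0.0, 1.0).requires_grad_(True)

    # Input gradient at x_tilde, kept in the graph so the penalty is trainable (double backprop).
    grad, = torch.autograd.grad(F.cross_entropy(model(x_tilde), y), x_tilde, create_graph=True)
    penalty = grad.view(grad.size(0), -1).abs().max(dim=1).values.mean()  # Linf norm (dual of L1)

    return F.cross_entropy(model(x), y) + lam * penalty

During adversarial or standard training this loss would simply replace the plain cross-entropy term.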
Initial Accuracy Results 13

PGD L1:                   90.9%   81.8%   34.9%
PGD L1 (regularization):  91.8%   82.7%   35.9%
APGD L1:                  90.4%   77.3%   43.1%
APGD L1 (regularization): 88.3%   74.6%   46.3%
Loss surface plots 14

[Figure: loss surface plots for PGD-L1 (no regularization), APGD-L1 (no regularization), PGD-L1 (with regularization), and APGD-L1 (with regularization)]
What has been done until now… 15

§ Explored different sparse L1-norm bounded attacks

§ Adversarially trained different networks with different adversarial attacks

§ Introduced gradient regularization into adversarially trained networks

§ Studied how different adversarial attacks affect the loss landscape of neural networks
Future work 16

§ Continue to look into regularization as a means to improve robust accuracy

§ Continue to investigate whether PGA-based methods can be used to find L1 and L0 attacks with simpler solutions

§ Explore whether there are other bounded attack methods that can find L1 and L0 perturbations in an efficient way
Thank you!
