
Secure Machine Learning with Neural Networks

Mane Pooja (M190442EC)

Guided by Dr. Deepthi P. P.

October 27, 2020

National Institute of Technology, Calicut


M.Tech, Signal Processing 2019-21
Overview

1 Introduction

2 Literature Survey

3 Problem Definition

4 Work Done

5 Work Schedule

6 References

Introduction:

Adversarial Example / Adversarial Attack:

A slightly shifted or modified version of an original image from the dataset.

Inputs formed by applying small perturbations to examples from the dataset, such that the perturbed input causes the model to output an incorrect answer with high confidence.

Figure: Adversarial examples

Motivation

Machine learning models can misclassify an adversarial example even though it is only slightly different from the original example.

Adversarial examples are not specific to a particular type of NN architecture.

The same adversarial example can be misclassified by different NN architectures trained on the same dataset.

In practice, this means that ML models are not learning the true underlying properties of the data.

Cause of Adversarial Attacks

Non-linearity of Neural Networks.

Insufficient Regularisation.

Insufficient Model Averaging.

Literature Survey:

A method called the Fast Gradient Sign Method (FGSM) for generating adversarial examples is presented in [1]. The model misclassifies these adversarial examples with high confidence.

It also shows that adversarial training can be used as a regularisation technique.

The authors hypothesize that NNs learn approximately the same weights when trained on different subsets of the training data.

This stability of the learned weights results in consistent misclassification of adversarial examples.
[1] I. J. Goodfellow, J. Shlens, and C. Szegedy. (2015). "Explaining and harnessing adversarial examples." [Online]. Available: https://arxiv.org/abs/1412.6572
Literature Survey:

To detect and defend against adversarial examples, the authors present an unsupervised learning approach, I-Defender [2], and compare it with Defense-GAN (Generative Adversarial Network).

I-Defender uses the intrinsic hidden state distributions (IHSDs) of a classifier to reject adversarial inputs, because such inputs tend to produce hidden states lying in the low-density regions of the IHSD.

It is able to robustly defend against a variety of black-box and gray-box attacks.

[2] Zhihao Zheng and Pengyu Hong. (2018). "Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks."
Literature Survey:

Medical images have higher vulnerability to adversarial attack due to complex biological textures and high-gradient regions.

Less perturbation is required to generate a successful attack on medical images.

KD- and Dfeat-based detectors [3] achieve an AUC above 99 percent against all attacks across three datasets.

Adversarial attacks on medical images can be detected easily, since they produce perturbations in widespread regions outside the lesion area.
[3] Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. (2020). "Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems."
Adversarial Attack Methods:

Adversarial examples can be generated using the following methods (the single-step vs. iterative update rules are sketched after this list):

Fast Gradient Sign Method (FGSM)

Basic Iterative Method (BIM)

Projected Gradient Descent (PGD)

Carlini and Wagner Attack (CW)
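FGSM is a single-step attack, while BIM and PGD apply the same signed-gradient step iteratively; the single-step FGSM rule itself is given on the FGSM slides later. As a sketch, using the notation of the FGSM slide (x: input, y: label, theta: model parameters, J: loss, epsilon: perturbation bound), the standard iterative update from the literature is

    x^{(0)} = x, \qquad x^{(t+1)} = \mathrm{clip}_{x,\epsilon}\left( x^{(t)} + \alpha \cdot \mathrm{sign}\left( \nabla_x J(\theta, x^{(t)}, y) \right) \right)

where alpha is the per-iteration step size and clip_{x,epsilon} keeps the iterate within an epsilon-ball around the original image x; PGD additionally starts from a random point inside that ball. The CW attack instead solves an optimisation problem that searches for the smallest perturbation causing misclassification.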

Resisting Adversarial Attacks:
Methods to defend against adversarial attacks:
Adversarial Training as a Regularizer (objective sketched after this list)

Defense-GAN (Generative Adversarial Network)

I-Defender

Kernel Density (KD)

Local Intrinsic Dimensionality (LID)

Qfeat - Quantized deep features

Dfeat - Deep features
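For the first defense in this list, adversarial training, Goodfellow et al. [1] regularise the training objective by mixing the loss on clean and FGSM-perturbed inputs (a sketch in their notation):

    \tilde{J}(\theta, x, y) = \alpha\, J(\theta, x, y) + (1 - \alpha)\, J\left(\theta,\; x + \epsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y)),\; y\right)

with a mixing weight such as alpha = 0.5, so the network is continually trained on adversarial examples generated from its own current parameters.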


Problem Definition and Objective:

Problem Statement: Neural network-based machine learning is prone to adversarial attacks. This work attempts to formulate methods for secure and accurate classification using NNs in the presence of such attacks.

Major Objectives:
1. To identify various adversarial attack methods against NN-based classification.
2. To find the sensitivity of various NN models to such attacks.
3. To devise proper training methods to resist these attacks.
4. To validate the improved resistance to attacks through experiments.

Work Done

In this phase, the efficiency of FGSM in mounting successful attacks on various network models and datasets is investigated.

Initially, different network models are tried for accurate classification on each dataset, and suitable models are identified.

Then adversarial inputs are generated using the FGSM method.

For generating adversarial inputs, different loss functions are tried to identify the one most effective at fooling the network.

The effect of varying the adversarial noise through the multiplication constant epsilon is analyzed.

Fast Gradient Sign Method (FGSM)

The fast gradient sign method uses the gradients of the neural network's loss to create an adversarial example.

The goal of FGSM is to cause misclassification with a small perturbation.

Fast Gradient Sign Method (FGSM):

adv_x : adversarial image; x : original input image; y : original input label; epsilon : multiplier that keeps the perturbations small; theta : model parameters; J : loss. (The update rule combining these terms is given after this list.)

The objective is to create an image that maximises the loss.

Gradients are taken w.r.t. the input image.

The model parameters remain constant.

The only goal is to fool an already trained model.
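Combining the symbols above, the FGSM rule from Goodfellow et al. [1] is

    adv\_x = x + \epsilon \cdot \mathrm{sign}\left( \nabla_x J(\theta, x, y) \right)

i.e. the adversarial image is the original image plus a small step in the direction of the sign of the loss gradient, which approximately maximises the loss for a perturbation of size epsilon.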


Fooling a trained Model

Loss functions: mean squared error and categorical cross-entropy.

Consider an image from the training data.

Predict the label to which the image belongs.

Calculate the gradients of the loss with respect to the input image.

Extract the sign of the gradients calculated above.

Multiply the sign by epsilon and add it to the original image (a code sketch follows below).
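A minimal TensorFlow sketch of these steps is given below. It assumes an already trained Keras classifier `model`, a normalised input batch `image`, and a one-hot `label`; the names and the value of epsilon are illustrative, not the exact code used in this work.

    import tensorflow as tf

    loss_object = tf.keras.losses.CategoricalCrossentropy()

    def create_adversarial_pattern(model, image, label):
        """Return sign(grad_x J(theta, x, y)) for the given batch."""
        image = tf.convert_to_tensor(image)
        with tf.GradientTape() as tape:
            tape.watch(image)                     # gradients w.r.t. the input, not the weights
            prediction = model(image)
            loss = loss_object(label, prediction)
        gradient = tape.gradient(loss, image)     # model parameters stay fixed
        return tf.sign(gradient)                  # keep only the sign of the gradient

    # Scale the signed gradient by epsilon and add it to the original image.
    epsilon = 0.05                                # illustrative value; varied in the experiments
    perturbation = create_adversarial_pattern(model, image, label)
    adv_image = tf.clip_by_value(image + epsilon * perturbation, 0.0, 1.0)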

CNN model trained on the CIFAR10 dataset
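The model summary shown on this slide is not reproduced in this text version. As an assumption for illustration only (the exact architecture used in this work is not given here), a small Keras CNN for CIFAR10 could look like:

    import tensorflow as tf
    from tensorflow.keras import layers

    cnn_model = tf.keras.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),        # 10 CIFAR10 classes
    ])
    cnn_model.compile(optimizer="adam",
                      loss="categorical_crossentropy",  # one of the two losses compared later
                      metrics=["accuracy"])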

CNN model trained on the CIFAR10 dataset

Epochs = 25; Training loss: 0.0049; Training accuracy: 0.6338; Validation loss: 0.0054; Validation accuracy: 0.5910

Figure: accuracy vs epochs and loss vs epochs

FGSM - CIFAR10

Adversarial examples generated using FGSM
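One way to quantify the effect visible in these examples is to compare clean and adversarial accuracy on a small test batch. The sketch below assumes the `create_adversarial_pattern` helper from the earlier slide and a CNN compiled with an accuracy metric; names and epsilon are illustrative.

    import tensorflow as tf

    (_, _), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_test = x_test[:256].astype("float32") / 255.0           # small batch keeps gradient computation cheap
    y_onehot = tf.keras.utils.to_categorical(y_test[:256], 10)

    epsilon = 0.05
    perturbation = create_adversarial_pattern(cnn_model, x_test, y_onehot)
    x_adv = tf.clip_by_value(tf.convert_to_tensor(x_test) + epsilon * perturbation, 0.0, 1.0)

    clean_acc = cnn_model.evaluate(x_test, y_onehot, verbose=0)[1]
    adv_acc = cnn_model.evaluate(x_adv, y_onehot, verbose=0)[1]
    print(f"clean accuracy: {clean_acc:.4f}  adversarial accuracy: {adv_acc:.4f}")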

CNN model trained on the MNIST dataset

CNN model trained on the MNIST dataset

Epochs = 25; Training loss: 0.0027; Training accuracy: 0.9836; Validation loss: 0.0032; Validation accuracy: 0.9812

Figure: accuracy vs epochs and loss vs epochs

FGSM - MNIST

Adversarial examples generated using FGSM

DNN model trained on the FMNIST dataset
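As with the CNN slides, the exact fully connected architecture is not reproduced here; a minimal dense (DNN) baseline for FMNIST, given purely as an illustrative assumption, could be:

    import tensorflow as tf
    from tensorflow.keras import layers

    dnn_model = tf.keras.Sequential([
        layers.Flatten(input_shape=(28, 28)),      # flatten the 28x28 grayscale images
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),    # 10 clothing classes
    ])
    dnn_model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])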

DNN model trained on the FMNIST dataset

Epochs = 25; Training loss: 0.3879; Training accuracy: 0.8618; Validation loss: 0.3983; Validation accuracy: 0.8666

Figure: accuracy vs epochs and loss vs epochs

FGSM - FMNIST

Adversarial examples generated using FGSM

DNN model trained on the MNIST dataset

DNN model trained on the MNIST dataset

Epochs = 25; Training loss: 0.3879; Training accuracy: 0.8618; Validation loss: 0.3983; Validation accuracy: 0.8666

Figure: accuracy vs epochs and loss vs epochs

FGSM - MNIST

Adversarial examples generated using FGSM

Conclusions based on above Results:

FGSM is effective in generating adversarial examples for various datasets.

Categorical cross-entropy is better suited as a loss function than mean squared error.

Less perturbation is required to generate adversarial examples for a complex dataset like CIFAR10.

Work Schedule

Generate adversarial examples using other methods (e.g., the Carlini-Wagner (CW) attack).

Analyze the vulnerabilities of different network models to different attacks.

Develop possible defense mechanisms that train different network models with adversarial perturbations to improve robustness.

Develop algorithms for secure and accurate classification with neural networks for different applications.

References

I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Zhihao Zheng and Pengyu Hong. Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks. 2018.

Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems. 2020.

Preserving Privacy in Convolutional Neural Network: An -tuple Differential Privacy Approach, 2019.

The End

