
Secure Machine Learning with Neural Networks

Mane Pooja (M190442EC)

Guided by Dr. Deepthi P. P.

October 27, 2020

National Institute of Technology, Calicut


M.Tech, Signal Processing 2019-21
Overview

1 Introduction

2 Literature Survey

3 Problem Definition

4 Work Done

5 Work Schedule

6 References

Introduction:

Adversarial Example / Adversarial Attack:

A slightly shifted or modified version of an original image from the dataset.

Inputs formed by applying small perturbations to examples from the dataset, such that the perturbed input causes the model to output an incorrect answer with high confidence.

Figure: Adversarial examples

Motivation

Machine learning models can misclassify an adversarial example even though it is only slightly different from the original example.

Adversarial examples are not specific to a particular type of NN architecture.

The same adversarial example can be misclassified by different NN architectures trained on the same dataset.

In practice, this means that ML models are not learning the true underlying properties of the data.

Cause of Adversarial Attacks

Non-linearity of Neural Networks.

Insufficient Regularisation.

Insufficient Model Averaging.

Literature Survey:

A method called the Fast Gradient Sign Method (FGSM) for generating adversarial examples is presented in [1]. The model misclassifies these adversarial examples with high confidence.

It also shows that adversarial training can be used as a regularisation technique.

The authors hypothesize that NNs learn approximately the same weights when trained on different subsets of the training data.

This stability of the learned weights results in consistent misclassification of adversarial examples.
[1] I. J. Goodfellow, J. Shlens, and C. Szegedy. (2015). "Explaining and harnessing adversarial examples." [Online]. Available: https://arxiv.org/abs/1412.6572
Literature Survey:

To detect and defend against adversarial examples, the authors present an unsupervised learning approach, I-Defender [2], and compare it with Defense-GAN (Generative Adversarial Network).

I-Defender uses the intrinsic hidden state distributions (IHSDs) of a classifier to reject adversarial inputs, because such inputs tend to produce hidden states lying in the low-density regions of the IHSD.

It is able to robustly defend against a variety of black-box and gray-box attacks.

[2] Zhihao Zheng and Pengyu Hong. (2018). "Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks."
Literature Survey:

Medical images have higher vulnerability to adversarial attack due to complex biological textures and high-gradient regions.

Less perturbation is required to generate a successful attack on medical images.

KD- and Dfeat-based detectors [3] achieve an AUC above 99 percent against all attacks across three datasets.

Adversarial attacks on medical images can be detected easily, since they produce perturbations in widespread regions outside the lesion area.
[3] Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. (2020). "Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems."
Adversarial Attack Methods:

Adversarial examples can be generated using the following methods (the single-step vs. iterative update rules are sketched after this list):

Fast Gradient Sign Method (FGSM)

Basic Iterative Method (BIM)

Projected Gradient Descent (PGD)

Carlini and Wagner Attack (CW)
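FGSM is a single-step attack, while BIM and PGD apply the same signed-gradient step iteratively; the single-step FGSM rule itself is given on the FGSM slides later. As a sketch, using the notation of the FGSM slide (x: input, y: label, theta: model parameters, J: loss, epsilon: perturbation bound), the standard iterative update from the literature is

    x^{(0)} = x, \qquad x^{(t+1)} = \mathrm{clip}_{x,\epsilon}\left( x^{(t)} + \alpha \cdot \mathrm{sign}\left( \nabla_x J(\theta, x^{(t)}, y) \right) \right)

where alpha is the per-iteration step size and clip_{x,epsilon} keeps the iterate within an epsilon-ball around the original image x; PGD additionally starts from a random point inside that ball. The CW attack instead solves an optimisation problem that searches for the smallest perturbation causing misclassification.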

Resisting Adversarial Attacks:
Methods to defend against adversarial attacks:
Adversarial Training as a Regularizer (objective sketched after this list)

Defense-GAN (Generative Adversarial Network)

I-Defender

Kernel Density (KD)

Local Intrinsic Dimensionality (LID)

Qfeat - Quantized deep features

Dfeat - Deep features
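For the first defense in this list, adversarial training, Goodfellow et al. [1] regularise the training objective by mixing the loss on clean and FGSM-perturbed inputs (a sketch in their notation):

    \tilde{J}(\theta, x, y) = \alpha\, J(\theta, x, y) + (1 - \alpha)\, J\left(\theta,\; x + \epsilon\,\mathrm{sign}(\nabla_x J(\theta, x, y)),\; y\right)

with a mixing weight such as alpha = 0.5, so the network is continually trained on adversarial examples generated from its own current parameters.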


Problem Definition and Objective:

Problem Statement: Neural network-based machine learning is prone to adversarial attacks. This work attempts to formulate methods for secure and accurate classification using NNs in the presence of such attacks.

Major Objectives:
1. To identify various adversarial attack methods against NN-based classification.
2. To find the sensitivity of various NN models to such attacks.
3. To devise proper training methods to resist these attacks.
4. To validate the improved resistance to attacks through experiments.

Work Done

In this phase, the efficiency of FGSM in mounting successful attacks on various network models and datasets is investigated.

Initially, different network models are tried for accurate classification on each dataset, and suitable models are identified.

Then adversarial inputs are generated using the FGSM method.

For generating adversarial inputs, different loss functions are tried to identify the one most effective at fooling the network.

The effect of varying the adversarial noise through the multiplication constant epsilon is analyzed.

Fast Gradient Sign Method (FGSM)

The fast gradient sign method uses the gradients of the neural network's loss to create an adversarial example.

The goal of FGSM is to cause misclassification with a small perturbation.

Fast Gradient Sign Method (FGSM):

adv_x : adversarial image; x : original input image; y : original input label; epsilon : multiplier that keeps the perturbations small; theta : model parameters; J : loss. (The update rule combining these terms is given after this list.)

The objective is to create an image that maximises the loss.

Gradients are taken w.r.t. the input image.

The model parameters remain constant.

The only goal is to fool an already trained model.
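Combining the symbols above, the FGSM rule from Goodfellow et al. [1] is

    adv\_x = x + \epsilon \cdot \mathrm{sign}\left( \nabla_x J(\theta, x, y) \right)

i.e. the adversarial image is the original image plus a small step in the direction of the sign of the loss gradient, which approximately maximises the loss for a perturbation of size epsilon.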


Fooling a trained Model

Loss functions: mean squared error and categorical cross-entropy.

Consider an image from the training data.

Predict the label to which the image belongs.

Calculate the gradients of the loss with respect to the input image.

Extract the sign of the gradients calculated above.

Multiply the sign by epsilon and add it to the original image (a code sketch follows below).
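A minimal TensorFlow sketch of these steps is given below. It assumes an already trained Keras classifier `model`, a normalised input batch `image`, and a one-hot `label`; the names and the value of epsilon are illustrative, not the exact code used in this work.

    import tensorflow as tf

    loss_object = tf.keras.losses.CategoricalCrossentropy()

    def create_adversarial_pattern(model, image, label):
        """Return sign(grad_x J(theta, x, y)) for the given batch."""
        image = tf.convert_to_tensor(image)
        with tf.GradientTape() as tape:
            tape.watch(image)                     # gradients w.r.t. the input, not the weights
            prediction = model(image)
            loss = loss_object(label, prediction)
        gradient = tape.gradient(loss, image)     # model parameters stay fixed
        return tf.sign(gradient)                  # keep only the sign of the gradient

    # Scale the signed gradient by epsilon and add it to the original image.
    epsilon = 0.05                                # illustrative value; varied in the experiments
    perturbation = create_adversarial_pattern(model, image, label)
    adv_image = tf.clip_by_value(image + epsilon * perturbation, 0.0, 1.0)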

CNN model trained on the CIFAR10 dataset
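The model summary shown on this slide is not reproduced in this text version. As an assumption for illustration only (the exact architecture used in this work is not given here), a small Keras CNN for CIFAR10 could look like:

    import tensorflow as tf
    from tensorflow.keras import layers

    cnn_model = tf.keras.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),        # 10 CIFAR10 classes
    ])
    cnn_model.compile(optimizer="adam",
                      loss="categorical_crossentropy",  # one of the two losses compared later
                      metrics=["accuracy"])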

CNN model trained on the CIFAR10 dataset

Epochs = 25; Training loss: 0.0049; Training accuracy: 0.6338; Validation loss: 0.0054; Validation accuracy: 0.5910

Figure: accuracy vs epochs and loss vs epochs

FGSM - CIFAR10

Adversarial examples generated using FGSM
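One way to quantify the effect visible in these examples is to compare clean and adversarial accuracy on a small test batch. The sketch below assumes the `create_adversarial_pattern` helper from the earlier slide and a CNN compiled with an accuracy metric; names and epsilon are illustrative.

    import tensorflow as tf

    (_, _), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_test = x_test[:256].astype("float32") / 255.0           # small batch keeps gradient computation cheap
    y_onehot = tf.keras.utils.to_categorical(y_test[:256], 10)

    epsilon = 0.05
    perturbation = create_adversarial_pattern(cnn_model, x_test, y_onehot)
    x_adv = tf.clip_by_value(tf.convert_to_tensor(x_test) + epsilon * perturbation, 0.0, 1.0)

    clean_acc = cnn_model.evaluate(x_test, y_onehot, verbose=0)[1]
    adv_acc = cnn_model.evaluate(x_adv, y_onehot, verbose=0)[1]
    print(f"clean accuracy: {clean_acc:.4f}  adversarial accuracy: {adv_acc:.4f}")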

CNN model trained on the MNIST dataset

CNN model trained on the MNIST dataset

Epochs = 25; Training loss: 0.0027; Training accuracy: 0.9836; Validation loss: 0.0032; Validation accuracy: 0.9812

Figure: accuracy vs epochs and loss vs epochs

FGSM - MNIST

Adversarial examples generated using FGSM

DNN model trained on the FMNIST dataset
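As with the CNN slides, the exact fully connected architecture is not reproduced here; a minimal dense (DNN) baseline for FMNIST, given purely as an illustrative assumption, could be:

    import tensorflow as tf
    from tensorflow.keras import layers

    dnn_model = tf.keras.Sequential([
        layers.Flatten(input_shape=(28, 28)),      # flatten the 28x28 grayscale images
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),    # 10 clothing classes
    ])
    dnn_model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])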

DNN model trained on the FMNIST dataset

Epochs = 25; Training loss: 0.3879; Training accuracy: 0.8618; Validation loss: 0.3983; Validation accuracy: 0.8666

Figure: accuracy vs epochs and loss vs epochs

FGSM - FMNIST

Adversarial examples generated using FGSM

DNN model trained on the MNIST dataset

DNN model trained on the MNIST dataset

Epochs = 25; Training loss: 0.3879; Training accuracy: 0.8618; Validation loss: 0.3983; Validation accuracy: 0.8666

Figure: accuracy vs epochs and loss vs epochs

FGSM - MNIST

Adversarial examples generated using FGSM

Conclusions based on above Results:

FGSM is effective in generating adversarial examples for various datasets.

Categorical cross-entropy is better suited as a loss function than mean squared error.

Less perturbation is required to generate adversarial examples for a complex dataset like CIFAR10.

Work Schedule

Generate adversarial examples using other methods (e.g., the Carlini-Wagner (CW) attack).

Analyze the vulnerabilities of different network models to different attacks.

Develop possible defense mechanisms that train different network models with adversarial perturbations to improve robustness.

Develop algorithms for secure and accurate classification with neural networks for different applications.

References

I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Zhihao Zheng and Pengyu Hong. Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks. 2018.

Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems. 2020.

Preserving Privacy in Convolutional Neural Network: An -tuple Differential Privacy Approach, 2019.

The End

