
Building a Machine Learning System Resilient to Adversarial Attacks
Machine learning offers tremendous promise but is vulnerable to adversarial attacks. In this
presentation, we will explore how to create a resilient machine learning system.

Project Guide: Dr. Manisha Parlewar

Group Members

Arun Raj K V (B180590EC)

Akthar Azif (B180739EC)

Shamil Shihab (B190775EC)

Rithul Sabi Kumar (B191090EC)


Overview of Adversarial Attacks
1 Definition

Adversarial attacks modify data inputs to produce incorrect results from machine learning models.

2 Examples

Examples of adversarial attack techniques include FGSM, iFGSM, and MI-FGSM.

3 Impact

Adversarial attacks can result in catastrophic consequences, from fraudulent financial transactions to fatal accidents.
Understanding FGSM

The Fast Gradient Sign Method (FGSM) is a popular method for generating adversarial
examples in machine learning. It works by adding small perturbations to input data based on
the gradient of the loss function. These perturbations can cause a machine learning model to
misclassify the input, even if it appears unchanged to a human observer. Understanding FGSM
is critical to developing defenses against adversarial attacks.
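As a concrete illustration, the single-step FGSM update can be sketched in a few lines of PyTorch. This is a minimal sketch, assuming a classifier trained with cross-entropy loss; the model, inputs, and epsilon value are illustrative placeholders rather than the setup used in this project.

import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft x_adv = x + epsilon * sign(grad_x loss), the single-step FGSM perturbation."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x_adv), y)
    loss.backward()
    # Step in the direction of the sign of the input gradient, then keep pixels in [0, 1].
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()

Here epsilon controls the size of the perturbation; small values keep the adversarial input visually indistinguishable from the original while still flipping the prediction.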
Understanding iFGSM

The Iterative Fast Gradient Sign Method (iFGSM) is an extension of FGSM that generates
multiple perturbations to input data. It is more effective than FGSM and can produce stronger
adversarial examples. Understanding iFGSM is important for developing robust defenses
against adversarial attacks.
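A minimal iFGSM sketch follows, assuming the same PyTorch setup as the FGSM example above; the step size alpha and the number of iterations are illustrative choices.

import torch
import torch.nn as nn

def ifgsm_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=10):
    """Apply small FGSM steps repeatedly, projecting back into the epsilon-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.CrossEntropyLoss()(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # Keep the perturbation within epsilon of the original input and in [0, 1].
            x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon).clamp(0, 1)
    return x_adv

Because each step re-computes the gradient at the current adversarial point, iFGSM usually finds stronger perturbations than a single FGSM step of the same total size.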
Understanding MI-FGSM

The Momentum Iterative Fast Gradient Sign Method (MI-FGSM) is a state-of-the-art method for
generating adversarial examples in machine learning. It uses a momentum term to
accumulate gradients across iterations, which can lead to even stronger attacks.
Understanding MI-FGSM is crucial for developing effective defenses against adversarial
attacks.
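A minimal MI-FGSM sketch extends the iterative loop above with a momentum accumulator; the decay factor mu and the other hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn

def mifgsm_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=10, mu=1.0):
    """Accumulate normalised gradients with momentum and step by the sign of the accumulator."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # momentum accumulator
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.CrossEntropyLoss()(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            grad = x_adv.grad
            # Normalise by the mean absolute value so the momentum term is scale-invariant.
            g = mu * g + grad / grad.abs().mean()
            x_adv = x_adv + alpha * g.sign()
            x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon).clamp(0, 1)
    return x_adv

The momentum term smooths the update direction across iterations, which helps the attack avoid poor local maxima and tends to make the adversarial examples transfer better to other models.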
Approaches to Designing a Resilient Machine Learning System

Defensive Distillation

It involves training the model on a distilled version of the original training data, which is generated by training another model on the original data. This second model is trained to predict the output of the first model, rather than the original labels. The distilled data is then used to train the target model.

Adversarial Training

Adversarial training modifies machine learning algorithms to teach them how to recognize and counteract adversarial attacks. It involves adding modified inputs (adversarial examples) to the training data so that the model learns to recognize and defend against such attacks (a minimal sketch of one such training step is given after this list).

Gradient Masking

Gradient masking is a technique that obscures gradients, making it more challenging for attackers to manipulate them to cause damage.

Ensemble Techniques

Ensemble techniques use the output of multiple machine learning systems to improve the overall result, making it more challenging for attackers to find weaknesses.
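The adversarial-training idea above can be sketched as a single training step that mixes clean and FGSM-perturbed inputs, reusing the illustrative fgsm_attack helper from the FGSM section; the 50/50 clean-to-adversarial mix is an assumption, not the project's actual recipe.

import torch
import torch.nn as nn

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One optimisation step on a batch augmented with FGSM adversarial examples."""
    x_adv = fgsm_attack(model, x, y, epsilon)   # craft adversarial versions of the batch
    inputs = torch.cat([x, x_adv])              # mix clean and adversarial inputs
    targets = torch.cat([y, y])                 # labels stay the same for both halves
    optimizer.zero_grad()
    loss = nn.CrossEntropyLoss()(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()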
A failed defense: “gradient masking”
1. Perturb the input during the attack.

2. The system misidentifies the class.

3. What if there were no gradient?
Why Defensive Distillation
1 Adversarial Training (drawbacks)

Increased computational and training costs

Limited generalization

Trade-off between accuracy and robustness

2 Ensemble Techniques (drawbacks)
Increased computational and training costs

Overfitting

Difficulty in interpretation

Limited generalization

3 Defensive Distillation (advantages)
Improved robustness

Lower computational and training costs

Simpler implementation

Reduced overfitting
Training Data and Defensive Distillation
1. Train a "teacher" model on the original training data. This model is typically a large,
complex model that is trained to perform well on the training data.

2. Use the teacher model to generate a new set of training data by predicting the outputs of
the original training data. This new data set is referred to as the "distilled" data.

3. Train a "student" model on the distilled data. This model is typically a smaller, simpler
model that is easier and faster to train than the teacher model.

4. Use the student model for inference, i.e., to make predictions on new data.
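The four steps above can be sketched as a short PyTorch training loop. This is a minimal sketch assuming teacher and student classifiers and a data loader; the temperature T, optimiser, and hyperparameters are illustrative placeholders rather than the values used in this project.

import torch
import torch.nn.functional as F

def distill(teacher, student, loader, epochs=10, T=20.0, lr=1e-3):
    """Train the student on the teacher's temperature-softened outputs (steps 2-3 above)."""
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:                                       # original labels are not used
            with torch.no_grad():
                soft_targets = F.softmax(teacher(x) / T, dim=1)   # step 2: distilled labels
            log_probs = F.log_softmax(student(x) / T, dim=1)
            loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")   # step 3
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student                                                # step 4: use for inference

Training on the teacher's temperature-softened outputs, then deploying the student at temperature 1, smooths the model's decision surface and shrinks the gradients that attacks such as FGSM rely on.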
RESULTS
Conclusion and Future Directions
Challenges

1. Robustness is difficult to achieve when algorithmic decisions depend on features of adversarial distributions.

2. Designing defenses can require privacy-preserving tests that increase complexity in AI systems.

3. Introducing new risks, such as hidden algorithms exploiting the shortcomings of the defenses, will threaten to neutralize their effectiveness.

Future

1. Future research must address adversarial attacks in the medical, financial, legal, and other domains.

2. Developing novel information-theoretic defences against adversarial machine learning attacks is an important area for research.

3. Developing application-specific defenses, making security a constraint during AI application and model design, is the need of the hour.
Conclusion
Adversarial attacks pose a serious threat to machine learning systems,
with potentially catastrophic consequences in a variety of applications.
Building resilient machine learning systems that are able to withstand
these attacks is therefore of utmost importance. Defensive distillation is
a promising approach to achieving this goal, offering several
advantages over other methods such as adversarial training and
ensemble techniques. However, there is still much work to be done in
this area, and developing effective defenses against adversarial attacks
remains an active area of research. By continuing to study these attacks
and developing robust and resilient machine learning systems, we can
help ensure the safety and security of the technologies that are
increasingly shaping our world.
References
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017). Practical black-box attacks against machine learning. arXiv preprint arXiv:1602.02697v4.

Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531v1. https://arxiv.org/abs/1503.02531v1

Papernot, N., McDaniel, P., Wu, X., Jha, S., & Swami, A. (2016). Distillation as a defense to adversarial perturbations against deep neural networks. 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, pp. 582-597. doi: 10.1109/SP.2016.41.

Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016). The limitations of deep learning in adversarial settings. Proceedings of the 1st IEEE European Symposium on Security and Privacy. IEEE.
Thank You
