
Previous Lecture 1

• Cyber Threat Landscape


• Common Cyber Attacks
• Machine Learning
• AI applications in Cyber Security
• Common Security Problems
Agenda 2

• Machine Learning
▪ Why Machines Learn?
▪ Machine Learning Model
▪ Challenges & Applications
▪ Types
• Key Mathematical/ Statistical Concepts
▪ Scale of Measurements
• Python
• Datasets
What will be the Output of this Lecture? 3

1. Understand SL, UnSL and RL.


2. Demo of Weka
3. Various SL algorithms
4. When to use ML and Why?
5. ML Versus DL
6. When to use DL and Why?
7. Understanding Nature of the dataset
8. How to handle Dataset?
9. How to take sample data?
10. Confusion Matrix
What will be the Output of this Lecture? (continued) 4

11. Feature Engineering ……
Machine Learning 5

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

• Improve their performance P

• At executing some task T

• Over time with experience E
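
As a rough illustration of this definition, the sketch below treats digit classification as the task T, test accuracy as the performance measure P, and the number of training examples as the experience E. The dataset (scikit-learn's bundled digits), the 1-nearest-neighbour model, and the training sizes are assumptions chosen for illustration; they are not part of the lecture.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Task T: classify 8x8 images of handwritten digits (illustrative choice).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def accuracy_with_experience(n_examples):
    """Performance P (test accuracy) after training on n_examples of experience E."""
    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(X_train[:n_examples], y_train[:n_examples])
    return model.score(X_test, y_test)


# Experience E: progressively larger slices of the training set.
experience_levels = [20, 200, 1000]
scores = {n: accuracy_with_experience(n) for n in experience_levels}

for n, acc in scores.items():
    print(f"E = {n:4d} training examples -> P = {acc:.3f} test accuracy")
```

If the program is learning in the sense of the definition, the printed accuracy should generally rise as more training examples are used.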


Machine Learning Problem 6

A Pattern Exists

We don’t know it

We have data to learn it

• Designing algorithms that ingest data and learn a (hypothesized) model of the data

• The learned model can be used to:
▪ Detect patterns, structures, themes, trends, etc. in the data
▪ Make predictions about future data and make decisions


Why Machine Learning? 7

• ML needed for:
▪ Data-Driven Decision Making
▪ Efficiency and Scale

• When to Make Machines Learn?
▪ Lack of human expertise
▪ Dynamic scenarios
▪ Difficulty in translating expertise into a computational task
Challenges 8

• Quality of data
▪ Low-quality data causes problems in data preprocessing and feature extraction.

• Time-consuming task
▪ Data acquisition, feature extraction, and retrieval are especially time-consuming.

• No clear objective for formulating business problems
▪ Without a clear objective and a well-defined goal, it is hard to frame a business problem as an ML task.

• Overfitting & underfitting
▪ An overfitted model memorizes noise in the training data, while an underfitted model misses the underlying pattern; neither represents the problem well.

• Curse of dimensionality
▪ Data points with too many features can be a real hindrance to learning.

• Difficulty in deployment
▪ The complexity of an ML model makes it difficult to deploy in real-world settings.
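
The overfitting and underfitting challenge can be made concrete with a small sketch. It fits polynomials of increasing degree to a noisy sine wave and reports the error on the training points and on held-out validation points. The data, the degrees (1, 3, 9), and the train/validation split are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy sine wave, split into training and validation points.
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]


def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


errors = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))

for degree, (train_err, val_err) in errors.items():
    print(f"degree {degree}: train MSE = {train_err:.4f}, validation MSE = {val_err:.4f}")
```

A straight line (degree 1) is too simple for a sine wave, so it tends to underfit. A high-degree polynomial keeps lowering the training error, but a gap between training and validation error is the usual sign that it has started to fit noise.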
Applications In Security 9

• Malware Detection and Classification
• DDoS/Phishing attack detection
• Cryptography and AI
• Network Traffic Analysis
• Insider Threat Detection
• Anomaly Detection
• Botnet and Spam Detection
• Digital Forensics
Trends in ML and CYS 10

• Responding to Ransomware
• Combining Application Development and Cybersecurity
• Using Deep Learning to Detect DGA-Generated Domains
• Detecting Non-Malware Threats
• Adaptive Honeypots and Honeytokens
• Deep Reinforcement Learning
• Protecting the IoT
• Predicting the Future

DGA: Domain Generation Algorithm


Supervised Learning 12
Supervised Learning Algorithms 13

• Linear Regression

• Random Forest

• Support Vector Machine


Supervised Learning Algorithms 14

F1   F2   Label1    Label2 (%)
1    2    Play      10
1    2    No play   20
1    2    Play      23
9    10   No play   100
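
The table above can be read as two supervised problems on the same features: predicting the categorical Label1 is classification, and predicting the numeric Label2 (%) is regression. The sketch below fits one model for each using scikit-learn. The choice of a decision tree and of linear regression is an assumption for illustration, and four rows are far too few for a real model.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Features F1 and F2 from the table above.
X = [[1, 2], [1, 2], [1, 2], [9, 10]]
label1 = ["Play", "No play", "Play", "No play"]  # categorical target
label2 = [10, 20, 23, 100]                       # numeric target (%)

# Classification: learn Label1 from the features.
classifier = DecisionTreeClassifier(random_state=0).fit(X, label1)
class_predictions = classifier.predict([[1, 2], [9, 10]])
print(class_predictions)

# Regression: learn Label2 (%) from the same features.
regressor = LinearRegression().fit(X, label2)
value_predictions = regressor.predict([[1, 2], [9, 10]])
print(value_predictions)
```

Note that the first three rows share the same features but carry different labels, so no model can fit them all; the classifier falls back to the majority label for those rows and the regressor to their average.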
Supervised Learning: Examples 15
Un-Supervised Learning 16
Unsupervised Learning Algorithms 17

• K-Means Algorithm

• DBSCAN

• Apriori Algorithm

• Hierarchical Clustering Algorithm


Terminology: the columns of a dataset are called attributes, features, or dimensions.
The rows of a dataset are called objects, records, or tuples.
Un-Supervised Learning 18

Clustering

Anomaly Detection
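
A minimal sketch of both ideas, assuming scikit-learn's `KMeans` and a handful of made-up two-feature records: the points are grouped without any labels, and a new point's distance to its nearest cluster centre is then used as a simple anomaly score. The data and the distance-based score are illustrative assumptions, not a method prescribed by the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled records: two columns (features), six rows (objects).
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one dense region
    [8.0, 8.2], [8.1, 7.9], [7.8, 8.0],   # another dense region
])

# Clustering: group the records into two clusters without using any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print("cluster labels:", labels)

# Simple anomaly score: distance from a point to its nearest cluster centre.
new_points = np.array([[1.0, 0.9], [4.5, 15.0]])
distances = kmeans.transform(new_points).min(axis=1)
print("distance to nearest centre:", distances)
```

The second new point lies far from both clusters, so its distance is much larger; a threshold on this score is one basic way to flag anomalies.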
Example- Unsupervised Learning 19

Social Network Analysis


Customer Segmentation

Association Rule Mining


Reinforcement Learning 20
Reinforcement Learning… 21

Exploration

Exploitation
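
The trade-off between exploration and exploitation can be sketched with an epsilon-greedy agent on a multi-armed bandit. With probability `epsilon` the agent explores a random action; otherwise it exploits the action with the best estimated reward so far. The reward probabilities, `epsilon`, and the step count below are hypothetical values chosen for illustration.

```python
import random


def epsilon_greedy(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)        # times each arm was pulled
    estimates = [0.0] * len(true_rates)   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random arm to learn more about it.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploitation: pick the arm that currently looks best.
            arm = max(range(len(true_rates)), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


counts, estimates = epsilon_greedy(true_rates=[0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated reward per arm:", [round(e, 2) for e in estimates])
```

Setting `epsilon` to 0 gives pure exploitation, which can lock onto a poor arm early; setting it to 1 gives pure exploration, which never uses what has been learned.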
Examples: Reinforcement Learning… 22

• Allocating scarce resources to handle ER cases

• Creating next best offer model for a call center

• Shelf Management
ML Versus DL 23

• Data Dependency
• Hardware Dependency
• Feature Engineering
• Problem Solving Approach
• Execution Time
• Interpretability
Statistical & Mathematical Concepts in ML 24


Data in Machine Learning 25

Data are the facts and figures collected together for reference and analysis.

Data are made up of two aspects:

• Objects, such as people, trees, animals, etc.
• Attributes recorded for those objects, such as age, size, weight, cost, etc.

Data in Machine Learning 26

Variables fall into two types, based on the values they can take.

Qualitative
Variables that can take only particular values. Retail store location area, state, and city are examples of qualitative variables, because each takes exactly one particular value for a store (here the store is our object).

These types of variables are also known as categorical variables.

Quantitative
Variables that can take any numerical value, positive or negative, within a large range.

Retail sales amounts and insurance claim amounts are examples of quantitative variables that can take any number within large ranges.

These types of variables are also generally known as numerical variables.


Scale of Measurement 27

• Nominal Scale
▪ Color, Gender, etc.

• Ordinal Scale
▪ Military rank, clothing size, etc.

• Interval Scale
▪ Temperature (in °C or °F), IQ rating, etc.

• Ratio Scale
▪ Age, Weight, Height, etc.
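
The four scales differ in which operations make sense on the values. The sketch below shows one way this might be expressed in pandas, assuming made-up values for each scale: unordered categories for nominal data, ordered categories for ordinal data, and plain numbers for interval and ratio data.

```python
import pandas as pd

# Nominal: categories with no inherent order.
colour = pd.Series(pd.Categorical(["red", "blue", "red"]))
print(colour.cat.ordered)

# Ordinal: categories with a meaningful order but no fixed spacing.
size = pd.Series(pd.Categorical(
    ["M", "S", "L"], categories=["S", "M", "L"], ordered=True
))
print(size.min(), size.max())
larger_than_small = (size > "S").tolist()
print(larger_than_small)

# Interval: differences are meaningful, ratios are not (zero is arbitrary).
celsius = pd.Series([10.0, 20.0])
temperature_difference = celsius.iloc[1] - celsius.iloc[0]
print(temperature_difference)

# Ratio: a true zero makes ratios meaningful.
weight_kg = pd.Series([40.0, 80.0])
weight_ratio = weight_kg.iloc[1] / weight_kg.iloc[0]
print(weight_ratio)
```

Saying that 80 kg is twice 40 kg is valid on a ratio scale, whereas 20 °C is not "twice as hot" as 10 °C, because the zero point of the Celsius scale is arbitrary.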
Data Handling 28

• Collection
• Analysis
• Interpretation
• Presentation
• Visualization
Sampling Techniques 29
Random Sampling 30

Each member of the population has an equal chance of being selected in the sample.
Systematic Sampling 31

Every nth record is chosen from the population to be part of the sample.
Stratified Sampling 32

• A stratum is a subset of the population that shares at least one common characteristic, in this case gender

• Random sampling is used to select a sufficient number of subjects from each stratum
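
The three sampling techniques can be compared side by side in pandas. This is a minimal sketch on a made-up population of 100 records; the column names, the sample sizes, and the 10% fraction are assumptions for illustration.

```python
import pandas as pd

# Hypothetical population of 100 records: 60 labelled "F" and 40 labelled "M".
population = pd.DataFrame({
    "id": range(100),
    "gender": ["F"] * 60 + ["M"] * 40,
})

# Random sampling: every record has the same chance of selection.
random_sample = population.sample(n=10, random_state=0)

# Systematic sampling: take every n-th record (here n = 10).
step = 10
systematic_sample = population.iloc[::step]

# Stratified sampling: sample 10% at random within each gender stratum.
stratified_sample = population.groupby("gender").sample(frac=0.1, random_state=0)
stratum_counts = stratified_sample["gender"].value_counts().to_dict()

print(len(random_sample), len(systematic_sample))
print(stratum_counts)
```

Stratified sampling keeps the 60/40 proportion of the population in the sample, which plain random sampling does not guarantee for small samples.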
Descriptive Statistics 33

• Measure of Central Tendency

• Measure of Variability/Spread
Measure of Central Tendency 34

• Mean

• Median

• Mode
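
These measures, together with the common measures of spread, are available in Python's standard `statistics` module. The sketch below computes them for a small made-up sample; the values are chosen only for illustration.

```python
import statistics

# Hypothetical sample of five observations.
values = [2, 4, 4, 5, 10]

# Central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Variability / spread
value_range = max(values) - min(values)
variance = statistics.pvariance(values)   # population variance
std_dev = statistics.pstdev(values)       # population standard deviation

print(mean, median, mode, value_range, variance, round(std_dev, 3))
```

Here the single large value 10 pulls the mean above the median, which is why the median is often preferred for skewed data.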
Data Pre-Processing 35

• Dealing with Missing Data


• Handling Categorical Data
• Normalizing Data
• Feature Construction or Generation
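
The four pre-processing steps can be sketched in pandas as follows. The network-flow records and all column names are hypothetical, and mean imputation, one-hot encoding, and min-max scaling are just one common choice for each step.

```python
import pandas as pd

# Hypothetical network-flow records; all column names are illustrative.
df = pd.DataFrame({
    "bytes": [100.0, None, 300.0],
    "duration_s": [1.0, 2.0, 4.0],
    "protocol": ["tcp", "udp", "tcp"],
})

# 1. Missing data: fill the gap with the column mean.
df["bytes"] = df["bytes"].fillna(df["bytes"].mean())

# 2. Categorical data: one-hot encode the protocol column.
df = pd.get_dummies(df, columns=["protocol"])

# 3. Normalization: min-max scale "bytes" into the range [0, 1].
b = df["bytes"]
df["bytes_scaled"] = (b - b.min()) / (b.max() - b.min())

# 4. Feature construction: derive a new feature from existing ones.
df["bytes_per_s"] = df["bytes"] / df["duration_s"]

print(df)
```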
About Python 36

Familiarize yourself with Python programming this week.


Python 37

• Install Anaconda Navigator https://www.anaconda.com/products/individual


Python 38

Environments    Libraries
Notebook        Pandas
Qtconsole       SciPy
Orange          Matplotlib
VS Code         scikit-learn (sklearn)
PyCharm         NumPy
Python Exercises to solve this week 39

• https://pynative.com/python-exercises-with-solutions/

• https://www.w3resource.com/machine-learning/scikit-learn/iris/index.php

• https://www.practicepython.org/
Next Lecture: Supervised Learning Algorithms 40

• Linear Regression
• Logistic Regression
• Decision Tree
• Random Forest
• Naïve Bayes
Reading Task for this week 41

Relevant sections from Chapter 2 of Text Book


Reading Task for this week 42

Part I: Understanding Machine Learning Chapter 2 and 3


Chapter 1 and 2
Questions?
