Cyber Security

Powerpoint Templates Page 1

An Intro to Expert Systems
• Rule-based decision system
applied to a given application
domain in the real world.
• When the nature of the data is
clear and conforms to known
models, there is no advantage
in using ML algorithms instead
of pre-defined models.
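To make the idea of a rule-based decision system concrete, here is a minimal sketch of an expert system that flags login events. The rules, field names and country codes are hypothetical, chosen only for illustration. A real system would encode rules elicited from domain experts.

```python
from dataclasses import dataclass

@dataclass
class LoginEvent:
    # Hypothetical fields describing a single login attempt
    failed_attempts: int
    country: str
    hour: int

# Each rule is a (description, predicate) pair defined in advance by an expert.
# These rules are illustrative assumptions, not taken from a real product.
RULES = [
    ("too many failed attempts", lambda e: e.failed_attempts >= 5),
    ("login from unexpected country", lambda e: e.country not in {"IT", "US"}),
    ("login outside business hours", lambda e: e.hour < 7 or e.hour > 20),
]

def evaluate(event: LoginEvent) -> list[str]:
    """Return the descriptions of all rules that fire for the event."""
    return [desc for desc, predicate in RULES if predicate(event)]

def is_suspicious(event: LoginEvent) -> bool:
    # Pre-defined decision logic: any matching rule flags the event
    return len(evaluate(event)) > 0

if __name__ == "__main__":
    event = LoginEvent(failed_attempts=6, country="IT", hour=3)
    print(evaluate(event))       # ['too many failed attempts', 'login outside business hours']
    print(is_suspicious(event))  # True
```

No learning takes place here: the decision logic is fixed in advance, which is why this approach works well only when the data conforms to known models.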

Types of ML
• Supervised Learning
– Classification algorithms (e.g., spam classification)
– Regression algorithms
– Common supervised learning algorithms:
• Regression (linear and logistic)
• k-Nearest Neighbors (k-NNs)
• Support vector machines (SVMs)
• Decision trees and random forests
• Neural networks (NNs)

Types of ML
• Unsupervised Learning
– Used to identify new forms of malware attacks, fraud, and email spam campaigns
– Dimensionality reduction
• Principal component analysis (PCA)
• Kernel PCA
– Clustering
• k-means
• Hierarchical cluster analysis (HCA)
• Reinforcement Learning
– Trial-and-error approach
– Learning process: positive reward / negative reward
• Markov processes (e.g., hidden Markov models (HMMs) for detecting polymorphic malware threats)
• Q-learning
• Temporal difference (TD) methods
• Monte Carlo methods
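As an illustration of the unsupervised techniques listed above, the following sketch uses scikit-learn to reduce data with PCA and then group it with k-means. The data comes from `make_blobs` and merely stands in for numerical feature vectors extracted from, e.g., malware samples; the number of clusters is assumed to be known in advance.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for unlabeled feature vectors (4 features per sample)
X, y_true = make_blobs(
    n_samples=100,
    centers=[[0, 0, 0, 0], [10, 10, 10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# Dimensionality reduction: project the 4 features onto 2 principal components
X_reduced = PCA(n_components=2).fit_transform(X)

# Clustering: group the reduced samples into 2 clusters without using y_true
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_reduced)

# y_true is used only to check the result, never during training
print("Agreement with the hidden groups:", adjusted_rand_score(y_true, labels))
```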

Quality vs Quantity
– What types of malware can we consider most representative of the most probable risks and threats to our company?
– How many example cases (samples) should we collect and feed to the algorithms to obtain reliable results, in terms of both effectiveness and efficiency in predicting future threats?

Python ML Libraries
– NumPy
– Pandas
– Matplotlib
– scikit-learn
– Seaborn

AI in Cyber Security
– Classification: Identify similar attacks or malware belonging to the same family, with common characteristics and behavior, even when their signatures differ (polymorphic malware).
– Classify emails, distinguishing spam from legitimate emails.
– Clustering: Automatically identify the classes to which the samples belong when information about the classes is not available in advance (malware analysis and forensic analysis).
– Predictive analysis: NNs and deep learning (DL) are used to identify threats.

– Network protection: ML enables highly sophisticated intrusion detection systems (IDSs) for network perimeter protection.
– Endpoint protection: Threats such as ransomware can be adequately detected by algorithms that learn the behavior of malware, overcoming the limitations of traditional antivirus software.
– Application security: Attacks on web applications, including Server-Side Request Forgery (SSRF), SQL injection, Cross-Site Scripting (XSS), and Distributed Denial of Service (DDoS) attacks, can be adequately countered with AI and ML tools and algorithms.

• Suspicious user behavior: Identifying fraud attempts or application compromises by malicious users at the very moment they occur is one of the emerging areas of application of DL.

Detecting spam with Perceptrons
• SpamAssassin: an open-source spam filtering tool
• Neural networks (NNs): the simplest, most basic form is the Perceptron.
• NNs conceptually mimic the behavior of the human brain.
• The Perceptron is one of the first successful implementations of a neuron in the field of AI.

Spam Filters in a Nutshell
• Categorize emails based on the presence or absence of particular keywords that occur within the text of the emails with a certain frequency
• Count the number of occurrences of the suspicious keywords
• Assign a score to individual messages based on the number of occurrences of the keywords identified in spam, and use it to classify subsequent email messages.
• If score > threshold value, the email is classified as spam; otherwise, it is classified as ham.
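A minimal sketch of this keyword-scoring scheme follows. The keyword list and the threshold value are hypothetical assumptions; real filters derive both from labeled data.

```python
import re

# Hypothetical list of suspicious keywords and decision threshold
SUSPICIOUS_KEYWORDS = ["buy", "free", "winner"]
THRESHOLD = 2

def spam_score(text: str) -> int:
    """Count the occurrences of the suspicious keywords in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(words.count(keyword) for keyword in SUSPICIOUS_KEYWORDS)

def classify(text: str) -> str:
    # Score above the threshold means spam, otherwise ham
    return "spam" if spam_score(text) > THRESHOLD else "ham"

print(classify("Buy now! Free gift, buy today, you are a winner"))  # spam (score 4)
print(classify("Can we meet tomorrow to buy the tickets?"))         # ham (score 1)
```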

Spam Filters in Action
• Spammers are well aware of our attempts to filter unwanted messages.
• The first spam detection solutions made use of static rules, using regular expressions to identify predefined patterns of suspicious words in the email text.
– Static rules quickly proved to be ineffective
• A dynamic approach allowed the spam filter to learn from the continuous innovations introduced by spammers.
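To see why static rules are brittle, consider this sketch of a regular-expression rule; the pattern is a hypothetical example. A trivial obfuscation such as `v1agra` already slips past it, which is the weakness that motivated filters able to learn.

```python
import re

# Hypothetical static rule: a fixed pattern of suspicious words
STATIC_RULE = re.compile(r"\b(viagra|lottery|prize)\b", re.IGNORECASE)

def matches_static_rule(text: str) -> bool:
    """Return True if the predefined pattern appears in the email text."""
    return STATIC_RULE.search(text) is not None

print(matches_static_rule("Cheap VIAGRA here"))  # True
print(matches_static_rule("Cheap v1agra here"))  # False: the obfuscated word evades the rule
```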

Spam Filters in Action
• Use the B variable in place of the word buy, and the S variable in place of the word sex.
– Scoring function: y = B + S
• If both words are present in the text of the email, the probability of it being spam increases.
• Attribute a lower weight of 2 to the B variable and a greater weight of 3 to the S variable.
– Corrected scoring function: y = 2B + 3S
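The corrected scoring function can be checked with a few lines of Python, where B and S are the occurrence counts of buy and sex. The threshold of 4 and the simple whitespace tokenization are hypothetical choices for illustration.

```python
def score(text: str) -> int:
    """Weighted score y = 2B + 3S, where B and S count 'buy' and 'sex'."""
    words = text.lower().split()  # naive whitespace tokenization
    b = words.count("buy")
    s = words.count("sex")
    return 2 * b + 3 * s

THRESHOLD = 4  # hypothetical threshold

for message in ["buy buy now", "sex", "buy sex"]:
    y = score(message)
    print(message, y, "spam" if y > THRESHOLD else "ham")
```

Only the message containing both words (y = 5) exceeds the threshold, matching the intuition that their joint presence makes spam more likely.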

Detecting spam with linear classifiers
• Determine the score to be associated with every single email message.

Detecting spam with linear classifiers
• Generalize this formalization by moving the threshold θ to the left side of the inequality
• The formulation of the linear classifier then takes its definitive form
• The index i now ranges from 0 to n
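The slide's formulas did not survive extraction. Assuming the standard linear-classifier notation (features x_i such as the keyword counts B and S, weights w_i, a threshold θ, and the "score > threshold" rule stated earlier), the generalization described above can be written as:

```latex
\begin{aligned}
\sum_{i=1}^{n} w_i x_i > \theta
  &\iff \sum_{i=1}^{n} w_i x_i - \theta > 0 \\
  &\iff \sum_{i=0}^{n} w_i x_i > 0,
  \qquad \text{with } w_0 = -\theta,\ x_0 = 1
\end{aligned}
```

For the buy/sex example, n = 2, x_1 = B, x_2 = S, w_1 = 2 and w_2 = 3. Folding θ into the extra weight w_0 is what lets the index start at 0.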

How the Perceptron Learns
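The slide's illustration is not recoverable, so here is a from-scratch sketch of the classic Perceptron learning rule: after each sample, the weights move by η(target − prediction)x, which changes nothing when the prediction is correct. It uses the augmented form with x_0 = 1, so w_0 plays the role of −θ. The toy data (counts of buy and sex, label 1 = spam, 0 = ham), the learning rate and the epoch limit are hypothetical.

```python
import numpy as np

def predict(w: np.ndarray, x: np.ndarray) -> int:
    # Augmented input: x[0] == 1, so w[0] acts as -theta
    return 1 if np.dot(w, x) > 0 else 0

def train_perceptron(X: np.ndarray, y: np.ndarray, eta: float = 1.0, epochs: int = 20) -> np.ndarray:
    """Classic Perceptron rule: w <- w + eta * (target - prediction) * x."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1
    w = np.zeros(X_aug.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X_aug, y):
            update = eta * (target - predict(w, xi))
            w += update * xi  # zero update when the prediction is correct
            errors += int(update != 0)
        if errors == 0:  # every training sample classified correctly
            break
    return w

# Hypothetical toy data: [occurrences of "buy", occurrences of "sex"]
X = np.array([[0, 0], [1, 0], [0, 1], [2, 1], [3, 2], [1, 3], [2, 2]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

w = train_perceptron(X, y)
print("weights (w0 = -theta, w1, w2):", w)
```

On this data the rule converges after three epochs to w = (−1, 1, 1), i.e. a message is flagged as spam when B + S > 1.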

Perceptron-based Spam Filter
• The scikit-learn library is used to create a simple spam filter based on the Perceptron.
• https://archive.ics.uci.edu/dataset/228/sms+spam+collection
– Downloaded in CSV format
– Transformed into numerical values
– Only the messages containing the buy and sex keywords were selected, counting, for each message, the number of occurrences of the keywords in its text.
– The data is loaded from the sms_spam_perceptron.csv file through the pandas library, and the respective values are extracted from the pandas DataFrame, referenced through the iloc indexer.
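A minimal sketch of these steps with scikit-learn's `Perceptron` follows. Since the contents of sms_spam_perceptron.csv are not shown, the sketch builds a small stand-in DataFrame with a hypothetical layout (a `buy` count column, a `sex` count column and a `label` column with 1 = spam, 0 = ham); in practice the DataFrame would come from `pd.read_csv("sms_spam_perceptron.csv")`.

```python
import pandas as pd
from sklearn.linear_model import Perceptron

# Stand-in for pd.read_csv("sms_spam_perceptron.csv"); the column layout
# (buy count, sex count, label) is a hypothetical assumption.
df = pd.DataFrame({
    "buy":   [0, 1, 0, 2, 3, 1, 2],
    "sex":   [0, 0, 1, 1, 2, 3, 2],
    "label": [0, 0, 0, 1, 1, 1, 1],  # 1 = spam, 0 = ham
})

# Extract features and labels positionally with the iloc indexer
X = df.iloc[:, 0:2].values
y = df.iloc[:, 2].values

# tol=None runs all max_iter epochs instead of stopping early
clf = Perceptron(max_iter=100, tol=None, random_state=0)
clf.fit(X, y)

print("Training accuracy:", clf.score(X, y))
print("Weights:", clf.coef_, "Bias:", clf.intercept_)
```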

SVM based Spam Filter
• SVM is one of the most popular supervised learning algorithms, used for classification as well as regression problems.
• The goal is to create the best line or decision boundary (hyperplane) that can segregate n-dimensional space into classes.
https://roadmap.sh/cyber-security

SVM based Spam Filter
• Load the data with pandas, associating the class labels with the corresponding values -1 (spam) and 1 (ham)
• Split the original dataset into 30% test data and 70% training data
• Import the SVC class from the sklearn.svm package
• Using sklearn.metrics, evaluate the accuracy of the predictions
– Prediction accuracy: 84%
– Number of incorrect classifications: 7 cases
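The steps above can be sketched with scikit-learn as follows. The original CSV is not reproduced here, so synthetic data from `make_classification` stands in for the message features; the linear kernel is also an assumption, and the printed figures will not match the 84% accuracy reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the spam dataset (the real CSV is not shown here)
X, y01 = make_classification(n_samples=100, n_features=4, random_state=0)
y = np.where(y01 == 1, 1, -1)  # map labels to 1 (ham) and -1 (spam)

# 70% training data, 30% test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

svm = SVC(kernel="linear")
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

print("Prediction accuracy: %.2f" % accuracy_score(y_test, y_pred))
print("Number of incorrect classifications:", int((y_test != y_pred).sum()))
```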

Image spam detection with SVM
• Hackers use images, instead of simple text, as a vehicle for spreading spam.
• Image-based spam detection solutions:
– Content-based filtering:
• Pattern recognition techniques leveraging Optical Character Recognition (OCR) technology to extract text from images
– Non-content-based filtering:
• Identify specific features of spam images
• Advanced recognition techniques based on NNs and deep learning (DL) are used to extract the features.
https://scholarworks.sjsu.edu/etd_projects/486/
• Image-based spam detection solution
– https://medium.com/@yesprabhakaran98/email-spam-classification-92b661d3b700

Linear Regression: Pros & Cons
Pros:
• Implementation is simple
• Linear regression works better with continuous intervals of values
Cons:
• Can manage only quantitative data
• The linearity assumption is often unrealistic (e.g., the relationship between age and weight)
• Greater classification errors
• Systematically distorted predictions
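For reference, fitting a linear regression with scikit-learn takes only a few lines, reflecting the simplicity listed among the pros. The data here is synthetic and follows y = 2x + 1 exactly, so the model recovers those coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, perfectly linear data: y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction for x = 20:", model.predict([[20.0]])[0])
```

On real data whose relationship is not linear, the same code still returns a straight line, which is where the systematically distorted predictions come from.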

Linear vs Logistic

Logistic Regression
• Used to predict the category of a dependent variable based on the values of the independent variables. Its output is a probability between 0 and 1, which is mapped to the category 0 or 1.
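The mechanism can be sketched in a few lines: the logistic (sigmoid) function maps a weighted sum of the independent variables to a probability between 0 and 1, which is then thresholded (here at 0.5) to obtain the 0/1 category. The weights and bias below are hypothetical values, not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for two independent variables
w = np.array([1.5, -2.0])
b = 0.5

def predict_proba(x):
    return sigmoid(np.dot(w, x) + b)

def predict(x, threshold=0.5):
    # Category 1 if the estimated probability reaches the threshold, else 0
    return int(predict_proba(x) >= threshold)

x = np.array([2.0, 1.0])
print(predict_proba(x))  # ~0.8176 (sigmoid of 1.5)
print(predict(x))        # 1
```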
Classifying Credit Cards using Logistic Regression

Phishing Detector using Logistic Regression
• https://archive.ics.uci.edu/dataset/327/phishing+websites
• The dataset is used in CSV format, prepared with a data wrangling technique (one-hot encoding)
• It consists of records containing 30 features that characterize phishing websites.
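A sketch of a logistic-regression phishing detector with scikit-learn follows. The real UCI dataset is not loaded here: a random matrix of 30 features taking the values -1, 0 and 1 (an assumed encoding) stands in for it, with labels produced by a hypothetical linear rule, so the printed accuracy says nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in for the 30 phishing-website features (values -1, 0, 1)
X = rng.integers(-1, 2, size=(500, 30))
# Hypothetical labelling rule: 1 = phishing, -1 = legitimate
y = np.where(X[:, :5].sum(axis=1) > 0, 1, -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy: %.3f" % accuracy_score(y_test, y_pred))
```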

Logistic Regression: Pros & Cons
• Pros:
– The model can be trained very efficiently even in the
presence of a large number of features
– The algorithm has a high degree of scalability, due to the
simplicity of its scoring function
• Cons:
– Requires the features to be linearly independent
– Requires more training samples
– Less powerful at minimizing prediction errors

Types of Malware
• Trojans: Executables that appear legitimate and harmless but, once launched, execute malicious instructions in the background
• Botnets: Malware whose goal is to compromise as many hosts of a network as possible, in order to put their computational capacity at the service of the attacker
• Downloaders: Malware that downloads malicious libraries or portions of code from the network and executes them on victim hosts
• APTs: Advanced persistent threats are tailored attacks that exploit specific vulnerabilities on the victimized hosts

Types of Malware
• Rootkits: Compromise hosts at the operating system level and, therefore, often come in the form of device drivers, making the various countermeasures (such as antivirus software) ineffective
• Ransomware: Encrypts files stored inside the host machines and demands a ransom from the victim in exchange for the decryption key needed to recover the original files
• Zero-days: Exploit vulnerabilities not yet disclosed to the community of researchers and analysts, whose characteristics and impacts are not yet known, and which therefore go undetected by antivirus software

Decision Tree
https://youtu.be/LDRbO9a6XPU
• Decision trees use binary trees to analyze and process data for predictions.
• They accept both numerical values and qualitative information as input data.
Implementation Steps:
• Subdivide the original dataset into two child subsets, depending on whether a binary condition is verified or falsified
• The child subsets are further subdivided on the basis of further conditions
– At each step, the condition that provides the best bipartition of the subset being split is chosen
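The splitting step can be sketched from scratch. The slides do not name the criterion that defines the "best bipartition"; this sketch assumes Gini impurity (a common choice) and tries every threshold on every feature, handling numerical features only. The toy samples and their feature names are hypothetical.

```python
import numpy as np

def gini(labels: np.ndarray) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def best_split(X: np.ndarray, y: np.ndarray):
    """Return (feature, threshold, impurity) of the best condition X[:, f] <= t."""
    best = (None, None, float("inf"))
    n = len(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t   # condition verified
            right = ~left         # condition falsified
            if left.all() or right.all():
                continue  # not a real bipartition
            # Size-weighted impurity of the two child subsets
            impurity = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if impurity < best[2]:
                best = (f, float(t), float(impurity))
    return best

# Hypothetical samples: [file size in KB, number of suspicious API calls]
X = np.array([[120, 1], [300, 0], [150, 7], [80, 9], [200, 2], [90, 8]])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = malware, 0 = benign

print(best_split(X, y))  # (1, 2.0, 0.0): "suspicious API calls <= 2" separates the classes
```

A full tree would call `best_split` recursively on each child subset until the subsets are pure or a depth limit is reached.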
