Ch5 - Support Vector Machine (SVM)

Chapter 5
Support Vector Machines (SVM)

What is a Support Vector Machine?
• SVM is a binary classifier.
• SVM is a machine learning model that is both powerful and versatile.
• SVM is one of the most used machine learning models.
• SVM is a linear model for classification, regression problems, and outlier detection.
• SVMs are well adapted to classifying complex datasets that are small or medium in size.
The idea of linear SVM
The algorithm creates a line or a hyperplane that separates the data into classes.
Input: data → SVM → Output: line
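As a minimal sketch of this data-in, line-out idea, the following fits scikit-learn's SVC with a linear kernel on a tiny made-up dataset. The points and variable names are assumptions for illustration and do not come from the slides. The fitted line is the set of points where w[0]*x1 + w[1]*x2 + b = 0.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2-D points: class 0 sits bottom-left, class 1 sits top-right.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Input: data -> SVM -> Output: a separating line w . x + b = 0.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating line
b = clf.intercept_[0]   # offset of the separating line
predictions = clf.predict(X)

print("w =", w, "b =", b)
print("predictions:", predictions)
```

In two dimensions the result is a line; with more features the same attributes describe a hyperplane.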
Separate the blue dots from the red rectangles. Your job is to find an optimal line that divides this dataset into two groups.
Which line, in your opinion, best separates the data?
In machine learning, our goal is to find the separator that generalizes best to new data.
Some terminologies to understand the SVM
• Support vectors: the points from each class that lie closest to the line. They are important for identifying the border and for classification.
• Margin: the distance between the line and the support vectors.
• Maximized margin: the distance between the support vectors of the two classes, made as large as possible (the goal).
• Optimal hyperplane: the line that stays as far away from the support vectors as possible.
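To make these terms concrete, here is a small NumPy sketch. It assumes a hand-picked separating line x1 = 2 (that is, w = (1, 0) and b = -2) and made-up points. It measures each point's distance to the line, takes the smallest distance as the margin, and marks the points at that distance as support vectors.

```python
import numpy as np

# Made-up points; the first three belong to class -1, the last three to class +1.
X = np.array([[0.0, 2.0], [1.0, 1.0], [1.0, 3.0],
              [3.0, 1.0], [3.0, 3.0], [5.0, 2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Assumed separating line w . x + b = 0, here the vertical line x1 = 2.
w = np.array([1.0, 0.0])
b = -2.0

# Perpendicular distance of every point to the line.
distances = np.abs(X @ w + b) / np.linalg.norm(w)

margin = distances.min()                          # line to closest points
support_vectors = X[np.isclose(distances, margin)]
street_width = 2 * margin                         # support vectors of one class to the other

print("distances:", distances)
print("margin:", margin, "street width:", street_width)
print("support vectors:\n", support_vectors)
```

The two outer points, (0, 2) and (5, 2), are not support vectors: moving them slightly would not change the line.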
SVM’s way to find the best line
First, find the support vectors.
Second, compute the margin.
Then, the hyperplane for which the margin is maximum is the optimal hyperplane.
SVM aims to create a decision border that is as wide as possible between the two classes. This is called Large Margin Classification.
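The selection rule can be illustrated with a brute-force sketch: compute the margin of a few candidate lines and keep the one with the largest margin. The candidate lines and the data below are hypothetical. A real SVM solver does not enumerate lines like this; it finds the maximum-margin line by solving an optimization problem.

```python
import numpy as np

# Made-up points; labels are -1 and +1.
X = np.array([[0.0, 2.0], [1.0, 1.0], [1.0, 3.0],
              [3.0, 1.0], [3.0, 3.0], [5.0, 2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hypothetical candidate lines, each given as (w, b) for w . x + b = 0.
candidates = [
    (np.array([1.0, 0.0]), -1.5),   # vertical line x1 = 1.5
    (np.array([1.0, 0.0]), -2.0),   # vertical line x1 = 2.0
    (np.array([1.0, 0.0]), -2.5),   # vertical line x1 = 2.5
    (np.array([1.0, 1.0]), -4.0),   # diagonal line x1 + x2 = 4
]

def margin_of(w, b):
    """Smallest signed distance from any point to the line.

    The value is negative if some point lies on the wrong side.
    """
    signed = y * (X @ w + b) / np.linalg.norm(w)
    return signed.min()

margins = [margin_of(w, b) for w, b in candidates]
best_index = int(np.argmax(margins))

print("margins:", margins)
print("best candidate:", candidates[best_index])
```

Among these four candidates, the line x1 = 2 has the largest margin, so it is the one a large margin classifier would prefer.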
Feature Scales in SVM
SVM is sensitive to feature scales. As the image below shows, after scaling it can classify the data more effectively than with unscaled data.
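A short sketch of what scaling does, using scikit-learn's StandardScaler on made-up values in which one feature is roughly ten times larger than the other. After scaling, both features have mean 0 and standard deviation 1, so neither dominates the distance computations behind the margin.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: feature 0 is in single digits, feature 1 is in the tens.
X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # subtract the mean, divide by the std

print("std before:", X.std(axis=0))
print("std after: ", X_scaled.std(axis=0))
print("mean after:", X_scaled.mean(axis=0))
```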
Hard vs Soft margin classification
Hard Margin Classification
No data points are allowed in the margin areas.
Hard Margin Classification Problem
• It only works if the data is linearly separable.
• It is quite sensitive to outliers.
So, what should we do?
Soft Margin Classification
Data points are allowed in the middle of the margin areas or even on the wrong side (margin violations).
The balance between a wide margin and few margin violations is controlled by the C hyperparameter, which sets how much you want to punish your model for each misclassified point:
• A smaller C value leads to a wider margin but more margin violations.
• A higher C value leads to fewer margin violations but a smaller margin.
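The effect of C can be checked with a small experiment. This is a sketch on synthetic, overlapping data generated with a fixed seed. The data, the two C values, and the helper name fit_and_measure are assumptions chosen for illustration. The margin width is computed as 2 / ||w||, and a margin violation is counted whenever a point lies inside the margin or on the wrong side.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping synthetic clusters (made-up data, fixed seed).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=2.0, scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
signed_y = np.where(y == 1, 1, -1)

def fit_and_measure(C):
    """Fit a linear soft-margin SVM and return (margin width, violations)."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])
    # Points with y * f(x) < 1 are inside the margin or misclassified.
    violations = int(np.sum(signed_y * clf.decision_function(X) < 1))
    return width, violations

width_small_c, violations_small_c = fit_and_measure(0.01)
width_large_c, violations_large_c = fit_and_measure(100.0)

print("C=0.01: width", width_small_c, "violations", violations_small_c)
print("C=100:  width", width_large_c, "violations", violations_large_c)
```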
Pros & Cons of SVM
Advantages:
• Accuracy.
• Works well on smaller, cleaner datasets.
• SVM is relatively memory efficient.
Disadvantages:
• Isn’t suited to larger datasets, as the training time with SVMs can be high.
• Less effective on noisier datasets.
SVM Uses
• Text classification: detecting spam and sentiment analysis.
• Image recognition challenges: color-based classification.
• Handwritten digit recognition: postal automation services.
Iris Dataset
Code
First, implement SVM using sklearn.
Second, load the iris dataset, using only petal length and petal width as features and a single target class, iris virginica (as SVM is a binary classifier, we classify whether a flower is virginica or not).
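The code from the slide is not reproduced in this text, so the following is only a sketch of how the loading step might look with scikit-learn's bundled iris dataset. It assumes the standard column order, in which columns 2 and 3 are petal length and petal width and target value 2 is Iris virginica.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Keep only petal length and petal width (columns 2 and 3, in cm).
X = iris["data"][:, [2, 3]]

# Binary target: 1 if the flower is Iris virginica, 0 otherwise.
y = (iris["target"] == 2).astype(np.int64)

print("features:", X.shape, "virginica samples:", int(y.sum()))
```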
Third, scale the dataset using StandardScaler; to make things easier, we use the Pipeline API of sklearn.
Finally, train the model on the dataset.
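A possible version of the scaling and training steps is sketched below. The exact estimator and hyperparameters used on the slide are not visible in the text, so SVC with a linear kernel and C=1.0 is an assumption, as are the step names in the pipeline and the sample flower measurements.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load petal length and petal width; the target is "virginica or not".
iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]
y = (iris["target"] == 2).astype(np.int64)

# The pipeline scales the features first, then fits the linear SVM.
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", SVC(kernel="linear", C=1.0)),   # assumed settings
])
svm_clf.fit(X, y)

train_accuracy = svm_clf.score(X, y)
# Classify a hypothetical flower: petal length 5.5 cm, petal width 1.7 cm.
sample_prediction = svm_clf.predict([[5.5, 1.7]])

print("training accuracy:", train_accuracy)
print("prediction for [5.5, 1.7]:", sample_prediction)
```

Putting the scaler inside the pipeline means the same scaling is applied automatically whenever the model predicts on new data.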
The Output
Case Study
Tax Revenue in Government of South Lampung
Article link:
http://sunankalijaga.org/prosiding/index.php/icse/article/view/521/495
Support Vector Machine Predictive Analysis Implementation: Case Study of Tax Revenue in Government of South Lampung
Publication:
Proceeding International Conference on Science and Engineering in 2020.
Problem:
• Difficulty in predicting the target tax revenue when arranging a revenue budget.
• Absence of a formula to calculate the potential tax revenue accurately.
• Lack of strategic management for local revenue improvement.
Purpose:
• Use Business Intelligence to predict the potential tax revenue of hotels and restaurants.
• Enable the government of South Lampung to develop appropriate strategies to improve local tax revenue and minimize tax reduction.
• Provide results that can be taken into consideration when determining the target for the hotel and restaurant tax sector in the coming year.
Limitations:
The number of hotel and restaurant visitors changes every year, resulting in fluctuations in the amount of tax.
Study area:
• The dataset used in this research is tax revenue data of hotels and restaurants in South Lampung Region, Indonesia, from 2016 to 2018.
Dataset: 14495 rows in total
Year    Rows
2016    3017
2017    4747
2018    6729
Training: 2016 to 2017
Testing: 2017 to 2018
• The dataset contains nine fields: date, month, number of evidence, description, type of tax, sub-type of tax, debit, credit, and saldo (balance).
• The tools used for descriptive and predictive analysis in this study are Microsoft Excel and RapidMiner.
Methodology:
1. Preparing the data
• The data is re-aggregated into the amount of revenue per month and year.
• The data is split into training data and testing data.
• Training and testing data are each grouped into two parts: input data covering months 1 to 12, and target data starting at month 13.
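One way to read the input and target grouping is as a sliding window: each sample takes 12 consecutive months as input and the following month as target. The sketch below shows that reading with NumPy. The monthly values are placeholders, and the window construction is an assumption, since the slides do not spell out the exact procedure used in the paper.

```python
import numpy as np

# Placeholder monthly revenue totals for 36 months (not real data).
monthly_revenue = np.arange(1, 37, dtype=float)

window = 12  # months 1 to 12 form the input, month 13 is the target

# Each row of inputs holds 12 consecutive months.
inputs = np.array([monthly_revenue[i:i + window]
                   for i in range(len(monthly_revenue) - window)])
# The matching target is the month right after each window.
targets = monthly_revenue[window:]

print("inputs shape:", inputs.shape)     # one row per window
print("first input:", inputs[0])
print("first target:", targets[0])
```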
2. Normalization
• Existing data is rescaled to smaller values to optimize the computational process.
• The data is normalized to a range between 0 and 1.
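A common way to bring values into the range 0 to 1 is min-max normalization, x' = (x - min) / (max - min). The slides do not name the formula used in the paper, so treating it as min-max is an assumption. The function name and the amounts below are made up.

```python
import numpy as np

def normalize(values, lo, hi):
    """Min-max normalization: maps lo to 0 and hi to 1."""
    return (values - lo) / (hi - lo)

# Made-up revenue amounts.
revenue = np.array([120.0, 95.0, 180.0, 150.0])
lo, hi = revenue.min(), revenue.max()

scaled = normalize(revenue, lo, hi)
print("scaled:", scaled)
```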
3. Data Processing
• Uses descriptive and predictive analysis.
• The predictive analysis is done using the Support Vector Machine method.
• Results from the SVM analysis are denormalized to get Rupiah values.
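The study ran its analysis in RapidMiner, so the sketch below is not the authors' implementation. It swaps in scikit-learn's SVR to show the general flow under stated assumptions: synthetic monthly revenue, min-max normalization, 12-month input windows, an RBF kernel with arbitrary hyperparameters, and denormalization of the prediction back to Rupiah.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic monthly revenue in Rupiah for 36 months (made-up values).
rng = np.random.default_rng(0)
months = np.arange(36)
revenue = 1e8 + 1e7 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1e6, 36)

# Normalize to the range 0 to 1 (min-max, assumed).
lo, hi = revenue.min(), revenue.max()
scaled = (revenue - lo) / (hi - lo)

# Build 12-month input windows with the next month as target.
window = 12
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

# Support vector regression; kernel and hyperparameters are assumptions.
model = SVR(kernel="rbf", C=1.0, epsilon=0.01)
model.fit(X, y)

# Forecast the month after the last window, then denormalize to Rupiah.
next_scaled = model.predict(scaled[-window:].reshape(1, -1))
forecast_rupiah = next_scaled * (hi - lo) + lo

print("forecast (Rupiah):", forecast_rupiah[0])
```

Denormalization simply inverts the min-max formula, so it must use the same minimum and maximum that were used for normalization.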
Results:
Descriptive analysis (figure)
Predictive analysis (figure)
• Forecast results and the original data do not differ much.
• The forecasting results can be used as a basis for determining targets in the coming year.
• Governments set tax revenue targets at the previous year's target plus 10-20%; targets that are too high may not be achieved and then show up as poor performance.
THANKS!
Do you have any questions?
Mawadh Sairafi
Mawaddh Basouki
