
Chapter 5

Support Vector Machines (SVM)


What is Support Vector Machine?

• SVM is a binary classifier.
• SVM is a machine learning model that is both powerful and versatile.
• SVM is one of the most widely used machine learning models.
• SVM is a linear model for classification, regression problems, and outlier detection.
• SVMs are well adapted to classifying complex datasets that are small or medium in size.
The idea of linear SVM

The algorithm creates a line or a hyperplane that separates the data into classes.

Input: data -> SVM -> Output: line
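
To make the "data in, line out" idea concrete, here is a minimal sketch using scikit-learn's SVC with a linear kernel. The toy points and the variable names (X, y, clf, w, b) are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated groups in 2D.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Input: data -> SVM
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Output: the separating line w[0]*x1 + w[1]*x2 + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"line: {w[0]:.3f}*x1 + {w[1]:.3f}*x2 + {b:.3f} = 0")

# New points are classified by which side of the line they fall on.
predictions = clf.predict([[1.2, 1.4], [5.8, 5.2]])
print(predictions)
```

In more than two dimensions the same coef_ and intercept_ attributes describe a hyperplane instead of a line.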


Separate the blue dots from the red rectangles. Your job is to find an optimal line that divides this dataset into two groups.


Which line do you think best separates the data?

In machine learning, our goal is a separator that generalizes well to new data.

Some terminologies to understand the SVM

• Support vectors: the points from each class that lie closest to the line. They are what determine the border and the classification.

• Margin: the distance between the line and the support vectors.

• Maximized margin: the distance between the support vectors of the two classes, made as large as possible (the goal).

• Optimal hyperplane: the line that stays as far away from the support vectors as possible.
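
These terms can be read off a fitted model. The sketch below assumes scikit-learn's SVC with a linear kernel and a very large C to approximate a hard margin; the toy data and names such as margin and street_width are illustrative assumptions, not part of the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Support vectors: the training points closest to the line.
support_vectors = clf.support_vectors_

w = clf.coef_[0]
b = clf.intercept_[0]

# Distance of every training point to the line w.x + b = 0.
distances = np.abs(X @ w + b) / np.linalg.norm(w)

# Margin: distance from the line to the support vectors.
margin = 1.0 / np.linalg.norm(w)

# Distance between the support vectors of the two classes.
street_width = 2.0 / np.linalg.norm(w)

print("support vectors:\n", support_vectors)
print("margin:", margin, "street width:", street_width)
```

The smallest entry of distances should match margin up to solver tolerance, which is one way to check that the support vectors really are the closest points.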

SVM’s way to find the best line
First, find the support vectors.
Second, compute the margin.
Then, the hyperplane for which the margin is maximum is the optimal hyperplane.

SVM aims to create a decision border that is as wide as possible between the two classes. This is called Large Margin Classification.
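
Large margin classification is commonly written as an optimization problem. The formulation below is the standard textbook one for linearly separable data; the symbols (weight vector w, offset b, training points x_i, labels y_i in {-1, +1}) are introduced here as assumptions and do not appear in the slides.

```latex
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
```

The width of the border between the two classes is 2 / ||w||, so minimizing ||w|| is the same as making the border as wide as possible.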

Feature Scales in SVM

SVM is sensitive to feature scales. As the image below shows, after scaling the model classifies the data more effectively than with unscaled data.
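
A small sketch of the same comparison in code, assuming scikit-learn's StandardScaler and a linear SVC. The four data points are hypothetical and chosen so that the second feature has a much larger range than the first.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: feature 2 spans a much larger range than feature 1.
X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])
y = np.array([0, 0, 1, 1])

# Model trained on the raw features.
unscaled = SVC(kernel="linear", C=100.0).fit(X, y)

# Same model, but each feature is standardized first.
scaled = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="linear", C=100.0)),
]).fit(X, y)

# After scaling, every feature has mean 0 and standard deviation 1.
X_std = scaled.named_steps["scaler"].transform(X)
print("scaled feature means:", X_std.mean(axis=0))
print("scaled feature stds: ", X_std.std(axis=0))

# Compare the fitted weights and support vectors of the two models.
print("weights, unscaled:", unscaled.coef_[0])
print("weights, scaled:  ", scaled.named_steps["svc"].coef_[0])
print("support vectors per class, unscaled:", unscaled.n_support_)
print("support vectors per class, scaled:  ", scaled.named_steps["svc"].n_support_)
```

Because the margin is measured as a distance, features with large numeric ranges dominate it unless the features are brought to a comparable scale first.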

Hard vs Soft margin classification
Hard Margin Classification
No data points are allowed in the margin areas.

Hard Margin Classification Problem
• It only works if the data is linearly separable.
• It is quite sensitive to outliers.

So, what should we do?

Soft Margin Classification
Data points are allowed in the middle of the margin areas or even on the wrong side (margin violations).

The C hyperparameter controls this balance. It sets how much you want to punish your model for each misclassified point:
• A smaller C value leads to a wider margin but more margin violations.
• With a high C value, the classifier makes fewer margin violations but ends up with a smaller margin.
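
The effect of C can be checked numerically. This is a minimal sketch that assumes scikit-learn's SVC with a linear kernel and synthetic, overlapping clusters from make_blobs. The helper name fit_and_measure and the two C values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (not from the slides).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

def fit_and_measure(C):
    """Fit a linear SVM and return (margin width, number of support vectors)."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])
    return width, len(clf.support_)

width_small_c, n_sv_small_c = fit_and_measure(0.01)
width_large_c, n_sv_large_c = fit_and_measure(100.0)

print(f"C=0.01: margin width {width_small_c:.3f}, support vectors {n_sv_small_c}")
print(f"C=100 : margin width {width_large_c:.3f}, support vectors {n_sv_large_c}")
```

Points inside the margin or on the wrong side count as support vectors, so the support-vector count gives a rough view of how many violations each setting tolerates.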

Pros & Cons of SVM

Advantages:
• Accuracy.
• Works well on smaller, cleaner datasets.
• SVM is relatively memory efficient.

Disadvantages:
• Isn’t suited to larger datasets, as the training time with SVMs can be high.
• Less effective on noisier datasets.

SVM Uses

• Text Classification: detecting spam and sentiment analysis.
• Image Recognition Challenges: color-based classification.
• Handwritten Digit Recognition: postal automation services.

Iris Dataset

Code
First, implement the SVM using sklearn.

Second, load the iris dataset, using only petal length and petal width as features and a single target class, iris virginica (as SVM is a binary classifier, we will classify whether a flower is virginica or not).
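
The code from the original slide did not survive extraction, so the following is only a sketch of the loading step as described, assuming scikit-learn's bundled iris dataset. In that dataset, columns 2 and 3 are petal length and petal width, and target value 2 is virginica.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Keep only petal length and petal width (columns 2 and 3).
X = iris.data[:, 2:4]

# Binary target: 1 if the flower is iris virginica, 0 otherwise.
y = (iris.target == 2).astype(int)

print(iris.feature_names[2:4])
print(iris.target_names[2])
print(X.shape, int(y.sum()))
```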

Third, scale the dataset using StandardScaler. To make things easier, we will use the Pipeline API of sklearn.

Finally, we will train our model on the dataset.
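
The steps described on the Code slides can be sketched as follows. The slides name sklearn, StandardScaler, and Pipeline but the exact estimator and hyperparameters are not recoverable, so SVC with a linear kernel and C=1 is an assumption here, as are the two sample flowers used for prediction.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load iris: petal length and petal width, virginica vs. not virginica.
iris = load_iris()
X = iris.data[:, 2:4]
y = (iris.target == 2).astype(int)

# Scale the features, then fit a linear SVM (estimator choice is an assumption).
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="linear", C=1.0)),
])

# Train the model on the dataset.
svm_clf.fit(X, y)

# Hypothetical flowers: [petal length (cm), petal width (cm)].
large_petals = svm_clf.predict([[6.0, 2.2]])
small_petals = svm_clf.predict([[1.4, 0.2]])
print(large_petals, small_petals)
```

Wrapping the scaler and the classifier in one Pipeline means new samples passed to predict are scaled with the same statistics learned during training.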

The Output

Case Study
Tax Revenue in Government of South Lampung

Article link:
http://sunankalijaga.org/prosiding/index.php/icse/article/view/521/495

Support Vector Machine Predictive Analysis Implementation: Case
Study of Tax Revenue in Government of South Lampung

Publication:
Proceeding International Conference on Science and Engineering in 2020.

Problem:

• Difficulty in predicting the target tax revenue when arranging a revenue budget.
• Absence of a formula to calculate the potential tax revenue accurately.
• Lack of strategic management for local revenue improvement.


Purpose:
• Use Business Intelligence to predict the potential tax revenue of hotels and restaurants.

• Enable the government of South Lampung to develop appropriate strategies to improve local tax revenue and minimize tax reduction.

• Take the results of this study into consideration when determining the target for the hotel and restaurant tax sector in the coming year.

Limitations:
The number of hotel and restaurant visitors changes every year, resulting in fluctuations in the amount of tax.


Study area:
• The dataset used in this research is tax revenue data of hotels and restaurants in the South Lampung Region, Indonesia, from 2016 to 2018.

Dataset: 14495 rows in total
Year    Rows
2016    3017
2017    4747
2018    6729

Training: 2016 to 2017
Testing: 2017 to 2018

• The dataset contains nine fields: date, month, number of evidence, description, type of tax, sub-type of tax, debit, credit, and balance (saldo).

• The tools used for descriptive and predictive analysis in this study are Microsoft Excel and RapidMiner.

Methodology:

1. Preparing the data

• The data needs to be reprocessed into the amount of revenue per month and year.
• The data is split into training data and testing data.
• Training and testing data are each grouped into two parts: input data covers months 1 to 12, and target data starts at month 13.
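
One way to read the grouping above is as a sliding window: each 12-month block is an input and the following month is its target. The sketch below follows that reading. The function name make_windows and the placeholder series are hypothetical, and the study itself used Microsoft Excel and RapidMiner rather than Python.

```python
import numpy as np

def make_windows(series, window=12):
    """Split a monthly series into (inputs, targets).

    Each input row holds `window` consecutive months; the matching
    target is the month that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    inputs = np.array([series[i:i + window]
                       for i in range(len(series) - window)])
    targets = series[window:]
    return inputs, targets

# Placeholder data: 24 "months" numbered 1..24 (not real revenue figures).
monthly = np.arange(1, 25, dtype=float)
X, y = make_windows(monthly)

print(X.shape, y.shape)
print(X[0], "->", y[0])
```

With 24 months of data this yields 12 input rows, the first pairing months 1 to 12 with month 13.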

2. Normalization
• Rescales the existing data to smaller values to make the computation more efficient.
• The data is normalized to the range 0 to 1.
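
The slides do not state the formula used. Min-max scaling is the usual way to map values into the range 0 to 1, so the sketch below assumes it, with x as an original value, x' as its normalized value, and x_min and x_max as the smallest and largest values in the data.

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},
\qquad
x = x' \left( x_{\max} - x_{\min} \right) + x_{\min}
```

The second expression is the inverse mapping, which turns a normalized value back into its original unit.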


3. Data Processing
• Uses descriptive and predictive analysis.
• The predictive analysis is done using the Support Vector Machine method.
• The results of the SVM analysis are denormalized to obtain values in Rupiah.
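
The normalize, predict, denormalize sequence can be sketched in Python with scikit-learn. This is a stand-in for illustration only: the study used RapidMiner, the revenue series below is synthetic, and the choices of MinMaxScaler, SVR with an RBF kernel, and its hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic monthly revenue in Rupiah for 36 months (not the study's data).
rng = np.random.default_rng(0)
months = np.arange(36)
revenue = 1e8 + 2e6 * months + rng.normal(0.0, 1e6, size=36)

# Normalize to the range 0 to 1. For simplicity the scaler is fitted on the
# whole series; in practice it would be fitted on the training data only.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(revenue.reshape(-1, 1)).ravel()

# Inputs: 12 consecutive months. Target: the month that follows.
window = 12
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

# Predictive analysis with a support vector regressor.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X, y)

# Predict in normalized units, then denormalize back to Rupiah.
pred_scaled = model.predict(X[-1:])
pred_rupiah = scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
print("normalized prediction:", pred_scaled[0])
print("prediction in Rupiah: ", pred_rupiah[0])
```

The model only ever sees values between 0 and 1, so inverse_transform is the step that makes its output readable as a revenue figure.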


Results:
Descriptive analysis

Predictive analysis


• The forecast results and the original data do not differ much.

• The forecasting results can be used as a basis for determining targets in the coming year.

• Governments set tax revenue targets as the previous year's target plus 10-20%. Targets that are too high may not be achieved and make performance look poor.

THANKS!
Do you have any questions?

Mawadh Sairafi
Mawaddh Basouki
