Support Vector Machine for Regression

This walkthrough uses sample data from a wine company. The data contains the
physical and chemical properties of each wine together with its quality rating.
A support vector machine (SVM) is used to build a regression model that predicts
wine quality. The steps are as follows.
Tool used: R
Package: e1071
Dataset: wine.csv
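1) The first and most important step in building an SVM is feature selection.
The e1071 package provides tune.randomForest, a wrapper around the
randomForest package, and a random forest is a widely used way to rank
variables. For a regression target such as quality, the forest reports
IncNodePurity: the total decrease in node impurity (residual sum of
squares) from splits on each variable, averaged over all trees. The Gini
index is the impurity measure only for classification forests.
Pass the formula and the training dataset, then read the importance from
the best model:
tunerf <- tune.randomForest(quality ~ ., data = wine)
tunerf$best.model$importance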
                      IncNodePurity
fixed.acidity              233.2445
volatile.acidity           390.9797
citric.acid                252.6819
residual.sugar             277.9426
chlorides                  294.9922
free.sulfur.dioxide        385.1084
total.sulfur.dioxide       299.3479
density                    414.0921
pH                         259.2492
sulphates                  223.4536
alcohol                    627.9292
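
The ranking can be turned into a predictor list with a few lines of base R. The sketch below re-enters the IncNodePurity values printed above, sorts them, and keeps the top eight. The cutoff of eight is an assumption, inferred from the eight predictors in the model formula used in the later steps; the names `inc_node_purity`, `ranked` and `top8` are illustrative.

```r
# IncNodePurity values copied from the table above
inc_node_purity <- c(
  fixed.acidity = 233.2445, volatile.acidity = 390.9797,
  citric.acid = 252.6819, residual.sugar = 277.9426,
  chlorides = 294.9922, free.sulfur.dioxide = 385.1084,
  total.sulfur.dioxide = 299.3479, density = 414.0921,
  pH = 259.2492, sulphates = 223.4536, alcohol = 627.9292
)

# Rank variables from most to least important
ranked <- sort(inc_node_purity, decreasing = TRUE)

# Keep the eight highest-ranked variables (assumed cutoff)
top8 <- names(ranked)[1:8]
print(top8)
```

With these values, alcohol ranks first, and the eight retained variables are the same set that appears in the SVM formula below.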

2) An SVM has many parameters that can be tuned while building a model.
To identify the best set of parameters, follow these steps.
a. Build the model with both regression types (eps-regression and
nu-regression), which optimize different error functions.
b. Build the model with all four kernel functions (linear, polynomial,
radial and sigmoid).
c. Pass a range of values for cost, gamma and epsilon.
d. The following code iterates over all the ranges and picks the best set
of parameters (cost, gamma and epsilon). svm() takes one type and one
kernel per call, so tune() is run once for each type and kernel pair and
the result with the lowest error is kept.
wine_formula <- quality ~ volatile.acidity + residual.sugar + chlorides +
  free.sulfur.dioxide + total.sulfur.dioxide + density + pH + alcohol
combos <- expand.grid(type = c("eps-regression", "nu-regression"),
  kernel = c("linear", "radial", "polynomial", "sigmoid"),
  stringsAsFactors = FALSE)
tuned <- Map(function(tp, kn) tune(svm, wine_formula, data = wine, type = tp,
  kernel = kn, ranges = list(epsilon = seq(0, 0.6, 0.1), gamma = 2^(-2:2),
  cost = 2^(-2:3))), combos$type, combos$kernel)
best <- which.min(sapply(tuned, function(x) x$best.performance))
tune_1 <- tuned[[best]]

Parameter tuning of svm:


- sampling method: 10-fold cross validation
- best parameters:
  epsilon  cost  gamma  SVM-Type        SVM-Kernel
  0.1      2     2      eps-regression  radial
- best performance: 0.4201753
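
To get a feel for how much work this search involves, the grid can be counted with base R alone. The sketch below rebuilds the three ranges from the tuning call and multiplies by the two regression types, four kernels and ten folds reported in the output; the variable names are illustrative.

```r
# Same parameter ranges as in the tuning call above
ranges <- list(epsilon = seq(0, 0.6, 0.1),
               gamma   = 2^(-2:2),
               cost    = 2^(-2:3))

# One row per parameter combination
grid <- expand.grid(ranges)
n_combos <- nrow(grid)

n_pairs <- 2 * 4   # regression types x kernel functions
n_folds <- 10      # 10-fold cross validation, as reported in the output

# Number of cross-validation fits for the whole search
cv_fits <- n_combos * n_pairs * n_folds
print(c(combinations = n_combos, cv_fits = cv_fits))
```

Some of these fits are redundant, since gamma has no effect on the linear kernel, so narrowing the ranges per kernel is one way to shorten the search.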

3) Build the SVM model with the optimized set of parameters


best_model <- svm(quality ~ volatile.acidity + residual.sugar + chlorides +
  free.sulfur.dioxide + total.sulfur.dioxide + density + pH + alcohol,
  data = wine, type = "eps-regression", kernel = "radial", cost = 2,
  gamma = 2, epsilon = 0.1)
Call:
best_model <- svm(quality ~ volatile.acidity + residual.sugar + chlorides +
free.sulfur.dioxide + total.sulfur.dioxide + density + pH + alcohol, data = wine,
type = "eps-regression", kernel = "radial", cost = 2, gamma = 2, epsilon = 0.1)
Parameters:
SVM-Type: eps-regression
SVM-Kernel: radial
cost: 2
gamma: 2
epsilon: 0.1
Number of Support Vectors: 3839

4) Root mean square error (RMSE) is used as the metric for SVM regression.
It is computed here on the same data the model was trained on.


rmse <- function(error) sqrt(mean(error^2))
error <- wine$quality - predict(best_model)
best_model_RMSE <- rmse(error)
best_model_RMSE
[1] 0.2016594
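
The tuning step reports its best performance as a cross-validated mean squared error (the default error measure in e1071's tune() for numeric predictions), whereas the value above is an RMSE. A small sketch of putting the two on the same scale, using only numbers printed in this document; the toy error vector is made up to show how the helper behaves.

```r
# Helper: root mean square of a vector of errors
rmse <- function(error) sqrt(mean(error^2))

# Toy check: errors of 3 and -4 give sqrt((9 + 16) / 2)
toy_rmse <- rmse(c(3, -4))

cv_mse     <- 0.4201753     # "best performance" from the tuning output
cv_rmse    <- sqrt(cv_mse)  # cross-validated error on the RMSE scale
train_rmse <- 0.2016594     # RMSE reported above

# How much larger the cross-validated error is than the reported one
ratio <- cv_rmse / train_rmse
print(round(c(cv_rmse = cv_rmse, ratio = ratio), 3))
```

The cross-validated RMSE comes out at roughly 0.648, about three times the figure computed from the fitted values, so the latter is best read as an optimistic estimate.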

5) Compare with linear regression in terms of RMSE


LM_2 <- lm(formula = quality ~ fixed.acidity + volatile.acidity +
  residual.sugar + free.sulfur.dioxide + sulphates + alcohol, data = wine)
error <- wine$quality - predict(LM_2)
LM_2_RMSE <- rmse(error)
LM_2_RMSE
[1] 0.7561817
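
Because both models are scored on the rows they were fitted to, a held-out comparison may give a fairer picture. A minimal sketch follows, assuming the e1071 package is installed. It uses a small synthetic data frame (`toy`, hypothetical) in place of wine.csv, and the 70/30 split and seed are illustrative choices, not part of the original analysis.

```r
library(e1071)

set.seed(42)

# Synthetic stand-in for wine.csv (hypothetical data, not the wine dataset)
n <- 300
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- sin(2 * pi * toy$x1) + toy$x2 + rnorm(n, sd = 0.2)

# Hold out 30% of the rows for evaluation
train_idx <- sample(n, size = 210)
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

rmse <- function(error) sqrt(mean(error^2))

# Fit both models on the training rows only
svm_fit <- svm(y ~ x1 + x2, data = train, type = "eps-regression",
               kernel = "radial", cost = 2, gamma = 2, epsilon = 0.1)
lm_fit  <- lm(y ~ x1 + x2, data = train)

# Score both models on the held-out rows
svm_rmse <- rmse(test$y - predict(svm_fit, newdata = test))
lm_rmse  <- rmse(test$y - predict(lm_fit, newdata = test))
print(c(svm = svm_rmse, lm = lm_rmse))
```

For the wine data, the same pattern would apply with `wine` in place of `toy` and the formulas from steps 3 and 5.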
