Support Vector Machine in R - Using SVM To Predict Heart Diseases - Edureka

15/09/2019 Support Vector Machine In R | Using SVM To Predict Heart Diseases | Edureka
Support Vector Machine In R: Using SVM To Predict Heart Diseases

Last updated on May 22, 2019
Zulaikha Lateef
Zulaikha is a tech enthusiast working as a Research Analyst at Edureka.
Support Vector Machine In R:

With the exponential growth in AI, Machine Learning is becoming one of the most sought-after fields. As the name suggests,
Machine Learning is the ability to make machines learn from data by using various Machine Learning algorithms. In this
blog on Support Vector Machine In R, we’ll discuss how the SVM algorithm works, the various features of SVM, and how it is used in
the real world.
To get in-depth knowledge on Data Science and the various Machine Learning Algorithms, you can enroll for live Data Science
Certification Training by Edureka with 24/7 support and lifetime access.
Before we move any further, let me list down the topics I’ll be covering:
1. Introduction to Machine Learning
2. Types of Machine Learning
3. What is SVM?
4. How does SVM work?
5. Non-linear SVM
6. SVM Use Case
7. Support Vector Machine Demo
Introduction to Machine Learning

Machine learning is the science of getting computers to act by feeding them data and letting them learn a few tricks on their own,
without being explicitly programmed to do so.
The key to machine learning is the data. Machines learn just like we humans do. We need to collect information and data to
learn; similarly, machines must be fed data in order to learn and make decisions.
https://www.edureka.co/blog/support-vector-machine-in-r/
Introduction to machine learning – Support Vector Machine In R
To understand Machine Learning, let’s consider an example. Let’s say you want a machine to predict the value of a stock. In such
situations, you feed the machine relevant data and then create a model that is used to predict the value of the stock.
One thing to keep in mind is that, in general, the more relevant data you feed the machine, the better it learns and the more
accurate its predictions become.
Obviously, ML is not so simple. For a machine to analyze data and get useful insights from it, it must process and study the data by
running different algorithms on it. In this blog, we’ll be discussing one of the most widely used algorithms, SVM.
So, that was a brief introduction to Machine Learning. If you want to learn more about Machine Learning, check out this video by
our experts:
Machine Learning Tutorial | Edureka
Types Of Machine Learning

Now that you have a brief idea about Machine Learning, let’s look at the different ways in which machines learn.
1. Supervised learning
Supervised means to oversee or direct a certain activity and make sure it’s done correctly. In this type of learning the machine
learns under guidance.
At school, our teachers guided us and taught us. Similarly, in supervised learning you feed the model a set of data called training
data, which contains both input data and the corresponding expected output. The training data acts as a teacher and teaches the
model the correct output for a particular input so that it can make accurate decisions when later presented with new data.
2. Unsupervised learning
Unsupervised means to act without anyone’s supervision or direction.
In unsupervised learning, the model is given a data set which is neither labeled nor classified. The model explores the data and
draws inferences from it to find hidden structures in the unlabeled data.
An example of unsupervised learning is an adult like you and me. We don’t need a guide to help us with our daily activities; we
figure things out on our own without any supervision.
Types of machine learning – Support Vector Machine In R
3. Reinforcement learning
Reinforcement means to establish or encourage a pattern of behavior. Let’s say you were dropped off on an isolated island. What
would you do?
Initially, you’d panic and be unsure of what to do, where to get food from, how to live and so on. But after a while you will have to
adapt: you must learn how to live on the island, adapt to the changing climate, and learn what to eat and what not to eat.
You’re following what is known as the trial and error approach, because you’re new to these surroundings and the only way to learn is
to gain experience and then learn from it.
This is what reinforcement learning is. It is a learning method wherein an agent (you, stuck on an island) interacts with its
environment (the island) by producing actions and discovers errors or rewards.
If you want to get an in-depth explanation about Support Vector Machine, check out this video recorded by our Machine Learning
experts.
Support Vector Machine Tutorial Using R | Edureka
What Is SVM?
SVM (Support Vector Machine) is a supervised machine learning algorithm which is mainly used to classify data into different
classes. Unlike most algorithms, SVM makes use of a hyperplane which acts like a decision boundary between the various
classes.
SVM can be used to generate multiple separating hyperplanes such that the data is divided into segments and each segment
contains only one kind of data.
What is SVM? – Support Vector Machine In R
Before moving further, let’s discuss the features of SVM:
1. SVM is a supervised learning algorithm. This means that SVM trains on a set of labeled data. SVM studies the labeled
training data and then classifies any new input data depending on what it learned in the training phase.
2. A main advantage of SVM is that it can be used for both classification and regression problems. Though SVM is mainly
known for classification, the SVR (Support Vector Regressor) is used for regression problems.
3. SVM can be used for classifying non-linear data by using the kernel trick. The kernel trick means transforming data into
another dimension that has a clear dividing margin between classes of data, after which you can easily draw a hyperplane
between the various classes of data.
How Does SVM Work?

In order to understand how SVM works let’s consider a scenario.
For a second, pretend you own a farm and you have a problem–you need to set up a fence to protect your rabbits from a pack of
wolves. But where do you build your fence?
How does SVM work? – Support Vector Machine In R
One way to get around the problem is to build a classifier based on the position of the rabbits and wolves in your pasture.
So if I do that and try to draw a decision boundary between the rabbits and the wolves, it looks something like this. Now you can
clearly build a fence along this line.
In simple terms, this is exactly how SVM works. It draws a decision boundary, i.e. a hyperplane, between any two classes in order
to separate them or classify them.
Now, I know you’re wondering: how do you know where to draw the hyperplane?
The basic principle behind SVM is to draw a hyperplane that best separates the 2 classes. In our case the two classes are the
rabbits and the wolves. Before we move any further, let’s try to understand what a support vector is.
What is a Support Vector in SVM?

So, you start off by drawing a random hyperplane and then you check the distance between the hyperplane and the closest data
points from each class. These closest data points to the hyperplane are known as support vectors. And that’s where the name
support vector machine comes from.

The hyperplane is drawn based on these support vectors and an optimum hyperplane will have a maximum distance from each
of the support vectors. And this distance between the hyperplane and the support vectors is known as the margin.
To sum it up, SVM is used to classify data by using a hyperplane, such that the distance between the hyperplane and the support
vectors is maximum.
Alright, now let’s try to solve a problem.
Let’s say that I input a new data point and now I want to draw a hyperplane such that it best separates these two classes.
So, I start off by drawing a hyperplane and then I check the distance between the hyperplane and the support vectors. Here I’m
basically trying to check if the margin is maximum for this hyperplane.
But what if I draw the hyperplane like this? The margin for this hyperplane is clearly larger than the previous one. So, this is my
optimal hyperplane.
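The comparison of two candidate hyperplanes can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the six points, the labels and both hyperplanes are made up, and margin() is a hypothetical helper, not a function from any SVM package. It measures the distance from the hyperplane w.x + b = 0 to the closest point, which is the margin described above.

```r
# Toy 2-D data: two linearly separable classes (made-up points).
X <- rbind(c(1, 1), c(2, 1), c(1, 2),    # class -1
           c(4, 4), c(5, 4), c(4, 5))    # class +1
y <- c(-1, -1, -1, 1, 1, 1)

# Geometric margin of the hyperplane w.x + b = 0:
# the smallest signed distance y_i * (w.x_i + b) / ||w|| over all points.
# A negative result would mean some point lies on the wrong side.
margin <- function(w, b, X, y) {
  distances <- y * (X %*% w + b) / sqrt(sum(w^2))
  min(distances)
}

# Two candidate hyperplanes (hypothetical choices for illustration).
margin_a <- margin(w = c(1, 0), b = -3,   X, y)  # vertical line x = 3
margin_b <- margin(w = c(1, 1), b = -5.5, X, y)  # diagonal line x + y = 5.5

margin_a
margin_b
```

The second hyperplane leaves more room on both sides, so it would be preferred. An actual SVM solver searches over all hyperplanes for the one with the largest margin instead of comparing a handful of guesses.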

So far it was quite easy. Our data was linearly separable, which means that you could draw a straight line to separate the two
classes!
But what will you do if the data set is like this?
You can’t possibly draw a hyperplane like this! It doesn’t separate the two classes.
This is where Non-linear SVM is implemented.
Non-Linear Support Vector Machine (SVM)

Earlier in this blog, I mentioned how a kernel can be used to transform data into another dimension that has a clear dividing
margin between classes of data.
Kernel functions offer the user the option of transforming nonlinear spaces into linear ones.
Until this point, we were plotting our data in a 2-dimensional space, so we had only 2 variables, x and y.
A simple trick is to transform the two variables x and y into a new feature space involving a new variable z. Basically,
we’re visualizing the data in a 3-dimensional space.
When you transform the 2D space into a 3D space you can clearly see a dividing margin between the 2 classes of data. And now
you can go ahead and separate the two classes by drawing the best hyperplane between them.
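To make the 2D-to-3D idea concrete, here is a minimal base R sketch. The data is simulated, and both the choice z = x^2 + y^2 and the threshold 2.5 are assumptions picked for this toy example. A kernel SVM does not build z explicitly like this, but the effect is similar: the classes become separable by a flat plane in the new space.

```r
set.seed(42)  # arbitrary seed so the toy data is reproducible

# Toy data: an inner cluster (class 0) surrounded by an outer ring (class 1).
# No straight line in the (x, y) plane separates them.
n      <- 100
angle  <- runif(2 * n, 0, 2 * pi)
radius <- c(runif(n, 0, 1), runif(n, 2, 3))   # inner radii, then outer radii
x      <- radius * cos(angle)
y      <- radius * sin(angle)
label  <- rep(c(0, 1), each = n)

# New feature: squared distance from the origin.
z <- x^2 + y^2

# In (x, y, z) space the flat plane z = 2.5 separates the classes
# (the threshold 2.5 is picked by eye for this toy data).
predicted <- as.integer(z > 2.5)
mean(predicted == label)   # share of points on the correct side of the plane
```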
Non-linear Support Vector Machine – Support Vector Machine In R
This sums up the idea behind Non-linear SVM. To understand the real world applications of Support Vector Machine let’s look at
a use case.
Use Case – SVM

SVM as a classifier has been used in cancer classification since the early 2000s.
A group of professionals ran an experiment to classify colon cancer tissue by using SVM. The dataset consisted of about 2000
transmembrane protein samples and only about 50-200 gene samples were input into the SVM classifier. The sample input to
the SVM had both colon cancer tissue samples and normal colon tissues.
SVM Use Case – Support Vector Machine In R
The main objective of this study was to classify gene samples based on whether they are cancerous or not. So SVM was trained
using the 50-200 samples in order to discriminate non-tumor from tumor specimens. The SVM classifier was
very accurate even on a small data set. Its performance was compared to other classification algorithms like Naïve Bayes, and
in each case the SVM outperformed Naïve Bayes.
So after this experiment, it was clear that SVM classified the data more effectively and that it worked exceptionally well with small
data sets.
Support Vector Machine Demo

Alright, now let’s get into the practical part. We’ll run a demo to better understand how SVM can be used as a classifier.
A lot of people have this question in mind:
What Is SVM In R?
The answer is that R is an open source statistical programming language used mainly in the field of Data Science. In our
demo, we’ll be using the R programming language to build an SVM classifier, so if you don’t have a good understanding of R, I
suggest you watch this video recorded by our experts:
R Programming For Beginners | Edureka
You can also refer to this R for Data Science blog to learn more about how the entire Data Science workflow can be implemented
using R.
Problem Statement:
To study a heart disease data set and to model a classifier for predicting whether a patient is suffering from any heart disease or
not.
SVM Demo Problem statement – Support Vector Machine In R
In this demo, we’ll be using the caret package. The caret package, short for Classification And REgression Training, has
tons of functions that help to build predictive models. It contains tools for data splitting, pre-processing, feature selection,
tuning, unsupervised learning algorithms, etc.
So, to use it, we first need to install it using this command:
install.packages("caret")
The caret package is very helpful because it provides us direct access to various functions for training our model with various
machine learning algorithms like KNN, SVM, decision tree, linear regression, etc.
After installing it, we just need to load the package into our session. To do that, we have this code:
library('caret')
Our next step is to load the data set.
For this demo, we’ll be using a Heart Disease data set which consists of various attributes like the person’s age, sex, cholesterol
level, etc. In the same data set, we’ll have a target variable, which indicates whether a patient is suffering from any
heart disease or not.
In short, we’ll be using SVM to classify whether a person is going to be prone to heart disease or not.
The data set looks like this:
Heart Data set – Support Vector Machine In R 
This data set has 14 attributes and the last attribute is the target variable which we’ll be predicting using our SVM model.
Now it’s time to load the data set:
heart <- read.csv("/Users/zulaikha/Desktop/heart_dataset.csv", sep = ',', header = FALSE)
In the above line of code, we’re reading the dataset, which is stored in CSV format, and that’s why we’ve used the read.csv
function to read it from the specified path.
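The ‘sep’ argument sets the field separator, a comma here because the data is stored as CSV (Comma-Separated Values). Since header = FALSE, R does not read column names from the file and instead names the columns V1 through V14.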
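If you don’t have the file at hand, the effect of these arguments can be tried on a couple of inline rows. This is just a sketch: the two rows below are made up to mimic the 14-column layout and are not taken from the actual data set.

```r
# Two made-up rows in the same comma-separated layout (14 values per row).
csv_text <- "63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,286,0,2,108,1,1.5,2,3,3,1"

# text = lets read.csv() parse a string instead of a file on disk.
sample_rows <- read.csv(text = csv_text, sep = ",", header = FALSE)

names(sample_rows)   # V1 ... V14, generated because header = FALSE
dim(sample_rows)     # number of rows and columns
```

This is why the target column is referred to as V14 in the rest of the demo.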
Now that we’ve imported our dataset, let’s check its structure.
To check the structure of a data frame, we can call the function str():
str(heart)
Structure of Data set – Support Vector Machine In R
The output shows us that our dataset consists of 300 observations, each with 14 attributes.
If you want to display the first 6 rows of the data set, use the head() function:
head(heart)

Head of Data set – Support Vector Machine In R
Our next step is to split the data into a training set and a testing set. This is also called data splitting.
We’ll be using the training set specifically for building our model and the testing set for evaluating it:
intrain <- createDataPartition(y = heart$V14, p = 0.7, list = FALSE)
training <- heart[intrain, ]
testing <- heart[-intrain, ]
The caret package provides the createDataPartition() function for partitioning our data into a training set and a test set.
We’ve passed 3 parameters to this createDataPartition() function:
The “y” parameter takes the variable according to which the data needs to be partitioned. In our case, the target variable is
V14, so we are passing heart$V14.
The “p” parameter holds a decimal value in the range 0-1 that gives the proportion of the data to put in the training set. We are
using p = 0.7, which means the data is split in a 70:30 ratio: 70% of the data is used for training and the remaining 30% for
testing the model.
The “list” parameter controls whether the result is returned as a list or a matrix. We are passing FALSE so that a matrix is returned.
createDataPartition() returns a matrix, “intrain”, that holds the row indices of the training observations, not the data itself. We
use these indices to select the training rows into the ‘training’ variable, and the remaining rows, i.e. the other 30% of the data,
are stored in the ‘testing’ variable.
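To see what an index-based 70:30 split looks like without caret, here is a simplified base R sketch. It is not caret’s implementation: the target vector is made up (300 labels, like the size of our data set) and the sampling is a plain per-class draw, which only approximates the idea of keeping class proportions similar in both sets.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Made-up target with the same size as the tutorial's data: 300 labels.
target <- rep(c(0, 1), times = c(160, 140))

# For each class, draw 70% of its row indices at random.
rows_by_class <- split(seq_along(target), target)
train_rows <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.7 * length(rows)))
}))

length(train_rows)            # number of training rows
table(target[train_rows])     # class counts in the training split
table(target[-train_rows])    # class counts in the test split
```

As in the demo, train_rows holds row positions only. The actual training and test data come from indexing the data frame with train_rows and -train_rows.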
Next, to check the dimensions of our training and testing data frames, we can use these:
dim(training)
dim(testing)
Our next step is to clean the data, so if there are any missing values or inconsistent values, they have to be dealt with before we
build the training model.
We’ll be using the anyNA() function, which checks for any missing (NA) values:
anyNA(heart)
On running this, we get the return value FALSE, which means that there are no missing values in our dataset.
Next, we’re checking the summary of our data by using the summary() function:
summary(heart)
Summary Data set – Support Vector Machine In R
The output shows that the values of the various variables are not standardized.
For example, V14, which is our target variable, holds only 2 values, either 0 or 1.
It should therefore be a categorical variable. To convert it to a categorical variable, we need to turn it into a factor:
training[["V14"]] <- factor(training[["V14"]])
The above code converts the training data frame’s “V14” column to a factor variable.
Our next step is to train our model.
Before we train our model, we’ll first call the trainControl() function. It controls the computational details of the
train() function provided by the caret package, such as how resampling is done. The train() function can then train our data on
different algorithms.
First, let’s focus on the trainControl() function:
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

The trainControl() function here takes 3 parameters.
The “method” parameter defines the resampling method. In this demo we’ll be using repeatedcv, the repeated cross-validation
method.
The “number” parameter holds the number of folds (or, for other methods, the number of resampling iterations).
The “repeats” parameter holds the number of complete sets of folds to compute for repeated cross-validation. We are setting
number = 10 and repeats = 3, i.e. 10-fold cross-validation repeated 3 times.
trainControl() returns a list, which we pass to our train() function.
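What number = 10 and repeats = 3 amount to can be sketched in base R. This is a simplified illustration, not caret’s internals: make_folds() is a hypothetical helper, the row count of 210 assumes a 70% split of 300 rows, and the folds here are purely random.

```r
set.seed(1)  # arbitrary seed

n_rows  <- 210   # assumed size of the training set (70% of 300 rows)
folds   <- 10
repeats <- 3

# One repeat: shuffle the row numbers and deal them into k roughly equal folds.
make_folds <- function(n, k) split(sample(n), rep_len(seq_len(k), n))

# Three repeats, each with a fresh shuffle, flattened into one list of folds.
resamples <- unlist(lapply(seq_len(repeats), function(r) make_folds(n_rows, folds)),
                    recursive = FALSE)

length(resamples)          # total number of held-out folds: 10 * 3
lengths(resamples)[1:10]   # sizes of the folds in the first repeat
```

Each of these folds is held out once while the model is fit on the remaining rows, and the reported accuracy is the average over all of them.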
svm_Linear <- train(V14 ~ ., data = training, method = "svmLinear",
                    trControl = trctrl,
                    preProcess = c("center", "scale"),
                    tuneLength = 10)
The train() function is called with the “method” parameter set to “svmLinear”; caret fits this model through the kernlab package,
which therefore needs to be installed as well. The formula “V14 ~ .” says that V14 is the target variable and all other attributes
are used as predictors. The “trControl” parameter is passed the result of our trainControl() call. The “preProcess” parameter
specifies how to preprocess our training data.
We are passing 2 values to it, “center” and “scale”, which center and scale the data.
After pre-processing, each predictor has a mean of approximately 0 and a standard deviation of 1. The
“tuneLength” parameter holds an integer value that sets how many candidate values of each tuning parameter are tried.
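The effect of centering and scaling can be checked on a tiny example with base R’s scale() function. The values below are made up, and this only mirrors what the “center” and “scale” options do to each predictor. It is not a call into caret.

```r
# Made-up values for two columns on very different scales.
toy <- data.frame(age  = c(29, 45, 54, 61, 77),
                  chol = c(180, 210, 250, 300, 410))

# scale() subtracts each column's mean and divides by its standard deviation.
scaled <- scale(toy, center = TRUE, scale = TRUE)

round(colMeans(scaled), 10)   # approximately 0 for each column
apply(scaled, 2, sd)          # 1 for each column
```

Putting predictors on a common scale matters for SVM because the margin is based on distances, so a column with large raw values would otherwise dominate.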
You can check the result of our train() call, which we saved in the svm_Linear variable:
svm_Linear
SVM linear output – Support Vector Machine In R
Since it’s a linear model, it was tested only at the value C = 1.
Now our model is trained with C = 1, and we are ready to predict classes for our test set using the predict() method.
The caret package provides a predict() method for trained models. We are passing 2 arguments. The first is our trained
model and the second, “newdata”, holds our testing data frame. For a classification model like ours, predict() returns a factor
of predicted classes, which we save in the test_pred variable.
test_pred <- predict(svm_Linear, newdata = testing)
test_pred

Test pred output – Support Vector Machine In R
Now let’s check the accuracy of our model. We’re going to use the confusion matrix to measure the accuracy:
confusionMatrix(table(test_pred, testing$V14))
Confusion matrix output 1 – Support Vector Machine In R
The output shows that our model accuracy for test set is 86.67%.
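The accuracy figure is simply the share of test rows on the diagonal of the confusion matrix. The base R sketch below shows the arithmetic on made-up labels. It assumes a test set of 90 rows (30% of 300) and invents which rows are misclassified, so only the final percentage is meant to line up with the output above.

```r
# Made-up true labels for a 90-row test set (class counts are invented).
actual    <- factor(rep(c(0, 1), times = c(48, 42)))

# Start from perfect predictions, then pretend 12 rows were misclassified.
predicted <- actual
flip      <- c(1:5, 49:55)
predicted[flip] <- ifelse(actual[flip] == "0", "1", "0")

cm <- table(predicted, actual)
cm

# Accuracy = correctly classified rows (the diagonal) / all rows.
accuracy <- sum(diag(cm)) / sum(cm)
round(100 * accuracy, 2)
```

With 78 of 90 rows classified correctly, the accuracy rounds to 86.67%.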
By following the above procedure, we can build our svmLinear classifier.
We can also customize the selection of the C value (cost) in the linear classifier. This can be done by supplying values to a grid
search.
The next code snippet shows how to build and tune an SVM classifier with different values of C.
We put some values of C into the “grid” data frame using expand.grid(). The next step is to use this data frame to test our
classifier at specific C values. It is passed to the train() function through the tuneGrid parameter.
# C must be positive, so the grid starts at 0.01 rather than 0
grid <- expand.grid(C = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5))
# tuneGrid supplies the candidate values, so tuneLength is not needed
svm_Linear_Grid <- train(V14 ~ ., data = training, method = "svmLinear",
                         trControl = trctrl,
                         preProcess = c("center", "scale"),
                         tuneGrid = grid)
svm_Linear_Grid
plot(svm_Linear_Grid)

SVM Plot – Support Vector Machine In R
The above plot shows that our classifier gives the best accuracy at C = 0.05. Let’s try to make predictions using this model for
our test set.
test_pred_grid <- predict(svm_Linear_Grid, newdata = testing)
test_pred_grid
Let’s check its accuracy using the confusion matrix.
confusionMatrix(table(test_pred_grid, testing$V14))
Confusion matrix 2 – Support Vector Machine In R
The results of the confusion matrix show that this time the accuracy on the test set is 87.78%, which is slightly higher than our
previous result.
With this, we come to the end of this blog. I hope you found this informative and helpful, stay tuned for more tutorials on
machine learning.
To get in-depth knowledge of the different machine learning algorithms along with their various applications, you can enroll here
for live online training with 24/7 support and lifetime access.
Got a question for us? Mention it in the comments section.