Zulaikha Lateef
Zulaikha is a tech enthusiast working as a Research Analyst at Edureka.
To get in-depth knowledge on Data Science and the various Machine Learning Algorithms, you can enroll for live Data Science
Certification Training by Edureka with 24/7 support and lifetime access.
Before we move any further, let me list down the topics I’ll be covering:
The key to machine learning is the data. Machines learn much as humans do: we need to collect information and data to
learn, and machines must likewise be fed data in order to learn and make decisions.
https://www.edureka.co/blog/support-vector-machine-in-r/ 1/15
15/09/2019 Support Vector Machine In R | Using SVM To Predict Heart Diseases | Edureka
To understand Machine learning, let’s consider an example. Let’s say you want a machine to predict the value of a stock. In such
situations, you just feed the machine with relevant data. After that, you must create a model which is used to predict the value of
the stock.
One thing to keep in mind: in general, the more relevant, good-quality data you feed the machine, the better it can learn and the more accurate its predictions tend to be.
Obviously, ML is not so simple. For a machine to analyze and get useful insights from data, it must process and study the data by
running different algorithms on it. And in this blog, we’ll be discussing one of the most widely used algorithms called SVM.
1. Supervised learning
Supervised means to oversee or direct a certain activity and make sure it’s done correctly. In this type of learning the machine
learns under guidance.
At school, our teachers guided us and taught us. Similarly, in supervised learning you feed the model a set of data called training
data, which contains both input data and the corresponding expected output. The training data acts as a teacher and teaches the
model the correct output for a particular input so that it can make accurate decisions when later presented with new data.
2. Unsupervised learning
In unsupervised learning, the model is given a data set which is neither labeled nor classified. The model explores the data and
draws inferences from it to uncover hidden structures in the unlabeled data.
Think of an adult like you and me: we don’t need a guide to help us with our daily activities. We
figure things out on our own without any supervision.
3. Reinforcement learning
Reinforcement means to establish or encourage a pattern of behavior. Let’s say you were dropped off on an isolated island. What
would you do?
Initially, you’d panic and be unsure of what to do, where to get food from, how to live and so on. But after a while you would have to
adapt: you must learn how to live on the island, adapt to the changing climate, and learn what to eat and what not to eat.
You’re following what is known as the trial and error approach, because you’re new to these surroundings and the only way to learn is
to gain experience and then learn from it.
This is what reinforcement learning is. It is a learning method wherein an agent (you, stuck on an island) interacts with its
environment (the island) by producing actions and discovering errors or rewards.
What Is SVM?
SVM (Support Vector Machine) is a supervised machine learning algorithm which is mainly used to classify data into different
classes. It does so by finding a hyperplane which acts as a decision boundary between the various
classes.
SVM can be used to generate multiple separating hyperplanes such that the data is divided into segments and each segment
contains only one kind of data.
Before moving further, let’s discuss the features of SVM:
1. SVM is a supervised learning algorithm. This means that SVM trains on a set of labeled data. SVM studies the labeled
training data and then classifies any new input data depending on what it learned in the training phase.
2. A main advantage of SVM is that it can be used for both classification and regression problems. Though SVM is mainly
known for classification, the SVR (Support Vector Regressor) is used for regression problems.
3. SVM can be used for classifying non-linear data by using the kernel trick. The kernel trick means transforming data into
another dimension that has a clear dividing margin between classes of data. After that, you can easily draw a hyperplane
between the various classes of data.
For a second, pretend you own a farm and you have a problem–you need to set up a fence to protect your rabbits from a pack of
wolves. But where do you build your fence?
One way to get around the problem is to build a classi er based on the position of the rabbits and wolves in your pasture.
So if I do that, and try to draw a decision boundary between the rabbits and the wolves, it looks something like this. Now you can
clearly build a fence along this line.
In simple terms, this is exactly how SVM works. It draws a decision boundary, i.e. a hyperplane between any two classes in order
to separate them or classify them.
Now you may be wondering: how do you know where to draw the hyperplane?
The basic principle behind SVM is to draw a hyperplane that best separates the two classes. In our case the two classes are the
rabbits and the wolves. Before we move any further, let’s try to understand what a support vector is.
Support vectors are the data points of each class that lie closest to the hyperplane. The hyperplane is drawn based on these
support vectors, and an optimum hyperplane is the one with the maximum distance from the nearest support vectors on either side.
This distance between the hyperplane and the support vectors is known as the margin.
To sum it up, SVM is used to classify data by using a hyperplane, such that the distance between the hyperplane and the support
vectors is maximized.
Let’s say that I input a new data point and now I want to draw a hyperplane such that it best separates these two classes.
So, I start off by drawing a hyperplane and then I check the distance between the hyperplane and the support vectors. Here I’m
basically trying to check if the margin is maximum for this hyperplane.
But what if I draw the hyperplane like this? The margin for this hyperplane is clearly more than the previous one. So, this is my
optimal hyperplane.
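The comparison of candidate hyperplanes described above can be sketched in a few lines of base R. Everything here is made up for illustration: the six points, the two candidate lines, and the helper margin_of(). A real SVM solver finds the maximum-margin hyperplane by optimization rather than by comparing hand-picked candidates.

```r
# Toy 2-D data: two linearly separable classes (made-up points).
pts <- data.frame(
  x     = c(1, 2, 2, 6, 7, 7),
  y     = c(1, 2, 0, 5, 7, 6),
  class = c(-1, -1, -1, 1, 1, 1)
)

# In 2-D a hyperplane is the line w1*x + w2*y + b = 0.
# margin_of() is a hypothetical helper: it returns the smallest signed
# distance from any point to the line. A negative value would mean that
# at least one point sits on the wrong side.
margin_of <- function(w, b, data) {
  signed <- data$class * (w[1] * data$x + w[2] * data$y + b) / sqrt(sum(w^2))
  min(signed)
}

# Candidate 1: the vertical line x = 3, which hugs the first class.
m1 <- margin_of(w = c(1, 0), b = -3, data = pts)

# Candidate 2: the line x + y = 7.5, roughly midway between the classes.
m2 <- margin_of(w = c(1, 1), b = -7.5, data = pts)

c(candidate1 = m1, candidate2 = m2)
```

Both lines separate the two classes, but the second one keeps a larger distance to its nearest points, so under the maximum-margin idea it is the better hyperplane of the two.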
So far it was quite easy: our data was linearly separable, which means that you could draw a straight line to separate the two
classes!
But when the data is not linearly separable, you can’t possibly draw a hyperplane like this! It doesn’t separate the two classes.
This is where kernel functions come in: they offer the user the option of transforming nonlinear spaces into linear ones.
Until this point, we were plotting our data on 2-dimensional space. So, we had only 2 variables, x and y.
A simple trick would be transforming the two variables x and y into a new feature space involving a new variable z. Basically,
we’re visualizing the data on a 3-dimensional space.
When you transform the 2D space into a 3D space you can clearly see a dividing margin between the 2 classes of data. And now
you can go ahead and separate the two classes by drawing the best hyperplane between them.
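Here is a minimal base R sketch of that transformation. The blog does not say how z is computed, so the choice z = x^2 + y^2 (the squared distance from the origin) is an assumption, and so are the made-up points: an inner cluster surrounded by a ring, which no straight line in the x-y plane can separate.

```r
# Made-up data that is not linearly separable in 2-D:
# an inner cluster (class "A") surrounded by a ring (class "B").
angles <- seq(0, 2 * pi, length.out = 9)[-9]  # 8 evenly spaced angles
inner <- data.frame(x = 0.5 * cos(angles), y = 0.5 * sin(angles), class = "A")
outer <- data.frame(x = 3 * cos(angles),   y = 3 * sin(angles),   class = "B")
pts <- rbind(inner, outer)

# Add the new variable z (assumed here to be x^2 + y^2).
pts$z <- pts$x^2 + pts$y^2

# In the (x, y, z) space the flat plane z = 4 now separates the classes.
predicted <- ifelse(pts$z < 4, "A", "B")
table(predicted, actual = pts$class)
```

In practice a kernel SVM does not build the new variable explicitly; the kernel function lets it work as if the data had been mapped to the higher-dimensional space.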
This sums up the idea behind Non-linear SVM. To understand the real world applications of Support Vector Machine let’s look at
a use case.
A group of professionals held an experiment to classify colon cancer tissue by using SVM. The dataset consisted of about 2000
transmembrane protein samples, and only about 50-200 gene samples were input into the SVM classifier. The sample input to
the SVM had both colon cancer tissue samples and normal colon tissue samples.
The main objective of this study was to classify gene samples based on whether they are cancerous or not. So the SVM was trained
using the 50-200 samples in order to discriminate non-tumor from tumor specimens. The SVM classifier performed accurately
even on this small data set. Its performance was compared to other classification algorithms like Naive Bayes, and
in each case the SVM outperformed Naive Bayes.
So after this experiment, it was clear that SVM classified the data more effectively and worked exceptionally well with small
data sets.
You can also refer to this R for Data Science blog to learn more about how the entire Data Science workflow can be implemented
using R.
Problem Statement:
To study a heart disease data set and to model a classifier for predicting whether a patient is suffering from any heart disease or
not.
In this demo, we’ll be using the caret package. The caret package, short for Classification And REgression Training, has
tons of functions that help to build predictive models. It contains tools for data splitting, pre-processing, feature selection,
tuning, unsupervised learning algorithms, etc.
install.packages("caret")
The caret package is very helpful because it provides us direct access to various functions for training our model with various
machine learning algorithms like KNN, SVM, decision tree, linear regression, etc.
After installing it, we just need to load the package into our console. To do that, we have this code:
library('caret')
For this demo, we’ll be using a Heart Disease data set which consists of various attributes like the person’s age, sex, cholesterol
level, etc. The same data set has a target variable, which records whether a patient is suffering from any
heart disease or not.
In short, we’ll be using SVM to classify whether a person is going to be prone to heart disease or not.
We’re reading the dataset, which is stored in CSV format, and that’s why we’ve used the read.csv
function to read it from the specified path.
The ‘sep’ attribute indicates the field separator. Here it is a comma, because the data is stored as CSV (Comma Separated Values).
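As a rough sketch of how read.csv and ‘sep’ behave, the snippet below writes a tiny stand-in file and reads it back. The file contents, csv_path and heart_demo are made up for illustration, and the real path to the heart data set is not shown here. Passing header = FALSE is also an assumption on my part: column names like V14 suggest the file was read without a header row.

```r
# Write a tiny stand-in CSV to a temporary file (made-up values, no header row).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("63,1,233,1",
             "41,0,204,0",
             "57,1,354,1"), csv_path)

# sep = "," says the fields are comma separated; header = FALSE makes R
# name the columns V1, V2, ... instead of using the first row as names.
heart_demo <- read.csv(csv_path, sep = ",", header = FALSE)

str(heart_demo)
```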
Now that we’ve imported our dataset, let’s check its structure by calling the str() function on the data frame:
str(heart)
The output shows us that our dataset consists of 300 observations each with 14 attributes.
If you want to display the first rows of the data set (6 by default), use the head() function:
head(heart)
Our next step is to split the data into a training set and a testing set. This is also called data splicing.
We’ll be using the training set specifically for building our model and the testing set for evaluating it:
The caret package provides the createDataPartition() function, which partitions our data into a training set and a test set.
The “y” parameter takes the variable according to which the data needs to be partitioned. In our case, the target variable is
V14, so we are passing heart$V14.
The “p” parameter holds a decimal value in the range 0-1 that gives the proportion of the data to put in the training set. We are
using p = 0.7, which means the data is split in a 70:30 ratio: 70% of the data is used for training and the remaining 30% for
testing the model.
The “list” parameter decides whether the result is returned as a list or as a matrix. We are passing FALSE so that a matrix is returned instead of a list.
createDataPartition() returns a matrix, “intrain”, that holds the row indices of the training records. We use these indices to select
the training rows and store them in the ‘training’ variable; the rest of the data, i.e. the remaining 30%, is stored in the ‘testing’ variable.
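The split described above can be sketched as follows. The data frame demo, its columns and the seed value are stand-ins made up for illustration, so this is not the exact code run on the heart data set. With the real data you would pass heart$V14 as y.

```r
library(caret)

# Stand-in data: 100 made-up rows with a binary target in column V3.
set.seed(42)  # arbitrary seed; it only makes the split repeatable
demo <- data.frame(V1 = rnorm(100), V2 = rnorm(100),
                   V3 = factor(rep(c(0, 1), each = 50)))

# Row indices for the training set: about 70% of the rows, sampled within
# each class of V3 so both sets keep a similar class balance.
intrain <- createDataPartition(y = demo$V3, p = 0.7, list = FALSE)

training_demo <- demo[intrain, ]   # rows selected by the indices
testing_demo  <- demo[-intrain, ]  # all remaining rows

dim(training_demo)
dim(testing_demo)
```

Setting a seed first is worth the extra line: without it, every run draws a different split and the accuracy figures change from run to run.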
Next, for checking the dimensions of our training data frame and testing data frame, we can use these:
dim(training)
dim(testing)
Our next step is to clean the data: if there are any missing or inconsistent values, they have to be dealt with before we
build the training model.
We’ll be using the anyNA() function, which checks for any missing (NA) values:
anyNA(heart)
On running this, we get FALSE as the return value, which means that there are no missing values in our dataset.
Next, we’re checking the summary of our data by using the summary() function:
summary(heart)
The output shows that the values of the various variables are not standardized.
For example, V14, which is our target variable, holds only 2 values, either 0 or 1.
It should therefore be treated as a categorical variable. To convert it to a categorical variable, we need to turn it into a factor:
training[["V14"]] = factor(training[["V14"]])
The above code will convert the training data frame’s “V14” column to a factor variable.
First, let’s focus on the trainControl() method:
The “method” parameter defines the resampling method. In this demo we’ll be using repeatedcv, the repeated cross-validation
method.
The next parameter is “number”, which holds the number of folds (or resampling iterations).
The “repeats” parameter holds the number of complete sets of folds to compute for our repeated cross-validation. We are setting number = 10
and repeats = 3.
The trainControl() method returns a list. We are going to pass this list to our train() method.
The train() method should be passed the “method” parameter as “svmLinear”. The formula
“V14 ~ .” tells it to use V14 as the target variable and all other attributes as predictors in our classifier. The “trControl” parameter
should be passed the result of our trainControl() method. The “preProcess” parameter is for preprocessing our training data:
with centering and scaling, each predictor ends up with a mean of approximately 0 and a standard deviation of 1. The
“tuneLength” parameter holds an integer value that controls how many candidate values of each tuning parameter are tried.
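Put together, the two calls described above might look like the sketch below. The training data is a made-up stand-in (training_demo with target V3), the seed is arbitrary, and svm_demo is a name chosen for this example. With the real data the formula would be V14 ~ . and the data argument would be the training data frame. Note that method = "svmLinear" needs the kernlab package to be installed.

```r
library(caret)  # method = "svmLinear" also needs the kernlab package installed

# Stand-in training data: two made-up numeric predictors and a binary factor
# target V3 whose classes are shifted apart so a linear boundary exists.
set.seed(42)  # arbitrary seed, for repeatable resampling
training_demo <- data.frame(V1 = c(rnorm(40, 0), rnorm(40, 3)),
                            V2 = c(rnorm(40, 0), rnorm(40, 3)),
                            V3 = factor(rep(c(0, 1), each = 40)))

# 10-fold cross-validation, repeated 3 times.
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# V3 ~ . uses every other column as a predictor; the predictors are
# centered and scaled before the linear SVM is fitted.
svm_demo <- train(V3 ~ ., data = training_demo, method = "svmLinear",
                  trControl = trctrl,
                  preProcess = c("center", "scale"),
                  tuneLength = 10)

svm_demo
```

For svmLinear, caret’s default grid holds C constant at 1 whatever tuneLength is, so the printed summary reports a single C value.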
You can check the result of our train() method. We are saving its results in the svm_Linear variable.
svm_Linear
Now our model is trained with a C value of 1. We are ready to predict classes for our test set, using the predict() method.
The caret package provides the predict() method for predicting results. We are passing 2 arguments: the first is our trained
model, and the second, “newdata”, holds our testing data frame. The predict() method returns the predicted class for each test row, and we are saving it in the
test_pred variable.
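A self-contained sketch of the prediction step is shown below. It retrains a small model on made-up stand-in data (demo, training_demo, testing_demo and svm_demo are all names invented for this example), and it uses plain 5-fold cross-validation only to keep the run short. With the real model the call would be predict(svm_Linear, newdata = testing).

```r
library(caret)  # "svmLinear" relies on the kernlab package

# Made-up stand-in data with a binary factor target V3.
set.seed(42)  # arbitrary seed, for repeatability
demo <- data.frame(V1 = c(rnorm(50, 0), rnorm(50, 3)),
                   V2 = c(rnorm(50, 0), rnorm(50, 3)),
                   V3 = factor(rep(c(0, 1), each = 50)))
intrain <- createDataPartition(y = demo$V3, p = 0.7, list = FALSE)
training_demo <- demo[intrain, ]
testing_demo  <- demo[-intrain, ]

# A quick linear SVM with plain 5-fold cross-validation.
svm_demo <- train(V3 ~ ., data = training_demo, method = "svmLinear",
                  trControl = trainControl(method = "cv", number = 5),
                  preProcess = c("center", "scale"))

# predict() on a caret model returns one predicted class per row of newdata,
# as a factor with the same levels as the training target.
test_pred_demo <- predict(svm_demo, newdata = testing_demo)

length(test_pred_demo) == nrow(testing_demo)
levels(test_pred_demo)
```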
Now let’s check the accuracy of our model. We’re going to use the confusion matrix to measure the accuracy:
confusionMatrix(table(test_pred, testing$V14))
The output shows that our model accuracy for test set is 86.67%.
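The accuracy reported by a confusion matrix is simply the share of predictions that match the true class. The base R sketch below works this out by hand on ten made-up labels, which are not the heart data, to show where such a figure comes from.

```r
# Made-up predicted and actual labels for ten test cases.
actual    <- factor(c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
predicted <- factor(c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1))

# Rows are predictions, columns are the true classes.
conf_tab <- table(predicted, actual)
conf_tab

# Accuracy = correct predictions (the diagonal) / all predictions.
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)
accuracy
```

Here 8 of the 10 predictions sit on the diagonal, so the accuracy is 0.8.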
By following the above procedure, we can build our svmLinear classifier.
We can also customize the C value (Cost) of the linear classifier. This can be done by supplying candidate values in a grid
search.
The next code snippet shows how to build and tune an SVM classifier with different values of C.
We put some values of C into the “grid” data frame using expand.grid(). The next step is to use this data frame to test our
classifier at those specific C values, by passing it to the train() method through the tuneGrid parameter.
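# Candidate values for the cost parameter C (C must be greater than 0)
grid <- expand.grid(C = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5))
# tuneGrid supplies the exact C values to try, so tuneLength is not needed
svm_Linear_Grid <- train(V14 ~ ., data = training, method = "svmLinear",
                         trControl = trctrl,
                         preProcess = c("center", "scale"),
                         tuneGrid = grid)
svm_Linear_Grid
# Plot the resampled accuracy against C
plot(svm_Linear_Grid)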
test_pred_grid <- predict(svm_Linear_Grid, newdata = testing)
test_pred_grid
confusionMatrix(table(test_pred_grid, testing$V14))
The results of the confusion matrix show that this time the accuracy on the test set is 87.78%, which is slightly higher than our
previous result.
With this, we come to the end of this blog. I hope you found it informative and helpful. Stay tuned for more tutorials on
machine learning.
To get in-depth knowledge of the different machine learning algorithms along with their various applications, you can enroll here
for live online training with 24/7 support and lifetime access.