
SVM Kernels

In practice, the SVM algorithm is implemented with a kernel that transforms the input data space into the required form. SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In simple words, the kernel converts a non-separable problem into a separable problem by adding more dimensions to it. This makes SVM more powerful, flexible, and accurate. The following are some of the types of kernels used by SVM.

Linear Kernel
A linear kernel can be used as a dot product between any two observations. The formula of the linear kernel is as below −

K(x, xi) = sum(x * xi)

From the above formula, we can see that the product between two vectors, say x and xi, is the sum of the multiplication of each pair of input values.
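
As a quick illustration, the linear kernel can be computed directly with NumPy; the example vectors below are made up for demonstration.

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # example input vector (made up)
xi = np.array([4.0, 5.0, 6.0])  # example second observation (made up)

# Linear kernel: sum of the element-wise products, i.e. the dot product
k_linear = np.sum(x * xi)       # same as np.dot(x, xi) -> 32.0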

Polynomial Kernel
It is a more generalized form of the linear kernel and can distinguish curved or nonlinear input space. Following is the formula for the polynomial kernel −

K(X, Xi) = (1 + sum(X * Xi))^d

Here d is the degree of the polynomial, which we need to specify manually in the learning algorithm.
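
As a minimal sketch, the polynomial kernel can be computed the same way; the vectors and the degree d = 2 are arbitrary choices for illustration.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
xi = np.array([4.0, 5.0, 6.0])
d = 2  # degree of the polynomial, chosen arbitrarily here

# Polynomial kernel: the linear kernel shifted by 1 and raised to degree d
k_poly = (1 + np.sum(x * xi)) ** d   # (1 + 32)^2 = 1089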

Radial Basis Function (RBF) Kernel


The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space. The following formula explains it mathematically −

K(x, xi) = exp(-gamma * sum((x − xi)^2))

Here, gamma is a positive parameter, commonly tuned between 0 and 1, which we need to specify manually in the learning algorithm.

A good default value of gamma is 0.1.
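
A minimal sketch of the RBF formula in NumPy, using the suggested default gamma = 0.1; the vectors are made-up examples.

import numpy as np

x = np.array([1.0, 2.0, 3.0])
xi = np.array([1.5, 2.5, 3.5])
gamma = 0.1  # the default suggested above

# RBF kernel: exponential decay with the squared distance between x and xi
k_rbf = np.exp(-gamma * np.sum((x - xi) ** 2))  # exp(-0.1 * 0.75) ≈ 0.93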

Just as we implemented SVM for linearly separable data, we can implement it in Python for data that is not linearly separable. This can be done by using kernels, as the sketch below shows.
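
A minimal scikit-learn sketch of that idea, assuming scikit-learn is installed: make_circles generates data no straight line can separate, and an RBF-kernel SVM classifies it anyway.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original space
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The kernel trick lets the SVM separate the circles without
# constructing the higher-dimensional features explicitly
clf = SVC(kernel='rbf', gamma=0.1)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # close to 1.0 on this easy dataset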

6. Open Source
The best thing about this machine learning library is that it is open source, so anyone can use it as long as they have internet connectivity. People can build on the library and come up with a fantastic variety of useful products, and it has become a DIY community with a massive forum, both for people getting started with it and for those who find it hard to use.

7. Feature Columns
TensorFlow has feature columns, which can be thought of as intermediaries between raw data and estimators, bridging the input data with our model.
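
A minimal sketch using the TF1-era tf.feature_column and tf.estimator APIs (both deprecated or removed in recent TensorFlow releases); the column names and bucket boundaries are made up.

import tensorflow as tf

# Feature columns describe how the estimator should interpret raw fields
age = tf.feature_column.numeric_column('age')          # hypothetical field
income = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('income'),        # hypothetical field
    boundaries=[30000, 60000, 100000])                 # arbitrary buckets

# The estimator consumes the columns, bridging raw data and the model
model = tf.estimator.LinearClassifier(feature_columns=[age, income])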

8. Availability of Statistical Distributions


This library provides distribution functions including Bernoulli, Beta, Chi2, Uniform, and Gamma, which are essential, especially when considering probabilistic approaches such as Bayesian models.
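
A minimal sketch using TensorFlow Probability, where these distributions live in current releases (older TF1 builds exposed similar classes under tf.distributions); all parameter values are arbitrary.

import tensorflow_probability as tfp

tfd = tfp.distributions
bernoulli = tfd.Bernoulli(probs=0.3)                      # P(1) = 0.3
beta = tfd.Beta(concentration1=2.0, concentration0=5.0)
gamma = tfd.Gamma(concentration=3.0, rate=2.0)

# Sampling and log-densities are the building blocks of Bayesian models
samples = beta.sample(10)
log_p = bernoulli.log_prob(1)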

9. Layered Components
TensorFlow produces layered operations of weights and biases from functions such as tf.contrib.layers, which also provides batch normalization, convolution, and dropout layers, and tf.contrib.layers.optimizers likewise supplies various optimizers.
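
tf.contrib was removed in TensorFlow 2.x; a minimal sketch of the same layered components using their tf.keras equivalents:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu'),  # convolution layer
    tf.keras.layers.BatchNormalization(),              # batch normalization
    tf.keras.layers.Dropout(0.5),                      # dropout layer
    tf.keras.layers.Dense(10),                         # weights and biases
])

# Optimizers are provided alongside the layers
model.compile(
    optimizer=tf.keras.optimizers.SGD(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))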

10. Visualizer (With TensorBoard)


We can inspect different representations of a model and make the necessary changes while debugging it with the help of TensorBoard.

11. Event Logger (With TensorBoard)


It is just like UNIX, where we use tail -f to monitor the output of tasks at the command line. TensorFlow logs events and summaries from the graph and streams them to TensorBoard for inspection.
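
A minimal TF2-style sketch covering both points, writing scalar summaries during a loop and monitoring them live in TensorBoard; the log directory name and the loss values are placeholders.

import tensorflow as tf

writer = tf.summary.create_file_writer('logs/run1')  # arbitrary log dir
with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        tf.summary.scalar('loss', loss, step=step)

# Then, at the shell:  tensorboard --logdir logs
# and watch the curve update in the browser, tail -f style.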

4. Video Detection
Deep learning algorithms are used for video detection: motion detection, real-time threat detection in gaming, security, airports, and the UI/UX field.

For example, NASA is developing a deep learning network for object clustering of asteroids and orbit classification, so it can classify and predict NEOs (Near-Earth Objects).

5. Text-Based Applications
Text-based applications are also a popular use of deep learning algorithms. Sentiment analysis, social media monitoring, threat detection, and fraud detection are examples of text-based applications.

For example, Google Translate supports over 100 languages.



Some companies currently using TensorFlow include Google, Airbnb, eBay, and Intel.

Support Vector Machine, or SVM, is one of the most popular Supervised Learning algorithms, used for Classification as well as Regression problems. However, it is primarily used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

Types of SVM: SVM can be of two types, as the sketch after this list illustrates:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
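
A minimal scikit-learn sketch contrasting the two types on synthetic stand-in datasets: well-separated blobs for the linear case, interleaved half-moons for the non-linear one.

from sklearn.datasets import make_blobs, make_moons
from sklearn.svm import SVC

# Linearly separable data: a single straight line suffices -> Linear SVM
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear_clf = SVC(kernel='linear').fit(X_lin, y_lin)

# Non-linear data: no straight line works -> Non-linear (RBF) SVM
X_non, y_non = make_moons(n_samples=100, noise=0.1, random_state=0)
nonlinear_clf = SVC(kernel='rbf').fit(X_non, y_non)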

Linearly separable and non-linearly separable data
Linear and non-linear separable data are described in the diagram below. Linearly separable data is data that is populated in such a way that it can be easily classified with a straight line or a hyperplane. Non-linearly separable data, on the other hand, is data that cannot be separated using a simple straight line (it requires a more complex classifier).

Based on the maximum margin, the Maximal-Margin Classifier chooses the optimal hyperplane. The dotted lines parallel to the hyperplane in the following diagram are the margins, and the distance between these two dotted lines (the margins) is the maximum margin.
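
For a linear SVM, the maximum margin can be read off the fitted weights, since the distance between the two dotted lines is 2/||w||. A minimal sketch, assuming scikit-learn; the large C value approximates a hard margin.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel='linear', C=1000).fit(X, y)  # large C ~ hard margin

# Width of the maximum margin between the two dotted boundary lines
margin = 2 / np.linalg.norm(clf.coef_)
print(margin)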

Hierarchical Clustering in Machine Learning


Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis, or HCA.

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as a dendrogram.

Sometimes the results of K-means clustering and hierarchical clustering may look similar, but they differ in how they work: hierarchical clustering has no requirement to predetermine the number of clusters, as we did in the K-means algorithm.

The hierarchical clustering technique has two approaches:


Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left.

Divisive: The divisive algorithm is the reverse of the agglomerative algorithm, as it is a top-down approach.


What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is defined as a combination of various unsupervised machine learning algorithms, used to determine local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for unobservable variables in statistical models. Further, it is a technique for finding maximum likelihood estimates when latent variables are present. It is also referred to as the latent variable model.

A latent variable model consists of both observable and unobservable variables, where the observable variables can be predicted while the unobserved ones are inferred from the observed variables. These unobservable variables are known as latent variables.
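
As a concrete sketch of a latent variable model fitted by EM, scikit-learn's GaussianMixture runs exactly this loop: the E-step infers the hidden component responsibilities, and the M-step re-estimates the parameters to raise the likelihood. The data below is synthetic.

import numpy as np
from sklearn.mixture import GaussianMixture

# Observed data drawn from two hidden (latent) components
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (100, 1)),
                    rng.normal(5, 1, (100, 1))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)          # estimated means of the latent components
print(gmm.predict(X[:5]))  # inferred latent assignments for five points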

How Does Agglomerative Hierarchical Clustering Work?

The working of the AHC algorithm can be explained using the below
steps:

o Step-1: Create each data point as a single cluster. Let's say there are N data points, so the number of clusters will also be N.

o Step-2: Take the two closest data points or clusters and merge them to form one cluster. So, there will now be N-1 clusters.

o Step-3: Again, take the two closest clusters and merge them to form one cluster. There will be N-2 clusters.

o Step-4: Repeat Step-3 until only one cluster is left.

o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the clusters as per the problem, as the sketch below shows.
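
A minimal SciPy sketch of these steps on six made-up points: linkage performs the repeated closest-cluster merging of Steps 2-4, and dendrogram draws the tree from Step 5 (matplotlib is assumed for the plot).

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# N = 6 points, so the algorithm starts with 6 single-point clusters
X = np.array([[1, 2], [1, 4], [5, 8], [8, 8], [1, 0], [9, 11]])

# Ward linkage merges the two closest clusters until one remains
Z = linkage(X, method='ward')

dendrogram(Z)  # the tree records every merge
plt.show()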
