
K-means

[Figure: scatter plot of data points in X-Y space grouped into K = 3 clusters]

The approach

The approach k-means follows to solve the problem is called Expectation-Maximization. The E-step assigns the data points to the closest cluster. The M-step computes the centroid of each cluster.
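A minimal NumPy sketch of these two steps (my own illustration, not part of the notes; the data, K, and iteration cap below are arbitrary):

import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # E-step: assign each data point to the closest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments are stable: converged
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]]), k=2)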

Applications: market segmentation, document clustering, image segmentation.

Drawbacks: First, the k-means algorithm doesn't let data points that are far away from each other share the same cluster, even though they obviously belong to the same cluster.

Elbow method (used to choose the number of clusters K).

Problem on k-means clustering: use the k-means clustering algorithm to divide the following data into two clusters.

2  2  3  5
3  2  3  5
Instance-based learning (lazy learning)

Instance-based learning methods simply store the training examples instead of forming an explicit description of the target function. Each time a new instance is encountered, its relationship to the previously stored examples is examined in order to assign a target function value for the new instance.

Instance-based learning includes:
- Nearest neighbor
- Locally weighted regression
- Case-based reasoning

These are sometimes referred to as lazy learning methods because they delay processing until a new instance must be classified. These methods can estimate the target function locally and differently for each new instance to be classified.

KNN (K-Nearest Neighbors)

KNN is suitable for both classification and regression problems. KNN stores all available cases and classifies new cases by their K nearest neighbors. Predictions are made for a new data point by searching the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those instances.
For classification, the output is the most common class among the K neighbors.

[Figure: a query point with its K = 3 and K = 4 neighborhoods, showing how the predicted class can change with K]

Learning Vector Quantization (LVQ)

LVQ was developed as a classification algorithm for binary and multi-class classification problems. A downside of KNN is that you need to hang on to your entire training dataset. LVQ instead allows you to choose how many training instances to hang onto and learns exactly what those instances should look like; the values of those instances are optimized during learning.

By storing only these learned instances, LVQ reduces the memory requirements of keeping the entire training dataset.
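A minimal sketch of the LVQ1 training rule (my illustration of the description above; names and hyperparameters are invented): each codebook vector is pulled toward training samples of its own class and pushed away from samples of other classes.

import numpy as np

def train_lvq1(X, y, n_codebooks=4, lr=0.3, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize codebook vectors as randomly chosen training instances (with labels)
    idx = rng.choice(len(X), size=n_codebooks, replace=False)
    cb_x, cb_y = X[idx].astype(float), y[idx].copy()
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for xi, yi in zip(X, y):
            # Find the best matching unit (closest codebook vector)
            bmu = np.argmin(np.linalg.norm(cb_x - xi, axis=1))
            sign = 1.0 if cb_y[bmu] == yi else -1.0  # attract same class, repel other
            cb_x[bmu] += sign * rate * (xi - cb_x[bmu])
    return cb_x, cb_y

def predict_lvq(cb_x, cb_y, xi):
    # Classify by the label of the nearest learned codebook vector
    return cb_y[np.argmin(np.linalg.norm(cb_x - xi, axis=1))]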
Self-Organizing Map (SOM) is an unsupervised deep learning model, mostly used for feature detection or dimensionality reduction. It differs from an ANN in that it applies competitive learning rather than error-correction learning. Popular applications of SOM include NLP and recognition of handwritten characters.
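For contrast with LVQ, a bare-bones SOM sketch (again my illustration; grid size and decay schedules are arbitrary): each sample picks a best matching unit by competition, and the BMU and its grid neighbors move toward the sample.

import numpy as np

def train_som(X, grid=(5, 5), epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, X.shape[1]))
    # Grid coordinates of every unit, used by the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-epoch / epochs)  # shrinking neighborhood radius
        for xi in X:
            # Competitive step: the unit closest to xi wins (best matching unit)
            bmu = np.unravel_index(np.linalg.norm(weights - xi, axis=2).argmin(), (h, w))
            # Cooperative step: pull the BMU and its grid neighbors toward xi
            grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
            influence = np.exp(-grid_dist ** 2 / (2 * sigma ** 2))
            weights += lr * influence[..., None] * (xi - weights)
    return weights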
The KNN algorithm:
1. Determine parameter K = number of nearest neighbors.
2. Calculate the distance between the query-instance and all the training samples.
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
4. Gather the category Y of the nearest neighbors.
5. Use the simple majority of the category of the nearest neighbors as the prediction value of the query instance.

Calculate KNN by hand

We will use again the previous example to illustrate the computation.

Example

We have data from a questionnaire survey (asking people's opinion) and from objective testing, with two attributes (acid durability and strength), to classify whether a special paper tissue is good or not. Here are four training samples:

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Y = Classification
7                                7                                 Bad
7                                4                                 Bad
3                                4                                 Good
1                                4                                 Good

Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess what the classification of this new tissue is?

1. Determine parameter K = number of nearest neighbors

Suppose we use K = 3.

2. Calculate the distance between the query-instance and all the training samples

The coordinate of the query instance is (3, 7). Instead of calculating the distance we compute the squared distance, which is faster to calculate (it avoids the square root).

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Squared distance to query instance (3, 7)
7                                7                                 (7-3)^2 + (7-7)^2 = 16
7                                4                                 (7-3)^2 + (4-7)^2 = 25
3                                4                                 (3-3)^2 + (4-7)^2 = 9
1                                4                                 (1-3)^2 + (4-7)^2 = 13

3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance

X1   X2   Squared distance to query (3, 7)   Rank (minimum distance)   Included in 3-nearest neighbors?
7    7    (7-3)^2 + (7-7)^2 = 16             3                         Yes
7    4    (7-3)^2 + (4-7)^2 = 25             4                         No
3    4    (3-3)^2 + (4-7)^2 = 9              1                         Yes
1    4    (1-3)^2 + (4-7)^2 = 13             2                         Yes

4. Gather the category Y of the nearest neighbors. Notice in the second row below that the category of the nearest neighbor (Y) is not included, because the rank of this data point is greater than 3 (= K).

X1   X2   Squared distance to query (3, 7)   Rank   Included in 3-NN?   Y = Category of nearest neighbor
7    7    (7-3)^2 + (7-7)^2 = 16             3      Yes                 Bad
7    4    (7-3)^2 + (4-7)^2 = 25             4      No                  -
3    4    (3-3)^2 + (4-7)^2 = 9              1      Yes                 Good
1    4    (1-3)^2 + (4-7)^2 = 13             2      Yes                 Good

5. Use the simple majority of the category of the nearest neighbors as the prediction value of the query instance

We have 2 Good and 1 Bad. Since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 is included in the Good category.
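The hand calculation can be checked with a few lines of Python (my sketch, mirroring the numbers above):

from collections import Counter

# Training samples: (X1 acid durability, X2 strength, classification)
train = [(7, 7, "Bad"), (7, 4, "Bad"), (3, 4, "Good"), (1, 4, "Good")]
query, K = (3, 7), 3

# Rank by squared Euclidean distance to the query (no square root needed)
ranked = sorted(train, key=lambda s: (s[0] - query[0]) ** 2 + (s[1] - query[1]) ** 2)

# Simple majority vote among the K nearest neighbors
votes = Counter(label for _, _, label in ranked[:K])
print(votes)                       # Counter({'Good': 2, 'Bad': 1})
print(votes.most_common(1)[0][0])  # Good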
SVM (Unit 2)

What is a Support Vector Machine?

The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N = the number of features) that distinctly classifies the data points.

To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the plane that has the maximum margin, i.e. the maximum distance between data points of both classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
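The usual way to write this maximum-margin objective (the standard formulation, not spelled out in these notes) is:

    minimize   (1/2) ||w||^2
    subject to y_i (w . x_i + b) >= 1   for every training point (x_i, y_i), with labels y_i in {-1, +1}

The supporting hyperplanes w . x + b = +1 and w . x + b = -1 lie 2 / ||w|| apart, so minimizing ||w|| maximizes the margin.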

Hyperplanes and Support Vectors

Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
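Concretely, a hyperplane w . x + b = 0 classifies a point by the sign of w . x + b. A tiny sketch with invented weights (with 2 input features the hyperplane is the line 2*x1 - x2 - 1 = 0):

import numpy as np

w = np.array([2.0, -1.0])  # hypothetical normal vector of the hyperplane
b = -1.0                   # hypothetical offset

def classify(x):
    # Which side of the hyperplane does x fall on?
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 1.0])))  # 1  (positive side of the line)
print(classify(np.array([0.0, 2.0])))  # -1 (negative side of the line)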


Understanding the Support Vector Machine (SVM) algorithm from examples (along with code)

In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.

Support vectors are simply the coordinates of individual observations. The SVM classifier is a frontier that best segregates the two classes (hyper-plane/line).

How does it work?

Above, we got accustomed to the process of segregating the two classes with a hyper-plane. Now the burning question is "How can we identify the right hyper-plane?" Don't worry, it's not as hard as you think.

Let's understand:

Identify the right hyper-plane (Scenario 1): Here, we have three hyper-planes (A, B, and C). Now, identify the right hyper-plane to classify stars and circles.

You need to remember a thumb rule to identify the right hyper-plane: "Select the hyper-plane which segregates the two classes better." In this scenario, hyper-plane B has excellently performed this job.
Identify the right hyper-plane (Scenario 2): Here, we have three hyper-planes (A, B, and C), and all are segregating the classes well. Now, how can we identify the right hyper-plane?

Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the Margin.



Above, you can see that the margin for hyper-plane C is high as compared to both A and B. Hence, we name the right hyper-plane as C. Another lightning reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of misclassification.

Identify the right hyper-plane (Scenario 3): Hint: use the rules discussed in the previous section to identify the right hyper-plane.

Some of you may have selected hyper-plane B, as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.

Can we classify two classes (Scenario 4)? Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

As I have already mentioned, one star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.
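This tolerance for outliers is controlled by the soft-margin penalty C. A small hedged example using scikit-learn's SVC (the data points here are invented):

import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters, plus one outlier labeled with the far cluster's class
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [8.5, 8.5]])
y = np.array([0, 0, 0, 1, 1, 1, 0])  # last point is an outlier of class 0

# A small C makes margin violations cheap, so the lone outlier is effectively ignored
clf = SVC(kernel="linear", C=0.1).fit(X, y)
print(clf.predict([[2, 2], [9, 9]]))  # expected: [0 1], boundary not warped by the outlier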


Applications of SVM in the Real World

As we have seen, SVMs depend on supervised learning algorithms. The aim of using SVM is to correctly classify unseen data. SVMs have a number of applications in several fields. Some common applications of SVM are:
- Face detection: SVMs classify parts of an image as face and non-face and create a square boundary around the face.
- Text and hypertext categorization: SVMs allow text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, categorizing on the basis of the score generated and then comparing it with a threshold value.
- Classification of images: Use of SVMs provides better search accuracy for image classification, with better accuracy in comparison to traditional query-based searching techniques.
- Bioinformatics: This includes protein classification and cancer classification. We use SVM for identifying the classification of genes and of patients on the basis of genes and other biological problems.
- Protein fold and remote homology detection: SVM algorithms are applied for protein remote homology detection.
- Handwriting recognition: We use SVMs to recognize widely used handwritten characters.
- Generalized predictive control (GPC): SVM-based GPC is used to control chaotic dynamics with useful parameters.

Advantages:
1. SVM works relatively well when there is a clear margin of separation between classes.
2. SVM is more effective in high dimensional spaces.
3. SVM is effective in cases where the number of dimensions is greater than the number of samples.
4. SVM is relatively memory efficient.


Advantages of Support Vector Machines (SVM)

SVM is very helpful when we don't have much idea about the data. It can be used for data such as image, text, and audio, and for data that is not regularly distributed or has an unknown distribution.

The SVM provides a very useful technique known as the kernel, and by the application of the associated kernel function we can solve complex problems. The kernel allows choosing a function which is not necessarily linear and can take different forms depending on the data it operates on, and thus is non-parametric. In classification problems there is a strong assumption that the data samples are linearly separable, but with the introduction of the kernel, input data can be mapped into a high-dimensional space, avoiding the need for this assumption.

K(x1, x2) = <f(x1), f(x2)>

where K is the kernel function, x1 and x2 are n-dimensional inputs, f is a function used to map the n-dimensional space into an m-dimensional space, and <a, b> indicates the dot product.
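This identity can be checked numerically. For 2-dimensional inputs, the map f(x) = (x1^2, sqrt(2)*x1*x2, x2^2) satisfies f(a) . f(b) = (a . b)^2, the degree-2 polynomial kernel (my choice of f for illustration; the notes do not fix a particular map):

import numpy as np

def f(x):
    # Explicit feature map whose dot product equals the kernel K(a, b) = (a . b)^2
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

a, b = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(f(a), f(b)))  # 121.0, computed in the mapped 3-d space
print(np.dot(a, b) ** 2)   # 121.0, computed directly in the original 2-d space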
SVM generally does not suffer from overfitting and performs well when there is a clear indication of separation between classes. SVM can be used when the total number of samples is less than the number of dimensions, and it performs well in terms of memory.

SVM performs and generalizes well on out-of-sample data. SVM also proves itself to be fast, since for the classification of one sample the kernel function is evaluated only for each of the support vectors.


Another important advantage of the SVM algorithm is that it is able to handle high-dimensional data, which proves to be a great help considering its usage and application in the machine learning field.

The Support Vector Machine is useful in finding the separating hyperplane; finding a hyperplane can be useful to classify the data correctly between different groups.

SVM has the nature of convex optimization, which is very helpful as we are assured of optimality in the results: the answer is a global minimum instead of a local minimum.

With SVM, due to the large margin that it likes to generate, we can fit in more data and classify it correctly.

Outliers have less influence in the SVM algorithm, so there is less chance of skewing the results; outliers affect the mean of the data, and the mean then cannot represent the data set.


Thus, as the influence of outliers is small in SVM, the algorithm proves to be helpful even when outliers are present.
In SVM, the classifier ideally depends only on a subset of points while maximizing the distance between the closest points of the two classes (the margin). So we do not need to take into account all the points; taking only a subset of points is helpful.
There are many algorithms used for classification in machine learning, but SVM is better than most of the other algorithms used, as it gives better accuracy in results.
SVM classifiers, in comparison to other classifiers, have better computational complexity. Even if the numbers of positive and negative examples are not the same, SVM can be used, as it has the ability to normalize the data or to project it into the space of the decision boundary separating the two classes.

Another reason to say that SVM is better than other algorithms is that it can also perform in n-dimensional space.
The execution time comes out to be very little in comparison to algorithms such as Artificial Neural Networks.
Another reason SVM is better is that a little modification in the feature-extracted data does not affect the results that were expected before. It converges very fast, and, as stated earlier regarding kernel functionality, in general the polynomial kernel proves to be a better factor in terms of Support Vector Machines.

In comparison with the Naive Bayes algorithm, which is also a technique used for classification, the Support Vector Machine algorithm has faster prediction along with better accuracy.
In comparison with Logistic Regression, which is also a classification method, SVM proves itself to be cheaper: it has a time complexity of O(N^2 * K), where K is the number of support vectors, whereas Logistic Regression has a time complexity of O(N^3).

SVMs can be robust even when the training sample has some bias. One of the reasons SVM proves to be robust is its ability to deliver a unique solution, unlike Neural Networks, where we get more than one solution corresponding to each local minimum for different samples.

Unit 2

BAYESIAN BELIEF NETWORKS - EXAMPLE 1

Network structure: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.

P(B) = 0.001        P(E) = 0.002

B   E   P(A | B, E)
T   T   0.95
T   F   0.94
F   T   0.29
F   F   0.001

A   P(J | A)        A   P(M | A)
T   0.90            T   0.70
F   0.05            F   0.01

Question 1: What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call?

Solution:

P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
                       = 0.90 × 0.70 × 0.001 × 0.999 × 0.998
                       ≈ 0.00062
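The arithmetic can be reproduced directly (a small sketch, not part of the original example):

# Conditional probability tables from the network above
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_j_a, P_m_a = 0.90, 0.70

# P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p = P_j_a * P_m_a * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(p)  # 0.000628... (≈ 0.00062)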
