
K-means

[Figure: scatter plot of data points in X-Y space grouped into K = 3 clusters]

The approach

The approach k-means follows to solve the problem is called Expectation-Maximization. The E-step assigns the data points to the closest cluster. The M-step computes the centroid of each cluster.
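A minimal NumPy sketch of these two steps (my own illustration, not part of the notes; the data, K, and iteration cap below are arbitrary):

import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # E-step: assign each data point to the closest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments are stable: converged
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]]), k=2)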

Applications: market segmentation, document clustering, image segmentation.

Drawbacks: First, the k-means algorithm doesn't let data points that are far away from each other share the same cluster, even though they obviously belong to the same cluster.

Elbow method (used to choose the number of clusters K).

Problem on k-means clustering: use the k-means clustering algorithm to divide the following data into two clusters.

2  2  3  5
3  2  3  5
Instance-based learning (lazy learning)

Instance-based learning methods simply store the training examples instead of forming an explicit description of the target function. Each time a new instance is encountered, its relationship to the previously stored examples is examined in order to assign a target function value for the new instance.

Instance-based learning includes:
- Nearest neighbor
- Locally weighted regression
- Case-based reasoning

These are sometimes referred to as lazy learning methods because they delay processing until a new instance must be classified. These methods can estimate the target function locally and differently for each new instance to be classified.

KNN (K-Nearest Neighbors)

KNN is suitable for both classification and regression problems. KNN stores all available cases and classifies new cases by their K nearest neighbors. Predictions are made for a new data point by searching the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those instances.
For classification, the output is the most common class among the K neighbors.

[Figure: a query point with its K = 3 and K = 4 neighborhoods, showing how the predicted class can change with K]

Learning Vector Quantization (LVQ)

LVQ was developed as a classification algorithm for binary and multi-class classification problems. A downside of KNN is that you need to hang on to your entire training dataset. LVQ instead allows you to choose how many training instances to hang onto and learns exactly what those instances should look like; the values of those instances are optimized during learning.

By storing only these learned instances, LVQ reduces the memory requirements of keeping the entire training dataset.
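A minimal sketch of the LVQ1 training rule (my illustration of the description above; names and hyperparameters are invented): each codebook vector is pulled toward training samples of its own class and pushed away from samples of other classes.

import numpy as np

def train_lvq1(X, y, n_codebooks=4, lr=0.3, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize codebook vectors as randomly chosen training instances (with labels)
    idx = rng.choice(len(X), size=n_codebooks, replace=False)
    cb_x, cb_y = X[idx].astype(float), y[idx].copy()
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for xi, yi in zip(X, y):
            # Find the best matching unit (closest codebook vector)
            bmu = np.argmin(np.linalg.norm(cb_x - xi, axis=1))
            sign = 1.0 if cb_y[bmu] == yi else -1.0  # attract same class, repel other
            cb_x[bmu] += sign * rate * (xi - cb_x[bmu])
    return cb_x, cb_y

def predict_lvq(cb_x, cb_y, xi):
    # Classify by the label of the nearest learned codebook vector
    return cb_y[np.argmin(np.linalg.norm(cb_x - xi, axis=1))]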
Self-Organizing Map (SOM) is an unsupervised deep learning model, mostly used for feature detection or dimensionality reduction. It differs from an ANN in that it applies competitive learning rather than error-correction learning. Popular applications of SOM include NLP and recognition of handwritten characters.
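For contrast with LVQ, a bare-bones SOM sketch (again my illustration; grid size and decay schedules are arbitrary): each sample picks a best matching unit by competition, and the BMU and its grid neighbors move toward the sample.

import numpy as np

def train_som(X, grid=(5, 5), epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, X.shape[1]))
    # Grid coordinates of every unit, used by the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-epoch / epochs)  # shrinking neighborhood radius
        for xi in X:
            # Competitive step: the unit closest to xi wins (best matching unit)
            bmu = np.unravel_index(np.linalg.norm(weights - xi, axis=2).argmin(), (h, w))
            # Cooperative step: pull the BMU and its grid neighbors toward xi
            grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
            influence = np.exp(-grid_dist ** 2 / (2 * sigma ** 2))
            weights += lr * influence[..., None] * (xi - weights)
    return weights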
The KNN algorithm:
1. Determine parameter K = number of nearest neighbors.
2. Calculate the distance between the query-instance and all the training samples.
3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance.
4. Gather the category Y of the nearest neighbors.
5. Use the simple majority of the category of the nearest neighbors as the prediction value of the query instance.

Calculate KNN by hand

We will use again the previous example to illustrate the computation.

Example

We have data from a questionnaire survey (asking people's opinion) and from objective testing, with two attributes (acid durability and strength), to classify whether a special paper tissue is good or not. Here are four training samples:

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Y = Classification
7                                7                                 Bad
7                                4                                 Bad
3                                4                                 Good
1                                4                                 Good

Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess what the classification of this new tissue is?

1. Determine parameter K = number of nearest neighbors

Suppose we use K = 3.

2. Calculate the distance between the query-instance and all the training samples

The coordinate of the query instance is (3, 7). Instead of calculating the distance we compute the squared distance, which is faster to calculate (it avoids the square root).

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Squared distance to query instance (3, 7)
7                                7                                 (7-3)^2 + (7-7)^2 = 16
7                                4                                 (7-3)^2 + (4-7)^2 = 25
3                                4                                 (3-3)^2 + (4-7)^2 = 9
1                                4                                 (1-3)^2 + (4-7)^2 = 13

3. Sort the distances and determine the nearest neighbors based on the K-th minimum distance

X1   X2   Squared distance to query (3, 7)   Rank (minimum distance)   Included in 3-nearest neighbors?
7    7    (7-3)^2 + (7-7)^2 = 16             3                         Yes
7    4    (7-3)^2 + (4-7)^2 = 25             4                         No
3    4    (3-3)^2 + (4-7)^2 = 9              1                         Yes
1    4    (1-3)^2 + (4-7)^2 = 13             2                         Yes

4. Gather the category Y of the nearest neighbors. Notice in the second row below that the category of the nearest neighbor (Y) is not included, because the rank of this data point is greater than 3 (= K).

X1   X2   Squared distance to query (3, 7)   Rank   Included in 3-NN?   Y = Category of nearest neighbor
7    7    (7-3)^2 + (7-7)^2 = 16             3      Yes                 Bad
7    4    (7-3)^2 + (4-7)^2 = 25             4      No                  -
3    4    (3-3)^2 + (4-7)^2 = 9              1      Yes                 Good
1    4    (1-3)^2 + (4-7)^2 = 13             2      Yes                 Good

5. Use the simple majority of the category of the nearest neighbors as the prediction value of the query instance

We have 2 Good and 1 Bad. Since 2 > 1, we conclude that the new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7 is included in the Good category.
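The hand calculation can be checked with a few lines of Python (my sketch, mirroring the numbers above):

from collections import Counter

# Training samples: (X1 acid durability, X2 strength, classification)
train = [(7, 7, "Bad"), (7, 4, "Bad"), (3, 4, "Good"), (1, 4, "Good")]
query, K = (3, 7), 3

# Rank by squared Euclidean distance to the query (no square root needed)
ranked = sorted(train, key=lambda s: (s[0] - query[0]) ** 2 + (s[1] - query[1]) ** 2)

# Simple majority vote among the K nearest neighbors
votes = Counter(label for _, _, label in ranked[:K])
print(votes)                       # Counter({'Good': 2, 'Bad': 1})
print(votes.most_common(1)[0][0])  # Good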
SVM (Unit 2)

What is a Support Vector Machine?

The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N = the number of features) that distinctly classifies the data points.

To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find the plane that has the maximum margin, i.e. the maximum distance between data points of both classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
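The usual way to write this maximum-margin objective (the standard formulation, not spelled out in these notes) is:

    minimize   (1/2) ||w||^2
    subject to y_i (w . x_i + b) >= 1   for every training point (x_i, y_i), with labels y_i in {-1, +1}

The supporting hyperplanes w . x + b = +1 and w . x + b = -1 lie 2 / ||w|| apart, so minimizing ||w|| maximizes the margin.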

Hyperplanes and Support Vectors

Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
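Concretely, a hyperplane w . x + b = 0 classifies a point by the sign of w . x + b. A tiny sketch with invented weights (with 2 input features the hyperplane is the line 2*x1 - x2 - 1 = 0):

import numpy as np

w = np.array([2.0, -1.0])  # hypothetical normal vector of the hyperplane
b = -1.0                   # hypothetical offset

def classify(x):
    # Which side of the hyperplane does x fall on?
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([3.0, 1.0])))  # 1  (positive side of the line)
print(classify(np.array([0.0, 2.0])))  # -1 (negative side of the line)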


Understanding the Support Vector Machine (SVM) algorithm from examples (along with code)

In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.

Support vectors are simply the coordinates of individual observations. The SVM classifier is a frontier that best segregates the two classes (hyper-plane/line).

How does it work?

Above, we got accustomed to the process of segregating the two classes with a hyper-plane. Now the burning question is "How can we identify the right hyper-plane?" Don't worry, it's not as hard as you think.

Let's understand:

Identify the right hyper-plane (Scenario 1): Here, we have three hyper-planes (A, B, and C). Now, identify the right hyper-plane to classify stars and circles.

You need to remember a thumb rule to identify the right hyper-plane: "Select the hyper-plane which segregates the two classes better." In this scenario, hyper-plane B has excellently performed this job.
Identify the right hyper-plane (Scenario 2): Here, we have three hyper-planes (A, B, and C), and all are segregating the classes well. Now, how can we identify the right hyper-plane?

Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the Margin.



Above, you can see that the margin for hyper-plane C is high as compared to both A and B. Hence, we name the right hyper-plane as C. Another lightning reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of misclassification.

Identify the right hyper-plane (Scenario 3): Hint: use the rules discussed in the previous section to identify the right hyper-plane.

Some of you may have selected hyper-plane B, as it has a higher margin compared to A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.

Can we classify two classes (Scenario 4)? Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

As I have already mentioned, one star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.
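This tolerance for outliers is controlled by the soft-margin penalty C. A small hedged example using scikit-learn's SVC (the data points here are invented):

import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters, plus one outlier labeled with the far cluster's class
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [8.5, 8.5]])
y = np.array([0, 0, 0, 1, 1, 1, 0])  # last point is an outlier of class 0

# A small C makes margin violations cheap, so the lone outlier is effectively ignored
clf = SVC(kernel="linear", C=0.1).fit(X, y)
print(clf.predict([[2, 2], [9, 9]]))  # expected: [0 1], boundary not warped by the outlier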


Applications of SVM in the Real World

As we have seen, SVMs depend on supervised learning algorithms. The aim of using SVM is to correctly classify unseen data. SVMs have a number of applications in several fields. Some common applications of SVM are:
- Face detection: SVMs classify parts of an image as face and non-face and create a square boundary around the face.
- Text and hypertext categorization: SVMs allow text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, categorizing on the basis of the score generated and then comparing it with a threshold value.
- Classification of images: Use of SVMs provides better search accuracy for image classification, with better accuracy in comparison to traditional query-based searching techniques.
- Bioinformatics: This includes protein classification and cancer classification. We use SVM for identifying the classification of genes and of patients on the basis of genes and other biological problems.
- Protein fold and remote homology detection: SVM algorithms are applied for protein remote homology detection.
- Handwriting recognition: We use SVMs to recognize widely used handwritten characters.
- Generalized predictive control (GPC): SVM-based GPC is used to control chaotic dynamics with useful parameters.

Advantages:
1. SVM works relatively well when there is a clear margin of separation between classes.
2. SVM is more effective in high dimensional spaces.
3. SVM is effective in cases where the number of dimensions is greater than the number of samples.
4. SVM is relatively memory efficient.


Advantages of Support Vector Machines (SVM)

SVM is very helpful when we don't have much idea about the data. It can be used for data such as image, text, and audio, and for data that is not regularly distributed or has an unknown distribution.

The SVM provides a very useful technique known as the kernel, and by the application of the associated kernel function we can solve complex problems. The kernel allows choosing a function which is not necessarily linear and can take different forms depending on the data it operates on, and thus is non-parametric. In classification problems there is a strong assumption that the data samples are linearly separable, but with the introduction of the kernel, input data can be mapped into a high-dimensional space, avoiding the need for this assumption.

K(x1, x2) = <f(x1), f(x2)>

where K is the kernel function, x1 and x2 are n-dimensional inputs, f is a function used to map the n-dimensional space into an m-dimensional space, and <a, b> indicates the dot product.
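This identity can be checked numerically. For 2-dimensional inputs, the map f(x) = (x1^2, sqrt(2)*x1*x2, x2^2) satisfies f(a) . f(b) = (a . b)^2, the degree-2 polynomial kernel (my choice of f for illustration; the notes do not fix a particular map):

import numpy as np

def f(x):
    # Explicit feature map whose dot product equals the kernel K(a, b) = (a . b)^2
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

a, b = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(f(a), f(b)))  # 121.0, computed in the mapped 3-d space
print(np.dot(a, b) ** 2)   # 121.0, computed directly in the original 2-d space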
SVM generally does not suffer from overfitting and performs well when there is a clear indication of separation between classes. SVM can be used when the total number of samples is less than the number of dimensions, and it performs well in terms of memory.

SVM performs and generalizes well on out-of-sample data. SVM also proves itself to be fast, since for the classification of one sample the kernel function is evaluated only for each of the support vectors.


Another important advantage of the SVM algorithm is that it is able to handle high-dimensional data, which proves to be a great help considering its usage and application in the machine learning field.

The Support Vector Machine is useful in finding the separating hyperplane; finding a hyperplane can be useful to classify the data correctly between different groups.

SVM has the nature of convex optimization, which is very helpful as we are assured of optimality in the results: the answer is a global minimum instead of a local minimum.

With SVM, due to the large margin that it likes to generate, we can fit in more data and classify it correctly.

Outliers have less influence in the SVM algorithm, so there is less chance of skewing the results; outliers affect the mean of the data, and the mean then cannot represent the data set.


Thus, as the influence of outliers is small in SVM, the algorithm proves to be helpful even when outliers are present.
In SVM, the classifier ideally depends only on a subset of points while maximizing the distance between the closest points of the two classes (the margin). So we do not need to take into account all the points; taking only a subset of points is helpful.
There are many algorithms used for classification in machine learning, but SVM is better than most of the other algorithms used, as it gives better accuracy in results.
SVM classifiers, in comparison to other classifiers, have better computational complexity. Even if the numbers of positive and negative examples are not the same, SVM can be used, as it has the ability to normalize the data or to project it into the space of the decision boundary separating the two classes.

Another reason to say that SVM is better than other algorithms is that it can also perform in n-dimensional space.
The execution time comes out to be very little in comparison to algorithms such as Artificial Neural Networks.
Another reason SVM is better is that a little modification in the feature-extracted data does not affect the results that were expected before. It converges very fast, and, as stated earlier regarding kernel functionality, in general the polynomial kernel proves to be a better factor in terms of Support Vector Machines.

In comparison with the Naive Bayes algorithm, which is also a technique used for classification, the Support Vector Machine algorithm has faster prediction along with better accuracy.
In comparison with Logistic Regression, which is also a classification method, SVM proves itself to be cheaper: it has a time complexity of O(N^2 * K), where K is the number of support vectors, whereas Logistic Regression has a time complexity of O(N^3).

SVMs can be robust even when the training sample has some bias. One of the reasons SVM proves to be robust is its ability to deliver a unique solution, unlike Neural Networks, where we get more than one solution corresponding to each local minimum for different samples.

Unit 2

BAYESIAN BELIEF NETWORKS - EXAMPLE 1

Network structure: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.

P(B) = 0.001        P(E) = 0.002

B   E   P(A | B, E)
T   T   0.95
T   F   0.94
F   T   0.29
F   F   0.001

A   P(J | A)        A   P(M | A)
T   0.90            T   0.70
F   0.05            F   0.01

Question 1: What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call?

Solution:

P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
                       = 0.90 × 0.70 × 0.001 × 0.999 × 0.998
                       ≈ 0.00062
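The arithmetic can be reproduced directly (a small sketch, not part of the original example):

# Conditional probability tables from the network above
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_j_a, P_m_a = 0.90, 0.70

# P(j, m, a, ~b, ~e) = P(j|a) P(m|a) P(a|~b,~e) P(~b) P(~e)
p = P_j_a * P_m_a * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(p)  # 0.000628... (≈ 0.00062)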
