K-Means Clustering Algorithm

• K-Means Clustering is an unsupervised learning
algorithm used to solve clustering problems in
machine learning and data science.
What is the K-Means Algorithm?

• K-Means Clustering is an unsupervised learning
algorithm that groups an unlabeled dataset into
different clusters.
• Here K is the number of predefined clusters to be
created: if K=2, there will be two clusters; if K=3,
there will be three clusters; and so on.
• It is an iterative algorithm that divides the
unlabeled dataset into K different clusters in
such a way that each data point belongs to only
one group, whose members have similar properties.
• It lets us cluster the data into different groups
and offers a convenient way to discover the categories
of groups in an unlabeled dataset on its own, without
the need for labeled training data.
• It is a centroid-based algorithm, where each cluster is
associated with a centroid. The main aim of the
algorithm is to minimize the sum of squared distances
between the data points and the centroids of their
corresponding clusters.
• The algorithm takes the unlabeled dataset as
input, divides it into K clusters, and repeats the
process until the clusters stop changing (no point
moves to a different cluster). The value of K
must be chosen in advance.
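The iterative procedure described above can be sketched as follows. This is a minimal illustration, not the lecture's own implementation: it assumes Euclidean distance, explicitly supplied initial centroids, and 2-D points, and the helper names `assign`, `update`, `kmeans`, and `inertia` are hypothetical. `inertia` computes the objective the algorithm tries to minimize, $J = \sum_{k} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2$, where $\mu_k$ is the centroid of cluster $C_k$.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            # zip(*members) groups all x-coordinates together, then all y-coordinates
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # empty cluster: keep its previous centroid
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate the update and assignment steps until no point changes cluster
    (or max_iter is reached)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged: no point moved
            break
        labels = new_labels
    return labels, centroids


def inertia(points, labels, centroids):
    """K-means objective: sum of squared distances to the assigned centroids."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Usage on hypothetical toy data: two well-separated groups, K = 2.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
```

On this toy data the loop converges after one update: the first three points form cluster 0 with centroid (4/3, 4/3) and the last three form cluster 1 with centroid (25/3, 25/3). Keeping an empty cluster's old centroid is one common choice; real libraries may instead re-seed it.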
Here the data point (6, 3) has moved from cluster C3
to C2, so we have to recalculate the centroids.
Here the data point (8, 3) has moved from cluster C3
to C2, so we have to recalculate the centroids.
Here the data point (4, 7) has moved from cluster C1 to C3,
so we have to recalculate the centroids.
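When a point moves between clusters, only the two clusters whose membership changed need new centroids, each recomputed as the mean of its members. The sketch below illustrates that step. The cluster memberships are hypothetical (only the point (6, 3) and the names C2/C3 come from the example; the full worked table is not reproduced here), and `centroid` and `move_point` are illustrative helper names.

```python
def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def move_point(clusters, point, src, dst):
    """Move `point` from cluster `src` to cluster `dst` and return the
    recomputed centroids of the two clusters whose membership changed."""
    clusters[src].remove(point)
    clusters[dst].append(point)
    # Only src and dst changed, so only their centroids are recomputed;
    # a cluster left empty has no mean and is skipped.
    return {name: centroid(clusters[name]) for name in (src, dst) if clusters[name]}


# Hypothetical memberships chosen for illustration only.
clusters = {"C2": [(5, 5), (7, 5)], "C3": [(6, 3), (9, 4)]}
new_centroids = move_point(clusters, (6, 3), src="C3", dst="C2")
```

Here C2's centroid becomes (6.0, 13/3) and C3's becomes (9.0, 4.0). After recomputing, every point is reassigned to its nearest centroid again, which may move further points.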
Decision Tree using ID3 Algorithm
• In this example there are four attributes (Outlook,
Temp, Humidity, Wind).
• PlayTennis is the target attribute.
• To draw the decision tree using the ID3
algorithm, the first step is to identify the attribute
that gives the most information out of the available
attributes.
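• Information gain measures how much splitting on an
attribute reduces the entropy (uncertainty) of the target
attribute, so we compute it for each attribute and
identify the one with the maximum information gain.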
• The attribute with the maximum information gain is
chosen as the root node. Here Outlook is the root
node because it has the maximum information gain.
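The root-selection step can be sketched in Python using $\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$. The data table from the example is not reproduced in this text, so the sketch assumes it is the classic 14-example PlayTennis dataset from Mitchell's *Machine Learning* textbook, which uses the same four attributes and target. The helper names are illustrative, and only the root split is shown, not the full recursive tree construction.

```python
import math
from collections import Counter

# Assumption: the classic PlayTennis data (Mitchell, 1997).
# Columns: Outlook, Temp, Humidity, Wind, PlayTennis (target).
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Wind"]
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the target minus the weighted entropy after splitting."""
    gain = entropy([row[-1] for row in rows])
    for value in set(row[attr_index] for row in rows):
        subset = [row[-1] for row in rows if row[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


gains = {name: information_gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)  # attribute with the maximum information gain
```

On this assumed data the gains come out to roughly 0.247 (Outlook), 0.152 (Humidity), 0.048 (Wind), and 0.029 (Temp), so Outlook is selected as the root, consistent with the example's conclusion.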
Naive Bayes classifiers
• Naive Bayes classifiers are a collection of
classification algorithms based on Bayes’ Theorem.
• It is not a single algorithm but a family of
algorithms that all share a common principle:
every pair of features is assumed to be
conditionally independent of each other, given the class.
• To start with, let us consider a dataset.
• In this case we are given a new instance with
certain attribute values, and we have to check
whether it is classified as Yes or No.
• We have to find the prior probabilities and the
conditional probabilities.
• The prior probabilities in this case are those of the
two classes, Yes and No.
• The prior probability of each class is the fraction of
training examples labeled with that class.
• Once the prior probabilities are calculated, next we
find the conditional probabilities of the individual
attribute values given each class.
• Once we have the prior and conditional
probabilities, we can classify the new instance as
either Yes or No.
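As a sketch, assuming the standard frequency-count (maximum-likelihood) estimates with no smoothing, the prior and conditional probabilities described above are computed from a training set of $N$ examples as:

```latex
P(v) = \frac{\text{count}(\text{class} = v)}{N}, \qquad v \in \{\text{Yes}, \text{No}\}

P(a_i \mid v) = \frac{\text{count}(\text{class} = v \;\wedge\; \text{attribute}_i = a_i)}{\text{count}(\text{class} = v)}
```

For example, if 9 of 14 training examples are labeled Yes, then $P(\text{Yes}) = 9/14$.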
New Instance
• The new instance is given above with its attribute
values.
• We have to find whether it belongs to Yes or
No.
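The Naive Bayes classifier equation picks the class v
that maximizes the prior times the product of the
conditional probabilities of the attribute values a_1, ..., a_n:
$$v_{NB} = \arg\max_{v \in \{\text{Yes}, \text{No}\}} P(v) \prod_{i=1}^{n} P(a_i \mid v)$$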
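The full classification step can be sketched as below. Because the dataset and new instance from the example are not reproduced in this text, the sketch assumes the classic PlayTennis data from Mitchell's *Machine Learning* textbook and the new instance (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong) used there. `naive_bayes_scores` is a hypothetical helper, and it uses raw frequency counts without smoothing.

```python
from collections import Counter

# Assumption: classic PlayTennis data (Mitchell, 1997).
# Columns: Outlook, Temp, Humidity, Wind, PlayTennis (target).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def naive_bayes_scores(rows, instance):
    """Return P(v) * prod_i P(a_i | v) for each class v, using raw
    frequency counts (no Laplace smoothing)."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / len(rows)  # prior P(v)
        for i, value in enumerate(instance):
            matches = sum(1 for row in rows if row[-1] == cls and row[i] == value)
            score *= matches / cls_count  # conditional P(a_i | v)
        scores[cls] = score
    return scores


new_instance = ("Sunny", "Cool", "High", "Strong")  # assumed new instance
scores = naive_bayes_scores(DATA, new_instance)
prediction = max(scores, key=scores.get)  # the argmax in the equation
```

On these assumptions the score for Yes is about 0.0053 and the score for No is about 0.0206, so the new instance is classified as No. Note that a single zero count makes a whole product zero; Laplace smoothing is commonly added to avoid this, but it is omitted here to match the plain frequency estimates.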
