
Unit 5 Big Data

K-Means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into k clusters based on their similarity. It is an iterative algorithm that minimizes the sum of squared distances between data points and their assigned cluster centers (centroids). The algorithm takes unlabeled data as input, divides it into k clusters, and repeats the assign-and-update process until the cluster assignments stop changing.

Uploaded by Venkatesh Sharma

K-Means Clustering Algorithm
• K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science.
What is the K-Means Algorithm?

• K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters.
• Here K is the number of predefined clusters to be created in the process: if K=2 there will be two clusters, if K=3 there will be three clusters, and so on.
• It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group, and the points within a group have similar properties.
• It allows us to cluster the data into different groups and is a convenient way to discover the categories in an unlabeled dataset on its own, without the need for labeled training data.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of squared distances between the data points and the centroids of their corresponding clusters.
• The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the assignment and centroid-update steps until the clusters no longer change. The value of K must be chosen in advance.
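The objective described in the bullets above is usually written as the within-cluster sum of squared distances. In this sketch C_j denotes the j-th cluster and \mu_j its centroid; this notation is introduced here and is not taken from the slides:

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Each iteration first assigns every point to its nearest centroid and then recomputes each \mu_j as the mean of its cluster. Neither step can increase J, which is why the loop can stop as soon as no point changes cluster (the result is a local, not necessarily global, minimum).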
Here the data point (6, 3) has moved from C3 to C2, so we have to recalculate the centroids.
Here the data point (8, 3) has moved from C3 to C2, so we have to recalculate the centroids.
Here the data point (4, 7) has moved from C1 to C3, so we have to recalculate the centroids.
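The assign, move, and recalculate cycle above can be sketched in pure Python as follows. This is a minimal illustration assuming Euclidean distance and hypothetical points and starting centroids, because the slides' own data table did not survive extraction. The helper names `assign`, `update`, and `kmeans` are illustrative, not part of any library. The loop prints every point that changes cluster, which is the point at which the slides recalculate the centroids.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in points]


def update(points, labels, k, old):
    """Recompute each centroid as the mean of its members; keep the old one if a cluster is empty."""
    new = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new.append(old[j])
    return new


def kmeans(points, initial_centroids, max_iter=100):
    """Hypothetical helper: repeat assign/update until no point moves to another cluster."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = assign(points, centroids)
    centroids = update(points, labels, k, centroids)
    for it in range(1, max_iter + 1):
        new_labels = assign(points, centroids)
        moved = [(p, a, b) for p, a, b in zip(points, labels, new_labels) if a != b]
        if not moved:  # stable assignments: the clusters no longer change
            break
        for p, a, b in moved:
            print(f"iteration {it}: {p} moved from C{a + 1} to C{b + 1}; recalculate centroids")
        labels = new_labels
        centroids = update(points, labels, k, centroids)
    return labels, centroids


# Hypothetical 2-D points and starting centroids (not the slides' data).
points = [(1, 1), (2, 1), (4, 3), (5, 4), (8, 8), (9, 9)]
labels, centroids = kmeans(points, [(1, 1), (2, 1)])
print(labels, centroids)
```

On these points the loop reports one move per iteration, (2, 1), then (4, 3), then (5, 4), each from C2 to C1, before the assignments settle into two clusters. Library implementations such as scikit-learn's `KMeans` also take care of choosing the initial centroids.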
Decision Tree using ID3 Algorithm
• In this example there are 4 attributes (Outlook, Temp, Humidity, Wind).
• PlayTennis is the target attribute.
• To draw the decision tree using the ID3 algorithm, the first step is to identify the attribute that gives the most information among the available attributes.
• That is, we have to identify which attribute has the maximum information gain.
• The attribute with the maximum information gain is chosen as the root node, i.e. Outlook is the root node because it has the maximum information gain.
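Information gain is the entropy of the whole set minus the weighted entropy of the subsets produced by splitting on an attribute. The sketch below computes it for every attribute, assuming the slides use the standard 14-row PlayTennis table from Tom Mitchell's Machine Learning textbook (the slide's own table did not survive extraction). The helper names `entropy` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

# Assumption: the standard 14-row PlayTennis table (Mitchell, Machine Learning).
COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, target_index=-1):
    """Entropy of the full set minus the size-weighted entropy of each subset split on attr_index."""
    base = entropy([r[target_index] for r in rows])
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[target_index] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
root = max(gains, key=gains.get)  # ID3 picks the attribute with maximum gain
print(gains, root)
```

On this table Outlook has the highest gain (about 0.247, versus about 0.152 for Humidity, 0.048 for Wind and 0.029 for Temp), which agrees with the slide's choice of Outlook as the root node. ID3 would then repeat the same selection inside each Outlook branch.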
Naive Bayes classifiers
• Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem.
• It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is assumed to be independent of each other, given the class.
• To start with, let us consider a dataset.
• In this case we are given a new instance with certain conditions, and we have to check whether it is classified as Yes or No.
• We have to find the prior probabilities and the conditional probabilities.
• The prior probabilities in this case are those of the two classes, Yes and No.
• So the probability of Yes is the number of Yes instances divided by the total number of instances, and likewise for No.
• Once the prior probabilities are calculated, we next find the conditional probabilities of the individual attributes given each class.
• Once we have the prior and conditional probabilities, we can classify the new instance as either Yes or No.
New Instance
• The new instance is given above with its conditions.
• We have to find whether it belongs to Yes or No.
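The Naive Bayes classifier equation is:
$$v_{NB} = \arg\max_{v_j \in \{\text{Yes},\ \text{No}\}} P(v_j) \prod_{i} P(a_i \mid v_j)$$
where P(v_j) is the prior probability of class v_j and P(a_i | v_j) is the conditional probability of the i-th attribute value of the new instance given v_j.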
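A sketch of this calculation in Python, assuming the standard 14-row PlayTennis table (Mitchell, Machine Learning) and that textbook's example instance (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong), since the slide's own table and new instance did not survive extraction. The function name `naive_bayes_scores` is illustrative. Probabilities are plain relative frequencies with no smoothing, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter

# Assumption: the standard PlayTennis table; columns are Outlook, Temp, Humidity, Wind, PlayTennis.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def naive_bayes_scores(rows, instance):
    """Return P(class) * prod_i P(attribute_i = value_i | class) for every class."""
    class_counts = Counter(r[-1] for r in rows)
    total = len(rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior probability P(cls)
        for i, value in enumerate(instance):
            # conditional probability P(attribute i = value | cls), as a relative frequency
            n_match = sum(1 for r in rows if r[-1] == cls and r[i] == value)
            score *= n_match / n_cls
        scores[cls] = score
    return scores


new_instance = ("Sunny", "Cool", "High", "Strong")  # hypothetical stand-in for the slide's instance
scores = naive_bayes_scores(ROWS, new_instance)
prediction = max(scores, key=scores.get)  # the argmax in the equation above
print(scores, prediction)
```

Here P(Yes) = 9/14 and P(No) = 5/14, and the score for No (about 0.0206) exceeds the score for Yes (about 0.0053), so this instance is classified as No. Only the relative size of the scores matters, because the shared denominator of Bayes' Theorem is the same for both classes.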
