Big Data Analytics (2017 Regulation) : Overview of Clustering

Uploaded by

cskinit

BIG DATA ANALYTICS (2017 REGULATION)

Overview of clustering:
• In general, clustering is the use of unsupervised techniques for grouping similar objects.
• In machine learning, unsupervised learning refers to the problem of finding hidden structure within unlabeled data. (Clustering is a method often used for exploratory analysis of the data.)
Example:
• Based on customers' personal income, it is straightforward to divide the customers into three groups depending on arbitrarily selected values.
The customers could be divided into three groups as follows:
• Earn less than $10,000
• Earn between $10,000 and $99,999
• Earn $100,000 or more
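The income example can be sketched in a few lines of Python. This is only an illustration: the function name `income_group` and the customer incomes are made up, and the thresholds are the arbitrarily selected values from the example, chosen by hand rather than found from the data.

```python
def income_group(income):
    """Return the label of the income group for a yearly income in dollars."""
    if income < 10_000:
        return "less than $10,000"
    if income < 100_000:
        return "$10,000 to $99,999"
    return "$100,000 or more"


# Hypothetical whole-dollar customer incomes, made up for illustration.
incomes = [4_500, 10_000, 58_000, 99_999, 100_000, 250_000]

# Collect the incomes that fall into each group.
groups = {}
for income in incomes:
    groups.setdefault(income_group(income), []).append(income)

for label, members in groups.items():
    print(label, members)
```

A clustering algorithm differs from this sketch in that the group boundaries would come from the structure of the data instead of being fixed in advance.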

There are different types of clustering methods, including:

• Partitioning clustering: Used to classify observations within a data set into multiple groups based on their similarity. The algorithms require the analyst to specify the number of clusters to be generated.
• Hierarchical clustering: Works by grouping data objects into a hierarchy or tree of clusters, either top-down (divisive) or bottom-up (agglomerative).
• Fuzzy clustering: A form of clustering in which each data point can belong to more than one cluster.
• Density-based clustering: Can be used to identify clusters of any shape in a data set containing noise and outliers.
• Model-based clustering: Considers the data as coming from a distribution that is a mixture of two or more clusters.
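To make the difference between a hard assignment (one cluster per point) and a fuzzy assignment (a degree of membership in every cluster) concrete, here is a minimal sketch. It assumes the membership formula commonly used by fuzzy c-means, which the slides do not name; the function names, the one-dimensional data, and the fuzzifier `m = 2` are all illustrative assumptions.

```python
def hard_assignment(point, centers):
    """Index of the single closest center (partitioning-style assignment)."""
    return min(range(len(centers)), key=lambda j: abs(point - centers[j]))


def fuzzy_memberships(point, centers, m=2.0):
    """Membership of a 1-D point in every cluster; the values sum to 1.

    Assumes the fuzzy c-means style formula with fuzzifier m > 1:
    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)).
    """
    dists = [abs(point - c) for c in centers]
    if 0.0 in dists:
        # The point sits exactly on a center: full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]


centers = [0.0, 10.0]  # hypothetical cluster centers
print(hard_assignment(2.0, centers))    # one cluster index
print(fuzzy_memberships(2.0, centers))  # one weight per cluster
```

A point halfway between the two centers receives equal membership in both clusters, while the hard assignment still has to pick exactly one.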

Partitioning clustering:
• Used to classify observations within a data set into multiple groups based on their similarity. The algorithms require the analyst to specify the number of clusters to be generated.

Algorithms:

K-means clustering: Each cluster is represented by the center (mean) of the data points belonging to the cluster.
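As a small illustration of what "represented by the mean" means, the sketch below computes the center of one cluster as the coordinate-wise mean of its points. The helper name `centroid` and the three sample points are hypothetical.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    # zip(*points) groups the first coordinates together, then the second, etc.
    return tuple(sum(coords) / n for coords in zip(*points))


# Hypothetical members of a single cluster, two variables each.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
print(centroid(cluster))  # (3.0, 5.0)
```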

The K-means algorithm can be summarized as follows:

1. Specify the number of clusters (k) to be created. This is done by the analyst.
2. Randomly select k objects from the data set as the initial cluster centers or means.
3. Assign each observation to its closest centroid, based on the Euclidean distance between the object and the centroid.
4. For each of the k clusters, update the cluster centroid by calculating the new mean values of all the data points in the cluster. The centroid of the kth cluster is a vector of length p containing the means of all variables for the observations in the kth cluster; p is the number of variables.
5. Iteratively minimize the total within-cluster sum of squares. That is, iterate steps 3 and 4 until the cluster assignments stop changing or the maximum number of iterations is reached. By default, R uses 10 as the maximum number of iterations.
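The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used by R or any library: the function names `kmeans` and `total_within_ss`, the `seed` parameter, and the sample points are hypothetical, and `max_iter=10` simply mirrors the default mentioned above.

```python
import math
import random


def kmeans(points, k, max_iter=10, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    # Step 2: pick k distinct observations as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each observation to its closest center (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Step 5: the assignments stopped changing.
        labels = new_labels
        # Step 4: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def total_within_ss(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(labels)
print(total_within_ss(points, centers, labels))
```

Because the initial centers are chosen at random, different seeds can number the clusters differently or, on less clearly separated data, converge to different groupings. Running the algorithm from several random starts and keeping the result with the smallest total within-cluster sum of squares is a common way to reduce that sensitivity.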

K-means Algorithm (Overview of the method):

[Figure panels: Initialization, Cluster Assignment, Moving Centroid, Convergence]

Use Cases of K-means:

Here are several interesting use cases for k-means.

Document Classification
• Cluster documents into multiple categories based on tags, topics, and the content of the document. This is a very standard classification problem, and k-means is a highly suitable algorithm for this purpose.

Delivery Store Optimization
• Optimize the process of goods delivery using truck drones, with k-means used to find the optimal number of launch locations.

Identifying Crime Localities
• Given data on crimes in specific localities of a city, group incidents by the category of crime (drug trade, kidnapping, robbery, murder, etc.) and the area of the crime to identify crime-prone localities.

Customer Segmentation
• Clustering helps marketers improve their customer base, work on target areas, and segment customers based on purchase history, interests, or activity monitoring.

Fantasy League Stat Analysis
• Analyzing player stats has always been a critical element of the sporting world, and with increasing competition, machine learning has a critical role to play here.
