
Machine Learning

Chapter 2: Clustering
Dr. Minhhuy Le
EEE, Phenikaa University
Chapter 2: Clustering
1. Decision tree & random forest review
2. Clustering intuition
3. K-means algorithm
4. Summary
1. Random Forest Review
Example:
f: <Outlook, Temperature, Humidity, Wind> => PlayTennis?



1. Random Forest Review
ID3 approach: a natural greedy approach to growing a decision tree top-down, from the root
to the leaves, by repeatedly replacing an existing leaf with an internal node.

Algorithm:
• Pick the “best” attribute to split on at the root, based on the training data.
• Recurse on children that are impure (e.g., contain both Yes and No examples).

Key question: Which attribute is best?



1. Random Forest Review
ID3 approach: Select attribute with highest information gain (IG)

Information gain of A is the expected reduction in the entropy of target variable Y for
data sample S, due to sorting on variable A.
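
Written out (standard ID3 definitions, where $S_v$ is the subset of $S$ for which attribute $A$ takes value $v$, and $p_c$ is the fraction of $S$ belonging to class $c$):

$$H(S) = -\sum_{c} p_c \log_2 p_c$$

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$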



1. Random Forest Review
ID3 steps:

1. Calculate the entropy of the target attribute.
2. Calculate the information gain (IG) of each feature.
3. Choose the feature with the largest IG as the root node.
4. A branch with entropy 0 becomes a leaf; a branch with entropy ≠ 0 is split further.
5. Repeat until all data are classified.
(A small computation sketch for steps 1–2 follows below.)

Hyperparameters (e.g., maximum tree depth, minimum samples per split) control how far the tree is grown.
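
A minimal computational sketch of steps 1–2, assuming NumPy is available (the toy Outlook/PlayTennis arrays below are illustrative, not the lecture's full table):

```python
import numpy as np

def entropy(y):
    """Entropy H(S) of a label array y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """Expected reduction in the entropy of y after sorting on attribute x."""
    gain = entropy(y)
    for value, count in zip(*np.unique(x, return_counts=True)):
        gain -= (count / len(y)) * entropy(y[x == value])
    return gain

# Toy example: the attribute with the largest IG would become the root node.
outlook = np.array(["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"])
play    = np.array(["No", "No", "Yes", "Yes", "No", "Yes"])
print(information_gain(outlook, play))
```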



1. Random Forest Review
Random Forest steps:

1. Select random samples from the given dataset.
2. Construct a decision tree for each sample and get a prediction result from each decision tree.
3. Perform a vote for each predicted result.
4. Select the prediction result with the most votes as the final prediction.
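
A minimal sketch of these four steps using scikit-learn's RandomForestClassifier (assuming scikit-learn is installed; the synthetic data merely stands in for a real table such as the PlayTennis example):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in dataset with four features
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 100 trees, each trained on a bootstrap sample of the data (steps 1-2);
# predict() combines the per-tree predictions by majority vote (steps 3-4)
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))
```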



2. Clustering Intuition
Supervised learning vs. unsupervised learning:
• Supervised learning: the training set contains inputs x together with labels y.
• Unsupervised learning: the training set contains only inputs x; no label data (y).




2. Clustering Intuition
Clustering: Finding structure in the data
• By isolating groups of examples that are similar in some well-defined sense
• Unsupervised learning algorithm: only input data, no label information

How many clusters is best? That depends on the measure of similarity (or distance) between the
data points to be clustered.
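
For example, k-means (next section) measures dissimilarity with squared Euclidean distance:

$$d(x, x')^2 = \lVert x - x' \rVert^2 = \sum_{j=1}^{d} (x_j - x'_j)^2$$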



3. K-means algorithm
Clustering methods:
• Hierarchical clustering methods
• Spectral clustering
• Semi-supervised clustering
• Clustering by dynamics
• Flat clustering methods: k-means clustering
• Etc.



3. K-means algorithm
Procedure:
1. Pick k arbitrary centroids (cluster means).
2. Assign each sample to its closest centroid.
3. Adjust each centroid to be the mean of the examples assigned to it.
4. Repeat steps 2–3 until the assignments no longer change.

The k-means algorithm is guaranteed to converge in a finite number of iterations.
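
A minimal NumPy sketch of this procedure (empty clusters are not handled; in practice scikit-learn's KMeans would be used):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means on an (n, d) data matrix X with k clusters."""
    rng = np.random.default_rng(seed)
    # 1. Pick k arbitrary centroids (here: k distinct random data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each sample to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Move each centroid to the mean of the samples assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop once the centroids (and hence assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Usage: two well-separated blobs should be recovered as two clusters
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, labels = kmeans(X, k=2)
```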




3. K-means algorithm
Procedure illustration:





3. K-means algorithm
How to choose k?

The clustering can be repeated several times and the best solution kept.



3. K-means algorithm
How to choose k? “Elbow” method
• As k grows (smaller clusters), J decreases; however, the model easily overfits.
• k should be chosen at the “elbow”, where further increases in k no longer reduce J significantly.

J: distortion score — the sum of squared distances from each data point to its assigned centroid.
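
A sketch of the elbow computation with scikit-learn, where KMeans exposes J as the inertia_ attribute (synthetic three-blob data assumed for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated blobs, so the "elbow" should appear at k = 3
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [6, 0], [0, 6])])

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)   # J: sum of squared distances to the closest centroid
# Plot J against k and pick the k where the curve stops dropping sharply.
```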



4. Summary

• K-means is a parametric method, where the parameters are the prototypes (cluster centroids).
• Inflexible: the decision boundary between clusters is linear.
• Fast: the update steps can be parallelized.
• There are several variations on the basic K-means algorithm:
  1. K-means++ gives a more specific way to initialize the clusters (sketched below).
  2. K-medoids chooses the centermost datapoint in the cluster as the prototype instead
     of the centroid. (The centroid may not correspond to a datapoint.)
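
A sketch of the k-means++ seeding rule mentioned in variation 1 (the helper name kmeans_pp_init is illustrative; the returned centroids would replace the arbitrary initialization in step 1 of the basic algorithm):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding: the first centroid is chosen uniformly at random;
    each further centroid is drawn with probability proportional to the
    squared distance to the nearest centroid chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```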
