
In the name of Allah, the Most Gracious, the Most Merciful

Data Mining
Types of Learning & Machine Learning Cycle
Types of Learning

The three main types of learning are:
1 Supervised Learning
2 Unsupervised Learning
3 Semi-supervised Learning

Dr. Rao Muhammad Adeel Nawab 3


Supervised Learning

Definition

In Supervised Learning, a Machine Learning Algorithm learns from Annotated Data.
Annotated Data means that, for every Training Example, the Output is associated with the Inputs.

Types of Supervised Learning

The two main types of Supervised Learning are:
1 Classification
2 Regression

Types of Supervised Learning Cont...

Classification

Definition
In Classification, the Output is Categorical (or Discrete)

Types of Supervised Learning Cont...

Example
Task: Gender Identification – Annotated Data
Input: Height, Weight, Hair Length, Beard, Scarf; Output: Gender

Instance No.  Height  Weight  Hair Length  Beard  Scarf  Gender
d1            180.3   196     Bald         Yes    No     Male
d2            170.0   120     Long         No     No     Female
d3            178.5   200     Short        No     No     Male
d4            163.4   110     Medium       No     Yes    Female
d5            175.2   220     Short        Yes    No     Male
d6            165.0   150     Medium       No     Yes    Female
d7            179.1   185     Long         Yes    No     Male
d8            160.5   130     Short        No     No     Female
d9            177.8   160     Bald         No     No     Male
d10           161.1   100     Medium       No     No     Female
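
To make "the Output is Categorical" concrete, here is a minimal Python sketch that learns from the Annotated Data above and returns a class label for a new person. It assumes a Nearest Centroid classifier (chosen only for its simplicity; it is not one of the algorithms listed later in these slides) and uses just the two numeric inputs, Height and Weight, ignoring Hair Length, Beard and Scarf. The helper names `train_nearest_centroid` and `predict` are hypothetical, not a library API.

```python
from math import dist

# Annotated Data from the table: (Height, Weight) inputs paired with the Gender output.
# Assumption: only the two numeric inputs are used; Hair Length, Beard and Scarf are ignored.
TRAINING_DATA = [
    ((180.3, 196), "Male"),
    ((170.0, 120), "Female"),
    ((178.5, 200), "Male"),
    ((163.4, 110), "Female"),
    ((175.2, 220), "Male"),
    ((165.0, 150), "Female"),
    ((179.1, 185), "Male"),
    ((160.5, 130), "Female"),
    ((177.8, 160), "Male"),
    ((161.1, 100), "Female"),
]


def train_nearest_centroid(data):
    """Learn one centroid (mean input vector) per class from the Annotated Data."""
    sums, counts = {}, {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(centroids, features):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train_nearest_centroid(TRAINING_DATA)
print(predict(centroids, (172.0, 180)))  # → Male
```

Because the Output column holds categories, the prediction is one of the two class labels rather than a number.
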
Types of Supervised Learning Cont...

Regression

Definition
In Regression, the Output is Numeric (or Continuous)

Types of Supervised Learning Cont...

Example
Task: House Price Prediction – Annotated Data
Input: rm, age, dis, rad, lstat; Output: House Price

Instance No.  rm     age   dis     rad  lstat  House Price
d1            6.575  65.2  4.0900  1    4.98   24.0
d2            6.421  78.9  4.9671  2    9.14   21.6
d3            6.998  45.8  6.0622  3    2.94   33.4
d4            7.147  54.2  6.0622  3    5.33   36.2
d5            6.012  66.6  5.5605  5    12.43  22.9
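
In contrast to Classification, a Regression Model returns a number. The sketch below assumes a one-input Linear Regression fitted by ordinary least squares on the rm column alone (the other inputs are ignored for brevity); `fit_simple_linear_regression` is a hypothetical helper written for illustration, not a library function.

```python
# Annotated Data from the table: rm (input) paired with House Price (output).
RM = [6.575, 6.421, 6.998, 7.147, 6.012]
PRICE = [24.0, 21.6, 33.4, 36.2, 22.9]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one input: price ≈ slope * rm + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression(RM, PRICE)
# Numeric (continuous) prediction for an unseen house with rm = 6.5.
predicted_price = slope * 6.5 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted price={predicted_price:.2f}")
```

With only five examples the fitted line is a toy; the point is that the Output is a continuous value rather than a class.
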
Unsupervised Learning

Definition

In Unsupervised Learning, a Machine Learning Algorithm learns from Unannotated Data.
Unannotated Data means that, for every Training Example, only the Inputs are available; no Output is associated with them.

Unsupervised Learning Cont...

Strengths: You can easily and quickly collect a large amount of Unannotated Data.

Weaknesses: Learning may not be accurate, since the quality of the Data is low because it is unannotated.

Semi-Supervised Learning

Definition

In Semi-supervised Learning, a Machine Learning Algorithm learns from Semi-annotated Data.
Semi-annotated Data means that the Output is associated with the Inputs for only some of the Training Examples.

Semi-Supervised Learning Cont...

Strengths: You can quickly collect a large amount of Semi-annotated Data.

Weaknesses: Learning may not be accurate, since the quality of the Data is low because not all Training Examples are annotated.

Popular and Widely Used Machine Learning Algorithms for Different Types of Learning

ML Algorithms for Supervised Learning

Naïve Bayes
Random Forest
Support Vector Machine
Logistic Regression
Gradient Boosting
Multi-layer Perceptron
K-Nearest Neighbors

Popular and Widely Used Machine Learning Algorithms for Different Types of Learning Cont...

ML Algorithms for Classification

Naïve Bayes
Random Forest Classifier
Support Vector Machine Classifier
Logistic Regression
Gradient Boosting
Multi-layer Perceptron
K-Nearest Neighbors

Popular and Widely Used Machine Learning Algorithms for Different Types of Learning Cont...

ML Algorithms for Regression

Random Forest Regressor
Support Vector Machine Regressor
Linear Regression

Popular and Widely Used Machine Learning Algorithms for Different Types of Learning Cont...

ML Algorithms for Unsupervised Learning

K-means Clustering Algorithm
Agglomerative Hierarchical Clustering Algorithm
Mean-Shift Clustering Algorithm
DBSCAN – Density-Based Spatial Clustering of Applications with Noise
EM using GMM – Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
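
As an illustration of learning from Unannotated Data, the following is a minimal sketch of the first algorithm in the list, K-means, using the standard assign-then-update iteration (Lloyd's algorithm). It assumes one-dimensional inputs: the Heights from the Gender Identification table with the Gender column removed. The function name `k_means` and its parameters are hypothetical, and the random starting centroids mean other seeds may settle on a different grouping.

```python
import random

# Heights from the Gender Identification table with the Gender column dropped,
# i.e. inputs only (Unannotated Data).
HEIGHTS = [180.3, 170.0, 178.5, 163.4, 175.2, 165.0, 179.1, 160.5, 177.8, 161.1]


def k_means(points, k, max_iterations=100, seed=0):
    """Lloyd's K-means for 1-D points; returns (centroids, cluster index of each point)."""
    centroids = random.Random(seed).sample(points, k)  # k distinct starting points
    for _ in range(max_iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        assignment = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: every centroid moves to the mean of the points assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # nothing moved: converged
            break
        centroids = updated
    return centroids, assignment


centroids, assignment = k_means(HEIGHTS, k=2)
print(centroids)
print(assignment)  # cluster ids only; the algorithm never sees "Male" or "Female"
```

The output is a grouping, not a named class: deciding what each cluster means is left to a human, which is exactly what is lost when the Data is unannotated.
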

Popular and Widely Used Machine Learning Algorithms for Different Types of Learning Cont...

ML Algorithms for Semi-supervised Learning

Label Propagation
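
scikit-learn ships a ready-made `sklearn.semi_supervised.LabelPropagation`. The from-scratch sketch below only illustrates the idea behind it: build a similarity graph over all inputs, spread the known Outputs along its edges, and clamp the annotated examples after each step. It assumes one-dimensional inputs (the Heights from the Gender Identification table), a Gaussian similarity with a hypothetical width `sigma=2.0`, and that only d1 and d4 keep their Gender; `label_propagation` is a hypothetical helper, not the scikit-learn implementation.

```python
from math import exp

# Heights from the table; only d1 (index 0) and d4 (index 3) keep their annotated Gender.
HEIGHTS = [180.3, 170.0, 178.5, 163.4, 175.2, 165.0, 179.1, 160.5, 177.8, 161.1]
ANNOTATED = {0: "Male", 3: "Female"}  # index -> Output; all other examples are unannotated
CLASSES = ["Male", "Female"]


def label_propagation(xs, annotated, classes, sigma=2.0, iterations=200):
    """Spread annotated Outputs to unannotated inputs over a similarity graph."""
    n, m = len(xs), len(classes)
    # Edge weight between two inputs: large when they are close, near zero when far apart.
    weight = [[0.0 if i == j else exp(-((xs[i] - xs[j]) ** 2) / (2 * sigma ** 2))
               for j in range(n)] for i in range(n)]
    # Label distribution per example: one-hot if annotated, uniform otherwise.
    dist = [[1.0 if annotated[i] == c else 0.0 for c in classes] if i in annotated
            else [1.0 / m] * m for i in range(n)]
    for _ in range(iterations):
        new_dist = []
        for i in range(n):
            if i in annotated:  # clamp: annotated examples keep their known Output
                new_dist.append(dist[i])
                continue
            total = sum(weight[i])
            # Weighted average of the neighbours' current label distributions.
            new_dist.append([sum(weight[i][j] * dist[j][k] for j in range(n)) / total
                             for k in range(m)])
        dist = new_dist
    # Predicted Output = class with the highest propagated score.
    return [classes[max(range(m), key=lambda k: row[k])] for row in dist]


print(label_propagation(HEIGHTS, ANNOTATED, CLASSES))
```

Two annotated examples are enough here because the unannotated heights form two well-separated groups; points near the gap between the groups (such as 170.0) are the least certain.
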

Machine Learning Cycle

The Machine Learning Cycle has four phases:
Training / Learning Phase → Testing / Evaluation Phase → Application Phase → Feedback Phase → (back to) Training / Learning Phase

Balanced Data vs Unbalanced Data

For more accurate learning, it is important to have Balanced Data.

Balanced Data

A Dataset is balanced if it contains the same number of instances for each class.

Example 01 – Balanced Data vs Unbalanced Data

Machine Learning Problem: Gender Identification

No. of Classes: 2 (Class 01 = Male, Class 02 = Female)

Dataset Size: 300 Instances

Example 01 – Balanced Data vs Unbalanced Data Cont...

Examples – Balanced & Unbalanced Datasets

Unbalanced Dataset 01: Male = 50, Female = 250 (this Dataset is highly unbalanced)
Balanced Dataset 02: Male = 150, Female = 150 (this Dataset is balanced)

Reason Dataset 01 is unbalanced: the Male and Female classes do not contain the same number of instances.
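
Checking whether a Dataset is balanced comes down to counting instances per class. A minimal sketch, assuming the Dataset is represented only by its list of class labels; `class_counts`, `is_balanced` and `imbalance_ratio` are hypothetical helpers, and the imbalance ratio (largest class size divided by smallest) is an extra summary not defined in these slides.

```python
from collections import Counter


def class_counts(labels):
    """Number of instances in each class."""
    return Counter(labels)


def is_balanced(labels):
    """Balanced: every class has the same number of instances."""
    return len(set(class_counts(labels).values())) <= 1


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size (1.0 when balanced)."""
    counts = class_counts(labels).values()
    return max(counts) / min(counts)


# Label columns of the two example Datasets (300 instances each).
dataset_01 = ["Male"] * 50 + ["Female"] * 250
dataset_02 = ["Male"] * 150 + ["Female"] * 150

for name, labels in (("Dataset 01", dataset_01), ("Dataset 02", dataset_02)):
    print(name, dict(class_counts(labels)), is_balanced(labels), imbalance_ratio(labels))
```

For Dataset 01 the ratio is 250 / 50 = 5.0, while Dataset 02 gives 1.0.
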

Data Split

Problem: Both the Training Phase and the Testing Phase need Data.

A Possible Solution: Split the available Data into
• Train Data (or Train set)
• Test Data (or Test set)
Use the Train Data in the Training Phase and the Test Data in the Testing Phase.

Data Split Cont...

Standard Approach for Data Split

Train Set: Use 2/3 (67%) of the Data
Test Set: Use 1/3 (33%) of the Data

Data Split Cont...

Two main approaches to Data Split:
• Random Data Split
• Class Balanced Data Split

Random Data Split: In this approach, the Data Distribution (for each class) in the original Dataset is not followed while splitting the Data, so the share of each class in the Train and Test sets is left to chance.

Class Balanced Data Split: In this approach, the Data Distribution (for each class) in the original Dataset is strictly followed while splitting the Data (this is also known as a stratified split).
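
The two approaches differ only in whether the cut is made once over the whole Dataset or separately inside each class. A minimal sketch, assuming examples are `(features, label)` pairs and a 2/3 train fraction; `random_split` and `class_balanced_split` are hypothetical helpers. (In scikit-learn, the same effect is usually obtained by passing `stratify=` to `sklearn.model_selection.train_test_split`.)

```python
import random
from collections import Counter, defaultdict


def random_split(examples, train_fraction, seed=0):
    """Random Data Split: shuffle everything and cut once; class shares are left to chance."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def class_balanced_split(examples, train_fraction, seed=0):
    """Class Balanced (stratified) Data Split: cut each class separately with the same fraction."""
    by_class = defaultdict(list)
    for features, label in examples:
        by_class[label].append((features, label))
    rng = random.Random(seed)
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical Dataset of 600 instances (300 Male, 300 Female); features are placeholder ids.
data = [(i, "Male") for i in range(300)] + [(i, "Female") for i in range(300, 600)]

for name, split in (("Random", random_split), ("Class Balanced", class_balanced_split)):
    train, test = split(data, 2 / 3)
    print(name, len(train), len(test),
          dict(Counter(label for _, label in train)), dict(Counter(label for _, label in test)))
```

The class-balanced version always yields 200/200 in the Train set and 100/100 in the Test set for this Dataset; the random version gives the same set sizes (400 and 200) but class counts that vary with the seed.
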

Example 01 – Data Split

Machine Learning Problem: Gender Identification

No. of Classes: 2 (Class 01 = Male, Class 02 = Female)

Original Dataset Size: 600 Instances

Example 01 – Data Split Cont...

Data Distribution: Male = 300 (50%), Female = 300 (50%)

Train-Test Split Ratio: 67%-33%

Example 01 – Data Split Cont...

Random Data Split Approach

Train Set: 400 instances (67% of Original Dataset)
Male = 250, Female = 150

Test Set: 200 instances (33% of Original Dataset)
Male = 50, Female = 150

Note that neither set follows the 50%-50% class distribution of the original Dataset.

Example 01 – Data Split Cont...

Class Balanced Data Split Approach

Train Set: 400 instances (67% of Original Dataset)
Male = 200, Female = 200

Test Set: 200 instances (33% of Original Dataset)
Male = 100, Female = 100

Machine Learning Cycle (Recall)

Training / Learning Phase → Testing / Evaluation Phase → Application Phase → Feedback Phase → (back to) Training / Learning Phase

Training / Learning Phase

Definition: Use the Training Data to build a Model.

Purpose: Build a Model (or Intelligent Program) from the Training Data that can make predictions on unseen Data.

Testing / Evaluation Phase

Definition: Use the Test Data to evaluate the performance of the Model (built in the Training Phase).

Purpose: Judge how well the Model has learned from the Training Data, using an Evaluation Measure.
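
The Training and Testing Phases can be sketched end to end. This assumes a deliberately simple, hypothetical Model (a single height threshold halfway between the two class means, not one of the listed algorithms), rows d1 to d6 of the Gender Identification table as Train Data, rows d7 to d10 as Test Data, and Accuracy as the Evaluation Measure; all function names are hypothetical.

```python
# Rows d1 to d6 of the Gender Identification table as Train Data, d7 to d10 as Test Data
# (Height is the only input used).
TRAIN_DATA = [(180.3, "Male"), (170.0, "Female"), (178.5, "Male"),
              (163.4, "Female"), (175.2, "Male"), (165.0, "Female")]
TEST_DATA = [(179.1, "Male"), (160.5, "Female"), (177.8, "Male"), (161.1, "Female")]


def class_mean(data, label):
    heights = [h for h, y in data if y == label]
    return sum(heights) / len(heights)


def train(data):
    """Training Phase: the Model is a height threshold halfway between the two class means."""
    return (class_mean(data, "Male") + class_mean(data, "Female")) / 2


def predict(threshold, height):
    return "Male" if height >= threshold else "Female"


def evaluate(threshold, data):
    """Testing Phase: Accuracy = fraction of examples whose predicted Output is correct."""
    return sum(predict(threshold, h) == y for h, y in data) / len(data)


model = train(TRAIN_DATA)  # built from Train Data only
print(f"threshold = {model:.2f}, test accuracy = {evaluate(model, TEST_DATA):.2f}")
# → threshold = 172.07, test accuracy = 1.00
```

The key design point is the separation: `train` never sees `TEST_DATA`, so the accuracy reflects performance on unseen Data.
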

Standard Evaluation Measures

Classification - Standard Evaluation Measures

Baseline Accuracy
Accuracy
Precision
Recall
F1-Measure
Area Under the Curve (AUC)
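
Most of these measures can be computed directly from true and predicted labels. A minimal sketch, assuming a two-class problem with one class designated as positive, and assuming Baseline Accuracy means the accuracy of always predicting the most frequent class (computed here on the true labels being evaluated). AUC is omitted because it needs predicted scores rather than labels. `classification_measures` is a hypothetical helper, and the example labels are made up for illustration.

```python
from collections import Counter


def classification_measures(y_true, y_pred, positive):
    """Baseline Accuracy, Accuracy, Precision, Recall and F1-Measure for one positive class."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    baseline = Counter(y_true).most_common(1)[0][1] / n  # always predict the majority class
    accuracy = correct / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"baseline_accuracy": baseline, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for five Test examples.
y_true = ["Male", "Male", "Female", "Female", "Male"]
y_pred = ["Male", "Female", "Female", "Male", "Male"]
print(classification_measures(y_true, y_pred, positive="Male"))
```

Here Accuracy (0.6) equals Baseline Accuracy (3 of 5 examples are Male), a reminder that Accuracy alone should be compared against the baseline before concluding a Model has learned anything.
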

Standard Evaluation Measures Cont...

Regression - Standard Evaluation Measures


Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R² or Coefficient of Determination
Adjusted R²
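
All five regression measures are built from the errors between the true Outputs and the Model's predictions. A minimal sketch, assuming `num_features` is the number of input features used by the Model (needed for Adjusted R²); `regression_measures` is a hypothetical helper and the values in the usage line are placeholders chosen to keep the arithmetic easy to follow.

```python
from math import sqrt


def regression_measures(y_true, y_pred, num_features):
    """MAE, MSE, RMSE, R² and Adjusted R² from true and predicted numeric Outputs."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]  # Error = Data - Model prediction
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # variation the Model leaves unexplained
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - num_features - 1)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "Adjusted R2": adjusted_r2}


# Placeholder values: errors are 1, 0, -2, 0.
print(regression_measures([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 9.0, 9.0], num_features=1))
```

For these placeholders MAE = 3/4 = 0.75, MSE = 5/4 = 1.25, R² = 1 - 5/20 = 0.75, and Adjusted R² = 1 - 0.25 × 3/2 = 0.625; Adjusted R² is lower because it penalises each extra input feature.
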

Summary - Training and Testing Phases

Recall the Equation

Data = Model + Error

• In the Training Phase, the Model is built using the Training Data.
• In the Testing Phase, the Test Data is used to check the Error in the Model.

Application Phase

Definition: Deploy the Model in the real world.

Purpose: Use the Model to make predictions on future unseen Data for a range of real-world Applications.

Feedback Phase

Definition: Take Feedback from Users and Domain Experts on your deployed Model.

Purpose: Further improve the deployed Model.

Machine Learning – Training Regimes

Training Regime

Definition: A systematic way in which a Machine Learning Algorithm uses the Training Data to learn from it.

Purpose: When we learn (or train) systematically, the quality of Training increases.

Types of Training Regimes

Some of the main types of Training Regimes are:
1 Batch Method
2 Incremental Method
3 On-line Method

Types of Training Regimes Cont...

Batch Method

In this method, all Training Examples are available and are used all at once to build the Model (or Hypothesis h).

Types of Training Regimes Cont...

Incremental Method

In this method, one member (Training Example) of the Training Data is selected at a time and used to modify the current Hypothesis (h).
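
The difference between the Batch and Incremental Methods shows up clearly on a one-weight Hypothesis h(x) = w * x. A minimal sketch, assuming least squares as the batch learner and the least-mean-squares (LMS) update rule as the incremental learner; the toy examples follow y = 2x, and `batch_fit` and `incremental_fit` are hypothetical helpers.

```python
# Toy Training Examples (x, y) that happen to follow y = 2x.
EXAMPLES = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def batch_fit(examples):
    """Batch Method: use all examples at once (least squares for h(x) = w * x)."""
    sxy = sum(x * y for x, y in examples)
    sxx = sum(x * x for x, _ in examples)
    return sxy / sxx


def incremental_fit(examples, learning_rate=0.02, epochs=50):
    """Incremental Method: take one example at a time and adjust the current w (LMS rule)."""
    w = 0.0  # initial Hypothesis h(x) = 0 * x
    for _ in range(epochs):
        for x, y in examples:
            error = y - w * x               # how wrong the current Hypothesis is on this example
            w += learning_rate * error * x  # modify h using this single example
    return w


print(batch_fit(EXAMPLES), incremental_fit(EXAMPLES))
```

Both reach w close to 2 here, but the batch version needs the whole Training Data in memory, while the incremental version only ever looks at one example per update.
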

Types of Training Regimes Cont...

On-line Method

If Training Examples become available one at a time and are used as soon as they become available, the Training Regime is called the On-line Method.

Example
• A robot that learns a Hypothesis (h) from its sensory inputs, where h controls the robot's actions (and hence determines its future sensory inputs)
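
In the On-line Method the learner never holds a stored Dataset; it consumes a stream. A minimal sketch, assuming a hypothetical generator that emits one (input, output) example at a time from y = 3x, and the same LMS update rule for a one-weight Hypothesis h(x) = w * x. It does not model the robot's actions feeding back into its future inputs; `example_stream` and `online_learning` are hypothetical names.

```python
from typing import Iterable, Iterator, Tuple


def example_stream(n_examples: int = 200) -> Iterator[Tuple[float, float]]:
    """Hypothetical source that emits one (input, output) Training Example at a time.

    The outputs follow y = 3x; a real robot would read them from its sensors instead.
    """
    for step in range(n_examples):
        x = float(step % 5 + 1)
        yield x, 3.0 * x


def online_learning(stream: Iterable[Tuple[float, float]],
                    learning_rate: float = 0.01) -> Iterator[float]:
    """On-line Method: use each example as soon as it arrives, then move on.

    Yields the weight w of the current Hypothesis h(x) = w * x after every example.
    """
    w = 0.0
    for x, y in stream:
        prediction = w * x                         # act with the current Hypothesis
        w += learning_rate * (y - prediction) * x  # then learn from this one example
        yield w


history = list(online_learning(example_stream()))
print(len(history), history[-1])
```

Because `online_learning` is itself a generator, the current Hypothesis is usable after every single example, which is what an on-line learner such as the robot needs.
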