10/3/2022

CS-471
Machine Learning
Dr. Hammad Afzal
hammad.afzal@mcs.edu.pk

Assoc Prof (NUST)


Data and Text Processing Lab
www.codteem.com

Resources

• Lecture slides will be available on LMS
• Additional references shall be provided (if any)

• Mid Terms: 30%
• Quizzes: 10%
  • Total 4, all announced
  • Best 3 will be considered
• Assignment: 10%
• Semester Project
  • Syndicate members: 1-3
  • Will be announced after the 4th week


Course Intro
• Pre-requisites
  • Introductory knowledge of Probability, Statistics, and Linear Algebra

• Course meeting times
  • Lectures: 2 sessions/week (Monday & Thursday)

• Course resources
  • Lecture slides, assignments (computer/written), solutions to
    problems, research papers, projects, and announcements will be
    uploaded on the LMS page.


What is Machine Learning?

Make the machine ‘learn’ something.
Evaluate how good the machine has ‘learned’.

Machine Learning

“Field of study that gives computers the ability to learn
without being explicitly programmed.”

Arthur Samuel (1959)


Machine Learning

“Machine learning is programming computers to optimize a
performance criterion using example data or past experience.”

Ethem Alpaydin

Machine Learning
• Learning = Improving with experience over some task

“A computer program is said to learn from experience E with
respect to some task T and performance measure P, if its
performance at task T, as measured by P, improves with
experience E.”

Tom Mitchell (1997)

Learning Problems – Examples


• Learning = Improving with experience over some task
  • Improve over task T,
  • With respect to performance measure P,
  • Based on experience E.
• Example
  • T = play checkers
  • P = % of games won in a tournament
  • E = opportunity to play

Learning Problems – Examples


• Handwriting recognition learning problem
  • Task T: recognizing handwritten words within images
  • Performance measure P: percent of words correctly recognized
  • Training experience E: a database of handwritten words with
    given classifications

Learning Problems – Examples


• A robot driving learning problem
  • Task T: driving on public four-lane highways using vision sensors
  • Performance measure P: average distance traveled before an error
    (as judged by a human overseer)
  • Training experience E: a sequence of images and steering commands
    recorded while observing a human driver

Machine Learning
• Nicolas learns about apples and oranges


Machine Learning
• But will he recognize others?

So learning involves the ability to generalize from labeled examples.

Machine Learning
• There is no need to “learn” to calculate payroll

• Learning is used in:
  • Data mining programs that learn to detect fraudulent credit card transactions
  • Programs that learn to filter spam email
  • Programs that learn to play checkers/chess
  • Autonomous vehicles that learn to drive on public highways
  • Self-customizing programs
  • And many more…


Applications

Credit Scoring
• Differentiating between low-risk and high-risk customers
  from their income and savings

Discriminant:
  IF income > θ1 AND savings > θ2
  THEN low-risk
  ELSE high-risk
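A minimal sketch of this discriminant rule in Python; the threshold values θ1 and θ2 below are made-up placeholders, not values from the lecture:

```python
def credit_risk(income: float, savings: float,
                theta1: float = 30_000, theta2: float = 10_000) -> str:
    """Two-threshold discriminant from the slide. The theta values
    here are made-up placeholders, not thresholds learned from data."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 12_000))  # low-risk
print(credit_risk(45_000,  5_000))  # high-risk
```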


Applications

Autonomous driving
• ALVINN* – drives at 70 mph on highways

*ALVINN: Autonomous Land Vehicle In a Neural Network

Face recognition
Training examples of a person

Test images

AT&T Laboratories, Cambridge UK


http://www.uk.research.att.com/facedatabase.html


Template matching
• Problem: Recognize letters A to Z

The image is converted into a 12x12 bitmap.

Template Matching
The bitmap is represented by a 12x12 matrix, or by a 144-vector
with 0/1 coordinates:
0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 1 0 1 1 0 0 0
0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 1 1 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 1 1


Template matching
Training samples – templates with corresponding class:
t1 = {(0,0,0,0,1,1,...,0), 'A'}
t2 = {(0,0,0,0,0,1,...,0), 'A'}
...
tk = {(0,0,1,1,1,1,...,0), 'B'}
...
Template of the image to be recognized:
T = {(0,0,0,0,1,1,...,0), 'A?'}

Algorithm:
1. Find ti such that ti = T.
2. Assign the image to the same class as ti.
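A small sketch of this exact-match algorithm in Python; toy 2x2 bitmaps stand in for the 144-element templates on the slide:

```python
def classify_by_template(T, templates):
    """Exact-match template classification as on the slide: scan the
    stored (bitmap, label) pairs and return the label of the first
    template identical to the query bitmap T, or None if none match."""
    for bitmap, label in templates:
        if bitmap == T:
            return label
    return None

# Toy 2x2 bitmaps stand in for the 144-element templates.
templates = [((0, 0, 1, 1), 'A'),
             ((1, 1, 0, 0), 'B')]
print(classify_by_template((0, 0, 1, 1), templates))  # 'A'
```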

Template Matching

Number of templates to store: 2^144

If fewer templates are stored, some images might not be recognized.

Improvements?


Features
• Features are the individual measurable properties of the signal being observed.

• The set of features used for learning/recognition is called the feature vector.

• The number of features used is the dimensionality of the feature vector.

• n-dimensional feature vectors can be represented as points in an
  n-dimensional feature space, as in the sketch below.
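A tiny illustration of feature vectors as rows of a matrix, i.e. points in feature space; the fruit features (weight, redness) are hypothetical:

```python
import numpy as np

# Each row is one object's feature vector; each column is one feature
# (two hypothetical features here: weight in grams and redness in [0, 1]).
X = np.array([[150.0, 0.9],    # apple
              [170.0, 0.3],    # orange
              [140.0, 0.8]])   # apple
print(X.shape)  # (3, 2): three points in a 2-dimensional feature space
```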

Features
[Figure: scatter plot in a 2-D feature space with axes height and weight;
each point is a feature vector x = (x1, x2)ᵀ, labeled Class 1 or Class 2]

Feature Extraction
• Feature extraction aims to create discriminative features that are good for learning
• Good features:
  • Objects from the same class have similar feature values.
  • Objects from different classes have different values.

[Figure: “Good” features (well-separated classes) vs. “Bad” features (overlapping classes)]

Features
• Use fewer features if possible
• Use features that differentiate classes well


Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

CLASSIFICATION


Supervised learning - Classification


• Objective
  • Make Nicolas recognize what is an apple and what is an orange

Classification

[Figure: training examples of apples and oranges]

Classification
• You had some training examples, or ‘training data’

• The examples were ‘labeled’

• You used those examples to make the kid ‘learn’ the
  difference between an apple and an orange

[Figure: a new fruit is shown — “What is this???” — and Nicolas
answers “It’s an apple!!!”]

Classification
[Figure: labeled training images — Apple, Pear, Tomato, Cow, Dog, Horse]

Given: training images and their categories.
What are the categories of these test images?

Classifier: identify the class of a given pattern

Distance between Feature Vectors
• Instead of finding a template that exactly matches the input,
  look at how close the feature vectors are
• Nearest neighbor classification algorithm:
  1. Find the template closest to the input pattern.
  2. Classify the pattern to the same class as the closest template.

[Figure: 2-D feature space with Class 1 and Class 2 points; a query
point is assigned to the class of its nearest neighbor]

Classifier

K-Nearest Neighbor Classifier
• Use the k nearest neighbors, instead of only the single nearest one,
  to classify a pattern. A minimal sketch follows.

[Figure: Class 1 and Class 2 points; the query is labeled by majority
vote among its k nearest neighbors]
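A minimal sketch of the k-nearest-neighbor rule (with k = 1 it reduces to the nearest-neighbor classifier of the previous slide); the 2-D training points are toy data:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """k-nearest-neighbor rule: find the k stored feature vectors
    closest to x (Euclidean distance) and return the majority class
    among them. With k=1 this is the nearest-neighbor classifier."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every template
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy 2-D feature vectors: Class 1 near the origin, Class 2 near (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], float)
y_train = np.array([1, 1, 1, 2, 2, 2])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # -> 1
```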

Classifier
A classifier partitions the feature space X into class-labeled regions such that:

X = X₁ ∪ X₂ ∪ … ∪ X_|Y|   and   X₁ ∩ X₂ ∩ … ∩ X_|Y| = ∅

[Figure: a feature space partitioned into regions labeled X1, X2, X3]

Classification consists of determining to which region a feature vector x belongs.
Borders between regions are called decision boundaries.

Classification
• Cancer Diagnosis – tumor size as the only feature for prediction

[Figure: 1-D plot of tumor size with cases labeled Malignant or Benign]

Classification
• Cancer Diagnosis – generally more than one variable

[Figure: scatter plot of Age vs. Tumor Size, points labeled Malignant or Benign]

Why supervised? The algorithm is given a number of patients with the
RIGHT ANSWER, and we want the algorithm to learn to predict for new patients.

Classification
• Cancer Diagnosis – generally more than one variable

[Figure: the same Age vs. Tumor Size plot, with a new unlabeled patient to predict]

We want the algorithm to learn the separating line. Once a new patient
arrives with a given age and tumor size, predict Malignant or Benign.

Supervised Learning – Example

• Cancer diagnosis – many more features

  Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
  1          | 5           | 20       | 118         | Malignant
  2          | 3           | 15       | 130         | Benign
  3          | 7           | 10       | 52          | Benign
  4          | 2           | 30       | 100         | Malignant

Use this training set to learn how to classify patients whose diagnosis
is not known (the first four columns are the input data, the last column
is the classification):

  Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
  101        | 4           | 16       | 95          | ?
  102        | 9           | 22       | 125         | ?
  103        | 1           | 14       | 80          | ?
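As one illustration (the lecture does not fix a particular classifier here), a 1-nearest-neighbor rule can be run on this toy table; the printed diagnoses are just what the sketch outputs, not medical ground truth:

```python
import numpy as np

# Training set from the slide: (# of tumors, avg area, avg density).
X_train = np.array([[5, 20, 118], [3, 15, 130], [7, 10, 52], [2, 30, 100]], float)
y_train = np.array(["Malignant", "Benign", "Benign", "Malignant"])

# Standardize each column so no single feature dominates the distance.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mu) / sigma

def predict(x):
    """1-nearest-neighbor prediction for a new patient."""
    z = (np.asarray(x, float) - mu) / sigma
    return y_train[np.argmin(np.linalg.norm(Z_train - z, axis=1))]

for patient in ([4, 16, 95], [9, 22, 125], [1, 14, 80]):  # patients 101-103
    print(patient, "->", predict(patient))
```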

Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

Course Outline
• Machine Learning: Theory and Applications
• Introduction to probability theory and Linear Algebra
• Bayesian Decision Theory
• Parametric Methods
• Dimensionality Reduction
• Frequent Pattern Analysis
• Clustering
• Decision Trees
• Artificial neural networks
• Advanced topics in Machine Learning: HMMs, Support Vector Machines (SVMs), …

REGRESSION


Regression
CLASSIFICATION
The variable we are trying to predict is
DISCRETE

REGRESSION
The variable we are trying to predict is
CONTINUOUS


Regression
• Dataset giving the living areas and prices of 50 houses

[Table: living areas and prices of the houses]

Regression
• We can plot this data

[Figure: scatter plot of price vs. living area]

Given data like this, how can we learn to predict the prices of other
houses as a function of the size of their living areas?

Regression
• The “input” variables: x(i) (living area in this example)
• The “output” or target variable that we are trying to predict: y(i) (price)
• A pair (x(i), y(i)) is called a training example
• A list of m training examples {(x(i), y(i)); i = 1, …, m} is called a training set
• X denotes the space of input values, and Y the space of output values

Regression
Given a training set, the goal is to learn a function h : X → Y so that
h(x) is a “good” predictor for the corresponding value of y. For
historical reasons, this function h is called a hypothesis.
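A minimal sketch of a linear hypothesis h fit by least squares; the (living area, price) pairs are hypothetical:

```python
import numpy as np

# Hypothetical training set: living area (sq ft) -> price ($1000s).
x = np.array([1416, 1600, 2104, 2400, 3000], dtype=float)
y = np.array([232, 330, 400, 369, 540], dtype=float)

# Least-squares fit of the linear hypothesis h(x) = theta0 + theta1 * x.
theta1, theta0 = np.polyfit(x, y, deg=1)

def h(living_area):
    """The learned hypothesis h : X -> Y."""
    return theta0 + theta1 * living_area

print(round(h(1800)))  # predicted price for an unseen 1800 sq ft house
```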


Regression
• Example: price of a used car
  • x: car attributes
  • y: price

Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

CLUSTERING


UNSUPERVISED LEARNING
• CLUSTERING

There are two types of fruit in the basket; separate them into two ‘groups’.

UNSUPERVISED LEARNING
• CLUSTERING
• The data was not ‘labeled’: you did not tell Nicolas which are apples
  and which are oranges

• Maybe the kid used the idea that things in the same group should be
  similar to one another compared to things in the other group

• Groups = Clusters

[Figure: the fruit separated into two groups, or clusters]

Clustering

[Figure: unlabeled scatter plot of Age vs. Tumor Size forming two groups]

We have the data for patients but NOT the RIGHT ANSWERS. The objective
is to find interesting structure in the data (in this case, two clusters).
A sketch of one such algorithm, k-means, follows.
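A plain k-means sketch (one clustering algorithm among many; the lecture has not named one here). The (age, tumor size) values are made up, and no empty-cluster handling is included:

```python
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its points.
    Toy sketch: no empty-cluster handling or convergence test."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Made-up (age, tumor size) measurements with no diagnoses attached.
X = np.array([[30, 2], [35, 3], [32, 2.5], [60, 8], [65, 9], [62, 8.5]])
print(kmeans(X, k=2))  # two discovered groups, e.g. [0 0 0 1 1 1]
```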

Unsupervised Learning – Cocktail Party Effect


• Speakers recorded speaking simultaneously


Unsupervised Learning – Cocktail Party Effect


• Source separation
  • The data can be explained by two different speakers speaking
    simultaneously – the ICA algorithm separates them

Source: http://cnl.salk.edu/~tewon/Blind/blind_audio.html
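A sketch of source separation with scikit-learn's FastICA, one implementation of the ICA idea (scikit-learn availability and the synthetic "recordings" are assumptions, not part of the lecture):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic source signals (a sine and a square wave) ...
t = np.linspace(0, 1, 1000)
sources = np.c_[np.sin(2 * np.pi * 5 * t), np.sign(np.sin(2 * np.pi * 3 * t))]

# ... mixed into two "microphone" recordings.
mixing = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
recordings = sources @ mixing.T

# FastICA recovers the independent sources, up to order and scale.
recovered = FastICA(n_components=2, random_state=0).fit_transform(recordings)
print(recovered.shape)  # (1000, 2): one column per separated speaker/signal
```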


Classification vs Clustering

• Challenges
  • Intra-class variability
  • Inter-class similarity

Intra-class variability

[Figure: the letter “T” in different typefaces]
[Figure: the same face under different expression, pose, and illumination]

Inter-class similarity

[Figure: characters that look similar]
[Figure: identical twins]

Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

REINFORCEMENT LEARNING

Reinforcement Learning
• In RL, the computer is simply given a goal to achieve.
• The computer then learns how to achieve that goal by trial-and-error
  interactions with its environment

The system learns from success and failure, reward and punishment.

Reinforcement Learning
• Similar to training a pet dog

• Every time the dog does something good, you pat him and say ‘good dog’

• Every time the dog does something bad, you scold him, saying ‘bad dog’

• Over time, the dog will learn to do good things

Learning to Ride a Bicycle


• Goal given to the RL system: ride the bicycle without falling over

• The RL system begins riding the bicycle and performs a series of actions
  that result in the bicycle being tilted 45 degrees to the right

• At this point two actions are possible: turn the handlebars left or turn them right

• The RL system turns the handlebars to the left, immediately crashes to the
  ground, and receives a negative reinforcement

• The RL system has just learned not to turn the handlebars left when tilted
  45 degrees to the right

Learning to Ride a Bicycle


• The RL system turns the handlebars to the RIGHT
• Result: CRASH!!!
• It receives a negative reinforcement

• The RL system has learned that the “state” of being tilted 45 degrees
  to the right is bad. A toy sketch of this kind of value update follows.
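A toy tabular value-update sketch in the spirit of this example; it is not a real bicycle simulator, and the single state, actions, and reward are invented for illustration:

```python
import random

# One invented "state" (tilted 45 degrees right) with two actions,
# both of which end in a crash and a reward of -1.
Q = {("tilt45R", "left"): 0.0, ("tilt45R", "right"): 0.0}
alpha = 0.5  # learning rate

for _ in range(10):
    action = random.choice(["left", "right"])   # trial and error
    reward = -1.0                               # crash -> punishment
    key = ("tilt45R", action)
    Q[key] += alpha * (reward - Q[key])         # terminal update, no next state

print(Q)  # both action values drift toward -1: the state itself is "bad"
```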


A Fancy PR Example

A (Simplified) PR System

Two modes:

Classification mode:
  test pattern → Preprocessing → Feature Measurement → Classification

Training mode:
  training pattern → Preprocessing → Feature Extraction/Selection → Learning

A Fancy Problem
Sorting incoming fish on a conveyor according to species
(salmon or sea bass) using optical sensing.

Salmon or sea bass? (2 categories or classes)

It is a classification problem. How do we solve it?

Approach
Data collection: take some images using an optical sensor.

Approach
• Data collection

• Preprocessing: use a segmentation operation to isolate fish from one
  another and from the background

• Information from a single fish is sent to a feature extractor, whose
  purpose is to reduce the data by measuring certain features

• The features are passed to a classifier that evaluates the evidence
  and then takes a final decision

Approach
• Set up a camera and take some sample images to extract features:
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.

• This is the set of all suggested features to explore for use in our classifier!

How data is collected & used


• Data can be raw signals (e.g. images) or features extracted from them
• The data is divided into three parts (the exact percentage of each
  portion depends, partially, on the sample size):

  Train | Validation | Test

• Train data: used to build a prediction model or learner (classifier)

• Validation data: used to estimate the prediction error (classification
  error) and adjust the learner's parameters

• Test data: used to estimate the classification error of the chosen
  learner on unseen data, called the generalization error. The test set
  must be kept inside a ‘vault’ and be brought out only at the end of
  the data analysis. A sketch of such a split follows.
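A sketch of a 60/20/20 split using scikit-learn's train_test_split; the proportions and dummy data are illustrative, since the real split depends on the dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(100, 1), np.arange(100) % 2  # dummy data

# First carve off 40%, then split that half-and-half: 60/20/20 overall.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4,
                                                  random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```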

Pre-processing
• If the data is an image, then apply image processing
• What is an image?
  • A grayscale image z = f(x, y) is composed of pixels, where x and y
    are the location of the pixel and z is its intensity
  • An image can be considered just a matrix of certain dimensions:

    A = [ a11 … a1n ]
        [  ⋮  ⋱  ⋮  ]
        [ am1 … amn ]

[Figure: an image divided into 8x8 blocks]
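A short numpy illustration of an image as a matrix, divided into 8x8 blocks as in the figure; the random image is a stand-in for real data:

```python
import numpy as np

# A grayscale image is just a matrix: z = f(x, y) at each pixel location.
img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

# Split it into non-overlapping 8x8 blocks, as in the figure.
blocks = img.reshape(8, 8, 8, 8).swapaxes(1, 2)
print(blocks.shape)   # (8, 8, 8, 8): an 8x8 grid of 8x8 blocks
print(blocks[0, 0])   # the top-left 8x8 block
```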

Feature extraction
• Feature extraction: use domain knowledge
  • A sea bass is generally longer than a salmon
  • The average lightness of sea bass scales is greater than that of salmon

• We will use training data to learn a classification rule based on these
  features (length of the fish and average lightness)

• Length and average lightness may not be sufficient features, i.e. they
  may not guarantee 100% classification accuracy

Classification – Option 1
• Select the length of the fish as a possible feature for discriminating
  between the two classes, as in the sketch below

[Figure: histograms of the length feature for the two categories, with a
decision boundary between them]
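A sketch of picking a single-feature decision boundary from data; the length samples for the two classes are synthetic:

```python
import numpy as np

# Synthetic length samples for the two classes (salmon shorter on average).
salmon = np.random.default_rng(0).normal(10, 2, 200)
sea_bass = np.random.default_rng(1).normal(14, 2, 200)

def training_error(threshold):
    """Rule: predict sea bass if length > threshold, else salmon."""
    errors = np.sum(salmon > threshold) + np.sum(sea_bass <= threshold)
    return errors / (len(salmon) + len(sea_bass))

# Pick the boundary that minimizes the error on the training samples.
candidates = np.linspace(5, 20, 301)
best = candidates[np.argmin([training_error(t) for t in candidates])]
print(f"decision boundary at length ~ {best:.2f}")
```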

Cost of Taking a Decision


• A fish-packaging company uses the system to pack fish in cans
• Two facts:
  • People do not want to find sea bass in the cans labeled salmon
  • People occasionally accept finding salmon in the cans labeled sea bass

• So the cost of deciding in favor of sea bass when the true class is
  salmon is not the same as the cost of the converse

Evaluation of a classifier
• How do we evaluate a certain classifier?

• Classification error: the percentage of patterns (e.g. fish) that are
  assigned to the wrong category
  • Choose a classifier that gives the minimum classification error

• Risk: the total expected cost of decisions
  • Choose a classifier that minimizes the risk

Both measures are computed in the sketch below.
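A small sketch computing both measures; the labels and the asymmetric cost values are hypothetical:

```python
import numpy as np

y_true = np.array(["salmon", "salmon", "bass", "bass", "bass"])
y_pred = np.array(["salmon", "bass",   "bass", "salmon", "bass"])

# Classification error: fraction of patterns given the wrong category.
error = np.mean(y_true != y_pred)

# Risk: expected cost of decisions, with asymmetric (made-up) costs --
# putting sea bass into cans labeled salmon is the expensive mistake.
cost = {("bass", "salmon"): 5.0,   # true bass, decided salmon
        ("salmon", "bass"): 1.0}   # true salmon, decided bass
risk = np.mean([cost.get((t, p), 0.0) for t, p in zip(y_true, y_pred)])

print(error, risk)  # 0.4 and (1.0 + 5.0) / 5 = 1.2
```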

Classification – Option 2
• Select the average lightness of the fish as a possible feature for
  discriminating between the two classes

[Figure: histograms of the average lightness feature for the two categories]

Classification – Option 3
• Use both length and average lightness features, x = [x1, x2], for
  classification. Use a simple line to discriminate, as in the sketch below.

[Figure: the two features of lightness and width for sea bass and salmon;
the dark line might serve as a decision boundary of our classifier]
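A sketch of learning such a line with logistic regression (one of several linear classifiers; the lecture does not specify which); the feature values are toy data and scikit-learn is assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy (lightness, width) feature vectors x = [x1, x2] for the two species.
X = np.array([[3, 10], [4, 12], [5, 11], [8, 16], [9, 15], [10, 17]], float)
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = salmon, 1 = sea bass

# A linear classifier learns the line w1*x1 + w2*x2 + b = 0.
clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print(f"boundary: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print(clf.predict([[6.0, 13.0]]))  # which side of the line a new fish falls
```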

Classification – Option 3
• Use both length and average lightness features for classification.
  Use a complex model to discriminate.

Overly complex models for the fish will lead to complicated decision
boundaries. While such a boundary may achieve perfect classification
(classification error is zero) on our training samples, it would lead to
poor performance on future patterns (generalization is poor) → overfitting.

Comments
• Model selection
  • A complex model does not seem to be the correct one: it is learning
    the training data by heart.
  • So how do we choose the correct model? (a difficult question)
  • The Occam's Razor principle says “simpler models should be preferred
    over complex ones”.

• Generalization error
  • Minimizing the classification error on the training data does not
    guarantee minimizing the classification error on the test data
    (the generalization error).

Classification – Option 3
• Decision boundary with good generalization

The decision boundary shown might represent the optimal tradeoff between
performance on the training set and simplicity of the classifier.

RESOURCES


Resources - Journals
• Journal of Machine Learning Research
• Machine Learning
• Pattern Recognition
• Pattern Recognition Letters
• Neural Computation
• Neural Networks
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• ...

Resources – Conferences
• International Conference on Machine Learning (ICML)

• International Conference on Pattern Recognition (ICPR)

• European Conference on Machine Learning (ECML)

• ...


Acknowledgements
The material in these slides has been taken from the following sources:
• Dr. Imran Siddiqi, Bahria University, Islamabad
• Machine Intelligence, Dr. M. Hanif, UET, Lahore
• Machine Learning, S. Stock, University of Nebraska
• Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press
• Machine Learning, Andrew Ng, Stanford University
• Fisher kernels for image representation & generative classification models, Jakob Verbeek

Courtesy
• Slides are prepared using material from the websites of:
  • Jiawei Han, Micheline Kamber, and Jian Pei
    (University of Illinois at Urbana-Champaign & Simon Fraser University)

• Course slides: InfoLab, Stanford University

• Course slides: Purdue University

Thank You
