10/3/2022

CS-471
Machine Learning
Dr. Hammad Afzal
hammad.afzal@mcs.edu.pk

Assoc Prof (NUST)


Data and Text Processing Lab
www.codteem.com

Resources

• Lecture slides will be available on LMS
• Additional references shall be provided (if any)

• Mid Terms: 30%
• Quizzes: 10%
  • Total 4, all announced
  • Best 3 will be considered
• Assignment: 10%
• Semester Project
  • Syndicate members: 1-3
  • Will be announced after the 4th week


Course Intro
• Pre-requisites
  • Introductory knowledge of Probability, Statistics, and Linear Algebra

• Course meeting times
  • Lectures: 2 sessions/week (Monday & Thursday)

• Course resources
  • Lecture slides, assignments (computer/written), solutions to
    problems, research papers, projects, and announcements will be
    uploaded on the LMS page.


What is Machine Learning?

Make the machine ‘learn’ something.
Evaluate how good the machine has ‘learned’.

Machine Learning

“Field of study that gives computers the ability to learn
without being explicitly programmed.”

Arthur Samuel (1959)


Machine Learning

“Machine learning is programming computers to optimize a
performance criterion using example data or past experience.”

Ethem Alpaydin

Machine Learning
• Learning = Improving with experience over some task

“A computer program is said to learn from experience E with
respect to some task T and performance measure P, if its
performance at task T, as measured by P, improves with
experience E.”

Tom Mitchell (1997)

Learning Problems – Examples


• Learning = Improving with experience over some task
  • Improve over task T,
  • With respect to performance measure P,
  • Based on experience E.
• Example
  • T = play checkers
  • P = % of games won in a tournament
  • E = opportunity to play

Learning Problems – Examples


• Handwriting recognition learning problem
  • Task T: recognizing handwritten words within images
  • Performance measure P: percent of words correctly recognized
  • Training experience E: a database of handwritten words with
    given classifications

Learning Problems – Examples


• A robot driving learning problem
  • Task T: driving on public four-lane highways using vision sensors
  • Performance measure P: average distance traveled before an error
    (as judged by a human overseer)
  • Training experience E: a sequence of images and steering commands
    recorded while observing a human driver

Machine Learning
• Nicolas learns about apples and oranges


Machine Learning
• But will he recognize others?

So learning involves the ability to generalize from labeled examples.

Machine Learning
• There is no need to “learn” to calculate payroll

• Learning is used in:
  • Data mining programs that learn to detect fraudulent credit card transactions
  • Programs that learn to filter spam email
  • Programs that learn to play checkers/chess
  • Autonomous vehicles that learn to drive on public highways
  • Self-customizing programs
  • And many more…


Applications

Credit Scoring
• Differentiating between low-risk and high-risk customers
  from their income and savings

Discriminant:
  IF income > θ1 AND savings > θ2
  THEN low-risk
  ELSE high-risk
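A minimal sketch of this discriminant rule in Python; the threshold values θ1 and θ2 below are made-up placeholders, not values from the lecture:

```python
def credit_risk(income: float, savings: float,
                theta1: float = 30_000, theta2: float = 10_000) -> str:
    """Two-threshold discriminant from the slide. The theta values
    here are made-up placeholders, not thresholds learned from data."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 12_000))  # low-risk
print(credit_risk(45_000,  5_000))  # high-risk
```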


Applications

Autonomous driving
• ALVINN* – drives at 70 mph on highways

*ALVINN: Autonomous Land Vehicle In a Neural Network

Face recognition
Training examples of a person

Test images

AT&T Laboratories, Cambridge UK


http://www.uk.research.att.com/facedatabase.html


Template matching
• Problem: Recognize letters A to Z

The image is converted into a 12x12 bitmap.

Template Matching
The bitmap is represented by a 12x12 matrix, or by a 144-vector
with 0/1 coordinates:
0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 1 0 1 1 0 0 0
0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 1 1 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 1 1


Template matching
Training samples – templates with corresponding class:
t1 = {(0,0,0,0,1,1,...,0), 'A'}
t2 = {(0,0,0,0,0,1,...,0), 'A'}
...
tk = {(0,0,1,1,1,1,...,0), 'B'}
...
Template of the image to be recognized:
T = {(0,0,0,0,1,1,...,0), 'A?'}

Algorithm:
1. Find ti such that ti = T.
2. Assign the image to the same class as ti.
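A small sketch of this exact-match algorithm in Python; toy 2x2 bitmaps stand in for the 144-element templates on the slide:

```python
def classify_by_template(T, templates):
    """Exact-match template classification as on the slide: scan the
    stored (bitmap, label) pairs and return the label of the first
    template identical to the query bitmap T, or None if none match."""
    for bitmap, label in templates:
        if bitmap == T:
            return label
    return None

# Toy 2x2 bitmaps stand in for the 144-element templates.
templates = [((0, 0, 1, 1), 'A'),
             ((1, 1, 0, 0), 'B')]
print(classify_by_template((0, 0, 1, 1), templates))  # 'A'
```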

Template Matching

Number of templates to store: 2^144

If fewer templates are stored, some images might not be recognized.

Improvements?


Features
• Features are the individual measurable properties of the signal being observed.

• The set of features used for learning/recognition is called the feature vector.

• The number of features used is the dimensionality of the feature vector.

• n-dimensional feature vectors can be represented as points in an
  n-dimensional feature space, as in the sketch below.
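A tiny illustration of feature vectors as rows of a matrix, i.e. points in feature space; the fruit features (weight, redness) are hypothetical:

```python
import numpy as np

# Each row is one object's feature vector; each column is one feature
# (two hypothetical features here: weight in grams and redness in [0, 1]).
X = np.array([[150.0, 0.9],    # apple
              [170.0, 0.3],    # orange
              [140.0, 0.8]])   # apple
print(X.shape)  # (3, 2): three points in a 2-dimensional feature space
```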

Features
[Figure: scatter plot in a 2-D feature space with axes height and weight;
each point is a feature vector x = (x1, x2)ᵀ, labeled Class 1 or Class 2]

Feature Extraction
• Feature extraction aims to create discriminative features that are good for learning
• Good features:
  • Objects from the same class have similar feature values.
  • Objects from different classes have different values.

[Figure: “Good” features (well-separated classes) vs. “Bad” features (overlapping classes)]

Features
• Use fewer features if possible
• Use features that differentiate classes well


Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

CLASSIFICATION


Supervised learning - Classification


• Objective
  • Make Nicolas recognize what is an apple and what is an orange

Classification

[Figure: training examples of apples and oranges]

Classification
• You had some training examples, or ‘training data’

• The examples were ‘labeled’

• You used those examples to make the kid ‘learn’ the
  difference between an apple and an orange

[Figure: a new fruit is shown — “What is this???” — and Nicolas
answers “It’s an apple!!!”]

Classification
[Figure: labeled training images — Apple, Pear, Tomato, Cow, Dog, Horse]

Given: training images and their categories.
What are the categories of these test images?

Classifier: identify the class of a given pattern

Distance between Feature Vectors
• Instead of finding a template that exactly matches the input,
  look at how close the feature vectors are
• Nearest neighbor classification algorithm:
  1. Find the template closest to the input pattern.
  2. Classify the pattern to the same class as the closest template.

[Figure: 2-D feature space with Class 1 and Class 2 points; a query
point is assigned to the class of its nearest neighbor]

Classifier

K-Nearest Neighbor Classifier
• Use the k nearest neighbors, instead of only the single nearest one,
  to classify a pattern. A minimal sketch follows.

[Figure: Class 1 and Class 2 points; the query is labeled by majority
vote among its k nearest neighbors]
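A minimal sketch of the k-nearest-neighbor rule (with k = 1 it reduces to the nearest-neighbor classifier of the previous slide); the 2-D training points are toy data:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """k-nearest-neighbor rule: find the k stored feature vectors
    closest to x (Euclidean distance) and return the majority class
    among them. With k=1 this is the nearest-neighbor classifier."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every template
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy 2-D feature vectors: Class 1 near the origin, Class 2 near (5, 5).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], float)
y_train = np.array([1, 1, 1, 2, 2, 2])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # -> 1
```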

Classifier
A classifier partitions the feature space X into class-labeled regions such that:

X = X₁ ∪ X₂ ∪ … ∪ X_|Y|   and   X₁ ∩ X₂ ∩ … ∩ X_|Y| = ∅

[Figure: a feature space partitioned into regions labeled X1, X2, X3]

Classification consists of determining to which region a feature vector x belongs.
Borders between regions are called decision boundaries.

Classification
• Cancer Diagnosis – tumor size as the only feature for prediction

[Figure: 1-D plot of tumor size with cases labeled Malignant or Benign]

Classification
• Cancer Diagnosis – generally more than one variable

[Figure: scatter plot of Age vs. Tumor Size, points labeled Malignant or Benign]

Why supervised? The algorithm is given a number of patients with the
RIGHT ANSWER, and we want the algorithm to learn to predict for new patients.

Classification
• Cancer Diagnosis – generally more than one variable

[Figure: the same Age vs. Tumor Size plot, with a new unlabeled patient to predict]

We want the algorithm to learn the separating line. Once a new patient
arrives with a given age and tumor size, predict Malignant or Benign.

Supervised Learning – Example

• Cancer diagnosis – many more features

  Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
  1          | 5           | 20       | 118         | Malignant
  2          | 3           | 15       | 130         | Benign
  3          | 7           | 10       | 52          | Benign
  4          | 2           | 30       | 100         | Malignant

Use this training set to learn how to classify patients whose diagnosis
is not known (the first four columns are the input data, the last column
is the classification):

  Patient ID | # of Tumors | Avg Area | Avg Density | Diagnosis
  101        | 4           | 16       | 95          | ?
  102        | 9           | 22       | 125         | ?
  103        | 1           | 14       | 80          | ?
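As one illustration (the lecture does not fix a particular classifier here), a 1-nearest-neighbor rule can be run on this toy table; the printed diagnoses are just what the sketch outputs, not medical ground truth:

```python
import numpy as np

# Training set from the slide: (# of tumors, avg area, avg density).
X_train = np.array([[5, 20, 118], [3, 15, 130], [7, 10, 52], [2, 30, 100]], float)
y_train = np.array(["Malignant", "Benign", "Benign", "Malignant"])

# Standardize each column so no single feature dominates the distance.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mu) / sigma

def predict(x):
    """1-nearest-neighbor prediction for a new patient."""
    z = (np.asarray(x, float) - mu) / sigma
    return y_train[np.argmin(np.linalg.norm(Z_train - z, axis=1))]

for patient in ([4, 16, 95], [9, 22, 125], [1, 14, 80]):  # patients 101-103
    print(patient, "->", predict(patient))
```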

Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

Course Outline
• Machine Learning: Theory and Applications
• Introduction to probability theory and Linear Algebra
• Bayesian Decision Theory
• Parametric Methods
• Dimensionality Reduction
• Frequent Pattern Analysis
• Clustering
• Decision Trees
• Artificial neural networks
• Advanced topics in Machine Learning: HMMs, Support Vector Machines (SVMs), …

REGRESSION


Regression
CLASSIFICATION
The variable we are trying to predict is
DISCRETE

REGRESSION
The variable we are trying to predict is
CONTINUOUS


Regression
• Dataset giving the living areas and prices of 50 houses

[Table: living areas and prices of the houses]

Regression
• We can plot this data

[Figure: scatter plot of price vs. living area]

Given data like this, how can we learn to predict the prices of other
houses as a function of the size of their living areas?

Regression
• The “input” variables: x(i) (living area in this example)
• The “output” or target variable that we are trying to predict: y(i) (price)
• A pair (x(i), y(i)) is called a training example
• A list of m training examples {(x(i), y(i)); i = 1, …, m} is called a training set
• X denotes the space of input values, and Y the space of output values

Regression
Given a training set, the goal is to learn a function h : X → Y so that
h(x) is a “good” predictor for the corresponding value of y. For
historical reasons, this function h is called a hypothesis.
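A minimal sketch of a linear hypothesis h fit by least squares; the (living area, price) pairs are hypothetical:

```python
import numpy as np

# Hypothetical training set: living area (sq ft) -> price ($1000s).
x = np.array([1416, 1600, 2104, 2400, 3000], dtype=float)
y = np.array([232, 330, 400, 369, 540], dtype=float)

# Least-squares fit of the linear hypothesis h(x) = theta0 + theta1 * x.
theta1, theta0 = np.polyfit(x, y, deg=1)

def h(living_area):
    """The learned hypothesis h : X -> Y."""
    return theta0 + theta1 * living_area

print(round(h(1800)))  # predicted price for an unseen 1800 sq ft house
```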


Regression
• Example: price of a used car
  • x: car attributes
  • y: price

Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

CLUSTERING


UNSUPERVISED LEARNING
• CLUSTERING

There are two types of fruit in the basket; separate them into two ‘groups’.

UNSUPERVISED LEARNING
• CLUSTERING
• The data was not ‘labeled’: you did not tell Nicolas which are apples
  and which are oranges

• Maybe the kid used the idea that things in the same group should be
  similar to one another compared to things in the other group

• Groups = Clusters

[Figure: the fruit separated into two groups, or clusters]

Clustering

[Figure: unlabeled scatter plot of Age vs. Tumor Size forming two groups]

We have the data for patients but NOT the RIGHT ANSWERS. The objective
is to find interesting structure in the data (in this case, two clusters).
A sketch of one such algorithm, k-means, follows.
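A plain k-means sketch (one clustering algorithm among many; the lecture has not named one here). The (age, tumor size) values are made up, and no empty-cluster handling is included:

```python
import numpy as np

def kmeans(X, k=2, iters=20, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its points.
    Toy sketch: no empty-cluster handling or convergence test."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Made-up (age, tumor size) measurements with no diagnoses attached.
X = np.array([[30, 2], [35, 3], [32, 2.5], [60, 8], [65, 9], [62, 8.5]])
print(kmeans(X, k=2))  # two discovered groups, e.g. [0 0 0 1 1 1]
```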

Unsupervised Learning – Cocktail Party Effect


• Speakers recorded speaking simultaneously


Unsupervised Learning – Cocktail Party Effect


• Source separation
  • The data can be explained by two different speakers speaking
    simultaneously – the ICA algorithm separates them

Source: http://cnl.salk.edu/~tewon/Blind/blind_audio.html
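A sketch of source separation with scikit-learn's FastICA, one implementation of the ICA idea (scikit-learn availability and the synthetic "recordings" are assumptions, not part of the lecture):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic source signals (a sine and a square wave) ...
t = np.linspace(0, 1, 1000)
sources = np.c_[np.sin(2 * np.pi * 5 * t), np.sign(np.sin(2 * np.pi * 3 * t))]

# ... mixed into two "microphone" recordings.
mixing = np.array([[1.0, 0.5],
                   [0.5, 1.0]])
recordings = sources @ mixing.T

# FastICA recovers the independent sources, up to order and scale.
recovered = FastICA(n_components=2, random_state=0).fit_transform(recordings)
print(recovered.shape)  # (1000, 2): one column per separated speaker/signal
```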


Classification vs Clustering

• Challenges
  • Intra-class variability
  • Inter-class similarity

Intra-class variability

[Figure: the letter “T” in different typefaces]
[Figure: the same face under different expression, pose, and illumination]

Inter-class similarity

[Figure: characters that look similar]
[Figure: identical twins]

Contents
• Supervised learning
  • Classification
  • Regression

• Unsupervised learning

• Reinforcement learning

REINFORCEMENT LEARNING

Reinforcement Learning
• In RL, the computer is simply given a goal to achieve.
• The computer then learns how to achieve that goal by trial-and-error
  interactions with its environment

The system learns from success and failure, reward and punishment.

Reinforcement Learning
• Similar to training a pet dog

• Every time the dog does something good, you pat him and say ‘good dog’

• Every time the dog does something bad, you scold him, saying ‘bad dog’

• Over time, the dog will learn to do good things

Learning to Ride a Bicycle


• Goal given to the RL system: ride the bicycle without falling over

• The RL system begins riding the bicycle and performs a series of actions
  that result in the bicycle being tilted 45 degrees to the right

• At this point two actions are possible: turn the handlebars left or turn them right

• The RL system turns the handlebars to the left, immediately crashes to the
  ground, and receives a negative reinforcement

• The RL system has just learned not to turn the handlebars left when tilted
  45 degrees to the right

Learning to Ride a Bicycle


• The RL system turns the handlebars to the RIGHT
• Result: CRASH!!!
• It receives a negative reinforcement

• The RL system has learned that the “state” of being tilted 45 degrees
  to the right is bad. A toy sketch of this kind of value update follows.
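A toy tabular value-update sketch in the spirit of this example; it is not a real bicycle simulator, and the single state, actions, and reward are invented for illustration:

```python
import random

# One invented "state" (tilted 45 degrees right) with two actions,
# both of which end in a crash and a reward of -1.
Q = {("tilt45R", "left"): 0.0, ("tilt45R", "right"): 0.0}
alpha = 0.5  # learning rate

for _ in range(10):
    action = random.choice(["left", "right"])   # trial and error
    reward = -1.0                               # crash -> punishment
    key = ("tilt45R", action)
    Q[key] += alpha * (reward - Q[key])         # terminal update, no next state

print(Q)  # both action values drift toward -1: the state itself is "bad"
```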


A Fancy PR Example

A (Simplified) PR System

Two modes:

Classification mode:
  test pattern → Preprocessing → Feature Measurement → Classification

Training mode:
  training pattern → Preprocessing → Feature Extraction/Selection → Learning

A Fancy Problem
Sorting incoming fish on a conveyor according to species
(salmon or sea bass) using optical sensing.

Salmon or sea bass? (2 categories or classes)

It is a classification problem. How do we solve it?

Approach
Data collection: take some images using an optical sensor.

Approach
• Data collection

• Preprocessing: use a segmentation operation to isolate fish from one
  another and from the background

• Information from a single fish is sent to a feature extractor, whose
  purpose is to reduce the data by measuring certain features

• The features are passed to a classifier that evaluates the evidence
  and then takes a final decision

Approach
• Set up a camera and take some sample images to extract features:
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.

• This is the set of all suggested features to explore for use in our classifier!

How data is collected & used


• Data can be raw signals (e.g. images) or features extracted from them
• The data is divided into three parts (the exact percentage of each
  portion depends, partially, on the sample size):

  Train | Validation | Test

• Train data: used to build a prediction model or learner (classifier)

• Validation data: used to estimate the prediction error (classification
  error) and adjust the learner's parameters

• Test data: used to estimate the classification error of the chosen
  learner on unseen data, called the generalization error. The test set
  must be kept inside a ‘vault’ and be brought out only at the end of
  the data analysis. A sketch of such a split follows.
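A sketch of a 60/20/20 split using scikit-learn's train_test_split; the proportions and dummy data are illustrative, since the real split depends on the dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(100, 1), np.arange(100) % 2  # dummy data

# First carve off 40%, then split that half-and-half: 60/20/20 overall.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4,
                                                  random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                                random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```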

Pre-processing
• If the data is an image, then apply image processing
• What is an image?
  • A grayscale image z = f(x, y) is composed of pixels, where x and y
    are the location of the pixel and z is its intensity
  • An image can be considered just a matrix of certain dimensions:

    A = [ a11 … a1n ]
        [  ⋮  ⋱  ⋮  ]
        [ am1 … amn ]

[Figure: an image divided into 8x8 blocks]
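A short numpy illustration of an image as a matrix, divided into 8x8 blocks as in the figure; the random image is a stand-in for real data:

```python
import numpy as np

# A grayscale image is just a matrix: z = f(x, y) at each pixel location.
img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

# Split it into non-overlapping 8x8 blocks, as in the figure.
blocks = img.reshape(8, 8, 8, 8).swapaxes(1, 2)
print(blocks.shape)   # (8, 8, 8, 8): an 8x8 grid of 8x8 blocks
print(blocks[0, 0])   # the top-left 8x8 block
```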

Feature extraction
• Feature extraction: use domain knowledge
  • A sea bass is generally longer than a salmon
  • The average lightness of sea bass scales is greater than that of salmon

• We will use training data to learn a classification rule based on these
  features (length of the fish and average lightness)

• Length and average lightness may not be sufficient features, i.e. they
  may not guarantee 100% classification accuracy

Classification – Option 1
• Select the length of the fish as a possible feature for discriminating
  between the two classes, as in the sketch below

[Figure: histograms of the length feature for the two categories, with a
decision boundary between them]
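A sketch of picking a single-feature decision boundary from data; the length samples for the two classes are synthetic:

```python
import numpy as np

# Synthetic length samples for the two classes (salmon shorter on average).
salmon = np.random.default_rng(0).normal(10, 2, 200)
sea_bass = np.random.default_rng(1).normal(14, 2, 200)

def training_error(threshold):
    """Rule: predict sea bass if length > threshold, else salmon."""
    errors = np.sum(salmon > threshold) + np.sum(sea_bass <= threshold)
    return errors / (len(salmon) + len(sea_bass))

# Pick the boundary that minimizes the error on the training samples.
candidates = np.linspace(5, 20, 301)
best = candidates[np.argmin([training_error(t) for t in candidates])]
print(f"decision boundary at length ~ {best:.2f}")
```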

Cost of Taking a Decision


• A fish-packaging company uses the system to pack fish in cans
• Two facts:
  • People do not want to find sea bass in the cans labeled salmon
  • People occasionally accept finding salmon in the cans labeled sea bass

• So the cost of deciding in favor of sea bass when the true class is
  salmon is not the same as the cost of the converse

Evaluation of a classifier
• How do we evaluate a certain classifier?

• Classification error: the percentage of patterns (e.g. fish) that are
  assigned to the wrong category
  • Choose a classifier that gives the minimum classification error

• Risk: the total expected cost of decisions
  • Choose a classifier that minimizes the risk

Both measures are computed in the sketch below.
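A small sketch computing both measures; the labels and the asymmetric cost values are hypothetical:

```python
import numpy as np

y_true = np.array(["salmon", "salmon", "bass", "bass", "bass"])
y_pred = np.array(["salmon", "bass",   "bass", "salmon", "bass"])

# Classification error: fraction of patterns given the wrong category.
error = np.mean(y_true != y_pred)

# Risk: expected cost of decisions, with asymmetric (made-up) costs --
# putting sea bass into cans labeled salmon is the expensive mistake.
cost = {("bass", "salmon"): 5.0,   # true bass, decided salmon
        ("salmon", "bass"): 1.0}   # true salmon, decided bass
risk = np.mean([cost.get((t, p), 0.0) for t, p in zip(y_true, y_pred)])

print(error, risk)  # 0.4 and (1.0 + 5.0) / 5 = 1.2
```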

Classification – Option 2
• Select the average lightness of the fish as a possible feature for
  discriminating between the two classes

[Figure: histograms of the average lightness feature for the two categories]

Classification – Option 3
• Use both length and average lightness features, x = [x1, x2], for
  classification. Use a simple line to discriminate, as in the sketch below.

[Figure: the two features of lightness and width for sea bass and salmon;
the dark line might serve as a decision boundary of our classifier]
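A sketch of learning such a line with logistic regression (one of several linear classifiers; the lecture does not specify which); the feature values are toy data and scikit-learn is assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy (lightness, width) feature vectors x = [x1, x2] for the two species.
X = np.array([[3, 10], [4, 12], [5, 11], [8, 16], [9, 15], [10, 17]], float)
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = salmon, 1 = sea bass

# A linear classifier learns the line w1*x1 + w2*x2 + b = 0.
clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print(f"boundary: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print(clf.predict([[6.0, 13.0]]))  # which side of the line a new fish falls
```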

Classification – Option 3
• Use both length and average lightness features for classification.
  Use a complex model to discriminate.

Overly complex models for the fish will lead to complicated decision
boundaries. While such a boundary may achieve perfect classification
(classification error is zero) on our training samples, it would lead to
poor performance on future patterns (generalization is poor) → overfitting.

Comments
• Model selection
  • A complex model does not seem to be the correct one: it is learning
    the training data by heart.
  • So how do we choose the correct model? (a difficult question)
  • The Occam's Razor principle says “simpler models should be preferred
    over complex ones”.

• Generalization error
  • Minimizing the classification error on the training data does not
    guarantee minimizing the classification error on the test data
    (the generalization error).

Classification – Option 3
• Decision boundary with good generalization

The decision boundary shown might represent the optimal tradeoff between
performance on the training set and simplicity of the classifier.

RESOURCES


Resources - Journals
• Journal of Machine Learning Research
• Machine Learning
• Pattern Recognition
• Pattern Recognition Letters
• Neural Computation
• Neural Networks
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• ...

Resources – Conferences
• International Conference on Machine Learning (ICML)

• International Conference on Pattern Recognition (ICPR)

• European Conference on Machine Learning (ECML)

• ...


Acknowledgements
The material in these slides has been taken from the following sources:
• Dr. Imran Siddiqi, Bahria University, Islamabad
• Machine Intelligence, Dr. M. Hanif, UET, Lahore
• Machine Learning, S. Stock, University of Nebraska
• Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press
• Machine Learning, Andrew Ng, Stanford University
• Fisher kernels for image representation & generative classification models, Jakob Verbeek

Courtesy
• Slides are prepared using material from the websites of:
  • Jiawei Han, Micheline Kamber, and Jian Pei
    (University of Illinois at Urbana-Champaign & Simon Fraser University)

• Course slides: InfoLab, Stanford University

• Course slides: Purdue University

Thank You
