
INTRODUCTION TO

MACHINE LEARNING
Outline

Topics:
• The Machine Learning Framework
• What is Supervised Learning?
• What is Unsupervised Learning?

Reading Material:
 Chapter 18.1 to 18.2 in Russell & Norvig
 Reference Videos:
• Machine Learning course by Andrew Ng, Coursera
− What is Machine Learning [video]
− Introduction – Supervised Learning [video]
− Introduction – Unsupervised Learning [video]
• The Machine Learning Pipeline by Evan Sparks [video]
Problems with Traditional Approach

Input: an image → Complex, specific program → Output: “car”

Hand-coded rules:
if this local region looks like a door handle,
and
if this local region looks like a wheel,
then classify the image as a car

This will work if given the same image again but, given new images of cars, the
algorithm is expected to fail.

Problems:
- Static – cannot adapt to new input
- Complex – the program becomes unwieldy (many variations)
What is Machine Learning?

Learning = Improving with experience at some task

Arthur Samuel (1959)


 Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.

Tom Mitchell (1998)


 Well-posed Learning Problem:
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E.

The Machine Learning Framework

 Machine learning algorithms build models (hypothesis functions) to tackle
tasks
• Example: a straight line (linear classifier)
 A model can be adjusted by modifying its parameters.
• Example: adjusting the slope and bias of a straight line allows us to split the feature
space into two partitions
 The process is divided into two phases:
• Training phase
− Use training samples to learn the parameters of the model
• Testing phase
− Given a new sample, apply the model (learnt in the training phase) for
the intended task (regression, classification)
 Since the parameters are adjustable, we need not write custom algorithms for
different tasks. We simply need to train a new model for new problems or when
the environment changes.
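The straight-line example above can be sketched in a few lines of code. This is an illustrative sketch, not from the slides: the function name and parameter values are invented.

```python
# Illustrative sketch: a straight-line model whose adjustable
# parameters (slope and bias) split a 2-D feature space into two
# partitions. The training phase would search for good parameter
# values; the testing phase applies the fixed line to new samples.

def predict(x1, x2, slope, bias):
    """Classify a point by which side of the line x2 = slope*x1 + bias it lies on."""
    return 1 if x2 > slope * x1 + bias else 0

# Testing phase: apply the learnt line to new samples.
print(predict(1.0, 5.0, slope=2.0, bias=1.0))  # above the line -> 1
print(predict(1.0, 2.0, slope=2.0, bias=1.0))  # below the line -> 0
```

Changing the task only means finding new values of slope and bias, not writing a new program.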

A Standard Machine Learning Pipeline
Training Phase:
  Training Set and Evaluation Set (with Ground Truths, if required)
  → Data Preparation → Training → generates the Model

Testing Phase:
  Testing Set (with Ground Truths, if available)
  → Data Preparation → input to the Model → Predictions → Evaluation
Data Preparation

 Involves the following activities:


• Determine useful features or information to collect
• Collect samples for training and testing (can be very labor-intensive)
• Perform data cleaning and preprocessing
 Types of data:
• Text, numbers, clickstreams, images, videos, transactions, graphs,
tables, etc.

 Feature extraction:
 To automatically classify fishes (salmon or sea
bass) on a conveyor belt
 Useful features: lightness, width, number of fins,
shape of the fins, shape of fish
 Interested to know: classes of fishes on the
conveyor belt

 To perform digit classification in an image:


 Useful features: intensity value of each pixel in an
image
 Interested to know: the digits given an image

 To predict house prices:


 Useful features: size of house, age of house,
number of rooms, number of toilets, location, type
of house, population size of neighborhood, freehold
or leasehold, renovation status
 Interested to know: price of an unseen house

 In traditional machine learning, features are
hand-picked. Deep learning learns features
automatically from raw data.
 Perform data preprocessing:
o Data aggregation:
 Fusing data from multiple sources
o Data cleaning:
 Cleaning data to remove noise and duplicate observations
o Data transformation:
o Format conversion:
 Convert the data into the desired format, e.g., from free text into a vector
o Discretization and binarization:
 Required when a learning algorithm expects categorical attributes
o Feature creation:
 Creation of a new set of features from the original raw features
o Data reduction
o Feature subset selection:
 Selecting a subset of the features
o Dimensionality reduction:
 Removing features that are not useful; helps mitigate the curse of dimensionality
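Two of the steps above, discretization and binarization, can be sketched in plain Python. This is a hypothetical illustration; the function names and bin edges are invented.

```python
# Hypothetical sketch of two preprocessing steps: discretization
# (continuous value -> categorical bin) and binarization
# (categorical value -> 0/1 indicator features).

def discretize(value, bins):
    """Return the index of the first bin edge the value falls under."""
    for i, edge in enumerate(bins):
        if value < edge:
            return i
    return len(bins)

def binarize(category, categories):
    """One-hot encode a categorical attribute."""
    return [1 if category == c else 0 for c in categories]

print(discretize(7.3, bins=[3.0, 6.0, 9.0]))       # lightness 7.3 -> bin 2
print(binarize("salmon", ["salmon", "sea bass"]))  # -> [1, 0]
```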

Training

 Build a machine learning model to predict the value of a


particular attribute based on the values of other attributes
 There are four types of modeling tasks:
• Classification: if the target predictive variable is discrete
− Example: for manholes, will the manhole explode next year? Y/N

• Regression: if the target predictive variable is continuous


− Example: predict stock prices based on recent trends
• Clustering: if we want to group observations into similar-
looking groups
• Recommendation: if we want to recommend someone
an item, e.g., a book, movie or product based on rating
data from customers
 Example (classification):
• Categorize the fishes on the conveyor belt into salmon or sea bass. Let’s say our
model is a straight line that separates the two types of fishes (a linear classification
model)
• Find the best straight line (decision boundary) that partitions the feature space into
2 regions, one for each type of fish.
• Need to find the best parameter values (intercept and slope)

Sea Bass: width = 19.2, lightness = 7.3    Salmon: width = 19.3, lightness = 1.8
Sea Bass: width = 16.4, lightness = 7.6    Salmon: width = 17.3, lightness = 2.2
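On these four samples (values as read from the figure), even a decision boundary on the lightness feature alone separates the two classes. The threshold below is hand-chosen to illustrate the idea, not learnt.

```python
# Sketch: a decision boundary on the lightness feature. The
# threshold is hand-picked for illustration; training would
# search for it automatically.

samples = [
    ({"width": 19.2, "lightness": 7.3}, "sea bass"),
    ({"width": 19.3, "lightness": 1.8}, "salmon"),
    ({"width": 16.4, "lightness": 7.6}, "sea bass"),
    ({"width": 17.3, "lightness": 2.2}, "salmon"),
]

def classify(fish, threshold=5.0):
    return "sea bass" if fish["lightness"] > threshold else "salmon"

# The hand-chosen boundary classifies all four samples correctly.
assert all(classify(f) == label for f, label in samples)
```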

Testing

 Apply the model learnt in the training phase to new, unseen samples to
perform the intended task (classification, regression, clustering or
recommendation)
 Keep the testing samples separate from the training samples so that the
evaluation reflects performance on unseen data
Evaluation

 This step evaluates the performance of the learnt model

 How do you measure the quality of the result?


 Need to have ground-truths - may be hard to get

 Different kinds of performance measures are available, each with its own
pros and cons.
 Recall
 Precision
 F-measure
 Accuracy
 Confusion matrix
 etc.
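These measures can be computed directly from the predictions and the ground truths. The labels below are invented for illustration; the formulas are the standard binary-classification definitions.

```python
# Illustrative computation of the listed measures for a binary
# task, assuming ground-truth labels are available.

truth = [1, 1, 0, 1, 0, 0, 1, 0]  # invented ground truths
preds = [1, 0, 0, 1, 0, 1, 1, 0]  # invented model predictions

# Entries of the confusion matrix.
tp = sum(t == 1 and p == 1 for t, p in zip(truth, preds))
fp = sum(t == 0 and p == 1 for t, p in zip(truth, preds))
fn = sum(t == 1 and p == 0 for t, p in zip(truth, preds))
tn = sum(t == 0 and p == 0 for t, p in zip(truth, preds))

accuracy  = (tp + tn) / len(truth)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f_measure)  # 0.75 0.75 0.75 0.75
```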

Categories of Machine Learning Techniques

 Supervised Learning:
 Learning a model from labeled data

           Features                     Label
           length   width   weight
fruit 1    165      38      172         Banana
fruit 2    218      39      230         Banana
fruit 3    76       80      145         Orange
fruit 4    145      35      150         Banana
fruit 5    …        …       …           …

 Useful for tasks that predict the label/value of a certain attribute of an input sample
(classification/regression tasks)
 Example: Predict the type of a fruit (banana/orange) given its features (length, width and
weight)
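Using the four labeled fruits from the table above, a simple supervised learner can be sketched as a 1-nearest-neighbour classifier. The classifier choice and the query samples are illustrative assumptions.

```python
# Sketch: 1-nearest-neighbour classification of fruits from
# (length, width, weight), using the labeled rows of the table.

training = [
    ((165, 38, 172), "Banana"),
    ((218, 39, 230), "Banana"),
    ((76, 80, 145), "Orange"),
    ((145, 35, 150), "Banana"),
]

def predict(sample):
    """Return the label of the closest training sample (squared distance)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda t: dist(t[0], sample))[1]

print(predict((80, 75, 140)))   # closest to fruit 3 -> "Orange"
print(predict((150, 36, 160)))  # closest to fruit 4 -> "Banana"
```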

 Unsupervised Learning:
 Sometimes, the labels are not available
 Learning a model using features only without the labels
 Useful for grouping similar samples into multiple groups (clustering)
 Example: Given a group of fruits and their features (length, width, weight), cluster them into
different categories
Supervised Learning

 In supervised learning, the algorithm is given example input-output pairs
and learns a function that maps from input to output
 The input is the set of features used to describe the samples
 The output is the attribute (category or value) that we are interested to predict
 Types of supervised problems:

Classification                                Regression
Classification predicts discrete              Regression predicts continuous
valued output (e.g., present/not              valued output (e.g., house price)
present)

[Figures: object detection on images with cars, each labeled Yes/No for car
present; and housing price prediction, a plot of price (RM x1000, 0-400)
against size (0-2500).]

Object Detection (Images with Car)            Housing Price Prediction
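The regression side of this comparison can be sketched with ordinary least squares on a few (size, price) pairs. The data points below are invented and exactly linear, purely to illustrate the fit.

```python
# Sketch: fit price = slope * size + bias by ordinary least squares.
# The (size, price in RM x1000) pairs are invented for illustration.

data = [(500, 100), (1000, 200), (1500, 300), (2000, 400)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
        sum((x - mean_x) ** 2 for x, _ in data)
bias = mean_y - slope * mean_x

# Predict the (continuous) price of an unseen 1250 sq ft house.
print(slope * 1250 + bias)  # 250.0
```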


Classification: Example

 Digit Classification
• Input: images / pixel grids
• Output: a digit 0-9
• Setup:
− Get a large collection of example images, each labeled with a
digit
− Note: someone has to hand label all this data!
− Want to learn to predict labels of new, future digit images
• Features:
− The attributes used to make the digit decision
− Pixels: (6,8)=ON
− Shape Patterns: NumComponents, AspectRatio, NumLoops
− …

[Figure: example input images with outputs 0, 1, 2, 1, and an unseen image
with output “??”.]
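The pixel features can be illustrated on a tiny binary grid: each (row, column) position becomes an on/off feature of the kind a digit classifier would consume. The 4x4 grid below is invented.

```python
# Hypothetical illustration of pixel features: flatten a tiny
# binary image into (row, col) -> on/off features.

grid = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]  # a crude "0"

features = {(r, c): grid[r][c] for r in range(4) for c in range(4)}
print(features[(0, 1)])        # pixel (0,1) is ON -> 1
print(sum(features.values()))  # 8 pixels ON in total
```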

Classification: Example

Spam mail classification
 Input: an email
 Output: spam or non-spam
 Setup:
• Get a large collection of example emails, each labeled “spam” or “non-spam”
• Note: someone has to hand label all this data!
• Want to learn to predict labels of new, future emails
 Features: The attributes used to make the spam or non-spam decision
• Words: FREE!
• Text Patterns: $dd, CAPS
• Non-text: SenderInContacts
• …

Example emails (spam):
“Dear Sir. First, I must solicit your confidence in this transaction, this is
by virture of its nature as being utterly confidencial and top secret. …”
“TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT
"REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99”

Example email (non-spam):
“Ok, I know this is blatantly OT but I'm beginning to go insane. Had an old
Dell Dimension XPS sitting in the corner and decided to put it to use, I know
it was working pre being stuck in the corner, but when I plugged it in, hit
the power nothing happened.”
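The three feature types listed above can be sketched as one extraction function. The function name, feature names, and example inputs are invented for illustration.

```python
import re

# Hypothetical sketch of extracting the listed feature types:
# a word feature, a text pattern ($dd), and a non-text signal
# (whether the sender is in the contact list).

def extract_features(email, sender, contacts):
    text = email.lower()
    return {
        "has_free": "free" in text,
        "has_dollar_amount": bool(re.search(r"\$\d\d", email)),
        "sender_in_contacts": sender in contacts,
    }

feats = extract_features("CAPS FOR ONLY $99, FREE!", "x@spam.example", {"friend@mail.example"})
print(feats)
```

A classifier would then learn, from the labeled examples, how much each feature pushes an email toward spam or non-spam.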

Regression: Example

Predicting number of comments for blog post


 https://archive.ics.uci.edu/ml/datasets/BlogFeedback
 Input: blog posts
 Output: number of comments received for a post in the next 24 hours
 Setup:
• Crawl raw HTML documents of blogs that were posted at most 72 hours before a selected basetime.
For each blog, collect the number of comments received in the next 24 hours relative to the
basetime
• Collect from different base dates/times.
• Ensure the train and test splits are temporally disjoint (Training set: 2010, 2011; Test set: 2012) to
simulate the real-world situation where training data from the past is used to predict events in the
future

 Features: Attributes extracted from the blog posts


• Total number of comments before basetime (C1)
• Number of comments in the last 24 hours before the basetime (C2)
• Number of comments between 24 and 48 hours before basetime (C3)
• Difference between C2 and C3
• The length of the blog post
• Bag of words for 200 frequent words of the text of the blog post
• Day of post
• …
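Assembling the comment-count features above into a vector for a single post might look like the sketch below. The raw counts are invented; the C1/C2/C3 naming follows the feature list.

```python
# Sketch: build a feature vector for one (invented) blog post from
# the comment-count features described above.

post = {
    "comments_total_before_basetime": 40,  # C1
    "comments_last_24h": 12,               # C2
    "comments_24_to_48h": 5,               # C3
    "length": 830,                         # length of the blog post
}

features = [
    post["comments_total_before_basetime"],
    post["comments_last_24h"],
    post["comments_24_to_48h"],
    post["comments_last_24h"] - post["comments_24_to_48h"],  # C2 - C3
    post["length"],
]
print(features)  # [40, 12, 5, 7, 830]
```

A regression model would map such vectors to the (continuous) number of comments in the next 24 hours.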
Unsupervised Learning

• Unsupervised learning involves learning patterns when the training
samples are provided without outputs (no teacher)
• Uses a similarity measure to detect groupings (clustering)

Supervised Learning: positive and negative samples are given in the (x1, x2)
feature space, and we want to learn the decision boundary that separates them.
Learn the hypothesis function for the task based on the features of the
training samples and their labels.

Unsupervised Learning: only unlabeled samples are given in the (x1, x2)
feature space. Discover the underlying structure, relationships or patterns
based only on the features of the training samples.
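A minimal k-means sketch (k = 2, with invented 2-D points) shows how a distance measure alone can recover such groupings without any labels:

```python
# Minimal k-means sketch: assign each point to its nearest center,
# then move each center to the mean of its cluster, and repeat.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return clusters

pts = [(1, 1), (1.5, 2), (8, 8), (9, 9.5)]
print(kmeans(pts, centers=[(0, 0), (10, 10)]))
# -> [[(1, 1), (1.5, 2)], [(8, 8), (9, 9.5)]]
```

The two groups emerge purely from the similarity of the feature values; no labels were used.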
Example: News Search

 Unsupervised learning is used by Google news to cluster similar news stories

A news article by CNN, and related news discovered
through clustering

Example: Gene clustering

 Understanding genomics by finding clusters of people who have or do not have a
certain type of gene

[Figure: heat map of genes (rows) against individuals (columns), clustered
into Groups 1-8.]

[Source: Daphne Koller]
Other Applications

Other classification tasks:
• Spam detection (input: document, classes: spam / ham)
• OCR (input: images, classes: characters)
• Medical diagnosis (input: symptoms, classes: diseases)
• Automatic essay grading (input: document, classes: grades)
• Fraud detection (input: account activity, classes: fraud / no fraud)
• … many more

Other regression tasks:
• Sociology (input: pay, qualifications, output: measure of social status of
various occupations)
• Economics (input: family’s income, number of children in family, output:
family consumption expenditure)
• Political science (input: measures of public opinion, institutional
variables, output: state’s level of welfare spending)
• … many more

When to apply machine learning?

 Problem size is too vast for our limited reasoning capacity
(e.g., large datasets from the growth of automation and the web, such as web click
data, medical records, biology)
 Applications that cannot be programmed by hand where humans are unable to
explain their expertise
(e.g., Autonomous helicopter, handwriting recognition, most of Natural Language
Processing (NLP), Computer Vision)
 Solution changes with time
(e.g., tracking, preferences)
 Self-customizing programs
(e.g., Amazon, Netflix product recommendations)
 Understanding human learning
(brain, real AI)

