
INTRODUCTION TO

MACHINE LEARNING
Outline

Topics:
• The Machine Learning Framework
• What is Supervised Learning?
• What is Unsupervised Learning?

Reading Material:
 Chapter 18.1 to 18.2 in Russell & Norvig
 Reference Videos:
• Machine Learning course by Andrew Ng, Coursera
− What is Machine Learning [video]
− Introduction – Supervised Learning [video]
− Introduction – Unsupervised Learning [video]
• The Machine Learning Pipeline by Evan Sparks [video]
Problems with Traditional Approach

Input: an image → Complex, specific program → Output: “car”

Hand-coded rules:
if this local region looks like a door handle,
and
if this local region looks like a wheel,
then classify the image as a car

This will work if given the same image again but, given new images of cars, the
algorithm is expected to fail.

Problems:
- Static – cannot adapt to new input
- Complex – the program becomes unwieldy (many variations)
What is Machine Learning?

Learning = Improving with experience at some task

Arthur Samuel (1959)


 Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.

Tom Mitchell (1998)


 Well-posed Learning Problem:
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E.

The Machine Learning Framework

 Machine learning algorithms build models (hypothesis functions) to tackle
tasks
• Example: a straight line (linear classifier)
 A model can be adjusted by modifying its parameters.
• Example: adjusting the slope and bias of a straight line allows us to split the feature
space into two partitions
 The process is divided into two phases:
• Training phase
− Use training samples to learn the parameters of the model
• Testing phase
− Given a new sample, apply the model (learnt in the training phase) for
the intended task (regression, classification)
 Since the parameters are adjustable, we need not write custom algorithms for
different tasks. We simply need to train a new model for new problems or when
the environment changes.
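The straight-line example above can be sketched in a few lines of code. This is an illustrative sketch, not from the slides: the function name and parameter values are invented.

```python
# Illustrative sketch: a straight-line model whose adjustable
# parameters (slope and bias) split a 2-D feature space into two
# partitions. The training phase would search for good parameter
# values; the testing phase applies the fixed line to new samples.

def predict(x1, x2, slope, bias):
    """Classify a point by which side of the line x2 = slope*x1 + bias it lies on."""
    return 1 if x2 > slope * x1 + bias else 0

# Testing phase: apply the learnt line to new samples.
print(predict(1.0, 5.0, slope=2.0, bias=1.0))  # above the line -> 1
print(predict(1.0, 2.0, slope=2.0, bias=1.0))  # below the line -> 0
```

Changing the task only means finding new values of slope and bias, not writing a new program.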

A Standard Machine Learning Pipeline
Training Phase:
  Training Set and Evaluation Set (with Ground Truths, if required)
  → Data Preparation → Training → generates the Model

Testing Phase:
  Testing Set (with Ground Truths, if available)
  → Data Preparation → input to the Model → Predictions → Evaluation
Data Preparation

 Involves the following activities:


• Determine useful features or information to collect
• Collect samples for training and testing (can be very labor-intensive)
• Perform data cleaning and preprocessing
 Types of data:
• Text, numbers, clickstreams, images, videos, transactions, graphs,
tables, etc.

 Feature extraction:
 To automatically classify fishes (salmon or sea
bass) on a conveyor belt
 Useful features: lightness, width, number of fins,
shape of the fins, shape of fish
 Interested to know: classes of fishes on the
conveyor belt

 To perform digit classification in an image:


 Useful features: intensity value of each pixel in an
image
 Interested to know: the digits given an image

 To predict house prices:


 Useful features: size of house, age of house,
number of rooms, number of toilets, location, type
of house, population size of neighborhood, freehold
or leasehold, renovation status
 Interested to know: price of an unseen house

 In traditional machine learning, features are
hand-picked. Deep learning learns features
automatically from raw data.
 Perform data preprocessing:
o Data aggregation:
 Fusing data from multiple sources
o Data cleaning:
 Cleaning data to remove noise and duplicate observations
o Data transformation:
o Format conversion:
 Convert the data into the desired format, e.g., from free text into a vector
o Discretization and binarization:
 Required when a learning algorithm expects categorical attributes
o Feature creation:
 Creation of a new set of features from the original raw features
o Data reduction
o Feature subset selection:
 Selecting a subset of the features
o Dimensionality reduction:
 Removing features that are not useful; helps mitigate the curse of dimensionality
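Two of the steps above, discretization and binarization, can be sketched in plain Python. This is a hypothetical illustration; the function names and bin edges are invented.

```python
# Hypothetical sketch of two preprocessing steps: discretization
# (continuous value -> categorical bin) and binarization
# (categorical value -> 0/1 indicator features).

def discretize(value, bins):
    """Return the index of the first bin edge the value falls under."""
    for i, edge in enumerate(bins):
        if value < edge:
            return i
    return len(bins)

def binarize(category, categories):
    """One-hot encode a categorical attribute."""
    return [1 if category == c else 0 for c in categories]

print(discretize(7.3, bins=[3.0, 6.0, 9.0]))       # lightness 7.3 -> bin 2
print(binarize("salmon", ["salmon", "sea bass"]))  # -> [1, 0]
```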

Training

 Build a machine learning model to predict the value of a


particular attribute based on the values of other attributes
 There are four types of modeling tasks:
• Classification: if the target predictive variable is discrete
− Example: for manholes, will the manhole explode next year? Y/N

• Regression: if the target predictive variable is continuous


− Example: predict stock prices based on recent trends
• Clustering: if we want to group observations into similar-
looking groups
• Recommendation: if we want to recommend someone
an item, e.g., a book, movie or product based on rating
data from customers
 Example (classification):
• Categorize the fishes on the conveyor belt into salmon or sea bass. Let’s say our
model is a straight line that separates the two types of fishes (a linear classification
model)
• Find the best straight line (decision boundary) that partitions the feature space into
2 regions, one for each type of fish.
• Need to find the best parameter values (intercept and slope)

Sea Bass: width = 19.2, lightness = 7.3    Salmon: width = 19.3, lightness = 1.8
Sea Bass: width = 16.4, lightness = 7.6    Salmon: width = 17.3, lightness = 2.2
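On these four samples (values as read from the figure), even a decision boundary on the lightness feature alone separates the two classes. The threshold below is hand-chosen to illustrate the idea, not learnt.

```python
# Sketch: a decision boundary on the lightness feature. The
# threshold is hand-picked for illustration; training would
# search for it automatically.

samples = [
    ({"width": 19.2, "lightness": 7.3}, "sea bass"),
    ({"width": 19.3, "lightness": 1.8}, "salmon"),
    ({"width": 16.4, "lightness": 7.6}, "sea bass"),
    ({"width": 17.3, "lightness": 2.2}, "salmon"),
]

def classify(fish, threshold=5.0):
    return "sea bass" if fish["lightness"] > threshold else "salmon"

# The hand-chosen boundary classifies all four samples correctly.
assert all(classify(f) == label for f, label in samples)
```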

Testing

 Apply the model learnt in the training phase to new, unseen samples to
perform the intended task (classification, regression, clustering or
recommendation)
 Keep the testing samples separate from the training samples so that the
evaluation reflects performance on unseen data
Evaluation

 This step evaluates the performance of the learnt model

 How do you measure the quality of the result?


 Need to have ground-truths - may be hard to get

 Different kinds of performance measures are available, each with its own
pros and cons.
 Recall
 Precision
 F-measure
 Accuracy
 Confusion matrix
 etc.
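These measures can be computed directly from the predictions and the ground truths. The labels below are invented for illustration; the formulas are the standard binary-classification definitions.

```python
# Illustrative computation of the listed measures for a binary
# task, assuming ground-truth labels are available.

truth = [1, 1, 0, 1, 0, 0, 1, 0]  # invented ground truths
preds = [1, 0, 0, 1, 0, 1, 1, 0]  # invented model predictions

# Entries of the confusion matrix.
tp = sum(t == 1 and p == 1 for t, p in zip(truth, preds))
fp = sum(t == 0 and p == 1 for t, p in zip(truth, preds))
fn = sum(t == 1 and p == 0 for t, p in zip(truth, preds))
tn = sum(t == 0 and p == 0 for t, p in zip(truth, preds))

accuracy  = (tp + tn) / len(truth)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f_measure)  # 0.75 0.75 0.75 0.75
```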

Categories of Machine Learning Techniques

 Supervised Learning:
 Learning a model from labeled data

           Features                     Label
           length   width   weight
fruit 1    165      38      172         Banana
fruit 2    218      39      230         Banana
fruit 3    76       80      145         Orange
fruit 4    145      35      150         Banana
fruit 5    …        …       …           …

 Useful for tasks that predict the label/value of a certain attribute of an input sample
(classification/regression tasks)
 Example: Predict the type of a fruit (banana/orange) given its features (length, width and
weight)
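Using the four labeled fruits from the table above, a simple supervised learner can be sketched as a 1-nearest-neighbour classifier. The classifier choice and the query samples are illustrative assumptions.

```python
# Sketch: 1-nearest-neighbour classification of fruits from
# (length, width, weight), using the labeled rows of the table.

training = [
    ((165, 38, 172), "Banana"),
    ((218, 39, 230), "Banana"),
    ((76, 80, 145), "Orange"),
    ((145, 35, 150), "Banana"),
]

def predict(sample):
    """Return the label of the closest training sample (squared distance)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda t: dist(t[0], sample))[1]

print(predict((80, 75, 140)))   # closest to fruit 3 -> "Orange"
print(predict((150, 36, 160)))  # closest to fruit 4 -> "Banana"
```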

 Unsupervised Learning:
 Sometimes, the labels are not available
 Learning a model using features only without the labels
 Useful for grouping similar samples into multiple groups (clustering)
 Example: Given a group of fruits and their features (length, width, weight), cluster them into
different categories
Supervised Learning

 In supervised learning, the algorithm is given example input-output pairs
and learns a function that maps from input to output
 The input is the set of features used to describe the samples
 The output is the attribute (category or value) that we are interested to predict
 Types of supervised problems:

Classification                                Regression
Classification predicts discrete              Regression predicts continuous
valued output (e.g., present/not              valued output (e.g., house price)
present)

[Figures: object detection on images with cars, each labeled Yes/No for car
present; and housing price prediction, a plot of price (RM x1000, 0-400)
against size (0-2500).]

Object Detection (Images with Car)            Housing Price Prediction
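The regression side of this comparison can be sketched with ordinary least squares on a few (size, price) pairs. The data points below are invented and exactly linear, purely to illustrate the fit.

```python
# Sketch: fit price = slope * size + bias by ordinary least squares.
# The (size, price in RM x1000) pairs are invented for illustration.

data = [(500, 100), (1000, 200), (1500, 300), (2000, 400)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
        sum((x - mean_x) ** 2 for x, _ in data)
bias = mean_y - slope * mean_x

# Predict the (continuous) price of an unseen 1250 sq ft house.
print(slope * 1250 + bias)  # 250.0
```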


Classification: Example

 Digit Classification
• Input: images / pixel grids
• Output: a digit 0-9
• Setup:
− Get a large collection of example images, each labeled with a
digit
− Note: someone has to hand label all this data!
− Want to learn to predict labels of new, future digit images
• Features:
− The attributes used to make the digit decision
− Pixels: (6,8)=ON
− Shape Patterns: NumComponents, AspectRatio, NumLoops
− …

[Figure: example input images with outputs 0, 1, 2, 1, and an unseen image
with output “??”.]
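The pixel features can be illustrated on a tiny binary grid: each (row, column) position becomes an on/off feature of the kind a digit classifier would consume. The 4x4 grid below is invented.

```python
# Hypothetical illustration of pixel features: flatten a tiny
# binary image into (row, col) -> on/off features.

grid = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]  # a crude "0"

features = {(r, c): grid[r][c] for r in range(4) for c in range(4)}
print(features[(0, 1)])        # pixel (0,1) is ON -> 1
print(sum(features.values()))  # 8 pixels ON in total
```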

Classification: Example

Spam mail classification
 Input: an email
 Output: spam or non-spam
 Setup:
• Get a large collection of example emails, each labeled “spam” or “non-spam”
• Note: someone has to hand label all this data!
• Want to learn to predict labels of new, future emails
 Features: The attributes used to make the spam or non-spam decision
• Words: FREE!
• Text Patterns: $dd, CAPS
• Non-text: SenderInContacts
• …

Example emails (spam):
“Dear Sir. First, I must solicit your confidence in this transaction, this is
by virture of its nature as being utterly confidencial and top secret. …”
“TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT
"REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99”

Example email (non-spam):
“Ok, I know this is blatantly OT but I'm beginning to go insane. Had an old
Dell Dimension XPS sitting in the corner and decided to put it to use, I know
it was working pre being stuck in the corner, but when I plugged it in, hit
the power nothing happened.”
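The three feature types listed above can be sketched as one extraction function. The function name, feature names, and example inputs are invented for illustration.

```python
import re

# Hypothetical sketch of extracting the listed feature types:
# a word feature, a text pattern ($dd), and a non-text signal
# (whether the sender is in the contact list).

def extract_features(email, sender, contacts):
    text = email.lower()
    return {
        "has_free": "free" in text,
        "has_dollar_amount": bool(re.search(r"\$\d\d", email)),
        "sender_in_contacts": sender in contacts,
    }

feats = extract_features("CAPS FOR ONLY $99, FREE!", "x@spam.example", {"friend@mail.example"})
print(feats)
```

A classifier would then learn, from the labeled examples, how much each feature pushes an email toward spam or non-spam.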

Regression: Example

Predicting number of comments for blog post


 https://archive.ics.uci.edu/ml/datasets/BlogFeedback
 Input: blog posts
 Output: number of comments received for a post in the next 24 hours
 Setup:
• Crawl raw HTML documents of blogs that were posted at most 72 hours before a selected basetime.
For each blog, collect the number of comments received in the next 24 hours relative to the
basetime
• Collect from different base dates/times.
• Ensure the train and test splits are temporally disjoint (Training set: 2010, 2011; Test set: 2012) to
simulate the real-world situation where training data from the past is used to predict events in the
future

 Features: Attributes extracted from the blog posts


• Total number of comments before basetime (C1)
• Number of comments in the last 24 hours before the basetime (C2)
• Number of comments between 24 and 48 hours before basetime (C3)
• Difference between C2 and C3
• The length of the blog post
• Bag of words for 200 frequent words of the text of the blog post
• Day of post
• …
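Assembling the comment-count features above into a vector for a single post might look like the sketch below. The raw counts are invented; the C1/C2/C3 naming follows the feature list.

```python
# Sketch: build a feature vector for one (invented) blog post from
# the comment-count features described above.

post = {
    "comments_total_before_basetime": 40,  # C1
    "comments_last_24h": 12,               # C2
    "comments_24_to_48h": 5,               # C3
    "length": 830,                         # length of the blog post
}

features = [
    post["comments_total_before_basetime"],
    post["comments_last_24h"],
    post["comments_24_to_48h"],
    post["comments_last_24h"] - post["comments_24_to_48h"],  # C2 - C3
    post["length"],
]
print(features)  # [40, 12, 5, 7, 830]
```

A regression model would map such vectors to the (continuous) number of comments in the next 24 hours.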
Unsupervised Learning

• Unsupervised learning involves learning patterns when the training
samples are provided without outputs (no teacher)
• Uses a similarity measure to detect groupings (clustering)

Supervised Learning: positive and negative samples are given in the (x1, x2)
feature space, and we want to learn the decision boundary that separates them.
Learn the hypothesis function for the task based on the features of the
training samples and their labels.

Unsupervised Learning: only unlabeled samples are given in the (x1, x2)
feature space. Discover the underlying structure, relationships or patterns
based only on the features of the training samples.
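A minimal k-means sketch (k = 2, with invented 2-D points) shows how a distance measure alone can recover such groupings without any labels:

```python
# Minimal k-means sketch: assign each point to its nearest center,
# then move each center to the mean of its cluster, and repeat.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return clusters

pts = [(1, 1), (1.5, 2), (8, 8), (9, 9.5)]
print(kmeans(pts, centers=[(0, 0), (10, 10)]))
# -> [[(1, 1), (1.5, 2)], [(8, 8), (9, 9.5)]]
```

The two groups emerge purely from the similarity of the feature values; no labels were used.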
Example: News Search

 Unsupervised learning is used by Google news to cluster similar news stories

A news article by CNN, and related news discovered
through clustering

Example: Gene clustering

 Understanding genomics by finding clusters of people who have or do not have a
certain type of gene

[Figure: heat map of genes (rows) against individuals (columns), clustered
into Groups 1-8.]

[Source: Daphne Koller]
Other Applications

Other classification tasks:
• Spam detection (input: document, classes: spam / ham)
• OCR (input: images, classes: characters)
• Medical diagnosis (input: symptoms, classes: diseases)
• Automatic essay grading (input: document, classes: grades)
• Fraud detection (input: account activity, classes: fraud / no fraud)
• … many more

Other regression tasks:
• Sociology (input: pay, qualifications, output: measure of social status of
various occupations)
• Economics (input: family’s income, number of children in family, output:
family consumption expenditure)
• Political science (input: measures of public opinion, institutional
variables, output: state’s level of welfare spending)
• … many more

When to apply machine learning?

 Problem size is too vast for our limited reasoning capacity
(e.g., large datasets from the growth of automation and the web, such as web click
data, medical records, biology)
 Applications that cannot be programmed by hand where humans are unable to
explain their expertise
(e.g., Autonomous helicopter, handwriting recognition, most of Natural Language
Processing (NLP), Computer Vision)
 Solution changes with time
(e.g., tracking, preferences)
 Self-customizing programs
(e.g., Amazon, Netflix product recommendations)
 Understanding human learning
(brain, real AI)

