
Introduction

Welcome
Machine Learning
Andrew Ng
[Slide: example of email spam filtering]

Machine Learning
- Grew out of work in AI
- New capability for computers

Examples:
- Database mining
  Large datasets from growth of automation/web.
  E.g., Web click data, medical records, biology, engineering
- Applications we can't program by hand
  E.g., autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision
- Self-customizing programs
  E.g., Amazon, Netflix product recommendations
- Understanding human learning (brain, real AI)

Introduction
What is machine learning?
Machine Learning

Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
• Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
- Classifying emails as spam or not spam.
- Watching you label emails as spam or not spam.
- The number (or fraction) of emails correctly classified as spam/not spam.
- None of the above—this is not a machine learning problem.

Machine learning algorithms:
- Supervised learning
- Unsupervised learning
Others: Reinforcement learning, recommender systems.

Also talk about: Practical advice for applying learning algorithms.

Introduction
Supervised Learning
Machine Learning

Housing price prediction.

[Plot: Price ($) in 1000's vs. Size in feet2, with a training set of house prices]

Supervised Learning: “right answers” given.
Regression: Predict continuous-valued output (price).

Breast cancer (malignant, benign)

[Plot: Malignant? (1 = Yes, 0 = No) vs. Tumor Size]

Classification: Discrete-valued output (0 or 1)

[Plot: Age vs. Tumor Size, with malignant and benign examples]

Other features:
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape

You’re running a company, and you want to develop learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?
- Treat both as classification problems.
- Treat problem 1 as a classification problem, problem 2 as a regression problem.
- Treat problem 1 as a regression problem, problem 2 as a classification problem.
- Treat both as regression problems.

Introduction
Unsupervised Learning
Machine Learning

Supervised Learning
[Plot: labeled examples in (x1, x2) space, with two classes marked]

Unsupervised Learning
[Plot: the same kind of (x1, x2) data, but with no labels; clustering finds the groups]

[Figure: gene expression data, Genes × Individuals, grouped by clustering. Source: Daphne Koller]

Other clustering applications:
- Organize computing clusters
- Social network analysis
- Market segmentation
- Astronomical data analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
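These applications all rely on a clustering algorithm to find the groups. As a concrete illustration only, here is a minimal Octave sketch of one common clustering method (k-means, which these slides do not introduce); the data, the choice K = 2, and the fixed iteration count are all illustrative assumptions:

% Minimal k-means sketch (illustrative; not an algorithm from these slides).
X = [1 1; 1.2 0.8; 0.9 1.1; 5 5; 5.2 4.9; 4.8 5.1];  % unlabeled 2-D examples (x1, x2)
K = 2;                                 % assumed number of clusters
centroids = X([1 4], :);               % initialize centroids from two of the points
for iter = 1:10
  % Assignment step: squared distance from every example to every centroid
  d = zeros(size(X, 1), K);
  for k = 1:K
    d(:, k) = sum((X - centroids(k, :)) .^ 2, 2);
  end
  [~, idx] = min(d, [], 2);            % index of the closest centroid per example
  % Update step: move each centroid to the mean of the examples assigned to it
  for k = 1:K
    centroids(k, :) = mean(X(idx == k, :), 1);
  end
end

Each iteration alternates between assigning points to their nearest centroid and moving each centroid to the mean of its assigned points; idx ends up grouping the unlabeled points into market-segment-like clusters.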
Cocktail party problem

[Diagram: Speaker #1 and Speaker #2 talk at the same time; Microphone #1 and Microphone #2 each record an overlapping mixture of the two voices]

[Audio demo: Microphone #1 and Microphone #2 recordings (overlapping mixtures), and Output #1 and Output #2 (the separated voices)]

[Audio clips courtesy of Te-Won Lee.]

Cocktail party problem algorithm

[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

[Source: Sam Roweis, Yair Weiss & Eero Simoncelli]


Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)
- Given email labeled as spam/not spam, learn a spam filter.
- Given a set of news articles found on the web, group them into sets of articles about the same story.
- Given a database of customer data, automatically discover market segments and group customers into different market segments.
- Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

Linear regression with one variable
Model representation
Machine Learning

Housing Prices (Portland, OR)

[Plot: Price (in 1000s of dollars) vs. Size (feet2)]

Supervised Learning: Given the “right answer” for each example in the data.
Regression Problem: Predict real-valued output.

Training set of housing prices (Portland, OR):

  Size in feet2 (x)   Price ($) in 1000's (y)
  2104                460
  1416                232
  1534                315
  852                 178
  …                   …

Notation:
  m = Number of training examples
  x’s = “input” variable / features
  y’s = “output” variable / “target” variable

Training Set → Learning Algorithm → h

h (the hypothesis) maps from the size of a house (x) to an estimated price (y).

How do we represent h?
  hθ(x) = θ0 + θ1x

Linear regression with one variable.
Univariate linear regression.
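A minimal Octave sketch of what this hypothesis computes; the parameter values and house size below are illustrative, not numbers from the slides:

theta0 = 50;                  % illustrative intercept, in $1000's
theta1 = 0.15;                % illustrative slope, in $1000's per square foot
x = 1250;                     % input feature: size of the house in feet^2
price = theta0 + theta1 * x   % h_theta(x) = theta0 + theta1*x = 237.5, i.e. about $237,500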
Linear regression with one variable
Cost function
Machine Learning

Training Set:

  Size in feet2 (x)   Price ($) in 1000's (y)
  2104                460
  1416                232
  1534                315
  852                 178
  …                   …

Hypothesis: hθ(x) = θ0 + θ1x
θi’s: Parameters
How to choose θi’s?

[Plots: three examples of hθ(x) = θ0 + θ1x for different choices of the parameters θ0 and θ1]

[Plot: training examples (x, y) and a candidate straight-line fit hθ(x)]

Idea: Choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Linear regression with one variable
Cost function intuition I
Machine Learning

Simplified

Hypothesis: hθ(x) = θ1x   (i.e., θ0 = 0)
Parameters: θ1
Cost Function: J(θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i))^2
Goal: minimize J(θ1) over θ1
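A minimal Octave sketch that evaluates the simplified cost J(θ1) over a range of θ1 values. The toy training set below is an illustrative assumption, not data from the slides; plotting J against θ1 gives the bowl-shaped curve examined next:

theta1_vals = -0.5:0.1:2.5;            % candidate values of theta1
x = [1; 2; 3];  y = [1; 2; 3];         % illustrative toy training set
m = length(y);
J = zeros(size(theta1_vals));
for k = 1:length(theta1_vals)
  h = theta1_vals(k) * x;              % simplified hypothesis (theta0 = 0)
  J(k) = (1 / (2 * m)) * sum((h - y) .^ 2);
end
plot(theta1_vals, J);                  % J(theta1) is smallest at theta1 = 1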
(for fixed θ1, this is a function of x)   (function of the parameter θ1)

[Plots: left, the training data and hθ(x) for a particular value of θ1; right, the corresponding value of J(θ1), traced out over several choices of θ1 to reveal a bowl-shaped curve]

Linear regression with one variable
Cost function intuition II
Machine Learning

Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost Function: J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i))^2
Goal: minimize J(θ0, θ1) over θ0, θ1
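A minimal Octave sketch that evaluates J(θ0, θ1) on the four training examples from the table earlier; the particular parameter values tried here are illustrative:

x = [2104; 1416; 1534; 852];      % size in feet^2
y = [460; 232; 315; 178];         % price in $1000's
m = length(y);                    % m = 4 training examples
theta0 = 0;  theta1 = 0.2;        % illustrative parameter values to evaluate
h = theta0 + theta1 * x;          % predictions h_theta(x^(i)) for all i
J = (1 / (2 * m)) * sum((h - y) .^ 2)   % squared-error cost J(theta0, theta1)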
(for fixed θ0, θ1, this is a function of x)   (function of the parameters θ0, θ1)

[Plots: left, hθ(x) over the housing data (Price ($) in 1000’s vs. Size in feet2); right, J(θ0, θ1) shown as a surface and as a contour plot, for several choices of θ0, θ1]

Linear regression with one variable
Gradient descent
Machine Learning

Have some function J(θ0, θ1)
Want to minimize J(θ0, θ1) over θ0, θ1

Outline:
• Start with some θ0, θ1 (say θ0 = 0, θ1 = 0)
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

[Plots: the surface of J(θ0, θ1) over the (θ0, θ1) plane]

Gradient descent algorithm

repeat until convergence {
  θj := θj - α (∂/∂θj) J(θ0, θ1)   (simultaneously for j = 0 and j = 1)
}

Correct: Simultaneous update
  temp0 := θ0 - α (∂/∂θ0) J(θ0, θ1)
  temp1 := θ1 - α (∂/∂θ1) J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1

Incorrect:
  temp0 := θ0 - α (∂/∂θ0) J(θ0, θ1)
  θ0 := temp0
  temp1 := θ1 - α (∂/∂θ1) J(θ0, θ1)   (this now uses the already-updated θ0)
  θ1 := temp1
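A minimal, runnable Octave sketch of the simultaneous update. The cost used here is an arbitrary illustrative quadratic, J(θ0, θ1) = θ0^2 + θ1^2, chosen only so the snippet runs on its own; its partial derivatives are 2·θ0 and 2·θ1:

alpha = 0.1;
theta0 = 3;  theta1 = -2;                  % illustrative starting point
for iter = 1:100
  temp0 = theta0 - alpha * (2 * theta0);   % derivative evaluated at the OLD parameters
  temp1 = theta1 - alpha * (2 * theta1);   % likewise uses the OLD parameters
  theta0 = temp0;                          % only now overwrite the parameters
  theta1 = temp1;
end
% Overwriting theta0 before computing temp1 would not be a simultaneous
% update: theta1's update would then see the already-changed theta0.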
Linear regression with one variable
Gradient descent intuition
Machine Learning

Gradient descent algorithm (one-parameter case): θ1 := θ1 - α (d/dθ1) J(θ1)

[Plots: when the derivative dJ/dθ1 is positive, the update decreases θ1; when it is negative, the update increases θ1. Either way, θ1 moves toward the minimum of J(θ1)]

If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

If θ1 is already at a local optimum, the derivative there is zero, so one step of gradient descent leaves the current value of θ1 unchanged.

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.

Linear regression with one variable
Gradient descent for linear regression
Machine Learning

Gradient descent algorithm:
  repeat until convergence {
    θj := θj - α (∂/∂θj) J(θ0, θ1)   (for j = 0 and j = 1)
  }

Linear Regression Model:
  hθ(x) = θ0 + θ1x
  J(θ0, θ1) = (1/(2m)) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i))^2

Gradient descent algorithm (derivatives plugged in for linear regression):

repeat until convergence {
  θ0 := θ0 - α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i))
  θ1 := θ1 - α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) - y^(i)) · x^(i)
}
update θ0 and θ1 simultaneously
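A minimal, runnable Octave sketch of these two updates on the four training examples from earlier. The learning rate, the iteration count, and rescaling x to thousands of square feet (so a simple fixed α works) are illustrative choices, not values from the slides:

x = [2104; 1416; 1534; 852] / 1000;   % size, rescaled to 1000's of feet^2
y = [460; 232; 315; 178];             % price in $1000's
m = length(y);
theta0 = 0;  theta1 = 0;              % start gradient descent at (0, 0)
alpha = 0.1;
for iter = 1:2000
  h = theta0 + theta1 * x;                              % hypothesis on all m examples
  temp0 = theta0 - alpha * (1 / m) * sum(h - y);        % update for theta0
  temp1 = theta1 - alpha * (1 / m) * sum((h - y) .* x); % update for theta1
  theta0 = temp0;  theta1 = temp1;                      % simultaneous update
end

Each pass through the loop uses all m training examples, which is what the next slide calls “batch” gradient descent.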
[Plots: surface and contour plots of J(θ0, θ1); each gradient descent step moves the parameters downhill toward the minimum]

(for fixed θ0, θ1, this is a function of x)   (function of the parameters θ0, θ1)

[Plots: a sequence of gradient descent steps on the contour plot of J(θ0, θ1), with the corresponding hypothesis hθ(x) fitting the housing data more closely at each step]

“Batch” Gradient Descent

“Batch”: Each step of gradient descent uses all the training examples.