Slides Lect 02

Learning From Data
Lecture 2
The Perceptron
The Learning Setup
A Simple Learning Algorithm: PLA
Other Views of Learning
Is Learning Feasible: A Puzzle
M. Magdon-Ismail
CSCI 4100/6100
recap: The Plan
1. What is Learning?
2. Can We do it?
3. How to do it?
concepts
4. How to do it well? theory
practice
5. General principles?
6. Advanced techniques.
7. Other Learning Paradigms.
our language will be mathematics . . .

. . . our sword will be computer algorithms
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 2 /25 Recap: key players −→
recap: The Key Players
• Salary, debt, years in residence, . . . input x ∈ Rd = X .

• Approve credit or not output y ∈ {−1, +1} = Y.
• True relationship between x and y target function f : X 7→ Y.
(The target f is unknown.)
• Data on customers data set D = (x1, y1), . . . , (xN , yN ).

(yn = f (xn).)
X Y and D are given by the learning problem;

The target f is fixed but unknown.
We learn the function f from the data D.
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 3 /25 Recap: learning setup −→
recap: Summary of the Learning Setup
UNKNOWN TARGET FUNCTION

f : X 7→ Y
(ideal credit approval formula)
yn = f (xn )
TRAINING EXAMPLES
(x1 , y1 ), (x2 , y2 ), . . . , (xN , yN )
(historical records of credit customers)
LEARNING FINAL
ALGORITHM HYPOTHESIS
A g≈f
(learned credit approval formula)
HYPOTHESIS SET
H
(set of candidate formulas)
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 4 /25 Simple learning model −→
A Simple Learning Model
• Input vector x = [x1, . . . , xd]t.
• Give importance weights to the different inputs and compute a “Credit Score”
d
X
“Credit Score” = w i xi .
i=1
• Approve credit if the “Credit Score” is acceptable.

d
X
Approve credit if wixi > threshold, (“Credit Score” is good)
i=1
X d
Deny credit if wixi < threshold. (“Credit Score” is bad)
i=1
• How to choose the importance weights wi

input xi is important =⇒ large weight |wi|
input xi beneficial for credit =⇒ positive weight wi > 0
input xi detrimental for credit =⇒ negative weight wi < 0
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 5 /25 Rewriting the model −→
A Simple Learning Model
d
X
Approve credit if wixi > threshold,
i=1
Xd
Deny credit if wixi < threshold.
i=1
can be written formally as
d
! !
X
h(x) = sign w i xi + w0
i=1
The “bias weight” w0 corresponds to the threshold. (How?)
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 6 /25 Perceptron −→
The Perceptron Hypothesis Set
We have defined a Hyopthesis set H
H = {h(x) = sign(wtx)} ← uncountably infinite H
   
w0 1
w 1   x1 
w= .  ∈ Rd+1, x= .  ∈ {1} × Rd.
 .  .
wd xd
This hypothesis set is called the perceptron or linear separator
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 7 /25 Geometry of perceptron −→
Geometry of The Perceptron
h(x) = sign(wtx) (Problem 1.2 in LFD)
Income
Income
Age Age
Which one should we pick?
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 8 /25 Use the data −→
Use the Data to Pick a Line
Income
Income
Age Age
A perceptron fits the data by using a line to separate the +1 from −1 data.
Fitting the data: How to find a hyperplane that separates the data?
(“It’s obvious - just look at the data and draw the line,” is not a valid solution.)
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 9 /25 How to learn g −→
How to Learn a Final Hypothesis g from H
We want to select g ∈ H so that g ≈ f .

We certainly want g ≈ f on the data set D. Ideally,
g(xn) = yn.
How do we find such a g in the infinite hypothesis set H, if it exists?
Idea! Start with some weight vector and try to improve it.
Income
Age
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 10 /25 PLA −→
The Perceptron Learning Algorithm (PLA)
A simple iterative method.

1: w(1) = 0 y∗ = +1
2: for iteration t = 1, 2, 3, . . . y∗ x∗
w(t + 1)
3: the weight vector is w(t). w(t)
4: From (x1, y1 ), . . . , (xN , yN ) pick any misclassified example.
x∗
5: Call the misclassified example (x∗, y∗ ),
sign (w(t) • x∗) 6= y∗ .
y∗ = −1
6: Update the weight:
w(t + 1) = w(t) + y∗x∗ . y∗ x∗
w(t)
w(t + 1)
7: t←t+1
x∗
PLA implements our idea: start at some weights and try to improve.
“incremental learning”on a single example at a time
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 11 /25 PLA convergence −→
Does PLA Work?
Theorem. If the data can be fit by a linear separator, then after some finite number
of steps, PLA will find one.
Income
What if the data cannot be fit by a perceptron?
Age
iteration 1
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 12 /25 Start −→
Does PLA Work?
Income
Age
iteration 1
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 13 /25 Iteration 1 −→
Does PLA Work?
After how long?
Income
Age
iteration 1
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 14 /25 Iteratrion 2 −→
Does PLA Work?
After how long?
Income
Age
iteration 2
c AM
Does PLA Work?
After how long?
Income
Age
iteration 3
c AM
Does PLA Work?
After how long?
Income
Age
iteration 4
c AM
Does PLA Work?
After how long?
Income
Age
iteration 5
c AM
Does PLA Work?
After how long?
Income
Age
iteration 6
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 19 /25 Non-separable data? −→
Does PLA Work?
After how long?
Income
Age
iteration 1
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 20 /25 We can fit! −→
We can Fit the Data
• We can find an h that works from infinitely many (for the perceptron).
(So computationally, things seem good.)
• Ultimately, remember that we want to predict.

We don’t care about the data, we care about “outside the data”.
Can a limited data set reveal enough information to pin down an

entire target function, so that we can predict outside the data?
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 21 /25 Other views of learning −→
Other Views of Learning
• Design: learning is from data, design is from specs and a model.

• Statistics, Function Approximation.
• Data Mining: find patterns in massive data (typically unsupervised).
• Three Learning Paradigms
– Supervised: the data is (xn , f (xn )) – you are told the answer.
– Reinforcement: you get feedback on potential answers you try:
x → try something → get feedback.
– Unsupervised: only given xn, learn to “organize” the data.
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 22 /25 Coins – supervised −→
Supervised Learning - Classifying Coins
25
25
5
Mass
Mass
1 5
1
10 10
Size Size
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 23 /25 Coins – unsupervised −→
Unsupervised Learning - Categorizing Coins
type 4
Mass
Mass
type 3
type 2
type 1
Size Size
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 24 /25 Puzzle: outside the data −→
Outside the Data Set - A Puzzle
Dogs (f = −1)
Trees (f = +1)
Tree or Dog? (f = ?)
c AM
L Creator: Malik Magdon-Ismail The Perceptron: 25 /25

Slides Lect 02

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Slides Lect 02

Uploaded by

Copyright:

Available Formats

Learning From Data

7. Other Learning Paradigms.

our language will be mathematics . . .

• Salary, debt, years in residence, . . . input x ∈ Rd = X .

• Data on customers data set D = (x1, y1), . . . , (xN , yN ).

X Y and D are given by the learning problem;

We learn the function f from the data D.

UNKNOWN TARGET FUNCTION

(historical records of credit customers)

(set of candidate formulas)

• Input vector x = [x1, . . . , xd]t.

• Approve credit if the “Credit Score” is acceptable.

• How to choose the importance weights wi

can be written formally as

The “bias weight” w0 corresponds to the threshold. (How?)

We have defined a Hyopthesis set H

H = {h(x) = sign(wtx)} ← uncountably infinite H

This hypothesis set is called the perceptron or linear separator

h(x) = sign(wtx) (Problem 1.2 in LFD)

Which one should we pick?

We want to select g ∈ H so that g ≈ f .

A simple iterative method.

“incremental learning”on a single example at a time

After how long?

After how long?

After how long?

After how long?

After how long?

After how long?

After how long?

• Ultimately, remember that we want to predict.

Can a limited data set reveal enough information to pin down an

• Design: learning is from data, design is from specs and a model.

You might also like