Jia-Bin Huang
ECE-5424G / CS-5824 Virginia Tech Spring 2019
BRACE YOURSELVES
WINTER IS COMING
BRACE YOURSELVES
HOMEWORK IS COMING
Administrative
• Office hour
• Chen Gao
• Shih-Yang Su
• Feedback (Thanks!)
• Notation?
• Video/audio recording?
(Recap diagram: continuous prediction tasks, regression and dimensionality reduction)
Recap: Nearest neighbor classifier
• Training data {(x^(1), y^(1)), …, (x^(m), y^(m))}
• Learning
Do nothing.
• Testing
h(x) = y^(k), where k = argmin_i d(x^(i), x)
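A minimal NumPy sketch of this recap (variable and function names are illustrative, not from the slides): "learning" just stores the data, and testing returns the label of the closest stored example.

    import numpy as np

    def nn_predict(X_train, y_train, x_query):
        # "Learning": do nothing; the stored training data is the model.
        # Testing: return y^(k), where k = argmin_i d(x^(i), x_query).
        dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
        k = np.argmin(dists)
        return y_train[k]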
Recap: Instance/Memory-based Learning
1. A distance metric
• Continuous? Discrete? PDF? Gene data? Learn the metric?
2. How many nearby neighbors to look at?
• 1? 3? 5? 15?
3. A weighting function (optional)
• Closer neighbors matter more
4. How to fit with the local points?
• Kernel regression
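The four design choices above can be illustrated together in a small weighted-kNN / kernel-regression sketch (a hedged illustration, not code from the course; the Gaussian weighting, k = 5, and bandwidth are arbitrary choices):

    import numpy as np

    def kernel_regression(X_train, y_train, x_query, k=5, bandwidth=1.0):
        # 1. Distance metric: Euclidean here; could be any metric suited to the data.
        dists = np.linalg.norm(X_train - x_query, axis=1)
        # 2. How many nearby neighbors: the k closest points.
        idx = np.argsort(dists)[:k]
        # 3. Weighting function: closer neighbors matter more (Gaussian kernel).
        w = np.exp(-dists[idx] ** 2 / (2 * bandwidth ** 2))
        # 4. Fit with the local points: weighted average of their target values.
        return np.sum(w * y_train[idx]) / np.sum(w)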
Linear Regression
• Model representation
• Cost function
• Gradient descent
• Normal equation
Regression
• Real-valued output
• Training set → Learning Algorithm → hypothesis h
• x (size of house) → h → y (estimated price)
House pricing prediction
(Scatter plot of training data: size of house vs. price ($) in 1000's)
• Notation:
• m = Number of training examples
• x = Input variable / features (e.g., size in feet^2)
• y = Output variable / target variable (e.g., price in $1000's)
• (x, y) = One training example
• (x^(i), y^(i)) = The i-th training example
Slide credit: Andrew Ng
Model representation
Training set → Learning Algorithm → hypothesis h
x (size of house) → h → y (estimated price)
Shorthand: h(x) for hθ(x)
(Plot: size in feet^2 vs. price ($) in 1000's)
• Cost function
• Gradient descent
• Normal equation
Training set (m = 47):
Size in feet^2 (x)    Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
…                     …
• Hypothesis: hθ(x) = θ0 + θ1x
• θ0, θ1: parameters/weights
(Plots: example hypotheses hθ(x) for different parameter values)
Cost function
(Scatter plot of the training data: size in feet^2 vs. price ($) in 1000's)
• Parameters: θ0, θ1 (simplified case: θ0 = 0, so only θ1)
• Goal: minimize J(θ0, θ1) (simplified: minimize J(θ1))
Slide credit: Andrew Ng
(Plots, for several values of θ1: left, the hypothesis hθ(x) as a function of x; right, the cost J(θ1) as a function of θ1)
• Parameters: θ0, θ1
• Cost function: J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²
• Goal: minimize J(θ0, θ1) over θ0, θ1
Slide credit: Andrew Ng
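As a concrete check of the cost function, here is a short sketch that evaluates J(θ0, θ1) on the four training examples from the table above (the specific θ values are arbitrary, not from the slides):

    import numpy as np

    # Training examples from the slides: size in feet^2 and price in $1000's.
    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    def h(theta0, theta1, x):
        # Hypothesis: h_theta(x) = theta0 + theta1 * x
        return theta0 + theta1 * x

    def cost(theta0, theta1, x, y):
        # J(theta0, theta1) = (1 / 2m) * sum_i (h_theta(x^(i)) - y^(i))^2
        m = len(y)
        return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)

    print(cost(0.0, 0.2, x, y))  # cost of one arbitrary parameter choice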
Cost function
• Cost function
• Gradient descent
• Normal equation
Gradient descent
Have some function J(θ0, θ1)
Want min over θ0, θ1 of J(θ0, θ1)
Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1)
until we hopefully end up at a minimum
Slide credit: Andrew Ng
Gradient descent
Repeat until convergence {
  θj := θj − α (∂/∂θj) J(θ0, θ1)   (for j = 0 and j = 1)
}
Intuition (1-D case): if (∂/∂θ1) J(θ1) < 0, the update increases θ1; if (∂/∂θ1) J(θ1) > 0, it decreases θ1, so θ1 moves toward the minimum.
Slide credit: Andrew Ng
α: learning rate
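To see how the sign of the derivative drives the update, here is a tiny 1-D gradient descent sketch on an arbitrary convex J(θ1) (purely illustrative; the example function and step size are assumptions, not from the slides):

    def gradient_descent_1d(dJ, theta1=0.0, alpha=0.1, iters=100):
        # theta1 := theta1 - alpha * dJ/dtheta1:
        # a negative slope increases theta1, a positive slope decreases it.
        for _ in range(iters):
            theta1 = theta1 - alpha * dJ(theta1)
        return theta1

    # Example: J(theta1) = (theta1 - 2)^2, so dJ/dtheta1 = 2 * (theta1 - 2).
    print(gradient_descent_1d(lambda t: 2 * (t - 2)))  # approaches the minimum at 2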
Gradient descent for linear regression
Repeat until convergence {
  θj := θj − α (∂/∂θj) J(θ0, θ1)   (for j = 0 and j = 1)
}
• (∂/∂θ0) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
• (∂/∂θ1) J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
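The two partial derivatives above translate directly into code; this is a hedged sketch (function name gd_linear_regression is illustrative, not from the course) that runs batch gradient descent on the raw house sizes, where a tiny learning rate is needed because the sizes are not feature-scaled:

    import numpy as np

    def gd_linear_regression(x, y, alpha=1e-7, iters=10000):
        # Simultaneously update theta0 and theta1 using
        #   dJ/dtheta0 = (1/m) sum_i (h(x^(i)) - y^(i))
        #   dJ/dtheta1 = (1/m) sum_i (h(x^(i)) - y^(i)) * x^(i)
        m = len(y)
        theta0, theta1 = 0.0, 0.0
        for _ in range(iters):
            err = theta0 + theta1 * x - y
            grad0 = err.sum() / m
            grad1 = (err * x).sum() / m
            theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        return theta0, theta1

    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])
    print(gd_linear_regression(x, y))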
• Cost function
• Gradient descent
• Normal equation
Training dataset
Size in feet^2 (x)    Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
…                     …
hθ(x) = θ0 + θ1x
Notation:
n = Number of features
x^(i) = Input features of the i-th training example
x_j^(i) = Value of feature j in the i-th training example
Slide credit: Andrew Ng
Hypothesis
Previously: hθ(x) = θ0 + θ1x
Now: hθ(x) = θ0 + θ1x1 + θ2x2 + ⋯ + θnxn
Gradient descent: Repeat until convergence {
  θj := θj − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i)
}
Simultaneously update θj for every j = 0, 1, …, n
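The multivariate update is usually written in vectorized form; a minimal sketch (illustrative, assuming a design matrix X whose first column is all ones):

    import numpy as np

    def gd_multivariate(X, y, alpha=0.01, iters=1000):
        # X: (m, n+1) design matrix with x0 = 1 in the first column.
        # h_theta(x) = theta0*x0 + theta1*x1 + ... + thetan*xn  ->  X @ theta
        m, d = X.shape
        theta = np.zeros(d)
        for _ in range(iters):
            grad = X.T @ (X @ theta - y) / m   # all partial derivatives at once
            theta = theta - alpha * grad       # simultaneous update of every theta_j
        return theta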
(Two side-by-side plots over the (θ1, θ2) plane. Slide credit: Andrew Ng)
Gradient descent in practice: Learning rate
• Automatic convergence test
• α too small: slow convergence
• α too large: may not converge
• To choose α, try a range of values (e.g., 0.001, 0.01, 0.1, 1)
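One way to act on these bullets is to run a few candidate learning rates and watch the cost; a self-contained toy sketch (the data and α values are illustrative assumptions, not recommendations from the slides):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])   # a toy, already well-scaled feature
    y = np.array([1.0, 2.0, 3.0])

    for alpha in [0.001, 0.01, 0.1, 1.0]:
        theta1 = 0.0
        for _ in range(50):
            theta1 -= alpha * np.mean((theta1 * x - y) * x)   # dJ/dtheta1
        J = np.mean((theta1 * x - y) ** 2) / 2
        print(f"alpha={alpha}: theta1={theta1:.3f}  J={J:.6f}")
    # Small alpha converges slowly; alpha = 1.0 overshoots here and J grows.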
Features and polynomial regression
• Can combine existing features into new ones (e.g., area)
• Can use different functions of a feature, e.g., x1 = (size), x2 = (size)^2, x3 = (size)^3
(Plot: price ($) in 1000's vs. size in feet^2)
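A short sketch of how such polynomial features might be constructed (the function name is illustrative; note how quickly the scales of size^2 and size^3 grow, which is why feature scaling matters before gradient descent):

    import numpy as np

    def polynomial_features(size, degree=3):
        # Build [size, size^2, ..., size^degree] from the single "size" feature,
        # i.e., x1 = (size), x2 = (size)^2, x3 = (size)^3.
        return np.column_stack([size ** d for d in range(1, degree + 1)])

    sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
    X_poly = polynomial_features(sizes)   # shape (4, 3); columns have very different scales
    print(X_poly)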
• Cost function
• Gradient descent
• Normal equation
Training data for the normal equation (x0 = 1 for every example):
x0    Size in feet^2 (x1)    Number of bedrooms (x2)    Number of floors (x3)    Age of home in years (x4)    Price ($) in 1000's (y)
1     2104                   5                          1                        45                           460
1     1416                   3                          2                        40                           232
1     1534                   3                          2                        30                           315
1     852                    2                          1                        36                           178
…
X = the m × (n+1) design matrix whose rows are (x0, x1, x2, x3, x4) from the table above
y = (460, 232, 315, 178)⊤
θ = (X⊤X)⁻¹ X⊤ y
Slide credit: Andrew Ng
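The normal equation can be evaluated directly on the table above; a sketch (it uses NumPy's pseudo-inverse because with only four examples and five columns, X⊤X is singular):

    import numpy as np

    # Design matrix from the table: x0 = 1, size, bedrooms, floors, age.
    X = np.array([[1, 2104, 5, 1, 45],
                  [1, 1416, 3, 2, 40],
                  [1, 1534, 3, 2, 30],
                  [1,  852, 2, 1, 36]], dtype=float)
    y = np.array([460.0, 232.0, 315.0, 178.0])

    # theta = (X^T X)^{-1} X^T y; pinv handles the non-invertible X^T X here.
    theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
    print(X @ theta)   # fitted prices for the four training examples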
Least square solution
• θ = argmin over θ of ‖Xθ − y‖² = (X⊤X)⁻¹ X⊤ y
Justification/interpretation 1
• Loss minimization
• Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²
• Gradient descent for linear regression:
  Repeat until convergence { θj := θj − α (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) x_j^(i) }
• Features and polynomial regression:
  Can combine features; can use different functions to generate features (e.g., polynomial)
• Normal equation: θ = (X⊤X)⁻¹ X⊤ y
Next
• Naïve Bayes, Logistic regression, Regularization