
Polynomial Regression

Shusen Wang
Warm-up: Linear Regression
Linear Regression (Task)
Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.

Output: a vector $\mathbf{w} \in \mathbb{R}^d$ and a scalar $b \in \mathbb{R}$ such that $\mathbf{x}_i^T \mathbf{w} + b \approx y_i$.

The task assumes that $y_i$ is a linear function of $\mathbf{x}_i$.

Tasks: Linear Regression
Least Squares Regression (Method)
Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
1. Add one dimension to each $\mathbf{x}_j \in \mathbb{R}^d$: $\tilde{\mathbf{x}}_j = [\mathbf{x}_j; 1] \in \mathbb{R}^{d+1}$.
2. Solve least squares regression: $\min_{\mathbf{w} \in \mathbb{R}^{d+1}} \|\tilde{\mathbf{X}} \mathbf{w} - \mathbf{y}\|_2^2$ (see the sketch below).
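
A minimal numpy sketch of these two steps; the data values here are made-up toy numbers, not from the slides:

import numpy as np

# Toy data: n = 4 samples, d = 2 features (made-up values).
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
y = np.array([3.1, 2.4, 4.6, 7.0])

# Step 1: append a constant-1 feature, so the bias b is absorbed into w.
X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])

# Step 2: solve min_w ||X_tilde w - y||_2^2.
w = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
print('w =', w[:-1], 'b =', w[-1])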

Tasks: Linear Regression
Methods: Least Squares Regression

Tasks: Linear Regression
Methods: Least Squares Regression
Algorithms: Analytical Solution, Gradient Descent, Conjugate Gradient
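
For reference, minimal numpy sketches of two of the algorithms named above (my own illustrative implementations, not code from the slides); the analytical solution assumes $\tilde{\mathbf{X}}^T \tilde{\mathbf{X}}$ is invertible:

import numpy as np

def solve_analytically(X_tilde, y):
    # Analytical solution of min_w ||Xw - y||^2: w* = (X^T X)^{-1} X^T y.
    # np.linalg.solve avoids forming the inverse explicitly.
    return np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)

def solve_by_gradient_descent(X_tilde, y, lr=1e-3, steps=10000):
    # The gradient of ||Xw - y||^2 with respect to w is 2 X^T (Xw - y).
    w = np.zeros(X_tilde.shape[1])
    for _ in range(steps):
        w = w - lr * 2.0 * X_tilde.T @ (X_tilde @ w - y)
    return w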
Polynomial Regression
The Regression Task
Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
Output: a function $f: \mathbb{R}^d \mapsto \mathbb{R}$ such that $f(\mathbf{x}) \approx y$.

Question: $f$ is unknown! So how can we learn $f$?

Answer: polynomial approximation, i.e., model $f$ as a polynomial function.


Taylor expansion: $f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots$
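
A quick numeric sanity check of a truncated expansion (a toy example of mine with $f = \exp$ and $a = 0$, not from the slides):

import math

x, a = 0.5, 0.0
# Degree-2 Taylor approximation of exp(x) around a = 0:
# f(a) + f'(a)(x - a) + f''(a)/2! * (x - a)^2, and exp is its own derivative.
approx = math.exp(a) * (1 + (x - a) + (x - a) ** 2 / 2)
print(approx, 'vs', math.exp(x))   # 1.625 vs 1.6487...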
Polynomial Regression: 1D Example
Input: scalars $x_1, \cdots, x_n \in \mathbb{R}$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
Output: a function $f: \mathbb{R} \mapsto \mathbb{R}$ such that $f(x) \approx y$.

One-dimensional example: $f(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_p x^p$.

Polynomial regression (a numpy sketch follows the steps):
1. Define a feature map $\boldsymbol{\phi}(x) = [1, x, x^2, x^3, \cdots, x^p]$.
2. For $j = 1$ to $n$, do the mapping $x_j \mapsto \boldsymbol{\phi}(x_j)$.
   • Let $\boldsymbol{\Phi} = [\boldsymbol{\phi}(x_1); \cdots; \boldsymbol{\phi}(x_n)]^T \in \mathbb{R}^{n \times (p+1)}$.
3. Solve the least squares regression: $\min_{\mathbf{w} \in \mathbb{R}^{p+1}} \|\boldsymbol{\Phi} \mathbf{w} - \mathbf{y}\|_2^2$.
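
A minimal numpy sketch of these three steps (the x and y values are made-up toy numbers):

import numpy as np

p = 3                                        # polynomial degree
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])      # toy inputs
y = np.array([1.0, 1.3, 2.1, 3.9, 7.2])      # toy labels

# Steps 1-2: map every sample to phi(x) = [1, x, x^2, ..., x^p].
Phi = np.vander(x, p + 1, increasing=True)   # shape (n, p + 1)

# Step 3: least squares regression on the mapped features.
w = np.linalg.lstsq(Phi, y, rcond=None)[0]
print('w =', w)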
Polynomial Regression: 2D Example
Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^2$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
Output: a function $f: \mathbb{R}^2 \mapsto \mathbb{R}$ such that $f(\mathbf{x}_i) \approx y_i$.

Two-dimensional example: how do we do the feature mapping?

Polynomial features (grouped by degree 0, 1, 2, 3):
$\boldsymbol{\phi}(\mathbf{x}) = [\,1,\; x_1, x_2,\; x_1^2, x_2^2, x_1 x_2,\; x_1^3, x_2^3, x_1 x_2^2, x_1^2 x_2\,]$.

Polynomial Regression

In [1]:
import numpy
X = numpy.arange(6).reshape(3, 2)
print('X = ')
print(X)

X = 
[[0 1]
 [2 3]
 [4 5]]

In [2]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
Phi = poly.fit_transform(X)
print('Phi = ')
print(Phi)

Phi = 
[[  1.   0.   1.   0.   0.   1.   0.   0.   0.   1.]
 [  1.   2.   3.   4.   6.   9.   8.  12.  18.  27.]
 [  1.   4.   5.  16.  20.  25.  64.  80. 100. 125.]]

(The columns of Phi are grouped by degree: degree-0, degree-1, degree-2, degree-3.)
In [1]:
from keras.datasets import boston_housing
# Boston housing data: the slides show only the import; load_data() returns
# the standard (train, test) splits of features and labels.
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
Polynomial Regression

• $\mathbf{x}$: $d$-dimensional
• $\boldsymbol{\phi}(\mathbf{x})$: degree-$p$ polynomial features
• The dimension of $\boldsymbol{\phi}(\mathbf{x})$ is $O(d^p)$ (verified in the sketch below).
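
This growth is easy to check with sklearn; the exact number of monomials of degree at most $p$ in $d$ variables is $\binom{d+p}{p}$, which is $O(d^p)$ for a fixed $p$:

import numpy as np
from math import comb
from sklearn.preprocessing import PolynomialFeatures

for d, p in [(2, 3), (10, 3), (100, 3)]:
    poly = PolynomialFeatures(degree=p).fit(np.zeros((1, d)))
    # sklearn's feature count matches the closed form C(d + p, p).
    print(d, p, poly.n_output_features_, comb(d + p, p))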
Polynomial Regression
Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
Output: a function $f: \mathbb{R}^d \mapsto \mathbb{R}$ such that $f(\mathbf{x}_i) \approx y_i$.
Training, Test, and Overfitting
Polynomial Regression: Training
Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
Feature map: $\boldsymbol{\phi}(\mathbf{x}) = \otimes^p \tilde{\mathbf{x}}$. Its dimension is $O(d^p)$.
Least squares: $\min_{\mathbf{w}} \|\boldsymbol{\Phi} \mathbf{w} - \mathbf{y}\|_2^2$.

Question: what will happen as $p$ grows?

1. For a sufficiently large $p$, the dimension of the feature $\boldsymbol{\phi}(\mathbf{x})$ exceeds $n$.
2. Then you can find $\mathbf{w}$ such that $\boldsymbol{\Phi} \mathbf{w} = \mathbf{y}$. (Zero training error; see the sketch below.)
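
A sketch illustrating point 2 with toy data of my own: once $p + 1 \geq n$, the least squares fit can interpolate the training points exactly:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=6)            # n = 6 toy training points
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(6)

Phi = np.vander(x, 6, increasing=True)       # degree p = 5, so p + 1 = n
w = np.linalg.lstsq(Phi, y, rcond=None)[0]
print('training error:', np.sum((Phi @ w - y) ** 2))   # ~0: interpolation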
Training and Testing

[Figure: two fits of the same data, illustrating underfitting and overfitting.]
Training and Testing

Train:
• Input: vectors $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and labels $y_1, \cdots, y_n \in \mathbb{R}$.
• Output: a function $f: \mathbb{R}^d \mapsto \mathbb{R}$ such that $f(\mathbf{x}_i) \approx y_i$.

Test:
• Input: a never-before-seen feature vector $\mathbf{x}' \in \mathbb{R}^d$.
• Output: predict its label by $f(\mathbf{x}')$.
Training and Testing

[Figure: predictions at a new point $\mathbf{x}'$ under three different fits, labeled BAD, GOOD, BAD.]
Training and Testing

[Figure: the same data fit by a linear model, a 4-degree polynomial, and a 15-degree polynomial.]
Hyper-Parameter Tuning
Question: for the polynomial regression model, how do we determine the degree $p$?
Answer: choose the degree $p$ that leads to the smallest test error.
Hyper-Parameter Tuning
Training Set → Test Set

Train a degree-1 polynomial regression → Test MSE = 23.2
Train a degree-2 polynomial regression → Test MSE = 19.0
Train a degree-3 polynomial regression → Test MSE = 16.7
Train a degree-4 polynomial regression → Test MSE = 12.2
Train a degree-5 polynomial regression → Test MSE = 14.8
Train a degree-6 polynomial regression → Test MSE = 25.1
Train a degree-7 polynomial regression → Test MSE = 39.4
Train a degree-8 polynomial regression → Test MSE = 53.0
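
A sketch of this tuning loop with sklearn (the dataset behind the MSE numbers above is not reproduced here, so X_train, y_train, X_test, y_test are assumed to be given numpy arrays):

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def tune_degree(X_train, y_train, X_test, y_test, degrees=range(1, 9)):
    # Train one polynomial regression per degree and report its test MSE.
    for p in degrees:
        model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
        model.fit(X_train, y_train)
        mse = mean_squared_error(y_test, model.predict(X_test))
        print(f'degree-{p}: test MSE = {mse:.1f}')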
Cross-Validation (Naïve Approach)
for Hyper-Parameter Tuning
Cross-Validation (Naïve Approach)

[Figure: a table of features with labels, split into $n$ training samples and $m$ test samples.]


Cross-Validation (Naïve Approach)

[Figure: the training rows are further split into $n$ training samples and $n_{\text{val}}$ validation samples; the $m$ test samples are untouched.]
Cross-Validation (Naïve Approach)
Training Set | Validation Set | Test Set

Train a degree-1 polynomial regression → Valid. MSE = 23.1
Train a degree-2 polynomial regression → Valid. MSE = 19.2
Train a degree-3 polynomial regression → Valid. MSE = 16.3
Train a degree-4 polynomial regression → Valid. MSE = 12.5
Train a degree-5 polynomial regression → Valid. MSE = 14.4
Train a degree-6 polynomial regression → Valid. MSE = 25.0
Train a degree-7 polynomial regression → Valid. MSE = 39.1
Train a degree-8 polynomial regression → Valid. MSE = 53.5

Choose the degree with the smallest validation MSE: here $p = 4$ (Valid. MSE = 12.5).
𝑘-Fold Cross-Validation
𝑘-Fold Cross-Validation
1. Propose a grid of hyper-parameters.
   • E.g., $p \in \{1, 2, 3, 4, 5\}$.
2. Randomly partition the training samples into $k$ parts.
   • $k - 1$ parts for training.
   • One part for validation.
3. Compute the average of the held-out errors over the $k$ repeats.
   • This average is called the validation error.
4. Choose the hyper-parameter $p$ that leads to the smallest validation error (see the sketch below).

Example: 5-fold cross-validation
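
A sketch of the whole procedure with sklearn's KFold (illustrative, using the grid $p \in \{1, \dots, 5\}$ from step 1; X and y are assumed to be numpy arrays holding the training samples):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def cross_validate_degree(X, y, degrees=(1, 2, 3, 4, 5), k=5):
    best_p, best_err = None, np.inf
    for p in degrees:
        fold_errs = []
        kfold = KFold(n_splits=k, shuffle=True, random_state=0)
        for train_idx, val_idx in kfold.split(X):
            # Train on k - 1 parts, evaluate on the held-out part.
            model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
            model.fit(X[train_idx], y[train_idx])
            fold_errs.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))
        err = np.mean(fold_errs)   # validation error: average over the k folds
        if err < best_err:
            best_p, best_err = p, err
    return best_p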


Example: 10-Fold Cross-Validation

[Figure: validation error plotted against the hyper-parameter $p$.]
Example: 10-Fold Cross-Validation
hyper-parameter    validation error
p = 1              23.19
p = 2              21.00
p = 3              18.54
p = 4              24.36
p = 5              27.96

Choose $p = 3$, which attains the smallest validation error.
Real-World Machine Learning Competition
The Available Data
             Training       Test (Public)                  Test (Private)
Labels:      $\mathbf{y}$   unknown                        unknown
Features:    $\mathbf{X}$   $\mathbf{X}_{\text{public}}$   $\mathbf{X}_{\text{private}}$

The public and private test samples are mixed; participants cannot distinguish them.
Train A Model
Fit a model to the training features $\mathbf{X}$ and labels $\mathbf{y}$: $(\mathbf{X}, \mathbf{y}) \to$ Model.
Prediction
Apply the trained model to the test features: Model$(\mathbf{X}_{\text{public}}) = \mathbf{y}_{\text{public}}$ and Model$(\mathbf{X}_{\text{private}}) = \mathbf{y}_{\text{private}}$.
Submission to Leaderboard
Submit the predictions $\mathbf{y}_{\text{public}}$ and $\mathbf{y}_{\text{private}}$. The score on the public test data (e.g., Score = 0.9527) is shown to the participant; the score on the private test data is kept secret.
Submission to Leaderboard
Question: why two leaderboards?

Answer: the public score could otherwise be abused for hyper-parameter tuning (cheating), so the private score is kept secret.
Summary

• Polynomial regression handles non-linear regression problems.
• Polynomial regression has a hyper-parameter: the degree $p$.
• A very small $p$ causes underfitting; a very big $p$ causes overfitting.
• Tune the hyper-parameter using cross-validation.
• Make your model parameters and hyper-parameters independent of the test set!
