Shusen Wang
Warm-up: Linear Regression
Linear Regression (Task)
Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝᵈ and labels y₁, ⋯, yₙ ∈ ℝ.
Least Squares Regression (Method)
Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝᵈ and labels y₁, ⋯, yₙ ∈ ℝ.
1. Add one dimension to 𝐱ᵢ ∈ ℝᵈ: 𝐱̃ᵢ = [𝐱ᵢ; 1] ∈ ℝᵈ⁺¹.
2. Solve least squares regression: min_{𝐰 ∈ ℝᵈ⁺¹} ‖𝐗̃ᵀ𝐰 − 𝐲‖₂², where 𝐗̃ = [𝐱̃₁, ⋯, 𝐱̃ₙ].
[Diagram: Tasks (linear regression) matched to Methods (least squares regression, conjugate gradient).]
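The two steps above can be sketched in NumPy (the data and variable names below are illustrative, not from the slides):

```python
import numpy as np

# Toy data (illustrative): n = 5 samples, d = 2 features.
X = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [2.0, 2.0],
              [3.0, 1.5],
              [4.0, 3.0]])
y = np.array([1.0, 2.0, 4.0, 5.5, 8.0])

# Step 1: append a constant 1 to every x_i, giving x_tilde_i in R^(d+1).
X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])

# Step 2: solve min_w ||X_tilde w - y||_2^2 by least squares.
w, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
print(w.shape)  # (3,): two weights plus an intercept
```

The appended constant lets the fitted hyperplane have a nonzero intercept while the problem stays an ordinary least squares solve.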
Polynomial Regression
The Regression Task
Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝᵈ and labels y₁, ⋯, yₙ ∈ ℝ.
Output: a function f: ℝᵈ ↦ ℝ such that f(𝐱ᵢ) ≈ yᵢ.
Polynomial regression:
1. Define a feature map 𝛟(x) = [1, x, x², x³, ⋯, xᵖ].
2. For j = 1 to n, do the mapping xⱼ ↦ 𝛟(xⱼ).
   • Let 𝚽 = [𝛟(x₁); ⋯; 𝛟(xₙ)] ∈ ℝ^{n×(p+1)}.
3. Solve the least squares regression min_{𝐰 ∈ ℝᵖ⁺¹} ‖𝚽𝐰 − 𝐲‖₂².
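The three steps above can be sketched for 1-D inputs with NumPy (synthetic data; `numpy.vander` builds the feature matrix 𝚽):

```python
import numpy as np

# Synthetic 1-D data: y is a cubic in x plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 1 + 2 * x - 3 * x**3 + 0.01 * rng.standard_normal(20)

p = 3  # polynomial degree

# Steps 1-2: map every x_j to phi(x_j) = [1, x_j, x_j^2, ..., x_j^p].
Phi = np.vander(x, N=p + 1, increasing=True)  # shape (n, p+1)

# Step 3: solve min_w ||Phi w - y||_2^2.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(w, 1))  # approximately [1, 2, 0, -3]
```

Because the data were generated from the cubic 1 + 2x − 3x³, the recovered coefficients land close to the true ones.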
Polynomial Regression: 2D Example
Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ² and labels y₁, ⋯, yₙ ∈ ℝ.
Output: a function f: ℝ² ↦ ℝ such that f(𝐱ᵢ) ≈ yᵢ.
Polynomial features (degree 3):
𝛟(𝐱) = [1, x₁, x₂, x₁², x₂², x₁x₂, x₁³, x₂³, x₁x₂², x₁²x₂].
Polynomial Regression
In [1]:
import numpy
X = numpy.arange(6).reshape(3, 2)
print('X = ')
print(X)

X =
[[0 1]
 [2 3]
 [4 5]]

In [2]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
Phi = poly.fit_transform(X)
print('Phi = ')
print(Phi)

Phi =
[[  1.   0.   1.   0.   0.   1.   0.   0.   0.   1.]
 [  1.   2.   3.   4.   6.   9.   8.  12.  18.  27.]
 [  1.   4.   5.  16.  20.  25.  64.  80. 100. 125.]]
(columns grouped by degree: degree-0, degree-1, degree-2, degree-3)

In [1]:
from keras.datasets import boston_housing
Polynomial Regression
• 𝐱: d-dimensional
• 𝛟(𝐱): degree-p polynomial
• The dimension of 𝛟(𝐱) is O(dᵖ).
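The slide states the O(dᵖ) bound; a standard combinatorial fact (not on the slide) gives the exact count of monomials of degree at most p in d variables as C(d+p, p), which we can tabulate to see the growth:

```python
from math import comb

p = 3
for d in [2, 10, 100]:
    # Number of monomials of degree <= p in d variables: C(d+p, p) = O(d^p).
    print(d, comb(d + p, p))
```

For d = 2, p = 3 this gives 10, matching the ten features in the 2-D example above.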
Polynomial Regression
Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝᵈ and labels y₁, ⋯, yₙ ∈ ℝ.
Output: a function f: ℝᵈ ↦ ℝ such that f(𝐱ᵢ) ≈ yᵢ.
Training, Test, and Overfitting
Polynomial Regression: Training
Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝᵈ and labels y₁, ⋯, yₙ ∈ ℝ.
Feature map: 𝛟(𝐱) = ⊗ᵖ 𝐱̃. Its dimension is O(dᵖ).
Least squares: min_𝐰 ‖𝚽𝐰 − 𝐲‖₂².
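The tensor-product feature map ⊗ᵖ𝐱̃ can be sketched with repeated Kronecker products (a sketch, assuming 𝐱̃ = [𝐱; 1]; the helper name is hypothetical). Its length (d+1)ᵖ makes the O(dᵖ) dimension concrete:

```python
import numpy as np
from functools import reduce

def tensor_features(x, p):
    """p-fold Kronecker product of x_tilde = [x; 1] (illustrative helper)."""
    x_tilde = np.append(np.asarray(x, dtype=float), 1.0)
    return reduce(np.kron, [x_tilde] * p)

phi = tensor_features([2.0, 3.0], p=3)
print(phi.shape)  # (27,) = (d+1)^p with d = 2, p = 3
```

The entries of 𝛟(𝐱) are all products of p coordinates of 𝐱̃, i.e. every monomial in 𝐱 of degree at most p (with repeats).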
[Figure: underfitting vs. overfitting.]
Training and Testing
[Figure: predictions at a test point 𝐱′ — linear model (bad: underfitting), 4-degree polynomial (good), 15-degree polynomial (bad: overfitting).]
Hyper-Parameter Tuning
Question: for the polynomial regression model, how do we determine the degree p?
Answer: choose the degree p that leads to the smallest test error.
Hyper-Parameter Tuning
[Diagram: the data are split into a training set and a test set.]
Train a degree-1 polynomial regression → Test MSE = 23.2
Train a degree-2 polynomial regression → Test MSE = 19.0
Train a degree-3 polynomial regression → Test MSE = 16.7
Train a degree-4 polynomial regression → Test MSE = 12.2
Train a degree-5 polynomial regression → Test MSE = 14.8
Train a degree-6 polynomial regression → Test MSE = 25.1
Train a degree-7 polynomial regression → Test MSE = 39.4
Train a degree-8 polynomial regression → Test MSE = 53.0
⇒ The degree-4 model attains the smallest test MSE (12.2), so choose p = 4.
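The selection loop behind a table like this can be sketched as follows (synthetic data, so the MSE values will not match the table's):

```python
import numpy as np

# Synthetic data (illustrative; not the slides' dataset).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(60)
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]

def fit_poly(x, y, p):
    # Train: least squares on polynomial features [1, x, ..., x^p].
    Phi = np.vander(x, N=p + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def mse(x, y, w):
    # Evaluate: mean squared error of the fitted polynomial.
    Phi = np.vander(x, N=len(w), increasing=True)
    return np.mean((Phi @ w - y) ** 2)

# Fit each degree on the training set; score it on the test set.
test_mse = {p: mse(x_test, y_test, fit_poly(x_train, y_train, p))
            for p in range(1, 9)}
best_p = min(test_mse, key=test_mse.get)
print(best_p)
```

Note the model is never fitted on the test set; the test set is used only to compare degrees.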
Cross-Validation (Naïve Approach) for Hyper-Parameter Tuning
[Diagram: the labeled data (features and labels) are partitioned into folds; each fold in turn serves as the validation set, and the validation error is computed on it.]
Example: 10-Fold Cross-Validation
hyper-parameter validation error
p=1 23.19
p=2 21.00
p=3 18.54
p=4 24.36
p=5 27.96
⇒ p = 3 attains the smallest validation error.
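A 10-fold cross-validation loop of this kind might look as follows (plain NumPy sketch on synthetic 1-D data; the validation errors will differ from the table's):

```python
import numpy as np

# Synthetic 1-D data (illustrative; not the slides' dataset).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(100)

k = 10
folds = np.array_split(rng.permutation(100), k)  # 10 disjoint index sets

def cv_error(p):
    errs = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit a degree-p polynomial on the k-1 training folds.
        Phi_tr = np.vander(x[train], N=p + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi_tr, y[train], rcond=None)
        # Measure MSE on the held-out fold.
        Phi_val = np.vander(x[val], N=p + 1, increasing=True)
        errs.append(np.mean((Phi_val @ w - y[val]) ** 2))
    return float(np.mean(errs))

errors = {p: cv_error(p) for p in range(1, 6)}
best_p = min(errors, key=errors.get)
print(best_p)
```

Averaging over all k held-out folds makes the estimate less sensitive to any single train/validation split than the naïve single split.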
Real-World Machine Learning Competition
The Available Data

              Training    Public (test)    Private (test)
Labels:       𝐲           unknown          unknown
Features:     𝐗           𝐗_public         𝐗_private

The public and private test data are mixed; participants cannot distinguish them.
Train A Model
A model is trained on the training data (𝐗, 𝐲); the test labels remain unknown.
Prediction
The trained model is applied to the test features 𝐗_public and 𝐗_private, producing predictions 𝐲_public and 𝐲_private.
Submission to Leaderboard
The predictions 𝐲_public and 𝐲_private are submitted; the leaderboard reports the score on the public test data (e.g., Score = 0.9527), while the score on the private test data is kept secret.