Regression Problem

y = w_0 + w_1 x

where w_0 is the intercept and w_1 is the slope.

In the example:
• w_0 = 15
• w_1 = 0.8
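As a minimal sketch, the example line above can be evaluated directly; the intercept 15 and slope 0.8 are the values from the slide, while the function name and inputs are illustrative.

```python
# Simple linear model from the example: y = w0 + w1 * x,
# with intercept w0 = 15 and slope w1 = 0.8 (values from the slide).

def predict(x, w0=15.0, w1=0.8):
    """Predict y for input x using the line y = w0 + w1 * x."""
    return w0 + w1 * x

# For x = 10 the model predicts 15 + 0.8 * 10 = 23.
print(predict(10))  # 23.0
```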
Regression Problem

• Many alternative solutions
• Linear modeling is an assumption
Linear Regression
Given observed pairs ⟨x_i, y_i⟩, i = 1, …, N (noisy observations), find the best linear model

y = w_0 + w_1 x

• Infinite solutions exist
• We need to define an optimality criterion: Least Squares Estimation!
Least Squares Regression

The residual sum of squares (RSS) measures the total squared prediction error over the observations:

RSS = Σ_{i=1}^N (y_i − (w_0 + w_1 x_i))²

The goal is then to find the values of the weights / parameters w_0, w_1 (sometimes named β_0, β_1) which minimize the RSS.
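A sketch of the RSS for candidate weights, assuming a small made-up dataset (the points below are illustrative, not from the slides):

```python
# RSS for simple linear regression: RSS = sum_i (y_i - (w0 + w1 * x_i))**2

def rss(w0, w1, xs, ys):
    """Residual sum of squares of the line y = w0 + w1 * x on the data."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # lies exactly on y = 1 + 2x

print(rss(1.0, 2.0, xs, ys))       # 0.0: the true line has zero residual error
print(rss(0.0, 2.0, xs, ys))       # 4.0: shifting the intercept adds error
```

Minimizing this quantity over w0 and w1 is exactly the least squares criterion.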
Multivariate Linear Regression

y = w_0 + w_1 x_1 + w_2 x_2 + ⋯ + w_M x_M + ε

Example: a regression with 2 features [x_1, x_2] and 1 target variable y. The least squares solution is a plane, chosen by minimizing the distances between the observations and the plane.

The residual sum of squares becomes

RSS = Σ_{i=1}^N (y_i − (w_0 + w_1 x_i1 + w_2 x_i2 + ⋯ + w_M x_iM))²
    = Σ_{i=1}^N (y_i − (w_0 + Σ_{j=1}^M w_j x_ij))²

In matrix form, stacking the observations as

  y = [y_1, y_2, ⋯, y_N]ᵀ
  X = [[1, x_11, x_12, ⋯, x_1M],
       [1, x_21, x_22, ⋯, x_2M],
       ⋯
       [1, x_N1, x_N2, ⋯, x_NM]]
  w = [w_0, w_1, w_2, ⋯, w_M]ᵀ

where
• y is an N × 1 vector of target values
• X is an N × (M+1) data matrix
• w is an (M+1) × 1 vector of weights

the RSS can be written as

RSS(w) = (y − Xw)ᵀ(y − Xw)
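The matrix form can be sketched in a few lines of NumPy; the tiny dataset (N = 4 samples, M = 2 features) and the candidate weights are made up for illustration.

```python
import numpy as np

# RSS in matrix form: RSS(w) = (y - Xw)^T (y - Xw).
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 1.0]])      # first column of ones for the intercept w0
y = np.array([2.0, 3.0, 8.0, 7.0])

def rss(w, X, y):
    r = y - X @ w                    # residual vector y - Xw
    return float(r @ r)              # r.T @ r for 1-D arrays

print(rss(np.array([1.0, 2.0, 1.0]), X, y))  # 2.0
```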
Multivariate Linear Regression

RSS(w) = (y − Xw)ᵀ(y − Xw)

RSS(w) is a quadratic function of w, thus convex, thus it has a unique minimum. Let's compute the derivatives:

∂RSS/∂w = −2Xᵀ(y − Xw)

Setting the derivatives equal to zero and solving for w, we obtain

w = (XᵀX)⁻¹ Xᵀ y

which is known as the normal equations solution for the least squares problem. The matrix (XᵀX)⁻¹Xᵀ is the Moore–Penrose pseudo-inverse of X; computing it is quite expensive with many samples.
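A minimal sketch of the normal equations on synthetic, noise-free data (the weights and sample sizes are arbitrary choices for illustration). Solving the linear system XᵀXw = Xᵀy is preferable to forming the inverse explicitly.

```python
import numpy as np

# Normal equations: w = (X^T X)^{-1} X^T y, on synthetic data.
rng = np.random.default_rng(0)
N, M = 100, 2
X_raw = rng.normal(size=(N, M))
true_w = np.array([15.0, 0.8, -1.5])             # [w0, w1, w2], arbitrary
X = np.hstack([np.ones((N, 1)), X_raw])          # prepend the column of ones
y = X @ true_w                                   # noise-free targets

# Solve X^T X w = X^T y instead of inverting X^T X.
w = np.linalg.solve(X.T @ X, X.T @ y)            # recovers true_w (up to float error)

# np.linalg.lstsq uses a more stable factorization and handles rank deficiency.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```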
Non-linearity of Data

y = w_0 + w_1 x_1 + w_2 x_2 + ⋯ + w_M x_M + ε
Generalized Linear Regression

Given a set of input variables x, a set of N examples ⟨x_i, y_i⟩, and a set of D features h_j computed from the input variables x_i, we get the model

y = w_0 + w_1 h_1(x) + w_2 h_2(x) + ⋯ + w_D h_D(x) + ε
Polynomial Regression

Polynomial regression is a special case of generalized linear regression with features h_j(x) = x^j, i.e., the model

y = w_0 + w_1 x + w_2 x² + ⋯ + w_D x^D + ε
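Since the model is still linear in the weights, the same least squares machinery applies; only the design matrix changes. A sketch on synthetic data generated from an arbitrary quadratic:

```python
import numpy as np

# Polynomial regression as linear regression on powers of x.
# Synthetic data from y = 2 - x + 0.5 x^2 (coefficients chosen for illustration).
x = np.linspace(-3, 3, 50)
y = 2.0 - x + 0.5 * x ** 2

degree = 2
X = np.vander(x, degree + 1, increasing=True)   # columns: 1, x, x^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # least squares fit
print(w)                                        # ~ [2.0, -1.0, 0.5]
```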
Coefficient of Determination R²

R² measures how well the regression line approximates the real data points. When R² is 1, the regression line perfectly fits the data. But what about new data?
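R² can be sketched as 1 − RSS/TSS, where TSS is the total sum of squares around the mean of the targets (the helper name and data below are illustrative):

```python
import numpy as np

# R^2 = 1 - RSS/TSS, with TSS = sum_i (y_i - mean(y))^2.
def r_squared(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - rss / tss

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y_true, y_true))                     # 1.0: a perfect fit
print(r_squared(y_true, np.full(4, y_true.mean())))  # 0.0: predicting the mean
```

Note that R² computed on the training data says nothing about generalization, which motivates the evaluation schemes below.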
Model Evaluation
Models should be evaluated using data that have not been used to build
the model itself:
• Training data will be used to build the model
• Test data will be used to evaluate the model performance
Hold Out Evaluation
Reserve a certain amount of data for testing and use the remainder for training:
• Too small a training set might result in poor weight estimation
• Too small a test set might result in a poor estimate of future performance

Typically, reserve 2/3 for training and 1/3 for testing (but the percentages may vary).
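A hold-out split can be sketched by shuffling the sample indices and cutting at the 2/3 mark (the function and seed are illustrative; libraries such as scikit-learn provide a tested `train_test_split`):

```python
import numpy as np

# Hold-out split: shuffle indices, take 2/3 for training, 1/3 for testing.
def holdout_split(n_samples, train_frac=2 / 3, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)       # random order, no repeats
    n_train = int(round(n_samples * train_frac))
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = holdout_split(30)
print(len(train_idx), len(test_idx))       # 20 10
```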
Hold Out Evaluation on Housing Data
Given the original dataset, split the data into 2/3 train and 1/3 test and
then apply linear regression using polynomials of increasing degree.
Cross-Validation
The data is split into k equal folds (here k = 10, numbered 1–10). In partition p1, fold 1 is used for testing and folds 2–10 for training; in partition p2, fold 2 is used for testing and the rest for training; and so on up to p10. Each fold serves as the test set exactly once. Sometimes the whole procedure is repeated multiple times (e.g., 10).
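The fold scheme described above can be sketched as an index generator (a minimal illustration; libraries such as scikit-learn provide a tested `KFold`):

```python
# K-fold cross-validation: yield (train, test) index lists, one per fold.
def k_fold_indices(n_samples, k=10):
    # distribute any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(10, k=10))
print(len(folds))   # 10 partitions; each sample is used for testing exactly once
```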
K-fold Cross-Validation on Housing Data
Regression: the machine learning perspective
Example: Increasing Sales by Advertising
What can we ask of the data?
Example: TV Advertising vs Sales
Least squares fitting

The least squares solution is

β̂_1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)²
β̂_0 = ȳ − β̂_1 x̄

where x̄ and ȳ are the sample means of x and y.

But are these estimates any good?
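The closed-form solution above can be sketched directly (the function name and data points are illustrative):

```python
# Closed-form least squares for one feature:
#   w1 = sum((x_i - mean_x)(y_i - mean_y)) / sum((x_i - mean_x)^2)
#   w0 = mean_y - w1 * mean_x
def fit_simple(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w1 = sxy / sxx
    return my - w1 * mx, w1      # (intercept, slope)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]        # lies exactly on y = 1 + 2x
print(fit_simple(xs, ys))        # (1.0, 2.0)
```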
Population regression line
Parameter confidence intervals

From the standard errors we can compute confidence intervals for the linear regression parameters. E.g., the 95% confidence interval for the slope is

β̂_1 ± 2 · SE(β̂_1)

i.e., the true slope lies, with 95% probability, in the range [β̂_1 − 2·SE(β̂_1), β̂_1 + 2·SE(β̂_1)]. (Strictly, the factor 2 should be the 97.5% quantile of a t-distribution.)
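A sketch of the interval computation, assuming the standard simple-regression error formulas (σ̂² = RSS/(n−2), SE(β̂_1)² = σ̂²/Σ(x_i − x̄)², SE(β̂_0)² = σ̂²[1/n + x̄²/Σ(x_i − x̄)²]) and the ±2·SE approximation from the slide; the function name and data are illustrative.

```python
import math

# ~95% confidence intervals for simple linear regression parameters.
def confidence_intervals(xs, ys, w0, w1):
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    rss = sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)                               # noise variance estimate
    se_w1 = math.sqrt(sigma2 / sxx)
    se_w0 = math.sqrt(sigma2 * (1 / n + mx ** 2 / sxx))
    # "2" approximates the 97.5% quantile of a t-distribution
    return (w0 - 2 * se_w0, w0 + 2 * se_w0), (w1 - 2 * se_w1, w1 + 2 * se_w1)

# Noise-free data on y = 1 + 2x: RSS = 0, so the intervals have zero width.
ci_w0, ci_w1 = confidence_intervals([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], 1.0, 2.0)
print(ci_w0, ci_w1)   # (1.0, 1.0) (2.0, 2.0)
```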
Standard error & linear regression

The (squared) standard error of the mean estimator represents the average squared distance of the sample mean from the real mean.
Example: TV Advertising data

If we consider the 95% confidence intervals for the parameters, the intercept corresponds to the expected sales without any advertising.