What is regression?
Linear regression
Learning
• The algorithm learns the line, plane or hyper-plane that best fits the training samples.
Prediction
• Use the learned line, plane, or hyper-plane to predict the output value for any input sample.
Y = f(X), where f is a linear function.
Example: given a set of data points, say y = 2*x + 2 for x = range(1, 100), fit a line. Out of all the different linear functions, the least squares regression line is the unique line such that the sum of the squared vertical distances (errors) between the data points and the line is the smallest possible.
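A minimal sketch of this fit in Python (NumPy assumed; the added noise is an illustration, not part of the original example):

```python
import numpy as np

# Data from the example above: y = 2*x + 2, with a little noise added
# (an assumption, to make the fit non-trivial).
rng = np.random.default_rng(0)
x = np.arange(1, 100, dtype=float)           # x = range(1, 100)
y = 2 * x + 2 + rng.normal(0, 1.0, x.size)   # noisy observations

# Closed-form least squares: minimizes the sum of squared vertical distances.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(f"fitted line: y = {slope:.3f}*x + {intercept:.3f}")  # close to y = 2*x + 2
```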
For the 2-D problem, the coefficients can also be found by gradient descent (see the sketch after this list):
1. Start with some initial values for the coefficients.
2. Keep changing the coefficients a little bit to try and reduce the cost function / error function.
3. Move each coefficient in the direction that decreases the error.
4. Repeat until the error stops decreasing.
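A small gradient-descent sketch for this 2-D case (the learning rate and iteration count are assumptions; the intercept converges slowly here because x is not standardized):

```python
import numpy as np

def gradient_descent(x, y, lr=2e-4, n_iters=100_000):
    """Fit y = a*x + b by nudging (a, b) downhill on the mean squared error."""
    a, b = 0.0, 0.0                           # step 1: initial coefficient values
    n = len(x)
    for _ in range(n_iters):                  # step 4: repeat
        err = (a * x + b) - y                 # residuals under current coefficients
        a -= lr * (2 / n) * np.sum(err * x)   # steps 2-3: small updates that
        b -= lr * (2 / n) * np.sum(err)       # reduce the error function
    return a, b

x = np.arange(1, 100, dtype=float)
y = 2 * x + 2
print(gradient_descent(x, y))  # approaches (2.0, 2.0)
```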
A multiple linear regression model with $k$ predictor variables $X_1, X_2, \ldots, X_k$ and a response $Y$ can be written as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,$$
where $\varepsilon$ is a random error term.
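As a sketch, the coefficients of such a model can be estimated by least squares; the data below are hypothetical (k = 2 predictors, true coefficients chosen for illustration):

```python
import numpy as np

# Hypothetical data: Y = 1 + 2*X1 - 3*X2 + noise, with k = 2 predictors.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))                       # columns are X1, X2
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.1, n)

# Prepend a column of ones so the first coefficient is the intercept beta_0.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [1, 2, -3] = [beta_0, beta_1, beta_2]
```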
Example
[Figure: best-fit plane for two predictor variables x1, x2 and response y]
This extends to higher dimensions (hyperplanes) as well!
Parameter update
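A standard gradient-descent update rule for the coefficients (a sketch; the learning-rate symbol $\alpha$ and cost $J$ are assumed notation):
$$\beta_j \leftarrow \beta_j - \alpha \frac{\partial J}{\partial \beta_j}, \qquad J(\beta) = \frac{1}{2n}\sum_{i=1}^{n}\big(\hat{y}_i - y_i\big)^2,$$
which for linear regression works out to $\beta_j \leftarrow \beta_j - \frac{\alpha}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_{ij}$, applied simultaneously to all coefficients $j = 0, 1, \ldots, k$.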
Polynomial Regression
Instead of finding a best fit “line” on the given data points, we can also try to find
the best fit “curve”.
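A brief sketch, using noisy sine data in the spirit of Bishop's Ch1 example (the data-generating function and noise level are assumptions):

```python
import numpy as np

# Fit polynomial "curves" of increasing order to noisy sine data.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

for order in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=order)   # least squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"order {order}: training MSE = {train_mse:.4f}")
# Training error shrinks as the order grows, but an order-9 polynomial
# through 10 points also fits the noise -- the over-fitting discussed below.
```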
[Figure: two example fits, each labeled "Poor fit"]
So, should we always choose a "complex" model with higher-order polynomials to fit the data set?
NO: such a model may give very wrong predictions on test data. Though it fits the training data well, it fails to estimate the real relationship among the variables beyond the training set. This is known as "over-fitting".
Source: Bishop (Ch1)
[Table: coefficient values for polynomials of different order; the magnitudes grow rapidly as the order increases]
Bias: the amount by which the average of the estimates differs from the mean of the actual values.
Bias occurs when the model has enough data to learn from but is not complex enough to capture the underlying relationships (or patterns), leading to low accuracy in prediction. This is known as underfitting.
Variance occurs when the model pays too much attention to the training data, giving high error on the test set: the model is unable to generalize its predictions to the larger population. High sensitivity to the training set is also known as overfitting, and generally occurs when either the model is too complex or we do not have enough data to support it.
Bias-variance decomposition
$$\mathrm{MSE} = E\big[(\hat{y} - y)^2\big]$$
Let $\mu = E_D[\hat{y}]$. Adding and subtracting $\mu$:
$$\mathrm{MSE} = E\big[(\hat{y} - \mu + \mu - y)^2\big] = \underbrace{E\big[(\hat{y} - \mu)^2\big]}_{\text{Variance}} + \underbrace{(\mu - y)^2}_{\text{Bias}^2} + 2\,E\big[\hat{y} - \mu\big](\mu - y),$$
using $(a + b)^2 = a^2 + b^2 + 2ab$. Here $(\mu - y)$ is a constant and $E[\hat{y} - \mu] = 0$, so the cross term vanishes, leaving
$$\mathrm{MSE} = \mathrm{Bias}^2 + \mathrm{Variance}.$$
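The decomposition can be checked numerically with a small simulation: fit the same model on many independently drawn training sets $D$ and measure the bias and variance of $\hat{y}$ at one test point (the model, noise level, and test point below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * np.pi * x)      # assumed true function
x0, y0 = 0.25, np.sin(2 * np.pi * 0.25)  # test point and its true value
degree, n_train, n_datasets = 3, 15, 2000

preds = np.empty(n_datasets)
for d in range(n_datasets):
    x = rng.uniform(0, 1, n_train)
    y = f(x) + rng.normal(0, 0.3, n_train)   # one noisy training set D
    coeffs = np.polyfit(x, y, deg=degree)
    preds[d] = np.polyval(coeffs, x0)        # y-hat at x0 under this D

mu = preds.mean()                            # mu = E_D[y-hat]
bias_sq = (mu - y0) ** 2
variance = preds.var()
mse = np.mean((preds - y0) ** 2)
print(f"Bias^2 + Variance = {bias_sq + variance:.6f}")
print(f"MSE               = {mse:.6f}")      # matches the sum above
```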
Regularization: control overfitting
● Overfitting typically arises when complex models are applied to small datasets.
● Regularization: add a penalty term to the error function in order to discourage the coefficients from reaching large values.
● Penalty term: the sum of squares of all the coefficients (an L2, or "ridge", penalty).
The regularized error function then becomes
$$\tilde{E}(\beta) = \frac{1}{2}\sum_{i=1}^{n}\big(\hat{y}_i - y_i\big)^2 + \frac{\lambda}{2}\sum_{j}\beta_j^2,$$
where $\lambda$ is the regularization parameter that controls the strength of the penalty.
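A minimal ridge-style sketch (the order-9 polynomial and the $\lambda$ values are assumptions); the penalty is added by augmenting the least squares system:

```python
import numpy as np

# L2 regularization: minimize ||A w - y||^2 + lam * ||w||^2, implemented by
# augmenting the design matrix with sqrt(lam) * I and the targets with zeros.
rng = np.random.default_rng(4)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

degree = 9
A = np.vander(x, degree + 1)                  # polynomial design matrix
I = np.eye(degree + 1)

for lam in (0.0, 1e-4, 1.0):
    A_aug = np.vstack([A, np.sqrt(lam) * I])
    y_aug = np.concatenate([y, np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    print(f"lambda = {lam:g}: max |coefficient| = {np.abs(w).max():.2f}")
# Larger lambda shrinks the coefficients, giving a smoother, less over-fit curve.
```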
Effect of size of data on model complexity
For a given model complexity, over-fitting becomes less severe as the size of the data set increases; larger data sets can support more complex models.