1/21/2018 Linear Regression with Multiple Variables | Machine Learning, Deep Learning, and Computer Vision

Linear Regression with Multiple Variables


Summary: Linear Regression with Multiple Variables.

Table of Contents
1. Multivariate Linear Regression
– 1a. Multiple Features (Variables)
– 1b. Gradient Descent for Multiple Variables

http://www.ritchieng.com/multi-variable-linear-regression/ 1/12
– 1c. Gradient Descent: Feature Scaling
– 1d. Gradient Descent: Checking
– 1e. Gradient Descent: Learning Rate
– 1f. Features and Polynomial Regression
2. Computing Parameters Analytically
– 2a. Normal Equation
– 2b. Normal Equation Non-invertibility

1. Multivariate Linear Regression


I would like to give full credit to the respective authors, as these are my personal Python notebooks based on
deep learning courses from Andrew Ng, Data School and Udemy :) This is a simple Python notebook hosted
generously through GitHub Pages, and it is part of my main personal notes repository at
https://github.com/ritchieng/ritchieng.github.io . The notes are meant for my personal review, but I have open-sourced
my repository of personal notes because a lot of people found it useful.

1a. Multiple Features (Variables)


Multiple features: x_1, x_2, x_3, x_4 and more

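New hypothesis

h(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 + ... + theta_n * x_n

Multivariate linear regression

Defining x_0 = 1, the hypothesis reduces to a single number: the transposed theta vector multiplied by the x vector, h(x) = theta_transpose * x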
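The theta_transpose * x form can be sketched in a few lines of plain Python. This is only an illustration: the function name predict, the parameter values and the example row are all made up here, and x_0 = 1 is prepended inside the function.

```python
def predict(theta, features):
    """Return h(x) = theta_transpose * x for one training example.

    `features` holds x_1..x_n; x_0 = 1 is prepended here so that
    theta_0 acts as the intercept term.
    """
    x = [1.0] + list(features)
    if len(theta) != len(x):
        raise ValueError("theta needs one entry per feature plus the intercept")
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical parameters and one made-up example with n = 4 features.
theta = [80.0, 0.1, 0.01, 3.0, -2.0]
example = [2104.0, 5.0, 1.0, 45.0]
price = predict(theta, example)
```

However many features there are, the result is always one number per example.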

1b. Gradient Descent for Multiple Variables

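Summary

Hypothesis: h(x) = theta_transpose * x = theta_0 * x_0 + theta_1 * x_1 + ... + theta_n * x_n, with x_0 = 1

Parameters: theta_0, theta_1, ..., theta_n (an (n + 1)-dimensional vector theta)

Cost function: J(theta) = (1 / (2m)) * sum over i = 1..m of (h(x^(i)) - y^(i))^2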

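New Algorithm

Repeat until convergence, updating every theta_j simultaneously for j = 0, ..., n:

theta_j := theta_j - alpha * (1 / m) * sum over i = 1..m of (h(x^(i)) - y^(i)) * x_j^(i)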
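A minimal sketch of batch gradient descent for several features, in plain Python. The data is made up (generated from y = 1 + 2 * x_1 + 3 * x_2), and the function names, alpha = 0.1 and the iteration count are assumptions chosen for this toy example only.

```python
def predict(theta, x):
    """h(x) = theta_transpose * x, where x already includes x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = (1 / (2m)) * sum of squared errors."""
    m = len(y)
    return sum((predict(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent with a simultaneous update of every theta_j."""
    m, n_plus_1 = len(X), len(X[0])
    theta = [0.0] * n_plus_1
    for _ in range(iterations):
        errors = [predict(theta, x) - yi for x, yi in zip(X, y)]
        gradient = [
            sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n_plus_1)
        ]
        # Build the new list first so every theta_j is computed from the old values.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Made-up data from y = 1 + 2 * x_1 + 3 * x_2; the first column is x_0 = 1.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
```

On this toy data theta ends up very close to [1, 2, 3]. Real data will usually need feature scaling and a different alpha.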

1c. Gradient Descent: Feature Scaling


Ensure features are on a similar scale

Gradient descent will take longer to reach the global minimum when the features are not on a similar
scale, because the contours of J(theta) become elongated and the updates zig-zag

Feature scaling allows you to reach the global minimum faster

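As long as the feature ranges are close enough, they need not lie exactly between -1 and 1

Mean normalization: replace x_i with (x_i - mu_i) / s_i, where mu_i is the mean of feature i and s_i is its range (max - min) or its standard deviation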
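Mean normalization can be sketched as below. The helper name mean_normalize and the house data are made up for illustration, and the range (max - min) is used as the scale; the standard deviation would work as well.

```python
def mean_normalize(column):
    """Return (scaled values, mean, range) for one feature column."""
    mu = sum(column) / len(column)
    s = max(column) - min(column)
    if s == 0:
        raise ValueError("a constant feature cannot be scaled by its range")
    return [(v - mu) / s for v in column], mu, s


# Made-up house sizes and bedroom counts on very different scales.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
bedrooms = [1.0, 2.0, 3.0, 4.0, 5.0]

scaled_sizes, size_mu, size_range = mean_normalize(sizes)
scaled_bedrooms, bed_mu, bed_range = mean_normalize(bedrooms)
```

After scaling, both features lie in [-0.5, 0.5]. The stored mean and range have to be reused to scale any new example before predicting.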

1d. Gradient Descent: Checking


Can use a graph to check that gradient descent is working

x-axis: number of iterations

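y-axis: J(theta), the cost after that number of iterations; it should decrease on every iteration

Or use an automatic convergence test: declare convergence if J(theta) decreases by less than a small threshold epsilon in one iteration

A suitable epsilon is tough to gauge, so the graph is usually the more reliable check

If J(theta) increases instead, gradient descent is not working, typically because the learning rate is too large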
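Both checks can be written as small functions over a list of J(theta) values recorded once per iteration. This is a sketch only: the function names, the default epsilon of 1e-3 and the two cost histories are assumptions made for the example.

```python
def has_converged(cost_history, epsilon=1e-3):
    """True if the last iteration reduced J(theta) by less than epsilon."""
    if len(cost_history) < 2:
        return False
    decrease = cost_history[-2] - cost_history[-1]
    return 0 <= decrease < epsilon


def is_diverging(cost_history):
    """True if J(theta) went up on the last iteration."""
    return len(cost_history) >= 2 and cost_history[-1] > cost_history[-2]


# Hypothetical J(theta) values, one per iteration.
healthy = [10.0, 4.0, 2.0, 1.2, 1.0005, 1.0001]
too_large_alpha = [10.0, 14.0, 25.0, 60.0]

converged = has_converged(healthy)
diverging = is_diverging(too_large_alpha)
```

The result depends entirely on the chosen epsilon, which is why plotting the same list against the iteration number is still worth doing.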

1e. Gradient Descent: Learning Rate


Alpha (Learning Rate) too small: slow convergence

Alpha (Learning Rate) too large:

J(theta) may not decrease on every iteration

May not converge (diverge)

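Start with alpha = 0.001 and increase it roughly x3 each time (0.001, 0.003, 0.01, 0.03, ...) until you reach the largest alpha for which J(theta) still decreases on every iteration

Choose a value slightly smaller than that largest acceptable alpha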
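The x3 search over alpha can be sketched as follows. The data is made up (y = 1 + 2 * x_1), and "acceptable" is taken here to mean that J(theta) decreased on each of 50 iterations; both the iteration count and that criterion are assumptions for this example.

```python
def cost_history(X, y, alpha, iterations):
    """Run batch gradient descent and return J(theta) after every iteration."""
    m, k = len(X), len(X[0])
    theta = [0.0] * k
    history = []
    for _ in range(iterations):
        errors = [sum(t * xi for t, xi in zip(theta, x)) - yi for x, yi in zip(X, y)]
        # Simultaneous update: the new list is built from the old theta.
        theta = [
            t - alpha * sum(e * x[j] for e, x in zip(errors, X)) / m
            for j, t in enumerate(theta)
        ]
        errors = [sum(t * xi for t, xi in zip(theta, x)) - yi for x, yi in zip(X, y)]
        history.append(sum(e * e for e in errors) / (2 * m))
    return history


def decreases_every_step(history):
    """True if each recorded cost is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(history, history[1:]))


# Made-up data from y = 1 + 2 * x_1; the first column is x_0 = 1.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]

candidates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
acceptable = [a for a in candidates if decreases_every_step(cost_history(X, y, a, 50))]
largest_acceptable = max(acceptable)
```

On this toy data alpha = 1.0 makes the cost blow up while 0.3 still works, so a value a little below 0.3 would be the pick.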

1f. Features and Polynomial Regression


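Ensure the features capture the pattern in the data

A quadratic function of size doesn't make sense for house prices, since a quadratic eventually curves back down as size grows

Use a cubic or square root function of size instead

There are algorithms that choose features automatically; these will be discussed later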
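Polynomial regression is still linear regression, just on derived features. A small sketch with made-up sizes; the helper names are hypothetical.

```python
import math


def polynomial_features(size):
    """Map one size value to the cubic feature row [size, size^2, size^3]."""
    return [size, size ** 2, size ** 3]


def sqrt_features(size):
    """Map one size value to the feature row [size, sqrt(size)]."""
    return [size, math.sqrt(size)]


# Made-up house sizes.
sizes = [100.0, 400.0, 900.0]
cubic_rows = [polynomial_features(s) for s in sizes]
sqrt_rows = [sqrt_features(s) for s in sizes]
```

The cubic row for size 100 already spans 100 to 1,000,000, so feature scaling matters a lot before running gradient descent on such features.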

2. Computing Parameters Analytically

2a. Normal Equation


Method to solve for theta analytically

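If theta is a real number (a scalar)

To minimise J(theta), take the derivative with respect to theta and equate it to zero

Solve for theta

If theta is not a scalar but an (n + 1)-dimensional vector

Take the partial derivative with respect to each theta_j and equate it to zero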

Solve for all thetas

Minimise Cost Function: Specific Example

X: m x (n + 1)

m: number of training examples

n: number of features

X_transpose: (n + 1) x m

X_transpose * X: [(n + 1) x m] * [m x (n + 1)] = (n + 1) x (n + 1)

(X_transpose * X)^-1 * X_transpose: [(n + 1) x (n + 1)] * [(n + 1) x m] = (n + 1) x m

theta: [(n + 1) x m] * [m x 1] = (n + 1) x 1, since y is m x 1

Minimise Cost Function: General

theta = (X_transpose * X)^-1 * X_transpose * y

Minimise Cost: Octave Code

theta = pinv(X' * X) * X' * y

No need for feature scaling when using the normal equation
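For comparison with the Octave line, here is a standard-library Python sketch of the normal equation. It is an assumption-laden illustration, not the course's code: it solves (X_transpose * X) * theta = X_transpose * y by Gaussian elimination instead of forming an inverse, and all helper names and data are made up. With NumPy, the Octave expression would correspond to numpy.linalg.pinv(X.T @ X) @ X.T @ y.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A * theta = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X_transpose * X is singular (non-invertible)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    theta = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (M[i][n] - known) / M[i][i]
    return theta


def normal_equation(X, y):
    """Solve (X_transpose * X) * theta = X_transpose * y for theta."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)  # (n + 1) x (n + 1)
    Xty = [row[0] for row in matmul(Xt, [[yi] for yi in y])]  # (n + 1) entries
    return solve(XtX, Xty)


# Made-up data from y = 1 + 2 * x_1 + 3 * x_2; the first column is x_0 = 1.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = normal_equation(X, y)
```

No alpha and no iterations are needed, and theta comes out at about [1, 2, 3] in a single step. Unlike pinv, this sketch raises an error when X_transpose * X is singular.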

Gradient Descent vs Normal Equation

Gradient Descent                      Normal Equation
------------------------------------  ------------------------------------------
Need to choose alpha                  No need to choose alpha
Needs many iterations                 Don't need to iterate
Works well with large n (10,000)      Slow if n is large (100, 1000 is fine)
Use when number of features > 1000    Fine so long as number of features < 1000

2b. Normal Equation Non-invertibility


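What happens if X_transpose * X is non-invertible (singular or degenerate)?

pinv(X' * X) * X' * y

pinv computes the pseudo-inverse, so this works regardless of whether X_transpose * X is invertible

Intuition of non-invertibility

Causes of non-invertibility: redundant (linearly dependent) features, or too many features (m <= n); in the second case, delete some features or use regularization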
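One common cause of a non-invertible X_transpose * X is a redundant feature, i.e. one that is a linear function of another. The sketch below uses made-up integer data so the determinant is exact; the helper names and both matrices are hypothetical.

```python
def gram_matrix(X):
    """Return X_transpose * X as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det3(A):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Columns: x_0 = 1, x_1 = a made-up size, x_2 = 3 * x_1 (a redundant feature).
X_redundant = [[1, 1, 3], [1, 2, 6], [1, 3, 9], [1, 4, 12]]
# Same first two columns with an independent third feature instead.
X_independent = [[1, 1, 5], [1, 2, 3], [1, 3, 8], [1, 4, 1]]

singular = det3(gram_matrix(X_redundant)) == 0
invertible = det3(gram_matrix(X_independent)) != 0
```

A zero determinant means the ordinary inverse does not exist. Dropping the redundant column makes the matrix invertible again.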
