
normal equation

In the "Normal Equation" method, we will minimize J by explicitly taking its derivatives with
respect to the θj s and setting them to zero. This allows us to find the optimum theta without
iteration.

we want to solve Xw = y for w, but sometimes there is no exact solution,

so we want to minimize the cost function J(w) = ‖y − Xw‖²

J(w) = (y − Xw)^T (y − Xw)
     = y^T y − y^T Xw − (Xw)^T y + (Xw)^T (Xw)
     = y^T y − 2 w^T X^T y + w^T X^T Xw
Set the derivative of the cost function to zero, since the derivative must be zero at a local optimum.

∂J(w)/∂w = −2 X^T y + 2 X^T Xw,  set to 0

−2 X^T y + 2 X^T Xw = 0
−X^T y + X^T Xw = 0
X^T Xw = X^T y
w = (X^T X)^{-1} X^T y
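
As a sanity check on this closed-form solution, here is a minimal NumPy sketch on synthetic data (the data setup and variable names are illustrative assumptions, not part of these notes). It solves the linear system X^T Xw = X^T y rather than forming the inverse explicitly, which is cheaper and numerically more stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 100 examples, 3 features, plus a column of ones for the intercept.
n_samples, n_features = 100, 3
X = np.hstack([np.ones((n_samples, 1)), rng.normal(size=(n_samples, n_features))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

# Normal equation: w = (X^T X)^{-1} X^T y.
# Solving X^T X w = X^T y avoids forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)

print("estimated w:", w)  # should be close to true_w
```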

normal equation vs gradient descent


gradient descent:                  | normal equation:
need to choose learning rate α     | no need to choose α
need to do many iterations         | no iterations - computed in one step
works well with large n            | slow if n is large (n ⩾ 10000)
                                   | problems if X^T X is not invertible
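
To make the comparison above concrete, here is a small sketch that fits the same kind of synthetic data with both methods (the learning rate, iteration count, and data are illustrative assumptions, not values from these notes).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100
X = np.hstack([np.ones((n_samples, 1)), rng.normal(size=(n_samples, 3))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=n_samples)

# Normal equation: one linear solve, no learning rate, no iterations.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: needs a learning rate α and many iterations,
# but each step only needs matrix-vector products, so it scales to large n.
alpha, n_iters = 0.01, 5000
w_gd = np.zeros(X.shape[1])
for _ in range(n_iters):
    grad = 2 * X.T @ (X @ w_gd - y) / n_samples  # gradient of the averaged squared error
    w_gd -= alpha * grad

print("normal equation :", w_normal)
print("gradient descent:", w_gd)  # the two estimates should nearly agree
```

With only a handful of features both approaches give essentially the same answer; the normal equation only becomes impractical when the number of features is very large or X^T X is not invertible.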
