1- You need feature scaling if you are using Gradient Descent,
but not if you are finding the minimum analytically by setting
d Y/d X = 0 (the normal equation)
2- If the cost function increases with more iterations of
Gradient Descent, then the learning rate may be too high for the model to converge to the minimum.
3- Time complexity of inverting an N x N matrix is O(N^3)
4- Common causes of Transpose(X) * X being non-invertible
a- Redundant features (one feature is linearly dependent on some other features)
b- Too many features (m <= n, where m = number of training examples and n = number of features)
5- LSTM # Need to read in more details
LR - assumptions, LR gradient descent, Conjugate gradient, BFGS, L-BFGS
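
A rough sketch for checking notes 1, 2 and 4 numerically, in plain Python with no libraries. Everything in it is an assumption made for illustration: the function names, the made-up house sizes and prices, and the choice of a one-feature linear model. It is one way to see the effects, not a reference implementation.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, learning_rate, iterations):
    """Batch gradient descent from theta = (0, 0); returns the cost after each step."""
    m = len(xs)
    theta0 = theta1 = 0.0
    history = [cost(theta0, theta1, xs, ys)]
    for _ in range(iterations):
        # Compute all errors first so both parameters are updated simultaneously.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        theta0 -= learning_rate * sum(errors) / m
        theta1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / m
        history.append(cost(theta0, theta1, xs, ys))
    return history


def standardize(xs):
    """Feature scaling: subtract the mean, divide by the standard deviation."""
    m = len(xs)
    mean = sum(xs) / m
    std = (sum((x - mean) ** 2 for x in xs) / m) ** 0.5
    return [(x - mean) / std for x in xs]


def closed_form(xs, ys):
    """Solve dJ/dtheta = 0 directly (one-feature case of the normal equation)."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


def gram_determinant(rows):
    """det(Transpose(X) * X) for a design matrix X with exactly two columns."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    return a * d - b * b


# Made-up data: house sizes (large raw values) and prices.
sizes = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
prices = [200.0, 290.0, 410.0, 500.0, 590.0]
scaled_sizes = standardize(sizes)

# Note 1: the same learning rate blows up on raw features, works on scaled ones.
raw_history = gradient_descent(sizes, prices, learning_rate=0.1, iterations=10)
scaled_history = gradient_descent(scaled_sizes, prices, learning_rate=0.1, iterations=10)
# The closed-form solution needs no scaling and no learning rate.
theta0, theta1 = closed_form(sizes, prices)

# Note 2: even with scaled features, too high a learning rate makes the cost grow.
too_high_history = gradient_descent(scaled_sizes, prices, learning_rate=2.5, iterations=10)

# Note 4a: second feature is 2 * first feature (redundant), so the determinant is 0.
redundant = gram_determinant([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
# Note 4b: one example, two features (m <= n), so the determinant is 0.
too_few_examples = gram_determinant([(3.0, 5.0)])
# Independent features with enough examples: non-zero determinant, invertible.
independent = gram_determinant([(1.0, 2.0), (2.0, 1.0), (3.0, 5.0)])

print("raw cost grew:", raw_history[-1] > raw_history[0])
print("scaled cost shrank:", scaled_history[-1] < scaled_history[0])
print("high learning rate cost grew:", too_high_history[-1] > too_high_history[0])
print("closed form:", theta0, theta1)
print("determinants:", redundant, too_few_examples, independent)
```

The iteration count is kept small on purpose: with raw features the cost grows so fast that many more steps would overflow a float. When Transpose(X) * X is singular as in 4a or 4b, the usual options are to drop the redundant features, use regularization, or use a pseudo-inverse instead of a plain inverse.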