
Machine Learning

Simple Linear Regression:


· Regression is used when we try to find a relation between variables.

· Linear regression is when we try to find a linear relation.

e.g., salary as a function of experience

Y = b0 + b1X

Y = dependent variable

X = independent variable

b0 = intercept

b1 = slope

b1 = sum{(x - x_mean)(y - y_mean)} / sum{(x - x_mean)^2}

Ordinary least squares method


· It is used to find the best fitting line.

· The best line is the one that minimizes

sum{(actual - predicted)^2} over every point (see the sketch below).
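A minimal sketch of these formulas in Python; the experience/salary numbers are made up for illustration, and the intercept uses the standard identity b0 = y_mean - b1 * x_mean (the fitted line passes through the point of means):

import numpy as np

# Hypothetical data: years of experience vs. salary (illustrative only).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([30, 35, 42, 48, 55], dtype=float)

# b1 = sum{(x - x_mean)(y - y_mean)} / sum{(x - x_mean)^2}
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()        # line passes through (x_mean, y_mean)

predicted = b0 + b1 * x
sse = np.sum((y - predicted) ** 2)   # ordinary least squares criterion
print(b0, b1, sse)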

Multiple Linear Regression


y = b0 + b1x1 + b2x2 + ... + bnxn

· It is used to find the relation between one dependent variable and many independent variables.
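A minimal sketch with scikit-learn, assuming two made-up predictors (experience and number of certifications) against salary:

from sklearn.linear_model import LinearRegression

# Hypothetical rows of [experience, certifications]; targets are salaries.
X = [[1, 0], [2, 1], [3, 1], [4, 2], [5, 2]]
y = [30, 36, 42, 50, 56]

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # b0 and [b1, ..., bn]
print(model.predict([[3, 2]]))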

Multi-colinearity:

It occurs when one predictor can be exactly predicted from the others, e.g., a full set of dummy variables always sums to the intercept column (the dummy variable trap).


Solution:

1. either eliminate the intercept,

2. or eliminate one of the dummy variables (keep only d-1 of them), as in the sketch below.
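A minimal sketch of solution 2 with pandas (the city column is a made-up example):

import pandas as pd

df = pd.DataFrame({"city": ["NY", "LA", "SF", "NY"]})

# drop_first=True keeps only d-1 dummy columns, avoiding the dummy variable trap.
dummies = pd.get_dummies(df["city"], drop_first=True)
print(dummies)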

Backward Elimination:

1. Select a significance level to stay in the model (default SL = 5%).

2. Fit the model with all predictors (input variables).

3. Look for the predictor with the highest p-value. If p > SL, go to step 4; otherwise the model is ready.

4. Remove that predictor.

5. Fit the model again without it and return to step 3 (see the sketch below).
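A minimal sketch of this loop with statsmodels; the synthetic data and the SL value are assumptions for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 5 candidate predictors
y = 3 + 2 * X[:, 0] + rng.normal(size=100)

SL = 0.05
cols = list(range(X.shape[1]))             # predictors still in the model
while True:
    model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = model.pvalues[1:]              # skip the intercept's p-value
    worst = pvals.argmax()
    if pvals[worst] <= SL:
        break                              # every remaining predictor is significant
    cols.pop(worst)                        # remove the predictor with the highest p
print("kept predictors:", cols)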

Polynomial Linear Regression:


y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

(It is still "linear" because the model is linear in the coefficients b.)
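A minimal scikit-learn sketch: expand x into its powers, then fit an ordinary linear model on them (the data is a made-up, roughly cubic example):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]], dtype=float)
y = np.array([1, 8, 27, 64, 125], dtype=float)

# Expand x into [1, x, x^2, x^3] and fit b0..b3 on those columns.
poly = PolynomialFeatures(degree=3)
model = LinearRegression().fit(poly.fit_transform(x), y)
print(model.predict(poly.transform([[6]])))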

Decision Tree:
· Regression

· Classification

Regression

Entropy:

It measures the impurity (disorder) of a group of samples; the tree chooses the split that reduces entropy the most, i.e., maximizes information gain.
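Entropy is the splitting criterion for classification trees; a minimal scikit-learn sketch with made-up data:

from sklearn.tree import DecisionTreeClassifier

# Toy data: two features, two classes (hypothetical values).
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]

# criterion="entropy" makes each split the one that reduces entropy most.
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(tree.predict([[1, 1]]))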

Random Forest Algorithm:


It uses ensemble learning (i.e., either combining multiple
algorithms to predict a better output, or using one algorithm
multiple times to predict a better output).
Steps:

1. Select k random data points from the training set.

2. Build a decision tree on those points.

3. Choose the number of trees n and repeat steps 1 and 2 for each tree.

4. For a new point, let every tree predict y and take the average of all outputs (see the sketch below).
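A minimal scikit-learn sketch (the data is made up; n_estimators is the number of trees n from step 3):

from sklearn.ensemble import RandomForestRegressor

X = [[1], [2], [3], [4], [5]]      # hypothetical experience values
y = [30, 35, 42, 48, 55]           # hypothetical salaries

# Each of the 10 trees is built on a random sample; predictions are averaged.
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
print(forest.predict([[3.5]]))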

Logistic Regression:
· It is used when the dependent variable is not continuous.

· It is a classifier.

· The output is in the form of yes or no (binary categories).

ln(p / (1 - p)) = b0 + b1x
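A minimal scikit-learn sketch, assuming made-up hours-studied vs. pass/fail data:

from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]   # hypothetical hours studied
y = [0, 0, 0, 1, 1, 1]               # fail (0) or pass (1)

model = LogisticRegression().fit(X, y)
print(model.predict([[3.5]]))        # predicted yes/no class
print(model.predict_proba([[3.5]]))  # [1 - p, p]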

Feature Scaling:
It is used to bring the features onto the same scale in order
to make calculations efficient.
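A minimal sketch with scikit-learn's StandardScaler (made-up features on very different scales):

from sklearn.preprocessing import StandardScaler

X = [[1, 50000], [2, 60000], [3, 80000]]   # age-like vs. salary-like scales

# Standardisation: (x - mean) / std per column, so all features are comparable.
print(StandardScaler().fit_transform(X))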
Confusion Matrix:
It gives us the counts of correct and incorrect predictions in
matrix form.
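A minimal sketch with scikit-learn (the label vectors are made up):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))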

K Nearest Neighbour:
· It is a classification algorithm.
· By default k=5

· To predict the category of a new point, we do this (sketch below):

1. Choose the number of neighbours k.

2. Find the k nearest points using the Euclidean distance formula.

3. The category having more of those neighbours is the
category the new point belongs to.
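A minimal scikit-learn sketch with two made-up clusters; n_neighbors=5 matches the default k, and the default metric is Euclidean distance:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]

# The new point gets the majority category among its 5 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[2, 2]]))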

SVM (Support Vector Machine):


1. It divides linearly separable categorical data with an optimal
hyperplane.
2. The optimal hyperplane is drawn by maximizing the
margin (found from the nearest points of each class).
3. These nearest points are called support vectors.
4. The two classes separate themselves along the boundary
defined by comparing these nearest points.
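A minimal scikit-learn sketch on made-up linearly separable data; support_vectors_ holds the nearest points that define the margin:

from sklearn.svm import SVC

X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel="linear").fit(X, y)
print(svm.support_vectors_)   # the support vectors
print(svm.predict([[3, 3]]))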
