
Linear Regression

Linear regression is a supervised learning algorithm generally used for prediction problems. It predicts the target variable by finding the best-fit line between the independent and dependent variables. It is a linear model approximating the relationship between two or more variables.
Dependent Variable: - The "state" or "final goal" we study and try to predict.
Independent Variable: - Also known as an explanatory variable; it can be seen as the "cause of the state".
Multiple Linear Regression: - When we have more than one independent variable, we call it "Multiple Linear Regression".
Multivariate Linear Regression: - When we have more than one dependent variable, we call it "Multivariate Linear Regression".

[Figure: best-fit line through sample data points, with the error shown as the distance between each observation and the line]

The well-known equation of a simple linear regression model is:

Y = a*x + b

Here, x -> independent variable
Y -> dependent variable
a -> slope parameter, which we can adjust
b -> intercept parameter on the dependent-variable axis

We can also write Y = f(x), i.e. Y depends on x.
Least Square Method: - It calculates the best-fit line by making the sum of the squared vertical distances between the line and the actual observations as small as possible.
Steps for plotting the best-fit line: -
1. Calculate the mean of the x-values (x̄) and the y-values (ȳ)
2. Calculate the slope of the line:

   M = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²

3. Calculate the y-intercept of the line:

   C = ȳ − M * x̄
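The three steps above can be sketched in Python (`best_fit_line` is an illustrative name, not a library function):

```python
def best_fit_line(xs, ys):
    """Least-squares slope M and intercept C for a simple linear fit."""
    n = len(xs)
    # Step 1: means of the x-values and y-values
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Step 2: M = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    m = numerator / denominator
    # Step 3: C = y_mean - M * x_mean
    c = y_mean - m * x_mean
    return m, c
```

With the example data below, this returns a slope of about −0.83.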

Example: -
Consider the following data: -

x 2 4 3 6 8 8 10
y 10 9 6 6 6 3 2

Now we plot the best-fit line for this example: -
Step-1: - Calculate the mean values

x̄ = (2 + 4 + 3 + 6 + 8 + 8 + 10) / 7 = 41 / 7 = 5.86

ȳ = (10 + 9 + 6 + 6 + 6 + 3 + 2) / 7 = 42 / 7 = 6
Step-2: - Calculate the slope of the line

 i   xᵢ   yᵢ   xᵢ − x̄   yᵢ − ȳ   (xᵢ − x̄)(yᵢ − ȳ)   (xᵢ − x̄)²
 1    2   10    -3.86      4         -15.44            14.90
 2    4    9    -1.86      3          -5.58             3.46
 3    3    6    -2.86      0           0                8.18
 4    6    6     0.14      0           0                0.02
 5    8    6     2.14      0           0                4.58
 6    8    3     2.14     -3          -6.42             4.58
 7   10    2     4.14     -4         -16.56            17.14

Slope(M) = Σᵢ₌₁⁷ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁⁷ (xᵢ − x̄)² = −44 / 52.86 = −0.83

Step-3: - Calculate the y-intercept


y = m*x + c
6 = -0.83*5.86+c => c = 10.86
0 = -0.83*x + 10.86 => x = 10.86/0.83 = 13.1

y-intercept = 10.86 & x-intercept = 13.1
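The hand calculation can be checked numerically; a small difference in the intercept appears because x̄ = 41/7 was rounded to 5.86 above:

```python
xs = [2, 4, 3, 6, 8, 8, 10]
ys = [10, 9, 6, 6, 6, 3, 2]
n = len(xs)

x_mean = sum(xs) / n   # 41/7 ≈ 5.857
y_mean = sum(ys) / n   # 42/7 = 6.0

# Sums matching the last two columns of the table
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
denominator = sum((x - x_mean) ** 2 for x in xs)

m = numerator / denominator   # ≈ -0.832  (table: -44 / 52.86)
c = y_mean - m * x_mean       # ≈ 10.88   (10.86 when using the rounded mean)
```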


Now we are ready with our best-fit line. The graph looks like:

[Figure: "Best-Fit Line" — data points (2, 10), (4, 9), (3, 6), (6, 6), (8, 6), (8, 3), (10, 2) with the fitted line crossing the dependent axis at (0, 10.86) and the independent axis at (13.1, 0)]

Pros: -
1. Simple to implement.
2. Can predict numeric (continuous) values.

Cons: -
1. Prone to overfitting.
2. Cannot be used when the relationship between the x & y variables is non-linear.

Logistic Regression: -

Probability of class: f(x) = 1 / (1 + e⁻ˣ)

The probability lies between 0 and 1:

0 -> as x approaches infinity in the -ve direction
1 -> as x approaches infinity in the +ve direction

It is a special kind of non-linear regression. Here x is the sum of the input variables weighted by the regression coefficients, and f(x) is the logistic function or logistic curve.

The function is also called the sigmoid function; it draws an S-shaped (sigmoid) curve and was first used when scientists faced the problem of modelling population growth.

f(x) = 1 / (1 + e⁻ˣ)

Logistic regression passes the input through f(x) and then treats the result as a probability. A classic example is classifying the Iris flower dataset.
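The logistic (sigmoid) function described above can be sketched as follows; the limits match the 0/1 behaviour noted earlier:

```python
import math

def sigmoid(x):
    """Logistic function: maps any real x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative x -> probability near 0; large positive x -> near 1
print(sigmoid(-10))  # ≈ 0.000045
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ≈ 0.99995
```

In a full logistic regression model, x would be the weighted sum of the input features (the coefficients learned by the regression).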
