Professional Documents
Culture Documents
Date of Performance:
Date of Submission:
Software: Python
Theory: In a simple linear regression, there is one independent variable and one dependent
variable . The model estimates the slope and intercept of the line of best fit, which represents the
relationship between the variables using a straight line Y= B0 + B1 X.
The slope represents the change in the dependent variable for each unit change in the independent
variable, while the intercept represents the predicted value of the dependent variable when the
independent variable is zero.
The goal of the linear regression algorithm is to get the best values for B0 and B1 to find the best
fit line. The best fit line is a line that has the least error which means the error between predicted
values and actual values should be minimum.
Methodology: #Python-Jupyter program for linear regression
import pandas as p
from sklearn.linear_model import LinearRegression
import numpy as n
import matplotlib.pyplot as mtp
data = p.read_csv("salary_Data.csv")
data.head()
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
X = data.iloc[:,0]
X
0 1.1
1 1.3
2 1.5
3 2.0
4 2.2
5 2.9
6 3.0
7 3.2
8 3.2
9 3.7
10 3.9
11 4.0
12 4.0
13 4.1
14 4.5
15 4.9
16 5.1
17 5.3
18 5.9
19 6.0
20 6.8
21 7.1
22 7.9
23 8.2
24 8.7
25 9.0
26 9.5
27 9.6
28 10.3
29 10.5
Name: YearsExperience, dtype: float64
y= data.iloc[:,1]
y
0 39343.0
1 46205.0
2 37731.0
3 43525.0
4 39891.0
5 56642.0
6 60150.0
7 54445.0
8 64445.0
9 57189.0
10 63218.0
11 55794.0
12 56957.0
13 57081.0
14 61111.0
15 67938.0
16 66029.0
17 83088.0
18 81363.0
19 93940.0
20 91738.0
21 98273.0
22 101302.0
23 113812.0
24 109431.0
25 105582.0
26 116969.0
27 112635.0
28 122391.0
29 121872.0
Name: Salary, dtype: float64
xy = X*y
x_squared = X**2
n = len(data)
30
sum_y = sum(y)
sum_y
2280090.0
sum_x = sum(X)
sum_x
159.4
sum_x_squared = sum(x_squared)
sum_xy = sum(xy)
a = ((sum_y*sum_x_squared)-(sum_x*sum_xy))/((n*sum_x_squared)-sum_x**2)
a
25792.20019866868
b = ((n*sum_xy)-(sum_x*sum_y))/((n*sum_x_squared)-sum_x**2)
b
9449.962321455077
y = a+b*1.7
41857.136145142314
x_test = [1.7,2.5,6.5,1,2.2]
y_pred = []
for i in range(len(x_test)):
y_pred.append(a + b* x_test[i])
y_pred
[41857.136145142314,
49417.10600230638,
87216.95528812669,
35242.16252012376,
46582.11730586985]
l = LinearRegression()
x_reshaped = n.reshape(X,(-1,1))
print(l.fit(x_reshaped,y))
LinearRegression()
prediction = l.predict(x_reshaped)
prediction
mtp.plot(x_reshaped,prediction)
mtp.scatter(x_test,y_pred,color="red")
mtp.scatter(X,y,color="orange")
<matplotlib.collections.PathCollection at 0x1fdb7275340>
Conclusion: Study of linear regression in ML has been done successfully .
R1 R2 R3 R4 R5 Signature
(2 Marks) (3 Marks) (4 Marks) (4 Marks) (2 Marks)