
19CSE305 – MACHINE LEARNING

CHEDURI SURYA UMA SHANKAR


CH.EN.U4CSE19101

TITLE: Python lab exercise to implement linear regression

AIM: To find the best-fit line for the given data using Linear Regression.

DATASET:

Auto Insurance in Sweden

In the following data:

X = number of claims
Y = total payment for all the claims, in thousands of Swedish Kronor, for geographical zones in Sweden

X Y
108 392
19 46
13 1
124 422
40 114
57 170
23 5
14 77
45 214
10 65
5 20
48 248
11 23
23 39
7 48
2 6
24 134
6 50
3 4
23 113
6 14
9 48
9 52
3 13
29 103
7 77
4 11
20 98
7 27
4 38
0 0
25 69
6 14
5 40
22 161
11 57
61 217
12 58
4 12
16 59
13 89
60 202
41 181
37 152
55 162
41 73
11 21
27 92
8 76
3 39
17 142
13 93
13 31
15 32
8 55
29 133
30 194
24 137
9 87
31 20
14 95
53 244
26 187
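
The code in the CODE section below reads this data from a CSV file named ml_lab_1.csv. As a minimal sketch (not part of the original lab), the file can be created from the table above; the header names X and Y are assumptions, and only the first few pairs are shown, so the full table must be filled in:

# Sketch: save the dataset above as ml_lab_1.csv
import pandas as pd

X_vals = [108, 19, 13, 124, 40]   # number of claims (first rows of the table; add the rest)
Y_vals = [392, 46, 1, 422, 114]   # total payment in thousands of Kronor
pd.DataFrame({'X': X_vals, 'Y': Y_vals}).to_csv('ml_lab_1.csv', index=False)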

IMPLEMENTATION DETAILS:
In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. Let X be the independent variable and Y be the dependent variable. We define a linear relationship between these two variables as follows:

Y = mX + c

Our objective is to determine the values of m and c such that the line corresponding to those values is the best-fitting line, i.e. the line that gives the minimum error. For example, with m = 3 and c = 5, an input X = 10 is predicted as Y = 3(10) + 5 = 35.

Loss Function:
The loss is the error in our prediction for given values of m and c. We will use the Mean Squared Error function to calculate the loss. There are three steps in this function:
• Find the difference between the actual y and the predicted y value (ȳ = mx + c) for a given x.
• Square this difference.
• Find the mean of the squares over every value in X.

E = (1/n) Σᵢ (yᵢ − ȳᵢ)²

Here yᵢ is the actual value and ȳᵢ is the predicted value. Substituting ȳᵢ = mxᵢ + c:

E = (1/n) Σᵢ (yᵢ − (mxᵢ + c))²

So we square the error and find the mean, hence the name Mean Squared Error.
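
As a minimal sketch (not part of the original lab code), this loss can be computed in one NumPy expression; the names mse, y and y_pred are illustrative:

# Sketch: Mean Squared Error as a function
import numpy as np

def mse(y, y_pred):
    # mean of the squared differences between actual and predicted values
    return np.mean((y - y_pred) ** 2)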

The Gradient Descent Algorithm:

Gradient descent is an iterative optimization algorithm for finding the minimum of a function. Here, that function is our loss function.

Now let us apply gradient descent to m and c and update them step by step:

• Initially, let m = 0 and c = 0. Let L be our learning rate, which controls how much the values of m and c change with each step. L could be a small value like 0.0001 for good accuracy.
• Calculate the partial derivative of the loss function with respect to m, and plug the current values of x, y, m and c into it to obtain the derivative value Dₘ:

  Dₘ = (−2/n) Σᵢ xᵢ (yᵢ − ȳᵢ)

  Similarly, the partial derivative of the loss with respect to c is:

  D_c = (−2/n) Σᵢ (yᵢ − ȳᵢ)

• Now we update the current values of m and c using the following equations:

  m = m − L × Dₘ
  c = c − L × D_c

• We repeat this process until our loss function is a very small value or ideally 0 (which means 0 error or 100% accuracy). The values of m and c that we are left with are the optimum values. (A sketch of this loss-based stopping rule follows below.)
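
The lab code below runs a fixed number of iterations instead of checking the loss. A hedged sketch of the loss-based stopping rule described above (the threshold tol, the cap max_iters and all names are illustrative assumptions):

# Sketch: gradient descent that stops once the MSE loss is very small
import numpy as np

def gradient_descent(x, y, lr=0.0001, tol=1e-6, max_iters=100000):
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(max_iters):                 # cap the loop; real data rarely reaches 0 loss
        y_pred = m * x + c
        if np.mean((y - y_pred) ** 2) < tol:   # stop once the loss is very small
            break
        D_m = (-2 / n) * np.sum(x * (y - y_pred))  # derivative wrt m
        D_c = (-2 / n) * np.sum(y - y_pred)        # derivative wrt c
        m -= lr * D_m                              # update m
        c -= lr * D_c                              # update c
    return m, c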

CODE:
# Making the imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (12.0, 9.0) # plot size

# Preprocessing Input data
data = pd.read_csv('ml_lab_1.csv')
X = data.iloc[:, 0] # get all rows from column 0
Y = data.iloc[:, 1] # get all rows from column 1
plt.scatter(X, Y) # draw a scatter plot
plt.show() # display all figures

# Building the model
m = 0 # m is the slope of the line
c = 0 # c is the y intercept
L = 0.0001 # The learning Rate
iters = 1000 # The number of iterations to perform gradient descent

n = float(len(X)) # Number of elements in X

# Performing Gradient Descent

for i in range(iters):
    Y_pred = m*X + c                      # The current predicted value of Y
    D_m = (-2/n) * sum(X * (Y - Y_pred))  # Derivative wrt m
    D_c = (-2/n) * sum(Y - Y_pred)        # Derivative wrt c
    m = m - L * D_m                       # Update m
    c = c - L * D_c                       # Update c

print(m, c)

# Making predictions
Y_pred = m*X + c

plt.scatter(X, Y)
plt.plot([min(X), max(X)], [min(Y_pred), max(Y_pred)],
color='green') # predicted
plt.show()
RESULT:
m = 3.70478986556524, c = 1.6365274116383624
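
As a sanity check (an addition, not part of the original lab), the closed-form least-squares fit can be computed with NumPy's polyfit and compared with the values above; the slope should come out similar, while the intercept may differ because c converges slowly at this learning rate:

# Sketch: closed-form least-squares fit for comparison
import numpy as np

m_ls, c_ls = np.polyfit(X, Y, deg=1)  # degree-1 fit returns (slope, intercept)
print(m_ls, c_ls)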

OBSERVATIONS:
• When the learning rate is set to 0.0001, we get a best-fit line with minimal error, and about 13 points lie very close to the line.

• When the learning rate is set to 0.001, the points lie a little further from the fitted line.

• When the learning rate is set to 0.00001, we again get a line with minimal error and about 13 points very close to it, almost identical to the result obtained with L = 0.0001.

• When the learning rate is 0.1, gradient descent does not converge, so no fitted line is obtained.

Hence, we can say 0.0001 is a suitable learning rate for fitting this data (a sketch for reproducing this comparison follows below).
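
A minimal sketch (an addition, not part of the original lab) for reproducing this comparison: re-run the training loop for each learning rate, reusing the X and Y loaded above, and print the final loss:

# Sketch: final MSE after 1000 iterations for different learning rates
import numpy as np

for lr in (0.1, 0.001, 0.0001, 0.00001):
    m, c = 0.0, 0.0
    n = len(X)
    for _ in range(1000):
        Y_pred = m * X + c
        m -= lr * (-2 / n) * np.sum(X * (Y - Y_pred))  # update m
        c -= lr * (-2 / n) * np.sum(Y - Y_pred)        # update c
    print(lr, np.mean((Y - (m * X + c)) ** 2))  # expected to blow up for lr = 0.1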
