
Sir Syed University of Engineering & Technology, Karachi
Department of Computer Science, Batch 2017

Implementation of Linear Regression

Week 3, Session 2


Libraries Required

Step 1: Import the following libraries using {import libraryName as alias}

• NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
• pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
• matplotlib.pyplot is a Python module for data visualization.
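Following the {import libraryName as alias} convention from Step 1, the three libraries are commonly imported with these standard aliases (used throughout the remaining steps):

```python
# Standard aliases for the three libraries described above
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```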
Gradient Descent Algorithm

Gradient descent is simply an algorithm that takes small steps along a function to find a local minimum. We can look at a simple quadratic equation such as this one:

import matplotlib.pyplot as plt

x_quad = [n/10 for n in range(0, 100)]
y_quad = [(n-4)**2+5 for n in x_quad]

plt.figure(figsize=(10,7))
plt.plot(x_quad, y_quad, 'k--')
plt.axis([0,10,0,30])
plt.plot([1, 2, 3], [14, 9, 6], 'ro')
plt.plot([5, 7, 8], [6, 14, 21], 'bo')
plt.plot(4, 5, 'ko')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Quadratic Equation')
plt.show()
We’re trying to find the local minimum of this function. If we start at the red dot at x = 2, we find the gradient and move against it. In this case the gradient is the slope, and since the slope is negative, our next step is further to the right, bringing us closer to the minimum.

If we start at the right-most blue dot at x = 8, our gradient or slope is positive, so we move away from it by multiplying it by -1. The updates become smaller and smaller (as the gradient approaches 0 near the minimum) until the parameter converges at the minimum we’re looking for.
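This behaviour can be sketched numerically on the same quadratic f(x) = (x - 4)**2 + 5, whose minimum lies at x = 4. The `descend` helper and the step size 0.1 below are illustrative choices, not part of the lab code:

```python
# Gradient descent on f(x) = (x - 4)**2 + 5, whose derivative is f'(x) = 2*(x - 4)
def descend(x, alpha=0.1, steps=100):
    for _ in range(steps):
        x = x - alpha * 2 * (x - 4)   # move against the gradient
    return x

print(descend(2.0))   # starting left of the minimum (negative slope, step right)
print(descend(8.0))   # starting right of the minimum (positive slope, step left)
# both starting points approach x = 4
```

Whichever side we start on, the sign of the slope pushes the update toward x = 4, and the step size shrinks automatically as the slope flattens out.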
Step 2: Search for and download the data file named ‘ex1data1.txt’ from the internet.

Step 3: Load the data file using pandas.read_csv( ) function as

data = pd.read_csv('FilePath/ex1data1.txt', names=['population', 'profit'])

Step 4: Split population and profit into X and y


X_df = pd.DataFrame(data.population)
y_df = pd.DataFrame(data.profit)
m = len(y_df)

Step 5: Plot the data using the matplotlib.pyplot.plot( ) function as,


plt.figure(figsize=(10,8))
plt.plot(X_df, y_df, 'kx')
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
Step 6: Initialize variables alpha and iter for gradient descent
iter = 1000
alpha = 0.01

Step 7: Add a column of 1s to X as the intercept term

X_df['intercept'] = 1

Step 8: Transform the dataframes into NumPy arrays for easier matrix operations and initialize theta to 0

X = np.array(X_df)
y = np.array(y_df).flatten()
theta = np.array([0, 0])
Step 9: Define the cost function for linear regression

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

def cost_function(X, y, theta):
    m = len(y)
    # Calculate the cost with the given parameters
    J = np.sum((X.dot(theta) - y)**2) / (2*m)
    return J

Step 10: Call the cost function as

cost_function(X, y, theta)
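Before running on the real data, the cost function can be sanity-checked on a tiny synthetic set where the answer is known. The points below (my own illustrative data, not ex1data1) lie exactly on y = 2x + 1; with the column order [x, intercept] used in Step 7, theta = [2, 1] should therefore give zero cost:

```python
import numpy as np

# Same definition as Step 9
def cost_function(X, y, theta):
    m = len(y)
    J = np.sum((X.dot(theta) - y)**2) / (2*m)
    return J

# Synthetic points lying exactly on y = 2x + 1; columns are [x, intercept]
X_check = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y_check = np.array([3.0, 5.0, 7.0])

print(cost_function(X_check, y_check, np.array([2.0, 1.0])))  # 0.0 for a perfect fit
print(cost_function(X_check, y_check, np.array([0.0, 0.0])))  # positive for a bad fit
```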
Step 11: Define the gradient descent function as

def gradient_descent(X, y, theta, alpha, iterations):
    m = len(y)
    cost_history = [0] * iterations
    for iteration in range(iterations):
        hypothesis = X.dot(theta)
        loss = hypothesis - y
        gradient = X.T.dot(loss) / m      # partial derivatives of the cost
        theta = theta - alpha * gradient  # update both parameters simultaneously
        cost = cost_function(X, y, theta)
        cost_history[iteration] = cost
    return theta, cost_history

Step 12: Call the gradient descent function as

(t, c) = gradient_descent(X, y, theta, alpha, iter)  # t: optimized theta, c: cost history
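As a quick check that the routine converges, it can be run on synthetic data where the answer is known (the data below is my own illustration, not ex1data1): points on y = 2x + 1 should yield a slope near 2 and an intercept near 1.

```python
import numpy as np

# Same definitions as Steps 9 and 11
def cost_function(X, y, theta):
    m = len(y)
    return np.sum((X.dot(theta) - y)**2) / (2*m)

def gradient_descent(X, y, theta, alpha, iterations):
    m = len(y)
    cost_history = [0] * iterations
    for iteration in range(iterations):
        loss = X.dot(theta) - y
        gradient = X.T.dot(loss) / m
        theta = theta - alpha * gradient
        cost_history[iteration] = cost_function(X, y, theta)
    return theta, cost_history

# Synthetic data on y = 2x + 1, columns [x, intercept] as in Step 7
X_syn = np.column_stack([np.arange(1.0, 6.0), np.ones(5)])
y_syn = 2 * X_syn[:, 0] + 1

t, c = gradient_descent(X_syn, y_syn, np.array([0.0, 0.0]), 0.01, 5000)
print(t)      # close to [2.0, 1.0]
print(c[-1])  # cost close to zero, and lower than earlier entries
```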
Step 13: Plot the best fit line as

best_fit_x = np.linspace(0, 25, 20)


best_fit_y = [t[1] + t[0]*xx for xx in best_fit_x]

plt.figure(figsize=(10,6))
plt.plot(X_df.population, y_df, '.')
plt.plot(best_fit_x, best_fit_y, '-')
plt.axis([0,25,-5,25])
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.title('Profit vs. Population with Linear Regression Line')
plt.show()
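With the optimized parameters in hand, the hypothesis can be evaluated for new population values, matching the [x, intercept] column order used above. The theta values below are placeholders for illustration only; substitute the values actually returned by gradient_descent in Step 12:

```python
def predict(x, theta):
    # hypothesis: slope * x + intercept, matching the [x, intercept] column order
    return theta[0] * x + theta[1]

# Placeholder parameters, NOT the real result from ex1data1.txt
theta_example = [1.2, -3.6]
print(predict(3.7, theta_example))  # about 0.84, i.e. 1.2*3.7 - 3.6
print(predict(7.4, theta_example))  # about 5.28, i.e. 1.2*7.4 - 3.6
```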
Tasks to perform:

• Derive the equations for gradient descent using the linear regression cost function.
• Follow the above steps and run the code to obtain the optimized parameters for linear regression using gradient descent. Print the optimized parameters and visualizations and attach them in your file.
• Using the parameters obtained in the previous question, manually solve the linear regression hypothesis equation for x = 3.7 and x = 7.4, showing all necessary steps.
• Search for a dataset suitable for linear regression and apply the same algorithm to it. Print the optimized parameters and visualizations and attach them in your file. Also attach the code for this part in your file.
