
Ex no. :
Date :

Problem Statement:
Predicting the cost of homes in rural areas has become a significant difficulty for construction
companies. In this exercise, the least squares method is used to predict the price of dwellings in
Coimbatore for a given area in square feet.
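For reference, the least squares fit of a line y = mx + b chooses m and b to minimise the sum of squared residuals. Setting the partial derivatives with respect to m and b to zero gives the closed-form estimates below, written with the same SSxx and SSxy quantities that the from-scratch listing computes:

```latex
\min_{m,b} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2

SS_{xx} = \sum_{i=1}^{n} (\bar{x} - x_i)^2, \qquad
SS_{xy} = \sum_{i=1}^{n} (\bar{x} - x_i)(\bar{y} - y_i)

m = \frac{SS_{xy}}{SS_{xx}}, \qquad b = \bar{y} - m\,\bar{x}
```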

Problem Analysis:
Here, we build and evaluate a model that is trained and tested on data collected from homes on the
outskirts of Coimbatore. Once the model fits well, it can be used to estimate the price of a house in
other parts of Coimbatore. A model like this would be useful for real estate brokers and construction
companies, who could apply it to data they collect regularly. The dataset has twenty rows and two
columns (Area, Price); it is small, which is adequate for simple linear regression with a single
predictor. We apply the least squares method to this dataset, which is divided into training and test sets.
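The listings in (a) and (b) fit the line on every row. One way to carry out the training/test split described above is sketched here with NumPy. The 80/20 ratio, the fixed seed, the helper name `train_test_split_xy` and the toy data are all assumptions for illustration, not values from the lab:

```python
import numpy as np

def train_test_split_xy(x, y, test_ratio=0.2, seed=0):
    """Hypothetical helper: randomly split paired arrays into train/test parts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)          # fixed seed for a repeatable split
    idx = rng.permutation(len(x))              # shuffled row indices
    n_test = int(round(len(x) * test_ratio))   # number of rows held out
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Made-up data with 20 rows, mirroring only the size of the lab's dataset
area = np.arange(1, 21)
price = 3 * area + 5
x_train, x_test, y_train, y_test = train_test_split_xy(area, price)
print(len(x_train), len(x_test))  # 16 4
```

The line would then be fitted on `x_train`, `y_train` only and evaluated on `x_test`, `y_test`.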

Dataset:

Code:
(a) Using Simple Linear Regression From Scratch:

#SIMPLE LINEAR REGRESSION WITHOUT SKLEARN


#y=m*x+b
import pandas as pd
df=pd.read_csv("C:\\Users\\SHANMUGAPRIYAA\\OneDrive\\Documents\\DATA SCIENCE\\SEMESTER-4\\machine learning lab\\Simple Linear Regression.csv")
x=df.area
y=df.price
#calculating the slope m of the regression line
#calculating SSxx=sum((xmean-x)^2)
xmean=x.mean()
df['diffx']=xmean-x
df['diffxsquared']=df.diffx**2
SSxx=df.diffxsquared.sum()
#calculating SSxy=sum((xmean-x)*(ymean-y))
ymean=y.mean()
df['diffy']=ymean-y
SSxy=(df.diffx*df.diffy).sum()
#calculating slope m=SSxy/SSxx
m=SSxy/SSxx
#calculating the intercept b=ymean-m*xmean
b=ymean-m*xmean
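def predict(value):
    #predicted price for a given area using the fitted line
    return m*value+b
print('Predicted price for area 53 :',predict(53))
print('Slope :',m)
print('Intercept :',b)
print('The equation is : y =',m,'x +',b)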
import matplotlib.pyplot as plt
plt.scatter(x,y)
plt.plot(x,m*x+b,'r')
plt.xlabel('area')
plt.ylabel('price')
plt.show()
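The same SSxx/SSxy formulas can be written without adding helper columns to the DataFrame. A minimal NumPy sketch; the function name `fit_line` and the toy points are illustrative assumptions, not part of the lab:

```python
import numpy as np

def fit_line(x, y):
    """Least squares slope m and intercept b via m = SSxy/SSxx, b = ymean - m*xmean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x.mean() - x                      # xmean - x, as in the listing
    dy = y.mean() - y                      # ymean - y
    m = (dx * dy).sum() / (dx ** 2).sum()  # SSxy / SSxx
    b = y.mean() - m * x.mean()
    return m, b

# Toy points lying exactly on y = 2x + 1
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0
```

On the lab's data this would be called as `fit_line(df.area, df.price)`.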

Output:

Slope : 85.70540588654167
Intercept : 3505.4143425112743
The equation is : y = 85.70540588654167 x + 3505.4143425112743
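As a worked example of using the fitted equation (the same computation as the listing's `predict(53)` call), the coefficients reported above give a predicted price of about 8047.80 for an area of 53. The function name `predict_price` is illustrative:

```python
# Coefficients as reported in the output above
slope = 85.70540588654167
intercept = 3505.4143425112743

def predict_price(area):
    """Evaluate the fitted line y = slope * x + intercept."""
    return slope * area + intercept

print(round(predict_price(53), 2))  # 8047.8
```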

Inference:
The fitted equation is y = 85.70540588654167 x + 3505.4143425112743. The plot shows the data
points lying close to the regression line, so the deviations from the fitted values appear small. The
model could be improved further by training it on a larger dataset.
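Rather than judging the deviation only from the plot, it can be quantified with the coefficient of determination R^2 = 1 - SS_res/SS_tot, which equals 1 for a perfect fit. A minimal sketch; the function name `r_squared` and the toy data are assumptions:

```python
import numpy as np

def r_squared(x, y, m, b):
    """Coefficient of determination for the line y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (m * x + b)             # vertical deviations from the line
    ss_res = (residuals ** 2).sum()         # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy data: points exactly on y = 2x + 1, then one point moved off the line
x = [1, 2, 3, 4]
print(r_squared(x, [3, 5, 7, 9], 2, 1))      # 1.0
print(r_squared(x, [3, 6, 7, 9], 2, 1) < 1)  # True
```

On the lab's data it would be called as `r_squared(x, y, m, b)` with the `x`, `y`, `m` and `b` from part (a), ideally on held-out test rows.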

(b) Using Simple Linear Regression with built-in functions:

#SIMPLE LINEAR REGRESSION WITH SKLEARN


from sklearn.linear_model import LinearRegression
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("C:\\Users\\SHANMUGAPRIYAA\\OneDrive\\Documents\\DATA SCIENCE\\SEMESTER-4\\machine learning lab\\Simple Linear Regression.csv")
x=df['area'].values.reshape(-1, 1)
y=df['price']
lr_model=LinearRegression()
lr_model.fit(x,y)
print('lr_model.coef_ :',repr(lr_model.coef_))
print('lr_model.intercept_ :',lr_model.intercept_)
y_pred=lr_model.predict(x)
plt.scatter(x,y)
plt.xlabel('area')
plt.ylabel('price')
plt.plot(x,y_pred,'r')
plt.legend(["Datapoints", "Regression Line"], loc=0)
plt.show()
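Another built-in least squares routine is NumPy's `np.polyfit`; with degree 1 it returns the coefficients highest power first, i.e. slope then intercept, so it offers a quick cross-check of both listings without scikit-learn. A sketch on toy data, not the lab's dataset:

```python
import numpy as np

# Toy points lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Degree-1 polynomial fit: coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```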

Output:
lr_model.coef_ : array([85.70540589])
lr_model.intercept_ : 3505.414342511275
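scikit-learn estimators expect a 2-D feature matrix of shape (n_samples, n_features). That is why the listing calls `.values.reshape(-1, 1)` on the area column, and why `coef_` comes back as an array with one entry per feature. A small NumPy illustration of the reshape, using toy values:

```python
import numpy as np

area = np.array([10.0, 20.0, 30.0])  # a 1-D column of 3 values
X = area.reshape(-1, 1)              # -1 lets NumPy infer the number of rows
print(area.shape, X.shape)           # (3,) (3, 1)
```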

Inference:
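The fitted equation is y = 85.70540588654167 x + 3505.4143425112743; the slope and intercept agree,
to the displayed precision, with those obtained from scratch in part (a). The plot shows the data points
lying close to the regression line, so the deviations from the fitted values appear small. The model
could be improved further by training it on a larger dataset.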
