
HOUSE PRICES PREDICTION SYSTEM

An Interim Report submitted in partial fulfillment of the requirements for the
award of the degree of

BACHELOR OF TECHNOLOGY

in

ELECTRONICS AND COMMUNICATION ENGINEERING

by

Name of the student


180040161 - MADDASANI VENKATA RAMI REDDY

Under the esteemed guidance of

Mrs. Kasthuri
Developer
Techciti Software Consulting Private
Limited

Company Logo

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING


K L UNIVERSITY
Green Fields, Vaddeswaram, Tadepalli,
Guntur - 522 502, Andhra Pradesh.

2021-22
TECHCITI SOFTWARE CONSULTING PVT LTD.
3rd Floor, BNR Complex, Sri Rama Layout, J.P. Nagar 7th Phase, Bangalore - 560078
(Above State Bank of India). Bus stop: Brigade Millennium.

Bonafide Certificate
Company Logo

This is to certify that this Interim report entitled “HOUSE PRICES PREDICTION
SYSTEM” submitted to the Department of Electronics & Communication Engineering, KL
Deemed to be University, Guntur, in connection with the University internship program is a
bonafide record of work done by “MADDASANI VENKATA RAMI REDDY” under my
supervision at “TECHCITI SOFTWARE CONSULTING PVT LTD” from “6TH DEC 2021”
to “19TH MAR 2022”.

Signature of the Company Guide

Ms.Kasthuri K
Software Developer
K L Deemed to be UNIVERSITY

DEPARTMENT OF

ELECTRONICS AND COMMUNICATION ENGINEERING

DECLARATION

The Project Report entitled “HOUSE PRICES PREDICTION SYSTEM” is a record of
bonafide work of “Maddasani Venkata Rami Reddy (Id No: 180040161)” submitted in partial
fulfilment for the award of B. Tech in Electronics and Communication Engineering to the K L
University. The results embodied in this report have not been copied from any other
departments/university/institute.

By

Maddasani Venkata Rami reddy


(Id No: 180040161)
K L Deemed to be UNIVERSITY

DEPARTMENT OF

ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE

This is to certify that the Project Report entitled “House Prices Prediction System” is being
submitted by “180040161 - MADDASANI VENKATA RAMI REDDY” in partial fulfillment
for the award of B. Tech in Electronics and Communication Engineering to KL University, and is a record of
bonafide work carried out under guidance and supervision. The results in this report have not been copied
from any other departments/University/Institute.

Signature of the Department coordinator Signature of the HOD

Signature of PS University Guide Signature of the External Examiner

Signature of Company Guide


ACKNOWLEDGEMENT

I have taken efforts in this project. However, it would not have been possible without the
kind support and help of Ms. Kasthuri K, Company Guide.

I am highly indebted to Ms. Kasthuri K, Project Coordinator, for her guidance and constant
supervision, as well as for providing necessary information regarding the project and for
her support in completing the project.

I wish to express my sincere thanks to Dr. M Suman, Head of the Department of ECE,
for providing an opportunity to undertake this project.

I would also like to express my sincere thanks to our University Guide, Mr. Vinay Atgur,
for his valuable guidance and suggestions in completing the project successfully.

It would be unfair if I did not mention the invaluable contribution and timely cooperation
extended by the staff members of our department. I would like to thank our Institution,
without which this project would have been a distant reality.

I would like to express my gratitude towards my parents for their kind cooperation and
encouragement, which helped me in the completion of this project.

Maddasani Venkata rami reddy


Id No:180040161

ABSTRACT

House price forecasting is an important topic in real estate. The literature attempts to
derive useful knowledge from historical data of property markets. Machine learning
techniques are applied to analyze historical property transactions in the USA to discover
useful models for house buyers and sellers. The analysis reveals a high discrepancy between
house prices in the most expensive and most affordable suburbs. Moreover, experiments
demonstrate that Multiple Linear Regression, evaluated with a mean squared error
measure, is a competitive approach. People are careful when buying a new house, weighing
their budgets against market conditions. The objective of this work is to forecast realistic
house prices for prospective buyers based on their financial provisions and their aspirations.
By analyzing previous sales, price ranges, and market trends, expected prices are estimated.
The work involves predictions using different regression techniques: Multiple Linear, Ridge,
LASSO, Elastic Net, Gradient Boosting, and AdaBoost Regression. House price prediction on a
data set has been done using all of the above-mentioned techniques to find the best
among them. The motive of this work is to help a seller estimate the selling price of a
house accurately and to help buyers judge the right time to purchase a house.
Some of the related factors that impact the price were also taken into consideration, such as
physical condition, concept, and location.

INDEX
CONTENTS PG NO

CHAPTER-I
1.1 INTRODUCTION 9
1.2 SOFTWARE DESCRIPTION 9
CHAPTER-II
2.1 DATA SET 10
2.2 DATA EXPLORATION 11
2.3 DATA VISUALIZATION 11
2.4 DATA SELECTION 12
2.5 DATA TRANSFORMATION 13
CHAPTER-III
3.1 PYTHON 14
3.2 NUMPY 14
3.3 PANDAS 17
3.4 MATPLOTLIB 18
3.5 SCIKIT-LEARN 20
3.6 SEABORN 21
CHAPTER-IV
4.1 MACHINE LEARNING 24
4.2 MODELS USED 26
4.3 REGRESSION ANALYSIS 28
CHAPTER-V
5.1 HOUSE SALES PRICES USING LINEAR REGRESSION 32
5.2 DJANGO WEB FRAMEWORK 35
CHAPTER-VI
6.1 CODE 46
6.2 RESULTS AND DISCUSSION 49
6.3 BEST SUITED MODEL 51
CONCLUSION 51
REFERENCES 51

List of Figures

S.No  Figure Name

1  Visualization graphs

2  Correlation heat map

3  Linear regression scatter plot

4  Random forest regression scatter plot

5  Different simple linear regression models

6  Django working block diagram

7  Login page of house prices prediction

8  Signup page of house prices prediction

9  Home page of house prices prediction

10  Prediction page of house prices prediction

CHAPTER-I

1.1 INTRODUCTION

There are a few parameters on which we evaluate ourselves: create an effective
price prediction model, validate the model's prediction accuracy, and identify the important
home price attributes that feed the model's predictive power. In supervised learning, a dataset
is present with inputs and known outputs. In unsupervised learning, the machine learns from a
dataset that comes with input variables only. In a reinforcement learning model, algorithms
are used to select an action. This project is implemented using supervised machine learning
algorithms. The outcome of our project is to make predictions on the sales prices of
houses in California State with the dataset provided. It is hoped this study will inform better
analysis of gathered (unanalyzed) data and other machine learning techniques.
Linear Regression is a supervised machine learning model for finding the relationship
between independent variables and dependent variables. Linear regression performs the
task of predicting the response (dependent) variable value (y) based on a given (independent)
explanatory variable (x). So, this regression technique finds a linear relationship
between x (input) and y (output).

1.2 SOFTWARE DESCRIPTION:


Anaconda is a free and open-source distribution of the Python and R programming languages
for scientific computing (data science, machine learning, applications, large-scale data
processing, predictive analytics, etc.) that aims to simplify package management and
deployment.

Jupyter Notebook is a web-based interactive computational environment for creating
Jupyter notebook documents. The term 'notebook' colloquially refers to several
different entities, mainly the Jupyter web application, the Jupyter Python web server, or the
Jupyter document format, depending on context. A Jupyter Notebook document is a JSON
document, following a versioned schema, containing an ordered list of input/output
cells which can contain code, text (using Markdown), mathematics, plots and rich media,
usually ending with the “.ipynb” extension.

CHAPTER- II

2.1 DATASET

Problem Statement – A real estate agent wants help predicting house prices for regions in
the area. We were given a dataset to work with and decided to use the Linear Regression model.
The goal is to create a model that will help the agent estimate what a house would sell for.

The dataset contains 7 columns and 5000 rows with CSV extension. The data contains the
following columns:

• ‘Avg. Area Income’ – Average income of residents of the city the house is located in.

• ‘Avg. Area House Age’ – Average age of houses in the same city.

• ‘Avg. Area Number of Rooms’ – Average number of rooms for houses in the same city.

• ‘Avg. Area Number of Bedrooms’ – Average number of bedrooms for houses in the same city.

• ‘Area Population’ – Population of the city.

• ‘Price’ – Price that the house sold at.

• ‘Address’ – Address of the house.
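As a quick illustration, the dataset can be loaded and inspected with pandas. This is a minimal
sketch; the file name USA_Housing.csv is an assumption, so substitute the actual CSV used for
the project.

import pandas as pd

# Hypothetical file name for the 5000-row, 7-column dataset described above
df = pd.read_csv('USA_Housing.csv')
print(df.shape)              # (5000, 7)
print(df.columns.tolist())   # the seven columns listed above
print(df.describe())         # summary statistics of the numeric columns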


2.2 DATA EXPLORATION

Data exploration is the first step in data analysis and typically involves summarizing the
main characteristics of a data set, including its size, accuracy, initial patterns in the data
and other attributes. It is commonly conducted by data analysts using visual analytics tools,
but it can also be done in more advanced statistical software such as Python. Before it can conduct
analysis on data collected from multiple data sources and stored in data warehouses, an
organization must know how many cases are in a data set, what variables are included,
how many missing values there are, and what general hypotheses the data is likely to
support. An initial exploration of the data set can help answer these questions by
familiarizing analysts with the data they are working with. This area of data
exploration has become an area of interest in the field of machine learning. This is a
relatively new field and is still evolving. At its most basic level, a machine-learning
algorithm can be fed a data set and used to identify whether a hypothesis is true
based on that dataset. Common machine learning algorithms focus on identifying
specific patterns: regression, classification, and clustering are typical, but there are many
other patterns and algorithms that can be applied to data via machine learning.

2.3 DATA VISUALIZATION

Data visualization is the graphical representation of information and data. By using


visual elements like charts, graphs, and maps, data visualization tools provide an accessible
way to see and understand trends, outliers, and patterns in data. In the world of Big Data,
data visualization tools and technologies are essential to analyze massive amounts of
information and make data-driven decisions.

Fig 1: visualization Graphs

2.4 DATA SELECTION

Data selection is defined as the process of determining the appropriate data type and
source, as well as suitable instruments to collect data. Data selection precedes the actual
practice of data collection. This definition distinguishes data selection from selective data
reporting (selectively excluding data that is not supportive of a research hypothesis) and
interactive/active data selection (using collected data for monitoring activities/events or
conducting secondary data analyses). The process of selecting suitable data for a research
project can impact data integrity.
The primary objective of data selection is the determination of appropriate data type,
source, and instrument(s) that allow investigators to adequately answer research questions.
This determination is often discipline-specific and is primarily driven by the nature of the
investigation, existing literature, and accessibility to necessary data sources.

Fig 2: Correlation Heatmap

2.5 DATA TRANSFORMATION

Data transformation is the process of changing the format, structure, or values of data. For
data analytics projects, data may be transformed at two stages of the data pipeline.
Organizations that use on-premises data warehouses generally use an ETL process, in
which data transformation is the middle step. Today, most organizations use cloud-based data
warehouses, which can scale compute and storage resources with latency measured in
seconds or minutes. The scalability of the cloud platform lets organizations skip preload
transformations and load raw data into the data warehouse, then transform it at query time
— a model called ELT (extract, load, transform).
In computing, data transformation is the process of converting data from one format or
structure into another format or structure. It is a fundamental aspect of most data
integration and data management tasks such as data wrangling, data warehousing, data
integration and application integration.

CHAPTER-III
3.1 PYTHON

Python is a popular programming language. It was created by Guido van Rossum and
released in 1991. It is used for:

1) Web development (server-side),

2) Software development,

3) Mathematics,

4) System scripting…

• Python can be used on a server to create web applications.

• Python can be used alongside software to create workflows.

• Python can connect to database systems. It can also read and modify files.

• Python can be used to handle big data and perform complex mathematics.

• Python can be used for rapid prototyping, or for production-ready software development.

Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).

• Python has a simple syntax similar to the English language.

• Python has a syntax that allows developers to write programs with fewer lines than some
other programming languages.
• Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.

3.2 NUMPY

The topic is very broad: datasets can come from a wide range of sources and a wide range
of formats, including collections of documents, collections of images, collections of
sound clips, collections of numerical measurements, or nearly anything else. Despite this
apparent heterogeneity, it helps to think of all data fundamentally as arrays of numbers.

For example, images–particularly digital images–can be thought of as simply two-
dimensional arrays of numbers representing pixel brightness across the area. Sound clips
can be thought of as one-dimensional arrays of intensity versus time. Text can be converted
in various ways into numerical representations, perhaps binary digits representing the
frequency of certain words or pairs of words. No matter what the data are, the first step in
making them analyzable will be to transform them into arrays of numbers.

Data manipulation in Python is nearly synonymous with NumPy array manipulation:
even newer tools like Pandas are built around the NumPy array.

Categories of basic array manipulations here:

• Attributes of arrays: Determining the size, shape, memory consumption, and data types of
arrays
• Indexing of arrays: Getting and setting the value of individual array elements

• Slicing of arrays: Getting and setting smaller subarrays within a larger array

• Reshaping of arrays: Changing the shape of a given array

• Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array
into many

NumPy (short for Numerical Python) provides an efficient interface to store and
operate on dense data buffers. In some ways, NumPy arrays are like Python's built-in “list”
type, but NumPy arrays provide much more efficient storage and data operations as the
arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of
data science tools in Python, so time spent learning to use NumPy effectively will be
valuable no matter what aspect of data science interests you.

If you installed the Anaconda stack, you already have NumPy installed and ready to go.
If you're more the do-it-yourself type, you can go to http://www.numpy.org/ and follow
the installation instructions found there.

NumPy Array Attributes

First let’s discuss some useful array attributes. We’ll start by defining three random
arrays: a one-dimensional, two-dimensional, and three-dimensional array. We’ll use
NumPy's random number generator, which we will seed with a set value in order to ensure
that the same random arrays are generated each time this code is run:
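A minimal sketch of such seeded random arrays (the sizes are chosen purely for illustration):

import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)           # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))      # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))   # three-dimensional array

print("x3 ndim: ", x3.ndim)    # number of dimensions
print("x3 shape:", x3.shape)   # size of each dimension
print("x3 size: ", x3.size)    # total number of elements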
Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data
buffers. The built-in array module can be used to create dense arrays of a uniform type,
where 'i' is a type code indicating the contents are integers. NumPy itself is imported
under the standard alias np:
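A minimal sketch of both steps:

import array
L = list(range(10))
A = array.array('i', L)   # dense array of integers ('i' type code)

import numpy as np        # standard NumPy alias used throughout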

We can use np.array to create arrays from Python lists. Unlike Python lists, NumPy
is constrained to arrays that all contain the same type. If types do not match, NumPy will
upcast if possible (here, integers are upcast to floating point).

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional;
here's one way of initializing a multidimensional array using a list of lists:
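A short sketch of those three cases:

np.array([1, 4, 2, 5, 3])        # integer array built from a Python list
np.array([3.14, 4, 2, 3])        # mixed types: integers are upcast to float
np.array([range(i, i + 3) for i in [2, 4, 6]])   # nested sequences become a 2-D array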

3.3 PANDAS

Pandas is a newer package built on top of NumPy and provides an efficient


implementation of a DataFrame. DataFrames are essentially multidimensional arrays with
attached row and column labels, and often with heterogeneous types and/or missing data.
As well as offering a convenient storage interface for labeled data, Pandas implements
several powerful data operations familiar to users of both database frameworks and
spreadsheet programs.

As we saw, NumPy's ndarray data structure provides essential features for the type of
clean, well-organized data typically seen in numerical computing tasks. While it serves this
purpose very well, its limitations become clear when we need more flexibility (e.g.,
attaching labels to data, working with missing data, etc.) and when attempting operations
that do not map well to element-wise broadcasting (e.g., groupings, pivots, etc.), each of
which is an important piece of analyzing the less structured data available in many forms
in the world around us. Pandas, with its Series and DataFrame objects, builds on the NumPy
array structure and provides efficient access to these sorts of "data munging" tasks that
occupy much of a data scientist's time.

Installing and Using Pandas

Installation of Pandas on your system requires NumPy to be installed. Details on this
installation can be found in the Pandas documentation. If you used the Anaconda stack,
you already have Pandas installed.

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It can be created from a
list or array as follows:

In[1]: import pandas as pd
In[2]: data = pd.Series([0.25, 0.5, 0.75, 1.0])
       data
Out[2]: 0    0.25
        1    0.50
        2    0.75
        3    1.00
        dtype: float64

Series as generalized NumPy array

From what we've seen so far, it may look like the Series object is basically
interchangeable with a one-dimensional NumPy array. The essential difference is the
presence of the index: while the NumPy array has an implicitly defined integer index used
to access the values, the Pandas Series has an explicitly defined index associated with the
values. This explicit index definition gives the Series object additional capabilities. For
example, the index need not be an integer, but can consist of values of any desired type.
For example, if we wish, we can use strings as an index:

In[7]: data = pd.Series([0.25, 0.5, 0.75, 1.0],
                        index=['a', 'b', 'c', 'd'])
       data
Out[7]: a    0.25
        b    0.50
        c    0.75
        d    1.00
        dtype: float64

3.4 MATPLOTLIB

We'll now look at the Matplotlib tool for visualization in Python. Matplotlib is a
multiplatform data visualization library built on NumPy arrays and designed to work with
the broader SciPy stack. It was conceived by John Hunter in 2002, originally as a patch to
IPython for enabling interactive MATLAB-style plotting via gnuplot from the IPython
command line. IPython's creator, Fernando Perez, was at the time scrambling to finish his
PhD, and let John know he wouldn't have time to review the patch for several months. John
took this as a cue to set out on his own, and the Matplotlib package was born, with version
0.1 released in 2003.

One of Matplotlib's most important features is its ability to play well with many
operating systems and graphics backends. This cross-platform, everything-to-everyone
approach has been one of the great strengths of Matplotlib. For this reason, I believe that
Matplotlib itself will remain a vital piece of the data visualization stack, even if new tools
mean the community gradually moves away from using the Matplotlib API directly.


Importing matplotlib

Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we
will use some standard shorthands for Matplotlib imports:

In[1]: import matplotlib as mpl


import matplotlib.pyplot as plt

Basic Errorbars

A basic errorbar can be created with a single Matplotlib function call:

In[1]: %matplotlib inline
       import matplotlib.pyplot as plt
       plt.style.use('seaborn-whitegrid')
       import numpy as np
In[2]: x = np.linspace(0, 10, 50)
       dy = 0.8
       y = np.sin(x) + dy * np.random.randn(50)
       plt.errorbar(x, y, yerr=dy, fmt='.k');

3.5 SCIKIT-LEARN

There are several Python libraries that provide solid implementations of a range of
machine learning algorithms. One of the best known is Scikit-Learn, a package that
provides efficient versions of a large number of common algorithms. Scikit-Learn is
characterized by a clean, uniform, and streamlined API, as well as by very useful and
complete online documentation. A benefit of this uniformity is that once you understand
the basic use and syntax of Scikit-Learn for one type of model, switching to a new model
or algorithm is very straightforward.

Machine learning is about creating models from data: for that reason, we'll start by
discussing how data can be represented in order to be understood by the computer. The
best way to think about data within Scikit-Learn is in terms of tables of data. For example,
consider the Iris dataset, famously analyzed by Ronald Fisher in 1936. We can download
this dataset in the form of a Pandas DataFrame using the Seaborn library:

In[1]: import seaborn as sns
       iris = sns.load_dataset('iris')
       iris.head()
Out[1]:    sepal_length  sepal_width  petal_length  petal_width  species
        0           5.1          3.5           1.4          0.2   setosa
        1           4.9          3.0           1.4          0.2   setosa
        2           4.7          3.2           1.3          0.2   setosa


3.6 SEABORN

Seaborn helps you explore and understand your data. Its plotting functions operate on
dataframes and arrays containing whole datasets and internally perform the necessary
semantic mapping and statistical aggregation to produce informative plots. Its dataset-
oriented, declarative API lets you focus on what the different elements of your plots
mean, rather than on the details of how to draw them.

Seaborn is a data visualization library built on top of matplotlib and closely integrated with
pandas data structures in Python. Visualization is the central part of Seaborn which helps in
exploration and understanding of data.

One has to be familiar with Numpy and Matplotlib and Pandas to learn about Seaborn.

Seaborn offers the following functionalities:

1. Dataset oriented API to determine the relationship between variables.

2. Automatic estimation and plotting of linear regression plots.

3. It supports high-level abstractions for multi-plot grids.

4. Visualizing univariate and bivariate distribution.

These are only some of the functionalities offered by Seaborn; there are many more,
which can be explored in the Seaborn documentation.

To import the Seaborn library, the command used is:

import seaborn as sns

Using Seaborn we can plot wide varieties of plots like:

1. Distribution Plots

2. Pie Chart & Bar Chart

3. Scatter Plots

4. Pair Plots

5. Heat maps

1. Distribution Plots

We can compare the distribution plot in Seaborn to histograms in Matplotlib. They both
offer pretty similar functionality. Instead of the frequency counts of a histogram, here we
plot an approximate probability density along the y-axis.

We will be using sns.distplot() in the code to plot distribution graphs. Before going further,
let's first access our dataset.
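A minimal sketch, assuming a Google Play Store apps CSV; the file name googleplaystore.csv
and the 'Rating' column are assumptions about the dataset behind these examples.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

apps = pd.read_csv('googleplaystore.csv')   # hypothetical file name
sns.distplot(apps['Rating'].dropna())       # approximate probability density on the y-axis
plt.show()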

2. Pie Chart & Bar Chart

A pie chart is generally used to analyze how a numeric variable is distributed across
different categories. In the dataset we are using, we'll analyze how the top 4 categories in
the Content Rating column are performing. From a pie diagram alone, we cannot correctly
tell whether “Everyone 10+” or “Mature 17+” has the larger share: it is very difficult to assess
the difference between two categories when their values are close to each other.

We can overcome this by plotting the same data in a bar chart for the Content Rating column.
Similar to the pie chart, we can customize the bar graph too, with different colors of bars,
the title of the chart, etc.
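A sketch of both charts for the top 4 Content Rating categories, continuing from the distplot
sketch above (the 'Content Rating' column name is assumed from the text):

top4 = apps['Content Rating'].value_counts().head(4)

plt.pie(top4.values, labels=top4.index, autopct='%1.1f%%')   # pie chart of the top 4 categories
plt.title('Top 4 Content Rating categories')
plt.show()

sns.barplot(x=top4.index, y=top4.values)   # the same data as a bar chart is easier to compare
plt.xticks(rotation=45)
plt.show()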

3. Scatter Plots

Up until now, we have been dealing with only a single numeric column from the dataset,
like Rating, Reviews or Size. But what if we have to infer a relationship between two
numeric columns, say “Rating and Size” or “Rating and Reviews”?

A scatter plot is used when we want to plot the relationship between any two numeric columns
from a dataset. These plots are among the most powerful visualization tools used in
the field of machine learning.

Let's see how the scatter plot looks for two numeric columns in the dataset, “Rating” and
“Size”. First, we'll plot the graph using Matplotlib, and after that we'll see how it looks in
Seaborn. We will be using sns.jointplot() in the code for a scatter plot along with histograms,
and sns.scatterplot() for only the scatter plot.
The main advantage of using a scatter plot in Seaborn is that we get both the scatter plot
and the histograms in the same graph. If we want only the scatter plot, we just replace
“jointplot” with “scatterplot” in the code.

Regression plots draw a regression line between two numerical parameters in the
jointplot (scatter plot) and help to visualize their linear relationship. From such a graph,
we can infer, for example, whether the Rating steadily increases as the Price of the apps increases.
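A short sketch of those calls, continuing with the same apps DataFrame (the 'Rating', 'Size',
and 'Price' column names are assumed from the text):

sns.jointplot(x='Size', y='Rating', data=apps)               # scatter plot plus histograms
sns.scatterplot(x='Size', y='Rating', data=apps)             # scatter plot only
sns.jointplot(x='Price', y='Rating', data=apps, kind='reg')  # adds a regression line
plt.show()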

4. Pair Plots

Pair Plots are used when we want to see the relationship pattern among more than 3 different
numeric variables. For example, let’s say we want to see how a company’s sales are affected
by three different factors, in that case, pair plots will be very helpful.
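A one-line sketch of a pair plot over several numeric columns (column names assumed):

sns.pairplot(apps[['Rating', 'Reviews', 'Size', 'Price']].dropna())
plt.show()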

CHAPTER-IV

4.1 MACHINE LEARNING


Machine learning is an application of artificial intelligence (AI) that provides systems
the ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can
access data and use it to learn for themselves.
The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better decisions in
the future based on the examples that we provide. The primary aim is to allow computers
to learn automatically without human intervention or assistance and adjust actions
accordingly.

Some machine learning methods:

Machine learning algorithms are often categorized as supervised or unsupervised.

Supervised machine learning

Starting from the analysis of a known training dataset, the learning algorithm
produces an inferred function to make predictions about the output values. The system is
able to provide targets for any new input after sufficient training. The learning algorithm can
also compare its output with the correct, intended output and find errors in order to modify
the model accordingly.

Unsupervised machine learning

These are used when the information used to train is neither classified nor labeled.
Unsupervised learning studies how systems can infer a function to describe a hidden
structure from unlabeled data. The system explores the data and can draw inferences from
datasets to describe hidden structures in unlabeled data.

Semi-supervised machine learning

These algorithms fall somewhere in between supervised and unsupervised learning,
since they use both labeled and unlabeled data for training – typically a small amount of
labeled data and a large amount of unlabeled data. Usually, semi-supervised learning is chosen
when the acquired labeled data requires skilled and relevant resources in order to train it or
learn from it.

Reinforcement machine learning

Reinforcement learning is a method in which an algorithm interacts with its environment by
producing actions and discovering errors or rewards. Trial-and-error search and delayed reward
are the most relevant characteristics of reinforcement learning. This method allows machines and
software agents to automatically determine the ideal behavior within a specific context in
order to maximize performance.

Importance of Machine learning

Consider some of the instances where machine learning is applied: the self-driving
Google car, cyber fraud detection, online recommendation engines like friend suggestions
on Facebook, Netflix showcasing the movies and shows you might like, and “more items
to consider” and “get yourself a little something” on Amazon are all examples of applied
machine learning. All these examples echo the vital role machine learning has begun to
take in today's data-rich world.
Machines can aid in filtering useful pieces of information that help in major
advancements, and we are already seeing how this technology is being implemented in a
wide variety of industries.

Uses of Machine Learning

Earlier, we mentioned some applications of machine learning. To understand the
concept of machine learning better, let's consider some more examples: web search results,
real-time ads on web pages and mobile devices, email spam filtering, network intrusion
detection, and pattern and image recognition. All these are by-products of applying
machine learning to analyze huge volumes of data.
By developing fast and efficient algorithms and data-driven models for real-time
processing of data, machine learning can produce accurate results and analysis.

4.2 MODELS USED

4.2.1 Regression model

• Linear Regression is a machine learning algorithm based on supervised learning.

• It performs a regression task. Regression models a target prediction value based on


independent variables.

• It is mostly used for finding out the relationship between variables and forecasting.

Real Vs Predicted

Fig 3: Linear regression scatter plot
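A minimal sketch of fitting this model with scikit-learn on the dataset from Chapter II (the
file name USA_Housing.csv is an assumption):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

df = pd.read_csv('USA_Housing.csv')            # hypothetical file name
X = df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
        'Avg. Area Number of Bedrooms', 'Area Population']]
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lin_reg = LinearRegression().fit(X_train, y_train)
pred = lin_reg.predict(X_test)
print('Linear Regression MSE:', mean_squared_error(y_test, pred))   # real vs predicted error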

4.2.2 Random Forest Regression model

• A Random Forest is an ensemble technique capable of performing both regression and


classification tasks with the use of multiple decision trees and a technique called Bootstrap
Aggregation, commonly known as bagging.

• Bagging, in the Random Forest method, involves training each decision tree on a different
data sample where sampling is done with replacement.

• The basic idea behind this is to combine multiple decision trees in determining the final
output rather than relying on individual decision trees.

Real Vs Predicted

Fig 4: Random forest regression scatter plot
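A corresponding sketch with a random forest, reusing the train/test split from the linear
regression sketch above:

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=42)   # 100 bagged decision trees
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
print('Random Forest MSE:', mean_squared_error(y_test, rf_pred))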

4.3 REGRESSION ANALYSIS
Regression analysis is a predictive modelling technique that analyzes the relation between
the target (dependent) variable and the independent variables in a dataset. The different types
of regression analysis techniques are used when the target and independent variables
show a linear or non-linear relationship with each other and the target variable contains
continuous values. The regression technique is used mainly to determine predictor
strength, to forecast trends and time series, and in cases of cause-and-effect relations.
Regression analysis is the primary technique for solving regression problems in machine
learning using data modelling. It involves determining the best-fit line, a line drawn
through the data points in such a way that the distance of the line from each data
point is minimized.
Types of Regression Analysis Techniques
There are many types of regression analysis techniques, and the use of each method
depends upon the number of factors. These factors include the type of target variable, shape
of the regression line, and the number of independent variables.
Below are the different regression techniques:

1. Linear Regression
2. Logistic Regression
3. Ridge Regression
4. Lasso Regression
5. Polynomial Regression
6. Bayesian Linear Regression

The different types of regression in machine learning techniques are explained below
in detail:

1. Linear Regression

Linear regression is one of the most basic types of regression in machine learning. The
linear regression model consists of a predictor variable and a dependent variable related
linearly to each other. If the data involves more than one independent variable, the model
is called multiple linear regression.
The equation below denotes the linear regression model:
y = mx + c + e
where m is the slope of the line, c is the intercept, and e represents the error in the model.

The best-fit line is determined by varying the values of m and c. The predictor error is the
difference between the observed values and the predicted values. The values of m and c are
selected so that they give the minimum predictor error. It is important to note that
a simple linear regression model is susceptible to outliers. Therefore, it should not be used
in the case of big-size data.

2. Logistic Regression

Logistic regression is a type of regression analysis technique which is used
when the dependent variable is discrete, for example 0 or 1, true or false, etc. This means the
target variable can have only two values, and a sigmoid curve denotes the relation between
the target variable and the independent variables.
The logit function is used in logistic regression to measure the relationship between the target
variable and the independent variables. Below is the equation that denotes logistic
regression:
logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + … + bkXk
where p is the probability of occurrence of the feature.

When selecting logistic regression as the regression analysis technique, it should be noted that
the data size should be large, with an almost equal occurrence of the two values of the target
variable. Also, there should be no multicollinearity, which means that there should be no
correlation between the independent variables in the dataset.

3. Ridge Regression

This is another type of regression in machine learning which is usually used
when there is a high correlation between the independent variables. This is because, in the
case of multicollinear data, the least-squares estimates remain unbiased but can have very
large variance; when the collinearity is very high, a small amount of bias is introduced
deliberately. Therefore, a bias matrix is introduced in the equation of Ridge Regression.
This is a powerful regression method where the model is less susceptible to overfitting.
Below is the equation used to denote Ridge Regression, where the introduction of λ
(lambda) solves the problem of multicollinearity:
β = (X^{T}X + λ*I)^{-1}X^{T}y

4. Lasso Regression

Lasso Regression is a type of regression in machine learning that performs
regularization along with feature selection. It penalizes the absolute size of the regression
coefficients. As a result, coefficient values get nearer to zero, which does not happen in
the case of Ridge Regression.
Because of this, Lasso Regression effectively performs feature selection, allowing a subset
of features from the dataset to be used to build the model. Only the required features are
kept, and the others are driven to zero. This helps avoid overfitting in the model. If the
independent variables are highly collinear, Lasso regression picks only one of them and
shrinks the others to zero.

Below is the objective that the Lasso Regression method minimizes:

N^{-1} Σ^{N}_{i=1} (y_{i} − x_{i}^{T}β)^{2} + λ Σ_{j} |β_{j}|

5. Polynomial Regression

Polynomial Regression is another type of regression analysis technique in
machine learning, which is the same as Multiple Linear Regression with a small
modification. In Polynomial Regression, the relationship between the independent and
dependent variables, X and Y, is denoted by an n-th degree polynomial.
It is a linear model used as an estimator, and the Least Mean Squared method is used in
Polynomial Regression as well. The best-fit line in Polynomial Regression that passes through
the data points is not a straight line but a curved line, whose shape depends upon the power
of X, or the value of n.

30
While trying to reduce the mean squared error to a minimum to get the best-fit line,
the model can be prone to overfitting. It is recommended to analyze the curve towards the
ends, as higher-degree polynomials can give strange results on extrapolation.
The equation below represents Polynomial Regression:
y = β0 + β1x + β2x^{2} + … + βnx^{n} + ε

6. Bayesian Linear Regression


Bayesian Regression is one of the types of regression in machine learning that uses the
Bayes theorem to find out the value of regression coefficients. In this method of regression,
the posterior distribution of the features is determined instead of finding the least-squares.
Bayesian Linear Regression is like both Linear Regression and Ridge Regression but is
more stable than the simple Linear Regression.

Within multiple types of regression models, it is important to choose the best suited
technique based on type of independent and dependent variables, dimensionality in the
data and other essential characteristics of the data.

1. Data exploration is an inevitable part of building a predictive model. It should be your first
step before selecting the right model, e.g., identifying the relationships and impact of variables.
2. To compare the goodness of fit of different models, we can analyse different metrics like
statistical significance of parameters, R-squared, adjusted R-squared, AIC, BIC and the error
term. Another one is Mallow's Cp criterion. This essentially checks for possible bias
in your model by comparing the model with all possible submodels (or a careful selection
of them).
3. Cross-validation is the best way to evaluate models used for prediction. Here you divide
your data set into two groups (train and validate). A simple mean squared difference
between the observed and predicted values gives you a measure of the prediction accuracy.
4. If your data set has multiple confounding variables, you should not choose an automatic
model selection method, because you do not want to put these in a model at the same time.
5. It will also depend on your objective. It can occur that a less powerful model is easier to
implement than a highly statistically significant model.
6. Regression regularization methods (Lasso, Ridge and ElasticNet) work well in the case of
high dimensionality and multicollinearity among the variables in the data set; a short
cross-validated comparison is sketched below.
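A minimal sketch of such a comparison with scikit-learn, reusing the feature matrix X and
target y prepared for the housing data in Section 4.2.1 (the alpha values are illustrative
assumptions, not tuned settings):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge, Lasso, ElasticNet

models = [('Ridge', Ridge(alpha=1.0)),
          ('Lasso', Lasso(alpha=1.0)),
          ('ElasticNet', ElasticNet(alpha=1.0, l1_ratio=0.5))]

for name, model in models:
    # 5-fold cross-validated mean squared error (negated by scikit-learn's convention)
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    print(name, 'mean MSE:', -scores.mean())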

CHAPTER-V

5.1 PREDICTING THE HOUSE SALES PRICES USING LINEAR


REGRESSION

Firstly, what is the problem?

We set out to use linear regression to predict housing prices in Iowa.

The problem is to build a model that will predict house prices with a high degree
of predictive accuracy given the available data. As the original competition puts it: “With 77
explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa,
this competition challenges you to predict the final price of each home.”

Mission 1: Importing Libraries

Where I got the data

The dataset is the prices and features of residential houses sold from 2006 to 2010 in
Ames, Iowa. Check the references for the dataset link.

Libraries Used in this Project

First things first, import the Python libraries and the dataset.

Figure 16: Importing Libraries

pandas is an open-source, BSD-licensed library providing high-performance, easy-
to-use data structures and data analysis tools for the Python programming language. We'll
use scikit-learn for the model training process, so we can focus on gaining an intuition for
the model-based learning approach to machine learning.
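A minimal sketch of the imports and data loading; the file name and the 1460-row train/test
split are assumptions, following the usual Ames housing workflow.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

data = pd.read_csv('AmesHousing.txt', delimiter='\t')   # hypothetical file name
train = data[:1460]
test = data[1460:]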

Mission2: Introduction to the Data

To get familiar with this machine learning approach, we'll work with a dataset on sold
houses in Ames, Iowa. Each row in the dataset describes the properties of a single house as
well as the amount it was sold for. Here, we'll build models that predict the final
sale price from its other attributes. Specifically, we'll explore the following questions:

• Which properties of a house most affect the final sale price?


• How effectively can we predict the sale price from just its properties?

This dataset was originally compiled by Dean De Cock for the primary purpose of
having a high-quality dataset for regression. Here are some of the columns:

1. Lot Area: Lot size in square feet.

2. Overall Qual: Rates the overall material and finish of the house.
3. Overall Cond: Rates the overall condition of the house.

Mission 3: Simple Linear Regression

We'll start by understanding the univariate case of linear regression, also known as
simple linear regression. The following equation is the general form of the simple linear
regression model.
y^ = a1*x1 + a0
y^ represents the target column, while x1 represents the feature column we choose
to use in our model. These values are independent of the dataset. On the other hand, a0 and
a1 represent the parameter values that are specific to the dataset. The goal of simple linear
regression is to find the optimal parameter values that best describe the relationship between
the feature column and the target column. The following diagram shows different simple linear
regression models depending on the data:

Figure 5: Different simple linear regression models

The first step is to select the feature, x1, we want to use in our model. Once we
select this feature, we can use scikit-learn to determine the optimal parameter values a1 and
a0 based on the training data.

• Create a figure with dimensions 7 x 15 containing three scatter plots in a single column,
as sketched below:
• The first plot should plot the Garage Area column on the x-axis against the SalePrice
column on the y-axis.
• The second one should plot the Gr Liv Area column on the x-axis against the SalePrice
column on the y-axis.
• The third one should plot the Overall Cond column on the x-axis against the SalePrice
column on the y-axis.
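A sketch of that figure with matplotlib, using the column names from the instructions above
and the train DataFrame loaded earlier:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7, 15))
for i, col in enumerate(['Garage Area', 'Gr Liv Area', 'Overall Cond']):
    ax = fig.add_subplot(3, 1, i + 1)            # three scatter plots in one column
    ax.scatter(train[col], train['SalePrice'])
    ax.set_xlabel(col)
    ax.set_ylabel('SalePrice')
plt.show()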
Mission 4: Least Squares

From the last screen, we can tell that the Gr Liv Area feature correlates the most
with the SalePrice column. We can confirm this by calculating the correlation between
pairs of these columns using the pandas.DataFrame.corr() method:
print(train[['Garage Area', 'Gr Liv Area', 'Overall Cond', 'SalePrice']].corr())

The correlation between Gr Liv Area and SalePrice is around 0.706, which is the
highest. Recall that the closer the correlation coefficient is to 1.0, the stronger the
correlation. Here's the updated form of our model:
y^ = a1 * Gr Liv Area + a0
Let's now move on to understanding the model fitting criteria.

Mission 5: Using Scikit-Learn to Train and Predict

Let's now use scikit-learn to find the optimal parameter values for our model. The
scikit-learn library was designed to make it easy to swap and try different models. Because
we're familiar with the scikit-learn workflow for k-nearest neighbors, switching to using linear
regression is straightforward. Instead of working with the
sklearn.neighbors.KNeighborsRegressor class, we work with the
sklearn.linear_model.LinearRegression class. The LinearRegression class also has its own
fit() method. Specific to this model, however, are the coef_ and intercept_ attributes, which
return a1 (a1 to an if it were a multivariate regression model) and a0 respectively.
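A short sketch of that workflow:

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(train[['Gr Liv Area']], train['SalePrice'])
print(lr.coef_)       # a1
print(lr.intercept_)  # a0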

Mission 6: Making Predictions

In the last step, we fit a univariate linear regression model between the Gr Liv Area
and SalePrice columns. We then displayed the single coefficient and the intercept value. If
we refer back to the format of our linear regression model, the fitted model can be
represented as: y^ = 116.86624683*x1 + 5366.82171006

One way to interpret this model is: “for every 1 square foot increase in above ground
living area, we can expect the home's value to increase by approximately 116.87 dollars”.
We can now use the predict() method to predict the labels using the training data
and compare them with the actual labels. To quantify the fit, we can use mean squared error.
Let's also perform simple validation by making predictions on the test set and calculate the
MSE value for those predictions as well.
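A sketch of the prediction and validation step, continuing from the fitted lr model above:

train_pred = lr.predict(train[['Gr Liv Area']])
test_pred = lr.predict(test[['Gr Liv Area']])
print('Train MSE:', mean_squared_error(train['SalePrice'], train_pred))
print('Test MSE: ', mean_squared_error(test['SalePrice'], test_pred))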

5.2 DJANGO WEBFRAMEWORK

Django is a Python-based free and open-source web framework that follows the model–
template–views (MTV) architectural pattern. It is maintained by the Django Software
Foundation (DSF), an independent organization established in the US as a non-profit.

Django's primary goal is to ease the creation of complex, database-driven websites. The
framework emphasizes reusability and "pluggability" of components, less code, low
coupling, rapid development, and the principle of don't repeat yourself (DRY), which is used
throughout, even for settings, files, and data models. Django also provides an optional
administrative create, read, update and delete interface that is generated dynamically
through introspection and configured via admin models. Some well-known sites that use
Django include Instagram, Mozilla, Clubhouse and Bitbucket.

We used the Django web framework to deploy the model where the front and backend was
perfectly developed as per the industry requirement.

Django is a high-level Python web framework that enables rapid development of secure
and maintainable websites. Built by experienced developers, Django takes care of much of
the hassle of web development, so you can focus on writing your app without needing to
reinvent the wheel. It is free and open source, has a thriving and active community, great
documentation, and many options for free and paid-for support.

Django follows the "Batteries included" philosophy and provides almost everything
developers might want to do "out of the box". Because everything you need is part of the

one "product", it all works seamlessly together, follows consistent design principles, and
has extensive and up-to-date documentation.

Versatile

Django can be (and has been) used to build almost any type of website — from content
management systems and wikis, through to social networks and news sites. It can work
with any client-side framework, and can deliver content in almost any format (including
HTML, RSS feeds, JSON, XML, etc). The site you are currently reading is built with
Django!

Internally, while it provides choices for almost any functionality you might want (e.g.
several popular databases, templating engines, etc.), it can also be extended to use other
components if needed.

Secure

Django helps developers avoid many common security mistakes by providing a framework
that has been engineered to "do the right things" to protect the website automatically. For
example, Django provides a secure way to manage user accounts and passwords, avoiding
common mistakes like putting session information in cookies where it is vulnerable
(instead cookies just contain a key, and the actual data is stored in the database) or directly
storing passwords rather than a password hash.

A password hash is a fixed-length value created by sending the password through


a cryptographic hash function. Django can check if an entered password is correct by
running it through the hash function and comparing the output to the stored hash value.
However due to the "one-way" nature of the function, even if a stored hash value is
compromised it is hard for an attacker to work out the original password.

Django enables protection against many vulnerabilities by default, including SQL


injection, cross-site scripting, cross-site request forgery and clickjacking (see Website
security for more details of such attacks).

Scalable

Django uses a component-based "shared-nothing" architecture (each part of the


architecture is independent of the others, and can hence be replaced or changed if needed).

Having a clear separation between the different parts means that it can scale for increased
traffic by adding hardware at any level: caching servers, database servers, or application
servers. Some of the busiest sites have successfully scaled Django to meet their demands
(e.g. Instagram and Disqus, to name just two).

Maintainable

Django code is written using design principles and patterns that encourage the creation of
maintainable and reusable code. In particular, it makes use of the Don't Repeat Yourself
(DRY) principle so there is no unnecessary duplication, reducing the amount of code.
Django also promotes the grouping of related functionality into reusable "applications"
and, at a lower level, groups related code into modules (along the lines of the Model View
Controller (MVC) pattern).

Portable

Django is written in Python, which runs on many platforms. That means that you are not
tied to any particular server platform, and can run your applications on many flavours of
Linux, Windows, and Mac OS X. Furthermore, Django is well-supported by many web
hosting providers, who often provide specific infrastructure and documentation for hosting
Django sites.

Django was initially developed between 2003 and 2005 by a web team who were
responsible for creating and maintaining newspaper websites. After creating a number of
sites, the team began to factor out and reuse lots of common code and design patterns. This
common code evolved into a generic web development framework, which was open-
sourced as the "Django" project in July 2005.

Django has continued to grow and improve, from its first milestone release (1.0) in
September 2008 through to the recently-released version 3.1 (2020). Each release has
added new functionality and bug fixes, ranging from support for new types of databases,
template engines, and caching, through to the addition of "generic" view functions and
classes (which reduce the amount of code that developers have to write for a number of
programming tasks).

Note: Check out the release notes on the Django website to see what has changed in recent
versions, and how much work is going into making Django better.

Django is now a thriving, collaborative open source project, with many thousands of users
and contributors. While it does still have some features that reflect its origin, Django has
evolved into a versatile framework that is capable of developing any type of website.

There isn't any readily-available and definitive measurement of popularity of server-side


frameworks (although you can estimate popularity using mechanisms like counting the
number of GitHub projects and StackOverflow questions for each platform). A better
question is whether Django is "popular enough" to avoid the problems of unpopular
platforms. Is it continuing to evolve? Can you get help if you need it? Is there an
opportunity for you to get paid work if you learn Django?

Based on the number of high profile sites that use Django, the number of people
contributing to the codebase, and the number of people providing both free and paid for
support, then yes, Django is a popular framework!

High-profile sites that use Django include: Disqus, Instagram, Knight Foundation,
MacArthur Foundation, Mozilla, National Geographic, Open Knowledge Foundation,
Pinterest, and Open Stack (source: Django overview page).

Is Django opinionated?

Web frameworks often refer to themselves as "opinionated" or "unopinionated".

Opinionated frameworks are those with opinions about the "right way" to handle any
particular task. They often support rapid development in a particular domain (solving
problems of a particular type) because the right way to do anything is usually well-
understood and well-documented. However they can be less flexible at solving problems
outside their main domain, and tend to offer fewer choices for what components and
approaches they can use.

Unopinionated frameworks, by contrast, have far fewer restrictions on the best way to glue
components together to achieve a goal, or even what components should be used. They
make it easier for developers to use the most suitable tools to complete a particular task,
albeit at the cost that you need to find those components yourself.

Django is "somewhat opinionated", and hence delivers the "best of both worlds". It
provides a set of components to handle most web development tasks and one (or two)

preferred ways to use them. However, Django's decoupled architecture means that you can
usually pick and choose from a number of different options, or add support for completely
new ones if desired.

In a traditional data-driven website, a web application waits for HTTP requests from the
web browser (or other client). When a request is received the application works out what
is needed based on the URL and possibly information in POST data or GET data.
Depending on what is required it may then read or write information from a database or
perform other tasks required to satisfy the request. The application will then return a
response to the web browser, often dynamically creating an HTML page for the browser
to display by inserting the retrieved data into placeholders in an HTML template.

Django web applications typically group the code that handles each of these steps into
separate files:

Fig 6: Django working block diagram

• URLs: While it is possible to process requests from every single URL via a single function,
it is much more maintainable to write a separate view function to handle each resource. A

URL mapper is used to redirect HTTP requests to the appropriate view based on the request
URL. The URL mapper can also match particular patterns of strings or digits that appear
in a URL and pass these to a view function as data.
• View: A view is a request handler function, which receives HTTP requests and returns
HTTP responses. Views access the data needed to satisfy requests via models, and delegate
the formatting of the response to templates.
• Models: Models are Python objects that define the structure of an application's data, and
provide mechanisms to manage (add, modify, delete) and query records in the database.
• Templates: A template is a text file defining the structure or layout of a file (such as an
HTML page), with placeholders used to represent actual content. A view can dynamically
create an HTML page using an HTML template, populating it with data from a model. A
template can be used to define the structure of any type of file; it doesn't have to be HTML!
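
To make the division of responsibilities concrete, here is a minimal, purely illustrative
sketch of how a URL, view, model and template cooperate in a Django app. The app layout,
the Question model, its field and the template name are assumptions chosen for
illustration; they are not part of this report's project.

# urls.py -- the URL mapper routes a request such as /polls/5/ to a view
from django.urls import path
from . import views

urlpatterns = [
    path('polls/<int:question_id>/', views.detail, name='detail'),
]

# models.py -- a model describing the data (field name and length are assumptions)
from django.db import models

class Question(models.Model):
    question_text = models.CharField(max_length=200)

# views.py -- the view fetches the model instance and delegates rendering to a template
from django.shortcuts import render, get_object_or_404
from .models import Question

def detail(request, question_id):
    question = get_object_or_404(Question, pk=question_id)
    return render(request, 'polls/detail.html', {'question': question})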

Install Django

You’ve got three easy options to install Django:

• Install an official release. This is the best approach for most users (a short installation sketch appears below).

• Install a version of Django provided by your operating system distribution.

• Install the latest development version. This option is for enthusiasts who want the latest-
and-greatest features and aren’t afraid of running brand new code. You might encounter
new bugs in the development version, but reporting them helps the development of Django.
Also, releases of third-party packages are less likely to be compatible with the development
version than with the latest stable release.
Always refer to the documentation that corresponds to the version of Django you’re
using!
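
As a brief sketch of the first option (the virtual-environment name is an arbitrary
choice, and the activation command differs between shells and operating systems):

$ python -m venv venv              # create an isolated environment (recommended)
$ source venv/bin/activate         # on Windows: venv\Scripts\activate
$ python -m pip install Django     # install the latest official release from PyPI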

Writing your first Django app, part 1

Let’s learn by example.

Throughout this tutorial, we’ll walk you through the creation of a basic poll application.
It’ll consist of two parts:
• A public site that lets people view polls and vote in them.

• An admin site that lets you add, change, and delete polls.

We'll assume you have Django installed already. You can tell whether Django is installed,
and which version, by running the following command in a shell prompt (indicated by the
$ prefix):

$ python -m django --version

If Django is installed, you should see the version of your installation. If it isn't, you'll get
an error saying "No module named django".

This tutorial is written for Django 2.2, which supports Python 3.5 and later. If your Django
version doesn't match, refer to the tutorial for your version of Django, or update Django
to the newest version. If you're using an older version of Python, check What Python
version can I use with Django? to find a compatible version of Django.

See How to install Django for advice on how to remove older versions of Django and install
a newer one.

Creating a project

If this is your first time using Django, you'll have to take care of some initial setup.
Namely, you'll need to auto-generate some code that establishes a Django project – a
collection of settings for an instance of Django, including database configuration,
Django-specific options and application-specific settings.

From the command line, cd into a directory where you’d like to store your code, then run
the following command:

$ django-admin startproject mysite


This will create a mysite directory in your current directory. If it didn’t work, see
Problems running django-admin.

Let’s look at what startproject created:


mysite/
    manage.py
    mysite/
        __init__.py
        settings.py
        urls.py
        wsgi.py

These files are:

• manage.py: A command-line utility that lets you interact with this Django project in
various ways. You can read all the details about manage.py in django-admin and
manage.py.

• The inner mysite/ directory is the actual Python package for your project. Its name is the
Python package name you’ll need to use to import anything inside it (e.g. mysite.urls).

• mysite/__init__.py: An empty file that tells Python that this directory should be considered a
Python package. If you're a Python beginner, read more about packages in the official
Python docs.

• mysite/settings.py: Settings/configuration for this Django project. Django settings will
tell you all about how settings work.

• mysite/urls.py: The URL declarations for this Django project; a “table of contents” of
your Django-powered site. You can read more about URLs in URL dispatcher.

• mysite/wsgi.py: An entry-point for WSGI-compatible web servers to serve your project.
See How to deploy with WSGI for more details.

The development server

Let's verify your Django project works. Change into the outer mysite directory, if you
haven't already, and run the following command:

$ python manage.py runserver

You'll see the following output on the command line:

Performing system checks...

System check identified no issues (0 silenced).

You have unapplied migrations; your app may not work properly until they are applied.
Run 'python manage.py migrate' to apply them.

February 02, 2022 - 15:50:53

Django version 2.2, using settings 'mysite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Note: Ignore the warning about unapplied database migrations for now; we'll deal with
the database shortly.

You’ve started the Django development server, a lightweight Web server written purely in
Python. We’ve included this with Django so you can develop things rapidly, without having
to deal with configuring a production server – such as Apache – until you’re ready for
production.

Now’s a good time to note: don’t use this server in anything resembling a production
environment. It’s intended only for use while developing. (We’re in the business of making
Web frameworks, not Web servers.)

Now that the server’s running, visit http://127.0.0.1:8000/ with your Web browser. You’ll
see a “Congratulations!” page, with a rocket taking off. It worked!

Changing the port

By default, the runserver command starts the development server on the internal IP at
port 8000.

If you want to change the server's port, pass it as a command-line argument. For
instance, this command starts the server on port 8080:

$ python manage.py runserver 8080

If you want to change the server’s IP, pass it along with the port. For example, to listen
on all available public IPs (which is useful if you are running Vagrant or want to show off
your work on other computers on the network), use:
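
For example, the Django tutorial uses 0 as a shortcut for 0.0.0.0:

$ python manage.py runserver 0:8000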

Creating the Polls app

Now that your environment – a “project” – is set up, you’re set to start doing work.

Each application you write in Django consists of a Python package that follows a certain
convention. Django comes with a utility that automatically generates the basic directory
structure of an app, so you can focus on writing code rather than creating directories.

Projects vs. apps

What’s the difference between a project and an app? An app is a Web application that does
something – e.g., a Weblog system, a database of public records or a simple poll app. A
project is a collection of configuration and apps for a particular website. A project can
contain multiple apps. An app can be in multiple projects.

Your apps can live anywhere on your Python path. In this tutorial, we’ll create our poll app
right next to your manage.py
file so that it can be imported as its own top-level module, rather than a submodule of
mysite.
To create your app, make sure you're in the same directory as manage.py and type this
command:

$ python manage.py startapp polls

That’ll create a directory polls, which is laid out like this:

polls/
    __init__.py
    admin.py
    apps.py
    migrations/
        __init__.py
    models.py
    tests.py
    views.py

This directory structure will house the poll application.

Write your first view

Let’s write the first view. Open the file polls/views.py and put the following
Python code in it:

from django.http import HttpResponse


def index(request):
    return HttpResponse("Hello, world. You're at the polls index.")

This is the simplest view possible in Django. To call the view, we need to map it to a URL
– and for this we need a URL configuration (URLconf).
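
A minimal sketch of that mapping, following the layout used in the Django 2.2 tutorial
(the polls/urls.py file has to be created by hand), looks like this:

# polls/urls.py
from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
]

# mysite/urls.py
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('polls/', include('polls.urls')),
    path('admin/', admin.site.urls),
]

With the development server running, visiting http://127.0.0.1:8000/polls/ should then
show the "Hello, world" text returned by the index view.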

Database setup

Now, open up mysite/settings.py. It’s a normal Python module with module-level variables
representing Django settings.

By default, the configuration uses SQLite. If you’re new to databases, or you’re just
interested in trying Django, this is the easiest choice. SQLite is included in Python, so you
won’t need to install anything else to support your database. When starting your first real
project, however, you may want to use a more scalable database like PostgreSQL, to avoid
database-switching headaches down the road.

If you wish to use another database, install the appropriate database bindings and change
the following keys in the DATABASES 'default' item to match your database connection
settings (a sample configuration follows the list below):

• ENGINE – Either 'django.db.backends.sqlite3', 'django.db.backends.postgresql',
'django.db.backends.mysql', or 'django.db.backends.oracle'. Other backends are also
available.

• NAME – The name of your database. If you’re using SQLite, the database will be a file on
your computer; in that case, NAME should be the full absolute path, including filename,
of that file. The default value, os.path.join(BASE_DIR, 'db.sqlite3'), will store the file
in your project directory.
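
For example, a PostgreSQL configuration might look like the following sketch (the
database name, user, password, host and port are placeholders to be replaced with your
own connection settings):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydatabase',
        'USER': 'mydatabaseuser',
        'PASSWORD': 'mypassword',
        'HOST': '127.0.0.1',
        'PORT': '5432',
    }
}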

For databases other than SQLite

If you're using a database besides SQLite, make sure you've created a database by this
point. Do that with "CREATE DATABASE database_name;" within your database's
interactive prompt.

Also make sure that the database user provided in mysite/settings.py has “create database”
privileges. This allows automatic creation of a test database which will be needed in a
later tutorial.

While you’re editing mysite/settings.py, set TIME_ZONE to your time zone.
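
With the database settings in place, the unapplied-migrations warning shown earlier by
runserver can be cleared by creating the built-in database tables:

$ python manage.py migrate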

CHAPTER-VI

6.1 CODE

#### Backend Python Code ######

from django.shortcuts import render
import pandas as pd
import matplotlib.pyplot as plt   # imported in the original listing; not used in these views
import seaborn as sns             # imported in the original listing; not used in these views
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics       # imported in the original listing; not used in these views


def home(request):
    # Landing page.
    return render(request, "home.html")


def predict(request):
    # Page with the input form for the five features.
    return render(request, "predict.html")


def result(request):
    # Load the training data and drop the non-numeric Address column.
    data = pd.read_csv(r'C:\Users\mvrre\Downloads\USA_Housing.csv')
    data = data.drop(['Address'], axis=1)

    # Split into features (x) and target price (y), then into train/test sets.
    x = data.drop('Price', axis=1)
    y = data['Price']
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30)

    # Fit a linear regression model on the training split.
    model = LinearRegression()
    model.fit(x_train, y_train)

    # Read the five feature values submitted from the predict page (GET parameters n1..n5).
    var1 = float(request.GET['n1'])
    var2 = float(request.GET['n2'])
    var3 = float(request.GET['n3'])
    var4 = float(request.GET['n4'])
    var5 = float(request.GET['n5'])

    # Predict the price for a single sample and round it for display.
    pred = model.predict(np.array([var1, var2, var3, var4, var5]).reshape(1, -1))
    pred = round(pred[0])

    price = "The predicted price is $" + str(pred)
    return render(request, "predict.html", {"result2": price})

#### Registration and login views (presumably the registers app) ######

from django.shortcuts import render, redirect
from django.http import HttpResponseRedirect  # lives in django.http, not django.shortcuts; not used below
from registers.models import Member  # models.py

# Create your views here.

def index(request):
    # Sign-up page: on POST, save the submitted details as a new Member.
    if request.method == 'POST':
        member = Member(username=request.POST['username'],
                        password=request.POST['password'],
                        firstname=request.POST['firstname'],
                        lastname=request.POST['lastname'])
        member.save()
        return redirect('/')
    else:
        return render(request, 'signup_page.html')


def login(request):
    # Login form.
    return render(request, 'login_page.html')


def home(request):
    # On POST, check the submitted credentials against the Member table.
    if request.method == 'POST':
        if Member.objects.filter(username=request.POST['username'],
                                 password=request.POST['password']).exists():
            member = Member.objects.get(username=request.POST['username'],
                                        password=request.POST['password'])
            return render(request, 'home.html', {'member': member})
        else:
            context = {'msg': 'Invalid username or password'}
            return render(request, 'login_page.html', context)
    # Added so that plain GET requests also receive a response
    # (the original listing did not handle this case).
    return render(request, 'home.html')
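
The registration and login views import a Member model from registers.models, which is
not listed in the report. A minimal sketch of what that model would need to contain (the
field lengths are assumptions) is:

# registers/models.py (hypothetical sketch)
from django.db import models

class Member(models.Model):
    # NOTE: storing raw passwords like this is insecure; in practice Django's
    # built-in auth system (django.contrib.auth) is the usual choice.
    username = models.CharField(max_length=100)
    password = models.CharField(max_length=100)
    firstname = models.CharField(max_length=100)
    lastname = models.CharField(max_length=100)

    def __str__(self):
        return self.username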

6.2 RESULTS

Fig 7: Sign-up page of the House Prices Prediction System

Fig 8: Login page of the House Prices Prediction System

Fig 9: Home page of the House Prices Prediction System

Fig 10: Predict page of the House Prices Prediction System

6.3 BEST SUITED MODEL
Linear Regression displayed the best performance on this dataset and can be used for
deployment.

The Random Forest Regressor and XGBoost Regressor lag well behind, so they are not
recommended for deployment.
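
The comparison code itself is not listed in the report; a minimal sketch of how the three
models could be scored on the same split (assuming the USA_Housing.csv features used
earlier, and that the optional third-party xgboost package is installed) is:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from xgboost import XGBRegressor  # third-party package, optional

# Load the data, drop the non-numeric Address column, and split once so all
# models are evaluated on the same held-out test set.
data = pd.read_csv('USA_Housing.csv').drop(['Address'], axis=1)
x, y = data.drop('Price', axis=1), data['Price']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(random_state=42),
    'XGBoost': XGBRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(x_train, y_train)
    print(name, 'R2 =', round(r2_score(y_test, model.predict(x_test)), 4))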

CONCLUSION

Our aim has therefore been achieved: all of the objectives listed in our Aim column have
been met. The analysis shows that circle rate is the most influential attribute in predicting
the house price and that Linear Regression is the most effective model for our dataset, and
the Linear Regression model has been deployed using the Django web framework to
predict prices for different input values.

REFERENCES

[1] Sebastian Raschka and Vahid Mirjalili, "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow", 2nd edition, Packt Publishing.

[2] Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal, "Data Mining: Practical Machine Learning Tools and Techniques", 4th edition, Morgan Kaufmann.

[3] Jake VanderPlas, "Python Data Science Handbook", O'Reilly Media.
