Final Report

HOUSE PRICE PREDICTION USING MACHINE
LEARNING
A PROJECT REPORT
Submitted by
JAYASURYA S (210620104023)
in the partial fulfillment for the award of the

degree of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
JEPPIAAR INSTITUTE OF TECHNOLOGY

ANNA UNIVERSITY: CHENNAI 600025
MAY 2024
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “HOUSE PRICE PREDICTION USING
MACHINE LEARNING” is the bonafide work of “JAYASURYA S
(210620104023)” who carried out the project work under my supervision.
SIGNATURE
Mr. SATHEESH, M.E.,
ASSISTANT PROFESSOR,
SUPERVISOR,
Computer Science And Engineering,
Jeppiaar Institute Of Technology,
Sriperumbudur, Chennai

SIGNATURE
Dr. TAMILARASI, M.E., Ph.D.,
PROFESSOR,
HEAD OF THE DEPARTMENT,
Computer Science And Engineering,
Jeppiaar Institute Of Technology,
Sriperumbudur, Chennai
This project report is submitted for viva voce examination to
be held on
INTERNAL EXAMINER EXTERNAL EXAMINER
JEPPIAAR INSTITUTE OF TECHNOLOGY,

SRIPERUMBUDUR, CHENNAI - 631 604.
ACKNOWLEDGEMENT
We express our deep sense of gratitude to Lord Almighty for the blessings
to complete this project work successfully.
We would like to express our deepest gratitude and respect to honorable

Col. Dr. JEPPIAAR, M.A., B.L., Ph.D., Chairman for having given the
opportunity to pursue the education in this premier institution.
We take this opportunity to express our deepest and special thanks to

Dr. N. MARIE WILSON, B.Tech., MBA., Ph.D., Managing Director for
providing all the facilities and continuous encouragement for carrying out this
project work.
We express our sincere gratitude to Dr. SURESH, B.E., M.E., Ph.D.,

Principal and Dr. TAMILARASI, B.E., M.E., Ph.D., Head (Department Of
Computer Science And Engineering) for their guidance and advice throughout
the project.
We convey our sincere and in-depth gratitude to our Internal guide Mr.
A. SATHEESH, B.E., M.E., for his valuable guidance throughout the duration of
this project.
We would also like to thank our parents and friends for the support they
extended during this course of the project.
TABLE OF CONTENTS
CHAPTER TITLE PAGE NO.

ABSTRACT vii
LIST OF FIGURES viii
LIST OF ABBREVIATIONS ix
1 INTRODUCTION 1
1.1 Scope of Project 1
1.2 Motivation of Project
2 SYSTEM ANALYSIS 2
2.1 Literature Survey 2
2.2 Existing System 2
2.2.1 Disadvantages
2.3 Proposed System 7
2.3.1 Advantages
3 REQUIREMENT SPECIFICATION 8
3.1 Introduction 8
3.2 Hardware and Software Specification 9
3.2.1 Hardware Requirement 9
3.2.2 Software Requirement 9
3.3 Technologies Used 10
3.3.1 Introduction to MATLAB 10
3.3.2 Toolboxes 10
3.3.3 The MATLAB system 11
3.3.4 Development Environment 12
3.3.5 Manipulating Matrices 17
3.3.6 Image Processing Techniques 20
3.3.6.1 Definition 20
3.3.6.2 Advantages 21
3.3.6.3 Disadvantages 21
4 SYSTEM DESIGN 22
4.1 Architecture Diagram 22
4.2 Sequence Diagram 23
4.3 Use Case Diagram 24
4.4 Activity Diagram 25
4.5 Collaboration Diagram 26
5 SYSTEM DESIGN - IMPLEMENTATION 27
5.1 Modules 27
5.2 Module Explanation 28
6 CODING AND TESTING 31
6.1 Coding Standards 31
6.2 Test Procedure 33
6.3 Test data and output 33
7 SNAPSHOTS 41
8 CONCLUSION AND FUTURE ENHANCEMENTS 47
8.1 Conclusion 47
8.2 Future Enhancements 47
9 REFERENCES 48
9.1 Appendix_Coding 50
10 PUBLICATION 53
ABSTRACT
House price prediction using machine learning in the data science

domain has witnessed remarkable advancements due to the expanding data availability
and the evolution of machine learning algorithms. This progress has made the accurate
prediction of house prices more achievable than ever. Various machine learning
techniques such as decision trees, Random Forests, and Linear Regression analysis are
harnessed for predicting house prices. Additionally, the exploration of feature selection
and engineering techniques plays a pivotal role in identifying the most influential
variables impacting house prices. The accurate analysis and prediction of house prices
offer invaluable insights to stakeholders such as buyers, sellers, and real estate agents,
empowering them to make well-informed decisions aligned with prevailing market
conditions. Notably, machine learning algorithms exhibit superior predictive
capabilities, often surpassing traditional regression methods in accuracy and reliability.
LIST OF FIGURES
FIGURE NO FIGURE NAME PAGE NO
4.1 Architecture Diagram 9
4.2 Sequence Diagram 10
4.3 Use Case Diagram 11
4.4 Activity Diagram 12
4.5 Collaboration Diagram 13
LIST OF ABBREVIATIONS
RF Random Forest Regression
DT Decision Tree Regression
CSV Comma-Separated Values
SVM Support Vector Machine
CHAPTER 1
INTRODUCTION
House price prediction is a critical aspect of the real estate industry as it
significantly impacts stakeholders including buyers, sellers, investors, and
policymakers. Traditional methods of price prediction often rely on simplistic
regression models or expert opinions, which may overlook the intricacies inherent in
housing markets. However, with the emergence of machine learning in prediction
techniques, there is newfound optimism. Machine learning algorithms offer a data-
centric approach capable of modeling intricate relationships and patterns within
housing markets. Leveraging extensive datasets comprising diverse features such as
location, size, amenities, economic indicators, and past sales data, these algorithms
unveil hidden insights and deliver precise predictions. This study aims to
comprehensively evaluate various machine learning algorithms for house price
prediction. We will assess the performance of algorithms like linear regression,
decision trees, random forests, support vector machines (SVMs), and gradient boosting
machines (GBMs), aiming to identify the most effective methods for forecasting house
prices. Furthermore, we will explore feature engineering techniques to enhance the
extraction of meaningful features for improved prediction accuracy.
1.1 SCOPE OF PROJECT:
The Machine Learning project for house price prediction encompasses several critical stages,
each contributing to the project's success. Initially, the problem is defined clearly, focusing on
predicting house prices accurately by considering diverse features such as location, size, and
amenities. Data collection involves gathering information from various sources like real estate
databases or APIs, followed by rigorous cleaning and exploratory analysis to understand data
distributions effectively. Feature engineering then extracts meaningful insights from the data, while
model selection, training, and optimization refine algorithms such as linear regression, decision
trees, or neural networks to achieve higher accuracy rates. Evaluation metrics like RMSE and MAE
are employed to validate model performance, ensuring reliable predictions. This leads to the
deployment of the model, followed by continuous monitoring and maintenance to uphold its
accuracy and reliability over time. Additionally, comprehensive documentation and reporting are
crucial to summarize the project's objectives, methodologies, and outcomes, providing stakeholders
with a clear understanding of the project's achievements and areas for improvement, ultimately
contributing to sustained accuracy and reliability in house price predictions.
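The model-fitting and evaluation stages described above can be illustrated with a minimal sketch. It fits a one-feature linear regression by ordinary least squares and scores it with MAE and RMSE, using only the Python standard library so that the arithmetic stays visible. The toy values, the single `area` feature, and all function names are illustrative assumptions, not the project's dataset or code.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical toy data: floor area versus price (arbitrary units).
train_area = [1000, 1500, 2000, 2500]
train_price = [50, 70, 90, 110]
test_area = [1200, 1800]
test_price = [60, 80]

slope, intercept = fit_simple_linear_regression(train_area, train_price)
test_pred = [slope * a + intercept for a in test_area]
test_mae = mae(test_price, test_pred)
test_rmse = rmse(test_price, test_pred)

print(f"slope={slope:.4f} intercept={intercept:.2f}")
print(f"MAE={test_mae:.2f} RMSE={test_rmse:.2f}")
```

In a project that uses scikit-learn, the same steps would normally be carried out with `LinearRegression` and the functions in `sklearn.metrics`. The hand-written version is shown only to make each formula explicit.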
1.2 MOTIVATION OF PROJECT:
The drive behind a project focused on using machine learning to predict house prices is both
broad and impactful. Essentially, it aims to address a real-world need by providing crucial insights
into the complex factors that influence property values. This is incredibly valuable for both buyers
and sellers, helping them make informed decisions and ensuring fairness in transactions while
maximizing profits.
Additionally, the project benefits real estate professionals like agents and brokers by
improving their ability to analyze markets, leading to happier clients and better business outcomes.
Investors also see the project's value as it helps them identify profitable opportunities and manage
risks effectively, bolstering their investment strategies.
From a technology standpoint, these projects are pushing the boundaries of what machine learning can
achieve, showing its potential to revolutionize multiple industries and greatly enhance decision-
making processes. Overall, the motivation behind this project is its positive impact on stakeholders,
the continuous progress in machine learning, and the potential to bring positive changes not just in
real estate but across various domains.
CHAPTER 2
SYSTEM ANALYSIS
2.1 LITERATURE SURVEY:
Paper 1:
Title: House Price Prediction Using Machine Learning
Author: G. Naga Satish, Ch. V. Raghavendran, M.D.Sugnana Rao, Ch.Srinivasulu
Year: 2019
Abstract:
Machine learning plays a major role from past years in image detection, spam
reorganization, normal speech command, product recommendation and medical diagnosis. Present
machine learning algorithm helps us in enhancing security alerts, ensuring public safety and improve
medical enhancements. Machine learning system also provides better customer service and safer
automobile systems. In the present paper we discuss about the prediction of future housing prices that
is generated by machine learning algorithm. For the selection of prediction methods we compare and
explore various prediction methods. We utilize lasso regression as our model because of its adaptable
and probabilistic methodology on model selection. Our result exhibit that our approach of the issue
need to be successful, and has the ability to process predictions that would be comparative with other
house cost prediction models. More over on other hand housing value indices, the advancement of a
housing cost prediction that tend to the advancement of real estate policies schemes. This study
utilizes machine learning algorithms as a research method that develops housing price prediction
models. We create a housing cost prediction model In view of machine learning algorithm models for
example, XGBoost, lasso regression and neural system on look at their order precision execution. We
in that point recommend a housing cost prediction model to support a house vender or a real estate
agent for better information based on the valuation of house. Those examinations exhibit that lasso
regression algorithm, in view of accuracy, reliably outperforms alternate models in the execution of
housing cost prediction.
Paper 2
Title: House Price Prediction Modeling Using Machine Learning
Author: Dr. M. Thamarai, Dr. S P. Malarvizhi
Year: 2020
Abstract:
Machine Learning is seeing its growth more rapidly in this decade. Many applications and
algorithms evolve in Machine Learning day to day. One such application found in journals is house
price prediction. House prices are increasing every year which has necessitated the modeling of
house price prediction. These models constructed, help the customers to purchase a house suitable
for their need. Proposed work makes use of the attributes or features of the houses such as number
of bedrooms available in the house, age of the house, travelling facility from the location, school
facility available nearby the houses and Shopping malls available nearby the house location. House
availability based on desired features of the house and house price prediction are modeled in the
proposed work and the model is constructed for a small town in West Godavari district of
Andhrapradesh. The work involves decision tree classification, decision tree regression and
multiple linear regression and is implemented using Scikit-Learn Machine Learning Tool.
Paper 3
Author: Vaishnavi Aghav, Dhanashree Avhad, Supriya Nanaware, Rakesh Gudekar, Prof.
Mahendra Pawar
Year: 2023
Abstract:
House price prediction using machine learning is a popular topic in the field of data science
and artificial intelligence. With the increasing availability of data and the advancements in the
machine learning algorithms, predicting house prices accurately has become more feasible than ever
before. In this study, we aim to predict house prices using variety of machine learning techniques
decision tree, Lasso, Linear regression analysis. We will also explore feature selection and
engineering techniques to identify the most important variables that influence house prices. By
analyzing and predicting house prices accurately, we can provide valuable insights for buyers,
sellers, and real estate agents, enabling them to make informed decisions based on the current
market conditions. Our results show that machine learning algorithms can accurately predict house
prices and outperform traditional regression methods.
Paper 4
Title: Prediction of House Pricing Using Machine Learning with Python
Author: Mansi Jain, Himani Rajput, Neha Garg, Pronika Chawla
Year: 2020
Abstract:
This paper provides an overview about how to predict house costs utilizing different
regression methods with the assistance of python libraries. The proposed technique considered the
more refined aspects used for the calculation of house price and provide the more accurate
prediction. It also provides a brief about various graphical and numerical techniques which will be
required to predict the price of a house. This paper contains what and how the house pricing model
works with the help of machine learning and which dataset is used in our proposed model.
Paper 5
Author: MS.A.VIDHYAVANI, O.BHARGAV SATHWIK, HEMANTH.T, VISHNU VARDHAN
YADAV.M.
Year: 2021
Abstract:
This project provides us an overview on how to predict house prices using various machine
learning models with the help of different python libraries. This proposed model considers as the
most accurate model used for calculating the house price and provides a most accurate prediction.
This provides a brief introduction which will be needed to predict the house price. This project
consists of what and how the house price model works with the assistance of machine learning
technique using scikit-learn and which datasets we will be using in our proposed model. Predicting
the price of a house helps determine the selling price of the house in a particular region, and it
helps people find the right time to buy a home. In this task on House Price Prediction using
machine learning, our task is to use data to create a machine learning model to predict house prices
in the given region. We will implement a linear regression algorithm on our dataset. By using real
world data entities, we are going to predict the price of the house in that area. For better results we
require data pre-processing units to improve the efficiency of the model. For this project we are
using supervised learning, which is a part of machine learning. We have to go through different
attributes of the dataset.
2.2 EXISTING SYSTEM

In the existing system for house price prediction using machine learning, several
methodologies and algorithms have been developed and utilized to tackle the complexity of
predicting property values. Traditional regression models such as linear regression have been
commonly employed, leveraging features like location, size, and amenities to estimate house
prices. Decision tree-based algorithms like random forests have also been popular due to their
ability to handle non-linear relationships and feature interactions effectively.
Moreover, advanced techniques such as gradient boosting machines (GBM) and eXtreme Gradient
Boosting (XGBoost) have gained prominence for their superior predictive power and ability to
handle large datasets with high dimensionality. Support Vector Machines (SVM) and neural
networks, including deep learning architectures like convolutional neural networks (CNN) and
recurrent neural networks (RNN), have been explored for their capability to capture intricate
patterns in the data.
Data preprocessing techniques such as feature scaling, handling missing values, and encoding
categorical variables have been crucial in preparing the data for modeling. Additionally, model
evaluation using metrics like RMSE, MAE, and R-squared has been employed to assess the
accuracy and performance of the predictive models.
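The three preprocessing steps named above (handling missing values, feature scaling, and encoding categorical variables) can be sketched in a few lines. This is a standard-library illustration under assumed inputs: the `area` and `location` columns, their values, and the helper names are hypothetical and do not come from the project's dataset.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(categories):
    """Return (levels, rows); each row is a 0/1 vector in sorted level order."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


# Hypothetical raw columns with one missing area value.
area = [1000, None, 2000, 1500, 3000]
location = ["urban", "rural", "urban", "suburban", "rural"]

area_filled = impute_median(area)
area_scaled = min_max_scale(area_filled)
levels, location_encoded = one_hot_encode(location)

print(area_filled)
print(area_scaled)
print(levels, location_encoded)
```

scikit-learn offers equivalent building blocks (`SimpleImputer`, `MinMaxScaler`, `OneHotEncoder`), which are usually preferable because they remember the statistics learned on the training data and reapply them to new data.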
Overall, the existing system for house price prediction using machine learning encompasses a
range of methodologies and algorithms tailored to address the challenges inherent in predicting
property values accurately. Continued research and development in this area aim to further enhance
the accuracy and reliability of house price predictions, benefiting stakeholders in the real estate
industry and related sectors.
2.2.1 DISADVANTAGES
• Some advanced machine learning techniques, such as deep learning models like CNNs and
RNNs, can be highly complex and lack interpretability.
• Ensuring data quality and integrity remains a significant challenge in the existing system.
• Implementing advanced machine learning algorithms, especially those that involve large
datasets or complex models, often requires substantial computational resources.
• Together, these limitations can lead to poor performance in real-world scenarios and
undermine the reliability of house price predictions.
2.3 PROPOSED SYSTEM:

The proposed system for house price prediction using machine learning integrates several
advanced strategies to overcome limitations and enhance prediction accuracy. It employs a
combination of sophisticated algorithms such as eXtreme Gradient Boosting (XGBoost), Gradient
Boosting Machines (GBM), and ensemble methods like Random Forests, chosen for their ability to
handle complex relationships and high-dimensional data effectively. Extensive feature engineering
techniques are applied to extract meaningful insights, including handling categorical variables and
incorporating domain knowledge. Robust data preprocessing ensures high-quality data for training,
while model interpretability techniques like feature importance analysis and SHAP values enhance
understanding and trust. Automated hyperparameter tuning optimizes model performance, and
ensemble learning further improves accuracy and generalization. Continuous monitoring and model
updating mechanisms ensure ongoing effectiveness and relevance. This comprehensive approach
aims to deliver accurate and reliable house price predictions in the real estate domain.
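To make the boosting idea behind GBM and XGBoost concrete, the following from-scratch sketch fits a sequence of one-split decision stumps, each trained on the residuals left by the stumps before it. It is a toy illustration for squared-error loss on a single feature, not the XGBoost library and not the project's implementation; the data, `n_rounds`, and `learning_rate` values are assumptions chosen for readability.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm
                                      for t, lm, rm in stumps)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: floor area versus price (arbitrary units).
area = [800, 1000, 1200, 1500, 1800, 2200]
price = [40, 48, 55, 70, 82, 100]

baseline_mse = mse(price, [mean(price)] * len(price))
model = fit_boosted_stumps(area, price)
boosted_mse = mse(price, [predict(model, x) for x in area])

print(f"baseline MSE={baseline_mse:.2f} boosted MSE={boosted_mse:.4f}")
```

The comparison here is on training data only, so it shows how boosting reduces error step by step rather than how well the model generalizes. A held-out test set or cross-validation would be needed for that.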
2.3.1 ADVANTAGES:
• The use of advanced algorithms like XGBoost, GBM, and ensemble methods results in
significantly higher prediction accuracy compared to traditional models.
• The system's capability to handle complex relationships and high-dimensional data allows
it to capture intricate patterns in the real estate market.
• This leads to more nuanced and insightful predictions, considering diverse factors that
influence property values.
• Robust data preprocessing techniques ensure high-quality data for training the machine
learning models.
• By addressing issues like noise, outliers, and missing values effectively, the system
enhances the reliability and trustworthiness of its predictions, improving decision-making
processes.
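As one possible way to address the outliers mentioned above, the sketch below applies the common interquartile range (IQR) rule, which drops values lying more than 1.5 IQRs outside the quartiles. The rule, the multiplier `k`, and the sample prices are assumptions for illustration; the report does not state which outlier method the system uses.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical prices with one implausibly large entry.
prices = [50, 52, 55, 58, 60, 61, 63, 500]
cleaned = remove_iqr_outliers(prices)
print(cleaned)
```

Whether an extreme value is an error or a genuine luxury property is a domain judgment, so in practice flagged rows are often reviewed rather than removed automatically.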
CHAPTER 3
REQUIREMENTS SPECIFICATION
3.1 INTRODUCTION
The requirements specification is a technical specification of requirements for the software
product. It is the first step in the requirements analysis process, and it lists the requirements of a
software system, including functional, performance, and security requirements. The requirements
also provide usage scenarios from user, operational, and administrative perspectives. The
purpose of the software requirements specification is to provide a detailed overview of the software
project, its parameters, and its goals. It describes the project's target audience and its user interface,
hardware, and software requirements. It defines how the client, team, and audience see the project
and its functionality.
3.2. INPUT DESIGN
The input design is the link between the information system and the user. It comprises
developing specifications and procedures for data preparation, that is, the steps necessary to put
transaction data into a usable form for processing. This can be achieved by instructing the computer to
read data from a written or printed document, or by having people key the data
directly into the system. The design of input focuses on controlling the amount of input required,
controlling errors, avoiding delay, avoiding extra steps, and keeping the process simple. The
input is designed so that it provides security and ease of use while retaining
privacy. Input design considers the following:
• What data should be given as input?
• How should the data be arranged or coded?
• The dialog to guide the operating personnel in providing input.
• Methods for preparing input validations and steps to follow when errors occur.
3.2.1. OBJECTIVES
1. Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and
to show management the correct direction for getting correct information from the
computerized system.
2. It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of designing input is to make data entry easier and free from
errors. The data entry screen is designed in such a way that all data manipulations can be
performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed so that the user is never
left confused. Thus, the objective of input design is to create an input layout that is easy to
follow.
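The validity checks and user messages described in these objectives might look like the following sketch. The field names (`area_sqft`, `bedrooms`, `location`), the allowed location set, and the message wording are all hypothetical, chosen only to show the pattern of checking each field and collecting every error before reporting back to the user.

```python
import math
from dataclasses import dataclass

KNOWN_LOCATIONS = {"urban", "suburban", "rural"}  # hypothetical category set


@dataclass
class ListingInput:
    area_sqft: float
    bedrooms: int
    location: str


def validate_listing(raw):
    """Validate a dict of raw strings; return (ListingInput or None, errors)."""
    errors = []
    area = bedrooms = None

    try:
        area = float(raw.get("area_sqft", ""))
        if not math.isfinite(area) or area <= 0:
            errors.append("area_sqft must be a positive number")
    except (TypeError, ValueError):
        errors.append("area_sqft must be a number")

    try:
        bedrooms = int(raw.get("bedrooms", ""))
        if bedrooms < 0:
            errors.append("bedrooms cannot be negative")
    except (TypeError, ValueError):
        errors.append("bedrooms must be a whole number")

    location = str(raw.get("location", "")).strip().lower()
    if location not in KNOWN_LOCATIONS:
        errors.append("location must be one of: "
                      + ", ".join(sorted(KNOWN_LOCATIONS)))

    if errors:
        return None, errors
    return ListingInput(area, bedrooms, location), []


listing, listing_errors = validate_listing(
    {"area_sqft": "1200", "bedrooms": "3", "location": "Urban"})
rejected, rejected_errors = validate_listing(
    {"area_sqft": "abc", "bedrooms": "2.5", "location": "moon"})

print(listing, listing_errors)
print(rejected, rejected_errors)
```

Returning all errors at once, rather than stopping at the first, lets the data entry screen show the user everything that needs correcting in a single pass.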
3.3 HARDWARE AND SOFTWARE SPECIFICATION
3.3.1 HARDWARE REQUIREMENTS
• Processor Type : Intel Core i5 or i7.
• Processor Speed : 1.1 GHz.
• RAM : 8 GB.
3.3.2 SOFTWARE REQUIREMENTS
• Operating System : Windows 7.
• Programming Language : Python (version 3).
• Libraries : NumPy, Pandas, Matplotlib, scikit-learn.
• IDE : Jupyter Notebook, VS Code, PyCharm.
3.4 TECHNOLOGIES USED
3.3.3 MATLAB
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where problems and
solutions are expressed in familiar mathematical notation. Typical uses include:
• Math and computation
• Algorithm development
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization
• Scientific and engineering graphics
• Application development, including graphical user interface building
MATLAB is an interactive system whose basic data element is an array that does not
require dimensioning. This allows you to solve many technical computing problems, especially
those with matrix and vector formulations, in a fraction of the time it would take to write a
program in a scalar noninteractive language such as C or Fortran.
The name MATLAB stands for matrix laboratory. MATLAB was originally written
to provide easy access to matrix software developed by the LINPACK and EISPACK projects.
Today, MATLAB uses software developed by the LAPACK and ARPACK projects, which
together represent the state-of-the-art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and advanced
courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for
high-productivity research, development, and analysis.
3.3.4 Toolboxes
MATLAB features a family of application-specific solutions called toolboxes. Very
important to most users of MATLAB, toolboxes allow you to learn and apply specialized
technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that
extend the MATLAB environment to solve particular classes of problems. Areas in which
toolboxes are available include signal processing, control systems, neural networks, fuzzy logic,
wavelets, simulation, and many others.
3.3.5 The MATLAB System

The MATLAB system consists of five main parts:
Development Environment.
This is the set of tools and facilities that help you use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
Command Window, a command history, and browsers for viewing help, the workspace, files,
and the search path.
The MATLAB Mathematical Function Library.

This is a vast collection of computational algorithms ranging from elementary functions
like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix
inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
The MATLAB language.

This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both “programming
in the small” to rapidly create quick and dirty throw-away programs, and “programming in the
large” to create complete large and complex application programs.
Handle Graphics®.
This is the MATLAB graphics system. It includes high-level commands for two-
dimensional and three-dimensional data visualization, image processing, animation, and
presentation graphics. It also includes low-level commands that allow you to fully customize the
appearance of graphics as well as to build complete graphical user interfaces on your MATLAB
applications.
The MATLAB Application Program Interface (API).

This is a library that allows you to write C and Fortran programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling
MATLAB as a computational engine, and for reading and writing MAT-files.
Image Processing in MATLAB
Images and Pictures
As we mentioned in the preface, human beings are
predominantly visual creatures: we rely heavily on our vision to make sense of the world around
us. We not only look at things to identify and classify them, but we can scan for divergences, and
obtain an overall rough feeling for a scene with a quick glance.
Humans have evolved very precise visual skills: we can identify a face in an instant; we can
differentiate colors; we can process a large amount of visual information very quickly.
However, the world is in constant motion: stare at something for long enough and it will change
in some way. Even a large solid structure, like a building or a mountain, will change its
appearance depending on the time of day (day or night); amount of sunlight (clear or cloudy), or
various shadows falling upon it. We are concerned with single images: snapshots, if you like, of
a visual scene. Although image processing can deal with changing scenes, we shall not discuss it
in any detail in this text. For our purposes, an image is a single picture which represents
something. It may be a picture of a person, of people or animals, or of an outdoor scene, or a
microphotograph of an electronic component, or the result of medical imaging. Even if the
picture is not immediately recognizable, it will not be just a random blur.
3.3.4. DEVELOPMENT ENVIRONMENT
This chapter provides a brief introduction to starting and quitting MATLAB, and
the tools and functions that help you to work with MATLAB variables and files. For more
information about the topics covered here, see the corresponding topics under Development
Environment in the MATLAB documentation, which is available online as well as in print.
Starting and Quitting MATLAB

Starting MATLAB
On a Microsoft Windows platform, to start MATLAB, double-click the
MATLAB shortcut icon on your Windows desktop.
On a UNIX platform, to start MATLAB, type matlab at the operating system
prompt. After starting MATLAB, the MATLAB desktop opens - see MATLAB Desktop.
We can change the directory in which MATLAB starts, define startup options including running
a script upon startup, and reduce startup time in some situations.
Quitting MATLAB
To end your MATLAB session, select Exit MATLAB from the File menu in the
desktop, or type quit in the Command Window. To execute specified functions each time
MATLAB quits, such as saving the workspace, you can create and run a finish script.
MATLAB Desktop
When you start MATLAB, the MATLAB desktop appears, containing tools
(graphical user interfaces) for managing files, variables, and applications associated with
MATLAB.
The first time MATLAB starts, the desktop appears as shown in the following
illustration, although your Launch Pad may contain different entries. Change the way your
desktop looks by opening, closing, moving, and resizing the tools in it. You can also move tools
outside of the desktop or return them back inside the desktop (docking). All the desktop tools
provide common features such as context menus and keyboard shortcuts.
We can specify certain characteristics for the desktop tools by selecting
Preferences from the File menu. For example, you can specify the font characteristics for
Command Window text. For more information, click the Help button in the Preferences dialog
box.
Desktop Tools
This section provides an introduction to MATLAB's desktop tools. You can also
use MATLAB functions to perform most of the features found in the desktop tools. The tools
are:
 Current Directory Browser
 Workspace Browser
 Array Editor
 Editor/Debugger
 Command Window
 Command History
 Launch Pad
 Help Browser
Command Window
Use the Command Window to enter variables and run functions and M-files.
Command History
Lines you enter in the Command Window are logged in the Command History
window. In the Command History, you can view previously used functions, and copy and
execute selected lines. To save the input and output from a MATLAB session to a file, use the
diary function.
Running External Programs
You can run external programs from the MATLAB Command Window. The
exclamation point character (!) is a shell escape and indicates that the rest of the input line is a
command to the operating system. This is useful for invoking utilities or running other programs
without quitting MATLAB. On Linux, for example, !emacs magik.m invokes an editor called
emacs for a file named magik.m. When you quit the external program, the operating system
returns control to MATLAB.
Launch Pad
MATLAB's Launch Pad provides easy access to tools, demos, and
documentation.
Help Browser
Use the Help browser to search and view documentation for all your MathWorks
products. The Help browser is a Web browser integrated into the MATLAB desktop that
displays HTML documents.
To open the Help browser, click the help button in the toolbar, or type helpbrowser
in the Command Window. The Help browser consists of two panes, the Help Navigator,
which you use to find information, and the display pane, where you view the information.
Help Navigator
Use the Help Navigator to find information. It includes:
Product filter - Set the filter to show documentation only for the products you specify.
Contents tab - View the titles and tables of contents of documentation for your products.
Index tab - Find specific index entries (selected keywords) in the MathWorks documentation
for your products.
Search tab - Look for a specific phrase in the documentation. To get help for a specific function,
set the Search type to Function Name.
Favorites tab - View a list of documents you previously designated as favorites.
Display Pane - After finding documentation using the Help Navigator, view it in the display
pane. While viewing the documentation, you can:
Browse to other pages - Use the arrows at the tops and bottoms of the pages or use the
back and forward buttons in the toolbar.
Bookmark pages - Click the Add to Favorites button in the toolbar.
Print pages - Click the print button in the toolbar.
Find a term in the page - Type a term in the Find in page field in the toolbar and click Go.
Other features available in the display pane are: copying information, evaluating a selection, and
viewing Web pages.
Current Directory Browser
MATLAB file operations use the current directory and the search path as
reference points. Any file you want to run must either be in the current directory or on the search
path.
Search Path
To determine how to execute functions you call, MATLAB uses a search path to
find M-files and other MATLAB-related files, which are organized in directories on your file
system. Any file you want to run in MATLAB must reside in the current directory or in a
directory that is on the search path. By default, the files supplied with MATLAB and
MathWorks toolboxes are included in the search path.
Workspace Browser
The MATLAB workspace consists of the set of variables (named arrays) built up
during a MATLAB session and stored in memory. You add variables to the workspace by using
functions, running M-files, and loading saved workspaces.
To view the workspace and information about each variable, use the Workspace
browser, or use the functions who and whos.
To delete variables from the workspace, select the variable and select Delete from
the Edit menu. Alternatively, use the clear function.
The workspace is not maintained after you end the MATLAB session. To save the
workspace to a file that can be read during a later MATLAB session, select Save Workspace As
from the File menu, or use the save function. This saves the workspace to a binary file called a
MAT-file, which has a .mat extension. There are options for saving to different formats. To read
in a MAT-file, select Import Data from the File menu, or use the load function.
Array Editor
Double-click on a variable in the Workspace browser to see it in the Array Editor.
Use the Array Editor to view and edit a visual representation of one- or two-dimensional numeric
arrays, strings, and cell arrays of strings that are in the workspace.
Editor/Debugger
Use the Editor/Debugger to create and debug M-files, which are programs you
write to run MATLAB functions. The Editor/Debugger provides a graphical user interface for
basic text editing, as well as for M-file debugging.
You can use any text editor to create M-files, such as Emacs, and can use
preferences (accessible from the desktop File menu) to specify that editor as the default. If you
use another editor, you can still use the MATLAB Editor/Debugger for debugging, or you can
use debugging functions, such as dbstop, which sets a breakpoint.
If you just need to view the contents of an M-file, you can display it in the
Command Window by using the type function.
3.3.5. MANIPULATING MATRICES
Entering Matrices
The best way for you to get started with MATLAB is to learn how to handle
matrices. Start MATLAB and follow along with each example.
You can enter matrices into MATLAB in several different ways:
• Enter an explicit list of elements.
• Load matrices from external data files.
• Generate matrices using built-in functions.
• Create matrices with your own functions in M-files.
Start by entering Dürer's matrix as a list of its elements. You have only to follow a few basic
conventions:
• Separate the elements of a row with blanks or commas.
• Use a semicolon, ; , to indicate the end of each row.
• Surround the entire list of elements with square brackets, [ ].
To enter Dürer's matrix, simply type in the Command Window
A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

MATLAB displays the matrix you just entered.
A =
    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1
This exactly matches the numbers in the engraving. Once you have entered the matrix, it is
automatically remembered in the MATLAB workspace. You can refer to it simply as A.
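For readers following along in Python (the language of the appendix notebook), the same matrix can be entered as a nested list. The sketch below is only an illustration and is not part of the MATLAB session; the sum checks rely on the well-known fact that Dürer's matrix is a magic square, which makes them a quick way to confirm that no element was mistyped.

```python
# Dürer's matrix entered row by row, mirroring the MATLAB statement above.
A = [
    [16, 3, 2, 13],
    [5, 10, 11, 8],
    [9, 6, 7, 12],
    [4, 15, 14, 1],
]

# Sums of each row, each column, and the two diagonals.
row_sums = [sum(row) for row in A]
col_sums = [sum(col) for col in zip(*A)]
diag_sums = [
    sum(A[i][i] for i in range(4)),      # main diagonal
    sum(A[i][3 - i] for i in range(4)),  # anti-diagonal
]

print(row_sums, col_sums, diag_sums)
```

Every entry printed is 34, so the typed values are consistent with the engraving.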
Expressions
Like most other programming languages, MATLAB provides mathematical
expressions, but unlike most programming languages, these expressions involve entire matrices.
The building blocks of expressions are:
• Variables
• Numbers
• Operators
• Functions
Variables
MATLAB does not require any type declarations or dimension statements. When
MATLAB encounters a new variable name, it automatically creates the variable and allocates the
appropriate amount of storage. If the variable already exists, MATLAB changes its contents and,
if necessary, allocates new storage. For example,
num_students = 25
creates a 1-by-1 matrix named num_students and stores the value 25 in its single element.
Variable names consist of a letter, followed by any number of letters, digits, or underscores.
MATLAB uses only the first 31 characters of a variable name. MATLAB is case sensitive;
it distinguishes between uppercase and lowercase letters. A and a are not the same variable.
To view the matrix assigned to any variable, simply enter the variable name.
Numbers
MATLAB uses conventional decimal notation, with an optional decimal point and
leading plus or minus sign, for numbers. Scientific notation uses the letter e to specify a power-of-ten
scale factor. Imaginary numbers use either i or j as a suffix. Some examples of legal
numbers are
3 -99 0.0001
9.6397238 1.60210e-20 6.02252e23
1i -3.14159j 3e5i
All numbers are stored internally using the long format specified by the IEEE
floating-point standard. Floating-point numbers have a finite precision of roughly 16 significant
decimal digits and a finite range of roughly 10^-308 to 10^+308.
Operators
Expressions use familiar arithmetic operators and precedence rules.
+ Addition
- Subtraction
* Multiplication
/ Division
\ Left division (described in "Matrices and Linear Algebra" in
Using MATLAB)
^ Power
' Complex conjugate transpose
() Specify evaluation order
Functions
MATLAB provides a large number of standard elementary mathematical
functions, including abs, sqrt, exp, and sin. Taking the square root or logarithm of a negative
number is not an error; the appropriate complex result is produced automatically. MATLAB also
provides many more advanced mathematical functions, including Bessel and gamma functions.
Most of these functions accept complex arguments. For a list of the elementary mathematical
functions, type help elfun. Some of the functions, like sqrt and sin, are built in. They are part of the
MATLAB core, so they are very efficient, but the computational details are not readily
accessible. Other functions, like gamma and sinh, are implemented in M-files. You can see the
code and even modify it if you want. Several special functions provide values of useful
constants.
pi        3.14159265...
i         Imaginary unit, √-1
j         Same as i
eps       Floating-point relative precision, 2^-52
realmin   Smallest positive normalized floating-point number, 2^-1022
realmax   Largest floating-point number, (2-ε)·2^1023
Inf       Infinity
NaN       Not-a-number
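The floating-point constants above come from the IEEE double-precision format, not from MATLAB itself, so they can be cross-checked in any language that uses the same format. The sketch below does so in Python, assuming the interpreter uses IEEE doubles (true on common platforms); the variable names simply mirror the MATLAB names.

```python
import sys

info = sys.float_info      # properties of the platform's double-precision float

eps = info.epsilon         # gap between 1.0 and the next larger float
realmin = info.min         # smallest positive normalized float
realmax = info.max         # largest finite float

# Compare against the closed-form values listed in the table.
print(eps == 2.0 ** -52)
print(realmin == 2.0 ** -1022)
print(realmax == (2.0 - eps) * 2.0 ** 1023)
```

On an IEEE-754 platform all three comparisons print True.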
3.3.6. IMAGE PROCESSING
3.3.6.1. Definition
Image processing involves changing the nature of an image in order to either
1. Improve its pictorial information for human interpretation, or
2. Render it more suitable for autonomous machine perception.
Image processing finds applications in such fields as photography, satellite imaging, medical
imaging, and image compression, just to name a few. In the past, image processing was largely
done using analog devices. However, as computers have become more powerful, processing
shifted toward the digital domain. Like one-dimensional digital signal processing, digital image
processing overcomes traditional analog “problems” such as noise, distortion during processing,
inflexibility of system to change, and difficulty of implementation.
Generally, image processing consists of several stages: image import, analysis, manipulation, and
image output. There are two methods of image processing: digital and analogue. This section
focuses on digital image processing and its techniques. Computer algorithms
play a crucial role in digital image processing. Developers use multiple algorithms to solve
different tasks, including digital image detection, analysis, reconstruction, restoration, image data
compression, image enhancement, image estimation and image spectral estimation.
Major techniques of digital image processing are as follows:
Image Editing, which means altering digital images by means of graphic software tools.
Image Restoration, which refers to estimating a clean original image from a corrupted
image in order to recover the lost information.
Independent Component Analysis, which computationally separates a multivariate signal into
additive subcomponents.
Anisotropic Diffusion, often known as Perona-Malik Diffusion, which makes it possible to
reduce image noise without removing important parts of the image.
Linear Filtering, which refers to processing
time-varying input signals and producing output signals that are subject to the constraint of
linearity.
Neural Networks, which are computational models widely used in machine learning for solving
various tasks.
Pixelation, which often refers to turning printed images into digitized ones (such as GIF).
Principal Components Analysis, a digital image processing technique that can be used
for feature extraction.
Partial Differential Equations, which are also used for effectively de-noising images.
Hidden Markov Models, a technique used for image analysis in two dimensions.
Wavelets, which are mathematical functions used in image compression.
Self-organizing Maps, a digital image processing technique for classifying images into a
number of classes.
3.3.6.2. Advantages of image processing:
1) Removes noise.
2) Corrects image density and contrast.
3) Makes images easy to store and retrieve on computers.
4) Images can be made available in any desired format, such as black and white or negative.
3.3.6.3. Disadvantages of image processing:
1) The initial cost is high, depending on the system used.
2) Once the system is damaged, the image will be lost.
CHAPTER 4
SYSTEM DESIGN
4.1 ARCHITECTURE DIAGRAM:
Fig 4.1 Architecture diagram
4.2 Sequence Diagram:
Fig 4.2 Sequence diagram
4.3 Use Case Diagram:
Fig 4.3 Use case diagram
4.4 Activity Diagram:
Fig 4.4 Activity diagram
4.5 Collaboration Diagram:
[Diagram labels: User, System, Train Dataset, Test Dataset, Model Deploy, Linear Regression, Build Model, Predicted Output]
Fig 4.5 Collaboration Diagram
CHAPTER 5
SYSTEM DESIGN - IMPLEMENTATION
5.1 MODULES:
• Data Collection
• Data Pre-Processing
• Feature Engineering
• Model Training and Optimization
• Model Evaluation
• Deployment
• Monitoring and Maintenance
5.2 Module Explanation
In a house price prediction system using machine learning, several modules are typically
employed to facilitate various stages of the prediction process. Here are the key modules
commonly used:
5.2.1 Data Collection Module:

This module is responsible for gathering data from diverse sources such as real estate
databases, APIs, web scraping, or user inputs. It ensures a comprehensive dataset for training
and prediction.
5.2.2 Data Preprocessing Module:

The data preprocessing module handles tasks like cleaning the data (handling missing values,
outliers), transforming features (scaling, encoding categorical variables), and splitting the data
into training and testing sets.
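The preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the five listings, the column choice (area, city, price), and the 80/20 split ratio are all hypothetical.

```python
import random
from statistics import median

# Hypothetical listings: (area in sq ft, city, price in lakhs); None marks a missing value.
rows = [
    (1200.0, "Chennai", 55.0),
    (None, "Bangalore", 80.0),
    (900.0, "Chennai", 40.0),
    (1500.0, "Mumbai", 120.0),
    (1100.0, "Bangalore", 75.0),
]

# 1. Cleaning: fill missing areas with the median of the observed areas.
observed = [r[0] for r in rows if r[0] is not None]
fill = median(observed)
areas = [r[0] if r[0] is not None else fill for r in rows]

# 2. Scaling: min-max scale the area to the range [0, 1].
lo, hi = min(areas), max(areas)
scaled = [(a - lo) / (hi - lo) for a in areas]

# 3. Encoding: one-hot encode the categorical city column.
cities = sorted({r[1] for r in rows})
one_hot = [[1 if r[1] == c else 0 for c in cities] for r in rows]

# 4. Assemble feature vectors and split 80/20 into training and test sets.
X = [[s] + oh for s, oh in zip(scaled, one_hot)]
y = [r[2] for r in rows]
indices = list(range(len(rows)))
random.Random(100).shuffle(indices)  # fixed seed for a repeatable split
cut = int(0.8 * len(indices))
X_train = [X[i] for i in indices[:cut]]
y_train = [y[i] for i in indices[:cut]]
X_test = [X[i] for i in indices[cut:]]
y_test = [y[i] for i in indices[cut:]]
```

In practice the fill value and the scaling bounds are usually computed from the training split only, so that no information from the test set leaks into training; the sketch keeps the order used in the description for simplicity.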
5.2.3 Feature Engineering Module:
Feature engineering involves creating new features, selecting relevant features, and
transforming data to extract meaningful information that can enhance prediction accuracy.
Techniques like PCA (Principal Component Analysis) or polynomial features may be used
here.
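As one illustration of the polynomial features mentioned above, the sketch below expands a feature vector with all products of its entries up to degree 2. The helper name polynomial_features and the sample values are hypothetical; a library implementation would normally be used in a real project.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_features(row, degree=2):
    """Return all products of the inputs up to the given degree (no bias term)."""
    features = []
    for d in range(1, degree + 1):
        # Each combination of column indices (with repetition) yields one product term.
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features

# Hypothetical sample: [bedrooms, area in thousands of sq ft]
sample = [2.0, 3.0]
expanded = polynomial_features(sample)
print(expanded)  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

For inputs a and b the output order is a, b, a², ab, b², which lets a linear model capture simple interactions such as bedrooms times area.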
5.2.4 Model Training and Optimization Module:

This module involves selecting appropriate machine learning algorithms (e.g., linear
regression, decision trees, random forests, gradient boosting machines) and training them on
the preprocessed data. Hyperparameter tuning techniques like grid search or random search are
used for model optimization.
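Grid search can be sketched without any library: fit the model once per candidate hyperparameter value and keep the value with the lowest validation error. The example below assumes a one-feature ridge regression (linear regression with an L2 penalty alpha on the slope); the data points, the candidate grid, and the helper names are hypothetical.

```python
from statistics import mean

def fit_ridge_1d(xs, ys, alpha):
    """Fit y = slope * x + intercept with an L2 penalty `alpha` on the slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_bar - slope * x_bar

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: area (thousands of sq ft) against price (lakhs).
x_train, y_train = [0.8, 1.0, 1.2, 1.5, 2.0], [38.0, 52.0, 58.0, 77.0, 98.0]
x_val, y_val = [0.9, 1.8], [45.0, 90.0]

# Grid search: evaluate every candidate alpha on the validation set.
grid = [0.0, 0.1, 1.0, 10.0]
scores = {a: mse(fit_ridge_1d(x_train, y_train, a), x_val, y_val) for a in grid}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

Random search follows the same pattern but samples candidate values instead of enumerating a fixed grid.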
5.2.5 Model Evaluation Module:

The model evaluation module assesses the performance of trained models using metrics such
as RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), R-squared, or MAPE
(Mean Absolute Percentage Error). Cross-validation techniques may also be applied to
validate model robustness.
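The four metrics named above follow directly from their definitions. The sketch below computes them in plain Python on a small set of hypothetical true and predicted prices; the function name regression_metrics is an assumption made for the example.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R-squared for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is zero; prices are assumed positive here.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": 1.0 - ss_res / ss_tot}

# Hypothetical prices in lakhs.
y_true = [50.0, 100.0, 150.0, 200.0]
y_pred = [55.0, 90.0, 150.0, 210.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For this data the errors are 5, -10, 0, and 10, giving MAE 6.25, RMSE 7.5, MAPE 6.25 percent, and R-squared 0.982.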
5.2.6 Deployment Module:

Once a satisfactory model is obtained, the deployment module facilitates integrating the
model into a production environment. This could involve creating APIs, building web
applications, or deploying the model as a standalone service for real-time predictions.
5.2.7 Monitoring and Maintenance Module:

The monitoring and maintenance module continuously monitors model performance in
production, detects anomalies or drifts in data distributions, and triggers retraining or updates
as needed to ensure sustained accuracy and reliability over time.
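One simple way to detect a shift in an input distribution is to compare the mean of a recent batch with the mean seen during training. The sketch below uses a z-score with an assumed threshold of 3; the function name, the threshold, and the area values are hypothetical, and production systems typically use richer statistical tests.

```python
from math import sqrt
from statistics import mean, stdev

def mean_shift_detected(reference, incoming, threshold=3.0):
    """Flag drift when the incoming mean is far from the reference mean."""
    # Standard error of the batch mean, using the reference spread as the estimate.
    standard_error = stdev(reference) / sqrt(len(incoming))
    z = abs(mean(incoming) - mean(reference)) / standard_error
    return z > threshold

# Hypothetical feature values: area in sq ft.
training_areas = [900, 1000, 1100, 1200, 1300, 1000, 1100, 1200]
stable_batch = [950, 1150, 1050, 1250]
shifted_batch = [1900, 2100, 2000, 2200]

print(mean_shift_detected(training_areas, stable_batch))   # False
print(mean_shift_detected(training_areas, shifted_batch))  # True
```

A True result would be the trigger for the retraining or update step described above.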
CHAPTER 6
CODING STANDARDS
6.1 CODING STANDARDS
Coding standards are programming guidelines that focus on the physical structure and
appearance of the program. They make the code easier to read, understand, and maintain. This
phase of the system implements the blueprint developed during the design phase. The
coding specification should be such that any programmer is able to understand
the code and make changes whenever necessary. Some of the standards needed to
achieve the above-mentioned objectives are as follows:
• Programs should be simple, clear, and easy to understand.
• Naming conventions
• Value conventions
• Script and comment procedure
• Message box format
• Exception and error handling
6.1.1 NAMING CONVENTIONS
The names of classes, data members, member functions, procedures,
etc., should be self-descriptive. One should be able to infer the meaning and scope of a variable from
its name. The conventions are adopted so that the intended message is easily understood by the
user, so it is customary to follow them. These conventions are as follows:
Class names
Class names correspond to problem-domain terms, begin with a capital letter, and use
mixed case.
Member Function and Data Member name
Member function and data member names begin with a lowercase letter, with the first letter of each
subsequent word in uppercase and the remaining letters in lowercase.
6.1.2 VALUE CONVENTIONS
Value conventions ensure that variables hold valid values at any point in time. This involves the
following:
• Proper default values for the variables.
• Proper validation of values in the field.
• Proper documentation of flag values.
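The three value conventions above can be illustrated with a small input record. This is a sketch only: the class ListingInput, its fields, and the meaning assigned to the flag are assumptions made for the example (UNDER_CONSTRUCTION is a column in the appendix dataset, but its coding is assumed here).

```python
from dataclasses import dataclass

@dataclass
class ListingInput:
    """Hypothetical input record for the price predictor."""
    area_sqft: float = 0.0        # default value: area not yet provided
    bedrooms: int = 1             # default value: one bedroom
    under_construction: int = 0   # flag (assumed coding): 0 = ready to move, 1 = under construction

    def __post_init__(self):
        # Validation of field values runs every time a record is created.
        if self.area_sqft < 0:
            raise ValueError("area_sqft must be non-negative")
        if self.bedrooms < 1:
            raise ValueError("bedrooms must be at least 1")
        if self.under_construction not in (0, 1):
            raise ValueError("under_construction must be 0 or 1")

default_listing = ListingInput()
valid_listing = ListingInput(area_sqft=1200.0, bedrooms=3, under_construction=1)
```

Creating a record with an out-of-range value, for example ListingInput(bedrooms=0), raises a ValueError instead of letting the invalid value reach the model.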
6.1.3 SCRIPT WRITING AND COMMENTING STANDARD
Script writing is an art in which indentation is of utmost importance. Conditional and
looping statements are to be properly aligned to facilitate easy understanding. Comments are
included to minimize the number of surprises that could occur when going through the code.
6.1.4 MESSAGE BOX FORMAT
When something has to be communicated to the user, the user must be able to understand it
properly. To achieve this, a specific format has been adopted for displaying messages to the user.
The formats are as follows:
• X - User has performed an illegal operation.
• ! - Information to the user.
6.2 TEST PROCEDURE
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies, and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of tests. Each test type addresses a specific testing requirement.
Testing is one of the important steps in the software development phase. Testing checks
for errors; for the project as a whole, testing involves the following:
• Static analysis is used to investigate the structural properties of the source
code.
• Dynamic testing is used to investigate the behavior of the source code by
executing the program on the test data.
6.3 TEST DATA AND OUTPUT
6.3.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program
logic is functioning properly and that program inputs produce valid outputs. All decision
branches and internal code flow should be validated. It is the testing of individual software units
of the application. It is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at the component level and test a specific business process, application, and/or
system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected
results.
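A unit test of this kind can be sketched with Python's built-in unittest module. The function price_per_sqft below is a hypothetical unit invented for the example, not part of the project code; the two test methods cover one valid-input path and one invalid-input path.

```python
import unittest

def price_per_sqft(price_lacs, area_sqft):
    """Hypothetical unit under test: price per square foot in rupees."""
    if area_sqft <= 0:
        raise ValueError("area_sqft must be positive")
    return price_lacs * 100_000 / area_sqft  # 1 lakh = 100,000 rupees

class PricePerSqftTest(unittest.TestCase):
    def test_valid_input(self):
        # 50 lakhs over 1000 sq ft is 5000 rupees per sq ft.
        self.assertAlmostEqual(price_per_sqft(50.0, 1000.0), 5000.0)

    def test_invalid_area_is_rejected(self):
        with self.assertRaises(ValueError):
            price_per_sqft(50.0, 0.0)

# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PricePerSqftTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method exercises one decision branch of the unit, which matches the requirement that all branches be validated before integration.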
6.3.2 FUNCTIONAL TEST
Functional tests provide systematic demonstrations that functions tested are
available as specified by the business and technical requirements, system documentation, and
user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions,
or special test cases. In addition, systematic coverage of business
process flows, data fields, predefined processes, and successive processes must be
considered for testing. Before functional testing is complete, additional tests are identified
and the effective value of current tests is determined.
6.3.3 PERFORMANCE TEST

The performance test determines the amount of execution time spent in various parts of the unit, program
throughput, response time, and device utilization by the program unit.
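Execution time and throughput can be measured with the standard library alone. The sketch below times a hypothetical batch prediction function; predict_batch and its coefficients are placeholders made up for the example, and the measured numbers will vary from machine to machine.

```python
import time

def measure(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def predict_batch(areas):
    """Hypothetical stand-in for a trained model's prediction routine."""
    return [0.05 * a + 2.0 for a in areas]

batch = list(range(500, 10_500))          # 10,000 hypothetical inputs
elapsed = measure(predict_batch, batch)   # response time for one batch
# Throughput in predictions per second; guard against a zero timer reading.
throughput = len(batch) / elapsed if elapsed > 0 else float("inf")
print(f"{elapsed:.6f} s per batch, {throughput:.0f} predictions/s")
```

Taking the best of several runs reduces the influence of unrelated system activity on the measurement.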
6.3.4 STRESS TEST

Stress tests are tests designed to intentionally break the unit. A great deal can be
learned about the strengths and limitations of a program by examining the manner in which a
program unit breaks.
6.3.5 STRUCTURED TEST
Structured tests are concerned with exercising the internal logic of a program and
traversing execution paths. A white-box test strategy was employed to ensure
that the test cases guarantee that all independent paths within a module have been
exercised at least once. The test cases also:
• Exercise all logical decisions on their true and false sides.
• Execute all loops at their boundaries and within their operational bounds.
• Exercise internal data structures to assure their validity.
• Check attributes for their correctness.
• Handle end-of-file conditions, I/O errors, buffer problems, and textual
errors in output information.
6.3.6 INTEGRATION TESTING
Integration tests are designed to test integrated software components to determine if they
run as one program. Testing is event driven and is more concerned with the basic outcome of
screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.
6.3.7 SYSTEM TEST
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable results. An example of
system testing is the configuration-oriented system integration test. System testing is based on
process descriptions and flows, emphasizing pre-driven process links and integration points.
6.3.8 TESTING TECHNIQUES / TESTING STRATEGIES
a) TESTING
Testing is the process of executing a program with the intent of finding an error. A good
test case is one that has a high probability of finding an as-yet-undiscovered error. A successful
test is one that uncovers an as-yet-undiscovered error. System testing is the stage of
implementation aimed at ensuring that the system works accurately and efficiently as
expected before live operation commences. It verifies that the whole set of programs hangs
together. System testing consists of several key activities and steps for program,
string, and system testing, and is important in adopting a successful new system. This is the last
chance to detect and correct errors before the system is installed for user acceptance testing.
The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise the program or the project cannot be said to be complete. Software
testing is a critical element of software quality assurance and represents the ultimate review
of specification, design, and coding. Any
engineering product can be tested in one of two ways:
b) WHITE BOX TESTING
This testing is also called glass box testing. In this testing, by knowing the
internal operation of a product, tests can be conducted to ensure that “all gears mesh”, that is,
that the internal operation performs according to specification and all internal components have
been adequately exercised. It is a test case design method that uses the control structure of the
procedural design to derive test cases. Basis path testing is a white box testing technique.
Basis path testing:
• Flow graph notation
• Cyclomatic complexity
• Deriving test cases
• Graph matrices
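Cyclomatic complexity gives the number of independent paths that basis path testing must cover. It can be computed from a flow graph as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. The sketch below applies the formula to a hypothetical flow graph of a loop that contains an if/else; the node names are illustrative only.

```python
def cyclomatic_complexity(edges, components=1):
    """Compute V(G) = E - N + 2P for a flow graph given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components

# Hypothetical flow graph: a loop whose body is an if/else statement.
flow_graph = [
    ("entry", "loop"),
    ("loop", "if"),
    ("if", "then"),
    ("if", "else"),
    ("then", "join"),
    ("else", "join"),
    ("join", "loop"),   # back edge of the loop
    ("loop", "exit"),
]

complexity = cyclomatic_complexity(flow_graph)
print(complexity)  # 3
```

With 8 edges and 7 nodes the result is 3, so at least three test cases are needed to exercise every independent path of this graph.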
c) BLACK BOX TESTING

In this testing, by knowing the specific functions that a product has been designed to
perform, tests can be conducted that demonstrate each function is fully operational while at the
same time searching for errors in each function. It fundamentally
focuses on the functional requirements of the software.
The steps involved in black box test case design are:
• Graph based testing methods
• Equivalence partitioning
• Boundary value analysis
• Comparison testing
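Boundary value analysis and equivalence partitioning can be illustrated on a single numeric input. The sketch below assumes a hypothetical rule that the number of bedrooms must lie between 1 and 10; the rule, the limits, and the function names are invented for the example.

```python
def boundary_values(low, high, step=1):
    """Candidate inputs at and around the edges of the valid range [low, high]."""
    return [low - step, low, low + step, high - step, high, high + step]

def is_valid_bedrooms(n, low=1, high=10):
    """Hypothetical validation rule under test."""
    return low <= n <= high

# Boundary cases: just below, on, and just above each limit.
cases = boundary_values(1, 10)
results = {n: is_valid_bedrooms(n) for n in cases}
print(results)

# Equivalence partitions: one representative per class is enough.
partitions = {"below range": -5, "in range": 5, "above range": 50}
partition_results = {name: is_valid_bedrooms(v) for name, v in partitions.items()}
```

The boundary cases 0 and 11 are rejected while 1, 2, 9, and 10 are accepted, which is where off-by-one mistakes in a comparison would show up.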
d) SOFTWARE TESTING STRATEGIES:
A software testing strategy provides a road map for the software developer. Testing is a
set of activities that can be planned and conducted systematically. For this reason, a template for
software testing, that is, a set of steps into which specific test case design methods can be placed, should
be defined. A testing strategy should have the following characteristics:
• Testing begins at the module level and works “outward” toward the
integration of the entire computer-based system.
• Different testing techniques are appropriate at different points in time.
• The developer of the software and an independent test group conduct
testing.
• Testing and debugging are different activities, but debugging must
be accommodated in any testing strategy.
e) INTEGRATION TESTING:
Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to produce failures caused by interface defects.
The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level –
interact without error.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
f) PROGRAM TESTING:
Program testing points out logical and syntax errors. A syntax
error is an error in a program statement that violates one or more rules of the language in
which it is written. An improperly defined field dimension or omitted keywords are common
syntax errors. These errors are shown through error messages generated by the computer. A logic
error, on the other hand, deals with incorrect data fields, out-of-range items, and invalid
combinations. Since the compiler will not detect logical errors, the programmer must examine
the output. Condition testing exercises the logical conditions contained in a module. The possible
types of elements in a condition include a Boolean operator, a Boolean variable, a pair of
parentheses, a relational operator, or an arithmetic expression. The condition testing method focuses
on testing each condition in the program. The purpose of condition testing is to detect not only
errors in the conditions of a program but also other errors in the program.
g) SECURITY TESTING
Security testing attempts to verify that the protection mechanisms built into a system will,
in fact, protect it from improper penetration. The system's security must be tested for
invulnerability to frontal attack and must also be tested for invulnerability to rear attack. During
security testing, the tester plays the role of an individual who desires to penetrate the system.
h) VALIDATION TESTING
At the culmination of integration testing, software is completely assembled as a
package. Interfacing errors have been uncovered and corrected, and a final series of software tests,
validation testing, begins. Validation testing can be defined in many ways, but a simple definition
is that validation succeeds when the software functions in a manner that is reasonably expected by
the customer. Software validation is achieved through a series of black box tests that
demonstrate conformity with requirements. After a validation test has been conducted, one of two
conditions exists:
• The function or performance characteristics conform to specifications and are
accepted.
• A deviation from specification is uncovered and a deficiency list is created.
Deviations or errors discovered at this step in this project are corrected prior to
completion of the project with the help of the user, by negotiating to establish a method for
resolving deficiencies. Thus, the proposed system under consideration has been tested by using
validation testing and found to be working satisfactorily. Though there were deficiencies in the
system, they were not catastrophic.
i) USER ACCEPTANCE TESTING
User acceptance of the system is a key factor for the success of any system. The system under
consideration is tested for user acceptance by constantly keeping in touch with prospective
system users at the time of developing and making changes whenever required. This is done
with regard to the following points.
Test Results: All the test cases mentioned above passed successfully. No defects encountered.
6.4. OUTPUT DESIGN
A quality output is one that meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users and to
other systems through outputs. In output design, it is determined how the information is to be
displayed for immediate need and also as hard copy output. It is the most important and direct
source of information for the user. Efficient and intelligent output design improves the system's
relationship with the user and helps decision-making.
Designing computer output should proceed in an organized, well thought out manner; the right
output must be developed while ensuring that each output element is designed so that people will
find the system easy to use effectively. When analysts design computer output, they
should:
1. Identify the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information produced by the system.
The output form of an information system should accomplish one or more of the following
objectives.
• Convey information about past activities, current status, or projections of the
future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.
CHAPTER 7
SNAPSHOTS
7.1 DATA COLLECTION
7.2 DATA SELECTION
7.3 DATA CLEANING
7.4 DATA PREPROCESSING
7.5 OUTPUT
CHAPTER 8
CONCLUSION AND FUTURE ENHANCEMENTS
8.1 CONCLUSION
In summary, this study showcases the effectiveness of machine learning algorithms in
accurately and robustly predicting house prices. Through a systematic evaluation of various
algorithms and extensive feature engineering, we have demonstrated that models such as gradient
boosting machines excel in capturing complex relationships within housing markets, resulting in
superior predictive performance. The analysis of feature importance further enriches our
understanding of the key factors influencing house prices, providing valuable insights for stakeholders
in the real estate industry. By leveraging these insights, stakeholders can make informed decisions
regarding pricing strategies, investment opportunities, and market analysis, ultimately enhancing their
competitiveness and profitability.
Furthermore, this research contributes significantly to the advancement of house price prediction
methodologies, highlighting the potential of machine learning techniques to address the challenges of
accurately forecasting house prices in dynamic and heterogeneous real estate markets. Looking ahead,
continued research in this area, coupled with the integration of emerging technologies and data
sources, will further enhance the accuracy and applicability of machine learning models in real-world
scenarios, ultimately benefiting both industry practitioners and consumers alike.
8.2 FUTURE ENHANCEMENTS

Future enhancements for house price prediction using machine learning are poised to
revolutionize the real estate industry. One key area of development is the integration of advanced deep
learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks
(RNNs), to extract more intricate spatial and temporal patterns from property data. This can lead to
more accurate predictions, especially in dynamic markets where trends and preferences evolve
rapidly. Additionally, the incorporation of natural language processing (NLP) techniques can enhance
the analysis of unstructured data, such as property descriptions and customer reviews. Sentiment
analysis and topic modeling can provide valuable insights into market sentiments and buyer
preferences, further refining prediction models.
The utilization of augmented reality (AR) and virtual reality (VR) technologies is another
exciting avenue for future enhancements. These technologies can offer immersive experiences for
buyers, allowing them to virtually tour properties, visualize renovations, and assess neighborhood
amenities, all of which can influence pricing predictions.
Furthermore, the integration of blockchain technology for transparent and secure property
transactions, coupled with data from Internet of Things (IoT) devices for real-time market monitoring,
can create a more holistic and data-driven approach to house price prediction.
In summary, future enhancements in house price prediction using machine learning will
leverage advanced deep learning techniques, NLP, AR/VR technologies, blockchain, and IoT data to
enhance prediction accuracy, provide richer insights, and transform the real estate industry's decision-
making processes.
CHAPTER 9
REFERENCES
[1] Sahu, M., Singh, A., & Chawda, R. K. (n.d.-b). House Price Prediction using Machine Learning.
International Journal of Innovative Technology and Exploring Engineering, 8(9), 717–722.
https://doi.org/10.35940/ijitee.i7849.078919
[2] Thamarai, M., & Malarvizhi, S. (n.d.). House Price Prediction Modeling Using Machine Learning.
International Journal of Information Engineering and Electronic Business, 12(2), 15–20.
https://doi.org/10.5815/ijieeb.2020.02.03
[3] HOUSE PRICE PREDICTION USING MACHINE LEARNING. (2023). International Research
Journal of Modernization in Engineering Technology and Science.
https://doi.org/10.56726/irjmets35094
[4] Jain, M., Rajput, H., Garg, N., & Chawla, P. (2020). Prediction of House Pricing using Machine
Learning with Python. 2020 International Conference on Electronics and Sustainable
Communication Systems (ICESC). https://doi.org/10.1109/icesc48915.2020.9155839
[5] Kala, K. L., & Patel, S. S. (2018). House Price Prediction Using Machine Learning Techniques: A
Review.
[6] Salim, A. R., Aziz, M. A. Z., & Al Mamun, M. S. (2021). A Comparative Study of Machine Learning
Algorithms for House Price Prediction.
[7] Oyelade, O. O., Arulogun, O. R., & Olabiyisi, O. (2019). House Price Prediction Using Machine
Learning Techniques: A Case Study in Lagos, Nigeria.
[8] Morley, E., & Skitmore, M. J. (2018). Predicting House Prices with Machine Learning: A Case Study
of the Melbourne Property Market.
9.1 APPENDIX: CODING
JUPYTER NOTEBOOK
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
random.seed(100)
import warnings
warnings.filterwarnings("ignore")
df=pd.read_csv("train.csv")
pd.set_option("display.max_rows",1000)
pd.set_option("display.max_columns",1000)
## Let's check the head of the Data
df.head()
## Let's check the shape of the data
df.shape
## Let's check if any null values exist in our dataset or not
df.isnull().sum()
## The dataset seems clean
## Let's see the information about the dataset
df.info()
##Let's see how many categorical columns are there in our dataset
cat_cols=[col for col in df.columns if df[col].dtypes=="O"]
cat_cols
df[cat_cols]
## Let's see who posted each listing
df["POSTED_BY"].value_counts()
sns.countplot(x=df["POSTED_BY"])
## Let's see how the price of the house varies with who posted it
df.groupby("POSTED_BY")["TARGET(PRICE_IN_LACS)"].mean()
## As you can see, listings posted by a builder have a higher mean price.
## Let's see how many buildings are under construction
df["UNDER_CONSTRUCTION"].value_counts()
sns.countplot(x=df["UNDER_CONSTRUCTION"])
## Let's visualize the RERA column
## RERA - Real Estate (Regulation and Development) Act
## It was introduced to bring transparency to the real estate sector
df["RERA"].value_counts()
sns.countplot(x=df["RERA"])
df.groupby("RERA")["TARGET(PRICE_IN_LACS)"].mean().plot(kind="bar",grid=True)
## RERA-approved buildings have a higher mean price,
## because the Act aims to reduce project delays and mis-selling.
## Let's look at BHK_NO., which represents the number of rooms
df["BHK_NO."].value_counts()
## As we can see, the number of rooms ranges from 1 to 17
## But generally a house has no more than 6 rooms, as the distribution shows
## Values greater than 6 can be treated as potential outliers
## Let's remove these values from our dataset
df2=df[df["BHK_NO."]<7]
df2.shape
len(df2)/len(df)*100
df2["BHK_NO."].value_counts()
df2.groupby(["BHK_NO."])["TARGET(PRICE_IN_LACS)"].mean()
## As you can see, as the number of rooms increases, the price of the house decreases
## Let's see how many are RK and how many are BHK
df2["BHK_OR_RK"].value_counts()
## RK means Room and Kitchen: a single large room with an attached kitchen
## These types of houses are rare and were common in older days
df2[df2["BHK_OR_RK"]=="RK"]["BHK_NO."].value_counts()
## As you can see, these are only single rooms
## Hence it would be better to remove these values.
## After removing these values only BHK remains in that column, so it is better to drop the column
df3=df2[df2["BHK_OR_RK"]!="RK"].copy()  # copy so the in-place drop does not act on a slice of df2
df3.drop(columns=["BHK_OR_RK"],inplace=True)
df3.head()
## Let's see the SQUARE_FT column
df3["SQUARE_FT"]
sns.displot(df3["SQUARE_FT"],height=5,aspect=2)
## Let's see the scatter plot between square feet and price
plt.scatter(x=df3["SQUARE_FT"],y=df3["TARGET(PRICE_IN_LACS)"])
sns.boxplot(x=df3["SQUARE_FT"])
df3[df3["SQUARE_FT"]>3*1e5]
## These eight values are potential outliers, as the square footage is extremely high
## Hence drop these values
df4=df3[df3["SQUARE_FT"]<3*1e5]
sns.boxplot(x=df4["SQUARE_FT"])
df4.head()
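The cut-off of 3*1e5 square feet above was chosen by inspecting the box plot. As a hedged alternative (not the method used in this notebook), the interquartile-range rule derives the cut-offs from the data itself. The sketch below uses only the Python standard library; the helper name `iqr_bounds` and the sample values are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical square-foot values with one extreme entry, for illustration only.
sqft = [800.0, 900.0, 1000.0, 1100.0, 1200.0,
        1300.0, 1400.0, 1500.0, 1600.0, 400000.0]

lower, upper = iqr_bounds(sqft)
kept = [v for v in sqft if lower <= v <= upper]
print(f"kept {len(kept)} of {len(sqft)} values")
```

Whether a data-driven fence or a hand-picked threshold is preferable depends on how skewed the real SQUARE_FT column is, so the result is best compared against the box plot.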
## Let's see how many buildings are ready to move
df4["READY_TO_MOVE"].value_counts()
df4.groupby("READY_TO_MOVE")["TARGET(PRICE_IN_LACS)"].mean()
## There is not much difference in the mean price
df4["RESALE"].value_counts()
## As you can see, most of the houses are resold
df4.groupby("RESALE")["TARGET(PRICE_IN_LACS)"].mean()
## It is clearly seen that resold houses have a relatively lower price
## Let's see the ADDRESS column
df4["ADDRESS"].value_counts()
## As we can see, each address ends with a prominent city or state name
## Let's extract these cities and states from this column
df4["ADDRESS"].apply(lambda x : x.split(",")[-1].strip())
## Let's store these values in a new column
df4=df4.copy()  # work on a copy so the new column is not written to a slice of df3
df4["LOCATION"]=df4["ADDRESS"].apply(lambda x : x.split(",")[-1].strip())
df4["LOCATION"].value_counts()
## Let's relabel a location as "Others" if it is infrequent, i.e. it appears fewer than 15 times
rare_values=list(df4["LOCATION"].value_counts()[df4["LOCATION"].value_counts()<15].index)
rare_values
df4["LOCATION"]=df4["LOCATION"].replace(rare_values,"Others")
df4["LOCATION"].value_counts()
## Let's drop the ADDRESS column
## As latitude and longitude are specific to that address, let's drop these as well
df5=df4.drop(columns=["ADDRESS","LATITUDE","LONGITUDE"])
locations_values=list(df5["LOCATION"].unique())
locations_values
cat_col=[col for col in df5.columns if df5[col].dtypes =="O"]
cat_col
## Let's use label encoding to convert the categorical columns to numbers
df5["POSTED_BY"].unique()
posted_map={"Owner":0,"Dealer":1,"Builder":2}
all_loc=list(df5.groupby("LOCATION")["TARGET(PRICE_IN_LACS)"].mean().sort_values().index)
location_map={loc:i for i,loc in enumerate(all_loc)}
location_map
df5["LOCATION"]=df5["LOCATION"].map(location_map)
df5["POSTED_BY"]=df5["POSTED_BY"].map(posted_map)
df5.info()
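The two encoding steps above (grouping infrequent locations as "Others", then numbering the locations in order of their mean price) can be sketched without pandas. This is a minimal illustration under stated assumptions: the helper names `group_rare` and `order_by_mean_target` and the sample data are hypothetical and are not part of the notebook.

```python
from collections import Counter, defaultdict

def group_rare(labels, min_count):
    """Replace labels seen fewer than min_count times with "Others"."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else "Others" for lab in labels]

def order_by_mean_target(labels, targets):
    """Map each label to its rank when labels are sorted by mean target."""
    totals = defaultdict(float)
    counts = Counter(labels)
    for lab, tgt in zip(labels, targets):
        totals[lab] += tgt
    means = {lab: totals[lab] / counts[lab] for lab in counts}
    ordered = sorted(means, key=means.get)  # lowest mean price first
    return {lab: rank for rank, lab in enumerate(ordered)}

# Hypothetical locations and prices (in lacs), for illustration only.
cities = ["Pune", "Pune", "Mumbai", "Mumbai", "Vapi"]
prices = [60.0, 70.0, 150.0, 170.0, 30.0]

grouped = group_rare(cities, min_count=2)
mapping = order_by_mean_target(grouped, prices)
encoded = [mapping[c] for c in grouped]
print(mapping, encoded)
```

Because the ordering is derived from the target column, computing it on the training split only would be the more cautious choice; the notebook computes it on the full data.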
## Let's plot the correlation between the output and the input features
df5.drop(columns=["TARGET(PRICE_IN_LACS)"]).corrwith(df5["TARGET(PRICE_IN_LACS)"]).plot(kind="bar",color="purple")
## As you can see, the square foot column is the most correlated
## Let's see the correlation between input features
sns.heatmap(df5.drop(columns=["TARGET(PRICE_IN_LACS)"]).corr(),cmap="rainbow",fmt=".2f",annot=True)
## There is a high correlation between UNDER_CONSTRUCTION and READY_TO_MOVE. The relation is clear:
## a house that is under construction is not ready to use, hence drop UNDER_CONSTRUCTION
df5.drop(columns=["UNDER_CONSTRUCTION"],inplace=True)
## Keep READY_TO_MOVE so that X holds the seven inputs supplied at prediction time
X=df5.drop(columns=["TARGET(PRICE_IN_LACS)"])
y=df5["TARGET(PRICE_IN_LACS)"]
X.head()
X.shape
from sklearn.model_selection import train_test_split,cross_val_score,RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression,Lasso,Ridge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
# from catboost import CatBoostRegressor
# from xgboost import XGBRegressor
# from lightgbm import LGBMRegressor
from sklearn.metrics import r2_score,mean_squared_error
train_X,test_X,train_y,test_y=train_test_split(X,y,test_size=0.15,random_state=100)
sc=StandardScaler()
train_X=sc.fit_transform(train_X)
test_X=sc.transform(test_X)
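The scaling step above fits the scaler on the training split and reuses the learned statistics for the test split. A plain-Python sketch of that idea for a single column follows; `fit_standardizer` and `standardize` are hypothetical helper names (not scikit-learn functions) and the values are made up for illustration.

```python
from statistics import fmean, pstdev

def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training values only."""
    mean = fmean(train_values)
    std = pstdev(train_values)  # population std, the convention StandardScaler uses
    if std == 0:
        std = 1.0  # avoid dividing by zero for a constant column
    return mean, std

def standardize(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]

# Hypothetical square-foot values, for illustration only.
train_sqft = [800.0, 1000.0, 1200.0]
test_sqft = [1000.0, 1400.0]

mean, std = fit_standardizer(train_sqft)
train_scaled = standardize(train_sqft, mean, std)
test_scaled = standardize(test_sqft, mean, std)
print(train_scaled, test_scaled)
```

The test values are shifted and scaled with the training mean and deviation, so a test value equal to the training mean maps to zero even though the test split has a different mean of its own.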
def check_test_score(model):
    pred=model.predict(test_X)
    print("R2 score is:", r2_score(test_y,pred))
    print("The mean squared error is :",mean_squared_error(test_y,pred))
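The two metrics printed by `check_test_score` can be written out by hand to show what they measure. The sketch below is a plain-Python illustration with made-up prices; the function names `mean_squared_error_manual` and `r2_score_manual` are hypothetical stand-ins for the scikit-learn functions used above.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices in lacs, for illustration only.
actual = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]

mse = mean_squared_error_manual(actual, predicted)
r2 = r2_score_manual(actual, predicted)
print(mse, r2)
```

An R2 close to 1 means the model explains most of the variance in the prices, while the mean squared error is expressed in squared lacs and is therefore sensitive to a few large misses.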
lr=LinearRegression()
lr.fit(train_X,train_y)
check_test_score(lr)
lasso=Lasso(random_state=100)
lasso.fit(train_X,train_y)
check_test_score(lasso)
ridge=Ridge(random_state=100)
ridge.fit(train_X,train_y)
check_test_score(ridge)
sv_regressor=SVR(kernel="linear")
sv_regressor.fit(train_X,train_y)
check_test_score(sv_regressor)
sv_regressor=SVR(kernel="rbf")
sv_regressor.fit(train_X,train_y)
check_test_score(sv_regressor)
dt=DecisionTreeRegressor()
dt.fit(train_X,train_y)
check_test_score(dt)
rf=RandomForestRegressor()
rf.fit(train_X,train_y)
mod=rf.predict([[0,0,0,5,1300.236407,1,1]])
check_test_score(rf)
## Among all the models, gradient boosting techniques outperform the others
sc=StandardScaler()
X=sc.fit_transform(X)
X
## As you can see, it is performing well
import pickle
with open("my1_model.pkl","wb") as f:
    pickle.dump(rf,f)
with open("my1_scalar.pkl","wb") as f:
    pickle.dump(sc,f)
df5
posted_map
yes_map={"Yes":1,"No":0}
Pause=True
while Pause:
    values=[]
    a=input("POSTED_BY")
    values.append(posted_map[a])
    a=input("RERA approved or not")
    values.append(yes_map[a])
    a=int(input("Number of Rooms"))
    values.append(a)
    a=float(input("Square foot"))
    values.append(a)
    a=input("Ready to Move")
    values.append(yes_map[a])
    a=input("Resale")
    values.append(yes_map[a])
    a=input("Location")
    values.append(location_map[a])
    Pause=False
## Scale the collected values and predict with the trained random forest
val=sc.transform([values])
rf.predict(val)
for loc in locations_values:
    print(f"""<option value="{loc}">{loc}</option>""")
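The loop above prints one option element per location so that the list can be pasted into home.html. A small sketch of the same step, assuming a location name could contain characters that are special in HTML, escapes each name with the standard library's `html.escape`; `render_options` is a hypothetical helper name.

```python
from html import escape

def render_options(names):
    """Build one <option> element per name, escaping HTML special characters."""
    return "\n".join(
        f'<option value="{escape(name)}">{escape(name)}</option>'
        for name in names
    )

# Hypothetical names; the second one shows the effect of escaping.
print(render_options(["Pune", "A&B"]))
```

For the city names in this dataset escaping changes nothing, so this only matters if the location list is ever generated from less controlled text.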
len(locations_values)
posted_map={'Owner': 0, 'Dealer': 1, 'Builder': 2}
posted_map
location_map
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% --- Executes just before main_gui is made visible.
function main_gui_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject handle to figure
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
% varargin command line arguments to main_gui (see VARARGIN)
% Choose default command line output for main_gui
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
% UIWAIT makes main_gui wait for user response (see UIRESUME)
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
function varargout = main_gui_OutputFcn(hObject, eventdata, handles)
% varargout cell array for returning output args (see VARARGOUT);
% hObject handle to figure
% Get default command line output from handles structure
varargout{1} = handles.output;
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject handle to pushbutton1 (see GCBO)
global image
[filename pathname] = uigetfile({'*.jpg';'*.bmp'},'File Selector');
image = strcat(pathname, filename);
axes(handles.axes1);
imshow(image)
% handles structure with handles and user data (see GUIDATA)
global image I
k=getimage(handles.axes1);
I=imresize(k,[256 256]);
pause(1)
I=rgb2gray(I);
imshow(I)
global I I2 a2
I2 = fspecial('gaussian')
imshow(I2)
pause(1)
a2=imfilter(I,I2);
imshow(a2)
% --- Executes on button press in pushbutton4.
% hObject handle to pushbutton4 (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
global a2 image1
image1=imadjust(a2,stretchlim(a2));
imshow(image1);
% --- Executes on button press in pushbutton5.
function pushbutton5_Callback(hObject, eventdata, handles)
% hObject handle to pushbutton5 (see GCBO)
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
global a2 I2 image1
% OTSU segmentation
level = graythresh(image1);
seg_img = im2bw(image1,level);
imshow(seg_img);
pause(1)
gmag = imgradient(image1);
imshow(gmag);
title('Gradient Magnitude')
pause(1)
L = watershed(gmag);
Lrgb = label2rgb(L);
imshow(Lrgb)
title('Watershed Transform of Gradient Magnitude')
pause(1)
se = strel('disk',20);
Io = imopen(image1,se);
imshow(Io)
title('Opening')
pause(1)
Ie = imerode(image1,se);
Iobr = imreconstruct(Ie,image1);
imshow(Iobr)
title('Opening-by-Reconstruction')
pause(1)
Ioc = imclose(Io,se);
imshow(Ioc)
title('Opening-Closing')
pause(1)
Iobrd = imdilate(Iobr,se);
Iobrcbr = imreconstruct(imcomplement(Iobrd),imcomplement(Iobr));
Iobrcbr = imcomplement(Iobrcbr);
imshow(Iobrcbr)
title('Opening-Closing by Reconstruction')
pause(1)
fgm = imregionalmax(Iobrcbr);
imshow(fgm)
title('Regional Maxima of Opening-Closing by Reconstruction')
pause(2)
I2 = labeloverlay(image1,fgm);
imshow(I2)
title('Regional Maxima Superimposed on Original Image')
pause(1)
se2 = strel(ones(5,5));
fgm2 = imclose(fgm,se2);
fgm3 = imerode(fgm2,se2);
fgm4 = bwareaopen(fgm3,20);
I3 = labeloverlay(image1,fgm4);
imshow(I3)
title('Modified Regional Maxima Superimposed on Original Image')
pause(1)
bw = imbinarize(Iobrcbr);
imshow(bw)
title('Thresholded Opening-Closing by Reconstruction')
pause(1)
D = bwdist(bw);
DL = watershed(D);
bgm = DL == 0;
imshow(bgm)
title('Watershed Ridge Lines')
pause(1)
gmag2 = imimposemin(gmag, bgm | fgm4);

L = watershed(gmag2);
labels = imdilate(L==0,ones(3,3)) + 2*bgm + 3*fgm4;
I4 = labeloverlay(image1,labels);
imshow(I4)
title('Markers and Object Boundaries Superimposed on Original Image')
pause(1)
Lrgb = label2rgb(L,'jet','w','shuffle');
imshow(Lrgb)
title('Colored Watershed Label Matrix')
pause(1)
imshow(image1)
hold on
himage = imshow(Lrgb);
himage.AlphaData = 0.3;
title('Colored Labels Superimposed Transparently on Original Image')
pause(1)
home.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>House Price Prediction</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous">
<link rel="stylesheet" href="static/style.css">
<style>
.centerdiv {
height: 20vh;
display: flex;
justify-content: center;
align-items: center;
}
a{
height: 50px;
width: 50px;
background-color: #f5f6fa;
border-radius: 50px;
text-align: center;
margin: 20px;
box-shadow: 1px 4px 2px 2px #dcdde1;
line-height: 60px;
}
a i {
transition: all 0.3s linear;
}
a:hover i {
transform: scale(1.4);
}
.details {
margin: 5px;
}
</style>
</head>
<body background="static/house1.jpg">
<h1 align="center">House Price Prediction</h1>

<h5 align="center">House Price Prediction Using Machine Learning</h5>
<div class="imagediv"><img src="static/ip1.jpg" alt="" class="img1"></div>
<div class="forms">
<form action="{{ url_for('predict') }}" method="post">
<label for="posted">Posted By:</label>
<select name="posted" id="posted" required>
<option value="Owner">Owner</option>
<option value="Dealer">Dealer</option>
<option value="Builder">Builder</option>
</select>
<label for="rera">RERA approved or not:</label>

<select name="rera" id="rera" required>
<option value="Yes">Yes</option>
<option value="No">No</option>
</select>
<label for="rooms">Number of Rooms:</label>

<input type="number" name="rooms" id="rooms" placeholder="Number of Rooms" autocomplete="off" min="0" required>
<label for="foot">Square Foot:</label>

<input type="number" name="foot" id="foot" placeholder="Square Foot" autocomplete="off" step="0.1" min="0" required>
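<label for="ready_to_move">Ready to Move or Not:</label>
<select name="ready_to_move" id="ready_to_move" required>
<option value="Yes">Yes</option>
<option value="No">No</option>
</select>
<label for="resale">Resale:</label>
<select name="resale" id="resale" required>
<option value="Yes">Yes</option>
<option value="No">No</option>
</select>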
<label for="location_name">Location Name:</label>

<select name="location" id="location_name" required>
<option value="" disabled selected>Select City Name</option>

<option value="Bangalore">Bangalore</option>
<option value="Mysore">Mysore</option>
<option value="Ghaziabad">Ghaziabad</option>
<option value="Kolkata">Kolkata</option>
<option value="Kochi">Kochi</option>
<option value="Jaipur">Jaipur</option>
<option value="Mohali">Mohali</option>
<option value="Chennai">Chennai</option>
<option value="Siliguri">Siliguri</option>
<option value="Noida">Noida</option>
<option value="Raigad">Raigad</option>
<option value="Bhubaneswar">Bhubaneswar</option>
<option value="Pune">Pune</option>
<option value="Mumbai">Mumbai</option>
<option value="Nagpur">Nagpur</option>
<option value="Bhiwadi">Bhiwadi</option>
<option value="Faridabad">Faridabad</option>
<option value="Lalitpur">Lalitpur</option>
<option value="Maharashtra">Maharashtra</option>
<option value="Vadodara">Vadodara</option>
<option value="Visakhapatnam">Visakhapatnam</option>
<option value="Vapi">Vapi</option>
<option value="Mangalore">Mangalore</option>
<option value="Aurangabad">Aurangabad</option>
<option value="Vijayawada">Vijayawada</option>
<option value="Belgaum">Belgaum</option>
<option value="Bhopal">Bhopal</option>
<option value="Lucknow">Lucknow</option>
<option value="Kanpur">Kanpur</option>
<option value="Gandhinagar">Gandhinagar</option>
<option value="Pondicherry">Pondicherry</option>
<option value="Agra">Agra</option>
<option value="Ranchi">Ranchi</option>
<option value="Gurgaon">Gurgaon</option>
<option value="Udupi">Udupi</option>
<option value="Indore">Indore</option>
<option value="Jodhpur">Jodhpur</option>
<option value="Coimbatore">Coimbatore</option>
<option value="Valsad">Valsad</option>
<option value="Palghar">Palghar</option>
<option value="Surat">Surat</option>
<option value="Varanasi">Varanasi</option>
<option value="Guwahati">Guwahati</option>
<option value="Amravati">Amravati</option>
<option value="Anand">Anand</option>
<option value="Tirupati">Tirupati</option>
<option value="Secunderabad">Secunderabad</option>
<option value="Raipur">Raipur</option>
<option value="Vizianagaram">Vizianagaram</option>
<option value="Thrissur">Thrissur</option>
<option value="Madurai">Madurai</option>
<option value="Chandigarh">Chandigarh</option>
<option value="Shimla">Shimla</option>
<option value="Gwalior">Gwalior</option>
<option value="Rajkot">Rajkot</option>
<option value="Sonipat">Sonipat</option>
<option value="Allahabad">Allahabad</option>
<option value="Dharuhera">Dharuhera</option>
<option value="Durgapur">Durgapur</option>
<option value="Panchkula">Panchkula</option>
<option value="Solapur">Solapur</option>
<option value="Goa">Goa</option>
<option value="Jamshedpur">Jamshedpur</option>
<option value="Jabalpur">Jabalpur</option>
<option value="Hubli">Hubli</option>
<option value="Patna">Patna</option>
<option value="Bilaspur">Bilaspur</option>
<option value="Ratnagiri">Ratnagiri</option>
<option value="Meerut">Meerut</option>
<option value="Jalandhar">Jalandhar</option>
<option value="Ludhiana">Ludhiana</option>
<option value="Kota">Kota</option>
<option value="Panaji">Panaji</option>
<option value="Kolhapur">Kolhapur</option>
<option value="Ernakulam">Ernakulam</option>
<option value="Bhavnagar">Bhavnagar</option>
<option value="Bharuch">Bharuch</option>
<option value="Asansol">Asansol</option>
<option value="Margao">Margao</option>
<option value="Bhilai">Bhilai</option>
<option value="Dehradun">Dehradun</option>
<option value="Guntur">Guntur</option>
<option value="Jalgaon">Jalgaon</option>
<option value="Udaipur">Udaipur</option>
<option value="Neemrana">Neemrana</option>
<option value="Sindhudurg">Sindhudurg</option>
<option value="Kottayam">Kottayam</option>
<option value="Dhanbad">Dhanbad</option>
<option value="Navsari">Navsari</option>
<option value="Bahadurgarh">Bahadurgarh</option>
<option value="Nellore">Nellore</option>
<option value="Haridwar">Haridwar</option>
<option value="Jamnagar">Jamnagar</option>
<option value="Junagadh">Junagadh</option>
<option value="Ahmednagar">Ahmednagar</option>
<option value="Palakkad">Palakkad</option>
<option value="Karjat">Karjat</option>
<option value="Ajmer">Ajmer</option>
<option value="Aligarh">Aligarh</option>
<option value="Rudrapur">Rudrapur</option>
<option value="Solan">Solan</option>
<option value="Mathura">Mathura</option>
<option value="Others">Others</option>
</select>
<input type="submit" value="Submit">

</form>
<div class="headw">
{% with messages=get_flashed_messages() %}
{% if messages %}
{% for msg in messages %}
<h3 align="center">{{ msg }}</h3>
{% endfor %}
{% endif %}
{% endwith %}
</div>
<div class="details" align="center">

<h2>Made with ❤ by JS</h2>
</div>
</div>
<div class="centerdiv">
<a href="https://www.facebook.com/jayasurya.jayasurya.946954?sfnsn=wiwspwa&mibextid=2JQ9oc"
target="_blank">
<i class="fa fa-2x fa-facebook" aria-hidden="true"></i>
</a>
<a href="https://www.instagram.com/jayasurya.2511?igsh=YzljYTk1ODg3Zg==" target="_blank">
<i class="fa fa-2x fa-instagram" aria-hidden="true"></i>
</a>
<a href="https://github.com/jayasurya2511/TO-DO-LIST-main" target="_blank">
<i class="fa fa-2x fa-github" aria-hidden="true"></i>
</a>
<a href="https://www.linkedin.com/in/jayasurya-s-ab6959200?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app" target="_blank">
<i class="fa fa-2x fa-linkedin" aria-hidden="true"></i>
</a>
</div>
</body>
</html>
Index.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>House Price Prediction</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha384-wvfXpqpZZVQGK6TAh5PVlGOfQNHSoD2xbE+QkPxCAFlNEevoEH3Sl0sibVcOQVnN" crossorigin="anonymous">
</head>
<body style="background-image: url('static/house1.jpg'); background-size: cover; background-repeat: no-repeat; height: 100vh; margin: 0;">
<h2 align="center" style="color: white; margin-top: 50px;">House Price Prediction</h2>
<div style="text-align: center; margin-top: 20px;">

{% with messages=get_flashed_messages() %}
{% if messages %}
{% for msg in messages %}
<h3 style="background: white; border: 4px solid pink; border-radius: 4px; padding: 10px; display:
inline-block;">{{ msg }}</h3>
{% endfor %}
{% endif %}
{% endwith %}
</div>
<div style="text-align: center; margin-top: 20px; color: white;">

<h2>Made with ❤ by JS</h2>
</div>
<div class="centerdiv" style="margin-top: 20px;">
</div>
<div class="centerdiv" style="margin-top: 20px;">
<a href="https://www.facebook.com/jayasurya.jayasurya.946954?sfnsn=wiwspwa&mibextid=2JQ9oc"
target="_blank">
<i class="fa fa-2x fa-facebook" aria-hidden="true"></i>
</a>
<a href="https://www.instagram.com/jayasurya.2511?igsh=YzljYTk1ODg3Zg==" target="_blank">
<i class="fa fa-2x fa-instagram" aria-hidden="true"></i>
</a>
<a href="https://github.com/jayasurya2511/TO-DO-LIST-main" target="_blank">
<i class="fa fa-2x fa-github" aria-hidden="true"></i>
</a>
<a href="https://www.linkedin.com/in/jayasurya-s-ab6959200?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app" target="_blank">
<i class="fa fa-2x fa-linkedin" aria-hidden="true"></i>
</a>
</div>
</body>
</html>
Style.css:
body{
background-size: auto 100%;

background-repeat:no-repeat;
}
.imagediv{
text-align:center;
}
.forms{
max-width:400px;
margin:auto;
border-radius:5px;
padding:60px;
}
input[type=text],select{
width:100%;
padding:12px 15px;
margin:8px 0;
display:inline-block;
border:1px solid #ccc;
border-radius:4px;
box-sizing:border-box;
}
input[type=number],select{
width:100%;
padding:12px 20px;
margin:8px 0;
background:purple;
color:white;
display:inline-block;
border:1px solid #ccc;
border-radius:4px;
box-sizing:border-box;
}
input[type=submit],select{
width:100%;
padding:14px 20px;
margin:8px 0;
background:purple;
color:white;
border:none;
border-radius:4px;
cursor:pointer;
}
input[type=submit]:hover{
background:#000;
}
app.py:
from flask import Flask, render_template, redirect, url_for, request, session, flash
import numpy as np
import pickle
app = Flask(__name__)
app.secret_key = "JS"
# Route for the home page
@app.route("/")
def home():
    return render_template("home.html")

# Prediction route
@app.route('/predict', methods=['POST'])
def predict():
    if request.method == "POST":
        session.permanent = True
        vals = []
        location_map = {
'Karjat': 0,'Bhavnagar': 1,'Rudrapur': 2,'Palghar': 3,'Junagadh': 4,'Durgapur': 5,'Ratnagiri': 6,
'Bharuch': 7,'Vapi': 8,'Neemrana': 9,'Bhiwadi': 10,'Valsad': 11,'Bhilai': 12,'Navsari':
13,'Asansol':14,'Vizianagaram': 15,'Jamnagar': 16,'Haridwar': 17,'Mathura': 18,'Raigad': 19,'Meerut':
20,'Sindhudurg': 21, 'Bilaspur': 22,'Solan': 23,'Dhanbad': 24,'Bhopal': 25,'Aurangabad': 26,'Nellore': 27,'Hubli':
28,'Raipur': 29, 'Amravati': 30,'Ajmer': 31,'Dharuhera': 32,'Solapur': 33,'Kolhapur': 34,'Siliguri': 35,'Gwalior':
36,'Others': 37,'Ahmednagar': 38,'Agra': 39,'Udupi': 40,'Aligarh': 41,'Jodhpur': 42,'Gandhinagar': 43,'Guntur':
44,'Anand': 45, 'Bahadurgarh': 46,'Belgaum': 47,'Indore': 48,'Jamshedpur': 49,'Margao': 50,'Rajkot':
51,'Palakkad': 52,'Madurai': 53,'Sonipat': 54,'Kota': 55,'Vijayawada': 56,'Jabalpur': 57,'Pondicherry':
58,'Guwahati': 59,'Jalandhar': 60,'Allahabad': 61,'Tirupati': 62,'Udaipur': 63,'Secunderabad': 64,'Vadodara':
65,'Visakhapatnam': 66,'Ghaziabad': 67,'Jaipur': 68,'Thrissur': 69,'Patna': 70,'Faridabad': 71,'Bhubaneswar':
72,'Surat': 73,'Shimla': 74,'Varanasi': 75,'Mysore': 76,'Mangalore': 77,'Dehradun': 78,'Nagpur': 79,'Coimbatore':
80,'Ernakulam': 81,'Ludhiana': 82,'Panchkula': 83,'Lucknow': 84,'Chandigarh': 85,'Kolkata': 86,'Kanpur':
87,'Kottayam': 88,'Panaji': 89,'Jalgaon': 90,'Mohali': 91,'Pune': 92,'Kochi': 93,'Ranchi': 94,'Noida': 95,'Chennai':
96,'Bangalore': 97,'Goa': 98,'Lalitpur': 99,'Mumbai': 100,'Maharashtra': 101,'Gurgaon': 102}
        posted_map = {'Owner': 0, 'Dealer': 1, 'Builder': 2}
        yes_map = {"Yes": 1, "No": 0}
        try:
            posted = request.form["posted"]
            vals.append(posted_map[posted])
            rera = request.form["rera"]
            vals.append(yes_map[rera])
            # Form values arrive as strings, so convert the numeric fields
            rooms = int(request.form["rooms"])
            vals.append(rooms)
            square_foot = float(request.form["foot"])
            vals.append(square_foot)
            ready = request.form["ready_to_move"]
            vals.append(yes_map[ready])
            resale = request.form["resale"]
            vals.append(yes_map[resale])
            location = request.form["location"]
            vals.append(location_map[location])
            with open("my1_model.pkl", "rb") as f:
                model = pickle.load(f)
            # The model was trained on scaled features, so apply the scaler saved by the notebook
            with open("my1_scalar.pkl", "rb") as f:
                scaler = pickle.load(f)
            res = model.predict(scaler.transform([vals]))
            for val in res:
                flash(f"The Price of the House is around {int(val)} Lacs", "info")
        except KeyError as e:
            flash(f"Error: {e} not found in the form", "error")
        except ValueError:
            flash("Error: rooms and square foot must be numbers", "error")
    return render_template("index.html")

if __name__ == "__main__":
    app.run(debug=True)
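The form handling in the predict route can be checked without starting Flask if the conversion from form fields to a feature row is isolated in one function. The sketch below is one possible arrangement, not the code used in app.py: `build_features` is a hypothetical helper, the field names and feature order follow the route above, and the two-entry location map is a shortened stand-in for the full `location_map`.

```python
POSTED_MAP = {"Owner": 0, "Dealer": 1, "Builder": 2}
YES_MAP = {"Yes": 1, "No": 0}

def build_features(form, location_map):
    """Convert submitted form fields into the numeric row passed to the model.

    Raises KeyError for a missing field or unknown category and
    ValueError for a non-numeric or non-positive number.
    """
    rooms = int(form["rooms"])
    square_foot = float(form["foot"])
    if rooms < 1 or square_foot <= 0:
        raise ValueError("rooms and square foot must be positive")
    return [
        POSTED_MAP[form["posted"]],
        YES_MAP[form["rera"]],
        rooms,
        square_foot,
        YES_MAP[form["ready_to_move"]],
        YES_MAP[form["resale"]],
        location_map[form["location"]],
    ]

# Hypothetical submission and a shortened location map, for illustration only.
sample_locations = {"Pune": 92, "Others": 37}
sample_form = {
    "posted": "Owner", "rera": "Yes", "rooms": "2", "foot": "1000",
    "ready_to_move": "Yes", "resale": "No", "location": "Pune",
}
row = build_features(sample_form, sample_locations)
print(row)
```

Keeping the conversion separate means a plain dictionary can stand in for `request.form` during testing, and both error types can be reported to the user by the route.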
CHAPTER 10
NATIONAL CONFERENCE