
A Project Report Submitted To

Rajiv Gandhi Proudyogiki Vishwavidyalaya

In partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
Aastha Mahajan
(0701CS211001)
6th sem B.Tech
(2021-2025)

Under the Supervision of


Mrs. Anjali More
(Assistant Professor)
Department of Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

UJJAIN ENGINEERING COLLEGE UJJAIN


(Declared Autonomous by Govt. of M.P.)
Indore Road, Ujjain, Madhya Pradesh 456010
ACKNOWLEDGEMENT

We are extremely thankful to our beloved Principal, Mr. J.K. Srivastava, who took keen
interest in providing better infrastructure, permitted us to use the available facilities to
complete this project successfully, and encouraged us in every effort throughout.
We owe our gratitude to Mr. A. K. Mewafarosh, Head, Department of Computer Science and
Engineering, for his kind attention and valuable guidance throughout this project.
We are extremely thankful to our Project Guide, Mrs. Anjali More, Assistant Professor in the CSE
Department, who took keen interest and encouraged us in every effort throughout this project.
We also thank all the teaching and non-teaching staff of the CSE Department for their cooperation.

Aastha Mahajan
0701CS211001
TABLE OF CONTENTS

1. Introduction
1.1. Rationale
1.2. Problem Definition and Proposed Solution
1.3. System Synthesis

2. Literature Survey

3. Process Model Adopted
3.1. Analysis
3.1.1. Requirement Analysis
3.1.2. Object Oriented Analysis
3.1.3. Architectural Specification
3.2. Design
3.2.1. Use Case Diagram
3.2.2. Sequence Diagram
3.2.3. Data Model (Data Flow Diagram)

4. Implementation

5. Expected Outcome
5.1. Appendix A: Screenshots of Sample Case

6. Concluding Remarks

7. References and Bibliography


INTRODUCTION

Owning a house is one of the basic needs of modern life. The price of a house depends on
many factors, and real estate agents and sellers want a price tag that reflects what the house
is really worth. For the inexperienced, predicting that price is often very hard.
A house is a fundamental need, and its cost varies from place to place depending on the
facilities available, such as parking space and location. House prices concern many citizens,
whether wealthy or middle class, because it is difficult to judge the value of a house from its
location or the nearby amenities alone. Buying a house is among the biggest decisions a family
makes, since it can consume all of their savings and sometimes leaves them in debt. Predicting
the correct price of a house is therefore a hard problem, and the proposed model is intended to
help people estimate it more precisely. The model uses linear regression, a machine learning
technique, and takes internal factors of house valuation such as area, number of bedrooms, and
locality, as well as external factors such as air pollution and crime rates. Its output is the
predicted price of the house. The user interacts with the prediction system through Streamlit, a
web app framework known for its simplicity and flexibility. Streamlit provides an interface where
users can enter the details of a house. This interactive design lets users obtain estimates for the
properties they are considering.
Buying a house is one of the most important decisions for an individual. The price of a
house depends on many factors: its own features, such as the number of bedrooms, the area of
construction, and the locality, and external factors such as air pollution and the crime rate.
Together these factors make predicting the price of a house complex. Price prediction is useful
to many real estate businesses, so an easier and more accurate method is needed. At the core of
the system is a regression model that predicts the price from these factors. The model works on
the inputs received through the Streamlit interface, so that each estimate reflects the specific
house the user describes.

1.1. Rationale
Having lived in India for so many years, if there is one thing I had been taking for granted, it is that
housing and rental prices continue to rise. Since the housing crisis of 2008, housing prices have recovered
remarkably well, especially in major housing markets. However, in the 4th quarter of 2016, I was surprised
to read that Bombay housing prices had fallen the most in the last 4 years. In fact, median resale prices for
condos and co-ops fell 6.3%, marking the first time there was a decline since 2017. The decline has been
partly attributed to political uncertainty domestically and abroad and the 2014 election. A prediction model
can help maintain transparency for customers and make comparison easy: if a customer finds that the price
of a house listed on a website is higher than the price predicted by the model, he can reject that house.
The aim is to predict house prices efficiently for real estate customers with respect to their budgets
and priorities. Future prices are predicted by analyzing previous market trends and price ranges, as well as
upcoming developments. House prices generally increase every year, so there is a need for a system to predict
house prices in the future. House price prediction can help the developer determine the selling price of a
house and can help the customer choose the right time to purchase a house. We use the linear regression
algorithm in machine learning to predict house price trends.

1.2. Problem Definition and Proposed Solution


House price prediction is important for driving real estate efficiency. Earlier, house prices were
determined by calculating the acquiring and selling prices in a locality. A house price prediction
model is therefore essential for filling the information gap and improving real estate efficiency.

Since the data is divided into two parts, a training set and a test set, we must first train the
model. The training set includes the target variable. The decision tree regressor algorithm is applied to the
training set; it builds a regression model in the form of a tree structure.

The trained model is then applied to the test set to predict house prices, and is integrated with the
front end using Flask in Python.
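
The train-then-predict flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the feature names (area_sqft, bedrooms), the price rule, the 80/20 split, and the tree depth are all assumptions, and scikit-learn is assumed to be installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption: the real project
# would load its own CSV files instead).
rng = np.random.default_rng(0)
n_houses = 200
area_sqft = rng.uniform(500, 3000, n_houses)
bedrooms = rng.integers(1, 6, n_houses)
# Hypothetical price rule with noise, only so the tree has something to learn.
price = 50 * area_sqft + 10_000 * bedrooms + rng.normal(0, 5_000, n_houses)

X = np.column_stack([area_sqft, bedrooms])
y = price

# Hold out part of the data as the test set; the training set keeps the target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the decision tree regressor on the training set only.
model = DecisionTreeRegressor(max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Apply the trained model to the unseen test set.
predicted_prices = model.predict(X_test)
print(predicted_prices[:5])
```

Limiting the depth of the tree is one simple way to keep it from memorizing the training set; the value used here is arbitrary.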

Training: Using historical data on house features (e.g., size, location, number of bedrooms) and the
corresponding prices to train the chosen model. Evaluation: Assessing the model's performance using metrics
like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), etc. Deployment: Implementing the model
into a system or application where users can input house features to get price predictions. The domain also
includes considerations like feature engineering (creating new features from existing ones), handling outliers,
and potentially dealing with skewed data distributions.

The performance of the model is assessed on the testing data. Metrics such as Mean Absolute Error
(MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or R-squared can be used. Cross-
validation may also be employed to check the model's ability to generalize. Monitoring and maintenance:
the model's performance is continuously monitored in the production environment, and the model is
periodically retrained with new data to keep it up to date.
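
As an illustration of the metrics named above, the sketch below computes MAE, MSE, RMSE, and R-squared directly with NumPy. The function name regression_metrics and the sample values are hypothetical; cross-validation is not covered here.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))            # average absolute error
    mse = np.mean(errors ** 2)               # average squared error
    rmse = np.sqrt(mse)                      # same units as the target
    ss_res = np.sum(errors ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot               # share of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Made-up actual and predicted prices, purely for demonstration.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE is often the easiest of these to interpret because it is expressed in the same unit as the price itself.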
• Handling missing data: Imputing missing values or removing rows with missing data.
• Feature engineering: Creating new features from existing ones, such as calculating price per square
foot, creating binary variables for categorical features, or extracting information from text (like
neighborhood sentiment analysis).
• Exploratory Data Analysis (EDA): Understanding the distribution of the target variable (house prices),
visualizing relationships between features and the target variable, and identifying outliers or anomalies
that may need to be addressed.
• Feature Selection: Choosing the most relevant features that contribute to predicting house prices.
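
The missing-data handling and feature engineering steps above might look like the following with pandas. The tiny table, its column names, and the choice of median imputation are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical dataset; column names and values are illustrative.
houses = pd.DataFrame({
    "location": ["A", "B", "A", None, "B"],
    "total_sqft": [1000.0, 1500.0, np.nan, 1200.0, 2000.0],
    "price": [50.0, 90.0, 60.0, 70.0, 130.0],
})

# Handling missing data: impute the numeric gap with the column median,
# and drop rows whose categorical value is missing.
houses["total_sqft"] = houses["total_sqft"].fillna(houses["total_sqft"].median())
houses = houses.dropna(subset=["location"]).reset_index(drop=True)

# Feature engineering: price per square foot.
houses["price_per_sqft"] = houses["price"] / houses["total_sqft"]

# Binary (one-hot) variables for the categorical feature.
houses = pd.get_dummies(houses, columns=["location"], dtype=int)
print(houses)
```

Whether to impute or drop depends on how much data is missing; both options are shown here only to mirror the two choices mentioned in the text.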


1.3. System Synthesis

LITERATURE SURVEY

House price prediction is a vast topic that has been approached with a variety of computer
science methods, such as machine learning, linear regression, decision trees, deep learning,
fuzzy logic, ANFIS (Adaptive Neuro-Fuzzy Inference System), and linear performance pricing.
In the proposed machine learning model, the dataset is divided into two parts: training and
testing. 80% of the data is used for training and 20% for testing. The training set includes
the target variable. The model is trained using various machine learning algorithms, of which
random forest regression gives the best results. The algorithms were implemented using the
Python libraries NumPy and Pandas.
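
A minimal sketch of the 80/20 comparison described above, assuming scikit-learn is available. The synthetic data, the non-linear target, and the choice of RMSE as the comparison metric are assumptions; the result on this toy data says nothing about the surveyed work's actual numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic features and a hypothetical non-linear target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(0, 1, 200)

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Train each model on the same split and record its test RMSE.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    rmse_by_model[name] = float(np.sqrt(mse))

print(rmse_by_model)
```

Evaluating every candidate on the same held-out split is what makes the error values comparable.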
In the study on house price prediction using polynomial regression with Particle Swarm
Optimization, the authors predicted Washington DC house prices using polynomial regression and
particle swarm optimization. They also improved the particle swarm optimization method in two
ways: by changing the topological structure of particle relations, and by introducing new
particle control mechanisms.
In the deep learning study, the authors developed a model based on heterogeneous data
analysis together with a joint self-attention mechanism. The heterogeneous data supplements
the house information, and the model also assigns weights automatically depending on the
different features or samples.

PROCESS MODEL ADOPTED
3.1. Analysis
3.1.1. Requirement Analysis
• Functional Requirements
✓ User Interface: The user interface will be a website. The user has to enter all the attributes
correctly and in the required format.
✓ Proper Forecasting: The system has to correctly predict the price of the house according to
the input given by the user.

• Non-Functional Requirements
✓ Collaborative filtering: Collaborative filtering approaches build a model from a user's past behaviour
(i.e. items purchased or searched by the user) as well as similar decisions made by other users. This
model is then used to predict items (or ratings for items) that the user may be interested in, but this
type of algorithm has some limitations.
✓ Content-based filtering: Content-based filtering approaches use a series of discrete characteristics
of an item in order to recommend additional items with similar properties. These methods are based
entirely on a description of the item and a profile of the user's preferences, and recommend items
based on the user's past preferences. This is one of the metrics that can be used when calculating
similarity between users or contents.

• Integration Requirements
✓ Integration of the numpy and pandas libraries for efficient data manipulation and
analysis.
✓ Integration of the house price prediction project with the web application for
real-time price prediction.

3.1.2. Object Oriented Analysis
Object-oriented analysis is a powerful approach to designing and structuring software projects,
including ones like house price prediction.

✓ Dataset Class: Manages the loading, preprocessing, and splitting of the dataset into training and
testing sets.
✓ Predictor Class: Contains methods for training the model, making predictions, and evaluating the
model's performance.
✓ Evaluator Class: Handles the evaluation metrics like RMSE, MAE, etc., and provides methods for
visualizing the model's performance.
✓ Composition: The Dataset Class could have a composition relationship with the House Class, as the
dataset contains multiple houses.
✓ Dependency: The Predictor Class depends on the Dataset Class for training and testing data.
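
The classes listed above could be laid out roughly as in the sketch below. It is one possible reading of the analysis, not the project's implementation: the House fields, the method names, and the use of a least-squares linear model inside Predictor are assumptions, and loading, preprocessing, and visualization are left out for brevity.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class House:
    area_sqft: float
    bedrooms: int
    price: float


class Dataset:
    """Holds House objects (composition) and splits them into train/test."""

    def __init__(self, houses):
        self.houses = list(houses)

    def split(self, test_fraction=0.2):
        n_test = max(1, int(len(self.houses) * test_fraction))
        return self.houses[:-n_test], self.houses[-n_test:]

    @staticmethod
    def to_arrays(houses):
        # Leading 1.0 is the intercept column of the design matrix.
        X = np.array([[1.0, h.area_sqft, h.bedrooms] for h in houses])
        y = np.array([h.price for h in houses])
        return X, y


class Predictor:
    """Depends on Dataset for its data; fits a least-squares linear model."""

    def __init__(self):
        self.coefficients = None

    def train(self, houses):
        X, y = Dataset.to_arrays(houses)
        self.coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(self, houses):
        X, _ = Dataset.to_arrays(houses)
        return X @ self.coefficients


class Evaluator:
    """Computes evaluation metrics for a set of predictions."""

    @staticmethod
    def rmse(y_true, y_pred):
        diff = np.asarray(y_true) - np.asarray(y_pred)
        return float(np.sqrt(np.mean(diff ** 2)))

    @staticmethod
    def mae(y_true, y_pred):
        diff = np.asarray(y_true) - np.asarray(y_pred)
        return float(np.mean(np.abs(diff)))


# Hypothetical houses whose price follows an exact linear rule.
raw = [(800, 1), (1000, 2), (1200, 2), (1500, 3), (1800, 3),
       (2000, 4), (2400, 4), (900, 2), (1600, 2), (2200, 5)]
houses = [House(float(a), b, 50.0 * a + 10_000.0 * b) for a, b in raw]

dataset = Dataset(houses)
train_houses, test_houses = dataset.split(test_fraction=0.2)

predictor = Predictor()
predictor.train(train_houses)
predictions = predictor.predict(test_houses)

_, actual = Dataset.to_arrays(test_houses)
test_rmse = Evaluator.rmse(actual, predictions)
print(test_rmse)
```

Keeping evaluation in its own class means the same metrics can be reused if the Predictor is later swapped for a different algorithm.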

3.1.3. Architectural Specification

HARDWARE AND SOFTWARE REQUIREMENTS

Hardware Requirements -

HARDWARE TOOLS          MINIMUM REQUIREMENTS
PROCESSOR               Core i3 or above
HARD DISK               50 GB
RAM                     4.00 GB (Recommended 8.00 GB)
MONITOR RESOLUTION      1024x768

Software Requirements –

SOFTWARE TOOLS          MINIMUM REQUIREMENTS
OPERATING SYSTEM        Windows 7/8/10/11, macOS 10.14 or later, any Linux distribution
BIT                     64-bit Operating System
TECHNOLOGY              Python 3.12.0
PROGRAMMING LANGUAGE    Python
IDE                     Colab Notebook

3.2. Design
METHODOLOGY
1. Data Collection
✓ Determine what data is needed for training the ML model. This includes factors like
historical house prices, features of the houses (e.g., size, number of bedrooms, location,
amenities), and any other relevant information.

2. Data Preprocessing
✓ Clean the dataset to handle missing values, inconsistencies, and outliers.
✓ Standardize data formats and ensure uniformity across different attributes.

3. Exploratory Data Analysis (EDA)


✓ Utilize the numpy and pandas libraries to perform exploratory data analysis on the
preprocessed dataset.
✓ Analyze the columns area_type, availability, location, size, society, and total_sqft.

4. Model Development
✓ Once you have a satisfactory model, deploy it in a production environment where it
can be used to predict house prices.

5. Model Evaluation
✓ Evaluate the performance of the trained models using appropriate evaluation
metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root
Mean Squared Error (RMSE), etc. This helps in selecting the best-performing
model.

6. Model Serialization and Storage

✓ Serialize the trained house price prediction model using the pickle library for efficient
storage and retrieval.

7. Web Application Development
✓ Continuously monitor the performance of the deployed model and update it as
necessary to adapt to changes in data patterns or to improve its accuracy over time.
✓ Document all steps involved in the project, including data sources, preprocessing steps,
model selection, training, evaluation results, and deployment procedures for future
reference and replication.

8. Testing and Deployment

✓ Conduct thorough testing of the House price prediction system to ensure
functionality, performance, and usability.
✓ Deploy the House price prediction system on a web server or cloud platform to
make it accessible to users.
✓ Monitor system performance and user feedback post-deployment to identify any
issues or areas for improvement.
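
Step 6 of the methodology above (model serialization and storage) can be sketched with the standard pickle module. The "model" here is a hypothetical dictionary of linear-regression coefficients standing in for whatever trained object the project produces; the file name and the predict_price helper are likewise assumptions.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for a trained model: intercept and coefficients of a fitted
# linear regression (hypothetical values).
trained_model = {
    "intercept": 10_000.0,
    "coefficients": np.array([50.0, 5_000.0]),  # per sqft, per bedroom
}


def predict_price(model, features):
    """Apply the stored linear model to one house's feature vector."""
    return model["intercept"] + float(np.dot(model["coefficients"], features))


# Serialize the model to disk.
model_path = os.path.join(tempfile.mkdtemp(), "house_price_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(trained_model, f)

# Later (for example inside the web application), load it back and predict.
with open(model_path, "rb") as f:
    restored_model = pickle.load(f)

estimate = predict_price(restored_model, [1000, 2])
print(estimate)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.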

3.2.1. Use Case Diagram

3.2.2. Sequence Diagram

3.2.3. Data Flow Diagram

IMPLEMENTATION

IDE Used in the Project –

1. Colaboratory Notebook - Google Colaboratory, popularly known as Colab, is a web IDE for
Python that was released by Google in 2017.
Colab is an excellent tool for data scientists to execute Machine Learning and Deep Learning
projects with cloud storage capabilities. Colab is basically a cloud-based Jupyter notebook
environment that requires no setup. What’s more, it provides its users free access to high
compute resources such as GPUs and TPUs that are essential to training models quickly and
more efficiently.
Google Colab is built around Project Jupyter code and hosts Jupyter notebooks without
requiring any local software installation. But while Jupyter notebooks support multiple
languages, including Python, Julia and R, Colab currently only supports Python.

Google Colaboratory, or Colab, is an as-a-service version of Jupyter Notebook that enables you to write
and execute Python code through your browser. Jupyter Notebook is a free, open source creation from
Project Jupyter.

Datasets Used in the Project –

1. House_size.csv
2. House_location.csv
3. House_area_type.csv
4. House_Data.csv

Steps to Develop the Project –

1. Data Collection
• Obtain a dataset containing house information from sources like Kaggle.
• Ensure the dataset includes area_type, availability, location, size, society, and total_sqft.

2. Data Preprocessing
• Load Dataset - Load the dataset containing information about houses, such as area
type, location, and size.
• Clean Data - Handle missing values, remove duplicates, and format data
appropriately.
• Transform Data - Convert categorical variables into numerical representations if
needed.
• Normalize Values - Normalize numerical data to ensure consistency.

3. Feature Engineering

We first look into dimensionality reduction, where we consider the correlation between 'total_rooms',
'total_bedrooms', and 'households'.

By understanding their relationships, we can identify whether any of these features can be combined or
reduced without losing valuable information.

For numerical variables, the graph below shows the scatter plots of the top 6 numerical features with the
highest correlation with the house price. We can see that for the first five of them, there are very clear and
strong linear relationships.

4. Building Prediction Model

• Implement regression techniques such as linear regression, decision tree regression,
or boosted approaches such as XGBoost.
• Utilize NumPy for efficient computations and data manipulation required for model
development.
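
The correlation check described under the feature engineering step can be sketched with pandas as follows. The data is synthetic: the way 'total_rooms' and 'total_bedrooms' are generated from 'households', and the extra columns median_income and house_price, are assumptions chosen only to make the pattern visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

# Assumed relationships: rooms and bedrooms scale with households plus noise.
households = rng.integers(100, 1000, n)
total_rooms = households * 5 + rng.normal(0, 100, n)
total_bedrooms = households * 1.1 + rng.normal(0, 30, n)
# Hypothetical extra feature that drives the (synthetic) price.
median_income = rng.uniform(1, 10, n)
house_price = 40_000 * median_income + rng.normal(0, 20_000, n)

df = pd.DataFrame({
    "total_rooms": total_rooms,
    "total_bedrooms": total_bedrooms,
    "households": households,
    "median_income": median_income,
    "house_price": house_price,
})

# Pairwise Pearson correlations between all numeric columns.
correlations = df.corr()
size_features = ["total_rooms", "total_bedrooms", "households"]
print(correlations.loc[size_features, size_features].round(2))

# Rank features by the strength of their correlation with the target.
price_correlation = (
    correlations["house_price"]
    .drop("house_price")
    .abs()
    .sort_values(ascending=False)
)
print(price_correlation.round(2))
```

When several features are almost perfectly correlated with each other, as the three size-related columns are here, keeping one of them or a ratio between them is a common way to reduce dimensionality.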

Python Libraries Used in the Project -

1. Pandas - Pandas is a powerful Python library for data manipulation and analysis. It provides
data structures like DataFrame and Series, allowing users to handle structured data
effectively. Pandas simplifies tasks such as cleaning, filtering, and transforming data. With
its intuitive syntax and comprehensive functionality, Pandas is widely used in data science,
machine learning, and statistical analysis workflows.
1. pd.read_csv() - This function is used to read data from a CSV file into a Pandas
DataFrame. It takes in the path to the CSV file as input and returns a DataFrame
containing the data.
2. DataFrame.head() - This function is used to view the first n rows of a DataFrame.
By default, it returns the first 5 rows, but you can specify the number of rows to
display.
3. DataFrame.tail() - This function is used to view the last n rows of a DataFrame. By
default, it returns the last 5 rows, but you can specify the number of rows to display.
4. DataFrame.info() - This function provides a concise summary of a DataFrame,
including the data types of each column and the number of non-null values. It's useful
for quickly understanding the structure of the data.
5. DataFrame.merge() - This function is used to merge two DataFrames based on a
common column or index. It performs a database-style join operation, allowing you
to combine data from different sources.
6. DataFrame.plot() - This function is used to create various types of plots directly from
a DataFrame. It provides a convenient way to visualize data using matplotlib under
the hood.
7. DataFrame.pivot_table() - This function creates a pivot table from a DataFrame,
allowing you to summarize and aggregate data based on one or more columns.
8. DataFrame.dropna() - This function is used to remove rows or columns with missing
values (NaN). It provides options to drop rows or columns based on different criteria.
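
A short sketch of several of the pandas functions listed above, run on small in-memory CSV text instead of the project's files. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# In-memory CSV text standing in for files on disk (hypothetical data).
houses_csv = """house_id,location,total_sqft
1,A,1200
2,B,
3,A,1800
"""
prices_csv = """house_id,price
1,60
2,45
3,95
"""

# pd.read_csv() accepts a path or any file-like object.
houses = pd.read_csv(io.StringIO(houses_csv))
prices = pd.read_csv(io.StringIO(prices_csv))

print(houses.head(2))   # first two rows
print(houses.tail(1))   # last row
houses.info()           # column dtypes and non-null counts

# Database-style join on the shared house_id column.
merged = houses.merge(prices, on="house_id")

# Drop the row whose total_sqft is missing.
cleaned = merged.dropna()

# Average price per location.
summary = cleaned.pivot_table(values="price", index="location", aggfunc="mean")
print(summary)
```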

2. Numpy - NumPy is a fundamental Python library for numerical computing. It provides
support for multi-dimensional arrays and matrices, along with a variety of mathematical
functions to operate on these arrays efficiently. NumPy's capabilities include array
manipulation, linear algebra operations, Fourier transform, and random number generation.
It serves as a cornerstone for scientific computing and data analysis in Python.

1. numpy.array() - This function creates a NumPy array from a Python list or tuple. It
allows you to create multi-dimensional arrays easily.
2. numpy.arange() - This function returns an array with evenly spaced values within a
specified range. It is similar to Python's range() function but returns an array instead
of a range object.
3. numpy.zeros() - This function creates an array filled with zeros of a specified shape.
4. numpy.ones() - This function creates an array filled with ones of a specified shape.
5. numpy.sum() - This function computes the sum of array elements over a specified
axis or the whole array if no axis is specified.
6. numpy.random.rand() - This function generates random numbers from a uniform
distribution over the half-open interval [0, 1).
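
The NumPy functions listed above can be tried out as follows; the values are arbitrary and chosen only to keep the results easy to check by hand.

```python
import numpy as np

# numpy.array(): build a 2 x 3 array from nested lists.
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# numpy.arange(): evenly spaced values, stop value excluded.
steps = np.arange(0, 10, 2)

# numpy.zeros() / numpy.ones(): arrays of a given shape.
zeros = np.zeros((2, 3))
ones = np.ones(3)

# numpy.sum(): whole array, or along one axis.
total = np.sum(matrix)
column_sums = np.sum(matrix, axis=0)

# numpy.random.rand(): uniform samples in [0, 1).
samples = np.random.rand(4)

print(steps, total, column_sums, samples)
```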

3. Seaborn - Seaborn is a visualization library for statistical graphics plotting in Python. It
provides attractive default styles and color palettes for statistical plots. It is built on top of
the matplotlib library and is also closely integrated with the data structures from pandas.

Seaborn aims to make visualization the central part of exploring and understanding data. It provides
dataset-oriented APIs so that we can switch between different visual representations for the same
variables for a better understanding of the dataset.

Different categories of plot in Seaborn

• Relational plots: This plot is used to understand the relation between two variables.

• Categorical plots: This plot deals with categorical variables and how they can be visualized.

• Distribution plots: This plot is used for examining univariate and bivariate distributions.

• Regression plots: The regression plots in Seaborn are primarily intended to add a visual guide that
helps to emphasize patterns in a dataset during exploratory data analyses.

• Matrix plots: These plots display data as color-encoded matrices, for example heatmaps and cluster maps.

• Multi-plot grids: It is a useful approach to draw multiple instances of the same plot on different
subsets of the dataset.

4. Cosine Similarity - Cosine similarity is a metric, available in Python through libraries such
as scikit-learn, that measures the cosine of the angle between a pair of vectors in a
multi-dimensional space. It is commonly used in information retrieval and recommendation systems
to assess the similarity between items or documents based on their feature vectors. This metric
ranges from -1 (vectors pointing in opposite directions) through 0 (orthogonal vectors) to 1
(vectors pointing in the same direction).
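
For reference, cosine similarity can be computed in a few lines of NumPy, as sketched below. The helper name cosine_similarity and the sample vectors are illustrative; the function assumes neither vector is all zeros.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])
orthogonal = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 2], [-1, -2])

print(same_direction, orthogonal, opposite)
```

Because the dot product is divided by both vector lengths, the result depends only on direction, not on magnitude.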

EXPECTED OUTCOME

House price prediction can help the developer determine the selling price of a house and can help
the customer to arrange the right time to purchase a house. There are three factors that influence the
price of a house which include physical conditions, concept and location.

CONCLUDING REMARKS

In conclusion, this report explored house price prediction using machine learning in Python. By applying
several algorithms and data preprocessing techniques, we showed how predictive models can be developed
to estimate house prices. Feature engineering proved important for model performance, as it allowed us to
extract meaningful insights from the dataset and capture essential patterns.

The approaches for predicting property prices keep changing along with the field of machine
learning. Housing market predictions will likely grow more accurate and effective as new algorithms
and data become accessible. A machine learning model using the linear regression algorithm is thus
helpful in predicting house prices for real estate customers. Here we have used a supervised learning
approach, which is expected to yield the best possible result.

Thus, the machine learning model to predict the house price based on the given dataset was executed
successfully using the XGBoost regressor (a gradient-boosted ensemble of decision trees, which gave a lower
error than regular linear regression). The model further helps people understand whether a place is suited to
them based on the correlation heatmap. It also helps people who want to sell a house choose the best time
for greater profit.

Given an appropriate dataset, the price of a house in any location can be predicted with low error.

REFERENCES AND BIBLIOGRAPHY

• https://www.python.org/
• https://www.kaggle.com/
• https://pandas.pydata.org/
• https://seaborn.pydata.org/
• https://numpy.org/
• https://www.youtube.com/
