
Incinerator – Data Analysis

Submitted by: Riya Chhabra


Background study about the Problem Statement
To locate a new incinerator site near our city, we must choose a location whose
presence has as little impact as possible on the houses in the vicinity. If the site
does affect nearby houses, it will affect their prices too. In this case study we
determine the impact of proximity to an incinerator site on house prices.

Problem Statement

1) To find out how much being near or far from the incinerator affects house
prices.

2) To identify the impact of age, land, area, and incinerator distance from the house
on the selling price.

3) To analyse the effect of factors such as rooms, bathrooms, wind, cbd and age of
the house on the selling price.

4) To analyse whether house prices vary according to the year in the data (i.e.,
1978 & 1981) and the condition of being near/far in each of those years.

5) To identify how much the number of rooms and bathrooms in the house affects
the area of the house.

About the Dataset


I have taken a dataset that records house prices together with other parameters
such as distance to the incinerator site, property area, distance to the city
centre, distance to the ring road, the land associated with a property, number of
rooms, number of baths and age of the house. The house prices were recorded in
two years (i.e., 1978 and 1981). On the basis of this analysis, a model for
predicting the price of a house is constructed.
Data Overview
The Dataset is derived from information collected about house prices. The
following describes the dataset columns:

1. year -- 1978 or 1981
2. age -- age of house
3. agesq -- age^2
4. nbh -- neighborhood #, 1 to 6
5. cbd -- dist. to central bus. dstrct, feet
6. intst -- dist. to interstate, feet
7. lintst -- log(intst)
8. price -- selling price
9. rooms -- # rooms in house
10. area -- square footage of house
11. land -- square footage lot
12. baths -- # bathrooms
13. dist -- dist. from house to incinerator, feet
14. ldist -- log(dist)
15. wind -- perc. time wind incin. to house
16. lprice -- log(price)
17. y81 -- 1 if year == 1981
18. larea -- log(area)
19. lland -- log(land)
20. y81ldist -- y81*ldist
21. lintstsq -- lintst^2
22. y81nrinc -- y81*nearinc
23. rprice -- price in 1981 (dollars)
24. lrprice -- log(rprice)

Data Pre-Processing
This dataset does not need cleaning; it was already clean when downloaded from
Kaggle. For completeness, I will still show the number of null or missing values
in the dataset. We also need to understand the shape of the dataset.
First, let us look at the dataset itself.
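
A minimal sketch of this first look, assuming the data is loaded with pandas. The file name in the comment is hypothetical, and the three rows below are made-up stand-ins rather than values from the dataset.

```python
import io
import pandas as pd

# Made-up stand-in rows. With the real file this would be something like
# df = pd.read_csv("incinerator.csv")  # hypothetical file name
sample_csv = io.StringIO(
    "year,age,rooms,baths,price\n"
    "1978,48,7,1,60000\n"
    "1978,83,6,2,40000\n"
    "1981,58,6,2,34000\n"
)
df = pd.read_csv(sample_csv)

print(df.head())   # first rows of the dataset
print(df.shape)    # (number of rows, number of columns)
```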

Data Description (Statistical analysis)
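
A statistical description of this kind is commonly produced with pandas' `describe()`. The sketch below assumes that approach; the values are made-up stand-ins, not rows from the dataset.

```python
import pandas as pd

# Made-up stand-in values, not rows from the dataset.
df = pd.DataFrame({
    "price": [60000, 40000, 34000, 63900],
    "rooms": [7, 6, 6, 5],
    "age": [48, 83, 58, 11],
})

# count, mean, std, min, quartiles and max for every numeric column
summary = df.describe()
print(summary)
```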

Check and Treat Missing Values
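
A minimal sketch of a missing-value check with pandas. The frame is a made-up stand-in with one gap injected on purpose, and the median fill is only one possible treatment (an assumption, not necessarily what the report did).

```python
import numpy as np
import pandas as pd

# Made-up stand-in frame with one missing value injected on purpose.
df = pd.DataFrame({
    "price": [60000.0, 40000.0, 34000.0, 63900.0],
    "land": [4578.0, np.nan, 4212.0, 4840.0],
})

missing = df.isnull().sum()   # number of missing values per column
print(missing)

# One possible treatment (an assumption): fill gaps with the column median.
df_filled = df.fillna(df.median(numeric_only=True))
print(df_filled)
```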

Check the data types of the variables
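
The data types can be inspected as sketched below, assuming a pandas DataFrame. The two rows are made-up stand-ins.

```python
import pandas as pd

# Made-up stand-in rows, not values from the dataset.
df = pd.DataFrame({
    "year": [1978, 1981],
    "price": [60000.0, 34000.0],
    "rooms": [7, 6],
})

print(df.dtypes)   # one dtype per column
df.info()          # dtypes plus non-null counts and memory usage
```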


Exploratory Data Analysis

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets
to summarize their main characteristics, often with visual methods. A statistical
model can be used or not, but primarily EDA is for seeing what the data can tell us
beyond the formal modeling or hypothesis testing task.

Now let’s plot some of the variables against price in subplots.
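
A sketch of such subplots with matplotlib, one scatter panel per variable against price. The data is synthetic and the choice of `area`, `rooms` and `dist` is an assumption made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data, not the report's dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "area": rng.uniform(800, 4000, 50),
    "rooms": rng.integers(4, 10, 50),
    "dist": rng.uniform(5000, 40000, 50),
})
df["price"] = 20 * df["area"] + 0.5 * df["dist"] + rng.normal(0, 5000, 50)

features = ["area", "rooms", "dist"]
fig, axes = plt.subplots(1, len(features), figsize=(12, 3.5), sharey=True)
for ax, col in zip(axes, features):
    ax.scatter(df[col], df["price"], s=10)   # one panel per variable
    ax.set_xlabel(col)
axes[0].set_ylabel("price")
fig.tight_layout()
# In an interactive session, plt.show() displays the figure.
```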
Now let’s see the distribution plots of different variables.

Now let’s see the count plot for the years 1978 and 1981.
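
The numbers behind a count plot are simply the row counts per year. A minimal sketch with pandas, using a made-up `year` column rather than the dataset's actual counts:

```python
import pandas as pd

# Made-up stand-in column, not the dataset's actual counts.
df = pd.DataFrame({"year": [1978, 1978, 1981, 1978, 1981]})

counts = df["year"].value_counts().sort_index()   # rows per year
print(counts)
# counts.plot(kind="bar") would draw these counts as bars (needs matplotlib).
```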
Now let’s find out the relationship between these variables.
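
One common way to look at such relationships is a Pearson correlation matrix. The sketch below assumes that approach and uses synthetic data, so the printed correlations say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with a built-in link between area and price.
rng = np.random.default_rng(1)
area = rng.uniform(800, 4000, 200)
df = pd.DataFrame({
    "area": area,
    "rooms": np.round(area / 500 + rng.normal(0, 0.5, 200)),
    "price": 25 * area + rng.normal(0, 8000, 200),
})

corr = df.corr()   # pairwise Pearson correlations
print(corr["price"].sort_values(ascending=False))
```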
Model Building
Problem Statement - 1: To find out how much being near or far from the
incinerator affects house prices

Shuffle and Split Data

For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.

Splitting into train and test datasets
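
The shuffle-and-split step can be sketched with scikit-learn's `train_test_split`. The toy arrays, the 80/20 split and the `random_state` value are illustrative assumptions, not the report's settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 toy rows, 2 features
y = np.arange(10)                  # toy target

# test_size and random_state are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the test set on every run.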

Applying Linear Regression
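
A sketch of this step under stated assumptions: the target is log price (`lprice`) and the only regressor is a 0/1 indicator `nearinc` for houses near the incinerator (that name appears in the column list only through `y81nrinc`). The data is synthetic, with a built-in effect of -0.30, so the fitted coefficient illustrates the method and not the report's result.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: log price with a built-in -0.30 shift for "near" houses.
rng = np.random.default_rng(0)
n = 400
nearinc = rng.integers(0, 2, n)                     # 1 = near, 0 = far
lprice = 11.3 - 0.30 * nearinc + rng.normal(0, 0.2, n)

X = nearinc.reshape(-1, 1)                          # sklearn expects a 2-D X
X_train, X_test, y_train, y_test = train_test_split(
    X, lprice, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print("intercept:", model.intercept_)
print("nearinc coefficient:", model.coef_[0])       # near-vs-far gap in log price
```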


Result:
From the coefficient of the above linear model, a house near the dumpsite (i.e.,
within 15600 ft) sells for around 3.9% less than a house farther away. So houses
near the incinerator have lower prices and houses far from it have higher prices.
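
If the target of the regression is log(price) (an assumption, since the model specification is not shown), the coefficient on a 0/1 variable is only approximately a percentage change. The exact change is 100 * (exp(b) - 1). The helper name and the coefficient values below are illustrative.

```python
import math

def pct_effect(coef: float) -> float:
    """Exact percentage change in price implied by a coefficient on a
    0/1 variable when the regression target is log(price)."""
    return 100.0 * (math.exp(coef) - 1.0)

# Illustrative coefficients, not the report's estimates.
for b in (-0.04, -0.40):
    print(f"coef {b:+.2f} -> {pct_effect(b):+.2f}% change in price")
```

For small coefficients the approximation is close; for larger ones it overstates the size of a price drop.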

Problem Statement - 2: To identify the impact of age, land, area, and
incinerator distance from the house on the selling price.

Shuffle and Split Data

For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.
Splitting into X and Y datasets

Splitting into train and test datasets

Applying Linear Regression
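
A sketch of the X/Y split and the fit for this problem, assuming a log-log specification: `lprice` regressed on `larea`, `lland`, `ldist` and `age`. The data and the coefficients used to generate it are invented, so the output is not the report's estimate.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; column names follow the dataset description.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "larea": np.log(rng.uniform(800, 4000, n)),
    "lland": np.log(rng.uniform(2000, 60000, n)),
    "ldist": np.log(rng.uniform(5000, 40000, n)),
    "age": rng.uniform(0, 100, n),
})
# Generating coefficients are invented for the sketch.
df["lprice"] = (4.0 + 0.60 * df["larea"] + 0.10 * df["lland"]
                + 0.15 * df["ldist"] - 0.01 * df["age"]
                + rng.normal(0, 0.1, n))

X = df[["larea", "lland", "ldist", "age"]]   # features
y = df["lprice"]                             # target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs.round(3))
```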


Result:
From the above model, assuming all other variables are held constant, we find that:
1) A 1% increase in area is associated with a price increase of about 0.63%.
2) A 1% increase in land is associated with a price increase of about 0.4%.
3) An increase in the distance from the house to the incinerator (dist, in feet) is
associated with a price increase of 12%.
4) Each additional year of house age is associated with a 1% decrease in price.
From the coefficient of the above linear model, a house near the dumpsite (i.e.,
within 15600 ft) sells for around 19.4% less than a house farther away.
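
The percentage readings above follow from the usual interpretation of coefficients when the target is the log of price. A short sketch, assuming a model of the form below (the report's exact specification is not shown, so the symbols are illustrative):

```latex
\begin{aligned}
\log(\text{price}) &= \beta_0 + \beta_1 \log(\text{area}) + \beta_2\,\text{age} + u \\
\%\Delta\text{price} &\approx \beta_1 \cdot \%\Delta\text{area}
  && \text{(log regressor: } \beta_1 \text{ is an elasticity)} \\
\%\Delta\text{price} &\approx 100\,\beta_2 \cdot \Delta\text{age}
  && \text{(level regressor, small } \beta_2\text{)}
\end{aligned}
```

Under this reading, a coefficient of 0.5 on log(area) would mean that a 1% larger area goes with roughly a 0.5% higher price, and a coefficient of -0.01 on age would mean roughly a 1% lower price per additional year.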

Problem Statement - 3: To analyse the effect of factors such as rooms, bathrooms,
wind, cbd and age of the house on the selling price.

Shuffle and Split Data

For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.

Splitting into X and Y datasets

Splitting into train and test datasets


Applying Linear Regression
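
A sketch of a multiple regression on the factors named in this problem statement, with a check on the held-out test rows. The data is synthetic, the target is assumed to be log price, and the generating coefficients are invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in features; names follow the dataset description.
rng = np.random.default_rng(3)
n = 400
X = pd.DataFrame({
    "rooms": rng.integers(4, 10, n),
    "baths": rng.integers(1, 4, n),
    "wind": rng.choice([3, 5, 7, 11], n),
    "cbd": rng.uniform(1000, 35000, n),
    "age": rng.uniform(0, 100, n),
})
# Invented log-price relation used only to generate the target.
y = (10.5 + 0.07 * X["rooms"] + 0.25 * X["baths"]
     - 0.005 * X["age"] + rng.normal(0, 0.15, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

r2_test = model.score(X_test, y_test)   # R^2 on rows the model has not seen
print(dict(zip(X.columns, model.coef_.round(3))))
print("test R^2:", round(r2_test, 3))
```

Scoring on the test subset rather than the training subset gives a less optimistic view of how well the fitted coefficients generalise.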

Result:
If the year is 1981, the price increases by 44%, assuming all other variables are
held constant. For every foot of increase in distance from the incinerator, the price
increases by 7.5%, assuming all other variables are held constant. For every
additional room, the price increases by 7.3%, assuming all other variables are held
constant. For every additional bathroom, the price increases by 26%, assuming all
other variables are held constant.
Problem Statement - 4: To analyse whether house prices vary according to
the year in the data (i.e., 1978 & 1981) and the condition of being near/far in each
of those years.

Shuffle and Split Data

For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.

Splitting into X and Y datasets

Splitting into train and test datasets


Applying Linear Regression
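
A sketch of a year-by-proximity regression, assuming log price is regressed on `y81`, a 0/1 indicator `nearinc`, and their interaction `y81nrinc`. The data is synthetic with invented effects, and for brevity the model is fitted on the full synthetic sample rather than on a train/test split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data with invented effects.
rng = np.random.default_rng(4)
n = 2000
df = pd.DataFrame({
    "y81": rng.integers(0, 2, n),        # 1 if the sale year is 1981
    "nearinc": rng.integers(0, 2, n),    # 1 if the house is near the site
})
df["y81nrinc"] = df["y81"] * df["nearinc"]   # interaction term
df["lprice"] = (11.0 + 0.45 * df["y81"] - 0.30 * df["nearinc"]
                - 0.10 * df["y81nrinc"] + rng.normal(0, 0.2, n))

features = ["y81", "nearinc", "y81nrinc"]
model = LinearRegression().fit(df[features], df["lprice"])
coefs = dict(zip(features, model.coef_))
for name, value in coefs.items():
    print(f"{name}: {value:+.3f}")
```

In this setup the `nearinc` coefficient is the near-versus-far gap in 1978, `y81` is the change over time for far houses, and `y81nrinc` is the additional gap for near houses in 1981.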

Result:
From the coefficients of the above linear model, a house near the dumpsite (i.e.,
within 15600 ft) sells for around 38.9% less than a house farther away.
If the year is 1981, the price increases by 46%, assuming all other variables are
held constant. For houses near the dumpsite (i.e., within 15600 ft) in 1981, there is
an additional price drop of around 0.6%.
Problem Statement - 5: To identify how much the number of rooms and
bathrooms in the house affects the area of the house.

Shuffle and Split Data

For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.

Splitting into X and Y datasets

Splitting into train and test datasets


Applying Linear Regression
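
A sketch of this regression, assuming the target is log area (`larea`) and the features are `rooms` and `baths`. The data and generating coefficients are invented, and the prediction for a new house is a rough illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data with invented effects on log area.
rng = np.random.default_rng(5)
n = 500
X = pd.DataFrame({
    "rooms": rng.integers(4, 10, n),
    "baths": rng.integers(1, 4, n),
})
larea = 6.6 + 0.06 * X["rooms"] + 0.30 * X["baths"] + rng.normal(0, 0.1, n)

model = LinearRegression().fit(X, larea)
print(dict(zip(X.columns, model.coef_.round(3))))

# Rough predicted area (square feet) for a hypothetical 6-room, 2-bath house.
new_house = pd.DataFrame({"rooms": [6], "baths": [2]})
pred_area = float(np.exp(model.predict(new_house)[0]))
print(round(pred_area))
```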

Result:
For every additional bathroom, the area of the house increases by 34%, assuming all
other variables are held constant. For every additional room, the area increases by
6%, assuming all other variables are held constant.
