Problem Statement
1) To estimate how much being near to or far from the incinerator affects
house prices.
2) To identify the effect of the house's age, land, area, and distance from the
incinerator on the selling price.
3) To analyse the effect of rooms, bathrooms, wind, distance to the CBD, and age of
the house on the selling price.
4) To analyse whether house prices vary by year in the data (i.e., 1978 &
1981) and by whether the house is near or far from the incinerator in each year.
5) To identify how much the number of rooms and bathrooms in the house affects
the area of the house.
1. year -- 1978 or 1981
2. age -- age of house
3. agesq -- age^2
4. nbh -- neighborhood #, 1 to 6
5. cbd -- dist. to central bus. dstrct, feet
6. intst -- dist. to interstate, feet
7. lintst -- log(intst)
8. price -- selling price
9. rooms -- # rooms in house
10. area -- square footage of house
11. land -- square footage lot
12. baths -- # bathrooms
13. dist -- dist. from house to incinerator, feet
14. ldist -- log(dist)
15. wind -- perc. time wind incin. to house
16. lprice -- log(price)
17. y81 -- 1 if year == 1981
18. larea -- log(area)
19. lland -- log(land)
20. y81ldist -- y81*ldist
21. lintstsq -- lintst^2
22. y81nrinc -- y81*nearinc
23. rprice -- price in 1981 (dollars)
24. lrprice -- log(rprice)
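Several entries in the list above are simple transformations of other columns (logs, squares, and products). As a rough sketch of how they relate, the helper below rebuilds a few of them for a single record. The function name add_derived and the example values are made up for illustration; they are not taken from the dataset.

```python
import math

def add_derived(house: dict) -> dict:
    """Return a copy of `house` with some derived columns from the variable list."""
    out = dict(house)
    out["agesq"] = house["age"] ** 2                  # agesq -- age^2
    out["y81"] = 1 if house["year"] == 1981 else 0    # y81 -- 1 if year == 1981
    # Log transforms: lprice, larea, lland, ldist, lintst
    for col in ("price", "area", "land", "dist", "intst"):
        out["l" + col] = math.log(house[col])
    out["y81ldist"] = out["y81"] * out["ldist"]       # y81ldist -- y81*ldist
    out["lintstsq"] = out["lintst"] ** 2              # lintstsq -- lintst^2
    return out

# Hypothetical record (illustrative values only)
example = {"year": 1981, "age": 10, "price": 60000.0, "area": 1500.0,
           "land": 4000.0, "dist": 10000.0, "intst": 1000.0}
derived = add_derived(example)
print(derived["agesq"], derived["y81"], round(derived["lprice"], 3))
```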
Data Pre-Processing
This dataset does not need cleaning: it was already clean when downloaded
from Kaggle. To confirm this, we show the number of null or missing values
in each column. We also check the shape of the dataset.
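A minimal sketch of this check with pandas is shown below. The small DataFrame is a made-up stand-in (with one missing value added on purpose so the count is visible); with the real data, df would be loaded from the downloaded file instead.

```python
import pandas as pd

# Tiny illustrative frame; values are made up, not taken from the dataset.
df = pd.DataFrame({
    "year": [1978, 1978, 1981],
    "price": [60000.0, 40000.0, None],   # one missing value on purpose
    "rooms": [7, 6, 6],
})

n_rows, n_cols = df.shape                 # shape of the dataset
missing_per_column = df.isnull().sum()    # null / missing values per column

print("rows:", n_rows, "columns:", n_cols)
print(missing_per_column)
```

If every entry of missing_per_column is zero, no imputation or row dropping is needed.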
First, let’s look at the dataset.
Now let’s plot some of the variables against price.
Now let’s see the distribution plots of different variables.
Now let’s see the count plot for the years 1978 and 1981.
Now let’s find out the relationships between these variables.
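One common way to look at the relationships between numeric variables is a Pearson correlation matrix. The sketch below uses NumPy on three short made-up arrays; the numbers are illustrative only, and the original analysis may have used a different tool or plot.

```python
import numpy as np

# Made-up values for three columns (illustration only)
price = np.array([60000.0, 40000.0, 34000.0, 63900.0, 44000.0])
rooms = np.array([7.0, 6.0, 6.0, 5.0, 5.0])
dist = np.array([10700.0, 11000.0, 11500.0, 11900.0, 12100.0])

# Each row passed to corrcoef is treated as one variable.
corr = np.corrcoef(np.vstack([price, rooms, dist]))

labels = ["price", "rooms", "dist"]
for name, row in zip(labels, corr):
    print(name, np.round(row, 2))
```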
Model Building
Problem Statement - 1 : To estimate how much being near to or far from
the incinerator affects house prices
For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.
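The shuffle-then-split step can be sketched with NumPy as follows. The function name shuffle_split, the 80/20 ratio, and the seed are assumptions made for illustration; scikit-learn's train_test_split does the same job and shuffles by default.

```python
import numpy as np

def shuffle_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle rows with a fixed seed, then split into train and test parts."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # random row order
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 rows, 2 features; row i is [2*i, 2*i + 1] and y[i] = i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = shuffle_split(X, y, test_fraction=0.2, seed=42)
print(X_train.shape, X_test.shape)
```

Using the same index arrays for X and y keeps each feature row aligned with its target after shuffling.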
Splitting into X and Y datasets
Result:
If the year is 1981, the price is about 44% higher, holding all other variables constant. For
every foot of additional distance from the incinerator, the price increases by 7.5%, holding all
other variables constant (if distance enters the model as ldist, the coefficient is instead an
elasticity: a 1% increase in distance raises the price by about 0.075%). Each additional room
raises the price by about 7.3%, and each additional bathroom raises it by about 26%, holding
all other variables constant.
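The percentages above read the regression coefficients directly as percent changes. As a hedged illustration, assuming the dependent variable is lprice (log of price) and that the coefficients are 0.44, 0.073 and 0.26 (inferred from the quoted percentages, since the fitted model itself is not shown here), the exact percent change for a one-unit increase is (exp(b) - 1) * 100. The helper names below are hypothetical.

```python
import math

def pct_effect(coef: float) -> float:
    """Exact % change in price for a one-unit increase in a regressor,
    assuming the dependent variable is log(price)."""
    return (math.exp(coef) - 1.0) * 100.0

def pct_effect_loglog(coef: float, pct_change_x: float = 1.0) -> float:
    """% change in price when x rises by pct_change_x percent,
    assuming both price and x enter the model in logs."""
    return ((1.0 + pct_change_x / 100.0) ** coef - 1.0) * 100.0

# Coefficients assumed from the percentages quoted in the text
for name, b in [("y81", 0.44), ("rooms", 0.073), ("baths", 0.26)]:
    print(name, round(pct_effect(b), 1))

# If distance enters as ldist, 0.075 is an elasticity rather than a per-foot effect
print("ldist", round(pct_effect_loglog(0.075), 3))
```

For small coefficients the approximation and the exact value are close; for larger ones, such as the year effect, the exact change is noticeably bigger than the coefficient suggests.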
Problem Statement - 4 : To analyse whether house prices vary by year in the
data (i.e., 1978 & 1981) and by whether the house is near or far from the
incinerator in each year.
For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.
Result:
From the coefficients of the linear model above, a house near the incinerator (i.e., within
15600 ft) sells for around 38.9% less than a house farther away, holding all other variables
constant. If the year is 1981, the price is about 46% higher, holding all other variables
constant. In 1981, being near the incinerator (i.e., within 15600 ft) is associated with a
further price drop of around 0.6%.
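A model with a year dummy, a near/far dummy, and their product (y81, nearinc, and y81nrinc in the variable list) is a difference-in-differences setup. The sketch below fits such a model by ordinary least squares with NumPy on synthetic data. The sample size, the "true" coefficients, and the noise level are invented for illustration and are not the results reported above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic dummies (illustration only)
y81 = rng.integers(0, 2, size=n)        # 1 if year == 1981
nearinc = rng.integers(0, 2, size=n)    # 1 if the house is near the incinerator
y81nrinc = y81 * nearinc                # interaction term

# Made-up coefficients: intercept, y81, nearinc, y81nrinc
true_beta = np.array([11.0, 0.45, -0.35, -0.05])
X = np.column_stack([np.ones(n), y81, nearinc, y81nrinc])
lprice = X @ true_beta + rng.normal(0.0, 0.1, size=n)

# Ordinary least squares fit
beta_hat, *_ = np.linalg.lstsq(X, lprice, rcond=None)

# The interaction coefficient equals the difference-in-differences of cell means
def cell_mean(year_flag, near_flag):
    mask = (y81 == year_flag) & (nearinc == near_flag)
    return lprice[mask].mean()

did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
print(np.round(beta_hat, 3), round(did, 3))
```

The nearinc coefficient is the near/far gap in the base year, and the y81nrinc coefficient is how much that gap changes in 1981.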
Problem Statement - 5 : To identify how much the number of rooms and
bathrooms in the house affects the area of the house.
For this section we will take the dataset and split the data into training and testing
subsets. Typically, the data is also shuffled into a random order when creating the
training and testing subsets to remove any bias in the ordering of the dataset.
Result:
Each additional bathroom is associated with an increase in area of about 34%, holding all
other variables constant. Each additional room is associated with an increase in area of
about 6%, holding all other variables constant.