Housing Prices in Boston
Group - 10
Akhil Augustine
Akhil Dev
Ardhra
Santhosh
Naveen Shaji
Thomas Kappen
Introduction
• This project aims to explore the key factors that influence housing prices
by analyzing the Boston Housing dataset using statistical and analytical
techniques such as descriptive statistics, hypothesis testing and regression
analysis.
• By applying these statistical techniques and identifying patterns in the data, the project
aims to support better decision-making in urban planning, housing development and real
estate investment. The key factors identified can help make housing more affordable and
inform citizens about local market trends.
Dataset Overview
Boston Housing Dataset: collected from US Census data.
The dataset is used here to analyze how various social, environmental
and economic factors affect housing prices.
The dataset consists of 14 columns (attributes) and 506 rows.
Research Questions
Objective: Determine the key factors that influence housing prices in Boston.
Research Questions
• Is there a significant difference in house prices based on proximity to the Charles River?
• Does the number of rooms affect housing prices?
• How does highway accessibility affect house prices?
• Implement a regression model to identify which features help predict housing prices.
Target variable (dependent): y = medv (median home value in $1000s)
Feature variables (independent): x = crim, zn, chas, nox, rm, dis, rad, tax, indus, age, ptratio, b, lstat
Descriptive Statistics
With a mean crime rate (crim) of 3.61 and a maximum of 88.98, the crime rate is highly right-skewed: a small number of
high-crime locations pull the average up substantially.
The average number of rooms (rm) is 6.28, which indicates that most homes are mid-sized, but rm also varies considerably
across the dataset, which can affect housing prices.
The large standard deviation and broad range (187 to 711) of property tax (tax) indicate significant variation in municipal levies
across several zones.
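The summary figures above (mean, maximum, spread, skew) can be reproduced with a short sketch like the one below. It assumes a column is available as a plain Python list of floats; the `summarize` helper and the toy sample are hypothetical, and the skewness measure is the Fisher-Pearson moment coefficient, which may differ slightly from whatever tool produced the original table.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for one numeric column (hypothetical helper)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments used for the Fisher-Pearson skewness coefficient.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,  # > 0 means a long right tail
    }


# Toy data (not the Boston values): one extreme value drags the mean far above
# the median, the same pattern the crime-rate column shows.
toy_crim = [0.1, 0.2, 0.3, 0.4, 50.0]
stats = summarize(toy_crim)
print(stats["mean"] > stats["median"], stats["skewness"] > 0)  # prints: True True
```

A mean well above the median, together with positive skewness, is the signature of a few extreme values rather than a generally high level.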
Hypothesis Test 1
Objective: Determine whether the average number of rooms (rm) is greater than 6.
H₀: μ ≤ 6
H₁: μ > 6
Assume a significance level of 5%.
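Test 1 is a right-tailed one-sample t-test. The sketch below computes t = (x̄ - μ₀) / (s / √n) with the standard library. Because the standard library has no t distribution, the critical value is approximated with the standard normal quantile; this is an assumption that is close for large samples such as n = 506 (scipy.stats.ttest_1samp with alternative="greater" gives the exact t-based result). The helper name and toy data are hypothetical.

```python
import math
import statistics


def one_sample_t_greater(sample, mu0, alpha=0.05):
    """Right-tailed one-sample t-test of H0: mu <= mu0 vs H1: mu > mu0 (hypothetical helper)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    t_stat = (mean - mu0) / se
    # Large-sample approximation: standard normal quantile instead of the
    # exact t quantile with n - 1 degrees of freedom.
    t_crit = statistics.NormalDist().inv_cdf(1 - alpha)
    return t_stat, t_crit, t_stat > t_crit


# Toy room counts (not the Boston data), repeated to mimic a larger sample.
rooms = [6.1, 6.5, 6.3, 6.8, 5.9, 6.4, 6.6, 6.2, 6.7, 6.0] * 10
t_stat, t_crit, reject = one_sample_t_greater(rooms, mu0=6.0)
print(round(t_crit, 3), reject)  # prints: 1.645 True
```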
Hypothesis Test 2
Determine whether there is a statistically significant difference in median home
values (medv) between houses near the Charles River (chas = 1) and
those that are not (chas = 0), at a 5% significance level.
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂
From the table, |t-stat| > t critical, so we reject H₀.
There is a significant difference in median home values between the two groups.
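Test 2 compares two independent groups. The slides do not state whether equal variances were assumed, so the sketch below assumes unequal variances (Welch's t-test) and returns the t statistic and the Welch-Satterthwaite degrees of freedom. The two-tailed critical value uses a normal approximation, which is rough when a group is small (only some houses border the river); in practice an exact value such as scipy.stats.t.ppf(1 - alpha / 2, df) is preferable. The helper name and toy values are hypothetical.

```python
import math
import statistics


def welch_t_test(group1, group2, alpha=0.05):
    """Two-tailed Welch t-test of H0: mu1 = mu2 vs H1: mu1 != mu2 (hypothetical helper)."""
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1) / n1  # squared standard error, group 1
    v2 = statistics.variance(group2) / n2  # squared standard error, group 2
    t_stat = (statistics.fmean(group1) - statistics.fmean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Normal approximation to the two-tailed critical value (1 - alpha/2 quantile).
    t_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return t_stat, df, t_crit, abs(t_stat) > t_crit


# Toy medv values in $1000s (not the Boston data), repeated to mimic larger groups.
near_river = [28.0, 31.5, 24.0, 35.0, 29.5, 33.0, 27.0, 30.5] * 5       # chas = 1
away_from_river = [21.0, 22.5, 19.0, 24.0, 20.5, 23.0, 18.5, 22.0] * 5  # chas = 0
t_stat, df, t_crit, reject = welch_t_test(near_river, away_from_river)
print(round(t_stat, 2), round(df, 1), reject)
```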
Hypothesis Test 3
Determine whether the median value of homes (medv) differs between houses
with low highway accessibility (rad <= 4) and those with high accessibility
(rad > 4).
H₀: μ₁ = μ₂
H₁: μ₁ ≠ μ₂
From the table, |t-stat| > t critical, so we reject H₀.
There is strong evidence that median home values differ between the groups,
with homes that have low highway access showing higher median values.
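For Test 3, every house must fall into exactly one group; a split such as rad <= 4 versus rad >= 4 would count houses with rad = 4 in both. The sketch below assumes the disjoint split rad <= 4 (low access) versus rad > 4 (high access) and that rows are available as dictionaries with hypothetical "rad" and "medv" keys; the helper name and rows are also hypothetical. The two resulting lists can then be passed to a two-sample t-test.

```python
def split_by_rad(rows, threshold=4):
    """Partition medv values into disjoint low/high highway-access groups (hypothetical helper)."""
    low_access = [r["medv"] for r in rows if r["rad"] <= threshold]
    high_access = [r["medv"] for r in rows if r["rad"] > threshold]  # strict, so no overlap
    return low_access, high_access


# Toy rows, not the Boston data.
rows = [
    {"rad": 1, "medv": 24.0},
    {"rad": 4, "medv": 26.5},
    {"rad": 5, "medv": 22.0},
    {"rad": 24, "medv": 15.0},
]
low, high = split_by_rad(rows)
print(len(low), len(high))  # prints: 2 2
```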
Correlation Matrix
• Homes with more rooms typically have much higher values, according to the strong positive correlation (rm
vs medv = +0.695).
• There is a strong negative correlation (lstat vs medv = -0.7376), which shows that median home values are
generally lower in areas with a higher proportion of lower-status population.
• Towns with higher student-teacher ratios typically have lower home values, which may indicate poorer
educational quality, according to a moderately negative correlation (ptratio vs medv = -0.508).
• chas shows no strong correlation with the other variables.
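The coefficients above are Pearson correlations. A minimal sketch of the formula r = cov(x, y) / (s_x · s_y) is shown below; the toy rm/medv pairs are made up, and in practice pandas DataFrame.corr() (Pearson by default) or statistics.correlation (Python 3.10+) computes the same quantity for a whole table or a pair of columns.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy rm/medv pairs, not the Boston data.
rm = [5.5, 6.0, 6.3, 6.8, 7.2]
medv = [15.0, 19.5, 21.0, 27.0, 33.5]
print(round(pearson_r(rm, medv), 3))
```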
Regression Analysis
y = 36.34 - 0.11*crim + 0.04*zn + 2.71*chas - 17.37*nox + 3.80*rm - 1.49*dis + 0.29*rad - 0.01*tax - 0.94*ptratio + 0.009*b - 0.52*lstat
(indus and age do not appear in the fitted equation.)
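The fitted equation can be applied directly to a new observation. The sketch below simply encodes the rounded coefficients reported in the equation (so predictions inherit that rounding); the feature values for the example house are made up for illustration and do not come from the dataset.

```python
# Rounded coefficients from the fitted regression equation (indus and age excluded).
INTERCEPT = 36.34
COEFFICIENTS = {
    "crim": -0.11, "zn": 0.04, "chas": 2.71, "nox": -17.37, "rm": 3.80,
    "dis": -1.49, "rad": 0.29, "tax": -0.01, "ptratio": -0.94,
    "b": 0.009, "lstat": -0.52,
}


def predict_medv(features):
    """Predicted median home value (in $1000s) for a dict of feature values."""
    return INTERCEPT + sum(coef * features[name] for name, coef in COEFFICIENTS.items())


# Hypothetical house, not a row from the dataset.
house = {
    "crim": 0.1, "zn": 0.0, "chas": 0, "nox": 0.5, "rm": 6.5, "dis": 4.0,
    "rad": 5, "tax": 300, "ptratio": 18.0, "b": 390.0, "lstat": 10.0,
}
print(round(predict_medv(house), 2))
```

Each coefficient is the change in predicted medv for a one-unit change in that feature with the others held fixed, e.g. one extra room adds about 3.80 (about $3,800).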
Coefficient of Determination (R²): 0.7405
This means that about 74% of the variance in housing prices (medv) is
explained by this model.
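R² compares the model's squared errors with those of a baseline that always predicts the mean: R² = 1 - SS_res / SS_tot. A minimal sketch with made-up numbers:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # total variation
    return 1 - ss_res / ss_tot


# Toy values, not the Boston data.
print(r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))
```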
Standard Error of Estimate: 4.736
Since medv is measured in $1000s, the typical prediction error is about $4,736,
which suggests the predictions of housing prices are reasonably accurate.
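The standard error of estimate is the square root of the residual sum of squares divided by the residual degrees of freedom, s_e = √(SSE / (n - k - 1)). Assuming the model uses the 11 predictors in the equation and all 506 rows, the divisor is 506 - 11 - 1 = 494. A minimal sketch with made-up numbers:

```python
import math


def std_error_of_estimate(actual, predicted, k):
    """Standard error of estimate sqrt(SSE / (n - k - 1)) for a model with k predictors."""
    n = len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(sse / (n - k - 1))


# Toy values: every residual is 1, n = 4, k = 1, so SSE / df = 4 / 2.
print(std_error_of_estimate([2, 3, 4, 5], [1, 2, 3, 4], k=1))
```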
Testing Validity of Model: Significance F = 6E-137
This p-value is far below 0.05, so the overall regression is significant.
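Significance F is the p-value of the overall F statistic, F = (R² / k) / ((1 - R²) / (n - k - 1)). The sketch below recovers F from the reported R², assuming n = 506 rows and the k = 11 predictors that appear in the equation; because R² is rounded to four decimals, the result is approximate. Turning F into a p-value needs the F distribution with (11, 494) degrees of freedom (e.g. scipy.stats.f.sf), which the standard library does not provide.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a regression with k predictors fitted on n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Values from the slides: R^2 = 0.7405, n = 506 rows, k = 11 predictors in the equation.
f_value = f_statistic(0.7405, n=506, k=11)
print(round(f_value, 2))  # prints: 128.15
```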
Testing Coefficients
Independent variables with p-value < 0.05 are significant.
(rm, rad and lstat are the most significant variables.)
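Each coefficient is tested with t = b / SE(b) against H₀: β = 0. The coefficient standard errors are not listed in the slides, so the inputs below are hypothetical. The two-sided p-value uses a normal approximation to the t distribution, which is an assumption that is close here because the residual degrees of freedom (494) are large.

```python
import statistics


def coefficient_test(coef, std_err, alpha=0.05):
    """Two-sided test of H0: beta = 0 for one regression coefficient (hypothetical helper)."""
    t_stat = coef / std_err
    # Normal approximation to the t distribution (reasonable for large residual df).
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value, p_value < alpha


# Hypothetical coefficient and standard error, not values from the fitted model.
t_stat, p_value, significant = coefficient_test(coef=2.0, std_err=1.0)
print(t_stat, round(p_value, 4), significant)
```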
Graphs