
Not Peer Reviewed

Project Name: House Prices

Group Name: C
Group Members: PGID no :
Amrita Sarah
Ashutosh Dixit
Dhaval Mistry
Joyal Thomas 92110094
Prachi Naik
Renju Sarin
Overall Inferences

Summary Inferences
Bedrooms: 97.4% of properties have 2, 3 or 4 bedrooms. Of these, 50% have three bedrooms, 30.7% have four, and the remainder have two. Only six properties have one or six bedrooms.
Inference: The most frequently bought properties have 3 or 4 bedrooms.

Fireplace: Around 60% of properties have a fireplace. Properties with a fireplace have a mean age of 20.3 years, while properties without one have a mean age of 38.5 years.
Inference: Most newly constructed properties include a fireplace.

Price Vs Lot Size: 547 properties (52.3%) are in the lot size band “0 to 1” and in the price range (100, 200).
Inference: The trend is moving towards affordable housing.

Bathrooms: The majority (76.9%) of properties have 1.5, 2 or 2.5 bathrooms. Only 5 properties have 4 or more, while approximately 19% of properties have a single bathroom.
Inference: 67% of the newly constructed houses come with powder rooms, indicating that this is a new trend.

Buyer Preference: 40% of the houses are independent houses, as they use less than 10% of the lot area as living space, and only 2% of the houses sold are apartment housing.
Inference: Buyers prefer to purchase houses with outdoor areas.

Graphical output
Problems
Q2: Does the normal model provide a good description of prices? Use a normal quartile …
Solution: As is evident from the plot below, the prices are not normally distributed.

Q3: Irrespective of your response to Q2, assume that Price ~ N(164K, (68K)^2). Given this:
A. Calculate the following probabilities – P(Price > 92.8K), P(Price < 255.5K). Do these numbers agree with what
you see in the data?
Solution:
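Mean = 164K and Sigma = 68K | x1 = 92.8K and x2 = 255.5K
For P(Price > 92.8K): z = (92.8 - 164) / 68 ≈ -1.047, so P(Price > 92.8K) = 1 - 0.147536 ≈ 0.8525
For P(Price < 255.5K): z = (255.5 - 164) / 68 ≈ 1.346, so P(Price < 255.5K) ≈ 0.910782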
B. Once again, assuming the above normal distribution, what percentage of houses should have a value less than
232K? Does that agree with the data?
Solution:
84% of the houses have a value less than 232K. Yes, it agrees with the data.
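
The probabilities in parts A and B can be checked with a few lines of Python. This is a minimal sketch that assumes only the model stated in the question, Price ~ N(164K, (68K)^2), with prices in thousands. The names `price` and `z_score` are illustrative and not part of the original solution.

```python
from statistics import NormalDist

# Model from the problem statement, in thousands: Price ~ N(164, 68^2).
price = NormalDist(mu=164.0, sigma=68.0)

def z_score(x, dist=price):
    """Standardise x using the model's mean and standard deviation."""
    return (x - dist.mean) / dist.stdev

# Part A: an upper-tail and a lower-tail probability.
p_above_92_8 = 1.0 - price.cdf(92.8)
p_below_255_5 = price.cdf(255.5)

# Part B: share of houses expected below 232K (exactly one sigma above the mean).
p_below_232 = price.cdf(232.0)

print(f"z(92.8)  = {z_score(92.8):.3f},  P(Price > 92.8K)  = {p_above_92_8:.4f}")
print(f"z(255.5) = {z_score(255.5):.3f},  P(Price < 255.5K) = {p_below_255_5:.4f}")
print(f"z(232)   = {z_score(232.0):.3f},  P(Price < 232K)   = {p_below_232:.4f}")
```

Comparing these model probabilities with the observed proportions in the data is what answers the "does this agree with the data" part of each question.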
C. Based on the theoretical model, what do you expect should be the price of a house that is exactly on the 3rd
quartile (75th percentile)? How does that compare to the actual?

P(Z < z) = 0.75 gives z = 0.6745.

z = (x - μ) / σ

z * σ = x - μ, so 0.6745 * 68000 = x - 164000

Value of x = 209866

Based on the theoretical model, the price of a house at the 3rd quartile is 209,866, against an actual 3rd quartile of 205,397. Therefore, there is a difference of 4,469 between the actual and theoretical values.
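
The quartile calculation can be reproduced by inverting the normal CDF rather than looking up the z value in a table. This is a minimal sketch under the same assumed model, N(164000, 68000^2). The variable names are illustrative only.

```python
from statistics import NormalDist

MU, SIGMA = 164_000.0, 68_000.0  # model parameters from the question

# z value that leaves 75% of a standard normal distribution below it.
z_75 = NormalDist().inv_cdf(0.75)

# Rearranging z = (x - mu) / sigma gives x = mu + z * sigma.
q3_by_formula = MU + z_75 * SIGMA

# The same quantile taken directly from the price model, as a cross-check.
q3_direct = NormalDist(mu=MU, sigma=SIGMA).inv_cdf(0.75)

print(f"z at the 75th percentile: {z_75:.4f}")
print(f"theoretical 3rd quartile: {q3_by_formula:,.0f}")
```

The small gap between this result and the hand calculation comes from rounding z to 0.6745.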

Q4: Create the 90%, 95%, and 99% confidence intervals for the average home price and explain what these mean.
How do the margins of error for these three confidence intervals compare? Does that make sense?
Solution:

Confidence Level   T-value   Lower CI   Upper CI   MOE
90%                1.646     163,960    164,053    39.8%
95%                1.962     163,936    164,064    47.4%
99%                2.581     163,916    164,084    62.4%

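The higher the confidence level, the lower the precision: the margin of error is roughly 1.6 times larger at 99% than at 90% (62.4 / 39.8). This makes sense, because a wider interval is needed to be more confident that it contains the true mean.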
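
How the margin of error scales with the confidence level can be sketched as follows. The t-values are copied from the table. The mean and standard deviation are taken from the Q3 model, and the sample size `N = 1000` is a hypothetical value chosen for illustration, since the actual sample size is not stated here. The resulting intervals are therefore illustrative and are not expected to match the table.

```python
import math

# t critical values copied from the table in the solution.
T_VALUES = {"90%": 1.646, "95%": 1.962, "99%": 2.581}

def margin_of_error(t_value, sample_sd, n):
    """Half-width of a t-based confidence interval for a mean."""
    return t_value * sample_sd / math.sqrt(n)

# Assumed inputs for illustration only: mean and SD from the Q3 model,
# and a hypothetical sample size of 1000.
MEAN, SAMPLE_SD, N = 164_000.0, 68_000.0, 1000

intervals = {}
for level, t_value in T_VALUES.items():
    moe = margin_of_error(t_value, SAMPLE_SD, N)
    intervals[level] = (MEAN - moe, MEAN + moe, moe)
    print(f"{level}: [{MEAN - moe:,.0f}, {MEAN + moe:,.0f}]  MOE = {moe:,.0f}")

# The ratio of two margins of error depends only on the t-values,
# because the SD and sample size cancel out.
moe_ratio = intervals["99%"][2] / intervals["90%"][2]
print(f"MOE(99%) / MOE(90%) = {moe_ratio:.3f}")
```

Because the standard deviation and sample size cancel in the ratio, the comparison between confidence levels holds regardless of the assumed inputs.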

Q5. The sample data given to you all come from home sales within the past 12 months. Suppose you had sample
data of the same size each year going back several years, and calculated the average sale price for each year. What
kind of distribution do you expect to see for these averages and why? (Include the parameters of the distribution in
your response, assuming that the house prices don’t change i.e. go up or down, over time. Clearly this is not a great
assumption, but make it anyway.)
Solution:
According to the CLT, the distribution of the yearly sample means will be symmetric and approximately normal, centred on the population mean (about 164K) with standard deviation σ/√n, where n is the sample size. About 95% of these averages are expected to fall in the range 163,936 to 164,063.
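
The CLT behaviour described above can be illustrated with a small simulation. This is a sketch under stated assumptions: prices are drawn from a right-skewed gamma distribution chosen to have mean 164 and standard deviation 68 (in thousands), and the sample size of 100 and the 1000 simulated years are hypothetical values, not taken from the data.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 164.0, 68.0   # population mean and SD, in thousands
SAMPLE_SIZE = 100         # hypothetical number of sales per year
N_YEARS = 1000            # hypothetical number of simulated years

# Gamma parameters chosen so that mean = ALPHA * BETA = MU
# and variance = ALPHA * BETA**2 = SIGMA**2.
ALPHA = (MU / SIGMA) ** 2
BETA = SIGMA ** 2 / MU

def yearly_average():
    """Average price of one simulated year of sales."""
    return statistics.fmean(
        random.gammavariate(ALPHA, BETA) for _ in range(SAMPLE_SIZE)
    )

means = [yearly_average() for _ in range(N_YEARS)]
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_se = SIGMA / math.sqrt(SAMPLE_SIZE)

print(f"mean of yearly averages: {mean_of_means:.2f} (population mean {MU})")
print(f"SD of yearly averages:   {sd_of_means:.2f} (sigma / sqrt(n) = {expected_se:.2f})")
```

Even though the individual prices are skewed, the yearly averages cluster around the population mean with a spread close to σ/√n.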

Q6. Your friend claims that the average house price in this area is above $150K. Do you agree? He also claims that
the average living area is more than 1800 sq. ft. Do you agree with this? (Use a 5% significance level for both.)
Briefly explain what the p-values in these cases mean.
