
Applied Econometrics for Managers: Case #1

House Price Prediction

1. We will use the “data.xls” dataset for this session. Your first task is to import the dataset into
R. The dataset contains information from the Ames Assessor’s Office used in computing
assessed values for individual residential properties sold in Ames, IA from 2006 to 2010.
Two variables, PID and NEIGHBORHOOD, may be of special interest. PID is the Parcel
Identification Number assigned to each property within the Ames Assessor’s system.
NEIGHBORHOOD identifies the physical location of the property within the Ames city
limits. The dataset contains 2930 observations and 82 variables. Detailed descriptions of the
variables are given in the “documentation.txt” file.

2. Provide a table with summary statistics for all the variables in the dataset.

3. Provide a plot of the sale price (SalePrice) against the above grade (ground) living area
in square feet (GrLivArea). What do you see?

4. How many observations have GrLivArea > 4000? Remove all such observations.

5. Create a new variable “totalbath” which will provide the total number of bathrooms in a
house (sum of basement full and half bathrooms and above grade (ground) full and half
bathrooms). What is the mean of this new variable? Provide a plot for SalePrice against
totalbath. What do you see?

6. Construct a variable “yearlastdev” which for every observation will take the maximum of
YearBuilt and YearRemod/Add. Create variable “agebeforesale” by subtracting yearlastdev
from “YrSold”. What is the mean of agebeforesale?

7. Construct a dummy variable “normalsale” that equals 1 for every observation where the
sale condition is “Normal” and 0 otherwise. Drop all observations where the sale condition
is not “Normal”. How many observations are we left with?

8. Construct a dummy variable “newmodel” that equals 1 if a house was built in or after the
year 2000 and 0 otherwise. What is its mean?

9. Now we will build our first model for this test. We want to know if the above grade (ground)
living area in square feet (GrLivArea) has any impact on sale price of a house (SalePrice).
For this purpose, build a linear model with the SalePrice as the dependent variable, and the
GrLivArea as the only explanatory variable. Estimate the parameters of the model. Explain
the results.

10. We might be concerned that other measures of size of the house and the size of the plot on
which the house is built might also play an important role. Add TotalBsmtSf, LotArea, and
GarageCars as other explanatory variables to the model from question 9, and re-estimate
the parameters. Explain the results, and explain why not including the additional variables
might be a problem.

11. Now, we want to know how other attributes of a house can affect its sale price. For this
purpose, add totalbath, OverallQual, and the square of GarageCars as explanatory variables to
the model from question 10. Provide a justification for adding these variables. Re-estimate
the parameters, and explain the results.

12. To the model from question 11, add the variable agebeforesale and the dummy variable
newmodel. Provide a justification for doing this. Re-estimate the model, and explain the results.

13. Add the neighborhood indicators and indicators for the month when the house is sold to the
model from question 12, and re-estimate the parameters. Explain the results. Test whether
the coefficients for month sold dummies are jointly significant.

14. Provide a single, clearly formatted table with the results from the five models you have built.

15. Suppose you present the entire work you have done so far to your boss and she doesn’t like
it. Build an alternative model. Justify your answer.
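
As a starting point for the data preparation in questions 4 to 8, the steps could be sketched in base R roughly as follows. The data frame “houses” below holds made-up toy values, not Ames records, and the column names BsmtFullBath, BsmtHalfBath, FullBath, HalfBath, YearRemodAdd, and SaleCondition are assumptions: check names() after importing “data.xls”, since the imported names may differ.

```r
# Toy stand-in for the imported data. All values are invented for illustration.
# Column names are assumed; adjust them to match names() of the real data.
houses <- data.frame(
  SalePrice     = c(200000, 150000, 600000, 180000, 260000, 120000),
  GrLivArea     = c(1500, 1200, 4500, 1800, 2200, 1000),
  BsmtFullBath  = c(1, 0, 1, 0, 1, 0),
  BsmtHalfBath  = c(0, 1, 0, 0, 0, 0),
  FullBath      = c(2, 1, 3, 2, 2, 1),
  HalfBath      = c(1, 0, 1, 0, 1, 0),
  YearBuilt     = c(2003, 1976, 2007, 1950, 2000, 1990),
  YearRemodAdd  = c(2003, 1995, 2008, 1950, 2005, 1990),
  YrSold        = c(2008, 2007, 2009, 2006, 2010, 2008),
  SaleCondition = c("Normal", "Normal", "Partial", "Abnorml", "Normal", "Normal"),
  stringsAsFactors = FALSE
)

# Question 4: count, then remove, observations with GrLivArea > 4000.
n_large <- sum(houses$GrLivArea > 4000)
houses  <- houses[houses$GrLivArea <= 4000, ]

# Question 5: total bathrooms as the plain sum of the four bathroom counts.
houses$totalbath <- with(houses, BsmtFullBath + BsmtHalfBath + FullBath + HalfBath)
mean_totalbath   <- mean(houses$totalbath, na.rm = TRUE)

# Question 6: year of last development and age at the time of sale.
houses$yearlastdev   <- pmax(houses$YearBuilt, houses$YearRemodAdd)
houses$agebeforesale <- houses$YrSold - houses$yearlastdev

# Question 7: dummy for normal sales, then keep only those observations.
houses$normalsale <- as.integer(houses$SaleCondition == "Normal")
houses <- houses[houses$normalsale == 1, ]

# Question 8: dummy for houses built in or after 2000.
houses$newmodel <- as.integer(houses$YearBuilt >= 2000)

print(nrow(houses))
print(mean(houses$newmodel))
```

The order of the steps matters: each mean is computed on whatever rows remain at that point, so the answers to questions 5 and 6 use the sample before the non-normal sales are dropped. If the real columns contain missing values, the logical row filters above would need an explicit is.na() check.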
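
For questions 9 to 11, one way to build the models step by step is lm() followed by update(). The sketch below uses simulated data, not the Ames data, with coefficients chosen arbitrarily; the simulated TotalBsmtSf is deliberately correlated with GrLivArea so that the effect of leaving it out (the concern raised in question 10) is visible. The variable spellings follow this handout and may need adjusting to the imported column names.

```r
# Simulated data for illustration only; none of these numbers come from Ames.
set.seed(1)
n <- 500
houses <- data.frame(GrLivArea = runif(n, 600, 3500))
# Basement size is made to depend on living area (an assumption of this sketch).
houses$TotalBsmtSf <- pmax(0, 0.5 * houses$GrLivArea + rnorm(n, sd = 300))
houses$LotArea     <- runif(n, 2000, 20000)
houses$GarageCars  <- sample(0:3, n, replace = TRUE)
houses$SalePrice   <- 20000 + 80 * houses$GrLivArea + 30 * houses$TotalBsmtSf +
  1 * houses$LotArea + 10000 * houses$GarageCars + rnorm(n, sd = 20000)

# Question 9: simple regression of SalePrice on GrLivArea.
m1 <- lm(SalePrice ~ GrLivArea, data = houses)

# Question 10: add the other size measures to the same model.
m2 <- update(m1, . ~ . + TotalBsmtSf + LotArea + GarageCars)

# Question 11 (partial): a squared term must be wrapped in I() inside a formula.
m3 <- update(m2, . ~ . + I(GarageCars^2))

# Compare the GrLivArea slope with and without the additional regressors.
print(c(m1 = coef(m1)[["GrLivArea"]], m2 = coef(m2)[["GrLivArea"]]))
print(summary(m2)$coefficients)
```

In this simulation the slope on GrLivArea in m1 also picks up part of the basement effect, because the omitted variable is correlated with the included one. Whether and how strongly this happens in the real data is something to check, not something the sketch establishes.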
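
For the joint significance test in question 13, a common approach is an F-test that compares the model without the month indicators (restricted) to the model with them (unrestricted). The sketch below again uses simulated data; the column names Neighborhood and MoSold are assumptions, and the model is much smaller than the one in question 12.

```r
# Simulated data for illustration only; column names are assumed.
set.seed(2)
n <- 300
houses <- data.frame(
  GrLivArea    = runif(n, 600, 3500),
  Neighborhood = sample(c("A", "B", "C"), n, replace = TRUE),
  MoSold       = sample(1:12, n, replace = TRUE)
)
houses$SalePrice <- 50000 + 90 * houses$GrLivArea +
  15000 * (houses$Neighborhood == "B") + rnorm(n, sd = 20000)

# factor() makes lm() create one indicator per category,
# leaving one category out as the reference group.
restricted   <- lm(SalePrice ~ GrLivArea + factor(Neighborhood), data = houses)
unrestricted <- update(restricted, . ~ . + factor(MoSold))

# F-test of H0: all month-of-sale coefficients are zero.
ftest <- anova(restricted, unrestricted)
print(ftest)
```

The second row of the anova() output reports the number of restrictions tested (one per month indicator), the F statistic, and its p-value. A small p-value would lead to rejecting the hypothesis that the month indicators are jointly zero.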
