ECON1203 HW Solution Week12
1. Recall the Anzac Garage data (ANZACG.XLS) used in Weeks 3, 8 and 10.
In Week 3 we considered the simple linear regression model given by:
$price = \beta_0 + \beta_1\, age + u$
where price = used car price in dollars and age = age of the car in years.
The EXCEL results obtained using Ordinary Least Squares are
presented below:
Regression Statistics
R2                 0.077
Standard Error     42069
Observations       117

             Coefficients   Standard Error   t Stat    p-value
Intercept    47469          6748             7.035     0.000
Age          -2658          856              -3.106    0.002
(a) Interpret the t Stat and the p-values in the EXCEL output.
What do you need to assume?
The t Stat and p-values in the EXCEL output are derived from two-tail tests with
null hypotheses that the associated population parameter equals 0. Hence,
larger absolute t stats and lower p-values provide stronger evidence that the
associated population parameter is non-zero. Here, the p-values for both the
intercept and the Age coefficient are below 1%; hence both estimates are
statistically significant, i.e. we can reject the hypothesis that either
population parameter is zero.
We need to assume the disturbances are normal or, because the sample size is
large, invoke the CLT.
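The t Stat column can be reproduced as the coefficient divided by its standard error. The sketch below does this and, relying on the large-sample (CLT) argument, approximates the two-tail p-values with the standard normal distribution rather than the t distribution that EXCEL uses. The Age coefficient is entered as negative, which is an assumption inferred from the fitted price of 20889 at age 10 (47469 - 10 × 2658); the variable names are illustrative.

```python
from statistics import NormalDist

# Estimates and standard errors from the EXCEL output.
# The negative sign on Age is an assumption (see the lead-in).
estimates = {"Intercept": (47469, 6748), "Age": (-2658, 856)}

std_normal = NormalDist()  # mean 0, standard deviation 1

results = {}
for name, (coef, se) in estimates.items():
    t_stat = coef / se
    # Two-tail p-value under the normal approximation.
    p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
    results[name] = (t_stat, p_value)
    print(f"{name}: t = {t_stat:.3f}, p = {p_value:.4f}")
```

Small differences from the EXCEL figures are to be expected, because the reported coefficients are rounded and the p-values here use the normal approximation.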
(b) Calculate a 95% confidence interval for the coefficient on age.
The standard normal critical value is 1.96, hence the 95% confidence interval is:
$b_1 \pm 1.96 \times se(b_1)$
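As a numerical sketch of this interval, the snippet below plugs in the reported estimate and standard error for Age. The negative sign on the estimate is an assumption, inferred from the fitted price of 20889 at age 10; the variable names are illustrative.

```python
# Point estimate and standard error for the Age coefficient.
# The negative sign is an assumption; see the lead-in.
b1 = -2658
se_b1 = 856
z_crit = 1.96  # standard normal critical value for 95% confidence

half_width = z_crit * se_b1
ci_lower = b1 - half_width
ci_upper = b1 + half_width
print(f"95% CI for the Age coefficient: ({ci_lower:.0f}, {ci_upper:.0f})")
```

Under these assumptions the interval lies entirely below zero, which agrees with the two-tail test rejecting a zero Age coefficient.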
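For a car aged 10 years the interval takes the form
$\hat{y} \pm t_{0.025,\,n-2} \times s \times \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
where s = 42069, se(b1) = 856 and hence
$\sum (x_i - \bar{x})^2 = (42069 / 856)^2 \approx 2415$
Hence:
$20889 \pm 1.98 \times 42069 \times \sqrt{\frac{1}{117} + \frac{(10 - 6.44)^2}{2415}} = 20889 \pm 9783$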
We are 95% confident that the expected price of a 10-year-old car lies between
$11,106 and $30,672. While the impact of age on price is precisely
estimated, the CI is quite wide because of the large amount of
unexplained variation indicated by the very low R2 value reported.
(Note: use of the normal critical value here would be acceptable given the
large sample size and would make little practical difference, as the
critical value would be 1.96 rather than 1.98.)
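The arithmetic behind this interval can be retraced as follows. This is a sketch under stated assumptions: 6.44 is taken to be the sample mean age, the Age coefficient is taken to be negative (which reproduces the fitted value 20889), and the sum of squared deviations of age is backed out from se(b1) = s / sqrt(Sxx). The variable names are illustrative.

```python
import math

s = 42069        # standard error of the regression
se_b1 = 856      # standard error of the Age coefficient
n = 117          # observations
x_bar = 6.44     # assumed to be the sample mean age
x_g = 10         # age at which the price is evaluated
t_crit = 1.98    # t critical value used in the solution

b0, b1 = 47469, -2658  # negative Age sign is an assumption
y_hat = b0 + b1 * x_g

# se(b1) = s / sqrt(Sxx)  =>  Sxx = (s / se(b1))^2
sxx = (s / se_b1) ** 2

half_width = t_crit * s * math.sqrt(1 / n + (x_g - x_bar) ** 2 / sxx)
lower, upper = y_hat - half_width, y_hat + half_width
print(f"fitted price: {y_hat}, interval: ({lower:.0f}, {upper:.0f})")
```

Replacing `t_crit` with 1.96 narrows the interval only slightly, in line with the note on normal critical values.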
Anzac Garage's pricing scheme based on the age of the car is not
working out very well. When its second-hand cars are compared with
cars of the same age from other dealers, prices often diverge. One of
their consultants noted that the value of a second-hand car should
depend on both the Odometer reading and the Age of the vehicle.
This consultant wanted to estimate the following two simple linear
regression models separately:
$price = \beta_0 + \beta_1\, Age + u$
$price = \alpha_0 + \alpha_1\, Odometer + v$
where Odometer = distance the car has travelled since leaving the factory
in kilometers. A senior consultant advised use of a multiple linear
regression model instead:
$price = \gamma_0 + \gamma_1\, Age + \gamma_2\, Odometer + \varepsilon$
(f) Discuss why the simple linear regression methods may not be
preferable to the multiple regression method, in general, and in
the context of this problem. The resultant OLS estimates for the
multiple regression model are given below:
The predictive performance of the model will improve as relevant variables are
added to a simple regression model.
Also, the assumption that the disturbance is uncorrelated with the explanatory
variables is critical for the unbiased estimation of the coefficients of the
included variables. In the simple regression of price on age, it is violated if
variables that affect price and are correlated with age have been omitted from
the model. This is likely to be the case here with the distance the car has
travelled.
In addition, you could argue that the multiple regression model is
better because it guards against the omitted variable bias that is likely in
both of the simple linear regression models.
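The omitted variable argument can be illustrated with a small simulation. Everything in this sketch is hypothetical: the data are generated from an invented process (a true age effect of -1000 and an odometer effect of -100 per thousand km, with odometer strongly correlated with age), and none of it comes from the Anzac Garage data. The helper functions are written out by hand so the example needs only the standard library.

```python
import random

def simple_slope(x, y):
    """OLS slope from regressing y on a single regressor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_regressor_slopes(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the two slopes.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(0)
n = 5000
# Hypothetical data-generating process, not the Anzac Garage data.
age = [random.uniform(0, 15) for _ in range(n)]
odometer = [15 * a + random.gauss(0, 20) for a in age]  # thousands of km
price = [50000 - 1000 * a - 100 * o + random.gauss(0, 3000)
         for a, o in zip(age, odometer)]

b_simple = simple_slope(age, price)
b_age, b_odo = two_regressor_slopes(age, odometer, price)
print(f"simple regression slope on age:   {b_simple:.0f}")
print(f"multiple regression slope on age: {b_age:.0f}")
```

In this made-up setting the simple regression attributes part of the odometer effect to age, so its slope is far from the true -1000, while the multiple regression slope on age should land close to it.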
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.9867
R Square             0.9736
Adjusted R Square    0.9713
Standard Error       3140.3680
Observations         26

             t Stat     P-value
Intercept    1.488      0.150
GNE          23.406     0.000
Price        -4.722     0.000
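The summary statistics in this output can be cross-checked against one another. The sketch below assumes the model has two regressors (GNE and Price, as listed in the output) plus an intercept, and uses the usual adjusted R squared formula; the variable names are illustrative.

```python
import math

r_squared = 0.9736
n_obs = 26
k = 2  # assumed number of regressors: GNE and Price

# Multiple R is the square root of R Square.
multiple_r = math.sqrt(r_squared)

# Adjusted R squared penalises R squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(f"multiple R: {multiple_r:.4f}")
print(f"adjusted R squared: {adj_r_squared:.4f}")
```

Both values agree with the reported Multiple R and Adjusted R Square to four decimal places, which supports reading the truncated row labels this way.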
3. SIA: Sydney housing prices.
Recall the housing price data for Sydney suburbs used in Question 6 in
Week 3. Your statistically naive friend has been doing some analysis of
Sydney housing prices using these data and has asked you for help. In
addition to the price data there are a number of characteristics
associated with the suburb that have been collected and are likely to
explain some of the large variation in housing prices across suburbs
that are observed in the data. Your friend was very interested in the
impact on housing prices of being located under the flight path. The
regression of housing price on the flightpath variable (Model 1)
provided a result that he did not expect. On your advice he ran a
second regression (Model 2) that included several extra explanatory
variables. Results for Model 1 and Model 2 are presented in the table,
together with a full description of variables used in the analysis.
Housing price is the mean of the median price of houses sold in each
suburb for two quarters (September and December 2002) measured
in thousands of dollars;
(a) How would you interpret the regression estimates for the
parameters in Model 1 and explain why your friend found the
result to be unexpected?
Because the estimate of β1 is positive, houses under the flightpath
sell on average for more ($216,200 more) than houses not under the
flightpath. This is surprising because you would expect the aircraft noise
associated with being under the flightpath to be unattractive and hence
to lead to lower, not higher, prices.
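To make the interpretation concrete, the sketch below evaluates the fitted Model 1 for suburbs under and not under the flightpath. It assumes Flightpath is a 0/1 dummy variable, which the surviving text does not state explicitly, and uses the Model 1 estimates from the results table (in thousands of dollars). The function name is illustrative.

```python
# Model 1 estimates, in thousands of dollars.
intercept = 569.9
flightpath_coef = 216.2

def predicted_price(under_flightpath: bool) -> float:
    """Fitted mean housing price for a suburb, in thousands of dollars."""
    # Flightpath is assumed to be a 0/1 dummy variable.
    return intercept + flightpath_coef * (1 if under_flightpath else 0)

price_not_under = predicted_price(False)
price_under = predicted_price(True)
print(f"not under flightpath: {price_not_under:.1f}")
print(f"under flightpath:     {price_under:.1f}")
```

With a single dummy regressor, the intercept is the fitted mean price for suburbs not under the flightpath and the slope is the difference between the two group means.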
(b) Explain why the results in Model 1 are unreliable as a basis for
determining the impact on housing prices of being located under
the flight path. Which of the assumptions associated with simple
linear regression has clearly been violated in Model 1?
You would like to make the statement about the impact of being under the
flightpath holding other factors constant. This is not possible with Model 1
as it is a simple linear regression and hence there is potential for omitted
(confounding) variables that lead to biased estimates of the impact of being
situated under the flightpath.
For example, proximity to the beach is likely to affect housing prices and
to be correlated with being under the flightpath. In Model 1, the variable
Distance to beach is in the disturbance term, which leads to a violation of
the assumption that E(u|X) = 0.
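The direction of the resulting bias can be written down using the standard omitted variable bias result. In this sketch, Beach is shorthand for Distance to beach, and the symbols β2, δ0, δ1 and the estimator from the short regression are notation introduced here for illustration rather than taken from the question.

```latex
\begin{align*}
\text{Longer model:}\quad & price = \beta_0 + \beta_1\, Flightpath + \beta_2\, Beach + u \\
\text{Auxiliary regression:}\quad & Beach = \delta_0 + \delta_1\, Flightpath + v \\
\text{Short regression (Model 1):}\quad & E(\tilde{b}_1) = \beta_1 + \beta_2\, \delta_1
\end{align*}
```

If, for instance, prices fall with distance from the beach and suburbs under the flightpath tend to be closer to the beach, both β2 and δ1 would be negative, so the bias term β2 δ1 would be positive and Model 1 would overstate the flightpath coefficient.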
(c) Write a brief description of the results for Flightpath in Model 2 in
terms of the parameter estimate, its interpretation and its
statistical significance.
Dependent variable: Housing price

Explanatory variables     Model 1     Model 2
Intercept                 569.9       853.5
                          (20.6)      (35.5)
Flightpath                216.2       51.5
                          (56.0)      (50.2)
Distance to CBD                       21.5
                                      (3.4)
Distance to Airport                   21.0
                                      (2.9)
Distance to beach                     13.9
                                      (2.3)
Observations              503         503
R squared                 0.029       0.372
* Numbers in brackets below coefficient estimates are standard errors.
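The statistical significance of Flightpath in each model can be gauged from the ratio of the estimate to its standard error. The sketch below uses the values as printed in the table and compares the absolute t ratio with the 5% two-tail normal critical value, which is a reasonable approximation with 503 observations. It works with absolute values so that the conclusion does not depend on the sign of the estimates; the names are illustrative.

```python
# (coefficient, standard error) for Flightpath, as printed in the table.
flightpath = {"Model 1": (216.2, 56.0), "Model 2": (51.5, 50.2)}
z_crit = 1.96  # two-tail 5% critical value, normal approximation (n = 503)

t_ratios = {}
for model, (coef, se) in flightpath.items():
    t = abs(coef) / se
    t_ratios[model] = t
    verdict = "significant" if t > z_crit else "not significant"
    print(f"{model}: |t| = {t:.2f} -> {verdict} at the 5% level")
```

On these figures the Flightpath estimate is significant in Model 1 but not in Model 2, once the distance variables are controlled for.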