Name
Instructor
Course
Date
Homework Assignment 2
Problem 1
We also assume that X1 and X2 have zero sample correlation, which means that:
$$S_{X_1 X_2} = \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) = 0$$
a) Show the formula to find the estimates of β1 and β2. What is the value of the slope
when X2 is regressed on X1?
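For a regression with the two predictors described here, the estimates of β1 and β2 are:
$$\hat{\beta}_1 = \frac{S_{X_1 Y}}{S_{X_1 X_1}} \quad \text{and} \quad \hat{\beta}_2 = \frac{S_{X_2 Y}}{S_{X_2 X_2}}$$
The slope when X2 is regressed on X1 is:
$$\hat{\beta}_3 = \frac{S_{X_1 X_2}}{S_{X_1 X_1}} = 0$$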
b) What are the formulas to calculate the residuals when Y is regressed on X1 and
when X2 is regressed on X1?
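The source gives no worked answer for this part. As a sketch consistent with the quantities used in the derivation for part (c), and writing $\hat{e}$ for residuals (a symbol assumed here, not taken from the original), the two sets of residuals would be:

```latex
\begin{aligned}
\hat{e}_{i,\,Y \mid X_1} &= y_i - \bar{y} - \hat{\beta}_1 (x_{i1} - \bar{x}_1) \\
\hat{e}_{i,\,X_2 \mid X_1} &= x_{i2} - \bar{x}_2 - \hat{\beta}_3 (x_{i1} - \bar{x}_1) = x_{i2} - \bar{x}_2
\end{aligned}
```

The second line simplifies because the slope of X2 on X1 is zero under the zero-correlation assumption.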
c) Find the slope of the regression for the added variable model when Y is regressed on
X1 and then on X2. Illustrate that this slope is identical to the slope when Y is
regressed on X2. What is the intercept for the added variable regression model?
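The slope of the added variable regression (the residuals of Y on X1 regressed on the residuals of X2 on X1) is
$$\text{slope} = \frac{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)\left(y_i - \bar{y} - \hat{\beta}_1 (x_{i1} - \bar{x}_1)\right)}{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}$$
Expanding the numerator and substituting the sums of squares and cross products yields
$$= \frac{S_{X_2 Y} - \hat{\beta}_1 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}{S_{X_2 X_2}} = \frac{S_{X_2 Y}}{S_{X_2 X_2}}$$
because the cross-product sum is zero when X1 and X2 have zero sample correlation. This is the same as $\hat{\beta}_2$. Hence
$$\text{slope} = \hat{\beta}_2$$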
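The estimated intercept equals 0, because both sets of residuals have sample mean zero. Note that the R-squared value of the added variable regression is in general not identical to that when Y is regressed on X2: the slopes agree, but the response here is the residual from the regression of Y on X1 rather than Y itself, so the total sum of squares differs.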
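The algebra in Problem 1 can be checked numerically. The sketch below uses simulated data (all variable names and coefficient values are illustrative assumptions, not part of the assignment) and forces X2 to have exactly zero sample correlation with X1 by taking residuals.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
z  <- rnorm(n)
# Residualizing z on x1 makes x2 exactly uncorrelated with x1 in the sample.
x2 <- residuals(lm(z ~ x1))
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)   # hypothetical coefficients

# Two-predictor fit versus the two simple regressions
b_full    <- coef(lm(y ~ x1 + x2))
b1_simple <- unname(coef(lm(y ~ x1))["x1"])
b2_simple <- unname(coef(lm(y ~ x2))["x2"])

# Added variable regression for x2 after x1
e_y  <- residuals(lm(y ~ x1))
e_x2 <- residuals(lm(x2 ~ x1))
av   <- coef(lm(e_y ~ e_x2))

print(c(full = unname(b_full["x2"]), simple = b2_simple, added_variable = unname(av[2])))
print(unname(av[1]))   # intercept of the added variable regression
```

The three printed slopes should agree up to floating-point error, and the added variable intercept should be numerically zero.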
Surname 3
Problem 2
The relationship between per capita GDP and birth rate and some other possible variables
(that may have been missed in previous studies) is being analyzed by an international panel of
scientists. The database is available for about 200 regions (see the data on page 2). The
following mean function is used in the analysis:
E(log(Birth Rate)|log(Per capita GDP) = x1, (Percent City Population) = x2) = β0 + β1x1 +
β2x2
a) Illustrate that the coefficient estimated for log(Per capita GDP) is identical to the slope
estimated for log(Per capita GDP) after adding the variable (Percent City Population)
to the model. If the two coefficients are the same it means that in a multiple linear
regression model all the estimates are being adjusted for the other terms in the mean
function.
Solution:
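> attach(UN)
> # Regress the response and the added predictor each on PercentCityPopulation
> m1 <- lm(logBirthRate~PercentCityPopulation)
> m2 <- lm(logPerCapitaGDP~PercentCityPopulation)
> # Added variable regression: residuals of m1 on residuals of m2
> m3 <- lm(residuals(m1)~residuals(m2))
> # Full two-predictor regression for comparison
> m4 <- lm(logBirthRate~PercentCityPopulation+logPerCapitaGDP)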
> summary(m3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.487e-17 2.826e-02 -5.26e-16 1
residuals(m2) -1.255e-01 1.904e-02 -6.588 4.21e-10 ***
---
Residual standard error: 0.3926 on 191 degrees of freedom
Multiple R-Squared: 0.1852
F-statistic: 43.41 on 1 and 191 DF, p-value: 4.208e-10
> summary(m4)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
In the two regressions, the coefficients for log(Per capita GDP) are clearly
identical (about -0.125), even though one is printed in scientific notation
(-1.255e-01) and the other in standard notation (-0.125475).
b) Show that the residuals from the added variable model are the same as the residuals
calculated from the mean function with two explanatory variables.
Subtracting one set of residuals from the other, or plotting one set against the
other, shows that the two sets are the same.
c) Illustrate that the t-values for the log(Per capita GDP) coefficient from the added
variable plot and from the two predictor regression are not the same and explain the
reason for variations.
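The residual degrees of freedom (DF) used in the added variable regression are too large by
one: it reports 191 DF because it counts only the intercept and one slope, whereas the
two-predictor regression estimates three parameters and so has 190 DF. Once the DF is
corrected, the t-values are identical.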
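Since the UN data are not reproduced here, the three claims in Problem 2 can be illustrated on simulated stand-in data. In this sketch the variables `y`, `x1`, and `x2` and all numeric values are assumptions chosen only to mimic the structure of the problem (193 observations, two predictors).

```r
set.seed(42)
n  <- 193
x2 <- runif(n, 10, 100)                      # stand-in for PercentCityPopulation
x1 <- 5 + 0.03 * x2 + rnorm(n)               # stand-in for logPerCapitaGDP
y  <- 2 - 0.1 * x1 - 0.004 * x2 + rnorm(n, sd = 0.4)

m1 <- lm(y ~ x2)
m2 <- lm(x1 ~ x2)
m3 <- lm(residuals(m1) ~ residuals(m2))      # added variable regression
m4 <- lm(y ~ x2 + x1)                        # two-predictor regression

# (a) same slope, (b) same residuals
slope_av       <- unname(coef(m3)[2])
slope_full     <- unname(coef(m4)["x1"])
max_resid_diff <- max(abs(residuals(m3) - residuals(m4)))

# (c) t-values differ only through the residual degrees of freedom
t_av       <- summary(m3)$coefficients[2, "t value"]
t_full     <- summary(m4)$coefficients["x1", "t value"]
t_av_fixed <- t_av * sqrt(df.residual(m4) / df.residual(m3))

print(c(slope_av = slope_av, slope_full = slope_full))
print(max_resid_diff)
print(c(t_av = t_av, t_full = t_full, t_av_fixed = t_av_fixed))
```

Rescaling the added variable t-value by the square root of the ratio of residual degrees of freedom (190 over 191 here) is one way to apply the DF correction described above.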
Problem 3
a) Explain how β1 can be considered a rate of change in Y for a very small change in
the explanatory variable x.
Here, the approximate mean function of the bivariate regression above is:
$$E(Y \mid X = x) \approx e^{\beta_0} x^{\beta_1}$$
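The source stops at the mean function. One way to complete the argument, sketched here under the assumption that the approximation is treated as exact, is to differentiate with respect to x:

```latex
\frac{d\,E(Y \mid X = x)}{dx}
  = \beta_1 e^{\beta_0} x^{\beta_1 - 1}
  = \beta_1 \, \frac{E(Y \mid X = x)}{x}
\quad \Longrightarrow \quad
\beta_1 = \frac{d\,E(Y \mid X = x) \,/\, E(Y \mid X = x)}{dx \,/\, x}
```

Read this way, β1 is approximately the percentage change in the mean of Y per one percent change in x, for small changes in x.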
b) Explain why the estimate of β1 will not depend on the base of logarithm.
Changing the base of the logarithm multiplies both log(Y) and log(x) by the same
constant. That constant appears in both the numerator and the denominator of the slope
estimate and cancels, so the estimate is unchanged. Therefore, the base of the
logarithm has no impact on the estimate of β1.
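A quick numerical check of this claim, using simulated positive data (the data-generating values below are illustrative assumptions): the same log-log regression is fitted with natural, base-2, and base-10 logarithms.

```r
set.seed(7)
x <- runif(40, 1, 50)
y <- exp(0.5) * x^1.3 * exp(rnorm(40, sd = 0.2))   # hypothetical power-law data

bases <- c(e = exp(1), two = 2, ten = 10)
# Each column holds the (intercept, slope) pair for one choice of base.
fits <- sapply(bases, function(b) {
  unname(coef(lm(log(y, base = b) ~ log(x, base = b))))
})
intercepts <- fits[1, ]
slopes     <- fits[2, ]

print(slopes)       # the same for every base
print(intercepts)   # these do change with the base
```

The slope is invariant, while the intercept is rescaled by the change-of-base constant.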
Problem 4
The model:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.828409
R Square 0.686261
Adjusted R Square 0.616542
Standard Error 2.232702
Observations 12
ANOVA
             df   SS         MS         F          Significance F
Regression    2   98.13536   49.06768   9.843144   0.005427
Residual      9   44.86464   4.98496
Total        11   143
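The ANOVA figures can be cross-checked against the regression statistics with a few lines of R. This is only an arithmetic check on the reported values; the variable names are chosen here for illustration.

```r
# Values copied from the summary output
ss_reg <- 98.13536
ss_res <- 44.86464
df_reg <- 2
df_res <- 9
n      <- 12

ss_tot  <- ss_reg + ss_res                          # total sum of squares
r2      <- ss_reg / ss_tot                          # R Square
adj_r2  <- 1 - (1 - r2) * (n - 1) / df_res          # Adjusted R Square
ms_reg  <- ss_reg / df_reg
ms_res  <- ss_res / df_res
f_stat  <- ms_reg / ms_res                          # F
p_value <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)   # Significance F
std_err <- sqrt(ms_res)                             # Standard Error

print(round(c(multiple_r = sqrt(r2), r2 = r2, adj_r2 = adj_r2,
              std_err = std_err, f = f_stat, p = p_value), 6))
```

Each computed quantity should match the corresponding entry in the summary output to the precision shown there.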
[Figures: three plots, with horizontal axes Sales, Spend, and Month, respectively]
In addition, this plot shows no clear pattern (the points appear random), which
suggests that the model passes the check for autocorrelation.