

Name

Instructor

Course

Date

Homework Assignment 2

Problem 1

Suppose we need to fit the following mean function:

E(Y | X1 = x1, X2 = x2) = β0 + β1x1 + β2x2

We also assume that X1 and X2 have zero sample correlation, which means that:

S_{X1X2} = Σi (xi1 − x̄1)(xi2 − x̄2) = 0

a) Show the formula to find the estimates of β1 and β2. What is the value of the slope
when X2 is regressed on X1?

For a regression with two independent variables, as specified here, and with zero sample correlation between them, the least-squares estimates of β1 and β2 reduce to the simple regression slopes:

β̂1 = S_{X1Y} / S_{X1X1}

and

β̂2 = S_{X2Y} / S_{X2X2}

As for the slope when X2 is regressed on X1: since the sample correlation between X1 and X2 is zero, S_{X1X2} = 0, so the slope is

β̂3 = S_{X1X2} / S_{X1X1} = 0
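A minimal R sketch with simulated data (all names below are illustrative and not part of the assignment) confirms both results: when x1 and x2 have zero sample correlation, the multiple-regression estimates equal the simple slopes above, and the regression of x2 on x1 has slope zero.

set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- residuals(lm(rnorm(n) ~ x1))   # constructed to have exactly zero sample correlation with x1
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)

b1 <- sum((x1 - mean(x1)) * (y - mean(y))) / sum((x1 - mean(x1))^2)   # S_X1Y / S_X1X1
b2 <- sum((x2 - mean(x2)) * (y - mean(y))) / sum((x2 - mean(x2))^2)   # S_X2Y / S_X2X2

coef(lm(y ~ x1 + x2))[c("x1", "x2")]   # matches c(b1, b2)
coef(lm(x2 ~ x1))["x1"]                # essentially zero, as claimed in part (a)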

b) What are the formulas to calculate the residuals when Y is regressed on X1 and
when X2 is regressed on X1?

The residual when Y is regressed on X1 is given by:


ê1i = yi − ȳ − β̂1(xi1 − x̄1)

and the residual when X2 is regressed on X1 (whose slope, from part a, is zero) is given by:


ê3i = xi2 − x̄2

c) Find the slope of the regression for the added variable model when Y is regressed on
X1 and then on X2. Illustrate that this slope is identical to the slope when Y is
regressed on X2. What is the intercept for the added variable regression model?

Here, it is worth noting that Σi ê3i = 0.

Thus,

slope = Σi ê3i ê1i / Σi ê3i²

      = Σi (xi2 − x̄2)(yi − ȳ − β̂1(xi1 − x̄1)) / Σi (xi2 − x̄2)²

Substituting in the expressions for the sums of squares and cross-products yields

      = (S_{X2Y} − β̂1 Σi (xi1 − x̄1)(xi2 − x̄2)) / S_{X2X2}

      = S_{X2Y} / S_{X2X2}

since the cross-product term vanishes (X1 and X2 are uncorrelated in the sample). This is the same as β̂2; hence

slope = β̂2

From this, the estimated intercept equals 0 (both sets of residuals have mean zero), which implies that the R-squared value here will be identical to that when Y is regressed on X2.
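A short, self-contained R check of part (c), again with simulated data (names illustrative): the added-variable regression has intercept essentially zero and the same slope as the coefficient of x2 in the two-predictor fit.

set.seed(2)
x1 <- rnorm(50)
x2 <- residuals(lm(rnorm(50) ~ x1))   # uncorrelated with x1 by construction
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(50)

e1 <- residuals(lm(y  ~ x1))          # residuals of y regressed on x1
e3 <- residuals(lm(x2 ~ x1))          # residuals of x2 regressed on x1 (here just x2 minus its mean)
coef(lm(e1 ~ e3))                     # intercept ~ 0, slope = S_X2Y / S_X2X2
coef(lm(y ~ x1 + x2))["x2"]           # the same slope, beta2-hat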

Problem 2

An international panel of scientists is analyzing the relationship between per capita GDP, birth rate, and some other possible variables (that may have been missed in previous studies). The database is available for about 200 regions (see the data on page 2). The following mean function is used in the analysis:

E(log(Birth Rate) | log(Per capita GDP) = x1, Percent City Population = x2) = β0 + β1x1 + β2x2

a) Illustrate that the coefficient estimated for log(Per capita GDP) is identical to the slope estimated in the added-variable plot for log(Per capita GDP), after adjusting for Percent City Population. If the two coefficients are the same, it means that in a multiple linear regression model every estimate is adjusted for the other terms in the mean function.
Solution:

> attach(UN)
> m1 <- lm(logBirthRate ~ PercentCityPopulation)        # response regressed on the other predictor
> m2 <- lm(logPerCapitaGDP ~ PercentCityPopulation)     # predictor of interest regressed on the other predictor
> m3 <- lm(residuals(m1) ~ residuals(m2))               # added-variable regression
> m4 <- lm(logBirthRate ~ PercentCityPopulation + logPerCapitaGDP)   # two-predictor model
> summary(m3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.487e-17 2.826e-02 -5.26e-16 1
residuals(m2) -1.255e-01 1.904e-02 -6.588 4.21e-10 ***
---
Residual standard error: 0.3926 on 191 degrees of freedom
Multiple R-Squared: 0.1852
F-statistic: 43.41 on 1 and 191 DF, p-value: 4.208e-10

> summary(m4)
Coefficients:
Estimate Std. Error t value Pr(>|t|)

(Intercept) 2.592996 0.146864 17.656 < 2e-16


PercentCityPopulation -0.003522 0.001884 -1.869 0.0631
logPerCapitaGDP -0.125475 0.019095 -6.571 4.67e-10
---
Residual standard error: 0.3936 on 190 degrees of freedom
Multiple R-Squared: 0.4689
F-statistic: 83.88 on 2 and 190 DF, p-value: < 2.2e-16

In these two regressions, the coefficients for log(PerCapitaGDP) are clearly identical (about -0.125), even though one is printed in scientific notation (-1.255e-01) and the other in standard decimal notation (-0.125475).

b) Show that the residuals from the added-variable model are the same as the residuals calculated from the mean function with two explanatory variables.

Subtracting one set of residuals from the other, or plotting one set against the other, shows that the two sets of residuals are identical.
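One way to carry out either check in R, using the model objects m3 and m4 already fitted in part (a):

max(abs(residuals(m3) - residuals(m4)))            # essentially zero: the residuals agree
plot(residuals(m4), residuals(m3)); abline(0, 1)   # points fall on the 45-degree line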

c) Illustrate that the t-values for the log(Per capita GDP) coefficient from the added-variable plot and from the two-predictor regression are not the same, and explain the reason for the difference.

The t-values differ because the added-variable regression uses one residual degree of freedom too many (191 instead of 190): it does not account for the extra parameter estimated in the two-predictor model. Once the degrees of freedom (and hence the residual standard error) are corrected, the two t-values are identical.
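To see this numerically (a sketch using m3 from above; 191 and 190 are the residual degrees of freedom printed in the two summaries):

t_av <- coef(summary(m3))["residuals(m2)", "t value"]   # -6.588 in the added-variable fit
t_av * sqrt(190 / 191)                                  # about -6.571, the t-value reported for m4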

Problem 3

Assume we have a bivariate regression with a mean function:


E(log(Y)|X = x) = β0 + β1log(x)

a) Explain how β1 can be considered a rate of change in Y for a very small change in
the explanatory variable x.

Here, the approximate mean function implied by the bivariate regression above is:

E(Y | X = x) ≈ e^{β0} x^{β1}

Differentiating with respect to x and dividing by the mean gives

(dE(Y | X = x) / dx) / E(Y | X = x) = β1 / x

Thus, for a small change in the predictor x, the rate of change in Y, expressed as a fraction of its mean, is β1/x; equivalently, β1 is approximately the percentage change in Y for a one-percent change in x, and this per-unit rate of change decreases as x grows.
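A quick numerical check of this relationship (the values of β0, β1 and x below are arbitrary illustrations):

b0 <- 1; b1 <- -0.5
m  <- function(x) exp(b0) * x^b1          # approximate mean function: e^b0 * x^b1
x0 <- 4; h <- 1e-6
((m(x0 + h) - m(x0)) / h) / m(x0)         # numerical derivative divided by the mean
b1 / x0                                   # matches beta1 / x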

b) Explain why the estimate of β1 will not depend on the base of logarithm.

To demonstrate this, change the base of the logarithms: both log(Y) and log(x) are then multiplied by the same constant c. This multiplies both the cross-product sum S_xy and the sum of squares S_xx by c², so the ratio that defines the slope is unchanged, and only the intercept is rescaled. Therefore the base of the logarithms has no impact on the estimate of β1.
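A small R sketch with simulated data (arbitrary values) illustrates the point: fitting with natural logarithms and with base-10 logarithms gives the same slope estimate.

set.seed(3)
x <- runif(100, 1, 50)
y <- exp(0.5 - 0.8 * log(x) + rnorm(100, sd = 0.2))
coef(lm(log(y)   ~ log(x)))[2]     # natural logs
coef(lm(log10(y) ~ log10(x)))[2]   # base-10 logs: identical slope (only the intercept rescales)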

Problem 4

Use the following data to estimate a multiple regression model.


Month Spend Sales
1 1000 9914
2 4000 40487
3 5000 54324
4 4500 50044
5 3000 34719
6 4000 42551
7 9000 94871
8 11000 118914
9 15000 158484
10 12000 131348
11 7000 78504
12 3000 36284

The model:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.828409
R Square 0.686261
Adjusted R Square 0.616542
Standard Error 2.232702
Observations 12

ANOVA
                 df          SS          MS          F    Significance F
Regression        2    98.13536    49.06768   9.843144          0.005427
Residual          9    44.86464     4.98496
Total            11   143

             Coefficients  Standard Error     t Stat    P-value   Lower 95%   Upper 95%
Intercept        2.158701        1.282948   1.682609   0.126743    -0.743530    5.060932
Spend           -0.010650        0.003246  -3.282420   0.009492    -0.018000   -0.003310
Sales            0.001045        0.000305   3.422819   0.007594     0.000354    0.001735

Month = 2.16 – 0.01065(Spend) + 0.00105(Sales)
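Entering the data above in R reproduces the Excel fit (a sketch; the object name fit is introduced here and reused in the residual checks below):

month <- 1:12
spend <- c(1000, 4000, 5000, 4500, 3000, 4000, 9000, 11000, 15000, 12000, 7000, 3000)
sales <- c(9914, 40487, 54324, 50044, 34719, 42551, 94871, 118914,
           158484, 131348, 78504, 36284)
fit <- lm(month ~ spend + sales)   # the model shown in the Excel output
summary(fit)                       # coefficients about 2.159, -0.01065, 0.001045; R-squared about 0.686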


a) Check the model for heteroscedasticity

[Figure: Sales Residual Plot (residuals plotted against Sales)]

[Figure: Spend Residual Plot (residuals plotted against Spend)]

Here, there is no clear pattern in these plots, which indicates an absence of heteroscedasticity. However, one may argue that spending varies slightly less in the later months.
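The visual check can be supplemented with a formal test, assuming the lmtest package is installed and fit is the model object from the earlier R sketch:

library(lmtest)
bptest(fit)                         # Breusch-Pagan test; a large p-value gives no evidence of heteroscedasticity
plot(fitted(fit), residuals(fit))   # residuals vs fitted values; a funnel shape would suggest heteroscedasticity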

b) Check the independence of residuals (autocorrelation).



[Figure: Plot of residuals vs. Month]

Likewise, there is no clear pattern in this plot (the residuals appear random), which suggests that the residuals are independent and the model passes the autocorrelation check.
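A formal counterpart to this visual check is the Durbin-Watson test, again assuming the lmtest package and the fit object from the earlier sketch (car::durbinWatsonTest would be an alternative):

library(lmtest)
dwtest(fit)                               # test statistic near 2 suggests independent residuals
plot(month, residuals(fit), type = "b")   # residuals in time order, matching the plot above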
