
Department of Economics

College of Business Administration and Accountancy


MSU-Iligan Institute of Technology

Econ 102 Econometrics


Removal Problem Set
Due Date: January 3, 2016

Jerome B. Capalac

I. Write the equations for the true and estimated relationships between X and Y.

True Equation: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

Estimated Equation: $Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i$

II. Why do we NOT simply take the sum of the deviations without squaring them?

We do not simply sum the deviations because the positive and negative deviations cancel.
By construction of the sample mean, the sum of the deviations above the mean equals, in
absolute value, the sum of the deviations below it, so the deviations always sum to zero
no matter how spread out the data are. The goal, however, is to capture the magnitude of
these deviations in a summary measure. To keep the deviations from cancelling, we can
either take absolute values or square each deviation from the mean. Both methods address
the problem; squaring the deviations is the more popular of the two.
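The cancellation described above can be checked numerically. The following is a minimal Python sketch, assuming the corn yields from Table 1 as sample data; the variable names are illustrative and not part of the problem set.

```python
# Corn yields (bushels per acre) from Table 1, used here only as sample data.
yields = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]

mean_yield = sum(yields) / len(yields)
deviations = [y - mean_yield for y in yields]

# Raw deviations: positive and negative parts cancel.
sum_dev = sum(deviations)
# Two ways to keep the magnitude: absolute values or squares.
sum_abs_dev = sum(abs(d) for d in deviations)
sum_sq_dev = sum(d ** 2 for d in deviations)

print("mean:", mean_yield)
print("sum of deviations:", sum_dev)
print("sum of absolute deviations:", sum_abs_dev)
print("sum of squared deviations:", sum_sq_dev)
```

The raw sum carries no information about spread, while both alternatives give a positive number that grows as the data move away from the mean.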

III. Table 1 gives the bushels of corn per acre, Yt, resulting from the use of the various amounts
of fertilizer in pounds per acre, Xt, produced on a farm in each of 10 years from 1971 to
1980.

Year n Yt Xt
1971 1 40 6
1972 2 44 10
1973 3 46 12
1974 4 48 14
1975 5 52 16
1976 6 58 18
1977 7 60 22
1978 8 68 24
1979 9 74 26
1980 10 80 32
a. Compute the values of $\hat{\beta}_0$ and $\hat{\beta}_1$.

Year   n   Yt   Xt   (Xt - X̄)   (Yt - Ȳ)   (Xt - X̄)²   (Yt - Ȳ)(Xt - X̄)
1971   1   40    6     -12        -17        144          204
1972   2   44   10      -8        -13         64          104
1973   3   46   12      -6        -11         36           66
1974   4   48   14      -4         -9         16           36
1975   5   52   16      -2         -5          4           10
1976   6   58   18       0          1          0            0
1977   7   60   22       4          3         16           12
1978   8   68   24       6         11         36           66
1979   9   74   26       8         17         64          136
1980  10   80   32      14         23        196          322
Sum       570  180       0          0        576          956
Mean       57   18

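$$\hat{\beta}_1 = \frac{\sum (X_t - \bar{X})(Y_t - \bar{Y})}{\sum (X_t - \bar{X})^2} = \frac{956}{576} = 1.6597$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 57 - 1.6597(18) = 27.1254$$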
b. Write the regression equation.

$\hat{Y}_t = 27.1254 + 1.6597 X_t$
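The slope and intercept can be reproduced from the deviation formulas. This is a minimal Python sketch using the Table 1 data; the variable names are assumptions made for illustration. Because it keeps the slope unrounded, the intercept comes out as 27.125 rather than the 27.1254 obtained from the rounded slope 1.6597.

```python
# Table 1 data: fertilizer (pounds per acre) and corn (bushels per acre).
fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]   # X_t
corn = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]        # Y_t

n = len(corn)
x_bar = sum(fertilizer) / n
y_bar = sum(corn) / n

# Sum of cross-products and sum of squared X deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fertilizer, corn))
s_xx = sum((x - x_bar) ** 2 for x in fertilizer)

slope = s_xy / s_xx                  # estimate of beta_1
intercept = y_bar - slope * x_bar    # estimate of beta_0

print("S_xy =", s_xy, " S_xx =", s_xx)
print("slope =", round(slope, 4))
print("intercept =", round(intercept, 4))
```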
c. Interpret the regression result (in terms of the direction of the relationship of the
variable and interpretation of intercept and slope coefficient).

The relationship between bushels of corn per acre and pounds of fertilizer per
acre is positive: the more fertilizer used, the more bushels of corn produced
per acre.
Intercept: If no fertilizer is used (zero pounds per acre), the predicted yield
is 27.1254 bushels of corn per acre.
Slope coefficient: For each additional pound of fertilizer used per acre, the
predicted yield increases by 1.6597 bushels of corn per acre.

d. Compute the values of $\hat{Y}_i$ and $\hat{u}_i$. Verify that the residuals (approximately) sum
to zero.
$\hat{Y}_t = 27.1254 + 1.6597 X_t$
$\hat{u}_t = Y_t - \hat{Y}_t$

Year   n   Fertilizer (Xt)   Corn (Yt)   Ŷt         ût = Yt - Ŷt   (Xt - X̄)   (Yt - Ȳ)
1971   1         6              40       37.0836      2.9164         -12        -17
1972   2        10              44       43.7224      0.2776          -8        -13
1973   3        12              46       47.0418     -1.0418          -6        -11
1974   4        14              48       50.3612     -2.3612          -4         -9
1975   5        16              52       53.6806     -1.6806          -2         -5
1976   6        18              58       57.0000      1.0000           0          1
1977   7        22              60       63.6388     -3.6388           4          3
1978   8        24              68       66.9582      1.0418           6         11
1979   9        26              74       70.2776      3.7224           8         17
1980  10        32              80       80.2358     -0.2358          14         23
Sum            180             570      570.0000      0.0000           0          0

The fitted values are obtained by substituting the fertilizer amount Xt, not the corn
yield Yt, into the regression equation. The residuals sum to zero, as required.
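The residual check in part (d) can be sketched in Python as follows. This is an illustration under stated assumptions: the helper name `fit_line` is hypothetical, and the coefficients are kept unrounded, so fitted values differ from hand calculations with 27.1254 and 1.6597 in the fourth decimal place.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

fertilizer = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]   # X_t
corn = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]        # Y_t

intercept, slope = fit_line(fertilizer, corn)

# Fitted values use X_t on the right-hand side of the equation.
fitted = [intercept + slope * x for x in fertilizer]
residuals = [y - y_hat for y, y_hat in zip(corn, fitted)]

for x, y, y_hat, u in zip(fertilizer, corn, fitted, residuals):
    print(f"X={x:>2}  Y={y:>2}  fitted={y_hat:8.4f}  residual={u:8.4f}")

print("sum of fitted values:", round(sum(fitted), 6))
print("sum of residuals:", round(sum(residuals), 6))
```

With an intercept in the model, the fitted values add up to the sum of the observed Y values, which is another way of seeing that the residuals sum to zero.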

IV. The following table shows two variables: annual salary (SALARY, the Y variable) and
total years of education (EDUC, the X variable).

EDUC (X) SALARY (Y)


11 40000
12 37000
11 34000
8 12000
12 45000
16 95000
18 100000
12 42000
12 49000
17 120000

A. Enter the SALARY and EDUC data into Excel. Make a scatter plot of the data.

n     Education (X)   Salary (Y)
1          11            40000
2          12            37000
3          11            34000
4           8            12000
5          12            45000
6          16            95000
7          18           100000
8          12            42000
9          12            49000
10         17           120000

$\sum X = 129$, $\sum Y = 574{,}000$
$\bar{X} = 12.9$, $\bar{Y} = 57{,}400$

B. The general equation for a linear relationship is SALARY = $\beta_0$ + $\beta_1$ EDUC. Suppose $\beta_0 = 0$ and
$\beta_1 = 10{,}000$. In Excel, show the linear relationship given the values of $\beta_0$ and $\beta_1$.
If $\beta_0 = 0$ and $\beta_1 = 10{,}000$, then the formula for $\hat{Y}$ is
SALARY_HAT = 0 + 10,000 EDUC

n     Education (X)   Salary (Y)   Ŷ
1          11            40000     110000
2          12            37000     120000
3          11            34000     110000
4           8            12000      80000
5          12            45000     120000
6          16            95000     160000
7          18           100000     180000
8          12            42000     120000
9          12            49000     120000
10         17           120000     170000

C. Provide an interpretation of the intercept and slope coefficients given here.

Slope: If total years of education increase by 1 year, then we predict that the
annual salary would increase by 10,000.
Intercept: If total years of education are zero, then we predict that the
annual salary is also zero.

D. You now have an estimate of the economic relationship. Calculate the explained portion of
SALARY and the unexplained, or residual, part of SALARY.

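The fitted or explained portion of SALARY for each observation i is:

SALARY_HAT_i = $\beta_0$ + $\beta_1$ EDUC_i

With the part (B) values $\beta_0 = 0$ and $\beta_1 = 10{,}000$, this is SALARY_HAT_i = 10,000 EDUC_i,
which is the fitted value used in the table in part (E).

For comparison, the least squares formulas are

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}, \qquad \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$

If the slope is set to $\hat{\beta}_1 = 0$, the intercept reduces to the sample mean of SALARY:

$$\hat{\beta}_0 = 57{,}400 - (0 \times 12.9) = 57{,}400$$

The residual or unexplained portion of SALARY for each observation i is:

$\hat{u}_i$ = SALARY_i - SALARY_HAT_i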

E. Provide a table of fitted and residual values for SALARYi over the 10 observations.

You now want to determine how well the economic theory explains the observed relationship
between SALARY and EDUC. You also want to determine how good your best-fit line is
relative to other options.
n     X     Y          Ŷ          û = Y - Ŷ
1     11    40,000     110,000    -70,000
2     12    37,000     120,000    -83,000
3     11    34,000     110,000    -76,000
4      8    12,000      80,000    -68,000
5     12    45,000     120,000    -75,000
6     16    95,000     160,000    -65,000
7     18   100,000     180,000    -80,000
8     12    42,000     120,000    -78,000
9     12    49,000     120,000    -71,000
10    17   120,000     170,000    -50,000

F. Calculate your sum of the residual values over the 10 observations. What is desirable for
this sum? What does your sum of residuals tell you about your line?

n     û = Y - Ŷ
1     -70,000
2     -83,000
3     -76,000
4     -68,000
5     -75,000
6     -65,000
7     -80,000
8     -78,000
9     -71,000
10    -50,000

$\sum \hat{u}_i = -716{,}000$

For a least squares line that includes an intercept, the residuals sum to zero because
the negative and positive errors cancel, so a sum of zero is what we would want. Here
the sum of the residuals is -716,000. Every residual is negative, which means the line
from part (B) overpredicts SALARY for every observation. By itself, however, $\sum \hat{u}_i$ is
not a good measure of how well the estimated line fits the actual relationship between
salary and education, because positive and negative residuals can offset each other.

G. Calculate your sum of the squared residual values over the 10 observations. What is
desirable for this sum? What does your sum of residuals tell you about your line?
The simple-minded alternative to the economic theory is to simply calculate the mean for
SALARY over the sample.

n     û = Y - Ŷ     û² = (Y - Ŷ)²
1     -70,000       4,900,000,000
2     -83,000       6,889,000,000
3     -76,000       5,776,000,000
4     -68,000       4,624,000,000
5     -75,000       5,625,000,000
6     -65,000       4,225,000,000
7     -80,000       6,400,000,000
8     -78,000       6,084,000,000
9     -71,000       5,041,000,000
10    -50,000       2,500,000,000
Sum  -716,000      52,064,000,000

The desirable value for the sum of squared residuals (SSR) is as small as possible. An
SSR of zero would mean that the estimated model fits the observed data perfectly. For
this problem the SSR is very large, which means that the fitted values are very far from
the observed values. Over the observed range of education (8 to 18 years), the line lies
above every data point. This implies that $\beta_0 = 0$ and $\beta_1 = 10{,}000$ are not good parameter
values for our estimation, so this line is not the best fit. Because squaring penalizes
large errors heavily, observations that lie far from the line produce a large SSR.
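The residual sum in part (F) and the sum of squared residuals in part (G) can be recomputed with a short script. This is a minimal Python sketch, assuming the data from the table in part (A) and the part (B) values of 0 and 10,000; the variable names are illustrative.

```python
# Data from part (A): years of education and annual salary.
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

# Coefficients assumed in part (B), not estimated from the data.
b0, b1 = 0, 10_000

fitted = [b0 + b1 * x for x in educ]
residuals = [y - y_hat for y, y_hat in zip(salary, fitted)]

sum_residuals = sum(residuals)
ssr = sum(u ** 2 for u in residuals)

print("residuals:", residuals)
print("sum of residuals:", sum_residuals)
print("sum of squared residuals:", ssr)
print("all residuals negative:", all(u < 0 for u in residuals))
```

All arithmetic here is on integers, so the totals are exact.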

H. Calculate the mean for SALARY over the 10 observations. Does the mean provide as good
an explanation for the behavior of SALARY as your line? Why or why not? Be specific (what
is the sum of squared residuals)?

n     Salary (Y)
1        40000
2        37000
3        34000
4        12000
5        45000
6        95000
7       100000
8        42000
9        49000
10      120000

$\sum Y = 574{,}000$
$\bar{Y} = \sum Y / n = 574{,}000 / 10 = 57{,}400$

By definition, the mean is a measure of central tendency: it summarizes where a set of
observations is located. The mean of SALARY, however, uses no information about EDUC,
so it cannot describe how salary changes with education. What the mean does do is
minimize the sum of squared distances between each observation and a single summary
number.

Using $\bar{Y} = 57{,}400$ as the prediction for every observation gives a sum of squared
residuals of $\sum (Y_i - \bar{Y})^2 = 10{,}936{,}400{,}000$, which is smaller than the
52,064,000,000 obtained in part (G). For this sample, the mean therefore explains
SALARY better than the line from part (B).
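Part (H) asks for the sum of squared residuals when the mean is used as the prediction. A minimal Python sketch of that calculation follows; the data are typed in from the table above, the variable names are illustrative, and the figure 52,064,000,000 for the part (B) line is taken from part (G) rather than recomputed.

```python
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

mean_salary = sum(salary) / len(salary)

# Residuals when every observation is predicted by the sample mean.
residuals_mean = [y - mean_salary for y in salary]
ssr_mean = sum(u ** 2 for u in residuals_mean)

# Sum of squared residuals of the part (B) line, as reported in part (G).
ssr_part_b = 52_064_000_000

print("mean salary:", mean_salary)
print("SSR around the mean:", ssr_mean)
print("mean fits better than the part (B) line:", ssr_mean < ssr_part_b)
```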
I. In the context of this example, explain why there exists an error or residual term and exactly
what it represents.

An error term is a variable in a statistical model that arises because the model does
not fully represent the actual relationship between the independent and dependent
variables. It is the amount by which the equation may differ from the observed data in
empirical analysis. The error term is also known as the disturbance or remainder term,
and its sample counterpart is the residual. In this example an error or residual term
exists because EDUC alone does not fully explain SALARY; the residual is the
unexplained component of SALARY in the estimated model.

For each observation, the residual is the vertical deviation of the observed salary
from the regression line, that is, the difference between the salary the model predicts
and the salary actually observed. The regression line is the reference point for
analyzing the relationship between one independent variable and one dependent
variable.

J. Calculate the ordinary least squares estimates for $\beta_0$ and $\beta_1$. Do not use the Excel regression
function (that's cheating); rather, calculate the components and plug them into the OLS
formula. Show your work.

n    Yi        Xi    Xi - X̄   Yi - Ȳ   (Xi - X̄)²   (Xi - X̄)(Yi - Ȳ)   Ŷi            ûi = Yi - Ŷi   ûi²
1     40000    11    -1.9     -17400     3.61          33,060           36,817.03      3,182.97       10,131,291
2     37000    12    -0.9     -20400     0.81          18,360           47,650.17    -10,650.17      113,426,177
3     34000    11    -1.9     -23400     3.61          44,460           36,817.03     -2,817.03        7,935,664
4     12000     8    -4.9     -45400    24.01         222,460            4,317.61      7,682.39       59,019,171
5     45000    12    -0.9     -12400     0.81          11,160           47,650.17     -2,650.17        7,023,415
6     95000    16     3.1      37600     9.61         116,560           90,982.74      4,017.26       16,138,388
7    100000    18     5.1      42600    26.01         217,260          112,649.02    -12,649.02      159,997,754
8     42000    12    -0.9     -15400     0.81          13,860           47,650.17     -5,650.17       31,924,451
9     49000    12    -0.9      -8400     0.81           7,560           47,650.17      1,349.83        1,822,034
10   120000    17     4.1      62600    16.81         256,660          101,815.88     18,184.12      330,662,208
Sum  574,000  129     0            0    86.9          941,400                             ≈ 0        738,080,552

Squared residuals are rounded to whole numbers; the total is computed from unrounded values.

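$\bar{X} = \sum X / N = 129/10 = 12.9$, $\bar{Y} = \sum Y / N = 574{,}000/10 = 57{,}400$

$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{941{,}400}{86.9} = 10{,}833.14$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 57{,}400 - [10{,}833.14(12.9)] = -82{,}347.51$$

OLS: $\hat{Y}_i = -82{,}347.525 + 10{,}833.1415 X_i$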
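The component-by-component calculation in part (J) can be mirrored in Python without calling any regression routine. This is a sketch under the assumption that the data are exactly those in the table for part (A); the variable names are illustrative.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]           # X_i
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]           # Y_i

n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(salary) / n

# Components of the OLS formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, salary))
s_xx = sum((x - x_bar) ** 2 for x in educ)

b1_hat = s_xy / s_xx               # slope estimate
b0_hat = y_bar - b1_hat * x_bar    # intercept estimate (negative here)

print("x_bar =", x_bar, " y_bar =", y_bar)
print("S_xy =", round(s_xy, 2), " S_xx =", round(s_xx, 2))
print("slope =", round(b1_hat, 4))
print("intercept =", round(b0_hat, 4))
```

Keeping the slope unrounded gives an intercept of about -82,347.53; rounding the slope to 10,833.14 first gives the -82,347.51 shown in the hand calculation.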

K. Show that the OLS estimates are better than the values given in part (B) above.
Explain your answer.

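OLS: $\hat{Y}_i = -82{,}347.5254 + 10{,}833.1415 X_i$
Part (B): $\hat{Y}_i = 0 + 10{,}000 X_i$

The OLS line has a sum of squared residuals of 738,080,552, compared with
52,064,000,000 for the line in part (B), so the OLS estimates fit the sample far
better. In addition, under the classical linear regression assumptions, the OLS
estimates have the following properties:

a They are unbiased. This means that the OLS estimates of the coefficients are
centered on the true population values of the parameters being estimated.
b They are minimum variance. This means that no other linear unbiased estimator has a
lower variance for each $\hat{\beta}_k$ than OLS.
c They are consistent. This means that as the sample size approaches infinity, the
estimates converge on the true population parameters.
d They are normally distributed, provided the error term is normally distributed.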
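One direct way to show that the OLS estimates beat the part (B) values is to compare sums of squared residuals and to check that moving away from the OLS coefficients only makes the fit worse. The sketch below is illustrative: the function name `ssr` and the nudge sizes are arbitrary choices, not part of the problem set.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

def ssr(b0, b1):
    """Sum of squared residuals of the line b0 + b1 * EDUC."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(educ, salary))

# OLS coefficients computed from the deviation formulas.
n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(salary) / n
b1_ols = (sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, salary))
          / sum((x - x_bar) ** 2 for x in educ))
b0_ols = y_bar - b1_ols * x_bar

ssr_ols = ssr(b0_ols, b1_ols)
ssr_part_b = ssr(0, 10_000)

# Nudge the OLS coefficients in every direction; each nudge should raise the SSR.
nudged = [ssr(b0_ols + d0, b1_ols + d1)
          for d0 in (-1000, 0, 1000)
          for d1 in (-100, 0, 100)
          if (d0, d1) != (0, 0)]

print("SSR, OLS line:     ", round(ssr_ols))
print("SSR, part (B) line:", ssr_part_b)
print("every nudge is worse:", all(s > ssr_ols for s in nudged))
```

The nudge test is only a spot check at eight nearby points, not a proof; the general result follows from the first-order conditions of the least squares problem.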

V. Use the data set in WAGE2.RAW for this problem. As usual, be sure all of the following
regressions contain an intercept.
1. Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde{\delta}_1$.

N = 935
$R^2 = 0.27$
$\tilde{\delta}_1 = 3.5338$
$\widehat{IQ} = 53.69 + 3.53\, educ$

2. Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde{\beta}_1$.
N = 935
$R^2 = 0.10$
$\tilde{\beta}_1 = 0.05984$
$\widehat{\log(wage)} = 5.97 + 0.0598\, educ$

3. Run the multiple regression of log(wage) on educ and IQ, and obtain the slope
coefficients, $\hat{\beta}_1$ and $\hat{\beta}_2$, respectively.

N = 935
$R^2 = 0.13$
$\widehat{\log(wage)} = 5.66 + 0.039\, educ + 0.0058\, IQ$
$\hat{\beta}_1 = 0.03912$
$\hat{\beta}_2 = 0.00586$

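4. Verify that $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$.

$\hat{\beta}_1 = 0.03912$
$\hat{\beta}_2 = 0.0058631$
$\tilde{\beta}_1 = 0.05984$
$\tilde{\delta}_1 = 3.533829$

$$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$$

$$0.05984 = 0.03912 + (0.0058631)(3.533829)$$

$$0.05984 = 0.05984$$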
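The arithmetic of this verification can be checked from the reported coefficients alone. The sketch below does not rerun the regressions on WAGE2.RAW; it only plugs in the estimates listed above, and the variable names and the tolerance are assumptions made for illustration.

```python
import math

# Estimates as reported in items 1 to 3 (rounded as shown in the text).
beta1_tilde = 0.05984      # slope from log(wage) on educ
beta1_hat = 0.03912        # educ coefficient from the multiple regression
beta2_hat = 0.0058631      # IQ coefficient from the multiple regression
delta1_tilde = 3.533829    # slope from IQ on educ

implied = beta1_hat + beta2_hat * delta1_tilde

print("implied simple-regression slope:", round(implied, 5))
print("reported simple-regression slope:", beta1_tilde)

# The inputs are rounded, so compare up to a small absolute tolerance.
print("identity holds:", math.isclose(implied, beta1_tilde, abs_tol=5e-6))
```

The tolerance of 5e-6 reflects that the reported slopes are rounded to five decimal places.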