Econometrics Assignment... 2
1. The following results have been obtained from a sample of 11 observations on the
values of sales (Y) of a firm and the corresponding prices (X).
$\bar{X} = 519.18$
$\bar{Y} = 217.82$
$\sum X_i^2 = 3{,}134{,}543$
$\sum X_i Y_i = 1{,}296{,}836$
$\sum Y_i^2 = 539{,}512$
i) Estimate the regression line of sales on price and interpret the results.
ii) What is the part of the variation in sales which is not explained by the regression
line?
iii) Estimate the price elasticity of sales.
Answer
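i) The regression line of sales on price can be represented by the equation

$$Y = a + bX$$

where, with $N = 11$,

$$b = \frac{N\sum X_iY_i - \sum X_i \sum Y_i}{N\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum X_iY_i - N\bar{X}\bar{Y}}{\sum X_i^2 - N\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$

Substituting the sample values:

$$b = \frac{1{,}296{,}836 - 11(519.18)(217.82)}{3{,}134{,}543 - 11(519.18)^2} = \frac{52{,}870.3364}{169{,}516.4036} \approx 0.31$$

$$a = 217.82 - 0.3119(519.18) \approx 55.89$$

The estimated regression line is therefore $\hat{Y} \approx 55.89 + 0.31X$: in this sample, a one-unit increase in price is associated with an increase of about 0.31 units in sales.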
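As a numerical check, the slope and intercept can be recomputed directly from the summary statistics given in the question. The sketch below uses the deviation form of the formulas; the variable names are illustrative choices, not part of the original assignment.

```python
# Summary statistics as given in Question 1 (n = 11 observations).
n = 11
x_bar, y_bar = 519.18, 217.82
sum_x2, sum_xy = 3_134_543, 1_296_836

# Deviation-form sums: S_xy = sum(XY) - n*mean(X)*mean(Y), S_xx = sum(X^2) - n*mean(X)^2.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2

b = s_xy / s_xx          # slope of sales on price
a = y_bar - b * x_bar    # intercept

print(f"S_xy = {s_xy:.4f}, S_xx = {s_xx:.4f}")
print(f"b = {b:.4f}, a = {a:.2f}")
```

The two sums reproduce the numerator and denominator used in the hand calculation, so any discrepancy in b would point to an arithmetic slip rather than a formula error.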
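ii) The part of the variation in sales that is explained by the regression is measured by $R^2$, the square of the correlation coefficient

$$r = \frac{N\sum X_iY_i - \sum X_i \sum Y_i}{\sqrt{\left[N\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[N\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} = \frac{\sum X_iY_i - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X_i^2 - N\bar{X}^2\right)\left(\sum Y_i^2 - N\bar{Y}^2\right)}}$$

$$r = \frac{52{,}870.3364}{\sqrt{169{,}516.4036 \times 17{,}610.9236}} \approx 0.968$$

$$R^2 = r^2 \approx 0.936$$

The part of the variation in sales that is not explained by price is $1 - R^2 = 1 - 0.936 = 0.064$, that is, about 6.4% of the variation in sales is not explained by the regression line.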
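The distinction between the correlation coefficient r and the coefficient of determination R² is easy to verify numerically. The following sketch, which assumes only the summary statistics of Question 1, computes both along with the unexplained share; the variable names are illustrative.

```python
import math

# Summary statistics from Question 1.
n = 11
x_bar, y_bar = 519.18, 217.82
sum_x2, sum_xy, sum_y2 = 3_134_543, 1_296_836, 539_512

# Deviation-form sums of squares and cross-products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient
r_squared = r ** 2                  # share of variation explained
unexplained_share = 1 - r_squared   # share of variation not explained
rss = unexplained_share * s_yy      # unexplained variation in squared sales units

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
print(f"unexplained share = {unexplained_share:.4f}, RSS = {rss:.1f}")
```

Reporting the residual sum of squares alongside the share gives the unexplained variation both in relative and in absolute terms.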
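iii) The price elasticity of sales is

$$e_p = \frac{dY}{dX}\cdot\frac{X}{Y} = b\,\frac{X}{Y} = 0.31\,\frac{X}{Y}$$

Evaluated at the sample means, $e_p = 0.31 \times 519.18 / 217.82 \approx 0.74$.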
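Because the elasticity of a linear regression line depends on the point at which it is evaluated, it can help to express it as a small function. This is a minimal sketch that assumes the rounded slope b = 0.31 and evaluates at the sample means; the function name is a hypothetical helper.

```python
def point_elasticity(slope, x, y):
    """Elasticity of Y with respect to X at the point (x, y) on a linear fit."""
    return slope * x / y

b = 0.31                          # estimated slope of sales on price (rounded)
x_bar, y_bar = 519.18, 217.82     # sample means of price and sales

elasticity_at_means = point_elasticity(b, x_bar, y_bar)
print(f"elasticity at the means = {elasticity_at_means:.3f}")
```

Evaluating at other (X, Y) points on the fitted line would give different values, which is why the means are the conventional reporting point.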
2. The following data refers to the price of a good ‘P’ and the quantity of the good supplied, ‘S’.
P 2 7 5 1 4 8 2 8
S 15 41 32 9 28 43 17 40
Answer
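With $n = 8$, $\sum S_i = 225$ and $\sum P_i = 37$, the sample means are

$$\bar{S} = \frac{\sum S_i}{n} = \frac{225}{8} = 28.125, \qquad \bar{P} = \frac{\sum P_i}{n} = \frac{37}{8} = 4.625$$

The sums of squares and cross-products in deviation form are

$$S_{SS} = \sum_{i=1}^{8} (S_i - \bar{S})^2 = 1204.875 \approx 1205$$

$$S_{PP} = \sum_{i=1}^{8} (P_i - \bar{P})^2 = 55.875 \approx 55.9$$

$$S_{SP} = \sum_{i=1}^{8} (S_i - \bar{S})(P_i - \bar{P}) = 255.375 \approx 255.4$$

For the regression of quantity supplied on price, $S = \alpha + \beta P$, the OLS estimates are

$$\hat{\beta} = \frac{S_{SP}}{S_{PP}} = \frac{255.375}{55.875} \approx 4.5705, \qquad \hat{\alpha} = \bar{S} - \hat{\beta}\bar{P} = 28.125 - 4.5705 \times 4.625 \approx 6.987$$

The correlation between S and P is

$$r = \frac{S_{SP}}{\sqrt{S_{SS}}\,\sqrt{S_{PP}}} = \frac{255.375}{\sqrt{1204.875 \times 55.875}} \approx 0.984$$

The estimated error variance is

$$\hat{\sigma}^2 = \frac{S_{SS} - \hat{\beta} S_{SP}}{n - 2} = \frac{1204.875 - 1167.184}{6} \approx 6.282, \qquad \hat{\sigma} \approx 2.506$$

and the standard errors of the estimates are

$$SE(\hat{\alpha}) = \hat{\sigma}\sqrt{\frac{1}{n} + \frac{\bar{P}^2}{S_{PP}}} = 2.506\sqrt{0.125 + \frac{21.3906}{55.875}} \approx 1.786$$

$$SE(\hat{\beta}) = \frac{\hat{\sigma}}{\sqrt{S_{PP}}} = \frac{2.506}{\sqrt{55.875}} \approx 0.335$$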
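The calculations for Question 2 can be reproduced from the eight raw observations. The sketch below assumes the simple linear model S = α + βP estimated by ordinary least squares; all variable names are illustrative.

```python
import math

# Price (P) and quantity supplied (S) from Question 2.
P = [2, 7, 5, 1, 4, 8, 2, 8]
S = [15, 41, 32, 9, 28, 43, 17, 40]
n = len(P)

p_bar = sum(P) / n
s_bar = sum(S) / n

# Deviation-form sums of squares and cross-products.
s_pp = sum((p - p_bar) ** 2 for p in P)
s_ss = sum((s - s_bar) ** 2 for s in S)
s_sp = sum((s - s_bar) * (p - p_bar) for s, p in zip(S, P))

beta = s_sp / s_pp                 # OLS slope
alpha = s_bar - beta * p_bar       # OLS intercept

rss = s_ss - beta * s_sp           # residual sum of squares
sigma = math.sqrt(rss / (n - 2))   # standard error of the regression

se_beta = sigma / math.sqrt(s_pp)
se_alpha = sigma * math.sqrt(1 / n + p_bar ** 2 / s_pp)

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}")
print(f"sigma = {sigma:.4f}, SE(alpha) = {se_alpha:.4f}, SE(beta) = {se_beta:.4f}")
```

Working from the raw data rather than from rounded intermediate sums avoids compounding rounding errors in the standard errors.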
3. Suppose that a researcher estimates a consumption function and obtains the following results:
$$C = \underset{(3.1)}{15} + \underset{(18.7)}{0.81}\,Y_d, \qquad n = 19, \qquad R^2 = 0.99$$
where C = consumption, Yd = disposable income, and the numbers in parentheses are the t-ratios.
a. Test the significance of Yd statistically using the t-ratios.
b. Determine the estimated standard deviations of the parameter estimates.
Answer
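a) For a fitted regression model $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$ with Y as the response and X as the predictor variable, the test statistic for testing the significance of X is

$$T = \frac{\hat{\beta}_2 - \beta_2^0}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \sim t_{n-2} \quad \text{under } H_0$$

where

$$\hat{\sigma}^2 = \frac{RSS}{n-2}, \qquad S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

and $\beta_2^0$ is the hypothesized value of $\beta_2$, which here is 0.

We therefore test the null hypothesis $H_0: \beta_2 = 0$ against the alternative hypothesis $H_1: \beta_2 \neq 0$. Because $\beta_2^0 = 0$ under $H_0$, the test statistic is the reported t-ratio for $\beta_2$, T = 18.7.
The p-value for this test can be computed from the t-distribution with n − 2 = 19 − 2 = 17 degrees of freedom using the R code

2*pt(18.7, 17, lower.tail = FALSE)

which gives a p-value that is effectively 0. The statistic also far exceeds the 5% two-sided critical value of about 2.11, so we reject the null hypothesis and conclude that disposable income is statistically significant.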
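The two-sided p-value of a t-ratio can also be cross-checked without a statistics package. The sketch below is one possible approach, not the method used in the assignment: it integrates the Student's t density over the upper tail with Simpson's rule after the substitution u = 1/x. The function names are hypothetical helpers and the step count is an arbitrary assumption.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value 2*P(T > |t|) for a nonzero t_stat (steps must be even)."""
    upper = 1.0 / abs(t_stat)   # x in [|t|, inf) maps to u in (0, 1/|t|]
    h = upper / steps

    def g(u):
        # Integrand after substituting x = 1/u, so dx = -du / u^2.
        return 0.0 if u == 0.0 else t_pdf(1.0 / u, df) / (u * u)

    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)   # Simpson weights 4, 2, 4, ...
    return 2.0 * total * h / 3.0

p_value = t_two_sided_p(18.7, 17)
print(f"two-sided p-value for t = 18.7 with 17 df: {p_value:.3e}")
```

The substitution turns the infinite tail into a finite interval, which keeps the numerical integration simple; for routine work a library routine such as R's pt is the more dependable choice.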
b) The t-ratio is the coefficient estimate divided by its standard error, and the standard error is the estimated standard deviation of the estimate. Hence:
t-ratio(β1) = 3.1 = β1/se(β1) = 15/se(β1)
se(β1) = 15/3.1 = 4.839
t-ratio(β2) = 18.7 = β2/se(β2) = 0.81/se(β2)
se(β2) = 0.81/18.7 = 0.043
Therefore the estimated standard deviations of the parameter estimates are 4.839 and 0.043
respectively.
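The step from t-ratios to standard errors is a one-line rearrangement, sketched here with the values reported in Question 3. The helper name is hypothetical.

```python
def se_from_t_ratio(coefficient, t_ratio):
    """Standard error implied by an estimate and its t-ratio (H0: coefficient = 0)."""
    return coefficient / t_ratio

se_intercept = se_from_t_ratio(15, 3.1)      # intercept of the consumption function
se_slope = se_from_t_ratio(0.81, 18.7)       # coefficient on disposable income

print(f"se(intercept) = {se_intercept:.3f}, se(slope) = {se_slope:.4f}")
```

This rearrangement only holds when the reported t-ratio tests the hypothesis that the coefficient equals zero, which is the convention assumed in the question.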
4. Discuss the nature, causes, consequences and remedies of each of the following problems we
might encounter in regression analysis.
a) Multicollinearity
b) Heteroscedasticity
c) Autocorrelation
Answer
a) Multicollinearity
Multicollinearity is the occurrence of high intercorrelations among two or more independent
variables in a multiple regression model. Its causes can be data-based or structural.
Data-based multicollinearity can arise from insufficient data, from the way dummy variables are
included, from using a variable that is a combination of two existing variables, or from using
two identical or almost identical variables. Its consequences are inflated standard errors and
less precise coefficient estimates, which reduce the power of significance tests on the
individual coefficients. Remedies include removing some of the highly correlated independent
variables, combining the correlated variables (for example by adding them together), and using
a method designed for highly correlated variables, such as principal components analysis or
partial least squares regression.
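A common way to quantify multicollinearity is the variance inflation factor (VIF). The sketch below covers only the two-regressor case, where the auxiliary R² equals the squared correlation between the two regressors; the data are made up purely for illustration, and the cutoff of 10 is a widely quoted rule of thumb rather than a formal test.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - r^2) when the model has exactly two regressors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical data: x2 is nearly twice x1, x3 is unrelated to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [5, 1, 4, 2, 6, 3]

print(f"VIF(x1, x2) = {vif_two_regressors(x1, x2):.1f}")   # very large: severe collinearity
print(f"VIF(x1, x3) = {vif_two_regressors(x1, x3):.2f}")   # close to 1: little collinearity
```

With more than two regressors, each VIF requires regressing one explanatory variable on all the others, which this sketch does not attempt.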
b) Heteroscedasticity
Heteroscedasticity is a situation in which the variance of the error term is not constant over
the range of measured values. It shows up as an unequal scatter of residuals in a regression
analysis. One cause of heteroscedasticity is a dataset with a wide range of values, including
outliers. Another cause is the omission of relevant variables from the model. As a consequence,
the OLS estimators remain linear and unbiased but are no longer best (minimum variance), so
they are not BLUE. The usual standard errors are also biased, so hypothesis tests of the
estimated coefficients using the t-test and F-test become invalid. The remedies for
heteroscedasticity are data transformations (square root transformation, exponential
transformation, logarithmic transformation, absolute value transformation and inverse
transformation).
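One simple way to see heteroscedasticity in data is to compare the residual variation in the low and high ranges of the regressor, which is the idea behind the Goldfeld-Quandt test. The sketch below is a simplified version under stated assumptions: a single regressor, two equal halves, and no central observations dropped. The data are invented for illustration and the function names are hypothetical.

```python
def rss_simple_ols(x, y):
    """Residual sum of squares from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def goldfeld_quandt_ratio(x, y):
    """RSS of the high-x half divided by RSS of the low-x half (equal sizes)."""
    pairs = sorted(zip(x, y))            # order observations by the regressor
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[-half:]
    return rss_simple_ols(*zip(*high)) / rss_simple_ols(*zip(*low))

# Hypothetical data: y is roughly 2x, with errors that grow as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.1, 7.9, 10.1, 10.0, 16.0, 13.5, 20.5, 17.0]

ratio = goldfeld_quandt_ratio(x, y)
print(f"Goldfeld-Quandt ratio = {ratio:.1f}")
```

Because both halves have the same degrees of freedom, the ratio can be compared with an F critical value; a ratio far above 1 suggests that the error variance rises with the regressor.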
c) Autocorrelation
Autocorrelation is the correlation of a variable with itself across successive time intervals.
It measures the relationship between a variable's current value and its past values. It is a
correlation coefficient, but instead of relating two different variables it relates two values
of the same variable. In a regression model it means that the disturbances are not pairwise
independent but pairwise autocorrelated. Autocorrelation is most likely to occur in time series
data. It can be caused by seasonal shocks that affect a variable differently in different
periods. Other causes are model misspecification and data smoothing or manipulation. With
autocorrelated errors the OLS coefficient estimates remain linear and unbiased but are no
longer best. In addition, with positive autocorrelation the variances of the estimates are
typically underestimated, which affects hypothesis testing: the coefficient of determination
tends to be overestimated and the t-statistics inflated. When autocorrelated error terms are
found, one of the first remedial measures should be to investigate whether a key predictor
variable has been omitted. If adding such a predictor does not reduce or eliminate the
autocorrelation of the error terms, certain transformations of the variables can be performed.
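First-order autocorrelation in regression residuals is commonly screened with the Durbin-Watson statistic, which is near 2 when there is no autocorrelation, near 0 under strong positive autocorrelation, and near 4 under strong negative autocorrelation. The sketch below uses made-up residual series purely for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift slowly (positive autocorrelation).
drifting = [1.0, 0.8, 0.9, 0.5, -0.2, -0.6, -0.9, -0.7]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

print(f"DW (drifting)    = {durbin_watson(drifting):.3f}")
print(f"DW (alternating) = {durbin_watson(alternating):.3f}")
```

A formal decision requires comparing the statistic with tabulated lower and upper bounds for the sample size and number of regressors, which are not reproduced here.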
5. Use the data file wage to work on using STATA and answer the following questions:
Answer
a) The data has 10 variables and 39 observations.
b) Remedial measures are not needed because the data show no problems of
multicollinearity or heteroscedasticity.
c) The results of the regression analysis are presented as follows:
d) 1. Linktest
. linktest
The results of the link test indicated that the model is properly specified.
2. Ovtest
The results of the ovtest indicated that the model is properly specified.
e) Multicollinearity test result
chi2(1) = 2.16
Prob > chi2 = 0.1421
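The reported p-value can be checked against the reported chi-square statistic. For 1 degree of freedom the upper-tail probability has the closed form P(χ² > x) = erfc(√(x/2)), which the sketch below evaluates; the function name is a hypothetical helper, and the small gap to the reported 0.1421 is consistent with the statistic having been rounded to 2.16.

```python
import math

def chi2_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

p_value = chi2_1df_p_value(2.16)
print(f"p-value for chi2(1) = 2.16: {p_value:.4f}")
```

Since the p-value exceeds 0.05, the null hypothesis of the test is not rejected at the 5% level.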
g) The model is statistically significant (F = 1079, p < 0.001), and the adjusted R-squared
shows that about 76.31% of the variation in the dependent variable is explained by the
independent variables.
h) The results of the regression analysis indicate that the factors that affect the average
hours worked during the year are the average yearly earnings of other family members ($),
average yearly non-earned income, average family asset holdings (bank account, etc.) ($),
average age of the respondent, and average number of dependents.
6. Use the data file EARNINGS and, using STATA for analysis, carry out the following tasks.
a. Perform a regression of EARNINGS on S where EARNINGS represents Current hourly
earnings in $ and S represents education (highest grade completed) in number of years of
schooling of the respondent. Interpret the regression results
b. Comment on the value of R²
c. Perform a test on the coefficients of regression. Explain the implications of the result of
the test. Calculate a 95% confidence interval for the slope coefficient
d. Perform an F test for the goodness of fit and comment on the result
e. Regress S on ASVAC and SM where ASVAC is a composite measure of numerical and
verbal ability of the respondent and SM is the years of schooling of the respondent’s
mother. Repeat the regression using SF, the years of schooling of the father, instead of
SM, and again including both as regressors. Do your regression results support the view
that if you educate a male, you educate an individual, while if you educate a female, you
educate a nation?
f. Regress EARNINGS on S and EXP (total out-of-school work experience in years),
interpret the results and perform t tests
Answer
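a. Stata regression output:

      Source |       SS           df       MS      Number of obs   =       540
-------------+----------------------------------   F(1, 538)       =    112.15
       Model |  19321.5589         1  19321.5589   Prob > F        =    0.0000
    Residual |  92688.6722       538  172.283777   R-squared       =    0.1725
-------------+----------------------------------   Adj R-squared   =    0.1710
       Total |  112010.231       539  207.811189   Root MSE        =    13.126
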
The regression results show that years of schooling (highest grade completed) have a positive
and significant effect on current hourly earnings.
b. The value of R² (0.1725) indicates that years of schooling (highest grade completed)
explain about 17.3% of the variation in current hourly earnings. This value is small: most of
the variation in earnings is due to factors other than schooling.
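The summary statistics in a Stata regression header all follow from the sums of squares and degrees of freedom in the ANOVA block. The sketch below recomputes them for the regression of EARNINGS on S, using the values reported in the output; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the Stata output for EARNINGS on S.
ss_model, ss_resid = 19321.5589, 92688.6722
df_model, df_resid = 1, 538

ss_total = ss_model + ss_resid
df_total = df_model + df_resid

r_squared = ss_model / ss_total
adj_r_squared = 1 - (ss_resid / df_resid) / (ss_total / df_total)
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"F = {f_stat:.2f}, Root MSE = {root_mse:.3f}")
```

Matching these figures to the printed header is a quick way to confirm that a transcribed table has been copied correctly.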
c. The slope coefficient (β = 2.46, p < 0.001) is statistically significant and indicates that
a one-year increase in schooling is associated with an increase of about $2.46 in current
hourly earnings.
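An approximate 95% confidence interval for the slope can be backed out of the reported figures. This sketch relies on two assumptions that should be checked against the full Stata coefficient table: that in a regression with a single regressor the slope's t-statistic is the square root of the F statistic, and that with 538 degrees of freedom the critical value is close to the normal value 1.96. The slope 2.46 is the rounded value quoted in the text.

```python
import math

slope = 2.46          # rounded slope estimate quoted in the answer
f_stat = 112.15       # F(1, 538) from the regression output
critical_value = 1.96 # normal approximation to the t critical value (assumption)

t_stat = math.sqrt(f_stat)          # valid only with one regressor
std_error = slope / t_stat
margin = critical_value * std_error

lower, upper = slope - margin, slope + margin
print(f"t = {t_stat:.2f}, se = {std_error:.4f}")
print(f"approximate 95% CI: ({lower:.2f}, {upper:.2f})")
```

Because the interval excludes zero, it agrees with the significance of the slope; the exact interval printed by Stata may differ slightly because of rounding in the inputs.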
d. The F-test (F = 112.15, p < 0.001) rejects the null hypothesis that the slope coefficient is zero, so the model has statistically significant explanatory power.
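e. Stata regression output:

      Source |       SS           df       MS      Number of obs   =       540
-------------+----------------------------------   F(2, 537)       =    155.49
       Model |  1175.37867         2  587.689333   Prob > F        =    0.0000
    Residual |  2029.60467       537  3.77952452   R-squared       =    0.3667
-------------+----------------------------------   Adj R-squared   =    0.3644
       Total |  3204.98333       539  5.94616574   Root MSE        =    1.9441
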
The results of the third regression do not support the view that if you educate a male,
you educate an individual, while if you educate a female, you educate a nation.
f. The regression of EARNINGS on S and EXP (total out-of-school work experience in years) is
interpreted as follows:
The regression results show that years of schooling (highest grade completed) and total
out-of-school work experience in years both have a positive and significant effect on current
hourly earnings.