
PHARMA COLLEGE

DEPARTMENT OF ACCOUNTING AND FINANCE


MSC IN ACCOUNTING AND FINANCE
ECONOMETRICS ASSIGNMENT

1. The following results have been obtained from a sample of 11 observations on the
values of sales (Y) of a firm and the corresponding prices (X).
X̄ = 519.18
Ȳ = 217.82
∑Xi² = 3,134,543
∑XiYi = 1,296,836
∑Yi² = 539,512
i) Estimate the regression line of sale on price and interpret the results
ii) What is the part of the variation in sales which is not explained by the regression
line?
iii) Estimate the price elasticity of sales.

Answer
i) The regression line of sales on price can be represented by the following equation

Y= a +bX

Where b = (∑XiYi − N·X̄·Ȳ)/(∑Xi² − N·X̄²), a = Ȳ − b·X̄, and N = 11.

Putting in the values,

b = (1,296,836 − 11 × 519.18 × 217.82)/(3,134,543 − 11 × 519.18²)

b = 52,870.34/169,516.40 = 0.31

a = 217.82 − (0.31 × 519.18) = 55.89

Therefore, the regression line of sales on price is Y = 55.89 + 0.31X. The slope means that a one-unit increase in price is associated with a 0.31-unit increase in sales, and the intercept is the predicted level of sales at zero price.

ii) The proportion of the variation in sales explained by the regression is measured by R², the square of the correlation coefficient r:

r = (∑XiYi − N·X̄·Ȳ)/√[(∑Xi² − N·X̄²)(∑Yi² − N·Ȳ²)]

r = 52,870.34/√(169,516.40 × 17,610.92) = 0.9676

R² = r² = 0.9676² = 0.936

The part of the variation in sales that is not explained by price is therefore 1 − R² = 1 − 0.936 = 0.064, interpreted as about 6.4% of the variation in sales not being explained by the price.

iii) The price elasticity of sales can be calculated as:

e(p) = (dY/dX)·(X/Y) = b·(X/Y)

Evaluated at the sample means, e(p) = 0.31 × (519.18/217.82) ≈ 0.74.
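As a cross-check, the estimates above can be reproduced directly from the given summary statistics. A minimal Python sketch (variable names are ours; the assignment does not prescribe software for this question):

```python
import math

# Given summary statistics for N = 11 observations
N = 11
x_bar, y_bar = 519.18, 217.82
sum_x2, sum_xy, sum_y2 = 3_134_543, 1_296_836, 539_512

# Sums of squares and cross-products about the means
sxx = sum_x2 - N * x_bar**2
syy = sum_y2 - N * y_bar**2
sxy = sum_xy - N * x_bar * y_bar

# Slope and intercept of the regression of sales (Y) on price (X)
b = sxy / sxx
a = y_bar - b * x_bar

# Correlation coefficient and coefficient of determination
r = sxy / math.sqrt(sxx * syy)
r2 = r**2

# Price elasticity of sales evaluated at the sample means
elasticity = b * x_bar / y_bar

print(round(b, 2), round(a, 2), round(r2, 3), round(elasticity, 2))
```

Running this reproduces b = 0.31, a = 55.89, R² = 0.936, and e(p) = 0.74.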

2. The following data refers to the price of a good ‘P’ and the quantity of the good supplied, ‘S’.

P 2 7 5 1 4 8 2 8
S 15 41 32 9 28 43 17 40

a. Estimate the linear regression line E(S) = α + βP

b. Estimate the standard errors of α̂ and β̂


c. Test the hypothesis that price influences supply

Answer

a) The linear regression line can be calculated as follows:

From the data, with n = 8:

S̄ = ∑Si/n = 225/8 = 28.125
P̄ = ∑Pi/n = 37/8 = 4.625

SSS = ∑(Si − S̄)² = 1,204.88
SPP = ∑(Pi − P̄)² = 55.88
SSP = ∑(Si − S̄)(Pi − P̄) = 255.38

β̂ = SSP/SPP = 255.38/55.88 = 4.5705

α̂ = S̄ − β̂·P̄ = 28.125 − 4.5705 × 4.625 = 6.9864

Therefore, the estimated regression line is Ŝ = 6.99 + 4.57P

b) The standard errors (SE) of α̂ and β̂ are calculated as follows:

SE(α̂) = σ̂·√(1/n + P̄²/SPP)  and  SE(β̂) = σ̂/√SPP

σ̂² = RSS/(n − 2) = [SSS − β̂·SSP]/(n − 2)
   = [1,204.88 − 4.5705 × 255.38]/(8 − 2)
   = 37.69/6 = 6.28

σ̂ = √6.28 = 2.506

SE(β̂) = 2.506/√55.88 = 0.3353

SE(α̂) = 2.506 × √(1/8 + 4.625²/55.88) = 2.506 × 0.7126 = 1.786

c) Testing the hypothesis

H0: β = 0 versus H1: β ≠ 0 at α = 0.05
t = (β̂ − 0)/SE(β̂) = 4.5705/0.3353 = 13.63
t tabulated = t(0.025, 6) = 2.4469
Since |t| = 13.63 > 2.4469, we reject H0; at α = 0.05, price significantly influences supply.
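The whole calculation for question 2 can be verified from the raw data. A minimal Python sketch (variable names are ours):

```python
import math

# Raw data: price P and quantity supplied S (n = 8)
P = [2, 7, 5, 1, 4, 8, 2, 8]
S = [15, 41, 32, 9, 28, 43, 17, 40]
n = len(P)

p_bar = sum(P) / n
s_bar = sum(S) / n

# Sums of squares and cross-products about the means
spp = sum((p - p_bar) ** 2 for p in P)
sss = sum((s - s_bar) ** 2 for s in S)
ssp = sum((s - s_bar) * (p - p_bar) for s, p in zip(S, P))

beta = ssp / spp              # slope
alpha = s_bar - beta * p_bar  # intercept

# Residual variance and standard errors
sigma2 = (sss - beta * ssp) / (n - 2)
se_beta = math.sqrt(sigma2 / spp)
se_alpha = math.sqrt(sigma2) * math.sqrt(1 / n + p_bar**2 / spp)

t_stat = beta / se_beta       # test of H0: beta = 0

print(round(alpha, 2), round(beta, 2), round(se_beta, 3), round(t_stat, 1))
```

This reproduces α̂ = 6.99, β̂ = 4.57, SE(β̂) = 0.335, and t = 13.6, confirming that the slope is highly significant.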

3. Suppose that a researcher estimates a consumptions function and obtains the following results:
Ĉ = 15 + 0.81Yd    n = 19
    (3.1)  (18.7)   R² = 0.99
Where C = consumption, Yd = disposable income, and the numbers in parentheses are the t-ratios.
a. Test the significance of Yd statistically using the t-ratios
b. Determine the estimated standard deviations of the parameter estimates
Answer
For a fitted regression model ŷ = β̂1 + β̂2x, with Y as the response and X as the predictor variable, the test statistic for testing the significance of X is given by

T = (β̂2 − β20)/√(σ̂²/Sxx), which follows t(n−2) under H0

Where:
σ̂² = RSS/(n − 2), Sxx = ∑(Xi − X̄)², X̄ = (1/n)∑Xi, and
β20 is the hypothesized value of β2; here it is 0.

a) In this case, Y = C, X = Yd, n = 19, R² = 0.99.

t-ratio for β1 = 3.1, t-ratio for β2 = 18.7

Therefore we test the null hypothesis H0: β2 = 0 against the alternative hypothesis H1: β2 ≠ 0.

The test statistic for this test is t(β2) = t-ratio for β2 = 18.7 [∵ under H0, β20 = 0].

The p-value for this test can be computed from the t-distribution with df = n − 2 = 19 − 2 = 17 using the R code 2*pt(18.7, 17, lower.tail = FALSE), which gives a p-value of approximately 0, so we reject the null hypothesis.

Therefore, it can be concluded that Yd is statistically significant.

b) The t-ratio is the estimate divided by its standard error, and the standard error is the estimated standard deviation of the estimator.

t-ratio(β1) = 3.1 = β1/se(β1) = 15/se(β1)
se(β1) = 15/3.1 = 4.839
t-ratio(β2) = 18.7 = β2/se(β2) = 0.81/se(β2)
se(β2) = 0.81/18.7 = 0.043

Therefore, the estimated standard deviations of the parameter estimates are 4.839 and 0.043
respectively.
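The standard errors and the significance check can be verified with a few lines of Python (a sketch; the critical value t(0.025, 17) = 2.110 is taken from standard t tables):

```python
# Recover standard errors from the reported estimates and t-ratios
b1, b2 = 15.0, 0.81   # intercept and slope estimates
t1, t2 = 3.1, 18.7    # reported t-ratios

se_b1 = b1 / t1       # SE = estimate / t-ratio
se_b2 = b2 / t2

# Two-sided 5% critical value for 17 degrees of freedom (from t tables)
t_crit = 2.110
significant = abs(t2) > t_crit

print(round(se_b1, 3), round(se_b2, 3), significant)
```

This reproduces se(β1) = 4.839 and se(β2) = 0.043, and confirms that 18.7 far exceeds the critical value.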
4. Discuss the nature, causes, consequences and remedies of each of the following problems we
might encounter in regression analysis.
a) Multicollinearity
b) Heteroscedasticity
c) Autocorrelation

Answer

a) Multicollinearity

Multicollinearity is the occurrence of high intercorrelations among two or more independent
variables in a multiple regression model. The causes of multicollinearity can be data-based or
structural. Data-based multicollinearity can occur due to insufficient data, the existence of dummy
variables, using a variable that is a combination of two existing variables, or using two identical
or nearly identical variables. The consequences of multicollinearity include inflated standard
errors of the coefficient estimates and reduced precision of the estimated coefficients, which
lowers the model's statistical power. Remedies for multicollinearity include removing some of the
highly correlated independent variables, combining the correlated independent variables into a
single variable, and performing an analysis designed for highly correlated variables, such as
principal components analysis or partial least squares regression.
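The severity of multicollinearity is commonly quantified with the variance inflation factor, VIF = 1/(1 − Rj²), where Rj² comes from regressing predictor j on the other predictors. A minimal Python sketch for the two-predictor case, where Rj² is simply the squared correlation between the predictors (the data are hypothetical, chosen so x2 is nearly a multiple of x1):

```python
import math

# Hypothetical, nearly collinear predictors (x2 is approximately 2 * x1)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 10.1]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n

# Pearson correlation between the two predictors
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
var1 = sum((a - m1) ** 2 for a in x1)
var2 = sum((b - m2) ** 2 for b in x2)
r = cov / math.sqrt(var1 * var2)

# With exactly two predictors, R_j^2 equals r^2, so:
vif = 1 / (1 - r**2)
print(vif > 10)  # a VIF above 10 is a common rule-of-thumb red flag
```

Here the VIF is far above 10, flagging exactly the kind of near-duplicate variable that the remedies above would remove or combine.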

b) Heteroscedasticity

Heteroscedasticity is a situation where the variance of residuals is not constant over the range of
measured values. It results in an unequal scatter of residuals in a regression analysis. One cause

5
of heteroscedasticity is using a dataset with a wide range of values, resulting in outliers. Another
cause is the omission of variables from the model. The consequence of heteroscedasticity is that
it results in estimators that are not best, linear, and unbiased. Similarly, hypothesis tests of the
estimated coefficients using t-test and f-test become invalid due to heteroscedasticity. The
remedies for heteroscedasticity test are data transformation (Square root transformation,
exponential transformation, logarithmic transformation, absolute value transformation and
inverse transformation).
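A quick diagnostic in the spirit of the Goldfeld-Quandt test is to fit the regression, then compare the residual variance in the low-x half of the sample with that in the high-x half. A rough Python sketch on made-up data whose spread grows with x (a formal test would compare the ratio against an F critical value):

```python
# Hypothetical data whose scatter widens as x grows (heteroscedastic pattern)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.5, 7.4, 11.8, 10.2, 17.9, 12.6]
n = len(x)

# Closed-form simple OLS fit
xm, ym = sum(x) / n, sum(y) / n
b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum((xi - xm) ** 2 for xi in x)
a = ym - b * xm

resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Compare residual variance in the lower-x and upper-x halves
half = n // 2
v_low = sum(e**2 for e in resid[:half]) / half
v_high = sum(e**2 for e in resid[half:]) / half
ratio = v_high / v_low  # a ratio far above 1 hints at heteroscedasticity
print(ratio > 1)
```

For this data the upper-half residual variance is many times the lower-half variance, which is the pattern a transformation (e.g. taking logs of y) is meant to tame.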

c) Autocorrelation

Autocorrelation is the correlation of the same variable between two successive time intervals; it
measures the relationship between a variable's current value and its past values. Mathematically,
autocorrelation is a correlation coefficient, but instead of measuring the correlation between two
different variables, it measures the correlation between two values of the same variable. This means that
the disturbances are not pairwise independent, but pairwise autocorrelated. Autocorrelation is
most likely to occur in time series data. It can be caused by seasonal shocks that
affect a variable differently at different periods; other causes are model
misspecification and data smoothing or manipulation. Autocorrelation leads to coefficient
estimates that, while still linear and unbiased, are no longer best (efficient). Besides, it typically
underestimates the variances of the estimates, which distorts hypothesis testing: the coefficient
of determination becomes overestimated and the t-statistics become inflated. When autocorrelated
error terms are found to be present, one of the first remedial measures should be to investigate the
omission of a key predictor variable. If such a predictor does not help reduce or eliminate the
autocorrelation of the error terms, certain transformations of the variables (such as generalized
differencing) can be performed.
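The standard first-order diagnostic is the Durbin-Watson statistic, DW = ∑(et − et−1)²/∑et², which is near 2 for uncorrelated residuals, near 0 under strong positive autocorrelation, and near 4 under strong negative autocorrelation. A minimal Python sketch on two made-up residual series:

```python
def durbin_watson(resid):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e**2 for e in resid)
    return num / den

# Hypothetical residual series: one persistent (positively autocorrelated),
# one alternating in sign (negatively autocorrelated)
persistent = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(durbin_watson(persistent) < 1)   # far below 2: positive autocorrelation
print(durbin_watson(alternating) > 3)  # far above 2: negative autocorrelation
```

In practice the statistic is computed on the residuals of the fitted model and compared against tabulated Durbin-Watson bounds.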

5. Use the data file wage to work on using STATA and answer the following questions:

a) Examine the data


b) Carry out remedial measure(s) if there is any problem with data
c) Regress HRS on RATE, ERSP, ERNO, NEIN, AGE and DEP
d) Conduct model specification tests using linktest and ovtest commands of STATA, and
interpret the result
e) Perform multicollinearity test
f) Perform heteroscedasticity test
g) Comment on the explanatory power and adequacy of the model
h) Interpret the regression coefficients

Answer
a) The data has 10 variables and 39 observations.

b) Remedial measures are not needed because the data show no problems of
multicollinearity or heteroscedasticity (see the tests in parts (e) and (f)).
c) The results of the regression analysis are presented as follows:

Source SS df MS Number of obs = 36


F(6, 29) = 19.79
Model 115137.522 6 19189.5869 Prob > F = 0.0000
Residual 28119.2285 29 969.628568 R-squared = 0.8037
Adj R-squared = 0.7631
Total 143256.75 35 4093.05 Root MSE = 31.139

HRS Coef. Std. Err. t P>|t| [95% Conf. Interval]

RATE -26.85105 24.04727 -1.12 0.273 -76.03323 22.33114


ERSP .0285354 .0357251 0.80 0.431 -.0445307 .1016015
ERNO -.2780987 .091545 -3.04 0.005 -.4653293 -.0908681
NEIN .5846803 .0892013 6.55 0.000 .4022431 .7671175
AGE -5.239095 2.40644 -2.18 0.038 -10.16082 -.3173734
DEP 27.15917 13.75215 1.97 0.058 -.9671344 55.28547
_cons 2210.755 104.2362 21.21 0.000 1997.568 2423.942

d) 1. Linktest

. linktest

Source SS df MS Number of obs = 36


F(2, 33) = 70.14
Model 115975.366 2 57987.6832 Prob > F = 0.0000
Residual 27281.3837 33 826.708597 R-squared = 0.8096
Adj R-squared = 0.7980
Total 143256.75 35 4093.05 Root MSE = 28.753

HRS Coef. Std. Err. t P>|t| [95% Conf. Interval]

_hat 6.920836 5.882094 1.18 0.248 -5.046374 18.88805


_hatsq -.0013877 .0013785 -1.01 0.321 -.0041922 .0014168
_cons -6311.115 6271.789 -1.01 0.322 -19071.17 6448.936

Since the squared prediction (_hatsq) is not statistically significant (p = 0.321), the link test provides no evidence that the model is misspecified.

2. Ovtest

Ramsey RESET test using powers of the fitted values of HRS


Ho: model has no omitted variables
F(3, 26) = 3.00
Prob > F = 0.0488

Since Prob > F = 0.0488 is just below 0.05, the ovtest rejects the null of no omitted variables at the 5% level, suggesting the model may be misspecified, although the evidence is marginal.
e) Multicollinearity test result

Variable VIF 1/VIF

NEIN 5.49 0.182100


RATE 4.56 0.219380
AGE 3.93 0.254487
ERSP 2.95 0.338889
DEP 2.91 0.343477
ERNO 2.65 0.377146

Mean VIF 3.75

Since all VIF values are below 10 (mean VIF = 3.75), multicollinearity is not a serious problem in the model.

f) Heteroscedasticity test result


Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of HRS

chi2(1) = 2.16
Prob > chi2 = 0.1421

Since Prob > chi2 = 0.1421 > 0.05, we fail to reject the null of constant variance; there is no evidence of heteroskedasticity.

g) The model is statistically significant (F(6, 29) = 19.79, p < 0.001), and the adjusted
R-squared value shows that about 76.31% of the variation in the dependent variable is
explained by the given independent variables.

h) The regression results indicate that, of the included factors, ERNO (average yearly
earnings of other family members), NEIN (average yearly non-earned income), and AGE
(average age of respondent) have statistically significant effects on the average hours
worked during the year, while DEP (average number of dependents) is marginally
significant (p = 0.058) and RATE and ERSP are not significant at the 5% level.

6. Use the data file EARNINGS and, using STATA for analysis, carry out the following tasks.
a. Perform a regression of EARNINGS on S where EARNINGS represents Current hourly
earnings in $ and S represents education (highest grade completed) in number of years of
schooling of the respondent. Interpret the regression results
b. Comment on the value of R2
c. Perform a test on the coefficients of regression. Explain the implications of the result of
the test. Calculate a 95% confidence interval for the slope coefficient
d. Perform an F test for the goodness of fit and comment on the result
e. Regress S on ASVABC and SM where ASVABC is a composite measure of numerical and
verbal ability of the respondent and SM is the years of schooling of the respondent’s
mother. Repeat the regression using SF, the years of schooling of the father, instead of
SM, and again including both as regressors. Do your regression result support the view
that if you educate a male, you educate an individual, while if you educate a female, you
educate a nation?
f. Regress EARNINGS on S and EXP (total out-of-school work experience in years),
interpret the results and perform t tests

Answer

a. The regression of EARNINGS on S, where EARNINGS represents current hourly
earnings in $ and S represents education (highest grade completed) in years of
schooling of the respondent, and its interpretation are presented as follows.

Source SS df MS Number of obs = 540
F(1, 538) = 112.15
Model 19321.5589 1 19321.5589 Prob > F = 0.0000
Residual 92688.6722 538 172.283777 R-squared = 0.1725
Adj R-squared = 0.1710
Total 112010.231 539 207.811189 Root MSE = 13.126

EARNINGS Coef. Std. Err. t P>|t| [95% Conf. Interval]

S 2.455321 .2318512 10.59 0.000 1.999876 2.910765


_cons -13.93347 3.219851 -4.33 0.000 -20.25849 -7.608444

The result of regression analysis shows that the highest grade completed in number of years of
schooling has a positive and significant effect on current hourly earnings.

b. The value of R² indicates that 17.3% of the variation in current hourly earnings is
explained by the highest grade completed in years of schooling. This is rather small:
most of the variation in earnings is left unexplained by schooling alone.
c. The slope coefficient (β = 2.46, p < 0.001) indicates that a one-year increase in
schooling is associated with a $2.46 increase in current hourly earnings; the 95%
confidence interval for the slope is [2.00, 2.91].
d. The F-test (F(1, 538) = 112.15, p < 0.001) indicates that the model fits the data
significantly better than an intercept-only model.
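The confidence interval and the F statistic in parts (c) and (d) can be reproduced from the regression output. A Python sketch, assuming the critical value t(0.975, 538) ≈ 1.9645 from t tables, and using the fact that for a single-regressor model the overall F statistic equals the squared t-ratio:

```python
# Reported slope estimate, standard error, and t-ratio for S
coef, se, t_ratio = 2.455321, 0.2318512, 10.59

# 95% confidence interval: estimate +/- t(0.975, 538) * SE
t_crit = 1.9645  # approximate two-sided 5% critical value for 538 df
lower = coef - t_crit * se
upper = coef + t_crit * se

# With one regressor, the overall F statistic equals the squared t-ratio
f_from_t = t_ratio**2

print(round(lower, 2), round(upper, 2), round(f_from_t, 1))
```

This reproduces the interval [2.00, 2.91] shown in the STATA table and recovers F ≈ 112.15.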

e. The Regression results are presented as follows:


Source SS df MS Number of obs = 540
F(2, 537) = 147.36
Model 1135.67473 2 567.837363 Prob > F = 0.0000
Residual 2069.30861 537 3.85346109 R-squared = 0.3543
Adj R-squared = 0.3519
Total 3204.98333 539 5.94616574 Root MSE = 1.963

S Coef. Std. Err. t P>|t| [95% Conf. Interval]

ASVABC .1328069 .0097389 13.64 0.000 .1136758 .151938


SM .1235071 .0330837 3.73 0.000 .0585178 .1884963
_cons 5.420733 .4930224 10.99 0.000 4.452244 6.389222

Source SS df MS Number of obs = 540
F(2, 537) = 155.49
Model 1175.37867 2 587.689333 Prob > F = 0.0000
Residual 2029.60467 537 3.77952452 R-squared = 0.3667
Adj R-squared = 0.3644
Total 3204.98333 539 5.94616574 Root MSE = 1.9441

S Coef. Std. Err. t P>|t| [95% Conf. Interval]

ASVABC .1285797 .0095914 13.41 0.000 .1097385 .1474209


SF .1289751 .0259437 4.97 0.000 .0780115 .1799387
_cons 5.541335 .4692887 11.81 0.000 4.619468 6.463202

Source SS df MS Number of obs = 540


F(3, 536) = 104.30
Model 1181.36981 3 393.789935 Prob > F = 0.0000
Residual 2023.61353 536 3.77539837 R-squared = 0.3686
Adj R-squared = 0.3651
Total 3204.98333 539 5.94616574 Root MSE = 1.943

S Coef. Std. Err. t P>|t| [95% Conf. Interval]

ASVABC .1257087 .0098533 12.76 0.000 .1063528 .1450646


SM .0492424 .0390901 1.26 0.208 -.027546 .1260309
SF .1076825 .0309522 3.48 0.001 .04688 .1684851
_cons 5.370631 .4882155 11.00 0.000 4.41158 6.329681

The third regression does not support the view that if you educate a male, you educate an
individual, while if you educate a female, you educate a nation: when SM and SF are included
together, the mother's schooling (SM) is not statistically significant (p = 0.208) while the
father's schooling (SF) remains significant (p = 0.001).

f. The regression of EARNINGS on S and EXP (total out-of-school work experience in
years) is presented and interpreted as follows:

Source SS df MS Number of obs = 540


F(2, 537) = 67.54
Model 22513.6473 2 11256.8237 Prob > F = 0.0000
Residual 89496.5838 537 166.660305 R-squared = 0.2010
Adj R-squared = 0.1980
Total 112010.231 539 207.811189 Root MSE = 12.91

EARNINGS Coef. Std. Err. t P>|t| [95% Conf. Interval]

S 2.678125 .2336497 11.46 0.000 2.219146 3.137105


EXP .5624326 .1285136 4.38 0.000 .3099816 .8148837
_cons -26.48501 4.27251 -6.20 0.000 -34.87789 -18.09213

The regression results show that both the highest grade completed (S) and total out-of-school
work experience in years (EXP) have positive and statistically significant effects on current
hourly earnings (t = 11.46 and t = 4.38 respectively, both p < 0.001).
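The t tests requested in part (f) can be reproduced directly from the reported coefficients and standard errors (a minimal Python sketch):

```python
# Reported coefficients and standard errors for S and EXP
coef_s, se_s = 2.678125, 0.2336497
coef_exp, se_exp = 0.5624326, 0.1285136

t_s = coef_s / se_s        # t test of H0: coefficient on S is zero
t_exp = coef_exp / se_exp  # t test of H0: coefficient on EXP is zero

# Both exceed the ~1.96 two-sided 5% critical value for 537 df
print(round(t_s, 2), round(t_exp, 2))
```

This recovers the t ratios 11.46 and 4.38 shown in the STATA output, so both null hypotheses are rejected.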
