
St. Mary’s University Graduate Class

MSc. Development Economics

Econometrics Assignment on STATA

Group Name:

1. Yilkal Astatikie – SGS/0372/2015A


2. Rahel Desalegn – SGS/0253/2014A
3. Wudassie Ayele – SGS/0371/2015A

DATE: 4 February 2023


1. Run simple descriptive statistics of the data and make interpretations

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 expenditure |       119    7.26e+17    7.90e+18          1   8.62e+19
  households |       119    7.184874    2.840286          2         13
      saving |       119    998.4286    220.1074        720       2000
createdemp~y |       119    12.61345    2.404478          7         22

INTERPRETATION: Descriptive statistics of the continuous dependent variable (Y) and the
continuous independent variables (X2, X4 and X10), each with 119 observations

• Dependent variable Y (expenditure): mean 7.26e+17, standard deviation 7.90e+18,
minimum 1 and maximum 8.62e+19. The standard deviation is about ten times the mean and
the range runs from 1 to 8.62e+19, so the distribution is extremely right-skewed.
• Independent variable X2 (household size): mean 7.184874, standard deviation 2.840286,
minimum 2 and maximum 13.
• Independent variable X4 (amount of saving): mean 998.4286, standard deviation 220.1074,
minimum 720 and maximum 2000.
• Independent variable X10 (number of employment created): mean 12.61345, standard
deviation 2.404478, minimum 7 and maximum 22.
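
The skew in expenditure noted above can be inspected more closely with a detailed summary. The following is a minimal sketch, assuming the assignment dataset is already loaded in memory with the variable names used in this report; it is not part of the original submission.

```stata
* Assumption: the assignment dataset is already in memory (for example via -use-)
* Basic descriptive statistics for the continuous variables
summarize expenditure households saving createdemploy

* Percentiles, skewness and kurtosis for the dependent variable
summarize expenditure, detail

* Stored results left behind by -summarize, detail-
display "median = " r(p50) "   skewness = " r(skewness)
```

Comparing the median with the mean is a quick way to judge how strongly a few large values dominate the average.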
. tab1 sex edustat credituse loan loanterm loandisburse bustraining

-> tabulation of sex

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         66       55.46       55.46
          1 |         53       44.54      100.00
------------+-----------------------------------
      Total |        119      100.00

-> tabulation of edustat

    EduStat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         96       80.67       80.67
          1 |         23       19.33      100.00
------------+-----------------------------------
      Total |        119      100.00

-> tabulation of credituse

  Credituse |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         64       53.78       53.78
          1 |         55       46.22      100.00
------------+-----------------------------------
      Total |        119      100.00

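-> tabulation of loan

       loan |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         48       40.34       40.34
          1 |         71       59.66      100.00
------------+-----------------------------------
      Total |        119      100.00

-> tabulation of loanterm

   Loanterm |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         40       33.61       33.61
          1 |         79       66.39      100.00
------------+-----------------------------------
      Total |        119      100.00

-> tabulation of loandisburse

loandisburs |
          e |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         43       36.13       36.13
          1 |         76       63.87      100.00
------------+-----------------------------------
      Total |        119      100.00

-> tabulation of bustraining

Bustraining |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         34       28.57       28.57
          1 |         85       71.43      100.00
------------+-----------------------------------
      Total |        119      100.00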

INTERPRETATION: Binary variables (119 observations in total)

• Sex: 66 respondents (55.46%) are female and 53 (44.54%) are male.
• Education status: 96 respondents (80.67%) are illiterate and 23 (19.33%) are literate.
• Use of credit: 64 respondents (53.78%) answered no and 55 (46.22%) answered yes.
• Loan: 48 respondents (40.34%) answered no and 71 (59.66%) answered yes.
• Loan term: 40 respondents (33.61%) answered no and 79 (66.39%) answered yes.
• Loan disbursement: 43 respondents (36.13%) answered no and 76 (63.87%) answered yes.
• Business training: 34 respondents (28.57%) answered no and 85 (71.43%) answered yes.
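
Attaching value labels makes the tabulations readable without a separate key. The sketch below assumes the coding implied by the interpretation above (0 = female, illiterate or no; 1 = male, literate or yes) and that the dataset is in memory; the label names `sexlbl`, `litlbl` and `yesno` are hypothetical.

```stata
* Assumption: 0/1 coding follows the interpretation given in this report
* Hypothetical label names: sexlbl, litlbl, yesno
label define sexlbl 0 "Female" 1 "Male"
label define litlbl 0 "Illiterate" 1 "Literate"
label define yesno  0 "No" 1 "Yes"

* Attach the labels to the binary variables
label values sex sexlbl
label values edustat litlbl
label values credituse loan loanterm loandisburse bustraining yesno

* One-way tabulations now show the labels instead of 0 and 1
tab1 sex edustat credituse loan loanterm loandisburse bustraining
```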

2. Perform pairwise correlation coefficients between Y and the other continuous variables only
and make interpretations
. pwcorr expenditure households saving createdemploy

             | expend~e househ~s   saving create~y
-------------+------------------------------------
 expenditure |   1.0000
  households |  -0.1038   1.0000
      saving |   0.1485   0.0548   1.0000
createdemp~y |  -0.0237  -0.0143  -0.4318   1.0000

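INTERPRETATION: Expenditure is only weakly correlated with the other continuous variables:
negatively with household size (-0.1038), positively with saving (0.1485) and close to zero
with employment created (-0.0237). Among the independent variables, the largest correlation
in absolute value is -0.4318 (saving and employment created). All values are below 0.8, so
there is no multicollinearity problem among the independent variables.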
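
Whether these correlations differ from zero can be checked by asking `pwcorr` for significance levels. This is a minimal sketch, assuming the dataset is in memory; the 5% cut-off for the star is an illustrative choice, not one stated in the assignment.

```stata
* Pairwise correlations with the number of observations and p-values
* obs  : print the number of observations used for each pair
* sig  : print the significance level under each coefficient
* star : mark coefficients significant at the 5% level (illustrative cut-off)
pwcorr expenditure households saving createdemploy, obs sig star(0.05)
```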

3. Normalize dependent variable (Y) by changing into logarithmic (LogY)

. gen lnexpenditure = log(expenditure)
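
In Stata, `log()` and `ln()` are synonyms and both return the natural logarithm. The sketch below, which assumes `lnexpenditure` has been generated as above, shows how far the transformation compresses the observed range and how the result could be inspected.

```stata
* Natural log of the smallest and largest observed expenditure values
* (1 and 8.62e+19 are taken from the descriptive statistics in Question 1)
display log(1)
display log(8.62e+19)

* Assumption: lnexpenditure already exists in the dataset in memory
summarize lnexpenditure, detail

* Histogram of the transformed variable with a normal curve overlaid
histogram lnexpenditure, normal
```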

4. Run Multiple Linear Regression Model using normalized dependent variable (LogY). Do you
think the model should be interpreted? Explain your reason.
. reg lnexpenditure sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy

      Source |       SS       df       MS              Number of obs =     119
-------------+------------------------------           F( 10,   108) =   25.95
       Model |  7711.25276    10  771.125276           Prob > F      =  0.0000
    Residual |  3208.99766   108  29.7129413           R-squared     =  0.7061
-------------+------------------------------           Adj R-squared =  0.6789
       Total |  10920.2504   118   92.544495           Root MSE      =   5.451

------------------------------------------------------------------------------
lnexpendit~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |  -.4349056   1.147066    -0.38   0.705    -2.708589    1.838777
  households |   .2236451   .1803806     1.24   0.218    -.1339007    .5811908
     edustat |   3.874306   1.568052     2.47   0.015     .7661544    6.982458
      saving |   .0128518   .0032202     3.99   0.000     .0064688    .0192349
   credituse |   4.720044   1.213964     3.89   0.000     2.313756    7.126331
        loan |   2.703437   1.184458     2.28   0.024     .3556361    5.051237
    loanterm |   1.460195   1.275371     1.14   0.255    -1.067811    3.988201
loandisburse |     3.9192   1.311975     2.99   0.003     1.318639    6.519762
 bustraining |   2.788077   1.254224     2.22   0.028     .3019872    5.274167
createdemp~y |  -.1121978   .2568968    -0.44   0.663    -.6214119    .3970163
       _cons |  -9.924487    5.27352    -1.88   0.063    -20.37752     .528544
------------------------------------------------------------------------------

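REMARK: Yes. The overall F-test (F(10, 108) = 25.95, Prob > F = 0.0000) is significant at the
1% level, so the explanatory variables are jointly significant and the model can be interpreted.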

5. Comment on the overall goodness of fit of the model?


• The overall F-test has Prob > F = 0.0000 < 0.01, so the model is jointly significant at
the 1% level.
• The R-squared is 0.7061, so about 70.6% of the variation in the log of business
expenditure per annum is explained by the independent variables in the model. The
adjusted R-squared, which penalises the number of regressors, is 0.6789.
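
The goodness-of-fit figures can be reproduced by hand from the ANOVA block of the regression output. The sketch below only uses numbers copied from that output, so it runs without the dataset.

```stata
* R-squared = Model SS / Total SS
display "R-squared     = " %6.4f 7711.25276/10920.2504

* Adjusted R-squared = 1 - (Residual MS / Total MS)
display "Adj R-squared = " %6.4f 1 - (3208.99766/108)/(10920.2504/118)

* F statistic = Model MS / Residual MS, with 10 and 108 degrees of freedom
display "F(10, 108)    = " %6.2f 771.125276/29.7129413

* Upper-tail probability of the F distribution at the reported F value
display "Prob > F      = " %6.4f Ftail(10, 108, 25.95)
```

After running `regress`, the same quantities are also available as the stored results `e(r2)`, `e(r2_a)` and `e(F)`.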

6. Test the degree of multicollinearity among the explanatory variables in the data
using the formal test (interpret the result). If there is any problem, take the required
remedial measure.
. vif

    Variable |       VIF       1/VIF
-------------+----------------------
      saving |      2.00    0.501216
loandisburse |      1.59    0.628579
     edustat |      1.54    0.651287
createdemp~y |      1.52    0.659940
   credituse |      1.47    0.681614
    loanterm |      1.45    0.687912
        loan |      1.35    0.739527
         sex |      1.30    0.768239
 bustraining |      1.29    0.777758
  households |      1.04    0.959311
-------------+----------------------
    Mean VIF |      1.45

. pwcorr sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy

             |      sex househ~s  edustat   saving credit~e     loan loanterm
-------------+---------------------------------------------------------------
         sex |   1.0000
  households |   0.0550   1.0000
     edustat |  -0.0533  -0.0696   1.0000
      saving |  -0.0591   0.0548   0.5470   1.0000
   credituse |   0.1867   0.0526   0.3573   0.4363   1.0000
        loan |  -0.1938  -0.1158   0.2723   0.3320   0.1438   1.0000
    loanterm |   0.2439   0.0214   0.3032   0.3787   0.3028   0.1764   1.0000
loandisburse |   0.2869   0.0121   0.2353   0.4155   0.4166   0.2373   0.4276
 bustraining |   0.0053   0.0874   0.1682   0.3312   0.2878   0.0488   0.2588
createdemp~y |   0.0387  -0.0143  -0.2854  -0.4318  -0.1953  -0.3760  -0.3526

             | loandi~e bustra~g create~y
-------------+---------------------------
loandisburse |   1.0000
 bustraining |   0.2213   1.0000
createdemp~y |  -0.3114  -0.3585   1.0000

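• VIF INTERPRETATION: All the VIF values are well below the rule-of-thumb threshold of 10
(the largest is 2.00 for saving and the mean VIF is 1.45), thus there is no serious
multicollinearity problem.
• PAIRWISE CORRELATION: All the values are less than 0.8 in absolute value (the largest is
0.5470, between edustat and saving), thus there is no serious multicollinearity problem.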
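
The VIF of a regressor is the reciprocal of its tolerance (the 1/VIF column), which in turn equals one minus the R-squared from regressing that variable on all the other regressors. The sketch below illustrates this for saving; the second part assumes the dataset is in memory.

```stata
* VIF from the reported tolerance of saving (number copied from the table above)
display "VIF for saving = " %5.2f 1/0.501216

* Same quantity from the auxiliary regression of saving on the other regressors
* Note: this replaces the active estimation results, so re-run the main model afterwards
quietly regress saving sex households edustat credituse loan loanterm loandisburse bustraining createdemploy
display "VIF for saving = " %5.2f 1/(1 - e(r2))
```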

7. Test whether there is a heteroskedasticity problem in the data using a formal test
(interpret the result). If there is any problem, take the required remedial measure.
. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lnexpenditure

         chi2(1)      =    30.11
         Prob > chi2  =   0.0000

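• INTERPRETATION: Since the p-value (0.0000) is below 0.01, the null hypothesis of
homoskedasticity (constant variance) is rejected at the 1% level, so the model has a
heteroskedasticity problem. As a remedial measure, the model is re-estimated with
heteroskedasticity-robust standard errors: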
. reg lnexpenditure sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy, robust

Linear regression                                      Number of obs =     119
                                                       F( 10,   108) =   26.99
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7061
                                                       Root MSE      =   5.451

------------------------------------------------------------------------------
             |               Robust
lnexpendit~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |  -.4349056    1.13029    -0.38   0.701    -2.675337    1.805526
  households |   .2236451   .1636435     1.37   0.175    -.1007248    .5480149
     edustat |   3.874306   1.687719     2.30   0.024     .5289542    7.219658
      saving |   .0128518   .0065734     1.96   0.053    -.0001777    .0258814
   credituse |   4.720044   1.090922     4.33   0.000     2.557648     6.88244
        loan |   2.703437   1.005023     2.69   0.008     .7113064    4.695567
    loanterm |   1.460195    1.06451     1.37   0.173    -.6498494     3.57024
loandisburse |     3.9192   1.092791     3.59   0.001     1.753098    6.085302
 bustraining |   2.788077   1.120728     2.49   0.014     .5666005    5.009553
createdemp~y |  -.1121978   .2614451    -0.43   0.669    -.6304273    .4060317
       _cons |  -9.924487   6.725476    -1.48   0.143    -23.25555    3.406574
------------------------------------------------------------------------------
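
The test-then-correct sequence used in this question can be collected into a short do-file. This is a sketch under the assumption that the dataset is in memory; the local macro name `xvars` is hypothetical, and White's test is added only as a possible cross-check that the report itself does not run.

```stata
* Hypothetical macro holding the regressor list used throughout the report
local xvars sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy

* Ordinary least squares, then the Breusch-Pagan / Cook-Weisberg test
quietly regress lnexpenditure `xvars'
estat hettest

* Optional cross-check: White's general test for heteroskedasticity
estat imtest, white

* Remedy: same coefficients, heteroskedasticity-robust standard errors
regress lnexpenditure `xvars', vce(robust)
```

The `vce(robust)` option is the current spelling of the `robust` option used above; both request the same standard errors. Because the macro is local, the lines need to be run together as one do-file.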

8. Test for non-normality of the error terms in the data using a graphical test (interpret the
result). What should be done if there is a problem?
[Figure: Kernel density estimate of the residuals (x-axis: Residuals, from -30 to 20; y-axis:
Density, from 0 to .08), plotted together with a normal density.
kernel = epanechnikov, bandwidth = 1.5966]

• INTERPRETATION: The kernel density of the regression residuals closely follows the
overlaid normal curve. This suggests that the normality assumption is not seriously
violated and that the residuals contain no extreme outliers.
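
A plot like the one described above can be produced from the fitted model's residuals. The sketch below assumes the dataset is in memory; the variable name `resid` and the macro name `xvars` are hypothetical, and the formal tests at the end are an optional complement to the graphical check asked for in the question.

```stata
* Hypothetical macro holding the regressor list used throughout the report
local xvars sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy

* Fit the model and store the residuals in a new variable (hypothetical name: resid)
quietly regress lnexpenditure `xvars'
predict double resid, residuals

* Graphical checks: kernel density with a normal overlay, and a normal quantile plot
kdensity resid, normal
qnorm resid

* Optional formal checks: Shapiro-Wilk and skewness/kurtosis tests
swilk resid
sktest resid
```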

9. Test for model misspecification bias in the data using a formal test (interpret the
result). What should be done if there is a problem?
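. linktest

      Source |       SS       df       MS              Number of obs =     119
-------------+------------------------------           F(  2,   116) =  139.60
       Model |  7714.87839     2   3857.4392           Prob > F      =  0.0000
    Residual |  3205.37202   116  27.6325174           R-squared     =  0.7065
-------------+------------------------------           Adj R-squared =  0.7014
       Total |  10920.2504   118   92.544495           Root MSE      =  5.2567

------------------------------------------------------------------------------
lnexpendit~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   .9286563   .2058571     4.51   0.000     .5209304    1.336382
      _hatsq |   .0022936   .0063319     0.36   0.718    -.0102476    .0148347
       _cons |    .390188   1.410685     0.28   0.783    -2.403852    3.184228
------------------------------------------------------------------------------

. ovtest

Ramsey RESET test using powers of the fitted values of lnexpenditure
       Ho:  model has no omitted variables
                 F(3, 105) =      6.71
                  Prob > F =      0.0003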

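• INTERPRETATION: In the linktest, _hat is significant at the 1% level (p = 0.000), as
expected, while _hatsq is not significant (p = 0.718), so the linktest on its own does not
indicate misspecification. The Ramsey RESET test (ovtest), however, rejects the null
hypothesis of no omitted variables at the 1% level (F(3, 105) = 6.71, p = 0.0003). Based on
the RESET result, the model may suffer from misspecification through omitted relevant
variables or an incorrect functional form.
• SOLUTION: Add the omitted relevant variables to the regression model, or reconsider its
functional form.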
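
The idea behind the RESET test can be made explicit by adding powers of the fitted values to the model and testing them jointly. The sketch below assumes the dataset is in memory; the names `xvars`, `yhat`, `yhat2`, `yhat3` and `yhat4` are hypothetical. It is an illustration of the principle rather than a reproduction of Stata's internal routine, so small numerical differences from `ovtest` are possible.

```stata
* Hypothetical macro holding the regressor list used throughout the report
local xvars sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy

* Fitted values from the original model
quietly regress lnexpenditure `xvars'
predict double yhat, xb

* Second, third and fourth powers of the fitted values
generate double yhat2 = yhat^2
generate double yhat3 = yhat^3
generate double yhat4 = yhat^4

* Augmented regression and joint F-test on the added powers
* (3 restrictions and 119 - 11 - 3 = 105 residual degrees of freedom)
quietly regress lnexpenditure `xvars' yhat2 yhat3 yhat4
test yhat2 yhat3 yhat4
```

A significant joint test says that nonlinear combinations of the regressors still explain the dependent variable, which points to an omitted variable or a functional-form problem.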

10. Explain the final model result after the required diagnostic tests and corrections, and
interpret the coefficients of the model.

. reg lnexpenditure sex households edustat saving credituse loan loanterm loandisburse bustraining createdemploy, robust

Linear regression                                      Number of obs =     119
                                                       F( 10,   108) =   26.99
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7061
                                                       Root MSE      =   5.451

------------------------------------------------------------------------------
             |               Robust
lnexpendit~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |  -.4349056    1.13029    -0.38   0.701    -2.675337    1.805526
  households |   .2236451   .1636435     1.37   0.175    -.1007248    .5480149
     edustat |   3.874306   1.687719     2.30   0.024     .5289542    7.219658
      saving |   .0128518   .0065734     1.96   0.053    -.0001777    .0258814
   credituse |   4.720044   1.090922     4.33   0.000     2.557648     6.88244
        loan |   2.703437   1.005023     2.69   0.008     .7113064    4.695567
    loanterm |   1.460195    1.06451     1.37   0.173    -.6498494     3.57024
loandisburse |     3.9192   1.092791     3.59   0.001     1.753098    6.085302
 bustraining |   2.788077   1.120728     2.49   0.014     .5666005    5.009553
createdemp~y |  -.1121978   .2614451    -0.43   0.669    -.6304273    .4060317
       _cons |  -9.924487   6.725476    -1.48   0.143    -23.25555    3.406574
------------------------------------------------------------------------------

ln(Yi) = -9.92 - 0.43X1 + 0.22X2 + 3.87X3 + 0.013X4 + 4.72X5 + 2.70X6 + 1.46X7 + 3.92X8 +
2.79X9 - 0.11X10

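• INTERPRETATION: With robust standard errors, credituse (p = 0.000), loandisburse
(p = 0.001) and loan (p = 0.008) are significant at the 1% level, bustraining (p = 0.014)
and edustat (p = 0.024) at the 5% level, and saving (p = 0.053) at the 10% level. Sex,
households, loanterm and createdemploy are not statistically significant. Because the
dependent variable is in logs, the coefficient of saving is read as a percentage effect:
100*0.0128518 ≈ 1.29, so as saving increases by 1 birr per month, expenditure of business
increases by about 1.29%, holding the other variables constant.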
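
In a model with a logged dependent variable, 100 times the coefficient is an approximation to the percentage effect, and 100*(exp(b) - 1) is the exact figure. The sketch below uses coefficients copied from the robust regression above, so it runs without the dataset. For large coefficients on 0/1 variables the two figures can differ substantially.

```stata
* saving: approximate and exact percentage effect of a one-unit increase
display "saving, approximate: " %8.2f 100*0.0128518 " percent"
display "saving, exact:       " %8.2f 100*(exp(0.0128518) - 1) " percent"

* loan (0/1 variable): exact percentage difference between loan = 1 and loan = 0
display "loan, exact:         " %8.2f 100*(exp(2.703437) - 1) " percent"
```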
