
Basic Data Handling [1]

The basic data handling covered in these notes concerns the preliminary use of MS Excel and EViews [2].

Regression in Excel

Open Excel and go to:

Data / Data Analysis [3] / Regression / OK
Input Y Range: H2:H26
Input X Range: M2:M26

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.108631
R Square             0.011801
Adjusted R Square   -0.02773
Standard Error       306.2715
Observations         27

ANOVA
             df   SS        MS         F          Significance F
Regression   1    28003.8   28003.8    0.298541   0.589642
Residual     25   2345056   93802.25
Total        26   2373060

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   255.7005       172.3374         1.48372    0.150382   -99.2351    610.6361    -99.2351      610.6361
X2          4.995999       9.14367          0.546389   0.589642   -13.8357    23.82774    -13.8357      23.82774

[1] Lecture notes by Dr. Muhammad Atiq ur Rehman, Assistant Professor of Economics, Higher Education Department Punjab & Adjunct Faculty of Economics and Management Sciences, COMSATS University Islamabad/ Lahore campus.
[2] Based on Chapters 1 and 2 of the book 'Economic and Financial Modelling with EViews' by Aljandali and Tatahi.
[3] Use Add-ins if the Data Analysis option is not found.

The intercept (C) is 255.7005. It shows that when the independent variable X2 is equal to zero, the predicted value of Y is 255.7005.

The coefficient of X2 is 4.995999. It shows that a one-unit increase in X2 is associated with an increase in Y of about 5 units, and a one-unit decrease in X2 with a decrease in Y of about 5 units.
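
As a small illustration of this reading, the estimated line can be written as a function. This is only a sketch: the coefficients come from the Excel output above, while the function name `predict_y` and the X2 values 10 and 11 are hypothetical choices made for the example.

```python
# Fitted line from the Excel output: Y_hat = intercept + slope * X2
INTERCEPT = 255.7005   # "Intercept" coefficient from the output
SLOPE = 4.995999       # "X2" coefficient from the output


def predict_y(x2: float) -> float:
    """Return the predicted Y for a given X2 using the estimated line."""
    return INTERCEPT + SLOPE * x2


# Hypothetical X2 values, chosen only to show the one-unit effect.
y_at_10 = predict_y(10.0)
y_at_11 = predict_y(11.0)

print("Predicted Y at X2 = 0 :", predict_y(0.0))
print("Predicted Y at X2 = 10:", round(y_at_10, 4))
print("Predicted Y at X2 = 11:", round(y_at_11, 4))
print("Change for one extra unit of X2:", round(y_at_11 - y_at_10, 6))
```

The difference between the two predictions is the slope coefficient itself, whatever starting value of X2 is used.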

Significance

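C is the constant, which is statistically insignificant because its calculated t value is 1.48372, which is less than 2 (as a rule of thumb). Its P-value of 0.150 is also above 0.05.

The X2 coefficient is also statistically insignificant because its calculated t value is 0.5464, which is less than 2. Its P-value of 0.590 is also above 0.05.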
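
The t values and confidence limits in the output can be reproduced from the coefficients and standard errors. The sketch below assumes a two-sided 5% critical value of about 2.0595 for 25 degrees of freedom, taken from standard t tables rather than from the notes; the variable names are illustrative only.

```python
# Coefficient and standard error pairs copied from the Excel output.
coefficients = {
    "Intercept": (255.7005, 172.3374),
    "X2": (4.995999, 9.14367),
}

# Assumption: two-sided 5% critical value of t with 25 degrees of freedom.
T_CRITICAL = 2.0595

t_stats = {}
intervals = {}
for name, (coef, std_error) in coefficients.items():
    # t statistic = coefficient divided by its standard error
    t_stats[name] = coef / std_error
    # 95% confidence interval = coefficient +/- critical value * standard error
    margin = T_CRITICAL * std_error
    intervals[name] = (coef - margin, coef + margin)
    verdict = "significant" if abs(t_stats[name]) > 2 else "insignificant"
    print(name, round(t_stats[name], 5), verdict,
          tuple(round(limit, 3) for limit in intervals[name]))
```

Both confidence intervals contain zero, which is another way of seeing that neither coefficient is statistically significant at the 5% level.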

R-squared

The R-squared value is very low, roughly 0.012. We can say that about 1.2% of the variation in the dependent variable is explained by the independent variable.

Adjusted R-squared

Adjusted R-squared may be negative. Here the value of Adjusted R-squared is found to be negative: it is roughly equal to -0.028 (that is, -2.8%). It is the value of R-squared adjusted for the degrees of freedom N-K.
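
Both measures can be recomputed from the ANOVA table. The following is a minimal sketch using the standard formulas R-squared = SS(Regression)/SS(Total) and Adjusted R-squared = 1 - (1 - R-squared)(N-1)/(N-K), with N = 27 observations and K = 2 estimated parameters (intercept and slope); the variable names are chosen for the example.

```python
# Values copied from the ANOVA table of the Excel output.
ss_regression = 28003.8
ss_total = 2373060.0
n = 27   # number of observations
k = 2    # estimated parameters: intercept and the X2 slope

# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Penalise R-squared for the degrees of freedom used by the model.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

print("R-squared         :", round(r_squared, 6))
print("Adjusted R-squared:", round(adj_r_squared, 5))
```

Because R-squared is so close to zero here, the degrees-of-freedom penalty pushes the adjusted value below zero.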

Some Additional Interpretations about ANOVA (Analysis of Variance)

SS stands for Sum of Squares. It is the sum of the squared deviations from the mean.

df stands for degrees of freedom. With N = 27 observations and K = 2 estimated parameters, the Regression row has K-1 = 1, the Residual row has N-K = 25 and the Total row has N-1 = 26 degrees of freedom.

MS stands for Mean Square. It is a kind of "average variation" and is found by dividing the Sum of Squares by the degrees of freedom (SS/df). Recall that a variance is the variation divided by the degrees of freedom. So each number in the MS column is found by dividing the number in the SS column by the number in the df column, and the result is a variance.



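The F column is found by dividing the two numbers in the MS column (Regression MS divided by Residual MS). The value of F is 0.2985, which is less than 5, the rule of thumb used in these notes. The Significance F of 0.5896 is also well above 0.05. So the model is overall not a good fit.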
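
The MS and F entries can be checked directly from the SS and df columns. This is a sketch of that arithmetic with the numbers from the ANOVA table; the last lines rely on the standard result that, with a single regressor, F equals the square of the slope's t statistic.

```python
# SS and df values copied from the ANOVA table.
ss_regression, df_regression = 28003.8, 1
ss_residual, df_residual = 2345056.0, 25

# Mean square = sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = explained variance per df divided by unexplained variance per df.
f_stat = ms_regression / ms_residual

# With one regressor, F should equal the squared t statistic of the slope.
t_slope = 0.546389

print("MS Regression:", ms_regression)
print("MS Residual  :", round(ms_residual, 2))
print("F            :", round(f_stat, 6))
print("t squared    :", round(t_slope ** 2, 6))
```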

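Between-group variation: the total variation between each group mean and the overall mean. In the regression table this role is played by the Regression row (the variation explained by the model).

Within-group variation: the total variation of the individual values in each group around their group mean. In the regression table this role is played by the Residual row (the unexplained variation).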

Correlation in Excel

Data / Data Analysis / Correlation

Select range P1:Q28
Names in first row

        X2          X3
X2      1
X3      -0.08866    1

The value r = -0.08866 is the coefficient of correlation between X2 and X3. It indicates a very weak negative linear relationship.
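
For readers who want to see what the Correlation tool computes, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The data lists `x2_demo` and `x3_demo` are hypothetical, since the worksheet values are not reproduced in these notes, so the result will not equal -0.08866.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x2_demo = [10.0, 12.0, 15.0, 17.0, 20.0]
x3_demo = [3.0, -1.0, 4.0, 0.0, 2.0]

r_demo = pearson_r(x2_demo, x3_demo)
print("r =", round(r_demo, 4))
```

The coefficient always lies between -1 and 1; values near zero, such as the -0.08866 reported above, indicate almost no linear association.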

Summary Stat. in Excel

Data / Data Analysis / Descriptive Statistics

Select range P1:Q28
Names in first row
Summary statistics

Statistic            Y          X2         X3
Mean                 323.1923   17.59231   3.423077
Standard Error       56.34629   1.307986   1.469352
Median               301        16.6       3
Mode                 #N/A       16         9
Standard Deviation   287.3108   6.669448   7.492252
Sample Variance      82547.52   44.48154   56.13385
Kurtosis             -1.31002   16.01904   5.374519
Skewness             0.399664   3.570566   1.603726
Range                831        37         37.5
Minimum              13         10         -7.5
Maximum              844        47         30
Sum                  8403       457.4      89
Count                26         26         26

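
Several rows of this table are linked by simple formulas, which gives a quick consistency check. The sketch below uses the Y column only and the standard definitions (mean = sum/count, standard error = standard deviation/sqrt(count), variance = standard deviation squared, range = maximum - minimum); the variable names are illustrative.

```python
import math

# Values for Y copied from the Excel descriptive statistics table.
y_sum, y_count = 8403.0, 26
y_std_dev = 287.3108
y_min, y_max = 13.0, 844.0

mean = y_sum / y_count                       # arithmetic mean
std_error = y_std_dev / math.sqrt(y_count)   # standard error of the mean
variance = y_std_dev ** 2                    # sample variance
value_range = y_max - y_min                  # spread between the extremes

print("Mean          :", round(mean, 4))
print("Standard Error:", round(std_error, 5))
print("Variance      :", round(variance, 2))
print("Range         :", value_range)
```

Small differences in the last digits are expected because the inputs are the rounded figures shown in the table.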

Graphs in Excel
Select the variables and their range with the mouse.
Insert a 2-D line graph or a 3-D line graph.

Line Graph
[Figure: line graph of the series P, W and EDU for observations 1 to 13]

Column Chart
[Figure: column chart of the series P, W and EDU for observations 1 to 10]

Bar Chart
[Figure: bar chart of the series EDU, W and P for observations 1 to 10]

Pivot Chart
[Figure: pivot chart showing Sum of P, Sum of W and Sum of Y (Total)]

Regression in Eviews

File / Open / Foreign Data as Workfile / select the Excel file from the computer (Book 1) / Open / Next / Finish / No

Quick / Estimate Equation (alternatively, Object / New Object / Equation)

A dialog box appears. Write the equation specification in the box:

Y C X2 X3

and click OK.

You will get the following EViews output:

Dependent Variable: Y
Method: Least Squares
Date: 04/25/21 Time: 01:12
Sample: 2010 2019
Included observations: 10

Variable    Coefficient   Std. Error   t-Statistic   Prob.

C            8.646870     2.361273      3.661952     0.0080
X2          -0.480894     0.174926     -2.749136     0.0285
X3           0.105595     0.255812      0.412783     0.6921

R-squared            0.524559    Mean dependent var      4.400000
Adjusted R-squared   0.388719    S.D. dependent var      2.875181
S.E. of regression   2.247944    Akaike info criterion   4.701234
Sum squared resid    35.37277    Schwarz criterion       4.792010
Log likelihood       -20.50617   Hannan-Quinn criter.    4.601654
F-statistic          3.861594    Durbin-Watson stat      1.310419
Prob(F-statistic)    0.074103

To understand the empirical findings, let's interpret the multiple regression results.

C is the intercept. It shows that when all independent variables are equal to zero, the predicted value of Y is 8.647.

The coefficient of X2 is -0.4809. It shows that a one-unit increase in X2 is associated with a decrease in Y of 0.4809 units, holding X3 constant, and vice versa.

The coefficient of X3 is 0.1056. It shows that a one-unit increase in X3 is associated with an increase in Y of 0.1056 units, holding X2 constant, and vice versa.

Significance

C is the constant or intercept, which is statistically significant because its calculated t value (3.66) is greater than 2. The P value is 0.0080, which also shows significance at the 1% level.

The X2 coefficient is statistically significant because its calculated t value (-2.75) is greater than 2 in absolute terms. The P value is 0.0285, which shows significance at the 5% level.

The X3 coefficient is statistically insignificant because its calculated t value (0.41) is not greater than 2. The P value is 0.6921, which shows that the variable is insignificant at the 1%, 5% and 10% levels.

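
The P-value rule used above can be written as a small helper. This is only a sketch of the decision rule with the three conventional levels (1%, 5%, 10%); the function name `significance_level` is hypothetical and the P values are those reported by EViews.

```python
def significance_level(p_value):
    """Return the strictest conventional level at which p_value is significant.

    Returns 0.01, 0.05 or 0.10, or None if the coefficient is insignificant
    even at the 10% level.
    """
    for level in (0.01, 0.05, 0.10):
        if p_value < level:
            return level
    return None


# P values copied from the EViews output.
p_values = {"C": 0.0080, "X2": 0.0285, "X3": 0.6921}

levels = {name: significance_level(p) for name, p in p_values.items()}
for name, level in levels.items():
    if level is None:
        print(name, "is insignificant at the 1%, 5% and 10% levels")
    else:
        print(name, "is significant at the", format(level, ".0%"), "level")
```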
R-squared: about 52% of the variation in the dependent variable is explained by the independent variables.

Adjusted R-squared: about 39% of the variation in the dependent variable is explained by the independent variables when the estimate is adjusted for the degrees of freedom N-K (10-3 = 7).

F test

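The value of F is 3.86, which is less than 5, so by the rule of thumb used in these notes the model is overall not a good fit.

Similarly, the probability of F (0.0741) is greater than 0.05, which shows insignificance at the 5% level, although the F statistic is significant at the 10% level. It implies that the model is overall not a good fit.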
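
The summary statistics in the EViews output are linked by standard formulas, which can be verified with a few lines of arithmetic. The sketch below assumes N = 10 observations and K = 3 estimated parameters (C, X2, X3) and recomputes the adjusted R-squared, the F-statistic and the S.E. of regression from the reported R-squared and sum of squared residuals.

```python
import math

# Values copied from the EViews output.
n, k = 10, 3                # observations and estimated parameters
r_squared = 0.524559
sum_squared_resid = 35.37277

# Adjusted R-squared penalises R-squared for the K parameters used.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

# F compares explained variation per regressor with unexplained variation per df.
f_stat = (r_squared / (k - 1)) / ((1 - r_squared) / (n - k))

# S.E. of regression is the square root of the residual variance.
se_regression = math.sqrt(sum_squared_resid / (n - k))

print("Adjusted R-squared:", round(adj_r_squared, 6))
print("F-statistic       :", round(f_stat, 4))
print("S.E. of regression:", round(se_regression, 6))
```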

Durbin-Watson (DW) test

The DW value is 1.31, which is less than 2. This points towards positive autocorrelation in the residuals.

Note: a DW value close to 2 means no autocorrelation. A DW value well below 2 means positive autocorrelation. A DW value well above 2 means negative autocorrelation.
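
To see where these rules come from, the DW statistic can be computed as the sum of squared changes in successive residuals divided by the sum of squared residuals. This is a minimal sketch: the residual lists are hypothetical, and the last lines use the common approximation DW ≈ 2(1 - rho) to back out the first-order autocorrelation implied by the reported value of 1.310419.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Hypothetical residuals that change slowly over time (positive autocorrelation).
demo_positive = [1.0, 0.8, 0.9, -0.2, -0.7, -0.9, -0.4, 0.3, 0.6, 0.5]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
demo_negative = [1.0, -1.0, 1.0, -1.0]

print("DW, slowly changing residuals:", round(durbin_watson(demo_positive), 3))
print("DW, alternating residuals    :", durbin_watson(demo_negative))

# Approximate autocorrelation implied by the EViews value, using DW ~ 2(1 - rho).
reported_dw = 1.310419
implied_rho = 1 - reported_dw / 2
print("Implied rho for DW = 1.31    :", round(implied_rho, 4))
```

A formal test would compare the statistic with the tabulated lower and upper DW bounds for the given sample size and number of regressors; the rule of "about 2" is only a quick guide.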

Obtaining Summary statistics:


To get summary statistics: Quick / Group Statistics / Descriptive Statistics / Common sample
Enter the series names as: Y X2 X3

               Y          X2         X3
Mean           344.1852   17.71111   2.981481
Median         338.0000   16.80000   2.500000
Maximum        890.0000   47.00000   30.00000
Minimum        13.00000   10.00000   -8.500000
Std. Dev.      302.1118   6.569003   7.696754
Skewness       0.356969   3.329201   1.367863
Kurtosis       1.703821   15.95334   6.712965

Jarque-Bera    2.463512   238.6387   23.92910
Probability    0.291780   0.000000   0.000006

Sum            9293.000   478.2000   80.50000
Sum Sq. Dev.   2373060.   1121.947   1540.241

Observations   27         27         27

The standard set of descriptive statistics is displayed. All of the statistics are calculated
using the observations in the current sample.
• Mean is the average value of the series, obtained by adding up the series and dividing by the
number of observations.

• Median is the middle value (or average of the two middle values) of the series when the
values are ordered from the smallest to the largest. The median is a robust measure of the center
of the distribution that is less sensitive to outliers than the mean.

• Max and Min are the maximum and minimum values of the series in the current sample.

• Std. Dev. (standard deviation) is a measure of dispersion or spread in the series.

• Skewness is a measure of asymmetry of the distribution of the series around its mean.
The skewness of a symmetric distribution, such as the normal distribution, is zero. Positive
skewness means that the distribution has a long right tail and negative skewness implies that
the distribution has a long left tail.

• Kurtosis measures the peakedness or flatness of the distribution of the series.
The kurtosis of the normal distribution is 3. If the kurtosis exceeds 3, the distribution is peaked
(leptokurtic) relative to the normal; if the kurtosis is less than 3, the distribution is flat
(platykurtic) relative to the normal.

Jarque-Bera is a test statistic for testing whether the series is normally distributed.
The test statistic measures the difference of the skewness and kurtosis of the series with those
from the normal distribution.

Null hypothesis, H0: normal distribution

Alternative hypothesis, H1: non-normal distribution

If the probability of the JB test is less than 0.05, we reject H0, which means a non-normal distribution.
If the probability of the JB test is greater than 0.05, we do not reject H0, so the series is consistent with a normal distribution.
The normal distribution is a bell-shaped distribution. In our results the JB probability for Y is 0.291780, so
we do not reject H0 and the distribution of Y can be treated as normal. The JB probability is 0.000000 for X2 and
0.000006 for X3, which means we reject H0 in both cases: the distributions of X2 and X3 are non-normal.
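
The JB figures in the table can be reproduced from the reported skewness and kurtosis. The sketch below uses the standard formula JB = (N/6)(S^2 + (K-3)^2/4) and the fact that, under H0, JB follows a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). The function name `jarque_bera` is an illustrative choice.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Return the JB statistic and its p-value (chi-square with 2 df)."""
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Skewness and kurtosis copied from the EViews descriptive statistics (N = 27).
series = {
    "Y": (0.356969, 1.703821),
    "X2": (3.329201, 15.95334),
    "X3": (1.367863, 6.712965),
}

results = {name: jarque_bera(27, s, k) for name, (s, k) in series.items()}
for name, (jb, p_value) in results.items():
    decision = "reject H0" if p_value < 0.05 else "do not reject H0"
    print(name, round(jb, 4), format(p_value, ".6f"), decision)
```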
Graphs in Eviews
Residuals and Graph (Residual plot):

To obtain the table showing the predicted and residual values, go to View >> Actual, Fitted,
Residual >> Actual, Fitted, Residual Table and you get:

EViews: Table of actual, predicted and residual values

Actual means the actual Y; fitted means the estimated Y, or Y-hat. Residuals are the difference between
actual and fitted Y (that is, Y minus Y-hat). When the regression includes an intercept, the sum and mean of the residuals equal zero.
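
The relationship between actual, fitted and residual values can be illustrated with a small hand-computed regression. This is a sketch on hypothetical data (`x_demo`, `y_demo`) with one regressor, not the workfile used above; it shows that the residuals of a least squares fit with an intercept sum to zero.

```python
def fit_simple_ols(x, y):
    """Least squares intercept and slope for a one-regressor model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x_demo, y_demo)
fitted = [intercept + slope * xi for xi in x_demo]            # Y-hat
residuals = [yi - fi for yi, fi in zip(y_demo, fitted)]       # Y minus Y-hat

print("Actual   Fitted   Residual")
for yi, fi, ei in zip(y_demo, fitted, residuals):
    print(f"{yi:6.2f}   {fi:6.2f}   {ei:8.4f}")
print("Sum of residuals:", round(sum(residuals), 10))
```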

Likewise, to obtain the plot of the predicted and residual values, go to View >> Actual, Fitted,
Residual >> Actual, Fitted, Residual Graph and you get:
Other Graphs
Go to Quick / Graph / Series list.

Enter the variable names, e.g. y x2 x3. You can get different graphs by selecting line, bar, pie, etc.
[Figures: two EViews graphs of the series Y, X2 and X3; vertical axis from 0 to 900, horizontal axis labelled 96 to 22]
