
An Investigation into Regression Models Using EViews
Module One

Prepared by:
Sayed Hossain
For more articles and videos, visit:
www.sayedhossain.com

Seven assumptions of a good regression model

1. The regression line must fit the data strongly.
2. Most of the independent variables should be individually significant in explaining the dependent variable.
3. The independent variables should be jointly significant in influencing or explaining the dependent variable.
4. The signs of the coefficients should follow economic theory, expectation, experience, or intuition.
5. There should be no serial or auto-correlation in the residual (u).
6. The variance of the residual (u) should be constant, meaning homoscedasticity.
7. The residual (u) should be normally distributed.

(Assumption no. 1)
The regression line must fit the data strongly
(Goodness of Fit)

***
Guideline: R-squared >= 60 percent (0.60) is better

Goodness of Fit

The data must be fitted reasonably well; that is, the value of R-squared should be reasonably high, more than 60 percent. The higher the R-squared, the better the fit.

(Assumption no. 2)
Most of the independent variables should be individually significant

**
t-test
The t-test is done to find out whether each and every independent variable (X1, X2, X3, etc. here) is individually significant in influencing the dependent variable, that is, Y here.

Individual significance of the variables

Most of the independent variables should be individually significant. This can be checked using the t-test.

If the p-value of the t-statistic is less than 5 percent (0.05), we can reject the null hypothesis and accept the alternative hypothesis.

If we can reject the null hypothesis, it means that the particular independent variable is significant in influencing the dependent variable in the population.
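To make the decision rule concrete, here is a minimal sketch in Python with statsmodels. The slides themselves work in EViews, so this is only an illustration; the simulated y and X are placeholder data, not the deck's dataset.

```python
# Minimal t-test sketch (Python/statsmodels), not the deck's EViews workflow.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(35, 3))                   # placeholder X1, X2, X3
y = 1 + 0.5 * X[:, 0] + rng.normal(size=35)    # placeholder Y

results = sm.OLS(y, sm.add_constant(X)).fit()
# One two-tail p-value per coefficient; reject H0: Bi = 0 when p < 0.05.
for name, p in zip(["C", "X1", "X2", "X3"], results.pvalues):
    print(name, round(p, 4), "significant" if p < 0.05 else "not significant")
```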

For example:
Variables:
We have four variables: Y, X1, X2, X3.
Here Y is dependent and X1, X2, X3 are independent.

Population regression model:
Y = B0 + B1X1 + B2X2 + B3X3 + u

Sample regression model:
Y = b0 + b1X1 + b2X2 + b3X3 + e

Here the sample regression line is an estimator of the population regression line. Our target is to estimate the population regression line (which is almost impossible, or too costly in time and money, to estimate directly) from the sample regression line. For example, small b1, b2, and b3 are estimators of big B1, B2, and B3.

Here u is the residual of the population regression line, while e is the residual of the sample regression line. e is the estimator of u; we want to learn the nature of u from e.
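As a quick illustration of "small b estimates big B" (a sketch added here, not part of the original slides), one can simulate a population model with known coefficients and check that least squares on a sample of 35 recovers them approximately. Python/statsmodels stands in for EViews, and all numbers are made up:

```python
# Sketch: the population B's are known here by construction; the fitted
# b's from a 35-observation sample should land close to them.
import numpy as np
import statsmodels.api as sm

B0, B1, B2, B3 = 1.0, -0.5, 2.0, 0.3    # hypothetical population coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(35, 3))
u = rng.normal(scale=0.5, size=35)      # population disturbance u
y = B0 + B1 * X[:, 0] + B2 * X[:, 1] + B3 * X[:, 2] + u

b = sm.OLS(y, sm.add_constant(X)).fit().params
print(b)  # b0, b1, b2, b3: estimates of B0, B1, B2, B3
```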

Tips
If the sample is collected according to statistical guidelines (proper random procedures), then the sample regression line can be representative of the population regression line. Our target is to estimate the population regression line from a sample regression line.

Setting hypotheses for the t-test: an example

Null hypothesis: B0 = 0
Alternative hypothesis: B0 ≠ 0
Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0
Null hypothesis: B2 = 0
Alternative hypothesis: B2 ≠ 0
Null hypothesis: B3 = 0
Alternative hypothesis: B3 ≠ 0

Hypothesis setting is always done for the population, not for the sample. That is why we have taken all big B (from the population regression line), not small b from the sample regression line.

Hypothesis setting
Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0

Since the direction of the alternative hypothesis is ≠, we assume that there exists a relationship between the independent variable (X1 here) and the dependent variable (Y here) in the population, but it cannot say whether the relationship is negative or positive. This is a two-tail hypothesis.

Null hypothesis: B1 = 0
Alternative hypothesis: B1 < 0

But if we set the hypothesis as above, then we assume that in the population there exists a negative relationship between X1 and Y, as the direction in the alternative hypothesis is <. It requires a one-tail test.
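A practical note (a standard property of the symmetric t-distribution, not stated explicitly in the slides): regression software, including EViews, normally reports two-tail p-values. For a one-tail alternative such as B1 < 0, when the estimated coefficient has the hypothesized sign, the one-tail p-value is half the reported two-tail value; for example, a reported p-value of 0.08 corresponds to 0.04 for the one-tail test.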

(Assumption no. 3)
Joint Significance
Independent variables should be jointly significant in explaining the dependent variable

**
F-test
ANOVA
(Analysis of Variance)

Joint significance

Independent variables should be jointly significant in explaining Y. This can be checked using the F-test.

If the p-value of the F-statistic is less than 5 percent (0.05), we can reject the null and accept the alternative hypothesis.

If we can reject the null hypothesis, it means that all the independent variables (X1, X2, X3) jointly can influence the dependent variable, that is, Y here.

Joint hypothesis setting

Null hypothesis H0: B1 = B2 = B3 = 0
Alternative H1: Not all Bs are simultaneously equal to zero

Here B0 is dropped as it is not associated with any variable. Here also we have taken all big B.
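A minimal sketch of the same joint test in Python/statsmodels follows (the deck reads these numbers off the EViews output; the data here are placeholders):

```python
# Joint F-test sketch: H0: B1 = B2 = B3 = 0 (the intercept is excluded).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(35, 3)))  # constant plus X1, X2, X3
y = 1 + rng.normal(size=35)                    # placeholder Y

results = sm.OLS(y, X).fit()
print(results.fvalue, results.f_pvalue)  # reject H0 if f_pvalue < 0.05
```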

A few things
Residual (u or e) = Actual Y - Estimated (fitted) Y
Residual, error term, and disturbance term all mean the same thing.
Serial correlation and auto-correlation mean the same thing.

(Assumption no. 4)
The signs of the coefficients should follow economic theory, expectation, the experiences of others (literature review), or intuition.

Residual Analysis

(Assumption no. 5)
No serial or auto-correlation in the residual (u)

**
Breusch-Godfrey serial correlation LM test (BG test)

Serial correlation
Serial correlation is a statistical term used to describe the situation when the residual is correlated with lagged values of itself. In other words, if residuals are correlated, we call this situation serial correlation, which is not desirable.

How can serial correlation form in the model?
Incorrect model specification,
omitted variables,
incorrect functional form,
incorrectly transformed data.

Detection of serial correlation

There are many ways to detect the existence of serial correlation in the model. One approach is the Breusch-Godfrey serial correlation LM test (BG test).

Hypothesis setting
Null hypothesis H0: no serial correlation (no correlation between residuals ui and uj)
Alternative hypothesis H1: serial correlation (correlation between residuals ui and uj)
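For readers outside EViews, a sketch of the same test in Python/statsmodels is shown below; choosing nlags=2 mirrors the Prob. Chi-Square(2) line in the deck's later output, but is otherwise an assumption:

```python
# Breusch-Godfrey LM test sketch on placeholder data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(35, 3)))
y = 1 + rng.normal(size=35)
results = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=2)
print(lm_pval)  # p > 0.05: cannot reject "no serial correlation"
```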

(Assumption no. 6)
The variance of the residual (u) is constant (Homoscedasticity)

***
Breusch-Pagan-Godfrey test

Heteroscedasticity is a term used to describe the situation when the variance of the residuals from a model is not constant.

When the variance of the residuals is constant, we call it homoscedasticity. Homoscedasticity is desirable.

If the residuals do not have constant variance, we call it heteroscedasticity, which is not desirable.

How may heteroscedasticity form?
Incorrect model specification,
incorrectly transformed data.

Hypothesis setting for heteroscedasticity

Null hypothesis H0: Homoscedasticity (the variance of the residual (u) is constant)
Alternative hypothesis H1: Heteroscedasticity (the variance of the residual (u) is not constant)

Detection of heteroscedasticity
There are many tests available to detect heteroscedasticity. One of them is the Breusch-Pagan-Godfrey test, which we will employ here.
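As an illustration outside EViews, statsmodels implements the Breusch-Pagan form of this test (EViews labels its menu version Breusch-Pagan-Godfrey); a sketch on placeholder data:

```python
# Breusch-Pagan heteroscedasticity test sketch on placeholder data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(35, 3)))
y = 1 + rng.normal(size=35)
results = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(results.resid, X)
print(lm_pval)  # p > 0.05: cannot reject homoscedasticity
```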

(Assumption no. 7)
Residuals (u) should be normally distributed

**
Jarque-Bera statistic

Setting the hypothesis:

Null hypothesis H0: Normal distribution (the residual (u) follows a normal distribution)
Alternative hypothesis H1: Not a normal distribution (the residual (u) does not follow a normal distribution)

Detecting residual normality:
Histogram normality test (perform the Jarque-Bera statistic).

If the p-value of the Jarque-Bera statistic is less than 5 percent (0.05), we can reject the null and accept the alternative; that is, the residuals (u) are not normally distributed.
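The same statistic can be computed outside EViews; a minimal Python/statsmodels sketch on placeholder residuals:

```python
# Jarque-Bera normality test sketch on placeholder residuals.
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(4)
resid = rng.normal(size=35)   # stand-in for the model's residuals (e)

jb_stat, jb_pval, skew, kurtosis = jarque_bera(resid)
print(jb_stat, jb_pval)       # p < 0.05 would reject normality
```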

An Empirical Model Development

Our hypothetical model

Variables:
We have four variables: Y, X1, X2, X3.
Here Y is dependent and X1, X2, and X3 are independent.

Population regression model:
Y = B0 + B1X1 + B2X2 + B3X3 + u
Sample regression line:
Y = b0 + b1X1 + b2X2 + b3X3 + e
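The deck estimates this model in EViews; an equivalent sketch in Python/statsmodels is below, assuming the 35 observations sit in a CSV with columns Y, X1, X2, X3 (the file name is hypothetical):

```python
# Estimate Y = b0 + b1*X1 + b2*X2 + b3*X3 + e by least squares.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("sample35.csv")   # hypothetical file holding the data below
results = smf.ols("Y ~ X1 + X2 + X3", data=df).fit()
print(results.summary())           # coefficients, t-stats, R-squared, F-stat

df["YF"] = results.fittedvalues    # YF: fitted/predicted Y
df["RESID"] = results.resid        # RESID: e = actual Y - fitted Y
```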

DATA
Sample size is 35, taken from the population.

obs  RESID      X1      X2    X3     Y     YF
1     0.417167  1700    1.2   20000  1.2   0.782833
2    -0.27926   1200    1.03  18000  0.65  0.929257
3    -0.17833   2100    1.2   19000  0.6   0.778327
4     0.231419  937.5         15163  1.2   0.968581
5    -0.33278   7343.3  0.97  21000  0.5   0.832781
6     0.139639  837.9   0.88  15329  1.2   1.060361
7    -0.01746   1648    0.91  16141        1.017457
8    -0.14573   739.1   1.2   21876  0.65  0.795733
9     0.480882  2100    0.89  17115  1.5   1.019118
10   -0.0297    274.6   0.23  23400  1.5   1.529701
11   -0.32756   231     0.87  16127  0.75  1.077562
12    0.016113  1879.1  0.94  17688        0.983887
13   -0.34631   1941    0.99  17340  0.6   0.946315
14    0.485755  2317.6  0.87  21000  1.5   1.014245
15    0.972181  471.4   0.93  16000        1.027819
16   -0.22757   678     0.79  16321  0.9   1.127572
17   -0.2685    7632.9  0.93  18027  0.6   0.868503
18   -0.41902   510.1   0.93  18023  0.6   1.019018
19   -0.4259    630.6   0.93  15634  0.6   1.0259
20    0.076632  1500    1.03  17886        0.923368

DATA (continued)

obs  RESID        X1      X2    X3     Y     YF
21   -0.37349949  1618.3  1.1   16537  0.5   0.873499
22    0.183799347 2009.8  0.96  17655  1.15  0.966201
23    0.195832507 1562.4  0.96  23100  1.15  0.954167
24   -0.46138707  1200    0.88  13130  0.6   1.061387
25    0.309577968 13103         20513        0.690422
26   -0.21073204  3739.6  0.92  17409  0.75  0.960732
27   -0.08351157  324     1.2   14525  0.75  0.833512
28   -0.02060854  2385.8  0.89  15207        1.020609
29    0.14577644  1698.5  0.93  15409  1.15  1.004224
30   -0.06000649  544     0.87  18900        1.060006
31   -0.50510204  1769.1  0.45  17677  0.85  1.355102
32    0.870370225 1065    0.65  15092  2.1   1.22963
33    0.274774344 803.1   0.98  18014  1.25  0.975226
34   -0.1496757   1616.7        28988  0.75  0.899676
35    0.062732149 210     1.2   21786  0.87  0.807268

Y, X1, X2, and X3 are actual sample data collected from the population.
YF = estimated, forecasted, or predicted Y.
RESID (e) = residuals of the sample regression line; that is, e = Actual Y - Predicted (fitted) Y.

Regression Output

Dependent Variable: Y
Method: Least Squares
Included observations: 35

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C          1.800       0.4836       3.72        0.0008
X1        -2.11E-05    2.58E-05    -0.820       0.4183
X2        -0.7527      0.3319      -2.267       0.0305
X3        -3.95E-06    2.08E-05    -0.189       0.8509

R-squared           0.1684    Mean dependent var     0.9834
Adjusted R-squared  0.087     S.D. dependent var     0.3912
S.E. of regression  0.3736    Akaike info criterion  0.9762
Sum squared resid   4.328     Schwarz criterion      1.15
Log likelihood     -13.08     F-statistic            2.093
Durbin-Watson stat  2.184     Prob(F-statistic)      0.1213

A few things
t-statistic = Coefficient / Standard Error
The t-statistic (in absolute value) and the p-value always move in opposite directions.
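As a check against the output above: for X2, t = -0.7527 / 0.3319 ≈ -2.27, matching the reported t-statistic of -2.267, and this relatively large absolute t goes with the small p-value of 0.0305.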

Output
Actual Y, Fitted Y, Residual and its plot

obs  Actual  Fitted        Residual      Residual Plot
1    1.2     0.782832991    0.417167009  . | .*
2    0.65    0.92925722    -0.27925722   .* | .
3    0.6     0.778327375   -0.178327375  .*| .
4    1.2     0.96858115     0.23141885   . | *.
5    0.5     0.8327808     -0.3327808    * | .
6    1.2     1.060360549    0.139639451  . |*.
7            1.017457055   -0.017457055  . * .
8    0.65    0.79573323    -0.14573323   .*| .
9    1.5     1.019118163    0.480881837  . | .*
10   1.5     1.529701243   -0.029701243  . * .
11   0.75    1.077562408   -0.327562408  * | .
12           0.983887019    0.016112981  . * .
13   0.6     0.946314864   -0.346314864  * | .
14   1.5     1.014244939    0.485755061  . | .*
15           1.027819105    0.972180895  . | .
16   0.9     1.127572088   -0.227572088  .* | .
17   0.6     0.868503447   -0.268503447  .* | .
18   0.6     1.019018495   -0.419018495  *. | .
19   0.6     1.025899595   -0.425899595  *. | .
20           0.923368304    0.076631696  . |* .

Output
Actual Y, Fitted Y, Residual and its plot (continued)

obs  Actual  Fitted        Residual      Residual Plot
21   0.5     0.873499486   -0.373499486  * | .
22   1.15    0.966200653    0.183799347  . |*.
23   1.15    0.954167493    0.195832507  . |*.
24   0.6     1.061387074   -0.461387074  *. | .
25           0.690422032    0.309577968  . | *
26   0.75    0.960732042   -0.210732042  .*| .
27   0.75    0.833511567   -0.083511567  . *| .
28           1.020608541   -0.020608541  . * .
29   1.15    1.00422356     0.14577644   . |*.
30           1.060006494   -0.060006494  . *| .
31   0.85    1.355102042   -0.505102042  *. | .
32   2.1     1.229629775    0.870370225  . | . *
33   1.25    0.975225656    0.274774344  . | *.
34   0.75    0.899675696   -0.149675696  .*| .
35   0.87    0.807267851    0.062732149  . |* .

[Figure: Actual Y, Fitted Y, and Residual plotted across the 35 observations.]

[Figure: Sample residuals (Y residuals) plotted across the 35 observations.]

(Assumption no. 1)
Goodness of Fit
R-squared: 0.1684
This means that 16.84 percent of the variation in Y can be explained jointly by the three independent variables X1, X2, and X3. The remaining 83.16 percent of the variation in Y is explained by the residual, that is, by variables other than X1, X2, and X3.

(Assumption no. 3)
Joint hypothesis: F-statistic
F-statistic: 2.093 and Prob. 0.1213
Null hypothesis H0: B1 = B2 = B3 = 0
Alternative H1: Not all Bs are simultaneously equal to zero

Since the p-value is more than 5 percent (here 12.13 percent), we cannot reject the null. In other words, the independent variables (here X1, X2, and X3) cannot jointly explain or influence Y in the population.

(Assumption no. 2)
Individual significance of the variables
For X1, p-value: 0.4183
Null hypothesis: B1 = 0
Alternative hypothesis: B1 ≠ 0
Since the p-value is more than 5 percent (0.05), we cannot reject the null; we accept the null, meaning B1 = 0. In other words, X1 cannot influence Y in the population.

For X2, p-value: 0.0305 (3.05 percent)
Null hypothesis: B2 = 0
Alternative hypothesis: B2 ≠ 0
Since the p-value (0.0305) is less than 5 percent, we can reject the null and accept the alternative hypothesis. It means that X2 can influence Y in the population, but we cannot say in which direction, as the alternative hypothesis is ≠.

For X3, p-value: 0.8509. So X3 is not significant in explaining Y.

(Assumption no. 4)
Signs of the coefficients
Our sample model:
Y = b0 + b1X1 + b2X2 + b3X3 + e
The signs we expected were:
Y = b0 - b1X1 + b2X2 - b3X3
Decision: the estimated coefficient of X2 came out negative rather than positive, so the outcome did not match our expectation. Assumption 4 is violated.

(Assumption no. 5)
Serial or autocorrelation
Breusch-Godfrey Serial Correlation LM Test:

F-statistic     1.01   Prob. F(2,29)        0.3751
Obs*R-squared   2.288  Prob. Chi-Square(2)  0.3185

Null hypothesis: no serial correlation in the residuals (u)
Alternative: there is serial correlation in the residuals (u)

Since the p-value (0.3185) of Obs*R-squared is more than 5 percent (p > 0.05), we cannot reject the null hypothesis, meaning that the residuals (u) are not serially correlated, which is desirable.

(Assumption no. 6)
Heteroscedasticity test
Breusch-Pagan-Godfrey test (B-P-G test)

F-statistic     1.84   Probability  0.3316
Obs*R-squared   3.600  Probability  0.3080

Null hypothesis: residuals (u) are homoscedastic
Alternative: residuals (u) are heteroscedastic

The p-value of Obs*R-squared shows that we cannot reject the null. So the residuals do have constant variance, which is desirable, meaning that the residuals are homoscedastic. The B-P-G test is normally done for large samples.

(Assumption no. 7)
Residual (u) normality test

[Figure: Histogram of the residuals, with summary statistics:]

Series: Residuals
Sample: 1 35
Observations: 35

Mean         1.15e-16
Median      -0.029701
Maximum      0.972181
Minimum     -0.505102
Std. Dev.    0.356788
Skewness     0.880996
Kurtosis     3.508042
Jarque-Bera  4.903965
Probability  0.086123

Null hypothesis: residuals (u) are normally distributed
Alternative: not normally distributed

The Jarque-Bera statistic is 4.903 and the corresponding p-value is 0.0861. Since the p-value is more than 5 percent, we accept the null, meaning that the population residual (u) is normally distributed, which fulfills the assumption of a good regression line.

Evaluation of our model on the basis of the assumptions

1. R-squared is very low (bad sign).
2. There is no serial correlation (good sign).
3. The independent variables cannot jointly influence Y (bad sign).
4. The signs are not as expected (bad sign).
5. Only X2 out of the three independent variables is significant (bad sign).
6. There is no heteroscedasticity problem (good sign).
7. The residuals are normally distributed (good sign).

Use the information on this website at your own risk. This website shall not be responsible for any loss or expense suffered in connection with the use of this website.
