Autocorrelation by Aamir Khan
1 & 2 (3rd Monthly) Date: 14/09/2012
AUTOCORRELATION
Autocorrelation occurs when the error terms of different time periods are dependent on each other, i.e. their covariance is not equal to zero.
Mathematically: Cov(ε_t, ε_s) ≠ 0 for t ≠ s
Example:
If we have a series ε_1, ε_2, ..., ε_10 and another series made from the first one as a series of lagged values, lagging by a time period of one, i.e. ε_2, ε_3, ..., ε_11, then the correlation between such series is called Autocorrelation.
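As a minimal sketch of this definition (assuming Python with NumPy; the series here are simulated for illustration), the lag-one autocorrelation is simply the correlation between a series and its own lagged values:

```python
import numpy as np

def lag1_autocorrelation(u):
    """Correlation between u_t and u_(t-1), computed over the n-1 overlapping pairs."""
    u = np.asarray(u, dtype=float)
    current, lagged = u[1:], u[:-1]   # u_2..u_n paired with u_1..u_(n-1)
    return np.corrcoef(current, lagged)[0, 1]

# A trending (random-walk) series has high positive lag-1 autocorrelation,
# while independent noise has autocorrelation near zero.
rng = np.random.default_rng(0)
trending = np.cumsum(rng.normal(size=500))
noise = rng.normal(size=500)
print(lag1_autocorrelation(trending))  # close to 1
print(lag1_autocorrelation(noise))     # close to 0
```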
Serial Correlation:
Serial correlation is the lag correlation between the error terms of two different series.
Example:
If we have a series ε_1, ε_2, ..., ε_10 and another series comprising the error terms u_2, u_3, ..., u_11, where these are two different series, then the correlation between them is called Serial Correlation.
Structure of Autocorrelation:
1. First order Autocorrelation:
It assumes the first-order autoregressive process and is denoted by AR(1).
This model assumes that the error term for period t (current period) is
correlated with the error term for period t-1 (previous period).
Example:
If we have an income-consumption function, then the linear regression model
assuming an AR(1) process will be:
Y_t = α + βX_t + ε_t, where t = 1, ..., 10
ε_t = ρε_(t-1) + u_t, where u_t is an error.
The parameter ρ is called the first-order autocorrelation coefficient and
can be interpreted as the correlation coefficient between ε_t and ε_(t-1).
If ρ > 0, then the disturbances in period t are positively correlated with the
disturbances in period t-1. In this case there is positive autocorrelation.
This means that when disturbances in period t-1 are positive disturbances,
then disturbances in period t tend to be positive. When disturbances in
period t-1 are negative disturbances, then disturbances in period t tend to
be negative. Time-series data sets in economics are usually characterized by
positive autocorrelation.
If ρ < 0, then the disturbances in period t are negatively correlated with
the disturbances in period t-1. In this case there is negative autocorrelation.
This means that when disturbances in period t-1 are positive disturbances,
then disturbances in period t tend to be negative. When disturbances in
period t-1 are negative disturbances, then disturbances in period t tend to
be positive.
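The sign patterns described above can be checked by simulating AR(1) disturbances; this is an illustrative sketch in Python with NumPy (the values ρ = 0.8 and ρ = -0.8 are assumptions, not from the text):

```python
import numpy as np

def simulate_ar1(rho, n=2000, seed=1):
    """Generate eps_t = rho * eps_(t-1) + u_t with u_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    eps = np.zeros(n)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
    return eps

def lag1_corr(x):
    """Sample correlation between x_t and x_(t-1)."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]

pos = simulate_ar1(rho=0.8)    # positive autocorrelation: neighbours tend to share sign
neg = simulate_ar1(rho=-0.8)   # negative autocorrelation: neighbours tend to alternate sign
print(lag1_corr(pos))   # around +0.8
print(lag1_corr(neg))   # around -0.8
```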
2. Second-Order Autocorrelation:
An alternative model of autocorrelation is called the second-order
autoregressive process, or AR(2). The AR(2) model of autocorrelation
assumes that the disturbance in period t is related to both the disturbance in
period t-1 and the disturbance in period t-2. The general linear regression
model that assumes an AR(2) process is given by:
Y_t = α + βX_t + ε_t
ε_t = ρ_1 ε_(t-1) + ρ_2 ε_(t-2) + u_t
The second equation tells us that the disturbance in period t depends upon
the disturbance in period t-1, the disturbance in period t-2, and some
additional amount u_t, which is an error.
3. pth-Order Autocorrelation:
The general linear regression model that assumes a pth-order autoregressive
process, or AR(p), where p can assume any positive value, is given by:
Y_t = α + βX_t + ε_t
ε_t = ρ_1 ε_(t-1) + ρ_2 ε_(t-2) + ... + ρ_p ε_(t-p) + u_t
Sources of Autocorrelation:
1. Omitted explanatory variable:
If we exclude an independent variable that affects our model, then the
effect of that variable is absorbed into the error term, increasing it. Such a case is called
excluded-variable specification bias, and the inclusion of such a variable
mostly removes the autocorrelation.
Example:
If we have a model:
Y_t = α + β_1 X_(1t) + β_2 X_(2t) + u_t
Now if we exclude the variable X_(2t), we fit:
Y_t = α + β_1 X_(1t) + v_t
Where: v_t = β_2 X_(2t) + u_t
which means that the error v_t absorbs the systematic influence of the omitted variable X_(2t), thus causing
a false autocorrelation.
2. Misspecification of mathematical form:
It refers to the use of an incorrect functional form, i.e. using a linear form instead
of a quadratic one, and vice versa.
Example:
Considering the above dividend equation, if we take its original form as quadratic:
Y_t = α + β_1 X_t + β_2 X_t² + u_t
but instead of that we fit the regression line as:
Y_t = α + β_1 X_t + v_t
then the error term v_t = β_2 X_t² + u_t will contain the omitted quadratic term and will show a systematic pattern, causing apparent autocorrelation.
Detection through the Durbin-Watson d test:
The Durbin-Watson test statistic is:
d = Σ_(t=2)^n (e_t − e_(t-1))² / Σ_(t=1)^n e_t²
Or, approximately: d ≈ 2(1 − ρ̂)
Where: e_t is the OLS residual for period t and ρ̂ is the estimated first-order autocorrelation coefficient of the residuals.
Note: The numerator consists of n − 1 observations. This is because one
observation is lost in taking successive differences, i.e. for the first value in the
data set there is no value of e_(t-1).
Steps for hypothesis testing:
1. Setting the null and alternative hypotheses:
a. If we are performing a two-tail test then: H0: ρ = 0, H1: ρ ≠ 0
Here ρ is the first-order autocorrelation coefficient of the error terms.
b. If we are performing a one-tail test then: H0: ρ = 0, H1: ρ > 0
It will be a one-tail test for positive autocorrelation.
c. For negative autocorrelation, the one-tail test will have the hypotheses: H0: ρ = 0, H1: ρ < 0
2. Choosing the significance level i.e. alpha
3. Choosing the test statistic i.e. Durbin-Watson d test
4. Calculations for test statistic.
5. Locating the critical region as:
a. First part is to get the tabulated value of Durbin-Watson test from table
as:
In the table:
k represents the number of independent variables.
For different significance levels, there are different tables.
In the required table, look against the number of observations n and the
required value of k for the values of d_U and d_L,
where these are the upper and lower limits of the Durbin-Watson statistic for the
required significance level and k independent variables.
b. Then compare these values with the calculated value of d as:
a. If d < d_L or d > 4 − d_L, reject H0: there is evidence of autocorrelation.
b. If d_U < d < 4 − d_U, do not reject H0: there is no first-order autocorrelation.
c. If d lies between d_L and d_U, or between 4 − d_U and 4 − d_L, no
conclusion can be drawn; this region is called the no-conclusion area or inconclusive area.
d. There is also a rule of thumb that is sometimes used to conclude
that there is no first-order autocorrelation if the d statistic is
between 1.5 and 2.5. A d statistic below 1.5 indicates positive
first-order autocorrelation. A d statistic of greater than 2.5
indicates negative first-order autocorrelation. However, strictly
speaking, this is not correct.
6. Making the decision according to the conditions and calculations.
Above discussion in the form of diagram:
Example for Hypothesis testing through Durbin-Watson d test:
1. Our null and alternative hypotheses are: H0: ρ = 0, H1: ρ ≠ 0
Because we are performing a two-tail test, we have set the hypotheses
as above.
2. Choosing the significance level α.
3. The test statistic used is the Durbin-Watson d test:
d = Σ_(t=2)^n (e_t − e_(t-1))² / Σ_(t=1)^n e_t²
Or: d ≈ 2(1 − ρ̂)
Where: e_t is the OLS residual for period t.
4. Calculations for test statistic:
Given in the Supported Excel File (Sheet1)
5. Critical region (Also given in the Excel File, Sheet1)
6. Taking decision about correlation whether it is there or not.
Given in supported Excel File (Sheet1)
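The calculation steps above can be sketched in Python with NumPy; the data below are simulated with AR(1) errors rather than taken from the supporting Excel file, so the numbers are illustrative only:

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^n (e_t - e_(t-1))^2 / sum_{t=1}^n e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Simulated regression with AR(1) errors (rho = 0.7 is assumed),
# so d should fall well below 2, signalling positive autocorrelation.
rng = np.random.default_rng(42)
n = 300
x = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

# OLS fit and residuals
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

d = durbin_watson(resid)
rho_hat = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print(d)                  # well below 2 -> positive autocorrelation
print(2 * (1 - rho_hat))  # close to d, since d ≈ 2(1 - rho_hat)
```

In practice d would then be compared against the tabulated d_L and d_U for the given n and k.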
How to find value of Durbin-Watson in Gretl:
Following are the steps to get the value of Durbin-Watson in Gretl:
1. Open the gretl software with the Gujarati sample data installed if the test is to be run on a
table of this book. In my case, I have run the test on Table 6.3 of Gujarati's
book.
2. Click the File menu and select the table from the required book.
3. Then go to the Model menu and click OLS; a window will
appear where you have to specify the dependent and independent variables
for the software.
4. Put Y into the dependent variable box and X into the independent
variable box.
5. Click OK in that window and you will get the value of Durbin-Watson
for the data in the chosen table, at the end of gretl's output values.
6. You can save the residuals from this model by clicking Residuals in the
Save menu.
5. BG-Test (Breusch-Godfrey test):
It is also called the LM test and is used to overcome the drawbacks of the Durbin-
Watson test.
Unlike the DW test, it can be used for models having:
a. More than one lagged value, i.e. higher-order autoregressive schemes
b. Lagged values of the dependent variable as independent variables
c. The independent variable (X) as a regressor along with the lagged values
of the residuals, i.e. the residual for time t.
If we have a two-variable model like:
Y_t = α + βX_t + u_t, with u_t = ρ_1 u_(t-1) + ρ_2 u_(t-2) + ... + ρ_p u_(t-p) + ε_t
Then the null hypothesis to be tested will be:
H0: ρ_1 = ρ_2 = ... = ρ_p = 0
Steps for BG-Test:
1. Estimate the original equation by OLS, meaning that we estimate the values of
α and β through the OLS method and obtain the residuals û_t.
2. Regress û_t on X_t and on its own lagged values û_(t-1), û_(t-2), ..., û_(t-p).
This regression will simply take the form:
û_t = a + bX_t + ρ_1 û_(t-1) + ... + ρ_p û_(t-p) + v_t
Note: if we have p = 4, it will mean that we have to put four lagged values in
the model to be regressed for residuals, and to run the regression we will have
n − 4 observations.
3. Obtain R² from this auxiliary regression.
4. If the sample size is large, then (n − p)·R² asymptotically follows the chi-square
distribution with p degrees of freedom. If it exceeds the critical chi-square value,
we reject H0 and conclude that autocorrelation is present.
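The four steps above can be sketched directly in Python with NumPy; the data are simulated (the AR(1) coefficient 0.6 is an assumption), and 5.99 is the 5% chi-square critical value for p = 2 degrees of freedom:

```python
import numpy as np

def breusch_godfrey_lm(y, x, p):
    """LM statistic for the BG test with p lags: (n - p) * R^2 of the
    auxiliary regression of the OLS residuals on x and their own p lags."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

    # Auxiliary regression: u_hat_t on constant, x_t, u_hat_(t-1), ..., u_hat_(t-p)
    rows = n - p
    lags = np.column_stack([resid[p - k - 1 : n - k - 1] for k in range(p)])
    Z = np.column_stack([np.ones(rows), x[p:], lags])
    u = resid[p:]
    fitted = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    r2 = 1 - np.sum((u - fitted) ** 2) / np.sum((u - u.mean()) ** 2)
    return rows * r2

# Simulated model with autocorrelated errors, so the test should reject H0.
rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

lm = breusch_godfrey_lm(y, x, p=2)
print(lm > 5.99)  # exceeds the chi-square(2) 5% critical value: reject H0
```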
How to run BG test in Gretl:
Following are the steps for running BG test in gretl:
1. Open the gretl software and select a table from Gujarati through the
following path:
File → Open data → Sample file → Gujarati → Table 6.3 (in this case)
2. A new window with dependent and independent variables will come on
the screen. Then follow the path:
Model → OLS
A new window will appear where you have to assign dependent and
independent variables to their respective columns as:
Shift Y to dependent variable column and X to independent variable
column.
3. Then click OK and you will get the Durbin-Watson value. Then click on the
Save menu, then on the sub-menu Residuals, and save them as uhat1.
4. Then close that window and go to the Model menu again, then to OLS; you
will find a residual named uhat1 at the bottom. Shift it to the
dependent variable column.
5. Then go to the Lags tab at the bottom of the same window, check the box
for lags of the dependent variable, and increase the lags up to three.
6. Then click ok and gretl will give you the results.
Note: Taking X along with the lagged values depends on one's own taste.
How to interpret the results of BG test:
See the Excel file, where all values are given (Sheet2).
6. Alternative test:
In the Alternative test we use only the t-test and Z-test for finding the correlation
between the residuals and their lagged values, and here we study the
relationship of the residual with each of its lagged values separately. The formula for
the t-test will be:
t = r·√(n − 2) / √(1 − r²)
where r is the sample correlation between the residuals and the lagged residuals.
And the null and alternative hypotheses will be:
H0: ρ_1 = 0, H1: ρ_1 ≠ 0 and H0: ρ_2 = 0, H1: ρ_2 ≠ 0
I.e. we have to test the correlation for both residual lags separately during the alternative
test for an AR(2) model.
For example:
If we have a residual equation like this:
û_t = ρ_1 û_(t-1) + ρ_2 û_(t-2) + v_t
Here we can find the correlation between û_t and û_(t-1), and then, using the
alternative test, we will separately have to determine the correlation between û_t
and û_(t-2).
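As a sketch of this procedure (assuming Python with NumPy and simulated AR(2) residuals; the ρ values 0.5 and 0.3 are illustrative), each lag is tested separately with t = r·√(n − 2)/√(1 − r²), compared against the two-tail 5% critical value of about 1.96 for large n:

```python
import numpy as np

def lag_corr_t_stat(resid, lag):
    """t statistic for H0: correlation between u_hat_t and u_hat_(t-lag) is zero,
    using t = r * sqrt(n - 2) / sqrt(1 - r^2) over the overlapping pairs."""
    r = np.corrcoef(resid[lag:], resid[:-lag])[0, 1]
    n = len(resid) - lag   # number of overlapping pairs
    return r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

# Simulated AR(2) residuals: u_t = 0.5 u_(t-1) + 0.3 u_(t-2) + v_t
rng = np.random.default_rng(3)
n = 500
resid = np.zeros(n)
for t in range(2, n):
    resid[t] = 0.5 * resid[t - 1] + 0.3 * resid[t - 2] + rng.normal()

# Test each lag separately, as in the alternative test for an AR(2) model.
t1 = lag_corr_t_stat(resid, lag=1)
t2 = lag_corr_t_stat(resid, lag=2)
print(abs(t1) > 1.96, abs(t2) > 1.96)  # both significant at roughly the 5% level
```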
The GLS (Generalized Least Squares) method:
Suppose we have the model:
Y_t = α + βX_t + u_t ...(I)
where the error term follows the AR(1) scheme:
u_t = ρu_(t-1) + ε_t ...(II)
Now to derive the GLS estimator we will:
a. Derive a transformed model that satisfies all the assumptions of the Classical
Linear Regression Model (CLRM) as:
i. Substituting equation II in I we get:
Y_t = α + βX_t + ρu_(t-1) + ε_t ...(III)
ii. Now in order to eliminate autocorrelation, we will have to eliminate
the term ρu_(t-1). Lagging equation I by one time period:
Y_(t-1) = α + βX_(t-1) + u_(t-1) ...(IV)
Now solving equation IV for u_(t-1) we will get:
u_(t-1) = Y_(t-1) − α − βX_(t-1)
Now multiplying both sides of this equation by ρ we get:
ρu_(t-1) = ρY_(t-1) − ρα − ρβX_(t-1) ...(V)
Substituting the value of ρu_(t-1) from equation V in equation III we get:
Y_t = α + βX_t + ρY_(t-1) − ρα − ρβX_(t-1) + ε_t
This equation can also be written as (re-arranging the equation):
Y_t − ρY_(t-1) = α(1 − ρ) + β(X_t − ρX_(t-1)) + ε_t
Now if we say that:
Y_t* = Y_t − ρY_(t-1), X_t* = X_t − ρX_(t-1) and α* = α(1 − ρ)
Then the above equation will become:
Y_t* = α* + βX_t* + ε_t ...(A)
This is the transformed model.
Note:
a. The slope coefficient of the transformed model is the same as the slope
coefficient of the original model.
b. The constant term of the original model will be: α = α*/(1 − ρ)
c. One observation is lost in computing the transformed variables, since
Y_1* and X_1* would require the values Y_0 and X_0, which do not exist.
iv. We can also find the value of α* as: α* = α(1 − ρ)
v. From this we can find the value of α as well: α = α*/(1 − ρ)
But these estimates are not GLS and thus are not BLUE, because while
computing Y_t* and X_t* the first observation is lost.
On the basis of this first observation, again create the transformed
variables Y_1* and X_1* as:
Y_1* = √(1 − ρ²)·Y_1 and X_1* = √(1 − ρ²)·X_1
vi. Now regress Y_t* on X_t* using all n observations.
vii. The resulting values from this regression will be GLS and will be BLUE.
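Assuming ρ is known, the transformed model (A) can be estimated as follows; this is an illustrative sketch with simulated data (α = 1, β = 2, ρ = 0.7 are assumptions), dropping the first observation as in the derivation above:

```python
import numpy as np

# Simulate Y_t = alpha + beta*X_t + u_t with AR(1) errors u_t = rho*u_(t-1) + eps_t
rng = np.random.default_rng(11)
n, rho, alpha, beta = 500, 0.7, 1.0, 2.0
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + rng.normal()
y = alpha + beta * x + u

# Transformed variables (one observation is lost in quasi-differencing):
y_star = y[1:] - rho * y[:-1]   # Y_t* = Y_t - rho * Y_(t-1)
x_star = x[1:] - rho * x[:-1]   # X_t* = X_t - rho * X_(t-1)

# OLS on the transformed model Y* = alpha* + beta*X* + eps
X = np.column_stack([np.ones(n - 1), x_star])
a_star, b_hat = np.linalg.lstsq(X, y_star, rcond=None)[0]

print(b_hat)               # close to the original slope beta = 2.0
print(a_star / (1 - rho))  # recovers alpha, since alpha = alpha*/(1 - rho)
```

Keeping the rescaled first observation Y_1* = √(1 − ρ²)·Y_1 instead of dropping it would give the full-sample version described in steps vi-vii.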
Problems with Using the GLS Estimator:
The major problem with the GLS estimator is that to use it you must know the
true autocorrelation coefficient ρ. If you don't know the value of ρ, then you
can't create the transformed variables Y_t* and X_t*. In that case ρ must first be
estimated, for example as follows:
1. Regress Y_t on a constant and X_t:
Y_t = α + βX_t + u_t ...(I)
2. Calculate the residuals from this regression, i.e. the values of e_t (or û_t).
3. Regress e_t on e_(t-1):
e_t = ρ̂·e_(t-1) + v_t
This will yield an estimate of ρ, denoted ρ̂, but do not include a constant
term in this regression.
4. Use the estimate of ρ to create the transformed variables:
Y_t* = Y_t − ρ̂Y_(t-1) and X_t* = X_t − ρ̂X_(t-1)
which can also be written as the transformed model:
Y_t* = α* + βX_t* + ε_t
5. Now get the values of α* and β by running OLS on the transformed model.
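The five steps above can be sketched in Python with NumPy; this is a feasible-GLS illustration with simulated data (the parameter values are assumptions):

```python
import numpy as np

# Simulated model Y_t = 1 + 2*X_t + u_t with AR(1) errors (rho = 0.6 assumed)
rng = np.random.default_rng(5)
n, rho_true = 400, 0.6
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho_true * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Steps 1-2: OLS on the original model, keep the residuals e_t.
X = np.column_stack([np.ones(n), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 3: regress e_t on e_(t-1) WITHOUT a constant -> rho_hat.
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

# Step 4: transformed variables using rho_hat.
y_star = y[1:] - rho_hat * y[:-1]
x_star = x[1:] - rho_hat * x[:-1]

# Step 5: OLS on the transformed model gives the feasible GLS estimates.
Xs = np.column_stack([np.ones(n - 1), x_star])
a_star, b_hat = np.linalg.lstsq(Xs, y_star, rcond=None)[0]
print(rho_hat)  # close to 0.6
print(b_hat)    # close to 2.0
```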