Hitchhiker's Guide to EViews and Econometrics
January 2000
Byung-Joo Lee
Department of Economics
University of Notre Dame
Notre Dame, IN 46556
ByungJoo.Lee.81@nd.edu
574-631-6837
Introductory Session
This is an introductory session on EViews. It covers only the most basic tools
needed to perform a minimal regression analysis. The next section,
Econometric Review, gives a brief but fairly comprehensive overview of the most
commonly used econometric models.
We will practice an actual statistical analysis using a small data set. In this practice, I will
explain both the menu bar route and the equivalent EViews command. EViews commands will be
boldfaced. Our practice will appear in italics.
1. Begin an EViews session and create a workfile.
You need to create a workfile for any EViews session. Click on File|New|
Workfile. (Type CREATE in command mode.) A dialog box appears asking for the
frequency of the data: annual, semi-annual, quarterly, monthly, weekly, daily or undated.
Select the appropriate frequency and enter the start and end periods of the data. For
example, in the start period area, type 1960 for annual data or 1960:1 for quarterly,
monthly or weekly data (you can use a period instead of a colon, e.g., 1960.1). The end
period can be entered as 1969 (annual), 1969:4 (quarterly) or 1969:12 (monthly), etc.
(see step 11 for the data to enter). For cross section data, choose undated for the data
frequency and enter 1 in the start period and n (the number of observations) in the end
period. Then click OK. The workfile now appears (currently shown as workfile:UNTITLED),
and it already contains two objects: C (constant) and resid (residuals).
2. Type in data for three series: CONS, INCOME and CPI.
To enter data manually, click on Quick|Empty Group (Edit Series). (Type DATA.)
A blank spreadsheet appears with the frequency you specified in step 1.
Click in the gray cell to enter the series (variable) name. This window is called a
Group within your workfile. For each operation you do in your workfile, you
can either name the resulting window as a separate group or simply close it to discard it.
A series name can use any ASCII characters, up to 8 characters. Start typing the numeric
data immediately below the series name. Type CONS (consumption), press the down
arrow (DATA CONS) and start typing 325, 335, ... (the data are listed at the end of this
exercise). After finishing CONS, go to the next column and, in the gray area, type
INCOME and then type in 350, 364, 385, ... Repeat this for the remaining series, CPI.
(You can enter all three series at once with DATA CONS INCOME CPI.)
If you want to read data from another file (an ASCII file, Lotus *.WK3, or Excel *.XLS),
click on Proc/Import Data in the workfile menu. Choose the proper file format
and select the data file from your directory. A data dialog window will
ask for the order of the data (by observation or by series; most data are arranged by
observation) and the series names (if the file does not contain names). If the data already
have series names, simply type in how many series are in the data set. EViews will read all
the necessary data into the workfile. You can export selected series into any data file
format by clicking on Proc/Export Data in the workfile menu.
3. Save and retrieve your current workfile.
When you have finished entering your data, save the workfile so that you
can use it later. To save it, click on File|SaveAs and give an appropriate
file name (a workfile automatically gets the extension *.wf1) and path
(SAVE filename). In a later session, you can
retrieve the workfile with File|Open (LOAD filename).
4. Generate real variables using nominal variables and price index.
Note that CONS and INCOME are in nominal terms. To use these variables in
real terms, we need to deflate them by the price index. In your current workfile
menu bar, there is a GENR button. Click GENR (GENR) and you will get a
window asking for an equation and a sample period. Enter the appropriate equation to
convert a nominal variable into real terms. Type RCONS=CONS/CPI (RCONS is real
consumption) and adjust the sample period if necessary; by default it is set to the
entire period. Do the same for RINCOME (real income).
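As a rough check on what GENR computes in this step, the division can be reproduced outside EViews. The sketch below is plain Python, not EViews syntax; the helper name deflate is made up for illustration, and the numbers are the ten observations listed at the end of this session.

```python
# Data for 1960-1969, copied from the data set at the end of this session.
cons = [325, 335, 355, 375, 401, 433, 466, 492, 537, 576]
income = [350, 364, 385, 405, 438, 473, 512, 547, 590, 630]
cpi = [0.887, 0.896, 0.906, 0.917, 0.929, 0.945, 0.972, 1.000, 1.042, 1.098]


def deflate(nominal, price_index):
    """Divide each nominal value by the price index of the same period."""
    return [n / p for n, p in zip(nominal, price_index)]


# Same arithmetic as RCONS=CONS/CPI and RINCOME=INCOME/CPI.
rcons = deflate(cons, cpi)
rincome = deflate(income, cpi)

for year, rc, ri in zip(range(1960, 1970), rcons, rincome):
    print(f"{year}  RCONS={rc:8.2f}  RINCOME={ri:8.2f}")
```

In 1967 the price index equals 1.000, so the real and nominal values coincide for that year, which is a quick way to confirm that the series were entered correctly.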
5. Print CONS, INCOME, CPI, RCONS, RINCOME.
If you want to look at your data, use the SHOW button in the workfile menu
bar and type the appropriate series name(s) (SHOW CONS INCOME). For a single
series, you can just double click on its name in the workfile directory. For
multiple series, highlight all of the series names in the workfile directory and click
SHOW. You can type more than one series name in the series name window. You
can print using the PRINT button in the group menu bar (PRINT CONS
INCOME).
6. Plot each series separately and simultaneously.
You can draw a graph for a single series or for multiple series. You can do this with View|
Graphics under the Group menu (this option is only available under View in the Group
menu, where you see the actual list of series) (PLOT series name or
PLOT series1, series2). Each series is plotted in a different color. You can make
multiple graphs by choosing View|Multiple Graphs in the Group menu.
11. The following is the data set we used for this introductory session.
Year    Cons    Income    Cpi
1960    325     350       0.887
1961    335     364       0.896
1962    355     385       0.906
1963    375     405       0.917
1964    401     438       0.929
1965    433     473       0.945
1966    466     512       0.972
1967    492     547       1.000
1968    537     590       1.042
1969    576     630       1.098
Data Handling
1. Data Transformation
You can transform most data using the GENR button in the workfile menu bar.
The following are the most commonly used functions and operations.
+, -, *, /
>, <, =, <>
<=, =>
AND, OR
X2=X^2
LY=LOG(X)
EX=EXP(X)
AX=ABS(X)
SQX=SQR(X)
RND, NRND
RX=@INV(X)
DX=D(X)
DnX=D(X,n)
LX=X(-1)
2. Descriptive Statistics
@SUM(X)       Sum of X
@MEAN(X)      Mean of X
@VAR(X)       Variance of X
@COV(X,Y)     Covariance between X and Y
@COR(X,Y)     Correlation between X and Y
@DNORM(X)     Standard normal density function of X
@CNORM(X)     CDF of standard normal random variable
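To make the definitions behind these functions concrete, here is a small Python sketch of the same quantities, using the INCOME and CONS values from the introductory session. It is an illustration of the textbook formulas, not EViews code. The variance and covariance below divide by n; whether EViews divides by n or n-1 is not stated in this guide, so treat that choice as an assumption.

```python
from math import sqrt
from statistics import NormalDist, fmean

income = [350, 364, 385, 405, 438, 473, 512, 547, 590, 630]
cons = [325, 335, 355, 375, 401, 433, 466, 492, 537, 576]


def variance(v):
    """Average squared deviation from the mean (divisor n, an assumption)."""
    m = fmean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def covariance(v, w):
    """Average cross-product of deviations from the means (divisor n)."""
    mv, mw = fmean(v), fmean(w)
    return sum((a - mv) * (b - mw) for a, b in zip(v, w)) / len(v)


def correlation(v, w):
    """Covariance scaled by the two standard deviations."""
    return covariance(v, w) / sqrt(variance(v) * variance(w))


std_normal = NormalDist()  # mean 0, standard deviation 1

print("sum        ", sum(income))
print("mean       ", fmean(income))
print("variance   ", variance(income))
print("covariance ", covariance(income, cons))
print("correlation", correlation(income, cons))
print("density at 0 ", std_normal.pdf(0.0))
print("CDF at 1.96  ", std_normal.cdf(1.96))
```

The correlation does not depend on the choice of divisor, because the divisor cancels between the numerator and the denominator.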
3. Regression Statistics
If you assign the name of your Object for your regression, you can use the regression
name to retrieve various regression statistics. For example, assume our regression
name is TEST. Then, TEST@R2 is an R2 value of the TEST regression. If you did
not assign regression name, @R2 refers to R2 value of the most recently estimated
equation.
@R2, @RBAR          R2 and adjusted R2
@SE, @SSR           Standard error of regression, sum of squared residuals
@DW, @F, @LOGL      Durbin-Watson statistic, F-statistic, value of log-likelihood function
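For readers who want to see how these statistics relate to one another, the following Python sketch computes most of them from the residuals of a simple regression of CONS on a constant and INCOME. It uses the standard textbook formulas; the function names and dictionary keys are made up for illustration, and it is not a description of how EViews computes its output.

```python
cons = [325, 335, 355, 375, 401, 433, 466, 492, 537, 576]
income = [350, 364, 385, 405, 438, 473, 512, 547, 590, 630]


def ols_simple(y, x):
    """Least squares fit of y = b0 + b1*x; returns b0, b1 and the residuals."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    return b0, b1, resid


def regression_stats(y, resid, k):
    """Textbook summary statistics; k counts coefficients including the constant."""
    n = len(y)
    my = sum(y) / n
    ssr = sum(e * e for e in resid)                 # sum of squared residuals
    sst = sum((b - my) ** 2 for b in y)             # total sum of squares
    r2 = 1.0 - ssr / sst
    rbar2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)    # adjusted R2
    se = (ssr / (n - k)) ** 0.5                     # standard error of regression
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / ssr
    f = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))     # test of all slopes = 0
    return {"r2": r2, "rbar2": rbar2, "se": se, "ssr": ssr, "dw": dw, "f": f}


b0, b1, resid = ols_simple(cons, income)
summary = regression_stats(cons, resid, k=2)
print("intercept", b0, "slope", b1)
for name, value in summary.items():
    print(name, value)
```

The adjusted R2 can never exceed R2, and the Durbin-Watson statistic always lies between 0 and 4, which gives two easy sanity checks on any regression output.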
Econometric Review
This section provides a quick review of econometric techniques commonly used in
empirical economic research projects. However, this section does not intend to teach
econometric theory. Those who are interested in learning more econometric theory can
read Gujarati's (1996) Basic Econometrics, 3rd ed., or a more advanced book, Econometric
Analysis, 4th ed., by W. Greene (2000).
In this section, I assume that the data are already loaded and ready to use and that all the
variables are defined. If you are not familiar with this, go back to steps 2 and 3 in the
previous section. I will provide both the menu bar route and the EViews command for
each operation whenever possible. Throughout this section, I will assume the following
regression model and variable notation. I also assume that you know how to interpret
regression output.
$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + u_t$
$Y_t$ (or $Y_{t1}$, $Y_{t2}$) is a dependent variable, $X_{t1}$ and $X_{t2}$ are independent variables, and the $\beta$s
are parameters to estimate.
1. Ordinary Least Squares (OLS) Estimation.
This is the most commonly and widely used estimation method when classical
assumptions of regression are all met. The classical assumptions are:
1) $E(u_t \mid X_t) = 0$
2) $X_t$ and $u_t$ are uncorrelated, $E(X_t u_t) = 0$
3) Error terms are homoskedastic, $\mathrm{Var}(u_t \mid X_t) = E(u_t^2 \mid X_t) = \sigma^2$
$u_t \sim N(0, \sigma^2)$
u t ~ N 0, X
LS Y C X1 X2 AR(1)
LS Y C X1 X2 AR(1) AR(2)
Regression output reports parameter estimates of the model and the
autocorrelation parameters.
2.4. ARIMA Model.
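When the error term follows the general ARIMA(p,0,q) structure such that
$u_t = \rho_1 u_{t-1} + \cdots + \rho_p u_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$, we have the ARIMA model.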
Quick|Estimate Equation and type Y C X1 X2 AR(1) MA(1) in the Equation
Specification window. Choose Least Squares in the Estimation Settings
window.
LS Y C X1 X2 AR(1) MA(1)
3. Autoregressive Conditional Heteroskedasticity ((G)ARCH).
The ARCH model is similar to the ARIMA model, but it assumes the ARIMA-type
relationship in the second moment, i.e., the conditional variance of $u_t$ has
autoregressive (ARCH) and/or moving average (GARCH) components in its
heteroskedasticity structure. These models are frequently used to analyze
financial data in which periods of large price volatility (measured by the variance)
are followed by periods of relative tranquility.
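$h_t = E(u_t^2 \mid \Omega_{t-1})$: Conditional variance of $u_t$,
$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + u_t$,
3.1. ARCH(p)
$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_p u_{t-p}^2$
3.2. GARCH(p,q)
$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \cdots + \alpha_p u_{t-p}^2 + \gamma_1 h_{t-1} + \gamma_2 h_{t-2} + \cdots + \gamma_q h_{t-q}$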
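The conditional variance recursion is easy to trace by hand for a few periods. The Python sketch below does that for given residuals and given parameter values. It only evaluates the variance equation; it does not estimate anything. The function name, the parameter names alpha0, alphas and gammas, the numbers, and the choice to start every pre-sample squared residual and variance at h0 are all assumptions made for illustration.

```python
def garch_variance(residuals, alpha0, alphas, gammas, h0):
    """Conditional variances h_t from
    h_t = alpha0 + sum_i alphas[i-1] * u_{t-i}^2 + sum_j gammas[j-1] * h_{t-j}.

    Pre-sample values of u^2 and h are all set to h0 (an assumption).
    With gammas=[] this reduces to an ARCH(p) recursion.
    """
    p, q = len(alphas), len(gammas)
    h = []
    for t in range(len(residuals)):
        value = alpha0
        for i in range(1, p + 1):
            u_squared = residuals[t - i] ** 2 if t - i >= 0 else h0
            value += alphas[i - 1] * u_squared
        for j in range(1, q + 1):
            h_lag = h[t - j] if t - j >= 0 else h0
            value += gammas[j - 1] * h_lag
        h.append(value)
    return h


# Hypothetical residuals with one large shock in the second period.
residuals = [0.5, -2.0, 0.3, 0.1]

garch = garch_variance(residuals, alpha0=0.1, alphas=[0.2], gammas=[0.7], h0=1.0)
arch = garch_variance(residuals, alpha0=0.1, alphas=[0.2], gammas=[], h0=1.0)

for t, (hg, ha) in enumerate(zip(garch, arch)):
    print(f"t={t}  GARCH(1,1) h={hg:.4f}  ARCH(1) h={ha:.4f}")
```

The large residual in the second period raises the conditional variance of the third period in both models, but only the GARCH version carries part of that increase forward into later periods through the lagged variance term.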
$Y_{it} = X_{it}'\beta + u_{it}$.
Depending on the error structure of $u_{it}$, we can allow for cross-sectional
heteroskedasticity or cross-sectional correlation. To estimate this model, you
need to understand the way EViews handles the pooled data structure. (See the
previous section for pooled data handling.)
To estimate this model, click Objects|New Object|Pool, and you will get the Pooled
Estimation window. Specify the appropriate entries (dependent variable, sample
period and regressors). In the Regressors window, you need to specify which
variables have common coefficients and which have different coefficients
(cross section specific coefficients). In the above model, we assume that all variables
have the same coefficients (no cross section specific coefficients). To allow for
cross-sectional heteroskedasticity, select Cross section weights, and to allow for
cross-sectional correlation, select SUR estimation. For the intercept, choose either none or
common.
4.2. Panel Data Analysis
Panel data analysis is a special case of pooled time series and cross section data
analysis. It is often found with longitudinal data, where the same
individual is followed over time. Therefore, there may exist an individual-specific
effect (heterogeneity) that is constant over time. There are two ways to handle
this problem: fixed effects or random effects.
$Y_{it} = \alpha_i + X_{it}'\beta + u_{it}$
$Y_{it} = X_{it}'\beta + u_i + \varepsilon_{it}$
$Y_t^* = \beta_0 + \beta_1 X_t + u_t$
The latent variable ($Y_t^*$) is unobservable, but the binary variable $Y_t$ is observed as one
if $\beta_0 + \beta_1 X_t + u_t > 0$ and zero otherwise. Depending on the assumption about the
error term, we define either the Logit model ($u_t$ has a logistic distribution) or the Probit
model ($u_t$ has a normal distribution). These models are estimated by maximum
likelihood estimation with numerical iteration (typically by the Newton-Raphson or
BHHH method).
Quick|Estimate Equation and type Y C X1 X2 in the Equation Specification
window. Choose Logit or Probit in the Estimation Settings window. Specify
appropriate sample period.
LOGIT Y C X1 X2
PROBIT Y C X1 X2
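Once the coefficients are estimated, the two models differ only in the distribution function used to turn the linear index into a probability. The short Python sketch below illustrates this for a single regressor. The coefficient values b0 and b1 are hypothetical, chosen only to show the calculation, and the function names are made up; this is not EViews output.

```python
from math import exp
from statistics import NormalDist


def logit_prob(index):
    """P(Y=1) when the latent error is logistic: the logistic CDF of the index."""
    return 1.0 / (1.0 + exp(-index))


def probit_prob(index):
    """P(Y=1) when the latent error is standard normal: the normal CDF of the index."""
    return NormalDist().cdf(index)


# Hypothetical coefficient values, for illustration only.
b0, b1 = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    index = b0 + b1 * x
    print(f"X={x:3.1f}  index={index:5.2f}  "
          f"logit P={logit_prob(index):.4f}  probit P={probit_prob(index):.4f}")
```

Both functions return 0.5 when the index is zero and increase with the index. Because the logistic and standard normal distributions have different variances, coefficient estimates from the two models are not directly comparable in size.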
6. Non-Linear Least Squares.
EViews automatically applies nonlinear least squares to any equation that is nonlinear
in its coefficients. You can simply specify the nonlinear equation in the Equation
Specification window. For example, suppose you want to estimate the CES production
function with the following specification:
$Y_t = A\left[\delta K_t^{-\rho} + (1-\delta) L_t^{-\rho}\right]^{-1/\rho}$,
$\Delta y_t = \gamma y_{t-1} + u_t$
$\Delta y_t = \alpha + \gamma y_{t-1} + \beta t + u_t$
This is the basic Dickey-Fuller unit root test equation, and the testable hypothesis
is $\gamma = 0$ (i.e., $\rho = 1$ in $y_t = \rho y_{t-1} + u_t$, so $y_t$ has a unit root). There are
three different tables of critical values
depending on your testable equation (w/ or w/o intercept and/or trend variable).
A more general version of the original DF test is the Augmented DF test (ADF), as
follows.
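$\Delta y_t = \gamma y_{t-1} + \sum_{p=1}^{P} \delta_p \Delta y_{t-p} + u_t$
$\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{p=1}^{P} \delta_p \Delta y_{t-p} + u_t$
$\Delta y_t = \alpha + \gamma y_{t-1} + \beta t + \sum_{p=1}^{P} \delta_p \Delta y_{t-p} + u_t$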
In the ADF test, we assume the error terms ($u_t$) are independent and have
constant variance. Also, the lag length P in the regression equation is rather
Even though these equations look simpler than those of the ADF test, this test allows a far
more general data generating process than is allowed by the ADF test. Both tests use the
same critical values.
Select the series you want to test for a unit root, and double click it (or use View|Show)
to get the series window. Click View|Unit Root Test and choose
the appropriate options (ADF or Phillips-Perron, and the appropriate equation for the unit
root test).
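To show what the simplest Dickey-Fuller regression involves, the Python sketch below regresses the first difference of a series on its own lagged level, with no intercept and no trend, and reports the t-ratio on the lagged level. The simulated series, the seed and the function names are assumptions made for illustration. The resulting t-ratio has to be compared with Dickey-Fuller critical values rather than the usual t tables, and those tables are not reproduced here.

```python
import random


def dickey_fuller_t(y):
    """Estimate dy_t = gamma * y_{t-1} + u_t by least squares (no intercept,
    no trend) and return gamma with its t-ratio."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    y_lag = y[:-1]
    sxx = sum(v * v for v in y_lag)
    gamma = sum(a * b for a, b in zip(y_lag, dy)) / sxx
    resid = [d - gamma * v for d, v in zip(dy, y_lag)]
    n = len(dy)
    s2 = sum(e * e for e in resid) / (n - 1)   # one estimated coefficient
    std_err = (s2 / sxx) ** 0.5
    return gamma, gamma / std_err


def simulate_ar1(rho, n, seed):
    """Generate y_t = rho * y_{t-1} + e_t with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


walk = simulate_ar1(1.0, 200, seed=1)         # has a unit root by construction
stationary = simulate_ar1(0.2, 200, seed=1)   # clearly stationary

for name, series in (("random walk", walk), ("AR(1), rho=0.2", stationary)):
    gamma, t_ratio = dickey_fuller_t(series)
    print(f"{name:15s} gamma={gamma:8.4f}  t-ratio={t_ratio:8.3f}")
```

For the stationary series the estimate of gamma is close to rho - 1 = -0.8 and the t-ratio is strongly negative, which is the pattern that leads to rejecting the unit root hypothesis.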
7.2. Vector Autoregression (VAR)
VAR is a system of stationary time series variables. Each equation has the same
right-hand side variables consisting of exogenous variables and the lagged values
of all endogenous variables in the system. This system is often used to determine
the causality (Granger-causality) between variables. This system is also useful to
investigate the external shock effects on the endogenous variables using impulse
response function.
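$Y_t = \beta_{10} + \beta_{11} Y_{t-1} + \beta_{12} Z_{t-1} + \beta_{13} t + u_{1t}$
$Z_t = \beta_{20} + \beta_{21} Y_{t-1} + \beta_{22} Z_{t-1} + \beta_{23} t + u_{2t}$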
With multiple non-stationary time series, it is possible that more than one
linear relationship forms a cointegrating relationship. The number of such relationships
is called the cointegration rank.
For the cointegration test, select the series (group of variables) to test and
open the group window. Choose View|Cointegration Test and specify the appropriate
settings for testing. The settings determine whether to include an intercept and/or a
linear deterministic time trend in the cointegration equation.
7.4. Error Correction Model
If two or more non-stationary time series are cointegrated, then there exists an
Error Correction Model (ECM). Cointegration is a necessary condition for ECM.
ECM describes the long run equilibrium relationship between non-stationary
series. Even though individual series are non-stationary, when they are
cointegrated, there is a long run equilibrium relationship, and ECM explains this
relationship.
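$\Delta X_t = m_1 + \gamma_{11} \Delta X_{t-1} + \gamma_{12} \Delta Y_{t-1} + \lambda_1 Z_{t-1} + u_{1t}$
$\Delta Y_t = m_2 + \gamma_{21} \Delta X_{t-1} + \gamma_{22} \Delta Y_{t-1} + \lambda_2 Z_{t-1} + u_{2t}$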
ECM is similar to VAR, but the original series are non-stationary and
cointegrated. To estimate ECM, follow the same path as for VAR estimation:
click Objects|New Object|VAR and select the appropriate entries. For the VAR specification,
choose Vector Error Correction, and specify the appropriate cointegration equation
(i.e., w/ or w/o intercept and/or deterministic time trends).
8. System of Equations
When we have more than one equation to estimate together, we can use the additional
information from the other equations to improve the efficiency of the parameter estimates.
8.1. Seemingly Unrelated Regression
$Y_{1t} = \beta_{10} + \beta_{11} X_{1,t1} + \beta_{12} X_{1,t2} + u_{1t}$
$Y_{2t} = \beta_{20} + \beta_{21} X_{2,t1} + \beta_{22} X_{2,t2} + u_{2t}$
involves numerical iteration, you can choose appropriate number of iterations and
convergence criteria in the Options button.
8.2. Simultaneous Equation System
$Y_{1t} = \beta_{10} + \beta_{11} X_{1t} + \beta_{12} Y_{2t} + u_{1t}$
$Y_{2t} = \beta_{20} + \beta_{21} X_{2t} + \beta_{22} Y_{1t} + u_{2t}$
This equation system differs from the SUR model in the sense that the dependent
variables appear on the right hand side of each equation. Because of this
endogeneity problem, simple OLS applied to each equation will yield inconsistent
estimators. To estimate this simultaneous equation system, each equation should
first satisfy the identification conditions: the order condition and the rank condition.
8.2.1. Two Stage Least Squares (TSLS).
This is one of the most often used estimation methods for simultaneous
equations. The first stage of TSLS estimation is the regression of each
endogenous variable on all exogenous variables in the system and any other
instrumental variables. The second stage is the least squares estimation of the
structural equations using the fitted values of the endogenous variables
from the first stage. The structural parameters are estimated for each equation
separately.
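The two stages can be traced by hand in a stripped-down case. The Python sketch below uses one structural equation with a single endogenous regressor (y2) and a single instrument (z), and drops the other exogenous regressors for brevity, so it is simpler than the system discussed in this section. The data values and all names are hypothetical. It illustrates the mechanics only and is not how EViews implements TSLS.

```python
def mean(v):
    return sum(v) / len(v)


def cov(v, w):
    """Average cross-product of deviations from the means."""
    mv, mw = mean(v), mean(w)
    return sum((a - mv) * (b - mw) for a, b in zip(v, w)) / len(v)


def ols(y, x):
    """Least squares fit of y = intercept + slope * x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope


# Hypothetical data: z is the instrument, y2 the endogenous regressor,
# y1 the dependent variable of the structural equation.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
y1 = [3.0, 4.1, 6.2, 6.8, 9.1, 9.9]

# Stage 1: regress the endogenous regressor on the instrument, keep fitted values.
a0, a1 = ols(y2, z)
y2_hat = [a0 + a1 * v for v in z]

# Stage 2: regress the dependent variable on the stage 1 fitted values.
b0, b1 = ols(y1, y2_hat)

# With one instrument, the stage 2 slope equals the simple IV ratio.
iv_slope = cov(z, y1) / cov(z, y2)
print("two-stage slope:", b1)
print("IV ratio       :", iv_slope)
```

The coefficient estimates from running the second stage by hand are the TSLS estimates, but the standard errors printed by an ordinary second-stage regression are not the correct TSLS standard errors, because they are based on residuals computed with the fitted values instead of the actual endogenous variable.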
8.2.2. Three Stage Least Squares (3SLS)
3SLS is a more efficient estimation procedure than TSLS in the sense that 3SLS
estimates all structural parameters at once. The first two stages of 3SLS
are equivalent to TSLS, but 3SLS uses the TSLS estimates to estimate the covariance
structure of the entire system. Using the estimated covariances of the
system, the final stage (the third stage) is GLS estimation of the
entire system.
To use either of the above estimation methods, click Objects|New Object|System
as above for SUR estimation. In the System window, type in only the behavioral
equations. Behavioral equations are the ones with structural parameters to
estimate. Ignore any identities in the system. Type your equations in the form
Y1=C(1)+C(2)*Y2+C(3)*X1 for the first equation and Y2=C(4)+C(5)*Y1+C(6)*X2
for the second, etc. Since this is TSLS or 3SLS estimation, you need to specify
instrumental variables for the first stage estimation, and you list them after the
structural equations. For TSLS and 3SLS, use all exogenous variables in the system as
instrumental variables. Type INST X1 X2. The constant is automatically included as an
instrumental variable. Click Estimate in the System menu bar. You will have a
choice of different estimation methods. Choose either Two Stage Least Squares
or Three Stage Least Squares.
$S = m_T' W_T m_T$, where $m_T$ is
defined as $m_T = \left( \frac{1}{T}\sum_{t=1}^{T} u_t,\ \frac{1}{T}\sum_{t=1}^{T} u_t X_t \right)'$