Time Series Analysis Methods
See also: latent growth curve modeling, event history analysis, Cox regression.
Cohort analysis design. The cohort design involves study of a group which has experienced a major event, such as those
who were of draft age during the Vietnam War. Thus, a cohort is a set of people who experienced an event at roughly the
same time. A standard cohort table is constructed in which the columns are the times the data are collected and the rows
are the groups (usually age groups, such as 21-30, 31-40, etc) from whom data are collected. Ideally, the data collection
time interval (the survey interval) is the same as the age span of the groups (the cohort interval) being studied. In a
standard cohort table, with means of the variable of interest in the cells, inter-cohort cross-sectional comparisons for
different age groups at the same time period are read down the columns, cross-period comparisons for the same age group
(and hence different birth cohorts) over time are read across the rows, and intra-cohort trends, which follow the same birth
cohort as it ages, are read diagonally from upper left to lower right.
Panel studies design. Panel studies are similar to cohort analyses except the same individuals are interviewed in each
period, whereas in cohort studies only a random sample of the same age group is interviewed. In panel designs, such as
those built into the National Election Studies, individuals serve as their own controls across survey periods, whereas in
cohort analyses, random sampling is assumed to serve a control function, assuring similarity of a cohort by gender, race,
etc., as it is measured in successive time periods. Where cohort designs measure net change, panel designs measure gross
change (for instance, many more people may switch party identification in a given period than are measured by the net
effect of all such switches). Normally panel designs cannot measure net change for an entire population because only
certain birth cohorts are studied as they evolve over time, and new cohorts entering the population are not studied. Unlike
cohort studies, panel studies have the special problem of sensitization: subjects may react differently on a second survey
simply because they have had the experience of the first one. To check for a sensitization effect, panel studies may require
control groups matched to the panel groups, further increasing the great expense of this design. Panel data regression is
discussed in the section on multiple regression.
Event-history design. This is a type of panel study in which the periods of observation are not arbitrarily spaced but
instead are taken at each stage of a sequence of events. The timing and spacing of observations thus become critical
variables in their own right. Moreover, often in event history studies the data are not interviews of individuals but rather
measurements pertaining to organizations or even governments. Event history analysis is covered more fully in a separate
section.
Time series effects. There are three types of time-series effects, having to do with age, period, and cohort. Disentangling
these three types of effects for one set of time-series data is a major challenge of time series analysis, discussed below.
Age effects are effects related to aging or the life-cycle. For instance, individuals often tend to become more
conservative as they age.
Period effects are effects affecting all cohorts in a given historical period. For instance, individuals who
experienced the Great Depression became more likely to support social welfare policies.
Cohort effects are effects which reflect the unique reaction of a cohort to an historical event, or which were
experienced uniquely by the cohort. For instance, the post-WWII cohort which reached draft age during the
Vietnam War experienced unique issues which seem to be associated with increased alienation from government.
Dependence in a time series refers to serial dependence -- that is, the correlation of observations of one variable at one
point in time with observations of the same variable at prior time points. It is the object of many forms of time series
analysis to identify the type of dependency which exists, then to create mathematical formulae which emulate the
dependence, and only then to proceed with forecasting or policy analysis.
Stationarity occurs in a time series when the mean value of the series remains constant over the time series. Frequently,
differencing (see below) is needed to achieve stationarity. A stricter definition of stationarity also requires that the
variance remain homogeneous over the series. Sometimes this can be achieved by taking the logarithm of the data. Because
many time series (most economic indicators, for instance) tend to rise, simple application of regression methods to time
series encounters spurious correlations and even multicollinearity. A first step in time series analysis is to achieve
stationarity in the data to avoid these problems. Stationarity is discussed further under "Assumptions."
Differencing is a data pre-processing step which attempts to de-trend data to control autocorrelation and achieve
stationarity by subtracting the preceding value from each datum in the series. Single differencing is used to de-trend linear trends.
Double differencing is used to de-trend quadratic trends. Differencing will drive autocorrelation toward 0 or even in a
negative direction. As a rule of thumb, if single differencing yields a lag-1 autocorrelation spike in the ACF plot (discussed below)
more negative than -.5, then there has been over-differencing. Over-differencing is also indicated by a residual plot where the residuals change
sign from one time period to the next. As another rule of thumb, the optimal level of differencing will be the one with
the lowest standard deviation of the differenced series. Sometimes adding moving average (q) terms (see below) to the model will compensate for
moderate over-differencing. Note that with each degree of differencing, the time series is shortened by one observation.
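For readers working outside SPSS, the differencing step and the over-differencing check can be sketched with pandas and statsmodels; the series y below is simulated (hypothetical), and the thresholds used are only the rules of thumb given above.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

# Hypothetical series with a linear trend plus noise.
rng = np.random.default_rng(0)
y = pd.Series(50 + 0.8 * np.arange(120) + rng.normal(0, 2, 120))

d1 = y.diff().dropna()         # single differencing: removes a linear trend
d2 = y.diff().diff().dropna()  # double differencing: removes a quadratic trend

# Rules of thumb: a lag-1 autocorrelation much more negative than -.5 after
# differencing suggests over-differencing; the differencing level with the
# lowest standard deviation is often preferred.
for name, s in [("raw", y), ("d=1", d1), ("d=2", d2)]:
    lag1 = acf(s, nlags=1, fft=False)[1]
    print(f"{name}: lag-1 autocorrelation = {lag1:.2f}, sd = {s.std():.2f}")
```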
Specification. Specification may involve testing for linear vs. nonlinear dependence, followed by specifying either linear
models (AR (auto-regressive), MA (moving average), ARMA (combined), or ARIMA (combined, integrated)) or nonlinear
models (TAR (threshold autoregressive), Bilinear, EXPAR (exponential autoregressive), ARCH (auto-regressive
conditional heteroscedastic), GARCH (generalized ARCH)). The procedures in SPSS that handle autoregressive models
are AREG (for AR(1) models) and ARIMA (for more general models), which are found in the SPSS Trends module.
ARIMA is also found in SPSS under Analyze, Time Series, ARIMA.
Exponential Smoothing
Exponential smoothing is a form of time series analysis in which the researcher is interested less in modeling than in sheer
prediction of the next period based on data about the current and recent periods. It is used for short-term, "one step
ahead" forecasting in practical situations, and is not appropriate for long or even medium-term forecasting.
Weighting. A key question in forming an exponential smoothing line is how much to count the current period, the previous
period, and earlier periods when estimating the value in the next period. Exponential smoothing software allows up to four
weighting parameters to be assigned:
1. Alpha: This parameter affects level estimates and varies from 0 (old observations count just as much as the current
observation) to 1 (the current observation is used exclusively).
2. Gamma: This parameter affects trend estimates and varies from 0 (trends are based on all observations counting
equally) to 1 (the trend is only based on the most recent observations in the series). Gamma is only used if there is a
trend in the series.
3. Phi: This parameter affects damping estimates and is used instead of gamma when it is thought that the series is
dying out. A value of phi near .1 uses all observations to estimate any trend toward dying out, whereas phi values
near .9 respond rapidly to any recent observations indicating the trend is dying.
4. Delta: This parameter affects seasonality estimates and is used only when the data shows seasonal cycles. Delta
values near 0 cause all points to count equally while delta values near 1 estimate seasonality primarily from recent
observations.
Selecting which parameters are needed. The researcher begins by considering which parameters need to be set. Only
alpha needs to be set if there is no seasonality, no damping (the trend is not dying out), and no trend (the series varies
randomly about its mean). For alpha to apply, there does need to be autocorrelation (memory), meaning that adjacent
points are not random but tend to be relatively close together, even if there is no overall trend. Autocorrelation, trends,
damping, and seasonality can all be assessed by looking at a sequence plot of the time series, obtained in SPSS by
selecting Graphs, Sequence. (The Sequence Plot dialog box also provides for logarithmic transformations, differencing,
and seasonality differencing).
Grid search for needed parameters. To help estimate smoothing parameters, SPSS provides a grid search function. One
chooses Analyze, Time Series, Exponential Smoothing from the menus, then requests the grid search in the Exponential Smoothing dialog box. The
grid search function causes SPSS to create a sequence of equally spaced values for alpha and for each value calculates a
measure of how well the predictions agree with the actual values. The parameters that produce the smallest SSE (sum of
squared errors) are the best-fitting parameters. By default, SPSS displays the 10 best-fitting sets of parameters and their
corresponding SSE values. (Warning: if you are estimating more than one parameter, the size of the grid grows
exponentially). After the grid search comes up with the optimal parameter setting, SPSS adds two new series to your file.
The series fit_1 contains the predicted values from the exponential smoothing, and err_1 contains the errors. Select
Graphs, Sequence, from the SPSS menus to obtain a sequence plot of the new smoothed series fit_1. The original series
amount and the fit_1 forecasts are both shown, and their correspondence indicates the degree to which the exponential
smoothing forecasts are tracking actual values.
Models. SPSS provides exponential smoothing in the menu system under Analyze, Time Series, Exponential Smoothing.
The Exponential Smoothing dialog box allows the user to select from four models:
1. Simple. The series has no overall trend and shows no seasonal variation.
2. Holt. The series has a linear trend but shows no seasonal variation.
3. Winters. The series has a linear trend and shows multiplicative seasonal variation. You cannot select this option
unless you have defined seasonality with the Define Dates command.
4. Custom. You can specify the form of the trend component and the way in which the seasonal component is applied.
Residual analysis. The SPSS exponential smoothing procedure automatically adds the variable err_1 to the file.
This series is simply the difference between the actual value and the predicted value. It can be plotted by selecting
Graphs, Sequence from the SPSS menus. This plot is inspected to assure that residuals are randomly distributed. A finding
of non-randomness indicates the model is inadequate.
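Outside SPSS, roughly comparable simple, Holt, and Winters models can be fit with statsmodels' Holt-Winters routines; the monthly series below is simulated, and the parameter names (smoothing_level for alpha, and so on) are statsmodels' own, so this is only a sketch of the procedure described above, not the SPSS dialog.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

# Simulated monthly series with a trend and a 12-month seasonal cycle.
rng = np.random.default_rng(1)
t = np.arange(96)
sales = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 96))

simple = SimpleExpSmoothing(sales).fit(smoothing_level=0.3)   # alpha fixed at .3
holt = ExponentialSmoothing(sales, trend="add").fit()         # level and trend weights optimized
winters = ExponentialSmoothing(sales, trend="add",
                               seasonal="mul", seasonal_periods=12).fit()

# Residual analysis: errors from an adequate model should look random.
resid = sales - winters.fittedvalues
print(winters.params["smoothing_level"], resid.std())
print(winters.forecast(3))   # short-term, "one step ahead" style forecasts
```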
Regression models follow the strategy of dividing the time series into two parts. The regression model is developed for the
estimation dataset (also called the historical or training dataset). Then it is validated by running it on the hold-out dataset
(also called the validation dataset).
Inspecting the series. Selecting Graphs, Sequence, from the SPSS menu creates a plot of the series. By inspecting the
series, the researcher gains a rough impression of whether it would be reasonable to think that some sort of curve might be
fitted to the pattern displayed.
Fitting curves. Select Analyze, Regression, Curve Estimation from the SPSS menus. In the Curve Estimation dialog box's
"Models" section, one may check the type of curve wanted: linear, power, quadratic, cubic, inverse, logistic, exponential,
or other. Only one independent is allowed. Click the Save button in the dialog to save predicted and/or residual values for
each model, for purposes of later comparison. Note that the Save button allows the researcher to specify the range of
observations to be predicted.
Output. For each model selected, SPSS output will show the estimated parameters (the constant and b coefficients).
Thus the formula for a quadratic model will be Dependent = b1*case + b2*case-squared + b0, where case is the sequential
case number (representing the time variable).
Validation. While selecting the model with the highest R-squared is tempting, it is not the recommended method. For
instance, a cubic model will always have a higher R-squared than a quadratic model. The recommended method for
selecting which model is best is cross-validation. That is, the formulas for each model based on the estimation dataset are
applied to the hold-out dataset, then the R-squares are compared based on output for the hold-out dataset. Alternatively,
the determination may be made graphically by overlaying sequence plots of both models for the hold-out dataset.
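The cross-validation strategy just described can be sketched outside SPSS with ordinary regression on the case number; the series, the 80/20 split, and the candidate curves below are all hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
case = np.arange(1, 101, dtype=float)          # sequential case number = time
y = 5 + 0.4 * case + 0.01 * case ** 2 + rng.normal(0, 4, 100)

est, hold = slice(0, 80), slice(80, 100)       # estimation vs hold-out periods

def holdout_r2(X):
    """Fit on the estimation period, report R-squared on the hold-out period."""
    model = sm.OLS(y[est], sm.add_constant(X[est])).fit()
    resid = y[hold] - model.predict(sm.add_constant(X[hold]))
    return 1 - resid.var() / y[hold].var()

X_linear = case.reshape(-1, 1)                      # Dependent = b0 + b1*case
X_quadratic = np.column_stack([case, case ** 2])    # Dependent = b1*case + b2*case-squared + b0
print("hold-out R-squared, linear:", round(holdout_r2(X_linear), 3))
print("hold-out R-squared, quadratic:", round(holdout_r2(X_quadratic), 3))
```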
Leading Indicator Regression. While simple curve-fitting uses time (the sequential case number) as the predictor
variable, in some settings one or more leading indicators may be available. A leading indicator, of course, is a variable
whose value in the present period is a good predictor of the dependent variable in a future period.
Obtaining stationarity. The researcher can determine if the dataset has leading indicators by using the cross-correlation
function (CCF) discussed below. However, CCF can only be used if the time series is stationary. In the
common situation where all variables are increasing over time, cross correlation will be spurious: all variables will
appear to be leading indicators of all other variables simply because all are increasing. Stationarity of a time series is
achieved through differencing, which is subtracting the previous value from the current value. First-order
differencing subtracts once and removes linear trends. Second-order differencing subtracts twice and removes
quadratic trends. Not all series can be rendered stationary by differencing. Differencing can be done on the fly
inside the CCF option in SPSS.
Using cross-correlation to identify leading indicators and lags. CCF is found in the SPSS menu system under
Graphs, Time Series, Cross-Correlations. The Cross-Correlations dialog box allows the researcher to select any or all
time series variables and to apply first or second order differencing (or higher, but that is rarely done), as well as to
apply natural log transforms. One might apply cross-correlation to a suspected leading indicator and to the
dependent variable. Upon clicking OK, Cross-Correlations will yield a cross-correlation plot in which the x axis is
lags. The lag with the greatest correlation will show the highest bar. To put it another way, a good leading
indicator will have a high bar on one of the positive lags (1 lag ahead or greater).
Creating the leading indicator variable. After a leading indicator is found and the optimal number of lags ahead it
predicts is determined, the next step is to create a new variable which for any given time period contains the value
of the indicator from the proper number of lags ago. In SPSS Trends, select Transform, Create Time Series. In the
Create Time Series dialog box, move the indicator variable into the variables list. Then highlight the contents of the
Name text box and type the new variable name you want in its place. Then choose the Lag function from the Function
drop-down list. The Order text box shows a value of 1. Highlight this and replace it with a higher lag value if CCF so
indicated. Click Change. The New Variables list will now contain something like "leadvar=LAG(inquiries,3)", for
the case where the newly created leading indicator variable "leadvar" is the "inquiries" variable with a lag of 3 time
periods. Click OK to create the new time series. (Note that since there is a lag of 3 in this example, the first three
observations will have a period, representing a missing value, since the file lacks information about the index prior
to observation 1. Other observations will equal the value of "inquiries" three rows higher.)
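The same two steps -- checking cross-correlations and creating the lagged indicator -- look roughly like this in pandas and statsmodels; "inquiries" and "sales" are hypothetical series, and the three-period lead is built into the simulated data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import ccf

rng = np.random.default_rng(3)
inquiries = pd.Series(rng.normal(100, 10, 60))
sales = 0.8 * inquiries.shift(3) + rng.normal(0, 5, 60)   # sales respond three periods later

# Cross-correlate the dependent with the (stationary) indicator; the positive
# lag k with the largest correlation says the indicator leads by k periods.
xcorr = ccf(sales.dropna().to_numpy(), inquiries[3:].to_numpy())
print(np.argmax(xcorr[:13]), xcorr[:13].round(2))

# Equivalent of the SPSS "leadvar = LAG(inquiries, 3)" step: each row of
# leadvar holds the value of inquiries from three periods earlier, so the
# first three observations are missing.
leadvar = inquiries.shift(3)
print(leadvar.head())
```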
Linear regression. Select Analyze, Regression, Linear, from the SPSS Base menu system. Follow normal regression
procedures to specify the dependent variable and to make the new leading indicator variable the independent. If
cross-validation is to be used (recommended, see below), regress only the evaluation cases, saving a hold-out
portion of the time series for validation. Note that time series regression frequently violates the regression
assumption of uncorrelated errors. When this happens, the significance levels and goodness-of-fit statistics reported
by Linear Regression are unreliable. Nonetheless, one can still use the regression equation to make forecasts on the
basis of a leading indicator. The regression coefficients themselves are not biased by the autocorrelated errors.
Cross-validation. To apply the linear regression model to all observations in the time series, including the hold-out
data, from the SPSS menu select Transform, Compute. In the Compute Variable dialog box, let the Target Variable
be a new variable such as "predict". In the Numeric Expression text box enter the regression formula computed in
the linear regression step above. It will take a form such as "32 + 1.54*leadvar", where "32" is the constant, "1.54"
is the b coefficient for leadvar, and "leadvar" is the leading indicator, which will be the lagged version of some other
variable ("inquiries" in the example above). Click OK to create the new variable "predict". Go to Data, Select
Cases, and select All Cases. For graphical cross-validation, obtain a sequence plot for "predict" and the "dependent"
variable, to visually inspect how well "predict" tracks the dependent not only for the evaluation cases but also for
the hold-out validation observations.
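Continuing the hypothetical example, the regression and cross-validation steps might be sketched as follows; the 45-case estimation period and all variable names are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
inquiries = pd.Series(rng.normal(100, 10, 60))
sales = 0.8 * inquiries.shift(3) + rng.normal(0, 5, 60)

df = pd.DataFrame({"sales": sales, "leadvar": inquiries.shift(3)}).dropna()
est, hold = df.iloc[:45], df.iloc[45:]           # estimation vs hold-out cases

model = sm.OLS(est["sales"], sm.add_constant(est["leadvar"])).fit()
print(model.params)                              # constant and b coefficient for leadvar

# Apply the estimation-period equation to all cases (the SPSS Compute step),
# then check how well "predict" tracks the dependent in the hold-out period.
df["predict"] = model.params["const"] + model.params["leadvar"] * df["leadvar"]
print(df.loc[hold.index, ["sales", "predict"]].corr())
```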
Autoregression. SPSS Trends contains an Autoregression procedure which will correct for correlated error and allow
proper interpretation of significance and R-squared statistics associated with time series regression models. In the SPSS
menu system, autoregression is found under Analyze, Time Series, Autoregression. This opens the Autoregression dialog
box, where one may select a dependent variable and one (or more) independent variables. The researcher then selects
among three autoregression models:
1. Exact maximum-likelihood. A newer model which can handle missing data. It also can handle the case where one of
the independent variables is the lagged dependent variable.
2. Cochrane-Orcutt. A widely used model for first-order autoregression when there are no embedded missing values.
3. Prais-Winsten. This is a generalized least-squares (GLS) method. It cannot be used when a series contains embedded
missing values.
In the Autoregression dialog box, the researcher can also specify whether to include a constant in the model (the default) and
whether to save predicted values and residuals. The Autoregression procedure can create five new variables: fitted
(predicted) values, residuals, the standard errors of the prediction, and the lower and upper confidence limits of the
prediction.
The Autoregression procedure displays final parameters and goodness-of-fit statistics. The b coefficients and their
significance in an autoregression model (after autocorrelation has been removed) may be compared with the
corresponding coefficients in a simple regression model. Independents which were shown to be weak or insignificant in a
simple regression model may be revealed to be significant in an autoregression model. The parameter estimates in an
autoregression model are much more likely to represent the "true" relationships since correlated errors are taken into
account. Autoregression is discussed further in Chapter 9 of the SPSS Trends manual.
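SPSS-style autoregression is a form of feasible generalized least squares. A rough statsmodels analogue uses GLSAR, which iterates between OLS residuals and a GLS fit allowing AR(1) errors, in the spirit of the Cochrane-Orcutt and Prais-Winsten procedures; the data are simulated and this is not the SPSS algorithm itself.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                 # AR(1) errors with rho = .7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 2 + 1.5 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                  # significance levels unreliable here
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)   # GLS allowing first-order autoregressive error

print(ols.bse, glsar.bse)             # standard errors before and after the correction
print(glsar.model.rho)                # estimated autoregressive error parameter
```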
ARIMA Models
ARIMA models are Auto-Regressive Integrated Moving Average models, also called Box-Jenkins models. ARIMA models
predict a variable's present values from its past values. They are sometimes called "p,d,q models" because three parameters must
be specified in the SPSS Analyze, Time Series, ARIMA procedure (or similar in other packages), as described below.
ARIMA modeling involves three stages: (1) Identification of the initial p, d, and q parameters, using autocorrelation and partial
autocorrelation methods; (2) Estimation of the p (auto-regressive) and q (moving average) components to see if they contribute
significantly to the model or if one or the other should be dropped; and (3) Diagnosis of the residuals to see if they are random
and normally distributed, indicating a good model. An ARIMA (0,1,1) model means no autoregressive component, differencing
one time to remove linear trends, and a lag 1 moving average component. Lags: Note that lag n means an effect n weeks back, so
a spike in the data every four weeks would be one of lag 3, not 4.
Autoregressive component (p). Usually 0, 1, or 2. A value of p=0 means the raw data have no autocorrelation, p=1 means
current observations of the series are correlated with themselves at lag 1 (the most common situation), p=2 means
correlation at lag 2 also, and so on. An autoregressive component of p = 2 thus means that the dependent (the time series
value) is affected by the preceding two values, x(t-1) and x(t-2), independently.
Integrated component (d). Usually 0, 1, or 2. The d (integrated) component is simply 0 if the raw data are stationary to
begin with, 1 if there is a linear trend (the usual case), or 2 if there is a quadratic trend. Higher positive values are possible
but very rarely useful. An ARIMA (0,1,0) model is a random walk model in which differencing can be used to remove the
linear trend but the remaining variation cannot be explained on either an autoregressive or a moving average basis.
ARFIMA models are fractional ARIMA models which allow d to be other than integer. Whereas in ARIMA the
researcher specifies d a priori, in ARFIMA d is estimated using maximum likelihood methods. The (p,d,q) order of
an ARFIMA model is selected by comparing AIC statistics for alternative models. Point estimates of d, called the
"fractional integration parameter," may more accurately reflect the effect (persistence) of the given independent
variable and also serve to compare measures of the variable across samples. For instance, Ray (1993) found that
modeling an ARFIMA process as an ARIMA autoregressive process led to serious errors of forecasting when d is
near 0.5. However, ARFIMA requires large samples. Simulations by Crato and Ray (1995) demonstrated that a
large number of observations was necessary to obtain a significant advantage in using ARFIMA models. ARFIMA is
supported by the popular RATS time series program from Estima.
Moving average component (q). Usually 0, 1, or 2. A value of q=0 means prior shocks do not carry over into current
observations and the series is purely an autoregressive one. A setting of q=1 means current observations are correlated with shocks at lag 1, q=2 means
they are correlated with shocks at lag 2, and so on. Normally the researcher will set either p or q to a positive value but not
both, as that may overfit the solution to noise in the data.
The values of the p and q parameters may be inferred by looking at autocorrelation and partial autocorrelation functions as
discussed below.
Constants: When d=0 there is usually a constant term equal to the mean of the time series. When d=1 there is usually a
constant term reflecting the non-zero average trend. When d=2 there is normally no constant; if a constant is added, then
it reflects the value of the "trend in the trend."
There can also be components reflecting continuous variables entered as independents in the ARIMA model.
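As a concrete (non-SPSS) illustration, an ARIMA(0,1,1) model can be fit with statsmodels; the trending series below is simulated, and the summary shows the MA(1) coefficient with its test statistic, roughly analogous to the SPSS "Variables in the Model" output discussed further below.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
# Simulated trending series, so one order of differencing (d=1) is appropriate.
y = pd.Series(np.cumsum(0.5 + rng.normal(0, 1, 200)))

fit = ARIMA(y, order=(0, 1, 1)).fit()   # p=0, d=1, q=1
print(fit.summary())                    # MA(1) coefficient and its significance test
print(fit.aic)                          # for comparing candidate (p,d,q) specifications
```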
Autocorrelation and partial autocorrelation functions (ACF and PACF) can also be used to estimate p and q. Specifically, ACF
and PACF plots plot deviations from zero autocorrelation by time period: the larger the positive or negative autocorrelation for a
period, the longer the plot line to the right (positive) or left (negative) of zero. ACF and PACF are obtained in SPSS under
Graphs/Time Series/Autocorrelations.
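A minimal sketch of the same identification plots outside SPSS, using a simulated series that needs one order of differencing before inspection:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(6)
y = pd.Series(np.cumsum(rng.normal(size=200)))   # simulated non-stationary series
d1 = y.diff().dropna()                           # difference before identification

fig, axes = plt.subplots(2, 1)
plot_acf(d1, lags=24, ax=axes[0])    # slow decline suggests an AR term; sharp cutoff at lag q suggests MA
plot_pacf(d1, lags=24, ax=axes[1])   # sharp cutoff at lag p suggests an AR(p) component
plt.show()
```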
Autoregressive models. AR models are indicated when PACF cuts off sharply at lag x but ACF declines slowly. To
determine tentatively the value of p, look at the PACF plot and determine the highest lag at which the PACF is significant.
Moving average models. MA models are indicated when the ACF cuts off sharply at some lag while the PACF declines slowly. If
the ACF does not decline slowly but rather cuts off sharply at lag x, this suggests setting q=x, thereby adding a moving average component. If
autocorrelation is negative at lag-1 then this also indicates the need for an MA (q) term higher than 0.
ARIMA (p,0,0): ACF declines gradually toward 0. PACF is spiked at lags 1 to p, cutting off thereafter.
ARIMA (0,1,0): Random walk model. The only effect is a non-seasonal differencing to remove a linear trend. ACF is
either constant or is balanced between positive and negative. PACF is spiked only at lag 1.
ARIMA (1,1,0): First-order autoregressive model. There is non-seasonal differencing to remove a linear trend, and lagging
the dependent variable by 1.
ARIMA (0,1,1): Simple exponential smoothing model. There is non-seasonal differencing to remove a linear trend, and
lagging shock effects by 1.
ARIMA (0,0,q): ACF is spiked at lags 1 to q, declining sharply thereafter to 0. PACF is spiked at lags 1 to q, declining
more slowly toward 0.
ARIMA (p,0,q): ACF and PACF both decline slowly toward 0. PACF declines erratically due to shock effects.
ARIMA (1,1,1): A mixed model. Warning: normally one does not include both autoregressive effects and moving average
effects in the same model because this may lead to overfitting the data to noise, and may reduce the reliability of
interpretation of the significance of individual components in the model.
As a rule of thumb, one may wish to start with p=1 and/or q=1 and then increase the p and/or q values if the ACF and PACF for
the residuals display spiking.
Seasonal effects: If there are spikes in the data every four periods for quarterly data, or every 12 periods for monthly data, there
is a seasonal effect. In SPSS, prior to asking for the ARIMA model, one asks for seasonal differencing by Data, Define Dates,
Cases Are: (and put in the time period, such as "years, months" or "years, quarters"), which creates variables like YEAR_,
MONTH_, and DATE_. Then in Graphs, Time Series, Autocorrelations, put in the difference you want (typically 1) and the
seasonal differencing you want (typically 1, for monthly data with a periodicity of 12). Alternatively, in Analyze, Time Series,
ARIMA, simply enter the p, d, and q parameters you want for both the simple differencing column and the seasonal differencing
column in the "Model" portion of the ARIMA dialog box.
A seasonal ARIMA effect is indicated by a second parenthetical term with a subscript indicating the periodicity. ARIMA
(1,1,0)(1,1,0)12 thus refers to a lag 1 autoregressive model with single differencing to remove a linear trend, and no
moving average (lagged shock) effects, with a seasonal effect at lag 12.
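In statsmodels notation, the seasonal term is supplied as a four-element tuple, so the ARIMA (1,1,0)(1,1,0)12 model above might be sketched as follows; the monthly series is simulated for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
t = np.arange(144)
y = pd.Series(10 + 0.3 * t + 8 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 144))

# ARIMA(1,1,0)(1,1,0)12: regular and seasonal differencing plus AR terms at lags 1 and 12.
fit = ARIMA(y, order=(1, 1, 0), seasonal_order=(1, 1, 0, 12)).fit()
print(fit.summary())
```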
Significance testing of auto-regressive and moving average components and of independent variables: On asking for ARIMA
in SPSS, the output will include a t-test in the "Variables in the Model" section for AR (ex., AR1 for the autoregressive component
with p=1), MA (ex., MA1 for the moving average component with q=1), and if seasonal effects are specified, the SAR and SMA
components. The probability levels of the t-tests for these components, as well as for any independent variables, reveal which
significantly contribute to the model. Non-significant components and variables are normally dropped. This is akin to
significance testing of b coefficients in ordinary regression.
Unit roots. If the AR(1) coefficient approaches 1 (has a "unit root"), this means the autoregressive
component is emulating a first difference. This would lead the researcher to remove the autoregressive (p) component and
add an order of differencing (d+1) in its place. In higher-order AR(n) models, a unit root is when the sum of AR
coefficients approaches 1. One would then reduce p by 1 and increase d by 1. Likewise, if the MA(1) coefficient
approaches 1 then the MA(1) component is cancelling a first difference, leading the researcher to remove the MA(1) (i.e.,
q) term and reduce differencing (d) by 1. In MA(n) models, if the sum of MA coefficients approaches 1, then q is reduced
by 1 and the order of differencing is reduced by 1.
ACF, PACF, and model fit. After fitting the model, ideally there will be no significant ACF's and PACF's. While such perfection
is not common in practical research, results with several significant ACF's and PACF's indicate a poor fit.
Residual analysis: A random normal distribution of residuals indicates a good ARIMA model (good identification of p, d, and q).
The Box-Ljung statistic should be non-significant (significance level greater than .05) at all or nearly all lags, indicating the residuals are
random. Box-Ljung statistics are created by SPSS under Analyze, Time Series, ARIMA; then click on the Display button
and check autocorrelations and partial autocorrelations. The P-P probability plot of residuals should approximate a 45-degree line as a test
of normality. Failure to meet these tests suggests the need to transform data and/or to identify ARIMA parameters differently. In
SPSS, P-P plots are found under Graphs, P-P, for plotting the variable ERR_1 (the residuals) created by Analyze, Time Series,
ARIMA.
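The two residual checks -- Box-Ljung randomness and normality -- can be sketched outside SPSS as follows; the fitted ARIMA(0,1,1) and its data are hypothetical, and the probability-plot correlation stands in for visual inspection of a P-P plot.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
y = pd.Series(np.cumsum(0.2 + rng.normal(0, 1, 200)))
resid = ARIMA(y, order=(0, 1, 1)).fit().resid

# Box-Ljung: p-values above .05 at all (or nearly all) lags indicate random residuals.
print(acorr_ljungbox(resid, lags=[6, 12, 18, 24]))

# Normality check in the spirit of a P-P plot of ERR_1: the correlation of the
# normal probability plot should be near 1 for roughly normal residuals.
print(stats.probplot(resid, dist="norm")[1][2])
```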
RMS (residual variance) is root mean square residual (called residual variance in SPSS and the variance estimate in SAS).
It is an alternate means of assessing the residuals. As a goodness of fit measure, the smaller the RMS, the smaller the error
and the better the fit of the model. That is, between two models, the one with the lower residual variance is the one with
better fit. RMS is the square root of the mean of the squared residuals over all time periods.
AIC is the Akaike Information Criterion and is a goodness of fit measure used to assess which of two ARIMA models is
better, when both have acceptable residuals. The lower the AIC, the better the model. However, this comparison requires that the models
be fit to the same series, i.e., with the same order of differencing. (Ex., ARIMA (0,1,0) may be compared with ARIMA (1,1,0); however, ARIMA (1,0,1) and ARIMA
(0,1,0) cannot be compared by AIC because they involve different orders of differencing.) There are also other goodness of fit
measures less commonly used for this purpose, such as BIC (Bayesian Information Criterion) or SBC (Schwarz Bayesian
Criterion).
Adding interventions is simply using a dichotomous variable to divide the time periods into two sets - before an intervention
event and after an intervention event - then seeing if the ARIMA (or other) models are the same. When the omega statistic is
significant, the researcher concludes the intervention had a significant effect. Alternatively, the closer the delta statistic to 0, the
more abrupt the effect of the intervention. SPSS produces omega; SAS produces both.
Interpreting b coefficients of intervention variables. When an intervention variable coded 0 = pre-intervention, 1 = post-intervention
is added to the ARIMA model, whether the intervention significantly affects the time series variable (ex.,
whether the partial birth abortion ban significantly affected the number of abortions time series) can also be inferred from
the significance of the intervention variable's b coefficient in the "Variables in the Equation" table of the
ARIMA output. If the b coefficient is not significant, the intervention was not significant. The value of b is the mean
change in the dependent variable per time period (ex., the mean change in the number of abortions per year) attributable to
the intervention, controlling for autoregressive and moving average effects (if specified in the ARIMA model).
In SPSS, one can also get case summary output using Analyze, Reports, Case Summaries. By doing this for cases before
and after the intervention, SPSS will compute the median number of dependent variable units per time period before and
after the intervention (ex., the median number of abortions per year).
Control variables in intervention analysis. When additional independent variables beyond the intervention variable are
added to the equation, these serve as controls. That is, the b coefficient of the intervention variable then reflects the
intervention variable's effect on the dependent controlling for other variables in the equation, just as with ordinary
regression, because the b coefficients are partial coefficients. Thus, for instance, if median income were added as a control
in the example above, the value of b would be the mean change in the number of abortions per year attributable to the partial
birth abortion ban, controlling for median income and controlling for autoregressive and moving average effects (if
specified in the ARIMA model).
Adding transfer functions is adding continuous variables to the right-hand side of the time series equation, not including an
intervention variable. As with ordinary regression (and as with adding interventions), the purpose of adding transfer functions is
to see the effect on the dependent of additional independent variables other than previous values of the dependent itself. In the
example, if median income is added but not a dichotomous variable for the partial birth abortion ban, then the b coefficient for
median income would be the mean change in the number of abortions per year attributable to a unit change in median income,
controlling for autoregressive and moving average effects (if specified in the ARIMA model).
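Both an intervention dummy and a continuous transfer-function variable enter the model as right-hand-side regressors; a statsmodels sketch with simulated data follows, in which the "ban" and "income" variables and their effect sizes are invented purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
n = 120
ban = (np.arange(n) >= 60).astype(float)                   # 0 = pre-intervention, 1 = post
income = 50 + 0.1 * np.arange(n) + rng.normal(0, 1, n)     # continuous control / transfer function
e = np.zeros(n)
for t in range(1, n):                                      # AR(1) disturbances
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 100 - 8 * ban + 0.5 * income + e

exog = pd.DataFrame({"ban": ban, "income": income})
fit = ARIMA(y, exog=exog, order=(1, 0, 0)).fit()
print(fit.summary())   # the b for "ban" estimates the intervention effect,
                       # controlling for income and the AR(1) component
```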
A note on log transformed dependents. In both intervention and transfer function analysis, note that if the dependent variable is
measured in log base 10 transform format, then the effect is obtained by taking the antilog (raising 10 to the bth power) to get the mean number of
units per time period associated with a unit change in the independent variable (which, in the case of intervention variables, is
moving from 0=no intervention to 1=intervention).
Obtaining predictions. As with other forms of regression, ARIMA models can generate predicted values.
Setting the estimation and validation periods. The estimation (a.k.a. historical) period is the range of cases used to
develop the ARIMA model. The validation period refers to hold-out time periods (rows in your dataset) used to test the
model. In SPSS, this can be done outside ARIMA by selecting Data, Select cases, then click on the radio button for
"Based on time or case range", then click the Range button and set the range for the estimation period. Then in SPSS,
select Analyze, Time Series, ARIMA; then click the Save button and check the radio button you want:
Predict from estimation period through last case. This option is the default. It develops the ARIMA model based on the
estimation time period, then predicts values for all cases from the estimation period through the end of the file (the last
time period). If no estimation period was defined, all cases are used to predict values.
Predict through. Choose this option to predict into the future. This option develops the ARIMA model based on cases in
the estimation period but the user can pick cases beyond the end of the dataset to predict for (ex., the data set ends in
2005 but the researcher can ask for predictions through 2050).
Output. With either choice above, SPSS will add additional columns to your dataset: FIT_1 is the prediction; ERR_1 is the
residual (error) for the prediction; LCL_1 is the lower confidence limit on the estimate; UCL_1 is the upper confidence
limit on the estimate; and SEP_1 is the standard error of fit.
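The estimation/hold-out split and the prediction columns have rough statsmodels counterparts; in the sketch below the forecast mean, its standard error, and the lower and upper confidence limits play the roles of FIT_1, SEP_1, LCL_1, and UCL_1 (the data and the 100-case estimation period are simulated assumptions).

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(10)
y = pd.Series(np.cumsum(0.3 + rng.normal(0, 1, 120)))

est = y.iloc[:100]                          # estimation (historical) period
fit = ARIMA(est, order=(0, 1, 1)).fit()

# Forecast through the 20 hold-out periods and 12 further periods beyond the file.
forecast = fit.get_forecast(steps=32)
frame = forecast.summary_frame()            # mean, standard error, and confidence limits
print(frame.head())
```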
Autocorrelation is the serial correlation of error terms for estimates of a time series variable, resulting from the fact that the value of a
datum in time t in the series is dependent on the value of that datum in time t - 1 (or some higher lag). Autocorrelation can be detected
visually by examining a regression line scatterplot. Distances from the regression line to the actual values represent error. When no
autocorrelation exists, dots representing actual values will be randomly scattered around the full length of the series. Negative autocorrelation
exists when, as one moves along the x axis, the next observation tends to be lower than the previous one, then the next one higher, then
lower, and so on, indicating a negative or reactive dependency exists for observations in the series. A positive autocorrelation exists when, as
one moves along the x axis, there are series of above-the-line observations, then series of below-the-line observations, then more highs, etc.,
indicating that a positive or inertial dependency exists for the observations.
The Durbin-Watson coefficient is a test for autocorrelation. That is, it tests the time series assumption that error terms are
uncorrelated. The Durbin-Watson coefficient tests only first-order autocorrelation (lag t-1). The value of d ranges from 0 to 4. A value
of 2 indicates no autocorrelation; 0 indicates positive autocorrelation; and 4 indicates negative autocorrelation. For a given level of
significance such as .05, there is an upper and a lower d value limit. If the computed Durbin-Watson d value for a given series is more
than the upper limit for the case of positive serial correlation, the null hypothesis is not rejected. If the computed d value is less than
the lower limit, the null hypothesis is rejected. If the computed value is in between the two limits, the result is inconclusive. For the
case of negative first-order serial correlation, d must be more than (4 - the lower limit) to reject the null hypothesis. In SPSS, one can
obtain the Durbin-Watson coefficient for a set of residuals by opening the syntax window and running the command FIT RES_1,
assuming the residual variable is named RES_1.
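A small sketch of the same check outside SPSS, using simulated regression residuals with positive autocorrelation built in:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(11)
x = np.arange(100, dtype=float)
e = np.zeros(100)
for t in range(1, 100):                    # positively autocorrelated errors
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1 + 0.5 * x + e

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(durbin_watson(resid))                # well below 2 signals positive autocorrelation
```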
Box-Ljung test. The Box-Ljung test tests the significance of autocorrelation at each lag. When applied to the residuals of a fitted model, its significance value should be greater than
.05 for all or almost all lags, indicating no remaining autocorrelation. SPSS supports this test.
Pankratz criterion. The Pankratz criterion states that the autocorrelation divided by its standard error must be less than 1.25 for the
first three lags and less than 1.60 for subsequent lags, to conclude the series has no significant autocorrelation. This criterion is applied
to the series residuals to establish that error is not autocorrelated.
Generalized least squares estimation (GLS). When autocorrelation is present, one may choose to use generalized least-squares
(GLS) estimation rather than the usual ordinary least-squares (OLS). In iteration 0 of GLS, the estimated OLS residuals are used to
estimate the error covariance matrix. Then in iteration 1, GLS estimation minimizes the sum of squares of the residuals weighted by the
inverse of the sample covariance matrix.
Decomposition refers to separating a time series into trend, cyclical, and irregular effects. Decomposition may be linked to de-trending and
de-seasonalizing data so as to leave only irregular effects, which are the main focus of time series analysis.
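A classical additive decomposition can be sketched with statsmodels; the monthly series below is simulated with a trend and a 12-month cycle, so the components shown are illustrative only.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(12)
t = np.arange(96)
y = pd.Series(20 + 0.2 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 96))

parts = seasonal_decompose(y, model="additive", period=12)
print(parts.trend.dropna().head())     # trend component
print(parts.seasonal.head(12))         # seasonal (cyclical) component
print(parts.resid.dropna().head())     # irregular component left over for analysis
```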
Model order refers to the model's lag order. A lag order of 1 is common, indicating variables on the right hand side are lagged one time
unit (that is, the variable on the left hand side of the equation -- the variable being forecast -- is matched with values of the independents one
time period previous). Testing for model order follows identification and specification, and is the first step in estimation. Model order tests
include the likelihood ratio test (LR), the final prediction error test (FPE), and the criterion autoregressive transfer function (CAT), to name a few.
Assumptions
1. Stationarity is a critical assumption of time series analysis, stipulating that statistical descriptors of the time series are invariant for
different ranges of the series. Weak stationarity assumes only that the mean and variance are invariant. Strict stationarity also requires
that the series is normally distributed. Stationarity is tested by the following tests: Durbin-Watson, Dickey-Fuller, Augmented D-F, and
root examination for univariate time series. There is also a test (Fountis-Dickey) for joint stationarity when modeling two time series
together. Stationarity may also be inspected graphically in SPSS by selecting Graphs, Sequence, to visually inspect for a linear or
quadratic slope. Testing stationarity is a first step in time series modeling. These may be followed by tests for normality: the normal
distribution test, Jarque-Bera, or studentized range tests (a brief sketch of stationarity and normality testing follows this list).
2. Uncontrolled autocorrelation. Time series analysis requires stationarity be established through differencing or some other technique.
If two variables trend upward in raw data, as do GNP and entertainment expenditures, they will tend to correlate highly when a linear
technique such as OLS (ordinary least-squares) regression is applied. In fact, many if not most nationally aggregated variables are of
this type. For data in such series, the value of any given datum is largely determined by the value of the preceding datum in the series.
This autocorrelation must be controlled before inferences may be made about correlation with other variables. Failure to control
autocorrelation is very apt to lead to spurious results, thinking there is a strong effect of, say, entertainment expenditures on GNP.
More technically, significance tests of OLS regression estimates assume non-autocorrelation of the error terms. Error terms at
sequential points in the series should constitute a random series. It is also assumed that the mean of the error terms will be zero
(because half the estimates are above and half below the actual values), and the variance of the error terms will be constant throughout
the time series. When, as in many time series, the value of a datum in time t largely determines the value of the subsequent datum in
time t + 1, a dependency exists linking the error terms and the non-autocorrelation assumption is violated. The practical effect is that
the significance of OLS estimates is computed to be far better than actual, leading the researcher to think that significant relationships
exist when they do not. The Durbin-Watson test is the standard test for autocorrelation.
3. Applying Linear Techniques to Nonlinear Data. OLS regression assumes linear relationships. Applying linear techniques to
nonlinear data will underestimate relationships and increase error of estimate. As with other uses of OLS regression, the linearity
assumption is not violated by adding power or other nonlinear transform terms to the equation (ex., income-squared). The researcher
must conduct tests for linearity. A common test is Ramsey's RESET test, discussed in the section on data assumptions. There are a
variety of other tests for linear or nonlinear dependence, including the Keenan, Luukkonen, McLeod-Li, and Hsieh tests. If
non-linearity is present, it may be possible to eliminate it by double differencing or data transformation (ex., logarithmic).
4. Arbitrary model lag order. Model lag order can have great effects on results. While tests exist to determine the optimal model order,
these tests are purely statistical in nature. The researcher should have a theoretical basis establishing the face validity of the order of
the model he or she has put forward.
5. No outliers. As in other forms of regression, outliers may affect conclusions strongly and misleadingly.
6. Random shocks. If shocks are present in the time series, they are assumed to be randomly distributed with a mean of 0 and a constant
variance.
7. Uncorrelated random error. Residuals in a good time series model will be randomly distributed, exhibit a normal distribution, have
non-significant autocorrelations and partial autocorrelations, and have a mean of 0 and homogeneity of variance over time. Correlated
error does not bias estimates but does inflate standard errors, making statistical inference problematic. The Durbin-Watson test is the
standard test for correlated error.
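The stationarity and normality checks referred to in assumption 1 can be sketched as follows; the augmented Dickey-Fuller and Jarque-Bera tests shown are generic statsmodels implementations rather than SPSS procedures, and the random-walk data are simulated.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(13)
y = np.cumsum(rng.normal(size=200))        # simulated random walk: non-stationary

adf_stat, pvalue = adfuller(y)[:2]         # augmented Dickey-Fuller unit-root test
print(pvalue)                              # large p-value: cannot reject a unit root

d1 = np.diff(y)                            # the first difference should be stationary
print(adfuller(d1)[1])                     # small p-value: stationarity is plausible
print(jarque_bera(d1)[:2])                 # Jarque-Bera statistic and p-value for normality
```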
What are some common sources of time series data for social scientists?
Many time series datasets are available through the Inter-University Consortium for Political and Social Research (ICPSR).
Among the repeated surveys available are the General Social Survey (GSS) and the American National Election Studies (ANES),
as well as the Current Population Survey (CPS), the National Health Interview Survey (NHIS), and the Consumer Surveys from
the Survey Research Center of the University of Michigan.
I suspect there is not a single trend line but rather the trend is different for different subgroups in my population. How do I
handle this?
While separate regressions might be run for each subgroup in the population, significance comparisons may become difficult
because of different subgroup population sizes. It is therefore more advisable (and easier) to include the subgroups in the
regression equation as dummy variables (ex., Democrat=1, non-Democrat=0) along with year as well as an interaction term (the
dummy variable times year). The dependent variable will then equal a constant plus a regression coefficient times year, plus a
regression coefficient times the dummy variable, plus a regression coefficient times the interaction term. If there is no group
effect, the coefficients will be zero for the dummy variable and interaction term. Nonzero, same-sign coefficients for the dummy
variable and interaction term would indicate that Democrats and non-Democrats are diverging on the dependent variable over
time. Non-zero, different-sign coefficients would indicate that Democrats and non-Democrats are converging on the dependent
variable over time, or the trend lines may even have crossed.
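A sketch of that regression with simulated survey data; the variable names, codings, and effect sizes are invented, and a positive interaction is built in so that the two groups diverge over time.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(14)
n = 400
year = rng.integers(0, 21, n)                      # years since the first survey
democrat = rng.integers(0, 2, n)                   # 1 = Democrat, 0 = non-Democrat
y = 10 + 0.2 * year + 1.0 * democrat + 0.15 * democrat * year + rng.normal(0, 2, n)

df = pd.DataFrame({"y": y, "year": year, "democrat": democrat})
fit = smf.ols("y ~ year + democrat + democrat:year", data=df).fit()
print(fit.params)   # same-sign dummy and interaction coefficients: diverging trend lines
```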
Firebaugh (1997: 16) warns researchers to remember that one possible cause of group trends is differential recruitment into the
groups rather than group effects per se. That is, Democrats and non-Democrats might be diverging on the abortion issue, for
instance, because Democrats are attracting a larger percentage of pro-choice individuals to their ranks over time.
A trend might be due to individuals changing their minds, or due to new individuals coming into my sample. How do I
decompose the aggregate trend into individual and population turnover components?
An aggregate change in the percent in favor of legalizing marijuana might be due to changed opinions or due to younger
individuals coming into the sampled population, for instance. The easiest situation is where the aggregate change is due entirely
to the individual component or entirely to the turnover component. The researcher keeps track of the pro-legalization percentage
by cohort (range of years of birth of the respondents, such as born 1950-1960). If we have entirely an individual effect we would
expect that the change within each cohort would approximate the aggregate change in the percent favoring legalization. If we
have entirely a turnover effect we would expect that the percent favoring legalization would not change over time within any
cohort, but new cohorts entering the sample would differ from existing ones.
Often trends exhibit both individual and turnover effects. Firebaugh (1997: 22) recommends the construction of a cohort-
by-period data array to assess relative effects. This is simply a table in which column 1 is the cohort ranges (ex., born
1950-1960, 1961-1970, etc). Subsequent columns are the percentages (ex., percentage favoring legalization of marijuana) for
each of the repeated surveys (ex., 1985 survey, 1990 survey, etc.). The last columns are the percent changes between surveys
(ex., percent change between 1985 and 1990, between 1990 and 1995, etc.). Such a data array allows the researcher to visually
inspect changes by cohort by period using the same expectations discussed above.
An alternative decomposition approach is to use regression, though this requires the researcher to assume within-cohort
changes are linear and additive. All respondents in all surveys are cumulated into a single dataset for this analysis. The regression
formula sets the response of a given respondent in a given survey year equal to a constant plus a regression coefficient
times year (of the survey for the given respondent) plus a regression coefficient times cohort (birth year of the given
respondent). The regression coefficient for year is the within-cohort slope and the estimated effect of within-cohort change
equals this coefficient times the difference in year of the last survey and year of the first (ex., 1995 - 1985 = 10). The regression
coefficient for cohort is the cross-cohort slope and the estimated effect of cohort turnover equals this coefficient times the
difference in average year of birth in the last survey minus the average year of birth in the first survey. The ratio of the two
effects is the ratio of importance of individual versus turnover effects. The two effects will sum approximately (but not exactly)
to the aggregate effect. A proof of this is given by Firebaugh (1997: 25-26), who provides more extensive discussion and
additional strategies.
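A sketch of the regression decomposition just described, with simulated pooled surveys; the survey years, cohort ranges, and effect sizes are all invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(15)
frames = []
for survey_year in (1985, 1990, 1995):
    low = 1930 + (survey_year - 1985)               # the cohort window shifts as new cohorts enter
    cohort = rng.integers(low, low + 46, 500)       # birth years of that survey's respondents
    favor = (30 + 0.4 * (survey_year - 1985)        # within-cohort (individual) change
             + 0.3 * (cohort - 1930)                # cross-cohort (turnover) change
             + rng.normal(0, 5, 500))
    frames.append(pd.DataFrame({"favor": favor, "year": survey_year, "cohort": cohort}))
pooled = pd.concat(frames, ignore_index=True)

fit = smf.ols("favor ~ year + cohort", data=pooled).fit()
within = fit.params["year"] * (1995 - 1985)                  # within-cohort component
mean_cohort = pooled.groupby("year")["cohort"].mean()
turnover = fit.params["cohort"] * (mean_cohort[1995] - mean_cohort[1985])
print(within, turnover)   # the two components roughly sum to the aggregate change
```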
How does one go about disentangling age, period, and cohort time series effects?
This is a major focus of Firebaugh (1997). Age is the age of the individual surveyed. Period is the year of measurement. Cohort
is the year of birth of the respondent. Therefore, cohort equals period minus age. Because of this equality, there is an
identification problem: knowing any two fixes the value of the third. For instance, knowing period and age fixes the value of
cohort. The researcher may be tempted to disentangle age, period, and cohort effects by including all three in the analysis, as by
making all three independent variables in a regression equation. However, this is mathematically invalid because such a
regression model is underidentified. This problem cannot be sidestepped by running three regressions, each with only two of
these variables, because that will result in the same proportion of variance explained (R2) for each of the three models, because
of the foregoing identity (that is, the fact that, for instance, age and period contain the same information as age and cohort).
There is no solution to this identification problem, only strategies for trying to deal with it. Sometimes it is possible to fix the
value of the regression coefficient for one of the effects, typically to zero. For instance, in a study of bisexual vs. homosexual
preferences, one might assume that the age effect was zero and all changes over time were due to period and cohort effects. One
could then use period and cohort as independents in a regression in which sexual preference was a dependent, but results would
be invalid if the assumption that there was no age effect was an untrue assumption. A way of wrestling with such assumptions is
to run three regressions, each time fixing one of the effects (age, cohort, period) to zero, then examining the resulting
coefficients to assess whether, on the basis of external information, all three models seemed plausible. One may find, for
instance, that a regression coefficient approaches zero for one of the models, yet one has reason to believe that the effect for
that coefficient does indeed exist, meaning that that model is not plausible.
Bibliography
Methodology
QASS Monograph Series Quantitative Applications in the Social Sciences (QASS) monograph series from Sage Publications, 2455
Teller Road, Thousand Oaks, CA 91320-2218; tel. 805-499-9774; fax 805-499-0871; e-mail order@sagepub.com; web
http://www.sagepub.com. Cost $10.95 per monograph, published in soft cover.
Time Series Analysis: Regression Techniques, Second Edition, By Charles W. Ostrom, Jr. QASS No. 9 (1978, Second Edition
1990).
Charles Ostrom's initial volume in this series grew out of his work analyzing the US-USSR arms race, and from a desire to
awaken social scientists to the dangers of applying ordinary least squares (OLS) regression analysis to time series data.
The theme of the monograph was emphasis on the particular danger that the assumption of uncorrelated error terms might
well be violated in time series analysis, quite possibly with disastrous results in terms of misinterpretation. Secondarily,
Ostrom argued that with proper precautions, regression techniques for time-series analysis could be fruitful and should not
be abandoned in favor of empirical Box-Jenkins approaches to time series (though Ostrom used Box-Jenkins models for
Interrupted Time Series Analysis. By David McDowall, Richard McCleary, Errol E. Meidinger, and Richard A. Hay, Jr. QASS
No. 21 (1980).
In Interrupted Time Series Analysis, McDowall et al. eschew the EGLS estimation approaches discussed by Ostrom and
instead focus on ARIMA (auto-regressive integrated moving average, a.k.a Box-Jenkins) modeling, at least for the case of
time series which are interrupted by an intervention. In such analyses the object is to determine how the intervention
affected the time series. McDowall and his colleagues outline methods for modeling the serial dependence of a time series
so that it may be controlled. These methods de-trend and de-seasonalize data as well as treat random error, so that the
treated data more clearly reveal intervention effects. In general, models for serial dependence are identified, parameter
estimates are tested for significance and for lying within the bounds of stationarity, and then model residuals are tested for
randomness using the Q-statistic and other criteria. The authors conclude with sections showing how parameter estimates
for acceptable models may be used to make inferences about such questions as the abruptness/gradualness of the
intervention impact, whether the impact was permanent or temporary, as well as testing the null hypothesis.
In regard to computational aspects of ARIMA, McDowall et al. cite the original FORTRAN programs distributed by Glass
et al. (p. 14). Output is presented for Box-Tiao time series with SCRUNCH, and a citation to the software is provided (pp.
26-27). Writing in 1980, largely before the PC revolution, the authors simply note that BMDP, MINITAB, SAS, and SPSS
all implement ARIMA modeling. They recommend BMDP for instructional purposes and in an appendix provide the
addresses for six vendors: BMDP, SAS, SPSS, MINITAB, IMSL, and PACK. There is no discussion of the command-level
use of any of these packages either in general, or for specific recommended analytic strategies.
Univariate Tests for Time Series Models. By Jeff B. Cromwell, Walter C. Labys, and Michel Terraza. QASS No. 99 (1994), and
Multivariate Tests for Time Series Models. By Jeff B. Cromwell, Michael J. Hannan, Walter C. Labys, and Michel Terraza.
QASS No. 100 (1994).
The two most recent (1994) monographs in the QASS time series group survey univariate (QASS #99, by Cromwell,
Labys, and Terraza) and multivariate (QASS #100, by Cromwell, Hannan, Labys, and Terraza) tests for time series
models. Univariate tests include those for stationarity to determine if differencing sequential time observations or some
other process has eliminated general augmentation trends which may lead to spurious correlation and inference of
causality. Other tests discussed test for normality, independence, linearity, heteroscedasticity, and autoregression, as well
as procedures for testing model order and testing residuals. In the case of multivariate tests, tests cover joint stationarity,
joint normality, independence, cointegration, causality, linear and nonlinear model specification, model order, and forecast
accuracy. These two volumes are the most recent, and in some ways the most lucid, monographs in the set. Dozens of tests
are included in the survey, and in each monograph an appendix comprehensively charts whether any of three computer
packages (MicroTSP, RATS, and SHAZAM) implement the test, allow computation of the test, or if the test cannot be
performed using the package. SAS and SPSS modules are not discussed.
Brockwell, Peter J.; Davis, Richard A. 1991 Time Series: Theory and Methods, Second Edition Springer-Verlag, New York. ISBN
0-387-97429-6 [SJS]: Focuses on theory and on linear models.
Burchinal, M., and Appelbaum, M. I. (1991). Estimating individual developmental functions: Methods and their assumptions. Child
Development, 62(1), 23-43.
Burchinal, M. R., Bailey, D. B., & Snyder, P. (1994). Using growth curve analysis to evaluate child change in longitudinal
investigations. Journal of Early Intervention, 18(4), 403-423.
Crato, Nuno and Ray, Bonnie K. (1995). Model selection and forecasting for long-range dependent processes. Department of
Mathematics, New Jersey Institute of Technology.
Curran, P.J. (forthcoming, 1999). A latent curve framework for studying developmental trajectories of adolescent substance use. To
appear in: J. Rose, L. Chassin, C. Presson, & J. Sherman (Eds.), Multivariate Applications in Substance Use Research. Hillsdale, NJ:
Erlbaum.
Duncan, T.E., Duncan, S.C., Stoolmiller, M. (1994). Modeling developmental processes using latent growth structural equation
methodology. Applied Psychological Measurement, 18(4), 343-354.
Enders, W. (2004). Applied econometric time series, 2nd ed. NY: Wiley.
Feiock, Richard C. and Jered B. Carr (1997). A reassessment of city/county consolidation: Economic development impacts. State and
Local Government Review, Vol. 29, No. 3 (Fall): 166-171. A public administration example.
Francis, D. J., Fletcher, J. M., Stuebing, K. K., Davidson, K. C., et al. (1991). Analysis of change: Modeling individual growth. Journal
of Consulting and Clinical Psychology, 59(1), 27-37.
Hamilton, James D. 1994 Time Series Analysis Princeton U. Press, Princeton, NJ ISBN 0-691-04289-6 [SJS]: An applied focus with
emphasis on economics and the assumption that the reader has an economics background.
Lawrence, F. R., & Hancock, G. R. (1998). Assessing change over time using latent growth modeling. Measurement and Evaluation in
Counseling and Development, 30(4), 211-224.
MacCallum, R.C., Kim, C., Malarkey, W., Kiecolt-Glaser, J. (1997). Studying multivariate change using multilevel models and latent
curve models. Multivariate Behavioral Research, 32, 215-253.
Meredith, W. and Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107-122.
Muthén, B.O, & Curran, P. J. (1997). General longitudinal modeling of individual differences in experimental designs: A latent variable
framework for analysis and power estimation. Psychological Methods, 2(4), 371-402.
Ray, Bonnie K. (1993). Modeling long memory processes for optimal long-range prediction. Journal of Time Series Analysis 14(5),
511-525.
Stoolmiller, M. (1995). Using latent growth curve models to study developmental processes. In J. M. Gottman & G. Sackett (Eds.), The
analysis of change. Cambridge, MA: Cambridge University Press.
Tabachnick, Barbara G. and Linda S. Fidell (2001). Using Multivariate Statistics, Fourth Edition. Boston: Allyn and Bacon. Chapter
16 treats ARIMA models.
Willett, J. B. (1997). Measuring change: What individual growth modeling buys you. In E. Amsel & K. A. Reninger (Eds.), Change and
development (pp. 213-243). Mahwah, NJ: Erlbaum.
Willett, J. B., and Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of change. Psychological
Bulletin, 116, 363-381.
Willett, J. B., Singer, J. D., & Martin, N. C. (1998). The design and analysis of longitudinal studies of development and
psychopathology in context: Statistical models and methodological recommendations. Development and Psychopathology, 10,
395-426.