
Time-Series Analysis

Time series arise in many applications: in sociological statistics on births, deaths,


unemployment, crime, and divorces; in economic statistics on production, interest rates, and
exchange rates; in meteorological statistics on temperature, cloud cover, and humidity; in sales
statistics; in statistics on defective products produced by a machine, a factory, or an entire
company; in traffic statistics on the number of riders using a bus or the number of cars crossing a
bridge; in measurements by psychologists on attention or body movement; in physiological
measures of heart rate, respiration, and other bodily functions; in measures of river flow taken
for flood-control planning; and in many other areas.

A time series may trend upward or downward, as many economic series do, or may fluctuate
around a steady mean, as human body temperature does. A series may contain a single cycle, like
the daily cycle of body temperature, or may contain several superimposed cycles. For instance,
outdoor temperature usually exhibits both daily and annual cycles, while traffic density usually
exhibits daily, weekly, and annual cycles.

We shall concentrate on three major goals of time-series analysis. First, you may want to
forecast future values of a time series, using either previous values of just that one series, or
values from other series as well. Second, you may want to assess the impact of a single event,
such as the effect of a new law on the frequency of drunken driving or the effect of a bridge toll
on traffic across neighboring bridges. Third, you may study causal relations, by which we mean
the effects of variables rather than events on a series. This requires two or more time series. For
instance, if changes in unemployment consistently precede changes in crime, that might imply
that unemployment is one of the causes of crime.

 !"     

The central point that differentiates time-series problems from most other statistical problems is
that in a time series, observations are not mutually independent. Rather a single chance event
may affect all later data points. This makes time-series analysis quite different from most other
areas of statistics.

Because of this nonindependence, the true patterns underlying time-series data can be extremely
difficult to see by visual inspection. Anyone who has looked at a typical newspaper chart of
stock-market averages sees trends that seem to go on for weeks or months. But statisticians who
have studied the subject agree that such trends occur with essentially the same frequency one
would expect by chance, and there is virtually no correlation between one day's stock-market
movement and the next day's movement. If there were such a correlation, anybody could make
money in the stock market simply by betting that today's trend will continue tomorrow, and it's
simply not that easy. In fact, cumulating nearly any series of random numbers will yield a pattern
that looks nonrandom.
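
You can see this for yourself with a short simulation. The Python sketch below (a rough illustration using only numpy) cumulates independent random "daily changes"; successive changes are uncorrelated by construction, yet the cumulated series typically wanders in long runs that look like genuine trends.

import numpy as np

rng = np.random.default_rng(0)

# Independent daily "changes": pure noise with no memory at all.
daily_changes = rng.normal(loc=0.0, scale=1.0, size=250)

# Cumulating the noise produces a random walk that, to the eye,
# usually shows long runs that look like sustained trends.
fake_index = 100 + np.cumsum(daily_changes)

# The correlation between successive changes is near zero by construction...
r = np.corrcoef(daily_changes[:-1], daily_changes[1:])[0, 1]
print(f"lag-1 correlation of daily changes: {r:+.3f}")

# ...yet the cumulated series wanders far from its starting level.
print(f"start {fake_index[0]:.1f}, end {fake_index[-1]:.1f}, "
      f"low {fake_index.min():.1f}, high {fake_index.max():.1f}")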
Special Difficulties in Time-Series Work

In this section we discuss some of the problems that arise in the three major types of application
mentioned earlier.

Forecasting

The central logical problem in forecasting is that the "cases" (that is, the time periods) which you
use to make predictions never form a random sample from the same population as the time
periods about which you make the predictions. This point is vividly illustrated by the 509-point
plunge in the Dow-Jones Industrial Average on October 19, 1987. Even in percentage terms, no
one-day drop in the previous 40 years (comprising some 8000 trading days) had ever been more
than a fraction of that size. Thus this drop (which occurred in the absence of any dramatic news
developments) could never have been forecast just from a time-series study of stock prices over
the previous 40 years, no matter how detailed. It is now widely believed that a major cause of the
drop was the newly-introduced widespread use of personal computers programmed to sell stock
options whenever stock prices dropped slightly; this created a snowball effect. Thus the stock
market's history of the previous 40 or even 100 years was irrelevant to predicting its volatility in
October 1987, because nearly all that history had occurred in the absence of this new factor.

The same general point arises in nearly all forecasting work. If you have records of monthly
sales in a department store for the last 10 years, and are asked to project those sales into the
future, those statistics will not reflect the fact that as you work, a new discount store is opening a
few blocks away, or the city has just changed the street in front of your store to a one-way street,
making it harder for customers to reach your store.

A second problem that arises in time-series forecasting is that you rarely know the true shape of
the distribution with which you are working. Workers in other areas often assume normal
distributions while knowing that assumption is not fully accurate. However, such a simplification
may produce more serious errors in time-series work than in other areas. In much statistical work
the problem of non-normal distributions is greatly ameliorated by the fact that you are really
concerned with sample means, and the Central Limit Theorem asserts that means are often
approximately normally distributed even when the underlying scores are not. However, in time-
series forecasting you are often concerned with just one time period--the period for which you
want to forecast. Thus the Central Limit Theorem has no chance to operate, and the assumption
of normality may lead to seriously wrong conclusions. Even if your forecast is just one of a
series of forecasts which you update after each new time period, the forecasts are made one at a
time, so that a single seriously wrong forecast may bankrupt your company or lead to your
dismissal, and nobody will ever learn that your next 50 forecasts would have been within the
range predicted by a normal distribution. Some stock market speculators, who had previously
been quite successful, were bankrupted or driven into retirement by the stock market plunge of
1987. That's very different from the situation in which a company hires, all at once, 50 workers
identified by a competence test. If one out of the 50 is a spectacular failure, the company (and
you the forecaster) will survive because at the very same time the other 49 were turning out well.

Assessing the Impact of a Single Event


When you try to assess the impact of a single event, the major problem is that there are always
many events occurring at any one time. Suppose you are trying to assess the effect of a new toll
on bridge A on traffic across bridge B, but a new store opened near bridge B the same day the
toll was introduced, permanently increasing traffic on bridge B. When critics remind us that
"correlation does not imply causation", they are mostly talking about the possible effects of
overlooked variables. But in these time-series examples we are talking about the possible effects
of overlooked events. It's difficult to say which type of problem is more intractable, but they do
seem to be two different types of problem.

Studying Causal Relations

When scientists use time series to study the effects of one variable on another, they usually have
at least two time series--one for the independent variable and one for the dependent--as in our
earlier example on the relation between unemployment and crime. The problems in analyzing
causal patterns are difficult but not impossible.

One problem with such research is that because the observations within each series are not
independent of each other, the probability of finding a high correlation between the two series
may be higher than is suggested by standard formulas. Later we describe a solution to this
problem.

A second problem is that it is rarely reasonable to assume that the time sequence of the causal
patterns matches the time periods in the study. Thus if increased unemployment typically
produced an increase in crime exactly six months later but not five months later, then it would be
fairly easy to discover that relationship by correlating monthly changes in unemployment with
monthly changes in crime six months later. However, it is much more plausible to assume that
increased unemployment in January produces a slight rise in crime during February, a further
slight rise during March, and so on for several months. Such effects can be much more difficult
to detect, though later we do suggest a solution to this problem.

A third problem in analyzing causal patterns is the familiar problem that correlation does not
imply causation. As in ordinary regression problems, it helps to be able to control statistically for
covariates. Later we describe one way to do this in time-series problems.

Methods of Time-Series Analysis

Despite the difficulties just outlined, time-series analyses have many important uses. So we now
turn to methods of time-series analysis.

The Central Role of Forecasting
In time-series analysis as in other areas of statistics, hypotheses about cause-effect relationships
can be stated in terms of prediction. In simple statistical methods, the hypothesis that X causes Y
implies a correlation between X and Y, which in turn implies that Y can be predicted from X.
More generally, any hypothesis about the effect of X on Y is tested by seeing whether scores on Y
can be predicted more accurately from a model that includes X than from a model that excludes
X.

The same general logic applies in time-series work. For any of our three major uses of time-
series analysis, you predict or forecast each value in the series as accurately as possible from
previous values--either in the same series or other series. Then you may draw causal inferences
from the fact that some particular factor is or is not of predictive value. As in a previous
example, if you could somehow predict the crime rate from previous levels of unemployment,
that would suggest that unemployment may be one of the causes of crime.

Thus the topic of forecasting is central to all our major uses of time series. Of course forecasting
is often of great interest in its own right, and indeed many discussions of time series don't even
mention other applications. Therefore we start with that topic.

Autoregression

The basic goal of autoregression is to find a formula which forecasts each entry in a time series
(except perhaps the first few) accurately from the preceding entries. It may then be reasonable to
hope that the same formula can be used to forecast future entries in the series from the last few
entries currently known.

In autoregression X usually denotes the dependent variable, which is plotted as a function of
TIME. Consider possible ways of forecasting xi from the previous points. One way would be to
simply use each entry as the estimate of the following entry. That would actually give the best
possible forecasts in a simple random walk. A second way would be to average, say, the last 5
entries before each entry xi and use that average as an estimate of xi. A third way would be to
draw a straight line through the last two entries before xi, and extend that line one unit to the
right to estimate xi. You could imagine trying out these three methods, seeing which one works
best in forecasting entries in the series from the ones preceding them, and then applying the
winning method to the last few entries in the series to forecast the next entry.
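
As a rough illustration of that comparison, the Python sketch below tries the three methods on an artificial random-walk series; the function names are ours, and the toy series merely stands in for whatever data you have. As the text notes, on a simple random walk the first method should come out best.

import numpy as np

def last_value(x, i):
    """Forecast x[i] as the immediately preceding entry."""
    return x[i - 1]

def mean_of_last_5(x, i):
    """Forecast x[i] as the average of the five preceding entries."""
    return np.mean(x[i - 5:i])

def line_through_last_2(x, i):
    """Extend the straight line through x[i-2] and x[i-1] one unit to the right."""
    return x[i - 1] + (x[i - 1] - x[i - 2])

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=200))   # a toy random-walk series

methods = {"last value": last_value,
           "mean of last 5": mean_of_last_5,
           "line through last 2": line_through_last_2}

for name, f in methods.items():
    errors = [x[i] - f(x, i) for i in range(5, len(x))]
    print(f"{name:20s} mean squared error: {np.mean(np.square(errors)):.3f}")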

In all three of these methods, the forecast is a linear function of the observations preceding xi.
That point is obvious for the first method, in which the forecast is simply xi-1. A mean is also a
linear function of the observations, so the second method also uses a linear function. In using the
third method (passing a straight line through points xi-2 and xi-1), you are essentially assuming
that the difference between xi and xi-1 will equal the difference between xi-1 and xi-2. Thus the
forecast of xi is xi-1 + (xi-1 - xi-2), or 2xi-1 - xi-2, which is also a linear function of the observations.

Autoregression provides a way of examining an extremely broad class of linear functions like
these, and selecting the one that works best from this class. In effect you merely have to say
which preceding observations you wish to include in the linear function, and autoregression will
examine every possible linear function of those observations and select the one that works best in
the current sample. If you use only the most recent observations, then the number of observations
used in making each forecast is the order of the autoregression.
Our three examples might suggest that linear functions include only a limited range of predictive
techniques, but linear autoregressive functions include far more techniques than is obvious. For
instance, it can be shown that assigning weights of +3, -3, and +1 to xi-1, xi-2, and xi-3 respectively
is equivalent to fitting a parabola to those three points and extending the parabola one unit to the
right to make a forecast. To illustrate, suppose three points are respectively 1, 4, and 9, the
squares of the first three integers. Then the forecast of the next point is 3*9 - 3*4 + 1*1 = 27 - 12
+ 1 = 16 = 4^2. Notice we're not talking about fitting just one parabola to an entire series, but
rather a separate parabola for each point we're trying to forecast. Some of these parabolas might
curve upward while others curve down, and some might be steepest on the right while others are
steepest on the left. All they would have in common is that they are all parabolas.
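
The arithmetic above is easy to check numerically. The short Python sketch below fits a parabola exactly through the three points with numpy and extends it one unit to the right, then compares the result with the fixed weights +3, -3, +1.

import numpy as np

# Three consecutive observations (here the squares 1, 4, 9, as in the text).
t = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 4.0, 9.0])

# Fit a parabola exactly through the three points and extend it one unit.
coeffs = np.polyfit(t, x, deg=2)
parabola_forecast = np.polyval(coeffs, 4.0)

# The equivalent fixed linear weights on the last three observations.
weighted_forecast = 3 * x[2] - 3 * x[1] + 1 * x[0]

print(parabola_forecast, weighted_forecast)   # both 16.0, i.e. 4 squared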

Going even beyond parabolas, consider finding the best-fitting polynomial of order p for a set of
n consecutive points, where n >= p+1, and making a forecast by extending that polynomial one
unit to the right. Or suppose you do this while weighting the earlier points less than the later
points because they're farther from the point to be forecast. It can be shown that all these
techniques are linear techniques, and are thus within the class of techniques from which
autoregression selects the best one. Therefore autoregression can be an extremely powerful tool
for finding a way to predict each point in the series from earlier points.

Doing Autoregression with a Regression Program

In the table below, column B1 is simply column X shifted down one place, with a dot entered in
the first position of B1 to represent missing data. Column B2 is X shifted down two places with
two missing-data symbols. We chose the names B1 and B2 to stand for "back one" and "back
two", since you can read the entries in B1 and B2 by going back one or two cases in X.
X B1 B2
3 . .
7 3 .
4 7 3
5 4 7
2 5 4
8 2 5

In a program like SYSTAT, the command LET B1 = LAG(X) will produce the second column
from the first, and the command LET B2 = LAG(B1) will then produce the third. In some
programs if you want to lag more than one row in a single command, you can add the order of
the lag as a second argument. Thus for instance column B2 might be produced directly from
column X with the command LET B2 = LAG(X, 2). Once the lagged variables are constructed
the regression may be run in the usual way. Thus the name autoregression--the variable is
predicted from itself. If all terms in the model are consecutive starting with B1, then the number
of lagged terms in the regression is called the order of the autoregression.
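
As a rough Python equivalent of these commands (assuming pandas and numpy are available), shift() plays the role of LAG(), and an ordinary least-squares fit of X on B1 and B2 then gives a second-order autoregression. With only the six data points from the table this is purely illustrative.

import numpy as np
import pandas as pd

df = pd.DataFrame({"X": [3, 7, 4, 5, 2, 8]})

# shift() plays the role of SYSTAT's LAG(): "back one" and "back two".
df["B1"] = df["X"].shift(1)
df["B2"] = df["X"].shift(2)
print(df)

# Drop the rows with missing lags, then run an ordinary regression of X
# on B1 and B2 -- a second-order autoregression.
data = df.dropna()
predictors = np.column_stack([np.ones(len(data)), data["B1"], data["B2"]])
coefs, *_ = np.linalg.lstsq(predictors, data["X"], rcond=None)
print("constant, B1 slope, B2 slope:", coefs)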

The first-order autocorrelation of X is the correlation between X and B1, the second-order
autocorrelation is the correlation between X and B2, and so on. (This is not exactly true, though
it's usually very close; see other works for exact formulas.) The terminology of autoregression
can be used to define a random walk precisely. In a simple random walk, the regression
coefficient of B1 is 1 and all other coefficients (and the additive constant) are 0. Thus the best
forecast of any entry in the series is simply the previous entry.

For readers familiar with ARIMA who want to see how they can use a regression program to get
exactly the same results (regression slopes, mean squared error, and standard errors of slopes)
they get from ARIMA, we offer this paragraph. Modify ordinary regression practices in two
ways. First, run the regression without a constant term. Second, at the tops of the columns of
lagged variables, replace the missing values with zeros. Your results will then be identical to
ARIMA results to many decimal places. However, we don't generally recommend either of these
modifications to ordinary regression practices, especially for the more complex models we
introduce later, for reasons that should ultimately become clear.
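
For concreteness, here is a sketch of those two modifications in Python: zero-filled lags and a regression with no constant term. Whether the numbers match a particular ARIMA program's output is something to verify against that program; this block only illustrates the described procedure.

import numpy as np

x = np.array([3.0, 7.0, 4.0, 5.0, 2.0, 8.0])

# Build lagged columns, replacing the leading missing values with zeros
# rather than dropping those rows.
b1 = np.concatenate([[0.0], x[:-1]])
b2 = np.concatenate([[0.0, 0.0], x[:-2]])

# Regression WITHOUT a constant term: the design matrix contains only the lags.
design = np.column_stack([b1, b2])
slopes, *_ = np.linalg.lstsq(design, x, rcond=None)
print("B1 and B2 slopes (no constant, zero-filled lags):", slopes)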

Statistical Inference in Forecasting

Even under the most restrictive statistical assumptions, regression results are not exact results the
way they sometimes are in ordinary regression theory. However, the details of statistical theory
become minor in comparison to the logical leap of faith you must always make when making a
forecast. As mentioned earlier, you must be willing to think of the next time period as just
another "case" drawn randomly from the same population as the cases used to construct the
forecasting formula. The plausibility of this assumption is ultimately a matter of judgment. But
given this assumption, it follows that the formula that has worked best to forecast past series
entries from the entries preceding them, is a reasonable formula to forecast the next series value
from the last values in the current series.

For maximum confidence in this forecast, you want to see whether the past success of the
forecasting formula was uniform across time. To do that, plot the forecasting errors against time.
Look for sections in which the mean error was not zero, and look for sections in which the
variance of the errors was larger than at other times. If you don't trust your eyeball to do this, you
can use quadratic or higher-order polynomial regression to separately predict the errors e and their
squares e^2 from TIME. Significant curvilinearity in predicting e suggests that the forecasting model
can be improved, perhaps by adding one or more terms for trend; see later sections of this chapter.
Significant curvilinearity in predicting e^2 cannot be removed by changing the model; it suggests
that the model works better at some times than others, and thus suggests caution in interpreting
the confidence limits described next.
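
One rough way to carry out that check is sketched below in Python: regress the errors e, and separately their squares, on TIME with a quadratic term and inspect the fitted curvature. A formal significance test would also need a standard error for the quadratic coefficient; the error series here is hypothetical.

import numpy as np

def curvature_check(errors):
    """Regress e and e**2 on TIME with a quadratic and report the fits."""
    time = np.arange(len(errors))
    for label, y in [("e", errors), ("e squared", errors ** 2)]:
        coeffs = np.polyfit(time, y, deg=2)       # quadratic in TIME; coeffs[0] is curvature
        fitted = np.polyval(coeffs, time)
        r2 = 1 - np.var(y - fitted) / np.var(y)   # share of variance explained
        print(f"{label:10s} quadratic coefficient {coeffs[0]:+.4f}, R^2 {r2:.3f}")

rng = np.random.default_rng(2)
# Hypothetical forecast errors whose variance grows over time.
errors = rng.normal(scale=np.linspace(1.0, 3.0, 200))
curvature_check(errors)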

If your data pass the tests just described, and if your sample was quite large, you can put
confidence limits on your forecast without assuming normality. Since you are thinking of the
next case in the series as just another case in your sample, it follows that the probability is 95%
that its forecasting error won't be among the largest 5% of forecasting errors in the sample.
Therefore you can use the actual distribution of forecast residuals to find what size error would
make the new error fall in that top 5%, and use that to put 95% confidence limits on your
forecast of the next series entry.

If you prefer you can do this without even assuming symmetry of residuals; you can use the
actual forecast errors to determine what error would fall in the top 5% of positive errors, and
determine separately what error would fall in the top 5% of negative errors. For instance, if your
forecast was 40, and the sixth-largest of 120 positive errors was +16 and the fifth-largest (in
absolute value) of 100 negative errors was -13, then your 95% confidence limits would be 56 and
27.
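
A sketch of that counting rule in Python, with hypothetical residuals standing in for the actual forecast errors from your fitted formula:

import numpy as np

def empirical_limits(forecast, residuals, tail=0.05):
    """Distribution-free limits from signed forecast residuals,
    following the counting rule described in the text."""
    pos = np.sort(residuals[residuals > 0])[::-1]    # positive errors, largest first
    neg = np.sort(-residuals[residuals < 0])[::-1]   # sizes of negative errors, largest first
    k_pos = int(round(tail * len(pos)))              # e.g. 5% of 120 -> 6
    k_neg = int(round(tail * len(neg)))              # e.g. 5% of 100 -> 5
    upper_cut = pos[k_pos - 1]                       # the 6th-largest positive error
    lower_cut = neg[k_neg - 1]                       # the 5th-largest negative error (in size)
    return forecast - lower_cut, forecast + upper_cut

# Hypothetical residuals standing in for the errors of a fitted formula.
rng = np.random.default_rng(3)
residuals = rng.normal(scale=8.0, size=220)
print(empirical_limits(forecast=40.0, residuals=residuals))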

If the sample was only moderately large, then you should consider the fact that forecasting errors
will tend to be smaller for cases used to develop the forecasting formula than for other cases. If
the forecast errors are approximately normal, it helps to assume normality at this point. You can
use ordinary tests for normality to check this assumption. The MSE (mean squared error),
reported by either a regression program or a time-series autoregression program, equals the sum
of squared errors divided not by the sample size N, but by (N - number of parameters used in
fitting the model). This helps adjust for the downward bias in individual errors. It's not ordinarily
done, but a further adjustment would consist of finding 95% confidence limits by multiplying the
square root of MSE not by 1.96, but by the appropriate value of t, which is a little larger. The
argument is that you really want to find the ratio between a normally distributed error and an
independently distributed value of sqrt(MSE), and that ratio is distributed as t. For instance,
suppose your forecast is 50, MSE = 36, N = 63, and the model contains 3 parameters. Then df =
63 - 3 = 60, and a t table shows that a t of 2.00 should be used for 95% confidence limits. Thus
the confidence limits are 50 plus and minus 2*sqrt(36), or 38 and 62.
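
The same worked example in Python (assuming scipy is available for the t quantile):

import math
from scipy.stats import t

forecast = 50.0
mse = 36.0
n_obs = 63
n_params = 3

df = n_obs - n_params                   # 60 degrees of freedom
t_crit = t.ppf(0.975, df)               # about 2.00 for 60 df
half_width = t_crit * math.sqrt(mse)    # roughly 2 * 6 = 12

print(f"95% limits: {forecast - half_width:.1f} to {forecast + half_width:.1f}")
# Roughly 38 to 62, matching the worked example in the text.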

To construct a forecasting formula that predicts two or more time periods ahead, simply omit the
shortest lags from the regression model. For instance, a model predicting X from B3, B4, and B5
forecasts three time periods ahead of the most recent observation used in the forecast. Of course
such forecasts are often much less accurate than shorter-term forecasts. As in one-step-ahead
forecasts, confidence limits on the forecasts can be found either from normal-curve formulas or
from the empirical distribution of forecast residuals.

Forecasting with Leading Indicators

When time-series variables X and Y can help forecast another variable Z, X and Y are often
called "leading indicators" of Z. To develop forecasting formulas which include such variables,
simply add the lagged forms of the other variables to the model. Let XB1, YB1, and ZB1 denote
the first-order lags of variables X, Y, and Z, and define higher-order lag terms similarly. Then a
formula predicting Z from previous values of X, Y, and Z could include the terms XB1, YB1,
ZB1, XB2, YB2, ZB2, and so on.
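
As a sketch of the bookkeeping (assuming pandas, and using short hypothetical series named X, Y, and Z), the lagged predictors can be built with shift() and then entered into an ordinary regression predicting Z:

import pandas as pd

# Hypothetical monthly series; Z is the variable to be forecast.
df = pd.DataFrame({
    "X": [5, 6, 7, 6, 8, 9, 10, 9],
    "Y": [2, 1, 2, 3, 3, 4, 4, 5],
    "Z": [11, 12, 12, 13, 15, 16, 18, 19],
})

# Build the lagged predictors XB1, YB1, ZB1, XB2, YB2, ZB2.
for var in ["X", "Y", "Z"]:
    for lag in (1, 2):
        df[f"{var}B{lag}"] = df[var].shift(lag)

model_data = df.dropna()   # rows with complete lag information
print(model_data)
# model_data[["XB1", "YB1", "ZB1", "XB2", "YB2", "ZB2"]] can now serve as
# predictors of model_data["Z"] in an ordinary regression, as before.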

Assessing the Impact of a Single Event

A single event such as a new drunken-driving law may affect a time series in a number of ways;
we shall emphasize two. The event may produce an immediate jump (up or down) in the series,
or it may alter the trend of the series, so that a series that had trended up (perhaps deaths due to
drunken driving) starts to trend down, or at least to trend up more slowly. Later we describe how to
analyze the second type of impact; this section considers only the first.

To study immediate jumps in a series, apply autoregression to the portion of the series preceding
the event, and use the methods already described to forecast a series entry after the event. If the
event's effect is hypothesized to occur primarily in a single time period, use a one-step-ahead
forecast. If the effect is hypothesized to occur over several time periods, a test forecasting several
time periods ahead may have greater power. Of course you will typically know in advance
whether the series jumped in one step or in several, but at least theoretically you should choose
your test (or get colleagues to make the choice) without that knowledge.

As was mentioned earlier, when you use this test you should think as carefully as possible about
other events occurring at about the same time, which might have produced the jump; the test
may not distinguish between the effects of two events which occur just a few time periods apart.
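
Putting the pieces together, the Python sketch below fits a first-order autoregression to a hypothetical pre-event series, forecasts the first post-event value, and checks whether the observed value falls outside empirical 95% limits computed from the pre-event residuals. It is only an outline of the procedure described above, with made-up data and a simple percentile version of the confidence limits discussed earlier.

import numpy as np

def ar1_fit(series):
    """First-order autoregression (constant and slope) by least squares."""
    design = np.column_stack([np.ones(len(series) - 1), series[:-1]])
    coefs, *_ = np.linalg.lstsq(design, series[1:], rcond=None)
    return coefs   # constant, slope

rng = np.random.default_rng(4)
pre_event = 50 + 0.5 * np.cumsum(rng.normal(size=100))   # hypothetical pre-event series
first_post_event_value = 58.0                            # hypothetical observation after the event

const, slope = ar1_fit(pre_event)
forecast = const + slope * pre_event[-1]

# Residuals of the one-step forecasts within the pre-event period.
residuals = pre_event[1:] - (const + slope * pre_event[:-1])
lo, hi = forecast + np.percentile(residuals, [2.5, 97.5])

jump = first_post_event_value - forecast
print(f"forecast {forecast:.1f}, observed {first_post_event_value:.1f}, jump {jump:+.1f}")
print(f"outside 95% empirical limits? {not (lo <= first_post_event_value <= hi)}")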
