23K views

Uploaded by Faizan Ahmad

Times Series Analysis will not be a problem now for researcher. This article will give give an indepth knowledge of time series with the help of SPSS output. With the help of this on can easily find out the trend prevailing in any type of industry.

- time series
- Time Series With SPSS
- Time Series Analysis R
- SPSS Forecasting
- Users Guide SPSS Modeler
- Methods of Time Series
- Time series analysis in R
- 14.2.6 SPSS Time Series Analysis and Forecasting
- A First Course on Time Series Analysis
- SPSS Statistcs Base User's Guide 17.0
- SPSS for Beginners
- Time Series Book
- Time Series
- Regression Models by SPSS Use
- Multivariate Data Analysis Using SPSS
- Time Series Analysis.hamilton
- Statistical Analysis Using SPSS
- Time Series Analysis
- SPSS Modeler Book
- 1468053450-Practical Time Series Forecasting

You are on page 1of 154

0

™

For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact

SPSS Inc.

233 South Wacker Drive, 11th Floor

Chicago, IL 60606-6412

Tel: (312) 651-3000

Fax: (312) 651-3668

SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software. No

material describing such software may be produced or distributed without the written permission of the owners of the trademark and

license rights in the software and the copyrights in the published materials.

The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is

subject to restrictions as set forth in subdivision (c) (1) (ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013.

Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6412.

General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their

respective companies.

Windows is a registered trademark of Microsoft Corporation.

DataDirect, DataDirect Connect, INTERSOLV, and SequeLink are registered trademarks of DataDirect Technologies.

Portions of this product were created using LEADTOOLS © 1991–2000, LEAD Technologies, Inc. ALL RIGHTS RESERVED.

LEAD, LEADTOOLS, and LEADVIEW are registered trademarks of LEAD Technologies, Inc.

Sax Basic is a trademark of Sax Software Corporation. Copyright © 1993–2004 by Polar Engineering and Consulting. All rights reserved.

Portions of this product were based on the work of the FreeType Team (http://www.freetype.org).

A portion of the SPSS software contains zlib technology. Copyright © 1995–2002 by Jean-loup Gailly and Mark Adler. The zlib

software is provided “as is,” without express or implied warranty.

A portion of the SPSS software contains Sun Java Runtime libraries. Copyright © 2003 by Sun Microsystems, Inc. All rights reserved.

The Sun Java Runtime libraries include code licensed from RSA Security, Inc. Some portions of the libraries are licensed from IBM and

are available at http://oss.software.ibm.com/icu4j/.

Copyright © 2005 by SPSS Inc.

All rights reserved.

Printed in the United States of America.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic,

mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

1234567890 08 07 06 05

ISBN 1-56827-373-8

Preface

SPSS 14.0 is a comprehensive system for analyzing data. The SPSS Trends optional

add-on module provides the additional analytic techniques described in this manual.

The Trends add-on module must be used with the SPSS 14.0 Base system and is

completely integrated into that system.

Installation

To install the SPSS Trends add-on module, run the License Authorization Wizard using

the authorization code that you received from SPSS Inc. For more information, see the

installation instructions supplied with the SPSS Trends add-on module.

Compatibility

SPSS is designed to run on many computer systems. See the installation instructions

that came with your system for specific information on minimum and recommended

requirements.

Serial Numbers

Your serial number is your identification number with SPSS Inc. You will need this

serial number when you contact SPSS Inc. for information regarding support, payment,

or an upgraded system. The serial number was provided with your Base system.

Customer Service

If you have any questions concerning your shipment or account, contact your local

office, listed on the SPSS Web site at http://www.spss.com/worldwide. Please have

your serial number ready for identification.

iii

Training Seminars

SPSS Inc. provides both public and onsite training seminars. All seminars feature

hands-on workshops. Seminars will be offered in major cities on a regular basis. For

more information on these seminars, contact your local office, listed on the SPSS Web

site at http://www.spss.com/worldwide.

Technical Support

Customers may contact Technical Support for assistance in using SPSS or for

installation help for one of the supported hardware environments. To reach Technical

Support, see the SPSS Web site at http://www.spss.com, or contact your local office,

listed on the SPSS Web site at http://www.spss.com/worldwide. Be prepared to identify

yourself, your organization, and the serial number of your system.

Additional Publications

Additional copies of SPSS product manuals may be purchased directly from SPSS Inc.

Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local SPSS

office, listed on the SPSS Web site at http://www.spss.com/worldwide. For telephone

orders in the United States and Canada, call SPSS Inc. at 800-543-2185. For telephone

orders outside of North America, contact your local office, listed on the SPSS Web site.

The SPSS Statistical Procedures Companion, by Marija Norušis, has been published

by Prentice Hall. A new version of this book, updated for SPSS 14.0, is planned.

The SPSS Advanced Statistical Procedures Companion, also based on SPSS 14.0, is

forthcoming. The SPSS Guide to Data Analysis for SPSS 14.0 is also in development.

Announcements of publications available exclusively through Prentice Hall will be

available on the SPSS Web site at http://www.spss.com/estore (select your home

country, and then click Books).

Your comments are important. Please let us know about your experiences with SPSS

products. We especially like to hear about new and interesting applications using

the SPSS Trends add-on module. Please send e-mail to suggest@spss.com or write

to SPSS Inc., Attn.: Director of Product Planning, 233 South Wacker Drive, 11th

Floor, Chicago, IL 60606-6412.

iv

About This Manual

This manual documents the graphical user interface for the procedures included in the

SPSS Trends add-on module. Illustrations of dialog boxes are taken from SPSS for

Windows. Dialog boxes in other operating systems are similar. Detailed information

about the command syntax for features in the SPSS Trends add-on module is available

in two forms: integrated into the overall Help system and as a separate document in

PDF form in the SPSS 14.0 Command Syntax Reference, available from the Help menu.

Contacting SPSS

If you would like to be on our mailing list, contact one of our offices, listed on our Web

site at http://www.spss.com/worldwide.

v

Contents

Data Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Estimation and Validation Periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Building Models and Producing Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

Model Selection and Event Specification . . . . . . . . . . . . . . . . . . . . . . . . 10

Handling Outliers with the Expert Modeler . . . . . . . . . . . . . . . . . . . . . . . 12

Custom Exponential Smoothing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Custom ARIMA Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Model Specification for Custom ARIMA Models. . ... ... ... ... ... .. 16

Transfer Functions in Custom ARIMA Models . . . ... ... ... ... ... .. 18

Outliers in Custom ARIMA Models . . . . . . . . . . . . ... ... ... ... ... .. 20

Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... ... ... .. 21

Statistics and Forecast Tables . . . . . . . . . . . . . . . . . . . . . ... ... ... .. 22

Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... .. 24

Limiting Output to the Best- or Poorest-Fitting Models . . . ... ... ... .. 26

Saving Model Predictions and Model Specifications. . . . . . . . ... ... ... .. 28

Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

TSMODEL Command Additional Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

vii

3 Apply Time Series Models 33

Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Statistics and Forecast Tables . . . . . . . . . . . . . . . . . . . . . ... ... ... .. 38

Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... .. 41

Limiting Output to the Best- or Poorest-Fitting Models . . . ... ... ... .. 43

Saving Model Predictions and Model Specifications. . . . . . . . ... ... ... .. 45

Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

TSAPPLY Command Additional Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4 Seasonal Decomposition 49

SEASON Command Additional Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5 Spectral Plots 53

Running the Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Model Summary Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

viii

Model Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Model Fit Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Model Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Predictors 81

Plotting Your Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

Running the Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

Series Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Model Description Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Model Statistics Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

ARIMA Model Parameters Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Models 93

Extending the Predictor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

Modifying Predictor Values in the Forecast Period . . . . . . . . . . . . . . . . . . . . 99

Running the Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

ix

10 Seasonal Decomposition 107

Preliminaries . . . . . . . . . . . . . . . . . . . . . . ... ... ... ... ... ... ... . 107

Determining and Setting the Periodicity . . ... ... ... ... ... ... ... . 108

Running the Analysis . . . . . . . . . . . . . . . . ... ... ... ... ... ... ... . 113

Understanding the Output . . . . . . . . . . . . ... ... ... ... ... ... ... . 114

Summary . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... ... ... ... ... . 116

Related Procedures . . . . . . . . . . . . . . . . . . . . ... ... ... ... ... ... ... . 117

Running the Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... . 119

Understanding the Periodogram and Spectral Density . . . ... ... ... . 121

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... . 124

Related Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... ... ... . 124

x

Appendices

Bibliography 135

Index 137

xi

Part I:

User's Guide

Chapter

1

Introduction to Time Series in

SPSS

over a period of time. In a series of inventory data, for example, the observations might

represent daily inventory levels for several months. A series showing the market share

of a product might consist of weekly market share taken over a few years. A series of

total sales figures might consist of one observation per month for many years. What

each of these examples has in common is that some variable was observed at regular,

known intervals over a certain length of time. Thus, the form of the data for a typical

time series is a single sequence or list of observations representing measurements

taken at regular intervals.

Table 1-1

Daily inventory time series

Time Week Day Inventory

level

t1 1 Monday 160

t2 1 Tuesday 135

t3 1 Wednesday 129

t4 1 Thursday 122

t5 1 Friday 108

t6 2 Monday 150

...

t60 12 Friday 120

One of the most important reasons for doing time series analysis is to try to forecast

future values of the series. A model of the series that explained the past values may

also predict whether and how much the next few values will increase or decrease. The

1

2

Chapter 1

or scientific field.

When you define time series data for use with SPSS Trends, each series corresponds to

a separate variable. For example, to define a time series in the Data Editor, click the

Variable View tab and enter a variable name in any blank row. Each observation in a

time series corresponds to a case in SPSS (a row in the Data Editor).

If you open a spreadsheet containing time series data, each series should be arranged

in a column in the spreadsheet. If you already have a spreadsheet with time series

arranged in rows, you can open it anyway and use Transpose on the Data menu to flip

the rows into columns.

Data Transformations

A number of data transformation procedures provided in the SPSS Base system are

useful in time series analysis.

The Define Dates procedure (on the Data menu) generates date variables used

to establish periodicity and to distinguish between historical, validation, and

forecasting periods. Trends is designed to work with the variables created by

the Define Dates procedure.

The Create Time Series procedure (on the Transform menu) creates new time

series variables as functions of existing time series variables. It includes functions

that use neighboring observations for smoothing, averaging, and differencing.

The Replace Missing Values procedure (on the Transform menu) replaces system-

and user-missing values with estimates based on one of several methods. Missing

data at the beginning or end of a series pose no particular problem; they simply

shorten the useful length of the series. Gaps in the middle of a series (embedded

missing data) can be a much more serious problem.

See the SPSS Base User’s Guide for detailed information concerning data

transformations for time series.

3

It is often useful to divide your time series into an estimation, or historical, period

and a validation period. You develop a model on the basis of the observations in

the estimation (historical) period and then test it to see how well it works in the

validation period. By forcing the model to make predictions for points you already

know (the points in the validation period), you get an idea of how well the model

does at forecasting.

The cases in the validation period are typically referred to as holdout cases because

they are held-back from the model-building process. The estimation period consists of

the currently selected cases in the active dataset. Any remaining cases following the

last selected case can be used as holdouts. Once you’re satisfied that the model does

an adequate job of forecasting, you can redefine the estimation period to include the

holdout cases, and then build your final model.

SPSS Trends provides two procedures for accomplishing the tasks of creating models

and producing forecasts.

The “Time Series Modeler” procedure creates models for time series, and produces

forecasts. It includes an Expert Modeler that automatically determines the best

model for each of your time series. For experienced analysts who desire a greater

degree of control, it also provides tools for custom model building.

The “Apply Time Series Models” procedure applies existing time series

models—created by the Time Series Modeler—to the active dataset. This allows

you to obtain forecasts for series for which new or revised data are available,

without rebuilding your models. If there’s reason to think that a model has

changed, it can be rebuilt using the Time Series Modeler.

Chapter

2

Time Series Modeler

Autoregressive Integrated Moving Average (ARIMA), and multivariate ARIMA (or

transfer function models) models for time series, and produces forecasts. The procedure

includes an Expert Modeler that automatically identifies and estimates the best-fitting

ARIMA or exponential smoothing model for one or more dependent variable series,

thus eliminating the need to identify an appropriate model through trial and error.

Alternatively, you can specify a custom ARIMA or exponential smoothing model.

Example. You are a product manager responsible for forecasting next month’s unit

sales and revenue for each of 100 separate products, and have little or no experience in

modeling time series. Your historical unit sales data for all 100 products is stored in a

single Excel spreadsheet. After opening your spreadsheet in SPSS, you use the Expert

Modeler and request forecasts one month into the future. The Expert Modeler finds the

best model of unit sales for each of your products, and uses those models to produce

the forecasts. Since the Expert Modeler can handle multiple input series, you only have

to run the procedure once to obtain forecasts for all of your products. Choosing to save

the forecasts to the active dataset, you can easily export the results back to Excel.

square error (RMSE), mean absolute error (MAE), mean absolute percentage

error (MAPE), maximum absolute error (MaxAE), maximum absolute percentage

error (MaxAPE), normalized Bayesian information criterion (BIC). Residuals:

autocorrelation function, partial autocorrelation function, Ljung-Box Q. For ARIMA

models: ARIMA orders for dependent variables, transfer function orders for

independent variables, and outlier estimates. Also, smoothing parameter estimates for

exponential smoothing models.

Plots. Summary plots across all models: histograms of stationary R-square, R-square

(R2), root mean square error (RMSE), mean absolute error (MAE), mean absolute

percentage error (MAPE), maximum absolute error (MaxAE), maximum absolute

5

6

Chapter 2

plots of residual autocorrelations and partial autocorrelations. Results for individual

models: forecast values, fit values, observed values, upper and lower confidence limits,

residual autocorrelations and partial autocorrelations.

Data. The dependent variable and any independent variables should be numeric.

Assumptions. The dependent variable and any independent variables are treated as time

series, meaning that each case represents a time point, with successive cases separated

by a constant time interval.

Stationarity. For custom ARIMA models, the time series to be modeled should

be stationary. The most effective way to transform a nonstationary series into a

stationary one is through a difference transformation—available from the Create

Time Series dialog.

Forecasts. For producing forecasts using models with independent (predictor)

variables, the active dataset should contain values of these variables for all cases in

the forecast period. Additionally, independent variables should not contain any

missing values in the estimation period.

Defining Dates

Although not required, it’s recommended to use the Define Dates dialog box to specify

the date associated with the first case and the time interval between successive cases.

This is done prior to using the Time Series Modeler and results in a set of variables

that label the date associated with each case. It also sets an assumed periodicity of the

data—for example, a periodicity of 12 if the time interval between successive cases

is one month. This periodicity is required if you’re interested in creating seasonal

models. If you’re not interested in seasonal models and don’t require date labels on

your output, you can skip the Define Dates dialog box. The label associated with

each case is then simply the case number.

Analyze

Time Series

Create Models...

7

Figure 2-1

Time Series Modeler, Variables tab

E From the Method drop-down box, select a modeling method. For automatic modeling,

leave the default method of Expert Modeler. This will invoke the Expert Modeler to

determine the best-fitting model for each of the dependent variables.

To produce forecasts:

E Specify the forecast period. This will produce a chart that includes forecasts and

observed values.

8

Chapter 2

Select one or more independent variables. Independent variables are treated much

like predictor variables in regression analysis but are optional. They can be

included in ARIMA models but not exponential smoothing models. If you specify

Expert Modeler as the modeling method and include independent variables, only

ARIMA models will be considered.

Click Criteria to specify modeling details.

Save predictions, confidence intervals, and noise residuals.

Save the estimated models in XML format. Saved models can be applied to new

or revised data to obtain updated forecasts without rebuilding models. This is

accomplished with the “Apply Time Series Models” procedure.

Obtain summary statistics across all estimated models.

Specify transfer functions for independent variables in custom ARIMA models.

Enable automatic detection of outliers.

Model specific time points as outliers for custom ARIMA models.

Modeling Methods

Expert Modeler. The Expert Modeler automatically finds the best-fitting model for

each dependent series. If independent (predictor) variables are specified, the Expert

Modeler selects, for inclusion in ARIMA models, those that have a statistically

significant relationship with the dependent series. Model variables are transformed

where appropriate using differencing and/or a square root or natural log transformation.

By default, the Expert Modeler considers both exponential smoothing and ARIMA

models. You can, however, limit the Expert Modeler to only search for ARIMA

models or to only search for exponential smoothing models. You can also specify

automatic detection of outliers.

Exponential Smoothing. Use this option to specify a custom exponential smoothing

model. You can choose from a variety of exponential smoothing models that differ in

their treatment of trend and seasonality.

ARIMA. Use this option to specify a custom ARIMA model. This involves explicitly

specifying autoregressive and moving average orders, as well as the degree of

differencing. You can include independent (predictor) variables and define transfer

9

functions for any or all of them. You can also specify automatic detection of outliers or

specify an explicit set of outliers.

Estimation Period. The estimation period defines the set of cases used to determine

the model. By default, the estimation period includes all cases in the active dataset.

To set the estimation period, select Based on time or case range in the Select Cases

dialog box. Depending on available data, the estimation period used by the procedure

may vary by dependent variable and thus differ from the displayed value. For a given

dependent variable, the true estimation period is the period left after eliminating any

contiguous missing values of the variable occurring at the beginning or end of the

specified estimation period.

Forecast Period. The forecast period begins at the first case after the estimation period,

and by default goes through to the last case in the active dataset. You can set the end of

the forecast period from the Options tab.

The Expert Modeler provides options for constraining the set of candidate models,

specifying the handling of outliers, and including event variables.

10

Chapter 2

Figure 2-2

Expert Modeler Criteria dialog box, Model tab

The Model tab allows you to specify the types of models considered by the Expert

Modeler and to specify event variables.

All models. The Expert Modeler considers both ARIMA and exponential smoothing

models.

Exponential smoothing models only. The Expert Modeler only considers exponential

smoothing models.

ARIMA models only. The Expert Modeler only considers ARIMA models.

11

Expert Modeler considers seasonal models. This option is only enabled if a periodicity

has been defined for the active dataset. When this option is selected (checked), the

Expert Modeler considers both seasonal and nonseasonal models. If this option is not

selected, the Expert Modeler only considers nonseasonal models.

Current Periodicity. Indicates the periodicity (if any) currently defined for the active

dataset. The current periodicity is given as an integer—for example, 12 for annual

periodicity, with each case representing a month. The value None is displayed if no

periodicity has been set. Seasonal models require a periodicity. You can set the

periodicity from the Define Dates dialog box.

Events. Select any independent variables that are to be treated as event variables. For

event variables, cases with a value of 1 indicate times at which the dependent series are

expected to be affected by the event. Values other than 1 indicate no effect.

12

Chapter 2

Figure 2-3

Expert Modeler Criteria dialog box, Outliers tab

The Outliers tab allows you to choose automatic detection of outliers as well as the

type of outliers to detect.

performed. Select (check) this option to perform automatic detection of outliers, then

select one or more of the following outlier types:

Additive

Level shift

Innovational

13

Transient

Seasonal additive

Local trend

Additive patch

Figure 2-4

Exponential Smoothing Criteria dialog box

Model Type. Exponential smoothing models (Gardner, 1985) are classified as either

seasonal or nonseasonal. Seasonal models are only available if a periodicity has been

defined for the active dataset (see “Current Periodicity” below).

Simple. This model is appropriate for series in which there is no trend or

seasonality. Its only smoothing parameter is level. Simple exponential smoothing

is most similar to an ARIMA model with zero orders of autoregression, one order

of differencing, one order of moving average, and no constant.

14

Chapter 2

Holt's linear trend. This model is appropriate for series in which there is a linear

trend and no seasonality. Its smoothing parameters are level and trend, which

are not constrained by each other's values. Holt’s model is more general than

Brown’s model but may take longer to compute for large series. Holt’s exponential

smoothing is most similar to an ARIMA model with zero orders of autoregression,

two orders of differencing, and two orders of moving average.

Brown's linear trend. This model is appropriate for series in which there is a linear

trend and no seasonality. Its smoothing parameters are level and trend, which are

assumed to be equal. Brown’s model is therefore a special case of Holt’s model.

Brown’s exponential smoothing is most similar to an ARIMA model with zero

orders of autoregression, two orders of differencing, and two orders of moving

average, with the coefficient for the second order of moving average equal to the

square of one-half of the coefficient for the first order.

Damped trend. This model is appropriate for series with a linear trend that is dying

out and with no seasonality. Its smoothing parameters are level, trend, and damping

trend. Damped exponential smoothing is most similar to an ARIMA model with 1

order of autoregression, 1 order of differencing, and 2 orders of moving average.

Simple seasonal. This model is appropriate for series with no trend and a seasonal

effect that is constant over time. Its smoothing parameters are level and season.

Simple seasonal exponential smoothing is most similar to an ARIMA model with

zero orders of autoregression, one order of differencing, one order of seasonal

differencing, and orders 1, p, and p + 1 of moving average, where p is the number

of periods in a seasonal interval (for monthly data, p = 12).

Winters' additive. This model is appropriate for series with a linear trend and a

seasonal effect that does not depend on the level of the series. Its smoothing

parameters are level, trend, and season. Winters' additive exponential smoothing is

most similar to an ARIMA model with zero orders of autoregression, one order

of differencing, one order of seasonal differencing, and p + 1 orders of moving

average, where p is the number of periods in a seasonal interval (for monthly

data, p = 12).

Winters' multiplicative. This model is appropriate for series with a linear trend and

a seasonal effect that depends on the level of the series. Its smoothing parameters

are level, trend, and season. Winters’ multiplicative exponential smoothing is not

similar to any ARIMA model.

Current Periodicity. Indicates the periodicity (if any) currently defined for the active

dataset. The current periodicity is given as an integer—for example, 12 for annual

periodicity, with each case representing a month. The value None is displayed if no

15

periodicity has been set. Seasonal models require a periodicity. You can set the

periodicity from the Define Dates dialog box.

Dependent Variable Transformation. You can specify a transformation performed on

each dependent variable before it is modeled.

None. No transformation is performed.

Square root. Square root transformation.

Natural log. Natural log transformation.

The Time Series Modeler allows you to build custom nonseasonal or seasonal ARIMA

(Autoregressive Integrated Moving Average) models—also known as Box-Jenkins

(Box, Jenkins, and Reinsel, 1994) models—with or without a fixed set of predictor

variables. You can define transfer functions for any or all of the predictor variables,

and specify automatic detection of outliers, or specify an explicit set of outliers.

All independent (predictor) variables specified on the Variables tab are explicitly

included in the model. This is in contrast to using the Expert Modeler where

independent variables are only included if they have a statistically significant

relationship with the dependent variable.

16

Chapter 2

Figure 2-5

ARIMA Criteria dialog box, Model tab

The Model tab allows you to specify the structure of a custom ARIMA model.

ARIMA Orders. Enter values for the various ARIMA components of your model

into the corresponding cells of the Structure grid. All values must be non-negative

integers. For autoregressive and moving average components, the value represents

the maximum order. All positive lower orders will be included in the model. For

example, if you specify 2, the model includes orders 2 and 1. Cells in the Seasonal

column are only enabled if a periodicity has been defined for the active dataset (see

“Current Periodicity” below).

17

Autoregressive orders specify which previous values from the series are used to

predict current values. For example, an autoregressive order of 2 specifies that the

value of the series two time periods in the past be used to predict the current value.

Difference (d). Specifies the order of differencing applied to the series before

estimating models. Differencing is necessary when trends are present (series with

trends are typically nonstationary and ARIMA modeling assumes stationarity) and

is used to remove their effect. The order of differencing corresponds to the degree

of series trend—first-order differencing accounts for linear trends, second-order

differencing accounts for quadratic trends, and so on.

Moving Average (q). The number of moving average orders in the model. Moving

average orders specify how deviations from the series mean for previous values

are used to predict current values. For example, moving-average orders of 1 and 2

specify that deviations from the mean value of the series from each of the last two

time periods be considered when predicting current values of the series.

Seasonal Orders. Seasonal autoregressive, moving average, and differencing

components play the same roles as their nonseasonal counterparts. For seasonal orders,

however, current series values are affected by previous series values separated by one

or more seasonal periods. For example, for monthly data (seasonal period of 12), a

seasonal order of 1 means that the current series value is affected by the series value 12

periods prior to the current one. A seasonal order of 1, for monthly data, is then the

same as specifying a nonseasonal order of 12.

Current Periodicity. Indicates the periodicity (if any) currently defined for the active

dataset. The current periodicity is given as an integer—for example, 12 for annual

periodicity, with each case representing a month. The value None is displayed if no

periodicity has been set. Seasonal models require a periodicity. You can set the

periodicity from the Define Dates dialog box.

Dependent Variable Transformation. You can specify a transformation performed on

each dependent variable before it is modeled.

None. No transformation is performed.

Square root. Square root transformation.

Natural log. Natural log transformation.

Include constant in model. Inclusion of a constant is standard unless you are sure that

the overall mean series value is 0. Excluding the constant is recommended when

differencing is applied.

18

Chapter 2

Figure 2-6

ARIMA Criteria dialog box, Transfer Function tab

The Transfer Function tab (only present if independent variables are specified) allows

you to define transfer functions for any or all of the independent variables specified on

the Variables tab. Transfer functions allow you to specify the manner in which past

values of independent (predictor) variables are used to forecast future values of the

dependent series.

Transfer Function Orders. Enter values for the various components of the transfer

function into the corresponding cells of the Structure grid. All values must be

non-negative integers. For numerator and denominator components, the value

represents the maximum order. All positive lower orders will be included in the model.

In addition, order 0 is always included for numerator components. For example, if you

specify 2 for numerator, the model includes orders 2, 1, and 0. If you specify 3 for

19

denominator, the model includes orders 3, 2, and 1. Cells in the Seasonal column are

only enabled if a periodicity has been defined for the active dataset (see “Current

Periodicity” below).

Numerator. The numerator order of the transfer function. Specifies which previous

values from the selected independent (predictor) series are used to predict current

values of the dependent series. For example, a numerator order of 1 specifies that

the value of an independent series one time period in the past—as well as the

current value of the independent series—is used to predict the current value of

each dependent series.

Denominator. The denominator order of the transfer function. Specifies how

deviations from the series mean, for previous values of the selected independent

(predictor) series, are used to predict current values of the dependent series. For

example, a denominator order of 1 specifies that deviations from the mean value of

an independent series one time period in the past be considered when predicting

the current value of each dependent series.

Difference. Specifies the order of differencing applied to the selected independent

(predictor) series before estimating models. Differencing is necessary when trends

are present and is used to remove their effect.

Seasonal Orders. Seasonal numerator, denominator, and differencing components

play the same roles as their nonseasonal counterparts. For seasonal orders, however,

current series values are affected by previous series values separated by one or more

seasonal periods. For example, for monthly data (seasonal period of 12), a seasonal

order of 1 means that the current series value is affected by the series value 12 periods

prior to the current one. A seasonal order of 1, for monthly data, is then the same

as specifying a nonseasonal order of 12.

Current Periodicity. Indicates the periodicity (if any) currently defined for the active

dataset. The current periodicity is given as an integer—for example, 12 for annual

periodicity, with each case representing a month. The value None is displayed if no

periodicity has been set. Seasonal models require a periodicity. You can set the

periodicity from the Define Dates dialog box.

Delay. Setting a delay causes the independent variable’s influence to be delayed by

the number of intervals specified. For example, if the delay is set to 5, the value of

the independent variable at time t doesn’t affect forecasts until five periods have

elapsed (t + 5).

20

Chapter 2

also includes an optional transformation to be performed on those variables.

None. No transformation is performed.

Square root. Square root transformation.

Natural log. Natural log transformation.

Figure 2-7

ARIMA Criteria dialog box, Outliers tab

The Outliers tab provides the following choices for the handling of outliers (Pena,

Tiao, and Tsay, 2001): detect them automatically, specify particular points as outliers,

or do not detect or model them.

21

Do not detect outliers or model them. By default, outliers are neither detected nor

modeled. Select this option to disable any detection or modeling of outliers.

Detect outliers automatically. Select this option to perform automatic detection of

outliers, and select one or more of the following outlier types:

Additive

Level shift

Innovational

Transient

Seasonal additive

Local trend

Additive patch

For more information, see “Outlier Types” in Appendix B on p. 127.

Model specific time points as outliers. Select this option to specify particular time points

as outliers. Use a separate row of the Outlier Definition grid for each outlier. Enter

values for all of the cells in a given row.

Type. The outlier type. The supported types are: additive (default), level shift,

innovational, transient, seasonal additive, and local trend.

Note 1: If no date specification has been defined for the active dataset, the Outlier

Definition grid shows the single column Observation. To specify an outlier, enter the

row number (as displayed in the Data Editor) of the relevant case.

Note 2: The Cycle column (if present) in the Outlier Definition grid refers to the value

of the CYCLE_ variable in the active dataset.

Output

Available output includes results for individual models as well as results calculated

across all models. Results for individual models can be limited to a set of best- or

poorest-fitting models based on user-specified criteria.

22

Chapter 2

Figure 2-8

Time Series Modeler, Statistics tab

The Statistics tab provides options for displaying tables of the modeling results.

Display fit measures, Ljung-Box statistic, and number of outliers by model. Select (check)

this option to display a table containing selected fit measures, Ljung-Box value, and

the number of outliers for each estimated model.

Fit Measures. You can select one or more of the following for inclusion in the table

containing fit measures for each estimated model:

Stationary R-square

R-square

23

Mean absolute percentage error

Mean absolute error

Maximum absolute percentage error

Maximum absolute error

Normalized BIC

For more information, see “Goodness-of-Fit Measures” in Appendix A on p. 125.

Statistics for Comparing Models. This group of options controls display of tables

containing statistics calculated across all estimated models. Each option generates a

separate table. You can select one or more of the following options:

Goodness of fit. Table of summary statistics and percentiles for stationary R-square,

R-square, root mean square error, mean absolute percentage error, mean absolute

error, maximum absolute percentage error, maximum absolute error, and

normalized Bayesian Information Criterion.

Residual autocorrelation function (ACF). Table of summary statistics and percentiles

for autocorrelations of the residuals across all estimated models.

Residual partial autocorrelation function (PACF). Table of summary statistics and

percentiles for partial autocorrelations of the residuals across all estimated models.

Statistics for Individual Models. This group of options controls display of tables

containing detailed information for each estimated model. Each option generates a

separate table. You can select one or more of the following options:

Parameter estimates. Displays a table of parameter estimates for each estimated

model. Separate tables are displayed for exponential smoothing and ARIMA

models. If outliers exist, parameter estimates for them are also displayed in a

separate table.

Residual autocorrelation function (ACF). Displays a table of residual autocorrelations

by lag for each estimated model. The table includes the confidence intervals for

the autocorrelations.

Residual partial autocorrelation function (PACF). Displays a table of residual

partial autocorrelations by lag for each estimated model. The table includes the

confidence intervals for the partial autocorrelations.

Display forecasts. Displays a table of model forecasts and confidence intervals for each

estimated model. The forecast period is set from the Options tab.

24

Chapter 2

Plots

Figure 2-9

Time Series Modeler, Plots tab

The Plots tab provides options for displaying plots of the modeling results.

This group of options controls display of plots containing statistics calculated across

all estimated models. Each option generates a separate plot. You can select one or

more of the following options:

Stationary R-square

R-square

25

Mean absolute percentage error

Mean absolute error

Maximum absolute percentage error

Maximum absolute error

Normalized BIC

Residual autocorrelation function (ACF)

Residual partial autocorrelation function (PACF)

For more information, see “Goodness-of-Fit Measures” in Appendix A on p. 125.

Series. Select (check) this option to obtain plots of the predicted values for each

estimated model. You can select one or more of the following for inclusion in the plot:

Observed values. The observed values of the dependent series.

Forecasts. The model predicted values for the forecast period.

Fit values. The model predicted values for the estimation period.

Confidence intervals for forecasts. The confidence intervals for the forecast period.

Confidence intervals for fit values. The confidence intervals for the estimation

period.

Residual autocorrelation function (ACF). Displays a plot of residual autocorrelations for

each estimated model.

Residual partial autocorrelation function (PACF). Displays a plot of residual partial

autocorrelations for each estimated model.

26

Chapter 2

Figure 2-10

Time Series Modeler, Output Filter tab

The Output Filter tab provides options for restricting both tabular and chart output to

a subset of the estimated models. You can choose to limit output to the best-fitting

and/or the poorest-fitting models according to fit criteria you provide. By default, all

estimated models are included in the output.

Best-fitting models. Select (check) this option to include the best-fitting models in the

output. Select a goodness-of-fit measure and specify the number of models to include.

Selecting this option does not preclude also selecting the poorest-fitting models. In that

case, the output will consist of the poorest-fitting models as well as the best-fitting ones.

27

Fixed number of models. Specifies that results are displayed for the n best-fitting

models. If the number exceeds the number of estimated models, all models are

displayed.

Percentage of total number of models. Specifies that results are displayed for models

with goodness-of-fit values in the top n percent across all estimated models.

Poorest-fitting models. Select (check) this option to include the poorest-fitting models

in the output. Select a goodness-of-fit measure and specify the number of models

to include. Selecting this option does not preclude also selecting the best-fitting

models. In that case, the output will consist of the best-fitting models as well as the

poorest-fitting ones.

Fixed number of models. Specifies that results are displayed for the n poorest-fitting

models. If the number exceeds the number of estimated models, all models are

displayed.

Percentage of total number of models. Specifies that results are displayed for models

with goodness-of-fit values in the bottom n percent across all estimated models.

Goodness of Fit Measure. Select the goodness-of-fit measure to use for filtering models.

The default is stationary R square.

28

Chapter 2

Figure 2-11

Time Series Modeler, Save tab

The Save tab allows you to save model predictions as new variables in the active

dataset and save model specifications to an external file in XML format.

Save Variables. You can save model predictions, confidence intervals, and residuals

as new variables in the active dataset. Each dependent series gives rise to its own

set of new variables, and each new variable contains values for both the estimation

and forecast periods. New cases are added if the forecast period extends beyond the

length of the dependent variable series. Choose to save new variables by selecting the

associated Save check box for each. By default, no new variables are saved.

Predicted Values. The model predicted values.

29

Lower Confidence Limits. Lower confidence limits for the predicted values.

Upper Confidence Limits. Upper confidence limits for the predicted values.

Noise Residuals. The model residuals. When transformations of the dependent

variable are performed (for example, natural log), these are the residuals for the

transformed series.

Variable Name Prefix. Specify prefixes to be used for new variable names, or

leave the default prefixes. Variable names consist of the prefix, the name of

the associated dependent variable, and a model identifier. The variable name is

extended if necessary to avoid variable naming conflicts. The prefix must conform

to the rules for valid SPSS variable names.

Export Model File. Model specifications for all estimated models are exported to the

specified file in XML format. Saved models can be used to obtain updated forecasts,

based on more current data, using the “Apply Time Series Models” procedure.

30

Chapter 2

Options

Figure 2-12

Time Series Modeler, Options tab

The Options tab allows you to set the forecast period, specify the handling of missing

values, set the confidence interval width, specify a custom prefix for model identifiers,

and set the number of lags shown for autocorrelations.

Forecast Period. The forecast period always begins with the first case after the end of

the estimation period (the set of cases used to determine the model) and goes through

either the last case in the active dataset or a user-specified date. By default, the end of

the estimation period is the last case in the active dataset, but it can be changed from

the Select Cases dialog box by selecting Based on time or case range.

31

First case after end of estimation period through last case in active dataset. Select this

option when the end of the estimation period is prior to the last case in the active

dataset, and you want forecasts through the last case. This option is typically

used to produce forecasts for a holdout period, allowing comparison of the model

predictions with a subset of the actual values.

First case after end of estimation period through a specified date. Select this option to

explicitly specify the end of the forecast period. This option is typically used to

produce forecasts beyond the end of the actual series. Enter values for all of the

cells in the Date grid.

If no date specification has been defined for the active dataset, the Date grid shows

the single column Observation. To specify the end of the forecast period, enter the

row number (as displayed in the Data Editor) of the relevant case.

The Cycle column (if present) in the Date grid refers to the value of the CYCLE_

variable in the active dataset.

User-Missing Values. These options control the handling of user-missing values.

Treat as invalid. User-missing values are treated like system-missing values.

Treat as valid. User-missing values are treated as valid data.

Missing Value Policy. The following rules apply to the treatment of missing values

(includes system-missing values and user-missing values treated as invalid) during

the modeling procedure:

Cases with missing values of a dependent variable that occur within the estimation

period are included in the model. The specific handling of the missing value

depends on the estimation method.

A warning is issued if an independent variable has missing values within the

estimation period. For the Expert Modeler, models involving the independent

variable are estimated without the variable. For custom ARIMA, models involving

the independent variable are not estimated.

If any independent variable has missing values within the forecast period, the

procedure issues a warning and forecasts as far as it can.

Confidence Interval Width (%). Confidence intervals are computed for the model

predictions and residual autocorrelations. You can specify any positive value less than

100. By default, a 95% confidence interval is used.

32

Chapter 2

Prefix for Model Identifiers in Output. Each dependent variable specified on the Variables

tab gives rise to a separate estimated model. Models are distinguished with unique

names consisting of a customizable prefix along with an integer suffix. You can enter

a prefix or leave the default of Model.

Maximum Number of Lags Shown in ACF and PACF Output. You can set the

maximum number of lags shown in tables and plots of autocorrelations and partial

autocorrelations.

You can customize your time series modeling if you paste your selections into a syntax

window and edit the resulting TSMODEL command syntax. SPSS command language

allows you to:

Specify the seasonal period of the data (with the SEASONLENGTH keyword on

the AUXILIARY subcommand). This overrides the current periodicity (if any)

for the active dataset.

Specify nonconsecutive lags for custom ARIMA and transfer function components

(with the ARIMA and TRANSFERFUNCTION subcommands). For example, you can

specify a custom ARIMA model with autoregressive lags of orders 1, 3, and 6; or a

transfer function with numerator lags of orders 2, 5, and 8.

Provide more than one set of modeling specifications (for example, modeling

method, ARIMA orders, independent variables, and so on) for a single run of the

Time Series Modeler procedure (with the MODEL subcommand).

See the SPSS Command Syntax Reference for complete syntax information.

Chapter

3

Apply Time Series Models

The Apply Time Series Models procedure loads existing time series models from an

external file and applies them to the active dataset. You can use this procedure to obtain

forecasts for series for which new or revised data are available, without rebuilding your

models. Models are generated using the “Time Series Modeler” procedure.

Example. You are an inventory manager with a major retailer, and responsible for each

of 5,000 products. You’ve used the Expert Modeler to create models that forecast sales

for each product three months into the future. Your data warehouse is refreshed each

month with actual sales data which you’d like to use to produce monthly updated

forecasts. The Apply Time Series Models procedure allows you to accomplish this

using the original models, and simply reestimating model parameters to account for

the new data.

Statistics. Goodness-of-fit measures: stationary R-square, R-square (R2), root mean

square error (RMSE), mean absolute error (MAE), mean absolute percentage

error (MAPE), maximum absolute error (MaxAE), maximum absolute percentage

error (MaxAPE), normalized Bayesian information criterion (BIC). Residuals:

autocorrelation function, partial autocorrelation function, Ljung-Box Q.

Plots. Summary plots across all models: histograms of stationary R-square, R-square

(R2), root mean square error (RMSE), mean absolute error (MAE), mean absolute

percentage error (MAPE), maximum absolute error (MaxAE), maximum absolute

percentage error (MaxAPE), normalized Bayesian information criterion (BIC); box

plots of residual autocorrelations and partial autocorrelations. Results for individual

models: forecast values, fit values, observed values, upper and lower confidence limits,

residual autocorrelations and partial autocorrelations.

Data. Variables (dependent and independent) to which models will be applied should

be numeric.

33

34

Chapter 3

Assumptions. Models are applied to variables in the active dataset with the same names

as the variables specified in the model. All such variables are treated as time series,

meaning that each case represents a time point, with successive cases separated by a

constant time interval.

Forecasts. For producing forecasts using models with independent (predictor)

variables, the active dataset should contain values of these variables for all cases

in the forecast period. If model parameters are reestimated, then independent

variables should not contain any missing values in the estimation period.

Defining Dates

The Apply Time Series Models procedure requires that the periodicity, if any, of the

active dataset matches the periodicity of the models to be applied. If you’re simply

forecasting using the same dataset (perhaps with new or revised data) as that used to

the build the model, then this condition will be satisfied. If no periodicity exists for

the active dataset, you will be given the opportunity to navigate to the Define Dates

dialog box to create one. If, however, the models were created without specifying a

periodicity, then the active dataset should also be without one.

To Apply Models

Analyze

Time Series

Apply Models...

35

Figure 3-1

Apply Time Series Models, Models tab

E Enter the file specification for a model file or click Browse and select a model file

(model files are created with the “Time Series Modeler” procedure).

Reestimate model parameters using the data in the active dataset. Forecasts are

created using the reestimated parameters.

Save predictions, confidence intervals, and noise residuals.

Save reestimated models in XML format.

36

Chapter 3

Load from model file. Forecasts are produced using the model parameters from the

model file without reestimating those parameters. Goodness of fit measures displayed

in output and used to filter models (best- or worst-fitting) are taken from the model

file and reflect the data used when each model was developed (or last updated). With

this option, forecasts do not take into account historical data—for either dependent or

independent variables—in the active dataset. You must choose Reestimate from data

if you want historical data to impact the forecasts. In addition, forecasts do not take

into account values of the dependent series in the forecast period—but they do take

into account values of independent variables in the forecast period. If you have more

current values of the dependent series and want them to be included in the forecasts,

you need to reestimate, adjusting the estimation period to include these values.

Reestimate from data. Model parameters are reestimated using the data in the active

dataset. Reestimation of model parameters has no effect on model structure. For

example, an ARIMA(1,0,1) model will remain so, but the autoregressive and

moving-average parameters will be reestimated. Reestimation does not result in the

detection of new outliers. Outliers, if any, are always taken from the model file.

Estimation Period. The estimation period defines the set of cases used to reestimate

the model parameters. By default, the estimation period includes all cases in the

active dataset. To set the estimation period, select Based on time or case range in

the Select Cases dialog box. Depending on available data, the estimation period

used by the procedure may vary by model and thus differ from the displayed value.

For a given model, the true estimation period is the period left after eliminating

any contiguous missing values, from the model’s dependent variable, occurring at

the beginning or end of the specified estimation period.

Forecast Period

The forecast period for each model always begins with the first case after the end

of the estimation period and goes through either the last case in the active dataset

or a user-specified date. If parameters are not reestimated (this is the default), then

the estimation period for each model is the set of cases used when the model was

developed (or last updated).

37

First case after end of estimation period through last case in active dataset. Select

this option when the end of the estimation period is prior to the last case in the

active dataset, and you want forecasts through the last case.

First case after end of estimation period through a specified date. Select this option

to explicitly specify the end of the forecast period. Enter values for all of the

cells in the Date grid.

If no date specification has been defined for the active dataset, the Date grid shows

the single column Observation. To specify the end of the forecast period, enter the

row number (as displayed in the Data Editor) of the relevant case.

The Cycle column (if present) in the Date grid refers to the value of the CYCLE_

variable in the active dataset.

Output

Available output includes results for individual models as well as results across all

models. Results for individual models can be limited to a set of best- or poorest-fitting

models based on user-specified criteria.

38

Chapter 3

Figure 3-2

Apply Time Series Models, Statistics tab

The Statistics tab provides options for displaying tables of model fit statistics, model

parameters, autocorrelation functions, and forecasts. Unless model parameters are

reestimated (Reestimate from data on the Models tab), displayed values of fit measures,

Ljung-Box values, and model parameters are those from the model file and reflect the

data used when each model was developed (or last updated). Outlier information is

always taken from the model file.

39

Display fit measures, Ljung-Box statistic, and number of outliers by model. Select (check)

this option to display a table containing selected fit measures, Ljung-Box value, and

the number of outliers for each model.

Fit Measures. You can select one or more of the following for inclusion in the table

containing fit measures for each model:

Stationary R-square

R-square

Root mean square error

Mean absolute percentage error

Mean absolute error

Maximum absolute percentage error

Maximum absolute error

Normalized BIC

For more information, see “Goodness-of-Fit Measures” in Appendix A on p. 125.

Statistics for Comparing Models. This group of options controls the display of tables

containing statistics across all models. Each option generates a separate table. You can

select one or more of the following options:

Goodness of fit. Table of summary statistics and percentiles for stationary R-square,

R-square, root mean square error, mean absolute percentage error, mean absolute

error, maximum absolute percentage error, maximum absolute error, and

normalized Bayesian Information Criterion.

Residual autocorrelation function (ACF). Table of summary statistics and percentiles

for autocorrelations of the residuals across all estimated models. This table is

only available if model parameters are reestimated (Reestimate from data on the

Models tab).

Residual partial autocorrelation function (PACF). Table of summary statistics and

percentiles for partial autocorrelations of the residuals across all estimated models.

This table is only available if model parameters are reestimated (Reestimate from

data on the Models tab).

40

Chapter 3

Statistics for Individual Models. This group of options controls display of tables

containing detailed information for each model. Each option generates a separate

table. You can select one or more of the following options:

Parameter estimates. Displays a table of parameter estimates for each model.

Separate tables are displayed for exponential smoothing and ARIMA models. If

outliers exist, parameter estimates for them are also displayed in a separate table.

Residual autocorrelation function (ACF). Displays a table of residual autocorrelations

by lag for each estimated model. The table includes the confidence intervals

for the autocorrelations. This table is only available if model parameters are

reestimated (Reestimate from data on the Models tab).

Residual partial autocorrelation function (PACF). Displays a table of residual

partial autocorrelations by lag for each estimated model. The table includes the

confidence intervals for the partial autocorrelations. This table is only available if

model parameters are reestimated (Reestimate from data on the Models tab).

Display forecasts. Displays a table of model forecasts and confidence intervals for

each model.

41

Plots

Figure 3-3

Apply Time Series Models, Plots tab

The Plots tab provides options for displaying plots of model fit statistics,

autocorrelation functions, and series values (including forecasts).

This group of options controls the display of plots containing statistics across all

models. Unless model parameters are reestimated (Reestimate from data on the Models

tab), displayed values are those from the model file and reflect the data used when each

42

Chapter 3

model was developed (or last updated). In addition, autocorrelation plots are only

available if model parameters are reestimated. Each option generates a separate plot.

You can select one or more of the following options:

Stationary R-square

R-square

Root mean square error

Mean absolute percentage error

Mean absolute error

Maximum absolute percentage error

Maximum absolute error

Normalized BIC

Residual autocorrelation function (ACF)

Residual partial autocorrelation function (PACF)

Series. Select (check) this option to obtain plots of the predicted values for each model.

Observed values, fit values, confidence intervals for fit values, and autocorrelations

are only available if model parameters are reestimated (Reestimate from data on the

Models tab). You can select one or more of the following for inclusion in the plot:

Observed values. The observed values of the dependent series.

Forecasts. The model predicted values for the forecast period.

Fit values. The model predicted values for the estimation period.

Confidence intervals for forecasts. The confidence intervals for the forecast period.

Confidence intervals for fit values. The confidence intervals for the estimation

period.

each estimated model.

autocorrelations for each estimated model.

43

Figure 3-4

Apply Time Series Models, Output Filter tab

The Output Filter tab provides options for restricting both tabular and chart output

to a subset of models. You can choose to limit output to the best-fitting and/or the

poorest-fitting models according to fit criteria you provide. By default, all models are

included in the output. Unless model parameters are reestimated (Reestimate from data

on the Models tab), values of fit measures used for filtering models are those from the

model file and reflect the data used when each model was developed (or last updated).

44

Chapter 3

Best-fitting models. Select (check) this option to include the best-fitting models in the

output. Select a goodness-of-fit measure and specify the number of models to include.

Selecting this option does not preclude also selecting the poorest-fitting models. In that

case, the output will consist of the poorest-fitting models as well as the best-fitting ones.

Fixed number of models. Specifies that results are displayed for the n best-fitting

models. If the number exceeds the total number of models, all models are

displayed.

Percentage of total number of models. Specifies that results are displayed for models

with goodness-of-fit values in the top n percent across all models.

Poorest-fitting models. Select (check) this option to include the poorest-fitting models

in the output. Select a goodness-of-fit measure and specify the number of models

to include. Selecting this option does not preclude also selecting the best-fitting

models. In that case, the output will consist of the best-fitting models as well as the

poorest-fitting ones.

Fixed number of models. Specifies that results are displayed for the n poorest-fitting

models. If the number exceeds the total number of models, all models are

displayed.

Percentage of total number of models. Specifies that results are displayed for models

with goodness-of-fit values in the bottom n percent across all models.

Goodness of Fit Measure. Select the goodness-of-fit measure to use for filtering models.

The default is stationary R-square.

45

Figure 3-5

Apply Time Series Models, Save tab

The Save tab allows you to save model predictions as new variables in the active

dataset and save model specifications to an external file in XML format.

Save Variables. You can save model predictions, confidence intervals, and residuals

as new variables in the active dataset. Each model gives rise to its own set of new

variables. New cases are added if the forecast period extends beyond the length of the

dependent variable series associated with the model. Unless model parameters are

reestimated (Reestimate from data on the Models tab), predicted values and confidence

46

Chapter 3

limits are only created for the forecast period. Choose to save new variables by

selecting the associated Save check box for each. By default, no new variables are

saved.

Predicted Values. The model predicted values.

Lower Confidence Limits. Lower confidence limits for the predicted values.

Upper Confidence Limits. Upper confidence limits for the predicted values.

Noise Residuals. The model residuals. When transformations of the dependent

variable are performed (for example, natural log), these are the residuals for

the transformed series. This choice is only available if model parameters are

reestimated (Reestimate from data on the Models tab).

Variable Name Prefix. Specify prefixes to be used for new variable names or

leave the default prefixes. Variable names consist of the prefix, the name of

the associated dependent variable, and a model identifier. The variable name is

extended if necessary to avoid variable naming conflicts. The prefix must conform

to the rules for valid SPSS variable names.

Export Model File Containing Reestimated Parameters. Model specifications, containing

reestimated parameters and fit statistics, are exported to the specified file in XML

format. This option is only available if model parameters are reestimated (Reestimate

from data on the Models tab).

47

Options

Figure 3-6

Apply Time Series Models, Options tab

The Options tab allows you to specify the handling of missing values, set the

confidence interval width, and set the number of lags shown for autocorrelations.

User-Missing Values. These options control the handling of user-missing values.

Treat as invalid. User-missing values are treated like system-missing values.

Treat as valid. User-missing values are treated as valid data.

48

Chapter 3

Missing Value Policy. The following rules apply to the treatment of missing values

(includes system-missing values and user-missing values treated as invalid):

Cases with missing values of a dependent variable that occur within the estimation

period are included in the model. The specific handling of the missing value

depends on the estimation method.

For ARIMA models, a warning is issued if a predictor has any missing values

within the estimation period. Any models involving the predictor are not

reestimated.

If any independent variable has missing values within the forecast period, the

procedure issues a warning and forecasts as far as it can.

Confidence Interval Width (%). Confidence intervals are computed for the model

predictions and residual autocorrelations. You can specify any positive value less than

100. By default, a 95% confidence interval is used.

Maximum Number of Lags Shown in ACF and PACF Output. You can set the

maximum number of lags shown in tables and plots of autocorrelations and partial

autocorrelations. This option is only available if model parameters are reestimated

(Reestimate from data on the Models tab).

Additional features are available if you paste your selections into a syntax window and

edit the resulting TSAPPLY command syntax. SPSS command language allows you to:

Specify that only a subset of the models in a model file are to be applied to the

active dataset (with the DROP and KEEP keywords on the MODEL subcommand).

Apply models from two or more model files to your data (with the MODEL

subcommand). For example, one model file might contain models for series that

represent unit sales, and another might contain models for series that represent

revenue.

See the SPSS Command Syntax Reference for complete syntax information.

Chapter

4

Seasonal Decomposition

component, a combined trend and cycle component, and an “error” component. The

procedure is an implementation of the Census Method I, otherwise known as the

ratio-to-moving-average method.

Example. A scientist is interested in analyzing monthly measurements of the ozone

level at a particular weather station. The goal is to determine if there is any trend in

the data. In order to uncover any real trend, the scientist first needs to account for the

variation in readings due to seasonal effects. The Seasonal Decomposition procedure

can be used to remove any systematic seasonal variations. The trend analysis is then

performed on a seasonally adjusted series.

Statistics. The set of seasonal factors.

Data. The variables should be numeric.

Assumptions. The variables should not contain any embedded missing data. At least

one periodic date component must be defined.

Analyze

Time Series

Seasonal Decomposition...

49

50

Chapter 4

Figure 4-1

Seasonal Decomposition dialog box

E Select one or more variables from the available list and move them into the Variable(s)

list. Note that the list includes only numeric variables.

Model. The Seasonal Decomposition procedure offers two different approaches for

modeling the seasonal factors: multiplicative or additive.

Multiplicative. The seasonal component is a factor by which the seasonally adjusted

series is multiplied to yield the original series. In effect, Trends estimates seasonal

components that are proportional to the overall level of the series. Observations

without seasonal variation have a seasonal component of 1.

Additive. The seasonal adjustments are added to the seasonally adjusted series

to obtain the observed values. This adjustment attempts to remove the seasonal

effect from a series in order to look at other characteristics of interest that may

be "masked" by the seasonal component. In effect, Trends estimates seasonal

components that do not depend on the overall level of the series. Observations

without seasonal variation have a seasonal component of 0.

Moving Average Weight. The Moving Average Weight options allow you to specify how

to treat the series when computing moving averages. These options are available

only if the periodicity of the series is even. If the periodicity is odd, all points are

weighted equally.

51

Seasonal Decomposition

All points equal. Moving averages are calculated with a span equal to the periodicity

and with all points weighted equally. This method is always used if the periodicity

is odd.

Endpoints weighted by .5. Moving averages for series with even periodicity are

calculated with a span equal to the periodicity plus 1 and with the endpoints of the

span weighted by 0.5.

Optionally, you can:

Click Save to specify how new variables should be saved.

Figure 4-2

Season Save dialog box

Add to file. The new series created by Seasonal Decomposition are saved as regular

variables in your active dataset. Variable names are formed from a three-letter

prefix, an underscore, and a number.

Replace existing. The new series created by Seasonal Decomposition are saved

as temporary variables in your active dataset. At the same time, any existing

temporary variables created by the Trends procedures are dropped. Variable names

are formed from a three-letter prefix, a pound sign (#), and a number.

Do not create. The new series are not added to the active dataset.

The Seasonal Decomposition procedure creates four new variables (series), with the

following three-letter prefixes, for each series specified:

SAF. Seasonal adjustment factors. These values indicate the effect of each period

on the level of the series.

52

Chapter 4

SAS. Seasonally adjusted series. These are the values obtained after removing the

seasonal variation of a series.

STC. Smoothed trend-cycle components. These values show the trend and cyclical

behavior present in the series.

ERR. Residual or “error” values. The values that remain after the seasonal, trend,

and cycle components have been removed from the series.

The SPSS command language also allows you to:

Specify any periodicity within the SEASON command rather than select one of the

alternatives offered by the Define Dates procedure.

See the SPSS Command Syntax Reference for complete syntax information.

Chapter

5

Spectral Plots

The Spectral Plots procedure is used to identify periodic behavior in time series.

Instead of analyzing the variation from one time point to the next, it analyzes the

variation of the series as a whole into periodic components of different frequencies.

Smooth series have stronger periodic components at low frequencies; random variation

(“white noise”) spreads the component strength over all frequencies.

Series that include missing data cannot be analyzed with this procedure.

Example. The rate at which new houses are constructed is an important barometer of

the state of the economy. Data for housing starts typically exhibit a strong seasonal

component. But are there longer cycles present in the data that analysts need to be

aware of when evaluating current figures?

Statistics. Sine and cosine transforms, periodogram value, and spectral density estimate

for each frequency or period component. When bivariate analysis is selected: real and

imaginary parts of cross-periodogram, cospectral density, quadrature spectrum, gain,

squared coherency, and phase spectrum for each frequency or period component.

Plots. For univariate and bivariate analyses: periodogram and spectral density.

For bivariate analyses: squared coherency, quadrature spectrum, cross amplitude,

cospectral density, phase spectrum, and gain.

Assumptions. The variables should not contain any embedded missing data. The time

series to be analyzed should be stationary and any non-zero mean should be subtracted

out from the series.

Stationary. A condition that must be met by the time series to which you fit an

ARIMA model. Pure MA series will be stationary; however, AR and ARMA

series might not be. A stationary series has a constant mean and a constant

variance over time.

53

54

Chapter 5

Graphs

Time Series

Spectral...

Figure 5-1

Spectral Plots dialog box

E Select one or more variables from the available list and move them to the Variable(s)

list. Note that the list includes only numeric variables.

E Select one of the Spectral Window options to choose how to smooth the periodogram

in order to obtain a spectral density estimate. Available smoothing options are

Tukey-Hamming, Tukey, Parzen, Bartlett, Daniell (Unit), and None.

Tukey-Hamming. The weights are Wk = .54Dp(2 pi fk) + .23Dp (2 pi fk + pi/p) +

.23Dp (2 pi fk - pi/p), for k = 0, ..., p, where p is the integer part of half the span

and Dp is the Dirichlet kernel of order p.

Tukey. The weights are Wk = 0.5Dp(2 pi fk) + 0.25Dp (2 pi fk + pi/p) + 0.25Dp(2

pi fk - pi/p), for k = 0, ..., p, where p is the integer part of half the span and Dp is

the Dirichlet kernel of order p.

Parzen. The weights are Wk = 1/p(2 + cos(2 pi fk)) (F[p/2] (2 pi fk))**2, for k=

0, ... p, where p is the integer part of half the span and F[p/2] is the Fejer kernel

of order p/2.

55

Spectral Plots

Bartlett. The shape of a spectral window for which the weights of the upper half

of the window are computed as Wk = Fp (2*pi*fk), for k = 0, ... p, where p is

the integer part of half the span and Fp is the Fejer kernel of order p. The lower

half is symmetric with the upper half.

Daniell (Unit). The shape of a spectral window for which the weights are all equal

to 1.

None. No smoothing. If this option is chosen, the spectral density estimate is

the same as the periodogram.

Bivariate analysis – first variable with each. If you have selected two or more variables,

you can select this option to request bivariate spectral analyses.

The first variable in the Variable(s) list is treated as the independent variable, and

all remaining variables are treated as dependent variables.

Each series after the first is analyzed with the first series independently of other

series named. Univariate analyses of each series are also performed.

Span. The range of consecutive values across which the smoothing is carried out.

Generally, an odd integer is used. Larger spans smooth the spectral density plot more

than smaller spans.

Center variables. Adjusts the series to have a mean of 0 before calculating the spectrum

and to remove the large term that may be associated with the series mean.

Plot. Periodogram and spectral density are available for both univariate and bivariate

analyses. All other choices are available only for bivariate analyses.

Periodogram. Unsmoothed plot of spectral amplitude (plotted on a logarithmic

scale) against either frequency or period. Low-frequency variation characterizes

a smooth series. Variation spread evenly across all frequencies indicates "white

noise."

Spectral density. A periodogram that has been smoothed to remove irregular

variation.

Cospectral density. The real part of the cross-periodogram, which is a measure of

the correlation of the in-phase frequency components of two time series.

Quadrature spectrum. The imaginary part of the cross-periodogram, which is a

measure of the correlation of the out-of-phase frequency components of two time

series. The components are out of phase by pi/2 radians.

Cross amplitude. The square root of the sum of the squared cospectral density

and the squared quadrature spectrum.

56

Chapter 5

Gain. The quotient of dividing the cross amplitude by the spectral density for one

of the series. Each of the two series has its own gain value.

Squared coherency. The product of the gains of the two series.

Phase spectrum. A measure of the extent to which each frequency component of

one series leads or lags the other.

By frequency. All plots are produced by frequency, ranging from frequency 0 (the

constant or mean term) to frequency 0.5 (the term for a cycle of two observations).

By period. All plots are produced by period, ranging from 2 (the term for a cycle of two

observations) to a period equal to the number of observations (the constant or mean

term). Period is displayed on a logarithmic scale.

The SPSS command language also allows you to:

Save computed spectral analysis variables to the active dataset for later use.

Specify custom weights for the spectral window.

Produce plots by both frequency and period.

Print a complete listing of each value shown in the plot.

See the SPSS Command Syntax Reference for complete syntax information.

Part II:

Examples

Chapter

6

Bulk Forecasting with the Expert

Modeler

user subscriptions in order to predict utilization of bandwidth. Forecasts are needed

for each of the 85 local markets that make up the national subscriber base. Monthly

historical data is collected in broadband_1.sav, found in the \tutorial\sample_files\

folder where SPSS was installed.

In this example, you will use the Expert Modeler to produce forecasts for the next

three months for each of the 85 local markets, saving the generated models to an

external XML file. Once you are finished, you might want to work through the next

example, “Bulk Reforecasting by Applying Saved Models” in Chapter 7 on p. 73,

which applies the saved models to an updated dataset in order to extend the forecasts

by another three months without having to rebuild the models.

It is always a good idea to have a feel for the nature of your data before building a

model. Does the data exhibit seasonal variations? Although the Expert Modeler

will automatically find the best seasonal or non-seasonal model for each series, you

can often obtain faster results by limiting the search to non-seasonal models when

seasonality is not present in your data. Without examining the data for each of the 85

local markets, we can get a rough picture by plotting the total number of subscribers

over all markets.

Graphs

Sequence...

59

60

Chapter 6

Figure 6-1

Sequence Charts dialog box

E Select Total Number of Subscribers and move it into the Variables list.

E Select Date and move it into the Time Axis Labels box.

E Click OK.

61

Figure 6-2

Total number of broadband subscribers across all markets

The series exhibits a very smooth upward trend with no hint of seasonal variations.

There might be individual series with seasonality, but it appears that seasonality is not

a prominent feature of the data in general. Of course you should inspect each of the

series before ruling out seasonal models. You can then separate out series exhibiting

seasonality and model them separately. In the present case, inspection of the 85 series

would show that none exhibit seasonality.

To use the Expert Modeler:

Analyze

Time Series

Create Models...

62

Chapter 6

Figure 6-3

Time Series Modeler dialog box

E Select Subscribers for Market 1 through Subscribers for Market 85 for dependent

variables.

E Verify that Expert Modeler is selected in the Method drop-down list. The Expert

Modeler will automatically find the best-fitting model for each of the dependent

variable series.

The set of cases used to estimate the model is referred to as the estimation period.

By default, it includes all of the cases in the active dataset. You can set the estimation

period by selecting Based on time or case range in the Select Cases dialog box. For this

example, we will stick with the default.

63

Notice also that the default forecast period starts after the end of the estimation

period and goes through to the last case in the active dataset. If you are forecasting

beyond the last case, you will need to extend the forecast period. This is done from

the Options tab as you will see later on in this example.

E Click Criteria.

Figure 6-4

Expert Modeler Criteria dialog box, Model tab

E Deselect Expert Modeler considers seasonal models in the Model Type group.

Although the data is monthly and the current periodicity is 12, we have seen that the

data does not exhibit any seasonality, so there is no need to consider seasonal models.

This reduces the space of models searched by the Expert Modeler and can significantly

reduce computing time.

64

Chapter 6

E Click Continue.

E Click the Options tab on the Time Series Modeler dialog box.

Figure 6-5

Time Series Modeler, Options tab

E Select First case after end of estimation period through a specified date in the Forecast

Period group.

E In the Date grid, enter 2004 for the year and 3 for the month.

The dataset contains data from January 1999 through December 2003. With the current

settings, the forecast period will be January 2004 through March 2004.

65

Figure 6-6

Time Series Modeler, Save tab

E Select (check) the entry for Predicted Values in the Save column, and leave the default

value Predicted as the Variable Name Prefix.

The model predictions are saved as new variables in the active dataset, using the prefix

Predicted for the variable names. You can also save the specifications for each of the

models to an external XML file. This will allow you to reuse the models to extend your

forecasts as new data becomes available.

E Click the Browse button on the Save tab.

This will take you to a standard dialog box for saving a file.

E Navigate to the folder where you would like to save the XML model file, enter a

filename, and click Save.

66

Chapter 6

The path to the XML model file should now appear on the Save tab.

Figure 6-7

Time Series Modeler, Statistics tab

This option produces a table of forecasted values for each dependent variable series

and provides another option—other than saving the predictions as new variables—for

obtaining these values.

The default selection of Goodness of fit (in the Statistics for Comparing Models

group) produces a table with fit statistics—such as R-squared, mean absolute

percentage error, and normalized BIC—calculated across all of the models. It provides

a concise summary of how well the models fit the data.

67

Figure 6-8

Time Series Modeler, Plots tab

This suppresses the generation of series plots for each of the models. In this example,

we are more interested in saving the forecasts as new variables than generating plots of

the forecasts.

The Plots for Comparing Models group provides several plots (in the form of

histograms) of fit statistics calculated across all models.

E Select Mean absolute percentage error and Maximum absolute percentage error in the

Plots for Comparing Models group.

68

Chapter 6

Absolute percentage error is a measure of how much a dependent series varies from its

model-predicted level. By examining the mean and maximum across all models, you

can get an indication of the uncertainty in your predictions. And looking at summary

plots of percentage errors, rather than absolute errors, is advisable since the dependent

series represent subscriber numbers for markets of varying sizes.

Figure 6-9

Histogram of mean absolute percentage error

This histogram displays the mean absolute percentage error (MAPE) across all models.

It shows that all models display a mean uncertainty of roughly 1%.

69

Figure 6-10

Histogram of maximum absolute percentage error

This histogram displays the maximum absolute percentage error (MaxAPE) across

all models and is useful for imagining a worst-case scenario for your forecasts. It

shows that the largest percentage error for each model falls in the range of 1 to 5%.

Do these values represent an acceptable amount of uncertainty? This is a situation in

which your business sense comes into play because acceptable risk will change from

problem to problem.

70

Chapter 6

Model Predictions

Figure 6-11

New variables containing model predictions

The Data Editor shows the new variables containing the model predictions. Although

only two are shown here, there are 85 new variables, one for each of the 85 dependent

series. The variable names consist of the default prefix Predicted, followed by the

name of the associated dependent variable (for example, Market_1), followed by a

model identifier (for example, Model_1).

Three new cases, containing the forecasts for January 2004 through March 2004,

have been added to the dataset, along with automatically generated date labels. Each of

the new variables contains the model predictions for the estimation period (January

1999 through December 2003), allowing you to see how well the model fits the known

values.

Figure 6-12

Forecast table

71

You also chose to create a table with the forecasted values. The table consists of the

predicted values in the forecast period but—unlike the new variables containing the

model predictions—does not include predicted values in the estimation period. The

results are organized by model and identified by the model name, which consists

of the name (or label) of the associated dependent variable followed by a model

identifier—just like the names of the new variables containing the model predictions.

The table also includes the upper confidence limits (UCL) and lower confidence limits

(LCL) for the forecasted values (95% by default).

You have now seen two approaches for obtaining the forecasted values: saving the

forecasts as new variables in the active dataset and creating a forecast table. With either

approach, you will have a number of options available for exporting your forecasts

(for example, into an Excel spreadsheet).

Summary

You have learned how to use the Expert Modeler to produce forecasts for multiple

series, and you have saved the resulting models to an external XML file. In the

next example, you will learn how to extend your forecasts as new data becomes

available—without having to rebuild your models—by using the Apply Time Series

Models procedure.

Chapter

7

Bulk Reforecasting by Applying

Saved Models

You have used the Time Series Modeler to create models for your time series data and

to produce initial forecasts based on available data. You plan to reuse these models to

extend your forecasts as more current data becomes available, so you saved the models

to an external file. You are now ready to apply the saved models.

This example is a natural extension of the previous one, “Bulk Forecasting with the

Expert Modeler” in Chapter 6 on p. 59, but can also be used independently. In this

scenario, you are an analyst for a national broadband provider who is required to

produce monthly forecasts of user subscriptions for each of 85 local markets. You have

already used the Expert Modeler to create models and to forecast three months into

the future. Your data warehouse has been refreshed with actual data for the original

forecast period, so you would like to use that data to extend the forecast horizon by

another three months.

The updated monthly historical data is collected in broadband_2.sav, and

the saved models are in broadband_models.xml. Both files can be found in the

\tutorial\sample_files\ folder where SPSS was installed. Of course, if you worked

through the previous example and saved your own model file, you can use that one

instead of broadband_models.xml.

To apply models:

Analyze

Time Series

Apply Models...

73

74

Chapter 7

Figure 7-1

Apply Time Series Models dialog box

E Click Browse, navigate to the \tutorial\sample_files\ folder where you installed SPSS,

and select broadband_models.xml (or choose your own model file saved from the

previous example).

The path to broadband_models.xml (or your own model file) should now appear on

the Models tab.

75

To incorporate new values of your time series into forecasts, the Apply Time Series

Models procedure will have to reestimate the model parameters. The structure of the

models remains the same though, so the computing time to reestimate is much quicker

than the original computing time to build the models.

The set of cases used for reestimation needs to include the new data. This will be

assured if you use the default estimation period of First Case to Last Case. If you ever

need to set the estimation period to something other than the default, you can do so by

selecting Based on time or case range in the Select Cases dialog box.

E Select First case after end of estimation period through a specified date in the Forecast

Period group.

E In the Date grid, enter 2004 for the year and 6 for the month.

The dataset contains data from January 1999 through March 2004. With the current

settings, the forecast period will be April 2004 through June 2004.

76

Chapter 7

Figure 7-2

Apply Time Series Models, Save tab

E Select (check) the entry for Predicted Values in the Save column and leave the default

value Predicted as the Variable Name Prefix.

The model predictions will be saved as new variables in the active dataset, using the

prefix Predicted for the variable names.

77

Figure 7-3

Apply Time Series Models, Plots tab

This suppresses the generation of series plots for each of the models. In this example,

we are more interested in saving the forecasts as new variables than generating plots of

the forecasts.

78

Chapter 7

Figure 7-4

Model Fit table

The Model Fit table provides fit statistics calculated across all of the models. It

provides a concise summary of how well the models, with reestimated parameters, fit

the data. For each statistic, the table provides the mean, standard error (SE), minimum,

and maximum value across all models. It also contains percentile values that provide

information on the distribution of the statistic across models. For each percentile,

that percentage of models have a value of the fit statistic below the stated value. For

instance, 95% of the models have a value of MaxAPE (maximum absolute percentage

error) that is less than 3.676.

While a number of statistics are reported, we will focus on two: MAPE (mean

absolute percentage error) and MaxAPE (maximum absolute percentage error).

Absolute percentage error is a measure of how much a dependent series varies

from its model-predicted level and provides an indication of the uncertainty in your

predictions. The mean absolute percentage error varies from a minimum of 0.669% to

a maximum of 1.026% across all models. The maximum absolute percentage error

varies from 1.742% to 4.373% across all models. So the mean uncertainty in each

model’s predictions is about 1% and the maximum uncertainty is around 2.5% (the

mean value of MaxAPE), with a worst case scenario of about 4%. Whether these

values represent an acceptable amount of uncertainty depends on the degree of risk

you are willing to accept.

79

Model Predictions

Figure 7-5

New variables containing model predictions

The Data Editor shows the new variables containing the model predictions. Although

only two are shown here, there are 85 new variables, one for each of the 85 dependent

series. The variable names consist of the default prefix Predicted, followed by the

name of the associated dependent variable (for example, Market_1), followed by a

model identifier (for example, Model_1).

Three new cases, containing the forecasts for April 2004 through June 2004, have

been added to the dataset, along with automatically generated date labels.

Summary

You have learned how to apply saved models to extend your previous forecasts when

more current data becomes available. And you have done this without rebuilding your

models. Of course, if there is reason to think that a model has changed, then you

should rebuild it using the Time Series Modeler procedure.

Chapter

8

Using the Expert Modeler to

Determine Significant Predictors

on monthly sales of men’s clothing along with several series that might be used

to explain some of the variation in sales. Possible predictors include the number of

catalogs mailed, the number of pages in the catalog, the number of phone lines open

for ordering, the amount spent on print advertising, and the number of customer service

representatives. Are any of these predictors useful for forecasting?

In this example, you will use the Expert Modeler with all of the candidate predictors

to find the best model. Since the Expert Modeler only selects those predictors that have

a statistically significant relationship with the dependent series, you will know which

predictors are useful, and you will have a model for forecasting with them. Once you

are finished, you might want to work through the next example, “Experimenting with

Predictors by Applying Saved Models” in Chapter 9 on p. 93, which investigates the

effect on sales of different predictor scenarios using the model built in this example.

The data for the current example is collected in catalog_seasfac.sav, found in the

\tutorial\sample_files\ folder where you installed SPSS.

It is always a good idea to plot your data, especially if you are only working with

one series:

Graphs

Sequence...

81

82

Chapter 8

Figure 8-1

Sequence Charts dialog box

E Select Sales of Men’s Clothing and move it into the Variables list.

E Select Date (the one labeled DATE_) and move it into the Time Axis Labels box.

E Click OK.

83

Figure 8-2

Sales of men’s clothing (in U.S. dollars)

The series exhibits numerous peaks, many of which appear to be equally spaced, as

well as a clear upward trend. The equally spaced peaks suggests the presence of a

periodic component to the time series. Given the seasonal nature of sales, with highs

typically occurring during the holiday season, you should not be surprised to find an

annual seasonal component to the data.

There are also peaks that do not appear to be part of the seasonal pattern and which

represent significant deviations from the neighboring data points. These points may be

outliers, which can and should be addressed by the Expert Modeler.

84

Chapter 8

To use the Expert Modeler:

Analyze

Time Series

Create Models...

Figure 8-3

Time Series Modeler dialog box

Representatives for the independent variables.

85

E Verify that Expert Modeler is selected in the Method drop-down list. The Expert

Modeler will automatically find the best-fitting seasonal or non-seasonal model for

the dependent variable series.

Figure 8-4

Expert Modeler Criteria dialog box, Outliers tab

E Select Detect outliers automatically and leave the default selections for the types of

outliers to detect.

Our visual inspection of the data suggested that there may be outliers. With the current

choices, the Expert Modeler will search for the most common outlier types and

incorporate any outliers into the final model. Outlier detection can add significantly

to the computing time needed by the Expert Modeler, so it is a feature that should

86

Chapter 8

be used with some discretion, particularly when modeling many series at once. By

default, outliers are not detected.

E Click Continue.

E Click the Save tab on the Time Series Modeler dialog box.

Figure 8-5

Time Series Modeler, Save tab

You will want to save the estimated model to an external XML file so that you can

experiment with different values of the predictors—using the Apply Time Series

Models procedure—without having to rebuild the model.

This will take you to a standard dialog box for saving a file.

87

E Navigate to the folder where you would like to save the XML model file, enter a

filename, and click Save.

The path to the XML model file should now appear on the Save tab.

Figure 8-6

Time Series Modeler, Statistics tab

This option produces a table displaying all of the parameters, including the significant

predictors, for the model chosen by the Expert Modeler.

88

Chapter 8

Figure 8-7

Time Series Modeler, Plots tab

E Deselect Forecasts.

In the current example, we are only interested in determining the significant predictors

and building a model. We will not be doing any forecasting.

E Select Fit values.

This option displays the predicted values in the period used to estimate the model. This

period is referred to as the estimation period, and it includes all cases in the active

dataset for this example. These values provide an indication of how well the model

fits the observed values, so they are referred to as fit values. The resulting plot will

consist of both the observed values and the fit values.

E Click OK in the Time Series Modeler dialog box.

89

Series Plot

Figure 8-8

Predicted and observed values

The predicted values show good agreement with the observed values, indicating that

the model has satisfactory predictive ability. Notice how well the model predicts the

seasonal peaks. And it does a good job of capturing the upward trend of the data.

90

Chapter 8

Figure 8-9

Model Description table

The model description table contains an entry for each estimated model and includes

both a model identifier and the model type. The model identifier consists of the

name (or label) of the associated dependent variable and a system-assigned name.

In the current example, the dependent variable is Sales of Men’s Clothing and the

system-assigned name is Model_1.

The Time Series Modeler supports both exponential smoothing and ARIMA

models. Exponential smoothing model types are listed by their commonly used names

such as Holt and Winters’ Additive. ARIMA model types are listed using the standard

notation of ARIMA(p,d,q)(P,D,Q), where p is the order of autoregression, d is the

order of differencing (or integration), and q is the order of moving-average, and

(P,D,Q) are their seasonal counterparts.

The Expert Modeler has determined that sales of men’s clothing is best described by

a seasonal ARIMA model with one order of differencing. The seasonal nature of the

model accounts for the seasonal peaks that we saw in the series plot, and the single

order of differencing reflects the upward trend that was evident in the data.

Figure 8-10

Model Statistics table

The model statistics table provides summary information and goodness-of-fit statistics

for each estimated model. Results for each model are labeled with the model identifier

provided in the model description table. First, notice that the model contains two

predictors out of the five candidate predictors that you originally specified. So it

appears that the Expert Modeler has identified two independent variables that may

prove useful for forecasting.

91

statistics, we opted only for the stationary R-squared value. This statistic provides

an estimate of the proportion of the total variation in the series that is explained by

the model and is preferable to ordinary R-squared when there is a trend or seasonal

pattern, as is the case here. Larger values of stationary R-squared (up to a maximum

value of 1) indicate better fit. A value of 0.948 means that the model does an excellent

job of explaining the observed variation in the series.

The Ljung-Box statistic, also known as the modified Box-Pierce statistic, provides

an indication of whether the model is correctly specified. A significance value less

than 0.05 implies that there is structure in the observed series which is not accounted

for by the model. The value of 0.984 shown here is not significant, so we can be

confident that the model is correctly specified.

The Expert Modeler detected nine points that were considered to be outliers. Each

of these points has been modeled appropriately, so there is no need for you to remove

them from the series.

Figure 8-11

ARIMA Model Parameters table

The ARIMA model parameters table displays values for all of the parameters in the

model, with an entry for each estimated model labeled by the model identifier. For our

purposes, it will list all of the variables in the model, including the dependent variable

and any independent variables that the Expert Modeler determined were significant.

We already know from the model statistics table that there are two significant

predictors. The model parameters table shows us that they are the Number of Catalogs

Mailed and the Number of Phone Lines Open for Ordering.

Summary

You have learned how to use the Expert Modeler to build a model and identify

significant predictors, and you have saved the resulting model to an external file. You

are now in a position to use the Apply Time Series Models procedure to experiment

92

Chapter 8

with alternative scenarios for the predictor series and see how the alternatives affect

the sales forecasts.

Chapter

9

Experimenting with Predictors by

Applying Saved Models

You’ve used the Time Series Modeler to create a model for your data and to identify

which predictors may prove useful for forecasting. The predictors represent factors

that are within your control, so you’d like to experiment with their values in the

forecast period to see how forecasts of the dependent variable are affected. This task is

easily accomplished with the Apply Time Series Models procedure, using the model

file that is created with the Time Series Modeler procedure.

This example is a natural extension of the previous example, “Using the Expert

Modeler to Determine Significant Predictors” in Chapter 8 on p. 81, but this example

can also be used independently. The scenario involves a catalog company that has

collected data about monthly sales of men’s clothing from January 1989 through

December 1998, along with several series that are thought to be potentially useful as

predictors of future sales. The Expert Modeler has determined that only two of the five

candidate predictors are significant: the number of catalogs mailed and the number of

phone lines open for ordering.

When planning your sales strategy for the next year, you have limited resources to

print catalogs and keep phone lines open for ordering. Your budget for the first three

months of 1999 allows for either 2000 additional catalogs or 5 additional phone lines

over your initial projections. Which choice will generate more sales revenue for this

three-month period?

The data for this example are collected in catalog_seasfac.sav, and

catalog_model.xml contains the model of monthly sales that is built with the Expert

Modeler. Both files can be found in the \tutorial\sample_files\ subdirectory of

the directory in which you installed SPSS. Of course, if you worked through the

previous example and saved your own model file, you can use that file instead of

catalog_model.xml.

93

94

Chapter 9

When you’re creating forecasts for dependent series with predictors, each predictor

series needs to be extended through the forecast period. Unless you know precisely

what the future values of the predictors will be, you’ll need to estimate them. You can

then modify the estimates to test different predictor scenarios. The initial projections

are easily created by using the Expert Modeler.

Analyze

Time Series

Create Models...

95

Figure 9-1

Time Series Modeler dialog box

E Select Number of Catalogs Mailed and Number of Phone Lines Open for Ordering for

the dependent variables.

96

Chapter 9

Figure 9-2

Time Series Modeler, Save tab

E In the Save column, select (check) the entry for Predicted Values, and leave the default

value Predicted for the Variable Name Prefix.

97

Figure 9-3

Time Series Modeler, Options tab

E In the Forecast Period group, select First case after end of estimation period through a

specified date.

E In the Date grid, enter 1999 for the year and 3 for the month.

The data set contains data from January 1989 through December 1998, so with the

current settings, the forecast period will be January 1999 through March 1999.

E Click OK.

98

Chapter 9

Figure 9-4

New variables containing forecasts for predictor series

Predicted_phone_Model_2, containing the model predicted values for the number of

catalogs mailed and the number of phone lines. To extend our predictor series, we

only need the values for January 1999 through March 1999, which amounts to cases

121 through 123.

E Copy the values of these three cases from Predicted_mail_Model_1 and append them

to the variable mail.

E Repeat this process for Predicted_phone_Model_2, copying the last three cases and

appending them to the variable phone.

Figure 9-5

Predictor series extended through the forecast period

The predictors have now been extended through the forecast period.

99

Testing the two scenarios of mailing more catalogs or providing more phone lines

requires modifying the estimates for the predictors mail or phone, respectively.

Because we’re only modifying the predictor values for three cases (months), it would

be easy to enter the new values directly into the appropriate cells of the Data Editor.

For instructional purposes, we’ll use the Compute Variable dialog box. When you have

more than a few values to modify, you’ll probably find the Compute Variable dialog

box more convenient.

Transform

Compute...

100

Chapter 9

Figure 9-6

Compute Variable dialog box

E Click If.

101

Figure 9-7

Compute Variable If Cases dialog box

This will limit changes to the variable mail to the cases in the forecast period.

E Click Continue.

E Click OK in the Compute Variable dialog box, and click OK when asked whether you

want to change the existing variable.

This results in increasing the values for mail—the number of catalogs mailed—by

2000 for each of the three months in the forecast period. You’ve now prepared the data

to test the first scenario, and you are ready to run the analysis.

E From the menus choose:

Analyze

Time Series

Apply Models...

102

Chapter 9

Figure 9-8

Apply Time Series Models dialog box

which you installed SPSS, and then select catalog_model.xml, or choose your own

model file (saved from the previous example).

The path to catalog_model.xml, or your own model file, should now appear on the

Models tab.

E In the Forecast Period group, select First case after end of estimation period through a

specified date.

E In the Date grid, enter 1999 for the year and 3 for the month.

103

Figure 9-9

Apply Time Series Models, Statistics tab

104

Chapter 9

Figure 9-10

Forecast table

The forecast table contains the predicted values of the dependent series, taking into

account the values of the two predictors mail and phone in the forecast period. The

table also includes the upper confidence limit (UCL) and lower confidence limit (LCL)

for the predictions.

You’ve produced the sales forecast for the scenario of mailing 2000 more catalogs

each month. You’ll now want to prepare the data for the scenario of increasing the

number of phone lines, which means resetting the variable mail to the original values

and increasing the variable phone by 5. You can reset mail by copying the values of

Predicted_mail_Model_1 in the forecast period and pasting them over the current

values of mail in the forecast period. And you can increase the number of phone

lines—by 5 for each month in the forecast period—either directly in the data editor or

using the Compute Variable dialog box, like we did for the number of catalogs.

To run the analysis, reopen the Apply Time Series Models dialog box as follows:

105

Figure 9-11

Apply Time Series Models dialog box

106

Chapter 9

Figure 9-12

Forecast tables for the two scenarios

Displaying the forecast tables for both scenarios shows that, in each of the three

forecasted months, increasing the number of mailed catalogs is expected to generate

approximately $1500 more in sales than increasing the number of phone lines that

are open for ordering. Based on the analysis, it seems wise to allocate resources to

the mailing of 2000 additional catalogs.

Chapter

10

Seasonal Decomposition

A catalog company is interested in modeling the upward trend of sales of its men’s

clothing line on a set of predictor variables (such as the number of catalogs mailed

and the number of phone lines open for ordering). To this end, the company collected

monthly sales of men’s clothing for a 10-year period. This information is collected in

catalog.sav, which is found in the \tutorial\sample_files\ subdirectory of the directory

in which you installed SPSS.

To perform a trend analysis, you must remove any seasonal variations present in the

data. This task is easily accomplished with the Seasonal Decomposition procedure.

Preliminaries

In the examples that follow, it is more convenient to use variable names rather than

variable labels.

Edit

Options...

107

108

Chapter 10

Figure 10-1

Options dialog box

E Click OK.

The Seasonal Decomposition procedure requires the presence of a periodic date

component in the active dataset—for example, a yearly periodicity of 12 (months),

a weekly periodicity of 7 (days), and so on. It’s a good idea to plot your time series

first, because viewing a time series plot often leads to a reasonable guess about the

underlying periodicity.

109

Seasonal Decomposition

Graphs

Sequence...

Figure 10-2

Sequence Charts dialog box

E Select date and move it into the Time Axis Labels list.

E Click OK.

110

Chapter 10

Figure 10-3

Sales of men’s clothing (in U.S. dollars)

The series exhibits a number of peaks, but they do not appear to be equally spaced.

This output suggests that if the series has a periodic component, it also has fluctuations

that are not periodic—the typical case for real-time series. Aside from the small-scale

fluctuations, the significant peaks appear to be separated by more than a few months.

Given the seasonal nature of sales, with typical highs during the December holiday

season, the time series probably has an annual periodicity. Also notice that the seasonal

variations appear to grow with the upward series trend, suggesting that the seasonal

variations may be proportional to the level of the series, which implies a multiplicative

model rather than an additive model.

more quantitative conclusion about the underlying periodicity.

Graphs

Time Series

Autocorrelations...

111

Seasonal Decomposition

Figure 10-4

Autocorrelations dialog box

E Click OK.

Figure 10-5

Autocorrelation plot for men

112

Chapter 10

exponential tail—a typical pattern for time series. The significant peak at a lag of 12

suggests the presence of an annual seasonal component in the data. Examination of the

partial autocorrelation function will allow a more definitive conclusion.

Figure 10-6

Partial autocorrelation plot for men

The significant peak at a lag of 12 in the partial autocorrelation function confirms the

presence of an annual seasonal component in the data.

Data

Define Dates...

113

Seasonal Decomposition

Figure 10-7

Define Dates dialog box

E Click OK.

This sets the periodicity to 12 and creates a set of date variables that are designed to

work with Trends procedures.

To run the Seasonal Decomposition procedure:

Analyze

Time Series

Seasonal Decomposition...

114

Chapter 10

Figure 10-8

Seasonal Decomposition dialog box

E Click OK.

The Seasonal Decomposition procedure creates four new variables for each of the

original variables analyzed by the procedure. By default, the new variables are added to

the active data set. The new series have names beginning with the following prefixes:

SAF. Seasonal adjustment factors, representing seasonal variation. For the multiplicative

model, the value 1 represents the absence of seasonal variation; for the additive model,

the value 0 represents the absence of seasonal variation.

SAS. Seasonally adjusted series, representing the original series with seasonal

variations removed. Working with a seasonally adjusted series, for example, allows a

trend component to be isolated and analyzed independent of any seasonal component.

STC. Smoothed trend-cycle component, which is a smoothed version of the seasonally

adjusted series that shows both trend and cyclic components.

ERR. The residual component of the series for a particular observation.

115

Seasonal Decomposition

For the present case, the seasonally adjusted series is the most appropriate, because it

represents the original series with the seasonal variations removed.

Figure 10-9

Sequence Charts dialog box

E Click Reset to clear any previous selections, and then select SAS_1 and move it into

the Variables list.

E Click OK.

116

Chapter 10

Figure 10-10

Seasonally adjusted series

The seasonally adjusted series shows a clear upward trend. A number of peaks are

evident, but they appear at random intervals, showing no evidence of an annual pattern.

Summary

Using the Seasonal Decomposition procedure, you have removed the seasonal

component of a periodic time series to produce a series that is more suitable for trend

analysis. Examination of the autocorrelations and partial autocorrelations of the time

series was useful in determining the underlying periodicity—in this case, annual.

117

Seasonal Decomposition

Related Procedures

The Seasonal Decomposition procedure is useful for removing a single seasonal

component from a periodic time series.

To perform a more in-depth analysis of the periodicity of a time series than is

provided by the partial correlation function, use the Spectral Plots procedure.

For more information, see Chapter 11.

Chapter

11

Spectral Plots

Time series representing retail sales typically have an underlying annual periodicity,

due to the usual peak in sales during the holiday season. Producing sales projections

means building a model of the time series, which means identifying any periodic

components. A plot of the time series may not always uncover the annual periodicity

because time series contain random fluctuations that often mask the underlying

structure.

Monthly sales data for a catalog company are stored in catalog.sav, which is found

in the \tutorial\sample_files\ subdirectory of the directory in which you installed SPSS.

Before proceeding with sales projections, you want to confirm that the sales data

exhibits an annual periodicity. A plot of the time series shows many peaks with an

irregular spacing, so any underlying periodicity is not evident. Use the Spectral Plots

procedure to identify any periodicity in the sales data.

To run the Spectral Plots procedure:

Graphs

Time Series

Spectral...

119

120

Chapter 11

Figure 11-1

Spectral Plots dialog box

E Select Sales of Men’s Clothing and move it into the Variables list.

E Click OK.

121

Spectral Plots

Figure 11-2

Periodogram

The plot of the periodogram shows a sequence of peaks that stand out from the

background noise, with the lowest frequency peak at a frequency of just less than

0.1. You suspect that the data contain an annual periodic component, so consider the

contribution that an annual component would make to the periodogram. Each of the

data points in the time series represents a month, so an annual periodicity corresponds

to a period of 12 in the current data set. Because period and frequency are reciprocals

of each other, a period of 12 corresponds to a frequency of 1/12 (or 0.083). So an

annual component implies a peak in the periodogram at 0.083, which seems consistent

with the presence of the peak just below a frequency of 0.1.

122

Chapter 11

Figure 11-3

Univariate statistics table

The univariate statistics table contains the data points that are used to plot the

periodogram. Notice that, for frequencies of less than 0.1, the largest value in the

Periodogram column occurs at a frequency of 0.08333—precisely what you expect

to find if there is an annual periodic component. This information confirms the

identification of the lowest frequency peak with an annual periodic component. But

what about the other peaks at higher frequencies?

123

Spectral Plots

Figure 11-4

Spectral density

The remaining peaks are best analyzed with the spectral density function, which is

simply a smoothed version of the periodogram. Smoothing provides a means of

eliminating the background noise from a periodogram, allowing the underlying

structure to be more clearly isolated.

The spectral density consists of five distinct peaks that appear to be equally spaced.

The lowest frequency peak simply represents the smoothed version of the peak at

0.08333. To understand the significance of the four higher frequency peaks, remember

that the periodogram is calculated by modeling the time series as the sum of cosine and

sine functions. Periodic components that have the shape of a sine or cosine function

(sinusoidal) show up in the periodogram as single peaks. Periodic components that are

not sinusoidal show up as a series of equally spaced peaks of different heights, with

the lowest frequency peak in the series occurring at the frequency of the periodic

component. So the four higher frequency peaks in the spectral density simply indicate

that the annual periodic component is not sinusoidal.

You have now accounted for all of the discernible structure in the spectral density

plot and conclude that the data contain a single periodic component with a period

of 12 months.

124

Chapter 11

Summary

Using the Spectral Plots procedure, you have confirmed the existence of an annual

periodic component of a time series, and you have verified that no other significant

periodicities are present. The spectral density was seen to be more useful than the

periodogram for uncovering the underlying structure, because the spectral density

smoothes out the fluctuations that are caused by the nonperiodic component of the data.

Related Procedures

The Spectral Plots procedure is useful for identifying the periodic components of

a time series.

To remove a periodic component from a time series—for instance, to perform a

trend analysis—use the Seasonal Decomposition procedure. See Chapter 10 for

details.

Appendix

A

Goodness-of-Fit Measures

This section provides definitions of the goodness-of-fit measures used in time series

modeling.

Stationary R-squared. A measure that compares the stationary part of the model to a

simple mean model. This measure is preferable to ordinary R-squared when there

is a trend or seasonal pattern. Stationary R-squared can be negative with a range of

negative infinity to 1. Negative values mean that the model under consideration

is worse than the baseline model. Positive values mean that the model under

consideration is better than the baseline model.

R-squared. An estimate of the proportion of the total variation in the series that is

explained by the model. This measure is most useful when the series is stationary.

R-squared can be negative with a range of negative infinity to 1. Negative values

mean that the model under consideration is worse than the baseline model. Positive

values mean that the model under consideration is better than the baseline model.

RMSE. Root Mean Square Error. The square root of mean square error. A measure

of how much a dependent series varies from its model-predicted level, expressed

in the same units as the dependent series.

MAPE. Mean Absolute Percentage Error. A measure of how much a dependent

series varies from its model-predicted level. It is independent of the units used and

can therefore be used to compare series with different units.

MAE. Mean absolute error. Measures how much the series varies from its

model-predicted level. MAE is reported in the original series units.

MaxAPE. Maximum Absolute Percentage Error. The largest forecasted error,

expressed as a percentage. This measure is useful for imagining a worst-case

scenario for your forecasts.

MaxAE. Maximum Absolute Error. The largest forecasted error, expressed in the

same units as the dependent series. Like MaxAPE, it is useful for imagining the

worst-case scenario for your forecasts. Maximum absolute error and maximum

125

126

Appendix A

absolute percentage error may occur at different series points--for example, when

the absolute error for a large series value is slightly larger than the absolute error

for a small series value. In that case, the maximum absolute error will occur at

the larger series value and the maximum absolute percentage error will occur at

the smaller series value.

Normalized BIC. Normalized Bayesian Information Criterion. A general measure

of the overall fit of a model that attempts to account for model complexity. It is

a score based upon the mean square error and includes a penalty for the number

of parameters in the model and the length of the series. The penalty removes the

advantage of models with more parameters, making the statistic easy to compare

across different models for the same series.

Appendix

B

Outlier Types

This section provides definitions of the outlier types used in time series modeling.

Additive. An outlier that affects a single observation. For example, a data coding

error might be identified as an additive outlier.

Level shift. An outlier that shifts all observations by a constant, starting at a

particular series point. A level shift could result from a change in policy.

Innovational. An outlier that acts as an addition to the noise term at a particular

series point. For stationary series, an innovational outlier affects several

observations. For nonstationary series, it may affect every observation starting at a

particular series point.

Transient. An outlier whose impact decays exponentially to 0.

Seasonal additive. An outlier that affects a particular observation and all

subsequent observations separated from it by one or more seasonal periods. All

such observations are affected equally. A seasonal additive outlier might occur if,

beginning in a certain year, sales are higher every January.

Local trend. An outlier that starts a local trend at a particular series point.

Additive patch. A group of two or more consecutive additive outliers. Selecting this

outlier type results in the detection of individual additive outliers in addition to

patches of them.

127

Appendix

C

Guide to ACF/PACF Plots

The plots shown here are those of pure or theoretical ARIMA processes. Here are

some general guidelines for identifying the process:

Nonstationary series have an ACF that remains significant for half a dozen or more

lags, rather than quickly declining to 0. You must difference such a series until

it is stationary before you can identify the process.

Autoregressive processes have an exponentially declining ACF and spikes in the

first one or more lags of the PACF. The number of spikes indicates the order of

the autoregression.

Moving average processes have spikes in the first one or more lags of the ACF

and an exponentially declining PACF. The number of spikes indicates the order

of the moving average.

Mixed (ARMA) processes typically show exponential declines in both the ACF

and the PACF.

At the identification stage, you do not need to worry about the sign of the ACF or

PACF, or about the speed with which an exponentially declining ACF or PACF

approaches 0. These depend upon the sign and actual value of the AR and MA

coefficients. In some instances, an exponentially declining ACF alternates between

positive and negative values.

ACF and PACF plots from real data are never as clean as the plots shown here. You

must learn to pick out what is essential in any given plot. Always check the ACF and

PACF of the residuals, in case your identification is wrong. Bear in mind that:

Seasonal processes show these patterns at the seasonal lags (the multiples of the

seasonal period).

129

130

Appendix C

You are entitled to treat nonsignificant values as 0. That is, you can ignore

values that lie within the confidence intervals on the plots. You do not have to

ignore them, however, particularly if they continue the pattern of the statistically

significant values.

An occasional autocorrelation will be statistically significant by chance alone. You

can ignore a statistically significant autocorrelation if it is isolated, preferably at

a high lag, and if it does not occur at a seasonal lag.

Consult any text on ARIMA analysis for a more complete discussion of ACF and

PACF plots.

ARIMA(0,0,1), θ>0

ACF PACF

131

ARIMA(0,0,1), θ<0

ACF PACF

ARIMA(0,0,2), θ1θ2>0

ACF PACF

132

Appendix C

ARIMA(1,0,0), φ>0

ACF PACF

ARIMA(1,0,0), φ<0

ACF PACF

133

ACF PACF

ARIMA(2,0,0), φ1φ2>0

ACF PACF

134

Appendix C

ACF

Bibliography

Forecasting and control, 3rd ed. Englewood Cliffs, N.J.: Prentice Hall.

Gardner, E. S. 1985. Exponential smoothing: The state of the art. Journal of

Forecasting, 4, 1–28.

Pena, D., G. C. Tiao, and R. S. Tsay, eds. 2001. A course in time series analysis.

New York: John Wiley and Sons.

135

Index

ACF constant, 16

in Apply Time Series Models, 38, 41 differencing orders, 16

in Time Series Modeler, 22, 24 moving average orders, 16

plots for pure ARIMA processes, 129 outliers, 20

additive outlier, 127 seasonal orders, 16

in Time Series Modeler, 12, 20 transfer functions, 18

additive patch outlier, 127 autocorrelation function

in Time Series Modeler, 12, 20 in Apply Time Series Models, 38, 41

Apply Time Series Models, 33, 73, 93 in Time Series Modeler, 22, 24

best- and poorest-fitting models, 43 plots for pure ARIMA processes, 129

Box-Ljung statistic, 38 autoregression

confidence intervals, 41, 47 ARIMA models, 16

estimation period, 36

fit values, 41

Box-Ljung statistic

forecast period, 36, 75, 102

in Apply Time Series Models, 38

forecast table, 104

in Time Series Modeler, 22, 91

forecasts, 38, 41, 103

Brown’s exponential smoothing model, 13

goodness-of-fit statistics, 38, 41, 78

missing values, 47

model fit table, 78 confidence intervals

model parameters, 38 in Apply Time Series Models, 41, 47

new variable names, 45, 79 in Time Series Modeler, 24, 30

reestimate model parameters, 36, 74

residual autocorrelation function, 38, 41 damped exponential smoothing model, 13

residual partial autocorrelation function, 38, 41 difference transformation

saving predictions, 45, 76 ARIMA models, 16

saving reestimated models in XML, 45

statistics across all models, 38, 41, 78

estimation period, 3

ARIMA model parameters table

in Apply Time Series Models, 36

in Time Series Modeler, 91

in Time Series Modeler, 9, 62

ARIMA models, 8, 16

events, 11

autoregressive orders, 16

in Time Series Modeler, 10

137

138

Index

limiting the model space, 10, 63 in Time Series Modeler, 12, 20

outliers, 12, 85 local trend outlier, 127

exponential smoothing models, 8, 13 in Time Series Modeler, 12, 20

log transformation

in Time Series Modeler, 13, 16, 18

fit values

in Apply Time Series Models, 41

in Time Series Modeler, 24, 88 MAE, 125

forecast period in Apply Time Series Models, 38, 41

in Apply Time Series Models, 36, 75, 102 in Time Series Modeler, 22, 24

in Time Series Modeler, 9, 30, 62, 64 MAPE, 125

forecast table in Apply Time Series Models, 38, 41, 78

in Apply Time Series Models, 104 in Time Series Modeler, 22, 24, 67

in Time Series Modeler, 70 MaxAE, 125

forecasts in Apply Time Series Models, 38, 41

in Apply Time Series Models, 38, 41, 103 in Time Series Modeler, 22, 24

in Time Series Modeler, 22, 24, 66 MaxAPE, 125

in Apply Time Series Models, 38, 41, 78

in Time Series Modeler, 22, 24, 67

goodness of fit maximum absolute error, 125

definitions, 125 in Apply Time Series Models, 38, 41

in Apply Time Series Models, 38, 41, 78 in Time Series Modeler, 22, 24

in Time Series Modeler, 22, 24, 66 maximum absolute percentage error, 125

in Apply Time Series Models, 38, 41, 78

harmonic analysis, 53 in Time Series Modeler, 22, 24, 67

historical data mean absolute error, 125

in Apply Time Series Models, 41 in Apply Time Series Models, 38, 41

in Time Series Modeler, 24 in Time Series Modeler, 22, 24

historical period, 3 mean absolute percentage error, 125

holdout cases, 3 in Apply Time Series Models, 38, 41, 78

Holt’s exponential smoothing model, 13 in Time Series Modeler, 22, 24, 67

missing values

in Apply Time Series Models, 47

innovational outlier, 127 in Time Series Modeler, 30

in Time Series Modeler, 12, 20 model description table

integration in Time Series Modeler, 90

ARIMA models, 16

139

Index

in Apply Time Series Models, 78 in Time Series Modeler, 10, 13, 16, 18

model names

in Time Series Modeler, 30

R2, 125

model parameters

in Apply Time Series Models, 38, 41

in Apply Time Series Models, 38

in Time Series Modeler, 22, 24

in Time Series Modeler, 22, 87

reestimate model parameters

model statistics table

in Apply Time Series Models, 36, 74

in Time Series Modeler, 90

residuals

models

in Apply Time Series Models, 38, 41

ARIMA, 8, 16

in Time Series Modeler, 22, 24

Expert Modeler, 8

RMSE, 125

exponential smoothing, 8, 13

in Apply Time Series Models, 38, 41

moving average

in Time Series Modeler, 22, 24

ARIMA models, 16

root mean square error, 125

in Apply Time Series Models, 38, 41

natural log transformation in Time Series Modeler, 22, 24

in Time Series Modeler, 13, 16, 18

normalized BIC (Bayesian information criterion),

125 save

in Apply Time Series Models, 38, 41 model predictions, 28, 45

in Time Series Modeler, 22, 24 model specifications in XML, 28

new variable names, 28, 45

reestimated models in XML, 45

outliers

seasonal additive outlier, 127

ARIMA models, 20

in Time Series Modeler, 12, 20

definitions, 127

Seasonal Decomposition, 49, 51–52

Expert Modeler, 12, 85

assumptions, 49

computing moving averages, 49

PACF create variables, 51

in Apply Time Series Models, 38, 41 models, 49

in Time Series Modeler, 22, 24 new variables, 114

plots for pure ARIMA processes, 129 periodic date component, 108

partial autocorrelation function related procedures, 117

in Apply Time Series Models, 38, 41 saving new variables, 51

in Time Series Modeler, 22, 24 seasonal difference transformation

plots for pure ARIMA processes, 129 ARIMA models, 16

140

Index

ARIMA models, 16 new variable names, 28, 70

simple exponential smoothing model, 13 outliers, 12, 20, 85

simple seasonal exponential smoothing model, 13 periodicity, 10, 13, 16, 18

Spectral Plots, 53, 56 residual autocorrelation function, 22, 24

assumptions, 53 residual partial autocorrelation function, 22, 24

bivariate spectral analysis, 55 saving model specifications in XML, 28, 65, 86

centering transformation, 55 saving predictions, 28, 65

periodogram, 121 series transformation, 13, 16, 18

related procedures, 124 statistics across all models, 22, 24, 66, 68

spectral density, 121 transfer functions, 18

spectral windows, 54 transfer functions, 18

square root transformation delay, 18

in Time Series Modeler, 13, 16, 18 denominator orders, 18

stationary R2, 125 difference orders, 18

in Apply Time Series Models, 38, 41 numerator orders, 18

in Time Series Modeler, 22, 24, 90 seasonal orders, 18

transient outlier, 127

Time Series Modeler, 5 in Time Series Modeler, 12, 20

ARIMA, 8, 16

ARIMA model parameters table, 91 validation period, 3

best- and poorest-fitting models, 26 variable names

Box-Ljung statistic, 22 in Apply Time Series Models, 45

confidence intervals, 24, 30 in Time Series Modeler, 28

estimation period, 9, 62

events, 10

Winters’ exponential smoothing model

Expert Modeler, 8, 59, 81

additive, 13

exponential smoothing, 8, 13

multiplicative, 13

fit values, 24, 88

forecast period, 9, 30, 62, 64

forecast table, 70 XML

forecasts, 22, 24, 66 saving reestimated models in XML, 45

goodness-of-fit statistics, 22, 24, 66, 90 saving time series models in XML, 28, 65, 86

missing values, 30

model description table, 90

model names, 30

model parameters, 22, 87

- time seriesUploaded byDavid James
- Time Series With SPSSUploaded byHery Mulyana
- Time Series Analysis RUploaded byFrancesca M
- SPSS ForecastingUploaded byBon Vt
- Users Guide SPSS ModelerUploaded byAnca Trusca
- Methods of Time SeriesUploaded bycul239
- Time series analysis in RUploaded byCristiano Christofaro
- 14.2.6 SPSS Time Series Analysis and ForecastingUploaded byalvika
- A First Course on Time Series AnalysisUploaded bybauhauss
- SPSS Statistcs Base User's Guide 17.0Uploaded byHasan
- SPSS for BeginnersUploaded byBehrooz Saghafi
- Time Series BookUploaded byUsman Khalid
- Time SeriesUploaded byMahobe
- Regression Models by SPSS UseUploaded byFaizan Ahmad
- Multivariate Data Analysis Using SPSSUploaded byk9denden
- Time Series Analysis.hamiltonUploaded byJhon Diego Vargas Oliveira
- Statistical Analysis Using SPSSUploaded bySherwan R Shal
- Time Series AnalysisUploaded byKonstantin
- SPSS Modeler BookUploaded byHenry Grimes
- 1468053450-Practical Time Series ForecastingUploaded byabernalol
- SPSS for BeginnersUploaded byHasan
- SPSS Modeler Tutorial 1Uploaded byxor657
- SPSS Base Users Guide 16.0Uploaded byHasan
- SPSS Data AnalysisUploaded byRosle Mohidin
- Applied Time Series Modelling and ForecastingUploaded byalicorpanao
- Chatfield Ch. - Time-Series Forecasting(2000)(1st Edition)(280)Uploaded bylittle_john85
- Ts Econometric sUploaded byDwi Mayu
- Analyzing and Forecasting Time-series Data PPT @ BEC DOMSUploaded byBabasab Patil (Karrisatte)
- spssUploaded byvenkataswamynath channa
- Jorgenson D.W. Econometrics (MIT, 2000)(ISBN 0262100940)(493s)_GLUploaded byathulahsenaratne

- 64 Interview QuestionsUploaded byshivakumar N
- Customer Satisfaction of commercial Indian Bank Using SERVQUAL ModelUploaded byFaizan Ahmad
- IMC process with examplesUploaded byAdil
- Regression Models by SPSS UseUploaded byFaizan Ahmad
- SPSS Tutorial: A detailed ExplanationUploaded byFaizan Ahmad
- Service Quality DilemmaUploaded byFaizan Ahmad
- 22620124 Marketing of Financial ServicesUploaded bySameer Gopal
- Customer satisfaction in retail: Servqual ApproachUploaded byFaizan Ahmad

- homogen sikapUploaded byRoyan Thanthowie
- MMM15 Part 3 - Moderation.pdfUploaded bySana Javaid
- Introduction to SPSS 1Uploaded bynahanuna
- SPSS17_CrossSectional_TutorialUploaded byMehboob Ali
- IntroUploaded byonmcv
- SPSS for the ClassroomUploaded byJennifer L. Magboo-Oestar
- RM Practical Session IUploaded byfahadfiaz
- manovaUploaded byIoana Pavaliu
- EMiner ComparisonsUploaded byLoneli Costaner
- Chapter 7 SlidesUploaded byParth Rajesh Sheth
- ENUS213-206Uploaded byelias.ancares8635
- ITLS5200 Student GuideUploaded byThanh Tu Nguyen
- SPSS Statistics v 17 Network License Administrators GuideUploaded byDevin Garrett
- Basic Statistics for the Utterly ConfusedUploaded bywaqarnasirkhan
- Adept PovertyUploaded byKitchie Hermoso
- Linggi-CorporateSlideISSB v1.3.pptxUploaded bypatahul
- Applications GuideUploaded byAndi Firman Mubarak
- Modeler Db MiningUploaded byKoala
- SPSS Usersguide12 SyntaxUploaded bylfish664899
- MB0040 Statistics for Management Answer KeysUploaded byullasp85
- Multiple ResponseUploaded byrbmalasa
- Introduction to SPSSUploaded byCosy Cheung
- Excel 2010 Data AnalysisUploaded byHugo Martins
- Drews 1999 Wildlife in Costa Rican Households - HSI ReportUploaded bycdrews
- Tutorial rUploaded byCarmen María Alarcón Alarcón
- SPSS Lab ManualUploaded byarchielferrer3975
- Student User Guide for SpssUploaded byakita1610
- WHOQOL-BREF and Scoring Instructions.pdfUploaded byNYONGKER
- Conjoint SpssUploaded bystatsoumya
- How to Use SPSS Syntax, An Overview of Common Commands - 2014 Te Grotenhuis, Manfred & Visscher, ChrisUploaded byNarcisa Ortiz