
DX71-04H-HistRSM-P1 Rev. 4/16/07

Historical Data RSM Tutorial


(Part 1 – The Basics)
Introduction
In this tutorial you will see how the tool of regression in Design-Expert® software,
intended for response surface methods (RSM), can be applied to historical data. We
don’t recommend you work with such happenstance variables if there’s any possibility
of performing a designed experiment. However, if you feel you must, why not take
advantage of how easy Design-Expert makes it to develop predictive models and graph
responses, as you will see by doing this tutorial. It will be assumed that at this stage
you’ve mastered many of the program features by completing the preceding tutorials. At
the very least you ought to first do the one-factor RSM tutorials, basic and advanced,
prior to starting this one.

The historical data for this tutorial, shown below, comes from the U.S. Bureau of Labor
Statistics via James Longley (An Appraisal of Least Squares Programs for the Electronic
Computer from the Point of View of the User, Journal of the American Statistical
Association, 62 (1967): 819-841). As discussed in RSM Simplified (Mark J. Anderson
and Patrick J. Whitcomb, Productivity, Inc., New York: Chapter 2), it presents some
interesting challenges for regression modeling.
Run  A: Prices   B: GNP  C: Unemp.  D: Military     E: Pop.       F: Time  Employ.
#    (1954=100)                     (Armed Forces)  (People >14)  (Year)   (Total)
1    83          234289  2356       1590            107608        1947     60323
2    88.5        259426  2325       1456            108632        1948     61122
3    88.2        258054  3682       1616            109773        1949     60171
4    89.5        284599  3351       1650            110929        1950     61187
5    96.2        328975  2099       3099            112075        1951     63221
6    98.1        346999  1932       3594            113270        1952     63639
7    99          365385  1870       3547            115094        1953     64989
8    100         363112  3578       3350            116219        1954     63761
9    101.2       397469  2904       3048            117388        1955     66019
10   104.6       419180  2822       2857            118734        1956     67857
11   108.4       442769  2936       2798            120445        1957     68169
12   110.8       444546  4681       2637            121950        1958     66513
13   112.6       482704  3813       2552            123366        1959     68655
14   114.2       502601  3931       2514            125368        1960     69564
15   115.7       518173  4806       2572            127852        1961     69331
16   116.9       554894  4007       2827            130081        1962     70551
Longley data on U.S. economy from 1947-1962

Assume that the objective for analysis of this data is to predict future employment as a
function of leading economic indicators – the factors labeled A through F in the table
above. Longley’s goal was different: He wanted to test regression software circa 1967
for round-off error due to highly correlated inputs. Will Design-Expert be up to the
challenge? We will see!

Design-Expert 7.1 User’s Guide Historical Data RSM Tutorial – Part 1 • 1
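To get a feel for how strongly the Longley inputs move together, here is a minimal Python sketch (not part of Design-Expert) that computes Pearson correlations among three of the factors in the table: B (GNP), E (population) and F (year). The function name pearson and the variable names are our own choices for illustration.

```python
import math

# Three of the six Longley inputs, copied from the table above.
gnp = [234289, 259426, 258054, 284599, 328975, 346999, 365385, 363112,
       397469, 419180, 442769, 444546, 482704, 502601, 518173, 554894]  # factor B
pop = [107608, 108632, 109773, 110929, 112075, 113270, 115094, 116219,
       117388, 118734, 120445, 121950, 123366, 125368, 127852, 130081]  # factor E
year = list(range(1947, 1963))                                          # factor F


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_gnp_year = pearson(gnp, year)
r_gnp_pop = pearson(gnp, pop)
r_pop_year = pearson(pop, year)

print(f"r(GNP, Year) = {r_gnp_year:.4f}")
print(f"r(GNP, Pop.) = {r_gnp_pop:.4f}")
print(f"r(Pop., Year) = {r_pop_year:.4f}")
```

Correlations this close to 1 mean the factors carry nearly the same information, which is what makes the regression arithmetic so sensitive to round-off.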

Let’s begin by setting up this “experiment” (quotes added to emphasize that this is not
really an experiment, but rather an after-the-fact analysis of happenstance data).

Design the “Experiment”


Start Design-Expert. You will then see the main menu and icon bar. To save you time
typing stuff, we will re-build a previously saved design rather than enter it from scratch.
Using your mouse, press the Open Design icon (or select File, Open Design).

Main menu and Tool bar – Open Design icon highlighted

The file name is Longley.dx7. Click on it and press Open.

Opening the Longley data

The data should now appear on your screen. To re-build this design (and thus see how it
was created), press the blank-sheet icon on the left of the toolbar (or select File, New
Design).

New Design icon


When prompted by Design-Expert to “Use previous design info,” click Yes.

Re-using previous design

Now you see how this design was created via the Response Surface tab and Historical
Data option.

Setting up design on historical data

Note that for each of the 6 numeric factors we entered the name, units and range from
minimum (“Min”) to maximum (“Max”). Before moving ahead, you must also tell
Design-Expert how many rows of data you want to type or paste into the design layout.
In this case there are 16 rows.

Entry for rows

Press Continue to accept all the entries on this screen. You now see details on the
response(s) – in this case only the one we will study.



Response entry

Press Continue to see the resulting design layout in run order (ignore the column
labeled “Std” because there will be no standard order for happenstance data).

A Peculiarity on Pasting Data


You could now type in all the data for factor levels and resulting responses, row-by-row.
(Don’t worry: We won’t make you do this!) However, in most cases the data will
already be available in a Microsoft Windows-based spreadsheet. Then simply drag
over this data, copy it to the Windows clipboard, and Edit, Paste (or right-click and
Paste as shown below) into the design layout within Design-Expert after first dragging
over the top row, as shown below, or over all the destination cells.

Right way to paste data into Design-Expert (top-row of cells pre-selected)

If you simply click the upper left cell in the empty run-sheet, the program will only paste
one value.


Analyze the Results


Normally you’d save your work at this stage, but since we already did this, simply
re-open our file: Press the Open Design icon and double-click Longley.dx7. Click
No to pass on the opportunity to save what you did previously.

Last chance to save (say “No” in this case)

Before we get started, be forewarned that you will now get exposed to quite a number of
statistics related to least squares regression and analysis of variance (ANOVA). If you
are coming into this cold, pick up a copy of RSM Simplified and keep it handy. For a
good guided tour of these statistics for RSM analysis, attend the Stat-Ease workshop
RSM for Process Optimization. Details on this computer-intensive hands-on class,
including what’s needed as a prerequisite, can be found at www.statease.com.

Under the Analysis node click the branch labeled Employment. Design-Expert then
displays a screen for transforming the response. However, as noted by the program, the
range of response in this case is so small that there would be little advantage to applying
any transformation.

Information about the response shown on the Transformation screen

Go ahead and press Fit Summary. Design-Expert then evaluates each degree of the
model from the mean on up. In this case, the best that can be done is linear. Anything
above that becomes aliased.



Fit Summary – only the linear model possible in this case
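One way to see why nothing beyond linear can be fit: a model cannot have more coefficients than there are runs. The Python sketch below simply counts coefficients for each model order with 6 factors and compares the count to the 16 rows of data. This is only a necessary condition worked out by hand, not the aliasing check that Design-Expert performs; the function name coefficient_count is our own.

```python
from math import comb

RUNS = 16      # rows of historical data
FACTORS = 6    # numeric factors A through F


def coefficient_count(k, order):
    """Number of coefficients (including the intercept) in a full polynomial model."""
    counts = {
        "mean": 1,
        "linear": 1 + k,
        "2FI": 1 + k + comb(k, 2),            # linear plus two-factor interactions
        "quadratic": 1 + 2 * k + comb(k, 2),  # 2FI plus squared terms
    }
    return counts[order]


for order in ("mean", "linear", "2FI", "quadratic"):
    p = coefficient_count(FACTORS, order)
    verdict = "estimable in principle" if p <= RUNS else "more terms than runs"
    print(f"{order:<10} {p:>3} coefficients -> {verdict}")
```

With 22 coefficients for the two-factor interaction model and 28 for the quadratic, 16 runs can support only the 7-coefficient linear model in full.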

You may as well press on to Model.

The linear model chosen

It’s all set the way the software suggested. Notice that many of the two-factor
interactions cannot be estimated due to aliasing, symbolized by the red tildes (~). Hold
on to your hats (because this data is really just a lot of hot air!) and press ANOVA for
the analysis of variance.


Analysis of variance (ANOVA)

Notice that although the overall model is significant, some terms are not.

Some statistical details on how Design-Expert does analysis of variance


You may have noticed that this ANOVA is labeled as “[Partial sum of squares - Type
III]”. This approach to ANOVA, done by default, causes the sums of squares (SS) for the
individual terms to come up short of the overall model SS when analyzing data from a
non-orthogonal array, such as historical data. If you want the term SS to add up to the
model SS, go to Edit, Preferences and change the default to Sequential (Type I). However,
we do not recommend this approach because it favors the first term put into the model. For
example, in this case the ANOVA by partial SS (Type III, the default in Design-Expert) for
the response (employment total) calculates the prob>F p-value for A as 0.8631 (F=0.031), as
seen above, which is not significant. Recalculating the ANOVA by sequential sum of
squares (Type I) changes the p to <0.0001 (F=1876), which looks highly significant, but
only because this term (the main effect of factor A) is fit first. That conclusion simply is not correct.
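The difference between the two kinds of sums of squares can be sketched in a few lines of Python. This is our own illustration, not Design-Expert code: the helper names solve and sse are hypothetical, and exact fractions are used so that round-off from the highly correlated inputs cannot creep in. The partial (Type III) SS for A is the drop in residual SS when A is added last; the sequential (Type I) SS is the drop when A is added first. Both are divided by the residual mean square of the full linear model.

```python
from fractions import Fraction

# Longley data from the table above: factors A..F, then the response.
DATA = [
    (83.0, 234289, 2356, 1590, 107608, 1947, 60323),
    (88.5, 259426, 2325, 1456, 108632, 1948, 61122),
    (88.2, 258054, 3682, 1616, 109773, 1949, 60171),
    (89.5, 284599, 3351, 1650, 110929, 1950, 61187),
    (96.2, 328975, 2099, 3099, 112075, 1951, 63221),
    (98.1, 346999, 1932, 3594, 113270, 1952, 63639),
    (99.0, 365385, 1870, 3547, 115094, 1953, 64989),
    (100.0, 363112, 3578, 3350, 116219, 1954, 63761),
    (101.2, 397469, 2904, 3048, 117388, 1955, 66019),
    (104.6, 419180, 2822, 2857, 118734, 1956, 67857),
    (108.4, 442769, 2936, 2798, 120445, 1957, 68169),
    (110.8, 444546, 4681, 2637, 121950, 1958, 66513),
    (112.6, 482704, 3813, 2552, 123366, 1959, 68655),
    (114.2, 502601, 3931, 2514, 125368, 1960, 69564),
    (115.7, 518173, 4806, 2572, 127852, 1961, 69331),
    (116.9, 554894, 4007, 2827, 130081, 1962, 70551),
]


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(n):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return [row[n] for row in m]


def sse(predictors, response=6):
    """Exact residual SS of column `response` on the given columns plus an intercept."""
    x = [[Fraction(1)] + [Fraction(str(row[j])) for j in predictors] for row in DATA]
    y = [Fraction(str(row[response])) for row in DATA]
    p = len(predictors) + 1
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(yi * yi for yi in y) - sum(b * c for b, c in zip(beta, xty))


everything = [0, 1, 2, 3, 4, 5]                       # column indices of A..F
sse_full = sse(everything)
mse = sse_full / (len(DATA) - len(everything) - 1)    # residual mean square, 9 df

ss_partial = sse([1, 2, 3, 4, 5]) - sse_full          # Type III: A entered last
ss_sequential = sse([]) - sse([0])                    # Type I: A entered first

f_partial = float(ss_partial / mse)
f_sequential = float(ss_sequential / mse)
print(f"Partial (Type III) F for A:  {f_partial:.3f}")
print(f"Sequential (Type I) F for A: {f_sequential:.0f}")
```

If the sketch mirrors the program's arithmetic, the two F-values should agree, up to rounding, with the 0.031 and 1876 quoted above.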

Assuming factor A (prices) is least significant of all as indicated by the default
ANOVA (partial SS), let’s see what happens with it removed. However, before we do,
on the Bookmarks click R-Squared to look at some statistics (shown below) that
will help us compare what happens before and after reducing the model.

Model statistics

While you’re at it, also bookmark the Coefficients estimates.



Coefficient estimates for linear model

Notice the huge VIFs (variance inflation factors). A value of 1 is ideal (orthogonal), but
a VIF less than 10 is generally accepted. VIFs above 1000, such as that observed for
factor B (GNP), indicate severe multicollinearity in the model coefficients (that’s bad!).
In the follow-up tutorial (Part 2) based on this same Longley data, we will delve more
into this and other statistics generated by Design-Expert for purposes of design
evaluation. For now, try right-clicking any of the VIF results to access context-sensitive
Help, or go to Help on the main menu and search on this statistic. You will likely find
some details there.
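For readers who want to see where a VIF comes from, here is a minimal Python sketch under the usual textbook definition: regress each factor on all the others and take VIF = 1/(1 - R-squared), which equals the factor's total SS divided by its residual SS. We assume this definition matches what the program reports for a linear model; the helper names solve and sse are hypothetical, and exact fractions are used to sidestep round-off.

```python
from fractions import Fraction

# Longley factors A..F from the table above (the response is not needed here).
DATA = [
    (83.0, 234289, 2356, 1590, 107608, 1947),
    (88.5, 259426, 2325, 1456, 108632, 1948),
    (88.2, 258054, 3682, 1616, 109773, 1949),
    (89.5, 284599, 3351, 1650, 110929, 1950),
    (96.2, 328975, 2099, 3099, 112075, 1951),
    (98.1, 346999, 1932, 3594, 113270, 1952),
    (99.0, 365385, 1870, 3547, 115094, 1953),
    (100.0, 363112, 3578, 3350, 116219, 1954),
    (101.2, 397469, 2904, 3048, 117388, 1955),
    (104.6, 419180, 2822, 2857, 118734, 1956),
    (108.4, 442769, 2936, 2798, 120445, 1957),
    (110.8, 444546, 4681, 2637, 121950, 1958),
    (112.6, 482704, 3813, 2552, 123366, 1959),
    (114.2, 502601, 3931, 2514, 125368, 1960),
    (115.7, 518173, 4806, 2572, 127852, 1961),
    (116.9, 554894, 4007, 2827, 130081, 1962),
]
NAMES = "ABCDEF"


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(n):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return [row[n] for row in m]


def sse(predictors, response):
    """Exact residual SS of column `response` on the given columns plus an intercept."""
    x = [[Fraction(1)] + [Fraction(str(row[j])) for j in predictors] for row in DATA]
    y = [Fraction(str(row[response])) for row in DATA]
    p = len(predictors) + 1
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(yi * yi for yi in y) - sum(b * c for b, c in zip(beta, xty))


vif = {}
for j, name in enumerate(NAMES):
    others = [k for k in range(len(NAMES)) if k != j]
    # Total SS over residual SS is the same as 1 / (1 - R^2) for factor j.
    vif[name] = float(sse([], j) / sse(others, j))

for name in NAMES:
    print(f"VIF for {name}: {vif[name]:.1f}")
```

A factor that the others predict almost perfectly leaves very little residual SS, so its VIF balloons.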

Press back to Model, right-click on A-Prices and Exclude it, or simply
double-click on this term to take off the model (“M”) designation.

Excluding an insignificant term

You can now go back to ANOVA, look for the next least significant term, exclude it,
etc. However, this backward elimination process can be done automatically in
Design-Expert. Here’s how. First, reset the Process Order to Linear.


Resetting model to linear

Now change the Selection to Backward.

Specifying backward stepwise regression

Notice that a new field called “Alpha Out” appears. By default the program will remove
the least significant term step-by-step so long as its p-value exceeds the risk level
(symbolized with the Greek letter alpha by statisticians) of 0.1. Let’s be a
bit more conservative by changing Alpha Out to 0.05.

Changing the risk level alpha for taking out model terms via backward selection

Now press ANOVA to see what happens.

Results from backward regression



Not surprisingly the program first removed A and then E – that’s it. Take a look at the
ANOVA table that follows to see that all the other terms come out significant.

ANOVA for backward-reduced model
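The backward procedure just described can be sketched in Python as follows. This is a hedged illustration, not the program's actual algorithm: we assume each step refits the model, computes a partial (Type III) F for every remaining term, and drops the weakest term while its p-value exceeds Alpha Out. The helpers solve, sse, p_value and backward are hypothetical names; sums of squares are computed in exact fractions, and the p-value comes from a simple numerical integration of the Student-t density.

```python
import math
from fractions import Fraction

# Longley data from the table above: factors A..F, then the response.
DATA = [
    (83.0, 234289, 2356, 1590, 107608, 1947, 60323),
    (88.5, 259426, 2325, 1456, 108632, 1948, 61122),
    (88.2, 258054, 3682, 1616, 109773, 1949, 60171),
    (89.5, 284599, 3351, 1650, 110929, 1950, 61187),
    (96.2, 328975, 2099, 3099, 112075, 1951, 63221),
    (98.1, 346999, 1932, 3594, 113270, 1952, 63639),
    (99.0, 365385, 1870, 3547, 115094, 1953, 64989),
    (100.0, 363112, 3578, 3350, 116219, 1954, 63761),
    (101.2, 397469, 2904, 3048, 117388, 1955, 66019),
    (104.6, 419180, 2822, 2857, 118734, 1956, 67857),
    (108.4, 442769, 2936, 2798, 120445, 1957, 68169),
    (110.8, 444546, 4681, 2637, 121950, 1958, 66513),
    (112.6, 482704, 3813, 2552, 123366, 1959, 68655),
    (114.2, 502601, 3931, 2514, 125368, 1960, 69564),
    (115.7, 518173, 4806, 2572, 127852, 1961, 69331),
    (116.9, 554894, 4007, 2827, 130081, 1962, 70551),
]
NAMES = "ABCDEF"


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(n):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return [row[n] for row in m]


def sse(predictors, response=6):
    """Exact residual SS of column `response` on the given columns plus an intercept."""
    x = [[Fraction(1)] + [Fraction(str(row[j])) for j in predictors] for row in DATA]
    y = [Fraction(str(row[response])) for row in DATA]
    p = len(predictors) + 1
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(yi * yi for yi in y) - sum(b * c for b, c in zip(beta, xty))


def p_value(f, df):
    """P(F(1, df) > f), from Simpson integration of the Student-t density."""
    t = math.sqrt(max(f, 0.0))
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    steps = 2000                                   # even number of intervals
    h = t / steps
    dens = [c * (1 + (i * h) ** 2 / df) ** (-(df + 1) / 2) for i in range(steps + 1)]
    area = h / 3 * (dens[0] + dens[-1] + 4 * sum(dens[1:-1:2]) + 2 * sum(dens[2:-1:2]))
    return max(0.0, 1 - 2 * area)


def backward(alpha_out=0.05):
    """Drop the weakest term until every remaining p-value is at or below alpha_out."""
    terms, removed = list(range(6)), []
    while terms:
        full = sse(terms)
        df = len(DATA) - len(terms) - 1            # residual degrees of freedom
        f = {t: float((sse([u for u in terms if u != t]) - full) * df / full)
             for t in terms}                       # partial F for each term
        weakest = min(f, key=f.get)                # smallest F = largest p-value
        if p_value(f[weakest], df) <= alpha_out:
            break
        terms.remove(weakest)
        removed.append(weakest)
    return [NAMES[t] for t in terms], [NAMES[t] for t in removed]


kept, removed = backward(alpha_out=0.05)
print("Removed, in order:", removed)
print("Kept:", kept)
```

Because every term at a given step shares the same degrees of freedom, picking the smallest F is the same as picking the largest p-value; the p-value itself is needed only for the comparison against Alpha Out.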

You may’ve noticed that in the full model, factor B had a much higher p-value than
what’s shown above. This instability is typical of models based on historical data.
Scroll down and view the model statistics and coefficients.

Backward reduced model statistics and coefficients

Now let’s try a different regression approach – building the model from the ground
(mean) up, rather than tearing things down from the top (all terms in chosen
polynomial). Press Model, re-set the Process Order to Linear and this time ask for a
Selection based on Forward stepwise regression. To provide a fair comparison of this
forward approach with that done earlier going backward, change Alpha In to 0.05.

Forward selection (remember to re-set the model to the original process order first!)


Heed the caution put up by the program – this approach may not work as well for this
highly-collinear set of factors. See what happens now in ANOVA.

Results from forward regression

Surprisingly, factor B now comes in first as the single most significant factor. Then
comes factor C. That’s it! The next most significant factor evidently does not achieve
the alpha-in significance threshold of p<0.05.
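For comparison with the backward approach, here is a hedged Python sketch of forward selection. We assume the textbook procedure: start from the mean, and at each step add the candidate with the largest F-to-enter, provided its p-value is below Alpha In. The program may differ in details, and the helper names solve, sse, p_value and forward are our own.

```python
import math
from fractions import Fraction

# Longley data from the table above: factors A..F, then the response.
DATA = [
    (83.0, 234289, 2356, 1590, 107608, 1947, 60323),
    (88.5, 259426, 2325, 1456, 108632, 1948, 61122),
    (88.2, 258054, 3682, 1616, 109773, 1949, 60171),
    (89.5, 284599, 3351, 1650, 110929, 1950, 61187),
    (96.2, 328975, 2099, 3099, 112075, 1951, 63221),
    (98.1, 346999, 1932, 3594, 113270, 1952, 63639),
    (99.0, 365385, 1870, 3547, 115094, 1953, 64989),
    (100.0, 363112, 3578, 3350, 116219, 1954, 63761),
    (101.2, 397469, 2904, 3048, 117388, 1955, 66019),
    (104.6, 419180, 2822, 2857, 118734, 1956, 67857),
    (108.4, 442769, 2936, 2798, 120445, 1957, 68169),
    (110.8, 444546, 4681, 2637, 121950, 1958, 66513),
    (112.6, 482704, 3813, 2552, 123366, 1959, 68655),
    (114.2, 502601, 3931, 2514, 125368, 1960, 69564),
    (115.7, 518173, 4806, 2572, 127852, 1961, 69331),
    (116.9, 554894, 4007, 2827, 130081, 1962, 70551),
]
NAMES = "ABCDEF"


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = m[i][i]
        m[i] = [v / pivot for v in m[i]]
        for r in range(n):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return [row[n] for row in m]


def sse(predictors, response=6):
    """Exact residual SS of column `response` on the given columns plus an intercept."""
    x = [[Fraction(1)] + [Fraction(str(row[j])) for j in predictors] for row in DATA]
    y = [Fraction(str(row[response])) for row in DATA]
    p = len(predictors) + 1
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum(yi * yi for yi in y) - sum(b * c for b, c in zip(beta, xty))


def p_value(f, df):
    """P(F(1, df) > f), from Simpson integration of the Student-t density."""
    t = math.sqrt(max(f, 0.0))
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    steps = 2000                                   # even number of intervals
    h = t / steps
    dens = [c * (1 + (i * h) ** 2 / df) ** (-(df + 1) / 2) for i in range(steps + 1)]
    area = h / 3 * (dens[0] + dens[-1] + 4 * sum(dens[1:-1:2]) + 2 * sum(dens[2:-1:2]))
    return max(0.0, 1 - 2 * area)


def forward(alpha_in=0.05):
    """Add the strongest candidate while its p-value stays below alpha_in."""
    chosen, pool = [], list(range(6))
    while pool:
        base = sse(chosen)
        df = len(DATA) - len(chosen) - 2           # residual df after adding one term
        f = {}
        for t in pool:
            resid = sse(chosen + [t])
            f[t] = float((base - resid) * df / resid)   # F-to-enter for candidate t
        best = max(f, key=f.get)                   # largest F = smallest p-value
        if p_value(f[best], df) >= alpha_in:
            break
        chosen.append(best)
        pool.remove(best)
    return [NAMES[t] for t in chosen]


selected = forward(alpha_in=0.05)
print("Entered, in order:", selected)
```

Note that a term is judged only against the terms already in the model, which is why forward selection can settle on a different model than backward elimination when the factors are this collinear.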

Take a look at the ANOVA below to see that both terms come out significant.

ANOVA for forward-reduced model

On the Bookmarks click the R-Squared.

Forward reduced model statistics and coefficients

This simpler model scores very high on all measures of R-squared, but it falls a bit short
of what was achieved in the model derived from the backward regression.



Finally, go back to the Model, re-set the Process Order to Linear and check out the
last model Selection option offered by Design-Expert software: Stepwise.

As you might infer from seeing both Alpha In and Alpha Out now displayed, the
stepwise algorithm involves elements of forward selection with bits of backward added
in for good measure. For details search program Help, but consider this: terms that
pass the alpha test in (via forward regression) may later (after further terms are added)
become disposable according to the alpha test out (via backward selection). If this
seems odd, just look back at how the p-value for factor B changed depending on what other
factors were chosen along with it for modeling. To see what happens with this stepwise
selection method, press ANOVA. The results depend on what you do with Alpha In and
Alpha Out, which default back to 0.1.

As you see on the cautionary message displayed for both forward and stepwise (in
essence an enhancement on forward) approaches, we favor the backward approach if you
decide to make use of an automated selection method. Ideally an analyst will also be an
expert on the subject matter, or have such a person readily accessible. Then they could
do model reduction via the manual method, filtered not only by the statistics, but also
by the common sense of someone with profound knowledge of the system.

This concludes part 1 of our exploration of the Longley data set. In Part 2 we will dig
further under the covers of Design-Expert to see some interesting aspects in the residual
analysis under Diagnostics, and also see what can be gleaned from its sophisticated tools
under Design, Evaluation.
