
This article was downloaded by: [Northwestern University]

On: 29 January 2015, At: 12:37


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41
Mortimer Street, London W1T 3JH, UK

The American Statistician


Publication details, including instructions for authors and subscription information:
http://amstat.tandfonline.com/loi/utas20

Ridge Analysis 25 Years Later


Roger W. Hoerl (a)
(a) Engineering Sciences Division of Hercules Incorporated, Research Center, Wilmington, DE 19894, USA
Published online: 12 Mar 2012.

To cite this article: Roger W. Hoerl (1985) Ridge Analysis 25 Years Later, The American Statistician, 39:3, 186-192

To link to this article: http://dx.doi.org/10.1080/00031305.1985.10479425

Ridge Analysis 25 Years Later
ROGER W. HOERL*

The response surface technique called ridge analysis was originally introduced by Hoerl (1959) more than 25 years ago. Despite tremendous advantages over more conventional response surface procedures when more than two independent variables are present, ridge analysis has received little attention in the statistical literature since then, although numerous applications have appeared in engineering journals. This situation may be partially due to the fact that this procedure led to the discovery of ridge regression, which has completely overshadowed ridge analysis in the literature since. This discussion will briefly review the mathematics of ridge analysis, its literature, practical advantages, and relationship to ridge regression.

KEY WORDS: Response surface; Constrained optimization; Graphics.

*Roger W. Hoerl is a Research Mathematician in the Engineering Sciences Division of Hercules Incorporated, Research Center, Wilmington, DE 19894. The author would like to thank the referees for their substantial contribution.

1. INTRODUCTION

Ridge analysis was originally developed by Hoerl (1959) for examining higher-dimensional quadratic response surfaces. In contrast to ridge regression, which is an alternative to least squares estimation in multiple regression, ridge analysis graphically portrays the behavior of these surfaces and locates overall and local optimal regions. While employed in the statistical group at DuPont, Hoerl was often asked to optimize industrial processes involving more than the two or three independent variables traditionally seen in response surface literature. Although the method of canonical analysis had been developed by that time, this was generally inadequate for multidimensional surfaces, for reasons to be discussed in a later section. Possessing an engineering background, he felt the need for more than a numerical optimization of the estimated function. Ridge analysis, then, was the approach he proposed for this problem. Despite providing insightful graphics of the effects of all factors simultaneously, as well as optimizing the surface for any distance from the center point of the design, the technique did not immediately catch on. It may not have been widely understood at that time, however, as response surface analysis was then a rather new concept. Box and Wilson's (1951) classic paper had been published only eight years previously. Another hindrance has been the general dearth of literature on analysis relative to design in response surface methodology. The confusion of ridge analysis with ridge regression has not helped this situation. Ridge analysis appears to deserve a better fate. The technique should be routinely taught in response surface courses and considered whenever the statistician faces a higher-dimensional quadratic response surface.

2. REVIEW

If we have a quadratic surface in $p$ independent variables, the equation is of the form

$$Y = b_0 + \sum_{i=1}^{p} b_i x_i + \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} b_{ij} x_i x_j + \sum_{i=1}^{p} b_{ii} x_i^2, \qquad (1)$$

where the three sets of terms model the linear, interaction, and quadratic effects, respectively. We can write this in matrix notation as

$$Y = b_0 + b'x + (1/2)\, x'Bx, \qquad (2)$$

where $b$ is the $p \times 1$ vector of linear coefficients, $x$ is the $p \times 1$ vector of independent variable values, $b_0$ is the constant term, and $B$ is a $p \times p$ symmetric matrix whose diagonal values are twice the quadratic terms and whose off-diagonal values are the interaction terms. If the variables are standardized to have zero mean and equal standard deviations, our experimental region can be interpreted easily as some geometric figure with the center point as the origin. For a rotatable design (e.g., Box-Wilson), this is a (hyper-)sphere defined by $x'x \le c^2$ (i.e., the contours of predicted response variance form spheres; the actual points may not define an exact sphere). For $p = 2$ (or 3, if three-dimensional graphics are available), contour plots of the surface can be drawn to look for optimal areas, as in Figure 1. These plots are easily interpreted and make it obvious to the statistician when he or she is extrapolating beyond the experimental region. In higher dimensions, these plots require fixing $p - 2$ of the variables, which leads to exactly the same difficulties as with one-variable-at-a-time optimization. It should be kept in mind that response surface methodology was developed specifically to avoid these pitfalls (see Box et al. 1978). From a practical standpoint, this may not even be feasible for $p$ greater than 5 or 6, as thick stacks of plots may be required to adequately examine the domain.

A canonical analysis (see Davies 1956) can be performed for $p > 2$, but this lacks the advantages of contour plots. It shifts the reference point away from the origin and provides no graphics. With this procedure, it is never immediately obvious when one has left the experimental region and, in fact, a canonical analysis may use a point completely outside this region for reference. Interpreting ridges, maxima, or both relative to a point outside the domain is clearly undesirable. In addition, the path of steepest ascent from the original center point is not given. This method does have its advantages, however. The nature of the surface (max, min, or saddle point) is determined, and the existence and nature of possible stationary ridges are given.

Ridge analysis combines the advantages of both procedures with higher-dimensional surfaces. Basically, it provides a canonical analysis relative to the original center point and produces graphics of surfaces of any dimension without

186 The American Statistician, August 1985, Vol. 39, No.3 © 1985 American Statistical Association
Figure 1. Response Contours. The curve leaving the origin at approximately 45° is the maximum ridge, or path of steepest ascent.
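
As a small illustration of the notation in Section 2, the following sketch builds the vector b and the symmetric matrix B of equation (2) from the scalar coefficients of equation (1) and checks numerically that the two forms agree. The function names, the 0-based indexing, and the three-factor coefficients are assumptions made for this example; none of them come from the article.

```python
import numpy as np

def build_quadratic(linear, interactions, quadratic):
    """Return (b, B) such that Y = b0 + b'x + 0.5 * x'Bx.

    linear       : sequence of the p linear coefficients b_i
    interactions : dict {(i, j): b_ij} with i < j (0-based indices)
    quadratic    : sequence of the p pure quadratic coefficients b_ii
    """
    b = np.asarray(linear, dtype=float)
    # Diagonal of B is twice the quadratic coefficients.
    B = np.diag(2.0 * np.asarray(quadratic, dtype=float))
    # Off-diagonal entries are the interaction coefficients (symmetric).
    for (i, j), bij in interactions.items():
        B[i, j] = B[j, i] = bij
    return b, B

def predict_sum_form(x, b0, linear, interactions, quadratic):
    """Equation (1), evaluated term by term."""
    x = np.asarray(x, dtype=float)
    y = b0 + np.dot(linear, x)
    y += sum(bij * x[i] * x[j] for (i, j), bij in interactions.items())
    y += np.dot(quadratic, x ** 2)
    return float(y)

def predict_matrix_form(x, b0, b, B):
    """Equation (2)."""
    x = np.asarray(x, dtype=float)
    return float(b0 + b @ x + 0.5 * x @ B @ x)

# Hypothetical coefficients for a three-factor surface (illustration only).
b0 = 80.0
linear = [1.0, -0.5, 0.3]
interactions = {(0, 1): 0.5, (1, 2): 0.2}
quadratic = [-1.0, -0.5, -0.25]

b, B = build_quadratic(linear, interactions, quadratic)
x = [0.3, -1.2, 0.7]
y_sum = predict_sum_form(x, b0, linear, interactions, quadratic)
y_mat = predict_matrix_form(x, b0, b, B)
```

The factor of 1/2 in equation (2) is what makes the diagonal of B equal to twice the quadratic coefficients while each interaction coefficient appears unchanged in both the (i, j) and (j, i) positions.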

holding any variables fixed. In addition, secondary (i.e., local) optimal regions are examined, and insight into the adequacy of the fitted model can often be gained through examination of the graphics. This procedure has its drawbacks as well, however, particularly in that its graphics are not as easily understood as contour plots, and it depends on the use of a quadratic model.

3. INTRODUCTION TO RIDGE ANALYSIS

Using the previous notation, consider fixing $x'x = R^2$ and maximizing equation (2) subject to this constraint. For any given $R$, some maximum $Y(R)$ is defined (with probability 1 if the coefficients are normally distributed). Connecting the coordinates of the $Y(R)$ values for $0 < R^2 < c^2$ would display the coordinates of the maximum response attainable for any given distance from the origin. This is defined to be the maximum ridge, and traces the path of steepest ascent from the origin. The contour plot in Figure 1 (from Hoerl 1964) has the maximum ridge drawn on it. It is merely coincidence that this ridge is nearly a straight line at 45°. The minimum ridge is defined similarly and gives the path of steepest descent. Mathematically, these points are determined by differentiating (2) (with use of a Lagrangian multiplier) with respect to $x$, equating to 0, and solving for $x$. The resulting equation is

$$x = -(B - \lambda I)^{-1} b, \qquad (3)$$

where $\lambda$ is the Lagrangian multiplier that determines $x$, $R$, and $Y$. If $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ are the ranked eigenvalues of $B$, the following properties result:

1. The maximum ridge is defined by $\lambda \ge \lambda_1$.
2. The minimum ridge is defined by $\lambda < \lambda_p$.
3. Secondary ridges are defined for $\lambda_{j+1} < \lambda < \lambda_j$.
4. At least two, and at most $2p$, ridges exist.
5. No two ridge plots ($Y$ as a function of $R$) cross.
6. All ridge plots are monotonic with $R$ with at most one exception.

The secondary ridges correspond to local optima on the hyperspheres $x'x = R^2$.

The last property follows from $\partial Y / \partial R^2 = \lambda/2$. Since the slope of any ridge is determined by its $\lambda$ values and therefore will change sign only if 0 is included in its $\lambda$ range, an overall optimum will exist if and only if all eigenvalues are negative (maximum), or all are positive (minimum). If $B$ has both negative and positive eigenvalues, a saddle point (minimax) exists in the fitted surface. If $\lambda_1 \le 0$, the maximum ridge plot will be increasing as it moves away from the origin, will eventually hit a maximum, and will begin decreasing. See Draper (1963) or Hoerl (1964) for a more detailed discussion of the mathematical properties.

4. PRACTICAL ADVANTAGES

Numerical coordinates alone do not provide the human mind with sufficient information about higher-dimensional response surfaces. Graphics are necessary for the same reasons they are necessary in regression, time series, or other statistical analyses. With ridge analysis, the predicted response can be plotted against $R$ for each ridge, the coordinates of any ridge can be plotted against $R$, and a plot of $R$ versus $\lambda$ enables the statistician to calibrate $\lambda$ with both desired ridge and $R$. (Recall that $\lambda$ is an undetermined Lagrangian multiplier and therefore one cannot solve for $x$ simply by specifying a particular ridge and $R$.)

The five-factor yield example discussed by Hoerl (1964) will be used to illustrate the usefulness of the plots. This example is rather old, but proves very illustrative of unique information attainable using ridge analysis with "live" data. Figure 2 is the plot of $Y$ versus $R$. Each separate curve corresponds to a different ridge, or local optimum. At each "cusp" point where a new ridge begins, maximum (of these two) and minimum (of these two) secondary ridges exist that jointly form a cone shape. The overall maximum and minimum ridges begin at $R = 0$, the origin, and each takes on the value $b_0$ at this point. Note that the boundary of the experimental region (here approximately 2.24) is clearly discernible. The maximum ridge is virtually a horizontal line coming out from $b_0$, indicating that very little improvement over the center point yield is attainable. (It should be mentioned that the fitted surface has an overall maximum


Figure 2. Response Ridges. For each ridge (local optimum), we see the predicted value of the response versus R, the distance from the
origin. The vertical line shows the range of experimentation.
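
The curves of a response ridge plot can be sketched numerically. The sketch below assumes that the stationarity condition of Section 3 takes the form x = -(B - λI)^{-1} b, which is the form implied by the expression (X'X - λI)^{-1} X'Y in Section 6, and sweeps λ downward toward the largest eigenvalue of B to trace the maximum ridge. The helper names and the three-factor coefficients are hypothetical and serve only as an illustration; they are not the five-factor yield example.

```python
import numpy as np

def ridge_point(b0, b, B, lam):
    """Solve x = -(B - lam*I)^{-1} b and return (x, R, Y) for that lambda."""
    x = -np.linalg.solve(B - lam * np.eye(len(b)), b)
    R = float(np.linalg.norm(x))                 # distance from the center point
    Y = float(b0 + b @ x + 0.5 * x @ B @ x)      # predicted response, equation (2)
    return x, R, Y

def maximum_ridge(b0, b, B, n=25):
    """Trace the maximum ridge with lambda values above the largest eigenvalue."""
    lam1 = np.linalg.eigvalsh(B)[-1]             # eigvalsh sorts ascending
    # Large lambda gives R near 0; lambda close to lam1 gives large R.
    lams = lam1 + np.geomspace(100.0, 0.05, n)
    return lams, [ridge_point(b0, b, B, lam) for lam in lams]

# Hypothetical three-factor surface (all eigenvalues of B are negative).
b0 = 80.0
b = np.array([1.0, -0.5, 0.3])
B = np.array([[-2.0, 0.5, 0.0],
              [0.5, -1.0, 0.2],
              [0.0, 0.2, -0.5]])

lams, points = maximum_ridge(b0, b, B)
radii = np.array([R for _, R, _ in points])
responses = np.array([Y for _, _, Y in points])

# lambda = 0 gives the overall stationary point, where the gradient b + Bx vanishes.
x_stat, R_stat, Y_stat = ridge_point(b0, b, B, 0.0)
```

Plotting `responses` against `radii` would give one curve of a plot like Figure 2, and plotting the components of each `x` against `radii` would give a coordinate plot. Sweeping λ rather than R reflects the point made in Section 4 that λ is the quantity one can actually specify.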

of 87.26 at $R = 1.54$.) The flatness of this ridge after approximately $R = .6$ suggests that the center point may be lying just off a stationary ridge in the true yield surface. The nature of this stationary ridge will become obvious shortly. Another very useful piece of information from this plot is a secondary ridge beginning at approximately $R = 1.20$. The level of the maximum secondary ridge beginning here is almost identical to the level of the maximum ridge. This suggests a possible alternative region for optimizing yield that may be at much more feasible or economic levels of the independent variables.

Figure 3 is the plot of the maximum ridge coordinates versus distance from the origin ($R$). Not only can we see the coordinates of the maximum point (at $R = 1.54$) and the point at which we begin to extrapolate, but the behavior of the individual coordinates themselves is available. In situations in which coordinates rapidly achieve some "optimal" level and then remain relatively stable (recall that the overall magnitude of $x$ is being constrained), we can believe that we have found a real and stable solution. This is the case with variables 1-4 in this plot. The behavior of the fifth coordinate is particularly interesting, however. It remains virtually zero until approximately $R = 1.0$ and then increases wildly. When one remembers that on this ridge $Y$ is virtually constant after approximately $R = .6$, there is cause for suspicion. The drastic increase in $x_5$ after $R = 1$


Figure 3. Maximum Ridge Coordinates. This figure plots the coordinates of the maximum ridge (path of steepest ascent) as we move
away from the center point. The vertical line shows the range of experimentation.


Figure 4. Secondary Ridge Coordinates. This ridge begins at approximately R = 1.19. The vertical line shows the range of experimentation.
Note the similarity to Figure 3 for all variables except x5.
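
A secondary ridge point can be computed in the same way as a point on the maximum ridge by choosing λ between two adjacent eigenvalues of B. The sketch below again assumes the stationarity condition x = -(B - λI)^{-1} b and uses a hypothetical three-factor surface. It also includes a small helper, `classify_surface`, whose name and tolerance are assumptions for this example, that reads the nature of the fitted surface from the signs of the eigenvalues as described in Section 3.

```python
import numpy as np

def classify_surface(B, tol=1e-10):
    """Nature of the stationary point, read from the signs of the eigenvalues of B."""
    evals = np.linalg.eigvalsh(B)
    if np.any(np.abs(evals) < tol):
        return "degenerate (possible stationary ridge)"
    if np.all(evals < 0):
        return "maximum"
    if np.all(evals > 0):
        return "minimum"
    return "saddle"

# Hypothetical three-factor surface (illustration only).
b = np.array([1.0, -0.5, 0.3])
B = np.array([[-2.0, 0.5, 0.0],
              [0.5, -1.0, 0.2],
              [0.0, 0.2, -0.5]])

evals = np.linalg.eigvalsh(B)                 # ascending; evals[-1] is the largest
lam = 0.5 * (evals[-1] + evals[-2])           # strictly between the two largest
x_sec = -np.linalg.solve(B - lam * np.eye(3), b)
R_sec = float(np.linalg.norm(x_sec))

# On the sphere x'x = R_sec**2 the gradient b + Bx is parallel to x,
# which is the Lagrangian condition for a constrained stationary point.
gradient = b + B @ x_sec
kind = classify_surface(B)
```

Repeating the calculation over a grid of λ values inside one eigenvalue interval would trace the coordinates of that secondary ridge, in the manner of Figure 4.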

has no noticeable effect on $Y$. Viewed from this perspective, it appears obvious that $x_5$ is not critical to the optimization and that the stationary ridge must be almost exactly along the $x_5$ axis. This conclusion is verified by the coordinates of the maximum secondary ridge discussed earlier, seen in Figure 4. The coordinates are almost exactly the same as those for the overall maximum ridge, except that $x_5$ goes off in the opposite direction as $R$ increases.

If one had only solved numerically for the maximum of this fitted surface, however, the coordinates would have been discovered to be $x_1 = -.28$, $x_2 = .77$, $x_3 = -.69$, $x_4 = .32$, and $x_5 = 1.05$. The coordinate for $x_5$ is the largest in magnitude! This information alone would lead one to believe that if one wishes to optimize yield, the greatest change of the independent variables from the center point must be for $x_5$. Such information may or may not be discernible from the significance of the individual terms in the regression model, depending on the nature of the true surface. It is certainly feasible that a variable that is never significant at the 95% level could be important in the optimization, or that a variable that is highly significant in interaction terms may become relatively irrelevant when other variables hover at certain levels. The author's personal experience suggests that this plot often reveals lack of fit that is not detected as significant by conventional tests. Note also that this plot provides the path of steepest ascent (descent) from the origin to the overall optimum, or if none exists within the domain, to the optimum within the design space. It is useful to have the exact path mapped out explicitly if operators are reluctant to make drastic changes based on statistical predictions. This allows an "evolutionary" attainment of the optimum based on one round of experimentation.

The plot of $R$ versus $\lambda$ in Figure 5 reveals the role of $\lambda$ in ridge analysis. This plot is often quite useful from a practical point of view because it reveals the range of $\lambda$ values needed to substitute into equation (3) to plot a particular ridge for some desired range of $R$. The vertical asymptotes are the eigenvalues of $B$. Recall that the maximum ridge corresponds to $\lambda$ values greater than the maximum eigenvalue, and the minimum to $\lambda$ values less than the minimum eigenvalue. The overall stationary point corresponds to $\lambda = 0$.

5. ADDITIONAL CONSIDERATIONS

There are several other considerations to keep in mind when one is examining response surfaces. One of these is the adequacy of the fitted model, or lack of fit. This is generally assessed in the regression stage of analysis, before examination of any fitted surface, according to standard procedures (see Box et al. 1978). The power of these tests is virtually unknown for other than the higher terms of the included variables, however, and lack of significance does not prove the null hypothesis true. The behavior of the optimal coordinates as we move away from the center point may give insight into the adequacy of the fitted model. As previously mentioned, the optimal ridge coordinates should rapidly achieve "optimal" levels and remain relatively stable as $R$ increases. If the experimental region is on a rising ridge, some coordinates will be monotonically increasing or decreasing. In any case, the "ridge trace" of the coordinates should be reasonably stable. Erratic behavior is an indication that the model may not adequately fit the true surface. In the present example, the erratic behavior seems to indicate insignificance of $x_5$.

Another concern (which is often overlooked) is the stability of the selected "optimal" point. By using contour plots of a two-factor surface, one can easily see how drastically the surface drops off when small departures from the selected point occur. In an industrial setting, operating at exact levels consistently is impossible; hence a stable point for setting the independent variables is desired. With higher-dimensional surfaces, holding all but two variables constant


Figure 5. Lambda Versus R. This reveals the range of lambda values needed to examine a particular ridge for a particular range of R.
The vertical asymptotes are the eigenvalues of B. The horizontal line shows the range of experimentation.
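
The relationship between R and λ can be made concrete with the eigendecomposition of B. Assuming the ridge coordinates are x = -(B - λI)^{-1} b, the squared distance is R² = Σ g_i² / (λ_i - λ)², where g = U'b and U holds the eigenvectors of B, so R grows without bound as λ approaches any eigenvalue for which g_i is nonzero. The sketch below evaluates this function and uses a simple bisection to find the λ on the maximum ridge that corresponds to a chosen R. The function names, the bisection scheme, and the coefficients are assumptions for illustration, and the bisection presumes that b is not orthogonal to the eigenvector of the largest eigenvalue.

```python
import numpy as np

def radius(lam, b, B):
    """R(lambda) = ||(B - lambda*I)^{-1} b||, computed from the eigendecomposition of B."""
    evals, evecs = np.linalg.eigh(B)
    g = evecs.T @ b                         # b expressed in eigenvector coordinates
    return float(np.sqrt(np.sum((g / (evals - lam)) ** 2)))

def lambda_on_max_ridge(R_target, b, B, tol=1e-10):
    """Find lambda above the largest eigenvalue with R(lambda) = R_target.

    On that interval R decreases from very large values toward 0 as lambda grows,
    so bisection on lambda is sufficient.
    """
    lam1 = np.linalg.eigvalsh(B)[-1]
    lo = lam1 + 1e-12                       # just above the asymptote: R is huge
    hi = lam1 + 1.0
    while radius(hi, b, B) > R_target:      # move hi outward until R(hi) <= target
        hi = lam1 + 2.0 * (hi - lam1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if radius(mid, b, B) > R_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical three-factor surface (illustration only).
b = np.array([1.0, -0.5, 0.3])
B = np.array([[-2.0, 0.5, 0.0],
              [0.5, -1.0, 0.2],
              [0.0, 0.2, -0.5]])

lam_at_R1 = lambda_on_max_ridge(1.0, b, B)  # lambda giving R = 1 on the maximum ridge
R_at_zero = radius(0.0, b, B)               # distance of the overall stationary point
```

Evaluating `radius` over a grid of λ values between and beyond the eigenvalues would reproduce the general shape of a plot like Figure 5, with one branch per ridge.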

gives an unrealistic picture of how the response may drop off in an industrial environment, as it allows departures from the optimal point only along 2-dimensional planes. The effect of all independent variables changing simultaneously is completely missed. This determination can be accomplished with ridge analysis, however. Once a point has been selected, the variables can be restandardized so that the selected point is now the center point. The response function is mathematically identical, but is now given in terms of the rescaled variables. The ridge analysis can then be repeated, concentrating on the minimum ridge (for a surface to be maximized). This will clearly show how rapidly the response can drop off if the independent variables depart from the selected point. The exact path of the sharpest drop-off is also given explicitly. If our selected point is an overall optimum, the ridges will not be uniquely defined with this restandardization, but canonical analysis will give the stability of the surface around the overall stationary point.

It should be noted that the choice of design may influence the credibility of the ridge analysis results. A rotatable design is particularly useful, as the predicted response will have constant variance on the $x'x = R^2$ hyperspheres. Even a rotatable design does not guarantee a desirable distribution of predicted response variance over the entire design region, however. When dealing with designs that contain drastic differences in variance of predicted response, the statistician should incorporate this in the analysis of the surface. For a rotatable design, this could be done with a plot of predicted response variance versus $R$.

The question of how to handle multiple responses is often of great importance to industrial statisticians. Unfortunately, this is basically an unsolved problem, unless additional concessions are made. One popular strategy is to create a single "summary" response made up of some function of the original responses, such as a weighted average. This is then regressed on the independent variables, which gives one surface to optimize. If this procedure is valid for the particular situation, ridge analysis is well suited for optimization of this summary response. Khuri and Conlon (1981) gave a more elaborate co-optimization procedure based on this general idea, and Myers and Carter (1973) discussed the application of ridge analysis to dual response systems.

6. THE CONNECTION TO RIDGE REGRESSION

At about the same time that he was developing ridge analysis, another problem was perplexing Hoerl. This was the frequent occurrence of nonsensical estimates from least squares multiple regression. In an obscure article (Hoerl 1962), he noted that the residual sum of squares in regression could be written as a quadratic function of the coefficients. Ridge analysis could therefore be used to calculate and plot the coefficients as one moved along the minimum ridge of the residual sum of squares from the overall minimum (least squares) to some more stable solution closer to the origin. The unsolved problem of how far to remove the coefficients from the minimum (how to choose $k$) prevented any further publication of this idea until the assistance of Kennard, which led to the publication of the "first" papers on ridge regression in 1970 (Hoerl and Kennard 1970a,b).

The following describes the derivation of ridge regression by ridge analysis.

The residual sum of squares in multiple regression can be written as

$$(Y - \hat{Y})'(Y - \hat{Y}) = Y'Y - 2\beta'X'Y + \beta'(X'X)\beta,$$

where $\hat{Y} = X\beta$. This is in the same form as (2) for ridge analysis (multiplied by two) with $\beta$ as our $x$ coordinates, $-X'Y$ as our $b$ vector, and $(X'X)$ as our "$B$" matrix. We are therefore trying to minimize the residual sum of squares as a function of our coefficients. It is interesting to note that in some cases there may be a minimum secondary ridge at almost the same level as the overall minimum. In this



case, there are two separate regions in the space in which to nearly minimize residual sum of squares. If $(X'X)$ is positive definite, all eigenvalues of our "$B$" matrix are positive, implying that an overall minimum exists at $(X'X)^{-1} X'Y$. We can plot the minimum ridge of the residual sum of squares from the origin ($R = 0$) to the overall minimum by using a lambda value smaller than the smallest eigenvalue of $(X'X)$ in (3), giving $(X'X - \lambda I)^{-1} X'Y$. (Note that the Lagrangian multiplier is generally $k = -\lambda$ in the ridge regression literature.) Lambda less than 0 (i.e., $k > 0$) results in a solution closer to the origin than the overall minimum.

We can also plot the behavior of the coordinates as we move toward the minimum, resulting in a "ridge trace." Note that with ridge regression one plots $\beta$ versus $k$ (i.e., $-\lambda$), rather than $\beta$ versus $R$. This has the effect of shifting the origin to the overall minimum ($\lambda = 0$) and moving backward toward $R = 0$. Thus Figure 6 shows the ridge coordinates plot from a ridge regression perspective as the "ridge trace," beginning at the overall minimum ($\lambda = 0$) and moving on the $\lambda$ (i.e., $-k$) scale, rather than on the $R$ scale. Note that $k$ is negative here because we are plotting the maximum rather than the minimum ridge. In this case, $x_1$-$x_5$ play the role of regression coefficients rather than design-variable levels as in ridge analysis. Again we see that the coordinates for $x_1$-$x_4$ are stable, but those for $x_5$ are not. In ridge regression we are therefore making the interpretation relative to a more important point for this application, the least squares solution. With the ridge trace we are again concerned with the stability of the "optimal" coordinates (coefficients). When a coefficient drops erratically toward 0 as $k$ increases from 0, this is analogous to the coordinate for an independent variable hovering at 0 and then increasing wildly in magnitude just before the overall optimum is reached in ridge analysis. In both cases we view the coefficient (coordinates) with suspicion and consider setting it at some less drastic level. Although many other interpretations of ridge regression have appeared in the literature, this is the philosophy by which it was originally developed and interpreted.

7. APPLICATIONS AND EXTENSIONS

As previously mentioned, the lack of popularity of ridge analysis in the statistical literature has not prevented practical-minded engineers from employing it. Of the numerous applications papers appearing in the engineering literature (the author knows of more than 25), many have come from outside the United States, including Eastern Europe. Erhardt et al. (1978, 1980) of East Germany discussed applications dealing with low methoxyl pectin gels. A detailed discussion of the technique, along with applications, appeared in Poland (Jaworski and Szelejewska 1978). Applications dealing with the dealkylation of xylene isomers were given by Sarma and Ravindram (1975) from India. Mohammed et al. (1979) of West Germany discussed the use of ridge analysis to optimize Fischer-Tropsch synthesis of olefins.

Ridge analysis was recently applied to the problem of estimating particle size distribution parameters in small angle neutron scattering (Fatica et al. 1985). It was used as a sequential search procedure to minimize a multivariate error function that could not be written in closed form but could be approximated with a quadratic function of the parameters. This application and Hoerl's (1964) discussion of applying ridge analysis to the solution of simultaneous linear equations suggest that the technique has potential as a general numerical analysis procedure. It can be used for optimization and examination of the stability of "exact" solutions.

Although ridge analysis may not have received the attention it deserves in the statistical literature, some notable papers have been published. As the original paper (Hoerl 1959) was published in an engineering journal, no proofs of the properties discussed were given. These were soon provided in the literature in a rigorous fashion by Draper


Figure 6. Ridge Trace. This is a ridge coordinates plot (see Figure 3) in the form of a ridge trace from a ridge regression. The origin has been shifted to the stationary point, and the horizontal axis is $\lambda$ ($-k$) rather than $R$. Note that $k$ is negative here because we are plotting a maximum ridge rather than a minimum ridge as in ridge regression.
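
The ridge regression side of this correspondence can be sketched as follows. With b = -X'Y and B = X'X, the ridge analysis solution with λ = -k becomes (X'X + kI)^{-1} X'Y, and tabulating it over a grid of k values gives a ridge trace. The data below are synthetic and generated only for illustration (two nearly collinear predictors plus one independent predictor); the function name `ridge_trace` and the grid of k values are assumptions of this example.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Coefficients (X'X + k*I)^{-1} X'y for each k, i.e. the ridge solution with lambda = -k."""
    XtX = X.T @ X
    Xty = X.T @ y
    p = XtX.shape[0]
    return np.array([np.linalg.solve(XtX + k * np.eye(p), Xty) for k in ks])

# Synthetic data with two nearly collinear columns (illustration only).
rng = np.random.default_rng(0)
n = 40
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.05 * rng.normal(size=n),
    z + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.5 * rng.normal(size=n)

ks = np.array([0.0, 0.01, 0.1, 1.0, 10.0])
betas = ridge_trace(X, y, ks)                       # one row of coefficients per k
norms = np.linalg.norm(betas, axis=1)               # distance of each solution from the origin
rss = np.array([np.sum((y - X @ beta) ** 2) for beta in betas])
```

The row for k = 0 is the least squares solution, the overall minimum of the residual sum of squares. As k increases, the coefficient vector moves along the minimum ridge toward the origin while the residual sum of squares rises, which is the trade-off that a ridge trace displays.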



(1963), although they had been independently derived by Jackson (see Hoerl 1959). Hoerl (1964) then combined this mathematical rigor with the practical interpretation of Hoerl (1959).

Khuri and Myers (1979) presented a modification of ridge analysis for use with nonrotatable designs. They suggested that it may be more appropriate to optimize $Y$ for fixed variance of prediction rather than for fixed distance from the origin. Their point is certainly well taken, but this modified procedure loses some of the intuitive appeal and ease of interpretation that make ridge analysis desirable in practice. The use of a rotatable design will clearly satisfy both criteria, but standard ridge analysis still makes logical sense with designs that are not exactly rotatable. Only when the variance of prediction varies drastically for fixed $R$ will nonrotatability be a major concern.

Myers and Carter (1973) discussed a procedure similar to ridge analysis for optimizing dual response systems. Essentially, the $x'x = R^2$ constraint is replaced with a quadratic constraint on a secondary response. They also discussed combining the ridge analysis constraint with the secondary response constraint. This results in a ridge analysis subject to an additional quadratic constraint.

8. SUMMARY

During the past 25 years, ridge analysis has received insufficient attention in the statistical literature. A popular text by Myers (1976), which contains a detailed development, is one of the few sources to mention it when discussing response surface analysis. This situation is unfortunate, as ridge analysis displays the effects of all factors simultaneously, finds secondary (local) optima, and interprets the surface relative to the original center point. The popularity and controversial nature of ridge regression, as well as the general lack of emphasis on analysis in response surface methodology, are major causes of this oversight.

Although no major statistical computer packages perform ridge analysis, the author will gladly supply interested parties with either a Minitab macro or FORTRAN 77 source code. The FORTRAN code analyzes the surface and writes relevant data to separate files that can be accessed in a graphics package for plotting.

[Received March 1984. Revised November 1984.]

REFERENCES

Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John Wiley.
Box, G. E. P., and Wilson, K. B. (1951), "On the Experimental Attainment of Optimum Conditions," Journal of the Royal Statistical Society, Ser. B, 13, 1-38.
Davies, O. L. (ed.) (1956), Design and Analysis of Industrial Experiments, New York: Hafner.
Draper, N. R. (1963), "Ridge Analysis of Response Surfaces," Technometrics, 5, 469-479.
Erhardt, V., Krause, M., Seppelt, B., and Bock, W. (1978), "Losung Von Lebensmittelchemischen Und -Technologischen Aufgaben Mit Hilfe Der Statistischen Versuchsplanung," Lebensmittelindustrie, 25, 151-154.
--- (1980), "Optimierung Von Zwei Und Mehr Zielgrossen Bei Der Anwendung Statistischer Versuchsplane Zur Losung Lebensmittelchemischer Und -Technologischer Aufgaben," Lebensmittelindustrie, 27, 107-110.
Fatica, M. G., Gelman, R. A., Wai, M. P., Hoerl, R. W., and Wignall, G. D. (1985), "Neutron Scattering From Latices Prepared by Seeded Emulsion Polymerization," manuscript in preparation.
Hoerl, A. E. (1959), "Optimum Solution of Many Variables Equations," Chemical Engineering Progress, 55, 69-78.
--- (1962), "Application of Ridge Analysis to Regression Problems," Chemical Engineering Progress, 58, 54-59.
--- (1964), "Ridge Analysis," Chemical Engineering Progress Symposium Series, 60, 67-77.
Hoerl, A. E., and Kennard, R. W. (1970a), "Ridge Regression: Biased Estimation for Non-Orthogonal Problems," Technometrics, 12, 55-67.
--- (1970b), "Ridge Regression: Applications to Non-Orthogonal Problems," Technometrics, 12, 69-82.
Jaworski, A., and Szelejewska, I. (1978), "Zastosowanie Metody Hoerla Do Optymalizacji Procesu Chemicznego Na Przykladzie Syntezy Tetrahydrofuranu Z Butandiolu-1,4," Przemysl Chemiczny, 57, 564-567.
Khuri, A. I., and Conlon, M. (1981), "Simultaneous Optimization of Multiple Responses Represented by Polynomial Regression Functions," Technometrics, 23, 363-375.
Khuri, A. I., and Myers, R. H. (1979), "Modified Ridge Analysis," Technometrics, 21, 467-473.
Mohammed, M. S., Schmidt, B., Schneidt, D., and Ralek, M. (1979), "Optimierung Der Fischer-Tropsch-Flussig-Phasen-Synthese in Richtung Der Maximalen Selektivitat An C2 Bis C4-Olefinen," Chemie Ingenieur Technik, 51, 739-741.
Myers, R. H. (1976), Response Surface Methodology, Blacksburg, VA: Author, Virginia Polytechnic Institute and State University.
Myers, R. H., and Carter, W. H. (1973), "Response Surface Techniques for Dual Response Systems," Technometrics, 15, 301-317.
Sarma, G. S., and Ravindram, M. (1975), "Studies in Dealkylation-A Statistical Analysis of Process Variables," Journal of the Indian Institute of Science, 58, 67-83.
