To cite this article: Roger W. Hoerl (1985) Ridge Analysis 25 Years Later, The American Statistician, 39:3, 186-192
Ridge Analysis 25 Years Later
ROGER W. HOERL*
186 The American Statistician, August 1985, Vol. 39, No.3 © 1985 American Statistical Association
Figure 1. Response Contours. The curve leaving the origin at approximately 45° is the maximum ridge, or path of steepest ascent.
Downloaded by [Northwestern University] at 12:37 29 January 2015
holding any variables fixed. In addition, secondary (i.e., local) optimal regions are examined, and insight into the adequacy of the fitted model can often be gained through examination of the graphics. This procedure has its drawbacks as well, however, particularly in that its graphics are not as easily understood as contour plots, and it depends on the use of a quadratic model.

3. INTRODUCTION TO RIDGE ANALYSIS

Using the previous notation, consider fixing x'x = R² and maximizing equation (2) subject to this constraint. For any given R, some maximum Y(R) is defined (with probability 1 if the coefficients are normally distributed). Connecting the coordinates of the Y(R) values for 0 < R² < C² would display the coordinates of the maximum response attainable for any given distance from the origin. This is defined to be the maximum ridge, and traces the path of steepest ascent from the origin. The contour plot in Figure 1 (from Hoerl 1964) has the maximum ridge drawn on it. It is merely coincidence that this ridge is nearly a straight line at 45°. The minimum ridge is defined similarly and gives the path of steepest descent. Mathematically, these points are determined by differentiating (2) (with use of a Lagrangian multiplier) with respect to x, equating to 0, and solving for x. The resulting equation is
(3)
where λ is the Lagrangian multiplier that determines x, R, and Y. If λ_1 ≥ λ_2 ≥ ... ≥ λ_p are the ranked eigenvalues of B, the following properties result:

1. The maximum ridge is defined by λ ≥ λ_1.
2. The minimum ridge is defined by λ < λ_p.
3. Secondary ridges are defined for λ_j < λ < λ_{j+1}.
4. At least two, and at most 2p, ridges exist.
5. No two ridge plots (Y as a function of R) cross.
6. All ridge plots are monotonic with R with at most one exception.

The secondary ridges correspond to local optima on the hyperspheres x'x = R².

The last property follows from ∂Y/∂R² = λ/2. Since the slope of any ridge is determined by its λ values and therefore will change sign only if 0 is included in its λ range, an overall optimum will exist if and only if all eigenvalues are negative (maximum), or all are positive (minimum). If B has both negative and positive eigenvalues, a saddle point (minimax) exists in the fitted surface. If λ_p ≤ 0, the maximum ridge plot will be increasing as it moves away from the origin, will eventually hit a maximum, and will begin decreasing. See Draper (1963) or Hoerl (1964) for a more detailed discussion of the mathematical properties.

4. PRACTICAL ADVANTAGES

Numerical coordinates alone do not provide the human mind with sufficient information about higher-dimensional response surfaces. Graphics are necessary for the same reasons they are necessary in regression, time series, or other statistical analyses. With ridge analysis, the predicted response can be plotted against R for each ridge, the coordinates of any ridge can be plotted against R, and a plot of R versus λ enables the statistician to calibrate λ with both desired ridge and R. (Recall that λ is an undetermined Lagrangian multiplier and therefore one cannot solve for x simply by specifying a particular ridge and R.)

The five-factor yield example discussed by Hoerl (1964) will be used to illustrate the usefulness of the plots. This example is rather old, but proves very illustrative of unique information attainable using ridge analysis with "live" data. Figure 2 is the plot of Y versus R. Each separate curve corresponds to a different ridge, or local optimum. At each "cusp" point where a new ridge begins, maximum (of these two) and minimum (of these two) secondary ridges exist that jointly form a cone shape. The overall maximum and minimum ridges begin at R = 0, the origin, and each takes on the value b_0 at this point. Note that the boundary of the experimental region (here approximately 2.24) is clearly discernable. The maximum ridge is virtually a horizontal line coming out from b_0, indicating that very little improvement over the center point yield is attainable. (It should be mentioned that the fitted surface has an overall maximum
Figure 2. Response Ridges. For each ridge (local optimum), we see the predicted value of the response versus R, the distance from the
origin. The vertical line shows the range of experimentation.
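
A response-versus-R curve of the kind shown in Figure 2 can be traced numerically by scanning λ rather than R. The sketch below is illustrative only. Equation (2) is not reproduced in this section, so the code assumes the quadratic model has the form Y = b_0 + b'x + ½x'Bx, which is consistent with the Section 6 remark that the residual sum of squares matches (2) "multiplied by two" and with ∂Y/∂R² = λ/2; under that assumption the stationarity condition is (B − λI)x = −b. The coefficients and the name `max_ridge` are hypothetical and are not the five-factor yield data.

```python
import numpy as np

def max_ridge(b0, b, B, lambdas):
    """Trace the maximum ridge of the ASSUMED model Y = b0 + b'x + 0.5 x'Bx.

    For each lambda above the largest eigenvalue of B, solve
    (B - lambda*I) x = -b and record (lambda, R, Y).
    """
    p = len(b)
    rows = []
    for lam in lambdas:
        x = np.linalg.solve(B - lam * np.eye(p), -b)
        R = np.sqrt(x @ x)                      # distance from the origin
        Y = b0 + b @ x + 0.5 * x @ B @ x        # predicted response on the ridge
        rows.append((lam, R, Y))
    return np.array(rows)

# Hypothetical two-factor surface (both eigenvalues of B are negative,
# so an overall maximum exists).
b0 = 80.0
b = np.array([1.0, 0.5])
B = np.array([[-2.0, 0.5],
              [0.5, -1.0]])

lam_max = np.linalg.eigvalsh(B).max()
# Large lambda corresponds to R near 0; lambda -> lam_max sends R to infinity.
lambdas = lam_max + np.logspace(2, -2, 60)
ridge = max_ridge(b0, b, B, lambdas)

# Overall stationary point (lambda = 0) for comparison.
x_star = np.linalg.solve(B, -b)
Y_star = b0 + b @ x_star + 0.5 * x_star @ B @ x_star
```

Plotting `ridge[:, 2]` against `ridge[:, 1]` gives the Y-versus-R curve; in this hypothetical example the curve rises while λ > 0 and falls once λ passes below 0, matching the slope property stated in Section 3.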
of 87.26 at R = 1.54). The flatness of this ridge after approximately R = .6 suggests that the center point may be lying just off a stationary ridge in the true yield surface. The nature of this stationary ridge will become obvious shortly. Another very useful piece of information from this plot is a secondary ridge beginning at approximately R = 1.20. The level of the maximum secondary ridge beginning here is almost identical to the level of the maximum ridge. This suggests a possible alternative region for optimizing yield that may be at much more feasible or economic levels of the independent variables.

Figure 3 is the plot of the maximum ridge coordinates versus distance from the origin (R). Not only can we see the coordinates of the maximum point (at R = 1.54) and the point at which we begin to extrapolate, but the behavior of the individual coordinates themselves is available. In situations in which coordinates rapidly achieve some "optimal" level and then remain relatively stable (recall that the overall magnitude of x is being constrained), we can believe that we have found a real and stable solution. This is the case with variables 1-4 in this plot. The behavior of the fifth coefficient is particularly interesting, however. It remains virtually zero until approximately R = 1.0 and then increases wildly. When one remembers that on this ridge Y is virtually constant after approximately R = .6, there is cause for suspicion. The drastic increase in x5 after R = 1
Figure 3. Maximum Ridge Coordinates. This figure plots the coordinates of the maximum ridge (path of steepest ascent) as we move
away from the center point. The vertical line shows the range of experimentation.
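
Because λ is undetermined, the coordinates at a chosen distance R have to be found by calibrating λ against R. One simple way to do this on the maximum ridge is bisection, since R decreases monotonically as λ increases above the largest eigenvalue of B. The sketch below rests on the same assumption as before, namely that the stationarity condition is (B − λI)x = −b (the ½x'Bx form of the model), and on the further assumption that b is not orthogonal to the leading eigenvector of B, so that every R > 0 is attainable. The function names and coefficients are hypothetical.

```python
import numpy as np

def ridge_point(b, B, lam):
    """Solve the ASSUMED ridge equation (B - lam*I) x = -b; return x and R."""
    x = np.linalg.solve(B - lam * np.eye(len(b)), -b)
    return x, float(np.sqrt(x @ x))

def max_ridge_at_radius(b, B, R_target, tol=1e-10):
    """Find lambda on the maximum ridge such that ||x(lambda)|| = R_target."""
    lam_max = np.linalg.eigvalsh(B).max()
    lo = lam_max + 1e-9            # R is very large just above lam_max
    hi = lam_max + 1.0
    # Push hi outward until the radius there is no larger than the target.
    while ridge_point(b, B, hi)[1] > R_target:
        hi = lam_max + 2.0 * (hi - lam_max)
    # Bisection: R(lambda) is decreasing on (lam_max, infinity).
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ridge_point(b, B, mid)[1] > R_target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return lam, ridge_point(b, B, lam)[0]

# Hypothetical two-factor coefficients.
b = np.array([1.0, 0.5])
B = np.array([[-2.0, 0.5],
              [0.5, -1.0]])

results = {}
for R_target in (0.25, 0.5, 1.0):
    lam, x = max_ridge_at_radius(b, B, R_target)
    results[R_target] = (lam, x)
    print(f"R={R_target:.2f}  lambda={lam:.4f}  x={np.round(x, 4)}")
```

Evaluating this on a grid of R values and plotting each coordinate of x against R would produce a plot in the style of Figure 3.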
Figure 4. Secondary Ridge Coordinates. This ridge begins at approximately R = 1.19. The vertical line shows the range of experimentation.
Note the similarity to Figure 3 for all variables except x5.
has no noticeable effect on Y. Viewed from this perspective, it appears obvious that x5 is not critical to the optimization and that the stationary ridge must be almost exactly along the x5 axis. This conclusion is verified by the coordinates of the maximum secondary ridge discussed earlier, seen in Figure 4. The coordinates are almost exactly the same as those for the overall maximum ridge, except that x5 goes off in the opposite direction as R increases.

If one had only solved numerically for the maximum of this fitted surface, however, the coordinates would have been discovered to be x1 = -.28, x2 = .77, x3 = -.69, x4 = .32, and x5 = 1.05. The coordinate for x5 is the largest in magnitude! This information alone would lead one to believe that if one wishes to optimize yield, the greatest change of the independent variables from the center point must be for x5. Such information may or may not be discernable from the significance of the individual terms in the regression model, depending on the nature of the true surface. It is certainly feasible that a variable that is never significant at the 95% level could be important in the optimization, or that a variable that is highly significant in interaction terms may become relatively irrelevant when other variables hover at certain levels. The author's personal experience suggests that this plot often reveals lack of fit that is not detected as significant by conventional tests. Note also that this plot provides the path of steepest ascent (descent) from the origin to the overall optimum, or if none exists within the domain, to the optimum within the design space. It is useful to have the exact path mapped out explicitly if operators are reluctant to make drastic changes based on statistical predictions. This allows an "evolutionary" attainment of the optimum based on one round of experimentation.

The plot of R versus λ in Figure 5 reveals the role of λ in ridge analysis. This plot is often quite useful from a practical point of view because it reveals the range of λ values needed to substitute into equation (3) to plot a particular ridge for some desired range of R. The vertical asymptotes are the eigenvalues of B. Recall that the maximum ridge corresponds to λ values greater than the maximum eigenvalue, and the minimum to λ values less than the minimum eigenvalue. The overall stationary point corresponds to λ = 0.

5. ADDITIONAL CONSIDERATIONS

There are several other considerations to keep in mind when one is examining response surfaces. One of these is the adequacy of the fitted model, or lack of fit. This is generally assessed in the regression stage of analysis, before examination of any fitted surface, according to standard procedures (see Box et al. 1978). The power of these tests is virtually unknown for other than the higher terms of the included variables, however, and lack of significance does not prove the null hypothesis true. The behavior of the optimal coordinates as we move away from the center point may give insight into the adequacy of the fitted model. As previously mentioned, the optimal ridge coordinates should rapidly achieve "optimal" levels and remain relatively stable as R increases. If the experimental region is on a rising ridge, some coordinates will be monotonically increasing or decreasing. In any case, the "ridge trace" of the coordinates should be reasonably stable. Erratic behavior is an indication that the model may not adequately fit the true surface. In the present example, the erratic behavior seems to indicate insignificance of x5.

Another concern (which is often overlooked) is the stability of the selected "optimal" point. By using contour plots of a two-factor surface, one can easily see how drastically the surface drops off when small departures from the selected point occur. In an industrial setting, operating at exact levels consistently is impossible; hence a stable point for setting the independent variables is desired. With higher-dimensional surfaces, holding all but two variables constant
Figure 5. Lambda Versus R. This reveals the range of lambda values needed to examine a particular ridge for a particular range of R.
The vertical asymptotes are the eigenvalues of B. The horizontal line shows the range of experimentation.
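
The shape of a λ-versus-R plot like Figure 5 can be reproduced by evaluating R on each interval between consecutive eigenvalues of B. The following sketch again assumes the ridge equation takes the form (B − λI)x = −b; the coefficients, the padding `pad`, the offset `eps`, and the name `radius_curve` are all hypothetical choices made for illustration.

```python
import numpy as np

def radius_curve(b, B, n=200, pad=3.0, eps=1e-3):
    """Evaluate R(lambda) = ||x(lambda)|| on every interval between eigenvalues.

    Assumes x(lambda) solves (B - lambda*I) x = -b. Points closer than `eps`
    to an eigenvalue are skipped because the system is singular there.
    """
    eig = np.sort(np.linalg.eigvalsh(B))
    edges = np.concatenate(([eig[0] - pad], eig, [eig[-1] + pad]))
    identity = np.eye(len(b))
    segments = []
    for left, right in zip(edges[:-1], edges[1:]):
        if right - left < 2 * eps:      # repeated eigenvalues: no interval
            continue
        lams = np.linspace(left + eps, right - eps, n)
        R = np.array([np.linalg.norm(np.linalg.solve(B - lam * identity, -b))
                      for lam in lams])
        segments.append((lams, R))
    return eig, segments

# Hypothetical two-factor coefficients.
b = np.array([1.0, 0.5])
B = np.array([[-2.0, 0.5],
              [0.5, -1.0]])

eig, segments = radius_curve(b, B)
for lams, R in segments:
    print(f"lambda in [{lams[0]:.3f}, {lams[-1]:.3f}]  min R = {R.min():.3f}")
```

In this example the outer segments correspond to the minimum and maximum ridges, and R grows without bound as λ approaches an eigenvalue. On the interior segment R has a smallest attainable value, which appears to be the radius at which a pair of secondary ridges begins.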
gives an unrealistic picture of how the response may drop off in an industrial environment, as it allows departures from the optimal point only along 2-dimensional planes. The effect of all independent variables changing simultaneously is completely missed. This determination can be accomplished with ridge analysis, however. Once a point has been selected, the variables can be restandardized so that the selected point is now the center point. The response function is mathematically identical, but is now given in terms of the rescaled variables. The ridge analysis can then be repeated, concentrating on the minimum ridge (for a surface to be maximized). This will clearly show how rapidly the response can drop off if the independent variables depart from the selected point. The exact path of the sharpest drop-off is also given explicitly. If our selected point is an overall optimum, the ridges will not be uniquely defined with this restandardization, but canonical analysis will give the stability of the surface around the overall stationary point.

It should be noted that the choice of design may influence the credibility of the ridge analysis results. A rotatable design is particularly useful, as the predicted response will have constant variance on the x'x = R² hyperspheres. Even a rotatable design does not guarantee a desirable distribution of predicted response variance over the entire design region, however. When dealing with designs that contain drastic differences in variance of predicted response, the statistician should incorporate this in the analysis of the surface. For a rotatable design, this could be done with a plot of predicted response variance versus R.

The question of how to handle multiple responses is often of great importance to industrial statisticians. Unfortunately, this is basically an unsolved problem, unless additional concessions are made. One popular strategy is to create a single "summary" response made up of some function of the original responses, such as a weighted average. This is then regressed by using the independent variables, which gives one surface to optimize. If this procedure is valid for the particular situation, ridge analysis is well suited for optimization of this summary response. Khuri and Conlon (1981) gave a more elaborate co-optimization procedure based on this general idea, and Myers and Carter (1973) discussed the application of ridge analysis to dual response systems.

6. THE CONNECTION TO RIDGE REGRESSION

At about the same time that he was developing ridge analysis, another problem was perplexing Hoerl. This was the frequent occurrence of nonsensical estimates from least squares multiple regression. In an obscure article (Hoerl 1962), he noted that the residual sum of squares in regression could be written as a quadratic function of the coefficients. Ridge analysis could therefore be used to calculate and plot the coefficients as one moved along the minimum ridge of the residual sum of squares from the overall minimum (least squares) to some more stable solution closer to the origin. The unsolved problem of how far to remove the coefficients from the minimum (how to choose k) prevented any further publication of this idea until the assistance of Kennard, which led to the publication of the "first" papers on ridge regression in 1970 (Hoerl and Kennard 1970a,b).

The following describes the derivation of ridge regression by ridge analysis.

The residual sum of squares in multiple regression can be written as

$$(Y - \hat{Y})'(Y - \hat{Y}) = Y'Y - 2\beta'X'Y + \beta'(X'X)\beta,$$

where Ŷ = Xβ. This is in the same form as (2) for ridge analysis (multiplied by two) with β as our x coordinates, −X'Y as our b vector, and (X'X) as our "B" matrix. We are therefore trying to minimize the residual sum of squares as a function of our coefficients. It is interesting to note that in some cases there may be a minimum secondary ridge at almost the same level as the overall minimum. In this
and moving on the λ (i.e., −k) scale, rather than on the R scale. Note that k is negative here because we are plotting the maximum rather than the minimum ridge. In this case, x1-x5 play the role of regression coefficients rather than design-variable levels as in ridge analysis. Again we see that the coordinates for x1-x4 are stable, but those for x5 are not. In ridge regression we are therefore making the interpretation relative to a more important point for this application, the least squares solution. With the ridge trace we are again concerned with the stability of the "optimal" coordinates (coefficients). When a coefficient drops erratically toward 0 as k increases from 0, this is analogous to the coordinate for an independent variable hovering at 0 and then increasing wildly in magnitude just before the overall optimum is reached in ridge analysis. In both cases we view the coefficient (coordinates) with suspicion and consider setting it at some less drastic level. Although many other

Ridge analysis was recently applied to the problem of estimating particle size distribution parameters in small angle neutron scattering (Fatica et al. 1985). It was used as a sequential search procedure to minimize a multivariate error function that could not be written in closed form but could be approximated with a quadratic function of the parameters. This application and Hoerl's (1964) discussion of applying ridge analysis to the solution of simultaneous linear equations suggest that the technique has potential as a general numerical analysis procedure. It can be used for optimization and examination of the stability of "exact" solutions.

Although ridge analysis may not have received the attention it deserves in the statistical literature, some notable papers have been published. As the original paper (Hoerl 1959) was published in an engineering journal, no proofs of the properties discussed were given. These were soon provided in the literature in a rigorous fashion by Draper
Figure 6. Ridge Trace. This is a ridge coordinates plot (see Figure 3) in the form of a ridge trace from a ridge regression. The origin has
been shifted to the stationary point, and the horizontal axis is A (- k) rather than R. Note that k is negative here because we are plotting a
maximum ridge rather than a minimum ridge as in ridge regression.
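
The correspondence described in Section 6 can be checked numerically. Taking B = X'X, b = −X'Y, and λ = −k, and assuming the ridge equation has the form (B − λI)x = −b, the ridge analysis solution becomes (X'X + kI)β = X'Y, which is the familiar ridge regression estimator. The sketch below uses simulated data with two nearly collinear predictors; the data, the seed, and the function names are hypothetical and serve only to illustrate the identity.

```python
import numpy as np

# Simulated regression problem with two nearly collinear columns (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=30)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

# Ridge-analysis quantities as identified in Section 6.
B = X.T @ X
b = -X.T @ y

def ridge_analysis_point(b, B, lam):
    """Solve the ASSUMED ridge equation (B - lam*I) x = -b."""
    return np.linalg.solve(B - lam * np.eye(len(b)), -b)

def ridge_regression(X, y, k):
    """Ridge regression estimator: (X'X + k*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

ks = [0.0, 0.01, 0.1, 1.0]
# Moving along the minimum ridge of the residual sum of squares: lambda = -k.
trace = np.array([ridge_analysis_point(b, B, -k) for k in ks])

for k, beta in zip(ks, trace):
    print(f"k={k:<5}  beta={np.round(beta, 3)}  "
          f"matches ridge regression: {np.allclose(beta, ridge_regression(X, y, k))}")
```

Plotting the rows of `trace` against k gives a ridge trace; k = 0 is the least squares solution, and the coefficient vector shrinks toward the origin as k grows.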
combining the ridge analysis constraint with the secondary response constraint. This results in a ridge analysis subject to an additional quadratic constraint.

8. SUMMARY

During the past 25 years, ridge analysis has received insufficient attention in the statistical literature. A popular text by Myers (1976), which contains a detailed development, is one of the only sources to mention it when discussing response surface analysis. This situation is unfortunate, as ridge analysis displays the effects of all factors simultaneously, finds secondary (local) optima, and interprets the surface relative to the original center point. The popularity and controversial nature of ridge regression, as well as the general lack of emphasis on analysis in response surface methodology, are major causes of this oversight.

Although no major statistical computer packages perform ridge analysis, the author will gladly supply interested parties with either a Minitab macro or FORTRAN 77 source code. The FORTRAN code analyzes the surface and writes relevant data to separate files that can be accessed in a graphics package for plotting.

Chemical Engineering Progress, 55, 69-78.
--- (1962), "Application of Ridge Analysis to Regression Problems," Chemical Engineering Progress, 58, 54-59.
--- (1964), "Ridge Analysis," Chemical Engineering Progress Symposium Series, 60, 67-77.
Hoerl, A. E., and Kennard, R. W. (1970a), "Ridge Regression: Biased Estimation for Non-Orthogonal Problems," Technometrics, 12, 55-67.
--- (1970b), "Ridge Regression: Applications to Non-Orthogonal Problems," Technometrics, 12, 69-82.
Jaworski, A., and Szelejewska, I. (1978), "Zastosowanie Metody Hoerla Do Optymalizacji Procesu Chemicznego Na Przykladzie Syntezy Tetrahydrofuranu Z Butandiolu-1,4," Przemysl Chemiczny, 57, 564-567.
Khuri, A. I., and Conlon, M. (1981), "Simultaneous Optimization of Multiple Responses Represented by Polynomial Regression Functions," Technometrics, 23, 363-375.
Khuri, A. I., and Myers, R. H. (1979), "Modified Ridge Analysis," Technometrics, 21, 467-473.
Mohammed, M. S., Schmidt, B., Schneidt, D., and Ralek, M. (1979), "Optimierung Der Fischer-Tropsch-Flussig-Phasen-Synthese in Richtung Der Maximalen Selektivitat An C2 Bis C4-Olefinen," Chemie Ingenieur Technik, 51, 739-741.
Myers, R. H. (1976), Response Surface Methodology, Blacksburg, VA: Author, Virginia Polytechnic Institute and State University.
Myers, R. H., and Carter, W. H. (1973), "Response Surface Techniques for Dual Response Systems," Technometrics, 15, 301-317.
Sarma, G. S., and Ravindram, M. (1975), "Studies in Dealkylation-A Statistical Analysis of Process Variables," Journal of the Indian Institute of Science, 58, 67-83.