Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to digitize, preserve and extend
access to The American Statistician.
http://www.jstor.org
This content downloaded from 128.235.251.160 on Thu, 05 Mar 2015 08:55:41 UTC
All use subject to JSTOR Terms and Conditions
Ridge Analysis 25 Years Later

ROGER W. HOERL*

The response surface technique called ridge analysis was originally introduced by Hoerl (1959) more than 25 years ago. Despite tremendous advantages over more conventional response surface procedures when more than two independent variables are present, ridge analysis has received little attention in the statistical literature since then, although numerous applications have appeared in engineering journals. This situation may be partially due to the fact that this procedure led to the discovery of ridge regression, which has completely overshadowed ridge analysis in the literature since. This discussion will briefly review the mathematics of ridge analysis, its literature, practical advantages, and relationship to ridge regression.

KEY WORDS: Response surface; Constrained optimization; Graphics.

*Roger W. Hoerl is a Research Mathematician in the Engineering Sciences Division of Hercules Incorporated, Research Center, Wilmington, DE 19894. The author would like to thank the referees for their substantial contribution.

The American Statistician, August 1985, Vol. 39, No. 3. © 1985 American Statistical Association.

1. INTRODUCTION

Ridge analysis was originally developed by Hoerl (1959) for examining higher-dimensional quadratic response surfaces. In contrast to ridge regression, which is an alternative to least squares estimation in multiple regression, ridge analysis graphically portrays the behavior of these surfaces and locates overall and local optimal regions. While employed in the statistical group at DuPont, Hoerl was often asked to optimize industrial processes involving more than the two or three independent variables traditionally seen in response surface literature. Although the method of canonical analysis had been developed by that time, this was generally inadequate for multidimensional surfaces, for reasons to be discussed in a later section. Possessing an engineering background, he felt the need for more than a numerical optimization of the estimated function. Ridge analysis, then, was the approach he proposed for this problem. Despite providing insightful graphics of the effects of all factors simultaneously, as well as optimizing the surface for any distance from the center point of the design, the technique did not immediately catch on. It may not have been widely understood at that time, however, as response surface analysis was then a rather new concept. Box and Wilson's (1951) classic paper had been published only eight years previously. Another hindrance has been the general dearth of literature on analysis relative to design in response surface methodology. The confusion of ridge analysis with ridge regression has not helped this situation. Ridge analysis appears to deserve a better fate. The technique should be routinely taught in response surface courses and considered whenever the statistician faces a higher-dimensional quadratic response surface.

2. REVIEW

If we have a quadratic surface in p independent variables, the equation is of the form

$$Y = b_0 + \sum_{j=1}^{p} b_j x_j + \sum_{i=1}^{p-1} \sum_{j=i+1}^{p} b_{ij} x_i x_j + \sum_{j=1}^{p} b_{jj} x_j^2, \qquad (1)$$

where the three sets of terms model the linear, interaction, and quadratic effects, respectively. We can write this in matrix notation as

$$Y = b_0 + b'x + \tfrac{1}{2} x'Bx, \qquad (2)$$

where $b$ is the $p \times 1$ vector of linear coefficients, $x$ is the $p \times 1$ vector of independent variable values, $b_0$ is the constant term, and $B$ is a $p \times p$ symmetric matrix whose diagonal values are twice the quadratic terms and whose off-diagonal values are the interaction terms. If the variables are standardized to have zero mean and equal standard deviations, our experimental region can be interpreted easily as some geometric figure with the center point as the origin. For a rotatable design (e.g., Box-Wilson), this is a (hyper-)sphere defined by $x'x \le C^2$ (i.e., the contours of predicted response variance form spheres; the actual points may not define an exact sphere). For p = 2 (or 3, if three-dimensional graphics are available), contour plots of the surface can be drawn to look for optimal areas, as in Figure 1. These plots are easily interpreted and make it obvious to the statistician when he or she is extrapolating beyond the experimental region. In higher dimensions, these plots require fixing p - 2 of the variables, which leads to exactly the same difficulties as with one-variable-at-a-time optimization. It should be kept in mind that response surface methodology was developed specifically to avoid these pitfalls (see Box et al. 1978). From a practical standpoint, this may not even be feasible for p greater than 5 or 6, as thick stacks of plots may be required to adequately examine the domain.

A canonical analysis (see Davies 1956) can be performed for p > 2, but this lacks the advantages of contour plots. It shifts the reference point away from the origin and provides no graphics. With this procedure, it is never immediately obvious when one has left the experimental region and, in fact, a canonical analysis may use a point completely outside this region for reference. Interpretation of ridges, maxima, or both relative to a point outside the domain is clearly undesirable. In addition, the path of steepest ascent from the original center point is not given. This method does have its advantages, however. The nature of the surface (max, min, or saddle point) is determined, and the existence and nature of possible stationary ridges are given.

Ridge analysis combines the advantages of both procedures with higher-dimensional surfaces. Basically, it provides a canonical analysis relative to the original center point and produces graphics of surfaces of any dimension without
holding any variables fixed. In addition, secondary (i.e., local) optimal regions are examined, and insight into the adequacy of the fitted model can often be gained through examination of the graphics. This procedure has its drawbacks as well, however, particularly in that its graphics are not as easily understood as contour plots, and it depends on the use of a quadratic model.

[Figure 1. Response Contours. The curve leaving the origin at approximately 45° is the maximum ridge, or path of steepest ascent.]

3. INTRODUCTION TO RIDGE ANALYSIS

Using the previous notation, consider fixing $x'x = R^2$ and maximizing equation (2) subject to this constraint. For any given R, some maximum Y(R) is defined (with probability 1 if the coefficients are normally distributed). Connecting the coordinates of the Y(R) values for $0 \le R^2 \le C^2$ would display the coordinates of the maximum response attainable for any given distance from the origin. This is defined to be the maximum ridge, and traces the path of steepest ascent from the origin. The contour plot in Figure 1 (from Hoerl 1964) has the maximum ridge drawn on it. It is merely coincidence that this ridge is nearly a straight line at 45°. The minimum ridge is defined similarly and gives the path of steepest descent. Mathematically, these points are determined by differentiating (2) (with use of a Lagrangian multiplier) with respect to x, equating to 0, and solving for x. The resulting equation is

$$x = -(B - \lambda I)^{-1} b, \qquad (3)$$

where $\lambda$ is the Lagrangian multiplier that determines x, R, and Y. If $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ are the ranked eigenvalues of B, the following properties result:

1. The maximum ridge is defined by $\lambda > \lambda_1$.
2. The minimum ridge is defined by $\lambda < \lambda_p$.
3. Secondary ridges are defined for $\lambda_{j+1} < \lambda < \lambda_j$.
4. At least two, and at most 2p, ridges exist.
5. No two ridge plots (Y as a function of R) cross.
6. All ridge plots are monotonic with R with at most one exception.

The secondary ridges correspond to local optima on the hyperspheres $x'x = R^2$.

The last property follows from $\partial Y / \partial R^2 = \lambda/2$. Since the slope of any ridge is determined by its $\lambda$ values and therefore will change sign only if 0 is included in its $\lambda$ range, an overall optimum will exist if and only if all eigenvalues are negative (maximum), or all are positive (minimum). If B has both negative and positive eigenvalues, a saddle point (minimax) exists in the fitted surface. If $\lambda_1 < 0$ (i.e., all eigenvalues are negative), the maximum ridge plot will be increasing as it moves away from the origin, will eventually hit a maximum, and will begin decreasing. See Draper (1963) or Hoerl (1964) for a more detailed discussion of the mathematical properties.

4. PRACTICAL ADVANTAGES

Numerical coordinates alone do not provide the human mind with sufficient information about higher-dimensional response surfaces. Graphics are necessary for the same reasons they are necessary in regression, time series, or other statistical analyses. With ridge analysis, the predicted response can be plotted against R for each ridge, the coordinates of any ridge can be plotted against R, and a plot of R versus $\lambda$ enables the statistician to calibrate $\lambda$ with both desired ridge and R. (Recall that $\lambda$ is an undetermined Lagrangian multiplier and therefore one cannot solve for x simply by specifying a particular ridge and R.)

The five-factor yield example discussed by Hoerl (1964) will be used to illustrate the usefulness of the plots. This example is rather old, but proves very illustrative of unique information attainable using ridge analysis with "live" data.

Figure 2 is the plot of Y versus R. Each separate curve corresponds to a different ridge, or local optimum. At each "cusp" point where a new ridge begins, maximum (of these two) and minimum (of these two) secondary ridges exist that jointly form a cone shape. The overall maximum and minimum ridges begin at R = 0, the origin, and each takes on the value $b_0$ at this point. Note that the boundary of the experimental region (here approximately 2.24) is clearly discernible. The maximum ridge is virtually a horizontal line coming out from $b_0$, indicating that very little improvement over the center point yield is attainable. (It should be mentioned that the fitted surface has an overall maximum
[Figure 2. Response Ridges. For each ridge (local optimum), we see the predicted value of the response versus R, the distance from the origin. The vertical line shows the range of experimentation.]
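
A plot like Figure 2 can be produced directly from equations (2) and (3). The following is a minimal Python sketch, not the author's Minitab or FORTRAN code: the two-factor coefficients, the function names, and the logarithmic grid of lambda values are all assumptions chosen for illustration. It traces the maximum ridge by stepping lambda down toward the largest eigenvalue of B and recording R and Y at each step.

```python
import numpy as np

# Hypothetical two-factor fitted surface; these coefficients are illustrative
# assumptions and do not come from the article's five-factor example.
b0 = 80.0
b = np.array([1.0, 0.5])          # linear coefficients b_j
b_quad = np.array([-1.0, -0.5])   # pure quadratic coefficients b_jj
b_inter = {(0, 1): 0.4}           # interaction coefficients b_ij, i < j


def build_B(b_quad, b_inter):
    """Symmetric B of equation (2): diagonal = 2*b_jj, off-diagonal = b_ij."""
    B = np.diag(2.0 * np.asarray(b_quad, dtype=float))
    for (i, j), coef in b_inter.items():
        B[i, j] = B[j, i] = coef
    return B


def response(x, b0, b, B):
    """Equation (2): Y = b0 + b'x + (1/2) x'Bx."""
    return b0 + b @ x + 0.5 * x @ B @ x


def ridge_point(lam, b, B):
    """Equation (3): x = -(B - lam*I)^{-1} b."""
    return -np.linalg.solve(B - lam * np.eye(len(b)), b)


def maximum_ridge(b0, b, B, r_max, n=200):
    """Trace the maximum ridge (lam above the largest eigenvalue) out to r_max."""
    lam1 = np.linalg.eigvalsh(B).max()
    rows = []
    # Large offsets above lam1 give R near 0; small offsets give large R.
    for offset in np.logspace(3, -3, n):
        lam = lam1 + offset
        x = ridge_point(lam, b, B)
        r = np.linalg.norm(x)
        if r > r_max:
            break
        rows.append((lam, r, response(x, b0, b, B)))
    return np.array(rows)  # columns: lambda, R, Y


B = build_B(b_quad, b_inter)
path = maximum_ridge(b0, b, B, r_max=2.0)
```

Plotting column 2 of `path` against column 1 gives one curve of a response-ridge plot; the minimum ridge follows the same pattern with lambda taken below the smallest eigenvalue.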
[Figure 3. Caption not recovered. Surviving curve labels: x2, x4.]
[Figure 4. Secondary Ridge Coordinates. This ridge begins at approximately R = 1.19. The vertical line shows the range of experimentation. Note the similarity to Figure 3 for all variables except x5.]
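
The radius at which a secondary ridge begins can be located numerically. The sketch below rests on an assumption that follows from equation (3) but is not spelled out in the article: writing c = V'b in the eigenvector basis of B gives R^2(lambda) = sum of c_i^2 / (lambda_i - lambda)^2, which is convex between adjacent eigenvalues, so each gap has a single lambda of smallest R, taken here to be the cusp. The matrix, vector, and function names are hypothetical, and distinct eigenvalues with nonzero c_i are assumed.

```python
import numpy as np


def radius_sq_slope(lam, eigvals, c):
    """d(R^2)/d(lam) for R^2(lam) = sum c_i^2 / (lam_i - lam)^2."""
    return 2.0 * np.sum(c ** 2 / (eigvals - lam) ** 3)


def secondary_ridge_cusps(b, B, iters=200):
    """For each gap between adjacent eigenvalues, find the lam that minimizes R.

    Assumes distinct eigenvalues and a nonzero component of b along every
    eigenvector, so the slope runs from -inf to +inf across each gap.
    """
    eigvals, V = np.linalg.eigh(B)  # ascending order (the article ranks descending)
    c = V.T @ b                     # b expressed in the eigenvector basis
    cusps = []
    for lo, hi in zip(eigvals[:-1], eigvals[1:]):
        a, z = lo, hi
        for _ in range(iters):
            mid = 0.5 * (a + z)
            # The slope is increasing in lam, so bisect on its sign.
            if radius_sq_slope(mid, eigvals, c) < 0:
                a = mid
            else:
                z = mid
        lam = 0.5 * (a + z)
        x = -np.linalg.solve(B - lam * np.eye(len(b)), b)  # equation (3)
        cusps.append((lam, np.linalg.norm(x)))
    return cusps  # list of (lambda at cusp, starting radius R)


# Hypothetical three-factor surface (illustrative values only).
B_demo = np.array([[-2.0, 0.4, 0.0],
                   [0.4, -1.0, 0.3],
                   [0.0, 0.3, 1.5]])
b_demo = np.array([1.0, 0.5, -0.8])
cusps = secondary_ridge_cusps(b_demo, B_demo)
```

Each returned radius plays the role of the "begins at approximately R = 1.19" value quoted in the Figure 4 caption; lambda values on either side of the cusp within the same gap would trace the two secondary ridges that start there.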
[Figure 5. Lambda Versus R. This reveals the range of lambda values needed to examine a particular ridge for a particular range of R. The vertical asymptotes are the eigenvalues of B. The horizontal line shows the range of experimentation.]
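
The calibration that Figure 5 supports graphically can also be done numerically. This is a minimal sketch under stated assumptions: it handles only the maximum ridge, where R decreases monotonically as lambda rises above the largest eigenvalue of B, and it uses plain bisection. The function names and the two-factor coefficients are hypothetical.

```python
import numpy as np


def ridge_radius(lam, b, B):
    """R = |x(lam)| with x(lam) = -(B - lam*I)^{-1} b from equation (3)."""
    x = -np.linalg.solve(B - lam * np.eye(len(b)), b)
    return np.linalg.norm(x)


def lambda_for_radius(r_target, b, B, iters=200):
    """Find lam above the largest eigenvalue of B such that R(lam) = r_target.

    On the maximum ridge R falls from +inf (lam just above the largest
    eigenvalue) toward 0 (lam -> +inf), so bisection on lam is enough.
    Assumes r_target > 0 and b != 0.
    """
    lam1 = np.linalg.eigvalsh(B).max()
    lo, hi = lam1, lam1 + 1.0
    # Widen the bracket until R(hi) is at or below the target.
    while ridge_radius(hi, b, B) > r_target:
        hi = lam1 + 2.0 * (hi - lam1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ridge_radius(mid, b, B) > r_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-factor surface (illustrative values only).
B_cal = np.array([[-2.0, 0.4],
                  [0.4, -1.0]])
b_cal = np.array([1.0, 0.5])
lam_at_boundary = lambda_for_radius(1.5, b_cal, B_cal)
```

In this illustrative surface both eigenvalues are negative, so radii beyond the stationary point correspond to negative lambda on the maximum ridge, which is where the response-ridge plot starts to decrease.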
case, there are two separate regions in the space in which to nearly minimize residual sum of squares. If (X'X) is positive definite, all eigenvalues of our "B" matrix are positive, implying that an overall minimum exists at $(X'X)^{-1}X'Y$. We can plot the minimum ridge of the residual sum of squares from the origin (R = 0) to the overall minimum by using a lambda value smaller than the smallest eigenvalue of (X'X) in (3), giving $(X'X - \lambda I)^{-1}X'Y$. (Note that the Lagrangian multiplier is generally $k = -\lambda$ in the ridge regression literature.) Lambda less than 0 (i.e., k > 0) results in a solution closer to the origin than the overall minimum.

We can also plot the behavior of the coordinates as we move toward the minimum, resulting in a "ridge trace." Note that with ridge regression one plots $\beta$ versus k (i.e., $-\lambda$), rather than $\beta$ versus R. This has the effect of shifting the origin to the overall minimum ($\lambda = 0$) and moving backward toward R = 0. Thus Figure 6 shows the ridge coordinates plot from a ridge regression perspective as the "ridge trace," beginning at the overall minimum ($\lambda = 0$) and moving on the $\lambda$ (i.e., $-k$) scale, rather than on the R scale. Note that k is negative here because we are plotting the maximum rather than the minimum ridge. In this case, x1-x5 play the role of regression coefficients rather than design-variable levels as in ridge analysis. Again we see that the coordinates for x1-x4 are stable, but those for x5 are not. In ridge regression we are therefore making the interpretation relative to a more important point for this application, the least squares solution. With the ridge trace we are again concerned with the stability of the "optimal" coordinates (coefficients). When a coefficient drops erratically toward 0 as k increases from 0, this is analogous to the coordinate for an independent variable hovering at 0 and then increasing wildly in magnitude just before the overall optimum is reached in ridge analysis. In both cases we view the coefficient (coordinates) with suspicion and consider setting it at some less drastic level. Although many other interpretations of ridge regression have appeared in the literature, this is the philosophy by which it was originally developed and interpreted.

[Figure 6. Ridge trace discussed in the text. Caption not recovered. Surviving curve labels: x1, x3, x4, x5.]

7. APPLICATIONS AND EXTENSIONS

As previously mentioned, the lack of popularity of ridge analysis in the statistical literature has not prevented practical-minded engineers from employing it. Of the numerous applications papers appearing in the engineering literature (the author knows of more than 25), many have come from outside the United States, including Eastern Europe. Erhardt et al. (1978, 1980) of East Germany discussed applications dealing with low methoxyl pectin gels. A detailed discussion of the technique, along with applications, appeared in Poland (Jaworski and Szelejewska 1978). Applications dealing with the dealkylation of xylene isomers were given by Sarma and Ravindram (1975) from India. Mohammed et al. (1979) of West Germany discussed the use of ridge analysis to optimize Fischer-Tropsch synthesis of olefins.

Ridge analysis was recently applied to the problem of estimating particle size distribution parameters in small angle neutron scattering (Fatica et al. 1985). It was used as a sequential search procedure to minimize a multivariate error function that could not be written in closed form but could be approximated with a quadratic function of the parameters. This application and Hoerl's (1964) discussion of applying ridge analysis to the solution of simultaneous linear equations suggest that the technique has potential as a general numerical analysis procedure. It can be used for optimization and examination of the stability of "exact" solutions.

Although ridge analysis may not have received the attention it deserves in the statistical literature, some notable papers have been published. As the original paper (Hoerl 1959) was published in an engineering journal, no proofs of the properties discussed were given. These were soon provided in the literature in a rigorous fashion by Draper
(1963), although they had been independently derived by Jackson (see Hoerl 1959). Hoerl (1964) then combined this mathematical rigor with the practical interpretation of Hoerl (1959).

Khuri and Myers (1979) presented a modification of ridge analysis for use with nonrotatable designs. They suggested that it may be more appropriate to optimize Y for fixed variance of prediction rather than for fixed distance from the origin. Their point is certainly well taken, but this modified procedure loses some of the intuitive appeal and ease of interpretation that make ridge analysis desirable in practice. The use of a rotatable design will clearly satisfy both criteria, but standard ridge analysis still makes logical sense with designs that are not exactly rotatable. Only when the variance of prediction varies drastically for fixed R will nonrotatability be a major concern.

Myers and Carter (1973) discussed a procedure similar to ridge analysis for optimizing dual response systems. Essentially, the $x'x = R^2$ constraint is replaced with a quadratic constraint on a secondary response. They also discussed combining the ridge analysis constraint with the secondary response constraint. This results in a ridge analysis subject to an additional quadratic constraint.

8. SUMMARY

During the past 25 years, ridge analysis has received insufficient attention in the statistical literature. A popular text by Myers (1976), which contains a detailed development, is one of the only sources to mention it when discussing response surface analysis. This situation is unfortunate, as ridge analysis displays the effects of all factors simultaneously, finds secondary (local) optima, and interprets the surface relative to the original center point. The popularity and controversial nature of ridge regression, as well as the general lack of emphasis on analysis in response surface methodology, are major causes of this oversight.

Although no major statistical computer packages perform ridge analysis, the author will gladly supply interested parties with either a Minitab macro or FORTRAN 77 source code. The FORTRAN code analyzes the surface and writes relevant data to separate files that can be accessed in a graphics package for plotting.

[Received March 1984. Revised November 1984.]

REFERENCES

Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, New York: John Wiley.
Box, G. E. P., and Wilson, K. B. (1951), "On the Experimental Attainment of Optimum Conditions," Journal of the Royal Statistical Society, Ser. B, 13, 1-38.
Davies, O. L. (ed.) (1956), Design and Analysis of Industrial Experiments, New York: Hafner.
Draper, N. R. (1963), "Ridge Analysis of Response Surfaces," Technometrics, 5, 469-479.
Erhardt, V., Krause, M., Seppelt, B., and Bock, W. (1978), "Losung Von Lebensmittelchemischen Und -Technologischen Aufgaben Mit Hilfe Der Statistischen Versuchsplanung," Lebensmittelindustrie, 25, 151-154.
Erhardt, V., Krause, M., Seppelt, B., and Bock, W. (1980), "Optimierung Von Zwei Und Mehr Zielgrossen Bei Der Anwendung Statistischer Versuchsplane Zur Losung Lebensmittelchemischer Und -Technologischer Aufgaben," Lebensmittelindustrie, 27, 107-110.
Fatica, M. G., Gelman, R. A., Wai, M. P., Hoerl, R. W., and Wignall, G. D. (1985), "Neutron Scattering From Latices Prepared by Seeded Emulsion Polymerization," manuscript in preparation.
Hoerl, A. E. (1959), "Optimum Solution of Many Variables Equations," Chemical Engineering Progress, 55, 69-78.
Hoerl, A. E. (1962), "Application of Ridge Analysis to Regression Problems," Chemical Engineering Progress, 58, 54-59.
Hoerl, A. E. (1964), "Ridge Analysis," Chemical Engineering Progress Symposium Series, 60, 67-77.
Hoerl, A. E., and Kennard, R. W. (1970a), "Ridge Regression: Biased Estimation for Non-Orthogonal Problems," Technometrics, 12, 55-67.
Hoerl, A. E., and Kennard, R. W. (1970b), "Ridge Regression: Applications to Non-Orthogonal Problems," Technometrics, 12, 69-82.
Jaworski, A., and Szelejewska, I. (1978), "Zastosowanie Metody Hoerla Do Optymalizacji Procesu Chemicznego Na Przykladzie Syntezy Tetrahydrofuranu Z Butandiolu-1,4," Przemysl Chemiczny, 57, 564-567.
Khuri, A. I., and Conlon, M. (1981), "Simultaneous Optimization of Multiple Responses Represented by Polynomial Regression Functions," Technometrics, 23, 363-375.
Khuri, A. I., and Myers, R. H. (1979), "Modified Ridge Analysis," Technometrics, 21, 467-473.
Mohammed, M. S., Schmidt, B., Schneidt, D., and Ralek, M. (1979), "Optimierung Der Fischer-Tropsch-Flussig-Phasen-Synthese in Richtung Der Maximalen Selektivitat An C2 Bis C4-Olefinen," Chemie Ingenieur Technik, 51, 739-741.
Myers, R. H. (1976), Response Surface Methodology, Blacksburg, VA: Author, Virginia Polytechnic Institute and State University.
Myers, R. H., and Carter, W. H. (1973), "Response Surface Techniques for Dual Response Systems," Technometrics, 15, 301-317.
Sarma, G. S., and Ravindram, M. (1975), "Studies in Dealkylation-A Statistical Analysis of Process Variables," Journal of the Indian Institute of Science, 58, 67-83.