Society for Industrial and Applied Mathematics is collaborating with JSTOR to digitize, preserve and extend
access to Journal of the Society for Industrial and Applied Mathematics.
of Φ are ellipsoids, while if f is nonlinear, the contours are distorted according to the severity of the nonlinearity. Even with nonlinear models, however, the contours are nearly elliptical in the immediate vicinity of the minimum of Φ. Typically the contour surface of Φ is greatly attenuated in some directions and elongated in others, so that the minimum lies at the bottom of a long curving trough.
In this paper attention is confined to the algorithm for obtaining least-
squares estimates. For discussion of the broader statistical aspects of non-
linear estimation and application to specific examples the reader is referred
to items [1], [2], [4], [5], [6] in the References. Other references are given in
the papers cited.
Methods in current use. The method based upon expanding f in a Taylor series (sometimes referred to as the Gauss method [1], [7], or the Gauss-Newton method [4]) is as follows.
Writing the Taylor series through the linear terms,

(4)    ⟨Y⟩ = f(X, b + δ_t) ≅ f₀ + P δ_t,

where f₀ denotes f evaluated at the current trial value b, and P is the matrix of partial derivatives (∂f_i/∂b_j). Now, δ_t appears linearly in (4), and can therefore be found by the standard least-squares method of setting ∂Φ/∂δ_tj = 0 for all j. Thus δ_t is found by solving

(6)    A δ_t = g,

where

(7)    A = P^T P,

(9)    g = P^T (Y - f₀),

the elements of A being a_jk = Σ_i (∂f_i/∂b_j)(∂f_i/∂b_k).
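As an illustrative sketch (not from the paper), the normal equations (6) can be formed and solved numerically. The model f(x; b) = b1·(1 - exp(-b2·x)), the data, and the starting point below are all invented for the example:

```python
import numpy as np

# Hypothetical model (chosen for illustration): f(x; b) = b1 * (1 - exp(-b2 * x)).
def f(x, b):
    return b[0] * (1.0 - np.exp(-b[1] * x))

def jacobian(x, b):
    # P[i, j] = partial of f(x_i; b) with respect to b_j.
    e = np.exp(-b[1] * x)
    return np.column_stack([1.0 - e, b[0] * x * e])

x = np.linspace(0.1, 4.0, 20)
Y = f(x, np.array([2.0, 1.5]))        # noise-free synthetic "data" for the sketch

b0 = np.array([1.0, 1.0])             # current trial point
P = jacobian(x, b0)
r = Y - f(x, b0)                      # residual vector Y - f0
A = P.T @ P                           # eq. (7)
g = P.T @ r                           # eq. (9)
delta_t = np.linalg.solve(A, g)       # eq. (6): the Taylor-series correction
b_new = b0 + delta_t
```

Solving the normal equations this way is algebraically the same as a linear least-squares fit of the residual on the columns of P.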
Various modified steepest-descent methods have been employed [5] to compensate partially for the typically poor conditioning of the Φ surface, which leads to very slow convergence of the gradient methods. With these gradient methods, as with the Taylor series methods, it is necessary to control the step size carefully once the direction of the correction vector has been established. Even so, slow convergence is the rule rather than the exception.
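The slow convergence along an elongated trough can be seen on even a toy quadratic surface (the surface and step size below are invented for illustration, not taken from the paper):

```python
import numpy as np

# Elongated quadratic stand-in for the Phi surface: condition number 100.
def phi(b):
    return b[0] ** 2 + 100.0 * b[1] ** 2

def grad(b):
    return np.array([2.0 * b[0], 200.0 * b[1]])

b = np.array([1.0, 1.0])
step = 0.009                # close to the stability limit 2/200 for the steep axis
for _ in range(100):
    b = b - step * grad(b)

# The steep axis collapses almost immediately, but along the trough the
# iterate contracts only by (1 - 2*step) = 0.982 per step, i.e. ~0.16 in 100 steps.
```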
A number of convergence proofs, valid under various assumptions about the properties of f, have been derived for the Taylor series method [4], [7]. Convergence proofs have also been derived for various gradient-type methods; e.g., [3]. However, mathematical proof of the monotonicity of Φ from iteration to iteration, assuming a well-behaved function f and assuming infinitely precise arithmetic processes, is at best a necessary condition for convergence in practice. Such proofs, in themselves, give no adequate indication of the rate of convergence. Among the class of theoretically convergent methods one must seek to find methods which actually do converge promptly using finite arithmetic, for the large majority of problems.
Qualitative analysis of the problem. In view of the inadequacies of the Taylor series and gradient methods, it is well to review the undergirding principles involved. First, any proper method must result in a correction vector whose direction is within 90° of the negative gradient of Φ. Otherwise the values of Φ can be expected to be larger rather than smaller at points along the correction vector. Second, because of the severe elongation
(10)    (A + λI) δ₀ = g.

Then δ₀ minimizes Φ on the sphere whose radius ‖δ‖ satisfies

‖δ‖² = ‖δ₀‖².
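The theorem can be checked numerically on a randomly generated instance (the matrix and right-hand side below are arbitrary test data): solve (10) for δ₀ and compare Φ against random points on the same sphere.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))          # arbitrary test matrix
y0 = rng.normal(size=10)              # plays the role of Y - f0

A = P.T @ P
g = P.T @ y0
lam = 0.5
delta0 = np.linalg.solve(A + lam * np.eye(3), g)    # eq. (10)

def phi(delta):
    return np.sum((y0 - P @ delta) ** 2)            # eq. (11)

radius = np.linalg.norm(delta0)
for _ in range(500):
    d = rng.normal(size=3)
    d *= radius / np.linalg.norm(d)   # random point on the sphere ||d|| = ||delta0||
    assert phi(d) >= phi(delta0) - 1e-9
```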
Proof. In order to find δ which will minimize

(11)    Φ(δ) = ‖Y - f₀ - Pδ‖²,

under the constraint

(12)    ‖δ‖² = ‖δ₀‖²,

a necessary condition for a stationary point is, by the method of Lagrange,

(13)    ∂u/∂δ₁ = ∂u/∂δ₂ = ··· = ∂u/∂δ_k = 0,    ∂u/∂λ = 0,

where

(14)    u(δ, λ) = ‖Y - f₀ - Pδ‖² + λ(‖δ‖² - ‖δ₀‖²)

and λ is a Lagrange multiplier. Thus, taking the indicated derivatives,

(15)    0 = -2[P^T(Y - f₀) - P^T P δ] + 2λδ,

(16)    0 = ‖δ‖² - ‖δ₀‖².

For a given λ, (15) is satisfied by the solution δ of the equation

(17)    (P^T P + λI) δ = P^T(Y - f₀).
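To see that (17) does make the δ-derivatives in (15) vanish, one can substitute a numerical solution back into the gradient of u (the test data below are arbitrary, invented for the check):

```python
import numpy as np

rng = np.random.default_rng(5)
P = rng.normal(size=(7, 3))           # arbitrary test matrix
y0 = rng.normal(size=7)               # stands in for Y - f0
lam = 2.0

delta = np.linalg.solve(P.T @ P + lam * np.eye(3), P.T @ y0)   # eq. (17)

# Gradient of u(delta, lambda) with respect to delta, from eq. (15):
grad_u = -2.0 * (P.T @ y0 - P.T @ P @ delta) + 2.0 * lam * delta
assert np.allclose(grad_u, 0.0, atol=1e-10)
```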
The angle γ between the correction vector δ and the gradient direction g satisfies

cos γ = δ^T g / (‖δ‖ ‖g‖).

Writing A = S D S^T for the rotation that diagonalizes A, with D the diagonal matrix of eigenvalues D_j, and putting v = S^T g, this becomes

(23)    cos γ = v^T(D + λI)⁻¹v / { [v^T(D + λI)⁻²v]^{1/2} (g^T g)^{1/2} }
              = [ Σ_j v_j²/(D_j + λ) ] / { [ Σ_j v_j²/(D_j + λ)² ]^{1/2} [ Σ_j v_j² ]^{1/2} }.
Differentiating and simplifying,

(24)    d(cos γ)/dλ = { [Σ_j v_j²/(D_j + λ)] [Σ_j v_j²/(D_j + λ)³] - [Σ_j v_j²/(D_j + λ)²]² } / { [Σ_j v_j²/(D_j + λ)²]^{3/2} (g^T g)^{1/2} },

which is nonnegative by the Schwarz inequality. Hence cos γ is a monotone increasing function of λ, and γ is a monotone decreasing function of λ.
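This monotonicity is easy to confirm numerically: for a positive definite A built from random test data (not an example from the paper), cos γ computed from the damped step increases with λ:

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(12, 4))
res = rng.normal(size=12)             # stands in for Y - f0
A = P.T @ P                           # positive definite with probability one
g = P.T @ res

def cos_gamma(lam):
    delta = np.linalg.solve(A + lam * np.eye(4), g)
    return (delta @ g) / (np.linalg.norm(delta) * np.linalg.norm(g))

vals = [cos_gamma(lam) for lam in np.linspace(0.0, 50.0, 200)]
# cos(gamma) climbs toward 1 as lambda grows and the step turns toward the gradient.
```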
(28)    g* = (g_j*) = ( g_j / √a_jj ),

and solve for the Taylor series correction using

(29)    A* δ_t* = g*.

Then

(30)    δ_j = δ_j* / √a_jj .
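Numerically, the scaling in (28)-(30) (together with the corresponding scaling of A to correlation form, whose display falls outside this excerpt) leaves the Taylor-series correction unchanged in exact arithmetic; the data below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(8, 3)) * np.array([1.0, 10.0, 100.0])   # badly scaled columns
res = rng.normal(size=8)
A, g = P.T @ P, P.T @ res

d = np.sqrt(np.diag(A))
A_star = A / np.outer(d, d)           # scale A to correlation form: unit diagonal
g_star = g / d                        # eq. (28)

delta_star = np.linalg.solve(A_star, g_star)   # eq. (29): A* delta* = g*
delta = delta_star / d                          # eq. (30)
```

The point of the scaling is that A* has unit diagonal, so a single λ added to the diagonal acts comparably on every parameter regardless of its units.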
Construction of the algorithm. The broad outline of the appropriate algorithm is now clear. Specifically, at the rth iteration the equation

(31)    (A*^(r) + λ^(r) I) δ*^(r) = g*^(r)

will lead to a new sum of squares Φ^(r+1). It is essential to select λ^(r) such that

(33)    Φ^(r+1) < Φ^(r).
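A minimal sketch of the whole iteration, assuming the common increase/decrease rule for λ (a simplification of the paper's full battery of tests), with an invented example model f(x; b) = b1·(1 - exp(-b2·x)):

```python
import numpy as np

def marquardt(f, jac, x, Y, b, lam=1e-2, nu=10.0, iters=50):
    # Sketch of iteration (31): solve the scaled, damped normal equations,
    # keeping a step only when condition (33) holds.
    phi = np.sum((Y - f(x, b)) ** 2)
    for _ in range(iters):
        P = jac(x, b)
        r = Y - f(x, b)
        A, g = P.T @ P, P.T @ r
        d = np.sqrt(np.diag(A))
        A_star = A / np.outer(d, d)
        g_star = g / d
        while True:
            delta = np.linalg.solve(A_star + lam * np.eye(len(b)), g_star) / d
            phi_new = np.sum((Y - f(x, b + delta)) ** 2)
            if phi_new < phi:                       # condition (33)
                b, phi, lam = b + delta, phi_new, lam / nu
                break
            lam *= nu                               # shorter, more gradient-like step
            if lam > 1e12:
                return b, phi                       # cannot improve further
    return b, phi

f = lambda x, b: b[0] * (1.0 - np.exp(-b[1] * x))
jac = lambda x, b: np.column_stack(
    [1.0 - np.exp(-b[1] * x), b[0] * x * np.exp(-b[1] * x)]
)
x = np.linspace(0.1, 4.0, 25)
Y = f(x, np.array([2.0, 1.5]))                      # noise-free synthetic data
b_fit, phi_fit = marquardt(f, jac, x, Y, np.array([0.5, 0.5]))
```

Because λ shrinks after every accepted step and grows after every rejected one, the iteration slides automatically between near-Gauss-Newton and near-gradient behavior.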
are extremely high (>0.99) it can happen that λ will be increased to unreasonably large values. It has been found helpful for these instances to alter test iii. The revised test is:

Let b^(r+1) = b^(r) + K^(r) δ^(r), K^(r) ≤ 1. Noting that the angle γ^(r) is a decreasing function of λ^(r), select a criterion angle γ₀ < π/2 and take

However, if test iii is not passed even though λ^(r) has been increased until γ^(r) < γ₀, then do not increase λ^(r) further, but take K^(r) sufficiently small so that Φ^(r+1) < Φ^(r). This can always be done since γ^(r) < γ₀ < π/2. A suitable choice for the criterion angle is γ₀ = π/4.
It may be noted that positivity of cos γ when λ = 0 is guaranteed only when A is positive definite. In the presence of very high correlations, the positive definiteness of A can break down due to rounding error alone. In such instances the Taylor series method can diverge, no matter what value of K^(r) is used. The present algorithm, by its use of (A + λI), will guarantee positive cos γ even though A may not quite be positive definite. While it is not always possible to foresee high correlations among the parameter estimates when taking data for a nonlinear model, better experimental design can often substantially reduce such extreme correlations when they do occur.
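The effect is easy to exhibit: with two almost perfectly correlated columns (synthetic data, constructed for the demonstration), A is numerically singular, yet A + λI remains safely positive definite and the resulting step still has cos γ > 0:

```python
import numpy as np

rng = np.random.default_rng(3)
c = rng.normal(size=10)
P = np.column_stack([c, c + 1e-8 * rng.normal(size=10)])   # correlation ~ 1
res = rng.normal(size=10)
A, g = P.T @ P, P.T @ res

lam = 1e-6 * np.trace(A)              # a small positive damping
delta = np.linalg.solve(A + lam * np.eye(2), g)
cos_g = (delta @ g) / (np.linalg.norm(delta) * np.linalg.norm(g))

# (A + lam*I) shifts every eigenvalue up by lam, so the smallest eigenvalue
# is at least lam even when A itself is numerically singular.
```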
(35)    Φ = Σ_{i=1}^{n} f_i²(x),

so that

(37)    x_j^(r+1) = x_j^(r) + δ_j^(r),    j = 1, 2, ···, k,

where