Nonlinear Regression

Gordon K. Smyth

Volume 3, pp. 1405–1411

in

Encyclopedia of Environmetrics
(ISBN 0471 899976)
\[
\sum_{i=1}^{n} [y_i - f(x_i, \theta)]^2 \tag{5}
\]

and estimation based on this criterion is known as nonlinear least squares. If the errors ε_i follow a normal distribution, then nonlinear least squares estimation is also maximum likelihood estimation.

\[
f(x, \theta) = \theta_1 + \sum_{j=1}^{k} \theta_{2j} \exp(-\theta_{2j+1} x) \tag{6}
\]

where k is the order of the differential equation. See, for example, [3, Chapter 6] or [9, Chapter 8].
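The least squares criterion (5) can be sketched numerically. The snippet below simulates data from a sum-of-exponentials model of the form (6) with k = 1 and minimizes the sum of squared residuals; the data, starting values, and the use of SciPy's general-purpose `least_squares` routine are illustrative choices, not part of this entry.

```python
import numpy as np
from scipy.optimize import least_squares

# Sum-of-exponentials model (6) with k = 1:
# f(x, theta) = theta1 + theta2 * exp(-theta3 * x)
def f(x, theta):
    return theta[0] + theta[1] * np.exp(-theta[2] * x)

# Residuals y_i - f(x_i, theta); least_squares minimizes their sum of
# squares, which is exactly the criterion (5).
def residuals(theta, x, y):
    return y - f(x, theta)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
true_theta = np.array([1.0, 3.0, 0.7])        # simulated "truth"
y = f(x, true_theta) + 0.05 * rng.standard_normal(x.size)

fit = least_squares(residuals, x0=[0.5, 2.0, 1.0], args=(x, y))
print(fit.x)                   # estimates close to (1.0, 3.0, 0.7)
print(np.sum(fit.fun ** 2))    # minimized sum of squares (5)
```

With a reasonable starting value the estimates land close to the simulated parameters; how such starting values can be found, and how the minimization itself proceeds, are the subjects of the following sections.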
Another common form of model is the rational function:

\[
f(x, \theta) = \frac{\sum_{j=1}^{k} \theta_j x^{j-1}}{1 + \sum_{j=1}^{m} \theta_{k+j} x^{j}} \tag{7}
\]

Rational functions are very flexible in form and can be used to approximate a wide variety of functional shapes.

In many applications the systematic part of the response is known to be monotonic increasing in x, where x might represent time or dosage. Nonlinear regression models with this property are called growth models (see Age–growth modeling). The simplest growth model is the exponential growth model (4), but pure exponential growth is usually short-lived. A more generally useful growth curve is the logistic curve

\[
f(x, \theta) = \frac{\theta_1}{1 + \theta_2 \exp(-\theta_3 x)} \tag{8}
\]

This produces a symmetric growth curve which asymptotes to θ1 as x → ∞ and to zero as x → −∞. Of the two other parameters, θ2 determines horizontal position or take-off point, and θ3 controls steepness. The Gompertz curve produces an asymmetric growth curve

\[
f(x, \theta) = \theta_1 \exp[-\theta_2 \exp(-\theta_3 x)] \tag{9}
\]

As with the logistic curve, θ1 sets the asymptotic upper limit, θ2 determines horizontal position, and θ3 controls steepness. Despite these interpretations, it can often be difficult in practice to isolate the interpretations of individual parameters in a nonlinear regression model because of high correlations between the parameter estimators. More examples of growth models are given in [9, Chapter 7].

Transformably Linear Models

Some simple nonlinear models can be converted into linear models by transforming one or both of the responses and the covariates. For example, the exponential decay model

\[
Y = \theta_1 \exp(-\theta_2 x) \tag{10}
\]

can, if Y > 0, be transformed into

\[
\ln Y = \ln \theta_1 - \theta_2 x \tag{11}
\]

If all the observed responses are positive and variation in ln Y is more symmetric than variation in Y, then it is likely to be more appropriate to estimate the parameters θ1 and θ2 by linear regression of ln Y on x rather than by nonlinear least squares. The same considerations apply to the simple rational function model

\[
Y = \frac{1}{1 + \theta x} \tag{12}
\]

which can, if Y ≠ 0, be linearized as

\[
\frac{1}{Y} - 1 = \theta x \tag{13}
\]

One can estimate θ by proportional linear regression (without an intercept) of 1/Y − 1 on x. Transformably linear models have some advantages in that the linearized form can be used to obtain starting values for iterative computation of the parameter estimates. Transformation to linearity is only a possibility for the simplest of nonlinear models, however.

Iterative Techniques

If the function f is continuously differentiable in θ, then it can be linearized locally as

\[
f(\theta, x) = f(\theta_0, x) + X_0 (\theta - \theta_0) \tag{14}
\]

where X_0 is the n × p gradient matrix with elements ∂f(x_i, θ_0)/∂θ_j. This leads to the Gauss–Newton algorithm for estimating θ,

\[
\theta_1 = \theta_0 + (X_0^T X_0)^{-1} X_0^T e \tag{15}
\]

where e is the vector of working residuals y_i − f(x_i, θ_0). The Gauss–Newton algorithm increments the working estimate θ at each iteration by an amount equal to the coefficients from the linear regression of the current residuals e on the current gradient matrix X. If the errors ε_i are independent and normally distributed, then the Gauss–Newton algorithm is an application of Fisher's method of scoring for obtaining maximum likelihood estimators.

If X is of full column rank in a neighborhood of the least squares solution, then it can be shown that the Gauss–Newton algorithm will converge to the solution from a sufficiently good starting value [6]. There is no guarantee, though, that the algorithm will converge from values further from the solution. In practice, it is usually necessary to
modify the Gauss–Newton algorithm in order to secure convergence.

Example 2: PCB in Lake Trout  Data on the concentration of polychlorinated biphenyl (PCB) residues in a series of lake trout from Cayuga Lake, New York, were reported in Bache et al. [1]. The ages of the fish were accurately known because the fish were annually stocked as yearlings and distinctly marked as to year class. Each whole fish was mechanically chopped and ground and a 5-g sample taken. The samples were treated and PCB residues in parts per million (ppm) were estimated by means of column chromatography. A combination of empirical and theoretical considerations leads us to consider the model

\[
\ln[\mathrm{PCB}] = \theta_1 + \theta_2 x^{\theta_3} + \varepsilon \tag{16}
\]

Table 1  Gauss–Newton iteration, starting from θ3 = 0.5, for PCB in lake trout data

Iteration    θ1        θ2        θ3
 1           0.0315    0.2591    0.5000
 2           2.7855    2.6155    1.0671
 3           3.7658    3.8699    2.8970
 4           2.2455    2.3663    1.9177
 5           0.1145    0.1698    1.9426
 6           0.0999    0.1619    1.6102
 7           0.3286    0.3045    0.9876
 8           1.1189    0.9774    0.1312
 9           2.9861    2.8261    0.6888
10           1.9183    1.7527    0.5595
11           2.4668    2.2974    0.3316
12           3.9985    3.8307    0.1827
13           4.7879    4.6254    0.2033
14           4.8760    4.7126    0.1964
15           4.8654    4.7023    0.1968
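The Gauss–Newton iteration (15) for a model of the form (16) can be sketched as follows. The original PCB measurements are not reproduced in this entry, so the sketch runs on simulated data with illustrative parameter values; the analytic derivatives ∂f/∂θ = (1, x^θ3, θ2 x^θ3 ln x) follow by elementary calculus, and a crude step-halving safeguard of the kind described next is included so that the iteration cannot increase the sum of squares.

```python
import numpy as np

# Model (16): f(x, theta) = theta1 + theta2 * x**theta3
def f(x, th):
    return th[0] + th[1] * x ** th[2]

def gradient(x, th):
    # n x p gradient matrix X with columns df/dtheta_j
    return np.column_stack([np.ones_like(x),
                            x ** th[2],
                            th[1] * x ** th[2] * np.log(x)])

def gauss_newton(x, y, th0, n_iter=100):
    th = np.asarray(th0, dtype=float)
    rss = np.sum((y - f(x, th)) ** 2)
    for _ in range(n_iter):
        X = gradient(x, th)
        e = y - f(x, th)                         # working residuals
        # Increment = coefficients from regressing e on X, as in (15)
        step, *_ = np.linalg.lstsq(X, e, rcond=None)
        alpha, improved = 1.0, False
        while alpha > 1e-10:                     # step halving: shrink until RSS drops
            cand = th + alpha * step
            new_rss = np.sum((y - f(x, cand)) ** 2)
            if new_rss < rss:
                th, rss, improved = cand, new_rss, True
                break
            alpha /= 2.0
        if not improved:                         # no further progress possible
            break
    return th

rng = np.random.default_rng(1)
x = np.repeat(np.arange(1.0, 13.0), 2)           # simulated ages, not the real data
y = f(x, [0.5, 2.0, 0.5]) + 0.1 * rng.standard_normal(x.size)

th_hat = gauss_newton(x, y, th0=[0.0, 1.0, 0.5])
print(th_hat)
```

Running the plain iteration without the step-halving guard reproduces the kind of oscillating behavior visible in the early rows of Table 1 before the iterates settle down.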
The basic idea behind modified Gauss–Newton algorithms is that care is taken at each iteration to ensure that the update does not overshoot the desired solution. A smaller step is taken if the proposed update would lead to an increase in the residual sum of squares. There are two ways in which a smaller step can be taken: line-search methods and Levenberg–Marquardt damping.

The key to line-search algorithms is the fact that X^T X is a positive definite matrix. This guarantees that (X_0^T X_0)^{-1} X_0^T e is a descent direction; that is, the sum of squared residuals will be reduced by the step from θ_0 to θ_0 + α(X_0^T X_0)^{-1} X_0^T e, α > 0, where α is sufficiently small. The line-search method consists of using one-dimensional optimization techniques to minimize the sum of squares with respect to α at each iteration. This method reduces the sum of squares at every iteration and is therefore guaranteed to converge unless rounding error intervenes.

Even easier to implement than line searches and similarly effective is Levenberg–Marquardt damping. Given any positive definite matrix D, the sum of squares will be reduced by the step from θ_0 to θ_0 + (X_0^T X_0 + λD)^{-1} X_0^T e, if λ is sufficiently large. The matrix D is usually chosen to be either the diagonal part of X_0^T X_0 or the identity matrix. In practice, λ is increased as necessary to ensure a reduction in the sum of squares at each iteration, and is otherwise decreased as the algorithm converges to the solution.

Although the Gauss–Newton algorithm and its modifications are the most popular algorithms for nonlinear least squares, it is sometimes convenient to use derivative-free methods to minimize the sum of squares and in so doing to avoid computing the gradient matrix X. Possible algorithms include the Nelder–Mead simplex algorithm and the pseudo-Newton–Raphson method with numerical derivatives (see Optimization).

Inference

Suppose that the ε_i are uncorrelated with mean zero and variance σ². Then the least squares estimators θ̂ are asymptotically normal with mean θ and covariance matrix σ²(X^T X)^{-1} [6]. The variance σ² is usually estimated by

\[
s^2 = \frac{1}{n-p} \sum_{i=1}^{n} [y_i - f(x_i, \hat{\theta})]^2 \tag{17}
\]

Tests based on changes in the residual sum of squares are generally more reliable than tests or confidence intervals based on standard errors. If the ε_i follow a normal distribution then tests based on F-statistics are equivalent to tests based on likelihood ratios (see Likelihood ratio tests).

Example 3: PCB in Lake Trout  The least squares estimates and associated standard errors for the model represented by (16) are given in Table 2. Notice the large standard errors for the parameters, which are caused largely by high correlations between the parameters. The correlations obtained from the off-diagonal elements of (X^T X)^{-1} are in fact 0.997 between θ̂1 and θ̂3, 0.998 between θ̂2 and θ̂3, and 0.9998 between θ̂1 and θ̂2. A 95% confidence interval for θ3 based on its standard error would suggest that θ3 is not significantly different from zero. However, if θ3 is set to zero then f(x, θ) no longer depends on age, and the sum of squared residuals increases from 6.33 to 31.12. For these data, s² = 6.33/25 = 0.253, and the F-test statistic for testing θ3 = 0 is F = (31.12 − 6.33)/(2s²) = 48.95, for 2 and 25 degrees of freedom. The null hypothesis θ3 = 0 is soundly rejected.
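The F statistic in Example 3 can be reproduced from the quoted sums of squares alone; the snippet below simply re-does the arithmetic, using the residual degrees of freedom (25) and the two parameters lost under the null hypothesis, both as stated above.

```python
# Arithmetic from Example 3: F-test of H0: theta3 = 0, under which the
# model no longer depends on age and two parameters drop out.
rss_full = 6.33       # residual sum of squares, full model (16)
rss_reduced = 31.12   # residual sum of squares with theta3 set to zero
df_resid = 25         # n - p for the full model
df_diff = 2           # parameters lost under the null hypothesis

s2 = rss_full / df_resid
F = (rss_reduced - rss_full) / (df_diff * s2)
print(round(s2, 3), round(F, 2))   # 0.253 and 48.95, as in the text
```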
into the estimation process. Regression models with conditionally linear parameters are called separable because the linear parameters can be separated out of the least squares problem in this way.

Any separable model can be written in the form

\[
f(x, \theta) = X(\alpha)\beta \tag{21}
\]

where α is a vector of nonlinear parameters, X(α) is a matrix of full column rank depending on α, and β is a vector of conditionally linear parameters. The conditional least squares estimator of β is

\[
\hat{\beta}(\alpha) = [X(\alpha)^T X(\alpha)]^{-1} X(\alpha)^T Y \tag{22}
\]

The nonlinear parameters can be estimated by minimizing the reduced sum of squares

\[
[Y - X(\alpha)\hat{\beta}]^T [Y - X(\alpha)\hat{\beta}] = Y^T P Y \tag{23}
\]

where P = I − X(α)[X(α)^T X(α)]^{-1} X(α)^T is the orthogonal projection onto the space orthogonal to the column space of X(α). Several algorithms have been suggested to minimize the reduced sum of squares [9, Section 14.7], the simplest of which is probably the nested Gauss–Newton algorithm [10]:

\[
\alpha_{k+1} = \alpha_k + (X^T P X)^{-1} X^T P y \tag{24}
\]

This iteration is equivalent to performing an iteration of the full Gauss–Newton algorithm and then resetting the conditionally linear parameters to their conditional least squares values. Separable algorithms can be modified by using line searches or Levenberg–Marquardt damping in the same way as the full Gauss–Newton algorithm.

Example 4: PCB in Lake Trout  Table 3 gives the results of the nested Gauss–Newton iteration from the same starting values as previously used for the full Gauss–Newton algorithm. The nested Gauss–Newton algorithm converges far more rapidly. It also converges from a much wider range of starting values, for example from θ3 = 1.

Table 3  Conditionally linear Gauss–Newton iteration, starting from θ3 = 0.5, for PCB in lake trout data

Iteration    θ1        θ2        θ3
1            1.1948    1.1986    0.5000
2            6.1993    6.0181    0.1612
3            4.7777    4.6161    0.1997
4            4.8740    4.7107    0.1966
5            4.8657    4.7026    0.1968

Measures of Nonlinearity

Since most asymptotic inference for nonlinear regression models is based on analogy with linear models, and since this inference is only approximate insofar as the actual model differs from a linear model, various measures of nonlinearity have been proposed as a guide for understanding how good linear approximations are likely to be.

One class of measures focuses on curvature of the function f and is based on the sizes of the second derivatives ∂²f/∂θ_j∂θ_k. If the second derivatives are small in absolute value, then the model is approximately linear. Let u = (u_1, ..., u_n)^T be a given vector representing a direction of interest and write

\[
A_{jk} = \sum_{i=1}^{n} u_i \frac{\partial^2 f(x_i, \theta)}{\partial\theta_j \, \partial\theta_k} \tag{25}
\]

The matrix A = {A_jk} represents the size of the second derivatives in the direction u. To obtain curvature measures it is necessary to standardize the derivatives by dividing by the size of the gradient matrix. Let X = QR be the QR decomposition of the gradient matrix X = {∂f(x_i, θ)/∂θ_j}, and write

\[
C = (R^{-1})^T A R^{-1} \tag{26}
\]

Bates and Watts [2] use C to define relative curvature measures of nonlinearity. If u belongs to the range space of X (u = Xb for some b), then C defines parameter effect curvatures. Bates and Watts [3] consider an orthonormal basis of vectors u for the range space of X and interpret the resulting curvatures in geometric terms. If X^T u = 0 then C defines intrinsic curvatures. It can be shown that intrinsic curvatures depend only on the shape of the expectation surface f(x, θ) as θ varies and are invariant under reparameterizations of the nonlinear model; see [3, Chapter 7] or [9, Chapter 4].

Intrinsic curvature in the direction defined by the residuals, with u_i = y_i − f(x_i, θ̂), is particularly important. In this case, Bates and Watts [3] call C the effective residual curvature matrix. The eigenvalues of this matrix can be used to obtain improved confidence regions for the parameters [4]. The largest eigenvalue in absolute size determines the limiting rate of convergence of the Gauss–Newton algorithm [10].
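The conditional least squares idea in (22)–(23) can be sketched for a simple separable model. The exponential-plus-constant model below and the grid search over the single nonlinear parameter are illustrative choices, not part of this entry; the point is only that each trial value of α reduces the problem to an ordinary linear regression.

```python
import numpy as np

# Separable model f(x, theta) = X(alpha) beta, as in (21), with
# X(alpha) = [1, exp(-alpha x)]: beta enters linearly, alpha does not.
def design(alpha, x):
    return np.column_stack([np.ones_like(x), np.exp(-alpha * x)])

def reduced_ss(alpha, x, y):
    Xa = design(alpha, x)
    # Conditional least squares estimator (22): beta_hat = (Xa'Xa)^{-1} Xa'y
    beta_hat, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    resid = y - Xa @ beta_hat          # equals P y in (23)
    return np.sum(resid ** 2), beta_hat

rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 40)
y = 1.0 + 2.0 * np.exp(-0.8 * x) + 0.05 * rng.standard_normal(x.size)

# Profile the reduced sum of squares (23) over a grid of alpha values;
# a nested Gauss-Newton iteration would replace this grid search in practice.
alphas = np.linspace(0.1, 2.0, 200)
ss = [reduced_ss(a, x, y)[0] for a in alphas]
alpha_hat = alphas[int(np.argmin(ss))]
_, beta_hat = reduced_ss(alpha_hat, x, y)
print(alpha_hat, beta_hat)             # estimates near 0.8 and (1.0, 2.0)
```

Minimizing the one-dimensional reduced sum of squares, rather than the full three-parameter criterion, is exactly the separation exploited by the nested Gauss–Newton algorithm (24).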
Another method of assessing nonlinearity is to vary one component of θ and to observe how the profile sum of squares and conditional estimators vary. Let θ̂_j(θ_{j0}) be the least squares estimator of θ subject to the constraint θ_j = θ_{j0}. Bates and Watts [3] study nonlinearity by observing the rate of change in θ̂_j(θ_{j0}) and in the sum of squares as θ_{j0} varies.

Robust and Generalized Nonlinear Regression

This entry has concentrated on least squares estimation, but it may be of interest to consider other estimation criteria in order to accommodate outliers or non-normal responses. Stromberg and Ruppert [11, 12] have considered high-breakdown nonlinear regression. Wei [13] gives an extensive treatment of generalized nonlinear regression models with exponential family responses. In particular, Wei [13] extends curvature measures of nonlinearity to this more general context and uses them for second-order asymptotics.

References

[1] Bache, C.A., Serum, J.W., Youngs, W.D. & Lisk, D.J. (1972). Polychlorinated biphenyl residues: accumulation in Cayuga Lake trout with age, Science 177, 1192–1193.
[2] Bates, D.M. & Watts, D.G. (1980). Relative curvature measures of nonlinearity (with discussion), Journal of the Royal Statistical Society, Series B 42, 1–25.
[3] Bates, D.M. & Watts, D.G. (1988). Nonlinear Regression Analysis and Its Applications, Wiley, New York.
[4] Hamilton, D.C., Watts, D.G. & Bates, D.M. (1982). Accounting for intrinsic nonlinearity in nonlinear regression parameter inference regions, The Annals of Statistics 10, 386–393.
[5] Grimvall, A. & Stalnacke, P. (1996). Statistical methods for source apportionment of riverine loads of pollutants, Environmetrics 7, 201–213.
[6] Jennrich, R.I. (1969). Asymptotic properties of non-linear least squares estimators, The Annals of Mathematical Statistics 40, 633–643.
[7] Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. (1992). Numerical Recipes in Fortran, Cambridge University Press, Cambridge (also available for C, Basic and Pascal).
[8] Ratkowsky, D.A. (1983). Nonlinear Regression Modelling: A Unified Practical Approach, Marcel Dekker, New York.
[9] Seber, G.A.F. & Wild, C.J. (1989). Nonlinear Regression, Wiley, New York.
[10] Smyth, G.K. (1996). Partitioned algorithms for maximum likelihood and other nonlinear estimation, Statistics and Computing 6, 201–216.
[11] Stromberg, A.J. (1993). Computation of high breakdown nonlinear regression parameters, Journal of the American Statistical Association 88, 237–244.
[12] Stromberg, A.J. & Ruppert, D. (1992). Breakdown in nonlinear regression, Journal of the American Statistical Association 87, 991–997.
[13] Wei, B.-C. (1998). Exponential Family Nonlinear Models, Springer-Verlag, Singapore.

(See also Generalized linear mixed models; Nonparametric regression model)

GORDON K. SMYTH