

This is a review article for the chapter "Statistics: Nonlinear Regression", to appear in International Encyclopedia of Education, 3rd ed. Elsevier.

Nonlinear Regression Analysis

Hsin-Hsiung Huang, Academia Sinica, Taiwan.
Chuhsing Kate Hsiao, National Taiwan University, Taiwan.
Su-Yun Huang, Academia Sinica, Taiwan.

GLOSSARY

Curvature. The curvature measures how a geometric object deviates from being flat. There are various definitions of curvature depending on the context. In nonlinear regression analysis, the relative curvature array developed by Bates and Watts is a widely used approach for assessing curvature and nonlinearity.

Gauss-Newton method. The Gauss-Newton algorithm, a modification of Newton's method, is used to solve nonlinear least squares problems. It is based on iterative local linear approximation to the regression function and does not require evaluation of second derivatives.

Intrinsic nonlinearity. Intrinsic nonlinearity refers to a quantity which makes the prediction biased in nonlinear models. The degree of intrinsic nonlinearity is determined by the model and data and is invariant to re-parametrization.

Nonlinear regression analysis. Nonlinear regression models are models that are nonlinear in the parameters. Nonlinear regression analysis mainly concerns the prediction of responses, statistical inference on parameter estimates, and the goodness of fit of the nonlinear model.

Parameter-effects nonlinearity. Parameter-effects nonlinearity refers to a quantity which affects the rate of convergence of the estimates from the Gauss-Newton method. It varies under different re-parametrizations.

Solution locus. The collection of points of the regression function in the sample space evaluated at all feasible parameter values constitutes the solution locus. It is also known as the expectation surface.

Abstract

Nonlinear regression analysis is a very popular technique in the mathematical and social sciences as well as in engineering. In this article, we offer an introduction to the theories and methods of nonlinear regression. Least squares with the Gauss-Newton method is the most widely used approach to parameter estimation. Under the normality assumption on the errors, the least squares estimates equal the maximum likelihood estimates. The predicted values of the responses can be biased because of the "intrinsic nonlinearity" of the model. Even if the degree of intrinsic nonlinearity is slight, the least squares estimation of the parameters may still be slow to converge because of the "parameter-effects nonlinearity". The intrinsic nonlinearity is invariant to re-parametrization, while the parameter-effects nonlinearity can be corrected by a suitable re-parametrization. We also discuss techniques from geometric viewpoints for estimation and inference as well as for assessing their statistical properties.

Keywords: Curvature, Gauss-Newton method, Intrinsic curvatures, Intrinsic nonlinearity, Maximum likelihood estimation, Nonlinear least squares, Nonlinear regression analysis, Parameter-effects curvatures, Parameter-effects nonlinearity, Relative curvature array, Uniform coordinates.

1 Introduction

Regression analysis refers to statistical inference for a model

    y = f(x; θ) + ε,    (1)

where y ∈ R is the response variable, x = (x1, ..., xk) ∈ R^k are explanatory variables and θ = (θ1, ..., θp) ∈ R^p are parameters. We call f the regression function, whose functional form is known up to some unknown parameters θ, and ε is an error term with zero mean and variance σ². When the regression function f is linear in the parameters θ, it leads to the very popular and widely used statistical inferential techniques known as linear regression analysis. However, linear models are not always appropriate, so one often needs to apply a nonlinear regression model, where f is nonlinear in θ. The statistical inferential problem concerning (1) is called nonlinear regression analysis when f is nonlinear in θ.

In the following sections, we present a set of topics in nonlinear regression. Among the various estimation methods for nonlinear regression, the least squares method is the most popular. The Gauss-Newton procedure is used to estimate the unknown parameters without calculating the Hessian matrix. Under the normality assumption on the error terms, the maximum likelihood estimator is equivalent to the least squares estimator.

In fields such as biochemistry, ecology, economics, and the social sciences, nonlinear regression models have been applied for a long time. Yet if the model needs to be scrutinized carefully, the quality of inferences from the parameter estimates relies on the magnitude of model nonlinearity and parameter effects. We present a graphical example to illustrate this phenomenon.

Vectors and matrices are denoted respectively by boldface lower and upper case letters, such as x ∈ R^k and X ∈ R^{n×k}, and scalars are denoted by roman letters.
2 Estimation methods

For a given data set {(xi, yi) : i = 1, ..., n} of size n, an empirical model based on the data can be written as

    y = f(X; θ) + ε,    (2)

where X is the n × k data design matrix with rows x1', ..., xn' and columns x(1), ..., x(k), y = (y1, ..., yn)', and ε = (ε1, ..., εn)'. In the data design matrix X, the ith row xi' = (xi1, xi2, ..., xik) represents the ith observation, and the jth column x(j) = (x1j, x2j, ..., xnj)' represents the jth variable. We will use f(θ) to denote f(X; θ) = (f(x1; θ), ..., f(xn; θ))' for short. Let Θ ⊂ R^p be the parameter space. The collection of points of the regression function in the sample space evaluated at all feasible parameter values, denoted by

    f(Θ) = {f(θ) = f(X; θ) ∈ R^n : θ ∈ Θ ⊂ R^p},

is called the solution locus or expectation surface. A linear regression model has as its solution locus a p-dimensional hyperplane lying in the n-dimensional sample space, while a nonlinear regression model has as its solution locus a curved hypersurface in the sample space.

2.1 Least squares estimation

The least squares estimator of θ, denoted by θ̂, is the point in the parameter space such that f(θ̂) is closest to y in the sample space among all feasible f(θ) in the solution locus. The least squares estimator is derived from minimization of the residual sum of squares

    S(θ) = Σ_{i=1}^n {yi − f(xi; θ)}²,   θ ∈ Θ ⊂ R^p.    (3)

When f is differentiable with respect to θ, we solve for the least squares solution θ̂ in the following system of equations

    ∂S(θ)/∂θ_ℓ |_{θ=θ̂} = 0,   ℓ = 1, ..., p.

This system of equations (called the normal equations) is given by

    Σ_{i=1}^n {yi − f(xi; θ)} ∂f(xi; θ)/∂θ_ℓ |_{θ=θ̂} = 0,    (4)

for ℓ = 1, ..., p, or, in matrix form,

    V(θ̂)' ε̂ = 0,    (5)

where V_{i,ℓ}(θ) = ∂f(xi; θ)/∂θ_ℓ and ε̂ = y − f(θ̂). The matrix V(θ), of size n × p, is a velocity matrix with the ℓth column giving the instantaneous speed at θ when moving on the solution locus along the ℓth coordinate θ_ℓ. In a linear regression model, this velocity matrix is simply the data design matrix X and does not depend on the parameter values θ.

Often the normal equations do not have an analytic solution for θ̂ and numerical iterative procedures are needed. Below we introduce the Gauss-Newton method for solving nonlinear least squares problems based on iterative local linear approximations to the solution locus. Consider a small neighborhood of θ*. The linear Taylor expansion is given by

    f(x; θ) ≈ f(x; θ*) + ∇θ f(x; θ*)'(θ − θ*),    (6)

where

    ∇θ f(x; θ*)' = (∂f(x; θ*)/∂θ1, ..., ∂f(x; θ*)/∂θp).

Therefore, the linear approximation to f(x; θ) in the neighborhood of θ* leads to an approximate residual sum of squares:

    S(θ) ≈ Σ_{i=1}^n {yi − f(xi; θ*) − Σ_{ℓ=1}^p V_{i,ℓ}(θ*)(θ_ℓ − θ_ℓ*)}².

Using this linear Taylor expansion in the iterative update, the corresponding normal equations at the tth iteration are given by

    V(θt)' V(θt)(θ − θt) = V(θt)'(y − f(θt)).    (7)

The increment

    δ = (V(θt)' V(θt))^{-1} V(θt)'(y − f(θt))

serves as an update direction for the next iteration. The Gauss-Newton method, an iterative approach, is then given by

    θ_{t+1} = θt + λt δt,

where λt is the step size. The update direction δ is derived from the tangent plane approximation to the solution locus. This approximation is only valid in a local neighborhood of the current parameter estimate θt. The size of this neighborhood, where the linear approximation is valid, depends on the curvedness of the nonlinear model and its parametrization. It is possible that taking a full step, i.e., λ = 1, will produce an increase in the residual sum of squares, especially when the Gauss-Newton update increment δ extends beyond the region where the linear approximation is valid. Thus, a smaller step size is required to ensure a decrease in the residual sum of squares. In practice, λt can be halved repeatedly, i.e., the step sizes 0.5, 0.25, 0.125, etc. are tried consecutively until a decrease in the residual sum of squares is obtained (Nocedal and Wright, 2006). Then the procedure moves on to the next iteration. In addition, it is necessary to set up a criterion to stop the Gauss-Newton procedure. Equation (5) implies that the residual vector is orthogonal to the tangent plane of the solution locus, and this can be used as a stopping criterion for convergence. This concept leads to the relative offset orthogonality convergence criterion of Bates and Watts (1981).

The estimate of the asymptotic covariance of the least squares estimate is given by

    Cov(θ̂) = σ² (V(θ̂)' V(θ̂))^{-1}.    (8)

In a linear model, V(θ̂) = X, and it does not depend on the parameter estimates. Furthermore, the covariance expression (8) for a linear model is not merely asymptotic, i.e., it is valid regardless of the sample size.
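To make the Gauss-Newton iteration above concrete, here is a minimal Python sketch of the update with step halving, using the Michaelis-Menten function (which appears later as equation (9)) for illustration. The function names and the simple norm-based stopping rule are assumptions of this sketch; in particular, the stopping rule is only a crude stand-in for the relative offset criterion of Bates and Watts (1981).

```python
import numpy as np

def gauss_newton(f, V, y, x, theta0, max_iter=50, tol=1e-8):
    # f(x, theta): returns the n-vector of model values f(x_i; theta)
    # V(x, theta): returns the n x p velocity (first-derivative) matrix
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        resid = y - f(x, theta)                  # current residual vector
        Vt = V(x, theta)                         # velocity matrix at theta_t
        # Increment delta solves the normal equations V'V delta = V' resid
        delta, *_ = np.linalg.lstsq(Vt, resid, rcond=None)
        # Step halving: shrink the step until S(theta) decreases
        lam, S_old = 1.0, resid @ resid
        while lam > 1e-10:
            r_new = y - f(x, theta + lam * delta)
            if r_new @ r_new < S_old:
                break
            lam /= 2.0
        theta = theta + lam * delta
        # Simplified stopping rule: the accepted increment is negligible
        if np.linalg.norm(lam * delta) <= tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta

# Illustration with the Michaelis-Menten function f(x; theta) = theta1*x/(theta2 + x)
mm   = lambda x, th: th[0] * x / (th[1] + x)
mm_V = lambda x, th: np.column_stack([x / (th[1] + x),
                                      -th[0] * x / (th[1] + x) ** 2])
```

A call such as gauss_newton(mm, mm_V, y, x, theta0=np.array([80.0, 0.1])) would then run the iteration on observed data (x, y).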
2.2 Maximum likelihood estimation

Consider a normal distribution for the error term:

    y − f(x; θ) = ε ∼ N(0, σ²).

Then the log-likelihood function is given by

    ℓ(θ, σ²) = −(n/2) log(2π) − (n/2) log(σ²) − S(θ)/(2σ²).

When σ² is known, maximizing the log-likelihood with respect to θ is the same as minimizing the error sum of squares S(θ). Thus, the maximum likelihood estimator θ̂_mle is the same as the least squares estimator θ̂. On the other hand, when σ² is not known, ∂ℓ/∂σ² = 0 has the solution

    σ̂²(θ) = S(θ)/n.

Substituting σ̂²(θ) into the log-likelihood expression results in

    ℓ(θ, σ̂²(θ)) = κ − (n/2) log(S(θ)),

where κ is some constant. Maximizing ℓ(θ, σ̂²(θ)) with respect to θ, we have

    θ̂_mle = θ̂  and  σ̂²_mle = S(θ̂)/n.

In summary, when the noise follows a normal distribution, the maximum likelihood estimates equal the least squares estimates.

2.3 Good initial values

For a linear model the Gauss-Newton method will find the minimum in a single iteration from any initial parameter estimates. For a model which is close to being linear, convergence of the Gauss-Newton method will be rapid and will not depend heavily on the initial parameter estimates. However, as the magnitude of model nonlinearity becomes more prominent, convergence will be slow or may not occur at all, and the resulting parameter estimates may not be reliable. In that case, good initial values are important.

One approach to finding initial values is via transformation, so that linear regression analysis under the assumption of additive error terms can be utilized (Bates and Watts, 1988; Ryan, 1997). For instance, the reciprocal of the Michaelis-Menten regression function,

    f(x; θ) = θ1 x / (θ2 + x),    (9)

leads to the model

    y^{-1} = 1/θ1 + (θ2/θ1)(1/x) + ε.

Indeed, this is a linear regression model, written as

    ỹ = β0 + β1 x̃ + ε,

with ỹ = y^{-1}, β0 = θ1^{-1}, β1 = θ2 θ1^{-1}, and x̃ = x^{-1}. The least squares estimates of β0 and β1 can therefore be transformed to provide initial values for the parameters in the Michaelis-Menten model. Alternatively, we can adopt a grid search, which evaluates the residual sum of squares at a set of parameter grid points and then takes the minimizing point among these grid points as the initial value. On many occasions, the grid search helps to avoid improper initial values that lead to a local but not global minimum of the residual sum of squares. Another alternative is to use uniform design points in place of the lattice grid points. Uniform design points are designed to be as uniform and as space-filling as possible and provide a more efficient search scheme (Fang et al., 2000).
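As an illustration of the two strategies just described, the following sketch obtains starting values for the Michaelis-Menten model (9) by the reciprocal transformation and by a crude grid search. The helper names and grid choices are illustrative assumptions rather than part of the original article.

```python
import numpy as np

def mm_initials_by_linearization(x, y):
    """Initial values for the Michaelis-Menten model via the reciprocal
    transformation y^{-1} = 1/theta1 + (theta2/theta1) * x^{-1} + error."""
    x_t, y_t = 1.0 / x, 1.0 / y
    beta1, beta0 = np.polyfit(x_t, y_t, 1)       # slope, intercept of the line fit
    theta1 = 1.0 / beta0
    theta2 = beta1 * theta1
    return np.array([theta1, theta2])

def mm_initials_by_grid(x, y, grid1, grid2):
    """Grid search: pick the grid point minimizing the residual sum of squares."""
    best, best_S = None, np.inf
    for t1 in grid1:
        for t2 in grid2:
            S = np.sum((y - t1 * x / (t2 + x)) ** 2)
            if S < best_S:
                best, best_S = (t1, t2), S
    return np.array(best)
```

Either set of starting values can then be passed to the Gauss-Newton iteration sketched earlier.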
3 Assessing nonlinearity

Given that the error terms are i.i.d. normally distributed random variables with zero mean, the least squares estimator in linear models is unbiased, normally distributed, and has minimum variance among all linear unbiased estimators (BLUE). However, the least squares estimator for nonlinear models does not have the same properties. The nonlinear least squares estimator approaches BLUE only asymptotically. In addition, as the sample size increases, the Gauss-Newton iterative estimator becomes asymptotically numerically stable (Jennrich, 1969). However, if the model is highly curved and the first derivative V(θ) changes violently over the iterations, the least squares estimator can be numerically unstable.

The parametrization is another important issue. Although the shape of the solution locus is fixed, the performance of the Gauss-Newton iterative estimator varies with respect to different parametrizations. This depends on the nonlinearity in the model. There are two kinds of nonlinearity: "intrinsic nonlinearity" and "parameter-effects nonlinearity" (Bates and Watts, 1980; Hamilton et al., 1982). The intrinsic nonlinearity is associated with the modeling and is invariant under re-parametrization. The parameter-effects nonlinearity, however, can be lessened through a proper re-parametrization. If either component of the nonlinearity is large, the least squares estimate is slow to converge, or may even fail to converge. Furthermore, the asymptotic covariance of the least squares estimate, given by σ²(V(θ̂)' V(θ̂))^{-1}, would change greatly in each step of the iteration, and statistical inference based on the asymptotic normality becomes unreliable. In other words, the least squares estimator in nonlinear regression depends on the "curvedness" of the underlying model and on the parametrization adopted.

3.1 Intrinsic nonlinearity and parameter-effects nonlinearity

Of the various measures of curvedness, the relative curvature array is widely used in nonlinear regression analysis, and it is presented below. The nonlinear least squares fit is based on the iterative update of local linear approximations to the solution locus

    f(xi; θ) ≈ f(xi; θ*) + ∇θ f(xi; θ*)'(θ − θ*).    (10)

Consider a further extension to include the quadratic term in the Taylor expansion

    f(xi; θ) ≈ f(xi; θ*) + ∇θ f(xi; θ*)'(θ − θ*) + (1/2)(θ − θ*)' ∇²θ f(xi; θ*)(θ − θ*),

where ∇²θ f(xi; θ*) = ∂²f(xi; θ)/∂θ∂θ' is a p × p matrix of second derivatives for each i = 1, ..., n. The magnitude of the quadratic term relative to the linear term determines the difference between the tangent plane (10) and the corresponding solution locus. Stack ∇²θ f(xi; θ*), i = 1, ..., n, into a 3-dimensional array A(θ*), which is of size n × p × p. This array A(θ*) is called the acceleration array at θ*. The acceleration array A can be decomposed into two components, one parallel and the other normal to the tangent plane. The intrinsic nonlinearity is based on the normal component, and it is invariant under any re-parametrization. The factorization of the acceleration array into the tangent and normal components, along with other technical details, is given in the Appendix.

In the linear regression model, the solution locus is a hyperplane. Uniform and equi-spaced grid lines on the p-dimensional parameter space appear also as uniform and equi-spaced grid lines on this hyperplane. In the nonlinear model, the tangent plane to the solution locus serves as a local approximation to the locus. When the p-dimensional grid lines in θ are mapped onto this tangent plane, we would prefer the resulting mappings to be as uniform and as equi-spaced as possible. The parameter-effects curvature measure (18) given in the Appendix assesses the deviation of the parameter curves from a uniform coordinate system. Figure 1 presents plots of parameter curves using simulated data from the Michaelis-Menten model.
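When analytic second derivatives are inconvenient, the acceleration array can be approximated numerically. The sketch below builds the n × p × p array A(θ) by finite differences of a user-supplied velocity matrix V(x, θ); the finite-difference scheme and step size are assumptions of this illustration, not the article's prescription.

```python
import numpy as np

def acceleration_array(V, x, theta, eps=1e-6):
    """Numerical n x p x p acceleration array A(theta): second derivatives of
    f(x_i; theta), built by forward differences of the velocity matrix V."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    V0 = V(x, theta)                    # n x p velocity matrix at theta
    n = V0.shape[0]
    A = np.zeros((n, p, p))
    for k in range(p):
        step = np.zeros(p)
        step[k] = eps
        A[:, :, k] = (V(x, theta + step) - V0) / eps   # d/d theta_k of each column
    # Symmetrize, since d^2 f / d theta_l d theta_k = d^2 f / d theta_k d theta_l
    return 0.5 * (A + A.transpose(0, 2, 1))
```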
3.2 Confidence regions

In the iterative updates using the local linear approximation, V plays the role of the design matrix X in the linear model. Under some suitable conditions on the error distribution and the regularity of the regression function f, the following holds asymptotically:

    √n (θ̂ − θ) ∼ N(0, σ² (V(θ)' V(θ)/n)^{-1}).    (11)

Using the linear approximation (10) and the asymptotic normality property (11), we can construct a confidence region and make statistical inferences, such as hypothesis tests, concerning θ. The performance of the parameter confidence region and inferences depends on how good the linear approximation is, or in another aspect, on the curvedness of the solution locus at the point of the least squares estimate θ̂.

Once the observed information matrix at convergence, i.e., the information matrix evaluated at θ̂, is obtained, each diagonal element of its inverse provides a minimum variance bound for the corresponding parameter. An approximate 1 − α joint confidence region for θ is

    (θ − θ̂)' V(θ̂)' V(θ̂) (θ − θ̂) ≤ ρ² F(p, n − p; α),

or equivalently

    φ'φ ≤ ρ² F(p, n − p; α),

where φ is the re-parametrization (14) discussed in the Appendix, and the boundary of this inference region is

    {θ = θ̂ + ρ √(F(p, n − p; α)) R11^{-1} u with ||u|| = 1}.

Note that the degree of bias, non-normality, and excess variance depends on the model and data. For a given model with a fixed parametrization, the sample size helps to level off the curvedness. As the sample size increases, the sample space dimensionality increases. Consequently, the solution locus becomes flatter and flatter and gets closer to being linear in the higher dimensional sample space (Beale, 1960).

3.3 Coordinate system

The coordinate grids on the parameter space, called parameter curves, do not necessarily stay equi-spaced, straight and parallel when first lifted to the solution locus and then projected down to the tangent plane, especially when the locus is highly curved. When projecting the equi-spaced parameter curves in the parameter space onto the tangent plane, the degree of unequal spacing and the lack of parallelism of the mapped parameter curves lead to a measure of the parameter-effects nonlinearity.

The curvedness of a coordinate system will affect the quality of statistical inferences based on the asymptotic normality. When the parameter curves mapped onto the tangent plane are not uniform grid lines, the resulting confidence region may not be reliable. We use the well-known Michaelis-Menten model (9) as an example for showing the parameter curves on the tangent plane. This model is commonly used for population dynamics in ecology studies, and for pharmacokinetics such as the velocity of enzyme reactions. The parameter θ1 indicates the maximal growth rate or maximal enzyme concentration level, and θ2 is the half-saturation coefficient, also called the Michaelis-Menten constant. Statistical inferences on these parameters help to understand the growth or increase pattern of a certain species.

The following simulation study illustrates that the parameter curves become more like uniform grid lines as the sample size increases in the Michaelis-Menten model, where σ = 40, θ1 = 100, θ2 = 0.05, and the x's come from the absolute values of N(0, 2) random variables. We take two different sample sizes, n = 10 and n = 100. The mapped parameter curves displayed in Figure 1 are colored in blue (φ1) and black (φ2). The circle in red prescribes an ideal 95% confidence region for a zero-curvature model and data. We can see in Figure 1 that the ideal confidence region goes beyond the square [−3, 3]² on the tangent plane when n = 10, while it stays inside the square for the case n = 100. The radius of the circle is given by √(F(2, n − 2; 0.05)).

[Figure 1: Parameter curves for the orthogonal parameters φ = R11(θ − θ̂) using simulated data from the Michaelis-Menten model. The upper panel is with sample size n = 10; the lower panel is with sample size n = 100.]
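A rough numerical counterpart of the above, assuming the set-up of the simulation in Section 3.3, is sketched below: data are generated from the Michaelis-Menten model, the parameters are fitted, and membership in the approximate 95% joint confidence region is checked through the (V'V)-weighted distance. The fitting routine, random seed, and starting values are illustrative choices, not taken from the article.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)                        # illustrative seed
n, sigma, (theta1, theta2) = 100, 40.0, (100.0, 0.05)

def mm(x, t1, t2):                                    # Michaelis-Menten model (9)
    return t1 * x / (t2 + x)

x = np.abs(rng.normal(0.0, np.sqrt(2.0), size=n))     # |N(0, 2)| regressors (variance 2)
y = mm(x, theta1, theta2) + rng.normal(0.0, sigma, size=n)

theta_hat, _ = curve_fit(mm, x, y, p0=(80.0, 0.1))

# Velocity matrix at theta_hat and the approximate 1 - alpha joint region
V = np.column_stack([x / (theta_hat[1] + x),
                     -theta_hat[0] * x / (theta_hat[1] + x) ** 2])
p = 2
s2 = np.sum((y - mm(x, *theta_hat)) ** 2) / (n - p)
crit = s2 * p * f_dist.ppf(0.95, p, n - p)            # rho^2 * F(p, n-p; alpha)

def in_region(theta):
    # True if theta lies inside the approximate 95% joint confidence region
    d = np.asarray(theta) - theta_hat
    return d @ (V.T @ V) @ d <= crit
```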
4 Diagnostics and practical considerations

Diagnostics for nonlinear regression models are important in data analysis. The key factors are the nonlinearity of the model and the characteristics of the data. After obtaining the least squares estimates, some diagnostic checks of the magnitude of the intrinsic and parameter-effects nonlinearity are necessary to assess the performance of the estimates and the fitted model. Some practical considerations are also discussed below.

• Multicollinearity. This occurs when some columns in the velocity matrix V are highly correlated, and it leads to an ill-conditioned system of normal equations. It is an indication that the model may be over-parametrized, and a simpler model, or a transformation of the regressors or parameters, may be considered. Sometimes the degree of multicollinearity can be reduced by centering and scaling the regressor (explanatory variable) data (Bates and Watts, 1988).

• Convergence status of the Gauss-Newton algorithm. Slow convergence or divergence of the Gauss-Newton algorithm is an indication that the model and data combination is not close to linear and that the asymptotic normality property at the convergence point is not trustworthy.

• Curvature measures at convergence and re-parametrization. Bates and Watts (1981) set bounds on the maximal values of the intrinsic and parameter-effects nonlinearity measures in order to determine whether the estimator attains the global minimum of the residual sum of squares; furthermore, they offered a way to evaluate the parameter-effects array under a re-parametrization (Bates and Watts, 1988). Nonetheless, little general guidance is available for choosing a parametrization that attains the minimum of the parameter-effects curvatures.

• Model adequacy. Although R² is a common tool for model adequacy in linear regression analysis, it might be unreasonable to apply it in nonlinear models, as the number of parameters is not related to the number of explanatory variables. Alternatively, other lack-of-fit tests and residual plots are useful in the nonlinear case.

5 Model building

Model building aims at finding more realistic ways to describe the stochastic behavior observed in data. Models used in data analysis are approximations to the unknown regression function f(x; θ). It is desirable to find nonlinear models that behave close to linear models in estimation and in inference. In addition to the Michaelis-Menten model (9), we list a few widely used models that have provided a great deal of useful applications in real data analysis in various scientific fields (Ratkowsky, 1983).

• Yield-density model. The following three-parameter models are widely used for modeling the relation between the yield of a crop and the density of planting:

    f(x; θ) = (θ1 + θ2 x)^{−1/θ3},
    f(x; θ) = (θ1 + θ2 x + θ3 x²)^{−1},
    f(x; θ) = (θ1 + θ2 x^{θ3})^{−1}.

The analysis of growth data is important in many fields of study. The following models are commonly used for modeling growth data; a short code sketch defining them follows this list.

• Weibull model.  f(x; θ) = θ1 − θ2 exp(−θ3 x^{θ4}).

• Logistic model.

    f(x; θ) = θ1 / (1 + θ2 exp(−θ3 x)).

• Richards growth model.

    f(x; θ) = θ1 / (1 + exp(θ2 − θ3 x))^{θ4}.

  There are also other forms of parametrization.

• Monomolecular growth model.

    f(x; θ) = θ1 (1 − exp(−θ2 (x − θ3))).

  If we re-parameterize by replacing −θ1 e^{θ2 θ3} with θ2 and e^{−θ2} by θ3, we have

    f(x; θ) = θ1 + θ2 θ3^x   (0 < θ3 < 1).
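As indicated above, the growth models listed in this section can be written directly as Python functions. The short sketch below defines them and, purely as an illustrative usage, fits the logistic model to simulated data with a generic least squares routine; the simulated values and starting points are assumptions of the example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Growth models listed above, written as plain functions of x and the parameters
def weibull(x, t1, t2, t3, t4):
    return t1 - t2 * np.exp(-t3 * x ** t4)

def logistic(x, t1, t2, t3):
    return t1 / (1.0 + t2 * np.exp(-t3 * x))

def richards(x, t1, t2, t3, t4):
    return t1 / (1.0 + np.exp(t2 - t3 * x)) ** t4

def monomolecular(x, t1, t2, t3):
    return t1 * (1.0 - np.exp(-t2 * (x - t3)))

# Illustrative fit of the logistic model to simulated growth data
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = logistic(x, 50.0, 20.0, 1.0) + rng.normal(0.0, 1.0, x.size)
theta_hat, cov = curve_fit(logistic, x, y, p0=(40.0, 10.0, 0.5))
```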
Some nonlinear regression models are constructed from theoretical considerations, for instance, the Michaelis-Menten model (9). In the physical and chemical sciences, there are many models constructed from differential equations (Seber and Wild, 1989). The structural relationships between random variables and their realizations lead to different models, so the assessment of models is crucial. According to the principle of Occam's razor (Jaynes, 2003), simpler models are preferred to complicated models.

6 Further reading

• Multiresponse models. The models that we have discussed so far are for a single response. When there is more than one response, and when the errors have a joint normal distribution, the least squares method is still useful. If the normality assumption is not justified, maximum likelihood or a Bayesian approach (Box and Tiao, 1992) are candidate alternatives.

• Levenberg-Marquardt algorithm. The Gauss-Newton algorithm with the Levenberg-Marquardt modification (Marquardt, 1963) is used to speed up convergence as well as to stabilize the computation for near-singular V'V in the least squares normal equations. The Levenberg-Marquardt modification is a compromise between the Gauss-Newton method and the steepest descent method; a brief sketch using an off-the-shelf implementation follows this list.

• Differential geometric view. The measures of curvature described above are based on the geometric properties of the solution locus f(θ) relative to the parametrization θ. Therefore, a different approach to assessing the degree of nonlinearity is via studying the geometric structure of the distributions. When a normal error assumption is adopted, we are imposing a Euclidean metric on the underlying class of distributions. Changing the error distribution leads to a different metric, thus changing the geometry and the concept of curvature. The study of probability and information by way of differential geometry is known as information geometry. We refer the reader to Amari (1982) and Amari and Nagaoka (2000) for detailed accounts of differential geometric viewpoints and approaches.

• Generalized linear models. Generalized linear models (GLMs) extend linear regression to response variables that are no longer normally distributed, such as binary or count data. If the distribution of the response variables belongs to the exponential family, a link function can be applied to the expectation of y to connect it to the linear predictor Xθ. This makes the expectation surface nonlinear. However, the expectation surface is determined by quantities linear in the parameters together with a pre-determined nonlinear link function. It is thus easier to perform statistical inference under GLMs. Much commercial software has been developed for their estimation.

• Semiparametric and nonparametric regression. If the aim of the data analysis is to obtain a good fit to the response curve over the explanatory variables, a nonparametric regression (also known as curve fitting) is probably a better alternative than a nonlinear regression model. The latter is a parametric approach, and its aim is to explore and predict the response at given values of the explanatory variables as well as to make statistical inferences based upon interpretation of the parameter estimates. Semiparametric regression, which adopts a model with both parametric and nonparametric components, is another alternative to nonlinear regression modelling.
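As a brief illustration of the Levenberg-Marquardt bullet above, the sketch below fits the Michaelis-Menten model with an off-the-shelf damped least squares routine (SciPy's least_squares with method='lm', which wraps a MINPACK implementation). The simulated data and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def mm_residuals(theta, x, y):
    # Residual vector y - f(x; theta) for the Michaelis-Menten model (9)
    return y - theta[0] * x / (theta[1] + x)

rng = np.random.default_rng(2)
x = np.linspace(0.1, 2.0, 30)
y = 100.0 * x / (0.05 + x) + rng.normal(0.0, 5.0, x.size)

# method='lm' calls a Levenberg-Marquardt implementation, which damps the
# Gauss-Newton step when V'V is close to singular.
fit = least_squares(mm_residuals, x0=np.array([80.0, 0.1]), args=(x, y), method='lm')
theta_hat = fit.x
```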
Appendix

Factorization of D and A. Due to the symmetric structure of the second derivatives, i.e., ∂²f(θ*)/∂θ_ℓ∂θ_ℓ' = ∂²f(θ*)/∂θ_ℓ'∂θ_ℓ, there are p(p − 1)/2 redundant columns in A. Remove these redundant columns, and denote the resulting matrix by W, which is of size n × p(p + 1)/2. Let D = [V W], whose columns are formed by the velocity and acceleration vectors. QR-factorize the matrix D as

    D = [QT QN | Q0] [R; 0],

where QT is n × p, QN is n × p0, Q0 is n × (n − p − p0), and

    R = [R11 R12; 0 R22]

is of size (p + p0) × (p + p0), with R11 of size p × p, R12 of size p × p0, and R22 of size p0 × p0. Here QT consists of p orthonormal n-vectors and spans the tangent space of the solution locus at θ*; QN consists of p0 = p(p + 1)/2 − p orthonormal n-vectors, which are normal to QT and together with QT have the same column span as W; and R is a (p + p0) square matrix with zero entries in the lower-triangular part. The matrix Q0 is composed of the remaining orthonormal basis columns. These columns are orthogonal to D, and together with QT and QN they form a complete orthonormal basis for R^n. Note that Q0 contributes to neither the intrinsic nor the parameter-effects nonlinearity. In fact, we can write an economic QR-factorization:

    D = [QT QN] R  with  R = [R11 R12; 0 R22].

Note that all the above-mentioned matrices and arrays depend implicitly on θ*, the current update of the least squares estimate. We omit θ* from all these matrices and arrays for notational simplicity when there is no ambiguity.

In classical geometry, the curvature corresponding to a direction is the ratio of the length of the second derivative to the squared length of the first derivative. In the context of the residual sum of squares S(θ), such a curvature measure combines the first and second derivatives in such a way that the nonlinearity due to the model and that due to the parametrization are mixed together. This curvature measure can be decomposed into two components: the intrinsic and the parameter-effects nonlinearity. The second derivative array A can be decomposed accordingly into two components reflecting the model effects and the parameter effects. By left-multiplying QT' and QN' to A, we have AT = QT' A and AN = QN' A, representing respectively the components parallel and normal to the tangent plane. Consider a line in the parameter space Θ and its corresponding trajectory along the direction h, called the lifted line, on the solution locus:

    line in Θ: θ(t) = θ* + t h,   lifted line: f(θ* + t h).

The second derivative of the lifted line with respect to the parameters along the direction h is given by h' ⊙ [QT QN]' A ⊙ h with the decomposition

    (h' ⊙ AT ⊙ h) ⊕ (h' ⊙ AN ⊙ h),

where the first component is a p-vector and the second a p0-vector, the direct sum is not a matrix sum but is used to denote the two components, and ⊙ denotes slice-wise multiplication. For instance, AN is a p0 × p × p array with the ith slice a p × p matrix consisting of entries AN,ijk, j, k = 1, ..., p. When the p-vector h and its transpose are right- and left-multiplied to each slice of AN, each slice gives a scalar. Collectively, h' ⊙ AN ⊙ h forms a p0-vector.

Intrinsic nonlinearity. The intrinsic curvature corresponding to the direction h at θ* is given by

    ||h' ⊙ AN(θ*) ⊙ h|| / ||V(θ*)h||².    (12)

This quantity is invariant under any re-parametrization. Often we consider a scaled curvature instead:

    κN_h = ρ ||h' ⊙ AN(θ*) ⊙ h|| / ||V(θ*)h||²,  where ρ = s√p,    (13)

with s² = S(θ*)/(n − p). The quantity ρ is called the standard radius, so that ρ√(F(p, n − p; α)) is the radius of the 100(1 − α)% confidence sphere for the re-parametrization φ in a neighborhood of θ*:

    φ = R11(θ − θ*).    (14)

With such a re-parametrization, the velocity matrix ∇φ f(X; φ) will have orthonormal columns, and the second derivative array projected onto the space spanned by QT and QN becomes

    (R11^{-1})' ⊙ [QT QN]' A ⊙ R11^{-1}.

This leads to the definition of the relative curvature array:

    C = ρ (R11^{-1})' ⊙ [QT QN]' A ⊙ R11^{-1}.

Note that C is a scaled second derivative array under the parametrization φ of (14) and is of the same size as [QT QN]' A, which is a (p + p0) × p × p array. Its corresponding decomposition is given by C = CT ⊕ CN with

    CT = ρ (R11^{-1})' ⊙ AT ⊙ R11^{-1}   (of size p × p × p),    (15)

and

    CN = ρ (R11^{-1})' ⊙ AN ⊙ R11^{-1}   (of size p0 × p × p).    (16)

The intrinsic curvature measure can then be expressed as

    κN_d = ||d' ⊙ CN ⊙ d||,    (17)

where d has unit length and the corresponding direction in θ coordinates is h = R11^{-1} d. We can use the maximal value max_d κN_d or an average of κN_d over directions d to account for the intrinsic curvature measure.

Parameter-effects nonlinearity. To assess the deviation of the parameter curves from a uniform coordinate system, the parameter-effects curvature measure is introduced as

    κT_d = ||d' ⊙ CT ⊙ d||.    (18)

The parameter-effects curvature is the projection of the second derivative array of the solution locus onto the tangent plane, where the derivatives are taken with respect to the parameters of interest. The parameter-effects curvatures (15) and the intrinsic curvatures (16) are collectively called the relative curvature array. Unlike the intrinsic curvatures, which are invariant, the parameter-effects curvatures change when a different re-parametrization is carried out in the nonlinear model.
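A numerical sketch of the factorization above is given below: it forms D = [V W], takes an economic QR factorization, and assembles relative curvature arrays CT and CN together with the directional curvatures of (17) and (18). It assumes n ≥ p + p(p + 1)/2, full column rank of D, and the availability of the acceleration array A; these assumptions, the helper names, and the way the normal-space dimension is taken directly from the QR factor are choices of this illustration rather than the article's exact prescription.

```python
import numpy as np

def relative_curvatures(V, A, resid):
    """Sketch of the relative curvature arrays: CT (parameter effects) and
    CN (intrinsic), built from the velocity matrix V (n x p), the acceleration
    array A (n x p x p) and the residual vector at the least squares estimate."""
    n, p = V.shape
    # Unique acceleration columns A[:, j, k] with j <= k (symmetry of A)
    cols = [A[:, j, k] for j in range(p) for k in range(j, p)]
    D = np.column_stack([V] + cols)
    Q, R = np.linalg.qr(D)                     # economic QR: D = [QT QN] R
    QT, QN = Q[:, :p], Q[:, p:]
    R11inv = np.linalg.inv(R[:p, :p])
    s2 = resid @ resid / (n - p)
    rho = np.sqrt(s2 * p)                      # standard radius rho = s * sqrt(p)
    # Project A onto the tangent and normal components, then rescale by R11^{-1}
    AT = np.einsum('mi,mjk->ijk', QT, A)
    AN = np.einsum('mi,mjk->ijk', QN, A)
    CT = rho * np.einsum('ab,iac,cd->ibd', R11inv, AT, R11inv)
    CN = rho * np.einsum('ab,iac,cd->ibd', R11inv, AN, R11inv)
    return CT, CN

def directional_curvature(C, d):
    """kappa_d = || d' C d || taken over the slices of a curvature array C."""
    d = d / np.linalg.norm(d)
    return np.linalg.norm(np.einsum('j,ijk,k->i', d, C, d))
```

Given V, A, and the residuals at θ̂, directional_curvature(CN, d) and directional_curvature(CT, d) then give the intrinsic and parameter-effects curvatures in a chosen unit direction d.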
Bibliography

Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.

Bates, D. M. and Watts, D. G. (1980). Relative curvature measures of nonlinearity. Journal of the Royal Statistical Society, B, 42, 1-25.

Bates, D. M. and Watts, D. G. (1981). A relative offset orthogonality convergence criterion for nonlinear least squares. Technometrics, 23, 179-183.

Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. New York: Wiley.

Beale, E. M. L. (1960). Confidence regions in nonlinear estimation (with discussion). Journal of the Royal Statistical Society, B, 22, 41-88.

Cook, R. D. and Tsai, C. L. (1985). Residuals in nonlinear regression. Biometrika, 72, 23-29.

Hamilton, D. C., Watts, D. G. and Bates, D. M. (1982). Accounting for intrinsic nonlinearity in nonlinear regression parameter inference regions. Annals of Statistics, 10, 386-393.

Jaynes, E. T. (2003). Probability Theory: The Logic of Science. New York: Cambridge University Press.

Jennrich, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics, 40, 633-643.

Marquardt, D. W. (1963). An algorithm for the estimation of non-linear parameters. J. SIAM, 11, 431-441.

Wu, C. F. (1981). Asymptotic theory of nonlinear least squares estimation. Annals of Statistics, 9, 501-513.

Further Reading

Allen, D. M. (1983). Parameter estimation for nonlinear models with emphasis on compartmental models. Biometrics, 39, 629-637.

Amari, S. (1982). Differential geometry of the curved exponential family: curvatures and information loss. Annals of Statistics, 10, 357-368.

Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. American Mathematical Society.

Box, G. E. P. and Draper, N. R. (1965). The Bayesian estimation of common parameters from several responses. Biometrika, 52, 355-365.

Box, G. E. P. and Tiao, G. C. (1992). Bayesian Inference in Statistical Analysis. New York: Wiley.

Clarke, G. P. Y. (1980). Moments of the least squares estimators in a non-linear regression model. Journal of the Royal Statistical Society, B, 42, 227-237.

Fang, K. T., Lin, D. K. J., Winker, P. and Zhang, Y. (2000). Uniform design: theory and application. Technometrics, 42, 237-248.

Hamilton, D. C. and Watts, D. G. (1985). A quadratic design criterion for precise estimation in nonlinear regression models. Technometrics, 27, 241-250.

Kass, R. E. (1984). Canonical parametrizations and zero parameter-effects curvature. Journal of the Royal Statistical Society, B, 46, 86-92.

Nocedal, J. and Wright, S. (2006). Numerical Optimization, 2nd ed. New York: Springer.

Ratkowsky, D. A. (1983). Nonlinear Regression Modeling: A Unified Practical Approach. New York: Marcel Dekker.

Ryan, T. (1997). Modern Regression Methods. Wiley Series in Probability and Statistics. New York: Wiley.

Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. New York: Wiley.