
Nonlinear Regression

Model: $Y_i = f(x_i; \theta) + \varepsilon_i$, $\quad i = 1, \dots, n$

$Y$: response variable

$x = (x_1, \dots, x_r)$: predictors

$\theta = (\theta_1, \dots, \theta_p)$: parameters (unknown)

$\varepsilon$: random error (unobserved)

Assumptions: $\varepsilon_1, \dots, \varepsilon_n$ are iid with mean 0 and variance $\sigma^2$ (unknown)

Matrix notation: $Y = f(X; \theta) + \varepsilon$, where $Y = (Y_1, \dots, Y_n)'$, $X = (x_1, \dots, x_n)'$ and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'$; $Y$ and $\varepsilon$ are $n \times 1$ vectors,
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1r} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nr} \end{pmatrix}$$
is an $n \times r$ matrix, $x_{ij}$ being the $i$th observation on variable $j$.
Examples
1. $f(x; \theta) = \theta_1 + \theta_2 x$ (linear)
2. $f(x; \theta) = \theta_1 + \theta_2 x + \theta_3 x^2$ (nonlinear in $x$, but linear in $x$ and $x^2$; some say linear in the parameters)
3. $f(x; \theta) = \theta_1 + \theta_2 e^{-\theta_3 x}$ (exponential decay/growth)
4. $f(x; \theta) = \dfrac{\theta_1}{1 + \theta_2 e^{-\theta_3 x}}$ (growth)
5. $f(x; \theta) = \dfrac{\sum_{i=1}^{k} \theta_i x^{i-1}}{1 + \sum_{i=1}^{m} \theta_{i+k} x^{i}}$ (rational)
6. Nonlinear: not necessarily linear. Some use it to mean not linear.
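Written as R model formulas of the kind used with nls() below, examples 3 and 4 would look as follows (a sketch; th1, th2, th3 are placeholder parameter names):

decay <- y ~ th1 + th2 * exp(-th3 * x)         # example 3: exponential decay/growth
growth <- y ~ th1 / (1 + th2 * exp(-th3 * x))  # example 4: growth curve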
Box, Hunter and Hunter (1978)
Statistics for Experimenters

$$Y = \theta_1 (1 - e^{-\theta_2 x}) + \varepsilon$$

$Y$: biochemical oxygen demand in mg/l

$x$: incubation time in days


[Figure: BOD (mg/l) against incubation time (days), with fitted curve]

x <- c(1,2,3,5,7,10); y <- c(109, 149, 149, 191, 213, 224)  # incubation time, BOD


xx <- seq(from=0, to=10, by=0.1); yy <- 213.8094*(1-exp(-0.5472*xx))  # curve at the LS estimates
plot(x,y); lines(xx, yy)  # data with fitted curve overlaid
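The estimates 213.8094 and 0.5472 plugged in above can be reproduced with nls(); a minimal sketch (th1 and th2 are placeholder names, and the starting values are rough guesses):

fit <- nls(y ~ th1 * (1 - exp(-th2 * x)), start = list(th1 = 200, th2 = 0.5))
summary(fit)  # estimates, standard errors, t values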

More info at www.itl.nist.gov/div898/strd/nls/data/boxbod.shtml


Ratkowsky, D. A. (1983)
Nonlinear Regression Modelling, p61, 88

$$Y = \frac{\theta_1}{1 + e^{\theta_2 - \theta_3 x}} + \varepsilon$$

$Y$: pasture yield, $x$: growing time


[Figure: pasture yield against growing time]

More info at itl.nist.gov/div898/strd/nls/data/ratkowsky2.shtml
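A fit along the same lines, assuming the pasture-yield observations from the NIST page are loaded into vectors x and y (starting values are illustrative guesses):

fit2 <- nls(y ~ th1 / (1 + exp(th2 - th3 * x)), start = list(th1 = 70, th2 = 2, th3 = 0.07))
summary(fit2)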


Bates and Watts (1988)
Nonlinear Regression Analysis and Its Applications

$$Y = \theta_1 + \theta_2 x^{\theta_3} + \varepsilon$$

$Y$: log PCB (polychlorinated biphenyl) concentration (ppm) in lake trout

$x$: age (years)
[Figure: scatterplot of log(PCB) against age]

More info at www.statsci.org/data/general/troutpcb.html


Least Squares Estimation

Find $\theta$ to minimize
$$\sum_{i=1}^{n} \left( y_i - f(x_i; \theta) \right)^2 = \| y - f(X; \theta) \|^2$$

Equivalent to ML estimation when the errors are normal.

Partial derivatives w.r.t. $\theta_j$:
$$\sum_{i=1}^{n} -2 \left( y_i - f(x_i; \theta) \right) \frac{\partial f(x_i; \theta)}{\partial \theta_j}, \quad j = 1, \dots, p$$

Gradient vector (partial derivatives in a column):
$$-2 \begin{pmatrix} \frac{\partial f(x_1;\theta)}{\partial \theta_1} & \cdots & \frac{\partial f(x_n;\theta)}{\partial \theta_1} \\ \vdots & & \vdots \\ \frac{\partial f(x_1;\theta)}{\partial \theta_p} & \cdots & \frac{\partial f(x_n;\theta)}{\partial \theta_p} \end{pmatrix} \begin{pmatrix} y_1 - f(x_1;\theta) \\ \vdots \\ y_n - f(x_n;\theta) \end{pmatrix}$$
Normal Equations (partial derivatives = 0):
$$F(\theta)' \left( y - f(X; \theta) \right) = 0$$

This is because the gradient vector is
$$\frac{\partial}{\partial \theta} \left( y - f(X;\theta) \right)' \left( y - f(X;\theta) \right) = -2\, F(\theta)' \left( y - f(X;\theta) \right),$$
where the Jacobian is
$$F(\theta) = \frac{\partial f(X;\theta)}{\partial \theta'} = \begin{pmatrix} \frac{\partial f(x_1;\theta)}{\partial \theta_1} & \cdots & \frac{\partial f(x_1;\theta)}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(x_n;\theta)}{\partial \theta_1} & \cdots & \frac{\partial f(x_n;\theta)}{\partial \theta_p} \end{pmatrix}.$$

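For instance, for the Box-Hunter-Hunter model $f(x;\theta) = \theta_1(1 - e^{-\theta_2 x})$ above, the $i$th row of $F(\theta)$ is
$$\left( \frac{\partial f(x_i;\theta)}{\partial \theta_1},\ \frac{\partial f(x_i;\theta)}{\partial \theta_2} \right) = \left( 1 - e^{-\theta_2 x_i},\ \theta_1 x_i e^{-\theta_2 x_i} \right).$$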
Compare with $X'(y - X\beta) = 0$ for the linear case $y = X\beta + \varepsilon$.
See Gallant, A. R. (1987). Nonlinear Statistical Models, p. 15.
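In R, these partial derivatives need not be derived by hand; a minimal sketch using base R's deriv() on the same model (argument names are mine):

# g() returns f(x; theta) with the Jacobian rows attached as the "gradient" attribute
g <- deriv(~ th1 * (1 - exp(-th2 * x)), c("th1", "th2"), function(th1, th2, x) NULL)
attr(g(th1 = 213.8094, th2 = 0.5472, x = c(1, 2, 3)), "gradient")  # first three rows of F(theta)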
Properties of LS Estimators
It can be shown that
$$\hat\theta = \theta + (F'F)^{-1} F' \varepsilon + o_p(n^{-1/2})$$
$$\hat\sigma^2 = \frac{\| y - f(X; \hat\theta) \|^2}{n - p} = \frac{\varepsilon' \left[ I - F(F'F)^{-1} F' \right] \varepsilon}{n - p} + o_p(n^{-1})$$

With normal errors and asymptotically,
$$\hat\theta \sim N_p\left( \theta, \sigma^2 (F'F)^{-1} \right), \qquad \frac{(n-p)\, \hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p},$$
and $\hat\theta$ and $\hat\sigma^2$ are independent.

More details in Gallant, A. R. (1987). Nonlinear Statistical Models, pp. 16-17.
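In R, these quantities can be read off a fitted nls object; a minimal sketch, reusing the BoxBOD fit from earlier:

summary(fit)$sigma  # sigma-hat, on n - p degrees of freedom
vcov(fit)           # estimated sigma^2 (F'F)^(-1)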
Inferences

$\hat\theta \sim N_p(\theta, \hat\sigma^2 (\hat F'\hat F)^{-1})$ approximately for large $n$, where $\hat F = F(\hat\theta)$ and
$$\hat\sigma^2 = \frac{1}{n-p} \sum_{i=1}^{n} \left( y_i - f(x_i; \hat\theta) \right)^2 \quad \text{on } n-p \text{ d.f.}$$

Confidence intervals and t-tests on the $\theta_j$

Also F-tests:
$$F = \frac{(\mathrm{SSE}_{\mathrm{reduced}} - \mathrm{SSE}_{\mathrm{full}})/d}{\hat\sigma^2_{\mathrm{full}}} \sim F_{d;\, n-p} \quad \text{under the reduced model}$$
$d$ = number of parameters (full minus reduced)
Example. PCB in lake trout.

Model: $Y = \theta_1 + \theta_2 x^{\theta_3} + \varepsilon$, with $Y$ = log(PCB)

Initial values: $\theta_3^0 = 0.5$ (guess);
$\theta_1^0 = 0.0315$, $\theta_2^0 = 0.2591$ (linear LS estimates using $\theta_3^0$)

Better initial values can be obtained by plotting
$$S(\theta_3) = \min_{\theta_1, \theta_2} \sum_{i=1}^{28} \left( y_i - \theta_1 - \theta_2 x_i^{\theta_3} \right)^2 \quad \text{(computed by linear regression)}$$
against $\theta_3$; see the sketch below.
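A sketch of that profiling step, assuming the 28 trout observations are loaded as age x and log-concentration y (the grid range is my choice):

th3.grid <- seq(0.05, 1, by = 0.05)
S <- sapply(th3.grid, function(th3) sum(resid(lm(y ~ I(x^th3)))^2))  # SSE from linear LS at each fixed theta_3
plot(th3.grid, S, type = "l", xlab = "theta3", ylab = "S(theta3)")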

Nonlinear LS estimates:
$\hat\theta_1 = -4.8647$, s.e.$(\hat\theta_1) = 8.4243$, $t = -0.5775$
$\hat\theta_2 = 4.7016$, s.e.$(\hat\theta_2) = 8.2721$, $t = 0.5684$
$\hat\theta_3 = 0.1969$, s.e.$(\hat\theta_3) = 0.2739$, $t = 0.7189$

No significant parameters? Rather, none is significant in the presence of the others.

At least $\theta_2 \neq 0$ and $\theta_3 \neq 0$, for otherwise E[$Y$] is constant.
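A sketch of this fit, again assuming vectors x (age) and y (log PCB) hold the data, with the initial values given above:

fit.pcb <- nls(y ~ th1 + th2 * x^th3, start = list(th1 = 0.0315, th2 = 0.2591, th3 = 0.5))
summary(fit.pcb)  # coefficient table: estimates, s.e.'s, t values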
Example (continued). PCB in lake trout.

Correlation between estimated parameters:

                 $\hat\theta_1$   $\hat\theta_2$
$\hat\theta_2$    0.997
$\hat\theta_3$   -0.998           -1.000

Highly correlated, making the s.e.'s large and the t-tests non-significant.

F-test on $H_0: \theta_3 = 0$ against $H_1: \theta_3 \in (-\infty, \infty)$ (not necessarily zero)

Under $H_0$, the model becomes $Y = \theta + \varepsilon$, where $\theta = \theta_1 + \theta_2$:
SSE = 31.12 on 28 − 1 d.f.

Compared with SSE = 6.33 on 28 − 3 d.f. under $H_1$.

F statistic:
$$F = \frac{(31.12 - 6.33)/(27 - 25)}{6.33/25} = 48.89 > F_{0.01;\, 2,25} = 5.57$$

Significant evidence at the 1% significance level to reject $H_0: \theta_3 = 0$ (P-value = 0.000)

Critical value and P-value in R: qf(0.99, 2, 25); 1 - pf(48.89, 2, 25)

q for quantile, p for probability (to the left), f for the F-distribution
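The same F statistic can be computed directly from the two SSEs; a minimal sketch:

sse.red <- 31.12; sse.full <- 6.33  # SSEs under H0 (27 d.f.) and H1 (25 d.f.)
Fstat <- ((sse.red - sse.full) / (27 - 25)) / (sse.full / 25)
c(Fstat, 1 - pf(Fstat, 2, 25))  # F close to 48.89 (SSEs are rounded); P-value essentially 0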
Reading List

Fox, J. (2002). Nonlinear Regression and Nonlinear Least Squares. http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-nonlinear-regression.pdf

Gallant, A. R. (1975). Nonlinear Regression. The American Statistician, 29, 73-81. (Library - e-journal)

Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. (Book)

Gallant, A. R. (1987). Nonlinear Statistical Models. (Book)
