
Nonlinear Regression

Jerry Dwi Trijoyo Purnomo

INSTITUT TEKNOLOGI SEPULUH NOPEMBER (ITS)


Surabaya - Indonesia

www.its.ac.id
Introduction (1/2)
• The linear regression model

  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

  is linear in the unknown parameters. We may write this model in a general form as

  $y = \mathbf{x}^{T}\boldsymbol{\beta} + \varepsilon = f(\mathbf{x}, \boldsymbol{\beta}) + \varepsilon$

  where $\mathbf{x}^{T} = [1, x_1, x_2, \ldots, x_k]$.
• Note that the derivative of the expectation function with respect to any parameter does not depend on the unknown parameters:

  $\dfrac{\partial f(\mathbf{x}, \boldsymbol{\beta})}{\partial \beta_j} = \dfrac{\partial \left( \beta_0 + \sum_{j=1}^{k} \beta_j x_j \right)}{\partial \beta_j} = x_j$
Introduction (2/2)
• Now consider the nonlinear model

  $y = \theta_1 e^{\theta_2 x} + \varepsilon$

  This model is not linear in the unknown parameters $\theta_1$ and $\theta_2$.
• In general, we will write the nonlinear regression model as

  $y = f(\mathbf{x}, \boldsymbol{\theta}) + \varepsilon$
• The derivatives with respect to $\theta_1$ and $\theta_2$ now involve the unknown parameters:

  $\dfrac{\partial f(x, \boldsymbol{\theta})}{\partial \theta_1} = e^{\theta_2 x} \quad \text{and} \quad \dfrac{\partial f(x, \boldsymbol{\theta})}{\partial \theta_2} = \theta_1 x e^{\theta_2 x}$

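These derivatives are easy to verify by symbolic differentiation; the short sketch below uses SymPy (assumed to be available) for the exponential model.

```python
import sympy as sp

# symbols for the regressor and the two unknown parameters
x, theta1, theta2 = sp.symbols('x theta1 theta2')

# expectation function of the nonlinear model y = theta1 * exp(theta2 * x) + eps
f = theta1 * sp.exp(theta2 * x)

# partial derivatives with respect to the parameters
print(sp.diff(f, theta1))   # exp(theta2*x)
print(sp.diff(f, theta2))   # theta1*x*exp(theta2*x)
```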
Nonlinear Least Squares (1/2)
• The method of LS in linear regression involves minimizing the LS function

  $S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i - \left( \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \right) \right]^2$
• Now consider the nonlinear regression model

  $y_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \varepsilon_i$

  where $\mathbf{x}_i^{T} = [1, x_{i1}, x_{i2}, \ldots, x_{ik}]$ for i = 1, 2, …, n.

Nonlinear Least Squares (2/2)
• The LS function is

  $S(\boldsymbol{\theta}) = \sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \boldsymbol{\theta}) \right]^2$
• The LS normal equations are

  $\sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \hat{\boldsymbol{\theta}}) \right] \left[ \dfrac{\partial f(\mathbf{x}_i, \boldsymbol{\theta})}{\partial \theta_j} \right]_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} = 0, \quad j = 1, 2, \ldots, p$

  These normal equations can be very difficult to solve.


Example (1/2)
• Consider the nonlinear regression model

  $y = \theta_1 e^{\theta_2 x} + \varepsilon$

  The LS normal equations for this model are

  $\sum_{i=1}^{n} \left[ y_i - \hat{\theta}_1 e^{\hat{\theta}_2 x_i} \right] e^{\hat{\theta}_2 x_i} = 0$

  $\sum_{i=1}^{n} \left[ y_i - \hat{\theta}_1 e^{\hat{\theta}_2 x_i} \right] \hat{\theta}_1 x_i e^{\hat{\theta}_2 x_i} = 0$

Example (2/2)
• After simplification, the normal equations are

  $\sum_{i=1}^{n} y_i e^{\hat{\theta}_2 x_i} - \hat{\theta}_1 \sum_{i=1}^{n} e^{2\hat{\theta}_2 x_i} = 0$

  $\sum_{i=1}^{n} y_i x_i e^{\hat{\theta}_2 x_i} - \hat{\theta}_1 \sum_{i=1}^{n} x_i e^{2\hat{\theta}_2 x_i} = 0$

  These equations are not linear in $\hat{\theta}_1$ and $\hat{\theta}_2$, and no simple closed-form solution exists. In general, iterative methods must be used to find the values of $\hat{\theta}_1$ and $\hat{\theta}_2$.

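As a concrete illustration of such an iterative solution, the sketch below fits the exponential model to simulated data with scipy.optimize.least_squares; the data values and starting point are made up for illustration only.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# simulated data from y = theta1 * exp(theta2 * x) + error (illustrative values only)
x = np.linspace(0.0, 2.0, 25)
y = 10.0 * np.exp(0.8 * x) + rng.normal(scale=1.0, size=x.size)

def residuals(theta):
    # r_i(theta) = y_i - theta1 * exp(theta2 * x_i)
    return y - theta[0] * np.exp(theta[1] * x)

# iterative least squares starting from a rough initial guess
fit = least_squares(residuals, x0=[5.0, 0.5])
print(fit.x)                   # estimates of theta1 and theta2
print(np.sum(fit.fun ** 2))    # residual sum of squares S(theta_hat)
```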
MLE for Nonlinear Regression
• Consider the model

  $y_i = \theta_1 e^{\theta_2 x_i} + \varepsilon_i$

  If the errors are normally and independently distributed with mean zero and variance $\sigma^2$, then the likelihood function is

  $L(\boldsymbol{\theta}, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\dfrac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ y_i - \theta_1 e^{\theta_2 x_i} \right]^2 \right\}$
• Maximizing this likelihood function is equivalent to minimizing the residual sum of squares. Therefore, in the normal-theory case, LS estimates are the same as ML estimates.

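• To make this equivalence explicit, take the logarithm of the likelihood:

  $\ln L(\boldsymbol{\theta}, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left[ y_i - \theta_1 e^{\theta_2 x_i} \right]^2 = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{S(\boldsymbol{\theta})}{2\sigma^2}$

  For any fixed $\sigma^2$, the first term does not involve $\boldsymbol{\theta}$, so maximizing $\ln L$ over $\boldsymbol{\theta}$ is the same as minimizing $S(\boldsymbol{\theta})$.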
Transformation to a Linear Model (1/2)
• Consider the model

  $y = \theta_1 e^{\theta_2 x} + \varepsilon$

  Now since $E(y) = f(x, \boldsymbol{\theta}) = \theta_1 e^{\theta_2 x}$, we can linearize the expectation function by taking logarithms,

  $\ln E(y) = \ln\theta_1 + \theta_2 x$

  We can rewrite the model as

  $\ln y = \ln\theta_1 + \theta_2 x + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$

  Then we can use simple linear regression to estimate $\beta_0$ and $\beta_1$. Note, however, that the original model has an additive error structure.

Transformation to a Linear Model (2/2)
• If the error structure is multiplicative, say

  $y = \theta_1 e^{\theta_2 x} \varepsilon$

  then taking logarithms will be appropriate, since

  $\ln y = \ln\theta_1 + \theta_2 x + \ln\varepsilon = \beta_0 + \beta_1 x + \varepsilon^{*}$

  and if $\varepsilon^{*}$ follows a normal distribution, all the standard linear regression model properties and associated inference will apply.

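A minimal sketch of this idea on simulated multiplicative-error data (the numbers are illustrative): regress ln y on x by ordinary least squares and back-transform the intercept.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated data with multiplicative error: y = theta1 * exp(theta2 * x) * eps
x = np.linspace(0.1, 2.0, 30)
y = 10.0 * np.exp(0.8 * x) * np.exp(rng.normal(scale=0.05, size=x.size))

# simple linear regression of ln(y) on x: ln y = beta0 + beta1 * x + eps*
beta1, beta0 = np.polyfit(x, np.log(y), deg=1)

theta1_hat = np.exp(beta0)   # beta0 = ln(theta1)
theta2_hat = beta1           # beta1 = theta2
print(theta1_hat, theta2_hat)
```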
The Puromycin Data
• Bates and Watts (1988) use the Michaelis-Menten model for chemical kinetics to relate the initial velocity of an enzymatic reaction to the substrate concentration x. The model is

  $y = \dfrac{\theta_1 x}{x + \theta_2} + \varepsilon$

  The expectation function can be linearized easily, since

  $\dfrac{1}{f(x, \boldsymbol{\theta})} = \dfrac{x + \theta_2}{\theta_1 x} = \dfrac{1}{\theta_1} + \dfrac{\theta_2}{\theta_1} \cdot \dfrac{1}{x} = \beta_0 + \beta_1 u$

Original Data
Concentration Velocity
0.02 47
0.06 97
0.11 123
0.22 152
0.56 191
1.10 200
0.02 76
0.06 107
0.11 139
0.22 159
0.56 201
1.10 207
Scatter Plot

Figure 1. Scatter plot of puromycin data
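A plot like Figure 1 can be reproduced directly from the data table above; the sketch below uses matplotlib (assumed to be available).

```python
import numpy as np
import matplotlib.pyplot as plt

# puromycin data from the table above
conc = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
                 0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
vel = np.array([47, 97, 123, 152, 191, 200,
                76, 107, 139, 159, 201, 207], dtype=float)

plt.scatter(conc, vel)
plt.xlabel("Concentration")
plt.ylabel("Velocity")
plt.title("Puromycin data")
plt.show()
```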


Output (Original Data)

Transformation Procedure
• So we are tempted to fit the linear model

  $y^{*} = \beta_0 + \beta_1 u + \varepsilon$

  where

  $y^{*} = \dfrac{1}{y}, \quad u = \dfrac{1}{x}, \quad \beta_0 = \dfrac{1}{\theta_1}, \quad \beta_1 = \dfrac{\theta_2}{\theta_1}$

Data After Transformation
Concentration_New (u = 1/Concentration)    Velocity_New (y* = 1/Velocity)
50.000000 0.021277
16.666667 0.010309
9.090909 0.008130
4.545455 0.006579
1.785714 0.005236
0.909091 0.005000
50.000000 0.013158
16.666667 0.009346
9.090909 0.007194
4.545455 0.006289
1.785714 0.004975
0.909091 0.004831
Output (Transformed Data)

Regression Model (Transformed Data)
• The regression model is

  $\hat{y}^{*} = 0.005107 + 0.0002472\,u$
• We have

  $0.005107 = \dfrac{1}{\hat{\theta}_1} \quad \text{and} \quad 0.0002472 = \dfrac{\hat{\theta}_2}{\hat{\theta}_1}$

  and so we can estimate $\theta_1$ and $\theta_2$ in the original model as

  $\hat{\theta}_1 = 195.81 \quad \text{and} \quad \hat{\theta}_2 = 0.04841$

  (These estimates could also serve as starting values for an iterative fitting procedure.)
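This transformation step is easy to reproduce; a minimal sketch using the puromycin data (entered again here so the snippet is self-contained):

```python
import numpy as np

# puromycin data: substrate concentration and initial velocity
conc = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
                 0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
vel = np.array([47, 97, 123, 152, 191, 200,
                76, 107, 139, 159, 201, 207], dtype=float)

# reciprocal transformation: regress y* = 1/y on u = 1/x
u = 1.0 / conc
ystar = 1.0 / vel
beta1, beta0 = np.polyfit(u, ystar, deg=1)   # slope, intercept

theta1_hat = 1.0 / beta0            # beta0 = 1/theta1
theta2_hat = beta1 * theta1_hat     # beta1 = theta2/theta1
print(theta1_hat, theta2_hat)       # roughly 195.8 and 0.0484
```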
Parameter Estimation (1/5)
• A method widely used in computer algorithms for nonlinear regression is linearization of the nonlinear function followed by the Gauss-Newton iteration method of parameter estimation.
• Linearization is accomplished by a Taylor series expansion of $f(\mathbf{x}_i, \boldsymbol{\theta})$ about the point $\boldsymbol{\theta}_0^{T} = [\theta_{10}, \theta_{20}, \ldots, \theta_{p0}]$ with only the linear terms retained.

Parameter Estimation (2/5)
• This yields

  $f(\mathbf{x}_i, \boldsymbol{\theta}) = f(\mathbf{x}_i, \boldsymbol{\theta}_0) + \sum_{j=1}^{p} \left[ \dfrac{\partial f(\mathbf{x}_i, \boldsymbol{\theta})}{\partial \theta_j} \right]_{\boldsymbol{\theta} = \boldsymbol{\theta}_0} (\theta_j - \theta_{j0})$
• If we set

  $f_i^{0} = f(\mathbf{x}_i, \boldsymbol{\theta}_0)$, where $\boldsymbol{\theta}_0$ is the initial value of $\boldsymbol{\theta}$,

  $\beta_j^{0} = \theta_j - \theta_{j0}$

  $Z_{ij}^{0} = \left[ \dfrac{\partial f(\mathbf{x}_i, \boldsymbol{\theta})}{\partial \theta_j} \right]_{\boldsymbol{\theta} = \boldsymbol{\theta}_0}$

Parameter Estimation (3/5)
• We note that the nonlinear regression model can be written as

  $y_i - f_i^{0} = \sum_{j=1}^{p} \beta_j^{0} Z_{ij}^{0} + \varepsilon_i, \quad i = 1, 2, \ldots, n$

  That is, we now have a linear regression model. We usually call $\boldsymbol{\theta}_0$ the starting values for the parameters.
• We may rewrite this equation as

  $\mathbf{y}_0 = \mathbf{Z}_0 \boldsymbol{\beta}_0 + \boldsymbol{\varepsilon}$
• So the estimate of $\boldsymbol{\beta}_0$ is

  $\hat{\boldsymbol{\beta}}_0 = (\mathbf{Z}_0^{T}\mathbf{Z}_0)^{-1}\mathbf{Z}_0^{T}\mathbf{y}_0 = (\mathbf{Z}_0^{T}\mathbf{Z}_0)^{-1}\mathbf{Z}_0^{T}(\mathbf{y} - \mathbf{f}_0)$

Parameter Estimation (4/5)
• Now since $\boldsymbol{\beta}_0 = \boldsymbol{\theta} - \boldsymbol{\theta}_0$, we could define

  $\hat{\boldsymbol{\theta}}_1 = \hat{\boldsymbol{\beta}}_0 + \boldsymbol{\theta}_0$

  as revised estimates of $\boldsymbol{\theta}$. Sometimes $\hat{\boldsymbol{\beta}}_0$ is called the vector of increments. We may now use the revised estimates $\hat{\boldsymbol{\theta}}_1$ in place of $\boldsymbol{\theta}_0$ and produce another set of revised estimates, say $\hat{\boldsymbol{\theta}}_2$, and so forth.
• In general, at the kth iteration we have

  $\hat{\boldsymbol{\theta}}_{k+1} = \hat{\boldsymbol{\theta}}_k + \hat{\boldsymbol{\beta}}_k = \hat{\boldsymbol{\theta}}_k + (\mathbf{Z}_k^{T}\mathbf{Z}_k)^{-1}\mathbf{Z}_k^{T}(\mathbf{y} - \mathbf{f}_k)$

Parameter Estimation (5/5)
• where

  $\mathbf{Z}_k = [Z_{ij}^{k}]$

  $\mathbf{f}_k = [f_1^{k}, f_2^{k}, \ldots, f_n^{k}]^{T}$

  $\hat{\boldsymbol{\theta}}_k = [\hat{\theta}_{1k}, \hat{\theta}_{2k}, \ldots, \hat{\theta}_{pk}]^{T}$
• The iterative process continues until convergence, that is, until

  $\left| \left( \hat{\theta}_{j,k+1} - \hat{\theta}_{jk} \right) / \hat{\theta}_{jk} \right| < \delta, \quad j = 1, 2, \ldots, p$

  where $\delta$ is some small number, say $1.0 \times 10^{-6}$.

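A minimal Gauss-Newton iteration written to mirror these formulas, applied to the Michaelis-Menten model and the puromycin data discussed below; this is an illustrative sketch rather than production code.

```python
import numpy as np

# puromycin data: substrate concentration x and initial velocity y
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
              0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([47, 97, 123, 152, 191, 200,
              76, 107, 139, 159, 201, 207], dtype=float)

def f(x, theta):
    # Michaelis-Menten expectation function: theta1 * x / (theta2 + x)
    return theta[0] * x / (theta[1] + x)

def jacobian(x, theta):
    # columns are the derivatives of f with respect to theta1 and theta2
    z1 = x / (theta[1] + x)
    z2 = -theta[0] * x / (theta[1] + x) ** 2
    return np.column_stack([z1, z2])

theta = np.array([205.0, 0.08])   # starting values theta_0
delta = 1.0e-6                    # convergence tolerance

for k in range(50):
    Z = jacobian(x, theta)
    resid = y - f(x, theta)
    # vector of increments: beta_hat_k = (Z'Z)^{-1} Z'(y - f_k)
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ resid)
    converged = np.all(np.abs(beta_hat) / np.abs(theta) < delta)
    theta = theta + beta_hat
    if converged:
        break

print(theta)                            # approximately [212.7, 0.0641]
print(np.sum((y - f(x, theta)) ** 2))   # S(theta_hat), approximately 1195
```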
Estimation of Variance
• When the estimation procedure converges to a final vector of parameter estimates $\hat{\boldsymbol{\theta}}$, we can obtain an estimate of the error variance $\sigma^2$ from the residual mean square

  $\hat{\sigma}^2 = MS_{Res} = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p} = \dfrac{\sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \hat{\boldsymbol{\theta}}) \right]^2}{n - p} = \dfrac{S(\hat{\boldsymbol{\theta}})}{n - p}$
• We may also estimate the asymptotic (large-sample) covariance matrix of $\hat{\boldsymbol{\theta}}$ by

  $\widehat{\mathrm{var}}(\hat{\boldsymbol{\theta}}) = \hat{\sigma}^2 (\mathbf{Z}^{T}\mathbf{Z})^{-1}$

  where $\mathbf{Z}$ is the matrix of derivatives evaluated at $\hat{\boldsymbol{\theta}}$.

100(1-α)% CI
• Approximate 100(1-α)% CIs for $\theta_1$ and $\theta_2$ are found as follows:

  $\hat{\theta}_1 - t_{\alpha/2,\,n-p}\, se(\hat{\theta}_1) \le \theta_1 \le \hat{\theta}_1 + t_{\alpha/2,\,n-p}\, se(\hat{\theta}_1)$

  and

  $\hat{\theta}_2 - t_{\alpha/2,\,n-p}\, se(\hat{\theta}_2) \le \theta_2 \le \hat{\theta}_2 + t_{\alpha/2,\,n-p}\, se(\hat{\theta}_2)$

  where

  $se(\hat{\theta}_j) = \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_j)}$

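A self-contained sketch of these variance and interval calculations for the puromycin fit, using scipy.optimize.least_squares and the Jacobian evaluated at the final estimates (an alternative route to the same quantities as the Gauss-Newton sketch above):

```python
import numpy as np
from scipy import optimize, stats

# puromycin data
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
              0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([47, 97, 123, 152, 191, 200,
              76, 107, 139, 159, 201, 207], dtype=float)

def resid(theta):
    # residuals for the Michaelis-Menten model
    return y - theta[0] * x / (theta[1] + x)

fit = optimize.least_squares(resid, x0=[205.0, 0.08])
n, p = y.size, fit.x.size

# residual mean square: sigma2_hat = S(theta_hat) / (n - p)
sigma2_hat = np.sum(fit.fun ** 2) / (n - p)

# asymptotic covariance matrix sigma2_hat * (Z'Z)^{-1}, Z evaluated at theta_hat
Z = -fit.jac                          # least_squares returns d(resid)/d(theta) = -Z
cov = sigma2_hat * np.linalg.inv(Z.T @ Z)
se = np.sqrt(np.diag(cov))

# approximate 95% confidence intervals: theta_hat_j +/- t_{0.025, n-p} * se_j
t = stats.t.ppf(0.975, df=n - p)
for est, s in zip(fit.x, se):
    print(est - t * s, est + t * s)
```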
Puromycin Data
• Bates and Watts (1988) use the Gauss-Newton method to fit the Michaelis-Menten model to the puromycin data using the starting values $\theta_{10} = 205$ and $\theta_{20} = 0.08$ (or $\boldsymbol{\theta}_0^{T} = [205, 0.08]$). At this starting point, the residual sum of squares is $S(\boldsymbol{\theta}_0) = 3155$. Note that

  $\dfrac{\partial f(x, \theta_1, \theta_2)}{\partial \theta_1} = \dfrac{x}{\theta_2 + x} \quad \text{and} \quad \dfrac{\partial f(x, \theta_1, \theta_2)}{\partial \theta_2} = \dfrac{-\theta_1 x}{(\theta_2 + x)^2}$

Parameter Estimation (1/4)
• Since the first observation on x is $x_1 = 0.02$, we have

  $Z_{11}^{0} = \left. \dfrac{x_1}{\theta_2 + x_1} \right|_{\theta_2 = 0.08} = \dfrac{0.02}{0.08 + 0.02} = 0.2000$

  $Z_{12}^{0} = \left. \dfrac{-\theta_1 x_1}{(\theta_2 + x_1)^2} \right|_{\theta_1 = 205,\ \theta_2 = 0.08} = \dfrac{(-205)(0.02)}{(0.08 + 0.02)^2} = -410.00$

Parameter Estimation (2/4)
i    x_i    y_i    f_i^0    y_i − f_i^0    Z_{i1}^0    Z_{i2}^0
1 0.02 76 41.00 35.00 0.2000 -410.00
2 0.02 47 41.00 6.00 0.2000 -410.00
3 0.06 97 87.86 9.14 0.4286 -627.55
4 0.06 107 87.86 19.14 0.4286 -627.55
5 0.11 123 118.68 4.32 0.5789 -624.65
6 0.11 139 118.68 20.32 0.5789 -624.65
7 0.22 159 150.33 8.67 0.7333 -501.11
8 0.22 152 150.33 1.67 0.7333 -501.11
9 0.56 191 179.38 11.62 0.8750 -280.27
10 0.56 201 179.38 21.62 0.8750 -280.27
11 1.10 207 191.10 15.90 0.9322 -161.95
12 1.10 200 191.10 8.90 0.9322 -161.95
Parameter Estimation (3/4)
• The derivatives $Z_{ij}^{0}$ are now collected into the matrix $\mathbf{Z}_0$ and the vector of increments calculated as

  $\hat{\boldsymbol{\beta}}_0 = \begin{bmatrix} 8.03 \\ -0.017 \end{bmatrix}$

  The revised estimate $\hat{\boldsymbol{\theta}}_1$ is

  $\hat{\boldsymbol{\theta}}_1 = \hat{\boldsymbol{\beta}}_0 + \boldsymbol{\theta}_0 = \begin{bmatrix} 8.03 \\ -0.017 \end{bmatrix} + \begin{bmatrix} 205 \\ 0.08 \end{bmatrix} = \begin{bmatrix} 213.03 \\ 0.063 \end{bmatrix}$
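This first iteration is easy to verify numerically; a short self-contained sketch (data re-entered from the table above):

```python
import numpy as np

# puromycin data in the order of the table above
x = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10])
y = np.array([76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200], dtype=float)

theta0 = np.array([205.0, 0.08])                      # starting values
f0 = theta0[0] * x / (theta0[1] + x)                  # f_i^0
Z0 = np.column_stack([x / (theta0[1] + x),            # Z_i1^0
                      -theta0[0] * x / (theta0[1] + x) ** 2])  # Z_i2^0

# vector of increments and revised estimate
beta0_hat = np.linalg.solve(Z0.T @ Z0, Z0.T @ (y - f0))
theta1_hat = theta0 + beta0_hat
print(beta0_hat)     # approximately [8.03, -0.017]
print(theta1_hat)    # approximately [213.03, 0.063]
```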
Parameter Estimation (4/4)
• The residual sum of squares at this point is $S(\hat{\boldsymbol{\theta}}_1) = 1206$, which is considerably smaller than $S(\boldsymbol{\theta}_0)$. Therefore, $\hat{\boldsymbol{\theta}}_1$ is adopted as the revised estimate of $\boldsymbol{\theta}$, and another iteration is performed.
• The Gauss-Newton algorithm converged at $\hat{\boldsymbol{\theta}}^{T} = [212.7, 0.0641]$ with $S(\hat{\boldsymbol{\theta}}) = 1195$. Therefore, the fitted Michaelis-Menten model obtained by linearization is

  $\hat{y} = \dfrac{\hat{\theta}_1 x}{\hat{\theta}_2 + x} = \dfrac{212.7\,x}{0.0641 + x}$
Residuals
• This is a much better fit to the data than the one obtained from the transformation followed by linear regression.
• Residuals can be obtained from the fitted nonlinear regression model:

  $e_i = y_i - \hat{y}_i = y_i - \dfrac{212.7\,x_i}{0.0641 + x_i}, \quad i = 1, 2, \ldots, 12$

Estimation of Variance (1/2)
• We can obtain an estimate of the error variance $\sigma^2$ from the residual mean square

  $\hat{\sigma}^2 = \dfrac{S(\hat{\boldsymbol{\theta}})}{n - p} = \dfrac{1195}{12 - 2} = 119.5$

  The asymptotic (large-sample) covariance matrix is estimated as

  $\widehat{\mathrm{var}}(\hat{\boldsymbol{\theta}}) = \hat{\sigma}^2 (\mathbf{Z}^{T}\mathbf{Z})^{-1} = 119.5 \begin{bmatrix} 0.4037 & 36.82 \times 10^{-5} \\ 36.82 \times 10^{-5} & 57.36 \times 10^{-8} \end{bmatrix}$

Estimation of Variance (2/2)
• The main diagonal elements of this matrix are approximate variances of the estimates of the regression coefficients. Therefore, approximate standard errors of the coefficients are

  $se(\hat{\theta}_1) = \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_1)} = \sqrt{119.5(0.4037)} = 6.95$

  and

  $se(\hat{\theta}_2) = \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_2)} = \sqrt{119.5(57.36 \times 10^{-8})} = 8.28 \times 10^{-3}$

  and the correlation between $\hat{\theta}_1$ and $\hat{\theta}_2$ is about

  $\dfrac{36.82 \times 10^{-5}}{\sqrt{0.4037\,(57.36 \times 10^{-8})}} \approx 0.77$
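As a quick numeric check of these standard errors and the correlation (using the matrix entries quoted above):

```python
import numpy as np

sigma2_hat = 119.5
ZtZ_inv = np.array([[0.4037,   36.82e-5],
                    [36.82e-5, 57.36e-8]])   # (Z'Z)^{-1} evaluated at theta_hat

cov = sigma2_hat * ZtZ_inv
se = np.sqrt(np.diag(cov))
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(se)     # approximately [6.95, 0.00828]
print(corr)   # approximately 0.77
```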
95% CI
• Approximate 95% CIs for $\theta_1$ and $\theta_2$ are found as follows:

  $\hat{\theta}_1 - t_{0.025,10}\, se(\hat{\theta}_1) \le \theta_1 \le \hat{\theta}_1 + t_{0.025,10}\, se(\hat{\theta}_1)$

  $212.7 - 2.228(6.95) \le \theta_1 \le 212.7 + 2.228(6.95)$

  $197.2 \le \theta_1 \le 228.2$

  and

  $\hat{\theta}_2 - t_{0.025,10}\, se(\hat{\theta}_2) \le \theta_2 \le \hat{\theta}_2 + t_{0.025,10}\, se(\hat{\theta}_2)$

  $0.0641 - 2.228(0.00828) \le \theta_2 \le 0.0641 + 2.228(0.00828)$

  $0.0457 \le \theta_2 \le 0.0825$
MINITAB

Figure 2. Quadratic line plot for puromycin data


Output (1/2)

Try using a different initial value, for example the estimates obtained from the transformed-data regression.

Output (2/2)

$R^2$ is not reported for nonlinear regression because the model is nonlinear in the parameters. Instead, $S = \hat{\sigma}$ is used to compare candidate models and find the best one.
