
Nonlinear Regression

Jerry Dwi Trijoyo Purnomo

INSTITUT TEKNOLOGI SEPULUH NOPEMBER (ITS)


Surabaya - Indonesia

www.its.ac.id
Introduction (1/2)
• The linear regression model

  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

  is linear in the unknown parameters. We may write this model in a general form as

  $y = \mathbf{x}^{T}\boldsymbol{\beta} + \varepsilon = f(\mathbf{x}, \boldsymbol{\beta}) + \varepsilon$

  where $\mathbf{x}^{T} = [1, x_1, x_2, \ldots, x_k]$.
• Note that the derivative of the expectation function with respect to any parameter does not depend on the unknown parameters:

  $\dfrac{\partial f(\mathbf{x}, \boldsymbol{\beta})}{\partial \beta_j} = \dfrac{\partial \left( \beta_0 + \sum_{j=1}^{k} \beta_j x_j \right)}{\partial \beta_j} = x_j$
Introduction (2/2)
• Now consider the nonlinear model

  $y = \theta_1 e^{\theta_2 x} + \varepsilon$

  This model is not linear in the unknown parameters $\theta_1$ and $\theta_2$.
• In general, we will write the nonlinear regression model as

  $y = f(\mathbf{x}, \boldsymbol{\theta}) + \varepsilon$
• The derivatives with respect to $\theta_1$ and $\theta_2$ now involve the unknown parameters:

  $\dfrac{\partial f(x, \boldsymbol{\theta})}{\partial \theta_1} = e^{\theta_2 x} \quad \text{and} \quad \dfrac{\partial f(x, \boldsymbol{\theta})}{\partial \theta_2} = \theta_1 x e^{\theta_2 x}$

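These derivatives are easy to verify by symbolic differentiation; the short sketch below uses SymPy (assumed to be available) for the exponential model.

```python
import sympy as sp

# symbols for the regressor and the two unknown parameters
x, theta1, theta2 = sp.symbols('x theta1 theta2')

# expectation function of the nonlinear model y = theta1 * exp(theta2 * x) + eps
f = theta1 * sp.exp(theta2 * x)

# partial derivatives with respect to the parameters
print(sp.diff(f, theta1))   # exp(theta2*x)
print(sp.diff(f, theta2))   # theta1*x*exp(theta2*x)
```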
Nonlinear Least Squares (1/2)
• The method of LS in linear regression involves minimizing the LS function

  $S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i - \left( \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} \right) \right]^2$
• Now consider the nonlinear regression model

  $y_i = f(\mathbf{x}_i, \boldsymbol{\theta}) + \varepsilon_i$

  where $\mathbf{x}_i^{T} = [1, x_{i1}, x_{i2}, \ldots, x_{ik}]$ for i = 1, 2, …, n.

Nonlinear Least Squares (2/2)
• The LS function is

  $S(\boldsymbol{\theta}) = \sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \boldsymbol{\theta}) \right]^2$
• The LS normal equations are

  $\sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \hat{\boldsymbol{\theta}}) \right] \left[ \dfrac{\partial f(\mathbf{x}_i, \boldsymbol{\theta})}{\partial \theta_j} \right]_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} = 0, \quad j = 1, 2, \ldots, p$

  These normal equations can be very difficult to solve.


Example (1/2)
• Consider the nonlinear regression model

  $y = \theta_1 e^{\theta_2 x} + \varepsilon$

  The LS normal equations for this model are

  $\sum_{i=1}^{n} \left[ y_i - \hat{\theta}_1 e^{\hat{\theta}_2 x_i} \right] e^{\hat{\theta}_2 x_i} = 0$

  $\sum_{i=1}^{n} \left[ y_i - \hat{\theta}_1 e^{\hat{\theta}_2 x_i} \right] \hat{\theta}_1 x_i e^{\hat{\theta}_2 x_i} = 0$

Example (2/2)
• After simplification, the normal equations are

  $\sum_{i=1}^{n} y_i e^{\hat{\theta}_2 x_i} - \hat{\theta}_1 \sum_{i=1}^{n} e^{2\hat{\theta}_2 x_i} = 0$

  $\sum_{i=1}^{n} y_i x_i e^{\hat{\theta}_2 x_i} - \hat{\theta}_1 \sum_{i=1}^{n} x_i e^{2\hat{\theta}_2 x_i} = 0$

  These equations are not linear in $\hat{\theta}_1$ and $\hat{\theta}_2$, and no simple closed-form solution exists. In general, iterative methods must be used to find the values of $\hat{\theta}_1$ and $\hat{\theta}_2$.

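As a concrete illustration of such an iterative solution, the sketch below fits the exponential model to simulated data with scipy.optimize.least_squares; the data values and starting point are made up for illustration only.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# simulated data from y = theta1 * exp(theta2 * x) + error (illustrative values only)
x = np.linspace(0.0, 2.0, 25)
y = 10.0 * np.exp(0.8 * x) + rng.normal(scale=1.0, size=x.size)

def residuals(theta):
    # r_i(theta) = y_i - theta1 * exp(theta2 * x_i)
    return y - theta[0] * np.exp(theta[1] * x)

# iterative least squares starting from a rough initial guess
fit = least_squares(residuals, x0=[5.0, 0.5])
print(fit.x)                   # estimates of theta1 and theta2
print(np.sum(fit.fun ** 2))    # residual sum of squares S(theta_hat)
```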
MLE for Nonlinear Regression
• Consider the model

  $y_i = \theta_1 e^{\theta_2 x_i} + \varepsilon_i$

  If the errors are normally and independently distributed with mean zero and variance $\sigma^2$, then the likelihood function is

  $L(\boldsymbol{\theta}, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\dfrac{1}{2\sigma^2} \sum_{i=1}^{n} \left[ y_i - \theta_1 e^{\theta_2 x_i} \right]^2 \right\}$
• Maximizing this likelihood function is equivalent to minimizing the residual sum of squares. Therefore, in the normal-theory case, LS estimates are the same as ML estimates.

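• To make this equivalence explicit, take the logarithm of the likelihood:

  $\ln L(\boldsymbol{\theta}, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left[ y_i - \theta_1 e^{\theta_2 x_i} \right]^2 = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{S(\boldsymbol{\theta})}{2\sigma^2}$

  For any fixed $\sigma^2$, the first term does not involve $\boldsymbol{\theta}$, so maximizing $\ln L$ over $\boldsymbol{\theta}$ is the same as minimizing $S(\boldsymbol{\theta})$.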
Transformation to a Linear Model (1/2)
• Consider the model

  $y = \theta_1 e^{\theta_2 x} + \varepsilon$

  Now since $E(y) = f(x, \boldsymbol{\theta}) = \theta_1 e^{\theta_2 x}$, we can linearize the expectation function by taking logarithms,

  $\ln E(y) = \ln\theta_1 + \theta_2 x$

  We can rewrite the model as

  $\ln y = \ln\theta_1 + \theta_2 x + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$

  Then we can use simple linear regression to estimate $\beta_0$ and $\beta_1$. Note, however, that the original model has an additive error structure.

Transformation to a Linear Model (2/2)
• If the error structure is multiplicative, say

  $y = \theta_1 e^{\theta_2 x} \varepsilon$

  then taking logarithms will be appropriate, since

  $\ln y = \ln\theta_1 + \theta_2 x + \ln\varepsilon = \beta_0 + \beta_1 x + \varepsilon^{*}$

  and if $\varepsilon^{*}$ follows a normal distribution, all the standard linear regression model properties and associated inference will apply.

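A minimal sketch of this idea on simulated multiplicative-error data (the numbers are illustrative): regress ln y on x by ordinary least squares and back-transform the intercept.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated data with multiplicative error: y = theta1 * exp(theta2 * x) * eps
x = np.linspace(0.1, 2.0, 30)
y = 10.0 * np.exp(0.8 * x) * np.exp(rng.normal(scale=0.05, size=x.size))

# simple linear regression of ln(y) on x: ln y = beta0 + beta1 * x + eps*
beta1, beta0 = np.polyfit(x, np.log(y), deg=1)

theta1_hat = np.exp(beta0)   # beta0 = ln(theta1)
theta2_hat = beta1           # beta1 = theta2
print(theta1_hat, theta2_hat)
```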
The Puromycin Data
• Bates and Watts (1988) use the Michaelis-Menten model for chemical kinetics to relate the initial velocity of an enzymatic reaction to the substrate concentration x. The model is

  $y = \dfrac{\theta_1 x}{x + \theta_2} + \varepsilon$

  The expectation function can be linearized easily, since

  $\dfrac{1}{f(x, \boldsymbol{\theta})} = \dfrac{x + \theta_2}{\theta_1 x} = \dfrac{1}{\theta_1} + \dfrac{\theta_2}{\theta_1} \cdot \dfrac{1}{x} = \beta_0 + \beta_1 u$

Original Data
Concentration Velocity
0.02 47
0.06 97
0.11 123
0.22 152
0.56 191
1.10 200
0.02 76
0.06 107
0.11 139
0.22 159
0.56 201
1.10 207
Scatter Plot

Figure 1. Scatter plot of puromycin data
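A plot like Figure 1 can be reproduced directly from the data table above; the sketch below uses matplotlib (assumed to be available).

```python
import numpy as np
import matplotlib.pyplot as plt

# puromycin data from the table above
conc = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
                 0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
vel = np.array([47, 97, 123, 152, 191, 200,
                76, 107, 139, 159, 201, 207], dtype=float)

plt.scatter(conc, vel)
plt.xlabel("Concentration")
plt.ylabel("Velocity")
plt.title("Puromycin data")
plt.show()
```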


Output (Original Data)

Transformation Procedure
• So we are tempted to fit the linear model

  $y^{*} = \beta_0 + \beta_1 u + \varepsilon$

  where

  $y^{*} = \dfrac{1}{y}, \quad u = \dfrac{1}{x}, \quad \beta_0 = \dfrac{1}{\theta_1}, \quad \beta_1 = \dfrac{\theta_2}{\theta_1}$

Data After Transformation
Concentration_New (u = 1/Concentration)    Velocity_New (y* = 1/Velocity)
50.000000 0.021277
16.666667 0.010309
9.090909 0.008130
4.545455 0.006579
1.785714 0.005236
0.909091 0.005000
50.000000 0.013158
16.666667 0.009346
9.090909 0.007194
4.545455 0.006289
1.785714 0.004975
0.909091 0.004831
Output (Transformed Data)

Regression Model (Transformed Data)
• The regression model is

  $\hat{y}^{*} = 0.005107 + 0.0002472\,u$
• We have

  $0.005107 = \dfrac{1}{\hat{\theta}_1} \quad \text{and} \quad 0.0002472 = \dfrac{\hat{\theta}_2}{\hat{\theta}_1}$

  and so we can estimate $\theta_1$ and $\theta_2$ in the original model as

  $\hat{\theta}_1 = 195.81 \quad \text{and} \quad \hat{\theta}_2 = 0.04841$

  (These estimates could also serve as starting values for an iterative fitting procedure.)
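This transformation step is easy to reproduce; a minimal sketch using the puromycin data (entered again here so the snippet is self-contained):

```python
import numpy as np

# puromycin data: substrate concentration and initial velocity
conc = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
                 0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
vel = np.array([47, 97, 123, 152, 191, 200,
                76, 107, 139, 159, 201, 207], dtype=float)

# reciprocal transformation: regress y* = 1/y on u = 1/x
u = 1.0 / conc
ystar = 1.0 / vel
beta1, beta0 = np.polyfit(u, ystar, deg=1)   # slope, intercept

theta1_hat = 1.0 / beta0            # beta0 = 1/theta1
theta2_hat = beta1 * theta1_hat     # beta1 = theta2/theta1
print(theta1_hat, theta2_hat)       # roughly 195.8 and 0.0484
```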
Parameter Estimation (1/5)
• A method widely used in computer algorithms for nonlinear regression is linearization of the nonlinear function followed by the Gauss-Newton iteration method of parameter estimation.
• Linearization is accomplished by a Taylor series expansion of $f(\mathbf{x}_i, \boldsymbol{\theta})$ about the point $\boldsymbol{\theta}_0^{T} = [\theta_{10}, \theta_{20}, \ldots, \theta_{p0}]$ with only the linear terms retained.

Parameter Estimation (2/5)
• This yields

  $f(\mathbf{x}_i, \boldsymbol{\theta}) = f(\mathbf{x}_i, \boldsymbol{\theta}_0) + \sum_{j=1}^{p} \left[ \dfrac{\partial f(\mathbf{x}_i, \boldsymbol{\theta})}{\partial \theta_j} \right]_{\boldsymbol{\theta} = \boldsymbol{\theta}_0} (\theta_j - \theta_{j0})$
• If we set

  $f_i^{0} = f(\mathbf{x}_i, \boldsymbol{\theta}_0)$, where $\boldsymbol{\theta}_0$ is the initial value of $\boldsymbol{\theta}$,

  $\beta_j^{0} = \theta_j - \theta_{j0}$

  $Z_{ij}^{0} = \left[ \dfrac{\partial f(\mathbf{x}_i, \boldsymbol{\theta})}{\partial \theta_j} \right]_{\boldsymbol{\theta} = \boldsymbol{\theta}_0}$

Parameter Estimation (3/5)
• We note that the nonlinear regression model can be written as

  $y_i - f_i^{0} = \sum_{j=1}^{p} \beta_j^{0} Z_{ij}^{0} + \varepsilon_i, \quad i = 1, 2, \ldots, n$

  That is, we now have a linear regression model. We usually call $\boldsymbol{\theta}_0$ the starting values for the parameters.
• We may rewrite this equation as

  $\mathbf{y}_0 = \mathbf{Z}_0 \boldsymbol{\beta}_0 + \boldsymbol{\varepsilon}$
• So the estimate of $\boldsymbol{\beta}_0$ is

  $\hat{\boldsymbol{\beta}}_0 = (\mathbf{Z}_0^{T}\mathbf{Z}_0)^{-1}\mathbf{Z}_0^{T}\mathbf{y}_0 = (\mathbf{Z}_0^{T}\mathbf{Z}_0)^{-1}\mathbf{Z}_0^{T}(\mathbf{y} - \mathbf{f}_0)$

Parameter Estimation (4/5)
• Now since $\boldsymbol{\beta}_0 = \boldsymbol{\theta} - \boldsymbol{\theta}_0$, we could define

  $\hat{\boldsymbol{\theta}}_1 = \hat{\boldsymbol{\beta}}_0 + \boldsymbol{\theta}_0$

  as revised estimates of $\boldsymbol{\theta}$. Sometimes $\hat{\boldsymbol{\beta}}_0$ is called the vector of increments. We may now use the revised estimates $\hat{\boldsymbol{\theta}}_1$ in place of $\boldsymbol{\theta}_0$ and produce another set of revised estimates, say $\hat{\boldsymbol{\theta}}_2$, and so forth.
• In general, at the kth iteration we have

  $\hat{\boldsymbol{\theta}}_{k+1} = \hat{\boldsymbol{\theta}}_k + \hat{\boldsymbol{\beta}}_k = \hat{\boldsymbol{\theta}}_k + (\mathbf{Z}_k^{T}\mathbf{Z}_k)^{-1}\mathbf{Z}_k^{T}(\mathbf{y} - \mathbf{f}_k)$

Parameter Estimation (5/5)
• where

  $\mathbf{Z}_k = [Z_{ij}^{k}]$

  $\mathbf{f}_k = [f_1^{k}, f_2^{k}, \ldots, f_n^{k}]^{T}$

  $\hat{\boldsymbol{\theta}}_k = [\hat{\theta}_{1k}, \hat{\theta}_{2k}, \ldots, \hat{\theta}_{pk}]^{T}$
• The iterative process continues until convergence, that is, until

  $\left| \left( \hat{\theta}_{j,k+1} - \hat{\theta}_{jk} \right) / \hat{\theta}_{jk} \right| < \delta, \quad j = 1, 2, \ldots, p$

  where $\delta$ is some small number, say $1.0 \times 10^{-6}$.

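A minimal Gauss-Newton iteration written to mirror these formulas, applied to the Michaelis-Menten model and the puromycin data discussed below; this is an illustrative sketch rather than production code.

```python
import numpy as np

# puromycin data: substrate concentration x and initial velocity y
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
              0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([47, 97, 123, 152, 191, 200,
              76, 107, 139, 159, 201, 207], dtype=float)

def f(x, theta):
    # Michaelis-Menten expectation function: theta1 * x / (theta2 + x)
    return theta[0] * x / (theta[1] + x)

def jacobian(x, theta):
    # columns are the derivatives of f with respect to theta1 and theta2
    z1 = x / (theta[1] + x)
    z2 = -theta[0] * x / (theta[1] + x) ** 2
    return np.column_stack([z1, z2])

theta = np.array([205.0, 0.08])   # starting values theta_0
delta = 1.0e-6                    # convergence tolerance

for k in range(50):
    Z = jacobian(x, theta)
    resid = y - f(x, theta)
    # vector of increments: beta_hat_k = (Z'Z)^{-1} Z'(y - f_k)
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ resid)
    converged = np.all(np.abs(beta_hat) / np.abs(theta) < delta)
    theta = theta + beta_hat
    if converged:
        break

print(theta)                            # approximately [212.7, 0.0641]
print(np.sum((y - f(x, theta)) ** 2))   # S(theta_hat), approximately 1195
```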
Estimation of Variance
• When the estimation procedure converges to a final vector of parameter estimates $\hat{\boldsymbol{\theta}}$, we can obtain an estimate of the error variance $\sigma^2$ from the residual mean square

  $\hat{\sigma}^2 = MS_{Res} = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p} = \dfrac{\sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \hat{\boldsymbol{\theta}}) \right]^2}{n - p} = \dfrac{S(\hat{\boldsymbol{\theta}})}{n - p}$
• We may also estimate the asymptotic (large-sample) covariance matrix of $\hat{\boldsymbol{\theta}}$ by

  $\widehat{\mathrm{var}}(\hat{\boldsymbol{\theta}}) = \hat{\sigma}^2 (\mathbf{Z}^{T}\mathbf{Z})^{-1}$

  where $\mathbf{Z}$ is the matrix of derivatives evaluated at $\hat{\boldsymbol{\theta}}$.

100(1-α)% CI
• Approximate 100(1-α)% CIs for $\theta_1$ and $\theta_2$ are found as follows:

  $\hat{\theta}_1 - t_{\alpha/2,\,n-p}\, se(\hat{\theta}_1) \le \theta_1 \le \hat{\theta}_1 + t_{\alpha/2,\,n-p}\, se(\hat{\theta}_1)$

  and

  $\hat{\theta}_2 - t_{\alpha/2,\,n-p}\, se(\hat{\theta}_2) \le \theta_2 \le \hat{\theta}_2 + t_{\alpha/2,\,n-p}\, se(\hat{\theta}_2)$

  where

  $se(\hat{\theta}_j) = \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_j)}$

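A self-contained sketch of these variance and interval calculations for the puromycin fit, using scipy.optimize.least_squares and the Jacobian evaluated at the final estimates (an alternative route to the same quantities as the Gauss-Newton sketch above):

```python
import numpy as np
from scipy import optimize, stats

# puromycin data
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10,
              0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([47, 97, 123, 152, 191, 200,
              76, 107, 139, 159, 201, 207], dtype=float)

def resid(theta):
    # residuals for the Michaelis-Menten model
    return y - theta[0] * x / (theta[1] + x)

fit = optimize.least_squares(resid, x0=[205.0, 0.08])
n, p = y.size, fit.x.size

# residual mean square: sigma2_hat = S(theta_hat) / (n - p)
sigma2_hat = np.sum(fit.fun ** 2) / (n - p)

# asymptotic covariance matrix sigma2_hat * (Z'Z)^{-1}, Z evaluated at theta_hat
Z = -fit.jac                          # least_squares returns d(resid)/d(theta) = -Z
cov = sigma2_hat * np.linalg.inv(Z.T @ Z)
se = np.sqrt(np.diag(cov))

# approximate 95% confidence intervals: theta_hat_j +/- t_{0.025, n-p} * se_j
t = stats.t.ppf(0.975, df=n - p)
for est, s in zip(fit.x, se):
    print(est - t * s, est + t * s)
```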
Puromycin Data
• Bates and Watts (1988) use the Gauss-Newton method to fit the Michaelis-Menten model to the puromycin data using the starting values $\theta_{10} = 205$ and $\theta_{20} = 0.08$ (or $\boldsymbol{\theta}_0^{T} = [205, 0.08]$). At this starting point, the residual sum of squares is $S(\boldsymbol{\theta}_0) = 3155$. Note that

  $\dfrac{\partial f(x, \theta_1, \theta_2)}{\partial \theta_1} = \dfrac{x}{\theta_2 + x} \quad \text{and} \quad \dfrac{\partial f(x, \theta_1, \theta_2)}{\partial \theta_2} = \dfrac{-\theta_1 x}{(\theta_2 + x)^2}$

Parameter Estimation (1/4)
• Since the first observation on x is $x_1 = 0.02$, we have

  $Z_{11}^{0} = \left. \dfrac{x_1}{\theta_2 + x_1} \right|_{\theta_2 = 0.08} = \dfrac{0.02}{0.08 + 0.02} = 0.2000$

  $Z_{12}^{0} = \left. \dfrac{-\theta_1 x_1}{(\theta_2 + x_1)^2} \right|_{\theta_1 = 205,\ \theta_2 = 0.08} = \dfrac{(-205)(0.02)}{(0.08 + 0.02)^2} = -410.00$

Parameter Estimation (2/4)
i    x_i    y_i    f_i^0    y_i − f_i^0    Z_{i1}^0    Z_{i2}^0
1 0.02 76 41.00 35.00 0.2000 -410.00
2 0.02 47 41.00 6.00 0.2000 -410.00
3 0.06 97 87.86 9.14 0.4286 -627.55
4 0.06 107 87.86 19.14 0.4286 -627.55
5 0.11 123 118.68 4.32 0.5789 -624.65
6 0.11 139 118.68 20.32 0.5789 -624.65
7 0.22 159 150.33 8.67 0.7333 -501.11
8 0.22 152 150.33 1.67 0.7333 -501.11
9 0.56 191 179.38 11.62 0.8750 -280.27
10 0.56 201 179.38 21.62 0.8750 -280.27
11 1.10 207 191.10 15.90 0.9322 -161.95
12 1.10 200 191.10 8.90 0.9322 -161.95
Parameter Estimation (3/4)
• The derivatives $Z_{ij}^{0}$ are now collected into the matrix $\mathbf{Z}_0$ and the vector of increments calculated as

  $\hat{\boldsymbol{\beta}}_0 = \begin{bmatrix} 8.03 \\ -0.017 \end{bmatrix}$

  The revised estimate $\hat{\boldsymbol{\theta}}_1$ is

  $\hat{\boldsymbol{\theta}}_1 = \hat{\boldsymbol{\beta}}_0 + \boldsymbol{\theta}_0 = \begin{bmatrix} 8.03 \\ -0.017 \end{bmatrix} + \begin{bmatrix} 205 \\ 0.08 \end{bmatrix} = \begin{bmatrix} 213.03 \\ 0.063 \end{bmatrix}$
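This first iteration is easy to verify numerically; a short self-contained sketch (data re-entered from the table above):

```python
import numpy as np

# puromycin data in the order of the table above
x = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10])
y = np.array([76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200], dtype=float)

theta0 = np.array([205.0, 0.08])                      # starting values
f0 = theta0[0] * x / (theta0[1] + x)                  # f_i^0
Z0 = np.column_stack([x / (theta0[1] + x),            # Z_i1^0
                      -theta0[0] * x / (theta0[1] + x) ** 2])  # Z_i2^0

# vector of increments and revised estimate
beta0_hat = np.linalg.solve(Z0.T @ Z0, Z0.T @ (y - f0))
theta1_hat = theta0 + beta0_hat
print(beta0_hat)     # approximately [8.03, -0.017]
print(theta1_hat)    # approximately [213.03, 0.063]
```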
Parameter Estimation (4/4)
• The residual sum of squares at this point is $S(\hat{\boldsymbol{\theta}}_1) = 1206$, which is considerably smaller than $S(\boldsymbol{\theta}_0)$. Therefore, $\hat{\boldsymbol{\theta}}_1$ is adopted as the revised estimate of $\boldsymbol{\theta}$, and another iteration is performed.
• The Gauss-Newton algorithm converged at $\hat{\boldsymbol{\theta}}^{T} = [212.7, 0.0641]$ with $S(\hat{\boldsymbol{\theta}}) = 1195$. Therefore, the fitted Michaelis-Menten model obtained by linearization is

  $\hat{y} = \dfrac{\hat{\theta}_1 x}{\hat{\theta}_2 + x} = \dfrac{212.7\,x}{0.0641 + x}$
Residuals
• This is a much better fit to the data than the one obtained from the transformation followed by linear regression.
• Residuals can be obtained from the fitted nonlinear regression model:

  $e_i = y_i - \hat{y}_i = y_i - \dfrac{212.7\,x_i}{0.0641 + x_i}, \quad i = 1, 2, \ldots, 12$

Estimation of Variance (1/2)
• We can obtain an estimate of the error variance $\sigma^2$ from the residual mean square

  $\hat{\sigma}^2 = \dfrac{S(\hat{\boldsymbol{\theta}})}{n - p} = \dfrac{1195}{12 - 2} = 119.5$

  The asymptotic (large-sample) covariance matrix is estimated as

  $\widehat{\mathrm{var}}(\hat{\boldsymbol{\theta}}) = \hat{\sigma}^2 (\mathbf{Z}^{T}\mathbf{Z})^{-1} = 119.5 \begin{bmatrix} 0.4037 & 36.82 \times 10^{-5} \\ 36.82 \times 10^{-5} & 57.36 \times 10^{-8} \end{bmatrix}$

Estimation of Variance (2/2)
• The main diagonal elements of this matrix are approximate variances of the estimates of the regression coefficients. Therefore, approximate standard errors of the coefficients are

  $se(\hat{\theta}_1) = \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_1)} = \sqrt{119.5(0.4037)} = 6.95$

  and

  $se(\hat{\theta}_2) = \sqrt{\widehat{\mathrm{var}}(\hat{\theta}_2)} = \sqrt{119.5(57.36 \times 10^{-8})} = 8.28 \times 10^{-3}$

  and the correlation between $\hat{\theta}_1$ and $\hat{\theta}_2$ is about

  $\dfrac{36.82 \times 10^{-5}}{\sqrt{0.4037\,(57.36 \times 10^{-8})}} \approx 0.77$
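As a quick numeric check of these standard errors and the correlation (using the matrix entries quoted above):

```python
import numpy as np

sigma2_hat = 119.5
ZtZ_inv = np.array([[0.4037,   36.82e-5],
                    [36.82e-5, 57.36e-8]])   # (Z'Z)^{-1} evaluated at theta_hat

cov = sigma2_hat * ZtZ_inv
se = np.sqrt(np.diag(cov))
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(se)     # approximately [6.95, 0.00828]
print(corr)   # approximately 0.77
```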
95% CI
• Approximate 95% CIs for $\theta_1$ and $\theta_2$ are found as follows:

  $\hat{\theta}_1 - t_{0.025,10}\, se(\hat{\theta}_1) \le \theta_1 \le \hat{\theta}_1 + t_{0.025,10}\, se(\hat{\theta}_1)$

  $212.7 - 2.228(6.95) \le \theta_1 \le 212.7 + 2.228(6.95)$

  $197.2 \le \theta_1 \le 228.2$

  and

  $\hat{\theta}_2 - t_{0.025,10}\, se(\hat{\theta}_2) \le \theta_2 \le \hat{\theta}_2 + t_{0.025,10}\, se(\hat{\theta}_2)$

  $0.0641 - 2.228(0.00828) \le \theta_2 \le 0.0641 + 2.228(0.00828)$

  $0.0457 \le \theta_2 \le 0.0825$
MINITAB

Figure 2. Quadratic line plot for puromycin data


Output (1/2)

Try using a different initial value, for example the estimates obtained from the transformed-data regression.

Output (2/2)

$R^2$ is not reported for nonlinear regression because the model is nonlinear in the parameters. Instead, $S = \hat{\sigma}$ is used to compare candidate models and find the best one.
