Professional Documents
Culture Documents
1
What is Regression?
What is regression? Given n data points ( x1, y1), ( x 2, y 2), ... , ( xn, yn)
best fit y f (x ) to the data. The best fit is generally based on
minimizing the sum of the square of the residuals, S r.
y f (x )
Sum of the square of the residuals
n
( x1, y1)
Sr ( yi f ( xi )) 2
i 1
Figure. Basic model for regression
2
Linear Regression-Criterion#1
Given n data points ( x1, y1), ( x 2, y 2), ... , ( xn, yn) best fit y a 0 a 1 x
to the data.
y
x ,y
i i
i y i a 0 a1 x i xn , y n
x2 , y2
x3 , y3
x1 , y 1 i yi a0 a1xi
x
Figure. Linear regression of y vs. x data showing residuals at a typical point, xi .
n
Does minimizing
i 1
i work as a criterion, where i yi ( a 0 a1 xi )
3
Example for Criterion#1
Example: Given the data points (2,4), (3,6), (2,6) and (3,8), best fit
the data to a straight line using Criterion#1
8
x y
6
2.0 4.0
y
3.0 6.0 4
2
2.0 6.0
3.0 8.0 0
0 1 2 3 4
x
4
Linear Regression-Criteria#1
Using y=4x-4 as the regression curve
x y ypredicted ε = y - ypredicted 8
y
3.0 6.0 8.0 -2.0 4
5
Linear Regression-Criteria#1
Using y=6 as a regression curve
y
4
2.0 6.0 6.0 0.0
2
3.0 8.0 6.0 2.0
4 0
i 1
i
0 0 1 2 3 4
x
6
Linear Regression – Criterion #1
i 1
i 0 for both regression models of y=4x-4 and y=6.
7
Linear Regression-Criterion#2
n
Will minimizing
i 1
i work any better?
y
xi , yi
i y i a 0 a1 x i xn , yn
x2 , y2
x3 , y3
x ,y i yi a0 a1xi
1 1
x
Figure. Linear regression of y vs. x data showing residuals at a typical point, xi .
8
Linear Regression-Criteria 2
Using y=4x-4 as the regression curve
y
2.0 4.0 4.0 0.0 4
3.0 6.0 8.0 2.0
2
2.0 6.0 4.0 2.0
0
3.0 8.0 8.0 0.0 0 1 2 3 4
4 x
i 1
i
4
Figure. Regression curve for y=4x-4, y vs. x data
9
Linear Regression-Criteria#2
Using y=6 as a regression curve
x y ypredicted |ε| = |y – 8
ypredicted|
6
2.0 4.0 6.0 2.0
y 4
3.0 6.0 6.0 0.0
2
2.0 6.0 6.0 0.0 0
3.0 8.0 6.0 2.0 0 1 2 3 4
4 x
i 1
i 4
Figure. Regression curve for y=6, y vs. x data
10
Linear Regression-Criterion#2
4
i 1
i 4 for both regression models of y=4x-4 and y=6.
The sum of the errors has been made as small as possible, that
is 4, but the regression model is not unique.
Hence the above criterion of minimizing the sum of the absolute
value of the residuals is also a bad criterion.
4
Can you find a regression line for which
i 1
i 4 and has unique
regression coefficients?
11
Least Squares Criterion
The least squares criterion minimizes the sum of the square of the
residuals in the model, and also produces a unique line.
n n 2
S r i yi a0 a1 xi
2
i 1 i 1
y
xi , yi
i y i a 0 a1 x i xn , y n
x ,y
2 2
x3 , y3
x ,y
i yi a0 a1xi
1 1
x
Figure. Linear regression of y vs. x data showing residuals at a typical point, xi .
12
Finding Constants of Linear Model
n n 2
i 1 i 1
To find a0 and a1 we minimize Sr with respect to a1 and a0 .
S r n
2 yi a 0 a1 xi 1 0
a0 i 1
S r n
2 yi a0 a1 xi xi 0
a1 i 1
giving
n n n
a a x y
i 1
0
i 1
1 i
i 1
i
n n n
(a0 y a1 x)
a x a x
i 1
0 i
i 1
1 i
2
yi xi
i 1
13
Finding Constants of Linear Model
Solving for a 0 and a1 directly yields,
n n n
n x i y i x i y i
i 1 i 1 i 1
a1 2
2
n n
n xi x i
i 1 i 1
and
n n n n
x y x x y
2
i i i i i
a0 i 1 i 1 i 1 i 1
2 (a0 y a1 x)
n
n
n xi2 xi
i 1 i 1
14
Example 1
The torque, T needed to turn the torsion spring of a mousetrap through
an angle, is given below. Find the constants for the model given by
T k1 k 2
Angle, θ Torque, T
Torque (N-m)
0.3
Radians N-m
0.2
0.698132 0.188224
0.959931 0.209138
1.134464 0.230052 0.1
0.5 1 1.5 2
1.570796 0.250965 θ (radians)
1.919862 0.313707
Figure. Data points for Angle vs. Torque data
15
Example 1 cont.
The following table shows the summations needed for the calculations of
the constants in the regression model.
Table. Tabulation of data for calculation of important
summations
Using equations described for
T 2 T
a 0 and a1 with n 5
Radians N-m Radians2 N-m-Radians 5 5 5
0.698132 0.188224 0.487388 0.131405 n i Ti i Ti
0.959931 0.209138 0.921468 0.200758 k2 i 1 i 1 i 1
2
2
5 5
1.134464 0.230052 1.2870 0.260986
n i i
1.570796 0.250965 2.4674 0.394215 i 1 i 1
51.5896 6.28311.1921
1.919862 0.313707 3.6859 0.602274
5
i 1
6.2831 1.1921 8.8491 1.5896 58.8491 6.2831
2
9.6091 10 2 N-m/rad
16
Example 1 cont.
Use the average torque and average angle to calculate k1
5 5
_ T i _ i
T i 1
i 1
n n
1.1921 6.2831
5 5
2.3842 10 1 1.2566
Using,
_ _
k1 T k 2
2.3842 10 1 (9.6091 10 2 )(1.2566)
1.1767 10 1 N-m
17
Example 1 Results
Using linear regression, a trend line is found from the data
Can you find the energy in the spring if it is twisted from 0 to 180 degrees?
18
Example 2
To find the longitudinal modulus of composite, the following data is
collected. Find the longitudinal modulus, E using the regression model
Table. Stress vs. Strain data E and the sum of the square of the
Strain Stress residuals.
(%) (MPa)
0 0 3.0E+09
0.183 306
0.36 612
Stress, σ (Pa)
2.0E+09
0.5324 917
0.702 1223
0.867 1529 1.0E+09
1.0244 1835
1.1774 2140
0.0E+00
1.329 2446 0 0.005 0.01 0.015 0.02
1.479 2752 Strain, ε (m/m)
1.5 2767
Figure. Data points for Stress vs. Strain data
1.56 2896
19
Example 2 cont.
Residual at each point is given by
i i E i
The sum of the square of the residuals then is
n
S r i2
i 1
n
i E i
2
i 1
i i
Therefore E i 1
n
2
i
i 1
20
Example 2 cont.
Table. Summation data for regression model
With
i ε σ ε2 εσ
12
1
2
0.0000
1.8300×10−3
0.0000
3.0600×108
0.0000
3.3489×10−6
0.0000
5.5998×105
i 1
i
2
1.2764 10 3
i 1
1.2764×10−3 2.3337×108 182.84 GPa
21
Example 2 Results
The equation 182.84 describes the data.
22
Nonlinear Regression
Nonlinear Regression
Some popular nonlinear regression models:
24
Nonlinear Regression
Given n data points ( x1, y1), ( x 2, y 2), ... , ( xn, yn) best fit y f (x )
to the data, where f (x ) is a nonlinear function of x .
( xn , yn )
( x2 , y2 )
y f (x)
( xi , yi )
yi f ( xi )
( x1, y1 )
25
Regression
Exponential Model
26
Exponential Model
Given ( x1 , y1 ), ( x 2 , y 2 ), ... , ( x n , y n ) best fit y ae to the data.
bx
(x , y )
1 1
y aebx
yi aebxi
(x , y )
i i
(x , y )
2 2 ( xn , yn )
27
Finding Constants of Exponential Model
The sum of the square of the residuals is defined as
n
bxi 2
Sr yi ae
i 1
Differentiate with respect to a and b
S r
n
2 y i ae bxi e bxi 0
a i 1
S r
n
2 y i ae bxi
axi e 0
bxi
b i 1
28
Finding Constants of Exponential Model
Rewriting the equations, we obtain
n n
bxi 2bxi
yi e ae 0
i 1 i 1
n n
bxi 2bxi
y i xi e a xi e 0
i 1 i 1
29
Finding constants of Exponential Model
Solving the first equation for a yields
n
bxi
yi e
i 1
a n
2bxi
e
i 1
t(hrs) 0 1 3 5 7 9
1.000 0.891 0.708 0.562 0.447 0.355
31
Example 1-Exponential Model cont.
The relative intensity is related to time by the equation
t
Ae
Find:
a) The value of the regression constants A and
b) The half-life of Technium-99m
c) Radiation intensity after 24 hours
32
Plot of data
33
Constants of the Model
Ae t
i
e t i
A i 1
n
e 2 ti
i 1
34
Setting up the Equation in MATLAB
n
ti
n
ie n
f i t i e ti
i 1
n
ti e
2ti
0
i 1 2ti i 1
e
i 1
t (hrs) 0 1 3 5 7 9
γ 1.000 0.891 0.708 0.562 0.447 0.355
35
Setting up the Equation in MATLAB
n
ti
n
ie n
f i t i e ti
i 1
n
i
t e 2ti
0
i 1 2ti i 1
e
i 1
0.1151
t=[0 1 3 5 7 9]
gamma=[1 0.891 0.708 0.562 0.447 0.355]
syms lamda
sum1=sum(gamma.*t.*exp(lamda*t));
sum2=sum(gamma.*exp(lamda*t));
sum3=sum(exp(2*lamda*t));
sum4=sum(t.*exp(2*lamda*t));
f=sum1-sum2/sum3*sum4;
36
Calculating the Other Constant
The value of A can now be calculated
6
e i
ti
A i 1
6 0.9998
e 2 ti
i 1
0.9998 e 0.1151t
37
Plot of data and regression curve
0.9998 e 0.1151t
38
Relative Intensity After 24 hrs
The relative intensity of radiation after 24 hours
0.1151 24
0.9998 e
2
6.3160 10
This result implies that only
6.316 102
100 6.317%
0.9998
radioactive intensity is left after 24 hours.
39
Homework
• What is the half-life of Technetium 99m
isotope?
• Write a program in the language of your
choice to find the constants of the
model.
• Compare the constants of this regression
model with the one where the data is
transformed.
t
• What if the model was e ?
40
Polynomial Model
Given ( x1, y1), ( x 2, y 2), ... , ( xn, yn) best fit y a0 a1 x ... am x
m
( xn , yn )
( x2 , y2 )
(x , y ) y a0 a1x am xm
i i
yi f ( xi )
(x1, y1)
41
Polynomial Model cont.
The residual at each data point is given by
E i y i a 0 a1 xi . . . a m xim
The sum of the square of the residuals then is
n
S r Ei2
i 1
y i a 0 a1 xi . . . a m xim
n
2
i 1
42
Polynomial Model cont.
To find the constants of the polynomial model, we set the derivatives
with respect to ai where i 1, m, equal to zero.
S r
2.yi a0 a1 xi . . . am xim (1) 0
n
a0 i 1
S r
2.yi a0 a1 xi . . . am xim ( xi ) 0
n
a1 i 1
S r
2. yi a0 a1 xi . . . am xim ( xim ) 0
n
am i 1
43
Polynomial Model cont.
These equations in matrix form are given by
n
n n n m
xi . xi a
yi
. .
i 1 i 1 0
n ni 1
xi n 2 n m1 a1
xi . . . xi xi yi
i 1 i 1 i 1 . .
. i 1
. . . . . . . . . . . a . . .
m
n
n m n m1 n
xim yi
xi xi . . . xi2 m
i 1
i 1 i 1 i 1
44
Example 2-Polynomial Model
Regress the thermal expansion coefficient vs. temperature data to
a second order polynomial.
40 6.24×10−6
3.00E-06
−40 5.72×10−6
−120 5.09×10−6 2.00E-06
n n 2 n
n Ti Ti i
i 1 i1 a0 i1
n n 2 n 3 n
i
T Ti Ti a1 Ti i
in1 i 1 i1 i 1
a n
T 2 n 3 n 4 2 T 2
i Ti Ti
i i
i 1 i1 i1 i 1
46
Example 2-Polynomial Model cont.
The necessary summations are as follows
7
Table. Data points for temperature vs.
Temperature, T Coefficient of
α
T
i 1
i
2
2.5580 105
(oF) thermal expansion,
7
T
α (in/in/oF)
i
3
7.0472 10 7
80 6.47×10−6
i 1
40 6.24×10−6 7
−40 5.72×10−6 T
i 1
i
4
2.1363 1010
−120 5.09×10−6
7
−200 4.30×10−6
i 3.3600 10 5
−280 3.33×10−6
i 1
−340 2.45×10−6 7
T
i 1
i i 2.6978 10 3
7
T
i 1
i
2
i 8.5013 10 1
47
Example 2-Polynomial Model cont.
Using these summations, we can now calculate a0 ,a1 , a2
7.0000 8.6000 10 2 2.5800 10 5 a 0 3.3600 10 5
8 .600 10 2
2.5800 10 5 7.0472 10 7 a1 2.6978 10 3
2.5800 10 5 7.0472 10 7 2.1363 1010 a 2 8.5013 10 1
Solving the above system of simultaneous linear equations we have
a 0 6.0217 10
6
a 6.2782 10 9
1
a 2 1.2218 10
11
48
Transformation of Data
To find the constants of many nonlinear models, it results in solving
simultaneous nonlinear equations. For mathematical convenience,
some of the data for such models can be transformed. For example,
the data for an exponential model can be transformed.
As shown in the previous example, many chemical and physical processes
are governed by the equation,
y aebx
Taking the natural log of both sides yields,
ln y ln a bx
Let z ln y and a 0 ln a
We now have a linear regression model where z a 0 a1 x
(implying) a e ao with a1 b
49
Linearization of data cont.
Using linear model regression methods,
n n n
n xi z i xi z i
a1 i 1 i 1 i 1
2
n
n
n xi2 xi
i 1 i 1
_ _
a 0 z a1 x
Once ao , a1 are found, the original constants of the model are found as
b a1
a e a0
50
Example 3-Linearization of data
Many patients get concerned when a test involves injection of a radioactive
material. For example for scanning a gallbladder, a few drops of Technetium-
99m isotope is used. Half of the technetium-99m would be gone in about 6
hours. It, however, takes about 24 hours for the radiation levels to reach what
we are exposed to in day-to-day activities. Below is given the relative intensity
of radiation as a function of time.
1
Table. Relative intensity of radiation as a function
0
0 5 10
Time t, (hours)
52
Example 3-Linearization of data cont.
Exponential model given as,
Ae t
ln ln A t
Assuming z ln , ao ln A and a1 we obtain
z a0 a1t
This is a linear relationship between z and t
53
Example 3-Linearization of data cont.
Using this linear relationship, we can calculate a0 , a1 where
n n n
n t z t z
i i i i
a1 i 1 i 1 i 1
2
n
n
n t 2 t
i 1
1
i 1 i
and
a0 z a1t
a1
a0
Ae
54
Example 3-Linearization of Data cont.
Summations for data linearization are as follows
i ti i zi ln i ti zi t 2 t
i 1
i 25.000
i
6
z
1 0 1 0.00000 0.0000 0.0000
2 1 0.891 −0.11541 −0.11541 1.0000 i
2.8778
3 3 0.708 −0.34531 −1.0359 9.0000 i 1
4 5 0.562 −0.57625 −2.8813 25.000 6
5
6
7
9
0.447
0.355
−0.80520
−1.0356
−5.6364
−9.3207
49.000
81.000 t z
i 1
i i
18.990
6
25.000 −2.8778 −18.990 165.00
t i
2
165.00
i 1
55
Example 3-Linearization of Data cont.
Calculating a0 , a1
6 18.990 25 2.8778
a1 0.11505
6165.00 25
2
2.8778 25
a0 0.11505 2.6150 10 4
6 6
Since
a0 ln A
A e a0
2.6150104
e 0.99974
also
a1 0.11505
56
Example 3-Linearization of Data cont.
Resulting model is 0 .99974 e
0 . 11505 t
1
0.99974 e 0.11505t
Relative
Intensity
0.5
of
Radiation,
0
0 5 10
Time, t (hrs)
57
Example 3-Linearization of Data cont.
The regression formula is then
0 .99974 e 0.11505 t
1
b) Half life of Technetium 99 is when
2 t 0
1
0 .99974 e 0 .11505 t 0 .99974 e 0 .11505 0
2
e 0 .11508 t 0 .5
0 .11505 t ln 0 .5
t 6 .0248 hours
58
Example 3-Linearization of Data cont.
c) The relative intensity of radiation after 24 hours is then
0.99974e 0.1150524
0.063200
6.3200 10 2
This implies that only 100 6.3216% of the radioactive
0.99983
material is left after 24 hours.
59
Comparison
Comparison of exponential model with and without data linearization:
Table. Comparison for exponential model with and without data
linearization.
60