
Definition of Regression

 Regression analysis is an important tool in statistics and data science. It helps us understand and measure how variables are related and how they affect one another.

 Regression analysis is essential for understanding data and making informed decisions.

 Regression analysis aims to find the best model that shows how one variable depends on others. We do this by drawing a line or curve through the data points so that it matches the actual results as closely as possible.

 We call the differences between the actual and predicted values residuals.
Definition of Regression
 Given 𝑛 data points (𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑛, 𝑦𝑛), we want to find the best fit to the data

$y = f(x)$

 The residual at each point is $E_i = y_i - f(x_i)$


Linear Regression
 Given 𝑛 data points (𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑛, 𝑦𝑛), we want to find the best fit to the data

$y = a_0 + a_1 x$

 Does minimizing $\sum_{i=1}^{n} E_i$ work as a criterion?
Linear Regression
 Example:

 Given the data points (2,4), (3,6), (2,6), (3,8), best fit the data to a straight line using the criterion of minimizing $\sum_{i=1}^{n} E_i$.

x   y
2   4
3   6
2   6
3   8
Linear Regression
 Example:

 To fit a straight line, we choose the 2 points (2,4) and (3,8):

 $4 = a_0 + a_1(2)$

 $8 = a_0 + a_1(3)$

 $y = 4x - 4$ will be the regression line

x   y   y_predicted   E = y − y_predicted
2   4   4             0
3   6   8             −2
2   6   4             2
3   8   8             0

$\sum_{i=1}^{4} E_i = 0$
Linear Regression
 Example:

 To fit a straight line, we choose the 2 points (2,6) and (3,6):

 $6 = a_0 + a_1(2)$

 $6 = a_0 + a_1(3)$

 $y = 6$ will be the regression line

x   y   y_predicted   E = y − y_predicted
2   4   6             −2
3   6   6             0
2   6   6             0
3   8   6             2

$\sum_{i=1}^{4} E_i = 0$
Linear Regression
 Example:

 4
𝑖=1 𝐸𝑖 = 0 for both regression models 𝑦 = 4𝑥 − 4 and 𝑦 = 6

 The sum of the residuals is minimized, in this case it is zero, but the regression model is

not unique

 Thus the criterion of minimizing the sum of the residuals is a bad criterion
Linear Regression
 Will minimizing $\sum_{i=1}^{n} |E_i|$ work better?
Linear Regression
 Example:

 Given the data points (2,4), (3,6), (2,6), (3,8), best fit the data to a straight line using the criterion of minimizing $\sum_{i=1}^{n} |E_i|$.

x   y
2   4
3   6
2   6
3   8
Linear Regression
 Example:

 To fit a straight line, we choose the 2 points (2,4) and (3,8):

 $4 = a_0 + a_1(2)$

 $8 = a_0 + a_1(3)$

 $y = 4x - 4$ will be the regression line

x   y   y_predicted   E = y − y_predicted
2   4   4             0
3   6   8             −2
2   6   4             2
3   8   8             0

$\sum_{i=1}^{4} |E_i| = 4$
Linear Regression
 Example:

 To fit a straight line, we choose the 2 points (2,6) and (3,6):

 $6 = a_0 + a_1(2)$

 $6 = a_0 + a_1(3)$

 $y = 6$ will be the regression line

x   y   y_predicted   E = y − y_predicted
2   4   6             −2
3   6   6             0
2   6   6             0
3   8   6             2

$\sum_{i=1}^{4} |E_i| = 4$
Linear Regression
 Example:

 4
𝑖=1 𝐸𝑖 = 4 for both regression models 𝑦 = 4𝑥 − 4 and 𝑦 = 6

 The sum of the absolute residuals has been made as small as possible, in this case it is

four, but the regression model is not unique

 Thus the criterion of minimizing the sum of the absolute residuals is also a bad criterion
Linear Regression
 The least squares criterion minimizes the sum of the squares of the residuals, and it also produces a unique line:

$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
Linear Regression
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$

 To find 𝑎0 and 𝑎1 we minimize $S_r$ with respect to 𝑎0 and 𝑎1:

 $\dfrac{\partial S_r}{\partial a_0} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-1) = 0$

 $\dfrac{\partial S_r}{\partial a_1} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-x_i) = 0$

 Or, equivalently, the normal equations:

 $n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

 $a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
Linear Regression
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$

 Solving for 𝑎0 and 𝑎1:

$a_1 = \dfrac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$

$a_0 = \dfrac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \bar{y} - a_1 \bar{x}$
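A minimal sketch of these closed-form formulas (in Python; the slides use MATLAB, and the helper name fit_line is ours, not from the slides). Applied to the four points from the earlier examples, it returns the unique least-squares line:

```python
# Closed-form least-squares fit of y = a0 + a1*x, implementing the two
# formulas above. Plain Python; no libraries required.

def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = sy / n - a1 * sx / n        # a0 = y_bar - a1 * x_bar
    return a0, a1

# The four points (2,4), (3,6), (2,6), (3,8) from the earlier examples:
a0, a1 = fit_line([2, 3, 2, 3], [4, 6, 6, 8])
print(a0, a1)   # -> 1.0 2.0, i.e. the unique least-squares line y = 1 + 2x
```

Unlike the $\sum E_i$ and $\sum |E_i|$ criteria, least squares singles out one line for this data set.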
Linear Regression
 Example:

 The torque 𝑇 needed to turn the torsion spring of a mousetrap through an angle 𝜃 is given below. Find the constants for the model $T = k_1 + k_2 \theta$.

Angle, θ (radians)   Torque, T (N·m)
0.698132             0.188224
0.959931             0.209138
1.134464             0.230052
1.570796             0.250965
1.919862             0.313707
Linear Regression
 Example:

 The following table shows the summations needed to calculate the constants in the regression model.

θ (radians)   T (N·m)    θ² (radians²)   Tθ (N·m·radians)
0.698132      0.188224   0.487388        0.131405
0.959931      0.209138   0.921468        0.200758
1.134464      0.230052   1.2870          0.260986
1.570796      0.250965   2.4674          0.394215
1.919862      0.313707   3.6859          0.602274
Σ = 6.2831    1.1921     8.8491          1.5896
Linear Regression
 Example:

 $k_2 = \dfrac{5\sum_{i=1}^{5} T_i \theta_i - \sum_{i=1}^{5} \theta_i \sum_{i=1}^{5} T_i}{5\sum_{i=1}^{5} \theta_i^2 - \left(\sum_{i=1}^{5} \theta_i\right)^2}$

 $k_2 = \dfrac{5(1.5896) - (6.2831)(1.1921)}{5(8.8491) - (6.2831)^2} = 9.6091 \times 10^{-2}$

 $k_1 = \bar{T} - k_2 \bar{\theta} = \dfrac{\sum_{i=1}^{5} T_i}{n} - k_2 \dfrac{\sum_{i=1}^{5} \theta_i}{n}$

 $k_1 = \dfrac{1.1921}{5} - \dfrac{(9.6091 \times 10^{-2})(6.2831)}{5} = 1.1767 \times 10^{-1}$
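As a check, a short sketch (Python; variable names are ours) reproduces these constants from the tabulated data:

```python
# Mousetrap example: fit T = k1 + k2*theta with the same closed-form
# least-squares formulas, using the data from the table above.
theta = [0.698132, 0.959931, 1.134464, 1.570796, 1.919862]   # radians
T = [0.188224, 0.209138, 0.230052, 0.250965, 0.313707]       # N-m

n = len(theta)
s_x, s_y = sum(theta), sum(T)
s_xy = sum(x * y for x, y in zip(theta, T))
s_xx = sum(x * x for x in theta)

k2 = (n * s_xy - s_x * s_y) / (n * s_xx - s_x ** 2)
k1 = s_y / n - k2 * s_x / n
print(k2, k1)   # -> approximately 9.609e-02 and 1.177e-01
```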
Linear Regression
 Given 𝑛 data points (𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑛, 𝑦𝑛), we want to find the best fit to the data

$y = a_1 x$ (special case with no intercept 𝑎0)

 Is $a_1 = \dfrac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$ still correct? Can it be used here?

 Residual at each point: $\varepsilon_i = y_i - a_1 x_i$

 The least squares criterion:

$S_r = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - a_1 x_i)^2$
Linear Regression
 Differentiate with respect to 𝑎1:

 $\dfrac{dS_r}{da_1} = \sum_{i=1}^{n} 2(y_i - a_1 x_i)(-x_i) = \sum_{i=1}^{n} -2(x_i y_i - a_1 x_i^2) = 0$

 $\dfrac{d^2 S_r}{da_1^2} = \sum_{i=1}^{n} 2 x_i^2 > 0 \;\Rightarrow\; a_1$ corresponds to a local minimum

 $a_1 = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$ when there is no intercept 𝑎0, so the two-parameter formula above does not apply
Linear Regression
 Example:

 To find the longitudinal modulus of a composite, the following data are collected. Find the longitudinal modulus 𝐸 using the regression model $\sigma = E\varepsilon$, and the sum of the squares of the residuals.

Strain (%)   Stress (MPa)
0            0
0.183        306
0.36         612
0.5324       917
0.702        1223
0.867        1529
1.0244       1835
1.1774       2140
1.329        2446
1.479        2752
1.5          2767
1.56         2896
Linear Regression
 Example: the table below converts strain from percent to dimensionless form and stress to Pa, and lists the needed summations.

i    ε               σ (Pa)        ε²              εσ
1    0               0             0               0
2    1.83 × 10⁻³     3.06 × 10⁸    3.3489 × 10⁻⁶   5.5998 × 10⁵
3    3.6 × 10⁻³      6.12 × 10⁸    1.296 × 10⁻⁵    2.2031 × 10⁶
4    5.324 × 10⁻³    9.17 × 10⁸    2.8345 × 10⁻⁵   4.8821 × 10⁶
5    7.02 × 10⁻³     1.223 × 10⁹   4.928 × 10⁻⁵    8.5855 × 10⁶
6    8.67 × 10⁻³     1.529 × 10⁹   7.5169 × 10⁻⁵   1.3256 × 10⁷
7    1.0244 × 10⁻²   1.835 × 10⁹   1.0494 × 10⁻⁴   1.8798 × 10⁷
8    1.1774 × 10⁻²   2.140 × 10⁹   1.3863 × 10⁻⁴   2.5196 × 10⁷
9    1.329 × 10⁻²    2.446 × 10⁹   1.7662 × 10⁻⁴   3.2507 × 10⁷
10   1.479 × 10⁻²    2.752 × 10⁹   2.1874 × 10⁻⁴   4.0702 × 10⁷
11   1.5 × 10⁻²      2.767 × 10⁹   2.25 × 10⁻⁴     4.1505 × 10⁷
12   1.56 × 10⁻²     2.896 × 10⁹   2.4336 × 10⁻⁴   4.5178 × 10⁷
Σ                                  1.2764 × 10⁻³   2.3337 × 10⁸
Linear Regression
 Example:

$E = \dfrac{\sum_{i=1}^{n} \sigma_i \varepsilon_i}{\sum_{i=1}^{n} \varepsilon_i^2} = \dfrac{2.3337 \times 10^{8}}{1.2764 \times 10^{-3}} = 1.8284 \times 10^{11}\ \text{Pa}$

 $E = 182.84$ GPa
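A sketch of this computation (Python; the strains are the table's percentages converted to dimensionless values, the stresses converted to Pa):

```python
# No-intercept model sigma = E*epsilon: E = sum(sigma*eps) / sum(eps^2).
eps = [0, 1.83e-3, 3.6e-3, 5.324e-3, 7.02e-3, 8.67e-3,
       1.0244e-2, 1.1774e-2, 1.329e-2, 1.479e-2, 1.5e-2, 1.56e-2]
sig = [0, 3.06e8, 6.12e8, 9.17e8, 1.223e9, 1.529e9,
       1.835e9, 2.140e9, 2.446e9, 2.752e9, 2.767e9, 2.896e9]   # Pa

E = sum(s * e for s, e in zip(sig, eps)) / sum(e * e for e in eps)
print(E / 1e9)   # -> ~182.84 (GPa)
```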
NonLinear Regression
 We will study some nonlinear regression models:

 Exponential model: $y = a e^{bx}$

 Power model: $y = a x^b$

 Saturation growth model: $y = \dfrac{ax}{b+x}$

 Polynomial model: $y = a_0 + a_1 x + \cdots + a_m x^m$
NonLinear Regression: Exponential
 Given 𝑛 data points (𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑛, 𝑦𝑛), the best fit to the data is $y = f(x) = a e^{bx}$, where 𝑓(𝑥) is a nonlinear function of 𝑥.
NonLinear Regression: Exponential
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - a e^{b x_i})^2$

 To find 𝑎 and 𝑏 we minimize $S_r$ with respect to 𝑎 and 𝑏:

 $\dfrac{\partial S_r}{\partial a} = 2\sum_{i=1}^{n} (y_i - a e^{b x_i})(-e^{b x_i}) = 0$

 $\dfrac{\partial S_r}{\partial b} = 2\sum_{i=1}^{n} (y_i - a e^{b x_i})(-a x_i e^{b x_i}) = 0$

 Rewriting the equations:

 $-\sum_{i=1}^{n} y_i e^{b x_i} + a\sum_{i=1}^{n} e^{2 b x_i} = 0$

 $\sum_{i=1}^{n} y_i x_i e^{b x_i} - a\sum_{i=1}^{n} x_i e^{2 b x_i} = 0$
NonLinear Regression: Exponential
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - a e^{b x_i})^2$

 Solving for 𝑎 and 𝑏:

 From the first equation: $a = \dfrac{\sum_{i=1}^{n} y_i e^{b x_i}}{\sum_{i=1}^{n} e^{2 b x_i}}$ (we need the value of 𝑏 first in order to calculate 𝑎)

 Substituting 𝑎 into the second equation:

$\sum_{i=1}^{n} y_i x_i e^{b x_i} - \dfrac{\sum_{i=1}^{n} y_i e^{b x_i}}{\sum_{i=1}^{n} e^{2 b x_i}} \sum_{i=1}^{n} x_i e^{2 b x_i} = 0$

 The constant 𝑏 can be found through numerical methods such as the bisection method.
NonLinear Regression: Exponential
 Example:

 Many patients get concerned when a test involves injection of a radioactive material. For example, for scanning a gallbladder, a few drops of Technetium-99m isotope are used. Half of the Technetium-99m would be gone in about 6 hours. It, however, takes about 24 hours for the radiation levels to reach what we are exposed to in day-to-day activities. Below is given the relative intensity of radiation as a function of time.

t (hrs)   0       1       3       5       7       9
γ         1.000   0.891   0.708   0.562   0.447   0.355
NonLinear Regression: Exponential
 Example:

 The relative intensity is related to time by the equation $\gamma = A e^{\lambda t}$

 Find the value of the regression constants 𝐴 and 𝜆

t (hrs)   0       1       3       5       7       9
γ         1.000   0.891   0.708   0.562   0.447   0.355
NonLinear Regression: Exponential
 Example:

 $\gamma = A e^{\lambda t}$

$A = \dfrac{\sum_{i=1}^{n} \gamma_i e^{\lambda t_i}}{\sum_{i=1}^{n} e^{2\lambda t_i}}$

$f(\lambda) = \sum_{i=1}^{n} \gamma_i t_i e^{\lambda t_i} - \dfrac{\sum_{i=1}^{n} \gamma_i e^{\lambda t_i}}{\sum_{i=1}^{n} e^{2\lambda t_i}} \sum_{i=1}^{n} t_i e^{2\lambda t_i} = 0$
NonLinear Regression: Exponential
 Example:

$f(\lambda) = \sum_{i=1}^{n} \gamma_i t_i e^{\lambda t_i} - \dfrac{\sum_{i=1}^{n} \gamma_i e^{\lambda t_i}}{\sum_{i=1}^{n} e^{2\lambda t_i}} \sum_{i=1}^{n} t_i e^{2\lambda t_i} = 0$

 To find the solution we implement the problem in MATLAB (setting up the equation); an equivalent sketch is shown below.
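A minimal Python equivalent of that MATLAB setup. The bisection bracket [−0.5, −0.05] is our assumption (f changes sign on it for this data set); any standard root-finder would do:

```python
import math

t = [0, 1, 3, 5, 7, 9]                          # time (hrs)
g = [1.000, 0.891, 0.708, 0.562, 0.447, 0.355]  # relative intensity gamma

def f(lam):
    # f(lambda) from the slide: what remains of the second normal
    # equation after eliminating A.
    s1 = sum(gi * ti * math.exp(lam * ti) for gi, ti in zip(g, t))
    num = sum(gi * math.exp(lam * ti) for gi, ti in zip(g, t))
    den = sum(math.exp(2 * lam * ti) for ti in t)
    s2 = sum(ti * math.exp(2 * lam * ti) for ti in t)
    return s1 - (num / den) * s2

# Bisection, as the slides suggest; assumed bracket with a sign change.
lo, hi = -0.5, -0.05
for _ in range(60):
    mid = (lo + hi) / 2
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
lam = (lo + hi) / 2

A = (sum(gi * math.exp(lam * ti) for gi, ti in zip(g, t))
     / sum(math.exp(2 * lam * ti) for ti in t))
print(lam, A)   # -> approximately -0.1151 and 0.9998
```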
NonLinear Regression: Exponential
 Example:

 Result from MATLAB: $\lambda = -0.1151$

 The value of 𝐴 can now be calculated:

$A = \dfrac{\sum_{i=1}^{n} \gamma_i e^{\lambda t_i}}{\sum_{i=1}^{n} e^{2\lambda t_i}} = 0.9998$

 The exponential regression model is:

$\gamma = A e^{\lambda t} = 0.9998\, e^{-0.1151 t}$

(Figure: plot of the data and the regression curve.)
NonLinear Regression: Polynomial
 Given 𝑛 data points (𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑛, 𝑦𝑛), the best fit to the data is the polynomial $y = a_0 + a_1 x + \cdots + a_m x^m$ (with $m \le n - 2$), which is a nonlinear function of 𝑥.
NonLinear Regression: Polynomial
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m)^2$

 To find 𝑎0, 𝑎1, …, 𝑎𝑚 we minimize $S_r$ with respect to 𝑎0, 𝑎1, …, 𝑎𝑚:

 $\dfrac{\partial S_r}{\partial a_0} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m)(-1) = 0$

 $\dfrac{\partial S_r}{\partial a_1} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m)(-x_i) = 0$

 …

 $\dfrac{\partial S_r}{\partial a_m} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m)(-x_i^m) = 0$
NonLinear Regression: Polynomial
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - \cdots - a_m x_i^m)^2$

 Minimizing $S_r$ with respect to 𝑎0, 𝑎1, …, 𝑎𝑚 leads to a system of normal equations; solving this matrix system yields 𝑎0 through 𝑎𝑚:

$\begin{bmatrix} n & \sum_{i=1}^{n} x_i & \cdots & \sum_{i=1}^{n} x_i^m \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & \cdots & \sum_{i=1}^{n} x_i^{m+1} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_i^m & \sum_{i=1}^{n} x_i^{m+1} & \cdots & \sum_{i=1}^{n} x_i^{2m} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ \vdots \\ \sum_{i=1}^{n} x_i^m y_i \end{bmatrix}$
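A general sketch of forming and solving this system (Python/NumPy; the helper name poly_normal_fit is ours, not from the slides):

```python
import numpy as np

def poly_normal_fit(x, y, m):
    """Fit y = a0 + a1*x + ... + am*x^m by solving the normal equations above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A[j, k] = sum_i x_i^(j+k);  b[j] = sum_i x_i^j * y_i
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)]
                  for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)          # [a0, a1, ..., am]
```

In practice numpy.polyfit(x, y, m) solves the same least-squares problem with better numerical conditioning; the sketch above mirrors the derivation for clarity.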
NonLinear Regression: Polynomial
 Example:

 Regress the thermal expansion coefficient vs. temperature data to a second order polynomial.

Temperature T (°F)   Coefficient of thermal expansion α (in/in/°F)
80                   6.47 × 10⁻⁶
40                   6.24 × 10⁻⁶
−40                  5.72 × 10⁻⁶
−120                 5.09 × 10⁻⁶
−200                 4.30 × 10⁻⁶
−280                 3.33 × 10⁻⁶
−340                 2.45 × 10⁻⁶
NonLinear Regression: Polynomial
 Example:

 We are to fit the data to the second order polynomial regression model: $\alpha = a_0 + a_1 T + a_2 T^2$

$\begin{bmatrix} n & \sum_{i=1}^{n} T_i & \sum_{i=1}^{n} T_i^2 \\ \sum_{i=1}^{n} T_i & \sum_{i=1}^{n} T_i^2 & \sum_{i=1}^{n} T_i^3 \\ \sum_{i=1}^{n} T_i^2 & \sum_{i=1}^{n} T_i^3 & \sum_{i=1}^{n} T_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} \alpha_i \\ \sum_{i=1}^{n} T_i \alpha_i \\ \sum_{i=1}^{n} T_i^2 \alpha_i \end{bmatrix}$
NonLinear Regression: Polynomial
 Example: the required summations over the $n = 7$ data points are

$\sum_{i=1}^{7} T_i = -8.6000 \times 10^{2}$

$\sum_{i=1}^{7} T_i^2 = 2.5800 \times 10^{5}$

$\sum_{i=1}^{7} T_i^3 = -7.0472 \times 10^{7}$

$\sum_{i=1}^{7} T_i^4 = 2.1363 \times 10^{10}$

$\sum_{i=1}^{7} \alpha_i = 3.3600 \times 10^{-5}$

$\sum_{i=1}^{7} T_i \alpha_i = -2.6978 \times 10^{-3}$

$\sum_{i=1}^{7} T_i^2 \alpha_i = 8.5013 \times 10^{-1}$
NonLinear Regression: Polynomial
 Example:

$\begin{bmatrix} 7.0000 & -8.6000 \times 10^{2} & 2.5800 \times 10^{5} \\ -8.6000 \times 10^{2} & 2.5800 \times 10^{5} & -7.0472 \times 10^{7} \\ 2.5800 \times 10^{5} & -7.0472 \times 10^{7} & 2.1363 \times 10^{10} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 3.3600 \times 10^{-5} \\ -2.6978 \times 10^{-3} \\ 8.5013 \times 10^{-1} \end{bmatrix}$

 Solving this system gives 𝑎0, 𝑎1, 𝑎2, and the fitted polynomial regression model is:

 $\alpha = a_0 + a_1 T + a_2 T^2 = 6.0217 \times 10^{-6} + 6.2782 \times 10^{-9}\, T - 1.2218 \times 10^{-11}\, T^2$
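Solving the same 3×3 system numerically (a NumPy sketch; the matrix entries are the rounded sums above, so the solution matches the slide values to rounding):

```python
import numpy as np

# Normal-equation system for the thermal-expansion data (sums from above).
A = np.array([[ 7.0000,    -8.6000e2,   2.5800e5 ],
              [-8.6000e2,   2.5800e5,  -7.0472e7 ],
              [ 2.5800e5,  -7.0472e7,   2.1363e10]])
b = np.array([3.3600e-5, -2.6978e-3, 8.5013e-1])

a0, a1, a2 = np.linalg.solve(A, b)
print(a0, a1, a2)   # -> ~6.0217e-06, ~6.2782e-09, ~-1.2218e-11
```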
NonLinear Regression: Transformation of data
 For many nonlinear models, finding the constants directly requires solving simultaneous nonlinear equations.

 For mathematical convenience, the data for such models can sometimes be transformed.

 For example, the data for an exponential model can be transformed:

 $y = a e^{bx} \Rightarrow \ln y = \ln(a e^{bx}) \Rightarrow \ln y = \ln a + \ln e^{bx} \Rightarrow \ln y = \ln a + bx$

 Let $z = \ln y$, $a_0 = \ln a$ and $a_1 = b$; this is how the exponential model is transformed to a linear one.

 Thus we will have the linear regression $z = a_0 + a_1 x$
NonLinear Regression: Transformation of data
 Using linear model regression methods:

$a_1 = \dfrac{n\sum_{i=1}^{n} z_i x_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} z_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$

 $a_0 = \bar{z} - a_1 \bar{x}$

 Once 𝑎0 and 𝑎1 are found, we go back to the original form of the model:

 $a = e^{a_0}$

 $b = a_1$
NonLinear Regression: Transformation of data
 Example:

 Many patients get concerned when a test involves injection of a radioactive material. For example, for scanning a gallbladder, a few drops of Technetium-99m isotope are used. Half of the Technetium-99m would be gone in about 6 hours. It, however, takes about 24 hours for the radiation levels to reach what we are exposed to in day-to-day activities. Below is given the relative intensity of radiation as a function of time.

t (hrs)   0       1       3       5       7       9
γ         1.000   0.891   0.708   0.562   0.447   0.355
NonLinear Regression: Transformation of data
 Example:

 The relative intensity is related to time by the equation $\gamma = A e^{\lambda t}$

 Find the value of the regression constants 𝐴 and 𝜆

t (hrs)   0       1       3       5       7       9
γ         1.000   0.891   0.708   0.562   0.447   0.355
NonLinear Regression: Transformation of data
 Example:

 $\gamma = A e^{\lambda t}$

 $\ln \gamma = \ln(A e^{\lambda t})$

 $\ln \gamma = \ln A + \lambda t$

 Taking $z = a_0 + a_1 t$ as the linear relationship between $z = \ln \gamma$ and 𝑡:

$a_1 = \dfrac{n\sum_{i=1}^{n} t_i z_i - \sum_{i=1}^{n} t_i \sum_{i=1}^{n} z_i}{n\sum_{i=1}^{n} t_i^2 - \left(\sum_{i=1}^{n} t_i\right)^2}$

 $a_0 = \bar{z} - a_1 \bar{t}$
NonLinear Regression: Transformation of data
 Example:

i   t_i      γ_i     z_i = ln γ_i   t_i z_i    t_i²
1   0        1.000   0.00000        0.00000    0.00000
2   1        0.891   −0.11541       −0.11541   1.00000
3   3        0.708   −0.34531       −1.0359    9.00000
4   5        0.562   −0.57625       −2.8813    25.0000
5   7        0.447   −0.80520       −5.6364    49.0000
6   9        0.355   −1.0356        −9.3207    81.0000
Σ   25.000           −2.8778        −18.990    165.00
NonLinear Regression: Transformation of data
 Example:

 $a_1 = \dfrac{n\sum_{i=1}^{n} t_i z_i - \sum_{i=1}^{n} t_i \sum_{i=1}^{n} z_i}{n\sum_{i=1}^{n} t_i^2 - \left(\sum_{i=1}^{n} t_i\right)^2} = \dfrac{6(-18.990) - 25(-2.8778)}{6(165.00) - (25)^2} = -0.11505$

 $a_0 = \bar{z} - a_1 \bar{t} = \dfrac{-2.8778}{6} - \dfrac{(-0.11505)(25)}{6} = -2.6150 \times 10^{-4}$

 $a_0 = \ln A \;\Rightarrow\; A = e^{a_0} = e^{-2.6150 \times 10^{-4}} = 0.99974$

 $\gamma = A e^{\lambda t} = 0.99974\, e^{-0.11505 t}$
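The entire transformed fit takes only a few lines (a Python sketch using the data above):

```python
import math

t = [0, 1, 3, 5, 7, 9]
g = [1.000, 0.891, 0.708, 0.562, 0.447, 0.355]

z = [math.log(gi) for gi in g]      # transform: z = ln(gamma)
n = len(t)
st, sz = sum(t), sum(z)
stz = sum(ti * zi for ti, zi in zip(t, z))
stt = sum(ti * ti for ti in t)

a1 = (n * stz - st * sz) / (n * stt - st ** 2)   # lambda
a0 = sz / n - a1 * st / n                        # ln(A)
print(a1, math.exp(a0))   # -> approximately -0.11505 and 0.99974
```

Note that this transformed fit minimizes the squared residuals of ln γ, not of γ itself, which is why λ here (−0.11505) differs slightly from the direct nonlinear fit (−0.1151).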
Adequacy of Linear Regression
Given some data, is a straight-line model adequate?
Adequacy of Linear Regression
 Given some data, is a straight-line model adequate?

 We should check (the steps for assessing the adequacy of a linear regression):

 Plot the data and the model

 Find the standard error of estimate

 Calculate the coefficient of determination

 Check whether the model meets the assumption of random errors

 Thus we can answer questions such as: Does the model describe the data adequately? How well does the model predict the response variable?
Adequacy of Linear Regression
 Example:

 Check the adequacy of the straight line model $\alpha = a_0 + a_1 T$ for the given data.

T (°F)   α (μin/in/°F)
−340     2.45
−260     3.58
−180     4.52
−100     5.28
−20      5.86
60       6.36
Adequacy of Linear Regression
 Example:

 1) Plot the data and the model

 $\alpha = a_0 + a_1 T$

 Least-squares regression on the six data points gives:

 $\alpha = 6.0325 + 0.0096964\, T$

T (°F)   α (μin/in/°F)
−340     2.45
−260     3.58
−180     4.52
−100     5.28
−20      5.86
60       6.36
Adequacy of Linear Regression
 Example:

 2) Standard error of estimate:

 $S_{\alpha/T} = \sqrt{\dfrac{S_r}{n-2}}$

 $S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (\alpha_i - a_0 - a_1 T_i)^2$

T_i    α_i    a_0 + a_1 T_i   α_i − a_0 − a_1 T_i
−340   2.45   2.7357          −0.28571
−260   3.58   3.5114          0.068571
−180   4.52   4.2871          0.23286
−100   5.28   5.0629          0.21714
−20    5.86   5.8386          0.021429
60     6.36   6.6143          −0.25429

 $S_r = 0.25283$, so $S_{\alpha/T} = \sqrt{0.25283/4} = 0.25141$

 $\text{Scaled residual} = \dfrac{E_i}{S_{\alpha/T}} = \dfrac{\alpha_i - a_0 - a_1 T_i}{S_{\alpha/T}}$
Adequacy of Linear Regression
 Example:

 2) Standard error of estimate:

 $S_{\alpha/T} = \sqrt{\dfrac{S_r}{n-2}} = 0.25141$

T_i    α_i    Residual   Scaled Residual
−340   2.45   −0.28571   −1.1364
−260   3.58   0.068571   0.27275
−180   4.52   0.23286    0.92622
−100   5.28   0.21714    0.86369
−20    5.86   0.021429   0.085235
60     6.36   −0.25429   −1.0115
Adequacy of Linear Regression
 Example:

 3) Find the coefficient of determination:

 $S_t = \sum_{i=1}^{n} (\alpha_i - \bar{\alpha})^2$

 $S_r = \sum_{i=1}^{n} (\alpha_i - a_0 - a_1 T_i)^2$

 $r^2 = \dfrac{S_t - S_r}{S_t}; \quad 0 \le r^2 \le 1$
Adequacy of Linear Regression
 Example:

 3) Find the coefficient of determination:

T_i    α_i    α_i − ᾱ
−340   2.45   −2.2250
−260   3.58   −1.0950
−180   4.52   0.15500
−100   5.28   0.60500
−20    5.86   1.1850
60     6.36   1.6850

 $\bar{\alpha} = 4.6750$

 $S_t = 10.783$

 $S_r = 0.25283$

 $r^2 = \dfrac{S_t - S_r}{S_t} = 0.97655$
Adequacy of Linear Regression
 Example:

 3) Find the correlation coefficient:

$r = \sqrt{\dfrac{S_t - S_r}{S_t}} = 0.98820 \;\Rightarrow\; \text{very strong relationship}$

Interpretation of |r|:
0.8 to 1.0: very strong relationship
0.6 to 0.8: strong relationship
0.4 to 0.6: moderate relationship
0.2 to 0.4: weak relationship
0.0 to 0.2: very weak or no relationship
Adequacy of Linear Regression
 When the range of the 𝑥 values in a plot of 𝑦 increases, the coefficient of determination tends to increase.

 A vertical regression line can make the coefficient of determination appear high, but this might not reflect the true relationship between the variables.

 A high coefficient of determination doesn't necessarily mean the linear model is suitable or accurate.

 Just because the coefficient of determination is high, it doesn't guarantee that the regression model will make accurate predictions.
Adequacy of Linear Regression
 Example:

 4) Check whether the model meets the assumption of random errors (inspect the scaled residuals from step 2 for any systematic pattern)

$r = \sqrt{\dfrac{S_t - S_r}{S_t}} = 0.98820$
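A sketch gathering the adequacy computations for this example (Python; the model coefficients are those found in step 1):

```python
import math

T = [-340, -260, -180, -100, -20, 60]           # temperature (deg F)
alpha = [2.45, 3.58, 4.52, 5.28, 5.86, 6.36]    # micro-in/in/deg F
a0, a1 = 6.0325, 0.0096964                      # regression model from step 1

pred = [a0 + a1 * Ti for Ti in T]
res = [ai - pi for ai, pi in zip(alpha, pred)]   # residuals E_i

Sr = sum(e * e for e in res)                     # sum of squared residuals
s_est = math.sqrt(Sr / (len(T) - 2))             # standard error of estimate
abar = sum(alpha) / len(alpha)
St = sum((ai - abar) ** 2 for ai in alpha)       # total sum of squares
r2 = (St - Sr) / St                              # coefficient of determination

print(s_est, r2)                  # -> ~0.25141, ~0.97655
print([e / s_est for e in res])   # scaled residuals, step 2
```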