Linear Regression

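Two variables, 𝑥 and 𝑦, are said to be linearly related if their relationship can be approximated by a straight line. Either:

• when 𝑥 increases, 𝑦 increases (positive slope), or
• when 𝑥 increases, 𝑦 decreases (negative slope).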

Determining the Line of Best Fit


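Given $n$ data points of the form $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the line of best fit through these points is given by the regression line

$$y = a_0 + a_1 x$$

where

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

and

$$a_0 = \bar{y} - a_1 \bar{x}$$

with

$$\bar{x} = \frac{\sum x_i}{n} \quad \text{and} \quad \bar{y} = \frac{\sum y_i}{n}.$$

Each sum runs over all $n$ data points:

$$\sum x_i y_i = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

$$\sum x_i = \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

$$\sum y_i = \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$$

$$\sum x_i^2 = \sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$$

$$\sum y_i^2 = \sum_{i=1}^{n} y_i^2 = y_1^2 + y_2^2 + \cdots + y_n^2$$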

Group 2 MATH 1115 Sem 1 2019/2020


Please note that $\sum x_i^2 \neq \left(\sum x_i\right)^2$.

The point $(\bar{x}, \bar{y})$ always lies on the line of best fit, thus $\bar{y} = a_0 + a_1 \bar{x}$ (i.e. $a_0 = \bar{y} - a_1 \bar{x}$).
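The formulas for $a_1$ and $a_0$ translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original handout; the function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four sums that appear in the formula for a1
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Denominator n*sum(x^2) - (sum x)^2; zero when all x values are equal
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    a1 = (n * sum_xy - sum_x * sum_y) / denom
    # a0 = y_bar - a1 * x_bar, so (x_bar, y_bar) lies on the fitted line
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


# Illustrative data (hypothetical): points lying exactly on y = 1 + 2x
a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a0, a1)  # 1.0 2.0
```

Because the sample points lie exactly on a line, the fit recovers that line's intercept and slope.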
Correlation Coefficient, 𝒓
The correlation coefficient, 𝑟, indicates the strength of the linear relationship
between 𝑥 and 𝑦 (or the linear degree of scatter among data points). It is given by

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum x_i^2 - n(\bar{x})^2}\,\sqrt{\sum y_i^2 - n(\bar{y})^2}}, \qquad -1 \le r \le 1, \quad 0 \le r^2 \le 1$$

• 𝑟 = 1 indicates perfect positive correlation (the line has positive slope).
• 𝑟 = -1 indicates perfect negative correlation (the line has negative slope).
• 𝑟 = 0 indicates that there is no linear correlation.
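The three special cases can be checked numerically. Below is a minimal Python sketch of the formula for 𝑟, not taken from the handout; the function name `correlation` and the small data sets are hypothetical.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r computed from the sums-based formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2

    if s_xx * s_yy <= 0:
        raise ValueError("r is undefined when all x or all y values are equal")

    # sqrt(s_xx) * sqrt(s_yy) written as a single square root
    return numerator / math.sqrt(s_xx * s_yy)


# Hypothetical data sets illustrating the three special cases
r_up = correlation([1, 2, 3], [2, 4, 6])       # points on a rising line
r_down = correlation([1, 2, 3], [6, 4, 2])     # points on a falling line
r_none = correlation([-1, 0, 1], [1, 0, 1])    # points on the curve y = x^2

print(round(r_up, 6), round(r_down, 6), round(r_none, 6))
```

The last data set lies exactly on a parabola, yet its 𝑟 is 0: the coefficient measures only linear association, so 𝑟 = 0 does not rule out some other relationship between 𝑥 and 𝑦.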

Worked Example:
Two variables, 𝑥 and 𝑦, are linearly related. An experiment gave the following data:

𝑥    1.1    2.2    2.9    3.4    5.4
𝑦    4.9    6.0    6.9    7.5    9.6

(a) Determine the equation of the line of best fit (give answers to 2 d.p.)
(b) Using the equation of the line of best fit, find
    i. The value of 𝑦 when 𝑥 = 4.3
    ii. The value of 𝑥 when 𝑦 = 4.3
(c) Compute the correlation coefficient, 𝑟.

Solution:
We first must set up and complete the table as shown below:
            x       y      xy      x²      y²
1         1.1     4.9    5.39    1.21   24.01
2         2.2     6.0   13.20    4.84   36.00
3         2.9     6.9   20.01    8.41   47.61
4         3.4     7.5   25.50   11.56   56.25
5         5.4     9.6   51.84   29.16   92.16
Total    15.0    34.9  115.94   55.18  256.03

The totals give $\sum x_i = 15$, $\sum y_i = 34.9$, $\sum x_i y_i = 115.94$, $\sum x_i^2 = 55.18$ and $\sum y_i^2 = 256.03$.
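The table can be cross-checked with a short script. This is an illustrative Python sketch using the data from the worked example; the variable names are chosen here and are not part of the handout.

```python
xs = [1.1, 2.2, 2.9, 3.4, 5.4]
ys = [4.9, 6.0, 6.9, 7.5, 9.6]

# One row per data point: (x, y, x*y, x^2, y^2)
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]

# Column totals: sum x, sum y, sum xy, sum x^2, sum y^2
totals = [sum(column) for column in zip(*rows)]

print(f"{'i':>5} {'x':>8} {'y':>8} {'xy':>8} {'x^2':>8} {'y^2':>8}")
for i, row in enumerate(rows, start=1):
    print(f"{i:>5} " + " ".join(f"{value:8.2f}" for value in row))
print(f"{'Total':>5} " + " ".join(f"{value:8.2f}" for value in totals))
```

Rounded to 2 d.p., the printed totals should agree with the last row of the table.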

(a) Determine the equation of the line of best fit (give answers to 2 d.p.)

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{5(115.94) - (15)(34.9)}{5(55.18) - (15)^2} = \frac{56.2}{50.9} = 1.10 \text{ (to 2 d.p.)}$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{15}{5} = 3$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{34.9}{5} = 6.98$$

$$a_0 = \bar{y} - a_1 \bar{x} = 6.98 - \left(\frac{56.2}{50.9} \times 3\right) = 3.67 \text{ (to 2 d.p.)}$$

The equation of the line of best fit is $y = 3.67 + 1.10x$.

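(b) i. The value of 𝑦 when 𝑥 = 4.3 (interpolation, since 4.3 lies within the observed range of 𝑥, 1.1 to 5.4)

$$y = 3.67 + 1.10(4.3) = 8.40$$

ii. The value of 𝑥 when 𝑦 = 4.3 (extrapolation, since 4.3 lies below the smallest observed 𝑦, 4.9)

$$4.3 = 3.67 + 1.10x$$

$$x = \frac{4.3 - 3.67}{1.10} = 0.57$$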
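Part (b) can be reproduced in code. The Python sketch below uses the rounded coefficients from part (a); the helper names `predict_y`, `solve_x` and `classify` are hypothetical, and the check against the observed range of 𝑥 is one simple way to tell interpolation from extrapolation.

```python
# Rounded coefficients of the line of best fit from part (a)
a0, a1 = 3.67, 1.10

# Observed range of x in the experimental data
x_min, x_max = 1.1, 5.4


def predict_y(x):
    """Value of y on the fitted line at a given x."""
    return a0 + a1 * x


def solve_x(y):
    """Value of x at which the fitted line reaches a given y."""
    return (y - a0) / a1


def classify(x):
    """Interpolation if x lies inside the observed range, else extrapolation."""
    return "interpolation" if x_min <= x <= x_max else "extrapolation"


y_hat = predict_y(4.3)
x_hat = solve_x(4.3)

print(f"y = {y_hat:.2f} ({classify(4.3)})")    # y = 8.40 (interpolation)
print(f"x = {x_hat:.2f} ({classify(x_hat)})")  # x = 0.57 (extrapolation)
```

Extrapolated values such as the one in part ii should be treated with more caution, because the line is being used outside the range where data were observed.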

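(c) Compute the correlation coefficient, 𝑟.

$$r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum x_i^2 - n(\bar{x})^2}\,\sqrt{\sum y_i^2 - n(\bar{y})^2}} = \frac{115.94 - 5(3)(6.98)}{\sqrt{55.18 - (5)(3)^2}\,\sqrt{256.03 - (5)(6.98)^2}}$$

$$= \frac{11.24}{\sqrt{10.18}\,\sqrt{12.428}} = 0.999 \text{ (to 3 d.p.)}$$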

[Figure: graphical presentation done in Excel]
