Chapter 2
Simple Linear Regression
Overview
➢2.1 Background
➢2.2 Introduction
➢2.3 Regression
➢2.4 Least Squares Method
➢2.5 Simple Linear Regression (SLR)
➢2.6 Software Output
➢2.7 ANOVA
➢2.8 Model Evaluation
➢2.9 Applications/Examples
2.1 Background - Regression
2.1 Background – Regression Model
2.1 Background – Types of Regression
[Diagram: regression models divide into Simple and Multiple (2+ variables); each type is either Linear or Non-Linear.]
[Diagram: x (Education) → y (Income)]
2.2 Introduction - Simple Regression
➢ Simple regression analysis is a statistical tool.
[Figure: plot of man-hours illustrating the relationship y = f(x)]
2.2 Introduction - Regression Function
❑ Regard Y as a random variable.
❑ For each x, take f(x) to be the expected value (i.e. mean value) of Y.
❑ Given that E(Y) denotes the expected value of Y, call the equation E(Y) = f(x) the regression function.
2.2 Introduction - Regression Application
◼ Description
◼ Control
◼ Prediction
2.3 Regression
◼ Scope of model
We may need to restrict the coverage of the model to some interval or region of values of the independent variable(s).
2.3 Regression - Population & Sample
2.3 Regression - Regression Model
X is a known constant
2.3 Regression - Regression Coefficients
2.3 Regression - Regression Line
2.4 Least Squares Method
2.5 SLR - Computation
2.5 SLR - Estimation of Mean Response
◼ From the previous table:
$$n = 10, \quad \sum x = 564, \quad \sum x^2 = 32604, \quad \sum y = 14365, \quad \sum xy = 818755$$
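As a rough check, the summary sums above can be turned into slope and intercept estimates with the shortcut formulas S_XX = Σx² − (Σx)²/n and S_XY = Σxy − (Σx)(Σy)/n, the same formulas used in Example 1. The Python sketch below is illustrative only; the variable names are our own.

```python
# Summary sums quoted on the slide (n = 10 observations).
n = 10
sum_x = 564
sum_x2 = 32604
sum_y = 14365
sum_xy = 818755

# Corrected sum of squares of x and corrected sum of cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates of the slope and the intercept.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(f"S_XX = {s_xx:.1f}, S_XY = {s_xy:.1f}")
print(f"slope = {slope:.4f}, intercept = {intercept:.3f}")
```

The slope (about 10.787) and intercept (about 828.13) agree with the coefficients reported in the software output of Section 2.6.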
2.6 Software Output
Regression Statistics Part 3
Multiple R 0.84795003
R Square 0.71901926
Adjusted R Square 0.68389667
Standard Error 67.1944721
Observations 10
ANOVA Part 2
df SS MS F Significance F
Regression 1 92431.72331 92431.7 20.4717 0.0019382
Residual 8 36120.77669 4515.1
Total 9 128552.5
Regression Analysis Part 1
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 828.126888 136.1285978 6.08342 0.00029 514.2137788 1142.039998 514.2137788 1142.039998
X Variable 1 10.7867573 2.384042146 4.52457 0.00194 5.289146253 16.28436835 5.289146253 16.28436835
2.6 Software Output
Evaluation of the SLR model using Software output
2.7 ANOVA
◼ ANOVA (Analysis of Variance) is the term for
statistical analyses of the different sources of variation.
Production (x) ($M):          4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.7   5.61  4.9   4.2
Electricity Usage (y) (kWh):  2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53
Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12
ANOVA
df SS MS F Significance F
Regression 1 1.212381668 1.21238 40.533 8.1759E-05
Residual 10 0.299109998 0.02991
Total 11 1.511491667
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 0.409048191 0.385990515 1.05974 0.31419 -0.450992271 1.269088653 -0.45099227 1.269088653
X Variable 1 0.498830121 0.078351706 6.36655 8.2E-05 0.324251642 0.673408601 0.32425164 0.673408601
2.7 ANOVA – SST, SSE & SSR
◼ The measure of total variation, denoted by SST, is the sum of the squared deviations:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
◼ The greater SST is, the greater the variation among the y values.
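2.7 ANOVA – SST, SSE & SSR
◼ The measure of variation in the data around the fitted regression line is the sum of squared estimates of errors (SSE) (or SS residuals):
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
◼ The variation explained by the fitted regression line is the regression sum of squares (SSR):
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
◼ In our example:
SSR = SST – SSE = 1.51149 – 0.2991 = 1.21239
◼ The total variation and its degrees of freedom decompose as:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$n - 1 = 1 + (n - 2)$$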
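The decomposition SST = SSR + SSE can be checked numerically on the production and electricity data. The following Python sketch fits the line by least squares and computes each sum of squares directly from its definition; the variable names are our own choices, not part of the slides.

```python
# Production ($M) and electricity usage from the chapter's example.
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

# Sums of squares computed from their definitions.
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # around the fitted line
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the line

print(f"SST = {sst:.5f}, SSR = {ssr:.5f}, SSE = {sse:.5f}")
print(f"SSR + SSE = {ssr + sse:.5f}")
```

Computing SSR from its own definition, rather than as SST − SSE, makes the identity a genuine check instead of a tautology.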
2.7 ANOVA – SST, SSE & SSR
◼ Regression mean square (MSR): $MSR = \dfrac{SSR}{1}$
◼ Error mean square (MSE): $MSE = \dfrac{SSE}{n-2}$
2.7 ANOVA - Mean Squares (MS)
◼ From the previous example:
◼ MSR = SSR/1
= 1.21239/1
= 1.21239
◼ MSE = SSE/(n-2)
= 0.2991/10
= 0.02991
2.7 ANOVA – Table
◼ The breakdowns of the total sum of squares and associated degrees of freedom are displayed in a table called the analysis of variance table (ANOVA table):

Source of Variation   SS    df    MS                 F-Test
Regression            SSR   1     MSR = SSR/1        MSR/MSE
Error                 SSE   n-2   MSE = SSE/(n-2)
Total                 SST   n-1
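The table layout above maps directly onto a few lines of code. Below is a minimal Python sketch, assuming SSR, SSE and n are already known; the helper name `anova_table` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def anova_table(ssr, sse, n):
    """Build the ANOVA rows for simple linear regression (one predictor)."""
    df_reg, df_err = 1, n - 2
    msr = ssr / df_reg   # regression mean square
    mse = sse / df_err   # error mean square
    return [
        {"source": "Regression", "ss": ssr, "df": df_reg, "ms": msr, "f": msr / mse},
        {"source": "Error", "ss": sse, "df": df_err, "ms": mse, "f": None},
        {"source": "Total", "ss": ssr + sse, "df": n - 1, "ms": None, "f": None},
    ]


# Sums of squares taken from the software output for the electricity example.
rows = anova_table(ssr=1.212381668, sse=0.299109998, n=12)

for row in rows:
    ms = "" if row["ms"] is None else f"{row['ms']:.5f}"
    f_stat = "" if row["f"] is None else f"{row['f']:.3f}"
    print(f"{row['source']:<11}{row['ss']:>10.5f}{row['df']:>4}{ms:>10}{f_stat:>9}")
```

With these inputs the F ratio comes out at about 40.53, in line with the Excel ANOVA output.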
2.7 ANOVA – Excel Output
◼ In our car plant example the ANOVA table is:
ANOVA
df SS MS F Significance F
Regression 1 1.212381668 1.21238 40.533 8.1759E-05
Residual 10 0.299109998 0.02991
Total 11 1.511491667
2.8 Model Evaluation
SLR model evaluation using software output
◼ To estimate $\sigma$, use $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$
2.8 Model Evaluation - (i) Standard error of estimate ($\hat{\sigma}$)
➢ Compute the standard error of estimate $\hat{\sigma}$ from
$$\hat{\sigma}^2 = \frac{SSE}{n-2}$$
➢ where SSE is
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = S_{YY} - \frac{(S_{XY})^2}{S_{XX}}$$
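The shortcut form of SSE follows from the least-squares estimates themselves. The short derivation below is a sketch that assumes the usual definitions $\hat{\beta}_1 = S_{XY}/S_{XX}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ used in Example 1.

```latex
\begin{aligned}
\hat{y}_i &= \hat{\beta}_0 + \hat{\beta}_1 x_i
           = \bar{y} + \hat{\beta}_1 (x_i - \bar{x}) \\
SSE &= \sum_{i=1}^{n} \bigl( (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}) \bigr)^2 \\
    &= S_{YY} - 2 \hat{\beta}_1 S_{XY} + \hat{\beta}_1^2 S_{XX} \\
    &= S_{YY} - \frac{2 (S_{XY})^2}{S_{XX}} + \frac{(S_{XY})^2}{S_{XX}}
     = S_{YY} - \frac{(S_{XY})^2}{S_{XX}}
\end{aligned}
```

Since $S_{YY}$ is SST, the subtracted term $(S_{XY})^2 / S_{XX}$ is SSR.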
2.8 Model Evaluation - (i) Standard error of estimate ($\hat{\sigma}$) – Excel Output
Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12
2.8 Model Evaluation – (ii) Coefficient of
Determination
➢ Coefficient of determination
$$R^2 = \frac{SST - SSE}{SST} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
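The "Regression Statistics" block of the software output can be reproduced from SST and SSE. The Python sketch below is illustrative; note that the adjusted R² formula, 1 − (1 − R²)(n − 1)/(n − 2) for a single predictor, is not stated on the slides and is included here as an assumption about how the software computes that line.

```python
import math

# Values from the ANOVA output of the electricity example.
sst = 1.511491667   # total sum of squares
sse = 0.299109998   # residual (error) sum of squares
n = 12              # number of observations

r_squared = 1 - sse / sst

# "Multiple R" is the square root of R^2 (the absolute correlation
# between x and y when there is a single predictor).
multiple_r = math.sqrt(r_squared)

# Adjusted R^2 for one predictor (assumed formula, see lead-in).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(f"R Square          = {r_squared:.6f}")
print(f"Multiple R        = {multiple_r:.6f}")
print(f"Adjusted R Square = {adjusted_r_squared:.6f}")
```

An R² of about 0.80 means that roughly 80% of the variation in electricity usage is accounted for by the fitted line.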
2.8 Model Evaluation – (ii) Coefficient of
Determination – Excel Output
Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12
2.8 Model Evaluation – (iii) The hypothesis test
This notation reminds us that the intercept $\hat{\beta}_0$ of the fitted line estimates the intercept $\beta_0$ of the population line, and the slope $\hat{\beta}_1$ estimates the slope $\beta_1$.
2.8 Model Evaluation – (iii) The hypothesis
test (b. F-test)
ANOVA
df SS MS F Significance F
Regression 1 1.212381668 1.21238 40.533 8.1759E-05
Residual 10 0.299109998 0.02991
Total 11 1.511491667
2.9 Example 1
Production (x) ($M):          4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.7   5.61  4.9   4.2
Electricity Usage (y) (kWh):  2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53
Solution – Example 1
x     4.51   3.58   4.31   5.06   5.64   4.99   5.29   5.83   4.7    5.61   4.9    4.2     Σx  = 58.62
y     2.48   2.26   2.47   2.77   2.99   3.05   3.18   3.46   3.03   3.26   2.67   2.53    Σy  = 34.15
xy    11.18  8.09   10.65  14.02  16.86  15.22  16.82  20.17  14.24  18.29  13.08  10.63   Σxy = 169.25
x²    20.34  12.82  18.58  25.60  31.81  24.90  27.98  33.99  22.09  31.47  24.01  17.64   Σx² = 291.23
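i. Estimate the linear regression equation
The results below use the unrounded sums Σx² = 291.231 and Σxy = 169.2532; the two-decimal values shown in the table give slightly different answers.
$$S_{XX} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 = 291.231 - \frac{(58.62)^2}{12} = 4.8723$$
$$S_{XY} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right) = 169.2532 - \frac{(58.62)(34.15)}{12} = 2.43045$$
$$S_{YY} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \right)^2 = 98.6967 - \frac{1}{12} (34.15)^2 = 1.51149$$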
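$$\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}} = \frac{2.43045}{4.8723} = 0.4988$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{34.15}{12} - 0.4988 \left( \frac{58.62}{12} \right) \approx 0.409$$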
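The hand computation of part i can be reproduced from the raw data with the same shortcut formulas. The Python sketch below is illustrative only: the function names `fit_slr` and `mean_response` and the trial input x0 = 5.0 are hypothetical and do not come from the slides.

```python
def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x using the shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sum_x**2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    s_yy = sum(yi * yi for yi in y) - sum_y**2 / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    return {"n": n, "s_xx": s_xx, "s_xy": s_xy, "s_yy": s_yy, "b0": b0, "b1": b1}


def mean_response(fit, x0):
    """Estimated mean of y at x = x0 on the fitted line."""
    return fit["b0"] + fit["b1"] * x0


production = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2]
usage = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

fit = fit_slr(production, usage)
print(f"S_XX = {fit['s_xx']:.4f}, S_XY = {fit['s_xy']:.5f}, S_YY = {fit['s_yy']:.5f}")
print(f"b1 = {fit['b1']:.4f}, b0 = {fit['b0']:.4f}")

# Hypothetical query point inside the observed range of production values.
print(f"estimated mean usage at x0 = 5.0: {mean_response(fit, 5.0):.3f}")
```

Keeping the query point inside the observed range of x (3.58 to 5.83) respects the scope-of-model caution from Section 2.3.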
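ii. Find the standard error of estimate of this regression.
$$\hat{\sigma}^2 = \frac{SSE}{n-2}$$
The square root of $\hat{\sigma}^2$ is called the standard error of estimate.
$$SSE = S_{YY} - \frac{(S_{XY})^2}{S_{XX}} = 1.51149 - \frac{(2.43045)^2}{4.8723} = 0.2991$$
$$\hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{0.2991}{10} = 0.0299$$
$$\hat{\sigma} = \sqrt{0.0299} \approx 0.173$$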
iii. Determine the coefficient of determination of this regression.
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{0.2991}{1.51149} = 0.802$$
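iv. Test for significance of regression at 5% significance level.
Test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$.
$$T = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2 / S_{XX}}} = \frac{0.499 - 0}{0.078} = 6.37$$
◼ Conclusion:
The equivalent F statistic is F = MSR/MSE = 40.53, the square of the unrounded t statistic. Since 40.53 > 4.96, the critical value $F_{0.05;\,1,\,10}$, we reject H0; that is, there is a linear association between electricity usage and the level of production.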
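The test in part iv can be sketched in a few lines of Python. The critical values are hard-coded rather than computed: 4.96 is the F value quoted in the conclusion, and 2.228 is an assumed table value for $t_{0.025,\,10}$ that does not appear on the slides.

```python
import math

# Quantities from parts i and ii of Example 1.
n = 12
s_xx = 4.8723
s_xy = 2.43045
s_yy = 1.51149

b1 = s_xy / s_xx                      # slope estimate
sse = s_yy - s_xy**2 / s_xx           # error sum of squares
sigma2_hat = sse / (n - 2)            # estimate of the error variance
se_b1 = math.sqrt(sigma2_hat / s_xx)  # standard error of the slope

t_stat = (b1 - 0) / se_b1             # test statistic under H0: beta_1 = 0
f_stat = t_stat**2                    # equals MSR/MSE in simple regression

T_CRIT = 2.228  # assumed table value t_{0.025, 10} for a two-sided 5% test
F_CRIT = 4.96   # F_{0.05; 1, 10}, as quoted in the conclusion

reject_h0 = abs(t_stat) > T_CRIT
print(f"T = {t_stat:.2f}, F = {f_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The t-test and the F-test always reach the same decision here, because F is the square of T and the F critical value is the square of the two-sided t critical value.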
Excel output – Example 1
The regression equation is
Electricity usage = 0.409 + 0.499 × Production

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12
ANOVA
df SS MS F Significance F
Regression 1 1.212381668 1.21238167 40.5329703 8.1759E-05
Residual 10 0.299109998 0.029911
Total 11 1.511491667
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 0.409048191 0.385990515 1.05973638 0.31418974 -0.450992271 1.269088653 -0.450992271 1.269088653
X Variable 1 0.498830121 0.078351706 6.3665509 8.1759E-05 0.324251642 0.673408601 0.324251642 0.673408601
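The "Standard Error" and "Lower/Upper 95%" columns of the coefficient table can be approximately reproduced by hand. The Python sketch below rests on two assumptions not stated on the slides: the intercept standard error formula $\sqrt{\hat{\sigma}^2 (1/n + \bar{x}^2 / S_{XX})}$, and the table value 2.228139 for $t_{0.025,\,10}$.

```python
import math

# Inputs taken from Example 1 and the Excel output.
n = 12
sum_x = 58.62
s_xx = 4.8723
mse = 0.029911                      # residual mean square (sigma-hat squared)
b0, b1 = 0.409048191, 0.498830121   # fitted intercept and slope
T_CRIT = 2.228139                   # assumed table value of t_{0.025, 10}

x_bar = sum_x / n

# Standard errors of the slope and the intercept.
se_b1 = math.sqrt(mse / s_xx)
se_b0 = math.sqrt(mse * (1 / n + x_bar**2 / s_xx))

# Two-sided 95% confidence intervals: estimate +/- t * standard error.
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)

print(f"slope:     se = {se_b1:.5f}, 95% CI = ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
print(f"intercept: se = {se_b0:.5f}, 95% CI = ({ci_b0[0]:.4f}, {ci_b0[1]:.4f})")
```

The slope interval excludes zero while the intercept interval contains it, consistent with the P-values in the output (about 0.00008 and 0.314).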
Exercise.
The following measurements of the specific heat of a certain chemical were made
in order to investigate the variation in specific heat with temperature.