CHAPTER 2 Simple Linear Regression

FEM 2063 - Data Analytics
Overview
➢2.1 Background
➢2.2 Introduction
➢2.3 Regression
➢2.4 Least Squares Method
➢2.5 Simple Linear Regression (SLR)
➢2.6 Software Output
➢2.7 ANOVA
➢2.8 Model Evaluation
➢2.9 Applications/Examples
2.1 Background - Regression
➢ Relation between variables where changes in some variables may “explain” the changes in other variables.
➢ A regression model estimates the nature of the relationship between independent and dependent variables.
➢ Dependent variable - Employment income
➢ Independent variables - hours of work, education, occupation, sex, age, region, years of experience etc.
2.1 Background – Regression Model
➢ Price of a product and quantity produced:
➢ Quantity affected by price.
    ➢ Dependent variable is quantity of product.
    ➢ Independent variable is price.
➢ Price affected by quantity offered for sale.
    ➢ Dependent variable is price.
    ➢ Independent variable is quantity sold.
2.1 Background – Types of Regression
Regression models
    Simple (1 explanatory variable): Linear, Non-Linear
    Multiple (2+ explanatory variables): Linear, Non-Linear
2.1 Background – Types of Regression
Bivariate or simple regression model
(Education) x → y (Income)
Multivariate or multiple regression model
(Education) x1, (Sex) x2, (Experience) x3, (Age) x4 → y (Income)
2.2 Introduction - Simple Regression
➢ Simple regression analysis is a statistical tool.
➢ It estimates the mathematical relationship between a dependent variable (y) and an independent variable (x).
➢ The dependent variable is the variable for which we want to make a prediction.
➢ Various non-linear forms may be used, but simple linear regression models are the most common.
2.2 Introduction – Simple Regression
• Quantitative analysis uses the available information to predict future behavior.
• Current information is usually in the form of a set of data.
• When the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (predictor) variable X and a dependent (response) variable Y.

Lot size   Man-hours
30         73
20         50
60         128
80         170
40         87
50         108
60         135
30         69
70         148
60         132
2.2 Introduction – Simple Regression
The goal is to find a functional relation between the response variable y and the predictor variable x:
\[ y = f(x) \]
[Figure: scatter plot of man-hours (vertical axis, 0 to 180) against lot size (horizontal axis, 0 to 100)]
2.2 Introduction - Regression Function
❑ Regard Y as a random variable.
❑ For each x, take f(x) to be the expected value (i.e. mean value) of Y.
❑ Given that E(Y) denotes the expected value of Y, call the equation
\[ E(Y) = f(x) \]
the regression function.
2.2 Introduction - Regression Application
Three major applications
◼ Description
◼ Control
◼ Prediction
2.3 Regression
◼ Selection of independent variable(s)
Choose the most important predictor variable(s).
◼ Scope of model
We may need to restrict the coverage of the model to some interval or region of values of the independent variable(s).
2.3 Regression - Population & Sample
2.3 Regression - Regression Model
General regression model:
\[ Y = \beta_0 + \beta_1 X + \varepsilon \]
where
\(\beta_0\) and \(\beta_1\) are parameters,
X is a known constant,
the deviations \(\varepsilon\) are independent \(N(0, \sigma^2)\).
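To make the model assumptions concrete, the sketch below draws one sample from the model. The function name `simulate_slr` and all parameter values are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
import random

def simulate_slr(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values (assumptions for illustration only).
xs = [20, 30, 40, 50, 60, 70, 80]
ys_noisy = simulate_slr(xs, beta0=10.0, beta1=2.0, sigma=5.0)
ys_exact = simulate_slr(xs, beta0=10.0, beta1=2.0, sigma=0.0)  # no error term

print(ys_exact)  # points lie exactly on the line 10 + 2x
print(ys_noisy)  # points scatter around the same line
```

With sigma = 0 every point falls on the line; with sigma > 0 the points scatter around it, which is the situation regression has to deal with.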
2.3 Regression - Regression Coefficients
◼ The values of the regression parameters \(\beta_0\) and \(\beta_1\) are not known. We estimate them from data.
◼ \(\beta_1\) indicates the change in the mean response per unit increase in X.
2.3 Regression - Regression Line
◼ If the scatter plot of our sample data suggests a linear relationship between two variables, i.e.
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
the relationship can be summarized by a straight-line plot.
◼ The least squares method gives us the “best” estimated line for our set of sample data.
2.4 Least Squares Method
◼ ‘Best fit’ means the difference between the actual y values and the predicted y values is a minimum.
◼ But positive differences offset negative ones, so square the errors!
◼ The least squares (LS) method minimizes the sum of the squared differences (errors), SSE.
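A small sketch of why the errors are squared. The three data points and the two candidate lines are made up for illustration; they are not from the slides.

```python
def residuals(b0, b1, xs, ys):
    """Observed minus predicted values for the line y = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical toy data lying exactly on y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

flat = residuals(4, 0, xs, ys)   # poor fit: horizontal line y = 4
exact = residuals(0, 2, xs, ys)  # perfect fit: y = 2x

sum_flat = sum(flat)                    # positive and negative errors cancel
sse_flat = sum(e ** 2 for e in flat)    # squaring exposes the poor fit
sse_exact = sum(e ** 2 for e in exact)

print(sum_flat, sse_flat, sse_exact)
```

The raw residuals of the poor line sum to zero, just like those of the perfect line, so only the squared version distinguishes the two.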
2.4 Least Squares Method - Graphically
2.5 SLR - Computation
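◼ Write an estimated regression line based on sample data as
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
◼ The method of least squares chooses the values \(b_0\) and \(b_1\) (the estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\)) to minimize the sum of squared errors (SSE):
\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]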
2.5 SLR - Computation
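The minimizing values follow from setting the partial derivatives of SSE to zero. The derivation below is a standard sketch in the same \(b_0\), \(b_1\) notation as the SSE formula; the shorthand \(\bar{x}\), \(\bar{y}\) for the sample means is assumed here.

```latex
\begin{align*}
\frac{\partial SSE}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \\
\frac{\partial SSE}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0 \\
\Rightarrow\quad n b_0 + b_1 \sum x_i &= \sum y_i, \qquad
b_0 \sum x_i + b_1 \sum x_i^2 = \sum x_i y_i \\
b_1 &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
\end{align*}
```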
2.5 SLR - Estimation of Mean Response
◼ The fitted regression line can be used to estimate the mean value of y for a given value of x.
◼ Example
◼ The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.
y      x
1250   41
1380   54
1425   63
1425   54
1450   48
1300   46
1400   62
1510   61
1575   64
1650   71
2.5 SLR – Estimation of Mean Response
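◼ From the previous table:
\[ n = 10, \quad \sum x = 564, \quad \sum x^2 = 32604, \quad \sum y = 14365, \quad \sum xy = 818755 \]
◼ The least squares estimates of the regression coefficients are:
\[ \hat{\beta}_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = 10.787 \approx 10.8 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 1436.5 - 10.787(56.4) \approx 828 \]
(The intercept uses the unrounded slope; with the rounded value 10.8 the result would be 827.4.)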
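The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch using the table values; the variable names are my own.

```python
# Weekly advertising expenditure (x) and weekly sales (y) from the table.
x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Least squares slope and intercept from the summary sums.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(b1, 4), round(b0, 3))
```

The unrounded estimates agree with the coefficients reported in the software output of Section 2.6.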
◼ The estimated regression function is:
\[ \hat{y} = 828 + 10.8x \]
Sales = 828 + 10.8 Expenditure
◼ This means that if the weekly advertising expenditure is increased by $1, we would expect the weekly sales to increase by $10.80.
◼ Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.
◼ For $50 of expenditure, the estimated sales are:
\[ \text{Sales} = 828 + 10.8(50) = 1368 \]
◼ This is called the point estimate (forecast) of the mean response (sales).
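As a sketch, the point estimate is just the fitted line evaluated at the chosen x. The function name `predict_sales` is hypothetical; the defaults are the rounded coefficients 828 and 10.8 used on the slide.

```python
def predict_sales(expenditure, b0=828.0, b1=10.8):
    """Point estimate of mean weekly sales for a given weekly advertising expenditure."""
    return b0 + b1 * expenditure

estimate = predict_sales(50)
print(round(estimate, 2))
```

Such estimates are only meaningful within the range of expenditures observed in the sample (41 to 71 here), in line with the "scope of model" remark in Section 2.3.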
2.6 Software Output
Regression Statistics (Part 3)
Multiple R           0.84795003
R Square             0.71901926
Adjusted R Square    0.68389667
Standard Error       67.1944721
Observations         10

ANOVA (Part 2)
             df   SS             MS        F         Significance F
Regression   1    92431.72331    92431.7   20.4717   0.0019382
Residual     8    36120.77669    4515.1
Total        9    128552.5

Regression Analysis (Part 1)
               Coefficients   Standard Error   t Stat    P-value   Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      828.126888     136.1285978      6.08342   0.00029   514.2137788   1142.039998   514.2137788   1142.039998
X Variable 1   10.7867573     2.384042146      4.52457   0.00194   5.289146253   16.28436835   5.289146253   16.28436835
2.6 Software Output
Evaluation of the SLR model using software output
(i) Standard error of estimate (\(\hat{\sigma}\)) – Part (3)
(ii) Coefficient of determination (\(R^2\)) – Part (3)
(iii) Hypothesis test of slope
    a) The t-test of the slope – Part (1)
    b) The F-test of the slope – Part (2)
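The three parts of the output are linked. The sketch below recomputes the Part (3) statistics and the F value from the sums of squares in Part (2), and the t statistic from Part (1); all numbers are copied from the advertising output above, and the variable names are my own.

```python
import math

# Part (2): sums of squares and sample size from the ANOVA table.
ssr, sse, sst = 92431.72331, 36120.77669, 128552.5
n = 10

mse = sse / (n - 2)            # residual mean square
std_error = math.sqrt(mse)     # "Standard Error" in Part (3)
r_square = ssr / sst           # "R Square" in Part (3)
f_stat = (ssr / 1) / mse       # "F" in Part (2)

# Part (1): slope coefficient divided by its standard error gives "t Stat".
t_slope = 10.7867573 / 2.384042146

print(round(std_error, 4), round(r_square, 6), round(f_stat, 4), round(t_slope, 5))
```

Note that the squared t statistic of the slope equals the F statistic, which anticipates the equivalence of the two tests discussed in Section 2.8.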
2.7 ANOVA
◼ ANOVA (Analysis of Variance) is the term for statistical analyses of the different sources of variation.
◼ It partitions the sums of squares and degrees of freedom associated with the response variable.
◼ In the regression setting, the observed variation in the responses (\(y_i\)) comes from two sources.
2.7 ANOVA
◼ Consider the manager of a car plant who wishes to investigate how the plant’s electricity usage (kWh) depends upon the plant production ($M).

Production (x) ($M):           4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.7   5.61  4.9   4.2
Electricity usage (y) (kWh):   2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53

◼ There is variation in the amount (kWh) of electricity. The variation of the \(y_i\) is conventionally measured in terms of the deviations:
\[ y_i - \bar{y} \]
2.7 ANOVA
Regression Statistics
R Square       0.802109396
Observations   12

ANOVA
             df   SS            MS        F        Significance F
Regression   1    1.212381668   1.21238   40.533   8.1759E-05
Residual     10   0.299109998   0.02991
Total        11   1.511491667

               Coefficients   Standard Error   t Stat    P-value   Lower 95%      Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      0.409048191    0.385990515      1.05974   0.31419   -0.450992271   1.269088653   -0.45099227   1.269088653
X Variable 1   0.498830121    0.078351706      6.36655   8.2E-05   0.324251642    0.673408601   0.32425164    0.673408601
2.7 ANOVA – SST, SSE & SSR
◼ The measure of total variation, denoted by SST, is the sum of the squared deviations:
\[ SST = \sum (y_i - \bar{y})^2 \]
◼ If SST = 0, all observations are the same (no variability).
◼ The greater SST is, the greater the variation among the y values.
◼ In the regression model, the measure of variation is that of the y observations around the fitted line:
\[ y_i - \hat{y}_i \]
◼ The measure of variation in the data around the fitted regression line is the sum of squared errors (SSE), also called the residual sum of squares:
\[ SSE = \sum (y_i - \hat{y}_i)^2 \]
◼ For our car plant example:
SSE = 0.2991, SST = 1.51149
◼ What is the difference between these two sums of squares?
◼ The difference is the regression sum of squares (SSR):
\[ SSR = \sum (\hat{y}_i - \bar{y})^2 \]
◼ SSR is the variation among the predicted responses \(\hat{y}_i\).
◼ The larger SSR is relative to SST, the greater the role of the regression line in explaining the total variability in the y observations.
◼ In our example:
SSR = SST – SSE = 1.51149 – 0.2991 = 1.21239
◼ This indicates that most of the variability in electricity usage can be explained by the relation between the plant production and the electricity usage.
◼ We can decompose the total variability in the observations \(y_i\) as follows:
\[ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \]
◼ The total deviation \(y_i - \bar{y}\) can be viewed as the sum of two components:
    ◼ The deviation of the fitted value \(\hat{y}_i\) around the mean \(\bar{y}\).
    ◼ The deviation of \(y_i\) around the fitted regression line.
◼ The analysis of variance equation holds:
\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \]
◼ Breakdown of degrees of freedom:
\[ n - 1 = 1 + (n - 2) \]
▪ Total Sum of Squares (SST):
▪ Measures how much variation there is in the dependent variable.
▪ Made up of the SSE and SSR:
\[ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \]
SST = SSE + SSR
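The decomposition can be verified numerically on the car plant data. This is a minimal sketch computing each sum of squares directly from its definition; the variable names are my own.

```python
# Car plant data: production ($M) and electricity usage (kWh).
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # variation around the fitted line
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # variation explained by the line

print(round(sst, 5), round(sse, 5), round(ssr, 5))
```

Up to floating-point rounding, sse + ssr reproduces sst, and the three values match the SS column of the software output.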
2.7 ANOVA - Mean Squares (MS)
◼ A sum of squares divided by its degrees of freedom is called a mean square (MS).
◼ Regression mean square (MSR): \( MSR = \frac{SSR}{1} \)
◼ Error mean square (MSE): \( MSE = \frac{SSE}{n-2} \)
2.7 ANOVA - Mean Squares (MS)
◼ From the previous example:
◼ MSR = SSR/1 = 1.21239/1 = 1.21239
◼ MSE = SSE/(n-2) = 0.2991/10 = 0.02991
2.7 ANOVA – Table
◼ The breakdowns of the total sum of squares and associated degrees of freedom are displayed in a table called the analysis of variance table (ANOVA table).

Source of Variation   SS    df    MS                F-Test
Regression            SSR   1     MSR = SSR/1       MSR/MSE
Error                 SSE   n-2   MSE = SSE/(n-2)
Total                 SST   n-1
2.7 ANOVA – Excel Output
◼ In our car plant example the ANOVA table is:
ANOVA
             df   SS            MS        F        Significance F
Regression   1    1.212381668   1.21238   40.533   8.1759E-05
Residual     10   0.299109998   0.02991
Total        11   1.511491667
2.8 Model Evaluation
SLR model evaluation using software output
(i) Standard error of estimate (\(\hat{\sigma}\)) – Part (3)
(ii) Coefficient of determination (\(R^2\)) – Part (3)
(iii) Hypothesis test
    a) The t-test of the slope – Part (1)
    b) The F-test of the slope – Part (2)
2.8 Model Evaluation - (i) Standard error of estimate (\(\hat{\sigma}\))
◼ The error (or residual) is the difference between the observed value \(y_i\) and the corresponding fitted value \(\hat{y}_i\):
\[ e_i = y_i - \hat{y}_i \]
◼ Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
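2.8 Model Evaluation - (i) Standard error of estimate (\(\hat{\sigma}\))
◼ For simple linear regression, the estimate of \(\sigma^2\) is the mean squared error, i.e. SSE divided by its degrees of freedom:
\[ \hat{\sigma}^2 = \frac{SSE}{n-2} \]
◼ To estimate \(\sigma\), use \(\hat{\sigma} = \sqrt{\hat{\sigma}^2}\).
◼ \(\hat{\sigma}\) estimates the standard deviation \(\sigma\) of the error term \(\varepsilon\) in the statistical model for simple linear regression.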
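2.8 Model Evaluation - (i) Standard error of estimate (\(\hat{\sigma}\))
➢ Compute the standard error of estimate \(\hat{\sigma}\) as the square root of
\[ \hat{\sigma}^2 = \frac{SSE}{n-2} \]
➢ where SSE is
\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = S_{YY} - \frac{(S_{XY})^2}{S_{XX}} \]
➢ \(\hat{\sigma}^2\) is an unbiased estimator of the population variance \(\sigma^2\).
➢ The smaller SSE is, the more successful the linear regression model is in explaining y.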
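A sketch of the shortcut formula on the car plant data: SSE is obtained from S_XX, S_XY and S_YY without computing any fitted values. Variable names are my own.

```python
import math

# Car plant data: production ($M) and electricity usage (kWh).
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.70, 5.61, 4.90, 4.20]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]
n = len(x)

# Corrected sums of squares and cross products.
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n

sse = s_yy - s_xy ** 2 / s_xx        # shortcut formula for SSE
sigma2_hat = sse / (n - 2)           # estimate of the error variance
sigma_hat = math.sqrt(sigma2_hat)    # standard error of estimate

print(round(sse, 4), round(sigma2_hat, 4), round(sigma_hat, 4))
```

The value of sigma2_hat is the residual mean square (MS) shown in the ANOVA table of the software output.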
2.8 Model Evaluation - (i) Standard error of estimate (\(\hat{\sigma}\)) – Excel Output
Regression Statistics
R Square       0.802109396
Observations   12
2.8 Model Evaluation – (ii) Coefficient of Determination
➢ Coefficient of determination:
\[ R^2 = \frac{SST - SSE}{SST} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]
➢ \(R^2\) is the proportion of variability in the observed dependent variable that is explained by the linear regression model.
➢ It measures the strength of the linear relationship.
➢ The greater \(R^2\) is, the more successful the linear model.
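The three expressions for the coefficient of determination give the same number. A quick sketch using the sums of squares from the car plant ANOVA output (variable names are my own):

```python
# Sums of squares copied from the car plant ANOVA output.
sst = 1.511491667
sse = 0.299109998
ssr = 1.212381668

r2_from_difference = (sst - sse) / sst
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(round(r2_from_difference, 6), round(r2_from_ssr, 6), round(r2_from_sse, 6))
```

About 80% of the variation in electricity usage is explained by the linear relation with production.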
2.8 Model Evaluation – (ii) Coefficient of Determination – Excel Output
Regression Statistics
R Square       0.802109396
Observations   12
2.8 Model Evaluation – (iii.) The hypothesis test
➢ Write the population regression line as \(\beta_0 + \beta_1 x\).
The numbers \(\beta_0\) and \(\beta_1\) are parameters that describe the population.
➢ Write the least-squares line fitted to sample data as \(\hat{\beta}_0 + \hat{\beta}_1 x\).
This notation reminds us that the intercept \(\hat{\beta}_0\) of the fitted line estimates the intercept \(\beta_0\) of the population line, and the slope \(\hat{\beta}_1\) estimates the slope \(\beta_1\).
2.8 Model Evaluation – (iii) The hypothesis test
➢ Equivalence of F test and t test:
For a given \(\alpha\) level, the F test of \(\beta_1 = 0\) versus \(\beta_1 \neq 0\) is algebraically equivalent to the two-sided t-test.
➢ Thus, at a given \(\alpha\) level, we can use either the t-test or the F-test for testing \(\beta_1 = 0\) versus \(\beta_1 \neq 0\).
➢ The t-test is more flexible since it can be used for one-sided tests as well.
2.8 Model Evaluation – (iii) The hypothesis test (a. t-test)
➢ The t-test checks whether there is a significant linear relationship between x and y.
➢ Test the hypothesis
\(H_0: \beta_1 = 0\) (No relationship between x and y)
\(H_1: \beta_1 \neq 0\) (There is a relationship between x and y)
➢ Test statistic (t-distribution with n-2 degrees of freedom):
\[ T = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2 / S_{XX}}} = \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \]
➢ Critical region: \(|T| > t_{\alpha/2,\, n-2}\).
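A sketch of the t-test for the car plant example. The inputs are taken from the chapter's own numbers; the critical value 2.228 is the tabulated \(t_{0.025, 10}\) quoted in Example 1 rather than computed here, and the variable names are my own.

```python
import math

b1_hat = 0.498830121   # estimated slope (software output)
sigma2_hat = 0.029911  # residual mean square, SSE / (n - 2)
s_xx = 4.8723          # corrected sum of squares of x
t_crit = 2.228         # tabulated t(0.025, 10), as quoted in Example 1

se_b1 = math.sqrt(sigma2_hat / s_xx)   # standard error of the slope
t_stat = (b1_hat - 0.0) / se_b1        # test statistic under H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit       # two-sided test at the 5% level

print(round(se_b1, 5), round(t_stat, 2), reject_h0)
```

The standard error and t statistic agree with the "Standard Error" and "t Stat" columns of the slope row in the software output.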
2.8 Model Evaluation – (iii) The hypothesis test (a. t-test) – Excel Output
               Coefficients   Standard Error   t Stat
Intercept      0.409048191    0.385990515      1.05974
X Variable 1   0.498830121    0.078351706      6.36655
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test)
◼ In order to construct a statistical decision rule, we need to know the distribution of our test statistic F:
\[ F = \frac{MSR}{MSE} \]
◼ When H0 is true, our test statistic F follows the F-distribution with 1 and n-2 degrees of freedom. The critical value is written
\[ F(\alpha; 1, n - 2) \]
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test)
◼ This time we will use the F-test. The null and alternative hypotheses are:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
◼ Construction of decision rule:
At the \(\alpha\) = 5% level, reject H0 if \(F > F(\alpha; 1, n - 2)\).
◼ Large values of F support Ha and values of F near 1 support H0.
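A sketch of the F-test decision for the car plant example. The sums of squares and the slope t statistic are copied from the software output; the critical value 4.96 is the tabulated F(0.05; 1, 10) quoted in Example 1, not computed here. Variable names are my own.

```python
ssr = 1.212381668   # regression sum of squares
sse = 0.299109998   # residual sum of squares
n = 12
f_crit = 4.96       # tabulated F(0.05; 1, 10), as quoted in Example 1
t_stat = 6.3665509  # slope t statistic from the software output

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse
reject_h0 = f_stat > f_crit

print(round(f_stat, 3), reject_h0)
print(round(t_stat ** 2, 3))  # squared t statistic, for comparison with F
```

The squared t statistic reproduces F, which is the algebraic equivalence of the two tests in simple linear regression.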
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test) – Excel Output
ANOVA
             df   SS            MS        F        Significance F
Regression   1    1.212381668   1.21238   40.533   8.1759E-05
Residual     10   0.299109998   0.02991
Total        11   1.511491667
2.9 Example 1
The manager of a car plant wishes to investigate how the plant’s electricity usage depends upon the plant production. The data are given below.

Production (x) ($M):           4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.7   5.61  4.9   4.2
Electricity usage (y) (kWh):   2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53

i. Estimate the linear regression equation.
ii. Find the standard error of estimate of this regression.
iii. Determine the coefficient of determination of this regression.
iv. Test for significance of regression at 5% significance level.
Solution – Example 1
x     4.51   3.58   4.31   5.06   5.64   4.99   5.29   5.83   4.7    5.61   4.9    4.2     \(\sum x = 58.62\)
y     2.48   2.26   2.47   2.77   2.99   3.05   3.18   3.46   3.03   3.26   2.67   2.53    \(\sum y = 34.15\)
xy    11.18  8.09   10.65  14.02  16.86  15.22  16.82  20.17  14.24  18.29  13.08  10.63   \(\sum xy = 169.25\)
x^2   20.34  12.82  18.58  25.60  31.81  24.90  27.98  33.99  22.09  31.47  24.01  17.64   \(\sum x^2 = 291.23\)
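i. Estimate the linear regression equation.
\[ S_{XX} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 = 291.231 - (58.62)^2/12 = 4.8723 \]
\[ S_{XY} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right) = 169.2532 - (58.62)(34.15)/12 = 2.43045 \]
\[ S_{YY} = \sum y^2 - \frac{1}{n} \left( \sum y \right)^2 = 98.6967 - \frac{1}{12} (34.15)^2 = 1.51149 \]
(These results use the unrounded sums \(\sum x^2 = 291.231\) and \(\sum xy = 169.2532\); the two-decimal values 291.23 and 169.25 give slightly different answers.)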
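\[ \hat{\beta}_1 = \frac{S_{XY}}{S_{XX}} = 2.43045/4.8723 = 0.4988 \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{34.15}{12} - 0.4988 \left( \frac{58.62}{12} \right) \approx 0.4091 \]
Estimated regression line: \(\hat{y} = 0.4091 + 0.4988x\)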
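ii. Find the standard error of estimate of this regression.
\[ \hat{\sigma}^2 = \frac{SSE}{n-2} \]
The square root of \(\hat{\sigma}^2\) is called the standard error of estimate.
SSE = SYY – (SXY)^2/SXX
= 1.51149 – (2.43045)^2/4.8723
= 0.2991
\[ \hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{0.2991}{10} = 0.0299 \]
\[ \hat{\sigma} = \sqrt{0.0299} \approx 0.173 \]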
iii. Determine the coefficient of determination of this regression.
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{0.2991}{1.51149} = 0.802 \]
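iv. Test for significance of regression at 5% significance level (t-test).
\(\alpha = 0.05\); \(t_{\alpha/2,\, n-2} = t_{0.025,\, 10} = 2.228\)
\(S_{XX} = 4.8723\); \(\hat{\sigma}^2 = 0.0299\)
\[ T = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2 / S_{XX}}} = \frac{0.4988 - 0}{0.07835} = 6.37 \]
Critical region: \(|T| > t_{\alpha/2,\, n-2}\).
Since 6.37 > 2.228, reject H0; thus electricity usage does depend on the level of production.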
◼ Using our example again, let us repeat the earlier test on \(\beta_1\). This time we will use the F-test. The null and alternative hypotheses are:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
◼ Let \(\alpha = 0.05\). Since n = 12, we require F(0.05; 1, 10). From the table we find that F(0.05; 1, 10) = 4.96. Therefore the decision rule is: reject H0 if F > 4.96.
◼ Conclusion:
Since F = 40.53 > 4.96, we reject H0; that is, there is a linear association between electricity usage and the level of production.
Excel output – Example 1
The regression equation is
Electricity usage = 0.409 + 0.499 production

SUMMARY OUTPUT
Regression Statistics
R Square       0.802109396
Observations   12

ANOVA
             df   SS            MS           F            Significance F
Regression   1    1.212381668   1.21238167   40.5329703   8.1759E-05
Residual     10   0.299109998   0.029911
Total        11   1.511491667

               Coefficients   Standard Error   t Stat       P-value      Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
Intercept      0.409048191    0.385990515      1.05973638   0.31418974   -0.450992271   1.269088653   -0.450992271   1.269088653
X Variable 1   0.498830121    0.078351706      6.3665509    8.1759E-05   0.324251642    0.673408601   0.324251642    0.673408601
Exercise.
The following measurements of the specific heat of a certain chemical were made in order to investigate the variation in specific heat with temperature.
i. Plot the points on a scatter diagram.
ii. Estimate the regression line of specific heat on temperature.
iii. Determine the coefficient of determination of this regression model.
iv. Estimate the value of the specific heat when the temperature is 35 °C.
