
FEM 2063 - Data Analytics

Chapter 2
Simple Linear Regression

Overview

➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.1 Background - Regression

➢ Relation between variables where changes in some variables may “explain” the changes in other variables.
➢ A regression model estimates the nature of the relationship between the independent and dependent variables.
➢ Dependent variable: employment income.
➢ Independent variables: hours of work, education, occupation, sex, age, region, years of experience, etc.
2.1 Background – Regression Model

➢ Price of a product and quantity produced:
➢ Quantity affected by price:
  ➢ Dependent variable is the quantity of product.
  ➢ Independent variable is price.
➢ Price affected by quantity offered for sale:
  ➢ Dependent variable is price.
  ➢ Independent variable is quantity sold.
2.1 Background – Types of Regression

Regression models (2+ variables):
➢ Simple: linear or non-linear
➢ Multiple: linear or non-linear
2.1 Background – Types of Regression

Bivariate or simple regression model:

(Education) x → y (Income)

Multivariate or multiple regression model:

(Education) x1
(Sex) x2
(Experience) x3 → y (Income)
(Age) x4
Overview

➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.2 Introduction - Simple Regression

➢ Simple regression analysis is a statistical tool.
➢ It estimates the mathematical relationship between a dependent variable (y) and an independent variable (x).
➢ The dependent variable is the variable for which we want to make a prediction.
➢ Various non-linear forms may be used, but simple linear regression models are the most common.
2.2 Introduction – Simple Regression

• Quantitative analysis uses the lot-size information to predict its future behavior.
• Current information is usually in the form of a set of data.
• When the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (predictor) variable X and a dependent (response) variable Y.

Lot size (x):   30  20  60  80  40  50  60  30  70  60
Man-hours (y):  73  50 128 170  87 108 135  69 148 132
2.2 Introduction – Simple Regression

The goal is to find a functional relation between the response variable y and the predictor variable x:

𝑦 = 𝑓(𝑥)

[Scatter plot of man-hours (y) against lot size (x) for the data above]
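A minimal Python sketch (assuming numpy and matplotlib are available; not part of the original slides) that reproduces this scatter plot from the lot-size data above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Lot size (x) and man-hours (y) from the table in this section
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60])
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132])

plt.scatter(x, y)
plt.xlabel("Lot size (x)")
plt.ylabel("Man-hours (y)")
plt.title("Man-hours vs. lot size")
plt.show()
```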
2.2 Introduction - Regression Function

❑ Regard Y as a random variable.
❑ For each X, take f(x) to be the expected value (i.e. mean value) of Y.
❑ Given that E(Y) denotes the expected value of Y, the equation

𝐸(𝑌) = 𝑓(𝑥)

is called the regression function.
2.2 Introduction - Regression Application

Three major applications

◼ Description
◼ Control

◼ Prediction

Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples

2.3 Regression

◼ Selection of independent variable(s): choose the most important predictor variable(s).

◼ Scope of model: we may need to restrict the coverage of the model to some interval or region of values of the independent variable(s).
2.3 Regression - Population & Sample

[Figure: population regression line and the sample data used to estimate it]
2.3 Regression - Regression Model

General regression model: 𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝜀

where
➢ β0 and β1 are parameters,
➢ X is a known constant,
➢ the deviations ε are independent N(0, σ²).
2.3 Regression - Regression Coefficients

◼ The values of the regression parameters β0 and β1 are not known. We estimate them from data.

◼ β1 indicates the change in the mean response per unit increase in X.
2.3 Regression - Regression Line

◼ If the scatter plot of our sample data suggests a linear relationship between the two variables, i.e.

ŷ = β̂0 + β̂1 x

the relationship can be summarized by a straight-line plot.

◼ The least squares method gives us the “best” estimated line for our set of sample data.
Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.4 Least Squares Method

◼ “Best fit” means the difference between the actual y values and the predicted y values is a minimum.
◼ But positive differences offset negative ones, so square the errors!
◼ The least squares method minimizes the Sum of Squared Errors (SSE).
2.4 Least Squares Method - Graphically

[Figure: vertical deviations (errors) between the observed points and the fitted line]
Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.5 SLR - Computation

◼ Write the estimated regression line based on sample data as

ŷ = β̂0 + β̂1 x

◼ The method of least squares chooses the values of β̂0 and β̂1 to minimize the sum of squared errors (SSE):

SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ (yᵢ − β̂0 − β̂1 xᵢ)²
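A minimal NumPy sketch (an illustration, not from the original slides) that computes the SSE-minimizing estimates in closed form for the man-hours data of Section 2.2:

```python
import numpy as np

# Man-hours data from Section 2.2
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60])
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132])

# Closed-form least squares estimates
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
sse = ((y - (b0 + b1 * x)) ** 2).sum()

print(b0, b1, sse)  # for this data: b0 = 10.0, b1 = 2.0, minimized SSE = 60.0
```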
2.5 SLR - Computation
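The body of this slide did not survive extraction. Consistent with the computations on the following slides, the least squares estimates have the standard closed form:

```latex
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
             = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```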
2.5 SLR - Estimation of Mean Response

◼ The fitted regression line can be used to estimate the mean value of y for a given value of x.
◼ Example: the weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

x (expenditure): 41   54   63   54   48   46   62   61   64   71
y (sales):       1250 1380 1425 1425 1450 1300 1400 1510 1575 1650
2.5 SLR – Estimation of Mean Response

◼ From the previous table:

n = 10,  Σx = 564,  Σx² = 32604,  Σy = 14365,  Σxy = 818755

◼ The least squares estimates of the regression coefficients are:

β̂1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²) = (10(818755) − (564)(14365)) / (10(32604) − (564)²) = 10.8

β̂0 = ȳ − β̂1 x̄ = 1436.5 − 10.8(56.4) = 828
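A quick NumPy check of these estimates (a sketch, assuming the data above):

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71])
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650])

n = len(x)
b1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x**2).sum() - x.sum() ** 2)
b0 = y.mean() - b1 * x.mean()
print(round(b1, 4), round(b0, 1))  # approx. 10.7868 and 828.1, matching the slide
```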
2.5 SLR – Estimation of Mean Response

◼ The estimated regression function is:

ŷ = 828 + 10.8x
Sales = 828 + 10.8 × Expenditure

◼ This means that if the weekly advertising expenditure is increased by $1, we would expect the weekly sales to increase by $10.80.
2.5 SLR – Estimation of Mean Response

◼ Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.

◼ For $50 of expenditure, the estimated sales are:

Sales = 828 + 10.8(50) = 1368

◼ This is called the point estimate (forecast) of the mean response (sales).
Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples

2.6 Software Output

Regression Statistics (Part 3)
Multiple R           0.84795003
R Square             0.71901926
Adjusted R Square    0.68389667
Standard Error       67.1944721
Observations         10

ANOVA (Part 2)
             df   SS            MS        F         Significance F
Regression   1    92431.72331   92431.7   20.4717   0.0019382
Residual     8    36120.77669   4515.1
Total        9    128552.5

Regression Analysis (Part 1)
              Coefficients   Standard Error   t Stat    P-value   Lower 95%     Upper 95%
Intercept     828.126888     136.1285978      6.08342   0.00029   514.2137788   1142.039998
X Variable 1  10.7867573     2.384042146      4.52457   0.00194   5.289146253   16.28436835
2.6 Software Output

Evaluation of the SLR model using the software output:

(i) Standard error of estimate (σ̂) – Part (3)
(ii) Coefficient of determination (R²) – Part (3)
(iii) Hypothesis test of slope
  a) The t-test of the slope – Part (1)
  b) The F-test of the slope – Part (2)
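A minimal Python sketch (assuming the statsmodels package; not part of the original slides) that reproduces all three parts of this output for the advertising data:

```python
import numpy as np
import statsmodels.api as sm

# Advertising data from Section 2.5
x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71])
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650])

X = sm.add_constant(x)       # add the intercept column
model = sm.OLS(y, X).fit()   # ordinary least squares fit
print(model.summary())       # R^2, ANOVA F-test, coefficients, t-stats, p-values
```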
Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.7 ANOVA

◼ ANOVA (Analysis of Variance) is the term for statistical analyses of the different sources of variation.

◼ It partitions the sums of squares and degrees of freedom associated with the response variable.

◼ In the regression setting, the observed variation in the responses (yᵢ) comes from two sources.
2.7 ANOVA

◼ Consider the manager of a car plant who wishes to investigate how the plant’s electricity usage (kWh) depends upon the plant’s production ($M).

Production (x) ($M):         4.51 3.58 4.31 5.06 5.64 4.99 5.29 5.83 4.7 5.61 4.9 4.2
Electricity usage (y) (kWh): 2.48 2.26 2.47 2.77 2.99 3.05 3.18 3.46 3.03 3.26 2.67 2.53

◼ There is variation in the amount (kWh) of electricity used. The variation of the yᵢ is conventionally measured in terms of the deviations:

yᵢ − ȳ
2.7 ANOVA

Regression Statistics
Multiple R          0.895605603
R Square            0.802109396
Adjusted R Square   0.782320336
Standard Error      0.172947969
Observations        12

ANOVA
             df   SS            MS        F        Significance F
Regression   1    1.212381668   1.21238   40.533   8.1759E-05
Residual     10   0.299109998   0.02991
Total        11   1.511491667

              Coefficients   Standard Error   t Stat    P-value   Lower 95%      Upper 95%
Intercept     0.409048191    0.385990515      1.05974   0.31419   -0.450992271   1.269088653
X Variable 1  0.498830121    0.078351706      6.36655   8.2E-05   0.324251642    0.673408601
2.7 ANOVA – SST, SSE & SSR

◼ The measure of total variation, denoted by SST, is the sum of the squared deviations:

SST = Σ(yᵢ − ȳ)²

◼ If SST = 0, all observations are the same (no variability).
◼ The greater SST is, the greater the variation among the y values.
◼ In the regression model, the relevant measure of variation is that of the y observations around the fitted line:

yᵢ − ŷᵢ
2.7 ANOVA – SST, SSE & SSR

◼ The measure of variation in the data around the fitted regression line is the sum of squared errors (SSE) (or SS residuals):

SSE = Σ(yᵢ − ŷᵢ)²

◼ For our car plant example: SSE = 0.2991, SST = 1.51149.

◼ What is the difference between these two sums of squares?
2.7 ANOVA – SST, SSE & SSR

◼ The difference is the regression sum of squares (SSR):

SSR = Σ(ŷᵢ − ȳ)²

◼ SSR is the variation among the predicted responses ŷᵢ.
◼ The larger SSR is relative to SST, the greater the role of the regression line in explaining the total variability in the y observations.
2.7 ANOVA – SST, SSE & SSR

◼ In our example:

SSR = SST – SSE = 1.51149 – 0.2991 = 1.21239

◼ This indicates that most of the variability in electricity usage can be explained by the relation between plant production and electricity usage.
2.7 ANOVA – SST, SSE & SSR

◼ We can decompose the total variability in the observations yᵢ as follows:

yᵢ − ȳ = (ŷᵢ − ȳ) + (yᵢ − ŷᵢ)

◼ The total deviation yᵢ − ȳ can be viewed as the sum of two components:
  ◼ the deviation of the fitted value ŷᵢ around the mean ȳ;
  ◼ the deviation of yᵢ around the fitted regression line.
2.7 ANOVA – SST, SSE & SSR

◼ The analysis of variance equation holds:

Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²

◼ Breakdown of degrees of freedom:

n − 1 = 1 + (n − 2)
2.7 ANOVA – SST, SSE & SSR

▪ Total Sum of Squares (SST):
  ▪ Measures how much variation there is in the dependent variable.
  ▪ Made up of the SSE and the SSR:

SST = Σᵢ₌₁ⁿ(yᵢ − ȳ)² = Σᵢ₌₁ⁿ(yᵢ − ŷᵢ)² + Σᵢ₌₁ⁿ(ŷᵢ − ȳ)²

SST = SSE + SSR
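A minimal NumPy sketch (an illustration, not from the original slides) verifying this decomposition on the car plant data:

```python
import numpy as np

# Car plant data from the ANOVA example
x = np.array([4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2])
y = np.array([2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()       # approx. 1.51149
sse = ((y - y_hat) ** 2).sum()          # approx. 0.29911
ssr = ((y_hat - y.mean()) ** 2).sum()   # approx. 1.21238
print(np.isclose(sst, sse + ssr))       # True: SST = SSE + SSR
```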
2.7 ANOVA - Mean Squares (MS)

◼ A sum of squares divided by its degrees of freedom is called a mean square (MS).

◼ Regression mean square (MSR): MSR = SSR / 1

◼ Error mean square (MSE): MSE = SSE / (n − 2)
2.7 ANOVA - Mean Squares (MS)

◼ From the previous example:
◼ MSR = SSR/1 = 1.21239/1 = 1.21239
◼ MSE = SSE/(n − 2) = 0.2991/10 = 0.02991
2.7 ANOVA – Table

◼ The breakdowns of the total sum of squares and associated degrees of freedom are displayed in a table called the analysis of variance (ANOVA) table:

Source of Variation   SS    df    MS                F-Test
Regression            SSR   1     MSR = SSR/1       MSR/MSE
Error                 SSE   n−2   MSE = SSE/(n−2)
Total                 SST   n−1
2.7 ANOVA – Excel Output

◼ In our car plant example the ANOVA table is:

ANOVA
             df   SS            MS        F        Significance F
Regression   1    1.212381668   1.21238   40.533   8.1759E-05
Residual     10   0.299109998   0.02991
Total        11   1.511491667
Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.8 Model Evaluation

(i) Standard error of estimate (σ̂)
(ii) Coefficient of determination (R²)
(iii) Hypothesis test
  a) The t-test of the slope
  b) The F-test of the slope
2.8 Model Evaluation

SLR model evaluation using the software output:

(i) Standard error of estimate (σ̂) – Part (3)
(ii) Coefficient of determination (R²) – Part (3)
(iii) Hypothesis test
  a) The t-test of the slope – Part (1)
  b) The F-test of the slope – Part (2)
2.8 Model Evaluation - (i) Standard error of estimate (σ̂)

◼ Error (or residual): the difference between the observed value yᵢ and the corresponding fitted value ŷᵢ:

eᵢ = yᵢ − ŷᵢ

◼ Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
2.8 Model Evaluation - (i) Standard error of estimate (σ̂)

◼ For simple linear regression, the estimate of σ² is the SSE averaged over its degrees of freedom:

σ̂² = SSE / (n − 2)

◼ To estimate σ, use σ̂ = √σ̂².

◼ σ̂ estimates the standard deviation σ of the error term ε in the statistical model for simple linear regression.
2.8 Model Evaluation - (i) Standard error of estimate (σ̂)

➢ Compute the standard error of estimate from σ̂² = SSE / (n − 2),

➢ where SSE = Σᵢ₌₁ⁿ(yᵢ − ŷᵢ)² = Syy − (Sxy)² / Sxx

➢ σ̂² is an unbiased estimator of σ² (for the population).

➢ The smaller SSE is, the more successful the linear regression model is in explaining y.
2.8 Model Evaluation - (i) Standard error of estimate (σ̂) – Excel Output

Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12

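A short NumPy check (an illustration, not from the original slides) that the standard error of estimate reproduces the 0.1729 shown above:

```python
import numpy as np

x = np.array([4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2])
y = np.array([2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53])

sxx = ((x - x.mean()) ** 2).sum()
sxy = ((x - x.mean()) * (y - y.mean())).sum()
syy = ((y - y.mean()) ** 2).sum()

sse = syy - sxy**2 / sxx                  # approx. 0.29911
sigma_hat = np.sqrt(sse / (len(y) - 2))   # approx. 0.17295
print(sigma_hat)  # matches the "Standard Error" line in the Excel output
```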
2.8 Model Evaluation – (ii) Coefficient of Determination

➢ Coefficient of determination:

R² = (SST − SSE) / SST = SSR / SST = 1 − SSE / SST

➢ The proportion of variability in the observed dependent variable that is explained by the linear regression model.

➢ The coefficient of determination, denoted R², measures the strength of that linear relationship.

➢ The greater R² is, the more successful the linear model is.
2.8 Model Evaluation – (ii) Coefficient of
Determination – Excel Output

Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12

2.8 Model Evaluation – (iii) The hypothesis test

➢ Write the population regression line as β0 + β1x. The numbers β0 and β1 are parameters that describe the population.

➢ Write the least-squares line fitted to sample data as β̂0 + β̂1x.

This notation reminds us that the intercept β̂0 of the fitted line estimates the intercept β0 of the population line, and the slope β̂1 estimates the slope β1.
2.8 Model Evaluation – (iii) The hypothesis test

➢ Equivalence of the F-test and the t-test:

For a given α level, the F-test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-sided t-test.

➢ Thus, at a given α level, we can use either the t-test or the F-test for testing β1 = 0 versus β1 ≠ 0.

➢ The t-test is more flexible since it can be used for a one-sided test as well.
2.8 Model Evaluation – (iii) The hypothesis test (a. t-test)

➢ Use a t-test to check whether there is an adequate relationship between x and y.

➢ Test the hypotheses:

H0: β1 = 0 (no relationship between x and y)
H1: β1 ≠ 0 (there is a relationship between x and y)

➢ Test statistic (t-distribution):

T = (β̂1 − β1) / √(σ̂² / Sxx) = (β̂1 − β1) / se(β̂1)

➢ Critical region: |T| > t(α/2, n−2)
2.8 Model Evaluation – (iii) The hypothesis test (a. t-test) – Excel Output

              Coefficients   Standard Error   t Stat
Intercept     0.409048191    0.385990515      1.05974
X Variable 1  0.498830121    0.078351706      6.36655
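A minimal scipy sketch (an illustration, not from the original slides) reproducing this t Stat for the slope and comparing it with the critical value:

```python
import numpy as np
from scipy import stats

x = np.array([4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2])
y = np.array([2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53])
n = len(x)

sxx = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / sxx
b0 = y.mean() - b1 * x.mean()
sse = ((y - (b0 + b1 * x)) ** 2).sum()

se_b1 = np.sqrt(sse / (n - 2) / sxx)     # approx. 0.07835, as in the output above
t_stat = b1 / se_b1                      # approx. 6.37
t_crit = stats.t.ppf(0.975, df=n - 2)    # t(0.025, 10) approx. 2.228
print(abs(t_stat) > t_crit)              # True -> reject H0
```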
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test)

◼ In order to construct a statistical decision rule, we need to know the distribution of our test statistic:

F = MSR / MSE

◼ When H0 is true, the test statistic F follows the F-distribution with 1 and n − 2 degrees of freedom: F(α; 1, n − 2).
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test)

◼ This time we will use the F-test. The null and alternative hypotheses are:

H0: β1 = 0
Ha: β1 ≠ 0

◼ Construction of the decision rule: at the α = 5% level, reject H0 if F > F(α; 1, n − 2).

◼ Large values of F support Ha; values of F near 1 support H0.
2.8 Model Evaluation – (iii) The hypothesis test (b. F-test) – Excel Output

ANOVA
             df   SS            MS        F        Significance F
Regression   1    1.212381668   1.21238   40.533   8.1759E-05
Residual     10   0.299109998   0.02991
Total        11   1.511491667
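A short scipy check (an illustration, not from the original slides) of the F statistic and its critical value:

```python
from scipy import stats

f_stat = 1.21238 / 0.02991                    # MSR / MSE, approx. 40.53
f_crit = stats.f.ppf(0.95, dfn=1, dfd=10)     # F(0.05; 1, 10) approx. 4.96
p_value = stats.f.sf(f_stat, dfn=1, dfd=10)   # approx. 8.2e-05, the "Significance F"
print(f_stat > f_crit)                        # True -> reject H0
```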
Overview
➢ 2.1 Background
➢ 2.2 Introduction
➢ 2.3 Regression
➢ 2.4 Least Squares Method
➢ 2.5 Simple Linear Regression (SLR)
➢ 2.6 Software Output
➢ 2.7 ANOVA
➢ 2.8 Model Evaluation
➢ 2.9 Applications/Examples
2.9 Example 1

The manager of a car plant wishes to investigate how the plant’s electricity usage depends upon the plant’s production. The data are given below.

Production (x) ($M):         4.51 3.58 4.31 5.06 5.64 4.99 5.29 5.83 4.7 5.61 4.9 4.2
Electricity usage (y) (kWh): 2.48 2.26 2.47 2.77 2.99 3.05 3.18 3.46 3.03 3.26 2.67 2.53

i. Estimate the linear regression equation.
ii. Find the standard error of estimate of this regression.
iii. Determine the coefficient of determination of this regression.
iv. Test for significance of regression at the 5% significance level.
Solution – Example 1

x:  4.51  3.58  4.31  5.06  5.64  4.99  5.29  5.83  4.7   5.61  4.9   4.2     Σx = 58.62
y:  2.48  2.26  2.47  2.77  2.99  3.05  3.18  3.46  3.03  3.26  2.67  2.53    Σy = 34.15
xy: 11.18 8.09  10.65 14.02 16.86 15.22 16.82 20.17 14.24 18.29 13.08 10.63   Σxy = 169.25
x²: 20.34 12.82 18.58 25.60 31.81 24.90 27.98 33.99 22.09 31.47 24.01 17.64   Σx² = 291.23
i. Estimate the linear regression equation

Sxx = Σxᵢ² − (Σxᵢ)²/n = 291.23 − (58.62)²/12 = 4.8723

Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = 169.25 − (58.62)(34.15)/12 = 2.43045

Syy = Σyᵢ² − (Σyᵢ)²/n = 98.6967 − (34.15)²/12 = 1.51149
i. Estimate the linear regression equation

β̂1 = Sxy / Sxx = 2.43045 / 4.8723 = 0.4988

β̂0 = ȳ − β̂1 x̄ = 34.15/12 − 0.4988(58.62/12) = 0.4091

Estimated regression line: ŷ = 0.4091 + 0.4988x
ii. Find the standard error of estimate of this regression.

σ̂² = SSE / (n − 2); its square root σ̂ is called the standard error of estimate.

SSE = Syy − (Sxy)²/Sxx = 1.51149 − (2.43045)²/4.8723 = 0.2991

σ̂² = SSE / (n − 2) = 0.2991/10 = 0.0299, so σ̂ = √0.0299 ≈ 0.1729
iii. Determine the coefficient of determination of this regression.

R² = SSR/SST = 1 − SSE/SST = 1 − (0.2991/1.51149) = 0.802
iv. Test for significance of regression at the 5% significance level.

α = 0.05; t(α/2, n−2) = t(0.025, 10) = 2.228

Sxx = 4.8723; σ̂² = 0.0299

T = (β̂1 − β1) / √(σ̂²/Sxx) = (0.4988 − 0) / √(0.0299/4.8723) = 6.37

Critical region: |T| > t(α/2, n−2).

Since 6.37 > 2.228, reject H0; thus electricity usage does depend on the level of production.
iv. Test for significance of regression at the 5% significance level (F-test).

◼ Using our example again, let us repeat the earlier test on β1, this time with the F-test. The null and alternative hypotheses are:

H0: β1 = 0
Ha: β1 ≠ 0

◼ Let α = 0.05. Since n = 12, we require F(0.05; 1, 10). From the table we find that F(0.05; 1, 10) = 4.96. Therefore the decision rule is: reject H0 if F > F(0.05; 1, 10).

◼ Reject H0, since F = 40.53 > 4.96.
iv. Test for significance of regression at the 5% significance level.

◼ Conclusion: since 40.53 > 4.96, we reject H0; that is, there is a linear association between electricity usage and the level of production.
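A compact Python check of all four parts of Example 1 (a sketch, assuming numpy and scipy are available):

```python
import numpy as np
from scipy import stats

x = np.array([4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2])
y = np.array([2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53])
n = len(x)

sxx = ((x - x.mean()) ** 2).sum()              # 4.8723
sxy = ((x - x.mean()) * (y - y.mean())).sum()  # 2.43045
syy = ((y - y.mean()) ** 2).sum()              # 1.51149

b1 = sxy / sxx                                 # i.   0.4988
b0 = y.mean() - b1 * x.mean()                  #      0.4091

sse = syy - sxy**2 / sxx
sigma_hat = np.sqrt(sse / (n - 2))             # ii.  0.1729

r2 = 1 - sse / syy                             # iii. 0.802

t_stat = b1 / np.sqrt(sse / (n - 2) / sxx)     # iv.  6.37
print(abs(t_stat) > stats.t.ppf(0.975, n - 2)) # True -> regression is significant
```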
Excel output – Example 1

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.895605603
R Square            0.802109396
Adjusted R Square   0.782320336
Standard Error      0.172947969
Observations        12

The regression equation is: Electricity usage = 0.409 + 0.499 × Production

ANOVA
             df   SS            MS           F            Significance F
Regression   1    1.212381668   1.21238167   40.5329703   8.1759E-05
Residual     10   0.299109998   0.029911
Total        11   1.511491667

              Coefficients   Standard Error   t Stat       P-value      Lower 95%      Upper 95%
Intercept     0.409048191    0.385990515      1.05973638   0.31418974   -0.450992271   1.269088653
X Variable 1  0.498830121    0.078351706      6.3665509    8.1759E-05   0.324251642    0.673408601
Exercise

The following measurements of the specific heat of a certain chemical were made in order to investigate the variation in specific heat with temperature.

i. Plot the points on a scatter diagram.
ii. Estimate the regression line of specific heat on temperature.
iii. Determine the coefficient of determination of this regression model.
iv. Estimate the value of the specific heat when the temperature is 35 °C.

[The accompanying data table did not survive extraction]