
SMQ27103/EQT 271

ENGINEERING
STATISTICS

Semester 2
2021/2022

CHAPTER 5 :
SIMPLE LINEAR
REGRESSION

IMK, FSGM

Slide Contributors: Prof. Madya Dr. Safwati Ibrahim & Aishah Mohd Noor
CHAPTER OUTLINE

5.1 Introduction

5.2 Describing Linear Relationship between Two Quantitative Variables
    Scatter Plot
    Sample Linear Correlation Coefficient, r
    Hypothesis Testing for Population Correlation Coefficient, ρ

5.3 Regression Analysis
    Least-Square Regression Line
    How to Interpret Slope of a Least-Square Regression Line?

5.4 Assessing Least-Square Regression Line
    Variation around Least-Square Regression Line
    Goodness of Fit: Coefficient of Determination, r²
    Residuals Analysis

5.5 Inference in Simple Linear Regression
    Hypothesis Testing for Population Slope, β1
    Using Regression Equation for Making Prediction

Learning Outcomes

At the end of this chapter, you must be able to answer the following questions:
5.1 & 5.2:
• What is correlation? (Textbook: LO1-LO4)
• How to find the correlation? (Textbook: LO1-LO4)
• How to infer about the significance of the correlation? (Textbook: LO5)
• What is a simple linear regression? (Textbook: LO7)

5.3:
• What are the concepts to build/estimate a simple linear regression model? (Textbook: LO6)
• What is a simple linear regression model? What is the difference between correlation & regression? (Textbook: LO7- LO8)

5.4:
• Variation is everywhere: What are the components of variation around the regression line? (Textbook: LO9)
• How does a simple linear regression model work & how do we evaluate it? (Textbook: LO9-LO12)
• How to check the assumptions that need to be met for a simple linear regression model to be valid? (Textbook: LO13)

5.5:
• How to perform statistical inference for a simple linear regression model? (Textbook: LO14)
• How a simple linear regression model is used to estimate and predict likely values? (Textbook: LO15)
The Learning Dots

What will you learn?


Why it is important?
When it will be useful?
How it should be applied?
Simple Linear Regression: Correlation Analysis and Regression Analysis

Connecting the dots is the way we make information useful ~ Carolyn Sloan
https://www.linkedin.com/pulse/connecting-dots-closer-look-value-learning-how-learn-carolyn-sloan
5.1 INTRODUCTION

Textbook Pg 186
HOW TWO VARIABLES ARE RELATED?

What do you see from the plots?


[Figure: two scatter plots, Plot A and Plot B, each showing y-variable against x-variable.]
HOW TWO VARIABLES ARE RELATED?

• This chapter helps us to answer the following question:

How to describe the relationship between two variables?

• Regression analysis can be used when we want to describe the relationship between a set of predictor (independent) variables and an outcome (dependent) variable.

• In this chapter, the scope of our discussion is investigating the relationship between one predictor variable and one outcome variable.
REGRESSION ANALYSIS

Example (Medical Application):
Is there any relationship between BMI (kg/m2) and % of body fat?
Image Source: support.minitab.com

Example (Manufacturing Application):
Is there any relationship between displacement and mpg (miles per gallon)?
Image Source: https://www.polarsparc.com/xhtml/LinearRegression-3.html

How can we describe the pattern of each plot?



Simple Linear Regression: What & Why

• Fit a straight line to a set of data points to study how two variables are related.
• This relationship is described by a linear function.
• Used to estimate and explain the relationship / association between two variables.
• Used to predict the value of the outcome (dependent) variable based on the value of the predictor (independent) variable.
• Used to explain how a change in the predictor variable influences the outcome variable.
Recall: What are the variables in the plots of the medical and manufacturing examples above? Are they quantitative or qualitative variables?
Simple Linear Regression: What & Why

• Role of variables, depending on context:

  Outcome Variable             Predictor Variable
  dependent variable (DV)      independent variable (IV)
  response variable            explanatory variable
  y-variable                   x-variable

• How many DVs? One quantitative DV.
• How many IVs? One quantitative IV.

  Dependent variable (DV): the variable that you are trying to understand/predict.
  Independent variable (IV): the variable that you think/hypothesize has an impact/influence on the DV.

• ‘Simple’: Why? - 1 predictor (independent) variable.
• What if there are 2 or more predictors / IVs? - Multiple Linear Regression.

Simple linear regression is a statistical method that allows us to examine/investigate relationships between two quantitative variables.
5.2 DESCRIBING LINEAR RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES

Textbook Pg 187
How to find the relationship / association using a graph?

Scatter Plot

WHAT is a SCATTER PLOT?
• A graphical way to represent values for two different variables using dots.
• Each dot in the plot refers to an ordered pair (x, y) of data points on a two-dimensional graph.

WHEN to use?
• To explore potential relationship/association between the
two variables (pairs of variables).
• To answer this question: As the values of one variable
change, do we see corresponding changes in the other
variable?

HOW to draw?
• x-axis : predictor/explanatory/independent variable.
• y- axis : outcome/response/dependent variable.
How to find the relationship / association using a graph?

Scatter Plot

• WHAT does the PLOT tell us?
  (1) Shape - linear, non-linear (curved, etc).
  (2) Direction - positive or negative.
  (3) Presence of outliers, gaps, clusters.
  (4) Strength - judged intuitively; a numerical measure is required to describe the strength.

• Useful for exploratory data analysis (initial investigations to discover patterns and anomalies in a dataset).

• It can be difficult to see the pattern when there are many data points and overlapping points.

A scatter plot is a graphical method that allows us to examine/investigate relationships between two quantitative variables.
What are the possible relationships / associations that may exist?

• Positive relationship: as the x-variable increases, the y-variable increases.
• Negative relationship: as the x-variable increases, the y-variable decreases.
• Curvilinear relationship
• No relationship

How can we describe the pattern of each plot?



How to describe numerically the relationship between two
variables?
WHAT is correlation coefficient?

• “co” means together, “relation” means connection/link between two


variables.
• Used to determine the ………….. and ………….. of linear relationship
between y – variable and x – variable.
• The sample correlation coefficient is denoted by 𝒓.

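HOW to calculate r?

Formula to calculate r:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} $$

Equivalent formula to calculate r:

$$ r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right] \left[ \sum_{i=1}^{n} y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \right)^2 \right]}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} $$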
How to describe numerically the relationship between two
variables?
WHAT is the property of r?

The range of the correlation coefficient is between -1 and 1:

−1 ≤ r ≤ 1

[Figure: scale of correlation strength, labelled very weak, weak, moderate, strong and very strong.]
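
The formula r = Sxy / sqrt(Sxx Syy) can be sketched in a few lines of Python. This is only an illustration: the data values and the function name `sample_correlation` are hypothetical and do not come from the slides or the textbook.

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    # Computational forms of the corrected sums of squares and cross-products
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
print(round(r, 4))  # 0.7746
```

For these made-up values Sxy = 6, Sxx = 10 and Syy = 6, so r lies between -1 and 1 as required.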
Relationship between Correlation Coefficient & Scatter Plot

r=1 r = 0.82 r = 0.63 r = -0.55 r =-0.94 r = -1

What do you notice from the scatter plot?


How to describe numerically the relationship between two
variables?

OTHER EXAMPLES
How does the correlation coefficient relate to a scatter plot?

HOW to interpret r?
• The nearer r is to 1 or -1, the …………. the linear relationship.
• The nearer r is to 0, the ………….. the linear relationship.
• If r is ………….. , we can say that a linear relationship may not exist.
• If a linear relationship does not exist, it doesn’t mean there is no relationship.

[Figures: a non-linear relationship and a curvilinear relationship.]
Exercise 1
What do you think the correlation coefficient for Plot A & Plot B would be?
To validate your initial assumption/prediction, calculate the sample correlation coefficient by using Excel.
Interpret the value.

Excel Function: Textbook Pg 191

Solution:

[Figure: scatter plots of y-variable against x-variable, Plot A and Plot B.]

How does the calculated correlation coefficient relate to the corresponding scatter plot?
Scatter Plot vs Correlation Coefficient

Recall the things that you know about:

Basis of Comparison Scatter Plot Correlation Coefficient

From the scatter plot and the correlation coefficient, can we generalize the
existence of a linear relationship to the population of interest?
Sample Correlation Coefficient : From Sample to Population

WHAT does it mean when r is NOT equal to zero?

The value of r is computed from a sample, so there are two possibilities when r is not equal to zero. Either:

• the value of r is large enough to conclude that there is a significant linear relationship/correlation between the variables,

OR

• the apparent correlation exists by chance or because of a random/unknown factor.

To conclude whether there is a significant linear relationship/correlation, we need to perform the appropriate hypothesis test.
Hypothesis testing for population correlation coefficient, ρ

Step 1. State the null and alternative hypotheses.
  Procedure:
    H0: ρ = 0, or there is no linear correlation between y-variable and x-variable.
    H1: ρ ≠ 0, or there is a linear correlation between y-variable and x-variable.
  Conceptual understanding:
    • A claim about the …………..
    • Assume H0 is …………. until sample evidence …………. H0.
    • Our objective is to …………. H1.

Step 2. Obtain the critical values.
  Procedure:
    $\pm t_{\alpha/2,\, df = n-2}$, since the test is a …………. test (H1: ρ ≠ 0).
  Conceptual understanding:
    • Value that separates the rejection and non-rejection regions.
    • Determined by the significance level & the type of test.
    • Significance level is the …………. (probability you …………. when H0 is ………….).

Step 3. Compute the test statistic.
  Procedure:
    t-test used to determine the significance of linear correlation:
    $$ t_{test} = r \sqrt{\frac{n-2}{1-r^2}} $$
  Conceptual understanding:
    • Value that we use to make a decision about the …………. hypothesis.
    • The …………. the value, the …………. the chance to support H1.

Step 4. Make the decision.
  Procedure:
    We reject H0 if $t_{test} \ge t_{\alpha/2,\, n-2}$ or $t_{test} \le -t_{\alpha/2,\, n-2}$.
  Conceptual understanding:
    • …………. when the test statistic falls in the rejection region.
    • …………. when the test statistic falls in the non-rejection region.
    • We are at risk of making a Type-1 or Type-2 error.

Step 5. State the conclusion in context of the problem and claim.
  Conceptual understanding:
    • If you reject H0, it means that you have …………. evidence from the sample data to support H1 at the specified significance level.
    • Otherwise, there is not enough evidence. Lack of evidence doesn’t prove that the relationship does not exist.
Hypothesis testing for population correlation coefficient, ρ

Decision: Reject the null hypothesis
  There is sufficient evidence to conclude that there is a linear correlation between y-variable and x-variable.

Decision: Fail to reject the null hypothesis
  There is insufficient evidence to conclude that there is a linear correlation between y-variable and x-variable.

Remarks:
Always write the conclusion:
• in context of the x-variable & y-variable.
• with the significance level stated.
Exercise 2
i. Can you conclude that the correlation between the two variables is statistically significant at the 5% level?
ii. How would you relate i. to the sample correlation coefficient calculated in Exercise 1?

[Figure: scatter plot of y-variable against x-variable.]

Solution:
i.
1. State hypotheses:
   H0: There is no linear relationship between ….. and …… (ρ = 0)
   H1: There is a linear relationship between ….. and …… (ρ ≠ 0)

2. Obtain the critical value: $\pm t_{\alpha/2,\, df=n-2} = \pm t_{0.05/2,\, df=25-2} = \pm 2.0687$

3. t-test: $t = r \sqrt{\frac{n-2}{1-r^2}} = 0.9574 \sqrt{\frac{25-2}{1-(0.9574)^2}} = 15.918$

4. Make the decision:

5. Conclusion:
   There is sufficient evidence to conclude that there is a significant linear relationship between … and ….. at the 5% significance level.
Exercise 3
i. Can you conclude that the correlation between the two variables is statistically significant at the 5% level?
ii. How would you relate i. to the sample correlation coefficient calculated in Exercise 1?

[Figure: scatter plot of y-variable against x-variable.]

Recall: What does it mean when we can conclude that the linear correlation is “statistically significant”?

Solution:
Example 5.5 (textbook):
Perform the hypothesis test for the significance of linear correlation between Physics scores and Mathematics scores in Example 5.1 at α = 0.05.

Solution:
1. H0: There is no linear relationship between Physics scores and Mathematics scores (ρ = 0)
   H1: There is a linear relationship between Physics scores and Mathematics scores (ρ ≠ 0)

2. Obtain the critical value: $\pm t_{\alpha/2,\, df=n-2} = \pm t_{0.05/2,\, df=7-2} = \pm 2.5706$

3. t-test: $t = r \sqrt{\frac{n-2}{1-r^2}} = 0.991 \sqrt{\frac{7-2}{1-(0.991)^2}} = 16.55$

4. Make the decision: Reject the null hypothesis, since the test value falls in the rejection region, $t_{test} = 16.55 > 2.5706$.

   [Figure: t-distribution with critical values -2.5706 and +2.5706; the test value 16.55 lies in the right-hand rejection region.]

5. Conclusion:
   There is sufficient evidence to conclude that there is a significant linear relationship between Physics scores and Mathematics scores at the 5% significance level.
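
The arithmetic in steps 2 to 4 of Example 5.5 can be checked with a short Python sketch. The values r = 0.991, n = 7 and the critical value 2.5706 are taken from the example above; the function and variable names are illustrative assumptions, not part of the textbook.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, computed as r * sqrt((n - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.991, 7          # values quoted in Example 5.5
t_critical = 2.5706      # t-table value for alpha/2 = 0.025, df = 5, as quoted above

t_test = correlation_t_statistic(r, n)

# Two-tailed decision rule: reject H0 if t_test >= t_critical or t_test <= -t_critical
reject_h0 = abs(t_test) >= t_critical

print(f"t_test = {t_test:.2f}, reject H0: {reject_h0}")
# t_test = 16.55, reject H0: True
```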
List three things you have learned so far ….
QUESTIONS FOR SELF-REFLECTION
Summary from Correlation Analysis / Describing Linear Relationship

• Why scatter plot? (LO2 Textbook)
• What are the possible relationships between two variables? (LO2 Textbook)
• How to numerically measure the strength of a relationship between two variables? (LO3-4 Textbook)
• How to determine the significance of linear correlation? (LO5 Textbook)
What if the hypothesis test for the population correlation coefficient is …..

significant (reject H0)?
• The NEXT STEP is to determine the equation of the regression line, also called the data’s line of best fit or best-fit line.

not significant (fail to reject H0)?
• Determine the regression line and make predictions on the basis of the sample data only.
• No attempt is made to generalize to the population level.
5.3 REGRESSION ANALYSIS

Textbook Pg 195
WHAT IS A LEAST-SQUARE REGRESSION LINE OR BEST-FIT LINE?

Which of the straight lines best explains the relationship between the y-variable and the x-variable?

The BEST-FIT LINE: the line for which the vertical distances (deviations) between the observed values (the green dots) and the straight line (yellow line) are, overall, the smallest (“least”).
WHAT IS THE RELATIONSHIP BETWEEN
SAMPLE CORRELATION COEFFICIENT
& BEST-FIT LINE?

When r is positive, the line slopes upward to the right or, as the value of x ..……, y ……….

When r is negative, the line slopes downward to the right or, as the value of x ………., y ………..

• The closer the line is to all the points on the scatter plot, the stronger the linear relationship.

• A line of best fit is needed because the values of y will be predicted from the values of x; hence, the closer the points are to the line, the better the fit and the prediction will be.
Deterministic & Probabilistic Model

Deterministic Model
• Output can be predicted with certainty. No elements of randomness/uncertainty.
• In mathematics, a deterministic model is a mathematical model in which the output is determined only by the specified values of the input data and the initial conditions.
• Every time you run the model with the same initial conditions, you will get the same results.
• Example: Relationship between the area of a circle and its radius.

Probabilistic Model
• Output can be different for the same inputs. Includes elements of randomness/uncertainty.
• Every time you run the model with the same initial conditions, you will likely get different results.
• Example (observational study): Relationship between cholesterol level and age.
• Example (experimental study): Relationship between blood pressure reading and stress test score.
Simple Linear Regression Model

• WHAT? A model that expresses the linear relationship between two quantitative variables.

• The population model/equation is written as:

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$

where
  $\beta_0$ is the population intercept of the line with the y-axis,
  $\beta_1$ is the population slope of the line,
  $\varepsilon_i$ is the random error/unexplained variation in y.

The random error (residual) takes into account other unknown factors that are not included in the model.
POPULATION & SAMPLE SIMPLE LINEAR REGRESSION MODEL / EQUATION

                               Population                            Sample
y-intercept                    $\beta_0$                             $\hat{\beta}_0$
Slope/regression coefficient   $\beta_1$                             $\hat{\beta}_1$
Regression equation            $\mu_{y|x} = \beta_0 + \beta_1 x$     $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

• Population regression equation: the true mean of y for a given x. We assume all points fall on the line.
• Sample regression equation: the estimated regression equation.
• Variation is everywhere! Y-values may not fall on the line, so the model includes a random error component: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

How to estimate the population intercept & slope/regression coefficient?
GRAPHICAL ILLUSTRATION OF POPULATION
SIMPLE LINEAR REGRESSION MODEL

Random error is the difference between the observed/actual value and the predicted/estimated value of y at xi.
HOW TO ESTIMATE THE POPULATION PARAMETERS?
LEAST-SQUARE METHOD

Commonly used to estimate the population parameters $\beta_0$ and $\beta_1$.

How it works: The sum of squares (SS) of the …………. of data points (observed values) and the regression line …………. .

The estimated equation for the best-fit line/regression line is:

$$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x $$

where
  $\hat{y}$ is the estimated value of y for a given value of x,
  $x$ is the independent variable / explanatory variable,
  $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$ is the slope,
  $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ is the intercept.

The line always passes through the mean of both x-variable and y-variable, $(\bar{x}, \bar{y})$.
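
The slope and intercept formulas above can be sketched in Python as follows. The data set and the function name `least_squares_fit` are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the least-square regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sxy and Sxx in deviation form
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx                    # beta1_hat = Sxy / Sxx
    intercept = y_bar - slope * x_bar    # beta0_hat = y_bar - beta1_hat * x_bar
    return intercept, slope

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_fit(x, y)
print(f"y_hat = {b0:.1f} + {b1:.1f}x")  # y_hat = 2.2 + 0.6x

# The fitted line passes through the point of means (x_bar, y_bar) = (3, 4)
y_at_x_bar = b0 + b1 * (sum(x) / len(x))
```

Here the slope 0.6 reads as: for each 1-unit increase in x, the predicted y increases by 0.6 units on average.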
Properties of Least-Square Regression Line & Residuals

1. $\sum \hat{\varepsilon}_i = 0$
2. $\sum (\hat{\varepsilon}_i)^2$ is as small as possible.
3. Passes through the point $(\bar{x}, \bar{y})$.
4. Unbiased estimate.
5. The estimated slope is the average change in y for a 1-unit change in x.

Textbook Pg 197
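Properties of Residuals

For i = 1, 2, …, n:

1. $E(\hat{\varepsilon}_i) = 0$
2. $\sigma^2_{\hat{\varepsilon}_1} = \sigma^2_{\hat{\varepsilon}_2} = \dots = \sigma^2_{\hat{\varepsilon}_n} = \sigma^2$
3. $\hat{\varepsilon}_i \sim N(0, \sigma^2)$

Textbook Pg 197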
Exercise 4
i. By using Excel, find the best-fit line and the regression equation, and interpret the slope of the regression line.
ii. How do you relate Exercise 1 to the results obtained from i.?

Excel Function: Textbook Pg 198

Solution:

[Figure: scatter plots of y-variable against x-variable, Plot A and Plot B.]

How to interpret the slope of a regression line?

Solution:
Which part do you need clarified?
QUESTIONS FOR SELF-REFLECTION
Summary from Regression Analysis

• Why regression? (LO6 Textbook)
• What is the concept behind the best-fit line? (LO6 Textbook)
• What is the equation of the best-fit line? (LO6 Textbook)
• How do you relate the best-fit line and the sample correlation coefficient?
• How to interpret the slope? (LO7 Textbook)
• What is the relationship between the slope of the regression equation and the sample correlation coefficient?
• What is the difference between correlation & regression?
What is the difference between CORRELATION & REGRESSION?

Basis of comparison: What is the purpose?
  Correlation: To examine the co-relationship between two quantitative variables.
  Regression: To find the best-fit line to describe the relationship between DV and IV.

Basis of comparison: When to use?
  Correlation: To quantify the relationship (strength and direction) between two variables.
  Regression: To estimate how much one variable (DV) is affected by another variable (IV).

Basis of comparison: What is the role of DV and IV?
  Correlation: Perform the experiment!
  Regression: Perform the experiment!
  Interchange the pair (x,y) to (y,x). Re-calculate the correlation coefficient & find the best-fit line. What do you notice?
5.4 ASSESSING LEAST-SQUARE REGRESSION LINE

Textbook Pg 200
VARIATIONS AROUND LEAST-SQUARE REGRESSION LINE

[Figure: graphical illustration of deviations in regression analysis, showing three vertical distances.]
• Total deviation: vertical distance of the data point to y-bar.
• Vertical distance of the observed data point to the line.
• Vertical distance of the estimated line / predicted y to y-bar.

Regression analysis involves measuring the amount of variation not accounted for by the regression equation; this variation is known as the unexplained variation/deviation.

Textbook Pg 201
VARIATIONS AROUND LEAST-SQUARE REGRESSION LINE

Total Variation
Variation of each observed y around $\bar{y}$:

$$ \sum (y - \bar{y})^2 $$

Total variation is the sum of:

• ……………. variation: $\sum (y - \hat{y})^2$
  A variation due to random chance/factors other than the relationship between x and y.

• …………. variation: $\sum (\hat{y} - \bar{y})^2$
  A variation due to the relationship between x and y.

Sum of explained variations (SS Regression, SSR); sum of unexplained variations (SS Error, SSE).
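
A small numerical check can make the decomposition of total variation concrete. The sketch below uses a hypothetical data set (not from the slides) and the name SST for the total sum of squares, which is an assumed label; it fits the least-square line and then confirms that the total equals SSR plus SSE.

```python
# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Predicted values from the fitted line
y_hat = [intercept + slope * a for a in x]

sst = sum((b - y_bar) ** 2 for b in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)         # SS Regression
sse = sum((b - yh) ** 2 for b, yh in zip(y, y_hat))  # SS Error

print(round(sst, 4), round(ssr, 4), round(sse, 4))   # 6.0 3.6 2.4
```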
RELATIONSHIP BETWEEN VARIATION AROUND REGRESSION LINE & CORRELATION COEFFICIENT

[Figure: scatter plots of y-variable against x-variable, Plot A and Plot B.]

Which plot would give the higher explained variation component? Why?

• When the ………….. variation is small, the value of r is close to ………….. . Why?

• If all points fall on the regression line, the unexplained variation will be …… and the sample correlation coefficient value will be …………. ?
Goodness of Fit : Coefficient of Determination

WHAT is a coefficient of determination?

• A measure of the proportion of the variation in y that is explained by the regression line and x.

• It is the ratio of the explained variation to the total variation and is denoted by r².

What would happen to the r² value if the percentage of unexplained variation around the regression line is higher than the explained variation? Why?

Textbook Pg 202
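
As a rough illustration of the ratio definition, the sketch below computes the coefficient of determination as explained variation divided by total variation and reports it as a percentage. The data values and the function name `coefficient_of_determination` are assumptions made for this example only.

```python
def coefficient_of_determination(x, y):
    """r^2 = explained variation / total variation for the least-square line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    explained = sum((intercept + slope * a - y_bar) ** 2 for a in x)
    total = sum((b - y_bar) ** 2 for b in y)
    return explained / total

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_squared = coefficient_of_determination(x, y)
print(f"explained: {r_squared:.0%}, unexplained: {1 - r_squared:.0%}")
# explained: 60%, unexplained: 40%
```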
Properties of Coefficient of Determination

• usually expressed as a percentage.

• takes a value within the range 0 ≤ r² ≤ 1.

• If r² = 0, then the regression line cannot explain any of the variation, which means that the …………. variable cannot be predicted from the …………. variable.

• If r² = 1, then the regression line explains 100% of the variation in the …………. variable, which implies that the …………. variable can be predicted without error from the …………. variable.

Is it possible to get r² = 1?
Exercise 5
By using Excel and from Exercise 4,
i. What is the percentage of explained variation and unexplained variation?
ii. Calculate & interpret the coefficient of determination for both plots.

Excel Function: Textbook Pg 199

Solution:

[Figure: scatter plots of y-variable against x-variable, Plot A and Plot B.]

Solution:

From the results, discuss the following:
• What is the relationship between r and r²?
• Compare the r² of both plots, what do you notice?
• What can you summarize from the above exercise?

Solution:
Simple Linear
Regression
Using Excel
Residuals Analysis

WHAT is a residual?
The difference between the actual value $y_i$ and the predicted value $\hat{y}_i$ for a given x.

Residuals or prediction errors: $\hat{\varepsilon}_i = y_i - \hat{y}_i$

Textbook Pg 204
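
The definition of a residual can be sketched in Python as below. The data set is hypothetical, and the fitted coefficients are computed inline with the least-square formulas so that the block runs on its own.

```python
# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least-square estimates: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
x_bar, y_bar = sum(x) / n, sum(y) / n
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y at the same x
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)        # True: least-square residuals sum to zero
```

Plotting these residuals against x or against the fitted values is what the residual plot in the following slides refers to.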
Residuals Analysis:
How to Check Assumptions of a Regression Model?

If a linear model is valid, the residuals (estimated errors) will:

Assumption: be approximately normally distributed (with a mean of zero)
  How to analyze? Graphical: Histogram / Normal Probability Plot. Numerical: Test statistic (AD-Statistic, etc.)

Assumption: have a constant variance (homoscedasticity)
  How to analyze? Graphical: Residual Plot (residuals vs independent variable / predicted value / fitted value)

Assumption: be independent of each other
  How to analyze? Graphical: Sequence / Order Plot (residuals vs observation order)

Additional Reading Link to: Simple-linear-regression-assumptions.

Textbook Pg 205
NORMAL PROBABILITY PLOT

Do the points roughly form a straight line?

Image Source: https://courses.lumenlearning.com/suny-natural-resources-biometrics/chapter/chapter-7-correlation-and-simple-linear-regression/

Textbook Pg 206
RESIDUAL PLOTS

Are the residuals randomly scattered around the horizontal line (center line) at a mean of zero?

[Figure: four residual plot patterns: random, funnel, non-linear and double bow.]

• The mean of the residuals is always zero.
• The x-axis can be the x-variable/IV or the fitted value/predicted value.

How to interpret a residual plot?

Textbook Pg 207
Exercise 6
By using Excel Data Analysis ToolPak,
i. obtain the residuals.
ii. are the residual assumptions met?

Excel Function: Textbook Pg 216

Solution:

[Figure: scatter plots of y-variable against x-variable, Plot A and Plot B.]

Solution:

Which part do you need help with?
QUESTIONS FOR SELF-REFLECTION
Summary from Regression Analysis II

• Variation is everywhere! What is the difference between explained variation and unexplained variation?
• Why do we calculate the coefficient of determination? (LO 9-10 Textbook)
• What is the relationship between the coefficient of determination and variations around the regression line? (LO 11 Textbook)
• How do you know the regression model is adequate? (LO 12-13 Textbook)
• How do you assess the assumptions of a regression model? (LO 13 Textbook)
• What is a normal probability plot? When do we use the plot? Why is it important? (LO 13 Textbook)
• What is a residual plot? When do we use the plot? Why is it important? (LO 13 Textbook)
5.5 INFERENCE IN SIMPLE LINEAR REGRESSION

Textbook Pg 200
Hypothesis testing for population slope, β1

Step 1. State the null and alternative hypotheses.
  Procedure:
    H0: There is no linear relationship between y-variable and x-variable ($\beta_1 = 0$).
    H1: There is a linear relationship between y-variable and x-variable ($\beta_1 \ne 0$).
  Conceptual understanding:
    • What is your claim?
    • What is the assumption about H0?
    • What is your objective/goal?

Step 2. Obtain the critical value.
  Procedure:
    $F_{CV} = F_{df_1,\, df_2,\, \alpha}$
  Conceptual understanding:
    • Why is this value important?
    • How to determine this value? Are there any conditions?
    • What do you think you know about significance level?

Step 3. Compute the test statistic.
  Procedure:
    $$ F_{test} = \frac{\text{explained variation in } y}{\text{unexplained variation in } y} = \frac{MS_{Regression}}{MS_{Error}} $$
  Conceptual understanding:
    • Why is this value important?

Step 4. Make the decision.
  Procedure:
    We reject H0 if $F_{test} > F_{CV}$.
  Conceptual understanding:
    • When can you reject H0?
    • What is the risk you may take when you perform a hypothesis test? Why?

Step 5. State conclusion in context of the claim.
  Conceptual understanding:
    • What does it mean when you reject/fail to reject H0?
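
The F-test steps above can be sketched in Python under a few stated assumptions. The data set is hypothetical; the degrees of freedom df1 = 1 and df2 = n - 2 are the usual ones for simple linear regression; and the critical value is obtained by squaring the two-tailed t-table value for df = 3 (3.1824), using the fact that an F critical value with (1, n - 2) degrees of freedom equals the square of the corresponding two-tailed t critical value.

```python
# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * a for a in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)         # SS Regression
sse = sum((b - yh) ** 2 for b, yh in zip(y, y_hat))  # SS Error

df_regression = 1        # one predictor
df_error = n - 2
ms_regression = ssr / df_regression
ms_error = sse / df_error
f_test = ms_regression / ms_error

# Assumption: alpha = 0.05; t-table value for alpha/2 = 0.025, df = 3 is 3.1824
t_critical = 3.1824
f_critical = t_critical ** 2   # F critical value for (1, 3) degrees of freedom

reject_h0 = f_test > f_critical
print(f"F_test = {f_test:.2f}, F_CV = {f_critical:.2f}, reject H0: {reject_h0}")
# F_test = 4.50, F_CV = 10.13, reject H0: False
```

With only five made-up points the evidence is insufficient, so H0 is not rejected even though the sample slope is positive.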
Hypothesis testing for population slope, β1

Decision: Reject the null hypothesis
  There is sufficient evidence to conclude that there is a linear relationship between y-variable and x-variable.

Decision: Fail to reject the null hypothesis
  There is insufficient evidence to conclude that there is a linear relationship between y-variable and x-variable.

Remarks:
Always write the conclusion:
• in context of the x-variable & y-variable.
• with the significance level stated.
Exercise 7
By using Excel Data Analysis ToolPak, perform the hypothesis test for the significance of the linear relationship.
Excel Function: Textbook Pg 216

REFER TO CHAPTER 5 EXCEL WORKSHEET

Solution:
Using Regression Equation for Making Prediction

• The regression line can be used to make predictions for the …………. variable.

• When a ŷ value is predicted for a specific …………., the prediction is a point prediction.

• The magnitude of change in one variable when the other variable changes by exactly 1 unit is called a marginal change, represented by the value of …………..

• For example, if ŷ = 2 + 3x, then for x = 4, ŷ = 2 + 3(4) = 14.

• Do not use the equation for predicting y when the value of x is not in the range of sample data used to develop the equation. Why?
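
The point prediction in the example above, together with the warning about the range of x, can be sketched as a small Python function. The equation ŷ = 2 + 3x comes from the slide; the sample range of x from 1 to 10 and the function name `predict` are hypothetical, since the slide does not state the data used to fit the line.

```python
def predict(x_new, intercept=2.0, slope=3.0, x_min=1.0, x_max=10.0):
    """Point prediction y_hat = intercept + slope * x_new.

    x_min and x_max are the assumed (hypothetical) limits of the x values
    in the sample used to fit the line; predictions outside are refused.
    """
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the sample range [{x_min}, {x_max}]"
        )
    return intercept + slope * x_new

print(predict(4))  # 14.0

try:
    predict(25)    # outside the assumed sample range
except ValueError as err:
    print("No prediction:", err)
```

Raising an error is one possible design choice; returning the value with a warning would also work, as long as the user is told that the prediction lies outside the data used to develop the equation.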
QUESTIONS FOR SELF-REFLECTION
Summary from Regression Analysis III

• Why do we perform a hypothesis test? (LO 15 Textbook)
• What is the relationship between variations around the regression line and the significance of a test for linear relationship? (LO 15 Textbook)
• How do we know the regression model can be used for prediction? (LO 16 Textbook)
HYPOTHESIS TESTING FOR SIGNIFICANCE OF LINEAR CORRELATION & LINEAR RELATIONSHIP BETWEEN TWO VARIABLES

What is the difference between the two statistical tests?

• Test for Linear Correlation: t-test
• Test for Linear Relationship: F-test (ANOVA Table)
REFRESH YOUR
MIND
REFLECT & EXPLAIN
Least Square Method Regression Normal Probability Plot

Regression Line Best-Fit Line


Linear Regression
Equation / Model Explained Variation
Slope of Regression Equation
Residuals Simple Linear Regression
Unexplained
Residuals Plot
Linear Relationship Scatter Plot Variation

Correlation Coefficient Describing Residuals Plot


Notations

Interpreting Correlation Describing Scatter Plot


Coefficient Correlation
Coefficient of Determination
Interpreting Coefficient Predicted value
of Determination Significance of Linear Relationship

Intercept of Regression Equation


COMPARE & CONTRAST

Correlation Regression

Explained Variation Unexplained Variation

Regression Line Regression Equation

Intercept of Regression Equation Slope of Regression Equation

Observed value Predicted value

Correlation Coefficient Coefficient of Determination

Significance of Correlation Coefficient Significance of Slope


THE END
