
Topic

Simple Linear Regression


Learning Objectives

On completion of this lecture/session students should be able to:

• Evaluate the relationship between two variables
• Predict relationships between two variables using a linear regression model
• Recognise the equation of a simple regression line from a sample of data and interpret
the slope and intercept of the equation
• Realise the usefulness of residual analysis in testing the assumptions underlying
regression analysis, in examining the fit of the regression line to the data, and in
testing model adequacy
• Estimate the prediction equation
• Conduct prediction using statistical packages

Key Terms

• Simple Linear Regression Model
• Simple Linear Regression Equation
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance

Simple Linear Regression

• Managerial decisions often are based on the relationship between two or more variables.
• Regression analysis can be used to develop an equation showing how the variables are related.
• The variable being predicted is called the dependent variable and is denoted by y.
• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.

Simple Linear Regression

• Simple linear regression involves one independent variable and one dependent variable.
• The relationship between the two variables is approximated by a straight line.
• Regression analysis involving two or more independent variables is called multiple regression.

Simple Linear Regression Model

• The equation that describes how y is related to x and an error term is called the regression model.
• The simple linear regression model is:

$y = \beta_0 + \beta_1 x + \varepsilon$

where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\varepsilon$ is a random variable called the error term.

Simple Linear Regression Equation

• The simple linear regression equation is:

$E(y) = \beta_0 + \beta_1 x$

• The graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• $E(y)$ is the expected value of y for a given x value.

Simple Linear Regression Equation

• Positive Linear Relationship

[Figure: regression line of E(y) against x, with intercept $\beta_0$ and positive slope $\beta_1$.]

Simple Linear Regression Equation

• Negative Linear Relationship

[Figure: regression line of E(y) against x, with intercept $\beta_0$ and negative slope $\beta_1$.]

Simple Linear Regression Equation

• No Relationship

[Figure: horizontal regression line of E(y) against x, with intercept $\beta_0$ and slope $\beta_1$ equal to 0.]

Types of Linear Relationships

[Figure: four scatter plots of Y against X, contrasting strong relationships with weak relationships.]

08/14/2020 1
Estimated Simple Linear Regression Equation

• The estimated simple linear regression equation is:

$\hat{y} = b_0 + b_1 x$

• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.

Estimation Process

1. Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
   Regression equation: $E(y) = \beta_0 + \beta_1 x$
   Unknown parameters: $\beta_0$, $\beta_1$
2. Sample data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$
3. Estimated regression equation: $\hat{y} = b_0 + b_1 x$
   Sample statistics: $b_0$, $b_1$
4. $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.

Least Squares Method
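• Least Squares Criterion (no need to memorise this formula)

$\min \sum (y_i - \hat{y}_i)^2$

where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

It is called "least squares" because the line of best fit is the one that minimises the sum of the squared errors.
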
Least Squares Method

• Slope for the Estimated Regression Equation

$b_1$ = slope of the estimated regression equation

• y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$

where $\bar{x}$ and $\bar{y}$ are the sample means of x and y.

Simple Linear Regression

• Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.

Simple Linear Regression

 Example: Reed Auto Sales

Number of TV Ads (x)    Number of Cars Sold (y)
         1                        14
         3                        24
         2                        18
         1                        17
         3                        27

$\sum x = 10$, $\sum y = 100$
$\bar{x} = 2$, $\bar{y} = 20$

Estimated Regression Equation

• Slope for the Estimated Regression Equation

Assume this is computed and given: $b_1 = 5$

• y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$

• Estimated Regression Equation

$\hat{y} = 10 + 5x$

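The slides take the slope as given. As a minimal sketch of where it could come from, the code below applies the usual textbook least squares formula, slope = sum of (x - x̄)(y - ȳ) divided by sum of (x - x̄)², to the Reed Auto sample. That formula and the helper name `fit_line` are assumptions for illustration; they do not appear in the slides.

```python
# Reed Auto Sales sample from the slides: TV ads (x) and cars sold (y).
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def fit_line(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x.

    Hypothetical helper; uses the standard textbook slope formula,
    which the slides do not state.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # intercept formula from the slides
    return b0, b1


b0, b1 = fit_line(ads, cars)
print(f"y_hat = {b0:.1f} + {b1:.1f}x")  # prints: y_hat = 10.0 + 5.0x
```

The result agrees with the slide values of 5 for the slope and 10 for the intercept.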
Using Excel’s Chart Tools for
Scatter Diagram & Estimated Regression Equation

[Figure: "Reed Auto Sales Estimated Regression Line", a scatter diagram of Cars Sold against TV Ads with the fitted line y = 5x + 10.]

Coefficient of Determination
• The coefficient of determination, also known as R squared, is
interpreted as the goodness of fit of a regression.
• The higher the coefficient of determination, the larger the share of the variance in
the dependent variable that is explained by the independent variable.
• The coefficient of determination is the overall measure of the
usefulness of a regression.

In our example, $r^2 = .8772$.

The regression relationship is very strong; 87.72% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

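To see where a value such as .8772 could come from, here is a small sketch that computes the coefficient of determination for the Reed Auto sample as the regression sum of squares divided by the total sum of squares. That ratio is the usual definition and is assumed here; the slides report only the result.

```python
# Reed Auto Sales sample and the estimated equation y_hat = 10 + 5x.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

y_bar = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]

# Total variation in y, the part explained by the line, and the remainder.
sst = sum((y - y_bar) ** 2 for y in cars)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted))

r_squared = ssr / sst  # assumed definition: explained / total
print(f"SST={sst:.0f} SSR={ssr:.0f} SSE={sse:.0f} r^2={r_squared:.4f}")
# prints: SST=114 SSR=100 SSE=14 r^2=0.8772
```

The three sums of squares match the SS column of the Excel ANOVA table shown later in the slides (100, 14 and 114).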
Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

The correlation coefficient gives us the strength of the relationship.


Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$r_{xy} = +\sqrt{.8772}$

$r_{xy} = +.9366$
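The calculation above can be checked with a short sketch. It takes the signed square root of the coefficient of determination and, as a cross-check, also computes the correlation directly from the sample deviations. The direct formula is the standard Pearson one and is an assumption here, since the slides do not state it.

```python
import math

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b1 = 5.0            # slope from the slides
r_squared = 0.8772  # coefficient of determination from the slides

# Signed square root: the sign is copied from the slope.
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Cross-check with the standard Pearson formula (not given in the slides).
n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cars)
r_direct = s_xy / math.sqrt(s_xx * s_yy)

print(f"{r_from_r2:.4f} {r_direct:.4f}")  # prints: 0.9366 0.9366
```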
Assumptions About the Error Term $\varepsilon$

1. The error $\varepsilon$ is a random variable with a mean of zero.

2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.

3. The values of $\varepsilon$ are independent.

4. The error $\varepsilon$ is a normally distributed random variable.
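Residual analysis, listed in the learning objectives, works with the residuals (observed y minus fitted y) as stand-ins for the unobservable error term. The sketch below only computes them for the Reed Auto sample. In practice they would usually be plotted against x or against the fitted values to look for patterns, and with five observations any check of the assumptions is very rough.

```python
# Reed Auto Sales sample and the estimated equation y_hat = 10 + 5x.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0

# Residual = observed value minus value predicted by the fitted line.
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, cars)]
mean_residual = sum(residuals) / len(residuals)

print(residuals)      # prints: [-1.0, -1.0, -2.0, 2.0, 2.0]
print(mean_residual)  # prints: 0.0
```

A zero mean is guaranteed by least squares whenever the line includes an intercept, so it does not by itself confirm the first assumption.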
Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

Two tests are commonly used: the t Test and the F Test. (We won't discuss the F test in this subject.)

Both the t test and the F test require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
Testing for Significance: t Test

• Hypotheses

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

• Test Statistic (no need to memorise this formula)

$t = \dfrac{b_1}{s_{b_1}}$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Testing for Significance: t Test

• Rejection Rule

Reject $H_0$ if p-value < $\alpha$
or $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where:
$t_{\alpha/2}$ is based on a t distribution
with n - 2 degrees of freedom
Testing for Significance: t Test

1. Determine the hypotheses.
   $H_0: \beta_1 = 0$
   $H_a: \beta_1 \neq 0$

2. Specify the level of significance.
   $\alpha = .05$

3. Select the test statistic.
   $t = \dfrac{b_1}{s_{b_1}}$ (this t statistic is computed by Excel)

4. State the rejection rule.
   Reject $H_0$ if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)
Testing for Significance: t Test

5. Compute the value of the test statistic.

   $t = \dfrac{b_1}{s_{b_1}} = \dfrac{5}{1.08} = 4.63$

6. Determine whether to reject $H_0$.

   With 3 degrees of freedom, t = 4.541 provides an area of .01 in the upper tail. Because the computed t = 4.63 exceeds 4.541, the two-tailed p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject $H_0$.
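The test statistic in step 5 can be reproduced from the raw data. This sketch assumes the usual estimate s = sqrt(SSE / (n - 2)) for the standard error of the estimate, which the slides do not spell out, and takes the critical value 3.182 from step 4 rather than computing it.

```python
import math

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0
n = len(ads)

# Standard error of the estimate (assumed formula: sqrt(SSE / (n - 2))).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope, as in the test statistic slide.
x_bar = sum(ads) / n
s_b1 = s / math.sqrt(sum((x - x_bar) ** 2 for x in ads))

t_stat = b1 / s_b1
t_crit = 3.182  # from the slides: alpha = .05, 3 degrees of freedom
reject_h0 = abs(t_stat) > t_crit

print(f"s={s:.3f} s_b1={s_b1:.3f} t={t_stat:.3f} reject={reject_h0}")
# prints: s=2.160 s_b1=1.080 t=4.629 reject=True
```

The slides round the slope's standard error to 1.08 before dividing, which gives 4.63.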
Computer Solution

• Performing the regression analysis computations without the help of a computer can be quite time consuming.
• On the next slide we show Excel output for the Reed Auto Sales example.
• Recall that the independent variable was named Ads and the dependent variable was named Cars in the example.
SUMMARY OUTPUT

Estimated regression equation: Cars = 10.0 + 5.00 Ads

Regression Statistics
Multiple R            0.937    (r_xy)
R Square              0.877    (r_xy squared)
Adjusted R Square     0.836
Standard Error        2.160
Observations          5.000    (sample size)

ANOVA (the ANOVA part of the summary output is not taught in this subject)
              df    SS         MS         F         Significance F
Regression    1     100.000    100.000    21.429    0.019
Residual      3     14.000     4.667
Total         4     114.000

                             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept (b0)               10.000         2.366            4.226    0.024     2.469       17.531
Number of TV Ads (X) (b1)    5.000          1.080            4.629    0.019     1.563       8.437
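The Lower 95% and Upper 95% columns are not explained in the slides. A plausible reading, assumed here, is the standard confidence interval: coefficient plus or minus the critical t value times the standard error. The sketch checks that reading for the slope row, using the critical value 3.182 for 3 degrees of freedom quoted in the t test slides.

```python
# Slope row of the Excel output and the critical t value from the slides.
b1 = 5.000
s_b1 = 1.080
t_crit = 3.182  # alpha = .05 (two-tailed), 3 degrees of freedom

# Assumed interval formula: estimate +/- t_crit * standard error.
margin = t_crit * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"{lower:.3f} to {upper:.3f}")  # prints: 1.563 to 8.437
```

The interval excludes zero, which is consistent with rejecting the hypothesis that the slope is zero at the .05 level.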
Sample Data for Model

Weekly sales in $1000s (Y)    Number of Customers (X)
          245                          1400
          312                          1600
          279                          1700
          308                          1875
          199                          1100
          219                          1550
          405                          2350
          324                          2450
          319                          1425
          255                          1700

[Figure: "Weekly sales model: scatter plot" of weekly sales against number of customers.]
Regression Using Excel

• Tools / Data Analysis / Regression


Excel Output

Graphical Presentation

• Weekly sales model: scatter plot and regression line

[Figure: scatter plot of weekly sales ($1000s) against number of customers with the fitted regression line; intercept = 98.248, slope = 0.10977.]

Weekly sales = 98.24833 + 0.10977 (customers)
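As a rough sketch of how the fitted equation could be reproduced and used for prediction outside Excel, the code below fits the weekly sales data with the standard least squares formulas and predicts sales for a hypothetical week with 2,000 customers. The helper names `fit_line` and `predict` and the 2,000-customer input are illustrative assumptions, not part of the slides.

```python
# Weekly sales sample from the slides.
customers = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
sales = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # $1000s


def fit_line(x, y):
    """Return (b0, b1) of the least squares line (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def predict(b0, b1, x):
    """Estimated value of y for a given x on the fitted line."""
    return b0 + b1 * x


b0, b1 = fit_line(customers, sales)
print(f"Weekly sales = {b0:.5f} + {b1:.5f} (customers)")
# prints: Weekly sales = 98.24833 + 0.10977 (customers)

# Hypothetical input: a week with 2,000 customers.
estimate = predict(b0, b1, 2000)
print(f"Predicted weekly sales: {estimate:.1f} ($1000s)")
```

The input of 2,000 lies inside the observed range of 1,100 to 2,450 customers; predictions far outside that range would lean on the straight-line assumption well beyond the data.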


Some Cautions about the
Interpretation of Significance Tests
• Rejecting $H_0: \beta_1 = 0$ and concluding that the
relationship between x and y is significant does
not enable us to conclude that a cause-and-effect
relationship is present between x and y.
• Just because we are able to reject $H_0: \beta_1 = 0$ and
demonstrate statistical significance does not enable
us to conclude that there is a linear relationship
between x and y.
