
Simple Linear Regression

Part 2
(Selected material from Chapter 15)
Managerial Statistics
To accompany the 7th edition by Ronald M. Weiers
Prepared by Professor John Knox
For TOM 302
Cal Poly, Pomona

Chapter 15: Simple Linear Regression, Part 2

Standard Error of Estimate
Coefficient of Determination
Correlation
Test of Significance for Slope
Confidence & Prediction Intervals
Statistix 9
McGraw-Hill/Irwin
© 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Example Problem (Restaurant Sales):
Prior to opening a new restaurant, the management of a chain of restaurants requires an estimate of the quarterly sales revenue. Management believes that the size of the student population at the nearby college campus is related to the quarterly sales revenue. To evaluate the relationship between student population (x) and quarterly sales (y), data are collected from a sample of ten restaurants located near college campuses.
Calculation of sample regression equation:

  i    xi     yi    xi·yi    xi²       yi²
  1     2     58      116      4     3,364
  2     6    105      630     36    11,025
  3     8     88      704     64     7,744
  4     8    118      944     64    13,924
  5    12    117    1,404    144    13,689
  6    16    137    2,192    256    18,769
  7    20    157    3,140    400    24,649
  8    20    169    3,380    400    28,561
  9    22    149    3,278    484    22,201
 10    26    202    5,252    676    40,804
Totals: Σxi = 140   Σyi = 1,300   Σxiyi = 21,040   Σxi² = 2,528   Σyi² = 184,730

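$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{21{,}040 - \frac{(140)(1{,}300)}{10}}{2{,}528 - \frac{(140)^2}{10}} = \frac{2{,}840}{568} = 5.0000$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum_{i=1}^{n} y_i}{n} - b_1 \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1{,}300}{10} - 5\left(\frac{140}{10}\right) = 130 - 5(14) = 60.0000$$

Sample Regression Equation: $\hat{y}_i = 60 + 5x_i$

where $\hat{y}_i$ = predicted quarterly sales in thousands of dollars
$x_i$ = student population in thousands of students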
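As a cross-check, the computational formulas for $b_1$ and $b_0$ can be sketched in Python using the ten (x, y) pairs from the table. This is a minimal illustration, not Statistix output; the helper name `fit_line` is hypothetical.

```python
# Restaurant data from the table: x = student population (thousands of students),
# y = quarterly sales (thousands of dollars)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def fit_line(x, y):
    """Return (b0, b1) using the computational (sum-based) least-squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b1 = (Σxy - ΣxΣy/n) / (Σx² - (Σx)²/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # b0 = ȳ - b1·x̄
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.4f} + {b1:.4f}x")  # y-hat = 60.0000 + 5.0000x
```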
[Figure: Statistix 10 linear regression output]

Standard error of estimate (estimated standard deviation of population data around the regression line):

$$s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}} = \sqrt{MSE}$$

where $y_i$ = actual value of Y (ith value of Y in the sample)
$\hat{y}_i$ = predicted value of Y (calculated value of Y using the sample regression equation with the ith value of X in the sample)

Standard error of estimate: alternate formula (computational formula)

$$s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n} y_i^2 - b_0 \sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i y_i}{n-2}}$$

Example Problem (Restaurant Sales):

$$s_{y|x} = \sqrt{\frac{184{,}730 - 60(1{,}300) - 5.0(21{,}040)}{10-2}} = \sqrt{\frac{1{,}530}{8}} = \sqrt{191.25} = 13.8293$$

The standard error of estimate is 13.83 units of Y ($13,830).
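A short Python sketch, assuming the coefficients $b_0 = 60$ and $b_1 = 5$ from the sample regression equation, confirms that the definitional form $\sum (y_i - \hat{y}_i)^2$ and the computational form give the same SSE, and hence the same $s_{y|x}$. Variable names are illustrative only.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
b0, b1 = 60.0, 5.0  # coefficients from the sample regression equation

# Definitional form: SSE = Σ(yi - ŷi)²
sse_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Computational form: SSE = Σy² - b0·Σy - b1·Σxy
sse_comp = (sum(yi ** 2 for yi in y) - b0 * sum(y)
            - b1 * sum(xi * yi for xi, yi in zip(x, y)))

mse = sse_resid / (n - 2)   # mean squared error
s_yx = math.sqrt(mse)       # standard error of estimate
print(sse_resid, sse_comp, round(s_yx, 4))  # 1530.0 1530.0 13.8293
```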


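The standard error of estimate can be compared with the sample standard deviation of the Y-values ($s_y$):

$$s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n-1}} = \sqrt{\frac{SST}{n-1}} = \sqrt{\frac{15{,}730}{10-1}} = \sqrt{\frac{15{,}730}{9}} = \sqrt{1{,}747.78} = 41.8064$$

The standard error of estimate, 13.83 ($13,830), is much smaller than the sample standard deviation of 41.81 ($41,810), so predictions based on the regression line are considerably more precise than predictions based on the mean of Y alone.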
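The comparison value $s_y$ can be computed directly from the ten sales figures. This minimal sketch uses only the Python standard library and, for side-by-side comparison, recomputes $s_{y|x}$ from SSE = 1,530.

```python
import math

y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(y)
y_bar = sum(y) / n                          # sample mean, 130.0
sst = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares (SST)
s_y = math.sqrt(sst / (n - 1))              # sample standard deviation of y

s_yx = math.sqrt(1530.0 / (n - 2))          # standard error of estimate (SSE = 1,530)
print(sst, round(s_y, 4), round(s_yx, 4))   # 15730.0 41.8064 13.8293
```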
[Figure: Statistix 10 linear regression output]

Coefficient of Determination: the proportion of the variation in the dependent variable that is explained by the independent variable.

Total variation = Unexplained variation + Explained variation
(SST) = (SSE) + (SSR)

Total variation = $SST = \sum_{i=1}^{n} (y_i - \bar{Y})^2$
Unexplained variation = $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Explained variation = $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2$

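Coefficient of Determination:

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{SSR}{SST}$$

$$r^2 = 1 - \frac{\text{unexplained variation}}{\text{total variation}} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} y_i^2 - b_0 \sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}$$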

Coefficient of Determination Example Problem:

$$r^2 = 1 - \frac{\sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} = 1 - \frac{184{,}730 - 60(1{,}300) - 5.0(21{,}040)}{184{,}730 - \frac{(1{,}300)^2}{10}} = 1 - \frac{1{,}530}{15{,}730} = 1 - 0.0973 = 0.9027$$

Approximately 90% of the variation in quarterly sales can be explained by its linear relationship with the student population.
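To see the decomposition SST = SSE + SSR and both forms of $r^2$ numerically, here is a minimal Python sketch that computes each sum of squares from the fitted values, assuming $b_0 = 60$ and $b_1 = 5$.

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                       # sample regression coefficients
n = len(y)
y_bar = sum(y) / n                       # 130.0
y_hat = [b0 + b1 * xi for xi in x]       # fitted (predicted) values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation

print(sst, sse, ssr)                                     # 15730.0 1530.0 14200.0
print(round(ssr / sst, 4), round(1 - sse / sst, 4))      # 0.9027 0.9027
```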
[Figure: Statistix 10 linear regression output]

Correlation analysis is used to measure the strength of association between X and Y. (Note: correlation analysis does not establish a cause-and-effect relationship between X and Y.)
The coefficient of correlation (r) is a measure of the strength of the linear relationship between X and Y.

$$r = \pm\sqrt{r^2}$$

where $r^2$ = coefficient of determination.

The sign of r matches the sign of the slope: if $b_1 > 0$, then $r > 0$; if $b_1 < 0$, then $r < 0$; and if $b_1 = 0$, then $r = 0$.

Coefficient of Correlation Example Problem:
$r^2 = 0.9027$ and $b_1 = 5.00 > 0$, so $r = +\sqrt{0.9027} = 0.9501$.
The coefficient of correlation (r) ranges from -1 to +1.
-1 indicates perfect negative correlation.
+1 indicates perfect positive correlation.
0 indicates no linear correlation.
The closer r is to -1 or +1, the stronger the linear association between X and Y.
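The slides obtain r from $r^2$ and the sign of $b_1$. Here is a Python sketch that does this and, as an independent check, also applies Pearson's product-moment formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, a standard equivalent formula not shown in the slides; the names `sxx`, `syy`, and `sxy` are our own.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

# Corrected sums of squares and cross-products (illustrative names)
sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n                   # 568
syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n                   # 15,730 = SST
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # 2,840

# Route used in the slides: r = ±sqrt(r²), with the sign taken from b1
b1 = sxy / sxx
b0 = sum(y) / n - b1 * sum(x) / n
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy
r_slides = math.copysign(math.sqrt(r_squared), b1)

# Independent check: Pearson's product-moment formula
r_pearson = sxy / math.sqrt(sxx * syy)

print(round(r_slides, 4), round(r_pearson, 4))   # 0.9501 0.9501
```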
Hypothesis Test of Population Slope (β1) Example Problem:
Test the hypothesis that there is no linear relationship between student population (X) and quarterly sales (Y), using a 0.05 level of significance.

Hypothesis Test of Population Slope (β1) Example Problem:
If there is no linear relationship between the student population (x) and the quarterly sales (y), then β1 = 0.
Hypotheses:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Location of rejection regions: two-tail test
Level of significance (α) = 0.05

Hypothesis Test of Population Slope (β1) Example Problem:
Decision rule: If the calculated t from the sample is less than -2.306 or greater than 2.306, then reject H0; otherwise, do not reject H0.
$df = n - 2 = 10 - 2 = 8$, so the critical values are $t_{cv1} = -2.306$ and $t_{cv2} = 2.306$.
Alternate decision rule using the p-value: If the two-tail p-value is less than 0.05, then reject H0; otherwise, do not reject H0.
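Hypothesis Test of Population Slope (β1) Example Problem:

$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \quad \text{where } b_1 = 5.0000 \text{ and } \beta_1 = 0$$

$$s_{b_1} = \frac{s_{y|x}}{\sqrt{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}} = \frac{13.82932}{\sqrt{2{,}528 - \frac{(140)^2}{10}}} = \frac{13.82932}{\sqrt{568}} = 0.58027$$

$$t = \frac{5.0000 - 0}{0.58027} = 8.6167$$

Since 8.6167 > 2.306, reject H0.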
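The slope test can be sketched in Python as follows. The critical value 2.306 is hard-coded from a t table because the Python standard library has no t distribution, and SSE = 1,530 is taken from the standard-error calculation; variable names are illustrative.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
n = len(x)

b1 = 5.0                                  # sample slope
beta1_h0 = 0.0                            # slope under H0 (no linear relationship)
s_yx = math.sqrt(1530.0 / (n - 2))        # standard error of estimate, SSE = 1,530
sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n   # Σx² - (Σx)²/n = 568

s_b1 = s_yx / math.sqrt(sxx)              # standard error of the slope
t_stat = (b1 - beta1_h0) / s_b1           # test statistic

t_crit = 2.306                            # t-table value, df = 8, two-tail α = 0.05
reject_h0 = abs(t_stat) > t_crit
print(round(s_b1, 5), round(t_stat, 4), reject_h0)   # 0.58027 8.6167 True
```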


Hypothesis Test of Population Slope (β1) Example Problem:
At the 0.05 level of significance, there is sufficient sample evidence to conclude that there is a linear relationship between the student population (x) and the quarterly sales (y).
Using computer output, the two-tail p-value is reported as 0.0000 (that is, less than 0.00005), which is less than 0.05; so reject H0 (same decision as above).

Statistix 9 linear regression output:
$b_1 = 5.00000$
$s_{b_1} = 0.58027$
$t = 8.62$
p-value = 0.0000

[Figure: Statistix 10 linear regression output]

Confidence Interval for Slope (β1) Example Problem:
Calculate the 95% confidence interval estimate for the population slope, where student population (X) is the independent variable and quarterly sales (Y) is the dependent variable.

Confidence Interval for Slope (β1) Example Problem:

$$b_1 \pm t\, s_{b_1}$$

$b_1 = 5.0$; $1 - \alpha = 0.95$, so $\alpha/2 = 0.025$; $df = n - 2 = 10 - 2 = 8$; $t_{0.025} = 2.306$
$s_{b_1} = 0.5803$ (see the H0 test of β1 for the calculation)

$$b_1 \pm t\, s_{b_1} = 5.0 \pm 2.306(0.5803) = 5.0 \pm 1.3382 = 3.6618 \text{ to } 6.3382$$

We are 95% confident that the slope of the population regression line is within the interval 3.6618 to 6.3382. An increase in the student population of one thousand students is associated with an expected increase in quarterly sales of between $3,662 and $6,338.
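A minimal Python sketch of the slope interval, using the rounded $s_{b_1} = 0.5803$ from the slides so the endpoints match; the t value is taken from a t table rather than computed.

```python
b1 = 5.0        # sample slope
s_b1 = 0.5803   # standard error of the slope, rounded as in the hypothesis test
t_025 = 2.306   # t-table value for df = 8, α/2 = 0.025 (entered by hand)

margin = t_025 * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"{lower:.4f} to {upper:.4f}")                      # 3.6618 to 6.3382
# Slope units are $1,000 of sales per 1,000 students, so scale by 1,000 for dollars
print(f"${lower * 1000:,.0f} to ${upper * 1000:,.0f}")    # $3,662 to $6,338
```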
Confidence Interval for $\mu_{y|x}$ Example Problem:
Calculate the 90% confidence interval estimate for the mean quarterly sales of all restaurants located near college campuses with 8,000 students.

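Confidence Interval for $\mu_{y|x}$ Example Problem:

$$\text{CI of } \mu_{y|x} = \hat{y}_i \pm t\, s_{y|x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}}$$

Adjustment for scaling factor (x):

$$x_i = \frac{8{,}000 \text{ students}}{1{,}000 \text{ students per unit of } x} = 8 \text{ units of } x$$

$$\hat{y}_i = b_0 + b_1 x_i = 60 + 5.0(8) = 100$$

Previously determined values: $s_{y|x} = 13.82932$, $\bar{X} = 14.0$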

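Confidence Interval for $\mu_{y|x}$ Example Problem:
$1 - \alpha = 0.90$, so $\alpha/2 = 0.05$; $df = 10 - 2 = 8$; $t_{0.05} = 1.860$

$$90\%\ \text{CI of } \mu_{y|x} = 100 \pm (1.860)(13.82932)\sqrt{\frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}}$$

$$\text{Standard error (SE)} = (13.82932)\sqrt{\frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} = (13.82932)\sqrt{0.16338} = 5.58985$$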
[Figure: regression line with 90% limits for the mean and 90% limits for individual predicted values]

Confidence Interval for $\mu_{y|x}$ Example Problem:
Margin of error (e) = t(SE) = (1.860)(5.58985) = 10.3971

90% CI of $\mu_{y|x}$ = 100 ± 10.3971 = 89.6029 to 110.3971

Adjustment for scaling factor (y):
(89.603 units of y)($1,000 per unit of y) = $89,603
(110.397 units of y)($1,000 per unit of y) = $110,397
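The mean-response interval can be wrapped in a small function. `mean_response_ci` is a hypothetical helper written for this illustration; its inputs are the previously determined values for the example, and t = 1.860 is entered from a t table.

```python
import math


def mean_response_ci(x0, b0, b1, s_yx, n, x_bar, sxx, t):
    """Interval for the mean of y at x = x0:
    y_hat ± t · s_yx · sqrt(1/n + (x0 - x_bar)² / Sxx), where Sxx = Σx² - (Σx)²/n."""
    y_hat = b0 + b1 * x0
    se = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    margin = t * se
    return y_hat, se, margin, (y_hat - margin, y_hat + margin)


# Example values: 8,000 students -> x0 = 8 units of x; Sxx = 2,528 - 140²/10 = 568;
# t = 1.860 is the t-table value for df = 8 and α/2 = 0.05 (entered by hand)
y_hat, se, margin, (lo, hi) = mean_response_ci(
    x0=8, b0=60.0, b1=5.0, s_yx=13.82932, n=10, x_bar=14.0, sxx=568.0, t=1.860
)
print(y_hat, round(se, 4), round(margin, 4))      # 100.0 5.5899 10.3971
print(f"${lo * 1000:,.0f} to ${hi * 1000:,.0f}")  # $89,603 to $110,397
```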

Confidence Interval for $\mu_{y|x}$ Example Problem:
Interpretation of the confidence interval:
We are 90 percent confident that the average quarterly sales of all restaurants located near college campuses with 8,000 students is within the interval of $89,603 to $110,397.

[Figure: Example Problem, Statistix 10 confidence interval output]

Prediction Interval for Individual $y_x$ Example Problem:
Calculate the 90% prediction interval estimate for the quarterly sales of a particular restaurant located near a college campus with 8,000 students.

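Prediction Interval for Individual $y_x$ Example Problem:

$$\text{PI of } y_x = \hat{y}_i \pm t\, s_{y|x} \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}}$$

$$\hat{y}_i = b_0 + b_1 x_i = 60 + 5.0(8) = 100$$

Previously determined values: $s_{y|x} = 13.82932$, $\bar{X} = 14.0$, $t = 1.860$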

Prediction Interval for Individual $y_x$ Example Problem:

$$90\%\ \text{PI of } y_x = 100 \pm (1.860)(13.82932)\sqrt{1 + \frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}}$$

$$\text{Standard error (SE)} = (13.82932)\sqrt{1 + \frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} = (13.82932)\sqrt{1.16338} = 14.91632$$

Margin of error (e) = t(SE) = (1.860)(14.91632) = 27.74436

90% PI of $y_x$ = 100 ± 27.74436 = 72.256 to 127.744
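The prediction interval differs from the mean-response interval only by the extra 1 under the square root. Here is a sketch with a hypothetical helper `prediction_interval`, using the previously determined values for the example (t = 1.860 entered from a t table).

```python
import math


def prediction_interval(x0, b0, b1, s_yx, n, x_bar, sxx, t):
    """Interval for an individual y at x = x0:
    y_hat ± t · s_yx · sqrt(1 + 1/n + (x0 - x_bar)² / Sxx), where Sxx = Σx² - (Σx)²/n."""
    y_hat = b0 + b1 * x0
    se = s_yx * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    margin = t * se
    return y_hat, se, margin, (y_hat - margin, y_hat + margin)


# Same inputs as the example: x0 = 8 (8,000 students), Sxx = 568, t = 1.860 (df = 8)
y_hat, se, margin, (lo, hi) = prediction_interval(
    x0=8, b0=60.0, b1=5.0, s_yx=13.82932, n=10, x_bar=14.0, sxx=568.0, t=1.860
)
print(y_hat, round(se, 4), round(margin, 3))      # 100.0 14.9163 27.744
print(f"${lo * 1000:,.0f} to ${hi * 1000:,.0f}")  # $72,256 to $127,744
```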

Prediction Interval for Individual $y_x$ Example Problem:
Adjustment for scaling factor (y):
(72.256 units of y)($1,000 per unit of y) = $72,256
(127.744 units of y)($1,000 per unit of y) = $127,744
Interpretation of the prediction interval:
We are 90 percent confident that the quarterly sales of a particular restaurant located near a college campus with 8,000 students are within the interval of $72,256 to $127,744.
[Figure: Example Problem, Statistix 10 prediction interval output]

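Note: The 90% prediction interval for $y_i$ is wider than the 90% confidence interval for $\mu_{y|x}$ (where $y_i$ is the value of y for an individual element of the population and $\mu_{y|x}$ is the average value of y for the subset of the population having the same value of x). The extra 1 under the square root in the prediction interval accounts for the scatter of individual values around their mean, in addition to the uncertainty in estimating that mean.
[Figure: 90% limits for the mean and 90% limits for individual predicted values]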

In this example problem, simple linear regression produced a confidence interval that is narrower than, and centered on a different value from, the simple confidence interval we would have obtained from the sample of y-values alone.
Simple 90% confidence interval of the mean of y (t = 1.833 with 9 df, $s_y$ = 41.8064): 130 ± 24.23 = 105.77 to 154.23
SLR 90% confidence interval of $\mu_{y|x}$ at x = 8: 100 ± 10.40 = 89.60 to 110.40
Midpoint of simple 90% confidence interval = 130
Midpoint of SLR 90% confidence interval = 100
Margin of error for simple 90% confidence interval = 24.23
Margin of error for SLR 90% confidence interval = 10.40
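The two intervals being compared can be reproduced with a short Python sketch. The t values 1.833 (df = 9) and 1.860 (df = 8) are table values entered by hand, and the SLR standard error 5.58985 is taken from the confidence interval calculation for x = 8.

```python
import math

y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(y)
y_bar = sum(y) / n                                             # 130.0
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))  # 41.8064

# Simple CI for the mean of y, ignoring x: y_bar +/- t(df = 9) * s_y / sqrt(n)
margin_simple = 1.833 * s_y / math.sqrt(n)

# SLR CI for the mean of y at x = 8: y_hat +/- t(df = 8) * SE (SE from the example)
y_hat = 100.0
margin_slr = 1.860 * 5.58985

print(f"simple: {y_bar:.0f} +/- {margin_simple:.2f} = "
      f"{y_bar - margin_simple:.2f} to {y_bar + margin_simple:.2f}")
print(f"SLR:    {y_hat:.0f} +/- {margin_slr:.2f} = "
      f"{y_hat - margin_slr:.2f} to {y_hat + margin_slr:.2f}")
# simple: 130 +/- 24.23 = 105.77 to 154.23
# SLR:    100 +/- 10.40 = 89.60 to 110.40
```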

Managerial Statistics
End of Simple Linear Regression, Part 2
