# 4/1/2013 1

Huangpu River
Chapter 12
Linear Regression and
Correlation
4/1/2013 2
Chapter 12
Linear Regression and Correlation
[Figure: Weekly Sales vs. Aptitude Test Score]
4/1/2013 3
Learning Objectives
- Discuss scatter diagrams.
- Discuss the coefficient of correlation.
- Discuss the coefficient of determination.
- Use the least squares method to determine a linear regression equation.
- Interpret the linear regression equation.
- Compute the standard error of estimate and explain its use.
- Construct a confidence interval and a prediction interval for the estimates of the dependent variable.
- Understand the limitations, errors, and caveats of using regression and correlation, and evaluate assumptions using residual analysis.
4/1/2013 5
Descriptive Statistics
[Figure: GBS221 grade distribution. Bar chart of # of students (axis 0 to 70) by category: Class, < 59, 60 - 69, 70 - 79, 80 - 89, 90 - 100]

4/1/2013 6
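Statistical Inference
Sample statistics are used as estimates of the unknown population parameters:

| Population parameter | Sample statistic (estimate) |
|---|---|
| $\mu$ = ? | $\bar{X}$ |
| $\sigma$ = ? | $S$ |
| $\sigma^2$ = ? | $S^2$ |
| $P$ = ? | $p$ |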
4/1/2013 8
Example 1:
Plot the relationship between Test Scores and Weekly Sales:

| Salesperson | Test Score (X) | Weekly Sales (Y), \$000 |
|---|---|---|
| Mike | 1 | 2 |
| Melissa | 2 | 4 |
| Jalene | 3 | 8 |
| Jeff | 4 | 6 |
| Brian | 5 | 12 |
| Nicole | 6 | 10 |

4/1/2013 9
Correlation and Regression
[Figure: blank grid for plotting Weekly Sales (0 to 12) vs. Test Scores (0 to 6)]

4/1/2013 10
Correlation and Regression
[Figure: scatter diagram of Weekly Sales (\$000), 0 to 14, vs. Aptitude Test Score, 0 to 7]
4/1/2013 11
Correlation and Regression
[Figure: scatter diagram of Weekly Sales (\$000) vs. Test Score with fitted line y = 1.7714x + 0.8, $R^2$ = 0.7845]
4/1/2013 12
Demonstrate how to create a scatter
diagram and compute the regression
equation using Excel
Use directions on pages 70-74
Use Insert|Chart for Excel
Also see pages 493-495

(For this demonstration, use X=5)
4/1/2013 13
Example 1 continued...

| Symbol | Value | Meaning |
|---|---|---|
| $\bar{X}$ | 3.5 | Sample mean of X values |
| $\bar{Y}$ | 7 | Sample mean of Y values |
| $b_0$ | 0.8 | Y-intercept |
| $b_1$ | 1.7714 | Slope of the regression line |
| $r$ | 0.8857 | Coefficient of correlation |
| $r^2$ | $(0.8857)^2 = 0.7845$ | Coefficient of determination |
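The six values for Example 1 can be reproduced from the raw data. The sketch below is one way to do it by hand in Python, using the usual least squares formulas; the variable names (`s_xx`, `s_xy`, and so on) are illustrative and do not come from the slides, which use Excel for this step.

```python
from math import sqrt

# Data from Example 1: aptitude test score (X) and weekly sales in $000 (Y).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 6, 12, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and cross-products about the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx              # slope of the regression line
b0 = y_bar - b1 * x_bar       # Y-intercept
r = s_xy / sqrt(s_xx * s_yy)  # coefficient of correlation
r2 = r ** 2                   # coefficient of determination

print(f"x_bar={x_bar}, y_bar={y_bar}")
print(f"b0={b0:.4f}, b1={b1:.4f}, r={r:.4f}, r2={r2:.4f}")
```

The printed values should agree with the slide to four decimal places.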
4/1/2013 15
Slope-Intercept form
of a straight line
Y = mX + b
Y is the dependent variable
X is the independent variable
m is the slope of the line
b is the Y-intercept
But statisticians are peculiar. You
might say they have a deviation!!!
4/1/2013 18
Simple Linear Regression Model

$$\hat{Y}_i = b_0 + b_1 X_i$$

$\hat{Y}_i$ = Predicted value of Y for observation i
$X_i$ = Value of X for observation i
$b_0$ = Sample Y-intercept, used as estimate of the population $\beta_0$
$b_1$ = Sample slope, used as estimate of the population $\beta_1$
4/1/2013 19
Interpreting the Results
$\hat{Y}_i = 0.8 + 1.7714 X_i$
The slope of 1.7714 means that for each increase of one unit in X, Y is estimated to increase by 1.7714 units.
For each increase of 1 unit in the test score, the model predicts that expected weekly sales increase by \$1.7714 thousand.
4/1/2013 20
Scatter diagram patterns (figures omitted; each plots Y against X):
- Perfect negative correlation: r = -1
- Perfect positive correlation: r = +1
- Zero correlation: r = 0
- Strong positive correlation
4/1/2013 24
Use the following definitions to
interpret the results of this example
$r$ is the coefficient of correlation. It
indicates the strength of the relationship
between X and Y and whether the
relationship is positive or negative.
4/1/2013 25
Interpretation of the coefficient
of correlation
r = 0.8857
There is a strong positive correlation between
a salesperson's weekly sales and his/her
score on the aptitude test.
[Figure: number line from -1 through 0 to +1, with r = +0.8857 marked]
4/1/2013 26
Coefficient of determination
$r^2$ is the coefficient of determination. This
indicates the proportion of the variation in
Y that is explained by X.
4/1/2013 27
Coefficient of determination
$r^2 = 0.7845$
About 78% of the variation in weekly sales
is explained by the variation in test scores.
or
The variation in test scores explains 78% of
the variation in weekly sales.
4/1/2013 28
Purpose of Regression and
Correlation Analysis
Regression Analysis is Used Primarily for
Prediction
A statistical model used to predict the values of a
dependent or response variable based on values of
at least one independent or explanatory variable
Correlation Analysis is Used to Measure
Strength of the Association Between
Numerical Variables
4/1/2013 29
For the Test Score/Weekly Sales
Problem
Plot the regression line on your graph
using the given values of $b_0$ = 0.8 and
$b_1$ = 1.7714.
Hint: The regression line will
always go through the Y-intercept
$(0, b_0)$ and $(\bar{X}, \bar{Y})$.
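The plotting hint can be checked numerically: evaluating the fitted line at X = 0 and at the sample mean of X should return the Y-intercept and (up to rounding of the slope) the sample mean of Y. A minimal sketch, where `predict` is an illustrative helper name and the coefficients are the rounded values quoted on the slides:

```python
# Coefficients and sample means as given on the slides.
b0, b1 = 0.8, 1.7714
x_bar, y_bar = 3.5, 7.0

def predict(x):
    """Predicted weekly sales ($000) for test score x."""
    return b0 + b1 * x

# Two points are enough to draw the line.
intercept_point = (0.0, predict(0.0))
mean_point = (x_bar, predict(x_bar))

print(intercept_point)  # the Y-intercept (0, b0)
print(mean_point)       # approximately (x_bar, y_bar)
```

Because the slope is rounded to four decimals, the second point lands very slightly below 7 rather than exactly on it.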
4/1/2013 30
Correlation and Regression
[Figure: Weekly Sales (\$000) vs. Aptitude Test Score, with the regression line $\hat{Y} = 0.8 + 1.77X$ drawn through (0, 0.8) and (3.5, 7)]
4/1/2013 31
Correlation and Regression
[Figure: scatter diagram of Weekly Sales (\$000) vs. Test Score with fitted line y = 1.7714x + 0.8, $R^2$ = 0.7845]
4/1/2013 32
For the Test Score/Weekly Sales
Problem
Compute the predicted value of weekly
sales ($\hat{Y}$) for each of the following
test scores (X).

| X | $\hat{Y}$ |
|---|---|
| 1 | ? |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
4/1/2013 33
For the Test Score/Weekly Sales
Problem
Compute the predicted value of weekly
sales ($\hat{Y}$) for each of the following test
scores (X).

| X | $\hat{Y}$ |
|---|---|
| 1 | 2.571 |
| 2 | 4.343 |
| 3 | 6.114 |
| 4 | 7.886 |
| 5 | 9.657 |
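The predicted values come from substituting each test score into the fitted equation. A short sketch, assuming the rounded coefficients from the slides (0.8 and 1.7714); `predict_sales` is an illustrative name:

```python
# Rounded coefficients of the fitted line from the slides.
b0, b1 = 0.8, 1.7714

def predict_sales(score):
    """Predicted weekly sales ($000) for a given aptitude test score."""
    return b0 + b1 * score

# Predicted weekly sales for test scores 1 through 5, rounded to 3 decimals.
predictions = {score: round(predict_sales(score), 3) for score in range(1, 6)}

for score, y_hat in predictions.items():
    print(score, y_hat)
```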
4/1/2013 34
Confidence Intervals
OH NO!!!!!!!!
OH NO!!!!!!!!
4/1/2013 35
For the Test Score/Weekly Sales
Problem
Assume that an applicant scored 5 on the
aptitude test:
What do you predict her weekly sales will be?
Interpret your answer in light of what you know
about point and interval estimates.
We have to find the standard error of
the estimate.

4/1/2013 36
Predicted mean weekly sales for
applicants who scored 5 on the aptitude
test
The predicted weekly sales for applicants
who scored 5 on the test is \$9,657.
Since the regression line is an average
line drawn through the data, this is a point
estimate of average weekly sales.
A confidence interval can be computed, i.e.,
We can be 95% confident that mean
weekly sales will be between _ and _.

4/1/2013 37
Correlation and Regression
[Figure: scatter diagram of Weekly Sales (\$000) vs. Test Score with fitted line y = 1.7714x + 0.8, $R^2$ = 0.7845]
In a REAL PROBLEM there
may be many observed values of
Y for each value of X.
4/1/2013 38
Trade Executions vs. Incoming Phone Calls
[Figure: scatter diagram of # of Trade Executions (200 to 500) vs. # of Incoming Calls (1800 to 2700), with fitted line y = 0.1415x + 39.351, $R^2$ = 0.3533]
4/1/2013 39
Trade Executions vs. Incoming Phone Calls
[Figure: scatter diagram of # of Trade Executions (200 to 500) vs. # of Incoming Calls (1800 to 2700), with fitted line y = 0.1415x + 39.351, $R^2$ = 0.3533, and annotations labeled 1, 2, and 3 $S_{YX}$]
4/1/2013 40
Standard Error of the Estimate
In chapter 3 we measured the dispersion
about an average called the Mean.
In chapter 6 we measured the dispersion
about an average called the Mean of the
Means.
Now we want to measure the dispersion
about an average line called the
Regression Line.
4/1/2013 41
Measures of Dispersion
$\sigma$ or $S$ = Population or Sample Standard Deviation
$\sigma_{\bar{X}}$ or $S_{\bar{X}}$ = Standard Error of the Mean
$S_{YX}$ = Standard Error of the Estimate
4/1/2013 42
Section 12.3, Measures of
Variation
See pages 421 through 427 in the text.
We will discuss the following topics.
Obtaining the Sum of Squares
The Coefficient of Determination
The Standard Error of the Estimate
4/1/2013 43
Correlation and Regression
[Figure, built up over three slides: Weekly Sales (\$000) vs. Aptitude Test Score with the regression line $\hat{Y} = 0.8 + 1.77X$ through (0, 0.8) and a horizontal line at $\bar{Y}$. For one observed point Y, the figure marks the Total Error $Y - \bar{Y}$ (SST), the Error explained by regression line $\hat{Y} - \bar{Y}$ (SSR), and the Unexplained Error $Y - \hat{Y}$ (SSE).]
What proportion of the variation in Y is explained
by the variation in X?
4/1/2013 46
The Coefficient of
Determination
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

Measures the proportion of variation that is
explained by the independent variable X in
the regression model
4/1/2013 47
Total Error $= Y - \bar{Y}$
SST = Total Sum of Squares $= \sum (Y - \bar{Y})^2$

Error explained by regression line $= \hat{Y} - \bar{Y}$
SSR = Sum of squares regression $= \sum (\hat{Y} - \bar{Y})^2$

Unexplained Error $= Y - \hat{Y}$
SSE = Sum of squares error $= \sum (Y - \hat{Y})^2$

SSR + SSE = SST
4/1/2013 48
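Computations for SSR, SSE and SST

| Test Score (X) | Weekly Sales (Y) | $\hat{Y}$ | $\hat{Y} - \bar{Y}$ | $(\hat{Y} - \bar{Y})^2$ | $Y - \hat{Y}$ | $(Y - \hat{Y})^2$ | $Y - \bar{Y}$ | $(Y - \bar{Y})^2$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2.571 | -4.429 | 19.6125 | -0.5714 | 0.3265 | -5 | 25 |
| 2 | 4 | 4.343 | -2.657 | 7.0607 | -0.3428 | 0.1175 | -3 | 9 |
| 3 | 8 | 6.114 | -0.886 | 0.7846 | 1.8858 | 3.5562 | 1 | 1 |
| 4 | 6 | 7.886 | 0.886 | 0.7843 | -1.8856 | 3.5555 | -1 | 1 |
| 5 | 12 | 9.657 | 2.657 | 7.0596 | 2.3430 | 5.4896 | 5 | 25 |
| 6 | 10 | 11.428 | 4.428 | 19.6107 | -1.4284 | 2.0403 | 3 | 9 |
| | $\bar{Y}$ = 7 | | SSR | 54.91251 | SSE | 15.0857 | SST | 70 |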
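The three sums of squares can be recomputed directly from the data. The sketch below fits the line at full precision rather than using the rounded predictions in the table, so SSR comes out near 54.914 instead of the slide's 54.91251; the difference is rounding only. Variable names are illustrative.

```python
# Data from Example 1: test score (X) and weekly sales in $000 (Y).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 6, 12, 10]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept at full precision.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation

print(f"SSR={ssr:.4f}  SSE={sse:.4f}  SST={sst:.4f}")
print(f"r^2 = SSR/SST = {ssr / sst:.4f}")
```

The identity SSR + SSE = SST holds up to floating-point error, and SSR/SST reproduces the coefficient of determination of 0.7845.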
4/1/2013 58
Measures of Variation:
The Sum of Squares
- SST = Total Sum of Squares: measures the variation of the Y values around their mean $\bar{Y}$.
- SSR = Regression Sum of Squares: explained variation attributable to the relationship between X and Y.
- SSE = Error Sum of Squares: variation attributable to factors other than the relationship between X and Y.
4/1/2013 59
Standard Error of the Estimate
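$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}}$$

Standard error of estimate - measures the scatter, or dispersion,
of the observed values around the line of regression.

For comparison, the sample standard deviation:

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$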
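Applied to the test score example, the standard error of the estimate follows from SSE and the sample size. A minimal sketch, assuming SSE = 15.0857 and n = 6 as given elsewhere in these slides:

```python
from math import sqrt

sse = 15.0857  # error sum of squares from the SSR/SSE/SST computation
n = 6          # number of salespeople in the sample

# Divide by n - 2 because two coefficients (b0 and b1) were estimated.
s_yx = sqrt(sse / (n - 2))

print(f"S_YX = {s_yx:.3f}")
```

This gives roughly 1.942, in \$000 of weekly sales, which is the value used in the interval computations.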
4/1/2013 60
Predicted weekly sales for an applicant
who scored 5 on the aptitude test
The predicted mean weekly sales for
applicants who scored 5 on the test is
\$9,657.
Since the regression line is an average
line drawn through the data, this is a point
estimate of average weekly sales.
A confidence interval can be computed, i.e.,
We can be 95% confident that mean
weekly sales will be between _ and _.

4/1/2013 62
Confidence Interval - Large Sample
$\hat{Y} \pm Z \, S_{YX}$
Dream on. It can't be this easy!!!
4/1/2013 63
Confidence Interval - Small Sample
$\hat{Y} \pm t \, S_{YX}$
Dream on. It can't be this easy!!!
4/1/2013 64
Estimation of Predicted Values
Confidence Interval Estimate for $\mu_{YX}$, the Mean of Y given a particular $X_i$

$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

$t_{n-2}$: t value from table with df = n-2
$S_{YX}$: standard error of the estimate
The size of the interval varies according to the distance of $X_i$ from the mean, $\bar{X}$.
Use this for the mean weekly sales for a group of applicants who got 5 on the test.
4/1/2013 65
Estimation of Predicted Values
Prediction Interval Estimate for an Individual Response $Y_i$ at a Particular $X_i$

$$\hat{Y}_i \pm t_{n-2} \, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

The added 1 under the square root makes this interval wider than the one for the mean of Y.
Use this when you want the estimated weekly
sales of one particular applicant (e.g., Jo
Cruickshank) who scored 5 on the test.
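The two interval formulas differ only by the extra 1 under the square root, which a small helper makes explicit. This is a sketch, not the PredInt tool itself: `interval_half_widths` is an illustrative name, and the t value, standard error, and predicted value are taken from the PredInt output reproduced in these slides rather than computed here.

```python
from math import sqrt

def interval_half_widths(x0, n, x_bar, s_xx, s_yx, t):
    """Return (confidence, prediction) interval half widths at X = x0."""
    h = 1 / n + (x0 - x_bar) ** 2 / s_xx  # the "h statistic"
    ci_half = t * s_yx * sqrt(h)          # for the mean of Y at x0
    pi_half = t * s_yx * sqrt(1 + h)      # for one individual Y at x0
    return ci_half, pi_half

# Values for the test score example (X = 5, 95% confidence, df = 4).
t = 2.776450856
s_yx = 1.942016625
y_hat = 9.657142857

ci_half, pi_half = interval_half_widths(
    x0=5, n=6, x_bar=3.5, s_xx=17.5, s_yx=s_yx, t=t
)

print(f"mean of Y:    {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
print(f"individual Y: {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")
```

The prediction interval is always the wider of the two, since it adds the scatter of a single observation to the uncertainty about the line.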
4/1/2013 66
Weekly Sales/Test Scores problem
Compute the interval estimate (for a group
of applicants who scored 5 on the test).
Two tail test
Alpha error = .05
df = n-2
$\hat{Y}$ = 9.657
$S_{YX}$ = 1.942
4/1/2013 67
Weekly Sales/Test Scores problem
$$\hat{Y} \pm t \, S_{YX} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}}$$

where t is from Appendix F with n - 2 degrees of freedom.

$$9.657 \pm (2.776)(1.942) \sqrt{\frac{1}{6} + \frac{(5 - 3.5)^2}{91 - \frac{441}{6}}}$$

or

$$9.657 \pm (2.776)(1.942) \sqrt{0.29524}$$

or

$$9.657 \pm 2.929$$

or
Between \$6,728 and \$12,586
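The denominator in the worked computation uses the shortcut form of the sum of squared deviations, with 91 as the sum of the squared test scores and 441 as the square of their sum. A brief sketch confirming that the shortcut and the definition agree for the test scores 1 through 6 (variable names are illustrative):

```python
# Test scores from Example 1.
x = [1, 2, 3, 4, 5, 6]
n = len(x)

sum_x = sum(x)                  # 21
sum_x2 = sum(v * v for v in x)  # 91

# Shortcut form used in the worked computation.
s_xx_shortcut = sum_x2 - sum_x ** 2 / n

# Definition: sum of squared deviations from the mean.
x_bar = sum_x / n
s_xx_definition = sum((v - x_bar) ** 2 for v in x)

# Quantity under the square root for X = 5.
h = 1 / n + (5 - x_bar) ** 2 / s_xx_shortcut

print(s_xx_shortcut, s_xx_definition, round(h, 5))
```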
4/1/2013 68
Demonstrate how to compute
confidence intervals using PredInt.
4/1/2013 69
Confidence Interval Estimate
X Value 5
Confidence Level 95%
Sample Size 6
Degrees of Freedom 4
t Value 2.776450856
Sample Mean 3.5
Sum of Squared Difference 17.50
Standard Error of the Estimate 1.942016625
h Statistic 0.295238095
Average Predicted Y (YHat) 9.657142857
For Average Predicted Y (YHat)
Interval Half Width 2.929740344
Confidence Interval Lower Limit 6.727402513
Confidence Interval Upper Limit 12.5868832
For Individual Response Y
Interval Half Width 6.136457614
Prediction Interval Lower Limit 3.520685243
Prediction Interval Upper Limit 15.79360047
4/1/2013 70
Weekly Sales/Test Scores problem
Interpret the results of your interval
estimate.
4/1/2013 71
Weekly Sales/Test Scores problem
Interpretation of the Interval Estimate
We can say, with 95% confidence,
that the mean weekly sales for a
group of applicants who scored 5
on the aptitude test will be between
\$6,728 and \$12,586.

4/1/2013 73
Weekly Sales/Test Scores problem
Compute the prediction interval (for Jo
Cruickshank who scored 5 on the test).
Two tail test
Alpha error = .05
df = n-2
$\hat{Y}$ = 9.657
$S_{YX}$ = 1.942
4/1/2013 74
Weekly Sales/Test Scores problem
$$\hat{Y} \pm t \, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}}$$

where t is from Appendix F with n - 2 degrees of freedom.

$$9.657 \pm (2.776)(1.942) \sqrt{1.29524}$$

or

$$9.657 \pm 6.135$$

or
Between \$3,522 and \$15,792
4/1/2013 75
Weekly Sales/Test Scores problem
Interpret the results of your prediction
interval.
4/1/2013 76
Weekly Sales/Test Scores problem
Interpretation of the Prediction Interval
We can say, with 95% confidence,
that the weekly sales for applicant
Jo Cruickshank, who scored 5 on
the aptitude test, will be between
\$3,520 and \$15,790.

4/1/2013 77
Common Errors When Using
Regression And Correlation
Analysis
4/1/2013 78
Using Regression and Correlation Analyses:
Limitations and Errors
Extrapolation beyond the range of the
observed data
Cause and effect
Using past trends to estimate future trends
Misinterpreting the coefficients of
correlation and determination
Finding relationships when they do not exist
4/1/2013 79
Finding relationships when they do not exist
Nearly all sick people have eaten carrots.
Obviously, the effects are cumulative.
An estimated 99.9% of all people who die from
cancer and ruptured appendix have eaten carrots.
Another 99.9% of people involved in auto
accidents ate carrots within 60 days of the
incident.
Some 93.1% of gang members come from homes
where carrots were frequently served.
4/1/2013 80
MORE
Finding relationships when they do not exist
Among the people born in 1839, who later dined
on carrots, there has been a 100 % mortality rate.
Studies have shown, based on recent laboratory
tests, that rats who were fed 500 lbs. of carrots per
day died within 3 weeks.
Many bunnies have been examined post-mortem
and were found to have eaten carrots.
4/1/2013 81
MORE
Finding relationships when they do not exist
All surviving carrot eaters born between 1900 and
1910 have wrinkled skin, brittle bones, few if any
of their own teeth and failing eyesight.
Virtually all people who experience depression for
at least 45 minutes a week are known to have
eaten carrots sometime during their life.
4/1/2013 82
Monday, April 21, 1997 The
Arizona Republic
Tobacco executives insist
smoking isn't addictive
"No worse than carrots," R.J.
Reynolds chief says
4/1/2013 83
Smoking in high school may indicate
teenage suicide risk, study suggests
The Associated Press
LOS ANGELES - High school students who
smoked were up to 18 times as likely as
nonsmokers to say they had attempted
suicide, a government study found.
4/1/2013 84
Cause and effect
Smoking in high school...
The results do not imply that smoking
causes suicide, stressed psychologist
Kenneth Carter of the Centers for Disease
Control and Prevention. Rather, he said,
smoking may be an indicator of depression.
4/1/2013 85
Cause and effect
Smoking in high school...
Some depressed youngsters may be using tobacco
to gain some relief, or smoking may be common
among teenagers who are hopeless and depressed,
he said.
Whatever the reason for the link, the findings
suggest that "if a student who seems depressed also
smokes a pack a day, I'm going to be a lot more
worried about that student," Carter said.
4/1/2013 86
Cause and effect
Smoking in high school...
The results came from 11,243 high school
students who filled out questionnaires in a
1991 national survey.

( See following slides for rest of article.)
4/1/2013 87
Cause and effect
Smoking in high school...
Even light smokers were five to six times as likely to say
they had tried to kill themselves in the previous year,
according to the study, presented by Carter on Monday at
the annual meeting of the American Psychological
Association.
Boys who were heavy smokers - six or more cigarettes a
day for more than six days within the prior month - were
18 times as likely as nonsmoking boys to have attempted
suicide in the prior year. They also were 10 times as likely
to report making a plan for killing themselves in the prior
year.
4/1/2013 88
Cause and effect
Smoking in high school...
Girls who were heavy smokers were five
times as likely as nonsmoking girls to have
attempted suicide.
4/1/2013 89
This is an old homework problem.
You can use it for practice if you'd
like.

4/1/2013 90
GMAT GPI
688 3.72
647 3.44
652 3.21
608 3.29
680 3.91
617 3.28
557 3.02
599 3.13
616 3.45
594 3.33
567 3.07
542 2.86
551 2.91
573 2.79
536 3
639 3.55
619 3.47
694 3.6
718 3.88
759 3.76
Data
4/1/2013 91
Scatter Diagram and Regression Line to predict Grade Point
Index from GMAT score
[Figure: scatter diagram of Grade Point Index (0 to 4.5) vs. GMAT Score (500 to 800), with fitted line y = 0.0049x + 0.3003, $R^2$ = 0.7978]
For each one-unit increase in the GMAT score, GPI is
predicted to increase by .0049 points. Or, for a 100-point
increase in GMAT, GPI is predicted to increase by .49, or half a grade.
$R^2$: About 80% of the variation in GPI is explained by
the variation in the GMAT score.
R = .89
There is a very strong positive correlation between the GPI and
GMAT score. (+1.0 is perfect positive correlation and zero is no
correlation.)
4/1/2013 92
A further discussion of
correlation and regression
There is very high correlation between GPI and GMAT test
scores. Does this mean that GMAT test scores cause GPI?
This does not mean that studying for the GMAT will raise
your GPI in graduate school. An increase in GMAT does not
cause your GPI to go up. If this were the case, why study
finance in graduate school? Why not concentrate on raising
your GMAT score?
4/1/2013 93
Confidence Interval Estimate Problem 13.77 on page 847
X Value 600
Confidence Level 95%
Sample Size 20
Degrees of Freedom 18
t Value 2.100923666
Sample Mean 622.8
Sum of Squared Difference 72757.20
Standard Error of the Estimate 0.155870258
h Statistic 0.05714486
Average Predicted Y (YHat) 3.222458849
For Average Predicted Y (YHat)
Interval Half Width 0.078282036
Confidence Interval Lower Limit 3.144176813
Confidence Interval Upper Limit 3.300740886
For Individual Response Y
Interval Half Width 0.336698188
Prediction Interval Lower Limit 2.885760661
Prediction Interval Upper Limit 3.559157038
We are 95% confident that the mean GPI, for a group of students
who scored 600 on the GMAT, will be between 3.14 and 3.30.
We are 95% confident that the GPI, for one student who
scored 600 on the GMAT, will be between 2.89 and 3.56.
Margin of error
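The GMAT/GPI figures can be reproduced from the data slide with the same formulas used for the test score example. The sketch below is a hand computation under stated assumptions: the t value (2.100923666, df = 18) is copied from the output above rather than computed, and all variable names are illustrative.

```python
from math import sqrt

# GMAT scores and Grade Point Index (GPI) values from the data slide.
gmat = [688, 647, 652, 608, 680, 617, 557, 599, 616, 594,
        567, 542, 551, 573, 536, 639, 619, 694, 718, 759]
gpi = [3.72, 3.44, 3.21, 3.29, 3.91, 3.28, 3.02, 3.13, 3.45, 3.33,
       3.07, 2.86, 2.91, 2.79, 3.00, 3.55, 3.47, 3.60, 3.88, 3.76]

n = len(gmat)
x_bar = sum(gmat) / n
y_bar = sum(gpi) / n
s_xx = sum((x - x_bar) ** 2 for x in gmat)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gmat, gpi))

b1 = s_xy / s_xx         # slope
b0 = y_bar - b1 * x_bar  # Y-intercept

# Standard error of the estimate.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(gmat, gpi))
s_yx = sqrt(sse / (n - 2))

# Interval estimates at GMAT = 600.
x0 = 600
t = 2.100923666  # t value for 95% confidence, df = 18, from the output above
y_hat = b0 + b1 * x0
h = 1 / n + (x0 - x_bar) ** 2 / s_xx
ci_half = t * s_yx * sqrt(h)      # mean GPI of a group scoring 600
pi_half = t * s_yx * sqrt(1 + h)  # GPI of one student scoring 600

print(f"y = {b1:.4f}x + {b0:.4f}, S_YX = {s_yx:.4f}")
print(f"mean GPI:       {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"individual GPI: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The half widths are the "margin of error" in each case: about 0.08 for the group mean and about 0.34 for one student.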