Regression Model
MDA course 6
Purpose of Regression and
Correlation Analysis
• Regression Analysis is Used Primarily for
Prediction
A statistical model used to predict the values of a
dependent or response variable based on values of
at least one independent or explanatory variable
Types of Regression Models
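$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $Y_i$ is the dependent (response) variable, $\beta_0$ is the Y intercept, $\beta_1$ is the slope, $X_i$ is the independent (explanatory) variable, and $\varepsilon_i$ is the random error.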
Error Variable: Required
Conditions
The error is a critical part of the regression
model.
Four requirements involving the distribution of $\varepsilon$ must be satisfied.
• The probability distribution of $\varepsilon$ is normal.
• The mean of $\varepsilon$ is zero: $E(\varepsilon) = 0$.
• The standard deviation of $\varepsilon$ is $\sigma$ for all values of x.
• The errors associated with different values of y are all independent.
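The four conditions can be made concrete with a small simulation. This is only a sketch: the parameter values `BETA0`, `BETA1`, and `SIGMA` and the helper `simulate` are hypothetical and chosen for illustration, not taken from the course material.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 10.0, 2.0, 3.0

def simulate(xs, seed=0):
    """Draw Y_i = BETA0 + BETA1 * X_i + eps_i with eps_i ~ Normal(0, SIGMA)."""
    rng = random.Random(seed)
    # Each error is drawn independently, with mean 0 and the same
    # standard deviation SIGMA whatever the value of x.
    errors = [rng.gauss(0.0, SIGMA) for _ in xs]
    ys = [BETA0 + BETA1 * x + e for x, e in zip(xs, errors)]
    return ys, errors

xs = [i % 50 for i in range(5000)]
ys, errors = simulate(xs)
print("mean of errors:", round(statistics.mean(errors), 3))
print("std dev of errors:", round(statistics.stdev(errors), 3))
```

With a few thousand draws, the sample mean of the errors should sit near zero and their sample standard deviation near `SIGMA`.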
Population
Linear Regression Model
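$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$ (Observed Value)
$\varepsilon_i$ = Random Error
$$\mu_{Y|X} = \beta_0 + \beta_1 X_i$$
[Figure: Y plotted against X, showing an observed value and its random error relative to the line $\mu_{Y|X}$]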
Sample Linear Regression
Model
$$\hat{Y}_i = b_0 + b_1 X_i$$
$\hat{Y}_i$ = Predicted Value of Y for observation i
$$b_1 = \frac{\operatorname{cov}(X, Y)}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}, \qquad \hat{y} = b_0 + b_1 x$$
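As a short worked step, the covariance form of the slope can be written out in sums. The only assumption is that both the sample covariance and the sample variance use the same $n - 1$ divisor, so it cancels:

```latex
b_1 = \frac{\operatorname{cov}(X, Y)}{s_x^2}
    = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
    = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
           {\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Substituting b_0 = \bar{y} - b_1 \bar{x} into the fitted line at x = \bar{x}:
\hat{y}(\bar{x}) = b_0 + b_1 \bar{x} = \bar{y} - b_1 \bar{x} + b_1 \bar{x} = \bar{y}
```

The second line shows that the fitted line always passes through the point $(\bar{x}, \bar{y})$.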
REGRESSION COEFFICIENTS
You wish to examine the relationship between the square footage of produce stores and their annual sales. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store    Square Feet    Annual Sales ($000)
1        1,726          3,681
2        1,542          3,395
3        2,816          6,653
4        5,555          9,543
5        1,292          3,318
6        2,208          5,563
7        1,313          3,760
Scatter Diagram Example
[Figure: scatter diagram of Annual Sales ($000) versus Square Feet]
Excel Output
Equation for the Best
Straight Line
$$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$$
[Figure: best straight line plotted against Square Feet]
Interpreting the Results
$\hat{Y}_i = 1636.415 + 1.487 X_i$
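The coefficients in this equation can be recomputed from the seven produce stores with the least-squares formulas for $b_1$ and $b_0$. A minimal sketch in plain Python; the helper name `fit_line` is an assumption made for this example.

```python
# Produce-store sample: square feet and annual sales ($000) for 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Because sales are recorded in $000, the slope is read as the estimated change in annual sales, in thousands of dollars, per additional square foot.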
• Test Statistic:
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{where } S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
and df = n - 2
Standard Error of Estimate
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
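A sketch of this calculation on the produce-store sample, assuming the least-squares fit is recomputed from the seven listed stores rather than taken from the rounded coefficients:

```python
import math

# Produce-store sample: square feet and annual sales ($000) for 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals: observed Y minus predicted Y for each store.
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, annual_sales)]
sse = sum(e ** 2 for e in residuals)

# Standard error of estimate, with n - 2 degrees of freedom.
s_yx = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.1f}, S_YX = {s_yx:.2f}")
```

The printed `S_YX` can be checked against the Standard Error entry in the Excel regression output.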
Example: Produce Stores
Rejection region: t < -2.5706 or t > 2.5706 (.025 in each tail)
Conclusion: There is evidence of a linear relationship.
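The test of the slope can be sketched as follows. The critical value 2.5706 is the one quoted above (df = n - 2 = 5); everything else is recomputed from the seven stores, and the variable names are assumptions made for this example.

```python
import math

# Produce-store sample: square feet and annual sales ($000) for 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRIT = 2.5706  # two-tailed critical value quoted in the slides (df = 5)

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic under H0: beta_1 = 0.
s_b1 = s_yx / math.sqrt(sxx)
t_stat = (b1 - 0.0) / s_b1
reject_h0 = abs(t_stat) > T_CRIT
print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

A t statistic beyond either critical value falls in the rejection region, which is what supports the conclusion of a linear relationship.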
Inferences about the Slope:
Confidence Interval Example
$$SSR = \sum (\hat{Y}_i - \bar{Y})^2$$
[Figure: Y versus X, marking $\bar{Y}$ and $X_i$]
Measures of Variation
The Sum of Squares: Example
$H_0$: $\beta_1 = 0$
$H_1$: At least one $\beta_i$ is not equal to 0
The F test
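Construct the F statistic:
$$F = \frac{MSR}{MSE}, \qquad MSR = \frac{SSR}{k-1}, \qquad MSE = \frac{SSE}{n-k}$$
Here k counts the estimated coefficients, so k = 2 for the simple model and MSE has n - 2 degrees of freedom, matching $S_{YX}$.
[Variation in y] = SSR + SSE.
Large F results from a large SSR. Then, much of the variation in y is explained by the regression model. The null hypothesis should be rejected; thus, the model is valid.
Rejection region: $F > F_{\alpha,\,k-1,\,n-k}$
Required conditions must be satisfied.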
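A sketch of the F statistic for the produce-store sample, assuming k = 2 estimated coefficients (intercept and slope). No F table is used: with one numerator degree of freedom, the F critical value equals the square of the two-tailed t critical value, so the quoted 2.5706 is squared.

```python
# Produce-store sample: square feet and annual sales ($000) for 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
K = 2  # assumed: number of estimated coefficients (b0 and b1)

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in square_feet]
ssr = sum((f - y_bar) ** 2 for f in fitted)                   # explained variation
sse = sum((y - f) ** 2 for y, f in zip(annual_sales, fitted))  # unexplained variation

msr = ssr / (K - 1)
mse = sse / (n - K)
f_stat = msr / mse

# For 1 numerator df, F critical = (two-tailed t critical) squared.
f_crit = 2.5706 ** 2
print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, reject H0: {f_stat > f_crit}")
```

In simple regression the F statistic is the square of the t statistic for the slope, so the two tests reach the same decision.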
The Coefficient of
Determination
[Figure: four panels of Y versus X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$; the top panels are labeled $r^2 = 1$, $r = +1$ and $r^2 = 1$, $r = -1$]
Measures of Variation:
Example
Excel Output for Produce Stores
Regression Statistics
Multiple R           0.9705572
R Square             0.94198129
Adjusted R Square    0.93037754
Standard Error       611.751517
Observations         7

$r^2 = .94$ (R Square); $S_{YX}$ is the Standard Error entry
94% of the variation in annual sales can be
explained by the variability in the size of the
store as measured by square footage
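The R Square and Multiple R entries can be recomputed from the sums of squares. A minimal sketch, assuming $r^2 = SSR / SST$ with SST the total variation of annual sales about its mean:

```python
import math

# Produce-store sample: square feet and annual sales ($000) for 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in annual_sales)              # total variation
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in square_feet)     # explained variation

r_squared = ssr / sst
# r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)
print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
```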
Estimation of
Predicted Values
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
Interval Estimates for
Different Values of X
[Figure: Y versus X, showing at a given X the confidence interval for an individual $Y_i$ and the confidence interval for the mean of Y, with $\bar{X}$ marked on the X axis]
Example: Produce Stores
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 980.97$$
Confidence interval for mean Y
Estimation of Predicted
Values: Example
Confidence Interval Estimate for $\mu_{Y|X}$
Find the 95% confidence interval for annual sales of one
particular store of 2,000 square feet
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1853.45$$
Confidence interval for individual Y
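Both interval formulas can be sketched in a few lines. The helper `interval_margins` is a hypothetical name, and the critical value 2.5706 is the one quoted earlier for df = 5. The margins it prints are computed from the seven listed stores with unrounded coefficients, so they should be compared against the quoted ± figures rather than assumed to agree with them.

```python
import math

# Produce-store sample: square feet and annual sales ($000) for 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
T_CRIT = 2.5706  # two-tailed 95% critical value quoted in the slides (df = 5)

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

def interval_margins(x_new):
    """Return (margin for the mean of Y, margin for an individual Y) at x_new."""
    h = 1 / n + (x_new - x_bar) ** 2 / sxx
    half_width = T_CRIT * s_yx
    return half_width * math.sqrt(h), half_width * math.sqrt(1 + h)

y_hat = b0 + b1 * 2000
margin_mean, margin_individual = interval_margins(2000)
print(f"predicted sales: {y_hat:.2f}")
print(f"mean of Y:    +/- {margin_mean:.2f}")
print(f"individual Y: +/- {margin_individual:.2f}")
```

The interval for an individual store is always the wider of the two, because of the extra 1 under the square root.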