[Figure: scatter plots of Y against X illustrating strong relationships, weak relationships, and no relationship]
Simple Linear Regression Model
• Only one independent variable, X
• Relationship between X and Y is described by a linear function
• Changes in Y are assumed to be related to changes in X
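Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
$Y_i$ = dependent variable
$\beta_0$ = population Y intercept
$\beta_1$ = population slope coefficient
$X_i$ = independent variable
$\varepsilon_i$ = random error term
[Figure: population regression line with intercept β0 and slope β1; at a given Xi, the observed value of Y differs from the line by the random error εi]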
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$X_i$ = value of X for observation i
Estimation Process
• Regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$, with unknown parameters $\beta_0$, $\beta_1$
• Sample data: $(x_1, y_1), \ldots, (x_n, y_n)$
• Estimated regression equation: $\hat{Y} = b_0 + b_1 X$, with sample statistics $b_0$, $b_1$
• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$
$b_0$ and $b_1$ are obtained by finding the values that minimize the sum of the squared differences between $Y$ and $\hat{Y}$:
$$\min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
where:
$Y_i$ = observed value of the dependent variable for the ith observation
$\hat{Y}_i$ = estimated value of the dependent variable for the ith observation
Least Squares Method
Slope and intercept for the Estimated Regression Equation
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \quad \text{and} \quad b_0 = \bar{Y} - b_1 \bar{X}$$
where:
$X_i$ = value of independent variable for ith observation
$Y_i$ = value of dependent variable for ith observation
$\bar{X}$ = mean of the independent variable
$\bar{Y}$ = mean of the dependent variable
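The slope and intercept formulas can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `least_squares_fit` and the toy data are assumptions made for the example, not part of the original slides.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the estimated line y_hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 1 + 2x
b0, b1 = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Because the toy points lie exactly on a line, the fit recovers that line; with real data the same function returns the line that minimizes the sum of squared differences.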
Simple Linear Regression
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the
advertising campaign, Reed runs one or more television commercials during
the weekend preceding the sale. Data from a sample of 5 previous sales are
shown below:
X = number of TV ads, Y = number of cars sold

  X     Y    X-X̄   Y-Ȳ   (X-X̄)(Y-Ȳ)   (X-X̄)^2   Est. sales = 10+5X   Error (ε) = Actual sales - Est. sales
  1    14    -1    -6         6           1              15                  -1
  3    24     1     4         4           1              25                  -1
  2    18     0    -2         0           0              20                  -2
  1    17    -1    -3         3           1              15                   2
  3    27     1     7         7           1              25                   2

ΣX = 10   ΣY = 100   Σ(X-X̄)(Y-Ȳ) = 20   Σ(X-X̄)^2 = 4
X̄ = 2   Ȳ = 20
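Estimated Regression Equation
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{20}{4} = 5$$
$$b_0 = \bar{Y} - b_1 \bar{X} = 20 - 5(2) = 10$$
i.e., $\hat{Y} = 10 + 5X$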
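The Reed Auto estimates can be reproduced with a short Python sketch. The variable names and the helper `predict_cars` are hypothetical; only the five data pairs come from the example.

```python
# Reed Auto sample: number of TV ads (X) and number of cars sold (Y)
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

n = len(ads)
x_bar = sum(ads) / n    # mean of X: 2.0
y_bar = sum(cars) / n   # mean of Y: 20.0

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))  # 20.0
sxx = sum((x - x_bar) ** 2 for x in ads)                         # 4.0

b1 = sxy / sxx            # slope: 5.0
b0 = y_bar - b1 * x_bar   # intercept: 10.0


def predict_cars(num_ads):
    """Hypothetical helper: estimated cars sold for a given number of TV ads."""
    return b0 + b1 * num_ads


print(f"Y_hat = {b0:g} + {b1:g} X")  # Y_hat = 10 + 5 X
print(predict_cars(3))               # 25.0
```

The slope of 5 says that each additional TV ad is associated with an estimated 5 additional cars sold, within the range of the sample data.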
Coefficient of Determination
• Relationship Among SST, SSR, SSE
SST = SSR + SSE
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Sum of Squares calculation
X = number of TV ads, Y = number of cars sold, Ŷ = estimated sales = 10+5X

  X     Y    X-X̄   Y-Ȳ    Ŷ    Error (ε) = Y-Ŷ   (Y-Ȳ)^2   (Ŷ-Ȳ)^2   (Y-Ŷ)^2
  1    14    -1    -6    15         -1             36        25         1
  3    24     1     4    25         -1             16        25         1
  2    18     0    -2    20         -2              4         0         4
  1    17    -1    -3    15          2              9        25         4
  3    27     1     7    25          2             49        25         4

Column totals: SST = Σ(Y-Ȳ)^2 = 114, SSR = Σ(Ŷ-Ȳ)^2 = 100, SSE = Σ(Y-Ŷ)^2 = 14
$r^2 = \mathrm{SSR}/\mathrm{SST}$
OR
$r^2 = 1 - \mathrm{SSE}/\mathrm{SST}$
where:
SSR = sum of squares due to regression
SSE = sum of squares due to errors
SST = total sum of squares
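As a check on the sums of squares for the Reed Auto data, the following Python sketch recomputes SST, SSR, SSE, and r². It assumes the estimated equation Ŷ = 10 + 5X used in the table; the variable names are illustrative.

```python
# Reed Auto sample: number of TV ads (X) and number of cars sold (Y)
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

b0, b1 = 10.0, 5.0   # estimated regression equation Y_hat = 10 + 5X
y_bar = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)              # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)            # sum of squares due to regression
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted))  # sum of squares due to error

r_squared = ssr / sst

print(sst, ssr, sse)        # 114.0 100.0 14.0
print(round(r_squared, 4))  # 0.8772
```

Both forms of the formula agree here because SST = SSR + SSE (114 = 100 + 14), so about 88% of the variability in cars sold is explained by the number of TV ads.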
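Sample Correlation Coefficient
$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}} = (\text{sign of } b_1)\sqrt{r^2}$$
where:
b1 = the slope of the estimated regression equation
For the Reed Auto data, b1 = 5 is positive and r² = 100/114 = .8772, so
$r_{xy} = +\sqrt{.8772} = +.9366$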
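The value +.9366 can be checked in Python. The sketch below is an illustration with assumed variable names; it takes the square root of r² with the sign of the slope and compares it with the direct product-moment formula.

```python
import math

# Reed Auto sample: number of TV ads (X) and number of cars sold (Y)
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))  # 20.0
sxx = sum((x - x_bar) ** 2 for x in ads)                         # 4.0
syy = sum((y - y_bar) ** 2 for y in cars)                        # 114.0 (this is SST)

b1 = sxy / sxx
r_squared = sxy ** 2 / (sxx * syy)  # equals SSR/SST in simple linear regression

# Square root of r^2, carrying the sign of the slope b1
r_xy = math.copysign(math.sqrt(r_squared), b1)

# Direct product-moment formula, for comparison
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r_xy, 4))  # 0.9366
```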
Assumptions about the Error term ε
$s^2 = \mathrm{MSE} = \mathrm{SSE}/(n - k - 1)$
where:
k is the number of independent variables
Testing for Significance
• An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
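For the Reed Auto data, the standard error of the estimate can be computed as in the sketch below. It assumes the five errors from the example table (-1, -1, -2, 2, 2); the variable names are illustrative.

```python
import math

# Errors (actual sales - estimated sales) from the Reed Auto table
errors = [-1, -1, -2, 2, 2]

n = len(errors)  # 5 observations
k = 1            # one independent variable in simple linear regression

sse = sum(e ** 2 for e in errors)  # 14
mse = sse / (n - k - 1)            # s^2 = MSE = SSE / (n - k - 1)
s = math.sqrt(mse)                 # standard error of the estimate

print(round(mse, 4))  # 4.6667
print(round(s, 4))    # 2.1602
```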
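Testing for Significance: t Test
• Hypotheses
$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$
• Test Statistic
$$t = \frac{b_1}{s_{b_1}}$$
where
$$s_{b_1} = \frac{s}{\sqrt{\sum (X_i - \bar{X})^2}}$$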
Testing for Significance: t Test
Rejection Rule
Reject H0 if p-value < α or $|t| > t_{\alpha/2}$
where:
$t_{\alpha/2}$ is based on a t distribution
with n - 2 degrees of freedom
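A sketch of the t test for the Reed Auto slope follows. It assumes the usual test statistic t = b1 / s_b1 with s_b1 = s / sqrt(Σ(X - X̄)²), a two-tailed test at α = 0.05, and a critical value of 3.182 read from a standard t table for 3 degrees of freedom; the variable names are illustrative.

```python
import math

# Reed Auto sample and the errors from the fitted line Y_hat = 10 + 5X
ads = [1, 3, 2, 1, 3]
errors = [-1, -1, -2, 2, 2]
b1 = 5.0

n = len(ads)
x_bar = sum(ads) / n
sxx = sum((x - x_bar) ** 2 for x in ads)  # 4.0

sse = sum(e ** 2 for e in errors)         # 14
s = math.sqrt(sse / (n - 2))              # standard error of the estimate

s_b1 = s / math.sqrt(sxx)                 # estimated standard deviation of b1
t_stat = b1 / s_b1

# Assumed table value: t_{0.025} with n - 2 = 3 degrees of freedom
T_CRITICAL = 3.182
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(s_b1, 4))    # 1.0801
print(round(t_stat, 3))  # 4.629
print(reject_h0)         # True
```

Since 4.629 exceeds 3.182, the sketch rejects H0: β1 = 0 at the 0.05 level, which suggests a significant relationship between TV ads and cars sold in this sample.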
Testing for Significance: F Test
Hypotheses
$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$
Test Statistic
F = MSR/MSE
MSR = SSR/k
MSE = SSE/(n-k-1)
Rejection Rule
Reject H0 if
p-value < α
or $F > F_\alpha$
where:
$F_\alpha$ is based on an F distribution with
1 degree of freedom in the numerator and
n - 2 degrees of freedom in the denominator
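The F test can be sketched for the Reed Auto data as follows. The sketch assumes SSR = 100 and SSE = 14 computed from the example data, α = 0.05, and a critical value of 10.13 read from a standard F table for 1 and 3 degrees of freedom; the variable names are illustrative.

```python
# Sums of squares for the Reed Auto example (n = 5 observations)
ssr = 100.0
sse = 14.0
n = 5
k = 1  # number of independent variables

msr = ssr / k            # mean square due to regression
mse = sse / (n - k - 1)  # mean square error
f_stat = msr / mse

# Assumed table value: F_{0.05} with 1 and n - 2 = 3 degrees of freedom
F_CRITICAL = 10.13
reject_h0 = f_stat > F_CRITICAL

print(round(f_stat, 2))  # 21.43
print(reject_h0)         # True
```

With a single independent variable the F statistic equals the square of the t statistic for the slope, so the two tests lead to the same conclusion.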
Testing for Significance: F Test
[Figure: scatter plot with fitted linear trendline f(x) = 5x + 60, R² = 0.9; vertical axis: Quarterly sales ($1000s), 0 to 200; horizontal axis: 0 to 30]
Regression Statistics
Multiple R 0.950122955
R Square 0.90273363
Adjusted R Square 0.890575334
Standard Error 13.82931669
Observations 10
ANOVA
                  df   SS      MS       F             Significance F
Regression        1    14200   14200    74.24836601   2.54887E-05
Residual/Error    8    1530    191.25
Total             9    15730

                Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept       60             9.22603481       6.503335532   0.000187444   38.72472558   81.27527442   38.72472558   81.27527442
X Variable 1    5              0.580265238      8.616749156   2.54887E-05   3.661905962   6.338094038   3.661905962   6.338094038
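The entries in this output are tied together by the formulas given earlier. The Python sketch below recomputes several of them from the reported sums of squares and slope; it is a cross-check only, with assumed variable names and a critical value of 2.306 read from a standard t table for 8 degrees of freedom.

```python
import math

# Values copied from the regression output above
n, k = 10, 1
ssr, sse, sst = 14200.0, 1530.0, 15730.0
b1, se_b1 = 5.0, 0.580265238

r_squared = ssr / sst                                         # R Square
multiple_r = math.sqrt(r_squared)                             # Multiple R
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)   # Adjusted R Square

mse = sse / (n - k - 1)      # MS for Residual/Error
std_error = math.sqrt(mse)   # Standard Error (of the estimate)
f_stat = (ssr / k) / mse     # F = MSR / MSE
t_slope = b1 / se_b1         # t Stat for X Variable 1

# Assumed table value: t_{0.025} with n - 2 = 8 degrees of freedom
T_CRITICAL = 2.306
ci_lower = b1 - T_CRITICAL * se_b1
ci_upper = b1 + T_CRITICAL * se_b1

print(round(r_squared, 4), round(multiple_r, 4), round(adj_r_squared, 4))
print(round(mse, 2), round(std_error, 4), round(f_stat, 4), round(t_slope, 4))
print(round(ci_lower, 3), round(ci_upper, 3))
```

The recomputed values should agree with the output to within rounding; in particular, the slope's t statistic squared matches the F statistic, and the interval limits match the Lower 95% and Upper 95% columns.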
RESIDUAL OUTPUT
1. Data on advertising expenditures and revenue (in thousands of dollars) for the Four Seasons Restaurant are as
follows:
Advertising expenditures ($1000s)   Revenue ($1000s)
1 19
2 32
4 44
6 40
10 52
14 53
20 54
a. Let x equal advertising expenditures and y equal revenue. Use the method of least squares to develop a
straight-line approximation of the relationship between the two variables.
b. Test whether revenue and advertising expenditures are related at the 0.05 level of significance.
c. Test whether the estimated regression coefficient is significant at the 0.05 level of significance.
d. Construct a confidence interval for the regression coefficient at the 0.05 level of significance (that is, a 95% confidence interval).
2. Concur Technologies, Inc., is a large expense-management company located in Redmond, Washington. The Wall
Street Journal asked Concur to examine the data from 8.3 million expense reports to provide insights regarding
business travel expenses. Their analysis of the data showed that New York was the most expensive city, with an
average daily hotel room rate of $198 and an average amount spent on entertainment, including group meals and
tickets for shows, sports, and other events, of $172. In comparison, the U.S. averages for these two categories were
$89 for the room rate and $99 for entertainment. The following table shows the average daily hotel room rate and the
amount spent on entertainment for a random sample of 9 of the 25 most visited U.S. cities.
City   Room rate ($)   Entertainment ($)
Boston 148 161
Denver 96 105
Nashville 91 101
New Orleans 110 142
Phoenix 90 100
San Diego 102 120
San Francisco 136 167
San Jose 90 140
Tampa 82 98
(i) Develop a scatter diagram for these data with the room rate as the independent variable.
(ii) What does the scatter diagram developed in part (i) indicate about the relationship between the two variables?
(iii) Develop the least squares estimated regression equation.
(iv) Provide an interpretation for the slope of the estimated regression equation.
(v) The average room rate in Chicago is $128, considerably higher than the U.S. average. Predict the
entertainment expense per day for Chicago.
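One possible way to set up parts (iii) and (v) of Exercise 2 in Python is sketched below. The variable names are hypothetical, and the sketch only applies the least squares formulas from earlier in these notes to the nine cities in the table; it is not a complete solution to the exercise.

```python
# Exercise 2 data: average daily room rate ($) and entertainment spending ($)
room_rate = [148, 96, 91, 110, 90, 102, 136, 90, 82]
entertainment = [161, 105, 101, 142, 100, 120, 167, 140, 98]

n = len(room_rate)
x_bar = sum(room_rate) / n
y_bar = sum(entertainment) / n

# Least squares slope and intercept
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(room_rate, entertainment))
sxx = sum((x - x_bar) ** 2 for x in room_rate)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Part (v): point estimate for a city with a $128 average room rate
chicago_room_rate = 128
chicago_estimate = b0 + b1 * chicago_room_rate

print(f"Estimated equation: y_hat = {b0:.2f} + {b1:.4f} x")
print(f"Estimated entertainment expense at ${chicago_room_rate}: {chicago_estimate:.2f}")
```

The slope printed here is the estimated change in daily entertainment spending for each additional dollar of room rate, which is the quantity part (iv) asks to interpret.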