BA2 1 Simp Reg

In this chapter, you learn:
• How to use regression analysis to predict the value of a dependent variable based on an independent variable
• The meaning of the regression coefficients b0 and b1
• How to evaluate the assumptions of regression analysis and
know what to do if the assumptions are violated
• To make inferences about the slope and correlation
coefficient
• To estimate mean values and predict individual values
Simple Linear Regression
• Managerial decisions are often based on the relationship between two or more variables.
• Regression analysis can be used to develop an equation showing how the variables are related.
• The variable being predicted is called the dependent variable and is denoted by y.
• The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.
• Simple linear regression involves one independent variable and one dependent variable.
• The relationship between the two variables is approximated by a straight line.
• Regression analysis involving two or more independent variables is called multiple regression.
• A scatter plot can be used to show the relationship between
two variables
• Correlation analysis is used to measure the strength of the
association (linear relationship) between two variables
• Correlation is only concerned with strength of the relationship
• No causal effect is implied with correlation
Introduction to Regression Analysis
• Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
[Figure: scatter plots of Y against X illustrating linear and curvilinear relationships, strong and weak relationships, and no relationship]
Model
• Only one independent variable, X
• Relationship between X and Y is described by a linear function
• Changes in Y are assumed to be related to changes in X

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
Yi = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
Xi = independent variable
εi = random error term
β0 + β1Xi is the linear component and εi is the random error component.
Model (continued)
[Figure: the population regression line with intercept β0 and slope β1. For a given Xi, the observed value of Y differs from the predicted value of Y on the line by the random error εi.]
Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
Ŷi = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
Xi = value of X for observation i
Estimation Process
1. Regression model: Y = β0 + β1X + ε, with unknown parameters β0 and β1.
2. Sample data: (x1, y1), ..., (xn, yn).
3. Estimated regression equation: Ŷ = b0 + b1X, with sample statistics b0 and b1.
4. b0 and b1 provide estimates of β0 and β1.
Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Yi and Ŷi:
$$\min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
where:
Yi = observed value of the dependent variable for the ith observation
Ŷi = estimated value of the dependent variable for the ith observation

Slope and intercept for the Estimated Regression Equation
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \quad \text{and} \quad b_0 = \bar{Y} - b_1 \bar{X}$$
where:
Xi = value of the independent variable for the ith observation
Yi = value of the dependent variable for the ith observation
X̄ = mean of the independent variable
Ȳ = mean of the dependent variable
Example: Reed Auto Sales
Reed Auto periodically has a special week-long sale. As part of the
advertising campaign Reed runs one or more television commercials during
the weekend preceding the sale. Data from a sample of 5 previous sales are
shown below:
| No. of TV ads (X) | No. of cars sold (Y) | X - X̄ | Y - Ȳ | (X - X̄)(Y - Ȳ) | (X - X̄)² | Est. sales Ŷ = 10 + 5X | Error (ε) = Actual sales - Est. sales |
|---|---|---|---|---|---|---|---|
| 1 | 14 | -1 | -6 | 6 | 1 | 15 | -1 |
| 3 | 24 | 1 | 4 | 4 | 1 | 25 | -1 |
| 2 | 18 | 0 | -2 | 0 | 0 | 20 | -2 |
| 1 | 17 | -1 | -3 | 3 | 1 | 15 | 2 |
| 3 | 27 | 1 | 7 | 7 | 1 | 25 | 2 |
| ΣX = 10 | ΣY = 100 | | | Σ = 20 | Σ = 4 | | |

X̄ = 2, Ȳ = 20
Estimated Regression Equation
• Slope for the Estimated Regression Equation
  b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² = 20/4 = 5
• y-Intercept for the Estimated Regression Equation
  b0 = Ȳ - b1X̄ = 20 - 5(2) = 10
• Estimated Regression Equation
  Ŷ = b0 + b1X, i.e., Ŷ = 10 + 5X
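The slope and intercept calculation for the Reed Auto data can be checked with a short script. This is a minimal sketch in plain Python; the function name `least_squares` and the variable names are illustrative assumptions, not part of the original slides.

```python
# Reed Auto Sales sample: number of TV ads (X) and cars sold (Y)
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(tv_ads, cars_sold)
print(f"y_hat = {b0:g} + {b1:g}x")  # prints: y_hat = 10 + 5x
```

The intermediate sums `sxy` and `sxx` correspond to the column totals 20 and 4 in the worked table.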
Coefficient of Determination
• Relationship Among SST, SSR, SSE
SST = SSR + SSE
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Sum of Squares calculation

| No. of TV ads (X) | No. of cars sold (Y) | X - X̄ | Y - Ȳ | Est. sales Ŷ = 10 + 5X | Error (ε) = Y - Ŷ | (Y - Ȳ)² | (Ŷ - Ȳ)² | (Y - Ŷ)² |
|---|---|---|---|---|---|---|---|---|
| 1 | 14 | -1 | -6 | 15 | -1 | 36 | 25 | 1 |
| 3 | 24 | 1 | 4 | 25 | -1 | 16 | 25 | 1 |
| 2 | 18 | 0 | -2 | 20 | -2 | 4 | 0 | 4 |
| 1 | 17 | -1 | -3 | 15 | 2 | 9 | 25 | 4 |
| 3 | 27 | 1 | 7 | 25 | 2 | 49 | 25 | 4 |
| | | | | | | SST = 114 | SSR = 100 | SSE = 14 |

X̄ = 2, Ȳ = 20

• The coefficient of determination is:
r² = SSR/SST
OR
r² = 1 - (SSE/SST)
where:
SSR = sum of squares due to regression
SSE = sum of squares due to error
SST = total sum of squares
r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong; 87.72% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
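The three sums of squares and the coefficient of determination can be reproduced as follows. This is a sketch in plain Python that takes the estimated equation Ŷ = 10 + 5X from the text as given; the variable names are illustrative.

```python
# Reed Auto Sales sample and the estimated equation y_hat = 10 + 5x
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

# Total, regression, and error sums of squares
sst = sum((y - y_bar) ** 2 for y in cars_sold)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

r_squared = ssr / sst
print(f"SST={sst:g} SSR={ssr:g} SSE={sse:g} r^2={r_squared:.4f}")
```

Computing all three sums independently, rather than deriving SSE as SST - SSR, gives a quick check that SST = SSR + SSE holds for the fitted line.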
Sample Correlation Coefficient
$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$
where:
b1 = the slope of the estimated regression equation Ŷ = b0 + b1X

For the Reed Auto data, the sign of b1 in the equation Ŷ = 10 + 5X is "+".
rxy = +√.8772 = +.9366
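As a cross-check, the correlation coefficient can be computed both from r² and directly from the deviations of the Reed Auto data. This is a sketch in plain Python; the direct formula used here is the standard Pearson correlation, which the slides do not state explicitly, so treat it as an added assumption.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

# Route 1: sign of the slope times the square root of r^2 (values from the text)
b1 = 5
r_squared = 100 / 114
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Route 2: Pearson correlation computed directly from the deviations
x_bar = sum(tv_ads) / len(tv_ads)
y_bar = sum(cars_sold) / len(cars_sold)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
syy = sum((y - y_bar) ** 2 for y in cars_sold)
r_direct = sxy / math.sqrt(sxx * syy)

print(f"{r_from_r2:.4f} {r_direct:.4f}")
```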
Assumptions about the Error Term ε
1. The error ε is a random variable with a mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.
• An Estimate of σ²
The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
s² = MSE = SSE/(n - k - 1)
where:
k is the number of independent variables
• An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
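For the Reed Auto example (SSE = 14, n = 5, k = 1), the estimate of σ² and the standard error of the estimate work out as below. A minimal sketch in plain Python; the variable names are illustrative.

```python
import math

sse = 14   # sum of squares due to error for the Reed Auto data
n = 5      # number of observations
k = 1      # number of independent variables

mse = sse / (n - k - 1)   # s^2, the estimate of sigma^2
s = math.sqrt(mse)        # standard error of the estimate

print(f"s^2 = {mse:.3f}, s = {s:.3f}")
```

The value s² = 4.667 is the MSE that appears later in the F statistic for the same data.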
Testing for Significance: t Test
• Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
• Test Statistic
  $$t = \frac{b_1}{s_{b_1}} \quad \text{where} \quad s_{b_1} = \frac{s}{\sqrt{\sum (X_i - \bar{X})^2}}$$
• Rejection Rule
  Reject H0 if p-value < α
  or t < -t(α/2) or t > t(α/2)
  where:
  t(α/2) is based on a t distribution with n - 2 degrees of freedom

Applied to the Reed Auto data:
1. Determine the hypotheses. H0: β1 = 0, Ha: β1 ≠ 0
2. Specify the level of significance. α = .05
3. Select the test statistic. t = b1/s(b1)
4. State the rejection rule. Reject H0 if p-value < .05 or |t| > 3.182 (with 3 degrees of freedom)
5. Compute the value of the test statistic. t = b1/s(b1) = 5/1.08 = 4.63
6. Determine whether to reject H0.
   t = 4.541 provides an area of .01 in the upper tail. Because 4.63 > 4.541, the two-tailed p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.
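The t statistic for the Reed Auto data can be recomputed from SSE and Σ(X - X̄)². This sketch uses plain Python and takes the critical value 3.182 from the text rather than computing it from a t distribution; the variable names are illustrative.

```python
import math

b1 = 5          # estimated slope
sse = 14        # sum of squares due to error
sxx = 4         # sum of (X - X_bar)^2
n = 5

s = math.sqrt(sse / (n - 2))    # standard error of the estimate
s_b1 = s / math.sqrt(sxx)       # estimated standard deviation of b1
t_stat = b1 / s_b1

t_critical = 3.182              # t(.025) with 3 degrees of freedom, from the text
reject_h0 = abs(t_stat) > t_critical

print(f"s_b1 = {s_b1:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```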
Confidence Interval for β1
• We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
• H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
• The form of a confidence interval for β1 is:
$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$
where b1 is the point estimator, t(α/2)·s(b1) is the margin of error, and t(α/2) is the t value providing an area of α/2 in the upper tail of a t distribution with n - 2 degrees of freedom.
• Rejection Rule
  Reject H0 if 0 is not included in the confidence interval for β1.
• 95% Confidence Interval for β1
  b1 ± t(α/2)·s(b1) = 5 ± 3.182(1.08) = 5 ± 3.44
  or 1.56 to 8.44
• Conclusion
  0 is not included in the confidence interval. Reject H0.
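The interval arithmetic can be verified with a few lines. This is a sketch in plain Python; the t value 3.182 is taken from the text, and the variable names are illustrative.

```python
import math

b1 = 5
s_b1 = math.sqrt(14 / 3) / math.sqrt(4)   # s / sqrt(sum of (X - X_bar)^2), about 1.08
t_critical = 3.182                        # t(.025) with 3 degrees of freedom, from the text

margin = t_critical * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta1 = 0 is rejected when 0 lies outside the interval
reject_h0 = not (lower <= 0 <= upper)
print(f"{lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```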
Testing for Significance: F Test
• Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
• Test Statistic
  F = MSR/MSE
  MSR = SSR/k
  MSE = SSE/(n - k - 1)
  k = number of independent variables
• Rejection Rule
  Reject H0 if p-value < α or F > F(α)
  where:
  F(α) is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator

Applied to the Reed Auto data:
1. Determine the hypotheses. H0: β1 = 0, Ha: β1 ≠ 0
2. Specify the level of significance. α = .05
3. Select the test statistic. F = MSR/MSE
4. State the rejection rule. Reject H0 if p-value < .05 or F > 10.13 (with 1 d.f. in the numerator and 3 d.f. in the denominator)
5. Compute the value of the test statistic. F = MSR/MSE = 100/4.667 = 21.43
6. Determine whether to reject H0.
   F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than .025. Hence, we reject H0.
The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
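The F statistic for the Reed Auto data follows directly from SSR and SSE. This sketch in plain Python takes the critical value 10.13 from the text and also checks that, with a single independent variable, F equals the square of the t statistic; the variable names are illustrative.

```python
import math

ssr, sse = 100, 14
n, k = 5, 1

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

f_critical = 10.13   # F(.05) with 1 and 3 degrees of freedom, from the text
reject_h0 = f_stat > f_critical

# With one independent variable, F is the square of the slope's t statistic
t_stat = 5 / (math.sqrt(mse) / math.sqrt(4))
print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {reject_h0}")
```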
Example: Data were collected from a sample of 10 Armand’s Pizza Parlor restaurants located near college
campuses. Obtain the estimated regression line and estimate quarterly sales of an outlet near a campus with
30,000 students.
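| Student population (1000s) X | Quarterly sales ($1000s) Y | X - X̄ | Y - Ȳ | (X - X̄)(Y - Ȳ) | (X - X̄)² | Ŷ = 60 + 5X | Error (Y - Ŷ) | (Y - Ȳ)² | (Ŷ - Ȳ)² | (Y - Ŷ)² |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 58 | -12 | -72 | 864 | 144 | 70 | -12 | 5184 | 3600 | 144 |
| 6 | 105 | -8 | -25 | 200 | 64 | 90 | 15 | 625 | 1600 | 225 |
| 8 | 88 | -6 | -42 | 252 | 36 | 100 | -12 | 1764 | 900 | 144 |
| 8 | 118 | -6 | -12 | 72 | 36 | 100 | 18 | 144 | 900 | 324 |
| 12 | 117 | -2 | -13 | 26 | 4 | 120 | -3 | 169 | 100 | 9 |
| 16 | 137 | 2 | 7 | 14 | 4 | 140 | -3 | 49 | 100 | 9 |
| 20 | 157 | 6 | 27 | 162 | 36 | 160 | -3 | 729 | 900 | 9 |
| 20 | 169 | 6 | 39 | 234 | 36 | 160 | 9 | 1521 | 900 | 81 |
| 22 | 149 | 8 | 19 | 152 | 64 | 170 | -21 | 361 | 1600 | 441 |
| 26 | 202 | 12 | 72 | 864 | 144 | 190 | 12 | 5184 | 3600 | 144 |
| X̄ = 14 | Ȳ = 130 | | | Σ = 2840 | Σ = 568 | | | SST = 15730 | SSR = 14200 | SSE = 1530 |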
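The fitted line and the requested estimate for a campus with 30,000 students can be obtained as sketched below. This uses plain Python; the helper name `predict_sales` is illustrative, and note that X = 30 lies outside the sampled range of 2 to 26 (thousand students), so the estimate is an extrapolation.

```python
# Armand's Pizza Parlors: student population (1000s) and quarterly sales ($1000s)
students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(students)
x_bar = sum(students) / n
y_bar = sum(sales) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(students, sales))
sxx = sum((x - x_bar) ** 2 for x in students)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar


def predict_sales(student_population_thousands):
    """Estimated quarterly sales in $1000s for a given student population in 1000s."""
    return b0 + b1 * student_population_thousands


estimate = predict_sales(30)
print(f"y_hat = {b0:g} + {b1:g}x; estimate at x = 30: {estimate:g} ($1000s)")
```

Under these assumptions the estimated quarterly sales are 60 + 5(30) = 210, that is, $210,000.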
[Figure: Estimated regression line of quarterly sales. Scatter plot of quarterly sales ($1000s) against number of students (1000s) with the fitted line f(x) = 5x + 60, R² = 0.9]

SUMMARY OUTPUT

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.950122955 |
| R Square | 0.90273363 |
| Adjusted R Square | 0.890575334 |
| Standard Error | 13.82931669 |
| Observations | 10 |

ANOVA
| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 14200 | 14200 | 74.24836601 | 2.54887E-05 |
| Residual/Error | 8 | 1530 | 191.25 | | |
| Total | 9 | 15730 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 60 | 9.22603481 | 6.503335532 | 0.000187444 | 38.72472558 | 81.27527442 |
| X Variable 1 | 5 | 0.580265238 | 8.616749156 | 2.54887E-05 | 3.661905962 | 6.338094038 |

RESIDUAL OUTPUT
| Observation | Predicted Y | Residuals |
|---|---|---|
| 1 | 70 | -12 |
| 2 | 90 | 15 |
| 3 | 100 | -12 |
| 4 | 100 | 18 |
| 5 | 120 | -3 |
| 6 | 140 | -3 |
| 7 | 160 | -3 |
| 8 | 160 | 9 |
| 9 | 170 | -21 |
| 10 | 190 | 12 |
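Most entries in the summary output can be reproduced from the formulas in this chapter. The sketch below uses plain Python on the Armand's data. The formulas for adjusted R² and for the standard error of the intercept are standard results that the slides do not state, so treat them as added assumptions. P-values and confidence limits are omitted because they require t and F distribution functions that the standard library does not provide.

```python
import math

students = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n, k = len(students), 1

x_bar = sum(students) / n
y_bar = sum(sales) / n
sxx = sum((x - x_bar) ** 2 for x in students)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(students, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in students]
residuals = [y - yh for y, yh in zip(sales, y_hat)]
sst = sum((y - y_bar) ** 2 for y in sales)
sse = sum(e ** 2 for e in residuals)
ssr = sst - sse

r_square = ssr / sst
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)  # standard formula (assumption)
mse = sse / (n - k - 1)
std_error = math.sqrt(mse)                                 # "Standard Error" in the output
f_stat = (ssr / k) / mse

se_b1 = std_error / math.sqrt(sxx)
se_b0 = std_error * math.sqrt(1 / n + x_bar ** 2 / sxx)    # standard formula (assumption)
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

print(f"R Square {r_square:.6f}, Standard Error {std_error:.4f}, F {f_stat:.4f}")
print(f"t Stat: intercept {t_b0:.4f}, slope {t_b1:.4f}")
```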
Practice Exercises
1. Data on advertising expenditures and revenue (in thousands of dollars) for the Four Seasons Restaurant are as follows:

| Advertising expenditures ($1000s) | Revenue ($1000s) |
|---|---|
| 1 | 19 |
| 2 | 32 |
| 4 | 44 |
| 6 | 40 |
| 10 | 52 |
| 14 | 53 |
| 20 | 54 |

a. Let x equal advertising expenditures and y equal revenue. Use the method of least squares to develop a straight-line approximation of the relationship between the two variables.
b. Test whether revenue and advertising expenditures are related at the 0.05 level of significance.
c. Test whether the estimated regression coefficient is significant at the 0.05 level of significance.
d. Construct a 95% confidence interval for the regression coefficient (corresponding to the 0.05 level of significance).

2. Concur Technologies, Inc., is a large expense-management company located in Redmond, Washington. The Wall Street Journal asked Concur to examine the data from 8.3 million expense reports to provide insights regarding business travel expenses. Their analysis of the data showed that New York was the most expensive city, with an average daily hotel room rate of $198 and an average amount spent on entertainment, including group meals and tickets for shows, sports, and other events, of $172. In comparison, the U.S. averages for these two categories were $89 for the room rate and $99 for entertainment. The following table shows the average daily hotel room rate and the amount spent on entertainment for a random sample of 9 of the 25 most visited U.S. cities.

| City | Room rate ($) | Entertainment ($) |
|---|---|---|
| Boston | 148 | 161 |
| Denver | 96 | 105 |
| Nashville | 91 | 101 |
| New Orleans | 110 | 142 |
| Phoenix | 90 | 100 |
| San Diego | 102 | 120 |
| San Francisco | 136 | 167 |
| San Jose | 90 | 140 |
| Tampa | 82 | 98 |

(i) Develop a scatter diagram for these data with the room rate as the independent variable.
(ii) What does the scatter diagram developed in part (i) indicate about the relationship between the two variables?
(iii) Develop the least squares estimated regression equation.
(iv) Provide an interpretation for the slope of the estimated regression equation.
(v) The average room rate in Chicago is $128, considerably higher than the U.S. average. Predict the entertainment expense per day for Chicago.
