
Business Analytics

Fall 2021

Week-04
Making Numerical Predictions
Simple Linear Regression

Question: How much? or How many?
(uses regression algorithms)

• Regression algorithms make numerical predictions, such as:
    • What will next month's car sales be?
    • What will next week's temperature be?
• They help answer any question that asks for a number.
• They predict the values of a dependent (response) variable based on the values of at least one independent (explanatory) variable.
Scatterplot
Examine possible relationships between two numerical
variables

Types of Regression Models


[Figure: four scatterplot panels]
• Positive Linear Relationship
• Negative Linear Relationship
• Relationship NOT Linear
• No Relationship

Simple Linear Regression Model
• Simple: one independent variable
• Linear: the relationship between the variables is described by a linear function
• A change in one variable is associated with a change in the other variable
• One variable depends on the other

Introduction to Linear Regression (cont.)

• Any straight line can be represented by an equation of the form Y = a + bX, where ‘a’ and ‘b’ are constants.
• The value of ‘b’ is called the slope constant and determines the direction and degree to which the line is tilted.
• The value of ‘a’ is called the Y-intercept and determines the point where the line crosses the Y-axis.

Introduction to Linear Regression (cont.)

• How well a set of data points fits a straight line can be measured by calculating the vertical distance between each data point and the line.
• The total error between the data points and the line is obtained by squaring each distance and then summing the squared values.
• Regression equation: the line that minimizes the sum of squared errors.

Error = Actual – Predicted
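
The idea of "total squared error" can be sketched in a few lines of Python. This is only an illustration: it borrows the salary data and the fitted line Y = 9.18 + 4.80 X from Example 1 in these slides, and the second line (a = 10, b = 5) is a made-up alternative used purely for comparison.

```python
# Salary example data from these slides: X = years of experience, Y = salary.
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]


def sum_squared_errors(a, b, xs, ys):
    """Sum of (actual - predicted)^2 for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Fitted line from the slides versus a hypothetical alternative line.
sse_fitted = sum_squared_errors(9.18, 4.80, xs, ys)
sse_other = sum_squared_errors(10.0, 5.0, xs, ys)

print(round(sse_fitted, 2))  # about 245.32
print(round(sse_other, 2))   # 303.0
```

The fitted line gives the smaller total, which is what "minimize the sum of squared errors" means in practice.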

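Regression Equation

Calculating a and b

In terms of the sums and averages used in the example:

$b = \dfrac{\sum XY - n\,\bar{X}\,\bar{Y}}{\sum X^2 - n\,\bar{X}^2}$

$a = \bar{Y} - b\,\bar{X}$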

Simple Linear Regression: Example 1

• Predict salary (the dependent variable, Y) based on years of experience (the independent variable, X).
Example Solution
Finding the regression equation

n = 9 (number of data points)

X     Y     X2    XY
2     15
3     28
5     42
13    64
8     50
16    90
11    58
1     8
9     54

To compute: Avg(X), Avg(Y), Sum(X2), Sum(XY)

Example Solution
Finding the regression equation

n = 9

X     Y     X2    XY
2     15    4     30
3     28    9     84
5     42    25    210
13    64    169   832
8     50    64    400
16    90    256   1440
11    58    121   638
1     8     1     8
9     54    81    486

Avg(X) = 7.56    Sum(X2) = 730
Avg(Y) = 45.44   Sum(XY) = 4128

b = 4.80
a = 45.44 – (4.80 x 7.56)
a = 9.18

Y = 9.18 + 4.80 X
Sal = 9.18 + 4.80 (Exp)
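
As a cross-check on the hand calculation, here is a minimal Python sketch. It assumes the usual least-squares formulas b = (Sum(XY) - n x Avg(X) x Avg(Y)) / (Sum(X2) - n x Avg(X)^2) and a = Avg(Y) - b x Avg(X); the variable names are illustrative only.

```python
# Salary example data: X = years of experience, Y = salary.
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]

n = len(xs)
mean_x = sum(xs) / n                             # Avg(X)
mean_y = sum(ys) / n                             # Avg(Y)
sum_x2 = sum(x * x for x in xs)                  # Sum(X2)
sum_xy = sum(x * y for x, y in zip(xs, ys))      # Sum(XY)

# Least-squares slope and intercept (assumed standard formulas).
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

print(f"Avg(X) = {mean_x:.2f}, Avg(Y) = {mean_y:.2f}")
print(f"Sum(X2) = {sum_x2}, Sum(XY) = {sum_xy}")
print(f"b = {b:.2f}, a = {a:.2f}")
```

Working with unrounded averages gives a = 9.18; plugging in the rounded values 45.44, 4.80 and 7.56 by hand gives a slightly different figure because of rounding.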
Interpretation of Results: Example
Interpreting the slope

Y = 9.18 + 4.80 X
Sal = 9.18 + 4.80 (Exp)

• The slope of 4.80 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 4.80 units.
• The equation estimates that for each additional year of experience, the expected salary increases by $4,800 (4.80 x $1,000), since salary is measured in thousands of dollars.

Making predictions

Sal = 9.18 + 4.80 (Exp)

• What is the predicted salary for someone with an experience of
    • 6 years
    • 14 years
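
A prediction is just the equation evaluated at a given X. The sketch below assumes the fitted coefficients from the slides (a = 9.18, b = 4.80); the function name `predict_salary` is made up for illustration.

```python
def predict_salary(years_experience, a=9.18, b=4.80):
    """Predicted salary from Sal = a + b * Exp (coefficients from the slides)."""
    return a + b * years_experience


for years in (6, 14):
    print(years, "years ->", round(predict_salary(years), 2))
```

With these coefficients the predictions come out to 37.98 for 6 years and 76.38 for 14 years, in the same units as the salary column.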
Example Solution
Predicted Values & Residuals (errors)

Y = 9.18 + 4.80 X

X     Y     Pred Y    Residual
2     15    18.78     -3.78
3     28    23.58     4.42
5     42    33.18     8.82
13    64    71.58     -7.58
8     50    47.58     2.42
16    90    85.97     4.03
11    58    61.98     -3.98
1     8     13.98     -5.98
9     54    52.38     1.62
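
The predicted values and residuals can be reproduced with a short loop. This is a sketch that assumes the rounded coefficients a = 9.18 and b = 4.80, with Residual = Actual - Predicted as defined earlier in the slides.

```python
# Salary example data: X = years of experience, Y = salary.
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]
a, b = 9.18, 4.80  # rounded coefficients from the slides

rows = []
for x, y in zip(xs, ys):
    predicted = a + b * x
    residual = y - predicted  # Error = Actual - Predicted
    rows.append((x, y, round(predicted, 2), round(residual, 2)))

print("X   Y   Pred Y  Residual")
for x, y, predicted, residual in rows:
    print(f"{x:<3} {y:<3} {predicted:<7} {residual}")
```

For X = 16 the rounded coefficients give 85.98 rather than the 85.97 in the table; the table value is presumably based on unrounded coefficients.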

What is the predicted salary for experience of 10 years? What is the prediction error?

Evaluating Regression Results

• The ability of the regression equation to accurately predict the Y values is measured by first computing the proportion of the Y-score variability that is predicted by the regression equation and the proportion that is not predicted.
Measures of Variation:
The Sum of Squares
(continued)

[Figure: plot of Y against X at a point Xi, marking the mean $\bar{Y}$ and the deviations that define SST, SSR, and SSE]

$SST = \sum (Y_i - \bar{Y})^2$

$SSR = \sum (\hat{Y}_i - \bar{Y})^2$

$SSE = \sum (Y_i - \hat{Y}_i)^2$
Measures of Variation:
The Sum of Squares
(continued)

• SST = Total Sum of Squares
    • Measures the variation of the $Y_i$ values around their mean, $\bar{Y}$
• SSR = Regression Sum of Squares
    • Explained variation attributable to the relationship between X and Y
• SSE = Error Sum of Squares
    • Variation attributable to factors other than the relationship between X and Y

Measures of Variation:
The Sum of Squares

SST = SSR + SSE

Total Sample Variability = Explained Variability + Unexplained Variability
Measures of Variation
The Sum of Squares: Example
Excel Output for Salary example

[Figure: Excel regression output with SSR, SSE, and SST labeled]

The Coefficient of Determination

$r^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$

• Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
Excel Output: Salary example
R-sq = SSR/SST
= 4980.907/5226.222
= 0.953
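
The Excel figures can be reproduced from the raw salary data. The sketch below assumes the standard definitions SST = sum of (Y - mean of Y)^2, SSR = sum of (predicted Y - mean of Y)^2 and SSE = sum of (Y - predicted Y)^2, and fits the line with unrounded coefficients.

```python
# Salary example data: X = years of experience, Y = salary.
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares fit (unrounded slope and intercept).
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)                  # total variation
ssr = sum((p - mean_y) ** 2 for p in predicted)           # explained variation
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))    # unexplained variation
r_squared = ssr / sst

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R-sq = {r_squared:.3f}")
```

The three sums also satisfy SST = SSR + SSE, up to floating-point rounding.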

The Coefficient of Determination
$r^2$

• Recall:
    • The coefficient of determination $r^2$ is the proportion of variability in the response variable “explained” by the regression.
• It’s another way of saying, “By introducing this other variable, how much better is my estimate than it would be if I simply used the average to make my estimate?”
The Coefficient of Determination
$r^2$

• Example: In a study of bone density as a function of body weight, an ‘r’ of 0.6 is noted, so $r^2$ = 0.36.

Interpretation 1 (full credit): “About 36% of the variability in bone density is explained by the linear regression of bone density on body weight.”

Interpretation 2 (less credit): “About 36% of the variability in bone density is accounted for by body weight.”
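
In simple linear regression, the coefficient of determination is the square of the correlation coefficient r. The bone density data are not given in these slides, so the sketch below illustrates the relationship with the salary data from Example 1 instead, and then squares the quoted r of 0.6.

```python
import math

# Salary example data: X = years of experience, Y = salary.
xs = [2, 3, 5, 13, 8, 16, 11, 1, 9]
ys = [15, 28, 42, 64, 50, 90, 58, 8, 54]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
r_squared = r ** 2               # coefficient of determination

print(f"salary example: r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"bone density example: r = 0.6, r^2 = {0.6 ** 2:.2f}")
```

For the salary data this gives the same 0.953 as SSR/SST from the Excel output.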

Inferences about the Slope: F Test

• F Test for a Population Slope
    • Is there a linear relationship between Y and X?
• A significant F-ratio (p-value < 0.05) indicates that the equation predicts a significant portion of the variability in the Y scores (more than would be expected by chance alone).
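
As a rough illustration, the F-ratio for the salary example can be computed from the sums of squares in the Excel output. The sketch assumes the usual simple-regression form F = (SSR / 1) / (SSE / (n - 2)), and the 5% critical value for 1 and 7 degrees of freedom (about 5.59) is taken from standard F tables rather than from the slides.

```python
# Sums of squares from the Excel output for the salary example.
ssr = 4980.907
sst = 5226.222
n = 9

sse = sst - ssr                  # unexplained variation
df_regression = 1                # one independent variable
df_error = n - 2                 # n minus two estimated coefficients

msr = ssr / df_regression        # mean square regression
mse = sse / df_error             # mean square error
f_statistic = msr / mse

# Assumption: tabulated 5% critical value of F with (1, 7) degrees of freedom.
F_CRITICAL_05 = 5.59
is_significant = f_statistic > F_CRITICAL_05

print(f"F = {f_statistic:.2f}, significant at 5%: {is_significant}")
```

Comparing F with the critical value is equivalent to checking whether the p-value reported by Excel is below 0.05.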

Important concepts
• Calculating ‘a’ and ‘b’ using the formulas, and writing the regression equation.
• Interpreting the slope (with + or – sign).
• Using the equation to make a prediction and calculating prediction errors (residuals).
• Reading from regression output: R-sq, SSR, SSE, SST, the F-statistic and its significance, and the slope and intercept values.
• Calculating and interpreting the r-squared value.

The Simple Linear Regression

• Example 2:
    • A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
    • A random sample is selected, and the data recorded.
    • Find the regression line.
    • Interpret the slope.

Car   Odometer (x, independent variable)   Price (y, dependent variable)
1     37388                                14636
2     44758                                14122
3     45833                                14016
4     30862                                15590
5     31705                                15568
6     34010                                14718
.     .                                    .
.     .                                    .
.     .                                    .

Scatterplot
[Figure: scatterplot of the car data, price (y) against odometer reading (x)]

Excel Output of car price example

Note: Can get R2 from ANOVA table too.


SS Regression = SSR = 2052339
SS Total = SST = 2318134

R2 = SSR/SST = 2052339 / 2318134 = 0.8853


Homework Problem: Predict annual sales based upon store area. Use both calculator & MS Excel.

Data for 7 Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

QUESTIONS
1. What is the correlation coefficient? SSR, SST, SSE?
2. What is the regression equation?
3. Interpret the slope.
4. Interpret R-sq value.
5. Interpret the F-statistic.
6. What’s the prediction error for Store 3 and 6?
7. Consider a store with 2000 square feet area. Predict Annual Sales for this case…
