Regression

Business Analytics

Business Outcomes

1

Last lecture

population mean difference

proportion

2

Business decision in action:

Identifying drivers of business outcomes

Starbucks currently owns 24,000 retail outlets in 72 countries.

When choosing a new location, Starbucks carefully examines the

profitability of each candidate location.

(1) Demographics (population,

age, income, etc.)

(2) Nearby Starbucks stores

(3) Nearby office buildings

(4) Nearby colleges

3

Regression is useful in

explaining phenomena and forecasting

– What drives defects in a factory?

– What drives profits of Starbucks stores?

– What determines employees’ pay?

– Profit, sales, etc.

4

Two types of data

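• Cross-sectional data: observations of multiple subjects at the same time (e.g., Oct sales and street traffic density of 100 Starbucks stores)

• Time series data: observations of one subject over time (e.g., sales and street traffic density over 100 months of a certain Starbucks store)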

5

Hybrid auto sales

• Study the effect of gas price (independent variable) on hybrid sales (dependent variable)

Linear regression: Finding the LINE that minimizes the mean

square vertical difference from all sample points to this line
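A minimal sketch of what "finding the line that minimizes the mean square vertical difference" means in practice. The data points are hypothetical (not the lecture's sample), and the helper names `fit_line` and `mean_square_error` are introduced here for illustration only.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def mean_square_error(xs, ys, intercept, slope):
    """Mean of the squared vertical differences between the points and the line."""
    return mean([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)])

# Hypothetical sample: gas price ($/gal) and hybrid sales (K)
gas_price = [2.0, 2.2, 2.5, 2.7, 3.0]
hybrid_sales = [17.0, 19.5, 25.0, 26.0, 32.5]

b0, b1 = fit_line(gas_price, hybrid_sales)
best = mean_square_error(gas_price, hybrid_sales, b0, b1)
print(f"intercept={b0:.2f}, slope={b1:.2f}, MSE={best:.3f}")
```

Any other line, for example one with a shifted intercept or a tilted slope, gives a larger mean square error on the same sample.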

6

Hybrid auto sales:

Linear regression

In Excel, open the Data Analysis tools and select Regression. Then specify:

1. The Input Y Range (hybrid sales)

2. The Input X Range (gas price)

3. Check the “Labels” box if appropriate.

4. Designate where the output is to go.

5. Check the “Residuals” and “Residual

Plots” boxes.

6. Click OK.

7

Hybrid auto sales:

Regression output

• Hybrid sales is the dependent variable, gas price is the independent variable, and the equation for the relationship between the two is:

hybrid sales (K) = –13.8 + 15.2 gas price ($/gal)

8

Hybrid auto sales:

Gas price drives hybrid sales

[Figure: scatter plot of hybrid sales (K) against gas price ($ per gallon)]

hybrid sales (K) = –13.8 + 15.2 gas price ($/gal)

9

Hybrid auto sales:

Predict hybrid sales using gas price

[Figure: scatter plot of hybrid sales (K) against gas price ($ per gallon)]

hybrid sales (K) = –13.8 + 15.2 gas price ($/gal)
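As a small illustration, the fitted equation can be wrapped in a function and evaluated at a few gas prices. The coefficients come from the regression output above; the function name `predict_hybrid_sales` is hypothetical.

```python
INTERCEPT = -13.8  # estimated intercept, in thousand units
SLOPE = 15.2       # estimated change in sales (K) per $1/gal change in gas price

def predict_hybrid_sales(gas_price):
    """Predicted hybrid sales (K) for a given gas price ($/gal)."""
    return INTERCEPT + SLOPE * gas_price

for price in (2.00, 2.50, 3.00):
    print(f"${price:.2f}/gal -> {predict_hybrid_sales(price):.1f}K")
```

At $3.00 per gallon the equation predicts 31.8K units. Predictions far outside the observed price range (roughly $2 to $3 per gallon in the plot) should be treated with caution.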

10

Hybrid auto sales:

Residuals

[Figure: scatter plot of hybrid sales (K) against gas price ($ per gallon)]

Residual: differences between actual value & prediction
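A short sketch of the residual calculation, using the fitted coefficients from the regression output and three hypothetical observations (the actual sample is not reproduced in these notes).

```python
def residuals(xs, ys, intercept, slope):
    """Actual value minus predicted value for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical observations: gas price ($/gal) and actual hybrid sales (K)
gas_price = [2.0, 2.5, 3.0]
hybrid_sales = [18.0, 22.0, 33.0]

res = residuals(gas_price, hybrid_sales, intercept=-13.8, slope=15.2)
print([round(r, 1) for r in res])
```

A positive residual means the actual sales were above the line; a negative residual means they were below it.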

11

What exactly did we do?

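• We assumed that gas price drives hybrid sales in a linear way:

hybrid sales = 𝛽0 + 𝛽1 ∗ gas price + 𝜀

where random variable 𝜀 represents the impact of all other factors

• We estimated the best-fit (minimum mean square error, MMSE) line based on a sample

– The estimated intercept and coefficient are random variables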

12

What else can we do?

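• If 𝜀 is Normally distributed, independent of the driver, and is independent over time, then we can use the sample to test hypotheses about 𝛽1, build a confidence interval for 𝛽1, and construct a prediction interval for the dependent variable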

13

Hypothesis test and CI for 𝜷𝟏

H0: 𝛽1 = 0 vs. H1: 𝛽1 ≠ 0

(meaning this is a two-tail p-value)

(we can also specify any significance level)

14

Hypothesis test and CI for 𝜷𝟏

• If we reject 𝛽1 = 0: We conclude that gas price drives hybrid sales (driver is significant)

• If we cannot reject 𝛽1 = 0: We cannot conclude that gas price drives hybrid sales (driver is insignificant)

• (Leave 𝛽0 alone!)

• CI for 𝛽1: at the chosen confidence level, a $1 increase in gas price will cause monthly hybrid sales to increase between 8.7k and 21.7k units
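For reference, the test statistic and the confidence interval for the slope are usually written as follows. The symbols $b_1$ (the estimated slope) and $SE(b_1)$ (its standard error) are notation introduced here, not taken from the slides.

```latex
t = \frac{b_1}{SE(b_1)}, \qquad
b_1 \pm t_{\alpha/2,\; N-k-1} \cdot SE(b_1)
```

The interval quoted above is consistent with this form: it is centered on the estimated slope, $(8.7 + 21.7)/2 = 15.2$, with a half-width of $6.5$.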

15

Constructing prediction

interval for the dependent variable

• With confidence 1 − 𝛼, the following interval contains the actual value of 𝑦 (for a given 𝑥):

ŷ ± T.INV.2T(𝛼, 𝑁 − 𝑘 − 1) ∗ SE

where 𝑘 is the number of independent variables (= 1 for simple regression) and SE is the standard error of the prediction
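A minimal sketch of the interval arithmetic. The critical value is passed in rather than computed, because the Python standard library has no t-distribution quantile; in Excel it would come from T.INV.2T. The sample size (N = 12) and the standard error (4.0) below are hypothetical, and 2.228 is the standard two-tailed t value for 𝛼 = .05 with 10 degrees of freedom.

```python
def prediction_interval(y_hat, t_crit, se):
    """Return (lower, upper) bounds: y_hat plus or minus t_crit * se."""
    margin = t_crit * se
    return y_hat - margin, y_hat + margin

# Hypothetical inputs: N = 12 observations, k = 1 driver -> 10 degrees of freedom
y_hat = 31.8    # predicted hybrid sales (K) at $3.00/gal from the fitted equation
t_crit = 2.228  # two-tailed critical t value for alpha = .05, df = 10
se = 4.0        # assumed standard error of the prediction (K)

low, high = prediction_interval(y_hat, t_crit, se)
print(f"{low:.1f}K to {high:.1f}K")
```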

16

R Square and Standard Error

1. The meaning of R Square – the proportion of variation in

the dependent variable (y) explained by the variation in the

independent variable (x)

2. Standard Error (of the estimate) – the standard deviation of

the residuals of the regression, useful for constructing CI

3. Observations – the number of pairs of data, (x, y) used to

run this linear regression
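The three summary figures can be computed directly from a sample, as sketched here. The function name `regression_summary` and the data are hypothetical, and the standard error uses the usual N − k − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean

def regression_summary(xs, ys, k=1):
    """R Square, standard error of the estimate, and observation count
    for a simple least-squares regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Unexplained variation (sum of squared residuals) and total variation
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return {
        "r_square": 1 - sse / sst,
        "standard_error": sqrt(sse / (n - k - 1)),
        "observations": n,
    }

# Hypothetical sample
summary = regression_summary([1, 2, 3], [1, 3, 2])
print(summary)
```

For this toy sample, R Square is 0.25: the line explains a quarter of the variation in y.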

17

“ANOVA” –

Analysis of Variance

• The F test: H0: all slopes are zero (R Square = 0) vs. H1: at least one slope is non-zero (R Square > 0). In simple linear regressions, Significance F = driver p-value
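For reference, the standard form of the F statistic in the ANOVA table is sketched here. SSR (variation explained by the regression), SSE (residual variation), and $t$ (the slope's t statistic) are notation introduced here, not taken from the slides.

```latex
F = \frac{SSR / k}{SSE / (N - k - 1)}, \qquad
R^2 = \frac{SSR}{SSR + SSE}
```

With a single driver ($k = 1$), $F = t^2$, which is why Significance F matches the driver's two-tail p-value in a simple linear regression.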

18

Regression equation in

standard format

Callouts on the example equation (dependent variable: hybrid sales):

– Mark the prediction

– Specify units

– Specify R Square (e.g., R Square: .5∗)

– Specify significance of each driver and of the model as a whole (R2)

∗ Significant at .05

2. Use multiple significance levels if necessary, including insignificant

19

How it’s done in real life

20

Regression and correlation

$$r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

• In simple linear regressions, R Square = $r_{xy}^2$

• Correlation ≠ regression coefficient

– Correlation measures how much two values tend to

move together; bounded by [-1,1]

– Regression coefficient measures how much an

outcome changes with a unit change of a driver;

unbounded
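A brief sketch of the difference, using hypothetical data. Rescaling the outcome (say, reporting sales in units instead of thousands) changes the regression coefficient but leaves the correlation unchanged.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Sample correlation coefficient r_xy; always within [-1, 1]."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares regression coefficient of ys on xs; unbounded."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data; the second outcome is the first multiplied by 1000
xs = [1, 2, 3]
ys = [10, 30, 20]
ys_rescaled = [y * 1000 for y in ys]

print(correlation(xs, ys), slope(xs, ys))
print(correlation(xs, ys_rescaled), slope(xs, ys_rescaled))
```

Here the correlation is 0.5 in both cases, while the slope moves from 5 to 5000.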

21

Correlation ≠ causation

pandemic

• B may cause A: Flu pandemic correlated with cold

weather

• C may cause A and B: Coke sales correlated with

gas price

• Mixture of all: Poor education correlated with

poverty

• Use intuition to build linear regression models; use

caution when interpreting results

22

Confirming assumptions

on residuals

• We assumed that 𝜀 is Normally distributed, independent of the driver, and is independent over time. Is it so?

[Figure: residuals plotted against gas price ($ per gallon), alongside the distribution of the residuals compared with the percentages expected if Normal. Skew: 1.2]
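One way to check the skew of a set of residuals is sketched here. It assumes the reported skew uses the bias-adjusted sample skewness (the definition behind Excel's SKEW function); that is an assumption, since the slides do not say how the figure was computed. The residual values are hypothetical.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum(((v - mean) / s)^3)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)

# Hypothetical residuals: mostly small negatives with one large positive value
skewed = [-1.0, -1.0, -1.0, -1.0, 4.0]
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]

print(sample_skewness(skewed))     # positive: long right tail
print(sample_skewness(symmetric))  # zero: symmetric around the mean
```

A skew near zero is what a Normal distribution would produce; a clearly positive value signals a long right tail in the residuals.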

23

Linear regression is like

fitting a watermelon in a box…

even the best fit is a bad one

24

For next class

25

