
Marketing Analytics

Fall 2022
MKTG 414
Vahid Karimi Motahhar
Vahid.karimimotahhar@sabanciuniv.edu
Week 3
Linear Regression and Correlation
Learning Objectives

• Explain what a statistical model is and why it is important for marketing

• Understand, run and interpret results of simple marketing regression models

• Describe the importance of R-square in a marketing mix model

• Practice analytics with marketing mix model output

• Understand, run and interpret results of multiple marketing mix regression
models

• Understand Correlation
Monday, October 17, 2022 Vahid Karimi Motahhar 2
Simple Linear Regression

• Least squares line

• Interpreting coefficients

• Prediction

• Cautions

• The formal model



TV Ads and Sales

• Can you estimate Lagged Sales just by knowing how much Red Bull spent on
TV ads?

• We will fit a model to predict sales based on dollars spent on TV ads



Linear Model

• A linear model predicts a response variable, y, using a linear function
of explanatory variables

• Simple linear regression predicts one response variable, y, as a linear
function of one explanatory variable, x

• We will create a model that predicts sales as a linear function of TV ads



Regression Line

• Goal: Find a straight line that best fits the data in a scatterplot



Predicted and Actual Values

• The actual response value, y, is the response value observed for a particular
data point
• The predicted response value, ŷ, is the response value that would be
predicted for a given x value, based on a model
• In linear regression, the predicted values fall on the regression line directly
above each x value
• The best fitting line is that which makes the predicted values closest to the
actual values


Residual

• The residual for each data point is actual - predicted = y - ŷ, that is,
the vertical distance from the point to the line

• We want to make all the residuals as small as possible. How would you
measure this?
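
A minimal sketch of how residuals could be computed for one candidate line. The data points and the candidate line (intercept 50, slope 1.5) are hypothetical, chosen only for illustration; they are not taken from the Red Bull data.

```python
# Hypothetical data (not from the slides): TV ad spend and observed sales.
tv_ads = [100, 200, 300, 400]
sales = [210.0, 340.0, 510.0, 640.0]

# Hypothetical candidate line: predicted sales = 50 + 1.5 * tv_ads
INTERCEPT = 50.0
SLOPE = 1.5


def predict(x):
    """Predicted response (y-hat) for a given x under the candidate line."""
    return INTERCEPT + SLOPE * x


def residuals(xs, ys):
    """Residual = actual - predicted, one value per data point."""
    return [y - predict(x) for x, y in zip(xs, ys)]


res = residuals(tv_ads, sales)
for x, y, r in zip(tv_ads, sales, res):
    print(f"x={x}  actual={y}  predicted={predict(x)}  residual={r:+.1f}")

# Positive and negative residuals cancel in a plain sum,
# so the plain sum is a poor measure of overall fit.
print("sum of residuals:", sum(res))
print("sum of squared residuals:", sum(r * r for r in res))
```

In this made-up example the residuals are +10, -10, +10, -10: they add up to 0 even though no point lies on the line, while the squared residuals add up to 400.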



Least Squares Regression

• Least squares regression chooses the regression line that minimizes the
sum of the squared residuals

• Why do you think this is SQUARED?
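
One way to see the "minimizes" claim in action is to fit a line and then check that nudging it makes the sum of squared residuals worse. The sketch below uses the standard closed-form least squares solution for one explanatory variable; the data and the helper names (`fit_line`, `sum_squared_residuals`) are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (actual - predicted)^2 for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (not from the slides).
tv_ads = [100, 200, 300, 400, 500]
sales = [160.0, 290.0, 400.0, 540.0, 650.0]

b0, b1 = fit_line(tv_ads, sales)
best = sum_squared_residuals(tv_ads, sales, b0, b1)
print(f"intercept={b0:.2f}  slope={b1:.2f}  SSR={best:.2f}")

# Any nudged line has a larger sum of squared residuals than the fitted one.
for d_int, d_slope in [(5, 0), (-5, 0), (0, 0.05), (0, -0.05)]:
    other = sum_squared_residuals(tv_ads, sales, b0 + d_int, b1 + d_slope)
    print(f"shift intercept by {d_int}, slope by {d_slope}: SSR={other:.2f}")
```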



Equation of the Line

• The estimated regression line is ŷ = intercept + slope × x

• Slope: increase in predicted y for every unit increase in x

• Intercept: predicted y value when x = 0
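
For reference, the standard least squares formulas behind the slope and intercept can be written as follows. The symbols b_0 (intercept), b_1 (slope), and the bars for sample means are notation assumed here, not taken from the slides.

```latex
\hat{y} = b_0 + b_1 x,
\qquad
b_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second formula implies that the fitted line always passes through the point of averages (x̄, ȳ).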



Regression Results

• Results from regressing LaggedSales on TV

• Coefficients: (Intercept) 37.6786, (TVad) 1.2307

• LaggedSales = 37.69 + 1.23 × TV



Follow up question

• LaggedSales = 37.69 + 1.23 × TVads - Which of the following is the correct
interpretation?

A. The average sale is $37.69

B. For every extra $1.23 spent on TV ads, the predicted sales increase by $1

C. Predicted sales increase by $1.23 for each extra dollar spent on TV ads

D. For every extra $1.23 spent on TV ads, the predicted sales increase by
$37.69



Prediction
• The regression equation can be used to predict y
for a given value of x

• LaggedSales = 37.69 + 1.23 × TVads

• If you spend $1000 on TV ads, your best guess at sales is:

• LaggedSales = 37.69 + 1.23 × 1000 = $1,267.69
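
The same calculation can be written as a small function. This is only a sketch: the coefficients are the rounded values from the slide, and the function name `predict_lagged_sales` is made up for illustration.

```python
# Rounded coefficients from the slide: LaggedSales = 37.69 + 1.23 * TVads
INTERCEPT = 37.69
SLOPE = 1.23


def predict_lagged_sales(tv_ads):
    """Best guess at lagged sales for a given TV ad spend, per the fitted line."""
    return INTERCEPT + SLOPE * tv_ads


print(f"${predict_lagged_sales(1000):,.2f}")  # prints $1,267.69
```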



Sales = 1989.69 + 1.63 × TV ads

• If you spend $1847 on TV advertising, which of the following is your
best guess at sales?

A. Closer to $3000

B. Closer to $4000

C. Closer to $5000

D. Closer to $6000



Regression Caution

• Do not use the regression equation or line to predict outside the range
of x values available in your data (do not extrapolate!)

• If none of the x values are anywhere near 0, then the intercept is
meaningless!
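
One possible way to enforce the "do not extrapolate" rule in code is to have the prediction function refuse x values outside the observed range. In this sketch the observed TV ad spend values and the helper name `make_guarded_predictor` are hypothetical; only the coefficients come from the slides.

```python
def make_guarded_predictor(observed_x, intercept, slope):
    """Build a predict(x) function that refuses to extrapolate."""
    lowest, highest = min(observed_x), max(observed_x)

    def predict(x):
        if not lowest <= x <= highest:
            raise ValueError(
                f"x={x} is outside the observed range [{lowest}, {highest}]"
            )
        return intercept + slope * x

    return predict


# Hypothetical TV ad spend values seen in the data (assumed for illustration).
observed_tv_ads = [400, 650, 900, 1200, 1500]
predict = make_guarded_predictor(observed_tv_ads, intercept=37.69, slope=1.23)

print(predict(1000))  # inside the observed range, so a prediction is returned

try:
    predict(0)  # no data near 0, so the intercept is not a usable prediction
except ValueError as err:
    print("refused:", err)
```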



R²

• R² is the proportion of the variability in Y that is explained by the
model

The p-value for each term tests the null hypothesis that the coefficient is
equal to zero (no effect). A low p-value (< 0.05) indicates that you can
reject the null hypothesis. In other words, a predictor that has a low
p-value is likely to be a meaningful addition to your model because changes
in the predictor's value are related to changes in the response variable.

Conversely, a larger (insignificant) p-value suggests that changes in the
predictor are not associated with changes in the response.
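
The definition of R² as "proportion of variability explained" can be sketched as 1 minus the ratio of unexplained variation (sum of squared residuals) to total variation in y. The data and the helper names below are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    intercept, slope = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data (not from the slides).
tv_ads = [100, 200, 300, 400, 500]
sales = [160.0, 290.0, 400.0, 540.0, 650.0]

r2 = r_squared(tv_ads, sales)
print(f"R^2 = {r2:.4f}")
```

A value close to 1 means the line accounts for nearly all of the variation in y; a value close to 0 means it accounts for almost none.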



Significance (P-Value)

• Another piece of information that is important for interpreting results of a
least squares regression is the P-Value.

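• The P-Value indicates how strong the evidence is that the true coefficient
differs from 0: it is the probability of seeing an estimate at least this far
from 0 if the true coefficient were actually 0.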

• The P-Value is a number between 0 and 1.

• If the P-Value is less than or equal to 0.05, which is a commonly accepted
threshold, then it is acceptable to say the estimate is significantly different
from 0, or just significant.
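
To build intuition for what a P-Value measures, the sketch below estimates one by simulation. Note the swap in technique: regression software normally reports P-Values from a t-test, whereas this sketch uses a permutation test, which shuffles y to mimic "no relationship" and counts how often a slope at least as extreme as the observed one appears by chance. The data, function names, and the choice of 2000 shuffles are all assumptions for illustration.

```python
import random


def slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Count the observed arrangement itself so the estimate is never 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical data with a strong linear relationship.
x_strong = [1, 2, 3, 4, 5, 6, 7, 8]
y_strong = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1]

# Hypothetical data whose fitted slope is exactly 0.
x_flat = [1, 2, 3, 4, 5]
y_flat = [3, 1, 2, 1, 3]

p_strong = permutation_p_value(x_strong, y_strong)
p_flat = permutation_p_value(x_flat, y_flat)
print("strong relationship:", p_strong)
print("no linear relationship:", p_flat)
```

The strongly related data should give a P-Value well below 0.05, while the flat data gives a P-Value of 1 because every shuffle produces a slope at least as far from 0 as the observed slope of 0.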



Using Correlations to Summarize Linear Relationships
• Looking at the correlation between any pair of variables can provide
insights into how multiple variables move up and down in value together.

• Correlation measures linear association, not causation.

• The correlation (usually denoted by r) between two variables (call them x
and y) is a unit-free measure of the strength of the linear relationship
between x and y

• r is always between -1 and +1
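
A minimal sketch of the Pearson correlation coefficient, written out by hand so the pieces are visible. The data are hypothetical; the last check illustrates "unit-free" by converting x from dollars to thousands of dollars.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data (not from the slides).
tv_ads = [100, 200, 300, 400, 500]
sales = [160.0, 290.0, 400.0, 540.0, 650.0]

r_dollars = correlation(tv_ads, sales)
r_thousands = correlation([x / 1000 for x in tv_ads], sales)

print("r with x in dollars:", round(r_dollars, 4))
print("r with x in thousands of dollars:", round(r_thousands, 4))
print("perfect negative line:", correlation([1, 2, 3], [6.0, 4.0, 2.0]))
```

Changing the units of x rescales the slope of a regression line but leaves r unchanged.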


Interpreting Correlations: +1

• A correlation near +1 means that x and y have a strong positive linear
relationship

• That is, when x is larger than average, y is almost always larger than
average, and when x is smaller than average, y is almost always smaller than
average



Interpreting Correlations: -1

• If x and y have a correlation near -1, this means that there is a strong
negative linear association between x and y

• That is, when x is larger than average, y is usually smaller than average,
and when x is smaller than average, y is usually larger than average



Relationship between Correlation and R²

• The correlation between two sets of data is simply the square root of
R² with the same sign as the sign of the slope in the simple linear
regression
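
This relationship can be checked numerically. The sketch below uses hypothetical data with a negative slope, computes r directly and R² from the regression residuals, and compares r with the signed square root of R². All names and numbers are assumptions for illustration.

```python
from math import copysign, sqrt

# Hypothetical data with a downward trend (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [10.0, 8.0, 7.0, 4.0, 1.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Simple linear regression of y on x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# R^2 from the residuals of that regression.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / syy

# Correlation computed directly, and rebuilt from R^2 and the slope's sign.
r_direct = sxy / sqrt(sxx * syy)
r_from_r2 = copysign(sqrt(r_squared), slope)

print(f"slope={slope:.2f}  R^2={r_squared:.4f}")
print(f"r computed directly:  {r_direct:.4f}")
print(f"r rebuilt from R^2:   {r_from_r2:.4f}")
```

The two values of r agree, and both are negative because the slope is negative.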



Thank you for your attention!
Vahid.karimimotahhar@sabanciuniv.edu
1015 SBS
+90 216 438 97 12
