
Chapter 6

Correlation vs Causality in Linear Regression Analysis


© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education
Learning Objectives

1. Differentiate between correlation and causality, in general and in the regression environment
2. Calculate partial and semi-partial correlation
3. Execute inference for correlational regression analysis
4. Execute passive prediction using regression analysis
5. Execute inference for determining functions
6. Execute active prediction using regression analysis
7. Distinguish the relevance of model fit between active and passive prediction
The Difference Between Correlation and Causality

Yi = fi(X1i, X2i, …, XKi) + Ui


• We define fi(X1i, X2i, …, XKi) as the determining function, since it comprises the part of the outcome that we can explicitly determine
• Ui can only be inferred by computing Yi – fi(X1i, X2i, …, XKi)
• The data-generating process serves as a framework for modeling causality:
1. The reasoning established to measure an average treatment effect using sample means maps easily to this framework
2. It extends easily to modeling causality for multi-level treatments and multiple treatments
The Difference Between Correlation and Causality

• A causal relationship between two variables clearly implies co-movement
• If X causally impacts Y, then when X changes, we expect a change in Y
• However, variables often move together even when there is no causal relationship between them
• For example, consider the heights of two different children, ages 5 and 10. Since both children are growing at these ages, their heights will generally move together. This co-movement is not due to causality – an increase in one child's height does not cause a change in the other's



The Difference Between Correlation and Causality

• The co-movement between two variables in a dataset is measured through the sample covariance or correlation:

Covariance: sCov(X,Y) = Σi (Xi – X̄)(Yi – Ȳ) / (n – 1)

Correlation: sCorr(X,Y) = sCov(X,Y) / (sX × sY)
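
A minimal sketch of these two formulas in Python, using made-up numbers (the values below are illustrative assumptions, not data from the chapter):

```python
import numpy as np

# Hypothetical paired observations (illustrative values, not chapter data).
x = np.array([1.2, 0.8, 1.5, 1.1, 0.9])
y = np.array([210.0, 260.0, 180.0, 220.0, 250.0])

n = len(x)
s_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # sample covariance
s_corr = s_cov / (x.std(ddof=1) * y.std(ddof=1))           # sample correlation

# Cross-check against NumPy's built-ins.
print(s_cov, np.cov(x, y)[0, 1])
print(s_corr, np.corrcoef(x, y)[0, 1])
```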



The Difference Between Correlation and Causality

• When there are more than two variables, e.g., Y, X1, X2, we can
also measure partial correlation between two of the variables
• Partial correlation between two variables is their correlation
after holding one or more other variables fixed
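
One standard way to compute a partial correlation is to residualize both variables on the control variable and correlate the residuals. Below is a sketch under that approach, on simulated stand-in data (the variable names and coefficients are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def residuals(v, w):
    """Residuals from an OLS regression of v on w (with an intercept)."""
    W = np.column_stack([np.ones_like(w), w])
    beta = np.linalg.lstsq(W, v, rcond=None)[0]
    return v - W @ beta

# Partial correlation of y and x1, holding x2 fixed.
r_partial = np.corrcoef(residuals(y, x2), residuals(x1, x2))[0, 1]
print(r_partial)
```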



The Difference Between Correlation and Causality

• Causality implies that a change in one variable or variables causes a change in another
• Data analysis attempting to measure causality generally involves an attempt to measure the determining function within the data-generating process
• Correlation implies that variables move together
• Data analysis attempting to measure correlation is not concerned with the data-generating process and determining function; it uses standard statistical formulas (sample correlation, partial correlation) to assess how variables move together
Regression Analysis for Correlation

• The dataset is a cross-section of 230 grocery stores

AvgPrice = Average price at that grocery store
AvgHHSize = Average size of households of customers at that grocery store



Regression Analysis for Correlation

Sales = b + m1AvgPrice + m2AvgHHSize

Solving for b, m1, m2:
Sales = 1591.54 – 181.66 × AvgPrice + 128.09 × AvgHHSize

• This equation provides information about how the variables in the equation are correlated within our sample
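
A sketch of how these coefficients could be solved by least squares in Python; the file name grocery.csv is hypothetical, since the chapter's dataset is not reproduced here:

```python
import numpy as np
import pandas as pd

# Hypothetical file with columns Sales, AvgPrice, AvgHHSize.
df = pd.read_csv("grocery.csv")

# Design matrix with an intercept column; solve the least-squares problem.
X = np.column_stack([np.ones(len(df)), df["AvgPrice"], df["AvgHHSize"]])
b, m1, m2 = np.linalg.lstsq(X, df["Sales"].to_numpy(), rcond=None)[0]

print(b, m1, m2)  # the chapter reports 1591.54, -181.66, 128.09
```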



Different Ways to Measure Correlation Between Two Variables
• Unconditional correlation is the standard measure of correlation between two variables X and Y:

Corr(X,Y) = Cov(X,Y) / (SX × SY)

SX = sample standard deviation of X
SY = sample standard deviation of Y

• Partial correlation between X and Y is a measure of the relationship between these two variables, holding at least one other variable fixed
• Semi-partial correlation between X and Y is a measure of the relationship between these two variables, holding at least one other variable fixed for only X or only Y
Regression Analysis for Correlation

• For the general regression equation Y = b + m1X1 + … + mKXK, the solutions for m1 through mK when solving the sample moment equations are proportional to the partial and semi-partial correlations between Y and the respective Xs
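
A simulation sketch of this proportionality (variable names and coefficients are assumptions): by the Frisch-Waugh-Lovell logic, the multiple-regression slope on X1 equals the semi-partial correlation between Y and the residualized X1, rescaled by the ratio of their standard deviations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Multiple-regression slope on x1.
X = np.column_stack([np.ones(n), x1, x2])
_, m1, _ = np.linalg.lstsq(X, y, rcond=None)[0]

# Residualize x1 on x2 (and an intercept).
Z = np.column_stack([np.ones(n), x2])
x1_tilde = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Semi-partial correlation between y and the residualized x1.
sr = np.corrcoef(y, x1_tilde)[0, 1]
print(m1, sr * np.std(y, ddof=1) / np.std(x1_tilde, ddof=1))  # equal up to rounding
```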



Regression and Population Correlation

• Suppose we have the data for the entire population underlying our grocery store sample; then we have:
Sales = B + M1AvgPrice + M2AvgHHSize
• Capital letters are used to indicate that these are the intercept
and slopes for the population, rather than the sample
• Solve for B, M1, and M2 by solving the sample moment
equations using the entire population of data



Regression and Population Correlation

• We do not have data for the entire population, only a sample dataset from that population, whose regression line is:
Sales = b + m1AvgPrice + m2AvgHHSize
• Solve for b, m1, and m2
• The intercept and slope(s) of the regression equation
describing a sample are estimators for the intercept and
slope(s) of the corresponding regression equation describing
the population.



Regression and Population Correlation

• A consistent estimator is an estimator whose realized value gets close to its corresponding population parameter as the sample size gets large
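
A simulation sketch of consistency, assuming a population slope of 2.0 (an illustrative value): the slope estimate from larger and larger random samples settles near the population value.

```python
import numpy as np

rng = np.random.default_rng(2)

def slope_estimate(n):
    """OLS slope from one random sample of size n drawn from the population model."""
    x = rng.normal(size=n)
    y = 5.0 + 2.0 * x + rng.normal(scale=3.0, size=n)
    X = np.column_stack([np.ones(n), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

for n in (10, 100, 10_000):
    print(n, slope_estimate(n))  # estimates settle near the population slope 2.0
```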



Regression Line for Full Population



Regression Lines for Three Samples of Size 10



Regression Lines for Three Samples of Size 30



Confidence Interval and Hypothesis Testing for the Population Parameters

• In order to conduct hypothesis tests or build confidence intervals for the population parameters of a regression equation, we need to know the distribution of the estimators
• Each estimator becomes very close to its corresponding population parameter for a large sample
• For a large sample, these estimators are normally distributed



Confidence Interval and Hypothesis Testing for the Population Parameters

• A large random sample implies that:
b ~ N(B, σb)
m1 ~ N(M1, σm1)
…
mK ~ N(MK, σmK)
• If we write each element in the population as:
Yi = B + M1X1i + … + MKXKi + Ei
where Ei is the residual, then Var(Y|X) is equal to Var(E|X)
• A common assumption is that this variance is constant across all values of X, so
Var(Y|X) = Var(E|X) = Var(E) = σ²
• This constancy of variance is called homoscedasticity
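
A sketch of this inference using statsmodels on simulated stand-in data (the coefficients and noise level are assumptions roughly mimicking the grocery example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 230  # matches the chapter's sample of 230 stores
avg_price = rng.uniform(1.0, 3.0, size=n)
avg_hh = rng.uniform(1.5, 4.5, size=n)
sales = 1600 - 180 * avg_price + 130 * avg_hh + rng.normal(scale=50, size=n)

X = sm.add_constant(np.column_stack([avg_price, avg_hh]))
fit = sm.OLS(sales, X).fit()

print(fit.summary())       # t-statistics test H0: Mk = 0 for each coefficient
print(fit.conf_int(0.05))  # 95% confidence intervals for B, M1, M2
```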
Prediction Using Regression

• Sales = 1591.54 – 181.66 × AvgPrice + 128.09 × AvgHHSize
• If Store A has an average price $0.50 higher than Store B, and Store A has an average household size that is 0.40 less than Store B, then:
Predicted difference = –181.66 × 0.50 + 128.09 × (–0.40) = –142.07
• We predict Store A has about 142 fewer sales than Store B
• When using correlational regression analysis to make predictions, we must be considering a population that spans time, and we assume that the population regression equation best describes the future population



Regression and Causation

• The data-generating process of an outcome Y can be written as:
Yi = fi(X1i, X2i, …, XKi) + Ui
• We assume the determining function can be written as:
fi(X1i, X2i, …, XKi) = α + β1X1i + β2X2i + … + βKXKi
• Combining these assumptions, the data-generating process can be written as:
Yi = α + β1X1i + β2X2i + … + βKXKi + Ui
• The error term represents unobserved factors that determine the outcome



Regression and Causation

• Yi = B + M1X1i + … + MKXKi + Ei (correlation model)
• Yi = α + β1X1i + … + βKXKi + Ui (causality model)
• The correlation model's residuals (Ei) have a mean of zero and are uncorrelated with each of the Xs. For this model, we simply plot all the data points in the population and write each observation in terms of the equation that best describes these points
• For the causality model, the data-generating process is the process that actually generates the data we observe, and the determining function need not be the equation that best describes the data
The Difference Between the Correlation Model and the Causality Model: An Example

Consider these data for Y, X, and U as the entire population. These data were generated using the data-generating process Yi = 5 + 3.2Xi + Ui, meaning we have the determining function f(X) = 5 + 3.2X.
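
A simulation sketch of how the regression line and the determining function can differ; here U is deliberately made correlated with X (an assumption for illustration, since the chapter's data table is not reproduced), so the population regression slope departs from the causal 3.2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000  # large n as a stand-in for "the entire population"
x = rng.normal(size=n)
u = 1.5 * x + rng.normal(size=n)  # U correlated with X (assumed for illustration)
y = 5.0 + 3.2 * x + u             # the determining function plus the error

X = np.column_stack([np.ones(n), x])
b, m = np.linalg.lstsq(X, y, rcond=None)[0]
print(b, m)  # roughly 5.0 and 4.7: the best-fit slope exceeds the causal 3.2
```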


Scatterplot, Regression Line, and Determining Function of X and Y

In this figure, we plot Y and X along with the determining function (blue line) and the population regression equation (red line).



Regression and Causation

• The correlation model describes the data best but need not coincide with the causal mechanism generating the data
• The causality model provides the causal mechanism but need not describe the data best



The Relevance of Model Fit for Passive and Active Prediction

• Total sum of squares (TSS): the sum of the squared differences between each observation of Y and the average value of Y
TSS = Σi (Yi – Ȳ)²
• Sum of squared residuals (SSRes): the sum of the squared residuals
SSRes = Σi (Yi – Ŷi)²
• R-squared: the fraction of the total variance in Y that can be attributed to variation in the Xs
R² = 1 – SSRes/TSS
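
A sketch of these three quantities computed by hand on simulated data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Fit the regression and form predicted values.
X = np.column_stack([np.ones(n), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
ss_res = np.sum((y - y_hat) ** 2)   # sum of squared residuals
print(1 - ss_res / tss)             # R-squared
```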
The Relevance of Model Fit for Passive and Active Prediction

• A high R-squared implies a good fit, meaning the points on the regression equation tend to be close to the actual Y values
• R-squared for passive prediction (correlation): finding a high R-squared implies the prediction is close to reality
• R-squared for active prediction (causality): R-squared is not a primary consideration when evaluating predictions

