Chapter 6. Correlation vs. Causality in Regression Analysis
• Measurement of the co-movement between two variables in a
dataset is captured through sample covariance or correlation:
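Covariance: $sCov(X,Y) = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$
Correlation: $sCorr(X,Y) = \frac{sCov(X,Y)}{s_X \, s_Y}$, where $s_X$ and $s_Y$ are the sample standard deviations of X and Y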
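As a small illustration of these two measures, the sketch below computes them by hand in plain Python. It assumes the usual sample definitions (dividing by N - 1); the function names and the five data points are invented for illustration and do not come from the grocery store data.

```python
from math import sqrt

def sample_cov(x, y):
    """Sample covariance of two equal-length lists (divides by N - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance scaled by both sample standard deviations."""
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical data, invented for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_cov(x, y)
corr_xy = sample_corr(x, y)
print(cov_xy, corr_xy)
```

For these made-up numbers the covariance is 1.5 and the correlation is about 0.775. Unlike covariance, correlation is unit-free and always lies between -1 and 1.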
• When there are more than two variables, e.g., Y, X1, X2, we can
also measure partial correlation between two of the variables
• Partial correlation between two variables is their correlation
after holding one or more other variables fixed
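One common way to compute a partial correlation, sketched here under the assumption of a single control variable, is to regress each of the two variables on the control and then correlate the residuals. The helper names and the data are hypothetical.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def corr(a, b):
    """Sample correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)

def residuals(target, control):
    """Residuals from a least-squares regression of target on control (with intercept)."""
    c_bar, t_bar = mean(control), mean(target)
    slope = (sum((c - c_bar) * (t - t_bar) for c, t in zip(control, target))
             / sum((c - c_bar) ** 2 for c in control))
    intercept = t_bar - slope * c_bar
    return [t - (intercept + slope * c) for c, t in zip(control, target)]

def partial_corr(y, x1, x2):
    """Correlation between y and x1 after holding x2 fixed."""
    return corr(residuals(y, x2), residuals(x1, x2))

# Hypothetical data, invented for illustration
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0, 2.0, 6.0, 5.0, 9.0, 9.0]

simple = corr(y, x1)               # ignores x2
partial = partial_corr(y, x1, x2)  # holds x2 fixed
print(simple, partial)
```

Comparing the two printed values shows how much of the raw co-movement between y and x1 remains once the part explained by x2 is removed.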
• Suppose our grocery store data covered the entire population.
Then we have:
Sales = B + M1·AvgPrice + M2·AvgHHSize
• Capital letters are used to indicate that these are the intercept
and slopes for the population, rather than the sample
• Solve for B, M1, and M2 by solving the sample moment
equations using the entire population of data
• In practice we do not have data for the entire population, only
a sample drawn from it, whose regression line is:
Sales = b + m1·AvgPrice + m2·AvgHHSize
• Solve for b, m1 and m2
• The intercept and slope(s) of the regression equation
describing a sample are estimators for the intercept and
slope(s) of the corresponding regression equation describing
the population.
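The estimator idea can be sketched with a simulation. The code below builds a hypothetical population (the coefficients 100, -8, and 5, the noise level, and the variable ranges are all invented), solves the moment equations on the full population to get B, M1, and M2, and then solves them again on a random sample to get b, m1, and m2. The moment equations are written here as normal equations and solved with a small Gauss-Jordan routine, which is one possible approach rather than the method the text prescribes.

```python
import random

def solve_linear(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_regression(y, x1, x2):
    """Return (intercept, slope1, slope2) from the moment equations:
    residuals sum to zero and have zero cross-product with x1 and x2."""
    rows = [[1.0, a, c] for a, c in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)

random.seed(0)

# Hypothetical population; every number here is invented for illustration
N = 5000
avg_price = [random.uniform(1.0, 5.0) for _ in range(N)]
avg_hh_size = [random.uniform(1.0, 6.0) for _ in range(N)]
sales = [100.0 - 8.0 * p + 5.0 * h + random.gauss(0.0, 4.0)
         for p, h in zip(avg_price, avg_hh_size)]

# Population regression: uses every observation
B, M1, M2 = fit_regression(sales, avg_price, avg_hh_size)

# Sample regression: uses 200 randomly drawn observations
idx = random.sample(range(N), 200)
b, m1, m2 = fit_regression([sales[i] for i in idx],
                           [avg_price[i] for i in idx],
                           [avg_hh_size[i] for i in idx])

print("population:", B, M1, M2)
print("sample:    ", b, m1, m2)
```

The sample values land near the population values but do not match them exactly, and a different random sample would give slightly different estimates.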
Consider these data for Y, X, and U as covering the entire population.
These data were generated using the data-generating process: $Y_i = 5 + 3.2X_i + U_i$
[Figure: Y plotted against X, along with the determining function (blue line) and the population regression equation (red line).]
• The correlation model describes the data best but need not
coincide with the causal mechanism generating the data
• The causality model provides the causal mechanism but need
not describe the data best
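The gap between the two models can be sketched with simulated data. The code uses the data-generating process Yi = 5 + 3.2Xi + Ui from the slides, but the way U is constructed (positively correlated with X), the range of X, and the number of observations are assumptions made for illustration; the actual population data are not reproduced here.

```python
import random

random.seed(1)
N = 2000

# Assumption for illustration: U moves with X, so Cov(X, U) is not zero
x = [random.uniform(0.0, 10.0) for _ in range(N)]
u = [0.5 * (xi - 5.0) + random.gauss(0.0, 1.0) for xi in x]

# Data-generating process from the text: Yi = 5 + 3.2 Xi + Ui
y = [5.0 + 3.2 * xi + ui for xi, ui in zip(x, u)]

# Population regression line (correlation model), fitted on all N points
x_bar = sum(x) / N
y_bar = sum(y) / N
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

def sum_sq_errors(a, b):
    """Sum of squared gaps between Y and the line a + b * X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

sse_regression = sum_sq_errors(intercept, slope)  # correlation model
sse_determining = sum_sq_errors(5.0, 3.2)         # determining function (causal model)

print("regression line:", intercept, slope)
print("squared error, regression vs determining:", sse_regression, sse_determining)
```

Under these assumptions the fitted slope comes out near 3.7 rather than the causal 3.2, and the regression line has the smaller squared error. The regression line describes the data best, while the determining function describes what a change in X would do to Y.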
• Total sum of squares (TSS): The sum of the squared differences
between each observation of Y and the average value of Y
$TSS = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$
• Sum of squared residuals (SSRes): The sum of the squared
residuals.
$SSRes = \sum_{i=1}^{N} e_i^2$, where $e_i$ is the residual for observation i
• R-squared: The fraction of the total variance in Y that can be
attributed to variation in the Xs
$R^2 = 1 - \frac{SSRes}{TSS}$
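These three quantities can be computed directly from their definitions. The sketch below fits a one-variable regression by least squares and then evaluates TSS, SSRes, and R-squared; the function names and the five data points are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

def model_fit(y, y_hat):
    """Return (TSS, SSRes, R-squared) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return tss, ss_res, 1.0 - ss_res / tss

# Hypothetical data, invented for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
tss, ss_res, r2 = model_fit(y, y_hat)
print(tss, ss_res, r2)
```

For these made-up numbers TSS is 6, SSRes is 2.4, and R-squared is 0.6, so the regression accounts for 60 percent of the variation in Y.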
© 2019 McGraw-Hill Education.
The Relevance of Model Fit for Passive and Active Prediction