Describing Relationships:
Scatterplots and Correlation
Chapter 14
Income versus Assets
• Income =
a + b×Assets
• Assets vary from 3.4
billion to 49 billion
• Income varies from
bank to bank, even
among those with
similar assets
• Statistical relationship
No relationship:
x and y vary independently. Knowing x tells you nothing about y.
Correlation
• measures the strength and direction of
a linear relationship between two
quantitative variables
• Negative correlation
– X ↑, Y ↓
– X ↓, Y ↑
• X, Y behave “oppositely”
• Positive correlation
– X ↑, Y ↑
– X ↓, Y ↓
• X, Y behave “similarly”
r
• Pearson correlation coefficient (r) describes the
direction and strength of a linear relationship between
two variables.
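As a rough illustration of how r can be computed, the sketch below uses the standard formula (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data are made up for this example and do not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising line: r = 1
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling line: r = -1
r_mid = pearson_r(x, [2, 1, 4, 3, 5])   # rising trend with scatter: r = 0.8
print(r_pos, r_neg, r_mid)
```

The sign of r gives the direction (positive or negative) and its distance from 0 gives the strength of the linear relationship.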
Problems with Correlations
• Outliers can inflate or deflate
correlations
• Groups combined inappropriately may
mask relationships (a third variable)
– groups may have different relationships
when separated
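A small sketch of the outlier effect, using invented numbers: one added point can push r toward 1 or pull it toward 0. The helper `pearson_r` and both data sets are assumptions for illustration, not data from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Case 1: a moderate relationship, then one far-away point in line with the trend
x1, y1 = [1, 2, 3, 4, 5], [3, 1, 4, 2, 5]
r_before_inflate = pearson_r(x1, y1)                # 0.5
r_after_inflate = pearson_r(x1 + [20], y1 + [20])   # about 0.98

# Case 2: a perfect line, then one point far off the line
x2, y2 = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
r_before_deflate = pearson_r(x2, y2)                # 1.0
r_after_deflate = pearson_r(x2 + [6], y2 + [0])     # about 0.14

print(r_before_inflate, r_after_inflate)
print(r_before_deflate, r_after_deflate)
```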
Outliers
Not an outlier:
The upper right-hand point here is
not an outlier of the
relationship; it is what you would
expect for this many beers given
the linear relationship between
beers/weight and blood alcohol.
Strength and Statistical
Significance
• A strong relationship seen in the sample may
indicate a strong relationship in the population.
• The sample may exhibit a strong relationship
simply by chance and the relationship in the
population is not strong or is zero.
• The observed relationship is considered to be
statistically significant if it is stronger than a
large proportion of the relationships we could
expect to see just by chance.
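One way to make "stronger than a large proportion of the relationships we could expect to see just by chance" concrete is a permutation test: shuffle the y values many times to break any link with x, and count how often the shuffled correlation is at least as strong as the observed one. The slides do not name a specific method, so this is only a sketch; the function names, the data, and the number of shuffles are all assumptions.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffled data sets whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # pairs x with a random y, i.e. "just by chance"
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_strong += 1
    return at_least_as_strong / n_shuffles

# Made-up sample with a strong, nearly linear relationship
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8, 9.1, 9.9]
p_value = permutation_p_value(x, y)
print(p_value)
```

A small proportion means chance alone rarely produces a relationship this strong, which is the sense in which the observed relationship is called statistically significant.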
Warnings about
Statistical Significance
• “Statistical significance” does not imply the
relationship is strong enough to be considered
“practically important.”
• Even weak relationships may be labeled
statistically significant if the sample size is very
large.
• Even very strong relationships may not be labeled
statistically significant if the sample size is very
small.
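The effect of sample size can be sketched with the usual t statistic for testing whether a correlation is zero, t = r·sqrt((n − 2)/(1 − r²)). This test is not named in the slides and rests on assumptions of its own (roughly normal data); the r and n values below are invented. The critical values quoted in the comments are the standard two-sided 5% cutoffs.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# Weak relationship, very large sample (hypothetical numbers)
t_weak_large = t_statistic(0.05, 10_000)
# Strong relationship, very small sample (hypothetical numbers)
t_strong_small = t_statistic(0.80, 5)

# Two-sided 5% critical values: about 1.96 for very large samples,
# 3.182 for 3 degrees of freedom (n = 5).
print(t_weak_large > 1.96)     # weak but labeled statistically significant
print(t_strong_small > 3.182)  # strong but not labeled statistically significant
```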
Chapter 15
Describing Relationships:
Regression, Prediction, and
Causation
Straight lines
• y = a + bx
• a = y intercept
• b = slope
• y = 3 - 2x
The least-squares regression line is the unique line
such that the sum of the squared vertical (y)
distances between the data points and the line is
the smallest possible.
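A minimal sketch of the least-squares idea, assuming the standard closed-form solution (slope = sum of products of deviations / sum of squared x deviations; intercept = mean of y − slope × mean of x). The function names and data are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = least_squares_line(x, y)                    # a = 0.6, b = 0.8
best = sum_squared_errors(a, b, x, y)              # 3.6
shifted = sum_squared_errors(a + 0.1, b, x, y)     # any other line does worse
tilted = sum_squared_errors(a, b + 0.1, x, y)
print(a, b, best, shifted, tilted)
```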
Making predictions
The equation of the least-squares regression line allows you to predict
y for any x within the range studied. This is called interpolating.
Predicting y for an x outside that range is extrapolation and is much less reliable.
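A small sketch of prediction restricted to the range studied. The line y = 3 − 2x is the example from the straight-lines slide; the function name `predict` and the studied range 0 to 5 are hypothetical values chosen for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + b*x, refusing values of x outside the range studied."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x = {x} is outside the studied range [{x_min}, {x_max}]")
    return a + b * x

# Line y = 3 - 2x, with an assumed studied range of 0 to 5
y_hat = predict(3, -2, 2, x_min=0, x_max=5)   # 3 - 2*2 = -1
print(y_hat)

try:
    predict(3, -2, 10, x_min=0, x_max=5)      # x = 10 was never observed
    refused = False
except ValueError:
    refused = True
print(refused)
```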
Coefficient of Determination
(R²)
• Measures usefulness of regression prediction
• R² (or r², the square of the correlation):
measures the percentage of the variation in
the values of the response variable (y) that is
explained by the regression line
• r = 1: R² = 1: regression line explains all (100%) of
the variation in y
• r = 0.7: R² = 0.49: regression line explains almost half
(49%) of the variation in y
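To illustrate "variation explained", the sketch below computes the coefficient of determination as 1 − (unexplained variation / total variation) and checks that it matches the square of the correlation for a straight-line fit. The data and variable names are made up for this example.

```python
from math import sqrt

# Made-up data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)   # total variation in y

b = sxy / sxx                 # least-squares slope
a = mean_y - b * mean_x       # least-squares intercept

# Variation left over after using the line (squared vertical distances)
unexplained = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)                 # correlation: 0.8
r_squared = 1 - unexplained / syy         # 1 - 3.6/10 = 0.64
print(r, r_squared)
```

Here the line explains 64% of the variation in y, and 0.8² = 0.64 agrees with that figure.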
• r = −1, r² = 1: Changes in x explain 100% of the variation in y.
y can be entirely predicted for any given value of x.
• r = 0, r² = 0: Changes in x explain 0% of the variation in y.
The value y takes has no linear relationship with the value x takes.
• r = 0.87, r² = 0.76: Here the change in x explains only 76% of the change in y.
The rest of the change in y (the vertical scatter, shown as red arrows)
must be explained by something other than x.
[Figure: scatterplot; axis label “Height in Inches”]
Correlation Does Not Imply
Causation
Evidence of Causation
• A properly conducted experiment
establishes the connection
• Other considerations:
– A reasonable explanation for a cause and
effect exists
– The connection happens in repeated trials
– The connection happens under varying
conditions
– Potential confounding factors are ruled out
– Alleged cause precedes the effect in time
Association and causation
It appears that lung cancer is associated with smoking.
How do we know that both of these variables are not being affected by an
unobserved third (lurking) variable?
For instance, what if there is a genetic predisposition that causes people to
both get lung cancer and become addicted to smoking, but the smoking itself
doesn’t CAUSE lung cancer?
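The lurking-variable question can be illustrated with a toy simulation. It is not a model of smoking or lung cancer: z is an invented third variable that drives both x and y, while x has no effect on y at all. The names, the sample size, and the noise levels are assumptions chosen for the example.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (illustrative helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]      # lurking variable
x = [zi + rng.gauss(0, 1) for zi in z]       # x depends on z and noise only
y = [zi + rng.gauss(0, 1) for zi in z]       # y depends on z and noise only

r_xy = pearson_r(x, y)   # clearly positive (about 0.5 in theory), with no causal link

# Remove z's contribution. This is possible only because the simulation
# knows z exactly; in real data the lurking variable is unobserved.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson_r(x_without_z, y_without_z)   # close to 0

print(r_xy, r_without_z)
```

An association like r_xy cannot by itself distinguish "x causes y" from "z causes both", which is why the evidence listed under Evidence of Causation is needed.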
Ch 14 & 15 concepts