
Chapter 14

Describing Relationships:
Scatterplots and Correlation


Statistical versus Deterministic Relationships

• Distance versus Speed (when travel time is constant).
• Income (in millions of dollars) versus total assets of banks (in billions of dollars).

Distance versus Speed

• Distance = Speed × Time
• Suppose time = 1.5 hours
• Each subject drives a fixed speed for the 1.5 hrs
  – speed chosen for each subject varies from 10 mph to 50 mph
• Distance does not vary for those who drive the same fixed speed
• Deterministic relationship

Income versus Assets

• Income = a + b×Assets
• Assets vary from 3.4 billion to 49 billion
• Income varies from bank to bank, even among those with similar assets
• Statistical relationship (a sketch contrasting the two kinds of relationship follows this list)
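
A minimal Python sketch of the contrast. The 1.5-hour trip, the 10–50 mph speeds, and the 3.4–49 billion asset range come from the slides; the intercept, slope, and noise level for bank income are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deterministic: distance is fully determined once speed is known (time fixed at 1.5 hours).
speeds = np.array([10, 20, 30, 40, 50])   # mph, one per subject
distances = speeds * 1.5                  # miles; same speed -> same distance, no scatter

# Statistical: income tracks assets on average, but individual banks scatter around the line.
# The intercept, slope, and noise level below are made up for illustration only.
assets = rng.uniform(3.4, 49, size=30)                    # billions of dollars
income = 2.0 + 0.8 * assets + rng.normal(0, 5, size=30)   # millions of dollars, with bank-to-bank noise

print(distances)     # identical speeds give identical distances
print(income[:5])    # banks with similar assets still differ in income
```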

• A scatterplot shows a linear relationship if the points lie, more or less, along a straight line.
• Example: heights and weights of 165 students in a college statistics course (a simulated stand-in is sketched below):
Positive association: High values of one variable tend to occur together with high values of the other variable.

Negative association: High values of one variable tend to occur together with low values of the other variable.

No relationship: x and y vary independently. Knowing x tells you nothing about y.

One way to remember this: the equation of the flat line in the plot is y = 5; x is not involved.

The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.

With a strong relationship, you can get a pretty good estimate of y if you know x. With a weak relationship, for any x you might get a wide range of y values.

Correlation

• Measures the strength and direction of a linear relationship between two quantitative variables

• Negative correlation
  – X↑ Y↓
  – X↓ Y↑
• X, Y behave “oppositely”

• Positive correlation
  – X↑ Y↑
  – X↓ Y↓
• X, Y behave “similarly”

r

• The Pearson correlation coefficient (r) describes the direction and strength of a linear relationship between two variables (a short computation appears below).

  −1 ≤ r ≤ −0.8    strong negative correlation
  −0.8 < r < −0.2  weak to moderate negative correlation
  −0.2 ≤ r ≤ 0.2   negligible correlation
  0.2 < r < 0.8    weak to moderate positive correlation
  0.8 ≤ r ≤ 1      strong positive correlation
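
A small sketch of how r can be computed, assuming NumPy is available; the data points are arbitrary toy values.

```python
import numpy as np

def pearson_r(x, y):
    """Average product of standardized scores: r = sum(z_x * z_y) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

x = [2, 4, 6, 8, 10]
y = [1.8, 4.1, 5.9, 8.3, 9.9]
print(pearson_r(x, y))            # hand-rolled value
print(np.corrcoef(x, y)[0, 1])    # NumPy's built-in gives the same answer
```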

Problems with Correlations

• Outliers can inflate or deflate correlations (a small demonstration follows this list)
• Groups combined inappropriately may mask relationships (a third variable)
  – groups may have different relationships when separated
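
A quick illustration, with made-up data, of how a single outlier can inflate r.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = rng.normal(size=30)                      # unrelated variables: r should be near 0
print(round(np.corrcoef(x, y)[0, 1], 2))

# Add a single extreme point far from the rest.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
print(round(np.corrcoef(x_out, y_out)[0, 1], 2))   # one outlier can manufacture a large r
```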

Outliers

Not an outlier: The upper right-hand point here is not an outlier of the relationship; it is what you would expect for this many beers given the linear relationship between beers/weight and blood alcohol.

Outlier: This point is not in line with the others, so it is an outlier of the relationship.

What does “statistical significance” mean?

• Dictionary definition (sense 5, Statistics): Of or relating to observations or occurrences that are too closely correlated to be attributed to chance and therefore indicate a systematic relationship.

Strength and Statistical Significance

• A strong relationship seen in the sample may indicate a strong relationship in the population.
• However, the sample may exhibit a strong relationship simply by chance, even when the relationship in the population is weak or zero.
• The observed relationship is considered statistically significant if it is stronger than a large proportion of the relationships we would expect to see just by chance.

Warnings about Statistical Significance

• “Statistical significance” does not imply the relationship is strong enough to be considered “practically important.”
• Even weak relationships may be labeled statistically significant if the sample size is very large.
• Even very strong relationships may not be labeled statistically significant if the sample size is very small (see the sketch after this list).
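
A rough illustration of both warnings using SciPy's pearsonr; the sample sizes, slopes, and noise levels are arbitrary choices for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Weak relationship, very large sample: a tiny r can still come out "statistically significant".
x_big = rng.normal(size=5000)
y_big = 0.05 * x_big + rng.normal(size=5000)
r_big, p_big = stats.pearsonr(x_big, y_big)

# Strong relationship, very small sample: a large r may fail to reach significance.
x_small = rng.normal(size=5)
y_small = 0.9 * x_small + rng.normal(scale=0.5, size=5)
r_small, p_small = stats.pearsonr(x_small, y_small)

print(f"n=5000: r = {r_big:.2f}, p = {p_big:.4f}")
print(f"n=5:    r = {r_small:.2f}, p = {p_small:.4f}")
```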

Chapter 15

Describing Relationships:
Regression, Prediction, and
Causation


Straight lines (a quick review)

• y = a + bx
• a = y-intercept
• b = slope
• Slope = Δy/Δx = rise/run
  – e.g. for y = 3 − 2x the slope is −2: y decreases 2 units for every one-unit increase in x

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

The least-squares regression line is the unique line such that the sum of the squared vertical (y) distances between the data points and the line is the smallest possible.

Distances between the points and the line are squared so all are positive values. This is done so that distances can be properly added.

The least-squares regression line can be shown to have this equation: ŷ = a + bx

• ŷ (“y hat”) is the predicted y value
• b is the slope
• a is the y-intercept
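
A minimal sketch of the standard least-squares formulas (slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept a = ȳ − b·x̄) on toy data, assuming NumPy.

```python
import numpy as np

# Toy data; any (x, y) pairs would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)   # slope
a = ybar - b * xbar                                             # intercept
y_hat = a + b * x                                               # predicted values on the fitted line

print(a, b)
print(np.polyfit(x, y, 1))        # NumPy's least-squares fit: [slope, intercept], for comparison
print(np.sum((y - y_hat) ** 2))   # the sum of squared vertical distances the line minimizes
```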

Making predictions

The equation of the least-squares regression line allows you to predict y for any x within the range studied. This is called interpolation.

ŷ = 0.0144x + 0.0008

Nobody in the study drank 6.5 beers, but by finding the value of ŷ from the regression line for x = 6.5, we would expect a blood alcohol content of about 0.094 mg/ml:

ŷ = 0.0144 × 6.5 + 0.0008 = 0.0936 + 0.0008 = 0.0944 mg/ml
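
The same interpolation as a tiny Python snippet, using the fitted coefficients quoted above.

```python
def predict_bac(beers):
    """Predicted blood alcohol content from the fitted line y_hat = 0.0144*x + 0.0008."""
    return 0.0144 * beers + 0.0008

print(round(predict_bac(6.5), 4))   # 0.0944, matching the worked calculation above
```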

Coefficient of Determination (R²)

• Measures the usefulness of the regression prediction
• R² (or r², the square of the correlation) measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line
• r = 1: R² = 1: the regression line explains all (100%) of the variation in y
• r = 0.7: R² = 0.49: the regression line explains almost half (about 50%) of the variation in y

• r = −1, r² = 1: changes in x explain 100% of the variation in y; y can be entirely predicted for any given value of x.
• r = 0, r² = 0: changes in x explain 0% of the variation in y; the value y takes is entirely independent of what value x takes.
• r = 0.87, r² = 0.76: here the change in x explains only 76% of the change in y. The rest of the change in y (the vertical scatter, shown as red arrows in the figure) must be explained by something other than x.

Extrapolation is the use of a regression line for predictions outside the range of x values used to obtain the line.

This can be a very stupid thing to do, as seen here.

[Figure: an extrapolated regression line; axis labeled “Height in Inches”]
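
One possible guard against accidental extrapolation: warn when a new x falls outside the range used to fit the line. The beer counts below are hypothetical, since the slides do not list the study's actual x values.

```python
import numpy as np

def predict(x_new, a, b, x_fit):
    """Return a + b*x_new, flagging predictions outside the fitted x range (extrapolation)."""
    lo, hi = float(np.min(x_fit)), float(np.max(x_fit))
    if not (lo <= x_new <= hi):
        print(f"warning: x = {x_new} is outside [{lo}, {hi}]; this is extrapolation")
    return a + b * x_new

# Hypothetical range of beers actually observed in the study.
x_fit = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(predict(6.5, 0.0008, 0.0144, x_fit))   # inside the range: interpolation
print(predict(20, 0.0008, 0.0144, x_fit))    # far outside: the prediction is not trustworthy
```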

Correlation Does Not Imply Causation

Even very strong correlations may not correspond to a real causal relationship.

Evidence of Causation

• A properly conducted experiment establishes the connection

• Other considerations:
  – A reasonable explanation for a cause and effect exists
  – The connection happens in repeated trials
  – The connection happens under varying conditions
  – Potential confounding factors are ruled out
  – The alleged cause precedes the effect in time

Reasons for relationships between variables

1. The explanatory variable is the direct cause of the response variable
2. The response variable is causing a change in the explanatory variable
3. The explanatory variable is contributing to, but is not the sole cause of, change in the response variable
4. Confounders may exist
5. Both variables result from a common cause
6. Both variables are changing over time
7. The association is coincidence

Association and causation
It appears that lung cancer is associated with smoking.
How do we know that both of these variables are not being affected by an
unobserved third (lurking) variable?
For instance, what if there is a genetic predisposition that causes people to
both get lung cancer and become addicted to smoking, but the smoking itself
doesn’t CAUSE lung cancer?

We can evaluate the association using the following criteria:

1) The association is strong.
2) The association is consistent.
3) Higher doses are associated with stronger responses.
4) The alleged cause precedes the effect.
5) The alleged cause is plausible.

Ch 14 & 15 concepts

• Statistical vs. Deterministic Relationships
• Statistical Significance
• Correlation Coefficient
• Problems with Correlations
• Least-Squares (LS) Regression Equation
• R²
• Correlation does not imply causation!
