17 Correlation 320E F21
Agenda
• Correlation
– Definition & properties
– Calculating r
– Hypothesis test
– Confidence Intervals
Projects
• Check the notes on your project grade!!!
• If you need major changes:
– Talk to me or grad TA (ASAP)!!!
• If no major changes:
– Start collecting data and working on preliminary
analysis
WHAT NOW?!?!
• SDS 322E Data Science with R and Python
– You will learn modern data manipulation and
visualization techniques in R (and Python) as well
as more advanced statistical concepts in applied
biomedical contexts.
– Grad school?
– Research?
Example
• You want to know if diet soda consumption is related to BMI
• Data columns: ID, Sodas/wk, BMI
1. Direction
2. Strength
3. Linearity
4. Presence of outliers
Correlation Properties
• 1. Direction
• 2. Strength
Correlation Properties
3. Importance of Linearity:
r = 0.0004
Correlation ≠ Association
Correlation Properties
4. Importance of Outliers:
With outlier: r = 0.602
Without outlier: r = 0.971
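A minimal sketch of how one outlier can pull r down. The data are made up for illustration (they are not the data behind the slide's figure), and `pearson_r` is a helper written for this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: eight points close to a straight line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.1]
r_without = pearson_r(x, y)

# Add one hypothetical outlier far below the line.
r_with = pearson_r(x + [9], y + [1.0])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

Always look at the scatterplot first: a single point can move r a lot in a small sample.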
What’s the deal with r?
• Things that DON’T CHANGE r
– 1. r[x,y] = r[y,x]
• e.g. correlation between weight and height is the same
as between height and weight
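A quick check of the symmetry property, as a sketch. The heights and weights are hypothetical, and `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (in) and weights (lb), made up for illustration.
height = [61, 64, 66, 68, 70, 72, 75]
weight = [115, 130, 128, 155, 160, 172, 190]

r_hw = pearson_r(height, weight)  # height as x, weight as y
r_wh = pearson_r(weight, height)  # roles swapped

print(f"r[height, weight] = {r_hw:.4f}")
print(f"r[weight, height] = {r_wh:.4f}")
```

Swapping x and y only reorders the factors inside each product, so both calls give the same r.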
What’s the deal with r?
• Things that DON’T change r
• Restriction of Range Problem
– Try to include the whole span of x values (“range”)
– Restricting it to one section of x’s will change your correlation
[Figure: Height (in) vs. Age, ages 0 to 45]
– Correlation, all ages: 0.78
– Correlation, ages 0-12: 0.99
– Correlation, ages 15-45: 0.02
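The restriction-of-range effect can be imitated with simulated data. This is only a sketch: the growth model (about 3.5 in per year until age 14, then flat), the noise level, and the seed are all assumptions, not the data behind the figure.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(320)  # fixed seed so the sketch is reproducible

# Hypothetical growth model: height rises until age 14, then levels off.
ages, heights = [], []
for age in range(0, 46):
    for _ in range(20):  # 20 simulated people per year of age
        ages.append(age)
        heights.append(20 + 3.5 * min(age, 14) + random.gauss(0, 2.5))

def r_for_ages(lo, hi):
    """Correlation using only people whose age is between lo and hi."""
    pairs = [(a, h) for a, h in zip(ages, heights) if lo <= a <= hi]
    xs = [a for a, _ in pairs]
    ys = [h for _, h in pairs]
    return pearson_r(xs, ys)

r_all = r_for_ages(0, 45)
r_child = r_for_ages(0, 12)
r_adult = r_for_ages(15, 45)

print(f"all ages:   r = {r_all:.2f}")
print(f"ages 0-12:  r = {r_child:.2f}")
print(f"ages 15-45: r = {r_adult:.2f}")
```

The same population gives three very different correlations depending on which slice of ages is sampled.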
Correlation and Error
• We try to measure as accurately as possible
– Always some error
– Measurement error: (true value – measured value)
• Increasing measurement error will (usually)
reduce the absolute value of the correlation
coefficient
– attenuation
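Attenuation can be seen in a small simulation. This is a sketch under assumed numbers: the "true" values, the error sizes, and the seed are invented, and random noise is added to both variables to play the role of measurement error.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(17)  # fixed seed so the sketch is reproducible

# Hypothetical true values with a strong linear relationship.
n = 2000
true_x = [random.gauss(0, 1) for _ in range(n)]
true_y = [x + random.gauss(0, 0.5) for x in true_x]

# Measure both variables with more and more random error.
results = {}
for error_sd in [0.0, 0.5, 1.0, 2.0]:
    measured_x = [x + random.gauss(0, error_sd) for x in true_x]
    measured_y = [y + random.gauss(0, error_sd) for y in true_y]
    results[error_sd] = pearson_r(measured_x, measured_y)
    print(f"error sd = {error_sd}: r = {results[error_sd]:.3f}")
```

The underlying relationship never changes here; only the measurement quality does, and r shrinks toward 0 as the error grows.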
If you’re bored
• guessthecorrelation.com
• https://www.google.com/trends/correlate/
Calculating r
• Cigarettes: tar and nicotine
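[Table: cigarette brands with tar (mg/cig) and nicotine (mg/cig); only one row survives: Vantage, tar 8, nicotine 0.7]
[Figure: Cigarette Data, scatterplot of Nicotine (mg/cig) vs. Tar (mg/cig)]
Summary statistics for the speed and mileage exercise that follows:
Speed (MPH): mean = 40, s = 15.811
Mileage (MPG): mean = 26.8, s = 2.683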
Try it!
• The gas mileage of an automobile first increases
and then decreases as the speed increases.
Suppose this relationship is very regular (as
shown in the following table), with speed in mph
and mileage in miles per gallon.
Speed (MPH) Mileage (MPG) z(sp) z(mile) z(sp)z(mile)
20 24
30 28
40 30
50 28
60 24
Try it!
Speed (MPH) Mileage (MPG) z(sp) z(mile) z(sp)z(mile)
20 24 -1.265 -1.043 1.320
30 28 -0.632 0.447 -0.283
40 30 0.000 1.193 0.000
50 28 0.632 0.447 0.283
60 24 1.265 -1.043 -1.320
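The table can be reproduced in a few lines. This sketch assumes r is the sum of the z-score products divided by n - 1, which is what the table's columns suggest; the function names are made up for the example.

```python
import math

speed = [20, 30, 40, 50, 60]      # MPH, from the table
mileage = [24, 28, 30, 28, 24]    # MPG, from the table

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

z_sp = z_scores(speed)
z_mile = z_scores(mileage)
products = [a * b for a, b in zip(z_sp, z_mile)]

# Assumed formula: r = sum of z-score products / (n - 1)
r = sum(products) / (len(speed) - 1)

for row in zip(speed, mileage, z_sp, z_mile, products):
    print("{:>3} {:>3} {:>7.3f} {:>7.3f} {:>7.3f}".format(*row))
print(f"r = {r:.3f}")
```

The positive and negative products cancel, so r comes out at essentially 0 even though mileage depends on speed in a very regular (curved) way.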
Relationship?
[Figure: Relation Between MPH and MPG; scatterplot of Mileage (MPG) vs. Speed (MPH)]
Correlation Matrices
• Way to report the relations among several
numeric variables
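A correlation matrix is just every pairwise r laid out in a grid. A minimal sketch with three hypothetical variables (the numbers are invented for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for six people.
data = {
    "sodas":    [0, 2, 5, 7, 10, 14],                 # diet sodas per week
    "bmi":      [21.5, 23.0, 24.1, 27.3, 26.8, 30.2],
    "exercise": [6, 5, 5, 2, 3, 1],                   # hours per week
}

names = list(data)
# matrix[i][j] is the correlation between variable i and variable j
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

print(" " * 10 + "".join(f"{name:>10}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>10}" + "".join(f"{value:>10.2f}" for value in row))
```

The diagonal is always 1 (each variable with itself) and the matrix is symmetric, so reports often show only the lower or upper triangle.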
Correlation Hypothesis Test
• Are the two variables significantly related?
– Is the correlation significant?
Correlation Hypothesis Test: Steps
• 1. Assumptions
• 2. Hypotheses
• 3. Calculate t
– need r and SEr
• 4. Find t*
• 5. Conclusion
Example
• Are the length and weight of vipers
significantly, linearly related? r = 0.944
Correlation Hypothesis Test
• Assumptions
– 1. Random sample
– 2. Independent observations
– 3. x, y come from a bivariate normal distribution
Check: Bivariate Normal Distribution
Violations of Bivariate Normality
• Common issues
Did we meet the assumption?
Step 2: Hypotheses
• Are the two variables significantly linearly
related?
• H0: ρ = 0
• HA: ρ ≠ 0
• r = .944
• SEr = .1247
Step 3: Find t
• Good news, everyone!
• |t| > t*
– Reject the null
– There is a significant, linear relationship between
diet soda consumption and BMI (t = 5.965, df = 98, p < .05)
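The arithmetic for the diet soda conclusion can be sketched as follows. Assumptions: the standard error uses the common textbook form SEr = sqrt((1 - r^2) / (n - 2)), n = 100 is inferred from df = 98, and r = .516 and t* = 1.984 are the values the slides list for this example. The function names are made up.

```python
import math

def se_r(r, n):
    """Standard error of r (assumed form: sqrt((1 - r^2) / (n - 2)))."""
    return math.sqrt((1 - r ** 2) / (n - 2))

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with df = n - 2."""
    return r / se_r(r, n)

r = 0.516        # sample correlation, diet soda example
n = 100          # inferred from df = 98
t_star = 1.984   # critical value from the slides

se = se_r(r, n)
t = t_statistic(r, n)
reject_null = abs(t) > t_star

print(f"SEr = {se:.4f}")
print(f"t = {t:.3f}, df = {n - 2}")
print("Reject the null" if reject_null else "Fail to reject the null")
```

The computed t differs from the reported 5.965 only in the third decimal place, because the slide value divides by the rounded SEr.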
I want more!
• Confidence intervals!
• r = .516
• t* = 1.984
• SEr = 0.0865
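With these three numbers, one way to build the interval is r ± t* × SEr. That form is an assumption suggested by the quantities listed here; the Fisher z transformation is another common approach and gives a slightly different, asymmetric interval.

```python
r = 0.516       # sample correlation
t_star = 1.984  # critical value for 95% confidence, df = 98
se_r = 0.0865   # standard error of r

# Assumed form of the interval: estimate plus or minus t* times SE
margin = t_star * se_r
lower = r - margin
upper = r + margin

print(f"margin of error = {margin:.3f}")
print(f"95% CI for rho: ({lower:.3f}, {upper:.3f})")
```

The interval does not contain 0, which agrees with rejecting the null in the hypothesis test.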
Summary
• r can tell us the strength and direction of
the linear relationship between X and Y