
https://www.student.uwa.edu.au/learning/resources/ace/respect-intellectual-property/copyright-and-uwa-unit-content

Adriano Polpo (UWA) STAT 1400


STAT 1400 - Statistics for Science

stat1400-ems@uwa.edu.au

Contributors to lecture material: Adrian Baddeley, Adriano Polpo, John Bamberg, Ed Cripps, Julie Marsh, Kevin Murray,
Gordon Royle, and Berwin Turlach.



Hubble’s experiment

A Relation between Distance and Radial Velocity among Extra-Galactic Nebulae, PNAS 15 (1929) 168–173
His experiment

He observed n = 24 nebulae.
The recessional velocity of each nebula was estimated by its
redshift (in kilometers per second).
The distance to each nebula was estimated by its apparent
luminosity (in megaparsecs; 1 megaparsec = $3.09 \times 10^{19}$ kilometers).
He produced a scatterplot and thereby estimated a correlation
between distance and recessional velocity.
Is the universe expanding?

He was investigating the hypothesis that the universe is expanding.

If the universe is expanding, then (astronomical) objects further
away must be moving faster.

Hubble published the first empirical evidence of the expanding
universe hypothesis.

This provides supporting evidence for the theory that the universe
was created in a Big Bang.
Scatterplots
Given two quantitative variables on the same experimental units:

$\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\}$

form a scatterplot with markers at the $n$ coordinate positions $(x_i, y_i)$.


Scatterplots

Scatterplots display patterns, trends, relationships and outliers.


Association

A scatterplot displays the association between the two variables by:

Direction

Positive: as x increases, y increases


Negative: as x increases, y decreases
Form
Is the form linear (straight) or non-linear (e.g. quadratic,
exponential)?

Strength
How tightly the variables move together.
Types of association
[Figure: six scatterplots in three columns (Positive, Negative, Absent) and two rows (Stronger Association, Weaker Association). Horizontal axes: x, x and x1.]
Quantify the association

We want to quantify the strength/weakness of the association
between two variables.

The correlation coefficient is a unit-free measure of the strength of
a linear association, with the following properties.

Correlation coefficient    Implication
 1                         perfect positive linear association
 0                         no linear association
-1                         perfect negative linear association
Hours sleep vs grumpiness

Sleep Grumpiness
8 5
8.5 5
8 6
7.2 6.5
7.5 5
9 3
6.5 7
7 7
8.5 4
6 8
Correlation coefficient

To describe the correlation coefficient we need some notation:

Say we have paired data

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,

where $(x_i, y_i)$ is the $i$th observation.

Let $s_x$ denote the sample standard deviation for the x-variable.

Let $s_y$ denote the sample standard deviation for the y-variable.


Correlation coefficient cont.

The correlation coefficient is calculated as

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$$

$(x_i - \bar{x})$ is the deviation of $x_i$ from the mean x-value.
$(y_i - \bar{y})$ is the deviation of $y_i$ from the mean y-value.
Dividing by $n - 1$ takes the "average".
The factor $1/(s_x s_y)$ standardises this to lie between $-1$ and $1$.
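
The formula translates almost line by line into code. The following is a minimal sketch using only the Python standard library; the function name `correlation_coefficient` and the toy data are illustrative assumptions, not part of the lecture material.

```python
from statistics import mean, stdev


def correlation_coefficient(x, y):
    """Sample correlation: sum of products of deviations over (n - 1) * s_x * s_y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # stdev() is the sample standard deviation (it divides by n - 1).
    s_x, s_y = stdev(x), stdev(y)
    sum_of_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sum_of_products / ((n - 1) * s_x * s_y)


# Hypothetical toy data, chosen only to exercise the function.
x = [1, 2, 3, 4]
y = [1, 3, 2, 5]
r = correlation_coefficient(x, y)
print(round(r, 3))
```

The function divides by zero if either variable is constant, because its standard deviation is then 0 and the coefficient is undefined.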
Our example
Do the calculations

x̄       ȳ       s_x     s_y
7.62    5.65    0.954   1.528

x               y               x - x̄    y - ȳ    (x - x̄)(y - ȳ)
(hours sleep)   (grumpiness)
8               5                0.38    -0.65    -0.247
8.5             5                0.88    -0.65    -0.572
8               6                0.38     0.35     0.133
7.2             6.5             -0.42     0.85    -0.357
7.5             5               -0.12    -0.65     0.078
9               3                1.38    -2.65    -3.657
6.5             7               -1.12     1.35    -1.512
7               7               -0.62     1.35    -0.837
8.5             4                0.88    -1.65    -1.452
6               8               -1.62     2.35    -3.807

The products sum to -12.23, so r = -12.23 / (9 × 0.954 × 1.528) = -0.932.
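
The table can be reproduced step by step. This sketch uses the sleep and grumpiness values from the slide; the variable names are illustrative choices, and only the Python standard library is assumed.

```python
from statistics import mean, stdev

# Data from the "Hours sleep vs grumpiness" slide.
sleep = [8, 8.5, 8, 7.2, 7.5, 9, 6.5, 7, 8.5, 6]
grumpiness = [5, 5, 6, 6.5, 5, 3, 7, 7, 4, 8]

n = len(sleep)
x_bar, y_bar = mean(sleep), mean(grumpiness)
s_x, s_y = stdev(sleep), stdev(grumpiness)  # sample standard deviations

# One row per observation: x, y, x - x_bar, y - y_bar, product of deviations.
rows = [
    (x, y, x - x_bar, y - y_bar, (x - x_bar) * (y - y_bar))
    for x, y in zip(sleep, grumpiness)
]
for x, y, dx, dy, product in rows:
    print(f"{x:>5} {y:>5} {dx:>7.2f} {dy:>7.2f} {product:>8.3f}")

sum_of_products = sum(row[4] for row in rows)
r = sum_of_products / ((n - 1) * s_x * s_y)
print(f"x_bar={x_bar:.2f} y_bar={y_bar:.2f} s_x={s_x:.3f} s_y={s_y:.3f}")
print(f"sum of products={sum_of_products:.2f} r={r:.3f}")
```

Run as written, this should give a sum of products of about -12.23 and r of about -0.932, a negative value because above-average sleep tends to pair with below-average grumpiness.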
A strong correlation

There is a strong negative correlation between hours of sleep and
amount of grumpiness.

In other words, the more sleep you get, the less grumpy you feel
the next day.

Although correlation does not imply causation, the statistician
would agree that this at least supports the hypothesis that losing
sleep makes a person grumpy.
Correlation and Causation
Some correlation coefficients
[Figure: six scatterplots in three columns (Positive, Negative, Absent) and two rows, each annotated with its correlation coefficient. Stronger Association: r = 0.89 (positive), r = -0.90 (negative), r = -0.13 (absent). Weaker Association: r = 0.44 (positive), r = -0.44 (negative), r = -0.08 (absent). Horizontal axes: x, x and x1.]
Cautionary notes about r

Correlation is only a measure of linear association.

It does not tell us about non-linear associations.

It does not indicate a cause-and-effect relationship.

Many correlations are due to lurking variables.
