Statistics (0.0)
IB Diploma Biology
Hummingbirds are nectarivores (herbivores
that feed on the nectar of some species of
flower).
In return for food, they pollinate the flower.
This is an example of mutualism –
benefit for all.

As a result of natural selection, hummingbird bills have evolved.
Birds with a bill best suited to their preferred food source have
the greater chance of survival.

Photo: Archilochus colubris, from wikimedia commons, by Dick Daniels.


Researchers studying comparative anatomy collect
data on bill length in two species of hummingbird:
Archilochus colubris (ruby-throated hummingbird) and
Cynanthus latirostris (broad-billed hummingbird).

To do this, they need to collect sufficient relevant, reliable data
so they can test the null hypothesis (H0) that:

“there is no significant difference in bill length between the two species.”

Photo: Archilochus colubris (male), wikimedia commons, by Joe Schneid


The sample size must be large enough to provide sufficient reliable data
and for us to carry out relevant statistical tests for significance.

We must also be mindful of uncertainty in our measuring tools
and error in our results.
Photo: Broadbilled hummingbird (wikimedia commons).
The mean is a measure of the central tendency of a set of data.

Calculate the mean using:
• Your calculator (sum of values / n)
• Excel: =AVERAGE(highlight raw data)

n = sample size. The bigger the better. In this case n=10 for each group.

All values should be centred in the cell, with decimal places consistent
with the measuring tool uncertainty.

Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

        Bill length (±0.1mm)
n       A. colubris    C. latirostris
1       13.0           17.0
2       14.0           18.0
3       15.0           18.0
4       15.0           18.0
5       15.0           19.0
6       16.0           19.0
7       16.0           19.0
8       18.0           20.0
9       18.0           20.0
10      19.0           20.0
Table 1: Raw measurements of bill length in A. colubris and C. latirostris.
• Descriptive table title and number.
• Uncertainties must be included: Bill length (±0.1mm).
• Raw data and the mean need to have consistent decimal places
  (in line with uncertainty of the measuring tool).

        A. colubris    C. latirostris
Mean    15.9           18.8
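As a rough cross-check of the =AVERAGE step, the same means can be computed outside Excel. This is only a sketch: the list and function names are made up for illustration, and the values are copied from Table 1.

```python
from statistics import mean

# Bill lengths in mm (±0.1mm), copied from Table 1.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def mean_bill_length(values):
    """Mean (sum of values / n), rounded to one decimal place to match the ±0.1mm tool."""
    return round(mean(values), 1)


mean_colubris = mean_bill_length(a_colubris)
mean_latirostris = mean_bill_length(c_latirostris)

print(f"A. colubris mean bill length: {mean_colubris:.1f} mm")
print(f"C. latirostris mean bill length: {mean_latirostris:.1f} mm")
```

Rounding to one decimal place keeps the mean consistent with the raw data, as the table notes require.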
Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris.
[Graph: mean bill length (±0.1mm) against species of hummingbird; A. colubris 15.9mm, C. latirostris 18.8mm.]
• Descriptive title, with graph number.
• Labeled points.
• Y-axis clearly labeled, with uncertainty.
• Make sure that the y-axis begins at zero.
• x-axis labeled.
From the means alone you might conclude that C. latirostris has a longer bill than A. colubris.

But the mean only tells part of the story.
Standard deviation is a measure of the spread of most of the data.

Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

        Bill length (±0.1mm)
n       A. colubris    C. latirostris
1       13.0           17.0
2       14.0           18.0
3       15.0           18.0
4       15.0           18.0
5       15.0           19.0
6       16.0           19.0
7       16.0           19.0
8       18.0           20.0
9       18.0           20.0
10      19.0           20.0
Mean    15.9           18.8
s       1.91           1.03

s: =STDEV(highlight RAW data). Standard deviation can have one more decimal place.

Which of the two sets of data has:
a. The longest mean bill length?
b. The greatest variability in the data?
a. C. latirostris has the longest mean bill length (18.8mm).
b. A. colubris has the greatest variability in the data (s = 1.91).
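The s values can be reproduced with a few lines of Python. This is a sketch with illustrative variable names. It uses the sample standard deviation (dividing by n - 1), which is what Excel's =STDEV computes.

```python
from statistics import stdev

# Bill lengths in mm (±0.1mm), copied from Table 1.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# stdev() is the sample standard deviation (n - 1 in the denominator).
# Two decimal places: one more than the raw data, as the table note allows.
sd_colubris = round(stdev(a_colubris), 2)
sd_latirostris = round(stdev(c_latirostris), 2)

print(f"A. colubris s = {sd_colubris:.2f}")
print(f"C. latirostris s = {sd_latirostris:.2f}")

# The larger s marks the data set with the greater variability.
more_variable = "A. colubris" if sd_colubris > sd_latirostris else "C. latirostris"
print(f"Greatest variability: {more_variable}")
```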
Error bars are a graphical representation of the variability of data.
Error bars could represent standard deviation, range or confidence intervals.

Which of the two sets of data has:
a. The highest mean? A
b. The greatest variability in the data? B
Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris (error bars = standard deviation).
[Graph: mean bill length (±0.1mm) against species of hummingbird, with error bars; A. colubris 15.9mm, C. latirostris 18.8mm.]

Title is adjusted to show the source of the error bars. This is very important.

You can see the clear difference in the size of the error bars.
Variability has been visualised.

The error bars overlap somewhat.
What does this mean?
The overlap of a set of error bars gives a clue as to the
significance of the difference between two sets of data.

Large overlap:
• Lots of shared data points within each data set.
• Results are not likely to be significantly different from each other.
• Any difference is most likely due to chance.

No overlap:
• No (or very few) shared data points within each data set.
• Results are more likely to be significantly different from each other.
• The difference is more likely to be ‘real’.
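The overlap idea can be checked numerically for the hummingbird data. The sketch below takes the error bars to be mean ± one standard deviation, using the means and s values from Table 1. The helper names are illustrative only.

```python
def sd_interval(mean_value, sd):
    """Return the (lower, upper) ends of an error bar drawn as mean ± 1 standard deviation."""
    return (mean_value - sd, mean_value + sd)


def overlap_width(interval_a, interval_b):
    """Width of the region shared by two intervals, or 0.0 if they do not overlap."""
    shared = min(interval_a[1], interval_b[1]) - max(interval_a[0], interval_b[0])
    return max(0.0, shared)


# Means and standard deviations (mm) from Table 1.
colubris_bar = sd_interval(15.9, 1.91)
latirostris_bar = sd_interval(18.8, 1.03)

shared_mm = overlap_width(colubris_bar, latirostris_bar)
print(f"A. colubris error bar: {colubris_bar[0]:.2f} to {colubris_bar[1]:.2f} mm")
print(f"C. latirostris error bar: {latirostris_bar[0]:.2f} to {latirostris_bar[1]:.2f} mm")
print(f"Shared region: {shared_mm:.2f} mm")
```

An overlap this small is only a clue about significance, not a verdict.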
Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris (error bars = standard deviation).
[Graph: mean bill length (±0.1mm) against species of hummingbird; A. colubris 15.9mm (n=10), C. latirostris 18.8mm (n=10).]

Our results show a very small overlap between the two sets of data.

So how do we know if the difference is significant or not?
We need to use a statistical test.

The t-test is a statistical test that helps us determine the significance
of the difference between the means of two sets of data.
Excel can jump straight to a value of P for our results.
One function (=TTEST) compares both sets of data.
As it calculates P directly (the probability of a difference at least this
large arising by chance alone), we can determine significance directly.

In this case, P=0.00051

This is much smaller than 0.05, so we are confident that we can reject H0.

The difference is unlikely to be due to chance.

Conclusion:
There is a significant difference in bill length between A. colubris and C.
latirostris.
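For readers without Excel, here is a minimal sketch of the same kind of test using only the Python standard library. It assumes a two-tailed Student's t-test with equal variances, which may or may not be the option used on the slide, and it finds P by numerically integrating the t distribution. All function names are illustrative.

```python
import math
from statistics import mean, variance

# Bill lengths in mm (±0.1mm), copied from Table 1.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def pooled_t(sample_a, sample_b):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed P via Simpson's rule on the density over [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # area between 0 and |t|
    return 1.0 - 2.0 * central_area  # what remains in the two tails


t_stat, df = pooled_t(a_colubris, c_latirostris)
p_value = two_tailed_p(t_stat, df)
alpha = 0.05  # conventional significance level

print(f"t = {t_stat:.2f}, df = {df}, P = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

If P falls below the chosen significance level, H0 is rejected, as in the conclusion above.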
95% Confidence Intervals can also be plotted as error bars.

[Graph: 95% confidence interval error bars, showing no overlap.]

=CONFIDENCE.NORM(0.05,stdev,samplesize)
e.g. =CONFIDENCE.NORM(0.05,C15,10)

These give a clearer indication of the significance of a result:

• Where there is substantial overlap, the difference is unlikely to be significant.
• Where there is no overlap, there is a significant difference.
• If the overlap (or the gap) is small, a t-test should still be carried out.
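The confidence-interval check can also be sketched in Python. The function below mirrors the formula behind =CONFIDENCE.NORM (z × s / √n, using the normal distribution), and the name confidence_norm is simply chosen here to echo the Excel function.

```python
import math
from statistics import NormalDist, mean, stdev

# Bill lengths in mm (±0.1mm), copied from Table 1.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def confidence_norm(alpha, sd, n):
    """Half-width of a (1 - alpha) confidence interval from the normal distribution."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * sd / math.sqrt(n)


def ci_95(sample):
    """Return (lower, upper) limits of the 95% confidence interval around the mean."""
    margin = confidence_norm(0.05, stdev(sample), len(sample))
    centre = mean(sample)
    return (centre - margin, centre + margin)


margin_colubris = confidence_norm(0.05, stdev(a_colubris), len(a_colubris))
margin_latirostris = confidence_norm(0.05, stdev(c_latirostris), len(c_latirostris))
ci_colubris = ci_95(a_colubris)
ci_latirostris = ci_95(c_latirostris)

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = (ci_colubris[0] <= ci_latirostris[1]
                     and ci_latirostris[0] <= ci_colubris[1])

print(f"A. colubris 95% CI: {ci_colubris[0]:.2f} to {ci_colubris[1]:.2f} mm")
print(f"C. latirostris 95% CI: {ci_latirostris[0]:.2f} to {ci_latirostris[1]:.2f} mm")
print("Overlap" if intervals_overlap else "No overlap")
```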
Interesting Study: Do “Better” Lecturers Cause More Learning?
Students watched a one-minute video of a lecture. In one video, the lecturer was
fluent and engaging. In the other video, the lecturer was less fluent.
They predicted how much they would learn on the topic
(genetics) and this was compared to their actual score.
(Error bars = standard deviation).


Is there a significant difference in the actual learning?

Find out more here: http://priceonomics.com/is-this-why-ted-talks-seem-so-convincing/


From MrT’s Excel Statbook.
Diabetes and obesity are ‘risk factors’ of each other.
There is a strong correlation between them,
but does this mean one causes the other?

http://diabetes-obesity.findthedata.org/b/240/Correlations-between-diabetes-obesity-and-physical-activity
Correlation does not imply causality.

Where correlations exist, we must then design solid scientific experiments to determine the
cause of the relationship. Sometimes a correlation exists because of confounding variables:
conditions that both correlated variables depend on, even though the two variables do not
directly affect each other.

To be able to determine causality through experimentation we need:


• One clearly identified independent variable
• Carefully measured dependent variable(s) that can be attributed to change in the
independent variable
• Strict control of all other variables that might have a measurable impact on the
dependent variable.

We need: sufficient relevant, repeatable and statistically significant data.

Some known causal relationships:


• Atmospheric CO2 concentrations and global warming
• Atmospheric CO2 concentrations and the rate of photosynthesis
• Temperature and enzyme activity

Pirates vs global warming, from http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming


Correlation does not imply causation, but it does waggle its eyebrows
suggestively and gesture furtively while mouthing "look over there."

Cartoon from: http://www.xkcd.com/552/

