
Sex, extra-marital affairs, and death.

Otherwise known as…

The Binomial Distribution!


Sex: infanticide and sex ratios in lions
Extra-marital affairs: who’s the daddy?
Who is the biological father: the male who is feeding the chicks, or a different male?
Death: Cases of Ebola (Nov 2nd 2014)
Understanding the proportion of individuals that die is important in health crises like Ebola.

Did a person with ebola die or survive?

Countries with Widespread Transmission

Country        Total Cases   Laboratory-Confirmed Cases   Total Deaths
Guinea         1731          1457                         1041
Liberia        6525          2451                         2413
Sierra Leone   4759          4057                         1070
Total          13015         7965                         4808
The Binomial Distribution: the Bernoulli trial

• A Bernoulli trial is an event or experiment with two possible outcomes.

• These outcomes are denoted “success” and “failure”, although this is entirely arbitrary.

• A familiar example of a Bernoulli trial is a coin toss. In this case the two possible outcomes are heads or tails and each is equally likely to occur.

• But this doesn’t have to be the case…


The Bernoulli trial

• Let’s assume that X is a random variable that can only take two possible values and let’s denote those values 0 and 1. It’s common practice to associate “success” with 1 and “failure” with 0.

• $P(X = 1) = p$

• There cannot be a third option, so $p + q = 1$. Hence $P(X = 0) = 1 - p = q$

• Notice that we can just define failure as a ‘lack of success’. So the random variable could be “the identity of the next person to walk into my office” with success defined as The Head of Department and failure defined as NOT The Head of Department.
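The definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the value p = 0.1 for the Head of Department example are assumptions made here, not taken from the slides.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 ("success") with probability p, otherwise 0 ("failure")."""
    return 1 if rng.random() < p else 0

def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli variable: p when x = 1, q = 1 - p when x = 0."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1: there is no third option")
    return p if x == 1 else 1 - p

rng = random.Random(1)  # fixed seed so the run is repeatable
p = 0.1                 # hypothetical chance the next visitor is the Head of Department

# Repeat the trial many times; the fraction of successes should be close to p.
outcomes = [bernoulli_trial(p, rng) for _ in range(10_000)]
success_fraction = sum(outcomes) / len(outcomes)
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), success_fraction)
```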
The Binomial Distribution: a series of Bernoulli trials

• The Binomial Distribution is concerned with the expected number of successes in a series of independent Bernoulli trials. This assumes that the probability of success does not change from trial to trial.

• For example, imagine tossing a coin 10 times. How many heads do you expect to see? And what’s the probability of getting 10 heads?

• The number of trials is denoted by k.

• The outcome is now the number of successes, X.

e.g. tossing a coin twice (k = 2, p = q = 0.5): if you're betting on the exact sequence, all four sequences (HH, HT, TH, TT) are equally likely, but if you're betting on the number of heads, X = 1 is twice as likely as X = 0 or X = 2.
P(X=0) -> T T -> q*q = q^2 = 0.25
P(X=1) -> H T or T H -> p*q + q*p = 2pq = 0.5
P(X=2) -> H H -> p*p = p^2 = 0.25
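The two-toss example can be checked by listing every sequence and adding up the probabilities by number of heads. A minimal sketch, assuming a fair coin; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

k = 2  # number of tosses
sequences = list(product("HT", repeat=k))  # HH, HT, TH, TT
prob_per_sequence = 0.5 ** k               # every sequence is equally likely

# Group the sequences by the outcome X = number of heads.
heads_dist = Counter()
for seq in sequences:
    heads_dist[seq.count("H")] += prob_per_sequence

for x in sorted(heads_dist):
    print(f"P(X={x}) = {heads_dist[x]}")
```

Two sequences (HT and TH) give X = 1, which is why that outcome carries twice the probability of X = 0 or X = 2.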
The Binomial Distribution: a series of Bernoulli trials
x can take any value from 0 to k:

$$p(x) = \binom{k}{x} p^x (1 - p)^{k - x}, \quad x = 0, 1, 2, \ldots, k$$

where

$$\binom{k}{x} = \frac{k!}{x!\,(k - x)!}$$

This is written $X \sim B(k, p)$ and provides a probability model for the total number of successes in a sequence of k independent Bernoulli trials, in which the probability of success in a single trial is p.
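As a rough illustration, the formula translates directly into Python using the standard library's math.comb for the binomial coefficient. The function name binomial_pmf is a label chosen here; the example answers the earlier question about 10 heads in 10 tosses of a fair coin.

```python
from math import comb

def binomial_pmf(x, k, p):
    """P(X = x) for X ~ B(k, p): comb(k, x) * p^x * (1 - p)^(k - x)."""
    return comb(k, x) * p**x * (1 - p) ** (k - x)

# Probability of 10 heads in 10 tosses of a fair coin: 0.5^10 = 1/1024.
p_ten_heads = binomial_pmf(10, 10, 0.5)

# The probabilities over x = 0..k must add up to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
print(p_ten_heads, total)
```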
The Binomial Distribution: understanding the formula

$$p(x) = \binom{k}{x} p^x (1 - p)^{k - x}, \quad x = 0, 1, 2, \ldots, k$$

• The highlighted part, $p^x (1 - p)^{k - x}$, is the probability of any one way of getting x successes from k throws…

• The problem is that there are usually many possible ways of doing this.
The Binomial Distribution: understanding the formula

$$p(x) = \binom{k}{x} p^x (1 - p)^{k - x}, \quad x = 0, 1, 2, \ldots, k$$

• The second part (highlighted), $\binom{k}{x}$, is therefore the number of possible ways of getting x successes from k trials.

• It’s called the binomial coefficient and the button on your calculator looks like this: nCr, where n = k. Just make sure that you know where this button is!

• In R there is a function called choose(k,x).
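For readers working in Python rather than R, math.comb(k, x) plays the same role as choose(k, x). The sketch below compares three routes to the same number for k = 7 and x = 2 (values picked here purely for illustration): the factorial formula, the built-in function, and a brute-force listing of the possible positions of the successes.

```python
from itertools import combinations
from math import comb, factorial

k, x = 7, 2  # illustrative values: 2 successes in 7 trials

# 1. The factorial formula k! / (x! (k - x)!)
ways_formula = factorial(k) // (factorial(x) * factorial(k - x))

# 2. The built-in binomial coefficient (the nCr button)
ways_builtin = comb(k, x)

# 3. Brute force: list every choice of which x trials were the successes
ways_listed = len(list(combinations(range(k), x)))

print(ways_formula, ways_builtin, ways_listed)
```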
The Binomial Distribution: properties

• The mean of the binomial distribution is given by kp.

• The variance of the binomial distribution is given by kp(1-p).

• When p>>q or p<<q the distribution can be heavily skewed.

• But when p = q the distribution is symmetrical.
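These properties can be checked numerically by summing over the probabilities directly. A minimal sketch; k = 10 and the two values of p are chosen here for illustration and are not from the slides.

```python
from math import comb

def binomial_pmf(x, k, p):
    """P(X = x) for X ~ B(k, p)."""
    return comb(k, x) * p**x * (1 - p) ** (k - x)

def mean_and_variance(k, p):
    """Mean and variance computed from the probabilities, not from kp and kp(1-p)."""
    probs = [binomial_pmf(x, k, p) for x in range(k + 1)]
    mean = sum(x * px for x, px in enumerate(probs))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
    return mean, variance

k = 10
mean, variance = mean_and_variance(k, 0.5)  # compare with kp = 5 and kp(1-p) = 2.5

# p = q gives a symmetrical distribution; p << q piles the probability near zero.
symmetric = [binomial_pmf(x, k, 0.5) for x in range(k + 1)]
skewed = [binomial_pmf(x, k, 0.1) for x in range(k + 1)]
print(mean, variance)
print([round(v, 3) for v in symmetric])
print([round(v, 3) for v in skewed])
```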


Back to the biology!
Great tits in Wytham Woods.

• Question: are extra-pair chicks randomly distributed among broods? (Extra-pair chicks are those not fathered by the male who is feeding them.)

• We need appropriate data, an appropriate null model, and an appropriate test statistic (to assess whether deviations from the null model are significant).

• In this case the data were collected by Professor Ben Sheldon and his group. They
have data on the number of extra-pair chicks in broods of size seven.

• The appropriate null model is the binomial distribution (with k = 7) and the
appropriate statistic is the chi-squared.
Great tits in Wytham Woods: assembling observed counts.
Number of EPCs   Frequency   Expected?
0                14
1                6
2                4
3                2
4                1
5                1
6                0
7                1
Great tits in Wytham Woods: assembling expected counts.

• Problem: I know k, but I don’t know p.

• Solution: Estimate p from the data. Here p is the probability that a chick in any nest is an extra-pair chick.

• p can be estimated using p = total number of successes/total number of events.

• In this case p = 36/203 = 0.177, where 203 = 29 broods × 7 chicks per brood.
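The estimate of p can be reproduced from the observed frequency table. A short sketch in Python; the variable names are illustrative.

```python
# Observed data: number of extra-pair chicks (EPCs) -> number of broods
observed = {0: 14, 1: 6, 2: 4, 3: 2, 4: 1, 5: 1, 6: 0, 7: 1}
k = 7  # brood size

n_broods = sum(observed.values())                        # 29 broods
total_epcs = sum(x * freq for x, freq in observed.items())  # total "successes"
total_chicks = n_broods * k                              # total "events" = 29 * 7

p_hat = total_epcs / total_chicks
print(total_epcs, total_chicks, round(p_hat, 3))
```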
Great tits in Wytham Woods: assembling expected counts.

Number of EPCs (x)   Frequency   Expected
0                    14          p(x) * n
1                    6
2                    4
3                    2
4                    1
5                    1
6                    0
7                    1
Total                29

Where n is the number of samples. In the case of the great tits, 29 broods were assessed; hence n = 29.

Where p(x) is the probability of x, given X is Binomial with parameters k and p. Here k = 7 and p = 0.177.

Check: the counts should total 29.
Great tits in Wytham Woods: assembling expected counts.

Number of EPCs (x)   Frequency   p(x)       E = p(x).n
0                    14          0.255      7.395
1                    6           0.385      11.159
2                    4           0.249      7.217
3                    2           0.0894     2.593
4                    1           0.0193     0.559
5                    1           0.00249    0.0723
6                    0           0.000179   0.00519
7                    1           Tiny…      0.000160

Some expected values are too small: expected values should be 5 or more, so lump categories together!
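The expected counts can be recomputed from the binomial formula. This sketch uses the unrounded estimate p = 36/203, so the last digits may differ slightly from the table, which appears to work from rounded values of p(x).

```python
from math import comb

k = 7            # brood size
n = 29           # number of broods sampled
p_hat = 36 / 203  # estimated probability that a chick is an extra-pair chick

# p(x) for X ~ B(k, p_hat), then expected count E = p(x) * n
probs = [comb(k, x) * p_hat**x * (1 - p_hat) ** (k - x) for x in range(k + 1)]
expected = [px * n for px in probs]

for x, (px, e) in enumerate(zip(probs, expected)):
    print(f"{x}  p(x) = {px:.6f}  E = {e:.5f}")
```

Because the probabilities sum to 1, the expected counts sum to n = 29, which is a useful check on the arithmetic.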
Great tits in Wytham Woods: overdispersion.

There is more variability in the data than expected; males and females are different.

In the observed data there are more zeroes and more high values than expected.

This is similar to the spatial aggregation that can cause significant departure from the Poisson distribution.

Just as in the Poisson case, we call this overdispersion: essentially there is more variance than we are expecting.
Great tits in Wytham Woods: significant departure?

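Number of EPCs (x)   Observed   Expected
0                    14         7.395
1                    6          11.159
2                    4          7.217
>2                   5          3.229

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 10.689$$

df = (number of rows) – (number of parameters estimated from the data) – 1
df = 4 – 1 – 1 = 2 (the one estimated parameter is p)

This is a bit naughty, as all expected values should be greater than 5 and the lumped >2 category only reaches 3.229.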
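The lumping and the test statistic can be sketched as follows. The code recomputes the expected counts from the unrounded p = 36/203, so the result agrees with 10.689 only to about two decimal places; the variable names are illustrative.

```python
from math import comb

observed = [14, 6, 4, 2, 1, 1, 0, 1]  # broods with x = 0..7 extra-pair chicks
k = 7
n_broods = sum(observed)
p_hat = sum(x * f for x, f in enumerate(observed)) / (n_broods * k)

expected = [n_broods * comb(k, x) * p_hat**x * (1 - p_hat) ** (k - x)
            for x in range(k + 1)]

# Lump every category with more than 2 extra-pair chicks into a single ">2" row.
obs_lumped = observed[:3] + [sum(observed[3:])]
exp_lumped = expected[:3] + [sum(expected[3:])]

# Chi-squared statistic: sum of (O - E)^2 / E over the lumped rows.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(obs_lumped, exp_lumped))

# df = rows - parameters estimated from the data (just p) - 1
df = len(obs_lumped) - 1 - 1
print(obs_lumped, [round(e, 3) for e in exp_lumped], round(chi_squared, 3), df)
```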
Great tits in Wytham Woods: significant departure?
χ2 value at each P value (probability)

df    P = 0.10   P = 0.05   P = 0.01   P = 0.001
1     2.71       3.84       6.64       10.83
2     4.60       5.99       9.21       13.82
3     6.25       7.82       11.34      16.27
4     7.78       9.49       13.28      18.47
5     9.24       11.07      15.09      20.52
6     10.64      12.59      16.81      22.46
7     12.02      14.07      18.48      24.32
8     13.36      15.51      20.09      26.12
9     14.68      16.92      21.67      27.88
10    15.99      18.31      23.21      29.59

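With χ2 = 10.689 and df = 2, the statistic exceeds the critical value of 9.21 (P = 0.01) but not 13.82 (P = 0.001), so 0.001 < P < 0.01. Conclude that the data do not follow a binomial distribution.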
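The table lookup can be mirrored in code. The sketch below uses the df = 2 row of critical values and, as an extra check that is not on the slides, the fact that for df = 2 specifically the upper-tail probability of the chi-squared distribution is exp(-x/2). That shortcut does not hold for other degrees of freedom.

```python
from math import exp

chi_squared = 10.689
df = 2

# Critical values for df = 2, keyed by P value (from the table row for df = 2).
critical_df2 = {0.10: 4.60, 0.05: 5.99, 0.01: 9.21, 0.001: 13.82}

# Smallest tabulated P value whose critical value the statistic exceeds.
smallest_level = min(level for level, crit in critical_df2.items()
                     if chi_squared > crit)

# Exact upper-tail probability, valid only for df = 2.
p_value = exp(-chi_squared / 2)
print(smallest_level, round(p_value, 4))
```

Both routes agree that P lies between 0.001 and 0.01.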


Practical: the occurrence of males in ‘large’ lion litters

• Background: large coalitions of males can hold prides for longer. Thus male
reproductive success is highly correlated with the number of same-sex siblings.
This is not the case for females.
• Hypothesis: when litter size is large (4 or more cubs), there should be more litters
than expected with large numbers of male cubs.

• The data: the number of male lion cubs in 34 litters of size four was recorded in the Serengeti National Park*. The data are: 2 3 3 3 1 3 3 2 3 2 1 4 3 0 0 3 0 0 3 1 3 3 3 3 2 3 3 2 0 2 3 2 1 1
• Analysis: Use these data to test the distribution of male cubs among litters.
Interpret your results.
* Data from: Packer & Pusey (1987). Intrasexual cooperation and the sex ratio in African lions. American Naturalist. 130: 636-642.
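A possible first step for the practical is to tally the raw data into observed counts and estimate p, following the same route as the great tit example. This sketch assumes that the run of digits printed without spaces in the data is a sequence of single-digit litter counts, which gives the stated 34 litters. It stops short of the test itself, which is left as the exercise.

```python
from collections import Counter

# Number of male cubs in each litter of four (assumed single-digit values).
males_per_litter = [
    2, 3, 3, 3, 1, 3, 3, 2, 3, 2, 1, 4, 3, 0, 0, 3, 0, 0, 3, 1, 3,
    3, 3, 3, 2, 3, 3, 2, 0, 2, 3, 2, 1, 1,
]
k = 4  # litter size

n_litters = len(males_per_litter)
observed = Counter(males_per_litter)  # males per litter -> number of litters

# Estimate p as total males / total cubs, as for the extra-pair chicks.
p_hat = sum(males_per_litter) / (n_litters * k)

for x in range(k + 1):
    print(x, observed[x])
print(n_litters, round(p_hat, 3))
```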
