
The Gamma Distribution

The gamma distribution is a continuous probability distribution that is popular for a range of
phylogenetic applications. It is popular in part because it is a bit of a shape shifter that can
assume a range of shapes, from exponential to approximately normal. This flexibility results from
the fact that the gamma distribution has two parameters. In most phylogenetic applications, these
parameters are referred to as the shape parameter, α, and the rate parameter, β (note that in some
applications the rate parameter, β, is replaced by the "scale parameter," which is simply the
inverse of the rate parameter, 1/β).
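
As a minimal sketch of the rate versus scale relationship described above, the R snippet below evaluates the same gamma density twice with `dgamma()`, once with `rate` and once with `scale = 1/rate`. The values alpha = 1.25 and beta = 0.25 are borrowed from the SIMMAP defaults used in the scripts; the variable names are illustrative only.

```r
# Shape (alpha) and rate (beta) taken from the SIMMAP defaults used in the text
alpha <- 1.25
beta <- 0.25
x <- seq(0.5, 20, by = 0.5)

# Same density, parameterized by rate ...
d_rate <- dgamma(x, shape = alpha, rate = beta)
# ... and by scale, which is the inverse of the rate
d_scale <- dgamma(x, shape = alpha, scale = 1 / beta)

# TRUE if the two parameterizations agree
same_density <- isTRUE(all.equal(d_rate, d_scale))
print(same_density)
```

Note that `dgamma()` expects either `rate` or `scale`; supplying both with inconsistent values stops with an error.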

You can explore the impact of these two parameters on the gamma distribution using the following
R scripts (note that these scripts focus on plotting values that flank the default parameter
settings for the gamma distribution in the program SIMMAP 1.5):

Using the R script below, you can generate probability densities for the gamma distribution that
vary the value of the shape parameter (α) (Fig. 1). As you can see, varying α has a strong impact
on the shape of the gamma distribution. When α is a positive integer, the gamma distribution is
the distribution of the sum of α independent and identically distributed (i.i.d.) exponential
random variables (i.e., that have the same rate parameter, β). Accordingly, when α = 1, the gamma
collapses to an exponential distribution; when α >> 1, the gamma distribution increasingly
resembles a normal distribution.
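
The two claims above can be checked with a short R sketch. It first compares the gamma density at alpha = 1 with the exponential density, then sums alpha = 4 i.i.d. exponential draws and compares the result with the corresponding gamma distribution. The choice of alpha = 4, the sample size, and the seed are arbitrary assumptions made for illustration.

```r
# alpha = 1: the gamma density should match the exponential density
beta <- 0.25
x <- seq(0.5, 50, by = 0.5)
matches_exponential <- isTRUE(all.equal(dgamma(x, shape = 1, rate = beta),
                                        dexp(x, rate = beta)))
print(matches_exponential)

# Integer alpha: sum alpha i.i.d. exponential draws (one row per replicate)
set.seed(1)
alpha <- 4
n <- 10000
draws <- matrix(rexp(n * alpha, rate = beta), nrow = n, ncol = alpha)
sums <- rowSums(draws)

# Compare simulated quantiles and mean with the gamma(alpha, beta) values
probs <- c(0.1, 0.5, 0.9)
print(rbind(simulated = quantile(sums, probs),
            gamma = qgamma(probs, shape = alpha, rate = beta)))
print(c(simulated_mean = mean(sums), gamma_mean = alpha / beta))
```

The simulated quantiles should fall close to the theoretical ones, with small differences due to sampling noise.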

Figure 1: The impact of varying the shape parameter (alpha) on the gamma distribution.

#Generate a plot of gamma distributions that vary the shape parameter (alpha).

x <- seq(0, 100, length=200)

#Make probability density function for SIMMAP default gamma distribution
simmapDefaultGamma <- dgamma(x, shape=1.25, scale=1/0.25)

#Set up empty axes (type="n"); the curves themselves are added in the loop below
plot(x, simmapDefaultGamma, type="n", yaxs="i", xaxs="i", ylim=c(0,0.16), xlim=c(0,100), xlab="x value", ylab="Density", main="Probability density for gamma distribution with variable alpha and beta=0.25")

colors <- c("red", "black", "blue", "darkgreen", "purple", "orange")

alphas <- c(0.1, 1.25, 2, 4, 8, 10)

labels <- c("alpha=0.1", "alpha=1.25 (SIMMAP default)", "alpha=2", "alpha=4", "alpha=8", "alpha=10")

#dgamma() accepts rate or scale, but not both; the rate (beta) is fixed at 0.25
for(i in seq_along(alphas)) {
hx <- dgamma(x, shape=alphas[i], rate=0.25)
lines(x, hx, lwd=3, col=colors[i])
}

legend("topright", inset=.05, title="Probability densities", labels, lwd=3, col=colors)

Using the R script below, you can visualize the impact of varying the rate parameter (β) while
keeping the shape parameter (α) constant (Fig. 2). β has a strong impact on the spread of the
gamma distribution. When β is set to less than 1, we tend to observe relatively broad
distributions with long tails. As we increase the value of β, we observe increasingly tight
distributions. This effect stems from the fact that the variance of the gamma is α/β².
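
To make the effect of the rate on the spread concrete, the sketch below tabulates the variance α/β² for the beta values used in the script, alongside the width of the central 95% interval computed with `qgamma()`. The interval width is an illustrative summary chosen here, not something taken from SIMMAP.

```r
# Fixed shape (alpha) and the rate (beta) values plotted in the text
alpha <- 1.25
betas <- c(0.1, 0.25, 2, 4, 8, 10)

# Variance of the gamma distribution: alpha / beta^2
variance <- alpha / betas^2

# Width of the central 95% interval for each rate
width95 <- qgamma(0.975, shape = alpha, rate = betas) -
  qgamma(0.025, shape = alpha, rate = betas)

spread <- data.frame(beta = betas, variance = variance,
                     sd = sqrt(variance), width95 = width95)
print(spread)
```

Because β acts as an inverse scale, both the standard deviation and the interval width shrink in proportion to 1/β.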

Figure 2: The impact of varying the rate parameter (beta) on the gamma distribution.

#Generate a plot of gamma distributions that vary the rate parameter (beta) as in Figure 2.

x <- seq(0, 100, length=200)

#Make probability density function for SIMMAP default gamma distribution
simmapDefaultGamma <- dgamma(x, shape=1.25, scale=1/0.25)

plot(x, simmapDefaultGamma, type="l", yaxs="i", xaxs="i", ylim=c(0,0.9), xlim=c(0,70), xlab="x value", ylab="Density", main="Probability density for gamma distribution with a fixed alpha=1.25 and variable beta", lwd=2)

colors <- c("red", "black", "blue", "darkgreen", "purple", "orange")

betas <- c(0.1, 0.25, 2, 4, 8, 10)

labels <- c("beta=0.1", "beta=0.25 (SIMMAP default)", "beta=2", "beta=4", "beta=8", "beta=10")

#dgamma() accepts rate or scale, but not both; the shape (alpha) is fixed at 1.25
for(i in seq_along(betas)) {
hx <- dgamma(x, shape=1.25, rate=betas[i])
lines(x, hx, lwd=2, col=colors[i])
}

legend("topright", inset=.05, title="Probability densities", labels, lwd=2, col=colors)

For many phylogenetic applications of the gamma distribution (e.g., to accommodate variation in
substitution rate across sites, ASRV), the α and β parameters are constrained to be equal. You can
visualize the gamma distributions generated under these conditions using the R script below
(Fig. 3). Because the mean of the gamma distribution is α/β, this constraint ensures that the
gamma distribution has a mean of one. This is important when the gamma distribution is used as a
prior probability density on ASRV, as it retains the ability to interpret branch lengths as the
expected (mean) number of substitutions per site.
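
The following sketch checks the mean-of-one property by simulation for the alpha = beta values plotted in the script. It also lists the variance, which under this constraint reduces to α/β² = 1/α. The sample size and seed are arbitrary assumptions, and the column names are illustrative.

```r
# Values of alpha (= beta) used in the script in the text
set.seed(1)
alphas <- c(0.1, 0.5, 1, 5, 20)
n <- 100000

# Simulated mean of gamma draws with shape and rate set equal
sim_mean <- sapply(alphas, function(a) mean(rgamma(n, shape = a, rate = a)))

summary_table <- data.frame(alpha = alphas,
                            beta = alphas,
                            mean_formula = alphas / alphas,  # alpha / beta = 1
                            variance_formula = 1 / alphas,   # alpha / beta^2
                            simulated_mean = sim_mean)
print(summary_table)
```

The simulated means should all sit close to one, while the variance column shows that small values of alpha correspond to much greater variation around that mean.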

Figure 3: Gamma distributions with alpha and beta set equal to one another.

#Generate a plot of gamma distributions with alpha and beta equal to one another as in Figure 3.

x <- seq(0, 100, length=200)

#Make probability density function for SIMMAP default gamma distribution
simmapDefaultGamma <- dgamma(x, shape=1.25, scale=1/0.25)

#Draw the SIMMAP default in black as a reference curve
plot(x, simmapDefaultGamma, type="l", col="black", yaxs="i", xaxs="i", ylim=c(0,2), xlim=c(0,30), xlab="x value", ylab="Density", main="Probability density for gamma distribution with alpha and beta equal to one another", lwd=2)

colors <- c("red", "blue", "darkgreen", "purple", "orange")

alphas <- c(0.1, 0.5, 1, 5, 20)

betas <- c(0.1, 0.5, 1, 5, 20)

#The first label belongs to the reference curve; the rest match alphas and betas in order
labels <- c("alpha=1.25, beta=0.25 (SIMMAP default)", "alpha=0.1, beta=0.1", "alpha=0.5, beta=0.5", "alpha=1, beta=1", "alpha=5, beta=5", "alpha=20, beta=20")

#dgamma() accepts rate or scale, but not both
for(i in seq_along(betas)) {
hx <- dgamma(x, shape=alphas[i], rate=betas[i])
lines(x, hx, lwd=2, col=colors[i])
}

legend("topright", inset=.05, title="Probability densities", labels, lwd=2, col=c("black", colors))
