
Discrete and Continuous Distributions

Lecture Notes

Erkin Diyarbakirlioglu
IAE Gustave Eiffel
February 6, 2022

Table of contents

1 Discrete distributions

1.1 Bernoulli distribution

1.2 Geometric distribution

1.3 Binomial distribution

1.4 Poisson process and Poisson distribution

2 Continuous distributions

2.1 Normal distributions

2.2 Standard normal distribution

2.3 Lognormal distribution

2.4 Student’s 𝒕-distribution

2.5 Chi-square distribution

2.6 𝑭 distribution

1 Discrete distributions

1.1 Bernoulli distribution

1. A Bernoulli random variable 𝑋 takes the value 1 with probability 𝑝 and the value 0 with
probability 1 − 𝑝. The probability mass function (pmf) of a Bernoulli random variable is given by,

𝑓(𝑥; 𝑝) = 𝑝 if 𝑥 = 1, and 𝑓(𝑥; 𝑝) = 1 − 𝑝 if 𝑥 = 0    (1)

which can be also rewritten as,

𝑓(𝑥; 𝑝) = 𝑝^𝑥 (1 − 𝑝)^(1−𝑥)    (2)

The last expression shows that the Bernoulli distribution is a special case of the binomial
distribution with 𝑛 = 1. The expected value of 𝑋 is 𝐸(𝑋) = 1 × 𝑝 + 0 × (1 − 𝑝) = 𝑝 and its
variance is 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)² = 𝑝 − 𝑝² = 𝑝(1 − 𝑝), where we use 𝐸(𝑋²) =
(1² × 𝑝) + (0² × (1 − 𝑝)) = 𝑝. The Bernoulli variable plays an essential role in probability and
statistics because it is a building block for experiments in which one of the outcomes
can be regarded as a "success".
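These formulas can be verified numerically. Below is a minimal Python sketch (the helper name bernoulli_pmf is ours, not a library function):

```python
def bernoulli_pmf(x, p):
    # f(x; p) = p^x (1 - p)^(1 - x) for x in {0, 1}
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
mean = 1 * p + 0 * (1 - p)   # E(X) = p
var = p * (1 - p)            # Var(X) = p(1 - p)
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), mean, var)
```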

2. A Bernoulli process is a finite or infinite sequence of independent and identically distributed
Bernoulli random variables 𝑋1, 𝑋2, … Consider a game in which the player wins if a coin lands
heads, and assume the game is played several times. The outcome of each trial is independent
of earlier trials: no matter how many successes or failures have been observed previously, the
probability of a success on the next trial stays the same. The process has no memory. Likewise,
regardless of how many times the coin has been tossed, the odds of heads vs. tails remain
unchanged. Such a sequence of Bernoulli trials forms a set of independent and identically
distributed (iid) random variables.

1.2 Geometric distribution

3. Let 𝑋1, 𝑋2, … be a sequence of iid Bernoulli trials with success probability 𝑝. If the random
variable 𝑋 denotes the number of trials 𝑛 needed until the first success is observed, then 𝑋
follows a geometric distribution with pmf,

𝑓(𝑛; 𝑝) = 𝑃(𝑋 = 𝑛) = (1 − 𝑝)^(𝑛−1) 𝑝    (3)

It can be shown that 𝐸(𝑋) = 1⁄𝑝 and 𝑉𝑎𝑟(𝑋) = (1 − 𝑝)⁄𝑝².


4. As an alternative way of understanding the geometric distribution, consider again its
definition. Since the random variable describes the number of trials until the first success is
observed, all previous trials must have failed to deliver the outcome of interest. In a sequence
of 𝑛 trials in which the first 𝑛 − 1 trials yield 0 (failure) and the 𝑛th trial yields 1 (success),
the scheme is: 0 0 … 0 (the first 𝑛 − 1 trials) followed by 1 (the 𝑛th trial). Since the trials are
independent, the probability of seeing such a sequence of failures, each with probability 1 − 𝑝,
followed by a success with probability 𝑝, is given by the multiplication rule for independent
events:

𝑃((𝑋1 = 0) ∩ ⋯ ∩ (𝑋𝑛−1 = 0) ∩ (𝑋𝑛 = 1)) = 𝑃(𝑋1 = 0) × ⋯ × 𝑃(𝑋𝑛−1 = 0) × 𝑃(𝑋𝑛 = 1)
                                          = (1 − 𝑝)^(𝑛−1) 𝑝

Example. Suppose you roll a die repeatedly. Let's calculate the probability of getting a 6 for the
first time on the 5th roll. Using the geometric distribution, 𝑃(𝑋 = 5) = (1 − 1⁄6)^(5−1) × (1⁄6) = 8.0375%.
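The same calculation can be reproduced in Python (a small sketch using only the standard library):

```python
# P(first 6 on the 5th roll) via the geometric pmf f(n; p) = (1 - p)^(n - 1) p
p = 1 / 6
n = 5
prob = (1 - p) ** (n - 1) * p
print(round(prob, 6))  # 0.080376
```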

1.3 Binomial distribution

5. Let 𝑋1, 𝑋2, … , 𝑋𝑛 be a sequence of iid Bernoulli trials with success probability 𝑝. If 𝑋
denotes the number of successes observed out of the 𝑛 trials, then 𝑋 is said to follow a binomial
distribution, denoted 𝑋 ∼ 𝐵(𝑛, 𝑝), with pmf,

𝑓(𝑥; 𝑛, 𝑝) = 𝑃(𝑋 = 𝑥) = C(𝑛, 𝑥) 𝑝^𝑥 (1 − 𝑝)^(𝑛−𝑥)    (4)

where C(𝑛, 𝑥) denotes the binomial coefficient,

C(𝑛, 𝑥) = 𝑛! ⁄ (𝑥! (𝑛 − 𝑥)!)

The mean of 𝑋 is 𝐸(𝑋) = 𝑛𝑝 and its variance 𝑉𝑎𝑟(𝑋) = 𝑛𝑝(1 − 𝑝).

Example. Assume you roll a fair die 10 times. The probability of observing a 6 exactly three
times can be calculated with the binomial model as

𝑃(𝑋 = 3) = C(10, 3) (1⁄6)³ (1 − 1⁄6)^(10−3) = 0.1550
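The binomial calculation above can be checked in Python with the standard library's math.comb:

```python
from math import comb

# P(X = 3) for X ~ B(10, 1/6): exactly three 6s in ten rolls of a fair die
n, x, p = 10, 3, 1 / 6
prob = comb(n, x) * p ** x * (1 - p) ** (n - x)
print(round(prob, 4))  # 0.155
```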

6. The shape of binomial distribution is directly related to 𝑛 and 𝑝. Specifically,

▪ For small 𝑝 and small 𝑛, the distribution is right skewed. The probabilities cluster in the
smaller numbers 0, 1, 2, … and the distribution tails off to the right.
▪ For large 𝑝 and small 𝑛, the distribution is left skewed.
▪ For 𝑝 = 0.5 and large or small 𝑛, the distribution is symmetric around its mean.
▪ For small 𝑝 and large 𝑛, the distribution approaches symmetry.

1.4 Poisson process and Poisson distribution

7. A Poisson process is a sequence of random variables that satisfies the following properties:

▪ The outcome of each random variable can be classified as a success or a failure.
▪ The average number of successes that occur within a specified region or period is known.
▪ The probability of a success is proportional to the size of the region.
▪ The probability of a success in an extremely small region is virtually zero.

8. A Poisson random variable 𝑋 describes the number of successes that result from a Poisson
process. Its distribution is called the Poisson distribution. The Poisson distribution applies when
(1) the event of interest is something that can be counted; (2) occurrences are independent; and
(3) the average rate at which events occur is known and constant. The pmf of the Poisson distribution is,

𝑓(𝑥; 𝜆) = 𝑃(𝑋 = 𝑥) = 𝜆^𝑥 𝑒^(−𝜆) ⁄ 𝑥!    (5)

where 𝜆 is a positive real number and 𝑒 = 2.718281 … is the base of the natural logarithm. The
parameter 𝜆 defines the average number of events per interval. The pmf is typically read as
the “probability of observing 𝑥 events per period”. The mean and the variance of a Poisson random
variable are 𝐸(𝑋) = 𝑉𝑎𝑟(𝑋) = 𝜆.

Example. Suppose an e-mail account receives on average 10 e-mails per day. What is the
probability that the same account receives 12 e-mails the next day? Using the Poisson distribution,
we get 𝑃(𝑋 = 12) = 9.478%.
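The Poisson calculation can be verified with a short Python sketch:

```python
from math import exp, factorial

# P(X = 12) for X ~ Poisson(10), using f(x; lam) = lam^x e^(-lam) / x!
lam, x = 10, 12
prob = lam ** x * exp(-lam) / factorial(x)
print(round(prob, 5))  # 0.09478
```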

9. Poisson approximation to the binomial distribution. The Poisson distribution arises as a
limiting case of a binomial random variable when the number of independent trials is large and
the probability of success is small. In this case, the distribution of 𝑋 ∼ 𝐵(𝑛, 𝑝) can be substituted
by a Poisson distribution with parameter 𝜆 = 𝑛𝑝, i.e. 𝑋 ∼ 𝐵(𝑛, 𝑝) → 𝑋 ∼ 𝑃𝑜𝑖(𝜆) for 𝑛 sufficiently
large. Mathematically,

lim (𝑛→∞, 𝑝→0) { C(𝑛, 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘) } = (𝑛𝑝)^𝑘 𝑒^(−𝑛𝑝) ⁄ 𝑘! = 𝜆^𝑘 𝑒^(−𝜆) ⁄ 𝑘!    (6)

See the appendix for details.
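The quality of the approximation can be checked numerically; the values of 𝑛, 𝑝 and 𝑘 below are illustrative choices, not taken from the text:

```python
from math import comb, exp, factorial

# Exact binomial pmf vs. its Poisson approximation for large n, small p
n, p, k = 1000, 0.01, 12
lam = n * p
binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
pois = lam ** k * exp(-lam) / factorial(k)
print(abs(binom - pois) < 0.005)  # True
```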

2 Continuous distributions

10. Unlike discrete random variables, whose possible outcomes are countable, realizations from
continuous random variables are not countable. Examples of continuous data include the amount
of rain, in centimeters, that falls in a randomly selected storm; the weight, in kilograms, of a
randomly selected student; or the square footage of a randomly selected three-bedroom house. In
each of these examples, the resulting measurement comes from an interval of possible outcomes.
Recall that the measurement tool is often the restricting factor with continuous data. That is, if I
say I weigh 80 kilograms, I don't actually weigh exactly 80 kilograms... that's just what my scale
tells me. In reality, I might weigh 80.1284027401307 kilograms... that's where the interval of
possible outcomes comes in. That is, the possible measurements cannot be put into one-to-one
correspondence with the integers.1

2.1 Normal distributions

11. Among all the statistical distributions, the Normal distribution is perhaps the best known and
the most commonly used in practice. Many random variables are nearly normal, even though none
of them is exactly normal. The normal distribution is also very popular because it provides very
close approximations for a variety of scenarios and is at the heart of the sampling theory. The
density function of a normal random variable 𝑋 is specified as follows,

𝑓𝑋(𝑥; 𝜇, 𝜎) = (1 ⁄ (𝜎√2𝜋)) exp{−(1⁄2)((𝑥 − 𝜇)⁄𝜎)²}    (7)

where 𝜋 = 3.1415… and exp{𝑥} = 𝑒 𝑥 is the exponential function with exp{1} = 𝑒 1 = 2.7182 …
The distribution function of 𝑋 is given by,

𝐹𝑋(𝑥; 𝜇, 𝜎) = ∫_(−∞)^(𝑥) (1 ⁄ (𝜎√2𝜋)) exp{−(1⁄2)((𝑡 − 𝜇)⁄𝜎)²} 𝑑𝑡    (8)

1 This paragraph is taken from https://newonlinecourses.science.psu.edu/stat414/node/87/.


A normally distributed random variable 𝑋 is characterized by its two parameters, namely its mean
𝜇 and standard deviation 𝜎, and typically denoted as 𝑋 ∼ 𝑁(𝜇, 𝜎).2

12. The figure below shows the pdf and the cdf of a normal distribution with 𝜇 = 50 and 𝜎 = 10.

Figure 1. Normal pdf and cdf for 𝑿 ∼ 𝑵(𝝁 = 𝟓𝟎, 𝝈 = 𝟏𝟎)

The mean of 𝑋 shows the central location of the random variable. The distribution is symmetric:
the mean divides the full density into two equal portions, with 50% of the total probability below
the mean and 50% above it. In addition, the median and the mode of a normal random variable
are both equal to its mean. The symmetry property also implies that its skewness is zero.

The standard deviation 𝜎 of 𝑋 shows the magnitude of the spread of the different values of 𝑋
around 𝜇. It can be shown that approximately 68% of the area under the bell-shaped curve of a
normal random variable lies within one standard deviation of the mean, i.e. 𝜇 ± 𝜎. The interval
𝜇 ± 2𝜎 contains about 95% of the total probability, and the interval 𝜇 ± 3𝜎 covers about 99.7%
of the total probability. Thus, observations that fall more than 3 standard deviations away from
the mean are typically considered unlikely under the normal distribution.
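These coverage figures can be verified with the standard normal cdf, which Python's math.erf makes available without any statistical package:

```python
from math import erf, sqrt

# Standard normal cdf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# probability mass within 1, 2 and 3 standard deviations of the mean
within = [phi(j) - phi(-j) for j in (1, 2, 3)]
print([round(w, 4) for w in within])  # [0.6827, 0.9545, 0.9973]
```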

13. Linear transformations of a normal random variable also follow a normal distribution.
Specifically, if 𝑎 and 𝑏 are two constants and 𝑋 ∼ 𝑁(𝜇, 𝜎), then the linear transformation
𝑌 = 𝑎𝑋 + 𝑏 is normally distributed as 𝑌 ∼ 𝑁(𝑎𝜇 + 𝑏, |𝑎|𝜎). If 𝑋 and 𝑌 are two independent
normal random variables, then their sum is also a normal random variable, distributed as

(𝑋 + 𝑌) ∼ 𝑁(𝜇𝑋 + 𝜇𝑌, √(𝜎𝑋² + 𝜎𝑌²)).

2 Some authors prefer using the variance 𝜎 2 instead of the standard deviation.
14. To understand how the pdf of 𝑋 changes as 𝜇 and/or 𝜎 change, observe the following figure
which shows the densities of two normal random variables 𝑋1 ∼ 𝑁(𝜇 = 0, 𝜎 = 1) and 𝑋2 ∼
𝑁(𝜇 = 10, 𝜎 = 2). As it can be readily seen on the graph, changing the mean of the distribution
shifts the bell-shaped curve to the left if the mean decreases, or to the right if the mean increases.
Changing the standard deviation stretches or constricts the curve. The standard deviation of 𝑋1 is
much lower than that of 𝑋2 . The result is that there is less dispersion for 𝑋1 around its mean
compared to the dispersion of the values that 𝑋2 can take.

Figure 2. Normal distributions

15. The corollary is that when comparing two normal random variables that differ substantially
in their means and/or standard deviations, it is useful to transform the variables first, since
differences in location and/or scale can make direct comparison difficult. This task is
accomplished by using the standard normal distribution, which I discuss below.

2.2 Standard normal distribution

16. The standard normal distribution is a special normal distribution with zero mean and unit
standard deviation. Thus, setting 𝜇 = 0 and 𝜎 = 1 in the normal density function, we obtain the
standard normal density function as,

𝑓𝑍(𝑧) = (1 ⁄ √2𝜋) exp{−𝑧² ⁄ 2}    (9)

In most textbooks, 𝑓𝑍 (𝑧) is denoted as 𝜙(𝑧). The standard normal distribution function is,

𝐹𝑍(𝑧) = 𝑃(𝑍 ≤ 𝑧) = ∫_(−∞)^(𝑧) 𝜙(𝑢) 𝑑𝑢    (10)

which is often denoted Φ(𝑧). A standard normal random variable is written 𝑍 ∼ 𝑁(0,1).

17. Any normal random variable 𝑋 with mean 𝜇 and standard deviation 𝜎 can be transformed into
a standard normal random variable 𝑍 via

𝑧 = (𝑥 − 𝜇) ⁄ 𝜎    (11)

where 𝑥 is any observation drawn from 𝑋 ∼ 𝑁(𝜇, 𝜎). We generally call 𝑧 the standardized score
or the 𝑧-score of 𝑥. Given the formula, the z-score of 𝑥 is the distance between the 𝑥 value and 𝜇 in
terms of the standard deviation of 𝑋. Thus, 𝑧 tells us how many standard deviations 𝑥 falls below
or above the mean.

18. As an example, consider a normal random variable 𝑋 with mean 120 and standard deviation
20. You are given three observations drawn from 𝑋: 𝑥1 = 90, 𝑥2 = 120 and 𝑥3 = 135. Figure 3
shows the corresponding normal curve and the locations of 𝑥1 , 𝑥2 and 𝑥3 on the x-axis.

Figure 3. z transformation

𝑥1 lies below the mean of 𝑋, 𝑥2 is equal to 𝜇, and 𝑥3 lies above the mean. The corresponding
z-scores are 𝑧1 = (𝑥1 − 𝜇)⁄𝜎 = (90 − 120)⁄20 = −1.5, 𝑧2 = 0 and 𝑧3 = 0.75. These 𝑧-scores
can be read as follows: “𝑥1 is 1.5 standard deviations below the mean”, or “𝑥3 is 0.75 standard
deviations above the mean”. Since 𝑥2 = 𝜇, its 𝑧-score is automatically zero.
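The three z-scores can be reproduced in one line of Python:

```python
# z-scores of the three observations drawn from X ~ N(mu = 120, sigma = 20)
mu, sigma = 120, 20
z_scores = [(x - mu) / sigma for x in (90, 120, 135)]
print(z_scores)  # [-1.5, 0.0, 0.75]
```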

The interpretation of 𝑧-score as a normalized distance from the mean is also helpful in discussing
how unlikely any observation is given a normal distribution.

Example. Consider two normal random variables 𝑋 ∼ 𝑁(100, 20) and 𝑌 ∼ 𝑁(1000, 120). You are
given two observations, 𝑥 = 150 and 𝑦 = 1500, drawn from 𝑋 and 𝑌 respectively. Both 𝑥 and 𝑦
lie above the mean of their population, but which one is relatively farther above? 𝑥 and 𝑦 cannot
be compared directly since they do not come from the same distribution. If, however, one
calculates their standardized scores, we get 𝑧𝑥=150 = 2.5 and 𝑧𝑦=1500 = 4.16. In other words,
𝑥 is 2.5 standard deviations above its mean while 𝑦 is 4.16 standard deviations above its mean.
Therefore, 𝑦 is a more unlikely observation than 𝑥.

19. Using the normal probability table. The normal probability table is a simple tool for
evaluating the standard normal cdf at a given point, or for working backwards to find the
standard normal percentile corresponding to a given cdf value. Let 𝑍 ∼ 𝑁(0,1). Using the normal
probability table, calculate first the following probabilities: (1) 𝑃(𝑍 ≤ 0.55); (2) 𝑃(𝑍 ≥ 1.65); (3)
𝑃(𝑍 ≤ −1.65); (4) 𝑃(−1.96 ≤ 𝑍 ≤ 1.96). The table can also be used to calculate standard
normal quantiles. For example, find 𝑧 in each case: (1) Φ(𝑧) = 0.0307; (2) Φ(𝑧) = 0.9918;
(3) 1 − Φ(𝑧) = 0.8962; (4) 𝑃(−𝑧 ≤ 𝑍 ≤ 𝑧) = 0.95.
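For readers without a printed table at hand, the first four probabilities can be checked with a short Python sketch built on math.erf:

```python
from math import erf, sqrt

# Standard normal cdf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(0.55), 4))               # 0.7088 = P(Z <= 0.55)
print(round(1 - phi(1.65), 4))           # 0.0495 = P(Z >= 1.65)
print(round(phi(-1.65), 4))              # 0.0495 = P(Z <= -1.65)
print(round(phi(1.96) - phi(-1.96), 4))  # 0.95   = P(-1.96 <= Z <= 1.96)
```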

2.3 Lognormal distribution3

20. A random variable 𝑌 follows a lognormal distribution if its logarithm is normally distributed.
Formally, a lognormal random variable is denoted 𝑌 ∼ ln 𝑁(𝜇, 𝜎); then 𝑋 = ln 𝑌 is normal with
𝑋 ∼ 𝑁(𝜇, 𝜎). The median of a lognormal random variable is exp(𝜇) while its expected value is
𝐸(𝑌) = exp(𝜇 + 𝜎²⁄2). With a lognormal distribution, 𝜇 is no longer a location parameter
because the support of the distribution is fixed to start at 0.4

21. Relationship between simple and log returns. Let 𝑆𝑡 be the time-𝑡 price of a security. The
one-period simple return is defined as 𝑌𝑡 = (𝑆𝑡 ⁄ 𝑆𝑡−1) − 1 and the one-period log return as
𝑦𝑡 = ln(𝑆𝑡 ⁄ 𝑆𝑡−1). Thus, 𝑌𝑡 = exp(𝑦𝑡) − 1 and 𝑦𝑡 = ln(𝑌𝑡 + 1). If 1 + 𝑌𝑡 is lognormal, then
𝑦𝑡 ∼ 𝑁(𝜇𝑦, 𝜎𝑦). The expected value and variance of simple and log returns are related by,

𝐸(𝑌) = 𝜇𝑌 = exp(𝜇𝑦 + 𝜎𝑦²⁄2) − 1    (12)

3 This section can be skipped for an undergrad course.


4 The mean 𝜇 is henceforth a scale parameter and the log-standard deviation 𝜎 a shape parameter.
𝑉𝑎𝑟(𝑌) = 𝜎𝑌² = exp(2𝜇𝑦 + 𝜎𝑦²) (exp(𝜎𝑦²) − 1)
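A small Python sketch of eq. (12); the values of 𝜇𝑦 and 𝜎𝑦 are purely illustrative assumptions:

```python
from math import exp

# Moments of the simple return implied by eq. (12); the values of
# mu_y and sigma_y below are purely illustrative assumptions.
mu_y, sigma_y = 0.01, 0.05
mean_Y = exp(mu_y + sigma_y ** 2 / 2) - 1
var_Y = exp(2 * mu_y + sigma_y ** 2) * (exp(sigma_y ** 2) - 1)
print(round(mean_Y, 6), round(var_Y, 6))
```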

Consider now the multiperiod log return from time 𝑡 − 𝑚 to 𝑡, i.e. 𝑦𝑡(𝑚) = ln(𝑆𝑡 ⁄ 𝑆𝑡−𝑚) =
𝑦𝑡 + 𝑦𝑡−1 + ⋯ + 𝑦𝑡−𝑚+1. If the sequence {𝑦𝑡−𝑚+1, … , 𝑦𝑡} consists of iid normal random
variables, then their sum is also normal, with 𝑦𝑡(𝑚) ∼ 𝑁(𝑚𝜇𝑦, √𝑚 𝜎𝑦).

22. Calculating price bounds. Let 𝑆0 be the current price of a security. The objective is to estimate
a confidence interval for the price 𝑆𝑇 at a future time 𝑇. Assume that one-period log returns are
normal, 𝑦𝑡 ∼ 𝑁(𝜇𝑦, 𝜎𝑦), and that the terms of the sum ln(𝑆𝑇 ⁄ 𝑆0) = 𝑦1 + 𝑦2 + ⋯ + 𝑦𝑇 are iid.
The distribution of this multiperiod return is thus ln(𝑆𝑇 ⁄ 𝑆0) ∼ 𝑁(𝑇𝜇𝑦, √𝑇 𝜎𝑦). Since a sum of
iid normal random variables is itself normal, the following transformation of ln(𝑆𝑇 ⁄ 𝑆0),

𝑍𝑇 = (ln(𝑆𝑇 ⁄ 𝑆0) − 𝜇𝑦𝑇) ⁄ (𝜎𝑦√𝑇) ∼ 𝑁(0,1)    (13)

follows a standard normal distribution. Using this insight, lower and upper bounds for the time-𝑇
price at the 1 − 𝛼 confidence level can be expressed as
price at 1 − 𝛼 confidence can be expressed as

𝑃(𝜇𝑦𝑇 − Φ⁻¹(1 − 𝛼⁄2) 𝜎𝑦√𝑇 ≤ ln(𝑆𝑇 ⁄ 𝑆0) ≤ 𝜇𝑦𝑇 + Φ⁻¹(1 − 𝛼⁄2) 𝜎𝑦√𝑇) = 1 − 𝛼    (14)

or, equivalently,

𝑃(𝑆0 exp(𝜇𝑦𝑇 − Φ⁻¹(1 − 𝛼⁄2) 𝜎𝑦√𝑇) ≤ 𝑆𝑇 ≤ 𝑆0 exp(𝜇𝑦𝑇 + Φ⁻¹(1 − 𝛼⁄2) 𝜎𝑦√𝑇)) = 1 − 𝛼    (15)

Using historical data, the procedure consists first of estimating the expected value and variance of
the log returns of a security. Then, assuming those estimates are time-invariant (in fact, a strong
assumption), the lower and upper bounds can be calculated numerically.
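The procedure can be sketched in Python with the standard library's statistics.NormalDist; all parameter values below are illustrative assumptions, not estimates from data:

```python
from math import exp, sqrt
from statistics import NormalDist

# 95% bounds for S_T from eq. (15); the parameter values below
# (S0, mu_y, sigma_y, T) are purely illustrative assumptions
S0, mu_y, sigma_y, T, alpha = 100.0, 0.0005, 0.01, 252, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # ~ 1.96
drift, spread = mu_y * T, z * sigma_y * sqrt(T)
lower = S0 * exp(drift - spread)
upper = S0 * exp(drift + spread)
print(round(lower, 2), round(upper, 2))
```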

2.4 Student’s 𝒕-distribution

23. A random variable follows a t-distribution5 with 𝑘 degrees of freedom if its probability
density function is given by,

𝑓(𝑥; 𝑘) = (Γ((𝑘 + 1)⁄2) ⁄ (Γ(𝑘⁄2)√(𝜋𝑘))) (1 + 𝑥²⁄𝑘)^(−(𝑘+1)⁄2)    (16)

where

Γ(𝑧) = ∫_(0)^(∞) 𝑥^(𝑧−1) 𝑒^(−𝑥) 𝑑𝑥    (17)

is the gamma function, defined for complex 𝑧 with a positive real part. The distribution is
centered at and symmetric around 0, so 𝐸(𝑋) = 0. The variance is 𝜎² = 𝑘⁄(𝑘 − 2), which is
not defined for 𝑘 ≤ 2. The symmetry property implies that the skewness coefficient is 0. It can
be shown graphically that the distribution converges to a standard normal distribution as 𝑘 gets
larger. In contrast, for small values of 𝑘, especially lower than 20, the 𝑡-distribution exhibits
fatter tails than what is observed under a normal model.6

24. To see how a 𝑡 distribution gets closer to the standard normal density as 𝑘 increases, consider
the following figure. The first chart at the top plots a 𝑡-distribution with 𝑘 = 4 degrees of freedom
and the standard normal distribution on the same axis. The chart below plots a 𝑡-distribution with
𝑘 = 20 and the standard normal distribution. Notice first the symmetry of the distribution around
0 regardless of the value of its parameter. It is always centered at 0, just like the standard normal
distribution. In addition, as 𝑘 increases it becomes difficult to distinguish the 𝑡-distribution from
a standard normal distribution. However, for small values of 𝑘 the 𝑡 model exhibits fatter tails
than the normal model.

5 The t-distribution is also known as Student’s t-distribution, after the British mathematician William Sealy Gosset,
who published under the pen name “Student”.
6 It can be shown that for a 𝑡 random variable with 𝑘 > 4 degrees of freedom, the excess
kurtosis is equal to 6⁄(𝑘 − 4).
Figure 4. Normal vs. t-distribution (left panel: standard normal vs. t(df = 4); right panel:
standard normal vs. t(df = 20))

25. The 𝑡 distribution can be used with any statistic that has a bell-shaped distribution if any of
the following conditions are met:

▪ The data come from a normal population.
▪ The population distribution is symmetric, unimodal, without outliers, and the sample size
is larger than 30.
▪ The population distribution is moderately skewed, unimodal, without outliers, and the
sample size is larger than 40.
▪ The sample size is larger than 40, without outliers.

The 𝑡 distribution should not be used with small samples from populations that are not
approximately normal.

26. Probability calculations that involve the t-distribution can be done using statistical software;
MS Excel has several built-in functions for this purpose. Yet, as is the case for the normal
distribution, values of the cdf of 𝑡 random variables at various degrees of freedom are provided
in a 𝑡 probability table.

Example. Let 𝑋 ∼ 𝑡𝑘=8. What is 𝑃(𝑋 ≤ 1.8595)? First, we look at the first column of the table
and scroll down until we find the row for 𝑘 = 8. On that row, we find the value 1.8595. Going
upward, we see that the corresponding upper-tail probability is 0.05, i.e. 𝑃(𝑋 ≥ 1.8595) = 0.05.
Therefore, 𝑃(𝑋 ≤ 1.8595) = 1 − 𝑃(𝑋 ≥ 1.8595) = 1 − 0.05 = 0.95.
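Without a statistical package, the table value can be verified by integrating the density of eq. (16) numerically; the trapezoidal rule below is one simple way to do it:

```python
from math import gamma, pi, sqrt

# t density with k degrees of freedom, eq. (16)
def t_pdf(x, k):
    c = gamma((k + 1) / 2) / (gamma(k / 2) * sqrt(pi * k))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

# P(X <= 1.8595) for k = 8: 0.5 plus a trapezoidal integral over [0, 1.8595]
k, b, steps = 8, 1.8595, 20000
h = b / steps
area = sum(t_pdf(i * h, k) + t_pdf((i + 1) * h, k) for i in range(steps)) * h / 2
cdf = 0.5 + area
print(round(cdf, 3))  # 0.95
```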

Example. Let 𝑋 ∼ 𝑡𝑘=8. What is 𝑃(𝑋 ≤ 2)? On the row 𝑘 = 8, the number 2 is not explicitly
listed, but we know that 1.8595 < 2 < 2.306. For 𝑥 = 1.8595, we found 𝑃(𝑋 ≤ 1.8595) = 0.95.
Likewise, for 𝑥 = 2.306, the table shows 𝑃(𝑋 ≥ 2.306) = 0.025; therefore, 𝑃(𝑋 ≤ 2.306) =
1 − 0.025 = 0.975. What, then, is 𝑃(𝑋 ≤ 2)? Even if we cannot read the exact probability from
the t-table, we can conclude that it lies between 0.95 and 0.975 since 2 is between 1.8595
and 2.306.

Example. Let 𝑋 ∼ 𝑡𝑘=20. Let's calculate 𝑃(𝑋 ≤ −1.5). Scrolling down to the row 𝑘 = 20 and
then going right, we cannot find −1.5; the table lists only positive values. Using the symmetry
property, note that 𝑃(𝑋 ≤ −1.5) = 𝑃(𝑋 ≥ 1.5). We then continue as in the previous example.
You should find that 𝑃(𝑋 ≤ −1.5) lies between 0.05 and 0.10.

2.5 Chi-square distribution

27. The 𝜒² distribution is a right-skewed continuous distribution. It is characterized by one
parameter, its degrees of freedom 𝑘. The density function 𝑓(𝑥) of a 𝜒² random variable with 𝑘
degrees of freedom is

𝑓(𝑥) = (1 ⁄ (2^(𝑘⁄2) Γ(𝑘⁄2))) 𝑥^(𝑘⁄2 − 1) 𝑒^(−𝑥⁄2)    (18)

where

Γ(𝑧) = ∫_(0)^(∞) 𝑥^(𝑧−1) 𝑒^(−𝑥) 𝑑𝑥    (19)

is the gamma function, defined for complex 𝑧 with a positive real part. A random variable 𝑋
that follows a 𝜒² distribution with 𝑘 degrees of freedom is denoted 𝑋 ∼ 𝜒𝑘². The mean and
variance of 𝑋 are 𝐸(𝑋) = 𝑘 and 𝑉𝑎𝑟(𝑋) = 2𝑘.

28. A 𝜒² distribution with 𝑘 degrees of freedom can be derived as the sum of 𝑘 squared
independent standard normal random variables. Specifically, if 𝑋𝑖 ∼ 𝑁(0,1), it can be shown that

𝑋1² + ⋯ + 𝑋𝑘² ∼ 𝜒𝑘²    (20)

Therefore, if 𝑋 ∼ 𝜒𝑘², by linearity of expectation we have 𝐸(𝑋) = 𝑘. In addition, since

𝑉𝑎𝑟(𝑋) = 𝑉𝑎𝑟(𝑋1²) + ⋯ + 𝑉𝑎𝑟(𝑋𝑘²) = 𝑘 (𝐸(𝑋1⁴) − 𝐸(𝑋1²)²)

it can be shown that 𝑉𝑎𝑟(𝑋) = 2𝑘.
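This construction is easy to check by simulation; the following Python sketch (with an arbitrary seed and 𝑘 = 5) compares the empirical mean and variance with 𝑘 and 2𝑘:

```python
import random
from statistics import fmean, pvariance

# Empirical check of eq. (20): a sum of k squared iid N(0, 1) draws
# should have mean close to k and variance close to 2k
random.seed(42)
k, reps = 5, 50000
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(reps)]
print(round(fmean(draws), 2), round(pvariance(draws), 1))
```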

29. The shape of a 𝜒 2 density changes with 𝑘. For small values of 𝑘, the distribution is asymmetric.
As 𝑘 gets larger, the 𝜒 2 density tends to be symmetric. The following figure plots three 𝜒 2
distributions at 4, 6 and 10 degrees of freedom.

Figure 5. 𝝌𝟐 density function for three different values of 𝒌

30. Probabilities using the 𝜒² distribution can be calculated either with software or with the 𝜒²
probability table. For a given value of 𝑘, the entries in the table show the percentile of the
chi-square random variable matching the upper-tail probabilities. For example, let 𝑋 follow a 𝜒²
distribution with 𝑘 = 10. Then the area to the right of, say, 3.94 is 95% of the density,
𝑃(𝑋 ≥ 3.94) = 0.95. Consequently, the cdf at 3.94 is 𝑃(𝑋 ≤ 3.94) = 1 − 𝑃(𝑋 ≥ 3.94) = 0.05.
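The table value can be verified by numerically integrating the density of eq. (18); the step count below is an arbitrary choice:

```python
from math import exp, gamma

# Chi-square density with k degrees of freedom, eq. (18)
def chi2_pdf(x, k):
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

# P(X <= 3.94) for k = 10, by trapezoidal integration over [0, 3.94]
k, b, steps = 10, 3.94, 20000
h = b / steps
p = sum(chi2_pdf(i * h, k) + chi2_pdf((i + 1) * h, k) for i in range(steps)) * h / 2
print(round(p, 3))  # 0.05
```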

2.6 𝑭 distribution

31. The 𝐹 distribution has two parameters, called its degrees of freedom 𝑘1 and 𝑘2. A random
variable 𝑋 follows an 𝐹 distribution if it can be expressed as

𝑋 = (𝑌1 ⁄ 𝑘1) ⁄ (𝑌2 ⁄ 𝑘2)    (21)

where 𝑌1 and 𝑌2 are two chi-square random variables with 𝑘1 and 𝑘2 degrees of freedom,
respectively. We denote the distribution of 𝑋 as 𝑋 ∼ 𝐹𝑘1,𝑘2. The support of an 𝐹 random
variable is [0, ∞). The probability density function of an 𝐹 random variable with 𝑘1 and 𝑘2
degrees of freedom is

𝑓(𝑥) = 𝑐 𝑥^(0.5𝑘1 − 1) (1 + (𝑘1⁄𝑘2)𝑥)^(−0.5(𝑘1 + 𝑘2)) if 𝑥 ∈ [0, ∞), and 𝑓(𝑥) = 0 otherwise    (22)

where 𝑐 is a constant defined as,

𝑐 = (𝑘1 ⁄ 𝑘2)^(0.5𝑘1) ⁄ Β(𝑘1⁄2, 𝑘2⁄2)    (23)

and Β(⋅) is the beta function, i.e. Β(𝑥, 𝑦) = (Γ(𝑥)Γ(𝑦))⁄Γ(𝑥 + 𝑦) and Γ(𝑥) the gamma function.

32. The expected value of an 𝐹 random variable is

𝐸(𝑋) = 𝑘2 ⁄ (𝑘2 − 2)    (24)

for 𝑘2 > 2, and the variance is

𝑉𝑎𝑟(𝑋) = 2𝑘2² (𝑘1 + 𝑘2 − 2) ⁄ (𝑘1 (𝑘2 − 2)² (𝑘2 − 4))    (25)

which is defined only for 𝑘2 > 4.

33. The shape of an 𝐹 density depends on both 𝑘1 and 𝑘2. To understand the effect of each
degrees-of-freedom parameter separately, we consider two cases. In the first case, we hold the
second parameter 𝑘2 constant and see what happens as 𝑘1 increases. In the second case, we do
the converse, holding 𝑘1 constant and modifying the value of 𝑘2.

With 𝑘2 fixed, the mean of the distribution stays at 𝐸(𝑋) = 𝑘2 ⁄ (𝑘2 − 2); increasing the first
degrees-of-freedom parameter shifts the density towards this central location. If we instead keep
𝑘1 fixed and increase 𝑘2, say from 4 to 20, the mean of the random variable decreases and the
density shifts from the tails to the center.

Example. What happens when we increase both 𝑘1 and 𝑘2 ? Try it on a spreadsheet. MS Excel has
a built-in function for the 𝐹 distribution.

34. Probability calculations involving the 𝐹 distribution, as with most other continuous random
variables, are carried out using software or the 𝐹 probability table. Most statistical software
contains built-in functions; MS Excel has two such functions, F.DIST() and F.INV(), which
return, respectively, the distribution and the quantile functions of 𝐹 random variables for
specified degrees of freedom and/or probabilities. As far as the 𝐹 probability table is concerned,
tabulated values are available in several textbooks and websites. Unlike the normal or the 𝑡 table,
however, an 𝐹 probability table is much more voluminous due to the two parameters of the
distribution. This is why I leave the reader to work with an 𝐹 table that she finds suitable for
her purposes.

35. 𝐹 statistic. Loosely speaking, the 𝐹 statistic is a random variable that has an 𝐹 distribution.
To calculate it, we need two samples of sizes 𝑛1 and 𝑛2. The samples are independent and
assumed to be drawn from two normal populations with standard deviations 𝜎1 and 𝜎2. Let 𝑠1
be the first sample's standard deviation and 𝑠2 the second sample's standard deviation. Then,
the 𝐹 statistic is given by the ratio,

𝐹 𝑠𝑡𝑎𝑡 = (𝑠1² ⁄ 𝜎1²) ⁄ (𝑠2² ⁄ 𝜎2²)    (26)

The following equation is also frequently used to compute an 𝐹 statistic,

𝐹 𝑠𝑡𝑎𝑡 = (𝑋1² ⁄ 𝑘1) ⁄ (𝑋2² ⁄ 𝑘2)    (27)

where 𝑋1² and 𝑋2² are the chi-square statistics computed from the first and second samples.
These statistics follow chi-square distributions with 𝑘1 and 𝑘2 degrees of freedom, respectively.
Finally, we have 𝑘1 = 𝑛1 − 1 and 𝑘2 = 𝑛2 − 1.
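The link between eq. (26) and the 𝐹 distribution can be illustrated by simulation; the sample sizes and seed below are arbitrary choices:

```python
import random
from statistics import fmean, variance

# Simulate the F statistic of eq. (26) with sigma1 = sigma2 = 1 and check
# that its average is close to k2 / (k2 - 2) from eq. (24); the sample
# sizes n1 and n2 are illustrative assumptions
random.seed(7)
n1, n2, reps = 6, 11, 20000
fstats = []
for _ in range(reps):
    s1_sq = variance([random.gauss(0, 1) for _ in range(n1)])  # sample variance
    s2_sq = variance([random.gauss(0, 1) for _ in range(n2)])
    fstats.append(s1_sq / s2_sq)
k2 = n2 - 1
print(round(fmean(fstats), 2), k2 / (k2 - 2))
```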

Appendix. Poisson approximation to binomial distribution

Let 𝑋 ∼ 𝐵(𝑛, 𝑝) be a binomial random variable with parameters 𝑛 and 𝑝. The pmf of 𝑋 gives the
probability that 𝑋 takes the value 𝑘 as follows:

𝑃(𝑋 = 𝑘) = C(𝑛, 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘)

Denote the product 𝑛𝑝 = 𝜆. Then,

𝑃(𝑋 = 𝑘) = (𝑛! ⁄ ((𝑛 − 𝑘)! 𝑘!)) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘)

𝑃(𝑋 = 𝑘) = (𝑛! ⁄ ((𝑛 − 𝑘)! 𝑘!)) (𝜆⁄𝑛)^𝑘 (1 − 𝜆⁄𝑛)^(𝑛−𝑘)

𝑃(𝑋 = 𝑘) = (𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (𝑛 − 𝑘 + 1) ⁄ 𝑛^𝑘) × (𝜆^𝑘 ⁄ 𝑘!) × (1 − 𝜆⁄𝑛)^𝑛 ⁄ (1 − 𝜆⁄𝑛)^𝑘

As 𝑛 → ∞ and 𝑝 → 0, the terms of the last expression have the following limits:

(1 − 𝜆⁄𝑛)^𝑛 → 𝑒^(−𝜆)

𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (𝑛 − 𝑘 + 1) ⁄ 𝑛^𝑘 → 1

(1 − 𝜆⁄𝑛)^𝑘 → 1

Plugging these limits back into the binomial formula, it simplifies to

𝑃(𝑋 = 𝑘) = 𝜆^𝑘 𝑒^(−𝜆) ⁄ 𝑘!

which is the probability mass function of a Poisson random variable with parameter 𝜆.

Appendix. Normal approximation to binomial distribution

Let 𝑋 ∼ 𝐵(𝑛, 𝑝) be a binomial random variable with parameters 𝑛 and 𝑝. For large 𝑛, it can be
shown that the binomial distribution is well approximated by a normal distribution with
parameters 𝜇 = 𝑛𝑝 and 𝜎 = √(𝑛𝑝(1 − 𝑝)), i.e. 𝑋 ∼ 𝐵(𝑛, 𝑝) → 𝑁(𝑛𝑝, √(𝑛𝑝(1 − 𝑝))). This
appendix provides a brief description.

Consider a Bernoulli process 𝑋1, 𝑋2, … , 𝑋𝑛 and define 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛. Each 𝑋𝑖 shares the
same probability of success 𝑝. Assume we want to calculate 𝑃(𝑌 ≤ 𝑦), that is, the cdf of 𝑌.
Note that 𝑌 is the sum of 𝑛 independently and identically distributed random variables (recall
the definition of a Bernoulli process). Then, as long as 𝑛 is large, the Central Limit Theorem
states that the following transformation of 𝑌 is approximately distributed as a standard normal
random variable:

𝑍 = (𝑌 − 𝑛𝑝) ⁄ √(𝑛𝑝(1 − 𝑝)) → 𝑁(0,1)

In practice, the accuracy of the approximation can be easily verified using statistical software. If
not, a rule of thumb consists in using the normal approximation if the products 𝑛𝑝 and 𝑛(1 − 𝑝)
are both greater than 5. In addition, the farther the probability of success of each Bernoulli trial 𝑝
is away from 0.5, the larger 𝑛 is needed.7
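A quick numerical check of the approximation, with illustrative values of 𝑛, 𝑝 and 𝑦:

```python
from math import comb, erf, sqrt

# Exact binomial cdf vs. the normal approximation with a continuity
# correction: P(Y <= y) ~ Phi((y + 0.5 - n p) / sqrt(n p (1 - p)))
n, p, y = 100, 0.3, 35
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(y + 1))
z = (y + 0.5 - n * p) / sqrt(n * p * (1 - p))
approx = 0.5 * (1 + erf(z / sqrt(2)))
print(abs(exact - approx) < 0.01)  # True
```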

7 For example, suppose 𝑝 = 0.9. To get 𝑛(1 − 𝑝) ≥ 5, we would have to have at least 50 observations.
Table 1. Normal probability table – positive 𝒛

Each table entry shows the area under the standard normal distribution to the left of 𝑧.
For example, 𝑃(𝑍 ≤ 1.96) = 0.975.

𝒛 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

Table 2. Normal probability table – negative 𝒛

Each table entry shows the area under the standard normal distribution to the left of 𝑧.
For example, 𝑃(𝑍 ≤ −1.96) = 0.025.

𝒛 −0.09 −0.08 −0.07 −0.06 −0.05 −0.04 −0.03 −0.02 −0.01 0.00
−3.4 0.0002 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003
−3.3 0.0003 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0005 0.0005 0.0005
−3.2 0.0005 0.0005 0.0005 0.0006 0.0006 0.0006 0.0006 0.0006 0.0007 0.0007
−3.1 0.0007 0.0007 0.0008 0.0008 0.0008 0.0008 0.0009 0.0009 0.0009 0.0010
−3.0 0.0010 0.0010 0.0011 0.0011 0.0011 0.0012 0.0012 0.0013 0.0013 0.0013
−2.9 0.0014 0.0014 0.0015 0.0015 0.0016 0.0016 0.0017 0.0018 0.0018 0.0019
−2.8 0.0019 0.0020 0.0021 0.0021 0.0022 0.0023 0.0023 0.0024 0.0025 0.0026
−2.7 0.0026 0.0027 0.0028 0.0029 0.0030 0.0031 0.0032 0.0033 0.0034 0.0035
−2.6 0.0036 0.0037 0.0038 0.0039 0.0040 0.0041 0.0043 0.0044 0.0045 0.0047
−2.5 0.0048 0.0049 0.0051 0.0052 0.0054 0.0055 0.0057 0.0059 0.0060 0.0062
−2.4 0.0064 0.0066 0.0068 0.0069 0.0071 0.0073 0.0075 0.0078 0.0080 0.0082
−2.3 0.0084 0.0087 0.0089 0.0091 0.0094 0.0096 0.0099 0.0102 0.0104 0.0107
−2.2 0.0110 0.0113 0.0116 0.0119 0.0122 0.0125 0.0129 0.0132 0.0136 0.0139
−2.1 0.0143 0.0146 0.0150 0.0154 0.0158 0.0162 0.0166 0.0170 0.0174 0.0179
−2.0 0.0183 0.0188 0.0192 0.0197 0.0202 0.0207 0.0212 0.0217 0.0222 0.0228
−1.9 0.0233 0.0239 0.0244 0.0250 0.0256 0.0262 0.0268 0.0274 0.0281 0.0287
−1.8 0.0294 0.0301 0.0307 0.0314 0.0322 0.0329 0.0336 0.0344 0.0351 0.0359
−1.7 0.0367 0.0375 0.0384 0.0392 0.0401 0.0409 0.0418 0.0427 0.0436 0.0446
−1.6 0.0455 0.0465 0.0475 0.0485 0.0495 0.0505 0.0516 0.0526 0.0537 0.0548
−1.5 0.0559 0.0571 0.0582 0.0594 0.0606 0.0618 0.0630 0.0643 0.0655 0.0668
−1.4 0.0681 0.0694 0.0708 0.0721 0.0735 0.0749 0.0764 0.0778 0.0793 0.0808
−1.3 0.0823 0.0838 0.0853 0.0869 0.0885 0.0901 0.0918 0.0934 0.0951 0.0968
−1.2 0.0985 0.1003 0.1020 0.1038 0.1056 0.1075 0.1093 0.1112 0.1131 0.1151
−1.1 0.1170 0.1190 0.1210 0.1230 0.1251 0.1271 0.1292 0.1314 0.1335 0.1357
−1.0 0.1379 0.1401 0.1423 0.1446 0.1469 0.1492 0.1515 0.1539 0.1562 0.1587
−0.9 0.1611 0.1635 0.1660 0.1685 0.1711 0.1736 0.1762 0.1788 0.1814 0.1841
−0.8 0.1867 0.1894 0.1922 0.1949 0.1977 0.2005 0.2033 0.2061 0.2090 0.2119
−0.7 0.2148 0.2177 0.2206 0.2236 0.2266 0.2296 0.2327 0.2358 0.2389 0.2420
−0.6 0.2451 0.2483 0.2514 0.2546 0.2578 0.2611 0.2643 0.2676 0.2709 0.2743
−0.5 0.2776 0.2810 0.2843 0.2877 0.2912 0.2946 0.2981 0.3015 0.3050 0.3085
−0.4 0.3121 0.3156 0.3192 0.3228 0.3264 0.3300 0.3336 0.3372 0.3409 0.3446
−0.3 0.3483 0.3520 0.3557 0.3594 0.3632 0.3669 0.3707 0.3745 0.3783 0.3821
−0.2 0.3859 0.3897 0.3936 0.3974 0.4013 0.4052 0.4090 0.4129 0.4168 0.4207
−0.1 0.4247 0.4286 0.4325 0.4364 0.4404 0.4443 0.4483 0.4522 0.4562 0.4602
−0.0 0.4641 0.4681 0.4721 0.4761 0.4801 0.4840 0.4880 0.4920 0.4960 0.5000
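Both worked examples above can be reproduced numerically: the standard normal CDF follows directly from the error function, which is available in most numerical libraries. The sketch below uses Python's math.erf (Python is an illustrative choice here, not part of the original notes):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# The two worked examples from the tables
p_pos = phi(1.96)    # area to the left of z = 1.96, approx. 0.975
p_neg = phi(-1.96)   # area to the left of z = -1.96, approx. 0.025
```

The same function reproduces any entry of the two tables to four decimals.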

Table 3. 𝒕 probability table

Table entries show the 𝑡 value given the degrees-of-freedom 𝑘 and the upper-tail probability 𝑝, i.e. 𝑃(𝑇 ≥ 𝑡) = 𝑝. For example, if 𝑇 follows a 𝑡-distribution with 𝑘 = 5 degrees-of-freedom, then 𝑃(𝑇 ≥ 1.4759) = 0.1.

upper-tail probability 𝒑
k 0.25 0.1 0.05 0.025 0.01 0.005
1 1.0000 3.0777 6.3138 12.7062 31.8205 63.6567
2 0.8165 1.8856 2.9200 4.3027 6.9646 9.9248
3 0.7649 1.6377 2.3534 3.1824 4.5407 5.8409
4 0.7407 1.5332 2.1318 2.7764 3.7469 4.6041
5 0.7267 1.4759 2.0150 2.5706 3.3649 4.0321
6 0.7176 1.4398 1.9432 2.4469 3.1427 3.7074
7 0.7111 1.4149 1.8946 2.3646 2.9980 3.4995
8 0.7064 1.3968 1.8595 2.3060 2.8965 3.3554
9 0.7027 1.3830 1.8331 2.2622 2.8214 3.2498
10 0.6998 1.3722 1.8125 2.2281 2.7638 3.1693
11 0.6974 1.3634 1.7959 2.2010 2.7181 3.1058
12 0.6955 1.3562 1.7823 2.1788 2.6810 3.0545
13 0.6938 1.3502 1.7709 2.1604 2.6503 3.0123
14 0.6924 1.3450 1.7613 2.1448 2.6245 2.9768
15 0.6912 1.3406 1.7531 2.1314 2.6025 2.9467
16 0.6901 1.3368 1.7459 2.1199 2.5835 2.9208
17 0.6892 1.3334 1.7396 2.1098 2.5669 2.8982
18 0.6884 1.3304 1.7341 2.1009 2.5524 2.8784
19 0.6876 1.3277 1.7291 2.0930 2.5395 2.8609
20 0.6870 1.3253 1.7247 2.0860 2.5280 2.8453
21 0.6864 1.3232 1.7207 2.0796 2.5176 2.8314
22 0.6858 1.3212 1.7171 2.0739 2.5083 2.8188
23 0.6853 1.3195 1.7139 2.0687 2.4999 2.8073
24 0.6848 1.3178 1.7109 2.0639 2.4922 2.7969
25 0.6844 1.3163 1.7081 2.0595 2.4851 2.7874
26 0.6840 1.3150 1.7056 2.0555 2.4786 2.7787
27 0.6837 1.3137 1.7033 2.0518 2.4727 2.7707
28 0.6834 1.3125 1.7011 2.0484 2.4671 2.7633
29 0.6830 1.3114 1.6991 2.0452 2.4620 2.7564
30 0.6828 1.3104 1.6973 2.0423 2.4573 2.7500
31 0.6825 1.3095 1.6955 2.0395 2.4528 2.7440
32 0.6822 1.3086 1.6939 2.0369 2.4487 2.7385
33 0.6820 1.3077 1.6924 2.0345 2.4448 2.7333
34 0.6818 1.3070 1.6909 2.0322 2.4411 2.7284
35 0.6816 1.3062 1.6896 2.0301 2.4377 2.7238
36 0.6814 1.3055 1.6883 2.0281 2.4345 2.7195
37 0.6812 1.3049 1.6871 2.0262 2.4314 2.7154
38 0.6810 1.3042 1.6860 2.0244 2.4286 2.7116
39 0.6808 1.3036 1.6849 2.0227 2.4258 2.7079
40 0.6807 1.3031 1.6839 2.0211 2.4233 2.7045
41 0.6805 1.3025 1.6829 2.0195 2.4208 2.7012
42 0.6804 1.3020 1.6820 2.0181 2.4185 2.6981
43 0.6802 1.3016 1.6811 2.0167 2.4163 2.6951
44 0.6801 1.3011 1.6802 2.0154 2.4141 2.6923
45 0.6800 1.3006 1.6794 2.0141 2.4121 2.6896
46 0.6799 1.3002 1.6787 2.0129 2.4102 2.6870
47 0.6797 1.2998 1.6779 2.0117 2.4083 2.6846
48 0.6796 1.2994 1.6772 2.0106 2.4066 2.6822
49 0.6795 1.2991 1.6766 2.0096 2.4049 2.6800
50 0.6794 1.2987 1.6759 2.0086 2.4033 2.6778
60 0.6786 1.2958 1.6706 2.0003 2.3901 2.6603
70 0.6780 1.2938 1.6669 1.9944 2.3808 2.6479
80 0.6776 1.2922 1.6641 1.9901 2.3739 2.6387
90 0.6772 1.2910 1.6620 1.9867 2.3685 2.6316
100 0.6770 1.2901 1.6602 1.9840 2.3642 2.6259
150 0.6761 1.2872 1.6551 1.9759 2.3515 2.6090
200 0.6757 1.2858 1.6525 1.9719 2.3451 2.6006
300 0.6753 1.2844 1.6499 1.9679 2.3388 2.5923
400 0.6751 1.2837 1.6487 1.9659 2.3357 2.5882
500 0.6750 1.2832 1.6479 1.9647 2.3338 2.5857
∞ 0.6745 1.2816 1.6449 1.9600 2.3265 2.5760

Table 4. Chi-square probability table

Areas in the chi-square table refer to the right-tail probability. For example, if 𝑋 is a 𝜒² random variable with 𝑘 = 10 degrees of freedom, then 𝑃(𝑋 ≥ 18.31) = 0.05.

upper-tail probabilities

k 0.995 0.99 0.975 0.95 0.9 0.1 0.05 0.025 0.01 0.005
1 0.00 0.00 0.00 0.00 0.02 2.71 3.84 5.02 6.63 7.88
2 0.01 0.02 0.05 0.10 0.21 4.61 5.99 7.38 9.21 10.60
3 0.07 0.11 0.22 0.35 0.58 6.25 7.81 9.35 11.34 12.84
4 0.21 0.30 0.48 0.71 1.06 7.78 9.49 11.14 13.28 14.86
5 0.41 0.55 0.83 1.15 1.61 9.24 11.07 12.83 15.09 16.75
6 0.68 0.87 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 0.99 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19
11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.72 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80
16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
21 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93
26 11.16 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64 48.29
27 11.81 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96 49.64
28 12.46 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28 50.99
29 13.12 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59 52.34
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67
40 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77
50 27.99 29.71 32.36 34.76 37.69 63.17 67.50 71.42 76.15 79.49
60 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95
70 43.28 45.44 48.76 51.74 55.33 85.53 90.53 95.02 100.4 104.2
80 51.17 53.54 57.15 60.39 64.28 96.58 101.9 106.6 112.3 116.3
90 59.20 61.75 65.65 69.13 73.29 107.6 113.2 118.1 124.1 128.3
100 67.33 70.06 74.22 77.93 82.36 118.5 124.3 129.6 135.8 140.2

End-of-chapter exercises for Discrete and continuous distributions

Exercise 1. Bernoulli random variable

A marketing research firm conducts a survey among potential consumers before a new product is
launched. The participants test the product and then fill in a questionnaire in which answers are
ranked on a scale from 1 to 5. The product will be launched if the average of the answers is equal
to or higher than 4. Build an appropriate probability model for this experiment.

Exercise 2. Bernoulli random variable

Consider a game where a six-sided die is rolled. If the outcome is equal to or less than 2, then the
player loses the amount he initially bets. Otherwise, he wins twice his bet. Build a probability
model for this game. Show the payoff profile and the probability associated with each outcome.8

Exercise 3. Geometric distribution

Suppose you roll a six-sided die until you get a six. How many times on average would you roll the
die to get a six for the first time? What is the probability that you get a six for the first time at the
second trial? What is the probability that you get a six at the 4th trial?
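This exercise can be checked numerically. The sketch below (Python, used here purely for illustration; the names are our own) encodes the geometric pmf 𝑃(𝑋 = 𝑘) = (1 − 𝑝)^(𝑘−1) 𝑝 with 𝑝 = 1⁄6:

```python
p = 1 / 6  # probability of rolling a six on a fair die

def geometric_pmf(k, p):
    """P(first success occurs on trial k) for a geometric random variable."""
    return (1 - p) ** (k - 1) * p

expected_rolls = 1 / p                     # mean number of rolls until the first six: 6
p_first_six_on_2nd = geometric_pmf(2, p)   # (5/6) * (1/6), approx. 0.1389
p_first_six_on_4th = geometric_pmf(4, p)   # (5/6)^3 * (1/6), approx. 0.0965
```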

Exercise 4. Geometric distribution

An urn contains red, blue and yellow marbles with respective proportions 20, 30 and 50%. (1)
Suppose you randomly pick a marble from the urn. What is the probability that it is a red one? (2)
Suppose you pick 4 marbles from the urn and note their color. What is the probability that the
fourth one is a blue marble? (3) How many marbles should you pick on average to get the first
yellow marble?

Exercise 5. Geometric distribution

Minesweeper is a single-player puzzle game. The objective is to clear a rectangular board
containing hidden mines without detonating any of them, with help from clues about the number
of neighboring mines in each cell. One version available online as of February 2021 comes in three
levels of increasing difficulty, i.e. easy, medium and hard. In the easiest version, there are
10 mines scattered through an 8 × 10 board (thus 80 cells). The medium version contains 40

8𝑋 takes the value −$1 ($1 bet lost) with probability 1/3 and the value $2 (twice the initial bet won) with probability
2/3. So, the expected value of the game is $1.
mines scattered through a 14 × 18 board and the hard version contains 99 mines hidden on a
20 × 24 board.

[Figure: easy-level minesweeper board at start (left) and at play (right)]

(1) Suppose you play an easy-level minesweeper. What is the probability that you detonate a mine
at your first trial? (2) Suppose you play a medium-level minesweeper and clear the board by
randomly selecting cells, without using any information about the number of neighboring mines
as it unfolds. What is the probability that you do not detonate a mine in your first three trials
but blow up one at your fourth trial? Are the random trials independent? (3) Answer the preceding
question assuming that a geometric probability model applies. (4) Now, suppose you play a
hard-level minesweeper and clear the board with no attention to the numbers displayed on cleared
cells. What is the probability that you detonate a mine at your fourth trial? Calculate the same
probability using the geometric probability model.9
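For questions (2)–(4), the exact without-replacement probabilities and their geometric approximations can be compared with a short computation. The following Python sketch (illustrative only; the function and variable names are our own) uses exact rational arithmetic for the without-replacement case:

```python
from fractions import Fraction

def exact_first_mine_on(trial, cells, mines):
    """Exact probability (sampling without replacement) that the
    first mine is hit on the given trial."""
    prob = Fraction(1)
    safe = cells - mines
    for i in range(trial - 1):                    # first trial-1 picks are safe
        prob *= Fraction(safe - i, cells - i)
    prob *= Fraction(mines, cells - (trial - 1))  # then a mine is hit
    return float(prob)

def geometric_approx(trial, cells, mines):
    """Geometric model, treating successive picks as independent."""
    p = mines / cells
    return (1 - p) ** (trial - 1) * p
```

For the medium board (252 cells, 40 mines) the exact probability of a first detonation at the fourth trial is about 9.54%, against 9.45% under the geometric model; the two are close because each pick removes only a small fraction of the board.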

Exercise 6. Geometric distribution

Stanley Milgram began a series of experiments in 1963 to estimate what proportion of people
would willingly obey an authority and give severe shocks to a stranger. Milgram found that about
65% of people would obey the authority and give such shocks. Over the years, additional research
suggested this number is approximately consistent across communities and time.

9 The easy level has 80 cells and 10 mines, so the probability that a randomly cleared cell has a mine is 𝑝 = 1⁄8. In
the same way, the medium-level game board has 14 × 18 = 252 cells and the probability that a randomly cleared
cell has a mine is 𝑝 = 40⁄252. For the hard level, these numbers are 480 cells and 𝑝 = 99⁄480. (1) The chance to
detonate a mine at the first trial during an easy-level game is 𝑃(𝑋 = 1) = 1⁄8. (2) The probability that the first three
cells are clear of mines and that one blows up a mine at the fourth trial during a medium-level game is
𝑃((𝑋1 = 0) ∩ (𝑋2 = 0) ∩ (𝑋3 = 0) ∩ (𝑋4 = 1)) = (212⁄252) × (211⁄251) × (210⁄250) × (40⁄249) = 9.5429%. (3) If one applies the
geometric distribution, 𝑃(𝑋 = 4) = (1 − 40⁄252)³ × (40⁄252) = 9.4507%. (4) The probability of no
detonation in the first three trials followed by a detonation is (381⁄480) × (380⁄479) × (379⁄478) × (99⁄477) = 10.36%. The same
calculation using the geometric model yields 𝑃(𝑋 = 4) = (1 − 99⁄480)³ × (99⁄480) = 10.3144%.
A researcher wants to repeat Milgram’s experiments, but she only wants to sample people until
she finds someone who will not inflict the worst shock. What are the chances that she will stop
the study after the first person? After the third? After the tenth?

Exercise 7. Geometric distribution

An employee wants to break into a password-protected computer file. Assume there are 𝑛
passwords, only one of which is correct. The employee tries them in a random order. Let 𝑋 be the
number of trials required to break into the file. What is the probability distribution function of 𝑋
if unsuccessful trials are not eliminated? If 𝑛 = 10, what is the expected number of trials?10

Exercise 8. Binomial distribution

Harry is a young but promising basketball player. He is 208 cm tall, so he is likely to play either
power-forward or center. For that reason, he is working hard to improve his free-throw shooting.
He currently makes 60% of his free-throws. Assume that during the next game Harry goes to the
free-throw line 10 times. Each successful free-throw adds 1 point to the team's score. What is
the expected contribution of Harry's free-throw shots to his team's score? What is the
probability that he makes 7 or more of his free-throws out of 10 trials?

Exercise 9. Geometric distribution

On average, 10% of the world population is left-handed. In a survey, 20 people are asked whether
they are left-handed or not. What is the probability of finding the first left-handed person at the
4th participant?11

Exercise 10. Geometric distribution

You play a chance game. You roll a six-sided die and win 6 times your bet if the die turns up a 6.
For example, if you bet $10 and the die turns up 6, then your payoff is $10 × 6 = $60. If the die
lands on a number other than 6, you lose your $10 bet. (1) Let 𝑋 be the random variable defined
as the Profit & Loss (i.e. P&L) of a player who bets $10 on this game one time. Build a probability
model for 𝑋. Calculate the expected value 𝐸(𝑋) and the variance 𝑉𝑎𝑟(𝑋) of 𝑋. (2) Suppose you
play the game 𝑛 = 1, 2, … times. Calculate the expected value of 𝑛 such that you lose the first
𝑛 − 1 bets and win at the 𝑛th roll.12

10 The model is a geometric distribution with pmf 𝑃(𝑋 = 𝑛𝑖) = (1 − 𝑝)^(𝑛𝑖−1) 𝑝, where 𝑝 = 1⁄𝑛 because each password
is equally likely. For 𝑛 = 10, 𝑝 = 0.1 and the expected number of trials before the employee breaks into the file is 𝐸(𝑋) = 1⁄0.1 = 10.
11 Using the geometric pmf with 𝑝 = 0.1, we calculate 𝑃(𝑋 = 4) = (1 − 0.1)3 0.1 = 7.29%.
12 (1) The probability model of 𝑋 can be described as follows:

outcome 6 not 6
Exercise 11. Binomial distribution

Consider again the proportion of left-handed people, 𝑝 = 10%. Assume one randomly picks 25
people. What are the chances that there are exactly 2 left-handed individuals in this sample?13
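A quick numerical check of this binomial calculation (a Python sketch with names of our own, used purely for illustration):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly 2 left-handed individuals among 25, with p = 0.1
p_two_lefties = binomial_pmf(2, 25, 0.1)   # approx. 0.2659
```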

Exercise 12. Binomial distribution

The probability that a planted cucumber seed germinates is 80%. A gardener planted nine seeds.
What is the average number of seeds that will successfully germinate? What is the probability that
5 seeds germinate?

Exercise 13. Binomial distribution

Suppose a six-sided die is tossed 5 times. What is the probability of getting exactly 2 fours? What
is the probability of getting less than 2 fours, 𝑃(𝑋 < 2)?

Exercise 14. Binomial distribution

Consider a factory specialized in the production of fan switches. Engineers detected that on
average 5% of the production is defective and cannot be commercialized. What is the expected
number of defective outputs from a sample of 30 randomly selected switches? What will be the
standard deviation? Calculate the probability of detecting at most 2 defective fan switches, i.e.
𝑃(𝑋 ≤ 2), in a quality control process where 30 items are randomly selected and checked.

Exercise 15. Binomial distribution

Public health statistics suggest that the probability that a random smoker will develop a severe
lung condition in his or her lifetime is about 30%. Assume you have 40 smoking friends. About
how many would you expect to develop such a condition? What is the standard deviation of the
number of people who would develop such a condition?14

Exercise 16. Binomial distribution

𝑥𝑖      $60 − $10 = $50     −$10
𝑓(𝑥𝑖)   1⁄6                 5⁄6
Then 𝐸(𝑋) = ∑ 𝑥𝑖 𝑓(𝑥𝑖) = $50 × (1⁄6) + (−$10) × (5⁄6) = $0 and 𝑉𝑎𝑟(𝑋) = ∑(𝑥𝑖 − 𝐸(𝑋))² 𝑓(𝑥𝑖) = $500. (2) The
probability of winning at the 𝑛th roll after losing the previous bets follows a geometric model with 𝑝 = 1⁄6, i.e. 𝑛 ∼
𝐺𝑒𝑜(𝑝 = 1⁄6). The expected value of 𝑛 is 𝐸(𝑛) = 1⁄(1⁄6) = 6 rolls.
13 Using the binomial pmf with 𝑝 = 0.1 and 𝑛 = 25, we calculate 𝑃(𝑋 = 2) = C(25, 2) × 0.1² × 0.9²³ = 26.59%.
14 The expected value is 40 × 0.3 = 12 and the standard deviation is √(40 × 0.3 × 0.7) ≈ 2.9.

In most countries, the ratio of boys to girls at birth is about 1.04:1. Two interesting cases exist in
the world. The same ratio in Singapore is 1.09:1 while in China the odds are quite high at 1.15:1.
What proportion of Singapore families with exactly 4 children will have at least 2 boys? Ignore the
probability of multiple births. Calculate the same probability for Chinese families.15

Exercise 17. Binomial distribution

A company produces sports bikes. Quality control engineers consider that 90% of the bikes pass
the final inspection whereas 10% fail and need to be fixed. Assume 20 bikes are randomly chosen
from the factory for quality control. How many would you expect to pass the final inspection?
What is the probability that 18 bikes or more from this sample pass the inspection, 𝑃(𝑋 ≥ 18)?

Exercise 18. Geometric and binomial distributions

In a multiple-choice exam, there are 5 questions and 4 choices for each question (a, b, c, d). Nancy
has not studied for the exam at all and decides to guess the answers at random. What is the
probability that (1) the first question she gets right is the 5th question? (2) she gets all of the
questions right? (3) she gets at least one question right?

Exercise 19. Geometric and binomial distributions

Students enrolled in the finance major of a graduate business school follow a self-study refresher
program on mathematics and statistics prior to the beginning of classes. They then take an exam
as a multiple-choice test broken down into two sections, namely, one on mathematics and another
on probability and statistics. Each section has 20 questions and each question admits 4 choices,
only one of which is the right answer. Consider a student who takes this test and suppose he
randomly answers the questions. (1) What is the probability that he misses his first 7 questions
and answers his first question right at his 8th trial? (2) What is the probability that he gets 3 or
more questions right in the mathematics section? Calculate the same probability for the statistics
section. (3) What is the probability that he answers 5 or more questions right in both sections?

Exercise 20. Poisson distribution

15 The odds of boys to girls at birth must first be converted to a probability: 𝑃(𝑏𝑎𝑏𝑦 𝑖𝑠 𝑎 𝑏𝑜𝑦 𝑎𝑡 𝑏𝑖𝑟𝑡ℎ) =
1.09⁄(1.09 + 1) = 0.5215. In a Singapore family with 4 children, the probability of finding at least 2 boys is
𝑃(𝑋 ≥ 2) = 𝑃(𝑋 = 2) + 𝑃(𝑋 = 3) + 𝑃(𝑋 = 4) = 0.3736 + 0.2714 + 0.0739 = 0.7190.
It has been observed that the average number of traffic accidents on a highway between 7 and 8
AM is 1 per hour from Monday to Friday. What are the chances that there will be 2 accidents on
this highway on a randomly chosen day of the week?16

Exercise 21. Poisson distribution

Coliform bacteria are randomly distributed in a certain Arizona river at an average concentration
of 1 per 20cc of water. If we draw from the river a test tube containing 10cc of water, what is the
chance that the sample contains exactly 2 coliform bacteria?17

Exercise 22. Poisson distribution

Hydrologists determined that overflow floods in a river occur once every 50 years on average.
Calculate the probability of observing 0, 1, 2, …, 6 overflow floods in a 100-year interval using the
Poisson distribution.18
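The requested probabilities can be generated in one pass with the Poisson pmf and 𝜆 = 2 floods per century. A Python sketch (illustrative only; the names are our own):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

# One overflow flood per 50 years on average -> lam = 2 per 100-year interval
flood_probs = [round(poisson_pmf(k, 2), 4) for k in range(7)]
```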

Exercise 23. Poisson distribution

The switchboard in a small law office gets an average of 2.5 incoming phone calls during the noon
hour on Thursdays. Staffing is reduced accordingly; people can go out for lunch in rotation.
Experience shows that the assigned levels are adequate to handle a high of 5 calls during that hour.
What is the chance that 6 calls will be received in the noon hour, some Thursday, in which case
the firm might miss an important call?19

Exercise 24. Poisson distribution

Experience indicates that on average 6 customers per hour stop for gasoline at a gasoline pump.
(1) Calculate the probability that 3 customers stop in any given hour. (2) Calculate the probability
of 6 or fewer customers during the next two hours.

Exercise 25. Poisson distribution

16 The basic rate is 𝜆 = 1 in hourly units, and our observation window is 1 hour. The chance of observing 2 events
over such a window is given by the Poisson law 𝑃(𝑋 = 2) = 1²𝑒⁻¹⁄2! = 0.1839. It is not unlikely; one might see such
a situation about once a week.
17 Our observation window is 10 cc. If the concentration is 1 per 20 cc, it is also 0.5 per 10 cc. So, 𝜆 = 0.5 is the rate

relevant to our chosen window. Applying the pmf of the Poisson random variable, we find 𝑃(𝑋 = 2) =
(0.5)2 𝑒 −0.5 ⁄2! = 0.0758.
18 If there is one overflow flood every 50 years, we should expect 𝜆 = 2 overflow floods in a century. Using the Poisson

distribution, we can calculate 𝑃(𝑋 = 0) = 0.1353, 𝑃(𝑋 = 1) = 0.2707, 𝑃(𝑋 = 2) = 0.2707, 𝑃(𝑋 = 3) = 0.1804,
𝑃(𝑋 = 4) = 0.0902, 𝑃(𝑋 = 5) = 0.0361, and 𝑃(𝑋 = 6) = 0.012.
19 The hourly rate is 2.5. The probability of getting 6 calls in the noon hour is 0.0278. This corresponds to a little more

than 1 missed phone call per month. How acceptable that is will depend on how cranky the firm’s clients are, and the
firm itself is in the best position to make that judgment.
Vehicles pass through a junction on a busy road at an average rate of 300 per hour. (1) Find the
probability that none passes in a given minute. (2) What is the expected number of vehicles
passing in two minutes? (3) Find the probability that this expected number of vehicles actually
passes through the junction in a given two-minute period.

Exercise 26. Poisson distribution

An insurance company broker states that the probability of receiving 2 policy claims during an
ordinary week in a city is 24.17% and the probability of receiving 4 policy claims during an
ordinary week in the same city is 3.95%. Calculate the average number of claims the company
registers during an ordinary week in this city.20

Exercise 27. Poisson distribution

ACME Corporation specializes in the production of batteries for electric bicycles. The
probability that 2 batteries produced by ACME must be replaced during the next three years is
5.3626%. The probability that 3 batteries produced by ACME must be replaced during the next
three years is 0.715%. Calculate the probability that a randomly selected battery produced by
ACME needs no replacement during the next three years.21
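Exercises 26 and 27 share the same trick: two pmf values pin down 𝜆, since for a Poisson pmf the ratio 𝑃(𝑋 = 𝑘 + 1)⁄𝑃(𝑋 = 𝑘) = 𝜆⁄(𝑘 + 1). For the battery data this gives a one-line solution, sketched below in Python (illustrative only; the names are our own):

```python
from math import exp

# Given pmf values for the number of battery replacements
p2, p3 = 0.053626, 0.00715

# For a Poisson pmf, P(X = 3) / P(X = 2) = lam / 3, hence:
lam = 3 * p3 / p2              # estimated rate, approx. 0.4
p_no_replacement = exp(-lam)   # P(X = 0), approx. 0.6703
```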

Exercise 28. Poisson approximation to binomial distribution

Suicide is the 10th leading cause of death in the US at all ages. Statistics show that the suicide rate
is 18 per 100,000 people per year. Describe the binomial distribution that governs the occurrence
of suicides in the US. Is it accurate? If not, which distribution can be used instead? Then find the
probability that there will be 20 suicides per 100,000 inhabitants in the US during a given year.22
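To see how good the approximation is, both the exact binomial probability and its Poisson counterpart can be evaluated directly; with 𝑛 = 100,000 the two agree to about four decimal places. A Python sketch (illustrative only; the names are our own):

```python
from math import comb, exp, factorial

n, p = 100_000, 18 / 100_000   # suicide rate: 18 per 100,000 per year
lam = n * p                    # Poisson parameter, lambda = np = 18

# Exact binomial probability of exactly 20 suicides
binom = comb(n, 20) * p**20 * (1 - p) ** (n - 20)

# Poisson approximation of the same probability
pois = lam**20 * exp(-lam) / factorial(20)
```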

Exercise 29. Poisson approximation to binomial distribution

Flight statistics suggest that approximately 5% of the reservations on a given flight do not check
in, which explains why companies generally sell more tickets than the number of seats on a plane.
Consider a flight on an airplane with 98 seats for which the company sold 100 tickets. What is the

20 The number of claims received during a given week is a random variable 𝑋 that follows a Poisson distribution with
parameter 𝜆. 𝑃(𝑋 = 2) = (𝜆2 𝑒 −𝜆 )⁄2! = 24.17% and 𝑃(𝑋 = 4) = (𝜆4 𝑒 −𝜆 )⁄4! = 3.95%. Building the system of
equations as 𝜆4 𝑒 −𝜆 = 0.0395 × 4! and 𝜆2 𝑒 −𝜆 = 0.2417 × 2!, we calculate 𝜆 = 1.400395, roughly 1.4 claims per week.
21 Let 𝑋 ∼ 𝑃𝑜𝑖(𝜆). We have 𝑃(𝑋 = 2) = 5.3626% and 𝑃(𝑋 = 3) = 0.715%. Using the Poisson pmf, we can solve these
two equations for the 𝜆 of the model: 𝜆 ≈ 0.4. Then, 𝑃(𝑋 = 0) = (0.4⁰ 𝑒^(−0.4))⁄0! = 0.67032.
22 The proportion of US citizens who will commit suicide next year is 𝑝 = 18⁄100,000 = 0.00018. The binomial model
for the number of suicides in the US can be written as 𝑋 ∼ 𝐵(𝑛 = 100,000, 𝑝 = 0.00018). Yet, calculating the
probability of observing exactly 𝑥 suicides using the binomial distribution can be difficult (why?). Since 𝑛 is large, we
can replace the original distribution by a Poisson distribution with 𝜆 = 𝑛𝑝 = 18. Evaluating the pmf, we find
𝑃(𝑋 = 20) = 7.98%.
probability that all passengers have their seats? Calculate the exact probability and repeat the
calculations using an appropriate approximation.23

Exercise 30. Poisson approximation to binomial distribution

On average, 1 computer in 800 crashes during a severe thunderstorm. A company had 4,000
working computers when the area was hit by such a thunderstorm. Which distribution is
appropriate for this experiment? Calculate the expected number of computers that will crash due
to such a thunderstorm. What is the corresponding standard deviation? Find the probability that
only 1 computer crashed. Find the probability that less than 3 computers crashed.
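With 𝑛 = 4,000 and 𝑝 = 1⁄800, the Poisson approximation uses 𝜆 = 𝑛𝑝 = 5. The last two questions can then be checked as follows (a Python sketch, illustrative only; the names are our own):

```python
from math import exp, factorial, sqrt

n, p = 4000, 1 / 800   # 4,000 computers, crash probability 1/800
lam = n * p            # Poisson approximation: lambda = np = 5
sd = sqrt(lam)         # standard deviation under the Poisson model

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

p_exactly_one = poisson_pmf(1, lam)                      # approx. 0.0337
p_less_than_3 = sum(poisson_pmf(k, lam) for k in range(3))  # approx. 0.1247
```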

Exercise 31. Poisson approximation to binomial distribution

A manufacturer considers a fine-tuning of its machinery that produces 200 pieces of output per
month. The production contains on average 4 defective pieces. Engineers claim that the
modification will increase the productivity by reducing the number of defective pieces to 2 or less
per month. (1) Which probability distribution describes the number of defective parts out of 200
units of output from the machine? What are its parameters? (2) Consider your answer to (1).
Which alternative to this probability model can be used? Justify your answer briefly and calculate
the probability that the machinery produces equal to or less than 2 defective pieces, i.e. 𝑃(𝑋 ≤ 2)
as claimed by the engineers.

Exercise 32. Discrete distributions

Nicolas Batum is a French professional basketball player who currently plays in the United States'
National Basketball Association, commonly known as the NBA. As of March 5th, 2021, Batum's
career statistics covering regular-season games are as follows:

Games played | Minutes per game | Field goal % | 3-point field goal % | Rebounds per game | Assists per game | Points per game
789 | 31.3 | 0.434 | 0.357 | 5.2 | 3.8 | 11.6
(1) What is the probability that during the next regular season game Batum makes 2 or fewer
rebounds? (2) Assume that during the next regular season game he attempts 10 three-point shots.
What is the probability that he makes 3 of these shots?24

23 Let 𝑋 be the number of ticketed passengers who do not check in to the flight. Originally, 𝑋 ∼
𝐵(𝑛 = 100, 𝑝 = 0.05). If there are 98 seats available, then all passengers will have their seats if 2 or more passengers
do not check in to the flight, 𝑃(𝑋 ≥ 2) = 1 − (𝑃(𝑋 = 0) + 𝑃(𝑋 = 1)). Using the binomial model, we calculate
𝑃(𝑋 ≥ 2) = 1 − 0.0059 − 0.0312 = 96.29%. The Poisson approximation to the binomial model uses 𝑋 ∼
𝑃𝑜𝑖(𝜆 = 100 × 0.05 = 5). Then, 𝑃(𝑋 ≥ 2) = 1 − 0.0067 − 0.0337 = 95.96%.
24 (1) 𝑃(𝑋 ≤ 2) = 𝑃(𝑋 = 0) + 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2) = 5.2⁰𝑒^(−5.2)⁄0! + 5.2¹𝑒^(−5.2)⁄1! + 5.2²𝑒^(−5.2)⁄2! = 0.0055 + 0.0287 + 0.0746 =
0.1088. (2) 𝑃(𝑋 = 3) = C(10, 3) × 0.357³ × (1 − 0.357)^(10−3) = 24.81%.
Exercise 33. Discrete distributions

According to ESPN, JJ Redick of the Philadelphia 76ers (a basketball team in the NBA) has the
best free-throw shooting percentage of the 2018 season. He makes on average 92.6% of his
free-throws and takes on average 3.1 free-throw attempts per game. (1) What is the probability
that he misses his first shot on his 3rd trial in an ordinary game? (2) What is the probability that
he makes 6 successful shots in a game in which he takes 8 attempts? (3) What is the probability
that he does not take any free-throw attempt in an ordinary game? (4) The regular season in the
NBA consists of each team playing 82 games. What is Redick's expected contribution to his team's
total points from free-throw shots at the end of the regular season?25

Exercise 34. Normal distribution

Let 𝑋 ∼ 𝑁(𝜇 = 100, 𝜎 = 20). What is 𝑃(𝑋 ≤ 60)? 𝑃(𝑋 ≤ 140)? 𝑃(60 ≤ 𝑋 ≤ 140)?

Exercise 35. Normal distribution

Let 𝑋 ∼ 𝑁(𝜇 = 100, 𝜎 = 20). Find the 2.5, 50 and 97.5th percentiles of 𝑋.
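Exercises 34 and 35 can be verified without tables: Python's standard library ships a NormalDist class with both the CDF and its inverse (Python ≥ 3.8; used here purely as an illustrative check):

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=20)

# Exercise 34: interval probabilities
p_low = X.cdf(60)                 # P(X <= 60), approx. 0.0228
p_mid = X.cdf(140) - X.cdf(60)    # P(60 <= X <= 140), approx. 0.9545

# Exercise 35: percentiles via the inverse CDF
q025 = X.inv_cdf(0.025)           # approx. 60.8
q500 = X.inv_cdf(0.500)           # the median, 100
q975 = X.inv_cdf(0.975)           # approx. 139.2
```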

Exercise 36. Normal distribution

𝑋 is a normal random variable with mean 𝜇 = 1000. What is the standard deviation of 𝑋 if
𝑃(𝑋 ≤ 880) = 0.015?

Exercise 37. Normal distribution

𝑋 ∼ 𝑁(𝜇, 2). Calculate 𝜇 if 𝑃(𝑋 ≥ 17.84) = 2.5%.

Exercise 38. Normal distribution

Scores of a standardized test are normally distributed with mean 200 and standard deviation 20.
What score would be needed to be in the top 10 percent of all test takers? Assume a participant
scored 242 on the test. Does she rank within the top 1 percent?

Exercise 39. Normal distribution

25 (1) 𝑃((𝑋1 = 1) ∩ (𝑋2 = 1) ∩ (𝑋3 = 0)) = 0.926 × 0.926 × 0.074 = 0.0635. (2) 𝑃(𝑋 = 6) = C(8, 6) × 0.926⁶ ×
(1 − 0.926)^(8−6) = 0.0967. (3) 𝑃(𝑋 = 0) = 𝜆^𝑥 𝑒^(−𝜆)⁄𝑥! = 3.1⁰𝑒^(−3.1)⁄0! = 0.045. (4) If he takes on average 3.1 attempts per game and
makes 0.926 of his shots, then his contribution to the team's total score from free-throws is 𝐸(𝑋) = 0.926 × 3.1 =
2.8706 points per game. So, by the end of the season, his expected contribution from free-throws is 82 × 2.8706 =
235.39 points.
A speedometer measures the speeds of cars on a highway. Assume the speeds of cars on this
highway are normally distributed with mean 110 km/h and standard deviation 10 km/h. What is
the probability that a randomly selected car on this highway is travelling at more than 95 km/h?
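As a check on this and the following normal-distribution exercises, the normal CDF for arbitrary 𝜇 and 𝜎 follows from the error function after standardization. A Python sketch (illustrative only; the names are our own):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Speed example: X ~ N(110, 10); probability of travelling faster than 95 km/h
p_faster = 1.0 - normal_cdf(95, 110, 10)   # z = -1.5, approx. 0.9332
```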

Exercise 40. Normal distribution

An automobile manufacturer introduces a new model that averages 7.35 liters per 100 kilometers
in the city. A customer who plans to purchase one of these new cars wrote to the manufacturer for
the details of the tests and found out that the standard deviation is 0.25 liters per 100 kilometers.
Assuming that in-city consumption is approximately normal, calculate the probability that the
person will purchase a car that averages between 7 and 7.5 liters per 100 kilometers for in-city
driving.

Exercise 41. Normal distribution

The length, 𝑋 centimeters, of eels in a river may be assumed to be normally distributed with mean
48 and standard deviation 8. An angler catches an eel from the river. Determine the probability
that the length of the eel is (1) exactly 60 cm; (2) less than 60 cm; (3) within 5% of the mean
length.

Exercise 42. Normal distribution

A college senior who took the Graduate Record Examination (GRE) scored 620 on the Verbal
Reasoning (VR) section and 670 on the Quantitative Reasoning (QR) section. The mean score for
VR section was 462 with a standard deviation of 119, and the mean score for the QR was 584 with
a standard deviation of 151. Suppose that both distributions are approximately normal. (1) Write
down the short-hand notation for these two normal distributions. (2) What is the participant’s z-
score on the VR section? On the QR section? Draw a standard normal distribution curve and mark
these two z-scores. What do these z-scores tell? (3) Relative to others, which section did she do
better on? Explain why simply comparing her raw scores from the two sections would lead to the
incorrect conclusion that she did better on the QR section.26

Exercise 43. Normal distribution

The average daily temperature in October in Gotham City is 14°C with a standard deviation of 3°C.
Suppose that the temperatures closely follow a normal distribution. (1) What is the probability of

26 𝑉 ∼ 𝑁(462, 119) and 𝑄 ∼ 𝑁(584, 151). The z-scores are 𝑧𝑣=620 = (620 − 462)⁄119 = 1.3277 and 𝑧𝑞=670 = (670 − 584)⁄151 = 0.5695.
A z-score gives the distance, in standard deviations, between the candidate’s score and the mean. Thus, she scored
1.3277 standard deviations above the mean in the verbal reasoning section and 0.5695 standard deviations above the
mean in the quantitative reasoning section. Compared to the other participants, she did relatively better on the VR
section, given the higher z-score. Comparing the raw scores directly would be misleading because the two sections
follow normal distributions with different parameters.
observing a temperature 15.5°C or higher in Gotham City during a randomly chosen day in
October? (2) What is the cutoff temperature for the coldest 10% of the days during October in
Gotham City?27

Exercise 44. Normal distribution

The weights, in grams, of the contents of tins of mackerel fillets are normally distributed with
mean 𝜇 and standard deviation 2.5. The value of 𝜇 may be adjusted as required. (1) Find the
proportion of tins with contents weighing between 125 grams and 130 grams when 𝜇 = 129. (2)
State, without proof, the value of 𝜇 which would maximize the proportion of tins with contents
weighing between 125 grams and 130 grams. (3) Find, to one decimal place, the value of 𝜇 such
that 99% of the tins have contents weighing more than 125 grams.28

Exercise 45. Normal distribution

Garden canes have lengths that are normally distributed with mean 208.5 cm and standard
deviation 2.5 cm. Ten canes are selected at random. Calculate the probability that exactly 6 of
these canes have lengths between 205 cm and 210 cm.
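This exercise chains a normal probability into a binomial count; a minimal standard-library sketch:

```python
from math import comb
from statistics import NormalDist

L = NormalDist(208.5, 2.5)
p = L.cdf(210) - L.cdf(205)           # P(one cane is between 205 and 210 cm)
p6 = comb(10, 6) * p**6 * (1 - p)**4  # P(exactly 6 of 10 canes in that range)
print(round(p, 4), round(p6, 4))
```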

Exercise 46. Normal distribution

A researcher states that the distribution of car insurance premiums for drivers in Ile de France
(IDF) is approximately normal with a mean of €800. The researcher also states that 25% of IDF
residents pay more than €1,000. What is the cutoff value of the insurance cost for the 75th
percentile? Calculate the standard deviation of insurance premiums in IDF.
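Since 25% pay more than €1,000, that amount is the 75th percentile, and 𝜎 then follows from the corresponding z-score; a quick sketch:

```python
from statistics import NormalDist

mu = 800
z75 = NormalDist().inv_cdf(0.75)   # 75th percentile of Z, ~0.6745
sigma = (1000 - mu) / z75          # solve 0.6745 = (1000 - mu)/sigma
print(round(sigma, 1))
```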

Exercise 47. Normal distribution

The time, 𝑋 minutes, taken by Freddy to install a satellite dish follows approximately a normal
distribution with mean 134 and standard deviation 16. Determine, to one decimal place, the time
exceeded by 10 per cent of installations.

27 30.85%. 10.15°C.
28 𝑋 ∼ 𝑁(𝜇, 2.5). (1) For 𝜇 = 129, 𝑥1 = 125 → 𝑧125 = −1.6 and 𝑥2 = 130 → 𝑧130 = 0.4. The proportion of tins
weighing between 125 and 130 grams is calculated by 𝑃(125 ≤ 𝑋 ≤ 130) = 𝑃(−1.6 ≤ 𝑍 ≤ 0.4) where 𝑍 is the
standard normal random variable. 𝑃(−1.6 ≤ 𝑍 ≤ 0.4) = Φ(0.4) − Φ(−1.6) = 0.6554 − 0.0548 = 60.06%. (2)
Because the normal distribution is symmetric, we would maximize the proportion of tins weighing between 125 and
130 grams by setting the mean to 127.5 grams. (3) 𝑃(𝑋 ≥ 125) = 0.99 = 𝑃(𝑍 ≥ 𝑧125 ). On the normal table, we find
the 1st percentile of the standard normal random variable because 99% of the density is above the 1st percentile. This
is equal to −2.3265. The mean that yields this score can then be solved from the equation −2.3265 = (125 − 𝜇)⁄2.5.
We obtain 𝜇 = 130.8.
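The footnote's two numerical answers can be checked directly with the standard library (`inv_cdf` replaces the table lookup of the 1st percentile of 𝑍):

```python
from statistics import NormalDist

sigma = 2.5
# (1) proportion between 125 g and 130 g when mu = 129
prop = NormalDist(129, sigma).cdf(130) - NormalDist(129, sigma).cdf(125)
# (3) mean such that 99% of tins exceed 125 g: solve z_0.01 = (125 - mu)/sigma
z01 = NormalDist().inv_cdf(0.01)   # ~ -2.3263
mu = 125 - sigma * z01
print(round(prop, 4), round(mu, 1))
```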
The time, 𝑌 minutes, taken by Johnny to install a satellite dish also follows approximately a normal
distribution. We know the following: 𝑃(𝑌 ≤ 156.58) = 𝑃(𝑌 ≥ 203.52) ≈ 0.025. Determine to the
nearest minute, the values of the mean and standard deviation of 𝑌.

Exercise 48. Normal distribution

In the prologue of the Lord of the Rings, J.R.R. Tolkien describes Hobbits as being between two and
four feet tall, i.e. 60 to 122 cm, with an average height of about 107 cm. The Elves, on the other
hand, were generally about six feet tall, that is 182 cm. Assume that the heights of Hobbits and
Elves follow two normal random variables with 𝑋 ∼ 𝑁(105, 6) and 𝑌 ∼ 𝑁(182, 8), and that Frodo
Baggins, who is a Hobbit, was 114 cm tall and Legolas, who is an Elf, 188 cm. Who can be
considered taller relative to his own race?

Exercise 49. Normal distribution

Suppose weights of the checked baggage of airline passengers follow a nearly normal distribution
with mean 20 kg and standard deviation 4 kg. Consider a company that charges a fee for baggage
that weighs more than 22 kg. Instead of charging per extra kilogram, the company prefers a
stepwise scheme. The dollar amounts charged for excess weights are given in the table below:
Baggage Extra fee
22 to 24.999 kg $12
25 to 29.999 kg $20
30 kg or more $40
Determine what percent of airline passengers incur an extra fee. Calculate the expected income
from extra baggage fees per passenger. (Note: Since the distribution is continuous and for
simplicity, you can round the numbers to the closest integer after the decimals and assume there
is no difference between 𝑃(𝑋 ≤ 24.999) and 𝑃(𝑋 ≤ 25)).
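A sketch of the fee computation, using the simplification stated in the note (band edges treated as exact):

```python
from statistics import NormalDist

W = NormalDist(20, 4)
p12 = W.cdf(25) - W.cdf(22)    # 22-25 kg band -> $12
p20 = W.cdf(30) - W.cdf(25)    # 25-30 kg band -> $20
p40 = 1 - W.cdf(30)            # 30+ kg -> $40
p_fee = p12 + p20 + p40        # share of passengers paying any fee
income = 12 * p12 + 20 * p20 + 40 * p40  # expected fee per passenger, dollars
print(round(p_fee, 4), round(income, 2))
```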

Exercise 50. Normal distribution

The distance, in kilometers, travelled to work by the employees of a city council may be modelled
by a normal distribution with mean 7.5 and standard deviation 2.5. (1) Find 𝑑 such that 10% of
the employees travel less than 𝑑 kilometers to work. (2) Find the probability that the mean distance
travelled to work by a random sample of 6 of the council’s employees is less than 5 km.
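Part (2) uses the fact that the mean of 𝑛 independent draws has standard deviation 𝜎⁄√𝑛; a sketch:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.5, 2.5, 6
d = NormalDist(mu, sigma).inv_cdf(0.10)      # (1) 10th percentile of distance
p = NormalDist(mu, sigma / sqrt(n)).cdf(5)   # (2) P(sample mean < 5 km)
print(round(d, 2), round(p, 4))
```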

Exercise 51. Normal distribution

A gas supplier keeps a team of engineers who are available to deal with leaks reported by
customers. Most reported leaks can be dealt with quickly, but some need a long time. The time

(excluding travelling time) taken to deal with reported leaks is found to have a mean of 65 minutes
and a standard deviation of 60 minutes.

Assuming that the times may be modelled by a normal distribution, estimate the probability that
(1) it will take more than 185 minutes to deal with a reported leak; (2) it will take between 50
minutes and 125 minutes to deal with a reported leak; (3) the mean time to deal with a random
sample of 90 reported leaks is less than 70 minutes.

A statistician, consulted by the gas supplier, stated that, as the times had a mean of 65 minutes
and a standard deviation of 60 minutes, the normal distribution would not provide an adequate
model. (4) Explain the reason for the statistician’s statement. (5) Explain why, despite the
statistician’s statement, your answer to (3) is still valid.29
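A sketch of the three probabilities; part (3) again relies on the 𝜎⁄√𝑛 scaling of the sample mean:

```python
from math import sqrt
from statistics import NormalDist

T = NormalDist(65, 60)
p1 = 1 - T.cdf(185)                         # (1) more than 185 minutes
p2 = T.cdf(125) - T.cdf(50)                 # (2) between 50 and 125 minutes
p3 = NormalDist(65, 60 / sqrt(90)).cdf(70)  # (3) mean of 90 leaks < 70 minutes
print(round(p1, 4), round(p2, 4), round(p3, 4))
```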

Exercise 52. Normal distribution

According to a Wikipedia article, nearly 95% of newborns of European heritage weigh between
2500 and 5000 grams. It is also known that the birth weight of babies of European heritage nearly
follows a normal distribution, 𝑋 ∼ 𝑁(𝜇, 𝜎). Calculate the mean 𝜇 and standard deviation 𝜎 of the
corresponding normal random variable 𝑋.
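Treating 2500 and 5000 grams as the central 95% interval, symmetry gives 𝜇 and the 97.5th-percentile z-score gives 𝜎; a sketch:

```python
from statistics import NormalDist

lo, hi = 2500, 5000                 # central ~95% of birth weights, grams
mu = (lo + hi) / 2                  # symmetry of the normal density
z975 = NormalDist().inv_cdf(0.975)  # ~1.96
sigma = (hi - mu) / z975
print(round(mu), round(sigma, 1))
```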

Exercise 53. Normal distribution

Engineers in an industrial hardware company collect statistics to calibrate a specific hydraulic
press. A well calibrated press produces 99% of its output 𝑋 between 3992 and 4008 grams.
Assuming a normal distribution fits the weight 𝑋 of each unit of output delivered by the press,
calculate the mean 𝜇 and standard deviation 𝜎 of 𝑋 in grams.

Exercise 54. Normal distribution

Suppose the travel time from your home to your office is a normal random variable as 𝑋 ∼
𝑁(𝜇 = 40 𝑚𝑖𝑛𝑢𝑡𝑒𝑠, 𝜎 = 8 𝑚𝑖𝑛𝑢𝑡𝑒𝑠). If you want to be 95% certain that you will not be late for an
office appointment at 9 A.M., what is the latest time that you should leave home?30

Exercise 55. Normal distribution

29 (4) The mean is only just over one standard deviation above zero, so a normal model would give a substantial
probability to negative times, which is impossible. (5) The sample is large and the observations are independent, so
the sample mean is approximately normally distributed by the Central Limit Theorem.
30 To be 95% certain of not being late to an appointment, we should find the 95th quantile of 𝑋 ∼ 𝑁(40, 8). Using the
normal table, we get 𝑧0.95 = 1.645. Solving for 𝑥 in 1.645 = (𝑥 − 40)⁄8, we get 𝑥 = 53.16 minutes. Therefore, one
should leave home by about 8:06 A.M. to be 95% certain of not being late to the appointment.
Based on the current public health policy in France, children's health status is monitored by means
of a "health booklet" which is first established at birth. One of the sections of this booklet shows the
children's weight as a function of their age, expressed in months. A casual observation of the weight
curves suggests the following for 15-month-old children:

25th percentile 75th percentile
9.1 kg 11.2 kg

Let 𝑋 denote the weight of a 15-month-old child born and raised in France. Assume
for simplicity that 𝑋 follows a normal distribution. Calculate the mean 𝜇 and the standard
deviation 𝜎 of 𝑋. Hint: The 0.25 and 0.75 quantiles of a standard normal random variable are,
respectively, −0.67449 and 0.67449.31

Exercise 56. Normal distribution

A well-known brand sells chocolate cookies. Suppose that the weight of one such cookie is a
normally distributed random variable with mean 𝜇 and standard deviation 𝜎, i.e. 𝑋 ∼ 𝑁(𝜇, 𝜎). The
parameters of 𝑋 are unknown. Suppose we draw a random sample of chocolate cookies and get
the following summary statistics (numbers given in grams):

min. 1st quartile 3rd quartile max.
19.91 22.33 23.73 26.13

Calculate the mean 𝜇 and standard deviation 𝜎 of 𝑋.32

Exercise 57. Normal approximation to binomial distribution

Smoking constitutes the primary cause of preventable premature deaths in France. Researchers
find that one smoker out of two is likely to develop severe health problems during his life for
reasons linked to smoking. This is also a common public health problem among young people.
Indeed, nation-wide statistics provided by INSEE suggest that 32% of the teenagers in France are
smokers. Consider now a survey carried out across 1,200 teenagers in France. If the proportion
reported by INSEE corresponds to the true population parameter, calculate the probability of
finding 380 or more smokers in this sample.
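A sketch using the normal approximation 𝑁(𝑛𝑝, √(𝑛𝑝(1 − 𝑝))) developed later in these notes (no continuity correction):

```python
from math import sqrt
from statistics import NormalDist

n, p = 1200, 0.32
mu, sigma = n * p, sqrt(n * p * (1 - p))   # mean 384, sd ~16.16
p380 = 1 - NormalDist(mu, sigma).cdf(380)  # P(X >= 380)
print(round(p380, 4))
```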

Exercise 58. Normal approximation to binomial distribution

31 Using the 25th and 75th percentiles of the standard normal random variable, we solve the mean and standard
deviation of 𝑋 by means of the two equations with two unknowns as,
−0.67449 = (9.1 − 𝜇)⁄𝜎 and 0.67449 = (11.2 − 𝜇)⁄𝜎
We find 𝜇 = 10.15 kg and 𝜎 = 1.5567 kg.
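The footnote's two-equation system reduces to a midpoint and a scaled interquartile distance; a sketch:

```python
from statistics import NormalDist

q25, q75 = 9.1, 11.2
z = NormalDist().inv_cdf(0.75)   # 0.67449
# q25 = mu - z*sigma and q75 = mu + z*sigma imply:
mu = (q25 + q75) / 2
sigma = (q75 - q25) / (2 * z)
print(round(mu, 2), round(sigma, 4))
```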
32 We know 𝑃(𝑋 ≤ 22.33) = 25% and 𝑃(𝑋 ≤ 23.73) = 75%. Standardizing, 𝑃(𝑍 ≤ 𝑧1) = 25% and 𝑃(𝑍 ≤ 𝑧2) = 75%.
It turns out that 𝑧1 = −0.6745 and 𝑧2 = 0.6745. We can build the system of two equations with two unknowns as
−0.6745 = (22.33 − 𝜇)⁄𝜎 and 0.6745 = (23.73 − 𝜇)⁄𝜎. Solving for 𝜇 and 𝜎 we get 𝜇 = 23.03 and 𝜎 = 1.0378.
The management of a large store estimates that on average 30% of the people entering the store
make a purchase. (1) Using the normal approximation to binomial distribution, find the
probability that out of 30 people entering the store, 10 or more will make a purchase. (2) Calculate
also the same probability using the binomial model on software (like MS Excel) and compare with
the result you obtained in the previous question. Does the normal approximation yield a result
close to the one you got from the binomial model?33

Exercise 59. Normal approximation to binomial distribution

It is believed that nearsightedness (myopia) affects about 8% of all children. Consider a sample
of 200 children. Using the normal approximation to the binomial distribution, calculate the
probability of finding at most 12 nearsighted children in this sample, i.e. 𝑃(𝑋 ≤ 12).
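A sketch with the normal approximation (no continuity correction):

```python
from math import sqrt
from statistics import NormalDist

n, p = 200, 0.08
mu, sigma = n * p, sqrt(n * p * (1 - p))  # mean 16, sd ~3.84
p12 = NormalDist(mu, sigma).cdf(12)       # P(X <= 12)
print(round(p12, 4))
```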

Exercise 60. Normal approximation to binomial distribution

OECD statistics on the prevalence of “violence against women” suggest that 26% of women in
France have experienced physical and/or sexual violence from an intimate partner at some time
in their life.34 Concerned by this statistic, a public agency has conducted a survey among 1,240
women living in metropolitan France. Among the participants, 288 declared having been victims
of such violence from an intimate partner. (1) Which distribution describes the number of victims
of violence in a survey with 1,240 participants? What is the expected number of victims? What is
the standard deviation? (2) What is the probability of finding 288 or fewer victims, i.e.
𝑃(𝑋 ≤ 288), in this survey?

Exercise 61. 𝒕 distribution

Let 𝑋 be a 𝑡 random variable at 𝑘 = 5 degrees of freedom, 𝑋 ∼ 𝑇(𝑘 = 5). What is 𝑃(𝑋 ≥ 0.7267)?
What is 𝑃(𝑋 ≥ 2.015)?

Exercise 62. 𝒕 distribution

Let 𝑋 ∼ 𝑇(𝑘 = 12). Using the t probability table, calculate the following: (1) 𝑃(𝑋 ≥ 0.6955); (2)
𝑃(𝑋 ≤ 1.3562); (3) 𝑃(−2.1788 ≤ 𝑋 ≤ 2.1788); (4) 𝑃(−0.6955 ≤ 𝑋 ≤ 2.1788).

33 𝑋 ∼ 𝐵(𝑛 = 30, 𝑝 = 0.3). The probability that 10 or more customers will make a purchase is given by the cumulative
binomial probabilities 𝑃(𝑋 ≥ 10) = 𝑃(𝑋 = 10) + 𝑃(𝑋 = 11) + ⋯ + 𝑃(𝑋 = 30). Using the normal approximation, 𝑋 ∼
𝑁(𝜇 = 30 × 0.3 = 9, 𝜎 = √(30 × 0.3 × 0.7) = 2.51), we calculate 𝑃(𝑋 ≥ 10) ≈ 1 − 𝑃(𝑍 ≤ 0.3984) = 0.3452. Using
software, we calculate the exact binomial probability as 𝑃(𝑋 ≥ 10) = 1 − 𝑃(𝑋 ≤ 9) = 0.4112. The difference is not
negligible because the number of trials 𝑛 is not sufficiently high for the approximation to yield a precise answer.
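The comparison can be reproduced exactly with the standard library; note that for the discrete tail, 𝑃(𝑋 ≥ 10) = 1 − 𝑃(𝑋 ≤ 9):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.3
# exact binomial tail P(X >= 10)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(10, n + 1))
# normal approximation, no continuity correction
approx = 1 - NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(10)
print(round(exact, 4), round(approx, 4))
```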
34 https://data.oecd.org/inequality/violence-against-women.htm

Exercise 63. 𝒕 distribution

Let 𝑋 ∼ 𝑇(𝑘 = 12). Using the 𝑡 probability table, calculate the following: (1) 𝑃(𝑋 ≥ 1.5); (2)
𝑃(𝑋 ≤ 2.3); (3) 𝑃(𝑋 ≤ −2); (4) 𝑃(−2 ≤ 𝑋 ≤ 2). If it is not possible to calculate the exact value,
you can suggest a range of values for each question.

Exercise 64. 𝒕 distribution

Let 𝑋 ∼ 𝑇(𝑘 = 10). Using the t-probability table, calculate (1) 𝑃(𝑋 ≥ 𝑥) = 0.25 → 𝑥 = ?; (2)
𝑃(𝑋 ≤ 𝑥) = 0.9 → 𝑥 = ?; (3) 𝑃(𝑋 ≤ 𝑥) = 0.05 → 𝑥 = ?; (4) 𝑃(𝑥 ≤ 𝑋 ≤ |𝑥|) = 0.99 → 𝑥 = ? 35

Exercise 65. 𝒕 distribution

What are the 95th and 99th percentiles of a 𝑡 random variable at 50 degrees of freedom?

Exercise 66. 𝝌𝟐 distribution

Let 𝑋 ∼ 𝜒²(𝑘 = 8). Using the chi-square probability table calculate the following: (1) 𝑃(𝑋 ≥ 2.18);
(2) 𝑃(𝑋 ≤ 2.73); (3) 𝑃(3.49 ≤ 𝑋 ≤ 13.36).

Exercise 67. 𝝌𝟐 distribution

Let 𝑋 ∼ 𝜒²(𝑘 = 8). Using the chi-square probability table calculate the following: (1) 𝑃(𝑋 ≥ 𝑥) =
0.975 → 𝑥 = ?; (2) 𝑃(𝑋 ≤ 𝑥) = 0.05 → 𝑥 = ?; (3) 1 − 𝑃(𝑋 ≤ 𝑥) = 0.1 → 𝑥 = ?

Exercise 68. Discrete and continuous distributions

David started a 3-month long workout program of free running. Below are the records of the
number of times he went out running, as well as the total distance in kilometers he covered, per week.

week 1 2 3 4 5 6 7 8 9 10 11 12
workouts 2 2 3 2 3 3 2 3 3 3 3 2
distance 16.20 16.00 23.25 17.00 24.60 24.15 15.70 23.10 22.50 23.85 24.60 16.00

Suppose David decides to continue his training program during the next 3 months. Assume that
his performance will remain unchanged and that the number of times he runs in a given week, as
well as the total distance he runs, are not correlated over time. (1) Calculate the probability that he
runs only once in a given week during the next 3 months. (2) Calculate the probability that he runs
30 times during the next 3-month period. (3) Assume that a normal model is appropriate to
parameterize the distance he runs on a given workout. Calculate first the average distance per
workout and the sample standard deviation of the distance in kilometers. Calculate then the
probability that David runs more than 8 kilometers during his next workout.

35 (1) 𝑥 = 0.6998; (2) 𝑥 = 1.3722; (3) 𝑥 = −1.8125; (4) 𝑥 = −3.1693.

Dropped. Uniform distribution

A random variable 𝑋 that follows a discrete uniform distribution with parameters 𝑎 and 𝑏, 𝑎 < 𝑏,
has the following probability mass function (pmf),

𝑓(𝑥; 𝑎, 𝑏) = 𝑃(𝑋 = 𝑥) = 1⁄(𝑏 − 𝑎 + 1) (28)

The cumulative distribution function (cdf) is defined as,

𝑥−𝑎+1
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥) = (29)
𝑏−𝑎+1

It can be shown that 𝐸(𝑋) = (𝑎 + 𝑏)⁄2 and 𝑉𝑎𝑟(𝑋) = ((𝑏 − 𝑎 + 1)² − 1)⁄12. Graphically, the
discrete uniform pmf is flat: each of the 𝑏 − 𝑎 + 1 support points carries the same probability mass.
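The pmf and cdf above can be sketched in a few lines (the fair-die example is an illustrative assumption):

```python
def pmf(x, a, b):
    """Equal mass 1/(b - a + 1) on each integer a, a+1, ..., b."""
    return 1 / (b - a + 1) if a <= x <= b else 0.0

def cdf(x, a, b):
    """F(x) = (x - a + 1)/(b - a + 1) for integer x in [a, b]."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a + 1) / (b - a + 1)

a, b = 1, 6                        # a fair die
mean = (a + b) / 2                 # E(X) = 3.5
var = ((b - a + 1) ** 2 - 1) / 12  # Var(X) = 35/12
print(pmf(3, a, b), cdf(4, a, b), mean, round(var, 4))
```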

Dropped

Negative binomial distribution (Pascal distribution)

Let 𝑋1, 𝑋2, … , 𝑋𝑛 be a sequence of Bernoulli trials.

Definition: The random variable 𝑋 follows a negative binomial distribution if it represents the
number of trials necessary to obtain a given number 𝑟 of successes in a sequence of Bernoulli trials.

The probability function of a negative binomial variable, denoted 𝑋 ∼ 𝑁𝐵(𝑟, 𝑝), is,

𝑓𝑋(𝑥; 𝑟, 𝑝) = 𝑃(𝑋 = 𝑥) = (𝑥 − 1 choose 𝑟 − 1) 𝑝^𝑟 (1 − 𝑝)^(𝑥−𝑟) (30)

The expected value is 𝐸(𝑋) = 𝑟⁄𝑝 and the variance 𝑉𝑎𝑟(𝑋) = 𝑟(1 − 𝑝)⁄𝑝2 .

Notice that in the case of the binomial distribution, the random variable is described as the number
of successes at the end of a given number of trials, 𝑋 ∼ 𝐵(𝑛, 𝑝), whereas in the case of the negative
binomial distribution, the random variable represents the number of trials needed to obtain a
given number of successes, 𝑋 ∼ 𝑁𝐵(𝑟, 𝑝). The relationship between the binomial and negative
binomial distributions can be summarized as follows:

𝑃(𝑋𝑁𝐵 ≤ 𝑛) = 𝑃(𝑋𝐵 ≥ 𝑟) (31)

Suppose we roll a die repeatedly and count the number of sixes. If we continue rolling the die
until it turns up 6 twice, we are conducting a negative binomial experiment. The negative binomial
variable is then the number of rolls required to obtain 6 two times.

As for the binomial model, some conditions must be met to apply the negative binomial
distribution. First, each trial must be an independent Bernoulli variable. Second, the probability of
success or failure must remain the same across trials. Finally, the last trial must be a success.

Example. Suppose we roll a fair die and consider 6 as the success. What is the probability of getting
the second success on the 10th roll?

Solution: The experiment fits a negative binomial distribution. So,

𝑃(𝑋 = 10) = (10 − 1 choose 2 − 1) (1⁄6)² (1 − 1⁄6)^(10−2) = 5.81%

Example. A factory produces various auto parts. Engineers know that 5% of the fan switches
produced by a machine are deficient. Under a quality control operation, what is the probability
that the third deficient fan switch is found on the 10th inspected part?

Solution: We need the probability of getting the third success on the tenth trial. Evaluating
the pmf, we calculate 𝑃(𝑋 = 10) = (10 − 1 choose 3 − 1) × 0.05³ × 0.95⁷ = 0.31%.
Dropped

Poisson distribution: Bortkiewicz (1898) data

The classic Poisson example is the data set of von Bortkiewicz (1898), for the chance of a Prussian
cavalryman being killed by the kick of a horse.36 The data show the number of deaths from horse
or mule kicks for 14 army corps observed from 1875 to 1894. There are 196 deaths reported out
of a total of 280 corps-years of observation. This yields an average of 196/280 = 0.7 deaths per
corps per year. Therefore, in any given year, we expect to observe 0.7 deaths in one corps. This is
a classic Poisson situation: a rare event, whose average rate is small, with observations made over
many small intervals of time.

Using the von Bortkiewicz data, calculate the probability that there are no deaths by horse kick in
a given corps in a given year.

Solution: The rate is 𝜆 = 0.7. Using the pmf of the Poisson distribution, we calculate

𝑃(𝑋 = 0) = (0.7)⁰𝑒^(−0.7)⁄0! = 0.4966
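A one-line check of the horse-kick probability:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """Poisson pmf: lam^k e^(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

# probability of zero deaths in one corps in one year
print(round(pois_pmf(0, 0.7), 4))
```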

36 The original dataset can be imported into the R software as part of the package “vcd”.
Dropped

Normal approximation to binomial distribution

Let 𝑋 ∼ 𝐵(𝑛, 𝑝), a binomial random variable with parameters 𝑛 and 𝑝. For large 𝑛, it can be shown
that the binomial distribution can be well approximated by a normal distribution with
parameters 𝜇 = 𝑛𝑝 and 𝜎 = √(𝑛𝑝(1 − 𝑝)), i.e. 𝑋 ∼ 𝐵(𝑛, 𝑝) → 𝑁(𝑛𝑝, √(𝑛𝑝(1 − 𝑝))).

To see this, consider a Bernoulli process 𝑋1, … , 𝑋𝑛 and define 𝑌 = 𝑋1 + ⋯ + 𝑋𝑛. Each 𝑋𝑖 shares the
same probability of success 𝑝. Assume we want to calculate 𝑃(𝑌 ≤ 𝑦). Since 𝑌 is the sum of 𝑛
independently and identically distributed random variables (recall the definition of a Bernoulli
process), then, for large 𝑛, the Central Limit Theorem states that the following transformation of
𝑌 is approximately distributed as a standard normal random variable:

𝑍 = (𝑌 − 𝑛𝑝)⁄√(𝑛𝑝(1 − 𝑝)) → 𝑁(0, 1)

In practice, the accuracy of the approximation can be easily verified using statistical software. If
not, a rule of thumb consists in using the normal approximation if the products 𝑛𝑝 and 𝑛(1 − 𝑝)
are both greater than 5. In addition, the farther the probability of success of each Bernoulli trial 𝑝
is away from 0.5, the larger 𝑛 is needed. For example, if 𝑝 = 0.9, then to obtain 𝑛(1 − 𝑝) ≥ 5, we
would have to have at least 50 observations.

Example. Consider a sequence of Bernoulli variables X1 , … , X120 each with success probability p =
0.25. What is the probability of observing, say, 32 or more successes in this sequence? Using the
binomial model, we must calculate P(X ≥ 32) = P(X = 32) + ⋯ + P(X = 120), which would take
too much time using pen & paper unless one makes use of software. Since n is large, however, an
alternative consists in approximating the original binomial distribution by a normal distribution
with parameters μ = np = 30 and σ = √0.25 × (1 − 0.25) × 120 = 4.74, i.e. X ∼ B(n = 120, p =
0.25) can be replaced by X ∼ N(μ = 30, σ = 4.74). So, we get P(X ≥ 32) = 1 − P(X ≤ 32) = 1 −
P(Z ≤ 0.4216) = 0.3366. To verify how good the approximation is, we can use Excel. In fact, the true
binomial model gives P(X ≥ 32) = 1 − P(X ≤ 31) ≈ 0.37. The difference between the binomial and
normal probabilities is not negligible. The approximation will do better as n gets larger.
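The example's comparison can be reproduced with the standard library; the exact discrete tail is 1 − P(X ≤ 31):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 0.25
# exact binomial tail P(X >= 32)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(32, n + 1))
# normal approximation, no continuity correction
approx = 1 - NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(32)
print(round(exact, 4), round(approx, 4))
```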

Dropped

Exercise:

Let 𝑋 ∼ 𝑁(𝜇, 𝜎). Show that the following variable 𝑍 = (𝑋 − 𝜇)⁄𝜎 follows 𝑍 ∼ 𝑁(0, 1).

Solution:

We need to show that the random variable follows 𝑍 ∼ 𝑁(0, 1). So, let’s find the cdf of 𝑍, which is
commonly denoted Φ(𝑧). The cdf is,

𝐹(𝑧) = 𝑃(𝑍 ≤ 𝑧) = 𝑃((𝑋 − 𝜇)⁄𝜎 ≤ 𝑧)

which, by rearranging and using the normal pdf, is equivalent to,


𝐹(𝑧) = 𝑃(𝑋 ≤ 𝜇 + 𝑧𝜎) = ∫_{−∞}^{𝜇+𝑧𝜎} (1⁄(𝜎√(2𝜋))) exp{−(1⁄2)((𝑥 − 𝜇)⁄𝜎)²} 𝑑𝑥

To perform the integration, let’s use the change of variables technique. Let

𝑎 = (𝑥 − 𝜇)⁄𝜎

Therefore, 𝑥 = 𝜎𝑎 + 𝜇 and 𝑑𝑥 = 𝜎 𝑑𝑎.

For the endpoints of the integral: if 𝑥 = −∞, then 𝑎 also equals −∞; and if 𝑥 = 𝜇 + 𝑧𝜎, then 𝑎 =
(𝜇 + 𝑧𝜎 − 𝜇)⁄𝜎 = 𝑧. Therefore, after making all of the substitutions for 𝑥, 𝑎 and 𝑑𝑥, the integral looks
like this:

𝐹(𝑧) = ∫_{−∞}^{𝑧} (1⁄(𝜎√(2𝜋))) exp{−𝑎²⁄2} 𝜎 𝑑𝑎

And since the 𝜎 in the denominator cancels out the 𝜎 in the numerator, we get

𝐹(𝑧) = ∫_{−∞}^{𝑧} (1⁄√(2𝜋)) exp{−𝑎²⁄2} 𝑑𝑎

We should now recognize this as the cdf of a standard normal random variable, i.e. one with mean
𝜇 = 0 and standard deviation 𝜎 = 1. The proof is complete.
