BSU5335 – Unit I Session 06: Probability Distributions

Session 6

Probability Distributions

Contents
Introduction
6.1 The Bernoulli distribution
6.2 Binomial distribution
6.3 Poisson distribution
6.4 Normal distribution
Summary
Learning Outcomes

Introduction

Health professionals very often use the term ‘probability’ in their day-to-day
work to express the level of uncertainty of health events. Most of
the information related to patient diagnosis and treatment outcome is
uncertain, so we cannot predict outcomes in health care with 100%
assurance. For example, a cardiothoracic surgeon may say that a patient
has an 80% chance of surviving if he/she undergoes Coronary Artery Bypass
Graft (CABG) surgery. A nurse may say that about one fourth of her
patients do not follow the medical advice given to them. Further, it is
observed that most of the data collected in health sciences research follow
some specific patterns. A pre-specified pattern of a data set is known as the
distribution of the data set. Such data sets are said to have parametric
distributions. However, there are data sets that do not follow any such
pattern, and these are said to have non-parametric distributions. In
session 1, we learned that a random variable is a variable whose possible

46 Copyright © 2020, The Open University of Sri Lanka


values are numerical outcomes of a random phenomenon, and that there are
two types of random variables, discrete and continuous. A discrete random
variable can take only a countable number of distinct values, for example, the
number of emergency care admissions per week in a base hospital. The
list of probabilities associated with each of the possible values of a
discrete random variable is called the probability function (or probability
distribution) of that variable. A function that gives the probability of
each possible outcome of a discrete variable is called the
probability mass function.
A continuous random variable, on the other hand, can take an infinite
number of possible values. In health sciences, continuous random variables
are usually health measurements such as weight and blood glucose level.
Such variables are usually defined over an interval of values, and their
probabilities are represented by areas under a curve. The function that
describes this curve is called the probability density function; the
probability that the variable falls in a given interval equals the area
under the curve over that interval.
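In symbols, the two definitions can be summarized as follows. The notation p(x) for a probability mass function and f(x) for a probability density function is assumed here for illustration and is not taken from the session text.

```latex
\[
p(x) = P(X = x), \qquad \sum_{x} p(x) = 1 \quad \text{(discrete)}
\]
\[
P(a \le X \le b) = \int_{a}^{b} f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1 \quad \text{(continuous)}
\]
```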

6.1 The Bernoulli distribution


The Bernoulli distribution is the simplest discrete distribution. It
describes the probability of only two possible mutually
exclusive outcomes. If X is a Bernoulli random variable with success
probability p, then

\[
P(X = x) =
\begin{cases}
p & \text{if } x = 1 \\
1 - p & \text{if } x = 0
\end{cases}
\]

These outcomes are often described as a ‘success’ (yes), denoted by 1,
and a ‘failure’ (no), denoted by 0. Suppose a study followed 1000 smokers
to identify those who developed lung cancer in recent years and
found that 90 of them developed lung cancer. The
probability that a smoker in the sample developed lung cancer is then
90 / 1000 = 0.09. Here the probability of success (developing lung cancer)
is 0.09 or 9% and the probability of failure is 0.91 or 91%.
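
The smoker example can be reproduced with a few lines of Python. This is a minimal sketch; the function name bernoulli_pmf and the variable p_hat are introduced here for illustration only.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return P(X = x) for a Bernoulli variable with success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0  # a Bernoulli variable takes only the values 0 and 1


# Estimate p from the study in the text: 90 cases among 1000 smokers.
p_hat = 90 / 1000

print(bernoulli_pmf(1, p_hat))  # probability of 'success' (lung cancer)
print(bernoulli_pmf(0, p_hat))  # probability of 'failure' (no lung cancer)
```

The two printed probabilities always add up to 1, because success and failure are the only possible outcomes.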

6.2 Binomial distribution


The binomial distribution is one of the most widely used discrete probability
distributions and it is an extension of the Bernoulli distribution. The
Bernoulli distribution describes the probability of a success in a sample of
one patient, whereas the binomial distribution describes the probability of
observing a given number of successes in a sample of more than one patient.
The binomial distribution describes the behavior of a discrete variable X, the
number of successes in n independent trials, each with success probability p.
Its probability mass function is given by

\[
P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n
\]

where n! = n × (n-1) × (n-2) × … × 2 × 1.

The mean and variance of the binomial distribution are given by

\[
\mu = np, \qquad \sigma^{2} = np(1-p)
\]
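
The binomial probability mass function, together with its mean np and variance np(1-p), can be checked numerically. The sketch below assumes illustrative values n = 10 and p = 0.3 that are not taken from any study; the helper name binomial_pmf is also introduced here for illustration.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Binomial(n, p)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Illustrative values only (assumed for this sketch).
n, p = 10, 0.3

# Probabilities of 0, 1, ..., n successes.
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)
mean = sum(x * px for x, px in enumerate(probs))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))

print(f"total probability = {total:.6f}")
print(f"mean     = {mean:.4f}  (np      = {n * p:.4f})")
print(f"variance = {variance:.4f}  (np(1-p) = {n * p * (1 - p):.4f})")
```

Summing over all possible values of x, the probabilities add up to 1, and the mean and variance computed directly from the probabilities agree with the closed-form expressions.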

Suppose in a cancer hospital, health records indicate that 80% of the patients
suffering from esophageal cancer eventually die of it. What is the
probability that, out of 6 randomly selected esophageal cancer patients,
a. four (4) patients will recover?

b. more than 4 patients will recover?

This follows a binomial distribution because each patient has only 2 possible
outcomes (either the patient would die or would not die).
Let X = number of patients who would recover (would not die)

Here n = 6, x = 4, p = 0.20 (success) and q = 1 - p = 0.80 (failure).
The probability that 4 patients will recover is equal to

\[
P(X = 4) = \frac{6!}{4!\,2!}\, (0.20)^{4} (0.80)^{2} = 15 \times 0.0016 \times 0.64
\]

Thus, P(X = 4) = 15 × 0.001024 = 0.01536

The probability that 4 patients will recover = 0.01536 or about 1.5%.

The probability that more than 4 patients will recover is

\[
P(X > 4) = P(X = 5) + P(X = 6) = 6\,(0.20)^{5}(0.80) + (0.20)^{6} = 0.001536 + 0.000064 = 0.0016
\]

The probability that more than 4 esophageal cancer patients will recover =
0.0016 or about 0.16%.
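
The two binomial probabilities in this example can be evaluated directly in Python. This is a minimal sketch, assuming n = 6 and p = 0.20 as stated in the example; the helper name binomial_pmf is introduced here for illustration.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Binomial(n, p)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 6, 0.20  # six patients, recovery probability 0.20

p_exactly_4 = binomial_pmf(4, n, p)
# "More than 4" means 5 or 6 recoveries out of 6.
p_more_than_4 = binomial_pmf(5, n, p) + binomial_pmf(6, n, p)

print(f"P(X = 4) = {p_exactly_4:.5f}")
print(f"P(X > 4) = {p_more_than_4:.5f}")
```

With these inputs the sum P(X = 5) + P(X = 6) evaluates to 0.0016, that is, about 0.16%.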

6.3 Poisson distribution

The Poisson distribution represents the number of events occurring
randomly in a fixed time at an average rate λ. The distribution is used to
describe counts of events that are rare in occurrence, such as
infant deaths. Suppose a researcher is interested in counting the number of
seizure relapses in a group of epilepsy patients over a period of 6 months.
The probability of observing one relapse, two relapses, etc., in a given
sample in such cases can be described using the Poisson distribution.

The probability mass function of the Poisson distribution is given by

\[
P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots
\]

The mean and variance of the Poisson distribution are both equal to λ:

\[
\mu = \lambda, \qquad \sigma^{2} = \lambda
\]

Example: Suppose that an emergency medicine unit registers on average
3 patients with paraquat poisoning per week. Calculate the probability that in
a given week the unit will register
a. one or more cases of paraquat poisoning
b. two (2) or more cases but fewer than 5 cases of paraquat poisoning

Here λ = 3 (the mean number of cases per week).

a. For one or more cases, we can work out the probability as 1
minus the probability of zero cases:

\[
P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-3}\, 3^{0}}{0!} = 1 - 0.0498 = 0.9502
\]

Thus, the probability of registering one or more cases in a week is about
95%.
b. The probability of registering 2 or more cases but fewer than 5 cases
can be calculated using

\[
P(2 \le X < 5) = P(X = 2) + P(X = 3) + P(X = 4) = 0.2240 + 0.2240 + 0.1680 \approx 0.616
\]

Thus, the probability of registering 2 or more cases but fewer than 5
cases in a week is approximately 62%.
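
Both parts of this example can be checked in Python. This is a minimal sketch, assuming a mean rate of 3 cases per week as stated; the helper name poisson_pmf and the variable lam are introduced here for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Poisson(lam)."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam**x / factorial(x)


lam = 3  # average number of cases per week, as in the example

# a. One or more cases: complement of zero cases.
p_one_or_more = 1 - poisson_pmf(0, lam)

# b. Two or more but fewer than five cases: x = 2, 3, 4.
p_two_to_four = sum(poisson_pmf(x, lam) for x in range(2, 5))

print(f"P(X >= 1)     = {p_one_or_more:.4f}")
print(f"P(2 <= X < 5) = {p_two_to_four:.4f}")
```

Using the complement in part a avoids summing an infinite number of terms, since a Poisson count has no upper limit.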

6.4 Normal distribution


The normal distribution refers to a family of continuous probability
distributions described by the normal equation.
The normal probability density function is given by the following
equation,

\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
\]

where x is a value of the normal random variable X, µ is the mean and σ is the
standard deviation.
The value of π is approximately 3.1416 and the value of e is approximately
2.718.
The normal distribution is a continuous probability distribution. Continuous
variables usually contain a very large number of outcomes. We can draw a
frequency distribution curve for a normally distributed variable and this
curve is generally referred to as normal curve.
Properties of normal distribution

• The total area under the normal curve is equal to 1 (or 100%).

• The mean, median and mode are equal.

• The probability that X is greater than the value “a” equals the area
under the normal curve bounded by “a” and plus infinity (as
indicated by the non-shaded area in Figure 6.1 given below).

• Also, the probability that X is less than “a” equals the area under the
normal curve bounded by “a” and minus infinity (as indicated by the
shaded area in Figure 6.1 given below).

Figure 6.1: Normal distribution
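
The two tail-area properties can be illustrated numerically. The sketch below computes the area to the left of a value “a” using the error function from Python's standard library. The values of mu, sigma and a are hypothetical and chosen only for illustration, and the helper name normal_cdf is introduced here.

```python
from math import erf, sqrt


def normal_cdf(a: float, mu: float, sigma: float) -> float:
    """Return P(X < a) for a normal variable with mean mu and SD sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1 + erf((a - mu) / (sigma * sqrt(2))))


# Hypothetical values chosen only for illustration.
mu, sigma, a = 100.0, 15.0, 115.0

p_less = normal_cdf(a, mu, sigma)  # area to the left of a
p_greater = 1 - p_less             # area to the right of a

print(f"P(X < a) = {p_less:.4f}")
print(f"P(X > a) = {p_greater:.4f}")
```

Because the total area under the curve is 1, the two areas always add up to 1, and the area to the left of the mean is exactly 0.5.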

Additionally, every normal curve (regardless of its mean or standard
deviation) conforms to the following.
About 68% of the area under the curve falls within 1 standard deviation of
the mean, about 95% of the area under the curve falls within 2 standard
deviations of the mean and about 99.7% of the area under the curve falls
within 3 standard deviations of the mean (Figure 6.2).

Figure 6.2: Normal distribution (horizontal axis marked at -3SD, -2SD, -1SD, Mean, +1SD, +2SD and +3SD)
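
The 68%, 95% and 99.7% figures can be reproduced with the error function from Python's standard library. This is a minimal sketch; the function name area_within is introduced here for illustration. The result does not depend on the mean or standard deviation, which is why no values for them are needed.

```python
from math import erf, sqrt


def area_within(k: float) -> float:
    """Area under any normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```

The computed areas are approximately 0.6827, 0.9545 and 0.9973, which round to the percentages quoted in the text.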

A detailed explanation of the normal curve and the standard normal curve
(z curve) is given in Session 7.


Activity 6.1

1. Suppose road traffic accidents occur according to a Poisson distribution with an
average of 5 accidents every week. Calculate the probability that there will be no
more than one accident during a particular week.

2. Suppose in a particular MOH office, 30% of the mothers are anemic. What is the
probability that out of 10 randomly selected mothers,
(a) 3 mothers are non-anemic?
(b) 5 or more mothers are anemic?

Summary

• Health professionals use probability in their day-to-day work to express
the level of uncertainty of health events that occur.

• The Bernoulli distribution is the simplest discrete distribution and it
describes the probability of only two possible mutually exclusive
outcomes.

• The binomial distribution is one of the most widely used discrete
probability distributions and it is an extension of the Bernoulli
distribution.

• The Poisson distribution represents the number of events occurring
randomly in a fixed time at an average rate λ.

• The normal distribution refers to a family of continuous probability
distributions described by the normal equation.

Learning Outcomes
At the end of the lesson you should be able to:

• Understand the basic concepts of probability distributions
• Explain how parameters characterize the probability distribution of a
variable
• Describe properties of distributions and their use.

Review Questions

1. The variance of the binomial distribution is always
a) less than its mean.
b) equal to its mean.
c) greater than its mean.
d) equal to its standard deviation.
e) less than its standard deviation.

2. Probability distribution of a random variable is also known as
a) probability density function.
b) probability function.
c) probability mass function.
d) distribution function.
e) cumulative distribution function.

3. A nursing home owner knows that, on average, 10 elders per year would get
admitted to the nursing home for long term care.
a) The variable in this example is normally distributed.
b) The mean of the variable is 5.
c) The variance of this variable is 25.
d) The variable in this example is continuous.
e) The standard deviation of the variable in this example is greater than its
mean.
