
Session 5: Discrete Probability Distributions
Statistics for Business
Dr. Le Anh Tuan

1
Contents

►Discrete Probability Distributions

►Expected value, variance and standard deviation of a
discrete distribution

►Binomial distributions

►Hypergeometric distributions

►Poisson distributions

2
Introduction
► Discrete data
► Values that are whole numbers
► There is space on the number line between any two
possible values
► Examples: number of books in a room, number of correct
answers, number of TVs in a classroom, difference in scores
between sports teams A and B (which can be negative, e.g. −2)

► Continuous data
► data that can take any value (within a range); there are no
gaps
► Example: a person's height, a dog’s weight, temperature

3
Probability Distributions

4
Random variable
► For a given sample space S of some experiment, a random
variable is a rule that associates a number with each outcome
in the sample space S.
► Notation
► Random variables - usually denoted by uppercase
letters near the end of our alphabet (e.g. X, Y).
► Particular value - use lowercase letters, such as x,
which correspond to the random variable X

5
Types of random variables
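►A discrete random variable
►Has outcomes that take on whole numbers
►Has a countable number of possible values (finite, or countably infinite as with the Poisson distribution)

►A continuous random variable
►Has outcomes that take on any numerical value within a range
►Has an uncountably infinite number of possible outcomes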

6
Discrete Probability Distributions Rules

► A discrete probability distribution
► Lists all the possible outcomes of an experiment
for a discrete random variable
► Gives the probability associated with each value of
the random variable

7
Discrete Probability Distributions Example

Experiment: Toss 2 coins. Let X = number of heads.

Show P(x), i.e., P(X = x), for all values of x:

4 possible outcomes: TT, TH, HT, HH

Probability Distribution
x Value   Probability
0         1/4 = .25
1         2/4 = .50
2         1/4 = .25

[Figure: bar chart of the probability distribution; x (0, 1, 2) on the horizontal axis, probability on the vertical axis]
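
The two-coin distribution can be reproduced by listing the outcomes and counting heads. The following is a minimal sketch, assuming fair coins so that the 4 outcomes are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 4 equally likely outcomes of tossing 2 fair coins.
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes).
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p} = {float(p):.2f}")
```

Exact fractions are used here so the probabilities can be checked against 1/4, 2/4 and 1/4 without rounding.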

8
Discrete Probability Distributions Example
► Our classroom has 6 computers.
► Let X denote the number of these computers that are in use
during the weekend {0, 1, 2, …, 6}.
► Suppose that the probability distribution of X is as given in the
following table:

xi    p(xi)
0     0.05
1     0.10
2     0.15
3     0.25
4     0.20
5     0.15
6     0.10

[Figure: bar chart of p(x) against X = 0, 1, …, 6; vertical axis: probability, from 0 to 0.3]

9
What is a PDF or CDF?
► A probability distribution function (PDF) is a mathematical
function that shows the probability of each X-value.
► A cumulative distribution function (CDF) is a mathematical
function that shows the cumulative sum of probabilities, adding
from the smallest to the largest X-value, gradually approaching
unity.
[Figure: side-by-side bar charts of the PDF, P(X = x), and the CDF, P(X ≤ x); horizontal axis: value of X (0 to 6); vertical axis: probability]
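
The relationship between the PDF and the CDF can be sketched as a running total. This example assumes the classroom-computer distribution from the earlier slide; the list names are illustrative.

```python
from itertools import accumulate

# Distribution of X = number of computers in use (classroom example).
x_values = [0, 1, 2, 3, 4, 5, 6]
pdf = [0.05, 0.10, 0.15, 0.25, 0.20, 0.15, 0.10]

# CDF: running total of the probabilities, P(X <= x).
cdf = list(accumulate(pdf))

for x, p, c in zip(x_values, pdf, cdf):
    print(f"x = {x}: P(X = x) = {p:.2f}, P(X <= x) = {c:.2f}")
```

The last CDF value reaches 1 (up to floating-point rounding), which is the "gradually approaching unity" behaviour described above.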

10
Discrete Probability Distributions Rules

► Each outcome in the distribution must be mutually exclusive
with the other outcomes in the distribution.
► The probability of each outcome, P(x), must be between 0 and 1:
$0 \le P(x) \le 1$
► The sum of the probabilities for all the outcomes in the
distribution must be 1:
$\sum_{i=1}^{n} P(x_i) = 1$
where n equals the total number of possible outcomes.

11
The Mean of Discrete Probability
Distributions
► The mean or expected value E(X) of a discrete random variable
is the sum of all X-values weighted by their respective
probabilities.
► E(X) is a measure of central tendency.
► If there are n distinct values of X, then
$E(X) = \mu = \sum_{i=1}^{n} x_i P(x_i)$
where $\mu$ = the mean of the discrete probability distribution
$x_i$ = the value of the random variable for the ith outcome
$P(x_i)$ = the probability that the ith outcome will occur
n = number of outcomes in the distribution

12
Calculate mean
► Let X denote the number of these computers that are in use
during the weekend {0, 1, 2, …, 6}.

xi      P(xi)   xi*P(xi)
0       0.05
1       0.10
2       0.15
3       0.25
4       0.20
5       0.15
6       0.10
Total   1

[Figure: bar chart of p(x) against X = 0, 1, …, 6]

The mean (expected) number of computers that are in use during the
weekend is $\sum_{i=1}^{n} x_i P(x_i)$ = ______
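
As a check on the hand calculation, the expected value can be computed directly from the table. This is a small sketch using the probabilities listed for the classroom example; the variable names are not from the slides.

```python
x_values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.05, 0.10, 0.15, 0.25, 0.20, 0.15, 0.10]

# A valid distribution must have probabilities that sum to 1.
assert abs(sum(probs) - 1.0) < 1e-9

# E(X) = sum of x_i * P(x_i)
mean = sum(x * p for x, p in zip(x_values, probs))
print(f"E(X) = {mean:.2f}")  # E(X) = 3.30
```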
13
The Variances of Discrete Probability
Distributions
► The variance is a measure of the spread of the individual values
around the mean of a data set.
► Variance of a discrete random variable X
$\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$

where $\sigma^2$ = the variance of the discrete probability distribution
$x_i$ = the value of the random variable for the ith outcome
$\mu$ = the mean of the discrete probability distribution
$P(x_i)$ = the probability that the ith outcome will occur
n = number of outcomes in the distribution

15
The Variances of Discrete Probability
Distributions
► An equivalent shortcut formula for the variance:
$\sigma^2 = \sum_{i=1}^{n} x_i^2 P(x_i) - \mu^2$

► The standard deviation is the square root of the variance
$\sigma = \sqrt{\sigma^2}$

16
Calculate variance
► Let X denote the number of these computers that are in use
during the weekend {0, 1, 2, …, 6}.
► $\mu$ = 3.3
xi      P(xi)   xi*P(xi)   xi-μ   (xi-μ)^2   (xi-μ)^2*P(xi)
0       0.05    0.00
1       0.10    0.10
2       0.15    0.30
3       0.25    0.75
4       0.20    0.80
5       0.15    0.75
6       0.10    0.60
Total           3.3

The variance is $\sum_{i=1}^{n} (x_i - \mu)^2 P(x_i)$ = 2.61 computers squared
➔ the SD = $\sqrt{2.61}$ ≈ 1.62 computers
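
The variance and standard deviation quoted above can be verified numerically. The sketch below computes the variance both from the definition and from the shortcut formula, using the same table of probabilities; names are illustrative.

```python
import math

x_values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.05, 0.10, 0.15, 0.25, 0.20, 0.15, 0.10]

mean = sum(x * p for x, p in zip(x_values, probs))

# Definition: sigma^2 = sum of (x_i - mu)^2 * P(x_i)
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, probs))

# Shortcut: sigma^2 = sum of x_i^2 * P(x_i) - mu^2
variance_shortcut = sum(x**2 * p for x, p in zip(x_values, probs)) - mean**2

std_dev = math.sqrt(variance)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, SD = {std_dev:.2f}")
```

Both routes give the same variance up to floating-point rounding, which is the point of the shortcut formula.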
17
Expected Money Value
► The expected money value (EMV) is the mean of a discrete
probability distribution when the discrete random variable is
expressed in terms of dollars.
► The EMV represents a long-term average, as if outcomes
from the distribution occurred many times.
► Calculate the EMV for the profits from facemasks.
Status              Profit    Probability
Covid-19 Increase   $10,000   0.20
Normal              $4,000    0.50
Covid-19 Decrease   $1,000    0.30
Total
► The EMV for the profits from facemasks is
………………………………….………………………………….
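
The EMV is the same probability-weighted sum as the mean, applied to dollar amounts. A minimal sketch for the facemask table follows; the tuple layout and names are assumptions made for illustration.

```python
# (status, profit in dollars, probability), as listed in the facemask table
scenarios = [
    ("Covid-19 Increase", 10_000, 0.20),
    ("Normal", 4_000, 0.50),
    ("Covid-19 Decrease", 1_000, 0.30),
]

# The probabilities of all scenarios must sum to 1.
assert abs(sum(p for _, _, p in scenarios) - 1.0) < 1e-9

# EMV = sum of profit * probability over all scenarios.
emv = sum(profit * p for _, profit, p in scenarios)
print(f"EMV = ${emv:,.0f}")
```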

19
Probability Distributions

20
Probability Distributions

► Discrete Probability Distributions
► Binomial
► Hypergeometric
► Poisson
► Continuous Probability Distributions
► Uniform
► Normal
► Exponential

21
Bernoulli Experiments
► A random experiment with only two outcomes is a
Bernoulli experiment.
► Consider only two outcomes: “success” or “failure”
► Let a denote the probability of success
► Let 1 – a be the probability of failure
► Define the random variable X:
x = 1 if success, x = 0 if failure

► Then the Bernoulli probability function is
$P(0) = 1 - a$
$P(1) = a$

► The two probabilities sum to 1:
$P(0) + P(1) = (1 - a) + a = 1$

22
Bernoulli Experiments
► The mean is µ = a
$\mu = E(X) = \sum_{i=1}^{2} x_i P(x_i) = 0 \cdot (1 - a) + 1 \cdot a = a$

► The variance is $\sigma^2 = a(1 - a)$
$\sigma^2 = E[(X - \mu)^2] = \sum_{i=1}^{2} (x_i - \mu)^2 P(x_i)$
$= (0 - a)^2 (1 - a) + (1 - a)^2 a$
$= a(1 - a)$

23
Binomial Distributions

►The binomial distribution arises when a Bernoulli
experiment is repeated n times.

►The binomial distribution results from a procedure that
meets these four requirements:
❶ The procedure has a fixed number of trials (a trial
is a single observation).
❷ The trials must be independent (the outcome of
one observation does not affect the outcome of the
others).

24
Binomial Distributions

►The binomial distribution results from a procedure that
meets these four requirements (continued):
❸ Each trial must have all outcomes classified into
exactly two categories that are mutually exclusive
and collectively exhaustive.
e.g., head or tail, success or failure (commonly
used), defective or not defective
❹ The probability of a success remains the same in
all trials.
e.g., the probability of getting a tail is the same each
time we toss the coin

25
Binomial Distributions

►Let S (success) and F (failure) denote the two
possible categories of all outcomes.
$P(S) = p$ (p = probability of a success in one of the n trials)
$P(F) = 1 - p = q$ (q = probability of a failure in one of the n trials)
n = the fixed number of trials (observations)
x = a specific number of successes in n trials, so x can be
any whole number between 0 and n.
P(x) = the probability of getting exactly x successes
among the n trials.

26
Binomial Distributions
►When a student is randomly selected (with
replacement), there is a 0.75 probability that this
student knows how to use ChatGPT. Assume that we
want to find the probability that exactly three of four
randomly selected students know how to use ChatGPT.
a. Does this survey result in a binomial
distribution?
b. If yes, identify the values of n, x, p and q

27
Binomial Distributions
►When a student is randomly selected (with
replacement), there is a 0.75 probability that this
student knows how to use ChatGPT. Assume that we
want to find the probability that exactly three of four
randomly selected students know how to use ChatGPT.
❶ The number of trials is fixed (4 observations).
❷ The 4 trials are independent because the answer of
one student is not affected by the answers of the other
students.
❸ Each trial has all outcomes classified into exactly
two categories: YES or NO.
❹ The probability of a success remains the same in all
trials (0.75).

29
Binomial Distributions
►When a student is randomly selected (with
replacement), there is a 0.75 probability that this
student knows how to use ChatGPT. Assume that we
want to find the probability that exactly three of four
randomly selected students know how to use ChatGPT.

►$n = 4$; $p = 0.75$; $q = 1 - 0.75 = 0.25$

►One way to get 3 YES is YYYN
►The probability of this exact outcome is
$(0.75)^3 (0.25)^1 \approx 0.105$
►However, there are ${}_4C_3$ or $C(4, 3) = 4$ different ways to
get 3 YES: YYYN, YYNY, YNYY, NYYY
►The probability of getting exactly three YES answers is
$P(3) = {}_4C_3 \cdot (0.75)^3 \cdot (0.25)^1 \approx 0.422$
31
Binomial Distributions Formula
►Binomial Probability Formula:

$P(x) = C(n, x) \cdot p^x \cdot q^{n-x} = \frac{n!}{(n-x)!\,x!} \cdot p^x \cdot q^{n-x}$
for $x = 0, 1, 2, \ldots, n$

n = number of trials
x = number of successes among the n trials
p = the probability of success in one of the n trials
q = 1 − p = the probability of failure in one of the n trials
P(x) = the probability of getting exactly x successes
among the n trials.
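
The binomial formula translates almost directly into code. The sketch below uses Python's standard `math.comb` for C(n, x); the function name `binomial_pmf` is an illustrative choice, not something defined in the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * q^(n - x)."""
    q = 1.0 - p
    return comb(n, x) * p**x * q ** (n - x)


# ChatGPT example: exactly 3 of 4 students, with p = 0.75.
p_three = binomial_pmf(3, 4, 0.75)
print(f"P(3) = {p_three:.3f}")  # P(3) = 0.422

# The probabilities over x = 0, 1, ..., n should sum to 1.
total = sum(binomial_pmf(x, 4, 0.75) for x in range(5))
print(f"sum over all x = {total:.3f}")
```

This is the same quantity that the spreadsheet formula `=BINOM.DIST(3,4,0.75,FALSE)` in these slides returns.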

32
Excel and Megastat
► Excel:

=BINOM.DIST(3,4,0.75,FALSE)

33
Tables

34
Tables

n x … p=.20 p=.25 p=.30 p=.35 p=.40 p=.45 p=.50


10 0 … 0.1074 0.0563 0.0282 0.0135 0.0060 0.0025 0.0010
1 … 0.2684 0.1877 0.1211 0.0725 0.0403 0.0207 0.0098
2 … 0.3020 0.2816 0.2335 0.1757 0.1209 0.0763 0.0439
3 … 0.2013 0.2503 0.2668 0.2522 0.2150 0.1665 0.1172
4 … 0.0881 0.1460 0.2001 0.2377 0.2508 0.2384 0.2051
5 … 0.0264 0.0584 0.1029 0.1536 0.2007 0.2340 0.2461
6 … 0.0055 0.0162 0.0368 0.0689 0.1115 0.1596 0.2051
7 … 0.0008 0.0031 0.0090 0.0212 0.0425 0.0746 0.1172
8 … 0.0001 0.0004 0.0014 0.0043 0.0106 0.0229 0.0439
9 … 0.0000 0.0000 0.0001 0.0005 0.0016 0.0042 0.0098
10 … 0.0000 0.0000 0.0000 0.0000 0.0001 0.0003 0.0010

Examples:
n = 10, x = 3, p = 0.35: P(x = 3 | n = 10, p = 0.35) = .2522
n = 10, x = 8, p = 0.45: P(x = 8 | n = 10, p = 0.45) = .0229
35
Questions
►If X is binomially distributed with 6 trials and a
probability of success equal to 0.25 at each attempt,
what is the probability of:
(a) exactly 4 successes
(b) at least one success
(c) fewer than two successes
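
One possible way to check answers to these three questions is sketched below. It assumes the usual readings: "at least one" as the complement of zero successes, and "fewer than two" as zero or one success. The helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 6, 0.25

# (a) exactly 4 successes
p_exactly_4 = binomial_pmf(4, n, p)

# (b) at least one success = 1 - P(no successes)
p_at_least_1 = 1 - binomial_pmf(0, n, p)

# (c) fewer than two successes = P(0) + P(1)
p_fewer_than_2 = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)

print(f"(a) {p_exactly_4:.4f}  (b) {p_at_least_1:.4f}  (c) {p_fewer_than_2:.4f}")
```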

36
Binomial Distribution Mean and Variance
► For binomial distributions:
►Mean: $\mu = np$
►Variance: $\sigma^2 = np(1 - p)$
►SD: $\sigma = \sqrt{npq}$

►If X is binomially distributed with 6 trials and a
probability of success equal to 0.25 at each attempt
►Mean $\mu$ =
►Variance: $\sigma^2$ =
►SD =
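
The three summary formulas can be evaluated in a few lines. This is a small sketch for the n = 6, p = 0.25 case; the variable names are illustrative.

```python
import math

n, p = 6, 0.25
q = 1 - p

mean = n * p                    # mu = np
variance = n * p * q            # sigma^2 = np(1 - p)
std_dev = math.sqrt(variance)   # sigma = sqrt(npq)

print(f"mean = {mean}, variance = {variance}, SD = {std_dev:.4f}")
```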

38
Binomial Distribution

39
Binomial Distribution Shape
► A binomial distribution
► skewed right if p < 0.50
► skewed left if p > 0.50
► symmetric only if p = 0.50
► However, skewness decreases as n increases, regardless of the
value of p.
► Notice that p = 0.20 and p = 0.80 have the same shape, except
reversed from left to right.
► This is true for any values of p and q=1 − p.
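
The shape claims can be checked numerically. The sketch below compares the p = 0.20 and p = 0.80 distributions and measures skewness as the standardized third central moment computed from the probabilities; that choice of skewness measure and the n = 60 comparison value are assumptions made for illustration.

```python
from math import comb, isclose


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def skewness(n: int, p: float) -> float:
    """Standardized third central moment, computed directly from the pmf."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    third = sum((x - mean) ** 3 * binomial_pmf(x, n, p) for x in xs)
    return third / var**1.5


n = 6
pmf_p20 = [binomial_pmf(x, n, 0.20) for x in range(n + 1)]
pmf_p80 = [binomial_pmf(x, n, 0.80) for x in range(n + 1)]

# p = 0.20 and p = 0.80 give the same shape, reversed left to right.
mirrored = all(isclose(a, b) for a, b in zip(pmf_p20, reversed(pmf_p80)))
print("p = 0.20 mirrors p = 0.80:", mirrored)

# Skewness is positive for p < 0.5 and shrinks as n grows.
for trials in (6, 60):
    print(f"n = {trials}, p = 0.20: skewness = {skewness(trials, 0.20):.3f}")
```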

40
Binomial Distribution Shape
[Figure: five bar charts of P(X) against X = 0, 1, …, 6 for binomial distributions with n = 6]
► Right skewness (p = 0.1, p = 0.2)
► Symmetric (p = 0.5)
► Left skewness (p = 0.9, p = 0.8)


41
Hypergeometric distributions

Probability Distributions
► Discrete Probability Distributions
► Binomial
► Hypergeometric
► Poisson

42
Hypergeometric distributions

► The hypergeometric distribution is similar to the binomial
distribution.
► However, unlike the binomial, sampling is without replacement
from a finite population of N items.
► Outcomes of trials are dependent.
► The hypergeometric distribution may be skewed right or left
and is symmetric only if the proportion of successes in the
population is 50%.

43
Hypergeometric distributions

► Concerned with finding the probability of x successes in the
sample where there are S successes in the population:
$P(x) = \frac{{}_{S}C_{x} \cdot {}_{N-S}C_{n-x}}{{}_{N}C_{n}}$

where
N = population size
S = number of successes in the population
N – S = number of failures in the population
n = sample size
x = number of successes in the sample
n – x = number of failures in the sample
44
Hypergeometric distributions

► In a shipment of 10 iPods, 2 are damaged and 8 are good.
► The receiving department at Best Buy tests a random sample of
3 iPods to see if they are defective.
► Let the random variable X be the number of damaged iPods in
the sample.
► N = 10 (number of iPods in the shipment)
► n = 3 (sample size drawn from the shipment)
► S = 2 (number of damaged iPods in the shipment
(“successes” in population))
► N – S = 8 (number of non-damaged iPods in the shipment)
► x = number of damaged iPods in the sample (“successes”
in sample)
► n – x = number of non-damaged iPods in the sample

45
Hypergeometric distributions

► This is not a binomial problem because p is not constant.


► What is the probability of getting a damaged iPod on the first
draw from the sample?
► p1 = 2/10
► Now, what is the probability of getting a damaged iPod on the
second draw?
► p2 = 1/9 (if the first iPod was damaged) or
= 2/9 (if the first iPod was undamaged)
► What about on the third draw?
► p3 = 0/8 or = 1/8 or = 2/8 depending on what happened in the
first two draws.

46
Hypergeometric distributions
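► What is the probability that 0, 1, or 2 of the 3 selected iPods are
damaged?
$P(X = 0) = \frac{{}_{2}C_{0} \cdot {}_{8}C_{3}}{{}_{10}C_{3}} = \frac{\frac{2!}{0!\,2!} \cdot \frac{8!}{3!\,5!}}{\frac{10!}{3!\,7!}} = \frac{56}{120} \approx 0.467$
$P(X = 1) = \frac{{}_{2}C_{1} \cdot {}_{8}C_{2}}{{}_{10}C_{3}} = \frac{\frac{2!}{1!\,1!} \cdot \frac{8!}{2!\,6!}}{\frac{10!}{3!\,7!}} = \frac{56}{120} \approx 0.467$
$P(X = 2) = \frac{{}_{2}C_{2} \cdot {}_{8}C_{1}}{{}_{10}C_{3}} = \frac{\frac{2!}{2!\,0!} \cdot \frac{8!}{1!\,7!}}{\frac{10!}{3!\,7!}} = \frac{8}{120} \approx 0.067$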
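
The three iPod probabilities can be reproduced with a short function. This sketch builds the hypergeometric probability from `math.comb`; the function name `hypergeom_pmf` and its argument order are illustrative choices, not part of the slides.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, S: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items that contains S successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


# iPod example: N = 10 in the shipment, S = 2 damaged, sample of n = 3.
N, S, n = 10, 2, 3
probabilities = {x: hypergeom_pmf(x, N, S, n) for x in range(min(S, n) + 1)}

for x, p in probabilities.items():
    print(f"P(X = {x}) = {p:.4f}")
```

To four decimals the results are 0.4667, 0.4667 and 0.0667, and they sum to 1.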

47
Excel and Megastat
► Excel

► P(X=2)

48
Megastat

$P(B_1 \mid A) = \frac{P(A \mid B_1)\,P(B_1)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2)}$

49
Hypergeometric distributions
► The mean (expected value): $\mu = n p$, where $p = \frac{S}{N}$
► The standard deviation: $\sigma = \sqrt{n p (1 - p) \left( \frac{N - n}{N - 1} \right)}$
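
For the iPod shipment (N = 10, S = 2, n = 3), the mean and standard deviation formulas can be cross-checked against values computed directly from the probabilities. This is a sketch only; the variable names are illustrative.

```python
from math import comb, sqrt

N, S, n = 10, 2, 3
p = S / N

# Closed-form mean and standard deviation.
mean = n * p
std_dev = sqrt(n * p * (1 - p) * (N - n) / (N - 1))

# Cross-check: compute the same quantities from the probabilities themselves.
pmf = {x: comb(S, x) * comb(N - S, n - x) / comb(N, n) for x in range(min(S, n) + 1)}
mean_from_pmf = sum(x * px for x, px in pmf.items())
var_from_pmf = sum((x - mean_from_pmf) ** 2 * px for x, px in pmf.items())

print(f"mean = {mean:.3f} (from pmf: {mean_from_pmf:.3f})")
print(f"SD   = {std_dev:.3f} (from pmf: {sqrt(var_from_pmf):.3f})")
```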

50
How to recognize a hypergeometric
situation?
► Look for a finite population (N) containing a known number of
successes (s)
► Sampling without replacement (n items in the sample) where
the probability of success is not constant for each sample item
drawn.
►Both the binomial and hypergeometric involve samples of
size n and treat X as the number of successes.
►The binomial sample is with replacement while the
hypergeometric sample is without replacement.
►If n/N < 0.05, it is safe to use the binomial approximation
to the hypergeometric, using sample size n and success
probability p = s/N.
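
The n/N < 0.05 guideline can be illustrated by comparing the two distributions at a small and a large sampling fraction. In this sketch the large population (N = 1000, S = 200, n = 10) is a hypothetical example chosen for illustration; the small one is the iPod shipment, where n/N = 0.3.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeom_pmf(x: int, N: int, S: int, n: int) -> float:
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


def max_abs_error(N: int, S: int, n: int) -> float:
    """Largest gap between the hypergeometric probabilities and the
    binomial approximation with p = S/N, over x = 0, ..., min(S, n)."""
    p = S / N
    return max(
        abs(hypergeom_pmf(x, N, S, n) - binomial_pmf(x, n, p))
        for x in range(min(S, n) + 1)
    )


error_ipod = max_abs_error(10, 2, 3)        # n/N = 0.30, approximation is rough
error_large = max_abs_error(1000, 200, 10)  # n/N = 0.01, hypothetical population

print(f"n/N = 0.30: max error = {error_ipod:.4f}")
print(f"n/N = 0.01: max error = {error_large:.4f}")
```

The gap should be noticeably smaller when the sample is a small fraction of the population, which is what the rule of thumb relies on.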

51
Poisson Distribution

52
Poisson Distribution
►The Poisson distribution describes the number of
occurrences of an event over a specified interval.
►The random variable x is the number of occurrences of the
event in an interval. The interval can be time, distance,
area, volume, …
►The probability of the event occurring x times over an
interval is given by:
$P(x) = \frac{\lambda^x \cdot e^{-\lambda}}{x!}$
where e ≈ 2.71828, the base of the natural logarithm system
x = number of occurrences in an interval
λ = expected number of occurrences in an interval
53
Poisson Distribution Characteristics
►The mean is λ
►The standard deviation is $\sigma = \sqrt{\lambda}$
►A particular Poisson distribution is determined only by
the mean λ
►Unlike the binomial, X has no obvious limit, that is, the
number of events that can occur in a given unit of time
is not bounded. It is 0, 1, 2,…with no upper limit.

54
Poisson Distribution Characteristics
►The number of industrial injuries per working week in a
particular factory is known to follow a Poisson
distribution with mean λ = 0.5.
►Find the probability that
►(a) in a particular week there will be:
►(i) fewer than 2 injuries,
►(ii) more than 2 injuries;
►(b) in a three-week period there will be no
injuries.
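
One way to check answers to this exercise is sketched below. For part (b) it assumes that the weekly counts are independent with a constant rate, so the three-week count is Poisson with mean 3 × 0.5 = 1.5; that assumption and the names used are not stated in the slides.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * exp(-lam) / factorial(x)


lam_week = 0.5

# (a)(i) fewer than 2 in a week: P(X = 0) + P(X = 1)
p_fewer_than_2 = poisson_pmf(0, lam_week) + poisson_pmf(1, lam_week)

# (a)(ii) more than 2 in a week: 1 - P(X <= 2)
p_more_than_2 = 1 - sum(poisson_pmf(x, lam_week) for x in range(3))

# (b) Assumption: independent weeks with a constant rate, so the
# three-week count is Poisson with mean 3 * 0.5 = 1.5.
lam_three_weeks = 3 * lam_week
p_none_in_3_weeks = poisson_pmf(0, lam_three_weeks)

print(f"(a)(i)  {p_fewer_than_2:.4f}")
print(f"(a)(ii) {p_more_than_2:.4f}")
print(f"(b)     {p_none_in_3_weeks:.4f}")
```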

55
How to recognize Poisson Applications

►An event of interest occurs randomly over time or space.

►The average occurrence rate (λ) remains constant.

►The occurrences are independent of each other.

►The random variable (X) is the number of occurrences of
an event in some interval.

57
Excel and Megastat

58
Poisson Table

59
Use the Poisson approximation to the
binomial
►The Poisson distribution may be used to approximate a
binomial by setting λ = np. This approximation is helpful
when the binomial calculation is difficult (e.g., when n is
large).
►The general rule for a good approximation is that n should
be “large” and p should be “small.”
►A common rule of thumb says the approximation is
adequate if n ≥ 20 and p ≤ 0.05.
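
The quality of the approximation can be inspected by computing both probabilities side by side. In this sketch n = 100 and p = 0.02 are hypothetical values that satisfy the rule of thumb; the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    return lam**x * exp(-lam) / factorial(x)


# Hypothetical case meeting the rule of thumb: n >= 20 and p <= 0.05.
n, p = 100, 0.02
lam = n * p  # Poisson mean used for the approximation

# Compare exact binomial and approximate Poisson probabilities for small x.
max_gap = 0.0
for x in range(6):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, lam)
    max_gap = max(max_gap, abs(exact - approx))
    print(f"x = {x}: binomial = {exact:.4f}, Poisson = {approx:.4f}")

print(f"largest gap for x = 0..5: {max_gap:.4f}")
```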

60
Exercise
► Next week: one online session for exercises

► Review Session 5, Online Quiz 6.

► Homework

► Mid-term exam
► 50 Multiple-choice questions, 90 minutes.
► MYISB schedule
► Closed book exam, Equation sheet is provided
► Calculators are allowed for use, but the use of laptops and
electronic devices is not permitted.
► Prepare a printed version (without any notes) of Appendix
A - Binomial Probabilities and Appendix B - Poisson
Probabilities.
61
