
Data Analysis for Managers

Unit II (Part 2): Probability Distributions


Rajashree Kamath Ph.D. (Statistics),
Assistant Professor (Business Analytics),
Coordinator - AcadX (Kengeri Campus),
Coordinator - CSR Karma Club (Kengeri Campus),
School of Business and Management,
CHRIST (Deemed to be University) - BKC, Bangalore 560074.
Ph.: +918040129879 (O), Cell: +919448067196.
MISSION: CHRIST is a nurturing ground for an individual’s holistic development to make effective contribution to the society in a dynamic environment.
VISION: Excellence and Service
CORE VALUES: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence
CHRIST
Deemed to be University

Random Variables

● A random variable is a numerical description of the outcome of an
experiment. It is denoted by X.
● Random variables must have numerical values.
● A random variable that may assume either a finite number of values
or an infinite sequence of values such as 0, 1, 2, . . . is referred to as a
discrete random variable. For example, if the experiment is to "contact 5
customers", the "number of customers who place an order" is a
discrete random variable taking values 0, 1, 2, 3, 4, 5.
● A random variable that may assume any numerical value in an
interval or collection of intervals is called a continuous random
variable. For example, if the experiment is to "fill a soft drink can", the
"amount of soft drink filled" is a continuous random variable taking
all possible values in the range [0, 300] ml.


Probability Distributions

● The probability distribution for a random variable describes how
probabilities are distributed over the values of the random variable.
● For a discrete random variable X, the probability distribution is
defined by a probability mass function (PMF), usually denoted by
P[X = x].
● The required conditions for a PMF are:
○ P[X = x] >= 0 and Sum of P[X = x] over all the values of X is 1.
● For a continuous random variable X, the probability distribution is
defined by a probability density function (PDF), usually denoted by
f(x).
● The required conditions for a PDF are:
○ f(x) >= 0 and the area under f(x) over the range of X is 1.
● For a continuous random variable, the probability of X taking a
particular value is zero. The probability of X taking a range of values
in the interval [a, b] is equal to the area under f(x) between the lines
x = a and x = b.
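The PMF and PDF conditions above can be checked numerically. The following is a minimal sketch and not part of the original slides: the PMF values are made up for illustration, and the amount of soft drink filled is assumed, purely for the example, to be uniformly distributed on [0, 300] ml.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Check that every P[X = x] >= 0 and that the probabilities sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between x = a and x = b (midpoint rule)."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical PMF for the number of orders from 5 customers (illustrative values only).
orders_pmf = {0: 0.10, 1: 0.25, 2: 0.30, 3: 0.20, 4: 0.10, 5: 0.05}

def fill_pdf(x):
    """Assumed uniform density on [0, 300] ml for the amount filled."""
    return 1 / 300 if 0 <= x <= 300 else 0.0

pmf_ok = is_valid_pmf(orders_pmf)
total_area = area_under(fill_pdf, 0, 300)        # should be 1 for a valid PDF
prob_100_to_250 = area_under(fill_pdf, 100, 250)  # P[100 <= X <= 250] as an area

print(pmf_ok, round(total_area, 4), round(prob_100_to_250, 4))
```

For a continuous variable only areas carry probability, which is why the probability of any single value is zero.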
Expected Value and Variance
● The mean of all possible values taken by a random variable is also
called the Expected Value of the random variable, denoted by E(X)
or μ. For a discrete random variable, it is the sum of x * P[X = x] over
the range of x.
● The variance of a random variable is the sum of {x – E(X)}² * P[X =
x] over the range of x, denoted by V(X) or σ².
● The standard deviation, σ, is defined as the positive square root of
the variance.
● Exercise 1 Data:
Number of Times    Number of Students
1                  721,769
2                  601,325
3                  166,736
4                  22,299
5                  6,730


Exercise 1
● The number of students taking the Scholastic Aptitude Test (SAT) has
risen to an all-time high of more than 1.5 million (College Board,
August 26, 2008). Students are allowed to repeat the test in hopes of
improving the score that is sent to college and university admission
offices. The number of times the SAT was taken and the number of
students are given in the previous slide:
a. Show the probability distribution for the number of times a student
takes the SAT.
b. What is the probability that a student takes the SAT more than one
time?
c. What is the probability that a student takes the SAT three or more
times?
d. What is the expected value of the number of times the SAT is
taken? What is your interpretation of the expected value?
e. What is the variance and standard deviation for the number of times
the SAT is taken?
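One way to work through Exercise 1 is sketched below. It is not part of the original slides and assumes that the relative frequencies in the Exercise 1 data can be used as the probabilities P[X = x]; the variable names are illustrative.

```python
# Exercise 1 data: number of times the SAT was taken -> number of students.
counts = {1: 721_769, 2: 601_325, 3: 166_736, 4: 22_299, 5: 6_730}

total = sum(counts.values())
pmf = {x: n / total for x, n in counts.items()}  # a. relative frequency as P[X = x]

p_more_than_one = sum(p for x, p in pmf.items() if x > 1)   # b. P[X > 1]
p_three_or_more = sum(p for x, p in pmf.items() if x >= 3)  # c. P[X >= 3]

mean = sum(x * p for x, p in pmf.items())                    # d. E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # e. V(X)
std_dev = variance ** 0.5

for x, p in pmf.items():
    print(f"P[X = {x}] = {p:.4f}")
print(f"P[X > 1] = {p_more_than_one:.4f}, P[X >= 3] = {p_three_or_more:.4f}")
print(f"E(X) = {mean:.4f}, V(X) = {variance:.4f}, sd = {std_dev:.4f}")
```

The expected value is a long-run average over many students, so it need not be a whole number of test sittings.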

Binomial Distribution

● The random variable represents the number of times "success" is
achieved in n identical and independent trials. The probability of
success, unchanged between trials, denoted by p, is known. The
distribution associated with this is given by the PMF
P[X = x] = nCx p^x (1 – p)^(n – x); x = 0, 1, …, n
where nCx = n!/{(n – x)! x!}; x! = x*(x – 1)*…*3*2*1
● When n = 1, we call this a Bernoulli distribution.
● The function is the (x + 1)th term in the expansion of (q + p)^n, where
q = 1 - p.
● By independent trials, we mean that the outcome of one trial is not
influenced by the outcome of another.
● By identical trials, we mean that the experimental setup remains the
same from trial to trial.
● E(X) = np
● V(X) = np(1 - p)
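The binomial PMF and the formulas E(X) = np and V(X) = np(1 - p) can be verified numerically. This is a minimal sketch, not from the original slides; the function name `binomial_pmf` and the values n = 6, p = 0.23 are chosen only for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P[X = x] = nCx * p^x * (1 - p)^(n - x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Illustrative values (assumed for this sketch): n = 6 trials, p = 0.23.
n, p = 6, 0.23
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(x * px for x, px in enumerate(pmf))     # compare with np
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))  # compare with np(1 - p)

print(f"sum of PMF = {total:.6f}")
print(f"E(X) = {mean:.4f}  vs  np = {n * p:.4f}")
print(f"V(X) = {variance:.4f}  vs  np(1 - p) = {n * p * (1 - p):.4f}")
```

Computing the mean and variance directly from the PMF, rather than from the shortcut formulas, shows that both routes agree.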

Exercise 2

● A Harris Interactive survey for InterContinental Hotels & Resorts
asked respondents, “When traveling internationally, do you generally
venture out on your own to experience culture, or stick with your tour
group and itineraries?” The survey found that 23% of the respondents
stick with their tour group (USA Today, January 21, 2004).
a. In a sample of six international travellers, what is the probability
that two will stick with their tour group?
b. In a sample of six international travellers, what is the probability
that at least two will stick with their tour group?
c. In a sample of 10 international travellers, what is the probability
that none will stick with the tour group?


Exercise 3

● Nine percent of undergraduate students carry credit card balances
greater than $7000 (Reader’s Digest, July 2002). Suppose 10
undergraduate students are selected randomly to be interviewed about
credit card usage.
a. Is the selection of 10 students a binomial experiment? Explain.
b. What is the probability that two of the students will have a credit
card balance greater than $7000?
c. What is the probability that none will have a credit card balance
greater than $7000?
d. What is the probability that at least three will have a credit card
balance greater than $7000?

Poisson Distribution
● This distribution is often useful in estimating the number of
occurrences over a specified interval of time or space. For example,
the number of arrivals at a car wash in one hour, the number of repairs
needed in 10 miles of highway, or the number of leaks in 100 miles of
pipeline.
● Properties:
1. The probability of an occurrence is the same for any two intervals
of equal length.
2. The occurrence or non-occurrence in any interval is independent of
the occurrence or non-occurrence in any other interval.
● Probability Function:
P[X = x] = e^(-μ) μ^x / x!; x = 0, 1, 2, …
= 0 otherwise,
where μ is the expected number of occurrences in the interval.
● Mean = Variance = μ
● If, in a Binomial distribution, n is large and p is small, with μ = np
remaining finite, it tends to a Poisson distribution.
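The limiting relationship between the Binomial and Poisson distributions can be illustrated numerically. The sketch below is not part of the original slides; it assumes the standard Poisson PMF e^(-μ) μ^x / x!, and the value μ = 3 and the choices of n are arbitrary illustrations.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P[X = x] = e^(-mu) * mu^x / x! for X ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

def binomial_pmf(x, n, p):
    """P[X = x] for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Assumed illustrative setup: hold mu = np = 3 fixed while n grows and p shrinks.
mu = 3
for n in (10, 100, 10_000):
    p = mu / n
    # Largest difference between the two PMFs over x = 0, 1, ..., 10.
    gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(11))
    print(f"n = {n:>6}, p = {p:.4f}: largest PMF gap for x = 0..10 is {gap:.6f}")
```

The gap shrinks as n increases, which is why the Poisson distribution is a convenient stand-in for a Binomial with many trials and a small success probability.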

Exercise 4

● The National Safety Council (NSC) estimates that off-the-job
accidents cost U.S. businesses almost $200 billion annually in lost
productivity (National Safety Council, March 2006). Based on NSC
estimates, companies with 50 employees are expected to average three
employee off-the-job accidents per year. Answer the following
questions for companies with 50 employees.
a. What is the probability of no off-the-job accidents during a one-year
period?
b. What is the probability of at least two off-the-job accidents during a
one-year period?
c. What is the expected number of off-the-job accidents during six
months?
d. What is the probability of no off-the-job accidents during the next six
months?

Exercise 5
● Phone calls arrive at the rate of 48 per hour at the reservation desk for
Regional Airways.
a. Compute the probability of receiving three calls in a 5-minute
interval of time.
mu = (48 * 5)/60 = 4; P[X = 3]
b. Compute the probability of receiving exactly 10 calls in 15 minutes.
mu = (48 * 15)/60 = 12; P[X = 10]
c. Suppose no calls are currently on hold. If the agent takes 5 minutes
to complete the current call, how many callers do you expect to be
waiting by that time? What is the probability that none will be
waiting?
mu = 4 callers expected; P[X = 0]
d. If no calls are currently being processed, what is the probability that
the agent can take 3 minutes for personal time without being
interrupted by a call?
mu = (48 * 3)/60 = 2.4; P[X = 0]
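The setups noted for Exercise 5 can be evaluated as follows. This is a sketch under the assumption that calls follow a Poisson distribution whose mean scales with the length of the interval; the helper names `poisson_pmf` and `expected_calls` are illustrative and not from the slides.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P[X = x] = e^(-mu) * mu^x / x! for X ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

RATE_PER_HOUR = 48  # calls per hour, from the exercise

def expected_calls(minutes):
    """Scale the hourly rate to an interval of the given length in minutes."""
    return RATE_PER_HOUR * minutes / 60

p_a = poisson_pmf(3, expected_calls(5))    # a. three calls in 5 minutes, mu = 4
p_b = poisson_pmf(10, expected_calls(15))  # b. ten calls in 15 minutes, mu = 12
p_c = poisson_pmf(0, expected_calls(5))    # c. nobody waiting after 5 minutes, mu = 4
p_d = poisson_pmf(0, expected_calls(3))    # d. no call in 3 minutes, mu = 2.4

for label, prob in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"{label}. {prob:.4f}")
```

The only modelling step is rescaling the mean: the rate per hour is converted to the mean for the interval in question before the PMF is applied.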

Exercise 6 (Space/Distance Example)

● The Road and Transportation Department is concerned with the
occurrence of major defects in a highway one month after resurfacing.
They assume that the probability of a defect is the same for any two
highway intervals of equal length and that the occurrence or non-
occurrence of a defect in any one interval is independent of the
occurrence or non-occurrence of a defect in any other interval. They
learn that major defects one month after resurfacing occur at the
average rate of two per mile.
a. Find the probability of at least two major defects occurring in a
particular three mile section of the highway.
mu = 2 x 3 = 6
P[X >= 2] = 1 – P[X = 0] – P[X = 1] = 1 – 0.0025 – 0.0149 = 0.9826
There is a 98.26% chance of having at least two major defects ...
b. Find the probability of no major defect occurring in a particular one
mile section of the highway.
mu = 2
P[X = 0] = e^(-2) (2)^0 / 0! = e^(-2) = 0.1353

Exercise 7
● A survey showed that the average commuter spends about 26 minutes
on a one-way door-to-door trip from home to work. In addition, 5% of
commuters reported a one-way commute of more than one hour
(Bureau of Transportation Statistics website, January 12, 2004).
If 20 commuters are surveyed on a particular day, (Binomial)
a. what is the probability that three will report a one-way commute of
more than one hour?
n = 20 p = 0.05 P[X = 3]
b. what is the probability that none will report a one-way commute of
more than one hour?
If a company has 2000 employees, (Poisson)
c. what is the expected number of employees that have a one-way
commute of more than one hour?
n = 2000 mu = 2000 * 0.05 = 100
d. what is the probability that none of the employees have … P[X = 0]
with mu = 100
e. Suppose in a similar company with 500 employees, how many
employees are expected to have a commute of more than one hour?
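The Binomial and Poisson setups in Exercise 7 can be evaluated with a short sketch like the one below. It is not part of the original slides; the variable names are illustrative, and part d uses the Poisson formula P[X = 0] = e^(-mu) as indicated in the exercise notes.

```python
from math import comb, exp

def binomial_pmf(x, n, p):
    """P[X = x] for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_long = 0.05  # share of commuters with a one-way commute over one hour

# 20 commuters surveyed: Binomial(n = 20, p = 0.05)
p_a = binomial_pmf(3, 20, p_long)  # a. exactly three
p_b = binomial_pmf(0, 20, p_long)  # b. none

# 2000 employees: Poisson with mu = np
mu_2000 = 2000 * p_long            # c. expected number
p_d = exp(-mu_2000)                # d. P[X = 0] = e^(-mu)

mu_500 = 500 * p_long              # e. expected number for 500 employees

print(f"a. {p_a:.4f}  b. {p_b:.4f}")
print(f"c. {mu_2000:.0f}  d. {p_d:.3e}  e. {mu_500:.0f}")
```

With a mean of 100, the probability of observing no long commuters at all is vanishingly small, which the scientific notation in the output makes visible.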
Normal Distribution
● The most important continuous probability distribution for random variables
like heights and weights of people, test scores, scientific measurements,
amounts of rainfall, and other similar values.
● It is also widely used in statistical inference, where the normal distribution
provides a description of the likely results obtained through sampling.
● Bell-shaped curve, symmetric about the mean (zero skewness), mesokurtic
(zero excess kurtosis).
● The probability density function is given by
f(x) = {1/(σ sqrt(2π))} e^(-(x – μ)^2/(2σ^2)); -∞ < x < ∞
● The mean = median = mode is μ, the standard deviation is σ.
● The standard deviation determines how flat and wide the normal curve is.
Larger values of the standard deviation result in wider, flatter curves,
showing more variability in the data.
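● When μ = 0 and σ = 1, this is called the standard normal distribution.
● Any normal random variable X is converted to a standard normal variable by
Z = (X – μ)/σ
For example, when Z = 1, X – μ = σ, i.e., X lies one standard deviation above the mean.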
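The standardisation Z = (X – mu)/sigma can be illustrated with Python's standard library. This is a minimal sketch and not part of the original slides; the values mu = 100 and sigma = 15 and the helper `z_score` are assumptions made only for the example.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise x: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0          # assumed illustrative parameters
x_dist = NormalDist(mu, sigma)   # distribution of X
standard_normal = NormalDist()   # mean 0, standard deviation 1

x = 115.0
z = z_score(x, mu, sigma)            # one standard deviation above the mean
p_via_x = x_dist.cdf(x)              # P[X <= 115] from the X distribution
p_via_z = standard_normal.cdf(z)     # P[Z <= 1] from the standard normal

# Probability of falling within one standard deviation of the mean.
within_one_sd = x_dist.cdf(mu + sigma) - x_dist.cdf(mu - sigma)

print(z, round(p_via_x, 4), round(p_via_z, 4), round(within_one_sd, 4))
```

Both routes give the same probability, which is why a single standard normal table serves every normal distribution.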


Exercise 8
● For borrowers with good credit scores, the mean debt for revolving and instalment accounts is $15,015
(BusinessWeek, March 20, 2006). Assume the standard deviation is $3,540 and that debt amounts are
normally distributed.
Let X be the amount of debt for revolving… Then, we are given that X follows a normal distribution with mean mu
= 15,015 and sd sigma = 3,540.
a. What is the probability that the debt for a borrower with good credit is more than $18,000?
P[X > 18,000] = P[(X – 15,015)/3,540 > (18,000 – 15,015)/3,540] = P[Z > 0.84]
= 0.5 – P[0 < Z < 0.84] = 0.5 – 0.2995 = 0.2005
There is around a 20% chance that the debt… is more

b. What is the probability that the debt for a borrower with good credit is less than $10,000?
P[X < 10,000] = P[Z < (10,000 – 15,015)/3,540] = P[Z < -1.42] = 0.5 – P[0 < Z < 1.42] = 0.5 – 0.4222 = 0.0778
There is about a 7.8% chance that …
c. What is the probability that the debt for a borrower with good credit is between $12,000 and $18,000?
P[12,000 < X < 18,000] = P[-0.85 < Z < 0.84] = P[-0.85 < Z < 0] + P[0 < Z < 0.84] = 0.3023 + 0.2995 = 0.6018
d. What is the probability that the debt for a borrower with good credit is no more than $14,000?
P[X <= 14,000] = P[Z <= (14,000 – 15,015)/3,540] = P[Z <= -0.29] = 0.5 – P[0 < Z < 0.29] = 0.5 – 0.1141 = 0.3859
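The table-based calculations in Exercise 8 can be cross-checked in software. The sketch below is not part of the original slides and uses `statistics.NormalDist` from the Python standard library; because it does not round Z to two decimals, its results may differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

# Debt amounts: normal with mean 15,015 and standard deviation 3,540 (from the exercise).
debt = NormalDist(mu=15_015, sigma=3_540)

p_more_than_18000 = 1 - debt.cdf(18_000)
p_less_than_10000 = debt.cdf(10_000)
p_between_12000_18000 = debt.cdf(18_000) - debt.cdf(12_000)
p_at_most_14000 = debt.cdf(14_000)

print(f"P[X > 18,000]          = {p_more_than_18000:.4f}")
print(f"P[X < 10,000]          = {p_less_than_10000:.4f}")
print(f"P[12,000 < X < 18,000] = {p_between_12000_18000:.4f}")
print(f"P[X <= 14,000]         = {p_at_most_14000:.4f}")
```

For a continuous distribution, P[X <= 14,000] and P[X < 14,000] are the same, so no distinction is needed in the code.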


Exercise 9

● In an article about the cost of health care, Money magazine reported that a visit to a
hospital emergency room for something as simple as a sore throat has a mean cost of
$328 (Money, January 2009). Assume that the cost for this type of hospital
emergency room visit is normally distributed with a standard deviation of $92.
Answer the following questions about the cost of a hospital emergency room visit for
this medical service.
a. What is the probability that the cost will be more than $500?
b. What is the probability that the cost will be less than $250?
c. What is the probability that the cost will be between $300 and $400?
d. If the cost to a patient is in the lower 8% of charges for this medical service,
what was the cost of this patient’s emergency room visit?
P[X < k] = 0.08
(k – mu)/sigma = -1.41
k = mu – 1.41 sigma = 328 – 1.41 x 92 = 198.28
Therefore, the patient’s emergency room visit cost him/her about $198.
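The percentile calculation in part d of Exercise 9 can be cross-checked with an inverse normal lookup. This sketch is not part of the original slides; it uses `NormalDist.inv_cdf` from the Python standard library, which does not round the z-value to two decimals, so the cut-off differs slightly from a table-based answer.

```python
from statistics import NormalDist

# Emergency room visit cost: normal with mean 328 and standard deviation 92 (from the exercise).
cost = NormalDist(mu=328, sigma=92)

z_8 = NormalDist().inv_cdf(0.08)  # z-value with 8% of the area to its left
k = cost.inv_cdf(0.08)            # cost below which the lowest 8% of charges fall

print(f"z = {z_8:.3f}, k = {k:.2f}")
```

Equivalently, k = mu + z * sigma; `inv_cdf` on the cost distribution performs that conversion in one step.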

Exercise 10
● Trading volume on the New York Stock Exchange is heaviest during the first half
hour (early morning) and last half hour (late afternoon) of the trading day. The early
morning trading volumes (millions of shares) for 13 days in January and February are
shown here (Barron’s, January 23, 2006; February 13, 2006; and February 27, 2006).
214 163 265 194 180 202 198 212 201 174 171 211 211
The probability distribution of trading volume is approximately normal.
a. Compute the mean and standard deviation to use as estimates of the population mean
and standard deviation.
b. What is the probability that, on a randomly selected day, the early morning trading
volume will be less than 180 million shares?
c. What is the probability that, on a randomly selected day, the early morning trading
volume will exceed 230 million shares?
d. How many shares would have to be traded for the early morning trading volume on a
particular day to be among the busiest 5% of days?
P[X > k] = 0.05
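One possible way to work through Exercise 10 is sketched below. It is not part of the original slides and assumes that the sample mean and the sample standard deviation (with divisor n – 1) are used as estimates of the population values, as part a suggests.

```python
from statistics import NormalDist, mean, stdev

# Early morning trading volumes (millions of shares) for the 13 days in the exercise.
volumes = [214, 163, 265, 194, 180, 202, 198, 212, 201, 174, 171, 211, 211]

x_bar = mean(volumes)  # a. estimate of the population mean
s = stdev(volumes)     # a. sample standard deviation (divisor n - 1)

volume = NormalDist(x_bar, s)    # assumed normal model for daily volume
p_below_180 = volume.cdf(180)       # b. P[X < 180]
p_above_230 = 1 - volume.cdf(230)   # c. P[X > 230]
k = volume.inv_cdf(0.95)            # d. P[X > k] = 0.05, i.e. P[X <= k] = 0.95

print(f"mean = {x_bar:.2f}, sd = {s:.2f}")
print(f"P[X < 180] = {p_below_180:.4f}, P[X > 230] = {p_above_230:.4f}")
print(f"busiest 5% of days: more than {k:.1f} million shares")
```

Part d is an inverse problem: the probability is given and the cut-off k is the unknown, so the 95th percentile is looked up rather than a probability.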


Normal Approximation to Binomial Distribution


● In a Binomial distribution, when n is large, np ≥ 5, and n(1 - p) ≥ 5, set
μ = np and σ² = np(1 – p).
● To approximate the binomial probability of a particular integer, compute the
corresponding normal probability of the variable lying within 0.5 on either side
of the integer. This 0.5 is called the continuity correction factor.
● P[X = k] ≈ P[k - 0.5 < X < k + 0.5]
● Exercise 11: Although studies continue to show smoking leads to significant
health problems, 20% of adults in the United States smoke. Consider a group
of 250 adults.
a. What is the expected number of adults who smoke?
mu = np = 250 x 0.2 = 50
Variance = npq = 50 x 0.8 = 40
s.d. = sqrt(40) = 6.3
b. What is the probability that fewer than 40 smoke?
P[X < 40] = P[X = 0] + P[X = 1] + … + P[X = 39]
= P[-0.5 < X < 0.5] + P[0.5 < X < 1.5] + … + P[38.5 < X < 39.5]
= P[-0.5 < X < 39.5] = P[(-0.5 – 50)/6.3 < Z < (39.5 – 50)/6.3]
= P[-8.02 < Z < -1.67] ≈ 0.5 – 0.4525 = 0.0475

● Since n = 250, which is large, and p = 0.2, we see that:
np = 50, which is >= 5
n(1 – p) = 250 * 0.8 = 200, which is >= 5
We use the Normal approximation to the Binomial distribution for the random
variable X, the number of adults who smoke, out of a random sample of 250
adults.
● μ = np = 50
● σ = square root{np(1 – p)} = 6.325

● b) P[X < 40] = P[X = 0] + P[X = 1] + … + P[X = 39]
= P[-0.5 < X < 0.5] + P[0.5 < X < 1.5] + … + P[38.5 < X < 39.5]
= P[-0.5 < X < 39.5] using Normal approximation
where μ = 50, σ² = 40.
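The quality of the approximation in Exercise 11 can be checked by comparing it with the exact binomial sum. The sketch below is not part of the original slides; it uses the unrounded standard deviation, so a table lookup with a rounded z-value will differ slightly in the third decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 250, 0.2
mu = n * p                      # 50
sigma = sqrt(n * p * (1 - p))   # square root of 40

# Normal approximation with continuity correction: P[-0.5 < X < 39.5].
approx_dist = NormalDist(mu, sigma)
p_approx = approx_dist.cdf(39.5) - approx_dist.cdf(-0.5)

# Exact binomial probability: P[X = 0] + P[X = 1] + ... + P[X = 39].
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(40))

print(f"normal approximation: {p_approx:.4f}")
print(f"exact binomial:       {p_exact:.4f}")
```

The upper limit is 39.5 rather than 40 because "fewer than 40" stops at the integer 39, and the continuity correction extends that integer by 0.5.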