Kaustav Banerjee
Decision Sciences Area, IIM Lucknow
All the emails that would pass through the filtering software, if installed, constitute the
population of interest here. Some fraction (p) of these emails will be spam. Purchasing the
software will be profitable if p < 0.20. If p ≥ 0.20, the software will not remove enough spam
to compensate for the fee. The two conjectures regarding p are complementary, and only one
of them can be true: purchasing the software is either cost effective or not.
Now assess the risks involved in this decision problem. During the trial, the manager could
check the sample proportion of spams, say p̄, a natural estimator of the true proportion of
spams p. The sample proportion p̄ could be close to or far from p. So, looking at p̄, the manager
could decide to buy the software (p̄ < 0.20) while it is actually ineffective (p ≥ 0.20): that is
incurring a loss. Alternatively, the manager may fail to realize (p̄ ≥ 0.20) that the software is
quite effective (p < 0.20): that is a lost opportunity. Can she avoid both risks?
It is easy to avoid the risk of buying ineffective software: she could simply never buy
any software, however effective. But that inflates the risk of not buying effective software.
To avoid the risk of not buying effective software, she could buy any filtering software under
the sun. But that inflates the risk of buying ineffective software! Clearly, the risks work against
each other, and one should prioritize them. Which of the two would be more serious for the
company?
Next we formulate this problem statistically.
2 Statistical formulation
Notice that our parameter of interest p (the proportion of spams not filtered by the software) is a
feature of the population (all the emails the office would receive via the commercial software,
if purchased). Regarding p there are two complementary views/hypotheses/conjectures, p ≥
0.20 and p < 0.20, formulated as the null (H0) and alternative (Ha) hypotheses. The null
hypothesis usually represents the view that is content with the existing state of affairs and
thus discourages any change. The alternative hypothesis represents the view that challenges the
existing state of affairs and calls for a change.
What do you think? In the software example, which one should be regarded
as the null hypothesis?
Alternative hypotheses can be classified as one-sided or two-sided. To fix the
idea, consider the following process control exercise.
Application 2: A manufacturing process is supposed to produce capsules having 400 mg
of a chemical. Since variation in a manufacturing process is unavoidable, the mean content µ
of the capsules produced by the manufacturing process may drift from the target, so continuous
monitoring of the process is necessary for checking the stability of the mean of the process.
A consultant suggested implementing the following procedure: in every hour during a
shift a sample of 100 capsules is to be selected, and if the average content of the sample falls
below 399.90 or above 400.10, stop the process and hunt for the trouble. So we need to assess
if µ is in-control or out-of-control.

[Figure: control chart of hourly sample mean contents (mg) for Samples 1–10, fluctuating around the 400 mg target.]
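The consultant's stopping rule is easy to mechanize. Here is a minimal Python sketch of that rule (the function name and the example sample means are illustrative, not from the original note):

```python
def out_of_control(sample_mean, lo=399.90, hi=400.10):
    """Consultant's rule: stop the process and hunt for trouble
    if the hourly sample mean (of 100 capsules) falls outside [lo, hi]."""
    return sample_mean < lo or sample_mean > hi

# Hypothetical hourly sample means (mg)
for x_bar in [400.03, 399.96, 400.12]:
    print(x_bar, "stop" if out_of_control(x_bar) else "continue")
```

The rule flags 400.12 (above 400.10) but lets 400.03 and 399.96 pass, matching the two-sided nature of the monitoring problem.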
What do you think? In the process control example, what are the risks
involved? What should be the null hypothesis? What type of alternative
hypothesis is that? What type of alternative was set in the software example?
In general, how do we prioritize the risks involved in a decision problem? Consider the following
analogy from the courtroom. If you are charged with a crime, the court believes you are innocent
until proven guilty. In this setting, rejecting a true null hypothesis is equivalent to convicting an
innocent person. Similarly, failing to reject a false null hypothesis is equivalent to letting a
guilty person go free. Lawmakers hold that convicting an innocent person is more serious
than letting a guilty person go free.
                                          True State of Affairs
                                        H0 True             H0 False
Sample-based    Reject H0               Type I Error        Correct Decision
Decision        Fail to Reject H0       Correct Decision    Type II Error
Analogously, we regard rejecting a true null hypothesis (Type I error) as much more severe
than not rejecting a false null hypothesis (Type II error). So we do not want the probability of
committing a Type I error to exceed a pre-assigned (prior to sampling/data collection) threshold
value α, known as the level of significance and usually set at 1%, 5% or 10%. This gives
a very important criterion for evaluating any decision rule: whether it keeps the probability of
a Type I error within the nominated level of significance, i.e. P(Type I error) ≤ α.
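This criterion can be checked by simulation. The Python sketch below (a minimal illustration; the cutoff 0.1342 is derived from α = 5% and n = 100 using the normal approximation, and the seed and trial count are arbitrary choices) estimates how often the rule "buy when p̄ < cutoff" wrongly rejects H0 when the true proportion sits at the break-even value p = 0.20:

```python
import random
from statistics import NormalDist

random.seed(42)
n, alpha, p0 = 100, 0.05, 0.20
z_alpha = NormalDist().inv_cdf(1 - alpha)            # ≈ 1.645
cutoff = p0 - z_alpha * (p0 * (1 - p0) / n) ** 0.5   # ≈ 0.1342

# Simulate many trial periods with the true spam proportion at the
# break-even value p = 0.20 and count how often we (wrongly) reject H0.
trials = 20_000
rejections = sum(
    (sum(random.random() < p0 for _ in range(n)) / n) < cutoff
    for _ in range(trials)
)
print(round(cutoff, 4), rejections / trials)  # rejection rate close to, and below, 0.05
```

The simulated Type I error rate stays at or below the nominated 5%, as the criterion demands.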
Notice that we choose to consider the sampling distribution of p̄ at the break-even value 20%,
and not at some value larger than 20% implied by H0. This is because if p̄ is around 20%, we
feel most uncertain in our decision; as p̄ deviates from 20% in either direction, it becomes
much easier to decide accordingly.
Having believed in the modified null hypothesis H0 : p = 0.20, how likely is an estimate p̄obs,
with reference to the sampling distribution of p̄ as above? This is assessed by the p-value: the
probability, computed under H0, of observing an estimate at least as extreme as p̄obs.
Notice that for an estimate p̄obs, if the associated p-value falls short of α, it is extremely unlikely
that such an estimate would be observed under the null hypothesis H0 : p = 0.20. In short,
such an estimate is strong evidence against H0. Consequently, we reject the null hypothesis.
However, if the p-value exceeds α, the evidence against the null is not strong, and we fail
to reject the null hypothesis, as per the data. Observe that this p-value criterion can also be
formulated as follows: to test H0 : p = 0.20 against Ha : p < 0.20,

Reject H0 if   (p̄obs − 0.20) / √((0.20 × 0.80)/n) < −Zα
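The one-sided rejection rule can be sketched in Python. In this minimal example the observed value p̄obs = 0.13 is hypothetical, chosen for illustration, with n = 100 as in the trial:

```python
from statistics import NormalDist

def one_sided_prop_test(p_bar_obs, n, p0=0.20, alpha=0.05):
    """Test H0: p = p0 against Ha: p < p0 using the normal
    approximation to the sampling distribution of p_bar under H0."""
    se = (p0 * (1 - p0) / n) ** 0.5      # standard error at p = p0
    z = (p_bar_obs - p0) / se
    p_value = NormalDist().cdf(z)        # lower-tail probability P(Z < z)
    return z, p_value, p_value < alpha   # reject H0 iff p-value < alpha

z, p_value, reject = one_sided_prop_test(0.13, 100)
print(round(z, 2), round(p_value, 4), reject)  # -1.75 0.0401 True
```

Here z = (0.13 − 0.20)/0.04 = −1.75, the p-value 0.0401 falls short of α = 0.05, and equivalently z < −Z0.05 = −1.645, so the manager would reject H0 and buy the software.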
Therefore, to test H0 : µ = 400 against Ha : µ ≠ 400, we reject H0 if this p-value falls short
of α, and fail to reject H0 otherwise. Alternatively, we could say:

Reject H0 if   |x̄obs − 400| / (0.5/√n) > Zα/2
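The two-sided rule for the process control example can be sketched similarly. In this illustration the observed mean x̄obs = 399.88 is hypothetical, and the population standard deviation 0.5 mg is taken as known, as the formula above assumes:

```python
from statistics import NormalDist

def two_sided_mean_test(x_bar_obs, n, mu0=400.0, sigma=0.5, alpha=0.05):
    """Test H0: mu = mu0 against Ha: mu != mu0, with sigma known."""
    z = (x_bar_obs - mu0) / (sigma / n ** 0.5)
    p_value = 2 * NormalDist().cdf(-abs(z))        # two-tailed p-value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ≈ 1.96 for alpha = 5%
    return z, p_value, abs(z) > z_crit

z, p_value, reject = two_sided_mean_test(399.88, 100)
print(round(z, 2), round(p_value, 4), reject)  # -2.4 0.0164 True
```

Since |z| = 2.4 exceeds Z0.025 = 1.96 (equivalently, the p-value 0.0164 falls short of 5%), the process would be stopped to hunt for the trouble.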
Next, continuing with the software example, what is the probability that the manager will purchase
a good (p = 0.15) software? Equivalently, what is the probability of rejecting the null H0 :
p = 0.20 when it is indeed false? To see this, taking α = 5% and n = 100, we obtain
" #
0.1342 − p
P (p̄ < 0.1342 | p = 0.15) = P Z < p | p = 0.15 ≈ 0.33
p(1 − p)/n
So, compared to the chance of purchasing a bad software [P(Type I error) ≤ 5%], the probability
of purchasing a good (p = 0.15) software is 33%. Our decision rule is thus designed to take the
correct decision more often than it takes the wrong decision. This is the power of the decision
rule, expressed as P(Rejecting H0 | H0 is false) = 1 − β, where β represents the probability of
a Type II error: the probability of not rejecting a null hypothesis that is indeed false.
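The power calculation above can be reproduced in a few lines of Python (a minimal sketch using the same α = 5%, n = 100, and alternative value p = 0.15 as in the text):

```python
from statistics import NormalDist

n, alpha = 100, 0.05
p0, p1 = 0.20, 0.15          # null value and the "good software" value
z_alpha = NormalDist().inv_cdf(1 - alpha)
cutoff = p0 - z_alpha * (p0 * (1 - p0) / n) ** 0.5   # ≈ 0.1342

# Power: probability of rejecting H0 (i.e. p_bar < cutoff) when p = 0.15;
# the standard error is now evaluated at the alternative value p1.
se1 = (p1 * (1 - p1) / n) ** 0.5
power = NormalDist().cdf((cutoff - p1) / se1)
print(round(cutoff, 4), round(power, 2))  # 0.1342 0.33
```

Note that the standard error switches from √(p0(1 − p0)/n) (used to set the cutoff under H0) to √(p1(1 − p1)/n) (used to evaluate the probability under the alternative).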
[Figure: power curve showing the probability of purchasing the software (i.e. rejecting H0) plotted against the true proportion p.]