HYPOTHESIS TESTING
HANOI – 2023
Email: thuy.nguyenthithu2@hust.edu.vn
Nguyễn Thị Thu Thủy (SAMI-HUST) ProSta-SigPro-CHAP4 1/77 HANOI – 2023 1 / 77
CHAPTER OUTLINE
Content
1 4.1 BASIC CONCEPTS OF HYPOTHESIS TESTING
4.1.1 Random Sample and Sample Mean
4.1.2 Statistical Inference
2 4.2 SIGNIFICANCE TESTING
4.2.1 Significance Level
4.2.2 Design a Significance Test
4.2.3 Two Kinds of Errors
4.2.4 Problems
3 4.3 BINARY HYPOTHESIS TESTING
4.3.1 Likelihood Functions
4.3.2 Type I Error and Type II Error
4.3.3 Design of a Binary Hypothesis Test
4.3.4 Four Methods of Choosing A0
4.3.5 Problems
4 4.4 MULTIPLE HYPOTHESIS TEST
Random Sample
- To define the random sample, consider repeated independent trials of an experiment. Each trial results in one
observation of a random variable, X. After n trials, we have sample values of the n random variables
X1 , . . . , Xn , all with the same CDF as X. Define (X1 , . . . , Xn ) to be a random sample of size n from X.
Sample Mean
Let (X1 , . . . , Xn ) be a random sample of size n from X. The sample mean is the numerical average of the
observations:
X̄ = (X1 + · · · + Xn)/n.    (1)

If E(X) = µ and Var(X) = σ², then

E(X̄) = µ and Var(X̄) = σ²/n.    (2)
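As a quick sanity check on (2), the following simulation sketch estimates E(X̄) and Var(X̄) for an exponential population. The Exp(λ) model and all numerical values are assumed for illustration; they are not from the text.

```python
import random

random.seed(1)

# Population: exponential with rate lambd, so mu = 1/lambd, sigma^2 = 1/lambd**2.
lambd = 2.0
mu, var = 1 / lambd, 1 / lambd ** 2
n = 25            # sample size
trials = 50_000   # number of independent random samples

# Draw many random samples of size n and record each sample mean.
means = [sum(random.expovariate(lambd) for _ in range(n)) / n
         for _ in range(trials)]
m_hat = sum(means) / trials
v_hat = sum((x - m_hat) ** 2 for x in means) / trials

print(m_hat)       # should be close to mu = 0.5
print(v_hat * n)   # should be close to sigma^2 = 0.25
```

The empirical mean of the sample means approaches µ, and n times their empirical variance approaches σ², as Equation (2) predicts.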
Statistical Inference
- Some of the most important applications of probability theory involve reasoning in the presence of
uncertainty. In these applications, we analyze the observations of an experiment in order to arrive at a
conclusion. When the conclusion is based on the properties of random variables, the reasoning is referred to as
statistical inference. Like probability theory, the theory of statistical inference refers to an experiment consisting
of a procedure and observations. In all statistical inference methods, there is also a set of possible conclusions
and a means of measuring the accuracy of a conclusion.
- A statistical inference method assigns a conclusion to each possible outcome of the experiment. Therefore, a
statistical inference method consists of three steps:
1 perform an experiment;
2 observe an outcome;
3 state a conclusion.
Statistical Inference
- The assignment of conclusions to outcomes is based on probability theory. The aim of the assignment is to
achieve the highest possible accuracy. This chapter contains brief introductions to two categories of statistical
inference.
1 Significance Testing
Conclusion: Accept or reject the hypothesis that the observations result from a certain probability model
H0 .
Accuracy Measure: Probability of rejecting the hypothesis when it is true.
2 Hypothesis Testing
Conclusion: The observations result from one of M hypothetical probability models: H0 , H1 , . . . , HM −1 .
Accuracy Measure: Probability that the conclusion is Hi when the true model is Hj for
i, j = 0, 1, . . . , M − 1.
Example 1
Suppose X1 , . . . , Xn are iid samples of an exponential Exp(λ) random variable X with unknown parameter
λ. Using the observations X1 , . . . , Xn , each of the statistical inference methods can answer questions
regarding the unknown λ. For each of the methods, we state the underlying assumptions of the method
and a question that can be addressed by the method.
1 Significance Test: Assuming λ is a constant, should we accept or reject the hypothesis that λ = 3.5?
2 Hypothesis Test: Assuming λ is a constant, does λ equal 2.5, 3.5, or 4.5?
- To answer either of the questions in Example 1, we have to state in advance which values of X1 , . . . , Xn
produce each possible answer.
1 For a significance test, the answer must be either accept or reject.
2 For the hypothesis test, the answer must be one of the numbers 2.5, 3.5, or 4.5.
Significance Level
- A significance test begins with the hypothesis, H0 , that a certain probability model describes the observations
of an experiment. The question addressed by the test has two possible answers: accept the hypothesis or reject
it.
Definition 2 (Significance Level)
The significance level of the test is defined as the probability of rejecting the hypothesis if it is true.
- The test divides S, the sample space of the experiment, into an event space consisting of an acceptance set
A and the complementary rejection set R = Aᶜ. If the observation s ∈ A, we accept H0 . If s ∈ R, we reject the hypothesis. The
significance level is

α = P (s ∈ R).    (3)
Example 2
Suppose that on Thursdays between 9:00 and 9:30 at night, the number of call attempts N at a telephone
switching office is a Poisson random variable with an expected value of 1000. Next Thursday, the President
will deliver a speech at 9:00 that will be broadcast by all radio and television networks. The null
hypothesis, H0 , is that the speech does not affect the probability model of telephone calls. In other words,
H0 states that on the night of the speech, N is a Poisson random variable with an expected value of 1000.
Design a significance test for hypothesis H0 at a significance level of α = 0.05.
Solution.
The experiment involves counting the call requests, N , between 9:00 and 9:30 on the night of the speech.
To design the test, we need to specify a rejection set, R, such that
P (N ∈ R) = 0.05.
Since the speech could either increase or decrease call attempts, we choose a two-sided rejection set of the form

R = {n : |n − 1000| ≥ c}.
The remaining task is to choose c to satisfy Equation (3). Under hypothesis H0 , µN = σ²N = 1000. The
significance level is

α = P (|N − 1000| ≥ c) = P (|N − µN |/σN ≥ c/σN ).

Since E(N ) is large, we can use the central limit theorem and approximate (N − µN )/σN by the standard
normal random variable Z,(2) so that

α ≈ P (|Z| ≥ c/√1000) = 2[1 − Φ(c/√1000)] = 0.05.
In this case, Φ(c/√1000) = 0.975 and c = 1.96 × √1000 = 61.98. Therefore, if we observe more than
1000 + 61 calls or fewer than 1000 − 61 calls, we reject the null hypothesis at significance level 0.05.
(2) If N is a Poisson random variable with E(N ) = λ and Var(N ) = λ, then Z = (N − λ)/√λ is approximately a standard
normal random variable. The approximation is good for λ > 5.
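The threshold c and the exact Poisson significance level of the resulting rule can be checked numerically. The sketch below uses only the standard library; the helper names `phi`, `phi_inv`, and `poisson_cdf` are ours, not from the text.

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_inv(p):
    """Invert phi by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu = 1000.0
c = phi_inv(0.975) * math.sqrt(mu)    # CLT threshold, ~61.98

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), with the pmf accumulated in log space."""
    logp = -lam                        # log P(N = 0); exp underflows here, but
    total = math.exp(logp)             # the lost terms are genuinely negligible
    for n in range(1, k + 1):
        logp += math.log(lam / n)
        total += math.exp(logp)
    return total

# Exact significance of the rule "reject H0 if |N - 1000| >= c".
lower = math.floor(mu - c)             # largest n in the lower rejection tail
upper = math.ceil(mu + c)              # smallest n in the upper rejection tail
alpha_exact = poisson_cdf(lower, mu) + (1 - poisson_cdf(upper - 1, mu))
print(c, alpha_exact)                  # c ~ 61.98, alpha_exact close to 0.05
```

The exact Poisson tail probability lands close to the nominal 0.05, confirming that the CLT approximation is adequate at λ = 1000.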
Two Kinds of Errors
- In a significance test, two kinds of errors are possible. Statisticians refer to them as Type I errors and Type II
errors: a Type I error occurs when we reject H0 although it is true; a Type II error occurs when we accept H0
although it is false.
- The hypothesis specified in a significance test makes it possible to calculate the probability of a Type I error,
α = P (s ∈ R).
In the absence of a probability model for the condition “H0 false,” there is no way to calculate the probability of
a Type II error.
- A binary hypothesis test, described in the next section, includes an alternative hypothesis H1 . Then it is possible
to use the probability model given by H1 to calculate the probability of a Type II error, which is
P (s ∈ A|H1 ).
- Although a significance test does not specify a complete probability model as an alternative to the null
hypothesis, the nature of the experiment influences the choice of the rejection set, R. In Example 2, we
implicitly assume that the alternative to the null hypothesis is a probability model with an expected value that is
either higher than 1000 or lower than 1000.
- In the following example, the alternative is a model with an expected value that is lower than the original
expected value.
Example 3
Before releasing a diet pill to the public, a drug company runs a test on a group of 64 people. Before
testing the pill, the probability model for the weight of the people, measured in pounds, is a normal
N (190, 24²) random variable X. Design a test based on the sample mean of the weight of the population to
determine whether the pill has a significant effect. The significance level is α = 0.01.
Solution.
Under the null hypothesis, H0 , the probability model after the people take the diet pill is N (190, 24²),
the same as before taking the pill. With n = 64 observations, the sample mean X̄ is then normal with
expected value 190 and standard deviation 24/√64 = 3. The test specifies a rejection set R such that

P (X̄ ∈ R) = 0.01.
If we reject the null hypothesis, we will decide that the pill is effective and release it to the public.
In this example, we want to know whether the pill has caused people to lose weight. If they gain weight,
we certainly do not want to declare the pill effective. Therefore, we choose the rejection set R to consist
entirely of weights below the original expected value: R = {X̄ ≤ r0 }. We choose r0 so that the probability
that we reject the null hypothesis is 0.01:

P (X̄ ∈ R) = P (X̄ ≤ r0 ) = Φ((r0 − 190)/3) = 0.01.
Since Φ(−2.33) = 0.01, it follows that (r0 − 190)/3 = −2.33, or r0 = 183.01.
Thus we will reject the null hypothesis and accept that the diet pill is effective at significance level 0.01 if
the sample mean of the population weight drops to 183.01 pounds or less.
-
Note the difference between the symmetrical rejection set in Example 2 and the one-sided rejection set in
Example 3. We selected these sets based on how the results of the test will be applied.
In the language of statistical inference, the symmetrical set is part of a two-tail significance test, and the
one-sided rejection set is part of a one-tail significance test.
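The one-tail threshold of Example 3, and the two-tail threshold the same data would give, can be computed side by side. This is an illustrative sketch; the bisection helper `phi_inv` is ours.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_inv(p):
    """Invert phi by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma, n, alpha = 190.0, 24.0, 64, 0.01
se = sigma / math.sqrt(n)           # std. deviation of the sample mean = 3

# One-tail test (Example 3): reject H0 if the sample mean <= r0.
r0 = mu + se * phi_inv(alpha)       # solves Phi((r0 - mu)/se) = alpha

# Two-tail test at the same level: reject if |sample mean - mu| >= c.
c = se * phi_inv(1 - alpha / 2)

print(r0, c)                        # r0 ~ 183.0, c ~ 7.73
```

The two-tail threshold c is larger than the one-tail distance mu − r0 ≈ 7.0 because the two-tail test splits α between both tails.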
Table 1 (Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt)
x 0 1 2 3 4 5 6 7 8 9
0.0 0.50000 50399 50798 51197 51595 51994 52392 52790 53188 53586
0.1 53983 54380 54776 55172 55567 55962 56356 56749 57142 57535
0.2 57926 58317 58706 59095 59483 59871 60257 60642 61026 61409
0.3 61791 62172 62556 62930 63307 63683 64058 64431 64803 65173
0.4 65542 65910 66276 66640 67003 67364 67724 68082 68439 68793
0.5 69146 69497 69847 70194 70540 70884 71226 71566 71904 72240
0.6 72575 72907 73237 73565 73891 74215 74537 74857 75175 75490
0.7 75804 76115 76424 76730 77035 77337 77637 77935 78230 78524
0.8 78814 79103 79389 79673 79955 80234 80511 80785 81057 81327
0.9 81594 81859 82121 82381 82639 82894 83147 83398 83646 83891
1.0 84134 84375 84614 84850 85083 85314 85543 85769 85993 86214
1.1 86433 86650 86864 87076 87286 87493 87698 87900 88100 88298
1.2 88493 88686 88877 89065 89251 89435 89617 89796 89973 90147
1.3 90320 90490 90658 90824 90988 91149 91309 91466 91621 91774
1.4 91924 92073 92220 92364 92507 92647 92786 92922 93056 93189
1.5 93319 93448 93574 93699 93822 93943 94062 94179 94295 94408
1.6 94520 94630 94738 94845 94950 95053 95154 95254 95352 95449
1.7 95543 95637 95728 95818 95907 95994 96080 96164 96246 96327
1.8 96407 96485 96562 96638 96712 96784 96856 96926 96995 97062
1.9 97128 97193 97257 97320 97381 97441 97500 97558 97615 97670
Table 1 (continued) (Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt)
x 0 1 2 3 4 5 6 7 8 9
2.0 97725 97778 97831 97882 97932 97982 98030 98077 98124 98169
2.1 98214 98257 98300 98341 98382 98422 98461 98500 98537 98574
2.2 98610 98645 98679 98713 98745 98778 98809 98840 98870 98899
2.3 98928 98956 98983 99010 99036 99061 99086 99111 99134 99158
2.4 99180 99202 99224 99245 99266 99285 99305 99324 99343 99361
2.5 99379 99396 99413 99430 99446 99461 99477 99492 99506 99520
2.6 99534 99547 99560 99573 99585 99598 99609 99621 99632 99643
2.7 99653 99664 99674 99683 99693 99702 99711 99720 99728 99736
2.8 99744 99752 99760 99767 99774 99781 99788 99795 99801 99807
2.9 99813 99819 99825 99831 99836 99841 99846 99851 99856 99861
x 3.0 3.1 3.2 3.3 3.4
Φ(x) 99865 99903 99931 99952 99966
x 3.5 3.6 3.7 3.8 3.9
Φ(x) 99977 99984 99989 99993 99995
x 4.0 4.5 5.0
Φ(x) 999968 999997 9999997
Problems
Exercise 1
Let K be the number of heads in n = 100 flips of a coin. Devise significance tests for the hypothesis H that
the coin is fair such that
(a) The significance level α = 0.05 and the rejection set R has the form {|K − E(K)| > c}.
(b) The significance level α = 0.01 and the rejection set R has the form {K > c}.
Problems
Exercise 2
The duration of a voice telephone call is an exponential random variable T with expected value E(T ) = 3
minutes. Data calls tend to be longer than voice calls on average. Observe a call and reject the null
hypothesis that the call is a voice call if the duration of the call is greater than t0 minutes.
(a) Write a formula for α, the significance level of the test, as a function of t0 .
(b) What is the value of t0 that produces a significance level α = 0.05?
Problems
Exercise 3
A class has 2n (a large number) students. The students are separated into two groups A and B each with n
students. Group A students take exam A and earn iid scores X1 , . . . , Xn . Group B students take exam B,
earning iid scores Y1 , . . . , Yn . The two exams are similar but different; however, the exams were designed so
that a student's score X on exam A or Y on exam B has the same mean and variance σ 2 = 100. For each
exam, we form the sample mean statistic
X̄A = (X1 + · · · + Xn)/n,    ȲB = (Y1 + · · · + Yn)/n.
Based on the statistic D = X̄A − ȲB , use the central limit theorem to design a significance test at
significance level α = 0.05 for the hypothesis H0 that a student's score on the two exams has the same mean
µ and variance σ 2 = 100. What is the rejection region if n = 100? Make sure to specify any additional
assumptions that you need to make; however, try to make as few additional assumptions as possible.
Likelihood Functions
-
In a binary hypothesis test, there are two hypothetical probability models, H0 and H1 , and two possible
conclusions: accept H0 as the true model, or accept H1 .
There is also a probability model for H0 and H1 , conveyed by the numbers P (H0 ) and
P (H1 ) = 1 − P (H0 ). These numbers are referred to as the a priori probabilities or prior probabilities of H0
and H1 . They reflect the state of knowledge about the probability model before an outcome is observed.
The complete experiment for a binary hypothesis test consists of two sub-experiments.
1 The first sub-experiment chooses a probability model from sample space S′ = {H0 , H1 }. The
probability models H0 and H1 have the same sample space, S.
2 The second sub-experiment produces an observation corresponding to an outcome, s ∈ S.
Likelihood Functions
-
When the observation leads to a random vector X, we call X the decision statistic. Often, the decision
statistic is simply a random variable X.
1 When the decision statistic X is discrete, the probability models are conditional probability
mass functions PX|H0 (x) and PX|H1 (x).
2 When X is a continuous random vector, the probability models are conditional probability
density functions fX|H0 (x) and fX|H1 (x).
In the terminology of statistical inference, these functions are referred to as likelihood functions.
For example, fX|H0 (x) is the likelihood of x given H0 .
-
The test design divides S into two sets, A0 and the complementary set A1 = A0ᶜ.
1 If the outcome s ∈ A0 , the conclusion is accept H0 .
2 Otherwise, the conclusion is accept H1 .
The accuracy measure of the test consists of two error probabilities.
1 P (A1 |H0 ) corresponds to the probability of a Type I error. It is the probability of accepting
H1 when H0 is the true probability model.
2 Similarly, P (A0 |H1 ) is the probability of accepting H0 when H1 is the true probability
model. It corresponds to the probability of a Type II error.
Design of a Binary Hypothesis Test
The design of a binary hypothesis test represents a trade-off between the two error probabilities,
PFA = P (A1 |H0 ) and PMISS = P (A0 |H1 ).
To understand the trade-off, consider an extreme design in which A0 = S consists of the entire sample
space and A1 = ∅ is the empty set. In this case, PFA = 0 and PMISS = 1.
Now let A1 expand to include an increasing proportion of the outcomes in S. As A1 expands, PFA
increases and PMISS decreases. At the other extreme, A0 = ∅, which implies PMISS = 0. In this case,
A1 = S and PFA = 1.
A graph representing the possible values of PFA and PMISS is referred to as a receiver operating curve
(ROC). Examples appear in Figure 1.
Example 5
The noise voltage in a radar detection system is a standard normal random variable, N . When a target is
present, the received signal is X = v + N volts with v ≥ 0. Otherwise, the received signal is X = N volts.
Periodically, the detector performs a binary hypothesis test with H0 as the hypothesis no target and H1 as
the hypothesis target present. The acceptance sets for the test are A0 = {X < x0 } and A1 = {X ≥ x0 }.
Find PMISS and PFA as functions of x0 .
Solution.
To perform the calculations, we observe that
under hypothesis H0 , X = N is a normal N (0, 1) random variable;
under hypothesis H1 , X = v + N is a normal N (v, 1) random variable.
Therefore,

PFA = P (X ≥ x0 |H0 ) = 1 − Φ(x0 )

and

PMISS = P (X < x0 |H1 ) = Φ(x0 − v).

Figure 2: The probability of a miss and the probability of a false alarm as a function of the threshold x0
for Example 5
- The same data also appears in the corresponding receiver operating curves of Figure 3.
Figure 3: The corresponding receiver operating curve for the system. We see that the ROC improves as
v increases
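The trade-off behind Figures 2 and 3 is easy to reproduce: sweep the threshold x0 and evaluate the two error probabilities. The particular threshold and v values below are assumed for illustration.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Example 5: under H0, X ~ N(0,1); under H1, X ~ N(v,1); A1 = {X >= x0}.
def p_fa(x0):
    return 1 - phi(x0)

def p_miss(x0, v):
    return phi(x0 - v)

x0 = 1.645   # threshold giving P_FA close to 0.05 (value assumed)
for v in (1.0, 2.0, 3.0):
    print(v, p_fa(x0), p_miss(x0, v))
```

At a fixed false alarm probability, the miss probability drops sharply as v grows, which is exactly the ROC improvement described above.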
Example 6
A modem transmits a binary signal to another modem. Based on a noisy measurement, the receiving
modem must choose between hypothesis H0 (the transmitter sent a 0) and hypothesis H1 (the transmitter
sent a 1). A false alarm occurs when a 0 is sent but a 1 is detected at the receiver. A miss occurs when a 1
is sent but a 0 is detected. For both types of error, the cost is the same. The MAP test minimizes PERR , the
total probability of error of a binary hypothesis test. Find PERR .
Solution.
The law of total probability, Chapter 1, relates PERR to the a priori probabilities of H0 and H1 and to the
two conditional error probabilities, PFA = P (A1 |H0 ) and PMISS = P (A0 |H1 ):

PERR = P (H0 )PFA + P (H1 )PMISS .
When the two types of errors have the same cost, as in Example 6, minimizing PERR is a sensible strategy.
- The following theorem specifies the binary hypothesis test that produces the minimum possible PERR .
Theorem 1
Given a binary hypothesis testing experiment with outcome s, the following rule leads to the lowest
possible value of PERR :

s ∈ A0 if P (H0 |s) ≥ P (H1 |s); s ∈ A1 otherwise.
-
Note that P (H0 |s) and P (H1 |s) are referred to as the a posteriori probabilities of H0 and H1 .
Just as the a priori probabilities P (H0 ) and P (H1 ) reflect our knowledge of H0 and H1 prior to
performing an experiment, P (H0 |s) and P (H1 |s) reflect our knowledge after observing s.
Theorem 1 states that in order to minimize PERR it is necessary to accept the hypothesis with the higher a
posteriori probability.
A test that follows this rule is a maximum a posteriori probability (MAP) hypothesis test.
- Applying Bayes' theorem to the rule in Theorem 1 gives Equation (5), another statement of the MAP decision rule:

s ∈ A0 if P (s|H0 )P (H0 ) ≥ P (s|H1 )P (H1 ); s ∈ A1 otherwise. (5)

It contains the three probability models that are assumed to be known:
The a priori probabilities of the hypotheses: P (H0 ) and P (H1 ).
The likelihood function of H0 : P (s|H0 ).
The likelihood function of H1 : P (s|H1 ).
- When the outcomes of an experiment yield a random vector X as the decision statistic, we can express the
MAP rule in terms of conditional PMFs or PDFs.
If X is discrete, we take X = xi to be the outcome of the experiment.
If the sample space S of the experiment is continuous, we interpret the conditional probabilities by
assuming that each outcome corresponds to the random vector X in the small volume x ≤ X < x + dx
with probability fX (x)dx.
Theorem 2
For an experiment that produces a random vector X, the MAP hypothesis test is

Discrete: x ∈ A0 if PX|H0 (x)/PX|H1 (x) ≥ P (H1 )/P (H0 ); x ∈ A1 otherwise;
Continuous: x ∈ A0 if fX|H0 (x)/fX|H1 (x) ≥ P (H1 )/P (H0 ); x ∈ A1 otherwise.
- Theorem 2 states that H0 is the better conclusion if the evidence in favor of H0 , based on the experiment,
outweighs the prior evidence in favor of H1 .
Example 7
x ∈ A0 if x ≤ x∗ = (σ²/(2v)) ln(p/(1 − p)); x ∈ A1 otherwise.
When p = 1/2, the threshold x∗ = 0 and the conclusion depends only on whether the evidence in the
received signal favors 0 or 1, as indicated by the sign of x.
When p ≠ 1/2, the prior information shifts the decision threshold x∗ .
1 The shift favors 1 (x∗ < 0) if p < 1/2.
2 The shift favors 0 (x∗ > 0) if p > 1/2.
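The behavior of the threshold x∗ under different priors can be checked directly. This is a sketch of the formula above; the σ and v values are assumed for illustration, with p playing the role of P (H0 ).

```python
import math

# Threshold from Example 7: x* = (sigma^2 / (2 v)) * ln(p / (1 - p)),
# where p is the prior probability of H0. sigma and v below are assumed.
def map_threshold(sigma, v, p):
    return sigma ** 2 / (2 * v) * math.log(p / (1 - p))

sigma, v = 1.0, 1.0
for p in (0.2, 0.5, 0.8):
    print(p, map_threshold(sigma, v, p))
```

With p = 1/2 the threshold is exactly 0; p < 1/2 pushes it negative (favoring 1) and p > 1/2 pushes it positive (favoring 0), matching the two cases listed above.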
Example 9
At a computer disk drive factory, the manufacturing failure rate is the probability that a randomly chosen
new drive fails the first time it is powered up. Normally the production of drives is very reliable, with a
failure rate q0 = 10−4 . However, from time to time there is a production problem that causes the failure
rate to jump to q1 = 10−1 . Let Hi denote the hypothesis that the failure rate is qi .
Every morning, an inspector chooses drives at random from the previous day’s production and tests them.
If a failure occurs too soon, the company stops production and checks the critical part of the process.
Production problems occur at random once every ten days, so that P (H1 ) = 0.1 = 1 − P (H0 ). Based on N ,
the number of drives tested up to and including the first failure, design a MAP hypothesis test. Calculate
the conditional error probabilities PFA and PMISS and the total error probability PERR .
Solution.
Given Hi , N is a geometric random variable with parameter qi and expected value 1/qi (see Chapter 2). That is, PN |Hi (n) = qi (1 − qi )n−1 for n = 1, 2, . . . and PN |Hi (n) = 0
otherwise. Therefore, by Theorem 2, the MAP design states

n ∈ A0 if PN |H0 (n)/PN |H1 (n) ≥ P (H1 )/P (H0 ); n ∈ A1 otherwise,

or, equivalently,

n ∈ A0 if ((1 − q0 )/(1 − q1 ))^(n−1) ≥ q1 P (H1 )/(q0 P (H0 )); n ∈ A1 otherwise.
Substituting q0 = 10−4 , q1 = 10−1 , P (H0 ) = 0.9, and P (H1 ) = 0.1, we obtain n∗ = 45.8.
Therefore, in the MAP hypothesis test, A0 = {n ≥ 46}.
This implies that the inspector tests at most 45 drives in order to reach a conclusion about the failure
rate. If the first failure occurs before test 46, the company assumes that the failure rate is 10−1 . If the
first 45 drives pass the test, then N ≥ 46 and the company assumes that the failure rate is 10−4 .
The error probabilities are(3)

PFA = P (N ≤ 45|H0 ) = 1 − (1 − q0 )^45 = 0.0045, PMISS = P (N ≥ 46|H1 ) = (1 − q1 )^45 = 0.0087.

(3) Using 1 + q + q² + · · · + q^(n−1) = (1 − q^n )/(1 − q).
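The numbers in Example 9 follow from a few lines of arithmetic. This sketch recomputes the MAP threshold and the two error probabilities; variable names are ours.

```python
import math

# Example 9: N is geometric with parameter q_i under hypothesis H_i.
q0, q1 = 1e-4, 1e-1
p0, p1 = 0.9, 0.1

# Accept H0 when ((1-q0)/(1-q1))**(n-1) >= q1*P(H1)/(q0*P(H0)).
n_star = 1 + math.log(q1 * p1 / (q0 * p0)) / math.log((1 - q0) / (1 - q1))
n0 = math.ceil(n_star)               # A0 = {n >= n0}

p_fa = 1 - (1 - q0) ** (n0 - 1)      # P(N <= n0 - 1 | H0)
p_miss = (1 - q1) ** (n0 - 1)        # P(N >= n0 | H1)
print(n_star, n0, p_fa, p_miss)      # n* ~ 45.8, n0 = 46
```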
Four Methods of Choosing A0
-
The MAP test implicitly assumes that both types of errors (miss and false alarm) are equally serious. This
is not the case in many important situations.
Consider an application in which C = C10 units is the cost of a false alarm (decide H1 when H0 is
correct) and C = C01 units is the cost of a miss (decide H0 when H1 is correct). In this situation the
expected cost of test errors is
E(C) = P (H0 )P (A1 |H0 )C10 + P (H1 )P (A0 |H1 )C01 (10)
For an experiment that produces a random vector X, the minimum cost hypothesis test is

Discrete: x ∈ A0 if PX|H0 (x)/PX|H1 (x) ≥ P (H1 )C01 /(P (H0 )C10 ); x ∈ A1 otherwise;
Continuous: x ∈ A0 if fX|H0 (x)/fX|H1 (x) ≥ P (H1 )C01 /(P (H0 )C10 ); x ∈ A1 otherwise.
- In this test, we note that only the relative cost C01 /C10 influences the test, not the individual costs or the
units in which cost is measured. A ratio > 1 implies that misses are more costly than false alarms. Therefore, a
ratio > 1 expands A1 , the acceptance set for H1 , making it harder to miss H1 when it is correct. On the other
hand, the same ratio contracts A0 and increases the false alarm probability, because a false alarm is less costly
than a miss.
Example 10
Continuing the disk drive test of Example 9, the factory produces 1,000 disk drives per hour and 10,000
disk drives per day. The manufacturer sells each drive for $100. However, each defective drive is returned
to the factory and replaced by a new drive. The cost of replacing a drive is $200, consisting of $100 for the
replacement drive and an additional $100 for shipping, customer support, and claims processing. Further,
remedying a production problem results in 30 minutes of lost production. Based on the decision statistic
N , the number of drives tested up to and including the first failure, what is the minimum cost test?
Solution.
Based on the given facts, the cost C10 of a FA is 30 minutes (500 drives) of lost production, or roughly
$50,000.
On the other hand, the cost C01 of a MISS is that 10% of the daily production will be returned for
replacement. For 1,000 drives returned at $200 per drive, the expected cost is 200,000 dollars.
The minimum cost test is

n ∈ A0 if PN |H0 (n)/PN |H1 (n) ≥ P (H1 )C01 /(P (H0 )C10 ); n ∈ A1 otherwise.
Therefore, in the minimum cost hypothesis test, A0 = {n ≥ 59}. An inspector tests at most 58 disk drives
to reach a conclusion regarding the state of the factory.
If 58 drives pass the test, then N ≥ 59 and the failure rate is assumed to be 10−4 . The error probabilities are

PFA = P (N ≤ 58|H0 ) = 1 − (1 − q0 )^58 = 0.0058, PMISS = P (N ≥ 59|H1 ) = (1 − q1 )^58 = 0.0022,

so the expected cost is E(C) = 0.9(0.0058)($50,000) + 0.1(0.0022)($200,000) ≈ $305.
By comparison, the MAP test, which minimizes the probability of an error rather than the expected cost,
has an expected cost E(CMAP ) = 0.9(0.0045)($50,000) + 0.1(0.0087)($200,000) ≈ $377.
A savings of $72.15 may not seem very large. The reason is that both the MAP test and the minimum
cost test work very well.
By comparison, for a “no test” policy that skips testing altogether, each day that the failure rate is
q1 = 0.1 will result, on average, in 1,000 returned drives at an expected cost of $200,000. Since such days
will occur with probability P (H1 ) = 0.1, the expected cost of a “no test” policy is $20,000 per day.
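Comparing the MAP and minimum cost tests side by side makes the trade-off concrete. This sketch reuses the geometric model of Example 9; helper names are ours.

```python
import math

# Example 10: MAP test versus minimum cost test for the disk drive factory.
q0, q1 = 1e-4, 1e-1
p0, p1 = 0.9, 0.1
c10, c01 = 50_000.0, 200_000.0       # false-alarm cost, miss cost (dollars)

def threshold(ratio):
    # A0 = {n >= n0} where ((1-q0)/(1-q1))**(n-1) >= ratio * q1/q0
    n_star = 1 + math.log(ratio * q1 / q0) / math.log((1 - q0) / (1 - q1))
    return math.ceil(n_star)

def expected_cost(n0):
    p_fa = 1 - (1 - q0) ** (n0 - 1)
    p_miss = (1 - q1) ** (n0 - 1)
    return p0 * p_fa * c10 + p1 * p_miss * c01

n_map = threshold(p1 / p0)                   # MAP: prior odds only
n_mc = threshold(p1 * c01 / (p0 * c10))      # minimum cost: odds weighted by costs
print(n_map, n_mc)                           # 46 and 59
print(expected_cost(n_map), expected_cost(n_mc))
```

The cost ratio C01/C10 = 4 pushes the threshold from 46 up to 59, and the expected cost drops by roughly $72, matching the figures in the example.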
-
Given an observation, the MAP test minimizes the probability of accepting the wrong hypothesis and the
minimum cost test minimizes the cost of errors.
However, the MAP test requires that we know the a priori probabilities P (Hi ) of the competing
hypotheses, and the minimum cost test requires that we know in addition the relative costs of the two
types of errors.
In many situations, these costs and a priori probabilities are difficult or even impossible to specify.
The Neyman-Pearson test minimizes PMISS subject to the false alarm probability constraint PFA = α,
where α is a constant that indicates our tolerance of false alarms.
Because PFA = P (A1 |H0 ) and PMISS = P (A0 |H1 ) are conditional probabilities, the test does not require
the a priori probabilities P (H0 ) and P (H1 ).
Based on the decision statistic X, a continuous random vector, the decision rule that minimizes PMISS ,
subject to the constraint PFA = α, is

x ∈ A0 if L(x) = fX|H0 (x)/fX|H1 (x) ≥ γ; x ∈ A1 otherwise, (13)

where γ is chosen so that ∫_{L(x)<γ} fX|H0 (x) dx = α.
Based on the decision statistic X, a discrete random vector, the decision rule that minimizes PMISS ,
subject to the constraint PFA ≤ α, is

x ∈ A0 if L(x) = PX|H0 (x)/PX|H1 (x) ≥ γ; x ∈ A1 otherwise, (14)

where γ is the largest possible value such that Σ_{L(x)<γ} PX|H0 (x) ≤ α.
Example 11
Continuing the disk drive factory test of Example 9, design a Neyman-Pearson test such that the false
alarm probability satisfies PFA ≤ α = 0.01. Calculate the resulting miss and false alarm probabilities.
Solution.
The Neyman-Pearson test is

n ∈ A0 if L(n) = PN |H0 (n)/PN |H1 (n) ≥ γ; n ∈ A1 otherwise.
We see from Equation (8) that this is the same as the MAP test with P (H1 )/P (H0 ) replaced by γ.
Thus, just like the MAP test, the Neyman-Pearson test must be a threshold test of the form
n ∈ A0 if n ≥ n∗ ; n ∈ A1 otherwise.
Since

PFA = P (N ≤ n∗ − 1|H0 ) = 1 − (1 − q0 )^(n∗−1) ≤ α,

this implies

n∗ ≤ 1 + ln(1 − α)/ ln(1 − q0 ) = 1 + ln(0.99)/ ln(0.9999) = 101.49.
Thus, we can choose n∗ = 101 and still meet the false alarm probability constraint.
The error probabilities are

PFA = P (N ≤ 100|H0 ) = 1 − (1 − q0 )^100 = 0.0100, PMISS = P (N ≥ 101|H1 ) = (1 − q1 )^100 = 2.66 × 10−5 .
- We see that a one percent false alarm probability yields a dramatic reduction in the probability of a miss.
Although the Neyman-Pearson test minimizes neither the overall probability of a test error nor the expected cost
E(C), it may be preferable to either the MAP test or the minimum cost test. In particular, customers will judge
the quality of the disk drives and the reputation of the factory based on the number of defective drives that are
shipped. Compared to the other tests, the Neyman-Pearson test results in a much lower miss probability and far
fewer defective drives being shipped.
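The Neyman-Pearson threshold and both error probabilities follow directly from the constraint. A short sketch, with variable names of our choosing:

```python
import math

# Example 11: Neyman-Pearson threshold test with P_FA <= alpha = 0.01.
q0, q1, alpha = 1e-4, 1e-1, 0.01

# P_FA = 1 - (1 - q0)**(n* - 1) <= alpha  =>  n* <= 1 + ln(1-alpha)/ln(1-q0)
n_star = math.floor(1 + math.log(1 - alpha) / math.log(1 - q0))
p_fa = 1 - (1 - q0) ** (n_star - 1)
p_miss = (1 - q1) ** (n_star - 1)
print(n_star, p_fa, p_miss)          # n* = 101, P_MISS ~ 2.7e-5
```

Relative to the MAP test (PMISS = 0.0087), tolerating a one percent false alarm rate cuts the miss probability by more than two orders of magnitude.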
-
Similar to the Neyman-Pearson test, the maximum likelihood (ML) test is another method that avoids the
need for prior probabilities.
Under the ML approach, we treat the hypothesis as some sort of “unknown” and choose a hypothesis Hi
for which P (s|Hi ), the conditional probability of the outcome s given the hypothesis Hi is the largest.
The idea behind choosing a hypothesis to maximize the probability of the observation is to avoid making
assumptions about the a priori probabilities P (Hi ).
The resulting decision rule, called the maximum likelihood (ML) rule, can be written mathematically.
Definition 4 (ML decision rule)
For a binary hypothesis test based on the experimental outcome s ∈ S, the maximum likelihood (ML)
decision rule is

s ∈ A0 if P (s|H0 ) ≥ P (s|H1 ); s ∈ A1 otherwise. (15)
-
Comparing Theorem 1 and Definition 4, we see that in the absence of information about the a priori
probabilities P (Hi ), we have adopted a maximum likelihood decision rule that is the same as the MAP
rule under the assumption that hypotheses H0 and H1 occur with equal probability.
In essence, in the absence of a priori information, the ML rule assumes that all hypotheses are equally likely.
By comparing the likelihood ratio to a threshold equal to 1, the ML hypothesis test is neutral about
whether H0 has a higher probability than H1 or vice versa.
- When the decision statistic of the experiment is a random vector X, we can express the ML rule in terms of
conditional PMFs or PDFs, just as we did for the MAP rule.
Theorem 6
Discrete: x ∈ A0 if PX|H0 (x)/PX|H1 (x) ≥ 1; x ∈ A1 otherwise; (16)
Continuous: x ∈ A0 if fX|H0 (x)/fX|H1 (x) ≥ 1; x ∈ A1 otherwise. (17)
Example 12
Continuing the disk drive test of Example 9, design the maximum likelihood test for the factory state
based on the decision statistic N , the number of drives tested up to and including the first failure.
Solution.
The ML hypothesis test corresponds to the MAP test with P (H0 ) = P (H1 ) = 0.5.
In this case, Equation (9) implies n∗ = 66.62 or A0 = {n ≥ 67}.
The conditional error probabilities under the ML rule are

PFA = P (N ≤ 66|H0 ) = 1 − (1 − q0 )^66 = 0.0066, PMISS = P (N ≥ 67|H1 ) = (1 − q1 )^66 = 9.6 × 10−4 .
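Since the ML test is the MAP test with a unit threshold ratio, the numbers in Example 12 can be reproduced by the same arithmetic as before:

```python
import math

# Example 12: the ML test is the MAP test with equal priors (threshold ratio 1).
q0, q1 = 1e-4, 1e-1
n_star = 1 + math.log(q1 / q0) / math.log((1 - q0) / (1 - q1))
n0 = math.ceil(n_star)               # A0 = {n >= n0}
p_fa = 1 - (1 - q0) ** (n0 - 1)
p_miss = (1 - q1) ** (n0 - 1)
print(n_star, n0, p_fa, p_miss)      # n* ~ 66.62, n0 = 67
```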
Problems
Exercise 4
The duration of a voice telephone call is an exponential random variable V with expected value E(V ) = 3
minutes. The duration of a data call is an exponential random variable D with expected value
E(D) = µD > 3 minutes. The null hypothesis of a binary hypothesis test is H0 : a call is a voice call. The
alternative hypothesis is H1 : a call is a data call. The probability of a voice call is P (V ) = 0.8. The
probability of a data call is P (D) = 0.2. A binary hypothesis test measures T minutes, the duration of a
call. The decision is H0 if T ≤ t0 minutes. Otherwise, the decision is H1 .
(a) Write a formula for the false alarm probability as a function of t0 and µD .
(b) Write a formula for the miss probability as a function of t0 and µD .
(c) Calculate the maximum likelihood decision time t0 = tML for µD = 6 minutes and µD = 10 minutes.
(d) Calculate the maximum a posteriori probability decision time t0 = tMAP for µD = 6 minutes and
µD = 10 minutes.
Problems
Solution
Likelihood functions fT |H0 (t) = (1/3)e−t/3 , t ≥ 0 and fT |H1 (t) = (1/µD )e−t/µD , t ≥ 0.
The acceptance regions are A0 = {T | T ≤ t0 } and A1 = {T | T > t0 }.
(a) PFA = P (A1 |H0 ) = ∫_{t0}^{∞} fT |H0 (t) dt, and PFA =?
(b) PMISS = P (A0 |H1 ) = ∫_{0}^{t0} fT |H1 (t) dt, and PMISS =?
(c) t ∈ A0 if t ≤ tML = ln(µD /3)/(1/3 − 1/µD ). When µD = 6, tML =? When µD = 10, tML =?
(d) t ∈ A0 if t ≤ tMAP = ln(4µD /3)/(1/3 − 1/µD ). When µD = 6, tMAP =? When µD = 10, tMAP =?
Problems
Exercise 5
In the radar system of Example 5, the probability that a target is present is P (H1 ) = 0.01. In the case of a
false alarm, the system issues an unnecessary alert at the cost of C10 = 1 unit. The cost of a miss is
C01 = 10⁴ units because the target could cause a lot of damage. When the target is present, the voltage is
X = 4 + N , a Gaussian N (4, 1²) random variable. When there is no target present, the voltage is X = N , a
Gaussian N (0, 1²) random variable. In a binary hypothesis test, the acceptance sets are A0 = {X ≤ x0 } and
A1 = {X > x0 }.
(a) What is the average cost of the MAP policy?
(b) What is the average cost of the minimum cost policy?
Problems
Solution
(a) Given H0 , X is Gaussian N (0, 1²). Given H1 , X is Gaussian N (4, 1²).
The MAP rule simplifies to x ∈ A0 if x ≤ xMAP = 2 − (1/4) ln[P (H1 )/P (H0 )], and xMAP =?
So PFA = P (X > xMAP |H0 ) =? and PMISS = P (X ≤ xMAP |H1 ) =?
Hence E(CMAP ) = C10 PFA P (H0 ) + C01 PMISS P (H1 ) =?
(b) The minimum cost test is x ∈ A0 if x ≤ xMC = 2 − (1/4) ln[C01 P (H1 )/(C10 P (H0 ))], and xMC =?
So PFA = P (X > xMC |H0 ) =? and PMISS = P (X ≤ xMC |H1 ) =?
Hence E(CMC ) = C10 PFA P (H0 ) + C01 PMISS P (H1 ) =?
-
There are many applications in which an experiment can conform to more than two known probability
models, all with the same sample space S. A multiple hypothesis test is a generalization of a binary
hypothesis test.
There are M hypothetical probability models: H0 , H1 , . . . , HM −1 . We perform an experiment and based
on the outcome, we come to the conclusion that a certain Hm is the true probability model.
The design of the experiment consists of dividing S into an event space consisting of mutually exclusive,
collectively exhaustive sets, A0 , A1 , . . . , AM −1 , such that the conclusion is accept Hi if s ∈ Ai .
The accuracy measure of the experiment consists of M 2 conditional probabilities, P (Ai |Hj ),
i, j = 0, 1, 2, . . . , M − 1. The M probabilities, P (Ai |Hi ), i = 0, 1, . . . , M − 1 are probabilities of correct
decisions.
Example 13
A computer modem is capable of transmitting 16 different signals. Each signal represents a sequence of
four bits in the digital bit stream at the input to the modem. The modem receiver examines the received
signal and produces four bits in the bit stream at the modem’s output. The design of the modem considers
the task of the receiver to be a test of 16 hypotheses H0 , H1 , . . . , H15 , where H0 represents 0000, H1
represents 0001,. . . , and H15 represents 1111. The sample space of the experiment is an ensemble of
possible received signals. The test design places each outcome s in a set Ai such that the event s ∈ Ai
leads to the output of the four-bit sequence corresponding to Hi .
-
For a multiple hypothesis test, the MAP hypothesis test and the ML hypothesis test are generalizations of
the tests in Theorem 1 and Definition 4. Minimizing the probability of error corresponds to maximizing the
probability of a correct decision,
PCORRECT = Σ_{i=0}^{M−1} P (Hi )P (Ai |Hi ). (18)
As in binary hypothesis testing, we can apply Bayes’ theorem to derive a decision rule based on the
probability models (likelihood functions) corresponding to the hypotheses and the a priori probabilities of
the hypotheses. Therefore, corresponding to Theorem 2, we have the following generalization of the MAP
binary hypothesis test.
Theorem 7
For an experiment that produces a random variable X, the MAP multiple hypothesis test is
Discrete: xi ∈ Am if P (Hm )PX|Hm (xi ) ≥ P (Hj )PX|Hj (xi ) for all j; (19)
Continuous: x ∈ Am if P (Hm )fX|Hm (x) ≥ P (Hj )fX|Hj (x) for all j. (20)
- If information about the a priori probabilities of the hypotheses is not available, a maximum likelihood
hypothesis test is appropriate:

Discrete: xi ∈ Am if PX|Hm (xi ) ≥ PX|Hj (xi ) for all j; Continuous: x ∈ Am if fX|Hm (x) ≥ fX|Hj (x) for all j.
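The MAP multiple hypothesis rule is a direct comparison of M weighted likelihoods. The sketch below applies Theorem 7 to three hypothetical Gaussian models; the priors, means, and helper names are assumptions for illustration, not from the text.

```python
import math

# MAP multiple hypothesis rule (Theorem 7) for three Gaussian models
# H_j: X ~ N(mu_j, 1). Priors and means are assumed for illustration.
def normal_pdf(x, mu):
    return math.exp(-((x - mu) ** 2) / 2) / math.sqrt(2 * math.pi)

priors = [0.5, 0.3, 0.2]             # P(H_j)
means = [0.0, 2.0, 4.0]              # mu_j under each hypothesis

def map_decision(x):
    # accept the H_m that maximizes P(H_m) * f_{X|H_m}(x)
    scores = [p * normal_pdf(x, m) for p, m in zip(priors, means)]
    return max(range(len(scores)), key=scores.__getitem__)

for x in (-1.0, 2.5, 5.0):
    print(x, map_decision(x))
```

The event space A0, A1, A2 is formed implicitly: each observation x is assigned to the set of the hypothesis with the largest weighted likelihood, so the Am are automatically mutually exclusive and collectively exhaustive. Dropping the `priors` factor turns this into the ML version of the rule.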