
Heuristics leading to a statistical hypothesis testing procedure

Hypothesis testing is about assessing the truth of two conflicting statements regarding
some unknown parameter(s) in the model. One statement is called the null hypothesis (H0)
and the other the alternative hypothesis (H1).
To decide which of these hypotheses/statements is true, one needs a policy/rule that would
be followed. The process of creating the rule is called the test procedure. This is done before
the (numerical) data have been obtained. In other words, a test procedure is a general rule or
recipe (for any data that may be obtained), not just for a particular sampled data. The final
rule is stated in terms of when H0 should be rejected and is called the decision rule.
Thus, the decision rule should involve a quantity/formula, called the test statistic, which
can be calculated from the sampled data. The rule itself would be of the form “reject H0 when
the test statistic falls in a certain region”. This region is called the critical region, and the
boundaries of this region are called the critical values.
Ideally, the test statistic and critical region should be formed in such a way that we never
make mistakes. However, since the sampled data are only a part of the population, one cannot
expect the conclusion based on this partial set of observations to always be correct. Clearly,
there are two situations in which errors occur.

• If we decide to reject H0 (i.e., conclude that H0 is not true) when the (unknown) reality
is that H0 is true. This is called a type I error.

• If we decide to accept H0 (i.e., conclude that H0 is true) when the (unknown) reality is
that H1 is true. This is called a type II error.

Since the (unknown) parameter remains unknown even after the sample data are obtained, one
would never know whether the conclusion based on the data is correct or not. So, to compare
two different procedures/rules/policies, one must work with the probabilities of making the errors
under a given rule/procedure. As argued above, it is not possible to make both
probabilities zero. The next best option is to keep both probabilities very small (close
to zero), i.e., to build the decision rule in such a way that if one follows it many,
many times, then one makes errors only a small percentage of the time. But it turns out
that this, too, is impossible in general: if one pushes the probability of a type I error towards
zero, the probability of a type II error shoots up towards 1, and vice versa. In other words, one
cannot keep both probabilities at a very small level simultaneously.
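A minimal numerical sketch of this trade-off (the Gaussian model, hypotheses, and sample size below are invented purely for illustration and are not from these notes): test H0: μ = 0 against H1: μ = 1 with the rule “reject H0 when the sample mean exceeds a cut-off c”, and watch what happens to the two error probabilities as c moves.

```python
# Illustrative setting (assumed, not from the notes): X_1, ..., X_n
# i.i.d. N(mu, 1); H0: mu = 0 vs H1: mu = 1; reject H0 when the
# sample mean exceeds a cut-off c.
from scipy.stats import norm

n = 10                       # hypothetical sample size
se = 1 / n ** 0.5            # standard error of the sample mean

for c in [0.2, 0.5, 0.8, 1.1, 1.4]:
    type1 = norm.sf(c, loc=0, scale=se)    # P(reject H0 | mu = 0)
    type2 = norm.cdf(c, loc=1, scale=se)   # P(accept H0 | mu = 1)
    print(f"c = {c:.1f}:  P(type I) = {type1:.4f},  P(type II) = {type2:.4f}")
```

Moving c to the right drives P(type I error) towards zero but P(type II error) towards one, which is exactly the tension described above.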
For this reason, the following paradigm, called the Neyman-Pearson paradigm, is followed to
construct a decision rule.

• Fix a small level α > 0. Consider all decision rules such that P (type I error) ≤ α. Then
choose the one that has the minimum probability of type II error.

With this paradigm, it is clear that the type I error gets more priority. In particular,
one does not wish to reject H0 (when it is actually true) more than 100α% of the time,
whereas for the other error, of rejecting H1 when it is true, one merely tries to keep the
probability as low as possible. Consequently, a statistical hypothesis testing procedure is not
symmetric: hypothesis H0 is, in some sense, “guarded/protected”. While forming/naming the
hypotheses, one usually puts the “old”/“status quo” claims under H0 and the “challenging”
claims under H1. We say: “the burden of proof is on H1”. In other words, if there is no
strong enough evidence in the data for H1, we hold on to H0.

Some more terminology regarding a test procedure:
• The (maximum allowed) probability of a type I error (α), for which the test procedure (decision
rule) is built, is called the level of significance.
• The power of the test procedure is 1 − P (type II error). The higher the power, the better
the procedure (because the smaller the error probability, the better the procedure).
So, in essence, the paradigm suggests choosing the decision rule in such a way that the level of
significance is a small value α and, at the same time, the power is as high as possible, as the
sketch below illustrates.
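Continuing the invented Gaussian setting from above, a short sketch of how fixing the level α pins down the critical value, after which the power at a particular alternative can be read off:

```python
# Same invented setting: fix the level alpha first (Neyman-Pearson),
# derive the critical value from the H0-distribution of the sample mean,
# then evaluate the power at the alternative mu = 1.
from scipy.stats import norm

n, alpha = 10, 0.05
se = 1 / n ** 0.5
c = norm.isf(alpha, loc=0, scale=se)   # P(Xbar > c | mu = 0) = alpha
power = norm.sf(c, loc=1, scale=se)    # 1 - P(type II error) at mu = 1
print(f"critical value c = {c:.3f}, power at mu = 1: {power:.3f}")
```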
Since a decision rule needs to satisfy certain restrictions on probabilities, it is completely
dependent on the model assumptions. One needs to build up the procedure separately
for each model. This leads to the following steps/elements in a testing procedure (a worked
sketch follows the list). It is important to realize that no two steps should be combined.
1. Model for the data, including which parameters are (un)known. [Data should be properly
introduced/defined with symbols X, Y, Z, etc.]
2. Two hypotheses H0 and H1. [These should be about the unknown parameters of the model in step 1.]
3. The test statistic. [This should be an expression/formula and not a number, i.e., it should
be a random variable. But one should be able to calculate it once the data are available.]
4. The probability distribution of the test statistic (as a random variable) under H0, i.e., if H0 is true.
[This will be needed to ensure that P (type I error) ≤ α, the given level of significance.]
5. The decision rule about when to reject H0. [This should be stated in terms of the test
statistic from step 3 and should ensure that the given level of significance α is maintained.]
6. The observed value of the test statistic in the sample. [This is the numerical value obtained
by plugging the sample data into the formula from step 3.]
7. Conclusion using steps 5 and 6. [At the end, one should always state the conclusion using
the words/formulation of the given problem/context, not just “H0 is rejected/accepted”.]

Another important quantity in hypothesis testing is the so-called P-value. This is what
most computer software programmes provide. It is defined as the probability, under H0
(i.e., if the hypothesis in H0 is indeed true), that the test statistic takes a value which is as
extreme as the one observed in the sample, or even more extreme (in favour of H1).
The interpretation is as follows: suppose the P-value is (very) small, say (close to) zero.
This means that, under the assumption that H0 is true, one “should not” observe such a value
of the test statistic. But we have observed it in the sample. Then something must be
wrong somewhere. If our calculations are correct, then it must be the assumption under which
the probability was calculated that is wrong. In other words, if the P-value is small, then one
should suspect that H0 is not true.
If one wants to use the P-value in a statistical hypothesis test, the last three steps (5–7) of
the above procedure need to be changed (again, a sketch follows the list).
5∗. Shape of the critical region. [One does not need to determine the decision rule completely,
just the shape: right-sided, left-sided, two-sided, etc.]
6∗. The P-value. [This uses the observed value of the test statistic from step 6 earlier, the
shape of the critical region (step 5∗), and the probability distribution from step 4.]
7∗. Conclusion: reject H0 if the P-value is less than or equal to the given level of significance
α; otherwise accept H0. [Once again, the final conclusion should be stated using the
words/formulation of the given problem/context.]
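The same invented z-test from above, redone via steps 5∗–7∗:

```python
# The same invented z-test, now via the P-value route.
from scipy.stats import norm

z_obs, alpha = 1.432, 0.05   # observed statistic from step 6 above

# Step 5*: the critical region is right-sided (since H1: mu > 5).
# Step 6*: P-value = P(Z >= z_obs under H0), with Z ~ N(0, 1) from step 4.
p_value = norm.sf(z_obs)

# Step 7*: reject H0 iff the P-value is at most alpha.
print(f"P-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

The two routes always agree: the P-value is at most α exactly when the observed statistic falls in the critical region of step 5.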
