A type I error (false-positive) occurs if an investigator rejects a null hypothesis that is
actually true in the population; a type II error (false-negative) occurs if the investigator fails
to reject a null hypothesis that is actually false in the population.
Type I and type II errors are part of the process of hypothesis testing.
Neither kind of error can be designed away entirely, so we must remain aware that both can
occur.
Type I and Type II errors
Published on January 18, 2021 by Pritha Bhandari. Revised on May 7, 2021.
In statistics, a Type I error is a false positive conclusion, while a Type II error is a false
negative conclusion.
Making a statistical decision always involves uncertainties, so the risks of making these
errors are unavoidable in hypothesis testing.
After stating your hypotheses, you decide whether the null hypothesis can be rejected based on your data
and the results of a statistical test. Since these decisions are based on probabilities,
there is always a risk of making the wrong conclusion.
If your results show statistical significance, that means they are very unlikely
to occur if the null hypothesis is true. In this case, you would reject your null
hypothesis. But sometimes, this may actually be a Type I error.
If your findings do not show statistical significance, they have a high chance of
occurring if the null hypothesis is true. Therefore, you fail to reject your null
hypothesis. But sometimes, this may be a Type II error.
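As a concrete sketch of this decision rule, here is a small Python example using a z-test; the sample mean, hypothesized mean, and known sigma below are made-up numbers for illustration, not values from the text:

```python
import math

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a z-test of H0: mu == mu0 (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Phi(z) from the error function; the p-value is the two-tailed area
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

alpha = 0.05
p = z_test_p_value(sample_mean=103.2, mu0=100, sigma=15, n=100)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(round(p, 4), decision)
```

Note that "reject H0" here only means the data are unlikely under the null hypothesis; as the text says, the rejection could still be a Type I error.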
Example: Type I and Type II errors
A Type I error happens when you get false positive
results: you conclude that the drug intervention improved symptoms when it actually didn’t.
These improvements could have arisen from other random factors or measurement errors.
A Type II error happens when you get false negative results: you conclude that the drug
intervention didn’t improve symptoms when it actually did. Your study may have missed key
indicators of improvements or attributed any improvements to other factors instead.
In a plot of the null distribution, the shaded area at the tail end represents alpha; this region
is also called the critical region in statistics.
If your results fall in the critical region of this curve, they are considered statistically
significant and the null hypothesis is rejected. However, this is a false positive
conclusion, because the null hypothesis is actually true in this case!
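One way to see that alpha really is the false-positive rate is to simulate many experiments in which the null hypothesis is true and count how often it is wrongly rejected. The sample size and simulation settings below are arbitrary choices for illustration:

```python
import math, random

random.seed(0)

def one_experiment(n=30, crit=1.96):
    """Draw one sample where H0 is TRUE (mean 0, sigma 1) and test it."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)   # sigma is known to be 1 here
    return abs(z) > crit               # True = a false-positive rejection

trials = 5000
false_positives = sum(one_experiment() for _ in range(trials))
rate = false_positives / trials
print(rate)   # close to alpha = 0.05
```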
Type II error
A Type II error means not rejecting the null hypothesis when it’s actually false. This is not
quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell
you whether to reject the null hypothesis.
Instead, a Type II error means failing to conclude there was an effect when there actually
was. In reality, your study may not have had enough statistical power to detect an effect of a
certain size.
Power is the extent to which a test can correctly detect a real effect when there is one. A
power level of 80% or higher is usually considered acceptable.
The risk of a Type II error is inversely related to the statistical power of a study. The higher
the statistical power, the lower the probability of making a Type II error.
Example: Statistical power and Type II error
When preparing your clinical study, you
complete a power analysis and determine that with your sample size, you have an 80%
chance of detecting an effect size of 20% or greater. An effect size of 20% means that the
drug intervention reduces symptoms by 20% more than the control treatment.
However, a Type II error may occur if the true effect is smaller than this size, because a
smaller effect is unlikely to be detected in your study due to inadequate statistical power.
Statistical power is determined by the size of the true effect, the sample size, the
significance level, and the variability of the data.
To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the
significance level.
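The same simulation idea can estimate power and beta: generate many studies in which the effect is real and count how often the test detects it. The true effect size, sample size, and number of trials below are illustrative assumptions:

```python
import math, random

random.seed(2)

def detects_effect(true_mean=0.5, n=30, crit=1.96):
    """One simulated study where the effect is REAL (H1 true), sigma = 1."""
    xs = [random.gauss(true_mean, 1) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)
    return abs(z) > crit              # True = correct rejection of H0

trials = 4000
power = sum(detects_effect() for _ in range(trials)) / trials
beta = 1 - power                      # the Type II error rate
print(round(power, 3), round(beta, 3))
```

Rerunning with a larger `n` or a larger `true_mean` raises the estimated power and shrinks beta, matching the relationships described above.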
In a plot of the alternative distribution, the Type II error rate beta (β) is the shaded area to
the left of the critical value; the remaining area under the curve represents statistical power,
which is 1 – β.
Increasing the statistical power of your test directly decreases the risk of making a Type II
error.
This means there’s an important tradeoff between Type I and Type II errors:
Setting a lower significance level decreases a Type I error risk, but increases a Type II
error risk.
For a fixed sample size, increasing the power of a test (by raising the significance
level) decreases a Type II error risk, but increases a Type I error risk.
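This tradeoff can be made concrete with a small calculation for a one-sided z-test with known variance; the effect size, sample size, and critical values below are illustrative assumptions, not values from the text:

```python
import math

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def beta_for(crit, effect, n):
    """Type II error rate of a one-sided z-test with sigma = 1 (assumed)."""
    shift = effect * math.sqrt(n)   # mean of the test statistic under H1
    return Phi(crit - shift)        # P(fail to reject | H1 is true)

# A stricter alpha means a larger critical value, which means a larger beta.
beta_05 = beta_for(1.645, effect=0.5, n=25)   # alpha = 0.05, one-sided
beta_01 = beta_for(2.326, effect=0.5, n=25)   # alpha = 0.01, one-sided
print(round(beta_05, 3), round(beta_01, 3))
```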
The null hypothesis distribution shows all possible results you’d obtain if the null
hypothesis is true. The correct conclusion for any point on this distribution means not
rejecting the null hypothesis.
The alternative hypothesis distribution shows all possible results you’d obtain if the
alternative hypothesis is true. The correct conclusion for any point on this distribution
means rejecting the null hypothesis.
Type I and Type II errors occur where these two distributions overlap. The blue shaded area
represents alpha, the Type I error rate, and the green shaded area represents beta, the Type II
error rate.
By setting the Type I error rate, you indirectly influence the size of the Type II error rate as
well.
A Type I error means wrongly rejecting the null hypothesis even though it is actually true.
This may lead to new policies, practices, or treatments that are inadequate or a waste of
resources.
Example: Consequences of a Type I error
Based on the incorrect conclusion that the new drug
intervention is effective, over a million patients are prescribed the medication, despite risks of
severe side effects and inadequate research on the outcomes. The consequences of this Type I
error also mean that other treatment options are rejected in favor of this intervention.
In contrast, a Type II error means failing to reject a null hypothesis that is actually false. It
may only result in missed opportunities to innovate, but these can also have important
practical consequences.
Example: Consequences of a Type II error
If a Type II error is made, the drug intervention is
considered ineffective when it can actually improve symptoms of the disease. This means that
a medication with important clinical significance doesn’t reach a large number of patients
who could tangibly benefit from it.
Parametric tests are those that make assumptions about the parameters of the population
distribution from which the sample is drawn. This is often the assumption that the population
data are normally distributed. Non-parametric tests are “distribution-free” and, as such, can
be used for variables that are not normally distributed.
Parametric vs. Nonparametric
There are two general ways of assessing the difference between the groups and how unlikely
it would be under chance alone.
Parametric tests are used only where a normal distribution is assumed. The most widely used
tests are the t-test (paired or unpaired), ANOVA (one-way non-repeated, repeated; two-way,
three-way), linear regression and Pearson correlation.
Non-parametric tests are used when continuous data are not normally distributed or when
dealing with discrete variables. Most widely used are chi-squared, Fisher's exact tests,
Wilcoxon's matched pairs, Mann–Whitney U-tests, Kruskal–Wallis tests and Spearman rank
correlation.
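To illustrate the contrast, the sketch below computes a parametric statistic (Welch's t) and a nonparametric one (the Mann–Whitney U) by hand on two small made-up samples; it shows only the statistics, not the full tests with p-values:

```python
import statistics

a = [1.2, 2.4, 3.1, 4.8, 5.5]   # hypothetical measurements, group A
b = [2.0, 3.9, 4.4, 6.1, 7.3]   # hypothetical measurements, group B

# Parametric: Welch's t statistic compares means, assuming rough normality.
t = (statistics.mean(a) - statistics.mean(b)) / (
    (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
)

# Nonparametric: U counts, over all pairs, how often an A value beats a B value
# (ties count half); it uses only order, so it needs no normality assumption.
u = sum(1 for x in a for y in b if x > y) + 0.5 * sum(
    1 for x in a for y in b if x == y
)
print(round(t, 2), u)
```

Because U depends only on ranks, it is unchanged by any monotonic transformation of the data, which is exactly why it suits non-normal variables.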
If the average earnings from the sample data are sufficiently far from zero, then the
gambler will reject the null hypothesis and conclude the alternative hypothesis—
namely, that the expected earnings per play are different from zero. If the average
earnings from the sample data are near zero, then the gambler will not reject the
null hypothesis, concluding instead that the difference between the average from
the data and 0 is explainable by chance alone.
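The gambler's decision can be sketched numerically; the win/loss record below is entirely hypothetical:

```python
import math
import statistics

# Hypothetical record of 100 plays: +1 unit for a win, -1 for a loss.
plays = [1] * 58 + [-1] * 42

mean = statistics.mean(plays)        # average earnings per play
se = statistics.pstdev(plays) / math.sqrt(len(plays))
z = mean / se                        # distance of the average from 0
reject = abs(z) > 1.96               # two-sided test at alpha = 0.05
print(round(z, 2), reject)
```

Here the average of 0.16 is positive but not far enough from zero, so the gambler fails to reject the null hypothesis that the expected earnings per play are zero.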
KEY TAKEAWAYS
Every time you conduct a hypothesis test, there are four possible outcomes of your decision to
reject or not reject the null hypothesis: (1) You don't reject the null hypothesis when it is true, (2)
you reject the null hypothesis when it is true, (3) you don't reject the null hypothesis when it is
false, and (4) you reject the null hypothesis when it is false.
Consider the following analogy: You are an airport security screener. For every passenger who
passes through your security checkpoint, you must decide whether to select the passenger for
further screening based on your assessment of whether he or she is carrying a weapon.
Suppose your null hypothesis is that the passenger has a weapon. As in hypothesis testing,
there are four possible outcomes of your decision: (1) You select the passenger for further
inspection when the passenger has a weapon, (2) you allow the passenger to board his flight
when the passenger has a weapon, (3) you select the passenger for further inspection when the
passenger has no weapon, and (4) you allow the passenger to board his flight when the
passenger has no weapon.
Which of the following outcomes corresponds to a Type I error?
A. You allow the passenger to board his flight when the passenger has a weapon.
B. You select the passenger for further inspection when the passenger has no weapon.
C. You allow the passenger to board his flight when the passenger has no weapon.
D. You select the passenger for further inspection when the passenger has a weapon.
Which of the following outcomes corresponds to a Type II error?
A. You allow the passenger to board his flight when the passenger has no weapon.
B. You select the passenger for further inspection when the passenger has no weapon.
C. You select the passenger for further inspection when the passenger has a weapon.
D. You allow the passenger to board his flight when the passenger has a weapon.
As a security screener, the worst error you can make is to allow the passenger to board his flight
when the passenger has a weapon. The probability that you make this error, in our hypothesis
testing analogy, is described by _____.
A one-tailed test is also known as a directional hypothesis or directional test.
Before the one-tailed test can be performed, null and alternative hypotheses have to
be established. A null hypothesis is a claim that the researcher hopes to reject. An
alternative hypothesis is the claim that is supported by rejecting the null hypothesis.
KEY TAKEAWAYS
A one-tailed test is a statistical hypothesis test set up to show that the
sample mean would be higher or lower than the population mean, but
not both.
When using a one-tailed test, the analyst is testing for the possibility of
the relationship in one direction of interest, and completely disregarding
the possibility of a relationship in another direction.
Before running a one-tailed test, the analyst must set up a null
hypothesis and an alternative hypothesis and establish a probability
value (p-value).
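A small calculation shows why the direction matters: for the same observed statistic, the one-tailed p-value is half the two-tailed one. The observed z value below is an illustrative assumption:

```python
import math

def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.75                          # hypothetical observed test statistic

p_one_tailed = 1 - Phi(z)         # directional: only the upper tail counts
p_two_tailed = 2 * (1 - Phi(abs(z)))
print(round(p_one_tailed, 3), round(p_two_tailed, 3))
```

In this made-up case the one-tailed test rejects at the 0.05 level while the two-tailed test does not, which is why the direction of interest must be fixed before the data are seen.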