
A type I error (false positive) occurs if an investigator rejects a null hypothesis that is actually true in the population; a type II error (false negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.

Type I and type II errors are part of the process of hypothesis testing. By design, these two kinds of errors cannot be entirely avoided, so we must be aware that they exist.
Type I and Type II errors

In statistics, a Type I error is a false positive conclusion, while a Type II error is a false
negative conclusion.

Making a statistical decision always involves uncertainties, so the risks of making these
errors are unavoidable in hypothesis testing.

Error in statistical decision-making

• Using hypothesis testing, you can make decisions about whether your data support or refute your research predictions.
• Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population: this is the null hypothesis. It’s always paired with an alternative hypothesis, which is your research prediction of an actual difference between groups or a true relationship between variables.

Then, you decide whether the null hypothesis can be rejected based on your data
and the results of a statistical test. Since these decisions are based on probabilities,
there is always a risk of making the wrong conclusion.

• If your results show statistical significance, that means they are very unlikely to occur if the null hypothesis is true. In this case, you would reject your null hypothesis. But sometimes, this may actually be a Type I error.
• If your findings do not show statistical significance, they have a high chance of occurring if the null hypothesis is true. Therefore, you fail to reject your null hypothesis. But sometimes, this may be a Type II error.

Example: Type I and Type II errors
A Type I error happens when you get false positive results: you conclude that the drug intervention improved symptoms when it actually didn’t. These improvements could have arisen from other random factors or measurement errors.
A Type II error happens when you get false negative results: you conclude that the drug intervention didn’t improve symptoms when it actually did. Your study may have missed key indicators of improvement or attributed any improvements to other factors instead.

Type I error rate


The null hypothesis distribution curve below shows the probabilities of obtaining all
possible results if the study were repeated with new samples and the null hypothesis
were true in the population.

At the tail end, the shaded area represents alpha (α), the Type I error rate. This region is also called the critical region in statistics.

If your results fall in the critical region of this curve, they are considered statistically
significant and the null hypothesis is rejected. However, this is a false positive
conclusion, because the null hypothesis is actually true in this case!
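To make the critical region concrete, here is a minimal sketch (assuming a standard normal test statistic and a two-sided test at alpha = 0.05) that computes the cutoffs beyond which a result would be called statistically significant:

```python
from scipy import stats

alpha = 0.05  # significance level: the Type I error risk we are willing to accept

# Two-sided critical values for a standard normal test statistic:
# results more extreme than these cutoffs fall in the shaded critical region.
lower, upper = stats.norm.ppf(alpha / 2), stats.norm.ppf(1 - alpha / 2)
print(f"critical region: z < {lower:.2f} or z > {upper:.2f}")  # about +/-1.96
```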

Type II error
A Type II error means not rejecting the null hypothesis when it’s actually false. This is not
quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell
you whether to reject the null hypothesis.

Instead, a Type II error means failing to conclude there was an effect when there actually
was. In reality, your study may not have had enough statistical power to detect an effect of a
certain size.

Power is the extent to which a test can correctly detect a real effect when there is one. A
power level of 80% or higher is usually considered acceptable.

The risk of a Type II error is inversely related to the statistical power of a study. The higher
the statistical power, the lower the probability of making a Type II error.

Example: Statistical power and Type II error
When preparing your clinical study, you complete a power analysis and determine that with your sample size, you have an 80% chance of detecting an effect size of 20% or greater. An effect size of 20% means that the drug intervention reduces symptoms by 20% more than the control treatment.
However, a Type II error may occur if the true effect is smaller than this size. A smaller effect size is unlikely to be detected in your study due to inadequate statistical power.
Statistical power is determined by:

• Size of the effect: Larger effects are more easily detected.
• Measurement error: Systematic and random errors in recorded data reduce power.
• Sample size: Larger samples reduce sampling error and increase power.
• Significance level: Increasing the significance level increases power.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the
significance level.
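A power analysis of this kind can be sketched with statsmodels; the standardized effect size (Cohen's d = 0.5) and the other numbers below are illustrative assumptions, not values taken from the example above:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical inputs: Cohen's d of 0.5, a 5% significance level, 80% target power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"required sample size per group: {n_per_group:.0f}")  # roughly 64

# Conversely, the power achieved with a fixed sample size of 30 per group.
achieved = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05,
                                alternative='two-sided')
print(f"power with n = 30 per group: {achieved:.2f}")
```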

Type II error rate


The alternative hypothesis distribution curve below shows the probabilities of obtaining all
possible results if the study were repeated with new samples and the alternative hypothesis
were true in the population.

The Type II error rate is beta (β), represented by the shaded area on the left side. The
remaining area under the curve represents statistical power, which is 1 – β.
Increasing the statistical power of your test directly decreases the risk of making a Type II
error.

Trade-off between Type I and Type II errors


The Type I and Type II error rates influence each other. That’s because the significance level
(the Type I error rate) affects statistical power, which is inversely related to the Type II error
rate.

This means there’s an important tradeoff between Type I and Type II errors:

• Setting a lower significance level decreases a Type I error risk, but increases a Type II error risk.
• Increasing the power of a test decreases a Type II error risk, but increases a Type I error risk.

This trade-off is visualized in the graph below. It shows two curves:

• The null hypothesis distribution shows all possible results you’d obtain if the null hypothesis is true. The correct conclusion for any point on this distribution means not rejecting the null hypothesis.
• The alternative hypothesis distribution shows all possible results you’d obtain if the alternative hypothesis is true. The correct conclusion for any point on this distribution means rejecting the null hypothesis.

Type I and Type II errors occur where these two distributions overlap. The blue shaded area
represents alpha, the Type I error rate, and the green shaded area represents beta, the Type II
error rate.

By setting the Type I error rate, you indirectly influence the size of the Type II error rate as
well.
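A small numerical sketch of this trade-off, assuming a one-sided z-test with a known standard error and a hypothetical true effect of two standard errors, shows beta rising as alpha is lowered:

```python
from scipy import stats

effect = 2.0  # hypothetical true mean difference under the alternative
se = 1.0      # hypothetical standard error of the estimate

for alpha in (0.10, 0.05, 0.01):
    # Critical value under the null distribution (one-sided test).
    crit = stats.norm.ppf(1 - alpha, loc=0, scale=se)
    # Beta: probability the estimate falls below the cutoff when the alternative is true.
    beta = stats.norm.cdf(crit, loc=effect, scale=se)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.2f}, power = {1 - beta:.2f}")
```

Running this shows that cutting alpha from 0.10 to 0.01 roughly doubles beta in this hypothetical setting, which is the trade-off described above.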

Is a Type I or Type II error worse?


For statisticians, a Type I error is usually worse. In practical terms, however, either type of
error could be worse depending on your research context.

A Type I error means mistakenly going against the main statistical assumption of a null
hypothesis. This may lead to new policies, practices or treatments that are inadequate or a
waste of resources.

Example: Consequences of a Type I error
Based on the incorrect conclusion that the new drug intervention is effective, over a million patients are prescribed the medication, despite risks of severe side effects and inadequate research on the outcomes. The consequences of this Type I error also mean that other treatment options are rejected in favor of this intervention.
In contrast, a Type II error means failing to reject a null hypothesis. It may only result in
missed opportunities to innovate, but these can also have important practical consequences.

Example: Consequences of a Type II error
If a Type II error is made, the drug intervention is considered ineffective when it can actually improve symptoms of the disease. This means that a medication with important clinical significance doesn’t reach a large number of patients who could tangibly benefit from it.

In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A two-tailed test is appropriate if the estimated value may be greater or less than a certain range of values, for example, whether a test taker may score above or below a specific range of scores. This method is used for null hypothesis testing, and if the estimated value falls in the critical areas, the alternative hypothesis is accepted over the null hypothesis. A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. An example is testing whether a machine produces more than one percent defective products. In this situation, if the estimated value falls in the one-sided critical area corresponding to the direction of interest (greater than or less than), the alternative hypothesis is accepted over the null hypothesis. Alternative names are one-sided and two-sided tests; the terminology "tail" is used because the extreme portions of the distribution, where observations lead to rejection of the null hypothesis, are small and often "tail off" toward zero, as in the normal distribution or "bell curve".

Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn. This is often the assumption that the population data are normally distributed. Non-parametric tests are “distribution-free” and, as such, can be used for non-normal variables.

Parametric vs. Nonparametric
There are two general ways of assessing the difference between the groups, that is, of judging how extreme the observed result is under the null distribution.
Parametric tests are used only where a normal distribution is assumed. The most widely used tests are the t-test (paired or unpaired), ANOVA (one-way non-repeated, repeated; two-way, three-way), linear regression and the Pearson correlation.
Non-parametric tests are used when continuous data are not normally distributed or when dealing with discrete variables. Most widely used are the chi-squared test, Fisher's exact test, Wilcoxon's matched-pairs test, the Mann–Whitney U-test, the Kruskal–Wallis test and the Spearman rank correlation.

Parametric and nonparametric are two broad classifications of statistical procedures.
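As a brief illustration (with made-up, skewed data), SciPy provides both kinds of procedure; the unpaired t-test is parametric, while the Mann–Whitney U-test is its non-parametric counterpart:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical skewed (non-normal) measurements for two independent groups.
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=1.5, size=30)

# Parametric: unpaired t-test, which assumes approximately normal data.
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Non-parametric: Mann-Whitney U-test, which makes no normality assumption.
u_stat, p_u = stats.mannwhitneyu(group_a, group_b)

print(f"t-test       p = {p_t:.3f}")
print(f"Mann-Whitney p = {p_u:.3f}")
```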

What Is a Null Hypothesis?


A null hypothesis is a type of hypothesis used in statistics that proposes that there is
no difference between certain characteristics of a population (or data-generating
process).

For example, a gambler may be interested in whether a game of chance is fair. If it is fair, then the expected earnings per play come to 0 for both players. If the game is not fair, then the expected earnings are positive for one player and negative for the other. To test whether the game is fair, the gambler collects earnings data from many repetitions of the game, calculates the average earnings from these data, then tests the null hypothesis that the expected earnings are not different from zero.

If the average earnings from the sample data are sufficiently far from zero, then the
gambler will reject the null hypothesis and conclude the alternative hypothesis—
namely, that the expected earnings per play are different from zero. If the average
earnings from the sample data are near zero, then the gambler will not reject the
null hypothesis, concluding instead that the difference between the average from
the data and 0 is explainable by chance alone.
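The gambler's decision rule can be sketched as a one-sample t-test of the recorded earnings against zero; the earnings below are simulated stand-ins, not real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated earnings per play; a true mean of 0 would correspond to a fair game.
earnings = rng.normal(loc=0.2, scale=1.0, size=100)

# Null hypothesis: expected earnings per play equal zero.
t_stat, p_value = stats.ttest_1samp(earnings, popmean=0.0)

if p_value < 0.05:
    print(f"p = {p_value:.3f}: reject the null; the game does not look fair")
else:
    print(f"p = {p_value:.3f}: fail to reject the null; the difference is explainable by chance")
```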

KEY TAKEAWAYS

• A null hypothesis is a type of conjecture used in statistics that proposes that there is no difference between certain characteristics of a population or data-generating process.
• The alternative hypothesis proposes that there is a difference.
• Hypothesis testing provides a method to reject a null hypothesis within a certain confidence level. (Null hypotheses cannot be proven, though.)

Every time you conduct a hypothesis test, there are four possible outcomes of your decision to
reject or not reject the null hypothesis: (1) You don't reject the null hypothesis when it is true, (2)
you reject the null hypothesis when it is true, (3) you don't reject the null hypothesis when it is
false, and (4) you reject the null hypothesis when it is false.

Consider the following analogy: You are an airport security screener. For every passenger who
passes through your security checkpoint, you must decide whether to select the passenger for
further screening based on your assessment of whether he or she is carrying a weapon.
Suppose your null hypothesis is that the passenger has a weapon. As in hypothesis testing,
there are four possible outcomes of your decision: (1) You select the passenger for further
inspection when the passenger has a weapon, (2) you allow the passenger to board his flight
when the passenger has a weapon, (3) you select the passenger for further inspection when the
passenger has no weapon, and (4) you allow the passenger to board his flight when the
passenger has no weapon.
Which of the following outcomes corresponds to a Type I error?
A. You allow the passenger to board his flight when the passenger has a weapon.
B. You select the passenger for further inspection when the passenger has no weapon.
C. You allow the passenger to board his flight when the passenger has no weapon.
D. You select the passenger for further inspection when the passenger has a weapon.
Which of the following outcomes corresponds to a Type II error?
A. You allow the passenger to board his flight when the passenger has no weapon.
B. You select the passenger for further inspection when the passenger has no weapon.
C. You select the passenger for further inspection when the passenger has a weapon.
D. You allow the passenger to board his flight when the passenger has a weapon.
As a security screener, the worst error you can make is to allow the passenger to board his flight
when the passenger has a weapon. The probability that you make this error, in our hypothesis
testing analogy, is described by _____.

Type I and Type II Error


Type I and Type II errors arise when checking the credibility of a claim. The Type I error is associated with the rejection region and the Type II error with the acceptance region. Graphically, it can be seen that reducing the Type I error tends to increase the Type II error.

BASIC PRINCIPLES OF STATISTICAL HYPOTHESIS TESTING

A statistical hypothesis testing procedure is mathematically a formal way to use data to examine the plausibility of a specific statement regarding the association of two or more variables. The statement about the association of the variables is called the ‘null hypothesis’ in statistical jargon. Conceptually, the null hypothesis could make many different types of statements about the association of the variables; however, in practice, the null hypothesis usually implies that there is no association between the variables. For example, when searching for differentially expressed genes, the null hypothesis usually states that all experimental groups have an equal mean (or median) expression, thereby implying that expression is not associated with the experimental groups. The null hypothesis is used to derive a reference distribution of a test statistic (such as a t-statistic); this distribution is called the ‘null distribution,’ and it describes the variability of that statistic due to chance. The procedure compares the test statistic to its null distribution and computes a P-value to summarize the comparison. A small P-value indicates that the test statistic lies in the extremities of the null distribution; this finding suggests that the null hypothesis does not accurately describe the association of the considered variables. In statistical jargon, the procedure ‘rejects the null hypothesis’ when the P-value is less than some threshold such as 0.05. Commonly, a result is described as ‘statistically significant’ or as ‘significant’ when the procedure rejects the null hypothesis. In this article, the term ‘significant result’ will be used. Conversely, the procedure fails to reject the null hypothesis when the P-value is greater than the preselected threshold. Commonly, a result is described as ‘insignificant’ or ‘not statistically significant’ when the procedure fails to reject the null hypothesis.

The various statistical hypothesis testing procedures, such as the t-test, analysis of variance (ANOVA), etc., use different assumptions to derive the null distribution of a test statistic. The reliability of a statistical procedure depends in part on the validity of the assumptions it uses to derive the null distribution of the test statistic. One class of procedures, commonly referred to as parametric tests, assumes that the chance variation of the data can be accurately modelled by a particular probability distribution, such as the normal distribution. Another class of procedures, commonly referred to as exact tests, uses the observed data to empirically derive a null distribution. For simple comparisons of two or more groups, exact tests assume that the data values are identically distributed within each group. Under this assumption, the group names can be thought of as arbitrary labels assigned to data values completely at random. Exact tests compute the test statistic for every possible assignment of group labels to data values. The resulting set of test statistics is an empirical null distribution. The P-value is computed by comparing the test statistic computed from the label assignments in the original data set to the empirical null distribution. In practice, there are cases in which it is not feasible to enumerate all the possible label assignments. In these cases, permutation methods can be used to approximate the empirical null distribution. Permutation methods compute the test statistic for each of many randomly selected assignments of group labels to the data values to obtain an approximate empirical null distribution that is used to compute the P-value. Parametric tests typically offer greater statistical power (i.e. the probability that the procedure correctly rejects a false null hypothesis) than the other methods when the assumed model accurately describes the distribution of the data. However, permutation methods are based on less restrictive assumptions and therefore provide reliable inferences in a broader array of settings than do analogous parametric methods.
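As a rough sketch of the permutation idea described above (using made-up expression values for two small groups, not data from the article), the empirical null distribution and P-value for a difference in group means could be approximated like this:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical expression values for two experimental groups.
group_a = np.array([5.1, 4.8, 6.0, 5.5, 5.9])
group_b = np.array([6.2, 6.8, 5.9, 7.1, 6.5])

observed = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

# Recompute the statistic under many random assignments of group labels
# to approximate the empirical null distribution.
n_perm = 10_000
null_stats = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    null_stats[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# Two-sided P-value: fraction of permuted statistics at least as extreme as observed.
p_value = np.mean(np.abs(null_stats) >= abs(observed))
print(f"permutation P-value: {p_value:.4f}")
```

With only five values per group, exact enumeration of all label assignments would also be feasible; random permutations are shown here because they scale to larger groups, as the passage above notes.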
Each time a statistical test is performed, one of four outcomes occurs, depending on whether the null hypothesis is true and whether the statistical procedure rejects the null hypothesis (Table 1): the procedure rejects a true null hypothesis (i.e. a false positive); the procedure fails to reject a true null hypothesis (i.e. a true negative); the procedure rejects a false null hypothesis (i.e. a true positive); or the procedure fails to reject a false null hypothesis (i.e. a false negative) [3]. Therefore, each time a statistical test is performed, there is some probability that the procedure will suggest an incorrect inference.

Table 1: Four possible hypothesis testing outcomes

Statistical inference              | Null hypothesis is true                          | Null hypothesis is false
Reject the null hypothesis         | False positive, false discovery or type I error | True positive or correct rejection
Fail to reject the null hypothesis | True negative                                    | False negative or type II error

When only one hypothesis is to be tested, the probability of each type of erroneous inference can be limited to tolerable levels by carefully planning the experiment and the statistical analysis. In this simple setting, the probability of a false positive can be limited by preselecting the P-value threshold used to determine whether to reject the null hypothesis. The probability of a false negative can be limited by performing an experiment with adequate replication. Statistical power calculations can determine how much replication is required to achieve a desired level of control of the probability of a false negative result. When multiple tests are performed, as in the analysis of microarray data, it is even more critical to carefully plan the experiment and statistical analysis to reduce the occurrence of erroneous inferences.

THE PROBLEM OF MULTIPLE TESTING

The analysis of microarray data usually requires that many statistical hypothesis tests be performed. Typically, one or more tests are applied for each feature queried in the experiment. For example, to identify differentially expressed genes, one may apply a statistical procedure to the data of each feature to test whether the feature has the same mean expression across all experimental groups. Each statistical test has a certain probability of suggesting an erroneous inference. For instance, it is expected that 5% of all features that are not associated with the trait of interest will be declared statistically significant if all P-values < 0.05 are considered statistically significant. Numerous false positives could occur simply because there are many features not associated with the trait of interest. Also, the choice of the threshold will affect the number of false negatives in an experiment. Reducing the threshold of significance increases the stringency of the statistical tests and may substantially increase the number of false negatives. Consequently, choosing the P-value threshold used to determine statistical significance is a delicate problem that requires very careful attention. Additionally, the results must be appropriately interpreted after the P-value threshold is chosen. Several methods account for multiple testing when determining which results should be considered statistically significant. These methods are called ‘multiple testing procedures’ or ‘multiple comparison procedures’. However, it can be difficult for an investigator to choose a procedure that is appropriate for a specific application. Understanding some key differences between the various methods can help in selecting the procedure that should be used in a particular setting.

ERROR RATES FOR MULTIPLE TESTING

Every multiple testing procedure uses some error rate to measure the occurrence of incorrect inferences. Multiple testing procedures use a variety of error rates to measure the occurrence of erroneous inferences. Most error rates focus on the occurrence of false positives, but a few also consider the occurrence of false negatives. Some error rates that have been used in the multiple testing setting are described next.

Classical multiple testing procedures use the family-wise error rate (FWER). The FWER is defined as the probability that the analysis yields any false positive findings. The FWER was quickly recognized as being too conservative for the analysis of microarray data, because in many applications, the only way to limit the probability that any of thousands of statistical tests yields a false positive inference is to not allow any result to be deemed significant. A similar, but less stringent, error rate is the generalized family-wise error rate (gFWER or k-FWER). The k-FWER is the probability that k or more of the significant findings are actually false positives. Recently, some procedures have been proposed that use the gFWER to measure the occurrence of false positives [4].

The false discovery rate [5] (FDR) is now recognized as a useful measure of the occurrence of false positives in microarray studies [2]. The FDR can be interpreted as the expected proportion of significant findings that are indeed false positives. The positive false discovery rate [6] (pFDR) and conditional false discovery rate [7] (cFDR) have similar interpretations and have also been proposed as useful error rates for addressing multiple testing in the analysis of microarray data. The FDR, pFDR and cFDR are reasonable error rates for microarray studies because they can naturally be translated into terms of the costs of attempting to validate false positive results.
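The article does not prescribe a specific procedure at this point, but as an illustration of how FWER and FDR control differ in practice, the sketch below applies the Holm procedure (an FWER-controlling method) and the Benjamini-Hochberg procedure (an FDR-controlling method) from statsmodels to simulated P-values; all counts and distributions are made up:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# Simulated P-values: 950 features with no real effect (uniform P-values)
# plus 50 truly differentially expressed features (small P-values).
p_null = rng.uniform(size=950)
p_alt = rng.beta(0.5, 10.0, size=50)
p_values = np.concatenate([p_null, p_alt])

# Holm procedure controls the family-wise error rate (FWER).
reject_fwer, _, _, _ = multipletests(p_values, alpha=0.05, method='holm')
# Benjamini-Hochberg procedure controls the false discovery rate (FDR).
reject_fdr, _, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

print(f"unadjusted (p < 0.05): {np.sum(p_values < 0.05)} significant features")
print(f"Holm (FWER control):   {reject_fwer.sum()} significant features")
print(f"BH (FDR control):      {reject_fdr.sum()} significant features")
```

In a simulated setting like this, the Holm procedure typically flags far fewer features than the Benjamini-Hochberg procedure, reflecting the stricter FWER criterion described above.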
Other criteria have recently been proposed to measure the occurrence of incorrect inferences in the multiple testing settings that arise in the analysis of microarray data. The total error criterion [8, 9] (TEC) is the expected sum of the number of false positives and the number of false negatives. The profile information criterion [9] (PIC) also measures the balance between false positives and false negatives. The probability that the proportion of significant findings that are false positives is greater than a user-specified limit, also known as the tail probability of the proportion of false positives, has also been suggested as another useful error rate that could be used by a multiple testing procedure [4].

OTHER PRINCIPLES THAT DISTINGUISH MULTIPLE TESTING PROCEDURES

Several principles distinguish the various multiple testing procedures from one another. The procedures differ according to: whether their objective is error estimation or error control; how they account for correlation; their computational demands and complexity; how rigorously they have been validated; and what conditions are most likely to lead them to produce reliable or unreliable results.

Most multiple testing procedures applied to the analysis of microarray data have one of two general objectives: estimation of a particular error rate or control of a particular error rate. Control procedures seek to determine a threshold for significance in such a manner that the error rate is limited to being less than or equal to a prespecified level of tolerance. On the other hand, estimation procedures seek to accurately estimate the value of an error rate for a user-selected threshold of significance. A set of results that are significant at a given level of error control are generally considered to be more definitive than a set of results with an estimated equal error rate. However, it can be difficult to choose an appropriate level of tolerance for an error rate before performing the analysis, which is a drawback of using a control method.

Multiple testing methods differ in how they account for correlations among the collection of test statistics (or P-values) computed in an analysis. In statistical language, these differences can be summarized by stating whether the method performs inferences by considering the marginal distributions of the test statistics (or P-values) or the joint distribution of the collection of the test statistics (or P-values) as a group. The marginal distribution of an individual test statistic describes how that statistic varies due to chance, without considering how it may be correlated with the other test statistics. The joint distribution of a collection of test statistics describes the chance variation of all statistics as a group. Some methods accept a collection of marginal P-values, i.e. P-values computed using the marginal null distributions of the test statistics, as input. Marginal P-values can be computed by a parametric procedure, a rank-based procedure, or a permutation. Methods that use marginal P-values assume that the effects of correlation between test statistics are negligible. Other methods perform inferences that account for multiple testing by using an empirical joint distribution of the test statistics derived by permutation.

Multiple testing procedures vary in their computational demands and complexity. As previously mentioned, some methods use permutation, which can require substantial computing time in some applications. Also, some methods use the bootstrap [10], a computationally demanding resampling procedure. Many procedures perform relatively simple calculations on the set of marginal P-values. The computationally intensive procedures can offer some robustness properties, such as more fully accounting for the effects of correlation. However, it is not always clear if the gains in robustness warrant the computational efforts that are required. A later section discusses how to balance the trade-off between robustness and computational effort in choosing a procedure for a particular application.

The multiple testing methods have been validated at various degrees of rigour. There are advantages and disadvantages of the various techniques used to validate statistical methods [11]. A rigorous mathematical proof is considered the gold standard to establish the properties of a statistical method. However, a mathematical proof establishes the properties of the method only under an assumed scenario, which may be unrealistic in application. Therefore, it has been especially difficult to use proofs to establish that a statistical method has desirable operating characteristics for the complex settings most likely to arise in the analysis of microarray data. Nevertheless, some methods have proven mathematical properties under a fairly diverse range of hypothetical settings. Another validation method is simulation. In simulation, many data sets are generated under an assumed setting, methods are applied to analyse those data sets, and then the performance of the methods is summarized across the data sets. Again, properties demonstrated in a simulation hold only under the assumed setting. Still, simulations are a valuable tool for validating and studying the performance properties of statistical procedures. Also, simulations are typically a more feasible way to validate a method than mathematical proofs.

The performance and reliability of multiple testing procedures depend primarily on how accurately the assumptions of the method reflect reality. Some methods fit mixture models with uniform and beta components to the observed set of P-values [3, 12]. These methods can give accurate estimates of the error rates when the fitted model accurately represents the observed P-value distribution [3]. In some cases, however, these models do not fit the observed P-values very well; therefore, the resulting error rate estimates are most likely to be quite inaccurate [13]. Furthermore, methods that operate on marginal P-values typically assume that P-values arising from statistical tests of true null hypotheses (i.e. corresponding to non-differentially expressed microarray features) are statistically independent and uniformly distributed over the interval (0,1). Strictly speaking, the assumption regarding statistical independence is probably not true, because genes operate in pathways, and the results of analyses of genes in a common pathway are correlated. Nevertheless, some of the methods operating on P-values are robust against certain mild violations of this assumption [14]. In particular, many of the methods that operate on P-values are robust in settings where the number of correlated features is small relative to the number of queried features [14]. Genome-wide experiments typically fall into this type of setting because the number of genes in any pathway is small relative to the number of genes represented on the array. Nevertheless, some experiments may have a large number of features that are correlated relative to the number of features represented on the array. Resampling methods offer an alternative approach in the context of experiments having correlation structures that influence a substantial proportion of the queried features [15].

What Is a One-Tailed Test?


A one-tailed test is a statistical test in which the critical area of a distribution is one-
sided so that it is either greater than or less than a certain value, but not both. If the
sample being tested falls into the one-sided critical area, the alternative hypothesis
will be accepted instead of the null hypothesis.

A one-tailed test is also known as a directional hypothesis or directional test.

The Basics of a One-Tailed Test


A basic concept in inferential statistics is hypothesis testing. Hypothesis testing is run to determine whether a claim is true or not, given a population parameter. A test that is conducted to show whether the mean of the sample is significantly greater than or significantly less than the mean of a population is considered a two-tailed test. When the testing is set up to show that the sample mean would be higher or lower than the population mean, it is referred to as a one-tailed test. The one-tailed test gets its name from testing the area under one of the tails (sides) of a normal distribution, although the test can be used in other non-normal distributions as well.

Before the one-tailed test can be performed, null and alternative hypotheses have to
be established. A null hypothesis is a claim that the researcher hopes to reject. An
alternative hypothesis is the claim that is supported by rejecting the null hypothesis.
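As a small illustration (with made-up data and a hypothetical reference mean of 100), a one-tailed one-sample t-test in SciPy only considers departures in the direction of interest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
scores = rng.normal(loc=103, scale=10, size=40)  # hypothetical sample of scores

# Null hypothesis: the population mean is 100.
# Alternative (one-tailed): the population mean is greater than 100.
t_stat, p_value = stats.ttest_1samp(scores, popmean=100, alternative='greater')
print(f"one-tailed p-value: {p_value:.3f}")
```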

KEY TAKEAWAYS
• A one-tailed test is a statistical hypothesis test set up to show that the sample mean would be higher or lower than the population mean, but not both.
• When using a one-tailed test, the analyst is testing for the possibility of the relationship in one direction of interest, and completely disregarding the possibility of a relationship in the other direction.
• Before running a one-tailed test, the analyst must set up a null hypothesis and an alternative hypothesis and establish a probability value (p-value).
