
The One-Sample Runs Test: A Category of Exception

Robert G. Mogull
California State University, Sacramento

Key words: Runs Test, One-Sample Runs Test, nonrandom sample, random sample

The popular One-Sample Runs Test is used to identify a nonrandom pattern
in a sequence of dichotomous elements. Although the test is generally
effective in identifying patterns, it is shown here to be incapable of
signaling departures from randomness when all runs have length two.
Furthermore, with run lengths of two, increasing the sample size reduces the
power of the test. Run lengths strictly of two, therefore, constitute a unique
category of anomaly in the test's overall performance.

Since A. M. Mood's publication in 1940, the Runs Test for evaluating the
randomness of a sample has grown markedly in popularity and in variety of
applications. The One-Sample Runs Test is used (a) to check for randomness
in a sample distribution, (b) in production quality control to detect assignable
sources of defects, (c) to examine a distribution of regression residuals for
nonrandomness, and (d) in testing for both trends and cyclical patterns with
temporal data. These applications all share the same basic structure.
The test examines a distribution of sequentially ordered elements that fall
in just two mutually exclusive categories and evaluates whether the number
of runs of the elements is either too few or too many. The test does not
consider the relative counts (frequencies) of the elements or the lengths of
the runs of elements. Instead, it focuses only on the number of runs. (A run
is defined as a sequence of like symbols that are preceded and followed by
symbols of the other type or by none at all.)
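The definition of a run can be sketched in a few lines of Python. The function name `count_runs` is a hypothetical illustration, not something from the article; it counts one run for the first symbol plus one more for every position where the symbol changes.

```python
def count_runs(seq):
    """Return the number of runs (maximal blocks of identical symbols) in seq."""
    if len(seq) == 0:
        return 0
    # Each change of symbol between neighboring elements starts a new run.
    changes = sum(1 for prev, cur in zip(seq, seq[1:]) if prev != cur)
    return changes + 1


print(count_runs("TFTFTFTFTFTFTFTFTFTFTFTF"))  # 24 runs
print(count_runs("TTFF" * 6))                  # 12 runs
```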
Although the Runs Test is not efficient (since it ignores both the relative
frequencies of the elements and the lengths of the runs), its popularity is due
to its simplicity of use and its versatility in a wide variety of applications.
Furthermore, it can be employed with as few as two observations for
each of the two symbols. The power of the test to detect nonrandomness
does increase, however, with larger samples. (The power of the test is defined
as the probability of correctly rejecting the null hypothesis.)
The next two sections of this article will demonstrate the One-Sample
Runs Test first with a typical example and then with a category of exception.
In the exceptions, it will be shown that the test is not able to detect a particular
pattern of nonrandomness among the elements. It will also be demonstrated
that, for this particular category of patterns, the test actually loses power
with larger samples.


Downloaded from http://jebs.aera.net at PENNSYLVANIA STATE UNIV on May 11, 2016


Teacher's Corner

A Typical Example
Consider the case of true (T) and false (F) answers on an examination, as
shown below.

TFTFTFTFTFTFTFTFTFTFTFTF (1)

In this distribution of two mutually exclusive symbols, there are 12 true and
12 false alternating answers for a total of 24 runs. The runs occur in lengths
of one only.
The null hypothesis claims that the sequence of T and F symbols is random
and that there is no pattern. The alternative hypothesis states that there is a
pattern to the sequence of symbols or, equivalently, that there are either too
few or too many runs in the distribution. Since neither symbol exceeds a
frequency count of 20, special small-sample tables of critical values are used
(Rohlf & Sokal, 1969; Swed & Eisenhart, 1943).¹ In the example, the number
of runs (r = 24) exceeds the tabular upper critical value of 20. The 24
therefore falls in the upper region of rejection and the null hypothesis is
rejected at the two-tailed .01 total alpha level. The alternating true and
false answers create too many runs to reasonably conclude that the pattern
is random.
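The small-sample tables cited above are built from the exact distribution of the number of runs under the null hypothesis, given n1 and n2. As a sketch, that distribution can be computed from the standard combinatorial counts below (with n1 + n2 symbols there are C(n1 + n2, n1) equally likely arrangements). The function names are hypothetical, and the two-sided p-value uses one common convention, doubling the smaller tail probability; published tables may be constructed differently.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1, n2):
    """Exact null distribution {r: P(R = r)} of the number of runs, for n1, n2 >= 1."""
    total = comb(n1 + n2, n1)  # equally likely arrangements under randomness
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k runs of each symbol, starting with either symbol.
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:
            # r = 2k + 1: k + 1 runs of one symbol and k runs of the other.
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def exact_two_sided_p(r, n1, n2):
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    pmf = runs_pmf(n1, n2)
    lower = sum(p for value, p in pmf.items() if value <= r)
    upper = sum(p for value, p in pmf.items() if value >= r)
    return min(Fraction(1), 2 * min(lower, upper))


p = exact_two_sided_p(24, 12, 12)  # sequence (1): 12 T, 12 F, 24 runs
print(float(p))                    # about 1.5e-06, far below .01
```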
The example can be extended to a larger case, as follows.

TF • • • TF (2)

In this second case, there are 24 true and 24 false alternating answers for a
total of 48 runs. At least one symbol (in this case, both) exceeds a count of 20.
Therefore, the distribution of the number of runs is approximated by a normal
distribution, and the normal deviate (Z) is

$$
Z = \frac{(r \pm .5) - \mu_r}{\sigma_r}
$$

and where

$r$ = the total number of runs in the distribution,

$$
\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \text{the expected number of runs},
$$

$$
\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \text{the standard error of the number of runs},
$$

$\pm .5$ = a correction for continuity (.5 is subtracted when $r > \mu_r$ and added when $r < \mu_r$),

$n_1$ = the number of symbols of the first type, and

$n_2$ = the number of symbols of the second type.
Hence,

$$
Z = \frac{(48 - .5) - 25}{3.43} = 6.56.
$$

Since |Z| = 6.56 exceeds $z_{.001/2} = 3.29$, the critical value for a two-tailed
test at the .001 level, the null hypothesis (of randomness in alternating true
and false answers) is rejected at better than the two-tailed .001 level of
significance.
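The large-sample computation can be sketched as follows. The function names `runs_mean_sd` and `runs_z` are hypothetical; the continuity correction moves r half a unit toward the mean (subtracting .5 when r exceeds the mean and adding .5 when it falls below), which is the convention that reproduces the worked numbers in this article.

```python
import math


def runs_mean_sd(n1, n2):
    """Expected number of runs and its standard error under randomness."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    return mean, math.sqrt(var)


def runs_z(r, n1, n2):
    """Normal deviate for r runs, with the .5 continuity correction toward the mean."""
    mean, sd = runs_mean_sd(n1, n2)
    if r > mean:
        corrected = r - 0.5
    elif r < mean:
        corrected = r + 0.5
    else:
        corrected = r
    return (corrected - mean) / sd


# Sequence (2): 24 T and 24 F alternating, 48 runs.
print(round(runs_z(48, 24, 24), 2))  # 6.57; rounding sigma to 3.43 first gives 6.56
```

Carrying the unrounded standard error through the division is why the last digit can differ slightly from the hand calculation above.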
An Exception
Let us now create the following patterned distribution.

TTFFTTFFTTFFTTFFTTFFTTFF (3)

This small-sample case uses run lengths of two only, and there are 12 runs
with 12 each of the T and F symbols.
As usual, the null hypothesis states that the sequence of dichotomous
elements is random and that there is no pattern to the distribution. It further
claims that there are neither too many nor too few runs and that the observed
number of runs is not significantly different from the expected number. Yet,
in this example, r = 12 falls in neither the upper nor lower regions of
rejection—as delineated by the special tables of critical values for small
samples. (The lower critical value is 8 and the upper critical value is 18 for
a two-tailed .10 total alpha level.) Hence, although the sequence of symbols
certainly is not random, the test fails to detect the situation.
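The failure in Case 3 can also be checked without tables by simulating the null hypothesis directly: shuffle the observed symbols many times and see how often a random arrangement yields a run count at least as extreme as the observed one. This is a Monte Carlo permutation test, a different technique from the tabled test used in the article; the function name, the number of shuffles, and the seed are arbitrary choices for illustration.

```python
import random


def count_runs(seq):
    """Number of maximal blocks of identical symbols."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b) if seq else 0


def permutation_p(seq, n_shuffles=20_000, seed=1):
    """Monte Carlo two-sided p-value for the observed number of runs.

    Shuffling keeps the count of each symbol fixed, matching the test's
    conditioning on n1 and n2. Returns twice the smaller tail proportion, capped at 1.
    """
    rng = random.Random(seed)
    observed = count_runs(seq)
    symbols = list(seq)
    at_or_below = at_or_above = 0
    for _ in range(n_shuffles):
        rng.shuffle(symbols)
        r = count_runs(symbols)
        at_or_below += r <= observed
        at_or_above += r >= observed
    return min(1.0, 2 * min(at_or_below, at_or_above) / n_shuffles)


print(permutation_p("TTFF" * 6))  # sequence (3): far above .10, so no rejection
```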
Would the Runs Test perform better if a larger sample were used? Suppose,
for example, the distribution is as follows.

TTFF ... TTFF (4)

Here, run lengths are again in twos only, but there are now 48 runs with 48
each of the Ts and Fs. Since at least one symbol exceeds a count of 20, a
normal deviate is calculated. Thus,

$$
Z = \frac{(48 + .5) - 49}{4.87} = -.10.
$$

The test statistic of Z = -.10 is not statistically significant at any reasonable
level of alpha. Again, although the sequence of T and F symbols is clearly
not random, the Runs Test fails to identify the situation.


If one increases the sample size still further, as is done in Case 5, what
would be the effect? In this next example, there are now 96 runs composed
of 96 Ts and 96 Fs.

TTFF ... TTFF (5)

The normal deviate test statistic is calculated as

$$
Z = \frac{(96 + .5) - 97}{6.91} = -.07.
$$

However, again the test statistic is not significant at any reasonable level of
alpha. Moreover, by increasing the total sample size from 96 (in Case 4) to
192 (in Case 5), the test statistic has declined from |Z| = .10 to |Z| = .07.
It appears, therefore, that the test is unable to identify a pattern when the
symbols occur in run lengths of two. But would the test also fail to discern
a pattern that occurs in run lengths of three, as illustrated in the next two
examples?

TTTFFFTTTFFFTTTFFFTTTFFFTTTFFFTTTFFF (6)

In Case 6, there are 12 runs with 18 Ts and 18 Fs. Since the small-sample
lower critical value is 12, the test does (barely) identify the nonrandom pattern
at the 5% level of significance.
Furthermore, increasing the sample size makes the test more powerful, as
seen below.

TTTFFF ... TTTFFF (7)

In this case, there are 24 runs with 36 Ts and 36 Fs. Consequently, the test
statistic is

$$
Z = \frac{(24 + .5) - 37}{4.21} = -2.97
$$

which lies in a two-tailed 1% region of rejection. As anticipated, with run
lengths of three, a larger sample size does provide additional power to the
test in discerning departures from randomness.
To focus on this last point, the next two cases both have run lengths of
four. Case 8 contains 24 runs with 48 Ts and 48 Fs, while Case 9 has 100
runs with 200 Ts and 200 Fs. The pattern for both cases is


TTTTFFFF ... TTTTFFFF, (8), (9)

and the normal deviate test statistics, respectively, are

$$
Z = \frac{(24 + .5) - 49}{4.87} = -5.03
$$

and

$$
Z = \frac{(100 + .5) - 201}{9.99} = -10.06.
$$

Thus, by increasing the total sample size from 96 (in Case 8) to 400 (in Case
9), the test statistic grows from |Z| = 5.03 to |Z| = 10.06. Hence, the null
hypothesis can be rejected at an even smaller significance level. As expected,
the test becomes more powerful with larger samples.
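The contrast between run lengths of two and the other run lengths can be derived directly from the formulas for the expected number of runs and its standard error. The sketch below assumes equal counts of each symbol, n1 = n2 = n (as in every numbered case above), and a pattern in which every run has the same length k, so that r = 2n/k.

```latex
% Equal counts n_1 = n_2 = n and every run of length k, so r = 2n/k.
\mu_r = \frac{2n^2}{2n} + 1 = n + 1, \qquad
\sigma_r^2 = \frac{2n^2\,(2n^2 - 2n)}{(2n)^2\,(2n - 1)} = \frac{n(n-1)}{2n - 1} \approx \frac{n}{2}.

% Run length k = 2: r = n, so r - \mu_r = -1; after the continuity correction the numerator is -.5.
|Z| = \frac{.5}{\sigma_r} = .5\sqrt{\frac{2n - 1}{n(n - 1)}} \approx \frac{.71}{\sqrt{n}} \longrightarrow 0.

% Any other run length k: |r - \mu_r| = |2n/k - n - 1| grows in proportion to n.
|Z| \approx \frac{\lvert 2/k - 1 \rvert \, n}{\sqrt{n/2}} = \lvert 2/k - 1 \rvert \sqrt{2n} \longrightarrow \infty.
```

For example, with n = 100 the run-length-two expression gives .5 times the square root of 199/9,900, about .071, in line with Case 5 and Example 3; with k = 4 and n = 200 the approximation gives .5 times the square root of 400, or 10, close to the 10.06 of Case 9.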
As a rule then, as the sample size increases, the absolute value of the test
statistic also increases for various run lengths. As shown in Table 1 below,
for run lengths of one, three, and four, the test statistics behave as expected
and rise along with the larger samples. A unique exception is for run lengths
of two, however. For this single category of exception, the absolute values
of the test statistics decrease with larger samples. With run lengths of two,
as the total sample size increases from 48 to 2 million, the absolute values
of the test statistics decrease from .15 to .0007.
Therefore, in conclusion, this article reveals two notable anomalies regarding
the One-Sample Runs Test: (a) a pattern among dichotomous symbols
cannot be identified when run lengths are strictly in twos, and (b) in contrast
to other run lengths, increasing the sample size with run lengths of two does
not improve the power of the test. To the contrary, for run lengths strictly
of two, the power of the One-Sample Runs Test decreases with larger samples.

TABLE 1
Total sample size / absolute value of the test statistic, for run lengths of 1-4

Number          Run length
of runs         One                  Two                  Three                Four
24              24 / 4.38            48 / .15             72 / 2.97            96 / 5.03
48              48 / 6.56            96 / .10             144 / 4.10           192 / 7.02
100             100 / 9.76           200 / .07            300 / 5.84           400 / 10.06
1,000           1,000 / 31.55        2,000 / .02          3,000 / 18.28        4,000 / 31.64
1,000,000       1,000,000 / 999.99   2,000,000 / .0007    3,000,000 / 577.35   4,000,000 / 1,000
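Every entry in Table 1 follows from the normal-deviate formula when each symbol appears equally often and all runs have the same length. The sketch below regenerates the table under that assumption; the helper name `z_for_pattern` is hypothetical, and it assumes an even number of runs so that n1 = n2.

```python
import math


def z_for_pattern(num_runs, run_length):
    """|Z| for num_runs runs, all of length run_length, alternating two symbols.

    Assumes num_runs is even, so each symbol appears num_runs * run_length / 2 times.
    """
    n1 = n2 = num_runs * run_length // 2
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    sd = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1)))
    # Continuity correction: move r half a unit toward the mean
    # (r never equals the mean for these patterns).
    r = num_runs - 0.5 if num_runs > mean else num_runs + 0.5
    return abs(r - mean) / sd


print("Runs      " + "".join(f"{k:>20}" for k in ("One", "Two", "Three", "Four")))
for runs in (24, 48, 100, 1_000, 1_000_000):
    cells = [f"{runs * k:,}/{z_for_pattern(runs, k):.4g}" for k in (1, 2, 3, 4)]
    print(f"{runs:<10,}" + "".join(f"{c:>20}" for c in cells))
```

Printed to four significant figures, a few entries differ in the last digit from Table 1 (for example, 9.749 rather than 9.76 for 100 runs of length one), presumably reflecting rounding in the original computations.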


Notes
¹The Rohlf and Sokal tables correct inconsistencies in the Swed and Eisenhart
tables.
²This example was suggested by an anonymous reviewer for this journal. The
author wishes to thank the reviewer for the suggestion.

APPENDIX
Some additional examples

In this appendix, several additional examples of the One-Sample Runs Test are
presented for the unique case of run lengths of two. These examples, as well as the
true/false examples above, would be suitable for classroom illustration.
Example 1.² Consider the case where two students are seated side by side and are
taking a statistics exam. One student (called A) is doing his own work, while his
neighbor (called B) is copying from A's work. Suppose further that there are 10
exam problems to be solved and A always places two solutions on the left-hand page
and two on the right-hand page of his blue book. While B cannot see A's left-hand
solutions, he can see and indeed does copy A's right-hand solutions. Hence,
their answers match for A's right-hand pages but do not match for the left. This
would yield a 0011001100 pattern, where 0 = answers do not agree and 1 = answers
do agree. Therefore, there are five runs, with six zeros and four ones.
Because neither symbol (0 or 1) exceeds a frequency count of 20, special Swed
and Eisenhart (1943) tables of critical values are used. The number of runs (r = 5)
is seen to fall in neither the lower (critical value of 3) nor the upper (critical value
of 9) region of rejection with a two-tailed .10 total alpha level. Hence, the Runs Test
fails to detect nonrandomness for run lengths of two.
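For a sample this small, the exact null distribution can also be obtained by brute force: list every arrangement of six 0s and four 1s (there are C(10, 4) = 210 of them), count the runs in each, and tally. This exhaustive enumeration is a sketch for illustration, not the method behind the published tables, and it becomes impractical for large samples; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_runs(seq):
    """Number of maximal blocks of identical symbols (seq assumed non-empty)."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def enumerate_runs_distribution(n0, n1):
    """Tally the number of runs over every arrangement of n0 zeros and n1 ones."""
    length = n0 + n1
    tally = Counter()
    for ones in combinations(range(length), n1):
        positions = set(ones)
        seq = [1 if i in positions else 0 for i in range(length)]
        tally[count_runs(seq)] += 1
    return tally


tally = enumerate_runs_distribution(6, 4)
total = sum(tally.values())          # 210 arrangements
observed = count_runs("0011001100")  # Example 1: 5 runs
lower = sum(c for r, c in tally.items() if r <= observed) / total
upper = sum(c for r, c in tally.items() if r >= observed) / total
print(sorted(tally.items()))  # [(2, 2), (3, 8), (4, 30), (5, 45), (6, 60), (7, 40), (8, 20), (9, 5)]
print(min(1.0, 2 * min(lower, upper)))  # two-sided p, about .81
```

The tallies agree with the critical values quoted above: only 10 of the 210 arrangements have 3 or fewer runs, and only 5 have 9.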
Example 2. Consider the case where a professor holds office hours on Mondays
and Wednesdays over the first 15 weeks of a semester (the 16th week is for final
exams, and office hours are canceled). Also, suppose that the professor gives biweekly
tests on Fridays throughout the 15 weeks. Consequently, students typically seek
help during office hours in the weeks of the tests, but not in the other weeks. If
no students drop in during office hours, the day is designated 0; if at least one
student shows up, it is designated 1. The pattern, therefore, is
001100110011001100110011001100. This pattern indicates, for example, that on
Monday and Wednesday of the first week, no students came to the professor's office
hours. During office hours on both Monday and Wednesday of the second week,
however, students did come for help.
In this example, there are 15 runs, with 16 zeros and 14 ones. Because neither
symbol (0 or 1) exceeds a frequency count of 20, special Swed and Eisenhart (1943)
tables of critical values are used. The number of runs (r = 15) falls within the lower
and upper boundaries (11 and 21, respectively) of the acceptance region for a
two-tailed test with a .10 total alpha level. Thus, the Runs Test fails to detect nonrandomness
for run lengths of two.
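The office-hours pattern in Example 2 can be generated directly from the scenario. The sketch assumes two office-hour days per week for 15 weeks, with tests in the even-numbered weeks (2, 4, ..., 14), which is one reading of "biweekly tests" that matches the printed pattern; it further assumes students appear exactly in the test weeks. The function name is hypothetical.

```python
def office_hours_pattern(weeks=15, days_per_week=2):
    """'1' if at least one student visits (assumed: test weeks only), else '0'."""
    pattern = []
    for week in range(1, weeks + 1):
        visited = 1 if week % 2 == 0 else 0  # assumed: tests fall in even-numbered weeks
        pattern.extend([visited] * days_per_week)
    return "".join(str(x) for x in pattern)


def count_runs(seq):
    """Number of maximal blocks of identical symbols (seq assumed non-empty)."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


seq = office_hours_pattern()
print(seq)                                              # 001100110011001100110011001100
print(count_runs(seq), seq.count("0"), seq.count("1"))  # 15 16 14
```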
Example 3. Robin bought a bag of M&M's candy for her younger brothers, Scott
and Michael. She distributed the 200 pieces of candy from the bag by first giving
Scott two pieces and then giving Michael two. This pattern was followed until the
bag was empty. If we designate the symbol 0 for each piece given to Scott and the
symbol 1 for each piece given to Michael, the pattern of distribution was 00110011
..., with a total of 100 runs. Thus, the normal deviate test statistic is

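$$
Z = \frac{(r \pm .5) - \mu_r}{\sigma_r} = \frac{(100 + .5) - 101}{7.05} = -.07
$$

where

$$
\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2 \cdot 100 \cdot 100}{100 + 100} + 1 = 101
$$

and

$$
\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}
= \sqrt{\frac{2 \cdot 100 \cdot 100 \, (2 \cdot 100 \cdot 100 - 100 - 100)}{(100 + 100)^2 (100 + 100 - 1)}} = 7.05
$$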

Since the absolute value of the test statistic (|Z| = .07) is not statistically significant
at any customary or reasonable level of alpha, the Runs Test does not reject the null
hypothesis of randomness. Hence, the test fails to identify the nonrandom runs pattern
with run lengths of two.
Example 4. Noah was about to load his ark with two of every kind of animal.
Among his many concerns, he was worried about the weights of the animals and
whether the gangplank would be strong enough to hold his boarders. He therefore
issued instructions to the animals to form a single line of alternating pairs of heavy
and light animals, where 0 signifies an animal lighter in weight than Noah and 1
signifies an animal heavier than Noah. At last count, there were 1 million different
types of animals, half of which were designated as light and half as heavy. Hence,
there were 1 million (m) runs of alternating paired symbols, 00110011 ..., and 1
million each of the light and heavy animals (e.g., the first two boarders were male
and female squirrels, the next two were male and female horses, etc.). Thus, the test
statistic is

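$$
Z = \frac{(r \pm .5) - \mu_r}{\sigma_r} = \frac{(1\text{m} + .5) - 1{,}000{,}001}{707.107} = -.0007
$$

where

$$
\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2 \cdot 1\text{m} \cdot 1\text{m}}{1\text{m} + 1\text{m}} + 1 = 1{,}000{,}001
$$

and

$$
\sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}
= \sqrt{\frac{2 \cdot 1\text{m} \cdot 1\text{m} \, (2 \cdot 1\text{m} \cdot 1\text{m} - 1\text{m} - 1\text{m})}{(1\text{m} + 1\text{m})^2 (1\text{m} + 1\text{m} - 1)}} = 707.107
$$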

Since the absolute value of the test statistic is |Z| = .0007, it falls in neither the
lower nor upper region of rejection using any customary or reasonable level of alpha.
Hence, the Runs Test fails to identify the nonrandom runs pattern for run lengths
of two.

References
Mood, A. M. (1940). The distribution theory of runs. Annals of Mathematical Statistics,
XI(4), 367-392.
Rohlf, F. J., & Sokal, R. R. (1969). Statistical tables. San Francisco: W. H. Freeman.
Swed, F. S., & Eisenhart, C. (1943). Tables for testing randomness of grouping
in a sequence of alternatives. Annals of Mathematical Statistics, XIV(1), 66-87.

Author
ROBERT G. MOGULL is Professor, School of Business Administration, California
State University at Sacramento, 6000 J Street, Sacramento, CA 95819. He
specializes in estimating and projecting annual poverty for the state of California.

Received February 14, 1994


Accepted March 8, 1994
