
Department of Statistics, University of Connecticut, Storrs, Connecticut 06268, USA

Robustness of the Two-Sample T-Test

HARRY O. POSTEN

Abstract

In the literature, one finds evidence that the two-sample t-test is robust with respect to departures from normality, and departures from homogeneity of variance (at least when sample sizes are equal or nearly equal). This evidence, presented in various articles, is usually based on an approximate approach without error analysis or on a simulation approach that is of limited extent. The present paper takes a closer and more extensive look at the quality of this procedure under departures from the primary assumptions of normality and of equal variances. The results presented are a synthesis of several previous papers by the author and colleagues, with particular emphasis on the use of a broad Monte Carlo approach to the assessment of robustness.

1. Introduction

In robustness research, there are two directions one may take. One may attempt to quantify or measure the degree of robustness inherent in a standard statistical procedure, or one may attempt to develop a new alternative procedure which, in some sense, is more robust than the standard procedure. In recent years, much of the robustness literature has been concerned with the development of such new procedures. However, significant contributions can still be made in the study of the robustness of standard procedures since, even for the most familiar procedures, there exists vagueness concerning the conditions under which the procedure is robust and under which it is nonrobust. For example, one finds general evidence in the literature that the two-sample t-test is fairly robust with respect to departures from normality and also with respect to departures from homogeneity of variance (at least when sample sizes are equal or nearly equal). On the other hand, one can also find evidence that the two-sample t-test may not be robust under certain conditions. Bradley (1980) provided results from a simulation study (30,000 generated values of the two-sample t-statistic for samples from several pairs of populations and sample sizes) which suggested that dramatically different shapes for the two populations could produce significant nonrobustness in the Type 1 error probability. Also, studies by Hyrenius (1950) and Zachrisson (1959) for the one-sample t-test hint at the possibility of nonrobustness of the two-sample t-test under two types of practical conditions: the condition of samples from a compound population (occurring when a population is a mixture of two or more distinct populations), and the condition of the samples being stratified samples from two or more populations (occurring when conditions change during the selection of the sample). Despite this negative evidence, there is sufficient support in the literature (see Hatch and Posten (1966) for a survey of robustness research for the one- and two-sample t-tests) to indicate that under simple random sampling from populations which do not differ strongly in shape and which are not extreme departures from normality, the two-sample t-test is not very sensitive to nonnormality. Also, the evidence indicates that with equal or nearly equal sample sizes this test is robust with respect to departures from homogeneity of variance.

The present paper is concerned with a synthesis of several recent studies which in a detailed manner provide an assessment of the robustness level of the two-sample t-test under common practical conditions. In some cases, the results are restricted to the two-tailed test or to equal sample sizes, but on the whole they provide answers to the question "What level of robustness does the two-sample t-test have?" for important practical cases.

2. Robustness Under Heterogeneity of Variance

A recent paper, Posten, Yeh and Owen (1982), studied the change in the true significance level $\alpha(\lambda)$ of the two-sample double-tailed t-test when the populations are normal but the ratio $\lambda = \sigma_1^2/\sigma_2^2$ varies from the assumed value $\lambda = 1$. The results of this theoretical study indicate an extremely strong level of robustness under departures from equal variances when the sample sizes are equal. This level of robustness is probably stronger than most people realize. Table 1 provides these results in terms of the concept of "total robustness at a given robustness level". Specifically, the t-test is considered to be totally robust at level $\varepsilon$ if, no matter what the value of $\lambda$, one makes no more an error than $\varepsilon$ in assuming the significance level to be $\alpha(1)$, the value under the condition of equal variances. Mathematically, this means that as $\lambda$ ranges over $(0, \infty)$, $|\alpha(\lambda) - \alpha(1)|$ varies only within a range bounded by $\varepsilon$. For example, from table 1, with $n_1 = n_2 = 20$ and $\alpha(1) = .05$, the maximum error one can make in assuming the significance level to be 0.05 is 0.0072. Thus, the true significance level will be no more than 0.0072 from the assumed level of 0.05, no matter how much $\sigma_1^2$ varies from $\sigma_2^2$. Further, an error of a magnitude near the value 0.0072 will occur only when $\sigma_1^2$ is very much larger or smaller than $\sigma_2^2$, and table 1 may therefore be used to conservatively determine the degree of robustness of the equal sample size t-test under violations of the assumption of equal variances. From table 1, it is clear that this test is quite robust when sample sizes are equal.

The question of what happens to the robustness of the two-sample t-test when sample sizes are unequal is discussed in the same paper. The results are given in table 2 in terms of maximal regions of robustness. A "maximal region of robustness" of level $\varepsilon$ is the region of $\lambda$-values over which the true significance level, $\alpha(\lambda)$, deviates from the assumed value, $\alpha(1)$, by no more than $\varepsilon$. If this range of $\lambda$ is wide in a practical sense, then the t-test is robust at this level $\varepsilon$. The maximal regions of robustness are given in table 2 for equal sample sizes and for sample sizes that vary 10 % and 20 % from equality. Table 2 indicates that the sizes of maximal regions of robustness
D. Rasch et al. (eds.), Robustness of Statistical Methods and Nonparametric Statistics
© Academy of Agricultural Sciences of the GDR, Research Centre of Animal Production, Dummerstorf-Rostock, DDR 2551 Dummerstorf. 1984
reduce dramatically as sample sizes vary significantly from equality. Thus, the t-test tends to lose its strong degree of robustness rapidly as the sample sizes become unequal. When each sample size varies by 10 % from a condition of equal sample sizes, the t-test still has a respectable amount of robustness with respect to the Type 1 error probability. However, when the sample sizes reach a 20 % difference from equality, one might wish to be more cautious with the use of the t-test. To an important degree, the loss of robustness when sample sizes are unequal is in the range where $\lambda > 1$, that is, when the larger variance is associated with the smaller sample size. The level of robustness for the unequal sample size test can, therefore, be significantly improved if one knows beforehand which population has the smaller variance. In this case, the smaller sample size may be assigned to the population with the smaller variance. The range of $\lambda$ is then restricted to $(0, 1]$ and table 2 can be used with the righthand entries all replaced by 1. The result is that the t-test becomes somewhat robust with respect to the Type 1 error probability, even when sample sizes are somewhat unequal, as long as the smaller sample is taken from the population having the smaller variance. The original paper also contains results for $\alpha(1) = .01$ with similar results.

3. Robustness Under Nonnormality

To precisely determine the degree of robustness of the two-sample t-test over a wide range of practical nonnormal distributions is a difficult problem. An exact theoretical approach is impractical because of its mathematical intractability, an approximate approach would lack accuracy assurances, and a simulation approach requires an exorbitant amount of computer time to achieve respectable precision over an extensive practical range of distributions. A simulation approach, however, can be made practical by using a computer artifice to speed up sample generation and by using low priority computer time to reduce computer costs.

Such a simulation study was provided by Posten (1978). The intent of that study was to accurately quantify the degree of robustness of the two-sample t-test for a range of sample sizes over a wide range of practical distributions. The Pearson family of distributions was chosen because it appeared to have best withstood the test of time, in terms of representing practical data. The range of coverage was for both negative and positive skewness over $0 \le \beta_1 \le 2.0$ and $1.4 \le \beta_2 \le 7.8$, where $\beta_1 = \mu_3^2/\sigma^6$ and $\beta_2 = \mu_4/\sigma^4$. This seems to be a wide range of coverage for practical distributions if one judges by the range of reported values of $\beta_1$ and $\beta_2$ in, for example, Scheffé (1959) and Pearson and Please (1975). The decision on the fineness of the grid covering this region was conservatively made and the final coverage was for $\beta_1 = 0\ (0.4)\ 2.0$ and

Table 1
Minimum value of $\varepsilon$ for which the t-test is totally robust at level $\varepsilon$ ($n_1 = n_2 = n$)
Entries are $\varepsilon$ for the nominal value $\alpha = \alpha(1)$.

    n    $\alpha$ = 0.05   $\alpha$ = 0.01         n    $\alpha$ = 0.05   $\alpha$ = 0.01
    2        .0954            .0539               15        .0098            .0052
    3        .0589            .0341               20        .0072            .0038
    4        .0419            .0241               25        .0057            .0030
    5        .0324            .0184               30        .0048            .0025
    6        .0263            .0148               50        .0028            .0015
    7        .0222            .0124              100        .0014            .0007
    8        .0191            .0106              500        .0003            .0001
    9        .0168            .0093             1000        .0001            .0001
   10        .0150            .0082                ∞        .0000            .0000

Table 2
Maximal regions of robustness of level $\varepsilon$ for the two-tailed t-test
(nominal significance level $\alpha(1) = 0.05$)

$\varepsilon = 0.03$

   Equal Sample Sizes           10 % Sample Size Change         20 % Sample Size Change
   $n_1$  $n_2$  $\lambda$-range        $n_1$  $n_2$   $\lambda$-range          $n_1$  $n_2$  $\lambda$-range
    5    5   0.02-85.63                                          4    6   0.00-2.89
   10   10   0.00-∞              9   11    0.00-8.33             8   12   0.00-3.06
   15   15   0.00-∞             14   16*   0.00-∞               12   18   0.00-3.17
   20   20   0.00-∞             18   22    0.00-17.58           16   24   0.00-3.25
   25   25   0.00-∞             23   27*   0.00-∞               20   30   0.02-3.30
   30   30   0.00-∞             27   33    0.00-36.05           24   36   0.03-3.33
   40   40   0.00-∞             36   44    0.00-104.22          32   48   0.06-3.38
   50   50   0.00-∞             45   55    0.00-∞               40   60   0.05-3.42

$\varepsilon = 0.02$

   Equal Sample Sizes           10 % Sample Size Change         20 % Sample Size Change
   $n_1$  $n_2$  $\lambda$-range        $n_1$  $n_2$   $\lambda$-range          $n_1$  $n_2$  $\lambda$-range
    5    5   0.09-12.33                                          4    6   0.00-2.14
   10   10   0.00-∞              9   11    0.00-4.02             8   12   0.21-2.17
   15   15   0.00-∞             14   16*   0.00-9.31            12   18   0.26-2.20
   20   20   0.00-∞             18   22    0.00-4.95            16   24   0.28-2.21
   25   25   0.00-∞             23   27*   0.00-[illegible]     20   30   0.29-2.22
   30   30   0.00-∞             27   33    0.00-5.55            24   36   0.30-2.23
   40   40   0.00-∞             36   44    0.00-5.96            32   48   0.31-2.24
   50   50   0.00-∞             45   55    0.00-6.26            40   60   0.31-2.25

* = sample size change nearest to 10 % change from equality but not greater
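As a rough check on the kind of result summarized in tables 1 and 2, the true significance level $\alpha(\lambda)$ of the pooled-variance t-test can be approximated by simulation. The Python sketch below is not the method of Posten, Yeh and Owen (1982), which was theoretical; it is a small Monte Carlo illustration under stated assumptions: both populations are normal with mean 0, $\sigma_2$ is fixed at 1 so that $\lambda = \sigma_1^2$, and the two-tailed 5 % critical value 2.0244 (upper 2.5 % point of t with 38 degrees of freedom) is hard-coded for the case $n_1 + n_2 = 40$. The names `pooled_t`, `estimate_alpha` and `T_CRIT_38` are hypothetical.

```python
import math
import random


def pooled_t(x, y):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


def estimate_alpha(lam, n1, n2, t_crit, reps=20000, seed=1):
    """Monte Carlo estimate of alpha(lambda), with lambda = var1 / var2.

    Assumption: normal populations with equal means, sigma_2 = 1.
    The null hypothesis is true, so every rejection is a Type 1 error.
    """
    rng = random.Random(seed)
    sd1 = math.sqrt(lam)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        if abs(pooled_t(x, y)) > t_crit:
            rejections += 1
    return rejections / reps


# Assumed constant: upper 2.5 % point of Student's t with 38 d.f.
T_CRIT_38 = 2.0244

if __name__ == "__main__":
    # Equal sample sizes (20, 20) versus a 20 % change (16, 24).
    for n1, n2 in ((20, 20), (16, 24)):
        for lam in (0.25, 1.0, 4.0):
            est = estimate_alpha(lam, n1, n2, T_CRIT_38, reps=5000)
            print(n1, n2, lam, est)
```

If the tabulated results hold, the estimates for $n_1 = n_2 = 20$ should stay within about 0.0072 of 0.05 for any $\lambda$, up to simulation error, while for $n_1 = 16$, $n_2 = 24$ the estimate at $\lambda = 4$ should rise noticeably above 0.05 and the estimate at $\lambda = 0.25$ should fall below it.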

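The shape parameters used in Section 3 to index the Pearson family, the squared skewness $\beta_1$ and the kurtosis $\beta_2$, can be estimated from data by replacing population central moments with sample central moments. The Python sketch below illustrates this; it is not taken from Posten (1978). The function names are hypothetical, and `in_quoted_range` checks only the rectangular bounds quoted in the text ($0 \le \beta_1 \le 2.0$, $1.4 \le \beta_2 \le 7.8$), not any further restriction the study's grid may have imposed.

```python
def pearson_betas(data):
    """Return (beta1, beta2) computed from sample central moments.

    beta1 = m3^2 / m2^3 (squared skewness), beta2 = m4 / m2^2 (kurtosis),
    where mk is the k-th sample central moment with divisor n.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def in_quoted_range(beta1, beta2):
    """True if (beta1, beta2) lies inside the rectangle quoted in the text."""
    return 0.0 <= beta1 <= 2.0 and 1.4 <= beta2 <= 7.8


if __name__ == "__main__":
    # An evenly spaced grid behaves like a uniform distribution:
    # symmetric (beta1 = 0) with kurtosis close to 1.8.
    b1, b2 = pearson_betas(list(range(1001)))
    print(b1, b2, in_quoted_range(b1, b2))
```

For comparison, the normal distribution has $\beta_1 = 0$ and $\beta_2 = 3$, which lies inside the quoted range, whereas the exponential distribution has $\beta_1 = 4$ and $\beta_2 = 9$, which lies outside it.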
