Researchers frequently encounter studies that compare two groups on many variables. We discourage the use of multiple tests of hypotheses on individual variables, an approach that ignores the correlation among the variables and increases the chance of a type I error. Instead of examining each variable separately, we recommend using multivariate procedures that integrate all measures on a person into a unified analysis of the differences between the two groups. We describe three multivariate procedures: Hotelling's T², discriminant analysis, and logistic regression. We also discuss the use of Bonferroni's adjustment to preserve the overall chance of a type I error in conducting individual tests on each variable after doing the multivariate procedures. We review the underlying assumptions and relative merits and disadvantages of the three multivariate methods and recommend which method to use in various circumstances.

CLINICAL RESEARCHERS frequently ask, "Do two groups differ on one or more variables?" Case-control studies, cohort studies, and clinical trials are all examples of studies in which this question is ubiquitous. In a case-control study, researchers may wish to compare "cases" and controls on several potential exposures or background variables, as in a study of oral contraceptive use and the incidence of myocardial infarction (1). In cohort studies and clinical trials, researchers may wish to measure various outcomes as well as baseline characteristics, as in the Framingham Heart Study (2, 3) and the University Group Diabetes Project (4).

Whereas statistical methods (t-tests and chi-squared tests) for comparing two groups on one variable are well known, techniques for comparing groups on two or more variables are less widely understood and applied. In many cases, investigators simply use t-tests or chi-squared tests, or both, for all variables. This strategy entails doing many tests of hypotheses, an approach with two potential weaknesses. First, separate tests on each variable ignore the fact that some variables may be correlated; hence, the result of a test for a particular variable may, in fact, be due to differences between the groups on some other related measure (the comparison is confounded). Second, the overall probability of finding at least one statistically significant difference between two groups when, in fact, there are no differences (type I error) is somewhat larger than the conventional 5% and 1% levels chosen for conducting each individual test.

This article examines the problem of multiple testing of hypotheses in the comparison of two groups on many variables. This problem should be distinguished from the similar problem of multiple comparisons that arises in analysis of variance, in which an investigator compares three or more groups on one variable (instead of many). The use of multiple comparisons in analysis of variance is discussed in many other sources (5-15) and is not addressed here.

Predictors of Coronary Heart Disease in a Cohort Study

To illustrate the issues, we analyzed data from the Framingham Heart Study (2, 3, 16, 17). (The sole purpose of our analyses is to show the statistical concerns; we believe the results presented here are consistent with other published results.) For our purposes, we considered only men, whom we divided into two groups: those who did and those who did not develop coronary heart disease over a 26-year period. In our analyses, coronary heart disease comprised one or more of three conditions: angina pectoris, coronary insufficiency, and myocardial infarction. To compare the two groups of patients, we chose ten characteristics determined at baseline and putatively associated with coronary heart disease: systolic blood pressure, total serum cholesterol level, casual blood glucose level, number of cigarettes smoked per day, Framingham relative weight, age, serum hemoglobin level, vital lung capacity, serum uric acid level, and ventricular rate. (We used the first recorded data among the first three examinations to explore the long-range predictability of these measures. If data were missing from the first examination, we used data from the second examination; if data were not available on the second examination, we used information from the third examination. If data for any of the ten variables were missing on all three examinations, we excluded that patient from analysis.) The total sample consisted of 2248 men, 640 who developed and 1608 who did not develop coronary heart disease over the 26-year period; we excluded 88 men who had missing information.

The means and standard deviations of each variable in the two groups and the p values (two-tailed) associated with the conventional two-sample t-test (Appendix 1) are given in Table 1. From these ten separate tests of hypotheses (one for each variable), one might conclude that all variables except hemoglobin level differ significantly between the two groups. In other words, initial measurements of systolic blood pressure, cholesterol level, casual blood sugar level, Framingham relative weight, age, vital lung capacity, uric acid level, ventricular rate, and perhaps cigarettes smoked per day (p < 0.06) are

From the Department of Epidemiology and Biostatistics, Boston University School of Public Health; Boston, Massachusetts.
122 Annals of Internal Medicine. 1984;100:122-129.
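The inflation of the overall type I error noted in the introduction can be illustrated with a short calculation. The sketch below is not part of the original analysis: it assumes the ten tests are statistically independent, which the article itself says is not true of correlated clinical variables, so the figures are only indicative. The function names are hypothetical and chosen for illustration.

```python
# Illustrative sketch: overall (familywise) type I error for k tests.
# ASSUMPTION: the k tests are independent. Correlated variables, as in the
# Framingham example, would give a different overall error rate.

def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test significance level under Bonferroni's adjustment."""
    return alpha / k


k = 10  # ten baseline variables, as in the Framingham example
unadjusted = familywise_error_rate(0.05, k)
adjusted = familywise_error_rate(bonferroni_alpha(0.05, k), k)

print(f"per-test alpha 0.05, {k} tests: overall error = {unadjusted:.3f}")
print(f"per-test alpha {bonferroni_alpha(0.05, k):.3f}, {k} tests: overall error = {adjusted:.3f}")
```

Under the independence assumption, ten tests at the 5% level give an overall error rate of roughly 40%, whereas testing each variable at 0.05/10 keeps the overall rate just below 5%.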
                          Systolic    Cholesterol Blood       Cigarettes  Relative    Age         Hemoglobin  Lung        Uric        Pulse
                          Blood                   Sugar                   Weight                              Capacity    Acid
                          Pressure
Systolic blood pressure   1.0
Cholesterol               0.14        1.0
Blood sugar               0.12        0.03        1.0
Cigarettes               -0.07        0.06       -0.05        1.0
Relative weight           0.28        0.13        0.08       -0.07        1.0
Age                       0.24        0.09        0.10       -0.15        0.03        1.0
Hemoglobin                0.07        0.11        0.00        0.08        0.14       -0.11        1.0
Lung capacity            -0.17       -0.08       -0.05       -0.01       -0.07       -0.39        0.05        1.0
Uric acid                 0.14        0.09       -0.03       -0.04        0.29        0.01        0.08       -0.02        1.0
Pulse                     0.25        0.09        0.10        0.14        0.14       -0.02        0.05       -0.13        0.08        1.0

* n = 2248 men
three packages do discriminant analysis, which can also be done with multiple linear regression techniques (6). Logistic regression can also be done with the SAS and BMDP programs.

In the following discussion, we assume that all variables are continuous. After describing these methods, we discuss the issue of nominal variables.

HOTELLING'S T² AND BONFERRONI'S ADJUSTMENT

The multivariate counterpart of the conventional independent samples t-statistic for the comparison of mean values of two groups on a single variable is Hotelling's T² statistic (8, 9, 20). Instead of using one variable, several variables, say k, are used in multivariate analyses. The null hypothesis states that with multivariate consideration of all k variables simultaneously, there are no differences in the mean values between the two groups. The alternative hypothesis is that the two groups differ in their mean values on at least one variable. To compute Hotelling's T² statistic, we need the mean value of each variable for each group and a pooled estimate of the common variability in the two groups (Appendix 1). One rejects the null hypothesis if

$$F = \frac{n_1 + n_2 - k - 1}{(n_1 + n_2 - 2)\,k}\, T^2$$

yields a significantly large value. This test statistic (F) has an F distribution with k (numerator) and $n_1 + n_2 - k - 1$ (denominator) degrees of freedom, where $n_1$ and $n_2$ are the sample sizes of the two groups (8).

When Hotelling's T² is statistically significant, one rejects the overall null hypothesis and proceeds to examine the individual variables in the second stage of multivariate analysis. Again, one is faced with many tests of hypotheses and would like to control the overall alpha level. One method, called Bonferroni's adjustment (5, 8), reduces the significance level (alpha) for each comparison and then uses a conventional two-sample t-test. Specifically, to do k tests, one for each variable, divide the alpha level by k; if the test statistic yields a p-value of less than alpha/k, then one concludes that the two groups differ on a particular variable.

Unlike Hotelling's T² statistic, these multiple t-tests at this second stage of analysis do not incorporate the correlations among variables, even though Bonferroni's adjustment controls the overall alpha level. In fact, these tests use the same statistic that we criticized earlier on this account. Hence, this procedure allows a researcher to describe the differences in mean values between the groups, but it does not consider confounding variables or intercorrelations among variables that may explain these differences.

For the data from the Framingham Heart Study, Hotelling's T² is 188.73. Hence,

$$F = \frac{2237}{(2246)(10)}\,(188.73) = 18.79$$

with 10 and 2237 degrees of freedom. This F value is highly significant with p < 0.0001, suggesting that men who have coronary heart disease differ from men who have no coronary heart disease in their averages on at least one of the ten variables.

Given the overall conclusion that men with coronary heart disease differ from men without coronary heart disease, we next examine each of the ten variables separately. With Bonferroni's adjustment we should conclude that there is a significant difference between the two groups whenever the p-value associated with the conventional t-statistic is less than 0.05/10, or 0.005. Applying this procedure to the results in Table 1, we see that the two groups differ in their averages of systolic blood pressure, cholesterol level, Framingham relative weight, age, and uric acid level. Vital lung capacity and ventricular rate appear to be marginally significant. Thus, if the two groups were actually alike, the overall chance of seeing at least one difference as extreme as these is no more than 0.05.

DISCRIMINANT ANALYSIS AND BONFERRONI'S ADJUSTMENT

Another approach to the problem of comparing two groups on multiple variables is through discriminant analysis (6, 8, 22). First, the variables (say k) for each patient are reduced to a single variable by using a "linear discriminant" function, L, to combine them:

$$L = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
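As a check on the Hotelling's T² result reported above for the Framingham data, the conversion from T² to F and the Bonferroni threshold can be reproduced in a few lines. This is only a sketch: the function name `hotelling_f` is hypothetical, and the inputs (T² = 188.73, group sizes 640 and 1608, k = 10) are taken from the text. It does not compute T² itself, which requires the group means and pooled covariance matrix described in Appendix 1.

```python
# Sketch: convert a reported Hotelling's T^2 into the F statistic used for
# the overall test, then compute the Bonferroni per-variable threshold.
# Inputs are the values quoted in the article; hotelling_f is a hypothetical
# helper name, not a library function.

def hotelling_f(t2, n1, n2, k):
    """Return (F, numerator df, denominator df) for a two-group Hotelling's T^2."""
    df_num = k
    df_den = n1 + n2 - k - 1
    f_value = df_den / ((n1 + n2 - 2) * k) * t2
    return f_value, df_num, df_den


f_value, df_num, df_den = hotelling_f(t2=188.73, n1=640, n2=1608, k=10)
bonferroni_threshold = 0.05 / 10  # overall alpha divided by number of variables

print(f"F = {f_value:.2f} with {df_num} and {df_den} degrees of freedom")
print(f"Bonferroni per-variable threshold = {bonferroni_threshold:.3f}")
```

The computed F is about 18.8 on 10 and 2237 degrees of freedom, which agrees with the reported 18.79 up to rounding, and the per-variable threshold is 0.005.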