
Mutation Research 417 (1998) 19–30

Recommendations for statistical designs of in vivo mutagenicity tests with regard to subsequent statistical analysis
Ilse-Dore Adler a,*, James Bootman b, John Favor a, Graham Hook c, Gerlinde Schriever-Schwemmer a, Gerhard Welzl d, Elbert Whorton e, Isao Yoshimura f, Makoto Hayashi g
a GSF-Institut für Säugetiergenetik, D-85758 Neuherberg, Germany
b Pharmaco-LSR, Eye, Suffolk IP23 7PX, UK
c National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, NC 27709, USA
d GSF-Institut für Medizinische Informatik und Systemforschung, D-85758 Neuherberg, Germany
e Division of Epidemiology and Biostatistics, University of Texas Medical Branch J47, Galveston, TX 7555-1047, USA
f Faculty of Engineering, Science University of Tokyo, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162, Japan
g Division of Genetics and Mutagenesis, National Institute of Health Sciences, 1-18-1, Kamiyoga, Setagaya-ku, Tokyo 158, Japan
Received 12 February 1998; revised 15 June 1998; accepted 25 June 1998

Abstract

A workshop was held on September 13 and 14, 1993, at the GSF, Neuherberg, Germany, to start a discussion of experimental design and statistical analysis issues for three in vivo mutagenicity test systems: the micronucleus test in mouse bone marrow/peripheral blood, the chromosomal aberration tests in mouse bone marrow/differentiating spermatogonia, and the mouse dominant lethal test. The discussion has now come to conclusions which we would like to make generally known. Rather than dwell upon specific statistical tests which could be used for data analysis, serious consideration was given to test design. However, the test design, its power of detecting a given increase of adverse effects and the test statistics are interrelated. Detailed analyses of historical negative control data led to important recommendations for each test system. Concerning the statistical sensitivity parameters, a type I error of 0.05 (one tailed), a type II error of 0.20 and a dose related increase of twice the background (negative control) frequencies were generally adopted. It was recommended that sufficient observations (cells, implants) be planned for each analysis unit (animal) so that at least one adverse outcome (micronucleus, aberrant cell, dead implant) would likely be observed. The treated animal was the smallest unit of analysis allowed. On the basis of these general considerations the sample size was determined for each of the three assays. A minimum of 2000 immature erythrocytes/animal should be scored for micronuclei from each of at least 4 animals in each comparison group in the micronucleus assays. A minimum of 200 cells should be scored for chromosomal aberrations from each of at least 5 animals in each comparison group in the aberration assays. In the dominant lethal test, a minimum of 400 implants (40–50 pregnant females) is required per dose group for each mating period. The analysis unit for the dominant lethal test would be the treated male unless the background frequency of dead implants (DI) is so low that multiple males would need to be integrated to meet the minimum observation of one adverse outcome (DI) per analysis unit. A three-step strategy of data analysis was proposed for the cytogenetic assays. Use of negative historical controls was allowed in certain circumstances for interpretation of results from micronucleus tests and chromosomal aberration tests. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Experimental design; Statistical analysis; Micronucleus test in mouse bone marrow/peripheral blood; Chromosomal aberration tests in mouse bone marrow/differentiating spermatogonia; Mouse dominant lethal test

* Corresponding author. Tel.: +49-89-3187-2302; Fax: +49-89-3187-3099; E-mail: adler@gsf.de

1383-5718/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII: S1383-5718(98)00091-6

1. Introduction

During the Melbourne Workshop on Harmonization of Test Guidelines it was decided to organize a separate meeting in which the statistical considerations of experimental design and the statistical treatment of data could be discussed and harmonized. The test systems considered during the statistics workshop held at the GSF in Neuherberg, Germany, on September 13 and 14, 1993, were in vivo micronucleus tests for bone marrow and peripheral blood, tests for chromosomal aberrations in bone marrow cells and differentiating spermatogonia, and the rodent dominant lethal assay. Even though some years have passed between the statistics workshop and the present publication, we are convinced that these considerations are still of general value [1].
A number of guidelines and reviews on the statistics of the tests under consideration already exist [2–5]. Therefore, the workshop focused on general concepts for the design of experiments rather than specific statistical methods for data analysis of each test system. To approach this, we analyzed data bases which were provided for this workshop and characterized the nature of the data for each test.

2. General comments

The following statements represent a description of the framework within which any statistical design can be applied.
First, statistics can provide guidance for objective decision making; however, final conclusions have to be based on biological judgements. Even though Ashby and Tinwell [6] suggested that there was no need for statistics in their sequential approach of the micronucleus test, we agree with Festing and Lovell's response [7] that 'statistical techniques are an essential tool in the interpretation of results' [8]. We go even one step further in emphasizing that experimental design should be based on statistical considerations [9].
Second, the smallest unit for statistical comparisons is the treated animal. Limited animal resources and animal protection criteria indicate that it is important to design the experiment with the minimal but optimal animal number using statistical considerations.
Third, the design of the experiment has to be based on the α-level (type I error) for recognizing a predetermined increment over the spontaneous level with an acceptable statistical power (1 − type II error).
Fourth, all experiments must have a concurrent negative control and all concurrent negative controls must be added to the negative historical control data base.
Concurrent negative controls are essential for
• controlling the technical conduct of the experiment and
• appropriate statistical treatment of the data.
Historical negative controls when available are the basis for
• designing the experiment and
• accepting a test (by comparing concurrent negative control values to the historical negative control, Fig. 1, step 1).
Lastly, the increment in mutational events over the spontaneous level to be recognized is determined by biological criteria rather than by purely statistically achievable precision.
With the tests under study, the ability to detect a doubling of the observed endpoint at an α-level of 0.05 and a power of 80% was considered to be an easily achievable minimum standard. The sample size should be large enough to assure that at least one adverse event (micronucleus, aberrant cell, dead implant) will be recorded for each analysis unit. Every data report should contain a clear description of the design of the experiment, including the power of the test, the statistical methods used to analyse the data and the level of significance actually achieved.

3. Micronucleus test

The analysed variables are, firstly, the proportion of immature erythrocytes with micronuclei (MN) and, secondly, for bone marrow toxicity, the proportion of immature erythrocytes among total erythrocytes.

3.1. Historical control data analyzed

The historical control data of the micronucleus test have been provided by 12 laboratories (7 from Japan, 2 from UK, 2 from USA, and 1 from Germany) and are comprised of 1299 experiments with a total of 6883 mice (Table 1). In most laboratories, the Poisson index of dispersion was approximately 1.0 as expected, with two exceptions, laboratories A and K (Table 1). The data of laboratory K could deviate from the theoretical distribution because of the rather small number of groups. The reason, however, for the small Poisson index for the data of laboratory A is not clear. Although there were these two exceptions, it can be concluded that the frequencies of spontaneous micronucleated immature erythrocytes in negative (no treatment or vehicle) controls follow a Poisson distribution (a binomial distribution, as well); thus the differences among animals are minimal.
The results of the F-test on the micronucleus data excluding the two groups with exceptional factors are summarized in Table 2. Data for the F-test were not transformed because some data were only provided as percent micronucleated cells. A preliminary trial with selected data showed no difference between the results of F-tests with square-root transformed and non-transformed data. When F-ratios are relatively small then these factors do not seriously affect the background level. However, there are several factors affecting the frequencies of micronucleated immature erythrocytes, i.e., years when data were obtained (possibly by different investigators), strains of mice, and vehicle/solvents.

Fig. 1. Strategy of statistical evaluation of test data for micronucleated PCE and aberrant bone-marrow cells. The strategy applies for data which contain several doses and a negative control. Historical negative controls serve two purposes: first, to accept or reject a set of data, and second, to evaluate individual data points by pair-wise comparison.
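The three-step strategy of Fig. 1 (acceptance against the historical control, trend test, pair-wise comparison with the historical control) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: the function names, the pooling of cells across the animals of a group, the use of the normal approximation for the Cochran–Armitage test, and the rule that a non-significant result at both steps is called "negative" are all assumptions made here. The default nominal levels (20% for the trend test, 5% per pair-wise comparison) are the ones quoted for the simulation in the Fig. 3 caption; the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def accept_test(control_rate, hist_rate, hist_sd):
    """Step 1: the concurrent control must lie within historical mean +/- 3 SD."""
    return abs(control_rate - hist_rate) <= 3.0 * hist_sd


def trend_p_value(events, totals, scores):
    """Step 2: one-sided Cochran-Armitage trend test, normal approximation."""
    n_all = sum(totals)
    p_bar = sum(events) / n_all
    t = sum(d * (x - n * p_bar) for d, x, n in zip(scores, events, totals))
    s1 = sum(n * d for d, n in zip(scores, totals))
    s2 = sum(n * d * d for d, n in zip(scores, totals))
    var = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n_all)
    if var <= 0.0:
        return 1.0  # no events at all, or identical scores: no evidence of trend
    return 1.0 - NormalDist().cdf(t / sqrt(var))


def binomial_upper_tail(x, n, pi):
    """P(X >= x) for X ~ Binomial(n, pi), built from the lower-tail terms."""
    term = (1.0 - pi) ** n  # P(X = 0)
    lower = 0.0
    for k in range(x):
        lower += term
        term *= (n - k) / (k + 1) * pi / (1.0 - pi)
    return max(0.0, 1.0 - lower)


def three_step(events, totals, scores, hist_rate, hist_sd,
               trend_alpha=0.20, pair_alpha=0.05):
    """Index 0 is the concurrent negative control; the rest are dose groups."""
    if not accept_test(events[0] / totals[0], hist_rate, hist_sd):
        return {"accepted": False, "call": "repeat experiment"}
    trend_p = trend_p_value(events, totals, scores)
    # Step 3: each dose group against the historical control rate.
    pair_p = [binomial_upper_tail(x, n, hist_rate)
              for x, n in zip(events[1:], totals[1:])]
    trend_sig = trend_p < trend_alpha
    pair_sig = any(p < pair_alpha for p in pair_p)
    if trend_sig and pair_sig:
        call = "positive"
    elif trend_sig or pair_sig:
        call = "equivocal, needs biological judgement"
    else:
        call = "negative"  # assumption of this sketch
    return {"accepted": True, "trend_p": trend_p,
            "pairwise_p": pair_p, "call": call}


# Hypothetical micronucleus data: 5 animals x 2000 cells in each group.
result = three_step(events=[15, 18, 24, 35], totals=[10000] * 4,
                    scores=[0, 1, 2, 4], hist_rate=0.0015, hist_sd=0.0009)
print(result["call"], round(result["trend_p"], 4))
```

Pooling the cells of a group treats the cell as the counting unit, which leans on the finding above that negative control counts behave like Poisson (or binomial) data; an analysis that keeps the animal as the unit would need per-animal counts instead.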

Table 1
Characteristics of the historical negative control data provided from micronucleus assays

Laboratory  Total no.   Total no.    No. of animals per group   Frequencies of MNPCE (%)          Mean T′
            of animals  of groups a  Mean    Max    Min         Mean    S.D.    Max     Min
A             555        119         4.7      8      2          0.187   0.071   0.500   0.067     0.585
B             357         62         5.8      7      2          0.127   0.076   0.317   0.000     1.083
C             483         84         5.8      6      3          0.214   0.076   0.440   0.083     0.945
D             122         20         6.1     15      2          0.216   0.054   0.233   0.050     0.980
E             470         84         5.6      6      4          0.122   0.049   0.240   0.025     1.090
F            1026        193         6.3     12      1          0.278   0.087   0.600   0.000     0.903
G             395         69         5.7      6      4          0.208   0.067   0.400   0.075     1.021
H             601        121         5.0      5      3          0.185   0.074   0.380   0.020     b
I             599        120         5.0      5      4          0.090   0.053   0.244   0.000     1.041
J            1182        234         5.1      7      4          0.073   0.039   0.218   0.010     0.918
K             222         34         6.5      9      5          0.133   0.033   0.200   0.071     0.743
L             871        159         5.5     10      3          0.175   0.067   0.350   0.030     1.093
Total        6883       1299

a Males and females in an experiment were considered as different groups.
b T′ could not be calculated because data were provided in percent with a significant figure of 2.
T′: (Poisson index of dispersion)/(group size minus 1).
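The quantity T′ in Table 1 can be illustrated with a short calculation. The sketch below assumes the usual definition of the Poisson index of dispersion for one group, D = Σ(x − mean)² / mean, and divides it by the group size minus 1 as in the table footnote; the text does not spell the formula out, so this reading is an assumption, and the counts are hypothetical. A value near 1 is what Poisson-distributed counts would give.

```python
from statistics import mean


def poisson_dispersion_index(counts):
    """Return (D, T) for the micronucleus counts of the animals in one group.

    D = sum((x - mean)^2) / mean is the Poisson index of dispersion;
    T = D / (n - 1) should be close to 1 for Poisson-like counts.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two animals in the group")
    m = mean(counts)
    if m == 0:
        raise ValueError("index is undefined when every count is zero")
    d = sum((x - m) ** 2 for x in counts) / m
    return d, d / (n - 1)


# Hypothetical counts of micronucleated cells per animal in a group of five.
d, t = poisson_dispersion_index([1, 3, 2, 0, 4])
print(f"D = {d:.2f}, T' = {t:.2f}")
```

The "Mean T′" column of Table 1 would then correspond to averaging such per-group values over all groups of a laboratory.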

On that basis, historical negative controls should be compiled separately for strains and vehicles within each laboratory and should comprise at least 10 separate experiments. It was agreed that as long as the criteria necessary for quality of the historical control data are met, these data may be combined over vehicles and time. To assure the quality of the historical control data there must exist no systematic

Table 2
Summary of F-tests of the historical negative control data provided from micronucleus assays

Laboratory  Year  Strain     Vehicle  Route  No. of      Sample  Cell  Sex  No. of cells
                                             treatments  time    type       analysed per animal
A           ?     +          ±        −      +           +       −     ±    1000
B           ?     BDF1       +        +      −           −       B     ±    1000
C           ?     BDF1       −        −      −           +       B     −    1000
D           ±     CD-1       −        −      −           24      B     M    1000
E           ?     +          −        −      −           +       B     M    −
F           ?     +          nt       nt     $           $       P     +    1000
G           ±     ddY        +        ±      −           −       B     M    1000
H           +     Swiss      ±        po     ±           −       +     −    Δ
I           ?     ICR        +        po     +           +       B     −    1000
J           ?     CD-1       ±        ?      +           −       B     +    +
K           ?     102 × C3H  −        ip     1           24      B     −    2000
L           +     B6C3F1     ±        ip     3           ±       B     M    2000

+: Significant; ±: Ambiguous; −: Not significant; ?: No data available; Δ: Probably identical numbers of cells were evaluated; nt: No treatment, therefore 'Number of treatments' and 'Sample time' do not apply ($).
When name of strain, treatment (po or ip), number of treatments (1 or 3), sample time (24), cell type (B = bone marrow or P = peripheral blood), sex (M = males), and number of cells analysed per animal (1000) appear in the table, these factors were used exclusively in the respective laboratory.

differences between the various control groups, current and historical [10]. Factors in achieving the above include:
1. The method of scoring the response must be unchanged during the relevant period.
2. The experimental units must be comparable throughout the period.
3. The experimental protocol must have remained fixed throughout the period covered by the historical data and the current experiment.

3.2. Size of the experiment

On the basis of the historical control data (around 1–2 MN/1000 immature erythrocytes) a per animal sample size of 1000 immature erythrocytes is insufficient to detect at least one micronucleus per animal. To avoid too many zero counts a minimum of 2000 immature erythrocytes per animal should be scored. The minimum number of animals is 4 per group if the historical control is 1.5 MN/1000 (Table 3 and Appendix A). The protocol described above is sufficient to detect at least 80% of chemicals which induce a 2-fold increase in micronucleated immature erythrocytes over the historical control level at the significance level of 0.05 (Table 3, Fig. 3). If a laboratory does not follow this sample size recommendation it should justify its own experimental design and report the power of detection and the assayed increment over the negative control.

Table 3
Power (%) for the micronucleus assay based on the mean frequencies of micronucleated cells in the negative controls under the given sample size to detect a doubling of the spontaneous frequency with a significance of 5%

Percent of MN cells    Number of animals per group (2000 cells per animal)
in negative            2       3       4       5       6
historical controls
0.10                   42.9    65.2    73.4    82.8    86.4
0.15                   63.6    78.4    86.3    91.8    96.2
0.20                   73.1    86.7    93.2    96.6    98.6
0.25                   82.1    91.7    96.6    98.7    99.4
0.30                   87.1    95.8    98.3    99.5    99.8

3.3. Statistical methods to be proposed

An experiment typically comprises a negative (solvent) control group, three dose groups and a positive control group [11]. Three consecutive steps of statistical evaluation are proposed [12,13]. The steps are illustrated in Fig. 1.
(1) Acceptance of the test: A test is acceptable only if the mean of the concurrent negative control is within the mean ± 3SD of the historical control, where SD relates to the interanimal variability. Non-acceptance requires a new experiment (test).
(2) Dose–response analysis: When accepted, the data are evaluated by an appropriate trend analysis using the concurrent negative control data [22]. A significant positive trend test signifies an effect of the treatment.
(3) Evaluation of individual group responses: Each treated group mean is then evaluated by pair-wise comparison to the historical negative control. If none of the individual treated groups shows a significant difference but significance was shown in the trend test, the interpretation requires biological judgement. The same applies to data which show no trend but a significant difference between an individual treatment group and the historical negative control, provided the pair-wise evaluation was corrected for multiplicity of comparisons [14]. The test may then be considered equivocal.
The importance of the historical control mean and variance makes it necessary that both should be reported. We agree with Margolin and Risko [5] that historical control can be easily misused without proper safeguard (see Section 3.1).
This strategy is an illustration of an approach rather than a recommendation and laboratories should feel free to choose their own approaches as long as they justify them adequately.
The procedure described here is applicable only to the data of the frequencies of micronucleated immature erythrocytes and not to the ratio of the immature erythrocytes to total erythrocytes. The latter can be approximated by a Gaussian distribution and standard methods of statistical analysis are acceptable [4].

4. Chromosomal aberration tests

The following considerations pertain to the analysis of structural chromosome aberrations in bone marrow cells as well as in differentiating spermatogonia. To make the effort of scoring all types of aberrations meaningful the test should be used to its full potential, i.e., the variables analysed should include both the
• percentage of damaged cells, and
• the types of aberrations observed, i.e., gaps, breaks and exchanges.
The more laborious and skill-demanding cytogenetic test is best used for validation/characterization purposes.

4.1. Historical control data analyzed

The historical control data of the chromosomal aberration test have been provided by 2 laboratories from the USA and are comprised of 135 experiments

Fig. 2. Comparison of negative control data from the bone marrow chromosomal aberration tests in two laboratories over time (means ± SEM).

with 1073 mice. Vehicles, means and variances are summarized in Table 4. The analysis of the historical control data indicated that laboratory, vehicle, and year contribute to total variability (Fig. 2 and Table 4). Thus, historical negative controls have to be compiled separately for strains and vehicles within each laboratory and should comprise at least 10 separate experiments. The same criteria for historical data quality apply as in the micronucleus test.

Table 4
The historical negative control data for the bone marrow chromosomal aberration test (percent of aberrant bone marrow cells)

Laboratory  Vehicle          No. of animals  Mean  Variance
1           corn oil         366             0.69  0.76
            buffered saline  120             0.77  0.74
2           corn oil         428             1.19  1.60
            buffered saline  159             0.89  1.37

Fig. 3. The type I error and the power of the proposed three-step procedure for the micronucleus assay. For the simulation, dose levels were set as $(d_0, d_1, d_2, d_3) = (0, 1, 2, 4)$ and the assumed population proportions of micronucleated cells of historical controls as $\pi = 0.1\%$ (○), $\pi = 0.15\%$ (●), $\pi = 0.2\%$ (□), and $\pi = 0.25\%$ (■). The dose responses were assumed as $(\pi_0, \pi_1, \pi_2, \pi_3) = (\pi, \pi, \pi, \pi)$ for the type I error and as $(1 \times \pi, 1.25 \times \pi, 1.5 \times \pi, 2.0 \times \pi)$ for the power. A total of 5000 quasi experiments were performed in the present simulation study. To keep the family-wise type I error at 5%, the nominal significance levels for step 2 (Cochran–Armitage trend test) and step 3 (comparison with a historical control) were set as 20% and 5% (15/3), respectively.

4.2. Size of the experiment

As in the micronucleus test, an experiment typically comprises a negative (solvent) control group, three dose groups and a positive control group.
On the basis of the historical control data (averaging 1.0% damaged cells), a per animal sample size of 100 cells is insufficient to detect at least one aberrant cell per animal. To avoid too many zero counts, a minimum of 200 cells per animal should be scored. The minimum number of animals is 5 per comparison group. This sample size is sufficient to detect at least 80% of chemicals which induce a 2-fold increase in aberrant cells over the historical control level of 1.0% and above at the significance level of 0.05 (Table 5, Fig. 4). If a laboratory does not follow this recommendation it should justify its own experimental design and report the power of detection and the assayed increment over the negative control.
Since the background frequency for exchanges is below 1/1000 cells, statistical evaluation of exchange frequencies in treated groups has to rely on historical negative controls. However, unless the exchange rate in the treatment group is very high, the assay will not detect an effect at its presently recommended size. It would require 25 000 cells per group to detect a doubling over a control rate of 1 exchange in 1000 cells. Using 200 cells from each of 5 animals per group would only have a power of 15–20% to detect a doubling over the background rate.

4.3. Statistical methods for data evaluation

The same three consecutive steps of statistical evaluation as for the micronucleus data are recommended. However, if a different experimental design is used and justified, i.e., employing only one or two dose groups, the statistical evaluation may necessarily be restricted to pair-wise comparisons between treated and concurrent negative controls [4].

5. Dominant lethal test

The performance of dominant lethal experiments can serve either to identify germ cell mutagens or to

Table 5
Power (%) for the chromosomal aberration assay based on the mean frequencies of aberrant cells in the negative controls under the given sample size to detect a doubling of the spontaneous frequency with a significance of 5%

Percent of aberrant    Number of cells per group
cells in negative      400     600     800     1000    1200    1400    1600    1800
historical controls
0.5                    23.8    38.9    43.9    56.0    66.0    65.8    73.8    78.7
1.0                    44.7    66.7    73.0    83.5    86.6    91.3    93.6    96.2
1.5                    65.8    78.9    87.4    92.4    95.6    97.2    98.7    99.1
2.0                    75.3    94.2    95.0    96.5    98.6    99.3    99.4    99.9
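The per-animal sample sizes in Sections 3.2 and 4.2 are motivated by the wish to avoid zero counts. As a rough illustration, the chance that an animal shows at least one event can be computed under a binomial model, which is an assumption of this sketch (the text reports that negative control counts are compatible with Poisson and binomial distributions). The background rates used are the ones quoted in the text, 1.5 MN/1000 cells and 1.0% aberrant cells; the function name is hypothetical.

```python
def prob_at_least_one(rate, n_scored):
    """P(at least one event among n_scored cells) for a per-cell event rate."""
    if not 0.0 <= rate <= 1.0 or n_scored < 0:
        raise ValueError("rate must be in [0, 1] and n_scored non-negative")
    return 1.0 - (1.0 - rate) ** n_scored


# Background rates quoted in the text, each with the smaller and the
# recommended number of cells scored per animal.
designs = [
    ("micronucleus, 1.5/1000", 0.0015, (1000, 2000)),
    ("aberrant cells, 1.0%", 0.01, (100, 200)),
]
for label, rate, sizes in designs:
    for n in sizes:
        p = prob_at_least_one(rate, n)
        print(f"{label}: {n} cells -> P(at least one) = {p:.2f}")
```

Doubling the number of cells scored squares the probability of a zero count, which is why the step from 1000 to 2000 cells (or from 100 to 200) removes most of the zeros at these background rates.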

characterize mutagenic effects in germ cells. Depending on the purpose, continuous treatment of all developmental stages from meiosis to mature gametes followed by a short mating period or single acute treatment followed by sequential matings are commonly used. The statistical treatment of dominant lethal data has been described by various authors [2,15–20]. The present recommendations of the statistical test design are directed towards the classical dominant lethal test with treated males. They are applicable to experiments for identification of and/or for characterization of germ cell mutagens.

Fig. 4. The type I error and the power of the proposed three-step procedure for the bone marrow chromosomal aberration test. For the simulation, dose levels were set as $(d_0, d_1, d_2, d_3) = (0, 1, 2, 4)$ and the assumed population proportions of aberrant cells of historical controls as $\pi = 0.5\%$ (○), $\pi = 1.0\%$ (●), $\pi = 1.5\%$ (□), and $\pi = 2.0\%$ (■). The dose–responses were assumed as $(\pi_0, \pi_1, \pi_2, \pi_3) = (\pi, \pi, \pi, \pi)$ for the type I error and $(1 \times \pi, 1.25 \times \pi, 1.5 \times \pi, 2.0 \times \pi)$ for the power. A total of 5000 quasi experiments were performed in the present simulation study. To keep the family-wise type I error at 5%, the nominal significance levels for step 2 (Cochran–Armitage trend test) and step 3 (comparison with a historical control) were set as 20% and 5% (15/3), respectively.

5.1. Definition of variables

Many authors have defined the possible variables to be included in the evaluation of dominant lethal data [15,21–24]. The main parameters of dominant lethality are
• Postimplantation loss: the proportion of dead implants (DI).
• Preimplantation loss: average number of total implants per pregnant female in the control minus average number of total implants per pregnant female in the treated group. If corpora lutea are counted, then preimplantation loss is calculated as corpora lutea minus total implants.
Other variables to be evaluated are:
• pregnancy rate,
• live implants per pregnant female,
• dead implants per pregnant female,
• contribution of early death and late death to total death.
The analysis of all variables should be male oriented. Although the smallest unit for analysis may be the individual male, an analysis unit may involve multiple males, that is, it may be necessary to combine the results for two or more males to achieve a sufficient number of implants per analysis unit to analyse the DI. The results for each variable can be analyzed using analysis of variance methods after transformation to achieve reasonable normality and similar intra-group variances [15,25]. It is particularly useful to present the data not only in tabular form containing all these parameters but also to illustrate the data graphically [26,27].
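The male-oriented variables of Section 5.1 can be computed per analysis unit as in the following sketch. The data layout (a dict per male with the keys "mated" and "litters"), the function names and the example numbers are hypothetical. The arcsine square-root transformation is shown only as one common variance-stabilising choice for proportions; the text asks for a transformation but does not name one.

```python
from math import asin, sqrt


def unit_summary(males):
    """Summarise one analysis unit made of one or more treated males.

    Each male is a dict with (hypothetical) keys:
      "mated"   - number of females mated to that male
      "litters" - list of (live, dead) implant counts, one per pregnant female
    """
    mated = sum(m["mated"] for m in males)
    litters = [pair for m in males for pair in m["litters"]]
    pregnant = len(litters)
    live = sum(l for l, _ in litters)
    dead = sum(d for _, d in litters)
    total = live + dead
    if pregnant == 0 or total == 0:
        raise ValueError("analysis unit has no implants to evaluate")
    di = dead / total  # postimplantation loss
    return {
        "pregnancy_rate": pregnant / mated,
        "total_per_female": total / pregnant,
        "live_per_female": live / pregnant,
        "dead_per_female": dead / pregnant,
        "di_proportion": di,
        "di_arcsine": asin(sqrt(di)),  # assumed transformation for ANOVA
    }


def preimplantation_loss(control_units, treated_units):
    """Control minus treated mean number of total implants per pregnant female."""
    def mean_total(units):
        return sum(u["total_per_female"] for u in units) / len(units)
    return mean_total(control_units) - mean_total(treated_units)


# Two treated males (mating ratio 1:2) pooled into a single analysis unit.
treated = unit_summary([
    {"mated": 2, "litters": [(10, 1), (9, 0)]},
    {"mated": 2, "litters": [(11, 1)]},
])
control = unit_summary([{"mated": 2, "litters": [(11, 1), (12, 0)]}])
print(treated["di_proportion"], preimplantation_loss([control], [treated]))
```

Computing every variable from the pooled counts of the unit, rather than from individual females, keeps the analysis male oriented as the text requires.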

5.2. Size of the experiment

It is known that DI-frequencies in controls are different for different laboratories and different strains of mice. Binomial distribution was assumed with different control rates of DI-frequencies of 5%, 10% and 15% in order to determine the number of total implants needed in each comparison group to achieve desired power with a one sided type I error of 0.05 (Tables 6 and 7). Extra-binomial variability, as cautioned to exist by Generoso and Piegorsch [28], would call for increases in the implant requirements (sample sizes) in each comparison group or would result in slight decreases in the assay's power. The actual variance, whatever its size may be, will be aptly considered when appropriately employing the analysis of variance. The test assesses a no effect hypothesis against a one-sided increase above controls.
In Table 6 it is seen that using at least 333 implants per group would allow detection of a 10% DI-frequency in the treated group above a 5% control frequency with 80% power; fewer implants would be needed to detect a doubling over a 10% and 15% control frequency. However, in excess of 500 implants are needed to detect an absolute increase of 5% DI over a 10% and 15% background.
Table 7 shows that using 400 implants per group achieves 86% power to detect an absolute increase of 5% DI over a 5% control frequency and somewhat less power to detect an absolute increase of 5% DI over control frequencies of 10% and 15%. The power is excellent for detecting doublings of control rates in all instances when controls are greater than 5% DI. Even when a control rate is as low as 3% DI the power to detect a doubling is 70%, and to detect an absolute increase of 5% DI the power is 95%. Thus, it is recommended that at least 400 total implants (requiring about 40 pregnant females) be attained for each comparison group and that each analysis unit contains sufficient numbers of females and implants so that at least one DI is expected [20].
Table 8 gives the approximate number of implants (and females) needed in each analysis unit to achieve at least one expected DI according to the size of the background rate and assuming 10 total implants per pregnant female. It is seen that with mating ratios of 1 male to 3 females the male is acceptable as an analysis unit when a control rate is 3% DI or more. Using a mating ratio of 1:2, the male is acceptable as the analysis unit when the control rate is 5% DI or more. Results from two males may be randomly combined to form an analysis unit (clustering) which meets the recommendation. Males should be randomly integrated into analysis units and the identities preserved before the experiment is initiated to preserve data analysis uniqueness [20]. Using a 1:1

Table 6
Approximate number of implants (pregnant females) needed in each group to detect actual increases in the proportion of dead implants (DI) when α = 0.05 and power is 80%

DI-frequencies      DI-frequencies in exposure groups (%)
in control groups   10%        15%        20%        25%
5%                  333 (33)   105 (11)   55 (6)     35 (4)
10%                 na         535 (54)   155 (16)   76 (8)
15%                 na         na         710 (71)   195 (20)

na = not applicable.

Table 7
Approximate power achievable to detect actual increases above negative control frequencies of dead implants (DI) when 400 implants are available per group

DI-frequencies      DI-frequencies in exposure groups (%)
in control groups   10%     15%     20%     25%
5%                  86.0    99.9    99.9    99.9
10%                 na      70.0    99.0    99.9
15%                 na      na      60.0    97.0

na = not applicable.

Table 8
Approximate number of implants and number of pregnant females in each analysis unit needed to yield at least one dead implant (DI) in negative controls with a range of DI-frequencies

               DI-frequencies in negative controls
               3%    5%    7%    10%   12%   15%
Implants       30    20    15    10    8     7
Females        3     2     2     1     1     1
Males (1:3) a  1     1     1     1     1     1
Males (1:2) a  2     1     1     1     1     1
Males (1:1) a  3     2     2     1     1     1

a Mating ratio.
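Sample sizes and powers of the kind listed in Tables 6 and 7 can be approximated with a short calculation. The text does not state which formula produced the tables; the sketch below assumes the arcsine square-root normal approximation for comparing two binomial proportions with a one-sided test, which gives figures close to several of the tabulated entries (for a 5% control and 10% exposure frequency, roughly 334 implants for 80% power, and about 86% power with 400 implants). Function names are hypothetical, and extra-binomial variability is ignored.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def _effect(p_control, p_treated):
    """Difference of the two proportions on the arcsine square-root scale."""
    return asin(sqrt(p_treated)) - asin(sqrt(p_control))


def implants_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Implants needed in each group for a one-sided test at level alpha."""
    h = _effect(p_control, p_treated)
    if h <= 0:
        raise ValueError("p_treated must exceed p_control")
    z = _Z.inv_cdf(1.0 - alpha) + _Z.inv_cdf(power)
    # asin(sqrt(p_hat)) has variance of about 1/(4n); a difference has 1/(2n).
    return ceil(z * z / (2.0 * h * h))


def power_for_implants(p_control, p_treated, n, alpha=0.05):
    """Approximate power when n implants are available in each group."""
    h = _effect(p_control, p_treated)
    return _Z.cdf(h * sqrt(2.0 * n) - _Z.inv_cdf(1.0 - alpha))


for p0, p1 in [(0.05, 0.10), (0.10, 0.15), (0.15, 0.20)]:
    n = implants_per_group(p0, p1)
    pw = power_for_implants(p0, p1, 400)
    print(f"{p0:.0%} -> {p1:.0%}: n = {n}, power at 400 implants = {pw:.0%}")
```

Dividing the implant number by an assumed 10 implants per pregnant female gives the approximate number of females, as in the bracketed figures of Table 6.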

mating ratio will require male integrations when control rates are lower than 7% DI and the individual male is sufficient when control rates exceed 7% DI.

5.3. Statistical methods to be proposed

Each outcome variable which is analysed should first be computed from the data for each analysis unit. All statistical analyses should be based on the analysis-unit results and any appropriate statistical method may be used, so long as test-procedure requirements are reasonably met. However, analysis of variance on transformed results (to achieve reasonable normality and equal variance) or non-parametric tests are to be preferred. Although data may be transformed for analysis to determine the levels of significance, a summary of the original results should be presented in tabular form. It should contain original counts of pregnant females, live and dead implants as well as the unit of the original variables according to dose and mating period. This should be accomplished by back-transforming the relevant means and standard errors. A description of all the statistical procedures which are used is required together with a reasonable assurance that they meet the conditions necessary for their use. Doses and mating periods should be evaluated statistically for trend and/or differences and the observed levels of significance (p-values) should always be stated for each statistical test conducted.

6. Conclusion

Our work group has prepared recommendations for test designs and statistical analyses by reviewing negative control data collected from different laboratories. For all assay systems considered, we found it possible to recommend practical sizes of experiments which give a high (80%) probability of detecting a doubling of the spontaneous rate. This was considered relevant for hazard identification. It must be understood by all scientists performing these assays that statistical analysis is an important aid to data interpretation. In particular, the power of the selected test design and the α probability threshold selected will determine the utility of a negative result for hazard identification.

Acknowledgements

We wish to thank all those who generously provided raw data of in vivo cytogenetic tests to us: Biosafety Research Center, Foods, Drugs and Pesticides, Japan; Food and Drug Safety Center, Hatano Research Institute, Japan; GSF Institut für Säugetiergenetik, Germany; Hazleton, WA, USA; Hazleton Microtest, UK; Itoham Food, Central Research Institute, Japan; Japan Tobacco, Toxicology Research Laboratories, Japan; Japan Bioassay Laboratory, Japan Industrial Safety and Health Association, Japan; Life Science Research, Industrial Toxicology, UK; Merck Research Laboratories, Genetic and Cellular Toxicology, USA; Mitsubishi-Kasei Institute of Toxicological and Environmental Sciences, Japan; National Institute of Health Sciences, Japan; National Institute of Environmental Health Sciences, NIEHS/NTP, USA; SRI International, Toxicology Laboratory, USA. We also wish to thank C. Hamada, University of Tokyo, for performing the analysis of the historical micronucleus data.

Appendix A

The historical control data provided for this task showed that the mean frequencies of micronucleated immature erythrocytes ranged from 0.073 to 0.278% (Table 1 in the text). Also published negative control data ranged from 0.0 to 0.3 [29]. We performed a small simulation study to assess the power for the micronucleus assay and the chromosomal aberration test according to the statistical strategy agreed to by the working group. The assumptions of the simulation study were as follows: (1) The experiment consists of three treatment groups and a concurrent negative control. The dose levels were set as $(d_0, d_1, d_2, d_3) = (0, 1, 2, 4)$. (2) The assumed population proportions of micronucleated cells of historical controls were set as $\pi$ = 0.1%, 0.15%, 0.2%, 0.25%, and 0.3% for the micronucleus assay and $\pi$ = 0.5%, 1.0%, 1.5%, and 2.0% for the chromosomal aberration test. (3) The dose–responses were assumed as $(\pi_0, \pi_1, \pi_2, \pi_3) = (\pi, \pi, \pi, \pi)$ for the type I error and $(1 \times \pi, 1.25 \times \pi, 1.5 \times \pi, 2.0 \times \pi)$ for the power for both tests. (4) The statistical procedure was described in the text and Fig. 1. In the first step,
when the value of the negative control in the simulation was out of the 3*SD range, new series of random numbers were generated. In the second step, the Cochran–Armitage trend test using log(dose) was applied and in the third step [10,30,31], each value of the treatment group was evaluated by comparing it with the historical control approximated by the binomial distribution B(2000, p) and B(200, p) for the micronucleus assay and the chromosomal aberration test, respectively [12]. (5) To keep the family-wise type I error at 5%, after trial and error trials the nominal significance levels for step 2 and step 3 were set as 20% and 15%, respectively. Actually, in the third step, the significance level for each group was 5% (15/3) because multiplicity of comparisons was considered. (6) A total of 5000 quasi experiments were performed and the probability of a type I error or the power of the procedure were estimated with the coefficient of variation of about 0.06.

The results of the simulation study are shown in Figs. 3 and 4, for the micronucleus assay and the chromosomal aberration test, respectively. The simulation results show that with the use of four or more animals per group and the analysis of 2000 cells per animal the power to detect a doubling of the historical control level is more than 80% when the mean percent of the micronucleated cells in the historical control is 0.15% or higher. Even for the historical control value as low as 0.1%, analysing 2000 cells from each of five animals gives more than 80% power. In the chromosomal aberration test, with 5 animals per dose and 200 cells scored per animal the power to detect a doubling of the historical control level is more than 80% when the mean percent of cells with chromosomal aberrations in the historical control is greater or equal to 1.0%.

Here, we did not consider the downturn phenomenon in the present simulation. Sometimes, in the micronucleus assay downturns have been observed. Several procedures have been reported to overcome this phenomenon [32–37]. The procedure proposed by Simpson and Margolin [36,37] is recommended for the micronucleus assay. Their original method was nonparametric testing but the strategy can be applied to parametric procedures as well. The simple trend test for monotone dose–response data as the second step of the proposed method can be replaced by such a procedure for a downturn in the dose–response data.

References

[1] D. Hauschke, M. Hayashi, K.K. Lin, D.P. Lovell, W.D. Robinson, I. Yoshimura, Recommendations for biostatistics in mutagenicity studies, Drug Information J. 31 (1997) 323–326.
[2] D.O. Chanter, A. Bateman, A.K. Palmer, M. Richold, D. Anderson, D.P. Lovell, M.T. Stevens, Statistical methods for the dominant lethal assay, in: D. Kirkland (Ed.), Statistical Evaluation of Mutagenicity Test Data, Cambridge University Press, Cambridge, 1989, pp. 233–250.
[3] J.T. MacGregor, J.A. Heddle, M. Hite, B.H. Margolin, C. Ramel, M.F. Salamone, R.R. Tice, D. Wild, Guidelines for the conduct of micronucleus assays in mammalian bone marrow erythrocytes, Mutation Res. 189 (1987) 103–112.
[4] D.P. Lovell, D. Anderson, R. Albanese, G.E. Amphlett, G. Clare, R. Ferguson, M. Richold, D.G. Papworth, J.R.K. Savage, Statistical analysis of in vivo cytogenetic assays, in: D.J. Kirkland (Ed.), Statistical Evaluation of Mutagenicity Test Data, Cambridge University Press, Cambridge, 1989, pp. 184–232.
[5] B.H. Margolin, K.J. Risko, The statistical analysis of in vivo genotoxicity data: case studies of the rat hepatocyte UDS and mouse bone marrow micronucleus assays, in: J. Ashby, F.J. de Serres, M.D. Shelby, B.H. Margolin, M. Ishidate, Jr., G.C. Becking (Eds.), Evaluation of Short-term Tests for Carcinogens: Report of the International Programme on Chemical Safety's Collaborative Study on In Vivo Assays, Cambridge University Press, Cambridge, 1988, 1.29–1.42.
[6] J. Ashby, H. Tinwell, A sequential approach to testing with the rodent bone marrow micronucleus assay - obviation of the need for statistical analysis of the data, Mutation Res. 327 (1995) 49–55.
[7] M.F.W. Festing, D.P. Lovell, The need for statistical analysis of rodent micronucleus test data. Comment on the paper by Ashby and Tinwell, Mutation Res. 329 (1995) 221–224.
[8] M. Hayashi, T. Sofuni, A reaction to 'A sequential approach to testing with the rodent bone marrow micronucleus assay', Mutation Res. 331 (1995) 173–174.
[9] M.F.W. Festing, The scope for improving the design of laboratory animal experiments, Lab. Animal 26 (1992) 256–267.
[10] B.H. Margolin, K.J. Risko, The use of historical data in laboratory studies, in: Proceedings of the International Biometrics Conference, Vol. 12, 1984, pp. 21–30.
[11] M. Hayashi, T. Sofuni, The need for three dose levels to detect genotoxic chemicals in in vivo rodent assays, Mutation Res. 327 (1995) 247–251.
[12] M. Hayashi, I. Yoshimura, T. Sofuni, M. Ishidate Jr., A procedure for data analysis of the rodent micronucleus test involving a historical control, Environ. Mol. Mutagen. 13 (1989) 347–356.
[13] M. Hayashi, S. Hashimoto, Y. Sakamoto, C. Hamada, T. Sofuni, I. Yoshimura, Statistical analysis of data in mutagenicity assays: rodent micronucleus assay, Environ. Health Perspect. 102 (1994) 49–52, Suppl. 1.
[14] R.J. Simes, An improved Bonferroni procedure for multiple tests of significance, Biometrics 73 (1986) 751–754.
[15] S. Green, K.S. Lavappa, M. Manadhar, C. Sheu, E. Whorton, J.A. Springer, A guide for mutagenicity testing using the dominant lethal assay, Mutation Res. 189 (1987) 167–174.
[16] A.-M.C. Lockhart, W.W. Piegorsch, J.B. Bishop, Assessing overdispersion and dose–response in the male dominant lethal assay, Mutation Res. 272 (1992) 35–58.
[17] J.K. Piegorsch, W.W. Haseman, Statistical methods for analysing developmental toxicity data, Teratogen., Carcinogen., Mutagen. 11 (1991) 115–133.
[18] D.M. Smith, D.A. James, A comparison of alternative distributions of postimplantation death in the dominant lethal assay, Mutation Res. 128 (1984) 195–206.
[19] J. Vollmar, Statistical problems in mutagenicity tests, Arch. Toxicol. 38 (1977) 13–25.
[20] E.B. Whorton Jr., Parametric statistical methods and sample size considerations for dominant lethal experiments. The use of clustering to achieve approximate normality, Teratogen., Carcinogen., Mutagen. 1 (1981) 353–360.
[21] D. Anderson, The dominant lethal test in rodents, in: S. Venitt, J.M. Parry (Eds.), Mutagenicity Testing. A Practical Approach, IRL Press, Oxford, 1984, pp. 307–335.
[22] A.J. Bateman, S.S. Epstein, Dominant lethal mutations in mammals, in: A. Hollaender (Ed.), Chemical Mutagens, Principles and Methods of Their Detection, Vol. 2, Plenum, New York, 1971, pp. 541–568.
[23] U.H. Ehling, L. Machemer, W. Buselmaier, J. Dycka, H. Frohberg, J. Kratochvilova, R. Lang, D. Lorke, D. Müller, J. Peh, G. Röhrborn, R. Roll, M. Schulze-Schenking, H. Wiemann, Standard protocol for the dominant lethal test on male mice, Arch. Toxicol. 39 (1978) 173–185.
[24] A.-M.C. Lockhart, J.B. Bishop, W.W. Piegorsch, Issues regarding data acquisition and analysis in the dominant lethal assay, Proc. Biopharm. Sect. Amer. Stat. Association (1991) 234–237.
[25] E.B. Whorton Jr., T.G. Pullin, A.F. Frost, A. Onofre, M.S. Legator, D.S. Folse, Dominant lethal effects of n-butyl glycidyl ether in mice, Mutation Res. 124 (1983) 225–233.
[26] J. Ashby, Rodent mutation assay data presentation and statistical assessment, Mutation Res. 331 (1995) 233–235.
[27] J. Ashby, M.J.L. Clapp, The rodent dominant lethal assay: a proposed format for data presentations that alerts to pseudo-dominant lethal effects, Mutation Res. 320 (1995) 209–218.
[28] W.M. Generoso, W.W. Piegorsch, Dominant lethal tests in male and female mice, Methods in Toxicology, Vol. 3A, Acad. Press, 1993, pp. 124–141.
[29] M.F. Salamone, K.H. Mavournin, Bone marrow micronucleus assay: a review of the mouse stocks used and their published mean spontaneous micronucleus frequencies, Environ. Mol. Mutagen. 23 (1994) 239–273.
[30] P. Armitage, Tests for linear trends in proportions and frequencies, Biometrics 11 (1955) 375–386.
[31] W.G. Cochran, Some methods for strengthening the common chi-square tests, Biometrics 10 (1954) 417–451.
[32] Y.I. Chen, D.A. Wolfe, Nonparametric procedures for comparing umbrella pattern treatment effects with a control in a one-way layout, Biometrics 49 (1992) 455–465.
[33] D. Krewski, B.G. Leroux, S.R. Bleuer, L.H. Broekhoven, Modeling the Ames Salmonella/Microsome assay, Biometrics 49 (1993) 499–510.
[34] W.W. Piegorsch, Nonparametric methods to assess non-monotone dose response: application to genetic toxicology, in: P.K. Sen, I.A. Salama (Eds.), Order Statistics and Nonparametrics: Theory and Application, Elsevier, Amsterdam, 1992, pp. 419–430.
[35] S. Schmoor, M. Schumacher, Adaptive statistical procedures for the analysis of non-monotone dose–response relationships, Biometrie und Informatik in Medizin und Biologie 23 (1992) 113–126.
[36] D.G. Simpson, B.H. Margolin, Recursive nonparametric testing for dose–response relationships subject to downturn at high dose, Biometrika 73 (1986) 586–596.
[37] D.G. Simpson, B.H. Margolin, Nonparametric testing for dose–response curves subject to downturns. Asymptotic power considerations, Annals Statistics 18 (1990) 373–390.
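
Supplementary sketch of the Appendix A simulation

The kind of simulation described in Appendix A can be illustrated with a short Python sketch. It is not the working group's program and it does not reproduce the full three-step procedure. It estimates only the rejection rate of a one-sided Cochran–Armitage trend test (step 2) under the assumed dose–responses (p, p, p, p) and (1 × p, 1.25 × p, 1.5 × p, 2.0 × p). Several points are assumptions made for illustration: equally spaced dose scores 0, 1, 2, 3 are used because the handling of log(dose) for the control dose of 0 is not specified here; cells are pooled over the animals of a group, so animal-to-animal variation is ignored; a nominal one-sided level of 5% is used instead of the 20% and 15% levels of the combined procedure; and all function names are hypothetical.

```python
import math
import random


def binomial_draw(rng, n, p):
    """Draw one Binomial(n, p) count by accumulating geometric waiting times."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    log_q = math.log1p(-p)
    count, position = 0, 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        position += int(math.log(u) / log_q) + 1  # trials up to the next event
        if position > n:
            return count
        count += 1


def trend_test_p_value(events, cells, scores):
    """One-sided Cochran-Armitage trend test using the normal approximation."""
    total_cells = sum(cells)
    pooled = sum(events) / total_cells
    weighted = sum(n * s for n, s in zip(cells, scores))
    spread = sum(n * s * s for n, s in zip(cells, scores)) - weighted ** 2 / total_cells
    variance = pooled * (1.0 - pooled) * spread
    if variance <= 0.0:
        return 1.0  # no events (or no spread in scores): treat as not significant
    statistic = sum(s * (x - n * pooled) for x, n, s in zip(events, cells, scores))
    z = statistic / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability


def estimate_rejection_rate(base_rate, multipliers, animals=5, cells_per_animal=2000,
                            alpha=0.05, n_sim=5000, seed=1):
    """Fraction of simulated experiments in which the trend test is significant."""
    rng = random.Random(seed)
    cells = [animals * cells_per_animal] * len(multipliers)
    scores = list(range(len(multipliers)))  # assumption: equally spaced scores
    rejections = 0
    for _ in range(n_sim):
        events = [binomial_draw(rng, n, base_rate * m)
                  for n, m in zip(cells, multipliers)]
        if trend_test_p_value(events, cells, scores) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for rate in (0.001, 0.0015, 0.002, 0.0025, 0.003):
        type_one = estimate_rejection_rate(rate, [1.0, 1.0, 1.0, 1.0])
        power = estimate_rejection_rate(rate, [1.0, 1.25, 1.5, 2.0])
        print(f"p = {rate:.4f}: type I error = {type_one:.3f}, power = {power:.3f}")
```

Because this sketch applies a single trend test at a different nominal level, its estimates are not expected to match Figs. 3 and 4. The per-group level in step 3 follows from the arithmetic stated in assumption (5): 15% divided over three treated groups gives 5% per comparison.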
