
Reducing the Use of Laboratory Animals in Toxicological Research and Testing by Better Experimental Design
Author(s): Michael F. W. Festing and David P. Lovell
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 58, No. 1
(1996), pp. 127-140
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2346169 .

By MICHAEL F. W. FESTING†
University of Leicester, UK

and

DAVID P. LOVELL
BIBRA International, Carshalton, UK

[Read before The Royal Statistical Society at a meeting on 'Statistical aspects of design' organized by the Research Section on Wednesday, April 12th, 1995, Professor V. S. Isham in the Chair]

SUMMARY
More than 50 million animals are used in biomedical research in the world each year. It is highly desirable that this number is reduced both for ethical and for economic reasons. Better experimental design could lead to the use of fewer animals and improve the repeatability of animal experiments so that alternative methods would be easier to validate. Screening experiments aimed at identifying rodent carcinogens would be more powerful if more than one strain of mice and/or rats were used. Attempts to validate alternative test methods by using chemicals already tested in the Draize test for eye irritation are complicated by limited information on the interexperiment variability of the whole animal test. In academic toxicological research, surveys suggest that many experiments are poorly designed, and some seem to be unnecessarily large.

Keywords: ALTERNATIVES TO LABORATORY ANIMALS; CARCINOGENESIS SCREENING; EXPERIMENTAL DESIGN; LABORATORY ANIMALS; MOUSE STRAINS; RAT STRAINS; REDUCTION IN ANIMAL USE; TOXICITY TESTING

1. INTRODUCTION

Over 50 million animals are used in biomedical research and safety testing every year, of which nearly 12 million are used in the European Union (EU). In 1991 about 55% of these were mice, and 27% were rats with birds and fish being numerically the next most important species (Straughan, 1994). In the UK, the numbers of animals used in research peaked at about 5.6 million in 1971, remained constant for about 5 years and then declined to less than 3 million in 1993. However, both safety testing and toxicological research continue to be highly dependent on laboratory animals, and there is little doubt that their use has been essential for the development of much of modern medicine (Paton, 1984; Botting, 1992). The use of these animals on such a large scale is both ethically and economically undesirable, even though there is strict legislation controlling their use, and suffering is minimized as far as possible. There could be no ethical justification for using any more animals than is strictly necessary to achieve the desired research objectives.

In their classical text The Principles of Humane Experimental Technique, Russell and Burch (1959) proposed the '3Rs' of animal experimentation. Briefly, wherever possible the use of animals should be replaced by non-sentient alternatives such as tissue culture, lower organisms or computer simulation. Animal experiments should also be refined in such a way to reduce suffering to a minimum. Anaesthesia and analgesia, environmental enrichment and the humane killing of animals which are clearly suffering pain are examples of refinement. Finally, wherever possible the use of animals should be reduced to the absolute minimum that is required to meet the experimental objectives. They proposed that reduction could be achieved by better research strategy and by improved experimental design.

There is evidence that some animal experiments are poorly designed and incorrectly analysed (Festing, 1992, 1994). However, some improvements have been achieved; for instance the fixed dose procedure developed by the British Toxicological Society should reduce the number of animals used in LD50-tests, though the final estimate is sometimes dependent on the starting dose (Whitehead and Curnow, 1992). Guidelines on the design of rodent protection tests used to estimate the efficiency of antimicrobial agents have also recently been published with the aim of improving the welfare of animals used in such tests (Acred et al., 1994). The aim of this paper is to discuss the design of some examples of animal experiments, and the effect that inadequate experimental design can have on the validation of alternative (replacement) methods. Two 'formal' experimental designs are discussed. These are used to screen chemicals to identify carcinogenic and eye irritancy hazards. Some general comments are also made on the design and analysis of experiments in published research papers, though no attempt has been made to identify which research areas offer the greatest potential for reducing animal use.

†Address for correspondence: Medical Research Council Toxicology Unit, University of Leicester, Hodgkin Building, PO Box 138, Lancaster Road, Leicester, LE1 9HN, UK.
E-mail: mfwfl@le.ac.uk
© 1996 Royal Statistical Society 0035-9246/96/58127

2. CARCINOGENESIS SCREENING

Chemicals which are manufactured on a large scale and any which are to be used as drugs or food additives usually must be tested in laboratory animals to determine whether or not they are considered carcinogenic. The implicit default assumption behind such programmes is that a chemical which is carcinogenic in mice and/or rats is likely to be so in humans, and a chemical which is not carcinogenic in these rodents is less likely to be so in humans. However, this 'black box' approach is increasingly being tempered by a need to understand the fate of the chemical in animals and humans to assess whether the animal is a suitable model of humans. If the test chemical is identified as a carcinogenic hazard, it may then be discarded, or if it is particularly valuable risk estimation is carried out to identify a 'virtually safe' dose (VSD) (i.e. a dose which may only cause cancer in a very small fraction of the population) from the available experimental data (Lovell, 1993). Attempts will then be made to avoid exposures above the VSD. Thus, the typical long-term study serves two functions: firstly as a screen or hazard identification and secondly to estimate an acceptable risk associated with particular exposure levels if it is a carcinogen.

A reasonably standard design has evolved (Sontag et al., 1976), though the details can vary from country to country. The largest consistent series of experiments are those run by the USA National Toxicology Program carcinogenesis bioassay program (NTPCBP) in which over 450 chemicals have been tested over a period of more than 20 years at a total cost exceeding $650 million. A survey (McAuslane et al., 1991) based on 48 respondents (mostly pharmaceutical companies) estimated that 72-84 carcinogenesis screening experiments are run by pharmaceutical companies each year. Each is believed to cost between $2 million and $8 million. Thus, questions about the design of such assays may have considerable financial implications.

There are two reasons why an accurate result is highly desirable. Firstly, the potential carcinogenic hazard of each individual chemical should be correctly assessed so that risk estimates to obtain acceptable human exposures can be carried out. Secondly, alternative in vitro techniques need to be validated. This must be done by using animal data because there are so few chemicals which are confirmed human carcinogens and non-carcinogens. However, if the animal data are inaccurate they may prevent the validation of alternative methods.

2.1. Design of Carcinogen Bioassays

The design currently used by the NTPCBP involves a control group and a high dose group which receives the 'maximum tolerated dose' (MTD). This is the dose which is

'. . . the highest dose of the test agent . . . that can be predicted not to alter the animal's normal longevity from effects other than carcinogenicity'

(Sontag et al., 1976). Other groups receive a half of and a quarter of the MTD, though in some cases there have been only three treatment groups. Each group consists of 50 animals of each sex, so the total experiment usually involves 400 animals. The screen is usually repeated in both mice and rats. In the past the treatment groups have sometimes had a systematic rather than a random lay-out, and this has been criticized as a possible source of bias. In the NTP study on benzyl acetate the position of the cage apparently biased the results (Young, 1989). However, cages are now rotated to avoid possible bias (Haseman, 1986), and Haseman (1988) found little evidence of cage effects on tumour incidence.

The statistical analysis involves numerous tests. Three different methods are used for adjusting for age at death (assuming that all tumours are fatal, assuming that all are incidental, and not involving age adjustment), and for each type of adjustment trend tests are used and each group is compared individually with the controls. The two sexes and, typically, about 20 tumour types are tested separately. Therefore, for each species there are 3 (age adjustments) × 4 (comparisons + trend) × 2 (sexes) × 20 (tumour types) = 480 statistical tests (Farrar and Crump, 1988), leading to potentially serious type 1 error problems. However, the final interpretation of the results does not depend on any rigid statistical rules. Account is also taken of biological factors such as the rarity of the tumour, the presence or absence of related pre-neoplastic lesions, the historical control levels of the tumour, the survival of the animals and the consistency across the two species and sexes. In rare cases a compound may be designated a carcinogen even though with the tests used there is no significant difference between treated and control groups, and, conversely, some statistically significant increases in tumour numbers may be discounted as being of no biological importance for humans (Haseman and Clark, 1990).

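The scale of the multiple-testing problem described above can be illustrated with a few lines of arithmetic. The sketch below reproduces the count of 480 tests per species and then shows what that count would imply if every test were run at a nominal 5% level and the tests were independent. Both the 5% level and the independence are assumptions made only for illustration; in a real bioassay the tests are correlated, so the figures are rough indications rather than exact error rates.

```python
def family_wise_error(n_tests: int, alpha: float) -> tuple[float, float]:
    """Return (expected false positives, P(at least one false positive)).

    Assumes every null hypothesis is true and that the tests are
    independent, which is an idealisation for illustration only.
    """
    expected = n_tests * alpha
    p_any = 1.0 - (1.0 - alpha) ** n_tests
    return expected, p_any


# Counts taken from the text (Farrar and Crump, 1988).
age_adjustments = 3   # fatal, incidental, no age adjustment
comparisons = 4       # three dose groups vs. control, plus one trend test
sexes = 2
tumour_types = 20     # "typically about 20"

n_tests = age_adjustments * comparisons * sexes * tumour_types

# alpha = 0.05 is an assumed nominal level, not a value from the paper.
expected, p_any = family_wise_error(n_tests, alpha=0.05)
print(f"tests per species: {n_tests}")
print(f"expected false positives at 5%: {expected:.1f}")
print(f"P(at least one false positive): {p_any:.6f}")
```

Under these assumptions a completely inert compound would be expected to produce a number of nominally significant results, which is one reason why the final interpretation relies on biological judgement rather than on rigid statistical rules.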
2.2. Choice of Strain

The response to many carcinogens is genetically controlled. For many years there has been a controversy over whether an 'inbred' or an 'outbred' strain should be used. Inbred strains are produced by many generations (at least 20, but in practice well over 50) of brother-sister mating. The result can be likened to a clone of genetically identical, homozygous individuals. Such strains, and the first generation F1-hybrids between two such strains, have several properties which make them particularly valuable in research. They are 'isogenic' (i.e. all individuals are genetically identical), they tend to be phenotypically uniform, they remain genetically stable over long periods, they can be identified by using genetic markers, they have a worldwide distribution and there is usually considerable data on their characteristics, including the incidence of spontaneous tumours, though this may be altered substantially by environmental influences (Festing, 1979). The NTPCBP favours the use of isogenic strains mainly on the grounds that their uniformity increases statistical power (Rao et al., 1988).

In contrast, outbred strains are maintained as closed colonies in which brother-sister mating is avoided. As a result, the colony maintains a certain level of genetic variability, and each individual is genetically unique. However, the exact level of genetic variability depends on the previous history of the colony, and they are generally much more uniform than, say, the human population (Festing, 1993). These strains suffer from several disadvantages. They are subject to 'genetic drift' due to changes in gene frequency at some loci (Papaioannou and Festing, 1980), lack of genetic markers means that an individual strain cannot be identified and the phenotype may be more variable than that found in an inbred strain (Festing, 1976). Those who favour their use usually do so on the grounds that the aim is to model humans, and humans are not inbred. The genetic variability also means that the compound is being tested on a wider range of genotypes than if an inbred strain is used (Arcos et al., 1968). Roughly 80% of all carcinogenicity testing is done using an outbred stock (mostly in the pharmaceutical industry), and 20% using an inbred strain (McAuslane et al., 1991).

2.3. Multistrain Factorial Design

The implication of strain differences, regardless of whether they are inbred or outbred, has rarely been considered by toxicologists (Festing, 1987). The problem is illustrated in Table 1, which shows the incidence of mammary adenocarcinomas (MAC) and mammary fibroadenomas (MFA) in strain ACI and Sprague-Dawley (SD) rats treated with two known human carcinogens: diethylstilbestrol (DES) and neutron irradiation. In the absence of DES, the ACI rats were relatively resistant to irradiation, though in the presence of DES they did show a clear dose-response relationship. However, ACI rats were highly sensitive to DES, with over 50% of them developing MAC. In contrast, the SD strain was resistant to DES, but developed a high incidence of MFA in response to irradiation. Clearly, given that only one strain is ever used in a carcinogenesis screen, any conclusion about the carcinogenicity of DES or neutron irradiation would depend entirely on which strain of rats happened to be chosen.

TABLE 1
Percentages of MAC and MFA in rats treated with DES and neutron irradiation†

                          ACI strain                    Sprague-Dawley strain
Treatment        No. of rats  MAC (%)  MFA (%)    No. of rats  MAC (%)  MFA (%)
0.0 rad               13         0        0            33         3        3
0.4 rad               35         0        0            46         2       11
1.3 rad               24         0        4            35         3       14
4.0 rad               23         9        4            34        15       32
DES + 0.0 rad         23        52        0            33         0        0
DES + 0.4 rad         33        67        0            44         0        2
DES + 1.3 rad         23        83        0            35         0        0
DES + 4.0 rad         23        91        0            33         3        3

†Shellabarger et al. (1978).

If strain differences of this sort are common, the use of a single strain will mean that some carcinogens are missed because a resistant strain of mice or rats was used. However, these experiments are already extremely expensive, and their cost is nearly proportional to their size. It would not be acceptable to increase the total number of animals to accommodate more strains. These considerations led Festing (1975, 1987) to propose that screening experiments of this sort should be conducted by using a factorial design with small numbers of several strains, but still maintaining 48-50 animals per sex by treatment group. For example, each group might be composed of 24 animals of two strains, or 12 animals of each of four strains or even (theoretically) one animal of each of 48 strains.

The statistical implications of these proposals are not immediately apparent. Factorial experimental designs with qualitative end points in which the aim is initially to determine whether or not there is a treatment effect (rather than estimating its magnitude) are not common and are not described in most text-books on experimental design (e.g. Kempthorne (1952), Cochran and Cox (1957), Cox (1958) and Mead (1988)). However, Felton and Gaylor (1989) used computer simulation to study the power of the multistrain experiment by using a one-sided Mantel-Haenszel test. In each case, they assumed that two groups of 48 animals were involved and looked at various control incidences and response rates for a single-strain experiment, a two-strain experiment (24 animals of each strain per group), a four-strain experiment (12 animals of four strains per group) and a 24-strain experiment (two animals of 24 strains per group). They found that in most circumstances the multistrain design was more powerful, and often substantially so, than the single-strain design. The only exceptions were when the response rate was so low that it could only be detected with large numbers of animals of a susceptible strain. The chance of choosing such a strain may be low, but if chosen it would offer the best chance of detecting the response. However, in these circumstances all strategies would have low power. As the strain differences and response rate increased, the multistrain experiment became more powerful. For example, where strain A has a zero response rate and strain B has a 30% chance of response, the power of the two-strain experiment was 0.87. With a single strain, strain A would have zero power and strain B would have approximately 100% power, so the average power would be 0.5. A similar pattern of results was observed when more strains were used, with a tendency for the power of the multistrain experiment to increase as the number of strains increased. They concluded that

'For the case where there is no knowledge of the sensitivities of available strains, the best design, in terms of the Mantel-Haenszel test, will generally consist of using as many strains as possible. The risk of using such a design is the possible loss of a small amount of power when the average increase in the response rate due to a chemical is small and the power is small anyway. An advantage of the multistrain experiments is the possibility of a large gain in power.'

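The kind of simulation described above can be sketched as follows. This is a minimal illustration, not a reconstruction of the Felton and Gaylor (1989) study: the control incidences, the 5% significance level, the normal approximation to the Mantel-Haenszel statistic (without continuity correction) and the function names `mh_one_sided_p` and `simulated_power` are all assumptions introduced here, so the estimated powers should not be expected to match the published figures.

```python
from math import erfc, sqrt

import numpy as np


def mh_one_sided_p(tables):
    """One-sided Mantel-Haenszel p-value (normal approximation).

    `tables` holds one tuple per strain:
    (tumours in treated, n treated, tumours in control, n control).
    """
    excess, variance = 0.0, 0.0
    for a, n1, c, n0 in tables:
        n, m = n1 + n0, a + c
        excess += a - n1 * m / n                         # observed - expected
        variance += n1 * n0 * m * (n - m) / (n * n * (n - 1))
    if variance == 0.0:                                  # no tumours anywhere
        return 1.0
    z = excess / sqrt(variance)
    return 0.5 * erfc(z / sqrt(2.0))                     # upper-tail probability


def simulated_power(strain_rates, n_per_group=48, n_sim=2000, alpha=0.05, seed=0):
    """Estimate power when each group is split equally among the strains.

    `strain_rates` is a list of (control incidence, treated incidence),
    one pair per strain; all values used below are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n = n_per_group // len(strain_rates)                 # animals per strain per group
    hits = 0
    for _ in range(n_sim):
        tables = [(rng.binomial(n, p_trt), n, rng.binomial(n, p_ctl), n)
                  for p_ctl, p_trt in strain_rates]
        hits += mh_one_sided_p(tables) < alpha
    return hits / n_sim


# Hypothetical scenario: strain A never responds, strain B responds in 30%
# of treated animals, and the control incidence is assumed to be zero.
resistant, susceptible = (0.0, 0.0), (0.0, 0.3)
power_a = simulated_power([resistant])
power_b = simulated_power([susceptible])
power_ab = simulated_power([resistant, susceptible])
print(f"single strain A: {power_a:.2f}, single strain B: {power_b:.2f}")
print(f"average of single-strain designs: {(power_a + power_b) / 2:.2f}")
print(f"two-strain design: {power_ab:.2f}")
```

The comparison of interest is between the two-strain power and the average of the two single-strain powers, since an experimenter who does not know which strain is susceptible is in effect choosing one of the single-strain designs at random.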


A multistrain experiment using groups of four F1-hybrid strains with 12 animals per strain was carried out by Wolff et al. (1991), showing that an experiment of this design is technically feasible. They did not find large strain differences but concluded that the multistrain experiment was as powerful statistically as the single-strain design.

There are some potential problems with multistrain designs. The maximum tolerated dose may differ between strains, which means that each strain may have to be given a different dose. Consideration would need to be given to how the strains were chosen. This might be done at random from a pool of potentially acceptable strains, they could be chosen on the basis of known susceptibility to related chemicals, they could be chosen to be as genetically diverse as possible or a fixed panel of strains might be used in all tests (Festing, 1995a). Strains may also differ in the organ in which the tumour occurs, and/or the type of tumour. For example, strain A/J mice nearly always develop lung tumours, whatever the carcinogen, whereas the B6C3F1-hybrid strain used in the NTPCBP assay mostly develops liver tumours. At present each tumour type is analysed separately, but this may not be appropriate with a multistrain experiment. A complete reappraisal of methods of statistical analysis of these assays may be necessary, especially in view of the large number of statistical tests which are currently done. For example, Farrar and Crump (1988) proposed the use of Monte Carlo randomization tests with a single test statistic which would overcome the problem of multiple tests. This would also combine the data in such a way that a small, statistically insignificant, increase in several tumour types might lead to an overall conclusion that the compound was carcinogenic.

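The general idea of a Monte Carlo randomization test with a single test statistic can be sketched as below. The statistic used here (the sum over tumour types of the difference in incidence between treated and control animals) is a hypothetical choice made only to illustrate how small increases in several tumour types can be pooled; it is not claimed to be the statistic proposed by Farrar and Crump (1988), and the data are simulated.

```python
import numpy as np


def combined_statistic(tumours, treated):
    """Sum over tumour types of (treated incidence - control incidence).

    `tumours` is an animals x tumour-types 0/1 array and `treated` is a
    boolean vector marking the treated animals.
    """
    diff = tumours[treated].mean(axis=0) - tumours[~treated].mean(axis=0)
    return float(diff.sum())


def randomization_p_value(tumours, treated, n_perm=2000, seed=0):
    """One-sided Monte Carlo p-value obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    observed = combined_statistic(tumours, treated)
    at_least_as_large = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(treated)
        if combined_statistic(tumours, shuffled) >= observed - 1e-12:
            at_least_as_large += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (at_least_as_large + 1) / (n_perm + 1)


# Simulated (hypothetical) data: 50 control and 50 treated animals, five
# tumour types, with a modest increase in every type in the treated group.
rng = np.random.default_rng(1)
treated = np.repeat([False, True], 50)
incidence = np.where(treated[:, None], 0.20, 0.10)      # per-animal tumour risk
tumours = (rng.random((100, 5)) < incidence).astype(int)

print("combined statistic:", round(combined_statistic(tumours, treated), 3))
print("randomization p-value:", randomization_p_value(tumours, treated))
```

Because only one statistic is referred to its randomization distribution, a single p-value is obtained per experiment, whatever the number of tumour types recorded.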
The potential use of the multistrain experiment in studying possible mechanisms is largely unexplored. If susceptible and resistant strains can be identified, pharmacological and biochemical data on the rate and manner of the metabolism and elimination of the compound may provide useful information on carcinogenic mechanisms. In some cases, it may even be possible to identify genetic loci associated with susceptibility (Festing et al., 1994). This additional information may be useful in assessing risk to humans.

2.4. Concordance between Tests

Carcinogenesis screening tests are too expensive to repeat. However, the NTPCBP rodent bioassay series has been conducted in both rats and mice, using a single inbred (isogenic) strain in each case. Haseman and Seilkop (1992) tabulated the concordance between the two species in 284 experiments. The results are shown in Table 2.

TABLE 2
Interspecies concordance in carcinogenic response†
Totals

Rat

Mouse
Totals

+
-

67
38
105

40
139
179

107
177
284

†Haseman and Seilkop (1992).


Overall the concordance was 73%, with 78 chemicals apparently only producing cancer in a single species. Some of these 78 chemicals may be true species-specific carcinogens; others will appear to be because of sampling variation (true type 1 and type 2 errors). However, given that there is good evidence for strain variation in susceptibility, some proportion of these 78 chemicals will be carcinogenic in both species but their effect in one species will have been missed because the chosen strain happened to be resistant to that particular carcinogen. Hence, the true concordance is probably much higher than these figures would suggest. Unfortunately, without further information, it is impossible to estimate the true concordance rate.

The results also suggest that the use of both species is still necessary. Assuming that all the chemicals which were positive in one or other or both species are true rodent carcinogens, the use of only a single species would result in nearly a third of carcinogens being missed. Similarly poor concordance was obtained when Maronpot et al. (1983) compared 54 chemicals in the NTPCBP rodent bioassay (both rats and mice) with the strain A mouse pulmonary tumour bioassay. The latter uses a strain of mice which are highly sensitive to the development of lung tumours when treated with carcinogens. They found only 20 of the 54 chemicals concordant in both assays. Among the 16 chemicals which were negative in the NTP assay, seven were positive in the strain A pulmonary tumour assay. Presumably, if strain A mice had been included in the NTP assay, some of these 16 chemicals would have been classified as carcinogenic. However, the strain A pulmonary tumour assay found no evidence of carcinogenicity in 27 out of 37 chemicals which were positive in the NTP assay. 14 of these 37 had been positive in rats and mice, eight only in the rat and five only in the mouse in the NTPCBP. Again, without further work it is impossible to know how much of this discordance is due to differences in assay procedure and how much to strain differences.

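The concordance figures quoted for Table 2 follow directly from the four cell counts, as the short calculation below shows. The counts are those given in Table 2; the variable names are introduced here for illustration, and the two single-species counts are deliberately not attributed to a particular species.

```python
# Cell counts from Table 2 (Haseman and Seilkop, 1992).
both_positive = 67
both_negative = 139
single_species_positive = (38, 40)   # positive in one species only

total = both_positive + both_negative + sum(single_species_positive)
concordance = (both_positive + both_negative) / total
discordant = sum(single_species_positive)

# Chemicals positive in at least one species, treated as true rodent
# carcinogens under the assumption stated in the text.
positive_in_either = both_positive + discordant

# Fraction that would be missed if only one species were tested: the
# chemicals that were positive only in the species left out.
missed = [n / positive_in_either for n in single_species_positive]

print(f"total experiments: {total}")
print(f"concordance: {concordance:.1%}")
print(f"discordant chemicals: {discordant}")
print("fraction missed with one species:", [f"{m:.1%}" for m in missed])
```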
2.5. Conclusions

There appear to be some fundamental flaws in the design and interpretation of animal carcinogenesis screening experiments. Although there are often large strain differences in response, all such tests use only a single strain of each species. It was suggested nearly 20 years ago that, instead of using groups of about 50 animals all of the same strain, the experiments would normally be more powerful if several strains were used, but without increasing the total numbers, the main determinant of cost. Recent statistical research has supported this suggestion. The potential benefits of a more accurate design are enormous. If the concordance of test results between rats and mice could be increased by better experimental design, it may be possible to eliminate one of the two test species, making a saving of over $1 million per test (Haseman and Lockhart, 1993). Alternative tests should also be easier to validate, with the eventual hope that animal tests could be totally eliminated. In view of the potential benefits, it is extraordinary that little or no research is currently being done on the improvement of this design through the use of more than one strain.

3. DRAIZE TEST

The Draize test was designed to test agrochemicals, cosmetics and toiletries for eye irritancy following

'. . . several deaths from a depilatory, fifteen or twenty injuries from a shampoo, dermatitis and disfigurement from hair lacquer, and blindness from an eye mascara'

(Paton, 1984). A reasonably standard procedure involves instilling 0.1 ml of the test fluid under one lower eyelid of an albino rabbit. The results are scored after 1, 24 and 72 h and 7 days. A subjectively determined points scoring system can be applied based on damage to the cornea, iris and conjunctiva and the amount of discharge to give a numerical score of between 0 and 110, which is then amenable to statistical analysis, although categorical assessment systems are also used. Sample sizes of at least three rabbits are required by the EU and Japanese regulators, whereas the Organisation for Economic Co-operation and Development (OECD), the US Food and Drug Administration and the US Environmental Protection Agency (EPA) require at least six animals (Koeter, 1991). It has been estimated that sample sizes of 2, 3, 4 and 5 rabbits were 88%, 93%, 95% and 96% accurate respectively, when compared with sample sizes of 6 (DeSousa et al., 1984). The Draize test is highly distasteful, and virtually everyone would like an alternative to be developed. In fact, many alternatives have been proposed. These range from the use of eyes from dead animals obtained from slaughter houses to entirely non-living systems such as the EYTEX™ method which involves a complex protein extract from jack beans (Gordon, 1992). However, validation of the alternative methods presents a problem as the data on interexperiment variability on the Draize test are either absent or indicate a considerable variability. Attempts to define criteria for the acceptance of an alternative method have been based on the assumption that they should agree with the animal data which are assumed to be the 'gold standard' or correct result. In practice, most validation studies have tried to use correlation methods rather than methods based on degree of agreement (Bland and Altman, 1986). If the animal test is effectively not repeatable, considerable disagreement will be found between the in vivo and in vitro data, and the in vitro test is unlikely to be considered a valid alternative.

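The distinction between correlation and agreement can be made concrete with a small numerical sketch. The scores below are invented for illustration and do not come from any validation study; they are chosen so that a hypothetical in vitro score rises steadily with the in vivo score but on a compressed scale. The agreement summary follows the general approach of Bland and Altman (1986): the mean difference (bias) and approximate 95% limits of agreement.

```python
import numpy as np

# Hypothetical scores on the 0-110 scale for seven chemicals.
in_vivo = np.array([5.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0])
in_vitro = np.array([12.0, 14.0, 19.0, 30.0, 39.0, 51.0, 60.0])

# Correlation measures association only, not whether the scores are equal.
r = np.corrcoef(in_vivo, in_vitro)[0, 1]

# Agreement is summarised from the per-chemical differences.
diff = in_vitro - in_vivo
bias = diff.mean()
sd = diff.std(ddof=1)
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd

print(f"Pearson correlation: {r:.3f}")
print(f"mean difference (bias): {bias:.1f}")
print(f"95% limits of agreement: {lower:.1f} to {upper:.1f}")
```

In this invented example the correlation is very high even though the two methods differ by tens of points for the more irritant chemicals, which is why a high correlation alone says little about whether one test could replace the other.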
An extensive interlaboratory study of the Draize test (Weil and Scala, 1971) compared the results from 25 laboratories each of which tested 'blind' (an unfortunate term in this context!) 12 centrally supplied materials by using a method which was standardized as far as possible, although it was not possible to standardize the source of the rabbits. Most laboratories were in good agreement in scoring non-irritating substances. However, for irritants, the intra-individual variability was large in some laboratories and small in others. The degree of damage for a given chemical varied from none to severe irritation and tissue damage, whereas eye healing was observed in some laboratories, but progressive damage was seen in others. Certain laboratories consistently recorded severe scores, whereas others consistently reported the compounds to be non-irritating. An independent observer tended to score less severely in the former and more severely in the latter type of laboratory, suggesting that a major component of the interlaboratory variability was the scoring system. The authors concluded that

'. . . the rabbit eye and skin procedures currently recommended by the Federal agencies for use in delineation of irritancy of materials should not be recommended as standard procedures in any new regulations'.


20 years later, the interpretation of the results still presented a problem. Koeter (1991) noted that

'. . . it is really amazing to find out that whereas detailed scoring systems are described in the various tests, most of the guidelines lack any follow-up to this. Neither US-EPA nor OECD defined any classification criteria. Unfortunately, the only two protocols that did pay attention to data interpretation defined different criteria. . . .'

The potentially high variability between repeat experiments in animal studies poses severe problems for groups working to develop alternatives. Balls et al. (1995) considered this problem in detail, and made 25 recommendations for the validation of alternative methods. They suggested, for example, that proposed alternative tests should have been adequately developed, standardized and documented and that validation results should be published with full details in peer-reviewed publications. They also suggested that sets of reference chemicals with known toxicological profiles would be needed in the validation process. However, establishing the toxicological profiles for such chemicals is not always easy. An earlier report (Balls et al., 1990) noted that

'Over the past 50 years, a considerable volume of toxicological data has been generated in laboratory animals on a wide range of chemicals and mixtures. Nevertheless, recent attempts at assessing the completeness of these toxicological profiles on existing chemicals have shown that the data are not comprehensive for most chemicals, and that for many chemicals little or no data exist.'

They also considered that the use of animals specifically to validate alternative methods posed serious ethical problems.

The poor repeatability of some animal studies makes it difficult to find a solution to the problem. It creates the paradox that the worse an animal test (which has been accepted by the regulators) is in terms of repeatability the more difficult it will be to replace it. If there is a considerable random element to the result of an animal test on a scale between 0 and 110 it will be impossible to find any alternative test which gives good agreement. It will also be difficult to distinguish between alternative tests which are good predictors of the underlying toxicity from those which are poor, as both will show poor agreement with the animal data. Elimination of the Draize test is therefore unlikely to be based on one single comparison but rather on the development of a body of evidence that the animal test is highly variable and that the alternative tests offer repeatable results which over a period of time can be shown to identify irritant chemicals reliably. A battery of alternative tests that are believed to have a high sensitivity for identifying highly irritating compounds could be used as a pre-screen. If the tests predict a high degree of irritancy, then the manufacturer may abandon the chemical or accept that it is irritant and take appropriate action to minimize human risk. This would screen out a proportion of irritant chemicals, at the cost of some false positive results. If the manufacturer was not willing to accept the in vitro results, the chemical would then have to be tested in the Draize test, but in these circumstances some form of sequential Draize test might be used. Small numbers of rabbits would be used initially, but if the chemical proved to be non-irritant this could be confirmed with one or more additional experiments. As chemicals of this type were identified, they could be used to try to improve the in vitro test battery, assuming that patent laws did not prevent this. Chemicals that were predicted to be non-irritant on the initial screen would, at least for a few years, continue to be tested by using the Draize test. Presumably, most of these would be non-irritant so would cause relatively little discomfort to the animals. However, the irritancy of some would be underestimated by the in vitro tests. This information could again be used to try to improve the in vitro test battery. As the Draize test is believed to have a low rate of false positive results, a positive result of this sort may not require repeat testing.

The alternative tests would also benefit from the application of statistical thinking in their development. Test protocols should provide estimates of between- and within-laboratory experimental variability. Factorial designs should also be used to explore the effects of variation among cell lines, media, period of exposure and other variables likely to influence the results, to establish protocols which are relatively insensitive to minor variations in their execution.

4. ANIMAL EXPERIMENTS IN ACADEMIC RESEARCH

It is probably not surprising that sophisticated proposals such as the use of several strains in carcinogenesis screening have not been accepted, as many toxicologists have a poor understanding of statistical methods. Three studies (Festing, 1992, 1994, 1995b) of experimental design and statistical analysis of animal experiments in papers published in reputable toxicological journals have shown that many are poorly designed and incorrectly analysed. For example, in one survey (Festing, 1995b) 47 of 48 experiments used a completely randomized experimental design. The single randomized block design failed to remove the block effect in the statistical analysis. Yet most of these experiments involved complex biochemical determinations which would require that the experiment was broken down into some sort of block, and which would have been subject to experimental error. Exactly how this was done was never stated. In some cases the treatment group may have been used as the block, thereby introducing bias. Anecdotal information suggests that this is quite common, but there have been no formal studies. Several experiments used unequal subclass numbers without further explanation, suggesting that they were done ad hoc rather than being formally planned. Few researchers mentioned randomization. A factorial arrangement of treatments was quite common (17 of the 48 experiments), but the researchers had often attempted to analyse it by using Student's t-test.

It has been suggested that designed experiments with a quantitative dependent variable should usually have between 10 and 20 degrees of freedom for error (Mead, 1988). Experiments with fewer than 10 degrees of freedom for error will lack power, and those with over 20 degrees of freedom may waste resources. About 30% of experiments had over 36 degrees of freedom for error, suggesting that they may have been unnecessarily large and/or have failed to exploit the opportunity of removing systematic effects to reduce the residual variation. Little is known about the way in which scientists decide the size of their experiments. However, it is almost universal practice to present the data as means with the standard deviation or standard error based only on the data contributing to that mean. Pooled standard deviations are almost unknown. Most scientists, therefore, seem to choose a group size of about 6-10 animals to give a reasonable estimate of the within-group variation, and this is multiplied by the number of treatment combinations. With factorial designs, this approach will often lead to excessively large experiments. This approach may also
account for the rarity of randomized block designs. These only give a pooled estimate of the variation, the calculation of which is obscure to the many scientists who are not familiar with the analysis of variance.
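The degrees-of-freedom guideline quoted above from Mead (1988) can be checked with simple arithmetic. The following is a minimal sketch, assuming an additive model in which the error degrees of freedom are the total degrees of freedom minus those for treatments and blocks. The function names and the two example designs are hypothetical and are not taken from the surveyed papers.

```python
def error_df(n_units, n_treatments, n_blocks=1):
    """Error degrees of freedom: total df minus treatment df minus block df.

    Assumes an additive treatment + block model (an illustrative assumption).
    """
    if n_units <= n_treatments:
        raise ValueError("need more experimental units than treatments")
    total_df = n_units - 1
    treatment_df = n_treatments - 1
    block_df = n_blocks - 1
    return total_df - treatment_df - block_df


def assess(e):
    """Compare error df with the 10-20 range attributed to Mead (1988)."""
    if e < 10:
        return "may lack power"
    if e > 20:
        return "may waste resources"
    return "within the suggested range"


# Hypothetical 2 x 3 factorial (6 treatment combinations), 8 animals per group,
# completely randomized: 48 animals in total.
crd = error_df(n_units=48, n_treatments=6)

# The same 6 combinations in 4 randomized blocks, one animal per
# combination per block: 24 animals in total.
rbd = error_df(n_units=24, n_treatments=6, n_blocks=4)

print(crd, assess(crd))
print(rbd, assess(rbd))
```

Under these assumptions, choosing a group size first and multiplying by the number of treatment combinations pushes the error degrees of freedom well past 20, whereas the blocked design with half as many animals stays within the suggested range.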
In part, many toxicologists' unease over the use of sophisticated experimental designs and their associated statistical analyses seems to be a reflection of the excessive effect of hypothesis testing and significance levels in the assessment of toxicological results. This is perhaps understandable, as originally toxicity testing was designed to answer the question 'is this compound toxic?'. Although most toxicologists define their type 1 error rates, it is rare for them to consider the power of their experiments. However, with high dose levels and more sensitive end points much more attention now needs to be given to data exploration and estimation of the degree of toxicity of a chemical.
In some cases, there seems to be a misunderstanding of the role of statistical methods in the analysis and interpretation of animal experiments. The micronucleus test is used to screen chemicals which may cause chromosomal damage (known as 'clastogens'), with possible adverse results. The UK Environmental Mutagen Society recommended that at least five male and five female mice should be used in each experimental and control group, and that a basic screen can consist of a single dose of the compound using the maximum tolerated dose. At least three time points should be used. The bone marrow cells are extracted, placed on microscope slides, fixed and appropriately stained. The so-called 'polychromatic erythrocytes' (PEs) are studied, and the number which have 'micronuclei' (fragmented or separated chromosomes) are counted. At least 1000 PEs are studied, and the number of micronuclei in control animals approximately follows a Poisson distribution with a mean of about 2.5. Experiments of this sort generate large volumes of numerical data, though its analysis poses few problems.
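To illustrate what a Poisson distribution with a mean of about 2.5 implies for control animals, the following minimal sketch tabulates the probabilities of small counts and the chance of a high count in a single control animal. The mean is the approximate value given in the text. The function names and the threshold of seven micronuclei are hypothetical choices made only for illustration.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


CONTROL_MEAN = 2.5  # approximate control mean per 1000 PEs, as stated in the text

# Probability of each count from 0 to 7 in one control animal.
for k in range(8):
    print(k, round(poisson_pmf(k, CONTROL_MEAN), 4))

# Chance that one control animal shows 7 or more micronuclei
# (7 is an arbitrary illustrative threshold).
tail = poisson_upper_tail(7, CONTROL_MEAN)
print("P(X >= 7) =", round(tail, 4))
```

For a Poisson variable the variance equals the mean, so the spread of the counts rises with the mean rather than staying constant across groups.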
Lovell et al. (1989) discussed various methods of analysing the results including parametric tests such as the analysis of variance using appropriately transformed data, the χ²-test and various nonparametric tests. They concluded that there was no 'right' way to analyse the data, but that the use of the t-test on untransformed data was clearly wrong. Recently, however, Ashby and Tinwell (1995) suggested that data from the micronucleus test should not be subjected to a statistical analysis on the grounds that
    '... genetic toxicity observed in rodents is of such significance for hazard assessment that positive responses should be evident without recourse to statistics ...'.

In a later paper Morrison and Ashby (1995) stated that
    'Some protocols recommend the use of 5 animals of each gender, but these cannot be combined into a group of 10 animals because of differences in control MPE (micronucleated cells) values between genders. The use of the same number of animals of a single sex ... would therefore enhance the resolving power of assays.'

Ashby and co-workers' comments may be a good starting place for a debate on the role of statistics in toxicological research. The UK Environmental Mutagen Society has been highly successful in combining the skills of statisticians and genetic toxicologists in helping to develop appropriate experimental designs for genetic toxicology tests (Lovell, 1995). It would be unfortunate if the comments of highly respected scientists in the field of mutagenesis hindered these developments by being
interpreted as being 'antistatistics'. Ashby and Tinwell (1995), for instance, have compiled historical data which appear to be highly homogeneous and could probably be used to produce statistical process control charts which would, with statistical assistance, be useful in the interpretation of new data and could lead to a further reduction in sample size.
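One way in which homogeneous historical control counts might be turned into a process control chart is sketched below. It assumes a standard c-chart for Poisson-like counts, with limits at the centre line plus or minus three times its square root. This is a swapped-in textbook technique, not a method described by Ashby and Tinwell (1995), and the counts shown are invented for illustration rather than taken from their data.

```python
import math


def c_chart_limits(counts, n_sigma=3.0):
    """Centre line and control limits of a c-chart for count data.

    Assumes the counts are approximately Poisson, so that the
    standard deviation is the square root of the mean.
    """
    if not counts:
        raise ValueError("at least one historical count is required")
    center = sum(counts) / len(counts)
    half_width = n_sigma * math.sqrt(center)
    lower = max(0.0, center - half_width)  # counts cannot be negative
    upper = center + half_width
    return center, lower, upper


# Invented historical control counts (micronuclei per 1000 PEs).
historical = [2, 3, 1, 4, 2, 3, 2, 3]
center, lcl, ucl = c_chart_limits(historical)

# Invented new observations, flagged if they fall outside the limits.
new_counts = [3, 9]
flags = [c < lcl or c > ucl for c in new_counts]

print("centre:", center, "limits:", round(lcl, 2), round(ucl, 2))
print(list(zip(new_counts, flags)))
```

A count outside the limits would prompt further scrutiny rather than settle the question. Any real application would need enough historical data to check that the Poisson assumption is reasonable.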
4.1. Conclusions
It is clear that, far from divorcing statistics from toxicological research, there is a need to couple the two disciplines more closely. There is consequently a need for statisticians to become more intimately involved in this research area to ensure, for instance, that journal editors and referees are aware of the need for statistical input at all levels of toxicology. Such involvement would be a first step towards ensuring that better experimental design will indeed lead to a reduction in animal use.
REFERENCES
Acred, P., Hennessey, T. D., MacArthur-Clark, J. A., Merrikin, D. J., Ryan, D. M., Smulders, H. C., Troke, P. F., Wilson, R. G. and Straughan, D. W. (1994) Guidelines for the welfare of animals in rodent protection tests. Lab. Anim., 28, 13-18.
Arcos, J. C., Argus, M. F. and Wolf, G. (eds) (1968) Chemical Induction of Cancer. London: Academic Press.
Ashby, J. and Tinwell, H. (1995) A sequential approach to testing with the rodent bone marrow micronucleus assay: obviation of the need for statistical analysis of data. Mut. Res., 327, 49-55.
Balls, M., Blaauboer, B., Brusick, D., Frazier, J., Lamb, D., Pemberton, M., Reinhardt, C., Roberfroid, M., Rosenkranz, H., Schmid, B., Spielmann, H., Stammati, A.-L. and Walum, E. (1990) Report and recommendations of the CAAT/ERGATT workshop on the validation of toxic test procedures. ATLA, 18, 313-337.
Balls, M., Blaauboer, B. J., Fentem, J. H., Bruner, L., Combes, R. D., Ekwall, B., Fielder, R. J., Guillouzo, A., Lewis, R. W., Lovell, D. P., Reinhardt, C. A., Repetto, G., Sladowski, D., Spielmann, H. and Zucco, F. (1995) Practical aspects of the validation of toxicity test procedures: the report and recommendations of ECVAM workshop 5. ATLA, to be published.
Bland, J. M. and Altman, D. G. (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307-310.
Botting, J. H. (ed.) (1992) Animal Experimentation and the Future of Medical Research, 1st edn. London: Portland.
Cochran, W. G. and Cox, G. M. (1957) Experimental Designs. New York: Wiley.
Cox, D. R. (1958) Planning Experiments. New York: Wiley.
DeSousa, D. J., Rouse, A. A. and Smolon, W. J. (1984) Statistical consequences of reducing the number of rabbits utilized in eye irritation testing: data on 67 petrochemicals. Toxicol. Appl. Pharmacol., 76, 234-242.
Farrar, D. B. and Crump, K. S. (1988) Exact statistical tests for any carcinogenic effect in animal bioassays. Fund. Appl. Toxicol., 11, 652-663.
Felton, R. P. and Gaylor, D. W. (1989) Multistrain experiments for screening toxic substances. J. Toxicol. Environ. Hlth, 26, 399-411.
Festing, M. F. W. (1975) A case for using inbred strains of laboratory animals in evaluating the safety of drugs. Fd Cosmet. Toxicol., 13, 369-375.
Festing, M. F. W. (1976) Phenotypic variability of inbred and outbred mice. Nature, 263, 230-232.
Festing, M. F. W. (1979) Inbred Strains in Biomedical Research. London: Macmillan.
Festing, M. F. W. (1987) Genetic factors in toxicology: implications for toxicological screening. Crit. Rev. Toxicol., 18, 1-26.
Festing, M. F. W. (1992) The scope for improving the design of laboratory animal experiments. Lab. Anim., 26, 256-267.
Festing, M. F. W. (1993) Genetic variation in outbred rats and mice and its implications for toxicological screening. J. Exp. Anim. Sci., 35, 210-220.
Festing, M. F. W. (1994) Reduction of animal use: experimental design and quality of experiments. Lab. Anim., 28, 212-221.
Festing, M. F. W. (1995a) Use of a multi-strain assay could improve the NTP carcinogenesis bioassay program. Environ. Hlth Perspect., 103, 44-52.
Festing, M. F. W. (1995b) Are animal experiments in toxicological research the "right" size? In Statistics in Toxicology (ed. B. J. T. Morgan). Oxford: Oxford University Press. To be published.
Festing, M. F. W., Yang, A. and Malkinson, A. M. (1994) At least four genes and sex are associated with susceptibility to urethane-induced pulmonary adenomas in mice. Genet. Res., 64, 99-106.
Gordon, V. C. (1992) The scientific basis of the EYTEX™ system. ATLA, 20, 537-548.
Haseman, J. K. (1986) Reply to letter. J. Natn. Cancer Inst., 77, 305-306.
Haseman, J. K. (1988) Lack of cage effects on liver tumor incidence in B6C3F1 mice. Fund. Appl. Toxicol., 10, 179-187.
Haseman, J. K. and Clark, A.-M. (1990) Carcinogenicity results for 114 laboratory animal studies used to assess the predictivity of four in vitro genetic toxicity assays for rodent carcinogenicity. Environ. Molec. Mutagen., 16, 15-31.
Haseman, J. K. and Lockhart, A. M. (1993) Correlations between chemically related site-specific carcinogenic effects in long-term studies in rats and mice. Environ. Hlth Perspect., 101, 50-54.
Haseman, J. K. and Seilkop, S. K. (1992) An examination of the association between maximum tolerated dose and carcinogenicity in 326 long-term studies in rats and mice. Fund. Appl. Toxicol., 19, 207-213.
Kempthorne, O. (1952) The Design and Analysis of Experiments. London: Wiley.
Koeter, H. B. W. M. (1991) Current guidelines and regulations in toxicological research. In Replacement, Reduction and Refinement: Present Possibilities and Future Prospects (eds C. F. M. Hendriksen and H. B. W. M. Koeter), 1st edn, pp. 17-34. Amsterdam: Elsevier.
Lovell, D. P. (1993) Impact of pharmacogenetics on toxicological studies: statistical implications. J. Exp. Anim. Sci., 35, 259-281.
Lovell, D. P. (1995) Statistical analysis of genetic toxicology test data. In Statistics in Toxicology (ed. B. J. T. Morgan). Oxford: Oxford University Press. To be published.
Lovell, D. P., Albanese, R., Clare, G., Richold, M., Savage, J. R. K., Anderson, D., Amphlett, G. E., Ferguson, R. and Papworth, D. G. (1989) Statistical analysis of in vivo cytogenetic assays. In Statistical Evaluation of Mutagenicity Test Data (ed. D. J. Kirkland), pp. 184-232. Cambridge: Cambridge University Press.
Maronpot, R. R., Witschi, H. P., Smith, L. H. and McCoy, J. L. (1983) Recent experience with the strain A mouse pulmonary adenoma bioassay. Environ. Sci. Res., 27, 341-349.
McAuslane, J. A. N., Lumley, C. E. and Walker, R. S. (1991) The need for control animal pathology database: an international survey. Hum. Exp. Toxicol., 10, 205-213.
Mead, R. (1988) The Design of Experiments. Cambridge: Cambridge University Press.
Morrison, V. and Ashby, J. (1995) High resolution rodent bone marrow micronucleus assays of 1,2-dimethylhydrazine: implications of systemic toxicity and individual responders. Mut. Res., to be published.
Papaioannou, V. E. and Festing, M. F. W. (1980) Genetic drift in a stock of laboratory mice. Lab. Anim., 14, 11-13.
Paton, W. (1984) Man and Mouse, 1st edn. Oxford: Oxford University Press.
Rao, G. N., Birnbaum, L. S., Collins, J. J., Tennant, R. W. and Skow, L. C. (1988) Mouse strains for chemical carcinogenesis: overview of workshop. Fund. Appl. Toxicol., 10, 385-394.
Russell, W. M. S. and Burch, R. L. (1959) The Principles of Humane Experimental Technique. Potters Bar: Universities Federation for Animal Welfare.
Shellabarger, C. J., Stone, J. P. and Holtzman, S. (1978) Rat differences in mammary tumor induction with estrogen and neutron irradiation. J. Natn. Cancer Inst., 61, 1505-1508.
Sontag, J. M., Page, N. P. and Saffiotti, U. (1976) Guidelines for Carcinogen Bioassay in Small Rodents. Bethesda: National Cancer Institute.
Straughan, D. W. (1994) First European Committee Report on statistics of animal use. ATLA, 22, 289-292.
Weil, C. S. and Scala, R. A. (1971) Study of intra- and interlaboratory variability in the results of rabbit eye and skin irritation tests. Toxicol. Appl. Pharmacol., 19, 276-360.
Whitehead, A. and Curnow, R. N. (1992) Statistical evaluation of the fixed-dose procedure. Food Chem. Toxicol., 30, 313-324.
Wolff, G. L., Gaylor, D. W. and Blackwell, B.-N. (1991) Bladder and liver tumorigenesis induced by 2-acetylaminofluorene in different F1 mouse hybrids: variation within genotypes and effects of using more than one genotype on risk assessment. J. Toxicol. Environ. Hlth, 33, 327-348.
Young, S. S. (1989) What is the proper experimental unit for long-term rodent studies? An examination of the NTP benzyl acetate study. Toxicology, 54, 233-239.