
Institutional Animal Care and Use Committee

The University of Tennessee


How Many Animals Do I Need?
Arnold M. Saxton, Animal Science
Size of animal experiments is a delicate balance. Use too many animals and you are wasting
resources and needlessly exposing animals to potential harm. If too few are used, experimental
results will not be clear cut, again wasting animal resources unless the experiment can be
enlarged by collecting more data.
Size of an experiment involves five quantities:
1. V, the variability among observations within a treatment.
2. D, the magnitude of treatment differences.
3. a, the chance of incorrectly detecting a treatment difference when none exists (a false
positive).
4. b, the chance of failing to detect a treatment difference that does exist (a false
negative). Power=1-b is the chance of correctly detecting a treatment difference.
5. N, the number of observations.
Typically scientists will choose N to give an 80% chance (Power=0.80) of detecting the
difference D, if it truly exists, with no more than a 5% chance (a=0.05) of declaring a
difference that is not real, assuming V variability. Note this statement uses all five quantities.
Experiment size will be smaller if:
1. V is decreased. This is the reason for good experimental technique, reducing
measurement error, working with well defined populations of uniform animals, and
controlling known sources of variability by statistical design techniques such as blocking
and covariates. See papers by MFW Festing: ILAR J. (2002) 43:244-258 and
Vet. Anaesth. Analg. (2003) 30:59-61.
2. D is larger. This generally is not under researcher control, but as treatment differences
get smaller, experiments must be larger.
3. a is made larger. However, P<.05 is generally the largest error rate that is scientifically
accepted.
4. b is made larger, that is, power is reduced. However, statistical power of 80% is a commonly
quoted minimum.
Would you pay for an experiment that only had a 50% chance of detecting an important
treatment difference? Should we choose scientific knowledge based on flipping coins?
These considerations show that controlling variance is the best option for reducing sample size.
Festing also mentions experimental approaches that use fewer animals than the traditional
comparison of groups of animals on different treatments.
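
The direction of each effect listed above can be checked numerically. The sketch below is not taken from this document: it assumes the standard textbook normal approximation for a two-sided comparison of two means, n = 2*(z[1-a/2] + z[Power])^2 * V/(D*D) per treatment. The function name and the values V=64 and D=10 are illustrative only, and a t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_treatment(V, D, a=0.05, power=0.80):
    """Approximate observations per treatment for comparing two means.

    Normal approximation (an assumption of this sketch, not the document's formula):
    n = 2 * (z_{1-a/2} + z_{power})^2 * V / D^2
    """
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - a / 2) + z(power)) ** 2 * V / D ** 2)


base = n_per_treatment(V=64, D=10)                      # a=0.05, power=0.80
less_variable = n_per_treatment(V=32, D=10)             # V halved
bigger_effect = n_per_treatment(V=64, D=20)             # D doubled
looser_alpha = n_per_treatment(V=64, D=10, a=0.10)      # a made larger
lower_power = n_per_treatment(V=64, D=10, power=0.50)   # b made larger

print(base, less_variable, bigger_effect, looser_alpha, lower_power)
```

Each of the four changes gives a smaller n than the base case, but only the first two do so without giving up statistical rigor.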
These five quantities are connected by complex formulas that change depending on the type of
data (continuous, binary, etc.) and the type of question (differences in means is just one of
many). A very rough approximation can be obtained from
N = 25*V/(D*D).
Suppose a researcher wants to detect a difference in mouse body weight, and anticipates the
control group will weigh 40g, the treatment group will weigh 50g (D=50-40), and the CV will be
20%. CV, the coefficient of variation, is std. deviation (s) divided by the mean, so
s = (20%)*40g = 8g.
V is simply s squared, giving N = 25*8*8/(10*10) = 16 animals per treatment.
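
The arithmetic of this worked example can be reproduced with a few lines of Python. This is a minimal sketch of the rough approximation N = 25*V/(D*D) only; the function name and argument names are made up for illustration.

```python
def rough_n(mean_control, mean_treatment, cv):
    """Rough observations per treatment from N = 25*V/(D*D)."""
    s = cv * mean_control              # std. deviation from the CV and the control mean
    V = s * s                          # variance is s squared
    D = mean_treatment - mean_control  # anticipated treatment difference
    return 25 * V / (D * D)


# Mouse body weight example: control 40g, treatment 50g, CV 20%.
n = rough_n(40, 50, 0.20)
print(round(n))
```

In practice a fractional result would be rounded up to the next whole animal.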

It is recommended that at a minimum, researchers give anticipated means and a measure of
variability for Question E5 in the IACUC protocol form. This will allow reviewers to more
objectively assess the proposed size of experiments, using the above approximation.
For more accuracy, however, use of a sample size computer program is recommended. Web-based
versions are convenient. The one at
http://www.stat.uiowa.edu/~rlenth/Power/index.html
is used here as an illustration.
1. Go to the above URL, and select the Two-sample t test for comparing two means.
2. Sigma1 and sigma2 are in the upper left. These are std. deviations for the two treatments,
set equal by default. Set them to 8.
3. The a value is on the right (.05 by default), and below that the true difference can be set.
Set D to 10.
4. Change the sample sizes at lower left, and see how power changes. The approximation
gave a sample size of 16, which has a power of 93%. The experiment could be reduced
to 12 animals per treatment and have at least 80% power.
5. You can also change power values, and see how sample sizes change.
A web search for "power sample size" will provide many other calculators.
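
If no calculator is at hand, the power values can be approximated offline. The sketch below is not the method used by the web calculator: it assumes a normal distribution in place of the t distribution, so it runs slightly high for small samples. The function name is hypothetical. With std. deviation 8 and D=10 it returns about 0.94 for 16 animals per treatment, close to the 93% quoted above, and stays above 0.80 for 12 animals per treatment.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n, s, D, a=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    n: observations per treatment; s: common std. deviation;
    D: true difference in means. Normal approximation to the t test.
    """
    nd = NormalDist()
    se = s * sqrt(2 / n)                 # std. error of the difference in means
    z_crit = nd.inv_cdf(1 - a / 2)       # two-sided critical value
    # Chance of rejecting in the direction of the true difference;
    # the negligible opposite tail is ignored.
    return nd.cdf(abs(D) / se - z_crit)


for n in (8, 12, 16, 20):
    print(n, round(approx_power(n, s=8, D=10), 2))
```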
In a complex experiment, with many sub-experiments and treatments, does sample size need to
be calculated for every combination? The calculations above theoretically only need to be done
once for the worst-case scenario, where variability is highest and treatment difference is lowest.
But this would produce excessive use of animals in some treatments, so a design that allows
unequal sample sizes for different treatments might be considered. Then sample size
calculations would have to be repeated for each unequal sample size allowed.
If all sub-experiments are connected through the use of common animals or tissues, then only the
"weakest link" needs to be considered. If stage 3 of the experiment needs tissue from 10
animals, obviously stages 1 and 2 that lead to stage 3 will need 10 animals, even if sample size
calculations suggest 4 animals are sufficient in stages 1 and 2. Again, identify the situation that
has smallest treatment difference and largest variance, and that will dictate sample size of the
experiment.
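
The "weakest-link" rule amounts to taking the largest requirement across connected stages. A minimal sketch, using the numbers from the example above (the stage labels and variable names are illustrative):

```python
# Animals required by each connected stage: calculations suggest 4 for
# stages 1 and 2, while stage 3 needs tissue from 10 animals.
required = {"stage 1": 4, "stage 2": 4, "stage 3": 10}

# The most demanding stage dictates the number of animals for the whole chain.
limiting_stage = max(required, key=required.get)
animals_needed = required[limiting_stage]

print(limiting_stage, animals_needed)
```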
As a final example, suppose researchers intend to use Fisher's Exact Test to compare two
percentages. They want to be 90% sure of detecting a true difference between percentages of
70% and 80%, at the 5% significance level.
Go to http://calculators.stat.ucla.edu/powercalc/ and choose Fisher's, and "Sample size for a
given power." Fill in the form with 0.70, 0.80, 2-sided test, 0.05 and 0.90 power, and the sample
size required is about 400 animals per treatment group. Percentage data generally require large
experiments.
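
The size of this result can be sanity-checked without the web form. The sketch below does not implement Fisher's Exact Test: it assumes the common normal-approximation formula for comparing two proportions with equal group sizes, which gives a figure just under 400 per group. An exact-test calculation would be expected to ask for somewhat more. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_two_proportions(p1, p2, a=0.05, power=0.90):
    """Approximate observations per group to compare two proportions.

    Two-sided normal approximation with equal group sizes
    (an assumption of this sketch, not Fisher's Exact Test).
    """
    z = NormalDist().inv_cdf
    p_bar = (p1 + p2) / 2
    term_null = z(1 - a / 2) * sqrt(2 * p_bar * (1 - p_bar))  # spread under no difference
    term_alt = z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # spread under the true difference
    return ceil((term_null + term_alt) ** 2 / (p1 - p2) ** 2)


n = n_two_proportions(0.70, 0.80)
print(n)
```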
