
If the Normal Distribution Is So Normal, How Come My Data Never Are?

Andrew J. Vickers, PhD

How Normal Is Normal?

One of the first data sets that I looked at when I was learning statistics had a number of
missing observations. I was told that this was totally normal. I also noticed that the main
endpoint followed the bell-shaped curve that is often described as a "normal
distribution." This, I was told, was not normal at all; indeed, one of my lecturers became
rather excited, commenting, "They say it never happens, but look, here is an example,
which just goes to show that you can get a normal curve." I think what they were trying
to tell me was that it wasn't normal to get normal data. Nonnormality seemed to be the
norm, but I couldn't be sure.

The fact that I then decided to become a statistician no doubt raises some interesting
characterological questions, but here I am, and here are some data that I have been
reviewing. This graph shows the distribution of prostate-specific antigen levels in men
undergoing surgery for prostate cancer.

And, just for the sake of it, here are data from a totally different area of medicine: These
are baseline pain scores from a headache trial.

Medicine Changes the Norm

Both graphs look pretty similar to each other, and pretty dissimilar to a normal
distribution. The simple explanation for what is going on here is that medical research
typically involves studying patients with some kind of disease. By definition these
populations are not normal: they have presented for treatment precisely because they
have something wrong. Perhaps this is what was behind my professor's comment about
the rarity of normal distributions: you hardly ever see normal distributions in medical
research because you hardly ever study the normal population as a whole, only unusual
subsets.

A more mathematical way of saying this is that whereas normal processes usually
involve addition (see the previous article in this series, "Why Does Chutes and Ladders
Explain Hemoglobin Levels? Some Thoughts on the Normal Distribution"), disease
processes are often multiplicative in nature. Cancer is a good example: cancer cells
divide and grow and tumors therefore double in size every few months. In the case of
headache, a series of severe headaches leads to a number of changes - such as
increases in anxiety and muscle tension, or overuse of analgesics - that increase the
risk, and severity, of subsequent headaches: I'll have one mild headache, you'll have a
severe headache and then a milder one as a result, so yours are exponentially worse
than mine.
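
The contrast between additive and multiplicative processes can be sketched with a small simulation. This is only an illustration under made-up assumptions: the function names, the number of steps, and the size of each random "shock" are hypothetical and are not taken from any real cancer or headache data. Each simulated person starts at the same value and receives the same kind of random shocks; the only difference is whether each shock is added to the running value or scales it.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the mean cubed z-score (about 0 for symmetric data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def simulate(n_people, n_steps, multiplicative, rng):
    """Each person starts at 1.0 and receives n_steps small random shocks."""
    outcomes = []
    for _ in range(n_people):
        value = 1.0
        for _ in range(n_steps):
            shock = rng.uniform(-0.2, 0.3)  # hypothetical shock size
            if multiplicative:
                value *= 1.0 + shock  # the shock scales the current value
            else:
                value += shock        # the shock is simply added on
        outcomes.append(value)
    return outcomes

rng = random.Random(0)  # fixed seed so the run is repeatable
additive = simulate(5000, 50, multiplicative=False, rng=rng)
multiplicative = simulate(5000, 50, multiplicative=True, rng=rng)

print(f"additive skewness:       {skewness(additive):.2f}")
print(f"multiplicative skewness: {skewness(multiplicative):.2f}")
```

Under these assumptions the additive outcomes should come out roughly symmetric, while the multiplicative outcomes should show a long right tail, which is the shape described for the two graphs above.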

Some high school math: If you want to convert a multiplication into an addition, you use
logarithms. As a simple example, 10 x 100 = 1000. Using logs to base 10, log(10) = 1,
log(100) = 2, and log(1000) = 3, so log(10) + log(100) = log(1000).
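
The arithmetic above can be checked in a few lines. This is a minimal sketch; the helper name `logs_add_up` is made up for illustration. It uses base-10 logs for the worked example and then repeats the check with natural logs, since the identity does not depend on the base.

```python
import math

def logs_add_up(a, b, log=math.log10):
    """Return (log(a) + log(b), log(a * b)) so the two can be compared."""
    return log(a) + log(b), log(a * b)

# The worked example from the text, in base-10 logarithms.
lhs, rhs = logs_add_up(10, 100)
print(lhs, rhs)

# The same identity holds for natural logs (base e).
ln_lhs, ln_rhs = logs_add_up(10, 100, log=math.log)
print(math.isclose(ln_lhs, ln_rhs))
```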

Let's calculate the log of our headache and prostate-specific antigen data and see what
we get. (We'll do what statisticians usually do and take the "natural" log using the
constant known as e.)

Linking Life to Math

These data look like a pretty good approximation to the normal distribution. From this I
would conclude that the rate of cancer growth is normally distributed in patients
undergoing prostatectomy and that there is some normally distributed tendency to
headache in patients with headache disorders.
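
The effect of the log transformation can be sketched on synthetic data. The values below are randomly generated from a log-normal distribution with arbitrary parameters; they are not the prostate-specific antigen or headache data, and the variable names are only illustrative. The sketch compares the skewness of the raw values with the skewness of their natural logs.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: the mean cubed z-score (about 0 for symmetric data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic right-skewed values (arbitrary parameters, NOT the article's data).
raw = [rng.lognormvariate(1.5, 0.8) for _ in range(5000)]

# Natural log (base e) of every value.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"raw:    mean={statistics.fmean(raw):.2f}  "
      f"median={statistics.median(raw):.2f}  skewness={raw_skew:.2f}")
print(f"logged: mean={statistics.fmean(logged):.2f}  "
      f"median={statistics.median(logged):.2f}  skewness={log_skew:.2f}")
```

For data like these, the raw mean should sit well above the raw median, a sign of the long right tail, whereas the mean and median of the logged values should be close together. A skewness near zero is consistent with normality but does not prove it, so a histogram of the logged values remains the more informative check.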

This goes back to something that I have mentioned in previous articles in this series:
Biostatistics is about linking math and biology. I have heard it said that the purpose of
log transformation is to "bring down high values" or to "allow the use of parametric
statistics." However, that is looking at numbers in reference only to other numbers. The
time to use log transformation is when we believe the underlying biological process to
involve multiplication, the growth of cancer being an obvious example.
