
IT’S ALL GREEK

When Humpty Dumpty uses a word, it means just what he chooses it to mean, neither more nor
less. To people not conversant in a technical specialty, it seems that all the experts are Humpty
Dumptys. Statistics is no exception. If you’re a beginner at data analysis, it will seem like there is
a superabundance of esoteric statistical slang. You’ll hear it even from friendly statisticians. It
gets worse when you start reading websites, books, and,
worst of all, journal articles. If you want to see what I
mean, read some of the article titles in the Journal of the
American Statistical Association (at
http://pubs.amstat.org/loi/jasa). The statisticians who
write those obfuscatory tracts know they won’t be
around for you to slap them silly for using
“autoregressive stationarity” in a sentence.
To simplify statistical jargon, think of three categories: statistical concepts named after
someone, special words created to convey a special meaning, and common words and phrases
with alternative meanings. We’ll leave the acronyms out of it for now.

[Image caption: It doesn’t look like a mouse to me.]

Named Things
Statistical procedures, especially statistical tests, are often modified to accommodate some
special circumstance or to have some desirable property. When this occurs, the new procedure is
commonly named after the originators. Thus, there are statistical tests named after Dixon, Tukey,
Wilcoxon, Scheffé, Kolmogorov, Fisher, Levene, Hotelling, Dunnett, and Bonferroni. And those
are just some well-known ones. Dig into the literature, and you’ll find scores more.
It’s not just tests that get named. Bayesian statistics is a branch of statistics based on Bayes’
Theorem, formulated in the 1700s by Reverend Thomas Bayes. Kriging, the interpolation
algorithm of geostatistics, was named after Danie Krige, a South African mining engineer who
pioneered the field in the 1950s. The Normal distribution is also called the Gaussian distribution
after Carl Friedrich Gauss, who introduced it in 1809, and, less often, the Laplace-Gauss
distribution after Pierre-Simon Laplace, who showed in 1810 that the distribution was the basis
for the central limit theorem. There are also theoretical frequency distributions named after
Benford, Weibull, Rayleigh, Cauchy, Poisson, and Bernoulli.
If someone mentions a named distribution, test, or other statistical procedure, don’t panic.
Nobody knows everything. Just ask what the distribution or procedure is supposed to do. If you
took an introductory course in statistics and know about probability, the Normal distribution, and
hypothesis testing, you’re in great shape for understanding most of the named stat terms you
might run into. This type of statistical jargon could be much worse. When biologists name
something after someone, they do it in Latin.

Created Words
Some statistical jargon might just as well be a foreign language because the words have no
common meaning in the English language outside of statistics (or math). Examples of such
words include: kurtosis, leptokurtic, platykurtic, skewness, covariance, autoregressive,
variogram, logit, probit, eigenvalue, median, outlier, stationarity, winsorizing, communality,
multicollinearity, and my personal favorite, homoscedasticity. If you’re at a bar and you hear any
of these words being bandied around, slip quietly out the door and run for your life. Any
statistician who uses these words with innocent civilians without explanation either doesn’t
understand his or her audience or is a sadist.

Alternative Meanings
The most confusing statistical jargon just might be words in most people’s everyday vocabulary
that have a very different statistical meaning. For example, when you hear the word mean, your
mind has to sort out the word’s connotation. It can signify to intend, as in say what you mean. It
can be used to associate, as in spring means flowers. It can refer to resources or methods, as in by
any means. It can indicate character, as in she has a mean streak. It can imply exceptional skill,
as in he has a mean fastball. And of course, in statistics, mean means average. If you don’t
realize that some words in English have different meanings in statistics, you can get confused
very quickly. I’ve had well-meaning report editors change median to medium and nonsignificant
to insignificant.
Here are a few more examples:

| Word | Meaning to a Statistician | Meaning to a Nonstatistician |
|---|---|---|
| bagging | A method for combining predictions from many data mining models | What the cashier does with your groceries when you’re done paying |
| blocking | A technique for controlling variation in ANOVA | What the offensive line does during football season |
| brushing | Interactively selecting data points on an on-screen graph to access other information associated with the point | What you do with your toothpaste and toothbrush |
| breakdown | Splitting data into groups to calculate descriptive statistics and correlations | What happens to your car when you’re in a hurry to get somewhere |
| censoring | Data with a real but undetermined value, usually less than or greater than all other values in a dataset | Restricting free speech; removing material considered to be offensive from books or other media |
| confidence | Absence of type I errors | Ego stability |
| discriminate | Classify observations by a statistical model; a good thing | To make distinctions based on race, creed, ethnicity, age or other category without regard to individual merit; a bad thing |
| errors | Differences between observed values and values predicted from a statistical model; residuals | Mistakes |
| mode | The most frequently appearing number in a set of numbers | A manner of acting, such as being in “relaxation mode” |
| Monte Carlo | A simulation procedure for evaluating the properties or performance of a statistic | The quarter of Monaco known for its resorts and casinos; a hotel in Las Vegas |
| Normal | Follows a Gaussian (bell-shaped) distribution | Typical, routine, sane |
| residuals | Differences between observed values and values predicted from a statistical model; errors | Money made by musicians and actors when their works are replayed |
| sample | An individual observation or multiple observations that are part of a population | A piece, a bit, a taste |
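
If the statistician's column still reads like Greek, a few of those meanings can be shown with toy numbers. The sketch below is only an illustration, not anything from a real analysis: the data values and the helper name simulate_sample_means are made up, and the "statistical model" is assumed to be a plain least-squares line fitted by hand.

```python
import random
import statistics

# Hypothetical numbers, invented purely for illustration.
values = [2, 3, 3, 5, 7, 3, 8]

# "mode": the most frequently appearing number in a set of numbers.
most_common = statistics.mode(values)  # 3 appears three times here

# "errors" / "residuals": observed values minus the values a model predicts.
# The model assumed here is an ordinary least-squares straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# "Monte Carlo": simulate many samples to see how a statistic behaves.
def simulate_sample_means(n_samples, sample_size, seed=0):
    """Draw repeated samples from a standard Normal and return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


if __name__ == "__main__":
    print("mode:", most_common)
    print("residuals:", [round(r, 3) for r in residuals])
    means = simulate_sample_means(n_samples=2000, sample_size=25)
    print("spread of simulated means:", round(statistics.stdev(means), 3))
```

The spread of the simulated means should land near 1 divided by the square root of 25, or 0.2, which is the kind of property a Monte Carlo run is meant to check. None of this involves groceries, football, or casinos.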

Don’t feel that you’re alone in the quagmire of statistical jargon. Like dialects of the English
language, different statistical specialties have their own jargon and ways of expressing ideas.
Data mining, time-series forecasting, quality control, nonlinear modeling, biometrics,
econometrics, and geostatistics are all examples of statistical specialties that use terms not used
in the other specialties. Imagine a Louisiana Cajun talking to a Pennsylvania Dutch. They both
speak dialects of English, but it might as well be Greek.

Join the Stats With Cats group on Facebook.

http://statswithcats.wordpress.com/2010/07/03/it%E2%80%99s-all-greek/
