
WEAPONS OF MATH PRODUCTION

In theory, if you have the free time, you can calculate any statistic you might need using nothing
more than a pencil and paper. After all, it’s just matrix mathematics. With a lot of data or a
complicated procedure, though, you might need a lot of free time. A generation ago, that’s how
most statistics were calculated. Most people didn’t have computers, or calculators for that matter.
Slide rules … maybe. Now, there is an abundance of hardware and software to ease the tedium.
Having a statistician’s version of Norm Abram’s workshop to use actually makes analyzing data
a lot of fun.
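
To make the "just matrix mathematics" point concrete, here is a minimal Python sketch that fits a straight line by solving the normal equations, beta = (X'X)^-1 X'y, using nothing but list arithmetic. The data values and the helper names (transpose, matmul, inverse_2x2, fit_line) are made up for illustration. Statistical packages typically use more numerically careful methods than this.

```python
def transpose(m):
    """Swap the rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix using the closed-form formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def fit_line(xs, ys):
    """Return (intercept, slope) from the normal equations (X'X)^-1 X'y."""
    X = [[1.0, x] for x in xs]   # a column of ones, then the predictor
    y = [[v] for v in ys]        # the response as a column vector
    Xt = transpose(X)
    beta = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, y))
    return beta[0][0], beta[1][0]


# Made-up data that lie exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

Five data points and two coefficients are manageable by hand. The same arithmetic with thousands of rows and a dozen predictors is where the free time runs out.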

Whether you’re planning a career in statistics or just looking to analyze your current dataset,
you’re going to need software to do the calculations. Yes, there are some people who still
calculate descriptive statistics manually, but this practice is so prone to errors that it’s only
applied to very small datasets. And yes, there are some people who develop their own statistical
routines, usually with R, a programming language for statistics available for free under the GNU
General Public License, or with matrix manipulation software like MATLAB, Maple, and Mathematica.
Unless you’re a mathematical statistician developing a new statistical technique, though, you
won’t need to take this approach if you don’t want to. There’s plenty of software available. All
you need to know is the kind of statistical analyses you’re likely to use and your price range.

[Figure: Be sure you have the resources you need to analyze your data.]

Software for General Statistics

With a few exceptions, almost all of the statistical software you’ll find is geared to the most
common types of statistical analysis, including descriptive statistics, hypothesis testing,
correlation and regression, and analysis of variance. Software used for statistical analysis can be
grouped into five categories:

Web-based Calculators—Web sites that perform simple statistical calculations can be found at
statpages.org/. This is the low end of cost, but also of usability. You usually have to
enter your data and edit it manually, so it’s not really suitable for production work.
Spreadsheets—You probably already have a copy of Microsoft Excel or some other
spreadsheet software on your computer. If you are a beginner at data analysis, you’ll find
that you can accomplish most of what you want to do using spreadsheet software.
Advanced data analysis may be more of an issue, though. Some statisticians advise
against using spreadsheet software, particularly Excel, citing three reasons. First, Excel
doesn’t do some calculations and graphs that statistical packages do. Well, of course it
doesn’t. It’s a spreadsheet program that sells for less than $200 (by itself, not part of
Office) compared to statistical packages that cost ten times as much. Big deal. Second,
Excel’s calculated probabilities are incorrect, reportedly in the third decimal place. OK,
but if you would base a decision solely on whether a probability is 0.051 instead of 0.049,
you really don’t understand the nature of statistical testing (more on this in another blog).
And third, Excel’s random number generators are not of research quality. Yup, so if
you’re planning to do Monte Carlo simulations with Excel … well, don’t (not necessarily
because your answer will be wrong as much as because some people will think it is
wrong).
Basic Statistical Software—This category includes software that is used mainly for less
sophisticated types of statistical analysis. Most can be purchased for less than about $500.
Key examples include StatsDirect, InStat, Analyse-it, and Assistat.
Intermediate Statistical Software—This category includes software that can be used for
many types of statistical analysis except some of the more sophisticated techniques like
multivariate analysis. Most but not all are a single module and cost less than about
$1,000. Examples include NCSS, Statistix, Costat, Origin, Prostat, Soritec, MVSP, and
Simstat.
Major Statistical Packages—This category includes software that can be used for a
variety of purposes. Most have a base module and a variety of optional add-on modules.
They are usually purchased through annual licenses specifying a number of users, and
cost more than about $1,000 (in some cases, way over). Some of the major packages like
SAS and SPSS have been around since the mainframe days of the 1960s. Others like
Statistica are products of the 1980s development of personal computers. Other examples
include S-Plus, Stata, Systat, Minitab, and Statgraphics.
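
As a rough illustration of the Monte Carlo point in the spreadsheet discussion, here is a minimal Python sketch of a simulation driven by a seeded random number generator. The event being simulated (two dice summing to 10 or more), the number of trials, and the seed are all assumptions chosen for illustration. The function name estimate_probability is hypothetical.

```python
import math
import random


def estimate_probability(trials, seed):
    """Estimate P(sum of two dice >= 10) by simulation."""
    rng = random.Random(seed)  # seeded so the run can be repeated exactly
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total >= 10:
            hits += 1
    p_hat = hits / trials
    # Binomial standard error of the estimated proportion.
    std_err = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, std_err


p_hat, std_err = estimate_probability(trials=100_000, seed=2010)
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 10 or more
print(f"estimate={p_hat:.4f} standard_error={std_err:.4f} exact={exact:.4f}")
```

Even with 100,000 trials the standard error is on the order of 0.001, so the third decimal place of a simulated probability is noise. Recording the seed lets someone else reproduce the run, which helps when people question the generator.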

Data analysis programs typically have spreadsheet screens for data because statistical
calculations use matrices, and after all, a spreadsheet is really just a matrix. They also have
utilities for both data management and graphing, which are essential for any type of data
analysis. Nearly all statistical software has a graphical user interface (GUI), and many packages
also let you write your own code for specialized applications. Almost all have downloadable demos,
usually fully functional (at least for basic statistics) for 30 days.
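
The idea that a spreadsheet is really just a matrix can be sketched in a few lines of Python. In this sketch each row is one observation and each column is one variable, so a descriptive statistic is a calculation down a column. The variable names and values are invented for illustration.

```python
import statistics

# Hypothetical data sheet: one row per observation, one column per variable.
header = ["height_cm", "weight_kg"]
rows = [
    [170.0, 65.0],
    [182.0, 80.0],
    [165.0, 58.0],
    [175.0, 72.0],
]

# Transpose the rows so that each variable becomes a single sequence.
columns = list(zip(*rows))

# Descriptive statistics, computed column by column.
summary = {
    name: {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }
    for name, values in zip(header, columns)
}

for name, stats in summary.items():
    print(name, stats)
```

This is roughly what happens behind a menu pick for descriptive statistics, minus the data management and graphing that make a real package worth having.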

To conduct an analysis with statistical software, you enter or upload your data, scrub it (a whole
other discussion), then pick from the program’s menus the graphing or analysis procedure you
want to run. Submenus will pop up with all the specifications and options for the procedure. So,
it’s quite easy to do a lot of statistical analyses with just a few mouse clicks, but you really have
to understand what all those specifications and options are about.

All of the software packages have their fans, especially the major packages. SPSS was created in
the 1960s by graduates of Stanford who continued development at the University of Chicago. It
used to be called Statistical Package for the Social Sciences, which is why it’s still very popular
in the social sciences. SPSS was bought by IBM in 2009. SAS, formerly called the Statistical
Analysis System, was developed in the early 1970s by professors at North Carolina State
University. S-Plus started out as S, a programming language developed at Bell Laboratories in the
1970s. Minitab was created by professors at Pennsylvania State University in the 1970s from
statistical spreadsheet software developed at the National Institute of Standards and Technology
(NIST). It’s now focusing on Six Sigma statistics procedures for managing quality.

There is no real best statistical software. They’re all pretty good, dollar-for-dollar. A lot of what
determines a user’s preference is what software is (was) available at their college or the place
they work. For example, if you go (went) to Penn State, you probably think Minitab is the best. If
you work at a pharmaceutical company, you probably use SAS because that’s what the entire
pharmaceutical industry uses. Social scientists like to use SPSS. If you like programming your
own procedures you’re probably a proponent of the R programming language for statistics.

Assuming you don’t have access to software through your school or work, you can evaluate your
software needs by answering three questions:

How sophisticated are the statistical techniques you need to use?
How often would you likely need to use the software?
How much do you have to spend for the software?

If you are planning on doing only one analysis, see if you can use what you have. You may be
able to do all your calculations in a spreadsheet program or use free software or web-based
software. If you are going to do full-time statistical consulting and you can’t afford a license for
a major package, bite the bullet and learn R. Another option would be to buy a basic or an
intermediate package and move up as you can afford to. If you’re only going to be an occasional
user, any of the statistical packages will be better than using a spreadsheet (except perhaps for
dataset scrubbing), so purchase whatever you can afford.
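
The advice above can be boiled down to a small decision rule. The Python sketch below is only one reading of it. The function name suggest_software and the usage labels are hypothetical, the $500 and $1,000 cutoffs are the approximate price tiers quoted earlier, and the zero-budget case for occasional users is an assumption, since the advice there is just to buy whatever you can afford.

```python
def suggest_software(usage, budget_usd):
    """Return a rough suggestion based on the rules of thumb in the text.

    usage is one of "one_time", "occasional", or "full_time" (labels assumed
    for this sketch); budget_usd is what you can spend on software.
    """
    if usage not in ("one_time", "occasional", "full_time"):
        raise ValueError(f"unknown usage: {usage!r}")

    # A single analysis rarely justifies buying anything.
    if usage == "one_time":
        return "use what you have: a spreadsheet, free software, or a web-based calculator"

    # Major packages start at about $1,000 and can cost far more.
    if budget_usd > 1000:
        return "major statistical package"

    # Full-time work without the budget for a major package.
    if usage == "full_time":
        return "learn R, or buy a basic or intermediate package and move up later"

    # Occasional users: buy the best tier the budget reaches.
    if budget_usd >= 500:
        return "intermediate statistical package"
    if budget_usd > 0:
        return "basic statistical package"
    return "free software such as R"  # assumption: nothing to spend


print(suggest_software("occasional", 750))
print(suggest_software("full_time", 200))
```

Treat the output as a starting point for a web search, not a purchasing decision. The first question, how sophisticated your techniques need to be, still has to be checked against each package’s list of procedures.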

If you aren’t acquainted with statistical software, conduct a web search or start at
en.wikipedia.org/wiki/List_of_statistical_packages. Explore the web sites you find to be sure that
the software has the statistical procedures you think you will be using. Almost all of the sites
have free downloads, such as brochures, white papers and demonstration software. Don’t
download the demo software until you’re ready to make a decision. Most demos are good for
only 30 days, after which the software won’t work even if you download a new copy.

Software for Specialized Applications

There are a few kinds of analysis you might run into that will require specialized software. For
example, have you ever seen an icon plot using sparklines or Chernoff faces? How about a
ternary diagram or a Piper plot? Some day you may have to produce one of these specialized
graphics. Software you could look into includes SigmaPlot, Origin, AquaChem,
GraphPad, EasyPlot, DeltaGraph, and Grapher.

If you ever have to do time-series analysis, you could start with some of the high-end statistical
packages. Or, you could look into specialized software including Autobox, EViews, ForecastX,
and RATS. If you have to produce maps, find a GIS expert to help you. If you’re committed to
doing it yourself, try Surfer. If you’re not into meteorology or geology, you probably don’t run
into orientation data very often, but if you ever do, get Oriana. For critical-path scheduling, try
Microsoft Project or P5, an update to Primavera Project Planner, now a product of Oracle.
There’s also software for resampling statistics, control charts, ANOVA, neural networks,
nonparametric statistics, power analysis, Bayesian statistics, data mining and many other
specialties.

The software market changes rapidly. The big packages keep getting bigger, spawning optional
modules from procedures that used to be part of the basic package. At the same time, new
statistical software appears, usually for specialized applications. Spreadsheet software is also
becoming more sophisticated. Introductory statistics classes are now taught with spreadsheet
software; even calculators are a thing of the past. So do some research and get the software that’s
best for your situation.


http://statswithcats.wordpress.com/2010/06/27/weapons-of-math-production/
