In theory, if you have the free time, you can calculate any statistic you might need using nothing
more than a pencil and paper. After all, it’s just matrix mathematics. With a lot of data or a
complicated procedure, though, you might need a lot of free time. A generation ago, that’s how
most statistics were calculated. Most people didn’t have computers, or calculators for that matter.
Slide rules … maybe. Now, there is an abundance of hardware and software to ease the tedium.
Having a statistician’s version of Norm Abram’s workshop at your disposal actually makes analyzing
data a lot of fun.
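To make the "it’s just matrix mathematics" point concrete, here is a minimal sketch of the pencil-and-paper route for a simple linear regression. The least-squares coefficients come from the normal equations, b = (XᵀX)⁻¹Xᵀy, worked out with plain Python lists so every step is visible. The data values and function names are hypothetical and are there only for illustration. This is not how any particular package does it internally.

```python
# Ordinary least squares for y = b0 + b1*x via the normal equations,
# b = (X^T X)^(-1) X^T y, done with plain lists as you might on paper.
# The data and helper names below are hypothetical, for illustration only.

def transpose(m):
    # Swap rows and columns.
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    # Multiply matrix a (n x k) by matrix b (k x p).
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)] for row in a]

def inverse_2x2(m):
    # Closed-form inverse of a 2 x 2 matrix.
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def ols_fit(xs, ys):
    X = [[1.0, x] for x in xs]   # design matrix with an intercept column
    Y = [[y] for y in ys]        # response as a column vector
    Xt = transpose(X)
    beta = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, Y))
    return beta[0][0], beta[1][0]  # (intercept, slope)

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = ols_fit(xs, ys)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With five points this is easy by hand. With thousands of rows and a dozen predictors, the same arithmetic is exactly the tedium that statistical software takes off your plate.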
With a few exceptions, the statistical software you’ll find is geared to the most
common types of statistical analysis, including descriptive statistics, hypothesis testing,
correlation and regression, and analysis of variance. Software used for statistical analysis can be
grouped into five categories:
Data analysis programs typically have spreadsheet screens for data because statistical
calculations use matrices, and after all, a spreadsheet is really just a matrix. They also have
utilities for both data management and graphing, which are essential for any type of data
analysis. Almost all statistical software has graphical user interfaces (GUIs), and many packages
also let you write your own code for specialized applications. Most have downloadable demos,
usually fully functional (at least for basic statistics) for 30 days.
To conduct an analysis with statistical software, you enter or upload your data, scrub it (a whole
other discussion), then pick from the program’s menus the graphing or analysis procedure you
want to run. Submenus will pop up with all the specifications and options for the procedure. So
it’s quite easy to run a lot of statistical analyses with just a few mouse clicks, but you really have
to understand what all those specifications and options are about.
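If you end up writing code instead of clicking menus (in R, say, or a general-purpose language), the enter, scrub, and analyze sequence looks roughly like the sketch below. It uses only Python’s standard library and is not how any particular package works. The raw values, the scrubbing rule (drop any row with a missing or non-numeric entry), and the function names are assumptions for illustration.

```python
import math
import statistics

# Step 1: "enter or upload" the data. Here it is typed in directly;
# None and "n/a" stand in for the kinds of gaps real data have.
# These values are hypothetical.
raw = [
    ("12.5", "30.1"),
    ("14.0", "33.8"),
    (None,   "29.0"),
    ("15.2", "n/a"),
    ("16.8", "40.2"),
    ("18.1", "43.5"),
]

# Step 2: scrub. Keep only rows where both values parse as numbers
# (an assumed rule, equivalent to listwise deletion).
def scrub(rows):
    clean = []
    for x, y in rows:
        try:
            clean.append((float(x), float(y)))
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric entries
    return clean

# Step 3: run the chosen procedures: descriptive statistics and
# a Pearson correlation computed from its definition.
def pearson_r(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

data = scrub(raw)
xs = [x for x, _ in data]
ys = [y for _, y in data]
print(f"n = {len(data)}")
print(f"mean x = {statistics.mean(xs):.2f}, sd x = {statistics.stdev(xs):.2f}")
print(f"mean y = {statistics.mean(ys):.2f}, sd y = {statistics.stdev(ys):.2f}")
print(f"r = {pearson_r(xs, ys):.3f}")
```

Even this toy version shows why the options matter. Dropping incomplete rows is itself a choice, and a different rule for handling missing values would change every number printed at the end.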
All of the software packages have their fans, especially the major packages. SPSS was created in
the 1960s by graduates of Stanford who continued development at the University of Chicago. It
used to be called Statistical Package for the Social Sciences, which is why it’s still very popular
in the social sciences. SPSS was bought by IBM in 2009. SAS, formerly called the Statistical
Analysis System, was developed in the early 1970s by professors at North Carolina State
University. S-Plus grew out of S, a programming language developed at Bell Laboratories in the
1970s and 1980s. Minitab was created by professors at Pennsylvania State University in the 1970s
from statistical spreadsheet software developed at the National Institute of Standards and Technology
(NIST). Minitab now focuses on Six Sigma statistics procedures for managing quality.
There is no real best statistical software. They’re all pretty good, dollar-for-dollar. A lot of what
determines a user’s preference is what software is (was) available at their college or the place
they work. For example, if you go (went) to Penn State, you probably think Minitab is the best. If
you work at a pharmaceutical company, you probably use SAS because that’s what the entire
pharmaceutical industry uses. Social scientists like to use SPSS. If you like programming your
own procedures, you’re probably a proponent of the R programming language for statistics.
Assuming you don’t have access to software through your school or work, you can evaluate your
software needs by answering three questions:
If you are planning on doing only one analysis, see if you can use what you have. You may be
able to do all your calculations in a spreadsheet program or use free software or web-based
software. If you are going to do full-time statistical consulting and you can’t afford a license for
a major package, bite the bullet and learn R. Another option would be to buy a basic or an
intermediate package and move up as you can afford to. If you’re only going to be an occasional
user, any of the statistical packages will be better than using a spreadsheet (except perhaps for
dataset scrubbing), so purchase whatever you can afford.
If you aren’t acquainted with statistical software, conduct a web search or start at
en.wikipedia.org/wiki/List_of_statistical_packages. Explore the web sites you find to be sure that
the software has the statistical procedures you think you will be using. Almost all of the sites
have free downloads, such as brochures, white papers and demonstration software. Don’t
download the demo software until you’re ready to make a decision. Most demos are good for
only 30 days after which the software won’t work even if you download a new copy.
There are a few kinds of analysis you might run into that will require specialized software. For
example, have you ever seen an icon plot using sparklines or Chernoff faces? How about a
ternary diagram or a Piper plot? Someday you may have to produce one of these specialized
graphics. Software worth looking into includes SigmaPlot, Origin, AquaChem,
GraphPad, EasyPlot, DeltaGraph, and Grapher.
If you ever have to do time-series analysis, you could start with some of the high-end statistical
packages. Or, you could look into specialized software including Autobox, EViews, ForecastX,
and RATS. If you have to produce maps, find a GIS expert to help you. If you’re committed to
doing it yourself, try Surfer. If you’re not into meteorology or geology, you probably don’t run
into orientation data very often, but if you ever do, get Oriana. For critical-path scheduling, try
Microsoft Project or P5, an update to Primavera Project Planner, now a product of Oracle.
There’s also software for resampling statistics, control charts, ANOVA, neural networks,
nonparametric statistics, power analysis, Bayesian statistics, data mining and many other
specialties.
The software market changes rapidly. The big packages keep getting bigger, spawning optional
modules from procedures that used to be part of the basic package. At the same time, new
statistical software appears, usually for specialized applications. Spreadsheet software is also
becoming more sophisticated. Many introductory statistics classes are now taught with spreadsheet
software; even calculators are becoming a thing of the past. So do some research and get the
software that’s best for your situation.
http://statswithcats.wordpress.com/2010/06/27/weapons-of-math-production/