
STAT582 HW7

Review of Two Statistical Software Packages: Minitab and SPSS
Yan Sun

As a statistician, have you ever been stuck in front of your computer, trying to figure out the correct syntax of a command to type into the little programming window, and just could not get it right? At that moment, I am sure you wished there were some magic easy button that you could click to make things work the way they should. Well, magic does not happen every day. However, better choices can make life easier. Instead of relying on typed command lines, some statistical software packages are much easier to use because they have a menu-driven interface. Software of this kind is like a well-organized control panel: each thing you need to do is controlled by a button somewhere on the panel. Once you are familiar with the layout of the panel, the actual work can be quite enjoyable. Several good menu-driven statistical packages are available, and Minitab and SPSS are among the most widely used. This report serves as an introduction to these two packages. For each of them, I discuss the software's specialties, advantages, and suitability, and introduce some important functionalities, how they are carried out, and how to program in the package. The report also lists resources that I personally found very helpful in learning and using the two packages.

1. Minitab
History
Minitab was originally developed by Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner at Pennsylvania State University in 1972. Today it is a commercial product distributed by Minitab Inc. The latest version of the software is Minitab 15, which was released in 2008 (1). This report is based on that version.
Specialties and advantages
Compared with other statistical software, Minitab has several very attractive advantages:
(1) Easy to learn and easy to use. You do not need to memorize a complicated programming language to work with Minitab. All regular statistical functionalities can be performed with one or several clicks in the pull-down menus. In addition, the menus are organized intuitively, so it is not hard to remember where to find what. All these features make Minitab very accessible to first-time

learners. Many statistics educators in higher education prefer Minitab as the main software for teaching intermediate and some advanced statistics courses (2).
(2) Quality control functionalities. A large number of statistical functionalities can be performed in Minitab, ranging from basic statistics to much more complicated multivariate analysis. However, what makes Minitab stand out among statistical software is its strength in statistical quality control. Minitab is equipped with almost all of the widely used tools for process control, including analysis methods, graphics, and design of experiments. In fact, Minitab is the leading software package used by quality improvement professionals in all kinds of industries around the world. According to the Minitab Inc. website (3), its clients include GE, TOSHIBA, Bank of America, and SAMSUNG. In addition, Minitab has two complementary software packages, Quality Trainer and Quality Companion, that further enhance its strength in quality improvement.
(3) Nicer graphing output. Most Minitab users are impressed by the variety and quality of the graphs generated by the software. Minitab can produce many kinds of statistical graphs, and they are very easy to edit and customize. The quality of the graphs is superior to that of many other packages (see figure 2 for an example).
Suitability
For the most sophisticated statistical computation, Minitab is not as powerful as packages such as Stata and R. So for academic research that involves intensive and very complicated statistical computation and analysis, Minitab might not be the right choice. However, for most work in education, research, business, and industrial process control that requires intermediate or some advanced statistical analysis, Minitab is usually fully capable of meeting users' needs.
Functionalities
The next part of the report focuses on some of the major functionalities in Minitab and how to carry them out.
(1) Data importation and general data manipulation. Minitab stores data in worksheets. To create a worksheet, you can type in data directly or import existing data (File>Open). Data in text and Excel formats can be imported directly into a Minitab worksheet. Conveniently, if you check the merge option when you import a dataset into an existing worksheet, the new data are placed side by side with the old data to create a merged dataset (figure 1a). Data in a Minitab worksheet are very easy to edit and manipulate. Rows or columns can be deleted by selecting them, right-clicking, and choosing delete cells; missing values can be filled in directly; and the columns and rows of a dataset can be transposed through Data>Transpose columns. Because Minitab has a menu-driven interface, you do not need to write code to edit data. Most of the data manipulation functionalities are in the Data pull-down menu.
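For readers who think in code, the two worksheet operations described above (merging side by side and transposing) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Minitab implements it: the function names, the dictionary-of-columns layout, and the example values are all hypothetical.

```python
def merge_side_by_side(old, new):
    """Return a worksheet holding the columns of `old` followed by those of `new`."""
    overlap = set(old) & set(new)
    if overlap:
        raise ValueError(f"duplicate column names: {sorted(overlap)}")
    return {**old, **new}


def transpose(rows):
    """Swap rows and columns of a rectangular table given as a list of rows."""
    return [list(column) for column in zip(*rows)]


# Hypothetical worksheets: each maps a column name to its list of values.
old_sheet = {"alcohol": [10, 20, 30]}
new_sheet = {"ignition": [0.80, 0.85, 0.95]}

merged = merge_side_by_side(old_sheet, new_sheet)
print(merged)

# Row-wise view of the merged worksheet, then its transpose.
rows = [list(row) for row in zip(*merged.values())]
print(transpose(rows))
```

The sketch assumes both worksheets have the same number of rows; how Minitab itself handles columns of unequal length is not covered here.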

(2) Simple statistics and testing. Most of the statistical functionalities are in the Stat pull-down menu. For instance, the basic statistics sub-menu contains Display descriptive statistics, 1-sample z test, 2-sample t test, and so on; the tables sub-menu contains the chi-square test; and in the power and sample size sub-menu, you can easily calculate sample size or power. During calculation and testing, you can specify parameters by choosing options in dialog boxes. Figure 1b shows the descriptive statistics options you can choose when analyzing your data.

Figure 1a. Old and new data can be merged by checking the merge option in the open worksheet window. Figure 1b. Options for descriptive statistics in Minitab.
(3) Statistical analysis using different models. Minitab can analyze data using different models. Under the Stat menu, you can choose among different methods to analyze your data, including Regression, ANOVA, Multivariate, and Time series. After selecting a method, you specify the model you want by selecting further options in dialog boxes. For instance, if you want to use a general linear model that involves a random effect, you can click Stat>ANOVA>General linear model, specify the response, the model, and the random effect in the dialog box, and Minitab will do the analysis according to your specifications. Many more types of statistical analysis can be done in Minitab, including response surface analysis for data that combine continuous and categorical variables (Stat>DOE>Response surface) and nested ANOVA for data that fit a nested model (Stat>ANOVA>Fully nested ANOVA). The built-in StatGuide manual in Minitab contains very detailed information that can help you choose an appropriate method for your data and interpret the results.
(4) Graphics. Minitab offers a large variety of graphing functionalities. You can generate histograms, QQ-plots, residual plots, and many more. Figure 2 shows a QQ-plot, a residual plot, a histogram, and an ordered-data plot generated in Minitab from the diesel data we used in class (4). They were produced by checking the graphs option

when regression analysis was performed (Stat>Regression). The plots can be generated in separate windows or arranged in one window, depending on your choice (individual plots vs. Four in one). The graphs can easily be edited by right-clicking the item you want to change and selecting the appropriate option, such as edit title, edit symbols, or copy graph. Most of the graphing functionalities are either incorporated into the statistical analysis procedures (Stat) as options or gathered under the Graph pull-down menu.
[Figure 2 image: four-in-one plots for the diesel data. Panels: Normal Probability Plot (Percent vs. Residual), residual plot (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), and ordered-data plot (Residual vs. Observation Order).]

Figure 2. Graphs generated from the diesel data.
Programming in Minitab
While most of the statistical functions in Minitab are accessible through menus, they can also be performed through programming. The language used to program Minitab consists of session commands. In the session window, once the command prompt is activated through Editor>Enable commands (MTB > appears in the window), you can type in commands directly and have the software carry them out. The following Minitab output results from the command MTB > regress 'ignition' 1 'alcohol', run on the diesel data:
The regression equation is
ignition = 0.737 + 0.00486 alcohol

Predictor      Coef   SE Coef      T      P
Constant    0.73720   0.06419  11.49  0.000
alcohol    0.004863  0.001421   3.42  0.001

S = 0.247874   R-Sq = 20.6%   R-Sq(adj) = 18.9%
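As a small reading aid for the output above, the T column is simply each coefficient divided by its standard error. The following Python sketch recomputes those ratios from the printed values. The dictionary and function names are my own, and because the inputs are the rounded figures shown in the output, the results agree with Minitab's T column only up to rounding.

```python
# Coefficients and standard errors copied from the Minitab output above.
estimates = {
    "Constant": (0.73720, 0.06419),
    "alcohol": (0.004863, 0.001421),
}


def t_ratio(coef, se_coef):
    """t statistic for testing whether a coefficient is zero."""
    return coef / se_coef


t_values = {name: t_ratio(coef, se) for name, (coef, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name}: T = {t:.2f}")
```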

You can also use subcommands to specify how a command is carried out. If your work involves running the same program repeatedly, you can save the series of Minitab commands as an Exec, so that you can re-run the program in the future. The

built-in Minitab help manual contains detailed instructions on the syntax of almost all session commands and on how to use Execs and other more complicated macros. The link in (5) is also a good source for quick reference.
Helpful resources
Minitab is well known for its user-friendliness. The reason is not only its straightforward menu interface but also its excellent help facilities and resources. The following are some of the resources that I found very helpful in learning and using this software:
(1) Meet Minitab 15 (6): the first thing you should read to learn about Minitab. This 142-page PDF file is practically the beginner's guide to the most commonly used features in Minitab. The book is very well written, with examples and screenshots, so it is quite fun to read. A good way to read it is to have Minitab open on your computer at the same time, so that you can practice as you go through the book. Meet Minitab 15 can be downloaded from the Minitab website (6).
(2) Minitab built-in electronic manuals. The built-in manuals in Minitab are among the best of any statistical software package. If you have questions while using Minitab, most of the time you can find the answers here. I personally found two manuals extremely helpful: the Help manual, which helps you use the software, and the StatGuide manual, which helps you understand the analysis results. Other built-in manuals include Tutorial and Methods and Formulas, which should also be very useful.
(3) Websites. There are all sorts of websites where you can find helpful information on your specific needs in Minitab. Most of the time, you can find them with a simple Google search.
Reference
All web links were live at the time this report was written (April 2010).
(1) http://en.wikipedia.org/wiki/Minitab
(2) Using Minitab for teaching statistics in higher education. John Eales and Julian Stander. MSOR Connections, Vol 9 No 3.
(3) www.minitab.com
(4) http://www.stat.purdue.edu/~jennings/stat582/datasets/index.html
(5) http://www.austincc.edu/mparker/1342/tf/mm/Appendix.pdf

(6) http://www.minitab.com/en-US/products/minitab/documentation.aspx?langType=1033

2. SPSS
Introduction
SPSS was developed by Norman H. Nie and C. Hadlai Hull at Stanford University. When it was first released in 1968, the package was aimed mainly at academic research; SPSS stands for Statistical Package for the Social Sciences. Today, SPSS is one of the most widely used statistical packages in the world. It is used by survey companies, market researchers, health researchers, education researchers, government, and others (1). According to the SPSS Inc. website, the software's customers now include all 50 U.S. state governments, 100% of the top U.S. universities, 22 top global commercial banks, 18 top property and casualty insurance companies in the U.S., and 12 top global pharmaceutical companies (2). SPSS has evolved from its academic origins into a leading analytical tool for enterprises around the world. SPSS is modular software. Its base system module handles most regular data management and statistical analysis tasks, such as descriptive statistics, commonly used tests, linear regression, and ANOVA. However, more sophisticated functionalities, such as multivariate GLM and logistic regression, require SPSS add-on modules. The link in (3) gives a good summary of the statistical procedures covered by the SPSS base and by each add-on module.
Specialties and advantages
(1) Easy to learn and easy to use. Because SPSS is menu-driven, the software is very easy to use. As in Minitab, most of the functionalities in SPSS are organized intuitively into pull-down menus. In my own experience, the learning curves for SPSS and Minitab are similar. In fact, a study comparing user satisfaction with SPSS and Minitab among college students found no significant difference (4).
(2) Strength in data management.
One of the major advantages that make SPSS unique and successful in the social sciences is its user-friendly environment for data management. Large amounts of data can be handled in SPSS; data attributes can be specified or changed with just a few clicks; and variables and values can easily be labeled for future reference. The data management functionalities in SPSS are explored further later in this report.
Suitability
Compared with Minitab, I would say SPSS is generally stronger in statistical analysis, especially in some specific areas such as ANOVA-related procedures. The add-on modules give SPSS further flexibility and room to extend its capabilities. However,

for cutting-edge statistical analysis, SPSS is still not as strong a candidate as Stata and R. So SPSS is most suitable if your work involves large datasets, frequent data management, and intermediate or partially advanced statistical analysis (5).
Functionalities
(1) Data importation and general data manipulation. SPSS data files look very much like a spreadsheet in Excel or a worksheet in Minitab. The files usually have the extension .sav and are displayed in the Data editor window. SPSS can import datasets in almost all kinds of formats, including spreadsheets (e.g. Excel), databases (e.g. Access), and text. Excel files can be imported directly from the menu (File>Open>Data), while database and text files are brought in through import wizards. SPSS makes it very easy to manipulate data. Each data editor window has two views. In the data view, you can edit your dataset by filling in missing values, deleting rows or columns, transposing the dataset (Data>Transpose), and so on. In the variable view, the attributes of each variable are listed and you can edit them directly. For instance, you can specify labels for the variables or for their values, so that later in the data view you can see what the variables are and what values such as 0 or 1 mean. You can also change the name, type, and number of decimals of a variable in the variable view. Most data manipulation functionalities are collected in the Data menu and the Transform menu. For instance, you can transform any variable in your dataset by clicking Transform>Compute variable and specifying the transformation function. For more information on data management, please refer to the SPSS user's guide or the built-in help manual.
(2) Simple statistics and testing. Most of the simple statistical functionalities and testing procedures can be found in the Analyze pull-down menu.
For instance, descriptive statistics, such as the mean and variance of the data, can be calculated through Analyze>Descriptive Statistics>Descriptives. A chi-square test of association is available through Analyze>Descriptive Statistics>Crosstabs.
(3) Statistical analysis using different models. SPSS can analyze data using linear or non-linear regression models (you need the regression add-on module to perform non-linear regression). These functionalities are in the Analyze>Regression>Linear and Analyze>Regression>Nonlinear menus. For instance, to build a linear regression model, you specify the dependent variable and the independent variable(s) in the linear regression dialog box. You can also specify the model selection method (forward, backward, stepwise, etc.) and the WLS weight there. Once these criteria are set, SPSS returns the modeling results, including the parameter estimates and their significance.
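To make concrete what "parameter estimates and their significance" means for the simplest case, here is a minimal Python sketch of ordinary least squares with one predictor, using the textbook closed-form formulas. It is not SPSS's implementation, it does not cover model selection or WLS weights, and the function name and the five data points are made up for illustration (they are not the diesel data).

```python
from math import sqrt


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with standard errors and t ratios."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    # Residual variance on n - 2 degrees of freedom (two estimated parameters).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_b1 = sqrt(s2 / sxx)
    se_b0 = sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return {
        "b0": b0, "b1": b1,
        "se_b0": se_b0, "se_b1": se_b1,
        "t_b0": b0 / se_b0, "t_b1": b1 / se_b1,
    }


# Hypothetical example values, chosen only to exercise the formulas.
alcohol = [0, 10, 20, 30, 40]
ignition = [0.70, 0.80, 0.85, 0.95, 0.90]

fit = simple_ols(alcohol, ignition)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

The significance level reported by a package comes from comparing each t ratio with a t distribution on n - 2 degrees of freedom; that last step is left out here to keep the sketch within the standard library.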

If you want to perform ANOVA for data with categorical factors, go to Analyze>General Linear Model>Univariate, where you can specify the dependent variable, fixed factors, interactions, and so on. You can also customize your model by typing it directly into the dialog box. Other, more complicated models can also be used in SPSS. For instance, a random effect model can be fitted through the Analyze>Mixed models>Linear menu. In the dialog box, you specify the subject variable for the random effect and the repeated-measurement variable to build the model. Some sophisticated models are not covered by the menus, and you have to run commands to do the analysis. For instance, to build a nested ANOVA model, you can first set up the model using menus and specify all the criteria you want except the nested variable. Then click paste, so that the procedure you just set up through the menus appears in the syntax window as commands. Now you can specify the nested variable in the /design subcommand and run the resulting program to fit the nested ANOVA model.
(4) Graphics. SPSS can generate a large variety of graphs for data exploration and result presentation. For example, to explore whether a group of values is normally distributed, you can generate a QQ-plot of those values. Figure 3a shows a QQ-plot of the ignition delay variable in the diesel data. It was generated through the Analyze>Descriptive statistics>QQ plots menu. Some graphing functionalities are incorporated into analysis procedures. For instance, when running a linear regression with alcohol as the independent variable and ignition delay as the dependent variable, you can ask SPSS to produce graphs for checking the normality assumption by selecting the corresponding options in the dialog box. Figure 3b shows the histogram of the residuals from that regression.

Figure 3a. QQ-plot of the ignition delay variable in the diesel data. Figure 3b. Histogram of residuals for assumption checking, based on the diesel data.
It is easy to edit graphs in SPSS. Double-clicking a graph in the statistics viewer window opens a separate window called the chart editor. In the chart editor, you can make all kinds of modifications to your graphs, including changing text and colors and adding footnotes and data labels.
Programming in SPSS
SPSS can also be run with typed commands. In fact, although most of the functionalities driven by SPSS commands are also accessible through pull-down menus, some procedures and options are available only through commands. An advantage of using commands is that you can save the program and re-run it in the future. SPSS commands are written and edited in a separate window called the syntax editor. As an example, let's perform a linear regression of ignition delay on alcohol, as we did with Minitab in part 1 of this report. Open the syntax editor window through File>New>Syntax and type in the following commands:
Regression
  /dependent ignition
  /method=enter alcohol.

In this syntax, regression is the command, followed by two subcommands (each introduced by a /). The response variable is specified after dependent, and the predictor variable after method=enter. The following is part of the SPSS output produced by these commands:
Coefficients (a)

                      Unstandardized Coefficients   Standardized Coefficients
Model                      B     Std. Error                   Beta                  t    Sig.
1  (Constant)           .737           .064                                    11.485    .000
   alcohol (mass %)     .005           .001                   .454              3.422    .001

a. Dependent Variable: ignition delay (Cao)
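One way to cross-check the Beta column against the Minitab output in part 1 uses a standard fact: in a regression with a single predictor, the standardized coefficient equals the Pearson correlation between predictor and response, so its square is R-squared. The short Python sketch below applies this to the rounded R-Sq value printed by Minitab; the variable names are mine, and the result is only as precise as that rounded input.

```python
from math import copysign, sqrt

r_squared = 0.206   # R-Sq = 20.6% from the Minitab output in part 1
slope = 0.004863    # coefficient of alcohol; only its sign is used here

# With one predictor, the standardized coefficient is the correlation r,
# which has the sign of the slope and satisfies r**2 == R-squared.
beta = copysign(sqrt(r_squared), slope)
print(f"standardized beta = {beta:.3f}")
```

Rounded to three decimals this gives .454, which matches the Beta value in the SPSS table.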

The parameter estimates and their significance levels (the B and Sig. columns) are the same as those in the Minitab output. Another great feature of the SPSS syntax editor is auto-completion. After a command, type the subcommand indicator / and then press Control+Spacebar, and the

options for the subcommand will show up for you to choose from (figure 4). This feature is very helpful when you are unsure about the syntax of the procedure you want to run.

Figure 4. The auto-completion feature in SPSS programming.
Helpful Resources
(1) SPSS Statistics Base 17.0 User's Guide (6). This is the guide I started with when learning SPSS. I found it very helpful because it introduced me to the most commonly used procedures in the software. The guide comes in two versions, a long and detailed one and a brief one. My suggestion is to start with the brief one, just to get a taste of what the software is like, and to consult the detailed guide later for specifics.
(2) SPSS built-in help manual. For help with specific questions or problems while you work with SPSS, the built-in electronic manual is very handy. In the Help menu, the Topics submenu covers the functions and usage of most procedures in the software. The Tutorial submenu illustrates how to use the basic features. The Statistics Coach submenu asks what you want to do and helps you choose the procedure that best meets your needs.
(3) Online resources:
comp.soft-sys.stat.spss newsgroup (http://groups.google.com/group/comp.soft-sys.stat.spss/topics?gvc=2): an active Google group where people ask and answer questions about specific problems with SPSS.
Raynald's SPSS Tools (http://www.spsstools.net/): a very resourceful website set up by Raynald Levesque, the author of SPSS Programming and Data Management, published by SPSS (7). The website clearly needs updating, since some of the links no longer work, but a large amount of information is still available. It is especially helpful if you are interested in programming in SPSS.
UCLA Academic Technology Services (http://www.ats.ucla.edu/stat/spss/): a great website for SPSS learners, with many data analysis examples.
Reference
All web links were live at the time this report was written (April 2010).

(1) http://en.wikipedia.org/wiki/SPSS
(2) http://www.spss.com/success/
(3) http://faculty.chass.ncsu.edu/garson/PA765/spssmodules.htm
(4) An empirical comparison of student user-satisfaction between SPSS and Minitab. M. Feinberg and J. Siekpe. College Student Journal, Dec 2003.
(5) SAS, Stata, SPSS: A Comparison. A. C. Acock. Journal of Marriage and Family, 2005, 67 (4), 1093-1095.
(6) http://support.spss.com/ProductsExt/SPSS/Documentation/SPSSforWindows/index.html
(7) http://www.spss.com/sites/dm-book/
