How To Use and Apply SPSS: Version 15.0 For Windows
Benjamin Noblitt
Lead Tutor
Quantitative Skill Center
Notes
This mini guide/walkthrough is for SPSS 15.0 for Windows and was written in
the summer of 2010. It does not address any changes or updates made after that
date. Additionally, if an earlier version (pre-15.0) is being used, this
document might not fully apply. That said, the concepts behind
what is being explained should still be the same, so long as the functions of the
program remain similar.
This guide is intended to give the reader a VERY basic understanding of how
to use SPSS. It is also intended to be a crash course. The short length of
this document reflects how little depth it goes into, and
there is a lot that it does not mention. If you want to perform a very
thorough analysis with very in-depth statistics, you can read the SPSS Survival
Manual by Julie Pallant.
Table of Contents
Preparations
Getting Started
Entering Data
Output Window
Walkthrough
Analysis
Graphing
Regression
Correlation
Testing
One Sample T-Test
Independent Sample T-Test
Paired Sample T-Test
One Way ANOVA
Hypothesis Testing Crash Course
Preparations
Know that SPSS is for analyzing data from a statistical researcher's
point of view. The name SPSS stands for Statistical Package for the
Social Sciences. This means that the program is best suited to
comparing data samples or surveys using statistics. Furthermore, it
is a very robust program that is much more difficult to use if you do not
know statistics very well; it is designed primarily for the researcher or
statistician. Essentially, if you need to use this program, you will need
to learn basic statistics and its application first.
Set up a survey or data collection mechanism that is numerical in
nature. SPSS works with numbers, so the data needs to be
quantifiable. (If it is qualitative, you will need to make it quantitative.)
Make sure you have a goal in your research. Have a question you want
to answer, and collect data that you think will help answer that
question. For example, do you think there is a correlation between
stress level and red blood cell count? Questions like this can be
answered numerically, making SPSS a good candidate for analyzing
the data.
Getting Started
1. Open up SPSS (may be PASW Statistics 18)
2. A screen will pop up that has 5 different options to choose from as follows:
Figure 1:
d. Open an existing data source: as the name implies, this option
opens a file that already has data in it.
3. Select Type in data. You will be presented with a screen that looks a lot like
an Excel page. At the top of the work-area (the area of white cells) you will
see tabs that say var; these columns are the variables of your sample data.
Consequently, the numbered rows represent the number of data points for
each variable in your data.
Figure 2
4. At the bottom of the page there are two tabs: Data View and Variable View.
The variable view brings you to the page where you can enter the variables into
the program. For example if you have a range of age groups, this is where you
would enter the age groups and their numerical assignments. The data view is
the tab where you enter the raw data into the program.
Figure 3:
6. Look at the variables in variable view; each one has an abbreviated name
(with no spaces) that will show up in the data view tab in place of the var
that usually shows up. The Label column is the full name of the variable,
which you can enter with spaces. Under the Values column, you
can determine what numerical values represent certain data points. For
example, in the accidents.sav file, you can see that a 1 represents a female,
and a 0 represents a male. The Measure column is where you determine
the type of data being used. For example, gender is a categorical
variable, so it best fits the nominal option. Furthermore, age category is
an ordinal variable, so the ordinal option fits best.
7. In the data view tab, notice a small button that looks like a price tag with a
red end (see Figure 6). If you click on this button, it changes the data from
numbers to labels. You can see here that Females are denoted with a 1 and
males are denoted with a 0 as stated in the variable view. This makes
entering data into the work-area easier. Instead of writing female for all of
your data, you can just enter a 1 and the program knows to treat it as female.
Figure 6:
Entering Data
1. Open up the variable view tab.
2. You must first assign every category a numerical value. For example, assign
1 to Under 21, 2 to 21-25, and 3 to 26-30. This is the ONLY way
for SPSS to quantify your data. Additionally, it is advised to at least set
the Label, Values, and Measure for each variable.
Figure 7:
Figure 9:
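The numeric coding described in this section can be sketched outside SPSS as well. The short Python example below is only an illustration: the code tables reuse the assignments mentioned in this guide (1 = Under 21, 2 = 21-25, 3 = 26-30; 0 = male, 1 = female), while the function names and the survey answers are made up for this sketch.

```python
# Coding schemes mirroring what you would type into the Values column.
AGE_CODES = {"Under 21": 1, "21-25": 2, "26-30": 3}
GENDER_CODES = {"Male": 0, "Female": 1}


def encode(responses, codes):
    """Turn text answers into the numbers you would enter in Data View."""
    return [codes[answer] for answer in responses]


def decode(values, codes):
    """Reverse lookup, similar to switching Data View from numbers to labels."""
    labels = {number: label for label, number in codes.items()}
    return [labels[value] for value in values]


# Made-up survey answers, for illustration only.
raw_ages = ["21-25", "Under 21", "26-30", "21-25"]
coded_ages = encode(raw_ages, AGE_CODES)
print(coded_ages)                     # [2, 1, 3, 2]
print(decode(coded_ages, AGE_CODES))  # back to the text labels
```

Keeping the code table in one place, as the Values column does, means the raw data stays numeric while the labels remain recoverable.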
Output Window
Figure 10:
1. The Output window is a window that is separate from the main window which
shows you the results of your analysis. For example, if you want to have a
graph of some of your data, it will pop up in the output window.
2. There are two sections of the output window. The area on the left is a log of
all the output you have generated in your study. The area on the right is the
visual display of that output. For example, if you have made 5 different graphs,
you can click on any specific chart in the left area, and it will
appear (selected) in the right area. If you look at Figure 10, you can see that
the Log portion is selected, and the log data is highlighted in the right section.
3. Note: you can close the output window and NOT close SPSS; it is an auxiliary
feature.
Walkthrough
Analysis
1. Graphing is the most basic type of visual analysis and is presented first in
this guide.
a. Open SPSS/Tutorial/sample_files/accidents.sav. At the top of the screen,
select Graphs>Chart Builder.
b. You can select the different type of graphs you want to create from the
Gallery tab. Select Scatter/Dot and then Grouped Scatter by
clicking and dragging the thumbnail up into the empty area above the
tabs (see Figure 11).
Figure 11:
c. Next, you can see the variables that you have created in the upper left
side of the screen. Click and drag Age Category and place it as the
independent variable (x-axis), and then click and drag Accidents to
the dependent variable (Y-axis). Next, click and drag Gender into the
Set Color area. This allows the graph to distinguish between the two
genders, when graphing the two sets of data provided (see Figure 12).
Figure 12:
d. Click OK and look at the graph produced in the output window. Notice
that both males and females are graphed on the same scale. This
allows you to compare different sets of data on the same graph
(see Figure 13).
Figure 13:
a.
d. Click OK. A new set of tables should appear in the Output window.
SPSS makes a lot of tables for
you. If you want to model the data with a linear equation, the bottom
table should be used. The equation in this example can be written in the
form y = mx + b, where y is the horsepower of the car and x is the
price of the car.
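To see where the m and b in y = mx + b come from, here is a minimal Python sketch of a least-squares line fit with one predictor. The function name fit_line and the price and horsepower numbers are invented for illustration; they are not taken from any SPSS sample file, and SPSS reports more than this sketch computes.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = m*x + b with a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx           # slope
    b = y_bar - m * x_bar   # intercept
    return m, b


# Made-up values (price in thousands, horsepower), for illustration only.
price = [10, 15, 20, 25, 30]
horsepower = [90, 110, 135, 150, 175]

m, b = fit_line(price, horsepower)
print(f"horsepower = {m:.2f} * price + {b:.2f}")
```

In the SPSS coefficients table, the slope and intercept usually appear in the B column, with the intercept on the (Constant) row.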
Figure 15:
c. Note that the box next to Pearson is checked. This will produce a table
that has the Pearson correlation values for two variables (basic
correlation). This tool is most handy when finding the correlation (not
causation) between large numbers of variables. Notice that the
correlation coefficient for the two variables is given in both the
regression output (see previous section) and the correlation table (R=.840)
(see Figure 17).
Figure 17:
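For readers curious about what the Pearson value represents, the sketch below computes it by hand in Python. The function name pearson_r and the data are made up for illustration; only the formula itself is standard.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Made-up values, for illustration only.
price = [10, 15, 20, 25, 30]
horsepower = [90, 110, 135, 150, 175]
print(round(pearson_r(price, horsepower), 3))
```

With a single predictor, the R reported by a linear regression equals the absolute value of this coefficient, which is why the same number shows up in both tables.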
Testing
In order to determine the validity of sample data, you need to test it. Testing
determines the likelihood of obtaining the sample results given a certain
assumption (the assumption is called the Null Hypothesis, H0). If you are
unfamiliar with hypothesis testing, please refer to the end of this walkthrough for a
brief crash course on hypothesis testing.
1. One Sample T-Test is a way to determine whether or not your sample allows
you to draw a conclusion, i.e. does it reject H0 in favor of Ha? If
the t-critical is larger than the t-value (using absolute values), then there is
insufficient proof to reject H0. If the t-value is larger than the t-critical,
reject H0 in favor of Ha (also using absolute values).
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/callwait.sav.
b. We will see if the waiting time for being on hold is different than 9
minutes.
c. Go to Analyze>Compare Means>One-Sample T Test. A box should
open up that looks like Figure 18.
Figure 18:
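The comparison between the t-value and the t-critical described above can be sketched in a few lines of Python. This is only an illustration: the function names and the waiting times are made up (they are not the callwait.sav data), and the t-critical value is an assumption read from a standard t table for a two-tailed test at alpha = .05 with 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t_value = (mean(sample) - mu0) / standard_error
    return t_value, n - 1  # t-value and degrees of freedom


def reject_h0(t_value, t_critical):
    """Reject H0 when |t-value| exceeds |t-critical|."""
    return abs(t_value) > abs(t_critical)


# Made-up waiting times in minutes, for illustration only.
waits = [8.5, 10.2, 9.8, 11.0, 9.4, 10.6, 8.9, 10.1]
t_value, df = one_sample_t(waits, 9)

T_CRITICAL = 2.365  # assumed: two-tailed, alpha = .05, df = 7, from a t table
print(f"t = {t_value:.3f}, df = {df}, reject H0: {reject_h0(t_value, T_CRITICAL)}")
```

SPSS skips the table lookup by printing a significance value directly, but the decision it leads to is the same comparison.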
e. SPSS should now bring you to the output window with a table that
looks like Figure 22. First you must check whether equal variances can be
assumed. To do this, look at Levene's Test for Equality of Variances and
see if the significance is large (above .05 or so). If it is, then use the
upper row of data; if it is not, use the bottom row of data. In our case,
the significance level of the equal variances test is far too small to assume
equal variances, so we use the bottom row.
f. Look at the bottom row and see that the t-value of the test is -7.519
(very large in absolute value!), and the significance of the test (Sig. 2-tailed)
is .000, so there is sufficient evidence of a difference between the mean call
waiting times on Monday and Friday. Another way to check
this is to look at the 95% confidence interval; if 0 is within the range,
there is not sufficient evidence of a difference. In our interval, 0
is not within the range (so we reject H0).
Figure 22:
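The difference between the upper row (equal variances assumed) and the bottom row (equal variances not assumed) can be sketched in Python. The two functions below use the standard pooled-variance and Welch formulas; the function names and the waiting times are made up for illustration and are not the values from the SPSS sample file.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """t statistic assuming both groups share one variance (upper row)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t_value = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t_value, na + nb - 2


def welch_t(a, b):
    """t statistic without the equal-variance assumption (bottom row)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t_value = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_value, df


# Made-up waiting times in minutes, for illustration only.
monday = [4.1, 5.0, 3.8, 4.6, 4.4, 5.2]
friday = [7.9, 6.1, 9.4, 5.5, 10.2, 8.3]

print("equal variances assumed:     t = %.3f, df = %d" % pooled_t(monday, friday))
print("equal variances not assumed: t = %.3f, df = %.2f" % welch_t(monday, friday))
```

When the two groups have equal sizes the two t-values coincide and only the degrees of freedom differ, which is one reason the two rows in the SPSS table often look similar.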
3. Paired Samples T-Test is useful when the data is either from the same
group, or the data is paired up. For example, if two police officers each write
tickets every day for a week, the data would be paired, because each day
yields one ticket count from each of the same two officers.
a. Go to File>Open>Data, and then go to
SPSS/Tutorial/sample_files/dietstudy.sav.
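b. We want to see if there is a difference in someone's weight before and
after a treatment plan. Therefore, H0 is that there is no difference between
the initial and final weights, and Ha is that there is a difference. The file
consists of paired data so that each test subject has two sets of data: an initial
weight (wgt0) and a final weight (wgt4).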
c. Go to Analyze>Compare Means>Paired Sample T Test. Select
both the initial weight and the final weight, move them over to the
Paired Variables box, and click OK (see Figure 23). Note: you must
select BOTH variables before you can move them to the Paired
Variables box, because it treats them as one pair.
Figure 23:
d. In Figure 24 it can be seen that the t-value for this test is 11.175, with
a sig value of .000. This means that the likelihood of these two samples
being from the same population is effectively 0. Thus, we will reject our
H0 in favor of the Ha (the samples come from different populations).
Figure 24:
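A paired test boils down to a one-sample test on the differences within each pair. The Python sketch below shows that idea; the function name paired_t and the weights are made up for illustration and are not the dietstudy.sav values.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for H0: the mean of the pairwise differences is 0."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1  # t-value and degrees of freedom


# Made-up weights for the same five subjects, for illustration only.
initial_weight = [198, 210, 185, 240, 172]
final_weight = [190, 201, 180, 228, 169]

t_value, df = paired_t(initial_weight, final_weight)
print(f"t = {t_value:.3f}, df = {df}")
```

Because each subject is compared only with themselves, differences between subjects drop out, which is why the pairs must stay matched up row by row.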
4. One-Way ANOVA test is most useful when more than two means are being
compared. This is done by looking at the sample variances. ANOVA stands for
ANalysis Of VAriance.
H0: there is no difference among the different mean income levels, and our
... H0 in favor of Ha, then we ...
Figure 26:
f. The output window will have a lot of data provided! Don't worry,
because once it is explained, it is not too bad to follow. The first table
shows the basic descriptive statistics of the groups analyzed.
It is a good condensed overview of what the data looks like
(see Figure 28).
Figure 28:
g. Before the ANOVA test can be done, we must see if the assumption of
equal variances is appropriate. The table of Homogeneity of Variances
looks at this very thing and tests it using Levene's test (see Figure 29).
Since the F value of the test is large (14.766), the corresponding
significance level (.000) is well below .05, meaning that equal
variances cannot be assumed. If the sig value were larger than .05, we
could simply move on to the ANOVA test.
Figure 29:
... H0 level; however, we need to verify this with the Welch and
Brown-Forsythe tests (Robust Test for Equality of Means).
Figure 30:
i. If the Welch and Brown-Forsythe tests produce values that are above
.05, then this would contradict our ANOVA test and cause us to fail to
reject our H0. Since these tests give the same
results as our ANOVA test (significance levels of .000), we can use the
ANOVA results. See Figure 31.
Figure 31:
j. Since we rejected H0, we need to
determine which groups are different from one another. The Post Hoc
table in Figure 32 provides the data needed to see which groups
have sufficient evidence to say that they are indeed different. Look at
Did not complete high school (I) compared to High school
degree (J). Notice that the sig level is above .05 and the 95% confidence
interval includes 0. This means that
there is not sufficient evidence to say that the income levels of Did
not complete high school and High school degree are different.
Conversely, look at Did not complete high school compared to
Some college. Here the sig level is below .05 and the confidence interval
does not contain 0, so there is sufficient evidence to say that
this pair is likely to be different. Notice that the mean
differences with an asterisk next to them indicate a significant
difference between groups.
Figure 32:
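The F value in the ANOVA table compares the variation between the group means with the variation inside the groups. Here is a minimal Python sketch of that calculation; the function name one_way_anova_f and the income figures are invented for illustration, and the sketch does not cover Levene's test, the robust tests, or the post hoc comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for H0: all group means are equal."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up incomes (in thousands) for three education levels, illustration only.
no_high_school = [22, 25, 19, 24]
high_school = [27, 24, 30, 26]
some_college = [35, 38, 33, 40]

f_value, df_b, df_w = one_way_anova_f([no_high_school, high_school, some_college])
print(f"F({df_b}, {df_w}) = {f_value:.3f}")
```

A large F means the group means are spread out much more than the scatter within the groups would explain, which is what leads to rejecting H0.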
Hypothesis Testing Crash Course
Figure 33:
3 "Accept" is in quotes since the statistical terminology is "fail to reject". The reason
for this is that there are no absolutes in statistics.
Alternative hypothesis (Ha): for
example, if we want to show that trees are taller than shrubs, we would say that the
mean height for a shrub is less than the mean height for a tree.
Null hypothesis (H0): if you had
absolutely no prior knowledge of something, then this is what you would believe.
Using the previous case, if you never knew anything about trees or shrubs, you
would not know that there is a difference in their heights, or you could assume that
shrubs are taller than trees. That said, the null hypothesis is that the mean height
for shrubs is equal to (or greater than; see footnote 4) the mean height of trees.
If the likelihood of obtaining the sample results (assuming H0) is less
than some arbitrary percentage (the alpha value, or α), then the result is called
significant. This means that there is sufficient evidence to
reject H0 in favor of Ha.
... (H0) is to use a t-test
4 It is assumed that there is no difference between the trees and shrubs; however, it
is also IMPLIED that the shrubs could be taller, because that would still not
support the alternative hypothesis.