
Session 8 – SPSS Session

From questionnaire to SPSS

Let’s take a look at our sample questionnaire. It starts with demographic questions, followed by 6
items from the Life Orientation Test (optimism scores) and 20 items from the Positive and Negative
Affect Scale (PANAS). The box at the top right shows the respondent number, 703, of the person who
answered this questionnaire. Before diving into SPSS, it is best to prepare a codebook that sets out
the variable name for each item and the coding instructions for its responses.

Now that the codebook is ready, we can enter the data into SPSS. An SPSS file has the file extension
.sav. You can open this file only if you have SPSS.

Entering data into SPSS

1) Open the file Sample (Raw).sav.


2) SPSS has 2 views ➔ Data View and Variable View.

3) Data View shows the data that has already been entered. If you scroll down, you will see
that most of the data has already been entered except for respondent number 703.
4) Variable View shows the variables in the questionnaire and the characteristics of the
variables.
5) Let’s say we wish to check the values of marital status. In the marital status row, click the
cell that intersects with the Values column, then click the little button that appears in the cell.
6) You will see the Value Labels dialog box.

The value labels explain what each coded value represents.

Now that the data has been entered, it is best to always check that there are no errors before
analysing the data. This is known as cleaning the data.

Cleaning data

Descriptive Statistics and Frequency Tables

1) Analyze ➔ Descriptive Statistics ➔ Frequencies


2) Select the relevant variables and move them to the box on the right.
3) Check that Display frequency tables is ticked.
4) Click the Statistics button and choose Mean, Median, Std. deviation, Maximum and Minimum.
5) Click Continue. If you wish to produce charts such as a pie chart or bar chart, click the
Charts button. Otherwise, proceed to step 6.
6) Click OK.

sex
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   MALES           186      42.4            42.4                 42.4
        FEMALES         253      57.6            57.6                100.0
        Total           439     100.0           100.0
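The percentages in a frequency table are simply each count divided by the total. As a rough illustration outside SPSS, here is a minimal Python sketch that rebuilds the table above from its counts. The helper name frequency_table is made up for this example and is not part of SPSS.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(values)          # keeps first-seen order of the values
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for value, count in counts.items():
        percent = 100 * count / total
        cumulative += percent
        rows.append((value, count, round(percent, 1), round(cumulative, 1)))
    return rows

# Rebuild the sex column from the counts in the table: 186 males, 253 females.
sex = ["MALES"] * 186 + ["FEMALES"] * 253
table = frequency_table(sex)
for row in table:
    print(row)
```

With no missing values, Percent and Valid Percent are identical, which is why the sketch computes a single percentage column.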

Scan the frequency tables for values that should not be there. For example, Highest educ completed
contains the value 22. Another variable to check is Optimism 3. Make the relevant corrections by
typing the correct values in Data View.
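The same check can be expressed as a rule: every value must be one of the codes listed in the codebook. The sketch below is only an illustration of that idea. The valid ranges (education coded 1 to 6, optimism items coded 1 to 5), the variable names educ and op3, and the three rows of data are all assumptions made up for this example.

```python
def find_out_of_range(rows, valid_codes):
    """List (respondent id, variable, value) for every value outside its valid codes."""
    problems = []
    for row in rows:
        for variable, allowed in valid_codes.items():
            value = row.get(variable)
            if value is not None and value not in allowed:
                problems.append((row["id"], variable, value))
    return problems

# Hypothetical codebook ranges and made-up rows, for illustration only.
valid_codes = {"educ": range(1, 7), "op3": range(1, 6)}
rows = [
    {"id": 1, "educ": 2, "op3": 4},
    {"id": 2, "educ": 22, "op3": 3},    # 22 is not a valid education code
    {"id": 3, "educ": 5, "op3": None},  # missing values are skipped, not flagged
]
problems = find_out_of_range(rows, valid_codes)
print(problems)
```

The respondent id in each flagged row tells you which paper questionnaire to pull out to find the correct value.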
Once the data is cleaned, decide what you wish to analyse. The table below matches each research
question to a suitable test.

Research question                           Independent variable       Dependent variable         Test used

Is there an association between gender      One categorical variable   One categorical variable   Chi-square
and smoking behaviour? OR Are males         Gender: M/F                Smoker: Yes/No
more likely to be smokers than females?

Is there a difference in optimism scores    One categorical variable   One continuous variable    ANOVA
for young, middle-aged and old              (3 or more levels)         Average Optimism score
participants?                               Age group

Is there a significant difference in the    One categorical variable   One continuous variable    Independent t-test
average optimism scores for males and       (2 groups)                 Average Optimism score
females?                                    Gender: M/F

Let’s take a look at the second research question. We need age groups and an average optimism score.
Currently, the data holds each respondent’s exact age, and optimism is recorded as 6 separate items.
So we need to modify the data, a step known as data transformation: put the ages into 3 categories
and compute the average optimism score.

Data Transformation

Put ages into 3 categories.


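1) Transform ➔ Recode into Different Variables. Move age to the right side.
2) Under Output Variable, type Age_cat as the Name and Age categories as the Label. Click
Change.
3) Click Old and New Values. For each category, select Range under Old Value and type the age
range, type the category number under New Value, and click Add: 18 to 29 becomes 1, 30 to 44
becomes 2, and 45 and above becomes 3.
4) Click Continue. Click OK.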

A new column called Age_cat is created. Click Variable View and set the Values to

1 = 18 - 29, 2 = 30 - 44, and 3 = 45+. Set the Decimals to 0.
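The recode is just a rule that maps each age to one of three codes. A minimal Python sketch of the same rule follows, using the category boundaries given above. Returning None for missing ages and for ages under 18 is an assumption of this sketch, since the handout does not say how such cases are treated.

```python
def age_category(age):
    """Map an age in years to the codes used for Age_cat."""
    if age is None:
        return None   # keep missing ages missing
    if 18 <= age <= 29:
        return 1      # 18 - 29
    if 30 <= age <= 44:
        return 2      # 30 - 44
    if age >= 45:
        return 3      # 45+
    return None       # under 18: outside the defined ranges (assumption)

ages = [18, 29, 30, 44, 45, 82]
print([age_category(a) for a in ages])
```

Testing the boundary ages (29 and 30, 44 and 45) is a quick way to confirm that no respondent falls between two categories.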


For Likert-scale items such as the optimism items, before you take the average of the 6 items,
you need to check whether the items run in the same direction, i.e. are they all positively worded,
or are some positively and some negatively worded? If they are mixed, the negatively worded items
must be reverse coded first.

Reverse code negative optimism scores.

Check the questionnaire for any negative items. Op2, Op4 and Op6 are negatively worded. To
reverse code these 3 items, perform the following.

1) Transform ➔ Recode into Same Variables. Select Op2. Click Old and New Values.
2) At Old Value, type the Value as 1. Then at New Value, type the Value as 5 and click Add.
Repeat for the other values so that 2 becomes 4, 4 becomes 2 and 5 becomes 1 (3 stays as 3).

3) Click Continue and click OK.

Repeat the 3 steps for Op4 and Op6.
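On a scale that runs from 1 to 5, reverse coding is the same as subtracting each score from 6. The sketch below shows this in Python. It assumes a 5-point scale (suggested by recoding 1 to 5) and assumes the positive items are named Op1, Op3 and Op5, which the handout does not state. The sample answers are made up.

```python
SCALE_MIN, SCALE_MAX = 1, 5            # assumed 5-point response scale
NEGATIVE_ITEMS = ("Op2", "Op4", "Op6")  # negatively worded items named in the text

def reverse_code(score):
    """Flip a score so that 1 becomes 5, 2 becomes 4, and so on."""
    if score is None:
        return None  # leave missing answers missing
    return SCALE_MIN + SCALE_MAX - score

def reverse_negative_items(response):
    """Return a copy of one respondent's answers with the negative items reversed."""
    return {
        item: (reverse_code(score) if item in NEGATIVE_ITEMS else score)
        for item, score in response.items()
    }

# Made-up answers from one respondent, for illustration only.
answers = {"Op1": 4, "Op2": 1, "Op3": 5, "Op4": 2, "Op5": 3, "Op6": 3}
recoded = reverse_negative_items(answers)
print(recoded)
```

Reverse coding should be applied only once. Running the same recode a second time flips the scores back to their original values.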

Now that the 6 optimism items are in the same direction, we need to find the average optimism
score for each respondent. The steps are below.

Average Optimism Scores

1) Transform ➔ Compute Variable

2) Click Type & Label. For the Label, type Average Optimism Scores.
3) Type the Numeric Expression as seen above. Click OK.
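The computed variable is the mean of the six optimism items for each respondent. A minimal Python sketch of that calculation follows. The item names Op1 to Op6 are assumed, the answers are made up, and averaging only the items that were answered is a choice made for this sketch. How SPSS treats missing items depends on the expression you type.

```python
ITEMS = ("Op1", "Op2", "Op3", "Op4", "Op5", "Op6")  # assumed item names

def average_optimism(response):
    """Mean of the answered optimism items, rounded to 2 decimals (None if none answered)."""
    scores = [response[item] for item in ITEMS if response.get(item) is not None]
    if not scores:
        return None
    return round(sum(scores) / len(scores), 2)

# Made-up respondents, after reverse coding, for illustration only.
high = {"Op1": 4, "Op2": 5, "Op3": 5, "Op4": 4, "Op5": 3, "Op6": 3}
low = {"Op1": 1, "Op2": 1, "Op3": 1, "Op4": 1, "Op5": 1, "Op6": 2}
print(average_optimism(high), average_optimism(low))
```

Because the result is an average, it stays on the same 1 to 5 scale as the individual items, which makes the scores easy to interpret.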

The data is now ready for further analyses. To be continued…


Research question                           Independent variable       Dependent variable         Test used

Is there an association between gender      One categorical variable   One categorical variable   Chi-square
and smoking behaviour? OR Are males         Gender: M/F                Smoker: Yes/No
more likely to be smokers than females?

Is there a difference in optimism scores    One categorical variable   One continuous variable    ANOVA
for young, middle-aged and old              (3 or more levels)         Average Optimism score
participants?                               Age group

Is there a significant difference in the    One categorical variable   One continuous variable    Independent t-test
average optimism scores for males and       (2 groups)                 Average Optimism score
females?                                    Gender: M/F

Sample hypotheses for the 3 research questions above

H1: Males are more likely to be smokers than females.

H2: There is a significant difference in the optimism scores for the three age groups.

H3: There is no significant difference in the average optimism scores for males and females.

First research question

Null hypothesis

There is no association between gender and smoking behaviour.

Chi-square test

1) Analyze → Descriptive Statistics → Crosstabs

2) Put one variable as the row variable and the other as the column variable.

3) Click on Statistics. Tick Chi-square and Phi and Cramer’s V. Click Continue.

4) Click on the Cells button. In the Counts box, tick Observed. In the Percentages section, tick the
Row, Column and Total boxes.

5) Click Continue. Click OK.


SPSS Output

sex * Are you a smoker? Crosstabulation

                                                 Are you a smoker?
                                                 YES       NO        Total
sex     MALES     Count                          35        151       186
                  % within sex                   18.8%     81.2%     100.0%
                  % within Are you a smoker?     40.7%     42.8%     42.4%
                  % of Total                     8.0%      34.4%     42.4%
        FEMALES   Count                          51        202       253
                  % within sex                   20.2%     79.8%     100.0%
                  % within Are you a smoker?     59.3%     57.2%     57.6%
                  % of Total                     11.6%     46.0%     57.6%
Total             Count                          86        353       439
                  % within sex                   19.6%     80.4%     100.0%
                  % within Are you a smoker?     100.0%    100.0%    100.0%
                  % of Total                     19.6%     80.4%     100.0%
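The three percentage rows in the crosstabulation differ only in what each count is divided by: the row total, the column total, or the overall total. The Python sketch below recomputes them from the observed counts as an illustration. The variable names are chosen for this example.

```python
# Observed counts from the crosstabulation.
counts = {
    "MALES": {"YES": 35, "NO": 151},
    "FEMALES": {"YES": 51, "NO": 202},
}

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

row_totals = {sex: sum(cells.values()) for sex, cells in counts.items()}
col_totals = {ans: sum(counts[sex][ans] for sex in counts) for ans in ("YES", "NO")}
n = sum(row_totals.values())

# "% within sex": divide by the row total.
within_sex = {s: {a: pct(c, row_totals[s]) for a, c in cells.items()} for s, cells in counts.items()}
# "% within Are you a smoker?": divide by the column total.
within_smoker = {s: {a: pct(c, col_totals[a]) for a, c in cells.items()} for s, cells in counts.items()}
# "% of Total": divide by the number of valid cases.
of_total = {s: {a: pct(c, n) for a, c in cells.items()} for s, cells in counts.items()}

print(within_sex)
print(within_smoker)
print(of_total)
```

To compare how likely males and females are to smoke, the "% within sex" row is the one to read, because it adjusts for the different sizes of the two groups.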

Chi-Square Tests
                                                Asymptotic Significance   Exact Sig.   Exact Sig.
                                Value    df     (2-sided)                 (2-sided)    (1-sided)
Pearson Chi-Square              .122a    1      .726
Continuity Correctionb          .052     1      .820
Likelihood Ratio                .123     1      .726
Fisher's Exact Test                                                       .808         .411
Linear-by-Linear Association    .122     1      .727
N of Valid Cases                439
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 36.44.
b. Computed only for a 2x2 table

Sample Interpretation:

19.6% of the sample are smokers, 80.4% of the sample are non-smokers.

40.7% of smokers are males, whereas 59.3% of smokers are females. Within each gender, 18.8% of
males are smokers, compared with 20.2% of females.

The Sig. value for the Pearson chi-square is 0.726 > 0.05 => do not reject the null hypothesis.
There is no significant association between gender and smoking behaviour.

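To see where the Pearson chi-square value comes from, the sketch below recomputes it in Python from the observed counts. It compares each observed count with the count expected if gender and smoking were unrelated. This is an illustration only, not how SPSS is implemented. The p-value shortcut using math.erfc is valid only for 1 degree of freedom, as in a 2x2 table.

```python
import math

observed = [[35, 151],   # males: smokers, non-smokers
            [51, 202]]   # females: smokers, non-smokers

def pearson_chi_square(table):
    """Return (chi-square, number of cases, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2, min_expected = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    return chi2, n, min_expected

chi2, n, min_expected = pearson_chi_square(observed)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, df = 1 only
phi = math.sqrt(chi2 / n)                 # size of the phi effect measure for a 2x2 table

print(round(chi2, 3), round(p_value, 3), round(min_expected, 2), round(phi, 3))
```

The results should agree with the Pearson Chi-Square row and footnote a of the SPSS output. The very small phi value is consistent with the lack of association.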
Second research question

Null hypothesis

There is no difference in the optimism scores for the three age groups.

ANOVA

1) Analyze → Compare Means → One-Way ANOVA

2) Move the dependent variable to the Dependent List.

3) Move the independent variable to Factor.

4) Click Options. Select Descriptive, Homogeneity of variance test, and Means plot.

5) For Missing Values, make sure Exclude cases analysis by analysis is selected. Click Continue.

6) Click Post Hoc. Tick Tukey.

7) Click Continue, then OK.

SPSS Output

Descriptives
Average Optimism Score
                                                      95% Confidence Interval for Mean
          N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
18 - 29   147   3.5601   .75850           .06256       3.4365        3.6837        1.17      5.00
30 - 44   153   3.6841   .69121           .05588       3.5737        3.7945        1.67      5.00
45+       136   3.8125   .76105           .06526       3.6834        3.9416        1.33      5.00
Total     436   3.6823   .74171           .03552       3.6125        3.7522        1.17      5.00

Test of Homogeneity of Variances
Average Optimism Score
                                       Levene Statistic   df1   df2       Sig.
Based on Mean                          1.023              2     433       .360
Based on Median                        .937               2     433       .393
Based on Median and with adjusted df   .937               2     419.948   .393
Based on trimmed mean                  .939               2     433       .392

ANOVA
Average Optimism Score
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   4.501            2     2.251         4.150   .016
Within Groups    234.808          433   .542
Total            239.310          435

Sample Interpretation

According to the Descriptives table, respondents who are at least 45 years old have the highest
average optimism score of 3.81.

Test of homogeneity of variances

- Tests whether the variance in optimism scores is the same for all three groups
- Sig. value of 0.360 > 0.05 => the homogeneity of variances assumption is not violated

ANOVA

- Determine whether there is any difference between groups
- Sig value is 0.016 < 0.05 => reject null hypothesis ➔ significant difference among the
groups

NOTE: If ANOVA shows that there is no significant difference, then do not continue with Post Hoc.
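The F value in the ANOVA table can be rebuilt from the Descriptives table alone, which helps show what the test compares: the spread between the group means against the spread within the groups. The Python sketch below does this from the reported N, mean and standard deviation of each age group. Because those figures are rounded, the results match the SPSS output only approximately. The Sig. value is not computed here because it needs the F distribution.

```python
# (label, N, mean, standard deviation) copied from the Descriptives table.
groups = [
    ("18 - 29", 147, 3.5601, 0.75850),
    ("30 - 44", 153, 3.6841, 0.69121),
    ("45+", 136, 3.8125, 0.76105),
]

n_total = sum(n for _, n, _, _ in groups)
grand_mean = sum(n * mean for _, n, mean, _ in groups) / n_total

# Between groups: how far each group mean lies from the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
# Within groups: the variability inside each group.
ss_within = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_between, 3), round(ss_within, 3), df_between, df_within, round(f_stat, 2))
```

A larger F means the group means differ by more than the variation within the groups would lead you to expect by chance.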

Post Hoc Tests

Multiple Comparisons
Dependent Variable: Average Optimism Score
Tukey HSD
                                          Mean Difference                       95% Confidence Interval
(I) Age categories   (J) Age categories   (I-J)             Std. Error   Sig.   Lower Bound   Upper Bound
18 - 29              30 - 44              -.12401           .08505       .312   -.3240        .0760
                     45+                  -.25241*          .08761       .012   -.4585        -.0464
30 - 44              18 - 29              .12401            .08505       .312   -.0760        .3240
                     45+                  -.12840           .08679       .302   -.3325        .0757
45+                  18 - 29              .25241*           .08761       .012   .0464         .4585
                     30 - 44              .12840            .08679       .302   -.0757        .3325
*. The mean difference is significant at the 0.05 level.

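The mean score for Group 1 (18 - 29 years) was significantly different from that of Group 3 (45+
years), with a mean difference of -0.25 (Sig. = 0.012). Group 2 (30 - 44 years) did not differ
significantly from either Group 1 (Sig. = 0.312) or Group 3 (Sig. = 0.302).
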
Third research question

Null hypothesis

There is no significant difference in the average optimism scores for males and females.

Independent t-test

1) Analyze → Compare Means → Independent-Samples T Test

2) Move the dependent variable into the Test Variable box.

3) Move the independent variable into the Grouping Variable box.

4) Click Define groups. Type in the numbers (code) for each group. For example, let 1 = males, 2 =
females. Then in Group 1 box, type 1, and in Group 2 box, type 2.

5) Continue → OK.

SPSS Output

Group Statistics
sex N Mean Std. Deviation Std. Error Mean
Average Optimism Score MALES 186 3.6604 .67191 .04927
FEMALES 250 3.6987 .79059 .05000

Independent Samples Test
Average Optimism Score

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                               95% Confidence Interval
                                                                          Sig.         Mean         Std. Error   of the Difference
                              F        Sig.           t       df          (2-tailed)   Difference   Difference   Lower      Upper
Equal variances assumed       4.198    .041           -.532   434         .595         -.03827      .07188       -.17955    .10300
Equal variances not assumed                           -.545   426.329     .586         -.03827      .07020       -.17624    .09970

Sample Interpretation:

Females have a slightly higher average optimism score (3.699) than males (3.660).

1) Levene’s test Sig. value is 0.041 < 0.05 => equal variances cannot be assumed, so read the
Equal variances not assumed row.
2) The Sig. (2-tailed) value for Equality of Means is 0.586 > 0.05 ➔ do not reject the null
hypothesis ➔ There is no statistically significant difference in the average optimism scores for
males and females.
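The two rows of the Independent Samples Test table differ in how the standard error is calculated. The Python sketch below recomputes both versions from the Group Statistics table as an illustration. The first pools the two variances (equal variances assumed). The second keeps them separate and adjusts the degrees of freedom (equal variances not assumed, often called Welch's t-test). The inputs are rounded, so the results match SPSS only approximately, and the Sig. values are not computed because they need the t distribution.

```python
import math

# N, mean and standard deviation copied from the Group Statistics table.
n1, mean1, sd1 = 186, 3.6604, 0.67191   # males
n2, mean2, sd2 = 250, 3.6987, 0.79059   # females

diff = mean1 - mean2

# Equal variances assumed: pool the two variances.
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: separate variances and adjusted df.
v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(round(t_pooled, 3), df_pooled, round(se_pooled, 5))
print(round(t_welch, 3), round(df_welch, 1), round(se_welch, 5))
```

In this sample both versions lead to the same conclusion, but Levene's test decides which row should be reported.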
