
SPSS guide for Research Methods and Skills for Premaster

2019-2020
Version 3

Note: all examples in this guide are based on ESS8e02

Contents
Downloading the data file
SPSS Settings
SPSS Windows
SPSS syntax
Opening an SPSS data file from the syntax
Good data management
Frequency table
Cross tabulation
Means
Chi-square
One sample t-test
Independent samples t-test
Correlation (Pearson)
Scatterplot
Regression
Cronbach’s alpha
Declaring user missing values
Renaming variables
Transforming and generating variables using recode
Transforming and generating variables using compute
Making a scale using compute
Dummy coding examples
Variable labels
Value labels
Sub-setting data
Splitting the data in subgroups

SPSS guide for Research Methods and Skills for Premaster - 2019-2020

Downloading the data file

Go to the course Canvas site.


Go to Pages > SPSS
Look for the dataset link
Click on the file name. The file download should now start. Make sure you save the file on your own drive and not in a temporary folder. You may want to
save it in a subfolder you name ‘RM - SPSS labs’.


SPSS Settings
We recommend adjusting the SPSS settings as follows:
Go to Edit > Options

In the tab “General”, tick the box “only open one dataset at a time”. In the tab “Pivot Tables”, select TableLook “compact”.


In the tab “Output”, set all options to “names and labels” / “values and labels”. In the tab “Viewer”, tick “Display commands in the log”.

Click “apply”.

If you are working on your own laptop, you only have to do this once.
If you are working on a computer lab computer, you will need to adjust this at the start of each session.


SPSS Windows
When working with SPSS you will use three types of windows:
1. Main window
2. Syntax
3. Output

Open SPSS

When you open SPSS, at first you will only see the main window.

The main SPSS window has two tabs: the ‘Data View’ and the ‘Variable View’.

You should never manipulate the data directly in the Data View; make all changes with syntax instead (see ‘Good data management’).


SPSS syntax
In SPSS the commands are stored in a “syntax” file. This is the most important file. If you work with code, this is the only file you need to save at
the end of your session.

There are two main ways of working with statistical programmes such as SPSS. The first is to use the drop-down menus and click on whatever operation
you would like to execute (just like you would do in Word or Excel). The second is to use commands. These commands are the language of SPSS. The
rules of this language are called its “syntax”.

There are many advantages to using commands over drop-down menus. You can easily fine-tune settings and repeat a range of similar operations much
faster. Most importantly, you and others can trace what you have done. This is especially useful if you want to go back to an analysis or
recoding that you did a week or more ago. You do not have to remember what you have done; you can just look it up.
If you are working with several people on the same dataset, you can share your syntax file. If you have a problem, you can email the file to
someone so they can try to help you. Once you get the hang of it, working with syntax is also faster.
You can get SPSS to generate the code for you by using the ‘paste’ option in the menus, but this approach does not help you understand what the
code means. Pasted code is also much longer than code you write yourself.

The basic structure of SPSS syntax is: command varlist /options.


Command: what do you want to do? For example, a frequency table, a crosstab or a regression.
Varlist: what do you want to do it with? (Replace this with the actual variable names.)
/options: these are not always necessary. The options vary with the command.

Remember to end an SPSS command with a full stop ‘.’. Otherwise SPSS does not know that the command is finished and will not execute it. (In
newer versions of SPSS you can also end a command with a blank line.)

It is highly recommended to write comments above each command that explain what your file and code are for. Comments
should start with ‘*’ and end with ‘.’
If you do this correctly the comment will turn grey.
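The following is a minimal sketch of how comments can look at the top of a syntax file and above a command. The file description is only an example, so write your own:

```spss
* RM SPSS lab: descriptive statistics on the ESS8 data.
* Frequency table of interest in politics.
freq polintr.
```

Only the last line is a command. SPSS does not execute the two comment lines, even when you select and run all three.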

This guide covers examples of the most frequently used commands for this course.
The examples use the following placeholders:
file-path: replace this with the file path you are using.
filename: replace this with the name of the file you are using, or the name you want to give the file.
varlist: replace this with the names of one or more variables.
var: replace this with the name of one variable.
value(s): replace this with one or more numbers.


Open a syntax window by going to File > New > Syntax.

At the top of your file, add a comment on what the file is for.

Save your syntax file in your folder and give it a clear name (so not ‘syntax1’ but, for example, ‘OM-20180418’).

When all changes are saved, the disk icon turns grey.

You are strongly recommended to frequently save your syntax (by clicking on the disk icon) during your SPSS session. This prevents you from
losing (a lot of) your work.


Opening an SPSS data file from the syntax

Command:
get file="file-path/filename".
Put the file path and name between straight quotation marks (" " or ' '). This tells SPSS where the file path starts and ends, so it does not
get stuck on any spaces.
But how to know the file path?
Go to Windows Explorer and open the folder in which you have saved your data file. Click in the address bar at the top; this will show you the file path
of your data file.

Copy this file path and paste it in your syntax, followed by the name of your file (I recommend also copying the name from Explorer). Make sure the
file name ends with ‘.sav’.
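Suppose Explorer shows the folder path C:\Users\yourname\Documents\RM - SPSS labs and the data file is called ESS8e02.sav. The command could then look like the sketch below. Both the path and the file name are hypothetical, so copy your own from Explorer:

```spss
* Open the ESS data file (hypothetical path, replace with your own).
get file="C:\Users\yourname\Documents\RM - SPSS labs\ESS8e02.sav".
```

Note the backslash between the folder path and the file name, and the quotation marks around the whole path.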

To run your command, select the command line in the syntax file and either use ctrl+r or click on the green arrow in the command ribbon.


Open the European Social Survey (ESS) file via your syntax.

An output window should pop up with the ‘get file’ command (if not, check your SPSS settings). You should now see data and variables in the main
window.


Information in the datafile

In the Data View, each row represents a case (most commonly a respondent of a survey) and each column represents a variable.

In the Variable View you can see information on the variables in the data set:
• Name: the variable name. This is what you use in the code to refer to a variable.
• Type: indicates whether the variable is entered as numbers (numeric) or text (string). Note that a nominal variable can be entered as
numeric; for example, the nominal variable gender can be entered as 1=male and 2=female.
• Label: short description of the variable, usually the question text or a summary of it.
• Values: for numeric variables, the labels for the values.
• Missing: information on user-defined missing values. These are the codes for answer categories in the survey, such as ‘don’t know’
and ‘refusal’, that should not be included in analyses.

Good data management


To prevent losing data or making unwanted (and irreversible!) changes, please stick to the following data management rules.
• Always give your dataset a clear name so you can easily locate it.
• Always keep a copy of the original dataset. Do not save any changes to this copy. This data is your ‘source’; you should always be able to go
back to it in case you fear (or know) you have made a mistake. When you exit SPSS and are asked whether you want to save changes to the data, you
should almost always click “no”. However, make an extra copy of your raw data for safekeeping, just in case you accidentally click “yes”.
• Never make changes to the data directly in the data editor; always use a syntax-file. This way you can trace what you have changed and
there is less risk of accidentally changing the wrong variable or case.
• Always work with syntax-files to keep track of the data manipulation and analyses you have done. Make sure to save these files at the end
of each session and during the session.
• Always check variables for issues (unlikely values, undeclared user missing values) before using them to generate new variables or in analyses.
• When you recode a variable, for instance if you want to merge categories or invert the coding, always give this new variable a different name.
This way you preserve the original variable which is very useful in case you have made a mistake in the recoding or decide you want to make
different sub-categories. You can also use this original variable to check if the recoding worked the way you wanted it to.
• Always check your newly generated variable for mistakes.
• Frequently hit the ‘save’ button on your syntax-files, but not on your data-files. Only save data-files if you have created a subset of the data
or have combined several files into one and use a new name for this combined file.


Frequency table
Command:
freq varlist.

Example:
Request a frequency table for the variable ‘interest in politics’:

freq polintr.

You should see this in your output window:

You can see that the missing values have already been declared by the data producers (they are listed under ‘missing’ rather than under ‘valid’).
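A varlist can contain more than one variable. As a small sketch, this single command produces two separate frequency tables. One is for interest in politics and the other for satisfaction with democracy (stfdem, which also appears in the one sample t-test section):

```spss
* Two frequency tables from one command.
freq polintr stfdem.
```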

Request a frequency table for the gender of the respondent. The name of this variable is ‘gndr’.

Cross tabulation
Command:
crosstab varlist by varlist /cell col row count.

The first variable you list will make up the rows in the table and the second variable the columns.
You can add one or more of the following options:
col = column percentages
row = row percentages
count = absolute numbers (counts)

These options need to be specified after “/cell” because they refer to what SPSS should display in the cells of the cross tabulation.

Example:
Request a crosstabulation between the variable ‘interest in politics’ (polintr) and ‘main activity in last 7 days (recoded)’ (mnactic):

crosstab mnactic by polintr.

You should see this in your output window:


Request a crosstabulation between “interest in politics” and “gender”. To see whether there is a difference in interest in
politics by gender, add percentages to your table. What do you conclude?
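To illustrate the /cell options, this sketch uses the variables from the example rather than those in the exercise. It asks for counts and row percentages, so each row shows how interest in politics is distributed within one category of main activity:

```spss
* Counts plus row percentages: polintr within each category of mnactic.
crosstabs
  /tables=mnactic by polintr
  /cells=count row.
```

Row percentages add up to 100% across each row. Use col instead if you want percentages that add up to 100% down each column.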


Means
Command:
means varlist.

Example:
Request the means for the variable ‘interest in politics’:
means polintr.

You should see this in your output window:

You can also calculate the means across subgroups


Command:
means varlist by groupingvar.
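For example, this sketch compares mean satisfaction with democracy (stfdem) across the categories of main activity (mnactic). It uses a different pair of variables than the exercise below:

```spss
* Mean satisfaction with democracy for each category of main activity.
means stfdem by mnactic.
```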

Request the mean of “interest in politics” by “gender”. What do you conclude?



Chi-square

Command:
crosstabs
/tables = var by var
/statistic = chisq.

Example:
To test whether there is a relation between the variable ‘interest in politics’ and ‘main activity in last 7 days (recoded)’:

crosstab
/tables= mnactic by polintr
/statistic=chisq.

The test meets the chi-square assumption of a minimum expected cell count of 5. The chi-square test (Pearson Chi-Square, top row) is significant
at p<.001; there is a significant relationship between main activity and interest in politics.
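You can check the expected-count assumption yourself by asking SPSS to show the expected count next to the observed count in every cell. A sketch, using the same variables as the example:

```spss
* Chi-square test with observed and expected counts in each cell.
crosstabs
  /tables=mnactic by polintr
  /cells=count expected
  /statistics=chisq.
```

SPSS also reports how many cells have an expected count below 5, in a footnote under the chi-square table.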


One sample t-test


Command:
t-test
/testval=value
/variables=varlist
/criteria=CIN (value).

Example:
Test whether Europeans are, overall, satisfied with the way democracy works (stfdem, measured on a scale from 0 ‘extremely dissatisfied’ to
10 ‘extremely satisfied’). Let’s see whether mean satisfaction is above the midpoint of 5.

t-test
/testval=5
/variables=stfdem.

Because the command did not specify a confidence interval, SPSS presents the results for the default confidence interval (95% C.I.). The mean score in
the ESS dataset is 5.27 (rounded). The mean difference between that score and the value we test against (5) is .265 (about 5.265 - 5, rounded to .27). This is
significant at p<.001 (the p-value is listed under ‘Sig. (2-tailed)’).
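The t value in the output follows the usual one sample t formula. Here x̄ is the sample mean, μ0 the test value, s the standard deviation and n the number of valid cases. The sketch below fills in the mean difference from this example; s and n are left symbolic because they are not quoted here:

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \bar{x} - \mu_0 \approx 5.265 - 5 = 0.265
```

In a sample as large as the ESS, s/√n is small. So even a modest mean difference like this one can give a large t value and a small p-value.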

Independent samples t-test


Command:
t-test groups=grouping variable (value_group1 value_group2)
/variables=dependent variable
/criteria=CIN (value).

Example:
Test whether women (respondents with a score of 2 on the variable gndr) are less interested in politics than men (respondents with a score of 1 on
the variable gndr)

t-test groups=gndr (1 2)
/variables= polintr.

The output shows that men have a lower mean score (2.39) than women (2.64); men in the sample are more interested in politics (on the interest in
politics variable a higher score means less interest). Levene’s test is significant (Sig .00: p<.001), so you have to look at the output of the bottom row
(Equal variances not assumed). The t-test has a p-value (Sig (2-tailed)) of p<.001; women are significantly less interested in politics than men.
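You can list more than one dependent variable after /variables, and SPSS then runs a separate test for each. This sketch compares men (1) and women (2) on trust in the national and the European parliament:

```spss
* Independent samples t-tests for two dependent variables at once.
t-test groups=gndr (1 2)
  /variables=trstprl trstep.
```

Check Levene’s test for each variable separately, because the choice between the top and the bottom row can differ per variable.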


Correlation (Pearson)
Command:
correlations
/variables=varlist.

Example:
Is there a relationship between trust in the national parliament (trstprl) and in the European parliament (trstep)?

Correlation /variables= trstprl trstep.

Both variables are measured on a scale from 0 to 10, with higher scores signaling more trust. The correlation between the two is .55 (p<.001).
There is a significant positive relation between trust in the national and the European parliament.
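Listing more than two variables gives a correlation matrix covering every pair. This sketch adds trust in the legal system (trstlgl):

```spss
* Pairwise Pearson correlations between three trust items.
correlations
  /variables=trstprl trstep trstlgl.
```

Each cell shows the correlation, its p-value and N for one pair. The diagonal (each variable with itself) is always 1.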


Scatterplot

Command:
GRAPH
/SCATTERPLOT(BIVAR)= var1 with var2.

A scatterplot is only useful if at least one of the variables has a wide range. Otherwise you just get rows of dots:

GRAPH /SCATTERPLOT(BIVAR)= trstprl with trstep.
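Age in years (agea) and years of education (eduyrs) both have a much wider range than the 0 to 10 trust scales, so a scatterplot of these two is easier to read. In this sketch, the variable before with is plotted on the horizontal axis:

```spss
* Scatterplot with age on the x-axis and years of education on the y-axis.
graph
  /scatterplot(bivar)=agea with eduyrs.
```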


Regression
Command:
regr
/dep= depvar
/enter= varlist
/descriptives.

Example:
The effect of age (agea) on interest in politics (polintr), controlling for years of full-time education (eduyrs):

Reg
/dep= polintr
/enter=agea eduyrs.

Age and education explain 9.6% of the variation in political interest (R-square
is .096)

Controlling for education, for each year increase in age, the political interest
score decreases by .010 (look at column unstandardized coefficients, B).
This effect is significant with p<.001 (look at the Sig column): there is a
significant positive relation between age and interest
(Remember: a lower score on the political interest variable means more
interest.)
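The regression line behind this output can be written as below, with b0 and b2 left symbolic because their values are not quoted here. Multiplying the age coefficient by a difference in age gives the expected difference in the political interest score. For example, the sketch compares two respondents with the same education who are 10 years apart in age:

```latex
\begin{aligned}
\widehat{\text{polintr}} &= b_0 + b_1 \cdot \text{agea} + b_2 \cdot \text{eduyrs}, \qquad b_1 = -0.010 \\
\Delta\widehat{\text{polintr}} &= b_1 \times \Delta\text{agea} = -0.010 \times 10 = -0.10
\end{aligned}
```

Because a lower score means more interest, the older respondent is expected to be slightly more interested.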


Cronbach’s alpha
Command:
rel /variables=varlist /sum=total.

Example:
Examine whether the variables on trust in different institutions form a reliable scale of institutional trust.

rel /variables=trstprl trstlgl trstplc trstplt trstprt /sum=total.

The reliability analysis is conducted only with cases with no missing values on any
of the 5 items (variables) listed in the command, leaving an N of 44387.
The Cronbach’s alpha is .885 which is high.
The column ‘Cronbach’s alpha if item deleted’ in the bottom output table shows
that removing any of the items would decrease Cronbach’s alpha.
If this column suggests that Cronbach’s alpha would improve considerably after
removing an item, you can rerun the command without the ‘bad’ item. Never
remove more than 1 item at a time.
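The sketch below shows such a rerun purely to illustrate the syntax, leaving out trstprt (trust in political parties). In this example the output does not call for removing any item, so this is not a recommendation to drop it:

```spss
* Syntax illustration only: reliability of the scale without trstprt.
rel /variables=trstprl trstlgl trstplc trstplt /sum=total.
```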


Declaring user missing values

Command:
missing values varlist (values).

In ESS, the data producers have already declared (almost) all missing values. This will not always be the case.
Therefore you should always inspect your variable in a frequency table before using it in any transformation or analysis. The missing values may
not yet have been declared: codes such as 99 or ‘don’t know’ are then listed under ‘valid’ rather than under ‘missing’. In that case, you can
declare them with the ‘missing values’ command.

Example:
Let’s say you have a variable z1 in which 88 stands for ‘don’t know’ and 99 for ‘refusal’. To inform SPSS that these are (user) missing values,
type and run the code:

missing values z1 (88 99) .
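The same command works for several variables at once, and you can also declare a range of codes. In this sketch, z1, z2 and z3 are hypothetical variables. Use a range only if no valid answers fall inside it:

```spss
* Declare 88 and 99 as missing for three (hypothetical) variables.
missing values z1 z2 z3 (88 99).
* Check that 88 and 99 are now listed under missing.
freq z1.
* Alternative: declare every code from 88 up to and including 99 as missing.
missing values z1 z2 z3 (88 thru 99).
```

Running missing values again on a variable replaces the codes declared earlier rather than adding to them.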


Renaming variables

Command:
rename variable oldname=newname.

Example:
Rename the variable ‘trust in parliament’ (variable name = trstprl) to ‘trustpar’:

rename variable trstprl=trustpar.

Only the command will appear in the output; there is no other output to show.
The name of the variable has now changed in the dataset. If you use the old name in your commands, SPSS will return an error message.
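You can rename several variables in one command by listing the old names and the new names, in the same order, between parentheses. This sketch uses two variables that are not needed later in this guide. The new names satdem and trusteup are only examples:

```spss
* Rename stfdem to satdem and trstep to trusteup.
rename variables (stfdem trstep = satdem trusteup).
```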


Transforming and generating variables using recode

Command:
recode varlist (oldvalue1=newvalue1) (oldvalue2=newvalue2) into newvariablename.

Example:
Reverse the coding of the variable polintr so that a higher score means more interested, rather than less interested. Name the new
variable ‘polinterest_rev’.

recode polintr (4=1) (3=2) (2=3) (1=4) into polinterest_rev.

Only the command will appear in the output; there is no other output to show.

Add value labels to the new variable (see also ‘Value labels’):


value labels polinterest_rev
1 "not at all interested"
2 "hardly interested"
3 "quite interested"
4 "very interested".

Check whether the new variable was generated correctly by comparing it to the source variable in 2 ways:
1) Comparing the number of missing values

descr polintr polinterest_rev.

2) Comparing the coding (with a reverse coding, all cases should lie on the diagonal running from the top-right to the bottom-left cell of the crosstab)

crosstab polintr by polinterest_rev.


Transforming and generating variables using compute


Command:
compute newvarname=definition.
if condition newvarname=value.

Example:
Generate the variable age from information on the year of birth and the year of survey.
compute age= inwyys- yrbrn.
exe.

After running the ‘compute’ line SPSS may indicate that ‘transformations are pending’.


SPSS will finish the transformation once you run another command.
You can also force SPSS to execute the command by running the line ‘exe’ (short for execute).

You can check this variable by requesting an excerpt from the dataset, for example the first 15 rows:
list age inwyys yrbrn /cases from 1 to 15.

SPSS displays the values of the three variables in the list for rows 1 to 15 of the dataset. This allows you to see whether you used the correct
formula. A person born in 1982 was indeed 34 at the time of the survey in 2016.
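The ESS file also contains the age variable agea (used elsewhere in this guide). One way to check your computed age against it is to compute the difference and look at its frequency table. The sketch below does this with the hypothetical helper variable agediff; any non-zero values point to cases you may want to inspect:

```spss
* Difference between the computed age and the ESS age variable (agediff is a hypothetical name).
compute agediff = age - agea.
freq agediff.
```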


Making a scale using compute


Command:
compute newvarname=mean (var1, var2, var3, var4, var5).

Combining items into a scale can best be done with ‘compute … mean’. The mean function averages only over the items a respondent gave a valid answer on, so a missing value on one item does not lower the score.

Example
Making a scale of political trust using the items trstprl trstlgl trstplc trstplt trstprt.
All items are measured on the same 0-10 scale. If you want to combine items measured on different answer scales you should first standardize them,
before combining them into a scale (this is because a score of ‘2’ means something different on a scale from 1-3 than on a scale from 0-10).

Compute poltrust=mean(trstprl, trstlgl, trstplc, trstplt, trstprt).

Check the newly generated variable by exploring the range (in this case, values should remain between 0 and 10, because that is the range of the
variables going into the scale) and by looking at a list.

descr poltrust.
list poltrust trstprl trstlgl trstplc trstplt trstprt /cases from 1 to 20.

As you may be able to see in the output, if a respondent answered fewer than 5 of the items, their score on ‘poltrust’ is the mean score on
the items they gave a valid answer on.
If you had used the code

Compute poltrust=sum(trstprl, trstlgl, trstplc, trstplt, trstprt)/5.

then the summed score of a respondent who answered 4 out of 5 questions would still have been divided by 5, artificially decreasing their score.
(If you add the items with ‘+’ instead, a respondent with a missing value on any item gets a missing value on the scale.)
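Here is a worked example with made-up answers. A respondent scores 6, 7, 8 and 5 on four items and has a missing value on the fifth. The mean function divides by the 4 answered items, whereas dividing the sum by 5 treats the missing item as if it were 0:

```latex
\begin{aligned}
\text{mean} &= \frac{6 + 7 + 8 + 5}{4} = \frac{26}{4} = 6.5 \\
\text{sum}/5 &= \frac{6 + 7 + 8 + 5}{5} = \frac{26}{5} = 5.2
\end{aligned}
```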
You can require a minimum number of valid answers for inclusion in the scale. Respondents who gave valid answers to fewer items will be assigned a
missing value on the scale. If, for example, you only want to include respondents who gave at least 3 valid answers, you can use the code:
Compute poltrust=mean.3(trstprl, trstlgl, trstplc, trstplt, trstprt).


Dummy coding examples


For example, suppose you want to generate a variable called “male” in which women receive a score of 0 and men a score of 1.
The first step is always to look up the coding of the gender variable (gndr). In this variable, male=1 and female=2.
There are multiple ways to make a dummy code. Which option to use depends on personal preference and complexity of the original variable.

Option 1
recode gndr (1=1) (2=0) into male.

Option 2
compute male=$SYSMIS.
if gndr=1 male=1.
if gndr=2 male=0.
exe.

The first line of code generates a new column in your dataset named ‘male’ with only missing values ($SYSMIS).
The second line of code assigns a score of ‘1’ in the new variable ‘male’ to all men (people coded 1 in the variable gndr). The
third line of code assigns a score of ‘0’ in the new variable ‘male’ to all women (people coded 2 in the variable gndr). “Exe”
forces SPSS to execute all transformations.

Option 3
Compute male=gndr=1.

This tells SPSS to make a new variable ‘male’ which equals 1 when gndr equals 1, and 0 for all other valid values.

As with recode, you should always check your newly generated variables.
1) Comparing the number of missing values (this should (almost) always remain the same if your new variable is generated from 1 variable)
descr gndr male.

2) Comparing the coding (each value of gndr should fall in a single column: men (gndr=1) under male=1 and women (gndr=2) under male=0)
crosstab gndr by male .
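After checking the new dummy, you can label it to make the output easier to read. A sketch (the label texts are only suggestions):

```spss
* Label the dummy variable and its values.
variable labels male "respondent is male".
value labels male 0 "female" 1 "male".
```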


To make dummies for ‘city/suburb’ and ‘rural’ from the variable ‘domicil’, which is coded:
1 A big city
2 Suburbs or outskirts of big city
3 Town or small city
4 Country village
5 Farm or home in countryside
The dummy ‘city’ will have a score of 1 for respondents in a big city or in the suburbs or outskirts of a big city, and 0 for all other types of domicile.
The dummy ‘rural’ will have a score of 1 for respondents in a country village or on a farm or home in the countryside, and 0 for all other types of domicile.

Option 1
recode domicil (1 thru 2=1) (3 thru 5=0) into city.
recode domicil (1 thru 3=0) (4 thru 5=1) into rural.

Option 2
compute city=$SYSMIS.
if domicil <3 city =1.
if domicil >2 city =0.
exe.

The first line of code generates a new column in your dataset named ‘city’ with only missing values ($SYSMIS).
The second line of code assigns a score of ‘1’ in the new variable ‘city’ to all respondents living in a city or suburb (people coded 1
or 2 in the variable domicil). The third line of code assigns a score of ‘0’ in the new variable ‘city’ to respondents in all other types of
domiciles (respondents with codes 3, 4 or 5 on the variable domicil). “Exe” forces SPSS to execute all transformations.

compute rural =$SYSMIS.


if domicil>3 rural =1.
if domicil<4 rural =0.
exe.


Option 3
Compute city= domicil<3.

This tells SPSS to make a new variable ‘city’ which equals 1 when domicil is smaller than 3 (so 1 or 2), and 0 for all other valid values.

Compute rural = domicil>3.

Check your newly generated variables.


1) Comparing the number of missing values (this should (almost) always remain the same if your new variable is generated from 1 variable)
descr domicil city rural.

2) Comparing the coding (each domicil category should fall in a single column: for city, categories 1 and 2 under 1 and categories 3 to 5 under 0; for rural, categories 4 and 5 under 1 and categories 1 to 3 under 0)
crosstab domicil by city rural.

Label the new variables:

variable label city "city/suburb".


variable label rural "village/countryside".
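Because both dummies use the same coding, their values can be labelled in one command. A sketch:

```spss
* Shared value labels for the dummies city and rural.
value labels city rural 0 "no" 1 "yes".
freq city rural.
```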


Variable labels

Command:
variable labels var 'label'.

Example:
Label the variable age (see ‘Transforming and generating variables using compute’) as “age at time of survey”.

variable labels age 'age at time of survey'.
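You can label several variables in one command by separating them with a slash. This sketch assumes you have created poltrust as in ‘Making a scale using compute’; the label text is only an example:

```spss
* Two variable labels in one command.
variable labels age "age at time of survey"
  /poltrust "mean trust in institutions".
```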


Value labels

Command:
value labels var value "label" value "label" value "label" value "label".

Example:
Add value labels to the variable political interest (reversed) (see ‘Transforming and generating variables using recode’):
value labels polinterest_rev
1 "not at all interested"
2 "hardly interested"
3 "quite interested"
4 "very interested".

Depending on your preference you can type the code on one line, or start a line for each value. The full stop (.) should only be listed once, at the end of
the code (see example above).
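Here are the same labels written on one line, as a sketch:

```spss
value labels polinterest_rev 1 "not at all interested" 2 "hardly interested" 3 "quite interested" 4 "very interested".
```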


Sub-setting data
Command:
select if condition.

Examples:
The variable name for age is agea. The code to limit the dataset to only respondents aged 18 and over:

select if agea>17.

This command tells SPSS to drop all respondents aged 17 or younger from the dataset. All analyses from this point on will only include
respondents aged 18 and over.

Suppose you want to limit the dataset to respondents from France. Country (cntry) is a string variable, and the code for France is FR. For string
variables, codes need to be placed between quotation marks.

select if cntry='FR'.

This command tells SPSS to drop respondents from all countries except France from the dataset. All analyses from this point on will only include
respondents from France.

Sometimes it is more helpful to run only one analysis for a subgroup, rather than dropping respondents from the dataset. You can do this by adding the
temporary command. For example, to run a chi-square test only for France:

Temporary.
select if cntry='FR'.
crosstab
/tables= mnactic by polintr
/statistic=chisq.

It is important to run all three commands (temporary, select if, and the analysis) in one go.
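Conditions can also be combined with and (or or). This sketch produces a frequency table of interest in politics only for French respondents aged 18 and over, without dropping anyone from the dataset:

```spss
* Temporary subset: respondents from France aged 18 and over.
temporary.
select if cntry='FR' and agea>17.
freq polintr.
```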


Splitting the data in subgroups

Command:

Sort cases by var.


Split file by var.
analysis command(s)
Split file off.

Example:
You may want to know if the relation between age, education and political interest is the same for all countries in the dataset.
The variable for country is cntry. The code is

Sort cases by cntry.


Split file by cntry.
Reg
/dep= polintr
/enter=agea eduyrs.
Split file off.

All four commands (sort, split, reg and split file off) should be run in one go.
SPSS returns a regression table that is split by country.
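By default, the results for all groups appear together in one table. If you prefer a separate table per country, you can add the keyword separate, as in this sketch:

```spss
* One descriptives table per country.
sort cases by cntry.
split file separate by cntry.
descr polintr.
split file off.
```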
