
SPSS Session Week Nine

Missing Values

When a respondent completes a questionnaire there will invariably be some questions that
have not been answered. This is either because of the design of the questionnaire, or
because a respondent neglects to answer them. It may be tempting to think “Oh, they’re
blank, so they don’t count”, but they do count, and they really do matter. SPSS can be set to
automatically exclude any missing values from an analysis, but it has to be told to do so.
Also, there are different kinds of ‘missing’ and sometimes we are interested in a particular
one; trying to work out why so many people may leave a particular question blank is one
instance, but there are many more.

There is a convention that all missing values are coded using a negative number. The minus sign
makes them easy to spot, and also means that if they are included in an analysis by mistake
the results are likely to be quite strange.
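
To see why a stray missing code gives strange results, here is a minimal Python sketch (not SPSS itself). The ratings are made-up illustration data on an assumed 1 to 5 scale, with -9 standing for a true missing value.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale; -9 marks a true missing value.
MISSING_CODES = {-9}
ratings = [4, 5, -9, 3, -9, 4]

# Mistake: the -9 codes are treated as if they were real scores.
naive_mean = mean(ratings)

# Correct: drop the missing codes before calculating anything.
valid_ratings = [r for r in ratings if r not in MISSING_CODES]
valid_mean = mean(valid_ratings)

print(f"Mean including -9 codes: {naive_mean:.2f}")
print(f"Mean excluding -9 codes: {valid_mean:.2f}")
```

A mean below the lowest possible score on the scale is an immediate sign that missing codes have slipped into the analysis.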

If a respondent simply ignores a question and writes nothing at all in response then that is a
true missing value, and is usually coded as -9, or -99. It is important that a distinction is
made between this and ‘refused to answer’. We cannot assume that a blank response is a
refusal; the respondent may simply have not seen the question or they could have been
interrupted and forgotten about it. If we think people may refuse to answer a question and
are interested to know if they do, then this must be an option to choose; there must be a
‘prefer not to answer’ box to tick (or something similar). This response would also be coded
as a negative number, but not -9 or -99 (or -999); perhaps -7.

Not applicable is frequently coded as -1. These values typically occur in two-part questions
such as “Did you attend the research seminar last week?” (with yes/no response options), “If
yes, how interesting did you find the presentation?”. If the respondent didn’t attend the
seminar then they obviously cannot answer the second part. The question would be coded
using two variables, with a ‘no’ response for the first automatically leading to a -1 code for
the second. It is very important to define this not applicable code as a missing value so that
SPSS does not include it in any analysis. It is possible to have a missing value response to the second
part instead of not applicable. If a respondent indicated that they did attend the seminar,
but then did not complete the second part, the missing value code would be used instead.
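
The coding rule for the seminar question can be sketched in Python as below. This is only an illustration of the logic, not something SPSS does for you. The function name, the 1 = yes / 0 = no codes, and the treatment of a blank first part are assumptions made for the example.

```python
# Assumed codes, following the conventions described above.
NOT_APPLICABLE = -1
MISSING = -9

def code_seminar_question(attended, rating):
    """Return (attended_code, rating_code) for one respondent.

    attended: "yes", "no", or None if the first part was left blank.
    rating:   a number from the rating scale, or None if left blank.
    """
    if attended == "no":
        # Did not attend, so the second part cannot apply.
        return 0, NOT_APPLICABLE
    # Assumption: a blank first part is itself a true missing value.
    attended_code = 1 if attended == "yes" else MISSING
    # Attended (or unknown) but gave no rating: true missing value.
    rating_code = rating if rating is not None else MISSING
    return attended_code, rating_code

print(code_seminar_question("no", None))   # did not attend
print(code_seminar_question("yes", 4))     # attended and rated
print(code_seminar_question("yes", None))  # attended, rating left blank
```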

With every variable that may have missing values it is vital to code these when the file is
initially being set up. Usually the only variable that can be guaranteed not to have any
missing data is an ID variable.

Read the last part of the Andy extract, which explains how to set missing data.

Open the file from last week, either the original file or the one you were working on. The
aim now is to intentionally make some of the data missing. You can either delete some data
or add a couple of new variables and not complete all the cells.

For the variables with the missing data follow Andy’s instructions and assign the missing
data appropriate values.

Changing column width on the screen

The width of the ‘comments’ column on the screen is very wide! It actually makes it difficult
to see the rest of the file. So go into Variable view and see if you can adjust the width of the
column so that it is more manageable. Be careful though, you still want all the information
to be available, just not always present on the screen.

The number/word swop label

In Data view, the fourth button from the right on the icon row has a capital ‘A’ and a number
‘1’. Click on the button and by magic many of the numbers will change to words! Any
categorical variable that has value labels defined will now be displayed using the label for
each code, rather than the numeric code itself.

Frequency tables

Last week we looked at frequency tables; this week we are also going to see what SPSS does
with the missing data.

To run a frequency table go to “Analyze” then “Descriptive Statistics” then “Frequencies”.


This opens a dialogue box. Highlight any categorical variable of interest in the left-hand box
and then click on the arrow between the two boxes. This moves it across so that it is in the
box with the heading ‘Variable(s)’. A frequency table will be produced for any variables listed
in the box. Frequency tables can be run for as many or as few variables as you wish. For now
just select one variable and then simply click on OK.

SPSS opens a new window called Output1. It’s worth taking a few minutes to see what is in
this window. It helpfully tells us the name of the file we are running frequencies on. The first
table simply summarises the valid and missing data. The second table provides the results of
the frequency analysis. SPSS helpfully uses the category label rather than the coding
number, making it much easier to understand the table.

The first column, Frequency, simply tells us the number of respondents in that category.
The Percent column gives each category as a percentage of all cases, including any missing
ones, and the Valid Percent column gives the percentage once the missing values are excluded.
If there are no missing values the two columns will be the same. The final column, Cumulative
Percent, adds up the valid percentages row by row in the order the categories appear in the table.
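
The arithmetic behind these columns can be checked by hand. The Python sketch below is an illustration of the calculation, not SPSS's own code. The function name, the data, and the 1 = Yes / 0 = No coding are hypothetical.

```python
from collections import Counter

# Assumed missing codes: -9 (true missing) and -1 (not applicable).
MISSING_CODES = {-9, -1}

def frequency_table(values, missing_codes=MISSING_CODES):
    """Return (rows, n_missing); each row is
    (category, frequency, percent, valid_percent, cumulative_percent)."""
    total = len(values)
    valid = [v for v in values if v not in missing_codes]
    n_valid = len(valid)
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        freq = counts[category]
        percent = 100 * freq / total          # out of all cases
        valid_percent = 100 * freq / n_valid  # out of valid cases only
        cumulative += valid_percent           # running total of valid percent
        rows.append((category, freq, percent, valid_percent, cumulative))
    return rows, total - n_valid

# Hypothetical responses: 1 = Yes, 0 = No, -9 = missing.
responses = [1, 1, 1, 0, 0, -9, 1, 0, -9, 1]
rows, n_missing = frequency_table(responses)

for category, freq, pct, valid_pct, cum_pct in rows:
    print(f"{category:>3} {freq:>3} {pct:6.1f} {valid_pct:6.1f} {cum_pct:6.1f}")
print(f"Missing: {n_missing}")
```

With two of the ten responses missing, Percent is calculated out of ten cases and Valid Percent out of eight, which is why the two columns differ.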

Make sure that you understand where the missing data has gone!
