
7/10/2014

Treatment of Missing Data

MULTIPLE IMPUTATION USING SPSS
David C. Howell

USING SPSS TO HANDLE MISSING DATA
SPSS will do missing data imputation and analysis, but, at least for me, it takes some getting used to.
Because SPSS works primarily through a GUI, it is easiest to present it that way. However, I will also provide
the syntax that results from what I do.
The data file is named CancerHead-9.dat and contains the following variables related to child behavior
problems among kids who have a parent with cancer. The "-9" in the name of the file is there to remind me that
this file uses "-9" for missing data, which is a common notation for missing data in SPSS. (You could also
use 999, 99, or whatever set of values you want.) Once the data are read in, go to the Variable View and
enter the missing value (e.g., -9) as the missing-value code for each variable. The "Head" tells me that the
names of the variables are found in line 1. Four of the variables in this example relate to the parent
(patient) with cancer, three relate to the spouse of the patient, and two relate to the child. The variable names are, in order,
SexP (sex parent), DeptP (parent's depression T score), AnxtP (parent's anxiety T score), GSItP (parent's
global symptom index T score), DeptS, AnxtS, GSItS (same variables for spouse), SexChild, Totbpt (total
behavior problem T score for child). These are a subset of a larger dataset, and the analysis itself has no
particular meaning. I just needed a bunch of data and I grabbed an available file related to a research project
with which I was involved. We will assume that we want to predict the child's Total Behavior Problem T score
as a function of several other variables. I no longer recall whether the missing values were actually missing or
whether I deleted a bunch of values to create an example.
The first few cases are shown below. Notice that variable names are included in the first line. Missing data
are indicated by "-9".
SexP  DeptP  AnxtP  GSItP  DeptS  AnxtS  GSItS  SexChild  Totbpt
2     50     52     52     44     41     42     -9        -9
1     65     55     57     73     68     71     1         60
1     57     67     61     67     63     65     2         45
2     61     64     57     60     59     62     1         48
2     61     52     57     44     50     50     1         58
1     53     55     53     70     70     69     -9        -9
2     64     59     60     -9     -9     -9     -9        -9
1     53     50     50     42     38     33     2         52
2     42     38     39     44     41     45     -9        -9
2     61     61     55     44     50     42     1         51
1     44     50     42     42     38     43     -9        -9
2     57     55     51     44     41     35     -9        -9
-9    -9     -9     -9     57     52     57     2         65
2     70     59     66     -9     -9     -9     1         61
2     57     61     52     53     59     53     2         49
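Declaring -9 as a missing-value code simply tells the program that -9 means "no value here." For readers who want to check the coding outside SPSS, the following is a minimal Python sketch, assuming the layout described above (variable names on line 1, whitespace-separated values, -9 for missing). The function names are my own hypothetical ones, and this is not how SPSS itself reads the file. It counts the missing values per variable for three of the cases listed above.

```python
# A minimal sketch, not SPSS code: read a whitespace-delimited text file whose
# first line holds the variable names and in which -9 codes a missing value.
# Function names are hypothetical.
MISSING_CODE = -9  # the code used in CancerHead-9.dat


def read_coded_data(text, missing_code=MISSING_CODE):
    """Return (names, rows); each row is a dict with None for missing values."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    names, data_lines = lines[0], lines[1:]
    rows = []
    for fields in data_lines:
        values = [float(field) for field in fields]
        # Replace the missing-value code with None, roughly what declaring
        # the code in the Variable View tells SPSS to do.
        rows.append({name: (None if value == missing_code else value)
                     for name, value in zip(names, values)})
    return names, rows


def count_missing(names, rows):
    """Count the missing values for each variable."""
    return {name: sum(row[name] is None for row in rows) for name in names}


# Three cases from the data listing above.
SAMPLE = """\
SexP DeptP AnxtP GSItP DeptS AnxtS GSItS SexChild Totbpt
2 50 52 52 44 41 42 -9 -9
1 65 55 57 73 68 71 1 60
2 64 59 60 -9 -9 -9 -9 -9
"""

names, rows = read_coded_data(SAMPLE)
print(count_missing(names, rows))
# {'SexP': 0, 'DeptP': 0, 'AnxtP': 0, 'GSItP': 0, 'DeptS': 1, 'AnxtS': 1,
#  'GSItS': 1, 'SexChild': 2, 'Totbpt': 2}
```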

We read in the data as we normally do in SPSS, in my case as a "dat" file. Then, from the Analyze menu,
choose Multiple Imputation and then select Impute Missing Values. When you have assigned the variables
to their roles, you will see a dialog box that looks like the following.

http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/MissingDataSPSS.html


Notice that I have included all nine variables in doing the imputations, even though I will only use six of them in the regressions. I do this because those extra variables may be able to add importantly to the imputed values. For example, suppose that I had a second measure of depression, but chose not to use it in the final analysis. That measure would presumably be nicely correlated with DeptP and would be useful in imputing missing data for that variable. So I include it here, even though I drop it later.

The important thing to notice here is the section called "Location of Imputed Data." I have taken the default and specified that the new dataset will be named SPSSImputations. It is important to note that this will NOT create a file in your directory with that name. It will create a dataset in your current session, to which we will turn very shortly.

This step of the procedure doesn't look as if it has done much for us, but in fact it has. It has created five data sets containing imputed values, and those are held in SPSSImputations. I am not going to present the output from that procedure because it doesn't get us very far. Basically, you will see a list of variables with their means, standard deviations, etc., from the raw data and from the imputed data. You should look at that, but it is not very exciting.

If you go to the Window menu in the main SPSS window, it will offer you the choice of going to that data set. You can see this in the following image. There are other choices in that window because I have created other stuff as I wrote this page, but you want to select "Untitled[SPSSImputations]-IBM SPSS Statistics Editor." When you make that selection you will get the following data set. Notice that it looks like the original, but with a new variable called "Imputation_," referring to the particular imputation session. This will consist of the numbers 0 to 5. (Imputation_ = 0 refers to the original data file. The areas shaded in yellow are imputed values where the value was missing in the original.) You can see part of that data file below, showing the last few lines of the original data and the first few lines of the data from imputation 1.
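The stacked layout of SPSSImputations can be mimicked in a few lines. This is a minimal Python sketch under my own assumptions, not SPSS code: the helper names are hypothetical, each case is a dictionary with None for a missing value, and the "completed" values are invented purely to show the layout. It tags the original cases with Imputation_ = 0 and each completed copy with 1, 2, ..., and lists the cells that would be shaded yellow.

```python
# A minimal sketch, not SPSS code: mimic the stacked layout of a dataset such
# as SPSSImputations, where Imputation_ = 0 marks the original cases and
# 1..m mark the completed copies. The helper names are hypothetical.


def stack_imputations(original, completed_copies):
    """Stack the original rows (None = missing) and each completed copy,
    tagging every row with an Imputation_ number."""
    stacked = [dict(row, Imputation_=0) for row in original]
    for number, copy_rows in enumerate(completed_copies, start=1):
        stacked.extend(dict(row, Imputation_=number) for row in copy_rows)
    return stacked


def imputed_cells(original, completed):
    """(case index, variable) pairs that were missing in the original but
    have a value in the completed copy: the cells SPSS shades yellow."""
    return [(i, name)
            for i, (old, new) in enumerate(zip(original, completed))
            for name, value in old.items()
            if value is None and new[name] is not None]


original = [{"DeptS": 44.0, "Totbpt": None},
            {"DeptS": None, "Totbpt": 60.0}]
# Invented completed copies, for illustration only (not SPSS output).
copies = [
    [{"DeptS": 44.0, "Totbpt": 55.0}, {"DeptS": 58.0, "Totbpt": 60.0}],
    [{"DeptS": 44.0, "Totbpt": 49.0}, {"DeptS": 63.0, "Totbpt": 60.0}],
]

stacked = stack_imputations(original, copies)
print([row["Imputation_"] for row in stacked])  # [0, 0, 1, 1, 2, 2]
print(imputed_cells(original, copies[0]))       # [(0, 'Totbpt'), (1, 'DeptS')]
```

Observed values are carried into every copy unchanged; only the cells that were missing differ from one imputation to the next.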

Now we are ready to do our analysis, but we do it in kind of a strange way. If you look back at the first window that I showed you, you will see a note at the bottom referring to a special icon. This means that if you now take this new data set and go to the standard Analyze menu, you will see that some of the procedures have this icon next to them. The icon means that if you use this data set with that procedure, SPSS will recognize that you want to combine imputed data sets and will allow you to do so. For example, we want to use linear regression to predict Totbpt from 5 other variables. You set this up as follows.

Notice that I have added "Imputation_" to the box labeled "Selection Variable" and used the "Rule" to specify that I want it to use all imputations numbered 1 or more. The partial results of this printout follow.
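Behind the pooled rows, the separate analyses for imputations 1 to 5 are combined. My understanding is that SPSS, like NORM and SAS, uses the standard combining rules usually called Rubin's rules: the pooled coefficient is the mean of the m estimates, and its variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The Python sketch below applies those rules to a single coefficient after keeping only imputations numbered 1 or more; the function name is hypothetical and the per-imputation numbers are invented, not taken from Howell's output.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputations with Rubin's rules.

    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations and one SE per estimate")
    pooled = mean(estimates)                     # average of the m estimates
    within = mean(se ** 2 for se in std_errors)  # average sampling variance
    between = variance(estimates)                # spread of estimates (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # Rubin's total variance
    return pooled, math.sqrt(total)


# Invented (estimate, SE) pairs for one predictor, keyed by Imputation_.
# They are NOT Howell's output; key 0 stands for the original data.
results = {0: (0.40, 0.20),
           1: (0.50, 0.10), 2: (0.54, 0.10), 3: (0.46, 0.10),
           4: (0.52, 0.10), 5: (0.48, 0.10)}

# Keep imputations numbered 1 or more, like the rule Imputation_ GE 1.
chosen = [results[k] for k in sorted(results) if k >= 1]
b, se = pool_rubin([est for est, _ in chosen], [s for _, s in chosen])
print(f"pooled b = {b:.3f}, SE = {se:.3f}")  # pooled b = 0.500, SE = 0.106
```

Because the between-imputation term grows when the imputations disagree, the pooled standard error (0.106 here) is larger than the 0.10 any single completed data set reports; that extra uncertainty is what analyzing just one imputed data set would hide.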

I think that the "error message" in that last window is not an error message. It is simply saying that I did not choose to include Imputation 0, which was the original data.

The important part is the last set of output. It shows you what the regression coefficients, their standard errors, etc. were for the 5 separate imputations, and then it shows you the same for the "pooled" data. This is the result you were looking for, and it is comparable to what we found in the last bit of printouts for NORM and SAS. The values will not be exactly the same, but they will be reasonably close.

SPSS Syntax

For those who like to work with syntax rather than focusing on the GUI, the syntax for this analysis follows.

DATASET ACTIVATE DataSet2.
*Impute Missing Data Values.
DATASET DECLARE SPSSImputations.
MULTIPLE IMPUTATION SexP DeptP AnxtP AnxtS Totbpt DeptS GSItP GSItS SexChild
  /IMPUTE METHOD=AUTO NIMPUTATIONS=5 MAXPCTMISSING=NONE
  /MISSINGSUMMARIES NONE
  /IMPUTATIONSUMMARIES MODELS DESCRIPTIVES
  /OUTFILE IMPUTATIONS=SPSSImputations .

* Switch to the imputed dataset so that the variable Imputation_ is available.
DATASET ACTIVATE SPSSImputations.
REGRESSION
  /SELECT=Imputation_ GE 1
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Totbpt
  /METHOD=ENTER SexP DeptP AnxtP DeptS AnxtS
  /SAVE SRESID.

Send mail to: David.Howell@uvm.edu
Last revised 12/37/2012