Quantitative Data Analysis in SPSS

Formatting, Handling, & Manipulation

Jamil A. Malik (PhD)
National Institute of Psychology
Quaid-e-Azam University

Data Cleaning in SPSS
1. Data labeling and formatting
2. Data Merging
3. Re-coding existing variables
4. Data manipulation
   Creating new variables from existing variables
5. Working with Syntax

Data labeling and formatting: Specifying Type of Variable

Data labeling and formatting: Data Labeling

Data labeling and formatting: Variable Formatting

Data labeling and formatting: Specifying missing values

Data labeling and formatting: Measurement category
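
As a rough illustration of what specifying missing values does, the Python sketch below replaces declared missing codes with an empty value before any statistic is computed. The variable name HT and the codes 0 and 999 are assumptions made for the example, not values taken from the slides.

```python
# Hypothetical user-missing codes, analogous to values declared for a
# variable in SPSS; the variable name and the codes are assumptions.
USER_MISSING = {"HT": {0.0, 999.0}}


def apply_user_missing(rows, user_missing):
    """Return copies of the rows with declared missing codes replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for var, codes in user_missing.items():
            if new_row.get(var) in codes:
                new_row[var] = None
        cleaned.append(new_row)
    return cleaned


rows = [{"ID": 1, "HT": 61.0}, {"ID": 2, "HT": 0.0}, {"ID": 3, "HT": 68.0}]
cleaned = apply_user_missing(rows, USER_MISSING)

# Statistics are then computed on valid values only.
valid = [r["HT"] for r in cleaned if r["HT"] is not None]
mean_ht = sum(valid) / len(valid)
```

Without the declaration, the code 0 would be averaged in as if it were a real height.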

Data merging in SPSS
1. Make sure that both files are sorted by the key variable in ascending order.
2. Open the Data menu.
3. Select Merge Files, then Add Variables.
4. Select the dataset you want to merge into the working file.
5. Click on Match cases on key variables in sorted files.
6. Highlight ID in the excluded variables box, then click ► near Key Variables.
7. Click on Both files provide cases.
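
The logic of this merge can be sketched outside SPSS. The Python example below is a minimal illustration, assuming each file is a list of records with a unique ID; the function name and the sample variables HT and WT are hypothetical. It mimics "Both files provide cases": every ID from either file appears in the result, and variables absent for a case are left empty.

```python
def merge_add_variables(working, other, key="ID"):
    """Match cases on a key variable; both files provide cases."""
    by_key = {}
    for row in working:
        by_key.setdefault(row[key], {}).update(row)
    for row in other:
        target = by_key.setdefault(row[key], {})
        for name, value in row.items():
            # Keep the working file's value when a variable exists in both.
            target.setdefault(name, value)

    # Collect variable names in order of first appearance.
    all_vars = []
    for row in list(working) + list(other):
        for name in row:
            if name not in all_vars:
                all_vars.append(name)

    # One output case per key, sorted ascending; absent variables are None.
    return [
        {name: by_key[k].get(name) for name in all_vars}
        for k in sorted(by_key)
    ]


working = [{"ID": 1, "HT": 61.0}, {"ID": 2, "HT": 65.0}]
other = [{"ID": 2, "WT": 150.0}, {"ID": 3, "WT": 170.0}]
merged = merge_add_variables(working, other)
```

Case 1 has no WT and case 3 has no HT, so those cells stay empty, much as SPSS would show system-missing values.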

Recoding existing variables
From the SPSS menus, go to: Transform > Recode > Into Same Variables

Recoding existing variables
1. Move Group from the variable box into the String Variables box.
2. Click on Old and New Values to proceed.

Recoding existing variables
1. Type the old value and the new value you want to convert it into.
2. Click on Add (to change or remove an entry, click on Change or Remove).
3. Type all values in the Old -> New box, then click Continue.
4. Click OK to execute the commands.
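
The Old -> New list is in effect a lookup table. A minimal Python sketch of the same idea follows, assuming a numeric Group variable with hypothetical codes 1, 2, and 3; values not listed in the table are left unchanged.

```python
def recode(values, old_to_new):
    """Recode each value once through the lookup table; unlisted values stay as they are."""
    return [old_to_new.get(v, v) for v in values]


# Hypothetical coding: collapse groups 2 and 3 into a single category.
old_to_new = {1: 0, 2: 1, 3: 1}
group = [1, 2, 3, 2, 1]
recoded = recode(group, old_to_new)
```

Because every value is looked up exactly once, a new value that happens to equal another old value is not recoded a second time.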

Computing New Variables
Computing a patient's age from the date of birth and the date of enrollment into the study.
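
The age computation can be sketched in Python as completed years between two dates. The function and argument names below are illustrative assumptions; the slides do not specify how age should be rounded, so this sketch counts full years only.

```python
from datetime import date


def age_in_years(birth, enrolled):
    """Completed years between date of birth and date of enrollment."""
    years = enrolled.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the enrollment year.
    if (enrolled.month, enrolled.day) < (birth.month, birth.day):
        years -= 1
    return years


age_before_birthday = age_in_years(date(1980, 6, 15), date(2010, 6, 14))
age_on_birthday = age_in_years(date(1980, 6, 15), date(2010, 6, 15))
```

Dividing the difference in days by 365.25 is a common alternative when fractional ages are wanted.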

Handling Missing Data
• What is certain in life?
  – Death
  – Taxes
• What is certain in research?
  – Measurement error
  – Missing data
• Missing data can be:
  – Due to preventable errors, mistakes, or lack of foresight by the researcher
  – Due to problems outside the control of the researcher
  – Deliberate, intended, or planned by the researcher to reduce cost or respondent burden
  – Due to differential applicability of some items to subsets of respondents
  – Etc.

Some Characteristics of Missing Data
• Facets of missing data
  ◦ Persons
  ◦ Variables
  ◦ Occasions
• Type of non-response
  ◦ Block non-response
  ◦ Wave non-response
  ◦ Item non-response
• Special non-response problems in longitudinal and clustered data
  ◦ Attrition/drop-out
  ◦ Group (e.g., family) member non-response

Missing Data in Research Studies
• Missing data mechanism
  ◦ Missing completely at random (MCAR): ignorable
  ◦ Missing at random (MAR): conditionally ignorable
  ◦ Missing not at random (MNAR): nonignorable
• Amount of missing data
  ◦ Percent of cases with missing data
  ◦ Percent of variables having missing data
  ◦ Percent of data values that are missing
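
The three "amount of missing data" figures measure different things and can differ widely on the same dataset. The Python sketch below computes all three on a small made-up dataset; the function name and the variables x, y, and z are assumptions for illustration.

```python
def missing_summary(rows, variables):
    """Percent of cases, variables, and data values affected by missing data."""
    n_cases = len(rows)
    cases_missing = sum(any(r[v] is None for v in variables) for r in rows)
    vars_missing = sum(any(r[v] is None for r in rows) for v in variables)
    values_missing = sum(r[v] is None for r in rows for v in variables)
    return {
        "pct_cases": 100 * cases_missing / n_cases,
        "pct_variables": 100 * vars_missing / len(variables),
        "pct_values": 100 * values_missing / (n_cases * len(variables)),
    }


# Made-up data: None marks a missing value.
rows = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 4, "y": None, "z": 6},
    {"x": 7, "y": None, "z": None},
    {"x": 10, "y": 11, "z": 12},
]
summary = missing_summary(rows, ["x", "y", "z"])
```

Here only a quarter of the data values are missing, yet half of the cases and two of the three variables are affected.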

Older Missing Data Treatments (1)
• Deletion methods
  ◦ Listwise deletion (complete case analysis)
  ◦ Pairwise deletion (available case analysis)
• Cold deck imputation
  ◦ Deterministic, logical, or rule-based imputation
  ◦ Treat missing data for nominal predictors as an additional category
• Hot deck (donor case) imputation
  ◦ Cluster based methods
  ◦ Distance based (e.g., nearest neighbor) methods
• Mean substitution
  ◦ (Variable) mean substitution
  ◦ Mean substitution with added random error
  ◦ Predictor mean substitution with missing data dichotomy
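
Three of these treatments are simple enough to sketch in a few lines of Python. The data and function names below are made up for illustration, and the available-case function covers only the single-variable case (pairwise deletion proper applies the same idea to each pair of variables).

```python
from statistics import mean

# Made-up data: None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": None},
    {"x": None, "y": 6.0},
    {"x": 5.0, "y": 10.0},
]


def listwise_delete(rows, variables):
    """Keep only cases that are complete on every listed variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def available_case_mean(rows, var):
    """Mean of one variable using every case that has a value for it."""
    return mean(r[var] for r in rows if r[var] is not None)


def mean_substitute(rows, var):
    """Replace missing values of one variable with its available-case mean."""
    m = available_case_mean(rows, var)
    return [m if r[var] is None else r[var] for r in rows]


complete = listwise_delete(rows, ["x", "y"])
y_filled = mean_substitute(rows, "y")
```

Listwise deletion leaves two of the four cases, while mean substitution keeps all four but shrinks the variance of y because the filled-in value sits exactly at the mean.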

Older Missing Data Treatments (2)
• Regression imputation
  ◦ Regression predicted value imputation
  ◦ Regression imputation with added random error
• Special methods for longitudinal studies and randomized controlled trials
  ◦ Endpoint only analysis
  ◦ Last observation carried forward (LOCF)
  ◦ Intent to treat worst (best) case imputation
  ◦ Summary growth parameters
• Special methods for multi-item scales
  ◦ Available item method of scale construction
  ◦ Person mean imputation
  ◦ Two-way imputation
  ◦ Two-way imputation with added random error
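
Two of these methods, LOCF and person mean imputation, can be sketched briefly in Python. The function names and sample values are assumptions for illustration; each list holds one person's repeated measurements or one person's item responses.

```python
def locf(series):
    """Last observation carried forward across one person's waves."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        # A leading gap stays missing because there is nothing to carry forward.
        filled.append(last)
    return filled


def person_mean_impute(items):
    """Fill a person's missing item scores with the mean of the items they answered."""
    answered = [v for v in items if v is not None]
    if not answered:
        return list(items)
    person_mean = sum(answered) / len(answered)
    return [person_mean if v is None else v for v in items]


waves = locf([10, 12, None, None, 15])
scale_items = person_mean_impute([4, None, 2, 3])
```

LOCF assumes no change after drop-out, and person mean imputation assumes the items are interchangeable, which is why both are listed among the older treatments.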

Modern Missing Data Treatments
• Maximum likelihood (ML)
  ◦ Estimates summary statistics or statistical models using all available data
  ◦ Available in modern structural equation modeling software (Amos, EQS, Lisrel, Mplus, Mx, etc.)
  ◦ The ML covariance matrix and mean vector can also be obtained from SPSS MVA, and used for standard Regression, Factor analysis, Reliability, and other procedures
  ◦ There are also freeware and open source programs that can produce the ML covariance matrix and mean vector, usually by using the Expectation Maximization (EM) algorithm (e.g., EMCOV)
• Multiple imputation
  ◦ Imputes individual data values in multiple complete datasets, averaging the results of the statistical analyses across these datasets
  ◦ Available in the current versions of certain SEM software (Amos, Mplus), SAS (Proc MI and MIANALYZE), Stata (mi impute and mi estimate), and stand-alone missing data packages such as SOLAS
  ◦ Also available in SPSS (MVA)
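
The "averaging" step of multiple imputation can be sketched in Python. The slide mentions only averaging; the variance formula below is the standard combining rule known as Rubin's rules, added here as an assumption about what the software does behind the scenes. The function name and the numbers are made up for illustration.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine one parameter's results from m imputed datasets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average squared standard error
    between = variance(estimates)   # sample variance of the estimates (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up results: a regression coefficient estimated in three imputed datasets.
pooled, total_variance = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty due to the missing data; single imputation methods leave it out.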

Why do social scientists use modern missing data treatments so infrequently?
• Lack of awareness or familiarity
• They are not convinced of the problems with older methods
• The statistical literature on missing data is technically daunting
• The techniques aren't incorporated into the standard statistical analysis procedures used by social scientists
• Journal reviewers and editors have not required it

Working with SPSS Syntax: Demonstration