SPSS Instructions for Introduction to Biostatistics
Larry Winner
Department of Statistics
University of Florida
SPSS Windows
Data View
  Used to display data
  Columns represent variables
  Rows represent individual units, or groups of units that share common values of the variables
Variable View
  Used to display information on the variables in the dataset
  TYPE: Allows for various styles of displaying
  LABEL: Allows for a longer description of the variable name
  VALUES: Allows for a longer description of the variable levels
  MEASURE: Allows choice of measurement scale
Output View
  Displays results of analyses and graphs
Data Entry Tips I
For variables that are not identifiers (such as name, county, school, etc.), use numeric values for levels and use the VALUES option in VARIABLE VIEW to give their levels. Some procedures require numeric labels for levels. SPSS will print the VALUES on output.
For large datasets, use a spreadsheet such as EXCEL, which is more flexible for data entry, and import the file into SPSS.
Give a descriptive LABEL to variable names in the VARIABLE VIEW.
Keep in mind that columns are variables; you don't want multiple columns with the same variable.
Data Entry/Analysis Tips II
When reanalyzing previously published data, there are often only a few distinct outcomes (especially with categorical data), with many individuals sharing the same outcome (as in contingency tables).
For ease of data entry:
  Create one line for each combination of factor levels
  Create a new variable representing a COUNT of the number of individuals sharing this outcome
When analyzing data, click on:
  DATA → WEIGHT CASES → WEIGHT CASES BY
  Click on the variable representing COUNT
  All subsequent analyses treat that outcome as if it occurred COUNT times
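The effect of weighting cases by a COUNT variable can be mimicked outside SPSS by repeating each row COUNT times. The following is a minimal Python sketch under that assumption; the variable names (trt, outcome, count) and the counts themselves are hypothetical and do not come from the text.

```python
# Hypothetical aggregated data: one row per combination of factor levels,
# plus a COUNT of individuals sharing that combination.
rows = [
    {"trt": 1, "outcome": 1, "count": 12},
    {"trt": 1, "outcome": 2, "count": 8},
    {"trt": 2, "outcome": 1, "count": 5},
    {"trt": 2, "outcome": 2, "count": 15},
]


def expand_by_count(rows, count_key="count"):
    """Return one record per individual, repeating each row COUNT times."""
    expanded = []
    for row in rows:
        record = {k: v for k, v in row.items() if k != count_key}
        expanded.extend(dict(record) for _ in range(row[count_key]))
    return expanded


cases = expand_by_count(rows)
print(len(cases))  # total number of individuals = sum of the counts
```

Any analysis run on the expanded records sees each outcome COUNT times, which is the behavior described for weighted cases.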
Example 1.3 - Grapefruit Juice Study
crcl 38 66 74 99 80 64 80 120
To import an EXCEL file, click on:
  FILE → OPEN → DATA, then change FILES OF TYPE to EXCEL (.xls)
To import a TEXT or DATA file, click on:
  FILE → OPEN → DATA, then change FILES OF TYPE to TEXT (.txt) or DATA (.dat)
You will be prompted through a series of dialog boxes to import the dataset.
Descriptive Statistics - Numeric Data
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES
  Choose any variables to be analyzed and place them in the box on the right
Options include:
  Mean: $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
  Sum: $\sum_{i=1}^{n} y_i$
  Std. deviation: $S = \sqrt{S^2}$
  S.E. Mean: $\frac{S}{\sqrt{n}}$
  Variance: $S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
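As a check on these definitions, the quantities can be computed by hand. The sketch below uses the crcl values listed under Example 1.3 above, which may be only part of the study data, so the results are illustrative rather than a reproduction of the SPSS output.

```python
import math

# crcl values as listed under Example 1.3 (possibly a partial listing).
crcl = [38, 66, 74, 99, 80, 64, 80, 120]

n = len(crcl)
total = sum(crcl)                                        # Sum
mean = total / n                                         # Mean
variance = sum((y - mean) ** 2 for y in crcl) / (n - 1)  # Variance (n - 1 divisor)
std_dev = math.sqrt(variance)                            # Std. deviation
se_mean = std_dev / math.sqrt(n)                         # S.E. Mean

print(f"n={n} sum={total} mean={mean:.3f}")
print(f"variance={variance:.3f} sd={std_dev:.3f} se={se_mean:.3f}")
```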
Example 1.3 - Grapefruit Juice Study
Descriptive Statistics - General Data
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES
  Choose any variables to be analyzed and place them in the box on the right
Options include (for categorical variables):
  Frequency Tables
  Pie Charts, Bar Charts
Options include (for numeric variables):
  Frequency Tables (useful for discrete data)
  Measures of Central Tendency, Dispersion, Percentiles
  Pie Charts, Histograms
Example 1.4 - Smoking Status
Vertical Bar Charts and Pie Charts
After importing your dataset and providing names to variables, click on:
  GRAPHS → BAR → SIMPLE (Summaries for Groups of Cases) → DEFINE
    Bars Represent N of Cases (or % of Cases)
    Put the variable of interest as the CATEGORY AXIS
  GRAPHS → PIE (Summaries for Groups of Cases) → DEFINE
    Slices Represent N of Cases (or % of Cases)
    Put the variable of interest as the DEFINE SLICES BY variable
Example 1.5 - Antibiotic Study
[Figure: bar chart of Count by OUTCOME]
Histograms
After importing your dataset and providing names to variables, click on:
  GRAPHS → HISTOGRAM
  Select the variable to be plotted
  Click on DISPLAY NORMAL CURVE if you want a normal curve superimposed (see Chapter 3)
Example 1.6 - Drug Approval Times
[Figure: histogram of MONTHS; Mean = 32.1, Std. Dev = 20.97, N = 175]
Side-by-Side Bar Charts
After importing your dataset and providing names to variables, click on:
  GRAPHS → BAR → CLUSTERED (Summaries for Groups of Cases) → DEFINE
  Bars Represent N of Cases (or % of Cases)
  CATEGORY AXIS: Variable that represents the groups to be compared (independent variable)
  DEFINE CLUSTERS BY: Variable that represents the outcomes of interest (dependent variable)
Example 1.7 - Streptomycin Study
[Figure: clustered bar chart of Count by TRT, with clusters defined by OUTCOME]
Scatterplots
After importing your dataset and providing names to variables, click on:
  GRAPHS → SCATTER → SIMPLE → DEFINE
  For Y-AXIS, choose the Dependent (Response) Variable
  For X-AXIS, choose the Independent (Explanatory) Variable
Example 1.8 - Theophylline Clearance
[Figure: scatterplot of THCLRNCE (vertical axis) against DRUG (horizontal axis)]
Scatterplots with 2 Independent Variables
After importing your dataset and providing names to variables, click on:
  GRAPHS → SCATTER → SIMPLE → DEFINE
  For Y-AXIS, choose the Dependent Variable
  For X-AXIS, choose the Independent Variable with the most levels
  For SET MARKERS BY, choose the Independent Variable with the fewest levels
Example 1.8 - Theophylline Clearance
[Figure: scatterplot of THCLRNCE (vertical axis) against SUBJECT (horizontal axis), with markers set by DRUG (Tagamet, Pepcid, Placebo)]
Contingency Tables for Conditional Probabilities
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the variable you are conditioning on (Independent Variable)
  For COLUMNS, select the variable you are finding the conditional probability of (Dependent Variable)
  Click on CELLS
  Click on ROW Percentages
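Row percentages are conditional probabilities of the column variable given the row variable, expressed as percents. A minimal Python sketch of that calculation follows; the group labels and observations are hypothetical and are not the Alcohol & Mortality data.

```python
from collections import Counter

# Hypothetical (row, column) observations: (exposure group, outcome).
pairs = (
    [("A", "yes")] * 3 + [("A", "no")] * 7
    + [("B", "yes")] * 6 + [("B", "no")] * 4
)


def row_percentages(pairs):
    """Return {row: {column: percent of that row}} from (row, column) pairs."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    table = {}
    for (row, col), count in cell_counts.items():
        table.setdefault(row, {})[col] = 100.0 * count / row_totals[row]
    return table


for row, cols in row_percentages(pairs).items():
    print(row, cols)
```

Each row of the result sums to 100, which is why the conditioning variable goes in ROWS when ROW percentages are requested.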
Example 1.10 - Alcohol & Mortality
Independent Sample t-Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → COMPARE MEANS → INDEPENDENT SAMPLES T-TEST
  For TEST VARIABLE, select the dependent (response) variable(s)
  For GROUPING VARIABLE, select the independent variable. Then define the names of the 2 levels to be compared (this can be used even when the full dataset has more than 2 levels for the independent variable).
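For readers who want to see the arithmetic behind the equal-variance version of this test, here is a minimal Python sketch of the pooled t statistic. The two samples are hypothetical (not the Levocabastine data), and the P-value lookup is omitted because it requires the t distribution.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, df


# Hypothetical measurements for the two levels of the grouping variable.
group1 = [5.1, 6.3, 5.8, 7.0, 6.1]
group2 = [4.2, 5.0, 4.8, 5.5, 4.4]
t_stat, df = pooled_t(group1, group2)
print(f"t = {t_stat:.3f} on {df} df")
```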
Example 3.5 - Levocabastine in Renal Patients
Wilcoxon Rank-Sum/Mann-Whitney Tests
After importing your dataset and providing names to variables, click on:
  ANALYZE → NONPARAMETRIC TESTS → 2 INDEPENDENT SAMPLES
  For TEST VARIABLE, select the dependent (response) variable(s)
  For GROUPING VARIABLE, select the independent variable. Then define the names of the 2 levels to be compared (this can be used even when the full dataset has more than 2 levels for the independent variable).
  Click on MANN-WHITNEY U
Example 3.6 - Levocabastine in Renal Patients
Paired t-test
After importing your dataset and providing names to variables, click on:
  ANALYZE → COMPARE MEANS → PAIRED SAMPLES T-TEST
  For PAIRED VARIABLES, select the two dependent (response) variables (the analysis will be based on the first variable minus the second variable)
Example 3.7 - Cmax in SRC&IRC Codeine
Wilcoxon Signed-Rank Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → NONPARAMETRIC TESTS → 2 RELATED SAMPLES
  For PAIRED VARIABLES, select the two dependent (response) variables (be careful in determining the order in which the differences are taken; it will be clear on the output)
  Click on the WILCOXON option
Example 3.8 - t1/2SS in SRC&IRC Codeine
Relative Risks and Odds Ratios
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the Independent Variable
  For COLUMNS, select the Dependent Variable
  Under STATISTICS, click on RISK
  Under CELLS, click on OBSERVED and ROW PERCENTAGES
NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, code the exposure present category with the lower value (e.g. 1) and the exposure absent category with the higher value (e.g. 2). Use Value Labels to keep the output straight.
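With the coding described in the note (present = 1 in the first row and first column), the relative risk and odds ratio follow directly from the four cell counts. A minimal Python sketch, using hypothetical counts rather than the Pamidronate or Lip Cancer data:

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """Relative risk and odds ratio from a 2x2 table of counts.

    Rows are exposure (present, absent); columns are outcome (present, absent):
        a = exposed with outcome      b = exposed without outcome
        c = unexposed with outcome    d = unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio


# Hypothetical counts.
rr, odds = relative_risk_and_odds_ratio(a=20, b=80, c=10, d=90)
print(f"RR = {rr:.3f}, OR = {odds:.3f}")
```

Swapping the coding of either variable inverts the ratios, which is why the note asks for a consistent coding before running the analysis.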
Example 5.1 - Pamidronate Study
Example 5.2 - Lip Cancer
Fisher's Exact Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the Independent Variable
  For COLUMNS, select the Dependent Variable
  Under STATISTICS, click on CHI-SQUARE
  Under CELLS, click on OBSERVED and ROW PERCENTAGES
NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, code the exposure present category with the lower value (e.g. 1) and the exposure absent category with the higher value (e.g. 2). Use Value Labels to keep the output straight.
Example 5.5 - Antiseptic Experiment
McNemar's Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the outcome for condition/time 1
  For COLUMNS, select the outcome for condition/time 2
  Under STATISTICS, click on MCNEMAR
  Under CELLS, click on OBSERVED and TOTAL PERCENTAGES
NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, code the exposure present category with the lower value (e.g. 1) and the exposure absent category with the higher value (e.g. 2). Use Value Labels to keep the output straight.
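McNemar's test depends only on the two discordant cells of the table (outcome present at one condition/time but not the other). The sketch below computes both a two-sided exact binomial P-value and the large-sample chi-square statistic; which of these a given SPSS version reports is not stated in the text, and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar P-value from the discordant counts.

    b = pairs with the outcome present at condition/time 1 only
    c = pairs with the outcome present at condition/time 2 only
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi_square(b, c):
    """Large-sample McNemar statistic (1 df, no continuity correction)."""
    return (b - c) ** 2 / (b + c)


# Hypothetical discordant counts.
print(mcnemar_exact(2, 8), mcnemar_chi_square(2, 8))
```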
Example 5.6 - Report of Implant Leak
Cochran Mantel-Haenszel Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the Independent Variable
  For COLUMNS, select the Dependent Variable
  For LAYERS, select the Strata Variable
  Under STATISTICS, click on COCHRAN'S AND MANTEL-HAENSZEL STATISTICS
NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similarly, code the exposure present category with the lower value (e.g. 1) and the exposure absent category with the higher value (e.g. 2). Use Value Labels to keep the output straight.
Example 5.7 - Smoking/Death by Age
Chi-Square Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the Independent Variable
  For COLUMNS, select the Dependent Variable
  Under STATISTICS, click on CHI-SQUARE
  Under CELLS, click on OBSERVED, EXPECTED, ROW PERCENTAGES, and ADJUSTED STANDARDIZED RESIDUALS
NOTE: Large ADJUSTED STANDARDIZED RESIDUALS (in absolute value) show which cells are inconsistent with the null hypothesis of independence. A common rule of thumb is to check which cells, if any, have values greater than 3 in absolute value.
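To make the cell quantities concrete, the following Python sketch computes the Pearson chi-square statistic, the expected counts under independence, and the adjusted standardized residuals, using the usual definition (observed minus expected, divided by the square root of expected times one minus the row proportion times one minus the column proportion). The table is hypothetical, not the Marital Status & Cancer data.

```python
import math


def chi_square_table(observed):
    """Return (chi_square, expected, adjusted_residuals) for an r x c count table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_square = 0.0
    expected, adjusted = [], []
    for i, row in enumerate(observed):
        exp_row, adj_row = [], []
        for j, obs in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi_square += (obs - e) ** 2 / e
            denom = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            exp_row.append(e)
            adj_row.append((obs - e) / denom)
        expected.append(exp_row)
        adjusted.append(adj_row)
    return chi_square, expected, adjusted


# Hypothetical 2x2 table: rows = independent variable, columns = dependent variable.
chi2, expected, adjusted = chi_square_table([[30, 10], [20, 40]])
print(round(chi2, 3), expected, [[round(r, 2) for r in row] for row in adjusted])
```

For a 2x2 table every adjusted residual has the same absolute value, and its square equals the chi-square statistic, which gives a quick consistency check.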
Example 5.8 - Marital Status & Cancer
Goodman & Kruskal's γ / Kendall's τb
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select the Independent Variable
  For COLUMNS, select the Dependent Variable
  Under STATISTICS, click on GAMMA and KENDALL'S TAU-B
Examples 5.9,10 - Nicotine Patch/Exhaustion
Kruskal-Wallis Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → NONPARAMETRIC TESTS → k INDEPENDENT SAMPLES
  For TEST VARIABLE, select the Dependent Variable
  For GROUPING VARIABLE, select the Independent Variable, then define the range of levels of the variable (Minimum and Maximum)
  Click on KRUSKAL-WALLIS H
Example 5.11 - Antibiotic Delivery
Note: This statistic makes the adjustment for ties. See Hollander and Wolfe (1973), p. 140.
Cohen's κ
After importing your dataset and providing names to variables, click on:
  ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
  For ROWS, select Rater 1
  For COLUMNS, select Rater 2
  Under STATISTICS, click on KAPPA
  Under CELLS, click on TOTAL Percentages to get the observed percentages in each cell (the first number under the observed count in Table 5.17)
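Kappa compares the observed proportion of agreement (the diagonal of the table) with the agreement expected by chance from the two raters' marginal proportions. A minimal Python sketch, using a hypothetical 2x2 table rather than the Siskel & Ebert data:

```python
def cohens_kappa(table):
    """Cohen's kappa from a square table of counts (rows = rater 1, columns = rater 2)."""
    n = sum(sum(row) for row in table)
    row_props = [sum(row) / n for row in table]
    col_props = [sum(col) / n for col in zip(*table)]
    observed_agreement = sum(table[i][i] for i in range(len(table))) / n
    chance_agreement = sum(r * c for r, c in zip(row_props, col_props))
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


# Hypothetical ratings: 35 of 50 items rated the same by both raters.
kappa = cohens_kappa([[20, 5], [10, 15]])
print(round(kappa, 3))
```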
Example 5.12 - Siskel & Ebert
1-Factor ANOVA - Independent Samples (Parallel Groups)
After importing your dataset and providing names to variables, click on:
  ANALYZE → COMPARE MEANS → ONE-WAY ANOVA
  For DEPENDENT LIST, click on the Dependent Variable
  For FACTOR, click on the Independent Variable
To obtain Pairwise Comparisons of Treatment Means:
  Click on POST HOC, then TUKEY and BONFERRONI (among many other choices)
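The F statistic in the one-way ANOVA table is the ratio of the between-treatment mean square to the within-treatment mean square. The Python sketch below shows that calculation for hypothetical samples (not the HIV Clinical Trial data); post hoc comparisons are not covered.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per treatment."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical responses for three treatments.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) df")
```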
Examples 6.1,2 - HIV Clinical Trial
Kruskal-Wallis Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → NONPARAMETRIC TESTS → k INDEPENDENT SAMPLES
  For TEST VARIABLE, select the Dependent Variable
  For GROUPING VARIABLE, select the Independent Variable, then define the range of levels of the variable (Minimum and Maximum)
  Click on KRUSKAL-WALLIS H
Example 6.2(a) - Thalidomide and HIV-1
Randomized Block Design - F-test
After importing your dataset and providing names to variables, click on:
  ANALYZE → GENERAL LINEAR MODEL → UNIVARIATE
  Assign the DEPENDENT VARIABLE
  Assign the TREATMENT variable as a FIXED FACTOR
  Assign the BLOCK variable as a RANDOM FACTOR
  Click on MODEL, then CUSTOM; under BUILD TERMS choose MAIN EFFECTS and move both factors to the MODEL list
  Click on POST HOC and select the TREATMENT factor for POST HOC TESTS, then BONFERRONI and TUKEY (among many choices)
  For PLOTS, select the BLOCK factor for HORIZONTAL AXIS and the TREATMENT factor for SEPARATE LINES, then click ADD
Example 6.3 - Theophylline Clearance
Example 6.3 - Theophylline Clearance
[Figure: Estimated Marginal Means of THEOPHCL; horizontal axis SUBJECT (1 to 14), separate lines by DRUG (Cimetidine, Famotidine, Placebo)]
Randomized Block Design - Friedman's test
After importing your dataset and providing names to variables, click on:
  ANALYZE → NONPARAMETRIC TESTS → k RELATED SAMPLES
  For TEST VARIABLES, select the variables representing the treatments (each line is a subject/block)
  Click on FRIEDMAN
Example 6.4 - Absorption of Valproate Depakote
Note: This makes an adjustment for ties; see Hollander and Wolfe (1973), p. 140.
2-Way ANOVA
After importing your dataset and providing names to variables, click on:
  ANALYZE → GENERAL LINEAR MODEL → UNIVARIATE
  Assign the DEPENDENT VARIABLE
  Assign the FACTOR A variable as a FIXED FACTOR
  Assign the FACTOR B variable as a FIXED FACTOR
  Click on MODEL, then CUSTOM, and select FULL FACTORIAL
  Click on POST HOC and select both factors for POST HOC TESTS, then BONFERRONI and TUKEY (among many choices)
  For PLOTS, select FACTOR B for HORIZONTAL AXIS and FACTOR A for SEPARATE LINES, then click ADD
Example 6.5 - Nortriptyline Clearance
[Figure: Estimated Marginal Means of CLRNCE; horizontal axis ETHNIC, separate lines by GENDER]
Linear Regression
After importing your dataset and providing names to variables, click on:
  ANALYZE → REGRESSION → LINEAR
  Select the DEPENDENT VARIABLE
  Select the INDEPENDENT VARIABLE(S)
  Click on STATISTICS, then ESTIMATES, CONFIDENCE INTERVALS, MODEL FIT
  For a histogram of residuals, click on PLOTS, and HISTOGRAM under STANDARDIZED RESIDUAL PLOTS
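For a single independent variable, the ESTIMATES and MODEL FIT output rests on the least squares intercept, slope, and R-square. A minimal Python sketch of those three quantities follows; the x and y values are hypothetical, not the Gemfibrozil data.

```python
def least_squares(x, y):
    """Return (intercept, slope, r_squared) for simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, r_squared = least_squares(x, y)
print(f"y = {intercept:.3f} + {slope:.3f} * x, R-square = {r_squared:.3f}")
```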
Examples 7.1-7.6 - Gemfibrozil Clearance
[Figure: histogram of Regression Standardized Residual (vertical axis Frequency); Dependent Variable: CLGM; Mean = 0.00, Std. Dev = .97, N = 17]
Example 7.8 - TB/Thalidomide in HIV
Useful Regression Plots
Scatterplot with Fitted (Least Squares) Line
  GRAPHS → INTERACTIVE → SCATTERPLOT
  Select the DEPENDENT VARIABLE for the UP/DOWN AXIS
  Select the INDEPENDENT VARIABLE for the RIGHT/LEFT AXIS
  Click on the FIT tab, then REGRESSION for METHOD
  NOTE: Be certain both variables are SCALE in VARIABLE VIEW under MEASURE
Partial Regression Plots (Multiple Regression) to observe the association of each Independent Variable with Y, controlling for all others
  Fit the REGRESSION model with all Independent Variables
  Click PLOTS, then PRODUCE ALL PARTIAL PLOTS
Example 7.1 - Gemfibrozil Scatterplot
[Figure: scatterplot of clgm (vertical axis) against clcr (horizontal axis) with fitted linear regression line; fit as printed on the plot: clgm = 460.83 + 3.22 * clcr, R-Square = 0.33]
Logistic Regression
After importing your dataset and providing names to variables, click on:
  ANALYZE → REGRESSION → BINARY LOGISTIC
  Select the DEPENDENT VARIABLE
  Select the INDEPENDENT VARIABLE(S) as COVARIATES
  For a 95% CI for the odds ratio, click on OPTIONS, then CI for exp(B)
  Declare any CATEGORICAL COVARIATES (Independent Variables whose levels are categorical, not numeric)
Example 8.1 - Navelbine Toxicity
Omnibus test for all regression coefficients (like F in linear regression)
Example 8.2 - CHD, BP, Cholesterol
Nonlinear Regression
After importing your dataset and providing names to variables, click on:
  ANALYZE → REGRESSION → NONLINEAR
  Select the DEPENDENT VARIABLE
  Define the MODEL EXPRESSION as a function of the INDEPENDENT VARIABLE(S) and unknown PARAMETERS
  Define the PARAMETERS and give them STARTING VALUES (this may take several attempts)
Example 8.3 - MK-639 in AIDS Patients
Nonlinear Regression Summary Statistics     Dependent Variable RNACHNG

Source               DF   Sum of Squares   Mean Square
Regression            3         24.97099       8.32366
Residual              2           .02783        .01391
Uncorrected Total     5         24.99881
(Corrected Total)     4         10.83973

R squared = 1 - Residual SS / Corrected SS = .99743

                              Asymptotic      Asymptotic 95 % Confidence Interval
Parameter   Estimate          Std. Error      Lower            Upper
A           3.521788512       .121466117      2.999161991      4.044415032
B           35.598069675      7.532265897     3.189345253      68.006794097
C           18374.392967      82.899219276    18017.706415     18731.079519
Model: $y = \frac{A x^{B}}{x^{B} + C^{B}}$, where $x = \mathrm{AUC}_{0\text{-}6h}$ and $A$, $B$, $C$ are the parameters
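The R squared line of the output can be checked from the two sums of squares, and the fitted curve can be evaluated from the parameter estimates. The sketch below assumes the fitted model is the sigmoid y = A x^B / (x^B + C^B); that functional form is an assumption here, and the function name predicted_change is hypothetical.

```python
# Sums of squares copied from the Example 8.3 output.
residual_ss = 0.02783
corrected_ss = 10.83973
r_squared = 1 - residual_ss / corrected_ss
print(round(r_squared, 5))  # 0.99743, matching the R squared line of the output

# Parameter estimates copied from the Example 8.3 output.
A, B, C = 3.521788512, 35.598069675, 18374.392967


def predicted_change(x, a=A, b=B, c=C):
    """Assumed sigmoid y = a * x**b / (x**b + c**b), in the equivalent form a / (1 + (c/x)**b).

    Requires x > 0.
    """
    return a / (1 + (c / x) ** b)


# Under this form, the response at x = C is exactly half of the maximum A.
print(predicted_change(C), A / 2)
```

Under the assumed form, A is the maximum response, C is the x value giving half of that maximum, and B controls how steeply the curve rises around C.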
Survival Analysis - Kaplan-Meier Estimates and Log-Rank Test
After importing your dataset and providing names to variables, click on:
  ANALYZE → SURVIVAL → KAPLAN-MEIER
  Select the variable representing the survival TIME of the individual
  Select the variable representing the STATUS of the individual (whether or not the event has occurred). NOTE: If the variable is an indicator that the observation was CENSORED, then a value of 0 for that variable will mean the event has occurred.
  Select the variable representing the FACTOR containing the groups to be compared
  Click on COMPARE FACTOR, select LOG-RANK, and POOL ACROSS STRATA
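The Cumulative Survival column of the Kaplan-Meier output is a running product of (1 - events / number at risk) over the distinct event times. A minimal Python sketch follows, using the status coding described in the note (0 means the event occurred) and hypothetical data for five subjects.

```python
def kaplan_meier(times, censored):
    """Product-limit estimate of survival at each distinct event time.

    censored[i] == 0 means the event occurred for subject i; any other
    value means the observation was censored at times[i].
    """
    at_risk = len(times)
    survival = 1.0
    estimates = []
    for t in sorted(set(times)):
        events = sum(1 for ti, ci in zip(times, censored) if ti == t and ci == 0)
        leaving = sum(1 for ti in times if ti == t)
        if events:
            survival *= 1 - events / at_risk
            estimates.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return estimates


# Hypothetical data: five subjects, two of them censored.
times = [2, 3, 3, 5, 8]
censored = [0, 0, 1, 0, 1]
for t, s in kaplan_meier(times, censored):
    print(t, round(s, 4))
```

Censored subjects contribute to the number at risk up to their censoring time but never trigger a drop in the curve.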
Examples 9.1-2 - Navelbine and Taxol in Mice
Survival Analysis for TIME

Factor REGIMEN = 1
Time   Status   Cumulative Survival   Standard Error   Cumulative Events   Number Remaining
   6        0                 .9796            .0202                   1                 48
   8        0                 .9592            .0283                   2                 47
  22        0                 .9388            .0342                   3                 46
  32        0                                                          4                 45
  32        0                 .8980            .0432                   5                 44
  35        0                 .8776            .0468                   6                 43
  41        0                 .8571            .0500                   7                 42
  46        0                 .8367            .0528                   8                 41
  54        0                 .8163            .0553                   9                 40

Factor REGIMEN = 2
Time   Status   Cumulative Survival   Standard Error   Cumulative Events   Number Remaining
   8        0                 .9333            .0644                   1                 14
  10        0                 .8667            .0878                   2                 13
  27        0                 .8000            .1033                   3                 12
  31        0                 .7333            .1142                   4                 11
  34        0                 .6667            .1217                   5                 10
  35        0                 .6000            .1265                   6                  9
  39        0                 .5333            .1288                   7                  8
  47        0                 .4667            .1288                   8                  7
  57        0                 .4000            .1265                   9                  6
Examples 9.1-2 - Navelbine and Taxol in Mice
[Figure: Survival Functions; Cum Survival (vertical axis) against TIME (horizontal axis), one curve per REGIMEN (1, 2), with censored observations marked]

Test Statistics for Equality of Survival Distributions for REGIMEN
            Statistic   df   Significance
Log Rank        10.93    1          .0009

This is the square of the Z-statistic in the text, and is a chi-square statistic.
Relative Risk Regression (Cox Model)
After importing your dataset and providing names to variables, click on:
  ANALYZE → SURVIVAL → COX REGRESSION
  Select the variable representing the survival TIME of the individual
  Select the variable representing the STATUS of the individual (whether or not the event has occurred). NOTE: If the variable is an indicator that the observation was CENSORED, then a value of 0 for that variable will mean the event has occurred.
  Select the variable(s) representing the COVARIATES (Independent Variables in the model)
  Identify any CATEGORICAL COVARIATES, including Dummy/Indicator variables
  KM PLOTS can be obtained, with separate SURVIVAL curves by categories
Example 9.3 - 6-MP vs Placebo
[Figure: Survival Function for patterns 1 - 2; Cum Survival (vertical axis) against REMSTIME (horizontal axis), separate curves by TRT (Placebo, 6-MP)]