
Statistical analysis using SPSS
Jarir At Thobari, MD, DPharm, PhD
Faculty of Medicine
Universitas Gadjah Mada
SPSS Program Windows

• SPSS Program Windows


– Data Editor
• Data View
• Variable View
– Output Viewer
– Syntax Editor
• File Types
– Data: filename.sav
– Output: filename.spv (filename.spo in SPSS 15 and earlier)
• Menus and Toolbars
SPSS Options

• Users can set options to make the program easier to use
• Edit menu
– Choose Options
– On General Tab:
• Display Names & File
• Record Syntax… & Temp. Dir.
– H:\
– On Viewer Tab:
• Display Commands in the Log
More SPSS Options
– Output Labels Tab
• For Pivot Table Labeling:
– Variables in labels shown as
Names and Labels
– Variable values in labels
shown as Values and Labels
– Pivot tables Tab
• For Table look:
– Choose Academic (narrow).tlo
Loading Data
• Data Entry in the SPSS
Data Editor
• Import from Excel
– File Menu:
• Open
– Data
– In the Open Data box, enter: C:\SPSS\data example.xls
– Click OK for defaults
Preparing Data for Analysis

• Variable Formats
• Variable Labels
• Value Labels
• Missing Values
• Copying Data Properties
Formatting Your Variables
• Variable Formats
– Click on the Variable View tab of the Data Editor to edit or display formats: Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, Measure
• Variable Labels
– Type in descriptive text that explains what the variable measures
Formatting Your Variables (cont.)

• Value Labels – Text that explains what numeric values stand for
– Click in the cell of the Value
column for your variable, enter a
Value and Label, click Add
• Missing Values – Defines values
that should not be included in
calculations
– Click in the cell of the Missing
column for the variable, choose
either Discrete… or Range… and
enter the values
Label and values
• Patient number
• Date of birth
• Date of data collection
• Gender (1=male, 2=female)
• Height (cm)
• Weight (kg)
• Smoking (0=no, 1=yes)
• Myocardial infarction (0=no, 1=yes)
• Systolic Blood Pressure (mmHg)
• Diastolic Blood Pressure (mmHg)
• Plasma glucose (mmol/l)
• Cholesterol (mmol/l) - first measurement
• Cholesterol (mmol/l) - second measurement
• Use of Antidiabetic (0=no, 1=yes)
• Use of Antihypertensive (0=no, 1=yes)
• Use of Anticholesterolemia (0=no, 1=yes)
Creating New Variables

• Computing Variables
• Collapsing Variables Using Recode
• Counting Values in Other Variables
• Ranking Cases
• Date and Time Variables
Computing New Variables
• Create new variables
using equations or
functions
– Transform menu
• Compute Variable
– Enter a Target Variable
Name – e.g. TestAvg
– Build a Numeric
Expression
• E.g., (Test1 + Test2 + Test3)/3
– Click OK
Exercise!
Body Mass Index (BMI)
Calculated as body weight (kg) divided by the square of height (m): BMI = weight / height²

• Body Mass Index (BMI) is categorized as


– Underweight : BMI < 18.5
– Normal weight : BMI 18.5-24.9
– Overweight : BMI 25.0-29.9
– Obesity : BMI >=30
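The formula and cut-offs above can be checked by hand on a few subjects. A minimal Python sketch, assuming height is recorded in cm and weight in kg (as in the variable list); the function names are hypothetical. Because a computed BMI can fall between the printed limits (e.g. 24.95 lies between 24.9 and 25.0), the sketch uses half-open intervals so every value gets a category.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0  # height is recorded in cm in this data set
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Assign the four BMI categories using half-open intervals.

    18.5 <= BMI < 25 is "normal weight" and 25 <= BMI < 30 is "overweight",
    so a value such as 24.95 still receives a category.
    """
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obesity"


if __name__ == "__main__":
    b = bmi(70, 170)  # hypothetical subject: 70 kg, 170 cm
    print(f"BMI = {b:.1f} ({bmi_category(b)})")
```

In SPSS the same formula would be a Compute Variable expression such as Weight / (Height/100)**2, where Weight and Height are hypothetical variable names.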
Recoding Variables
• Recoding renumbers or
collapses the values of a
variable
– Transform menu
• Recode into different variables
– Highlight variable(s) and move
over with arrow
– Fill in a Name and Label for the
new variable
– Click Old and New Values
Recoding Variables

– Specify the Old Value


• e.g., 90 through 100, 80 through 89, etc.
– Specify a New Value
• e.g., 4 (for an A), 3 (for a B), etc.
– Click on the Add button
– Repeat until all old and new
values are specified
– Old values can be defined as
single values, ranges or
missing values
– Add value and variable labels,
etc.
Other Ways to Create Variables

• Counting Values in
Other Variables

• Ranking Cases

• Date and Time Variables
Exercise!
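Age
Age of subjects (in years) at the time the data were collected

* Days between birth and data collection divided by 365.25 (allows for leap years).
COMPUTE Age = CTIME.DAYS(Collectdate - Birthdate) / 365.25.
VARIABLE LABELS Age 'Age at data collection (years)'.
* For age in completed years use instead: COMPUTE Age = DATEDIFF(Collectdate, Birthdate, 'years').
EXECUTE.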
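Dividing a day count by a fixed year length gives a decimal age; using 365 instead of 365.25 makes ages drift upward by about one day for every four years of life. A minimal Python sketch comparing the day-count approach with age in completed years; the dates and function names are hypothetical.

```python
from datetime import date


def age_decimal(birth: date, collected: date) -> float:
    """Age in years as elapsed days / 365.25 (averages in leap years)."""
    return (collected - birth).days / 365.25


def age_completed_years(birth: date, collected: date) -> int:
    """Age in completed years, i.e. the number of birthdays already reached."""
    years = collected.year - birth.year
    # One year fewer if the birthday has not yet come round in the collection year.
    if (collected.month, collected.day) < (birth.month, birth.day):
        years -= 1
    return years


if __name__ == "__main__":
    born, seen = date(1960, 1, 1), date(2020, 1, 1)
    print("days / 365.25:", age_decimal(born, seen))
    print("days / 365:   ", (seen - born).days / 365)
    print("completed years:", age_completed_years(born, seen))
```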
Exercise!

• Recoding: Hyperlipidemia & Hypertension


• Hyperlipidemia is defined as a cholesterol level of
– at least 5 mmol/l if the subject has a history of myocardial infarction, or
– at least 8 mmol/l if the subject has no history of myocardial infarction
• Hypertension is defined as
– SBP >= 140 mmHg or DBP >= 90 mmHg
Hyperlipidemia syntax
* Hyperlipidemia: Chol1 >= 5 mmol/l with a history of MI, >= 8 mmol/l without.
* Cases with missing Chol1 or MI stay system-missing instead of being coded 0.
DO IF (MI = 1).
COMPUTE Hyperlipidemia = (Chol1 >= 5).
ELSE IF (MI = 0).
COMPUTE Hyperlipidemia = (Chol1 >= 8).
END IF.
VARIABLE LABELS Hyperlipidemia 'Hyperlipidemia category'.
VALUE LABELS Hyperlipidemia 0 'No' 1 'Yes'.
EXECUTE.
Hypertension syntax

* Hypertension: SBP >= 140 mmHg or DBP >= 90 mmHg (creates a 0/1 variable; no cases are filtered out).
COMPUTE Hypertension = (Systolic >= 140 OR Diastolic >= 90).
VARIABLE LABELS Hypertension 'Hypertension (SBP >= 140 or DBP >= 90 mmHg)'.
VALUE LABELS Hypertension 0 'No' 1 'Yes'.
FORMATS Hypertension (F1.0).
EXECUTE.
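The two definitions can also be written as small functions, which is handy for hand-checking a few cases against the variables SPSS creates. A minimal Python sketch; the function names are hypothetical, and None stands in for a missing value (system-missing in SPSS).

```python
from typing import Optional


def hyperlipidemia(chol: Optional[float], mi: Optional[int]) -> Optional[int]:
    """1 if cholesterol >= 5 mmol/l (history of MI) or >= 8 mmol/l (no MI), else 0.

    Returns None (missing) when cholesterol or MI status is unknown,
    rather than silently coding the subject as 0.
    """
    if chol is None or mi not in (0, 1):
        return None
    threshold = 5.0 if mi == 1 else 8.0
    return int(chol >= threshold)


def hypertension(sbp: Optional[float], dbp: Optional[float]) -> Optional[int]:
    """1 if SBP >= 140 mmHg or DBP >= 90 mmHg, 0 if both are below, None if undecidable."""
    if (sbp is not None and sbp >= 140) or (dbp is not None and dbp >= 90):
        return 1  # one raised reading is enough
    if sbp is None or dbp is None:
        return None  # cannot rule hypertension out while a reading is missing
    return 0
```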
Descriptive Data Analysis

• Frequencies
• Descriptives
• Crosstabs
• Means
• Normality
The FREQUENCIES Procedure

• FREQUENCIES creates tables with counts of cases for each value of the variable
• Analyze Menu:
– Descriptive Statistics…
• Frequencies
• Highlight variables to create tables,
click the arrow to add to variable list,
then click OK
• Statistics, Chart and Format options
are available
FREQUENCIES Output

1. Command syntax
2. Summary statistics
3. Frequency counts for each value
4. Percentages
– Percent (of all cases, including missing)
– Valid percent (of non-missing cases)
– Cumulative percent (running total of valid percent)
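The three kinds of percentages differ only in their denominators. A minimal Python sketch of a frequency table, assuming missing values are represented as None; the data and function name are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, int, float, float, float]


def frequency_table(values: Sequence[Optional[int]]) -> List[Row]:
    """(value, count, percent, valid percent, cumulative percent) for each value.

    Percent uses all cases as the denominator; valid percent and cumulative
    percent leave out missing cases (None).
    """
    total = len(values)
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows: List[Row] = []
    cumulative = 0.0
    for value in sorted(counts):
        n = counts[value]
        valid_pct = 100.0 * n / len(valid)
        cumulative += valid_pct
        rows.append((value, n, 100.0 * n / total, valid_pct, cumulative))
    return rows


if __name__ == "__main__":
    smoking = [0, 1, 1, 0, 0, None]  # hypothetical codes: 0 = no, 1 = yes, None = missing
    for row in frequency_table(smoking):
        print(row)
```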
Exercise!
Calculate the frequency of

Male subject
Smoker
Myocardial infarction
Obesity
Hypertension
Hyperlipidemia
Use of antihypertensive
Use of antidiabetic
Use of anticholesterol
The DESCRIPTIVES Procedure

• DESCRIPTIVES creates tables with summaries of values for variables
• Analyze Menu:
– Descriptive Statistics…
• Descriptives
• Highlight variables to create tables,
click the arrow to add to variable
list, then click OK
• Options are available to choose
different statistics
DESCRIPTIVES Output

1. Command syntax
2. Variable name and label
3. Number of cases
4. Statistics:
– Minimum
– Maximum
– Mean
– Standard Deviation
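The default DESCRIPTIVES statistics can be reproduced for a handful of values. A minimal Python sketch; it reports the sample standard deviation (n - 1 denominator), which is the one SPSS prints. The function name and data are hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence


def describe(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """N, minimum, maximum, mean and SD of the non-missing values (None = missing).

    Needs at least two non-missing values for the standard deviation.
    """
    data = [v for v in values if v is not None]
    return {
        "N": len(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Mean": statistics.mean(data),
        # statistics.stdev divides by n - 1 (sample standard deviation).
        "Std. Deviation": statistics.stdev(data),
    }


if __name__ == "__main__":
    glucose = [4.8, 5.1, None, 6.3, 5.0]  # hypothetical values (mmol/l)
    print(describe(glucose))
```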
Exercise!
Calculate the mean and SD of

Age
Body Mass Index
Systolic BP
Diastolic BP
Glucose level
Cholesterol level
The CROSSTABS Procedure

• CROSSTABS displays the intersection of values of two or more variables
• Analyze Menu:
– Descriptive Statistics…
• Crosstabs
• Highlight variables to create
tables, click the arrow to add to
Row, Column or Layer variable
lists, then click OK
• Statistics, Cells and Format
options are available
Crosstabs Output

1. Table title
2. Column variables
3. Row variables
4. Cell counts (# of
cases)
5. Column percents (%
of cases in column)
6. Statistics
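A column percent is a cell count divided by its column total. For the exercise below, putting hyperlipidemia in the columns and gender in the rows gives the percentage of males among subjects with hyperlipidemia. A minimal Python sketch with hypothetical codes and a hypothetical function name:

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence, Tuple

Cell = Tuple[Hashable, Hashable]


def crosstab(rows: Sequence[Optional[Hashable]],
             cols: Sequence[Optional[Hashable]]) -> Tuple[Counter, Dict[Cell, float]]:
    """Cell counts and column percents; cases missing either variable are skipped."""
    pairs = [(r, c) for r, c in zip(rows, cols) if r is not None and c is not None]
    counts = Counter(pairs)
    col_totals = Counter(c for _, c in pairs)
    column_pct = {(r, c): 100.0 * n / col_totals[c] for (r, c), n in counts.items()}
    return counts, column_pct


if __name__ == "__main__":
    # Hypothetical codes: gender 1 = male, 2 = female; hyperlipidemia 0 = no, 1 = yes.
    gender = [1, 1, 2, 2, 1, None]
    hyperlip = [1, 1, 1, 0, 0, 1]
    counts, pct = crosstab(gender, hyperlip)
    print("% male among subjects with hyperlipidemia:", pct[(1, 1)])
```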
Exercise!
Calculate % hyperlipidemia who are

Male
Smoker
Obesity
Myocardial infarction
Hypertension
Use of antihypertensive drug
Use of anticholesterol drug
Use of antidiabetic drug
The MEANS Procedure
• MEANS calculates overall means
and group means (defined by
independent variables)
• Analyze Menu:
– Compare Means…
• Means
• Highlight variables to create
tables, click the arrow to add to
Dependent or Independent
variable lists, then click OK
• Optional Statistics are available
MEANS Output

1. Command syntax
2. Numbers of cases included and
excluded
3. Dependent variable
4. Independent (group) variable
5. Means
6. Number of cases
7. Standard Deviations
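The group statistics in the MEANS output amount to splitting the dependent variable by the levels of the independent variable. A minimal Python sketch with hypothetical data; it returns NaN as the SD of a one-case group, where SPSS prints ".".

```python
import statistics
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def group_means(outcome: Sequence[Optional[float]],
                group: Sequence[Optional[Hashable]]) -> Dict[Hashable, Tuple[float, int, float]]:
    """(mean, N, SD) of `outcome` within each level of `group`.

    Cases missing either variable are excluded; the SD is NaN for a group
    with a single case.
    """
    buckets: Dict[Hashable, List[float]] = defaultdict(list)
    for y, g in zip(outcome, group):
        if y is not None and g is not None:
            buckets[g].append(y)
    return {
        g: (statistics.mean(ys), len(ys),
            statistics.stdev(ys) if len(ys) > 1 else float("nan"))
        for g, ys in buckets.items()
    }


if __name__ == "__main__":
    sbp = [118, 125, 141, 150, 137]                            # hypothetical (mmHg)
    status = ["normal", "normal", "obese", "obese", "obese"]   # hypothetical groups
    print(group_means(sbp, status))
```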
Exercise!
Compare obese vs. normal-weight subjects

Mean of:

Age
Systolic BP
Diastolic BP
Cholesterol-1
Cholesterol-2
Glucose
Exercise!

Calculate, among males, the proportion of

Smoker
Myocardial infarction
Use of Antihypertensive
Use of Antihyperlipidemia
Use of Antidiabetic
Home Assignment
Variables                    Antilipidemia: Yes   Antilipidemia: No   p value
Age (mean +/- SD)
Male (%)
Smoker (%)
Systolic BP (mean +/- SD)
Diastolic BP (mean +/- SD)
Cholesterol (mean +/- SD)
Glucose (mean +/- SD)
Myocardial Infarction (%)

* Independent t-test; ** Pearson chi-square test


Home Assignment
Variables                    Antilipidemia: Yes   Antilipidemia: No   p value
Age (mean +/- SD)            59.0 +/- 9.0         50.4 +/- 12.2       0.000*
Male (%)                     43 (51.8)            377 (41.1)          0.059**
Smoker (%)                   36 (43.4)            369 (40.1)          0.578**
Systolic BP (mean +/- SD)    136.1 +/- 20.6       128.1 +/- 20.0      0.001*
Diastolic BP (mean +/- SD)   76.5 +/- 8.7         73.7 +/- 9.9        0.007*
Cholesterol (mean +/- SD)    5.9 +/- 1.1          5.7 +/- 1.2         0.033*
Glucose (mean +/- SD)        5.3 +/- 1.1          4.9 +/- 1.6         0.007*
Myocardial Infarction (%)    25 (30.1)            21 (2.3)            0.000**

* Independent t-test; ** Pearson chi-square test


Patients who used antilipidemia were significantly older, had higher SBP, DBP, glucose and cholesterol levels, and more often had a history of MI than those who did not use antilipidemia.
The proportions of males and smokers did not differ significantly between users and non-users of antilipidemia.
Hypothesis: statins are effective in lowering cholesterol levels.
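Y = B0 + B1X1 (with a binary X1, equivalent to the independent t-test)
Y = B0 + B1X1 + B2X2 (adjusted for X2)

Independent variable: ANTILIPID (X1)  →  Dependent variable: CHOLESTEROL REDUCTION (Y)

Confounders, related both to the use of antilipidemia (confounding by indication) and to the outcome (confounding by outcome):
AGE (X2)
Smoking (X3)
MI (X4)

P(X1) = B0 + B1X2 + B2X3 + B3X4 (probability of antilipidemia use predicted from the confounders)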
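Why Y = B0 + B1X1 matches the independent t-test: when X1 is coded 0/1, the least-squares intercept B0 is the mean of Y in group 0 and the slope B1 is the difference between the two group means, so testing B1 = 0 compares the same two means the pooled-variance t-test compares. A minimal Python sketch with made-up numbers illustrating this:

```python
import statistics
from typing import Sequence, Tuple


def ols_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept B0 and slope B1 for Y = B0 + B1*X."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


if __name__ == "__main__":
    # Hypothetical data: X1 = 1 for antilipidemia users, 0 for non-users;
    # Y = reduction in cholesterol (mmol/l).
    x1 = [0, 0, 0, 1, 1, 1]
    y = [0.2, 0.4, 0.3, 1.1, 0.9, 1.3]
    b0, b1 = ols_simple(x1, y)
    mean_non = statistics.mean([v for v, g in zip(y, x1) if g == 0])
    mean_users = statistics.mean([v for v, g in zip(y, x1) if g == 1])
    print(f"B0 = {b0:.3f}, mean of non-users = {mean_non:.3f}")
    print(f"B1 = {b1:.3f}, difference in means = {mean_users - mean_non:.3f}")
```

Adding X2 (and further confounders) to the model is what turns this crude comparison into an adjusted one.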


• Does antilipidemia reduce cholesterol level?
– Baseline characteristics of antilipidemia users vs. non-users, to check homogeneity between the two groups
– Difference in cholesterol level before and after use of antilipidemia
– Association between baseline characteristics and reduction of cholesterol level
– Bivariate: association between use of antilipidemia and reduction of cholesterol
– Multivariate: association between antilipidemia and reduction of cholesterol, adjusted for confounders
Sample t-test

When do I use it?


• A one-sample t-test compares the mean of one sample to a fixed value (SPSS's default Test Value is 0; in practice a known or reference value is usually entered).
• A significant result indicates that the population mean differs from the fixed value.
Sample t-test

Checking your assumptions


• Normality: assumes that the population distribution is normal. The t-test is quite robust over moderate violations of this assumption. Check for normality by creating a histogram.
• Independent observations: the observations must be independent of one another.
Sample t-test
• Click "Compare Means" in the "Analyze" menu.
• Select "One-Sample T-test..."
• Move the variable into the "Test Variable(s)" box and enter the fixed value to compare the mean with in the "Test Value" box.
Sample t-test
• This test gives the mean and standard deviation of the sample in its first table. The second table gives the t-test itself: t, df, Sig. (2-tailed), the mean difference and its confidence interval.
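The t value in the second table is the distance between the sample mean and the test value measured in standard errors. A minimal Python sketch computing t and its degrees of freedom (n - 1); the p-value, which SPSS takes from the t distribution, is not reproduced because the standard library has no t distribution. The data and the 120 mmHg reference value are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def one_sample_t(data: Sequence[float], test_value: float = 0.0) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: population mean = test_value."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    t = (statistics.mean(data) - test_value) / se
    return t, n - 1


if __name__ == "__main__":
    sbp = [130, 140, 150, 160]                  # hypothetical systolic readings (mmHg)
    print(one_sample_t(sbp, test_value=120))    # 120 mmHg is an assumed reference value
```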
Exercise!

Sample t-test

The study population has systolic and diastolic blood pressure higher than normal
Independent t-test
When do I use it?
• An independent sample t-test compares the means of two
independent groups. e.g. data from two different groups of
participants.
• The null hypothesis would be that the means are the same.
• A low p-value (indicating a sufficiently large difference
between groups) would suggest that you reject the null
hypothesis and conclude that the two groups are significantly
different.
Independent t-test
Checking your assumptions
• Normality: Assumes that the population distributions are normal. The
t-test is quite robust over moderate violations of this assumption. It is
especially robust if a two-tailed test is used and if the sample sizes are
not especially small. Check for normality by creating a histogram.

• Independent Observations: The observations within each treatment condition must be independent.

• Equal Variances: Assumes that the population distributions have the same variances. This assumption is important: the standard t-test pools (averages) the two variances, which is not meaningful if they differ. Check it with Levene's Test for Equality of Variances; if it is violated, use the modified (Welch) t-test that does not assume equal variances.
Independent t-test
• Click "Compare Means" in the "Analyze" menu.
• Select “Independent Sample T-test..."
• Move the outcome into the "Test Variable(s)" box and the grouping variable into the "Grouping Variable" box, then click "Define Groups" and enter the two group codes (e.g., 1 and 2 for gender).
Independent t-test
• This test will give the mean and standard deviation of each
of the two samples in its first table.
• The second table gives Levene's test and the t-test. If Levene's test is not significant (p > 0.05), use the "Equal variances assumed" row; otherwise use the "Equal variances not assumed" row.
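SPSS prints two rows in the Independent Samples Test table. A minimal Python sketch of both statistics: the pooled-variance t with n1 + n2 - 2 degrees of freedom ("Equal variances assumed") and Welch's t with Welch-Satterthwaite degrees of freedom ("Equal variances not assumed"). Levene's test and p-values are not reproduced; the data are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def independent_t(a: Sequence[float],
                  b: Sequence[float]) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """(t, df) for the pooled-variance test and for Welch's test, group a minus group b."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)

    # "Equal variances assumed": pool the two variances, df = n1 + n2 - 2.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # "Equal variances not assumed": Welch's t with Welch-Satterthwaite df.
    q1, q2 = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)


if __name__ == "__main__":
    male_sbp = [135, 142, 128, 150, 139]    # hypothetical readings (mmHg)
    female_sbp = [120, 131, 118, 125, 140]  # hypothetical readings (mmHg)
    (tp, dfp), (tw, dfw) = independent_t(male_sbp, female_sbp)
    print(f"pooled: t = {tp:.3f}, df = {dfp}")
    print(f"Welch:  t = {tw:.3f}, df = {dfw:.2f}")
```

With equal group sizes the two t values coincide and only the degrees of freedom differ; with unequal sizes and unequal variances they can diverge noticeably.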
Exercise!
Is there any difference in systolic and diastolic BP between males and females?

Is mean systolic and diastolic BP higher among smokers compared with non-smokers?

Do SBP, DBP, cholesterol and glucose levels differ by nutritional status?
Paired t-test

When do I use it?


• This t-test evaluates two groups that are related to
each other. For example, data from a group of
participants who are tested before and after a
procedure would be analyzed using a paired sample t-
test.
Paired t-test

Checking your assumptions


• Normality: assumes that the differences between the paired measurements are normally distributed in the population. The t-test is quite robust over moderate violations of this assumption. Check for normality by creating a histogram of the differences.
• Independent observations: the pairs (e.g., subjects) must be independent of one another.
Paired t-test
• Click "Compare Means" in the "Analyze" menu.
• Select “Paired Sample T-test..."
• Move the two variables into the "Paired Variables" box.
Paired t-test
• This test will give the mean and standard deviation of each
of the two samples in its first table.
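• In its second table, it runs a correlation between the two variables to see how much they are related to each other.
• The third table gives the paired t-test itself: the mean of the differences, t, df and Sig. (2-tailed).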
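A paired t-test is a one-sample t-test of the within-subject differences against 0. A minimal Python sketch, taking the difference as first variable minus second (the order SPSS shows in the pair label, e.g. Chol1 - Chol2); the values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, float, int]:
    """Mean difference (first - second), t statistic and df for a paired t-test."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff, mean_diff / se, n - 1


if __name__ == "__main__":
    chol1 = [6.0, 5.5, 7.0, 6.5]  # hypothetical first measurement (mmol/l)
    chol2 = [5.0, 5.0, 6.0, 5.5]  # hypothetical second measurement (mmol/l)
    print(paired_t(chol1, chol2))
```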
Exercise!

Paired t-test
Cholesterol level before and after anticholesterol treatment
One way ANOVA
When do I use it?
• A one-way ANOVA allows us to test whether several
means (for different conditions or groups) are equal
across one variable.
One way ANOVA
Checking your assumptions

• Normality. Assume that the population distributions are normal. Check for
normality by creating a histogram.

• Independent Observations. The observations within each treatment condition must be independent.

• Equal Variances. Assume that the population distributions have the same
variances.
– As long as the largest variance is no more than 4 or 5 times the size of the smallest
variance and the sample sizes are equal, then the test is robust to violations.
– Standard deviation is the square root of variance. Square the largest and smallest
standard deviation to get the variances, and then divide the larger by the smaller.
One way ANOVA
• Select "General Linear Model" in the "Analyze" menu, then select "Univariate..." from this menu.
• Select a dependent variable.
• Next, select the factor (independent variable) and place it in the "Fixed Factor(s)"
box.
One way ANOVA
• To display the group means, click the Options button and then add your
independent variable to the "Display Means for" list.
• Select the "Descriptive Statistics" option to obtain more information about each
group such as standard deviation and count in addition to the means
One way ANOVA
• To run post hoc tests that examine individual mean differences, click "Post Hoc...", add the factor to the "Post Hoc Tests for" list and tick a test. Most often, you will use the Tukey test.
One way ANOVA
The Descriptive Statistics table shows the mean, standard deviation, and count (N) in each group.

Descriptive Statistics

Dependent Variable: Systolic Blood Pressure


BMI categorical Mean Std. Deviation N
underweight 112.23 22.016 13
normal weight 121.35 17.937 1232
over weight 134.29 20.425 1290
obese 138.51 20.547 465
Total 129.53 20.729 3000
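Because the table gives each group's mean, SD and N, the one-way ANOVA F statistic and the variance-ratio rule of thumb from the assumptions slide can be recomputed by hand. A minimal Python sketch using the rounded values from the table above; the result should be close to, but not exactly, the F that SPSS computes from the raw data. The largest-to-smallest variance ratio here is roughly 1.5, although the group sizes are far from equal.

```python
from typing import Dict, List, Tuple

Summary = Tuple[float, float, int]  # (mean, SD, n)

# Systolic BP by BMI category, copied from the Descriptive Statistics table.
GROUPS: Dict[str, Summary] = {
    "underweight": (112.23, 22.016, 13),
    "normal weight": (121.35, 17.937, 1232),
    "over weight": (134.29, 20.425, 1290),
    "obese": (138.51, 20.547, 465),
}


def anova_from_summary(stats: List[Summary]) -> Tuple[float, float, int, int]:
    """Grand mean, F, df between and df within from group means, SDs and sizes."""
    k = len(stats)
    n_total = sum(n for _, _, n in stats)
    grand_mean = sum(m * n for m, _, n in stats) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in stats)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in stats)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return grand_mean, f, df_between, df_within


def variance_ratio(stats: List[Summary]) -> float:
    """Largest group variance divided by the smallest (rule-of-thumb check)."""
    variances = [sd ** 2 for _, sd, _ in stats]
    return max(variances) / min(variances)


if __name__ == "__main__":
    summary = list(GROUPS.values())
    grand, f, df_b, df_w = anova_from_summary(summary)
    print(f"grand mean = {grand:.2f}, F({df_b}, {df_w}) = {f:.1f}")
    print(f"variance ratio = {variance_ratio(summary):.2f}")
```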
One way ANOVA
• In the "Tests of Between-Subjects Effects" table, the third row is labeled with the name of the independent variable; use the F-value and p-value (Sig.) from there.
• The between-groups degrees of freedom are also in that row. Use the degrees of freedom in the row labeled "Error" as your within-groups degrees of freedom.
One way ANOVA
• Tukey test: compares each level to every other level to see if they are significantly different.
Multiple Comparisons
Dependent Variable: Systolic Blood Pressure
Tukey HSD

(I) BMI categorical  (J) BMI categorical  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower Bound  95% CI Upper Bound
underweight          normal weight        -9.12                  5.428       .335  -23.07              4.84
underweight          over weight          -22.06*                5.427       .000  -36.01              -8.11
underweight          obese                -26.28*                5.474       .000  -40.35              -12.21
normal weight        underweight          9.12                   5.428       .335  -4.84               23.07
normal weight        over weight          -12.94*                .776        .000  -14.94              -10.95
normal weight        obese                -17.16*                1.060       .000  -19.89              -14.44
over weight          underweight          22.06*                 5.427       .000  8.11                36.01
over weight          normal weight        12.94*                 .776        .000  10.95               14.94
over weight          obese                -4.22*                 1.053       .000  -6.92               -1.51
obese                underweight          26.28*                 5.474       .000  12.21               40.35
obese                normal weight        17.16*                 1.060       .000  14.44               19.89
obese                over weight          4.22*                  1.053       .000  1.51                6.92
Based on observed means.
*. The mean difference is significant at the .05 level.
Exercise!

One way ANOVA

Is there a difference in age among nutritional status (BMI) categories?
Factorial ANOVA
When do I use it?
• To compare the means of a variable across two or more independent variables, rather than one. This is known as a factorial design.
• For example, to know how gender and BMI affect systolic BP, you could run two separate one-way ANOVAs, or run a factorial ANOVA.
• The advantage: a factorial ANOVA allows you to examine interaction effects between the independent variables.
Factorial ANOVA
Checking your assumptions

• Normality. Assume that the population distributions for each of your cells
are normal. ANOVA is quite robust over moderate violations of this
assumption. Check for normality by creating a histogram.

• Independent Observations. The observations within each cell must be independent.

• Equal Variances. Assume that the population distributions for each cell
have the same variances.
Factorial ANOVA
• Select "General Linear Model" in the "Analyze" menu, then select "Univariate..." from this menu.
• Select a dependent variable.
• Next, place the factors (independent variables, e.g., gender and BMI category) in the "Fixed Factor(s)" box.
Factorial ANOVA
• To display the group means, click the Options button and then add
your independent variable to the "Display Means for" list.
• Select the "Descriptive Statistics" option to obtain more information
about each group such as standard deviation and count in addition to
the means
Factorial ANOVA
• In the "Tests of Between-Subjects Effects" table, each factor and the interaction between them (e.g., BMI * Gender) has its own row; use the F-value and p-value (Sig.) from those rows.
• The between-groups degrees of freedom are in those rows as well. Use the degrees of freedom in the row labeled "Error" as your within-groups degrees of freedom.
Factorial ANOVA
Descriptive Statistics
Dependent Variable: Systolic Blood Pressure

BMI categorical  Gender  Mean    Std. Deviation  N
underweight      male    112.00  .               1
underweight      female  112.25  22.995          12
underweight      Total   112.23  22.016          13
normal weight    male    127.82  17.293          453
normal weight    female  117.58  17.225          779
normal weight    Total   121.35  17.937          1232
over weight      male    138.90  19.495          683
over weight      female  129.11  20.221          607
over weight      Total   134.29  20.425          1290
obese            male    143.48  18.461          193
obese            female  134.98  21.244          272
obese            Total   138.51  20.547          465
Total            male    135.77  19.535          1330
Total            female  124.57  20.312          1670
Total            Total   129.53  20.729          3000
Exercise!

Factorial ANOVA

Do gender and BMI affect diastolic and systolic BP?
Repeated measures ANOVA
When do I use it?
• Similar to a paired t-test, repeated measures ANOVA compares the means of measurements that are related to each other (e.g., taken on the same participants), but it can handle more than two measurements.
• Example: one group of participants measured at three or more time points.
Chi-Square
• Go to "Descriptive Statistics" under the "Analyze" menu and select "Crosstabs" from the list.
• Assign one variable to the rows and one variable to the columns.
• Click on the "Statistics" button.
• Check the box marked "Chi-square" and then click "Continue."
• To report the expected count for each cell, click "Cells..." in the "Crosstabs" dialog and check "Expected."
• Set the "Cells" and "Statistics" options to your specifications, then click "OK."
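The "Pearson Chi-Square" row of the output can be reproduced for a 2 x 2 table from observed and expected counts. A minimal Python sketch: the counts are derived from the Home Assignment table, assuming 83 antilipidemia users and 917 non-users (the group sizes implied by the reported percentages). For 1 degree of freedom the p-value follows from the complementary error function, so no statistics library is needed.

```python
import math
from typing import List, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float, List[List[float]]]:
    """Pearson chi-square (no continuity correction) for the 2 x 2 table

                 outcome yes   outcome no
        group 1       a             b
        group 2       c             d

    Returns (chi2, p, expected counts). With 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, expected


if __name__ == "__main__":
    # MI among antilipidemia users: 25 of 83; among non-users: 21 of 917 (assumed group sizes).
    chi2, p, expected = chi_square_2x2(25, 83 - 25, 21, 917 - 21)
    print(f"chi-square = {chi2:.1f}, p = {p:.2g}")
    print("smallest expected count:", round(min(min(r) for r in expected), 2))
```

Here the expected count for users with MI is about 3.8, below 5. SPSS flags such cells in a footnote, and for 2 x 2 tables it also prints Fisher's exact test, which is often preferred in that situation.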


Exercise!

Chi Square

Is there a difference in the proportion of MI between smokers and non-smokers?

Is there a difference in the proportion of MI between males and females?

Is there a difference in the proportion of hyperlipidemia between males and females?
Pearson correlation
When do I use it?

• Correlation analyses are used to assess the relationship between two variables.

• Graphically, the relationship between the two can be plotted (one as the independent, or predictor, variable and the other as the dependent, or predicted, variable).

• In the plot, the strength of the correlation is reflected by how closely the points cluster around a straight line, not by how steep the line is; the slope depends on the units of measurement. In table format the correlation is reported as "r": the greater the absolute value of "r" (maximum 1), the stronger the linear relationship.
Pearson correlation
Checking your assumptions

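• Normality: Assumes that the population distribution of each variable is normal. Check for normality by creating a histogram.

• Independent observations: The observations within each variable must be independent.

• Linearity: r only measures the linear part of the relationship; check a scatterplot.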
Pearson correlation
• Click on "Correlate" under the "Analyze" menu, then select "Bivariate" from this menu.
• Select the variables you wish to correlate and press "OK."
Pearson correlation
• Look for where the columns and rows for two variables intersect to
find information about the r-value and p-value. Significant
correlations will be starred.
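The r in the correlation table can be computed from its definition, and its significance test uses t = r * sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. A minimal Python sketch with hypothetical data; it also shows that rescaling one variable changes the slope of a fitted line but leaves r unchanged.

```python
import math
import statistics
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation r and the t statistic (df = n - 2) used to test r = 0.

    Undefined (division by zero) when a variable is constant or |r| = 1.
    """
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    print(pearson_r(x, y))
    print(pearson_r([10 * v for v in x], y))  # same r after rescaling x
```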
Exercise!

Pearson correlation

Is there a correlation between systolic BP and cholesterol level?

Is there a correlation between systolic and diastolic BP and age?
Linear Regression
(simple and multivariate)
Linear Regression

When do I use it?


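• To look at the linear relationship between a continuous (interval) outcome variable and one predictor (simple linear regression) or several predictors (multiple linear regression). Predictors can be continuous or binary (coded 0/1); the residuals should be approximately normally distributed.

• Example: the linear relationship between systolic blood pressure and cholesterol level, BMI and sex.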
Linear Regression

• Click on "Regression" under the "Analyze" menu, then select "Linear" from this menu.
• Move the outcome into the "Dependent" box and the predictors into the "Independent(s)" box, then press "OK."
Exercise
• Are age, BMI and smoking status predictors of SBP and DBP?
• Y = SBP or DBP (continuous scale)
• X = age (continuous), BMI (continuous), smoking (nominal, 0 = no, 1 = yes)
• Linear regression
Y = B0 + B1(Age) + B2(BMI) + B3(Smoking)
Y = 67.39 + 0.72(Age) + 0.974(BMI) - 2.130(Smoking)
Example: a 60-year-old smoker with BMI 25.5
SBP = 67.39 + 0.72*60 + 0.974*25.5 - 2.130*1
SBP ≈ 133.3 mmHg
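The worked prediction can be checked with a few lines of Python using the coefficients from the exercise (decimal commas written as points). With the coefficients as printed, the sum is 133.297, i.e. about 133.3 mmHg. The function name is hypothetical.

```python
# Coefficients from the exercise: intercept, age, BMI, smoking (0 = no, 1 = yes).
B0, B_AGE, B_BMI, B_SMOKING = 67.39, 0.72, 0.974, -2.130


def predict_sbp(age: float, bmi: float, smoking: int) -> float:
    """Predicted systolic BP (mmHg) from the fitted linear model."""
    return B0 + B_AGE * age + B_BMI * bmi + B_SMOKING * smoking


if __name__ == "__main__":
    print(round(predict_sbp(60, 25.5, 1), 1))  # 60-year-old smoker, BMI 25.5
```

Each coefficient is the change in predicted SBP for a one-unit change in that predictor with the others held fixed; for example, switching smoking from 0 to 1 changes the prediction by -2.13 mmHg.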
Logistic Regression
(simple and multivariate)
Logistic Regression

When do I use it?


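• Logistic regression assumes that the outcome variable is binary.
• In SPSS syntax, the first variable listed after LOGISTIC REGRESSION VARIABLES is the outcome (or dependent) variable and the variables on the /METHOD subcommand are the predictors (or independent variables), e.g. LOGISTIC REGRESSION VARIABLES MI /METHOD=ENTER Hyperlipidemia Chol1 Systolic.
• The predictor variables can be dichotomous or continuous. A categorical predictor with more than two levels should be declared with the "Categorical..." button (so that SPSS creates indicator variables) rather than entered as a plain number.
• Example: the relationship between the outcome MI and hyperlipidemia, cholesterol and systolic BP.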
Logistic Regression

• Click on "Regression" under the "Analyze" menu, then select "Binary Logistic" from this menu.
• Move the binary outcome into the "Dependent" box and the predictors into the "Covariates" box, then press "OK."
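SPSS reports each logistic coefficient B together with Exp(B), the odds ratio for a one-unit increase in that predictor. A minimal Python sketch of how those numbers turn into odds ratios and a predicted probability; the coefficients are hypothetical placeholders, not results from the course data.

```python
import math
from typing import Sequence


def predicted_probability(b0: float, coefs: Sequence[float], xs: Sequence[float]) -> float:
    """P(outcome = 1) from a fitted logistic model: 1 / (1 + exp(-logit))."""
    logit = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio(b: float) -> float:
    """Exp(B): the odds ratio for a one-unit increase in the predictor."""
    return math.exp(b)


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only:
    # intercept, hyperlipidemia (0/1), cholesterol (mmol/l), systolic BP (mmHg).
    b0, coefs = -9.0, [0.8, 0.3, 0.03]
    print("OR for hyperlipidemia:", odds_ratio(0.8))
    print("P(MI):", predicted_probability(b0, coefs, [1, 6.0, 140]))
```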
Exercise
• There are significant differences in cholesterol and blood pressure levels between males and females
• Antilipidemia is effective in lowering cholesterol level
– Reduction of cholesterol is greater among antilipidemia users than among non-users
• There are differences in cholesterol and blood pressure levels between BMI categories
• Male sex, smoking, myocardial infarction, and hyperlipidemia are risk factors for hypertension
• BMI and age influence blood pressure and cholesterol level
– Higher age or BMI is associated with higher blood pressure and cholesterol level
THANK YOU
