
Using SPSS for Windows - Statistical Procedures

1. Summarizing Data:
– Frequencies: descriptive statistics of one categorical variable (frequency table)
– Descriptives: descriptive statistics of one or many numeric variables
– Crosstabs: descriptive statistics of two categorical variables
– Explore: descriptive statistics of one numeric variable
2. Analyzing Data and Testing Hypotheses
• Parametric Tests (Comparing Means)
– t Tests
– One-Way Analysis of Variance
– Correlation and Regression
• Nonparametric Tests
– Chi-Square (Crosstabs)
– Wilcoxon Signed Ranks
– Mann-Whitney U
– Kruskal-Wallis
Choosing a Statistical Test

Metric data, normal distribution (compare means):
– 2 data sets, paired: Paired t test
– 2 data sets, unpaired: Unpaired t test
– More than 2 data sets, paired: Repeated measures ANOVA
– More than 2 data sets, unpaired: One-way ANOVA
– Association (correlation between 2 variables): Pearson correlation
– Regression (prediction of one variable from another variable): Linear regression

Metric data, non-normal distribution (compare medians):
– 2 data sets, paired: Wilcoxon signed rank test
– 2 data sets, unpaired: Wilcoxon rank sum test / Mann-Whitney U test
– More than 2 data sets, paired: Friedman test
– More than 2 data sets, unpaired: Kruskal-Wallis test
– Association: Spearman rank correlation
– Regression: Non-parametric regression

Categorical data, dichotomous distribution (compare proportions):
– 2 data sets, paired: McNemar test / Binomial test
– 2 data sets, unpaired: Chi-square test, Fisher's exact test
– More than 2 data sets, paired: Cochran Q test
– More than 2 data sets, unpaired: Chi-square test
– Association: Contingency coefficient
– Regression: Logistic regression
Dependent t-Test (paired t-test, paired-samples t-test, or repeated measures t-test)
• It compares the means of two related groups to determine whether there is a
statistically significant difference between these means.
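For reference, the test statistic is the mean of the paired differences divided by its standard error, with n - 1 degrees of freedom (n = number of pairs):

t = \bar{d} / (s_d / \sqrt{n}), \qquad df = n - 1

where \bar{d} is the mean of the differences and s_d is their standard deviation.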
• Click Analyze > Compare Means > Paired-Samples T Test.

• Double-click on a variable to move it to the Variable1 slot in the Paired Variables box. Then double-click on the other variable to move it to the Variable2 slot in the Paired Variables box.
• Click OK.

Continued…
• Pair: The “Pair” column represents the number of Paired
Samples t Tests to run. You may choose to run multiple Paired
Samples t Tests simultaneously by selecting multiple sets of
matched variables. Each new pair will appear on a new line.
• Variable1: The first variable, representing the first group of
matched values.
• Variable2: The second variable, representing the second
group of matched values.
• Clicking Options will open a window where you can specify
the Confidence Interval Percentage and how the analysis will
address Missing Values
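The same analysis can also be run from a syntax window (the Paste button in the dialog generates equivalent commands). A minimal sketch, assuming the two score variables in the example dataset are named English and Math:

* Paired-samples t test comparing English and Math scores.
T-TEST PAIRS=English WITH Math (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.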

Output and Interpretation-Example

• Decision and Conclusions: Reject H0 if the Sig. (p) value is less than 0.05.
• There was a significant average difference between English and Math scores (t(397) = 36.313, p < 0.001).
• On average, English scores were 17.3 points higher than Math scores (95% CI [16.36, 18.23]).
Independent Samples t Test
• Compares the means of two (and only two) groups to test for statistically significant differences between them.
• To run an Independent Samples t Test in SPSS, click Analyze >
Compare Means > Independent-Samples T Test.

• You can move variables to either of two areas: Grouping Variable or Test Variable(s).

Continued…
• Test Variable(s): The dependent variable(s). This is the continuous variable
whose means will be compared between the two groups.
• Grouping Variable: The independent variable. The categories (or groups)
of the independent variable will define which samples will be compared in
the t test.
• Click Define Groups to define the category indicators (groups) to use in the
t test.
• Options: this section is where you can set your desired confidence level for
the confidence interval for the mean difference, and specify how SPSS
should handle missing values.

• When finished, click OK to run the Independent Samples t Test
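Equivalent syntax, assuming a 0/1 grouping variable named Athlete and a test variable named MileTime (both names are placeholders for your own dataset):

* Independent-samples t test of mile time by athlete status.
* GROUPS=Athlete(0 1) names the two category codes to compare.
T-TEST GROUPS=Athlete(0 1)
  /VARIABLES=MileTime
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.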

Output

• The p-value of Levene's test is printed as ".000" (read as p < 0.001), so we reject the null hypothesis of Levene's test and conclude that the variance in mile time of athletes is significantly different from that of non-athletes.
• This tells us that we should look at the "Equal variances not assumed" row for the t test
(and corresponding confidence interval) results.
• Note that the mean difference is calculated by subtracting the mean of the second group
from the mean of the first group.
• The positive t value indicates that the mean for the first group is significantly greater than the mean for the second group.

Continued…
• The associated p-value is printed as ".000"; double-clicking on the p-value will reveal the un-rounded number. SPSS rounds p-values to three decimal places, so any p-value too small to round up to .001 will print as .000. (In this particular example, the p-values are on the order of 10^-40.)
Decision and Conclusions
• Since p < .001 is less than our chosen significance level α = 0.05, we can reject the null hypothesis and conclude that the mean mile time for athletes and non-athletes is significantly different.
Based on the results, we can state the following:
• There was a significant difference in mean mile time between non-athletes and athletes (t(315.846) = 15.047, p < .001).
• The average mile time for athletes was 2 minutes and 14 seconds faster
than the average mile time for non-athletes.

One-Way ANOVA / One-Factor ANOVA / Between-Subjects ANOVA
Compares the means of two or more independent groups in order to determine
whether there is statistical evidence that the associated population means are
significantly different.
• Click Analyze > Compare Means > One-Way ANOVA.

• Dependent List: The dependent variable(s); the variable whose means will be compared between the samples (groups).
• Factor: The independent variable. Categories (or groups) of the independent variable will define which samples will be compared.

Continued….
• Contrasts: (Optional) Planned comparisons, to be conducted after the overall ANOVA test. When the initial F test indicates that significant differences exist between group means, contrasts are useful for determining which specific means are significantly different when you have specific hypotheses to be tested.
• Contrasts are decided before analyzing the data (i.e., a priori).
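In syntax, a planned contrast is requested with the CONTRAST subcommand, giving one coefficient per group in the factor's coded order. A sketch with hypothetical variable names (Outcome, Group):

* Compare group 1 against group 3; group 2 is ignored (coefficient 0).
ONEWAY Outcome BY Group
  /CONTRAST=1 0 -1.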

Continued…
• Post Hoc: (Optional) Also known as multiple comparisons tests. Post hoc tests are useful for determining which specific means are significantly different when you do not have specific hypotheses to be tested.
• This window also sets the significance level: the desired cutoff for statistical significance. By default, significance is set to 0.05.
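In syntax, post hoc tests use the POSTHOC subcommand; for example, Tukey's HSD at the default 0.05 level (again with hypothetical names):

* All pairwise comparisons of group means using Tukey's HSD.
ONEWAY Outcome BY Group
  /POSTHOC=TUKEY ALPHA(0.05).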

Continued….
• Options: Specify which Statistics to include, whether to include a Means plot, and how the analysis will address Missing Values.

• Click Continue and then Click OK

Problem Statement
• In the sample dataset, the variable Running time is the respondent's time (in seconds) to run a given distance, and Smoking is an indicator of whether or not the respondent smokes (0 = Nonsmoker, 1 = Past smoker, 2 = Current smoker).
• Let's use ANOVA to test if there is a statistically significant
difference in running time with respect to smoking status.
• Running time will serve as the dependent variable, and
smoking status will act as the independent variable.

Continued….

Running the Procedure


• Click Analyze > Compare Means > One-Way
ANOVA.
• Add the variable Running time to the Dependent
List box, and add the variable Smoking to the
Factor box.
• Click Options. Check the box for Means plot, then
click Continue.
• Click OK when finished.
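The pasted syntax for this example should look roughly like the following, assuming the variables are named RunningTime and Smoking in the dataset:

* One-way ANOVA of running time across the three smoking groups.
ONEWAY RunningTime BY Smoking
  /STATISTICS DESCRIPTIVES
  /PLOT MEANS
  /MISSING ANALYSIS.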
Output, Discussion and Conclusions

• We conclude that the mean running time is significantly different for at least one of the smoking groups (F(2, 350) = 9.209, p < 0.001).

Pearson Correlation
• Bivariate Pearson Correlation produces a sample correlation coefficient, r, which
measures the strength and direction of linear relationships between pairs of
continuous variables.
• H0: ρ=0 (population correlation coefficient is 0; there is no association)
• Analyze > Correlate > Bivariate.

• Variables: Select at least 2 continuous variables; you may select more than two.

Continued…
• Correlation Coefficients: By default, Pearson is selected. It will produce
the test statistics for a bivariate Pearson Correlation.
• Test of Significance: SPSS uses a two-tailed test by default.
• Flag significant correlations: This option will include asterisks (**) next to statistically significant correlations in the output. By default, SPSS marks statistical significance at the alpha = 0.05 level.
• Options: specify which Statistics to include and how to address Missing
Values

Example: Understanding the linear association between
weight and height

1. Scatter-plots: Click Graphs > Legacy Dialogs > Scatter/Dot


• In the Scatter/Dot window, click Simple Scatter, then click Define.
• Move variable Height to the X-Axis box, and move variable Weight to the Y-Axis box.
When finished, click OK.
• From scatterplot, we can see that as height increases, weight also tends to increase.

2. Running the Test


• Analyze > Correlate > Bivariate.
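Both steps can be scripted as well; a sketch assuming the variables are named Height and Weight:

* Simple scatterplot of Weight (Y) against Height (X).
GRAPH
  /SCATTERPLOT(BIVAR)=Height WITH Weight
  /MISSING=LISTWISE.

* Bivariate Pearson correlation with two-tailed significance.
CORRELATIONS
  /VARIABLES=Height Weight
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.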

Output, Decision and Conclusions

• The important cells we want to look at are either B or C. Cells B and C are identical and
contain the correlation coefficient for the correlation, its p-value, and the number of
complete pairwise observations.
• The Pearson correlation coefficient for height and weight is .513, which is significant (p < .001 for a two-tailed test), based on 354 complete observations.
Decision and Conclusions
• H0 is rejected. Weight and height have a statistically significant linear relationship (p
< .001).
• The direction of the relationship is positive
• The magnitude, or strength, of the association is approximately moderate (.3 < | r | < .5).

Linear Regression
• It is called simple regression because it involves a single independent variable (X), a single dependent variable (Y), and a linear relationship between them.
• It is an extension of the Pearson correlation; both variables must be continuous.
• It helps to predict the value of the DV using a single IV.
• Questions to be addressed are:
– Is there an association between the X and Y variables?
– If you make a 1-unit change in the IV, how much change can you expect in the DV?
• H0: the coefficient is equal to zero; X and Y have no relationship (or no effect).
• H0 is rejected when the p-value is less than alpha = 0.05.
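Formally, the model and hypothesis can be written as:

Y = \beta_0 + \beta_1 X + \varepsilon, \qquad H_0: \beta_1 = 0 \;\; \text{vs.} \;\; H_1: \beta_1 \neq 0

so a one-unit increase in X is associated with an expected change of \beta_1 units in Y.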

Continued…
• Procedure: Analyze > Regression > Linear.
• Click Statistics: select Model fit and Estimates (default), and add Confidence intervals and Descriptives.
• Click Plots: select Normal probability plot.
• Example: Does height (IV) significantly predict weight (DV)?
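Pasted syntax for this example, assuming the variables are named Height and Weight:

* Simple linear regression predicting Weight from Height.
* The RESIDUALS subcommand requests the normal probability plot.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT Weight
  /METHOD=ENTER Height
  /RESIDUALS NORMPROB(ZRESID).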

Decision, interpretation and conclusion:


– From the ANOVA table: the regression model is significant, i.e., the model fits well (F(1, 352) = 125.894, p < 0.001).
– From the Model Summary table: 26.1% of the variance in weight is explained or predicted by one's height.
– From the Coefficients table: height is a good predictor of weight; the regression coefficient shows that for every additional one inch in height, you can expect weight to increase by an average of 4.12 pounds.

Write up of data analysis

A simple linear regression was calculated to predict weight in pounds from height in inches, b = 4.12, t(352) ≈ 11.22, p < 0.001 (for a single predictor, the slope's t equals √F). A significant regression equation was found (F(1, 352) = 125.894, p < 0.001), with an R² of .261.

Chi-square Test
The Chi-Square Test of Independence determines whether there is an association between two categorical variables (i.e., whether the variables are independent or not).
• Click on Analyze -> Descriptive Statistics -> Crosstabs.

• Drag and drop one variable into the Row(s) box, and one into Column(s) box.
• Click on Statistics, and select Chi-square.
• (Optional) Check the box for Display clustered bar charts.
• Press Continue, and then OK. Result will appear in output viewer.
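Equivalent syntax, assuming the two variables are named Gender and Smoking:

* Chi-square test of independence between gender and smoking status.
CROSSTABS
  /TABLES=Gender BY Smoking
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED
  /COUNT ROUND CELL
  /BARCHART.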

Continued…
• Row(s): You must enter at least one Row variable.
• Column(s): You must enter at least one Column variable.
• Layer: An optional "stratification" variable.
• Statistics: Opens the Crosstabs: Statistics window. Make sure the Chi-square box is checked.

Continued….
• Cells: controls which output is
displayed in each cell of the crosstab.
There are three options in this
window that are useful (but
optional).
I. Observed: actual number of
observations for a given cell. It is
enabled by default.
II. Expected: expected number of
observations for that cell.
III. Unstandardized Residuals: "residual"
value, computed as observed minus
expected.
• Format: Opens the Table Format window, which specifies how the rows of the table are sorted. Ascending order is enabled by default.
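The Observed and Expected counts described above feed directly into the test statistic:

\chi^2 = \sum_i (O_i - E_i)^2 / E_i

summed over all cells of the crosstab, with (rows - 1) × (columns - 1) degrees of freedom.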
Output, Decision and Interpretation

• The value of the test statistic is 3.171 (see Pearson chi-square)


• The corresponding p-value of the test statistic is p = 0.205.
• Since the p-value is greater than our chosen significance level (α = 0.05), we do not
reject the null hypothesis.
• We conclude that there is not enough evidence to suggest an association between gender and smoking.
• So, no association was found between gender and smoking behavior (χ²(2) = 3.171, p = 0.205).
