Statistical Procedures
1. Summarizing Data:
– Frequencies: descriptive stats of one categorical variable
– Descriptive: descriptive stats of one or many numeric variables
– Crosstabs: descriptive stats of two categorical variables
– Explore: descriptive stats of one numeric variable
– Frequency table
2. Analyzing Data and Testing Hypotheses
• Parametric Tests (Comparing Means)
– T-Tests
– One-Way Analysis of Variance
– Correlation and regression
• Nonparametric Tests
– Chi Square or Crosstabs
– Wilcoxon Signed Ranks
– Mann-Whitney U
– Kruskal-Wallis
Choosing Statistical Test
Table columns:
– Data type & distribution
– Comparison & samples, 2 data sets: Paired / Unpaired
– Comparison & samples, more than 2 data sets: Paired / Unpaired
– Association (correlation between 2 variables)
– Regression (prediction of one variable from other variable)
Continued…
• Pair: The “Pair” column represents the number of Paired
Samples t Tests to run. You may choose to run multiple Paired
Samples t Tests simultaneously by selecting multiple sets of
matched variables. Each new pair will appear on a new line.
• Variable1: The first variable, representing the first group of
matched values.
• Variable2: The second variable, representing the second
group of matched values.
• Clicking Options will open a window where you can specify
the Confidence Interval Percentage and how the analysis will
address Missing Values
Output and Interpretation-Example
• Decision and Conclusions: reject the null hypothesis if the Sig. (p-value) is less than 0.05.
• There was a significant average difference between English and Math scores (t(397) =
36.313, p < 0.001).
• On average, English scores were 17.3 points higher than Math scores (95% CI [16.36,
18.23]).
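The t statistic reported for a paired test comes from the per-pair differences. The following is a minimal sketch of that arithmetic in Python, using only the standard library. The function name `paired_t` and the five pairs of scores are hypothetical and are not taken from the dataset behind the slide.

```python
import math
import statistics

def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for matched samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, based on the sample (n - 1) standard deviation.
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1

# Hypothetical scores for five students (illustrative only).
english = [80, 75, 90, 85, 70]
math_scores = [70, 65, 85, 70, 60]
mean_diff, t_stat, df = paired_t(english, math_scores)
print(f"mean difference = {mean_diff}, t({df}) = {t_stat:.3f}")
```

The degrees of freedom equal the number of pairs minus one, which is why a result written as t(397) implies 398 complete pairs.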
Independent Samples t Test
• Compares the means of two (and only two) independent groups to test whether
they differ statistically
• To run an Independent Samples t Test in SPSS, click Analyze >
Compare Means > Independent-Samples T Test.
Continued…
• Test Variable(s): The dependent variable(s). This is the continuous variable
whose means will be compared between the two groups.
• Grouping Variable: The independent variable. The categories (or groups)
of the independent variable will define which samples will be compared in
the t test.
• Click Define Groups to define the category indicators (groups) to use in the
t test.
• Options: this section is where you can set your desired confidence level for
the confidence interval for the mean difference, and specify how SPSS
should handle missing values.
Output
• The p-value of Levene's test is printed as ".000" (read as p < 0.001), so we reject the
null hypothesis of Levene's test and conclude that the variance in mile time of athletes is
significantly different from that of non-athletes.
• This tells us that we should look at the "Equal variances not assumed" row for the t test
(and corresponding confidence interval) results.
• Note that the mean difference is calculated by subtracting the mean of the second group
from the mean of the first group.
• The positive t value indicates that the mean for the first group is greater than the
mean for the second group (whether the difference is significant is given by the p-value).
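The "Equal variances not assumed" row corresponds to Welch's form of the t test, whose degrees of freedom are generally not whole numbers. The sketch below shows the calculation under that assumption. The function name `welch_t` and the two small samples are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

def welch_t(group1, group2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    # Each group's variance divided by its own size (no pooling of variances).
    v1 = statistics.variance(group1) / n1
    v2 = statistics.variance(group2) / n2
    # Mean difference is first group minus second group.
    t_stat = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Hypothetical mile times in minutes (illustrative only).
non_athletes = [9.0, 10.0, 11.0, 12.0, 13.0]
athletes = [6.0, 7.0, 8.0]
t_stat, df = welch_t(non_athletes, athletes)
print(f"t({df:.3f}) = {t_stat:.3f}")
```

Because the first group here has the larger mean, t comes out positive, matching the sign convention described above.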
Continued…
• The associated p-value is printed as ".000"; double-clicking on the p-value
will reveal the unrounded number. SPSS rounds p-values to three
decimal places, so any p-value too small to round up to .001 will print
as .000. (In this particular example, the p-values are on the order of 10^-40.)
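The rounding behaviour is easy to reproduce outside SPSS. As a small sketch, the hypothetical helper `format_p` below reports any value under .001 as "p < .001" instead of printing a misleading zero.

```python
def format_p(p):
    """Format a p-value for reporting, never printing it as .000."""
    if p < 0.001:
        return "p < .001"
    # Drop the leading zero, as in the reporting style used on these slides.
    return "p = " + f"{p:.3f}".lstrip("0")

tiny = 1e-40
print(f"{tiny:.3f}")     # three-decimal rounding hides the value
print(format_p(tiny))    # reporting form
print(format_p(0.0312))
```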
Decision and Conclusions
• Since p < .001 is less than our chosen significance level α = 0.05, we can
reject the null hypothesis and conclude that the mean mile
time for athletes and non-athletes is significantly different.
Based on the results, we can state the following:
• There was a significant difference in mean mile time between non-athletes
and athletes (t(315.846) = 15.047, p < .001).
• The average mile time for athletes was 2 minutes and 14 seconds faster
than the average mile time for non-athletes.
One-Way ANOVA/One-Factor ANOVA/Between Subjects ANOVA
• Compares the means of two or more independent groups in order to determine
whether there is statistical evidence that the associated population means are
significantly different.
• Click Analyze > Compare Means > One-Way ANOVA.
• Dependent List: The dependent variable(s). This is the variable whose means will be
compared between the samples (groups).
• Factor: The independent variable. The categories (or groups) of the independent
variable define which samples will be compared.
Continued….
• Contrasts (optional): Planned comparisons, conducted after the
overall ANOVA test. When the initial F test indicates that significant
differences exist between group means, contrasts are useful for
determining which specific means are significantly different when you
have specific hypotheses to be tested.
• Contrasts are decided before analyzing the data (i.e., a priori).
Continued…
• Post Hoc (optional): Also known as multiple comparisons tests. Post hoc
tests are useful for determining which specific means are significantly
different when you do not have specific hypotheses to be tested.
• The Post Hoc window also sets the significance level: the desired cutoff for
statistical significance. By default, it is set to 0.05.
Continued….
• Options: Specify which Statistics to include, whether to include a Means plot, and how
the analysis will address Missing Values
Problem Statement
• In the sample dataset, the variable running time is the
respondent's time (in seconds) to run a given distance, and
Smoking is an indicator about whether or not the respondent
smokes (0 = Nonsmoker, 1 = Past smoker, 2 = Current
smoker).
• Let's use ANOVA to test if there is a statistically significant
difference in running time with respect to smoking status.
• Running time will serve as the dependent variable, and
smoking status will act as the independent variable.
Continued….
• We conclude that the mean running time is significantly different for at least
one of the smoking groups (F(2, 350) = 9.209, p < 0.001).
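The F statistic and its two degrees of freedom come from splitting the variation into a between-groups part and a within-groups part. The sketch below works through that split. The function name `one_way_anova` and the nine running times are hypothetical and are not the sample dataset from the slides.

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical running times in seconds by smoking status (illustrative only).
nonsmoker = [300, 310, 320]
past_smoker = [310, 320, 330]
current_smoker = [340, 350, 360]
f_stat, df_between, df_within = one_way_anova([nonsmoker, past_smoker, current_smoker])
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

With three groups the first degrees-of-freedom value is always 2, and the second is the total sample size minus 3, so F(2, 350) implies 353 respondents with complete data.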
Pearson Correlation
• Bivariate Pearson Correlation produces a sample correlation coefficient, r, which
measures the strength and direction of linear relationships between pairs of
continuous variables.
• H0: ρ=0 (population correlation coefficient is 0; there is no association)
• Click Analyze > Correlate > Bivariate.
• Variables: Select at least two continuous variables; more than two may be selected.
Continued…
• Correlation Coefficients: By default, Pearson is selected. It will produce
the test statistics for a bivariate Pearson Correlation.
• Test of Significance: SPSS uses a two-tailed test by default.
• Flag significant correlations: This option adds asterisks (**) next to
statistically significant correlations in the output. By default, SPSS marks
statistical significance at the alpha = 0.05 level.
• Options: Specify which Statistics to include and how to address Missing
Values
Example: Understanding the linear association between
weight and height
Output, Decision and Conclusions
• The important cells are B and C. They are identical and each contains the
correlation coefficient, its p-value, and the number of complete pairwise observations.
• The Pearson correlation coefficient for height and weight is .513, which is significant (p < .001
for a two-tailed test), based on 354 complete observations.
Decision and Conclusions
• H0 is rejected. Weight and height have a statistically significant linear relationship (p < .001).
• The direction of the relationship is positive.
• The magnitude, or strength, of the association is approximately moderate (r = .513, just
above the conventional .3 < | r | < .5 range).
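For reference, the coefficient r and the t statistic used to test H0: ρ = 0 can be computed by hand. The sketch below assumes the standard formulas. The function names `pearson_r` and `r_to_t` and the five height/weight pairs are hypothetical; only r = .513 and n = 354 are taken from the output described above.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic (with n - 2 degrees of freedom) for testing rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical heights (cm) and weights (kg), illustrative only.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 63, 72, 80]
print(f"r = {pearson_r(heights, weights):.3f}")

# Using the values reported above: r = .513 from 354 complete pairs.
print(f"t(352) = {r_to_t(0.513, 354):.2f}")
```

A moderate r can still yield a very large t when n is in the hundreds, which is why the reported p-value is so small.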
Linear Regression
• It is called simple regression because it involves a single independent variable
(X), a single dependent variable (Y), and a linear relationship between them
• It is an extension of Pearson correlation, and both variables must
be continuous
• It helps to predict the value of the DV using a single IV
• Questions to be addressed are:
– Is there an association between the X and Y variables?
– If you make a 1-unit change in the IV, how much change can you expect
in the DV?
• H0: the coefficient is equal to zero; X and Y have no relationship (no effect)
• H0 is rejected when the p-value is less than alpha = 0.05
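The second question above is answered by the slope of the fitted line. The sketch below is a minimal ordinary least squares fit for one predictor. The function name `fit_line` and the data are hypothetical, not the dataset used in the slides.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical heights (cm, the IV) and weights (kg, the DV), illustrative only.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 63, 72, 80]
slope, intercept = fit_line(heights, weights)
print(f"weight = {intercept:.2f} + {slope:.2f} * height")
```

Here the slope reads as the expected change in weight for each additional centimetre of height. The null hypothesis on the slide is that this coefficient equals zero.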
Continued…
• Procedure: Analyze > Regression > Linear
• Click Statistics: select Model fit and Estimates (selected by default), and add Confidence
intervals and Descriptives
• Click Plots: select Normal probability plot
• Example: Does height (IV) significantly predict weight (DV)?
Write up of data analysis
Chi-square Test
Chi-Square Test of Independence determines whether there is an association between two
categorical variables (i.e., whether the variables are independent or not).
• Click on Analyze > Descriptive Statistics > Crosstabs.
• Drag and drop one variable into the Row(s) box, and one into the Column(s) box.
• Click on Statistics, and select Chi-square.
• (Optional) Check the box for Display clustered bar charts.
• Press Continue, and then OK. The results will appear in the Output Viewer.
Continued…
• Row(s): You must enter at least one Row variable.
• Column(s): You must enter at least one Column variable.
• Layer: An optional "stratification" variable.
• Statistics: Opens the Crosstabs: Statistics window. Make sure that the Chi-square box is
checked.
Continued….
• Cells: controls which output is displayed in each cell of the crosstab. There are three
options in this window that are useful (but optional).
I. Observed: actual number of observations for a given cell. It is enabled by default.
II. Expected: expected number of observations for that cell.
III. Unstandardized Residuals: "residual" value, computed as observed minus expected.
• Format: Table Format window, which specifies how the rows of the table are sorted.
ASCENDING is enabled by default
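The Observed, Expected, and Unstandardized Residuals cell options map directly onto the pieces of the chi-square calculation. The sketch below shows how they fit together for a crosstab given as a list of rows. The function name `chi_square` and the 2 x 2 counts are hypothetical.

```python
def chi_square(observed):
    """Return (chi-square statistic, df, expected counts, unstandardized residuals)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Unstandardized residual: observed minus expected.
    residuals = [[o - e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(observed, expected)]
    chi2 = sum(res ** 2 / e
               for res_row, exp_row in zip(residuals, expected)
               for res, e in zip(res_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, residuals

# Hypothetical 2 x 2 table of observed counts (illustrative only).
observed = [[10, 20],
            [30, 40]]
chi2, df, expected, residuals = chi_square(observed)
print(f"chi-square({df}) = {chi2:.3f}")
print("expected:", expected)
print("residuals:", residuals)
```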
Output, Decision and Interpretation