There is a data set posted on the BlackBoard site that you can use to follow along.

FIRST OFF: HOW TO FIND/USE SPSS

On DePaul lab computers, go to PROGRAMS, then STATISTICAL SOFTWARE, and click on SPSS. Alternatively, you can open a pre-existing data set (like the one used for this lab) and it should open up SPSS. In SPSS, you can enter your data and calculate descriptive and inferential statistics using fairly simple steps. When entering data, I would advise entering it as follows:

Participant ID   Variable 1 (e.g., year)   Variable 2 (e.g., GPA)   Variable 3 (e.g., graduate school plans)
Participant 1            3                         3.2                              0
Participant 2            2                         2.5                              1
Participant 3            4                         2.8                              1
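As a side illustration (not part of SPSS), the same coded layout can be sketched in Python. The variable names below are just made up from the example above, with every variable stored as a number, as recommended:

```python
# Each row is one participant; all variables are coded numerically
# (year: 1=Freshman ... 4=Senior; grad_plans: 0=no plans, 1=planning on grad school).
data = [
    {"id": 1, "year": 3, "gpa": 3.2, "grad_plans": 0},
    {"id": 2, "year": 2, "gpa": 2.5, "grad_plans": 1},
    {"id": 3, "year": 4, "gpa": 2.8, "grad_plans": 1},
]

# Because everything is numeric, summary statistics are straightforward:
mean_gpa = sum(row["gpa"] for row in data) / len(data)
print(round(mean_gpa, 2))  # mean GPA across the three participants
```

This is exactly why SPSS wants numeric codes: once "Junior" becomes 3 and "plans on grad school" becomes 1, every variable can go through the same arithmetic.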
List each participant (give them ID numbers—PROTECT CONFIDENTIALITY) going down the first column. Each of the following columns represents one of your variables. In SPSS, make sure to CODE all variables into NUMERIC values—SPSS can only compute statistics on numbers (not written variables). For example, Year (1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior). You can also DICHOTOMIZE variables (assigning them a score of 0 or 1, depending on presence or absence—good for things like gender, or presence/absence of a behavior [observational research]). For example, Graduate School Plans: 0 = no school, 1 = planning on going to grad school.

STEP 1: GETTING TO KNOW THE DATA
EXAMINE GENERAL FEATURES OF THE DATA; LOOK FOR OUTLIERS, ANOMALIES, AND IMPOSSIBLE NUMBERS IN YOUR DATA

Go to ANALYZE, then DESCRIPTIVE STATISTICS in the dropdown menu, and then FREQUENCIES. Move variables of interest over into the column on the right. Click the box for DISPLAY FREQUENCY TABLES. Next hit CHARTS. From here you can click whatever chart type you are comfortable with. I generally recommend choosing HISTOGRAM, and I would also click the checkbox for WITH NORMAL CURVE. Then hit OK. You will see a print out like this:
Statistics
IQ
  N   Valid      88
      Missing     0
(Valid lets you know how many pieces of data were entered for this variable [also see the N—sample size]; Missing lets you know how many data points for this variable were not entered/answered)
IQ
                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   75          1          1.1          1.1              1.1
        79          1          1.1          1.1              2.3
        81          2          2.3          2.3              4.5
        82          3          3.4          3.4              8.0
        83          2          2.3          2.3             10.2
        84          2          2.3          2.3             12.5
        85          3          3.4          3.4             15.9
        86          2          2.3          2.3             18.2
        88          3          3.4          3.4             21.6
        89          2          2.3          2.3             23.9
        90          1          1.1          1.1             25.0
        91          3          3.4          3.4             28.4
        92          2          2.3          2.3             30.7
        93          2          2.3          2.3             33.0
        94          1          1.1          1.1             34.1
        95          6          6.8          6.8             40.9
        96          2          2.3          2.3             43.2
        97          1          1.1          1.1             44.3
        98          2          2.3          2.3             46.6
        99          1          1.1          1.1             47.7
        100         3          3.4          3.4             51.1
        101         2          2.3          2.3             53.4
        102         3          3.4          3.4             56.8
        103         2          2.3          2.3             59.1
        104         1          1.1          1.1             60.2
        105         3          3.4          3.4             63.6
        106         4          4.5          4.5             68.2
        107         3          3.4          3.4             71.6
        108         3          3.4          3.4             75.0
        109         3          3.4          3.4             78.4
        110         1          1.1          1.1             79.5
        111         4          4.5          4.5             84.1
        112         1          1.1          1.1             85.2
        114         1          1.1          1.1             86.4
        115         2          2.3          2.3             88.6
        118         3          3.4          3.4             92.0
        120         2          2.3          2.3             94.3
        121         1          1.1          1.1             95.5
        127         1          1.1          1.1             96.6
        128         1          1.1          1.1             97.7
        131         1          1.1          1.1             98.9
        137         1          1.1          1.1            100.0
        Total      88        100.0        100.0
(this table lets you know how many people scored/responded with a certain score [Frequency] and the percentage of total participants who answered that same way)
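Under the hood, a frequency table like this is just a tally of how often each distinct score occurs, plus running percentages. A minimal sketch in Python (using a few made-up scores, not the lab data set):

```python
from collections import Counter

# Hypothetical scores, just to show the computation
scores = [95, 95, 100, 105, 95, 100, 110, 95, 95, 95]

freq = Counter(scores)   # frequency of each distinct score
n = len(scores)

cumulative = 0.0
for value in sorted(freq):
    pct = 100 * freq[value] / n     # Percent column
    cumulative += pct               # Cumulative Percent column
    print(value, freq[value], round(pct, 1), round(cumulative, 1))
```

Each printed row corresponds to one row of the SPSS frequency table: the score, its Frequency, its Percent, and the Cumulative Percent.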
[Histogram of IQ with normal curve overlaid: x-axis IQ (75.0 to 135.0), y-axis Frequency. Std. Dev = 12.98, Mean = 100.3, N = 88.0]
(this is a histogram of the data—it gives you a visual view of the distribution/variability of your data, as well as the central tendencies, including how NORMAL the data are—again, NORMAL data will look somewhat like a bell curve)

OUTLIERS/ANOMALIES/IMPOSSIBLE NUMBERS

So what happens when we find an outlier or “odd” numbers? Try doing the above steps with the “var00001” variable.
What would you do with data that looks like this?

STEP 2: SUMMARIZING/DESCRIBING THE DATA
DESCRIBE CENTRAL TENDENCIES AND DISTRIBUTION OF THE DATA

Summarizing the data is similar to the above step (in fact, you can do them both at the same time!). Click on ANALYZE, then DESCRIPTIVE STATISTICS, and then FREQUENCIES again. From here click on STATISTICS. Now you can choose any of the statistics listed here. I would recommend the ones we talked about in class, such as MEAN, MEDIAN, MODE, RANGE, STANDARD DEVIATION, and STANDARD ERROR OF THE MEAN (S.E. OF THE MEAN). You will get a similar output with the addition of a table that looks like this:
Statistics
IQ
  N   Valid              88
      Missing             0
  Mean                100.26
  Std. Error of Mean    1.384
  Std. Deviation       12.985
  Range                62
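Each of those statistics can be computed by hand. A sketch using Python's standard library (the six data values are hypothetical; the final line checks the SPSS output above by recomputing the standard error of the mean from sd = 12.985 and n = 88):

```python
import math
import statistics

# Hypothetical small data set, just to illustrate the statistics
data = [90, 95, 100, 100, 105, 110]

mean = statistics.mean(data)        # MEAN
median = statistics.median(data)    # MEDIAN
mode = statistics.mode(data)        # MODE (most frequent value)
sd = statistics.stdev(data)         # STANDARD DEVIATION (sample, n-1)
sem = sd / math.sqrt(len(data))     # STANDARD ERROR OF THE MEAN
rng = max(data) - min(data)         # RANGE

# Check against the SPSS output above: with sd = 12.985 and n = 88,
# the standard error of the mean is 12.985 / sqrt(88) ≈ 1.384.
print(round(12.985 / math.sqrt(88), 3))
```

The standard error of the mean is just the standard deviation divided by the square root of the sample size, which is why larger samples give tighter confidence intervals in the next step.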
STEP 3: CONFIRM WHAT THE DATA REVEAL
CONFIDENCE INTERVALS (CI)

CONFIDENCE INTERVALS FOR A SINGLE MEAN

To estimate confidence intervals for a single mean, you can click on GRAPHS then ERROR BAR. Next, click on SIMPLE and SUMMARIES OF SEPARATE VARIABLES. Then move the variables of interest into the box to the right (I would recommend doing this one variable at a time). Then from the BARS REPRESENT dropdown menu select CONFIDENCE INTERVAL FOR THE MEAN. Make sure you have selected 95% for the level (you can choose others, but 95% is the norm). Click OK. You will see a box that looks like this:
[Error bar chart: 95% CI for IQ; N = 88]
(the dot in the middle represents the SAMPLE MEAN and the bars above and below represent the UPPER and LOWER CI) You can also go to ANALYZE, then DESCRIPTIVE STATISTICS, and then EXPLORE. Then move the variable of interest into the DEPENDENT LIST to the right. Make sure you have STATISTICS or BOTH checked for DISPLAY. Then hit OK. You will see a print out like this:
Descriptives
IQ                                          Statistic   Std. Error
  Mean                                       100.26       1.384
  95% Confidence Interval   Lower Bound       97.51
    for Mean                Upper Bound      103.01
  5% Trimmed Mean                             99.78
  Median                                     100.00
  Variance                                   168.609
  Std. Deviation                              12.985
  Minimum                                     75
  Maximum                                    137
  Range                                       62
  Interquartile Range                         18.50
  Skewness                                     .394        .257
  Kurtosis                                    -.163        .508
(this gives us a numerical summary of the CI versus a visual display, as seen above)

CONFIDENCE INTERVAL BETWEEN INDEPENDENT GROUP MEANS

To view the CI of differences on a measure (DV) across different groups (IV), take similar steps as above. Click on GRAPHS then ERROR BARS. Again, click on SIMPLE, but now click on SUMMARIES FOR GROUPS OF CASES—click DEFINE. Then move the variable of interest over into the VARIABLE section and whatever you want to group it by (e.g., year in school, gender, etc.) into the CATEGORY AXIS box. Hit OK. You will see a box similar to this:
[Error bar chart: 95% CI for IQ by DROPOUT group; N = 78 and N = 10]
(this shows the MEAN and CI for the variable [IQ] across the different groups [DROPOUT] as well as the CI for each group) Similar to the example above (CI OF SINGLE MEAN), you can go to ANALYZE, then DESCRIPTIVE STATISTICS, and then EXPLORE. Put the variable of interest in the DEPENDENT LIST and the variable you wish to group it by in the FACTOR LIST. Hit OK. You should have a print out like this:
Descriptives
    DROPOUT                                        Statistic   Std. Error
IQ  0   Mean                                        101.65       1.467
        95% Confidence Interval   Lower Bound        98.73
          for Mean                Upper Bound       104.58
        5% Trimmed Mean                             101.25
        Median                                      102.00
        Variance                                    167.918
        Std. Deviation                               12.958
        Minimum                                      75
        Maximum                                     137
        Range                                        62
        Interquartile Range                          17.50
        Skewness                                      .281        .272
        Kurtosis                                     -.180        .538
    1   Mean                                         89.40       2.130
        95% Confidence Interval   Lower Bound        84.58
          for Mean                Upper Bound        94.22
        5% Trimmed Mean                              89.50
        Median                                       89.50
        Variance                                     45.378
        Std. Deviation                                6.736
        Minimum                                      79
        Maximum                                      98
        Range                                        19
        Interquartile Range                          13.50
        Skewness                                     -.267
        Kurtosis                                    -1.377
(this gives us numeric info on the lower and upper CI for both groups [0 and 1])
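The 95% CIs SPSS prints can be reproduced from just the mean, standard deviation, and n: CI = mean ± t × SEM, where t is the two-tailed .05 critical value for n − 1 degrees of freedom. A sketch using the whole-sample IQ summary from the output above (the critical t ≈ 1.988 for df = 87 is taken from a t table):

```python
import math

# Summary values from the SPSS Descriptives output above
mean = 100.26
sd = 12.985
n = 88

sem = sd / math.sqrt(n)     # standard error of the mean
t_crit = 1.988              # two-tailed .05 critical t for df = 87 (from a t table)

lower = mean - t_crit * sem  # lower bound of the 95% CI
upper = mean + t_crit * sem  # upper bound of the 95% CI

print(round(lower, 2), round(upper, 2))  # 97.51 103.01, matching the SPSS output
```

The same arithmetic, applied group by group with each group's own mean, sd, n, and critical t, reproduces the per-group bounds in the table above.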
PART II: TESTS OF STATISTICAL SIGNIFICANCE AND THE “ANALYSIS STORY”

STEP 3: CONFIRM WHAT THE DATA REVEAL, continued...

Last time we discussed CIs and how they can be used to support our hypotheses. Now we will discuss Null Hypothesis Significance Testing (NHST), the most common approach for data analysis. The goal of NHST is to determine whether mean differences among groups in an experiment are greater than the differences expected simply because of chance (error variation).

First step: assume that the groups do not differ. This is called the Null Hypothesis (H0)—assume the independent variable did not have an effect.

Step 2: probability theory—estimate the likelihood of the observed outcome, assuming H0 is true. This is what we mean by “statistically significant.” Statistical significance is different from scientific significance or practical/clinical significance. (Note: you don’t necessarily have to know the differences between these, but know that something can be statistically significant without being practically significant.) So what does “statistically significant” mean? The outcome has a small likelihood (<5% chance = p < .05) of occurring under H0.

Step 3: run the statistical analyses. If p < .05, you reject H0 and conclude there is an effect of the IV on the DV! So, the difference between means is larger than what would be expected if error variation (random chance) alone caused the outcome.

What do we conclude when a finding is not statistically significant (p > .05)? We do not reject H0—no difference, no effect. BUT, we also don’t accept H0! We don’t conclude that the IV didn’t produce an effect; we cannot make a conclusion about the effect of the IV. Some factor in the experiment may have prevented us from observing an effect of the IV. Most common factor: too few participants!!!!
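The p < .05 decision rule can be illustrated with a quick simulation: when H0 really is true, a test run at the .05 level will still come out “significant” about 5% of the time. A hedged sketch (a simple z-test with known σ, not the SPSS procedure, and made-up simulated data):

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

n, trials = 25, 10000
z_crit = 1.96                  # two-tailed .05 cutoff for a z-test
false_alarms = 0

for _ in range(trials):
    # Draw a sample from N(0, 1): H0 ("no effect") is true by construction
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / n ** 0.5)   # sample mean / SE of the mean
    if abs(z) > z_crit:
        false_alarms += 1      # "significant" result under a true H0

print(false_alarms / trials)   # close to .05
```

That rate of false rejections under a true H0 is exactly the Type I error probability (alpha) discussed next.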
So, no matter what you find, make an argument as to WHY you could not find support for your hypothesis—do not simply say that your hypothesis is incorrect.

Due to the nature of probability testing, it is possible for errors to occur in our findings! Types of errors:

Type I error: the null hypothesis is rejected when it really is true. We observe a statistically significant finding (p < .05), but in truth there is no effect of the IV. The probability of making a Type I error = alpha (α). Setting the level of significance at p < .05 indicates researchers accept a probability of a Type I error of 5% for any given experiment.

Type II error: the null hypothesis is false, but it is not rejected. We claim the effect of the IV is not statistically significant (p > .05), but in truth there is an effect of the IV—the experiment missed the effect.

Because of the possibility of Type I and Type II errors, researchers are always tentative about their claims. Use words such as “the findings support the hypothesis” or “the findings are consistent with the hypothesis.” Never say the hypothesis was proven!!!

There is important info about statistical power and sensitivity in the chapter that you should be familiar with, but in the interest of getting through the material, I am leaving you to cover and be familiar with that information on your own.

TESTS OF STATISTICAL SIGNIFICANCE

INDEPENDENT GROUPS/SAMPLES T-TEST

Use this test if you’re trying to demonstrate that 2 groups are different on some measure/variable. Let’s see an example from our previous data set. Remember the example above where we examined the CI for IQ across 2 groups—HS dropout versus non-dropout? Let’s now do an INDEPENDENT SAMPLES T-TEST to determine whether this difference is STATISTICALLY SIGNIFICANT. Go to ANALYZE, then COMPARE MEANS, then INDEPENDENT SAMPLES T-TEST. Then put the variable of interest in the TEST VARIABLES section and the group variable in the GROUPING VARIABLE section. Now you have to DEFINE the GROUPING VARIABLE (you can define it in whatever numbers/terms you want, just make sure it makes sense to you).
Then click on OK. You will get a print out that looks like this:
Group Statistics
     DROPOUT    N      Mean     Std. Deviation   Std. Error Mean
IQ   0          78    101.65       12.958             1.467
     1          10     89.40        6.736             2.130
Independent Samples Test
                                  Levene's Test for
                                  Equality of Variances   t-test for Equality of Means
                                     F      Sig.       t       df     Sig.       Mean        Std. Error   95% Confidence Interval
                                                                    (2-tailed)  Difference  Difference    of the Difference
                                                                                                          Lower      Upper
IQ   Equal variances assumed                                          .004       12.25        4.183        3.938     20.569
     Equal variances not assumed                     4.737   19.064   .000
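Both rows of that table can be recomputed from the Group Statistics alone. A sketch of the pooled (EQUAL VARIANCES ASSUMED) and Welch (EQUAL VARIANCES NOT ASSUMED) t statistics, using the group means, SDs, and ns from the output above:

```python
import math

# Group summaries from the SPSS Group Statistics table above
n1, m1, s1 = 78, 101.65, 12.958   # non-dropouts
n2, m2, s2 = 10, 89.40, 6.736     # dropouts

diff = m1 - m2                    # mean difference: 12.25

# Equal variances assumed: pool the two variances, weighted by df
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # the 4.183 in the table
t_pooled = diff / se_pooled                      # compared against df = n1 + n2 - 2

# Equal variances not assumed (Welch): keep the variances separate
se_welch = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_welch = diff / se_welch   # ≈ the 4.737 in the table (tiny gap from rounded inputs)

print(round(se_pooled, 3), round(t_welch, 3))
```

Notice the Welch t is much larger here because the small dropout group also has a much smaller variance, which is exactly the situation Levene's test is meant to flag.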
Note the Levene’s Test for Equality of Variances—this tests whether the variances (standard deviation/standard error of the mean) for the 2 groups are equal (we want them to be equal). If this test is NOT SIGNIFICANT (p > .05—which is what we want), then we look at the top row—EQUAL VARIANCES ASSUMED. If this test IS SIGNIFICANT (p < .05), we use the lower row—EQUAL VARIANCES NOT ASSUMED. What would we do in this case?

Next we look at the t value, and whether it’s significant. In this case, the t value is significant, so we would reject the null hypothesis—we have support that there are differences between the groups.

ANALYSIS OF VARIANCE (ANOVA)

Used if we are trying to determine whether 3 or more groups differ from each other on some variable/measure. I have created a new variable in the data set called ADDLVL. This is a variable that groups participants into 3 groups based on how many ADHD-related problems the participants had (1 = lower, 2 = middle, 3 = higher). I want to see if these 3 groups differ on their IQ. Go to ANALYZE, then COMPARE MEANS, and then ONE-WAY ANOVA. Put the variable of interest in the DEPENDENT LIST (in this case, IQ) and the grouping variable in the FACTOR section. You don’t have to make any other changes yet (although I would go to OPTIONS and click on DESCRIPTIVES... it can never hurt). You should see an output something like this:
Descriptives
IQ
         N      Mean    Std. Deviation  Std. Error    95% Confidence            Minimum   Maximum
                                                      Interval for Mean
                                                      Lower Bound  Upper Bound
1.00     24    113.71      10.407          2.124        109.31       118.10        90       137
2.00     51     96.29      10.356          1.450         93.38        99.21        75       120
3.00     13     91.00       6.819          1.891         86.88        95.12        79       102
Total    88    100.26      12.985          1.384         97.51       103.01        75       137

ANOVA
IQ
                  Sum of Squares    df    Mean Square
Between Groups        6257.442       2      3128.721
Within Groups         8411.547      85        98.959
Total                14668.989      87
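The F in that ANOVA table is just the ratio of the two mean squares, each a sum of squares divided by its degrees of freedom. Recomputing from the values in the output above:

```python
# Values from the SPSS ANOVA table above
ss_between, ss_within = 6257.442, 8411.547
k, n = 3, 88                     # 3 ADDLVL groups, 88 participants

df_between = k - 1               # degrees of freedom between groups
df_within = n - k                # degrees of freedom within groups

ms_between = ss_between / df_between   # 3128.721
ms_within = ss_within / df_within      # 98.959 (rounded)

f = ms_between / ms_within
print(round(f, 3))               # 31.616, as reported in the output
```

A large F means the variability between the group means is much bigger than the variability within groups, which is what makes the test significant here.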
Here, we can see that the ANOVA (F-test) is significant (F = 31.616, p = .000). This tells us that at least 2 of our groups are significantly different from each other. BUT WHICH ONES? To figure this out, we need to run POST HOC analyses. POST HOC basically means “after this,” or in this case “after this test.” POST HOC analyses should ONLY be run if the ANOVA/F-TEST was significant at the first step (like above). If it was not, there is no point in running POST HOCs. There are many different types of POST HOCs that use different assumptions and meet certain requirements. For this course, I will not make you figure out the many different ways in which POST HOCs differ. The most common POST HOC is Tukey’s, so feel free to use this POST HOC should you need it for your analysis. You calculate POST HOCs similarly to running a regular ANOVA. Like before, go to ANALYZE, then COMPARE MEANS, and ONE-WAY ANOVA. Now click on POST HOC. From here, click the desired test (Tukey’s in this case). Then hit OK. You should have a new table that looks like this:
Multiple Comparisons
Dependent Variable: IQ
Tukey HSD
(I) addlvl   (J) addlvl   Mean Difference (I-J)   Std. Error   Sig.    95% Confidence Interval
                                                                       Lower Bound   Upper Bound
1.00         2.00                17.41*              2.462      .000      11.54          23.29
             3.00                22.71*              3.426      .000      14.54          30.88
2.00         1.00               -17.41*              2.462      .000     -23.29         -11.54
             3.00                 5.29               3.091      .206      -2.08          12.67
3.00         1.00               -22.71*              3.426      .000     -30.88         -14.54
             2.00                -5.29               3.091      .206     -12.67           2.08
* The mean difference is significant at the .05 level.
From here, we can see that GROUP 1 is significantly different from GROUP 2 and GROUP 3. We can also see that GROUP 2 and GROUP 3 are NOT significantly different from each other. Look at the MEAN DIFFERENCE (I-J) as well as the DESCRIPTIVES (i.e., the MEANs) from before to determine HOW they differ. In this case, GROUP 1 had higher IQ than GROUP 2 and GROUP 3.

CORRELATION

Next we will run a correlation test. Correlations are used to see whether or not 2 or more variables are related. So, you can use a correlation to determine the relation between how a person scores on 2 items/variables/measures. By using this procedure, you can also use the correlation/relation to predict one variable from another. We’ll talk more about this in class.

How to run a correlation in SPSS: go to ANALYZE, then CORRELATE, then BIVARIATE. From here, place the variables you are interested in into the VARIABLES column to the right. For CORRELATION COEFFICIENTS, click on PEARSON (this gives you your r value—remember this value from statistics?). For TEST OF SIGNIFICANCE, click on TWO-TAILED (we’ll talk about the difference between one-tailed and two-tailed later). Make sure FLAG SIGNIFICANT CORRELATIONS is checked. You can also click on OPTIONS and then MEANS AND STANDARD DEVIATIONS. Hit OK and you should have a print out that looks like this:
Correlations
                            IQ       GPA
IQ    Pearson Correlation   1        .497**
      Sig. (2-tailed)       .        .000
      N                     88       88
GPA   Pearson Correlation   .497**   1
      Sig. (2-tailed)       .000     .
      N                     88       88
** Correlation is significant at the 0.01 level (2-tailed).
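The Pearson r in that table is the covariance of the two variables divided by the product of their standard deviations. A minimal sketch (with a few made-up score pairs, not the lab data set):

```python
import math

# Hypothetical (x, y) pairs, e.g., hours studied and exam score
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 55, 70, 78]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Covariance over the product of the standard deviations
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))

r = cov / (sx * sy)
print(round(r, 3))   # positive here: as x increases, y tends to increase
```

Because r is standardized this way, it always lands between -1 and +1, which is what makes the direction/strength interpretation below possible.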
In a correlation table, what you want to look for is the PEARSON value (the r value) and the SIG row. In the table, you see that SPSS tests how IQ and GPA are related, but also how IQ is related to IQ and how GPA is related to GPA (see above). When a variable is correlated with itself, the PEARSON value will be 1 (a perfect correlation); you can ignore those scores. What you’re interested in is how IQ is related to GPA. In this case, the PEARSON value between these is .497 and the sig (p) is .000, so the correlation between these 2 variables IS SIGNIFICANT. The PEARSON/r value tells us the DIRECTION and the STRENGTH of the relationship. You might remember from statistics that a correlation can be NEGATIVE or POSITIVE.
A POSITIVE correlation means that both variables move in the SAME DIRECTION—as one variable INCREASES, the other INCREASES; or as one variable DECREASES, the other DECREASES. An example of this: as “time spent studying for an exam” INCREASES, “exam score” INCREASES; alternatively, as “time spent studying for the midterm” DECREASES, “exam score” DECREASES. If the correlation is NEGATIVE or INVERSE, that means the variables move in OPPOSITE DIRECTIONS—as one variable INCREASES, the other DECREASES. One example of this might be: as “stress” INCREASES, school performance might DECREASE. On the SPSS print out, you can determine whether the correlation is POSITIVE or NEGATIVE by looking at the PEARSON value—is it a POSITIVE or NEGATIVE number? In this case, it is POSITIVE, meaning as IQ increases, GPA increases.

The other thing to look for in a correlation is the STRENGTH of the relationship. A correlation score (this is our PEARSON or r value) can range anywhere between -1 and +1. Here is an example distribution of possible correlation scores:

-1   -.75   -.50   -.25   0   .25   .50   .75   1
The closer the correlation is to 1 or -1, the STRONGER it is—the variables are more closely related and it will be easier to find a significant relationship. The closer they are to 0, the WEAKER the relationship is, and it may be harder to find a significant relationship. You can also run a correlation between 3 or more variables. You would follow the above procedures (ANALYZE, CORRELATE, BIVARIATE), but now you would just add additional variables. Make sure all the same things are clicked as noted above and hit OK. You will now get a print out that looks like this:
Correlations
                              IQ        GPA       ADDSC
IQ      Pearson Correlation   1         .497**    -.632**
        Sig. (2-tailed)       .         .000       .000
        N                     88        88         88
GPA     Pearson Correlation   .497**    1         -.615**
        Sig. (2-tailed)       .000      .          .000
        N                     88        88         88
ADDSC   Pearson Correlation   -.632**   -.615**   1
        Sig. (2-tailed)       .000      .000       .
        N                     88        88         88
** Correlation is significant at the 0.01 level (2-tailed).
Now you will notice the table has gotten bigger: the test examines the correlation between each pair of variables separately (e.g., IQ—GPA, IQ—ADDSC, GPA—IQ, GPA—ADDSC, ADDSC—IQ, ADDSC—GPA). You would read the table the same way as above, focusing on the PEARSON/r value and the Sig level. What can we determine from adding the new variable into this correlation?

We’ll now talk a little bit about how to COMPUTE and TRANSFORM data and variables in SPSS. I will not include this info on BlackBoard—this material will not be included on the exam, but it might be helpful for the final project.