Lesson 1: SPSS Windows and Files
Objectives
1. Launch SPSS for Windows.
2. Examine SPSS windows and file types.

Overview
In a typical SPSS session, you are likely to work with two or more SPSS windows and to save the
contents of one or more windows to separate files. The window containing your data is the SPSS Data
Editor. If you plan to use the data file again, you may click on File, Save from within the Data Editor
and give the file a descriptive name. SPSS will supply the .sav extension, indicating that the saved
information is in the form of a data file. An SPSS data file includes both the data records and their
structure. The window containing the results of the SPSS procedures you have performed is the SPSS
Viewer. You may find it convenient to save this as an output file. It is okay to use the same name you
used for your data because SPSS will supply the .spo extension to indicate that the saved file is an
output file. As you run various procedures, you may also choose to show the SPSS syntax for these
commands in a syntax window, and save the syntax in a separate .sps file. It is possible to run SPSS
commands directly from syntax, though in this series of tutorials we will focus our attention on SPSS
data and output files and use the point-and-click method to enter the necessary commands.
Launching SPSS
SPSS for Windows is launched from the Windows desktop. There are several ways to access the
program, and the one you use will be based on the way your particular computer is configured. There
may be an SPSS for Windows shortcut on the desktop or in your Start menu. Or you may have to
click Start, All Programs to find the SPSS for Windows folder. In that folder, you will find the SPSS
for Windows program icon.
Once you have located it, click on the SPSS for Windows icon with the left mouse button to launch
SPSS. When you start the program, you will be given a blank dataset and a set of options for running
the SPSS tutorial, typing in data, running an existing query, creating a new query, or opening an
existing data source (see Figure 1-1). For now, just click on Cancel to reveal the blank dataset in the
Data Editor screen.

Figure 1-1 SPSS opening screen

The SPSS Data Editor
Examine the SPSS Data Editor's Data View shown in Figure 1-2 below. You will learn in Lesson 2 how
to create an effective data structure within the Variable View and how to enter and manipulate data using
the Data Editor. As indicated above, if you click File, Save while in the Data Editor view, you can save
the data along with their structure as a separate file with the .sav extension. The Data Editor provides
the Data View as shown below, and also a separate Variable View. You can switch between these views
by clicking on the tabs at the bottom of the worksheet-like interface.

Figure 1-2 SPSS Data Editor (Data View)

The SPSS Viewer
The SPSS Viewer is opened automatically to show the output when you run SPSS commands. Assume
for example that you wanted to find the average age of 20 students in a class. We will examine the
commands needed to calculate descriptive statistics in Lesson 3, but for now, simply examine the
SPSS Viewer window (see Figure 1-3). When you click File, Save in this view, you can save the output
to a file with the .spo extension.

Figure 1-3 SPSS Viewer

Syntax Editor Window
Finally, you can view and save SPSS syntax commands from the Syntax Editor window. When you are selecting commands, you will see a Paste button. Clicking that button pastes the syntax for the commands you have chosen into the Syntax Editor. Though we will not address SPSS syntax except in passing in these tutorials, you should note that you can run commands directly from the Syntax Editor and save your syntax (.sps) files for future reference. For example, the syntax to calculate the mean age shown above is shown in Figure 1-4.

Figure 1-4 SPSS Syntax Editor
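Figure 1-4 is not reproduced here, but pasted syntax of this kind typically looks like the following minimal sketch (assuming the age variable is simply named Age):

* A minimal Descriptives command requesting the mean of Age.
DESCRIPTIVES VARIABLES=Age
  /STATISTICS=MEAN.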

Unlike earlier versions of SPSS, version 15, the version illustrated in these tutorials, automatically presents in the SPSS Viewer the syntax version of the commands you give it when you point and click in the Data Editor or the SPSS Viewer (examine Figure 1-3 for an example). Now that you know the kinds of windows and files involved in an SPSS session, you are ready to learn how to enter, structure, and manipulate data. Those are the subjects of Lesson 2.

Lesson 2: Entering and Working with Data
Objectives
1. Create a data file and data structure.
2. Compute a new variable.
3. Select cases.
4. Sort cases.
5. Split a file.

Overview
Data can be entered directly into the SPSS Data Editor or imported from a variety of file types. In this lesson you will learn how to build an SPSS data file from scratch, how to calculate a new variable, how to select and sort cases, and how to split a file into separate layers. We will assume in this lesson that you will type data directly into the SPSS Data Editor to create a new data file. You should realize that you can also read data from many other programs, or copy and paste data from worksheets and tables to create new data files. It is always important to check data entries carefully and ensure that the data are accurate.

Creating a Data File
A common first step in working with SPSS is to create or open a data file. Launch SPSS, as we discussed in Lesson 1. You will be given various options. Select Type in Data or Cancel. You should now see a screen similar to the following, which is a blank dataset in the Data View of the SPSS Data Editor (see Figure 2-1):

Figure 2-1 SPSS Data Editor - Data View

Key Point: One Row Per Participant, One Column per Variable
It is important to note that each row in the SPSS data table should be assigned to a single participant, subject, or case, and that no case's data should appear on different rows. When there are multiple measures for a case, each measure should appear in a separate column (called a "variable" by SPSS). So if you were looking at the scores for five quizzes for each of 20 students, the data for each student would occupy a single row (line) in the data table, and the score for each quiz would occupy a separate column. If you use a coding variable to indicate which group or condition was assigned to a case, that variable should also appear in a separate column. Although SPSS automatically numbers the rows of the data table, it is a very good habit to provide a separate participant (or subject) number column so that records can be easily sorted, filtered, or selected. Best practice also requires setting up the data structure for the data. For this purpose, we will switch to the Variable View of the Data Editor by clicking on the Variable View tab at the bottom of the Data Editor window. See Figure 2-2.

Figure 2-2 SPSS Data Editor - Variable View

Example Data
Let us establish the data structure for our example of five quizzes and 20 students. We will assume that we also know the age and the sex of each student. The hypothetical data are shown below:

Student Sex Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5
1  0 18 83 87 81 80 69
2  0 19 76 89 61 85 75
3  0 17 85 86 65 64 81
4  0 20 92 73 76 88 64
5  1 23 82 75 96 87 78
6  1 18 88 73 76 91 81
7  0 21 89 71 61 70 75
8  1 20 89 70 87 76 88
9  1 23 92 85 95 89 62
10 1 21 86 83 77 64 63
11 1 23 90 71 91 86 87
12 0 18 84 71 67 62 70
13 0 21 83 80 89 60 60
14 0 17 79 77 82 63 74
15 0 19 89 80 64 94 78
16 1 20 76 85 65 92 82
17 1 19 92 76 76 74 91
18 1 22 75 90 78 70 76
19 1 22 87 87 63 73 64
20 0 20 75 74 63 91 87

Specifying the Data Structure
Switch to the Variable View by clicking on the Variable View tab (see Figure 2-2 above). The numbers at the left of the window now refer to variables rather than participants. Note that you can specify the variable Name, the Type of variable, the variable Width (in total characters or digits), the number of Decimals, a descriptive Label, labels for different Values, how to deal with Missing Values, the display Column width, how to Align the variable in the display, and whether the Measure is nominal, ordinal, or scale (interval and ratio). In many cases you can simply accept the defaults by leaving the entries blank. But you will definitely want to enter a variable Name and Label, and also specify Value labels for the levels of categorical or grouping variables such as sex or the levels of an independent variable. The variable names should be short and should not contain spaces or special characters other than perhaps underscores. Variable labels, on the other hand, can be longer and can contain spaces and special characters. Although we could enter "F" for female and "M" for male, most statistical procedures are easier to perform if a number is used to code such categorical variables. Let us assign the number "1" to females and the number "0" to males. Let us specify the structure of our dataset by naming the variables as follows, along with a descriptive label:
1. Student
2. Sex
3. Age
4. Quiz1
5. Quiz2
6. Quiz3
7. Quiz4
8. Quiz5

No decimals appear in our raw data, so we will set the number of decimals to zero. After we enter the desired information, the completed data structure might appear as follows:

Figure 2-3 SPSS data structure (Variable View)

Notice that we provided value labels for Sex, so we won't confuse our 1's and 0's later. To do this, click on Values in the Sex variable row and enter the appropriate labels for males and females (see Figure 2-4).

Figure 2-4 Adding value labels

After entering the value and label for one sex, click on Add and then repeat the process for the other sex. Then click OK.

Entering the Data
Now return to the data view (click on the Data View tab), and type in the data. Save the data file with a name that will help you remember it. Remember that SPSS will provide the .sav extension for a data file. In this case, we used lesson_2.sav as the file name. If you prefer, you may retrieve a copy of the data file by clicking here. The data should appear as follows:

Figure 2-5 Completed data entry

Computing a New Variable
Now we will compute a new variable by averaging the five quiz scores for each student. Let us call the new variable Quiz_Avg and use SPSS's built-in function called MEAN to compute it. Select Transform, then Compute. The Compute Variable dialog box appears. You may type in the new variable name, specify the type and provide a label, and enter the formula for computing the new variable. In this case, we will use the formula:

Quiz_Avg = MEAN(Quiz1, Quiz2, Quiz3, Quiz4, Quiz5)

You can enter the formula by selecting MEAN from the Functions window and then clicking on the variable names, or you can simply type in the formula, separating the variable names by commas. The question marks indicate that you must supply expressions for the computation. When we compute this new variable, it will be added to our variable list, and a new column will be created for it. The initial Compute Variable dialog box with the target variable named Quiz_Avg and the MEAN function selected is below.

Figure 2-6 Compute Variable screen

The appropriate formula is as follows:

Figure 2-7 Completed expression

When you click OK, the new variable appears in both the data and variable views (see below). As discussed earlier, you can change the number of decimals (numerical variables default to two decimals) and add a descriptive label for the new variable.
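If you click Paste rather than OK, SPSS records the equivalent command in a syntax window. It should look roughly like this sketch, using the variable names defined above:

* Average the five quiz scores into a new variable.
COMPUTE Quiz_Avg = MEAN(Quiz1, Quiz2, Quiz3, Quiz4, Quiz5).
EXECUTE.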

Figure 2-8 New variable appears in Data View

Figure 2-9 New variable appears in Variable View

Selecting Cases
You may want to select only certain cases, such as the data for females or for individuals with ages lower than 20 years, or to re-select all cases if data were previously filtered. SPSS allows you to select cases either by filtering (which keeps all the cases but limits further analyses to the selected cases) or by removing the cases that do not meet your criteria. Usually, you will want to filter cases, but sometimes, you may want to create separate files for additional analyses by deleting records that do not match your selection criteria. We will select records for females and filter those records so that the records for males remain but will be excluded from analyses until we select them again. From either the variable view or the data view, click on Data, then click on Select Cases. The resulting dialog box allows you to select the desired cases for further analysis. Let us choose "If condition is satisfied," and specify that we want to select only records for which the sex of the participant is female. See the dialog box in the following figure.

Figure 2-10 Select Cases dialog

Click the "If..." button and enter the condition for selection. In this case we will enter the expression Sex = 1. You can type this in directly, or you can point and click to the entries in the dialog box.

Figure 2-11 Select Cases expression

Click Continue, then click OK, and then examine the data view (see Figure 2-12). Records for males will now have a diagonal line through the row number label, indicating that though still present, these records are excluded from further analyses.

Figure 2-12 Selected and filtered data

Also notice that a new variable called Filter_$ has been automatically added to your data file. If you return to the Data menu and select all the cases again, you can use this filter variable to select females instead of having to re-enter the selection formula. If you do not want to keep this new variable, you can right-click on its column label and select Clear.
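For reference, the syntax SPSS pastes for a filtered selection of this kind is roughly the following (filter_$ is the name SPSS generates for the filter variable):

* Select females (Sex = 1) by filtering rather than deleting cases.
USE ALL.
COMPUTE filter_$ = (Sex = 1).
FILTER BY filter_$.
EXECUTE.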

Figure 2-13 Filter variable added by SPSS

Sorting Cases
Next you will learn to sort cases. Let's return to the Data, Select Cases menu and choose "Select all cases" in order to re-select the records for males. We can sort on one or more variables. For example, we may want to sort the records in our dataset by age and sex. Select Data, Sort Cases:

Figure 2-15 Sort Cases dialog Return to the Data View and confirm that the data are sorted by sex and by age within sex (see Figure 2-16).Figure 2-14 Sort Cases option Move Sex and Age to the "Sort by" window (see Figure 2-15) and then click OK. .

Figure 2-16 Cases sorted by Sex and Age

Splitting a File
The last subject we will cover in this tutorial is splitting a file. For example, instead of selecting only one sex at a time, you may want to run several analyses separately for males and females. One convenient way to accomplish that is to split the file so that every procedure you run will be automatically conducted and reported for the two groups separately. Instead of filtering cases, splitting a file creates separate "layers" for the grouping variables. The data in a group need to be consecutive cases in the dataset, so the records must be sorted by groups. However, if your data are not already sorted, SPSS can do that for you at the same time the file is split (see Figure 2-17). To split a file, select Data, Split File.
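In syntax form, the split (with the preliminary sort) looks approximately like this sketch:

* Sort by the grouping variable, then report each group separately.
SORT CASES BY Sex.
SPLIT FILE LAYERED BY Sex.
* When finished, restore normal processing with: SPLIT FILE OFF.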

Figure 2-17 Split File menu

Now, when you run a command, such as a table command to summarize average quiz scores, the command will be performed for each group separately and those results will be reported in the same output (see Figure 2-18).

Figure 2-18 Split file results in separate analysis for each group

Lesson 3: Descriptive Statistics and Graphs
Objectives
1. Compute descriptive statistics.
2. Compare means for different groups.
3. Display frequency distributions and histograms.
4. Display boxplots.

Overview
In this lesson, you will learn how to produce various descriptive statistics, simple frequency distribution tables, and frequency histograms. You will also learn how to explore your data and create boxplots.

Example
Let us return to our example of 20 students and five quizzes. Open the SPSS data file you saved in Lesson 2, or click here for lesson_3.sav. Remember that we previously calculated the average quiz score for each person and included that as a new variable in our data file. We would like to calculate the average score (mean) and standard deviation for each quiz, all quizzes, and the average quiz score. We will also look at the mean scores for men and women on each quiz. To calculate the means and standard deviations for age, all quizzes, and the average quiz score, select Analyze, then Descriptive Statistics, and then Descriptives as shown in the following screenshot (see Figure 3-1).

Figure 3-1 Accessing the Descriptives Procedure

Move the desired variables into the variables window (see Figure 3-2) and then click OK.

Figure 3-2 Move the desired variables into the variables window.

In the resulting dialog box, make sure you check (at a minimum) the boxes in front of Mean and Std.
deviation:

Figure 3-3 Descriptives options

The resulting output table showing the means and standard deviations of the variables is opened in
the SPSS Viewer (see Figure 3-4).

Figure 3-4 Output from Descriptives Procedure
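The pasted syntax for this Descriptives run would resemble the following sketch, using the variable names from our file:

* Means and standard deviations for age, each quiz, and the quiz average.
DESCRIPTIVES VARIABLES=Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5 Quiz_Avg
  /STATISTICS=MEAN STDDEV.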

Exploring Means for Different Groups
When you have two or more groups, you may want to examine the means for each group as well as
the overall mean. The SPSS Compare Means procedure provides this functionality and much more,
including various hypothesis tests. Assume that you want to compare the means of men and women
on age, the five quizzes, and the average quiz score. Select Analyze, Compare Means, Means (see
Figure 3-5):

Figure 3-5 Selecting Means Procedure

In the resulting dialog box, move the variables you are interested in summarizing
into the Dependent List. At this point, do not worry whether your variables are actual "dependent
variables" or not. Move Sex to the Independent List (see Figure 3-6). Click on Options to see the
many summary statistics available. In the current case, make sure that Mean, Number of Cases, and
Standard Deviation are selected.

Figure 3-6 Means dialog box
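Clicking Paste instead of OK here would produce something close to the following sketch:

* Group means, counts, and standard deviations by sex.
MEANS TABLES=Age Quiz1 Quiz2 Quiz3 Quiz4 Quiz5 Quiz_Avg BY Sex
  /CELLS=MEAN COUNT STDDEV.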

When you click OK, the report table appears in the SPSS Viewer with the separate means for the two sexes along with the overall data. From Lesson 2 you may recall that splitting the file would allow you to calculate the descriptive statistics separately for males and females.

Figure 3-7 Report from Means procedure

As this lesson makes clear, there are several ways to produce summary statistics such as means and standard deviations in SPSS. The way to find the procedure that works best in a given situation is to try different ones, and always to explore the options presented in the SPSS menus and dialog boxes. The extensive SPSS help files and tutorials are also very useful.

Frequency Distributions and Histograms
SPSS provides several different ways to explore, summarize, and present data in graphic form. For many procedures, graphs and plots are available as output options. SPSS also has an extensive interactive chart gallery and a chart builder that can be accessed through the Graphs menu. We will look at only a few of these features, and the interested reader is encouraged to explore the many additional charting and graphing features of SPSS. One very useful feature of the Frequencies procedure in SPSS is that it can produce simple frequency tables and histograms. You may optionally choose to have the normal curve superimposed on the histogram for a visual check as to how the data are distributed. Let us examine the distribution of ages of our 20 hypothetical students. Select Analyze, Descriptive Statistics, Frequencies, as shown in the following figure (see Figure 3-8).

Figure 3-8 Selecting Frequencies procedure

In the Frequencies dialog, move Age to the variables window, and then click on Charts. Select Histograms and check the box in front of With normal curve (see Figure 3-9).

Figure 3-9 Frequencies: Charts dialog

Click Continue and OK. In the resulting output, SPSS displays the simple frequency table for age and the frequency histogram with the normal curve (see Figures 3-10 and 3-11).
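The pasted form of this Frequencies request is roughly:

* Frequency table for age and a histogram with the normal curve.
FREQUENCIES VARIABLES=Age
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.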

Figure 3-10 Simple frequency table

Figure 3-11 Frequency histogram with normal curve

Exploratory Data Analysis
In addition to the standard descriptive statistics and frequency distributions and graphs, SPSS also provides many graphical and semi-graphical techniques collectively referred to as exploratory data analysis (EDA). EDA is useful for describing the characteristics of a dataset, identifying outliers, and providing summary descriptions. Some of the most widely-used EDA techniques are boxplots and stem-and-leaf displays. You can access these techniques through the commands found through Analyze, Descriptive Statistics, Explore. As with the Compare Means procedure, groups can be separated if desired. For example, a side-by-side boxplot comparing the average quiz grades of men and women is shown in Figure 3-12.

Figure 3-12 Boxplots
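A sketch of the Explore syntax behind such a boxplot, using the variables from lesson_3.sav:

* Side-by-side boxplots of the quiz average for each sex.
EXAMINE VARIABLES=Quiz_Avg BY Sex
  /PLOT=BOXPLOT
  /STATISTICS=DESCRIPTIVES
  /NOTOTAL.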

Lesson 4: Independent-Samples t Test
Objectives
1. Conduct an independent-samples t test.
2. Interpret the output of the t test.

Overview
The independent-samples or between-groups t test is used to examine the effects of one independent variable on one dependent variable and is restricted to comparisons of two conditions or groups (two levels of the independent variable). In this lesson, we will describe how to analyze the results of a between-groups design. Lesson 5 covers the paired-samples or within-subjects t test. The reader should note that SPSS incorrectly labels this test a "T test" rather than a t test, but is inconsistent in that labeling, as some of the SPSS output also refers to t-test results.

A between-groups design is one in which participants have been randomly assigned to the two levels of the independent variable. For example, suppose that you are interested in studying the effects of caffeine consumption on task performance. If you randomly assign some participants to the caffeine group and other participants to the no-caffeine group, then you are using a between-groups design. In this design, each participant is assigned to only one group, and consequently, the two groups are independent of one another. In a within-subjects design, by contrast, all participants would be tested once with caffeine and once without caffeine.

An Example: Parental Involvement Experiment
Assume that you studied the effects of parental involvement (independent variable) on students' grades (dependent variable). Half of the students in a third grade class were randomly assigned to the parental involvement group. The teacher contacted the parents of these children throughout the year and told them about the educational objectives of the class. Further, the teacher gave the parents specific methods for encouraging their children's educational activities. The other half of the students in the class were assigned to the no-parental involvement group. The scores on the first test were tabulated for all of the children, and these are presented below:

Student Involve Test1
1  1 78.0
2  1 64.4
3  1 100.7
4  1 83.8
5  1 94.5
6  1 78.3
7  1 76.5
8  1 82.9
9  0 81.7
10 0 69.2
11 0 73.6
12 0 66.9
13 0 54.0
14 0 69.0
15 0 73.0
16 0 79.8

Creating Your Data File: Key Point
When creating a data file for an independent-samples t test in SPSS, you must also create a separate column for the grouping variable that shows to which condition or group a particular participant belongs. In this case, that is the parental involvement condition, so you should create a numeric code that allows SPSS to identify the parental involvement condition for that particular score. If this concept is difficult to grasp, you may want to revisit Lesson 2, in which a grouping variable is created for male and female students. So, the variable view of your SPSS data file should look like the one below, with three variables--one for student number, one for parental involvement condition (using for example a code of "1" for involvement and "0" for no involvement), and one column for the score on Test 1. It is a good idea to create a variable Label for each variable and Value labels for the grouping variable(s). These labels make it easier to interpret the output of your statistical procedures. The variable view of the data file might look similar to the one below.

Figure 4-1 Variable View

The data view of the file should look like the following:

Figure 4-2 Data View

Note that in this particular case the two groups are separated in the data file, with the first half of the data corresponding to the parental involvement condition and the second half corresponding to the no-involvement condition. Although this makes for an orderly data table, such ordering is NOT required in SPSS for the independent-samples t test. When performing the test, you must specify which condition a participant is in by use of a grouping variable as indicated above, whether or not the data are sorted by the independent variable.

Performing the t test for the Parental Involvement Experiment
You should enter the data as described above. Or you may access the SPSS data file for the parental involvement experiment by clicking here. To perform the t test, complete the following steps in order. Click on Analyze, then Compare Means, then Independent Samples T Test.

Figure 4-3 Select Analyze, Compare Means, Independent-Samples T Test

Now, move the dependent variable (in this case, labeled "Score on Test 1 [Test1]") into the Test Variable window. Then move your independent variable (in this case, "Parental Involvement [Involve]") into the Grouping Variable window. Remember that Grouping Variable stands for the levels of the independent variable.

Figure 4-4 Independent-Samples T Test dialog box

You will notice that there are question marks in the parentheses following your independent variable in the Grouping Variable field. This is because you need to define the particular groups that you want to compare. To do so, click on Define Groups, and indicate the numeric values that each group represents. In this case, you will want to put a "0" in the field labeled Group 1 and a "1" in the field labeled Group 2. Once you have done this, click on Continue. Now click on OK to run the t test. You may also want to click on Paste in order to save the SPSS syntax of what you have done (see Figure 4-5) in case you desire to run the same kind of test from SPSS syntax.

Figure 4-5 Syntax for the independent-samples t test
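The syntax shown in Figure 4-5 should resemble this sketch:

* Independent-samples t test on Test1 with Involve as the grouping variable.
T-TEST GROUPS=Involve(0 1)
  /VARIABLES=Test1
  /CRITERIA=CI(.95).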

Output from the t test Procedure
As you can see below, the output from an independent-samples t test procedure is relatively straightforward. In the SPSS output, the first table lists the number of participants (N), mean, standard deviation, and standard error of the mean for both of your groups. Notice that the value labels are printed as well as the variable labels for your variables, making it easier to interpret the output. The second table (see Figure 4-6) presents you with an F test (Levene's test for equality of variances) that evaluates the basic assumption of the t test that the variances of the two groups are approximately equal (homogeneity of variance or homoscedasticity).

Figure 4-6 Independent-samples t test output

Interpreting the Output
If the F value reported here is very high and the significance level is very low--usually lower than .05 or .01--then the assumption of homogeneity of variance has been violated. In that case, you should use the t test in the lower half of the table, whereas if you have not violated the homogeneity assumption, you should use the t test in the upper half of the table. The t-test formula for unequal variances makes an adjustment to the degrees of freedom, so this value is often fractional. In this particular case, you can see that we have not violated the homogeneity assumption, and we should report the value of t as 2.356, degrees of freedom of 14, and the significance level of .034. Thus, our data show that parental involvement has a significant effect on grades, t(14) = 2.356, p = .034.

Lesson 5: Paired-Samples t Test
Objectives
1. Conduct a paired-samples t test.
2. Interpret the output of the paired-samples t test.

Overview
The paired-samples or dependent t test is used for within-subjects or matched-pairs designs in which observations in the groups are linked. The linkage could be based on repeated measures, natural pairings such as mothers and daughters, or pairings created by the experimenter. In any of these cases, the analysis is the same. The dependency between the two observations is taken into account, and each set of observations serves as its own control, making this a generally more powerful test than the independent-samples t test. Because of the dependency, the degrees of freedom for the paired-samples t test are based on the number of pairs rather than the number of observations.

Example
Imagine that you conducted an experiment to test the effects of the presence of others (independent variable) on problem-solving performance (dependent variable). Assume further that you used a within-subjects design; that is, each participant was tested alone and in the presence of others on different days using comparable tasks. Higher scores indicate better problem-solving performance. The data appear below:

Participant Alone Others
1  12 10
2   8  6
3   4  5
4   6  5
5  12 10
6   6  5
7  11  7
8   5  3
9   7  6
10 12  7
11  9  8
12  5  2

The following figure shows the variable view of the structure of the dataset:

Figure 5-1 Dataset variable view

Entering Data for a Within-Subjects Design: Key Point

When you enter data for a within-subjects design, there must be a separate column for each
condition. This tells SPSS that the two data points are linked for a given participant. Unlike the
independent-samples t test where a grouping variable is required, there is no additional grouping
variable in the paired-samples t test. The properly configured data are shown in the following
screenshot of the SPSS Data Editor Data View:

Figure 5-2 Dataset data view

Performing the Paired-Samples t test Step-by-Step
The SPSS data file for this example can be found here. After you have entered or opened the dataset,
you should follow these steps in order.
Click on Analyze, Compare Means, and then Paired-Samples T test.

Figure 5-3 Select Paired-Samples T Test

In the resulting dialog box, click on the label for Alone and then press <Shift> and click on the label
for Others. Click on the arrow to move this pair of variables to the Paired Variables window.

Figure 5-4 Identify paired variables
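Pasted, the paired test looks approximately like:

* Paired-samples t test comparing the Alone and Others conditions.
T-TEST PAIRS=Alone WITH Others (PAIRED)
  /CRITERIA=CI(.95).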

Interpreting the Paired-Samples t Test Output

Click OK and the following output appears in the SPSS Output Viewer Window (see Figure 5-5). Note
that the correlation between the two observations is reported along with its p level, and that the value
of t, the degrees of freedom (df), and the p level of the calculated t are reported as well.

Figure 5-5 Paired-Samples T Test output

Lesson 6: One-Way ANOVA
Objectives
1. Conduct a one-way ANOVA.
2. Perform post hoc comparisons among means.
3. Interpret the ANOVA and post hoc comparison output.

Overview
The one-way ANOVA compares the means of three or more independent groups. Each group
represents a different level of a single independent variable. It is useful at least conceptually to think
of the one-way ANOVA as an extension of the independent-samples t test. The null hypothesis in the
ANOVA is that the several populations being sampled all have the same mean. Because the variance is
based on deviations from the mean, the "analysis of variance" can be used to test hypotheses about
means. The test statistic in the ANOVA is an F ratio, which is a ratio of two variances. When an ANOVA
leads to the conclusion that the sample means differ by more than a chance level, it is usually
instructive to perform post hoc (or a posteriori) analyses to determine which of the sample means are
different. It is also helpful to determine and report effect size when performing ANOVA.

Example Problem
In a class of 30 students, ten students each were randomly assigned to three different methods of memorizing word lists. In the first method, the student was instructed to repeat the word silently when it was presented. In the second method, the student was instructed to spell the word backward and visualize the backward word and to pronounce it silently. The third method required the student to associate each word with a strong memory. Each student saw the same 10 words flashed on a computer screen for five seconds each. The list was repeated in random order until each word had been presented a total of five times. A week later, students were asked to write down as many of the words as they could recall. For each of the three groups, the number of correctly-recalled words is shown in the following table:

Method1 Method2 Method3
1 4 7
2 4 4
0 0 9
0 6 8
4 6 6
3 6 9
1 6 6
0 6 4
3 4 5
3 4 6

Entering the Data in SPSS
Recall our previous lessons on data entry. These 30 scores represent 30 different individuals, and each participant's data should take up one line of the data file. The group membership should be coded as a separate variable. Note that although we used 1, 2, and 3 to code group membership, we could just as easily have used 0, 1, and 2. The correctly-entered data would take the following form (see Figure 6-1).

Figure 6-1 Data for one-way ANOVA

Conducting the One-Way ANOVA
To perform the one-way ANOVA in SPSS, click on Analyze, Compare Means, One-Way ANOVA (see Figure 6-2).

Figure 6-2 Select Analyze, Compare Means, One-Way ANOVA

In the resulting dialog box, move Recall to the Dependent List and Method to the Factor field. Select Post Hoc and then check the box in front of Tukey for the Tukey HSD test (see Figure 6-3), which is one of the most frequently used post hoc procedures. Note also the many other post hoc comparison tests available.
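Clicking Paste at this point would produce syntax roughly like the following sketch:

* One-way ANOVA on recall by method, with Tukey HSD post hoc tests.
ONEWAY Recall BY Method
  /STATISTICS=DESCRIPTIVES
  /POSTHOC=TUKEY ALPHA(0.05).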

Figure 6-3 One-Way ANOVA dialog with Tukey HSD test selected

The ANOVA summary table and the post hoc test results appear in the SPSS Viewer (see Figure 6-4). Note that the overall (omnibus) F ratio is significant, indicating that the means differ by a larger amount than would be expected by chance alone if the null hypothesis were true. The post hoc test results indicate that the mean for Method 1 is significantly lower than the means for Methods 2 and 3, but that the means for Methods 2 and 3 are not significantly different.

Figure 6-4 ANOVA summary table and post hoc test results

As an aid to understanding the post hoc test results, SPSS also provides a table of homogeneous subsets (see Figure 6-5). Note that it is not strictly necessary that the sample sizes be equal in the one-way ANOVA, and when they are unequal, the Tukey HSD procedure uses the harmonic mean of the sample sizes for post hoc comparisons.

Figure 6-5 Table of homogeneous subsets

Missing from the ANOVA results table is any reference to effect size. A common effect size index is eta squared, which is the between-groups sum of squares divided by the total sum of squares. As such, this index represents the proportion of variance that can be attributed to between-group differences or treatment effects. An alternative method of performing the one-way ANOVA provides the effect-size index, but not the post hoc comparisons discussed earlier. To perform this alternative analysis, select Analyze, Compare Means, Means (see Figure 6-6). Move Recall to the Dependent List and Method to the Independent List. Under Options, select Anova Table and eta.
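A sketch of the equivalent syntax; the ANOVA subcommand is what requests the ANOVA table and eta:

* Means by method plus an ANOVA table and eta squared.
MEANS TABLES=Recall BY Method
  /CELLS=MEAN COUNT STDDEV
  /STATISTICS=ANOVA.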

Figure 6-6 ANOVA procedure and effect size index available from Means procedure

The ANOVA summary table from the Means procedure appears in Figure 6-7 below. Eta squared is directly interpretable as an effect size index: 58 percent of the variance in recall can be explained by the method used for remembering the word list.

Figure 6-7 ANOVA table and effect size from Means procedure

Lesson 7: Repeated-Measures ANOVA
Objectives
1. Conduct the repeated-measures ANOVA.
2. Construct a profile plot.
3. Interpret the output.

Overview
The repeated-measures or within-subjects ANOVA is used when there are multiple measures for each participant. It is conceptually useful to think of the repeated-measures ANOVA as an extension of the paired-samples t test. Each set of observations for a subject or case serves as its own control, so this test is quite powerful. In the repeated-measures ANOVA, the test of interest is the within-subjects effect of the treatments or repeated measures. The procedure for performing a repeated-measures ANOVA in SPSS is found in the Analyze, General Linear Model menu.

Example Data
Assume that a statistics professor is interested in the effects of taking a statistics course on performance on an algebra test. She administers a 20-item college algebra test to ten randomly selected statistics students at the beginning of the term, at the end of the term, and six months after the course is finished. The hypothetical test results are as follows.

Student Before After SixMo
1  13 15 17
2   8  8  7
3  12 15 14
4  12 17 16
5  19 20 20
6  10 15 14
7  10 13 15
8   8 12 11
9  14 15 13
10 11 16  9

Coding Considerations
Data coding considerations in the repeated-measures ANOVA are similar to those in the paired-samples t test. Each participant or subject takes up a single row in the data file, and each observation requires a separate column. The properly coded SPSS data file with the data entered correctly should appear as follows (see Figure 7-1). You may also retrieve a copy of the data file if you like.

Figure 7-1 SPSS data file coded for repeated-measures ANOVA

Performing the Repeated-Measures ANOVA
To perform the repeated-measures ANOVA in SPSS, click on Analyze, then General Linear Model, and then Repeated Measures. See Figure 7-2.

Figure 7-2 Select Analyze, General Linear Model, Repeated Measures

In the resulting Repeated Measures dialog, you must specify the number of factors and the number of levels for each factor. In this case, the single factor is the time the algebra test was taken, and there are three levels: at the beginning of the course, immediately after the course, and six months after the course. You can accept the default label of factor1, or change it to a more descriptive one. We will use "Time" as the label for our factor, and specify that there are three levels (see Figure 7-3).

Figure 7-3 Specifying factor and levels

After naming the factor and specifying the number of levels, you must add the factor and then define it. Click on Add and then click on Define. See Figure 7-4.

Figure 7-4 Specifying within-subjects variable levels

Now you can enter the levels one at a time by clicking on a variable name and then clicking on the right arrow adjacent to the Within-Subjects Variables field. Or you can click on Before in the left pane of the Repeated Measures dialog, then hold down <Shift> and click on SixMo to select all three levels at the same time, and then click on the right arrow to move all three levels to the window in one step (see Figure 7-5).

Figure 7-5 Within-subjects variables appropriately entered

If you like, you can also click on Plots to include a line graph of the algebra test mean scores for the three administrations. You should click on Time, then Horizontal Axis, and then click on Add. Click Continue to return to the Repeated Measures dialog. Figure 7-6 is a screen shot of the Profile Plots dialog. Clicking on Options allows you to specify the calculation of descriptive statistics, effect size, and contrasts among the means.

Figure 7-6 Profile Plots dialog

Now click on Options and specify descriptive statistics, effect size, and contrasts (see Figure 7-7). You must move Time to the Display Means window as well as specify a confidence level adjustment for the main effects contrasts. A Bonferroni correction will adjust the alpha level in the post hoc comparisons, while the default LSD (Fisher's least significant difference test) will not adjust the alpha level. We will select the more conservative Bonferroni correction.

Figure 7-7 Specifying descriptive statistics, effect size, and mean contrasts

Click on Continue, then OK to run the repeated-measures ANOVA. The SPSS output provides several tests. When there are multiple dependent variables, the multivariate test is used to determine whether there is an overall within-subjects effect for the combined dependent variables. As there is only one within-subject factor, we can ignore this test in the present case. Sphericity is an assumption that the variances of the differences between the pairs of measures are equal. The insignificant test of sphericity indicates that this assumption is not violated in the present case, and adjustments to the degrees of freedom (and thus to the p level) are not required. The test of interest is the Test of Within-Subjects Effects. We can assume sphericity and report the F ratio as 8.149 with 2 and 18 degrees of freedom and the p level as .003 (see Figure 7-8). Partial eta-squared has an interpretation similar to that of eta-squared in the one-way ANOVA, and is directly interpretable as an effect-size index: about 48 percent of the within-subjects variation in algebra test performance can be explained by knowledge of when the test was administered.

Figure 7-8 Test of within-subjects effects

Additional insight is provided by the Bonferroni-corrected pairwise comparisons, which indicate that the means for Before and After are significantly different, while none of the other comparisons are significant. These results indicate an immediate but unsustained improvement in algebra test performance for students taking a statistics course. The profile plot is of assistance in the visualization of these contrasts. See Figures 7-9 and 7-10.
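For reference, pasting this analysis yields GLM syntax approximately like the following sketch, with the factor and variable names defined above:

* Repeated-measures ANOVA across the three test administrations.
GLM Before After SixMo
  /WSFACTOR=Time 3 Polynomial
  /PLOT=PROFILE(Time)
  /EMMEANS=TABLES(Time) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=Time.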

Figure 7-9 Bonferroni-corrected pairwise comparisons

Figure 7-10 Profile plot

Lesson 8: Two-Way ANOVA
Objectives
1. Conduct the two-way ANOVA.
2. Examine and interpret main effects and interaction effect.
3. Produce a plot of cell means.

Overview
We will introduce the two-way ANOVA with the simplest of such designs, a balanced or completely-crossed factorial design. In this case there are two independent variables (factors), each of which has two or more levels. We can think of this design as a table in which each cell represents a single independent group. The group represents a combination of levels of the two factors. For simplicity, let us refer to the factors as A and B and assume that each factor has two levels and each independent group has the same number of observations.

The design can thus be visualized as follows:

Figure 8-1 Conceptualization of Two-Way ANOVA

The two-way ANOVA is an economical design, because it allows the assessment of the main effects of each factor as well as their potential interaction.

Example Data and Coding Considerations
Assume that you are studying the effects of observing violent acts on subsequent aggressive behavior. You are interested in the kind of violence observed: a violent cartoon versus a video of real-action violence. A second factor is the amount of time one is exposed to violence: ten minutes or 30 minutes. There will be four independent groups. You randomly assign 8 children to each group. After the child watches the violent cartoon or action video, the child plays a Tetris-like computer video game for 30 minutes. The game provides options for either aggressing ("trashing" the other computerized player) or simply playing for points without interfering with the other player. The program provides 100 opportunities for the player to make an aggressive choice and records the number of times the child chooses an aggressive action when the game provides the choice. The hypothetical data are below:

Figure 8-2 Example Data

When coding and entering data for this two-way ANOVA, you should recognize that each of the 32 participants is a unique individual and that there are no repeated measures. Therefore, each participant takes up a row in the data file, and the data should be coded and entered in such a way that the factors are identified by two columns with group membership coded as a combination of the levels. For illustrative purposes we will use 1 and 2 to represent the levels of the factors, though as you learned earlier, you could just as easily have used 0s and 1s. The data view of the resulting SPSS data file should appear something like this:

Figure 8-3 SPSS data file data view for two-way ANOVA (partial data)

For ease of interpretation, the variables can be labeled and the values of each specified in the variable view (see Figure 8-4).

Figure 8-4 Variable view with labels and values identified

If you prefer, you may retrieve a copy of the data file.

Performing the Two-Way ANOVA
To perform the two-way ANOVA, select Analyze, General Linear Model, and then Univariate because there is only one dependent variable (see Figure 8-5).

Figure 8-5 Select Analyze, General Linear Model, Univariate

In the resulting dialog, you should specify that Aggression is the dependent variable and that both Time and Type are fixed factors (see Figure 8-6).

Figure 8-6 Specifying the two-way ANOVA

This procedure will test the main effects for Time and Type as well as their possible interaction. It is helpful to specify profile plots to examine the interaction of the two variables. For that purpose, select Plots and then move Type to the Horizontal Axis field and Time to the Separate Lines field (see Figure 8-7).

Figure 8-7 Specifying profile plots

When you click on Add, the Type * Time interaction is added to the Plots window, as shown in Figure 8-8.

Figure 8-8 Plotting an interaction term

Click Continue, then click Options. Check the boxes in front of Descriptive statistics and Estimates of effect size (see Figure 8-9). Click Continue, then click OK to run the two-way ANOVA. The table of interest is the Test of Between-Subjects Effects. Examination of the table reveals significant F ratios for Time, Type, and the Time * Type interaction (see Figure 8-9). As in the repeated-measures ANOVA, a partial eta-squared is calculated as a measure of effect size.

Figure 8-9 Table of between-subjects effects

The profile plot (see Figure 8-10) shows that the interaction is ordinal: the differences in the number of aggressive choices made after observing the two violence conditions increase with the time of exposure.
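The pasted version of this two-way analysis should look roughly like:

* Two-way ANOVA with an interaction plot and effect-size estimates.
UNIANOVA Aggression BY Type Time
  /PLOT=PROFILE(Type*Time)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=Type Time Type*Time.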

Figure 8-10 Interaction plot

Lesson 9: ANOVA for Mixed Factorial Designs
Objectives
1. Conduct a mixed-factorial ANOVA.
2. Test between-groups and within-subjects effects.
3. Construct a profile plot.

Overview
A mixed factorial design involves two or more independent variables, of which at least one is a within-subjects (repeated measures) factor and at least one is a between-groups factor. In the simplest case, there will be one between-groups factor and one within-subjects factor.

The between-groups factor would need to be coded in a single column as with the independent-samples t test or the one-way ANOVA, while the repeated measures variable would comprise as many columns as there are measures, as in the paired-samples t test or the repeated-measures ANOVA. As always it is helpful to include a column for participant (or case) number.

Example Data
As an example, assume that you conducted an experiment in which you were interested in the extent to which visual distraction affects younger and older people's learning and remembering. To do this, you obtained a group of younger adults and a separate group of older adults and had them learn under three conditions (eyes closed, eyes open looking at a blank field, eyes open looking at a distracting field of pictures). This is a 2 (age) x 3 (distraction condition) mixed factorial design. The scores on the data sheet below represent the number of words recalled out of ten under each distraction condition.

Age     Closed Eyes Simple Distraction Complex Distraction
Younger 8           5                  3
Younger 7           6                  6
Younger 8           7                  6
Younger 7           5                  4
Older   6           5                  2
Older   5           5                  4
Older   5           4                  3
Older   6           3                  2

Building the SPSS Data File
Note that there are eight separate participants, so the data file will require eight rows. There will be a column for the participants' age, which is the between-groups variable, and three columns for the repeated measures, which are the distraction conditions. The data appropriately entered in SPSS should look something like the following (see Figure 9-1). You may optionally download a copy of the data file.

Figure 9-1 SPSS data structure for mixed factorial design

Performing the Mixed Factorial ANOVA

To conduct this analysis, you will use the repeated measures procedure. The initial steps are identical to those in the within-subjects ANOVA. You must first specify repeated measures to identify the within-subjects variable(s), and then specify the between-groups factor(s). Select Analyze, then General Linear Model, then Repeated Measures (see Figure 9-2).

Figure 9-2 Preparing for the Mixed Factorial Analysis

Next, you must define the within-subjects factor(s). In our present case, there is only one within-subject variable, the distraction condition. In the Repeated Measures dialog box, type in the label distraction and the number of levels, 3. This process should be repeated for each factor on which there are repeated measures. SPSS will give the within-subjects variables the names factor1, factor2, and so on, but you can provide more descriptive names if you like. If you choose to name this factor, the name must be unique and may not conflict with any other variable names. If you like, you can give this measure (the three distraction levels) a new name by clicking in the Measure Name field. If you do not name the measure, the SPSS name for the measure will default to MEASURE_1. In the present case we will leave the measure name blank and accept the default label. See Figure 9-3.

Figure 9-3 Specifying the within-subjects factor

We will now specify the within-subjects and between-groups variables. Click on Add and then Define to specify which variable in the dataset is associated with each level of the within-subjects factor (see Figure 9-4).

Figure 9-4 Defining the within-subjects variable

Move the Closed, Simple, and Complex variables to levels 1, 2, and 3, respectively, and then move Age to the Between-Subjects Factor(s) window (see Figure 9-5). You can optionally specify one or more covariates for analysis of covariance.

Figure 9-5 The complete design specification for the mixed factorial ANOVA

To display a plot of the cell means, click on Plots, and then move Age to the Horizontal axis, and distraction to Separate Lines. Next click on Add to specify the plot (see Figure 9-6) and then click Continue.

Figure 9-6 Specifying plot

We will use the Options menu to specify the display of marginal and cell means, to compare main effects, to display descriptive statistics, and to display measures of effect size. We will select the Bonferroni interval adjustment to control the level of Type I error. See Figure 9-7.

Figure 9-7 Repeated measures options

Select Continue to close the options dialog and then OK to run the ANOVA. The resulting SPSS output is rather daunting, but you should focus on the between and within-subjects tests. Specifically you will want to determine whether there is a main effect for age, an effect for distraction condition, and a possible interaction of the two. The tables of interest from the SPSS Viewer are shown in Figures 9-8 and 9-9. The test of sphericity is not significant, indicating that this assumption has not been violated. Therefore you should use the F ratio and degrees of freedom associated with the sphericity assumption (see Figure 9-8).
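A sketch of the pasted syntax for the full mixed design, assuming Age is the coded between-groups variable in the data file:

* Mixed factorial ANOVA: distraction (within) by age (between).
GLM Closed Simple Complex BY Age
  /WSFACTOR=distraction 3 Polynomial
  /PLOT=PROFILE(Age*distraction)
  /EMMEANS=TABLES(distraction) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ
  /WSDESIGN=distraction
  /DESIGN=Age.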

Figure 9-8 Partial SPSS output

The test of within-subjects effects indicates that there is a significant effect of the distraction condition on word memorization. The lack of an interaction between distraction and age indicates that this effect is consistent for both younger and older participants. The test of between-subjects effects (see Figure 9-9) indicates there is a significant effect of the age condition on word memory.

Figure 9-9 Test of between-subjects effects

The remainder of the output assists in the interpretation of the main effects of the within-subjects (distraction condition) and between-subjects (age condition) factors. Of particular interest is the profile plot, which clearly displays the main effects and the absence of an interaction (see Figure 9-10). As discussed above, SPSS calls the within-subjects variable MEASURE_1 in the plot.

Figure 9-10 Profile plot

Lesson 10: Correlation and Scatterplots
Objectives
1. Calculate correlation coefficients.
2. Test the significance of correlation coefficients.
3. Construct a scatterplot.
4. Edit features of the scatterplot.

Overview
In correlational research, there is no experimental manipulation. Rather, we measure variables in their natural state. Instead of independent and dependent variables, it is useful to think of predictors and criteria. In bivariate (two-variable) correlation, we are assessing the degree of linear relationship between a predictor, X, and a criterion, Y. In multiple regression, we are assessing the degree of relationship between a linear combination of two or more predictors, X1, X2, ..., Xk, and a criterion, Y. We will address correlation in the bivariate case in Lesson 10, linear regression in the bivariate case in Lesson 11, and multiple regression and correlation in Lesson 12. The Pearson product moment correlation coefficient summarizes and quantifies the relationship between two variables in a single number. This number can range from -1, representing a perfect negative or inverse relationship, to 0, representing no relationship or complete independence, to +1, representing a perfect positive or direct relationship. When we calculate a correlation coefficient from sample data, we will need to determine whether the obtained correlation is significantly different from zero. We will also want to produce a scatterplot or scatter diagram to examine the nature of the relationship. Sometimes the correlation is low not because of a lack of relationship, but because of a lack of linear relationship. In such cases, examining the scatterplot will assist in determining whether a relationship may be nonlinear.

Example Data
Suppose that you have collected questionnaire responses to five questions concerning dormitory conditions from 10 college freshmen, and you would like to test the hypothesis that satisfaction with the college living environment is related to wealth (family income). (Normally you would like to have a larger sample, but the small sample in this case is useful for illustration.) The questionnaire contains five questions about satisfaction with various aspects of the dormitory: "noise," "furniture," "study space," "safety," and "privacy." These are answered on a 5-point Likert-type scale (very dissatisfied to very satisfied). The questionnaire thus assesses the students' level of satisfaction with noise, furniture, study area, safety, and privacy, which are coded as 1 to 5. Assume that you have also assessed the students' family income level. The data sheet for this study is shown below.

Student Income Noise Furniture Study_Area Safety Privacy
1   39 5 5 4 5 5
2   59 3 3 5 5 4
3   75 2 1 2 2 2
4   45 5 3 4 4 5
5   95 1 2 2 1 2
6  115 1 1 1 1 1
7   67 3 2 4 3 3
8   48 4 4 5 4 4
9  140 2 2 1 1 1
10  55 3 4 5 4 4

Entering the Data in SPSS
The data correctly entered in SPSS would look like the following (see Figure 10-1). Remember not only to enter the data, but to add appropriate labels in the Variable View to improve the readability of the output. If you prefer, you can download a copy of the data file.

Correlate. . Bivariate (see Figure 10-2). select Analyze.Figure 10-1 Data entered in SPSS Calculating and Testing Correlation Coefficients To calculate and test the significance of correlation coefficients.

Figure 10-2 The bivariate correlation procedure

Move the desired variables to the Variables window, as shown in Figure 10-3.

Figure 10-3 Move desired variables to the Variables window

Under the Options menu, let us select means and standard deviations and then click Continue. The output contains a table of descriptive statistics (see Figure 10-4) and a table of correlations and related significance tests (see Figure 10-5).

Figure 10-4 Descriptive statistics

Figure 10-5 Correlation matrix

Note that SPSS flags significant correlations with asterisks. The correlation matrix is symmetrical, so the above-diagonal entries are the same as the below-diagonal entries. In our survey results we note strong negative correlations between family income and the various survey items and strong positive correlations among the various items.
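Pasted, the correlation request is approximately:

* Pearson correlations among income and the satisfaction items.
CORRELATIONS
  /VARIABLES=Income Noise Furniture Study_Area Safety Privacy
  /PRINT=TWOTAIL NOSIG
  /STATISTICS DESCRIPTIVES
  /MISSING=PAIRWISE.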

Constructing a Scatterplot
For purposes of illustration, let us produce a scatterplot of the relationship between satisfaction with noise level in the dormitory and family income. We see from the correlation matrix that this is a significant negative correlation. As family income increases, satisfaction with the dormitory noise level decreases. Please note that there are several different ways to construct the scatterplot in SPSS, and that we are illustrating only one here. To build the scatterplot, select Graphs, Interactive, Scatterplot (see Figure 10-6).

Figure 10-6 Constructing a scatterplot

In the resulting dialog, enter Family Income on the X-axis and Noise on the Y-axis (see Figure 10-7).

Figure 10-7 Specifying variables for the scatterplot

The resulting scatterplot (see Figure 10-8) shows the relationship between family income and satisfaction with dormitory noise.

Figure 10-8 Scatterplot

In the SPSS Viewer it is possible to edit a chart object by double-clicking on it. In addition to many other options, you can change the labeling and scaling of the axes, add trend lines and other elements to the scatterplot, and change the marker types. If you like, you can save a particular combination of settings as a chart template to use again in the future. The edited chart appears in Figure 10-9.

Figure 10-9 Edited scatterplot

Lesson 11: Linear Regression
Objectives
1. Determine the regression equation.
2. Compute predicted Y values.
3. Compute and interpret residuals.

Overview
Closely related to correlation is the topic of linear regression. As you learned in Lesson 10, the correlation coefficient is an index of linear relationship. If the correlation coefficient is significant, that is an indication that a linear equation can be used to model the relationship between the predictor X and the criterion Y. In this lesson you will learn how to determine the equation of the line of best fit between the predictor and the criterion, how to compute predicted values based on that linear equation, and how to calculate and interpret residuals.

Example Problem and Data
This spring term you are in a large introductory psychology class. You observe an apparent relationship between the outside temperature and the number of people who skip class on a given day. More people seem to be absent when the weather is warmer, and more seem to be present when it is cooler outside. You randomly select 10 class periods.

You record the outside temperature reading 10 minutes before class time and then count the number of students in attendance that day. You would like to impress your professor by predicting how many people will be present on a given day, based on the outside temperature. The data you collect are the following:

Temp  Attendance
50    87
77    60
67    73
53    86
75    59
70    65
83    65
85    62
80    58
64    89

Entering the Data in SPSS
These pairs of data must be entered as separate variables. The data file may look something like the following (see Figure 11-1):

Figure 11-1 Data in SPSS

If you prefer, you can download a copy of the data. As you learned in Lesson 10, you should first determine whether there is a significant correlation between temperature and attendance. If you determine that there is a significant linear relationship, a regression equation can legitimately be used for prediction. Running the Correlation procedure (see Lesson 10 for details), you find that the correlation is -.87 and is significant at the .01 level (see Figure 11-2).

Figure 11-2 Significant correlation

A scatterplot is helpful in visualizing the relationship (see Figure 11-3). Clearly, there is a negative relationship between attendance and temperature.

Figure 11-3 Scatterplot

Linear Regression

The correlation and scatterplot indicate a strong, though by no means perfect, relationship between the two variables. Let us now turn our attention to regression. In linear regression, we are seeking the equation of a straight line that best fits the observations. This line will have an intercept term and a slope coefficient and will be of the general form

Ŷ = a + bX

Note that the predicted value of Y (Ŷ, read "Y-hat") is a linear combination of two constants, the intercept term and the slope term, and the value of X, so that the only thing that varies is the value of X. The intercept and slope (regression) coefficient are derived in such a way that the sums of the squared deviations of the actual data points from the line are minimized. This is called "ordinary least squares" estimation, or OLS. Because the predicted values are an exact linear function of X, the correlation between the predicted Ys and the observed Ys will be the same (in absolute value) as the correlation between the observed Ys and the observed Xs.

The usefulness of such a line may not be immediately apparent, but if we can model the relationship by a straight line, we can use that line to predict a value of Y for any value of X, even values that have not yet been observed. For example, looking at the scatterplot in Figure 11-3, what attendance would you predict for a temperature of 60 degrees? The regression line can answer that question. We will "regress" attendance (Y) on temperature (X).

If we subtract the predicted value of Y from the observed value of Y, the difference is called a "residual." A residual represents the part of the Y variable that cannot be explained by the X variable. Visually, the distance between an observed data point and the line of best fit represents the residual.

SPSS's Regression procedure allows us to determine the equation of the line of best fit, to calculate predicted values of Y, and to calculate and interpret residuals. Optionally, you can save the predicted values of Y and the residuals as either standard scores or raw-score equivalents.

Running the Regression Procedure
Open the data file in SPSS. Select Analyze, Regression, and then Linear (see Figure 11-4).

Figure 11-4 Performing the Regression procedure

The Regression procedure outputs a value called "Multiple R," which always ranges from 0 to 1. In the bivariate case, Multiple R is the absolute value of the Pearson r, and is thus .87 here. The square of r (or of Multiple R) represents the amount of shared variance between Y and X, and is .752 in this example. When we run the regression tool, we can optionally ask for either standardized or unstandardized (raw-score) predicted values of Y and residuals to be calculated and saved as new variables (see Figure 11-5).
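With the save options set, the pasted syntax would look roughly like the following sketch (the variable names temp and attend are assumptions, not names taken from the tutorial's data file):

* Regress attendance on temperature; save raw and standardized
* predicted values and residuals as new variables.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /DEPENDENT attend
  /METHOD=ENTER temp
  /SAVE PRED ZPRED RESID ZRESID.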

Figure 11-5 Save options in the Regression procedure

Click OK to run the Regression procedure. The output is shown in Figure 11-6. In the ANOVA table summarizing the regression, the omnibus F test tests the hypothesis that the population Multiple R is zero. We can safely reject that null hypothesis. Notice that dividing the regression sum of squares, which is based on the predicted values of Y, by the total sum of squares, which is based on the observed values of Y, produces the same value as R Square. The value of R Square thus represents the proportion of variance in the criterion that can be explained by the predictor, while the residual sum of squares represents the variance in the criterion that remains unexplained.

Figure 11-6 Regression procedure output

In Figure 11-7 you can see that the residuals and predicted values have been saved as new variables in the SPSS data file.

Figure 11-7 Saving predicted values and residuals

The regression equation for predicting attendance from the outside temperature is

Ŷ = 133.556 - .897 x Temp

So for a temperature of 60 degrees, you would predict the attendance to be 80 students (see Figure 11-8, in which this is illustrated graphically). Note that this process of using a linear equation to predict attendance from the temperature has some obvious practical limits. You would never predict attendance higher than 100 percent, and there may be a point at which the temperature becomes so hot as to be unbearable, so that attendance could begin to rise simply because the classroom is air-conditioned.
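If you want the predicted value for an arbitrary temperature, one simple approach is to apply the fitted equation directly with a COMPUTE statement. A sketch, assuming the temperature variable is named temp and inventing pred_attend for the result:

* Predicted attendance from the fitted OLS equation reported above.
COMPUTE pred_attend = 133.556 - 0.897 * temp.
EXECUTE.

For temp = 60 this yields 79.74, which rounds to the 80 students read from the trend line.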

Figure 11-8 Linear trend line and regression equation

To impress your professor, assume that the outside temperature on a class day is 72 degrees. Substituting 72 for X in the regression equation, you predict that there will be 69 students in attendance that day.

Examining Residuals
A residual is the difference between the observed and predicted values for the criterion variable (Hair, Black, Babin, Anderson, & Tatham, 2006). Bivariate linear regression and multiple linear regression make four key assumptions about these residuals:
1. The phenomenon (i.e., the regression model being considered) is linear, so that the relationship between X and Y is linear.
2. The residuals have equal variances at all levels of the predicted values of Y.
3. The residuals are independent. This is another way of saying that the successive observations of the dependent variable are uncorrelated.
4. The residuals are normally distributed with a mean of zero.
Thus it can be very instructive to examine the residuals when you perform a regression analysis. It is helpful to examine a histogram of the standardized residuals (see Figure 11-9), which can be created from the Plots menu. The normal curve can be superimposed for visual reference.

Figure 11-9 Histogram of standardized residuals

These residuals appear to be approximately normally distributed. Another useful plot is the normal p-p plot, produced as an option in the Plots menu. This plot compares the cumulative probabilities of the residuals to the cumulative probabilities that would be expected if the residuals were normally distributed. Significant departures from a straight line indicate nonnormality in the data (see Figure 11-10). In this case the residuals once again appear to be fairly normally distributed.

Figure 11-10 Normal p-p plot of observed and expected cumulative probabilities of residuals

When there are significant departures from normality, data transformations or the introduction of polynomial terms such as quadratic or cubic values of the original independent or dependent variables can often be of help (Edwards, 1976).

References
Edwards, A. L. (1976). An introduction to linear regression and correlation. San Francisco: Freeman.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Lesson 12: Multiple Correlation and Regression
Objectives
1. Perform and interpret a multiple regression analysis.
2. Test the significance of the regression and the regression coefficients.
3. Examine residuals for diagnostic purposes.

Overview
Multiple regression involves one continuous criterion (dependent) variable and two or more predictors (independent variables). Multiple regression is in actuality a general family of techniques, and its mathematical and statistical underpinnings make it an extremely powerful and flexible tool. By using group membership or treatment levels as qualitatively coded predictor variables, one can easily use multiple regression in place of t tests and analyses of variance. In this tutorial we will concentrate on the simplest kind of multiple regression, a forced or simultaneous regression in which all predictor variables are entered into the regression equation at one time. Other approaches include stepwise regression, in which variables are entered according to their predictive ability, and hierarchical regression, in which variables are entered according to theory or hypothesis. We will examine hierarchical regression more closely in Lesson 14 on analysis of covariance.

The multiple linear regression equation will take the following general form:

Ŷ = b0 + b1X1 + b2X2 + ... + bkXk

Instead of using a to represent the Y intercept, it is common practice in multiple regression to call the intercept term b0. As in simple linear regression, the equation for the line of best fit is derived in such a way as to minimize the sums of the squared deviations from the line. Although there are multiple predictors, there is only one predicted Y value, and the correlation between the observed and predicted Y values is called Multiple R. The value of Multiple R will range from zero to one. In the case of bivariate correlation, a regression analysis will yield a value of Multiple R that is the absolute value of the Pearson product moment correlation coefficient between X and Y.

The significance of Multiple R, and thus of the entire regression, must be tested. As well, the significance of the individual regression coefficients must be examined to verify that each predictor is adding significantly to the prediction. As discussed in Lesson 11, residual plots are helpful in diagnosing the degree to which the linearity, normality, and homoscedasticity assumptions have been met. Various data transformations can be attempted to accommodate situations of curvilinearity, non-normality, and heteroscedasticity.

In multiple regression we must also consider the potential impact of multicollinearity, which is the degree of linear relationship among the predictors. When there is a high degree of collinearity in the predictors, the regression equation will tend to be distorted and may lead to inappropriate conclusions regarding which predictors are statistically significant (Lind, Marchal, & Wathen, 2006). As a rule of thumb, if the variance inflation factor (VIF) for a given predictor is very high, or if the absolute value of the correlation between two predictors is greater than .70, one or more of the predictors should be dropped from the analysis and the regression equation recomputed. For this reason, we will ask for collinearity diagnostics when we run our regression.

Example Data
The following data (see Figure 12-1) represent statistics course grades, GRE Quantitative scores, and cumulative GPAs for 32 graduate students at a large public university in the southern U.S. (source: data collected by the webmaster). You may click here to retrieve a copy of the entire dataset.

Figure 12-1 Statistics course grades, GREQ, and GPA (partial data)

Preparing for the Regression Analysis
We will determine whether quantitative ability (GREQ) and cumulative GPA can be used to predict performance in the statistics course. A very useful first step is to calculate the zero-order correlations among the predictors and the criterion. We will use the Correlate procedure for that purpose. Select Analyze, Correlate, Bivariate (see Figure 12-2).

Figure 12-2 Calculate intercorrelations as preparation for regression analysis

In the Options menu of the resulting dialog box, you can request descriptive statistics if you like. The resulting intercorrelation matrix reveals that GREQ and GPA are both significantly related to the course grade, but are not significantly related to each other. Thus our initial impression is that collinearity will not be a problem (see Figure 12-3).

Figure 12-3 Descriptive statistics and intercorrelations

Conducting the Regression Analysis
To conduct the regression analysis, select Analyze, Regression, Linear (see Figure 12-4).

Figure 12-4 Selecting the Linear Regression procedure

In the Linear Regression dialog box, move Grade to the Dependent variable field and GPA and GREQ to the Independent(s) list, as shown in Figure 12-5.

Figure 12-5 Linear Regression dialog box

Click on the Statistics button and check the box in front of Collinearity diagnostics (see Figure 12-6).

Figure 12-6 Requesting collinearity diagnostics

Select Continue and then click on Plots to request standardized residual plots and scatter diagrams. You should request a histogram and a normal probability plot of the standardized residuals. You can also plot the standardized residuals against the standardized predicted values to check the assumption of homoscedasticity (see Figure 12-7).
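Pasted as syntax, these choices would look roughly like the following sketch (the variable names grade, greq, and gpa are assumptions based on Figure 12-1):

* Simultaneous regression with collinearity diagnostics and residual plots.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /DEPENDENT grade
  /METHOD=ENTER gpa greq
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)
  /SCATTERPLOT=(*ZRESID, *ZPRED).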

Click OK to run the regression analysis. The results are excerpted in Figure 12-8.

Figure 12-8 Regression procedure output (excerpt)

Interpreting the Regression Output
The significant overall regression indicates that a linear combination of GREQ and GPA predicts grades in the statistics course. The value of R Square is .513, which indicates that about 51 percent of the variation in grades is accounted for by knowledge of GPA and GREQ. The significant t values for the regression coefficients for GREQ and GPA show that each variable contributes significantly to the prediction.

Examining the unstandardized regression coefficients is not very instructive, because these are based on raw scores and their values are influenced by the units of measurement of the predictors. In the present case, the raw-score regression coefficient for GREQ is much smaller than that for GPA simply because the two variables use different scales. The standardized coefficients, on the other hand, are quite interpretable. These are technically standardized partial regression coefficients, because each shows the relative contribution of the given variable to the prediction with the other variable held constant. Thus we can conclude that GREQ has more predictive value than GPA, though both are significant.

If the two predictor variables were orthogonal (uncorrelated), the variance inflation factor (VIF) for each would be 1. The collinearity diagnostics indicate a low degree of overlap between the predictors (as we predicted), so we conclude that there is not a problem with collinearity in this case. The histogram of the standardized residuals shows that the departure from normality is not too severe (see Figure 12-9).

Figure 12-9 Histogram of standardized residuals

The normal p-p plot indicates some departure from normality and may suggest a curvilinear relationship between the predictors and the criterion (see Figure 12-10).

Figure 12-10 Normal p-p plot

The plot of standardized predicted values against the standardized residuals indicates a large degree of heteroscedasticity (see Figure 12-11). This is mostly the result of a single outlier, case 11 (Participant 118), whose GREQ and grade scores are markedly lower than those of the remainder of the group. Eliminating that case and recomputing the regression increases Multiple R slightly and also reduces the heteroscedasticity.

Figure 12-11 Plot of predicted values against residuals

Lesson 13: Chi-Square Tests
Objectives
1. Perform and interpret a chi-square test of goodness of fit.
2. Perform and interpret a chi-square test of independence.

Overview
Chi-square tests are used to compare observed frequencies to the frequencies expected under some hypothesis. Tests for one categorical variable are generally called goodness-of-fit tests. In this case, there is a one-way table of observed frequencies of the levels of some categorical variable. The null hypothesis might state that the expected frequencies are equally distributed, or that they are unequal on the basis of some theoretical or postulated distribution. Tests for two categorical variables are usually called tests of independence or association. In this case, there is a two-way contingency table with one categorical variable occupying the rows of the table and the other occupying the columns. In this analysis, the expected frequencies are commonly derived on the basis of the assumption of independence. That is, if there were no association between the row and column variables, each cell entry would be expected to be the product of the cell's row and column marginal totals divided by the overall sample size.

In both tests, the chi-square test statistic is calculated as the sum of the squared differences between the observed and expected frequencies divided by the expected frequencies, according to the following simple formula:

χ² = Σ (O - E)² / E

where O represents the observed frequency in a given cell of the table and E represents the corresponding expected frequency under the null hypothesis.

You will find the goodness-of-fit test for equal or unequal expected frequencies as an option under Nonparametric Tests in the Analyze menu. For the chi-square test of independence, you will use the Crosstabs procedure under the Descriptive Statistics menu in SPSS. The cross-tabulation procedure can make use of numeric or text entries, while the Nonparametric Tests procedure requires numeric entries. For that reason, you will need to recode any text entries into numerical values for goodness-of-fit tests. We will illustrate both the goodness-of-fit test and the test of independence using the same dataset.

Example Data
Assume that you are interested in the effects of peer mentoring on student academic success in a competitive private liberal arts college. A group of 30 students is randomly selected during their freshman orientation. These students are assigned to a team of seniors who have been trained as tutors in various academic subjects and in listening and team-building skills. The 30 selected students meet in small group sessions with their peer tutors once each week during their entire freshman year, are encouraged to work with their small group for study sessions, and are encouraged to schedule private sessions with their peer mentors whenever they desire. You identify an additional 30 students at orientation as a control group. The control group members receive no formal peer mentoring. You determine that there are no significant differences between the high school grades and SAT scores of the two groups.

At the end of four years, you compare the two groups on academic retention and academic performance. You code mentoring as 1 = present and 0 = absent to identify the two groups. If a student is no longer enrolled (i.e., has transferred, dropped out, or flunked out), you code a zero for retention. If he or she is still enrolled but has not yet graduated after four years, you code a 1. If he or she has graduated, you code a 2. Because GPAs differ by academic major, you generate a binary code for grades: students whose grades are below the median for their major receive a zero, and students whose cumulative GPA is at the median or higher for their academic major receive a 1. You collect the following (hypothetical) data:

Properly entered in SPSS, the data should look like the following (see Figure 13-1). For your convenience, you may also download a copy of the dataset.

Figure 13-1 Dataset in SPSS (partial data)

Conducting a Goodness-of-Fit Test
To determine whether the three retention outcomes are equally distributed, you can perform a goodness-of-fit test. Because there are three possible outcomes (no longer enrolled, currently enrolled, and graduated) and sixty total students, you would expect each outcome to be observed in 1/3 of the cases if there were no differences in the frequencies of these outcomes. Thus the null hypothesis would be that 20 students would not be enrolled, 20 would be currently enrolled, and 20 would have graduated after four years. To test this hypothesis, you use the Nonparametric Tests procedure: select Analyze, Nonparametric Tests, Chi-Square, as shown in Figure 13-2.

Figure 13-2 Selecting the chi-square test for goodness of fit

In the resulting dialog box, move Retention to the Test Variable List and accept the default of equal expected frequencies. SPSS counts and tabulates the observed frequencies and performs the chi-square test (see Figure 13-3). The significant chi-square shows that the frequencies are not equally distributed, χ2 (2, N = 60) = 6.10, p = .047. The degrees of freedom for the goodness-of-fit test are the number of categories minus one.
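The equivalent syntax is brief. A sketch, assuming the retention variable is named retention:

* Goodness-of-fit test against equal expected frequencies.
NPAR TESTS
  /CHISQUARE=retention
  /EXPECTED=EQUAL.

To test an unequal postulated distribution instead, replace EQUAL with a list of expected relative frequencies, one per category in ascending order of the category codes (for example, /EXPECTED=1 2 1).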

Figure 13-3 Chi-square test of goodness of fit

Conducting a Chi-Square Test of Independence
If mentoring is not related to retention, you would expect mentored and non-mentored students to have the same outcomes. Because the two groups are the same size, that would mean that half of the students in each outcome group should come from the mentored students and the other half from the non-mentored students, so that any observed differences in frequencies would be due to chance. To test the hypothesis that there is an association (or non-independence) between mentoring and retention, you will conduct a chi-square test as part of the cross-tabulation procedure. To conduct the test, select Analyze, Descriptive Statistics, Crosstabs (see Figure 13-4).

Figure 13-4 Preparing for the chi-square test of independence

In the Crosstabs dialog, move one variable to the row field and the other variable to the column field. I typically place the variable with more levels in the row field to keep the output tables narrower (see Figure 13-5), though the results of the test would be identical if you were to reverse the row and column variables.

Figure 13-5 Establishing row and column variables

Under the Statistics option, select Chi-square as well as Phi and Cramer's V (measures of effect size for chi-square tests). You can also click on the Cells button to display both observed and expected cell frequencies. The Format menu allows you to specify whether the rows are arranged in ascending (the default) or descending order. Clustered bar charts are an excellent way to compare the frequencies visually, so we will select that option as well (see Figure 13-5). Click OK to run the Crosstabs procedure and conduct the chi-square test.
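A sketch of the corresponding syntax, again assuming the variables are named retention and mentoring:

* Chi-square test of independence with effect sizes,
* observed and expected counts, and a clustered bar chart.
CROSSTABS
  /TABLES=retention BY mentoring
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED
  /BARCHART.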

Figure 13-6 Partial output from Crosstabs procedure

For the test of independence, the degrees of freedom are the number of rows minus one multiplied by the number of columns minus one, or in this case 2 x 1 = 2. The Pearson chi-square is significant, χ2 (2, N = 60) = 14.58, p < .001, indicating that mentoring had an effect on retention. The value of Cramer's V is .493, indicating a large effect size (Gravetter & Wallnau, 2005). The clustered bar chart provides an excellent visual representation of the chi-square test results (see Figure 13-7).

Figure 13-7 Clustered bar chart

Going Further
For additional practice, you can use the Nonparametric Tests and Crosstabs procedures to determine whether grades differ between mentored and non-mentored students and whether there is an association between grades and retention outcomes.

References
Gravetter, F. J., & Wallnau, L. B. (2005). Essentials of statistics for the behavioral sciences (5th ed.). Belmont, CA: Thomson/Wadsworth.

Lesson 14: Analysis of Covariance
Objectives
1. Perform and interpret an analysis of covariance using the General Linear Model.
2. Perform and interpret an analysis of covariance using hierarchical regression.

Overview
Analysis of covariance (ANCOVA) is a blending of regression and analysis of variance (Roscoe, 1975). It is possible to perform ANCOVA using the General Linear Model procedure in SPSS. An entirely equivalent analysis is also possible using hierarchical regression, so the choice is left to the user and

his or her preferences. We will illustrate both procedures in this tutorial.

As you recall from correlation and regression, if two variables are correlated, one can be used to predict the other. If there is a covariate (X) that correlates with the dependent variable (Y), then dependent variable scores can be predicted by the covariate. ANCOVA provides a mechanism for assessing differences in dependent variable scores after statistically controlling for the covariate. There are two obvious advantages to this approach: (1) any variable that influences the variation in the dependent variable can be statistically controlled, and (2) this control can reduce the amount of error variance in the analysis. ANCOVA is statistically equivalent to matching the experimental groups with respect to the variable or variables being controlled (or covaried). We will use the simplest of cases: a single covariate, two treatments, and a single variate (dependent variable).

Example Data
Assume that you are comparing performance in a statistics class taught by two different methods. Students in one class are instructed in the classroom, while students in the second class take their class online. Both classes are taught by the same instructor and use the same textbook. Because the two classes are intact groups, it is not possible to achieve experimental control, so this is a quasi-experimental design. At the beginning of the term all students take a test of quantitative ability (pretest), and at the end of the term their score on the final exam is recorded (posttest). Assume that you would like to compare the scores of the two groups on the final exam while controlling for initial quantitative ability. The hypothetical data are as follows:

Before the ANCOVA
You may retrieve the SPSS dataset if you like. As a precursor to the ANCOVA, let us perform a between-groups t test to examine overall differences between the two groups on the final exam. You will recall this test as the subject of Lesson 4, so the details will not be repeated here. (Of course, you could just as easily have compared the means using the Compare Means or One-way ANOVA procedures; with two groups, the square root of the F ratio obtained would be the value of t, and if there were multiple groups you would perform an ANOVA rather than a t test.) The result of the t test is shown below. From the significant t test, we conclude that the second method led to improved test scores, but we must rule out the possibility that this difference is attributable to differences in the quantitative ability of the two groups. If that were the case, the differences observed between the groups could not be attributed to the experimental treatment(s).
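A sketch of the syntax for this precursor t test, assuming the grouping variable is named method (with the assumed codes 0 and 1) and the final exam variable is named posttest:

* Independent-samples t test on the final exam scores.
T-TEST GROUPS=method(0 1)
  /VARIABLES=posttest
  /CRITERIA=CI(.95).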

Figure 14-1 t Test Results

As a second precursor to the ANCOVA, let us determine the degree of correlation between quantitative ability and exam scores. As correlation is the subject of Lesson 10, the details are omitted here, and only the results are shown in Figure 14-2.

Figure 14-2 Correlation between pretest and posttest scores

Knowing that there is a statistically significant correlation between pretest and posttest scores, we would like to exercise statistical control by holding the effects of the pretest scores constant. The resulting ANCOVA will show whether there are any differences in the posttest scores of the two groups after controlling for differences in ability.

Performing the ANCOVA in GLM
To perform the ANCOVA via the General Linear Model menu, select Analyze, General Linear Model, Univariate (see Figure 14-3).

Figure 14-3 ANCOVA via the GLM procedure

In the resulting dialog box, move Posttest to the Dependent Variable field, Method to the Fixed Factor(s) field, and Pretest to the Covariate(s) field. See Figure 14-4.

Figure 14-4 Univariate dialog box

Under Options you may want to choose descriptive statistics and effect size indexes, as well as estimated marginal means for Method. As there are just two groups, main effect comparisons are not appropriate. Examine Figure 14-5.

Figure 14-5 Univariate options for ANCOVA

Click Continue. If you like, you can click on Plots to add profile plots of the estimated marginal means of the posttest scores for the two groups after adjusting for pretest scores. Click OK to run the analysis. The results are shown in Figure 14-6. They indicate that after controlling for initial quantitative ability, the posttest scores of the two groups are statistically significantly different, F(1, 27) = 16.64, p < .001, partial eta-squared = .381.

Figure 14-6 ANCOVA results

The profile plot makes it clear that the online class had higher exam scores after controlling for initial quantitative ability (see Figure 14-7).
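For reference, the pasted GLM syntax for this analysis would look roughly like the following sketch (the variable names posttest, method, and pretest are assumptions):

* ANCOVA: posttest by method, controlling for pretest.
* EMMEANS reports group means adjusted for the covariate.
UNIANOVA posttest BY method WITH pretest
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE ETASQ
  /EMMEANS=TABLES(method)
  /DESIGN=pretest method.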

Figure 14-7 Profile plot

Performing an ANCOVA Using Hierarchical Regression
To perform the same ANCOVA using hierarchical regression, enter the posttest as the criterion. Then enter the covariate (pretest) as the first block of independent variables and group membership (method) as a second block. Examine the change in R Square as the two models are compared, along with the significance of that change. The F value produced by this analysis is identical to the one produced via the GLM approach. Select Analyze, Regression, Linear (see Figure 14-8).

Figure 14-8 ANCOVA via hierarchical regression

Now enter Posttest as the Dependent Variable and Pretest as an Independent variable (see Figure 14-9).

Figure 14-9 Linear regression dialog box

Click on the Next button and enter Method as an Independent variable, as shown in Figure 14-10.

Figure 14-10 Entering second block

Click on Statistics and check the box in front of R squared change (see Figure 14-11).

Figure 14-11 Specify R squared change

Click Continue and then OK to run the hierarchical regression. Note in the partial output shown in Figure 14-12 that the value of F for the R Square change with pretest held constant is identical to that calculated earlier.
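In syntax form, the two blocks become two METHOD=ENTER subcommands, and the CHANGE keyword requests the R Square change statistics. A sketch, with the same assumed variable names:

* Hierarchical regression ANCOVA: covariate first, treatment second.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS R ANOVA CHANGE
  /DEPENDENT posttest
  /METHOD=ENTER pretest
  /METHOD=ENTER method.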

Figure 14-12 Hierarchical regression yields results identical to GLM

References
Roscoe, J. T. (1975). Fundamental research statistics for the behavioral sciences (2nd ed.). New York: Holt, Rinehart and Winston.