
# Hands-On Training Programme on Statistical Packages for Social Sciences

(SPSS)

Venue
Conference Room, Health Economics Unit Ministry of Health and Family Welfare 14/2, Topkhana Road (4th level, Room # 311), Dhaka 1000

Day # 2

## Nonparametric Hypothesis Testing

Session Outline
1. Nonparametric Hypothesis Testing
   a. Binomial Test
   b. Two-Independent-Samples Tests
   c. Tests for Several Independent Samples
   d. Two-Related-Samples Tests
   e. Tests for Several Related Samples


## Binomial Test

Introduction
The Binomial Test procedure compares the observed frequencies of the two categories of a dichotomous variable to the frequencies that are expected under a binomial distribution with a specified probability parameter. By default, the probability parameter for both groups is 0.5. To change the probabilities, you can enter a test proportion for the first group. The probability for the second group will be 1 minus the specified probability for the first group.

Example
When you toss a dime, the probability of a head equals 1/2. Based on this hypothesis, a dime is tossed 40 times, and the outcomes are recorded (heads or tails). From the binomial test, you might find that 3/4 of the tosses were heads and that the observed significance level is small (0.0027). These results indicate that it is not likely that the probability of a head equals 1/2; the coin is probably biased.
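The coin example can be checked by hand. The following is a minimal sketch, not the SPSS implementation: the function names are hypothetical, and it assumes that "3/4 of the tosses" means 30 heads out of 40. It computes an exact two-sided p-value and a normal approximation with a continuity correction. The approximation appears to be the one that reproduces the significance level of 0.0027 quoted above.

```python
from math import comb, erfc, sqrt

def binomial_two_sided_exact(k, n):
    """Exact two-sided p-value for H0: p = 0.5 (symmetric case only)."""
    tail = min(k, n - k)
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower)

def binomial_two_sided_normal(k, n, p=0.5):
    """Two-sided p-value from the normal approximation, continuity-corrected."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = max((abs(k - mean) - 0.5) / sd, 0.0)
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))

heads, tosses = 30, 40  # assumption: 3/4 of 40 tosses were heads
print(round(binomial_two_sided_exact(heads, tosses), 4))   # 0.0022
print(round(binomial_two_sided_normal(heads, tosses), 4))  # 0.0027
```

Either way the p-value is far below 0.05, so the conclusion that the coin is probably biased does not depend on which version is used.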

Statistics
Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles.

Data
The variables that are tested should be numeric and dichotomous. To convert string variables to numeric variables, use the Automatic Recode procedure, which is available on the Transform menu. A dichotomous variable is a variable that can take only two possible values: yes or no, true or false, 0 or 1, and so on. The first value encountered in the dataset defines the first group, and the other value defines the second group. If the variables are not dichotomous, you must specify a cut point. The cut point assigns cases with values that are less than or equal to the cut point to the first group and assigns the rest of the cases to the second group.
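The cut point rule described above can be illustrated with a short sketch. The function name and the income values are hypothetical; only the rule itself (values less than or equal to the cut point go to the first group) comes from the text.

```python
def dichotomize(values, cut_point):
    """Split values into two groups: <= cut_point (group 1) and > cut_point (group 2)."""
    first_group = [v for v in values if v <= cut_point]
    second_group = [v for v in values if v > cut_point]
    return len(first_group), len(second_group)

# Hypothetical household incomes in thousands, with a cut point of 47
incomes = [20, 47, 48, 90]
print(dichotomize(incomes, 47))  # (2, 2): the value 47 itself falls in group 1
```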


Assumptions
Nonparametric tests do not require assumptions about the shape of the underlying distribution. The data are assumed to be a random sample.

## To Obtain a Binomial Test

From the menus choose: Analyze > Nonparametric Tests > Binomial...

1. Select one or more numeric test variables.
2. Optionally, click Options for descriptive statistics, quartiles, and control of the treatment of missing data.

## Binomial Test Options

Statistics. You can choose one or both summary statistics: Descriptive (displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases) and Quartiles (displays values corresponding to the 25th, 50th, and 75th percentiles).

Missing Values. Controls the treatment of missing values: Exclude cases test-by-test (when several tests are specified, each test is evaluated separately for missing values) or Exclude cases listwise (cases with missing values for any variable are excluded from all analyses).


## Worked Out Example

A telecommunications firm loses about 27% of its customers to churn each month. In order to properly focus churn reduction efforts, management wants to know if this percentage varies across predefined customer groups. This information is collected in the file telco.sav. Use the binomial test to determine whether a single rate of churn adequately describes the four major customer types.

There are two preparatory steps to take. In order to perform the test, you must first split the file by Customer category. You also want to ensure that the first group for each split is defined by the customers who churned, so before splitting the file, you will sort the cases on Customer category and Churn within last month.

To sort the cases, from the Data Editor menus choose: Data > Sort Cases...

1. Select Customer category and Churn within last month as variables to sort by.
2. Select Churn within last month and then select Descending as the sort order.
3. Click OK.

This will sort the cases by Customer category in ascending order and then Churn within last month in descending order.

Now, to split the file, from the Data Editor menus choose: Data > Split File...

1. Select Compare groups.
2. Select Customer category as the variable on which to base groups.
3. Select File is already sorted.
4. Click OK.


To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > Binomial...

1. Select Churn within last month as the test variable.
2. Enter 0.27 as the test proportion.
3. Click Options.
4. Select Descriptive.
5. Click Continue.
6. Click OK in the Binomial Test dialog box.

Because Churn within last month is a dichotomous variable, the mean tells us the proportion of churn within each customer type. Multiplying these proportions by 100 expresses the same data as percentages. For example, the percentage of churn for customers subscribing only to basic service was 31%. Similarly, customers who prefer more high-end electronic services churned at a rate of about 27% within the last month. There are about 280 customers who subscribe to a set of convenience services (three-way calling, call forwarding, call waiting, etc.). Of these, only 16% recently churned. Customers who take advantage of all of the services offered by the firm churned the most, at 37%, which is 10 percentage points higher than the average of all customers within the last month.


Each panel of the binomial test table displays one binomial test. For example, the first panel displays the test of the null hypothesis that the proportion of churn for Basic service users is the same as the proportion of churn in the total sample. Of the 266 Basic service customers, 83 churned within the last month. The Observed Prop. column here shows that these 83 customers account for 31% of the total Basic service group in this sample. The test proportion of 0.27 suggests that we should expect 0.27 * 266, or about 72 customers, to have churned. The asymptotic significance value is 0.07, which is above the conventional cutoff for statistical significance (0.05). By that standard, you cannot reject the null hypothesis that the churn rate for basic service customers is equal to the churn rate in the sample at large. The same cannot be said for Plus service customers, however. In this case, the proportion, 0.16, is significantly lower than the test proportion. Many fewer Plus service customers found another service provider last month. At the other extreme, significantly more Total service customers were lost last month than the test proportion predicts.
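The figures for the Basic service panel can be reproduced with a small sketch. It assumes that the reported asymptotic significance is a one-tailed normal approximation with a continuity correction; that assumption is not stated in the text, but it does give 0.07 for these counts. The function name is hypothetical.

```python
from math import erfc, sqrt

def one_tailed_binomial_normal(k, n, p):
    """One-tailed p-value for observing k of n when the test proportion is p,
    using the normal approximation with a 0.5 continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = max((abs(k - mean) - 0.5) / sd, 0.0)
    return 0.5 * erfc(z / sqrt(2))  # upper tail area, 1 - Phi(z)

churned, customers, test_proportion = 83, 266, 0.27
print(round(churned / customers, 2))          # observed proportion: 0.31
print(round(test_proportion * customers))     # expected churners: 72
print(round(one_tailed_binomial_normal(churned, customers, test_proportion), 2))  # 0.07
```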


Using the Binomial Test procedure, you have determined that the rate of churn differs across customer types. Now that Total Service customers have been identified as high-risk, you can focus further efforts on finding out why these customers are dissatisfied.

## Using a Cut Point to Define the Samples

As part of a continuing analysis of churn in their customer base, a telecommunications firm questions whether those who churn tend to be above or below the median household income of \$47,000. This example uses the file telco.sav. Use the Binomial Test procedure to dynamically compute and test the proportion of each churn group falling below the median value.

In order to perform the test, you must first split the file by Churn within last month. To split the file, from the Data Editor menus choose: Data > Split File...

1. Click Reset to restore the default settings.
2. Select Compare groups.
3. Select Churn within last month as the variable to base groups on.
4. Click OK.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > Binomial...

1. Click Reset to restore the default settings.
2. Select Household income in thousands as the test variable.
3. Type 47 as the cut point to define the dichotomy.
4. Click Options.
5. Select Quartiles.
6. Click Continue.
7. Click OK in the Binomial Test dialog box.

The descriptives table displays the quartiles for each churn group. Generally, customers who churned last month have lower household incomes.

The binomial test table is split by the values of Churn within last month. The first test selects only those customers in the sample who did not churn last month. Within this first split file group, the cut point has created two groups. The first group consists of those customers who did not churn and whose household income is less than or equal to the median for the total sample. In these data, just about half of those who did not churn fall at or below median income. As we would expect, the difference in proportions is not significant.


However, of those 274 customers who churned last month, the proportion with household incomes at or below the median is significantly different from the null hypothesis value. Those who churn tend to be the less affluent customers.

Using a cut point to define the groups, you have found that a majority of the customers who churned within the last month fall at or below the median household income. Now that these customers have been identified as high-risk, you can focus further efforts on determining why these customers are dissatisfied.


## Two-Independent-Samples Tests

Introduction
The Two-Independent-Samples Tests procedure compares two groups of cases on one variable.

Example
New dental braces have been developed that are intended to be more comfortable, to look better, and to provide more rapid progress in realigning teeth. To find out whether the new braces have to be worn as long as the old braces, 10 children are randomly chosen to wear the old braces, and another 10 children are chosen to wear the new braces. From the Mann-Whitney U test, you might find that, on average, children with the new braces did not have to wear the braces as long as children with the old braces.

Statistics
Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles. Tests: Mann-Whitney U, Moses extreme reactions, Kolmogorov-Smirnov Z, Wald-Wolfowitz runs.

Data
Use numeric variables that can be ordered.

Assumptions
Use independent, random samples. The Mann-Whitney U test tests equality of two distributions. In order to use it to test for differences in location between two distributions, one must assume that the distributions have the same shape.

## To Obtain Two-Independent-Samples Tests

From the menus choose: Analyze > Nonparametric Tests > 2 Independent Samples...

1. Select one or more numeric variables.
2. Select a grouping variable and click Define Groups to split the file into two groups or samples.

## Two-Independent-Samples Test Types

Four tests are available to test whether two independent samples (groups) come from the same population.

The Mann-Whitney U test is the most popular of the two-independent-samples tests. It is equivalent to the Wilcoxon rank sum test and the Kruskal-Wallis test for two groups. The Mann-Whitney test checks whether two sampled populations are equivalent in location. The observations from both groups are combined and ranked, with the average rank assigned in the case of ties. The number of ties should be small relative to the total number of observations. If the populations are identical in location, the ranks should be randomly mixed between the two samples. The test calculates the number of times that a score from group 1 precedes a score from group 2 and the number of times that a score from group 2 precedes a score from group 1. The Mann-Whitney U statistic is the smaller of these two numbers. The Wilcoxon rank sum W statistic is also displayed. If both samples have the same number of observations, W is the rank sum of the group that is named first in the Two-Independent-Samples Define Groups dialog box.

The Kolmogorov-Smirnov Z test and the Wald-Wolfowitz runs test are more general tests that detect differences in both the locations and shapes of the distributions. The Kolmogorov-Smirnov test is based on the maximum absolute difference between the observed cumulative distribution functions for both samples. When this difference is significantly large, the two distributions are considered different. The Wald-Wolfowitz runs test combines and ranks the observations from both groups. If the two samples are from the same population, the two groups should be randomly scattered throughout the ranking.

The Moses extreme reactions test assumes that the experimental variable will affect some subjects in one direction and other subjects in the opposite direction. It tests for extreme responses compared to a control group. This test focuses on the span of the control group and is a measure of how much extreme values in the experimental group influence the span when combined with the control group. The control group is defined by the group 1 value in the Two-Independent-Samples Define Groups dialog box. Observations from both groups are combined and ranked. The span of the control group is computed as the difference between the ranks of the largest and smallest values in the control group plus 1. Because chance outliers can easily distort the range of the span, 5% of the control cases are trimmed automatically from each end.
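The Kolmogorov-Smirnov statistic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the SPSS code: the function names and the sample values are hypothetical, and the scaling of the maximum difference D into Z by sqrt(n1 * n2 / (n1 + n2)) is the commonly used definition, assumed here.

```python
from math import sqrt

def ks_two_sample_d(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    points = sorted(set(sample_a) | set(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    d = 0.0
    for x in points:
        cdf_a = sum(v <= x for v in sample_a) / n_a
        cdf_b = sum(v <= x for v in sample_b) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_z(sample_a, sample_b):
    """Scale D by sqrt(n1 * n2 / (n1 + n2)) (assumed definition of Z)."""
    n_a, n_b = len(sample_a), len(sample_b)
    return ks_two_sample_d(sample_a, sample_b) * sqrt(n_a * n_b / (n_a + n_b))

group_1 = [1, 2, 3, 4]   # hypothetical values
group_2 = [3, 4, 5, 6]   # hypothetical values
print(ks_two_sample_d(group_1, group_2))  # 0.5
```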

## Two-Independent-Samples Tests Define Groups

To split the file into two groups or samples, enter an integer value for Group 1 and another value for Group 2. Cases with other values are excluded from the analysis.

## Two-Independent-Samples Tests Options

Statistics. You can choose one or both summary statistics: Descriptive (displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases) and Quartiles (displays values corresponding to the 25th, 50th, and 75th percentiles).

Missing Values. Controls the treatment of missing values: Exclude cases test-by-test (when several tests are specified, each test is evaluated separately for missing values) or Exclude cases listwise (cases with missing values for any variable are excluded from all analyses).

## Worked Out Example

Physicians randomly assigned female stroke patients to receive only physical therapy or physical therapy combined with emotional therapy. Three months after the treatments, the Mann-Whitney test is used to compare each group's ability to perform common activities of daily life.


The results are collected in the file adl.sav. Use the Mann-Whitney test to determine whether the two groups' abilities differ.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > 2 Independent Samples...

1. Select Travel ADL, Cooking ADL, and Housekeeping ADL as the test variables.
2. Select Treatment group as the grouping variable.
3. Click Define Groups.
4. Type 0 as the group 1 value and 1 as the group 2 value.
5. Click Continue.
6. Click OK in the Two-Independent-Samples Tests dialog box.

Because the test variables are assumed to be ordinal, the Mann-Whitney and Wilcoxon tests are based on ranks of the original values and not on the values themselves. The rank table is divided into three panels, one panel for each test variable. The first test variable, Travel ADL, measures the ability to regularly get around the community. It ranges from 0 to 4, where 0 = Same as before illness and 4 = Bedridden. All 46 women in the control group and all 54 women in the treatment group provided valid data for this variable.


First, each case is ranked without regard to group membership. Cases tied on a particular value receive the average rank for that value. After ranking the cases, the ranks are summed within groups. Average ranks adjust for differences in the number of patients in both groups. If the groups are only randomly different, the average ranks should be about equal. For Travel ADL, the average ranks are over 9 points apart. The test variables Cooking ADL and Housekeeping ADL contain missing data. For these variables, the value 4 = Never did any; thus, these scales do not apply to all patients. However, for those to whom they do apply, there are differences of about 12 to 13 points between the average ranks of the treatment and control groups.

The U statistic is simple (but tedious) to calculate. For each case in group 1, the number of cases in group 2 with higher ranks is counted. Tied ranks count as 1/2. This process is repeated for group 2. The Mann-Whitney U statistic displayed in the table is the smaller of these two values. The Wilcoxon W statistic is simply the smaller of the two rank sums displayed for each group in the rank table. The values displayed here are the rank sums for the treatment group.

A nice feature of the Mann-Whitney and Wilcoxon tests is that the Z statistic and normal distribution provide an excellent approximation as the sample size grows beyond 10 in either group. The negative Z statistics indicate that the rank sums are lower than their expected values. Each two-tailed significance value estimates the probability of obtaining a Z statistic at least as extreme (in absolute value) as the one displayed, if there truly is no effect of the treatment.
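The counting procedure for U can be written out directly. The sketch below uses hypothetical function names and made-up scores, not the adl.sav data, and its Z statistic uses the standard normal approximation without a correction for ties, so it may differ slightly from the value SPSS reports when ties are present.

```python
from math import sqrt

def count_higher(group_a, group_b):
    """For each case in group_a, count cases in group_b that are higher; ties count 1/2."""
    total = 0.0
    for x in group_a:
        for y in group_b:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total

def mann_whitney_u(group_1, group_2):
    """The smaller of the two counts, as described in the text."""
    return min(count_higher(group_1, group_2), count_higher(group_2, group_1))

def mann_whitney_z(group_1, group_2):
    """Normal approximation for U (no tie correction: an assumption of this sketch)."""
    n1, n2 = len(group_1), len(group_2)
    expected = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (mann_whitney_u(group_1, group_2) - expected) / sd

control = [1, 2, 3]        # hypothetical scores
treatment = [2, 4, 5, 6]   # hypothetical scores
print(mann_whitney_u(control, treatment))            # 1.5
print(round(mann_whitney_z(control, treatment), 3))  # -1.591
```

With samples this small the normal approximation is only illustrative; the text recommends it once either group grows beyond 10 cases.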


The significantly lower rank sums of the treatment group indicate to the physicians that the additional emotional therapy had some beneficial effect on such activities of daily life as cooking and cleaning.


## Tests for Several Independent Samples

Introduction

The Tests for Several Independent Samples procedure compares two or more groups of cases on one variable.

Example
Do three brands of 100-watt lightbulbs differ in the average time that the bulbs will burn? From the Kruskal-Wallis one-way analysis of variance, you might learn that the three brands do differ in average lifetime.

Statistics
Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles. Tests: Kruskal-Wallis H, median.

Data
Use numeric variables that can be ordered.

Assumptions
Use independent, random samples. The Kruskal-Wallis H test requires that the tested samples be similar in shape.

## To Obtain Tests for Several Independent Samples

From the menus choose: Analyze > Nonparametric Tests > K Independent Samples...

1. Select one or more numeric variables.
2. Select a grouping variable and click Define Range to specify minimum and maximum integer values for the grouping variable.

## Tests for Several Independent Samples Test Types

Three tests are available to determine whether several independent samples come from the same population: the Kruskal-Wallis H test, the median test, and the Jonckheere-Terpstra test.

The Kruskal-Wallis H test, an extension of the Mann-Whitney U test, is the nonparametric analog of one-way analysis of variance and detects differences in distribution location. The median test, which is a more general test (but not as powerful), detects distributional differences in location and shape. The Kruskal-Wallis H test and the median test assume that there is no a priori ordering of the k populations from which the samples are drawn.

When there is a natural a priori ordering (ascending or descending) of the k populations, the Jonckheere-Terpstra test is more powerful. For example, the k populations might represent k increasing temperatures. The hypothesis that different temperatures produce the same response distribution is tested against the alternative that as the temperature increases, the magnitude of the response increases. Here, the alternative hypothesis is ordered; therefore, Jonckheere-Terpstra is the most appropriate test to use. The Jonckheere-Terpstra test is available only if you have installed the Exact Tests add-on module.

## Tests for Several Independent Samples Define Range

To define the range, enter integer values for Minimum and Maximum that correspond to the lowest and highest categories of the grouping variable. Cases with values outside of the bounds are excluded. For example, if you specify a minimum value of 1 and a maximum value of 3, only the integer values of 1 through 3 are used. The minimum value must be less than the maximum value, and both values must be specified.


## Tests for Several Independent Samples Options

Statistics. You can choose one or both summary statistics: Descriptive (displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases) and Quartiles (displays values corresponding to the 25th, 50th, and 75th percentiles).

Missing Values. Controls the treatment of missing values: Exclude cases test-by-test (when several tests are specified, each test is evaluated separately for missing values) or Exclude cases listwise (cases with missing values for any variable are excluded from all analyses).

## Worked Out Example

A sales manager evaluates two new training courses. Sixty employees, divided into three groups, all receive standard training. In addition, group 2 receives technical training, and group 3 receives a hands-on tutorial. Each employee was tested at the end of the training course and their score recorded. The results are collected in the file salesperformance.sav. Use the median test to assess the difference in performance between the three groups, if any.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > K Independent Samples...

1. Select Score on training exam as the test variable.
2. Deselect Kruskal-Wallis H, and select Median as the test type.
3. Select Sales training group as the grouping variable.
4. Click Define Range.
5. Type 1 as the minimum and 3 as the maximum values.
6. Click Continue.
7. Click Options in the Tests for Several Independent Samples dialog box.
8. Select Quartiles in the Statistics group.
9. Click Continue.
10. Click OK in the Tests for Several Independent Samples dialog box.

Across all 60 subjects, the median performance on the exam is a score just below 75. The null hypothesis for the median test is that this particular value is a good approximation of center for each of the three training groups.

To test this hypothesis, each group is divided into two subgroups: those whose scores fall at or below the median, and those whose scores are above it. The result is a two-way frequency table with two rows and g columns, where g is the number of categories in your grouping variable. In this table, for example, the first cell is a count of the number of employees who received standard training and scored above the median. While the null hypothesis would predict that about 10 subjects scored above the median, only four subjects in this group did so.


In addition to standard training, group 2 also received some technical training. Unlike in the other groups, the median for all trainees does what the null hypothesis says it should do here: it nearly divides this group into two equal subgroups.

In the final training group, those with exam scores greater than the median outnumber those at or below it by a margin of three to one. As with group 1, the overall median does not provide a good approximation of center for these trainees.

From this two-way frequency table, a chi-square statistic can be calculated to test the null hypothesis of row and column independence. In fact, the median test is a chi-square test of independence between group membership and the proportion of cases above and below the median.

The chi-square value is obtained in the usual fashion for two-way tables. For each cell, the distance between the observed and expected counts is squared, then divided by the expected value. Finally, these quantities are summed across all cells. For this table, the value is 12.4. Degrees of freedom for the frequency table are equal to (rows - 1) * (columns - 1). In this case, that is 1 * 2 = 2. The asymptotic significance tells us how often we can expect a chi-square value at least as large as 12.4 in similar repeated samples, if there really is no relationship between the median and group membership. The probability is very low: about two times per thousand.
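The chi-square calculation can be followed step by step in a short sketch. The counts below are partly assumptions: the text gives four above-median scores in group 1 and a three-to-one split in group 3, and the sketch assumes 20 trainees per group and 11 above-median scores in group 2, because those values reproduce the reported statistic of 12.4. The function name is hypothetical. The closed form exp(-x / 2) for the upper tail is exact only for 2 degrees of freedom.

```python
from math import exp

def chi_square_two_way(table):
    """Pearson chi-square for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: above the median, at or below the median. Columns: groups 1, 2, 3.
# Group 2 counts (11 and 9) and the group size of 20 are assumptions.
observed = [[4, 11, 15],
            [16, 9, 5]]
chi2, df = chi_square_two_way(observed)
p_value = exp(-chi2 / 2)  # upper tail of chi-square, valid for df = 2 only
print(round(chi2, 1), df, round(p_value, 3))  # 12.4 2 0.002
```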


From this analysis, the manager learns that type of training resulted in different median scores between the groups. Trainees who received the hands-on tutorial have a higher median value than either their counterparts who received standard training or additional technical training.

## Using Kruskal-Wallis to Test Ordinal Outcomes

Agricultural researchers are studying the effect of mulch color on the taste of crops. Strawberries grown in red, blue, and black mulch were rated by taste-testers on an ordinal scale of one to five (far below to far above average). The results are collected in the file tastetest.sav. Use the Kruskal-Wallis test to determine if taste varies by mulch color.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > K Independent Samples...

1. Select Taste scale as the test variable.
2. Select Mulch color as the grouping variable.
3. Click Define Range.
4. Type 1 as the minimum and 3 as the maximum values.
5. Click Continue.
6. Click OK in the Tests for Several Independent Samples dialog box.

The Kruskal-Wallis test uses ranks of the original values and not the values themselves. That's appropriate in this case, because the scale used by the taste-testers is ordinal.

First, each case is ranked without regard to group membership. Cases tied on a particular value receive the average rank for that value. After ranking the cases, the ranks are summed within groups.

The Kruskal-Wallis statistic measures how much the group ranks differ from the average rank of all groups. The chi-square value is obtained by squaring each group's distance from the average of all ranks, weighting by its sample size, summing across groups, and multiplying by a constant. The degrees of freedom for the chi-square statistic are equal to the number of groups minus one. The asymptotic significance estimates the probability of obtaining a chi-square statistic greater than or equal to the one displayed, if there truly are no differences between the group ranks. A chi-square of 9.751 with 2 degrees of freedom should occur only about 8 times per 1,000.

The table tells us the ratings of the strawberries differed by type of mulch used for cultivation. Like the F test in standard ANOVA, Kruskal-Wallis does not tell us how the groups differed, only that they are different in some way. The Mann-Whitney test could be used for pairwise comparisons.
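The verbal recipe for the Kruskal-Wallis statistic translates into a short sketch. The function names and the example ratings are hypothetical (they are not the tastetest.sav data), and the sketch omits the correction for ties that statistical packages usually apply. The last line checks the reported significance for a chi-square of 9.751 with 2 degrees of freedom, using the closed form that holds only for 2 degrees of freedom.

```python
from math import exp

def average_ranks(values):
    """Rank all values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum of n_i * (mean rank_i - (N + 1) / 2)^2, no tie correction."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    overall_mean_rank = (n + 1) / 2
    h = 0.0
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        start += len(group)
        group_mean_rank = sum(group_ranks) / len(group)
        h += len(group) * (group_mean_rank - overall_mean_rank) ** 2
    return h * 12 / (n * (n + 1)), len(groups) - 1

h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
print(round(h, 4), df)            # 7.2 2
print(round(exp(-9.751 / 2), 3))  # 0.008, about 8 times per 1,000
```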


## Two-Related-Samples Tests

Introduction

The Two-Related-Samples Tests procedure compares the distributions of two variables.

Example
In general, do families receive the asking price when they sell their homes? By applying the Wilcoxon signed-rank test to data for 10 homes, you might learn that seven families receive less than the asking price, one family receives more than the asking price, and two families receive the asking price.

Statistics
Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles. Tests: Wilcoxon signed-rank, sign, McNemar. If the Exact Tests option is installed (available only on Windows operating systems), the marginal homogeneity test is also available.

Data
Use numeric variables that can be ordered.

Assumptions
Although no particular distributions are assumed for the two variables, the population distribution of the paired differences is assumed to be symmetric.

## To Obtain Two-Related-Samples Tests

From the menus choose: Analyze > Nonparametric Tests > 2 Related Samples...

1. Select one or more pairs of variables.


## Two-Related-Samples Test Types

The tests in this section compare the distributions of two related variables. The appropriate test to use depends on the type of data.

If your data are continuous, use the sign test or the Wilcoxon signed-rank test. The sign test computes the differences between the two variables for all cases and classifies the differences as positive, negative, or tied. If the two variables are similarly distributed, the number of positive and negative differences will not differ significantly. The Wilcoxon signed-rank test considers information about both the sign of the differences and the magnitude of the differences between pairs. Because the Wilcoxon signed-rank test incorporates more information about the data, it is more powerful than the sign test.

If your data are binary, use the McNemar test. This test is typically used in a repeated measures situation, in which each subject's response is elicited twice, once before and once after a specified event occurs. The McNemar test determines whether the initial response rate (before the event) equals the final response rate (after the event). This test is useful for detecting changes in responses due to experimental intervention in before-and-after designs.

If your data are categorical, use the marginal homogeneity test. This test is an extension of the McNemar test from binary response to multinomial response. It tests for changes in response (using the chi-square distribution) and is useful for detecting response changes due to experimental intervention in before-and-after designs. The marginal homogeneity test is available only if you have installed Exact Tests.
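The sign test is simple enough to compute by hand. The sketch below is an illustration with a hypothetical function name. It reuses the counts from the home-sale example earlier in this section (seven negative differences, one positive, two ties), treats the two ties as dropped, and computes an exact two-sided p-value from the binomial distribution with probability 0.5.

```python
from math import comb

def sign_test_exact(n_positive, n_negative):
    """Exact two-sided sign test; tied pairs are excluded before calling."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Seven families received less than the asking price, one received more.
print(sign_test_exact(n_positive=1, n_negative=7))  # 0.0703125
```

Because the sign test ignores the size of each difference, it can miss an effect that the Wilcoxon signed-rank test would detect on the same data.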

## Two-Related-Samples Tests Options

Statistics. You can choose one or both summary statistics: Descriptive (displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases) and Quartiles (displays values corresponding to the 25th, 50th, and 75th percentiles).

Missing Values. Controls the treatment of missing values: Exclude cases test-by-test (when several tests are specified, each test is evaluated separately for missing values) or Exclude cases listwise (cases with missing values for any variable are excluded from all analyses).

## Worked Out Example

An analyst at an investment firm knows that the median gain for S&P 500 stocks last year was about 0.078% per day, and it is 0.123% so far this year. Worried about the ailing technology sector, he uses the Wilcoxon test to see if the median gain for technology stocks is different from the known median for all stocks. The data are in the file mutualfund.sav.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > 2 Related Samples...

1. Select Fund avg % gain 2000 and S&P median change 2000 as the first test pair.
2. Select Fund avg % gain 2001 and S&P median change 2001 as the second test pair.
3. Click OK.

In the Wilcoxon test, ranks are based on the absolute value of the difference between the two test variables. The sign of the difference is used to classify cases into one of three groups: differences below 0 (negative ranks), above 0 (positive ranks), or equal to 0 (ties). Tied cases are ignored. In these data, 5 cases have negative differences whose absolute values are ranked 3, 4, 6, 7, and 11 among all differences. The sum of these ranks equals 31. The other cases have positive differences, whose ranks sum to 60.

Z is a standardized measure of the distance between the rank sum of the negative group and its expected value. The expected rank sum is 45.5, half the sum of all ranks. The standard deviation is 14.31. The negative group rank sum equals 31, so the Z statistic is (31 - 45.5) / 14.31, or -1.013. The two-tailed asymptotic significance estimates the probability of obtaining a Z statistic at least as extreme in absolute value as the one displayed, if there truly is no difference between the group ranks. In this case, the probabilities for both tests are above any reasonable cutoff.

From this analysis, the investment analyst can breathe a sigh of relief. The tests give no evidence that his technology stock holdings performed differently on a daily basis in the years 2000 and 2001 from the median daily performance of all stocks on the S&P 500 over the same period.
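The arithmetic behind the Z statistic can be reproduced with a short Python sketch. It assumes 13 non-tied cases, a count that is not stated in the text but follows from the rank sums (31 + 60 = 91 = 13 × 14 / 2), and it uses the usual large-sample standard deviation without a correction for tied ranks. The function name `wilcoxon_z` is hypothetical.

```python
from math import erfc, sqrt

def wilcoxon_z(negative_rank_sum, n):
    """Normal approximation for the Wilcoxon signed-rank test.

    n is the number of non-tied pairs. No tie correction is applied
    to the standard deviation (an assumption of this sketch).
    """
    expected = n * (n + 1) / 4                    # half the sum of all ranks
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)     # SD of the rank sum under H0
    z = (negative_rank_sum - expected) / sd
    p_two_tailed = erfc(abs(z) / sqrt(2))         # 2 * (1 - Phi(|z|))
    return expected, sd, z, p_two_tailed

# n = 13 is inferred from the rank sums reported in the text (31 + 60 = 91).
n_nonzero = 13
expected, sd, z, p = wilcoxon_z(31, n_nonzero)
print(f"expected={expected}, sd={sd:.2f}, z={z:.3f}, p={p:.3f}")
```

Under these assumptions the sketch gives the expected rank sum of 45.5, the standard deviation of 14.31, and the Z value of -1.013 quoted above, with a two-tailed probability of roughly 0.31.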

## Using the McNemar Test in a Pre-Post Design

A grocery store manager wants to increase sales of the store-brand detergent. She puts together an in-store promotion and talks with customers at check-out. She will use the McNemar test to determine if the in-store advertisement changed her customers' buying behavior. The data are in the file storebrand.sav.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > 2 Related Samples...

Select Preference before promotion and Preference after promotion as the test pair. Deselect Wilcoxon and select McNemar. Click OK.

The McNemar test focuses on change from one condition or one sample response to another. In this example, the null hypothesis is that the promotion has no effect; customers are equally likely to change preferences in either direction. In this table, you can see that 26 customers preferred the store brand before seeing the promotion but not afterwards. This is a change in buying behavior that the manager would not want to attribute to the promotion. On the other hand, 48 customers said that they did not prefer the store detergent prior to the promotion but did prefer it afterwards. This is a change in the direction that would certainly please the store manager.

The McNemar chi-square is computed using only the two cells of the previous table where customers changed their preferences from before to after the promotion. A continuity correction is applied because the continuous chi-square distribution is used to approximate a discrete distribution. The asymptotic significance is the approximate probability of obtaining a chi-square statistic at least as extreme as 5.959 in repeated samples, if the frequencies of the two change conditions are only randomly different. Because a chi-square this large is unlikely to have arisen by chance, the manager rejects the null hypothesis of no difference in favor of her hypothesis that the promotion had a favorable effect.
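As a rough check, the continuity-corrected McNemar statistic can be recomputed from the two change cells reported earlier (26 customers who switched away from the store brand and 48 who switched to it). The sketch below uses the standard formula (|b - c| - 1)² / (b + c) and the chi-square distribution with one degree of freedom; the function name `mcnemar_corrected` is hypothetical.

```python
from math import erfc, sqrt

def mcnemar_corrected(b, c):
    """Continuity-corrected McNemar chi-square for the two discordant cells."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Discordant counts taken from the text: 26 changed away, 48 changed toward.
chi2, p = mcnemar_corrected(26, 48)
print(f"chi-square={chi2:.3f}, asymptotic p={p:.4f}")
```

The statistic works out to 441 / 74, about 5.959, matching the value in the output, and the asymptotic probability is well under 0.05.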

## Tests for Several Related Samples

Introduction

The Tests for Several Related Samples procedure compares the distributions of two or more related variables.

Example
Does the public associate different amounts of prestige with a doctor, a lawyer, a police officer, and a teacher? Ten people are asked to rank these four occupations in order of prestige. Friedman's test indicates that the public does associate different amounts of prestige with these four professions.

Statistics
Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles. Tests: Friedman, Kendall's W, and Cochran's Q.

Data
Use numeric variables that can be ordered.

Assumptions
Nonparametric tests do not require assumptions about the shape of the underlying distribution. Use dependent, random samples.

## To Obtain Tests for Several Related Samples

From the menus choose: Analyze > Nonparametric Tests > K Related Samples...

Select two or more numeric test variables.

## Tests for Several Related Samples Test Types

Three tests are available to compare the distributions of several related variables.

The Friedman test is the nonparametric equivalent of a one-sample repeated measures design or a two-way analysis of variance with one observation per cell. Friedman tests the null hypothesis that k related variables come from the same population. For each case, the k variables are ranked from 1 to k. The test statistic is based on these ranks.

Kendall's W is a normalization of the Friedman statistic. Kendall's W is interpretable as the coefficient of concordance, which is a measure of agreement among raters. Each case is a judge or rater, and each variable is an item or person being judged. For each variable, the sum of ranks is computed. Kendall's W ranges between 0 (no agreement) and 1 (complete agreement).

Cochran's Q is identical to the Friedman test but is applicable when all responses are binary. This test is an extension of the McNemar test to the k-sample situation. Cochran's Q tests the hypothesis that several related dichotomous variables have the same mean. The variables are measured on the same individual or on matched individuals.
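The three statistics can be sketched in plain Python using their textbook formulas. This is an illustration under stated assumptions, not SPSS's implementation: the Friedman version assumes each row already holds ranks 1 to k with no ties, Kendall's W is obtained as the Friedman chi-square divided by n(k - 1), and the function names and both small data sets are hypothetical.

```python
def friedman_chi2(ranks):
    """Friedman chi-square; each row holds the ranks 1..k for one case (no ties)."""
    n, k = len(ranks), len(ranks[0])
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

def kendalls_w(ranks):
    """Kendall's W as a normalization of the Friedman statistic."""
    n, k = len(ranks), len(ranks[0])
    return friedman_chi2(ranks) / (n * (k - 1))

def cochrans_q(binary):
    """Cochran's Q; each row holds 0/1 responses for one case."""
    k = len(binary[0])
    col_totals = [sum(row[j] for row in binary) for j in range(k)]
    row_totals = [sum(row) for row in binary]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    return numerator / denominator  # undefined if every row is all 0s or all 1s

# Hypothetical rankings of 4 items by 5 raters, invented for illustration.
ranks = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4], [1, 2, 4, 3], [1, 2, 3, 4]]
# Hypothetical success (1) / failure (0) results for 4 subjects on 3 tasks.
binary = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]

chi2 = friedman_chi2(ranks)
w = kendalls_w(ranks)
q = cochrans_q(binary)
print(f"Friedman chi-square={chi2:.2f}, Kendall's W={w:.3f}, Cochran's Q={q:.3f}")
```

For the made-up rankings, the raters largely agree, so W is close to 1. Both the Friedman chi-square and Cochran's Q are referred to a chi-square distribution with k - 1 degrees of freedom.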

## Tests for Several Related Samples Statistics

You can choose statistics. Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of nonmissing cases. Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.

## Worked Out Example

An online retailer has created a new Web store. Usability testing is expensive but vital, so five users are invited. Each is asked to perform six tasks on the site, all of which are designed to be equally easy. The results are collected in the file webusability.sav. Use Cochran's Q to test the hypothesis that all six tasks had equal success rates.

To begin the analysis, from the menus choose: Analyze > Nonparametric Tests > K Related Samples...

Select all of the variables, from Registered warranty data to Edited database information, as the test variables. Deselect Friedman, and select Cochran's Q. Click Statistics. Select Descriptive. Click Continue. Click OK in the Tests for Several Related Samples dialog box.

The only possible outcomes for each task were 0 (Failure) or 1 (Success). Therefore, the means measure the proportion of users who succeeded at each task. For example, all five users were able to register their warranty data, but none could successfully add a question to the support list.

The frequency table summarizes the number of observations of success or failure at each task. Because the null hypothesis would predict that each task had the same number of successes, you can sense that perhaps that hypothesis is not supported by this pattern of frequencies.

Cochran's Q statistic is a chi-square variate formed by a ratio of the variation in success across tasks to the variation in success within subjects. Based on the statistics and frequency tables, you would expect a large statistic because you observed quite a bit of variation in success by task. Degrees of freedom for this chi-square are equal to the number of test variables minus 1. There were six tasks, so there are five degrees of freedom. The asymptotic significance is the approximate probability of obtaining a chi-square statistic as extreme as 12.949 in repeated samples if the frequencies of task success are only randomly different. Because a chi-square this large is unlikely to have arisen by chance, the design team rejects the null hypothesis that all tasks have an equal number of successes. Clearly, users had difficulty interacting with the support list, as well as the fax and newsletter request pages.
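The asymptotic significance for the reported statistic can be approximated with a small helper that evaluates the upper tail of the chi-square distribution. The helper below, `chi2_sf`, is a hypothetical name for a sketch that works for whole-number degrees of freedom by building up the regularized upper incomplete gamma function step by step. The statistic (12.949) and the six tasks come from the text; everything else is an assumption of this example.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variate, integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, exp(-y)          # closed form for 2 degrees of freedom
    else:
        a, q = 0.5, erfc(sqrt(y))    # closed form for 1 degree of freedom
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2:
        q += y ** a * exp(-y) / gamma(a + 1)
        a += 1
    return q

n_tasks = 6
df = n_tasks - 1                      # number of test variables minus 1
p = chi2_sf(12.949, df)
print(f"df={df}, asymptotic p={p:.4f}")
```

The resulting probability is below 0.05, which is consistent with the design team's decision to reject the null hypothesis. With only five users, the chi-square approximation is rough, so the value is best read as indicative.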

## Using the Friedman Test on Related Ordinal Measures

An insurance group is evaluating four health care plans for small employers. Twelve employers are recruited to rank the plans by how much they would prefer to offer them to their employees. The results are collected in the file healthplans.sav. Use the Friedman test to determine if the plans are of equal preference. To begin the analysis, from the menus choose:

Analyze > Nonparametric Tests > K Related Samples...

Select PPO plan 1 through HMO plan 2 as the test variables. Click OK.

The Friedman test ranks the scores in each row of the data file independently of every other row. In this example, each employer has already performed this ranking. For each plan, these ranks are summed and then divided by the number of employers to yield an average rank for each plan. In this table, you can see that the 12 employers tended to rank PPO plan 2 more highly than the other three plans.

The Friedman chi-square tests the null hypothesis that the ranks of the variables do not differ from their expected value. For a constant sample size, the higher the value of this chi-square statistic, the larger the difference between each variable's rank sum and its expected value. For these rankings, the chi-square value is 10.3. Degrees of freedom are equal to the number of variables minus 1. Because four health plans were being ranked, there are three degrees of freedom. The asymptotic significance is the approximate probability of obtaining a chi-square statistic as extreme as 10.3 with three degrees of freedom in repeated samples if the rankings of each health plan are not truly different.

Because a chi-square of 10.3 with three degrees of freedom is unlikely to have arisen by chance, the insurer concludes that the 12 employers do not have equal preference for all four health care plans.
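The reported figures (chi-square of 10.3, 12 employers, 4 plans) are enough to sketch the asymptotic significance and the corresponding Kendall's W. The Python below uses the closed-form upper tail of the chi-square distribution for three degrees of freedom and the relation W = chi-square / (n(k - 1)). The variable names are hypothetical, and because 10.3 is a rounded value the results are approximate.

```python
from math import erfc, exp, pi, sqrt

chi2 = 10.3          # Friedman chi-square reported in the text (rounded)
n_employers = 12     # number of cases (raters)
k_plans = 4          # number of ranked variables
df = k_plans - 1

# Upper-tail probability for a chi-square variate with exactly 3 df:
# P(X >= x) = erfc(sqrt(x/2)) + 2 * sqrt(x / (2*pi)) * exp(-x/2)
y = chi2 / 2
p = erfc(sqrt(y)) + 2 * sqrt(y / pi) * exp(-y)

# Kendall's coefficient of concordance derived from the same statistic.
w = chi2 / (n_employers * (k_plans - 1))

print(f"df={df}, asymptotic p={p:.4f}, Kendall's W={w:.3f}")
```

The probability comes out near 0.016, in line with the conclusion above. The derived W of about 0.29 suggests that, although preferences differ across plans, agreement among the 12 employers is only moderate.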
