
Introduction to Nonparametric Statistics

Craig L. Scanlan, EdD, RRT


Parametric statistics assume (1) that the distribution characteristics of a sample's population are known (e.g., the mean, standard deviation, normality) and (2) that the data being analyzed are at the interval or ratio level. Frequently, however, these assumptions cannot be met. Commonly, this occurs with nominal- or ordinal-level data (for which there are no measures of means or standard deviations). Alternatively, continuous data may be so severely skewed from normal that they cannot be analyzed using regular parametric methods. In these cases we cannot perform analyses based on means or standard deviations. Instead, we must use nonparametric methods. Unlike their parametric counterparts, nonparametric tests make no assumptions about the distribution of the data, nor do they rely on estimates of population parameters such as the mean to describe a variable's distribution. For this reason, nonparametric tests are often called 'distribution-free' or 'parameter-free' statistics.

Given that nonparametric methods make less stringent demands on the data, one might wonder why they are not used more often. There are several reasons. First, nonparametric statistics cannot provide definitive measures of actual differences between population samples. A nonparametric test may tell you that two interventions differ, but it cannot provide a confidence interval for the difference or even a simple mean difference between the two. Second, nonparametric procedures discard information. For example, if we convert severely skewed interval data into ranks, we discard the actual values and retain only their order. Because this information is discarded, nonparametric tests are less powerful (more prone to Type II errors) than parametric methods, which also means that they typically require comparatively larger sample sizes to demonstrate an effect when one is present. Last, there are certain types of information that only parametric statistical tests can provide. A good example is the analysis of interactions between independent variables, as provided by factorial analysis of variance; there is simply no equivalent nonparametric method for analyzing such interactions. For these reasons, you will see nonparametric analysis used primarily on an as-needed basis, either (1) to analyze nominal or ordinal data or (2) to substitute for parametric tests when their assumptions are grossly violated, e.g., when a distribution is severely skewed. Discussion here will be limited to the analysis of nominal or ordinal data.

Nominal (Categorical) Data Analysis

We previously have learned that the Pearson product-moment correlation coefficient (r) is commonly used to assess the relationship between two continuous variables. If instead the two variables are measured at the nominal level (i.e., are categorical in nature), we assess their relationship by crosstabulating the data in a contingency table. A contingency table is a two-dimensional (rows x columns) table formed by 'cross-classifying' subjects or events on two categorical variables. One variable's categories define the rows while the other variable's categories define the columns. The intersection (crosstabulation) of each row and column forms a cell, which
displays the count (frequency) of cases classified as being in the applicable category of both variables. Below is a simple example of a hypothetical contingency table that crosstabulates patient gender against survival of chest trauma:

                  Outcome
           Survives     Dies     Total
Male          34         16        50
Female         7         43        50
Total         41         59       100

Testing for Independence (Chi-square and Related Tests)

Based on simple probability, we can easily compute the expected value for each cell, i.e., the number of cases we would expect based on their total distribution in the sample. For example, given that the sample contains exactly 50% males and 50% females, if there were no relationship between gender and outcome (the null hypothesis of independence), we would expect exactly half of those surviving (41) to be male, i.e., 41/2 = 20.5.* Similar expected values can be computed for all cells in the table. The greater the difference between the observed (O) and expected (E) cell counts, the less likely that the null hypothesis of independence holds true, i.e., the stronger the evidence that the two variables are related. In our example, the large difference between the observed (O = 34) and expected (E = 20.5) counts for the Male/Survives cell suggests that being male is associated with a greater likelihood of survival. To determine whether the row and column categories for the table as a whole are independent of one another, we compute the Chi-square (χ²) statistic:

χ² = Σ [(O - E)² / E]

where O = the observed frequency and E = the expected frequency. As indicated in the formula, one first computes the difference between the observed and expected frequencies in each cell, squares this difference, and then divides the squared difference by that cell's expected frequency. These values are then summed across all cells (the Σ symbol), yielding the value of chi-square (χ²). In our example, χ² = 30.14. The resulting χ² statistic is then compared to a critical value that is based on the number of rows and columns and obtained from a Chi-square distribution table. If the computed χ² statistic is less than this critical value, we cannot reject the null hypothesis and must conclude that the variable categories are independent of each other, i.e., not associated. If, on the other hand, the computed χ² statistic exceeds the critical value, we reject the null hypothesis and conclude that the variable categories are indeed related.
* The actual formula for computing the expected count (E) in any contingency table cell is:

E = (row total x column total) / grand total

For the Male/Survives cell, E = (50 x 41) / 100 = 20.5.
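For readers who want to reproduce these figures, here is a minimal sketch in Python using SciPy (an added illustration, not part of the original text), applied to the hypothetical gender-by-outcome counts shown above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical gender x outcome counts from the table above
# rows: Male, Female; columns: Survives, Dies
observed = np.array([[34, 16],
                     [ 7, 43]])

# correction=False reproduces the plain chi-square formula shown above;
# by default SciPy applies Yates' continuity correction to 2 x 2 tables.
chi2, p, dof, expected = chi2_contingency(observed, correction=False)

print(expected[0, 0])   # expected Male/Survives count: 20.5
print(round(chi2, 2))   # chi-square statistic: about 30.14
print(p)                # p-value for the test of independence
```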

In our example, the critical value for χ² in this analysis is 3.84. Since our computed χ² of 30.14 clearly exceeds this critical value, we conclude that the variable categories are indeed related, i.e., that gender is associated with survival after chest trauma (hypothetical example). If the minimum expected count for any cell in a contingency table is less than 5, the resulting χ² statistic may not be accurate. In this case, an alternative is needed. The alternative to the χ² test in such situations is Fisher's Exact Test. Most authors recommend using Fisher's Exact Test instead of χ² whenever one or more of the expected counts in a table cell is less than 5 or when the row or column totals are very uneven. It is important to note that both χ² and Fisher's Exact Test are nondirectional (symmetrical) tests, i.e., they make no assumptions about directionality or cause and effect. If one wishes to assess a directional (cause-and-effect) relationship, other nonparametric tests would need to be considered.

Testing for the Strength of Categorical Relationships

χ² and Fisher's Exact Test only indicate whether or not there is a relationship between categorical variables. To assess the strength of such relationships we use correlation-like measures such as the contingency coefficient, the Phi coefficient, or Cramer's V. These coefficients can be thought of as Pearson product-moment correlations for categorical variables. However, unlike the Pearson r, which can assume negative values, these coefficients range only from 0 to +1 (you cannot have a 'negative' relationship between categorical variables). The contingency coefficient (CC) is computed as follows:

CC = √[χ² / (χ² + N)]

where χ² = the Chi-square value and N = the sample size. Unfortunately, the maximum value of the contingency coefficient varies with table size (being larger for larger tables). For this reason, it is difficult to compare the strength of association across tables of different sizes using this coefficient. The Phi coefficient (φ) is a measure of nominal association applicable only to 2 x 2 contingency tables. It is calculated using the following formula:

φ = √(χ² / N)

In our example, φ = √(30.14/100) ≈ 0.55, a moderately strong association. If we are conducting crosstabulation on contingency tables larger than 2 x 2, Cramer's V is the nominal association measure of choice. The formula for Cramer's V is:

V = √[χ² / (N(k - 1))]

where N is the total number of cases and k is the lesser of the number of rows or columns. Because in 2 x 2 tables k = 2 and k - 1 = 1, Cramer's V equals Phi for 2 x 2 analyses.
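As an added illustration (not part of the original text), a short Python/SciPy sketch of Fisher's exact test and of phi and Cramer's V computed from the uncorrected χ² statistic for the same hypothetical table:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

observed = np.array([[34, 16],
                     [ 7, 43]])
n = observed.sum()

# Fisher's exact test: the usual fallback when an expected cell count is < 5
odds_ratio, p_fisher = fisher_exact(observed)

# Phi and Cramer's V derived from the uncorrected chi-square statistic
chi2, _, _, _ = chi2_contingency(observed, correction=False)
phi = np.sqrt(chi2 / n)                    # about 0.55 for this 2 x 2 table
k = min(observed.shape)                    # lesser of the number of rows or columns
cramers_v = np.sqrt(chi2 / (n * (k - 1)))  # equals phi when k = 2
print(phi, cramers_v, p_fisher)
```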

Ordinal (Ranked) Data Analysis

Testing for the Strength of Ordinal (Ranked) Relationships

As with continuous and nominal data, measures exist to quantify the strength of association between variables measured at the ordinal level. The two most common ordinal measures of association are Spearman's rho (ρ) and Kendall's rank order correlation coefficient, or Kendall's tau (τ). Both Spearman's rho and Kendall's tau require that the two variables, X and Y, be paired observations, with both variables measured at least at the ordinal level.* Like the parametric Pearson product-moment correlation coefficient, both of these measures can range between -1.0 and +1.0, with a positive correlation indicating that the ranks increase together, while a negative correlation indicates that as the rank of one variable increases, the rank of the other decreases.

Spearman's rho. In principle, Spearman's rho is simply a special case of the Pearson product-moment coefficient in which the data are converted to ranks before the coefficient is calculated. In practice, however, a simpler procedure is normally used to calculate ρ. The raw scores are converted to ranks, and the differences (D) between the ranks of each observation on the two variables are calculated. ρ is then computed as:

ρ = 1 - (6ΣD²) / (N(N² - 1))

where D = the difference between the ranks of corresponding values of X and Y, and N = the number of pairs of values. As an example, suppose we rank a group of eight people by height and by weight (here person A is tallest and third-heaviest, and so on):

Case    Rank by Height    Rank by Weight
A             1                 3
B             2                 4
C             3                 1
D             4                 2
E             5                 5
F             6                 7
G             7                 8
H             8                 6

The differences between the ranks for the 8 subjects (height rank - weight rank) are: -2, -2, 2, 2, 0, -1, -1, 2. Squaring and then summing these values:
* If the data are at the interval or ratio level, nonparametric correlation tests like Spearman's rho and Kendall's tau simply replace these data with their ranks.

ΣD² = 4 + 4 + 4 + 4 + 0 + 1 + 1 + 4 = 22

Computing the denominator: N(N² - 1) = 8(8² - 1) = 8(64 - 1) = 504

And finally, Spearman's ρ = 1 - [(6 x 22)/504] = 1 - (132/504) = 0.738

Like a Pearson r, a Spearman's rho (ρ) of 0.738 would be considered a moderately strong positive correlation, in this case indicating that as a person's height rank increases, so too does their weight rank.

Kendall's tau. An alternative measure used to test the strength of a relationship between ordinal (ranked) variables is Kendall's rank order correlation coefficient, or Kendall's tau (τ). The main advantage of using Kendall's tau over Spearman's rho is that one can interpret its value as a direct measure of the probabilities of observing concordant and discordant pairs. As long as one of the variables is presorted by order, Kendall's tau can be computed using the following formula:

τ = [2P / (½ n(n - 1))] - 1

where P is the sum, over all the cases, of the number of cases ranked after the given case by both rankings, and n is the number of paired items. Using the same data we employed to compute Spearman's rho, we note that the paired observations are already sorted in order of height, so we will compute P from the weight ranks. In the weight column of the table, the first entry, 3, has five higher ranks after it, so its contribution to P is 5. Moving to the second entry, 4, we see that there are four higher ranks after it, so its contribution to P is 4. Continuing in this way, we find that

P = 5 + 4 + 5 + 4 + 3 + 1 + 0 + 0 = 22, and thus 2P = 2 x 22 = 44

Computing the denominator: ½ n(n - 1) = ½ x 8 x (8 - 1) = 4 x 7 = 28

And finally, computing Kendall's tau: τ = (44/28) - 1 = 1.571 - 1 = 0.571

Again, we see a positive correlation between the height and weight ranks, albeit somewhat weaker than that revealed by Spearman's rho.
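Both coefficients are also available directly in SciPy; a brief sketch (added here for illustration, not part of the original) applied to the same eight height and weight ranks:

```python
from scipy.stats import spearmanr, kendalltau

# Ranks by height and by weight for the eight subjects (A-H) in the example
height_rank = [1, 2, 3, 4, 5, 6, 7, 8]
weight_rank = [3, 4, 1, 2, 5, 7, 8, 6]

rho, p_rho = spearmanr(height_rank, weight_rank)   # rho is about 0.738
tau, p_tau = kendalltau(height_rank, weight_rank)  # tau is about 0.571
print(rho, tau)
```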

Testing for Group Differences on Ordinal (Ranked) Data

There are many times when researchers want to compare two or more groups on an outcome that is measured at the ordinal level (as opposed to interval or ratio data). Alternatively, interval- or ratio-level measurements on groups may be so skewed as to make regular parametric analysis inappropriate. In these cases, comparable nonparametric approaches to traditional t-testing or analysis of variance (ANOVA) are needed. The following table summarizes the nonparametric equivalents to traditional t-testing and ANOVA:
Purpose or Need                                                               Parametric Approach         Nonparametric Approach
To analyze differences between 2 independent groups                           Independent t-test          Mann-Whitney U test (aka Wilcoxon rank-sum test)
To analyze differences between 2 related groups (repeated measures)           Paired (dependent) t-test   Wilcoxon signed rank test for paired data
To analyze differences between 3 or more independent groups                   One-way ANOVA (F-test)      Kruskal-Wallis ANOVA
To analyze differences between 3 or more related groups (repeated measures)   Repeated measures ANOVA     Friedman two-way ANOVA

Adapted from: Dallal, G.E. Nonparametric statistics. In The Little Handbook of Statistical Practice, available at: http://www.tufts.edu/~gdallal/LHSP.HTM
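For readers working in Python, the SciPy counterparts of the tests in this table are shown in the brief sketch below (an added illustration; the sample arrays are invented solely to make the calls runnable):

```python
import numpy as np
from scipy import stats

# Invented data purely for illustration
rng = np.random.default_rng(0)
g1, g2, g3 = (rng.normal(loc, 1.0, 12) for loc in (0.0, 0.5, 1.0))            # independent groups
pre, post, follow_up = (rng.normal(loc, 1.0, 12) for loc in (0.0, 0.3, 0.6))  # repeated measures

stats.mannwhitneyu(g1, g2, alternative="two-sided")   # 2 independent groups
stats.wilcoxon(pre, post)                             # 2 related (paired) groups
stats.kruskal(g1, g2, g3)                             # 3+ independent groups
stats.friedmanchisquare(pre, post, follow_up)         # 3+ related groups (repeated measures)
```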

Comparing Two Groups by Ranks: the Mann-Whitney U Test. The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a nonparametric test used to determine whether two samples of ordinal/ranked data differ.* It is the nonparametric equivalent of an independent t-test comparing two groups on a normally distributed continuous variable. The Mann-Whitney U procedure ranks all the cases for the two groups together, from the lowest to the highest value. Then a mean rank, sum of ranks, and 'U' score are computed for each group. Two U scores are computed: U1 and U2. U1 is defined as the number of times that a score from group 1 is lower in rank than a score from group 2; likewise, U2 is defined as the number of times that a score from group 2 is lower in rank than a score from group 1. U1 and U2 are computed as follows:

U1 = n1n2 + [n1(n1 + 1)]/2 - R1
U2 = n1n2 + [n2(n2 + 1)]/2 - R2

where:
n1 = number of observations in group 1
n2 = number of observations in group 2
R1 = sum of ranks assigned to group 1
R2 = sum of ranks assigned to group 2
* If the data are at the interval or ratio level, nonparametric tests like the Mann-Whitney U simply replace these data with their ranks. However, if the sample data are continuous and normally distributed, then nonparametric tests like the Mann-Whitney U test should not be employed, since they are less powerful than their parametric equivalents and thus more likely to miss a true difference between groups.

The Mann-Whitney U statistic is defined as the smaller of U1 or U2. The Wilcoxon W statistic (Wilcoxon rank-sum test) is simply the smaller of the two groups' sums of ranks. Since the sampling distributions for both the U and W statistics approach that of a normal curve (as long as N > 20), we can use a simple Z-score to judge the significance of group differences in ranks. If the rank distributions are identical to one another, the Z-score will equal 0. Positive Z-scores indicate that the sum of the ranks of group 2 is greater than that of group 1, while negative Z-scores indicate the opposite, i.e., that the sum of the ranks of group 2 is less than that of group 1. At the customary 0.05 significance level, any Z-score whose absolute value exceeds 1.96 indicates a statistically significant difference in the distribution of ranks. Note that if the observations are paired rather than independent of each other (e.g., a pre/post measure conducted on the same subjects), then we use the Wilcoxon signed rank test for paired data (not to be confused with the Wilcoxon rank-sum test described above) instead of the Mann-Whitney U test.
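A short illustrative sketch (added here, using invented ordinal scores) of the Mann-Whitney U computation described above, both by the rank-sum formula and via SciPy:

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Invented ordinal scores (e.g., a 1-5 severity scale) for two independent groups
group1 = np.array([2, 3, 1, 4, 2, 3, 2, 5, 1, 3, 2])
group2 = np.array([4, 3, 5, 4, 2, 5, 4, 3, 5, 4, 3])

# Hand computation of U1 following the formula above
ranks = rankdata(np.concatenate([group1, group2]))   # rank all cases together (ties averaged)
r1 = ranks[:len(group1)].sum()                       # sum of ranks assigned to group 1
n1, n2 = len(group1), len(group2)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1

# SciPy's version of the test; its reported U statistic may follow a different
# convention for which group's U is returned, but the p-value is what matters.
u_stat, p_value = mannwhitneyu(group1, group2, alternative="two-sided")
print(u1, u_stat, p_value)
```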

Comparing More than Two Groups by Ranks: the Kruskal-Wallis Test. The Kruskal-Wallis test is a generalization of the Wilcoxon rank-sum test used to determine whether more than two groups of ordinal/ranked data differ. It is the nonparametric equivalent of a one-way ANOVA comparing multiple groups on a normally distributed continuous variable. The Kruskal-Wallis statistic, H, is computed as follows, with the result compared to a critical value from the Chi-square distribution:

H = [12 / (N(N + 1))] x Σ (Ri² / ni) - 3(N + 1)

where: k = number of samples (groups); ni = number of observations for the i-th sample or group; N = total number of observations (the sum of all the ni); Ri = sum of ranks for group i. The sum is taken over the k groups.

As an example, consider the following comparison of four diet plans (labeled A, B, C, and D) enrolling a total of 19 patients. The observations represent kilograms of weight lost over a 3-month period:


  A      B      C      D
 4.2    3.3    1.9    3.5
 4.6    2.4    2.4    3.1
 3.9    2.6    2.1    3.7
 4.0    3.8    2.7    4.1
        2.8    1.8    4.4

The first step in conducting a Kruskal-Wallis analysis is to rank order ALL the observations from lowest (1) to highest (19) and then sum the ranks for each plan:

                  A       B       C       D
                 17      10       2      11
                 19     4.5     4.5       9
                 14       6       3      12
                 15      13       7      16
                          8       1      18
Sum of Ranks     65    41.5    17.5      66
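The ranking step can be reproduced programmatically; the sketch below (an added illustration) uses SciPy's rankdata, whose default average-rank handling of ties produces the shared rank of 4.5 for the two 2.4 kg observations:

```python
import numpy as np
from scipy.stats import rankdata

plans = {
    "A": [4.2, 4.6, 3.9, 4.0],
    "B": [3.3, 2.4, 2.6, 3.8, 2.8],
    "C": [1.9, 2.4, 2.1, 2.7, 1.8],
    "D": [3.5, 3.1, 3.7, 4.1, 4.4],
}

all_values = np.concatenate(list(plans.values()))
ranks = rankdata(all_values)   # ties receive the average rank (the two 2.4s share rank 4.5)

# Split the ranks back out and sum them per plan: 65, 41.5, 17.5, 66
rank_sums, start = {}, 0
for name, values in plans.items():
    rank_sums[name] = ranks[start:start + len(values)].sum()
    start += len(values)
print(rank_sums)
```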

Based on the sum of ranks for each group, we apply the computation formula for H:

H = [12 / (19 x 20)] x (65²/4 + 41.5²/5 + 17.5²/5 + 66²/5) - 3 x 20 = 13.678

Last, using a Chi-square table, we determine that the critical value for three degrees of freedom (degrees of freedom = number of groups - 1) is 7.815. Since 13.678 is greater than this critical value, we reject the null hypothesis and conclude that the rankings of weight loss do differ among the four diet plans. Inspecting the sums of ranks suggests that plans A and D produce the greatest weight loss (and are nearly equivalent), whereas plan C ranks lowest. Note that if the observations are repeated more than once (e.g., pre-test, post-test, other follow-up), then we cannot use the Kruskal-Wallis test and instead must use a nonparametric alternative to the repeated-measures ANOVA, e.g., Friedman's two-way ANOVA.
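As a cross-check (an added illustration), SciPy's kruskal function applied to the raw weight-loss data gives essentially the same result; note that SciPy applies a small tie correction, so its H differs slightly from the hand-computed 13.678:

```python
from scipy.stats import kruskal

plan_a = [4.2, 4.6, 3.9, 4.0]
plan_b = [3.3, 2.4, 2.6, 3.8, 2.8]
plan_c = [1.9, 2.4, 2.1, 2.7, 1.8]
plan_d = [3.5, 3.1, 3.7, 4.1, 4.4]

h_stat, p_value = kruskal(plan_a, plan_b, plan_c, plan_d)
print(h_stat, p_value)   # H is about 13.69; p is about 0.003, well below 0.05
```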

Reference Bibliography

Agresti, A. (1996). Introduction to categorical data analysis. New York: Wiley.
Altman, D.G. (1991). Comparing groups - categorical data (Chapter 10). In Practical statistics for medical research. Boca Raton, FL: Chapman & Hall.
Becker, L.A. (1999a). Crosstabs: Measures for nominal data. University of Colorado at Colorado Springs. Retrieved January 9, 2004 from http://web.uccs.edu/lbecker/SPSS/ctabs1.htm
Becker, L.A. (1999b). Crosstabs: Measures for ordinal data. University of Colorado at Colorado Springs. Retrieved January 9, 2004 from http://web.uccs.edu/lbecker/SPSS80/ctabs2.htm
Becker, L.A. (1999c). Testing for differences between two groups: Nonparametric tests. University of Colorado at Colorado Springs. Retrieved January 9, 2004 from http://web.uccs.edu/lbecker/spss80/nonpar.htm
Connor-Linton, J. (2003). Chi-square tutorial. Georgetown University. Retrieved November 30, 2004 from http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
Conover, W.J. (1999). Practical nonparametric statistics (3rd ed.). New York: Wiley.
Daniel, W.W. (1990). Applied nonparametric statistics (2nd ed.). Boston: PWS-Kent.
Dallal, G.E. (2000). Nonparametric statistics. In The little handbook of statistical practice. Retrieved November 22, 2002 from http://www.tufts.edu/~gdallal/npar.htm
Dallal, G.E. (2000). Contingency tables. In The little handbook of statistical practice. Retrieved November 22, 2002 from http://www.tufts.edu/~gdallal/ctab.htm
Daniel, W.W. (2004). The Chi-square distribution and the analysis of frequencies (Chapter 12). In Biostatistics: A foundation for analysis in the health sciences (8th ed.). New York: Wiley.
Daniel, W.W. (2004). Nonparametric and distribution-free statistics (Chapter 13). In Biostatistics: A foundation for analysis in the health sciences (8th ed.). New York: Wiley.
deRoche, J. (2004). Measures of association. Cape Breton University. Retrieved May 7, 2004 from http://anthrosoc.capebretonu.ca/Measures%20of%20association.doc
Field, A. (2005). Categorical data (Chapter 16). In Discovering statistics using SPSS (2nd ed.). London: Sage Publications.
Friel, C.M. (2004a). Nonparametric tests. Sam Houston State University. Retrieved February 19, 2004 from http://www.shsu.edu/~icc_cmf/cj_685/mod9.doc
Friel, C.M. (2004b). Nonparametric correlation techniques. Sam Houston State University. Retrieved February 19, 2004 from http://www.shsu.edu/~icc_cmf/cj_685/mod12.doc
Garson, G.D. (1998a). Chi-square significance tests. In Statnotes: Topics in multivariate analysis. Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/chisq.htm

Garson, G.D. (1998b). Fisher exact test of significance. In Statnotes: Topics in multivariate analysis. Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/fisher.htm
Garson, G.D. (1998c). Nominal association: Phi, contingency coefficient, Tschuprow's T, Cramer's V, lambda, uncertainty coefficient. In Statnotes: Topics in multivariate analysis. Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/assocnominal.htm
Garson, G.D. (1998d). Ordinal association: Gamma, Kendall's tau-b and tau-c, Somers' d. In Statnotes: Topics in multivariate analysis. Retrieved February 26, 2004 from http://www2.chass.ncsu.edu/garson/pa765/assocordinal.htm
Garson, G.D. (1998e). Tests for two independent samples: Mann-Whitney U, Kolmogorov-Smirnov Z, & Moses extreme reactions tests. In Statnotes: Topics in multivariate analysis. Retrieved May 17, 2004 from http://www2.chass.ncsu.edu/garson/pa765/mann.htm
Gibbons, J. (1993). Nonparametric measures of association. Quantitative Applications in the Social Sciences series. Thousand Oaks, CA: Sage Publications.
Gibbons, J. (1992). Nonparametric statistics: An introduction. Quantitative Applications in the Social Sciences series. Thousand Oaks, CA: Sage Publications.
Gibbons, J.D., & Chakraborti, S. (1992). Nonparametric statistical inference (3rd ed.). New York: Marcel Dekker.
Gore, A.P., Deshpande, J.V., & Shanubhogue, A. (1993). Statistical analysis of non-normal data. New York: Wiley.
Hollander, M., & Wolfe, D.A. (1999). Nonparametric statistical methods (2nd ed.). New York: Wiley.
Lehmkuhl, L.D. (1996). Nonparametric statistics: Methods for analyzing data not meeting assumptions required for the application of parametric tests. Journal of Prosthetics and Orthotics, 8, 105-113.
Michael, R.S. (2001). Crosstabulation & Chi square. Indiana University. Retrieved November 30, 2004 from http://www.indiana.edu/~educy520/sec5982/week_12/chi_sq_summary011020.pdf
Noether, G.E. (1991). Introduction to statistics: The nonparametric way. New York: Springer-Verlag.
Norman, G.R., & Streiner, D.L. (2000). Tests of significance for categorical frequency data (Chapter 20). In Biostatistics: The bare essentials (2nd ed.). Hamilton, Ontario: B.C. Decker.
Norman, G.R., & Streiner, D.L. (2000). Measures of association for categorical data (Chapter 21). In Biostatistics: The bare essentials (2nd ed.). Hamilton, Ontario: B.C. Decker.
Norman, G.R., & Streiner, D.L. (2000). Tests of significance for ranked data (Chapter 22). In Biostatistics: The bare essentials (2nd ed.). Hamilton, Ontario: B.C. Decker.
Norman, G.R., & Streiner, D.L. (2000). Measures of association for ranked data (Chapter 23). In Biostatistics: The bare essentials (2nd ed.). Hamilton, Ontario: B.C. Decker.


Pett, M.A. (1997). Nonparametric statistics in health care research: Statistics for small samples and unusual distributions. Thousand Oaks, CA: Sage Publications.
Reynolds, H.T. (1984). Analysis of nominal data. Sage series on Quantitative Applications in the Social Sciences. Newbury Park, CA: Sage.
Siegel, S., & Castellan, N.J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.
StatSoft (2006). Nonparametric statistics. In Electronic statistics textbook. Tulsa, OK: StatSoft, Inc. Retrieved April 22, 2004 from http://www.statsoft.com/textbook/stnonpar.html
van Belle, G., Fisher, L.D., Heagerty, P.J., & Lumley, T.S. (2004). Categorical data: Contingency tables (Chapter 7). In Biostatistics: A methodology for the health sciences (2nd ed.). New York: Wiley.
van Belle, G., Fisher, L.D., Heagerty, P.J., & Lumley, T.S. (2004). Nonparametric, distribution-free and permutation models: Robust procedures (Chapter 8). In Biostatistics: A methodology for the health sciences (2nd ed.). New York: Wiley.
Weaver, B. (2002). Nonparametric tests (Chapter 3). Northern Ontario School of Medicine. Retrieved August 2, 2006 from http://www.angelfire.com/wv/bwhomedir/notes/nonpar.pdf
Weaver, B. (2005). Analysis of categorical data (Chapter 2). Northern Ontario School of Medicine. Retrieved August 2, 2006 from http://www.angelfire.com/wv/bwhomedir/notes/categorical.pdf
Williams, R. (2005). Categorical data analysis. University of Notre Dame, Department of Sociology. Retrieved March 25, 2007 from http://www.nd.edu/~rwilliam/stats1/x51.pdf

