The gamma coefficient (also called the gamma statistic, or Goodman and Kruskal’s gamma) tells us how closely two
pairs of data points “match”. Gamma tests for an association between points and also tells us the strength of
association. The goal of the test is to be able to predict where new values will rank. For example, if item A scores
“low” on question 1 and “high” on question 2, will item B also produce a low/high response?
Gamma can be calculated for ordinal (ordered) variables that are continuous variables (like height or weight)
or discrete variables (like “hot” “hotter” and “hottest”). While there are other coefficients that can calculate
relationships for these types of variables, like Somers’ D or Kendall’s Tau, Goodman and Kruskal’s gamma is
generally preferred when you have many tied ranks. It is also particularly useful when your data has outliers,
as they don’t affect the results much. For some fields of study it may be the preferred method for all ordinal data
arranged in a bivariate table. If you have two dichotomous variables (e.g. responses that are yes/no), use Yule’s
Q instead.
• 1 = perfect positive correlation: if one value goes up, so does the other.
• -1 = perfect inverse correlation: as one value goes up, the other goes down.
• 0 = there is no association between the variables.
The closer you get to 1 (or -1), the stronger the relationship. You can deduce the significance of your result by
running a significance test for gamma (see below). But how strong these relationships need to be depends upon which
field of study you’re working in. For example, a .75 might be “strong enough” in one field while another might require
over .8.
You can interpret gamma as the proportion of ranked pairs in agreement. For example, if gamma = +1, it means
that every single pair in your experiment is in agreement, or that every rater has agreed upon which order the items
should be ranked.
Gamma treats the variables symmetrically; you don’t have to hypothesize which might be dependent and which
might be independent variables.
The formula for the gamma coefficient is:
γ = (Nc – Nd) / (Nc + Nd)
Where:
• Nc is the total number of pairs that rank the same (concordant pairs)
• Nd is the number of pairs that don’t rank the same (discordant pairs).
Cells a and d (minimal time/bad scores and extensive time/good scores) are those that support your hypothesis
(i.e. they are concordant). Cells b and c go in the other direction; they count against your hypothesis and
are non-supporting (discordant).
To calculate the gamma coefficient:
1. Find the number of concordant pairs, Nc. Start with the upper left square and multiply by the sum of
all agreeing squares below and to the right (in this case, just d). Nc = 10 * 20 = 200.
2. Find the number of discordant pairs, Nd. This is calculated the same way; now, start with the upper
right square and multiply by the sum of all ‘non-supporting’ squares below and to the left. Do this again
for every non-supporting square, working down and left.
Nd = 5 * 6 = 30.
3. Insert the values from Steps 1 and 2 into the formula. The gamma statistic is:
(Nc – Nd) / (Nc + Nd) = (200 – 30) / (200 + 30) = 170 / 230 ≈ 0.7391.
Since the gamma coefficient is much closer to 1 (perfect correlation) than to 0 (no association), your data points to
a strong correlation and your hypothesis has a good chance of being correct.
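The two-step calculation above can be sketched in Python; the cell counts a = 10, b = 5, c = 6, d = 20 are taken from the worked example:

```python
# Gamma for the 2 x 2 example above; cell counts are from the worked example.
a, b, c, d = 10, 5, 6, 20

nc = a * d  # concordant pairs: upper-left cell times the agreeing cell below-right
nd = b * c  # discordant pairs: upper-right cell times the cell below-left

gamma = (nc - nd) / (nc + nd)
print(nc, nd, round(gamma, 4))  # 200 30 0.7391
```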
Example 2 (Complex)
This works with a more complicated table:
As before, find Nc by multiplying each cell by the sum of all agreeing cells below and to the right of it, then adding
those products together. Identifying the cells you’ll need in the equation is more easily explained in a step-by-step
example.
1. Start with the top left cell (9). The cells below and to the right of it are 9, 9, 9, and 9.
2. Move over to the next cell (7). The two cells below and to the right are 9 and 9.
3. Move over to the next cell (7). There are no cells to the right of this cell.
4. Move down to the next row. The first cell (9) has two cells below and to the right (9, 9).
5. Continue down the table until all the cells have been accounted for.
The summary of cells for this example is:
• 9 (9 + 9 + 9 + 9) = 324
• 7 (9 + 9) = 126
• 7 (0) = 0
• 9 (9 + 9) = 162
• 9 (9) = 81
• 9 (0) = 0
• 7 (0) = 0
• 9 (0) = 0
• 9 (0) = 0
Adding those all together, we have:
Nc = 324 + 162 + 81 + 126 = 693
Find Nd in the same way, only this time start in the top right and multiply each cell by the sum of all cells below and to the left.
• 7 (9 + 9 + 7 + 9) = 238
• 7 (9 + 7) = 112
• 9 (0) = 0
• 9 (9 + 7) = 144
• 9 (7) = 63
• 9 (0) = 0
• 9 (0) = 0
• 9 (0) = 0
• 7 (0) = 0
Adding those all together, we have:
Nd = 238 + 112 + 144 + 63 = 557
Which means gamma is: (693 – 557) / (693 + 557) = 136 / 1250 ≈ 0.11.
Since this is much closer to 0 than to 1, the association between the two variables is weak.
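Assuming Example 2’s table is the 3 x 3 grid whose entries appear in the partial products listed above, the counting rule generalizes to any bivariate table:

```python
def goodman_kruskal_gamma(table):
    """Goodman and Kruskal's gamma from a contingency table (list of rows),
    using the concordant/discordant counting rule described above."""
    rows, cols = len(table), len(table[0])
    nc = nd = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                # cells below and to the right are concordant partners
                nc += table[i][j] * sum(table[k][j + 1:])
                # cells below and to the left are discordant partners
                nd += table[i][j] * sum(table[k][:j])
    return nc, nd, (nc - nd) / (nc + nd)

# The 3 x 3 table reconstructed from Example 2's partial products
table = [[9, 7, 7],
         [9, 9, 9],
         [7, 9, 9]]
nc, nd, gamma = goodman_kruskal_gamma(table)
print(nc, nd, round(gamma, 4))  # 693 557 0.1088
```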
Kruskal Wallis H Test: Definition, Examples & Assumptions
The Kruskal Wallis test is the non-parametric alternative to the One Way ANOVA. Non-parametric means that the
test doesn’t assume your data comes from a particular distribution. The H test is used when the assumptions for
ANOVA aren’t met (like the assumption of normality). It is sometimes called the one-way ANOVA on ranks, as the
ranks of the data values are used in the test rather than the actual data points.
The test determines whether the medians of two or more groups are different. Like most statistical tests, you
calculate a test statistic and compare it to a distribution cut-off point. The test statistic used in this test is called the H
statistic. The hypotheses for the test are:
• H0: population medians are equal.
• H1: population medians are not equal.
The Kruskal Wallis test will tell you if there is a significant difference between groups. However, it won’t tell
you which groups are different. For that, you’ll need to run a Post Hoc test.
Examples
1. You want to find out how test anxiety affects actual test scores. The independent variable “test anxiety”
has three levels: no anxiety, low-medium anxiety and high anxiety. The dependent variable is the exam
score, rated from 0 to 100%.
2. You want to find out how socioeconomic status affects attitude towards sales tax increases. Your
independent variable is “socioeconomic status” with three levels: working class, middle class and
wealthy. The dependent variable is measured on a 5-point Likert scale from strongly agree to strongly
disagree.
The assumptions for the test are:
• One independent variable with two or more levels (independent groups). The test is more commonly
used when you have three or more levels. For two levels, consider using the Mann Whitney U
Test instead.
• Ordinal, interval or ratio scale dependent variables.
• Your observations should be independent. In other words, there should be no relationship between the
members in each group or between groups. For more information on this point, see: Assumption of
Independence.
• All groups should have the same shape distributions. Most software (e.g. SPSS, Minitab) will test for this
condition as part of the test.
Running the H Test
Example question: A shoe company wants to know if three groups of workers have different salaries:
Women: 23K, 41K, 54K, 66K, 90K.
Men: 45K, 55K, 60K, 70K, 72K
Minorities: 20K, 30K, 34K, 40K, 44K.
Step 1: Sort the data for all groups/samples into ascending order in one combined set.
• 20K
• 23K
• 30K
• 34K
• 40K
• 41K
• 44K
• 45K
• 54K
• 55K
• 60K
• 66K
• 70K
• 72K
• 90K
Step 2: Assign ranks to the sorted data points. Give tied values the average rank.
• 20K 1
• 23K 2
• 30K 3
• 34K 4
• 40K 5
• 41K 6
• 44K 7
• 45K 8
• 54K 9
• 55K 10
• 60K 11
• 66K 12
• 70K 13
• 72K 14
• 90K 15
Step 3: Add up the different ranks for each group/sample.
Women: 23K, 41K, 54K, 66K, 90K = 2 + 6 + 9 + 12 + 15 = 44.
Men: 45K, 55K, 60K, 70K, 72K = 8 + 10 + 11 + 13 + 14 = 56.
Minorities: 20K, 30K, 34K, 40K, 44K = 1 + 3 + 4 + 5 + 7 = 20.
Step 4: Calculate the H statistic:
H = [12 / (n(n + 1))] × Σ (Tj² / nj) – 3(n + 1)
Where:
• n = sum of sample sizes for all samples,
• c = number of samples,
• Tj = sum of ranks in the jth sample,
• nj = size of the jth sample.
H = [12 / (15 × 16)] × (44²/5 + 56²/5 + 20²/5) – 3(15 + 1) = 0.05 × 1094.4 – 48 = 6.72
Step 5: Find the critical chi-square value, with c-1 degrees of freedom. For 3 – 1 degrees of freedom and an alpha
level of .05, the critical chi square value is 5.9915.
Step 6: Compare the H value from Step 4 to the critical chi-square value from Step 5.
If the critical chi-square value is less than the H statistic, reject the null hypothesis that the medians are equal.
If the critical chi-square value is not less than the H statistic, there is not enough evidence to suggest that the
medians are unequal.
In this case, 5.9915 is less than 6.72, so you can reject the null hypothesis.
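Steps 1 through 4 can be checked with a short Python sketch, using the salary values from the example (in $K):

```python
# Kruskal-Wallis H for the salary example above.
women = [23, 41, 54, 66, 90]
men = [45, 55, 60, 70, 72]
minorities = [20, 30, 34, 40, 44]
groups = [women, men, minorities]

# Rank all values in one combined, sorted set. There are no ties here;
# tied values would receive the average of their ranks.
combined = sorted(v for g in groups for v in g)
rank = {v: i + 1 for i, v in enumerate(combined)}

n = len(combined)
rank_sums = [sum(rank[v] for v in g) for g in groups]

# H = [12 / (n(n + 1))] * sum(Tj^2 / nj) - 3(n + 1)
h = 12 / (n * (n + 1)) * sum(t ** 2 / len(g)
                             for t, g in zip(rank_sums, groups)) - 3 * (n + 1)
print(rank_sums, round(h, 2))  # [44, 56, 20] 6.72
```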
What is the Mann Kendall Trend Test?
The Mann Kendall Trend Test looks for trends in data. [Figure: Dow Jones timeplot from the Wall Street Journal.]
The Mann Kendall Trend Test (sometimes called the M-K test) is used to analyze data collected over time for
consistently increasing or decreasing trends (monotonic) in Y values. It is a non-parametric test, which means it
works for all distributions (i.e. your data doesn’t have to meet the assumption of normality), but your data should
have no serial correlation. If your data does follow a normal distribution, you can run simple linear
regression instead.
The test can be used to find trends for as few as four samples. However, with only a few data points, the test has a
high probability of missing a trend that would appear with more data. The more data points you have, the more
likely the test is to find a true trend (as opposed to one found by chance). A minimum of 8 to 10 measurements is
therefore recommended.
The test assumes that:
1. Your data isn’t collected seasonally (e.g. only during the summer and winter months), because the
test won’t work if alternating upward and downward trends exist in the data. Another test—
the Seasonal Kendall Test—is generally used for seasonally collected data.
2. Your data does not have any covariates—other factors that could influence your data other than the
ones you’re plotting. See Covariates for more information.
3. You have only one data point per time period. If you have multiple points, use the median value.
The Mann-Kendall Trend Test analyzes the signs of the differences between later and earlier data points. The idea
is that if a trend is present, these signs will tend to be consistently positive (or consistently negative). Every value is
compared to every value preceding it in the time series, which gives a total of n(n – 1) / 2 pairs of data, where “n” is
the number of observations in the set. For example, if you have 20 observations, the number of pairwise
comparisons is:
20(20 – 1) / 2 = 20(19)/2 = 380/2 = 190.
As you can probably tell, even a relatively small data set can result in a huge number of comparisons. Although it’s
possible to run the test by hand (you can find the steps in this Mann Kendall Analysis PDF), most people choose to
use software. You have many options, including:
• R: Install the Kendall package developed by A.I. McLeod with the following command:
install.packages("Kendall")
Full instructions can be found in this Word document: Mann-Kendall Trend Test in R
• Minitab: Download this macro from the Minitab site.
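For hand calculation, the sign-comparison step described above can be sketched in Python. This is only the counting step (the full test also needs the variance of S and a z-score), and the data series here is made up for illustration:

```python
def mann_kendall_s(y):
    """S statistic: sum of sign(y[j] - y[i]) over all later/earlier pairs.
    A large positive S suggests an upward trend, a large negative S a
    downward one; ties contribute zero."""
    s = 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference
    return s

# Hypothetical time series with a roughly increasing trend
y = [3, 4, 4, 6, 5, 7, 8, 8, 10]
n = len(y)
print(n * (n - 1) // 2, mann_kendall_s(y))  # 36 32
```

With 32 of a possible 36 pairwise comparisons positive, this series would point strongly toward an upward trend.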
What is a Mann Whitney U Test?
The Mann-Whitney U test is the nonparametric equivalent of the two sample t-test. While the t-test makes an
assumption about the distribution of the population (i.e. that the samples come from normally distributed
populations), the Mann Whitney U Test makes no such assumption.
Either of these two formulas is valid for the Mann Whitney U Test, where R is the sum of ranks in the sample and n
is the number of items in the sample:
U1 = n1n2 + [n1(n1 + 1)] / 2 – R1
U2 = n1n2 + [n2(n2 + 1)] / 2 – R2
1. Name the sample with the smaller ranks “sample 1” and the sample with the larger ranks “sample 2”.
Choosing the sample with the smaller ranks to be “sample 1” is optional, but it makes the computation
easier.
2. Take the first observation in sample 1. Count how many observations in sample 2 are smaller than it. If
the observations are equal, count it as one half. For example, if you have ten that are less and two that
are equal: 10 + 2(1/2) = 11.
3. Repeat Step 2 for all observations in sample 1.
4. Add up all of your totals from Steps 2 and 3. This is the U statistic.
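Steps 2 through 4 above translate directly into Python; the two samples here are hypothetical:

```python
def mann_whitney_u(sample1, sample2):
    """Direct-counting U: for each value in sample 1, count the sample-2
    values below it, with each tie counted as one half."""
    u = 0.0
    for x in sample1:
        for y in sample2:
            if y < x:
                u += 1
            elif y == x:
                u += 0.5
    return u

# Hypothetical data for illustration
s1 = [1, 4, 6, 7]
s2 = [2, 3, 5, 5, 8]
print(mann_whitney_u(s1, s2))  # 10.0
```

The same value falls out of the rank-sum formula: here R1 = 1 + 4 + 7 + 8 = 20, so U = (4)(5) + (4)(5)/2 – 20 = 10.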
The steps below describe an alternative, median-based procedure for comparing several samples (sometimes
called the median test).
Step 1: Make a 2 x k contingency table, where k is the number of samples. For this example, let’s say there are 3
samples, making a 2 x 3 table.
Step 2: Find M, the overall median for all the data in your samples. To do this, list all of your data (from all
samples) in a single set. Sort in ascending order and then find the middle number.
Step 3: List each individual sample’s data in ascending order. Count how many data points are greater than M (from
Step 2) and then count how many data points are smaller than or equal to M. List these in the first row of the
contingency table.
Step 4: Perform a chi-square test on the completed contingency table.
Step 5: Compare the chi-square statistic to the table value with: degrees of freedom = (number of rows – 1) *
(number of columns – 1). For this example, df = (2 – 1) * (3 – 1) = 1 * 2 = 2.
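The five steps above can be sketched in Python with a hand-rolled Pearson chi-square; the three samples are hypothetical:

```python
def median_test_chi2(samples):
    """Steps 1-4: find the overall median M, build the 2 x k table of
    counts above M vs. at-or-below M, and compute the chi-square statistic."""
    data = sorted(v for s in samples for v in s)
    n = len(data)
    m = data[n // 2] if n % 2 else (data[n // 2 - 1] + data[n // 2]) / 2

    # 2 x k table: row 0 = counts above M, row 1 = counts <= M
    table = [[sum(v > m for v in s) for s in samples],
             [sum(v <= m for v in s) for s in samples]]

    # Pearson chi-square: sum of (observed - expected)^2 / expected,
    # with expected = (row total * column total) / n
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(2) for j in range(len(samples)))
    return m, chi2

samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
m, chi2 = median_test_chi2(samples)
print(m, round(chi2, 2))  # 5 6.3
```

For k = 3 samples, df = (2 – 1) × (3 – 1) = 2, and the .05 critical value is 5.991; these hypothetical samples would therefore show a significant difference.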