
Gamma Coefficient (Goodman and Kruskal’s Gamma) & Yule’s Q

The gamma coefficient (also called the gamma statistic, or Goodman and Kruskal’s gamma) tells us how closely two
pairs of data points “match”. Gamma tests for an association between points and also tells us the strength of that
association. The goal of the test is to be able to predict where new values will rank. For example, if respondent A scores
“LOW” for question 1 and “HIGH” for question 2, will respondent B also give a LOW/HIGH response?
Gamma can be calculated for ordinal (ordered) variables, whether continuous (like height or weight)
or discrete (like “hot”, “hotter” and “hottest”). While there are other coefficients that can measure
relationships for these types of variables, like Somers’ D or Kendall’s Tau, Goodman and Kruskal’s gamma is
generally preferred when you have many tied ranks. It is also particularly useful when your data has outliers,
as they don’t affect the results much. For some fields of study it may be the preferred method for all ordinal data
arranged in a bivariate table. If you have two dichotomous variables (e.g. responses that are yes/no), use Yule’s
Q instead.

Range of the Gamma Coefficient


The gamma coefficient ranges between -1 and 1.

• 1 = perfect positive correlation: if one value goes up, so does the other.
• -1 = perfect inverse correlation: as one value goes up, the other goes down.
• 0 = there is no association between the variables

The closer you get to 1 (or -1), the stronger the relationship. You can check the significance of your result by
running a significance test for gamma (see below). But how strong the relationship needs to be depends upon which
field of study you’re working in. For example, a .75 might be “strong enough” in one field while another might require
over .8.
You can interpret gamma as the proportion of ranked pairs in agreement. For example, if gamma = +1, it means
that every single pair in your experiment is in agreement, or that every rater has agreed upon which order the items
should be ranked.
Gamma treats the variables symmetrically; you don’t have to hypothesize which might be dependent and which
might be independent variables.

Calculating the Gamma Coefficient


Goodman and Kruskal’s gamma uses the following formula:

G = (Nc – Nd) / (Nc + Nd)

Where:
• Nc is the number of concordant pairs (pairs that rank in the same order)
• Nd is the number of discordant pairs (pairs that rank in opposite orders).

Example 1 (Simple 2×2)


Suppose you are analyzing data on hours spent studying versus test scores. You might hypothesize that more
studying will lead to better grades, and you collect data that might show that. To make it simple, you’ll define
minimal studying as less than one hour a week, and extensive studying as anything more than that. You’ll also define
good grades as A and A+, with bad grades being anything below. You can tabulate your hypothetical data like this:

                  Bad scores   Good scores
Minimal time      a = 10       b = 5
Extensive time    c = 6        d = 20

Cells a and d (minimal time/bad scores and extensive time/good scores) are those that support your hypothesis
(i.e. they are concordant). Cells b and c go the other direction; they run against your hypothesis and
are non-supporting (discordant).
To calculate the gamma coefficient:
1. Find the number of concordant pairs, Nc. Start with the upper left square and multiply by the sum of
all agreeing squares below and to the right (in this case, just d). Nc = 10 * 20 = 200.
2. Find the number of discordant pairs, Nd. This is calculated the same way; now, start with the upper
right square and multiply by the sum of all ‘non-supporting’ squares below and to the left. Do this again
for every non-supporting square, working down and left.
Nd = 5 * 6 = 30.
3. Insert the values from Steps 1 and 2 into the formula:
The gamma statistic is:
(Nc – Nd) / (Nc + Nd) = (200 – 30) / (200 + 30) = 170 / 230 ≈ 0.7391.

Since the gamma coefficient is much closer to 1 (perfect correlation) than to 0 (no association), your data points to
a strong correlation and your hypothesis has a good chance of being correct.
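The three steps above are easy to check in a few lines of code. A minimal sketch, using the cell labels and counts from the 2×2 table in this example:

```python
# Cell counts from the hypothetical 2x2 study-time table above.
a, b = 10, 5   # minimal study: bad scores (a), good scores (b)
c, d = 6, 20   # extensive study: bad scores (c), good scores (d)

nc = a * d  # concordant pairs: cells that support the hypothesis
nd = b * c  # discordant pairs: cells that run against it

gamma = (nc - nd) / (nc + nd)
print(nc, nd, round(gamma, 4))  # 200 30 0.7391
```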

Example 2 (Complex)
This works with a more complicated table. Suppose your counts are arranged in the following 3 × 3 table:

9   7   7
9   9   9
7   9   9

As before, find Nc by multiplying each cell by the sum of the cells below and to the right of it, then adding
those products together. Identifying the cells you’ll need in the equation is more easily explained in a
step-by-step example.
1. Start with the top left cell (9). The cells below and to the right of it are 9,9,9, and 9.

2. Move over to the next cell (7). The two cells below and to the right are 9 and 9.

3. Move over to the next cell (7). There are no cells to the right of this cell.
4. Move down to the next row. The first cell (9) has two cells below and to the right (9, 9).

5. Continue down the table until all the cells have been accounted for.
The summary of cells for this example is:

• 9 (9 + 9 + 9 + 9) = 324
• 7 (9 + 9) = 126
• 7 (0) = 0
• 9 (9 + 9) = 162
• 9 (9) = 81
• 9 (0) = 0
• 7 (0) = 0
• 9 (0) = 0
• 9 (0) = 0
Adding those all together, we have:
Nc = 324 + 126 + 162 + 81 = 693
Find Nd in the same way, only this time start in the top right and count all of the cells below and to the left.
• 7 (9 + 9 + 7 + 9) = 238
• 7 (9 + 7) = 112
• 9 (0) = 0
• 9 (9 + 7) = 144
• 9 (7) = 63
• 9 (0) = 0
• 9 (0) = 0
• 9 (0) = 0
• 7 (0) = 0
Adding those all together, we have:
Nd = 238 + 112 + 144 + 63 = 557
Which means gamma is: (693 – 557) / (693 + 557) = 136 / 1250 ≈ 0.11

There is practically no association (0.11 is close to zero).
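The “below and to the right” / “below and to the left” counting rule generalizes to any bivariate table of ordinal counts. A sketch using the 3 × 3 counts worked through above (the function name is made up for illustration):

```python
def goodman_kruskal_gamma(table):
    """Compute Nc, Nd and gamma for a table of ordinal counts."""
    rows, cols = len(table), len(table[0])
    nc = nd = 0
    for i in range(rows):
        for j in range(cols):
            for p in range(i + 1, rows):      # rows below the current cell
                for q in range(cols):
                    if q > j:                 # below and to the right: concordant
                        nc += table[i][j] * table[p][q]
                    elif q < j:               # below and to the left: discordant
                        nd += table[i][j] * table[p][q]
    return nc, nd, (nc - nd) / (nc + nd)

# The 3 x 3 counts from Example 2.
nc, nd, g = goodman_kruskal_gamma([[9, 7, 7], [9, 9, 9], [7, 9, 9]])
print(nc, nd, round(g, 2))  # 693 557 0.11
```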

Testing for Significance


The gamma test for significance works like most other hypothesis tests: find a test statistic and compare it to a table
value. I skim over the steps here, so if you’ve never performed a hypothesis test before you may want to read this
article: What is Hypothesis Testing?
The formula for the test statistic is:

z = G √( (Nc + Nd) / (N(1 – G²)) )

where N is the total number of observations. Inserting the values from Example 2 above
(G ≈ 0.109, Nc + Nd = 1250, N = 75), we have:

z ≈ 0.45

Since 0.45 is well below the usual critical z value of 1.96 (α = .05, two-tailed), the association is not
statistically significant.
Kruskal Wallis H Test: Definition, Examples & Assumptions

What is the Kruskal Wallis Test?

The Kruskal Wallis H test uses ranks instead of actual data.

The Kruskal Wallis test is the non parametric alternative to the One Way ANOVA. Non parametric means that the
test doesn’t assume your data comes from a particular distribution. The H test is used when the assumptions for
ANOVA aren’t met (like the assumption of normality). It is sometimes called the one-way ANOVA on ranks, as the
ranks of the data values are used in the test rather than the actual data points.
The test determines whether the medians of two or more groups are different. Like most statistical tests, you
calculate a test statistic and compare it to a distribution cut-off point. The test statistic used in this test is called the H
statistic. The hypotheses for the test are:
• H0: population medians are equal.
• H1: population medians are not equal.
The Kruskal Wallis test will tell you if there is a significant difference between groups. However, it won’t tell
you which groups are different. For that, you’ll need to run a Post Hoc test.

Examples
1. You want to find out how test anxiety affects actual test scores. The independent variable “test anxiety”
has three levels: no anxiety, low-medium anxiety and high anxiety. The dependent variable is the exam
score, rated from 0 to 100%.
2. You want to find out how socioeconomic status affects attitude towards sales tax increases. Your
independent variable is “socioeconomic status” with three levels: working class, middle class and
wealthy. The dependent variable is measured on a 5-point Likert scale from strongly agree to strongly
disagree.

Assumptions for the Kruskal Wallis Test


Your variables should have:

• One independent variable with two or more levels (independent groups). The test is more commonly
used when you have three or more levels. For two levels, consider using the Mann Whitney U
Test instead.
• Ordinal scale, Ratio Scale or Interval scale dependent variables.
• Your observations should be independent. In other words, there should be no relationship between the
members in each group or between groups. For more information on this point, see: Assumption of
Independence.
• All groups should have the same shape distributions. Most software (i.e. SPSS, Minitab) will test for this
condition as part of the test.
Running the H Test
Example question: A shoe company wants to know if three groups of workers have different salaries:
Women: 23K, 41K, 54K, 66K, 90K.
Men: 45K, 55K, 60K, 70K, 72K.
Minorities: 20K, 30K, 34K, 40K, 44K.
Step 1: Sort the data for all groups/samples into ascending order in one combined set.
• 20K
• 23K
• 30K
• 34K
• 40K
• 41K
• 44K
• 45K
• 54K
• 55K
• 60K
• 66K
• 70K
• 72K
• 90K
Step 2: Assign ranks to the sorted data points. Give tied values the average rank.
• 20K 1
• 23K 2
• 30K 3
• 34K 4
• 40K 5
• 41K 6
• 44K 7
• 45K 8
• 54K 9
• 55K 10
• 60K 11
• 66K 12
• 70K 13
• 72K 14
• 90K 15
Step 3: Add up the different ranks for each group/sample.
Women: 23K, 41K, 54K, 66K, 90K = 2 + 6 + 9 + 12 + 15 = 44.
Men: 45K, 55K, 60K, 70K, 72K = 8 + 10 + 11 + 13 + 14 = 56.
Minorities: 20K, 30K, 34K, 40K, 44K = 1 + 3 + 4 + 5 + 7 = 20.
Step 4: Calculate the H statistic:

H = [12 / (n(n + 1))] Σ (Tj² / nj) – 3(n + 1)

Where:
• n = sum of sample sizes for all samples,
• c = number of samples,
• Tj = sum of ranks in the jth sample,
• nj = size of the jth sample.
For this example:
H = [12 / (15 × 16)] × (44²/5 + 56²/5 + 20²/5) – 3 × 16 = 54.72 – 48 = 6.72

Step 5: Find the critical chi-square value, with c – 1 degrees of freedom. For 3 – 1 = 2 degrees of freedom and an alpha
level of .05, the critical chi-square value is 5.9915.
Step 6: Compare the H value from Step 4 to the critical chi-square value from Step 5.
If the critical chi-square value is less than the H statistic, reject the null hypothesis that the medians are equal.
If the chi-square value is not less than the H statistic, there is not enough evidence to suggest that the medians are
unequal.
In this case, 5.9915 is less than 6.72, so you can reject the null hypothesis.
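Steps 1 through 4 above can be sketched in code. This uses the salary figures from the worked example and assumes no tied values (with ties, each tied value would get the average of the ranks involved):

```python
# Salary data (in K) from the worked example above.
groups = {
    "women":      [23, 41, 54, 66, 90],
    "men":        [45, 55, 60, 70, 72],
    "minorities": [20, 30, 34, 40, 44],
}

# Steps 1-2: pool and sort all values, then rank them (no ties here).
pooled = sorted(v for vals in groups.values() for v in vals)
rank = {v: i + 1 for i, v in enumerate(pooled)}

# Step 3: sum of ranks for each group.
T = {name: sum(rank[v] for v in vals) for name, vals in groups.items()}

# Step 4: H = 12 / (n(n+1)) * sum(Tj^2 / nj) - 3(n+1)
n = len(pooled)
H = 12 / (n * (n + 1)) * sum(t ** 2 / len(groups[name]) for name, t in T.items()) - 3 * (n + 1)
print(T, round(H, 2))  # rank sums 44, 56, 20; H = 6.72
```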
What is the Mann Kendall Trend Test?

The Mann Kendall Trend Test looks for trends in data.

The Mann Kendall Trend Test (sometimes called the M-K test) is used to analyze data collected over time for
consistently increasing or decreasing trends (monotonic) in Y values. It is a non-parametric test, which means it
works for all distributions (i.e. your data doesn’t have to meet the assumption of normality), but your data should
have no serial correlation. If your data does follow a normal distribution, you can run simple linear
regression instead.
The test can be used to find trends for as few as four samples. However, with only a few data points, the test has a
high probability of missing a trend that would be detected with more data. The more data points you have, the more
likely the test is to find a true trend (as opposed to one occurring by chance). The recommended minimum is therefore
at least 8 to 10 measurements.

How the Test Works


• The null hypothesis for this test is that there is no monotonic trend in the series.
• The alternate hypothesis is that a monotonic trend exists; the trend can be either upward or downward.
Before running the test, you should ensure that:

1. Your data isn’t collected seasonally (e.g. only during the summer and winter months), because the
test won’t work if alternating upward and downward trends exist in the data. Another test—
the Seasonal Kendall Test—is generally used for seasonally collected data.
2. Your data does not have any covariates—other factors that could influence your data other than the
ones you’re plotting. See Covariates for more information.
3. You have only one data point per time period. If you have multiple points, use the median value.
The Mann-Kendall Trend Test analyzes the sign of the difference between later and earlier data points. The idea is
that if a trend is present, these signs will tend to be consistently positive (or consistently negative). Every value is
compared to every value preceding it in the time series, which gives a total of n(n – 1) / 2 pairs of data, where “n” is
the number of observations in the set. For example, if you have 20 observations, the number of pairwise
comparisons is:
20(20 – 1) / 2 = 20(19)/2 = 380/2 = 190.
As you can probably tell, even a relatively small data set can result in a huge number of comparisons. Although it’s
possible to run the test by hand (you can find the steps in this Mann Kendall Analysis PDF), most people choose to
use software. You have many options, including:
• R: Install the Kendall package developed by A.I. McLeod with the following command:
install.packages("Kendall")
Full instructions can be found in this Word document: Mann-Kendall Trend Test in R
• Minitab: Download this macro from the Minitab site.
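If you would rather see the mechanics than use a package, the pairwise sign comparison described above can be sketched as follows. This is a minimal illustration computing only the S statistic (the sum of signs); a full M-K test also needs the variance of S and a p-value, and the series below is made up:

```python
def mann_kendall_s(series):
    """Sum of signs over all n(n-1)/2 'later minus earlier' pairs."""
    s = 0
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):           # every later point vs. this one
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)    # sign: +1, -1, or 0 for ties
    return s

# A mostly-rising made-up series: S comes out strongly positive.
print(mann_kendall_s([3, 5, 4, 6, 8, 7, 9]))  # 17 (out of 21 pairs)
```

A consistently increasing series of length n gives the maximum S of n(n – 1)/2, and a consistently decreasing one gives the minimum, the negative of that.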
What is a Mann Whitney U Test?
The Mann-Whitney U test is the nonparametric equivalent of the two sample t-test. While the t-test makes an
assumption about the distribution of a population (i.e. that the samples came from normally distributed populations),
the Mann Whitney U Test makes no such assumption.

Null Hypothesis for the Test


The test compares two populations. The null hypothesis for the test is that the probability is 50% that a randomly
drawn member of the first population will exceed a member of the second population.
Another option for the null hypothesis is that the two samples come from the same population (i.e. that they both
have the same median).
The result of performing a Mann Whitney U Test is a U statistic. For small samples, use the direct method (see below)
to find the U statistic; for larger samples, a formula is necessary. Or, you can use technology like SPSS to run the test.

The two (equivalent) formulas are:

U1 = n1n2 + n1(n1 + 1)/2 – R1
U2 = n1n2 + n2(n2 + 1)/2 – R2

Either of these two formulas is valid for the Mann Whitney U Test. R is the sum of ranks in the sample, and n is the
number of items in the sample.

Mann Whitney U Test Direct Method


This method is limited only by how much computation you want to perform. The larger the sample, the more complex
the math:

1. Name the sample with the smaller ranks “sample 1” and the sample with the larger ranks “sample 2”.
Choosing the sample with the smaller ranks to be “sample 1” is optional, but it makes the computation
easier.
2. Take the first observation in sample 1. Count how many observations in sample 2 are smaller than it. If
the observations are equal, count it as one half. For example, if you have ten that are less and two that
are equal: 10 + 2(1/2) = 11.
3. Repeat Step 2 for all observations in sample 1.
4. Add up all of your totals from Steps 2 and 3. This is the U statistic.
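The direct method above can be sketched in a few lines; the two samples below are hypothetical:

```python
def mann_whitney_u(sample1, sample2):
    """Direct method: for each value in sample1, count the sample2
    values below it, counting ties as one half each."""
    u = 0.0
    for x in sample1:
        u += sum(1 for y in sample2 if y < x)         # strictly smaller
        u += 0.5 * sum(1 for y in sample2 if y == x)  # ties count as 1/2
    return u

s1 = [3, 4, 2, 6, 2, 5]   # sample with the smaller ranks
s2 = [9, 7, 5, 10, 6, 8]  # sample with the larger ranks
print(mann_whitney_u(s1, s2))  # 2.0
```

Note that running the function with the samples swapped gives the other U statistic, and the two always sum to n1 × n2 (here 36).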

Assumptions for the Mann Whitney U Test


• The dependent variable should be measured on an ordinal scale or a continuous scale.
• The independent variable should be two independent, categorical groups.
• Observations should be independent. In other words, there should be no relationship between the two
groups or within each group.
• Observations are not normally distributed. However, they should follow the same shape (e.g. both
are bell-shaped, or both are skewed left).
What is Mood’s Median Test?
Mood’s median test is used to compare the medians of two or more samples to find out if they are different. For example,
you might want to compare the median number of positive calls to a hotline vs. the median number
of negative comment calls to find out if you’re getting significantly more negative comments than positive comments
(or vice versa).
This test is the nonparametric alternative to a one way ANOVA; Nonparametric means that you don’t have to know
what distribution your sample came from (i.e. a normal distribution) before running the test. That said, your samples
should have been drawn from distributions with the same shape. This test has very low statistical power for
samples drawn from normal distributions or short-tailed distributions.
Use this test instead of the sign test when you have two independent samples.
• The null hypothesis for this test is that the medians are the same for both groups.
• The alternate hypothesis for the test is that the medians are different for both groups.

Running the Test


This test is normally run using statistical software. However, you can run the test by hand using the following steps:

Step 1: Make a 2 x k contingency table, where k is the number of samples. For this example, let’s say there are 3
samples, making a 2 x 3 table. The first row will hold, for each sample, the count of data points above the overall
median; the second row will hold the counts at or below it.

Step 2: Find M, the overall median for all the data in your samples. To do this, list all of your data (from all
samples) in a single set. Sort in ascending order and then find the middle number.

Step 3: List each individual sample’s data in ascending order. Count how many data points are greater than M (from
Step 2) and list these in the first row of the contingency table; count how many data points are smaller than or equal
to M and list these in the second row.
Step 4: Perform a chi-square test on the completed contingency table.
Step 5: Compare the chi-square statistic to the table value with: degrees of freedom = (number of rows – 1) *
(number of columns – 1). For this example, df = (2 – 1) * (3 – 1) = 1 * 2 = 2.
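Steps 1 through 5 can be sketched for three hypothetical samples. The data below is made up for illustration, and the chi-square statistic is computed by hand rather than with a library:

```python
# Three made-up samples of equal size.
samples = [
    [10, 14, 14, 15, 18],
    [12, 13, 16, 17, 19],
    [11, 12, 13, 14, 20],
]

# Step 2: overall median M of the pooled, sorted data (n is odd here).
pooled = sorted(x for s in samples for x in s)
M = pooled[len(pooled) // 2]

# Step 3: build the 2 x k contingency table.
above = [sum(1 for x in s if x > M) for s in samples]   # first row
below = [sum(1 for x in s if x <= M) for s in samples]  # second row

# Step 4: Pearson chi-square statistic on the table.
n = len(pooled)
chi2 = 0.0
for row in (above, below):
    row_total = sum(row)
    for j, observed in enumerate(row):
        expected = row_total * len(samples[j]) / n
        chi2 += (observed - expected) ** 2 / expected

# Step 5: compare chi2 to the critical value with (2 - 1)(k - 1) = 2 df.
print(M, above, below, round(chi2, 2))  # 14 [2, 3, 1] [3, 2, 4] 1.67
```

Here chi2 ≈ 1.67 is below the critical value for 2 degrees of freedom at α = .05 (5.9915), so these made-up samples give no evidence of different medians.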

Results for The Test


Like most hypothesis tests, your results will include a p-value, which you compare to a chosen alpha level (usually 5% or 0.05).
If your p-value is less than or equal to alpha, you can reject the null hypothesis that the medians are the same.
