Contingency Tables
After examining the univariate frequency distribution of the values of each variable separately,
the researcher is often interested in the joint occurrence and distribution of the values of the
independent and dependent variable together. The joint distribution of two variables is called a
bivariate distribution.
A contingency table shows the frequency distribution of the values of the dependent variable,
given the occurrence of the values of the independent variable. Both variables must be grouped
into a finite number of categories (usually no more than 2 or 3 categories) such as low, medium,
or high; positive, neutral, or negative; male or female; etc.
To construct a contingency table:
1) obtain a frequency distribution for the values of the independent variable; if the variable is
not divided into categories, decide on how to group the data.
2) obtain a frequency distribution for the values of the dependent variable; if the variable is not
divided into categories, decide on how to group the data.
3) obtain the frequency distribution of the values of the dependent variable, given the values of
the independent variable (either by tabulating the raw data or from a computer program).
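The three steps above can be sketched in Python. The respondent records below are purely hypothetical, used only to show the mechanics of tabulation:

```python
from collections import Counter

# Hypothetical raw data: one (independent, dependent) value pair per respondent.
observations = [
    ("Inside City Limits", "For"), ("Inside City Limits", "Against"),
    ("Outside City Limits", "Against"), ("Inside City Limits", "For"),
    ("Outside City Limits", "For"), ("Outside City Limits", "Against"),
]

# Steps 1 and 2: univariate frequency distributions for each variable.
independent_freq = Counter(iv for iv, dv in observations)
dependent_freq = Counter(dv for iv, dv in observations)

# Step 3: the joint (bivariate) distribution -- a count for every
# (independent, dependent) combination, i.e. every cell of the table.
joint_freq = Counter(observations)
```

In practice the same tabulation is usually done by a statistics package; the point is that each cell of a contingency table is simply the count of one joint combination of categories.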
Example:
Independent Variable: Place of Residence
Categories: Inside City Limits=505
Outside City Limits=145
Joint Distribution:
1. Title
4. Order categories of the two variables from lowest to highest (from left to right across the
columns; from top to bottom along the rows).
1) Inspect the contingency table for patterns. This may be difficult if there are different totals of
observations in the different categories of the independent variable.
2) Convert the observations in each cell to a percentage of the column total; be sure to still show
the total number of observations for each column on which the percentages are based.
3) Compare the percentages across the categories of the dependent variable (the rows).
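A minimal sketch of the percentaging procedure. The cell counts are illustrative, chosen only to be consistent with the column totals given above (505 inside, 145 outside) and the percentages reported later:

```python
# Illustrative contingency table: rows = attitude (dependent variable),
# columns = area of residence (independent variable); cell values are counts.
table = {
    "For":     {"Inside City": 273, "Outside City": 54},
    "Against": {"Inside City": 232, "Outside City": 91},
}

columns = ["Inside City", "Outside City"]

# Column totals -- the base on which each percentage is computed (step 2
# says these should still be shown in the finished table).
column_totals = {c: sum(table[row][c] for row in table) for c in columns}

# Step 2: convert each cell to a percentage of its column total.
percent_table = {
    row: {c: 100 * table[row][c] / column_totals[c] for c in columns}
    for row in table
}

# Step 3: compare percentages across each row, e.g. percent_table["For"].
```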
Example:
Table 1. Attitudes toward Consolidation by Area of Residence
The percentage distribution can suggest the strength of a relationship, but interpretation is up
to each individual researcher. There is no minimum percentage difference that must be reached
to indicate a strong or weak relationship between the two variables.
Does this mean that there is a relationship between the two variables, area of residence and
attitude toward consolidation? Is one's attitude toward consolidation associated with one's area
of residence?
If there is a relationship, how strong is it? Are the results statistically significant? Are the
results meaningfully significant? In order to answer these questions, we must turn to a set of
statistics called Measures of Association.
What is an Association?
Can the value of one variable be predicted, if we know the value of the other variable?
For example, say half the people participating in training programs get a job. What is the
likelihood of any one participant getting a job? About fifty-fifty. So we would not be very good
at predicting whether people will get jobs or not.
But if we introduce a second variable (the independent variable), does it help us to be more
accurate in our predictions of the likelihood that someone will get a job?
Suppose first that, by knowing the length of the training program, we can perfectly predict the
likelihood of getting a job. The longer the training program, the more likely the participant is to
get a job and, conversely, the shorter the training program the less likely the participant is to get
a job. That is, as the training program length increases, so does the likelihood of obtaining a job.
The value of the measure of association would be +1.0.
Now suppose instead that knowing the length of the training program again lets us perfectly
predict the likelihood of getting a job, but in the opposite direction. The longer the training
program, the less likely the participant is to get a job and, conversely, the shorter the training
program the more likely the participant is to get a job. That is, as the training program length
increases, the likelihood of obtaining a job decreases. The value of the measure of association
would be -1.0.
Finally, suppose that the likelihood of getting a job is the same no matter how long the training
program is. Here we are back to a 50/50 guess. Knowing the length of the training program does
not help in any way to predict the likelihood of getting a job. The value of the measure of
association would be 0.0.
Measures of Association
Measures of Association are statistics that provide a standard against which to judge the
relationship between the variables observed in contingency tables. They can indicate the strength
of a relationship between two variables measured on a nominal or ordinal scale. For the latter,
they can also indicate the direction of the relationship (positive or negative).
Measures of Association are descriptive statistics, so they can be used with samples which
were not selected using a strict random sampling method. But they do not allow the researcher to
infer whether the relationship observed in the sample is true of the general population.
Measures of Association do not indicate causality, but association--that is, whether one's score
on one variable tends to be associated with one's score on another variable. The value of the
measure of association statistic also indicates the strength of the relationship, whether weak,
moderate, or strong.
Measures of Association for variables measured at the nominal level generally vary from a
low of 0.0 to a high of +1.0. Lower values indicate weaker associations, and higher values
indicate stronger associations.
In addition, for variables measured at the ordinal level, Measures of Association vary from a
low of 0.0, indicating the weakest level of association, to a high of either +1.0 or -1.0, which
indicate the strongest level of association.
A value on the statistic between 0.0 and +1.0 indicates a positive (or direct) relationship. That
is, as the value of one variable increases the value of the other variable also increases. For
example, as the number of hours spent studying increases, the student's grade on the test also
increases. And conversely, as the number of hours spent studying decreases, the student's grade
on the test also decreases.
A value on the statistic between 0.0 and -1.0 indicates a negative (or indirect) relationship.
That is, as the value of one variable increases the value of the other variable decreases. For
example, as the number of librarians on duty increases, the number of patron complaints
decreases. And conversely, as the number of librarians on duty decreases, the number of
patron complaints increases.
4) the researcher is familiar with the statistic and knows how to interpret it
5) look at what has been done in the past with research on this type of variable
Note that some statistics take on different values, depending on which of the two variables is
the independent variable and which is the dependent variable. These are called asymmetric
measures of association. Symmetric measures of association take on the same value, no matter
which variable is the independent variable and which is the dependent variable.
Note that the value of one statistic, such as gamma, cannot be directly compared with the
value of another statistic, such as tau. Each statistic has its own standard, and the value of the
statistic obtained by the researcher must be compared with the standard for that statistic.
If the values of a number of statistics are obtained, and they all indicate a strong relationship
between two variables, the researcher may take that as additional support for the existence of a
relationship. However, if the values of a number of statistics are contradictory, with some
indicating a strong relationship and others a weak relationship, the researcher must look more
closely at the data. For example, there may be a non-linear relationship between the two
variables.
Note that some measures of association are not useful when there is a non-linear relationship
between the two variables. This can occur when there are three or more categories of values for
the independent variable, and the values of the dependent variable do vary but not in a strictly
linear fashion.
Lambda is a measure of association that measures the Proportional Reduction in Error (PRE)
obtained when the researcher uses the value of the independent variable to predict the value of
the dependent variable.
If the researcher only has the value of the dependent variable, the researcher will make a
number of errors trying to predict the values of the dependent variable for new observations. The
amount of error made in trying to predict the dependent variable alone is called original error.
For example, say you asked the people in your organization to rate the personnel department.
You know that the univariate distribution for this variable looks like this:
Rating of Personnel Department Frequency
Poor 38
Satisfactory 32
Good 25
Total 95
Let's say you want to guess what the rating of another 95 people would be. Your best guess
would be to pick the modal category, which is "Poor." That is, more people picked "Poor" than
any other category. If you consistently pick "Poor," you will make the fewest number of wrong
guesses. Original error=38 right and 57 wrong (out of 95 total guesses).
Now, let's say that you are given one additional piece of information. You now know what the
ratings of the personnel department are by the people who work in one of four departments:
police, fire, public works, and planning.
Now, if you had to guess the personnel department rating, you could qualify your best guess
by knowing the department of employment. For each department, you would guess the modal
category.
To calculate Lambda, subtract the number of new errors from the number of original errors
and divide by the number of original errors. In this case, [(57-42)/57]=.263
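The whole lambda computation can be sketched in Python. The departmental cell counts below are hypothetical, chosen only so that the marginals give N=95 with 38 observations in the modal "Poor" row (so original error=57) and a new error of 42, matching the arithmetic above:

```python
# Hypothetical crosstab: rows = rating (dependent variable), columns =
# department of employment (independent variable). Counts are illustrative.
table = {
    "Poor":         {"Police": 19, "Fire": 7,  "Public Works": 6,  "Planning": 6},
    "Satisfactory": {"Police": 4,  "Fire": 15, "Public Works": 7,  "Planning": 6},
    "Good":         {"Police": 3,  "Fire": 3,  "Public Works": 10, "Planning": 9},
}

def lambda_pre(table):
    """Proportional reduction in error when predicting the row (dependent)
    variable from the column (independent) variable."""
    columns = list(next(iter(table.values())))
    n = sum(sum(row.values()) for row in table.values())
    # Original error: always guessing the overall modal row category.
    original_error = n - max(sum(row.values()) for row in table.values())
    # New error: within each column, guessing that column's modal row category.
    new_error = sum(
        sum(table[r][c] for r in table) - max(table[r][c] for r in table)
        for c in columns
    )
    return (original_error - new_error) / original_error

print(round(lambda_pre(table), 3))  # (57 - 42) / 57 = 0.263
```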
By knowing a person's department, we can reduce the error in predicting how they rate the
personnel department by 26.3%. This indicates a weak relationship between department of
employment and perception of the personnel department. As the independent variable is
measured on a nominal scale, there is no direction for the relationship (neither positive nor
negative, just an association).
Gamma is a measure of association that measures the Proportional Reduction in Error (PRE)
obtained when the researcher uses the value of the independent variable to predict the value of
the dependent variable.
Gamma varies from a value of 0.0 for the weakest level of association, to a value of +1.0 for
the strongest direct (positive) relationship, or -1.0 for the strongest inverse (negative)
relationship.
Note that both variables must be coded so that the values of the variable go from low to high,
for example, dissatisfied=1, neutral=2, satisfied=3, or less than high school=1, high school=2,
more than high school=3. The values of the variables in the contingency table should be arrayed
from low to high as you read from left to right across the columns, and from low to high as you
read from top to bottom along the rows.
Gamma can be used with two variables measured at the ordinal level, but it is not good at
reflecting non-linear relationships between two variables. In that case, a nominal measure of
association should be used.
For example, let us hypothesize that there is a relationship between the length of time a person
has been employed in an organization, and that person's opinion of that organization's personnel
department: the longer employed, the better the opinion.
To calculate gamma, we look at the number of observations that would support our hypothesis
(called A) and the number of observations that would not support it (called D).
First we look for the number of observations in agreement (A). This consists in identifying the
cells in the table that tend to support our hypothesis. We begin in the upper left hand corner, and
work right and downward across the table.
We take the number of people who have worked less than 1 year and rate the department as
poor (this would support our hypothesis). We multiply this number times the number of
observations found in the cells which are under and to the right of this cell. These are the cells
that contain the number of people who have worked either from 1-5 years or more than 5 years
and who rate the department as either satisfactory or good.
Next we find the number of people who have worked less than 1 year and who rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the right of this cell. This includes the number of people who
have worked from 1-5 years or more than 5 years and rate the department as good.
Next we find the number of people who have worked from 1-5 years and rate the department
as poor. We multiply this number times the number of observations found in the cells which are
under and to the right of this cell. This includes the number of people who have worked more
than 5 years and rate the department as satisfactory or good.
Finally, we count the number of people who have worked from 1-5 years and rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the right of this cell. This includes the number of people who
have worked more than 5 years and rate the department as good.
Next we look for the number of observations in disagreement (D). This consists in identifying
the cells in the table that tend to contradict our hypothesis. In this case, we would begin in the
opposite (upper right hand) corner and work left and downward across the table.
In the table, there are 12 people who have worked more than five years who rate the personnel
department as poor (this would disconfirm our hypothesis). We multiply this number times the
number of observations found in the cells which are under and to the left of this cell. These are
the cells that contain the number of people who have worked either less than one or from 1-5
years and who rate the department as either satisfactory or good.
Next we find the number of people who have worked more than 5 years who would rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the left of this cell. This includes the number of people who have
worked less than 1 year or from 1-5 years and rate the department as good.
Next we find the number of people who have worked from 1-5 years and rate the department
as poor. We multiply this number times the number of observations found in the cells which are
under and to the left of this cell. This includes the number of people who have worked less than 1
year and rate the department as satisfactory or good.
Finally, we count the number of people who have worked from 1-5 years and rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the left of this cell. This includes the number of people who have
worked less than 1 year and rate the department as good.
Gamma is calculated by finding the number of observations in agreement minus the number of
observations in disagreement, and dividing that by the number of observations in agreement plus
the number of observations in disagreement.
Gamma=(0-360)/(0+360)=-1.0
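The pair-counting procedure can be sketched in Python. The cell counts below are hypothetical, chosen only to reproduce A=0 and D=360 from the text (including the 12 long-tenured respondents who rate the department as poor):

```python
# Rows = rating (low to high), columns = years employed (low to high).
# Hypothetical counts: every observation sits on the "disagreement" diagonal.
table = [
    #  <1   1-5   >5   years employed
    [   0,   0,  12],  # Poor
    [   0,   9,   0],  # Satisfactory
    [  12,   0,   0],  # Good
]

def gamma(t):
    rows, cols = len(t), len(t[0])
    agree = disagree = 0  # A and D in the text
    for i in range(rows):
        for j in range(cols):
            # A: multiply each cell by everything below and to the right.
            agree += t[i][j] * sum(
                t[r][c] for r in range(i + 1, rows) for c in range(j + 1, cols)
            )
            # D: multiply each cell by everything below and to the left.
            disagree += t[i][j] * sum(
                t[r][c] for r in range(i + 1, rows) for c in range(j)
            )
    return (agree - disagree) / (agree + disagree)

print(gamma(table))  # (0 - 360) / (0 + 360) = -1.0
```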
This value of gamma tells us that we have a very strong relationship between the length of
time employed and opinion of the personnel department, but the relationship is in the opposite
direction than we predicted. That is, as length of employment increases, opinion of the personnel
department decreases.
In establishing whether or not a relationship exists between two variables, it is not enough to
obtain a high value on a measure of association. The researcher must also show that the
purported relationship between the two variables is not spurious. A spurious relationship is one
where two variables seem to be associated with one another, but the association can be explained
away by the introduction of a third variable.
The introduction of a third, control, variable is called the specification or elaboration of the
relationship observed between the original two variables. Control variables come from the
researcher's experience; from a review of the literature; from a conceptual model that guides the
research; or from a hypothesis.
For example, it is possible to establish that an association exists between the amount of ice
cream sold and the number of assaults in any given city. However, this relationship is spurious:
both the amount of ice cream sold and the number of assaults increase as the temperature
increases. The temperature is associated with ice cream sales, and the temperature is associated
with assaults, but ice cream and assaults are not related. This becomes apparent because when
temperature is controlled, the value of the measure of association between ice cream sales and
assaults will greatly diminish.
1) obtain the original bivariate distribution table for the independent and dependent variables
2) obtain the frequency distribution for the control variable and divide the observations in the
original table into groups according to the categories of the control variable
3) within each of these new groups, re-create the original bivariate distribution table
4) compare the new bivariate distributions with the original distribution (in step 1)
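The elaboration procedure can be sketched in Python. The records below are hypothetical, used only to show the partition-and-retabulate mechanics:

```python
from collections import Counter

# Hypothetical raw records: (area of residence, attitude, rating of services),
# where rating of current services is the control variable.
records = [
    ("City", "For", "Satisfactory"),
    ("City", "Against", "Unsatisfactory"),
    ("Non-City", "Against", "Satisfactory"),
    ("Non-City", "For", "Unsatisfactory"),
]

# Step 2: divide the observations into groups by the control variable's categories.
groups = {}
for area, attitude, services in records:
    groups.setdefault(services, []).append((area, attitude))

# Step 3: within each group, re-create the original bivariate distribution.
control_tables = {svc: Counter(pairs) for svc, pairs in groups.items()}

# Step 4: compare each control table with the original bivariate table.
```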
Step 2. Divide the 650 observations in the original table into two groups: those who rate their
current services as satisfactory, and those who rate their current services as unsatisfactory.
Step 3. Within each of these two new groups, re-create the original bivariate distribution table.
Control Table A. Current Services Rated as Satisfactory (N=388)
Step 4. Compare the new bivariate distributions with the original distribution (in step 1). There
are three distinct possibilities: the original relationship is unchanged; the original relationship
disappears; the original relationship is changed.
If the original relationship is unchanged, then the control variable has no effect, and can be
disregarded in further analysis of the dependent variable.
If the original relationship disappears, then that relationship was spurious, and the control
variable becomes the new independent variable in further analysis of the dependent variable.
If the original relationship is changed, then both variables are important, and must be
considered in further analysis of the dependent variable.
In the original table, more city residents (54%) than non-city residents (37%) were for
consolidation. This relationship is similar among the respondents in the first control table. For
those who rate their current services as satisfactory, more city residents (65%) than non-city
residents (2%) were for consolidation.
However, the relationship is reversed in the second control table. Among respondents who rate
their current services as unsatisfactory, fewer city residents (34%) than non-city residents (62%)
are for consolidation.
In this case, both area of residence and perception of current services are important influences
on a citizen's attitude toward consolidation. Those who live outside the city, and who are
satisfied with their services, are opposed to consolidation, but those who live outside the city and
are unsatisfied with their services favor consolidation.
Among city residents, the relationship is reversed: those who are satisfied favor consolidation,
while those who are unsatisfied oppose it. Perhaps those who are unsatisfied think that their
services will deteriorate even further if the city and county are consolidated.
Another example concerns the attitude of organizational employees toward merit pay. We
hypothesize that men will be more favorable to merit pay than women. We obtain the following
bivariate distribution table:
This table seems to confirm our hypothesis: 80% of men favor merit pay but only 20% of
women favor it. Values obtained for various measures of association are strong.
However, our MPA intern suggests that it is not sex but whether or not someone is in a
management position that determines their attitude toward merit pay. We obtain the distribution
for type of job, and find that of the original 1734 people in our study, 444 have management jobs
and 1290 do not.
Among those in management jobs, the relationship between sex and attitude completely
disappears: equally high percentages of women and men are in favor of merit pay. Among those
in non-management jobs, the relationship likewise disappears: equally high percentages of
women and men are opposed to merit pay. In both control tables, the value obtained for the
measure of association drops to nearly zero.
In conclusion, we can discard the variable sex and concentrate on level of employment in our
further analysis of the dependent variable, attitude toward merit pay.