
Lesson 14: CONTINGENCY TABLES

Contingency Tables

    After examining the univariate frequency distribution of the values of each variable separately,
the researcher is often interested in the joint occurrence and distribution of the values of the
independent and dependent variable together. The joint distribution of two variables is called a
bivariate distribution.

    A contingency table shows the frequency distribution of the values of the dependent variable,
given the occurrence of the values of the independent variable. Both variables must be grouped
into a finite number of categories (usually no more than 2 or 3 categories) such as low, medium,
or high; positive, neutral, or negative; male or female; etc.

Constructing a Contingency Table

    1) obtain a frequency distribution for the values of the independent variable; if the variable is
not divided into categories, decide on how to group the data.

    2) obtain a frequency distribution for the values of the dependent variable; if the variable is not
divided into categories, decide on how to group the data.

    3) obtain the frequency distribution of the values of the dependent variable, given the values of
the independent variable (either by tabulating the raw data, or from a computer program)

    4) display the results of step 3 in a table

Example:
Independent Variable: Place of Residence
Categories: Inside City Limits=505
Outside City Limits=145

Dependent Variable: Attitude about Consolidation


Categories: Favor consolidation=327
No Opinion=168
Against consolidation=155

Joint Distribution:

Table 1. Attitudes toward Consolidation by Area of Residence

Attitude toward              Area of Residence
Consolidation      Inside City Limits   Outside City Limits
Against                     98                   57
No Opinion                 134                   34
For                        273                   54
Total                      505                  145
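
    Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the raw
survey data are not given here, so the snippet rebuilds one record per respondent from the cell
counts in Table 1 and then tabulates those records with the standard library's Counter. The names
cell_counts, records, joint, by_residence, and by_attitude are made up for this sketch.

```python
from collections import Counter

# Cell counts taken from Table 1: (area of residence, attitude) -> frequency
cell_counts = {
    ("Inside", "Against"): 98, ("Outside", "Against"): 57,
    ("Inside", "No Opinion"): 134, ("Outside", "No Opinion"): 34,
    ("Inside", "For"): 273, ("Outside", "For"): 54,
}

# Stand-in for raw data: one (residence, attitude) record per respondent
records = [pair for pair, n in cell_counts.items() for _ in range(n)]

# Steps 1 and 2: univariate distribution of each variable
by_residence = Counter(residence for residence, _ in records)
by_attitude = Counter(attitude for _, attitude in records)

# Step 3: joint distribution of the two variables
joint = Counter(records)

# Step 4: display the table, independent variable across the columns
print(f"{'Attitude':<12}{'Inside':>8}{'Outside':>9}")
for attitude in ("Against", "No Opinion", "For"):
    inside = joint[("Inside", attitude)]
    outside = joint[("Outside", attitude)]
    print(f"{attitude:<12}{inside:>8}{outside:>9}")
print(f"{'Total':<12}{by_residence['Inside']:>8}{by_residence['Outside']:>9}")
```

    With real data, records would simply be the list of observed pairs, one per respondent.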

Characteristics of a Contingency Table:

1. Title

2. Categories of the Independent Variable head the tops of the columns

3. Categories of the Dependent Variable label the rows

4. Order categories of the two variables from lowest to highest (from left to right across the
columns; from top to bottom along the rows).

5. Show totals at the foot of the columns


 

Interpreting a Contingency Table

1) Inspect the contingency table for patterns. This may be difficult if there are different totals of
observations in the different categories of the independent variable.

2) Convert the observations in each cell to a percentage of the column total; be sure to still show
the total number of observations for each column on which the percentages are based.

3) Compare the percentages across the categories of the independent variable (that is, across the
columns within each row of the dependent variable).

Example:
Table 1. Attitudes toward Consolidation by Area of Residence

Attitude toward              Area of Residence
Consolidation      Inside City Limits   Outside City Limits
                        (N=505)              (N=145)
Against                   19%                  39%
No Opinion                27%                  23%
For                       54%                  37%
Total                    100%                 100%

    According to this table, a higher percentage of city residents (54%) than of non-city residents
(37%) are for consolidation. Conversely, a higher percentage of non-city residents (39%) than of
city residents (19%) are against consolidation. About the same percentage of both groups (27% and
23%) have no opinion about consolidation.
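
    The conversion in step 2 is simple arithmetic: each cell is divided by its column total. A
minimal sketch, using the counts from Table 1 (the function name column_percentages is invented
for this illustration):

```python
# Cell counts from Table 1, one list per column, ordered as in `attitudes`
attitudes = ["Against", "No Opinion", "For"]
counts = {
    "Inside City Limits": [98, 134, 273],
    "Outside City Limits": [57, 34, 54],
}

def column_percentages(column):
    """Express each cell as a whole-number percentage of its column total."""
    total = sum(column)
    return [round(100 * cell / total) for cell in column]

percentages = {area: column_percentages(col) for area, col in counts.items()}

# Keep the column N next to the percentages, as step 2 requires
for area, col in counts.items():
    print(f"{area} (N={sum(col)})")
    for attitude, pct in zip(attitudes, percentages[area]):
        print(f"  {attitude:<11}{pct:>3}%")
```

    Because each percentage is rounded separately, a column may add up to 99% or 101% rather than
exactly 100%, as happens with the "Outside" column here.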

    The percentage distribution can suggest the strength of a relationship, but interpretation is up
to each individual researcher. There is no minimum percentage difference that must be reached
to indicate a strong or weak relationship between the two variables.

    Does this mean that there is a relationship between the two variables, area of residence and
attitude toward consolidation? Is one's attitude about consolidation associated with one's area of
residence?

    If there is a relationship, how strong is it? Are the results statistically significant? Are the
results meaningfully significant? In order to answer these questions, we must turn to a set of
statistics called Measures of Association.
 
 

What is an Association?

    Can the value of one variable be predicted, if we know the value of the other variable?

    For example, say half the people participating in training programs get a job. What is the
likelihood of any one participant getting a job? About fifty-fifty. So we would not be very good
at predicting whether people will get jobs or not.

    But if we introduce a second variable (the independent variable), does it help us to be more
accurate in our predictions of the likelihood that someone will get a job?

Dependent variable: Obtaining a Job


No job=100
Gets a job=100

Independent Variable: Length of Training Program


Short=100
Long=100

Bivariate Distribution--Perfect Positive Relationship
(If training is good for getting a job)

Obtains a Job       Length of Training Program
                    Short (N=100)   Long (N=100)
No                      100%             0%
Yes                       0%           100%
Total                   100%           100%

    If we know the length of the training program, we can perfectly predict the likelihood of
getting a job. The longer the training program, the more likely the participant is to get a job and,
conversely, the shorter the training program the less likely the participant is to get a job. That is,
as the training program length increases, so does the likelihood of obtaining a job. The value of
the measure of association would be +1.0.

Bivariate Distribution--Perfect Inverse Relationship
(If training is bad for getting a job)

Obtains a Job       Length of Training Program
                    Short (N=100)   Long (N=100)
No                        0%           100%
Yes                     100%             0%
Total                   100%           100%

    If we know the length of the training program, we can perfectly predict the likelihood of
getting a job. The longer the training program, the less likely the participant is to get a job and,
conversely, the shorter the training program the more likely the participant is to get a job. That is,
as the training program length increases, likelihood of obtaining a job decreases. The value of the
measure of association would be -1.0.

Bivariate Distribution--No Relationship
(If training has no effect on getting a job)

Obtains a Job       Length of Training Program
                    Short (N=100)   Long (N=100)
No                       50%            50%
Yes                      50%            50%
Total                   100%           100%

    Here we are back to a 50/50 guess. Knowing the length of the training program does not help
in any way to predict the likelihood of getting a job. The value of the measure of association
would be 0.0.
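
    The idea of "predicting better" can be made concrete by counting correct guesses. The sketch
below is an illustration only: it compares always guessing the most common outcome with guessing
the most common outcome within each training length, for the three tables above. The function
name correct_guesses is made up for this example, and the count says nothing about the direction
of a relationship.

```python
def correct_guesses(table):
    """table[row][col] holds counts; rows = dependent, columns = independent.

    Returns a pair: correct guesses when the independent variable is ignored,
    and correct guesses when the modal row is guessed within each column.
    """
    without_iv = max(sum(row) for row in table)
    with_iv = sum(max(column) for column in zip(*table))
    return without_iv, with_iv

# Rows: No job, Gets a job; columns: Short, Long (100 participants per column)
tables = {
    "perfect positive": [[100, 0], [0, 100]],
    "perfect inverse": [[0, 100], [100, 0]],
    "no relationship": [[50, 50], [50, 50]],
}

results = {name: correct_guesses(table) for name, table in tables.items()}
for name, (before, after) in results.items():
    print(f"{name}: {before} of 200 correct without program length, {after} with it")
```

    In both perfect relationships, knowing the program length raises the number of correct guesses
from 100 to 200; with no relationship it stays at 100.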
 

Measures of Association

    Measures of Association are statistics that provide a standard against which to judge the
relationship between the variables observed in contingency tables. They can indicate the strength
of a relationship between two variables measured on a nominal or ordinal scale. For the latter,
they can also indicate the direction of the relationship (positive or negative).

    Measures of Association are descriptive statistics, so they can be used with samples which
were not selected using a strict random sampling method. But they do not allow the researcher to
infer whether the relationship observed in the sample is true of the general population.

    Measures of Association do not indicate causality, but association--that is, whether one's score
on one variable tends to be associated with one's score on another variable. The value of the
measure of association statistic also indicates the strength of the relationship, whether weak,
moderate, or strong.

Examples of Measures of Association:


 

Level of        Measure of
Measurement     Association    Values                                    Symmetric?
Nominal         Lambda         0.0 (weakest relationship) to             Lambda is asymmetric
                               1.0 (strongest relationship)
Ordinal         Gamma          0.0 (weakest relationship) to             Gamma is symmetric
                               +1.0 or -1.0 (strongest relationship)

    Measures of Association for variables measured at the nominal level generally vary from a
low of 0.0 to a high of +1.0. Lower values indicate weaker associations, and higher values
indicate stronger associations.

    In addition, for variables measured at the ordinal level, Measures of Association vary from a
low of 0.0, indicating the weakest level of association, to a high of either +1.0 or -1.0, which
indicate the strongest level of association.

    A value on the statistic between 0.0 and +1.0 indicates a positive (or direct) relationship. That
is, as the value of one variable increases the value of the other variable also increases. For
example, as the number of hours spent studying increases, the student's grade on the test also
increases. And conversely, as the number of hours spent studying decreases, the student's grade
on the test also decreases.

    A value on the statistic between 0.0 and -1.0 indicates a negative (or inverse) relationship.
That is, as the value of one variable increases the value of the other variable decreases. For
example, as the number of librarians on duty increases, the number of patron complaints
decreases. And conversely, as the number of librarians on duty decreases, the number of
patron complaints increases.
 

Which Measure to Use


    Choose a measure of association that meets the following criteria:

1) it is appropriate to the level of measurement of the data (nominal or ordinal);

2) it equals 0.0 for no relationship and 1.0 for a perfect relationship;

3) it is sensitive to subtle differences in the strength of a relationship;

4) the researcher is familiar with the statistic and knows how to interpret it;

5) it is consistent with what has been done in past research on this type of variable.

    Note that some statistics take on different values, depending on which of the two variables is
the independent variable and which is the dependent variable. These are called asymmetric
measures of association. Symmetric measures of association take on the same value, no matter
which variable is the independent variable and which is the dependent variable.

    Note that the value of one statistic, such as gamma, cannot be directly compared with the
value of another statistic, such as Tau. Each statistic has its own standard, and the value of the
statistic obtained by the researcher must be compared with the standard for that statistic.

    If the values of a number of statistics are obtained, and they all indicate a strong relationship
between two variables, the researcher may take that as additional support for the existence of a
relationship. However, if the values of a number of statistics are contradictory, with some
indicating a strong relationship and others a weak relationship, the researcher must look more
closely at the data. For example, there may be a non-linear relationship between the two
variables.

    Note that some measures of association are not useful when there is a non-linear relationship
between the two variables. This can occur when there are three or more categories of values for
the independent variable, and the values of the dependent variable do vary but not in a strictly
linear fashion.
 

Nominal Measures of Association

    Lambda is a measure of association that measures the Proportional Reduction in Error (PRE)
obtained when the researcher uses the value of the independent variable to predict the value of
the dependent variable.

    If the researcher only has the value of the dependent variable, the researcher will make a
number of errors trying to predict the values of the dependent variable for new observations. The
amount of error made in trying to predict the dependent variable alone is called original error.

    For example, say you asked the people in your organization to rate the personnel department.
You know that the univariate distribution for this variable looks like this:
 
Rating of Personnel Department    Frequency
Poor                                 38
Satisfactory                         32
Good                                 25
Total                                95

    Let's say you want to guess what the rating of another 95 people would be. Your best guess
would be to pick the modal category, which is "Poor." That is, more people picked "Poor" than
any other category. If you consistently pick "Poor," you will make the fewest wrong guesses:
38 right and 57 wrong (out of 95 total guesses). The 57 wrong guesses are the original error.

    Now, let's say that you are given one additional piece of information. You now know what the
ratings of the personnel department are by the people who work in one of four departments:
police, fire, public works, and planning.
 

Personnel                    Department of Employment
Department Rating     Police    Fire    Public Works    Planning
Poor                    10       15          5              8
Satisfactory             5       10         15              2
Good                    15        5          5              0
Total                   30       30         25             10

    Now, if you had to guess the personnel department rating, you could qualify your best guess
by knowing the department of employment. For each department, you would guess the modal
category.
 

Department of          Modal Rating of          Right      Wrong
Employment             Personnel Department     Guesses    Guesses
Police (N=30)          "Good"                     15         15
Fire (N=30)            "Poor"                     15         15
Public Works (N=25)    "Satisfactory"             15         10
Planning (N=10)        "Poor"                      8          2
Total                                             53         42

    The total number of new errors (wrong guesses) is 42.

    To calculate Lambda, subtract the number of new errors from the number of original errors
and divide by the number of original errors. In this case, [(57-42)/57]=.263

    By knowing a person's department, we can reduce the error in predicting how they rate the
personnel department by 26.3%. This indicates a weak relationship between department of
employment and perception of the personnel department. As the independent variable is
measured on a nominal scale, there is no direction for the relationship (neither positive nor
negative, just an association).
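
    The lambda calculation can be sketched in Python as follows. This is an illustration under the
definitions given above, not a reference implementation; the function name lambda_pre is invented
here, and the function assumes the original error is not zero. Running it in both directions also
shows why lambda is called asymmetric: the value changes when the two variables swap roles.

```python
def lambda_pre(table):
    """Lambda for predicting the row variable from the column variable.

    table[row][col] holds counts. Original errors come from always guessing
    the modal row; new errors from guessing the modal row within each column.
    Assumes the original error is greater than zero.
    """
    n = sum(sum(row) for row in table)
    original_errors = n - max(sum(row) for row in table)
    new_errors = n - sum(max(column) for column in zip(*table))
    return (original_errors - new_errors) / original_errors

# Rows: Poor, Satisfactory, Good; columns: Police, Fire, Public Works, Planning
ratings_by_department = [
    [10, 15, 5, 8],
    [5, 10, 15, 2],
    [15, 5, 5, 0],
]

# Predict the rating from the department (the calculation in the text)
lam_rating = lambda_pre(ratings_by_department)

# Swap the roles: predict the department from the rating (transposed table)
departments_by_rating = [list(column) for column in zip(*ratings_by_department)]
lam_department = lambda_pre(departments_by_rating)

print(round(lam_rating, 3), round(lam_department, 3))  # 0.263 0.231
```

    Predicting the rating from the department gives (57-42)/57=.263, while predicting the department
from the rating gives (65-50)/65=.231.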
 

Ordinal Measures of Association

    Gamma is a measure of association that measures the Proportional Reduction in Error (PRE)
obtained when the researcher uses the value of the independent variable to predict the value of
the dependent variable.

    Gamma varies from a value of 0.0 for the weakest level of association, to a value of +1.0 for
the strongest level of association for a direct or positive relationship, or -1.0 for the strongest
level of association for a negative or inverse relationship.

    Note that both variables must be coded so that the values of the variable go from low to high,
for example, dissatisfied=1, neutral=2, satisfied=3, or less than high school=1, high school=2, more
than high school=3. The values of the variables in the contingency table should be arrayed from
low to high as you read from left to right across the columns, and from low to high as you read
from top to bottom along the rows.

    Gamma can be used with two variables measured at the ordinal level, but is not good at
reflecting non-linear relationships between two variables. In that case, a nominal measure of
association should be used.

    For example, let us hypothesize that there is a relationship between the length of time a person
has been employed in an organization, and that person's opinion of that organization's personnel
department: the longer employed, the better the opinion.
 

Opinion of the                 Number of Years Employed
Personnel Department    Less than 1    1 to 5    More than 5
Poor                         0            6          12
Satisfactory                 0            6           0
Good                        12            0           0
Total                       12           12          12

    To calculate gamma, we look at the number of observations that would support our hypothesis
(called A) and the number of observations that would not support it (called D).

    First we look for the number of observations in agreement (A). This consists in identifying the
cells in the table that tend to support our hypothesis. We begin in the upper left hand corner, and
work right and downward across the table.
    We take the number of people who have worked less than 1 year and rate the department as
poor (this would support our hypothesis). We multiply this number times the number of
observations found in the cells which are under and to the right of this cell. These are the cells
that contain the number of people who have worked either from 1-5 years or more than 5 years
and who rate the department as either satisfactory or good.

    Next we find the number of people who have worked less than 1 year and who rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the right of this cell. This includes the number of people who
have worked from 1-5 years or more than 5 years and rate the department as good.

    Next we find the number of people who have worked from 1-5 years and rate the department
as poor. We multiply this number times the number of observations found in the cells which are
under and to the right of this cell. This includes the number of people who have worked more
than 5 years and rate the department as satisfactory or good.

    Finally, we count the number of people who have worked from 1-5 years and rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the right of this cell. This includes the number of people who
have worked more than 5 years and rate the department as good.

A = 0 x (6+0+0+0) + 0 x (0+0) + 6 x (0+0) + 6 x (0)
A = 0 x (6) + 0 x (0) + 6 x (0) + 6 x (0)
A = 0

    Next we look for the number of observations in disagreement (D). This consists in identifying
the cells in the table that tend to contradict our hypothesis. In this case, we would begin in the
opposite (upper right hand) corner and work left and downward across the table.

    In the table, there are 12 people who have worked more than five years who rate the personnel
department as poor (this would disconfirm our hypothesis). We multiply this number times the
number of observations found in the cells which are under and to the left of this cell. These are
the cells that contain the number of people who have worked either less than one or from 1-5
years and who rate the department as either satisfactory or good.

    Next we find the number of people who have worked more than 5 years who would rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the left of this cell. This includes the number of people who have
worked less than 1 year or from 1-5 years and rate the department as good.

    Next we find the number of people who have worked from 1-5 years and rate the department
as poor. We multiply this number times the number of observations found in the cells which are
under and to the left of this cell. This includes the number of people who have worked less than 1
year and rate the department as satisfactory or good.
    Finally, we count the number of people who have worked from 1-5 years and rate the
department as satisfactory. We multiply this number times the number of observations found in
the cells which are under and to the left of this cell. This includes the number of people who have
worked less than 1 year and rate the department as good.

D = 12 x (6+0+12+0) + 0 x (0+12) + 6 x (0+12) + 6 x (12)
D = 12 x (18) + 0 x (12) + 6 x (12) + 6 x (12)
D = 216 + 0 + 72 + 72
D = 360

    Gamma is calculated by finding the number of observations in agreement minus the number of
observations in disagreement, and dividing that by the number of observations in agreement plus
the number of observations in disagreement.

Gamma=(0-360)/(0+360)=-1.0

    This value of gamma tells us that we have a very strong relationship between the length of
time employed and opinion of the personnel department, but the relationship runs in the opposite
direction from the one we predicted. That is, as length of employment increases, opinion of the
personnel department decreases.
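
    The cell-by-cell procedure for A and D can be sketched as a short Python function. This is one
way to write it, assuming the rows and columns of the table are already ordered from low to high
and that A + D is not zero; the function name gamma and the variable names are choices made for
this illustration.

```python
def gamma(table):
    """Return (A, D, gamma) for a table whose rows and columns run low to high."""
    n_rows, n_cols = len(table), len(table[0])
    agree = disagree = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Cells below and to the right of cell (i, j) count toward agreement
            below_right = sum(table[k][m]
                              for k in range(i + 1, n_rows)
                              for m in range(j + 1, n_cols))
            # Cells below and to the left of cell (i, j) count toward disagreement
            below_left = sum(table[k][m]
                             for k in range(i + 1, n_rows)
                             for m in range(j))
            agree += table[i][j] * below_right
            disagree += table[i][j] * below_left
    return agree, disagree, (agree - disagree) / (agree + disagree)

# Rows: Poor, Satisfactory, Good; columns: less than 1, 1 to 5, more than 5 years
opinion_by_tenure = [
    [0, 6, 12],
    [0, 6, 0],
    [12, 0, 0],
]

A, D, g = gamma(opinion_by_tenure)
print(A, D, g)  # 0 360 -1.0
```

    Cells in the last row or last column contribute nothing to either sum, because no cells lie
below them or beside them in the relevant direction, which is why the hand calculation uses only
four cells for A and four for D.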
 

Introducing Control Variables

    In establishing whether or not a relationship exists between two variables, it is not enough to
obtain a high value on a measure of association. The researcher must also show that the
purported relationship between the two variables is not spurious. A spurious relationship is one
where two variables seem to be associated with one another, but the association can be explained
away by the introduction of a third variable.

    The introduction of a third, control, variable is called the specification or elaboration of the
relationship observed between the original two variables. Control variables come from the
researcher's experience; from a review of the literature; from a conceptual model that guides the
research; or from a hypothesis.

    For example, it is possible to establish that an association exists between the amount of ice
cream sold and the number of assaults in any given city. However, this relationship is spurious:
both the amount of ice cream sold and the number of assaults increase as the temperature
increases. The temperature is associated with ice cream sales, and the temperature is associated
with assaults, but ice cream and assaults are not related. This becomes apparent because when
temperature is controlled, the value of the measure of association between ice cream sales and
assaults will greatly diminish.

    Previously, we established an apparent relationship between attitude toward consolidation
and area of residence. But what if citizens' attitude toward consolidation is really influenced by
their evaluation of their current public services?

    Say that we have collected information on the third variable, evaluation of current public
services. The variable is coded as either satisfactory or unsatisfactory. In order to introduce this
as a control variable, we need to take the following steps.

1) obtain the original bivariate distribution table

2) obtain the frequency distribution for the control variable and divide the observations in the
original table into groups according to the categories of the control variable.

3) within each of these two new groups, re-create the original bivariate distribution table

4) compare the new bivariate distributions with the original distribution (in step 1)

5) interpret the results


 

Interpreting Control Tables

Step 1. Obtain the original bivariate distribution table

Attitudes toward Consolidation by Area of Residence

Attitude toward              Area of Residence
Consolidation      Inside City Limits   Outside City Limits
                        (N=505)              (N=145)
Against                   19%                  39%
No Opinion                27%                  23%
For                       54%                  37%
Total                    100%                 100%

Step 2. Obtain the frequency distribution for the control variable.

Control Variable: Rating of Current Services


Categories: Satisfactory=388
Unsatisfactory=262

    Divide the 650 observations in the original table into two groups: those who rate their current
services as satisfactory, and those who rate their current services as unsatisfactory.
 

Step 3.  Within each of these two new groups, re-create the original bivariate distribution table.
Control Table A. Current Services Rated as Satisfactory (N=388)

Attitude toward              Area of Residence
Consolidation      Inside City Limits   Outside City Limits
                        (N=505)              (N=145)
Against                   15%                  54%
No Opinion                20%                  44%
For                       65%                   2%
Total                    100%                 100%

Control Table B. Current Services Rated as Unsatisfactory (N=262)

Attitude toward              Area of Residence
Consolidation      Inside City Limits   Outside City Limits
                        (N=505)              (N=145)
Against                   27%                  29%
No Opinion                39%                   9%
For                       34%                  62%
Total                    100%                 100%

Step 4.  Compare the new bivariate distributions with the original distribution (in step 1). There
are three distinct possibilities: the original relationship is unchanged; the original relationship
disappears; the original relationship is changed.

    If the original relationship is unchanged, then the control variable has no effect, and can be
disregarded in further analysis of the dependent variable.

    If the original relationship disappears, then that relationship was spurious, and the control
variable becomes the new independent variable in further analysis of the dependent variable.

    If the original relationship is changed, then both variables are important, and must be
considered in further analysis of the dependent variable.

    In the original table, a higher percentage of city residents (54%) than of non-city residents
(37%) were for consolidation. The relationship runs in the same direction, and is stronger, among
the respondents in the first control table. For those who rate their current services as
satisfactory, a far higher percentage of city residents (65%) than of non-city residents (2%) were
for consolidation.

    However, the relationship is reversed in the second control table. Among respondents who rate
their current services as unsatisfactory, a lower percentage of city residents (34%) than of
non-city residents (62%) are for consolidation.
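
    The comparison in Step 4 can be sketched as a small Python function that looks at the
percentage-point gap in "For" responses between city and non-city residents. This is only an
illustration: the function name classify_control and the 5-point tolerance are assumptions made
for the sketch, since there is no standard cut-off for deciding that two percentages differ.

```python
def classify_control(original_diff, control_diffs, tolerance=5):
    """Label the outcome of introducing a control variable.

    All differences are in percentage points. `tolerance` is an arbitrary
    cut-off chosen for this sketch, not a statistical standard.
    """
    if all(abs(diff) <= tolerance for diff in control_diffs):
        return "disappears"
    if all(abs(diff - original_diff) <= tolerance for diff in control_diffs):
        return "unchanged"
    return "changed"

# Percent "For" consolidation: inside city limits minus outside city limits
original = 54 - 37       # original table
satisfied = 65 - 2       # Control Table A
unsatisfied = 34 - 62    # Control Table B

outcome = classify_control(original, [satisfied, unsatisfied])
print(original, satisfied, unsatisfied, outcome)  # 17 63 -28 changed
```

    The gap grows from 17 points to 63 points in one control table and reverses to -28 points in
the other, so the original relationship falls into the "changed" case.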
 

Step 5.  Interpret the results.

    In this case, both area of residence and perception of current services are important influences
on a citizen's attitude toward consolidation. Those who live outside the city, and who are
satisfied with their services, are opposed to consolidation, but those who live outside the city and
are unsatisfied with their services favor consolidation.

    Among city residents, the relationship is reversed: those who are satisfied favor consolidation
(65%), while those who are unsatisfied are far less likely to favor it (34%). Perhaps those who are
unsatisfied think that their services will deteriorate even further if the city and county are
consolidated.

    Another example concerns the attitude of organizational employees toward merit pay. We
hypothesize that men will be more favorable to merit pay than women. We obtain the following
bivariate distribution table:

Original Table: Attitude toward Merit Pay by Sex

Attitude toward                Sex
Merit Pay          Female (n=1506)   Male (n=228)
Negative                80%              20%
Positive                20%              80%
Total                  100%             100%

    This table seems to confirm our hypothesis: 80% of men favor merit pay but only 20% of
women favor it. Values obtained for various measures of association are strong.

    However, our MPA intern suggests that it is not sex but whether or not someone is in a
management position that determines their attitude toward merit pay. We obtain the distribution
for type of job, and find that of the original 1734 people in our study, 444 have management jobs
and 1290 do not.

Control Table A: Management Jobs

Attitude toward                Sex
Merit Pay          Female (n=238)    Male (n=206)
Negative                13%              13%
Positive                87%              87%
Total                  100%             100%

    Here the relationship between sex and attitude completely disappears. Equally high
percentages of women and men in management jobs are in favor of merit pay. The value
obtained for the measure of association drops to nearly zero.

Control Table B: Non-management Jobs

Attitude toward                Sex
Merit Pay          Female (n=1268)   Male (n=22)
Negative                92%              91%
Positive                 8%               9%
Total                  100%             100%

    Here the relationship between sex and attitude completely disappears. Equally high
percentages of women and men in non-management jobs are opposed to merit pay. The value
obtained for the measure of association drops to nearly zero.

    In conclusion, we can discard the variable sex and concentrate on level of employment in our
further analysis of the dependent variable, attitude toward merit pay.
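
    A quick arithmetic check shows how the two control tables recombine into the original table.
The Python sketch below is only an illustration: it works from the rounded percentages printed in
the control tables, so the pooled figures are approximate, and the names management,
non_management, and pooled_percent_positive are invented for the example.

```python
# (n, percent positive) for each sex within each control table
management = {"Female": (238, 87), "Male": (206, 87)}
non_management = {"Female": (1268, 8), "Male": (22, 9)}

def pooled_percent_positive(sex):
    """Recombine the two job groups into one overall percentage for a sex."""
    groups = [management[sex], non_management[sex]]
    total_n = sum(n for n, _ in groups)
    positive = sum(n * pct / 100 for n, pct in groups)
    return total_n, 100 * positive / total_n

for sex in ("Female", "Male"):
    n, pct = pooled_percent_positive(sex)
    share_in_management = 100 * management[sex][0] / n
    print(f"{sex}: n={n}, about {pct:.0f}% positive, "
          f"{share_in_management:.0f}% in management jobs")
```

    The pooled percentages come out close to the 20% and 80% in the original table. The gap between
the sexes arises because about 90% of the men in the study (206 of 228) hold management jobs,
compared with about 16% of the women (238 of 1506).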
