
Chapter 10

Analyzing Categorical Variables and Interpreting Research

Copyright © 2017, 2014 Pearson Education, Inc. Slide 1


Chapter 10 Topics

• Explore associations between categorical
variables
• Discuss important considerations when
reading research papers



Section 10.1

THE BASIC INGREDIENTS FOR TESTING WITH CATEGORICAL VARIABLES
• Identify the Basic Ingredients for Testing with
Categorical Variables



Introductory Example: Fair Die
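Suppose we wanted to determine whether a standard 6-sided
die is fair. In a perfect world, if the die were fair, each
of the six outcomes would occur in one-sixth of the rolls,
and the distribution of outcomes would look like this:

[Figure: distribution of outcomes for a fair die]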

Introductory Example: Fair Die
We roll a die 60 times and record the number of spots.
The outcomes are shown in the table and graph below.

Introductory Example: Fair Die
We can see that our outcomes were not exactly
what we would expect in a perfect world. We
will use a statistic, called the chi-square statistic,
to compare the real outcomes with the
expected outcomes. We use the chi-square
distribution to find p-values that tell us whether
we should be suspicious that our outcomes are
not matching our expectations.

Contingency Table
(Two-Way Table)

• Summary table that displays frequencies for
outcomes when two categorical variables are
analyzed
• Even though there are numbers in the table,
these numbers are summaries of variables
whose values are categories.

Example: Contingency Table
A sample of US adults and members of the American
Association for the Advancement of Science (AAAS)
were asked by the Pew Poll, “Is it safe to eat genetically
modified foods?” The results are shown in the
contingency table. (We assumed the sample size was
100 for each group.)

Response   US Adults   AAAS Scientists
Yes        37          88
No         63          12

Expected Counts
The expected counts are the numbers of
observations we would see in each cell of the
contingency table if the null hypothesis were
true.

Example: Expected Counts
In our previous example of the fair die, if the die
is rolled 60 times, we would expect 10 of each
outcome. The table below shows the expected
counts and the observed counts from the
experiment.

Example: Finding Expected Counts
In the Pew Research poll, there are two categorical
variables: Background (US Adult or scientist) and Belief
in Safety of GMO (Yes/No). What counts should we
expect if these variables are truly not related to each
other?

Response   US Adults   AAAS Scientists
Yes        37          88
No         63          12

Example: Finding Expected Counts
Starting with the GMO (Yes/No) variable:
125/200 (0.625) of the sample said “Yes” and
75/200 (0.375) of the sample said “No.”

If GMO (Yes/No) is independent of background,
then we should expect the same percentage of
US adults and AAAS scientists to say “Yes” and
the same percentage to say “No.”

Example: Finding Expected Counts
There were 100 US Adults in the sample, so we
expect 0.625(100) = 62.5 to say “Yes” and
0.375(100) = 37.5 to say “No.”

Since the sample of AAAS Scientists is the same
size as the sample of US Adults (100 in each
group), we expect the same numbers of scientists
to say “Yes” and “No” as for the US Adults.

Contingency Table Showing Observed
and Expected Counts

The table below shows the actual counts and
the expected counts (in parentheses):

Response   US Adults    AAAS Scientists   Total
Yes        37 (62.5)    88 (62.5)         125
No         63 (37.5)    12 (37.5)          75
Total      100          100               200

Notes about Expected Counts
In this example we started with the GMO variable. We could
also have started with the background variable and computed
the expected counts using the background percentages. For
example, 50% of the sample were AAAS Scientists, so we would
expect 50% of those saying “Yes” to be scientists. If the expected
counts are computed this way, the results are exactly the same.
The formula

$$\text{Expected count for a cell} = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$

can also be used to compute the expected counts, but in practice
the expected counts are computed using technology.
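The row-total times column-total formula can be sketched in a few lines of plain Python. This is only an illustration, not the software the slides refer to; the function name `expected_counts` and the variable names are assumptions made for this example.

```python
def expected_counts(observed):
    """Return expected counts for a two-way table under no association.

    Each cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from the GMO example.
# Rows: Yes, No. Columns: US Adults, AAAS Scientists.
gmo_observed = [[37, 88], [63, 12]]
gmo_expected = expected_counts(gmo_observed)
print(gmo_expected)  # [[62.5, 62.5], [37.5, 37.5]]
```

The result matches the expected counts worked out by hand from the 0.625 and 0.375 proportions.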

The Chi-Square Statistic
The chi-square statistic measures the amount
that our expected counts differ from our
observed counts. The formula for the chi-square
statistic is

$$X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}$$

where O is the observed count in each cell, E is
the expected count in each cell, and Σ means to
add the results in each cell.

Example: Chi-Square Statistic
In our GMO example,

$$X^2 = \frac{(37 - 62.5)^2}{62.5} + \frac{(88 - 62.5)^2}{62.5} + \frac{(63 - 37.5)^2}{37.5} + \frac{(12 - 37.5)^2}{37.5}$$

$$X^2 = 55.488$$
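The same arithmetic can be checked with a short Python sketch. The function name `chi_square_statistic` is hypothetical and chosen only for this illustration; the observed and expected counts are the ones from the GMO example.

```python
def chi_square_statistic(observed, expected):
    """Add (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Yes, No. Columns: US Adults, AAAS Scientists.
gmo_observed = [[37, 88], [63, 12]]
gmo_expected = [[62.5, 62.5], [37.5, 37.5]]

x2 = chi_square_statistic(gmo_observed, gmo_expected)
print(round(x2, 3))  # 55.488
```

Each of the two "Yes" cells contributes 10.404 and each of the two "No" cells contributes 17.34, which add to 55.488.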

Finding the P-Value for the
Chi-Square Statistic
• The p-value is found using the chi-square
distribution.
• The chi-square distribution takes only non-negative
values and is right-skewed.
• Like the t-distribution, the shape of the chi-square
distribution depends on a number called the degrees
of freedom.

The Chi-Square Distribution
The chi-square distribution provides a good
approximation to the sampling distribution of
the chi-square statistic only if the sample size is
large enough (if each expected count is five or
higher).
In practice, technology is used to compute the
chi-square statistic and the accompanying
p-value.

GMO Example: Conclusion
Chi-square statistic:
55.488
p-value: < 0.0001
US Adults and AAAS
scientists differ in their
support of GMO foods.
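As a rough check of the reported p-value, here is a sketch that uses only Python's standard library. It assumes 1 degree of freedom, which is what (#rows - 1)(#columns - 1) gives for a 2-by-2 table. In that special case the chi-square statistic is the square of a standard normal value, so the upper-tail probability can be written with the complementary error function. The function name is an assumption made for this example.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square distribution with 1 df.

    With 1 df, P(X^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


p_value = chi_square_p_value_df1(55.488)
print(p_value < 0.0001)  # True
```

The shortcut applies only when there is 1 degree of freedom; larger tables need the general chi-square distribution, which is why technology is normally used.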

Section 10.2

CHI-SQUARE TESTS FOR ASSOCIATIONS BETWEEN CATEGORICAL VARIABLES
• Use the Chi-Square Test to Determine Whether
there is an Association between Two Categorical
Variables

Two Tests for Association
There are two tests to determine whether two
categorical variables are associated.

Which test you use depends on how the data
were collected.

Both methods use two-way tables to display
data. Both are conducted in similar ways.

Test of Homogeneity
• Collect two (or more) independent samples,
one from each population.
• Each object sampled has a categorical value
that is recorded.

Test of Homogeneity
Example: Collect a random sample of men and a
random sample of women. Ask each person
sampled if they agree that global warming is a
serious problem.

In this example we have two samples, one
categorical response variable (opinion), and one
categorical grouping variable (gender).

Test of Independence
• Collect only one sample.
• For objects in the sample we record two
categorical response variables.

Example: Collect a large sample of people and
record their marital status and income.

Similarities in the Two Approaches
• In both situations we are interested in
knowing whether the two categorical
variables are related or unrelated.
• Use the same chi-square test statistic and the
same chi-square distribution to find the
p-value.

Example:
Homogeneity or Independence?
A polling organization asks a random sample of
people for their party affiliation (Democrat,
Republican, or other) and whether they think
the minimum wage should be raised. If the
organization wanted to test whether party
affiliation and opinion on minimum wage are
associated, would this be a test of homogeneity
or independence?

Example:
Homogeneity or Independence?

This is a test of independence because only one
sample was collected and two categorical
variables (party affiliation and opinion on
minimum wage) were recorded for each
member of the sample.

Example:
Homogeneity or Independence?

In 2013 the Pew Organization surveyed adults in
eight countries that had legalized same-sex
marriage, asking the question, “Should
homosexuality be accepted?” If the
organization wanted to investigate whether
country of origin and opinion are associated,
would this be a test of homogeneity or
independence?

Example:
Homogeneity or Independence?

This is an example of a test of homogeneity.
Eight independent samples are collected (a
sample from each of eight countries) and a
single categorical variable is recorded for each
member of the sample (the response to the
question “Should homosexuality be
accepted?”).

Tests of Homogeneity and
Independence
1. Hypothesize
H0: There is no association between the two
variables (the variables are independent).
Ha: There is an association between the two
variables (the variables are not independent).

Tests of Homogeneity and
Independence
2. Prepare
• Random samples
• Independent samples and observations
• Large samples: The expected counts must be
five or more in each cell of the table.

Tests of Homogeneity and
Independence
3. Compute to Compare
Test statistic is

$$X^2 = \sum \frac{(O - E)^2}{E}$$

Degrees of freedom = (#rows - 1)(#columns - 1)
p-value comes from the chi-square distribution.

Technology can be used to compute the test
statistic and p-value.

Tests of Homogeneity and
Independence
4. Interpret
If the p-value is less than or equal to the
significance level, we reject H0 and conclude
there is an association between the variables.
Otherwise we do not reject H0 and we cannot
conclude there is an association between the
variables.

Example: Republican Views on Global
Warming

The Yale Project on Climate Change investigated
views on global warming among the Republican
Party. Republicans surveyed identified
themselves as Liberal, Moderate, Conservative,
or Tea Party Republicans and also answered the
question, “Do you believe global warming is
happening?” The results are shown in the
following two-way table.

Example: Republican Views on Global
Warming
1. Is this a test of homogeneity or
independence?
2. Run a test to see if there is an association
between type of Republican and opinion.
3. The report on this survey is titled “Not All
Republicans Think Alike about Global
Warming.” Do the results of the hypothesis
test support this headline?

Response   Liberal      Moderate     Conservative   Tea Party
           Republican   Republican   Republican     Republican
Yes        72           335          483            120
No         34           205          788            292

“Do you believe global warming is happening?”

1. Hypothesize
H0: Republican type and opinion are independent.
Ha: Republican type and opinion are not independent.
2. Prepare Samples are random and independent. Check on
the technology output that all expected counts are greater
than or equal to 5.
StatCrunch: Stats > Table > Contingency > with summary

All expected counts (in parentheses) are greater than or equal to 5.

3. Compute to Compare
Test statistic: X² = 151.59
p-value: < 0.0001

4. Interpret
Reject H0. There is an association between type of Republican and
opinion.
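For readers without StatCrunch, the four steps can be approximated in plain Python. This is a sketch rather than the method used on the slides: the function names `chi_square_test` and `chi_square_upper_tail` are assumptions for this example, and the p-value uses a standard closed form of the chi-square upper tail that holds for whole-number degrees of freedom.

```python
import math


def chi_square_test(observed):
    """Return (X^2 statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / grand_total          # expected count for this cell
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def chi_square_upper_tail(statistic, df):
    """P(chi-square with integer df >= statistic), via finite sums."""
    y = statistic / 2
    if df % 2 == 0:
        # Even df: exp(-y) times a finite power series in y.
        terms = (y ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-y) * sum(terms)
    # Odd df: a normal-tail term plus a finite correction sum.
    terms = (
        y ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df + 1) // 2)
    )
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(terms)


# Rows: Yes, No. Columns: Liberal, Moderate, Conservative, Tea Party.
republican_observed = [
    [72, 335, 483, 120],
    [34, 205, 788, 292],
]
statistic, df = chi_square_test(republican_observed)
p_value = chi_square_upper_tail(statistic, df)
print(round(statistic, 2), df, p_value < 0.0001)  # statistic is about 151.59, df is 3
```

With 2 rows and 4 columns the degrees of freedom are (2 - 1)(4 - 1) = 3, and the tiny p-value leads to the same decision to reject H0.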



Example: Republican Views on Global
Warming

This was a test of independence. There is an
association between Republican type and
opinion. The study supports the headline, “Not
All Republicans Think Alike about Global
Warming.”

To Run a Test of Independence Using a
TI-84 Calculator
To run a test of independence on the TI-84 calculator:

1. Enter your data into a matrix.
2. Press STAT > TESTS, then select the χ²-Test option.
3. Press Calculate. The X² test statistic and p-value
will be displayed.

Example:
Education and Marital Status

Does a person’s educational level affect his or
her decision about marrying? A sample of 665
people was taken. Their marital status and
educational level were recorded. The data is
shown in the table. Are the variables marital
status and educational level independent?

The Data:
Education and Marital Status

Marital status   College or higher   HS    Less than HS
Divorced         15                  59    10
Married          98                  240   70
Single           27                  68    17
Widow/er         3                   30    28

Example:
Education and Marital Status

1. Hypothesize
H0: Marital status and educational level are
independent.
Ha: Marital status and educational level are associated.

2. Prepare We use technology to compute the test
statistic, p-value, and expected counts. We need to
check that the expected counts are all five or more.

All expected counts (in parentheses) are greater than
or equal to 5.

3. Compute to Compare
Test statistic: X² = 39.97
p-value: < 0.0001
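The expected-count check and the test statistic for this table can be reproduced with a short Python sketch. It is an illustration only; the variable names are assumptions, and the counts are copied from the education and marital status table.

```python
# Rows: Divorced, Married, Single, Widow/er.
# Columns: College or higher, HS, Less than HS.
education_observed = [
    [15, 59, 10],
    [98, 240, 70],
    [27, 68, 17],
    [3, 30, 28],
]

row_totals = [sum(row) for row in education_observed]
col_totals = [sum(col) for col in zip(*education_observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
education_expected = [
    [r * c / grand_total for c in col_totals] for r in row_totals
]

# Large-sample condition: every expected count must be at least 5.
smallest_expected = min(e for row in education_expected for e in row)
condition_met = smallest_expected >= 5

# Chi-square statistic and degrees of freedom.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(education_observed, education_expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(education_observed) - 1) * (len(education_observed[0]) - 1)

print(grand_total, condition_met, round(statistic, 2), df)
```

The smallest expected count (widowed, college or higher is not it; it is widowed, less than HS, at about 11.5) is comfortably above 5, and the table has (4 - 1)(3 - 1) = 6 degrees of freedom.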



Example:
Education and Marital Status

4. Interpret
We reject H0 because the p-value is small.
Marital status and educational level are
associated.

Drawback of the Chi-Square Test
The chi-square test reveals only if two variables
are associated, not how they are associated.

When both categorical variables have only two
categories, the data can instead be analyzed using a
two-proportion z-test, which gives more information
on how the variables are associated.

Example: AIDS Vaccine
In a study of a potential AIDS vaccine, 8200 volunteers were
randomly assigned to receive a vaccine against AIDS and another
8200 to receive a placebo. The number in each group who had
contracted AIDS at the end of 3 years was recorded. The data is
shown in the following table.

Outcome    Vaccine   No Vaccine   Total
AIDS       51        74           125
No AIDS    8149      8126         16275
Total      8200      8200         16400

Example: AIDS Vaccine
A Chi-square test could be used to determine if
vaccine and AIDS are independent. The
conclusion of this test could tell us there is an
association between the variables but not how
they are associated. What the researchers want
to know is if the vaccine is effective.

Example: AIDS Vaccine
Because both categorical variables Vaccine and
AIDS have only 2 outcomes (yes/no), the data
can be analyzed using a two-proportion z-test.
By testing the hypotheses:

$$H_0: p_{\text{vaccine}} = p_{\text{placebo}}$$
$$H_a: p_{\text{vaccine}} < p_{\text{placebo}}$$
the researchers can determine the direction of
the effect; in other words, whether the vaccine
was effective.
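A one-sided two-proportion z-test on the vaccine data might be sketched as follows. The slides do not report the result of this test, so the numbers printed here come only from this illustration. It assumes the usual pooled standard error under H0, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Test H0: p1 = p2 against Ha: p1 < p2 using a pooled standard error.

    Returns the z statistic and the lower-tail p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal lower tail: P(Z <= z).
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value


# 51 of 8200 vaccinated volunteers and 74 of 8200 placebo volunteers
# had contracted AIDS after 3 years.
z, p_value = two_proportion_z_test(51, 8200, 74, 8200)
print(round(z, 2), round(p_value, 3))  # roughly -2.07 and 0.019
```

The negative z value shows the direction of the difference (a lower infection proportion in the vaccine group), which is the extra information the chi-square test alone does not give.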

Section 10.3

READING RESEARCH PAPERS

• Discuss Methods to Critically Evaluate Published
Research



Some Guiding Principles
1. Pay attention to how randomness is used.
2. Don’t rely solely on the conclusions of any
single paper.
3. Extraordinary claims require extraordinary
evidence.
4. Be wary of conclusions based on very
complex statistical or mathematical models.
5. Stick to peer-reviewed journals.

Random vs. Non-Random Assignment

Randomness in study design makes certain
inferences possible and others not possible.



Reading Abstracts
An abstract is a short paragraph at the beginning
of a research article that describes its basic
findings. It often includes a description of:
• The methods used in the study
• The results of the study
• The conclusions of the researchers

Evaluating Abstracts
When reading an abstract, answer these
questions:
1. What is the research question that the
investigators are trying to answer?
2. What is their answer to the research
question?
3. What were the methods they used to collect
data?

Evaluating Abstracts
4. Is the conclusion appropriate for the
methods used to collect data?
5. To what population do the conclusions apply?
6. Have the results been replicated in other
articles? Are the results consistent with what
other researchers have suggested?

Beware of Data Dredging
Data dredging: the practice of stating a
hypothesis after first looking at the data. This
makes it more likely that we will mistakenly reject
the null hypothesis. If we first look at the test
statistic to decide what the hypothesis should be,
we are rigging the system in favor of the
alternative hypothesis.

Beware of Publication Bias
Publication bias: most scientific and medical
journals prefer to publish “positive” findings,
those in which the null hypothesis is rejected. If a
journal favors positive findings over negative
findings, then we may read only the studies that
find a drug works (for example), even if the
majority of researchers came to the opposite
conclusion in studies that went unpublished.

Beware of Profit Motive
Much statistical research is now paid for by
corporations that hope their products make life
better for people. Sometimes the corporation
funding the research can influence whether or
not results get published. Always evaluate the
methods used in the study and decide whether
those methods are sound.

Beware of the Media
Media often use “catchy” headlines that do not
always capture the true spirit of the study. The
most common problem is that the headlines
often suggest a cause-and-effect relationship
even though such a conclusion is not supported
by the data.

Clinical vs. Statistical Significance
An outcome of an experiment or study that is
large enough to have a real effect on people’s
health or lifestyle is said to have clinical
significance. Sometimes researchers discover
that a treatment effect is statistically significant
(meaning the outcome is too large to be plausibly
attributed to chance alone) but too small to be
meaningful (so it is not clinically significant).

Example:
Clinically vs. Statistically Significant

A rare disease affects only one person in 10
million. A controlled experiment finds that a
new drug “significantly reduces your risk of
getting this disease.” Given that the disease is
so rare, is it worth producing the drug to lower
your chance of getting the disease from one in
10 million to one in 20 million? This is a case
where the treatment may not be clinically
significant.
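The arithmetic behind this example can be laid out in a few lines of Python. This is an illustrative calculation only; the variable names are assumptions, and the two risks are the ones stated in the example.

```python
baseline_risk = 1 / 10_000_000   # one in 10 million without the drug
treated_risk = 1 / 20_000_000    # one in 20 million with the drug

# Relative reduction: the share of the original risk that is removed.
relative_reduction = 1 - treated_risk / baseline_risk

# Absolute reduction: the actual drop in each person's risk.
absolute_reduction = baseline_risk - treated_risk

# How many people would need to take the drug to prevent one case.
people_per_case_prevented = 1 / absolute_reduction

print(relative_reduction)                 # 0.5
print(round(people_per_case_prevented))   # 20000000
```

The drug halves the risk, which sounds impressive, but the absolute change is only 1 in 20 million, so about 20 million people would have to take it to prevent a single case.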