Analyzing Categorical Variables and Interpreting Research
Copyright © 2017, 2014 Pearson Education, Inc.
Introductory Example: Fair Die
We roll a die 60 times and record the number of spots.
The outcomes are shown in the table and graph below.
We can see that our outcomes were not exactly
what we would expect in a perfect world. We
will use a statistic, called the chi-square statistic,
to compare the real outcomes with the
expected outcomes. We use the chi-square
distribution to find p-values that tell us whether
we should be suspicious that our outcomes are
not matching our expectations.
Contingency Table
(Two-Way Table)
Example: Contingency Table
A sample of US adults and members of the American
Association for the Advancement of Science (AAAS)
were asked by the Pew Poll, “Is it safe to eat genetically
modified foods?” The results are shown in the
contingency table. (We assumed the sample size was
100 for each group.)
        US Adults    AAAS Scientists
Yes        37              88
No         63              12
Expected Counts
The expected counts are the numbers of
observations we would see in each cell of the
contingency table if the null hypothesis were
true.
Example: Expected Counts
In our previous example of the fair die, if the die
is rolled 60 times, we would expect 10 of each
outcome. The table below shows the expected
counts and the observed counts from the
experiment.
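As a rough sketch of where the expected counts come from, the snippet below divides the 60 rolls evenly across the six faces. The observed counts in it are hypothetical, invented only to illustrate the comparison; they are not the counts from the original table.

```python
# Expected counts for a fair six-sided die rolled 60 times (values from the slide).
n_rolls = 60
faces = [1, 2, 3, 4, 5, 6]

# Under the null hypothesis (a fair die) every face is equally likely.
expected = {face: n_rolls / len(faces) for face in faces}

# HYPOTHETICAL observed counts, made up for illustration only.
observed = {1: 12, 2: 8, 3: 10, 4: 11, 5: 9, 6: 10}

for face in faces:
    diff = observed[face] - expected[face]
    print(f"face {face}: observed {observed[face]:2d}, "
          f"expected {expected[face]:.1f}, difference {diff:+.1f}")
```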
Example: Finding Expected Counts
In the Pew Research poll, there are two categorical
variables: Background (US Adult or scientist) and Belief
in Safety of GMO (Yes/No). What counts should we
expect if these variables are truly not related to each
other?
        US Adults    AAAS Scientists
Yes        37              88
No         63              12
Starting with the GMO (Yes/No) variable:
125/200 (0.625) of the sample said “Yes” and
75/200 (0.375) of the sample said “No.”
There were 100 US Adults in the sample, so we
expect 0.625(100) = 62.5 to say “Yes” and
0.375(100) = 37.5 to say “No.”
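The same reasoning can be sketched in a few lines of Python. The dictionary layout and variable names are illustrative choices, not part of the original slides; the counts are the ones from the Pew table.

```python
# Observed counts from the Pew example (100 respondents per group).
observed = {
    "US Adults": {"Yes": 37, "No": 63},
    "AAAS Scientists": {"Yes": 88, "No": 12},
}

grand_total = sum(sum(group.values()) for group in observed.values())  # 200

# Overall share of each answer, ignoring background.
answer_share = {
    answer: sum(group[answer] for group in observed.values()) / grand_total
    for answer in ("Yes", "No")
}  # Yes: 0.625, No: 0.375

# If answer and background are unrelated, every group splits the same way.
expected = {
    name: {answer: answer_share[answer] * sum(group.values())
           for answer in answer_share}
    for name, group in observed.items()
}

for name, counts in expected.items():
    print(name, counts)
```

Because both groups have 100 respondents, both get the same expected counts: 62.5 "Yes" and 37.5 "No."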
Contingency Table Showing Observed
and Expected Counts
Notes about Expected Counts
In this example we started with the GMO variable. We could
also have started with the background variable and computed
the expected counts using the background percentages. For
example, 50% of the sample were AAAS Scientists, so we would
expect 50% of those saying “Yes” to be scientists. If the expected
counts are computed this way, the results are exactly the same.
The formula
$$\text{Expected count for a cell} = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$
can also be used to compute the expected counts, but in practice
the expected counts are computed using technology.
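A minimal sketch of the row-total-times-column-total formula, applied to the GMO table. The function name `expected_counts` and the list-of-rows layout are assumptions made for this illustration.

```python
def expected_counts(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Yes, No. Columns: US Adults, AAAS Scientists.
gmo = [[37, 88],
       [63, 12]]

print(expected_counts(gmo))  # [[62.5, 62.5], [37.5, 37.5]]
```

The result agrees with the counts obtained from the "Yes"/"No" percentages.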
The Chi-Square Statistic
The chi-square statistic measures the amount
that our expected counts differ from our
observed counts. The formula for the chi-square
statistic is
$$X^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}$$
where O is the observed count in each cell, E is
the expected count in each cell, and $\sum$ means to
add the results in each cell.
Example: Chi-Square Statistic
In our GMO example,
$$X^2 = \frac{(37 - 62.5)^2}{62.5} + \frac{(88 - 62.5)^2}{62.5} + \frac{(63 - 37.5)^2}{37.5} + \frac{(12 - 37.5)^2}{37.5} = 55.488$$
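The arithmetic can be checked with a short sketch. The cell ordering in the lists is an arbitrary choice for this illustration.

```python
observed = [37, 88, 63, 12]          # Yes/US, Yes/AAAS, No/US, No/AAAS
expected = [62.5, 62.5, 37.5, 37.5]  # expected counts under no association

# Each cell contributes (O - E)^2 / E to the statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

print([round(c, 3) for c in contributions])  # [10.404, 10.404, 17.34, 17.34]
print(round(chi_square, 3))                  # 55.488
```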
Finding the P-Value for the
Chi-Square Statistic
• The p-value is found using the chi-square
distribution.
• The chi-square distribution has only positive values
for test statistics and is right skewed.
• Like the t-distribution, the shape of the chi-square
distribution depends on a number called the degrees
of freedom.
The Chi-Square Distribution
The chi-square distribution provides a good
approximation to the sampling distribution of
the chi-square statistic only if the sample size is
large enough (if each expected count is five or
higher).
In practice, technology is used to compute the
chi-square statistic and the accompanying
p-value.
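As one possible stand-in for "technology," the sketch below computes the right-tail area of the chi-square distribution using only the Python standard library. The helper name `chi2_sf` is hypothetical, and the approach (a standard recurrence for the incomplete gamma function, valid for whole-number degrees of freedom) is an illustrative choice rather than the method used by any particular package.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Area to the right of x under a chi-square curve (integer df >= 1)."""
    half = x / 2
    if df % 2 == 0:
        a, tail = 1.0, exp(-half)          # exact tail for df = 2
    else:
        a, tail = 0.5, erfc(sqrt(half))    # exact tail for df = 1
    # Each step raises the degrees of freedom by 2.
    while a < df / 2:
        tail += half ** a * exp(-half) / gamma(a + 1)
        a += 1
    return tail


# GMO example: 2 rows and 2 columns give (2 - 1)(2 - 1) = 1 degree of freedom.
p_value = chi2_sf(55.488, 1)
print(p_value < 0.0001)  # True
```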
GMO Example: Conclusion
Chi-square statistic:
55.488
p-value: < 0.0001
US Adults and AAAS
scientists differ in their
support of GMO foods.
Section 10.2
Test of Homogeneity
• Collect two (or more) independent samples,
one from each population.
• Each object sampled has a categorical value
that is recorded.
Example: Collect a random sample of men and a
random sample of women. Ask each person
sampled if they agree that global warming is a
serious problem.
Test of Independence
• Collect only one sample.
• For objects in the sample we record two
categorical response variables.
Similarities in the Two Approaches
• In both situations we are interested in
knowing whether the two categorical
variables are related or unrelated.
• Use the same chi-square test statistic and the
same chi-square distribution to find the
p-value.
Example:
Homogeneity or Independence?
A polling organization asks a random sample of
people for their party affiliation (Democrat,
Republican, or other) and whether they think
the minimum wage should be raised. If the
organization wanted to test whether party
affiliation and opinion on minimum wage are
associated, would this be a test of homogeneity
or independence?
Tests of Homogeneity and
Independence
1. Hypothesize
H0: There is no association between the two
variables (the variables are independent).
Ha: There is an association between the two
variables (the variables are not independent).
2. Prepare
• Random samples
• Independent samples and observations
• Large samples: The expected counts must be
five or more in each cell of the table.
3. Compute to Compare
Test statistic is
$$X^2 = \sum \frac{(O - E)^2}{E}$$
Degrees of freedom = (#rows - 1)(#columns - 1)
The p-value comes from the chi-square distribution.
4. Interpret
If the p-value is less than or equal to the
significance level, we reject H0 and conclude
there is an association between the variables.
Otherwise we do not reject H0 and we cannot
conclude there is an association between the
variables.
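The four steps can be strung together in one function. This is a sketch under stated assumptions: the names `chi2_sf` and `chi_square_test` are hypothetical, the 0.05 significance level is an assumed default, and the randomness and independence conditions cannot be checked by code. It is demonstrated on the GMO table from earlier.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Right-tail area of the chi-square distribution (integer df >= 1)."""
    half = x / 2
    a, tail = (1.0, exp(-half)) if df % 2 == 0 else (0.5, erfc(sqrt(half)))
    while a < df / 2:
        tail += half ** a * exp(-half) / gamma(a + 1)
        a += 1
    return tail


def chi_square_test(table, alpha=0.05):
    """Chi-square test of association for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Prepare: every expected count must be at least 5.
    if min(min(row) for row in expected) < 5:
        raise ValueError("an expected count is below 5")

    # Compute to compare: statistic, degrees of freedom, p-value.
    statistic = sum((o - e) ** 2 / e
                    for o_row, e_row in zip(table, expected)
                    for o, e in zip(o_row, e_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    p_value = chi2_sf(statistic, df)

    # Interpret: reject H0 when the p-value is at or below alpha.
    return {"statistic": statistic, "df": df,
            "p_value": p_value, "reject_h0": p_value <= alpha}


result = chi_square_test([[37, 88], [63, 12]])
print(round(result["statistic"], 3), result["df"], result["reject_h0"])
# 55.488 1 True
```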
Example: Republican Views on Global
Warming
1. Is this a test of homogeneity or
independence?
2. Run a test to see if there is an association
between type of Republican and opinion.
3. The report on this survey is titled “Not All
Republicans Think Alike about Global
Warming.” Do the results of the hypothesis
test support this headline?
        Liberal       Moderate      Conservative    Tea Party
        Republican    Republican    Republican      Republican
Yes        72            335           483             120
No         34            205           788             292
1. Hypothesize
H0: Republican type and opinion are independent.
Ha: Republican type and opinion are not independent.
2. Prepare
Samples are random and independent. Check on the technology
output that all expected counts are greater than or equal to 5.
3. Compute to Compare
StatCrunch: Stats > Table > Contingency > with summary
4. Interpret
Reject H0. There is an
association between
type of Republican and
opinion.
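For readers without StatCrunch, the computation can be sketched in plain Python. Instead of a p-value, this sketch compares the statistic with 7.815, the standard table critical value for 3 degrees of freedom at the 0.05 level; that substitution and the 0.05 level are assumptions of this illustration, not something stated in the slides.

```python
types = ["Liberal", "Moderate", "Conservative", "Tea Party"]
observed = [
    [72, 335, 483, 120],   # Yes
    [34, 205, 788, 292],   # No
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Prepare: confirm every expected count is at least 5.
smallest_expected = min(min(row) for row in expected)

# Compute to compare.
chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(types) - 1)   # (2 - 1)(4 - 1) = 3

# Table critical value for df = 3 at the 0.05 level (assumed level).
CRITICAL_VALUE = 7.815
reject_h0 = chi_square >= CRITICAL_VALUE

print(f"smallest expected count: {smallest_expected:.1f}")
print(f"chi-square: {chi_square:.2f}, df: {df}, reject H0: {reject_h0}")
```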
To Run a Test of Independence Using a
TI-84 Calculator
To run a test of independence on the TI-84 calculator:
Example:
Education and Marital Status
The Data:
Education and Marital Status
             College or higher     HS     Less HS
Divorced            15             59       10
Married             98            240       70
Single              27             68       17
Widow/er             3             30       28
1. Hypothesize
H0: Marital status and educational level are
independent.
Ha: Marital status and educational level are associated.
All expected counts (in
parentheses) are greater than
or equal to 5.
3. Compute to Compare
Test statistic: $X^2 = 39.97$
p-value: < 0.0001
4. Interpret
We reject H0 because the p-value is small.
Marital status and educational level are
associated.
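The expected counts and the test statistic for this example can be reproduced with a short sketch. Variable names are illustrative; the counts come from the data table.

```python
rows = ["Divorced", "Married", "Single", "Widow/er"]
cols = ["College or higher", "HS", "Less HS"]
observed = [
    [15, 59, 10],
    [98, 240, 70],
    [27, 68, 17],
    [3, 30, 28],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Print observed counts with expected counts in parentheses.
for label, o_row, e_row in zip(rows, observed, expected):
    cells = "  ".join(f"{o:3d} ({e:6.2f})" for o, e in zip(o_row, e_row))
    print(f"{label:<9} {cells}")

smallest_expected = min(min(row) for row in expected)
chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))
df = (len(rows) - 1) * (len(cols) - 1)

print(round(chi_square, 2), df)  # 39.97 6
```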
Drawback of the Chi-Square Test
The chi-square test reveals only if two variables
are associated, not how they are associated.
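One common way to explore how the variables are associated, offered here as a suggestion rather than something the slides prescribe, is to compare percentages within each group. The sketch below does this for the Republican table from the earlier example.

```python
types = ["Liberal", "Moderate", "Conservative", "Tea Party"]
yes = [72, 335, 483, 120]
no = [34, 205, 788, 292]

# Percentage answering "Yes" within each type of Republican.
percent_yes = {
    name: round(100 * y / (y + n), 1)
    for name, y, n in zip(types, yes, no)
}

for name, pct in percent_yes.items():
    print(f"{name:<12} {pct:.1f}% Yes")
```

The percentages fall steadily from Liberal to Tea Party Republicans, a pattern the chi-square statistic alone does not describe.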
Example: AIDS Vaccine
In a study of a potential AIDS vaccine, 8200 volunteers were
randomly assigned to receive a vaccine against AIDS and another
8200 to receive a placebo. The number in each group who had
contracted AIDS at the end of 3 years was recorded. The data is
shown in the following table.
A Chi-square test could be used to determine if
vaccine and AIDS are independent. The
conclusion of this test could tell us there is an
association between the variables but not how
they are associated. What the researchers want
to know is if the vaccine is effective.
Because both categorical variables Vaccine and
AIDS have only 2 outcomes (yes/no), the data
can be analyzed using a two-proportion z-test.
By testing the hypotheses: [hypotheses not recovered from the slide]
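The link between the two tests can be illustrated on a 2×2 table. Because the vaccine counts are not reproduced in this text, the sketch below uses the GMO table from earlier instead. The function name `two_proportion_z` is hypothetical. For a 2×2 table, the square of the pooled two-proportion z statistic equals the chi-square statistic, while the sign of z also shows the direction of the difference.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# GMO table: 37 of 100 US adults and 88 of 100 AAAS scientists said "Yes."
z = two_proportion_z(37, 100, 88, 100)

print(round(z, 3))       # -7.449 (negative: US adults less likely to say "Yes")
print(round(z ** 2, 3))  # 55.488, the chi-square statistic for the same table
```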
Section 10.3
Random vs. Non-Random Assignment
Evaluating Abstracts
When reading an abstract, answer these
questions:
1. What is the research question that the
investigators are trying to answer?
2. What is their answer to the research
question?
3. What were the methods they used to collect
data?
4. Is the conclusion appropriate for the
methods used to collect data?
5. To what population do the conclusions apply?
6. Have the results been replicated in other
articles? Are the results consistent with what
other researchers have suggested?
Beware of Data Dredging
Data Dredging: the practice of stating a
hypothesis after first looking at the data. This
makes it more likely that we will mistakenly reject
the null hypothesis. If we first look at the test
statistic to decide what the hypothesis should be,
we are rigging the system in favor of the
alternative hypothesis.
Beware of Publication Bias
Publication Bias: most scientific and medical
journals prefer to publish “positive” findings,
those in which the null hypothesis is rejected. If a
journal favors positive findings over negative
findings, then we may read only the studies that
find a drug works (for example), even if the
vast majority of researchers came to the
opposite conclusion in studies that went unpublished.
Beware of Profit Motive
Much statistical research is now paid for by
corporations that hope their products make life
better for people. Sometimes the corporation
funding the research can influence whether or
not results get published. Always evaluate the
methods of the study used and decide whether
those methods are sound.
Beware of the Media
Media often use “catchy” headlines that do not
always capture the true spirit of the study. The
most common problem is that the headlines
often suggest a cause-and-effect relationship
even though such a conclusion is not supported
by the data.
Clinical vs. Statistical Significance
An outcome of an experiment or study that is
large enough to have a real effect on people’s
health or lifestyle is said to have clinical
significance. Sometimes researchers discover
that a treatment effect is statistically significant
(meaning the outcome is too large to be plausibly due to
chance alone) but too small to be meaningful (so it is
not clinically significant).
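A hypothetical illustration of the distinction: with very large samples, even a tiny difference can be statistically significant. All numbers below are invented for this sketch (one million patients per group, recovery rates of 50.0% and 50.3%), and the two-sided pooled two-proportion z-test is an assumed choice of method.

```python
from math import erfc, sqrt

# HYPOTHETICAL data, made up for illustration only.
n_control, recovered_control = 1_000_000, 500_000      # 50.0% recover
n_treatment, recovered_treatment = 1_000_000, 503_000  # 50.3% recover

p_control = recovered_control / n_control
p_treatment = recovered_treatment / n_treatment
pooled = (recovered_control + recovered_treatment) / (n_control + n_treatment)
standard_error = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))

z = (p_treatment - p_control) / standard_error
p_value = erfc(abs(z) / sqrt(2))   # two-sided p-value from the normal curve

print(f"difference in recovery rates: {p_treatment - p_control:.3f}")
print(f"z = {z:.2f}, statistically significant at 0.05: {p_value < 0.05}")
```

The difference is statistically significant, yet an improvement of 0.3 percentage points may be too small to matter to patients.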
Example:
Clinically vs. Statistically Significant