
Section 11.2 - Complete

Comparing Counts
In Canada there is universal health care. In the US, well... However, in the
US more elaborate treatments are offered to those patients with access. A
study took two random samples, 2165 US and 311 Canadian heart attack
patients, and a year after their heart attack asked for the patients' own
assessment of their quality of life relative to what it had been before
the heart attack:

It is difficult (and inaccurate) to compare the counts because the sample
sizes are much different.

Here are the percents of each sample with each outcome:

These are the conditional distributions of outcomes, given the patients'
nationality.
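To illustrate why percents are compared instead of raw counts, here is a minimal sketch that turns a two-way table of counts into conditional distributions, one per column. The counts and the names `counts` and `conditional_percents` are hypothetical and chosen only for illustration; they are not the study's data.

```python
# Hypothetical counts for illustration only; NOT the study's data.
# Each row is an outcome; the two entries are the counts in sample A and sample B.
counts = {
    "better": [30, 120],
    "same": [50, 200],
    "worse": [20, 80],
}

def conditional_percents(table):
    """Return each column's counts as percents of that column's total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for label, row in table.items()
    }

percents = conditional_percents(counts)
for label, row in percents.items():
    print(label, row)
```

In this made-up table the raw counts look very different (sample B is four times larger), yet the two conditional distributions are identical, which is why the comparison should be made on percents.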
We can compare these distributions in a bar graph and see if there are any
differences between the two distributions. Our goal is to determine if
there is a significant difference in the distribution of quality of life between
the two populations.


The problem of multiple comparisons


Ultimately we want to know if there is a difference between the
distributions of outcomes in Canada and the United States.

Our null hypothesis is that there is no difference between the populations
in terms of distribution of quality of life.
Our alternative hypothesis is that there is a difference between the
populations in terms of distribution of quality of life, BUT it does not
specify any particular kind of difference. This means it is neither
one-sided nor two-sided...

OPTION 1: Compare two proportions at a time using the two-
sample z test for proportions. For the US and Canada example,
this means five tests in all with five different p-values.
BAD IDEA - we don't get an overall conclusion... what if we
wanted to compare three countries...

OPTION 2: Conduct an overall test to see if there is good
evidence of any differences among the parameters that we
want to compare. Do a detailed follow-up analysis if necessary
to decide which of the parameters differ significantly and to
estimate how large the differences are.

We will mainly focus on the overall test.
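One further reason to prefer the overall test can be sketched numerically: running several separate tests inflates the chance of at least one false alarm. The calculation below assumes a 5% significance level for each test and treats the five tests as independent; both are assumptions made here for illustration (the five tests on one table are not truly independent, so the result is only a rough guide).

```python
alpha = 0.05   # assumed significance level for each separate test
n_tests = 5    # five two-proportion z tests, one per outcome category

# Chance that at least one of the tests rejects a true null hypothesis,
# under the simplifying assumption that the tests are independent.
p_any_false_alarm = 1 - (1 - alpha) ** n_tests
print(round(p_any_false_alarm, 3))
```

Under these assumptions the chance of at least one false alarm is roughly 23%, well above the 5% used for each individual test.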


Expected counts in two-way tables:


The formula is more complicated than the basic idea. The expected count
in any cell of a two-way table when H0 is true is:

Expected count = (Row total x Column total) / Table total

Let's add a column in our table that gives the row totals:

Use the formula to determine the expected counts for the
US/Canada example:
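The expected-count formula can be sketched in a few lines of Python. The function name `expected_counts` and the 3x2 table are hypothetical and used only to show the arithmetic; substitute the observed counts from the US/Canada table to work the example.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical 3x2 table of observed counts (NOT the study's data).
observed = [
    [20, 30],
    [30, 20],
    [50, 50],
]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Here the row totals are 50, 50 and 100, the column totals are both 100, and the table total is 200, so for instance the first cell's expected count is 50 x 100 / 200 = 25.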


The formula for the χ² test statistic is the same as it was for the χ²
Goodness of Fit test.
Use it to find the χ² contribution for each of the 10 cells in the U.S./
Canada example.

Calculate the overall value of χ².
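As a sketch of this calculation, each cell contributes (Observed - Expected)² / Expected, and the overall χ² is the sum of those contributions. The table below is hypothetical (not the study's data), and the helper name `chi_square_components` is made up for this illustration.

```python
def chi_square_components(observed, expected):
    """Per-cell contributions (observed - expected)^2 / expected."""
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]

# Hypothetical 3x2 table (NOT the study's data) with its expected counts,
# each expected count being row total * column total / table total.
observed = [[20, 30], [30, 20], [50, 50]]
expected = [[25.0, 25.0], [25.0, 25.0], [50.0, 50.0]]

components = chi_square_components(observed, expected)
chi_square = sum(sum(row) for row in components)

print(components)
print(chi_square)
```

In this made-up table the first four cells each contribute 1 and the last two contribute 0, so the overall statistic is 4.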

In order to calculate the p-value (in the same way as we did for
the χ² GOF test) we need to determine the degrees of
freedom.

For these tests (where we are using a two-way table for our
data) the degrees of freedom is found by taking the product of
1 less than the number of rows (r) and 1 less than the number
of columns (c).

df = (r - 1)(c - 1)
What are the degrees of freedom for the χ² distribution for the
US/Canada example?

Calculate the p-value for the US/Canada example:
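The degrees of freedom and the p-value can be sketched as follows. The p-value is the area to the right of the test statistic under the χ² curve. The helper below uses a closed form that is valid only when the degrees of freedom are even, which is a simplification chosen here to avoid extra libraries; for other cases a calculator or statistics package is needed. The function names and the statistic 4.0 are hypothetical, and the table shape (5 outcome rows, 2 country columns) is inferred from the 10 cells mentioned in these notes.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for a two-way table."""
    return (n_rows - 1) * (n_cols - 1)

def chi_square_p_value_even_df(statistic, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = statistic / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

df = degrees_of_freedom(5, 2)   # 5 outcome rows, 2 country columns (inferred)
p_value = chi_square_p_value_even_df(4.0, df)   # 4.0 is a made-up statistic

print(df)
print(round(p_value, 4))
```

Replace the made-up statistic with the χ² value computed from the US/Canada table to obtain the p-value for the example.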


We can run the entire test on our calculator.


1. Enter your two-way table of OBSERVED values into Matrix A
(press 2nd x⁻¹ to get the "MATRIX" menu; select EDIT, highlight
[A] and press enter; enter the dimensions of your table
without the "total" column or row (rows first, then
columns); then enter your data)

2. Run the χ² test (go to the STAT menu and highlight TESTS; scroll
down to χ²-Test and press enter; make sure it says [A] by
Observed and [B] by Expected; select Calculate)

3. You can see the two-way table of expected values in Matrix B;
the calculator computes these for you when it runs the test. (Press
2nd x⁻¹ to get the "MATRIX" menu; select EDIT; highlight [B] and
press enter; the expected counts should appear)

Residuals
It is often useful to know which cells contributed the most to the
overall test statistic. This gives us a sense as to where our model
doesn't fit.

We examine this by looking at the residual for each cell:

Residual = (Observed - Expected) / sqrt(Expected)

Notice this is just the square root of the chi-square component for
each cell, with the sign of (Observed - Expected).

These are standardized, so consider the 68-95-99.7 rule to
estimate which differences are significant.
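A minimal sketch of the residual calculation, taking each residual as (Observed - Expected) / sqrt(Expected). The table is hypothetical (not the study's data), the helper name `standardized_residuals` is made up, and the cutoff of 2 is a rough guide taken from the 68-95-99.7 rule, not a formal test.

```python
import math

def standardized_residuals(observed, expected):
    """Residual for each cell: (observed - expected) / sqrt(expected)."""
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]

# Hypothetical 3x2 table (NOT the study's data) with its expected counts,
# each expected count being row total * column total / table total.
observed = [[9, 41], [30, 20], [61, 39]]
expected = [[25.0, 25.0], [25.0, 25.0], [50.0, 50.0]]

residuals = standardized_residuals(observed, expected)

# Rough guide: flag cells more than 2 standard units from what the model expects.
large = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, r in enumerate(row)
    if abs(r) > 2
]

print(residuals)
print(large)
```

In this made-up table only the two cells in the first row are flagged, and the signs show the direction: the first cell has fewer observations than expected and the second has more.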


Calculate the residuals for the US/Canada example in order to
determine which cells showed the most
significant differences from the model.

Test for Homogeneity vs. Test for Independence


Test for Homogeneity

Independent SRSs from each of several populations, with each
individual classified according to one categorical variable. (The other
variable says which sample the individual comes from.)

H0: The distribution of variable _______ is the same for all
values of variable _______

Ha: The distribution of variable _______ differs for values of
variable _______

Test for Independence

A single SRS, with each individual classified according to both of two
categorical variables.

H0: Variable _____ and variable ______ are independent

Ha: Variable _____ and variable ______ are NOT independent


Conditions
1. Randomization Condition - random sample from some
population. If you don't want to generalize, you don't need
this.

2. Expected Cell Frequency Condition (Large Sample Size) -
expected counts in each cell must be at least 5

3. Independence Condition - individuals must be
independent; if sampling without replacement, check the 10%
condition
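The expected cell frequency condition is easy to check mechanically. The sketch below computes the expected counts (row total x column total / table total) and verifies that every one is at least 5. Both tables and the function names are hypothetical and serve only as an illustration.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

def meets_expected_count_condition(observed, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)

# Two hypothetical tables (NOT real data): one passes, one fails.
large_table = [[20, 30], [30, 20], [50, 50]]
sparse_table = [[1, 9], [2, 88]]

print(meets_expected_count_condition(large_table))
print(meets_expected_count_condition(sparse_table))
```

Note that the check is made on the expected counts, not the observed ones: the sparse table fails because its smallest expected count is 10 x 3 / 100 = 0.3.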

A study of the career plans of young women and men sent questionnaires to
386 RANDOM business majors. One question asked which major
within the business program the student had chosen. Here are the data
from the students who responded. Is the distribution of major selection
different for males and females?


Section 11.2 Homework:


p. 724 #s 19 - 22 all, 27, 29, 31, 33, 35, 43, 45, 49, 51, 53 - 58 all
