
The Chi-Square Statistic

Categorical data analysis
O Categorical scales are pervasive in the
social sciences for measuring attitudes and
opinions.
O They also occur frequently in the health sciences,
for measuring responses such as
O whether a patient survives an operation (yes,
no),
O severity of an injury (none, mild, moderate,
severe), and
O stage of a disease (initial, advanced).
Cont…
O Methods for a response (dependent) variable
Y whose scale is a set of categories
O Explanatory variables may be categorical or
continuous or both
O Example: Y=ANC attendance (yes, no)
O X’s - income, education, gender, race
O Two types of categorical variables
O Nominal - unordered categories
O Ordinal - ordered categories

Parametric and Nonparametric Tests

O The term "non-parametric" refers to the fact
that the chi-square tests do not require
assumptions about population parameters, nor do
they test hypotheses about population parameters.

Nonparametric Statistics
O A special class of hypothesis tests
O Used when assumptions for parametric tests are
not met
O Review: What are the assumptions for parametric
tests?
Assumptions for Parametric Tests

O Dependent variable is a scale variable (interval or ratio)
O If the dependent variable is ordinal or nominal, it is a
non-parametric test
O Participants are randomly selected
O If there is no randomization, it is a non-parametric
test
O The underlying population distribution is normal
O If the shape is not normal, it is a non-parametric test
cont.
O The most obvious difference between the
chi-square tests and the other hypothesis tests we
have considered (t and ANOVA) is the nature of the
data.
O The chi-square statistic is used in non-parametric
hypothesis tests.
O For chi-square, the data are frequencies rather
than numerical scores.

chi-square statistic

O The chi-square test for:
O Tests of goodness-of-fit
O Tests of independence (association)
O Tests of homogeneity

The Chi-Square Test for Goodness-of-Fit
O Chi-square Goodness-of-fit Test is used to
test if an observed distribution conforms to
any particular distribution.
O Calculation of this goodness of fit test is by
comparison of observed data with data
expected based on the particular
distribution
O It answers the question of how well
experimental data fit expectations.

Cont’d
O Example: In tossing a coin, you expect half heads
and half tails. You tossed a coin 100 times. You
expected 50 heads and 50 tails. However, you
obtained 48 heads and 52 tails. Are 48 heads
and 52 tails close enough to call the coin fair?
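
The coin example can be checked with a short calculation. This is a minimal sketch in plain Python, assuming the usual Pearson form of the statistic (sum of (fo - fe)^2 / fe) and df = number of categories minus one; the function name `chi_square_gof` is made up for illustration and does not come from the slides.

```python
import math

def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (fo - fe)^2 / fe over categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

observed = [48, 52]  # heads, tails obtained in 100 tosses
expected = [50, 50]  # counts expected from a fair coin

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # two categories, so one degree of freedom

# For df = 1 only, the upper-tail probability has a closed form via erfc.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

The statistic comes out at 0.16, far below the tabled .05 critical value of 3.841 for df = 1, so 48 heads and 52 tails are consistent with a fair coin.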

Cont’d
O The chi-square test for goodness-of-fit uses
frequency data from a sample to test
hypotheses about the shape or proportions
of a population.
O Each individual in the sample is classified
into one category on the scale of
measurement.
O The data, called observed frequencies,
simply count how many individuals from
the sample are in each category.
            MI
Group   Yes    No     Total
New     a      b      a+b
Old     c      d      c+d
Total   a+c    b+d    n=a+b+c+d

The Concept of Expected Frequencies

O Expected frequencies (fe): the cell frequencies that
would be expected in a bivariate table if the two
variables were statistically independent.

O Observed frequencies (fo): the cell frequencies
actually observed in a bivariate table.
Determining the Expected Frequencies (fe)
O First, add the columns and the rows to get the
totals as shown in the previous slide.
O To obtain the expected frequency within a
particular cell, perform the following operation:
Multiply the row total and the column total for the
cell in question and then divide that product by
the Total number of all respondents.

fe = (column marginal)(row marginal) / N
Calculating the Expected Value for a Particular Cell

                 Attitude Toward Bill
             For              Undecided   Against   Total
Republican   38 (fe = 55.7)   17          95        150
Democrat     92               18          90        200
Total        130              35          185       350

1. 130*150 = 19500   2. 19500/350 = 55.7



Political Party and Attitude toward Bill

                 Attitude Toward Bill
             For         Undecided   Against      Total
Republican   38 (55.7)   17 (15)     95 (79.3)    150
Democrat     92 (74.3)   18 (20)     90 (105.7)   200
Total        130         35          185          350

Plain numbers are observed (fo);
numbers in parentheses are expected (fe).
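
The expected frequencies in the party-by-attitude table can be reproduced by applying the row-total times column-total over N rule to every cell. A minimal sketch in plain Python follows; the helper name `expected_frequencies` is an illustrative choice, not something from the slides.

```python
observed = [
    [38, 17, 95],  # Republican: For, Undecided, Against
    [92, 18, 90],  # Democrat:   For, Undecided, Against
]

def expected_frequencies(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
```

Rounded to one decimal place, the two printed rows match the expected values shown in the table (55.7, 15, 79.3 and 74.3, 20, 105.7).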
The Chi-Square Test for
Independence
O The second chi-square test, the chi-square test for
independence, can be used and interpreted in two
different ways:
1. Testing hypotheses about the relationship
between two variables in a population, or
2. Testing hypotheses about differences between
proportions for two or more populations.

Statistical Independence
O Independence (statistical): the absence of
association between two cross-tabulated variables.
The percentage distributions of the dependent
variable within each category of the independent
variable are identical.
Independence Demonstrated
O Suppose we are interested in the relationship
between gender and attending college.

O If there is no relationship between gender and
attending college and 40% of our total sample
attend college, we would expect 40% of the
males in our sample to attend college and 40%
of the females to attend college.

O If there is a relationship between gender and
attending college, we would expect a higher
proportion of one group to attend college than
the other group, e.g. 60% to 20%.
Displaying Independent and Dependent Relationships
When the variables are independent, the proportion
in both groups is close to the same size as the
proportion for the total sample.

When group membership makes a difference, the
dependent relationship is indicated by one group
having a higher proportion than the proportion for
the total sample.

[Figure: Independent Relationship between Gender and College. Proportion attending college: 40% for males, females, and the total sample.]
[Figure: Dependent Relationship between Gender and College. Proportion attending college: 60% in one group, 20% in the other, 40% for the total sample.]
Independent and Dependent Variables
O The two variables in a chi-square test of
independence each play a specific role.
O The group variable is also known as the independent
variable because it has an influence on the test variable.
O The test variable is also known as the dependent variable
because its value is believed to be dependent on the
value of the group variable.
O The chi-square test of independence is a test of
the influence or impact that a subject’s value on
one variable has on the same subject’s value for a
second variable.
Hypothesis Testing with Chi-Square
Chi-square follows five steps:
1. Making assumptions (random sampling)
2. Stating the research and null hypotheses
and selecting alpha
3. Selecting the sampling distribution and
specifying the test statistic
4. Computing the test statistic
5. Making a decision and interpreting the
results
Step 1. Assumptions for the Chi-square Test
O The chi-square Test of Independence can be used
for variables at any level of measurement, including
interval-level variables grouped in a frequency
distribution. It is most useful for nominal variables,
for which we do not have another option.
O Assumptions: The sample is random, and no cell has
an expected frequency less than 5.
O If these assumptions are violated, the chi-square
distribution will give us misleading probabilities.
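
The expected-frequency assumption can be checked before running the test. This is a sketch under stated assumptions: the helper names are made up, and the counts in `small_table` are hypothetical, invented only to show what a violation looks like.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def cells_below_minimum(table, minimum=5):
    """Return (row, column, fe) for every cell whose expected count is below minimum."""
    return [
        (i, j, fe)
        for i, row in enumerate(expected_frequencies(table))
        for j, fe in enumerate(row)
        if fe < minimum
    ]

# Hypothetical counts, not taken from the slides.
small_table = [[3, 7], [2, 18]]
for i, j, fe in cells_below_minimum(small_table):
    print(f"cell ({i}, {j}): expected {fe:.2f} is below 5")
```

An empty result means every cell meets the minimum; any listed cell suggests the chi-square probabilities may be unreliable for that table.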
Step 2. Stating Research and null
Hypotheses and alpha
O The research hypothesis (H1) states that the two
variables are dependent or related. This will be true if
the observed counts for the categories of the variables
in the sample are different from the expected counts.
O The null hypothesis (H0) states that no association
exists between the two cross-tabulated variables in the
population, and therefore the variables are statistically
independent.
O The amount of difference needed to make a decision
about difference or similarity is the amount
corresponding to the alpha level of significance, which
will be either 0.05 or 0.01. The value to use will be
stated in the problem.
H1: The two variables are related in the population.
Gender and fear of walking alone at night are
statistically dependent.

Afraid   Men     Women   Total
No       83.3%   57.2%   71.1%
Yes      16.7%   42.8%   28.9%
Total    100%    100%    100%

H0: There is no association between the two
variables.
Gender and fear of walking alone at night are
statistically independent.

Afraid   Men     Women   Total
No       71.1%   71.1%   71.1%
Yes      28.9%   28.9%   28.9%
Total    100%    100%    100%
Step 3. Sampling distribution
and test statistic
O To test the relationship, we use the chi-square
test statistic, which follows the chi-square
distribution.

O If we were calculating the statistic by hand,
we would have to compute the degrees of
freedom to identify the probability of the test
statistic. SPSS will print out the degrees of
freedom and the probability of the test
statistic for us.
The Sampling Distribution of Chi-Square
O The distributions are positively skewed. The
research hypothesis for the chi-square is
always a one-tailed test.
O Chi-square values are always positive. The
minimum possible value is zero, with no
upper limit to its maximum value.
O As the number of degrees of freedom
increases, the χ² distribution becomes more
symmetrical.
Step 4. Computing the Test Statistic
O Conceptually, the chi-square test of independence
statistic is computed by summing, over every cell in
the table, the squared difference between the
observed and expected frequencies divided by the
expected frequency for that cell.

O We identify the value and probability for this test
statistic from the SPSS statistical output.
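
Written as a formula, with fo and fe the observed and expected frequencies defined earlier, the statistic takes the standard Pearson form. The "cells" summation index is notation introduced here for illustration, not taken from the slides.

```latex
\chi^2 = \sum_{\text{cells}} \frac{(f_o - f_e)^2}{f_e}
```

For example, the Republican/For cell of the party-by-attitude table contributes (38 - 55.7)^2 / 55.7, roughly 5.6, to the sum.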
Determining the Degrees of Freedom

df = (r – 1)(c – 1)
where
r = the number of rows
c = the number of columns
Step 5. Decision and
Interpretation
O If the probability of the test statistic is less than
or equal to the alpha error rate, we reject the null
hypothesis and conclude that our data support the
research hypothesis. We conclude that there is a
relationship between the variables.

O If the probability of the test statistic is greater
than the alpha error rate, we fail to reject the null
hypothesis. We conclude that the data do not
provide sufficient evidence of a relationship
between the variables, i.e. they are consistent
with independence.
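
The five steps can be traced end to end on the party-by-attitude table from the earlier slides. This is a minimal sketch, assuming alpha = 0.05 (the slides say the value is given in each problem) and using the fact that for exactly 2 degrees of freedom the chi-square upper-tail probability is exp(-x/2); other df values would need a table or a statistics package.

```python
import math

observed = [[38, 17, 95],   # Republican: For, Undecided, Against
            [92, 18, 90]]   # Democrat:   For, Undecided, Against
alpha = 0.05                # assumed significance level

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (fo - fe)^2 / fe over every cell.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)

# The closed form below is valid only when df == 2.
assert df == 2
p_value = math.exp(-chi2 / 2)

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.5f} -> {decision}")
```

Here the probability falls well below alpha, so the null hypothesis of independence between party and attitude is rejected.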
χ² Test of Independence
Example on HIV
O You randomly sample 286 sexually active
individuals and collect information on their
HIV status and History of STDs. At the .05
level, is there evidence of a relationship?

χ² Test of Independence Solution
H0: No Relationship
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s): χ² = 3.841
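
The critical value for this setup can be computed rather than looked up. The sketch below relies on a property that holds only for df = 1: a chi-square variable with one degree of freedom is the square of a standard normal, so its upper-alpha critical value is the squared two-sided z critical value. It uses only Python's standard library.

```python
from statistics import NormalDist

alpha = 0.05
df = (2 - 1) * (2 - 1)  # 2 x 2 table

# Valid for df = 1 only: square the two-sided standard normal critical value.
z = NormalDist().inv_cdf(1 - alpha / 2)
critical_value = z ** 2

print(f"df = {df}, critical value = {critical_value:.3f}")
```

A computed test statistic larger than this value leads to rejecting H0 at the .05 level.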

Assignment

O Research question: Is there an association
between treatment group and MI?

Actual     No MI   MI   Total
New TRT    12      8    20
Old TRT    4       16   20
Total      16      24   40
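
A hand calculation for this table can be checked with the shortcut formula for 2 x 2 tables, using the a, b, c, d cell labels from the earlier layout slide. This is a sketch for verification only: the shortcut is algebraically equal to summing (fo - fe)^2 / fe over the four cells, and the erfc expression for the probability is valid only for df = 1.

```python
import math

# Rows: New TRT, Old TRT. Columns: No MI, MI.
a, b = 12, 8
c, d = 4, 16
n = a + b + c + d

# Shortcut for a 2 x 2 table: n(ad - bc)^2 / (product of the four marginals).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

df = (2 - 1) * (2 - 1)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, df = 1 only

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

The statistic is about 6.67 with a probability near 0.01, which can be compared against the chosen alpha level.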
O The most common tests are the Chi-square test
and Fisher’s Exact test.
O The classic test is Pearson’s chi-squared test of
independence
O Conservatively, we require expected counts ≥ 5
in every cell (i, j)
O Sample size should also be > 20
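
When expected counts are too small for the chi-square test, Fisher's Exact test works directly from the hypergeometric distribution. The sketch below is one common two-sided definition (sum the probabilities of all tables with the same margins that are no more likely than the observed table); the function name is hypothetical, and statistical packages may use slightly different conventions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1
    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    # Integer weights avoid floating-point ties between equally likely tables.
    weights = {
        k: comb(col1, k) * comb(col2, row1 - k)
        for k in range(lowest, highest + 1)
    }
    observed_weight = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed_weight)
    return extreme / comb(n, row1)

# Treatment-by-MI table from the assignment slide.
p_value = fisher_exact_two_sided(12, 8, 4, 16)
print(f"Fisher exact two-sided p = {p_value:.4f}")
```

The exact probability is somewhat larger than the chi-square approximation for the same table, which is typical for small samples.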
