

the right conditions, this map can produce the standard
S-shaped trajectory that is characteristic of all logistic
structures. However, oscillations in the trajectory occur
when the value of the parameter a is sufficiently large.
For example, when the value of parameter a is set to
2.8, the trajectory of the model oscillates around the
equilibrium value Y* at which Y_{t+1} = Y_t = Y* while it
converges asymptotically toward this equilibrium limit.
But when the value of the parameter a is set equal
to, say, 4.0, the resulting longitudinal trajectory never
settles down toward the equilibrium limit and instead
continues to oscillate irregularly around the equilib-
rium in what seems to be a random manner that is
caused by a deterministic process.
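Assuming the map in question is the standard logistic map Y_{t+1} = a Y_t (1 − Y_t) analyzed by May (1976), this contrast between damped oscillation and chaos can be sketched in a few lines. The function name and starting value below are illustrative choices, not from the original text.

```python
def logistic_map(a, y0, steps):
    """Iterate Y_{t+1} = a * Y_t * (1 - Y_t) and return the trajectory."""
    traj = [y0]
    for _ in range(steps):
        traj.append(a * traj[-1] * (1 - traj[-1]))
    return traj

# a = 2.8: damped oscillation converging to the equilibrium Y* = 1 - 1/a
conv = logistic_map(a=2.8, y0=0.2, steps=200)

# a = 4.0: bounded in [0, 1] but never settles (deterministic chaos)
chaos = logistic_map(a=4.0, y0=0.2, steps=200)
```

Printing successive values of `conv` shows the over-and-under oscillation around Y* described above, while `chaos` wanders over almost the whole unit interval without repeating.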
The most famous continuous-time model that
exhibits chaotic behavior is the so-called Lorenz
attractor (Lorenz, 1963). This model is an inter-
dependent nonlinear system involving three first-order
differential equations, and it was originally used to
analyze meteorological phenomena.
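The three Lorenz equations, with the classic parameters sigma = 10, rho = 28, and beta = 8/3, can be sketched with a simple explicit-Euler integration. The step size and starting point below are illustrative, and a production integrator would use a higher-order method such as Runge-Kutta.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One explicit-Euler step of the Lorenz system:
    dx/dt = sigma * (y - x)
    dy/dt = x * (rho - z) - y
    dz/dt = x * y - beta * z
    """
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)

# Integrate from an arbitrary starting point; the trajectory stays bounded
# on the attractor but never settles into a fixed point or simple cycle.
state = (1.0, 1.0, 1.0)
trajectory = [state]
for _ in range(5000):
    state = lorenz_step(state)
    trajectory.append(state)
```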
Models with forced oscillators are sometimes good
candidates for exhibiting chaotic or near chaotic
(i.e., seemingly random) longitudinal properties. In
the social sciences, such a model has been devel-
oped and explored by Brown (1995b, chap. 6). This
model is a nonlinear system of four interdependent
differential equations that uses a forced oscillator
with respect to a parameter specifying alternating
partisan control of the White House (or other rele-
vant governmental institution). The dynamics of the
system are investigated with regard to longitudinal
damage to the environment, public concern for envi-
ronmental damage, and the cost of cleaning up the
environment. Variations in certain parameter values
yield a variety of both stable and unstable nonlinear
dynamic behaviors, including behaviors that have
apparent random-like properties typically associated
with chaos.
Courtney Brown
REFERENCES
Brown, C. (1995a). Chaos and catastrophe theories. Thousand Oaks, CA: Sage.
Brown, C. (1995b). Serpents in the sand: Essays on the nonlinear nature of politics and human destiny. Ann Arbor: University of Michigan Press.
Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20, 130–141.
May, R. M. (1976). Simple mathematical models with very complicated dynamics. Nature, 261, 459–467.
CHI-SQUARE DISTRIBUTION
The chi-square distribution is a distribution that
results from the sums of squared standard normal vari-
ables. Even though the sums of normally distributed
variables have a normal distribution (with the shape
of a bell-shaped curve), the sums of their squares do
not. The commonly used chi-square distribution refers
to one type, known as the central chi-square distribution.
The following discussion focuses on this type. As both
negative and positive values square to a positive num-
ber, the curve only covers nonnegative numbers, with
its left-hand tail starting from near zero and with a skew
to the right. The curve is specified by its degrees of
freedom (df), which is the number of unconstrained
variables whose squares are being summed. The curve
has its mean equal to the df, and the square root of twice
the df forms its standard deviation. The highest point
of the curve occurs at df − 2 (for df greater than 2). As the
number of df increases, the distribution takes on a more
humpbacked shape with less skew but with a broader
spread, given that its standard deviation will be increasing
and that its furthest left point will remain at 0. Thus, a
distribution with 5 df will have a mean of 5 and a standard
deviation of 3.1, whereas a distribution with 10 df will have a
mean of 10 and a standard deviation of 4.5. At high val-
ues of df, the curve tends toward a normal distribution,
and in such (uncommon) cases, it is possible to revert
to normal probabilities rather than calculating those for
the chi-square distribution. The right-tail probability
of a chi-square distribution, which is the area in the
right tail above a certain value divided by the total
area under the curve, represents the probability of
obtaining a chi-square value at least that large by chance.
If the area in the tail beyond a given chi-square value
represented, say, 5% of the area under the curve, then that
chi-square value would be the one at which we could be 95%
confident that the association being tested held.
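The defining property above (a chi-square variate is a sum of df squared standard normal variables, with mean df and standard deviation equal to the square root of twice df) can be checked numerically. This is a minimal sketch using only the standard library; the function name and sample size are illustrative.

```python
import random

def chi_square_sample(df, n, seed=0):
    """Draw n chi-square variates by summing df squared standard
    normal draws, following the definition in the text."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n)]

sample = chi_square_sample(df=5, n=100_000)
mean = sum(sample) / len(sample)          # close to df = 5
var = sum((x - mean) ** 2 for x in sample) / len(sample)
sd = var ** 0.5                           # close to sqrt(2 * 5), about 3.16
```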
APPLICATION
These probabilities have been calculated and tabu-
lated for certain ranges of chi-square values and df.
Conversely, tables are commonly produced that pro-
vide the chi-square values that will produce certain
probability levels at certain df. For example, such
tables tell us that we can be 95% confident at a chi-
square value of 9.49 for 4 df. Figure 1 shows the
chi-square distribution for 4 df (with mean = 4 and
standard deviation = 2.83).

Figure 1   An Example of Chi-Square Distribution With
4 Degrees of Freedom (probability density, from 0 to .15,
plotted against chi-square values, with a cutoff line at 9.49)

The right-tail probability
is 5% at the point where the illustrated cutoff line takes
the value of 9.49. This means that with statistics that
approximate a chi-square distribution, we can compare
the test statistic with such tables and ascertain its level
of significance.
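The tabulated cutoff of 9.49 at 4 df can be verified directly: for even df the chi-square right-tail probability has a closed form, and with 4 df it reduces to exp(−x/2)(1 + x/2). The sketch below hard-codes the 4-df case for illustration.

```python
import math

def chi2_right_tail_4df(x):
    """Right-tail probability of the chi-square distribution with 4 df.
    For even df the survival function is a closed form; at df = 4 it is
    exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

p = chi2_right_tail_4df(9.49)   # close to 0.05, matching the tabulated cutoff
```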
The most common such statistic is that resulting
from Pearson's chi-square test. However, this test is
not the only use of the chi-square distribution. Other
statistics that test the relationship between observed
and expected or theoretical values (i.e., statistics that
measure goodness of fit) approximate a chi-square dis-
tribution. For example, the likelihood ratio test statistic,
also known as G² or L², which is derived from the max-
imum likelihood method, approximates a chi-square
distribution. It does this through a (natural) log trans-
formation of the ratio between the maximum likelihood
based on the null hypothesis (the −2 log likelihood)
and the maximum likelihood based on the actual data
values found. G² is a statistic commonly quoted for
log-linear models. In practice, chi-square and G² statis-
tics, though distinct, tend to give similar results and
lead to similar conclusions about the goodness of fit of
a given model, although chi-square is more robust for
small samples.
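The two statistics can be sketched side by side. The table below is a hypothetical 2 × 2 example (not the voting data discussed next), flattened to one list of cells, with expected counts computed under independence from its margins.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson's chi-square: sum over cells of (fo - fe)^2 / fe."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def g_squared(observed, expected):
    """Likelihood ratio statistic: G^2 = 2 * sum of fo * ln(fo / fe)."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical table with rows summing to 30 and 70, columns to 40 and 60;
# under independence the expected counts are [12, 18, 28, 42].
observed = [10, 20, 30, 40]
expected = [12, 18, 28, 42]
x2 = pearson_chi2(observed, expected)   # roughly 0.794
g2 = g_squared(observed, expected)      # roughly 0.804
```

As the text notes, the two values are distinct but typically close.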
EXAMPLE
For an example, Table 1 shows some data on voting
behavior and class from 2 years in the British elections of
the late 1980s. The null hypothesis would suggest that
the cell frequencies are randomly distributed. However,
the chi-square for this is 84.7, and the G² (or likelihood
ratio chi-square) is 88.4 for 4 df. These are clearly
high values on a chi-square distribution based around
Table 1   Voting Data From Two British Elections in
the Late 1980s, by Class

                                    Vote
Year      Class             Labor   Conservative   Total
Year 1    Working class        50             20      70
          Middle class         10             40      50
Year 2    Working class        70             30     100
          Middle class         20             80     100
Total                         150            170     320
4 df and, therefore, make it highly improbable that this
model reflects reality. Constraining the model so that
expected values of vote and class are expected to vary
with each other, but so that the variation is constant
across the 2 years, produces a model with Pearson and
likelihood ratio chi-squares of 2.1 for 3 df. Clearly,
such a value, when compared with the chi-square dis-
tribution having 3 df, is very probable (p = .54) and
therefore represents a good fit with relative parsimony.
The change in the chi-square and G² values (roughly
86 or 82 for 1 df) clearly represents a highly significant
change.
HISTORY
The probabilities for a number of chi-square distri-
butions were tabulated by Palin Elderton in 1901, and
it was these tables that were used by Karl Pearson and
his researchers in demonstrations of his chi-square test
in the first decades of the 20th century. However, it is
now widely acknowledged that Pearson's calculation
of the numbers of degrees of freedom was misspec-
ified, although Fisher's attempt to correct him was
rejected and was only later acknowledged as being the
correct solution. Pearson was therefore assessing his
probability values from distributions that had too many
degrees of freedom. Maximum likelihood ratios have
been traced in origin to the 17th century, although it
was the latter half of the 20th century that saw the real
development and use of logit and log-linear models, as
well as their accompanying goodness-of-fit statistics,
with the work of researchers such as Leo Goodman and
Stephen Fienberg.
Lucinda Platt
See also Chi-Square Test, Degrees of Freedom,
Log-Linear Model, Logit Model
REFERENCES
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley.
Fienberg, S. E. (1977). The analysis of cross-classified categorical data. Cambridge: MIT Press.
Yule, G. U., & Kendall, M. G. (1950). An introduction to the theory of statistics (14th ed.). London: Griffin.
CHI-SQUARE TEST
The chi-square test is the most commonly used
significance test for categorical variables in the social
sciences. It was developed by Karl Pearson around
1900 as a means of assessing the relationship between
two categorical variables as tabulated against each
other in a contingency table. The test compares the
actual values in the cells of the table with those that
would be expected under conditions of independence
(i.e., if there was no relationship between the variables
being considered). Expected values are calculated for
each cell by multiplying the row total by the column
total for that cell and dividing by the total number of
cases considered. For example, Table 1
shows the expected counts in a hypothetical example
of voters, which have been obtained on the basis of the
row and column totals (known as the marginal distri-
butions) and the total number of cases. It also shows
the observed counts, which can be seen to differ from
those determined by independence. In this situation of
independence, voting behavior does not vary with sex.
Of course, even where there is independence in
the population, the observed values in a sample are
very unlikely exactly to mimic the expected values.
What the chi-square test ascertains is whether any
differences between observed and expected values in
the cells of the table according to the sample imply
real differences in the population (i.e., that the counts
are not independent). It does this by comparing the
actual cell counts with the cell counts expected if the
proportions were consistent across each category of
the explanatory variable. The value of the chi-square
statistic is calculated as

χ² = Σ (f_o − f_e)² / f_e,

where f_o is the observed cell count and f_e is the
expected cell count. The test then takes into account
the size of the sample and the degrees of freedom
in the table to determine whether the differences in the
sample are likely to be due to chance or the probability
that they are reflected in the population (i.e., that they
are significant). The degrees of freedom are calculated
as (number of rows − 1) × (number of columns − 1). The
size of the chi-square test statistic can then be related
to the chi-square distribution for those degrees of
freedom to ascertain the probability that the diver-
gence of the observed from the expected values could
occur within that distribution or whether it likely falls
outside it.
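The expected-count and test-statistic calculations above can be sketched directly. The table below is a hypothetical 2 × 2 example (not Table 2, which is not reproduced here); for it, df = (2 − 1) × (2 − 1) = 1, and the .05 critical value at 1 df is 3.84.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total times
    column total, divided by the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_statistic(table):
    """Pearson chi-square statistic for a contingency table."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2 x 2 table; every expected count is 25 here.
observed = [[30, 20], [20, 30]]
x2 = chi2_statistic(observed)   # 4.0, exceeding the .05 cutoff of 3.84 at 1 df
```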
As an example, Table 2 shows the difficulty of
making ends meet tabulated by class for a sample
of 987 people living in Britain in the late 1980s.
The chi-square statistic resulting from comparing the
observed counts in each cell with the expected counts
(given in brackets) is 47.9 for 4 degrees of free-
dom. This figure is significant at the .000 level and
thus shows that the probability that financial difficulty
and social class are independent is less than 1 in a
thousand.
Chi-square does not provide information about the
strength of an association or its direction, only about
the probability of dependence between the variables.
The value of chi-square increases with sample size,
and thus a low p-value (high signicance) in a large
sample may come with a fairly weak relationship
between the variables. The chi-square test also does
not indicate which values of the response variable
vary with the explanatory variable. It may be that
only one or two cell counts deviate greatly from their
expected counts. This can be ascertained by examin-
ing the table in more detail. It is not appropriate to
use the chi-square test when an expected cell count
for any cell in the table is less than 5. This may
occur with small samples or with a large number
of cells in the table. In these situations, an alterna-
tive test, such as Fisher's exact test, may be more
useful.
Lucinda Platt
REFERENCES
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley.
Everitt, B. S. (1977). The analysis of contingency tables. London: John Wiley.
Fienberg, S. E. (1977). The analysis of cross-classified categorical data. Cambridge: MIT Press.
