
A standard normal deviate is a random sample from the standard normal distribution.

The Chi Square distribution is the distribution of the sum of squared standard normal
deviates. The degrees of freedom of the distribution is equal to the number of standard
normal deviates being summed. Therefore, Chi Square with one degree of freedom, written as
χ²(1), is simply the distribution of a single standard normal deviate squared. The area of a
Chi Square distribution below 4 is the same as the area of a standard normal distribution
between -2 and 2, since 4 is 2².
Let us consider the following problem: we sample two scores from a standard normal
distribution, square each score, and sum the squares. What is the probability that the sum of
these two squares will be six or higher? Since two scores are sampled, the answer can be
found using the Chi Square distribution with two degrees of freedom. A Chi Square calculator
can be used to find that the probability of a Chi Square (with 2 df) being six or higher is
0.050.
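The calculation above can be reproduced numerically; the following is a sketch assuming NumPy and SciPy are available, computing the tail probability exactly and checking it by simulating the two-deviate sampling experiment:

```python
import numpy as np
from scipy.stats import chi2

# Exact tail probability: P(Chi Square with 2 df >= 6)
p_exact = chi2.sf(6, df=2)  # survival function, i.e. 1 - CDF

# Monte Carlo check: sample pairs of standard normal deviates,
# square each score, and sum the squares
rng = np.random.default_rng(0)
z = rng.standard_normal((1_000_000, 2))
p_sim = np.mean((z ** 2).sum(axis=1) >= 6)

print(round(p_exact, 3))  # 0.05
```

The simulated proportion agrees with the exact value to within simulation noise.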
The mean of a Chi Square distribution is its degrees of freedom. Chi Square
distributions are positively skewed, with the degree of skew decreasing with increasing
degrees of freedom. As the degrees of freedom increases, the Chi Square distribution
approaches a normal distribution. Figure 1 shows density functions for three Chi Square
distributions. Notice how the skew decreases as the degrees of freedom increases.
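Both properties can be checked numerically; in this sketch (assuming SciPy is available), the mean of each distribution equals its degrees of freedom, and the skewness, which for a Chi Square distribution equals sqrt(8/df), shrinks as the degrees of freedom grow:

```python
import math
from scipy.stats import chi2

# Mean and skewness of Chi Square distributions with increasing df
for df in (2, 4, 10):
    mean, skew = chi2.stats(df, moments="ms")
    print(f"df={df}: mean={float(mean):.1f}, skew={float(skew):.3f}")
    assert float(mean) == df                             # mean equals df
    assert math.isclose(float(skew), math.sqrt(8 / df))  # skew falls with df
```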

The chi-squared distribution is used primarily in hypothesis testing. Unlike more widely known distributions such as the normal distribution and the exponential distribution, the chi-squared distribution is rarely used to model natural phenomena. It arises in the following
hypothesis tests, among others.

Chi-squared test of independence in contingency tables

Chi-squared test of goodness of fit of observed data to hypothetical distributions

Likelihood-ratio test for nested models

Log-rank test in survival analysis

Cochran-Mantel-Haenszel test for stratified contingency tables
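As a sketch of the first test in the list, the chi-squared test of independence can be run on a contingency table with SciPy; the counts below are purely illustrative:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table (counts are illustrative only):
# rows = two groups, columns = two outcome categories
table = np.array([[30, 10],
                  [20, 20]])

# Test the null hypothesis that row and column variables are independent
stat, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {stat:.3f}, p = {p:.3f}, df = {dof}")
```

For a 2x2 table the test statistic is compared against a chi-squared distribution with (2-1)(2-1) = 1 degree of freedom.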

It is also a component of the definition of the t-distribution and the F-distribution used in t-tests, analysis of variance, and regression analysis.
The primary reason that the chi-squared distribution is used extensively in hypothesis testing
is its relationship to the normal distribution. Many hypothesis tests use a test statistic, such
as the t statistic in a t-test. For these hypothesis tests, as the sample size, n, increases,
the sampling distribution of the test statistic approaches the normal distribution (Central
Limit Theorem). Because the test statistic (such as t) is asymptotically normally distributed,
provided the sample size is sufficiently large, the distribution used for hypothesis testing may
be approximated by a normal distribution. Testing hypotheses using a normal distribution is
well understood and relatively easy. The simplest chi-squared distribution is the square of a
standard normal distribution. So wherever a normal distribution could be used for a
hypothesis test, a chi-squared distribution could be used.
Specifically, suppose that Z is a standard normal random variable, with mean = 0 and
variance = 1: Z ~ N(0,1). A sample drawn at random from Z is a sample from the distribution
shown in the graph of the standard normal. Define a new random variable Q. To generate a
random sample from Q, take a sample from Z and square the value. The distribution of the
squared values is given by the random variable Q = Z². The distribution of the random
variable Q is an example of a chi-squared distribution:

Q ~ χ₁²

The subscript 1 indicates that this particular chi-squared distribution is constructed from only 1 standard
normal distribution. A chi-squared distribution constructed by squaring a single standard
normal distribution is said to have 1 degree of freedom. Thus, as the sample size for a
hypothesis test increases, the distribution of the test statistic approaches a normal
distribution, and the distribution of the square of the test statistic approaches a chi-squared
distribution. Just as extreme values of the normal distribution have low probability (and give
small p-values), extreme values of the chi-squared distribution have low probability.
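The construction of Q can be checked empirically; the following sketch (assuming NumPy and SciPy) squares standard normal samples and compares them against the chi-squared(1) CDF with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import chi2, kstest

# Generate samples from Q = Z^2 by squaring standard normal deviates
rng = np.random.default_rng(1)
q = rng.standard_normal(100_000) ** 2

# Kolmogorov-Smirnov test of the squared samples against the
# chi-squared distribution with 1 degree of freedom
result = kstest(q, chi2(df=1).cdf)
print(f"KS statistic = {result.statistic:.4f}, p-value = {result.pvalue:.3f}")
```

A small KS statistic (and a non-small p-value) indicates the squared samples are consistent with the chi-squared(1) model.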
An additional reason that the chi-squared distribution is widely used is that it is the
asymptotic distribution of the likelihood ratio test (LRT) statistic. LRTs have several
desirable properties; in particular, LRTs commonly provide the highest power to reject the
null hypothesis (Neyman-Pearson lemma). However, the normal and chi-squared approximations
are only valid asymptotically. For this reason, it is preferable to use the t distribution
rather than the normal or chi-squared approximation for small sample sizes. Similarly, in
analyses of contingency tables, the chi-squared approximation will be poor for small sample
sizes, and it is preferable to use the Fisher exact test. Ramsey and Ramsey show that the
exact binomial test is always more powerful than the normal approximation.
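The small-sample caveat can be illustrated with SciPy (the counts below are hypothetical): on a sparse 2x2 table, Fisher's exact test and the chi-squared approximation may give noticeably different p-values:

```python
import numpy as np
from scipy.stats import fisher_exact, chi2_contingency

# Hypothetical sparse 2x2 table: rows = groups, columns = outcomes.
# Cell counts this small make the chi-squared approximation unreliable.
table = np.array([[3, 7],
                  [6, 2]])

odds_ratio, p_fisher = fisher_exact(table)
stat, p_chi2, dof, expected = chi2_contingency(table)

print(f"Fisher exact test p-value:         {p_fisher:.3f}")
print(f"Chi-squared approximation p-value: {p_chi2:.3f}")
```

With counts this small, the exact p-value is the one to report.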
The Chi Square distribution is very important because many test statistics are approximately
distributed as Chi Square. Two of the more common tests using the Chi Square distribution
are tests of deviations of observed frequencies from theoretically expected frequencies
(one-way tables) and tests of the relationship between categorical variables (contingency tables).
Numerous other tests beyond the scope of this work are based on the Chi Square distribution.
