
Chi-Square & F Distributions

Carolyn J. Anderson
EdPsych 580
Fall 2005

Chi-Square & F Distributions – p. 1/55


Chi-Square & F Distributions

. . . and Inferences about Variances
• The Chi-square Distribution
• Definition, properties, tables, density calculator
• Testing hypotheses about the variance of a single population (i.e., Ho: σ² = K). Example.
• The F Distribution
• Definition, important properties, tables
• Testing the equality of variances of two independent populations (i.e., Ho: σ²_1 = σ²_2). Example.
Chi-Square & F Distributions

. . . and Inferences about Variances
• Comments regarding testing the homogeneity of variance assumption of the two independent groups t–test (and ANOVA).
• Relationship among the Normal, t, χ², and F distributions.


Chi-Square & F Distributions
• Motivation. The normal and t distributions are
useful for tests of population means, but often
we may want to make inferences about
population variances.
• Examples:
• Does the variance equal a particular value?
• Does the variance in one population equal the variance in another population?
• Are individual differences greater in one population than in another population?
• Are the variances in J populations all the same?
• Is the assumption of homogeneous variances met?
Chi-Square & F Distributions
• To make statistical inferences about population variance(s), we need
• χ² −→ The Chi-square distribution (Greek “chi”).
• F−→ Named after Sir Ronald Fisher who
developed the main applications of F.
• The χ² and F distributions are used for many problems in addition to the ones listed above.
• They provide good approximations to a large class of sampling distributions that are not easily determined.
The Big Five Theoretical Distributions
• The Big Five are the Normal, Student’s t, χ², F, and the Binomial(π, n).
• Plan:
• Introduce the χ² and then the F distributions.
• Illustrate their uses for testing variances.
• Summarize and describe the relationship among the Normal, Student’s t, χ², and F.


The Chi-Square Distributions
• Suppose we have a population with scores Y that are normally distributed with mean E(Y) = µ and variance var(Y) = σ² (i.e., Y ∼ N(µ, σ²)).
• If we repeatedly take samples of size n = 1 and for each “sample” compute

z² = (Y − µ)²/σ² = squared standard score

• Define χ²_1 = z².
• What would the sampling distribution of χ²_1 look like?
The Chi-Square Distribution, ν = 1

[Figure: density curve of the χ²_1 distribution]


The Chi-Square Distribution, ν = 1
• χ²_1 values are non-negative real numbers.
• Since 68% of values from N(0, 1) fall between −1 and 1, 68% of values from the χ²_1 distribution must fall between 0 and 1.
• The chi-square distribution with ν = 1 is very skewed.
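The 68% claim follows from squaring: χ²_1 = z² is at most 1 exactly when z falls between −1 and 1. A quick numerical check (a sketch in Python with scipy, offered as an alternative to the SAS probability functions used later in these notes):

```python
from scipy.stats import norm, chi2

# P(-1 <= Z <= 1) for Z ~ N(0, 1)
p_z = norm.cdf(1) - norm.cdf(-1)

# P(chi-square_1 <= 1): squaring maps (-1, 1) onto [0, 1)
p_chisq = chi2.cdf(1, df=1)

print(round(p_z, 4), round(p_chisq, 4))  # 0.6827 0.6827
```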


The Chi-Square Distribution, ν = 2
• Repeatedly draw independent (random) samples of n = 2 from N(µ, σ²).
• Compute Z²_1 = (Y1 − µ)²/σ² and Z²_2 = (Y2 − µ)²/σ².
• Compute the sum: χ²_2 = Z²_1 + Z²_2.


The Chi-Square Distribution, ν = 2
• All values are non-negative.
• A little less skewed than χ²_1.
• The probability that χ²_2 falls in the range 0 to 1 is smaller than that for χ²_1:

P(χ²_1 ≤ 1) = .68
P(χ²_2 ≤ 1) = .39

• Note that the mean ≈ ν = 2.
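Both tail probabilities and the mean can be verified numerically (a sketch with scipy; the variable names are illustrative):

```python
from scipy.stats import chi2

p1 = chi2.cdf(1, df=1)  # P(chi-square_1 <= 1)
p2 = chi2.cdf(1, df=2)  # P(chi-square_2 <= 1)

print(round(p1, 2), round(p2, 2))  # 0.68 0.39
print(chi2.mean(2))                # 2.0 (mean equals df)
```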


Chi-Square Distributions
• Generalize: For n independent observations
from a N (µ, σ 2 ), the sum of the squared
standard scores has a Chi-square distribution
with n degrees of freedom.
• The chi-square distribution depends only on its degrees of freedom, which in turn depend on the sample size n.
• The standard scores are computed using the population µ and σ²; however, we usually don’t know what µ and σ² equal. When µ and σ² are estimated from the sampled data, the degrees of freedom are less than n.
Chi-Square Dist: Varying ν

[Figure: χ² density curves for several values of ν]


Properties of Family of χ² Distributions
• They are all positively skewed.
• As ν gets larger, the degree of skew decreases.
• As ν gets very large, χ²_ν approaches the normal distribution.

Why? The Central Limit Theorem (for sums):

Consider a random sample of size n from a population distribution having mean µ and variance σ². If n is sufficiently large, then the distribution of Σ_{i=1}^n Yi is approximately normal with mean nµ and variance nσ².


Properties of Family of χ² Distributions
• E(χ²_ν) = mean = ν = degrees of freedom.
• E[(χ²_ν − E(χ²_ν))²] = var(χ²_ν) = 2ν.
• The mode of χ²_ν is at the value ν − 2 (for ν ≥ 2).
• The median is approximately (3ν − 2)/3 (for ν ≥ 2).
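These moment formulas are easy to check against scipy (a sketch; ν = 10 is an arbitrary choice):

```python
from scipy.stats import chi2

nu = 10
print(chi2.mean(nu), chi2.var(nu))  # 10.0 20.0 (mean = nu, var = 2*nu)

# The density peaks at the mode, nu - 2 = 8
assert chi2.pdf(8, nu) > chi2.pdf(7.5, nu)
assert chi2.pdf(8, nu) > chi2.pdf(8.5, nu)

# Median is close to the approximation (3*nu - 2)/3
print(round(chi2.median(nu), 2), round((3 * nu - 2) / 3, 2))  # 9.34 9.33
```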


Properties of Family of χ² Distributions
IF
• a random variable χ²_ν1 has a chi-square distribution with ν1 degrees of freedom, and
• a second, independent random variable χ²_ν2 has a chi-square distribution with ν2 degrees of freedom,
THEN their sum has a chi-square distribution with (ν1 + ν2) degrees of freedom:

χ²_{ν1+ν2} = χ²_ν1 + χ²_ν2
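The additivity property can be illustrated by simulation (a sketch with numpy; the df values 3 and 5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
nu1, nu2 = 3, 5

# Sums of independent chi-square(3) and chi-square(5) draws
total = rng.chisquare(nu1, 100_000) + rng.chisquare(nu2, 100_000)

# The sums should behave like chi-square(8):
# mean = nu1 + nu2 = 8 and variance = 2(nu1 + nu2) = 16
print(round(total.mean(), 1), round(total.var(), 1))
```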
Percentiles of χ² Distributions
Note: .95χ²_1 = 3.84 = (1.96)², the 95th percentile of z²
• Tables
• http://calculator.stat.ucla.edu/cdf/
• pvalue.f program or the executable version, pvalue.exe, on the course web-site.
• SAS: PROBCHI(x, df <,nc>) where
• x = number
• df = degrees of freedom
• If p = PROBCHI(x, df), then p = Prob(χ²_df ≤ x)
SAS Examples & Computations
p-values:
DATA probval;
pz=PROBNORM(1.96);
pzsq=PROBCHI(3.84,1);
output;
RUN;
Output:
pz pzsq
0.97500 0.95000

What are these values?
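These are P(Z ≤ 1.96) and P(χ²_1 ≤ 1.96²). The same values can be reproduced outside SAS; a sketch in Python, where norm.cdf plays the role of PROBNORM and chi2.cdf the role of PROBCHI:

```python
from scipy.stats import norm, chi2

pz = norm.cdf(1.96)       # PROBNORM(1.96)
pzsq = chi2.cdf(3.84, 1)  # PROBCHI(3.84, 1)

print(round(pz, 4), round(pzsq, 4))  # 0.975 0.95
```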



SAS Examples & Computations

. . . To get density values . . .

* Probability density;
data chisq3;
do x=0 to 10 by .005;
pdfxsq=pdf('CHISQUARE',x,3);
output;
end;
run;
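A density curve computed the same way in Python (a sketch; chi2.pdf mirrors SAS’s pdf('CHISQUARE', x, 3)):

```python
import numpy as np
from scipy.stats import chi2

x = np.arange(0, 10.005, 0.005)
pdfxsq = chi2.pdf(x, df=3)  # chi-square(3) density at each x

# The chi-square(3) density peaks at its mode, nu - 2 = 1
print(round(chi2.pdf(1, df=3), 3))  # 0.242
```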


Inferences about a Population Variance
or the sampling distribution of the sample
variance from a normal population.
• Statistical Hypotheses:

Ho: σ² = σ²_o versus Ha: σ² ≠ σ²_o

• Assumptions: Observations are independently drawn (random) from a normal population; i.e.,

Yi ∼ N(µ, σ²) i.i.d.


Inferences about σ² (continued)
Test Statistic:
• We know that

Σ_{i=1}^n (Yi − µ)²/σ² = Σ_{i=1}^n z²_i ∼ χ²_n

if zi ∼ N(0, 1).
• We don’t know µ, so we use Ȳ as an estimate of µ:

Σ_{i=1}^n (Yi − Ȳ)²/σ² ∼ χ²_{n−1}

or, since Σ_{i=1}^n (Yi − Ȳ)² = (n − 1)s²,

(n − 1)s²/σ² ∼ χ²_{n−1}


Test Statistic for Ho: σ² = σ²_o
• So

(n − 1)s²/σ² ∼ χ²_{n−1}

• This gives us our test statistic:

X² = Σ_{i=1}^n (Yi − Ȳ)²/σ²_o

where σ²_o is the value specified in Ho: σ² = σ²_o.
• Sampling distribution of the test statistic: If Ho is true, which means that σ² = σ²_o, then

X² = (n − 1)s²/σ²_o = Σ_{i=1}^n (Yi − Ȳ)²/σ²_o ∼ χ²_{n−1}
Decision and Conclusion, Ho: σ² = σ²_o
• Decision: Compare the obtained test statistic to the chi-squared distribution with ν = n − 1 degrees of freedom, or find the p-value of the test statistic and compare it to α.
• Interpretation/Conclusion: What does the decision mean in terms of what you’re investigating?


Example of Ho: σ² = σ²_o
• High School and Beyond: Is the variance of math scores of students from private schools equal to 100?
• Statistical Hypotheses:

Ho: σ² = 100 versus Ha: σ² ≠ 100

• Assumptions: Math scores are normally distributed in the population of high school seniors who attend private schools, and the observations are independent.
Example of Ho: σ² = σ²_o (continued)
• Test Statistic: n = 94, s² = 67.16, and set α = .10.

X² = (n − 1)s²/σ²_o = (94 − 1)(67.16)/100 = 62.46

with ν = (94 − 1) = 93.
• Sampling Distribution of the Test Statistic: chi-square with ν = 93.

Critical values: .05χ²_93 = 71.76 and .95χ²_93 = 116.51.


Example of Ho: σ² = σ²_o (continued)
• Critical values: .05χ²_93 = 71.76 and .95χ²_93 = 116.51.
• Decision: Since the obtained test statistic X² = 62.46 is less than .05χ²_93 = 71.76, reject Ho at α = .10.
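The whole example can be reproduced numerically; a sketch with scipy, using the summaries n = 94 and s² = 67.16 from the slides:

```python
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 94, 67.16, 100, 0.10

X2 = (n - 1) * s2 / sigma0_sq           # test statistic
lo = chi2.ppf(alpha / 2, df=n - 1)      # .05 percentile of chi-square(93)
hi = chi2.ppf(1 - alpha / 2, df=n - 1)  # .95 percentile

print(round(X2, 2))                # 62.46
print(round(lo, 2), round(hi, 2))  # 71.76 116.51
print(X2 < lo)                     # True, so reject Ho at alpha = .10
```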
Confidence Interval Estimate of σ²
• Start with

Prob[ (α/2)χ²_ν ≤ (n − 1)s²/σ² ≤ (1−α/2)χ²_ν ] = 1 − α

• After a little algebra . . .

Prob[ 1/((1−α/2)χ²_ν) ≤ σ²/((n − 1)s²) ≤ 1/((α/2)χ²_ν) ] = 1 − α

• and a little more

Prob[ (n − 1)s²/((1−α/2)χ²_ν) ≤ σ² ≤ (n − 1)s²/((α/2)χ²_ν) ] = 1 − α

Here (p)χ²_ν denotes the (100p)th percentile of χ²_ν.


90% Confidence Interval Estimate of σ²
• The (1 − α)·100% confidence interval is

(n − 1)s²/((1−α/2)χ²_ν) ≤ σ² ≤ (n − 1)s²/((α/2)χ²_ν)

• So, with ν = 93,

(94 − 1)(67.16)/116.51 ≤ σ² ≤ (94 − 1)(67.16)/71.76 −→ (53.61, 87.04),

which does not include 100 (the null hypothesized value).
• s² = 67.16 isn’t in the center of the interval.
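The interval endpoints can be computed directly (a sketch; chi2.ppf returns the percentiles that appear in the denominators):

```python
from scipy.stats import chi2

n, s2, alpha = 94, 67.16, 0.10
nu = n - 1

lower = nu * s2 / chi2.ppf(1 - alpha / 2, df=nu)
upper = nu * s2 / chi2.ppf(alpha / 2, df=nu)

print(round(lower, 2), round(upper, 2))  # 53.61 87.04
print(lower <= 100 <= upper)             # False: 100 is outside the interval
```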
The F Distribution
• Comparing two variances: Are they equal?
• Start with two independent populations, each normal with equal variances . . .

Y1 ∼ N(µ1, σ²) i.i.d.
Y2 ∼ N(µ2, σ²) i.i.d.

• Draw an independent random sample from each population,

n1 from population 1
n2 from population 2
The F Distribution (continued)
• Using the data from each of the two samples, estimate σ²: s²_1 and s²_2.
• Both s²_1 and s²_2 are random variables, and their ratio is a random variable:

F = (estimate of σ²)/(estimate of σ²) = s²_1/s²_2
  = [χ²_{n1−1}/(n1 − 1)] / [χ²_{n2−1}/(n2 − 1)]
  = (χ²_ν1/ν1)/(χ²_ν2/ν2)

• The random variable F has an F distribution.
Testing for Equal Variances
• F gives us a way to test Ho: σ²_1 = σ²_2 (= σ²).
• Test statistic:

F = s²_1/s²_2
  = [ (1/(n1 − 1)) Σ_{i=1}^{n1} (Yi1 − Ȳ1)² (1/σ²) ] / [ (1/(n2 − 1)) Σ_{i=1}^{n2} (Yi2 − Ȳ2)² (1/σ²) ]
  = (χ²_ν1/ν1)/(χ²_ν2/ν2)

• A random variable formed from the ratio of two independent chi-squared variables, each divided by its degrees of freedom, is an “F–ratio” and has an F distribution.
Conditions for an F Distribution
• IF
• Both parent populations are normal.
• Both parent populations have the same
variance.
• The samples (and populations) are
independent.
• THEN the theoretical distribution of F is F_{ν1,ν2}, where
• ν1 = n1 − 1 = numerator degrees of freedom
• ν2 = n2 − 1 = denominator degrees of freedom


Eg of F Distributions: F_{2,ν2}

[Figure: F density curves with ν1 = 2 and varying ν2]

Eg of F Distributions: F_{5,ν2}

[Figure: F density curves with ν1 = 5 and varying ν2]

Eg of F Distributions: F_{50,ν2}

[Figure: F density curves with ν1 = 50 and varying ν2]


Important Properties of F Distributions
• The range of F–values is non-negative real
numbers (i.e., 0 to +∞).
• They depend on 2 parameters: numerator
degrees of freedom (ν1 ) and denominator
degrees of freedom (ν2 ).
• The expected value (i.e., the mean) of a random variable with an F distribution with ν2 > 2 is

E(F_{ν1,ν2}) = µ_{F_{ν1,ν2}} = ν2/(ν2 − 2).


Properties of F Distributions
• For any fixed ν1 and ν2 , the F distribution is
non-symmetric.
• The particular shape of the F distribution
varies considerably with changes in ν1 and ν2 .
• In most applications of the F distribution (at
least in this class), ν1 < ν2 , which means that
F is positively skewed.
• When ν2 > 2, the F distribution is uni-modal.



Percentiles of the F Dist.
• http://calculators.stat.ucla.edu/cdf
• p-value program
• SAS probf
• Tables in textbooks give the upper 25th, 10th, 5th, 2.5th, and 1st percentage points. Usually, the
• columns correspond to ν1, the numerator df, and the
• rows correspond to ν2, the denominator df.
• Getting lower percentiles using tables requires taking reciprocals.


Selected F values from Table V
Note: all values are for upper α = .05

ν1    ν2     F_{ν1,ν2}   which is also . . .
1     1      161.00      t²_1
1     20     4.35        t²_20
1     1000   3.85        t²_1000
1     ∞      3.84        t²_∞ = z² = χ²_1

ν1    ν2     F_{ν1,ν2}
1     20     4.35
4     20     2.87
10    20     2.35
20    20     2.12
1000  20     1.57
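The identities in the right-hand column can be confirmed with scipy (a sketch; ppf returns percentiles):

```python
from scipy.stats import f, t, norm, chi2

# The upper .05 F value with nu1 = 1 is the square of the two-tailed t value
F_1_20 = f.ppf(0.95, 1, 20)
t_20 = t.ppf(0.975, 20)
print(round(F_1_20, 2), round(t_20 ** 2, 2))  # 4.35 4.35

# With nu2 = infinity, F(1, inf) = z^2 = chi-square(1)
z2 = norm.ppf(0.975) ** 2
c1 = chi2.ppf(0.95, 1)
print(round(z2, 2), round(c1, 2))  # 3.84 3.84
```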


Test Equality of Two Variances
Are students from private high schools more
homogeneous with respect to their math test
scores than students from public high schools?
• Statistical Hypotheses:

Ho: σ²_private = σ²_public (equivalently, σ²_public/σ²_private = 1)
versus Ha: σ²_private < σ²_public (a 1-tailed test).

• Assumptions: Math scores of students from private schools and public schools are normally distributed and are independent both between and within school type.
Test Equality of Two Variances
• Test Statistic:

F = s²_1/s²_2 = 91.74/67.16 = 1.366

with ν1 = (n1 − 1) = (506 − 1) = 505 and ν2 = (n2 − 1) = (94 − 1) = 93.
Since the sample variance for public schools, s²_1 = 91.74, is larger than the sample variance for private schools, s²_2 = 67.16, put s²_1 in the numerator.
• Sampling Distribution of the Test Statistic: the F distribution with ν1 = 505 and ν2 = 93.
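The F-ratio and its one-tailed p-value (quoted on the next slide as .032) can be recomputed; a sketch where f.sf gives the upper-tail area:

```python
from scipy.stats import f

s2_public, s2_private = 91.74, 67.16
nu1, nu2 = 505, 93

F = s2_public / s2_private
p = f.sf(F, nu1, nu2)  # one-tailed (upper) p-value

print(round(F, 3))  # 1.366
print(round(p, 3))  # about .03
```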


Test Equality of Two Variances
• Decision: Our observed test statistic, F_{505,93} = 1.366, has a p–value = .032. Since p–value < α = .05, reject Ho.
• Or, we could compare the observed test statistic, F_{505,93} = 1.366, with the critical value F_{505,93}(α = .05) = 1.320. Since the observed value of the test statistic is larger than the critical value, reject Ho.
• Conclusion: The data support the conclusion that students from private schools are more homogeneous with respect to math test scores than students from public schools.
Example Continued
• Alternative question: “Are the individual
differences of students in public high schools
and private high schools the same with
respect to their math test scores?”
• Statistical Hypotheses: The null is the same, but the alternative hypothesis would be

Ha: σ²_public ≠ σ²_private (a 2-tailed alternative)

• Given α = .05, retain Ho, because our obtained p–value (the probability of getting a test statistic as large or larger than what we got) is larger than α/2 = .025.
Example Continued
• Given α = .05, retain Ho, because our obtained p–value (the probability of getting a test statistic as large or larger than what we got) is larger than α/2 = .025.
• Or, the rejection region (critical value) would be any F–statistic greater than F_{505,93}(α = .025) = 1.393.
• Point: This is a case where the choice between a 1- and 2-tailed test leads to different decisions regarding the null hypothesis.


Test for Homogeneity of Variances

Ho: σ²_1 = σ²_2 = . . . = σ²_J

• These include
• Hartley’s Fmax test
• Bartlett’s test
• a test regarding the variances of paired comparisons.
• You should know that they exist; we won’t go over them in this class. Such tests are not as important as they were once thought to be.
Test for Homogeneity of Variances
• Old View: Testing the equality of variances
should be a preliminary to doing independent
t-tests (or ANOVA).
• Newer View:
• Homogeneity of variance is required for small samples, which is exactly when tests of homogeneous variances do not work well. With large samples, we don’t have to assume σ²_1 = σ²_2.
• The test critically depends on population normality.
• If n1 = n2, t-tests are robust.


Test for Homogeneity of Variances
• For small or moderate samples where there is concern about possible heterogeneity −→ perform a quasi-t test.
• In experimental settings where you have control over the number of subjects and their assignment to groups/conditions/etc. −→ use equal sample sizes.
• In non-experimental settings where you have similar numbers of participants per group, the t test is pretty robust.
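The quasi-t test referred to above is available directly in scipy as Welch’s t test via equal_var=False; a sketch on made-up data (the two groups below are hypothetical illustrations, not the HSB samples):

```python
from scipy.stats import ttest_ind

# Hypothetical small samples with visibly different spreads
group1 = [52, 55, 49, 61, 58, 50, 54]
group2 = [40, 70, 35, 66, 45, 72, 38]

# equal_var=False requests the Welch (quasi-t) test, which does not
# assume homogeneous variances
t_stat, p_val = ttest_ind(group1, group2, equal_var=False)
print(round(t_stat, 2), round(p_val, 2))
```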


Relationship between z, t_ν, χ²_ν, and F_{ν1,ν2}
. . . and the central importance of the normal distribution.
• The Normal, Student’s t_ν, χ²_ν, and F_{ν1,ν2} are all theoretical distributions.
• We don’t ever actually take vast (infinite) numbers of samples from populations.
• The distributions are derived based on mathematical logic, statements of the form

IF . . . THEN . . .


Derivation of Distributions
• Example
• IF we draw independent random samples of size
(large) n from a population and compute the mean
Ȳ and repeat this process many, many, many, many
times,
• THEN Ȳ is approximately normal.

• Assumptions are part of the “if” part, the conditions used to deduce the sampling distributions of statistics.
• The t, χ², and F distributions all depend on a normal “parent” population.
Chi-Square Distribution
• χ²_ν = the sum of ν independent squared normal random variables with mean µ = 0 and variance σ² = 1 (i.e., “standard normal” random variables):

χ²_ν = Σ_{i=1}^ν z²_i where zi ∼ N(0, 1)

• Based on the Central Limit Theorem, the “limit” of the χ²_ν distribution (i.e., as ν → ∞) is normal.
The F Distribution
• F_{ν1,ν2} = the ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom:

F_{ν1,ν2} = (χ²_ν1/ν1)/(χ²_ν2/ν2)

• Since the χ²_ν’s depend on the normal distribution, the F distribution also depends on the normal distribution.
• The “limiting” distribution of F_{ν1,ν2} as ν2 → ∞ is χ²_ν1/ν1, because as ν2 → ∞, χ²_ν2/ν2 → 1.
Student’s t Distribution
Note that

t²_ν = [ (Ȳ − µ)/(s/√n) ]²
    = (Ȳ − µ)² n / [ Σ_{i=1}^n (Yi − Ȳ)²/(n − 1) ]
    = [ (Ȳ − µ)² n (1/σ²) ] / [ (Σ_{i=1}^n (Yi − Ȳ)²/(n − 1)) (1/σ²) ]
    = [ (Ȳ − µ)²/(σ²/n) ] / [ Σ_{i=1}^n (Yi − Ȳ)²/(σ²(n − 1)) ]
    = z²/(χ²_ν/ν)


Student’s t Distribution (continued)
• Student’s t is based on the normal:

t²_ν = z²/(χ²_ν/ν) or t_ν = z/√(χ²_ν/ν)

• A squared t random variable equals the ratio of a squared standard normal to a chi-squared divided by its degrees of freedom. So . . .


Student’s t Distribution (continued)
Since

t²_ν = z²/(χ²_ν/ν) or t_ν = z/√(χ²_ν/ν)

• As ν → ∞, t_ν → N(0, 1), because χ²_ν/ν → 1.
• Since z² = χ²_1,

t²_ν = (z²/1)/(χ²_ν/ν) = (χ²_1/1)/(χ²_ν/ν) = F_{1,ν}

• Why are the assumptions of normality, homogeneity of variance, and independence required for
• t tests for mean(s)?
• testing homogeneity of variance(s)?
Summary of Relationships
Let z ∼ N(0, 1)

Distribution   Definition                               Parent        Limiting
χ²_ν           Σ_{i=1}^ν z²_i (independent z’s)         normal        As ν → ∞, χ²_ν → normal
F_{ν1,ν2}      (χ²_ν1/ν1)/(χ²_ν2/ν2) (indep. χ²’s)      chi-squared   As ν2 → ∞, F_{ν1,ν2} → χ²_ν1/ν1
t_ν            z/√(χ²_ν/ν)                              normal        As ν → ∞, t_ν → normal

Note: F_{1,ν} = t²_ν; also F_{1,∞} = t²_∞ = z² = χ²_1.
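Each limiting relationship in the table can be checked numerically by comparing percentiles at large ν (a sketch; the specific large ν values are arbitrary):

```python
from scipy.stats import chi2, f, t, norm

# chi-square -> normal: the standardized 90th percentile approaches z_.90
nu = 200_000
std_chi = (chi2.ppf(0.90, nu) - nu) / (2 * nu) ** 0.5
z90 = norm.ppf(0.90)
print(round(std_chi, 2), round(z90, 2))  # 1.28 1.28

# F(nu1, infinity) -> chi-square(nu1)/nu1
f_val = f.ppf(0.95, 5, 1_000_000)
chi_over = chi2.ppf(0.95, 5) / 5
print(round(f_val, 3), round(chi_over, 3))  # 2.214 2.214

# t -> normal
t_val = t.ppf(0.95, 1_000_000)
z95 = norm.ppf(0.95)
print(round(t_val, 3), round(z95, 3))  # 1.645 1.645
```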