
Chi-Square Test of Homogeneity

Example:
Are Employees at Organizations with Differing
Views of Empowerment Equally Satisfied?
Well-Known American Statistician:
George Gallup
George Gallup (1901-84), American public opinion analyst and
statistician, educated at the University of Iowa. He was head of the
journalism department at Drake University (1929-31), professor of
journalism and advertising at Northwestern University (1931-32), and
professor at the Pulitzer School of Journalism, Columbia University
(1935-37). In 1935 he founded and became director of the American
Institute for Public Opinion, and in 1936 he established the British
Institute of Public Opinion. Gallup was a pioneer in the use of statistical
methods for measuring the interest of readers in the features and
advertisements of magazines and newspapers and for determining public
opinion on general issues. He extended this research to include reactions
of radio audiences and founded the Audience Research Institute in 1939.
Introduction
• Chi-square analysis methods are approximate methods that are among the most commonly used of all statistical techniques.
• The method introduced will be used to examine “frequency” or “count” data.
• The methods are conceptually simple, although manual computation can be tedious.
Chi-Square Tables
$\chi^2$ tables are included in most statistics texts and consist of columns and rows, with columns representing upper-tail areas under the curve and rows associated with the degrees of freedom (df), which, for $\chi^2$ tests of homogeneity and independence, are: df = (r-1)(c-1).
• Typical columns are: $\chi^2_{.100}$, $\chi^2_{.050}$, $\chi^2_{.025}$, $\chi^2_{.010}$, $\chi^2_{.005}$
• The decision rule for both a chi-square test of homogeneity and one of independence is:
DR: Reject H0 in favor of HA if and only if $\chi^2_{calc} > \chi^2_{crit}$. Otherwise, fail to reject (FTR) H0.
• In the case of homogeneity, this is essentially:
DR: Reject similarity of processes in favor of distinctions between them if and only if their profiles differ markedly from one another so that the preponderance of evidence supports distinctions. Otherwise, FTR H0.
$\chi^2_{crit}$ Determination
With 8 df and $\alpha$ = .05

df  $\chi^2_{.100}$  $\chi^2_{.050}$  $\chi^2_{.025}$  $\chi^2_{.010}$  $\chi^2_{.005}$
 1       2.7055           3.8415           5.0239           6.6349           7.8794
 2       4.6052           5.9915           7.3778           9.2103          10.5966
 3       6.2514           7.8147           9.3484          11.3449          12.8381
 4       7.7794           9.4877          11.1433          13.2767          14.8602
...
 8      13.3616          15.5073          17.5346          20.0902          21.9550
...
30      40.2560          43.7729          46.9792          50.8922          53.6720

[Figure: $\chi^2$ distribution with 8 degrees of freedom. The upper-tail area $\alpha$ = .05 lies to the right of $\chi^2_{.05,8}$ = 15.5073.]
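As a cross-check on the table lookup, the sketch below recomputes the upper-tail area for two of the critical values used in these slides. It relies on the closed form that applies only when df is even, so it is not a general replacement for the table; the function name `chi2_upper_tail_even_df` is an illustrative choice, not something from the slides.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """Upper-tail area P(X > x) for a chi-square variable with an even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total

# Critical values quoted in the slides: 15.5073 (8 df) and 31.9999 (16 df)
area_8 = chi2_upper_tail_even_df(15.5073, 8)
area_16 = chi2_upper_tail_even_df(31.9999, 16)
print(f"8 df:  P(X > 15.5073) = {area_8:.4f}")
print(f"16 df: P(X > 31.9999) = {area_16:.4f}")
```

The two areas should come out very close to the column labels .050 and .010, which is what the table encodes.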
$\chi^2$ Test of Homogeneity
• This test examines whether several populations or processes are, effectively, really just one process (homogeneous) or distinct ones.
• Formulation of advertising strategies for distinct market segments is one example of the use of this test.
Expressing Homogeneity
Expressed as probabilities, the $\chi^2$ test of homogeneity can be represented as follows:

              trait of interest category
process    1     2     3     4    ...    c
   1      p11   p12   p13   p14   ...   p1c
   2      p21   p22   p23   p24   ...   p2c
   .      ...
   r      pr1   pr2   pr3   pr4   ...   prc
          p.1   p.2   p.3   p.4   ...   p.c


Expressing Homogeneity
We have (r) processes, which are by convention
represented by the rows in the table.
There are (c) categories of a trait of interest,
represented by the columns in the table.
Homogeneity presumes that the (r) processes
behave similarly with respect to a trait of interest.
H0 can be expressed as:
H0: $p_{11} = p_{21} = \dots = p_{r1} = p_{.1}$ (1st column probability)
    $p_{12} = p_{22} = \dots = p_{r2} = p_{.2}$ (2nd column probability)
    ...
    $p_{1c} = p_{2c} = \dots = p_{rc} = p_{.c}$ (cth column probability)
• This states that the probability that an item from any process is best described by a specific category of the trait of interest, say the jth category, is the same for each process.
• HA: there are at least two distinct process groupings

The Test Statistic
• Let $n_{i.}$ be the number of items sampled from the ith process.
• Let $n = n_{1.} + n_{2.} + \dots + n_{r.}$ be the total number of items sampled from the (r) processes.
• Let $O_{ij}$ be the number of items from the ith process sample that are best described by the jth category of the trait of interest.
• Let $n_{.j}$ be the number of items best described by the jth category of the trait of interest, irrespective of the process from which an item originates.
• Under H0, $p_{.j}$ is best estimated by $\hat{p}_{.j} = n_{.j}/n$.
• $p_{ij}$ is estimated by $\hat{p}_{.j}$ under homogeneity or, more generally, by $O_{ij}/n_{i.}$.
• For each cell in the table we compare the two estimates of the cell probability, each weighted by the sample size, that is: $(O_{ij} - n_{i.}\hat{p}_{.j})$
• We square the differences, standardize, and sum the results to obtain $\chi^2_{calc}$.
• It can be seen that $E_{ij} = n_{i.}\hat{p}_{.j} = (n_{i.})(n_{.j})/n$ or, ultimately, the same as in independence.

• From the preceding development we have:

$$\chi^2_{calc} = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - n_{i.}\hat{p}_{.j})^2}{n_{i.}\hat{p}_{.j}} = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

• This test has (r-1)(c-1) df. Critical values of $\chi^2$ are found in the same manner for both homogeneity and independence, so the decision rules for independence and homogeneity are identical.
The Business Case for Empowerment
• An empowered human resource is a satisfied one. Such a human resource is integral to a satisfied and loyal customer base, which is in turn critical to financial bottom line success.
• Established business excellence models such as those for the European Quality Award and the Malcolm Baldrige National Quality Award support the practice of employee empowerment and often tie metrics for empowerment to “employee results” or “people results”.
• One hundred employees were selected from each of three organizations and their level of job satisfaction assessed. The three organizations operate in a common business sector, but practice differing philosophies of employee empowerment. The data follow.
• Do satisfaction levels differ for these three organizations?
Empowerment & Job Satisfaction: Observed Data

                    Very                                    Very
                Positive  Positive   Neutral  Negative  Negative     Total
Autocratic             5        15        25        35        20       100
Low Value             10        25        49        10         6       100
Input Valued          45        35        10         9         1       100
Total                 60        75        84        54        27       300


Calculation Example:
Autocratic / Positive Cell and Decision Rule

• We have $O_{autocratic/positive} = O_{12} = 15$
• Similarly $E_{12} = (n_{1.})(n_{.2})/n = (100)(75)/300 = 25$
• And $(O_{12} - E_{12})^2/E_{12} = (15-25)^2/25 = 4.000$
• DR: Reject H0 in favor of HA if and only if $\chi^2_{calc} > \chi^2_{crit}$ = 15.5073 (8 df, $\alpha$ = .05). Otherwise, FTR H0.
Empowerment & Job Satisfaction: Expected Frequencies

                    Very                                    Very
                Positive  Positive   Neutral  Negative  Negative     Total
Autocratic         20.00     25.00     28.00     18.00      9.00       100
Low Value          20.00     25.00     28.00     18.00      9.00       100
Input Valued       20.00     25.00     28.00     18.00      9.00       100
Total                 60        75        84        54        27       300


Empowerment & Job Satisfaction: Cell Contributions to $\chi^2_{calc}$

                    Very                                    Very
                Positive  Positive   Neutral  Negative  Negative     Total
Autocratic        11.250     4.000     0.321    16.056    13.444    45.071
Low Value          5.000     0.000    15.750     3.556     1.000    25.306
Input Valued      31.250     4.000    11.571     4.500     7.111    58.432
Total             47.500     8.000    27.643    24.111    21.556   128.810

Empowerment & Job Satisfaction
Each cell shows the observed count, the expected count, and the contribution to $\chi^2_{calc}$.

                    Very                                    Very
                Positive  Positive   Neutral  Negative  Negative     Total
Autocratic             5        15        25        35        20       100
  expected         20.00     25.00     28.00     18.00      9.00
  contribution    11.250     4.000     0.321    16.056    13.444

Low Value             10        25        49        10         6       100
  expected         20.00     25.00     28.00     18.00      9.00
  contribution     5.000     0.000    15.750     3.556     1.000

Input Valued          45        35        10         9         1       100
  expected         20.00     25.00     28.00     18.00      9.00
  contribution    31.250     4.000    11.571     4.500     7.111

Total                 60        75        84        54        27       300
$\chi^2_{calc}$ = 128.810
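The expected counts and cell contributions above can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, assuming the observed counts from the empowerment example; the function name `chi_square_homogeneity` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Observed counts from the empowerment example:
# rows = organizations, columns = satisfaction categories
# (Very Positive, Positive, Neutral, Negative, Very Negative)
observed = {
    "Autocratic":   [5, 15, 25, 35, 20],
    "Low Value":    [10, 25, 49, 10, 6],
    "Input Valued": [45, 35, 10, 9, 1],
}

def chi_square_homogeneity(table):
    """Return (statistic, df, expected, contributions) for an r x c table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    # E_ij = (row total)(column total) / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(rows, expected)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df, expected, contributions

statistic, df, expected, contributions = chi_square_homogeneity(observed)
CHI2_CRIT = 15.5073  # table value for 8 df and alpha = .05

print(f"chi-square = {statistic:.3f} with {df} df")
for name, row in zip(observed, contributions):
    print(f"{name:<13}" + "".join(f"{c:9.3f}" for c in row))
print("Reject H0" if statistic > CHI2_CRIT else "FTR H0")
```

Printing the contributions row by row makes it easy to see which organization and satisfaction category drive the statistic.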
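A Brief Interpretation
Since $\chi^2_{calc}$ greatly exceeds $\chi^2_{crit}$ = 15.5073, we reject H0 (homogeneity): the satisfaction profiles of the three organizations differ.
Policies that more greatly empower employees tend to lead to employees with more positive outlooks, while policies that provide little power to employees tend to lead to employees with more negative outlooks.
Intermediate empowerment is associated with intermediate job satisfaction.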
Chi-Square Test of Independence

Example:
Is there a Link Between
Employee Empowerment & Customer Satisfaction?
• The Chi-Square ($\chi^2$) Test of Independence is used to determine whether two factors or traits are related to one another and, if so, to identify the nature of the relationship.
• Warning labels and seat belt laws have resulted from use of this test.
• To make progress, we need to recall the formal definition of independence.
Expressing Independence
• Two traits, R and C, are independent if: $P(R \cap C) = P(R)P(C)$
• What if R has categories $R_1, R_2, \dots, R_r$ and C has categories $C_1, C_2, \dots, C_c$?
• Then we must have $P(R_i \cap C_j) = P(R_i)P(C_j)$ for i = 1, 2, ..., r and j = 1, 2, ..., c, i.e. micro-independence to establish macro-independence.
• Let R = a “row” trait with r categories
• Let C = a “column” trait with c categories
• Let the probability that an observation is classified into the ith row be $p_{i.}$
• Let the probability that an observation is classified into the jth column be $p_{.j}$
• Let $p_{ij}$ be the probability that an observation is classified into the ith row and jth column.
• Given the preceding development, the concept of independence is formally expressed as:
H0: $p_{ij} = p_{i.}p_{.j}$ for all i,j combinations
HA: $p_{ij} \neq p_{i.}p_{.j}$ for at least one i,j combination
The Test Statistic & its Basis
• Let $n_{i.}$ be the number of items in the ith row
• Let $n_{.j}$ be the number of items in the jth column
• Let n be the total number of sample items
• Let $O_{ij}$ be the number of items at the intersection of the ith row & jth column.
• $p_{i.}$ is estimated by $\hat{p}_{i.} = n_{i.}/n$
• $p_{.j}$ is estimated by $\hat{p}_{.j} = n_{.j}/n$
• $p_{ij}$ is estimated by $O_{ij}/n$ if the two traits are dependent, and by $\hat{p}_{i.}\hat{p}_{.j}$ if the two traits are independent.
• To test H0 vs HA we compare the two estimates of $p_{ij}$, after each estimate has been weighted by the amount of evidence that we have, n.
• Doing this, squaring the comparisons (i.e. the differences) and standardizing the result yields the $\chi^2$ statistic for independence.
The chi-square statistic for independence is:

$$\chi^2_{calc} = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - n\hat{p}_{i.}\hat{p}_{.j})^2}{n\hat{p}_{i.}\hat{p}_{.j}} = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \quad \text{(sum over all cells)}$$

where $E_{ij} = n\hat{p}_{i.}\hat{p}_{.j} = (n_{i.})(n_{.j})/n$ is the expected number of values at the intersection of the ith row & jth column under independence.

The Test Statistic
• Examination of $\chi^2_{calc}$ indicates that it will be “small” in value if $O_{ij}$ and $E_{ij}$ are close, which would support independence.
• If $\chi^2_{calc}$ is “large”, it is due to discrepancy between $O_{ij}$ and $E_{ij}$ in at least one cell, and perhaps numerous cells. This supports dependence.
• The cells which contribute most greatly to $\chi^2_{calc}$ likely have the most to say about the nature of any dependence structure.
Customer Satisfaction & Employee Empowerment
• The organization responsible for administration of the Customer Satisfaction Index in Sweden examined customer satisfaction and employee empowerment indices for a sample of 500 Swedish companies.
• Categories for each index were “very low”, “low”, “moderate”, “high” and “very high”.
• Results follow. Is there a discernible relationship between customer satisfaction and employee empowerment?
Observed

Customer                         Employee Empowerment
Satisfaction    Very Low       Low  Moderate      High Very High     Total
Very Low              13        11         8         5         3        40
Low                   10        18        19        12         6        65
Moderate              18        32        42        44        34       170
High                  12        16        34        57        61       180
Very High              1         3         8        14        19        45
Total                 54        80       111       132       123       500
The Hypotheses & Decision Rule
• H0: $p_{ij} = p_{i.}p_{.j}$ for
  i = very low, low, moderate, high, very high customer sat.
  j = very low, low, moderate, high, very high empower.
• HA: $p_{ij} \neq p_{i.}p_{.j}$ for at least one i,j combination
• DR: Reject H0 in favor of HA if and only if $\chi^2_{calc} > \chi^2_{crit}$ = 31.9999. Otherwise, FTR H0.
• Essentially, we are examining whether there is a discernible relationship between levels of employee empowerment and customer satisfaction, as HA asserts.
$\chi^2_{crit}$ Determination
With 16 df and $\alpha$ = .01

df  $\chi^2_{.100}$  $\chi^2_{.050}$  $\chi^2_{.025}$  $\chi^2_{.010}$  $\chi^2_{.005}$
 1       2.7055           3.8415           5.0239           6.6349           7.8794
 2       4.6052           5.9915           7.3778           9.2103          10.5966
 3       6.2514           7.8147           9.3484          11.3449          12.8381
 4       7.7794           9.4877          11.1433          13.2767          14.8602
...
16      23.5418          26.2962          28.8454          31.9999          34.2672
...
30      40.2560          43.7729          46.9792          50.8922          53.6720
Calculation Example:
Very Low Customer Satisfaction and Low Employee Empowerment Cell and Decision Rule

• We have $O_{very\ low/low} = O_{12} = 11$
• Similarly $E_{12} = (n_{1.})(n_{.2})/n = (40)(80)/500 = 6.4$
• And $(O_{12} - E_{12})^2/E_{12} = (11-6.4)^2/6.4 = 3.30625$
• DR: Reject H0 in favor of HA if and only if $\chi^2_{calc} > \chi^2_{crit}$ = 31.9999. Otherwise, FTR H0.
Expected Counts

Customer                         Employee Empowerment
Satisfaction    Very Low       Low  Moderate      High Very High     Total
Very Low            4.32      6.40      8.88     10.56      9.84        40
Low                 7.02     10.40     14.43     17.16     15.99        65
Moderate           18.36     27.20     37.74     44.88     41.82       170
High               19.44     28.80     39.96     47.52     44.28       180
Very High           4.86      7.20      9.99     11.88     11.07        45
Total                 54        80       111       132       123       500
Cell Contributions to $\chi^2_{calc}$

Customer                         Employee Empowerment
Satisfaction    Very Low       Low  Moderate      High Very High     Total
Very Low           17.44      3.31      0.09      2.93      4.76     28.53
Low                 1.27      5.55      1.45      1.55      6.24     16.06
Moderate            0.01      0.85      0.48      0.02      1.46      2.82
High                2.85      5.69      0.89      1.89      6.31     17.63
Very High           3.07      2.45      0.40      0.38      5.68     11.98
Total              24.64     17.85      3.31      6.77     24.45     76.99


Chi-Square Test: expected counts are printed below observed counts

            VL       L       M       H      VH   Total
    1       13      11       8       5       3      40
          4.32    6.40    8.88   10.56    9.84            (expected counts)

    2       10      18      19      12       6      65
          7.02   10.40   14.43   17.16   15.99

    3       18      32      42      44      34     170
         18.36   27.20   37.74   44.88   41.82

    4       12      16      34      57      61     180
         19.44   28.80   39.96   47.52   44.28

    5        1       3       8      14      19      45
          4.86    7.20    9.99   11.88   11.07

Total       54      80     111     132     123     500

Chi-Sq = 17.440 + 3.306 + 0.087 + 2.927 + 4.755 +      (cell contributions to $\chi^2_{calc}$)
          1.265 + 5.554 + 1.447 + 1.552 + 6.241 +
          0.007 + 0.847 + 0.481 + 0.017 + 1.462 +
          2.847 + 5.689 + 0.889 + 1.891 + 6.313 +
          3.066 + 2.450 + 0.396 + 0.378 + 5.681 = 76.991
DF = 16, P-Value = 0.000
2 cells with expected counts less than 5.0
Decision & Interpretation
• Since $\chi^2_{calc}$ (76.99) greatly exceeds $\chi^2_{crit}$ (31.9999) we reject H0 (independence) in favor of HA (dependence). The question becomes one of “what is the nature of the relationship”. This can usually be addressed by study of the cells which most greatly contributed to $\chi^2_{calc}$.
• Cells with large contributions to $\chi^2_{calc}$ are so because, for that cell, $O_{ij}$ is either much greater than or much smaller than $E_{ij}$. Why? Study of such cells will commonly lead to a consistent theme.
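One way to carry out this study of the cells is to rank them by contribution and note whether each observed count sits above or below its expected count. The sketch below does this for the Swedish data; the variable names and the choice to show the top five cells are illustrative assumptions, not part of the original analysis.

```python
levels = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Observed counts: rows = customer satisfaction, columns = employee empowerment
observed = [
    [13, 11, 8, 5, 3],
    [10, 18, 19, 12, 6],
    [18, 32, 42, 44, 34],
    [12, 16, 34, 57, 61],
    [1, 3, 8, 14, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

cells = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        contribution = (o - e) ** 2 / e
        direction = "more than expected" if o > e else "fewer than expected"
        cells.append((contribution, levels[i], levels[j], o, e, direction))

chi2_calc = sum(cell[0] for cell in cells)
print(f"chi-square = {chi2_calc:.2f}")

# Show the five largest contributions, biggest first
top_cells = sorted(cells, reverse=True)[:5]
for contribution, sat, emp, o, e, direction in top_cells:
    print(f"satisfaction={sat:<9} empowerment={emp:<9} "
          f"O={o:>2} E={e:5.2f} contribution={contribution:6.3f} ({direction})")
```

Reading the direction column alongside the category labels is what turns a large statistic into a statement about how the two traits move together.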
Interpretation
• In this example, study of such cells leads reasonably to the conclusion that, in SWEDEN:
  • Swedish companies that more greatly empower their human resource also achieve higher customer satisfaction ratings.
  • Similarly, Swedish companies that empower their human resource at lower levels produce lower customer satisfaction levels.
• A follow-up issue worthy of examination would be that of possible linkages between these two variables and that of bottom-line financial profitability or the other two “bottom lines” of the so-called Triple Bottom Line: the social and environmental ones.
Model Adequacy:
Chi-Square Goodness-of-Fit Testing
DOES THIS MODEL FIT?
Chi-Square Goodness-of-Fit Tests
• The purpose of $\chi^2$ goodness-of-fit tests is to evaluate whether a particular probability distribution does an adequate job of modeling the behavior of the process under consideration. This sort of test can be applied to any model.
• A “skeleton” or template for the chi-square goodness-of-fit test follows.
$\chi^2$ Goodness-of-Fit Test: General Layout
1) H0: $p_1 = p_{10}$, $p_2 = p_{20}$, ... , $p_k = p_{k0}$
   HA: at least one $p_i \neq p_{i0}$
2) n = _______  $\alpha$ = _______
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit}$ = ___. Otherwise, FTR H0.
4) $\chi^2_{calc} = \sum_{i=1}^{k} (O_i - np_{i0})^2/(np_{i0}) = \sum_{i=1}^{k} (O_i - E_i)^2/E_i$
5) Interpretation: Should relate to whether the hypothesized model adequately describes behavior of the process under consideration.
Generic Example: A computer manufacturer produces a disk drive which has three major causes of failure (A, B, C) and a variety of minor failure causes (D).
Suppose that, historically, failures have been distributed across these causes as follows:
Due to A: .20   Due to B: .35   Due to C: .30   Due to D: .15
The manufacturer has worked on A, B, and C and believes that failures due to these causes have been reduced, so that, while fewer failures will occur, it is more likely that when one occurs, it will be due to D. To examine this claim the manufacturer will sample 200 failed disk drives manufactured since process changes were made.
IF THE CHANGES HAD NO IMPACT, then the EXPECTED numbers of these failed drives due to causes A, B, C, and D would be:
$E_A = np_{A0}$ = 200(.20) = 40   $E_B = np_{B0}$ = 200(.35) = 70
$E_C = np_{C0}$ = 200(.30) = 60   $E_D = np_{D0}$ = 200(.15) = 30

Upon observation, suppose that we had $O_A$ = 28, $O_B$ = 66, $O_C$ = 46, $O_D$ = 60.
Test the appropriate hypothesis at the $\alpha$ = .05 level.
Failure Mode Profile Example - Continued

1) H0: $p_A$ = .20, $p_B$ = .35, $p_C$ = .30, $p_D$ = .15
   HA: at least one $p_i \neq p_{i0}$ for i = A, B, C, D
2) n = 200, $\alpha$ = .05
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit}$ = 7.8147. Otherwise, FTR H0.
   Note: There are (k-1) = 3 degrees of freedom.
4) $\chi^2_{calc} = \sum (O_i - np_{i0})^2/(np_{i0}) = \sum (O_i - E_i)^2/E_i$
   = $(28-40)^2/40 + (66-70)^2/70 + (46-60)^2/60 + (60-30)^2/30$
   = 3.6000 + 0.2286 + 3.2667 + 30.0000 = 37.0953
5) Interpretation: Since $\chi^2_{calc}$ exceeds $\chi^2_{crit}$, we can conclude that the historic failure mode distribution no longer applies (reject H0 in favor of HA). So how has the distribution changed? The answer is embedded in the individual category contributions to $\chi^2_{calc}$: larger contributions indicate where the changes have occurred. There are reductions in the shares of A and C, no obvious change in B, and the various failures that make up D now comprise a proportionally larger share of the failures.
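The five-step layout translates directly into code. The sketch below is one possible plain-Python rendering for the failure mode example; the function name `goodness_of_fit` and the dictionary-based inputs are assumptions made for illustration.

```python
def goodness_of_fit(observed, hypothesized):
    """Chi-square goodness-of-fit statistic for fully specified probabilities.

    observed:     dict mapping category -> observed count
    hypothesized: dict mapping category -> probability under H0
    """
    n = sum(observed.values())
    contributions = {}
    for category, p0 in hypothesized.items():
        expected = n * p0  # E_i = n * p_i0
        contributions[category] = (observed[category] - expected) ** 2 / expected
    df = len(hypothesized) - 1  # no parameters are estimated from the data
    return sum(contributions.values()), df, contributions

observed = {"A": 28, "B": 66, "C": 46, "D": 60}
historic = {"A": 0.20, "B": 0.35, "C": 0.30, "D": 0.15}

chi2_calc, df, contributions = goodness_of_fit(observed, historic)
CHI2_CRIT = 7.8147  # table value for 3 df and alpha = .05

print(f"chi-square = {chi2_calc:.4f} with {df} df")
for category, value in contributions.items():
    print(f"  cause {category}: contribution = {value:.4f}")
print("Reject H0" if chi2_calc > CHI2_CRIT else "FTR H0")
```

Keeping the per-category contributions in a dictionary mirrors step 5, where the interpretation depends on which categories contribute most.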
Chi-Square Goodness of Fit Test for the Poisson Distribution
FFB of Centreville

A sample of 120 minutes selected during rush periods at FFB gave the following number of customers arriving during each of those 120 minutes. Is this data consistent with a Poisson distribution with a mean of 1.7 customers per minute, as previously stated? Test the appropriate hypothesis at the $\alpha$ = .10 level of significance.

Number of customers     0     1     2     3     4 or more
Frequency              25    42    35     9     9
Poisson Goodness of Fit Test

Customers/minute     Prob.   Obs (O)   Exp (E)   (O-E)^2/E
0                   0.1827        25    21.924      0.4316
1                   0.3106        42    37.272      0.5998
2                   0.2640        35    31.680      0.3479
3                   0.1496         9    17.952      4.4640
4 or more           0.0932         9    11.184      0.4265
Total                 1.00       120   120          6.2698 = $\chi^2_{calc}$

With $\alpha$ = .10 and (k-1) = 4 df, the critical value is 7.7794.


FFB of Centreville - Continued
1) H0: the number of customers arriving per minute is Poisson distributed with a mean of 1.7, OR
   p(0) = .1827, p(1) = .3106, p(2) = .2640, p(3) = .1496, p(4+) = .0932
   HA: the number of customers arriving per minute is not Poisson with mean 1.7
2) n = 120 and $\alpha$ = .10
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit}$ = 7.7794. Otherwise, FTR H0. (Note: there are 4 df.)
4) $\chi^2_{calc}$ = 6.2698 (calculations on previous slide)
5) FTR H0. In this case, the number of customers arriving per minute during the business rush at FFB of Centreville is reasonably well modeled by a Poisson distribution with a mean of 1.7.

As a modification: if we had not had information about the mean number of customers arriving per minute, we would have had to estimate this value with the sample mean and then determine the estimated probabilities. This would have cost an additional degree of freedom, i.e. df = (k-1) - 1 = 3.
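The probabilities in the Poisson table need not be read from a printed table; they follow from the Poisson formula, with the last category taking whatever probability remains. The sketch below assumes the FFB counts and the hypothesized mean of 1.7; the variable names are illustrative.

```python
import math

MEAN = 1.7                     # hypothesized Poisson mean (customers per minute)
observed = [25, 42, 35, 9, 9]  # counts for 0, 1, 2, 3, and "4 or more" customers
n = sum(observed)

# Poisson probabilities for 0..3, with the remaining mass assigned to "4 or more"
probs = [math.exp(-MEAN) * MEAN ** k / math.factorial(k) for k in range(4)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_calc = sum(contributions)

df_known_mean = len(observed) - 1          # mean is specified by H0
df_estimated_mean = len(observed) - 1 - 1  # one more df lost if the mean were estimated
CHI2_CRIT = 7.7794                         # table value for 4 df and alpha = .10

print(f"chi-square = {chi2_calc:.2f} with {df_known_mean} df")
print("Reject H0" if chi2_calc > CHI2_CRIT else "FTR H0")
```

Because the code keeps unrounded probabilities, the statistic can differ in the later decimals from a hand calculation that uses four-place probabilities.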
$\chi^2$ Goodness-of-Fit Test: Binomial
Example
Oil & gas exploration is both expensive and risky. The average cost of a “dry hole” is in excess of $20 million. New technologies are always under development in an effort to reduce the likelihood of drilling a “dry hole”, with the result being increased profitability. Suppose an experimental technology has been developed that claims to have an 80% success rate (i.e. only 20% dry holes). This technology was tested by drilling four holes in each of 100 fields and counting the number of productive wells in each field. The data are recorded below:

Number of productive wells     0     1     2     3     4
Observed frequency             3     6    28    41    22

Test the appropriate hypothesis at the $\alpha$ = .01 level of significance.


Oil & Gas Exploration Example
1) H0: the new technology delivers success according to a binomial distribution with p = .8, or
   p(0 or 1) = .0272, p(2) = .1536, p(3) = .4096, p(4) = .4096
   (Note: see the next slide for these values.)
   HA: the new technology does not deliver success according to a binomial distribution with p = .8.
2) n = 100 and $\alpha$ = .01
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit}$ = 11.3449 (3 df). Otherwise, FTR H0.
4) $\chi^2_{calc}$ = 33.678 (calculations on next slide)
5) Reject H0 in favor of HA. In this case, note that “O” tends to be greater than “E” for lower numbers of successful wells, and the reverse for higher numbers of successful wells. This indicates that the success rate of the new technology is LESS THAN THE CLAIMED 80% rate.
Oil & Gas Exploration Example - Continued
MTB > pdf;
SUBC> binomial n = 4, p=.8.

BINOMIAL WITH N = 4 P = 0.800000

    K      P(X = K)   Observed   Expected   (O-E)^2/E
    0       0.0016         3        0.16
    1       0.0256         6        2.56
  0 & 1     0.0272         9        2.72     14.4994   (categories 0 and 1 combined)
    2       0.1536        28       15.36     10.4017
    3       0.4096        41       40.96      0.0000
    4       0.4096        22       40.96      8.7764
  Total                  100      100        33.678 = $\chi^2_{calc}$

MTB > invcdf .99;
SUBC> chis 3.
   0.9900   11.3449
This is $\chi^2_{crit}$, based on $\alpha$ = .01 with 3 df.
Modified Oil & Gas Exploration Example
(still binomial)
If p were unknown, then it would have to be estimated from the data. There is a cost to this: a lost degree of freedom. In general df = (k - 1) - m, where
k = number of categories
-1 because the probabilities across all categories add to one (lacking only one probability, we can determine the other)
m = the number of parameters that must be estimated.
In this case, p is estimated as follows: a total of 400 wells were drilled (100 fields at 4 wells each). The number of productive wells was
(3*0 + 6*1 + 28*2 + 41*3 + 22*4) = 273
so that our estimate of p is 273/400 = .6825. The modified calculations follow.
Modified Oil & Gas Exploration Example - Continued
MTB > pdf;
SUBC> binomial n=4 p=.6825.

BINOMIAL WITH N = 4 P = 0.682500

    K      P(X = K)   Observed   Expected   (O-E)^2/E
    0       0.0102
    1       0.0874
  0 & 1     0.0976         9        9.76      0.0592   (categories 0 and 1 combined)
    2       0.2817        28       28.17      0.0010
    3       0.4037        41       40.37      0.0098
    4       0.2170        22       21.70      0.0041
                                              0.0742 = calculated value of $\chi^2$

MTB > invcdf .99;
SUBC> chis 2.
   0.9900   9.2103 = critical value

Clearly we would FTR H0. Combining the two results, we have not rejected the binomial distribution altogether; we rejected only the binomial distribution with p = .8. The binomial distribution with p = .6825 does an excellent job of modeling the performance of this new oil & gas exploration technology.
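Both versions of the binomial test can be run side by side to see how estimating p changes the statistic and the degrees of freedom. The sketch below assumes the observed field counts from the Minitab tables and pools the 0 and 1 categories as the slides do; `binomial_gof` is a hypothetical helper name.

```python
from math import comb

observed = [3, 6, 28, 41, 22]  # fields with 0, 1, 2, 3, 4 productive wells
WELLS_PER_FIELD = 4

def binomial_gof(observed, p, wells=WELLS_PER_FIELD):
    """Chi-square statistic after pooling the 0 and 1 categories."""
    n = sum(observed)
    probs = [comb(wells, k) * p ** k * (1 - p) ** (wells - k)
             for k in range(wells + 1)]
    pooled_obs = [observed[0] + observed[1]] + observed[2:]
    pooled_exp = [n * (probs[0] + probs[1])] + [n * q for q in probs[2:]]
    return sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))

# Claimed success rate: p is fixed by H0, so df = (4 - 1) - 0 = 3
chi2_claimed = binomial_gof(observed, 0.8)

# Estimated success rate: productive wells / wells drilled, so df = (4 - 1) - 1 = 2
total_productive = sum(k * count for k, count in enumerate(observed))
p_hat = total_productive / (WELLS_PER_FIELD * sum(observed))
chi2_estimated = binomial_gof(observed, p_hat)

print(f"p = 0.8:       chi-square = {chi2_claimed:.3f} (3 df, critical value 11.3449)")
print(f"p = {p_hat:.4f}:    chi-square = {chi2_estimated:.4f} (2 df, critical value 9.2103)")
```

The second statistic may differ slightly in the third or fourth decimal from the slide's value, because the slide works from probabilities rounded to four places while the code keeps them unrounded.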
