Example:
Are Employees at Organizations with Differing
Views of Empowerment Equally Satisfied?
Well-Known American Statistician:
George Gallup
George Gallup (1901-84) was an American public opinion analyst and
statistician, educated at the University of Iowa. He was head of the
journalism department at Drake University (1929-31), professor of
journalism and advertising at Northwestern University (1931-32), and
professor at the Pulitzer School of Journalism, Columbia University
(1935-37). In 1935 he founded and became director of the American
Institute of Public Opinion, and in 1936 he established the British
Institute of Public Opinion. Gallup was a pioneer in the use of statistical
methods for measuring the interest of readers in the features and
advertisements of magazines and newspapers and for determining public
opinion on general issues. He extended this research to include reactions
of radio audiences and founded the Audience Research Institute in 1939.
Introduction
Chi-square analysis methods are approximate methods that
are among the most commonly used of all statistical
techniques.
The methods introduced here are used to examine “frequency”
or “count” data.
The methods are conceptually simple, although manual
computation can be tedious.
Chi-Square Tables
χ² tables are included in most statistics texts
and consist of columns and rows, with columns
representing upper-tail areas under the curve and rows
associated with the degrees of freedom (df),
which, for χ² tests of homogeneity and
independence, are df = (r-1)(c-1), where r and c are
the numbers of rows and columns in the table.
• Typical columns are: χ².100 χ².050 χ².025 χ².010 χ².005
• The decision rule for both a chi-square test of homogeneity and
one of independence is:
DR: Reject H0 in favor of HA if and only if χ²calc > χ²crit.
Otherwise, fail to reject (FTR) H0.
• In the case of homogeneity, this is essentially:
DR: Reject similarity of processes in favor of distinctions between
them if and only if their profiles differ markedly from one another,
so that the preponderance of evidence supports distinctions.
Otherwise, FTR H0.
χ²crit Determination
With 8 df and α = .05, read the df = 8 row under the χ².050 column:
χ².05,8 = 15.5073
[Figure: chi-square density with 8 df; the shaded upper-tail area α = .05 lies to the right of 15.5073.]
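Table lookups such as χ².05,8 = 15.5073 can also be reproduced numerically by inverting the upper-tail area. The sketch below is an illustration, not the method the slides use: it assumes integer degrees of freedom, evaluates the closed-form upper-tail probability for that case, and finds the cutoff by bisection. The function names `chi2_sf` and `chi2_crit` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: P(X > x) = e^(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: P(X > x) = erfc(sqrt(y)) + e^(-y) * sum_{k=1}^{(df-1)/2} y^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def chi2_crit(alpha: float, df: int) -> float:
    """Cutoff c whose upper-tail area is alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the cutoff
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # too much area to the right: the cutoff lies further out
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_crit(0.05, 8), 4))  # 15.5073, matching the table lookup
```

The same call reproduces the other cutoffs quoted in these notes, for example `chi2_crit(0.05, 3)` for 7.8147 and `chi2_crit(0.10, 4)` for 7.7794. With SciPy available, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same values.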
χ² Test of Homogeneity
This test examines whether several populations or processes
are effectively just one process (homogeneous) or
distinct ones.
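Expressing Homogeneity
Expressed as probabilities, the χ² test of
homogeneity can be represented as follows:
H0: p1j = p2j = ... = prj (= p.j) for every category j of the trait of interest
HA: pij differs across the processes for at least one category j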
The Test Statistic
Let ni. be the number of items sampled from the ith
process.
Let n = n1. + n2. + ... + nr. be the total number of items
sampled from the (r) processes.
Let Oij be the number of items from the ith process sample
that are best-described by the jth category of the trait of
interest.
Let n.j be the number of items best-described by the jth
category of the trait of interest, irrespective of the process
from which an item originates.
• Under H0, p.j is best estimated by $\hat{p}_{.j} = n_{.j}/n$.
• $p_{ij}$ is estimated by $\hat{p}_{.j}$ under homogeneity or, more
generally, by $O_{ij}/n_{i.}$.
• For each cell in the table we compare the two
estimates of the cell probability, each weighted by
the sample size, that is: $O_{ij} - n_{i.}\hat{p}_{.j}$.
• We square the differences, standardize, and sum the
results to obtain χ²calc.
• It can be seen that $E_{ij} = n_{i.}\hat{p}_{.j} = (n_{i.})(n_{.j})/n$
or, ultimately, the same as in independence.
$$\chi^2_{\text{calc}} = \sum_{i}\sum_{j} \frac{(O_{ij} - n_{i.}\hat{p}_{.j})^2}{n_{i.}\hat{p}_{.j}} = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Observed Data (rows: views of empowerment; columns j = 1, ..., 5: job satisfaction categories)
                 j=1     j=2     j=3     j=4     j=5    Total
Autocratic         5      15      25      35      20      100
Low Value         10      25      49      10       6      100
Input Valued      45      35      10       9       1      100
Total             60      75      84      54      27      300

Expected Frequencies, Eij = (ni.)(n.j)/n
                 j=1     j=2     j=3     j=4     j=5    Total
Autocratic     20.00   25.00   28.00   18.00    9.00      100
Low Value      20.00   25.00   28.00   18.00    9.00      100
Input Valued   20.00   25.00   28.00   18.00    9.00      100
Total             60      75      84      54      27      300

Cell Contributions to χ²calc, (Oij - Eij)²/Eij
                 j=1     j=2     j=3     j=4     j=5  Row sum
Autocratic    11.250   4.000   0.321  16.056  13.444   45.071
Low Value      5.000   0.000  15.750   3.556   1.000   25.306
Input Valued  31.250   4.000  11.571   4.500   7.111   58.433
χ²calc = 128.810, with df = (3-1)(5-1) = 8
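The expected-frequency and cell-contribution tables follow mechanically from the observed counts. A minimal pure-Python sketch of that arithmetic, using Eij = (ni.)(n.j)/n; the helper name `chi_square_table` is hypothetical:

```python
def chi_square_table(observed):
    """Return (expected, contributions, chi2_calc, df) for an r x c table of counts."""
    row_tot = [sum(row) for row in observed]        # n_i.
    col_tot = [sum(col) for col in zip(*observed)]  # n_.j
    n = sum(row_tot)
    # E_ij = (n_i.)(n_.j)/n
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    # (O_ij - E_ij)^2 / E_ij for every cell
    contrib = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
               for o_row, e_row in zip(observed, expected)]
    chi2_calc = sum(sum(row) for row in contrib)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return expected, contrib, chi2_calc, df


observed = [
    [5, 15, 25, 35, 20],   # Autocratic
    [10, 25, 49, 10, 6],   # Low Value
    [45, 35, 10, 9, 1],    # Input Valued
]
expected, contrib, chi2_calc, df = chi_square_table(observed)
print(f"chi2_calc = {chi2_calc:.3f}, df = {df}")  # chi2_calc = 128.810, df = 8
print(chi2_calc > 15.5073)                        # True: reject H0 at alpha = .05
```

The 8 df match the χ².05,8 = 15.5073 cutoff, so homogeneity is rejected by a wide margin. With SciPy, `scipy.stats.chi2_contingency(observed)` returns the same statistic, df and expected table.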
A Brief Interpretation
Policies that give employees more power tend to produce employees
with more positive outlooks, while policies that give employees little
power tend to produce employees with more negative outlooks.
Intermediate empowerment is associated with intermediate
job satisfaction.
Chi-Square Test of Independence
Example:
Is there a Link Between
Employee Empowerment & Customer Satisfaction?
The Chi-Square (χ²) Test of Independence is
used to determine whether two factors or traits are
related to one another and, if so, to identify the
nature of the relationship.
Warning labels and seat belt laws have resulted
from use of this test.
To make progress, we need to recall the formal
definition of independence.
Expressing Independence
Two traits, R and C, are independent if: P(R ∩ C) = P(R)P(C)
• pij is estimated by
Oij/n in general (without assuming independence), or by
$\hat{p}_{i.}\hat{p}_{.j}$ if the two traits are independent,
where $E_{ij} = n\hat{p}_{i.}\hat{p}_{.j} = (n_{i.})(n_{.j})/n$ is the expected
number of values at the intersection of the ith row
and jth column under independence.
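Observed (rows i: customer satisfaction; columns j: employee empowerment)
             Very Low    Low   Moderate    High   Very High   Total
Very Low        13        11       8          5        3          40
Low             10        18      19         12        6          65
Moderate        18        32      42         44       34         170
High            12        16      34         57       61         180
Very High        1         3       8         14       19          45
Total           54        80     111        132      123         500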
The Hypotheses & Decision Rule
H0: pij = pi.p.j for
i = very low, low, moderate, high, very high customer sat.
j = very low, low, moderate, high, very high empower.
HA: pij ≠ pi.p.j for at least one i,j combination
DR: Reject H0 in favor of HA if and only if χ²calc >
χ²crit = 31.9999 (the upper .01 point with df = (5-1)(5-1) = 16). Otherwise, FTR H0.
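Observed (top) and Expected (bottom) Frequencies, Eij = (ni.)(n.j)/n
             Very Low    Low   Moderate    High   Very High   Total
Very Low        13        11       8          5        3          40
                 4.32      6.40    8.88      10.56     9.84
Low             10        18      19         12        6          65
                 7.02     10.40   14.43      17.16    15.99
Moderate        18        32      42         44       34         170
                18.36     27.20   37.74      44.88    41.82
High            12        16      34         57       61         180
                19.44     28.80   39.96      47.52    44.28
Very High        1         3       8         14       19          45
                 4.86      7.20    9.99      11.88    11.07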
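The same arithmetic applies to the independence table. A self-contained sketch (the helper name `expected_and_chi2` is hypothetical) that computes every Eij and the statistic to compare with the 31.9999 cutoff:

```python
def expected_and_chi2(observed):
    """Expected counts under independence, the chi-square statistic, and its df."""
    row_tot = [sum(row) for row in observed]        # n_i.
    col_tot = [sum(col) for col in zip(*observed)]  # n_.j
    n = sum(row_tot)
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    chi2_calc = sum((o - e) ** 2 / e
                    for o_row, e_row in zip(observed, expected)
                    for o, e in zip(o_row, e_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return expected, chi2_calc, df


# Rows: customer satisfaction; columns: employee empowerment (Very Low ... Very High)
observed = [
    [13, 11, 8, 5, 3],
    [10, 18, 19, 12, 6],
    [18, 32, 42, 44, 34],
    [12, 16, 34, 57, 61],
    [1, 3, 8, 14, 19],
]
expected, chi2_calc, df = expected_and_chi2(observed)
print([round(e, 2) for e in expected[1]])  # [7.02, 10.4, 14.43, 17.16, 15.99]
print(df, chi2_calc > 31.9999)             # 16 True
```

On these counts the statistic exceeds the 31.9999 cutoff for 16 df.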
Generic Example: A computer manufacturer produces a disk drive which has three
major causes of failure (A, B, C) and a variety of minor failure causes (D).
Failure Mode Profile Example - Continued
2) n = 200, α = .05
3) DR: Reject H0 in favor of HA iff χ²c > χ²T = 7.8147. Otherwise, FTR H0.
Note: There are (k-1) = 3 degrees of freedom.
4)
$$\begin{aligned}
\chi^2_c &= \sum_i \frac{(O_i - np_{i0})^2}{np_{i0}} = \sum_i \frac{(O_i - E_i)^2}{E_i} \\
&= \frac{(28-40)^2}{40} + \frac{(66-70)^2}{70} + \frac{(46-60)^2}{60} + \frac{(60-30)^2}{30} \\
&= 3.6000 + 0.2286 + 3.2667 + 30.0000 = 37.0953
\end{aligned}$$
5) Interpretation: Since χ²c exceeds χ²T, we can conclude that the historic failure mode
distribution no longer applies (reject H0 in favor of HA). So how has the distribution changed?
The answer is embedded in the individual category contributions to χ²calc: larger contributions
indicate where the changes have occurred. There are reductions in A and C, no obvious change in B,
and the various failures that make up D now comprise a proportionally larger share of the failures.
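Step 4 can be checked with a few lines of code. The historic proportions below (.20, .35, .30, .15 for A, B, C, D) are an assumption, inferred by dividing the expected counts in step 4 by n = 200, since the hypotheses themselves are not shown here.

```python
observed = {"A": 28, "B": 66, "C": 46, "D": 60}
# Historic proportions p_i0, inferred as E_i / n from step 4 (assumption)
p0 = {"A": 0.20, "B": 0.35, "C": 0.30, "D": 0.15}
n = sum(observed.values())  # 200

contrib = {}
for cat, o in observed.items():
    e = n * p0[cat]                  # E_i = n * p_i0
    contrib[cat] = (o - e) ** 2 / e  # this category's contribution to chi2_c

chi2_c = sum(contrib.values())
for cat, c in contrib.items():
    print(f"{cat}: {c:.4f}")
print(f"chi2_c = {chi2_c:.4f}, reject H0: {chi2_c > 7.8147}")
```

The unrounded total is 37.0952; adding the four rounded contributions gives 37.0953. Category D's contribution of 30.0000 dominates, which is where the interpretation locates the main change.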
Chi-Square Goodness of Fit Test
for the Poisson Distribution
A sample of 120 minutes selected during rush periods at FFB gave the
following number of customers arriving during each of those 120 minutes.
Is this data consistent with a Poisson distribution with a mean of 1.7
customers per minute, as previously stated? Test the appropriate hypothesis
at the α = .10 level of significance.
Number of Customers    0    1    2    3    4 or more
Frequency             25   42   35    9    9
FFB of Centreville
Poisson Goodness of Fit Test
Customers/minute   Prob.    Obs (O)   Exp (E)    (O-E)²/E
0                  0.1827     25      21.924     0.4316
1                  0.3106     42      37.272     0.5998
2                  0.2640     35      31.680     0.3479
3                  0.1496      9      17.952     4.4640
4 or more          0.0931      9      11.172     0.4223
Total              1.0000    120     120.000     6.2656
3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 7.7794. Otherwise, FTR H0.
Note: There are (k-1) = 4 degrees of freedom.
5) FTR H0. In this case, the number of customers arriving per minute during the business rush
at FFB of Centreville is reasonably well-modeled by a Poisson distribution with a mean of 1.7.
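The Poisson expected counts can be generated rather than looked up. A minimal sketch, assuming λ = 1.7 as stated and lumping all counts of 4 or more into one tail category; it uses unrounded probabilities, so its contributions differ slightly from a four-decimal table.

```python
import math

lam = 1.7                       # hypothesized mean customers per minute
observed = [25, 42, 35, 9, 9]   # minutes with 0, 1, 2, 3, and "4 or more" arrivals
n = sum(observed)               # 120 minutes

# P(X = k) for k = 0..3, then the tail P(X >= 4) = 1 - P(X <= 3)
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(4)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # no parameters estimated from the data

print(f"chi2_calc = {chi2_calc:.4f}, df = {df}, reject H0: {chi2_calc > 7.7794}")
```

If λ had to be estimated from the data instead, one more degree of freedom would be lost; with SciPy, `scipy.stats.chisquare(observed, expected, ddof=1)` applies that reduction.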
As a modification: if we had not had information about the mean number
of customers arriving per minute, we would have had to estimate this value
with the sample mean and then determine the estimated probabilities.
This would have cost an additional degree of freedom (i.e., df = (k-1) - 1 = 3).
χ² Goodness-of-Fit Test: Binomial
Example
Oil & Gas Exploration is both expensive and risky. The average cost of a “dry
hole” is in excess of $20 million. New technologies are always under
development in an effort to reduce the likelihood of drilling a “dry hole” with the
result being increased profitability. Suppose an experimental technology has been
developed that claims to have an 80% success rate (i.e., only 20% dry holes). This
technology was tested by drilling four holes and counting the number of
productive wells, and this experiment was repeated 100 times. The data are recorded below:
Number of productive wells    0    1    2    3    4
Observed frequency            3    6   22   41   28
3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 11.3449. Otherwise, FTR H0.
5) Reject H0 in favor of HA. In this case, note that “O” tends to be greater than
“E” for lower numbers of successful wells, and the reverse for higher
numbers of successful wells ... this indicates that the success rate of the new
technology is LESS THAN THE CLAIMED 80% rate.
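The expected frequencies for the claimed 80% success rate come from the binomial distribution with n = 4 and p = .8. A minimal sketch; the slides do not show which categories, if any, were pooled, so this version pools 0 and 1 productive wells as an assumption, because both expected counts are small (0.16 and 2.56).

```python
from math import comb

n_wells, p = 4, 0.8
observed = [3, 6, 22, 41, 28]   # experiments with 0, 1, 2, 3, 4 productive wells
trials = sum(observed)          # 100 experiments

# Binomial probabilities P(X = k), k = 0..4, under the claimed p = .8
probs = [comb(n_wells, k) * p ** k * (1 - p) ** (n_wells - k) for k in range(n_wells + 1)]
expected = [trials * q for q in probs]  # about 0.16, 2.56, 15.36, 40.96, 40.96

# Assumption: pool 0 and 1 wells because both expected counts are small
obs_pooled = [observed[0] + observed[1]] + observed[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]

chi2_calc = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1
print(f"chi2_calc = {chi2_calc:.2f}, df = {df}, reject H0: {chi2_calc > 11.3449}")
```

Comparing `observed` with `expected` shows the pattern described in step 5: O exceeds E for 0 to 2 productive wells and falls well short of E at 4 wells.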
Oil & Gas Exploration Example - Continued
MTB > pdf;
SUBC> binomial n = 4, p=.8.
Clearly we would FTR H0. So, if you combine the information, you have not rejected the
binomial distribution altogether, though you did reject the binomial distribution with p = .8.
The binomial distribution with p = .6825 does an excellent job of modeling the performance
of this new oil & gas exploration technology.