A Frequency Table Analytical Approach to Observational Data

by
Norm Shklov, Ph.D.
Department of Mathematics and Statistics
University of Windsor
Windsor, Ontario, Canada
(Retired)
and
Jay C. Powell, Ph.D.
Faculty of Education
University of Windsor
(Retired)
Running Head: Frequency table data analysis
September 2001.
ABSTRACT
Having established clear evidence for a form of learning that is transformational and
discontinuous, the authors herewith present a way to consider frequency data in the
educational and social-sciences setting. This paper describes the procedure in detail and gives
several instances of its application.
The Problem
These two researchers have, for some time, been investigating non-traditional
approaches to observational data. The initial problem arose because they were trying to find
some way to examine, in multiple-choice tests, the patterns in which students' answers
(whether right or wrong) change with learning. Their presumption has been that there may be a
discontinuous component to learning in addition to the linear cumulative one commonly
assumed.
As background to this problem there are two possibilities. The first possibility is that
all wrong answers are blind guesses and that students will change from any wrong answer to
the right one with appropriate instruction. In this case the frequent current practice of
considering only right answers when observing student behavior for quality control would be
entirely appropriate. There would be no meaningful information to be found among the
wrong answers.
The second possibility, which comes from Powell's teaching experience and from
Piaget's many years of clinical observation (see: Flavell, 1963), is that answers change in
some sort of sequential order that is dependent upon how students interpret the questions
being asked. Their interpretations, in turn, could be dependent upon their current style of
thinking and their current depth of intellectual maturity.
In this case, learning would display discontinuous transformational properties
(systematic answer changes) in addition to cumulative properties (increasing numbers of
correct answers). There might be considerable meaningful information in all answers, not
merely the right ones.
To look at change within the testing context, it is necessary to give the same test more
than once. When this is done, it becomes possible to compare the answers selected upon the
first administration to those selected on the second administration and build frequency tables
that summarize these selections. There are two possibilities. First, the students may select the
same answer on both administrations. Second, they may change from one answer to another
within each item.
If the first possibility (that students guess blindly until they know the answer) is
correct, the typical observation should be changes to the right answer from any wrong one
followed by systematic repetition of the right answer over time. No other combination of
selection should be statistically significant beyond a chance level.
If the second possibility is correct, then a much more complex pattern of answer
stability and answer change should become evident as students transform their thinking with
increased maturity.
The Approach
In a four-option multiple-choice test there will be a four by four frequency matrix for
each item containing 16 cells. One direction may be used to represent the responses from the
first administration of the test and the other direction to represent the responses from the
second administration. In this case each cell will be the frequency of the event of joint
selection for each pre-post response pair. Those students who, for some reason, omitted
making a choice at either of two administrations may be dropped from consideration because
these data will not add to the information about the dynamics of answering from within the
test.
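The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the option labels, the function name `transition_table`, and the six students' responses are hypothetical and are not taken from the authors' data. Omitted responses are represented here as `None` and are dropped, as the text proposes.

```python
from collections import Counter

OPTIONS = ["A", "B", "C", "D"]  # hypothetical labels for a four-option item


def transition_table(first, second, options=OPTIONS):
    """Count pre-post answer pairs for one item.

    Rows index the answer chosen on the first administration,
    columns the answer chosen on the second. Any student who
    omitted a response on either administration is dropped.
    """
    pairs = Counter(
        (a, b)
        for a, b in zip(first, second)
        if a in options and b in options
    )
    return [[pairs[(r, c)] for c in options] for r in options]


# Hypothetical responses of six students to a single item.
first = ["A", "B", "B", None, "C", "D"]
second = ["B", "B", "B", "A", None, "D"]

table = transition_table(first, second)
for row in table:
    print(row)
```

Two of the six hypothetical students omitted a response, so only four answer pairs remain in the resulting 4 x 4 table.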
With these frequency tables, there are four pieces of information that can be
considered firm for each cell frequency. These are:
1) The number of times the members of the sample chose both members of the pair
of responses being considered (the observed frequency),
2) The number of times this group chose the first member of the pair on the first
administration (row or column sums),
3) The number of times this group of students chose the second member of this pair
on the second administration (column or row sums), and
4) The total number of students in the entire frequency table (N).
In mathematical terms, the values being considered are the cell frequency, its
associated row and column sums, and the table sum.
Unlike typical contingency tables, the concern here is with each cell instead of the
overall departure of the entire matrix from a χ² or some other generalized mathematical
distribution. The problem being considered here is trying to decide how large (or how small)
an observed frequency in any cell must be in order to indicate a meaningful (statistically
significant) joint (or mutually exclusive) event.
Insert Table 1 about here
Suppose, as illustrated in Table 1.a, that we have a joint choice of 11, a first variable
frequency of 15, and a second variable frequency of 19. If the group size is 23, the expected
joint choice frequency is (15 x 19)/23 = 12.39. In this case, 11 is a lesser value than this
expectation but may not be enough below this expected value to conclude that choosing one
option implies the rejection of the other. If the group size is 90, as illustrated in Table 1.b, the
expected frequency is (15 x 19)/90 = 3.17. Are 11 observations enough larger than this
expectation to indicate a systematic joint choice?
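The arithmetic in this example can be checked with a short Python sketch. The function name `expected_frequency` is an illustrative choice, not the authors'; the numbers are those quoted in the text for Tables 1.a and 1.b.

```python
def expected_frequency(row_sum, col_sum, n):
    """Expected joint frequency of a cell under independence."""
    return row_sum * col_sum / n


# Values quoted in the text: margins of 15 and 19, group sizes 23 and 90.
e_small = expected_frequency(15, 19, 23)
e_large = expected_frequency(15, 19, 90)

print(round(e_small, 2), round(e_large, 2))  # prints: 12.39 3.17
```

The same observed frequency of 11 therefore falls below the expectation in the smaller group and well above it in the larger one, which is why a cell-level test is needed.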
From an investigation of the literature, the nearest procedure that seemed to have
credibility was the proposal of Fuchs and Kennett (1980). Using a simulation of this problem
with 10,000 cases on two by two tables, the standard deviation from their proposed procedure
was consistently greater than 1.00. This observation indicates that their proposal has a
conservative bias in which frequencies that should be accepted would be rejected.
In this same simulation, several ways of putting a probability value on one cell with
known marginal sums and any total frequency were employed. Shklov and Powell (1988)
summarized the most important of these attempts, in which they compared, in simulated
conditions, the binomial, multinomial, hypergeometric and uniform distributions. Only the
multinomial distribution gave a statistically significant fit with constrained marginal totals.
Historically, this problem arose from the observation that, on highly discriminating
test questions, particular "wrong" answers tend to cluster at specific narrow ranges of the
total scores. These answers tend to cluster around their modes of selection across an entire
test. This property is well known from item response theory (IRT) but the interpretation of
these answers has been problematic.
To address this interpretation issue, two approaches are possible. First, interpretations
can be assigned to distracters based upon errors in information or logic. Powell (1970) and
Powell and Isbister (1974) attempted this approach with mixed results. These classifications
were less stable, using linear statistical procedures, than would be hoped for if a major
improvement in testing technology were to be achieved.
The second possibility would be to cluster these answers statistically and then attempt
to provide interpretations from written reports of reasoning or from interviews. Powell (1968)
showed that, with adults, answers in the "wrong answer" set could be predicted from
reasoning reports about two thirds of the time. Using interviews with students from the third
through the eighth grades Powell (1977) found a Piaget-like sequence of interpretations of
these answer clusters with a consistency of more than 50 % but less than 65 %. This study
also showed, as would be expected because these subsets clustered around their modes of
selection, a strong age-dependent sequence in their order.
There were unanswered questions remaining from this latter study. First, how much
influence do the thought processes leading to the "wrong" answers have upon the "right"
answers being observed? By predicting the total scores on another test from all of the
subscores on the test showing the Piaget-like sequence, Powell (1976) showed that the
"wrong" answer subscores frequently entered first and explained more of the
variability than the "right" answer subscores (both concrete and abstract).
Second, if there is truly a Piaget-like developmental progression among these
answers, then this learning sequence must be discontinuous and the changes of answers from
one phase to the next should be appropriate, both in direction and by age level.
Giving the test twice across the age range from less than eight to more than nineteen and
using the multinomial procedure (described in detail in this present paper) to establish which
cells were statistically significant, Powell and Shklov (1992) showed that both these
conditions were met. Of additional interest is the observation that although nearly every age
level was highly statistically significant for the repeated selection of the "right" answer, this
test-retest value accounted for only 23 % of all the answer pairs on this test. When the sum
of the frequencies in all the significant cells was tabulated, 75 % of all the answer pairs were
explained as meaningful (P ""7'0.945).
Discussions with other psychometricians suggested that some of the more recent
econometric techniques might serve this same purpose. Keswick (2001) undertook this
attempt. This is what he reported:
I applied a logistic regression, an ANOVA (using the categorizations - two
by two and four by four transition matrices), a Chi Square (again using the
categorization), and since the categories are ordinals I applied a tobit (which is just an
ordered probit analysis). No procedure explained more variance. Of greater
concern, none really showed a direct detectable relationship.
It would seem, therefore, that to detect discontinuities in learning that may influence
the scores we obtain for assessing student performance, some approach other than the typical
linear statistical analysis is required. One possible approach is the application
of the adaptation we have developed of the multinomial procedures. It is the purpose of this
paper to provide the details of this procedure, with examples from different types of
qualitative research.
The Procedure
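Step 1
This procedure begins with an m x n matrix of frequencies $O_{ij}$ as follows:

\[
\begin{array}{cccc|c}
O_{11} & O_{12} & \cdots & O_{1n} & R_1 \\
O_{21} & O_{22} & \cdots & O_{2n} & R_2 \\
\vdots & \vdots &        & \vdots & \vdots \\
O_{m1} & O_{m2} & \cdots & O_{mn} & R_m \\
\hline
C_1    & C_2    & \cdots & C_n    & N
\end{array}
\]

where:
\[ R_i = \sum_{j=1}^{n} O_{ij}, \tag{1} \]
\[ C_j = \sum_{i=1}^{m} O_{ij}, \tag{2} \]
and:
\[ N = \sum_{i=1}^{m} \sum_{j=1}^{n} O_{ij}. \tag{3} \]
Step 2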
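The next step is to collapse this matrix around any cell $O_{ij}$. This step produces a 2 x 2
matrix, with the observed frequency of $O_{ij}$ being $o$, of the sort:

\[
\begin{array}{cc|c}
o    & f    & R'_1 \\
g    & h    & R'_2 \\
\hline
C'_1 & C'_2 & N
\end{array}
\]

where:
\[
\begin{aligned}
f &= R'_1 - o, \\
R'_2 &= N - R'_1, \\
g &= C'_1 - o, \\
C'_2 &= N - C'_1, \\
\text{and } h &= N - (o + f + g).
\end{aligned}
\]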
In words, $f$ is the frequency of the remainder of the row, $g$ is the frequency of the remainder
of the column and $h$ is the residual frequency of the total table. Tables 2 and 3 give a
numerical example of these first two steps in this procedure, beginning with a 4 x 4
frequency matrix, such as:
Insert Table 2 about here
By choosing to look at $O_{22}$ (with a frequency of $o = 33$; the repeated choice of the
correct answer as shown in the box in Table 2) the collapsed matrix becomes:
Insert Table 3 about here
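The collapsing step can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the function name `collapse` and the 4 x 4 table are hypothetical, and the table is not the paper's Table 2, whose entries are not reproduced in this text. Indices are zero-based, so the cell in the second row and second column is `(1, 1)`.

```python
def collapse(table, i, j):
    """Collapse an m x n frequency table around cell (i, j).

    Returns (o, f, g, h): the target cell, the remainder of its row,
    the remainder of its column, and the residual of the whole table.
    """
    n_total = sum(sum(row) for row in table)
    o = table[i][j]
    row_sum = sum(table[i])                 # R'_1
    col_sum = sum(row[j] for row in table)  # C'_1
    f = row_sum - o
    g = col_sum - o
    h = n_total - (o + f + g)
    return o, f, g, h


# Hypothetical 4 x 4 pre-post table (NOT the paper's Table 2).
table = [
    [5, 3, 2, 1],
    [2, 9, 1, 0],
    [1, 4, 6, 2],
    [0, 2, 1, 3],
]

o, f, g, h = collapse(table, 1, 1)
print(o, f, g, h)  # prints: 9 3 9 21
```

The four returned values always sum to the table total, which gives a quick check that the collapse was done correctly.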
In order to determine the probability that $o$ would be 33 by chance alone, it is
necessary to find two cumulative probabilities. First, it is necessary to find the cumulative
probability for the range of all possible values in that cell with the three marginal values kept
constant. Second, there is the need to find the cumulative probability for all single tables that
have a value of 33 or less.
To achieve this end, it is necessary to find the smallest possible value and the largest
possible value that $o$ can achieve within these marginal constraints.
Since these values are all frequencies, and the least possible frequency is zero (0), the
least possible value that $o$ can possess is when either the frequency value of $o$ or of $h$ is zero
(0). Similarly, the greatest possible value that cell $o$ can contain occurs when either the
frequency of $f$ or of $g$ is zero (0).
Mathematically:
\[ o_{\min} = 0 \text{ if } o \le h; \text{ otherwise } o_{\min} = o - h \tag{4} \]
\[ o_{\max} = \text{the lesser of } R'_1 \text{ and } C'_1 \tag{5} \]
In the present case, the minimum value is 0 (zero) and the maximum value is 46.
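Equations (4) and (5) can be checked with a short Python sketch. The function names `cell_bounds` and `from_margins` are illustrative assumptions. Because the entries of Tables 2 and 3 are not reproduced in this text, the sketch uses the margins quoted earlier for Tables 1.a and 1.b (a joint choice of 11 with marginal frequencies of 15 and 19).

```python
def cell_bounds(o, f, g, h):
    """Smallest and largest values the target cell can take
    when the marginal sums of the collapsed 2 x 2 table are fixed."""
    o_min = 0 if o <= h else o - h   # equation (4)
    o_max = min(o + f, o + g)        # equation (5): lesser of R'_1 and C'_1
    return o_min, o_max


def from_margins(o, row_sum, col_sum, n):
    """Recover (o, f, g, h) from a cell, its two margins and the total."""
    f = row_sum - o
    g = col_sum - o
    h = n - (o + f + g)
    return o, f, g, h


bounds_small = cell_bounds(*from_margins(11, 15, 19, 23))  # Table 1.a margins
bounds_large = cell_bounds(*from_margins(11, 15, 19, 90))  # Table 1.b margins

print(bounds_small, bounds_large)  # prints: (11, 15) (0, 15)
```

Equation (4) is equivalent to taking the larger of 0 and R'_1 + C'_1 - N, since R'_1 + C'_1 - N = o - h. With a group size of 23, these margins force at least 11 joint choices, so the observed 11 is already the smallest value the cell can take; with a group size of 90 the cell can range from 0 to 15.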
Step 3
The multinomial equation is applied to the range of all possible values from the
minimum (0) to the maximum (46), adding these partial probabilities to find the total
possible probability ($P_1$) of events within these constrained conditions. The equation
becomes:
(6)
O(min)
[0
