1. Introduction
If we knew the probabilities of all things, good thinking would be apt
calculation, and the proper subject of theories of good reasoning would be
confined to the artful and efficient assembly of judgments in accordance
with the calculus of probabilities. In fact, we know the probabilities of very
few things, and to understand inductive competence we must construct
theories that will explain how preferences among hypotheses may be
established in the absence of such knowledge. More than that, we must
also understand how the procedures for establishing preferences in the
absence of knowledge of probabilities combine with fragmentary knowl-
edge of probability and with procedures for inferring probabilities. And,
still more, we need to understand how knowledge of probabilities is
obtained from other sorts of knowledge, and how preferences among
hypotheses, knowledge of probabilities, and knowledge of other kinds
constrain or determine rational belief and action.
For at least two reasons, the first and second of these questions have
been neglected for many years by almost everyone except groups of
philosophers who have been virtually without influence on practitioners of
"inductive logic." One reason, unfortunately, is that a long tradition of
attempts to construct an account of hypothesis testing, confirmation, or
comparison outside of the calculus of probabilities has failed to achieve a
theory possessing both sufficient logical clarity and structure to merit
serious interest, and sufficient realism to be applied in scientific and
engineering contexts with plausible results. A second reason is the recent
predominance of the personal or subjective interpretation of probability:
probability is degree of rational belief, and rational inference consists in
appropriate changes in the distribution of intensity of belief. Rationality at
any moment requires only that belief intensities be so distributed as to
satisfy the axioms of probability. Thus, even when we reason about things
in the absence of knowledge of objective chances, we still retain degrees of
4 Clark Glymour
Schema II: Let H and T be as above and mutually consistent, and let e* be a
set of values for the variables e_j consistent with H and T. For each x_i in
Var(H) let T_i be a subtheory of T such that

(i) T_i determines x_i as a (possibly partial) function of the e_j, denoted
T_i(e_j);
(ii) the set of values for Var(H) given by x_i* = T_i(e*) satisfies H;
(iii) for each x_i in Var(H), there exists a set of values e′ for the e_j such
that for k ≠ i, T_k(e′) = T_k(e*), and the set of values for Var(H)
given by x′_k = T_k(e′) does not satisfy H;
(iv) for all i, H does not entail that T_i is equivalent to any equation K
such that Var(K) ⊂ Var(T_i).

When the conditions of Schema II are met, the values e* are again said to
provide a positive test of H with respect to T.
Schemas I and II are not equivalent, and Schema II imposes stronger
requirements on positive tests. For example, let H be x_1 = x_2·x_3.
Suppose that the x's are measured directly, so that x_i* = e_i in Schemas I and
II. Let the evidence be that x_1 = 0, x_2 = 0, and x_3 = 1. Then H is
positively tested according to Schema I but not according to Schema II.
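The Schema II verdict on this example can be checked mechanically. The following sketch (Python; the Boolean domain {0, 1} and the function names are illustrative choices, not from the text) tests clause (iii) under direct measurement, where alternative evidence may change only the one directly measured value:

```python
def H(x1, x2, x3):
    # Hypothesis H: x1 = x2 * x3 (Boolean product).
    return x1 == x2 * x3

# Evidence: each x is measured directly, so x_i* = e_i.
evidence = {"x1": 0, "x2": 0, "x3": 1}

def schema_II_clause_iii(evidence):
    """For each variable, clause (iii) demands alternative evidence that
    agrees on all *other* computed values yet makes H fail.  Under direct
    measurement this means varying that one value alone."""
    results = {}
    for var in evidence:
        can_fail = False
        for alt in (0, 1):
            trial = dict(evidence, **{var: alt})
            if not H(**trial):
                can_fail = True
        results[var] = can_fail
    return results

print(schema_II_clause_iii(evidence))
# x3 cannot make H fail on its own: with x2 = 0, x2*x3 = 0 = x1 always,
# so clause (iii) fails and the evidence is no positive test by Schema II.
```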
The difference between these two schemas lies entirely in the third
clause, for which there is a natural further alternative:
Schema III
(iii) If K is any sentence such that &T entails K and Var(K) ⊆ Var(H),
then K is valid.
Schema III is neither stronger nor weaker than Schema II. For the
example just given, x_1 = x_2 = 0 and x_3 = 1 does count as a positive test
according to III. On the other hand, if we have

H: x > 3
T: x = e²

and evidence e = 2, where variables are real valued, then we have a positive
test according to I and II but not according to III, since T entails x ≥ 0. Hence
all three schemas are inequivalent. Further, Schema III satisfies a
consequence condition: if H is positively tested with respect to T, then so is
any consequence G of H, so long as Var(G) ⊆ Var(H).
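The x = e² example can be checked numerically; the integer sample standing in for the real-valued range of e is an illustrative shortcut:

```python
def T(e):
    # Theory T: x = e**2.
    return e ** 2

def H(x):
    # Hypothesis H: x > 3.
    return x > 3

# Evidence e = 2: the computed value x* = 4 satisfies H, and varying e
# can make H fail (e.g. e = 1 gives x = 1), so Schemas I and II are met.
x_star = T(2)
print(x_star, H(x_star), H(T(1)))

# But &T entails the non-valid sentence x >= 0 in Var(H), so Schema III
# is not met: every value T can compute is nonnegative.
print(all(T(e) >= 0 for e in range(-100, 101)))
```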
3. Variant Schemes
3.1 Set Values
Schemas I, II, and III are stated in such a way that the hypothesis tested,
H, may state an equation or an inequality. The T_i in these schemas,
however, must be equations. In situations in which the T_i include
inequations the schemas must be modified so that the variables become, in
effect, set valued. Rynasiewicz and van Fraassen have both remarked that
they see no reason to require that unique values of the computed quantities
be determined from the evidence, and I quite agree; in fact, in the
lamented Chapter 8 of Theory and Evidence, the testing scheme for
equations is applied without the requirement of unique values.
A set valuation for a collection of variables is simply a mapping taking
each variable in the collection to a subset of the algebra A. For any algebraic
function f, the value of f(x) when the value of x is A is {f(a) | a is in A},
written f(A). If x_i = T_i(e_j) is an equation, the value of x_i determined by
that equation for values of e_j in A_j is the value of T_i(e_j) when the value of e_j
is A_j. If x_i ≥ T_i(e_j) is an inequation, the value of x_i determined by that
inequation for values of e_j in A_j is {a in A : b is in T_i(A_j) implies a ≥ b}.
(There is an alternative definition: {a in A | there exists b in T_i(A_j) such that
a ≥ b}.) For a collection of inequations determining x_i, the set-value of x_i
determined by the inequations jointly is the intersection of the set-values
for x_i that they determine individually.

An equation or inequation H(x_i) is satisfied by set values A_i for its
essential variables x_i if there exists for each i a value x_i* in A_i such that the
collection of x_i* satisfies H in the usual sense. The notions of equivalence and
entailment are explained just as before. With these explications, Schemas
I, II, and III make sense if the e* are set values or if the T_i assert
inequalities. (Of course, in the schemas we must then understand a value of
x to be any proper subset of the set of all possible values of x.)
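These definitions translate directly into code. A minimal sketch over finite sets (the carrier set A, the example functions, and the choice of the first of the two inequation readings are all illustrative assumptions):

```python
def image(f, X):
    # Set value of f(x) when the value of x is the set X: {f(a) | a in X}.
    return {f(a) for a in X}

A = set(range(12))          # carrier of the algebra (a toy choice)
A_e = {1, 2, 3}             # set value for the measured variable e

# Equation x = T(e): the set value of x is T(A_e).
x_eq = image(lambda e: e * e, A_e)

# Inequation x >= T(e): {a in A : b in T(A_e) implies a >= b}.
T_vals = image(lambda e: e * e, A_e)
x_ineq = {a for a in A if all(a >= b for b in T_vals)}

# A collection of inequations determines x jointly by intersecting
# the set values they determine individually.
x_joint = x_ineq & {a for a in A if a >= 10}
print(x_eq, x_ineq, x_joint)
# x_eq = {1, 4, 9}; x_ineq = {9, 10, 11}; x_joint = {10, 11}
```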
3.2 Extending "Quantities"
The first condition of each schema requires that for H to be positively
ON TESTING AND EVIDENCE 9
4. Questions of Separability
I view the bootstrap testing of a theory as a matter of testing one equation
in a theory against others, with an eye in theory construction towards
establishing a set of equations each of which is positively tested with
respect to the entire system. Clearly, in many systems of equations there
are epistemological entanglements among hypotheses, and that fact consti-
tutes one of the grains of truth in holism. Are there natural ways to describe
or measure such entanglements?
Let E be a set of variables. I shall assume that all possible evidence
consists in all possible assignments of values to variables in E. Explicit
reference to E is sometimes suppressed in what follows, but it should be
understood that all notions are relative to E. We consider some definite
The input signals are the x_i; the output signal is z. Cusps signify an "or" or
"+" gate, and dees signify an "and" or "·" gate. Each line of the circuit is
an immediate function either of inputs or of lines to which it is connected by
a gate. Thus expressing each line, and also the output z, as a function of
those lines that lie to the left of it and are connected to it by a gate, or as a
function of the input variables where no gate intervenes, results in a system
and this equation is called the representative of (1) for the computations
given. Values of the input variables and of the output variable will
constitute a test of hypothesis (1) if and only if condition (3) holds,
since that condition is necessary and sufficient for the computation of e from
the data to determine a value for e, and since, also, (2) is not a Boolean
identity. Thus, assuming that condition (3) is met by the data, the
representative (2) of the hypothesis becomes:

(4) z = 0.

Any of the following inputs will therefore test hypothesis (1):

(x_1, x_2, x_3, x_4)
(1, 0, 0, 0)
(0, 0, 0, 0)
(1, 0, 0, 1)
(0, 0, 0, 1)
and the output z must in each case be 0 if the hypothesis is to pass the test, 1
if it is to fail the test.
Alternative hypotheses usually considered by computer scientists are
that the e gate is stuck at 1 (s.a.1.) or that it is stuck at 0 (s.a.0.). The same
computations apply to either of these hypotheses as to hypothesis (1). For
the stuck-at-1 fault the hypothesis to be tested is e = 1, and the
representative of the hypothesis for the computations, again assuming the
condition (3) for the determinacy of e, is:

(6) z = 1.

As with the first case, we obtain tests of the hypothesis that e is s.a.1. if and
only if the data meet condition (3), and the value of z required for the
hypothesis to pass the test is in each case the complement of the value
required for hypothesis (1) to pass the tests. Thus each of the inputs is a test
of e = 1 if and only if it is a test of (1), and each such test discriminates
between the two hypotheses, since the predicted outputs are always
different, i.e., the representatives are incompatible.

For the hypothesis that e is s.a.0., everything is as before, but when
condition (3) for the determinacy of e is met, the representative of the
hypothesis becomes

(4) z = 0.

Thus no bootstrap test will discriminate between a s.a.0. fault and the
normal condition given by hypothesis (1).
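The comparison of stuck-at hypotheses by exhaustive search over inputs can be sketched as follows. Since the circuit of the original figure is not reproduced here, the two-gate circuit below (line e = x1·x2 feeding an "or" gate with x3) is a hypothetical stand-in; note that in this toy circuit, unlike the one just discussed, the s.a.0. fault is detectable:

```python
from itertools import product

def z(x1, x2, x3, e_override=None):
    # Hypothetical circuit (an assumption for illustration):
    # line e is the AND of x1 and x2; the output z is the OR of e and x3.
    e = (x1 & x2) if e_override is None else e_override
    return e | x3

def discriminating_inputs(stuck_value):
    """Inputs on which the normal circuit and the circuit with line e
    stuck at stuck_value give different outputs."""
    return [xs for xs in product((0, 1), repeat=3)
            if z(*xs) != z(*xs, e_override=stuck_value)]

print("s.a.0.:", discriminating_inputs(0))  # [(1, 1, 0)]
print("s.a.1.:", discriminating_inputs(1))  # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```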
A standard technique in the computer science literature for detecting
faults in logic circuits derives from the observation that, for example, the
s.a.0. fault can be discriminated from the normal condition of a line
provided there is a combination of inputs that, in the normal condition,
gives the line the value 1, and, for these inputs, the output variable z would
be different from the value determined by the normal circuit if the line
value were 0 instead of 1. This "Boolean derivative" method, applied to the
case discussed above in which the normal hypothesis is (1) and the two
alternative hypotheses are that e is s.a.0. and that e is s.a.1., yields the
following necessary and sufficient condition for a set of input values (a_1, a_2,
a_3, a_4) to be a test that discriminates the normal condition from the s.a.0.
fault:
In equations (7) and (8), dz/de is the Boolean derivative, i.e., the Boolean
function whose value is 1 if and only if a change in the value of e changes the
value of z when x_2, x_3, x_4 are kept constant.
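On a finite circuit the Boolean derivative can be computed by brute force: it is the XOR of the output under the two values of the line, the other inputs held fixed. A sketch, using a hypothetical circuit (z = e OR x3, with e = x1·x2) in place of the one in the original figure:

```python
from itertools import product

def z_of(e, x3):
    # z expressed as a function of the line e and the inputs whose lines
    # do not pass through e; here, a hypothetical circuit z = e OR x3.
    return e | x3

def dz_de(x3):
    # Boolean derivative dz/de: 1 iff changing e changes z, x3 held fixed.
    return z_of(0, x3) ^ z_of(1, x3)

def e_line(x1, x2):
    # The line under test: e = x1 AND x2.
    return x1 & x2

# The s.a.0. fault is testable iff e * dz/de is not identically 0.
testable = any(e_line(x1, x2) & dz_de(x3)
               for x1, x2, x3 in product((0, 1), repeat=3))
print(testable)  # True for this toy circuit
```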
For the case given, the Boolean derivative dz/de is easily computed, and
since e = x_1·x_2, it follows that e·(dz/de) = 0 identically; therefore, in
computer parlance, the s.a.0. fault is not testable, i.e., not discriminable
from the normal circuit element. This is the same result obtained from
bootstrap testing. For the s.a.1. fault, the Boolean derivative method yields
as a condition necessary and sufficient for discriminating the fault:
7. Bayesian Bootstrapping
In scientific contexts, Bayesian probability not only rests on a different
conception of probability from the one embraced by the more "orthodox"
tradition in statistics, it also has different applications. Despite my distaste
for Bayesian confirmation theory as a general theory of rational belief and
rational change of belief—a theory that is, in Glenn Shafer's words, suited
for "never-never land"—I think Bayesian and related techniques are
practical and appealing in various scientific and engineering circum-
stances. In this and the next two sections, we focus on the first of the
conditions in the three schemas for bootstrap testing, and its interaction
with probability.
How can we associate a probability measure, closed under Boolean
operations, with a set of equations or inequations in specified variables?
Straightforwardly, by considering for n variables an n-dimensional space
S^n and associating with every set K of equations or inequations the solution
set of K in S^n. The probability of K is then the value of a probability
measure on S^n for the solution set of K.
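For measures given by densities, the probability of a solution set can be estimated by sampling. A minimal sketch, assuming the uniform measure on the unit square and an illustrative sentence K:

```python
import random

def prob_of_K(satisfies, n_vars, sampler, trials=200_000):
    """Monte Carlo estimate of the measure of K's solution set in S^n:
    the fraction of sampled points of the space that satisfy K."""
    hits = sum(satisfies(*[sampler() for _ in range(n_vars)])
               for _ in range(trials))
    return hits / trials

random.seed(0)
# K: the inequation x1 >= x2, over [0,1]^2 with the uniform measure.
# Its solution set is a half-square, so the true value is 1/2.
print(round(prob_of_K(lambda x1, x2: x1 >= x2, 2, random.random), 2))
```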
Consider a set of variables x_i essential to a hypothesis H, and a (prior)
probability distribution Prob over the space of the x_i. Let Prob(H) be the
prior probability of H, i.e., of the solution set of H. Suppose values x_i* are
computed for the variables x_i from the measured variables e_j using some
theory T. Let T_i be the fragment of T used to compute x_i, and let (&T) be
the set of T_i for all of the x_i occurring in H. We can associate T and &T with
subspaces in the space of the variables x_i and e_j. Prob and the computed
values for the variables x_i could be taken to determine

(1) Prob(H / x_i = x_i*) = Prob(H / e_j = e_j* & (&T))

where the e_j* are the measured values of the e_j.

Where we are dealing with point values x_i* and where the T_i are
equations, (1) will typically require that we conditionalize on a set with
probability measure zero, and so we would need either approximation
techniques or an extended, nonstandard, notion of conditionalization.

Finally, one might take the values e_j* for the e_j to confirm H with respect to T
if

(2) Prob(H / x_i = x_i*) = Prob(H / e_j = e_j* & (&T)) > Prob(H).
This is essentially the proposal made by Aaron Edidin. For Schemas I and
II, if we take this procedure strictly, then hypotheses involving the x_i that
are not testable because they are entailed by &T will always get posterior
probability one. On the other hand, if we avoid this difficulty, as Edidin
suggests, by conditionalizing only when H is not entailed by &T, then
Prob(H / (&T) & e_j = e_j*) will not necessarily be a probability distribution.
8. Jeffrey Conditionalization
Sometimes we do have established probabilities for a collection of
hypotheses regarding a set of variables, and we regard a consistent body of
such probabilized hypotheses as forming a theory that we wish to test or to
For hypotheses about the x's, how can the probabilistic conclusions, when
independent, be combined with the prior probabilities of hypotheses to
generate a new, posterior probability? Let the E_i be a set of mutually exclusive,
jointly exhaustive propositions, Prob a prior probability distribution, and
Pos(E_i) a new probability for each E_i. Richard Jeffrey's rule for obtaining a
posterior distribution from this information is:

(3) Pos(H) = Σ_i Pos(E_i) · Prob(H/E_i).
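Rule (3) is a one-line computation; the cell probabilities and conditionals below are illustrative numbers only:

```python
def jeffrey_update(pos_E, prob_H_given_E):
    # Jeffrey's rule (3): Pos(H) = sum_i Pos(E_i) * Prob(H/E_i),
    # where the E_i are mutually exclusive and jointly exhaustive.
    assert abs(sum(pos_E) - 1.0) < 1e-9
    return sum(p * c for p, c in zip(pos_E, prob_H_given_E))

# Two cells: observation shifts Pos(E1) to 0.8 and Pos(E2) to 0.2,
# with conditionals Prob(H/E1) = 0.9 and Prob(H/E2) = 0.1.
print(jeffrey_update([0.8, 0.2], [0.9, 0.1]))  # ≈ 0.74
```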
Applied to the present context, Jeffrey's rule suggests that we have for any
hypothesis H(x_i)

(4) Pos(H) = Prob(&T)·Prob(H/x*) + (1 − Prob(&T))·Prob(H/x̄*)

where x* = T_i(e*), x̄* is the set complementary to x*, and,
e.g., Prob(H/x*) is short for Prob(H / x_i = x_i* for all x_i in Var(H)).
Equation (4) is not a strict application of Jeffrey's rule (3), however, for (3)
uses the posterior probability of the evidence, Pos(E_i), whereas in (4) the
prior probability of the computed values, Prob(&T), is used. Carnap, and
more recently H. Field and others, have noted that Jeffrey's Pos(E_i)
depends not only on the impact of whatever new evidence there is for E_i,
but also on the prior probability of E_i. In contrast, the Prob(&T) of (4) need
not depend on the prior distribution of the x_i.
that this same evidence e*, and same theory &T, gives x_i = x_i* a degree of
support equal to one-third. But that conclusion is required by the construal
of degree of support as a probability, and is the source of one of the
difficulties with rule (4) of the previous section.
What is required is a way of combining a "degree of support" for a claim
with a prior probability distribution that has no effect on the degree of
support that the evidence provides. Jeffrey's rule gives us no way to
construct the combination effectively. It gives us at best a condition (3),
which we might impose on the result of the combination, whatever it is. A
theory proposed by A. Dempster and considerably elaborated in recent
years by G. Shafer¹ has most of the desired properties.
To explain how the rudiments of Shafer's theory arise, consider a space Ω of
hypotheses having only two points, h_1 and h_2. Suppose there is a prior
probability distribution Bel_1 over the space Ω. Given that e_j = e_j* and that
no other theory of known probability (save extensions of &T) determines h_1
or h_2 as functions of the e_j, we take the degree of support, Bel_2(h_1), that
the evidence provides for h_1 to be

Bel_2(h_1) = Prob(&T)

if the x_i* satisfy h_1. Under these same assumptions, what is the degree of
support that the evidence provides for h_2? None at all. The evidence
provides no reason to believe h_2, but it does provide some reason to
believe h_1. We set Bel_2(h_2) = 0. It seems a natural and harmless
convention to suppose that any piece of evidence gives complete support to
any sentence that is certain. Hence Bel_2(Ω) = 1.

The function Bel_2, which represents the degrees of support that e_j*
provides with respect to &T, does not satisfy the usual axioms of probabil-
ity. Shafer calls such functions belief functions. They include as special
cases functions such as Bel_1, which satisfy the usual axioms of probability.
In general, given Ω, a belief function Bel is a function 2^Ω → [0, 1] such that
Bel(∅) = 0, Bel(Ω) = 1, and for any positive integer n and every collection
A_1, ..., A_n of subsets of Ω,

Bel(A_1 ∪ ... ∪ A_n) ≥ Σ (−1)^(|I|+1) Bel(∩_{i in I} A_i),

where the sum is over all nonempty I ⊆ {1, ..., n}.
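On a finite frame these axioms can be checked by Möbius inversion: Bel is a belief function exactly when the induced masses m(A) = Σ_{B ⊆ A} (−1)^{|A−B|} Bel(B) are nonnegative, vanish on ∅, and sum to one. A sketch applying this to a Bel_2 of the sort just described, with Prob(&T) = 1/3 taken as an illustrative value:

```python
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def mass_from_belief(bel, omega):
    """Moebius inversion: m(A) = sum over B subseteq A of
    (-1)^|A - B| * Bel(B).  Bel is a belief function iff every m(A)
    is nonnegative, m(empty) = 0, and the m(A) sum to 1."""
    return {A: sum((-1) ** len(A - B) * bel[B] for B in subsets(A))
            for A in subsets(omega)}

# Bel2 on the two-point frame: support 1/3 for h1, none for h2.
omega = frozenset({"h1", "h2"})
bel2 = {frozenset(): 0.0, frozenset({"h1"}): 1/3,
        frozenset({"h2"}): 0.0, omega: 1.0}
m = mass_from_belief(bel2, omega)
print(m[frozenset({"h1"})], m[omega])  # 1/3 of the mass on {h1}, 2/3 on omega
```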
Our problem is this: How should Bel_1 and Bel_2 be used to determine a
new belief function in the light of the evidence, e_j*? Shafer's answer is
given most simply in terms of functions that are determined by the belief
functions. Given Bel: 2^Ω → [0, 1], define
where
(i) The mutually incompatible theories T_1, ..., T_n for which there
are established probabilities do not jointly determine Prob(A/E)
but do determine a lower bound for Prob(A/E), where A is a prop-
er subset of the space of interest and E is the "direct" evidence.
(ii) Separate pieces of evidence E_2, E_3, ..., E_n are statistically
independent.
(iii) None of the theories T_1, ..., T_n individually entails that the
probability of any proper subset of Ω is unity.

Condition (iii) is, of course, a version of the conditions characteristic of
bootstrap testing.
Likewise, from each of the input variables x_1, ..., x_n whose values must be
fixed at (a_1, ..., a_n) in order for e = f(z) to hold, there is a path leading from
the variable to some node of G. Call the graph of all such paths S.
Assuming that the probabilities of failure of the circuit elements are
independent, we can estimate the probability that e = f(z) for inputs a_1,
..., a_n from the graphs S and G.

We have a prior probability that e = x_1·x_2, given by one minus the
probability of failure of the gate leading to the e line. For a measured value
z* of z and inputs (a_1, ..., a_n) we also have a probability, the one computed
in the preceding paragraph, that e = e* = f(z*). Assuming x_1 and x_2 known
from measurement, for appropriate values of the measured variables we
thus have a degree of support for x_1·x_2 = e, although we may have none
for x_1·x_2 ≠ e. If the measured value of z is not the value predicted by the
normal circuit for the measured inputs (including x_1 and x_2), then the same
procedure may support x_1·x_2 ≠ e. In the more general case, the values of
the lines leading to the e gate would not be known directly from
measurement but would also have to be computed, and would for given
inputs have probabilities less than one (determined, obviously, by the
probabilities of the failure of gates in the graph of paths leading from input
gates to the e gate).
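Under the independence assumption the estimate is a product over the gates on the relevant paths; the gate count and failure rates below are illustrative numbers, not from the text:

```python
def prob_computation_holds(failure_probs):
    """Probability that every gate on the paths S and G used by the
    computation works, assuming independent gate failures; this is the
    estimated probability that e = f(z) for the given inputs."""
    p = 1.0
    for q in failure_probs:
        p *= 1.0 - q
    return p

# Three gates on the relevant paths, each failing with probability 0.01:
print(round(prob_computation_holds([0.01, 0.01, 0.01]), 6))  # 0.970299
```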
The basic idea behind the Boolean derivative method is the natural and
correct one that a fault is detectable by an input if, were the circuit normal,
the value of some output variable would be different for that input than it
would be if the circuit contained the fault in question (but no other faults).
The following equivalence holds: an input (a_1, ..., a_n) discriminates
between the hypothesis that a circuit line e is "normal" (i.e., as in the
circuit description) and stuck at 1 (0) according to the Boolean derivative
method if and only if (i) there is a bootstrap test (in the sense of Schema I or
III) with a data set including (a_1, ..., a_n) that tests, with respect to the system
of equations determined by the circuit, an equation E, entailed by the
system, giving e as a function of input values or of values of lines into the
gate of which e is an output line; and (ii) there is a bootstrap test with the
same data set that tests e = 1 (0) with respect to the system of equations
determined by the circuit obtained from the original circuit by deleting all
lines antecedent to (that is, on the input side of) e; and (iii) the data set
confirms E with respect to the equations of the original circuit if and only if
it disconfirms e = 1 (0) with respect to the equations of the modified
circuit.
The equivalence follows from a few observations that may serve as the
sketch of a proof. First, if the Boolean derivative condition is met, then
some output variable z_i can be expressed as a function of e and of input
variables connected to z_i by lines not passing through e. If E_i is the
equation expressing this relationship, then E_i will hold in both the original
and the modified (i.e., stuck) circuits. Moreover, the Boolean derivative

dz_i(e, x_1, ..., x_n)/de

equals 1 exactly for those values of the input variables
which determine e as a function of z_i. It follows that every Boolean
derivative discrimination is a bootstrap discrimination. Conversely, if
conditions for bootstrap testing are met, there must be an equation of the
required sort. But each of the z_i is a function of the x_1, ..., x_n and e, and
thus, with x_j = a_j, of e alone:
where
where e′ = 1 − e, and formula (7) has value 1 if and only if its two component
subformulae have distinct values. But

g(z_1(e), ..., z_k(e)) ≠ g(z_1(e′), ..., z_k(e′))

only if for some i, z_i(e) ≠ z_i(e′), and hence only if for some i
Notes
1. G. Shafer, A Mathematical Theory of Evidence (Princeton: Princeton University Press,
1976).
2. See, for example, A. Thayse and M. Davio, "Boolean Calculus and Its Applications to
Switching Theory," IEEE Transactions on Computers C-22 (1973), and S. Lee, Modern
Switching Theory and Digital Design (Englewood Cliffs, N.J.: Prentice-Hall, 1978).
3. Research for this essay was supported by the National Science Foundation, Grant No.
SES 80 2556. I thank Kevin Kelly and Alison Kost for their kind help with illustrations.