
Explaining Image Classifiers

Hana Chockler1, Joseph Y. Halpern2

1 King's College, London, U.K.
2 Cornell University, Ithaca, New York, USA
hana.chockler@kcl.ac.uk, halpern@cs.cornell.edu

Abstract

We focus on explaining image classifiers, taking the work of Mothilal et al. [2021] (MMTS) as our point of departure. We observe that, although MMTS claim to be using the definition of explanation proposed by Halpern [2016], they do not quite do so. Roughly speaking, Halpern's definition has a necessity clause and a sufficiency clause. MMTS replace the necessity clause by a requirement that, as we show, implies it. Halpern's definition also allows agents to restrict the set of options considered. While these differences may seem minor, as we show, they can have a nontrivial impact on explanations. We also show that, essentially without change, Halpern's definition can handle two issues that have proved difficult for other approaches: explanations of absence (when, for example, an image classifier for tumors outputs "no tumor") and explanations of rare events (such as tumors).

1 Introduction

Black-box AI systems and, in particular, Deep Neural Networks (DNNs), are now a primary building block of many computer vision systems. DNNs are complex non-linear functions with algorithmically generated coefficients. In contrast to traditional image-processing pipelines, it is difficult to retrace how the pixel data are interpreted by the layers of a DNN. This "black box" nature of DNNs creates a demand for explanations. A good explanation should answer the question "Why did the neural network classify the input the way it did?" By doing so, it can increase a user's confidence in the result. Explanations are also useful for determining whether the system is working well; if the explanation does not make sense, it may indicate that there is a problem with the system.

Unfortunately, it is not clear how to define what an explanation is, let alone what a good explanation is. There have been a number of definitions of explanation given by researchers in various fields, particularly computer science [Chajewska and Halpern, 1997; Halpern, 2016; Halpern and Pearl, 2005b; Pearl, 1988] and philosophy [Gärdenfors, 1988; Hempel, 1965; Salmon, 1970; Salmon, 1989; Woodward, 2014], and a number of attempts to provide explanations for the output of DNNs [Ribeiro et al., 2016; Selvaraju et al., 2017; Lundberg and Lee, 2017; Sun et al., 2020; Chockler et al., 2021] ([Molnar, 2002] provides an overview). Here we focus on one particular definition of explanation, that was given by Halpern [2016], which is in turn based on a definition due to Halpern and Pearl [2005b]. Mothilal et al. [2021] (MMTS from now on) already showed that this definition could be usefully applied to better understand and evaluate what MMTS called attribution-based explanation approaches, such as LIME [Ribeiro et al., 2016] and SHAP [Lundberg and Lee, 2017], which provide a score or ranking over features, conveying the (relative) importance of each feature to the model's output, and contrast them with what they called counterfactual-based approaches, such as DICE [Mothilal et al., 2020] and that of Wachter et al. [2017], which generate examples that yield a different model output with minimum changes in the input features.

In this paper, we take MMTS as our point of departure and focus on explaining image classifiers. We first observe that, although MMTS claim to be using Halpern's definition, they do not quite do so. Roughly speaking, Halpern's definition (which we discuss in detail in Section 2) has a necessity clause and a sufficiency clause. MMTS replace the necessity clause by a requirement that, as we show, implies it. Halpern's definition also allows agents to restrict the set of options considered. While these differences may seem minor, as we show, they can have a nontrivial impact on explanations. We also show that, essentially without change, Halpern's definition can handle two issues that have proved difficult for other approaches: explanations of absence (when, for example, a classifier for tumors outputs "no tumor") and explanations of rare events (again, a classifier for tumors can be viewed as an example; a tumor is a relatively rare event).

The upshot of this discussion is that, while the analysis of MMTS shows that a simplification of Halpern's definition can go a long way to helping us understand notions of explanation used in the literature, we can go much further by using Halpern's actual definition, while still retaining the benefits of the MMTS analysis.

The rest of the paper is organized as follows: In Section 2, we review the relevant definitions of causal models and explanations; in Section 3, we discuss explanations of image classifiers and show their relation to explanations in actual causality; and in Section 4, we discuss explanations of absence and of rare events.
2 Causal Models and Relevant Definitions

In this section, we review the definition of causal models introduced by Halpern and Pearl [2005a] and relevant definitions of causes and explanations given by Halpern [2016]. The material in this section is largely taken from [Halpern, 2016].

We assume that the world is described in terms of variables and their values. Some variables may have a causal influence on others. This influence is modeled by a set of structural equations. It is conceptually useful to split the variables into two sets: the exogenous variables, whose values are determined by factors outside the model, and the endogenous variables, whose values are ultimately determined by the exogenous variables. The structural equations describe how these values are determined.

Formally, a causal model M is a pair (S, F), where S is a signature, which explicitly lists the endogenous and exogenous variables and characterizes their possible values, and F defines a set of (modifiable) structural equations, relating the values of the variables. A signature S is a tuple (U, V, R), where U is a set of exogenous variables, V is a set of endogenous variables, and R associates with every variable Y ∈ U ∪ V a nonempty set R(Y) of possible values for Y (i.e., the set of values over which Y ranges). For simplicity, we assume here that V is finite, as is R(Y) for every endogenous variable Y ∈ V. F associates with each endogenous variable X ∈ V a function denoted F_X (i.e., F_X = F(X)) such that F_X : (×_{U∈U} R(U)) × (×_{Y∈V−{X}} R(Y)) → R(X). This mathematical notation just makes precise the fact that F_X determines the value of X, given the values of all the other variables in U ∪ V. If there is one exogenous variable U and three endogenous variables, X, Y, and Z, then F_X defines the values of X in terms of the values of Y, Z, and U. For example, we might have F_X(u, y, z) = u + y, which is usually written as X = U + Y. Thus, if Y = 3 and U = 2, then X = 5, regardless of how Z is set.[1]

[1] The fact that X is assigned U + Y (i.e., the value of X is the sum of the values of U and Y) does not imply that Y is assigned X − U; that is, F_Y(U, X, Z) = X − U does not necessarily hold.

The structural equations define what happens in the presence of external interventions. Setting the value of some variable X to x in a causal model M = (S, F) results in a new causal model, denoted M_{X←x}, which is identical to M, except that the equation for X in F is replaced by X = x.

We can also consider probabilistic causal models; these are pairs (M, Pr), where M is a causal model and Pr is a probability on the contexts in M.

The dependencies between variables in a causal model M = ((U, V, R), F) can be described using a causal network (or causal graph), whose nodes are labeled by the endogenous and exogenous variables in M, with one node for each variable in U ∪ V. The roots of the graph are (labeled by) the exogenous variables. There is a directed edge from variable X to Y if Y depends on X; this is the case if there is some setting of all the variables in U ∪ V other than X and Y such that varying the value of X in that setting results in a variation in the value of Y; that is, there is a setting ~z of the variables other than X and Y and values x and x′ of X such that F_Y(x, ~z) ≠ F_Y(x′, ~z).

A causal model M is recursive (or acyclic) if its causal graph is acyclic. It should be clear that if M is an acyclic causal model, then given a context, that is, a setting ~u for the exogenous variables in U, the values of all the other variables are determined (i.e., there is a unique solution to all the equations). In this paper, following the literature, we restrict to recursive models.

We call a pair (M, ~u) consisting of a causal model M and a context ~u a (causal) setting. A causal formula ψ is true or false in a setting. We write (M, ~u) |= ψ if the causal formula ψ is true in the setting (M, ~u). The |= relation is defined inductively. (M, ~u) |= X = x if the variable X has value x in the unique (since we are dealing with acyclic models) solution to the equations in M in context ~u (i.e., the unique vector of values for the endogenous variables that simultaneously satisfies all equations in M with the variables in U set to ~u). Finally, (M, ~u) |= [~Y ← ~y]ϕ if (M_{~Y←~y}, ~u) |= ϕ, where M_{~Y←~y} is the causal model that is identical to M, except that the equations for the variables in ~Y in F are replaced by Y = y for each Y ∈ ~Y and its corresponding value y ∈ ~y.
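To make the semantics above concrete, here is a minimal Python sketch (our own illustration, not code from Halpern [2016] or MMTS) of an acyclic causal model with one exogenous variable U and three endogenous variables X, Y, and Z, where X = U + Y. The helper names solve and intervene-via-do are ours; the sketch just computes the unique solution in a context and the solution of the intervened model M_{Y←3}.

```python
# A minimal sketch (ours) of an acyclic causal model: each endogenous variable
# is mapped to a structural equation (a function of the other values); the
# context supplies the exogenous values; `do` fixes intervened variables.

def solve(equations, context, do=None):
    """Unique solution of the equations given the context (acyclic models)."""
    values = dict(context)
    values.update(do or {})              # intervened variables are just fixed
    pending = [v for v in equations if v not in values]
    while pending:
        progress = False
        for var in list(pending):
            try:
                values[var] = equations[var](values)
                pending.remove(var)
                progress = True
            except KeyError:             # some parent not yet computed
                pass
        if not progress:
            raise ValueError("equations are cyclic or refer to missing variables")
    return values

equations = {
    "Y": lambda v: v["U"],               # Y simply copies U in this toy model
    "X": lambda v: v["U"] + v["Y"],      # X = U + Y
    "Z": lambda v: 0,
}

print(solve(equations, {"U": 2}))                  # Y = 2, so X = 4
print(solve(equations, {"U": 2}, do={"Y": 3}))     # in M_{Y<-3}: X = 2 + 3 = 5
```

Note that, as the footnote above stresses, intervening on Y does not change U: the intervention only replaces Y's equation.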
A standard use of causal models is to define actual causation: that is, what it means for some particular event that occurred to cause another particular event. There have been a number of definitions of actual causation given for acyclic models (e.g., [Beckers, 2021; Glymour and Wimberly, 2007; Hall, 2007; Halpern and Pearl, 2005a; Halpern, 2016; Hitchcock, 2001; Hitchcock, 2007; Weslake, 2015; Woodward, 2003]). In this paper, we focus on what has become known as the modified Halpern-Pearl definition and some related definitions introduced by Halpern [2016]. We briefly review the relevant definitions below (see [Halpern, 2016] for more intuition and motivation).

The events that can be causes are arbitrary conjunctions of primitive events (formulas of the form X = x); the events that can be caused are arbitrary Boolean combinations of primitive events, that is, arbitrary formulas ϕ.

Definition 1. [Actual cause] ~X = ~x is an actual cause of ϕ in (M, ~u) if the following three conditions hold:

AC1. (M, ~u) |= (~X = ~x) and (M, ~u) |= ϕ.

AC2. There is a setting ~x′ of the variables in ~X, a (possibly empty) set ~W of variables in V − ~X, and a setting ~w of the variables in ~W such that (M, ~u) |= ~W = ~w and, moreover, (M, ~u) |= [~X ← ~x′, ~W ← ~w]¬ϕ.

AC3. ~X is minimal; there is no strict subset ~X′ of ~X such that ~X′ = ~x′′ can replace ~X = ~x in AC2, where ~x′′ is the restriction of ~x to the variables in ~X′.

AC1 just says that ~X = ~x cannot be considered a cause of ϕ unless both ~X = ~x and ϕ actually hold. AC3 is a minimality condition, which says that a cause has no irrelevant conjuncts. AC2 extends the standard but-for condition (~X = ~x is a cause of ϕ if, had ~X been ~x′, ϕ would have been false) by allowing us to apply it while keeping some variables fixed to the value that they had in the actual setting (M, ~u). In the special case that ~W = ∅, we get the but-for definition.
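For small models with binary variables, AC1-AC3 can be checked by brute force. The sketch below (ours, not an implementation used by MMTS) enumerates the alternative settings ~x′ and the witness sets ~W, holding ~W at its actual values as AC2 requires, and illustrates the definition on a simple disjunctive model in which neither conjunct alone is a cause but the conjunction is.

```python
from itertools import product, combinations

# A brute-force sketch (ours) of Definition 1 for binary endogenous variables.
# `model` maps each endogenous variable to its equation; `context` gives the
# exogenous values; interventions are passed via `do`.

def solve(model, context, do=None):
    vals = dict(context)
    vals.update(do or {})
    pending = [v for v in model if v not in vals]
    while pending:
        for v in list(pending):
            try:
                vals[v] = model[v](vals)
                pending.remove(v)
            except KeyError:
                pass
    return vals

def is_actual_cause(model, context, cand, phi, domain=(0, 1)):
    """Check AC1-AC3 for the conjunction `cand`, a dict {variable: value}."""
    actual = solve(model, context)
    # AC1: cand and phi hold in the actual setting.
    if any(actual[v] != x for v, x in cand.items()) or not phi(actual):
        return False
    # AC2: some alternative setting of cand, together with some set W of other
    # endogenous variables held at their *actual* values, falsifies phi.
    others = [v for v in model if v not in cand]
    ac2 = any(
        not phi(solve(model, context,
                      do={**dict(zip(cand, alt)), **{w: actual[w] for w in W}}))
        for alt in product(domain, repeat=len(cand))
        for r in range(len(others) + 1)
        for W in combinations(others, r))
    if not ac2:
        return False
    # AC3: no strict subset of cand already satisfies AC1 and AC2.
    return not any(
        is_actual_cause(model, context, {v: cand[v] for v in sub}, phi, domain)
        for r in range(1, len(cand))
        for sub in combinations(cand, r))

# A disjunctive model: OUT = A or B, with both A and B set to 1 by the context.
model = {"A": lambda v: v["UA"], "B": lambda v: v["UB"],
         "OUT": lambda v: int(v["A"] or v["B"])}
context = {"UA": 1, "UB": 1}
phi = lambda v: v["OUT"] == 1
print(is_actual_cause(model, context, {"A": 1}, phi))            # False
print(is_actual_cause(model, context, {"A": 1, "B": 1}, phi))    # True
```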
To define explanation, we need the notion of sufficient cause in addition to that of actual cause.

Definition 2. [Sufficient cause] ~X = ~x is a sufficient cause of ϕ in (M, ~u) if the following four conditions hold:

SC1. (M, ~u) |= (~X = ~x) and (M, ~u) |= ϕ.

SC2. Some conjunct of ~X = ~x is part of an actual cause of ϕ in (M, ~u). More precisely, there exists a conjunct X = x of ~X = ~x and another (possibly empty) conjunction ~Y = ~y such that X = x ∧ ~Y = ~y is an actual cause of ϕ in (M, ~u).

SC3. (M, ~u′) |= [~X = ~x]ϕ for all contexts ~u′ ∈ R(U).

SC4. ~X is minimal; there is no strict subset ~X′ of ~X such that ~X′ = ~x′ satisfies conditions SC1, SC2, and SC3, where ~x′ is the restriction of ~x to the variables in ~X′.

Note that this definition of sufficient cause (which is taken from [Halpern, 2016]) is quite different from that in [Halpern and Pearl, 2005a]. Like the necessity clause used by MMTS, the definition of [Halpern and Pearl, 2005a] requires only that some subset of ~X = ~x be an actual cause of ϕ, without allowing the subset to be extended by another conjunction ~Y = ~y (and uses a different definition of actual cause—that of [Halpern and Pearl, 2005b]), but (somewhat in the spirit of SC3) requires that this necessity condition hold in all contexts. It has no exact analogue of SC3 at all. An example might help clarify the definition.
Suppose that we have a dry forest and three arsonists. There are three contexts: in u1, it takes just one dropped match to burn the forest down, but arsonist 1 and arsonist 2 drop matches; in u2, it takes just one dropped match to burn the forest down, and all three arsonists drop a match; finally, in u3, it takes two dropped matches to burn the forest down, and arsonists 1 and 3 drop matches. To model this, we have binary variables ML1, ML2, and ML3, denoting which arsonist drops a match, and FB, denoting whether the forest burns down. (Note that we use the structural equation for FB to capture these differences; for example, if xi is the value of MLi, then F_FB(x1, x2, x3, u1) = 1 iff at least one of x1, x2, and x3 is 1, and F_FB(x1, x2, x3, u3) = 1 iff at least two of x1, x2, and x3 are 1.) We claim that ML1 = 1 ∧ ML2 = 1 is a sufficient cause of FB = 1 in (M, u1) and (M, u2), but not (M, u3). To see that it is not a sufficient cause in (M, u3), note that SC1 does not hold: arsonist 2 does not drop a match. It is also easy to see that (M, ui) |= [ML1 = 1 ∧ ML2 = 1](FB = 1) for i = 1, 2, 3, so SC3 holds. Now in (M, u1), ML1 = 1 ∧ ML2 = 1 is an actual cause of FB = 1 (since the values of both ML1 and ML2 have to be changed in order to change the value of FB). Similarly, in (M, u2), ML1 = 1 ∧ ML2 = 1 ∧ ML3 = 1 is an actual cause of FB = 1 (so we get SC2 by taking ~Y = ML3). Thus, SC2 holds in both (M, u1) and (M, u2).
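The claims in this example are easy to check mechanically. The following small sketch (ours) encodes the three contexts and the context-dependent equation for FB, and verifies SC1 and the interventions used for SC3.

```python
# A small sketch (ours) of the arsonist example.  Each context fixes which
# arsonists drop matches and how many matches are needed to burn the forest.

contexts = {            # (ML1, ML2, ML3), matches needed for FB = 1
    "u1": ((1, 1, 0), 1),
    "u2": ((1, 1, 1), 1),
    "u3": ((1, 0, 1), 2),
}

def FB(ml, threshold):
    return int(sum(ml) >= threshold)

for name, (ml, k) in contexts.items():
    sc1 = ml[0] == 1 and ml[1] == 1          # SC1: ML1 = 1 and ML2 = 1 hold?
    forced = (1, 1, ml[2])                   # intervention ML1 <- 1, ML2 <- 1
    sc3 = FB(forced, k) == 1                 # [ML1 <- 1, ML2 <- 1](FB = 1)?
    print(name, "SC1:", sc1, " [ML1<-1, ML2<-1](FB=1):", sc3)
# Output: SC1 holds in u1 and u2 but fails in u3 (arsonist 2 does not drop a
# match), while the intervention makes FB = 1 in all three contexts.
```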
While SC3 is typically taken to be the core of the sufficiency requirement, to show causality, we also need SC2, since it requires that ¬ϕ hold for some setting of the variables. We might hope that if there were a setting where ϕ was false, SC3 would imply SC2. As the following example shows, this is not the case in general.

Example 1. Consider a causal model M with three binary variables A, B, and C, and the structural equations A = B and C = A ∨ (¬A ∧ B). Let ~u be a context in which all variables are set to 1, and let ϕ be the formula C = 1. In context ~u, A = 1 satisfies SC1, SC3, and SC4 for ϕ. There is also some setting of the variables for which C = 0: (M, ~u) |= [A ← 0, B ← 0](C = 0). However, A = 1 does not satisfy SC2. Indeed, A = 1 by itself is not an actual cause of C = 1, as B = 1 holds as well, nor is A = 1 part of an actual cause, as B = 1 is already an actual cause of C = 1.

On the other hand, if there are no dependencies between the variables and some other assumptions made by MMTS in their analysis of image classifiers hold, then, roughly speaking, SC1, SC3, and SC4 do imply SC2. To make this precise, we need two definitions.

Definition 3. The variables in a set ~X of endogenous variables are causally independent in a causal model M if, for all contexts ~u, all strict subsets ~Y of ~X, and all Z ∈ ~X − ~Y, (M, ~u) |= Z = z iff (M, ~u) |= [~Y ← ~y](Z = z).

Intuitively, the variables in ~X are causally independent if setting the values of some subset of the variables in ~X has no impact on the values of the other variables in ~X.

Definition 4. ~X is determined by the context if for each setting ~x of ~X, there is a context ~u_~x such that (M, ~u_~x) |= ~X = ~x.

Theorem 1. Given a set ~X′ of endogenous variables in a causal model M such that (a) the variables in ~X′ are causally independent, (b) ~X′ is determined by the context, (c) ~X′ includes all the parents of the variables in ϕ, (d) there is some setting ~x′ of the variables in ~X′ that makes ϕ false in context ~u (i.e., (M, ~u) |= [~X′ ← ~x′]¬ϕ), and (e) ~X ⊆ ~X′, then ~X = ~x is a sufficient cause of ϕ in (M, ~u) iff it satisfies SC1, SC3, and SC4 (i.e., SC2 follows).

Proof. By assumption, there is some setting ~x′ of the variables in ~X′ that makes ϕ false in context ~u (i.e., (M, ~u) |= [~X′ ← ~x′]¬ϕ). Choose ~x′ to be such a setting that differs minimally from the values that the variables get in ~u; that is, the set of variables Y ∈ ~X′ such that (M, ~u) |= Y = y and the value of Y in ~x′ is different from y is minimal. Let ~Y be the set of variables in this minimal set. Let ~u_~x′ be a context such that (M, ~u_~x′) |= ~X′ = ~x′; by assumption, such a context exists. Since ~X′ includes all the parents of the variables in ϕ, we must have (M, ~u_~x′) |= [~X′ ← ~x′]¬ϕ. Since (M, ~u_~x′) |= ~X′ = ~x′, it follows that (M, ~u_~x′) |= ¬ϕ.

~Y must contain a variable in ~X. For suppose not. By SC1, (M, ~u) |= ~X = ~x, so if ~Y does not contain a variable in ~X, the values that the variables in ~X get in the setting ~X′ = ~x′ are the same as their values in (M, ~u). Thus, (M, ~u_~x′) |= ~X = ~x. By SC3, (M, ~u_~x′) |= [~X = ~x]ϕ, from which it would follow that (M, ~u_~x′) |= ϕ, a contradiction.

Let ~y′ be the restriction of ~x′ to the variables in ~Y, and let ~y be the values of the variables in ~Y in (M, ~u), that is, (M, ~u) |= ~Y = ~y. We claim that ~Y = ~y is a cause of ϕ in (M, ~u). By assumption, (M, ~u) |= ~Y = ~y; by SC1, (M, ~u) |= ϕ. Thus, AC1 holds. By construction, (M, ~u) |= [~Y ← ~y′]¬ϕ, so AC2 holds (with ~W = ∅). Finally, suppose that ~W is such that (M, ~u) |= ~W = ~w and for some strict subset ~Y′ of ~Y, we have that (M, ~u) |= [~Y′ ← ~y′′, ~W ← ~w]¬ϕ, where ~y′′ is the restriction of ~y′ to the variables in ~Y′. We can assume that ~W is a subset of ~X′, since ~X′ includes all the parents of the variables in ϕ. By causal independence, (M, ~u) |= [~Y′ ← ~y′′](~W = ~w). Thus, (M, ~u) |= [~Y′ ← ~y′′]¬ϕ. But this contradicts the minimality of ~Y. It follows that AC3 holds.

We have shown that ~Y = ~y is a cause of ϕ in (M, ~u). Since ~Y includes a variable in ~X, it follows that some conjunct of ~X = ~x is part of a cause of ϕ in (M, ~u), so SC2 holds, as desired.

The notion of explanation builds on the notion of sufficient causality, and is relative to a set of contexts.

Definition 5. [Explanation] ~X = ~x is an explanation of ϕ relative to a set K of contexts in a causal model M if the following conditions hold:

EX1. ~X = ~x is a sufficient cause of ϕ in all contexts in K satisfying (~X = ~x) ∧ ϕ. More precisely,
• If ~u ∈ K and (M, ~u) |= (~X = ~x) ∧ ϕ, then there exists a conjunct X = x of ~X = ~x and a (possibly empty) conjunction ~Y = ~y such that X = x ∧ ~Y = ~y is an actual cause of ϕ in (M, ~u). (This is SC2 applied to all contexts ~u ∈ K where (~X = ~x) ∧ ϕ holds.)
• (M, ~u′) |= [~X = ~x]ϕ for all contexts ~u′ ∈ K. (This is SC3 restricted to the contexts in K.)

EX2. ~X is minimal; there is no strict subset ~X′ of ~X such that ~X′ = ~x′ satisfies EX1, where ~x′ is the restriction of ~x to the variables in ~X′. (This is SC4.)

EX3. (M, ~u) |= ~X = ~x ∧ ϕ for some ~u ∈ K.

Note that this definition of explanation (which is taken from [Halpern, 2016]) is quite different from that in [Halpern and Pearl, 2005a]. What is called EX2 in [Halpern and Pearl, 2005a] is actually an analogue of the first part of EX1 here; although it is called "sufficient causality", it is closer to the necessity condition. But, like the necessity clause used by MMTS, it requires only that some subset of ~X = ~x be an actual cause of ϕ (without allowing the subset to be extended by another conjunction ~Y = ~y), and uses a different definition of actual cause—that of [Halpern and Pearl, 2005b]. The definition of [Halpern and Pearl, 2005a] has no analogue to the second part of EX1 here.

Of course, if the assumptions of Theorem 1 hold, then we can drop the requirement that the first part of EX1 holds. (Note that if the assumptions of Theorem 1 hold for some context, they hold for all contexts; in particular, assumption (d) holds, since ~X′ includes all the parents of the variables in ϕ.) Although the changes made by MMTS to Halpern's definition seem minor, they are enough to prevent Theorem 1 from holding. We show this in Example 2 in the next section, after we have discussed the assumptions made by MMTS in more detail.

The requirement that the first part of condition EX1 as given here holds in all contexts in K that satisfy ~X = ~x ∧ ϕ and that the second part holds in all contexts in K is quite strong, and often does not hold in practice. We are often willing to accept ~X = ~x as an explanation if these requirements hold with high probability. Given a set K of contexts in a causal model M, let K_ψ consist of all contexts ~u in K such that (M, ~u) |= ψ, and let K(~X = ~x, ϕ, SC2) consist of all contexts ~u ∈ K that satisfy ~X = ~x ∧ ϕ and the first condition in EX1 (i.e., the analogue of SC2).

Definition 6. [Partial Explanation] ~X = ~x is a partial explanation of ϕ with goodness (α, β) relative to K in a probabilistic causal model (M, Pr) if

EX1′. α ≤ Pr(K(~X = ~x, ϕ, SC2) | K_{~X=~x∧ϕ}) and β ≤ Pr(K_{[~X=~x]ϕ}).

EX2′. ~X is minimal; there is no strict subset ~X′ of ~X such that α ≤ Pr(K(~X′ = ~x′, ϕ, SC2) | K_{~X′=~x′∧ϕ}) and β ≤ Pr(K_{[~X′=~x′]ϕ}), where ~x′ is the restriction of ~x to the variables in ~X′.

EX3′. (M, ~u) |= ~X = ~x ∧ ϕ for some ~u ∈ K.

Theorem 1 has no obvious counterpart for partial explanations. The problem is that the conjuncts in EX1′ can be satisfied by different contexts in the set K, while still satisfying the probabilistic constraint. (The theorem would hold with α = β if both conjuncts were satisfied by the same subset of K.)

3 Using Causality to Explain Image Classification

Following MMTS, we view an image classifier (a neural network) as a probabilistic causal model. MMTS make a number of additional assumptions in their analysis. Specifically, MMTS take the endogenous variables to be the set ~V of pixels that the image classifier gets as input, together with an output variable that we call O. The variable Vi ∈ ~V describes the color and intensity of pixel i; its value is determined by the exogenous variables. The equation for O determines the output of the neural network as a function of the pixel values.
Thus, the causal network has depth 2, with the exogenous variables determining the feature variables, and the feature variables determining the output variable. There are no dependencies between the feature variables. Moreover, for each setting ~v of the feature variables, there is a setting of the exogenous variables such that ~V = ~v. That is, the variables in ~V are causally independent and determined by the context, in the sense of Theorem 1. Moreover, all the parents of the output variable O are contained in ~V. So for any explanation of the output involving the pixels, conditions (a), (b), (c), and (e) of Theorem 1 hold if we take ϕ to be some setting of O. While a neural network often outputs several labels, MMTS assume that the output is unique (and a deterministic function of the pixel values). Given these assumptions, the probability on contexts directly corresponds to the probability on seeing various images (which the neural network presumably learns during training).
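As a concrete (and much simplified) illustration of this setup, the sketch below (ours, with a toy stand-in for the neural network) treats the context as the input image, lets each endogenous pixel variable copy the corresponding component of the context, and computes O from the pixel values. Conditions (a) and (b) of Theorem 1 then hold by construction.

```python
import random

# A sketch (ours) of the depth-2 causal model: the context is the input image,
# each endogenous pixel variable V_i copies the corresponding component of the
# context, and O is a deterministic function of the pixel values.

def classifier(pixels):                 # toy stand-in for the neural network
    return int(sum(pixels) > len(pixels) // 2)

def solve(context, do=None):
    do = do or {}
    pixels = [do.get(i, context[i]) for i in range(len(context))]
    return pixels, do.get("O", classifier(pixels))

random.seed(0)
image = [random.randint(0, 1) for _ in range(8)]    # a context ~u
pixels, O = solve(image)
print(O)

# Intervening on one pixel never changes another pixel (causal independence),
# and every pixel setting is realized by some context (the image equal to it).
pixels2, _ = solve(image, do={3: 1 - image[3]})
print(pixels2[:3] == pixels[:3] and pixels2[4:] == pixels[4:])   # True
```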
Although they claimed to be using Halpern's definition, the definition of (partial) explanation given in MMTS differs from that of Halpern in three respects. The first is that, rather than requiring that a conjunct of the explanation can be extended to an actual cause (i.e., the first part of EX1), they require that in each context, some subset of the explanation be an actual cause. Second, they do not use Halpern's definition of actual cause; in particular, in their analogue of AC2, they do not require that ~W = ~w be true in the context being considered. This appears to be an oversight on their part, and is mitigated by the fact that they focus on but-for causality (for which ~W = ∅, as we observed, so that the requirement that ~W = ~w in the context being considered has no bite). Finally, they take K = R(U). Since we have identified contexts and images, this amounts to considering all images possible. As we shall see in Section 4, there are some benefits in allowing the greater generality of having K be an arbitrary subset of R(U). We now show that the conditions considered by MMTS suffice to give Halpern's notion.

Theorem 2. For the causal model (M, Pr) corresponding to the image classifier, ~X = ~x is a partial explanation of O = o with goodness (α, β), where α, β > 0, if the following conditions hold:
• β ≤ Pr(K_{[~X=~x](O=o)});
• there is no strict subset ~X′ of ~X such that β ≤ Pr(K_{[~X′=~x′](O=o)}), where ~x′ is the restriction of ~x to the variables in ~X′;
• α ≤ Pr({~u : ∃~x′ ((M, ~u) |= [~X = ~x′](O ≠ o))} | K_{~X=~x∧O=o}).

Proof. The first condition in the theorem clearly guarantees that the second part of EX1′ holds; the second condition guarantees that EX2′ holds. The fact that β > 0 means that for some ~u, we must have (M, ~u) |= [~X ← ~x](O = o). Let ~u′ be a context that gives the pixels in ~X the value ~x and agrees with ~u on the pixels in ~Y. Clearly (M, ~u′) |= ~X = ~x ∧ O = o, so EX3 holds.

It remains to show that the first part of EX1′ holds. Suppose that ~u is a context for which there exists a setting ~x′ of the pixels in ~X such that (M, ~u) |= [~X = ~x′](O ≠ o). (Since α > 0, there must exist such a context.) Let ~X′ be a minimal subset of ~X for which there exists a setting ~x′′ such that (M, ~u) |= [~X′ = ~x′′](O ≠ o). It is easy to see that ~X′ = ~x∗ is a cause of O = o, where ~x∗ is the restriction of ~x to ~X′. Thus, the first part of EX1′ holds for ~u. The desired result follows.

How reasonable are the assumptions made by MMTS? The following example shows that their assumption that, in each context, some subset of the explanation be an actual cause (as opposed to some conjunct of the explanation being extendable to an actual cause, as required by EX1) leads to arguably unreasonable explanations.

Example 2. Consider the following voting scenario. There are three voters, A, B, and C, who can vote for the candidate or abstain; just one vote is needed for the candidate to win the election. The voters make their decisions independently. By Definition 5, A = 1 (the fact that A voted for the candidate) is an explanation of the outcome (as well as B = 1 and C = 1, separately). By assumption, A = 1 is sufficient for the candidate to win in all contexts. Now consider any context ~u where the candidate wins and A = 1. It is easy to see that the conjunction of voters for the candidate in ~u is a cause of the candidate winning. So, for example, if A and C voted for the candidate in context ~u, but B did not, then A = 1 ∧ C = 1 is a (but-for) cause of the candidate winning; if both votes change, the candidate loses.

On the other hand, A = 1 is not an explanation of the candidate winning according to the MMTS definition. To see why, note that in the context ~u above, no subset of A = 1 is an actual cause of the candidate winning. Rather, according to the MMTS definition, A = 1 ∧ B = 1 ∧ C = 1 is the only explanation (since there are contexts—namely, ones where all three voters voted for the candidate—where A = 1 ∧ B = 1 ∧ C = 1 is the only actual cause). This does not match our intuition. (Of course, we can easily convert this to a story about image classification, where the output is 1 if any of the pixels A, B, and C fire.)

As we observed, the assumptions of MMTS imply that conditions (a), (b), (c), and (e) of Theorem 1 hold, if we take ~X′ to be the pixels, ~X to be some subset of pixels, and ϕ to be some setting of the output variable O. They also hold in this example, taking ~X′ to be the set of voters, ~X to be any subset of voters, and ϕ to be the outcome of the election. Moreover, condition (d) holds. So in this example, we do not need to check the first part of EX1 to show that A = 1 is an explanation of the candidate winning, given that the second part of EX1 and EX2 hold. But this is not the case for the MMTS definition. Although A = 1 is a sufficient cause of the candidate winning and is certainly minimal, as we observed, it is not an explanation of the outcome according to the MMTS definition. This is because MMTS require a subset of the conjuncts in the explanation to be an actual cause, which is not the case for SC2.
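The voting scenario is small enough to check exhaustively. The sketch below (ours) confirms that [A ← 1] makes the candidate win in every context, that in each winning context with A = 1 the conjunction of all voters who voted for the candidate is a but-for cause (so A = 1 can be extended to an actual cause, as the first part of EX1 requires), and that A = 1 by itself is a cause only when B and C abstain, which is why the MMTS requirement fails.

```python
from itertools import product

# A brute-force sketch (ours) of the voting scenario: the candidate wins
# (O = 1) iff at least one of A, B, C votes for them.

def O(votes):
    return int(any(votes.values()))

contexts = [dict(zip("ABC", bits)) for bits in product((0, 1), repeat=3)]

# Second part of EX1 (SC3 restricted to K): [A <- 1](O = 1) in every context.
print(all(O({**u, "A": 1}) == 1 for u in contexts))                  # True

for u in contexts:
    if u["A"] == 1 and O(u) == 1:
        voters_for = [v for v in "ABC" if u[v] == 1]
        # First part of EX1: the conjunction of all voters who voted for the
        # candidate is a but-for cause (setting them all to 0 flips O), so
        # A = 1 is part of an actual cause in this context.
        conj_is_cause = O({**u, **{v: 0 for v in voters_for}}) == 0
        # MMTS instead require A = 1 *by itself* to be an actual cause.  It is
        # a but-for cause only when B and C abstain; holding B or C at their
        # actual value 1 cannot flip O either in this monotone model.
        a_alone = O({**u, "A": 0}) == 0
        print(u, "part of a cause:", conj_is_cause, "cause by itself:", a_alone)
```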
For another, perhaps more realistic, example of this phenomenon in image classification, as observed by Shitole et al. [2021] and Chockler et al. [2023], images usually have more than one explanation. Assume, for example, that the input image I is an image of a cat, labeled as a cat by the image classifier. An explainability tool might find several explanations for this label, such as the cat's ears and nose, the cat's tail and hind paws, or the cat's front paws and fur. All those are perfectly acceptable as explanations of why this image was labeled a cat, but only their conjunction is an explanation according to MMTS. In contrast, most of the existing explainability tools for image classifiers output one explanation (i.e., a single conjunct). Thus, these explanations match Definition 5 (rather than that of MMTS) for the set of contexts that includes the original image and all partial coverings of this image.

MMTS's restriction to but-for causality is reasonable if we assume that there are no causal connections between the settings of the various pixels. However, there are many examples in the literature showing that but-for causality does not suffice if we have a richer causal structure. Consider the following well-known example due to Hall [2004]:

Example 3. Suzy and Billy both pick up rocks and throw them at a bottle. Suzy's rock gets there first, shattering the bottle. Since both throws are perfectly accurate, Billy's rock would have shattered the bottle had it not been preempted by Suzy's throw. The standard causal model for this story (see [Halpern and Pearl, 2005a]) has endogenous binary variables ST (Suzy throws), BT (Billy throws), SH (Suzy's rock hits the bottle), BH (Billy's rock hits the bottle), and BS (the bottle shatters). The values of ST and BT are determined by the exogenous variable(s). The remaining equations are SH = ST (Suzy's rock hits the bottle if Suzy throws), BH = BT ∧ ¬SH (Billy's rock hits the bottle if Billy throws and Suzy's rock does not hit the bottle—this is how we capture the fact that Suzy's rock hits the bottle first), and BS = SH ∨ BH (the bottle shatters if it is hit by either rock).
Suppose that K consists of four contexts, corresponding to the four possible combinations of Suzy throwing/not throwing and Billy throwing/not throwing, and Pr is such that all contexts have positive probability. In this model, Suzy's throw (i.e., ST = 1) is an explanation for the bottle shattering (with goodness (1,1)). Clearly the second part of EX1 holds; if Suzy throws, the bottle shatters, independent of what Billy does. The first part of EX1 holds because ST = 1 is a cause of the bottle shattering; if we hold BH fixed at 0 (its actual value), then switching ST from 1 to 0 results in the bottle not shattering. But ST = 1 is not a but-for cause of BS = 1. Switching ST to 0 still results in BS = 1, because if Suzy doesn't throw, Billy's rock will hit the bottle.
We can easily convert this story to a story more appropriate for image classification, where we have an isomorphic model. Suppose that we have two coupled pixels. ST and BT correspond to whether we turn on the power to the pixels. However, if we turn the first one (which corresponds to SH) on, the second (which corresponds to BH) is turned off, even if the power is on. We classify the image as "active" if either pixel is on. Since this model is isomorphic to the Suzy-Billy story, turning on the first pixel is an explanation for the classification, but again, the second condition of Theorem 2 does not hold. Given a context (an input image) in which the first pixel is on, the explainability tools for image classifiers would output the first pixel being on as an explanation for the classification. This is because they do not take the dependencies between pixels into account. These systems would not call the second pixel part of an explanation, as it is off in the input image.

The following example shows that even without assuming causal structure between the pixels, the second condition in Theorem 2 may not hold.

Example 4. Suppose that pixels have values in {0, 1}. Let an image be a (2n + 1)-tuple of pixels. Suppose that an image is labeled 0 if the first pixel is a 0 and the number of 0s in the remaining 2n pixels is even (possibly 0), or the first pixel is a 1 and the number of 0s in the remaining pixels is even and positive. Suppose that the probability distribution is such that the set of images where there is an even number of 0s in the final 2n pixels has probability .9. Moreover, suppose that X1 = 0 with probability 1/2, where X1 denotes the first pixel. Now the question is whether X1 = 0 is a partial explanation of O = 0 with goodness (α, .9) for some α > 1/2^{2n−1}. Clearly the probability that O = 0 conditional on X1 = 0 is .9, while conditional on X1 = 1 and unconditionally, the probability that O = 0 is less than .9. Thus, the second part of EX1′ and EX2′ both hold. Now consider the first part of EX1′. Suppose that (M, ~u) |= X1 = 0 ∧ O = 0, so in ~u, an even number of the last 2n pixels are 0. Suppose in fact that a positive number of the last 2n pixels are 0. Then we claim that there is no ~Y and ~y such that X1 = 0 ∧ ~Y = ~y is a cause of O = 0 in (M, ~u). Clearly, X1 = 0 is not a cause (since setting it to 1 does not affect the labeling). Moreover, if ~Y is nonempty, let Y ∈ ~Y and let y be the value of Y in ~y. Then it is easy to see that Y = y is a cause of O = 0 (flipping the value of Y results in changing the value of O to 1), so X1 = 0 ∧ ~Y = ~y is not a cause of O = 0 (AC3 is violated). Thus, X1 = 0 is a cause of O = 0 only in the context ~u where X1 = 0 and all the last 2n pixels are 1. Using Pascal's triangle, it is easy to show that this context has probability 1/2^{2n−1} conditional on X1 = 0 ∧ O = 0. Note that this is also the only context where changing the value of X1 affects the value of O.
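The counting claim at the end of Example 4 can be checked for small n. The sketch below (ours) assumes that, conditional on the last 2n pixels containing an even number of 0s, all such configurations are equally likely (which is how we read the Pascal's-triangle remark), and confirms that exactly one of the 2^(2n-1) configurations (the all-ones one) is such that the value of X1 matters.

```python
from itertools import product

# A sketch (ours) checking the counting claim in Example 4 for small n.

def label(x1, tail):
    zeros = tail.count(0)
    if x1 == 0:
        return 0 if zeros % 2 == 0 else 1
    return 0 if (zeros % 2 == 0 and zeros > 0) else 1

for n in (1, 2, 3):
    tails = list(product((0, 1), repeat=2 * n))
    even = [t for t in tails if t.count(0) % 2 == 0]     # O = 0 when X1 = 0
    # Contexts (with X1 = 0 and O = 0) in which changing X1 changes the label:
    pivotal = [t for t in even if label(0, t) != label(1, t)]
    print(n, len(even), len(pivotal), 1 / 2 ** (2 * n - 1))
# For each n there are 2^(2n-1) even configurations and exactly one pivotal
# one (all pixels 1), so its conditional probability is 1/2^(2n-1) under the
# equal-likelihood assumption above.
```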
Example 4 is admittedly somewhat contrived; it does not seem that there are that many interesting examples of problems that arise if there is really no causal structure among the pixels. But, as Example 3 suggests, there may well be some causal connection between pixels in an image. Unfortunately, none of the current approaches to explanation seems to deal with this causal structure. We propose this as an exciting area for future research; good explanations will need to take this causal structure into account.

4 Beyond Basic Explanations

So far we have considered classifiers that output only positive labels, that is, labels that describe the image. However, there are classifiers that output negative answers. Consider, for example, the use of AI in healthcare, where image classifiers are used as a part of the diagnostic procedure for MRI images, X-rays, and mammograms [Amisha et al., 2019; Payrovnaziri et al., 2020]. In the use of image classifiers in brain tumor detection based on an MRI, the classifier outputs either "tumor" or "no tumor". While the discussion above shows what it would take for ~X = ~x to be an explanation of "tumor", what would count for ~X = ~x to be an explanation of "no tumor"? We claim that actually the same ideas apply to explanations of "no tumor" as to "tumor". For the second part of EX1, ~X = ~x would have to be such that, with high probability, setting ~X to ~x would result in an output of "no tumor". For the first part of EX1, we need to find a minimal subset of pixels that includes a pixel in ~X such that changing the values of these pixels would result in a label of "tumor".

There is a subtlety here though. A tumor is a rare event. Overall, the probability that someone develops a brain tumor in their lifetime is less than 1%, and the probability that a random person on the street has a brain tumor at this moment is much lower than that. Suppose that the classifier derived its probability using MRI images from a typical population. Given an image I that is (correctly) labeled "no tumor", let X be a single pixel whose value in I is x such that X is part of an explanation ~X = ~x of the label "tumor" in a different image I′, and in I′, X = x′ ≠ x. Then X = x is an explanation of "no tumor" with goodness (α, β) for quite high α and β: in most images where X = x, the output is "no tumor" (because the output is "no tumor" with overwhelming probability), and if we change X to x′, as well as the values of the other pixels in ~X, we typically get an output of "tumor". But X = x does not seem like a good explanation of why we believe there is no tumor!

To deal with this problem (which arises whenever we are classifying a rare event), we (a) assume that the probability is derived from training on MRI images of patients whom doctors suspect of having a tumor; thus, the probability of "tumor" would be significantly higher than it is in a typical sample of images; and (b) we would expect an explanation of "no tumor" to have goodness (α, β) for α and β very close to 1. With these requirements, an explanation ~X = ~x of "no tumor" would include pixels from all the most likely tumor sites, and these pixels would have values that would allow us to preclude there being a tumor at those sites (with high probability). This is an instance of a situation where K would not consist of all contexts.
The bottom line here is that we do not have to change the definitions to accommodate explanations of absence and rare events, although we have to modify the probability distribution that we consider. That said, finding an explanation for "no tumor" seems more complicated than finding an explanation for "tumor". The standard approaches that have been used do not seem to work. In fact, none of the existing explainability tools for image classifiers is able to output explanations of absence. The reason for this is that, due to the intractability of the exact computation of explanations, all existing black-box explainability tools construct some sort of ranking of the pixels of an input image, which is then used as a (partially) ordered list from which an approximate explanation is constructed greedily. Unfortunately, for explaining absence, there is no obvious ranking of the pixels of the image: since a brain tumor can appear in any part of the brain, all pixels are equally important for the negative classification.

In general, people find it difficult to explain absences. Nevertheless, they can and do do it. For example, an expert radiologist might explain their decision of "no tumor" to another expert by pointing out the most suspicious region(s) of the brain and explaining why they did not think there was a tumor there, by indicating why the suspicious features did in fact not indicate a tumor. But this is exactly what Definition 6 provides, for appropriate values of the probabilistic bounds (α, β).

Indeed, note that we can get an explanation to focus on the most suspicious regions by making β sufficiently small. Since explanations must be minimal, the explanation will then return a smallest set of regions that has total probability (at least) β.[2] Alternatively, we can just restrict K to the most suspicious regions, by considering only contexts where nonsuspicious regions have all their pixels set to white, or some other neutral color (this is yet another advantage of considering K rather than all of R(U)). In addition, to explain why there is no tumor in a particular region, the expert would likely focus on certain pixels and say "there's no way that those pixels can form part of a tumor"; that's exactly what the "sufficiency" part of the explanation does. The expert might also point out pixels that would have to be different in order for there to be a tumor; that's exactly what the "necessity" part of the explanation does. Of course, this still must be done for a number of regions, where the number is controlled by β or the choice of K. Thus, it seems to us that the definition really is doing a reasonable job of providing an explanation in the spirit of what an expert would provide.

[2] There may be many choices for this; we can add code to get the choice that involves the smallest set of pixels. These will be the ones of highest probability.

We can get simpler (although perhaps not as natural) explanations for the absence of tumors by taking advantage of domain knowledge. For example, if we know the minimal size of tumors, explaining their absence can be done by providing a "net" of pixels that cannot be a part of a brain tumor, in which the distance between the neighboring pixels is smaller than the size of a minimal tumor.
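As a rough illustration of this idea (ours; the minimal tumor diameter is an assumed parameter, and we ignore the finer geometry of non-square regions), such a net can be generated by sampling the image on a grid whose spacing is smaller than the minimal tumor size.

```python
# A sketch (ours) of the domain-knowledge "net" described above: grid points
# spaced more closely than the assumed minimal tumor diameter (in pixels).

def pixel_net(width, height, min_tumor_diameter):
    spacing = max(1, min_tumor_diameter - 1)     # spacing smaller than the diameter
    return [(x, y)
            for x in range(0, width, spacing)
            for y in range(0, height, spacing)]

net = pixel_net(512, 512, min_tumor_diameter=10)
print(len(net))     # the pixels whose values would have to be checked
# If none of these pixels can be part of a tumor, then (given the assumed
# minimal size) no tumor fits between them, which explains "no tumor".
```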
We conclude by repeating the point that we made in the introduction (now with perhaps more evidence): while the analysis of MMTS shows that a simplification of Halpern's definition can go a long way to helping us understand notions of explanation used in the literature, we can go much further by using Halpern's actual definition, while still retaining the benefits of the MMTS analysis. However, there is still more to be done. Moreover, dealing with the full definition may involve added computational difficulties. We believe that using domain knowledge may well make things more tractable, although this too will need to be checked. For example, we showed above how domain knowledge could be used to provide simpler explanations of the absence of tumors. For another example, if we are explaining the absence of cats in an image of a seascape, knowledge of zoology allows us to eliminate the sea part of the image as a possible area where cats can be located, as cats are not marine animals. This seems doable in practice.

References

[Amisha et al., 2019] P. M. Amisha, M. Pathania, and V. K. Rathaur. Overview of artificial intelligence in medicine. Journal of Family Medicine and Primary Care, 8(7):2328–2331, 2019.
[Beckers, 2021] S. Beckers. Causal sufficiency and actual
causation. Journal of Philosophical Logic, 50:1341–1374,
2021.
[Chajewska and Halpern, 1997] U. Chajewska and J. Y.
Halpern. Defining explanation in probabilistic systems. In
Proc. Thirteenth Conference on Uncertainty in Artificial
Intelligence (UAI ’97), pages 62–71, 1997.
[Chockler et al., 2021] H. Chockler, D. Kroening, and
Y. Sun. Explanations for occluded images. In
Proc. IEEE/CVF International Conference on Computer
Vision (ICCV 2021), pages 1214–1223, 2021.
[Chockler et al., 2023] H. Chockler, D. A. Kelly,
and D. Kroening. Multiple different expla-
nations for image classifiers. Available at
https://arxiv.org/pdf/2309.14309.pdf, 2023.
[Gärdenfors, 1988] P. Gärdenfors. Knowledge in Flux. MIT
Press, Cambridge, Mass., 1988.
[Glymour and Wimberly, 2007] C. Glymour and F. Wim-
berly. Actual causes and thought experiments. In J. Camp-
bell, M. O’Rourke, and H. Silverstein, editors, Causation
and Explanation, pages 43–67. MIT Press, Cambridge,
MA, 2007.
[Hall, 2004] N. Hall. Two concepts of causation. In
J. Collins, N. Hall, and L. A. Paul, editors, Causation and
Counterfactuals. MIT Press, Cambridge, MA, 2004.
[Hall, 2007] N. Hall. Structural equations and causation.
Philosophical Studies, 132:109–136, 2007.
[Halpern and Pearl, 2005a] J. Y. Halpern and J. Pearl.
Causes and explanations: a structural-model approach.
Part I: causes. British Journal for Philosophy of Science,
56(4):843–887, 2005.
[Halpern and Pearl, 2005b] J. Y. Halpern and J. Pearl.
Causes and explanations: a structural-model approach.
Part II: explanations. British Journal for Philosophy of
Science, 56(4):889–911, 2005.
[Halpern, 2016] J. Y. Halpern. Actual Causality. MIT Press,
Cambridge, MA, 2016.
[Hempel, 1965] C. G. Hempel. Aspects of Scientific Expla-
nation. Free Press, New York, 1965.
[Hitchcock, 2001] C. Hitchcock. The intransitivity of causa-
tion revealed in equations and graphs. Journal of Philoso-
phy, XCVIII(6):273–299, 2001.
[Hitchcock, 2007] C. Hitchcock. Prevention, preemption,
and the principle of sufficient reason. Philosophical Re-
view, 116:495–532, 2007.
[Lundberg and Lee, 2017] S. M. Lundberg and S. Lee. A
unified approach to interpreting model predictions. In
Advances in Neural Information Processing Systems
(NeurIPS), volume 30, pages 4765–4774, 2017.
[Molnar, 2002] C. Molnar. Interpretable Machine Learning. 2nd edition, 2002. Available at https://christophm.github.io/interpretable-ml-book.
[Mothilal et al., 2020] R. K. Mothilal, C. Tan, and A. Sharma. Explaining machine learning classifiers through diverse counterfactual explanations. In Proc. of the 2020 Conference on Fairness, Accountability, and Transparency (FAccT 2020), pages 607–616, 2020.
[Mothilal et al., 2021] R. K. Mothilal, D. Mahajan, C. Tan,
and A. Sharma. Towards unifying feature attribution and
counterfactual explanations: different means to the same
end. In Proc. of the 2021 AAAI/ACM Conference on AI,
Ethics, and Society (AIES ’21), pages 652–663, 2021.
[Payrovnaziri et al., 2020] S. N. Payrovnaziri, Z. Chen,
P. Rengifo-Moreno, T. Miller, J. Bian, J. H. Chen, X. Liu,
and Z. He. Explainable artificial intelligence models us-
ing real-world electronic health record data: a systematic
scoping review. Journal of the American Medical Infor-
matics Association, 27(7):1173–1185, 2020.
[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent
Systems. Morgan Kaufmann, San Francisco, 1988.
[Ribeiro et al., 2016] M. T. Ribeiro, S. Singh, and
C. Guestrin. “Why should I trust you?” Explaining
the predictions of any classifier. In Knowledge Discovery
and Data Mining (KDD), pages 1135–1144. ACM, 2016.
[Salmon, 1970] W. C. Salmon. Statistical explanation. In
R. Kolodny, editor, The Nature and Function of Scientific
Theories, pages 173–231. University of Pittsburgh Press,
Pittsburgh, PA, 1970.
[Salmon, 1989] W. C. Salmon. Four Decades of Scientific
Explanation. University of Minnesota Press, Minneapolis,
1989.
[Selvaraju et al., 2017] R. R. Selvaraju, M. Cogswell,
A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-
CAM: Visual explanations from deep networks via
gradient-based localization. In Proc. IEEE/CVF Interna-
tional Conference on Computer Vision (ICCV 2017), pages
618–626, 2017.
[Shitole et al., 2021] V. Shitole, F. Li, M. Kahng, P. Tade-
palli, and A. Fern. One explanation is not enough:
structured attention graphs for image classification. In
Proc. Advances in Neural Information Processing Systems
(NeurIPS 2021), pages 11352–11363, 2021.
[Sun et al., 2020] Y. Sun, H. Chockler, X. Huang, and
D. Kroening. Explaining image classifiers using statistical
fault localization. In Proc. 18th Conference on computer
vision (ECCV 2020), Part XXVIII, pages 391–406, 2020.
[Wachter et al., 2017] S. Wachter, B. Mittelstadt, and
C. Russell. Counterfactual explanations without opening
the black box: automated decisions and the GDPR. Har-
vard Journal of Law and Technology, (31):841–887, 2017.
[Weslake, 2015] B. Weslake. A partial theory of actual causation. British Journal for the Philosophy of Science, 2015. To appear.
[Woodward, 2003] J. Woodward. Making Things Happen: A Theory of Causal Explanation. Oxford University Press, Oxford, U.K., 2003.
[Woodward, 2014] J. Woodward. Scientific explanation. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy (Winter 2014 edition). 2014. Available at http://plato.stanford.edu/archives/win2014/entries/scientific-explanation/.
