
Causal Inference in Natural Language Processing:

Estimation, Prediction, Interpretation and Beyond


Amir Feder∗1,10 , Katherine A. Keith2 , Emaad Manzoor3 , Reid Pryzant4 , Dhanya Sridhar5 ,
Zach Wood-Doughty6 , Jacob Eisenstein7 , Justin Grimmer8 , Roi Reichart1 , Margaret E. Roberts9 ,
Brandon M. Stewart10 , Victor Veitch7,11 , and Diyi Yang12
1 Technion - Israel Institute of Technology, 2 Williams College, 3 University of Wisconsin - Madison, 4 Microsoft, 5 Columbia University, 6 Northwestern University, 7 Google Research, 8 Stanford University, 9 University of California San Diego,
10 Princeton University, 11 University of Chicago, 12 Georgia Tech
Abstract

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.1

∗ All authors equally contributed to this paper. Author names are organized alphabetically in two clusters: First students and post-docs and then faculty members. The email address of the corresponding (first) author is: feder@campus.technion.ac.il.

1 An online repository containing existing research on causal inference and language processing is available here: https://github.com/causaltext/causal-text-papers

1 Introduction

The increasing effectiveness of NLP has created exciting new opportunities for interdisciplinary collaborations, bringing NLP techniques to a wide range of external research disciplines (e.g., Roberts et al., 2014; Zhang et al., 2020; Ophir et al., 2020) and incorporating new data and tasks into mainstream NLP (e.g., Thomas et al., 2006; Pryzant et al., 2018). In such interdisciplinary collaborations, many of the most important research questions relate to the inference of causal relationships. For example, before recommending a new drug therapy, clinicians want to know the causal effect of the drug on disease progression. Causal inference involves a question about a counterfactual world created by taking an intervention: What would a patient’s disease progression have been if we had given them the drug? As we explain below, with observational data, the causal effect is not equivalent to the correlation between whether the drug is taken and the observed disease progression. There is now a vast literature on techniques for making valid inferences using traditional (non-text) datasets (e.g., Morgan and Winship, 2015), but the application of these techniques to natural language data raises new fundamental challenges.

Conversely, in many classical NLP applications, the main goal is to make accurate predictions: Any statistical correlation is admissible,
regardless of the underlying causal relationship. However, as NLP systems are increasingly deployed in challenging and high-stakes scenarios, we cannot rely on the usual assumption that training and test data are identically distributed, and we may not be satisfied with uninterpretable black-box predictors. For both of these problems, causality offers a promising path forward: Domain knowledge of the causal structure of the data generating process can suggest inductive biases that lead to more robust predictors, and a causal view of the predictor itself can offer new insights on its inner workings.

The core claim of this survey paper is that deepening the connection between causality and NLP has the potential to advance the goals of both social science and NLP researchers. We divide the intersection of causality and NLP into two areas: Estimating causal effects from text, and using causal formalisms to make NLP methods more reliable. We next illustrate this distinction.

Example 1. An online forum has allowed its users to indicate their preferred gender in their profiles with a female or male icon. They notice that users who label themselves with the female icon tend to receive fewer “likes” on their posts. To better evaluate their policy of allowing gender information in profiles, they ask: Does using the female icon cause a decrease in popularity for a post?

Ex. 1 addresses the causal effect of signaling the female gender (treatment) on the likes a post receives (outcome); see the discussion of signaling in Keith et al. (2020). The counterfactual question is: If we could manipulate the gender icon of a post, how many likes would the post have received?

The observed correlation between the gender icons and the number of “likes” generally does not coincide with the causal effect: it might instead be a spurious correlation, induced by other variables, known as confounders, which are correlated with both the treatment and the outcome (see Gururangan et al., 2018, for an early discussion of spurious correlation in NLP). One possible confounder is the topic of each post: Posts written by users who have selected the female icon may be about certain topics (e.g., child birth or menstruation) more often, and those topics may not receive as many likes from the audience of the broader online platform. As we will see in § 2, due to confounding, estimating a causal effect requires assumptions.

Example 1 highlights the setting where the text encodes the relevant confounders of a causal effect. The text as a confounder setting is one of many causal inferences we can make with text data. The text data can also encode outcomes or treatments of interest. For example, we may wonder about how gender signal affects the sentiment of the reply that a post receives (text as outcome), or about how a writing style affects the “likes” a post receives (text as treatment).

NLP helps causal inference. Causal inference with text data involves several challenges that are distinct from typical causal inference settings: Text is high-dimensional, needs sophisticated modeling to measure semantically meaningful factors like topic, and demands careful thought to formalize the intervention that a causal question corresponds to. The developments in NLP around modeling language, from topic models (Blei et al., 2003) to contextual embeddings (e.g., Devlin et al., 2019), offer promising ways to extract the information we need from text to estimate causal effects. However, we need new assumptions to ensure that the use of NLP methods leads to valid causal inferences. We discuss existing research on estimating causal effects from text and emphasize these challenges and opportunities in § 3.

Example 2. A medical research center wants to build a classifier to detect clinical diagnoses from the textual narratives of patient medical records. The records are aggregated across multiple hospital sites, which vary both in the frequency of the target clinical condition and the writing style of the narratives. When the classifier is applied to records from sites that were not in the training set, its accuracy decreases. Post-hoc analysis indicates that it puts significant weight on seemingly irrelevant features, such as formatting markers.

Like Ex. 1, Ex. 2 also involves a counterfactual question: Does the classifier’s prediction change if we intervene to change the hospital site, while holding the true clinical status fixed? We want the classifier to rely on phrases that express clinical facts, and not writing style. However, in the training data, the clinical condition and the writing style are spuriously correlated, due to the site acting as a confounding variable. For example, a site might be more likely to encounter the target clinical condition due to its location or speciality, and that site might also employ distinctive textual features, such as boilerplate text at the beginning of each narrative. In the training set, these features
will be predictive of the label, but they are unlikely to be useful in deployment scenarios at new sites. In this example, the hospital site acts like a confounder: It creates a spurious correlation between some features of the text and the prediction target.

Example 2 shows how the lack of robustness can make NLP methods less trustworthy. A related problem is that NLP systems are often black boxes, making it hard to understand how human-interpretable features of the text lead to the observed predictions. In this setting, we want to know if some part of the text (e.g., some sequence of tokens) causes the output of an NLP method (e.g., a classification prediction).

Causal models can help NLP. To address the robustness and interpretability challenges posed by NLP methods, we need new criteria to learn models that go beyond exploiting correlations. For example, we want predictors that are invariant to certain changes that we make to text, such as changing the format while holding fixed the ground truth label. There is considerable promise in using causality to develop new criteria in service of building robust and interpretable NLP methods. In contrast to the well-studied area of causal inference with text, this area of causality and NLP research is less well-understood, though well-motivated by recent empirical successes. In §4, we cover the existing research and review the challenges and opportunities around using causality to improve NLP.

This position paper follows a small body of surveys that review the role of text data within causal inference (Egami et al., 2018; Keith et al., 2020). We take a broader view, separating the intersection of causality and NLP into two distinct lines of research: estimating causal effects in which text is at least one causal variable (§3), and using causal formalisms to improve robustness and interpretability in NLP methods (§4). After reading this paper, we envision that the reader will have a broad understanding of: Different types of causal queries and the challenges they present; the statistical and causal challenges that are unique to working with text data and NLP methods; and open problems in estimating effects from text and applying causality to improve NLP methods.

2 Background

Both focal problems of this survey (causal effect estimation and causal formalisms for robust and explainable prediction) involve causal inference. The key ingredient to causal inference is defining counterfactuals based on an intervention of interest. We will illustrate this idea with the motivating examples from §1.

Example 1 involves online forum posts and the number of likes Y that they receive. We use a binary variable T to indicate whether a post uses a “female icon” (T = 1) or a “male icon” (T = 0). We view the post icon T as the “treatment” in this example, but do not assume that the treatment is randomly assigned (it may be selected by the posts’ authors). The counterfactual outcome Y(1) represents the number of likes a post would have received had it used a female icon. The counterfactual outcome Y(0) is defined analogously.

The fundamental problem of causal inference (Holland, 1986) is that we can never observe Y(0) and Y(1) simultaneously for any unit of analysis, the smallest unit about which one wants to make counterfactual inquiries (e.g., a post in Ex. 1). This problem is what makes causal inference harder than statistical inference and impossible without identification assumptions (see § 2.2).

Example 2 involves a trained classifier f(X) that takes a textual clinical narrative X as input and outputs a diagnosis prediction. The text X is written based on the physician’s diagnosis Y, and is also influenced by the writing style used at the hospital Z. We want to intervene upon the hospital Z while holding the label Y fixed. The counterfactual narrative X(z) is the text we would have observed had we set the hospital to the value z while holding the diagnosis fixed. The counterfactual prediction f(X(z)) is the output the trained classifier would have produced had we given the counterfactual narrative X(z) as input.

2.1 Causal Estimands

An analyst begins by specifying target causal quantities of interest, called causal estimands, which typically involve counterfactuals. In Example 1, one possible causal estimand is the average treatment effect (ATE) (Rubin, 1974),

ATE = E[Y(1) − Y(0)],   (1)

where the expectation is over the generative distribution of posts. The ATE can be interpreted as the change in the number of likes a post would have received, on average, had the post used a female icon instead of a male icon.
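For intuition, here is a minimal simulation sketch (not taken from the survey): if the icon in Example 1 were assigned by a coin flip, the ATE in Equation (1) would be identified by a simple difference in group means. The variable names and the effect size of −2 likes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical randomized version of Example 1: the icon (T) is assigned by coin flip.
T = rng.integers(0, 2, size=n)
# Simulated like counts; by construction the true ATE is -2 likes.
Y = rng.poisson(lam=10 - 2 * T)

# Under randomization, E[Y(1)] - E[Y(0)] is identified by a difference in observed means.
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(f"Estimated ATE: {ate_hat:.2f}")  # approximately -2
```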
Another possible causal effect of interest is the conditional average treatment effect (CATE) (Imbens and Rubin, 2015),

CATE = E[Y(1) − Y(0) | G],   (2)

where G is a predefined subgroup of the population. For example, G could be all posts on political topics. In this case, the CATE can be interpreted as the change in the number of likes a post on a political topic would have received, on average, had the post used a male icon instead of a female icon. CATEs are used to quantify the heterogeneity of causal effects in different population subgroups.

2.2 Identification Assumptions for Causal Inference

We will focus on Example 1 and the ATE in Equation (1) to explain the assumptions needed for causal inference. Although we focus on the ATE, related assumptions are needed in some form for all causal estimands. Variables are the same as those defined previously in this section.

Ignorability requires that the treatment assignment be statistically independent of the counterfactual outcomes,

T ⊥⊥ Y(a)   ∀a ∈ {0, 1}.   (3)

Note that this assumption is not equivalent to independence between the treatment assignment and the observed outcome Y. For example, if ignorability holds, Y ⊥⊥ T would additionally imply that the treatment has no effect.

Randomized treatment assignment guarantees ignorability by design. For example, we can guarantee ignorability in Example 1 by flipping a coin to select the icon for each post, and disallowing post authors from changing it.

Without randomized treatment assignment, ignorability could be violated by confounders, variables that influence both the treatment status and potential outcomes. In Example 1, suppose that: (i) the default post icon is male, (ii) only experienced users change the icon for their posts based on their gender, and (iii) experienced users write posts that receive relatively more likes. In this scenario, the experience of post authors is a confounder: posts having female icons are more likely to be written by experienced users and thus receive more likes. In the presence of confounders, causal inference is only possible if we assume conditional ignorability,

T ⊥⊥ Y(a) | X   ∀a ∈ {0, 1},   (4)

where X is a set of observed variables, conditioning on which ensures independence between the treatment assignment and the potential outcomes. In other words, we can assume that all confounders are observed.

Positivity requires that the probability of receiving treatment is bounded away from 0 and 1 for all values of the confounders X:

0 < Pr(T = 1 | X = x) < 1,   ∀x.   (5)

Intuitively, positivity requires that each unit under study has the possibility of being treated and has the possibility of being untreated. Randomized treatment assignment can also guarantee positivity by design.

Consistency requires that the outcome observed for each unit under study at treatment level a ∈ {0, 1} is identical to the outcome we would have observed had that unit been assigned to treatment level a,

T = a ⇒ Y(a) = Y   ∀a ∈ {0, 1}.   (6)

Consistency ensures that the potential outcomes for each unit under study take on a single value at each treatment level. Consistency will be violated if different unobservable “versions” of the treatment lead to different potential outcomes: for example, if red and blue female icons had different effects on the number of likes received, but icon color was not recorded. Consistency will also be violated if the treatment assignment of one unit affects the potential outcomes of another, a phenomenon called interference (Rosenbaum, 2007). Randomized treatment assignment does not guarantee consistency by design: e.g., if different icon colors affect the number of likes but are not considered by the model, then a randomized experiment will not solve the problem. As Hernán (2016) discusses, consistency assumptions are a “matter of expert agreement” and, while subjective, these assumptions are at least made more transparent by causal formalisms.

These three assumptions enable identifying the ATE defined in Equation (1), as formalized in the following identification proof:

E[Y(a)] = E_X[E[Y(a) | X]]              (i)
        = E_X[E[Y(a) | X, T = a]]       (ii)
        = E_X[E[Y | X, T = a]],   ∀a ∈ {0, 1}   (iii)

where equality (i) is due to iterated expectation, equality (ii) follows from conditional ignorability, and equality (iii) follows from consistency and positivity, which ensures that the conditional expectation E[Y | X, T = a] is well defined. The final expression can be computed from observable quantities alone.
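The final expression above suggests a simple plug-in (“regression adjustment”) estimator: fit a model for E[Y | X, T] and average its predictions with T set to 1 and to 0 over the observed X. The sketch below is a minimal illustration under an invented linear data-generating process with a single scalar confounder; the survey does not prescribe this (or any) particular estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000

# Simulated observational data: X (e.g., author experience) confounds treatment and outcome.
X = rng.normal(size=(n, 1))
p_treat = 1 / (1 + np.exp(-1.5 * X[:, 0]))         # experienced users pick the icon more often
T = rng.binomial(1, p_treat)
Y = 10 + 3 * X[:, 0] - 2 * T + rng.normal(size=n)  # true ATE is -2 by construction

# The naive difference in means is biased because of confounding.
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Plug-in estimator of E_X[E[Y | X, T = 1] - E[Y | X, T = 0]]:
# fit an outcome model on (X, T), then average its predictions with T set to 1 and to 0.
model = LinearRegression().fit(np.column_stack([X, T]), Y)
y1 = model.predict(np.column_stack([X, np.ones(n)]))
y0 = model.predict(np.column_stack([X, np.zeros(n)]))
ate_adjusted = (y1 - y0).mean()

print(f"naive: {naive:.2f}, adjusted: {ate_adjusted:.2f}")  # adjusted is close to -2
```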
We refer to other background material to discuss how to identify and estimate causal effects with these assumptions in hand (Rubin, 2005; Pearl, 2009; Imbens and Rubin, 2015; Egami et al., 2018; Keith et al., 2020).

2.3 Causal Graphical Models

Finding a set of variables X that ensure conditional ignorability is challenging, and requires making several carefully assessed assumptions about the causal relationships in the domain under study. Causal directed-acyclic graphs (DAGs) (Pearl, 2009) enable formally encoding these assumptions and deriving the set of variables X after conditioning on which ignorability is satisfied.

In a causal DAG, an edge X → Y implies that X may or may not cause Y. The absence of an edge between X and Y implies that X does not cause Y. Bi-directed dotted arrows between variables indicate that they are correlated, potentially through some unobserved variable.

Figure 1 illustrates the causal DAGs we assume for Example 1 and Example 2. Given a causal DAG, causal dependencies between any pair of variables can be derived using the d-separation algorithm (Pearl, 1994). These dependencies can then be used to assess whether conditional ignorability holds for a given treatment, outcome, and set of conditioning variables X. For example, in the left DAG in Figure 1, the post icon T is not independent of the number of likes Y unless we condition on X. In the right DAG, the prediction f(X) is not independent of the hospital Z even after conditioning on the narrative X.

[Figure 1 appears here: two causal DAGs, one panel per motivating example (“Example 1” and “Example 2”).]
Figure 1: Causal graphs for the motivating examples. (Left) In Example 1, the post icon (T) is correlated with attributes of the post (X), and both variables affect the number of likes a post receives (Y). (Right) In Example 2, the label (Y, i.e., diagnosis) and hospital site (Z) are correlated, and both affect the clinical narrative (X). Predictions f(X) from the trained classifier depend on X.

3 Estimating Causal Effects with Text

In §2, we described assumptions for causal inference when the treatment, outcome and confounders were directly measured. In this section, we contribute a novel discussion about how causal assumptions are complicated when variables necessary for a causal analysis are extracted automatically from text. Addressing these open challenges will require collaborations between the NLP and causal estimation communities to understand what are the requisite assumptions to draw valid causal conclusions. We highlight prior approaches and future challenges in settings where the text is a confounder, the outcome, or the treatment – but this discussion applies broadly to many text-based causal problems.

To make these challenges clear, we will expand upon Example 1 by supposing a hypothetical online forum wants to understand and reduce harassment on its platform. Many such questions are causal: Do gendered icons influence the harassment users receive? Do longer suspensions make users less likely to harass others? How can a post be rewritten to avoid offending others? In each case, using NLP to measure aspects of language is integral to any causal analysis.

3.1 Causal Effects with Textual Confounders

Returning to Example 1, suppose the platform worries that users with female icons are more likely to receive harassment from other users. Such a finding might significantly influence plans for a new moderation strategy (Jhaver et al., 2018; Rubin et al., 2020). We may be unable or unwilling to randomize our treatment (the gender signal of the author’s icon), so the causal effect of gender signal on harassment received might be confounded by other variables. The topic of the post may be an important confounder: some subject areas may be discussed by a larger proportion of users with female icons, and more controversial subjects may attract more harassment. The text of the post provides evidence of the topic and thus acts as a confounder (Roberts et al., 2020).
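To make the text-as-confounder recipe concrete, the sketch below follows one common pattern: extract a representation of the post, model the propensity Pr(T = 1 | text) from it, and reweight the outcomes with inverse-propensity weighting. The toy posts, labels, bag-of-words representation, and clipping threshold are all invented for illustration; a real analysis would still have to defend the ignorability and positivity assumptions discussed in the remainder of this subsection.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy observational data (invented): post text, icon treatment T, harassment outcome Y.
df = pd.DataFrame({
    "text": [
        "tips for newborn sleep schedules",
        "my experience with postpartum care",
        "thoughts on the latest election debate",
        "hot take on the new tax policy",
        "recipe thread: weeknight dinners",
        "asking about pediatrician recommendations",
        "why the referendum result surprised me",
        "budget gaming laptop suggestions",
    ],
    "female_icon": [1, 1, 0, 0, 1, 1, 0, 0],  # treatment T
    "harassed":    [0, 0, 1, 1, 0, 0, 1, 0],  # outcome Y
})

# Crude text representation of the confounding topic, and a propensity model Pr(T = 1 | text).
X = CountVectorizer().fit_transform(df["text"])
propensity = LogisticRegression(max_iter=1000).fit(X, df["female_icon"]).predict_proba(X)[:, 1]
propensity = propensity.clip(0.05, 0.95)  # guard against (near) positivity violations

# Inverse-propensity-weighted ATE estimate (Horvitz-Thompson form).
T, Y = df["female_icon"].to_numpy(), df["harassed"].to_numpy()
ate_ipw = np.mean(T * Y / propensity - (1 - T) * Y / (1 - propensity))
print(f"IPW estimate of the effect of the icon on harassment: {ate_ipw:.2f}")
```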
Previous approaches. The main idea in this set- sible to imagine changing the gender icon while
ting is to use NLP methods to extract confounding holding the gendered text fixed.
aspects from text and then adjust for those aspects
in an estimation approach such as propensity score
3.2 Causal Effects on Textual Outcomes
matching. However, how and when these methods
violate causal assumptions are still open questions. Suppose platform moderators can choose to sus-
Keith et al. (2020) provides a recent overview of pend users who violate community guidelines for
several such methods and many potential threats either one day or one week, and we want to know
to inference. which option has the greatest effect at decreasing
One set of methods apply unsupervised di- the toxicity of the suspended user. If we could col-
mensionality reduction methods that reduce high- lect them for each user’s post, ground-truth human
dimensional text data to a low-dimensional set of annotations of toxicity would be our ideal outcome
variables. Such methods include latent variable variable. We would then use those outcomes to
models such as topic models, embedding methods, calculate the ATE, following the discussion in § 2.
and auto-encoders. Roberts et al. (2020) and Srid- Our analysis of suspensions is complicated if, in-
har and Getoor (2019) have applied topic models stead of ground-truth labels for our toxicity out-
to extract confounding patterns from text data, and come, we rely on NLP methods to extract the out-
performed an adjustment for these inferred vari- come from the text. A core challenge is to distill
ables. Mozer et al. (2020) matches texts using dis- the high-dimensional text into a low-dimensional
tance metrics on the bag-of-words representation. measure of toxicity.
A second set of methods adjust for confounders
from text with supervised NLP methods. Recently,
Challenges for causal assumptions with text.
Veitch et al. (2020) adapted pre-trained language
We saw in § 2 that randomizing the treatment
models and supervised topic models with multiple
assignment can ensure ignorability and positiv-
classification heads for binary treatment and coun-
ity; but even with randomization, we require more
terfactual outcomes. By learning a “sufficient”
careful assessment to satisfy consistency. Sup-
embedding that obtained low classification loss on
pose we randomly assign suspension lengths to
the treatment and counterfactual outcomes, they
users and then once those users return and con-
show that confounding properties could be found
tinue to post, we use a clustering method to dis-
within text data. Roberts et al. (2020) combines
cover toxic and non-toxic groupings among the
these strategies with the topic model approach in a
formerly-suspended users. To estimate the causal
text matching framework.
effect of suspension length, we rely on the trained
Challenges for causal assumptions with text. clustering model to infer our outcome variable.
In settings without randomized treatments, NLP Assuming that the suspension policy does in truth
methods that adjust for text confounding require have a causal effect on posting behavior, then be-
a particularly strong statement of conditional ig- cause our clustering model depends on all posts in
norability (Equation 4): all aspects of confounding its training data, it also depends on the treatment
must be measured by the model. Because we can- assignments that influenced each post. Thus, when
not test this assumption, we should seek domain we use the model to infer outcomes, each user’s
expertise to justify it or understand the theoretical outcome depends on all other users’ treatments.
and empirical consequences if it is violated. This violates the assumption of consistency – that
When the text is a confounder, its high- potential outcomes do not depend on the treatment
dimensionality makes positivity unlikely to hold status of other units. This undermines the theo-
(D’Amour et al., 2020). Even for approaches retical basis for our causal estimate, and in prac-
that extract a low-dimensional representation of tice, implies that different randomized treatment
the confounder from text, positivity is a concern. assignments could lead to different treatment ef-
For example, in Example 1, posts might con- fect estimates.
tain phrases that near-perfectly encode the chosen These issues can be addressed by developing the
gender-icon of the author. If the learned repre- measure on only a sample of the data and then esti-
sentation captures this information alongside other mating the effect on a separate, held-out data sam-
confounding aspects, it would be nearly impos- ple (Egami et al., 2018).
3.3 Causal Effects with Textual Treatments of texts, we still have to assume that there is no
As a third example, suppose we want to under- confounding due to latent properties of the reader,
stand what makes a post offensive. This might al- such as their political ideology or their tastes.
low the platform to provide automated suggestions
that encourage users to rephrase their post. Here, 3.4 Future Work
we are interested in the causal effect of the text We next highlight key challenges and opportuni-
itself on whether a reader reports it as offensive. ties for NLP researchers to facilitate causal infer-
Theoretically, the counterfactual Y (t) is defined ence from text.
for any t, but could be limited to an exploration Heterogeneous effects. Texts are read and in-
of specific aspects of the text. For example, do terpreted differently by different people; NLP re-
second-person pronouns make a post more likely searchers have studied this problem in the context
to be reported? of heterogeneous perceptions of annotators (Paun
et al., 2018; Pavlick and Kwiatkowski, 2019). In
Previous approaches. One approach to study-
the field of causal inference, the idea that dif-
ing the effects of text involves treatment discovery:
ferent subgroups experience different causal ef-
producing interpretable features of the text—such
fects is formalized by a heterogeneous treatment
as latent topics or lexical features like n-grams
effect, and is studied using conditional average
(Pryzant et al., 2018)—that can be causally linked
treatment effects (Equation (2)) for different sub-
to outcomes. For example, Fong and Grimmer
groups. It may also be of interest to discover sub-
(2016) discovered features of candidate biogra-
groups where the treatment has a strong effect on
phies that drove voter evaluations, Pryzant et al.
an outcome of interest. For example, we may want
(2017) discovered writing styles in marketing ma-
to identify text features that characterize when a
terials that are influential in increasing sales fig-
treatment such as a content moderation policy is
ures, and Zhang et al. (2020) discovered conver-
effective. Wager and Athey (2018) proposed a
sational tendencies that lead to positive mental
flexible approach to estimating heterogeneous ef-
health counseling sessions.
fects based on random forests. However, such ap-
Another approach is to estimate the causal ef-
proaches, which are developed with tabular data in
fects of specific latent properties that are inter-
mind, may be computationally infeasible for high-
vened on during an experiment or extracted from
dimensional text data. There is an opportunity to
text for observational studies (Pryzant et al., 2021;
extend NLP methods to discover text features that
Wood-Doughty et al., 2018). For example, Ger-
capture subgroups where the causal effect varies.
ber et al. (2008) studied the effect of appealing to
civic duty on voter turnout. In this setting, factors Representation learning. Causal inference
are latent properties of the text for which we need from text requires extracting low-dimensional fea-
a measurement model. tures from text. Depending on the setting, the
low-dimensional features are tasked with extract-
Challenges for causal assumptions with text. ing confounding information, outcomes or treat-
Ensuring positivity and consistency remains a ments. The need to measure latent aspects from
challenge in this setting, but assessing conditional text connects to the field of text representation
ignorability is particularly tricky. Suppose the learning (Le and Mikolov, 2014; Liu et al., 2015;
treatment is the use of second-person pronouns, Liu and Lapata, 2018). The usual objective of text
but the relationship between this treatment and the representation learning approaches is to model
outcome is confounded by other properties of the language. Adapting representation learning for
text (e.g., politeness). For conditional ignorabil- causal inference offers open challenges; for exam-
ity to hold, we would need to extract from the text ple, we might augment the objective function to
and condition on all such confounders, which re- ensure that (i) positivity is satisfied, (ii) confound-
quires assuming that we can disentangle the treat- ing information is not discarded, or (iii) noisily-
ment from many other aspects of the text (Pryzant measured outcomes or treatments enable accurate
et al., 2021). Such concerns could be avoided causal effect estimates.
by randomly assigning texts to readers (Fong and Benchmarks. Benchmark datasets have
Grimmer, 2016, 2021), but that may be impracti- propelled machine learning forward by creating
cal. Even if we could randomize the assignment shared metrics by which predictive models can
be evaluated. There are currently no real-world soning to help solve traditional NLP tasks such as
text-based causal estimation benchmarks due to understanding, manipulating, and generating nat-
the fundamental problem of causal inference that ural language.
we can never obtain counterfactuals on an indi- At a first glance, NLP may appear to have lit-
vidual and observe the true causal effects. How- tle need for causal ideas. The field has achieved
ever, as Keith et al. (2020) discuss, there has remarkable progress from the use of increasingly
been some progress in evaluating text-based es- high-capacity neural architectures to extract cor-
timation methods on semi-synthetic datasets in relations from large-scale datasets (Peters et al.,
which real covariates are used to generate treat- 2018; Devlin et al., 2019; Liu et al., 2019). These
ment and outcomes, e.g. Veitch et al. (2020); architectures make no distinction between causes,
Roberts et al. (2020); Pryzant et al. (2021); Feder effects, and confounders, and they make no at-
et al. (2021); Weld et al. (2022). Wood-Doughty tempt to identify causal relationships: A feature
et al. (2021) employed large-scale language mod- may be a powerful predictor even if it has no di-
els for controlled synthetic generation of text on rect causal relationship with the desired output.
which causal methods can be evaluated. An open Yet correlational predictive models can be un-
problem is the degree to which methods that per- trustworthy (Jacovi et al., 2021): They may latch
form well on synthetic data generalize to real- onto spurious correlations (“shortcuts”), leading to
world data. errors in out-of-distribution (OOD) settings (e.g.,
Controllable Text generation. When running McCoy et al., 2019); they may exhibit unac-
a randomized experiment or generating synthetic ceptable performance differences across groups of
data, researchers make decisions using the em- users (e.g., Zhao et al., 2017); and their behavior
pirical distribution of the data. If we are study- may be too inscrutable to incorporate into high-
ing whether a drug prevents headaches, it would stakes decisions (Guidotti et al., 2018). Each of
make sense to randomly assign a ‘reasonable’ dose these shortcomings can potentially be addressed
– one that is large enough to plausibly be effec- by the causal perspective: Knowledge of the
tive but not so large as to be toxic. But when causal relationship between observations and la-
the causal question involves natural language, do- bels can be used to formalize spurious correlations
main knowledge might not provide a small set of and mitigate their impact (§ 4.1); causality also
‘reasonable’ texts. Instead, we might turn to con- provides a language for specifying and reason-
trollable text generation to sample texts that ful- ing about fairness conditions (§ 4.2); and the task
fill some requirements (Kiddon et al., 2016). Such of explaining predictions may be naturally formu-
methods have a long history in NLP; for example, lated in terms of counterfactuals (§ 4.3). The ap-
a conversational agent should be able to answer plication of causality to these problems is still an
a user’s question while being perceived as polite active area of research, which we attempt to facil-
(Niu and Bansal, 2018). In our text as treatment itate by highlighting previously implicit connec-
example where we want to understand which tex- tions among a diverse body of prior work.
tual aspects make a text offensive, such methods
could enable an experiment allowing us to ran- 4.1 Learning Robust Predictors
domly assign texts that differ on only a specific
latent aspect. For example, we could change the The NLP field has grown increasingly concerned
style of a text while holding its content fixed (Lo- with spurious correlations (Gururangan et al.,
geswaran et al., 2018). Recent work has explored 2018; McCoy et al., 2019, inter alia). From
text generation from a causal perspective (Hu and a causal perspective, spurious correlations arise
Li, 2021), but future work could develop these when two conditions are met. First, there must be
methods for causal estimation. some factor(s) Z that are informative (in the train-
ing data) about both the features X and label Y .
4 Robust and Explainable Predictions Second, Y and Z must be dependent in the train-
from Causality ing data in a way that is not guaranteed to hold in
general. A predictor f : X → Y will learn to use
Thus far we have focused on using NLP tools for parts of X that carry information about Z (because
estimating causal effects in the presence of text Z is informative about Y ), which can lead to errors
data. In this section, we consider using causal rea- if the relationship between Y and Z changes when
the predictor is deployed.2 identical in the factual X and the counterfactual
This issue is illustrated by Example 2, where X(Y = ỹ). A predictor that relies solely on such
the task is to predict a medical condition from the spurious correlations will be unable to correctly la-
text of patient records. The training set is drawn bel both factual and counterfactual instances.
from multiple hospitals which vary both in the fre- A number of approaches have been proposed
quency of the target clinical condition (Y ) and the for learning predictors that pass tests of sensitivity
writing style of the narratives (represented in X). and invariance. Many of these approaches are ei-
A predictor trained on such data will use textual ther explicitly or implicitly motivated by a causal
features that carry information about the hospital perspective. They can be viewed as ways to in-
(Z), even when they are useless at predicting the corporate knowledge of the causal structure of the
diagnosis within any individual hospital. Spurious data into the learning objective.
correlations also appear as artifacts in benchmarks
4.1.1 Data augmentation
for tasks such as natural language inference, where
negation words are correlated with semantic con- To learn predictors that pass tests of invariance
tradictions in crowdsourced training data but not and sensitivity, a popular and straightforward ap-
in text that is produced under more natural condi- proach is data augmentation: Elicit or construct
tions (Gururangan et al., 2018; Poliak et al., 2018). counterfactual instances, and incorporate them
Such observations have led to several proposals into the training data. When the counterfac-
for novel evaluation methodologies (Naik et al., tuals involve perturbations to confounding fac-
2018; Ribeiro et al., 2020; Gardner et al., 2020) tors Z, it can help to add a term to the learn-
to ensure that predictors are not “right for the ing objective to explicitly penalize disagreements
wrong reasons”. These evaluations generally take in the predictions for counterfactual pairs, e.g.,
two forms: Invariance tests, which assess whether |f (X(Z = z)) − f (X(Z = z̃))|, when f is the
predictions are affected by perturbations that are prediction function (Garg et al., 2019). When
causally unrelated to the label, and sensitivity tests, perturbations are applied to the label Y , training
which apply perturbations that should in some on label counterfactuals X(Y = ỹ) can improve
sense be the minimal change necessary to flip the OOD generalization and reduce noise sensitivity
true label. Both types of test can be motivated by (Kaushik et al., 2019, 2020; Jha et al., 2020).3
a causal perspective. The purpose of an invariance Counterfactual examples can be generated in
test is to determine whether the predictor behaves several ways: (1) manual post-editing (e.g.,
differently on counterfactual inputs X(Z = z̃), Kaushik et al., 2019; Gardner et al., 2020), (2)
where Z indicates a property that an analyst be- heuristic replacement of keywords (e.g., Shekhar
lieves should be causally irrelevant to Y . A model et al., 2017; Garg et al., 2019; Feder et al., 2021),
whose predictions are invariant across such coun- and (3) automated text rewriting (e.g., Zmigrod
terfactuals can in some cases be expected to per- et al., 2019; Riley et al., 2020; Wu et al., 2021;
form better on test distributions with a different re- Calderon et al., 2022). Manual editing is typ-
lationship between Y and Z (Veitch et al., 2021). ically fluent and accurate but relatively expen-
Similarly, sensitivity tests can be viewed as evalu- sive. Keyword-based approaches are appropriate
ations of counterfactuals X(Y = ỹ), in which the in some cases — for example, when counterfactu-
label Y is changed but all other causal influences als can be obtained by making local substitutions
on X are held constant (Kaushik et al., 2020). Fea- of closed-class words like pronouns — but they
tures that are spuriously correlated with Y will be cannot guarantee fluency or coverage of all labels
and covariates of interest (Antoniak and Mimno,
2
From the perspective of earlier work on domain adapta- 2021), and are difficult to generalize across lan-
tion (Søgaard, 2013), spurious correlations can be viewed as a guages. Fully generative approaches could poten-
special case of a more general phenomenon in which feature-
3
label relationships change across domains. For example, the More broadly, there is a long history of methods that
lexical feature boring might have a stronger negative weight elicit or construct new examples and labels with the goal of
in reviews about books than about kitchen appliances, but this improving generalization, e.g. self-training (McClosky et al.,
is not a spurious correlation because there is a direct causal 2006; Reichart and Rappoport, 2007), co-training (Steedman
relationship between this feature and the label. Spurious cor- et al., 2003), and adversarial perturbations (Ebrahimi et al.,
relations are a particularly important form of distributional 2018). The connection of such methods to causal issues
shift in practice because they can lead to inconsistent predic- such as spurious correlations has not been explored until re-
tions on pairs of examples that humans view as identical. cently (Chen et al., 2020; Jin et al., 2021).
tially combine the fluency and coverage of manual turning to Example 2, the desideratum is that the
editing with the ease of lexical heuristics. predicted diagnosis f (X) should not be affected
Counterfactual examples are a powerful re- by the aspects of the writing style that are asso-
source because they directly address the missing ciated with the hospital Z. This can be formal-
data issues that are inherent to causal inference, ized as counterfactual invariance to Z: The pre-
as described in § 2. However, in many cases it dictor f should satisfy f (X(z)) = f (X(z 0 )) for
is difficult for even a fluent human to produce all z, z 0 . In this case, both Z and Y are causes
meaningful counterfactuals: Imagine the task of of the text features X.4 Using this observation,
converting a book review into a restaurant review it can be shown that any counterfactually invari-
while somehow leaving “everything else” constant ant predictor will satisfy f (X) ⊥ ⊥ Z | Y , i.e., the
(as in Calderon et al. (2022)). A related con- prediction f (X) is independent of the covariate Z
cern is lack of precision in specifying the desired conditioned on the true label Y . In other cases,
impact of the counterfactual. To revise a text such as content moderation, the label is an effect
from, say, U.S. to U.K. English, it is unambiguous of the text, rather than a cause — for a detailed dis-
that “colors” should be replaced with “colours”, cussion of this distinction, see Jin et al. (2021). In
but should terms like “congress” be replaced with such cases, it can be shown that a counterfactually-
analogous concepts like “parliament”? This de- invariant predictor will satisfy f (X) ⊥ ⊥ Z (with-
pends on whether we view the semantics of the out conditioning on Y ). In this fashion, knowl-
text as a causal descendent of the locale. If such edge of the true causal structure of the problem can
decisions are left to the annotators’ intuitions, it be used to derive observed-data signatures of the
is difficult to ascertain what robustness guarantees counterfactual invariance. Such signatures can be
we can get from counterfactual data augmentation. incorporated as regularization terms in the train-
Finally, there is the possibility that counterfactuals ing objective (e.g., using kernel-based measures
will introduce new spurious correlations. For ex- of statistical dependence). These criteria do not
ample, when asked to rewrite NLI examples with- guarantee counterfactual invariance—the implica-
out using negation, annotators (or automated text tion works in the other direction—but in practice
rewriters) may simply find another shortcut, intro- they increase counterfactual invariance and im-
ducing a new spurious correlation. Keyword sub- prove performance in out-of-distribution settings
stitution approaches may also introduce new spu- without requiring counterfactual examples.
rious correlations if the keyword lexicons are in- An alternative set of distributional criteria can
complete (Joshi and He, 2021). Automated meth- be derived by viewing the training data as aris-
ods for conditional text rewriting are generally not ing from a finite set of environments, in which
based on a formal counterfactual analysis of the each environment is endowed a unique distribu-
data generating process (cf. Pearl, 2009), which tion over causes, but the causal relationship be-
would require modeling the relationships between tween X and Y is invariant across environments.
various causes and consequences of the text. The This view motivates a set of environmental in-
resulting counterfactual instances may therefore variance criteria: The predictor should include a
fail to fully account for spurious correlations and representation function that is invariant across en-
may introduce new spurious correlations. vironments (Muandet et al., 2013; Peters et al.,
2016); we should induce a representation such
4.1.2 Distributional Criteria
that the same predictor is optimal in every en-
An alternative to data augmentation is to design vironment (Arjovsky et al., 2019); the predictor
new learning algorithms that operate directly on should be equally well calibrated across environ-
the observed data. In the case of invariance tests, ments (Wald et al., 2021). Multi-environment
one strategy is to derive distributional properties training is conceptually similar to domain adap-
of invariant predictors, and then ensure that these tation (Ben-David et al., 2010), but here the goal
properties are satisfied by the trained model. is not to learn a predictor for any specific target
Given observations of the potential confounder domain, but rather to learn a predictor that works
at training time, the counterfactually-invariant pre-
dictor will satisfy an independence criterion that 4
This is sometimes called the anticausal setting, because
can be derived from the causal structure of the the predictor f : X → Ŷ must reverse the causal direction
data generating process (Veitch et al., 2021). Re- of the data generating process (Schölkopf et al., 2012).
well across a set of causally-compatible domains, in text classification, and Zhao et al. (2018) swap
known as domain generalization (Ghifary et al., gender markers such as pronouns and names for
2015; Gulrajani and Lopez-Paz, 2020). However, coreference resolution. Counterfactual data aug-
it may be necessary to observe data from a very mentation has also been applied to reduce bias
large number of environments to disentangle the in pre-trained models (e.g., Huang et al., 2019;
true causal structure (Rosenfeld et al., 2021). Maudslay et al., 2019) but the extent to which
Both general approaches require richer training biases in pre-trained models propagate to down-
data than in typical supervised learning: Either ex- stream applications remains unclear (Goldfarb-
plicit labels Z for the factors to disentangle from Tarrant et al., 2021). Fairness applications of the
the predictions or access to data gathered from distributional criteria discussed in § 4.1.2 are rel-
multiple labeled environments. Obtaining such atively rare, but Adragna et al. (2020) show that
data may be rather challenging, even compared invariant risk minimization (Arjovsky et al., 2019)
to creating counterfactual instances. Furthermore, can reduce the use of spurious correlations with
the distributional approaches have thus far been race for toxicity detection.
applied only to classification problems, while data
augmentation can easily be applied to structured 4.3 Causal Model Interpretations
outputs such as machine translation.
Explanations of model predictions can be crucial
4.2 Fairness and bias to help diagnose errors and establish trust with de-
cision makers (Guidotti et al., 2018; Jacovi and
NLP systems inherit and sometimes amplify unde-
Goldberg, 2020). One prominent approach to gen-
sirable biases encoded in text training data (Baro-
erate explanations is to exploit network artifacts,
cas et al., 2019; Blodgett et al., 2020). Causality
such as attention weights (Bahdanau et al., 2014),
can provide a language for specifying desired fair-
which are computed on the path to generating
ness conditions across demographic attributes like
a prediction (e.g., Xu et al., 2015; Wang et al.,
race and gender. Indeed, fairness and bias in pre-
2016). Alternatively, there have been attempts to
dictive models have close connections to causal-
estimate simpler and more interpretable models by
ity: Hardt et al. (2016) argue that a causal anal-
using perturbations of test examples or their hid-
ysis is required to determine the fairness proper-
den representations (Ribeiro et al., 2016; Lund-
ties of an observed distribution of data and pre-
berg and Lee, 2017; Kim et al., 2018). How-
dictions; Kilbertus et al. (2017) show that fair-
ever, both attention and perturbation-based meth-
ness metrics can be motivated by causal inter-
ods have important limitations. Attention-based
pretations of the data generating process; Kusner
explanations can be misleading (Jain and Wallace,
et al. (2017) study “counterfactually fair” predic-
2019), and are generally possible only for indi-
tors where, for each individual, predictions are the
vidual tokens; they cannot explain predictions in
same for that individual and for a counterfactual
terms of more abstract linguistic concepts. Exist-
version of them created by changing a protected
ing perturbation-based methods often generate im-
attribute. However, there are important questions
plausible counterfactuals and also do not allow for
about the legitimacy of treating attributes like race
estimating the effect of sentence-level concepts.
as variables subject to intervention (e.g., Kohler-
Hausmann, 2018; Hanna et al., 2020), and Kilber- Viewed as a causal inference problem, explana-
tus et al. (2017) propose to focus instead on invari- tion can be performed by comparing predictions
ance to observable proxies such as names. for each example and its generated counterfactual.
While it is usually not possible to observe coun-
Fairness with text. The fundamental connec- terfactual predictions, here the causal system is the
tions between causality and unfair bias have been predictor itself. In those cases it may be possible to
explored mainly in the context of relatively low- compute counterfactuals, e.g. by manipulating the
dimensional tabular data rather than text. How- activations inside the network (Vig et al., 2020;
ever, there are several applications of the counter- Geiger et al., 2021). Treatment effects can then
factual data augmentation strategies from § 4.1.1 be computed by comparing the predictions under
in this setting: For example, Garg et al. (2019) the factual and counterfactual conditions. Such a
construct counterfactuals by swapping lists of controlled setting is similar to the randomized ex-
“identity terms”, with the goal of reducing bias periment described in § 2, where it is possible to
compute the difference between an actual text and specified in the causal model. Unobserved con-
what the text would have been had a specific con- founding is challenging for causal inference in
cept not existed in it. Indeed, in cases where coun- general, but it is likely to be ubiquitous in lan-
terfactual texts can be generated, we can often esti- guage applications, in which the text arises from
mate causal effects on text-based models (Ribeiro the author’s intention to express a structured ar-
et al., 2020; Gardner et al., 2020; Rosenberg et al., rangement of semantic concepts, and the label cor-
2021; Ross et al., 2021; Meng et al., 2022; Zhang responds to a query, either directly on the intended
et al., 2022). However, generating such counter- semantics or on those understood by the reader.
factuals is challenging (see § 4.1.1). Partial causal models of text can be “top down”,
To overcome the counterfactual generation in the sense of representing causal relationships
problem, another class of approaches proposes to between the text and high-level document meta-
manipulate the representation of the text and not data such as authorship, or “bottom up”, in the
the text itself (Feder et al., 2021; Elazar et al., sense of representing local linguistic invariance
2021; Ravfogel et al., 2021). Feder et al. (2021) properties, such as the intuition that a multiword
compute the counterfactual representation by pre- expression like ‘San Francisco’ has a single cause.
training an additional instance of the language The methods described here are almost exclu-
representation model employed by the classifier, sively based on top-down models, but approaches
with an adversarial component designed to "for- such as perturbing entity spans (e.g., Longpre
get" the concept of choice, while controlling for et al., 2021) can be justified by implicit bottom-
confounding concepts. Ravfogel et al. (2020) offer up causal models. Making these connections more
a method for removing information from represen- explicit may yield new insights. Future work may
tations by iteratively training linear classifiers and also explore hybrid models that connect high-level
projecting the representations on their null-spaces, document metadata with medium-scale spans of
but do not account for confounding concepts. text such as sentences or paragraphs.
A complementary approach is to generate coun- A related issue is when the true variable of in-
terfactuals with minimal changes that obtain a terest is unobserved but we do receive some noisy
different model prediction (Wachter et al., 2017; or coarsened proxy variable. For example, we may
Mothilal et al., 2020). Such examples allow us to wish to enforce invariance to dialect but have ac-
observe the changes required to change a model’s cess only to geographical information, with which
prediction. Causal modeling can facilitate this by dialect is only approximately correlated. This is an
making it possible to reason about the causal re- emerging area within the statistical literature (Tch-
lationships between observed features, thus iden- etgen et al., 2020), and despite the clear applicabil-
tifying minimal actions which might have down- ity to NLP, we are aware of no relevant prior work.
stream effects on several features, ultimately re- Finally, applications of causality to NLP have
sulting in a new prediction (Karimi et al., 2021). focused primarily on classification, so it is natural
Finally, a causal perspective on attention-based to ask how these approaches might be extended to
explanations is to view internal nodes as media- structured output prediction. This is particularly
tors of the causal effect from the input to the out- challenging for distributional criteria like f (X) ⊥⊥
put (Vig et al., 2020; Finlayson et al., 2021). By Z | Y , because f (X) and Y may now represent
querying models using manually-crafted counter- sequences of vectors or tokens. In such cases it
factuals, we can observe how information flows, may be preferable to focus on invariance criteria
and identify where in the model it is encoded. that apply to the loss distribution or calibration.

4.4 Future work 5 Conclusion


In general we cannot expect to have full causal Our main goal in this survey was to collect the var-
models of text, so a critical question for future ious touchpoints of causality and NLP into one
work is how to safely use partial causal models, space, which we then subdivided into the prob-
which omit some causal variables and do not com- lems of estimating the magnitude of causal effects.
pletely specify the causal relationships within the and more traditional NLP tasks. These branches of
text itself. A particular concern is unobserved con- scientific inquiry share common goals, intuitions,
founding between the variables that are explicitly and are beginning to show methodological syner-
5 Conclusion

Our main goal in this survey was to collect the various touchpoints of causality and NLP into one space, which we then subdivided into the problems of estimating the magnitude of causal effects and more traditional NLP tasks. These branches of scientific inquiry share common goals and intuitions, and are beginning to show methodological synergies. In § 3 we showed how recent advances in NLP modeling can help researchers draw causal conclusions from text data, and discussed the challenges of this process. In § 4, we showed how ideas from causal inference can be used to make NLP models more robust, trustworthy and transparent. We also gathered approaches that are implicitly causal and explicitly showed their relationship to causal inference. Both of these spaces, especially the use of causal ideas for robust and explainable predictions, remain nascent, with a large number of open challenges which we have detailed throughout this paper.

A particular advantage of causal methodology is that it forces practitioners to explicate their assumptions. To improve scientific standards, we believe that the NLP community should be clearer about these assumptions and analyze their data using causal reasoning. This could lead to a better understanding of language and the models we build to process it.
References

Robert Adragna, Elliot Creager, David Madras, and Richard Zemel. 2020. Fairness and robustness in invariant learning: A case study in toxicity classification. arXiv preprint arXiv:2011.06485.

Maria Antoniak and David Mimno. 2021. Bad seeds: Evaluating lexical methods for bias measurement. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1889–1904, Online. Association for Computational Linguistics.

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org. http://www.fairmlbook.org.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79(1):151–175.

David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.

Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476.

Nitay Calderon, Eyal Ben-David, Amir Feder, and Roi Reichart. 2022. DoCoGen: Domain counterfactual generation for low resource domain adaptation. In Proceedings of the 60th Annual Meeting of the Association of Computational Linguistics (ACL).

Yining Chen, Colin Wei, Ananya Kumar, and Tengyu Ma. 2020. Self-training avoids using spurious features under domain shift. Advances in Neural Information Processing Systems, 33:21061–21071.

Alexander D’Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. 2020. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36.
Naoki Egami, Christian J Fong, Justin Grimmer, Margaret E Roberts, and Brandon M Stewart. 2018. How to make causal inferences using texts. arXiv preprint arXiv:1802.02163.

Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg. 2021. Amnesic probing: Behavioral explanation with amnesic counterfactuals. Transactions of the Association for Computational Linguistics, 9:160–175.

Amir Feder, Nadav Oved, Uri Shalit, and Roi Reichart. 2021. CausaLM: Causal model explanation through counterfactual language models. Computational Linguistics, 47(2):333–386.

Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, and Yonatan Belinkov. 2021. Causal analysis of syntactic agreement mechanisms in neural language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1828–1843, Online. Association for Computational Linguistics.

Christian Fong and Justin Grimmer. 2016. Discovery of treatments from text corpora. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1600–1609.

Christian Fong and Justin Grimmer. 2021. Causal inference with latent treatments. American Journal of Political Science. Forthcoming.

Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hannaneh Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, and Ben Zhou. 2020. Evaluating models’ local decision boundaries via contrast sets. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1307–1323, Online. Association for Computational Linguistics.

Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H Chi, and Alex Beutel. 2019. Counterfactual fairness in text classification through robustness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 219–226.

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. 2021. Causal abstractions of neural networks. Advances in Neural Information Processing Systems, 34.

Alan S Gerber, Donald P Green, and Christopher W Larimer. 2008. Social pressure and voter turnout: Evidence from a large-scale field experiment. American Political Science Review, 102(1):33–48.

Muhammad Ghifary, W Bastiaan Kleijn, Mengjie Zhang, and David Balduzzi. 2015. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the IEEE International Conference on Computer Vision, pages 2551–2559.

Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sánchez, Mugdha Pandya, and Adam Lopez. 2021. Intrinsic bias metrics do not correlate with application bias. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1926–1940, Online. Association for Computational Linguistics.

Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42.

Ishaan Gulrajani and David Lopez-Paz. 2020. In search of lost domain generalization. arXiv preprint arXiv:2007.01434.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R Bowman, and Noah A Smith. 2018. Annotation artifacts in natural language inference data. Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).

Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 501–512.
Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29:3315–3323.

Miguel A Hernán. 2016. Does water kill? A call for less casual causal inferences. Annals of Epidemiology, 26(10):674–680.

Paul W Holland. 1986. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960.

Zhiting Hu and Li Erran Li. 2021. A causal lens for controllable text generation. Advances in Neural Information Processing Systems, 34.

Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, and Pushmeet Kohli. 2019. Reducing sentiment bias in language models via counterfactual evaluation. arXiv preprint arXiv:1911.03064.

Guido W Imbens and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.

Alon Jacovi and Yoav Goldberg. 2020. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4198–4205.

Alon Jacovi, Ana Marasović, Tim Miller, and Yoav Goldberg. 2021. Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 624–635.

Sarthak Jain and Byron C Wallace. 2019. Attention is not explanation. arXiv preprint arXiv:1902.10186.

Rohan Jha, Charles Lovering, and Ellie Pavlick. 2020. Does data augmentation improve generalization in NLP? arXiv preprint arXiv:2004.15012.

Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online harassment and content moderation: The case of blocklists. ACM Transactions on Computer-Human Interaction (TOCHI), 25(2):1–33.

Zhijing Jin, Julius von Kügelgen, Jingwei Ni, Tejas Vaidhya, Ayush Kaushal, Mrinmaya Sachan, and Bernhard Schoelkopf. 2021. Causal direction of data collection matters: Implications of causal and anticausal learning for NLP. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9499–9513.

Nitish Joshi and He He. 2021. An investigation of the (in)effectiveness of counterfactually augmented data. arXiv preprint arXiv:2107.00753.

Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. 2021. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 353–362.

Divyansh Kaushik, Eduard Hovy, and Zachary C Lipton. 2019. Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434.

Divyansh Kaushik, Amrith Setlur, Eduard Hovy, and Zachary C Lipton. 2020. Explaining the efficacy of counterfactually-augmented data. arXiv preprint arXiv:2010.02114.

Katherine Keith, David Jensen, and Brendan O’Connor. 2020. Text and causal inference: A review of using text to remove confounding from causal estimates. In ACL.

Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339.

Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. Avoiding discrimination through causal reasoning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 656–666.
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677.

Issa Kohler-Hausmann. 2018. Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination. Nw. UL Rev., 113:1163.

Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning, pages 1188–1196. PMLR.

Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-Yi Wang. 2015. Representation learning using multi-task deep neural networks for semantic classification and information retrieval. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 912–921.

Yang Liu and Mirella Lapata. 2018. Learning structured text representations. Transactions of the Association for Computational Linguistics, 6:63–75.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Lajanugen Logeswaran, Honglak Lee, and Samy Bengio. 2018. Content preserving text generation with attribute controls. Advances in Neural Information Processing Systems, 31.

Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, and Sameer Singh. 2021. Entity-based knowledge conflicts in question answering. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7052–7063.

Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.

Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It’s all in the name: Mitigating gender bias with name-based counterfactual data substitution. arXiv preprint arXiv:1909.00871.

David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 152–159. Citeseer.

R Thomas McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv preprint arXiv:1902.01007.

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual knowledge in GPT. arXiv preprint arXiv:2202.05262.

Stephen L Morgan and Christopher Winship. 2015. Counterfactuals and Causal Inference. Cambridge University Press.

Ramaravind K Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 607–617.

Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, and L Jason Anastasopoulos. 2020. Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality. Political Analysis, 28(4):445–468.

Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. 2013. Domain generalization via invariant feature representation. In International Conference on Machine Learning, pages 10–18.

Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress test evaluation for natural language inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Tong Niu and Mohit Bansal. 2018. Polite dialogue generation without parallel data. Transactions of the Association for Computational Linguistics, 6:373–389.

Yaakov Ophir, Refael Tikochinski, Christa SC Asterhan, Itay Sisso, and Roi Reichart. 2020. Deep neural networks detect suicide risk from textual Facebook posts. Scientific Reports, 10(1):1–10.

Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, and Massimo Poesio. 2018. Comparing Bayesian models of annotation. Transactions of the Association for Computational Linguistics, 6:571–585.

Ellie Pavlick and Tom Kwiatkowski. 2019. Inherent disagreements in human textual inferences. Transactions of the Association for Computational Linguistics, 7:677–694.

Judea Pearl. 1994. A probabilistic calculus of actions. In Uncertainty Proceedings 1994, pages 454–462. Elsevier.

Judea Pearl. 2009. Causality. Cambridge University Press.

J Peters, P Bühlmann, and N Meinshausen. 2016. Causal inference using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5):947–1012.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), pages 2227–2237. Association for Computational Linguistics.

Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018. Hypothesis only baselines in natural language inference. arXiv preprint arXiv:1805.01042.

Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, and Dhanya Sridhar. 2021. Causal effects of linguistic properties. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4095–4109.

Reid Pryzant, Youngjoo Chung, and Dan Jurafsky. 2017. Predicting sales from the language of product descriptions. In eCOM@SIGIR.

Reid Pryzant, Kelly Shen, Dan Jurafsky, and Stefan Wagner. 2018. Deconfounded lexicon induction for interpretable social science. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1615–1625.

Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, and Yoav Goldberg. 2020. Null it out: Guarding protected attributes by iterative nullspace projection. arXiv preprint arXiv:2004.07667.

Shauli Ravfogel, Grusha Prasad, Tal Linzen, and Yoav Goldberg. 2021. Counterfactual interventions reveal the causal effect of relative clause representations on agreement prediction. arXiv preprint arXiv:2105.06965.

Roi Reichart and Ari Rappoport. 2007. Self-training for enhancement and domain adaptation of statistical parsers trained on small datasets. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 616–623.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM.

Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. 2020. Beyond accuracy: Behavioral testing of NLP models with CheckList. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4902–4912, Online. Association for Computational Linguistics.
Parker Riley, Noah Constant, Mandy Guo, Girish Kumar, David Uthus, and Zarana Parekh. 2020. TextSETTR: Label-free text style extraction and tunable targeted restyling. arXiv preprint arXiv:2010.03802.

Margaret E Roberts, Brandon M Stewart, and Richard A Nielsen. 2020. Adjusting for confounding with text matching. American Journal of Political Science, 64(4):887–903.

Margaret E Roberts, Brandon M Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G Rand. 2014. Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4):1064–1082.

Paul R Rosenbaum. 2007. Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477):191–200.

Daniel Rosenberg, Itai Gat, Amir Feder, and Roi Reichart. 2021. Are VQA systems RAD? Measuring robustness to augmented data with focused interventions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 61–70.

Elan Rosenfeld, Pradeep Ravikumar, and Andrej Risteski. 2021. The risks of invariant risk minimization. In International Conference on Learning Representations, volume 9.

Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E Peters, and Matt Gardner. 2021. Tailor: Generating and perturbing text with semantic controls. arXiv preprint arXiv:2107.07150.

Donald B Rubin. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688.

Donald B Rubin. 2005. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331.

Jennifer D Rubin, Lindsay Blackwell, and Terri D Conley. 2020. Fragile masculinity: Men, gender, and online harassment. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–14.

B Schölkopf, D Janzing, J Peters, E Sgouritsa, K Zhang, and J Mooij. 2012. On causal and anticausal learning. In 29th International Conference on Machine Learning (ICML 2012), pages 1255–1262. International Machine Learning Society.

Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurélie Herbelot, Moin Nabi, Enver Sangineto, and Raffaella Bernardi. 2017. FOIL it! Find one mismatch between image and language caption. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 255–265, Vancouver, Canada. Association for Computational Linguistics.

Anders Søgaard. 2013. Semi-supervised learning and domain adaptation in natural language processing. Synthesis Lectures on Human Language Technologies, 6(2):1–103.

Dhanya Sridhar and Lise Getoor. 2019. Estimating causal effects of tone in online debates. In International Joint Conference on Artificial Intelligence.

Mark Steedman, Miles Osborne, Anoop Sarkar, Stephen Clark, Rebecca Hwa, Julia Hockenmaier, Paul Ruhlen, Steven Baker, and Jeremiah Crim. 2003. Bootstrapping statistical parsers from small datasets. In 10th Conference of the European Chapter of the Association for Computational Linguistics.

Eric J Tchetgen Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, and Wang Miao. 2020. An introduction to proximal causal learning. arXiv preprint arXiv:2009.10982.

Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 327–335, Sydney, Australia. Association for Computational Linguistics.
Victor Veitch, Alexander D’Amour, Steve Yadlowsky, and Jacob Eisenstein. 2021. Counterfactual invariance to spurious correlations: Why and how to pass stress tests. arXiv preprint arXiv:2106.00545.

Victor Veitch, Dhanya Sridhar, and David M Blei. 2020. Adapting text embeddings for causal inference. In UAI.

Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart M. Shieber. 2020. Investigating gender bias in language models using causal mediation analysis. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.

Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31:841.

Stefan Wager and Susan Athey. 2018. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242.

Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. 2021. On calibration and out-of-domain generalization. arXiv preprint arXiv:2102.10395.

Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606–615, Austin, Texas. Association for Computational Linguistics.

Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, and Tim Althoff. 2022. Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference. In ICWSM.

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2018. Challenges of using text classifiers for causal inference. In EMNLP.

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2021. Generating synthetic text data to evaluate causal inference methods. arXiv preprint arXiv:2102.05638.

Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel S Weld. 2021. Polyjuice: Automated, general-purpose counterfactual generation. arXiv preprint arXiv:2101.00288.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057. PMLR.

Justine Zhang, Sendhil Mullainathan, and Cristian Danescu-Niculescu-Mizil. 2020. Quantifying the causal effects of conversational tendencies. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2):1–24.

Yi-Fan Zhang, Hanlin Zhang, Zachary C Lipton, Li Erran Li, and Eric P Xing. 2022. Can transformers be strong treatment effect estimators? arXiv preprint arXiv:2202.01336.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2979–2989, Copenhagen, Denmark. Association for Computational Linguistics.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20, New Orleans, Louisiana. Association for Computational Linguistics.

Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach, and Ryan Cotterell. 2019. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, Italy. Association for Computational Linguistics.
