Prospect Theory
May, 2019
Contents

1 Introduction
3 Prospect Theory
  3.1 Assumptions
  3.2 Treatment Assignment Mechanism
5 Bayesian Inference
  5.1 Rubin Causal Model
  5.2 Posterior predictive distribution of missing potential outcomes
  5.3 Posterior distribution of causal effects
  5.4 Model-based inference
7 Conclusion
8 Future Work
A Appendix
  A.1 Summary Statistics of Causal Estimands
  A.2 Pair plots of parameters
  A.3 MATLABStan Code
References
1 Introduction
Expected Utility Theory has drawn growing criticism. Some argue that the underlying assumptions of the theory do not hold, some show through experimental or empirical tests that its predictions do not match observed behavior, and some combine both lines of criticism [1]. Numerous alternative theories have been proposed. Among them, the most influential is Prospect Theory, proposed by Daniel Kahneman and Amos Tversky in 1979. Prospect Theory, a corrective critique of Expected Utility Theory, successfully explained many previously unexplained anomalies in economics and had a large impact on later behavioral finance research. Behavioral finance attempts to change the standard analytical paradigm, based for decades on the Efficient Market Hypothesis, by analyzing market behavior from the perspective of human beings [2].
The main goal of this report is to discuss the improvements that Prospect Theory offers over Expected Utility Theory (EUT). In Section 2, we briefly review EUT and discuss its limitations. In Section 3, Prospect Theory is introduced in detail. In Section 4, we focus on the improvements of Prospect Theory. In Section 5, we discuss some limitations and extensions of Prospect Theory.
investors are risk averse (i.e., the utility function u is concave) in most cases. One of the basic assumptions of Expected Utility Theory is that each individual is rational. However, is this really true?
2.1 Subjective Probability Biases
In 1974, Kahneman and Tversky first questioned the rationality assumption from a psychological perspective. They argue that investors are subject to the Representativeness bias, the Availability bias, the Anchoring Effect, and other biases [3].
(1) Representativeness Bias
Kahneman and Tversky (1974) state that people tend to categorize events according to traditional or similar circumstances, and to overtrust the possibility of historical repetition when assessing probabilities [3]. De Bondt and Thaler (1985) further argue that investors are overly pessimistic (optimistic) about the past losers (winners) of the stock market, resulting in large differences between stock prices and fundamental prices [4].
(2) Availability Bias
Kahneman and Tversky (1974) state that events which come to mind easily can lead to the misconception that they occur frequently. They believe that, under the availability bias, people tend to overestimate the probability of such events [3].
(3) Anchoring Effect
Kahneman and Tversky (1974) state that the final answer people propose is significantly influenced by the initial values they see. For example, high school students were divided into two groups and given five minutes to estimate the product 1 × 2 × 3 × 4 × 5 × 6 × 7 × 8. Group 1 saw the factors in ascending order and Group 2 in descending order. The results were as follows:

Group   Sequence presented              Estimate
1       1 × 2 × 3 × 4 × 5 × 6 × 7 × 8   512
2       8 × 7 × 6 × 5 × 4 × 3 × 2 × 1   2250

Group 1 gave much smaller estimates overall, anchored by the small initial value 1, and vice versa for Group 2 [3].
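As a quick arithmetic check (an illustrative sketch, not part of the original study's materials), the true value of the product dwarfs both groups' estimates:

```python
import math

# True value of 1 x 2 x ... x 8, i.e. 8!.
true_product = math.factorial(8)

# Estimates reported for the ascending and descending groups.
ascending_estimate = 512
descending_estimate = 2250

# Both groups underestimate badly, but the group anchored on the
# larger initial value (8) gives the larger estimate.
print(true_product)  # 40320
```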
Kahneman and Tversky identified the Certainty Effect, Reflection Effect and Isolation Effect in 1979 [6]. In this section, we introduce each effect in detail, based on their survey data.
§ Certainty Effect
Table 1: Problem 1
Table 1 shows the uncertain outcomes with their probabilities and people's preferences. In each test, respondents are required to choose one prospect. Without loss of generality, we define u(0) = 0. According to Expected Utility Theory, Test 1 yields u(2400) > 0.33u(2500) + 0.66u(2400), or equivalently 0.34u(2400) > 0.33u(2500). However, Test 2 yields 0.33u(2500) > 0.34u(2400), which contradicts the first inequality. In summary, when an outcome changes from a certain gain to a merely probable gain, its desirability is reduced [6].
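A minimal sketch of the contradiction, assuming the prospect specifications of Kahneman and Tversky's original Problem 1 (the extracted Table 1 is not legible here, so the exact prospects are an assumption):

```python
# Prospect specifications assumed from Kahneman and Tversky's Problem 1
# (Test 1: A vs B; Test 2: C vs D), as (outcome, probability) pairs.
A = [(2500, 0.33), (2400, 0.66), (0, 0.01)]
B = [(2400, 1.00)]
C = [(2500, 0.33), (0, 0.67)]
D = [(2400, 0.34), (0, 0.66)]

def expected_value(prospect):
    """Expected monetary value of a prospect."""
    return sum(x * p for x, p in prospect)

# Expected values favour A in Test 1 (~2409 vs 2400) and C in Test 2
# (~825 vs ~816), yet majorities chose B and then C. Under EUT with
# u(0) = 0, choosing B implies 0.34*u(2400) > 0.33*u(2500), while
# choosing C implies the reverse inequality -- a contradiction.
for name, prospect in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, expected_value(prospect))
```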
Table 2: Problem 2
Test     Prospect                     Preference
Test 3   A: (−4000, 0.80; 0, 0.20)    92%
         B: (−3000, 1.00)             8%
Test 4   C: (−4000, 0.20; 0, 0.80)    42%
         D: (−3000, 0.25; 0, 0.75)    58%

Table 3: Problem 3
Comparing Table 2 with Table 3, we can conclude that decision makers exhibit two opposite behavioral preferences under uncertain gains and losses: when facing gains, investors are risk averse, but when facing losses, investors are risk seeking. This also violates the Expected Utility Theory assumption of an everywhere-concave utility function.
§ Isolation Effect
Kahneman and Tversky argue that the inconsistency of preferences may be attributed to the way individuals decompose prospects. When the decomposition is not unique, the preference will vary [6]. Consider the following two-stage game: in the first stage, each individual has a 0.75 chance of being eliminated and a 0.25 chance of moving to the next stage. In stage 2, the individual faces two choices: (1) (4000, 0.80) and (2) (3000, 1.00). From the perspective of the entire game (the results are shown in Table 4), the outcome 4000 occurs with probability 0.20 (0.25 × 0.80). The individuals' preference conflicts with Test 4 in Table 2. This is because, when facing a multi-stage problem, individuals tend to ignore the first stage and focus only on the second; applying the certainty effect there, they tend to choose B. Kahneman and Tversky conclude that investors do not choose between prospects based solely on the probabilities of the final states, which again violates the assumptions of Expected Utility Theory [6].
Table 4: Problem 4
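The compound probabilities in the two-stage game can be checked directly; a small sketch:

```python
# Compound probabilities for the two-stage game: a 0.25 chance of
# reaching stage 2, then either (4000, 0.80) or (3000, 1.00).
p_advance = 0.25

p_win_4000 = p_advance * 0.80   # 0.20
p_win_3000 = p_advance * 1.00   # 0.25

# In terms of final outcomes the game is (4000, 0.20) vs (3000, 0.25),
# i.e. exactly Test 4 in Table 2 -- yet subjects choose as if only the
# second stage existed, preferring the certain 3000.
print(p_win_4000, p_win_3000)
```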
Consider another game, in which groups 1 and 2 are initially given 1000 and 2000 respectively. The choices they face and the observed preferences are shown in Table 5. From the perspective of final wealth, A = (2000, 0.5; 1000, 0.5) = C and B = (1500) = D. Again, the investors' choices are inconsistent. Investors seem to ignore the initial bonus, because it is common to both options within each problem. Thus, we may propose that the change in wealth, rather than the final wealth state, is the core quantity in estimating utility [6].
Table 5: Problem 5
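The stated final-wealth equivalence can be verified with a short sketch; the stage-level prospects used below are a hypothetical reconstruction inferred from that equivalence, since Table 5 did not survive extraction:

```python
# Stage-level prospects inferred from the final-wealth equivalence
# A = (2000, 0.5; 1000, 0.5) = C and B = (1500) = D (hypothetical
# reconstruction; Table 5 is not reproduced here).
def final_wealth(bonus, prospect):
    """Final-wealth distribution: shift each outcome by the initial bonus."""
    return sorted((bonus + x, p) for x, p in prospect)

A = final_wealth(1000, [(1000, 0.5), (0, 0.5)])   # group 1, risky choice
B = final_wealth(1000, [(500, 1.0)])              # group 1, certain choice
C = final_wealth(2000, [(-1000, 0.5), (0, 0.5)])  # group 2, risky choice
D = final_wealth(2000, [(-500, 1.0)])             # group 2, certain choice

print(A == C)  # True: both equal (2000, 0.5; 1000, 0.5)
print(B == D)  # True: both equal a certain 1500
```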
3 Prospect Theory
Under the assumptions of traditional financial theory, investors maximize their expected utility based on terminal wealth levels and their corresponding probabilities. These decision-making processes are built on the Efficient Market Hypothesis, in which every investor is rational and has the same level of market information. However, when facing real choices in financial markets, investors' preferences are influenced by various factors, such as their economic situation, knowledge level, and psychological disposition. Expected Utility Theory therefore proves insufficient to explain preferences in real-world decisions. Kahneman and Tversky proposed Prospect Theory in 1979, which revised the traditional decision-making framework and emphasized that investors are affected by many factors in an uncertain environment.
Prospect Theory deals with simple decision problems with monetary outcomes and stated probabilities. The theory divides the decision-making process into two stages. The first stage is editing, a preliminary analysis of the outcomes and probabilities that transforms them into edited prospects; this stage is sometimes accompanied by a simplification of the prospects. The second stage is evaluation, in which investors choose the optimal prospect by evaluating the edited prospects [6]. Next, we explain each stage in detail.
Notationally, let W be the variable indicating the treatment level that each of the N units may receive: W_i = 1 for the N_t treated units and W_i = 0 for the (N − N_t) control units. Let Y_i(1) denote the potential outcome of unit i under treatment, and Y_i(0) its potential outcome under control. In the example described above, Y_i(0) and Y_i(1) represent the pain level of a headache. The individual treatment effect for any unit i is the difference of the two potential outcomes, Y_i(1) − Y_i(0). The fundamental problem of causal inference [7] faced by statisticians is that only one potential outcome can be observed, the other being missing for each unit; the observed outcome can be described by the expression

Y_i^obs = W_i Y_i(1) + (1 − W_i) Y_i(0).
The data tabulated below illustrate the fundamental problem of causal inference that we face:

Unit            W    X (age)   Y(0)    Y(1)   Individual causal effect
1               0    8         5       3*     −2*
2               1    15        7*      9      +2*
3               0    26        7       5*     −2*
4               1    35        8*      9      +1*
5               0    42        5       4*     −1*
True averages                  6.4*    6*     −0.4*
Observed                       5.7     9

where values marked with * are unobserved, and X denotes a covariate that may affect the effect of the treatment on the outcome (e.g., the age of a person in the study of the effect of aspirin). The array of values of X, Y(0) and Y(1) represents the "science" that we want to learn, which does not change according to how we assign the treatment to the units.
However, in the real situation we would always observe a table as follows:
Unit W Y(0) Y(1) Individual causal effect
1 0 5 ? ?
2 1 ? 9 ?
3 0 7 ? ?
4 1 ? 9 ?
5 0 5 ? ?
True Averages ? ? ?
Observed 5.7 9
Therefore, the Rubin causal model reduces the potential-outcome framework to a missing data problem. From the table above, by looking only at the observed values one might conclude that taking aspirin is bad for reducing headache pain (i.e., Ȳ(1) = 9 > Ȳ(0) = 5.7). But we "know" from the science that the treatment on average has a positive impact on patients' condition. This motivates us to develop an imputation method to fill in the unobserved values; later, in Section 5, we discuss how a Bayesian approach is implemented for this problem.
Identifying the individual causal effect is not possible in general, since a piece of information is always missing. Hence we turn our attention to an aggregated causal effect, the average treatment effect (ATE) over a population of individuals, which can be formulated as

τ = (1/N) Σ_{i=1}^{N} [Y_i(1) − Y_i(0)].
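Using the five-unit aspirin table above, a short sketch contrasts the true finite-sample ATE with the naive difference of observed means:

```python
# Data from the "science" table above (5 units; starred values are the
# unobserved potential outcomes).
w  = [0, 1, 0, 1, 0]
y0 = [5, 7, 7, 8, 5]
y1 = [3, 9, 5, 9, 4]

# True finite-sample ATE, averaging every unit's causal effect:
true_ate = sum(b - a for a, b in zip(y0, y1)) / len(w)   # -0.4

# Naive observed contrast, using only the outcomes we would see:
obs_treated = [y1[i] for i in range(5) if w[i] == 1]
obs_control = [y0[i] for i in range(5) if w[i] == 0]
naive = (sum(obs_treated) / len(obs_treated)
         - sum(obs_control) / len(obs_control))

print(true_ate)  # -0.4: the treatment actually helps on average
print(naive)     # ~ +3.33: the misleading observed difference
```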
3.1 Assumptions
There are two major assumptions [8], often adopted in causal analysis, that need to be made.
Assumption 1 implies that neither Y(0) nor Y(1) of a unit can be affected by the treatments that other units receive. Assumption 2 states that the "science" is not affected by how we learn it, whether through randomized experiments or observational studies. Together, these two assumptions constitute a critical assumption called the "Stable Unit Treatment Value Assumption" (SUTVA) [8].
Under SUTVA, estimation of the causal effect can be based on the matrix of science

(X, Y(0), Y(1)) = [ X_1 Y_1(0) Y_1(1) ; … ; X_i Y_i(0) Y_i(1) ; … ; X_N Y_N(0) Y_N(1) ]    (4)
Since the labelling of the N-row array is a random permutation, every row is exchangeable with the others. To evaluate the causal effect, we must have some units with Y(1) observed and some units with Y(0) observed. This motivates the study of a probabilistic mechanism that determines which units receive treatment and which receive control.
Without any knowledge of how the missing data are created, it is generally impossible to infer anything meaningful from them. In a completely randomized experiment, the treatment assignment is often seen as unconfounded with the potential outcomes. Two key assumptions needed in a completely randomized experiment are "overlap" and "unconfoundedness" [9]:

Assumption 3 (Overlap) Every unit has a strictly positive probability of receiving either treatment level:

0 < Pr(W_i = 1 | X_i) < 1.
Assumption 4 (Unconfoundedness) The assignment mechanism is independent of all potential outcomes, both observed and unobserved. Hence it implies

(Y_i(1), Y_i(0)) ⊥ W_i | X_i

where the conditional-independence notation A ⊥ B | C, first used by Dawid [10], denotes that A and B are independent conditional on C. The combination of these two assumptions constitutes strong ignorability. Given this, the assignment mechanism can be considered ignorable when imputing the missing potential outcomes.
units that receive treatment and control:
Since the pair of potential outcomes for the same unit is never observed simultaneously, it is generally impossible to identify S_τ² from the observed data; we therefore need an estimator for the sampling variance given by Equation 10. If the treatment effects are constant and additive across the population (Y_i(1) − Y_i(0) = τ_fs), the third component of Equation 10 vanishes, and the variance can be approximated by an unbiased estimator [9] as follows:

s_t² = 1/(N_t − 1) · Σ_{i: W_i = 1} (Y_i^obs − Ȳ_1^obs)²    (16)
Hence, the confidence intervals generated with Neyman's estimator will have coverage larger than their nominal level in large samples. Neyman's work, reviewed in Statistical Science (1990) [12], formulated the following as a confidence interval for Ȳ_1 − Ȳ_0 with frequentist coverage probabilities:

Ȳ_1^obs − Ȳ_0^obs ± 1.96 √(V_neyman)    (18)
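A sketch of Neyman's point estimate, the conservative variance estimator, and the 95% interval, applied to the observed outcomes of the five-unit illustration (far too small a sample for the asymptotics to be meaningful; the arithmetic is the point):

```python
import math

# Observed outcomes from the 5-unit illustration earlier.
y_obs_t = [9, 9]        # treated units
y_obs_c = [5, 7, 5]     # control units

def mean(v):
    return sum(v) / len(v)

def sample_var(v):
    """Unbiased sample variance (divisor n - 1)."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

tau_hat = mean(y_obs_t) - mean(y_obs_c)               # point estimate
v_neyman = (sample_var(y_obs_t) / len(y_obs_t)
            + sample_var(y_obs_c) / len(y_obs_c))     # conservative variance
ci = (tau_hat - 1.96 * math.sqrt(v_neyman),
      tau_hat + 1.96 * math.sqrt(v_neyman))           # 95% interval
print(tau_hat, ci)
```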
The asymptotic confidence interval above ignores the effect of covariates, and the interval tends to be too conservative under rerandomization, an assignment mechanism that checks covariate balance at the time of randomization. Morgan and Rubin [13] suggest that more accurate results can be obtained using regression adjustment methods, since these better capture the true sampling variance of the treatment effect. As a result of simple random sampling, the super-population ATE, τ_sp, is equal to the expected value of the finite-sample ATE, τ_fs, and the estimator V_sp for the super-population sampling variance is identical to the conservative Neyman estimator [9], V_neyman.
It is worth mentioning Fisher's sharp null hypothesis [14], under which Fisher used the randomization distribution to attempt to reject the hypothesis

H₀: Y_i(1) = Y_i(0) for all units i.

It is called "sharp" because under this null of no treatment effect for any unit, all potential outcomes are known. Under such a hypothesis, the constant-additive-treatment-effect assumption behind Neyman's estimator is satisfied, and its estimate of the sampling variance is unbiased. However, neither Fisher's nor Neyman's approach addresses the question of which interventions should be applied to unseen units.
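Under the sharp null all potential outcomes are known, so the randomization distribution of the difference in means can be enumerated exactly; a sketch on the five-unit illustration:

```python
import itertools

# Fisher randomization test on the 5-unit example. Under the sharp
# null Y_i(1) = Y_i(0), every outcome is fixed, so we recompute the
# difference in means under all assignments of 2 treated units of 5.
y_obs = [5, 9, 7, 9, 5]                       # observed outcomes
observed_diff = (9 + 9) / 2 - (5 + 7 + 5) / 3

diffs = []
for treated in itertools.combinations(range(5), 2):
    t = [y_obs[i] for i in treated]
    c = [y_obs[i] for i in range(5) if i not in treated]
    diffs.append(sum(t) / 2 - sum(c) / 3)

# Two-sided p-value: share of assignments at least as extreme.
p_value = sum(abs(d) >= abs(observed_diff) - 1e-9 for d in diffs) / len(diffs)
print(p_value)  # 0.2 with these 10 possible assignments
```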
In the upcoming section, we will introduce a model-based approach which is more
flexible compared to Neyman repeated sampling approach.
5 Bayesian Inference
5.1 Rubin Causal Model
A framework that defines causal effects through potential outcomes with an explicit probabilistic assignment mechanism is called the "Rubin Causal Model", a name first given by Holland [15]. The model views all statistical inference problems for causal effects as missing data problems in which the missing data are created via the probabilistic assignment mechanism [16].
The essence of the Rubin causal model is that it cleanly separates the true underlying model of the science, Pr(X, Y(0), Y(1)), from the assignment mechanism, Pr(W | X, Y(0), Y(1)). Combining them, the model specifies a joint distribution for all quantities, Pr(W, X, Y(0), Y(1)), which allows simulation-based computational methods (e.g., Markov chain Monte Carlo) to be implemented. By exchangeability of the units, the joint distribution of the science can be written as

Pr(X, Y(0), Y(1)) = ∫ ∏_{i=1}^{N} Pr(X_i, Y_i(0), Y_i(1) | θ) Pr(θ) dθ

where θ is a parameter with prior distribution Pr(θ). Furthermore, we can factor the conditional distribution of (X_i, Y_i(0), Y_i(1)) into

Pr(X_i, Y_i(0), Y_i(1) | θ) = Pr(Y_i(0), Y_i(1) | X_i, θ_{Y|X}) Pr(X_i | θ_X)

where θ_{Y|X} is the parameter of the conditional distribution Pr(Y_i(0), Y_i(1) | X_i), and θ_X is the parameter of the marginal distribution Pr(X_i). We assume θ_{Y|X} and θ_X are both functions of θ and a priori independent. Rubin [19] pointed out that Equation 23 creates a connection between the fundamental theory and the practice of applying independent and identical models.
6 Application in NSW example
6.1 The Lalonde NSW experimental job-training data
The data we analyze to illustrate our methods come from a randomized evaluation of the National Supported Work (NSW) project, a transitional, subsidized work-experience program for groups of people with longstanding employment problems [21]. This dataset has been extensively analyzed in the econometrics literature by Heckman and Hotz [22], Dehejia and Wahba [23], and Smith and Todd [24].
Table 6 summarizes the 12 variables collected from units that may or may not have been assigned to the job-training program. Of the N = 445 people, N_t = 185 were assigned to the treatment group and N_c = 260 to the control group. The treatment variable of interest is treat, an indicator for whether a unit was assigned to the job-training program. The outcome variable of interest is re78, which represents post-program labor-market experience: earnings in 1978. The other 10 variables, such as age, educ, and married, are covariates that may affect the outcome variable to varying degrees. Summary statistics for them are presented in the table below:
When applying the methodology described above to real-world data, the major problem researchers encounter is formalizing the dependence between the two potential outcomes, since both are never observed simultaneously. We therefore build several models for the joint distribution of the two potential outcomes and study the sensitivity of the results to the choice of model. For the first three models, we choose normal priors for all parameters of interest. It is worth noting that, with an ignorable treatment assignment, the inference is insensitive to the specification of the prior distribution [7] [25].
We begin with the simplest model, which assumes independence between the two potential outcomes and uses no covariates. One of our goals is to investigate how sensitive the results are to this independence assumption, so the second model takes the potential outcomes Y_i(0) and Y_i(1) to be perfectly correlated (i.e., ρ = 1) with equal variances; equivalently, the individual treatment effects are constant. The third model again assumes independence between the potential outcomes but includes covariates, allowing us to study whether adding pre-treatment covariates improves the imputation process. Finally, the fourth model combines the independence assumption, the presence of covariates, and an adjustment for a feature of the dataset; we compare its result with the one obtained via the Neyman repeated sampling approach.
In addition to the finite-sample ATE and the super-population ATE, we define three further finite-sample causal estimands, the quantile treatment effects (QTEs), at the 0.25, 0.50 and 0.75 quantiles. Each QTE is the difference between the pth quantiles of the two potential-outcome vectors, in which some values are observed and some imputed: τ_qte = Q_p(Y_i(1)) − Q_p(Y_i(0)). These estimands are generally used to evaluate the effect of the treatment at the top or bottom of the outcome distribution.
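Given completed (partly imputed) potential-outcome vectors, a QTE is just a difference of sample quantiles; a sketch with made-up arrays:

```python
# Completed potential-outcome vectors (illustrative; in practice the
# missing entries are multiply imputed).
y1 = [0, 2, 4, 6, 8, 10, 12, 14]  # outcomes under treatment
y0 = [0, 1, 2, 3, 4, 5, 6, 7]     # outcomes under control

def quantile(v, p):
    """Linear-interpolation sample quantile (numpy's default convention)."""
    s = sorted(v)
    h = (len(s) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

tau_qte25 = quantile(y1, 0.25) - quantile(y0, 0.25)  # 1.75
tau_qte50 = quantile(y1, 0.50) - quantile(y0, 0.50)  # 3.5
tau_qte75 = quantile(y1, 0.75) - quantile(y0, 0.75)  # 5.25

# The growing QTEs illustrate a treatment whose effect is larger at
# the top of the outcome distribution.
```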
We start by thinking about the joint distribution of Y_i(0) and Y_i(1) in the general case without covariates, defined as follows:

(Y_i(0), Y_i(1))ᵀ | θ ∼ Normal( (μ_c, μ_t)ᵀ , [ σ_c²  ρσ_cσ_t ; ρσ_tσ_c  σ_t² ] )    (24)
where μ_c and μ_t are the means of Y_i(0) and Y_i(1), and σ_c² and σ_t² are their variances. If we set μ_c = α, the mean of the potential outcomes under control, then μ_t = α + τ is the mean of the potential outcomes under treatment. Furthermore, we assign the same Gaussian prior, with zero mean and variance 100², to all parameters α, τ, σ_c², and σ_t². The standard deviation of the prior distribution (i.e., 100) has to be large enough to cover the scale of the outcome variable, which ranges from 0 to 60.3 (see Table 6).
We use a Stan program (see Appendix A.3), called from MATLAB, to sequentially draw the posterior predictive distribution of the missing potential outcomes and the posterior distribution of the parameters. For summary statistics of Model 1, see Table 9 in Appendix A.1.
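The core imputation step the Stan program performs can be sketched in isolation; the parameter values below are made up for illustration, and impute_y0 draws a missing control outcome from its conditional normal given the bivariate model of Equation 24:

```python
import math
import random

# Illustrative parameter values (made up; in the analysis these are
# posterior draws from the Stan program).
mu_c, mu_t = 4.5, 6.3        # means of Y(0) and Y(1)
sigma_c, sigma_t = 5.0, 7.0  # standard deviations
rho = 0.0                    # Model 1: independent potential outcomes

def impute_y0(y1_obs):
    """Draw the missing Y(0) of a treated unit from Y(0) | Y(1) = y1_obs."""
    cond_mean = mu_c + rho * (sigma_c / sigma_t) * (y1_obs - mu_t)
    cond_sd = sigma_c * math.sqrt(1 - rho ** 2)
    return random.gauss(cond_mean, cond_sd)

y0_draw = impute_y0(9.0)  # one imputation for a treated unit with Y(1) = 9
```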
All 5 causal estimands are presented as follows:
Figure 1: (a) finite-sample ATE vs super-population ATE; (b) quantile treatment effects for the 0.25, 0.50 and 0.75 quantiles
From Figure 1a, we observe that the posterior standard deviation of the super-population ATE is considerably greater than that of the finite-sample ATE. From Figure 1b, note that the posterior distribution of τ_0.25 does not look normal; this is because over 31% of the outcome variable, re78, is zero, which distorts the otherwise normal behavior. Figure 1b also sheds light on the heterogeneity of the treatment effect, i.e., how the treatment may affect different experimental subjects at different levels. For instance, being assigned to the job-training program provides greater benefits for subjects at the high end of the earnings distribution.
Figure 2: (a) Model 1: independence between the two potential outcomes; (b) Model 2: constant individual treatment effects
From Figure 2, one observes considerably large variation among the posterior means, with wide uncertainty intervals, in Model 1, whereas in Model 2 the posterior means are identical with minimal uncertainty intervals. This implies that the majority of the uncertainty arises from multiply imputing [20] the missing potential outcomes.
The table below provides statistics of the causal estimands of interest for Model 2, generated by the Stan program:
where se_mean is the Monte Carlo standard error, n_eff the effective sample size, and Rhat the R-hat convergence statistic.
Since no treatment-effect heterogeneity across units is assumed, all QTE estimands are equal, as expected. We also notice that the finite-sample ATE equals the super-population ATE in this case. This suggests that Model 2 not only provides a "conservative" estimate of the posterior variance of the finite-sample ATE, but also gives an unbiased estimate of the super-population ATE in this worst-case scenario.
where β_c and β_t are the vectors of slope coefficients for the units receiving control and treatment respectively. We define β_inter to be the difference between these two vectors (i.e., β_inter = β_t − β_c); it can be obtained by adding an interaction term between X and W. We assign these two parameters the same prior distribution, with mean 0 and variance 100, and again assume they are a priori independent of the other parameters:

β, β_inter ∼ Normal(0, 100)    (33)

The rest of the model-based procedure with covariates is similar to the one without covariates. The conditional distribution of the missing potential outcome for unit i, given the observed potential outcomes, the assignment vector, the covariates and the parameters, is derived as follows:
The following figure displays the posterior predictive distributions of the first 15 missing control outcomes; for summary statistics of the causal estimands of interest, see Table 10 in Appendix A.1.
(a) Model 1: without covariates (b) Model 3: with covariates
We observe that the distributions of the missing potential outcomes are almost identical in Model 1, while there is greater variation among the posterior means in Model 3. Adding pre-treatment covariates provides more information for predicting the missing data, and therefore improves the imputation of the missing potential outcomes.
However, all three models discussed above implicitly assume continuity of the distribution of the potential outcomes, while a high percentage of the observed outcomes (approximately 31%) are zero. The results obtained are therefore implausible. This motivates us to construct a model that fits a conditional distribution only to the strictly positive values of the potential outcomes.
First, we model the probability of a strictly positive value of Y_i(0) and Y_i(1) using logistic regression as follows:

Pr(Y_i(0) > 0 | X_i, W_i, θ) = exp(X_i γ_c) / (1 + exp(X_i γ_c))    (35)

Pr(Y_i(1) > 0 | X_i, W_i, θ) = exp(X_i γ_t) / (1 + exp(X_i γ_t))    (36)
where γ_c and γ_t denote the slope coefficients for the control and treated units respectively. As before, we define γ_inter as the difference between the two vectors (i.e., γ_inter = γ_t − γ_c). We assign these parameters t-distribution priors with 5 degrees of freedom and zero mean:
Second, we assume the logarithm of each potential outcome, conditional on the outcome being positive, is normally distributed:

The rest of the procedure for this model is similar to Model 3. (For the Stan program of this model, see Appendix A.3.)
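The hurdle part of the model, the probability that an outcome is strictly positive under logistic regression, can be sketched as follows; the covariate vector and coefficient values are hypothetical:

```python
import math

# Probability that an outcome is strictly positive under the logistic
# model of Equations 35-36. Covariate and coefficient values below are
# made up for illustration.
def prob_positive(x, gamma):
    """Pr(Y > 0 | x) = logistic(x . gamma)."""
    eta = sum(xi * gi for xi, gi in zip(x, gamma))
    return 1.0 / (1.0 + math.exp(-eta))

x = [1.0, 25.0, 12.0]            # e.g. intercept, age, years of education
gamma_c = [0.2, -0.01, 0.05]     # hypothetical control-group coefficients
print(prob_positive(x, gamma_c))
```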
Using Equations 9 and 14 of the Neyman repeated sampling approach, adjusted for the 10 covariates via linear regression, we can derive the point estimate and standard error of τ. The statistics of the Neyman and Bayesian approaches are summarized as follows:
Table 8: Point and interval estimates for the population average treatment effect τ
Model 1 and Model 3 give nearly the same standard deviation of the population ATE, indicating that the level of uncertainty is determined by the independence assumption between the two potential outcomes rather than by the presence of the covariates. The point estimates, 1.67 via Neyman's repeated sampling and 1.77, 1.55 and 1.60 via the Bayesian approach, are not small in magnitude relative to the mean and standard deviation of re78. We can therefore conclude that being assigned to the job-training program had a positive effect on people's earnings in 1978.
Comparing the interval estimates of τ from the Neyman and Bayesian approaches, we see only minor differences among the four intervals relative to the scale of re78. From the Neyman repeated sampling analysis, we are 95% confident that the super-population ATE of treat = 1 on re78 lies between 0.41 and 2.92. Similarly, the 95% credible interval for τ is 0.45 to 3.08 under Bayesian Model 1, 0.24 to 2.86 under Model 3, and 0.15 to 3.05 under Model 4.
In this example, one obvious advantage of the Neyman approach is that it avoids specifying a prior distribution, Pr(θ), for the parameters governing the joint distribution of the two potential outcomes. However, this comes at the price of relying heavily on large-sample approximations to justify the proposed frequentist confidence interval. It should also be noted that in large samples the presumed benefit of the Neyman approach vanishes, because the practical implications of the choice of prior are limited by the Bernstein-von Mises theorem [26].
7 Conclusion
In this paper, we have introduced the fundamentals of causal inference and described how the Rubin causal model turns the potential-outcome framework into a missing data problem. Based on the stability assumption, SUTVA, we also introduced the Neyman repeated sampling approach, a frequentist method that provides an unbiased point estimator of the ATE and constructs an interval estimator for it. Furthermore, we formulated four main steps, from a Bayesian perspective, for deriving the posterior predictive distributions of the missing potential outcomes and the causal estimands of interest.
Based on the theoretical results of Sections 3 and 4, we built four models for the NSW dataset, from the simplest to the most sophisticated, to assess how each factor affected the results. From the simulation results of Model 1, we observed that a substantial proportion of zeros can induce a non-normal posterior for the 0.25-quantile QTE. By investigating the conservative case in Model 2, where the potential outcomes are assumed perfectly correlated, we determined a bound for the super-population ATE. We proceeded by considering covariates in Model 3 and showed that adding pre-treatment covariates improves the imputation of the missing potential outcomes. However, all three of these models assume continuity of the distribution of potential outcomes, which contradicts a feature of the dataset. We therefore built the most sophisticated model, Model 4, by incorporating all of these considerations, and it provided the most plausible results among the four Bayesian models. We also found the Bayesian approach relatively more flexible than the frequentist (Neyman) approach, as it easily accommodates a wide variety of causal estimands such as QTEs.
8 Future Work
Many real-world causal inference problems can be tackled more flexibly with the Bayesian approach than with the frequentist approach. For instance, when a dataset contains unintended missing data (e.g., patient dropout), Bayesian techniques such as multiple imputation, or the Expectation-Maximization algorithm for imputing missing data, fit naturally with the Bayesian approach to causal inference ([27], Section 3 in [28]). However, the way the missing data are imputed may have a non-trivial impact on the causal analysis [29], and further study of this impact would be valuable.
Alternatively, one can explore the situation in which units (e.g., people) refuse to comply with the active treatment and take the control treatment instead. This complication, called noncompliance, is a very active research field for applications of Bayesian approaches to causal inference. In such cases, sensitivity to the prior assumptions can be severe; a possible direction of research is characterizing this sensitivity and developing prior restrictions.
A Appendix
A.1 Summary Statistics of Causal Estimands
A.2 Pair plots of parameters
Figure 6: MCMC Pairs of parameters for Model 3
A.3 MATLABStan Code
The following code builds the Stan programs for Model 1 and Model 4 of the NSW analysis. The code for Models 2 and 3 is omitted, since it is a straightforward variation.
The code for Model 1:
data {
  int<lower=0> N;          // size of sample
  vector[N] y;             // observed outcome
  vector[N] w;             // treatment assignment
}
parameters {
  real alpha;              // intercept
  real tau;                // super-population ATE
  real<lower=0> sigma_c;   // standard deviation for control units
  real<lower=0> sigma_t;   // standard deviation for treated units
}
model {
  // normal(0, 100) priors, as described in the text
  alpha ~ normal(0, 100);
  tau ~ normal(0, 100);
  sigma_c ~ normal(0, 100);
  sigma_t ~ normal(0, 100);
  // likelihood: treated units have mean alpha + tau and sd sigma_t
  y ~ normal(alpha + tau * w, sigma_c * (1 - w) + sigma_t * w);
}
generated quantities {
  real tau_fs;
  real tau_qte25;
  real tau_qte50;
  real tau_qte75;
  {
    real y0[N];
    real y1[N];
    real tau_unit[N];
    real mu_c = alpha;
    real mu_t = alpha + tau;
    // impute the missing potential outcome of each unit
    for (n in 1:N) {
      if (w[n] == 1) {
        y0[n] = normal_rng(mu_c, sigma_c);
        y1[n] = y[n];
      } else {
        y0[n] = y[n];
        y1[n] = normal_rng(mu_t, sigma_t);
      }
      tau_unit[n] = y1[n] - y0[n];
    }
    tau_fs = mean(tau_unit);
    tau_qte25 = quantile(to_vector(y1), 0.25) - quantile(to_vector(y0), 0.25);
    tau_qte50 = quantile(to_vector(y1), 0.50) - quantile(to_vector(y0), 0.50);
    tau_qte75 = quantile(to_vector(y1), 0.75) - quantile(to_vector(y0), 0.75);
  }
}
The code for Model 4:

data {
  int<lower=0> N;                    // size of sample
  int<lower=0> N_cov;                // number of covariates
  real y[N];                         // continuous potential outcome y
  int<lower=0,upper=1> y_pos[N];     // flag for y > 0
  int<lower=0,upper=1> z[N];         // treatment assignment
  real<lower=-1,upper=1> rho;        // correlation coefficient
  vector[N_cov] x[N];                // vector of covariates
  vector[N_cov] xz_inter[N];         // interaction terms
}
parameters {
  // Binary model
  real alpha_bin;                    // binary intercept
  row_vector[N_cov] beta_bin;        // coefficients for covariates
  row_vector[N_cov] beta_inter_bin;  // coefficients for interaction terms
  real tau_bin;                      // treatment effect
  // Continuous model
  real alpha_cont;                   // continuous intercept
  row_vector[N_cov] beta_cont;       // coefficients for covariates
  row_vector[N_cov] beta_inter_cont; // coefficients for interaction terms
  real tau_cont;                     // treatment effect for continuous part
  real<lower=0> sigma_c;             // standard deviation for control units
  real<lower=0> sigma_t;             // standard deviation for treated units
}
model {
  // Priors: t distributions with 5 degrees of freedom
  alpha_bin ~ student_t(5, 0, 2.5);
  beta_bin ~ student_t(5, 0, 2.5);
  beta_inter_bin ~ student_t(5, 0, 2.5);
  tau_bin ~ student_t(5, 0, 2.5);
  for (n in 1:N) {
    // binary model: probability of a positive outcome
    y_pos[n] ~ bernoulli_logit(alpha_bin + beta_bin * x[n] +
      beta_inter_bin * xz_inter[n] + tau_bin * z[n]);
    // continuous model: log outcome, fitted only to positive values
    if (y_pos[n] == 1)
      log(y[n]) ~ normal(alpha_cont + beta_cont * x[n] +
        beta_inter_cont * xz_inter[n] + tau_cont * z[n],
        sigma_t * z[n] + sigma_c * (1 - z[n]));
  }
}
generated quantities {
  real y0_bar;
  real y1_bar;
  real tau_samp;
  real tau_qte25;
  real tau_qte50;
  real tau_qte75;
  {
    real y0[N];
    real y1[N];
    real tau_ind[N];
    int y_pred_pos;
    for (n in 1:N) {
      // Predicted probability of a positive outcome
      real theta_c = inv_logit(alpha_bin + beta_bin * x[n]);
      real theta_t = inv_logit(alpha_bin + beta_bin * x[n] +
        beta_inter_bin * x[n] + tau_bin);
      // Log-scale means under control and treatment
      real mu_c = alpha_cont + beta_cont * x[n];
      real mu_t = alpha_cont + beta_cont * x[n] +
        beta_inter_cont * x[n] + tau_cont;
      if (z[n] == 1) {
        y1[n] = y[n];
        y_pred_pos = bernoulli_rng(theta_c);
        if (y_pred_pos == 0) {
          y0[n] = 0;
        } else {
          // Conditional lognormal draw given the observed treated outcome
          y0[n] = exp(normal_rng(mu_c +
            rho * (sigma_c / sigma_t) * (log(y[n]) - mu_t),
            sigma_c * sqrt(1 - rho^2)));
        }
      } else {
        y0[n] = y[n];
        y_pred_pos = bernoulli_rng(theta_t);
        if (y_pred_pos == 0) {
          y1[n] = 0;
        } else {
          // Conditional lognormal draw given the observed control outcome
          y1[n] = exp(normal_rng(mu_t +
            rho * (sigma_t / sigma_c) * (log(y[n]) - mu_c),
            sigma_t * sqrt(1 - rho^2)));
        }
      }
      tau_ind[n] = y1[n] - y0[n];
    }
    y0_bar = mean(to_vector(y0));
    y1_bar = mean(to_vector(y1));
    tau_samp = y1_bar - y0_bar;
    tau_qte25 = quantile(to_vector(y1), 0.25) - quantile(to_vector(y0), 0.25);
    tau_qte50 = quantile(to_vector(y1), 0.50) - quantile(to_vector(y0), 0.50);
    tau_qte75 = quantile(to_vector(y1), 0.75) - quantile(to_vector(y0), 0.75);
  }
}
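Model 4's imputation is a two-stage draw: a Bernoulli draw decides whether the missing potential outcome is zero, and, if positive, the value comes from a lognormal whose log-scale mean is shifted toward the observed outcome through the assumed correlation rho. A minimal NumPy sketch of that step for a single treated unit; the parameter values are illustrative, not NSW estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values for one posterior draw (not NSW estimates).
mu_c, mu_t = 7.0, 7.5        # log-scale means under control and treatment
sigma_c, sigma_t = 0.8, 0.9  # log-scale standard deviations
theta_c = 0.7                # predicted P(positive outcome) under control
rho = 0.5                    # assumed correlation between potential outcomes

def impute_y0_for_treated(y_obs):
    """Impute the control potential outcome for a treated unit with y_obs > 0."""
    if rng.random() > theta_c:           # zero outcome with probability 1 - theta_c
        return 0.0
    # Conditional normal on the log scale, given the observed treated outcome.
    mean = mu_c + rho * (sigma_c / sigma_t) * (np.log(y_obs) - mu_t)
    sd = sigma_c * np.sqrt(1.0 - rho ** 2)
    return float(np.exp(rng.normal(mean, sd)))

draws = [impute_y0_for_treated(2500.0) for _ in range(5000)]
zero_share = np.mean([d == 0.0 for d in draws])  # close to 1 - theta_c
```

The control-unit case is symmetric: swap the roles of (mu_c, sigma_c, theta_c) and (mu_t, sigma_t, theta_t) to impute the missing treated outcome.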
References
[1] Borch K. H. Economics of Uncertainty, Notes for Lecture 10: Critiques of Expected Utility Theory. Princeton University, 2009.
[2] Causi L. G. Theories of investor behaviour: From the efficient market hypothesis
to behavioural finance. 2017.
[3] Tversky A. and Kahneman D. Judgment under uncertainty: Heuristics and biases. Science, 185(4157):1124–1131, 1974.
[4] Campbell J. Y., Lo A. W., and MacKinlay A. C. The Econometrics of Financial Markets. Princeton University Press, 1997.
[6] Kahneman D. and Tversky A. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263–292, 1979.
[7] Rubin D.B. Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6(1):34–58, 1978.
[8] Rubin D.B. Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, 75(371):575–582, 1980.
[9] Imbens G.W. and Rubin D.B. Neyman's Repeated Sampling Approach to Completely Randomized Experiments, pages 83–112. Cambridge University Press, 2015.
[11] Neyman J. S., Dabrowska D. M., and Speed T. P. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472, 1990.
[12] Neyman J. On the two different aspects of the representative method: The
method of stratified sampling and the method of purposive selection. Journal of
the Royal Statistical Society, 97(4):558–625, 1934.
[14] Enderlein G. Review of Fisher, Ronald A.: The Design of Experiments, eighth edition. Oliver and Boyd, Edinburgh, 1966. Biometrische Zeitschrift, 11(2):139–139, 1969.
[15] Holland P. W. Statistics and causal inference. Journal of the American Statistical
Association, 81(396):945–960, 1986.
[16] Rubin D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701, 1974.
[17] de Finetti B. Foresight: Its Logical Laws, Its Subjective Sources, pages 134–174. Springer New York, New York, NY, 1992.
[19] O'Hagan A., West M., Rubin D.B., Wang X., Yin L., and Zell E. Bayesian causal inference: Approaches to estimating the effect of treating hospital type on cancer survival in Sweden using principal stratification, 2018.
[20] Rubin D.B. Multiple Imputation for Nonresponse in Surveys. Wiley, 1987.
[22] Heckman J. J. and Robb R. Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics, 30(1):239–267, 1985.
[25] Rosenbaum P. R. and Rubin D. B. Sensitivity of Bayes inference with data-dependent stopping rules. The American Statistician, 38(2):106–109, 1984.
[26] van der Vaart A. W. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
[27] Ding P. and Li F. Causal inference: A missing data perspective. Statistical Science, 33(2):214–237, 2018.
[28] Gelman A., Carlin J.B., Stern H.S., Dunson D.B., Vehtari A., and Rubin D.B. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2013.