
Reading Dr. Carrier’s ”Proving History”
A Review From a Bayesian Perspective
Tim Hendrix∗
May 4, 2013

Dr. Richard Carrier’s new book, ”Proving History” (Prometheus Books, ISBN
978-1-61614-560-6), is the first of two volumes in which Dr. Carrier investigates
the question of whether Jesus existed or not. According to Dr. Carrier, the current state
of Jesus studies is one of confusion and failure, in which all past attempts to
recover the ”true” Jesus have failed. According to Dr. Carrier the main problem
is the methods employed: past studies have focused on developing historical
criteria to determine which parts of a text (for instance the Gospel of Luke)
can be trusted, but according to Dr. Carrier all criteria and their use are flawed.
This has led to many incompatible views of what Jesus said or
did, and accordingly the question ”Who was Jesus?” has many incompatible
answers: a Cynic sage, a Rabbinical Holy Man, a Zealot Activist, an Apocalyptic
prophet and so on.
Richard Carrier proposes that Bayes theorem (see below) should be employed
in all areas of historical study. Specifically, Dr. Carrier proposes that the problems plaguing the methods of criteria can be solved by applying Bayes theorem,
and that this will finally allow the field of Jesus studies to advance. What this
progress will be like, and specifically how the question of whether Jesus existed should be
answered, will be the subject of his second volume.
I was interested in Dr. Carrier’s book both because I have a hobby interest
in Jesus studies and found his other book on early Christianity, ”Not the Impossible Faith”, very enjoyable and informative, but certainly also because Bayesian
methods were the focus area of my PhD and are my current research area. My main
focus in writing this review will therefore be on the technical content relating to
the use of Bayes theorem and its applicability to historical questions as argued
in the book.
The book is divided into six chapters. Chapter one contains an introduction
which argues historical Jesus studies in its present form is rife with problems,
chapter two introduces the historical method as a set of 12 axioms and 12 rules,
∗ Tim Hendrix is not my real name. For family reasons I prefer not to have my name
associated with my religious views online. All quotations are from ”Proving History”.


chapter three introduces Bayes theorem, chapter four discusses historical methods
and seeks to demonstrate with formal logic that all valid historical methods
reduce to applications of Bayes theorem, and chapter five goes through historical
criteria often used in Jesus studies and concludes each is only valid insofar as it
agrees with Bayes theorem. Finally chapter six, titled ”the hard stuff”, discusses
a number of issues that arise in applying Bayes theorem, as well as Richard
Carrier’s proposal for how the frequentist and Bayesian views of probability can
be unified.
In reviewing this book I wish to focus on what I believe are the book’s main
contributions. The first point is that Bayes theorem not only applies to the
historical method, but that it can be formally proven that all historical methods can
be reduced to applications of Bayes theorem and, importantly, that thinking in this
way will give tangible benefits compared to traditional historical methods.
The second point is how Dr. Carrier addresses several philosophical points that
are raised throughout the book, for instance the unification of the frequentist
and Bayesian views of probability. Since I am not a philosopher I will not be
able to say much on the philosophical side, but I do think there are a number
of points which fall squarely within my field that should be raised.
However, before I proceed I will first briefly touch upon the Bayesian view of
probabilities and Bayes theorem.


Bayes Theorem

I wish to begin with a point that may seem pedantic at first, namely why we
should think Bayes theorem is true at all. Dr. Carrier introduces Bayes Theorem
as follows:
In simple terms, Bayes’s Theorem is a logical formula that deals
with cases of empirical ambiguity, calculating how confident we can
be in any particular conclusion, given what we know at the time.
The theorem was discovered in the late eighteenth century and has
since been formally proved, mathematically and logically, so we now
know its conclusions are always necessarily true if its premises are
true. (Chapter 3)
Unfortunately there are no references for this section, and so it is not explained
what definitions Bayes theorem makes use of, which assumptions it
rests upon, or how it is proven. For reasons I will return to later I think this
omission is problematic. However, shortly after the above quotation, just before
introducing the formula for Bayes theorem, we are given a reference:
But if you do want to advance to more technical issues of the
application and importance of Bayes’s Theorem, there are several
highly commendable texts[9]
Footnote 9 has as its first entry E.T. Jaynes’ ”Probability Theory” from 2003. I
highly endorse this choice and I think most Bayesian statisticians would agree.

E.T. Jaynes was not only an influential physicist, he was a great communicator,
and his book is my preferred reference for students. In his book, Jaynes
argues Bayes theorem is an extension of logic, and I will attempt to give the
gist of Jaynes’ treatment of Bayes theorem below. Interested readers can find
an almost complete draft of Jaynes’ book freely available online1 :
Suppose you want to program a robot that can reason in a sensible manner.
You want the robot to reason quantitatively about true/false statements such as
A = ”The next flip of the coin will be heads”
B = ”There has been life on Mars”
C = ”Jesus existed”.
A basic problem is that neither we nor the robot have perfect knowledge, and so it
must reason under uncertainty. Accordingly, we want the robot to have a notion
of the ”degree of plausibility” of some statements given other statements that
are accepted as true.
The most important point in the above is that I have not defined what the
”degree of plausibility” is. Put in other words, the goal is to analyse what
”the degree of plausibility” of some statement could possibly mean and derive
a result. Jaynes’ treatment in Probability Theory is both thorough and entertaining2 , and at the end he arrives at the following 3 desiderata a notion of
degree of plausibility must fulfill:
• The degree of plausibility must be described by a real number
• It must agree with common sense (logic) in the limit of certainty
• It must be consistent
Consistency implies that if we have two ways to reason about the degree of
plausibility of a statement, these two ways must give the same result. After
some further analysis he arrives at the result that the degree of plausibility of
statements A, B, C, . . . can be described by a function P , and this function
must behave as ordinary probabilities usually do, hereunder Bayes theorem:
P(A|B) = P(B|A)P(A) / P(B)
where the notation P(A|B) means ”the degree of plausibility of A given B”.
The key point is that Bayes theorem now (if we accept what goes into the derivation) not only applies to flips of coins, but to all assignments of degrees of
plausibility to true/false statements we may consider, and the interpretation
that a probability is really a degree of plausibility is then called the Bayesian
1 c.f.
2 It should be noted the argument is not original to E.T. Jaynes; see R.T. Cox’s work from
1946 and 1961, or Jaynes’ book, for a detailed discussion of the history.

interpretation of probabilities. It is in this sense Jaynes (as well as most others
who call themselves Bayesians) considers Bayes theorem an extension of logic.
These definitions may appear somewhat technical and irrelevant at this
point, however their importance will hopefully be apparent later. For now let
us make a few key observations:
• Bayes theorem does not tell us what any particular probability should be
• Bayes theorem does not tell us how we should define the statements A, B, C, . . .
in a particular situation
What Bayes theorem does provide is a consistency requirement: if we know
the probabilities on the right-hand side of the above equation, then we know what
the probability on the left-hand side should be.
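To make the consistency requirement concrete, here is a small numerical sketch (my own toy example, not from the book; the prior of 0.01 and the coin statement are made up for illustration):

```python
# Bayes theorem as a consistency requirement: given the three probabilities
# on the right-hand side, the left-hand side is fixed.

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# A = "the coin is double-headed", B = "the flip came up heads".
p_a = 0.01          # prior degree of plausibility of A (made-up value)
p_b_given_a = 1.0   # a double-headed coin always shows heads
# P(B) follows from the law of total probability (a fair coin otherwise):
p_b = p_b_given_a * p_a + 0.5 * (1 - p_a)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 4))  # 0.0198: one observed head barely moves the prior
```

Note that Bayes theorem itself supplied none of the numbers above; it only forced the value of P(A|B) once the three inputs were chosen.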


Is all historical reasoning just Bayes theorem?

First and foremost, I think it is entirely uncontroversial to say Bayes theorem
has something important to say about reasoning in general, and so also about historical
reasoning. For instance, by going through various toy examples, Bayes theorem
provides a powerful tool to weed out biases and logical fallacies we are all prone
to make.
However I believe Dr. Carrier has a more general connection between BT
and the historical method in mind. In chapter 3:
Since BT is formally valid and its premises (the probabilities
we enter into it) constitute all that we can relevantly say about
the likelihood of any historical claim being true, it should follow
that all valid historical reasoning is described by Bayes’s Theorem
(whether historians are aware of this or not). That would mean
any historical reasoning that cannot be validly described by Bayes’s
Theorem is itself invalid (all of which I’ll demonstrate in the next
chapter). There is no other theorem that can make this claim. But
I shall take up the challenge of proving that in the next chapter.
and later, just before the formal proof:
(...) we could simply conclude here and now that Bayes’s Theorem models and describes all valid historical methods. No other
method is needed, apart from the endless plethora of techniques
that will be required to apply BT to specific cases of which the AFE
and ABE represent highly generalized examples, but examples at
even lower levels of generalization could be explored as well (such as
the methods of textual criticism, demographics, or stylometrics). All
become logically valid only insofar as they conform to BT and thus
are better informed when carried out with full awareness of their
Bayesian underpinning. This should already be sufficiently clear by


now, but there are always naysayers. For them, I shall establish this
conclusion by formal logic
The crux of the logical argument seems to be this. Dr. Carrier defines variables
C, D and E, of which only D and E will be of interest to us. The relevant part
of the argument is as follows:
Formally, if C = ”a valid historical method that contradicts BT,”
D = ”a valid historical method fully modeled and described by (and
thereby reducible to) BT,” and E = ”a valid historical method that is
consistent with but only partly modeled and described by BT,” then:
P8 Either C, D, or E. (proper trichotomy)
P10 If P5 and P6, then ∼E.
P11 P5 and P6.
C4 Therefore, ∼E
To establish premises P5 and P6, we consider a historical claim h, a piece of
evidence e and some background knowledge b. The premises are as follows:3
P5 Anything that can be said about any historical claim h that
makes any valid difference to the probability that h is true will
either (a) make h more or less likely on considerations of background knowledge alone or (b) make the evidence more or less
likely on considerations of the deductive predictions of h given
that same background knowledge or (c) make the evidence more
or less likely on considerations of the deductive predictions of
some other claim (a claim which entails h is false) given that
same background knowledge.
P6 Making h more or less likely on considerations of background
knowledge alone is the premise P (h|b) in BT; making the evidence more or less likely on considerations of the deductive predictions of h on that same background knowledge is the premise
P (e|h.b) in BT; making the evidence more or less likely on considerations of the deductive predictions of some other claim that
entails h is false is the premise P (e| ∼h.b) in BT; any value for
P (h|b) entails the value for the premise P (∼h|b) in BT; and
these exhaust all the premises in BT.

3 I have chosen to follow Dr. Carrier’s typesetting, and accordingly for propositions such
as A = ”It will rain tomorrow” and B = ”It will be cold tomorrow” the notation ∼A means
”not A” (”it will not rain tomorrow”) and A.B means ”A and B” (”it will be rainy and cold
tomorrow”).

I think we can summarize the argument as follows: consider a valid historical
method. Either the historical method is fully or partly described by Bayes
theorem. We can rule out the latter possibility, E, for the following reason:
anything that can be said about the probability a historical claim h is true
given some background knowledge b and evidence e, denoted by P(h|e.b), will
affect either P(h|b), P(e|h.b), P(e|∼h.b) or P(∼h|b). However these values fully
determine P(h|e.b) according to Bayes theorem:
P(h|e.b) = P(e|h.b)P(h|b) / [P(e|h.b)P(h|b) + P(e|∼h.b)P(∼h|b)]
and so the method must be fully included in Bayes theorem, proving the original claim.
I see two problems with the argument. The first is somewhat technical
but needs to be raised: though it is not stated explicitly, Dr. Carrier tacitly
assumes that all we are interested in about a claim h is the probability h is true.
However, I see no reason why this should be the case. For instance, Dempster-Shafer theory establishes the support and plausibility (the latter term is used to
a different effect than I did in the introduction) of a claim, and multi-valued
logics attempt to define and analyze the graded truth of a proposition; all
of these are concepts different from probability. It is not at all apparent
why these concepts can be ruled out as being either not useful or as reducing to
Bayes theorem. For instance, suppose we define Jesus as a ”highly inspirational
prophet”; a great many in my field would say the modifier ”highly” is not well
analysed in terms of probabilities but requires other tools. More generally, it
goes without saying we do not have a general theory of cognition, and I would
be very surprised if that theory turned out to reduce to probability theory in
the case of history.
The second problem is more concrete and relates to the scope of what is being
demonstrated. Let’s assume we are only interested in the probability of a claim
h being true. As noted in the previous section, Bayes theorem is clearly only saying
something about how the quantity on the left-hand side of the above equation,
P(h|e.b), must be related to those on the right-hand side, and Dr. Carrier
is correct in pointing out any change in P(h|e.b) must (this is pure algebra!)
correspond to a change in at least one term on the right-hand side. The problem
is we do not know what the quantities on the right-hand side are numerically,
and we cannot figure them out just by applying Bayes theorem more times. For
instance, applying Bayes theorem to the term P(e|h.b) will require knowledge
of P(h|e.b), exactly the term we set out to determine.
This however seems to severely undercut the importance of what is being
demonstrated. Let me illustrate this by an example. Let’s say I make a claim
such as:
Basic algebra [Bayes’s Theorem] models and describes all valid
methods for reasoning about unemployment [historical methods]
My proof goes as follows: let X be the number of unemployed people, Y be the
number of people who are physically unable to work due to some disability and

Z be the number of people who can work but have not found work. Now the following holds:
X = Y + Z
(contrast this equation with Bayes theorem). I can now make an equivalent claim
to P5 and P6: all that can be validly said about X must imply a change in
either Y or Z, and I can conclude: all that can validly be said about the number
of unemployed people must therefore be described by algebra.
Clearly in some sense this is true, however it misses nearly everything of
economic interest, such as what actually affects the terms Y and Z and by
how much; while it is clear that if X changes at least one of the terms Y or Z has
to change, algebra does not tell us which, just as Bayes theorem does not tell
us what the quantities P(e|h.b), P(h|b), · · · actually are, and it does not tell us
how the propositions e, h, b should be defined.
Suppose we try to rescue the idea of a formal proof by accepting that the term
”a valid historical method” simply means a system (or method) of inference which
operates on the probabilities of propositions, without worrying about which propositions are relevant (which Bayes theorem does not say) or how to obtain these
probabilities (which Bayes theorem does not say either). But if we accept this
definition, I see no reason why we could not simply replace the argument in
chapter 4 by the following:
This claim would of course be hard to expand into about half a chapter.
It is of course true that Bayesian methods have found wide application in almost
all sciences, but this has been because Bayesian methods have shown themselves
to work. I completely agree with Dr. Carrier that there are reasons to consider how Bayesian methods could be applied to history so as to give tangible
results, but the main point is this must be settled by giving examples of actual
applications that offer tangible benefits, just as has been the case in all other
scientific disciplines where Bayesian methods are presently applied. This is what
I will focus on in the next sections.


Applications of Bayes theorem in ”Proving History”

To my surprise, Proving History contains almost no applications of Bayes theorem to historical problems. The purpose of most of the applications of Bayes
theorem in Proving History is to illustrate aspects of Bayes theorem and show
how it agrees with our common intuition. Take for instance the first example in
the book, the analysis of the disappearing sun in chapter 3, which
seems mainly intended to show how different degrees of evidence affect one’s conclusions in Bayes theorem. The example considers an ahistorical disappearing
sun in 1989 with overwhelming observational evidence, and the claimed disappearing
sun in the gospels with very little evidence, and shows that according
to Bayes theorem we should be more inclined to believe the disappearance with
overwhelming evidence. This is certainly true, however it is not telling us anything new.
The example which by far receives the most extensive treatment is the criteria
of embarrassment, for which the discussion takes up about half of chapter five
and ends with a computation of probabilities. I will therefore only focus on this example.


The criteria of embarrassment

The criteria of embarrassment (EC) is introduced as follows:
The EC (or Embarrassment Criterion) is based on the folk belief that if an author says something that would embarrass him, it
must be true, because he wouldn’t embarrass himself with a lie. An
EC argument (or Argument from Embarrassment) is an attempt to
apply this principle to derive the conclusion that the embarrassing
statement is historically true. For example, the criterion of embarrassment states that material that would have been embarrassing to
early Christians is more likely to be historical since it is unlikely that
they would have made up material that would have placed them or
Jesus in a bad light, (Chapter 5)
Dr. Carrier then offers an extended discussion of some of the problems with
the criteria of embarrassment, which I found well written and interesting. The
problems raised are: (1) the gospels are themselves very late, making it problematic to assume the authors had access to an embarrassing core tradition they
felt compelled to write down, (2) we do not know what would be embarrassing
to the early church, and (3) would the gospel authors pen something genuinely
embarrassing at all?
There then follow treatments of several ”embarrassing” stories in the gospels
where Dr. Carrier argues (convincingly in my opinion) there can be little ground
for an application of the EC. We then get to the application of Bayes theorem:
Thus, for the Gospels, we’re faced with the following logic. If
N (T ) = the number of true embarrassing stories there actually were
in any friendly source, N (∼ T ) = the number of false embarrassing stories that were fabricated by friendly sources, N (T.M ) =
the number of true embarrassing stories coinciding with a motive
for friendly sources to preserve them that was sufficient to cause
them to be preserved, N (∼ T.M ) = the number of false embarrassing stories (fabricated by friendly sources) coinciding with a
motive for friendly sources to preserve them that was sufficient to
cause them to be preserved, and N (P ) = the number of embarrassing stories that were preserved (both true and fabricated), then

N (P ) = N (T.M ) + N (∼T.M ), and P (T |P ), the frequency of true
stories among all embarrassing stories preserved, = N (T.M )/N (P ),
which entails P (T |P ) = N (T.M )/(N (T.M ) + N (∼T.M )). Since all
we have are friendly sources that have no independently confirmed
reliability, and no confirmed evidence of there ever being any reliable
neutral or hostile sources, it further follows that N (T.M ) = qN (T ),
where q ≪ 1, and N (∼T.M ) = 1 × N (∼T ): because all false stories
created by friendly sources have motives sufficient to preserve them
(since that same motive is what created them in the first place),
whereas this is not the case for true stories that are embarrassing,
for few such stories so conveniently come with sufficient motives to
preserve them (as the entire logic of the EC argument requires). So
the frequency of the former must be 1, and the frequency of the latter
(i.e., q) must be ≪ 1. Therefore: [Assuming N (T ) = N (∼T ) and
with slight changes to the typesetting]
P (T |P ) = N (T.M ) / (N (T.M ) + N (∼T.M )) = qN (T ) / (q × N (T ) + 1 × N (∼T )) = q / (q + 1)
So this is saying the probability a story will be true given it is embarrassing will
always be less than 0.5, so the EC actually works in reverse!
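With N(T) = N(∼T), Carrier’s expression reduces to q/(q + 1), which is below 0.5 for every q < 1. A quick numerical sketch of the counting argument (my code; the story counts and q values are arbitrary illustrations):

```python
# Carrier's counting argument: fabricated embarrassing stories are always
# preserved, true ones only at the (small) rate q.

def p_true_given_preserved(q, n_true=100, n_false=100):
    preserved_true = q * n_true      # N(T.M) = q * N(T)
    preserved_false = 1.0 * n_false  # N(~T.M) = 1 * N(~T)
    return preserved_true / (preserved_true + preserved_false)

for q in (0.01, 0.1, 0.5, 0.99):
    print(q, round(p_true_given_preserved(q), 3))
# every printed value equals q / (q + 1) and is therefore below 0.5
```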


Reality Check

If you read a memoir and it said (1) the author severely bullied one of his
classmates for a year, and (2) the author once gave a large sum of money to a homeless
man, then, all things being equal, which of the two would you be more inclined
to believe the author had made up? If the memoir was a gospel, we should
be more inclined to believe the story of the bullying was made up; however this
obviously goes against common sense! As Richard Carrier himself points out,
sometimes the EC does work, and any computation must at the least be able
to replicate this situation.


What happened

I think the first observation is that the quoted argument in Proving History does not
actually use Bayes theorem (specifically, it avoids the use of probabilities), but
relies on fractions of the sizes of appropriately defined sets. I can’t tell why this
choice is made, but it is a recurring theme throughout the book to argue for
the application of Bayes theorem and then carry out at least a part of the
argumentation using non-standard arguments. Another thing I found confusing
was how the sets are actually defined and why they are chosen the way they
are. To first translate the criteria into Bayes theorem we need to define the

appropriate variables. As I understand the text they are defined as follows:
T, F : The story is true (as opposed to fabricated)
Pres : The story was preserved
Em : The story is embarrassing
The discussion carried out in the text now amounts to the following assumptions:
P(Pres|∼T, Em) = 1
P(Pres|T, Em) = q < 1
The first assumption is saying the only way someone would fabricate a seemingly
embarrassing story is if it serves some purpose, and so it must be preserved, and
the second is saying a true story which seems embarrassing might not serve a
specific purpose and we are not guaranteed it will be preserved. It should be
clear now that we are really interested in computing P(T|Pres, Em), the probability a
story is true given it is preserved and seems embarrassing. Turning the Bayesian crank gives:
P(T|Pres, Em) = P(Pres|T, Em)P(T|Em) / [P(Pres|T, Em)P(T|Em) + P(Pres|∼T, Em)P(∼T|Em)]
= qP(T|Em) / [qP(T|Em) + P(∼T|Em)]

from which the result follows. We can try to translate the result into English:
suppose the gospel writers started out with/made up an equal number
of true and false stories that seem embarrassing today. However, all the seemingly embarrassing stories that are false were made up (by the gospel writers
or whoever supplied them with their material) because they were significant,
and were therefore preserved, while the true seemingly embarrassing stories were
preserved/written down by the gospel writers at a low rate, q, and therefore
almost all seemingly embarrassing stories that survive to this date are false.
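The role of the equal-numbers assumption can be probed numerically. The sketch below (my code; the values of q and of the prior P(T|Em) are purely illustrative) evaluates the expression derived above for different priors:

```python
# P(T|Pres, Em) = q * P(T|Em) / (q * P(T|Em) + P(~T|Em))

def p_true(q, p_t_em):
    return q * p_t_em / (q * p_t_em + (1 - p_t_em))

# With the equal prior P(T|Em) = 0.5 we recover q / (q + 1) < 0.5:
print(round(p_true(0.2, 0.5), 3))  # 0.167
# A sufficiently high prior pushes the result above 0.5 even for small q:
print(round(p_true(0.2, 0.9), 3))  # 0.643
```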
A reader might notice I have used the phrase ”seemingly embarrassing”, by
which I mean ”seemingly embarrassing to us”. This is evidently required for the
argument to work. Consider for instance the assumption P(Pres|∼T, Em) = 1.
If Em meant that the story was truly embarrassing to the author, this would
mean that false stories made up by friendly sources (are there any?) which were
truly embarrassing would always be preserved – a highly dubious assumption
and clearly contrary to Dr. Carrier’s argument.
A basic problem in the above line of argument is that there is no way to encode
the information that a story was actually embarrassing. We are, effectively,
analysing the criteria of embarrassment without having any way to express that a
story was embarrassing to the author!
”Embarrassing” therefore becomes effectively synonymous with ”embarrassing
with a deeper literary meaning” (the reader can try substituting this phrase
in the previous sections and notice the argument becomes more natural), and the
analysis boils down to saying stories with a deeper literary meaning (that also
happen to look embarrassing today) are for the most part made up, except a
few that are true and happen to have a deeper meaning by accident.


Adding Embarrassment to the Criteria of Embarrassment

To call something an analysis of the criteria of embarrassment, we need
variables expressive enough to capture the basic intuition
behind the criteria. I believe the following is minimal:
T, F : The story is true or fabricated
Pres : The story was preserved
Em : The story is seemingly embarrassing (to us)
Tem : The story was truly embarrassing to the author
LP : The story served a literary purpose (we assume ∼Tem = LP)
Notice Tem means something different than Em: Tem means the story was embarrassing to the one doing the preservation, Em means it seems embarrassing
to us 2000 years later. To put the EC into words: a person would not preserve something that was actually embarrassing which he knew was false, or in symbols:
P(Pres|∼T, Tem) = 0
The following is always true:
P(Pres, T, Tem|Em) = P(Pres|T, Tem, Em)P(T|Tem, Em)P(Tem|Em)
where I have been somewhat sloppy in the notation and implicitly assume variables
such as T and Tem can also take the values ∼T and ∼Tem = LP. The next step is
to add simplifying assumptions. I am going to assume:
P(Pres|T, Tem, Em) = P(Pres|T, Tem)
P(T|Tem, Em) = P(T|Tem)
The assumption here is that our (20th century) interpretation of whether a
story is embarrassing or not is secondary to whether it was truly embarrassing. Next,
let’s look at the likelihood term. I will assume:
P(Pres|F, Tem) = 0
P(Pres|F, LP) = l
P(Pres|T, Tem) = c
P(Pres|T, LP) = 1
The first and last specifications say an author would never record something
truly embarrassing he knew was false, and he would always record something he
knew was true and served a literary purpose. The second specification says
the author will (with probability l) include stories that are false but nevertheless
serve a literary purpose, and the third that he has a certain candor that makes
him sometimes (with probability c) include embarrassing stories he knows are
true. Turning the Bayesian crank now gives:
P(T|Pres, Em) = [P(Tem|Em)P(T|Tem)c + P(LP|Em)P(T|LP)] / [P(Tem|Em)P(T|Tem)c + P(LP|Em)P(T|LP) + P(F|LP)P(LP|Em)l]

This is a bit of a mess. Let’s begin by assuming we are equally good at
determining if a story is truly embarrassing or serves a literary purpose, i.e.
P(Tem|Em) = P(LP|Em) = 0.5, and that we know nothing of the (conditional)
probability a story is true/false, e.g. P(T|Tem) = P(T|LP) = 0.5. In this case
the expression reduces to:
P(T|Pres, Em) = (c + 1)/(c + 1 + l)
We can now try to plug in some limits. Assuming the gospel authors have perfect
candor and will always report true stories (c = 1) we get:
P(T|Pres, Em) = 2/(2 + l) ∈ [2/3; 1]
so in this case the criteria of embarrassment actually work. Another case might
be where the gospel authors have no candor and will always suppress embarrassing stories, c = 0, and in this case
P(T|Pres, Em) = 1/(1 + l) ∈ [1/2; 1]
so the criteria of embarrassment also work in this limit(!). To recover
Dr. Carrier’s analysis, we need something more. Inspecting the full expression
reveals the easiest thing to assume is something like:
P(T|LP) = q < 0.5
which is saying stories that serve a literary purpose are likely to be made up.
I suppose the value you think q should have depends on how you view Jesus:
do you expect him to have lived the sort of life where many of the things he did
or said would have a deeper literary purpose afterwards? Your religious views
may influence how you judge that question, to put it mildly. At any rate, this
leads to the new expression:
P(T|Pres, Em) = (c/2 + q)/(c/2 + q + (1 − q)l)

It is difficult to directly relate this expression to Dr. Carrier’s analysis; however,
let’s assume a story is preserved with probability 1 if it is true and serves a
literary purpose (l = 1) and that a story which is true but also embarrassing will
never be preserved (c = 0). Then we simply obtain
P(T|Pres, Em) = q < 0.5
which is qualitatively consistent with Dr. Carrier’s result.
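The extended model is easy to check numerically from its defining assumptions (my sketch; the symmetric 0.5 priors mirror the assumptions above, and the parameter values are illustrative):

```python
# Extended embarrassment model: P(Pres|T,Tem)=c, P(Pres|T,LP)=1,
# P(Pres|F,Tem)=0, P(Pres|F,LP)=l, with priors P(Tem|Em), P(T|Tem), P(T|LP)=q.

def p_true_extended(c, l, q, p_tem=0.5, p_t_tem=0.5):
    p_lp = 1 - p_tem
    num = p_tem * p_t_tem * c + p_lp * q * 1.0  # preserved true stories
    den = num + p_lp * (1 - q) * l              # ... plus preserved false ones
    return num / den

# Perfect candor (c = 1, neutral q = 0.5): the EC works, the result exceeds 0.5:
print(round(p_true_extended(c=1, l=1, q=0.5), 3))  # 0.667
# The Carrier-like limit (c = 0, l = 1): the result collapses to q:
print(round(p_true_extended(c=0, l=1, q=0.2), 3))  # 0.2
```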



Some thorny issues

Dr. Carrier offered one analysis of the EC which indicates embarrassment lowers
the probability a story is historical; I included a variable that actually allows
for a story to be embarrassing and got the opposite result. My point is not to
demonstrate that one of us is wrong or right, but to motivate some questions I think
are problematic in terms of applying Bayes theorem to history:
Do we actually model history: I think both Dr. Carrier’s and my analysis
contained a term like P(Pres|T, x) (x possibly meaning different things).
The model this presumes is something akin to the following: the gospel authors are compiling (or preserving) a set of stories with knowledge of their
truth-value and – at least in my case – knowledge of their literary purpose
and of them being embarrassing. However I think it is uncontroversial to say
this is a bad model of how the gospel authors worked. For instance the
gospel authors also made up a good deal of the gospels, changed stories
to fit an agenda and so on, and this should also figure in the analysis.
When true is not true: Continuing the above line of thought: with some
probability, which we need to estimate, the gospel authors did not know
what was true or false per se, because they were writing about events
that may have happened 40 years prior. This means that conditioning
on a variable T (true) is problematic. True seems more likely to mean
(with some probability at least) that the statement was something that
was being told by the Christian community and believed to be true. This
should be included in the analysis.
Where do the stories come from: Continuing the above line of thought,
if the Gospel authors had access to a set of stories about Jesus, we need
to ask where they came from. This leads to a secondary application of the
criteria of embarrassment, but with the subtle difference that we know
even less about who the original compilers (or tellers) of these stories
were, what they would find embarrassing, what they actually produced
and so on; this should also be included in the analysis.
Variable sprawl: A basic point is this: if we want to determine how well the
criteria of embarrassment works in a Bayesian fashion, we need to model
the underlying situation with some accuracy. Continuing the above line
of thought would probably result in a good 10-20-(100?) variables that
mean different things and are all relevant to determining if a seemingly
embarrassing story is historical or not. Basically, every time one has a
noun and a ”might” or ”probably”, there is a new variable for the analysis,
and we must include these variables in our analysis. Determining what the
variables actually mean, what their probabilities are and how they (numerically) affect each other is a truly daunting task that scales exponentially
in the number of variables. Is it possible to undertake this project and
expect some accuracy at the end?


Toy models: An alternative view is to undertake the analysis using naive toy
models and arguing why large parts of the problem can either be ignored
or approximated by these toy models. This is what both Dr. Carrier and I
have done. This is probably a more fruitful way to approach the problem;
however, since all toy models are going to be wrong (the fact that Dr. Carrier
and I produced exactly opposite results is evidence of this), this raises some
basic questions about how the numerical estimates we get out are connected
to the historical truth of any given proposition under consideration.
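To make the variable-sprawl point concrete, here is a small Python sketch (the function name is mine) of how fast a full joint probability table grows with the number of binary variables:

```python
# A joint distribution over n binary variables (T, Em, Pres, ...) is
# specified by 2**n - 1 independent probabilities: the full table has
# 2**n entries, but they must sum to one.
def joint_table_size(n_vars: int) -> int:
    return 2 ** n_vars - 1

for n in (3, 10, 20):
    print(n, joint_table_size(n))
```

Already at 20 variables there are over a million numbers to pin down, which is the sense in which the task scales exponentially.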
In statistical modelling, or any other science for that matter, whenever one
postulates a model, no matter how reasonable the assumptions that go into
it may seem, there must be a step where the result is validated in some way, for
instance by predicting a feature of the data which can be checked. I hope the disagreement
between Dr. Carrier's model for the Criterion of Embarrassment and my proposed
model will convince the reader that such measures are required.
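For the curious reader, one standard way such a validation is carried out in Bayesian statistics is a posterior predictive check. The sketch below uses invented data and a simple beta-binomial model, purely for illustration; none of the numbers come from the book:

```python
import random

# Posterior predictive check sketch for a beta-binomial toy model.
# Data (invented): k "hits" in n trials; Beta(1, 1) prior on the rate.
random.seed(0)
n, k = 20, 14
observed_stat = k / n

replicated = []
for _ in range(2000):
    rate = random.betavariate(1 + k, 1 + n - k)   # draw from posterior
    k_rep = sum(random.random() < rate for _ in range(n))
    replicated.append(k_rep / n)

# A predictive p-value near 0 or 1 would signal that the model cannot
# reproduce the observed feature of the data.
p_value = sum(r >= observed_stat for r in replicated) / len(replicated)
print(round(p_value, 2))
```

The point is not the particular statistic but the step itself: the fitted model must be made to predict something checkable.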
How such validations should be carried out is not discussed in Proving History, nor does one get the impression there would be much of a need in the first
place. I will try to illustrate how Proving History treats this issue with two
examples. The first is from chapter six, on resolving expert disagreement, in
which it is discussed at some length how Bayes theorem can be used to make
two parties agree:
The most common disagreements are disagreements as to the
contents of b (background knowledge) or its analysis (the derivation
of estimated frequencies). Knowledge of the validity and mechanics
of Bayes’s Theorem, and of all the relevant evidence and scholarship,
must of course be a component of b (hence the need for meeting those
conditions before proceeding). This basic process of education can
involve making a Bayesian argument, allowing opponents to critique
it (by giving reasons for rejecting its conclusion), then resolving that
critique, then iterating that process until they have no remaining objections (at which time they will realize and understand the validity
and operation of Bayes’s Theorem and the soundness of its application in the present case). So, too, for any other relevant knowledge
although they may also have their own information to impart to you,
which might in fact change your estimates and results, but either way
disagreements are thereby resolved as both parties become equally
informed and negotiate a Bayesian calculation whose premises (the
four basic probability numbers) neither can object to, and therefore
whose conclusion both must accept
A worrying aspect of the above quote is how Dr. Carrier discusses these problems
as having to do with estimating the "four basic probability numbers", by which
I assume he really intends the three numbers P(e|h, b), P(h|b), P(e|∼h, b).
Just to take my toy example from above, there will very clearly be more than
four numbers involved. In fact, the number of required probabilities grows exponentially
in the number of distinct binary variables (such as T, Em, Tem, etc. in the
above) we attempt to treat in our analysis. I think the pressing issue is not
whether two perfectly rational scholars should in principle end up agreeing, but
how we ourselves would know that what we were doing had scientific value, and what
two scholars should do in practice.
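For concreteness, the posterior that the three numbers P(e|h,b), P(h|b), P(e|∼h,b) determine is a one-line computation; the input values below are purely illustrative:

```python
# Posterior from the three probabilities via Bayes theorem:
#   P(h|e,b) = P(e|h,b) P(h|b) / (P(e|h,b) P(h|b) + P(e|~h,b) P(~h|b))
def posterior(p_e_h: float, p_h: float, p_e_not_h: float) -> float:
    numerator = p_e_h * p_h
    return numerator / (numerator + p_e_not_h * (1.0 - p_h))

# Illustrative inputs only:
print(round(posterior(0.9, 0.3, 0.2), 3))   # -> 0.659
```

The calculation itself is trivial; the difficulty lies entirely in justifying the inputs.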
The second suggestion in Proving History is a fortiori reasoning. This roughly
means using the largest/smallest plausible values of the probabilities in the analysis
to see what kind of results one may obtain. I think there are ample reasons
to suspect, based on the example above alone, that one can get divergent results
this way. At any rate, such over- or underestimation would not fix the problem
of having the wrong model to begin with, a point the toy example above should
be sufficient to demonstrate.
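To see the divergence concretely, one can read a fortiori reasoning as interval arithmetic over the inputs; even the modest (invented) ranges below produce a very wide spread of posteriors:

```python
from itertools import product

# Evaluate the posterior at every corner of assumed input ranges and
# report the spread of results. All ranges are invented for illustration.
def posterior(p_e_h, p_h, p_e_not_h):
    num = p_e_h * p_h
    return num / (num + p_e_not_h * (1.0 - p_h))

ranges = [(0.5, 0.9),   # plausible range for P(e|h,b)
          (0.1, 0.5),   # plausible range for P(h|b)
          (0.1, 0.6)]   # plausible range for P(e|~h,b)
corners = [posterior(a, b, c) for a, b, c in product(*ranges)]
print(round(min(corners), 3), round(max(corners), 3))
```

With these ranges the posterior can land anywhere from below 0.1 to 0.9, i.e. the a fortiori bounds need not settle anything.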


The re-interpretation of probability theory

In my reading of the book there were a number of places where I had trouble following the discussion, for instance when discussing how to obtain prior
probabilities from frequencies, or the suggestion of a fortiori reasoning. I think
chapter six, "The technical stuff", explains much of this confusion: its source is Dr.
Carrier's suggestion for how one can combine the Bayesian and the frequentist
views on probability, which is also a main theoretical contribution of the book.
Before I return to some more practical considerations I wish to treat Dr. Carrier's suggestion in more detail.
One of the main purposes of chapter six is to address some philosophical
issues of Bayesian theory. Dr. Carrier introduces the chapter with these words:
Six issues will be taken up here: a bit more on how to resolve
expert disagreements with BT; an explanation of why BT still works
when hypotheses are allowed to make generic rather than exact predictions; the technical question of determining a reference class for
assigning prior probabilities in BT; a discussion of the need to attenuate probability estimates to the outcome of hypothetical models
(or a hypothetically infinite series of runs), rather than deriving estimates solely from actual data sets (and how we can do either); and a
resolution of the epistemological debate between so-called Bayesians
and frequentists, where I’ll show that since all Bayesians are in fact
actually frequentists, there is no reason for frequentists not to be
Bayesians as well. That last may strike those familiar with that
debate as rather cheeky. But I doubt you’ll be so skeptical after having read what I have to say on the matter. That discussion will
end with a resolution of a sixth and final issue: a demonstration of
the actual relationship between physical and epistemic probabilities,
showing how the latter always derive from (and approximate) the
The emphasized passages are the claims I will focus on in this review. In reviewing
Dr. Carrier's suggestion, I will not focus so much on the "debate" between frequentists and Bayesians (in my experience it is not something one encounters
very frequently), but rather on Dr. Carrier's proposed interpretation of Bayesian
probabilities. I apologize in advance that the section will be somewhat technical in
places; I have tried to structure it by providing what I consider a "standard"
Bayesian answer (these sections will be marked with an *) to the questions Dr.
Carrier attempts to answer, and then discussing Dr. Carrier's alternative suggestion.
But before I begin I think it is useful to review the standard Bayesian interpretation of the two central terms Dr. Carrier seeks to investigate, namely
probabilities and frequencies. The following continues from the introduction of
Bayes theorem I outlined in the first section. I will refer readers to E. T. Jaynes's
book, which discusses these issues with much more clarity.


Probabilities and frequencies. The mainstream view*

The mainstream Bayesian view on frequencies and probabilities can be summarized as follows:
Probabilities represent degrees of plausibility. Probabilities therefore refer
to a state of knowledge of a rational agent and are either assigned based on (for
instance) symmetry considerations (the chance a coin comes up heads is 50%
because there are two sides) or derived from other probabilities according to the
rules of probability theory (including Bayes theorem).
Frequencies are factual properties of the real world that we measure or
estimate. For instance, if we count 10 cows in a field and notice 3 are red, the
frequency of red cows is 3/10 = 0.3. This is not a probability. The two
simply refer to completely different things: probabilities change when our state
of knowledge changes, frequencies do not.
With these things in mind, let us focus on Dr. Carrier's definitions of probabilities and frequencies:


Richard Carrier's proposal

A key point I found confusing is what Dr. Carrier actually means by the word
probability. The word is used from the beginning to the end of the book; however,
an attempt to clarify its meaning is only encountered in Chapter 2. Right after
stating axiom 4, "Every claim has a nonzero probability of being true or false
(unless its being true or false is logically impossible)",4 the following clarification
follows:
probability here I mean epistemic probability, which is the
probability that we are correct when affirming a claim is true. Setting aside for now what this means or how they're related, philosophers have recognized two different kinds of probabilities: physical
and epistemic. A physical probability is the probability that an
event x happened. An epistemic probability is the probability
that our belief that x happened is true.

4 The axioms and rules are themselves somewhat. If the historical method reduces to
an application of Bayes theorem, shouldn't we rather be interested in the assumptions behind
Bayes theorem?
Notice that the definitions of "probability", "epistemic probability" and
"physical probability" themselves rely on the word "probability", which is of
course circular. The definition is revisited in chapter 6 in the section "The
role of hypothetical data in determining probability". The discussion there (it is hard
to tell if an actual definition is offered) introduces the auxiliary concepts "logical
truths", "empirical truths" and "hypothetical truths". I will confess I found
the chapter very difficult to understand, and I will therefore provide quotations
before giving my own impression of the various definitions and arguments, so
that the reader can form his or her own opinion.
What are probabilities really probabilities of? Mathematicians
and philosophers have long debated the question. Suppose we have
a die with four sides (a tetrahedron), its geometry is perfect, and we
toss it in a perfectly randomizing way. From the stated facts we can
predict that it has a 1 in 4 chance of coming up a 4 based on the
geometry of the die, the laws of physics, and the previously proven
randomizing effects of the way it will be tossed (and where). This
could even be demonstrated with a deductive syllogism (such that
from the stated premises, the conclusion necessarily follows). Yet
this is still a physical probability. So in principle we can connect
logical truths with empirical truths. The difference is that empirically we don’t always know what all the premises are, or when or
whether they apply (e.g. no die’s geometry is ever perfect; we don’t
know if the die-thrower may have arranged a scheme to cheat; and
countless other things we might never think of). That’s why we
can’t prove facts from the armchair.
From this, it seems the "logical truth" is the observation that a perfectly random
throw of a perfect die with four sides will come up 4 exactly 1/4 of the time.
Dr. Carrier notes this probability is connected to the "physical probability", by
which I believe is meant how a concrete die will behave. While it is clearly true
that the two things must be connected in some way, the entire point must be how the
two are connected. In the following section Dr. Carrier (correctly) identifies this
connection as having to do with our lack of knowledge. The text then continues:
Thus we go from logical truths to empirical truths. But we have
to go even further, from empirical truths to hypothetical truths. The
frequency with which that four-sided die turns up a 4 can be deduced
logically when the premises can all be ascertained to be true, or near
enough that the deviations don’t matter (...), yet ascertained still
means empirically, which means adducing a hypothesis and testing
it against the evidence, admitting all the while that no test can
leave us absolutely certain. And when these premises can’t be thus
ascertained, all we have left is to just empirically test the die: roll

it a bunch of times and see what the frequency of rolling 4 is. Yet
that method is actually less accurate. We can prove mathematically
that because of random fluctuations the observed frequency usually
won’t reflect the actual probability. For example, if we roll the die
four times and it comes up 4 every time, we cannot conclude the
probability that this die will roll a 4 on the next toss is 100% (or
even 71%, which is roughly the probability that can be deduced if
we don’t assume the other facts in evidence). That’s because if the
probability really is 1 in 4, then there is roughly a 0.4% chance you'll
see a straight run of four 4's (mathematically: 0.25^4 = 0.00390625)
I believe the above discussion can be summarized as follows: Suppose we have
an idealized die with four sides which we roll in an idealized way. The chance it will
come up 4 is (exactly) 0.25. This is what Dr. Carrier calls a hypothetical truth.
However, since the die has minute random imperfections, the real chance it will
come up 4 is slightly different, perhaps 0.256. This is the physical probability.
The reason these two numbers are different is that we are unaware of
the small imperfections in the die. Now, if we roll an actual die a number of
times, say 4, and compute the frequency of times the die comes up 4 relative to the
total number of rolls, we will get a third number which will probably not be
any of the above. In fact, the fluctuations being discussed are exactly
distributed according to the previously introduced expression, viz.:

P(n fours | N rolls) = C(N, n) p^n (1 − p)^(N − n), with p = 0.25,

where C(N, n) = N!/(n!(N − n)!) is the binomial coefficient.
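The expression is easy to check numerically; the sketch below (the function name is mine) reproduces the 0.00390625 figure quoted earlier:

```python
from math import comb

# Binomial probability of n fours in N idealized rolls, p = 0.25.
def binom_pmf(n: int, N: int, p: float = 0.25) -> float:
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

# Probability of four 4's in four rolls:
print(binom_pmf(4, 4))   # 0.25**4 = 0.00390625
```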
While there are a few minor points about the way the problem is laid out
(for instance, the use of the word "chance" is problematic; how is that defined
without reference to probabilities?) and the terminology, the problem raised
above, namely how these three numbers are related, is the central one. We will
now turn to Dr. Carrier's proposal; the discussion continues as follows:
Even a thousand tosses of an absolutely perfect four-sided die
will not generate a perfect count of 250 4’s (except but rarely). The
equivalent of absolutely perfect randomizer do exist in quantum mechanics. An experiment involving an electron apparatus could be
constructed by a competent physicist that gave a perfect 1 in 4 decision every time. Yet even that would not always generate 250 hits
every 1,000 runs. Random variation will frequently tilt the results
slightly one way or another. Thus, you cannot derive the actual
frequency from the data alone. For example, using the hypothetical
electron experiment, we might get 256 hits after 1,000 runs. Yet we
would be wrong if we concluded the probability of getting a hit the
next time around was 0.256. That probability would still be 0.250.
We could show this by running the experiment several times
again. Not only would we get a different result on some of those
new runs (thus proving the first result should not have been so concretely trusted), but when we combined all these data sets, odds are

the result would converge even more closely on 0.250. In fact you
can graph this like an approach vector over many experiments and
see an inevitable curve, whose shape can be quantified by mathematical calculus, which deductively entails that that curve ends (when
extended out to infinity) right at 0.250. Calculus was invented for
exactly those kinds of tasks, summing up an infinite number of cases,
and defining a curve that can be iterated indefinitely, so we can see
where it goes without actually having to draw it (and thus we can
count up infinite sums in finite time).
The last paragraph verges on gobbledygook in its use of technical words in a manner that
is both unclear and very hard to recognize. The proposal seems to be that if we
carry out the idealized experiment for a sufficiently long time, the observed
frequency will converge towards 0.25. A reader who is unfamiliar with this
result should keep in mind that a formal statement of it (from the setup
I assume it is the law of large numbers Dr. Carrier has in mind) contains
the somewhat technical clause "...will converge with probability one...", so
if one uses such an argument to later define probability there is again an
issue of circularity. Directly following the above paragraph is this:
Clearly, from established theory, when working with the imagined quantum tabletop experiment we should conclude the frequency
of hits is 0.25, even though we will almost never have an actual data
set that exhibits exactly that frequency. Hence we must conclude
that that hypothetical frequency is more accurate than any actual
frequency will be. After all, either the true frequency is the observed
frequency or the hypothesized frequency; due to the deductive logic
of random variation you know the observed frequency is almost never
exactly the true frequency (the probability that it is is always ≤ 0.5,
and in fact approaches 0 as the odds deviate from even and the
number of runs increases); given any well-founded hypothesis you
will know the probability that the hypothesized frequency is the true
frequency is > 0.5 (and often ≫ 0.5, and certainly not → 0); therefore P (THE HYPOTHESIZED FREQUENCY IS THE TRUE FREQUENCY)
is quite often ≫ P (THE OBSERVED FREQUENCY IS THE TRUE FREQUENCY). So the same is true
in every case, including the four-sided die, and anything else we are
measuring the frequency of. Deductive argument from empirically
established premises thus produces more accurate estimates of probability.
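The claim that the observed frequency almost never equals the true frequency exactly is at least easy to verify by simulation, using the quote's own numbers (runs of 1,000 draws at rate 0.25):

```python
import random

# How often does the observed frequency hit the true one exactly?
# Simulate many 1,000-draw experiments with true rate 0.25 and count
# the runs that land on exactly 250 hits (only a few per cent do).
random.seed(1)
true_p, n_draws, n_experiments = 0.25, 1000, 1000
exact = 0
for _ in range(n_experiments):
    hits = sum(random.random() < true_p for _ in range(n_draws))
    exact += (hits == 250)
print(exact / n_experiments)
```

So far the quote is unobjectionable; it is the step from this observation to a definition of probability that I take issue with below.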
The main philosophical "charge" (if you will) leveled by Bayesian statisticians
against frequentists is that the frequentist view tends to require thought experiments
in idealized situations that are run to infinity, and I will just note that we now
have an imagined quantum tabletop experiment where we assume we know the
limit frequency is 0.25 (no concrete experiment I can think of would behave
like that, and no experiment can be run to the limit of infinity). The typical
Bayesian objection is that while we are free to think of this idealized situation as
a thought experiment, it is quite different from, e.g., the situation where we consider
the probability that a corpse was stolen from a grave. Again I will refer to Jaynes's book
for a deeper treatment of the problems that arise, and again only note that Dr. Carrier
does not discuss them at all.
However, Dr. Carrier also introduces some novel problems in his discussion.
Consider the statement: "After all, either the true frequency is the observed
frequency or the hypothesized frequency". But clearly this is false. Suppose I
hypothesize that the so-called hypothesized frequency of the die coming up 4
is 0.25. I then roll the die 10 times and get an observed frequency (in Bayesian
terms, the frequency) of 3/10. However, both of these values are going to be
wrong, because the microscopic imperfections in the die are going to mean
it will have a different "true frequency" (in Dr. Carrier's language) than either
0.25 or 0.3, simply because there are an infinite number of other candidate
true frequencies. The statement is therefore in any practical situation a false
dilemma; regarding the inequalities, what would happen is that both
sides would tend towards zero, in direct contradiction to what Dr. Carrier writes
(because, again staying in the frequentist language, the true frequency is with
probability 1 something else), and depending on the situation the inequality
could go either way. The argument is simply false.
Finally, and this is a recurrent theme, it is very hard to tell what has actually
been defined. I have carefully gone through the chapter, and the above quotation
is the first time the term "hypothetical frequency" is used. But what exactly
does it mean? The closest thing to a definition comes shortly later in chapter six: "Thus
we must instead rely on hypothetical frequencies, that is, frequencies that are
generated by hypothesis using the data available, which data includes not just
the frequency data (from which we can project an observed trend to a limit of
infinite runs), but also the physical data regarding the system that's generating
that frequency (like the shape and weight distribution of a die)." What I think
is intended here is that the "hypothetical frequency" represents our best guess
at what will happen with the die (or quantum tabletop experiment) if we roll
it in the future, given our knowledge of the geometry of the die and past
rolls. In Bayesian terms, we would call this the probability.
Having introduced observed and hypothetical frequencies, we can now begin
to make headway towards defining probabilities; unfortunately, it is done in a
very indirect manner:
...that hypothetical frequencies are more accurate than observed
frequencies, should not surprise anyone. ... if we take care to manufacture a very good four-sided die and take pains to use methods of
tossing it that have been proven to randomize well, we don’t need
to roll it even once to know that the hypothetical frequency of this
die rolling 4’s is as near to 0.25 as we need it to be. (...) Thus
it’s not valid to argue that because hypothetical frequencies are not
actual data, and since all we have are actual data, we should only
derive our frequencies from the latter. All probability estimates (even


of the very fuzzy kind historians must make, such as occasioned in
chapters 3 through 5) are attempted approximations of the true frequencies (as I’ll further explain in the next and last section of this
chapter, starting on page 265). So that’s what we’re doing when we
subjectively assign probabilities, attempting to predict and thus approximate the true frequencies, which we can only approximate from
the finite data available because those data do not reflect the true
frequency of anything (...). Thus we must instead rely on hypothetical frequencies, that is, frequencies that are generated by hypothesis
using the data available which data includes not just the frequency
data (from which we can project an observed trend to a limit of
infinite runs), but also the physical data regarding the system that’s
generating that frequency (like the shape and weight distribution of
a die). Of course, when we have a lot of good data, the observed and
hypothetical frequencies will usually be close enough as to make no
difference. [my italics]
The question I started out with was this: What is a probability in Proving
History? To the best of my knowledge, probability is being equated with hypothetical frequencies; however, this suggestion is definitely non-Bayesian and
is plagued by all the problems Bayesians have been raising for nearly a century,
starting with Dr. Carrier's main technical reference for Bayes theorem, namely
Jaynes's book.
The first thing to notice is that the discussion above is entirely focused on dice and
quantum tabletop experiments, that is, experiments which we can easily imagine
being carried out over and over again. However, these setups are very different from
the ones we are really interested in, namely probabilities of historical events
that perhaps only happened once. To give a concrete example of this difficulty,
consider the following proposition:
A : ”I believe with probability 0.8 that the 8th digit of π is a nine”
In a Bayesian view, the term "with probability 0.8" refers to a state of knowledge
of π, and thus requires no auxiliary considerations; it simply reflects me thinking
the 8th digit is probably a nine while not being certain.
However, in the interpretation above, when we assign a probability of 0.8 to
the statement then (to quote): ”what we’re doing when we subjectively assign
probabilities,[is] attempting to predict and thus approximate the true frequencies,
which we can only approximate from the finite data available”. But what is the
true frequency of the 8th digit in π being a 9? Why should we think there is
such a thing? How would we set out to prove it exists? What is the true value
of the true frequency? The basic reason why these questions are hard to answer
is this: either it is or it is not a nine, and the reason I am uncertain reflects
only my lack of knowledge. A Bayesian treatment gives a direct analysis of this
situation; an attempt to connect it to a quantum tabletop experiment does not.
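For what it is worth, the digit itself is a one-line computation (assuming "8th digit" means the 8th decimal), which is exactly the point: looking it up moves the epistemic probability to 0 or 1 without any repeatable experiment being involved:

```python
from math import pi

# The 8th decimal digit of pi is a computable fact, not the outcome of
# a repeatable chance experiment.
digits_after_point = f"{pi:.10f}".split(".")[1]
print(digits_after_point[7])   # the 8th decimal digit
```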
The situation is analogous for history. Consider for instance the probability
that Caesar crossed the Rubicon, or that a miracle was made up and attributed to a
first-century miracle worker. The notion of "true frequency" in these situations
becomes very hard to define; however, if we accept that probability simply refers to our
degree of belief, there is no need for such thought experiments.


The connection between frequencies and probabilities

The last section of Chapter six offers a main philosophical point of the book,
namely a combination of the frequentist and Bayesian views of probability. This is
done by re-interpreting what is meant by Bayesian probabilities. The section
opens thus:
Probability is obviously a measure of frequency. If we say 20%
of Americans smoke, we mean 1 in 5 Americans smoke, or in other
words, if there are 300 million Americans, 60 million Americans
smoke. When weathermen tell us there is a 20% chance of rain
during the coming daylight hours, they mean either that it will rain
over one-fifth of the region for which the prediction was made (i.e., if
that region contains a thousand acres, rain will fall on a total of two
hundred of those acres before nightfall) or that when comparing all
past days for which the same meteorological indicators were present
as are present for this current day we would find that rain occurred
on one out of five of those days (i.e., if we find one hundred such days
in the record books, twenty of them were days on which it rained).
Speaking of bold assertions, consider the first line: "Probability is obviously a
measure of frequency". The basic problem is this: If this is obvious, how come
Bayesians have failed to see the obvious for 50 years and insisted on probability
being rational degrees of belief, i.e. a state of knowledge? If it is obvious,
how come the main technical reference, Jaynes's book, dedicates entire chapters
to arguing against this misconception?
What is of course obvious is that one can go from probabilities to frequencies
(as I have already illustrated with the example of the coin), but in that case
the implication goes the other way: if the probability is defined in a situation
where there is a well-defined experiment, such as with a coin, one can make
probabilistic predictions about its frequency using Bayesian methods.
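To illustrate this direction from probability to frequency: if each of 100 comparable days independently carries a 0.2 rain probability, the predicted fraction of rainy days is itself spread out rather than pinned at one fifth:

```python
from math import comb

# Probability that between 15 and 25 of 100 comparable days are rainy,
# given a fixed per-day rain probability of 0.2 (binomial model).
p, n_days = 0.2, 100
prob_15_to_25 = sum(comb(n_days, k) * p**k * (1 - p)**(n_days - k)
                    for k in range(15, 26))
print(round(prob_15_to_25, 3))
```

The probability statement thus yields a distribution over frequencies, not an assertion that exactly one fifth of the days (or the region) will see rain.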
What is frustrating is that Dr. Carrier's own examples illustrate this well. For instance,
if I am the weatherman and I say I believe it will rain tomorrow with probability
0.2, what I mean is most definitely not what Dr. Carrier says, "it will rain
over one-fifth of the region". Think of how variable the weather is and how
nonsensical that statement is if you take it at face value! In fact, I would
be almost certain that it might rain over either 1/10 or 1/2 or 1/3 or some
other fraction of the region. What I am trying to convey is that I have a lack of
knowledge about whether or not it will rain tomorrow, and my models and data (and
possibly Bayes theorem) allow me to quantify this as 0.2, full stop, no
further thought experiments required!

The section continues directly:
Those are all physical probabilities. But what about epistemic
probabilities? As it happens, those are physical probabilities, too.
They just measure something else: the frequency with which beliefs
are true. Hence all Bayesians are in fact frequentists (and as this
book has suggested, all frequentists should be Bayesians). When
Bayesians talk about probability as a degree of certainty that h is
true, they are just talking about the frequency of a different thing
than days of rain or number of smokers. They are talking about
the frequency with which beliefs of a given type are true, where
of a given type means backed by the kind of evidence and data
that produces those kinds of prior and consequent probabilities. For
example, if I say I am 95% certain h is true, I am saying that of all
the things I believe that I believe on the same strength and type of
evidence as I have for h, 1 in 20 of those beliefs will nevertheless still
be false (...). Probability can be expressed in fractions or percentile
notation, but either is still a ratio, and all ratios by definition entail a
relation between two values, and those values must be meaningful for
a probability to be meaningful. For Bayesians, those two values are
beliefs that are true and all beliefs backed by a certain comparable
quantity and quality of evidence, which values I’ll call T and Q. T
is always a subset of Q, and Bayesians are always in effect saying
that when we gather together and examine every belief in Q, we’ll
find that n number of them are T , giving us a ratio, nt /nq , which is
the epistemic probability that any belief selected randomly from Q
will be true
The good news about the proposal is that it is relatively clearly stated; the bad
news is that it is both unnecessary and defective. That the definition is defective is
probably best illustrated with a small puzzle: Suppose I have a coin of which
I know that if I flip it twice (independently), the chance it will come up heads
both times is 1/2. What is the probability it will come up heads if I flip it once?
The problem is easy to solve: P(HH) = P(H)P(H) = 1/2, and so P(H) =
1/√2. Now, the problem is that 1/√2 cannot be represented as a fraction of two
integers, so when Dr. Carrier writes "Probability can be expressed in fractions
or percentile notation, but either is still a ratio, and all ratios by definition
entail a relation between two values, and those values must be meaningful for a
probability to be meaningful", and then goes on to define the probability in terms
of fractions of integers (see the quotation above), he is excluding exactly the
above case.
It goes without saying that the coin does not and should not pose a problem from a
Bayesian or frequentist perspective.
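The arithmetic of the puzzle can be spelled out in a few lines:

```python
from fractions import Fraction
from math import sqrt

# P(HH) = 1/2 forces P(H) = 1/sqrt(2), an irrational number, so no
# ratio of integer counts n_t / n_q can equal it exactly.
p_heads = sqrt(0.5)
print(abs(p_heads * p_heads - 0.5) < 1e-12)   # True: squares back to 1/2

# The best small-denominator fraction still misses it:
approx = Fraction(p_heads).limit_denominator(100)
print(approx, float(approx) == p_heads)       # some fraction, False
```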
There are two ways to avoid the problem: One is to say we simply don't
care about the coin because it's a stupid example. In my opinion that's just
admitting the proposed definition does not work. The other is to say the above
discussion only applies to epistemic probabilities and the coin's probability is
something else which we have not defined. The problem is this would create
absurdities, because I could then change the type of probability from epistemic
to "that something else" by considering a new system that involves the coin at
some point.
I think this basic example is fatal in terms of obtaining a general and consistent theory out of Dr. Carrier's proposal, but to avoid charges of rejecting a
good idea because of some mathematical trickery which can perhaps be fixed,
I want to point out some other more serious ailments of the proposal of which
the coin example is only a symptom:
Let's simply try to imagine how the proposal could be implemented. Suppose
I consider the statement: "I will get an email between 16.00-17.00 today". Let's
say that after thinking about this as carefully as I can, possibly using Bayes
theorem, I arrive at a probability of 0.843 of that statement being true. Now, to
implement the above definition, I think very carefully about all I know and,
though I cannot at the moment tell how I would arrive at this conclusion, I
realize I know exactly 3 other things on "the same type and strength of evidence"
as was the case of the email, giving nq = 4. I now need to compute nt, namely
the number of these beliefs that are true. A basic problem is that I wouldn't
know how to do this, because I do not know which of them are true or not, so
I suppose I should imagine I have access to an oracle that knows the real truth.
At any rate, even without the oracle, nt can only take the values 0, 1, 2, 3 and 4.
This gives 5 different possible epistemic probabilities, nt/nq = 0, 1/4, 1/2, 3/4, 1,
none of which is 0.843. So does this mean I didn't really believe the statement
at probability 0.843? With what probability do I believe the statement, then?
Does it mean the probabilities we have available are limited by how many things
we know? If taken at face value, the proposal seems entirely unworkable.
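The restriction to the five ratios can be made concrete with a small sketch (illustrative only; the function name is mine, not from the book):

```python
from fractions import Fraction

def attainable_probabilities(n_q):
    """Under the frequency proposal, with n_q comparable beliefs the only
    attainable epistemic probabilities are the exact ratios n_t / n_q."""
    return [Fraction(n_t, n_q) for n_t in range(n_q + 1)]

probs = attainable_probabilities(4)
print(probs)  # the five ratios 0, 1/4, 1/2, 3/4, 1
print(Fraction(843, 1000) in probs)  # 0.843 is not among them
```

Only by knowing vastly more comparable beliefs could the grid of attainable values become fine enough to approximate 0.843, which is exactly the problem discussed above.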
To counter any claim that I am quoting Dr. Carrier out of context, the proposal
is summarized later in the section as follows:
So when you say you are only about 75% sure you’ll win a particular hand of poker, you are saying that of all the beliefs you have
that are based on the same physical probabilities available to you in
this case, 1 in 4 of them will be false without your knowing it, and
since this particular belief could be one of those four, you will act
accordingly. So when Bayesians argue that probabilities in BT represent estimates of personal confidence and not actual frequencies,
they are simply wrong. Because an estimate of personal confidence
is still a frequency: the frequency with which beliefs based on that
kind of evidence turn out to be true (or false). As Faris says of
Jaynes (who in life was a prominent Bayesian), "Jaynes considers
the frequency interpretation of probability as far too limiting. Instead,
probability should be interpreted as an indication of a state of
knowledge or strength of evidence or amount of information within
the context of inductive reasoning." But an indication of a state of
knowledge is a frequency: the frequency with which beliefs in that
state will actually be true, such that a 0.9 means 1 out of every 10
beliefs achieving that state of knowledge will actually be false (so of
all the beliefs you have that are in that state, 1 in 10 are false, you
just won’t know which ones). This is true all the way down the line.
If anything I think this write-up is even more muddled. To take the poker
example: I don't know that 1 in 4 things I believe with probability 0.75 will be
false. Why should they be? It might turn out that everything I believe with
probability 0.75 is true.
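A numerical illustration of this point: if, for the sake of argument, the four beliefs are treated as independent events each with probability 0.75 of being true, then every count of false beliefs has some probability, and nothing forces exactly "1 in 4" of them to be false:

```python
from math import comb

# Distribution of the number of false beliefs among 4 independent beliefs,
# each held at credence 0.75 (independence is my assumption for illustration).
p = 0.75
dist = {k: comb(4, k) * (1 - p) ** k * p ** (4 - k) for k in range(5)}

p_all_true = dist[0]  # probability that none of the four is false
print(round(p_all_true, 3))  # 0.316: roughly a 1-in-3 chance all are true
```

So even on the most charitable reading there is about a 32% chance that all four beliefs turn out true, contradicting the claim that 1 in 4 "will be false".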
Aside from suffering from the above flaws, it suffers from all the other flaws I
previously discussed: Suppose I know exactly 4 things with probability 0.75:
that the 6th digit of π is 3, that the Brazilians speak Brazilian, that there are 52
states in the USA and that Adam and Eve really lived; however, these things will
all be false! For that reasoner, the frequency with which beliefs based on that
type of evidence turn out to be false is 1. This is no problem if we use probability
to refer to a state of knowledge, as Jaynes does, but it is a problem if we want
to root it in what is actually the case, as Dr. Carrier suggests. Again there is
absolutely nothing novel about the points raised here; they can all be found in
Jaynes's book.
One might attempt to rescue the proposal as follows. Suppose one says:
"I did not intend to say, 'a [probability of] 0.9 means 1 out of every
10 beliefs achieving that state of knowledge will actually be false'. I
merely meant: the average (or expected value) of nt/nq is 0.9". The problem with
such a definition is that it will almost inherently be circular, since the average is
computed using the probability, and so cannot be used to define probabilities. A
source of confusion is that we can make probabilistic statements about nt, but
doing so requires that we already have a theory of probabilities. What is that theory?
If we are frequentists, we need to consider why it should apply to statements
about e.g. Jesus. If it is Bayesian, well, there is your theory. There is no reason
to force an ad-hoc layer of interpretation on top of it.


There is good weather at infinity

That the proposal is flawed simply by virtue of not allowing one to represent
a probability of 0.843 if one only knows 3 other things at that confidence (and
suppose how unfortunate we would be if we only knew 1 such thing...), or a
probability of 1/√2, makes me suspect Dr. Carrier had intended some sort of
limit statement, that is, using infinities in some way.
A basic problem with using infinities is that the things we consider are not infinite.
If we have two interpretations of assigning probabilities in the case of 3 coins
and the existence of 1 Jesus, and the first only requires us to consider 3 coins and
one Jesus while the second requires us to consider an infinite number of coins
and Jesuses, I think there is ample reason to suspect the first proposal is the
more fundamental, for the sole reason there was at most one Jesus.
Nevertheless I will briefly mention 3 ways to attempt to "fix" the proposal
above by appeal to infinities, and simply notice there is no need for any similar
ad-hockery in a Bayesian interpretation.
The first is to propose we always know an infinite number of things at any
given probability. I think this proposal can be rejected on the grounds that it is
blatantly false.
The second proposal is somewhat related to the first: to make sense of any
given probability of (say) 0.8, we immediately imagine an infinite number of
coin-flips with biased coins that come up heads with probability 0.8 and define
probability from this. I suspect it is hard to define this in a non-circular fashion
(keep in mind randomness must be defined in this context without using
probability), but a worse problem is that the chance of the event happening
in the real world is irrelevant to the definition, since the limit will be entirely
dominated by the infinite number of hypothetical coins. Thus, the proposal
has no normative power. Finally, the proposal seems to simply be a fancy way
of arriving at the number 0.8: Does the proposal effectively differ from simply
saying a probability of 0.8 is taking a cake and dividing it in such a way that one
part is in a ratio of 0.8 to the total, and that's the definition of probability? Put
briefly, I don't see how the proposal has a normative effect on how probabilities
are used.
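For illustration, here is a minimal simulation of that hypothetical limit (the numbers and setup are mine, purely for illustration): the long-run frequency simply reproduces whatever bias we assumed, so it tells us nothing about the single real-world event.

```python
import random

random.seed(1)

# Simulate the "infinite hypothetical coin-flips" reading of a probability
# of 0.8. The long-run frequency of heads converges to the assumed bias
# by construction, regardless of what actually happens in the real world.
n = 100_000
heads = sum(random.random() < 0.8 for _ in range(n))
freq = heads / n
print(round(freq, 2))  # close to the assumed 0.8, as it must be
```

The circularity is visible in the code: the number 0.8 was put in by hand, so recovering it as a limiting frequency carries no normative content.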
The third proposal goes deeper into frequentist land: imagine an infinite
number of worlds in which we believe things at a probability of 0.8, and define
the probability as how often things believed at probability 0.8 turn out to be
true in these worlds. This is basically the frequentist definition of probabilities,
and it contains all the circularity and fanciful reasoning Bayesians usually object
to, and which has led frequentists themselves to object to the idea that we can
assign probability to things like Jesus rising from the dead. For instance, what
does the infinite number of worlds where the 6th digit of π is 3 look like?


The Bayesian/frequentist divide is not only about probabilities

Finally, I am not sure how the division between frequentists and Bayesians is
resolved even if the proposal works. The division involves things such as whether
data is fixed and parameters are variable, or data is variable and parameters are
fixed. It involves frequentists objecting to applying Bayes theorem to things
like those considered by Dr. Carrier, and it involves (at least some) Bayesians
rejecting frequentist methods such as confidence intervals and t-tests as blatant
ad-hockery that should go the way of the Dodo. I simply do not see how adding
a layer of frequencies on top of Bayesian language affects the differences of
opinion on these issues.


The big picture

Why should we accept Bayes theorem and its application to questions like the
book of Matthew being written by Matthew? If we do, it must be because of a
rigorous argument. I believe that e.g. Cox and Jaynes provide such arguments,
and it seems Dr. Carrier believes so as well; recall from chapter three:
"The theorem was discovered in the late eighteenth century and
has since been formally proved, mathematically and logically, so we
now know its conclusions are always necessarily true if its premises
are true"
Though the claim is surprisingly not given a reference, Carrier himself suggests
exactly Jaynes. But I think it is evident Dr. Carrier is in opposition to most
of Jaynes's philosophical points and assumptions from first to last chapter. For
instance, Dr. Carrier advances several different notions of probability (probability,
physical probability, epistemic probability, hypothesized probability) and of
frequencies. Suppose all of these are equivalent to what Jaynes calls probability
and frequency; in that case why confuse the language and not simply talk about
probability and frequency?
The most logical conclusion, which I repeat I think is very evident simply from
noting the differences in opinion I have documented above, is that Dr. Carrier is in
opposition to Jaynes and by extension Cox and most other Bayesian thinkers of
the 20th century. In that case why should we think Bayes theorem holds? How
do we set out to prove it? Simply pointing to the Kolmogorov axioms won't cut
it: Sure, that gives us a mathematical theory of probabilities, but why suppose
it applies to historical questions any more than the theory of matrices?
The alternative is that Dr. Carrier is in agreement with e.g. Jaynes and
Cox and I have just been too sloppy to see it. For instance, the re-interpretation
of epistemic probabilities as frequencies is really just something added on top
of the Bayesian framework. Well, if it is just something we add and it has no
normative effect in terms of our calculations, I think Laplace's reply is in order:
"[No, Sire,] I had no need of that hypothesis".



The problem with interpreting probabilities as frequencies is in my opinion
reflected throughout the book, for instance when Dr. Carrier proposes how one
should arrive at priors from frequencies. The problem can be summarized as
follows: Suppose you want to assign a prior probability to some event E. You
observe E happening n times out of N. What is the prior probability p(E)?
For probability to have a quantitative applicability to history it is crucial to
arrive at objective ways of specifying prior probabilities. For instance, in the
example of the Criteria of Embarrassment we must be able to estimate numbers
such as P(P) (the probability a gospel is preserved) or P(Em) (the probability
a story is embarrassing). Without such machinery, Bayes theorem will just be
a consistency requirement without the ability to provide quantitative results.
To give a concrete example of how Dr. Carrier treats this problem, consider the
following from chapter six, in the section on determining a reference class:


If our city is determined to have had that special patronage, and
our data shows 67 out of 67 cities with their patronage have public
libraries, then the prior probability our new city did as well will now
be over 98%.
Laplace's rule of succession is invoked here to arrive at the figure 98%, as it often
is throughout the book, but without any consideration of where it comes from or
whether its specific assumptions are fulfilled. In fact, one would not get the
impression from reading the book that Laplace's rule is a Bayesian method at
all, but I digress.
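For reference, the 98% figure follows from Laplace's rule of succession: with s successes observed in n trials and a uniform prior on the unknown frequency, the posterior predictive probability of another success is (s + 1)/(n + 2). A minimal sketch (the function name is mine):

```python
def rule_of_succession(s, n):
    """Laplace's rule: posterior predictive probability of another success
    after s successes in n trials, assuming a uniform prior on the frequency."""
    return (s + 1) / (n + 2)

p = rule_of_succession(67, 67)  # the 67-out-of-67 libraries example
print(round(p, 4))  # 0.9855, i.e. "over 98%"
```

Note the uniform-prior assumption baked into the rule: it is precisely the kind of modelling assumption that should be checked before the rule is invoked.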
Now consider the following example of a more elaborate problem on libraries
in two provinces:
To illustrate this, the libraries scenario can be represented with
this Venn diagram [see figure 1].

[Figure 1: Venn diagram from Proving History]

In this example, P(LIBRARY|RC) = 0.80, P(LIBRARY|IT) = 0.90, and
P(LIBRARY|NP) = 0.20. What's unknown is P(LIBRARY|C), the frequency
of libraries at the conjunction of all three sets. If we use the shortcut of
assigning P(LIBRARY|C) the value of P(LIBRARY|NP) < P(LIBRARY|C) <
P(LIBRARY|IT), that is, P(LIBRARY|C) can be any value from
P(LIBRARY|NP) to P(LIBRARY|IT), then the first concern is how likely it
is that P(LIBRARY|C) might actually be less than P(LIBRARY|NP), or more
than P(LIBRARY|IT), and the second concern is whether we can instead
narrow the range. Given that we know Seguntium lacked special patronage, in
order for P(LIBRARY|C) < P(LIBRARY|NP), there have to be regionally
pervasive differences in the means and motives of veteran settlers in Italy -
enough to make a significant difference from veteran settlers in the rest of the
Roman empire. And indeed, on the other side of the equation, for
P(LIBRARY|C) > P(LIBRARY|IT) these deviations would have to be
remarkably extreme, not only because P(LIBRARY|IT) > P(LIBRARY|RC),
but also because P(LIBRARY|RC) is already >> P(LIBRARY|NP), which to
overcome requires something extremely unusual. Lacking evidence of such
differences, we must assume there are none until we know otherwise, and even
becoming aware of such differences, we must only allow those differences to
have realistic effects (e.g., evidence of a small difference in conditions cannot
normally warrant a huge difference in outcome; and if you propose something
abnormal, you have to argue for it from pertinent evidence, which all
constitutes attending to the contents of b and its conditional effect on
probabilities in BT). However, we would have to say all the same for
P(LIBRARY|C) > P(LIBRARY|NP), since we have no more evidence that
P(LIBRARY|C) is anything other than exactly P(LIBRARY|NP). All we have
is the fact that P(LIBRARY|IT) is higher than P(LIBRARY|RC), but that in
itself does not even suggest an increase in P(LIBRARY|NP), and certainly
not much of an increase. Thus P(LIBRARY|NP) < P(LIBRARY|C) <
P(LIBRARY|IT) introduces far more ambiguity than the facts warrant. There
is every reason to believe P(LIBRARY|C) ≈ P(LIBRARY|NP) and no reason
to believe being in Italy makes that much of a difference, especially as
P(LIBRARY|IT) is only slightly greater than P(LIBRARY|RC), which does
suggest only a small rather than a large difference between Italy and the rest
of the empire, and likewise we should expect the large disparity between
P(LIBRARY|NP) and P(LIBRARY|RC) to be preserved between
P(LIBRARY|C) and P(LIBRARY|IT), as the causes producing the first
disparity should be similarly operating to produce the second, unless, again,
we have evidence otherwise. In short, NP appears to be far more relevant a
reference class than IT in this case and should be preferred until we know
otherwise. And if we also use a fortiori values (setting the probability at, say,
10-30%), we will almost certainly be right to a high degree of probability. All
this constitutes a more complex application of the rule of greater knowledge.
When you have competing reference classes entailing a higher and a lower
prior, if you have no information indicating one prior is closer to the actual
(but unknown) prior, then you must accept a margin of error encompassing
both, but when you have information indicating the actual prior is most
probably nearer to one than the other, you must conclude that it is (because,
so far as you know, it is). In short, we can already conclude that it's so
unlikely that P(LIBRARY|C) deviates by any significant amount from
P(LIBRARY|NP) that we must conclude, more probably than not,
P(LIBRARY|C) ≈ P(LIBRARY|NP), regardless of the difference between
P(LIBRARY|IT) and P(LIBRARY|RC). And as in this case, so in many
others you'll encounter.
I will admit that after six readings I am still not quite certain what exactly is
being argued above, and, put generously, I think it is another example of the
book's sometimes less than lucid style of writing.
The reason the argument is hard to follow is that the problem is
underdetermined, meaning one has to add additional assumptions to get a definite result.5
What is perhaps surprising is that Bayes theorem was not invoked, according
to which the analysis is actually fairly straightforward. All probabilities here are
conditioned on RC. By Bayes theorem:

P(L|NP,IT) = P(NP,IT|L) P(L) / P(NP,IT)
Thus the argument is actually fairly simple: Since almost all provinces that
are NP do not have a library, P(L|NP) = 0.2, and almost all provinces that
are IT do, P(L|IT) = 0.9, it follows that, all things being equal, if a province
has a library, there is less chance it is both NP and IT at the same time than
otherwise. For instance, if we assume the distribution factorizes, P(NP,IT) =
P(NP)P(IT) and P(NP,IT|L) = P(NP|L)P(IT|L), then applying Bayes
theorem two times gives:

P(L|NP,IT) = P(NP,IT|L) P(L) / P(NP,IT) = P(L|NP) P(L|IT) / P(L) = (0.2 × 0.9) / 0.8 ≈ 0.22,

which is in agreement with the discussion above.
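The calculation can be sketched numerically as follows (variable names are mine; all probabilities are conditioned on RC as stated above, so P(L) = P(LIBRARY|RC) = 0.80, and the factorization is an assumption made purely for illustration):

```python
# Bayes-theorem estimate of P(L | NP, IT) under the factorization assumption:
# P(L | NP, IT) = P(L|NP) * P(L|IT) / P(L), with everything conditioned on RC.
p_L = 0.80            # P(LIBRARY | RC)
p_L_given_NP = 0.20   # P(LIBRARY | NP)
p_L_given_IT = 0.90   # P(LIBRARY | IT)

p_L_given_NP_IT = p_L_given_NP * p_L_given_IT / p_L
print(round(p_L_given_NP_IT, 3))  # 0.225, i.e. approximately 0.22
```

Three lines of arithmetic thus replace the page-long verbal argument quoted above, which is rather the point of using Bayes theorem in the first place.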
In reality one should of course never attempt such an argument. Clearly
the relevant piece of information is the number of libraries in the provinces,
in addition to a number of other things we would know, and we should not
simply assume independence or use some other ad-hoc handwaving argument to
get a prior. It is difficult to say what one should actually do. My first advice
to students would be to come up with a way to validate that whatever method
they devised actually worked, but this is evidently very difficult to do for
historical problems. Since any computation using Bayes theorem relies on the
estimation of many such probabilities, it goes without saying this is an important
difficulty.



Dr. Carrier attempts to apply Bayes theorem to problems in history, specifically
the existence of Jesus. I emphasize I think this is an interesting idea, and while
I am uncertain we will find Jesus (or not) at the end, I am sure Dr. Carrier can
get something interesting out of the endeavour. Furthermore, the sections of
the book which discuss history are both entertaining and informative. However
5 If the problem was actually the way it is stated, one should apply the method of maximum
entropy (again see Jaynes).


the book should not be evaluated as a treatment of history but as advocating
the use of Bayes theorem for advancing historical Jesus studies. For this to be
successful, two conditions must be met:
• The main problem keeping back Jesus studies is one of inference, and not,
for instance, forming hypotheses, discerning how a priori plausible ideas are,
or similar.
• We can easily pose the inference problems in a numerical format suited
for Bayesian analysis; i.e. the problems with granularity of language,
formalization of our thoughts and so on can be overcome.
Are these two conditions actually met? Unfortunately I feel the book has
great difficulties in its treatment of its various main claims, which I have
discussed in the review and will summarize here:
• The proof that historical methods reduce to the application of Bayes theorem
is either false or does not demonstrate anything which one would not already
accept as true if a Bayesian view of probabilities is accepted.
• A thorough treatment of a historical problem will include a great many
interacting variables, with little chance of checking the modelling
assumptions.
• The practical assignment of probabilities (and determination of proper
reference classes) is unresolved. The main example of assigning probabilities
(the libraries example discussed above) relies on a non-standard argument
and cannot be said to be practical.
To convincingly make the case that Bayes theorem can advance history, one
needs lots and lots of worked-out examples. Unfortunately the book contains
nearly none of these, and I would say the only time it tries to venture into the
historical method – the case of the criteria of embarrassment – it does so in a
fashion that is both distinctly non-Bayesian and without a way to encode that
something is actually embarrassing to the author.
The book has even greater difficulties when it addresses foundational issues
such as the proposed resolution of the frequentist and Bayesian views of
probability. An important problem with this proposal is that it is flawed by
virtue of not being able to represent probabilities like 1/√2 if taken at face
value; however, even a flawed suggestion could be interesting reading if it
provided the reader with a comprehensive and accurate account of the underlying
problem and the current Bayesian resolution. Unfortunately this is not found
in the book; indeed it is impossible to find (non-circular) definitions of the
most basic concepts such as probabilities and frequencies within it. Instead the
book introduces a plethora of important-sounding terms (epistemic probability,
hypothetical probability, true frequency, etc.) which are rapidly introduced
together with elaborate thought-experiments. For the most part these
thought-experiments fail to demonstrate anything concrete, and worse, may give the
unsuspecting reader the impression something important and widely accepted
is being conveyed. This discussion could easily be extended to the many other
oddities found in chapter six, and it is difficult not to get the impression that
sections of the book were written in a hurry.