# What Do Words Of Estimative Probability Mean?

An Exercise In Analyst Education

Kristan J. Wheaton


I was cleaning my office this week in anticipation of a new term (we are on a quarter
system at Mercyhurst) and ran across the results of a classroom exercise I conduct on
the meaning of words of estimative probability[1] (such as “likely” or “virtually
certain”), or, as they are commonly referred to around here, WEPs. I thought some
discussion of the exercise and its results would be of interest to intelligence studies
students and educators.[2]

The value of WEPs is, of course, an ongoing question both within the intelligence
community and among its critics. At one end of the spectrum are those, like Michael
Schrage, who call for numeric estimates (x has a 75% chance of happening, plus or
minus 10%, that sort of thing). At the other end are those whom Sherman Kent called
“poets”, who believe that it doesn’t matter what an analyst says because policymakers
and others will interpret the analysis however they wish. The intelligence community
(IC) has recently moved further toward a position that, while not quite as extreme as
Schrage’s, is clearly on his side of the spectrum, and has adopted it as the “best
practice” for effectively communicating the results of intelligence analysis to
decisionmakers.

Much of the reason for using WEPs instead of numbers centers on the imprecise nature of
intelligence analysis in general, coupled with the misunderstandings that could arise in
the minds of decisionmakers if analysts used numbers to communicate their estimative
judgments. A large part of the argument against WEPs, on the other hand, has to do with
the imprecise meaning of the words themselves. In other words, what exactly does
“likely” mean? That is where I intend to go next.

[1] This article started out as a blog post (see details below). I have kept the hyperlinked endnotes for the
convenience of the reader in this draft version of the text.

[2] Note: This is another attempt at what I call "experimental scholarship" (see this series for my first
attempt). The discussion regarding the use of blogs as a way to publish scholarly works (or, in my case,
more-or-less scholarly works...) is pretty hot and heavy right now. However, I found writing an article in
the form of a series of blog posts extraordinarily useful the first time, if only for the comments I
received, which I am sure will make any traditional journal article that much better. It was the positive
feedback I received from that experience that makes me want to give it another go.

To Kent And Beyond!

The discussion of Words of Estimative Probability (WEPs) starts with Sherman Kent’s
seminal essay on the topic but hardly ends there. Linguistics experts have done a large
number of studies on what they refer to (among other things) as “verbal expressions of
probability”, “verbally expressed uncertainties” or “verbal probability expressions”.
Others, in the fields of finance, health and meteorology, have also wrestled with this
question.[3]

Within the IC, though, there appear to be only a limited number of studies on the topic.
Steve Rieber, now with the DNI, presented his own paper on the meaning of WEPs a
couple of years ago at the International Studies Association conference. At the time, he
cited only two studies as major research findings within the realm of intelligence
analysis. One was in Dick Heuer’s classic, Psychology of Intelligence Analysis. The other,
which was at least part of the basis for Rieber's paper, was a study of Kent School
analysts. In the study cited from Heuer, analysts gave a single numerical probability for
each word. For example, one analyst might claim that the word “likely” suggests a 75%
probability while another might claim that it suggests only a 60% probability. Kent
School analysts, on the other hand, were asked to give a range of values for each word.
The charts showing both results are below (Heuer's is on this page; Rieber's is on the
next).

[3] Rachel Kesselman, a student at Mercyhurst, will address all these literatures at some length in her thesis.
She is scheduled to present her preliminary findings at the ISA conference at the end of March and will
likely complete her thesis (which focuses on the historical use of WEPs in National Intelligence Estimates)
sometime in May or June 2008. I won’t steal her thunder, then, but suffice it to say that this is a well-
studied topic outside the IC.

The conclusion from both studies was that the level of agreement was rough, to say the
least. There was a distinct difference between words at either end of the spectrum (such
as “highly unlikely” and “highly likely”) but differences between words that were closer
together in meaning (such as “probably” and “likely”) hardly seemed to be differences at
all.

Other writers have tried to establish statistical meanings for the words more or less by
decree, simply declaring that certain words have certain probabilistic meanings. Kent's
own attempt fell largely along these lines, as does the recent attempt by the authors of
Joint Publication 2-0, "Joint Intelligence", Appendix A (published 22 JUN 07). The
fundamental problem with dictating these intervals is that it ignores the considerable
evidence (including the two studies cited above) suggesting that people don't think about
these words in such rigid ways. (The problems with the Joint Pub run even deeper: it
unnecessarily confuses the ideas of probability and confidence and is, as a consequence,
180 degrees out from what the National Intelligence Council was promulgating at
approximately the same time! All this argues, I might add, for more research into
intelligence theory and, in the interim, for some standardized estimative language that
reflects the current best practice.)

What is clear, however, is that decisionmakers want clarity and consistency in the
language of intelligence estimates. One of Mercyhurst’s alumni, Jen Wozny, wrote a very
strong thesis on this subject a number of years ago (available, unfortunately, only through
interlibrary loan from Mercyhurst's Hammermill Library). She looked at what over 40
decisionmakers from the national security, business and law enforcement fields wanted
from intelligence. Two of the items that consistently came up were clarity and
consistency in the language intelligence analysts used to communicate the results of
their analysis.

The Exercise And Its Learning Objectives

The issue of the use of Words Of Estimative Probability (WEPs) is one of the most
significant theoretical issues in the intelligence profession. What is the best way to
communicate the results of intelligence analysis to decisionmakers? If it is to be through
WEPs, shouldn’t we know what they mean? This is why I think the work of Kent, Heuer,
Rieber and, soon, Kesselman, is so enormously important.

At Mercyhurst, we have been teaching WEPs as the “best practice” for communicating
with decisionmakers since at least 2003 and probably well before that. While we teach it
as a best practice, we do not avoid the controversy surrounding this practice. The
classroom exercise that I am about to describe is specifically designed to highlight both
the strengths and weaknesses of WEPs. My goal is to get my students to understand the
limits as well as the utility of WEPs, to get them to think about the boundaries implicit in
any theory and not just to “know stuff”.

Therefore, this classroom exercise does not present the meanings of WEPs to the students
as a fait accompli. It is designed to capture both the point value (Heuer) and the range of
values (Rieber) behind a select series of WEPs. The WEPs I choose come directly from
the recent series of National Intelligence Estimates (NIEs). These NIEs, which I have
discussed in detail elsewhere, all include a sort of scale that conveys the relative
probabilities associated with particular words without actually mentioning any numbers.
I have included a graphic of the scale (taken from the most recent NIE on Iran and its
nuclear ambitions) below.

To set the stage for the exercise, I converted the scale above into the graphic below. Note
that I left the two right-hand column headings empty and that I separated the words
"probably" and "likely" into their own rows. I did this to help me make some key
teaching points later on.

I hand out this sheet to each student and ask them to state, in terms of a single number
(as with the study reported by Heuer), what each word means in terms of probability. I
usually give them an example such as: "If you think 'remote' means a 1% chance of
whatever it is you are studying happening, then write '1' in the block for 'remote'." I
always choose "remote" or "virtually certain" for these examples because I know I run the
risk of anchoring the students when I give such an example, and I figure it is safest to
anchor at the extremes, where it is less likely to influence the overall outcome.

Once all of the students have filled in the first column, I ask them to label the next two
columns "Low" and "High" and to write in those columns the lowest and the highest
percentage they would assign to each word. Once they have completed this task, I ask
them to calculate the difference between the values of successive words in the "Odds"
column (for example, if a student wrote 1 for "remote" and 20 for "very unlikely", then
the difference would be 19). I also ask them to calculate the range of their answers for
each word (for example, if the low score for "very unlikely" was 10 and the high score
was 30, then the range would be 20). In this part of the exercise, I am clearly mirroring
the study reported by Rieber.
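Instructors who would rather check the students' arithmetic with a script can sketch the two calculations in a few lines of Python. This is a minimal sketch under stated assumptions. The eight word labels are reconstructed from this article, and the actual sheet may word them differently. The `student` answers are hypothetical numbers chosen to reproduce the two worked examples in the instructions, not real classroom data.

```python
# ASSUMPTION: the eight rows of the sheet, reconstructed from this article's
# text, in increasing order of likelihood. The real sheet may differ.
WEPS = [
    "remote", "very unlikely", "unlikely", "even chance",
    "probably", "likely", "very likely", "virtually certain",
]

# Hypothetical answers for one student: word -> (odds, low, high), in percent.
student = {
    "remote": (1, 0, 5),
    "very unlikely": (20, 10, 30),
    "unlikely": (30, 20, 40),
    "even chance": (50, 45, 55),
    "probably": (65, 55, 75),
    "likely": (70, 60, 80),
    "very likely": (85, 75, 90),
    "virtually certain": (97, 90, 100),
}


def odds_differences(answers):
    """Difference between each word's point value and the previous word's."""
    odds = [answers[w][0] for w in WEPS]
    return {WEPS[i]: odds[i] - odds[i - 1] for i in range(1, len(WEPS))}


def ranges(answers):
    """High score minus low score for each word."""
    return {w: answers[w][2] - answers[w][1] for w in WEPS}


print(odds_differences(student)["very unlikely"])  # 19
print(ranges(student)["very unlikely"])            # 20
```

The first word has no predecessor, so the sketch reports seven differences for eight words.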

Handing out the forms, explaining the instructions and actually having the students fill in
the sheets can take as little as 5 or as many as 15 minutes depending on the types of
students and their level of sophistication with WEPs in general. In my last class where I
used this specific exercise I think it took me all of 5 minutes but that class was quite
bright and very used to the concept of WEPs. Once all the numbers have been entered
and the calculations completed, it is time to start making teaching points.

Teaching Points

Given the withering criticism offered by Kent and Schrage and the wide range of other
studies regarding the appropriate interpretation of Words of Estimative Probability
(WEPs), it is fairly easy to get intelligence studies students to see the problems with using
"bad" WEPs in their estimative statements. Bad WEPs, which include such words as
"could", "may", "might" and "possible", convey such a broad range of probabilities that,
in the best case, they do little to reduce a decisionmaker's uncertainty concerning an issue
and, at worst, create the sense, in the decisionmaker's mind, that the analyst is simply
trying to cover his or her backside in the event of a failed estimative conclusion.

Student analysts, then, are generally happy to see that the National Intelligence Council
(NIC) has "solved" this problem with its notional scale of appropriate WEPs (the scale
is available on page five of the latest Iran NIE and was discussed earlier). This scale not
only provides adequate gradations of probability (translated into words, of course) but
also avoids both numbers and bad WEPs. For different reasons, avoiding each appears to
be a goal of the NIC in these public documents.

While there are many possible ways to explore with students the data generated by this
exercise, my primary teaching point is to disabuse entry-level analysts of the idea that the
problems of communicating estimative conclusions to decisionmakers have been, in any
way, "solved". Rather, I want my students to come away with the idea that using WEPs in
a more-or-less formal way, while currently the best practice, is a system that can still be
improved upon; that it is an important question of intelligence theory that deserves [...]
I generally start the review of the results of the exercise by exploring how "rational" (in a
classical economic sense) the students were in assigning point values and ranges to the
various WEPs. I point out that the words are clearly arranged in increasing order of
likelihood and that, absent other information, it makes sense to assign levels of
probability to the words at equal intervals. There are eight words and 100 percentage
points, so a wholly "rational" person would place the words about 12.5 percentage points
apart (100/8). When you ask students, however, to look at the differences between the
point values of successive words, they will typically see nothing that comes even close to
this rational approach. The vast majority of students will have assigned probabilities
intuitively, with little regard for the mathematical difference between one word and
another.
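The equal-interval benchmark can be made concrete. The sketch below assumes that "rational" placement means dividing the 0-100 scale into eight equal bins and putting each word at the centre of its bin. The function names and the sample point values are hypothetical, for illustration only.

```python
# ASSUMPTION: eight words, ordered from least to most likely.
N_WORDS = 8
SCALE = 100.0


def ideal_point_values(n_words=N_WORDS, scale=SCALE):
    """Centre of each of n equal-width bins on a 0-100 scale."""
    width = scale / n_words
    return [width * (i + 0.5) for i in range(n_words)]


def spacing_gaps(odds, n_words=N_WORDS, scale=SCALE):
    """How far each successive difference strays from the equal-interval gap."""
    ideal_gap = scale / n_words  # 12.5 points for eight words
    return [round((b - a) - ideal_gap, 2) for a, b in zip(odds, odds[1:])]


print(ideal_point_values())
# [6.25, 18.75, 31.25, 43.75, 56.25, 68.75, 81.25, 93.75]

# Hypothetical point values for one student, in the same word order.
print(spacing_gaps([1, 20, 30, 50, 65, 70, 85, 97]))
# [6.5, -2.5, 7.5, 2.5, -7.5, 2.5, -0.5]
```

Strict equal spacing puts the fourth word, "even chance", at 43.75 rather than 50. That tension resurfaces when the class examines that phrase.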

The results are even worse when you ask students to look at the range of values for each
word. Again, the rational person would have assigned equal ranges to each of the words,
but students typically do not. A good exercise at this point is to pick a word, find out
who in the class gave the lowest score and who gave the highest, and then ask those
students to justify their choices. This range is typically quite broad, and the
justifications for selecting one number over another are typically quite vague.
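The "lowest and highest in the class" step is a simple lookup over everyone's sheet. In this sketch the class data, the student labels and the `extremes` helper are all hypothetical. Each answer is stored as an (odds, low, high) tuple.

```python
# Hypothetical class data: student label -> {word: (odds, low, high)}.
class_answers = {
    "A": {"likely": (70, 60, 80)},
    "B": {"likely": (75, 65, 85)},
    "C": {"likely": (60, 40, 70)},
}


def extremes(answers_by_student, word):
    """Return (student, low) for the lowest low and (student, high) for the highest high."""
    lows = {s: a[word][1] for s, a in answers_by_student.items()}
    highs = {s: a[word][2] for s, a in answers_by_student.items()}
    lowest = min(lows, key=lows.get)
    highest = max(highs, key=highs.get)
    return (lowest, lows[lowest]), (highest, highs[highest])


print(extremes(class_answers, "likely"))  # (('C', 40), ('B', 85))
```

The two names returned are the students to call on. The gap between their numbers (here 45 points) is the class-wide spread for that word.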

Inevitably, there will be a handful of students in each class who have, in fact, done the
math and calculated both the point values and the ranges accordingly. This exercise offers
two places to highlight the problems with this approach. First, the exercise separates out
the words "probably" and "likely". That is not the case with the NIC's chart, which treats
the two words as synonymous. It is quite surprising for the NIC to treat these words this
way, since much of the literature does not indicate that people actually see them as
synonymous. In this exercise, however, the net effect is to create a learning opportunity.
It is rare for a student's mathematical calculations to take into account the idea that two
words may be partly or largely synonymous.

Likewise, there is an even better chance for learning in examining the results for the
"even chance" WEP. "Even chance" would appear to mean exactly what it says: an even
chance, 50-50. Some students will inevitably interpret it in this literal way, assigning a
point probability of 50% to the WEP and also marking both its high and low scores at 50%.
Other students will read the phrase more generally. While typically giving it a point value
of 50%, they will also include a range of values around it, such that "even chance" could
mean anything from 40% to 60%! Of course, there is no right answer here; both sides can
make valid arguments, and fomenting this discussion is the ultimate point of this part of
the exercise.

The relative firmness of "even chance", coupled with the synonymity problem described
earlier, also lends itself to a further examination of the mathematical approach. Few of the
mathematicians in the class will have noticed that there are three WEPs below “even
chance” and four above it. This creates an uneven distribution around the 50% (more or
less) probability ascribed to the phrase "even chance". A wholly logical approach would
therefore space the point values and the ranges of the WEPs below "even chance"
differently from those above it.
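The three-below, four-above asymmetry is easy to quantify. Assume, hypothetically, that "even chance" is pinned at exactly 50% with no width of its own and that each side of the scale is divided into equal bins. Under those assumptions, the words below it get bins about 16.7 points wide, while the words above it get 12.5-point bins.

```python
def split_around_even_chance(n_below=3, n_above=4, anchor=50.0, scale=100.0):
    """Equal-width bins on each side of 'even chance', pinned at `anchor`.

    ASSUMPTION: 'even chance' is treated as a single point with no width.
    Returns (bin width below, bin width above, centres below, centres above).
    """
    below_width = anchor / n_below
    above_width = (scale - anchor) / n_above
    below = [below_width * (i + 0.5) for i in range(n_below)]
    above = [anchor + above_width * (i + 0.5) for i in range(n_above)]
    return below_width, above_width, below, above


below_w, above_w, below, above = split_around_even_chance()
print(round(below_w, 2), above_w)    # 16.67 12.5
print([round(x, 2) for x in below])  # [8.33, 25.0, 41.67]
print(above)                         # [56.25, 68.75, 81.25, 93.75]
```

The NIC chart treats "probably" and "likely" as one word. Doing the same here means calling `split_around_even_chance(n_above=3)`, which makes the two sides symmetric again. This is one way to show students how the synonymity question and the "even chance" question interact.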

Students are typically confused by the end of this exercise. While they do (or should)
fully understand the problems with waffle words such as "could", "may", "might" and
"possible", and were willing to applaud the NIC's efforts at standardization, they now see
these "approved" words as far more squishy than they had previously thought. Good.
This is exactly the time to reinforce the message laid out at the beginning of this article;
to bring students back full circle. As analysts, they have an obligation to communicate as
effectively as possible the results of their intelligence analysis to decisionmakers. What
this exercise and the learning that went on before it demonstrate is that there is not yet a
perfect way to do this; there is only a best practice that tries to balance the competing
concerns. In my mind, it is the degree to which students come to understand not only the
best practice but also these concerns that marks the difference between a well-trained
analyst and a well-educated one.

A Surprise Ending

So far in this series, I have discussed the issues surrounding the use of Words Of
Estimative Probability as a way of communicating the results of intelligence analysis to
real-world decisionmakers. I have tried to devise an exercise that demonstrates to
intelligence studies students that, while a consistent and limited series of so-called "good"
WEPs constitutes the current "best practice" in communicating the results of analysis, it
is far from a perfect system. The National Intelligence Council (NIC) has adopted such a
series for its recent National Intelligence Estimates (NIEs). Studies both within the
intelligence community and from fields such as medicine, finance and meteorology have
all demonstrated that people assign only roughly consistent meanings to WEPs: one
person's "likely" is another person's "virtually certain".

As I looked at the data from my most recent round of this classroom exercise, though, I
began to notice something interesting. There seemed to be a level of consistency in the
data that I had not noticed before. Was it there previously and I just missed it? I don't
know. I don't typically keep the data from these exercises, and the only reason I had this
batch was that it was buried in one of the many piles of paper in my office (I believe in
that ancient organizational system: mounding).

I decided to take a closer look at the data, and I was surprised by what I saw. While some
individuals were throwing the full range out of whack (and keeping the teaching points in
the exercise relevant), these were clearly statistical outliers. The bulk of the students were
congregating quite nicely around an approximately ideal trendline. To be sure, the results
were still off in places, but they were much closer to optimal than I expected.

I have reproduced the aggregate results in a chart below. I have used what financial
analysts call a high-low-close chart, which marks the average high score, the average low
score and the average point value for each WEP. I have also included the idealized
trendline and have connected the high and low averages so you can see how the range
fluctuates as the probabilities associated with the WEPs increase.

If you want to see the raw data, I have included it in the chart below:

(Notes on the chart: The "High" column represents the average high score, while the
"Low" column represents the average low score for each WEP. The "Odds" column
represents the average point value given for each WEP. The "High-Low" column
represents the range (difference between high and low score) for each WEP. The
"Odds-odds" column represents the difference between the average point values of
successive WEPs. N=18)
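The article does not spell out exactly how the aggregate columns were computed. If each column is a simple arithmetic mean over all students, something like the following would reproduce the table. That is an assumption. The two-student, two-word data set is hypothetical and far smaller than the N=18 class.

```python
from statistics import mean


def aggregate(answers_by_student, weps):
    """Build one row per WEP with the columns described in the chart notes.

    ASSUMPTION: each column is a plain arithmetic mean over all students.
    answers_by_student maps a student label to {word: (odds, low, high)}.
    """
    rows = []
    prev_odds = None
    for w in weps:
        odds = mean(a[w][0] for a in answers_by_student.values())
        low = mean(a[w][1] for a in answers_by_student.values())
        high = mean(a[w][2] for a in answers_by_student.values())
        rows.append({
            "WEP": w,
            "High": high,
            "Low": low,
            "Odds": odds,
            # Difference of the averages (equal to the average of each student's range).
            "High-Low": high - low,
            # Change in average point value from the previous WEP (none for the first).
            "Odds-odds": None if prev_odds is None else odds - prev_odds,
        })
        prev_odds = odds
    return rows


# Hypothetical two-student, two-word data set.
students = {
    "A": {"remote": (1, 0, 5), "very unlikely": (20, 10, 30)},
    "B": {"remote": (3, 0, 10), "very unlikely": (15, 5, 25)},
}
for row in aggregate(students, ["remote", "very unlikely"]):
    print(row)
```

A trimmed mean or median would be a reasonable alternative if the outliers mentioned earlier dominate the averages. The sketch sticks to the plain mean.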

While I know there are statistical nuances I have not accounted for in the way I have
calculated and displayed the data, the overall pattern suggests to me that there may be
something interesting going on here. We are pretty adamant about the use of good WEPs
here at Mercyhurst. The students in this exercise have been exposed to that thinking, and
it seems to have calibrated their use of WEPs to a certain degree.

There is, in fact, precedent for this kind of calibration. According to Rachel Kesselman's
early results, the medical profession, with outside pressure from the insurance industry,
has adopted a more-or-less "accepted" meaning for a number of WEPs (used primarily in
prognostic statements to patients and their families). The same thing might well be
happening here.[4]

The key seems to be, in all these cases, outside pressure. In the case of our students the
pressure comes from the professors. In the case of the medical profession, the pressure
comes from the insurance companies. I have already argued that the potential for public
exposure of the results of NIEs is one of the primary drivers behind a more consistent and
rigorous approach to the communication of estimates in general. It may well be that this
potential for public exposure will force the meanings of WEPs to collapse around certain
estimative ranges as well.

[4] My colleague, Steve Marrin, has written a number of papers on the more general aspects of the medical
analogy to the intelligence profession. All are worth checking out.

