What Do Words Of Estimative Probability Mean?
An Exercise In Analyst Education
Kristan J. Wheaton
Mercyhurst College 501 E. 38th St. Erie, PA 16546 #814 824 2023 email@example.com www.sourcesandmethods.blogspot.com
Wheaton – What Do WEPs Mean?
I was cleaning my office this week in anticipation of a new term (we are on a quarter system at Mercyhurst) and I ran across the results of a classroom exercise I conduct regarding the meaning of words of estimative probability¹ (such as “likely” or “virtually certain”) or, as they are commonly referred to around here, WEPs. I thought some discussion of the exercise I use and the results of that exercise would be of interest to intelligence studies students and educators.²
The value of WEPs is, of course, an ongoing question both within the intelligence community and among its critics. At one end of the spectrum are those, like Michael Schrage, who call for numeric estimates -- x has a 75% chance of happening plus or minus 10%, that sort of thing. At the other end of the spectrum are those whom Sherman Kent called “poets”, who believe that it doesn’t matter what an analyst says; policymakers and others will interpret the analysis however they wish. The intelligence community (IC) has recently moved further in the direction of a position that, while not quite as extreme as Schrage’s, is clearly on that side of the spectrum as the “best practice” for effectively communicating the results of intelligence analysis to decisionmakers.
Much of the reason for using WEPs instead of numbers centers around the imprecise nature of intelligence analysis in general, coupled with the misunderstandings that could arise in the minds of decisionmakers if analysts used numbers to communicate their
1 This article started out as a blog post (see details below). I have kept the hyperlinked endnotes for the convenience of the reader in this draft version of the text.
2 Note: This is another attempt at what I call "experimental scholarship" (see this series for my first attempt). The discussion regarding the use of blogs as a way to publish scholarly works (or, in my case, more-or-less scholarly works...) is pretty hot and heavy right now. However, I found writing an article in the form of a series of blog posts extraordinarily useful the first time, if only for the comments I received, which I am sure will make any traditional journal article just that much better. It was the positive feedback I received from that experience that makes me want to give it another go.
estimative judgments. A large part of the argument against WEPs, on the other hand, has to do with the imprecise meaning of the words themselves. In other words, what exactly does “likely” mean? That is where I intend to go next.
To Kent And Beyond!
The discussion of Words of Estimative Probability (WEPs) starts with Sherman Kent’s seminal essay on the topic but hardly ends there. Linguistics experts have done a large number of studies on what they refer to (among other things) as “verbal expressions of probability”, “verbally expressed uncertainties” or “verbal probability expressions”. Others, in the fields of finance, health and meteorology, have also wrestled with this question.³ Within the IC, though, there appears to be only a limited number of studies on the topic. Steve Rieber, now with the DNI, presented his own paper on the meaning of WEPs a couple of years ago at the International Studies Association conference. At the time, he cited only two studies as major research findings within the realm of intelligence analysis: one in Dick Heuer’s classic, Psychology of Intelligence Analysis, and one (at least part of the basis for Rieber's paper) from a study of Kent School analysts.
In the study cited from Heuer, analysts gave a single numerical probability for each word. For example, one analyst might claim that the word “likely” suggests a 75% probability while another might claim that it suggests only a 60% probability. Kent School analysts, on the other hand, were asked to give a range of values for each word. The charts showing both results are below (Heuer's is on this
3 Rachel Kesselman, a student at Mercyhurst, will address all these literatures at some length in her thesis. She is scheduled to present her preliminary findings at the ISA conference at the end of March and will likely complete her thesis (which focuses on the historical use of WEPs in National Intelligence Estimates) sometime in May or June 2008. I won’t steal her thunder, then, but suffice it to say that this is a well-studied topic outside the IC.
page; Rieber's is on the next). The conclusion from both studies was that the level of agreement was rough, to say the least. There was a distinct difference between words at either end of the spectrum (such as “highly unlikely” and “highly likely”) but differences between words that were closer together in meaning (such as “probably” and “likely”) hardly seemed to be differences at all.
Other writers have tried to assign statistical meanings to the words by simply declaring that certain words have certain probabilistic meanings. Kent's own attempt fell along much the same lines, as does the recent attempt by the authors of Joint Publication 2-0, "Joint Intelligence", Appendix A (published 22 JUN 07). The fundamental problem with dictating these intervals is that it ignores the considerable evidence (including the two studies cited above) suggesting that people don't think about these words in these rigid ways. (The problems with the Joint Pub run even deeper, as it unnecessarily confuses the ideas of probability and confidence and is, as a consequence, 180 degrees out from what the National Intelligence Council was promulgating at approximately the same time! All this argues, I might add, for more research into intelligence theory and, in the interim, some standardized estimative language that reflects the current best practice.)
What is clear, however, is that decisionmakers want clarity and consistency in the language of intelligence estimates. One of Mercyhurst’s alumni, Jen Wozny, did a very strong thesis on this subject a number of years ago (available, unfortunately, only through inter-library loan at Mercyhurst's Hammermill Library). She looked at what over 40 decisionmakers, from the national security, business and law enforcement fields, wanted from intelligence.
Two of the items that consistently popped up were clarity and consistency in the language that intelligence analysts used to communicate the results of their analysis.
The Exercise And Its Learning Objectives
The issue of the use of Words Of Estimative Probability (WEPs) is one of the most significant theoretical issues in the intelligence profession. What is the best way to communicate the results of intelligence analysis to decisionmakers? If it is to be through
WEPs, shouldn’t we know what they mean? This is why I think the work of Kent, Heuer, Rieber and, soon, Kesselman, is so enormously important.
At Mercyhurst, we have been teaching WEPs as the “best practice” for communicating with decisionmakers since at least 2003 and probably well before that. While we teach it as a best practice, we do not avoid the controversy surrounding this practice. The classroom exercise that I am about to describe is specifically designed to highlight both the strengths and weaknesses of WEPs. My goal is to get my students to understand the limits as well as the utility of WEPs, to get them to think about the boundaries implicit in any theory and not just to “know stuff”. Therefore, this classroom exercise does not present the meanings of WEPs as a fait accompli to the students.
The exercise is designed to capture both the point value (Heuer) and the range of values (Rieber) behind a select series of WEPs. The WEPs I choose to use are those that come directly from the recent series of National Intelligence Estimates (NIEs). These NIEs, which I have discussed in detail elsewhere, all include a sort of scale that leaves the impression of the probabilities associated with particular words without actually mentioning any numbers. I have included a graphic (taken from the most recent NIE on Iran and its nuclear ambitions) of the scale below.
To set the stage for the exercise, I converted the scale above into the graphic below. Note that I left the two right-hand column headings empty and that I separated the words "probably" and "likely" into their own rows. I did this in order to help me make some key teaching points later on. I hand out this sheet to each student and ask them to state, as a single number (as with the study reported by Heuer), what each word means in terms of probability. I usually give them an example such as: "If you think "remote" means a 1% chance of whatever it is you are studying happening, then write "1" in the block for
remote." I always choose "remote" or "virtually certain" for these examples as I know I run the risk of anchoring the students when I give such an example and I figure it is safest to anchor at the extremes, where it is less likely to influence the overall outcome.
Once all of the students have filled in the first column, I ask them to label the next two columns "Low" and "High". I ask them to write the lowest and the highest percentage they would assign to each word in those two columns. Once they have completed this task, I ask them to calculate the difference between each word and the next in the "odds" column (for example, if a student wrote 1 for "remote" and 20 for "very unlikely" then the difference would be 19). I also ask them to calculate the range of their answers for each word (for example, if the low score for "very unlikely" was 10 and the high score was 30, then the range would be 20). In this part of the exercise, I am clearly mirroring the study reported by Rieber.
Handing out the forms, explaining the instructions and actually having the students fill in the sheets can take as little as 5 or as many as 15 minutes, depending on the types of students and their level of sophistication with WEPs in general. In my last class where I used this specific exercise I think it took me all of 5 minutes, but that class was quite bright and very used to the concept of WEPs. Once all the numbers have been entered and the calculations are complete, it is time to start making teaching points.
Teaching Points
Given the withering criticism offered by Kent and Schrage and the wide range of other studies regarding the appropriate interpretation of Words of Estimative Probability (WEPs), it is fairly easy to get intelligence studies students to see the problems with using "bad" WEPs in their estimative statements.
Bad WEPs, which include such words as "could", "may", "might" and "possible", convey such a broad range of probabilities that, at best, they do little to reduce a decisionmaker's uncertainty concerning an issue and, at worst, create the sense, in the decisionmaker's mind, that the analyst is simply trying to cover his or her backside in the event of a failed estimative conclusion.
Student analysts, then, are generally happy to see that the National Intelligence Council (NIC) has "solved" this problem with their notional scale of appropriate WEPs (the scale is available on page five of the latest Iran NIE and was discussed earlier). This scale not only provides adequate gradations of probability (translated into words, of course) but also avoids the use of either numbers or bad WEPs, both of which, for different reasons, appear to be goals of the NIC in these public documents.
While there are many possible ways to explore with students the data generated by this exercise, my primary teaching point is to disabuse entry-level analysts of the idea that the problems regarding communicating estimative conclusions to decisionmakers have been, in any way, "solved". Rather, I want my students to come away with the idea that using
WEPs in a more-or-less formal way, while currently the best practice, is a system that can still be improved upon; that it is an important question of intelligence theory that deserves additional research and study.
I generally start the review of the results of the exercise by exploring how "rational" (in a classical economic sense) the students were in assigning point values and ranges to the various WEPs. I point out that the words are clearly ordered in increasing order of likelihood and that it makes sense, absent other information, to assign levels of probability at equal intervals to each of the words. There are eight words and 100 possible percentage points, and a wholly "rational" person would therefore place each word about 12 percentage points apart. When you ask students, however, to look at the differences between the point values of each word, they will typically see nothing that comes even close to this rational approach. The vast majority of students will have assigned probabilities intuitively, with little regard for the mathematical difference between one word and another.
The results are even worse when you ask students to look at the range of values for each word. Again, the rational person would have assigned equal ranges for each of the words, but students typically do not. A good exercise to do at this point is to pick a word, find out who in the class gave the lowest score and who gave the highest score, and then ask those students to justify their decisions. This range is typically quite broad and the justifications for selecting one number over another are typically quite vague.
Inevitably, there will be a handful of students in each class who have, in fact, done the math and calculated both the point values and the ranges accordingly. This exercise offers two places to highlight the problems with this approach. First, the exercise separates out the words "probably" and "likely".
That is not the case with the NIC's chart, which treats the two words as synonymous. While it is quite surprising for the NIC to treat these words this way, since much of the literature does not indicate that people actually see them as synonymous, the net effect in this exercise is to create a learning opportunity. It is rare for a student to have taken into account, in their mathematical calculations, the idea that two words may be partly or largely synonymous.
Likewise, there is an even better chance for learning in examining the results for the "even chance" WEP. "Even chance" would appear to mean exactly what it says -- an even chance, 50-50. Some students will inevitably interpret it in this literal way, assign a point probability of 50% to the WEP and also mark both its high and low scores at 50%. Other students will see the phrase more generally and, while typically giving it a point value of 50%, will also include a range of values around it such that "even chance" could mean anything from 40-60%! Of course, there is no right answer here; both sides can make valid arguments, and fomenting this discussion is the ultimate point of this part of the exercise.
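The worksheet arithmetic described above (point-value differences, low-to-high ranges, and the equal-interval "rational" baseline) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list and its ordering are assumed, since the worksheet graphic is not reproduced in this text; the sample sheet is invented apart from the two worked examples given earlier (1 and 20 for the first two point values, 10 and 30 for the "very unlikely" range); and placing each word at the centre of one of eight equal slices is just one way to space the words 100/8 = 12.5 points apart.

```python
# Assumed word list: eight WEPs in increasing order of likelihood.
# The exact worksheet wording is an assumption, not taken from the graphic.
WEPS = ["remote", "very unlikely", "unlikely", "even chance",
        "probably", "likely", "very likely", "virtually certain"]


def equal_interval_points(n_words):
    """Place each word at the centre of one of n equal slices of 0-100."""
    width = 100 / n_words
    return [width * (i + 0.5) for i in range(n_words)]


def point_differences(odds):
    """Difference between each word's point value and the next word's."""
    return [b - a for a, b in zip(odds, odds[1:])]


def ranges(lows, highs):
    """High score minus low score for each word."""
    return [high - low for low, high in zip(lows, highs)]


# One invented student sheet (percentages), for illustration only.
student_odds = [1, 20, 30, 50, 60, 70, 85, 99]
student_low = [0, 10, 20, 40, 55, 60, 75, 95]
student_high = [5, 30, 40, 60, 70, 80, 95, 100]

ideal = equal_interval_points(len(WEPS))
ideal_steps = point_differences(ideal)
student_steps = point_differences(student_odds)
student_ranges = ranges(student_low, student_high)

print("Ideal steps:   ", ideal_steps)
print("Student steps: ", student_steps)
print("Student ranges:", student_ranges)
```

Comparing the student's steps and ranges with the constant ideal step makes the "intuitive rather than mathematical" pattern easy to see for any one sheet.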
The relative firmness of "even chance" coupled with the synonymity problem described earlier also lends itself to a further examination of the mathematical approach. Few of the mathematicians in the class will have noticed that there are three WEPs below “even chance” and four above it, creating an uneven distribution centering on the 50% (more or less) probability ascribed to the phrase "even chance". A wholly logical approach would lead to an uneven distribution of both the point values and the ranges for those WEPs below "even chance" when compared with those WEPs above it.
Students are typically confused by the end of this exercise. While they do (or should) fully understand the problems with waffle words such as "could", "may", "might" and "possible", and were willing to applaud the NIC's efforts at standardization, they now see these "approved" words as far more squishy than they had previously thought.
Good. This is exactly the time to reinforce the message laid out at the beginning of this article; to bring students back full circle. As analysts, they have an obligation to communicate as effectively as possible the results of their intelligence analysis to decisionmakers. What this exercise and the learning that went on before it demonstrate is that there is not yet a perfect way to do this; there is only a best practice that tries to balance the competing concerns. In my mind, it is the degree to which students come to understand not only the best practice but also these concerns that marks the difference between a well-trained analyst and a well-educated one.
A Surprise Ending
So far in this series, I have discussed the issues surrounding the use of Words Of Estimative Probability as a way of communicating the results of intelligence analysis to real-world decisionmakers.
I have tried to devise an exercise that can demonstrate to intelligence studies students that, while a consistent and limited series of so-called "good" WEPs (like the ones the National Intelligence Council (NIC) has adopted for use in its recent National Intelligence Estimates (NIEs)) constitutes the current "best practice" in communicating the results of analysis, it is far from a perfect system. Studies both within the intelligence community and from fields such as medicine, finance and meteorology have all demonstrated that people assign only roughly consistent meanings to WEPs -- that one person's "likely" is another person's "virtually certain".
As I began to look at the data from my recent round of this classroom exercise, I began to notice something interesting, though. There seemed to be a level of consistency in the data that I had not noticed before. Was it there previously and I just missed it? I don't know. I don't typically keep the data from these exercises and the only reason I had this batch of data was because it was buried in one of the many piles of paper I have in my office (I believe in that ancient organizational system -- mounding).
I decided to take a closer look at the data. I was surprised by what I saw. While some individuals were throwing the full range out of whack (and keeping the teaching points in
the exercise relevant), these were clearly statistical outliers. The bulk of the students were congregating quite nicely around an approximately ideal trendline. To be sure, the results were still off in places, but they were much closer to optimal than I expected. I have reproduced the aggregate results in a chart below. I have used what financial analysts call a high-low-close chart that marks the average high score, the average low score and the average point value for each WEP. I have also included the idealized trendline and have connected the high and low averages so you can see how the range fluctuates as the probabilities associated with each WEP increase.
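For readers who want to build the same kind of summary from their own class, the averaging behind a high-low-close view can be sketched as follows. Everything here is hypothetical: the three worksheets are invented (they are not the N=18 class data), only three WEPs are shown for brevity, and taking the range and the word-to-word step from the class averages is an assumption about how such a table would be built.

```python
from statistics import mean

# Invented worksheets for three hypothetical students, NOT the class data.
# Each entry is (low, odds, high) in percent.
WORDS = ["remote", "even chance", "virtually certain"]
SHEETS = [
    {"remote": (0, 1, 5), "even chance": (50, 50, 50),
     "virtually certain": (90, 95, 100)},
    {"remote": (0, 5, 10), "even chance": (40, 50, 60),
     "virtually certain": (95, 99, 100)},
    {"remote": (0, 3, 15), "even chance": (45, 50, 55),
     "virtually certain": (85, 97, 100)},
]


def aggregate(sheets, words):
    """Average each column across students, then derive the two
    difference columns from those averages."""
    rows = []
    previous_odds = None
    for word in words:
        low = mean(sheet[word][0] for sheet in sheets)
        odds = mean(sheet[word][1] for sheet in sheets)
        high = mean(sheet[word][2] for sheet in sheets)
        rows.append({
            "wep": word,
            "high": high,
            "low": low,
            "odds": odds,
            # Range of the averaged scores for this word.
            "high_low": high - low,
            # Step from the previous word's average point value.
            "odds_odds": None if previous_odds is None else odds - previous_odds,
        })
        previous_odds = odds
    return rows


ROWS = aggregate(SHEETS, WORDS)
for row in ROWS:
    print(row)
```

Plotting the `high`, `low` and `odds` values for each word against an equal-interval line would give the visual comparison described above; the plotting step is left out to keep the sketch dependency-free.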
If you want to see the raw data, I have included it in the chart below:
(Notes on the chart: The "High" column represents the average high score while the "Low" column represents the average low score for each WEP. The "Odds" column represents the average point value given for each WEP. The "High-Low" column represents the range (difference between high and low score) for each WEP. The "Odds-odds" column represents the difference between the average point value from one WEP to another. N=18)
While I know there are statistical nuances that I have not accounted for in the way I have calculated and displayed the data, the overall pattern suggests to me that there may be something interesting going on here. We can be pretty adamant about the use of good WEPs here at Mercyhurst. The students in this exercise have been exposed to that thinking and it seems to have calibrated their use of WEPs to a certain degree.
There is, in fact, precedent for this kind of calibration. According to Rachel Kesselman's early results, the medical profession, with outside pressure from the insurance industry, has adopted a more or less "accepted" meaning for a number of WEPs (used primarily in prognostic statements to patients and their families). The same thing might well be happening here.⁴
The key seems to be, in all these cases, outside pressure. In the case of our students, the pressure comes from the professors. In the case of the medical profession, the pressure comes from the insurance companies. I have already argued that the potential for public exposure of the results of NIEs is one of the primary drivers behind a more consistent and rigorous approach to the communication of estimates in general. It may well be that this potential for public exposure will force the meanings of WEPs to collapse around certain estimative ranges as well.
4 My colleague, Steve Marrin, has done a number of papers on the more general aspects of the medical analogy to the intelligence profession. All are worth checking out.