To cite this article: Daniel Irwin & David R. Mandel (2019): Improving information evaluation for
intelligence production, Intelligence and National Security, DOI: 10.1080/02684527.2019.1569343
ABSTRACT
National security decision-making is informed by intelligence assessments, which in turn depend on sound information evaluation. We critically examine information evaluation methods, arguing that they mask rather than effectively guide subjectivity in intelligence assessment. Drawing on the guidance metaphor, we propose that rigid ‘all-purpose’ information evaluation methods be replaced by flexible ‘context-sensitive’ guidelines aimed at improving the soundness, precision, accuracy and clarity of irreducibly subjective judgments. Specific guidelines, supported by empirical evidence, include the use of numeric probability estimates to quantify the judged likelihood of information accuracy, promotion of collector-analyst collaboration, and periodic revaluation of information as new information is acquired.
Introduction
Intelligence practitioners must regularly exploit information of uncertain quality to support decision-
making.1 Whether information is obtained from a human source or an automated sensor, failure to
critically assess its characteristics may contribute to intelligence failure.2 This is evident in the case of
Curveball, the Iraqi informant who fabricated extensive testimony on Saddam Hussein’s alleged
weapons of mass destruction (WMD).3 Subjected to inadequate scrutiny, Curveball’s false allegations
underpinned the 2002 National Intelligence Estimate on Iraq’s WMD programmes and may have
influenced the ill-fated decision to invade Iraq in 2003.4
Recognizing information evaluation as a critical function within the intelligence process, certain
organizations promulgate methods for rating and communicating relevant information characteristics.
Despite their intent, many of these methods are inconsistent across organizations and may be
fundamentally flawed or otherwise ill-suited to the context of application. Under certain circumstances,
poorly formulated or misapplied evaluation procedures could even degrade the quality of analytic
judgements and impair decision-making.
In 1950, US officials adhering to such methods broadly applied the lowest possible reliability
rating to South Korean sources and dismissed repeated warnings of imminent North Korean
invasion as politically motivated.5 North Korea thus ‘achieved complete tactical surprise and
would nearly overwhelm the peninsula’ before being repelled by UN intervention.6 As UN forces
advanced on Pyongyang, local reports of Chinese mobilization were also prematurely discounted
as self-serving.7 As a result, US decision-makers were again blindsided when Communist Chinese
troops crossed the Yalu River and routed the Eighth Army.8 Despite this anecdote being nearly
70 years old, the information evaluation methods enshrined in current doctrine remain largely
unchanged.9
In this paper, we provide a critical examination of the source reliability and information
credibility methods in use across a variety of intelligence organizations and intelligence domains.
We identify weaknesses and inconsistencies that may undermine the fidelity of information evaluation.
Communicative issues
Under the Admiralty Code, qualitative ratings of reliability and credibility form a demonstrably intuitive
progression.14 However, subjective interpretations of the boundaries between these ratings are likely
to vary among users, as are interpretations of the relevant rating criteria.15 For instance, in many
versions of the Admiralty Code, a reliable (‘A’) source is said to have a ‘history of complete reliability’,
while a usually reliable (‘B’) source has a ‘history of valid information most of the time’.16 None of the
standards examined associate these descriptions with numeric values (i.e., ‘batting averages’), potentially leading to miscommunication. One analyst may assign usually reliable to sources that provide
valid information more than 70 per cent of the time. An analyst receiving this rating may interpret it to
mean valid information more than 90 per cent of the time, and place more confidence in the source
than is warranted. Conversely, an analyst may assume usually reliable only reflects valid information
more than 50 per cent of the time and might prematurely discount the source. Asked to assign absolute
probability values to reliability and credibility ratings, US intelligence officers demonstrated considerable variation in their interpretations.17 For example, probabilistic interpretations of usually reliable and
probably true ranged from .55 to .90 and .53 to .90, respectively, while interpretations of fairly reliable
and possibly true both ranged from .40 to .80.18 The latter terms not only varied greatly in meaning across officers but also straddled the fifty-fifty midpoint of the probability scale, which suggests that the same evaluations may point different analysts in different directions, prompting some to reject information that others would accept.
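The practical consequence of this interpretive spread can be made concrete with a simple sketch. Assuming three hypothetical analysts who map the same fairly reliable rating to different points within the reported .40–.80 range, and a plausible (but invented) ‘more likely true than not’ decision rule, the same verbal rating sends them in opposite directions:

```python
# Illustrative sketch: three hypothetical analysts map the same 'fairly
# reliable' rating to different numeric values within the reported .40-.80 range.
interpretations = {
    "analyst_1": 0.40,
    "analyst_2": 0.60,
    "analyst_3": 0.80,
}

# Hypothetical decision rule: use information judged more likely true than not.
ACCEPT_THRESHOLD = 0.50

def decision(p: float) -> str:
    """Accept or discount information given a numeric interpretation of its rating."""
    return "accept" if p > ACCEPT_THRESHOLD else "discount"

decisions = {analyst: decision(p) for analyst, p in interpretations.items()}
# The same verbal rating prompts one analyst to discount what another accepts.
```

Under any reasonable threshold, terms that straddle the midpoint of the scale will partition analysts into acceptors and rejecters of the very same report.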
Among the methods examined, reliable or completely reliable indicates maximum source reliability, while confirmed, confirmed by other sources or completely credible marks the highest degree of
information credibility. Despite these inconsistencies, most scales faithfully reproduce the
Admiralty Code’s A–F (reliability)/1–6 (credibility) scoring scheme, and ratings are often communicated using only the appropriate alphanumeric code (e.g., A1). These terminological variations
may therefore contribute to miscommunication between users familiar with different methods. For
instance, under most US methods examined, ‘A’ is defined as reliable, while UK Joint Doctrine
Publication (JDP) 2–00 defines ‘A’ as completely reliable (conforming to NATO doctrine).19 A US
analyst who understands ‘A’ to mean reliable might transmit that rating to a UK counterpart, who
interprets it as completely reliable. This translation is potentially problematic, given that an analyst
or consumer may place more weight on a source labelled completely reliable than one labelled
reliable. Alternatively, the translation from completely reliable to reliable could lead a recipient to
undervalue a source.
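The translation failure described above can be sketched in a few lines. The term tables below are deliberately reduced to the single code discussed in the text and are not complete doctrinal tables:

```python
# Hypothetical, simplified term tables for illustration; actual doctrinal
# wording varies by document and edition.
us_terms = {"A": "reliable"}               # per most US methods examined
uk_terms = {"A": "completely reliable"}    # per UK JDP 2-00 / NATO doctrine

code = "A"
sent_meaning = us_terms[code]       # what the US analyst intends
received_meaning = uk_terms[code]   # what the UK counterpart reads

# The alphanumeric code survives transmission intact,
# but the verbal meaning attached to it does not.
meaning_preserved = sent_meaning == received_meaning
```

Because only the code is transmitted, neither party has any indication that the meaning was lost in transit.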
Opportunities for miscommunication due to variance between methods could also arise where
information credibility scales use the term ‘information accuracy’ as a synonym for information
credibility (e.g., US ATTP 2–91.5; US TC 2–91.8).20 Although information credibility often includes
considerations of accuracy, it is usually conceptualized as a multidimensional construct. Credibility
generally incorporates criteria that can serve as cues to accuracy, but which are not equivalent to
accuracy. For example, ‘triangulating evidence’ contributes to credibility, but it does not require
ground truth used for verifying accuracy. Thus, use of ‘information accuracy’ by certain standards
may further undermine reliable interpretation of ratings.
Another communicative issue relates to liberal use of terms conveying certainty (e.g., confirmed).
In intelligence contexts where the information is seldom complete and often ambiguous or vague,
these expressions could lead to overconfidence on the part of consumers.21 NATO intelligence
doctrine explicitly discourages statements of certainty, ‘given the nature of intelligence projecting
forward in time’.22 However, it remains unverified whether NATO’s current term, completely credible,
conveys less certainty than its earlier term, confirmed by other sources.
The problematic inclusion of terms indicating certainty is compounded by the observed tendency
of evaluators to confine their ratings to the high ends of the scales.23 In their review of spot reports
completed during a US Army field exercise, Baker, McKendry and Mace found that A1 and B2
represented 80 per cent of all reliability/credibility ratings, with B2 alone comprising 74 per cent of
ratings.24 This tendency to treat B2 as a ‘sweet spot’ is particularly concerning given findings that
decision makers who receive highly rated information are less likely to seek additional information
prior to making an initial decision.25
The constriction of scale use (i.e., the process by which the five ordinal levels provided are
effectively narrowed to two) might reflect accountability pressure – namely, the ‘implicit or explicit
expectation that one may be called on to justify one’s beliefs, feelings, and actions to others’.26
Accountability pressure has been posited to account for why analysts exhibit underconfidence in
strategic intelligence forecasting, and why they do so even more strongly when the forecasts are
deemed to be of high importance to decision-makers.27 Individuals under accountability pressures
engage in a variety of tactics to secure approval and pre-empt criticism from their respective
audiences.28 A rating of B2 likely represents the accountability sweet spot because it uses terms
with positive linguistic directionality that optimistically signal the value of the information from a
trustworthy source, yet without sounding overconfident or insufficiently critical.29 In fact, intelligence analysts show a distinct bias towards using verbal probabilities with positive directionality in
strategic intelligence forecasts.30 Thus, evaluators may prefer to avoid ratings worse than B2 if they
anticipate being challenged for doing so. Conversely, evaluators may view ratings higher than B2
as running the risk of seeming too confident or uncritical. Note as well that as scale levels fall into disuse, they become increasingly untenable: strong normative pressures build against using them, since one is expected to be especially well justified before departing from normal practice.
Criterial issues
Beyond the communicative issues outlined above, a set of criterial issues stem from the rating
determinants incorporated (and neglected) by current evaluation methods. A particularly problematic
feature of the Admiralty Code is its lack of situational considerations and implicit treatment of source
reliability as constant across different contexts.31 Regardless of past performance, source reliability may
vary dramatically depending on the nature of the information provided, the characteristics of the
source(s), and the circumstances of collection. A human intelligence (HUMINT) source with a proven
track record reporting on military operations may lack the expertise to reliably observe and report on
economic developments. Beyond variable expertise, HUMINT source motivations, expectations, sensitivity and recall ability may shift between situations, with major implications for information quality.32
Even the reliability of an ‘objective source’ (i.e., a sensor) is context dependent.33 For example,
inclement weather may compromise the quality of information provided by an optical sensor, despite
its history of perfect reporting under ideal conditions.
Aside from source history, most of the methods examined highlight reliability determinants
such as ‘authenticity’, ‘competency’ and ‘trustworthiness’. The inclusion of these determinants is
consistent with the broader literature on source reliability.34 However, the extant methods fail to
formally define or operationalize these concepts. Their inclusion is therefore likely to increase
subjectivity and further undermine the internal consistency of source reliability evaluations. Chang
et al. describe how a process designed to decompose and evaluate components of a problem (i.e.,
information characteristics) may inadvertently amplify unreliability in assessments if that process is
ambiguous and open to subjective interpretations.35 Given the ambiguity built into current evaluation methods, users are unlikely to retrieve every relevant determinant, let alone reliably and validly weigh each one when arriving at an ordinal assessment.
Another issue with current source reliability methods is their failure to delineate procedures for
evaluating subjective sources versus objective sources (e.g., human sources versus sensors), or
primary sources versus secondary/relaying sources.36 A determinant such as source motivation may
be relevant when assessing HUMINT sources, but not sensors. Similarly, source expertise may be
highly relevant for a primary source collecting technical information (e.g., a HUMINT asset gathering information on Iranian nuclear technology), but less so for an intermediary delivering this
information to a collector. In cases where information passes through multiple sources, there are
often several intervals where source reliability considerations are relevant. For instance, when
receiving second-hand information from a HUMINT source, one might consider the reliability of
the primary source, the reliability of the secondary/relaying source(s), the reliability of the collector,
as well as the reliability of any medium(s) used to transmit the information.37
Noble describes how, following initial collection, information may undergo distortion at later stages of the intelligence process.38 Like sources, intelligence practitioners will vary in terms of
their ability to reliably assess and relay information. For instance, an economic subject matter
expert may lack the expertise to accurately evaluate and transmit information on enemy troop
movements. Beyond expertise, an intelligence practitioner’s assessment is also undoubtedly influenced by other personal characteristics (e.g., motivation, expectations, biases, recall ability) as well
as various contextual factors.39 When a finished intelligence product is edited and approved for
dissemination, managers may inject additional distortion by adjusting analytic conclusions.40 The
many opportunities for distortion may warrant the formalization of information evaluation as an
ongoing requirement throughout the intelligence process.41 At the very least, efforts should be
made to ensure that intelligence practitioners and consumers alike are cognizant of the mutability
of information characteristics following the initial evaluation.
Much like the scales for evaluating source reliability, information credibility scales suffer from an
inherent lack of clarity. Information credibility generally incorporates confirmation ‘by other independent sources’ as a key determinant. However, evaluators are given no guidance on how many
independent sources must provide confirmation for information to be judged credible. Where one
evaluator might consider confirmation by two sources sufficient for a confirmed rating, another
might seek verification by three or more sources. Perceptions of how much corroboration is
necessary may also vary depending on the information in question. For instance, an analyst may
decide that a particularly consequential piece of information requires more corroboration than
usual to be rated confirmed. This lack of consistency could lead analysts to misinterpret each other’s
credibility ratings.
Methods for evaluating information credibility also lack instructions for grading pieces of
information that are, by alternative evidence sources, simultaneously confirmed and disconfirmed.
Under the Admiralty Code, such information could be considered both confirmed/completely
credible (‘1’) and improbable (‘5’).42 Without guidance, some analysts may base their assessments more heavily on instances of confirmation, others might focus on instances of disconfirmation, and still others might attempt to strike a balance between the two. These three approaches may yield very different evaluations of the same piece of information under the same method.
Capet and Revault d’Allonnes argue that confirmation does not necessarily translate into
information credibility and not all forms of confirmation should be given equal weight.43 For
instance, a spurious rumour corroborated by many unreliable sources (e.g., tweets about a second
shooter during a terrorist attack) and disconfirmed by a single reliable source (e.g., a police
statement indicating a single attacker) could still be rated highly credible under current methods.
Capet and Revault d’Allonnes suggest that source reliability be taken into consideration when
weighing confirmation against disconfirmation.44 This would directly contravene the Admiralty
Code’s treatment of source reliability and information credibility as independent factors.
Current methods also lack explicit guidance to consider whether relationships of affinity,
hostility or independence45 exist between corroborating sources.46 However, corroboration from
a source that has a ‘friendly’ relationship with the source under scrutiny should likely have less
influence than corroboration from an independent or hostile source. For example, all else being
equal, if Saudi Arabia corroborates information provided by Syria (with which it currently has a
hostile relationship), that confirmation should carry more weight than identical confirmation
provided by Russia (which currently has a relationship of affinity with Syria). As a general rule,
sources that are friendly to one another should be expected to corroborate each other more often
than sources that are not friendly.47
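One way to operationalize this intuition is a corroboration score that weights each confirming report by the judged reliability of its source and discounts reports from sources with an affinity relationship to the source under scrutiny. The sketch below is not a doctrinal method, and all weights are illustrative assumptions:

```python
# Illustrative corroboration scoring. Each confirming report carries the
# judged reliability of its source (0-1) and that source's relationship
# to the source under scrutiny. All weights are assumptions.
RELATIONSHIP_WEIGHT = {"affinity": 0.5, "independent": 1.0, "hostile": 1.5}

def corroboration_score(reports):
    """Sum reliability-weighted confirmations, discounting friendly sources."""
    return sum(reliability * RELATIONSHIP_WEIGHT[relationship]
               for reliability, relationship in reports)

# Under this weighting, a single reliable, hostile confirmation outweighs
# several friendly, unreliable ones.
rumour_support = corroboration_score([(0.2, "affinity")] * 4)
hostile_support = corroboration_score([(0.9, "hostile")])
```

A scheme like this directly encodes the Saudi/Syria example above: identical confirmations receive different weight depending on the political relationship between the corroborating and corroborated sources.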
Friedman and Zeckhauser suggest that the current emphasis on consistency with existing
evidence may encourage confirmation bias.48 ‘Biased attrition’ is used to describe an information
filtering process that systematically favours certain information types in a problematic way.49
Information that conflicts with prior beliefs and analysis may in fact be more valuable, as it can
shift the views of analysts and consumers more significantly. Friedman and Zeckhauser argue that
methods for evaluating information credibility could reduce biased attrition by incorporating the
extent to which information provides a new or original perspective on the intelligence requirement
at hand.50 Likewise, Capet and Revault d’Allonnes suggest that evaluation methods be modified to
gauge the extent to which information provides meaningful corroboration.51
Along similar lines, Lemercier notes that confirmation-based credibility standards do not
account for the phenomenon of amplification, whereby analysts come to believe closely correlated
sources are independently verifying a piece of information.52 In order to control for amplification,
credibility evaluation could incorporate successive corroboration by the same source, corroboration by sources of the same type, as well as comparative corroboration from different collection
disciplines.53
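A minimal amplification control along these lines might count corroboration only once per distinct source while tracking how many collection disciplines contribute. The report structure assumed here (source identifier plus discipline tag) is hypothetical:

```python
# Sketch of an amplification control. Assumes each report is tagged with a
# source identifier and a collection discipline; the tags are hypothetical.
def effective_corroboration(reports):
    """Count corroboration once per distinct source, and note how many
    distinct collection disciplines contribute."""
    sources = {source for source, discipline in reports}
    disciplines = {discipline for source, discipline in reports}
    return len(sources), len(disciplines)

# Five reports tracing back to two closely correlated sources within a single
# discipline offer far less independent verification than a raw count of five suggests.
reports = [("S1", "OSINT"), ("S1", "OSINT"), ("S2", "OSINT"),
           ("S1", "OSINT"), ("S2", "OSINT")]
n_sources, n_disciplines = effective_corroboration(reports)
```

Deduplicating by source and discipline does not eliminate amplification (the two remaining sources may still be correlated), but it prevents the raw report count from masquerading as independent verification.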
The current emphasis placed on confirmation/consistency may also reinforce a primacy effect,
given that new information must conform to prior information to be deemed credible. All else
being equal, if an analyst receives three new pieces of information, the first item received will
typically face the fewest hurdles to being assessed as credible. Meanwhile, the second piece of
information must conform to the first, and the third must conform to both the first and second.
Under this system, an analyst may inadvertently underweight information that is in fact more
accurate or consequential than information received earlier, potentially decreasing the quality of
analysis. Ultimately, the order in which information is received should be irrelevant to judgements
made about its quality.54
One option for dealing with the primacy effect would be the inclusion of a triangulation procedure
for revaluating prior pieces of information as new information becomes available. Figure 1 compares
the primacy effect reinforced by current methods with a system of evidence triangulation, where the
arrows indicate the ‘direction’ of confirmation. Two of the US methods examined advocate continuous
analysis and revaluation of source reliability/information credibility as new information becomes
available.55 However, neither document outlines a specific procedure for doing so.
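One possible shape for such a procedure is to re-score every held item against the full evidence set whenever a new item arrives, so that arrival order has no bearing on the result. In the sketch below, the consistency metric is a crude placeholder standing in for real analytic judgement:

```python
# Minimal sketch of evidence triangulation: every previously held item is
# re-scored against the full body of evidence whenever the set changes,
# rather than new items being checked only against older ones.
def consistency(item, others):
    """Fraction of other items this item agrees with (placeholder metric)."""
    if not others:
        return 1.0
    return sum(1 for o in others if o["claim"] == item["claim"]) / len(others)

def triangulate(items):
    """Re-score every item against all the rest, regardless of arrival order."""
    return {it["id"]: consistency(it, [o for o in items if o is not it])
            for it in items}

items = [
    {"id": "first", "claim": "mobilizing"},   # arrived first
    {"id": "second", "claim": "static"},
    {"id": "third", "claim": "static"},
]
scores = triangulate(items)
# The first-arrived item enjoys no advantage: it scores lowest because it
# disagrees with the two later reports.
```

Because every item is evaluated against the whole set, the first report receives no structural advantage over later, better-corroborated ones.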
The lack of guidance regarding confirmation (specifically, what level of corroboration warrants each
rating) could also facilitate overconfidence stemming from information volume. Beyond an early point
in the information gathering process, predictive accuracy plateaus, while confidence continues to rise,
creating substantial confidence-accuracy discrepancies.56 Failing to adjust for their cognitive limitations, judges often become overconfident in the face of surplus information, despite being unable to
assimilate it effectively.57 Thus, without adequate guidance, evaluators may overvalue a piece of
information confirmed six times, when an item confirmed three times has an equal probability of
being accurate. In other words, the amount of confirmation on its own is a fallible indicator of
information accuracy.
Beyond confirmation, most of the information credibility scales examined incorporate consideration
of whether an item is ‘logical in itself’. Current methods do not specify whether this simply refers to the
extent that information conforms to the analyst’s current assessment. Furthermore – and not without a
touch of irony – the use in certain standards of ‘not illogical’ as a level between ‘logical in itself’ and
‘illogical in itself’ is nonsensical, as ‘not illogical’ effectively means ‘logical’ (in itself).58
As noted with regards to source reliability, the Admiralty Code’s one-size-fits-all approach to
information credibility neglects important contextual considerations. Several US evaluation methods
suggest that certain credibility determinants have more relevance depending on the collection
discipline utilized. For example, US TC 2–91.8 and US ATP 2–22.9 suggest that there is a greater risk of deception (an information credibility determinant) when utilizing open source intelligence than captured enemy documents.59

Figure 1. Primacy effect reinforced by current methods versus evidence triangulation (arrows indicate the direction of confirmation).

Similarly, US ATTP 2–91.5 refers to the Admiralty Code as the
‘HUMINT system’, and recommends the development of separate rating systems to assess the three
basic components of document and media exploitation (document exploitation/DOMEX, media
exploitation/MEDEX, cell phone exploitation/CELLEX).60
Joseph and Corkill stress that the Admiralty Code is a grading system rather than a comprehensive
evaluation methodology.61 Beyond what is outlined in the scales, evaluators may have a formal
assessment procedure and/or a more exhaustive list of determinants to consider. Supplementary
documents examined add some clarity to the methods, but these also vary in terms of the extra
determinants identified. Furthermore, none of these extra determinants (e.g., source ‘integrity’) are
defined or operationalized. Therefore, such supplementary material may further contribute to
unreliability in the application of the evaluation method.62
Structural issues
In addition to the communicative and criterial issues identified, current methods also vary in terms
of where they position information evaluation within the intelligence process. For instance, NATO
intelligence doctrine embeds evaluation procedures within the processing stage, thus emphasizing
the analyst’s role in gauging information characteristics.63 UK JDP 2–00 outlines a joint role for
analysts and collectors, whereby collectors are responsible for pre-rating information characteristics
before analysts weigh in with their own (potentially broader) understanding of the subject.64
Carter, Noble and Pechan emphasize the primary collector’s role in assessing reliability, particularly
in contexts where access to a clandestine source is restricted to the agent handler.65 This inconsistency is significant given the noted mutability of information characteristics over time, across
contexts and at different stages of the intelligence process itself.66
Whether information is assessed upon initial collection, by an analyst during processing, or by
several practitioners throughout the intelligence process could have a substantial impact on its
evaluation. Consequently, the extent to which information is deemed fit to use will largely
determine its influence on intelligence assessments (or lack thereof). In other words, the timing
of source and information evaluation within the intelligence process could add additional inter-
analytic inconsistency to the evaluation process. An intriguing question is whether information
evaluation is given more weight in intelligence analysis when the evaluation step is conducted by
the analyst rather than by the collector. Moreover, do individual differences in analyst characteristics play a role in how the evaluation is regarded in subsequent analysis? Perhaps analysts who have a disposition of high self-confidence place more weight on information evaluations they rendered personally, whereas analysts who are perennial self-doubters give more weight to evaluations from collectors. These hypotheses could be tractably tested in future research.
A compounding issue is the absence of mechanisms for revaluation when new information
becomes available and determinants, such as a source’s reliability rating, are updated.67 For example,
under current methods, it is unclear how users should treat information provided by a source long
considered completely reliable, but suddenly discovered to be unreliable. This is particularly complicated
when information ratings form an interdependent chain (e.g., Information A’s rating is tied to
Information B’s rating; Information B’s rating is tied to the rating of Source X; Source X has just been
exposed as a double agent). Together, these issues may warrant the implementation of information
evaluation as an iterative function throughout the intelligence process. This approach could be applied
to the evaluation of individual pieces of information, as well as the marshalling of evidence when
forming analytic judgements.68 As noted, certain US evaluation methods advocate continuous revaluation of information quality, but none examined provide a clear procedure for doing so.69
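A sketch of what such an iterative revaluation mechanism might look like for the interdependent chain described above follows; the dependency structure and the min-based update rule are illustrative assumptions, not doctrine:

```python
# Illustrative revaluation across an interdependent rating chain
# (Information A rests on Information B, which rests on Source X).
depends_on = {"info_A": "info_B", "info_B": "source_X"}
rating = {"info_A": 0.9, "info_B": 0.9, "source_X": 0.9}

def propagate(node, new_value):
    """Downgrade a node, then everything that transitively depends on it."""
    rating[node] = new_value
    for child, parent in depends_on.items():
        if parent == node:
            # a dependent item should not be rated above what it rests on
            propagate(child, min(rating[child], new_value))

propagate("source_X", 0.1)  # Source X is exposed as a double agent
# info_B, and through it info_A, are downgraded automatically.
```

The point is structural rather than numeric: once rating dependencies are recorded explicitly, a change anywhere in the chain can be pushed through mechanically instead of relying on each analyst to remember every downstream implication.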
procedures will vary considerably between source types (e.g., the Visible NATO Imagery Interpretability
Rating Scale is designed to assess the sensory capabilities of imaging systems, but it is not relevant to
other collection disciplines).78
Given the diversity of intelligence contexts, which are typically characterized by time constraints and
incomplete information, implementation of a reliable, all-encompassing scoring method may be
unrealistic. Identifying which determinants are relevant and how to weight and combine them will
ultimately fall to the judgement of each evaluator. However, by providing evaluators with sound
methods for evaluating information and for communicating those evaluations (where communication
is necessary), it should be possible to mitigate unreliability and inaccuracy in evaluations, and infidelity
in their communication to end users. Such methods should not only draw on scientific research, their
effectiveness should be the focus of applied scientific research supported by the IC. In the US, the
Intelligence Advanced Research Projects Activity (IARPA), a science and technology organization within the Office of the Director of National Intelligence, is an excellent example of this research approach.
IARPA is exceptional in that many, if not most, of its research programmes are geared towards improving intelligence production rather than intelligence collection. This is an unusual
strategy for the IC, which has disproportionately valued and invested in collection capabilities to
the virtual neglect of analysis.79 Nevertheless, it remains to be seen how well IARPA-funded research
will be leveraged within the IC for the benefit of intelligence production.
As described earlier, the Admiralty Code is predicated on the belief that source reliability and
information credibility must be independently evaluated, yet this belief is of questionable validity.
We argue that the dual-rating approach only elliptically addresses the central question underlying the evaluation step: is the information accurate, and should it be factored into analysis? Accordingly, we
propose the introduction of a single measure of estimated information accuracy that incorporates all
available characteristics, including source reliability and its various subcomponents.
Several studies support the synthesis of reliability and credibility into a comprehensive accuracy
measure. Evaluators have been found to pair source reliability and information credibility scores from
the same level (i.e., providing ratings of A1, B2, C3, D4, E5 or F6), and also to base decisions about
accuracy more on credibility than reliability.80 The ambiguity inherent in combining incongruent
ratings may partially explain why evaluators often default to ratings from the same level.81 Moreover,
Nickerson and Feehrer note that when no other information is available to gauge information
credibility, evaluators will logically base their rating on source reliability, given that reliable sources
tend to produce credible information.82 Similarly, Lemercier suggests that determining source
reliability is not an end in itself, but rather a means of assessing information credibility.83
A single accuracy measure could address several challenges related to incongruent ratings and the lack
of comparability between the two scales.84 Samet showed that analysts estimate accuracy less reliably
when basing their decision on separate reliability and credibility metrics than when accuracy is based on a
single measure.85 Similarly, Mandel et al. found that analysts show poor test-retest reliability when
estimating the accuracy of information with incongruent reliability/credibility scores (namely, A5, E1)
compared to when they estimated the accuracy of information with congruent scores (namely, A1, E5).86
The same study showed that inter-analyst agreement plummets as source reliability and information
credibility scores become less congruent.87 For instance, analysts agree less on the accuracy of A5 than A4,
which in turn yields lower agreement than A3, and so on. The research just discussed indicates that,
although the two scales may be treated as distinct in doctrine, in practice, evaluators and users do not –
and perhaps cannot for psychological reasons beyond their control – treat them as such.
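To illustrate what a single fused measure might look like, the sketch below maps the A–F/1–6 codes to numeric anchors and averages them. The anchors and the fusion rule are hypothetical assumptions that would require empirical validation, not values drawn from any doctrine or study:

```python
# Illustrative fusion of the dual A-F/1-6 code into one accuracy estimate.
# Numeric anchors are invented; 'F' and '6' mean 'cannot be judged'.
RELIABILITY = {"A": 0.95, "B": 0.80, "C": 0.60, "D": 0.40, "E": 0.20, "F": None}
CREDIBILITY = {1: 0.95, 2: 0.80, 3: 0.60, 4: 0.40, 5: 0.20, 6: None}

def accuracy_estimate(code):
    """Fuse e.g. 'B2' into a single number, or None if nothing can be judged."""
    rel = RELIABILITY[code[0]]
    cred = CREDIBILITY[int(code[1])]
    if rel is None and cred is None:
        return None
    if rel is None:
        return cred
    if cred is None:
        return rel
    return (rel + cred) / 2  # simple average; any fusion rule needs validation

# Incongruent codes such as 'A5' collapse to one explicit number instead of
# leaving the reliability/credibility trade-off to each reader.
```

Whatever fusion rule is ultimately chosen, the benefit claimed in the text is that incongruent combinations such as A5 yield a single, communicable estimate rather than an ambiguous pair.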
Current evaluation standards also lack mechanisms for comparing multiple items of varying quality,
which intelligence analysts are often required to do.88 For instance, it is unclear how analysts should
weight one piece of information rated B3 against another rated C2.89 The margin of interpretation may
be increased by the use of two different scale types; credibility comprises a positive-negative scale
(information is confirmed/invalidated), while reliability ranges from low/non-existent (the source has
provided little/no credible information) to a maximum level (the source has a history of complete
reliability).90 Without clear guidance on how to fuse the alphanumeric dual-valued code into a single
estimate of information accuracy, end users of such evaluations are left to their own (subjective)
interpretations.91 Again, this creates ample opportunity for unreliability within and across analysts, as
recent research has shown.92 The lack of comparability between scales also means that current
measures of reliability and credibility are ill suited for integration into a semi-automated system for
information evaluation.93
While Bayesian networks can help analysts explore uncertain situations and overcome cognitive
biases, routine (as opposed to supplemental) use of these models could degrade analytic quality
due to the challenges of estimating certain input parameters (e.g., assessing the accuracy of
information provided under conditions of anonymity).107 To this point, Rogova argues that a priori
domain knowledge is often essential when determining many of the input parameters in a system
for information evaluation.108 McNaught and Sutovsky only advocate the use of Bayesian networks
for evidence marshalling when the input parameters are known with a ‘reasonable degree’ of
accuracy.109 Simply put, Bayesian approaches can ensure a coherent integration of inputs, but a
coherent integration of ‘garbage in’ will still yield ‘garbage out’. Coherence does not ensure accuracy,
but it does raise the likelihood of accurate judgement.110
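The point can be illustrated with a minimal sketch: the Bayesian update below is perfectly coherent, yet its output tracks whatever false-report rate is fed in, a parameter that may be unknowable for, say, an anonymous source. All rates here are invented for illustration.

```python
def posterior_accuracy(prior, p_report_if_true, p_report_if_false):
    """P(information is accurate | the source reported it), by Bayes' rule."""
    joint_true = prior * p_report_if_true
    joint_false = (1 - prior) * p_report_if_false
    return joint_true / (joint_true + joint_false)

# The update is always coherent, but the posterior swings widely with the
# hard-to-estimate false-report rate (invented values):
for fpr in (0.05, 0.20, 0.50):
    print(f"false-report rate {fpr:.2f} -> "
          f"posterior {posterior_accuracy(0.5, 0.8, fpr):.2f}")
```

With a fixed prior of 0.5 and hit rate of 0.8, the posterior runs from roughly 0.94 down to roughly 0.62 as the assumed false-report rate grows: garbage in, coherently processed, is still garbage out.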
Opponents of numeric probabilities in intelligence (among them, Kent’s proverbial ‘poets’) warn
that by exaggerating precision and analytic rigour, such expressions may render decision-makers
overconfident and excessively risk-seeking.111 Contrary to this assumption, however, Friedman,
Lerner and Zeckhauser show that national security officials presented with numeric probability
assessments are actually less confident, and more receptive to gathering additional information.112
Quantifying the probability of information accuracy could also curb the empirically demonstrated
tendency of individuals to exploit ambiguous (i.e., verbal) uncertainty expressions to reach self-serving
conclusions.113 While communicators of probabilistic information generally favour verbal expressions
of uncertainty, it is well established that consumers of such information prefer numeric estimates.114
Given that information evaluation is fundamentally consumer-oriented (i.e., designed to inform
intelligence analysts), we expect these consumers, too, will favour numeric clarity over vague verbiage.
Concerns over exaggerated precision could be assuaged by explicitly educating both intelligence
producers and consumers that numeric probabilities can be used to convey degrees of belief or
subjective probabilities. In fact, numeric probability judgements do not imply anything about the
method by which one arrives at such judgements.115 Providing such information to educate users
could be achieved with a written disclaimer. Evaluators could also give confidence intervals on
information accuracy estimates, similar to the expressions of analytic confidence that accompany
certain intelligence assessments.116 For instance, an evaluator could judge the probability that
Information A is accurate to be 0.7 (or 70 per cent if expressed as a percentage) with a 95 per cent
confidence interval of 0.55–0.85. In other words, the evaluator is conveying that he or she is 95 per cent
certain the probability lies between 55 per cent and 85 per cent and offers 70 per cent as the best
current estimate. By providing an explicit confidence interval, evaluators could directly militate against
misperceptions of over-precision. A confidence interval would also provide unique meta-information
to the consumer (e.g., capturing cases where high quality information is in conflict) and could thus
prompt requests for additional information or clarification of why confidence may be so low.
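As a sketch of how such annotations might be represented, the following hypothetical record pairs a point estimate with its 95 per cent interval and flags unusually wide intervals for follow-up; the 0.4 width threshold is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AccuracyEstimate:
    """An information-accuracy judgement: best estimate plus a 95% interval."""
    p: float    # best estimate that the information is accurate, on 0-1
    lo: float   # lower bound of the 95 per cent interval
    hi: float   # upper bound of the 95 per cent interval

    def width(self) -> float:
        return self.hi - self.lo

    def needs_followup(self, max_width: float = 0.4) -> bool:
        # Hypothetical threshold: a wide interval signals conflicting or
        # sparse evidence and could prompt a request for clarification.
        return self.width() > max_width

# The example from the text: 0.7 with a 95 per cent interval of 0.55-0.85.
info_a = AccuracyEstimate(p=0.7, lo=0.55, hi=0.85)
print(round(info_a.width(), 2), info_a.needs_followup())
```

Here Information A's interval is narrow enough to pass without comment, whereas an estimate like (p=0.5, lo=0.1, hi=0.9) would be flagged, exactly the conflicting-evidence case the meta-information is meant to surface.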
A probabilistic approach to information evaluation could be supplemented with training
designed to improve collectors’ and analysts’ understanding of probability and their statistical
skill. For example, Mandel designed a brief training protocol on Bayesian belief revision and
hypothesis testing with probabilistic information.117 Military intelligence analysts were assessed
on accuracy and probabilistic coherence before and after receiving the training. The results showed
a statistically significant improvement after training on both accuracy and coherence of analysts’
probability estimates, suggesting that intelligence professionals can reap quick wins in learning
that might enable them to better understand the kinds of probabilistic models described above.118
Similar results have been reported with non-analyst samples.119
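The kind of sequential belief revision such training targets can be sketched as follows; the likelihood ratios in the exercise are invented.

```python
def revise(prior, likelihood_ratios):
    """Sequentially revise belief in a hypothesis, one evidence item at a time.

    Each likelihood ratio is P(evidence | H) / P(evidence | not-H); ratios
    above 1 support H, ratios below 1 count against it."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr              # Bayes' rule in odds form
    return odds / (1 + odds)    # convert back to a probability

# Invented exercise: start at 50:50, then see supporting, contrary, and
# supporting evidence in turn.
print(revise(0.5, [4.0, 0.5, 2.0]))  # -> 0.8
```

Working the update in odds form makes the arithmetic of each evidence item transparent, which is one reason brief instruction of this kind can yield the quick accuracy and coherence gains described above.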
This type of training is not only important for understanding such models, however. People
routinely violate logical constraints on probability assessments and exhibit systematic biases in
judgement, and there is no sound reason to believe that analysts are exempt from such
limitations.120 Indeed, findings from recent studies demonstrate that they are not.121 For instance,
Mandel, Karvetski and Dhami found that intelligence analysts violated coherence principles in
judging the probabilities of alternative hypotheses being true. Consistent with research on the
unpacking effect, analysts’ probability judgements assigned to four mutually exclusive and exhaustive
hypotheses summed to significantly more than 100 per cent.122 On average, analysts who used
the Analysis of Competing Hypotheses structured analytic technique to solve the hypothesis-
testing task were less coherent than analysts who were not instructed to use any structured
analytic technique.123 However, by using numeric probabilities, statistical (recalibration and
aggregation) methods could be applied to substantially improve both coherence and accuracy of
analysts’ judgements.124 The use of numeric probabilities would greatly improve the IC’s ability
to use post-analytic methods such as these to increase the accuracy and logical rigour of
assessments of information and, ultimately, intelligence.
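A minimal sketch of these two post-analytic steps: each analyst's judgements over mutually exclusive and exhaustive hypotheses are rescaled to sum to 100 per cent (a crude recalibration of the unpacking effect), then linearly pooled across analysts. Published methods are considerably more sophisticated; the judgements below are invented.

```python
def coherentize(probs):
    """Rescale one analyst's probabilities for mutually exclusive, exhaustive
    hypotheses so they sum to 1 (a simple fix for the unpacking effect)."""
    total = sum(probs)
    return [p / total for p in probs]

def linear_pool(judgments):
    """Average the coherentized probabilities across analysts, per hypothesis."""
    coherent = [coherentize(j) for j in judgments]
    n = len(coherent)
    return [sum(j[i] for j in coherent) / n for i in range(len(coherent[0]))]

# Invented judgments from three analysts over four hypotheses; note each
# analyst's raw probabilities sum to more than 1, as in the unpacking effect.
analysts = [
    [0.5, 0.4, 0.3, 0.2],   # sums to 1.4
    [0.6, 0.3, 0.3, 0.3],   # sums to 1.5
    [0.4, 0.4, 0.2, 0.2],   # sums to 1.2
]
pooled = linear_pool(analysts)
print([round(p, 3) for p in pooled])  # pooled estimates sum to exactly 1
```

None of this is possible when judgements are expressed only as vague verbal phrases, which is precisely the advantage of numeric probabilities for post-analytic correction.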
evaluators to pool reasoning strategies and expertise, and help address persistent communication gaps
between analysts and collectors.131
It should be emphasized that we are advocating a hybrid model of process accountability (via
written rationales) and outcome accountability (via accuracy estimates that can be objectively
measured to assess performance). In doing so, we are responding to the IC’s tendency to promote
good judgement by invoking process accountability.132 Efforts to standardize analytic processes,
unless paired with mechanisms to evaluate analytic outcomes, might in fact amount to little more
than ‘bureaucratic symbolism’ used to deflect blame for intelligence failures.133 As well, hybrid
accountability systems have been shown to strike an effective balance between pure accuracy
goals and knowledge-sharing goals.134
Conclusion
Information evaluation is a critical function within the intelligence process. Arguably, failure to evaluate
‘is tantamount to judging [every piece of information] as completely trustworthy and their inputs as
equally important’.137 Based on an examination of current information evaluation methods, we identify
several limitations that can undermine the fidelity of information assessments and, by extension, the
quality of intelligence analysis and decision-making that depends on it.
Considering the diversity of intelligence contexts, and the complex interactions between information
accuracy determinants, we argue that designing a comprehensive method for scoring accuracy
determinants is unrealistic, and could inadvertently increase unreliability. Instead, we recommend the
following: First, information accuracy should be communicated as a subjective probability expressed in
numeric form, and clarified (when warranted) by a confidence interval. Second, collaboration and
revaluation should be formalized during information evaluation. Third, information redundancy,
completeness and diagnosticity should be weighed later in the intelligence production stage, as part
of the assessment process. At the very least, evaluators and consumers should be made
aware of the limitations identified, and intelligence organizations should seek to rectify obvious
inconsistencies between methods, especially at the national level.
Rather than imposing these methods on evaluators in every circumstance, we favour a pragmatic,
contingent approach in which the level of evaluative detail corresponds to the relative importance of
the information under scrutiny. A highly consequential piece of information may warrant teamwork
and thorough annotation (i.e., an accuracy estimate, confidence interval and written rationale), whereas
a single accuracy estimate will suffice in most cases. Ultimately, subjective judgements will moderate
the level of detail of each evaluation. In providing these recommendations, we aim to improve
information evaluation without unduly taxing intelligence practitioners.
Notes
1. Johnson et al., “Utilization of Reliability Measurements.”
2. United Nations Office on Drugs and Crime, Criminal Intelligence Manual; and Carter, Law Enforcement
Intelligence.
3. Schum and Morris, “Assessing the Competence and Credibility of Human Sources”; and Betts, “Two Faces of
Intelligence Failure.”
4. Ibid.
5. Finley, The Uncertain Oracle; and Aid, “US HUMINT and COMINT.”
6. Finley, The Uncertain Oracle.
56. Nickerson and Feehrer, Decision Making and Training; Oskamp, “Overconfidence in Case-study Judgements”;
and Tsai, Klayman, and Hastie, “Effects of Amount of Information.”
57. Nickerson and Feehrer, Decision Making and Training; and Tsai, Klayman, and Hastie, “Effects of Amount of
Information.”
58. See note 2 above.
59. United States Department of the Army, TC 2–91.8; and United States Department of the Army, ATP 2–22.9.
60. United States Department of the Army, ATTP 2–91.5, 4–4.
61. Joseph and Corkill, “Information Evaluation.”
62. United Nations Office on Drugs and Crime, Criminal Intelligence Manual.
63. NATO Standardization Office, AJP-2.1.
64. United Kingdom Ministry of Defence, JDP 2–00.
65. Carter, Law Enforcement Intelligence; Noble, “Diagnosing Distortion”; and Pechan, “The Collector’s Role.”
66. Schum, Evidence and Inference; Pechan, “The Collector’s Role”; Cholvy and Nimier, “Information Evaluation”; and
Noble, “Diagnosing Distortion.”
67. See note 37 above.
68. Schum and Morris, “Assessing the Competence and Credibility of Human Sources”; and McNaught and
Sutovsky, “Representing Variable Source Credibility.”
69. United States Department of the Army, ATP 3–39.20; and United States Department of the Army, ATTP 2–91.5.
70. Kent, “Words of Estimative Probability.”
71. Rieber and Thomason, “Creation of a National Institute”; Pool, Field Evaluation; Dhami et al., “Improving
Intelligence Analysis”; and Mandel, “Can Decision Science Improve Intelligence Analysis?”
72. Pool, Field Evaluation; and Mandel and Barnes, “Geopolitical Forecasting Skill.”
73. Chang et al., “Restructuring Structured Analytic Techniques”; Irwin and Mandel, Methods for Communicating
Analytic Confidence; Irwin and Mandel, Methods for Communicating Estimative Probability; Barnes, “Making
Intelligence Analysis More Intelligent”; Dhami, “Towards an Evidence-based Approach”; Chang and Tetlock,
“Rethinking the Training”; Hulnick, “What’s Wrong with the Intelligence Cycle”; and Wheaton, “Re-imagining
the Intelligence Process.”
74. Chang et al., “Restructuring Structured Analytic Techniques”; Barnes, “Making Intelligence Analysis More
Intelligent”; Dhami, “Towards an Evidence-based Approach”; and Chang and Tetlock, “Rethinking the Training.”
75. Canadian Forces School of Military Intelligence, Source Reliability and Information; Schum and Morris,
“Assessing the Competence and Credibility of Human Sources”; and Lesot, Pichon, and Delavallade,
“Quantitative Information Evaluation.”
76. Rogova, “Information Quality”; and Rogova et al., “Context-based Information Quality.”
77. Rogova et al., “Context-based Information Quality.”
78. Wong and Jassemi-Zargani, “Predicting Image Quality.”
79. Kerbel, Are the Analytic Tradecraft Standards Hurting as Much as Helping?
80. Baker, McKendry, and Mace, Certitude Judgments; Miron, Patten, and Halpin, The Structure of Combat
Intelligence Ratings; and Samet, “Quantitative Interpretation of Two Qualitative Scales.”
81. See note 15 above.
82. Nickerson and Feehrer, Decision Making and Training.
83. See note 37 above.
84. See note 15 above.
85. Samet, “Quantitative Interpretation of Two Qualitative Scales.”
86. Mandel, “Proceedings of SAS-114 Workshop.”
87. Ibid.
88. McNaught and Sutovsky, “Representing Variable Source Credibility.”
89. See note 15 above.
90. Ibid.
91. Besombes, Nimier, and Cholvy, “Information Evaluation in Fusion.”
92. See note 87 above.
93. See note 33 above.
94. See note 14 above.
95. Friedman et al., “The Value of Precision.”
96. Ibid.
97. Tetlock and Gardner, Superforecasting.
98. See note 86 above.
99. Mellers et al., “Psychological Strategies”; Chang and Tetlock, “Rethinking the Training”; and Mandel and Barnes,
“Geopolitical Forecasting Skill.”
100. Rieber, “Intelligence Analysis”; and Fischhoff and Chauvin, Intelligence Analysis.
101. Marchio, “Analytic Tradecraft.”
102. Nelson, “Finding Useful Questions”; and Chang and Tetlock, “Rethinking the Training.”
Acknowledgements
The authors thank Lars Borg, Stephen Coulthart, Jeffrey Friedman, Kristan Wheaton and two anonymous reviewers for
their helpful comments on previous drafts.
Disclosure statement
No potential conflict of interest was reported by the authors.
Disclaimer
The views presented are those of the authors and do not represent the views of the Department of National Defence
or any of its components, or the Government of Canada. This article builds upon an earlier version that will be a
forthcoming chapter in the final report of the NATO System Analysis and Studies Panel Activity on Communication
and Assessment of Uncertainty in Intelligence to Support Decision-Making (SAS-114).
Funding
This work was supported by the Joint Intelligence Collection and Analytic Capability Project #05ad and Canadian
Safety and Security Program project #2016-TI-2224. These projects are carried out by Defence Research and
Development Canada, an agency of the Department of National Defence.
Notes on contributors
Daniel Irwin is a Research Technologist with Defence Research and Development Canada. He holds an MS in Applied
Intelligence from Mercyhurst University.
David R. Mandel is a senior Defence Scientist with Defence Research and Development Canada and Adjunct Professor
of Psychology at York University. He publishes widely in peer-reviewed journals on the topics of reasoning, judge-
ment, and decision-making and has co-edited The Psychology of Counterfactual Reasoning, Neuroscience of Decision
Making, and Improving Bayesian Reasoning: What Works and Why? Mandel is Chairman of the NATO System Analysis
and Studies Panel Research Technical Group on Assessment and Communication of Uncertainty in Intelligence to
Support Decision Making (SAS-114) and Principal Investigator of multiple Canadian government projects aimed at
improving intelligence production through the application of decision science.
ORCID
David R. Mandel http://orcid.org/0000-0003-1036-2286
Bibliography
Ahlawat, S. S. “Order Effects and Memory for Evidence in Individual versus Group Decision Making in Auditing.”
Journal of Behavioral Decision Making 12 (1999): 71–88. doi:10.1002/(ISSN)1099-0771.
Aid, M. “US HUMINT and COMINT in the Korean War: From the Approach of War to the Chinese Intervention.”
Intelligence and National Security 14, no. 4 (1999): 17–63. doi:10.1080/02684529908432570.
Arkes, H. R., and J. Kajdasz. “Intuitive Theories of Behavior.” In Intelligence Analysis: Behavioral and Social Scientific
Foundations, edited by B. Fischhoff and C. Chauvin, 143–168. Washington, DC: The National Academies Press,
2011.
Azotea, C. M. “Operational Intelligence Failures of the Korean War.” Master’s thesis, US Army Command and General
Staff College, School of Advanced Military Studies, 2014.
Baker, J. D., J. M. McKendry, and D. J. Mace. “Certitude Judgments in an Operational Environment.” Technical Research
Note 200. Arlington, VA: US Army Research Institute for Behavioral and Social Sciences, 1968.
Bang, D., and C. D. Frith. “Making Better Decisions in Groups.” Royal Society Open Science 4, no. 8 (2017): 170193.
doi:10.1098/rsos.170193.
Barnes, A. “Making Intelligence Analysis More Intelligent: Using Numeric Probabilities.” Intelligence and National
Security 31, no. 3 (2016): 327–344. doi:10.1080/02684527.2014.994955.
Besombes, J., V. Nimier, and L. Cholvy. “Information Evaluation in Fusion Using Information Correlation.” Paper
presented at the 12th International Conference on Information Fusion, Seattle, WA, 2009.
Betts, R. K. “Two Faces of Intelligence Failure: September 11 and Iraq’s Missing WMD.” Political Science Quarterly 122,
no. 4 (2008): 585–606. doi:10.1002/j.1538-165X.2007.tb00610.x.
Brun, W., and K. H. Teigen. “Verbal Probabilities: Ambiguous, Context-Dependent, or Both?” Organizational Behavior
and Human Decision Processes 41 (1988): 390–404. doi:10.1016/0749-5978(88)90036-2.
Canadian Forces School of Military Intelligence. Source Reliability and Information Credibility Matrix (V 1.2). Kingston,
Canada: DND, no date.
Capet, P., and T. Delavallade, eds. Information Evaluation. Hoboken, NJ: Wiley-ISTE, 2014.
Capet, P., and A. Revault d’Allonnes. “Information Evaluation in the Military Domain: Doctrines, Practices, and
Shortcomings.” In Information Evaluation, edited by P. Capet and T. Delavallade, 103–125. Hoboken, NJ: Wiley-
ISTE, 2014.
Carter, D. L. Law Enforcement Intelligence: A Guide for State, Local, and Tribal Law Enforcement Agencies. 2nd ed.
Washington, DC: Office of Community Oriented Policing Services, U.S. Department of Justice, 2009, 57–75, 283–
317.
Chang, W., P. Atanasov, S. Patil, B. A. Mellers, and P. E. Tetlock. “Accountability and Adaptive Performance under
Uncertainty: A Long-Term View.” Judgment and Decision Making 12, no. 6 (2017): 610–626.
Chang, W., E. Berdini, D. R. Mandel, and P. E. Tetlock. “Restructuring Structured Analytic Techniques in Intelligence.”
Intelligence and National Security 33, no. 3 (2018): 337–356. doi:10.1080/02684527.2017.1400230.
Chang, W., E. Chen, B. Mellers, and P. Tetlock. “Developing Expert Political Judgment: The Impact of Training and
Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments.” Judgment and Decision Making 11, no.
5 (2016): 509–526.
Chang, W., and P. E. Tetlock. “Rethinking the Training of Intelligence Analysts.” Intelligence and National Security 31, no.
6 (2016): 903–920. doi:10.1080/02684527.2016.1147164.
Cholvy, L. “Information Evaluation in Fusion: A Case Study.” Paper presented at the International Conference on
Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, 2004.
Cholvy, L., and V. Nimier. “Information Evaluation: Discussion About STANAG 2022 Recommendations.” In Proceedings
of the RTO IST Symposium on Military Data and Information Fusion, Prague, Czech Republic, 2003.
Department of National Defence. Canadian Forces Joint Publication CFJP 2-0, Intelligence. Ottawa, Canada: DND, 2011.
Dhami, M. K. “Towards an Evidence-Based Approach to Communicating Uncertainty in Intelligence Analysis.”
Intelligence and National Security 33, no. 2 (2018): 257–272. doi:10.1080/02684527.2017.1394252.
Dhami, M. K., D. R. Mandel, B. A. Mellers, and P. E. Tetlock. “Improving Intelligence Analysis with Decision Science.”
Perspectives on Psychological Science 10, no. 6 (2015): 753–757. doi:10.1177/1745691615598511.
Edwards, W., H. Lindman, and L. J. Savage. “Bayesian Statistical Inference for Psychological Research.” Psychological
Review 70, no. 3 (1963): 193–242. doi:10.1037/h0044139.
Finley, J. P. The Uncertain Oracle: Some Intelligence Failures Revisited. Fort Huachuca, AZ: US Army Intelligence Center
and Fort Huachuca, 1995.
Fischhoff, B., and C. Chauvin, eds. Intelligence Analysis: Behavioral and Social Scientific Foundations. Washington, DC: The
National Academies Press, 2011.
Friedman, J. A., J. D. Baker, B. A. Mellers, P. E. Tetlock, and R. Zeckhauser. “The Value of Precision in Probability
Assessment: Evidence from a Large-Scale Geopolitical Forecasting Tournament.” International Studies Quarterly 62,
no. 2 (2018): 410–422.
Friedman, J. A., J. S. Lerner, and R. Zeckhauser. “Behavioral Consequences of Probabilistic Precision: Experimental
Evidence from National Security Professionals.” International Organization 71, no. 4 (2017): 803–826. doi:10.1017/
S0020818317000352.
Friedman, J. A., and R. Zeckhauser. “Assessing Uncertainty in Intelligence.” Intelligence and National Security 27, no. 6
(2012): 824–847. doi:10.1080/02684527.2012.708275.
Friedman, J. A., and R. Zeckhauser. “Handling and Mishandling Estimative Probability: Likelihood, Confidence, and the
Search for Bin Laden.” Intelligence and National Security 30, no. 1 (2015): 77–99. doi:10.1080/02684527.2014.885202.
Hanson, J. M. “The Admiralty Code: A Cognitive Tool for Self-Directed Learning.” International Journal of Learning,
Teaching and Educational Research 14, no. 1 (2015): 97–115.
Heale, R., and D. Forbes. “Understanding Triangulation in Research.” Evidence Based Nursing 16 (2013): 98. doi:10.1136/
eb-2012-101141.
Heuer, R. J., Jr. Psychology of Intelligence Analysis. Washington, DC: Central Intelligence Agency, Center for the Study of
Intelligence, 1999.
Hulnick, A. S. “What’s Wrong with the Intelligence Cycle.” Intelligence and National Security 21, no. 6 (2006): 959–979.
doi:10.1080/02684520601046291.
Irwin, D., and D. R. Mandel. Methods for Communicating Analytic Confidence in Intelligence to Decision-Makers: An
Annotated Collection. DRDC Scientific Letter: DRDC-RDDC-2018-L020. Ottawa, Canada: DND, 2018.
Irwin, D., and D. R. Mandel. Methods for Communicating Estimative Probability in Intelligence to Decision-Makers: An
Annotated Collection. DRDC Scientific Letter: DRDC-RDDC-2018-L017. Ottawa, Canada: DND, 2018.
Irwin, D., and D. R. Mandel. Methods for Evaluating Source Reliability and Information Credibility in Intelligence and Law
Enforcement: An Annotated Collection. DRDC Scientific Letter: DRDC-RDDC-2018-L035. Ottawa, Canada: DND, 2018.
Johnson, E. M., R. C. Cavanagh, R. L. Spooner, and M. G. Samet. “Utilization of Reliability Measurements in Bayesian
Inference: Models and Human Performance.” IEEE Transactions on Reliability 22, no. 3 (1973): 176–182. doi:10.1109/
TR.1973.5215934.
Joseph, J., and J. Corkill. “Information Evaluation: How One Group of Intelligence Analysts Go about the Task.” In
Proceedings of the fourth Australian Security and Intelligence Conference, Perth, Australia, 2011.
Karvetski, C. W., K. C. Olson, D. R. Mandel, and C. R. Twardy. “Probabilistic Coherence Weighting for Optimizing Expert
Forecasts.” Decision Analysis 10, no. 4 (2013): 305–326. doi:10.1287/deca.2013.0279.
Kent, S. “Words of Estimative Probability.” In Sherman Kent and the Board of National Estimates: Collected Essays, edited
by D. P. Steury, 133–146. Washington, DC: Center for the Study of Intelligence, 1994.
Kerbel, J. Are the Analytic Tradecraft Standards Hurting as much as Helping? Research Short. Bethesda, MD: National
Intelligence University, 2017.
Laughlin, P. R. Group Problem Solving. Princeton, NJ: Princeton University Press, 2011.
Laughlin, P. R., E. C. Hatch, J. S. Silver, and L. Boh. “Groups Perform Better than the Best Individuals on Letters-to-
Numbers Problems: Effects of Group Size.” Journal of Personality and Social Psychology 90, no. 4 (2006): 644–651.
doi:10.1037/0022-3514.90.4.644.
Lemercier, P. “The Fundamentals of Intelligence.” In Information Evaluation, edited by P. Capet and T. Delavallade, 55–
100. Hoboken, NJ: Wiley-ISTE, 2014.
Lerner, J. S., and P. E. Tetlock. “Accounting for the Effects of Accountability.” Psychological Bulletin 125, no. 2 (1999):
255–275.
Lesot, M., F. Pichon, and T. Delavallade. “Quantitative Information Evaluation: Modeling and Experimental Evaluation.”
In Information Evaluation, edited by P. Capet and T. Delavallade, 187–228. Hoboken, NJ: Wiley-ISTE, 2014.
Levine, J. M., and M. G. Samet. “Information Seeking with Multiple Sources of Conflicting and Unreliable Information.”
Human Factors 15, no. 4 (1973): 407–419. doi:10.1177/001872087301500412.
Mandel, D. R. “Are Risk Assessments of a Terrorist Attack Coherent?” Journal of Experimental Psychology: Applied 11, no.
4 (2005): 277–288. doi:10.1037/1076-898X.11.4.277.
Mandel, D. R. “Violations of Coherence in Subjective Probability: A Representational and Assessment Processes
Account.” Cognition 106, no. 1 (2008): 130–156. doi:10.1016/j.cognition.2007.01.001.
Mandel, D. R. “Accuracy of Intelligence Forecasts from the Intelligence Consumer’s Perspective.” Policy Insights from
the Behavioral and Brain Sciences 2 (2015): 111–120. doi:10.1177/2372732215602907.
Mandel, D. R. “Instruction in Information Structuring Improves Bayesian Judgment in Intelligence Analysts.” Frontiers in
Psychology 6 (2015): 387. doi:10.3389/fpsyg.2015.00387.
Mandel, D. R. “Proceedings of SAS-114 Workshop on Communicating Uncertainty, Assessing Information Quality and
Risk, and Using Structured Techniques in Intelligence Analysis.” In NATO Meeting Proceedings [Pub. Ref. STO-MP-
SAS-114-AC/323(SAS-114)TP/780]. Brussels, Belgium: NATO STO, 2018. doi: 10.14339/STO-MP-SAS-114.
Mandel, D. R. “Can Decision Science Improve Intelligence Analysis?” In Researching National Security Intelligence: A
Reader, edited by S. Coulthart, M. Landon-Murray, D. Van Puyvelde. Washington, DC: Georgetown University Press,
in press.
Mandel, D. R., and A. Barnes. “Accuracy of Forecasts in Strategic Intelligence.” Proceedings of the National Academy of
Sciences 111, no. 30 (2014): 10984–10989. doi:10.1073/pnas.1406138111.
Mandel, D. R., and A. Barnes. “Geopolitical Forecasting Skill in Strategic Intelligence.” Journal of Behavioral Decision
Making 31, no. 1 (2018): 127–137. doi:10.1002/bdm.v31.1.
Mandel, D. R., C. W. Karvetski, and M. K. Dhami. “Boosting Intelligence Analysts’ Judgment Accuracy: What Works?
What Fails?” Judgment and Decision Making 13, no. 6 (2018): 607–621.
Marchio, J. “‘Analytic Tradecraft and the Intelligence Community: Enduring Value, Intermittent Emphasis.” Intelligence
and National Security 29, no. 2 (2014): 159–183. doi:10.1080/02684527.2012.746415.
McDowell, D. Strategic Intelligence: A Handbook for Practitioners, Managers, and Users. Lanham, MD: Scarecrow Press,
2009.
McNaught, K., and P. Sutovsky. “Representing Variable Source Credibility in Intelligence Analysis with Bayesian
Networks.” In Proceedings of the fifth Australian Security and Intelligence Conference, Perth, Australia, 2012.
Mellers, B., E. Stone, P. Atanasov, N. Rohrbaugh, S. E. Metz, L. Ungar, M. M. Bishop, M. Horowitz, E. Merkle, and P.
Tetlock. “The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics.” Journal of
Experimental Psychology: Applied 21, no. 1 (2015): 1–14. doi:10.1037/xap0000040.
Mellers, B., L. Ungar, J. Baron, J. Ramos, B. Gurcay, K. Fincher, S. E. Scott, et al. “Psychological Strategies for Winning a
Geopolitical Forecasting Tournament.” Psychological Science 25, no. 5 (2014): 1106–1115. doi:10.1177/
0956797614524255.
Mellers, B. A., J. D. Baker, E. Chen, D. R. Mandel, and P. E. Tetlock. “How Generalizable Is Good Judgement? A Multi-
Task, Multi-Benchmark Study.” Judgment and Decision Making 12, no. 4 (2017): 369–381.
Miron, M. S., S. M. Patten, and S. M. Halpin. “The Structure of Combat Intelligence Ratings.” Technical Paper 286.
Arlington, VA: US Army Research Institute for Behavioral and Social Sciences, 1978.
Murphy, A. H., S. Lichtenstein, B. Fischhoff, and R. L. Winkler. “Misinterpretation of Precipitation Probability Forecasts.”
Bulletin of the American Meteorological Society 61 (1980): 695–701. doi:10.1175/1520-0477(1980)061<0695:
MOPPF>2.0.CO;2.
NATO Standardization Office. STANAG 2511 – Intelligence Reports. 1st ed. Brussels, Belgium, 2003.
NATO Standardization Office. AJP-2.1, Edition B, Version 1: Allied Joint Doctrine for Intelligence Procedures. Brussels,
Belgium, 2016.
Nelson, J. D. “Finding Useful Questions: On Bayesian Diagnosticity, Probability, Impact, and Information Gain.”
Psychological Review 112, no. 4 (2005): 979–999. doi:10.1037/0033-295X.112.4.979.
Nickerson, R. S. “Confirmation Bias: A Ubiquitous Phenomenon in Many Guises.” Review of General Psychology 2, no. 2
(1998): 175–220. doi:10.1037/1089-2680.2.2.175.
Nickerson, R. S., and C. E. Feehrer. “Decision Making and Training: A Review of Theoretical and Empirical Studies of
Decision Making and Their Implications for the Training of Decision Makers.” Technical Report: NAVTRAEQUIPCEN 73-
C-0128-1. Cambridge, MA: Bolt, Beranek and Newman, Inc., 1975.
Noble, G. P., Jr. “Diagnosing Distortion in Source Reporting: Lessons for HUMINT Reliability from Other Fields.” Master’s
thesis, Mercyhurst College, 2009.
Norman, D. “How to Identify Credible Sources on the Web.” Master’s thesis, Joint Military Intelligence College, 2001.
Oskamp, S. “Overconfidence in Case-Study Judgements.” Journal of Consulting Psychology 29, no. 3 (1965): 261–265.
doi:10.1037/h0022125.
Pechan, B. L. “The Collector’s Role in Evaluation.” In Inside CIA’s Private World: Declassified Articles from the Agency’s
Internal Journal, 1955-1992, edited by H. B. Westerfield, 99–107. New Haven, CT: Yale University Press, 1995.
Piercey, M. D. “Motivated Reasoning and Verbal vs. Numerical Probability Assessment: Evidence from an Accounting
Context.” Organizational Behavior and Human Decision Processes 108, no. 2 (2009): 330–341. doi:10.1016/j.
obhdp.2008.05.004.
Pool, R. Field Evaluation in the Intelligence and Counterintelligence Context: Workshop Summary. Washington, DC: The
National Academies Press, 2010.
Rieber, S. “Intelligence Analysis and Judgemental Calibration.” International Journal of Intelligence and
CounterIntelligence 17, no. 1 (2004): 97–112. doi:10.1080/08850600490273431.
Rieber, S., and N. Thomason. “Creation of a National Institute for Analytic Methods.” Studies in Intelligence 49, no. 4
(2005): 71–78.
Rogova, G., M. Hadzagic, M. St-Hillaire, M. Florea, and P. Valin. “Context-Based Information Quality for Sequential
Decision Making.” Paper presented at the 2013 IEEE International Multi-Disciplinary Conference on Cognitive
Methods in Situation Awareness and Decision Support (CogSIMA), San Diego, CA, 2013.
Rogova, G., and P. Scott, eds. Fusion Methodologies in Crisis Management: Higher Level Fusion and Decision Making.
Cham, Switzerland: Springer International Publishing, 2016.
Rogova, G. L. “Information Quality in Information Fusion and Decision Making with Applications to Crisis
Management.” In Fusion Methodologies in Crisis Management, Higher Level Fusion and Decision Making, edited by
G. L. Rogova and P. Scott, 65–86. Cham, Switzerland: Springer International Publishing, 2016.
Samet, M. G. “Quantitative Interpretation of Two Qualitative Scales Used to Rate Military Intelligence.” Human Factors
17, no. 2 (1975): 192–202. doi:10.1177/001872087501700210.
Samet, M. G. “Subjective Interpretation of Reliability and Accuracy Scales for Evaluating Military Intelligence.” Technical
Paper 260. Arlington, VA: US Army Research Institute for Behavioral and Social Sciences, 1975.
Savage, L. J. The Foundations of Statistics. New York, NY: Wiley, 1954.
Scholten, L., D. van Knippenberg, B. A. Nijstad, and C. K. W. De Dreu. “Motivated Information Processing and Group
Decision-Making: Effects of Process Accountability on Information Processing and Decision Quality.” Journal of
Experimental Social Psychology 43 (2007): 539–552. doi:10.1016/j.jesp.2006.05.010.
Schum, D. A. Evidence and Inference for the Intelligence Analyst. Lanham, MD: University Press of America, 1987.
Schum, D. A., and J. R. Morris. “Assessing the Competence and Credibility of Human Sources of Intelligence Evidence:
Contributions from Law and Probability.” Law, Probability and Risk 6 (2007): 247–274. doi:10.1093/lpr/mgm025.
Sedlmeier, P., and G. Gigerenzer. “Teaching Bayesian Reasoning in Less than Two Hours.” Journal of Experimental
Psychology: General 130, no. 3 (2001): 380–400.
Siegel-Jacobs, K., and J. F. Yates. “Effects of Procedural and Outcome Accountability on Judgment Quality.”
Organizational Behavior and Human Decision Processes 65, no. 1 (1996): 1–17. doi:10.1006/obhd.1996.0001.
Tecuci, G., M. Boicu, D. Schum, and D. Marcu. “Coping with the Complexity of Intelligence Analysis: Cognitive
Assistants for Evidence-Based Reasoning.” Research Report #7, Learning Agents Center. Fairfax, VA: George Mason
University, 2010.
Teigen, K. H., and W. Brun. “The Directionality of Verbal Probability Expressions: Effects on Decisions, Predictions, and
Probabilistic Reasoning.” Organizational Behavior and Human Decision Processes 80, no. 2 (1999): 155–190.
doi:10.1006/obhd.1999.2857.
Tetlock, P. E., and D. Gardner. Superforecasting: The Art and Science of Prediction. New York, NY: Crown Publishing
Group, 2015.
Tetlock, P. E., and B. A. Mellers. “Intelligent Management of Intelligence Agencies: Beyond Accountability Ping-Pong.”
American Psychologist 66, no. 6 (2011): 542–554. doi:10.1037/a0024285.
Tsai, C. I., J. Klayman, and R. Hastie. “Effects of Amount of Information on Judgment Accuracy and Confidence.”
Organizational Behavior and Human Decision Processes 107 (2008): 97–105. doi:10.1016/j.obhdp.2008.01.005.
Tubbs, R. M., G. J. Gaeth, I. P. Levin, and L. A. Van Osdol. “Order Effects in Belief Updating with Consistent and
Inconsistent Evidence.” Journal of Behavioral Decision Making 6 (1993): 257–269. doi:10.1002/(ISSN)1099-0771.
Tversky, A., and D. J. Koehler. “Support Theory: A Nonextensional Representation of Subjective Probability.”
Psychological Review 101, no. 4 (1994): 547–567. doi:10.1037/0033-295X.101.4.547.
United Kingdom Ministry of Defence. Joint Doctrine Publication JDP 2-00, Understanding and Intelligence Support to
Joint Operations. 3rd ed. Swindon, United Kingdom: MOD, 2011.
United Nations Office on Drugs and Crime. Criminal Intelligence Manual for Analysts. Vienna, Austria: UNODC, 2011.
United States Department of the Army. Field Manual FM 30-5, Combat Intelligence. Washington, DC: DOD, 1951.
United States Department of the Army. Field Manual FM 2-22.3, Human Intelligence Collector Operations. Washington,
DC: DOD, 2006.
United States Department of the Army. Document and Media Exploitation Tactics, Techniques, and Procedures ATTP 2-
91.5 (Final Draft). Washington, DC: DOD, 2010.
United States Department of the Army. Training Circular TC 2-91.8, Document and Media Exploitation. Washington, DC:
DOD, 2010.
United States Department of the Army. Army Techniques Publication ATP 2-22.9, Open-Source Intelligence. Washington,
DC: DOD, 2012.
United States Department of the Army. Army Techniques Publication ATP 3-39.20, Police Intelligence Operations.
Washington, DC: DOD, 2012.
Villejoubert, G., and D. R. Mandel. “The Inverse Fallacy: An Account of Deviations from Bayes’s Theorem and the
Additivity Principle.” Memory & Cognition 30, no. 2 (2002): 171–178. doi:10.3758/BF03195278.
Wallsten, T. S., D. V. Budescu, R. Zwick, and S. M. Kemp. “Preferences and Reasons for Communicating Probabilistic
Information in Verbal or Numerical Terms.” Bulletin of the Psychonomic Society 31 (1993): 135–138. doi:10.3758/
BF03334162.
Wang, G., S. R. Kulkarni, H. V. Poor, and D. N. Osherson. “Aggregating Large Sets of Probabilistic Forecasts by Weighted
Coherent Adjustment.” Decision Analysis 8 (2011): 128–144. doi:10.1287/deca.1110.0206.
Wheaton, K. “Re-Imagining the Intelligence Process or ‘Let’s Kill the Intelligence Cycle’.” 2012. http://files.isanet.org/
ConferenceArchive/1b1feb5339bf4573a9545306f734eedb.pdf
Wong, S., and R. Jassemi-Zargani. “Predicting Image Quality of Surveillance Sensors.” DRDC Scientific Report: DRDC-
RDDC-2014-R97. Ottawa, Canada: DND, 2014.
Yates, J. F., P. C. Price, J. Lee, and J. Ramirez. “Good Probabilistic Forecasters: The ‘Consumer’s’ Perspective.”
International Journal of Forecasting 12, no. 1 (1996): 41–56. doi:10.1016/0169-2070(95)00636-2.