To cite this article: Daniel Irwin & David R. Mandel (2019): Improving information evaluation for
intelligence production, Intelligence and National Security, DOI: 10.1080/02684527.2019.1569343
ABSTRACT
National security decision-making is informed by intelligence assessments, which in turn depend on sound information evaluation. We critically examine information evaluation methods, arguing that they mask rather than effectively guide subjectivity in intelligence assessment. Drawing on the guidance metaphor, we propose that rigid ‘all-purpose’ information evaluation methods be replaced by flexible ‘context-sensitive’ guidelines aimed at improving the soundness, precision, accuracy and clarity of irreducibly subjective judgments. Specific guidelines, supported by empirical evidence, include the use of numeric probability estimates to quantify the judged likelihood of information accuracy, promotion of collector-analyst collaboration, and periodic revaluation of information as new information is acquired.
Introduction
Intelligence practitioners must regularly exploit information of uncertain quality to support decision-
making.1 Whether information is obtained from a human source or an automated sensor, failure to
critically assess its characteristics may contribute to intelligence failure.2 This is evident in the case of
Curveball, the Iraqi informant who fabricated extensive testimony on Saddam Hussein’s alleged
weapons of mass destruction (WMD).3 Subjected to inadequate scrutiny, Curveball’s false allegations
underpinned the 2002 National Intelligence Estimate on Iraq’s WMD programmes and may have
influenced the ill-fated decision to invade Iraq in 2003.4
Recognizing information evaluation as a critical function within the intelligence process, certain
organizations promulgate methods for rating and communicating relevant information characteristics.
Despite their intent, many of these methods are inconsistent across organizations and may be
fundamentally flawed or otherwise ill-suited to the context of application. Under certain circumstances,
poorly formulated or misapplied evaluation procedures could even degrade the quality of analytic
judgements and impair decision-making.
In 1950, US officials adhering to such methods broadly applied the lowest possible reliability
rating to South Korean sources and dismissed repeated warnings of imminent North Korean
invasion as politically motivated.5 North Korea thus ‘achieved complete tactical surprise and
would nearly overwhelm the peninsula’ before being repelled by UN intervention.6 As UN forces
advanced on Pyongyang, local reports of Chinese mobilization were also prematurely discounted
as self-serving.7 As a result, US decision-makers were again blindsided when Communist Chinese
troops crossed the Yalu River and routed the Eighth Army.8 Despite this anecdote being nearly
70 years old, the information evaluation methods enshrined in current doctrine remain largely
unchanged.9
In this paper, we provide a critical examination of the source reliability and information
credibility methods in use across a variety of intelligence organizations and intelligence domains.
We identify weaknesses and inconsistencies that may undermine the fidelity of information evaluation.
Communicative issues
Under the Admiralty Code, qualitative ratings of reliability and credibility form a demonstrably intuitive
progression.14 However, subjective interpretations of the boundaries between these ratings are likely
to vary among users, as are interpretations of the relevant rating criteria.15 For instance, in many
versions of the Admiralty Code, a reliable (‘A’) source is said to have a ‘history of complete reliability’,
while a usually reliable (‘B’) source has a ‘history of valid information most of the time’.16 None of the
standards examined associate these descriptions with numeric values (i.e., ‘batting averages’), potentially leading to miscommunication. One analyst may assign usually reliable to sources that provide
valid information more than 70 per cent of the time. An analyst receiving this rating may interpret it to
mean valid information more than 90 per cent of the time, and place more confidence in the source
than is warranted. Conversely, an analyst may assume usually reliable only reflects valid information
more than 50 per cent of the time and might prematurely discount the source. Asked to assign absolute
probability values to reliability and credibility ratings, US intelligence officers demonstrated considerable variation in their interpretations.17 For example, probabilistic interpretations of usually reliable and
probably true ranged from .55 to .90 and .53 to .90, respectively, while interpretations of fairly reliable
and possibly true both ranged from .40 to .80.18 The latter terms not only varied greatly in meaning across officers but also straddled the fifty-fifty midpoint of the probability scale, which suggests that the same evaluations may point different analysts in different directions, prompting some to reject information that others would accept.
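The practical consequence of this interpretive spread can be made concrete with a simple sketch. Assuming three hypothetical analysts who map the same fairly reliable rating to different points within the reported .40–.80 range, and a plausible (but invented) ‘more likely true than not’ decision rule, the same verbal rating sends them in opposite directions:

```python
# Illustrative sketch: three hypothetical analysts map the same 'fairly
# reliable' rating to different numeric values within the reported .40-.80 range.
interpretations = {
    "analyst_1": 0.40,
    "analyst_2": 0.60,
    "analyst_3": 0.80,
}

# Hypothetical decision rule: use information judged more likely true than not.
ACCEPT_THRESHOLD = 0.50

def decision(p: float) -> str:
    """Accept or discount information given a numeric interpretation of its rating."""
    return "accept" if p > ACCEPT_THRESHOLD else "discount"

decisions = {analyst: decision(p) for analyst, p in interpretations.items()}
# The same verbal rating prompts one analyst to discount what another accepts.
```

Under any reasonable threshold, terms that straddle the midpoint of the scale will partition analysts into acceptors and rejecters of the very same report.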
Among the methods examined, reliable or completely reliable indicates maximum source reliability, while confirmed, confirmed by other sources or completely credible marks the highest degree of
information credibility. Despite these inconsistencies, most scales faithfully reproduce the
Admiralty Code’s A–F (reliability)/1–6 (credibility) scoring scheme, and ratings are often communicated using only the appropriate alphanumeric code (e.g., A1). These terminological variations
may therefore contribute to miscommunication between users familiar with different methods. For
instance, under most US methods examined, ‘A’ is defined as reliable, while UK Joint Doctrine
Publication (JDP) 2–00 defines ‘A’ as completely reliable (conforming to NATO doctrine).19 A US
analyst who understands ‘A’ to mean reliable might transmit that rating to a UK counterpart, who
interprets it as completely reliable. This translation is potentially problematic, given that an analyst
or consumer may place more weight on a source labelled completely reliable than one labelled
reliable. Alternatively, the translation from completely reliable to reliable could lead a recipient to
undervalue a source.
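The translation failure described above can be sketched in a few lines. The term tables below are deliberately reduced to the single code discussed in the text and are not complete doctrinal tables:

```python
# Hypothetical, simplified term tables for illustration; actual doctrinal
# wording varies by document and edition.
us_terms = {"A": "reliable"}               # per most US methods examined
uk_terms = {"A": "completely reliable"}    # per UK JDP 2-00 / NATO doctrine

code = "A"
sent_meaning = us_terms[code]       # what the US analyst intends
received_meaning = uk_terms[code]   # what the UK counterpart reads

# The alphanumeric code survives transmission intact,
# but the verbal meaning attached to it does not.
meaning_preserved = sent_meaning == received_meaning
```

Because only the code is transmitted, neither party has any indication that the meaning was lost in transit.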
Opportunities for miscommunication due to variance between methods could also arise where
information credibility scales use the term ‘information accuracy’ as a synonym for information
credibility (e.g., US ATTP 2–91.5; US TC 2–91.8).20 Although information credibility often includes
considerations of accuracy, it is usually conceptualized as a multidimensional construct. Credibility
generally incorporates criteria that can serve as cues to accuracy, but which are not equivalent to
accuracy. For example, ‘triangulating evidence’ contributes to credibility, but it does not require
ground truth used for verifying accuracy. Thus, use of ‘information accuracy’ by certain standards
may further undermine reliable interpretation of ratings.
Another communicative issue relates to liberal use of terms conveying certainty (e.g., confirmed).
In intelligence contexts where the information is seldom complete and often ambiguous or vague,
these expressions could lead to overconfidence on the part of consumers.21 NATO intelligence
doctrine explicitly discourages statements of certainty, ‘given the nature of intelligence projecting
forward in time’.22 However, it remains unverified whether NATO’s current term, completely credible,
conveys less certainty than its earlier term, confirmed by other sources.
The problematic inclusion of terms indicating certainty is compounded by the observed tendency
of evaluators to confine their ratings to the high ends of the scales.23 In their review of spot reports
completed during a US Army field exercise, Baker, McKendry and Mace found that A1 and B2
represented 80 per cent of all reliability/credibility ratings, with B2 alone comprising 74 per cent of
ratings.24 This tendency to treat B2 as a ‘sweet spot’ is particularly concerning given findings that
decision makers who receive highly rated information are less likely to seek additional information
prior to making an initial decision.25
The constriction of scale use (i.e., the process by which the five ordinal levels provided are
effectively narrowed to two) might reflect accountability pressure – namely, the ‘implicit or explicit
expectation that one may be called on to justify one’s beliefs, feelings, and actions to others’.26
Accountability pressure has been posited to account for why analysts exhibit underconfidence in
strategic intelligence forecasting, and why they do so even more strongly when the forecasts are
deemed to be of high importance to decision-makers.27 Individuals under accountability pressures
engage in a variety of tactics to secure approval and pre-empt criticism from their respective
audiences.28 A rating of B2 likely represents the accountability sweet spot because it uses terms
with positive linguistic directionality that optimistically signal the value of the information from a
trustworthy source, yet without sounding overconfident or insufficiently critical.29 In fact, intelligence analysts show a distinct bias towards using verbal probabilities with positive directionality in
strategic intelligence forecasts.30 Thus, evaluators may prefer to avoid ratings worse than B2 if they
anticipate being challenged for doing so. Conversely, evaluators may view ratings higher than B2
as running the risk of seeming too confident or uncritical. Note as well that as scale levels fall into disuse, they become increasingly untenable: strong normative pressures build against using them, since one is expected to be especially well justified before departing from normal practice.
Criterial issues
Beyond the communicative issues outlined above, a set of criterial issues stem from the rating
determinants incorporated (and neglected) by current evaluation methods. A particularly problematic
feature of the Admiralty Code is its lack of situational considerations and implicit treatment of source
reliability as constant across different contexts.31 Regardless of past performance, source reliability may
vary dramatically depending on the nature of the information provided, the characteristics of the
source(s), and the circumstances of collection. A human intelligence (HUMINT) source with a proven
track record reporting on military operations may lack the expertise to reliably observe and report on
economic developments. Beyond variable expertise, HUMINT source motivations, expectations, sensitivity and recall ability may shift between situations, with major implications for information quality.32
Even the reliability of an ‘objective source’ (i.e., a sensor) is context dependent.33 For example,
inclement weather may compromise the quality of information provided by an optical sensor, despite
its history of perfect reporting under ideal conditions.
Aside from source history, most of the methods examined highlight reliability determinants
such as ‘authenticity’, ‘competency’ and ‘trustworthiness’. The inclusion of these determinants is
consistent with the broader literature on source reliability.34 However, the extant methods fail to
formally define or operationalize these concepts. Their inclusion is therefore likely to increase
subjectivity and further undermine the internal consistency of source reliability evaluations. Chang
et al. describe how a process designed to decompose and evaluate components of a problem (i.e.,
information characteristics) may inadvertently amplify unreliability in assessments if that process is
ambiguous and open to subjective interpretations.35 Given the ambiguity built into current evaluation methods, users are unlikely to retrieve every relevant determinant, let alone reliably and validly weigh each one when arriving at an ordinal assessment.
Another issue with current source reliability methods is their failure to delineate procedures for
evaluating subjective sources versus objective sources (e.g., human sources versus sensors), or
primary sources versus secondary/relaying sources.36 A determinant such as source motivation may
be relevant when assessing HUMINT sources, but not sensors. Similarly, source expertise may be
highly relevant for a primary source collecting technical information (e.g., a HUMINT asset gathering information on Iranian nuclear technology), but less so for an intermediary delivering this
information to a collector. In cases where information passes through multiple sources, there are
often several intervals where source reliability considerations are relevant. For instance, when
receiving second-hand information from a HUMINT source, one might consider the reliability of
the primary source, the reliability of the secondary/relaying source(s), the reliability of the collector,
as well as the reliability of any medium(s) used to transmit the information.37
Noble describes how, following initial collection, information may undergo distortion at later stages of the intelligence process.38 Like sources, intelligence practitioners will vary in terms of
their ability to reliably assess and relay information. For instance, an economic subject matter
expert may lack the expertise to accurately evaluate and transmit information on enemy troop
movements. Beyond expertise, an intelligence practitioner’s assessment is also undoubtedly influenced by other personal characteristics (e.g., motivation, expectations, biases, recall ability) as well
as various contextual factors.39 When a finished intelligence product is edited and approved for
dissemination, managers may inject additional distortion by adjusting analytic conclusions.40 The
many opportunities for distortion may warrant the formalization of information evaluation as an
ongoing requirement throughout the intelligence process.41 At the very least, efforts should be
made to ensure that intelligence practitioners and consumers alike are cognizant of the mutability
of information characteristics following the initial evaluation.
Much like the scales for evaluating source reliability, information credibility scales suffer from an
inherent lack of clarity. Information credibility generally incorporates confirmation ‘by other independent sources’ as a key determinant. However, evaluators are given no guidance on how many
independent sources must provide confirmation for information to be judged credible. Where one
evaluator might consider confirmation by two sources sufficient for a confirmed rating, another
might seek verification by three or more sources. Perceptions of how much corroboration is
necessary may also vary depending on the information in question. For instance, an analyst may
decide that a particularly consequential piece of information requires more corroboration than
usual to be rated confirmed. This lack of consistency could lead analysts to misinterpret each other’s
credibility ratings.
Methods for evaluating information credibility also lack instructions for grading pieces of
information that are, by alternative evidence sources, simultaneously confirmed and disconfirmed.
Under the Admiralty Code, such information could be considered both confirmed/completely
credible (‘1’) and improbable (‘5’).42 Without guidance, some analysts may base their assessments more heavily on instances of confirmation, others might focus on instances of disconfirmation, and still others might attempt to strike a balance between the two. These three approaches may yield very different evaluations of the same piece of information under the same method.
Capet and Revault d’Allonnes argue that confirmation does not necessarily translate into
information credibility and not all forms of confirmation should be given equal weight.43 For
instance, a spurious rumour corroborated by many unreliable sources (e.g., tweets about a second
shooter during a terrorist attack) and disconfirmed by a single reliable source (e.g., a police
statement indicating a single attacker) could still be rated highly credible under current methods.
Capet and Revault d’Allonnes suggest that source reliability be taken into consideration when
weighing confirmation against disconfirmation.44 This would directly contravene the Admiralty
Code’s treatment of source reliability and information credibility as independent factors.
Current methods also lack explicit guidance to consider whether relationships of affinity,
hostility or independence45 exist between corroborating sources.46 However, corroboration from
a source that has a ‘friendly’ relationship with the source under scrutiny should likely have less
influence than corroboration from an independent or hostile source. For example, all else being
equal, if Saudi Arabia corroborates information provided by Syria (with which it currently has a
hostile relationship), that confirmation should carry more weight than identical confirmation
provided by Russia (which currently has a relationship of affinity with Syria). As a general rule,
sources that are friendly to one another should be expected to corroborate each other more often
than sources that are not friendly.47
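One way to operationalize this intuition is a corroboration score that weights each confirming report by the judged reliability of its source and discounts reports from sources with an affinity relationship to the source under scrutiny. The sketch below is not a doctrinal method, and all weights are illustrative assumptions:

```python
# Illustrative corroboration scoring. Each confirming report carries the
# judged reliability of its source (0-1) and that source's relationship
# to the source under scrutiny. All weights are assumptions.
RELATIONSHIP_WEIGHT = {"affinity": 0.5, "independent": 1.0, "hostile": 1.5}

def corroboration_score(reports):
    """Sum reliability-weighted confirmations, discounting friendly sources."""
    return sum(reliability * RELATIONSHIP_WEIGHT[relationship]
               for reliability, relationship in reports)

# Under this weighting, a single reliable, hostile confirmation outweighs
# several friendly, unreliable ones.
rumour_support = corroboration_score([(0.2, "affinity")] * 4)
hostile_support = corroboration_score([(0.9, "hostile")])
```

A scheme like this directly encodes the Saudi/Syria example above: identical confirmations receive different weight depending on the political relationship between the corroborating and corroborated sources.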
Friedman and Zeckhauser suggest that the current emphasis on consistency with existing
evidence may encourage confirmation bias.48 ‘Biased attrition’ is used to describe an information
filtering process that systematically favours certain information types in a problematic way.49
Information that conflicts with prior beliefs and analysis may in fact be more valuable, as it can
shift the views of analysts and consumers more significantly. Friedman and Zeckhauser argue that
methods for evaluating information credibility could reduce biased attrition by incorporating the
extent to which information provides a new or original perspective on the intelligence requirement
at hand.50 Likewise, Capet and Revault d’Allonnes suggest that evaluation methods be modified to
gauge the extent to which information provides meaningful corroboration.51
Along similar lines, Lemercier notes that confirmation-based credibility standards do not
account for the phenomenon of amplification, whereby analysts come to believe closely correlated
sources are independently verifying a piece of information.52 In order to control for amplification,
credibility evaluation could incorporate successive corroboration by the same source, corroboration by sources of the same type, as well as comparative corroboration from different collection
disciplines.53
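A minimal amplification control along these lines might count corroboration only once per distinct source while tracking how many collection disciplines contribute. The report structure assumed here (source identifier plus discipline tag) is hypothetical:

```python
# Sketch of an amplification control. Assumes each report is tagged with a
# source identifier and a collection discipline; the tags are hypothetical.
def effective_corroboration(reports):
    """Count corroboration once per distinct source, and note how many
    distinct collection disciplines contribute."""
    sources = {source for source, discipline in reports}
    disciplines = {discipline for source, discipline in reports}
    return len(sources), len(disciplines)

# Five reports tracing back to two closely correlated sources within a single
# discipline offer far less independent verification than a raw count of five suggests.
reports = [("S1", "OSINT"), ("S1", "OSINT"), ("S2", "OSINT"),
           ("S1", "OSINT"), ("S2", "OSINT")]
n_sources, n_disciplines = effective_corroboration(reports)
```

Deduplicating by source and discipline does not eliminate amplification (the two remaining sources may still be correlated), but it prevents the raw report count from masquerading as independent verification.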
The current emphasis placed on confirmation/consistency may also reinforce a primacy effect,
given that new information must conform to prior information to be deemed credible. All else
being equal, if an analyst receives three new pieces of information, the first item received will
typically face the fewest hurdles to being assessed as credible. Meanwhile, the second piece of
information must conform to the first, and the third must conform to both the first and second.
Under this system, an analyst may inadvertently underweight information that is in fact more
accurate or consequential than information received earlier, potentially decreasing the quality of
analysis. Ultimately, the order in which information is received should be irrelevant to judgements
made about its quality.54
One option for dealing with the primacy effect would be the inclusion of a triangulation procedure
for revaluating prior pieces of information as new information becomes available. Figure 1 compares
the primacy effect reinforced by current methods with a system of evidence triangulation, where the
arrows indicate the ‘direction’ of confirmation. Two of the US methods examined advocate continuous
analysis and revaluation of source reliability/information credibility as new information becomes
available.55 However, neither document outlines a specific procedure for doing so.
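One possible shape for such a procedure is to re-score every held item against the full evidence set whenever a new item arrives, so that arrival order has no bearing on the result. In the sketch below, the consistency metric is a crude placeholder standing in for real analytic judgement:

```python
# Minimal sketch of evidence triangulation: every previously held item is
# re-scored against the full body of evidence whenever the set changes,
# rather than new items being checked only against older ones.
def consistency(item, others):
    """Fraction of other items this item agrees with (placeholder metric)."""
    if not others:
        return 1.0
    return sum(1 for o in others if o["claim"] == item["claim"]) / len(others)

def triangulate(items):
    """Re-score every item against all the rest, regardless of arrival order."""
    return {it["id"]: consistency(it, [o for o in items if o is not it])
            for it in items}

items = [
    {"id": "first", "claim": "mobilizing"},   # arrived first
    {"id": "second", "claim": "static"},
    {"id": "third", "claim": "static"},
]
scores = triangulate(items)
# The first-arrived item enjoys no advantage: it scores lowest because it
# disagrees with the two later reports.
```

Because every item is evaluated against the whole set, the first report receives no structural advantage over later, better-corroborated ones.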
The lack of guidance regarding confirmation (specifically, what level of corroboration warrants each
rating) could also facilitate overconfidence stemming from information volume. Beyond an early point
in the information gathering process, predictive accuracy plateaus, while confidence continues to rise,
creating substantial confidence-accuracy discrepancies.56 Failing to adjust for their cognitive limitations, judges often become overconfident in the face of surplus information, despite being unable to
assimilate it effectively.57 Thus, without adequate guidance, evaluators may overvalue a piece of
information confirmed six times, when an item confirmed three times has an equal probability of
being accurate. In other words, the amount of confirmation on its own is a fallible indicator of
information accuracy.
Beyond confirmation, most of the information credibility scales examined incorporate consideration
of whether an item is ‘logical in itself’. Current methods do not specify whether this simply refers to the
extent that information conforms to the analyst’s current assessment. Furthermore – and not without a
touch of irony – the use in certain standards of ‘not illogical’ as a level between ‘logical in itself’ and
‘illogical in itself’ is nonsensical, as ‘not illogical’ effectively means ‘logical’ (in itself).58
As noted with regards to source reliability, the Admiralty Code’s one-size-fits-all approach to
information credibility neglects important contextual considerations. Several US evaluation methods
suggest that certain credibility determinants have more relevance depending on the collection
discipline utilized. For example, US TC 2–91.8 and US ATP 2–22.9 suggest that there is a greater risk of deception (an information credibility determinant) when utilizing open source intelligence than captured enemy documents.59

Figure 1. Primacy effect reinforced by current methods versus evidence triangulation (arrows indicate the direction of confirmation).

Similarly, US ATTP 2–91.5 refers to the Admiralty Code as the
‘HUMINT system’, and recommends the development of separate rating systems to assess the three
basic components of document and media exploitation (document exploitation/DOMEX, media
exploitation/MEDEX, cell phone exploitation/CELLEX).60
Joseph and Corkill stress that the Admiralty Code is a grading system rather than a comprehensive
evaluation methodology.61 Beyond what is outlined in the scales, evaluators may have a formal
assessment procedure and/or a more exhaustive list of determinants to consider. Supplementary
documents examined add some clarity to the methods, but these also vary in terms of the extra
determinants identified. Furthermore, none of these extra determinants (e.g., source ‘integrity’) are
defined or operationalized. Therefore, such supplementary material may further contribute to
unreliability in the application of the evaluation method.62
Structural issues
In addition to the communicative and criterial issues identified, current methods also vary in terms
of where they position information evaluation within the intelligence process. For instance, NATO
intelligence doctrine embeds evaluation procedures within the processing stage, thus emphasizing
the analyst’s role in gauging information characteristics.63 UK JDP 2–00 outlines a joint role for
analysts and collectors, whereby collectors are responsible for pre-rating information characteristics
before analysts weigh in with their own (potentially broader) understanding of the subject.64
Carter, Noble and Pechan emphasize the primary collector’s role in assessing reliability, particularly
in contexts where access to a clandestine source is restricted to the agent handler.65 This inconsistency is significant given the noted mutability of information characteristics over time, across
contexts and at different stages of the intelligence process itself.66
Whether information is assessed upon initial collection, by an analyst during processing, or by
several practitioners throughout the intelligence process could have a substantial impact on its
evaluation. Consequently, the extent to which information is deemed fit to use will largely
determine its influence on intelligence assessments (or lack thereof). In other words, the timing
of source and information evaluation within the intelligence process could add additional inter-
analytic inconsistency to the evaluation process. An intriguing question is whether information
evaluation is given more weight in intelligence analysis when the evaluation step is conducted by
the analyst rather than by the collector. Moreover, do individual differences in analyst characteristics play a role in how the evaluation is regarded in subsequent analysis? Perhaps analysts who have a disposition of high self-confidence place more weight on information evaluations they rendered personally, whereas analysts who are perennial self-doubters give more weight to evaluations from collectors. These hypotheses could be tractably tested in future research.
A compounding issue is the absence of mechanisms for revaluation when new information
becomes available and determinants, such as a source’s reliability rating, are updated.67 For example,
under current methods, it is unclear how users should treat information provided by a source long
considered completely reliable, but suddenly discovered to be unreliable. This is particularly complicated
when information ratings form an interdependent chain (e.g., Information A’s rating is tied to
Information B’s rating; Information B’s rating is tied to the rating of Source X; Source X has just been
exposed as a double agent). Together, these issues may warrant the implementation of information
evaluation as an iterative function throughout the intelligence process. This approach could be applied
to the evaluation of individual pieces of information, as well as the marshalling of evidence when
forming analytic judgements.68 As noted, certain US evaluation methods advocate continuous revaluation of information quality, but none examined provide a clear procedure for doing so.69
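A sketch of what such an iterative revaluation mechanism might look like for the interdependent chain described above follows; the dependency structure and the min-based update rule are illustrative assumptions, not doctrine:

```python
# Illustrative revaluation across an interdependent rating chain
# (Information A rests on Information B, which rests on Source X).
depends_on = {"info_A": "info_B", "info_B": "source_X"}
rating = {"info_A": 0.9, "info_B": 0.9, "source_X": 0.9}

def propagate(node, new_value):
    """Downgrade a node, then everything that transitively depends on it."""
    rating[node] = new_value
    for child, parent in depends_on.items():
        if parent == node:
            # a dependent item should not be rated above what it rests on
            propagate(child, min(rating[child], new_value))

propagate("source_X", 0.1)  # Source X is exposed as a double agent
# info_B, and through it info_A, are downgraded automatically.
```

The point is structural rather than numeric: once rating dependencies are recorded explicitly, a change anywhere in the chain can be pushed through mechanically instead of relying on each analyst to remember every downstream implication.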
procedures will vary considerably between source types (e.g., the Visible NATO Imagery Interpretability
Rating Scale is designed to assess the sensory capabilities of imaging systems, but it is not relevant to
other collection disciplines).78
Given the diversity of intelligence contexts, which are typically characterized by time constraints and
incomplete information, implementation of a reliable, all-encompassing scoring method may be
unrealistic. Identifying which determinants are relevant and how to weight and combine them will
ultimately fall to the judgement of each evaluator. However, by providing evaluators with sound
methods for evaluating information and for communicating those evaluations (where communication
is necessary), it should be possible to mitigate unreliability and inaccuracy in evaluations, and infidelity
in their communication to end users. Such methods should not only draw on scientific research, their
effectiveness should be the focus of applied scientific research supported by the IC. In the US, the
Intelligence Advanced Research Projects Activity (IARPA), a science and technology organization within the Office of the Director of National Intelligence, is an excellent example of this research approach.
IARPA is exceptional in that many, if not most, of its research programmes are geared towards improving intelligence production rather than intelligence collection. This is an unusual
strategy for the IC, which has disproportionately valued and invested in collection capabilities to
the virtual neglect of analysis.79 Nevertheless, it remains to be seen how well IARPA-funded research
will be leveraged within the IC for the benefit of intelligence production.
As described earlier, the Admiralty Code is predicated on the belief that source reliability and
information credibility must be independently evaluated, yet this belief is of questionable validity.
We argue that the dual-rating approach only elliptically addresses the central question underlying the evaluation step: is the information accurate, and should it be factored into analysis? Accordingly, we
propose the introduction of a single measure of estimated information accuracy that incorporates all
available characteristics, including source reliability and its various subcomponents.
Several studies support the synthesis of reliability and credibility into a comprehensive accuracy
measure. Evaluators have been found to pair source reliability and information credibility scores from
the same level (i.e., providing ratings of A1, B2, C3, D4, E5 or F6), and also to base decisions about
accuracy more on credibility than reliability.80 The ambiguity inherent in combining incongruent
ratings may partially explain why evaluators often default to ratings from the same level.81 Moreover,
Nickerson and Feehrer note that when no other information is available to gauge information
credibility, evaluators will logically base their rating on source reliability, given that reliable sources
tend to produce credible information.82 Similarly, Lemercier suggests that determining source
reliability is not an end in itself, but rather a means of assessing information credibility.83
A single accuracy measure could address several challenges related to incongruent ratings and the lack
of comparability between the two scales.84 Samet showed that analysts estimate accuracy less reliably
when basing their decision on separate reliability and credibility metrics than when accuracy is based on a
single measure.85 Similarly, Mandel et al. found that analysts show poor test-retest reliability when
estimating the accuracy of information with incongruent reliability/credibility scores (namely, A5, E1)
compared to when they estimated the accuracy of information with congruent scores (namely, A1, E5).86
The same study showed that inter-analyst agreement plummets as source reliability and information
credibility scores become less congruent.87 For instance, analysts agree less on the accuracy of A5 than A4,
which in turn yields lower agreement than A3, and so on. The research just discussed indicates that,
although the two scales may be treated as distinct in doctrine, in practice, evaluators and users do not –
and perhaps cannot for psychological reasons beyond their control – treat them as such.
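To illustrate what a single fused measure might look like, the sketch below maps the A–F/1–6 codes to numeric anchors and averages them. The anchors and the fusion rule are hypothetical assumptions that would require empirical validation, not values drawn from any doctrine or study:

```python
# Illustrative fusion of the dual A-F/1-6 code into one accuracy estimate.
# Numeric anchors are invented; 'F' and '6' mean 'cannot be judged'.
RELIABILITY = {"A": 0.95, "B": 0.80, "C": 0.60, "D": 0.40, "E": 0.20, "F": None}
CREDIBILITY = {1: 0.95, 2: 0.80, 3: 0.60, 4: 0.40, 5: 0.20, 6: None}

def accuracy_estimate(code):
    """Fuse e.g. 'B2' into a single number, or None if nothing can be judged."""
    rel = RELIABILITY[code[0]]
    cred = CREDIBILITY[int(code[1])]
    if rel is None and cred is None:
        return None
    if rel is None:
        return cred
    if cred is None:
        return rel
    return (rel + cred) / 2  # simple average; any fusion rule needs validation

# Incongruent codes such as 'A5' collapse to one explicit number instead of
# leaving the reliability/credibility trade-off to each reader.
```

Whatever fusion rule is ultimately chosen, the benefit claimed in the text is that incongruent combinations such as A5 yield a single, communicable estimate rather than an ambiguous pair.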
Current evaluation standards also lack mechanisms for comparing multiple items of varying quality,
which intelligence analysts are often required to do.88 For instance, it is unclear how analysts should
weight one piece of information rated B3 against another rated C2.89 The margin of interpretation may
be increased by the use of two different scale types; credibility comprises a positive-negative scale
(information is confirmed/invalidated), while reliability ranges from low/non-existent (the source has
provided little/no credible information) to a maximum level (the source has a history of complete
reliability).90 Without clear guidance on how to fuse the alphanumeric dual-valued code into a single
estimate of information accuracy, end users of such evaluations are left to their own (subjective)
interpretations.91 Again, this creates ample opportunity for unreliability within and across analysts, as
recent research has shown.92 The lack of comparability between scales also means that current
measures of reliability and credibility are ill suited for integration into a semi-automated system for
information evaluation.93
While Bayesian networks can help analysts explore uncertain situations and overcome cognitive
biases, routine (as opposed to supplemental) use of these models could degrade analytic quality
due to the challenges of estimating certain input parameters (e.g., assessing the accuracy of
information provided under conditions of anonymity).107 To this point, Rogova argues that a priori
domain knowledge is often essential when determining many of the input parameters in a system
for information evaluation.108 McNaught and Sutovsky only advocate the use of Bayesian networks
for evidence marshalling when the input parameters are known with a ‘reasonable degree’ of
accuracy.109 Simply put, Bayesian approaches can ensure a coherent integration of inputs, but a
coherent integration of ‘garbage in’ will still yield ‘garbage out’. Coherence does not ensure accuracy,
but it does raise the likelihood of accurate judgement.110
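The point can be illustrated with a minimal sketch: the Bayesian update below is perfectly coherent, yet its output tracks whatever false-report rate is fed in, a parameter that may be unknowable for, say, an anonymous source. All rates here are invented for illustration.

```python
def posterior_accuracy(prior, p_report_if_true, p_report_if_false):
    """P(information is accurate | the source reported it), by Bayes' rule."""
    joint_true = prior * p_report_if_true
    joint_false = (1 - prior) * p_report_if_false
    return joint_true / (joint_true + joint_false)

# The update is always coherent, but the posterior swings widely with the
# hard-to-estimate false-report rate (invented values):
for fpr in (0.05, 0.20, 0.50):
    print(f"false-report rate {fpr:.2f} -> "
          f"posterior {posterior_accuracy(0.5, 0.8, fpr):.2f}")
```

With a fixed prior of 0.5 and hit rate of 0.8, the posterior runs from roughly 0.94 down to roughly 0.62 as the assumed false-report rate grows: garbage in, coherently processed, is still garbage out.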
Opponents of numeric probabilities in intelligence (among them, Kent’s proverbial ‘poets’) warn
that by exaggerating precision and analytic rigour, such expressions may render decision-makers
overconfident and excessively risk-seeking.111 Contrary to this assumption, however, Friedman,
Lerner and Zeckhauser show that national security officials presented with numeric probability
assessments are actually less confident, and more receptive to gathering additional information.112
Quantifying the probability of information accuracy could also curb the empirically demonstrated
tendency of individuals to exploit ambiguous (i.e., verbal) uncertainty expressions to reach self-serving
conclusions.113 While communicators of probabilistic information generally favour verbal expressions
of uncertainty, it is well established that consumers of such information prefer numeric estimates.114
Given that information evaluation is fundamentally consumer-oriented (i.e., designed to inform
intelligence analysts), we expect these consumers, too, will favour numeric clarity over vague verbiage.
Concerns over exaggerated precision could be assuaged by explicitly educating both intelligence
producers and consumers that numeric probabilities can be used to convey degrees of belief or
subjective probabilities. In fact, numeric probability judgements do not imply anything about the
method by which one arrives at such judgements.115 Providing such information to educate users
could be achieved with a written disclaimer. Evaluators could also give confidence intervals on
information accuracy estimates, similar to the expressions of analytic confidence that accompany
certain intelligence assessments.116 For instance, an evaluator could judge the probability that
Information A is accurate to be 0.7 (or 70 per cent if expressed as a percentage) with a 95 per cent
confidence interval of 0.55–0.85. In other words, the evaluator is conveying that he or she is 95 per cent
certain the probability lies between 55 per cent and 85 per cent and offers 70 per cent as the best
current estimate. By providing an explicit confidence interval, evaluators could directly militate against
misperceptions of over-precision. A confidence interval would also provide unique meta-information
to the consumer (e.g., capturing cases where high quality information is in conflict) and could thus
prompt requests for additional information or clarification of why confidence may be so low.
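As a sketch of how such annotations might be represented, the following hypothetical record pairs a point estimate with its 95 per cent interval and flags unusually wide intervals for follow-up; the 0.4 width threshold is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AccuracyEstimate:
    """An information-accuracy judgement: best estimate plus a 95% interval."""
    p: float    # best estimate that the information is accurate, on 0-1
    lo: float   # lower bound of the 95 per cent interval
    hi: float   # upper bound of the 95 per cent interval

    def width(self) -> float:
        return self.hi - self.lo

    def needs_followup(self, max_width: float = 0.4) -> bool:
        # Hypothetical threshold: a wide interval signals conflicting or
        # sparse evidence and could prompt a request for clarification.
        return self.width() > max_width

# The example from the text: 0.7 with a 95 per cent interval of 0.55-0.85.
info_a = AccuracyEstimate(p=0.7, lo=0.55, hi=0.85)
print(round(info_a.width(), 2), info_a.needs_followup())
```

Here Information A's interval is narrow enough to pass without comment, whereas an estimate like (p=0.5, lo=0.1, hi=0.9) would be flagged, exactly the conflicting-evidence case the meta-information is meant to surface.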
A probabilistic approach to information evaluation could be supplemented with training
designed to improve collectors’ and analysts’ understanding of probability and their statistical
skill. For example, Mandel designed a brief training protocol on Bayesian belief revision and
hypothesis testing with probabilistic information.117 Military intelligence analysts were assessed
on accuracy and probabilistic coherence before and after receiving the training. The results showed
a statistically significant improvement after training on both accuracy and coherence of analysts’
probability estimates, suggesting that intelligence professionals can reap quick wins in learning
that might enable them to better understand the kinds of probabilistic models described above.118
Similar results have been reported with non-analyst samples.119
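The kind of sequential belief revision such training targets can be sketched as follows; the likelihood ratios in the exercise are invented.

```python
def revise(prior, likelihood_ratios):
    """Sequentially revise belief in a hypothesis, one evidence item at a time.

    Each likelihood ratio is P(evidence | H) / P(evidence | not-H); ratios
    above 1 support H, ratios below 1 count against it."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr              # Bayes' rule in odds form
    return odds / (1 + odds)    # convert back to a probability

# Invented exercise: start at 50:50, then see supporting, contrary, and
# supporting evidence in turn.
print(revise(0.5, [4.0, 0.5, 2.0]))  # -> 0.8
```

Working the update in odds form makes the arithmetic of each evidence item transparent, which is one reason brief instruction of this kind can yield the quick accuracy and coherence gains described above.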
This type of training is not only important for understanding such models, however. People
routinely violate logical constraints on probability assessments and exhibit systematic biases in
judgement, and there is no sound reason to believe that analysts are exempt from such
limitations.120 Indeed, findings from recent studies demonstrate that they are not.121 For instance,
Mandel, Karvetski and Dhami found that intelligence analysts violated coherence principles in
judging the probabilities of alternative hypotheses being true. Consistent with research on the
unpacking effect, analysts’ probability judgements assigned to four mutually exclusive and exhaustive
hypotheses summed to significantly more than 100 per cent.122 On average, analysts who used
the Analysis of Competing Hypotheses structured analytic technique to solve the hypothesis-
testing task were less coherent than analysts who were not instructed to use any structured
analytic technique.123 However, by using numeric probabilities, statistical (recalibration and
aggregation) methods could be applied to substantially improve both coherence and accuracy of
analysts’ judgements.124 The use of numeric probabilities would greatly improve the IC’s ability
to use post-analytic methods such as these to increase the accuracy and logical rigour of
assessments of information and, ultimately, intelligence.
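A minimal sketch of these two post-analytic steps: each analyst's judgements over mutually exclusive and exhaustive hypotheses are rescaled to sum to 100 per cent (a crude recalibration of the unpacking effect), then linearly pooled across analysts. Published methods are considerably more sophisticated; the judgements below are invented.

```python
def coherentize(probs):
    """Rescale one analyst's probabilities for mutually exclusive, exhaustive
    hypotheses so they sum to 1 (a simple fix for the unpacking effect)."""
    total = sum(probs)
    return [p / total for p in probs]

def linear_pool(judgments):
    """Average the coherentized probabilities across analysts, per hypothesis."""
    coherent = [coherentize(j) for j in judgments]
    n = len(coherent)
    return [sum(j[i] for j in coherent) / n for i in range(len(coherent[0]))]

# Invented judgments from three analysts over four hypotheses; note each
# analyst's raw probabilities sum to more than 1, as in the unpacking effect.
analysts = [
    [0.5, 0.4, 0.3, 0.2],   # sums to 1.4
    [0.6, 0.3, 0.3, 0.3],   # sums to 1.5
    [0.4, 0.4, 0.2, 0.2],   # sums to 1.2
]
pooled = linear_pool(analysts)
print([round(p, 3) for p in pooled])  # pooled estimates sum to exactly 1
```

None of this is possible when judgements are expressed only as vague verbal phrases, which is precisely the advantage of numeric probabilities for post-analytic correction.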
evaluators to pool reasoning strategies and expertise, and help address persistent communication gaps
between analysts and collectors.131
It should be emphasized that we are advocating a hybrid model of process accountability (via
written rationales) and outcome accountability (via accuracy estimates that can be objectively
measured to assess performance). In doing so, we are responding to the IC’s tendency to promote
good judgement by invoking process accountability.132 Efforts to standardize analytic processes,
unless paired with mechanisms to evaluate analytic outcomes, might in fact amount to little more
than ‘bureaucratic symbolism’ used to deflect blame for intelligence failures.133 As well, hybrid
accountability systems have been shown to strike an effective balance between pure accuracy
goals and knowledge-sharing goals.134
Conclusion
Information evaluation is a critical function within the intelligence process. Arguably, failure to evaluate
‘is tantamount to judging [every piece of information] as completely trustworthy and their inputs as
equally important’.137 Based on an examination of current information evaluation methods, we identify
several limitations that can undermine the fidelity of information assessments and, by extension, the
quality of intelligence analysis and decision-making that depends on it.
Considering the diversity of intelligence contexts, and the complex interactions between information
accuracy determinants, we argue that designing a comprehensive method for scoring accuracy
determinants is unrealistic, and could inadvertently increase unreliability. Instead, we recommend the
following: First, information accuracy should be communicated as a subjective probability expressed in
numeric form, and clarified (when warranted) by a confidence interval. Second, collaboration and
revaluation should be formalized during information evaluation. Third, information redundancy,
completeness and diagnosticity should be weighed later in the intelligence production stage, as part
of the assessment process. At the very least, evaluators and consumers should be made
aware of the limitations identified, and intelligence organizations should seek to rectify obvious
inconsistencies between methods, especially at the national level.
Rather than imposing these methods on evaluators in every circumstance, we favour a pragmatic,
contingent approach in which the level of evaluative detail corresponds to the relative importance of
the information under scrutiny. A highly consequential piece of information may warrant teamwork
and thorough annotation (i.e., an accuracy estimate, confidence interval and written rationale), whereas
a single accuracy estimate will suffice in most cases. Ultimately, subjective judgements will moderate
the level of detail of each evaluation. In providing these recommendations, we aim to improve
information evaluation without unduly taxing intelligence practitioners.
Notes
1. Johnson et al., “Utilization of Reliability Measurements.”
2. United Nations Office on Drugs and Crime, Criminal Intelligence Manual; and Carter, Law Enforcement
Intelligence.
3. Schum and Morris, “Assessing the Competence and Credibility of Human Sources”; and Betts, “Two Faces of
Intelligence Failure.”
4. Ibid.
5. Finley, The Uncertain Oracle; and Aid, “US HUMINT and COMINT.”
6. Finley, The Uncertain Oracle.
56. Nickerson and Feehrer, Decision Making and Training; Oskamp, “Overconfidence in Case-study Judgements”;
and Tsai, Klayman, and Hastie, “Effects of Amount of Information.”
57. Nickerson and Feehrer, Decision Making and Training; and Tsai, Klayman, and Hastie, “Effects of Amount of
Information.”
58. See note 2 above.
59. United States Department of the Army, TC 2–91.8; and United States Department of the Army, ATP 2–22.9.
60. United States Department of the Army, ATTP 2–91.5, 4–4.
61. Joseph and Corkill, “Information Evaluation.”
62. United Nations Office on Drugs and Crime, Criminal Intelligence Manual.
63. NATO Standardization Office, AJP-2.1.
64. United Kingdom Ministry of Defence, JDP 2–00.
65. Carter, Law Enforcement Intelligence; Noble, “Diagnosing Distortion”; and Pechan, “The Collector’s Role.”
66. Schum, Evidence and Inference; Pechan, “The Collector’s Role”; Cholvy and Nimier, “Information Evaluation”; and
Noble, “Diagnosing Distortion.”
67. See note 37 above.
68. Schum and Morris, “Assessing the Competence and Credibility of Human Sources”; and McNaught and
Sutovsky, “Representing Variable Source Credibility.”
69. United States Department of the Army, ATP 3–39.20; and United States Department of the Army, ATTP 2–91.5.
70. Kent, “Words of Estimative Probability.”
71. Rieber and Thomason, “Creation of a National Institute”; Pool, Field Evaluation; Dhami et al., “Improving
Intelligence Analysis”; and Mandel, “Can Decision Science Improve Intelligence Analysis?”
72. Pool, Field Evaluation; and Mandel and Barnes, “Geopolitical Forecasting Skill.”
73. Chang et al., “Restructuring Structured Analytic Techniques”; Irwin and Mandel, Methods for Communicating
Analytic Confidence; Irwin and Mandel, Methods for Communicating Estimative Probability; Barnes, “Making
Intelligence Analysis More Intelligent”; Dhami, “Towards an Evidence-based Approach”; Chang and Tetlock,
“Rethinking the Training”; Hulnick, “What’s Wrong with the Intelligence Cycle”; and Wheaton, “Re-imagining
the Intelligence Process.”
74. Chang et al., “Restructuring Structured Analytic Techniques”; Barnes, “Making Intelligence Analysis More
Intelligent”; Dhami, “Towards an Evidence-based Approach”; and Chang and Tetlock, “Rethinking the Training.”
75. Canadian Forces School of Military Intelligence, Source Reliability and Information; Schum and Morris,
“Assessing the Competence and Credibility of Human Sources”; and Lesot, Pichon, and Delavallade,
“Quantitative Information Evaluation.”
76. Rogova, “Information Quality”; and Rogova et al., “Context-based Information Quality.”
77. Rogova et al., “Context-based Information Quality.”
78. Wong and Jassemi-Zargani, “Predicting Image Quality.”
79. Kerbel, Are the Analytic Tradecraft Standards Hurting as Much as Helping?
80. Baker, McKendry, and Mace, Certitude Judgments; Miron, Patten, and Halpin, The Structure of Combat
Intelligence Ratings; and Samet, “Quantitative Interpretation of Two Qualitative Scales.”
81. See note 15 above.
82. Nickerson and Feehrer, Decision Making and Training.
83. See note 37 above.
84. See note 15 above.
85. Samet, “Quantitative Interpretation of Two Qualitative Scales.”
86. Mandel, “Proceedings of SAS-114 Workshop.”
87. Ibid.
88. McNaught and Sutovsky, “Representing Variable Source Credibility.”
89. See note 15 above.
90. Ibid.
91. Besombes, Nimier, and Cholvy, “Information Evaluation in Fusion.”
92. See note 87 above.
93. See note 33 above.
94. See note 14 above.
95. Friedman et al., “The Value of Precision.”
96. Ibid.
97. Tetlock and Gardner, Superforecasting.
98. See note 86 above.
99. Mellers et al., “Psychological Strategies”; Chang and Tetlock, “Rethinking the Training”; and Mandel and Barnes,
“Geopolitical Forecasting Skill.”
100. Rieber, “Intelligence Analysis”; and Fischhoff and Chauvin, Intelligence Analysis.
101. Marchio, “Analytic Tradecraft.”
102. Nelson, “Finding Useful Questions”; and Chang and Tetlock, “Rethinking the Training.”
Acknowledgements
The authors thank Lars Borg, Stephen Coulthart, Jeffrey Friedman, Kristan Wheaton and two anonymous reviewers for
their helpful comments on previous drafts.
Disclosure statement
No potential conflict of interest was reported by the authors.
Disclaimer
The views presented are those of the authors and do not represent the views of the Department of National Defence
or any of its components, or the Government of Canada. This article builds upon an earlier version that will be a
forthcoming chapter in the final report of the NATO System Analysis and Studies Panel Activity on Communication
and Assessment of Uncertainty in Intelligence to Support Decision-Making (SAS-114).
Funding
This work was supported by the Joint Intelligence Collection and Analytic Capability Project #05ad and Canadian
Safety and Security Program project #2016-TI-2224. These projects are carried out by Defence Research and
Development Canada, an agency of the Department of National Defence.
Notes on contributors
Daniel Irwin is a Research Technologist with Defence Research and Development Canada. He holds an MS in Applied
Intelligence from Mercyhurst University.
David R. Mandel is a senior Defence Scientist with Defence Research and Development Canada and Adjunct Professor
of Psychology at York University. He publishes widely in peer-reviewed journals on the topics of reasoning, judge-
ment, and decision-making and has co-edited The Psychology of Counterfactual Reasoning, Neuroscience of Decision
Making, and Improving Bayesian Reasoning: What Works and Why? Mandel is Chairman of the NATO System Analysis
and Studies Panel Research Technical Group on Assessment and Communication of Uncertainty in Intelligence to
Support Decision Making (SAS-114) and Principal Investigator of multiple Canadian government projects aimed at
improving intelligence production through the application of decision science.
ORCID
David R. Mandel http://orcid.org/0000-0003-1036-2286
Bibliography
Ahlawat, S. S. “Order Effects and Memory for Evidence in Individual versus Group Decision Making in Auditing.”
Journal of Behavioral Decision Making 12 (1999): 71–88. doi:10.1002/(ISSN)1099-0771.
Aid, M. “US HUMINT and COMINT in the Korean War: From the Approach of War to the Chinese Intervention.”
Intelligence and National Security 14, no. 4 (1999): 17–63. doi:10.1080/02684529908432570.
Arkes, H. R., and J. Kajdasz. “Intuitive Theories of Behavior.” In Intelligence Analysis: Behavioral and Social Scientific
Foundations, edited by B. Fischhoff and C. Chauvin, 143–168. Washington, DC: The National Academies Press,
2011.
Azotea, C. M. “Operational Intelligence Failures of the Korean War.” Master’s thesis, US Army Command and General
Staff College, School of Advanced Military Studies, 2014.
Baker, J. D., J. M. McKendry, and D. J. Mace. “Certitude Judgments in an Operational Environment.” Technical Research
Note 200. Arlington, VA: US Army Research Institute for Behavioral and Social Sciences, 1968.
Bang, D., and C. D. Frith. “Making Better Decisions in Groups.” Royal Society Open Science 4, no. 8 (2017): 170193.
doi:10.1098/rsos.170193.
Barnes, A. “Making Intelligence Analysis More Intelligent: Using Numeric Probabilities.” Intelligence and National
Security 31, no. 3 (2016): 327–344. doi:10.1080/02684527.2014.994955.
Besombes, J., V. Nimier, and L. Cholvy. “Information Evaluation in Fusion Using Information Correlation.” Paper
presented at the 12th International Conference on Information Fusion, Seattle, WA, 2009.
Betts, R. K. “Two Faces of Intelligence Failure: September 11 and Iraq’s Missing WMD.” Political Science Quarterly 122,
no. 4 (2008): 585–606. doi:10.1002/j.1538-165X.2007.tb00610.x.
Brun, W., and K. H. Teigen. “Verbal Probabilities: Ambiguous, Context-Dependent, or Both?” Organizational Behavior
and Human Decision Processes 41 (1988): 390–404. doi:10.1016/0749-5978(88)90036-2.
Canadian Forces School of Military Intelligence. Source Reliability and Information Credibility Matrix (V 1.2). Kingston,
Canada: DND, no date.
Capet, P., and T. Delavallade, eds. Information Evaluation. Hoboken, NJ: Wiley-ISTE, 2014.
Capet, P., and A. Revault d’Allonnes. “Information Evaluation in the Military Domain: Doctrines, Practices, and
Shortcomings.” In Information Evaluation, edited by P. Capet and T. Delavallade, 103–125. Hoboken, NJ: Wiley-
ISTE, 2014.
Carter, D. L. Law Enforcement Intelligence: A Guide for State, Local, and Tribal Law Enforcement Agencies. 2nd ed.
Washington, DC: Office of Community Oriented Policing Services, U.S. Department of Justice, 2009, 57–75, 283–
317.
Chang, W., P. Atanasov, S. Patil, B. A. Mellers, and P. E. Tetlock. “Accountability and Adaptive Performance under
Uncertainty: A Long-Term View.” Judgment and Decision Making 12, no. 6 (2017): 610–626.
Chang, W., E. Berdini, D. R. Mandel, and P. E. Tetlock. “Restructuring Structured Analytic Techniques in Intelligence.”
Intelligence and National Security 33, no. 3 (2018): 337–356. doi:10.1080/02684527.2017.1400230.
Chang, W., E. Chen, B. Mellers, and P. Tetlock. “Developing Expert Political Judgment: The Impact of Training and
Practice on Judgmental Accuracy in Geopolitical Forecasting Tournaments.” Judgment and Decision Making 11, no.
5 (2016): 509–526.
Chang, W., and P. E. Tetlock. “Rethinking the Training of Intelligence Analysts.” Intelligence and National Security 31, no.
6 (2016): 903–920. doi:10.1080/02684527.2016.1147164.
Cholvy, L. “Information Evaluation in Fusion: A Case Study.” Paper presented at the International Conference on
Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, 2004.
Cholvy, L., and V. Nimier. “Information Evaluation: Discussion About STANAG 2022 Recommendations.” In Proceedings
of the RTO IST Symposium on Military Data and Information Fusion, Prague, Czech Republic, 2003.
Department of National Defence. Canadian Forces Joint Publication CFJP 2-0, Intelligence. Ottawa, Canada: DND, 2011.
Dhami, M. K. “Towards an Evidence-Based Approach to Communicating Uncertainty in Intelligence Analysis.”
Intelligence and National Security 33, no. 2 (2018): 257–272. doi:10.1080/02684527.2017.1394252.
Dhami, M. K., D. R. Mandel, B. A. Mellers, and P. E. Tetlock. “Improving Intelligence Analysis with Decision Science.”
Perspectives on Psychological Science 10, no. 6 (2015): 753–757. doi:10.1177/1745691615598511.
Edwards, W., H. Lindman, and L. J. Savage. “Bayesian Statistical Inference for Psychological Research.” Psychological
Review 70, no. 3 (1963): 193–242. doi:10.1037/h0044139.
Finley, J. P. The Uncertain Oracle: Some Intelligence Failures Revisited. Fort Huachuca, AZ: US Army Intelligence Center
and Fort Huachuca, 1995.
Fischhoff, B., and C. Chauvin, eds. Intelligence Analysis: Behavioral and Social Scientific Foundations. Washington, DC: The
National Academies Press, 2011.
Friedman, J. A., J. D. Baker, B. A. Mellers, P. E. Tetlock, and R. Zeckhauser. “The Value of Precision in Probability
Assessment: Evidence from a Large-Scale Geopolitical Forecasting Tournament.” International Studies Quarterly 62,
no. 2 (2018): 410–422.
Friedman, J. A., J. S. Lerner, and R. Zeckhauser. “Behavioral Consequences of Probabilistic Precision: Experimental
Evidence from National Security Professionals.” International Organization 71, no. 4 (2017): 803–826. doi:10.1017/
S0020818317000352.
Friedman, J. A., and R. Zeckhauser. “Assessing Uncertainty in Intelligence.” Intelligence and National Security 27, no. 6
(2012): 824–847. doi:10.1080/02684527.2012.708275.
Friedman, J. A., and R. Zeckhauser. “Handling and Mishandling Estimative Probability: Likelihood, Confidence, and the
Search for Bin Laden.” Intelligence and National Security 30, no. 1 (2015): 77–99. doi:10.1080/02684527.2014.885202.
Hanson, J. M. “The Admiralty Code: A Cognitive Tool for Self-Directed Learning.” International Journal of Learning,
Teaching and Educational Research 14, no. 1 (2015): 97–115.
Heale, R., and D. Forbes. “Understanding Triangulation in Research.” Evidence Based Nursing 16 (2013): 98. doi:10.1136/
eb-2012-101141.
Heuer, R. J., Jr. Psychology of Intelligence Analysis. Washington, DC: Central Intelligence Agency, Center for the Study of
Intelligence, 1999.
Hulnick, A. S. “What’s Wrong with the Intelligence Cycle.” Intelligence and National Security 21, no. 6 (2006): 959–979.
doi:10.1080/02684520601046291.
Irwin, D., and D. R. Mandel. Methods for Communicating Analytic Confidence in Intelligence to Decision-Makers: An
Annotated Collection. DRDC Scientific Letter: DRDC-RDDC-2018-L020. Ottawa, Canada: DND, 2018.
Irwin, D., and D. R. Mandel. Methods for Communicating Estimative Probability in Intelligence to Decision-Makers: An
Annotated Collection. DRDC Scientific Letter: DRDC-RDDC-2018-L017. Ottawa, Canada: DND, 2018.
Irwin, D., and D. R. Mandel. Methods for Evaluating Source Reliability and Information Credibility in Intelligence and Law
Enforcement: An Annotated Collection. DRDC Scientific Letter: DRDC-RDDC-2018-L035. Ottawa, Canada: DND, 2018.
Johnson, E. M., R. C. Cavanagh, R. L. Spooner, and M. G. Samet. “Utilization of Reliability Measurements in Bayesian
Inference: Models and Human Performance.” IEEE Transactions on Reliability 22, no. 3 (1973): 176–182. doi:10.1109/
TR.1973.5215934.
Joseph, J., and J. Corkill. “Information Evaluation: How One Group of Intelligence Analysts Go about the Task.” In
Proceedings of the fourth Australian Security and Intelligence Conference, Perth, Australia, 2011.
Karvetski, C. W., K. C. Olson, D. R. Mandel, and C. R. Twardy. “Probabilistic Coherence Weighting for Optimizing Expert
Forecasts.” Decision Analysis 10, no. 4 (2013): 305–326. doi:10.1287/deca.2013.0279.
Kent, S. “Words of Estimative Probability.” In Sherman Kent and the Board of National Estimates: Collected Essays, edited
by D. P. Steury, 133–146. Washington, DC: Center for the Study of Intelligence, 1994.
Kerbel, J. Are the Analytic Tradecraft Standards Hurting as much as Helping? Research Short. Bethesda, MD: National
Intelligence University, 2017.
Laughlin, P. R. Group Problem Solving. Princeton, NJ: Princeton University Press, 2011.
Laughlin, P. R., E. C. Hatch, J. S. Silver, and L. Boh. “Groups Perform Better than the Best Individuals on Letters-to-
Numbers Problems: Effects of Group Size.” Journal of Personality and Social Psychology 90, no. 4 (2006): 644–651.
doi:10.1037/0022-3514.90.4.644.
Lemercier, P. “The Fundamentals of Intelligence.” In Information Evaluation, edited by P. Capet and T. Delavallade, 55–
100. Hoboken, NJ: Wiley-ISTE, 2014.
Lerner, J. S., and P. E. Tetlock. “Accounting for the Effects of Accountability.” Psychological Bulletin 125, no. 2 (1999):
255–275.
Lesot, M., F. Pichon, and T. Delavallade. “Quantitative Information Evaluation: Modeling and Experimental Evaluation.”
In Information Evaluation, edited by P. Capet and T. Delavallade, 187–228. Hoboken, NJ: Wiley-ISTE, 2014.
Levine, J. M., and M. G. Samet. “Information Seeking with Multiple Sources of Conflicting and Unreliable Information.”
Human Factors 15, no. 4 (1973): 407–419. doi:10.1177/001872087301500412.
Mandel, D. R. “Are Risk Assessments of a Terrorist Attack Coherent?” Journal of Experimental Psychology: Applied 11, no.
4 (2005): 277–288. doi:10.1037/1076-898X.11.4.277.
Mandel, D. R. “Violations of Coherence in Subjective Probability: A Representational and Assessment Processes
Account.” Cognition 106, no. 1 (2008): 130–156. doi:10.1016/j.cognition.2007.01.001.
Mandel, D. R. “Accuracy of Intelligence Forecasts from the Intelligence Consumer’s Perspective.” Policy Insights from
the Behavioral and Brain Sciences 2 (2015): 111–120. doi:10.1177/2372732215602907.
Mandel, D. R. “Instruction in Information Structuring Improves Bayesian Judgment in Intelligence Analysts.” Frontiers in
Psychology 6 (2015): 387. doi:10.3389/fpsyg.2015.00387.
Mandel, D. R. “Proceedings of SAS-114 Workshop on Communicating Uncertainty, Assessing Information Quality and
Risk, and Using Structured Techniques in Intelligence Analysis.” In NATO Meeting Proceedings [Pub. Ref. STO-MP-
SAS-114-AC/323(SAS-114)TP/780]. Brussels, Belgium: NATO STO, 2018. doi: 10.14339/STO-MP-SAS-114.
Mandel, D. R. “Can Decision Science Improve Intelligence Analysis?” In Researching National Security Intelligence: A
Reader, edited by S. Coulthart, M. Landon-Murray, D. Van Puyvelde. Washington, DC: Georgetown University Press,
in press.
Mandel, D. R., and A. Barnes. “Accuracy of Forecasts in Strategic Intelligence.” Proceedings of the National Academy of
Sciences 111, no. 30 (2014): 10984–10989. doi:10.1073/pnas.1406138111.
Mandel, D. R., and A. Barnes. “Geopolitical Forecasting Skill in Strategic Intelligence.” Journal of Behavioral Decision
Making 31, no. 1 (2018): 127–137. doi:10.1002/bdm.v31.1.
Mandel, D. R., C. W. Karvetski, and M. K. Dhami. “Boosting Intelligence Analysts’ Judgment Accuracy: What Works?
What Fails?” Judgment and Decision Making 13, no. 6 (2018): 607–621.
Marchio, J. “‘Analytic Tradecraft and the Intelligence Community: Enduring Value, Intermittent Emphasis.” Intelligence
and National Security 29, no. 2 (2014): 159–183. doi:10.1080/02684527.2012.746415.
McDowell, D. Strategic Intelligence: A Handbook for Practitioners, Managers, and Users. Lanham, MD: Scarecrow Press,
2009.
McNaught, K., and P. Sutovsky. “Representing Variable Source Credibility in Intelligence Analysis with Bayesian
Networks.” In Proceedings of the fifth Australian Security and Intelligence Conference, Perth, Australia, 2012.
Mellers, B., E. Stone, P. Atanasov, N. Rohrbaugh, S. E. Metz, L. Ungar, M. M. Bishop, M. Horowitz, E. Merkle, and P.
Tetlock. “The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics.” Journal of
Experimental Psychology: Applied 21, no. 1 (2015): 1–14. doi:10.1037/xap0000040.
Mellers, B., L. Ungar, J. Baron, J. Ramos, B. Gurcay, K. Fincher, S. E. Scott, et al. “Psychological Strategies for Winning a
Geopolitical Forecasting Tournament.” Psychological Science 25, no. 5 (2014): 1106–1115. doi:10.1177/
0956797614524255.
Mellers, B. A., J. D. Baker, E. Chen, D. R. Mandel, and P. E. Tetlock. “How Generalizable Is Good Judgement? A Multi-
Task, Multi-Benchmark Study.” Judgment and Decision Making 12, no. 4 (2017): 369–381.
Miron, M. S., S. M. Patten, and S. M. Halpin. “The Structure of Combat Intelligence Ratings.” Technical Paper 286.
Arlington, VA: US Army Research Institute for Behavioral and Social Sciences, 1978.
Murphy, A. H., S. Lichtenstein, B. Fischhoff, and R. L. Winkler. “Misinterpretation of Precipitation Probability Forecasts.”
Bulletin of the American Meteorological Society 61 (1980): 695–701. doi:10.1175/1520-0477(1980)061<0695:
MOPPF>2.0.CO;2.
NATO Standardization Office. STANAG 2511 – Intelligence Reports. 1st ed. Brussels, Belgium, 2003.
NATO Standardization Office. AJP-2.1, Edition B, Version 1: Allied Joint Doctrine for Intelligence Procedures. Brussels,
Belgium, 2016.
Nelson, J. D. “Finding Useful Questions: On Bayesian Diagnosticity, Probability, Impact, and Information Gain.”
Psychological Review 112, no. 4 (2005): 979–999. doi:10.1037/0033-295X.112.4.979.
Nickerson, R. S. “Confirmation Bias: A Ubiquitous Phenomenon in Many Guises.” Review of General Psychology 2, no. 2
(1998): 175–220. doi:10.1037/1089-2680.2.2.175.
Nickerson, R. S., and C. E. Feehrer. “Decision Making and Training: A Review of Theoretical and Empirical Studies of
Decision Making and Their Implications for the Training of Decision Makers.” Technical Report: NAVTRAEQUIPCEN 73-
C-0128-1. Cambridge, MA: Bolt, Beranek and Newman, Inc., 1975.
Noble, G. P., Jr. “Diagnosing Distortion in Source Reporting: Lessons for HUMINT Reliability from Other Fields.” Master’s
thesis, Mercyhurst College, 2009.
Norman, D. “How to Identify Credible Sources on the Web.” Master’s thesis, Joint Military Intelligence College, 2001.
Oskamp, S. “Overconfidence in Case-Study Judgements.” Journal of Consulting Psychology 29, no. 3 (1965): 261–265.
doi:10.1037/h0022125.
Pechan, B. L. “The Collector’s Role in Evaluation.” In Inside CIA’s Private World: Declassified Articles from the Agency’s
Internal Journal, 1955-1992, edited by H. B. Westerfield, 99–107. New Haven, CT: Yale University Press, 1995.
Piercey, M. D. “Motivated Reasoning and Verbal vs. Numerical Probability Assessment: Evidence from an Accounting
Context.” Organizational Behavior and Human Decision Processes 108, no. 2 (2009): 330–341. doi:10.1016/j.
obhdp.2008.05.004.
Pool, R. Field Evaluation in the Intelligence and Counterintelligence Context: Workshop Summary. Washington, DC: The
National Academies Press, 2010.
Rieber, S. “Intelligence Analysis and Judgemental Calibration.” International Journal of Intelligence and
CounterIntelligence 17, no. 1 (2004): 97–112. doi:10.1080/08850600490273431.
Rieber, S., and N. Thomason. “Creation of a National Institute for Analytic Methods.” Studies in Intelligence 49, no. 4
(2005): 71–78.
Rogova, G., M. Hadzagic, M. St-Hillaire, M. Florea, and P. Valin. “Context-Based Information Quality for Sequential
Decision Making.” Paper presented at the 2013 IEEE International Multi-Disciplinary Conference on Cognitive
Methods in Situation Awareness and Decision Support (CogSIMA), San Diego, CA, 2013.
Rogova, G., and P. Scott, eds. Fusion Methodologies in Crisis Management: Higher Level Fusion and Decision Making.
Cham, Switzerland: Springer International Publishing, 2016.
Rogova, G. L. “Information Quality in Information Fusion and Decision Making with Applications to Crisis
Management.” In Fusion Methodologies in Crisis Management, Higher Level Fusion and Decision Making, edited by
G. L. Rogova and P. Scott, 65–86. Cham, Switzerland: Springer International Publishing, 2016.
Samet, M. G. “Quantitative Interpretation of Two Qualitative Scales Used to Rate Military Intelligence.” Human Factors
17, no. 2 (1975): 192–202. doi:10.1177/001872087501700210.
Samet, M. G. “Subjective Interpretation of Reliability and Accuracy Scales for Evaluating Military Intelligence.” Technical
Paper 260. Arlington, VA: US Army Research Institute for Behavioral and Social Sciences, 1975.
Savage, L. J. The Foundations of Statistics. New York, NY: Wiley, 1954.
Scholten, L., D. van Knippenberg, B. A. Nijstad, and C. K. W. De Dreu. “Motivated Information Processing and Group
Decision-Making: Effects of Process Accountability on Information Processing and Decision Quality.” Journal of
Experimental Social Psychology 43 (2007): 539–552. doi:10.1016/j.jesp.2006.05.010.
Schum, D. A. Evidence and Inference for the Intelligence Analyst. Lanham, MD: University Press of America, 1987.
Schum, D. A., and J. R. Morris. “Assessing the Competence and Credibility of Human Sources of Intelligence Evidence:
Contributions from Law and Probability.” Law, Probability and Risk 6 (2007): 247–274. doi:10.1093/lpr/mgm025.
Sedlmeier, P., and G. Gigerenzer. “Teaching Bayesian Reasoning in Less than Two Hours.” Journal of Experimental
Psychology: General 130, no. 3 (2001): 380–400.
Siegel-Jacobs, K., and J. F. Yates. “Effects of Procedural and Outcome Accountability on Judgment Quality.”
Organizational Behavior and Human Decision Processes 65, no. 1 (1996): 1–17. doi:10.1006/obhd.1996.0001.
Tecuci, G., M. Boicu, D. Schum, and D. Marcu. “Coping with the Complexity of Intelligence Analysis: Cognitive
Assistants for Evidence-Based Reasoning.” Research Report #7, Learning Agents Center. Fairfax, VA: George Mason
University, 2010.
Teigen, K. H., and W. Brun. “The Directionality of Verbal Probability Expressions: Effects on Decisions, Predictions, and
Probabilistic Reasoning.” Organizational Behavior and Human Decision Processes 80, no. 2 (1999): 155–190.
doi:10.1006/obhd.1999.2857.
Tetlock, P. E., and D. Gardner. Superforecasting: The Art and Science of Prediction. New York, NY: Crown Publishing
Group, 2015.
Tetlock, P. E., and B. A. Mellers. “Intelligent Management of Intelligence Agencies: Beyond Accountability Ping-Pong.”
American Psychologist 66, no. 6 (2011): 542–554. doi:10.1037/a0024285.
Tsai, C. I., J. Klayman, and R. Hastie. “Effects of Amount of Information on Judgment Accuracy and Confidence.”
Organizational Behavior and Human Decision Processes 107 (2008): 97–105. doi:10.1016/j.obhdp.2008.01.005.
Tubbs, R. M., G. J. Gaeth, I. P. Levin, and L. A. Van Osdol. “Order Effects in Belief Updating with Consistent and
Inconsistent Evidence.” Journal of Behavioral Decision Making 6 (1993): 257–269. doi:10.1002/(ISSN)1099-0771.
Tversky, A., and D. J. Koehler. “Support Theory: A Nonextensional Representation of Subjective Probability.”
Psychological Review 101, no. 4 (1994): 547–567. doi:10.1037/0033-295X.101.4.547.
United Kingdom Ministry of Defence. Joint Doctrine Publication JDP 2-00, Understanding and Intelligence Support to
Joint Operations. 3rd ed. Swindon, United Kingdom: MOD, 2011.
United Nations Office on Drugs and Crime. Criminal Intelligence Manual for Analysts. Vienna, Austria: UNODC, 2011.
United States Department of the Army. Field Manual FM 30-5, Combat Intelligence. Washington, DC: DOD, 1951.
United States Department of the Army. Field Manual FM 2-22.3, Human Intelligence Collector Operations. Washington,
DC: DOD, 2006.
United States Department of the Army. Document and Media Exploitation Tactics, Techniques, and Procedures ATTP 2-
91.5 (Final Draft). Washington, DC: DOD, 2010.
United States Department of the Army. Training Circular TC 2-91.8, Document and Media Exploitation. Washington, DC:
DOD, 2010.
United States Department of the Army. Army Techniques Publication ATP 2-22.9, Open-Source Intelligence. Washington,
DC: DOD, 2012.
United States Department of the Army. Army Techniques Publication ATP 3-39.20, Police Intelligence Operations.
Washington, DC: DOD, 2012.
Villejoubert, G., and D. R. Mandel. “The Inverse Fallacy: An Account of Deviations from Bayes’s Theorem and the
Additivity Principle.” Memory & Cognition 30, no. 2 (2002): 171–178. doi:10.3758/BF03195278.
Wallsten, T. S., D. V. Budescu, R. Zwick, and S. M. Kemp. “Preferences and Reasons for Communicating Probabilistic
Information in Verbal or Numerical Terms.” Bulletin of the Psychonomic Society 31 (1993): 135–138. doi:10.3758/
BF03334162.
Wang, G., S. R. Kulkarni, H. V. Poor, and D. N. Osherson. “Aggregating Large Sets of Probabilistic Forecasts by Weighted
Coherent Adjustment.” Decision Analysis 8 (2011): 128–144. doi:10.1287/deca.1110.0206.
Wheaton, K. “Re-Imagining the Intelligence Process or ‘Let’s Kill the Intelligence Cycle’.” 2012. http://files.isanet.org/
ConferenceArchive/1b1feb5339bf4573a9545306f734eedb.pdf
Wong, S., and R. Jassemi-Zargani. “Predicting Image Quality of Surveillance Sensors.” DRDC Scientific Report: DRDC-
RDDC-2014-R97. Ottawa, Canada: DND, 2014.
Yates, J. F., P. C. Price, J. Lee, and J. Ramirez. “Good Probabilistic Forecasters: The ‘Consumer’s’ Perspective.”
International Journal of Forecasting 12, no. 1 (1996): 41–56. doi:10.1016/0169-2070(95)00636-2.