
Information Fusion for Intelligence Analysis



Kari Chopra
Aptima, Inc.
kchopra@aptima.com

Craig Haimson
Aptima, Inc.
haimson@aptima.com


Abstract


Ensuring the accuracy of intelligence assessments is
made difficult by the pervasiveness of uncertainty in
intelligence information and the demand to fuse
information from multiple sources. This paper describes
Infusiun, a model-based software tool for information
fusion and uncertainty assessment in intelligence
analysis.

1. Introduction

Intelligence analysts overall do an exceptional job of
protecting the interests of the American people and their
country. Yet they face increasingly complex problem
sets, and the volume of available data has grown
dramatically over the past 20 years. As analysts struggle
to remain on top of emerging events, they have adopted
survival techniques that help them sift through large
amounts of data, produce accurate estimates and
predictions, and report information to their
customers [1].
Ensuring the accuracy of intelligence assessments is
made difficult by the pervasiveness of uncertainty in
intelligence information and the demand to fuse
information from multiple sources. As a result, there is
growing interest within the Intelligence Community in
tools and techniques to assist the analyst in coping with
the volume and complexity of information.
The goal of the present work is to develop the
Information Fusion and Uncertainty Analysis (Infusiun)
software, a model-based tool that assists the intelligence
analyst in aggregating information from multiple
sources, identifying critical sources of uncertainty, and
determining the impact of that uncertainty on the
analyst's ability to assess potential threats in the
environment.

This research was supported by the Office of Naval
Research, Contract N00014-04-M-0055.

The structure of the paper is as follows. The first
section presents a brief description of the practice of
intelligence analysis, based on interviews with subject
matter experts and a review of the background literature
[1][2][4][5]. We then describe a simple mathematical
model for assessing the validity of conclusions drawn
from potentially unreliable evidence. The final section
presents a brief discussion of future work.

2. Domain

Intelligence production begins with the definition of an
intelligence problem by a customer/consumer.
Intelligence requests vary along several different
dimensions, depending chiefly on the consumer's
intended use of the supplied intelligence. Intelligence
requests may be of a technical, tactical, operational, or
strategic nature, covering issues of military, civil, or
geographic importance. In addition to consumer-driven
requests, agencies are also tasked with standing
requirements, such as uncovering threats to American
citizens or facilities (i.e., investigating any evidence of
such threats, should it become available). Once provided
with tasking, an analyst determines the availability and
quality of relevant data sources, identifies information
gaps, and generates requirements for data collection that
may be satisfied through various methods.
Once collection requirements have been satisfied, an
analyst sifts through accumulated data in order to
assemble a set of facts for analysis. The analyst distills
these facts from data that have been selected on the basis
of their perceived value. In part, the analyst appraises the
basic content of the information, judging such qualities
as clarity, relevance to the intelligence request, and
significance or diagnosticity (i.e., the extent to which the
information unambiguously supports a potential
analytical conclusion). Corroboration is highly valued,
whether it reflects redundancy (two sources provide the
same piece of evidence) or consistency (two sources
provide two different pieces of evidence, both of which
support the same theory or analytical conclusion).
0-7695-2268-8/05/$20.00 (C) 2005 IEEE
Proceedings of the 38th Hawaii International Conference on System Sciences - 2005
Additional factors that influence an analyst's
estimation of information likelihood relate to the
perceived reliability and credibility of the information's
source. Source valuation is partially based on the
valuation of the information that the source provides; if
any portion of that information is deemed to be unlikely
or inaccurate, the rest is typically rejected, as well.
However, other more general factors are considered,
such as the overall reporting history of the source. Thus,
perceived source reliability builds over time, throughout
the course of the analyst's own experience and the
accumulated experience of the intelligence agency for
which the analyst works. A source with a history of
demonstrated accuracy and credibility will be valuated
higher than an unknown or untrusted source.
Analysts develop conclusions based on the strength of
the evidence they select for synthesis and analysis. Such
conclusions depend heavily upon the effective use of
situational logic (trying to develop consistent mental
models and stories to explain data) and abduction
(determining the most-likely/best-supported story based
on strength/preponderance of evidence). Analysts may
document the products of their ongoing analysis using
aids such as notecards, databases, decision trees,
networks of links between individuals and organizations,
etc., and a number of computerized tools exist to support
these activities.
Tools for weighting source reliability, recording
associations between individuals in different networks,
linking facts to conclusions via structured argumentation
formats, etc., have been developed previously. However,
such tools have generally been poorly received by the
intelligence community for at least two reasons:
1. They often impose too rigid a structure upon the
format in which intelligence requests, data, and
conclusions must be represented. The
expectation that all questions can be answered
in terms of yes/no or likely/not likely responses
is unfounded for many intelligence problems
that require highly nuanced solutions (e.g.,
conclusions predicated on different sets of
contingencies).
2. They often succumb to the "fallacy of
precision" by requiring and producing exact
numeric quantities for inherently imprecise
phenomena. Analysts are extremely wary of
providing numeric estimates (e.g., probability
estimates) because there is virtually never
sufficient information available to generate a
reasonable estimate.
3. They often remove the process of analytical
judgment from the analyst, requiring the analyst
to input assessments of information and then
automatically inferring conclusions that may or
may not coincide with the analyst's own
intuition. Such dynamically-generated
conclusions may change with subtle alterations
in underlying parameters for reasons that are not
intuitive or psychologically salient to the
analyst. If analysts are required to demonstrate
that their conclusions correspond with those of
the tool, they may be tempted to alter inputs to
the tool post hoc in order to force it to generate
desired outputs.
To avoid these pitfalls, the Infusiun model possesses
a simple and flexible structure that capitalizes on the
subjective judgment of the analyst. These judgments are
elicited from the analyst through qualitative scales that
are intuitive and aligned with accepted practices, such as
rating the reporting history of a source. The focus
of the software is less on computing a precise numerical
outcome and more on providing a qualitative guide that
enables the analyst to quickly evaluate and compare
alternate hypotheses and identify gaps and conflicts that
may necessitate further collection.

3. Model

The process for information fusion and analysis in
Infusiun consists of two basic activities:
- Collection: the analyst extracts facts from
intelligence documents or sources;
- Inference: the analyst evaluates the facts to
determine the existence of threats in the
environment.
The mathematical model described in this section is thus
structured along similar lines. Concepts for calculations
within the model were derived from external work on
fuzzy aggregation operations [2] and the propagation of
heuristic knowledge in belief models [6].

3.1 Collection

The basic elements of the model relevant to
information collection are:
- A set of facts F = {f_1, f_2, ..., f_I}
- A set of excerpts E = {e_1, e_2, ..., e_J}
- A set of sources S = {s_1, s_2, ..., s_K}. Sources
may be either people (individuals or
organizations) or documents.
The elements defined above are also linked to each
other through a set of relationships. These relationships
are depicted in Figure 1 and described in detail below.
- Each excerpt is extracted from a single source.
Let s(e_k) ∈ S denote the source that contains or
articulates the excerpt e_k. Then s(e_k) is referred
to as the informant for the excerpt e_k.
- Each excerpt is mapped to a single fact. Let f(e_k)
denote the fact that represents the excerpt e_k.
Then f(e_k) is referred to as the meaning of the
excerpt e_k.
- A fact may be associated with one or more
excerpts. Let E(f_i) denote the set of excerpts
associated with the fact f_i. Then E(f_i) is referred
to as the basis for the fact f_i.
- Given the basis E(f_i) of a fact f_i, we can derive
the set of sources providing support for the fact
f_i. Let S(f_i) denote the set of informants for the
excerpts in the basis of fact f_i, i.e.,
S(f_i) = {s(e_k) | e_k ∈ E(f_i)}. Then S(f_i) is referred
to as the informant set for the fact f_i.

For the sake of convenience, the notation above may
also be written with subscripts in place of parentheses;
for example, the notation d_{e_k} is equivalent to d(e_k).
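As an illustration only (the paper does not specify an implementation), the collection-model elements and the derived informant set S(f) can be sketched in Python; the names Excerpt, basis, and informant_set are our own:

```python
# Sketch of the collection model: each excerpt has exactly one informant
# (its source) and one meaning (its fact); the informant set S(f) is
# derived from the basis E(f). Illustrative, not the authors' code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Excerpt:
    source: str  # the informant s(e)
    fact: str    # the meaning f(e)

def basis(fact, excerpts):
    """E(f): the excerpts whose meaning is the given fact."""
    return [e for e in excerpts if e.fact == fact]

def informant_set(fact, excerpts):
    """S(f): the set of informants for the excerpts in the basis of f."""
    return {e.source for e in basis(fact, excerpts)}

# Hypothetical data: two sources both report fact f1; s2 also reports f2.
excerpts = [Excerpt("s1", "f1"), Excerpt("s2", "f1"), Excerpt("s2", "f2")]
```

Note that a fact reported twice by the same source still contributes only one informant, since S(f) is a set.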

3.1.1. Reliability. Each source has a reliability rating
associated with it. Let rel(s_k) denote the reliability of the
source s_k, and let Rel represent the range of values that
rel can assume. This reliability measure obeys the
following conventions:
- The range of values is finite, i.e., |Rel| < +∞.
- The reliability value rel is an ordinal measure
rather than an interval measure. Thus the set Rel
is strictly ordered, and we may assume without
loss of generality that the members of Rel may
be labeled as {Rel_1, Rel_2, ..., Rel_R} where
Rel_1 < Rel_2 < ... < Rel_R.
- A source s_k may have a null reliability value,
also denoted Rel_0, indicating that the reliability
is unknown or unassigned.

For the current domain, the range of applicable reliability
values is displayed in Table 1.
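The ordinal reliability scale of Table 1 can be sketched as a Python IntEnum, which supplies the strict ordering Rel_0 < Rel_1 < ... < Rel_3 without implying that the gaps between levels are meaningful (the measure is ordinal, not interval). This encoding is our illustration, not part of the model:

```python
# Ordinal reliability scale per Table 1. IntEnum gives a total order on
# the levels; the numeric values are ranks only, not interval quantities.
from enum import IntEnum

class Rel(IntEnum):
    UNKNOWN = 0     # Rel_0: no prior history or knowledge of source
    UNRELIABLE = 1  # Rel_1: source is believed to be unreliable
    POTENTIAL = 2   # Rel_2: some evidence of reliable reporting
    RELIABLE = 3    # Rel_3: source is believed to be reliable
```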

3.1.2. Reliability profile. Define the reliability profile
of a fact f_i, denoted Rel(f_i), as the vector of the number of
sources supporting the fact at each level of reliability.
That is,

Rel(f_i) = [Rel_0(f_i), Rel_1(f_i), ..., Rel_R(f_i)]
Figure 1. Relationships between facts, excerpts, and sources.

where Rel_h(f_i) is the number of informants for the fact f_i
whose corresponding reliability is Rel_h, i.e.,

Rel_h(f_i) = |{s ∈ S(f_i) : rel(s) = Rel_h}|, h = 0, ..., R.

An alternate notation for the reliability profile that
explicitly links each entry to the corresponding reliability
value is:

Rel(f_i) = [Rel_0^(Rel_0(f_i)), Rel_1^(Rel_1(f_i)), ..., Rel_R^(Rel_R(f_i))].

For example, consider the case depicted in Figure 1.
Suppose that the sources in the informant set have the
following reliability ratings:

Source | Reliability
s_1 | Unknown
s_2 | Potential
s_3 | Reliable
s_4 | Reliable

The reliability profile of fact f_i is computed as:

Reliability | No. of Sources
Unknown | 1
Unreliable | 0
Potential | 1
Reliable | 2

Then the following expressions for the reliability profile
are equivalent:

Rel(f_i) = [1, 0, 1, 2];

Rel(f_i) = [Unknown^1, Unreliable^0, Potential^1, Reliable^2].
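Continuing the worked example, the profile computation can be sketched in a few lines (our own illustration, reusing the IntEnum encoding of the Table 1 scale):

```python
# Rel(f): count the informants of a fact at each reliability level.
# Reproduces the example profile [1, 0, 1, 2] above.
from enum import IntEnum

class Rel(IntEnum):
    UNKNOWN = 0
    UNRELIABLE = 1
    POTENTIAL = 2
    RELIABLE = 3

def reliability_profile(informant_reliabilities):
    """Rel(f) as a vector indexed by reliability level 0..R."""
    profile = [0] * len(Rel)
    for r in informant_reliabilities:
        profile[r] += 1
    return profile

# Ratings of s_1..s_4 from the example in the text.
ratings = [Rel.UNKNOWN, Rel.POTENTIAL, Rel.RELIABLE, Rel.RELIABLE]
```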

3.1.3. Corroboration. Each fact f has a measure of
corroboration associated with it, denoted corr(f). The
corroboration represents the extent to which the fact is
supported by reliable sources. Thus, the measure of
corroboration of fact f is computed on the basis of the
reliability profile Rel(f).
Determining the level of corroboration of a fact is
guided by the following comparative heuristics, elicited
from subject matter experts during the Operational
Analysis:
- It is better to have corroboration from multiple
sources than from a single source.
- It is better to have corroboration from a source
with a higher level of reliability than from a
source with a lower level of reliability.
- Information from unknown and unreliable
sources is treated with equal skepticism.

These heuristics lead to a categorization of facts based
on the number and degree of reliability of the sources in
a fact's informant set. Thus we define corr(f_i) to be a
categorical variable with the range of values displayed in
Table 2. Moreover, the heuristics induce a partial
ordering on corr(f_i), as shown in Figure 2.

3.2 Inference

The basic elements related to inference are the
following:
- A set of questions Q = {q_1, q_2, ..., q_X}
- A set of mutually exclusive outcomes for each
question q:

O(q) = O_q = {o_q1, o_q2, ..., o_qY_q}, q ∈ Q.

Alternately, the set of outcomes for a question
q = q_x may be denoted by:

O_x = {o_x1, o_x2, ..., o_xY_x}.

3.2.1. Implication. Evidence regarding the true outcome
of a question q is assembled by analyzing the
implications of the available facts. This is expressed by
measuring the implication with which a fact f ∈ F
supports a particular outcome o ∈ O_q. The implication of
the fact f on the outcome o represents the degree to
Table 1. Range of values for source reliability.

Value | Label | Description
Rel_0 | Unknown | No prior history or knowledge of source
Rel_1 | Unreliable | Source is believed to be unreliable
Rel_2 | Potential | Prior history shows some evidence of reliable reporting, but not enough to designate source as Reliable
Rel_3 | Reliable | Source is believed to be reliable
which f (if true) implies that the true answer to q is o, and
not one of the other outcomes (O_q \ o).
Let imp(f, o) denote the strength of implication of the
fact f on the outcome o, and let
Imp = {Imp_1, Imp_2, ..., Imp_P} denote the range of values
of the strength measure. As with the measure of source
reliability, this measure is assumed to be ordinal but not
interval in nature. A null or zero value indicates that the
fact f implies that the outcome o is not the true answer to
the question q. For current purposes, the range of
implication values is simply defined to be:

Imp = {0, Low, Med, High},

where

0 < Low < Med < High.

A fact may be associated with multiple outcomes (e.g.,
the fact f implies that the true answer to q is either o_1 or
o_2). Similarly, an outcome may be associated with
multiple facts. Define the evidence F(o) as the set of
facts that imply the outcome o, i.e.,

F(o) = {f ∈ F | imp(f, o) > 0}.

Similarly, define the consequences O(f) as the set of
outcomes that are implied by the fact f, i.e.,

O(f) = {o ∈ O | imp(f, o) > 0}.
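A brief sketch of how the evidence F(o) and consequences O(f) can be derived from a table of implication strengths; the fact names, outcome names, and strength values below are hypothetical:

```python
# Implication strengths imp(f, o) recorded as a sparse mapping; absent
# pairs have zero implication. Illustrative data, not from the paper.
IMP = {("f1", "o1"): "High", ("f1", "o2"): "Low", ("f2", "o1"): "Med"}

def evidence(outcome):
    """F(o): the facts with nonzero implication on the outcome o."""
    return {f for (f, o) in IMP if o == outcome}

def consequences(fact):
    """O(f): the outcomes with nonzero implication from the fact f."""
    return {o for (f, o) in IMP if f == fact}
```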

3.2.2. Support. The total amount of support that a fact f
provides to an outcome o is determined as the product of
the corroboration of f and the implication of f on o. Let
spt(f, o) denote the level of support that fact f provides
for the outcome o, and let Spt = {Spt_1, Spt_2, ..., Spt_S}
represent the range of permissible values.
Given that both corr and imp are defined as categorical
measures, we define spt to be categorical as well.
Essentially, each value of spt(f, o) is defined as the 2-
tuple of the corroboration and implication, i.e.,
(corr_f, imp_fo). Thus the measure of support is calculated
as follows:

spt: (F × O) → Spt

where:

spt(f, o) = 0 if imp_fo = 0; (corr_f, imp_fo) otherwise.

Moreover, the orderings of Corr and Imp induce a
partial ordering on Spt as well, where:

spt(f_1, o_1) > spt(f_2, o_2)
iff
corr(f_1) > corr(f_2) ∧ imp(f_1, o_1) > imp(f_2, o_2).
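The support measure and its induced partial order can be sketched as follows. Note one simplification: the paper leaves corr only partially ordered (Figure 2), so the integer rank map used for corroboration below is an assumption made for illustration, as are all the names:

```python
# spt(f, o) as the 2-tuple (corr_f, imp_fo), zero when imp is zero, plus
# the induced partial order: one support value strictly exceeds another
# only when BOTH components strictly exceed.
IMP_RANK = {"0": 0, "Low": 1, "Med": 2, "High": 3}

def spt(corr_f, imp_fo):
    """Support of fact f for outcome o: 0 if no implication, else a pair."""
    return 0 if imp_fo == "0" else (corr_f, imp_fo)

def spt_greater(s1, s2, corr_rank):
    """spt(f1, o1) > spt(f2, o2) iff corr and imp both strictly exceed."""
    (c1, i1), (c2, i2) = s1, s2
    return corr_rank[c1] > corr_rank[c2] and IMP_RANK[i1] > IMP_RANK[i2]

# Hypothetical rank map over two of the Table 2 corroboration categories.
CORR_RANK = {"Single reliable source": 1, "Multiple reliable sources": 2}
```

When the two conditions disagree (e.g., higher corroboration but lower implication), neither support value dominates, which is exactly what makes the ordering partial.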
Table 2. Categorical values for the level of corroboration of a fact.

Category | Description | Constraints
Uncorroborated | The fact f_i does not have any supporting excerpts obtained from sources | |E(f_i)| = 0
Single unknown or unreliable source | The fact f_i is corroborated by a single source that is either unknown or unreliable | |S(f_i)| = 1; Rel_Unreliable(f_i) + Rel_Unknown(f_i) = 1
Single potential source | The fact f_i is corroborated by a single source with potential reliability | |S(f_i)| = 1; Rel_Potential(f_i) = 1
Single reliable source | The fact f_i is corroborated by a single source with a history of reliability | |S(f_i)| = 1; Rel_Reliable(f_i) = 1
Multiple reliable sources | The fact f_i is corroborated by multiple sources, at least one of which has a history of reliability | |S(f_i)| > 1; Rel_Reliable(f_i) ≥ 1
Multiple potential sources | The fact f_i is corroborated by multiple sources of potential reliability at best | |S(f_i)| > 1; Rel_Reliable(f_i) = 0; Rel_Potential(f_i) ≥ 1
Multiple unknown or unreliable sources | The fact f_i is corroborated by multiple sources, all of whose reliability is either unknown or known to be unreliable | |S(f_i)| > 1; Rel_Reliable(f_i) = 0; Rel_Potential(f_i) = 0
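The Table 2 categorization can be sketched as a function of the reliability profile; the profile indexing [Unknown, Unreliable, Potential, Reliable] follows Table 1, and the function itself is our own illustration of the constraints:

```python
# corr(f) from the reliability profile Rel(f), following the Table 2
# constraints. The informant count |S(f)| is the sum of the profile.
def corroboration(profile):
    unknown, unreliable, potential, reliable = profile
    n = sum(profile)  # |S(f)|
    if n == 0:
        return "Uncorroborated"
    if n == 1:
        if reliable == 1:
            return "Single reliable source"
        if potential == 1:
            return "Single potential source"
        return "Single unknown or unreliable source"
    # Multiple sources: category set by the best reliability present.
    if reliable >= 1:
        return "Multiple reliable sources"
    if potential >= 1:
        return "Multiple potential sources"
    return "Multiple unknown or unreliable sources"
```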
3.2.3. Validity. The validity of an outcome o reflects the
degree to which it is supported by the total evidence
available. In essence, validity sums up the support
provided for the outcome o across all the supporting facts
in evidence, F(o). The definition of the validity measure
val(o) simply counts the number of occurrences of each
potential value of its parent measure. Thus we have:

Val(o) = [val_1(o), val_2(o), ..., val_S(o)]

where:

val_s(o) = |{f ∈ F : spt(f, o) = Spt_s}|.

Similar to the notation for corroboration, the validity
vector may be written as a superscripted list of the
support values, i.e.,

Val(o) = [Spt_1^(val_1(o)), Spt_2^(val_2(o)), ..., Spt_S^(val_S(o))].

Alternately, the validity measure may be denoted using
the subscripts of the corresponding corroboration and
implication, i.e.:

val_cp(o) = val_s(o),

where:

Spt_s = (Corr_c, Imp_p).
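The validity vector is a per-level count of supporting facts, which can be sketched as follows; the support levels and the spt mapping are represented as plain labels and a dict, an illustrative encoding of our own:

```python
# Val(o): for each support level Spt_s, count the facts whose support
# for the outcome o equals that level.
from collections import Counter

def validity(outcome, spt_map, levels):
    """Val(o) = [val_1(o), ..., val_S(o)] over the given support levels."""
    counts = Counter(v for (f, o), v in spt_map.items() if o == outcome)
    return [counts[level] for level in levels]

# Hypothetical support assignments for two outcomes of one question.
spt_map = {("f1", "o1"): "SptA", ("f2", "o1"): "SptA",
           ("f3", "o1"): "SptB", ("f1", "o2"): "SptB"}
```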

3.2.4. Dominance. Given a question q and two distinct
outcomes o_1 and o_2, we can compare the validity of both
outcomes to determine which is more likely to be true,
given the available evidence. We use the term dominance
to describe the relative validity between two outcomes,
and define a dominance rule (denoted ≻) as a
comparison operation that induces a partial ordering on
the space of possible validity vectors. In other words, we
say that outcome o_1 dominates outcome o_2 if o_2 ≺ o_1
under a chosen dominance rule.
The rule of strong dominance requires that outcome o_1
has at least as much support as o_2 at all levels of support
and exceeds it for at least one level. In other words, the
strong dominance rule requires that the following two
conditions are satisfied:

val_s(o_1) ≥ val_s(o_2) ∀ s = 1, ..., S;

∃ s ∈ [1, S] : val_s(o_1) > val_s(o_2).
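The strong dominance test reduces to an elementwise comparison of validity vectors, which can be sketched as:

```python
# Strong dominance: o1 dominates o2 if its validity vector is at least as
# large at every support level and strictly larger at some level.
def strongly_dominates(val1, val2):
    return (all(a >= b for a, b in zip(val1, val2))
            and any(a > b for a, b in zip(val1, val2)))
```

Equal vectors do not dominate each other, and vectors that trade off levels against each other (one larger at some level, smaller at another) are incomparable, so the relation is a strict partial order.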

The rule of weak dominance relaxes this requirement
so that outcome o_1 need only exceed the support for o_2 at
the highest levels, i.e.:

val_s(o_1) ≥ val_s(o_2) ∨ [∃ s' ∈ [1, S] :
val_s'(o_1) > val_s'(o_2) ∧ Spt_s' > Spt_s] ∀ s = 1, ..., S;

∃ s ∈ [1, S] : val_s(o_1) > val_s(o_2) ∧
[val_s'(o_1) ≥ val_s'(o_2) ∀ s' ∈ [1, S] with Spt_s' > Spt_s].

4. Discussion

This paper has described the development of
Infusiun, a model-based software tool to support the
fusion of uncertain information in intelligence analysis,
providing a brief overview of the problem domain and
the mathematical model at the heart of Infusiun.
The model presented herein outlines a simple process
for obtaining an overall picture of the validity of a
conclusion based on unreliable evidence, and for
comparing competing hypotheses to determine which
holds the greater degree of support within the available
evidence.
Work is ongoing to extend the model to incorporate
additional features such as:
- Derivation of complex conclusions based on
pre-defined conclusions;
- Sensitivity analysis to determine the value of
acquiring additional corroboration of the
available evidence;
Figure 2. Partial ordering of levels of corroboration.
- Impact analysis to determine the consequences
of changes in the perceived reliability of a
source;
- Collaboration support to fuse strength and
reliability judgments from multiple analysts.
We believe that building on the framework established
here has the potential to produce a highly effective and
intuitive decision aid for intelligence analysts.

5. References

[1] Clark, R.M., Intelligence Analysis: Estimation and
Prediction, Baltimore, MD: American Literary Press, 1996.

[2] Dubois, D. and H. Prade, "On the use of aggregation
operations in information fusion processes," Fuzzy Sets and
Systems 142, 2004, pp. 143-161.

[3] Heuer, R.J., Psychology of Intelligence Analysis,
Washington, DC: Center for the Study of Intelligence, 1999.

[4] Johnston, R., "Foundations for meta-analysis: Developing
a taxonomy of intelligence analysis variables," Studies in
Intelligence 47:3, 2003.

[5] Johnston, R., "Reducing analytic error: Integrating
methodologists into teams of substantive experts," Studies in
Intelligence 47:1, 2003.

[6] Liu, W., J.G. Hughes, and M.F. McTear, "Representing
heuristic knowledge and propagating beliefs in the Dempster-
Shafer theory of evidence," in R.R. Yager, J. Kacprzyk, and M.
Fedrizzi (eds.), Advances in the Dempster-Shafer Theory of
Evidence, New York: John Wiley & Sons, 1994.



