
PROCEEDINGS OF SPIE

SPIEDigitalLibrary.org/conference-proceedings-of-spie

Oliver E. Drummond, "Performance evaluation of single-target tracking in clutter," Proc. SPIE 2468, Acquisition, Tracking, and Pointing IX, (26 May 1995); doi: 10.1117/12.210449

Event: SPIE's 1995 Symposium on OE/Aerospace Sensing and Dual Use Photonics, 1995, Orlando, FL, United States



Performance Evaluation of Single Target Tracking in Clutter

Oliver E. Drummond

Consulting Engineer
10705 Cranks Road, Culver City, CA 90230
Telephone: 310-838-5300

ABSTRACT

Performance evaluation is fairly straightforward when tracking a single bright target. Tracking a dim target in clutter, on the
other hand, can present some challenges in performance evaluation. For example, a target track can be a false track. A false
track is one based primarily on clutter points or false signals rather than the target. How is a false track to be identified and
what measures of performance should be used to account for a false track?

In comparing two different trackers or two variants of the same tracker, how can the performance be evaluated effectively and
fairly? One tracker might acquire the target earlier (at greater range) than the other, yet not track as accurately. In some
scenarios one tracker might even miss the target completely, that is, exhibit a missed track. This is another performance
evaluation issue that must be addressed.

This paper presents a two-step method for evaluating performance of tracking a single target in clutter. The first step
classifies the target track as valid, missed or false. Various tracking measures of performance are then computed in the
second step. This two-step approach provides a systematic method for tracker performance evaluation. More importantly,
this methodology is designed to permit a fair evaluation and comparison of two or more competing trackers.

KEYWORDS: Performance Evaluation, Measures Of Performance, Trackers, Tracking, Single Target Tracking, Multiple
Object Tracking, Multiple Track Processing, Multiple Sensor Tracking, False Tracks.

1. INTRODUCTION

Performance evaluation of both single and multiple target trackers has been conducted for many years. Then why this paper
on performance evaluation of single target trackers, especially when single target trackers are substantially easier to evaluate
than multiple target trackers? There are four motivations for this paper, as follows:

• There is a growing recognition of the advantages of and need for tracking a single target with what could be called
multiple track processing. For example, multiple track processing is being considered for air intercept missiles
to take advantage of the imaging sensors that will be used to enable target detection at longer ranges.

• There have been lessons learned in recent years from extensive discussions and papers on multiple target tracker
performance evaluation. There have also been some insights gained from experience in the evaluation and
comparison of two competing single target trackers.

• There is the recognition that it is critical to provide a fair method of evaluation when comparing the performance
of trackers developed by competing suppliers.

• There is still no single, well-accepted methodology for evaluating tracker performance, and there will be
potential controversy about any selected approach due to the uncertainty and complex nature of the problem.

The method used to evaluate tracker performance depends in part on the type of tracker employed. It is therefore important to
distinguish between at least two major types of trackers and a discussion of these two is provided in the following subsections.

In the discussion that follows, a definite distinction is made between false signals and clutter points. By false signal is meant
a threshold exceedance of the signal detection processing that is caused by random errors. The random error can be due to
electronic noise or random sensor phenomena. A clutter point is a threshold exceedance caused by a relatively bright
background feature. A clutter point is persistent over time and is typically approximately stationary relative to some
coordinate system. Thus clutter points will move together (more or less) in the sensor field of view over at least a short time
interval. For example, bright spots within a cloud will move together over a moderate time period. Multiple clouds will move
together in the field of view of an IR sensor for at least a short period of time. The major difference between clutter points
and false signals is that the locations of clutter points are correlated over time. The locations of false signals are virtually
independent over time.

1.1 Closed Loop Trackers

There are a number of single target tracking applications that employ what could be called a closed loop tracker. In a classical
closed loop tracker, the opto-mechanical sensor is gimbal mounted and directed so that the apparent target image is positioned
near the center of the field of view. The brightest (or darkest) object in the field of view is detected by the signal processor,
processed, and then sent to the sensor gimbal driver. The signal processor output is typically temporally filtered before being
sent to the gimbal driver to reduce the high frequency errors. This type of tracker is most useful for applications where the
target is bright (high contrast) compared to the background and random errors, such as electronic noise. Note that this type of
tracker does not incorporate data association processing and the filter may be fairly simple compared to a Kalman filter.
Also, this type of tracker can employ either an analog or digital processor and does not maintain multiple tracks.
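As an illustration of this kind of temporal filtering, the following is a minimal sketch, not drawn from the paper, of a first-order low-pass (exponential smoothing) filter applied to the measured boresight error before it is sent to the gimbal driver; the gain value and interface names are assumptions.

```python
class ClosedLoopFilter:
    """Hypothetical first-order low-pass filter for gimbal pointing error.

    Smooths the frame-by-frame target offset from boresight to suppress
    high frequency errors before the result is sent to the gimbal driver.
    """

    def __init__(self, gain=0.3):
        self.gain = gain          # smoothing gain, 0 < gain <= 1 (assumed value)
        self.smoothed = None      # filtered (x, y) offset in the focal plane

    def update(self, measured_offset):
        """Blend the new measured offset with the running estimate."""
        if self.smoothed is None:
            self.smoothed = tuple(measured_offset)
        else:
            self.smoothed = tuple(
                s + self.gain * (m - s)
                for s, m in zip(self.smoothed, measured_offset)
            )
        return self.smoothed      # command passed to the gimbal driver
```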

1.2 Multiple Object Trackers (Multiple Track Processing)

As system requirements become more stringent, simple closed loop tracking is giving way to advanced digital processing that
maintains multiple tracks in digital data files. To acquire a target at greater ranges, the processor must detect and track the
target when some clutter points are about as bright as the target and others may be even brighter.

For tracking a dim target in clutter, a multiple object tracker can be used to maintain tracks on not only the target but also on
other objects, in particular, persistent clutter points. A multiple object tracker used to track a single target will be referred to
as a single target, multiple object tracker (STMOT). A STMOT maintains multiple tracks to aid in distinguishing clutter
points from the target.

With a STMOT, the signal processing detection threshold is set low enough to detect the target at the desired range. All the
threshold exceedances that are processed by the signal processor are then passed on to the tracker. As a consequence, clutter
points and false signals are also detected, processed by the signal processor and passed to the tracker. The threshold
exceedances passed on to the tracker by the signal processor are sometimes called hits or observations. Figure 1 displays the
overview of one typical algorithm architecture (processing chain).

In starting and maintaining tracks on the various objects in the field of view, a typical STMOT would include the functions of
track promotion/demotion logic and target acquisition logic. The four stages in the evolution of any track might be:

• Track initiation
• Tentative track maintenance
• Mature track maintenance
• Track termination

The track promotion/demotion logic decides when to initiate a track, when to promote an initiated track (a candidate track) to
a tentative track, when to promote a tentative track to a mature track, and when to demote or terminate a track. Since a track
could be based on false signals, clutter, or the target, the target acquisition logic function is needed to decide which track is
the target. The target acquisition logic typically processes all the mature tracks and declares one of them the target when
specified test criteria are met by a track. When the tracker decides that the target of interest has been acquired, the track
selected to represent the target is provided as the output of the tracker. This selected track will be referred to simply as the
acquired track. Typically there is no track provided by the tracker until the target is acquired.
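The following is a minimal sketch, not taken from the paper, of how such promotion/demotion logic might be organized. The stage names follow the list above, while the promotion and demotion rules (simple consecutive hit/miss counting with assumed counts) are illustrative only; real trackers use richer rules such as track scores.

```python
from enum import Enum, auto

class Stage(Enum):
    CANDIDATE = auto()    # track initiation
    TENTATIVE = auto()    # tentative track maintenance
    MATURE = auto()       # mature track maintenance
    TERMINATED = auto()   # track termination

class PromotionLogic:
    """Hypothetical promotion/demotion logic based on hit counting."""

    def __init__(self, promote_hits=3, demote_misses=3):
        self.stage = Stage.CANDIDATE
        self.promote_hits = promote_hits    # consecutive hits to promote (assumed)
        self.demote_misses = demote_misses  # consecutive misses to terminate (assumed)
        self.hits = 0
        self.misses = 0

    def update(self, associated: bool) -> Stage:
        """Advance the track's stage given whether an observation associated."""
        if associated:
            self.hits += 1
            self.misses = 0
            if self.stage is Stage.CANDIDATE and self.hits >= self.promote_hits:
                self.stage, self.hits = Stage.TENTATIVE, 0
            elif self.stage is Stage.TENTATIVE and self.hits >= self.promote_hits:
                self.stage, self.hits = Stage.MATURE, 0
        else:
            self.misses += 1
            self.hits = 0
            if self.misses >= self.demote_misses:
                self.stage = Stage.TERMINATED
        return self.stage
```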

The design of a typical track promotion/demotion logic makes it very unlikely that a mature track will be caused exclusively
by false signals. Therefore, an acquired track could be caused primarily by clutter or the target of interest. Even if the
acquired track is due to the target, it could include clutter points and false signals and hence exhibit larger than normal
estimation errors.

A STMOT is similar to well-known multiple target trackers [1,2,3] (MTT) except that there is assumed to be at most one target of
interest in the field of view. The other persistent objects are clutter points. Both STMOT and MTT include data association
(sometimes called correlation) algorithms and a Kalman filter or a simplified version of one. Each track is maintained in a
file and includes the object's estimated state vector, other object attributes and the variance-covariance matrix of the
estimation errors.
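As a concrete illustration of one entry in such a track file, here is a minimal sketch; the field names and types are assumptions for illustration, not the paper's.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class TrackFileEntry:
    """Hypothetical contents of one track in a STMOT/MTT track file."""
    track_id: int
    state: np.ndarray       # estimated state vector, e.g. position and velocity
    covariance: np.ndarray  # variance-covariance matrix of the estimation errors
    attributes: dict = field(default_factory=dict)  # other object attributes (e.g. intensity)
```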

Note that there are some trackers that maintain multiple tracks but employ very simplistic data association algorithms, such as
the nearest neighbor algorithm. While these trackers can track multiple objects or targets, performance falls short with close
or crossing objects. It may be misleading to call this type of tracker a STMOT or MTT, which typically refers to a tracker
with a more advanced data association algorithm.

By employing a STMOT, the signal processing threshold can be lowered. This permits the target to be detected at a greater
range. The tracker is assigned the task of distinguishing between detections due to clutter and those of the target. That is one
reason why the tracker is designed to track the clutter detects. Another advantage of tracking the clutter detects is that the
target could pass close to, under, or over a clutter point. The tracks of the target and the clutter aid in the decision of which
observation in each new frame of data is due to the target and which are due to the background clutter. This decision process
is handled by the data association algorithm.

There are many more advantages to using a STMOT that are beyond the scope of this paper to discuss. Replacing a closed
loop tracker with a STMOT introduces system design freedoms that may have a big impact on how some missiles, for
example, will be designed.

A STMOT may provide improved performance, but it increases the processing complexity and complicates the evaluation
process. Furthermore, it may be necessary to evaluate and compare a closed loop tracker to a STMOT.

1.3 Performance Evaluation Concerns

Evaluation of tracking performance is straightforward if there is only one target, few false signals and no relatively bright clutter
points. Under these conditions, a track is consistently updated with observations from the target. The target state estimate can be
compared with the true state of the target to evaluate track accuracy.

Whenever the target is dim relative to clutter, however, persistent clutter points can create objects in the field of view that cause
tracks to be formed. Most of these tracks should eventually be identified as clutter tracks. However, the target acquisition logic
could decide that a clutter track is from the target of interest. The performance evaluation methodology must accommodate all
types of acquired tracks, those for the target of interest and those due to background clutter and other objects that are not of
interest.

There is no well agreed upon method for evaluating performance of tracking a dim target in clutter. The evaluation
methodology requires that hard decisions be made that may not be universally accepted. There have been numerous
discussions of performance evaluation methods [6] and this paper draws heavily from two papers on multiple target tracking
performance evaluation [4,5]. In fact, portions of references 4 and 5 have been modified for use in this paper without citing them
every time. This paper also draws from personal experience in evaluating and comparing two competing single target trackers
designed for advanced performance. Note that tracking multiple targets of interest presents additional ambiguities and even
greater challenges for performance evaluation that will not be discussed since this paper concentrates on single target tracking.

The performance evaluation methodology must be designed to be robust so that it can accommodate the various kinds of
sensor phenomenology and anomalous track conditions. The observations provided by the signal processor to the tracker
could be due to:
• false signals
• background clutter
• target of interest
• unresolved, closely spaced objects (CSOs) caused by the target and clutter

and an unresolved, closely spaced object is sometimes called a clump. Observations caused by other than the target of interest can
lead to anomalous track conditions, including:

• missed track: target without track

• false track: spurious or lost track
    o spurious track: track not related primarily to the target
    o lost track: once valid track that became spurious

which can make performance evaluation complex.

A fundamental problem in evaluating performance is that with a dim target, the acquired track can be a false track. A false
track is one based primarily on observations from clutter points plus possibly false signals. Also, even a valid track can
become a "lost track," namely, one that was following the target but laterwandered far off from the target and became a false
track. How is performance to be evaluated when the acquired track is a false track? In computing the average track accuracy,
should false tracks be included in the computations? False tracks can be far from the target of interest. If false tracks are
included, then the computed sample standard deviation or variance will be very large! If the false tracks are not included
when computing the average track accuracy, then how is a track identified as a false track? These are the issues that prompted
this paper.

This paper describes a two-stage method for performance evaluation, as illustrated in Figure 2. This approach is a single target
version of the multiple target methodology described in previous papers [4,5]. First the acquired track is designated as either valid or
false and then, if the track is designated as valid, various measures of performance are computed. Alternative methods for the
first step are discussed and one used in practice is discussed in more detail.

It must be emphasized that the subject of this paper is the methodology for performance evaluation, NOT how to track. It is
assumed that the tracking algorithms have been designed and are ready to be evaluated. It is NOT the tracking algorithms, but
rather the methods for performance evaluation, that are addressed here. The emphasis of this paper is on the first step in this
process, namely, the decision of whether a track is valid. An earlier paper [5] discusses some of the details of the second step in this
process as they apply to MTT, namely, some of the various measures of performance.

Ultimately, the performance of tracking algorithms is judged by the success or failure of the mission they support. The
probability of kill of a missile is one such measure of system performance. If the system mission is not met, what is the cause?
Was the tracker the culprit? A method of evaluating the tracker itself is thus needed.

The emphasis of this paper is on the part of the target-sensor scenario when there is a long range to the target. Under these
conditions the target is small in the field of view so that automatic target recognition is not effective using a single frame of data.
Under these challenging conditions the target can be confused with a false signal or clutter point.

This paper concentrates on describing a method to evaluate tracking algorithm performance with a computer Monte Carlo
simulation. Tracker performance as evaluated with a computer simulation is admittedly only an intermediate measure of
effectiveness for the system as a whole. After successful computer simulations of the tracker, system simulations and then
field and flight tests should follow.

2. SIMULATION CONSIDERATIONS

How the observations are generated by the simulation testbed can influence the selection of performance evaluation methodology.
There are various approaches to simulating the sensor and signal processing system in order to provide observations for the
tracker, which is the test article. One of the major aspects of a testbed is the degree of fidelity of the simulation, that is, how
accurately or faithfully the testbed reproduces the characteristics of the system being simulated.

An earlier paper enumerates four generic classes of simulation fidelity. These four classes are summarized here because the
simulation class is indicative of how readily certain evaluation methodologies can be implemented.

Classes A and B: High Fidelity Simulations

The sensor and signal processing functions are simulated in detail. Threshold exceedances are computed based on data
from actual or simulated scenes containing target, clutter and random errors. Observations generated by the signal
processing algorithms are provided to the tracker.

Classes C and D: Moderate Fidelity or Simple Simulations

Observations are computed by generating random variables and adding them to truth for the location of the target and
simulated clutter points. Typically the simulation includes missed and false signals. Unresolved closely spaced objects
might be generated by combining raw observations before providing the final observations to the tracker.

The selection of a simulation approach can have a significant impact on how difficult it is to evaluate the performance. For
example, in evaluating track purity it is typically necessary to determine the object source for each observation. With Class C and
D simulations this might be a simple matter because of the simple, well-defined way each observation is formed.

With Class A and B simulations, however, many complex factors interact to create an observation. Consequently, the object
source for an observation may not be obvious. For example, two observations may be caused by the target and a clutter point that
are close together. The attributes of the clutter point will affect the measurement vector of the observation of the target due to the
sensor spread function. Also, a clutter point near a target might incorrectly appear to be caused by the target yet have very
different attributes. Finally, misassociations can occur in the "color correlation" function of signal processing. The measurement
data from one sensor waveband for a target can be combined with the measurement data from another waveband for a clutter
point. Thus there can be confusion in determining which object goes with an observation.

An advantage to computer simulations is that the truth data that are needed for performance evaluation are readily available,
unlike in real world tests where accurate truth data can be difficult to obtain. With a simulation, numerous Monte Carlo runs can
be made to compute meaningful sample statistics for a scenario. Also, simulations permit evaluation of a variety of scenarios.
With field and flight tests, typically only a few scenarios are conducted with one run each. It is difficult to debug algorithms and
evaluate performance with only a few runs and without accurate truth data. Thus Monte Carlo simulations provide a valuable tool
for evaluating a tracker before committing expensive resources and time to conducting field or flight tests.

3. GOALS AND ASSUMPTIONS

To limit and simplify the discussion, the assumption is that there is at most one target of interest in the field of view and at
most one acquired track provided by the tracker. Furthermore, the performance is evaluated using multiple runs of a Monte
Carlo simulation from which target truth is known. The methodology should accommodate evaluation of a single tracker or
comparison of performance of two or more trackers. In comparing trackers, there could be trackers with different algorithms
or with the same algorithms but with their parameters adjusted differently. Also the methodology should be appropriate for
evaluating closed loop trackers and STMOT's with data from one or more sensors. If trackers are to be compared, their
parameters should be adjusted so that they exhibit comparable performance.

Of concern is evaluation of performance of only the acquired track. During algorithm development, not just the acquired
track but all tracks should be carefully scrutinized but that task is not addressed in this paper. Consideration is not given here
to evaluation of overall system performance, target classification/recognition, signal processing or the other intermediate
tracks. Furthermore, to simplify the discussion, the track state vector is assumed to consist of estimated target position and
velocity in two or three dimensions.

The goal is to establish a methodology for evaluating the performance of tracking algorithms fairly, without showing undue
preference for or against any specific algorithm. If the performance of two (or more) competing trackers is being compared, it
is especially critical that the methodology be fair to both (all) trackers.

For instance, the evaluation method should be equally fair to different approaches to data association, including nearest neighbor
algorithms, joint probabilistic data association algorithms (JPDA), and multiple hypotheses algorithms. Nearest neighbor
algorithms assign at most one observation per track. By comparison, the JPDA algorithm weights each observation within a track
gate with a number between zero and one. Performance evaluations should properly accommodate these differences, where
appropriate, in order to evaluate algorithms on the merits of their performance rather than the tracker design.

Finally a robust methodology should also accommodate the full scope of sensor, signal processing, and tracking phenomena
including data misassociations, unresolved closely spaced objects, and color misassociations. These phenomena can cause a track
to use an observation that is either from an incorrect source or from multiple sources. In addition to accommodating differences
in tracking algorithms, the performance methodology must also accommodate these real-world target tracking and sensor
phenomena.

These various simplifying assumptions do not really limit the methodology from applying to more general situations. These
assumptions are invoked primarily to help simplify the discussion of the problem and the advocated methodology.

4. THE PROBLEM

The fundamental problem in evaluating performance is that the acquired track can be a false track. The first task is how to
determine if and when the acquired track is a false track. The second task is how to evaluate performance when the acquired
track is a false or missed track. These are some of the problems to be addressed in selecting a performance evaluation
methodology. Unfortunately, there is no nice, universally accepted, solution to these problems. Any chosen methodology
will be controversial.

Table 1 is helpful in understanding the difficulty in evaluating performance. There are basically two possible target conditions
in single target tracking, either: there is a target in the field of view or there is none. Similarly, there are two track conditions,
either: there is an acquired track or there is none. If there is no target in the field of view, then: there is no track or there is a
false track, since there cannot be a valid track.

On the other hand, if the target is in the field of view then there can be a missed track, a false track or a valid track. If there is
no track then the result is a missed track. If there is an acquired track, then it is either a valid track or a false track.

                                        Actual Number of Targets
                                        0                1
    Estimated Number     0              Valid Miss       Missed Track
    of Targets           1              False Track      Valid Track or False Track

    Table 1. Possible Target-Track Outcomes.

A major dilemma can occur whenever there is a target in the field of view and there is an acquired track. Since the
track can be a false track or a valid track, a method is needed to determine if the track is valid or false. Typically with
a dim target, it is not obvious whether the track is valid or false. Some alternative solutions to this dilemma are discussed
in Section 5.

Deciding if a track is false or valid is the first step in the performance evaluation methodology advocated by this paper. After
deciding whether a track is valid or false, then the performance can be evaluated, such as computing the state estimation errors.

If a track is declared a false or missed track, then how is performance to be evaluated? For example, how are the position or
state estimation errors to be computed? If there is a missed track, then there is no track and hence no way of calculating the
tracking accuracy. If there is no target in the field of view, then there is no true target state available to compute tracking
accuracy. These two conditions are clear cut in that the data is not available to compute the usual performance metrics, such
as estimation errors.

If there is a target in the field of view and the track has been declared false, then both track and target data are available.
Should track metrics, such as estimation errors, be computed? For example, in computing the average track variance over
Monte Carlo runs, should false tracks be included in the computations? False tracks can be far from the target of interest. If
false tracks are included, then the computed sample standard deviation or variance might be very large. By including a few
very large track errors in the sample statistic, these metrics may become meaningless. On the other hand, if false tracks are
not included in the metrics for track accuracy, what additional measures of performance should be computed to account for the
lost tracks? These issues are also addressed in the sections that follow.

5. STEP 1 OF THE METHODOLOGY

In performance evaluation, the first task is to decide if a track is false or valid. As indicated in Section 4, there is no decision to be
made except when the target is in the field of view and there is an acquired track. There is no problem
in determining if a track is false if there is no target in the field of view. Similarly, there is no false track if there is a target in the
field of view but no acquired track. Thus this section will concentrate on the case in which there is a target in the field of view
and there is an acquired track.

In general, there are three methods of dealing with this issue, as follows:

• Make no decision
• Use observation source
• Use distance between track and target truth

and each of these will be discussed, in turn.

5.1 Make No False Track Decision

Some measures of performance can be computed without making the decision of whether a track is valid or false. The problem is
that the Monte Carlo ensemble can contain some false tracks with very large errors. Also the ensemble may not be complete
because of missed tracks. Computing the average of the error magnitudes over all available tracks could be misleading. False
tracks could cause the average to indicate very large tracking errors. Leaving missed tracks out of the average would lead to
optimistic results. It would be better to substitute moderately large errors for missed tracks, but a value would have to be
established.

Some measures of performance can be computed so that they are not meaningless even if the computations include false tracks.
For example, a sample median rather than sample mean can be used to evaluate a scalar measure of performance with multiple
Monte Carlo runs. If more than half of the tracks are valid, then the false tracks should not have too big an impact on the results.
The extrapolated median discussed in Section 5.3 would be better than the classical median.

Not distinguishing between valid and false tracks is not very satisfactory because it would not indicate what percentage of the
tracks were false, and this is an important measure of performance. Also there are a number of measures of performance that
cannot be evaluated meaningfully without deciding if a track is false or valid.

5.2 Using Observation Source

"Observation source" refers to the object that generated the observation or observations used by a track for a frame of data. The
object that generated an observation could be the target, clutter, a false signal, or a clutter point plus target. If the source for each
observation used by a track can be established, then this information can be used to decide if a track is valid or false. However,
one wrong observation does not a false track make. It is not uncommon to use a so-called M-out-of-N rule to determine if a track
is false. For example, if the majority of the observations (say M = 5) in a specified number of frames of data (say N = 9) are from
only the target, then the track would be declared valid; otherwise it is false.
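A minimal sketch of such an M-out-of-N test, assuming the per-frame observation sources have already been established; the function name and the label representation are illustrative, not the paper's.

```python
def valid_by_m_of_n(frame_sources, m=5, n=9):
    """Declare a track valid if, in the last n frames, at least m frames
    used observations that came only from the target.

    frame_sources: sequence of per-frame source labels for the observations
    a track used, e.g. ["target", "clutter", "target", ...].
    """
    window = list(frame_sources)[-n:]
    target_only = sum(1 for src in window if src == "target")
    return target_only >= m

# Example: 6 of the last 9 frames were fed only by the target.
history = ["target", "clutter", "target", "target", "false_signal",
           "target", "target", "clutter", "target"]
print(valid_by_m_of_n(history))  # True
```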

The use of observation source to identify the object represented by the track requires the evaluator to perform the following two-
step procedure:

• Examine the track to determine which (or the weights with which) observations are used in a track for each frame

• Determine the object source or sources for each observation, and, if appropriate, the weight each object contributes
to an observation, such as for a clump.

To identify the object by observation source, tracking algorithms must be closely examined to determine the weight with which
each observation is used by the track. This is not a simple task; the tracker cannot be treated as simply a black box. For example,
some tracking algorithms, such as JPDA, permit observations to be used more than once (multiple tracks can use the same
observation) and a track uses all the observations in its gate. Use of observation source may be very complex depending on the
type of data association algorithm used in the tracker. Determining the source of observations used by a track would be even
more complex if tracking with multiple sensors. More importantly, the use of observation source typically would not treat all
algorithms fairly.

As discussed in Section 2, determining the target source or sources for each observation is relatively straightforward for a
moderate fidelity or simple simulation (Class C or D). Determining the source for a high fidelity simulation (Classes A and B) is
problematic, however. Another difficulty is that due to their design, some sensor and signal processors provide more than one
observation in a frame from a single object.

The difficulties and limitations of using the observation source, particularly for higher fidelity simulations, argue for using the
distance between target and track as the assignment criterion. Distance is a fairer, easier criterion to use in deciding if a track is
false.

5.3 Use Distance Between Track and Target Truth

By "use distance" is meant the use of a scalar, non-negative, numerical measure of the difference between the track estimate and
the target. Many functions can be used for this measure of distance. For instance, the sum of the squares of the difference
between each component of the track estimated state vector and the target true state vector is one measure of distance. The square
root of this quantity is another. Another measure of distance would involve only the position components of the track and target
state vectors. Yet another distance measure would involve the weighted sum of squares of the difference between each
component of the track and target state vectors, as in a chi-square statistic.
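For concreteness, here is a minimal sketch of these candidate distance measures; it is a hypothetical illustration using NumPy, and the function names and the assumed position slice are not prescribed by the paper.

```python
import numpy as np

def sse_distance(x_hat, x_true):
    """Sum of squares of the differences between state vector components."""
    d = np.asarray(x_hat, dtype=float) - np.asarray(x_true, dtype=float)
    return float(d @ d)

def rss_distance(x_hat, x_true):
    """Square root of the sum of squares of the differences."""
    return float(np.sqrt(sse_distance(x_hat, x_true)))

def position_distance(x_hat, x_true, pos=slice(0, 3)):
    """SSE over only the position components of the state vectors
    (assumes positions occupy the first components)."""
    return sse_distance(np.asarray(x_hat)[pos], np.asarray(x_true)[pos])

def weighted_distance(x_hat, x_true, weight):
    """Weighted sum of squares of the differences, as in a chi-square statistic."""
    d = np.asarray(x_hat, dtype=float) - np.asarray(x_true, dtype=float)
    return float(d @ np.asarray(weight) @ d)
```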

The track-target distance is used to decide if a track is false by comparing the distance to a specified threshold value. If the
measure of distance exceeds the specified threshold value, the track is designated a false track. The distance measure could be
computed every time the output track is updated and a decision made each time. A more conservative approach would be to
designate the track to be a false track if the threshold is exceeded for a specified number of consecutive frames or use an
M-out-of-N rule as discussed in Section 5.2. The method used depends in part on the value specified for the threshold.

Specifying the threshold value may not be a very simple task. Personal experience has taught that in some applications, selecting
the threshold value can require some ingenuity. The most challenging conditions seem to be early in a scenario when the track is
first provided by the tracker. For example, if the range to the target is decreasing, then initially the acquired track may be false
part of the time. Also, competing trackers may first acquire at different ranges to the target and initially exhibit substantially

SPIE Vol. 2468 / 99

Downloaded From: https://www.spiedigitallibrary.org/conference-proceedings-of-spie on 16 Mar 2019


Terms of Use: https://www.spiedigitallibrary.org/terms-of-use
different track accuracies.

The specified threshold value should be a function of time to account for various factors such as range to target and time tracked.
Two candidate methods for selecting the specified value for the threshold follow:

• Theoretical Statistic Method. Compute the variance-covariance matrix for an optimal filter, such as a Kalman
filter, starting at the expected range at initial acquisition. This covariance matrix could be obtained by using a
Kalman filter to track the target in a simulation without clutter or false signals. Use the chi-square statistic as a
measure of distance with the inverse of the optimal variance-covariance matrix as the weighting matrix. Set the
threshold for each frame of data to a multiple of the number of elements in the state vector, for example, use a
multiple of somewhere between 16 and 81 (between 4² and 9²). These values should allow for some misassociation
including unresolved closely spaced objects.

• Empirical Statistic Method. Compute the sum of squares of the estimation errors (SSE) in position and in velocity
for each track and for each frame of data. Compute the sample median for the two SSE's over all the Monte Carlo
runs for each frame of data. Better still, compute the "extrapolated median" described below. Compute the ratio of
the median position SSE divided by the median velocity SSE for each frame of data. For the measure of distance
for each track, use the position SSE plus the computed ratio times the velocity SSE. Then use as the specified
threshold for each frame of data, the sample median of the position SSE times a multiple, such as 12 (2 times 3²).
This could be meaningless if the number of really good tracks is not a majority of the ensemble of Monte Carlo
runs.

There are many methods for establishing the threshold and the above methods are just two of those that were considered for a
challenging evaluation task.

For the Theoretical Statistic Method, the chi-square statistic, $\chi_n^2$, is defined as:

$$\chi_n^2 = (\hat{x}_n - x_n)^T P_n^{-1} (\hat{x}_n - x_n)$$

where:
    $\hat{x}_n$ = estimated state vector of the track at time n
    $x_n$ = true state vector of the target at time n
    $P_n$ = optimal state estimation error variance-covariance matrix

and is similar to the chi-square statistic used for a gate (validation region) in tracking algorithms [1,2,3,7].
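A minimal sketch of this distance test, assuming the truth state and a precomputed optimal covariance are available from the simulation; the names and the particular threshold multiple are illustrative.

```python
import numpy as np

def chi_square_distance(x_hat, x_true, p_optimal):
    """Chi-square distance between track estimate and target truth,
    weighted by the inverse of the optimal error covariance."""
    d = np.asarray(x_hat, dtype=float) - np.asarray(x_true, dtype=float)
    return float(d @ np.linalg.inv(np.asarray(p_optimal)) @ d)

def is_false_track(x_hat, x_true, p_optimal, multiple=25.0):
    """Designate the track false when the chi-square distance exceeds a
    multiple of the state dimension (the paper suggests 16 to 81)."""
    threshold = multiple * len(x_hat)
    return chi_square_distance(x_hat, x_true, p_optimal) > threshold
```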

Simplifications to the chi-square statistic can reduce the computing cost. As an illustration, the velocity errors can be omitted
from the measure of distance. In fact, it may be necessary to eliminate the velocity errors if one of the trackers to be tested does
not provide an estimate of velocity. If all the trackers provide a variance-covariance matrix with the acquired track, then the
average of these matrices for each frame of data could be used instead of the optimal matrix described above. While the
covariance matrix computed by each track could be used in the chi-square statistic, this could be misleading. The same
covariance matrix would not be used for all Monte Carlo runs and all trackers if there is more than one.

For the Empirical Statistic Method, the author devised the "extrapolated median" specifically for this task. The intent is to
accommodate both missed and false tracks. Missed tracks correspond to missing sample points and false tracks correspond to
outliers. The assumption is that the SSE for a false or missed track would almost always be greater than the median would be if
all tracks were valid.

The extrapolated median is computed as follows. For the purpose of definition, note that given an odd number of Monte Carlo
runs (N_MC), the sample median for a scalar statistic would normally be the middle one in a list sorted by the value of the scalar
statistic. It would be the M-th one, where M = (N_MC + 1)/2. Using that definition, sort the SSE's for all the available acquired
tracks for one tracker over the Monte Carlo runs for a frame of data. Actually only the tracks with the M smallest values of SSE
must be identified. The extrapolated median is the M-th smallest SSE, i.e., count up M starting with the smallest in the sorted list.

Repeat this process for all trackers. Then use, as the final extrapolated median for each frame of data, the average (or median)
over all trackers of the extrapolated medians computed for each tracker.

Note that due to missed tracks the number of tracks could be less than the number of Monte Carlo runs (N_MC). If the above
assumption is correct, the SSE of false tracks will be larger than the extrapolated median SSE provided there are at least M tracks
that are truly valid. The extrapolated median approach penalizes missed tracks by assuming their SSE is larger than the computed
extrapolated median. Obviously the extrapolated median cannot be computed if half the tracks or more are missed tracks. Also
the extrapolated median may be of little value if there are fewer than M truly valid tracks.
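A minimal sketch of the extrapolated median under these definitions, assuming an odd number of Monte Carlo runs; the interface names are illustrative.

```python
def extrapolated_median(sse_values, n_mc):
    """Extrapolated median of per-run SSE values for one tracker and one frame.

    sse_values: SSE's of the acquired tracks that exist (missed tracks
    simply contribute no value). n_mc: total Monte Carlo runs (odd).
    Missed and false tracks are assumed to exceed the true median, so the
    M-th smallest available SSE stands in for the median over all runs.
    """
    m = (n_mc + 1) // 2
    if len(sse_values) < m:
        raise ValueError("Half or more of the tracks are missed; "
                         "the extrapolated median is undefined.")
    return sorted(sse_values)[m - 1]   # M-th smallest SSE

# Example: 25 runs, 21 acquired tracks; the 13th smallest SSE is returned.
import random
random.seed(0)
sses = [random.uniform(0.1, 5.0) for _ in range(21)]
print(extrapolated_median(sses, n_mc=25))
```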

A big advantage to using distance as the criterion in deciding if a track is false is that the evaluation algorithms need not know
anything about the observations and their object sources. Also, there is no need to know anything about the tracking algorithms to
make the decision. All the evaluation algorithms need do is compute the distance between the target and track states, such as the
chi-square value, and compare that to the specified threshold. The distance measure approach also seems to provide for a fair
evaluation of various types of trackers regardless of the type of data association or filtering algorithms used. Accordingly, the
distance method for deciding if a track is valid or false (given that the target is in the field of view) is advocated in this paper.

The first step in the methodology advocated here is to classify the track. This step is the first block in Figure 2. In preparation for
this step the measure of distance must first be selected and the threshold value established for each frame of data. Then the
computations for each track, for each Monte Carlo run, and for each frame of data are as follows.

• If the target is in the field of view and there is an acquired track, the chosen measure of distance is computed and
compared to the specified threshold. If the distance is less than the threshold, then the track is designated valid,
otherwise the track is designated false.

• If the target is in the field of view but there is no acquired track, then there is a missed track.

• If the target is not in the field of view and there is a track, it is designated a false track.

• If the target is not in the field of view and there is no track, then the designation is valid miss.
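These four cases map directly to a small classification routine; the following is a minimal sketch, with outcome labels and signature chosen for illustration rather than taken from the paper.

```python
def classify_track(target_in_fov, track, distance=None, threshold=None):
    """Step 1 classification for one Monte Carlo run and one frame of data.

    target_in_fov: whether the target is in the field of view (from truth).
    track: the acquired track, or None if there is none.
    distance, threshold: the chosen track-to-truth distance measure and its
    specified threshold, needed only when both target and track exist.
    """
    if target_in_fov and track is not None:
        return "valid track" if distance < threshold else "false track"
    if target_in_fov:
        return "missed track"
    if track is not None:
        return "false track"
    return "valid miss"
```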
This along with the track and target data provides the information needed for the second step of the methodology, which is
discussed briefly in Section 6.

6. STEP 2: MEASURES OF PERFORMANCE

The specific measures of performance that are applicable to a tracker depend on the application. For example, in some single
target tracking applications, the covariance matrix of the estimated state vector plays an important role and in others it is not even
needed by the user. In some applications the estimated velocity vector is not used and in others it is needed to predict the future
location of the target. Computing the various measures of performance is the second set of blocks of the methodology displayed in
Figure 2. A prior paper [5] describes some specific measures of performance and also lists some pertinent references. Some aspects
of two measures of performance are briefly discussed in the following subsections.

6.1 Valid Track Score

A method for classifying a track as valid or not is discussed in Section 5. This classification is an important aspect of
performance. A measure of performance or score is needed to indicate the success of the tracker in providing a valid track over a
number of Monte Carlo runs. The specifics of the score function that is used will depend on the application.

One approach is to compute a score as follows for each tracker. For every frame of data for a single Monte Carlo run:
• Record a value of one for the score if the classification is either valid track or valid miss,
• Record a value of zero if the designation is missed track, and
• Record a value of minus one for a false track

and then for each frame for each scenario, the score values can be averaged to obtain the score as a function of time for each
tracker for each scenario. The scenario scores could be plotted as a function of time for each tracker and compared.
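A minimal sketch of this scoring scheme; the mapping and averaging follow the description above, while the function name and data layout are illustrative assumptions.

```python
SCORE = {"valid track": 1, "valid miss": 1, "missed track": 0, "false track": -1}

def frame_scores(classifications_per_run):
    """Average the per-run score values frame by frame for one tracker
    and one scenario.

    classifications_per_run: list of runs, each a list of per-frame
    classification labels produced by step 1.
    """
    n_runs = len(classifications_per_run)
    n_frames = len(classifications_per_run[0])
    return [sum(SCORE[run[f]] for run in classifications_per_run) / n_runs
            for f in range(n_frames)]

# Example: two Monte Carlo runs, three frames each.
runs = [["missed track", "valid track", "valid track"],
        ["missed track", "false track", "valid track"]]
print(frame_scores(runs))  # [0.0, 0.0, 1.0]
```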

It would be desirable to also compute a "summary statistic" by also averaging over all scenarios for each frame for each
tracker. This poses some difficulty if the scenarios are very different because there may not be a rational way to normalize
the scenarios so that a meaningful summary statistic can be computed. If, for example, the range to the target is decreasing in
all the scenarios, then range could be used as the normalizing parameter. The scores for a tracker could be averaged by
selecting and combining, from each scenario average, the frames of data that correspond to the same range to the target. Finally, a
"global summary statistic" could be computed for each tracker by computing the scalar value that is the average score over all
frames and scenarios for each tracker.

Note that computing the probability of a false or missed track with a simulation may not always be practical. If the
probability of a miss or false track is very small then too many Monte Carlo runs would be needed [8]. Computing the measure
of performance for the validity of the acquired track with a simulation will be most practical when a dim target is first
acquired and soon thereafter when the probability of a valid track is still much less than one.

6.2 State Estimation Error

The track accuracy is certainly another important aspect of performance. Common measures of accuracy are the root sum
squares (RSS) of the estimation errors in position and velocity. These are computed for each tracker, each scenario, and each
frame of data. The SSE (sum of squares of the estimation errors) is computed for each frame of data and averaged over all the
valid tracks in the Monte Carlo runs for a tracker and a scenario. The RSS is then obtained by taking the square root of the
computed average SSE. This computation is repeated for both the position SSE and the velocity SSE. The position and
velocity RSS's obtained for each scenario could be plotted as a function of time for each tracker and compared. As discussed
in Section 6.1, RSS summary statistics and scalar, global, summary statistics could be computed if there is a rational way to
normalize the scenarios.
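A minimal sketch of this RSS computation over the valid tracks, assuming the per-run SSE's for one tracker, one scenario, and one frame have already been collected; the names are illustrative.

```python
import math

def rss_over_valid_tracks(sse_values):
    """RSS accuracy for one frame: square root of the average SSE, where
    the average is taken over valid tracks only (false and missed tracks
    have already been edited out by step 1)."""
    if not sse_values:
        raise ValueError("No valid tracks to average.")
    return math.sqrt(sum(sse_values) / len(sse_values))

# Example: position SSE's (m^2) from the valid tracks of 4 Monte Carlo runs.
print(rss_over_valid_tracks([4.0, 9.0, 1.0, 2.0]))  # 2.0
```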

Also RSS errors could be obtained using both the average and the median (classical or extrapolated) methods of computation.
Depending on the tracker specifications, it may be appropriate to edit out a specified percent of the worst tracks as discussed
in a previous paper [5].

Note that the above computations use only valid tracks in computing the RSS. This edits out the false and missed tracks.
Accordingly, this measure of performance does not stand alone but must be coupled with the valid track scores discussed in
Section 6.1. Clearly the parameters of a tracker can be adjusted to favor one of these measures of performance at the expense
of the other. A less appealing alternative is to include false and missed tracks in the SSE computations by providing large
default values for the SSE for missed and false tracks.

7. CONCLUSIONS

The practicalities of imaging sensors and digital, multiple track processing permit the tracking of dim targets, e.g., acquiring the
target at a longer range. The performance evaluation of a dim target tracker is challenging due to the uncertainty of whether the
acquired track is valid or false.

A two-stage process has been presented for conducting performance evaluation of a single target tracker. This methodology
should be sufficiently robust to handle the anomalous tracks that can be caused by misassociations and at the same time be fair in
the evaluation of the various types of tracking algorithms.

The emphasis of this paper is on the first step of this process, namely, deciding if the acquired track is valid. Alternative
approaches for deciding if a track is valid are presented and compared. The recommended criterion is to use a measure of
distance between the target and the track and compare the computed distance to a threshold value. Methods for establishing a
reasonable value for the threshold are also described. This approach provides a reasonable penalty against false and missed tracks.

8. ACKNOWLEDGMENTS

Barry Fridling, a co-author of reference 4, graciously gave his permission for portions of that reference to be modified for use
in this paper and he also provided helpful suggestions. The contributions of Dennis Blay and Lee Young and many other
engineers too numerous to list here are also gratefully acknowledged.

9. REFERENCES

1. Blackman, S.S., Multiple Target Tracking with Radar Applications, Artech House, Dedham, MA, 1986.

2. Bar-Shalom, Y. and T.E. Fortmann, Tracking and Data Association, Academic Press, San Diego, CA, 1987.

3. Drummond, O.E., Multiple Target Tracking Lecture Notes, UCLA, Oct. 1985 and subsequent versions, Technology Training
Corporation, Torrance, CA.

4. Drummond, O.E. and B.E. Fridling, "Ambiguities in Evaluating Performance of Multiple Target Tracking Algorithms," Signal
and Data Processing of Small Targets 1992, SPIE Proc. Vol. 1698, pp. 326-337, April 1992.

5. Fridling, B.E. and O.E. Drummond, "Performance Evaluation Methods for Multiple Target Tracking Algorithms," Signal and
Data Processing of Small Targets 1991, SPIE Proc. Vol. 1481, pp. 371-383, April 1991.

6. Frenkel, G. (Editor), Proceedings of the SDI Panels on Tracking, Institute for Defense Analyses, Alexandria, VA, numerous
volumes starting in 1989.

7. Drummond, O.E., "Gate Size for Sequential Most Probable Hypothesis Tracking," Signal and Data Processing of Small
Targets 1994, SPIE Proc. Vol. 2235, pp. 670-682, April 1994.

8. Drummond, O.E., "A Compound Markov Chain Analysis of False Track Generation," Signal and Data Processing of Small
Targets 1994, SPIE Proc. Vol. 2235, pp. 370-381, April 1994.

Figure 1. Typical Signal and Track Processor Algorithm Architecture.

[Figure 2 (diagram not reproduced): track and target truth data feed a "Classify Track as Valid or False" block; the valid track, or an indication of a false or missed track, then feeds the second-step blocks: Evaluate False and Missed Tracks; Evaluate State Estimation Error and Bias; Evaluate Covariance Matrix Credibility; Evaluate Track Purity and Misassociations.]

Figure 2. Diagram of Two-Step Performance Evaluation Methodology.
