
A Systematic Review and Meta-Analysis of Direct Objective Measures of Situation Awareness: A Comparison of SAGAT and SPAM

Mica R. Endsley, SA Technologies Inc., Mesa, Arizona, USA

Objective: To examine evidence of sensitivity, predictiveness, and methodological concerns regarding direct, objective measures of situation awareness (SA).

Background: The ability to objectively measure SA is important to the evaluation of user interfaces and displays, training programs, and automation initiatives, as well as for studies that seek to better understand SA in both individuals and teams. A number of methodological criticisms have been raised creating significant confusion in the research field.

Method: A meta-analysis of 243 studies was conducted to examine evidence of sensitivity and predictiveness, and to address methodological questions regarding Situation Awareness Global Assessment Technique (SAGAT), Situation Present Assessment Technique (SPAM), and their variants.

Results: SAGAT and SPAM were found to be equally predictive of performance. SPAM (64%) and real-time probes (73%) were found to have significantly lower sensitivity in comparison to SAGAT (94%). While SAGAT was found not to be overly memory reliant nor intrusive into operator performance, SPAM resulted in problems with intrusiveness in 40% of the studies examined, as well as problems with speed-accuracy tradeoffs, sampling bias, and confounds with workload. Concerns about memory reliance, the utility of these measures for assessing Team SA, and other issues are also addressed.

Conclusion: SAGAT was found to be a highly sensitive, reliable, and predictive measure of SA that is useful across a wide variety of domains and experimental settings.

Application: Direct, objective SA measurement provides useful and diagnostic insights for research and design in a wide variety of domains and study objectives.

Keywords: situation awareness, metrics, SAGAT, SPAM, measurement

Address correspondence to Mica R. Endsley, SA Technologies, 5301 S. Superstition Mountain Drive, Suite 104377, Gold Canyon, AZ 85118, USA; e-mail: mica@satechnologies.com.

HUMAN FACTORS
Vol. XX, No. X, Month XXXX, pp. 1–27
DOI: 10.1177/0018720819875376
Article reuse guidelines: sagepub.com/journals-permissions
Copyright © 2019, Human Factors and Ergonomics Society.

Introduction

The study of situation awareness (SA) has grown over the past three decades to encompass investigations of not only the construct, but also the evaluation of new system designs and training programs in a wide variety of domains. To accommodate these efforts, the need to measure SA has remained fundamental to progress in this area of research. A number of approaches have been proposed and used in research on SA, including process measures, performance measures, and measures that seek to assess a person’s level of knowledge and understanding about the situation via direct questioning of the individual. These techniques are summarized in Table 1, along with some advantages and disadvantages of each. See Endsley (1995b) or Endsley and Jones (2012) for a review. SA measurement approaches will be briefly described, followed by a focus on the direct measurement of SA as an ongoing state of knowledge about relevant information in the environment that is needed to support decision-making in complex and dynamic domains.

Some researchers have examined the processes that people use to develop SA using measures such as eye tracking (de Winter, Eisma, Cabrall, Hancock, & Stanton, 2019; Ikuma, Harvey, Taylor, & Handal, 2014; Smolensky, 1993), communications (Bolstad et al., 2007; Gorman, Cooke, Pederson, Connor, & DeJoode, 2005; Orasanu, 2000; Prince, Salas, & Stout, 1995), verbal protocols (Hall & Phelps, 1983; Rose, Bearman, Naweed, & Dorrian, 2019; Sullivan & Blackman, 1991; Walker, Stanton, & Young, 2008), and physiological measurement (Wilson, 2000). While process measures can provide insights into how people develop SA, they can only be used to indirectly infer the quality and completeness of the resulting SA, as a state of knowledge about the situation, obtained by the
Table 1: Comparison of SA Measurement Approaches

Process Measures
  Metrics: Eye Tracking, Communications, Verbal Protocols, Physiological
  Advantages
  • Objective and continuous
  • Eye tracking provides information on order and duration of attention to visual information
  • Communications and verbal protocols can provide information on processes, strategies, types of assessments made
  Disadvantages
  • Eye tracking does not capture attention to auditory cues or if information is correctly understood or integrated for higher levels of SA
  • Communications provides only partial information on what is attended to and how processed. Some people verbalize more than others.
  • Little research to date to support validity of physiological measures for SA

Performance Measures
  Metrics: Response time, Errors
  Advantages
  • Objective or subjective
  • Can be gathered without operator input
  • Often already collected
  Disadvantages
  • Assumes what behavior will occur given a particular state of SA. System or training changes may affect performance in unexpected ways.
  • SA for normal and emergency events may be different, so inferences constrained by scenarios tested and performance measures collected
  • Confuses SA with performance which can be affected by other factors. Often insufficient sensitivity and diagnosticity.

Direct SA Measures: Subjective
  Metrics: Likert-type scales, SART
  Advantages
  • Easy to collect
  • Can be used across many domains
  • Provides indication of confidence in SA
  Disadvantages
  • People may not be aware of what they do not know; meta-awareness is poor
  • May be overly influenced by self assessments of performance
  • Some scales (e.g., SART) include measures of workload

Direct SA Measures: Objective (SAGAT)
  Advantages
  • Queries people on relevant SA knowledge on perceptions, comprehension, and projection
  • Objectively scored based on simulation data
  • Unbiased sampling across scenario avoids end-of-trial memory dependence
  Disadvantages
  • Requires freezing of simulation scenario
  • Requires people to answer queries based on memory for 2–3 min during freeze
  • Requires development of domain-specific queries

Direct SA Measures: Objective (SPAM)
  Advantages
  • Queries people on relevant SA knowledge of past, present, and future
  • Objectively scored and timed based on simulation data
  • Simulation freeze not required
  Disadvantages
  • Requires dual-tasking to answer queries while performing task, potentially interfering with performance and creating a secondary task workload measure
  • Allows people to “look-up” answers to queries which may not assess SA
  • Requires development of domain-specific queries

Note. SA = situation awareness; SART = Situational Awareness Rating Technique; SAGAT = Situation Awareness Global Assessment Technique; SPAM = Situation Present Assessment Technique.
Evaluation of Objective Measures of SA 3

individuals involved. An individual’s knowledge and capabilities, as well as the system interfaces available, mediate the degree to which the SA processes used are successful in creating accurate SA; so even two people using the same process may not arrive at the same situation understanding. Furthermore, these techniques generally provide only partial insights into what information is attended to and how it is processed to form situation comprehension and projection. Verbal protocols have not been found to be effective for measuring SA (Rose et al., 2019; Walker et al., 2008), and physiological measures have had little research to establish validity for this purpose. Therefore, process measures are inadequate for measuring SA on their own.

Other researchers have focused on trying to infer whether or not people have good SA based on their actions and performance; however, this approach can be severely hampered when people do not act as expected (Farley, Hansman, Amonlirdviman, & Endsley, 2000; Jones & Endsley, 2000b; Pritchett, Hansman, & Johnson, 1995). While collecting data on performance is always desirable, the main limitation of this class of measures is that it only infers SA indirectly, in what can be a circular argument, without providing sufficient detail or diagnosticity on what the operator really thought was going on. In addition, Wickens (2000) pointed out that SA for normal and emergency events may be quite different, thus inferences about SA are highly constrained by the scenarios that are tested. In that many other factors can affect the decisions and actions people make based on their SA (such as tactics, procedures, and execution skills), behavioral and performance measures also offer only limited insights into SA.

Largely due to the limitations of the above approaches, the majority of research employing SA measures has focused on directly determining a person’s SA, as a state of knowledge, through either subjective or objective assessment (Endsley & Garland, 2000). Subjective measures of SA are easy to obtain; however, they suffer from the fact that people may be unaware of what they do not know and may be unduly influenced by observations of performance (Endsley, 1995b; Endsley & Jones, 2012). Research indicates that subjective SA ratings more likely assess an individual’s confidence level (Endsley, Selcon, Hardiman, & Croft, 1998; Hamilton, Mancuso, Mohammed, Tesler, & McNeese, 2017; Sulistyawati, Wickens, & Chui, 2009). Observer ratings of SA overcome some problems with self-assessment, but still suffer from becoming a proxy for subjective performance ratings in that observers have limited information on a person’s mental representation of the situation. See also Endsley (in press).

Objective SA measures assess the operator’s knowledge of the current situation by asking the operator relevant questions about the situation that can be objectively scored as correct or incorrect. Two commonly used metrics in this category are the Situation Awareness Global Assessment Technique (SAGAT) (Endsley, 1988, 1995b) and the Situation Present Assessment Technique (SPAM) (Durso et al., 1998). However, these methods have been the subject of considerable debate, which has created confusion in the field (Chiappe, Rorie, Moran, & Vu, 2012; Durso et al., 1998; Endsley, 1995a; Jones & Endsley, 2004; Salmon et al., 2008; Salmon, Stanton, & Young, 2011; Sarter & Woods, 1991).

The objective of this article is to examine the SAGAT and SPAM methods for objective measurement of SA and to review and compare the success of these metrics. I first provide an overview of these two SA measurement approaches along with a summary of their advantages and disadvantages. I then focus on questions of sensitivity and predictiveness, performing a meta-analysis of the research literature that has been conducted on and with these SA metrics. In addition, a systematic review of the available literature is conducted to address a number of methodological issues that have been raised concerning the construct validity of the two techniques.

SAGAT

SAGAT is one of the earliest and most widely used measures of SA. When using SAGAT, simulations of representative tasks or scenarios are frozen at randomly selected times, and system displays are blanked while people quickly answer questions about their current perceptions of the situation. Queries can be
4 Month XXXX - Human Factors

provided verbally, via pencil and paper, or on a computer or tablet for ease of administration. SAGAT queries correspond to an individual’s SA requirements as determined from the results of an SA requirements analysis, such as a Goal Directed Task Analysis (Endsley, 1993b; Endsley & Jones, 2012), for a given domain and role. People’s perceptions are then compared to the real situation, based on simulation computer databases, to provide an objective measure of SA. Scoring some queries, such as those related to situation comprehension, may be provided by subject matter experts with perfect knowledge of the situation at the time of the freeze. Multiple “snapshots” of a person’s SA are acquired in this way, providing an assessment of the quality of SA provided by a particular system design.

SAGAT provides an objective, unbiased assessment of SA by providing the queries at random times across a scenario, collecting data during both high-workload and low-workload periods. By assessing SA at different points in time, SAGAT avoids the problem of relying on memory of events after the scenario is completed (Nisbett & Wilson, 1977). As a global measurement tool, SAGAT includes queries across a wide range of SA requirements for a given job, including Level 1 (perception of data), Level 2 (comprehension of meaning), and Level 3 (projection of the near future). This includes a consideration of system functioning and status, as well as relevant features of the external environment and team as appropriate. Since SAGAT includes queries across the full spectrum of an individual’s SA requirements, this approach minimizes the possible biasing of attention, as people cannot prepare for the queries in advance (Endsley, 1995b). Examples of queries for air traffic control (ATC) are shown in Table 2.

SAGAT scores are normally expressed as percent correct for each query, based on operationally relevant tolerance bands. Many researchers have varied from this recommended approach, instead combining the scores on all SAGAT queries into a combined overall score, or into three combined scores that represent Level 1, Level 2, and Level 3 SA. Other scoring variants include (a) the SA Control Room Inventory (SACRI), which computes a d′ sensitivity and bias score based on signal detection theory (Hogg, Follesø, Strand-Volden, & Torralba, 1995); (b) the Qualitative Assessment of SA (QUASA), which poses probes in the form of true/false statements and adds an assessment of confidence in each answer (McGuinness, 2004); (c) SALSA, which attempts to weight each query type in computing an overall score (Hauss & Eyferth, 2003); and (d) the SA Verification and Analysis Tool (SAVANT), which provides a partial display to ask questions about missing information, and includes an assessment of time to answer each question along with accuracy (Willems & Heiney, 2001).

SPAM

SPAM and real-time probes also provide queries to assess SA; however, the queries are provided in real time, usually verbally, while the individual is carrying out his or her normal operational tasks. In addition to response accuracy, the time to respond to each SA probe is collected as an index of how readily available the information is. This form of measurement is referred to as a real-time probe (Endsley, 1995b; Jones & Endsley, 2000a). SPAM also provides a “ready” prompt prior to each SA probe that allows operators to delay receiving it until they are ready for the probe. SA probes may be staged as an embedded probe, with a confederate playing the role of the questioner, so that they appear to be natural to the scenario. SPAM provides queries that correspond to past, present, and future aspects of the situation.

Methodological Issues and Concerns

A number of studies have attempted to directly compare SAGAT and SPAM, focusing on their relative abilities to predict performance in the simulation, with differing conclusions (Durso, Bleckley, & Dattel, 2006; Durso et al., 1998; Jones & Endsley, 2004; Loft, Bowden, et al., 2015; Pierce, Strybel, & Vu, 2008; Strybel, Vu, Kraft, & Minakata, 2008). In addition, many researchers have sought to determine how sensitive these techniques are to the independent manipulation provided in the experiment (Alexander & Wickens, 2005; Jones & Endsley, 2004; Silva, Grigoleit, Ann Burress, & Fitzpatrick, 2017; Vidulich, 2000). A goal of the present

Table 2: Sample SAGAT Queries for Air Traffic Control

Level 1 SA
  Mark all aircraft on the attached sector map.
    (Completed map provided for all subsequent questions)
  What is the airspeed of the indicated aircraft?
  What is the heading of the indicated aircraft?
  What is the type of the indicated aircraft?
  Is the indicated aircraft currently level, climbing, or descending?
  Which aircraft are currently experiencing an emergency?
Level 2 SA
  Which aircraft have been issued assignments (clearances) that have not yet been completed?
  Which aircraft are not conforming to their clearance?
  Which aircraft are not in communication with you?
  Which aircraft are currently being affected by weather?
  Which aircraft will violate special airspace separation standards if they stay on their path?
Level 3 SA
  What is the next sector of the indicated aircraft?
  Which pairs of aircraft will lose separation if they stay on their current (assigned) course?
  Which aircraft must be handed off to another sector/facility in the next 2 min?

Note. SAGAT = Situation Awareness Global Assessment Technique; SA = situation awareness.
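
The text states that SAGAT scores are normally expressed as percent correct for each query, based on operationally relevant tolerance bands. A minimal sketch of that scoring step is given below for a few Level 1 queries of the kind listed in Table 2. The function names, the query keys, and the tolerance values (20 knots, 15 degrees) are illustrative assumptions, not values taken from the article; real tolerance bands would come from the SA requirements analysis for the domain.

```python
from collections import defaultdict

# Hypothetical tolerance bands; actual values depend on operational requirements.
TOLERANCE = {"airspeed_kts": 20.0, "heading_deg": 15.0}


def is_correct(query, response, truth):
    """Score one response against the simulation ground truth at the freeze."""
    if query == "heading_deg":
        # Headings wrap around at 360 degrees, so measure the error on the circle.
        error = abs(response - truth) % 360.0
        error = min(error, 360.0 - error)
        return error <= TOLERANCE[query]
    if query in TOLERANCE:
        return abs(response - truth) <= TOLERANCE[query]
    # Categorical queries (e.g., level/climbing/descending) require an exact match.
    return response == truth


def percent_correct_by_query(records):
    """records: iterable of (query, response, truth) pooled over freezes."""
    hits, totals = defaultdict(int), defaultdict(int)
    for query, response, truth in records:
        totals[query] += 1
        hits[query] += int(is_correct(query, response, truth))
    return {q: 100.0 * hits[q] / totals[q] for q in totals}


# Made-up responses for illustration only.
records = [
    ("airspeed_kts", 250, 260),              # within 20 knots: correct
    ("airspeed_kts", 300, 250),              # off by 50 knots: incorrect
    ("heading_deg", 355, 5),                 # 10 degrees apart across north: correct
    ("vertical_state", "climbing", "level"), # mismatch: incorrect
    ("vertical_state", "level", "level"),    # match: correct
]
scores = percent_correct_by_query(records)
```

Keeping one score per query, rather than summing across queries, mirrors the recommended approach described in the text; combining into an overall or per-level score would be a separate aggregation step.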

study is to examine the existing research base to address these questions and to determine how well these two approaches to objective SA measurement fare in terms of sensitivity and predictive ability. Rather than making an assessment based on any one study, a meta-analysis across the available research literature is conducted to provide a clearer picture of the utility and predictive validity of these two metrics.

The research literature is also rife with a number of criticisms of SAGAT and SPAM that need to be addressed. First, a number of researchers have criticized SAGAT, claiming that the freezes to collect data are intrusive and that it relies too much on working memory (Chiappe et al., 2012; Durso et al., 1998; Salmon et al., 2011; Sarter & Woods, 1991). They advocate for the SPAM approach, claiming that it produces a picture of SA that is more natural and “situated” (Chiappe, Strybel, & Vu, 2015).

On the other hand, concerns have also been raised about the intrusiveness of real-time probes and SPAM, in that they require people to multitask to answer questions while performing operational tasks, which could negatively affect primary task performance (Endsley, 1995b; Jones & Endsley, 2000a; Pierce, 2012). Because of this, SPAM could actually be considered a secondary workload task, even though operators are allowed to delay answering until they feel able. The ability to wait to answer probes until the participant is ready also creates a problem, as it systematically biases SPAM results toward lower workload periods. Another concern is that the ability to look at displays while answering questions fails to capture SA as an ongoing understanding of the world, but rather measures people’s ability to look up information (Endsley, 2015a).

The literature base is full of these logical, but differing, viewpoints, and many of the claims have been offered without evidence. Therefore, empirical research addressing these methodological issues will be examined, including the potential intrusiveness of the measures, their reliance on working memory, speed-accuracy trade-offs that may affect outcomes, sampling bias, potential confounds with workload, and query design. The utility of these measures for assessing Team SA is also considered.

Method

A literature search was conducted on Google Scholar, Scopus, and ScienceDirect with the key words SA, probes, SAGAT, SPAM, SACRI, QUASA, SAVANT, and SALSA. Papers that collected experimental data with one of these techniques (or variants of them), with sufficient data reported, were included in this review.

Duplicates of the same study were excluded where identified. A total of 243 papers were tabulated that include data on metrics, experimental environment, domain area, type of participants, number of participants, and experimental results of the study, including statistical analyses.

Given that SACRI, QUASA, SAVANT, and SALSA are all variants of SAGAT, based on the same freeze and query technique, these measures are reported on together. Differences in results associated with the effect of these different analysis approaches are then broken out in the meta-analysis. Similarly, both real-time probes and SPAM are based on the same technique of providing probes to operators during task performance, although SPAM also provides a ready prompt prior to probe administration. These two techniques are also reported on together; however, differences in findings for these two variations are broken out in the meta-analysis.

Sensitivity

To determine the sensitivity of a measure, typically an attempt is made to independently vary the construct of interest and to assess how well the measure detects this change. In that there is no way to independently vary SA directly, the best proxy in the SA research literature is the variation provided by the experimental manipulation in the study (e.g., changes in displays, automation, training condition, or operator experience). The sensitivity meta-analysis, therefore, examined the sensitivity of the measures to the experimental manipulation in each study, that is, the likelihood that the measures would find a difference in SA between conditions, assuming that true differences exist. While it is almost impossible to say that every study manipulation should result in an effect on SA, the hypothesis of each study was that such a difference should theoretically be found. Furthermore, when performance or workload measures in the same study found differences between conditions, this increases the expectation that the SA measures should be sensitive to study manipulations.

Other approaches have also been applied to examining the sensitivity of metrics in meta-analyses, including Cohen’s d (Cohen, 1995), Cohen’s h (Cohen, 1995), and Cramer’s V (Cramer, 1946), which determine comparative effect sizes. Unfortunately, calculating the comparative sensitivity of the SA measures using any of these approaches was not possible, due to the differing types of output measures provided by the SA metrics. Cohen’s d is often used to compare relative differences between the means of two different groups. So Cohen’s d could be used to show the effect size of a measure such as SPAM that produces response time as its outcome measure. However, this metric is inappropriate for measures where the outcome is expressed as a proportion (i.e., percent correct), such as SAGAT. In such a case, Cohen’s h is more appropriate. Furthermore, some of the studies examined provided only chi-square statistics for frequency counts, for which the appropriate metric for an effect size calculation is Cramer’s V. Therefore, it was not possible to rely on the calculation of effect sizes in each study as a common method for comparison across these different types of SA measures, since these analyses rely on different statistics that are not comparable.

Instead, sensitivity of each measure was calculated as the percentage of studies where the metric was employed in which it detected a difference between study conditions (i.e., probability of detection). This approach has been used by Vidulich (2000) and Bushman (1994), and is based on considering the independent variable in the study to have been a means of manipulating SA. The sensitivity of the metric can then be assessed as its ability to detect that change. As Wickens (1998) has argued in favor of considering larger p-values than .05, the analysis also considered studies where the metrics showed a significant difference between conditions at a p-value of up to .10. If the measure was found sensitive (p < .05) it received a score of 1, if it was marginally significant (p < .10) it received a score of 0.5, and if it was not significant (p > .10) it received a score of 0. This provided for an assessment of the proportion of studies using each metric that showed sensitivity to the studies’ manipulation of SA. The relative sensitivity of the SA metrics across different domains, test environments, subject types, and methodological differences was also compared to determine if these factors were relevant to metric sensitivity. In addition, studies that employed both SAGAT and SPAM in the same study were

analyzed separately to determine relative sensitivity between the measures.

Predictiveness

Another issue of concern has been whether the SA measure employed is predictive of performance and if so, whether SAGAT or SPAM is more predictive than the other. Assessing the predictiveness of SA metrics against performance measures is rather complicated. First, it assumes that SA metrics should be predictive of performance. While this is true at a high level, it neglects the fact that both SA and performance are multi-dimensional. For example, Wickens (1995) shows that the SA needed for routine performance may not be the same as that needed for emergency performance. And SA of some information may only be relevant to some performance outcomes. For example, knowing how fast the driver of a car is going may be relevant for getting speeding tickets, but not for lane tracking accuracy. Not all SA is relevant to all performance measures. Furthermore, most studies are limited in the number of performance measures assessed, increasing the likelihood that some SA metrics may not have the relevant performance metrics for comparison. Therefore, this meta-analysis assesses whether any SA measure was predictive of any performance measure in each study.

Following the same approach as the sensitivity meta-analysis, predictiveness was calculated as the proportion of studies using a metric in which the SA measure was predictive of at least one performance measure. If the measure was found to be predictive (p < .05) of performance in the study, it received a score of 1, if it was marginally predictive (p < .10) it received a score of 0.5, and if it was not predictive it was assigned a score of 0.

In addition, the degree of strength of the correlations between SA measures and the performance measures across studies was also calculated, by converting each Pearson’s r to Fisher’s Z. This provided for a direct comparison between the mean correlation strength (performance predictiveness) and a 95% confidence interval across metrics. The relative predictiveness of the SA metrics across different domains, test environments, subject types, and methodological differences were directly compared to see if these factors affected predictiveness. Papers that employed both SAGAT and SPAM in the same study were analyzed separately to determine relative predictiveness of the measures.

Intrusiveness

Studies that directly compared performance and/or workload in trials in which the SA measures were employed to trials that did not include an SA measure were analyzed to determine evidence of the intrusiveness of the techniques. Intrusiveness was determined by whether the SA measure significantly (p < .05) affected a performance metric in the study.

Memory Dependence

Whether or not SAGAT is overly reliant on working memory was analyzed by qualitatively examining the studies that also independently assessed working memory to determine the relationship between these two measures. Differences between expert and novice subjects were considered. Because SPAM allows operators to see all information when answering queries, it was not considered in this analysis.

Speed-Accuracy Trade-offs

As the predominant measure for SPAM is the time to respond to the SA query, this analysis focused on studies that provided evidence of speed-accuracy trade-offs that potentially confound this measure. Because SAGAT does not involve timed performance, but only accuracy, it was not the focus of this analysis.

Sampling Bias

Both SAGAT and SPAM should provide an unbiased estimate of SA by sampling operator knowledge across a wide range of conditions and events during a scenario. Studies were examined to determine whether the provision of a “ready” prompt with the SPAM technique, allowing people to defer answering, created a bias toward lower workload periods.

SA vs. Workload

Studies that examined the relationship between the SA measures and workload were

also assessed to determine whether the measures assess SA as an independent construct, or are confounded by workload.

Query Design

Differences in the content and nature of the SA queries provided in SAGAT and SPAM studies were qualitatively assessed.

Team SA

The SA of teams (as an aggregate or in terms of the shared SA among team members) is often a focus of research interest. Studies that used either SAGAT or SPAM to measure team SA were analyzed separately to determine the degree to which these techniques support SA research at the team level in addition to at the individual level.

Results and Discussion

The presentation of results and discussion will be considered together for each aspect of the meta-analysis: sensitivity of the metrics to the study manipulation, and predictiveness of the metrics for the performance measures provided in the study. This is followed by a discussion of findings with regard to intrusiveness of the techniques, memory dependence, the presence of speed-accuracy trade-offs, the potential for sampling bias, distinguishability from measures of workload, query design, and ability to measure team SA.

Sensitivity

In all, 171 papers were found that used a direct, objective SA measure to evaluate a system design (hardware or displays), automation, operational concept, attention allocation, training manipulation, or to assess differences in participants and the effects of other factors such as workload. Of those, 150 papers involved SAGAT or a variant, and 27 involved SPAM or a real-time probe (with 6 papers including both measures).

SAGAT and variants. Of the 150 papers that included SAGAT or a variant of it, 5 papers were duplicates, 1 paper included 3 studies, and 5 papers included 2 studies, making for 152 unique studies in all. These papers are listed in Appendix A, which is available with the manuscript on the Human Factors website. The proportion of studies in which SAGAT or one of its variants was sensitive to the experimental manipulation in the study is listed in Table 3. The sensitivity score across all 152 studies is 85.5%.

Studies were conducted in a number of domains including aviation, ATC, driving, military, medical (including nursing, surgery, and other specialty areas), process control, robotics and unmanned air vehicle control, artificial tasks created specifically for experiments, and other areas including firefighting (4), train driving (2), maritime (2), teleoperations (2), emergency management, weather forecasting, cyber security, and maintenance. Overall, sensitivity of the measure was not significantly different across domains (χ² = 2.47, p = .96).

The majority of studies were conducted in simulations or microworlds, but there were also studies that employed the viewing of videos and live exercises. There were no significant differences (χ² = 0.38, p = .94) in the sensitivity of the technique between testing environments, nor in whether the test subjects were experienced domain practitioners, novices in the domain, or were completely naïve (either students or from the general population) (χ² = 2.26, p = .69).

Significant differences in sensitivity were found based on how the measure was administered. Studies that only measured SA at the end of trials (73% sensitivity) experienced lower sensitivity than those using the freeze technique prescribed by SAGAT (89% sensitivity; z = 2.02, p = .02). Studies that collected SA only at the end of trials generally suffered from a low sample size as well as only capturing SA at the end of the trial, which may not be indicative of SA throughout.

A second source of variance occurred due to the method of scoring used. The SAGAT methodology recommends scoring each query separately because many design interventions can effect changes in just one or two elements of SA, often in different directions, and if the queries are combined these differences can cancel each other out (Endsley, 2000). This is consistent with Landry and Yoo (2012) who recommend scoring by query rather than a combined score to avoid the problem of assumed equal likelihood

Table 3: Studies Examining the Sensitivity of SAGAT (and Variants)

Category Number Sensitivity (%)

Domain
 Aviation 17 74
 ATC 17 94
 Driving 32 98
  Health care 19 66
  Power/process control 15 87
 Military 20 90
 Robotics 7 93
  Artificial tasks 6 100
 Other 19 74
Test environment
 Simulation 104 84
 Microworld 26 88
 Video 10 100
  Live exercise 11 77
 Experimental 1 100
Participants
 Experienced 73 84
 Novice 17 59
 General 15 100
 Students 29 88
 Combination 18 100
Method* (p = .02)
 SAGAT 119 89
  End of trial 20 73
 SACRI 4 75
 QUASA 6 67
 SALSA 2 100
 SAVANT 1 50
Analysis* (p = .04)
  By query 43 91
  By SA level 42 92
  Overall score 52 81

Note. SAGAT = Situation Awareness Global Assessment Technique; ATC = air traffic control; SACRI = SA Control
Room Inventory; QUASA = Qualitative Assessment of SA; SALSA = Situation awareness of en-route air traffic
controllers in the context of automation; SAVANT = SA Verification and Analysis Tool; SA = situation awareness.
*p < .05.
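
The rationale behind the Analysis rows of Table 3 (scoring by query versus a single combined score) can be illustrated with a small sketch. The query names and accuracy values below are invented purely for illustration and are not drawn from any study in this review; the sketch only shows how effects in opposite directions on individual queries can cancel out in a combined score.

```python
# Hypothetical per-query accuracy (proportion correct) under two display designs.
# All names and numbers are invented for illustration; they are not study data.
baseline = {"altitude": 0.80, "heading": 0.60, "conflict": 0.70}
new_display = {"altitude": 0.60, "heading": 0.80, "conflict": 0.70}


def combined_score(scores):
    """Average accuracy across all queries (the single combined score)."""
    return sum(scores.values()) / len(scores)


def per_query_differences(before, after):
    """Change in accuracy for each query, rounded to two decimals."""
    return {query: round(after[query] - before[query], 2) for query in before}


# The combined score hides the change: the two opposite effects cancel.
combined_diff = round(combined_score(new_display) - combined_score(baseline), 2)

# Scoring by query keeps both effects visible.
query_diffs = per_query_differences(baseline, new_display)

print("Combined difference:", combined_diff)
print("Per-query differences:", query_diffs)
```

In this toy case the combined score is identical for the two designs, while the per-query view shows one query improving and another degrading.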

of each element. Analysis of SAGAT queries has also shown a lack of internal consistency (Endsley, 1990b, 2000), arguing against combining across queries. Nonetheless, many researchers analyzed SAGAT data as a single score combined across all queries, or as a separate score for each level of SA. It was found that studies using a single combined SA score (81% sensitivity) were less likely to be sensitive than those that were analyzed by query (91% sensitivity) or by SA level (92% sensitivity; z = 1.77, p = .04).

When just the 68 studies that used SAGAT as designed (with scenario freezes and analysis by level or by query) are considered, overall sensitivity rises to 90%. An additional problem was found in that some studies collected only 1 or 2 SAGAT freezes across conditions, resulting in very small sample sizes, well below the
60 samples per query per condition recommended by Endsley (2000). When the four studies with very low sample sizes are removed, overall sensitivity of SAGAT rises to 94%.

It should be noted that the two studies using the SALSA scoring method performed well (100%), three out of four SACRI studies were sensitive (75%), and four of the six studies using the QUASA method were sensitive (67%). However, the numbers of studies using these techniques are so few that it is difficult to conclude anything about them.

SPAM and real-time probes. Of the 27 papers that included SPAM or real-time probes to measure SA, 2 were duplicates and 1 paper involved 2 studies, making for a total of 26 studies. These papers are listed in Appendix B, which is available with the manuscript on the Human Factors website. The proportion of studies in which SPAM or a real-time probe was sensitive to the experimental manipulation in the study is listed in Table 4. The mean sensitivity for these studies overall is 69.2%.

The majority of the studies were conducted in aviation, ATC, driving, military, process control, and submarine management. There were not sufficient numbers of studies to compare sensitivity across these domains. The majority of the studies were conducted in simulations, although a few were also conducted in microworlds, watching videos, or in live exercises. Participants were primarily either experienced operators or students. Neither testing environment nor participant type significantly affected the sensitivity of the metric (p > .1). SPAM probes (sensitivity = 64%) were actually slightly less sensitive than real-time probes (sensitivity = 73%), although this difference was not significant (z = 0.53; p = .30).

Limitations. The sensitivity meta-analyses were based on whether each measure was sensitive in a wide variety of studies, each with different numbers of participants. The use of p-values for conducting hypothesis testing has been criticized on the basis that it does not necessarily indicate the magnitude of effects (i.e., it is subject to the effects of sample size) (Kline, 2004).

However, a p-value threshold for hypothesis testing has been the standard in the field and is generally reported across the literature spanning some 30 years. To examine the possibility that variations in sample size across studies unduly affected the sensitivity analyses, an analysis of variance (ANOVA) was conducted that found no significant effect of number of subjects on the sensitivity score of the studies in this meta-analysis, F(2, 175) = 0.05, p = .95.

Several other caveats also need to be noted with regard to this meta-analysis of sensitivity. First, it is certainly possible that a metric could find a difference between study conditions that does not exist (i.e., false positives). It is also possible that the study itself was poorly constructed with insufficient power, leading to a lack of sensitivity due to study design rather than the metric itself. While there was no obvious data to support that either situation was more or less likely to exist with regard to the studies that used either SAGAT or SPAM, a further assessment was made of the studies that employed both measures within the same study as a means of more carefully assessing their comparative sensitivity to SA differences.

Comparison of technique sensitivity. Figure 1 shows the proportion of studies in which each metric was sensitive to the study manipulation. When conducted with freezes according to the prescribed methodology and scored by query, SAGAT is significantly more sensitive than both SPAM (z = 14.88, p < .001) and real-time probes (z = 3.05, p = .001).

Six studies included a direct comparison of the sensitivity of SAGAT and either SPAM or real-time queries in the same study. Two studies found SAGAT more sensitive than real-time probes (Endsley, Sollenberger, & Stein, 2000; Jones & Endsley, 2004), three studies found SPAM more sensitive than SAGAT (Alexander & Wickens, 2005; Cummings & Guerlain, 2007; Silva et al., 2017), and one study found mixed results with SAGAT and real-time probes equally sensitive to the study manipulations (Burns et al., 2008).

To further examine these findings, Jones and Endsley (2004) found that SAGAT was more sensitive than real-time probes because the simulation freeze allows more queries to be collected, yielding a more reliable measure of SA (which is important when periodic sampling is

Table 4: Studies Examining the Sensitivity of SPAM and Real-Time Probes

Category Number Sensitivity (%)

Domain
 Aviation 5 100
 ATC 3 33
 Driving 6 83
  Submarine management 3 33
  Power/process control 3 67
 Military 4 50
 Other 2 100
Test environment
 Simulation 17 65
 Microworld 3 67
 Video 4 100
  Live exercise 2 50
Participants
 Experienced 16 75
 Students 10 60
Method
 SPAM 11 64
  Real-time probes 15 73

Note. SPAM = Situation Present Assessment Technique; ATC = air traffic control.
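
As a rough check on the SPAM versus real-time probe comparison in Table 4, the following sketch applies a pooled two-proportion z-test. It assumes counts of 7 of 11 sensitive SPAM studies and 11 of 15 sensitive real-time probe studies, inferred here from the reported percentages, and a one-tailed test. The exact procedure is not stated in the text, so this is an illustration under those assumptions rather than a reproduction of the author's analysis; the function name is hypothetical.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, one-tailed p)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Upper-tail probability of the standard normal distribution at |z|.
    p_one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p_one_tailed


# Assumed counts (inferred from Table 4 percentages, not reported directly):
# real-time probes: 11 of 15 sensitive (73%); SPAM: 7 of 11 sensitive (64%).
z, p = two_proportion_z(11, 15, 7, 11)
print(f"z = {z:.2f}, one-tailed p = {p:.2f}")
```

Under these assumptions the sketch gives z of about 0.53 and a one-tailed p of about .30.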

involved). As SPAM and real-time probes can only be asked one at a time and since excessive interruptions are not possible, they are at a disadvantage. Of the three studies where SPAM was more sensitive than SAGAT, one measured SAGAT only at the end of two trials, which created insufficient data (Cummings & Guerlain, 2007). In Silva et al. (2017), while SAGAT was not sensitive to operator experience level, neither were the performance measures captured. Experienced operators had better accuracy on SPAM, however.

Predictiveness

A total of 53 comparisons were found across 46 papers that examined the predictiveness of an objective measure of SA, as shown in Appendix C, which is available with the manuscript on the Human Factors website. Overall, 89.6% of the analyses found SA measures predictive of the performance or decision-making measures collected in the study.

Table 5 shows that 35 analyses involved SAGAT (predictiveness = 89%), 4 examined end-of-trial SA assessment (predictiveness = 100%), 1 involved SALSA (predictiveness = 50%), 1 involved a real-time probe (predictiveness = 0%), and 12 involved SPAM (predictiveness = 100%). The predictiveness of SAGAT/SALSA/end-of-trial assessments compared to SPAM/real-time probes was not significantly different (z = .37, p = .35).

A total of 43 studies provided correlation data between the SA measure and performance measures. In comparing the mean Pearson’s r calculated for studies using each method, the end-of-trial measures were most highly predictive of performance (r = .533), followed by SAGAT (r = .459), and then SPAM (r = .411). There was partial overlap in the 95% confidence intervals that were calculated from the converted Z-scores for SAGAT and SPAM, however.

Seven papers examined the relative predictiveness of these measurement techniques within the same study. Two studies found SAGAT to be more predictive than real-time probes or SPAM (Jones & Endsley, 2004; Loft, Bowden, et al., 2015). Two studies found SPAM to be more predictive than SAGAT (Durso et al., 2006; Kraemer & Süß, 2015). The Durso et al. (2006) study, however, artificially restricted SAGAT to only

Figure 1. Sensitivity of SA Metrics: Probability of detection of SA difference due to study manipulation.
ᵃSAGAT with prescribed method and sample size.

one query per freeze, significantly reducing its sample size unnecessarily. In the Loft, Bowden, et al. (2015) study, which allowed more queries during freezes, this handicap disappeared, resulting in SAGAT accounting for twice the variance in performance compared to SPAM.

Three studies found mixed results. Durso et al. (1998) found that SPAM was more sensitive to one performance measure and SAGAT was more sensitive to another. Pierce, Strybel, and Vu (2008) similarly found SAGAT and SPAM to each be more sensitive to different outcome measures. Strybel et al. (2008) showed that SPAM latency was a slightly better predictor of pilot airspeed variability than SAGAT (r² = .08 vs. r² = .07), although they used a combined score for SAGAT, which is generally less sensitive. Based on these findings, it would appear that both SPAM and SAGAT are predictive of performance.

In one counter-intuitive finding, Loft et al. (2018) found that while SAGAT was predictive of performance in a submarine management study, this was only at the between-subjects level, but not at the within-subjects level. This would indicate that the differences in SA between subjects correlated with performance, but not changes in an individual’s SA within a trial, which is hard to explain on any theoretical level. SAGAT scores have been shown to be highly stable across individuals (Endsley & Bolstad, 1994), so the finding of between-subjects predictiveness is not unexpected.

In examining the lack of a within-subjects effect, the most likely explanation is that in this study the queries were all referenced to ship contact numbers, which creates a significant memory burden. Previous research has shown that air traffic controllers do not remember aircraft call signs, for example, but rather rely on spatial memory of where air traffic is located (Endsley & Rodgers, 1998). By pinning each query to the ship contact number in the Loft et al. (2018) study, rather than a spatial map as is typical with SAGAT administration, it is possible that the researchers created an artificial memory requirement that interfered with their results. This matter will require further research. The authors also indicated that it is possible that different performance windows need to be considered to link SA to performance in their domain.

Intrusiveness

A common concern about both measures is whether they are intrusive on operator performance, SAGAT by freezing the scenario, and

Table 5: Predictiveness of SA Metrics

Method            Number  Predictiveness (%)  Mean Pearson’s r  95% Confidence Interval

End of trial      4       100                 .533              [.522, .545]
SAGAT             35      89                  .459              [.432, .487]
SPAM              12      100                 .411              [.368, .454]
Real-time probes  1       0                   —                 —
SALSA             1       50                  .184              —

Note. SA = situation awareness; SAGAT = Situation Awareness Global Assessment Technique; SPAM = Situation
Present Assessment Technique; SALSA = Situation awareness of en-route air traffic controllers in the context of
automation.
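
One common way to obtain a mean correlation and a confidence interval from converted Z-scores is Fisher's r-to-z transformation. The sketch below is a minimal, unweighted version of that idea using hypothetical correlations; the function name and input values are assumptions for illustration, and the text does not say whether studies were weighted by sample size, so this should not be read as the procedure behind Table 5.

```python
import math
import statistics


def mean_r_with_ci(r_values, z_crit=1.96):
    """Average correlations via Fisher's z and return (mean r, (low, high)).

    Unweighted sketch: each study counts equally, and the standard error
    is taken from the spread of the transformed values across studies.
    """
    z_values = [math.atanh(r) for r in r_values]  # Fisher r-to-z
    mean_z = statistics.mean(z_values)
    standard_error = statistics.stdev(z_values) / math.sqrt(len(z_values))
    low = math.tanh(mean_z - z_crit * standard_error)  # back-transform to r
    high = math.tanh(mean_z + z_crit * standard_error)
    return math.tanh(mean_z), (low, high)


# Hypothetical study correlations, invented for illustration only.
hypothetical_rs = [0.30, 0.45, 0.60]
mean_r, (ci_low, ci_high) = mean_r_with_ci(hypothetical_rs)
print(f"mean r = {mean_r:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Averaging in the transformed scale and converting back keeps the interval within the valid range for a correlation, which is the usual reason for working with converted Z-scores rather than raw r values.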

SPAM by dual tasking the operator while conducting operations. A total of 18 studies were found that examined the issue of intrusiveness of one or both of the techniques, shown in Table 6.

Eleven of the papers examined the effect of SAGAT on operator performance and/or workload. No negative effect of SAGAT administration on operator performance was found in any of these studies. Endsley (1995b) showed that stopping the simulator to collect SAGAT data for 30, 60, or 120 s and either 1, 2, or 3 times in a 20-min scenario had no effect on participant performance as compared to scenarios in which there were no SAGAT freezes. In a second study, I examined whether just the possibility that a SAGAT freeze would be administered created a change in performance (Endsley, 2000). In that study, I told participants there would definitely not be a freeze in one-third of the trials and that there might be a freeze in the other two-thirds (of which half actually experienced a SAGAT freeze). Again, there was no effect on performance from either the SAGAT freeze or the threat of a SAGAT freeze, showing no “pop quiz” effect.

Since those studies, an additional nine studies have compared trials in which SAGAT freezes were administered to trials without SAGAT freezes and found no negative effect on performance indicating intrusiveness. In one study, a small increase in workload (2.4%) was reported in the SAGAT trials (Loft, Bowden, et al., 2015), but the authors reported that this may have been due to the fact that twice as many queries were provided in the SAGAT trials than in SPAM trials. In another study (Kraemer & Süß, 2015), an improvement in performance was shown in the trial that included SAGAT; however, this effect was confounded with trial order and likely showed a learning effect for the very inexperienced participants in that study.

In contrast, 6 of 15 studies (40%) that used SPAM or real-time probes reported a negative effect on operator performance or workload. Pierce (2012) found a negative effect of SPAM on both operator performance and workload. Loft et al. (2016) also found that experts in their study (submarine operators) took almost 20 s to accept SPAM queries, presumably due to workload issues, calling into question its validity. Reportedly, subjects delayed answering the probes when SA was lower and uncertainty higher. Similarly, Shelton, Kinston, Molyneux, and Ambrose (2013) reported that their participants (physicians) found that the verbal probes were intrusive and interfered with team dialog. Some queries were not answered at all, presumably due to workload issues. Four studies reported a negative effect of SPAM on concurrent performance, as compared to trials in which SPAM was not administered (Pierce, 2012; Pierce, Strybel, & Vu, 2008; Pierce, Vu, et al., 2008; Strybel et al., 2008).

Based on these studies, SPAM appears to create significant interference with performance and to increase operator workload. Keeler et al. (2015) implemented SPAM with computerized queries rather than verbal queries, similarly to Bacon and Strybel (2013) and Silva et al. (2013). None of these 3 studies found SPAM to be intrusive. However, their instructions were to only answer the SPAM queries if the operator felt
Table 6: Intrusiveness of SA Metrics

Study | Domain | Environment | Subjects | SAGAT | SPAM | Notes

Durso et al. (1998) | ATC | Microworld | Students | No effect on performance | No effect on performance | Only 1 SAGAT question/stop
Durso, Bleckley, and Dattel (2006) | ATC | Microworld | Students | No effect on performance or workload | No effect on performance or workload |
Bacon and Strybel (2013) | ATC | Simulation | Novice | — | No effect on performance or workload | Trend toward faster RT on conflicts of probed aircraft
Endsley (1995b) | Aviation | Simulation | Experienced | No effect on performance | — | 0, 1, 2, or 3 stops of 30, 60, or 120 s
Endsley (2000) | Aviation | Simulation | Experienced | No effect on performance | — | Performance not affected by SAGAT stop or by possibility of SAGAT stop
Hogg, Folles, Strand-Volden, and Torralba (1995) | Nuclear Power | Simulation | Experienced | No effect on performance | — | (SACRI) Subjective performance
Keeler et al. (2015) | ATC | Simulation | Novice | — | No effect on performance | Computer version of SPAM
Kraemer and Süß (2015) | ATC | Simulation | Students | Positive effect on performance | No effect on performance | Confounded with order effect (i.e., improvement due to learning)
Loft, Bowden, et al. (2015) | Submarine | Microworld | Students | No effect on performance, small effect on workload | No effect on performance or workload | More queries with SAGAT (2% increase in workload)
Loft et al. (2016) | Submarine | Microworld | Experienced and students | — | Negative effect on performance | Experts took almost 20 s to accept SPAM queries; longer when SA lower and uncertainty higher
C. A. Morgan, Chiappe, Kraut, Strybel, and Vu (2012) | ATC | Microworld | Novices | No effect on workload | No effect on workload |
Pierce (2012) | ATC | Microworld | Students | — | Negative effect on performance & workload |
Pierce, Vu, Nguyen, and Strybel (2008) | ATC | Microworld | Students | — | Negative effect on performance |
Pierce, Strybel, and Vu (2008) | ATC | Microworld | Students | No effect on performance | Negative effect on performance | SAGAT end of trial
Shelton, Kinston, Molyneux, and Ambrose (2013) | Health Care | Simulation | Novice | — | Negative effect on discussion | Verbal queries viewed as intrusive, confused with dialog
Silva et al. (2013) | ATC | Simulation | Novice | — | No effect on performance | Computer version of SPAM
Snow and Reising (1999) | Aviation | Simulation | Experienced | No effect on performance | No effect on performance |
Strybel, Vu, Kraft, and Minakata (2008) | Aviation | Simulation | Experienced | No effect on performance | Negative effect on performance | Performance worse on SPAM trials compared to SAGAT trials

Note. SAGAT = Situation Awareness Global Assessment Technique; SPAM = Situation Present Assessment Technique; ATC = air traffic control; RT = response time; SACRI = SA
Control Room Inventory; SA = situation awareness.

able. This results in a skewing of SA probes into low-workload periods, as will be discussed in a later section.

These results indicate that concerns over the potential intrusiveness of SAGAT are unwarranted, with no negative effects on performance in any of the studies that examined it. It should be noted that the reason SAGAT is not intrusive on performance is that during the freeze period operators are actively working with their situation knowledge to answer the queries (Endsley, 1995b). This keeps the situational information active in memory.

This is a very different situation than interruptions in which a person conducts unrelated tasks or answers unrelated questions, which has been shown to be detrimental to both SA and performance (Gartenberg, Breslow, McCurry, & Trafton, 2014; Loft, Sadler, et al., 2015; McGowan & Banbury, 2004; Ratwani & Trafton, 2008). Negative effects can also be incurred with other types of interference that are not consistent with the SAGAT methodology. For example, Tremblay, Vachon, Lafond, and Kramer (2012) required people to recover SA of changed situations by failing to pause the simulation while displays were blanked, negatively affecting task performance. McGowan and Banbury (2004) provided 10-s interruptions in 20- to 25-s-long video clips that were passively viewed, showing both that irrelevant questions negatively affected performance and that relevant questions improved performance (even when the trial was not interrupted). This very short, highly artificial task is not representative of the normal types of domain-relevant and engaging simulations that SA data are collected in. It likely became a task of memorization for the very narrow set of items that were questioned during the study. In contrast, SAGAT is generally only collected two to three times, no closer than 3 min apart in a 20-min interactive scenario where participants are engaged in normal task performance, and provides a wide range of queries that cannot be effectively prepared for in advance.

Memory Dependence

Another concern about SAGAT has been whether it is overly reliant on working memory. This discussion rests partially on whether people in complex domains need to gather and integrate information in memory to make relevant, timely decisions, or whether one believes that simply looking up information as needed is sufficient for expert performance. In time-critical domains like driving, medicine, aviation, and ATC, it is clear that the former is the case.

Endsley (2015b) provides a detailed discussion on the role of memory in SA, showing that inexperienced operators’ SA is often constrained by working memory limitations, but that experts have access to long-term memory stores that largely circumvent these limits. The question for SA measurement is whether operators can report SA data during freezes, or whether their knowledge will rapidly decay over the period of the freeze. Working memory generally lasts about 20 to 30 s unless it is kept activated (Baddeley, 1986).

To investigate this question, Endsley (1990a) provided pilots with SAGAT queries in random order during freezes. This study showed that pilots were equally accurate in answering SAGAT queries across a 6-min period after a freeze, indicating no memory decay. Endsley (1990a, 2000) concluded that this result supports a model of cognition in which working memory is an activated subset of long-term memory (Cowan, 1988). This model is consistent with many others, including Durso and Gronlund (1999), Adams, Tenney, and Pew (1995), and Sarter and Woods (1991).

A number of studies were found in this review that refute the charge that SAGAT is overly dependent on working memory, shown in Appendix D, which is available with the manuscript on the Human Factors website. Endsley and Bolstad (1994) showed that primary working memory ability did not predict SAGAT scores in experienced pilots. Sulistyawati, Wickens, and Chui (2011) also found that experienced pilots answered SAGAT questions equally well, irrespective of their scores on memory span tests. Jipp and Ackerman (2016), similarly, found no relationship between primary working memory scores and SAGAT in an ATC task with student participants.

Gonzalez and Wimisberg (2007) showed that Level 1 SA, as measured by SAGAT, improved

over time with experience on a task, and its relationship to a visual working-memory measure decreased accordingly. Gutzwiller and Clegg (2012) found no relationship between working memory and Level 1 SA in their study, based on an end-of-trial SAGAT measure; however, it was correlated with Level 3 SA, which would be required for complex reasoning and projection per the Endsley (1995c) model of SA.

Two studies show somewhat different conclusions. Durso et al. (2006) reported that SAGAT scores correlated with measures of complex working memory (Operation Span and Reading Span) as well as fluid intelligence, but not measures of primary working memory or visual memory. It should be noted that they used inexperienced subjects from the general population performing in an ATC simulation. They also provided questions about the situation that were not necessarily reflective of the SA needed to perform the task, such as “at what level will the next airplane to have a heading change be?” This type of question requires extensive manipulation of information in working memory to answer, as compared to typical SAGAT queries that ask for information that should already be known to perform the task (e.g., what is the next sector of the indicated aircraft?).

Cak, Say, and Misirlisoy (2019) found that both SAGAT and SPAM were correlated with an Operation Span complex working memory test in a study with experienced pilots. (In addition, both measures were correlated with the number of hours the pilots had spent in the simulator and SPAM was also correlated with a measure of divided attention.) The fact that SPAM as well as SAGAT correlated with this complex working memory measure is puzzling in that all information is visually present when answering SPAM probes (although since SPAM includes queries that focus on the past, it introduces its own memory component).

Cak, Say, and Misirlisoy explained that their finding runs counter to previous studies due to the different measure of working memory employed in their study, the Operation Span test. This test requires people to actively maintain information in memory while performing secondary processing tasks and has been shown to be significantly correlated with the ability to control attention in numerous studies (Kane, Bleckley, Conway, & Engle, 2001; Poole & Kane, 2009; Unsworth & Engle, 2005; Unsworth & Spillers, 2010).

Primary memory (the ability to hold information in memory), attentional control, and the ability to recall information from secondary memory have been shown to be separate aspects of cognition that all contribute to Operation Span, a measure of complex working memory (Shipstead, Lindsey, Marshall, & Engle, 2014). As concluded by Shipstead et al. (2014),

    these mechanisms are not similarly represented by all working memory tasks. Running memory span performance reflects primary memory more strongly than either complex span or visual arrays tasks. The performance of these later tasks is more closely associated with a persons’ attentional control and retrieval abilities. (p. 138)

Therefore, it appears that both Durso et al. (2006) and Cak et al. (2019) employed a complex span working memory measure that is more closely representative of factors such as attentional control and secondary memory aspects, whereas the other studies that examined the role of working memory on SA have operationalized it via tests which more closely reflect primary memory.

From the standpoint of SA, therefore, it is not too surprising that both SAGAT and SPAM performance correlated with the Operation Span test, reflecting the importance of attentional control on SA. What is surprising is that Cak et al. (2019) only found SPAM to be correlated with a measure of attention sharing ability, whereas Endsley and Bolstad (1994) found attention sharing to be highly predictive of SAGAT in pilots. From the standpoint of the present question, it does not appear that SAGAT is overly reliant on the need for people to hold information in primary (working) memory with experienced participants; however, both SAGAT and SPAM do have correlations with attentional measures, reflecting the importance of attention to SA.

Therefore, with experienced participants, collection of SAGAT data is not working-memory constrained when the information is collected

immediately following a freeze, for at least 2 to 3 min afterwards, as they have access to relevant information in longer-term memory representations. Novices and inexperienced participants are likely more reliant on working memory for processing information and typically have lower SAGAT scores than experienced participants (Hogan, Pace, Hapgood, & Boone, 2006; Jackson, Chapman, & Crundall, 2009; Kass, Cole, & Stanny, 2007; Randel, Pugh, & Reed, 1996; Soliman & Mathna, 2009; Strater, Endsley, Pleban, & Matthews, 2001). Even so, it is interesting to find that there were not significant differences in the sensitivity of SAGAT based on the experience level of the participants in the 152 studies reviewed here (p > .05). SAGAT was sensitive with both experienced and inexperienced participants. Overall, these findings are consistent with recent work showing that with increasing levels of domain expertise people rely more on running memory, actively updating and processing new information as the task proceeds, as opposed to passive working memory that simply tries to memorize information without further processing (Anderson-Montoya, Scerbo, Ramirez, & Hubbard, 2017).

It should also be noted that a few studies using SAGAT applied it to very short, artificial tasks that may not be suitable. Gugerty (1998), for example, provided very short scenarios (18–35 s) and then assessed drivers’ ability to recall the location of other vehicles. deWinter et al. (2018) provided participants with a set of six gages to monitor with instructions to report when a gage exceeded a limit. Even though SAGAT was sensitive in these studies, they are not realistic examples of real-world tasks. Such artificial tasks likely leave participants with little to do but attempt to memorize information.

In contrast, SAGAT was designed for realistic, ecologically valid simulations, and it is recommended that no SAGAT freezes occur for at least the first 3 min (and with at least 3 min in between freezes) in order that participants become engaged in performing their tasks (Endsley, 1995b, 2000). Participants are instructed to perform their tasks as usual and to simply answer the SAGAT questions to the best of their ability. Because SAGAT queries are representative across a wide sample of normal SA requirements, this precludes trying to memorize answers. In a meta-analysis of 65 studies, Vidulich (2000) found that probe measures are most sensitive when a wide range of queries are provided, rather than just a few. This is likely because it precludes trying to memorize answers in advance of a freeze. The unpredictability of when the freezes will occur also avoids memorization and effects on performance (Endsley, 1995b, 2000).

Speed-Accuracy Tradeoffs

A number of studies found that answers to SPAM probes were often more accurate than similarly worded SAGAT queries (Durso et al., 2006; Pierce, Strybel, & Vu, 2008; Strybel et al., 2008), which is not surprising considering that the correct answer is clearly visible when responding to a SPAM probe.

While SPAM reaction time theoretically measures how cognitively available information is, a number of studies reviewed show that it may instead be measuring look-up time. SPAM and SAGAT differ in two main respects: SAGAT pauses the simulation and hides displays during query administration, while SPAM leaves them visible and does not pause the simulation. C. A. Morgan, Chiappe, Kraut, Strybel, and Vu (2012) performed a study comparing the same questions provided when the display was visible versus not visible and when the simulation was paused or not paused. They found that SA queries were more accurate when the displays remained visible, as would be expected, but that response time was almost 2 s slower, revealing a significant speed-accuracy tradeoff. Several other studies have also found speed-accuracy tradeoffs for SPAM and real-time probe measures (Alexander & Wickens, 2004, 2005; Jones & Endsley, 2004; Taber, McCabe, Klein, & Pelot, 2013). In general, it appears that people will look to check before answering queries if the information is available, showing that SPAM more accurately assesses the time for people to look up information on a display, regardless of whether they know the answer.

Sampling Bias

An additional concern is whether the methods contain any sampling bias. Because SAGAT recommends conducting the freezes at random
times, the likelihood of sampling bias (e.g., by collecting during only high- or low-workload periods, or just during interesting events) is minimized. While Lau, Jamieson, and Skraaning (2016) advocate for collecting SA at only important times as indicated by a subject matter expert, this can cue subjects and provide artificial information affecting their performance (Endsley, 1995b, 2000).

SPAM provides probes at random intervals as well; however, participants are able to delay answering, or may even neglect to answer. Durso et al. (1998) noted that participants could take as long as 10 s to answer the call line to get a SPAM probe in their study, and as long as 4 s to answer a query. Loft et al. (2016) found their participants took as long as 20 s to accept queries, taking longer when SA was low or uncertainty was high. Trapsilawati, Wickens, Qu, and Chen (2016) reported mean latencies in accepting a SPAM probe of between 9 and 23 s. Strybel, Vu, Battiste, and Johnson (2013) reported that 5% of probes went unanswered, and Cunningham et al. (2015) found that pilots did not respond to the ready prompt 16% of the time in their study due to workload. Further, two additional studies showed that SPAM latency measures are correlated with NASA TLX scores, indicating that they are associated with workload (Strybel et al., 2010; Strybel et al., 2008).

These results indicate that participants may delay attending to SPAM probes until lower workload periods. While the time to accept a probe is a secondary workload indicator, this ability to delay answering probes, or to ignore them completely, creates a bias in the measure in favor of lower workload periods. While Bacon and Strybel (2013), Silva et al. (2013), and Keeler et al. (2015) reported that instructing participants to only answer SPAM queries if they felt able to and providing computerized SPAM queries rather than verbal queries could reduce SPAM intrusiveness, this method also systematically skews SA probes into low-workload periods.

SA vs. Workload

Another theoretical concern is the extent to which SPAM is measuring SA or workload. While high workload can lead to low SA, it has also been shown that SA and workload can be independent across much of the demand scale, with low SA possible in low-workload periods (vigilance) and both high and low SA possible across moderate workload levels (Endsley, 1993a; Vidulich & Tsang, 2015). It is therefore important that SA measures be clearly distinguishable from workload measures.

The fact that operators must answer real-time probes while conducting operational tasks creates a situation in which the probe serves as a secondary workload measure. Jones and Endsley (2004) found a weak correlation between real-time probes and a concurrent measure of workload. This issue is somewhat reduced with the introduction of a ready prompt with the SPAM technique, which allows participants to answer the probes while they have more spare capacity, but it may not be completely eliminated.

Pierce, Strybel, and Vu (2008), for example, found that SPAM latency measures were not predictive when participants were under high workload because they neglected the SPAM probes while meeting task demands. Pierce, Strybel, and Vu (2008) also found correlations between workload subscale ratings on trials that used SPAM and those that involved other types of secondary tasks. Given that Pierce (2012) found a negative effect of SPAM on workload, and that 40% of the studies reported on here that used SPAM found negative effects on performance, it seems likely that SPAM is providing secondary task loading. Coupled with the finding that people frequently look up the answer to SPAM probes, which is a secondary task, there is considerable evidence that SPAM likely measures workload to some degree, rather than SA.

Query Design

Whichever measure is used, the actual queries provided are critical to obtaining valid research results. Variability between the studies reviewed here may be due to (a) whether they employed a systematic analysis of SA requirements for the domain under study, (b) construction of clear queries that can be consistently scored, (c) differences between SAGAT and SPAM in the types of information solicited, and (d) methods for soliciting participant responses.
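As a minimal sketch of point (b), the Python fragment below shows one way a query could be scored objectively against a key derived from simulation data. The altitude-trend query, the 100 ft tolerance, the function names, and the sample responses are all illustrative assumptions; none of them are taken from SAGAT itself or from any of the reviewed studies.

```python
from dataclasses import dataclass

# Hypothetical tolerance: altitude changes smaller than this (in feet)
# are treated as "level" when building the scoring key.
LEVEL_TOLERANCE_FT = 100.0


def altitude_trend(altitude_before: float, altitude_at_freeze: float) -> str:
    """Derive the scoring key for a trend query from logged simulation data."""
    change = altitude_at_freeze - altitude_before
    if change > LEVEL_TOLERANCE_FT:
        return "increasing"
    if change < -LEVEL_TOLERANCE_FT:
        return "decreasing"
    return "level"


@dataclass
class QueryResponse:
    participant: str
    answer: str  # one of "increasing", "decreasing", "level"


def score_query(responses, key):
    """Score each response as correct (1) or incorrect (0) against the key."""
    return {r.participant: int(r.answer == key) for r in responses}


def percent_correct(scores):
    """Percentage of correct responses for one query at one freeze."""
    return 100.0 * sum(scores.values()) / len(scores)


# Illustrative data only (not taken from any study).
key = altitude_trend(altitude_before=24000.0, altitude_at_freeze=23000.0)
responses = [
    QueryResponse("P1", "decreasing"),
    QueryResponse("P2", "level"),
    QueryResponse("P3", "decreasing"),
    QueryResponse("P4", "increasing"),
]
scores = score_query(responses, key)
```

Because the key is computed from logged data and the response options are fixed, two scorers applying this rule would reach the same result, which is the property the text asks of a well-constructed query.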

Reflection of SA requirements. One observed issue with SPAM, and with some researchers’ use of SAGAT, has been the development of queries that may in fact have little to do with the operator’s SA requirements. For example, Durso et al. (1998) posed “Which has the lower altitude, TWA799 or AAL957?” While that probe undoubtedly requires knowledge of situational information to answer (either from memory or from displays), it is not something that air traffic controllers typically need to know to do their jobs (Endsley & Rodgers, 1994b). Similarly, Chiappe et al. (2015) report providing queries such as “are the majority of the aircraft in your sector westbound?” Someone looking at a display could answer that question, but it is quite extraneous to what is required for good SA in ATC (Endsley & Jones, 1995; Endsley & Rodgers, 1994a). Asking questions about the situation is not the same as asking questions that are pertinent to SA. Whichever technique is used, a detailed analysis of SA requirements for the operational role being examined is extremely important (Endsley, 1995b). While many of the studies reviewed reported conducting a goal-directed task analysis (GDTA) or other analysis of the task domain to determine their questions, others did not.

Query construction. Second, crafting clear, concise questions that can be objectively scored as correct or incorrect is important. The questions should not be subjective or ambiguous, and should produce the same answer from experts with full knowledge of the situation. For example, Lau, Jamieson, and Skraaning (2014) had trouble with ill-defined queries that could not be consistently scored by three experts who were actually looking at the same displays, even with all the data visible and all the time needed, resulting in low inter-rater reliability in evaluating their queries. While typically the query scoring key is determined via data in the simulation computer, this study points to the importance of creating clear and relevant queries.

Query focus. A notable difference between SPAM and SAGAT is that SPAM creates questions that test a participant’s knowledge of the past, present, and the future, while SAGAT questions are directed at the perception, comprehension, and projection components of SA. SPAM-future and SAGAT-projection queries seem clearly aligned. SPAM-present and SAGAT-perception also seem to be aligned, although SPAM-present may also potentially include SAGAT-comprehension questions.

SPAM includes questions on the past under the belief that remembering past events is important for projecting future events (Durso et al., 2006). This, however, adds a clear memory component to SPAM that may or may not be relevant to current operations. SA is generally considered to be a person’s knowledge and understanding of the present, and its reliance on what has occurred in the past is only relevant to the extent that it affects the present or future, for example, understanding the trend in some variable (Endsley, 1995c). SAGAT may directly ask whether some relevant variable (e.g., altitude or speed) is increasing, decreasing, or staying level, for example, but would not ask what it was at some prior point in time. In other words, while SPAM asks specifically for memory of past events or status, SAGAT asks for how information is changing or how it will change in the future, not specifically requiring reporting on memory of the past. Thus, SPAM may be more reflective of memory than SAGAT in this respect.

Query presentation format. A final issue has to do with the referents for queries that are provided. It is recommended that SAGAT queries start with a domain-relevant map that allows participants to input the relevant objects (e.g., aircraft, automobiles, friendly and enemy forces) that they know about in an intuitive spatial layout (Endsley, 1995b, 2000). Subsequent queries refer back to the objects entered on that map. This is because identifiers (e.g., aircraft call signs, track numbers, train numbers) are often not known, but rather are peripheral information that serves a functional task, but have little value otherwise in terms of situation understanding (Endsley, 2015a; Endsley & Rodgers, 1998). Implementations of SAGAT measures that rely heavily on such identifiers to answer queries create an artificial memory requirement that is likely to lead to difficulties such as experienced by Golightly, Wilson, Lowe, and Sharples (2010) in train operations, or Loft et al. (2018) in submarine management. Also, Ratwani and Trafton (2008) found that spatial memory (such as
that provided with maps in SAGAT) is important for resumption from an interruption.

Conversely, SPAM is heavily reliant on identifiers, but this does not cause a problem because they are readily available on displays. And Strybel et al. (2016) found that participants performed more poorly on SPAM with graphically presented probes as compared to verbal probes, most likely due to physical interference with the ongoing task, or potentially because they were presented in a way that made guessing less easy. In contrast, Shelton et al. (2013) created a text-based display for SPAM probes on a small handheld computer because verbally presented queries interfered with normal discussions in a study with health care teams.

Team SA

In addition to assessing individual SA, many researchers are also interested in the SA of teams. A total of 27 studies were found that used either SAGAT or SPAM to assess SA at the team level, shown in Appendix F, which is available with the manuscript on the Human Factors website. Because SPAM does not elicit data from multiple operators at the same time, and queries and responses are generally provided verbally, an assessment of shared or team SA is logistically challenging with the technique, and only two papers were found that used SPAM for assessing team SA.

A key advantage of SAGAT is that it allows for the assessment of team SA (Bolstad & Endsley, 2003) because it is administered to all participants in a simulation at the same time. Problems with tools and bottlenecks in team processes where information is not passed, or where different interpretations are made from data that could lead to poor team performance can be readily identified by comparing the SA of different team members or sub-teams at the same point in time. A total of 24 studies were found that used SAGAT to evaluate team SA, 1 used SACRI, and 1 used QUASA.

These measures were used to assess team SA in several ways, including:

1. Creating a combined or average SA score across the team as an overall Team SA score (11 studies);
2. Allowing for a collaborative team response to queries (two studies);
3. Assessing Shared SA (or SA similarity) as the degree of concurrence of teammates on information elements that are relevant to both roles (14 studies);
4. Examining the degree to which team members are aware of the SA of each other (Team Meta-SA) (three studies);
5. Assessing the correlation between the SA of different team members, or sub-teams (six studies).

The SA of teams was assessed in a variety of domains including military (10), health care (5), aviation (4), emergency management (4), process control (3), and business (1) settings. While most studies were conducted in simulators or microworlds, two studies were conducted in live exercises.

Although the exact methods for assessing SA in teams varied among these studies, a few general statements can be made. Shared SA as measured by SAGAT was shown to be predictive of overall team performance (Bonney, Davis-Sramek, & Cadotte, 2016; Coolen, Draaisma, & Loeffen, 2019; Rosenman et al., 2018), as well as when it was measured by SPAM (Cooke, Kiekel, & Helm, 2001). Combining SA scores into an overall Team SA score generated mixed results, with five studies showing it to be predictive of overall team performance (Cooke et al., 2001; Crozier et al., 2015; Gardner, Kosemund, & Martinez, 2017; Parush et al., 2017; Prince, Ellis, Brannick, & Salas, 2007), and three that did not (Brooks, Switzer, & Gugerty, 2003; P. Morgan et al., 2015; Sorensen, Stanton, & Banks, 2010). This body of research also demonstrated the effect of team membership on the SA of individuals (Bolstad & Endsley, 2003; Cuevas & Bolstad, 2010; Leggatt, 2004; Sætrevik, 2012; Seet, Teh, Soo, & Teo, 2004), as well as the significant effect of the team leader (Cuevas & Bolstad, 2010), although one study did not find a significant effect of team membership on Shared SA (Sætrevik, 2012).

In addition, measures of combined Team SA and Shared SA were shown to be sensitive to other aspects of teams, including team knowledge (Bolstad, Cuevas, Gonzalez, & Schneider,
2005; Cooke, Stout, Rivera, & Salas, 1998), team cohesion and coordination (Hallbert, 1997), team experience (Crozier et al., 2015), organizational hub distance (Bolstad et al., 2005; Saner, Bolstad, Gonzalez, & Cuevas, 2009), team displays (Javed, Norris, & Johnston, 2012), information flows (Artman, 1999), and stress (Price & LaFiandra, 2017).

Only a few studies have examined team meta-awareness, with Sulistyawati, Chui, and Wickens (2008) and Sulistyawati et al. (2009) showing that fighter pilots’ SA was moderately correlated with that of their teammates, but awareness of the teammates’ SA was not predictive of performance. Yuan, She, Li, Zhang, and Wu (2016) found that awareness of teammate SA was negatively correlated with own SA, however, likely due to competing task demands.

While the SA of teams has often been investigated by examining team communications or coordination metrics, or by an examination of the types of processes they engage in, this work shows that objective measurement of the overall Team SA or Shared SA that people have when working in teams can provide useful complementary information to this body of research.

Conclusions

In conclusion, this analysis examined 243 papers that measured SA using SAGAT, SPAM, or one of their variants, and provided a useful examination of concerns regarding these techniques. While both SAGAT and SPAM were found to be equally predictive of performance, this analysis shows that SPAM has significantly lower sensitivity than SAGAT (64% compared to 94%) and is far more intrusive, with 40% of studies showing an effect of SPAM on task workload or performance. In addition, issues with speed-accuracy trade-offs, correlations with workload, and a sampling bias in favor of lower workload periods create considerable concern over SPAM’s validity as a measure of SA. Evidence of any benefit of “situatedness” during the SA probe is lacking.

When administered during simulation freezes and with adequate sample size, SAGAT was shown to have a very high level of sensitivity to study manipulations at 94%. Sensitivity was shown to be much higher when the analysis was based on accuracy of individual queries or SA levels, rather than an overall combined SA score. Studies that collected only a few samples or that collected data only at the end of the trial had lower sensitivity.

This analysis shows that claims that SAGAT is intrusive (Chiappe et al., 2012; de Winter et al., 2019; Durso et al., 1998; Salmon et al., 2011; Sarter & Woods, 1991) are unwarranted, with no negative effects on performance found in the 11 studies that examined it. The conditions of SAGAT administration are significantly different than interruptions with unrelated tasks. This analysis also lays to rest many other potential criticisms of SAGAT. Concerns about the over-reliance of SAGAT on working memory (de Winter et al., 2019; Gugerty, 1998; Jeannot, Kelly, & Thompson, 2003; Langan-Fox, Sankey, & Canty, 2009; Stanton, Salmon, Walker, & Jenkins, 2010) were found to be unsupported.

Most SA measurement examined in the present analysis was conducted in medium- to high-fidelity simulations, or in computer microworlds, both to good effect. While the ability to administer SA queries in real-world situations is often cited as a potential benefit of SPAM as compared to SAGAT, it was interesting to find that SPAM was only used in two live exercises, while SAGAT was used in nine. All of these were controlled studies, rather than actual real-world activities.

SAGAT also was found to provide valuable insights into the nature of Team SA and Shared SA in a number of studies. The ability to determine the outcome of team communications, coordination processes, or compositions provides a useful addition to studies in this area. This is in contrast to claims that SAGAT is not suited to the study of team SA (Langan-Fox et al., 2009; Stanton et al., 2017).

A few other spurious criticisms of SAGAT have been made that also deserve comment. Claims that SAGAT does not capture situation dynamics (Langan-Fox et al., 2009; Salmon et al., 2008) are perplexing in that many queries do indeed cover issues of how key information is changing (e.g., whether aircraft are ascending or descending in Table 1), as well as how information is projected to change over time as represented in Level 3 SA questions.

Additional concerns have been raised over whether SA queries, which tap into operators’ explicit awareness, are in some way unreflective of implicit awareness (de Winter et al., 2019; Gugerty, 1997; Lo, Sehic, Brookhuis, & Meijer, 2016). Endsley (1995c) describes ways in which automatized performance may be accomplished without much conscious awareness (e.g., a driver following a well-known route). While this allows for effective performance in some circumstances, the resultant lower conscious awareness can be a problem if something novel occurs (e.g., the driver would miss a new stop sign, or a police officer looking for speeders). I argue that this represents one of the ways in which SA (as an explicit measure of situation knowledge) can sometimes diverge from performance outcomes. Despite this possibility, SAGAT was shown to be highly predictive of performance.

Other domain-specific criticisms of SAGAT are also largely unsupported. Langan-Fox et al. (2009) and Jeannot et al. (2003), for example, claim that SAGAT for ATC (a) considers all aircraft equal, which is not the case (for example, see Endsley, Mogford, & Stein, 1997; Endsley, Sollenberger, & Stein, 1999), (b) requires expensive simulators (see also Salmon, Stanton, Walker, & Green, 2006), even though many studies use inexpensive microworlds or computer games, and (c) is not suited to multi-sector studies, which is not accurate as many studies involve multiple sectors (e.g., Endsley & Rodgers, 1998). Overall, SAGAT had 94% sensitivity in ATC studies.

SAGAT was successfully used in 20 military studies, including 5 in distributed team command and control, despite claims that it is unsuitable for this domain (Salmon et al., 2006; Stanton et al., 2010). Similarly, Golightly et al. (2010) concluded SAGAT was not suited for SA measurement in the train driving domain, based on a short study in which they only collected a small amount of data at the end of the trial, significantly reducing its sensitivity. Rose et al. (2019) found it sensitive in train driving, however.

As a caveat, the present meta-analysis is constrained by the available data and statistics reported on in the various studies reviewed. Some have criticized the use of hypothesis testing reliant on p-values on the basis that such statistics are often misinterpreted, do not indicate the magnitude of effects, and that statistical assumptions are rarely tested, instead favoring the reporting of confidence intervals and effect sizes (Cumming, 2014; Kline, 2004). However, others have pointed out that these statistical approaches are reliant on the same statistics as p-values and that many concerns are overgeneralized (Kennedy, 2015; van der Linden & Chryst, 2015). In the present research, a consistent p-value threshold of .05 was used in comparing studies, as well as a wider threshold of .10 to consider trend data, although very few studies were added with this more lenient criterion.

In addition, variance in research results may have been introduced by the manner in which different researchers have employed the SA methods, which was specifically examined in the analyses to the extent possible and reported on. Finally, variance in research results may be due to factors associated with study designs. The analysis considered the effects of domain, experimental condition, participant type, and number of subjects, which were largely insignificant; however, there could be other unknown sources of variance between the studies.

In conclusion, many of the methodological concerns regarding objective measurement of SA via SAGAT can be laid to rest based on this meta-analysis of research on SA over the past 30 years. When implemented as designed, SAGAT can provide valuable insights into the effects of display designs, training implementations, automation effects, individual differences, and expertise, and into the construct of SA in both individuals and teams.

Key Points

• A meta-analysis of studies employing direct, objective measures of SA found both SAGAT and SPAM to be predictive of performance.
• SPAM was found to be less sensitive than SAGAT, intrusive on primary task performance and to suffer from speed-accuracy trade-offs, correlations with workload, and a sampling bias in favor of lower workload periods, which make it problematic for the measurement of SA.
• SAGAT was found to have a high level of sensitivity (94%) when employed using the freeze
technique and analyzed by query or SA level, not to suffer from primary memory constraints, and not to be intrusive on task performance across a large number of studies. It provides useful insights into both individual and team SA in a wide variety of domains and experimental settings.

ORCID iD

Mica R. Endsley https://orcid.org/0000-0002-2359-947X

Supplemental Material

The online supplemental material is available with the manuscript on the Human Factors website.

References

Adams, M. J., Tenney, Y. J., & Pew, R. W. (1995). Situation awareness and the cognitive management of complex systems. Human Factors, 37, 85–104.
Alexander, A. L., & Wickens, C. D. (2004). Measuring traffic awareness in an integrated hazard display. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 48, 171–175.
Alexander, A. L., & Wickens, C. D. (2005, April 18–21). Synthetic vision systems: Flightpath tracking, situation awareness, and visual scanning in an integrated hazard display. Proceedings of the International Symposium on Aviation Psychology, Oklahoma City, OK.
Anderson-Montoya, B. L., Scerbo, M. W., Ramirez, D. E., & Hubbard, T. W. (2017). Running memory for clinical handoffs: A look at active and passive processing. Human Factors, 59, 393–406.
Artman, H. (1999). Situation awareness and co-operation within and between hierarchical units in dynamic decision making. Ergonomics, 42, 1404–1417.
Bacon, L. P., & Strybel, T. Z. (2013). Assessment of the validity and intrusiveness of online-probe questions for situation awareness in a simulated air-traffic-management task with student air-traffic controllers. Safety Science, 56, 89–95.
Baddeley, A. D. (1986). Human memory. Oxford, UK: Clarendon Press.
Bolstad, C. A., Cuevas, H., Gonzalez, C., & Schneider, M. (2005, May 16–19). Modeling shared situation awareness. Paper presented at the 14th Conference on Behavior Representation in Modeling and Simulation (BRIMS), Los Angeles, CA.
Bolstad, C. A., & Endsley, M. R. (2003). Measuring shared and team situation awareness in the army’s future objective force. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 47, 364–373.
Bolstad, C. A., Foltz, P., Franzke, M., Cuevas, H. M., Rosenstein, M., & Costello, A. M. (2007). Predicting situation awareness from team communications. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 51, 789–793.
Bonney, L., Davis-Sramek, B., & Cadotte, E. R. (2016). “Thinking” about business markets: A cognitive assessment of market awareness. Journal of Business Research, 69, 2641–2648.
Brooks, J. O., Switzer, F. S., & Gugerty, L. (2003). Effects of situation awareness training on novice process control plant operators. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 47, 606–609.
Burns, C. M., Skraaning, G., Jr., Jamieson, G. A., Lau, N., Kwok, J., Welch, R., & Andresen, G. (2008). Evaluation of ecological interface design for nuclear process control: Situation awareness effects. Human Factors, 50, 663–679.
Bushman, B. J. (1994). Vote-counting procedures in meta-analysis. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 192–213). New York, NY: Russell Sage.
Cak, S., Say, B., & Misirlisoy, M. (2019). Effects of working memory, attention, and expertise on pilots’ situation awareness. Cognition, Technology & Work. Advance online publication. doi:10.1007/s10111-019-00551-w
Chiappe, D. L., Rorie, R. C., Moran, C. A., & Vu, K. P. L. (2012). A situated approach to the acquisition of shared SA in team contexts. Theoretical Issues in Ergonomics Science, 15, 69–87.
Chiappe, D. L., Strybel, T. Z., & Vu, K. P. L. (2015). A situated approach to the understanding of dynamic situations. Journal of Cognitive Engineering and Decision Making, 9, 33–43.
Cohen, J. (1995). Statistical power analysis for the behavioral sciences. New York, NY: Routledge Academic.
Cooke, N. J., Kiekel, P. A., & Helm, E. E. (2001). Measuring team knowledge during skill acquisition of a complex task. International Journal of Cognitive Ergonomics, 5, 297–315.
Cooke, N. J., Stout, R., Rivera, K., & Salas, E. (1998). Exploring measures of team knowledge. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1, 215–219.
Coolen, E., Draaisma, J., & Loeffen, J. (2019). Measuring situation awareness and team effectiveness in pediatric acute care by using the situation global assessment technique. European Journal of Pediatrics, 178, 837–850.
Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychological Bulletin, 104, 163–191.
Cramer, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Crozier, M. S., Ting, H. Y., Boone, D. C., O’Regan, N. B., Bandrauk, N., Furey, A., . . . Hogan, M. P. (2015). Use of human patient simulation and validation of the Team Situation Awareness Global Assessment Technique (TSAGAT): A multidisciplinary team assessment tool in trauma education. Journal of Surgical Education, 72, 156–163.
Cuevas, H. M., & Bolstad, C. A. (2010). Influence of team leaders’ situation awareness on their team’s situation awareness and performance. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 54, 309–313.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.
Cummings, M. L., & Guerlain, S. (2007). Developing operator capacity estimates for supervisory control of autonomous vehicles. Human Factors, 49, 1–15.
Cunningham, J. C., Battiste, H., Curtis, S., Hallett, E. C., Koltz, M., Brandt, S. L., & Johnson, W. W. (2015). Measuring situation awareness with probe questions: Reasons for not answering the probes. Procedia Manufacturing, 3, 2982–2989.
de Winter, J., Eisma, Y., Cabrall, C., Hancock, P., & Stanton, N. A. (2019). Situation awareness based on eye movements in relation to the task environment. Cognition, Technology & Work, 21, 99–111.
Durso, F. T., Bleckley, M. K., & Dattel, A. R. (2006). Does situation awareness add to the validity of cognitive tests? Human Factors, 48, 721–733.
Durso, F. T., & Gronlund, S. D. (1999). Situation awareness. In F. T. Durso, R. Nickerson, R. Schvaneveldt, S. Dumais, S. Lindsay, & M. Chi (Eds.), Handbook of applied cognition (pp. 284–314). New York, NY: John Wiley.
Durso, F. T., Hackworth, C. A., Truitt, T. R., Crutchfield, J., Nikolic, D., & Manning, C. A. (1998). Situation awareness as a predictor of performance for en route air traffic controllers. Air Traffic Control Quarterly, 6(1), 1–20.
Endsley, M. R. (1988). Design and evaluation for situation awareness enhancement. Proceedings of the Human Factors Society Annual Meeting, 32, 97–101.
Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (pp. 1/1–1/9). Neuilly Sur Seine, France: NATO-AGARD.
Endsley, M. R. (1990b). Situation awareness in dynamic human decision making: Theory and measurement. Los Angeles: University of Southern California.
Endsley, M. R. (1993a). Situation awareness and workload: Flip sides of the same coin. In R. S. Jensen & D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 906–911). Columbus, OH: Department of Aviation, The Ohio State University.
Endsley, M. R. (1993b). A survey of situation awareness requirements in air-to-air combat fighters. International Journal of Aviation Psychology, 3, 157–168.
Endsley, M. R. (1995a). Direct measurement of situation awareness in simulations of dynamic systems: Validity and use of SAGAT. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 107–113). Daytona Beach, FL: Embry-Riddle University.
Endsley, M. R. (1995b). Measurement of situation awareness in dynamic systems. Human Factors, 37, 65–84.
Endsley, M. R. (1995c). Toward a theory of situation awareness in dynamic systems. Human Factors, 37, 32–64.
Endsley, M. R. (2000). Direct measurement of situation awareness: Validity and use of SAGAT. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 147–174). Mahwah, NJ: Lawrence Erlbaum.
Endsley, M. R. (2015a). Final reflections: Situation awareness models and measures. Journal of Cognitive Engineering and Decision Making, 9, 101–111.
Endsley, M. R. (2015b). Situation awareness misconceptions and misunderstandings. Journal of Cognitive Engineering and Decision Making, 9, 4–32.
Endsley, M. R. (in press). The divergence of objective and subjective situation awareness: A meta-analysis. Journal of Cognitive Engineering and Decision Making.
Endsley, M. R., & Bolstad, C. A. (1994). Individual differences in pilot situation awareness. International Journal of Aviation Psychology, 4, 241–264.
Endsley, M. R., & Rodgers, M. D. (1994a). Situation awareness information requirements for en route air traffic control (DOT/FAA/AM-94/27). Washington, DC: Federal Aviation Administration Office of Aviation Medicine.
Endsley, M. R., & Rodgers, M. D. (1994b). Situation awareness information requirements for en route air traffic control. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 38, 71–75.
Endsley, M. R., & Rodgers, M. D. (1998). Distribution of attention, situation awareness, and workload in a passive air traffic control task: Implications for operational errors and automation. Air Traffic Control Quarterly, 6, 21–44.
Endsley, M. R., Selcon, S. J., Hardiman, T. D., & Croft, D. G. (1998). A comparative evaluation of SAGAT and SART for evaluations of situation awareness. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 42, 82–86.
Endsley, M. R., Sollenberger, R., & Stein, E. (1999). The use of predictive displays for aiding controller situation awareness. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 43, 51–55.
Endsley, M. R., Sollenberger, R., & Stein, E. (2000). Situation awareness: A comparison of measures. In Proceedings of the Human Performance, Situation Awareness and Automation: User-Centered Design for the New Millennium (pp. 15–19). Savannah, GA: SA Technologies, Inc.
Farley, T. C., Hansman, R. J., Amonlirdviman, K., & Endsley, M. R. (2000). Shared information between pilots and controllers in tactical air traffic control. Journal of Guidance, Control and Dynamics, 23, 826–836.
Gardner, A. K., Kosemund, M., & Martinez, J. (2017). Examining the feasibility and predictive validity of the SAGAT tool to assess situation awareness among medical trainees. Simulation in Healthcare, 12, 17–21.
Gartenberg, D., Breslow, L., McCurry, J. M., & Trafton, J. G. (2014). Situation awareness recovery. Human Factors, 56, 710–727.
Golightly, D., Wilson, J. R., Lowe, E., & Sharples, S. (2010). The role of situation awareness for understanding signalling and control in rail operations. Theoretical Issues in Ergonomics Science, 11, 84–98.
Gonzalez, C., & Wimisberg, J. (2007). Situation awareness in dynamic decision making: Effects of practice and working memory. Journal of Cognitive Engineering and Decision Making, 1, 56–74.
Gorman, J. C., Cooke, N. J., Pederson, H. K., Connor, O. O., & DeJoode, J. A. (2005). Coordinated awareness of situation by teams (CAST): Measuring team situation awareness of a communications glitch. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 49, 274–277.
Gugerty, L. (1998). Evidence from a partial report task for forgetting in dynamic spatial memory. Human Factors, 40, 498–508.
Endsley, M. R., & Garland, D. J. (Eds.). (2000). Situation awareness Gugerty, L. J. (1997). Situation awareness during driving: Explicit
analysis and measurement. Mahwah, NJ: Lawrence Erlbaum. and implicit knowledge in dynamic spatial memory. Journal of
Endsley, M. R., & Jones, D. G. (1995). Situation awareness Experimental Psychology: Applied, 3, 42–66.
requirements analysis for TRACON air traffic control (TTU- Gutzwiller, R. S., & Clegg, B. A. (2012). The role of working
IE-95-01). Lubbock: Texas Tech University. memory in levels of situation awareness. Journal of Cognitive
Endsley, M. R., & Jones, D. G. (2012). Designing for situation Engineering and Decision Making, 7, 141–154.
awareness: An approach to human-centered design (2nd ed.). Hall, J., & Phelps, R. H. (1983). Nonlaboratory techniques for the
London, England: Taylor & Francis. study of cognitive and decision-making processes: A descrip-
Endsley, M. R., Mogford, R. H., & Stein, E. S. (1997). Controller tion and selected bibliography (No. 83-45). Washington, DC:
situation awareness in free flight. Proceedings of the Human U.S. Army Research Institute for the Behavioral and Social
Factors and Ergonomics Society Annual Meeting, 41, 4–8. Sciences.
Hallbert, B. P. (1997). Situation awareness and operator performance: Results from simulator-based studies. In Proceedings of the IEEE Sixth Conference on Human Factors and Power Plants (pp. 18/11–18/16). New York, NY: IEEE.
Hamilton, K., Mancuso, V., Mohammed, S., Tesler, R., & McNeese, M. (2017). Skilled and unaware: The interactive effects of team cognition, team metacognition, and task confidence on team performance. Journal of Cognitive Engineering and Decision Making, 11, 382–395.
Hauss, Y., & Eyferth, K. (2003). Securing future ATM-concepts’ safety by measuring situation awareness in ATC. Aerospace Science and Technology, 7, 417–427.
Hogan, M. P., Pace, D. E., Hapgood, J., & Boone, D. C. (2006). Use of human patient simulation and the situation awareness global assessment technique in practical trauma skills assessment. Journal of Trauma and Acute Care Surgery, 61, 1047–1052.
Hogg, D. N., Follesø, K., Strand-Volden, F., & Torralba, B. (1995). Development of a situation awareness measure to evaluate advanced alarm systems in nuclear power plant control rooms. Ergonomics, 38, 2394–2413.
Ikuma, L. H., Harvey, C., Taylor, C. F., & Handal, C. (2014). A guide for assessing control room operator performance using speed and accuracy, perceived workload, situation awareness, and eye tracking. Journal of Loss Prevention in the Process Industries, 32, 454–465.
Jackson, L., Chapman, P., & Crundall, D. (2009). What happens next? Predicting other road users’ behaviour as a function of driving experience and processing time. Ergonomics, 52, 154–164.
Javed, Y., Norris, T., & Johnston, D. (2012, April). Evaluating SAVER: Measuring shared and team situation awareness of emergency decision makers. Paper presented at the 9th International ISCRAM Conference, Vancouver, BC, Canada.
Jeannot, E., Kelly, C., & Thompson, D. (2003). The development of situation awareness measures in ATM systems. Brussels, Belgium: Eurocontrol.
Jipp, M., & Ackerman, P. L. (2016). The impact of higher levels of automation on performance and situation awareness: A function of information-processing ability and working-memory capacity. Journal of Cognitive Engineering and Decision Making, 10, 138–166.
Jones, D. G., & Endsley, M. R. (2000a). Can real-time probes provide a valid measure of situation awareness? In Proceedings of the Human Performance, Situation Awareness and Automation: User-Centered Design for the New Millennium Conference (pp. 245–250). Savannah, GA: SA Technologies.
Jones, D. G., & Endsley, M. R. (2000b). Overcoming representational errors in complex environments. Human Factors, 42, 367–378.
Jones, D. G., & Endsley, M. R. (2004). Using real time probes for measuring situation awareness. International Journal of Aviation Psychology, 14, 343–367.
Kane, M. J., Bleckley, M. K., Conway, A. R., & Engle, R. W. (2001). A controlled-attention view of working-memory capacity. Journal of Experimental Psychology: General, 130, 169–183.
Kass, S. J., Cole, K. S., & Stanny, C. J. (2007). Effects of distraction and experience on situation awareness and simulated driving. Transportation Research, Part F: Traffic Psychology and Behaviour, 10, 321–329.
Keeler, J., Battiste, H., Hallett, E. C., Roberts, Z., Winter, A., Sanchez, K., . . . Vu, K.-P. L. (2015). May I interrupt? The effect of SPAM probe questions on air traffic controller performance. Procedia Manufacturing, 3, 2998–3004.
Kennedy, J. E. (2015). Critique of Cumming’s “new statistics” for psychological research: A perspective from outside psychology. Retrieved from https://jeksite.org/psi/critique_new_stat.pdf
Kline, R. D. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: APA Books.
Kraemer, J., & Süß, H. M. (2015). Real time validation of online situation awareness questionnaires in simulated approach air traffic control. Procedia Manufacturing, 3, 3152–3159.
Landry, S. J., & Yoo, H.-S. (2012). Sampling error and other statistical problems with query-based situation awareness measures. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 56, 292–296.
Langan-Fox, J., Sankey, M. J., & Canty, J. M. (2009). Human factors measurement for future air traffic control systems. Human Factors, 51, 595–637.
Lau, N., Jamieson, G. A., & Skraaning, G. (2014). Inter-rater reliability of query/probe-based techniques for measuring situation awareness. Ergonomics, 57, 959–972.
Lau, N., Jamieson, G. A., & Skraaning, G., Jr. (2016). Situation awareness acquired from monitoring process plants—The Process Overview concept and measure. Ergonomics, 59, 976–988.
Leggatt, A. (2004, September 14–16). Objectively measuring the promulgation of commander’s intent in a coalition effects based planning experiment (MNE3). Proceedings of the 9th International Command and Control Research and Technology Symposium, Copenhagen, Denmark.
Lo, J. C., Sehic, E., Brookhuis, K. A., & Meijer, S. A. (2016). Explicit or implicit situation awareness? Measuring the situation awareness of train traffic controllers. Transportation Research, Part F: Traffic Psychology and Behaviour, 43, 325–338.
Loft, S., Bowden, V., Braithwaite, J., Morrell, D. B., Huf, S., & Durso, F. T. (2015). Situation awareness measures for simulated submarine track management. Human Factors, 57, 298–310.
Loft, S., Jooste, L., Li, Y. R., Ballard, T., Huf, S., Lipp, O. V., & Visser, T. A. (2018). Using situation awareness and workload to predict performance in submarine track management: A multilevel approach. Human Factors, 60, 978–991.
Loft, S., Morrell, D. B., Ponton, K., Braithwaite, J., Bowden, V., & Huf, S. (2016). The impact of uncertain contact location on situation awareness and performance in simulated submarine track management. Human Factors, 58, 1052–1068.
Loft, S., Sadler, A., Braithwaite, J., & Huf, S. (2015). The chronic detrimental impact of interruptions in a simulated submarine track management task. Human Factors, 57, 1417–1426.
McGowan, A., & Banbury, S. (2004). Interruption and reorientation effects of a situation awareness probe on driving hazard anticipation. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 48, 290–294.
McGuinness, B. (2004). Quantitative analysis of situational awareness (QUASA): Applying signal detection theory to true/false probes and self-ratings. In Proceedings of the Command and Control Research and Technology Symposium (pp. 1–14). Bristol, UK: BAE Systems Advanced Technology Centre.
Morgan, C. A., Chiappe, D. L., Kraut, J., Strybel, T. Z., & Vu, K. P. L. (2012). An investigation into the potential sources of interference in SA probe techniques. In S. J. Landry (Ed.), Advances in human aspects of aviation (pp. 623–631). Boca Raton, FL: CRC Press.
Morgan, P., Tregunno, D., Brydges, R., Pittini, R., Tarshis, J., Kurrek, M., . . . Ryzynski, A. (2015). Using a situational awareness global assessment technique for interprofessional obstetrical team training with high fidelity simulation. Journal of Interprofessional Care, 29, 13–19.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259.
Orasanu, J. (2000). Evaluating team situation awareness through communication. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 283–288). Daytona Beach, FL: Embry-Riddle University.
Parush, A. G., Mastoras, A., Bhandari, A., Momtahan, K., Day, K., Weitzman, B., . . . Calder, L. (2017). Can teamwork and situational awareness (SA) in ED resuscitations be improved with a technological cognitive aid? Design and a pilot study of a team situation display. Journal of Biomedical Informatics, 76, 154–161.
Pierce, R. S. (2012). The effect of SPAM administration during a dynamic simulation. Human Factors, 54, 838–848.
Pierce, R. S., Strybel, T. Z., & Vu, K.-P. L. (2008). Comparing situation awareness measurement techniques in a low fidelity air traffic control simulation. In Proceedings of the 26th International Congress of the Aeronautical Sciences (ICAS) (pp. 1–8). Anchorage, AK: ICAS.
Pierce, R. S., Vu, K.-P. L., Nguyen, J., & Strybel, T. Z. (2008). The relationship between SPAM, workload, and task performance on a simulated ATC task. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 52, 34–38.
Poole, B. J., & Kane, M. J. (2009). Working-memory capacity predicts the executive control of visual search among distractors: The influences of sustained and selective attention. The Quarterly Journal of Experimental Psychology, 62, 1430–1454.
Price, T. F., & LaFiandra, M. (2017). The perception of team engagement reduces stress induced situation awareness overconfidence and risk-taking. Cognitive Systems Research, 46, 52–60.
Prince, C., Ellis, E., Brannick, M. T., & Salas, E. (2007). Measurement of team situation awareness in low experience level aviators. The International Journal of Aviation Psychology, 17, 41–57.
Prince, C., Salas, E., & Stout, R. (1995). Situation awareness: Team measures, training methods. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 135–140). Daytona Beach, FL: Embry-Riddle University Press.
Pritchett, A. R., Hansman, R. J., & Johnson, E. N. (1995). Use of testable responses for performance-based measurement of situation awareness. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 75–81). Daytona Beach, FL: Embry-Riddle University Press.
Randel, J. M., Pugh, H. L., & Reed, S. K. (1996). Differences in expert and novice situation awareness in naturalistic decision making. International Journal of Human-computer Studies, 45, 579–597.
Ratwani, R. M., & Trafton, J. G. (2008). Spatial memory guides task resumption. Visual Cognition, 16, 1001–1010.
Rose, J., Bearman, C., Naweed, A., & Dorrian, J. (2019). Proceed with caution: Using verbal protocol analysis to measure situation awareness. Ergonomics, 62, 115–127.
Rosenman, E. D., Dixon, A. J., Webb, J. M., Brolliar, S., Golden, S. J., Jones, K. A., . . . Chao, G. T. (2018). A simulation-based approach to measuring team situational awareness in emergency medicine: A multicenter, observational study. Academic Emergency Medicine, 25, 196–204.
Sætrevik, B. (2012). A controlled field study of situation awareness measures and heart rate variability in emergency handling teams. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 56, 2006–2010.
Salmon, P. M., Stanton, N., Walker, G., Baber, C., Jenkins, D., McMaster, R., & Young, M. (2008). What is really going on? Review of situation awareness models for individuals and teams. Theoretical Issues in Ergonomics Science, 9, 297–323.
Salmon, P. M., Stanton, N., Walker, G., & Green, D. (2006). Situation awareness measurement: A review of applicability for C4i environments. Applied Ergonomics, 37, 225–238.
Salmon, P. M., Stanton, N. A., & Young, K. L. (2011). Situation awareness on the road: Review, theoretical and methodological issues and future directions. Theoretical Issues in Ergonomics Science, 13, 472–492.
Saner, L. D., Bolstad, C. A., Gonzalez, C., & Cuevas, H. M. (2009). Measuring and predicting shared situation awareness in teams. Journal of Cognitive Engineering and Decision Making, 3, 280–308.
Sarter, N. B., & Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. The International Journal of Aviation Psychology, 1, 45–57.
Seet, A. W., Teh, C. A., Soo, J. K., & Teo, L. (2004). Constructible assessment for situation awareness in a distributed C2 environment. Singapore: DSO National Laboratories.
Shelton, C. L., Kinston, R., Molyneux, A. J., & Ambrose, L. J. (2013). Real-time situation awareness assessment in critical illness management: Adapting the situation present assessment method to clinical simulation. BMJ Quality & Safety, 22, 163–167.
Shipstead, Z., Lindsey, D. R., Marshall, R. L., & Engle, R. W. (2014). The mechanisms of working memory capacity: Primary memory, secondary memory, and attention control. Journal of Memory and Language, 72, 116–141.
Silva, H. I., Grigoleit, T., Ann Burress, M., & Fitzpatrick, D. (2017). Measuring the impact of console operator experience in a simulated petrochemical refining emergency event. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 61, 527–531.
Silva, H. I., Ziccardi, J., Grigoleit, T., Battiste, V., Strybel, T. Z., & Vu, K.-P. L. (2013). Are the intrusive effects of SPAM probes present when operators differ by skill level and training? In S. Yamamoto (Ed.), Proceedings of the International Conference on Human Interface and the Management of Information (pp. 269–275). Berlin: Springer.
Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.
Snow, M. P., & Reising, J. M. (1999). Effect of pathway-in-the-sky and synthetic terrain imagery on situation awareness in a simulated low-level ingress scenario. Wright-Patterson AFB, OH: Air Force Research Lab Crew System Interface Division.
Soliman, A. M., & Mathna, E. K. (2009). Metacognitive strategy training improves driving situation awareness. Social Behavior and Personality: An International Journal, 37, 1161–1170.
Sorensen, L. J., Stanton, N., & Banks, A. P. (2010). Back to SA school: Contrasting three approaches to situation awareness
in the cockpit. Theoretical Issues in Ergonomics Science, 12, 451–471.
Stanton, N. A., Salmon, P. M., Rafferty, L. A., Walker, G. H., Baber, C., & Jenkins, D. P. (2017). Human factors methods: A practical guide for engineering and design. Boca Raton, FL: CRC Press.
Stanton, N. A., Salmon, P. M., Walker, G., & Jenkins, D. (2010). Is situation awareness all in the mind? Theoretical Issues in Ergonomics Science, 11, 29–40.
Strater, L. D., Endsley, M. R., Pleban, R. J., & Matthews, M. D. (2001). Measures of platoon leader situation awareness in virtual decision making exercises (Research Report No. 1770). Alexandria, VA: Army Research Institute.
Strybel, T. Z., Chiappe, D., Vu, K.-P. L., Miramontes, A., Battiste, H., & Battiste, V. (2016). A comparison of methods for assessing situation awareness in current day and future air traffic management operations: Graphics-based vs text-based online probe systems. IFAC-PapersOnLine, 49(19), 31–35.
Strybel, T. Z., Vu, K.-P. L., Bacon, L. P., Kraut, J., Battiste, V., & Johnson, W. (2010). Diagnosticity of an online query technique for measuring pilot situation awareness in NextGen. In Proceedings of the 2010 IEEE/AIAA 29th Digital Avionics Systems Conference (DASC) (pp. 4.B.1-1–4.B.1-12). New York, NY: IEEE.
Strybel, T. Z., Vu, K.-P. L., Battiste, V., & Johnson, W. (2013). Measuring the impact of NextGen operating concepts for separation assurance on pilot situation awareness and workload. The International Journal of Aviation Psychology, 23, 1–26.
Strybel, T. Z., Vu, K.-P. L., Kraft, J., & Minakata, K. (2008). Assessing the situation awareness of pilots engaged in self spacing. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 52, 11–15.
Sulistyawati, K., Chui, Y. P., & Wickens, C. D. (2008). Multi-method approach to team situation awareness. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 52, 463–467.
Sulistyawati, K., Wickens, C. D., & Chui, Y. P. (2009). Exploring the concept of team situation awareness in a simulated air combat environment. Journal of Cognitive Engineering and Decision Making, 3, 309–330.
Sulistyawati, K., Wickens, C. D., & Chui, Y. P. (2011). Prediction in situation awareness: Confidence bias and underlying cognitive abilities. International Journal of Aviation Psychology, 21, 153–174.
Sullivan, C., & Blackman, H. S. (1991). Insights into pilot situation awareness using verbal protocol analysis. Proceedings of the Human Factors Society Annual Meeting, 35, 57–61.
Taber, M. J., McCabe, J., Klein, R. M., & Pelot, R. P. (2013). Development and evaluation of an offshore oil and gas Emergency Response Focus Board. International Journal of Industrial Ergonomics, 43, 40–51.
Trapsilawati, F., Wickens, C. D., Qu, X., & Chen, C.-H. (2016). Benefits of imperfect conflict resolution advisory aids for future air traffic control. Human Factors, 58, 1007–1019.
Tremblay, S., Vachon, F., Lafond, D., & Kramer, C. (2012). Dealing with task interruptions in complex dynamic environments: Are two heads better than one? Human Factors, 54, 70–83.
Unsworth, N., & Engle, R. W. (2005). Working memory capacity and fluid abilities: Examining the correlation between operation span and Raven. Intelligence, 33, 67–81.
Unsworth, N., & Spillers, G. J. (2010). Working memory capacity: Attention control, secondary memory, or both? A direct test of the dual-component model. Journal of Memory and Language, 62, 392–406.
van der Linden, S., & Chryst, B. (2015). Why the “new statistics” isn’t new. Psychologist, 28, 610.
Vidulich, M. (2000). Testing the sensitivity of situation awareness metrics in interface evaluations. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 227–248). Mahwah, NJ: Lawrence Erlbaum.
Vidulich, M. A., & Tsang, P. S. (2015). The confluence of situation awareness and mental workload for adaptable human-machine systems. Journal of Cognitive Engineering and Decision Making, 9, 95–97.
Walker, G. H., Stanton, N. A., & Young, M. S. (2008). Feedback and driver situation awareness (SA): A comparison of SA measures and contexts. Transportation Research Part F: Traffic Psychology and Behaviour, 11, 282–299.
Wickens, C. D. (1995). The tradeoff of design for routine and unexpected performance: Implications of situation awareness. In D. J. Garland & M. R. Endsley (Eds.), Experimental analysis and measurement of situation awareness (pp. 57–64). Daytona Beach, FL: Embry-Riddle Aeronautical University Press.
Wickens, C. D. (1998). Statistics. Ergonomics in Design, 6(4), 18–22.
Wickens, C. D. (2000). The tradeoff of design for routine and unexpected performance: Implications of situation awareness. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 211–226). Mahwah, NJ: Lawrence Erlbaum.
Willems, B. F., & Heiney, M. (2001, December 3–7). Real-time assessment of situation awareness of air traffic control specialists on operational host computer system and display system replacement hardware. Proceedings of the USA/Europe Air Traffic Management R&D Seminar, Santa Fe, NM.
Wilson, G. F. (2000). Strategies for psychophysiological assessment of situation awareness. In M. R. Endsley & D. J. Garland (Eds.), Situation awareness analysis and measurement (pp. 175–188). Mahwah, NJ: Lawrence Erlbaum.
Yuan, X., She, M., Li, Z., Zhang, Y., & Wu, X. (2016). Mutual awareness: Enhanced by interface design and improving team performance in incident diagnosis under computerized working environment. International Journal of Industrial Ergonomics, 54, 65–72.

Mica Endsley is President of SA Technologies and is the former Chief Scientist of the U.S. Air Force. She received a PhD in Industrial and Systems Engineering from the University of Southern California. She has published extensively on situation awareness, automation, and system design.

Date received: February 12, 2019
Date accepted: August 15, 2019
