
Applied Ergonomics 45 (2014) 1484–1494


Validating the Strategies Analysis Diagram: Assessing the reliability and validity of a formative method

Miranda Cornelissen a, b, *, Roderick McClure b, Paul M. Salmon c, Neville A. Stanton d

a Griffith Aviation, Griffith University, Nathan Campus, 170 Kessels Road, Nathan, QLD 4111, Australia
b Monash Injury Research Institute, Monash University, Building 70, Clayton Campus, Wellington Road, Clayton, VIC 3800, Australia
c University of the Sunshine Coast Accident Research (USCAR), Faculty of Arts and Business, University of the Sunshine Coast, Locked Bag 4, Maroochydore DC, QLD 4558, Australia
d Civil, Maritime, Environmental Engineering and Science Unit, Faculty of Engineering and the Environment, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom

Article info

Article history:
Received 27 May 2013
Accepted 7 April 2014
Available online 2 May 2014

Keywords:
Strategies Analysis Diagram
Cognitive Work Analysis
Validation

Abstract

The Strategies Analysis Diagram (SAD) is a recently developed method to model the range of possible strategies available for activities in complex sociotechnical systems. Previous applications of the new method have shown that it can effectively identify a comprehensive range of strategies available to humans performing activity within a particular system. A recurring criticism of Ergonomics methods is, however, that substantive evidence regarding their performance is lacking. For a method to be widely used by other practitioners such evaluations are necessary. This article presents an evaluation of the criterion-referenced validity and test-retest reliability of the SAD method when used by novice analysts. The findings show that individual analyst performance was average. However, pooling the individual analyst outputs into a group model increased the reliability and validity of the method. It is concluded that the SAD method's reliability and validity can be assured through the use of a structured process in which analysts first construct an individual model, followed by either another analyst pooling the individual results or a group process pooling individual models into an agreed group model.

© 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.

1. Introduction

Cognitive Work Analysis (CWA; Rasmussen et al., 1994; Vicente, 1999) is an Ergonomics method that is used to describe and evaluate complex sociotechnical systems. CWA has been applied in various complex sociotechnical systems such as defence (Burns et al., 2004), aviation (Ahlstrom, 2005), road transport (Cornelissen et al., 2013), and process control (Vicente, 1999).

CWA is one of few formative methods within the discipline of Ergonomics. Formative methods aim to describe potential ways in which a system can operate, rather than describing how a system should operate (normative methods) or actually operates (descriptive methods) (Vicente, 1999). Formative methods are an important category of methods, as they are used to assist the design of adaptive and flexible systems and tools that can cope with non-routine situations. Non-routine situations, driven by emergence, are a feature of complex sociotechnical systems. Such systems exist in a highly changeable and demanding environment, in which adaptive capacity is essential (Vicente, 1999; Woods, 1988). Contemporary safety related concepts such as resilience (cf. Hollnagel, 2006) and performance variability (cf. Hollnagel, 2002, 2004) further underline the importance of methods that take the adaptability of complex sociotechnical systems into account.

Despite the popularity of CWA, reliability and validity analyses of formative methods remain largely absent from the Ergonomics literature. One of the few efforts to establish the validity of CWA was conducted by Burns et al. (2004). They conducted a qualitative post hoc comparison of two independently conducted Work Domain Analyses (WDA), the first phase of CWA, on similar defence systems. More formal validation studies, including validation of the latter phases of the CWA framework, are currently lacking.

A recurring criticism aimed at the Ergonomics discipline is that some of the methods are only used by their developers and that methods are often chosen by practitioners based on familiarity and ease of use rather than on reliability and validity evidence (Stanton et al., 2013). Many practitioners are unaware of whether the methods are reliable and valid (Stanton and Young, 1998).

* Corresponding author. Tel.: +61 7 3735 5276. E-mail addresses: m.cornelissen@griffith.edu.au (M. Cornelissen), Rod.Mcclure@monash.edu (R. McClure), psalmon@usc.edu.au (P.M. Salmon), N.Stanton@soton.ac.uk (N.A. Stanton).

http://dx.doi.org/10.1016/j.apergo.2014.04.010

Fig. 1. Strategies analysis diagram.

Validation studies can benefit Ergonomics methods by providing clear evaluation and empirical evidence of their performance and value (Stanton and Young, 1999, 2003). Further, a key requisite of Ergonomics methods is that they are usable by non-experts and that they achieve acceptable levels of performance when used by other analysts (Stanton and Young, 1999, 2003). Without this, uptake of the method by ergonomics practitioners, designers and engineers may be limited.

The aim of the study reported in this article was to provide a more formal reliability and validity analysis of CWA. This study was conducted using non-expert analysts to provide an evaluation of CWA's level of performance when used by other analysts and to support uptake of the method by ergonomics practitioners.

1.1. Cognitive Work Analysis

The CWA framework comprises five phases (Vicente, 1999). Each phase models a different set of constraints. First, WDA models the system constraints by describing what the system is trying to achieve and how and with what it achieves its purpose. Second, Control Task Analysis models situational constraints and decision-making requirements. Third, Strategies Analysis models ways in which activities within the system can be carried out. Fourth, Social Organisation and Cooperation Analysis models communication and coordination demands imposed by organisational constraints. Fifth, Worker Competencies Analysis describes the skills, rules and knowledge required for the activities possible within the system.

To date, applications have focussed on the development and application of the first two phases: WDA and Control Task Analysis. The third phase, Strategies Analysis, is useful for providing insight into the different response options that enable a system's adaptive capacity. This phase has traditionally been neither as well developed nor applied as often as the earlier phases. Recently, the Strategies Analysis has regained attention (Cornelissen et al., 2012, 2013; Hassall and Sanderson, 2012; Hilliard et al., 2008). In particular, Cornelissen et al. (2013) developed and applied a structured method, the Strategies Analysis Diagram (SAD), to model strategies available in complex sociotechnical systems. Initial evaluations of this method have gathered evidence of the method's effectiveness in identifying possible strategies (Cornelissen et al., 2012, 2013); however, the method would benefit from more formal evaluations. This paper is a response to this requirement.

1.1.1. Strategies Analysis Diagram

The SAD method models how activities can potentially be executed within a system's constraints. It also models criteria for when or why work will be executed in a certain way. SAD builds upon the first phase of the CWA framework by adding verbs and criteria to the constraints identified. This allows further specification of the courses of action possible within the system's constraints as well as the criteria influencing the employment of courses of action.

The SAD, see Fig. 1, is a networked hierarchical diagram using means-ends links to represent 'how' and 'why' relationships between the different levels of the diagram. Links upwards explain why a certain object or function is there, whereas links downwards explain how a system works to achieve its purpose or execute its functions. The levels transferred from the WDA are illustrated in light grey and the SAD specific levels in dark grey.
The lower half of the SAD models how activities can potentially be executed. The diagram describes, bottom up, verbs that describe potential interactions with or manipulations of objects in the system (e.g. follow), physical objects present in the system (e.g. lane markings), object related processes afforded by the physical objects (e.g. display information) and purpose related functions describing activities that need to be carried out for the system to achieve its purpose (e.g. determine path). The top half of the diagram describes why certain ways of work, or courses of action, are seen within the system. The top of the diagram describes, top-down, the functional purpose, which is the reason why the system exists (e.g. support negotiation of right hand turns by road users), values and priority measures, which evaluate the effectiveness of a system and drive behaviour (e.g. safety), and criteria, describing when certain courses of action at the lower end of the diagram are valid or likely to be chosen (e.g. high traffic volume). Together, nodes from the different levels provide analysts with a syntax for strategy definition. For example, bottom up, a strategy could be defined as 'assess' 'road users' 'show behaviour' when 'avoiding conflict with other road users'. Assess whether the 'road user is unfriendly', for 'safety' purposes and to 'support negotiation of right hand turns by road users'.
1.2. Evaluating the Strategies Analysis Diagram

Based on initial evaluations of the SAD method (Cornelissen et al., 2012; Cornelissen et al., 2013) the evidence suggests that, when used by its developers, it can effectively identify a comprehensive range of strategies available to humans performing activity within a particular system. Following this, the next critical step is to more formally evaluate the method's usefulness and ensure that the SAD can be applied by other practitioners. The method's reliability and validity when used by analysts who are novices in the SAD method should be established.

The aim of the study reported in this article was to evaluate the performance of the SAD method when used by analysts not already skilled in the CWA framework. Specifically, the study aimed to evaluate SAD in terms of its reliability and validity when used by novice analysts to identify the range of strategies available to different road user groups at intersections. This allows for a more ample validation and reliability evaluation of formative methods.

1.2.1. Reliability and validity analysis

Evaluating the method's criterion-referenced validity entails ensuring that the method allows analysts to come up with a SAD that contains accurate, and not too much irrelevant, content. Evaluating its test-retest reliability comprises ensuring that the method produces a similar model when used by the same analyst for the same system more than once.

1.2.1.1. Assessing validity. Studies assessing the validity of Ergonomics methods have been reported in the literature (Baber and Stanton, 1996; Stanton et al., 2009; Stanton and Young, 2003). Many of those have focussed on human reliability and error prediction methods (Baysari et al., 2011; Kirwan et al., 1997; Stanton and Young, 2003). In those studies, the validity of methods was assessed by comparing a method's results (e.g. errors predicted) against actual observations (e.g. errors observed). Since the CWA framework is formative and models possible behaviour, including behaviour beyond that currently prescribed or actually seen within a system, a comparison of the method's results with actual observations would not provide sufficient conclusive evidence about the validity of the method.

Alternatively, CWA results could be compared to results obtained using a similar but validated method. However, CWA is unique in that it is a constraints-based approach and one of few formative methods, and substantial validation studies of such methods remain absent from the literature to date. Therefore, no method is available that could be used for SAD's validation.

Expert models are used when other validated standards are not available (Gordon et al., 2005). Expert models in the context of CWA are models developed by expert analysts who are highly experienced in applying the methods from the CWA framework. It is worth noting that expert CWA models are not normative models representing the expert's knowledge of a system, but rather well developed formative CWA analyses conducted by expert analysts. The assumption is that the expert model is most accurate and highly valuable in circumstances in which uncertainty exists and behaviour has yet to occur, is noisy or is complex (Bolger and Wright, 1992). In the absence of other objective standards, a model developed by expert analysts with no time constraints is likely to be the best standard against which to compare other analysts' CWA results.

Once the standard against which the novice analysts' results will be assessed is established, measures to assess the quality of the novice results have to be determined. Quantitative methods to compare expert results versus novice results (or predicted versus actual outcomes) are often based on the use of signal detection theory to calculate the sensitivity of the method under analysis (Baber and Stanton, 1994; Stanton et al., 2009; Stanton and Young, 2003). Signal detection theory sorts the method's outputs into hits, misses, false alarms and correct rejections. Hits represent items identified by novice analysts that were also identified by expert analysts. Misses represent items that were identified by expert analysts but not by novice analysts. False alarms refer to those items identified by novice analysts and not the expert analysts. Correct rejections are those items identified by neither the expert nor the novice analysts. The signal detection theory metrics are commonly used to assess the reliability and validity of ergonomics methods such as human error prediction (Stanton et al., 2009); however, not all of the metrics may usefully apply to formative methods.

Since formative methods describe things that could be possible within a particular system, it is complicated to use the correct rejections category because correct rejections are unknown and possibly infinite. That is, theoretically the total number of items that could be included is anything in the world as we know it, as formative methods are not restrained by what is currently happening or should be happening. Any number chosen to represent the total number of items in the world would therefore be artificial, and it is hard to know what items were actively rejected from this large pool of items by expert and novice analysts. While others (Stanton and Baber, 2002; Stanton and Stevenage, 1998) have been able to argue for a theoretical maximum based on a set number of tasks and error categories provided by a taxonomy, such a theoretical maximum would be artificially inflated when used for formative methods. Therefore, it is argued that measures using correct rejections are not suitable for assessing the validity of the SAD method.

Measures involving hits, misses or false alarms can be used for the evaluation of CWA. Such measures include hit rate (hits divided by hits and misses), which allows comparison of items identified by novice analysts versus items identified by expert analysts. The false alarm rate cannot be used here as it requires correct rejections. To still account for the number of false alarms a novice analyst identifies, a measure often used in clinical or recall studies can be used: positive predictive value (Descatha et al., 2009; Lindegård Andersson and Ekman, 2008). Predictive value (hits divided by hits and false alarms) reflects the number of items identified by novice analysts that were also identified by expert analysts compared to the total number of items identified by novice analysts.
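Written out as formulas (with H, M and FA denoting the counts of hits, misses and false alarms; notation added here for clarity), the two measures are:

\[
\text{hit rate} = \frac{H}{H + M},
\qquad
\text{predictive value} = \frac{H}{H + FA}.
\]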
1.2.1.2. Assessing reliability. Reliability of Ergonomics methods is often assessed using a test-retest paradigm (Baysari et al., 2011).
Measures include percentage agreement (Baber and Stanton, 1996; Baysari et al., 2011) and Pearson's correlation (Harris et al., 2005; Stanton and Young, 2003). As discussed above, the formative nature of CWA poses some challenges to traditional Ergonomics reliability and validity measures. To avoid unwarranted overcomplication of the analysis, percentage agreement was applied here to evaluate the test-retest reliability. The reliability measure compares the items a novice analyst identified at two different times.
1.2.2. Multi-analyst approach

CWA is a resource intensive method. The formative nature of the method requires participants to go beyond what they currently know, which is a challenging activity. Further, the analysis concerns a complex system of which analysts may understand one part in great detail while lacking knowledge of other parts. A multi-analyst approach, using more than one analyst to conduct the analysis, is therefore practicable for a method such as CWA, to decrease the resources required and to compensate for shortfalls of individual analysts (Stanton et al., 2009).

Other validation studies have pooled individual results (Harris et al., 2005) or proposed multiple analyst approaches (Stanton et al., 2009). These approaches are suggested to increase the validity and comprehensiveness of the methods' results.

Unfortunately, one of the main weaknesses of a multi-analyst approach is the increase in false alarm rate (e.g. items identified that are not accurate) (Stanton et al., 2009). To ensure that the benefits of a multi-analyst approach outweigh the cost of introducing false alarms, it is worth exploring strategies to reduce the false alarms and ensure the quality of the output. For example, it is assumed that if a method is accurate, relevant content should be identified more consistently than irrelevant content. A practical solution for a multi-analyst approach to CWA would be to only include items if more than one novice analyst generated that item or if multiple analysts agreed on the relevance of that item in a group discussion. Therefore, the present validity analysis includes both an unadjusted multi-analyst model (collating all individual items) and an adjusted pooled multi-analyst model (collating all individual items into one model, but eliminating all items that were identified by only one novice analyst) to evaluate the value of a multi-analyst approach to SAD; a sketch of both pooling rules is given at the end of this section.

Fig. 2. Individual and multi-analyst approach to assessing reliability and validity.

Fig. 2 summarises the approach taken to evaluate the SAD method. Validity was measured by quantifying the hit rate and predictive value of participants' results. Reliability was measured using a test-retest paradigm. Results were analysed for individual analysts as well as for a pooled multi-analyst approach.

There were a number of hypotheses. It is expected that if the SAD method is valid, novice analysts' models will be similar to the expert analyst model. It is expected, however, that novice analysts' models will fail to produce complete coverage of the expert analyst's model within the constraints of the study. Reasons for this include the use of novice analysts, the time constraints of a reliability and validity study and the semi-structured formative approach of SAD. It is expected that by using a multi-analyst approach, and especially an adjusted multi-analyst model, results will improve and resemble the expert analyst's model better. If SAD can be used reliably to conduct a SAD and elicit strategies, novice analysts' models over time are expected to be similar.
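As a minimal sketch of the two pooling rules described above, assuming each analyst's output can be reduced to a set of item labels (the function names are ours, not part of the SAD method):

from collections import Counter

def pooled_model(analyst_outputs: list[set[str]]) -> set[str]:
    """Unadjusted multi-analyst model: collate all individual items."""
    pooled: set[str] = set()
    for items in analyst_outputs:
        pooled |= items
    return pooled

def adjusted_pooled_model(analyst_outputs: list[set[str]]) -> set[str]:
    """Adjusted pooled model: drop items generated by only one analyst."""
    counts = Counter(item for items in analyst_outputs for item in items)
    return {item for item, n in counts.items() if n >= 2}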
2. Method

2.1. Participants

17 transportation safety professionals aged between 27 and 61 (M = 38, SD = 11) took part in the study, see Tables 1 and 2. While the participants were all professionals with an interest in Human Factors, only 4 of them had applied CWA once or twice before, but never in full (i.e. using all five phases).

None of the participants had used the SAD. Therefore, all participants were considered to be novices in using the SAD method.

Participants were recruited through professional association and university newsletters and were compensated for their time and expenses. Prior to commencing the study, ethics approval was formally granted by the Monash Human Ethics Committee.

Participants were divided into two groups based on their availability for the workshops. The two groups were, however, the same in regard to the process taken and the time constraints imposed on all activities.

Table 1
Participant background.

Participants (n = 17)                          n
Gender                  Female                 8
                        Male                   9
Educational background  Human Factors          10
                        Engineering            7
Employment background   Academia               5
                        Government agency      8
                        Industry               4

Table 2
Participant background.

                  M    SD   Min   Max
Age               38   11   27    61
Years experience  14   11   4.5   40

2.2. SAD analysis task

The study focused on drivers and cyclists turning right at urban signalised intersections in Melbourne, Australia. In Australia, road users travel on the left hand side of the road. A right hand turn performed on the main road involves crossing the intersection and continuing on the intersecting road. Cyclists can also travel on the footpath, which involves travelling across the intersection using the pedestrian crossings available to the far right corner as viewed from their approach, see Fig. 3. Right hand turns at urban signalised intersections were chosen as this is a relatively complex task with ample opportunities for performance variability, and therefore ideal to evaluate the SAD. At the same time it also defines a reasonable boundary for the system under analysis, as suggested by Naikar (2005).

Fig. 3. Example on-road and pedestrian crossing route for right hand turns at Australian intersections.

2.3. Materials

Training material comprised descriptions and flowcharts of how to conduct an analysis using the WDA and SAD methods. Background information, available throughout the entire exercise, consisted of de-identified quotes obtained from drivers and cyclists. These quotes described verbalised thought processes obtained while making right hand turns and post hoc explanations of the cognitive processes underlying their decision making at the time; a more detailed description can be found in Cornelissen et al. (2012, 2013). Participants were furthermore provided with a WDA of the intersection 'system'. An extract of the WDA is provided in Fig. 2.

2.3.1. Building the Strategies Analysis Diagram

Exercise material for building the SAD consisted of matrices in which participants could first write possible verbs and criteria identified in columns on the left hand side and subsequently link them to, for example, physical objects, which were already provided in the top row. Participants used matrices rather than drawing a SAD by hand to make it easier for participants to keep track of their process and for the analysts to analyse the diagrams afterwards.

2.3.2. Eliciting strategies from the Strategies Analysis Diagram

After participants built their own SAD they were provided with a SAD prepared for them, see Fig. 4, from which to elicit strategies. This was done to ensure that all participants elicited strategies from the same diagram and that elicitation was not influenced by how participants performed in building the SAD. Participants used SAD flowcharts to write down the strategies elicited. These flowcharts contained empty nodes from each level of the SAD for participants to fill out. From left to right the flowcharts contained nodes for verbs, physical objects, object related processes, criteria and values and priority measures.

2.4. Design

All participants underwent the same training and exercises. Workshop one (T1) was used to assess the validity of participants' results using the SAD against the expert analyst results. In a second workshop (T2), a month after the first workshop, participants completed the exercise again. The participants' results at T1 were compared to their results at T2 to assess the test-retest reliability.

2.5. Procedure

Participants each attended two workshops.

2.5.1. Workshop 1: data for criterion-referenced validity and test-retest reliability

In workshop one, upon completion of a demographic questionnaire and informed consent form, participants received training in CWA, and specifically the SAD. The training consisted of an introduction to the method as well as a walkthrough of the method using road transport examples.

Once familiar with the method, participants were given a training exercise in which they applied the method to a relatively simple system, the Apple iPod. In the training exercise participants could ask questions to ensure they fully understood the method before starting the main exercise. In the training exercise, participants were given an hour to identify verbs and criteria and link them to the neighbouring levels in the SAD. They were then given 45 min to elicit strategies from the SAD.

Following a break, participants started the main exercise. The exercise was explained to the participants and a step-by-step walkthrough of the materials provided was conducted.

Fig. 4. Work Domain Analysis extract.

Participants were once more reminded of the formative nature of the method and urged to identify possible verbs and criteria, not only those currently seen in our intersection system. Participants were given an hour and 45 min to build a SAD.

After a short break, participants were provided with a SAD prepared for them and were asked to elicit strategies from this diagram. Participants elicited strategies for a simple and a complex scenario, see Table 3. Participants were given 30 min to elicit strategies for the simple scenario and an hour for the complex scenario.

Table 3
Scenarios for strategy elicitation.

Simple scenario: The road user (driver or cyclist) has entered the right hand turning lane. There is no traffic in front. The road user wants to position him/herself at the traffic lights aiming to activate the traffic light sensor embedded in the tarmac. What strategies could the road user apply to position him or herself at the traffic light sensor?

Complex scenario: The road user (cyclist) has entered the right hand turning lane. Traffic has banked up in the turning lane. The cyclist is approaching the stopped traffic and is deciding whether to filter to the front of the traffic queue. What strategies could the road user apply to decide whether to filter to the front?

2.5.2. Workshop 2: data for test-retest reliability

A month after the first workshop participants returned for a second workshop. In workshop two, participants received refresher training familiarising them again with the method. After the refresher training, participants started with the main exercise. This was the same exercise as in workshop one. Once finished with the exercise, participants filled out a feedback form, were thanked for their participation and were reimbursed for their expenses.

2.6. Data analysis

The expert model was developed by one analyst, and reviewed by two analysts, all with extensive experience in the CWA framework and SAD method.

The validity of SAD was assessed by comparing the verbs, criteria and strategies in the participants' models (novice analysts) with those in the expert model (expert analyst). Measures of validity included hit rate and predictive value. The analysis is depicted in Fig. 5. The validity of the method using a multi-analyst approach was analysed by pooling individual participants' analyses into a group model. That is, the raw data was combined into one group model and the verbs, criteria and strategies identified in this group model were compared to the expert analyst's model. In addition, an adjusted multi-analyst model was tested to evaluate whether a practical solution could be found to counter the increase of false alarms in a multi-analyst approach. In the adjusted multi-analyst model, items were eliminated from the group model if only one novice analyst from the overall group of 17 analysts generated it. The output of this approach is referred to in the text as the adjusted pooled model.

The reliability of SAD was assessed by comparing participants' results at two different times, using a test-retest paradigm. The verbs, criteria and strategies identified by participants at T1 were compared with the verbs, criteria and strategies identified at T2. To assess the reliability of a multi-analyst approach, items identified by all 17 novice analysts at T1 were combined into one group model and compared to the items identified by all 17 novice analysts at T2. In the adjusted multi-analyst model, items that were only identified by one of the 17 novice analysts were eliminated from the group model. A sketch of how these measures can be computed is given below.
3. Results

3.1. Criterion-referenced validity

3.1.1. Building the Strategies Analysis Diagram

3.1.1.1. Individual analyst approach. All participants defined verbs and criteria. Not all participants linked the nodes in the SAD, while some connected all nodes with each other. Therefore the definition of verbs and criteria is analysed, but the linking of the nodes was not analysed further.

Fig. 5. Example analysis evaluating the accuracy of a novice model compared to the expert model: identifying verbs and criteria.

It was found that the accuracy of verbs and criteria identified by individual participants, produced within the time constraints set, compared to the verbs and criteria identified by the expert analysts was average, see Fig. 6. The results showed that only 21–24% of the verbs and criteria described in the expert model were identified by the novice analysts.

3.1.1.2. Multi-analyst approach. The novices' group model, produced by pooling individual verbs and criteria into one model, improved the accuracy of the verbs and criteria identified when compared with the expert model, see Fig. 6. Pooled results resemble the expert model well, identifying 81–88% of the verbs and criteria defined in the expert model.

Unfortunately, pooling participants' results into a group model increases the number of false alarms due to pooling of error data. Therefore the predictive value of the pooled model is low (29–33%).

3.1.1.3. Adjusted pooled model. The rise in false alarms was countered in the adjusted pooled model by removing items that were identified by one novice analyst only. This reduced the irrelevant verbs and criteria significantly while only removing a small number of verbs and criteria that were in fact relevant. This improved the predictive value of the pooled model to 48 and 66% respectively, see Fig. 6.

3.1.2. Eliciting strategies

Next, participants' ability to accurately elicit strategies from the SAD diagram was assessed. This involved evaluating whether the novice analysts' courses of action (comprising verbs, physical objects and object related processes) were similar to the courses of action identified by the expert analyst in their model.

3.1.2.1. Individual analyst approach. The results showed that novices' individual results for eliciting courses of action did not achieve high levels of accuracy, see Fig. 7. The novice analysts identified only 8–13% of the strategies identified in the expert model.

3.1.2.2. Multi-analyst approach. Pooling the courses of action identified by individual participants into a group model improved the results, see Fig. 7. As a group, the novice analysts identified 57–58% of the strategies in the expert model.

3.1.2.3. Adjusted pooled model. Again, strategies identified by one participant only were removed. This resulted in an adjusted pooled model. This adjustment reduced the accuracy of the group model, with the group now only identifying 32–43% of strategies identified in the expert model; see Fig. 7. Participants elicited a great variety of courses of action and therefore many relevant strategies were removed.

Adjusting the pooled model did increase the predictive value. Hence, the chances that a course of action that was identified was relevant were greater for the adjusted than for the non-adjusted pooled model.

Fig. 6. Individual (Mdn and IQR) and pooled results for verbs and criteria.

Next, participants' ability to identify accurate criteria and values and priority measures for the courses of action identified was assessed. Criteria and values and priority measures were only assessed for each participant's hits, as naturally there are no corresponding criteria and values and priority measures for false alarms in the expert model. The number of accurate courses of action (hits) identified per participant varied, and the number of criteria and values and priority measures per strategy in the expert model varied as well. Counts of criteria and values and priority measures identified do not take this relativity into account. Therefore only percentages and no counts are reported in this section.

3.1.2.4. Individual analyst approach. Participants were better at identifying relevant values and priority measures than criteria for both scenarios, see Fig. 8. This finding is not surprising as the elicitation of values and priority measures was much more constrained (6 values and priority measures versus 44 criteria) and therefore the chances of poor accuracy were lower. There were large differences between participants, with some identifying as little as 0–5% of the relevant criteria and others 40–50%.

3.1.2.5. Multi-analyst approach. Pooling of the individual participants' criteria and values and priority measures into a group model was not conducted for this part of the analysis, as participants identified different courses of action and therefore the criteria and values and priority measures relate to different courses of action and cannot be pooled.

3.2. Test-retest reliability of individual analysts approach

Participants applied the SAD method at two different times with a month's interval. Assessing the test-retest reliability involved evaluating whether results were consistent over time, namely whether participants identified the same content at both times. Interpretation scores for reliability measures for formative methods, or for instances where correct rejections do not exist or cannot be estimated, are absent from the literature to date. Therefore the interpretation scale for kappa reliability scores is the closest guidance and is used for the interpretation of the results here.

3.2.1. Building the Strategies Analysis Diagram

3.2.1.1. Individual analyst approach. The reliability of individual results was found to be fair for identifying verbs and criteria, see Fig. 9. Between .23 and .28 of the content was consistently identified at both applications (T1 and T2).

3.2.1.2. Multi-analyst approach. The pooled results, pooling all individual models into a group model at T1 and pooling individual models into a group model for T2, resulted in fair to moderate agreement. As a group, between .37 and .41 of the verbs and criteria was consistently identified at both applications (T1 and T2).

3.2.1.3. Adjusted pooled model. Eliminating items that were only identified by one of the 17 analysts improved the test-retest reliability. The consistency of the adjusted group model over time was between .52 and .53.

3.2.2. Eliciting strategies from the Strategies Analysis Diagram

3.2.2.1. Individual analyst approach. Reliability for eliciting strategies was lower. There was only a slight agreement for individuals eliciting strategies, see Fig. 9. Between .11 and .13 of the strategies elicited at T1 were also elicited at T2.

3.2.2.2. Multi-analyst approach. Pooling of the individual models into a group model at T1 and a group model at T2 increased the test-retest reliability of eliciting strategies. As a group, between .25 and .26 of the strategies elicited were consistently identified at T1 and T2.

3.2.2.3. Adjusted pooled model. Eliminating strategies that were only identified by one of the 17 analysts improved the test-retest reliability even more. About .39 of the strategies in the adjusted group model were identified at both T1 and T2.

4. Discussion

The aim of this article was to evaluate the criterion-referenced validity and test-retest reliability of the SAD method. The study represents a first of a kind evaluation of the reliability and validity of the SAD and one of the first to conduct such evaluations for formative methods. The study therefore adds to the knowledge base surrounding the reliability and validity of Ergonomics methods (Stanton and Young, 1999, 2003).

The analysis showed that individual results, within the constraints of the current exercise, proved to be only moderate. This is especially true when compared to other reliability and validity studies, see Table 4. Group models, produced by pooling individual analyst output into one model, were more valuable than the individual models on their own.

This suggests that the SAD method's reliability and validity can be optimised through the use of a structured process whereby individual analysts first conduct an analysis, following which the individual outputs are pooled by an independent analyst or the individual analysts are brought together to integrate their analyses and produce an agreed group result.

Fig. 7. Individual (Mdn and IQR) and pooled results for eliciting strategies.

This finding is not surprising, since the CWA framework has typically been applied as a group process involving teams of analysts (Birrell et al., 2011) or with analyses being reviewed and adapted by other experts (Jenkins et al., 2008); however, it is an important finding as it highlights the requirement for the SAD to be applied in a similar group manner. When combined with other positive pooled analyst study findings (Stanton et al., 2009), it adds further weight to the notion that the Ergonomics design and evaluation process should follow a group process involving multiple analysts, regardless of the methods used. Further, based on the evidence from other reliability and validity studies, it is likely that such approaches will prove beneficial for applications of Ergonomics methods generally, including task analysis, error prediction and analysis, and accident analysis methods.

The results demonstrated variability between participants. Whether such differences could be explained by experience in CWA, academic or industry background, field of study (e.g. whether Human Factors professionals and Engineers performed differently), differences between groups, age or years of experience was investigated; however, no valid explanation could be found. An observation made during the workshops, however, was that Human Factors professionals and Engineering professionals did approach the exercise differently. Human Factors professionals attempted to understand the method and experienced blocks in generating verbs, criteria and strategies. Engineering professionals appeared to trust the method and started generating verbs, criteria and strategies with less scrutiny of what was generated. It was found that Engineers generated more content and relatively more false alarms, whereas Human Factors professionals generated less content but the content generated was more accurate. These differences were not statistically significant but might explain some of the variation observed between individuals. The performance of analysts with different discipline skill sets when using Ergonomics methods is therefore proposed as a pertinent line of further research.

Fig. 8. Individual (Mdn and IQR) and pooled results for matching criteria and values and priority measures with strategies.

Table 4 summarises the reliability and validity scores of other Ergonomics methods that have been tested and reported on within the published peer reviewed literature. While these studies were conducted using structured human error prediction methods with a limited number of categories for participants to choose from, it allows for a relative evaluation of the scores obtained in the study presented in this article. The human error prediction studies did not report predictive value scores and therefore only hit rates are compared. This confirms that the individual novice analysts' scores using the SAD are below average when compared with results obtained in other validation studies. This is not surprising, however, when one delves deeper into the structure of formal error prediction methods. For example, the Systematic Human Error Reduction and Prediction Approach (SHERPA, Embrey, 1986) bounds the analysis. Only a certain number of error modes can be selected for each task step depending on its behavioural classification (e.g. as an Action or a Checking behaviour). This limits the extent to which analysts can achieve poor accuracy scores.

validity scores of SAD, when using a multi-analyst approach, are


promising. However, lessons were identified where the method can
be improved.
There are a number of reasons why the SAD method performed
poorly when individual analyst outputs were examined. Partici-
pants provided feedback that they found it hard to work with
someone else’s output. Participants were provided a finished WDA
when building the SAD, and a finished SAD when eliciting strate-
gies. It is apparent that not being involved in development of the
WDA and the SAD made it difficult for participants to understand
the language used and subsequently define the correct terms.
Involvement of analysts throughout the model development pro-
cess, spending considerable time to explain what each term means,
and providing clearer guidance on how to define verbs and criteria
all seem logical approaches that could be beneficial and produce
better outcomes.
The language issue might explain why there was variety in the
items identified over time. Participants reported that they were
impressed by the expert model. Participants therefore might have
remembered the items that they did not identify themselves
(misses) better than the ones that they did identify themselves
(hits) and ensured that they would come up with these additional
verbs and criteria, for example, in the second workshop.
Also, while most Human Factors task analysis methods have a
structured process to start at the same step every time (e.g. Hier-
archical Task Analysis (Stanton, 2006)) participants were free to
start at any point in the SAD, and were even encouraged to use
strategies to formatively identify new verbs and criteria. While this
is beneficial for innovation it is perhaps not inducing high levels of
consistency.
The evaluation of participant’s results and sorting them into
Fig. 9. Individual (Mdn and IQR) and pooled results for reliability scores. categories of hits, misses and false alarms was conducted in a strict
manner. If items were worded very differently (e.g. verb ‘accelerate’
referring to actual interaction of ‘press’ (pedal) or ‘slow’ versus
gives the analyst far greater scope to identify different verbs, ‘release’ pedal), or an incorrect level of detail was provided in
criteria etc. describing the items identified (e.g. criteria ‘weather conditions’
The pooled multi-analyst hit rates scores obtained for building versus ‘wet conditions’), such items were counted as false alarms or
the SAD in this study compare well with the human error predic- misses. However, while such evaluation is appropriate for a reli-
tion methods, especially considering the semi structured approach ability or validity study, a real world application of pooling results
of SAD. The hit rates for the multi-analyst approach are a little would filter such false alarms out or reword them to fit the purpose
lower than hit rates reported in human error studies. Reliability of the application.
scores were considerably lower than those of human error predi- The use of novices and the strict assessment protocol therefore
cation methods. Given the difference in nature of the methods, the provides a worst case scenario. Stanton and Stevenage (1998) for
example used novices to apply human error identification methods,
Table 4 in which the theoretical maximum of options was much more
Outcomes other validity scores.
constrained than the current method, and found reliability scores
Method Expert/novices Validity Reference between .4 and .6. They regarded those positive and attributed
TAFEI Expert analysts 67% Baber and Stanton, 1996 them to lack of experience of their participants since other papers
Novices 48% Stanton and Baber, 2002 using the same methods reported much higher performance when
SHERPA Experts 80% Baber and Stanton, 1996 using experts for the same methods. Therefore reliability and val-
75% Stanton and Stevenage, 1998
idity results of SAD are expected to improve when more experi-
Novices 68%; Stanton and Baber, 2002
w67e70% Stanton et al., 2009 enced analysts use the method.
Pooled; novices 90% Harris et al., 2005 Stanton and Young (2003) also point out that the scope of
HAZOP Novices w60e68% Stanton et al., 2009 analysis influences the results and that with a wider scope it is
HEIST Novices w75e80% Stanton et al., 2009 harder to get favourable reliability and validity statistics. It has been
HET Novices 88-89% Stanton et al., 2009
HFIT Novices 38% Gordon et al., 2005
argued that novices perform better when using a simple system
Tracer-Rail Novices 62% Baysari et al., 2011 under analysis (Stanton and Stevenage, 1998; Stanton and Young,
Tracer-rav1 Novices 54% Baysari et al., 2011 2003) and for a complex system, experts outperform novices
Multi method Novices w71e79% Stanton et al., 2009 (Baber and Stanton, 1996; Stanton and Young, 2003). SAD analysis
Method Expert/novices Reliability Reference was performed on a complex system and the formative nature of
TAFEI Expert .9 Baber and Stanton, 1996 the method provides for a wide scope of analysis.
Novices .79 Stanton and Baber, 2002 The current study presented a first of a kind evaluation of the
SHERPA Expert .9 Baber and Stanton, 1996 reliability and validity of a formative method, the SAD. While the
Novices .73 Stanton and Baber, 2002 measures used were found to be the most appropriate measures
.7 Harris et al., 2005
available in the literature to date, further research should address

While the measures used were found to be the most appropriate measures available in the literature to date, further research should address whether more appropriate methods could be developed for reliability and validity evaluations of formative methods. Moreover, establishing the reliability and validity of Ergonomics methods is a key endeavour for all within the discipline, particularly to support increased involvement of Ergonomics methods and practitioners in the design and evaluation of safety critical systems. Continued discussion and investigation in this area is therefore encouraged.

Future directions should focus on providing more detailed guidance on the SAD process and on how to define verbs, criteria and strategies, similar to the directions provided on conducting a WDA by Naikar et al. (2005). While in the current study the pooling exercise was conducted by a separate analyst, future directions include exploring the difference between such an approach and allowing the individuals who build their own models to pool them into a group model through a structured group process.

Acknowledgement

Miranda Cornelissen's contribution to this article was conducted as part of her PhD candidature. This was funded by a Monash Graduate Scholarship and a Monash International Postgraduate Research Scholarship. Paul Salmon's contribution to this article was funded through his Australian National Health and Medical Research Council post doctoral fellowship.

References

Ahlstrom, U., 2005. Work domain analysis for air traffic controller weather displays. J. Saf. Res. 36, 159–169.

Baber, C., Stanton, N.A., 1994. Task analysis for error identification: a methodology for designing error-tolerant consumer products. Ergonomics 37 (11), 1923–1941.

Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology: predictions compared with observed use. Appl. Ergon. 27 (2), 119–131.

Baysari, M.T., Caponecchia, C., McIntosh, A.S., 2011. A reliability and usability study of TRACEr-RAV: the technique for the retrospective analysis of cognitive errors - for rail, Australian version. Appl. Ergon. 42 (6), 852–859.

Birrell, S.A., Young, M.S., Jenkins, D.P., Stanton, N.A., 2011. Cognitive Work Analysis for safe and efficient driving. Theor. Issues Ergon. Sci., 1–20.

Bolger, F., Wright, G., 1992. Reliability and validity in expert judgment. In: Wright, G., Bolger, F. (Eds.), Expertise and Decision Support. Plenum Press, New York, pp. 47–76.

Burns, C.M., Bisantz, A.M., Roth, E.M., 2004. Lessons from a comparison of work domain models: representational choices and their implications. Hum. Factors 46 (4), 711–727.

Cornelissen, M., Salmon, P.M., Jenkins, D.P., Lenné, M.G., 2012. A structured approach to the strategies analysis phase of cognitive work analysis. Theor. Issues Ergon. Sci., 1–19.

Cornelissen, M., Salmon, P.M., McClure, R., Stanton, N.A., 2013. Using cognitive work analysis and the strategies analysis diagram to understand variability in road user behaviour at intersections. Ergonomics, 1–17.

Descatha, A., Roquelaure, Y., Caroly, S., Evanoff, B., Cyr, D., Mariel, J., Leclerc, A., 2009. Self-administered questionnaire and direct observation by checklist: comparing two methods for physical exposure surveillance in a highly repetitive tasks plant. Appl. Ergon. 40 (2), 194–198.

Embrey, D.E., 1986. SHERPA: a systematic human error reduction and prediction approach. In: Paper Presented at the International Meeting of Advances in Nuclear Power Systems, Knoxville, Tennessee.

Gordon, R., Flin, R., Mearns, K., 2005. Designing and evaluating a human factors investigation tool (HFIT) for accident analysis. Saf. Sci. 43 (3), 147–171.

Harris, D., Stanton, N.A., Marshall, A., Young, M.S., Demagalski, J., Salmon, P., 2005. Using SHERPA to predict design-induced error on the flight deck. Aerosp. Sci. Technol. 9 (6), 525–532.

Hassall, M.E., Sanderson, P.M., 2012. A formative approach to the strategies analysis phase of cognitive work analysis. Theor. Issues Ergon. Sci., 1–47. http://dx.doi.org/10.1080/1463922x.2012.725781.

Hilliard, A., Thompson, L., Ngo, C., 2008. Demonstrating CWA strategies analysis: a case study of municipal winter maintenance. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 52 (4), 262–266.

Hollnagel, E., 2002. Understanding accidents - from root causes to performance variability. In: Paper Presented at the IEEE 7th Human Factors Meeting, Scottsdale, Arizona.

Hollnagel, E., 2004. Barriers and Accident Prevention. Ashgate Publishing, Aldershot.

Hollnagel, E., 2006. Resilience: the challenge of the unstable. In: Hollnagel, E., Woods, D.D., Leveson, N. (Eds.), Resilience Engineering: Concepts and Precepts. Ashgate, Aldershot, United Kingdom.

Jenkins, D.P., Stanton, N.A., Salmon, P.M., Walker, G.H., Young, M.S., 2008. Using cognitive work analysis to explore activity allocation within military domains. Ergonomics 51 (6), 798–815.

Kirwan, B., Kennedy, R., Taylor-Adams, S., Lambert, B., 1997. The validation of three Human Reliability Quantification techniques - THERP, HEART and JHEDI: part II - results of validation exercise. Appl. Ergon. 28 (1), 17–25.

Lindegård Andersson, A., Ekman, A., 2008. Reply to the short communication paper by T.P. Hutchinson regarding "Concordance between VDU-users' ratings of comfort and perceived exertion with experts' observations of workplace layout and working postures", Applied Ergonomics (2005) 36, 319–325. Appl. Ergon. 39 (1), 133–134.

Naikar, N., 2005. A methodology for work domain analysis, the first phase of cognitive work analysis. In: Paper Presented at the Human Factors and Ergonomics Society 49th Annual Meeting, Orlando, Florida.

Naikar, N., Hopcroft, R., Moylan, A., 2005. Work Domain Analysis: Theoretical Concepts and Methodology. DSTO, Fishermans Bend.

Rasmussen, J., Pejtersen, A.M., Goodstein, L.P., 1994. Cognitive Systems Engineering. Wiley, New York.

Stanton, N.A., 2006. Hierarchical task analysis: developments, applications, and extensions. Appl. Ergon. 37 (1), 55–79.

Stanton, N.A., Baber, C., 2002. Error by design: methods for predicting device usability. Des. Stud. 23 (4), 363–384.

Stanton, N.A., Salmon, P.M., Harris, D., Marshall, A., Demagalski, J., Young, M.S., Dekker, S.W.A., 2009. Predicting pilot error: testing a new methodology and a multi-methods and analysts approach. Appl. Ergon. 40 (3), 464–471.

Stanton, N.A., Salmon, P.M., Rafferty, L., Walker, G.H., Baber, C., Jenkins, D.P., 2013. Human Factors Methods: A Practical Guide for Engineering and Design, second ed. Ashgate Publishing, Aldershot, United Kingdom.

Stanton, N.A., Stevenage, S.V., 1998. Learning to predict human error: issues of acceptability, reliability and validity. Ergonomics 41 (11), 1737–1756.

Stanton, N.A., Young, M.S., 1998. Is utility in the mind of the beholder? A study of ergonomics methods. Appl. Ergon. 29 (1), 41–54.

Stanton, N.A., Young, M.S., 1999. What price ergonomics? Nature 399, 197–198.

Stanton, N.A., Young, M.S., 2003. Giving ergonomics away? The application of ergonomics methods by novices. Appl. Ergon. 34 (5), 479–490.

Vicente, K.J., 1999. Cognitive Work Analysis: Toward Safe, Productive and Healthy Computer-based Work. Lawrence Erlbaum Associates, Mahwah, New Jersey.

Woods, D.D., 1988. Coping with complexity: the psychology of human behaviour in complex systems. In: Goodstein, L.P., Andersen, H.P., Olsen, S.E. (Eds.), Tasks, Errors, and Mental Models. Taylor & Francis, London, pp. 128–148.
