
The Shape of Knowledge

Exploring Two Methods of Usability Testing:
Concurrent versus Retrospective Think-Aloud Protocols

Maaike J. van den Haak & Menno D.T. de Jong


University of Twente
m.j.vandenhaak@utwente.nl, m.d.t.dejong@utwente.nl

Abstract

Think-aloud protocols are commonly used for the usability testing of instructional documents, web sites and interfaces. This paper addresses the benefits and drawbacks of two think-aloud variations: the traditional concurrent think-aloud method and the less familiar retrospective think-aloud protocols. It also offers an outline of a long-term research project designed to empirically investigate the value of both variants. The results of a first comparative study indicate that, although the two methods have distinct differences, they do seem to produce a similar outcome. A more detailed description of the results will be offered during the presentation.

Keywords: usability testing, think-aloud protocols, methodology

0-7803-7949-7/03/$17.00 ©2003 IEEE 285

1. Introduction

In its most common form, usability testing represents either a test situation in which participants are observed while working silently with a particular test object, or a situation in which they simultaneously work and verbalize their thoughts. Even though the tasks involved and the laboratory situation are to a certain degree artificial for both methods, the first one mentioned, working silently, comes closest to regular working procedures. At the same time, the second method, concurrent thinking aloud, has a clear benefit in that it allows insight into the participants’ thinking process. This would seem to result in a more complete overview of the user problems encountered: in addition to the observable problems (which constitute deviations from the optimum working procedure), the participants’ verbalizations may reveal any doubts, irritation, surprise or other feelings that arise during the process. On the other hand, the very fact that the participants verbalize their thoughts may also cause reactivity, i.e. participants may work differently from usual as a result of their thinking aloud. This difference may lead to a better or a worse performance, neither of which is desirable: in the first case, potential user problems do not come to light, while in the second case false alarms may be generated. According to Ericsson and Simon, who discuss think-aloud protocols for investigating cognitive processes, the risk of reactivity can be largely eliminated if strict guidelines are observed [1]. However, in the context of usability testing, the potential bias of concurrent thinking aloud has received little scientific attention.

An alternative approach concerns the use of retrospective think-aloud protocols. This method involves participants first carrying out their tasks silently, after which they verbalize their thoughts in retrospect. In some cases, this retrospective verbalization takes place without any stimuli, which is likely to have a negative effect on the exhaustiveness of the comments produced [2-4]. In other cases, however, the retrospective verbalizations are supported by a recording of the performance. Nielsen, for instance, recommends using a video recording [5]; Henderson et al. used computer log files [6]. When verbalization is accompanied by stimuli, the retrospective think-aloud method potentially combines the benefits of both working silently and thinking aloud. All the same, it remains to be seen whether participants are indeed able to remember everything they thought during their task performance. What is more, they might actually come up with invented thoughts. Again, however, there is little empirical evidence with regard to the validity of the method in question.

In sum, it is clear that more research into the methodology of usability testing (and formative evaluation in general) is desirable. In an earlier overview of available research, De Jong and
Schellens show that there is a discrepancy between the popularity of think-aloud usability testing and the scientific attention that has been paid to the reliability and validity of the method [7]. To make up for this lack of attention, a research project was initiated which aims to shed light on the benefits and drawbacks of the three methods mentioned, i.e. participants working silently, thinking aloud concurrently, or thinking aloud retrospectively. This paper first offers a brief overview of the available research in the context of usability testing. It then provides an outline of the current research project.

2. Research available

The literature on usability testing tends to describe concurrent and retrospective think-aloud protocols as equal alternatives [5]. So far, only two studies have compared concurrent and retrospective think-aloud protocols.

Hoc and Leplat used the two types of think-aloud protocols to investigate the problem-solving process of participants, who had to order a set of letters on a computer screen using a limited set of commands [8]. In the retrospective condition, participants were first asked to give an unaided account of their process, after which they had to think aloud while watching all the steps in the process, which had been recorded in a computer log file. The authors conclude that unaided retrospective accounts should be avoided because of the distortions and gaps in the protocols, but that retrospective and concurrent think-aloud protocols produce similar results. It should be noted, however, that both the task given to the participants (which more or less resembled a logical puzzle) and the analysis of the results (focusing more on strategies than on problems encountered) do not correspond to the situation of usability testing.

Bowers and Snyder compared the two think-aloud variations in a usability test focusing on the handling of multiple windows on a computer screen [9]. They found no significant differences regarding task performance and task completion time, but the retrospective think-aloud condition resulted in considerably fewer verbalizations, and these were often of a different type than the concurrent verbalizations, focusing more on explanations and less on procedures. While these results are interesting, the study has a serious drawback in that it does not report on the number and kinds of problems detected by the participants in the two think-aloud conditions. As problem detection is typically one of the most important functions of usability testing, this meant that a crucial aspect was not included in the comparison of the two methods.

3. Outline of research project

In a long-term research project, the merits and restrictions of the methods discussed above will be investigated using different test objects.

3.1 Research questions

Several aspects will be taken into account when comparing the three methods. Given the goals of usability testing, the most important aspect is the outcome in terms of problem detection. Both the number and the nature of the problems will be considered. To investigate the nature of the problems, a typology of problems will be developed. One issue to consider here is the manner of problem detection, i.e. by means of observation, verbalization, or both. Another is the kind of problems detected, e.g. problems relating to layout, terminology, etc.

A second important aspect to consider with regard to the comparison of the methods is the participants’ task performance. This is essential for investigating the reactivity of concurrent think-aloud protocols. In an ideal situation, one would expect participants in a concurrent think-aloud condition to be as successful as participants working silently.

The third aspect for consideration involves the participants’ experiences during the test, i.e. how they felt about the test situation, carrying out the tasks, and thinking aloud (retrospectively).

In sum, three research questions will be addressed:

- Do the methods differ in terms of the numbers and types of usability problems detected?
- Do the methods differ in terms of task performance?
- Do the methods differ in terms of participant experiences?

3.2 Research design

The research questions will be investigated by means of three test objects: an online library catalogue, a household appliance plus manual,
and a web site. For each test object, a set of realistic user tasks will be formulated, which will be handed out to 20 participants per condition. The participant sessions will be held individually and will be recorded on videotape. Each session ends with a questionnaire containing questions on participant experience. Once the sessions are over, all recordings will be analyzed with a view to the problems detected and overall successful task performance.

3.3 A first study

A first study involved the testing of an online library catalogue. Its results indicated that the participants’ verbalizations indeed resulted in more problem detections, compared to participants working silently. The extent to which these verbalizations complement the observable usability problems differs between the concurrent and the retrospective think-aloud conditions: the added value of the verbalizations was more substantial in the retrospective think-aloud method. Overall, the two think-aloud methods resulted in similar numbers and types of problems.

One of the most striking results of our first study is that the participants in the concurrent think-aloud condition performed less successfully than the participants who worked silently and verbalized in retrospect. This result is reflected not only in the number of observable problems per participant, but also in the overall success rate for the tasks. This may point to a certain degree of reactivity within the concurrent think-aloud method. A possible explanation lies in the workload of the participants, which, together with the requirement to think aloud, may have had a negative effect on task performance.

4. References

[1] Ericsson, K.A., and H.A. Simon. Protocol Analysis: Verbal Reports as Data. MIT Press, Cambridge, MA, 1993.

[2] Branch, J.L. Investigating the information-seeking processes of adolescents: The value of using think alouds and think afters. Library & Information Science Research, 22: 371-392, 2000.

[3] Kuusela, H., and P. Paul. A comparison of concurrent and retrospective verbal protocol analysis. American Journal of Psychology, 113: 387-404, 2000.

[4] Taylor, K.L., and J.P. Dionne. Accessing problem-solving strategy knowledge: The complementary use of concurrent verbal protocols and retrospective debriefing. Journal of Educational Psychology, 92: 413-425, 2000.

[5] Nielsen, J. Usability Engineering. Academic Press, Boston, MA, 1993.

[6] Henderson, R.D., et al. A comparison of the four prominent user-based methods for evaluating the usability of computer software. Ergonomics, 38: 2030-2044.

[7] De Jong, M., and P.J. Schellens. Toward a document evaluation methodology: What does research tell us about the validity and reliability of methods? IEEE Transactions on Professional Communication, 43: 242-260, 2000.

[8] Hoc, J.M., and J. Leplat. Evaluation of different modalities of verbalization in a sorting task. International Journal of Man-Machine Studies, 18: 283-306, 1983.

[9] Bowers, V.A., and H.L. Snyder. Concurrent versus retrospective verbal protocols for comparing window usability. Human Factors Society 34th Meeting, 8-12 October 1990 (Santa Monica: HFES), 1270-1274, 1990.

About the Authors

Maaike van den Haak is a part-time PhD candidate at the University of Twente (The Netherlands). Her PhD research focuses on the merits and drawbacks of (variants of) the think-aloud method as an evaluation tool for instructive communication. Apart from her position at the University of Twente, she is also a part-time teacher of English at the Vrije Universiteit, Amsterdam (The Netherlands).

Menno de Jong is an associate professor of Communication Studies at the University of Twente (The Netherlands). His research interests concern the use and methodology of applied research to optimize communication. He has published on text and web evaluation, usability, and document design. He was co-editor of a special issue of IEEE Transactions on Professional Communication on document evaluation and usability testing.

