
Training Cognition
Optimizing Efficiency, Durability, and Generalizability

Edited by
Alice F. Healy and
Lyle E. Bourne, Jr.
First published 2012
by Psychology Press, part of the Taylor & Francis Group
711 Third Avenue, New York, NY 10017

Simultaneously published in the UK
by Psychology Press
27 Church Road, Hove, East Sussex BN3 2FA

Psychology Press is an imprint of the Taylor & Francis Group, an informa business

© 2012 Taylor & Francis, LLC

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging in Publication Data

Training cognition : optimizing efficiency, durability, and generalizability / edited by Alice F. Healy, Lyle E. Bourne Jr.
p. ; cm.
ISBN 978-1-84872-950-6
1. Cognitive styles. 2. Employees—Training of. I. Healy, Alice F.
II. Bourne, Lyle Eugene, 1932–
[DNLM: 1. Cognitive Science—education. 2. Cognitive Science—
methods. 3. Cognition. 4. Models, Educational. 5. Task Performance
and Analysis. ]
BF311.T685 2012
153—dc23
2011051388

ISBN: 978-1-84872-950-6 (hbk)
ISBN: 978-0-203-81678-3 (ebk)

Typeset in Bookman
by EvS Communication Networx, Inc.
Contents

Preface vii

1 Introduction: Training and Its Cognitive Underpinnings 1
LYLE E. BOURNE, JR. AND ALICE F. HEALY

2 Empirically Valid Principles of Training 13
ALICE F. HEALY, VIVIAN I. SCHNEIDER, AND LYLE E. BOURNE, JR.

3 Basic Research on Training Principles 40
ALICE F. HEALY AND LYLE E. BOURNE, JR.

4 Attention and Cognitive Resource Load in Training Strategies 67
CHRIS WICKENS, SHAUN HUTCHINS, TOM CAROLAN, AND JOHN CUMMING

5 Acquisition and Transfer of Basic Skill Components 89
ROBERT W. PROCTOR, MOTONORI YAMAGUCHI, AND JAMES D. MILES

6 How Cognitive Ability and Automation Influence Training Performance and Transfer 112
ERIC D. HEGGESTAD, BENJAMIN A. CLEGG, ADRIAN GOH, AND ROBERT S. GUTZWILLER

7 Conducting Technology-Based Applied Training Research 134
STEPHEN L. GOLDBERG AND PAULA J. DURLACH

8 A New Taxonomy for Training 156
WILLIAM D. RAYMOND, ALICE F. HEALY, AND LYLE E. BOURNE, JR.

9 Cognitive Models of Training Principles and the Instance-Based Learning Tool 181
CLEOTILDE GONZALEZ

10 Modeling Cognitive Tasks in IMPRINT 201
CAROLYN J. BUCK-GENGLER, WILLIAM D. RAYMOND, ALICE F. HEALY, AND LYLE E. BOURNE, JR.

11 Evaluation and Comparison of Models of Human Performance During Training 225
BENGT FORNBERG, WILLIAM D. RAYMOND, CAROLYN J. BUCK-GENGLER, ALICE F. HEALY, BRADLEY J. BEST, AND LYLE E. BOURNE, JR.

12 A Compact Mathematical Model for Predicting the Effectiveness of Training 247
MATT JONES, LYLE E. BOURNE, JR., AND ALICE F. HEALY

13 Put the SPRINT in Knowledge Training: Training with SPacing, Retrieval, and INTerleaving 267
MARK MCDANIEL

14 Training for Real-World Job Performance 287
IMMANUEL BARSHI AND LOUKIA LOUKOPOULOS

15 Cognitive Retraining Following Acquired Brain Injury 307
KEITH R. LOHSE AND LYLE E. BOURNE, JR.

16 Conclusions 326
LYLE E. BOURNE, JR. AND ALICE F. HEALY

Index 337
Preface

Training is both a teaching and a learning experience, and just about everyone has had that experience. Training involves
acquiring knowledge and skills. This newly acquired training
information is meant to be applicable to specific activities, tasks,
and jobs. In modern times, when jobs are increasingly complex, training workers to perform successfully is more important than ever. The range of contexts in which training
is required includes industrial, corporate, military, artistic, and
sporting, at all levels from assembly line to executive function.
The required training can take place in a variety of ways and
settings, including the classroom, the laboratory, the studio, the
playing field, and the work environment itself.
The general goal of this book is to describe the current state of
research on training using cognitive psychology to build a com-
plete empirical and theoretical picture of the training process.
The book focuses on training cognition, as opposed to physical
or fitness training. It attempts to show how to optimize training
efficiency, durability, and generalizability. The book includes a
review of relevant cognitive psychological literature, a summary
of recent laboratory experiments, a presentation of original the-
oretical ideas, and a discussion of possible applications to real-
world training settings.
The initial interest of the editors in training was piqued by an
opportunity to investigate the long-term retention of skills under
a contract from the Air Force Human Resources Laboratory given
in 1985. For this contract, the existing literature on skill mainte-
nance over time was reviewed and new experimental methodolo-
gies to study this process were developed. This effort led to the
formulation of a broader research program on ways to optimize
the long-term retention of skills guided by both a structural
and an analytic approach to skilled performance. Work on this
program benefited from the collaboration of Anders Ericsson,
who was formerly at the University of Colorado (CU). Support for
this particular effort was provided by the Army Research Insti-
tute (ARI) and led to the publication of a research monograph,
Learning and Memory of Knowledge and Skills: Durability and
Specificity, published by Sage (Healy & Bourne, 1995). Over the
ensuing years, this program has expanded to include training
of foreign language, navigation, and first responder activities,
among other job-related behaviors. This work has been sup-
ported financially by various funding agencies, including not
only the ARI, but also the Army Research Office (ARO), the Naval
Training Systems Center, the National Science Foundation, and
the National Aeronautics and Space Administration (NASA). In
addition to many journal articles and book chapters, a volume,
focusing specifically on Foreign Language Learning: Psycholin-
guistic Studies on Training and Retention, was published by Erl-
baum (Healy & Bourne, 1998). All of this research culminated in
a major 5-year Multidisciplinary University Research Initiative
(MURI) grant from the ARO (“Training Knowledge and Skills for
the Networked Battlefield” W911NF-05-1-0153), which enabled
the enlistment of a number of collaborators at various academic
institutions across the United States. The MURI coinvestigators
include several authors of chapters in this book. Among them
are Benjamin Clegg (Colorado State University), Bengt Forn-
berg (CU), Cleotilde Gonzalez (Carnegie Mellon University), Eric
Heggestad (University of North Carolina, Charlotte), and Rob-
ert Proctor (Purdue University). This collaboration permitted an
expansion of this research program beyond the cognitive psy-
chological laboratory into more theoretical domains including
computational modeling and taxonomic analysis. Accomplish-
ments under the MURI grant constitute a major portion of the
work reported in this volume.
This volume also benefited from contributions made by a
number of investigators who were consulted during the course
of the MURI. Among them are CU colleagues Carolyn Buck-
Gengler, Matt Jones, William Raymond, Vivian Schneider, and
Christopher Wickens. Outside of CU, assistance was provided by
Immanuel Barshi (NASA), Stephen Goldberg and Paula Durlach
(ARI), and Mark McDaniel (Washington University, St. Louis).
Contacts with these contributors were facilitated by the estab-
lishment of the Center for Research on Training at CU in 2006.
Among other things, the Center has sponsored a series of dis-
tinguished lectures, and Wickens (in 2006), McDaniel (in 2008),
and Barshi (in 2010) each gave one of these lectures.
A major challenge facing the military, industry, arts, sports,
and other skill-based domains is how most effectively to intro-
duce trainees to their task requirements and to bring them to
a level of performance that permits them to use the technology
available and to assimilate, as needed, information from multi-
ple sources into their activities and decisions. Concerns include
the amount of time and equipment required for initial training,
for transfer to related equipment or instruments and tasks, and
for retention of the trained knowledge and skills over periods
of disuse. For example, in industry workers need to be trained
in the operation of equipment, such as a milling machine, for
production purposes. In the corporate world, financial plan-
ners need to be trained in the use of computing applications for
spreadsheets. The arts present an especially interesting example
for this book because of its relevance to a distinction between
education and training. One can be highly knowledgeable about
the arts without necessarily being a performing artist. The dif-
ference here hinges on being well-informed about a particular
domain, say, opera, as a music critic is likely to be, versus being
able to perform in that domain, say, credibly sing an aria, as a
diva might. This is the difference between art appreciation and
artistic performance. The former is the target of an educational
process, and the focus is on declarative or factual knowledge
regarding a given domain. The latter entails a training process
in which procedural knowledge or skill development is the tar-
get. The difference between education and training is not exclu-
sive, of course. Education in art appreciation involves some
amount of skill, and training for artistic performance requires a
fair amount of factual knowledge. The difference is rather a mat-
ter of emphasis and, as suggested elsewhere, the focus in educa-
tion is on knowledge acquisition, whereas the focus in training
is on skill acquisition.
This volume confronts the training challenge. It attempts to
provide possible solutions to training problems, based on cog-
nitive psychological research. The effort has been driven and
guided by three major goals: constructing a framework for train-
ing, predicting the effects of training methods on various jobs,
and optimizing training outcomes. The work to be reported falls
into three interrelated categories: empirical research on training
principles, a taxonomy of training and jobs, and development of
computational models to capture human performance.
Other topics addressed in this volume are: (a) an overarch-
ing quantitative theoretical framework for training that encom-
passes the basic cognitive processes and the empirically valid
training principles; (b) ways in which attentional processes
enter into training to affect acquisition, retention, and trans-
fer of knowledge and skills; (c) the distinction, as introduced
above, between education and training and the possibility that
different principles might apply in the two cases; and (d) train-
ing as it occurs and might be improved in everyday modern life.
It is important to acknowledge the contributions to this work
made by persons who do not appear as authors in the volume.
In particular, individuals from various granting and contracting
agencies encouraged and supported these research efforts over
the last 25 years. These supporters include Martha Polson (Air
Force Human Resources Laboratory); Judith Orasanu, Michael
Kaplan, Bob Wisher, George Lawton, Michael Drillings, Joseph
Psotka, Jonathan Kaplan, and Paul Gade (all of the U.S. Army
Research Institute); Ray Perez (Office of Naval Research); John
Lavery and Elmo Schmeisser (both of the U.S. Army Research
Office); and John Lockett (U.S. Army Research Laboratory). The
contributions of individual research collaborators to grant or
contract proposals (Ron Laughery, Micro Analysis and Design)
and to the completed laboratory research (most importantly
James Kole and Erica Wohldmann) are also acknowledged with
gratitude.
Finally, other people and sources of support contributed to
some of the individual chapters, as listed below.
Chapter 2: Work on this chapter was supported by grants
NNA07CN59A and NNX10AC87A from the National Aero-
nautics and Space Administration to the University of
Colorado.
Chapter 3: The following collaborators, whose data are
reported here, are thanked: Lindsay Anderson, Bill Bonk,
Carolyn Buck-Gengler, Varun Dutt, Deanna Fierman,
Coty Gonzalez, Shaw Ketels, James Kole, Blu McCor-
mick, Vicki Schneider, Chris Wickens, and Dusty Young.
Chapter 4: This work was supported by the U.S. Army
Research Institute under Contract #: W91WAW-
09-C-0081 to Alion Science and Technology titled
“Understanding the Impact of Training on Performance.”
Chapter 5: Motonori Yamaguchi is now at the Department
of Psychology, Vanderbilt University, and James D. Miles
is now at California State University Long Beach.
Chapter 9: Thanks are due to Ripta Pasay for his program-
ming effort in the IBLTool.
Chapter 13: Preparation of this chapter and its related
cited references for research conducted in the middle
school were supported in part by Grant R305H060080-
06 awarded to Washington University in St. Louis from
the U.S. Department of Education, Institute of Educa-
tion Sciences. Doug Rohrer provided helpful comments
on a previous version of this chapter, Rebecca Roediger
and Tammy Sonn provided assistance with medical-
training examples, and Ron King provided the account-
ing example.
Chapter 14: The writing of this chapter was supported in
part by a grant from the National Aeronautics and Space
Administration to the San Jose State University (grant
number NNX08AX13A).
Finally, the Editors would like to acknowledge the support
and encouragement they received throughout this project from
their families, Bruce and Charlotte Healy and Alice’s mother
Doris Fenvessy, and Rita Yaroush Bourne and the entire loyal
Bourne clan, Barbara, Betsy, and Andrew. The work would not
have been accomplished without them.
Editor Bourne expresses his own special gratitude to for-
mer student and constant mentor, Edward Hess, who is now
Professor of Business Administration and Batten Executive-in-
Residence, Darden School of Business, University of Virginia.
Ed has shared his vast knowledge of how to train young entre-
preneurs for success in business, about which he knows just
about all there is to know. Many interactions over the years have generated important insights into the general training process, insights that appear in various forms throughout this book. Editor
Healy expresses her own special gratitude to her mentor, the
late William K. Estes, who was a constant source of guidance
and inspiration.
Alice F. Healy and Lyle E. Bourne, Jr., Editors

References
Healy, A. F., & Bourne, L. E., Jr. (Eds.). (1995). Learning and memory
of knowledge and skills: Durability and specificity. Thousand Oaks,
CA: Sage.
Healy, A. F., & Bourne, L. E., Jr. (Eds.). (1998). Foreign language learn-
ing: Psycholinguistic studies on training and retention. Mahwah, NJ:
Erlbaum.
1 Introduction
Training and Its Cognitive
Underpinnings
Lyle E. Bourne, Jr. and Alice F. Healy
University of Colorado

The major challenge facing the military, industry, sports, and other skill-based domains is how to train people for successful
performance on the job. Training often involves the use of new
technology and ways to incorporate information from multiple
sources into activities and decisions. Critical questions relate to
how much time and effort are required to achieve a criterion of
performance, how to ensure transfer of training to related equip-
ment and tasks, and how to promote retention of the trained
knowledge and skills during periods of disuse. The chapters
in this volume will summarize and highlight some of the find-
ings and products of the research conducted by the authors on
these questions as well as related research by other investiga-
tors. Emphasis in the volume is on principles of training that
are applicable to a variety of instructional and training settings.
Keep in mind that the research to be reviewed applies primarily
to “mental” training; that is, training the knowledge and skills
that permit successful performance. “Physical” training to pre-
pare the body for successful performance (e.g., strengthening
muscles or lung capacity) is largely outside the scope of this
volume.
Basic empirical and theoretic research in three interrelated
categories will be addressed in the following chapters. The first
category contains experiments designed to identify basic com-
ponents of skill and to develop and test training principles for
skills applicable to military and other settings, to examine atten-
tional processes and the impact of technology in training, and to
study effects on training of levels of automation across a range
of individual differences in ability. The second category includes
a training taxonomy, consisting of a four-dimensional decom-
position by training methods, task types, performance mea-
sures, and training principles, which can be used to extend the
knowledge gained from experimental research to performance
in other more naturalistic settings. The third category is the
development of computational models that capture human per-
formance of a variety of tasks and can provide quantitative pre-
dictions for training of different components of tasks and jobs.
An evaluation and comparison of those models, along with ways
to optimize the predictability of those models, is also included,
as is a description of an overarching quantitative framework for
training research and practice.
Underlying the research in all of these categories are three
major goals. These are (a) to construct a theoretical and empiri-
cal framework for training in any of a variety of applied set-
tings, (b) to predict the outcomes of different training methods
on particular tasks or jobs, and (c) to point to ways to optimize
training.

Important Distinctions

Training and Education


To understand the process of training, several important dis-
tinctions need to be considered: Foremost among these is the
difference between training and education. Both training and
education involve the acquisition of new knowledge and skills.
The focus in education is on general domains, such as history,
physics, and statistics and the unique knowledge and skills rel-
evant to those domains. In training the focus is narrower and
pertains to particular jobs, tasks, or activities, such as piloting
a plane, entering data into a computer, or dart throwing, which
require their own unique and relatively specific set of knowledge
and skills. The principles of training are not necessarily the same
as the principles of education, although there is undoubtedly a
good deal of overlap (see chapter 13). Both training and educa-
tion represent a transaction between teachers, who try to impart
knowledge, and students, who are the recipients of that knowl-
edge. The principles of training considered in this volume recog-
nize that relationship and apply to both teachers and students.

Acquisition, Retention, and Transfer


There are three fundamental cognitive components of training.
The first component is acquiring new knowledge and skill. Nor-
mally the acquisition process depends upon repeated exposure to
and practice of the knowledge and skills to be learned. The second
component is retention. It is not sufficient for successful train-
ing merely to produce new knowledge and skills; what is learned
must be retained over time, sometimes without further exposure
or practice. Both knowledge and skills are susceptible to being
forgotten, so understanding the retention process is necessary
for effective training. The third component is transfer. Often the
context in which knowledge and skills are acquired differs from
those in which they are eventually expressed. Thus, understand-
ing the transferability or generality of acquired knowledge and
skills to new contexts is crucial for training to be satisfactory.
These three cognitive components are in some sense indepen-
dent of one another, taking place over different time intervals
and in different environments. In fact, it has been shown that
optimizing one of these components does not necessarily opti-
mize the other two. For example, conditions that produce rapid
skill or knowledge acquisition do not necessarily produce better
retention or transferability of what has been learned (Schmidt &
Bjork, 1992; Schneider, Healy, & Bourne, 2002). Indeed, speed-
ing up the acquisition process sometimes results in poor memory
for the learned material and weak generalization of that material
to new environments or tasks. Likewise, maximal retention does
not necessarily ensure maximal transfer (Healy, 2007). Thus, it
is important for any psychological understanding of training to
take into account all three cognitive components. Optimal train-
ing maximizes simultaneously the efficiency, the durability, and
the generalizability of new knowledge and skills. One purpose
of this volume is to elucidate the training optimization process.

Acquisition: Power Law of Acquisition. There are two fundamental measures of performance in any task: accuracy
and speed of responses. With respect to response speed, Newell
and Rosenbloom (1981) have argued that the power law of
acquisition describes the acquisition process for most skills.
This law formalizes the relationship between trials of practice
and time to make a correct response as a power function,
R = aN^(-b), where R is response time on trial N, a is response
time on trial 1, and b is the rate of change. It follows that the
relationship between response time and trial number is linear in
log-log coordinates, log R = log a – b log N. In some cases, where
more than one strategy can be used in the task, separate power
functions might apply to the different strategies (Delaney, Reder,
Staszewski, & Ritter, 1998; Rickard, 1997). The power law of
acquisition affords a way of predicting performance in a variety
of tasks as a function of degree of practice (for possible exceptions
see Leibowitz, Baum, Enden, & Karniel, 2010; Roediger, 2008).
With respect to response accuracy, a similar function might
apply (e.g., Bourne, Healy, Parker, & Rickard, 1999) although a
power function has not been as well established for these data.
In some cases, speed and accuracy are not positively correlated
(e.g., Pachella, 1974). People sometimes trade speed for accuracy
or vice versa. Likewise, the speed of executing the different
steps of a complex task may not be positively correlated, with
people slowing down on one step in order to be faster on another
step (Healy, Kole, Buck-Gengler, & Bourne, 2004; Kole, Healy,
& Bourne, 2008). In these cases, the power law of acquisition
might not be a good description for all measures. The implication
is that, for optimal training, instructors need to be aware of the
various steps in any task as well as whether speed or accuracy is
more important in each step, so that the more important aspect
can be emphasized in training.
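The shape of the power law of acquisition, R = aN^(-b), can be illustrated with a short numerical sketch. The parameter values a and b below are illustrative assumptions, not estimates from any study discussed in this volume:

```python
import math

def power_law_rt(n, a=2.0, b=0.3):
    """Predicted response time on practice trial n: R = a * n**(-b).

    a is response time on trial 1 and b is the rate of change;
    both values here are illustrative assumptions.
    """
    return a * n ** (-b)

# Response time drops steeply at first, then flattens with practice.
for n in (1, 10, 100):
    print(n, round(power_law_rt(n), 3))  # 2.0, 1.002, 0.502

# In log-log coordinates the function is a straight line with
# slope -b, since log R = log a - b log N.
slope = (math.log(power_law_rt(100)) - math.log(power_law_rt(10))) / (
    math.log(100) - math.log(10)
)
print(round(slope, 3))  # -0.3
```

The slope recovered from any two points on the curve equals -b, which is why plotting practice data in log-log coordinates is the standard diagnostic for power-law learning.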

Retention: Power Law of Forgetting. With the passage of time and the lack of opportunity to rehearse or refresh acquired
knowledge or skills, performance declines and reflects failure
to retain information. This decline in performance, exhibited
in increased response time (or decreased accuracy), has
been known since the time of Ebbinghaus (1885/1913). The
relationship between response time and retention interval has
also been described as a power law by Wickelgren (1974), R = d + fT^(-g), where R is response time, T is the retention interval, d is
the criterion of original learning, f is a scaling parameter, and
g is the rate of forgetting. The power law of forgetting (Wixted &
Carpenter, 2007; see also Rubin & Wenzel, 1996) can be thought
of as the inverse of the power law of acquisition.
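The forgetting function R = d + fT^(-g) can be sketched the same way. The parameter values d, f, and g below are illustrative assumptions; the point is only that the power term decays toward the asymptote d set by original learning as the retention interval grows:

```python
def forgetting_fn(t, d=0.5, f=1.0, g=0.4):
    """Wickelgren-style power function R = d + f * t**(-g).

    d is the criterion of original learning, f a scaling parameter,
    and g the rate of forgetting; the values here are illustrative.
    """
    return d + f * t ** (-g)

# The power-term contribution f * t**(-g) shrinks toward 0 as the
# retention interval t grows, so R approaches the asymptote d.
for t in (1, 10, 100):
    print(t, round(forgetting_fn(t), 3))  # 1.5, 0.898, 0.658
```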

Transfer: Laws Relating to Similarity. Training on a particular task has implications for performance on other related tasks.
The effect of training on one task can be either positive
(facilitation) or negative (interference) on performance of another
task. Transfer is the term for the acquisition of one task affecting
performance on another. The major variable that determines the
extent and direction of transfer is similarity between the two
tasks. Osgood (1949) has conceptualized this relationship in
the form of a transfer surface, which relates transfer magnitude
both to response similarity and to stimulus similarity between
the training and the transfer tasks. When the stimuli in the
two tasks are varied in their similarity and the responses are
held constant, positive transfer is obtained, with its magnitude
increasing as the similarity between the stimuli increases.
On the other hand, when the stimuli are held constant and
the responses are varied in their similarity, negative transfer
is obtained, with its magnitude decreasing as the similarity
between the responses increases. Finally, when both the stimuli
and responses are simultaneously varied in their similarity,
negative transfer is obtained, with its magnitude increasing as
the similarity between stimuli increases. Shepard (1987) has
given a quantitative expression to such similarity functions,
which he refers to as a universal law of generalization (see
chapter 12).
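Shepard's universal law holds that generalization between stimuli falls off approximately exponentially with the psychological distance separating them. A minimal sketch of that relationship follows; the decay constant k is an illustrative assumption, not a value from the text:

```python
import math

def generalization(distance, k=1.0):
    """Shepard's universal law of generalization: g(d) = exp(-k * d),
    where d is the psychological distance between the training and
    transfer stimuli. The decay rate k is an illustrative assumption.
    """
    return math.exp(-k * distance)

# Identical stimuli (distance 0) generalize fully; generalization
# decays smoothly as the transfer stimulus grows less similar.
for d in (0.0, 1.0, 2.0):
    print(d, round(generalization(d), 3))  # 1.0, 0.368, 0.135
```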

Declarative and Procedural Information


The principles discussed in this volume apply to both knowledge
and skill training. Knowledge consists of facts, discriminations,
and concepts about a domain, which are generally a part of
a person’s awareness about a given task. In contrast, skills
consist of knowing how to use those facts, which might be, in
some sense, outside of a person’s awareness or consciousness.
For example, in statistics, knowledge includes the fact that the
standard deviation is a measure of data dispersion, whereas
skills include executing the sequence of steps needed to compute
a standard deviation in a data set. Both knowledge and skills
are hierarchical and are logically linked together; facts at every
level of abstraction are associated with procedures for using
them. Note that training applies primarily to skill learning,
whereas education emphasizes fact learning, although both fact
and skill learning are involved in both training and education.
Knowledge acquired during training pertains to a particular
domain or task, which can be as broad as American history or
as narrow as a list of nonsense syllables. Facts are sometimes
referred to in psychological theory as declarative information,
which can be specific to a particular time or place, in which
case it is often referred to as episodic. For example, we probably
remember what we had for dinner last night. That is a fact,
and it is tied to a very specific episode in one’s life. In contrast,
declarative information might be more general in that it is not tied
to a particular time and place, in which case it is often referred
to as semantic (Tulving, 1985). For example, most of us know
that George Washington was the first president of the United
States, but we may not remember when we first encountered
that fact or the circumstances in which we heard it; and if so,
then the fact is considered to be semantic.
Skills, in contrast, relate to things we know how to do.
Skills, like facts, can be either elementary like walking or more
complex like figure skating. But, like facts, skills are not innate,
but rather require experience, usually in the form of repetition
or practice. Skills are sometimes referred to in psychological
theory as procedural information. An additional distinction to
keep in mind with respect to skills is that some are continuous
and some are discrete (e.g., Schmidt, 1975; Wulf & Schmidt,
1997). Continuous skills are executed without interruption over
a time interval, as in a tracking task. Discrete skills, in contrast,
consist of a sequence of steps usually executed in a fixed order,
as in administering CPR.
It should be noted that there is some inconsistency in the
literature in the use of the term procedure. In some cases,
procedure is restricted to a discrete skill, consisting of a series
of steps. In other cases, though, the term procedure is used
interchangeably with the term skill and encompasses both
discrete and continuous activities. In this volume, skill will be
used as a term to refer to both continuous and discrete activities.
There are occasions, however, where the term procedure is
also used with the more restricted meaning, to preserve the
intention of the relevant investigators. In addition, some theories
(Anderson, 1982) employ special definitions of declarative and
procedural information, relating them to different stages in the
skill acquisition process (see chapter 6). The different uses of
the term procedure and of the distinction between procedural
and declarative information will be made clear in the context of
discussions in the various chapters of the volume.
Even though procedural and declarative information have
been described here as separate constructs, as noted above, the
two are very closely linked in behavior (e.g., Kolers & Roediger,
1984). No act of behavior can be completely described or
understood without some statement regarding both the facts and
the skills of the behaving individual. A fact without a skill or a
skill without facts cannot be expressed in behavior. In training,
therefore, attention needs to be focused on both the basic facts
that relate to the task, job, or domain and on the skills required
to implement those facts. A trainee in any new job will acquire a
related set of facts and skills required for successful performance.
Nevertheless, as will be evident in chapter 2, the analysis of tasks
or jobs into procedural and declarative components is useful
in understanding the basic cognitive processes of acquisition,
retention, and transfer of training.
A related distinction to keep in mind in any discussion of
training is the difference between implicit and explicit learning.
Implicit learning usually refers to the acquisition of skill or
procedures, which is often accomplished by repetition and practice
and does not necessarily involve intention to learn. Furthermore,
the skill that results from implicit learning may appear to be
unconscious and seems often to be applied automatically. In
contrast, explicit learning usually refers to the acquisition of facts
or new associations. Explicit learning is generally accomplished
intentionally by instruction, is applied consciously, and may not
require repetition for its acquisition. Facts are acquired explicitly
and rapidly, perhaps by an all-or-none process. Further, facts may
be rapidly forgotten. If they are available, however, they transfer
broadly across new situations (e.g., Postman & Underwood, 1973).
In contrast, skills are acquired implicitly but slowly. They gain
strength with practice and repetition. Once acquired, skills are
well retained but transfer minimally to new situations (Ivancic &
Hesketh, 2000; Lee & Vakoch, 1996; Maxwell, Masters, Kerr, &
Weedon, 2001). These empirical findings are captured within the
procedural reinstatement training principle, to be discussed in
chapter 2. Although these distinctions between declarative and
procedural information, on one hand, and between explicit and
implicit learning, on the other hand, are clearly connected to one
another, the procedural/declarative distinction is descriptive,
whereas the implicit/explicit distinction is more often used as
a way of explaining behavior rather than describing it. Thus,
the explicit/implicit distinction appears in various theoretical
accounts of human behavior (e.g., Graf & Masson, 1993; Reber,
1967), and is subject to considerable debate among theorists. This
debate will be discussed in subsequent chapters of the volume.

Principles, Guidelines, and Specifications


Training in the real world can be optimized by the implementation
of empirically valid training principles. The basic principles
and their empirical support will be reviewed in chapter 2. But
principles of training are typically abstract and not directly
connected to particular jobs or tasks. What is then required is
a way to translate basic principles into actual training routines.
This translation process has been described by Salas, Cannon-
Bowers, and Blickensderfer (1999), who introduced a distinction
among principles, guidelines, and specifications for training.
Principles, guidelines, and specifications all relate to how
training is best accomplished. In effect, they provide a conduit
between training theory and training practice. A principle, which
is the level addressed in chapter 2, is an underlying truth or fact
about human behavior. A guideline, in contrast, is a description
of actions or conditions that, if correctly applied, could improve
training. A specification is a detailed, precise statement of how
training should be conducted, operationalizing training
guidelines within a training program. A consideration of training
principles, thus, provides only an initial step toward designing
training programs that can optimize on-the-job performance.
Additional developmental or applied research, mostly beyond the
scope of the present volume, will be required to translate training
principles into guidelines and, subsequently, to specifications.

Content of Chapters
The purpose of this volume is to present the current state of
knowledge about how to train people mentally to perform jobs
and tasks in the real world. The plan is to address the process of
training from a number of vantage points, all based on research
in cognitive psychology. The effort begins, in chapter 2, with
a review of principles of training that have strong empirical
validation. Each principle is defined, its evidential support is
described, and possible guidelines for its use are suggested.
Chapters 3 through 7 describe a set of interrelated original
research projects designed to produce new empirical evidence
on training. Specifically, chapter 3 summarizes research aimed
at identifying and empirically supporting training principles
for procedural and declarative memory skills. Experiments are
presented illustrating a range of issues, including (a) generality
across tasks of individual principles, (b) tests of multiple
principles in a single task, (c) tests of principles in complex,
dynamic environments, and (d) development and testing of
new principles. Chapter 4 discusses the role of attention in the
training process. It identifies a number of component aspects of
attention, such as selectivity (i.e., filtering incoming information
from the environment), resource allocation (i.e., how to deploy
attention to meet the specific demands of a target task or job),
and effort investment (i.e., the role of motivation in attention),
and it demonstrates their contribution to the training process.
Chapter 5 summarizes research on response selection and
stimulus-response compatibility as contributing to speed and
accuracy in rudimentary binary choice tasks. The purpose is to
elucidate the most fundamental cognitive processes involved in
skill acquisition, retention, and transfer during training. Chapter
6 addresses issues related to level of automation in the training
process, individual differences in cognitive aptitude, and how
these factors interact to influence the effectiveness of training.
Finally, chapter 7 describes research that has investigated
methods for producing effective training using simulation and
other technologies. It focuses on technology-based training for
both individuals and collectives, especially in military settings.
The next set of chapters (chapters 8 through 12) focuses on
training theory, rather than on new empirical data. Specifically,
chapter 8 provides a taxonomic analysis of tasks, methods,
performance measures, and principles involved in training. The
range of variables that can affect training and the multiplicity
of tasks that may require training prevent an exhaustive
quantification of training effects for specific tasks and training
scenarios. To render the study of training effects tractable and to
guide research, this chapter develops a taxonomy that includes
separate dimensions for task descriptions, training procedures,
training principles, and assessments of performance. The
taxonomy provides a framework by which training effects can be
assessed and predicted componentially for any task. Chapter 9
describes three computational models of training effects using the
ACT-R framework (Anderson & Lebiere, 1998) and the instance-
based learning (IBL) theory (Gonzalez, Lerch, & Lebiere, 2003).
The models address (a) fatigue effects in data entry, (b) stimulus-
response compatibility and Simon effects, and (c) visual detection
in a dynamic environment. A major conclusion from this work
is the robustness of an instance-based theory of training based
on memory for exemplars rather than on rules. Chapter 10
describes three models of training using a computational tool
called IMPRINT, which has previously been used mainly in large-
scale modeling for making resource allocation and personnel
decisions in the military. The goal of chapter 10 is to show
that this platform can be extended to capture the relationship
between training variables and performance based on smaller-
scale cognitive tasks. Three different tasks are examined: (a)
data entry, (b) visual detection in a dynamic environment, and
(c) information integration in a networked environment. The
goal of the modeling is to predict performance that reflects
underlying cognitive processes as revealed by the experimental
data in these three tasks. Chapter 11 describes the use of
contemporary applied mathematical techniques to evaluate and
compare computational models of the type described in chapters
9 and 10. The work demonstrates that models based on various
modeling platforms are comparable in their ability to account
for human performance in a couple of different tasks. However,
extant models require an excessive amount of computation time,
which often precludes parameter optimization and rapid precise
prediction. To solve this problem, selected models have been
reprogrammed in Matlab, which allows them to be speeded up
substantially. The translation to Matlab also permits extremely
rapid optimization of model parameters. Chapter 12 describes
an overarching quantitative framework designed to account for
the basic cognitive components of training in a single global
equation. The framework incorporates the separate functions
identified with each of the three basic processes of training
(acquisition, retention, transfer). It incorporates an activation
function, similar to that in the ACT-R modeling platform,
describing the strength of a person’s knowledge or skill for a
given task or domain, as a function of past experience with
that and similar tasks. The global equation accommodates
established principles of generalization and transfer.
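The activation function of chapter 12 is described only as similar to ACT-R's; for background, the standard ACT-R base-level activation equation (from the ACT-R literature, e.g., Anderson & Lebiere, 1998, given here as a point of reference rather than as chapter 12's exact global equation) is:

```latex
B_i = \ln\!\left( \sum_{j=1}^{n} t_j^{-d} \right)
```

where \(t_j\) is the time elapsed since the \(j\)th past use of knowledge element \(i\), \(n\) is the number of past uses, and \(d\) is a decay parameter (conventionally 0.5). Activation grows with practice and decays with disuse, so a single expression captures both the acquisition and the retention components of training.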
Earlier, a distinction was made between training and education.
Training focuses on specific tasks or jobs with an emphasis on
skill learning, whereas education seems more general, with a
focus on a particular domain of knowledge and an emphasis
on fact learning. Chapter 13 focuses on retrieval practice as
a training technique to enhance knowledge acquisition in an
educational context. It reviews classroom studies designed to
assess retrieval training when used in combination with spacing
of training sessions and with interleaving of subtasks or of
subparts of the material to be learned. The results illustrate the
close connection or similarity of cognitive processes involved in
training and education.
The overall goal of this volume is to provide enough information
about training to permit some speculation about, if not outright
application to, task and job requirements in the real world. To
make applications possible, however, we must first understand
the “real world.” To that end, chapter 14 describes the reality
of complex operations, particularly those conducted by airline
pilots, and discusses the implications of that reality for research
on training. A critical issue in this discussion is the demand of
concurrent task management in many everyday environments,
such as driving a car, and the ways in which training research
must address such demands.
The focus in this volume has been on the normal training
process. But what if the person being trained is deficient or
defective in some serious way, making the training required
unusual or out-of-the-ordinary? The importance of this question
arises, in part, from the current public and political interest in
acquired brain injuries and the consequent effects on physical
and cognitive health. Chapter 15 reviews the current status
of cognitive retraining programs and evaluates their success.
The review identifies training principles, like those described
in chapter 2, which function effectively in retraining programs.
In addition, it suggests possibilities for the transition of as-yet-
unused training principles into cognitive retraining efforts in
the future.
The final chapter reviews the current status of scientific
knowledge about training. Chapter 16 ties together the major
points of the preceding chapters and arrives at some general
conclusions and implications of research in cognitive psychology
for application outside the laboratory so as to enhance the
training process.

References
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological
Review, 89, 369–406.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought.
Hillsdale, NJ: Erlbaum.
Bourne, L. E., Jr., Healy, A. F., Parker, J. T., & Rickard, T. C. (1999).
The strategic basis of performance in binary classification tasks:
Strategy choices and strategy transitions. Journal of Memory and
Language, 41, 223–252.
Delaney, P. F., Reder, L. M., Staszewski, J. J., & Ritter, F. E. (1998). The
strategy-specific nature of improvement: The power law applies by
strategy within task. Psychological Science, 9, 1–7.
Ebbinghaus, H. (1913). Memory: A contribution to experimental psy-
chology. New York: Teachers College, Columbia University. (Original
work published 1885)
Gonzalez, C., Lerch, J. F., & Lebiere, C. (2003). Instance-based learn-
ing in dynamic decision making. Cognitive Science, 27, 591–635.
Graf, P., & Masson, M. E. J. (Eds.) (1993). Implicit memory: New direc-
tions in cognition, development, and neuropsychology. Hillsdale, NJ:
Erlbaum.
Healy, A. F. (2007). Transfer: Specificity and generality. In H. L. Roedi-
ger, III, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Con-
cepts (pp. 271–275). New York: Oxford University Press.
Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004).
Effects of prolonged work on data entry speed and accuracy. Journal
of Experimental Psychology: Applied, 10, 188–199.
Ivancic, K., & Hesketh, B. (2000). Learning from errors in a driving
simulation: Effects on driving skill and self-confidence. Ergonomics,
43, 1966–1984.
Kole, J. A., Healy, A. F., & Bourne, L. E., Jr. (2008). Cognitive com-
plications moderate the speed-accuracy tradeoff in data entry: A
cognitive antidote to inhibition. Applied Cognitive Psychology, 22,
917–937.
Kolers, P. A., & Roediger, H. L., III (1984). Procedures of mind. Journal
of Verbal Learning and Verbal Behavior, 23, 425–449.
Lee, Y., & Vakoch, D. A. (1996). Transfer and retention of implicit and
explicit learning. British Journal of Psychology, 87, 637–651.
Leibowitz, N., Baum, B., Enden, G., & Karniel, A. (2010). The expo-
nential learning equation as a function of successful trials results
in sigmoid performance. Journal of Mathematical Psychology, 54,
338–340.
Maxwell, J. P., Masters, R. S. W., Kerr, E., & Weedon, E. (2001). The
implicit benefit of learning without errors. Quarterly Journal of
Experimental Psychology: Human Experimental Psychology, 54A,
1049–1068.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition
and the power law of practice. In J. R. Anderson (Ed.), Cognitive
skills and their acquisition (pp. 1–55). Hillsdale, NJ: Erlbaum.
Osgood, C. E. (1949). The similarity paradox in human learning: A
resolution. Psychological Review, 56, 132–143.
Pachella, R. G. (1974). The interpretation of reaction time in informa-
tion-processing research. In B. H. Kantowitz (Ed.), Human informa-
tion processing: Tutorials in performance and cognition (pp. 41–82).
Hillsdale, NJ: Erlbaum.
Postman, L., & Underwood, B. J. (1973). Critical issues in interference
theory. Memory & Cognition, 1, 19–40.
Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of
Verbal Learning and Verbal Behavior, 6, 855–863.
Rickard, T. C. (1997). Bending the power law: A CMPL theory of strategy
shifts and the automatization of cognitive skills. Journal of Experi-
mental Psychology: General, 126, 288–311.
Roediger, H. L., III. (2008). Relativity of remembering: Why the laws of
memory vanished. Annual Review of Psychology, 59, 225–254.
Rubin, D. C., & Wenzel, A. E. (1996). One hundred years of forgetting:
A quantitative description of retention. Psychological Review, 103,
734–760.
Salas, E., Cannon-Bowers, J. A., & Blickensderfer, E. L. (1999).
Enhancing reciprocity between training theory and practice: Princi-
ples, guidelines, and specifications. In G. R. Ferris (Ed.), Research in
personnel and human resources management (Vol. 17, pp. 291–321).
Stamford, CT: JAI Press.
Schmidt, R. A. (1975). A schema theory of discrete motor skill learning.
Psychological Review, 82, 225–260.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of prac-
tice: Common principles in three paradigms suggest new concepts
for training. Psychological Science, 3, 207–217.
Schneider, V. I., Healy, A. F., & Bourne, L. E., Jr. (2002). What is learned
under difficult conditions is hard to forget: Contextual interference
effects in foreign vocabulary acquisition, retention, and transfer.
Journal of Memory and Language, 46, 419–440.
Shepard, R. N. (1987). Toward a universal law of generalization for psy-
chological science. Science, 237, 1317–1323.
Tulving, E. (1985). How many memory systems are there? American
Psychologist, 40, 385–398.
Wickelgren, W. A. (1974). Single-trace fragility theory of memory dynam-
ics. Memory & Cognition, 2, 775–780.
Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law
and the Ebbinghaus savings function. Psychological Science, 18,
133–134.
Wulf, G., & Schmidt, R. A. (1997). Variability of practice and implicit
motor learning. Journal of Experimental Psychology: Learning, Mem-
ory, and Cognition, 23, 987–1006.
2 Empirically Valid Principles
of Training
Alice F. Healy, Vivian I. Schneider,
and Lyle E. Bourne, Jr.
University of Colorado

This chapter reviews the theoretical and empirical research literature in experimental cognitive psychology as it pertains to
training for successful performance of specific tasks or jobs.
The aim is to identify evidence-based principles of training that
are well enough established that they might be implemented in
actual training regimens. The principles vary to some degree in
their empirical support, but this review includes only those for
which there is convincing evidence and theoretical understand-
ing. This review focuses primarily on principles, but it also offers
tentative guidelines for implementation that might be examined
in further applied research.
The principles of training reviewed here are organized into
categories or clusters, with the grouping based primarily on
underlying cognitive processes and training requirements. It
should be recognized at the outset that these categories are
somewhat arbitrary. A given principle might have been catego-
rized differently or placed in more than one category, but only a
single category choice was used here. Where necessary, cross-
linkages between categories are referenced.

Principles Relating to Resource and Effort Allocation
Implementation of some training principles requires the learner
to direct or allocate cognitive resources and effort to particular
aspects of the knowledge or skills to be acquired.

Deliberate Practice
The old adage that practice makes perfect is true, but not all
practice is equivalent in effectiveness. Deliberate (i.e., effortful,
highly focused, and highly motivated) practice is best in terms of
promoting skill acquisition and expertise (Ericsson, Krampe, &
Tesch-Römer, 1993). Indeed, learners, even those who might be
talented or have a high aptitude for the training domain, will not
acquire their highest level of performance if they do not engage
in deliberate practice over a prolonged period of time with many
repetitions of the skill to be performed.
Guideline: By initial instructions, trainers should try to
engage trainees in deliberate practice throughout the training
process.

Depth of Processing
One aspect of deliberate practice relates to how deeply the mate-
rial to be learned is processed. Activities during training that
promote deep or elaborate processing of materials yield supe-
rior retention (e.g., Craik & Lockhart, 1972; but see Roediger,
2008, for exceptions). The depth of processing principle can be
achieved in various ways. For example, asking individuals to
rate the pleasantness of each word to be memorized enhances its
memory over simply viewing the word. Memory is also enhanced
when the material is presented in a format that requires a trans-
lation process or speech coding. Counter to intuition, when
numerical data must be entered into some system, the num-
bers should be presented in word format (e.g., three-five-two)
rather than numeral format (3-5-2) if the goal is to maximize
memory for the numbers. Word format, but not numeral format,
requires translation from the words to the digits represented
on a keyboard and facilitates speech coding of the digits. This
additional process enhances long-term memory for the material
(Buck-Gengler & Healy, 2001).
Guideline: To enhance the durability of training material, pro-
mote deep processing of the material to be learned either by
explicit instructions or by incidental task demands.
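To make the translation step concrete, here is a toy sketch (a hypothetical illustration only, not code from Buck-Gengler & Healy, 2001) of the extra word-to-digit mapping that word-format stimuli impose on data entry:

```python
# Toy illustration: entering "three-five-two" requires translating each
# word into the digit typed on the keypad; "3-5-2" maps onto the keys directly.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def keystrokes(stimulus: str) -> str:
    """Return the digit keystrokes needed to enter a stimulus.

    Word-format stimuli need an extra translation step before the
    response can be produced; numeral-format stimuli do not.
    """
    parts = stimulus.split("-")
    return "".join(WORD_TO_DIGIT.get(p, p) for p in parts)

print(keystrokes("three-five-two"))  # -> 352
print(keystrokes("3-5-2"))           # -> 352
```

Both formats demand the same keystrokes; the point is that word format forces an additional encoding operation before the keys can be pressed, and it is this extra processing that is held to enhance long-term memory.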

Generation Effect
Another recommendation related to deliberate practice is to
promote active, rather than passive, learning techniques. For
example, if the task is to memorize a set of procedures for trou-
bleshooting a piece of equipment, the trainees should try to gen-
erate the procedures from memory, rather than simply to read
or to reread them. Then the trainees should check the accuracy
of their actively generated responses against the correct list and
make note of any errors. They should actively generate the list
repeatedly until they are able to produce it without error. This
recommendation follows from the generation effect (the finding
that people show better retention of learned material when it
is self-produced, or generated, than when it is simply copied or
read; e.g., Crutcher & Healy, 1989; McNamara & Healy, 1995,
2000; Slamecka & Graf, 1978). This generation effect is incorpor-
ated into a common study strategy known as the 3R technique
(read-recite-review), which has been shown to improve learning
from educational texts (McDaniel, Howard, & Einstein, 2009).
Guideline: Trainers should use whatever methods are possible
to engage trainees actively in the learning process; this advice
would include periodically requiring them to generate answers
to questions.

Focus of Attention
It is possible for a learner to deploy or focus attention in vari-
ous ways during training. Furthermore, a learner might be
instructed about how to focus attention effectively. Some stud-
ies of motor skill acquisition have compared an external focus
of attention (i.e., attention to the results of a movement) to an
internal focus of attention (i.e., attention to the body movements
themselves). That research has consistently found, at least after
some initial training, that there is an advantage for the external
focus of attention with respect to learning, retention, and trans-
fer of motor skills (Lohse, Sherwood, & Healy, 2010; Lohse, Wulf,
& Lewthwaite, in press; C. H. Shea & Wulf, 1999; Wulf, 2007).
This result is explained by the constrained action hypothesis,
according to which well-developed motor skills are represented
in bodily mechanisms that are impaired by conscious attention
to them (Beilock, Bertenthal, McCoy, & Carr, 2004).
Guideline: Trainers of motor skills should encourage learn-
ers to adopt an external focus of attention on the target of their
movements rather than an internal focus on the bodily move-
ments themselves.

Strategic Use of Knowledge


Whenever possible, information should be related to trainees’
preexisting knowledge, particularly in those circumstances
when they need to absorb a large amount of new information.
Previously acquired knowledge can be used as a structure for
organizing otherwise seemingly unrelated facts even when the
facts themselves fall outside the domain of existing knowledge.
For example, if trainees know a lot about baseball, they can use
that knowledge to organize and, thus, quickly learn a large set
of facts about their team members or classmates. The idea is to
associate each unfamiliar group member with a famous indi-
vidual from the baseball domain. Although additional associa-
tions might seem to complicate the task at hand, connections
to existing knowledge will enhance performance both in terms
of accuracy and speed of responding with the new information,
following the strategic-use-of-knowledge principle (learning and
memory are facilitated whenever preexisting knowledge can be
employed as a mediator in the process of acquisition; Healy,
Shea, Kole, & Cunningham, 2008; Kole & Healy, 2007, 2011;
Van Overschelde & Healy, 2001).
Guideline: Trainees should be instructed to use their previ-
ously acquired knowledge when learning a new set of facts, even
if the existing knowledge seems irrelevant to the new facts.

Cognitive Antidote to Fatigue, Boredom, or Task Disengagement
Prolonged work on a given task often results in deterioration
of performance, despite ongoing skill acquisition. It has been
found that prolonged work sometimes produces an increas-
ing speed–accuracy trade-off in performance, such that accu-
racy declines over trials while at the same time response speed
improves (Healy, Kole, Buck-Gengler, & Bourne, 2004). The
deterioration in accuracy is attributable to fatigue, task disen-
gagement, or boredom on the part of subjects. This deterioration
can be counteracted by the periodic introduction of a simple
cognitive requirement. For example, subjects might be required
to make a simple computation before each response or to alter-
nate terminating keystrokes after each response (Kole, Healy,
& Bourne, 2008). Under these conditions, the speed–accuracy
trade-off is eliminated; that is, the decline in accuracy disap-
pears although responses continue to speed up across practice
trials. These results have led to a cognitive antidote training
principle (the introduction of cognitive activities can counteract
fatigue, task disengagement, and boredom effects, resulting in
performance maintenance or even improvement during sessions
of prolonged work).
Guideline: Instructors should consider adding a cognitive
component to a routine task on a trial-by-trial basis to avoid
disengagement and boredom.
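As a schematic sketch of the antidote manipulation (a hypothetical illustration; the add-one computation here merely stands in for whatever simple cognitive requirement a study or trainer actually uses):

```python
def required_response(digits: str, antidote: bool = False) -> str:
    """Return the response a data-entry trial demands.

    Without the antidote, the trainee simply types the digits shown.
    With it, a simple computation (here, adding 1 to each digit,
    modulo 10) must be performed before every response.
    """
    if not antidote:
        return digits
    return "".join(str((int(d) + 1) % 10) for d in digits)

print(required_response("352"))                 # routine trial -> 352
print(required_response("352", antidote=True))  # antidote trial -> 463
```

The routine version invites disengagement over many repetitive trials; the antidote version inserts a trivial mental operation before each response, which is the kind of trial-by-trial cognitive requirement reported to eliminate the accuracy decline.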

Principles Relating to Context Effects


Some training principles reflect the fact that training is often
context specific, meaning that the knowledge and skills learned
are bound, at least to some degree, to the circumstances in
which they were acquired. The following are the most important,
well-established principles of this type.

Procedural Reinstatement
The procedural reinstatement principle implies that duplicating at test the procedures that were required during learning
facilitates subsequent retention and transfer (Clawson, Healy,
Ericsson, & Bourne, 2001; Healy & Bourne, 1995; Healy, Claw-
son, et al., 1993; Healy, Fendrich, et al., 1992; Healy, Wohld-
mann, & Bourne, 2005). This principle is similar to others that
were derived primarily from studies of list learning, including
the principles of encoding specificity (memory for information is
best when retrieval cues elicit the original encoding operations;
e.g., Tulving & Thomson, 1973), transfer appropriate processing
(memory performance will be best when test procedures evoke
the procedures used during prior learning; e.g., C. D. Morris,
Bransford, & Franks, 1977; Roediger, Weldon, & Challis, 1989),
and context-dependent memory (memory for information is worse
when tested in a new context than when tested in the original
context in which it was learned; e.g., Kole, Healy, Fierman, &
Bourne, 2010; S. M. Smith & Vela, 2001). An important corol-
lary to this procedural reinstatement principle is that specificity
(limited transfer) occurs for tasks based primarily on procedural
information, or skill, whereas generality (robust transfer) occurs
for tasks based primarily on declarative information, or facts
(Healy, 2007; Healy, Wohldmann, et al., in press). Thus, for skill
learning, retention is strong but transfer is limited, whereas for
fact learning, retention is poor but transfer is robust.
Guideline: Trainees should reinstate the conditions of study
as closely as possible when taking a test or performing in the
field. If trainees are able to anticipate the test or field conditions,
then they should modify their study conditions to match them.
Training can be expected to be generalizable when it involves
declarative facts, whereas it can be expected to be durable when
it involves procedural skills.

Functional Task Development


Instructors often assume that teaching a primary task without
distraction from extraneous secondary task requirements will
benefit the learning process. However, if such secondary task
requirements exist in the field, then use of this training method
will not provide optimal transfer to field performance. Research
has shown that to be effective, training must incorporate the
complete set of field task requirements, including all second-
ary task requirements imposed in the field. This effect works
both ways. That is, training with extraneous secondary task
requirements will also not be optimal if field performance does
not include those requirements. In general, learning, especially
skill learning, is highly specific to the conditions of training.
This observation follows from both the procedural reinstate-
ment principle and the functional task principle (secondary task
requirements are often integrated with primary task require-
ments during learning, resulting in the acquisition of a single
functional task rather than two separate tasks; Healy, Wohld-
mann, Parker, & Bourne, 2005; Hsiao & Reber, 2001; Wohld-
mann, Healy, & Bourne, 2010).
Guideline: For optimal performance, the entire configura-
tion of task requirements during training, including secondary
as well as primary tasks, needs to match those in the field as
closely as feasible.

Part-Task Training
Under certain conditions training only a part of a task before
training the whole task is more effective than training the whole
task from the beginning. For complex tasks that can be divided
into parts, the conditions for part-training superiority appear
to be a function of the organization of the parts. A segmented
task contains parts that are performed sequentially, whereas
a fractionated task contains parts that are performed simulta-
neously. For segmented tasks, part training can involve either
forward chaining (when the initial segment of a task is trained
first) or backward chaining (when the final segment of the task
is trained first). In this case, part-task training is most benefi-
cial when a backward-chaining procedure is used (Wightman
& Lintern, 1985) because there is a strong association between
performance level on the terminal task and knowledge of results
(i.e., the feedback resulting from task completion). In contrast, for
a fractionated task, Adams and Hufford (1962) found that part
training initially disrupted performance on the whole procedure.
In both segmented and fractionated tasks, during the initial
part-training phase, the trainee constructs independent pro-
cedural representations for each part of the whole task. When
transfer to the whole task occurs, there is only a single inter-
ruption between two successive parts in a segmented task. In
contrast, there can be multiple interruptions among the parts
in a fractionated task. Thus, the procedural representations
can remain intact and independent only in a segmented task;
in a fractionated task a new procedural representation must be
established, which requires integration of the two parts, because
the parts in that case are performed as an interlocking unit
(Marmie & Healy, 1995).
Chapter 4 provides a more detailed account of part-task train-
ing, and chapter 6 discusses the effects of automating parts of
a whole task.
Guideline: Whether or not initial training of a complex task
should involve only parts of that task depends on a number of
task characteristics including forward versus backward chain-
ing of the parts and the segmented versus fractionated nature
of the whole task.

Easy–Difficult Ordering
Complex tasks can be divided into parts based on aspects of the
stimuli involved, such as their difficulty. When a task involv-
ing a stimulus set is trained incrementally, the question arises
as to whether the easier or the more difficult stimuli in the set
should be trained first. Pellegrino, Doane, Fischer, and Alderton
(1991) found that initial training on a difficult subset of stimuli
was beneficial relative to initial training on an easy subset of the
stimuli in a visual discrimination task. According to Pellegrino
et al. (1991; see also Doane, Alderton, Sohn, & Pellegrino, 1996;
Doane, Sohn, & Schreiber, 1999), incremental training should
begin with the part of the stimulus set that yields the most effec-
tive strategic skills. However, it is not always the more difficult
part that yields the optimal strategic skills. For example, Clawson
et al. (2001) found that initial training on easy stimuli in a Morse
code reception task led participants to adopt an effective unitiza-
tion strategy for representing codes, whereas initial training on
difficult stimuli led to a less effective strategy in which individual
elements were separately represented and then integrated.
A related issue that has been explored by Maxwell, Masters,
Kerr, and Weedon (2001) is what they call errorless learning. For
a motor skill, subjects should begin with the easiest task, where
few if any errors are made, and progress to increasingly harder
tasks to minimize the overall number of errors made. It has been
shown that skills that have been learned in an error-prone man-
ner require more attentional resources than do skills learned
in an errorless manner. Because there is less attention needed
to perform the skill learned in errorless training, distractions,
such as a secondary task, cause less disruption. Also related are
the concepts of training wheels and scaffolding, usually used in
the context of fact-learning tasks, in which training begins with
easy items and subsequent training introduces more difficult
items that build on the easier ones (see chapters 4 and 13 for
further discussions of these concepts).
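The easy-to-difficult progression behind errorless learning amounts to a simple ordering step. As a minimal sketch (the task labels and difficulty ratings below are hypothetical, not drawn from the studies cited above):

```python
def errorless_ordering(tasks):
    """Order practice from easiest to hardest, as in errorless learning
    (Maxwell et al., 2001), so that early trials produce few if any errors.

    `tasks` maps each (hypothetical) task label to a difficulty rating.
    """
    return sorted(tasks, key=tasks.get)

# Hypothetical golf-putting progression: shorter putts are easier.
drills = {"putt_3m": 3, "putt_1m": 1, "putt_2m": 2}
print(errorless_ordering(drills))  # easiest first
```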
Guideline: Whether training should begin with the easiest
or the most difficult components of a task depends on task
characteristics. Trainers need to be sensitive to these characteristics
before deciding on the order of subtask training. An important
factor to consider is which parts of the task yield the most effec-
tive strategic skills.

Principles Relating to Task Parameters


Training can vary along a number of task dimensions depend-
ing, for example, on the demands placed on the trainee. Certain
training principles follow from variations in these task char-
acteristics. The most well-established of these principles are
described next, grouped by the task parameters entailed.

Spacing
When training new skills involves repeated practice trials, per-
formance on the skill improves more rapidly when rest inter-
vals are interpolated between trials (i.e., spaced or distributed
practice) than when the trials are administered without rest
intervals (i.e., massed practice; e.g., Bourne & Archer, 1956). In
contrast, when training involves new knowledge (i.e., fact acqui-
sition), performance improves more rapidly when practice tri-
als are massed rather than spaced (Bjork, 2010; Underwood &
Ekstrand, 1967). For both skill and fact learning, however, per-
formance after a retention interval is better for spaced than for
massed practice. A related spacing effect, based on interleaving
materials (see discussion in chapter 13), involves the separation
of repetitions of a given item within a list of items (e.g., Glenberg,
1976; Hintzman, 1974). Although usually some rest between
repetitions improves performance, the rest interval cannot be
increased indefinitely with benefit to the learner. There is an
optimal rest interval for at least some tasks (Bourne, Guy, Dodd,
& Justesen, 1965), but more research needs to be done to deter-
mine the generality of this effect. With respect to retention of
the learned material, the spacing effect does not always hold
when the retention interval (interval between the last repetition
and the test) is very short. Generally, the advantage of spacing
holds for mixed lists including spacing intervals varying across
different items as well as for pure lists with a single spacing
interval (Kahana & Howard, 2005). All of this work is based on
single-session training paradigms with short spacing and reten-
tion intervals.
In a different paradigm, Bahrick (1979) used long spacing
intervals to separate learning sessions and long retention inter-
vals between the end of learning and final testing. He found
that the level of performance on the final test session depended
more on the spacing between learning sessions than it did on
the level of performance achieved in the final learning session.
Unlike findings from experiments with short intervals between
practice trials or items (cited above), which generally show an
advantage for spaced practice, performance on the final learn-
ing session of Bahrick’s study was greatest when the interses-
sion intervals were shortest, but performance on the final test
session was highest when the intersession intervals were longest
(so that they resembled the retention interval). Bahrick thus
concluded that for optimal knowledge maintenance, practice
should be spaced at intervals approximating the length of the
eventual retention interval.
More recently, Pashler, Rohrer, Cepeda, and Carpenter (2007)
looked at the effects of varying the training intersession inter-
val (ISI). They showed strong effects of ISI over long retention
intervals (RIs). In addition, test performance after a given RI
was found to be optimal when the ISI was intermediate in value.
Making spacing longer than optimal was, however, less harmful
to retention than making it shorter than optimal. These authors
suggest that it is more effective to use an ISI of several months or
years than to use shorter intervals when retention is tested after
a delay of several years (see also Cepeda, Pashler, Vul, Wixted, &
Rohrer, 2006, for a review of the literature).
Guideline: For optimal benefits from training, repeated prac-
tice on particular items or responses should be spaced in time.
The amount of spacing (length of the time interval between rep-
etitions) should be related to the amount of time that is likely to
pass between training and eventual testing. Generally, it seems
desirable to match the time between repetitions during training
to the time between training and test.
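The matching rule in this guideline can be illustrated with a minimal scheduling sketch; the function name and day units are illustrative conventions of ours, not taken from the literature:

```python
def spaced_schedule(n_sessions, retention_interval_days):
    """Space training sessions so that the gap between repetitions
    approximates the eventual retention interval, following Bahrick's
    (1979) matching heuristic. Returns session days relative to day 0.
    """
    isi = retention_interval_days  # match the intersession interval to the RI
    return [i * isi for i in range(n_sessions)]

# If a test is expected roughly 30 days after training ends, hold each
# of four sessions about 30 days apart.
print(spaced_schedule(4, 30))  # [0, 30, 60, 90]
```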

Feedback
Two distinct and major questions have been asked about the
effects of feedback: what form it should take and when to pro-
vide it.
What Kind of Feedback Should be Provided? What type of
feedback to provide is a crucial issue for optimizing training and
retention of knowledge and skills (Schmidt & Bjork, 1992). Trial-
by-trial feedback has been shown to facilitate rate of learning
in many tasks, possibly by motivating participants to set
increasingly higher standards of performance or by identifying
errors and how to correct them. But, if participants have a
good sense anyway of how well they responded, then trial-by-
trial feedback might be a distraction that results in inferior
performance on later acquisition trials, on retention tests, or on
tests with tasks requiring slightly different responses. In such
circumstances, periodic summary feedback, given only on some
proportion of training trials, is often a more effective procedure
for promoting long-term retention than is trial-by-trial feedback
(see, e.g., Schmidt, Young, Swinnen, & Shapiro, 1989, for
illustration of this finding in a ballistic timing task). Indeed
there is some suggestion in the literature that the frequency
of feedback given during acquisition can be gradually reduced
or faded without serious or adverse effects on acquisition
performance and at the same time produce beneficial effects
on long-term retention (Schmidt & Bjork, 1992). Other studies
have shown that in some tasks (e.g., speeded aiming) feedback
effects during training might not persist into later testing for
retention (Bourne, Healy, Pauli, Parker, & Birbaumer, 2005).
Thus, differences among tasks will need to be examined in the
future to assess the boundary conditions on effects involving
what kind of feedback to provide.
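One way to realize the fading idea mentioned above is to lower the probability of trial-by-trial feedback across acquisition. The linear ramp and the particular rates below are illustrative assumptions, not values taken from Schmidt and Bjork (1992):

```python
import random

def faded_feedback(n_trials, start_rate=1.0, end_rate=0.2, seed=0):
    """Return, for each trial, whether trial-by-trial feedback is given.
    The feedback probability fades linearly from start_rate to end_rate;
    the rates and the linear schedule are illustrative assumptions.
    """
    rng = random.Random(seed)
    give = []
    for t in range(n_trials):
        rate = start_rate + (end_rate - start_rate) * t / max(n_trials - 1, 1)
        give.append(rng.random() < rate)
    return give

schedule = faded_feedback(20)
print(sum(schedule), "of", len(schedule), "trials receive feedback")
```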

When Should Feedback be Provided? In a declarative memory
task, feedback is most effective for learning and retention when it
serves to correct erroneous responses. Pashler, Cepeda, Wixted,
and Rohrer (2005) have shown that for foreign vocabulary
learning, feedback had no benefit on correct response trials
even when those responses were given with low confidence.
On the other hand, in a concept-learning task, Bourne, Dodd,
Guy, and Justesen (1968) found facilitative effects of feedback
on both correct response and incorrect response trials, although
the effects were stronger on incorrect response trials. In a task
involving recall of trivia, T. A. Smith and Kimball (2010) found
facilitative effects of feedback following correct responses as well
as errors, but these effects depended on the introduction of a
delay before feedback was presented. Recall that Maxwell et al.
(2001) demonstrated errorless learning for a motor skill, showing
that learning is promoted when errors are avoided. This finding is
inconsistent with the results for cognitive and verbal tasks, just
reviewed, according to which feedback following errors typically
is more informative to the learner than feedback following correct
responses. Thus, the issue of task differences in when to provide
feedback needs to be clarified in future research.
In a study of message comprehension in a navigation task,
Schneider, Healy, Buck-Gengler, Barshi, and Bourne (2007)
found that training with immediate feedback led to worse per-
formance at test than did training with delayed feedback. These
results suggest that immediate feedback, even when it provides
supplemental information otherwise not available, might not
always be desirable. In some cases, it might interfere with mem-
ory because of the interruption of the processing stream that
supports learning. Further along those lines, Butler, Karpicke,
and Roediger (2007) in a study of fact learning from text found
that a longer delay before feedback was better than a shorter
delay. An explanation for the benefit of delaying the presenta-
tion of feedback after a test is that feedback then serves as an
additional spaced presentation of the information (see above).
Wulf, Shea, and Whitacre (1998) point out that, in learning a
motor skill, knowledge of results (KR) given too frequently or too
quickly after the response might improve performance at the
time of practice but impair later performance relative to learning
a motor skill with KR that is given less frequently or after a delay.
Guideline: Feedback with respect to erroneous responses
is generally more effective than feedback with respect to cor-
rect responses, summary feedback is more desirable in some
cases than trial-by-trial feedback, and briefly delayed feedback
is sometimes preferable to immediate feedback, presumably
because of a spacing effect.

Rehearsal

Mental versus Physical Rehearsal. Often a skill-based task
can be practiced either physically (i.e., by making the actual
required responses) or mentally (i.e., by merely imagining the
required responses). A number of studies have reported no
benefits of mental practice (e.g., Shanks & Cameron, 2000),
whereas other studies have reported benefits on tasks that
are largely cognitive, but not on tasks that are largely motoric
(e.g., Driskell, Copper, & Moran, 1994; Minas, 1978). But other
studies have shown clear benefits to performance after mental
practice even for motoric tasks (e.g., Kohl & Roenker, 1983), and
Decety, Jeannerod, and Preblanc (1989) reported behavioral
similarities between mental and physical practice of walking,
either blindfolded or by imagination, to specified locations
at varying distances. Furthermore, Wohldmann, Healy,
and Bourne (2007) demonstrated in the context of a simple
perceptual-motor laboratory task that some aspects of mental
and physical practice are similar behaviorally in that mental
practice is just as effective as physical practice both for learning
a new motor skill and for maintaining a previously learned motor
skill across a 3-month delay. In fact, Wohldmann, Healy, and
Bourne (2008a) established that mental rehearsal is in some
circumstances better than physical rehearsal in promoting the
acquisition, durability, and transferability of perceptual-motor
skill if performance on the skill is vulnerable to interference
effects attributable to physical movements.

Fixed versus Expanding Rehearsal. The studies of spacing
effects reviewed above all used fixed intertrial intervals during
training. Landauer and Bjork (1978) suggested that constant
intervals, regardless of their length, might not be optimal for
learning and retention. They examined a training procedure
in which the intervals between test trials gradually increased
during learning. This expanding rehearsal procedure produced
greater eventual performance than did a rehearsal procedure
with uniform intervals between tests. The positive effects of
expanding rehearsal have been replicated by Cull, Shaughnessy,
and Zechmeister (1996; see also P. E. Morris & Fritz, 2000),
but there have been some failures to replicate (Cull, 2000). In
fact, Karpicke and Roediger (2010) suggested that the positive
effects of expanding rehearsal might be due to the overall
greater amount of spacing under expanded, as opposed to
fixed, rehearsal conditions. When the amount of spacing was
controlled, the difference between fixed and expanding conditions
disappeared in their study (see also Karpicke & Bauernschmidt,
2011). However, a study by Storm, Bjork, and Storm (2010)
found conditions under which expanding rehearsal is effective,
namely those involving material that is highly vulnerable to
being forgotten. In any event, an interesting possible extension
for future experimental study is to expand the intervals between
training sessions following the work of Bahrick (1979, 2005).
Although Bahrick found it optimal to match the interval between
training sessions to the retention interval separating the last
training session and the test session, it may be instead that
optimal performance occurs with an expanding set of intervals
between training sessions, with only the last equal to the
retention interval.
Guideline: Type and scheduling of rehearsal opportunities
can have important impacts on the acquisition, retention, and
transfer of knowledge and skill. In general, mental rehearsal
should be employed whenever physical practice is difficult or
impractical. Also, expanding rehearsal might be considered as
a possible strategy, if there is sufficient time during training to
allow for the spacing that is entailed, but the supporting empiri-
cal evidence is still weak.
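As a concrete sketch, an expanding schedule grows each successive gap while a fixed schedule holds it constant; the doubling factor is a common illustrative choice, not a value fixed by Landauer and Bjork (1978):

```python
def expanding_gaps(n_tests, first_gap=1, factor=2):
    """Gaps (e.g., in intervening items) between successive retrieval
    tests under expanding rehearsal; each gap grows by `factor`.
    The default doubling factor is an illustrative assumption.
    """
    gaps, g = [], first_gap
    for _ in range(n_tests):
        gaps.append(g)
        g *= factor
    return gaps

def fixed_gaps(n_tests, total_spacing):
    """Uniform gaps equated on total spacing, the control condition
    used by Karpicke and Roediger (2010)."""
    return [total_spacing / n_tests] * n_tests

exp = expanding_gaps(3)        # [1, 2, 4]
fix = fixed_gaps(3, sum(exp))  # equates total spacing across schedules
print(exp, fix)
```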

Testing
Tests are usually thought of as performance assessment tools,
but there is increasing evidence that people often learn as much
from taking tests as they do from pure study, or even more (see
chapter 13). This phenomenon has been referred to
as a “testing effect” (Carpenter & DeLosh, 2005; Izawa, 1992;
McDaniel & Fisher, 1991). Specifically, the testing effect is the
advantage in retention for material that is tested relative to
material that is presented for additional study. Pashler et al.
(2007) point out that the testing effect has been found for vari-
ous types of tests and materials. Specifically, the effect is evident
for free recall (e.g., Allen, Mahler, & Estes, 1969; Carpenter &
DeLosh, 2006) and cued recall (Carrier & Pashler, 1992) and for
face–name associations (Carpenter & DeLosh, 2005), definitions
(Cull, 2000), and general knowledge facts (McDaniel & Fisher,
1991). The testing effect is found even when the subject’s answer
is incorrect, at least when corrective feedback is provided (Kor-
nell, Hays, & Bjork, 2009), and even when the testing occurs
after an initial correct recall response (Karpicke & Roediger,
2008). Pashler et al. also found that covert retrieval practice, a
form of mental rehearsal, in which subjects are asked to retrieve
without providing an observable response, enhances learning.
Note that the testing effect is similar to the generation effect, but
there are subtle though important differences demonstrated by
Karpicke and Zaromb (2010).
McDaniel, Roediger, and McDermott (2007) summarized
experiments on the testing effect in the context of a university
course. They found that providing short-answer and multiple-
choice tests initially, compared to providing no tests initially,
significantly aided performance on a subsequent test. They also
found that short-answer tests (requiring production or recall)
were more helpful to later test performance than were multiple-
choice tests (requiring only recognition), even when the later
tests involved multiple-choice questions. Finally, they found that
short-answer tests were more effective than focused restudy of
the same material when those tests involved corrective feedback.
Note that the testing effect has been examined primarily in
declarative learning tasks, where it is possible to separate pure
study from test performance. In skill learning tasks, study and
tests are usually integrated into the trial-by-trial acquisition
procedure, with each trial necessarily including a testing
component. Thus, the testing effect is not directly applicable
to skill learning, although mental practice (or even observation)
might be considered an analogue of studying without testing.
Guideline: Much fact learning occurs during test taking.
Therefore tests should be embedded in the training process
whenever possible.
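A study sequence that embeds retrieval tests can be sketched as follows; the alternation interval `test_every` is an arbitrary illustrative parameter:

```python
def embed_tests(items, test_every=2):
    """Interleave retrieval tests into a study sequence: after every
    `test_every` study presentations, test the just-studied items.
    The value of `test_every` is an illustrative choice.
    """
    schedule, pending = [], []
    for item in items:
        schedule.append(("study", item))
        pending.append(item)
        if len(pending) == test_every:
            schedule += [("test", it) for it in pending]
            pending = []
    schedule += [("test", it) for it in pending]  # test any leftover items
    return schedule

# Hypothetical foreign-vocabulary items.
print(embed_tests(["la mesa", "el gato", "el pan"]))
```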

Overlearning
Training usually ends when the trainee reaches some predes-
ignated performance criterion, such as one or more error-free
training trials. Overlearning refers to practice beyond the per-
formance criterion (Pashler et al., 2007). It has been found that
overlearning, relative to less practice, improves later perfor-
mance (Krueger, 1929). Consequently, overlearning has been
proposed as a useful, general strategy when long-term reten-
tion is the goal (Driskell, Willis, & Copper, 1992). However, over-
learning is not always an efficient way to strengthen acquired
knowledge and skill. For example, in a study by Rohrer, Taylor,
Pashler, Wixted, and Cepeda (2005) subjects were taught novel
vocabulary pairs. They saw each word pair either 5 or 10 times.
After 1 week, the subjects who saw the pairs 10 times showed a
substantial benefit over the subjects who saw the pairs 5 times,
but the difference had disappeared after 4 weeks. Rohrer and
Taylor (2006) conducted a similar study using a new math
skill. One group of subjects had three times the number of
practice problems, but no difference was found after either the
1-week or the 4-week retention interval. Thus, Pashler et al.
concluded that for long-term memory, overlearning is
inefficient as a training technique (confirming results were obtained
by Rawson & Dunlosky, 2011). They pointed out, however, that
in some cases overlearning might be the only alternative when
a skill needs to be performed with absolutely no errors at a
much later time (e.g., performing CPR or landing a space shut-
tle). They also concluded that, even when retrieval accuracy
is at ceiling, overlearning might improve speed of responding
(e.g., Logan & Klapp, 1991), and speedup could be useful when
rapid responding is a prime consideration.
Guideline: Overlearning is recommended as a training tech-
nique only when training time is not severely limited and when
it is crucial to have the strongest possible representations of
knowledge and skill.

Task Difficulty
Interference is a source of difficulty in training that occurs
when conditions allow incorrect answers to come to the train-
ee’s mind, along with the correct answer, thereby requiring
the trainee to choose the correct answer from among several
alternatives. Increasing interference during training has been
shown to impede training speed but ultimately to enhance the
durability and flexibility of what is learned. For example, mixing
material across categories during training, as opposed to group-
ing the material by category, enhances interference, which may
inhibit initial acquisition, but should yield better retention and
transfer (see chapter 13 for a discussion of interleaving). In fact,
it has been shown that many things that make learning difficult
(not just categorical interference) facilitate transfer to a new task
as well as long-term retention of the original task (see chapter
3 for an example). This recommendation to introduce difficulty
during training follows from both the effects of contextual inter-
ference (interference during learning facilitates later retention
and transfer; Battig, 1972, 1979; Carlson & Yaure, 1990; Lee
& Magill, 1983; Schneider, Healy, & Bourne, 1998; Schneider,
Healy, Ericsson, & Bourne, 1995; J. B. Shea & Morgan, 1979;
but see Wulf & Shea, 2002, for some exceptions) and, more
generally, the training difficulty principle (any condition that
causes difficulty during learning facilitates later retention and
transfer; Schmidt & Bjork, 1992; Schneider, Healy, & Bourne,
2002; but see Young, Healy, Gonzalez, Dutt, & Bourne, 2011, for
some qualifications).
Not all sources of difficulty during training are desirable, how-
ever (see Bjork, 1994). McDaniel and his colleagues (McDaniel &
Butler, 2011; McDaniel & Einstein, 2005) argue that difficulties
introduced during training are facilitative only when they cause
the learner to apply task-relevant cognitive processes (e.g., gen-
eration from answer fragments) to the to-be-learned material
that otherwise would not be engaged.
Guideline: Counter to intuition, trainers should consider
introducing sources of interference into any training routine.
If durable retention and flexible transfer are the goals of train-
ing, then mixing materials during training is advisable for most
learners. Trainers might consider enhancing the difficulty of
training exercises in other ways as well with the caveat that
task-relevant cognitive processes must be engaged.

Stimulus-Response Compatibility
Cognitive skills can be divided into three stages: (a) perception
of the stimulus, (b) decision making and response selection, and
(c) response execution (Proctor & Dutta, 1995). The most ubiq-
uitous phenomenon observed in the second stage of skill acqui-
sition is the effect of stimulus-response compatibility (Fitts &
Deininger, 1954; Fitts & Seeger, 1953; Proctor & Vu, 2006; see
also chapter 5). This effect reflects a difference in performance
attributable to the mapping of individual stimuli to responses,
such that performance is best when the stimulus set and the
response set are configured in a similar way and each stimulus
is mapped to its corresponding response (e.g., left-right stimu-
lus locations are mapped to left-right responses). The detrimen-
tal effects of incompatibility are not easily overcome, even after
extensive practice (e.g., Dutta & Proctor, 1992; but see Miles
& Proctor, 2010, for an exception). Making a task difficult by
introducing stimulus-response incompatibilities is clearly not a
desirable difficulty for learning or retention.
Guideline: It is important to maintain stimulus-response
compatibility during training to avoid the prolonged, detrimen-
tal effects that incompatibility can have on performance.

Serial Position
Better memory has been found for the initial and final items in a
to-be-learned list of items (Nipher, 1878). This bow-shaped serial
position function, with both primacy and recency components,
is found at the start of learning but diminishes as repeated tri-
als on the same material are given (Bonk & Healy, 2010). The
same effect is observed for short lists (as few as 4 items) and
long lists (40 items or more), for tasks that require item learn-
ing or response-sequence learning, and for both immediate and
delayed memory. The relative magnitude of primacy and recency
effects differs depending on many variables, especially the test-
ing procedure. In any event, the items in the middle of a list are
at a disadvantage for both learning and memory (see chapter 3).
Thus, training will require more practice on items in the middle
of a list than on those at either end.
Guideline: For tasks that require training on a sequence
of informational items or responses, the trainer should place
greater emphasis on items in the middle of the sequence than on
those at the beginning or end.
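This guideline might be implemented by weighting practice repetitions toward middle list positions; the boost size and the quartile cutoff below are hypothetical choices, not parameters from the serial position literature:

```python
def middle_weighted_reps(items, base=1, boost=2):
    """Assign extra practice repetitions to middle-of-list items, which
    are learned and remembered worst (the bow-shaped serial position
    curve). `base`, `boost`, and the quartile cutoff are hypothetical.
    """
    n = len(items)
    plan = []
    for i, item in enumerate(items):
        edge_distance = min(i, n - 1 - i)  # distance from the nearer end
        extra = boost if edge_distance >= n // 4 else 0
        plan.append((item, base + extra))
    return plan

for item, reps in middle_weighted_reps(list("ABCDEFGH")):
    print(item, reps)
```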

Variability of Practice
Variable practice conditions (in which individuals train on a
number of different tasks) typically yield better performance at
transfer testing than do constant practice conditions (in which
individuals train on a single task), even when testing is con-
ducted on the same task as trained under constant practice.
For example, practice that varied the distance from a target at
which beanbags were tossed (2 or 4 feet) produced greater accu-
racy at test on an intermediate distance (3 feet) than did prac-
tice limited to the same intermediate target distance (Schmidt
& Bjork, 1992). The benefits of variable practice were first recog-
nized by Schmidt (1975) for discrete motor tasks and explained
by him in terms of a schema theory, according to which vari-
ability promotes effective and general use of rules (schemata)
relating external task requirements to internal movement com-
mands. Wulf and Schmidt (1997) extended these findings to a
continuous, feedback-regulated tracking task, and Schmidt and
Bjork (1992) extended them further to tasks that do not involve
motor learning, such as concept formation and text process-
ing. Goode, Geraci, and Roediger (2008) also found that vari-
able practice yielded superior transfer over repeated practice on
anagram solutions.
Contrary to these findings, in a feedback-regulated non-
tracking perceptual-motor task, Healy, Wohldmann, Sutton,
and Bourne (2006) found that performance was worse for vari-
able practice conditions relative to constant practice conditions
involving the same task used during transfer testing. The tasks
in this study were defined in terms of particular perceptual-
motor reversal conditions, all involving the same targets. It is
possible to introduce a different form of variability by varying
the targets rather than the type of perceptual-motor reversal. In
a subsequent study involving the same perceptual-motor task,
Wohldmann, Healy, and Bourne (2008b) found benefits of vari-
able practice when subjects were given multiple targets under
the same perceptual-motor reversal condition. The conclusion is
that varying task parameters within a single generalized motor
program (schema) enhances transfer, whereas varying the motor
programs themselves has no beneficial transfer effect.
Guideline: Trainers should vary the conditions of practice
to facilitate generalization of the trained skill. There are some
limits, however, which involve how variability is introduced into
the task.

Summary and Conclusions


This chapter has reviewed the cognitive experimental literature
on training and identified some training principles with strong
empirical validity. These principles do not necessarily apply for
all tasks under all circumstances. Thus, it is important for a
trainer to keep in mind certain distinctions that qualify these
principles. Possibly the most critical of these distinctions is the
difference between skill and knowledge, which is sometimes
equated with the distinction between procedural and declarative
information or the difference between implicit and explicit learn-
ing (see, e.g., Anderson, 1983, and the discussion in chapter 1).
Optimal training will differ depending on whether developing
skill or acquiring knowledge is the primary goal.
The review also acknowledges the three fundamental cogni-
tive processes underlying training, namely acquisition, reten-
tion, and transfer (see chapter 1 for definition and chapter 12 for
a quantitative framework incorporating these processes). Train-
ing principles in some cases apply differentially across those
processes, such that some manipulations might facilitate acqui-
sition but impede retention or transfer. Likewise, some training
principles might impact particular performance measures but
not others, especially under conditions involving a speed–accu-
racy trade-off. Trainers need to be alert to the primary goal of
training, which in some cases might be training efficiency but
in other cases might be durability or generalizability. Similarly,
trainers need to recognize the aspects of behavior that are most
important to be optimized by training, which in some cases will
be accuracy and in other cases speed of response.
The training principles outlined here should be applicable
in a variety of real-world training contexts including the train-
ing of military personnel and industrial workers (see chapters 7
and 14 for detailed discussions of this issue). However, these are
training principles, accompanied by tentative guidelines; they are
certainly not specifications for how training should be
implemented (see chapter 1, this volume; Salas, Cannon-Bowers,
& Blickensderfer, 1999). This review provides the first step in the
design of optimal training programs. Additional applied research
needs to be undertaken to translate these principles and ini-
tial guidelines to specifications. Particular applications must
be based on research that refines the guidelines and translates
them into usable training specifications.
References
Adams, J. A., & Hufford, L. E. (1962). Contributions of a part-task
trainer to the learning and relearning of a time-shared flight
maneuver. Human Factors, 4, 159–170.
Allen, G. A., Mahler, W. A., & Estes, W. K. (1969). Effects of recall tests
on long-term retention of paired associates. Journal of Verbal Learn-
ing and Verbal Behavior, 8, 463–470.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA:
Harvard University Press.
Bahrick, H. P. (1979). Maintenance of knowledge: Questions about
memory we forgot to ask. Journal of Experimental Psychology: Gen-
eral, 108, 296–308.
Bahrick, H. P. (2005). The long-term neglect of long-term memory:
Reasons and remedies. In A. F. Healy (Ed.), Experimental cogni-
tive psychology and its applications (pp. 89–100). Washington, DC:
American Psychological Association.
Battig, W. F. (1972). Intratask interference as a source of facilitation in
transfer and retention. In R. F. Thompson & J. F. Voss (Eds.), Top-
ics in learning and performance (pp. 131–159). New York: Academic
Press.
Battig, W. F. (1979). The flexibility of human memory. In L. S. Cermak
& F. I. M. Craik (Eds.), Levels of processing in human memory (pp.
23–44). Hillsdale, NJ: Erlbaum.
Beilock, S. L., Bertenthal, B. I., McCoy, A. M., & Carr, T. H. (2004).
Haste does not always make waste: Expertise, direction of attention,
and speed versus accuracy in performing sensorimotor skills. Psy-
chonomic Bulletin & Review, 11, 373–379.
Bjork, R. A. (1994). Memory and metamemory considerations in the
training of human beings. In J. Metcalfe & A. Shimamura (Eds.),
Metacognition: Knowing about knowing (pp. 185–205). Cambridge,
MA: MIT Press.
Bjork, R. A. (2010, November). The dynamics of use and disuse in
human memory. Keynote address at the 51st Annual Meeting of the
Psychonomic Society, St. Louis, MO.
Bonk, W. J., & Healy, A. F. (2010). Learning and memory for sequences
of pictures, words, and spatial locations: An exploration of serial
position effects. American Journal of Psychology, 123, 137–168.
Bourne, L. E., Jr., & Archer, E. J. (1956). Time continuously on target
as a function of distribution of practice. Journal of Experimental Psy-
chology, 51, 25–33.
Bourne, L. E., Jr., Dodd, D. H., Guy, D. E., & Justesen, D. R. (1968).
Response-contingent intertrial intervals in concept identification.
Journal of Experimental Psychology, 76, 601–608.
Bourne, L. E., Jr., Guy, D. E., Dodd, D. H., & Justesen, D. R. (1965).
Concept identification: The effects of varying length and informa-
tional components of the intertrial interval. Journal of Experimental
Psychology, 69, 624–629.
Bourne, L. E., Jr., Healy, A. F., Pauli, P., Parker, J. T., & Birbaumer,
N. (2005). The influence of stimulus array on training of a speeded
response. American Journal of Psychology, 118, 385–411.
Buck-Gengler, C. J., & Healy, A. F. (2001). Processes underlying long-
term repetition priming in digit data entry. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 27, 879–888.
Butler, A. C., Karpicke, J. D., & Roediger, H. L., III. (2007). The effect of
type and timing of feedback on learning from multiple-choice tests.
Journal of Experimental Psychology: Applied, 13, 273–281.
Carlson, R. A., & Yaure, R. G. (1990). Practice schedules and the use
of component skills in problem solving. Journal of Experimental Psy-
chology: Learning, Memory, and Cognition, 16, 484–496.
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and
spacing effects to name learning. Applied Cognitive Psychology, 19,
619–636.
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support
enhances subsequent retention: Support for the elaborative retrieval
explanation of the testing effect. Memory & Cognition, 34, 268–276.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention.
Memory & Cognition, 20, 633–642.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006).
Distributed practice in verbal recall tasks: A review and quantitative
synthesis. Psychological Bulletin, 132, 354–380.
Clawson, D. M., Healy, A. F., Ericsson, K. A., & Bourne, L. E., Jr. (2001).
Retention and transfer of Morse code reception skill by novices:
Part-whole training. Journal of Experimental Psychology: Applied, 7,
129–142.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A frame-
work for memory research. Journal of Verbal Learning and Verbal
Behavior, 11, 671–684.
Crutcher, R. J., & Healy, A. F. (1989). Cognitive operations and the gen-
eration effect. Journal of Experimental Psychology: Learning, Mem-
ory, and Cognition, 15, 669–675.
Cull, W. L. (2000). Untangling the benefits of multiple study opportuni-
ties and repeated testing for cued recall. Applied Cognitive Psychol-
ogy, 14, 215–235.
Cull, W. L., Shaughnessy, J. J., & Zechmeister, E. B. (1996). Expand-
ing understanding of the expanding-pattern-of-retrieval mnemonic:
Toward confidence in applicability. Journal of Experimental Psychol-
ogy: Applied, 2, 365–378.
Decety, J., Jeannerod, M., & Prablanc, C. (1989). The timing of mentally
represented actions. Behavioural Brain Research, 34, 35–42.
Doane, S. M., Alderton, D. L., Sohn, Y. W., & Pellegrino, J. W. (1996).
Acquisition and transfer of skilled performance: Are visual discrimi-
nation skills stimulus specific? Journal of Experimental Psychology:
Human Perception and Performance, 22, 1218–1248.
Doane, S. M., Sohn, Y. W., & Schreiber, B. (1999). The role of processing
strategies in the acquisition and transfer of a cognitive skill. Journal
of Experimental Psychology: Human Perception and Performance, 25,
1390–1410.
Driskell, J. E., Copper, C., & Moran, A. (1994). Does mental practice
enhance performance? Journal of Applied Psychology, 79, 481–492.
Driskell, J. E., Willis, R. P., & Copper, C. (1992). Effect of overlearning
on retention. Journal of Applied Psychology, 77, 615–622.
Dutta, A., & Proctor, R. W. (1992). Persistence of stimulus-response
compatibility effects with extended practice. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 18, 801–809.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of
deliberate practice in the acquisition of expert performance. Psycho-
logical Review, 100, 363–406.
Fitts, P. M., & Deininger, R. L. (1954). S-R compatibility: Correspon-
dence among paired elements within stimulus and response codes.
Journal of Experimental Psychology, 48, 483–492.
Fitts, P. M., & Seeger, C. M. (1953). S-R compatibility: Spatial charac-
teristics of stimulus and response codes. Journal of Experimental
Psychology, 46, 199–210.
Glenberg, A. M. (1976). Monotonic and nonmonotonic lag effects in
paired-associate and recognition memory paradigms. Journal of Ver-
bal Learning and Verbal Behavior, 15, 1–16.
Goode, M. K., Geraci, L., & Roediger, H. L., III. (2008). Superiority of
variable to repeated practice in transfer on anagram solution. Psy-
chonomic Bulletin & Review, 15, 662–666.
Healy, A. F. (2007). Transfer: Specificity and generality. In H. L. Roedi-
ger, III, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Con-
cepts (pp. 271–275). New York: Oxford University Press.
Healy, A. F., & Bourne, L. E., Jr. (Eds.). (1995). Learning and memory
of knowledge and skills: Durability and specificity. Thousand Oaks,
CA: Sage.
Healy, A. F., Clawson, D. M., McNamara, D. S., Marmie, W. R., Schnei-
der, V. I., Rickard, T. C., … Bourne, L. E., Jr. (1993). The long-term
retention of knowledge and skills. In D. Medin (Ed.), The psychology
of learning and motivation: Advances in research and theory (Vol. 30,
pp. 135–164). New York: Academic Press.
Healy, A. F., Fendrich, D. W., Crutcher, R. J., Wittman, W. T., Gesi, A.
T., Ericsson, K. A., & Bourne, L. E., Jr. (1992). The long-term reten-
tion of skills. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.),
From learning processes to cognitive processes: Essays in honor of
William K. Estes (Vol. 2, pp. 87–118). Hillsdale, NJ: Erlbaum.
Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004).
Effects of prolonged work on data entry speed and accuracy. Journal
of Experimental Psychology: Applied, 10, 188–199.
Healy, A. F., Shea, K. M., Kole, J. A., & Cunningham, T. F. (2008). Posi-
tion distinctiveness, item familiarity, and presentation frequency
affect reconstruction of order in immediate episodic memory. Jour-
nal of Memory and Language, 58, 746–764.
Healy, A. F., Wohldmann, E. L., & Bourne, L. E., Jr. (2005). The pro-
cedural reinstatement principle: Studies on training, retention, and
transfer. In A. F. Healy (Ed.), Experimental cognitive psychology and
its applications (pp. 59–71). Washington, DC: American Psychologi-
cal Association.
Healy, A. F., Wohldmann, E. L., Kole, J. A., Schneider, V. I., Shea, K.
M., & Bourne, L. E., Jr. (in press). Training for efficient, durable, and
flexible performance in the military. In W. Arthur, Jr., E. A. Day, W.
Bennett, Jr., & A. Portrey (Eds.), Individual and team skill decay:
State of the science and implications for practice. New York: Taylor
& Francis.
Healy, A. F., Wohldmann, E. L., Parker, J. T., & Bourne, L. E., Jr. (2005).
Skill training, retention, and transfer: The effects of a concurrent
secondary task. Memory & Cognition, 33, 1457–1471.
Healy, A. F., Wohldmann, E. L., Sutton, E. M., & Bourne, L. E., Jr.
(2006). Specificity effects in training and transfer of speeded
responses. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 32, 534–546.
Hintzman, D. L. (1974). Theoretical implications of the spacing effect.
In R. L. Solso (Ed.), Theories in cognitive psychology: The Loyola sym-
posium (pp. 77–99). Potomac, MD: Erlbaum.
Hsiao, A. T., & Reber, A. S. (2001). The dual-task SRT procedure: Fine-
tuning the timing. Psychonomic Bulletin & Review, 8, 336–342.
Izawa, C. (1992). Test trial contributions to optimization of learning
processes: Study/test trials interactions. In A. F. Healy, S. M. Koss-
lyn, & R. M. Shiffrin (Eds.), From learning processes to cognitive
processes: Essays in honor of William K. Estes (Vol. 2, pp. 1–33).
Hillsdale, NJ: Erlbaum.
Kahana, M. J., & Howard, M. W. (2005). Spacing and lag effects in free
recall of pure lists. Psychonomic Bulletin & Review, 12, 159–164.
Karpicke, J. D., & Bauernschmidt, A. (2011). Spaced retrieval: Absolute
spacing enhances learning regardless of relative spacing. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 37,
1250–1257.
Karpicke, J. D., & Roediger, H. L., III. (2008). The critical importance of
retrieval for learning. Science, 319, 966–968.
Karpicke, J. D., & Roediger, H. L., III. (2010). Is expanding retrieval a
superior method for learning text materials? Memory & Cognition,
38, 116–124.
Karpicke, J. D., & Zaromb, F. M. (2010). Retrieval mode distinguishes
the testing effect from the generation effect. Journal of Memory and
Language, 62, 227–239.
Kohl, R. M., & Roenker, D. L. (1983). Mechanism involvement during
skill imagery. Journal of Motor Behavior, 15, 179–190.
Kole, J. A., & Healy, A. F. (2007). Using prior knowledge to minimize
interference when learning large amounts of information. Memory &
Cognition, 35, 124–137.
Kole, J. A., & Healy, A. F. (2011). Memory for details about people:
Familiarity, relatedness, and gender congruency. Memory & Cogni-
tion, 39, 637–648.
Kole, J. A., Healy, A. F., & Bourne, L. E., Jr. (2008). Cognitive com-
plications moderate the speed-accuracy tradeoff in data entry: A
cognitive antidote to inhibition. Applied Cognitive Psychology, 22,
917–937.
Kole, J. A., Healy, A. F., Fierman, D. M., & Bourne, L. E., Jr. (2010).
Contextual memory and skill transfer in category search. Memory &
Cognition, 38, 67–82.
Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval
attempts enhance subsequent learning. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 35, 989–998.
Krueger, W. C. F. (1929). The effect of overlearning on retention. Journal
of Experimental Psychology, 12, 71–78.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns
and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes
(Eds.), Practical aspects of memory (pp. 625–632). New York: Aca-
demic Press.
Lee, T. D., & Magill, R. A. (1983). The locus of contextual interference in
motor-skill acquisition. Journal of Experimental Psychology: Learn-
ing, Memory, and Cognition, 9, 730–746.
Logan, G. D., & Klapp, S. T. (1991). Automatizing alphabet arithmetic:
I. Is extended practice necessary to produce automaticity? Journal
of Experimental Psychology: Learning, Memory, and Cognition, 17,
179–195.
Lohse, K. R., Sherwood, D. E., & Healy, A. F. (2010). How changing the
focus of attention affects performance, kinematics, and electromy-
ography in dart throwing. Human Movement Science, 29, 542–555.
Lohse, K. R., Wulf, G., & Lewthwaite, R. (in press). Attentional focus
affects movement efficiency. In N. J. Hodges & M. A. Williams (Eds.),
Skill acquisition in sport: Research, theory and practice (2nd ed.).
New York: Routledge.
Marmie, W. R., & Healy, A. F. (1995). The long-term retention of a
complex skill. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Learning
and memory of knowledge and skills: Durability and specificity (pp.
30–65). Thousand Oaks, CA: Sage.
Maxwell, J. P., Masters, R. S. W., Kerr, E., & Weedon, E. (2001). The
implicit benefit of learning without errors. Quarterly Journal of
Experimental Psychology: Human Experimental Psychology, 54A,
1049–1068.
McDaniel, M. A., & Butler, A. C. (2011). A contextual framework for
understanding when difficulties are desirable. In A. S. Benjamin
(Ed.), Successful remembering and successful forgetting: A fest-
schrift in honor of Robert A. Bjork (pp. 175–198). New York: Psychol-
ogy Press.
McDaniel, M. A., & Einstein, G. O. (2005). Material appropriate dif-
ficulty: A framework for determining when difficulty is desirable for
improving learning. In A. F. Healy (Ed.), Experimental cognitive psy-
chology and its applications (pp. 73–85). Washington, DC: American
Psychological Association.
McDaniel, M. A., & Fisher, R. P. (1991). Tests and test feedback as learn-
ing sources. Contemporary Educational Psychology, 16, 192–201.
McDaniel, M. A., Howard, D. C., & Einstein, G. O. (2009). The read-
recite-review study strategy: Effective and portable. Psychological
Science, 20, 516–522.
McDaniel, M. A., Roediger, H. L., III, & McDermott, K. B. (2007). Gener-
alizing test-enhanced learning from the laboratory to the classroom.
Psychonomic Bulletin & Review, 14, 200–206.
McNamara, D. S., & Healy, A. F. (1995). A procedural explanation of the
generation effect: The use of an operand retrieval strategy for multi-
plication and addition problems. Journal of Memory and Language,
34, 399–416.
McNamara, D. S., & Healy, A. F. (2000). A procedural explanation of
the generation effect for simple and difficult multiplication problems
and answers. Journal of Memory and Language, 43, 652–679.
Miles, J. D., & Proctor, R. W. (2010). Attention is required for acquisition
but not expression of new response biases. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 36, 1554–1560.
Minas, S. C. (1978). Mental practice of a complex perceptual motor
skill. Journal of Human Movement Studies, 4, 102–109.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of process-
ing versus transfer appropriate processing. Journal of Verbal Learn-
ing and Verbal Behavior, 16, 519–533.
Morris, P. E., & Fritz, C. O. (2000). The name game: Using retrieval
practice to improve the learning of names. Journal of Experimental
Psychology: Applied, 6, 124–129.
Nipher, F. E. (1878). On the distribution of errors in numbers written
from memory. Transactions of the Academy of Science of St. Louis, 3,
ccx–ccxi.
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When
does feedback facilitate learning of words? Journal of Experimental
Psychology: Learning, Memory, and Cognition, 31, 3–8.
Pashler, H., Rohrer, D., Cepeda, N. J., & Carpenter, S. K. (2007).
Enhancing learning and retarding forgetting: Choices and conse-
quences. Psychonomic Bulletin & Review, 14, 187–193.
Pellegrino, J. W., Doane, S. M., Fischer, S. C., & Alderton, D. (1991).
Stimulus complexity effects in visual comparisons: The effects of
practice and learning context. Journal of Experimental Psychology:
Human Perception and Performance, 17, 781–791.
Proctor, R. W., & Dutta, A. (1995). Skill acquisition and human perfor-
mance. Thousand Oaks, CA: Sage.
Proctor, R. W., & Vu, K.-P. L. (2006). Stimulus-response compatibility
principles: Data, theory, and application. Boca Raton, FL: CRC Press.
Rawson, K. A., & Dunlosky, J. (2011). Optimizing schedules of retrieval
practice for durable and efficient learning: How much is enough?
Journal of Experimental Psychology: General, 140, 283–302.
Roediger, H. L., III. (2008). Relativity of remembering: Why the laws of
memory vanished. Annual Review of Psychology, 59, 225–254.
Roediger, H. L., III, Weldon, M. S., & Challis, B. H. (1989). Explaining
dissociations between implicit and explicit measures of retention:
A processing account. In H. L. Roediger, III & F. I. M. Craik (Eds.),
Varieties of memory and consciousness: Essays in honour of Endel
Tulving (pp. 3–41). Hillsdale, NJ: Erlbaum.
Rohrer, D., & Taylor, K. (2006). The effects of overlearning and distrib-
uted practice on the retention of mathematics knowledge. Applied
Cognitive Psychology, 20, 1209–1224.
Rohrer, D., Taylor, K., Pashler, H., Wixted, J. T., & Cepeda, N. J. (2005).
The effect of overlearning on long-term retention. Applied Cognitive
Psychology, 19, 361–374.
Salas, E., Cannon-Bowers, J. A., & Blickensderfer, E. L. (1999).
Enhancing reciprocity between training theory and practice: Princi-
ples, guidelines, and specifications. In G. R. Ferris (Ed.), Research in
personnel and human resources management (Vol. 17, pp. 291–321).
Stamford, CT: JAI Press.
Schmidt, R. A. (1975). A schema theory of discrete motor skill learning.
Psychological Review, 82, 225–260.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of prac-
tice: Common principles in three paradigms suggest new concepts
for training. Psychological Science, 3, 207–217.
Schmidt, R. A., Young, D. E., Swinnen, S., & Shapiro, D. C. (1989).
Summary knowledge of results for skill acquisition: Support for the
guidance hypothesis. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 15, 352–359.
Schneider, V. I., Healy, A. F., & Bourne, L. E., Jr. (1998). Contextual
interference effects in foreign language vocabulary acquisition and
retention. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Foreign language
learning: Psycholinguistic studies on training and retention (pp.
77–90). Mahwah, NJ: Erlbaum.
Schneider, V. I., Healy, A. F., & Bourne, L. E., Jr. (2002). What is learned
under difficult conditions is hard to forget: Contextual interference
effects in foreign vocabulary acquisition, retention, and transfer.
Journal of Memory and Language, 46, 419–440.
Schneider, V. I., Healy, A. F., Buck-Gengler, C. J., Barshi, I., & Bourne,
L. E., Jr. (2007, July). The effects of feedback on learning to follow
navigation instructions. Poster presented at the joint meeting of the
Experimental Psychology Society and the Psychonomic Society,
Edinburgh, Scotland.
Schneider, V. I., Healy, A. F., Ericsson, K. A., & Bourne, L. E., Jr. (1995).
The effects of contextual interference on the acquisition and reten-
tion of logical rules. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Learn-
ing and memory of knowledge and skills: Durability and specificity
(pp. 95–131). Thousand Oaks, CA: Sage.
Shanks, D. R., & Cameron, A. (2000). The effect of mental practice on
performance in a sequential reaction time task. Journal of Motor
Behavior, 32, 305–313.
Shea, C. H., & Wulf, G. (1999). Enhancing motor learning through
external-focus instructions and feedback. Human Movement Sci-
ence, 18, 553–571.
Shea, J. B., & Morgan, R. L. (1979). Contextual interference effects on
the acquisition, retention, and transfer of a motor skill. Journal of
Experimental Psychology: Human Learning and Memory, 5, 179–187.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of
a phenomenon. Journal of Experimental Psychology: Human Learn-
ing and Memory, 4, 592–604.
Smith, S. M., & Vela, E. (2001). Environmental context-dependent mem-
ory: A review and meta-analysis. Psychonomic Bulletin & Review, 8,
203–220.
Smith, T. A., & Kimball, D. R. (2010). Learning from feedback: Spacing
and the delay-retention effect. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 36, 80–95.
Storm, B. C., Bjork, R. A., & Storm, J. C. (2010). Optimizing retrieval
as a learning event: When and why expanding retrieval practice
enhances long-term retention. Memory & Cognition, 38, 244–253.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval
processes in episodic memory. Psychological Review, 80, 352–373.
Underwood, B. J., & Ekstrand, B. R. (1967). Studies of distributed
practice: XXIV. Differentiation and proactive inhibition. Journal of
Experimental Psychology, 74, 574–580.
Van Overschelde, J. P., & Healy, A. F. (2001). Learning of nondomain
facts in high- and low-knowledge domains. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 27, 1160–1171.
Wightman, D., & Lintern, G. (1985). Part-task training for tracking and
manual control. Human Factors, 27, 267–283.
Wohldmann, E. L., Healy, A. F., & Bourne, L. E., Jr. (2007). Pushing
the limits of imagination: Mental practice for learning sequences.
Journal of Experimental Psychology: Learning, Memory, and Cogni-
tion, 33, 254–261.
Wohldmann, E. L., Healy, A. F., & Bourne, L. E., Jr. (2008a). A mental
practice superiority effect: Less retroactive interference and more
transfer than physical practice. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 34, 823–833.
Wohldmann, E. L., Healy, A. F., & Bourne, L. E., Jr. (2008b). Global
inhibition and midcourse corrections in speeded aiming. Memory &
Cognition, 36, 1228–1235.
Wohldmann, E. L., Healy, A. F., & Bourne, L. E., Jr. (2010). Task inte-
gration in time production. Attention, Perception, & Psychophysics,
72, 1130–1143.
Wulf, G. (2007). Attention and motor skill learning. Champaign, IL:
Human Kinetics.
Wulf, G., & Schmidt, R. A. (1997). Variability of practice and implicit
motor learning. Journal of Experimental Psychology: Learning, Mem-
ory, and Cognition, 23, 987–1006.
Wulf, G., & Shea, C. H. (2002). Principles derived from the study of
simple skills do not generalize to complex skill learning. Psycho-
nomic Bulletin & Review, 9, 185–211.
Wulf, G., Shea, C. H., & Whitacre, C. A. (1998). Physical-guidance ben-
efits in learning a complex motor skill. Journal of Motor Behavior, 30,
367–380.
Young, M. D., Healy, A. F., Gonzalez, C., Dutt, V., & Bourne, L. E., Jr.
(2011). Effects of training with added difficulties on RADAR detec-
tion. Applied Cognitive Psychology, 25, 395–407.
3 Basic Research on Training Principles
Alice F. Healy and Lyle E. Bourne, Jr.
University of Colorado

Chapter 2 reviewed the theoretical and empirical literature on
training principles in an effort to give a general picture of cur-
rent scientific knowledge applicable to training from a cogni-
tive psychological point of view. The present chapter extends
or expands upon the empirical foundation of what is currently
known by reviewing in some detail sets of recent experiments
that reexamine some of the principles discussed in chap-
ter 2 and introduce some new principles first observed in this
research. The experiments to be described were chosen in part
to illustrate the range of training issues, cognitive processes,
and experimental methods that are currently under examina-
tion by the authors. The reader should bear in mind that train-
ing principles are empirical effects and generalizations validated
in the laboratory that have implications for training. These prin-
ciples have the potential to transition to training guidelines or
specifications (see chapter 1), although the discussion in the
present chapter is focused primarily on the level of principles.
These experiments address four questions left open by the
literature review presented in chapter 2. First, how general is
the applicability of training principles that are known from the
literature to be empirically valid? Does a given principle apply
broadly, or is its effect limited only to certain contexts or task
requirements? Second, if principles are combined for use in a
particular training task, how do they interact? Do they support
each other so that a greater impact on performance is obtained
from multiple principles than from any one alone? Or are there
circumstances in which the impact of one principle is offset
by the introduction of another? Third, the empirical validity of
most training principles reviewed in chapter 2 was tested and
established with relatively simple cognitive psychological tasks.
Do these principles work when the job becomes complex and
takes on characteristics of the real world? Finally, it should be
obvious that training principles that have been researched to
date do not exhaust the possibilities. It is likely that many effec-
tive and empirically valid principles are yet to be discovered.
What are some of these possibilities? Certain experiments to
be reviewed in this chapter examined principles not revealed in
prior research and others, although not intended to produce a
new training procedure, nonetheless returned results consistent
with a novel principle.
Thus the experiments of this chapter are organized into the
following categories:

1. Tests of the generality of individual principles across tasks or jobs,
2. Tests of multiple principles in a single task,
3. Tests of principles in complex, dynamic task or job
environments,
4. Identifying and testing new principles.

The following sections contain experiments that illustrate
each of these four categories. The experiments are intended
to demonstrate the breadth of this empirical research pro-
gram covering multiple training principles and a wide variety
of experimental contexts. Nonetheless, it is important to note
that the coverage is illustrative and by no means complete or
exhaustive.

Tests of the Generality of Individual Principles across Tasks or Jobs
Experiments to be described next attempt to assess the gen-
erality of selected training principles validated in one task to
performance in significantly different other tasks. This infor-
mation is, of course, important to know if training principles
are to have practical value in the real world. The first prin-
ciple to be considered is the procedural reinstatement principle,
which was originally formulated based on results obtained in
a target detection task (Healy, Fendrich, & Proctor, 1990) and
later extended to a variety of other tasks (Healy, Wohldmann,
& Bourne, 2005). It is now extended in a new way to a substantially
different database lookup task. The principle, as discussed in
chapter 2, states that transfer of training is limited for tasks
based primarily on procedural information or skill whereas
robust transfer occurs for tasks based primarily on declarative
information or facts.
Information Lookup in a Computerized Database
The task used to determine the generality of the procedural
reinstatement principle involves information lookup in a com-
puterized database (Kole, Healy, Fierman, & Bourne, 2010). The
situation is meant as an analogue of jobs in which an individual
worker might be required to consult a database for repair instruc-
tions related to a particular kind of equipment malfunction. The
difficulty of the search in this case can be compounded by the
fact that the database is often not static, but rather dynamic or
periodically changing.
Subjects were asked to find particular items in the database.
To find an item, they first searched through a list of catego-
ries and then searched for the particular designated item in the
list of exemplars for the selected category. A domain familiar
to the subjects was used, items in a grocery store. On a given
trial, subjects chose a category from a list of 12 or 24 grocery
store categories (such as baking supplies, fruits, health foods,
and snacks) arranged alphabetically. Then they clicked with a
mouse to make their category choice. After clicking on a cat-
egory, subjects were shown a list of four exemplars in the chosen
category (e.g., for the category baking supplies the items might
be almonds, corn starch, molasses, and salt), and they then had
to either select the designated one or, if it was not there, go back
to the category list.
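As an illustration of the task structure, the lookup procedure can be sketched as a small program. This is only a sketch, not the software used in the experiments; the function name, the click-counting scheme, and the sample database entries are assumptions made for illustration.

```python
# Illustrative model of the two-level lookup task: each category choice,
# back click, and exemplar selection counts as one mouse click.

def lookup(database, target, category_order):
    """Search categories in the given order until the target exemplar
    is found; return the number of clicks required."""
    clicks = 0
    for category in category_order:
        clicks += 1                      # click to open the category
        if target in database[category]:
            clicks += 1                  # click to select the target
            return clicks
        clicks += 1                      # click to go back to the category list
    raise ValueError(f"{target!r} not found in database")

grocery = {
    "baking supplies": ["almonds", "corn starch", "molasses", "salt"],
    "fruits": ["apples", "bananas", "grapes", "pears"],
}
# Opening "fruits" first, backing out, then finding "molasses" under
# "baking supplies" takes 4 clicks.
print(lookup(grocery, "molasses", ["fruits", "baking supplies"]))  # 4
```

Under a scheme like this, a measure such as response time per click normalizes total search time by the number of clicks a trial required.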
When the categories were in the same domain during train-
ing and test, transfer of training was observed. The first of the
experiments reported here sought to understand the underly-
ing basis for that transfer effect. Did it rely on seeing the same
categories in training and test? Would there be transfer even if
the targets came from different categories at training and test?
To answer these questions, the variable of category continu-
ity, which varied the extent to which the category information
provided at training corresponded with the category information
provided at test, was introduced, with complete correspondence
in the match condition, partial correspondence in the partial
match condition, and no correspondence in the no match con-
dition. The manipulation of category continuity is relevant, for
example, to situations when workers must search through a
dynamic computerized database, with changes made periodi-
cally in the categories. According to the procedural reinstate-
ment principle, there should be transfer in the match and partial
match conditions because of the overlap in declarative category
information, but there should be little or no transfer in the no
match condition because, in that case, there is overlap only in
the procedural information (mouse movement and the clicking
response).

Figure 3.1 Response time per click (in s) as a function of phase (training
vs. test) and category continuity condition (match, partial match,
no match) in Kole et al. (2010, Experiment 2). Error bars show
standard errors of the mean.
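The three continuity conditions can be thought of as rules for assembling the test-phase category set from the training set and a larger pool. The sketch below is a hypothetical reconstruction for illustration only; the function name, set sizes, and sampling scheme are assumptions, not the authors' materials.

```python
import random

def continuity_categories(training, pool, condition, rng=random.Random(0)):
    """Return an illustrative test-phase category list with complete,
    partial, or no overlap with the training-phase categories."""
    unused = [c for c in pool if c not in training]
    if condition == "match":
        return list(training)                         # complete overlap
    if condition == "partial match":
        keep = len(training) // 2                     # keep half, replace half
        return training[:keep] + rng.sample(unused, len(training) - keep)
    if condition == "no match":
        return rng.sample(unused, len(training))      # no overlap
    raise ValueError(condition)
```

In all three conditions the individual target items differ between phases; only the amount of category overlap is varied.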
The results are summarized in Figure 3.1 in terms of the
measure of response time per click (in s). Subjects improved
from training to test the most when there was a match in cat-
egories (although all items within the categories were different
in the training and test phases). There was less transfer when
categories only partially matched, and even less transfer when
there was no match between categories. Thus, there was domain
specific transfer of declarative category information.
A second experiment asked whether transfer would also be
dependent on exemplar continuity. The variable of exemplar
continuity is relevant to a slightly different dynamic database
that might involve periodic changes in the category exemplars,
rather than changes in the categories themselves. The results
of Experiment 2 are summarized in Figure 3.2 in terms of the
measure proportion of perfect trials. Subjects improved from
training to test only when there was a full or partial match in
exemplars (although all targets were different in the two phases).
Thus, transfer of declarative exemplar information is domain
specific.
Figure 3.2 Proportion of perfect trials as a function of phase (training
vs. test) and exemplar continuity condition (match, partial match,
no match) in Kole et al. (2010, Experiment 3). Error bars show
standard errors of the mean.

In summary, there is domain specific transfer of both declarative
category information and declarative exemplar information,
but little or no transfer of procedural information, all of
which is in accordance with the procedural reinstatement prin-
ciple and points to the generality of that principle. These find-
ings imply that training in computerized systems can capitalize
on the transfer benefits of consistent declarative information (as
long as there is at least partial overlap in the categories or exem-
plars) but should anticipate little or no benefit from consistent
procedural components.

Message Comprehension
Another principle in need of a generality check is the optimal
modality principle; that is, learning is better when information
is seen than when it is read, and it is best when it is both read
and seen. By this principle, the ordering of modalities to use
for presenting information from worst to best is (a) read, (b) see,
and (c) both read and see. This principle is popular in the field
of classroom education (Dale, 1969), but there is little empiri-
cal support for it in other contexts. Is the ordering of modalities
specified by this principle valid generally and, thus, appropriate
for optimizing training?
A message comprehension task, meant to mimic communica-
tion between air traffic controllers and pilots (Barshi & Healy,
2002, 2011), was used to examine this principle in a context out-
side of the classroom. Subjects were given messages about navi-
gating in a space shown on the computer screen, and then they
followed those instructions by making mouse clicks to indicate
movement in the space. Three format conditions were examined
(McCormick, Schneider, & Healy, 2010), one with messages pre-
sented in only one modality one time each (single), one with mes-
sages presented in only one modality two times each (double),
and one with messages presented first in one modality and then
in the other (mixed). For the single and double conditions, mes-
sages that were read (words) were compared with messages that
were seen (symbols).
Figure 3.3 shows proportion of correct responses as a func-
tion of format condition. Performance was worst when the com-
mands were shown only once (single), and performance was no
worse (in fact there was a trend for performance to be better)
when the commands were shown twice in the same modality
(double) than when two modalities (mixed) were used. Perfor-
mance in the word modality improved more consistently across

blocks of trials than in the symbol modality (see Figure 3.4).
Thus in the first block, words yielded numerically worse
performance than symbols, but in the last blocks words yielded
numerically better performance than symbols.

Figure 3.3 Proportion correct in study by McCormick et al. (2010) as a
function of format condition (double, mixed, single). Error bars
show standard errors of the mean.

Figure 3.4 Proportion correct in study by McCormick et al. (2010) as a
function of modality (words vs. symbols) and block (1–6). Error
bars show standard errors of the mean.
These findings suggest that the optimal modality principle,
which specifies the ordering of modalities from worst to best as
(a) read, (b) see, and (c) both read and see, is at best incomplete
because the modality ordering depends on the amount of prac-
tice. Moreover, presenting information in two modalities does
not necessarily yield better performance than presenting infor-
mation twice in the same modality. Thus, the optimal modality
principle, which appears to be accepted as valid for classroom
education, lacks generality and should not be used in other
tasks without first testing its efficacy in those tasks.

Tests of Multiple Principles in a Single Task


Suppose manipulations are introduced into an experiment
that permit the assessment of effects attributable to two (or
possibly more) different principles. It might be that the two
principles operate independently of one another and that the
experimental outcome is essentially a summation of effects to
be expected from each principle on its own. But there are other
possibilities. There might, for example, be a synergy, such that
the effects of one principle potentiate the effects of the other.
Alternatively, the two principles might, to some degree, compen-
sate for one another, resulting in a subadditive effect of prin-
ciples. There is no good theory to predict what to expect when
principles are combined, and thus, it makes sense to study at
least some principle combinations. Illustrative experiments of
this type are reviewed in the following section.

Memory for Serial Lists


The next set of experiments examines in combination three dif-
ferent principles: the serial position, the dual coding, and the
retrieval distraction principles. The serial position principle
asserts that retention is best for items at the start of a list (pri-
macy advantage) and to some extent also at the end of a list
(recency advantage). The dual coding principle recognizes that
retention is better when information is presented in two, as
opposed to only one, modality. The retrieval distraction principle
implies intuitively that retention should be better when tested
with minimal distraction. Each of these principles has impli-
cations for training following from their known effects in the
laboratory. To address these principles in combination in order
to examine possible interactive effects, a new task was devel-
oped to investigate learning of, and memory for, serial lists, with
varying amounts of distractor items and with successive snap-
shots of the learning process taken until a criterion is reached
(Bonk & Healy, 2010). This task comes closer than other stan-
dard laboratory tasks to the everyday tasks that involve learn-
ing a series of steps in a standardized procedure, such as how
to clean and reassemble a device.
Subjects were shown and responded to the same sequence
of items repeatedly until they reached the criterion of two per-
fect consecutive reproductions of the sequence. Stimulus items
consisted of clip art pictures of identifiable, nameable objects,
such as a trumpet, a key, a microwave oven, and a slingshot.
Two presentation conditions were compared: In the moves condition, a movie showed an arrow cursor navigating from the center of the response space and clicking on 12 of the items, one at a
time. In contrast, in the no moves condition the same 12 target
items were presented one at a time in the center of a blank white
space. Subjects responded in both conditions by using a com-
puter mouse to navigate with the cursor among the pictures
within the response space used in the moves condition. Sub-
jects were to click on the pictures in an attempt to reproduce
the exact order of pictures they had just seen. Thus, spatial
information supplemented the item information during stimu-
lus presentation in the moves condition but not in the no moves
condition. Three distractor conditions were compared: In the
none condition, the only items shown on the screen were the 12
target items; in the equal condition there were an equal number
of target and distractor items shown on the screen; and in the
double condition there were twice as many distractor items as
target items shown on the screen.
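The run-to-criterion procedure described above can be sketched in code (a simplified illustration; the function and its arguments are hypothetical and not taken from Bonk & Healy, 2010):

```python
def train_to_criterion(sequence, reproduce, criterion=2):
    """Present `sequence` repeatedly until `reproduce` (a stand-in for the
    subject's attempted reproduction) matches it on `criterion` consecutive
    attempts. Returns the number of presentations needed."""
    consecutive = 0
    presentations = 0
    while consecutive < criterion:
        presentations += 1
        attempt = reproduce(sequence)
        # Reset the streak whenever a reproduction is imperfect.
        consecutive = consecutive + 1 if attempt == sequence else 0
    return presentations
```

Because the counter resets on any error, exactly two consecutive perfect reproductions end training, matching the criterion used in the experiment.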
The results are summarized in Figure 3.5, which shows the
serial position functions for the first and second recall attempts
in each combination of presentation condition and distractor
condition. The most important thing to note about these func-
tions is that they all show a bow shape, but the bowing is more
pronounced in some cases than in others. Specifically, there
were strong and extended primacy effects and weak recency
effects. There was also a large effect of recall attempt, with per-
formance much higher for the second attempt than for the first
attempt, as would be expected. There was a significant advan-
tage for the none distractors condition relative to the equal and
double conditions.

Figure 3.5 Proportion correct for the first two attempts as a function of presentation condition, distractor condition, and serial position in experiment by Bonk and Healy (2010). Error bars show standard errors of the mean.

In addition, there was a surprising, small but
significant advantage for the no moves condition (with no extra
spatial information) relative to the moves condition. Finally, the
shape of the serial position function depended on attempt as
well as on both presentation and distractor conditions, with the
primacy and recency effects largest when overall performance
was worst.
Although primacy and recency effects were found in all cases,
thus supporting a general serial position principle, the finding
of different serial position patterns for different conditions is not
captured by quantitative models of the serial position function
and therefore poses a challenge to those models (e.g., Brown,
Neath, & Chater, 2007; Murdock, 1960). The results from this
experiment also document the intuitive expectation that dis-
tractors at test disrupt recall. There was, however, no support
for the dual coding principle because the spatial information
provided by the moves condition proved to be distracting instead
of helpful to the recall of verbally encoded items.
The three principles under examination in this experiment
clearly have interactive effects, with the magnitude of primacy
and recency effects depending on retrieval distraction and the
type of coding. Therefore these principles need to be considered
in concert in any training routine. The more general theoreti-
cal implication is that single principles should not be applied
indiscriminately or in isolation because they might affect perfor-
mance differently depending on training and testing contexts.
The task used in this experiment resembles tasks that involve
learning a series of steps in a standardized procedure, such as
how to clean and reassemble a device (although the steps in that
case are not necessarily independent). Finding best performance
for the initial items in the series and worst performance for the
middle items should lead trainers to concentrate their training
efforts toward the middle items, assuming all items in the series
are equally important. But the extent to which specific middle
items should be emphasized clearly depends on both the learn-
ing and the testing conditions.

Clicker Technique
A popular new educational technology involves clickers. Periodi-
cally, during a classroom lecture, students are asked multiple-
choice questions and use a hand-held clicker device to produce
answers, with the distribution of answers visible to the class as
a whole and to the instructor. The instructor might decide to
review, elaborate, or provide more information about material
if many students in the class give wrong answers to the clicker
questions. The clicker technique provides a possible method of
classroom teaching time conservation or compression, whereby
any material that is already known is eliminated from further
presentation. In addition, like the well-established testing effect,
it uses tests, not just study episodes, as a way to strengthen
classroom learning. There are, thus, two training principles
that are relevant to the clicker technique. First is the principle
of training compression, by which training can be truncated by
eliminating practice on known facts. Second is the principle of
testing, by which tests can be used to strengthen a person’s
knowledge of material as much as, or possibly even more than,
can further study.
How do these two training principles interact? In the first
of two experiments reviewed here, a laboratory analogue of
the clicker technique was compared to a more standard tech-
nique, in which all material was presented whether or not it was
already learned, and a technique based on individualized train-
ing, in which material was dropped out from presentation to a
particular subject if the subject responded correctly to the mate-
rial (Anderson, Healy, Kole, & Bourne, 2010). Students learned
64 facts about plants. The facts were all correct, with the plant
names changed to fictitious ones to eliminate effects of prior
knowledge. Each fact was stated in either specific or general
form. Subjects were tested with four-alternative multiple-choice
questions, in which they had to recognize the fictitious plant
name associated with a fact. There were four study-test training
rounds, in which subjects were shown blocks of up to 8 facts
for study and then immediately tested on those facts. The first
and fourth training rounds involved all 64 facts; the second and
third rounds involved fewer facts in some conditions.
The four training conditions were (a) full, the standard condi-
tion involving presentation of all facts on each round; (b) drop-
out, the individualized condition involving presentation and
testing on Rounds 2 and 3 of only those facts missed on the pre-
ceding round by a given subject; (c) yoked, the control condition
involving presentation and testing on Rounds 2 and 3 of only
those facts missed on the preceding round by a matched subject
in the dropout condition; and (d) clicker, a condition simulat-
ing the clicker technology. The clicker condition was equated
to the dropout condition of a preliminary experiment in terms
of the number of facts presented and tested on Rounds 2 and
3 on average (26 of the 64 facts on Round 2 and 10 on Round
3). The specific facts selected were those that, in the full condition of the preliminary experiment, were most often missed by subjects on the previous round. An immediate
test followed the training rounds, and a retention test was given
1 week later. Both tests involved all 64 facts tested without
any prior study. At each test, half of the facts were in the same
format (specific or general) as at training and half were in dif-
ferent formats at training and test. The full condition involved
no training compression. The dropout, yoked, and clicker con-
ditions all involved compression of training. The compression
was individualized in the dropout condition and was based on
group performance in the clicker condition. The yoked condition
served as a control because the compression depended on the
performance of a single other individual rather than that of the subject.
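The round-by-round item selection that distinguishes the four conditions can be sketched as follows (a hypothetical illustration; the helper and its names are invented for this sketch, not drawn from Anderson et al., 2010):

```python
def facts_for_next_round(condition, all_facts, missed_by_subject,
                         missed_by_yoked_partner, group_miss_counts, n_keep):
    """Select which facts to present on Rounds 2 and 3.

    condition               -- "full", "dropout", "yoked", or "clicker"
    missed_by_subject       -- facts this subject missed on the preceding round
    missed_by_yoked_partner -- facts a matched dropout subject missed
    group_miss_counts       -- {fact: number of reference-group subjects who
                                missed it on the preceding round}
    n_keep                  -- facts retained in the clicker condition
    """
    if condition == "full":        # no compression: every fact, every round
        return list(all_facts)
    if condition == "dropout":     # individualized compression
        return list(missed_by_subject)
    if condition == "yoked":       # compression based on another individual
        return list(missed_by_yoked_partner)
    if condition == "clicker":     # compression based on group performance
        ranked = sorted(all_facts,
                        key=lambda f: group_miss_counts.get(f, 0),
                        reverse=True)
        return ranked[:n_keep]     # e.g., 26 facts on Round 2, 10 on Round 3
    raise ValueError(condition)
```

The sketch makes the logical contrast explicit: the dropout and clicker conditions compress training using information that is predictive for the learner (own or group errors), whereas the yoked condition compresses using another individual's errors.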
Performance was better for the clicker, dropout, and full con-
ditions than for the yoked condition at the immediate test. A
similar pattern was evident at the retention test 1 week later, but
forgetting occurred, and the advantage for the clicker condition
was reduced to some extent because of greater forgetting in that
case (see Figure 3.6). Finding test performance in the clicker
and dropout conditions comparable to that in the full condi-
tion indicates that teaching time can be conserved successfully
without sacrificing the amount learned, in accordance with the
principle of training compression.

Figure 3.6 Proportion correct at test in Experiment 1 by Anderson et al. (2010) as a function of condition and test. Error bars show standard errors of the mean.

In addition, this pattern of results (superior performance for the clicker and dropout conditions relative to the yoked condition) demonstrates that the
performance level of a group is just as useful as the performance
of the specific subject, and both are more useful than the perfor-
mance of another individual, for determining which facts should
be given further attention in order to promote learning. Overall,
these results provide support for the principle of training com-
pression and for the clicker technique as an effective method of
conserving instructional time in the classroom.
Experiment 2 sought to explore the principle of testing, which
underlies the clicker’s compression effect and overall effective-
ness. The use of clickers necessitates testing, and previous
research on the testing effect shows that testing individuals
over material to be remembered improves retention compared
to simply studying the material (see Roediger & Karpicke, 2006,
for a review). Experiment 2 used the same facts and materials as Experiment 1 and followed the procedure of the clicker condition of Experiment 1. Three clicker conditions were compared: (a) study-test, (b) study-study, and (c) test-test. In the study-test
condition (as in Experiment 1), subjects were presented with up
to eight facts per block in Rounds 2 and 3 for study, and then
they were immediately tested on those facts. To control for fre-
quency of exposure, in the study-study condition each fact was
presented twice for study during Rounds 2 and 3. Likewise in
the test-test condition each fact was presented twice for test dur-
ing those rounds. All of these conditions had the same facts pre-
sented and tested on Rounds 2 and 3 as in the clicker condition
of Experiment 1. Of primary interest was performance in each
of the three training conditions. The clicker technique involves
both studying and testing facts. The comparison of conditions
allowed a determination of whether both aspects are important
for the effectiveness of the technique.
In Rounds 1 and 4 of training, there was an advantage for the
study-test condition, but only with specific facts (see Figure 3.7).
This finding agrees with the well-established testing effect and
principle of testing. However, this finding indicates that test-
ing is most beneficial when it is preceded by study, at least if
specific, rather than general, facts are being learned—but see
Karpicke and Roediger (2008) for a benefit of retesting and no
benefit of restudying dropped items. Taken together, these two
experiments demonstrate the independent effects of compres-
sion and testing and further elucidate the role that testing plays
in teaching time compression, all within the context of a newly
developed and still evolving classroom teaching technology.
Figure 3.7 Proportion correct during training in Experiment 2 by Anderson et al. (2010) as a function of condition and training fact type. Error bars show standard errors of the mean.

Tests of Principles in Complex, Dynamic Task, or Job Environments
Because most of the empirical support for training principles
comes from cognitive psychological studies using standard labo-
ratory paradigms, there is reason to be concerned about their
validity in situations that are more like the real world. Fortu-
nately, there are some recent tests of certain principles in more
complex dynamic tasks, and the results of this research seem
consistent with laboratory observations, as the following two
sets of experiments illustrate.

Radar Target Detection and Decision Making


Consider the training difficulty principle, by which any condi-
tion that causes difficulty during learning should facilitate later
retention and transfer. This principle was established within the
realm of paired associates learning (Schneider, Healy, & Bourne,
2002). The present experiments aim to define the boundary
conditions under which this principle applies, and to deter-
mine whether there are “desirable,” as opposed to “undesirable,”
difficulties (Bjork, 1994) that should be incorporated into train-
ing regimes. A RADAR target detection and decision making task
was used, in which the potential targets were nine military vehi-
cles (e.g., submarine, helicopter, and jeep; Young, Healy, Gonza-
lez, Dutt, & Bourne, 2011). Each trial started with the display of
four enemy targets arranged horizontally. A given trial comprised seven frames. Each frame consisted of a continually
changing display showing four blips (white squares) originat-
ing at the four principal corners of the display and converging
uniformly toward the center of the display after about 4 s. Each
white square was filled with either a target or a distractor. There
was at most one target on a given trial. Subjects responded to
a target by pressing the space bar. After pressing the space bar
for detection of a target, half of the subjects made an immediate
relevant action-firing response, a secondary requirement meant
to increase the difficulty of the task.
In two experiments, subjects were trained in one session and
were tested 1 week after training. Half of the subjects were in the
same firing response condition each week; half were in different
conditions. There were methodological differences between the
two experiments (e.g., whether or not subjects also had to count tones that occurred during the seven frames of a trial), but those differences are ignored here. A hit was scored
when a target was present and the subjects pressed the space
bar.
In Experiment 1, during training, subjects made more hits
when not required to make the firing response than when required
to do so. But at test when the firing response was required those
subjects trained with the firing response made more hits than
those not trained with the firing response (see Figure 3.8 top
panel). In Experiment 2, during training, subjects made some-
what more hits when not required to make the firing response
than when required to do so. But at test in this case when no
firing response was required, those subjects trained with the fir-
ing response made more hits than those not trained with the
firing response (see Figure 3.8 bottom panel). Although the pat-
tern was somewhat different in the two experiments, difficult
training (with the firing decision) did promote better test per-
formance in both, thus showing that adding difficulties during
training, which would be likely to reduce training performance,
can enhance subsequent performance at test or in the field.
There is at least one caveat that needs to be mentioned: Earlier
work with the RADAR task (Young et al., 2011) has shown that
not all types of difficult training enhance subsequent test per-
formance. Training with a concurrent, irrelevant tone-counting
task (rather than the sequential, relevant action-firing response) had harmful effects on detection performance at both training and test (see chapter 10).

Figure 3.8 Hit rate at training and test in Experiment 1 (top panel) and in Experiment 2 (bottom panel) of study by Young et al. (2011, Experiments 2 & 3) as a function of training condition and test condition. Error bars show standard errors of the mean.

Thus, difficulties that occur concurrently with the main task and that are irrelevant to it may be
undesirable for training (see chapter 4 for a theoretical account
of performance effects attributable to cognitive load).
Information Integration
Anecdotal evidence suggests that, in a dynamic battlefield envi-
ronment, temporally recent information might have a greater
influence on decision making than information more remote
in time. This is an intriguing hypothesis, as many laboratory
researchers have found just the opposite for decisions based on
sequentially presented data—namely that the initial data have a
greater influence on the outcome of decisions than do those that
occur later in the sequence. In decision making, this phenom-
enon is generally referred to as “anchoring” (Tversky & Kahne-
man, 1974). Anchoring is reminiscent of the primacy effect found
in memory research.
A novel information integration task was used to study the
serial position principle in a complex, dynamic, decision making
environment (Ketels, Healy, Wickens, Buck-Gengler, & Bourne,
2011). In this task, subjects saw sequences of seven targets
appearing in different locations in a 20 × 20 grid on a computer
screen, with each target representing an enemy. After the seven
targets were shown, the subjects had to decide where in the tar-
get field to deploy surveillance resources. There was one optimal
deployment location. This situation simulates a battlefield task
where information occurs in sequence over time and must be
held in mind, after which rapid and accurate deployment or fir-
ing decisions are required.
After the deployment decision, subjects were required to recall
the targets to support the real battlefield need to communicate
to others where enemies have been seen. These recall responses
reveal whether subjects remembered best the locations of the
primacy or recency items because memory for the items might
be the cause of any serial position effects on the deployment
decision. Subjects were to recall the targets in any order in free
recall and in order of presentation in serial recall. For half of the
subjects (those in a “no threat” condition), all enemies were the
same color (red) and had the same threat level. For the remain-
ing subjects (those in a “threat” condition), the color of the ene-
mies varied in red saturation (in four levels from gray to red)
reflecting increasing levels of threat. Subjects in the threat con-
dition were told to choose a deployment location that depended
in part on the threat level of each enemy. In earlier experiments
in this series (Ketels, Healy, Wickens, Buck-Gengler, & Bourne,
2010), subjects demonstrated an anchoring effect. The purpose
of the present experiment was to examine the extent to which
this effect depended on task complexity.
The distance of the selected deployment location from the
items occurring at each of the seven serial positions, as shown
in the top panel of Figure 3.9, was the measure used to examine the effect of serial position on the deployment decision at test.

Figure 3.9 Mean distance of selected deployment location from items occurring at each serial position as a function of threat condition (top panel) and proportion of recall errors at each serial position as a function of recall condition (bottom panel) in study by Ketels et al. (2011). Error bars show standard errors of the mean.
Lower values indicate a closer distance, and thus, a larger con-
tribution to the deployment decision. Figure 3.9 does not provide
an index of performance, which would be measured as distance
from the ideal deployment location. Instead it provides an index
of the extent to which the item at each serial position influenced
the deployment decision.
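As a concrete sketch of this measure (hypothetical code, not the authors' analysis), the influence index is simply the mean distance, across trials, between the chosen deployment cell and the target shown at each serial position:

```python
import math

def mean_distance_by_position(trials):
    """trials: list of (deployment, targets) pairs, where deployment is an
    (x, y) grid cell and targets is the ordered list of (x, y) target cells
    (seven per trial in the experiment). Returns the mean deployment-to-
    target distance at each serial position; lower values indicate a larger
    influence on the deployment decision."""
    n_pos = len(trials[0][1])
    totals = [0.0] * n_pos
    for deployment, targets in trials:
        for pos, target in enumerate(targets):
            totals[pos] += math.dist(deployment, target)
    return [t / len(trials) for t in totals]
```

An anchoring effect would show up in this sketch as the smallest mean distance at serial position 1, as in the top panel of Figure 3.9.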
Subjects were closer on average to all serial positions in the no
threat condition, where all enemies should be weighted equally
in the deployment decision, than in the threat condition, where
some enemies should be weighted more heavily than others
because of their threat. The serial position functions for the two
conditions were similar, with subjects choosing a deployment
location that was closest to the initial enemy they saw. Thus, the
subjects exhibited an anchoring effect on their deployment decision at both levels of task complexity.
The bottom panel of Figure 3.9 shows the recall serial posi-
tion function for the two recall conditions. Errors were high
overall in this recall task but higher in the serial than in the
free recall condition. There was a strong primacy effect in both
conditions, with a recency effect evident only in the free recall
condition. The primacy effect observed for recall might be the
underlying cause of the anchoring effect in the deployment deci-
sion. Subjects might choose a deployment location that is close
to a given one of the enemy locations to the extent that they can
remember that location. Thus, the serial position principle plays
an important but different role in complex decision making and
memory tasks.
The most important take-home message is that, in situations
such as those occurring with soldiers on the battlefield when
information occurs in sequence over time and must be held in
memory, individuals making a decision about the sequence put
greatest weight on what they see first, presumably using that
information as an anchor for processing subsequent informa-
tion. Because this strategy might not be optimal for many deci-
sions, trainees should be instructed or otherwise alerted to the
need to attend more to middle and late informational items in a
sequence or to items that have the greatest relevance to accurate
decision making.

Identifying and Testing New Principles


Some experiments conducted as a part of the present research
program revealed unanticipated new training principles, not
seen in earlier studies. In some cases, these experiments were
designed explicitly to establish the validity of a proposed new
principle. In other cases, a new principle emerged serendipi-
tously. The following sections review a couple of examples of this
work.

Continuous Assessment of Memory


The next set of experiments was originally intended as a test
of memory constriction attributable to stress. The hypothesis,
from Staal, Bolton, Yaroush, and Bourne (2008), asserts that,
under time pressure to perform, subjects will focus on the here
and now, demonstrating better memory for events in the imme-
diate past than for the more remote past, relative to subjects not
under pressure. But, in fact, the results showed that, although
time pressure did impede overall performance, there was little
difference between pressure and no pressure conditions in sub-
jects’ memory for immediate and remote associations. Thus, the
time pressure manipulation will not be considered further in
the following discussion. Unexpectedly, however, the obtained
results did reveal evidence of a new principle, the protective func-
tion principle, by which holding in working memory an intention
to respond in a special way to certain predesignated items not
only preserves that intention but also sustains the associated
information.
The task used allowed a continuous assessment of four types
of memory in a single setting—short-term and long-term memory
for default and special responses (Bourne, Healy, Bonk, & Buck-
Gengler, 2011). Specifically, subjects were required to keep track
of the color associated with a particular common name as well
as the response location required to demonstrate recognition of
that color on a later test. On all trials, a name appeared in the
center of a large square in the middle of the screen, surrounded
by eight color patches, two per compass direction. One set of four
color patches was arrayed inside the square, and the second
set was arrayed outside the square. During study, subjects saw
the name in color and were required to respond to its color by
clicking on the appropriate color in the inner ring. At test, they
saw a name in black and were required to respond to the color
in which it had been printed earlier, clicking on the inner ring of
colors in most cases but on the outer ring on special occasions.
For default response trials subjects saw a name that had previ-
ously been in lower case and responded with a color click in the
inner ring; for special response trials subjects saw a name that
had previously been in upper case and responded with a color
click in the outer ring. Thus, in both default and special response
trials, subjects learned to associate colors with names. The only
difference was in where the color response was made (inner or
outer ring). Also, importantly, there were twice as many default as
special trials so clicking in the outer ring was a special response,
especially because all study trials also required an inner ring
click. For the short retention interval, the study trial was two
trials before the test (2-back), whereas for the long retention inter-
val, the study trial was eight trials before the test (8-back).
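The 2-back and 8-back lags can be illustrated with a small helper (hypothetical; not taken from Bourne et al., 2011) that computes, for each test trial, how many trials earlier the matching study trial occurred:

```python
def retention_lags(trial_sequence):
    """trial_sequence: list of (phase, name) tuples in presentation order,
    where phase is "study" or "test". Returns {test_index: lag}, the number
    of trials between each test and the most recent study of the same name
    (2 for the short interval, 8 for the long interval in the experiment)."""
    last_study = {}
    lags = {}
    for i, (phase, name) in enumerate(trial_sequence):
        if phase == "study":
            last_study[name] = i
        elif name in last_study:
            lags[i] = i - last_study[name]
    return lags
```

In the actual task the trial list would be constructed so that every test falls exactly 2 or 8 trials after its study trial; the helper simply makes that lag definition explicit.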
The proportion of correct color responses, regardless of ring location, is an index
of memory for color–name associations. In each of two experi-
ments, subjects were more accurate overall for color responses
when special ring responses were required than when default
ring responses were required (see Figure 3.10). Also, subjects
were more accurate overall for color responses at short (2-back)
than at long (8-back) delays. However, importantly, the effect
of delay was evident only for default location responses, not for
special location responses.
These results imply that the requirement to respond in a spe-
cial way preserves associated information in working memory
(Baddeley, 2007). Thus, holding special response requirements
in working memory might protect association memory against
forgetting due to interference. The implication for job perfor-
mance is that asking workers to make special responses to cer-
tain (possibly especially important) target items that they might
see in the future could serve to protect their memory for infor-
mation about those targets—the protective function principle.

Artificial Grammar Learning


How can individuals best learn regularities that obey nonobvi-
ous, difficult-to-verbalize rules (Young & Healy, 2011)? Experi-
ments using an artificial grammar learning paradigm revealed
an unanticipated training principle called here the principle of
positive focusing—regularities obeying complex rules can some-
times be best identified from only positive exemplars, rather
than a mixture of positive and negative exemplars.
Subjects participated in two sessions separated by 1 week.
In both weeks, they acquired and were tested on a finite state
artificial grammar (e.g., Gomez, 1997) that generated short
strings of letters. The grammars used were either the same or
different in the 2 weeks, but the letter sets used were always
different. During acquisition in each session, subjects were
instructed to observe 60 strings of letters. After a string was
displayed and removed, subjects typed the string from memory.
Figure 3.10 Proportion of correct color responses in Experiment 1
(top panel) and Experiment 2 (bottom panel) of study by
Bourne et al. (2011) as a function of response location and
retention interval. Error bars show standard errors of the
mean.
This process was repeated until the subject correctly typed
the string from memory. Subjects typed both grammatical and
ungrammatical strings during acquisition, with the grammati-
cal strings printed in blue ink and the ungrammatical strings
printed in green ink. There were three acquisition conditions.
For subjects in the all positive condition, the ungrammatical
strings were simply repeating consonants (e.g., CCC). For sub-
jects in the other two conditions, the ungrammatical strings
followed the grammar except for a single letter substitution or
a two-letter transposition. In the blocked condition, all of the
blue grammatical strings preceded the green ungrammatical
strings, whereas in the mixed condition, the blue grammatical
and green ungrammatical strings were interleaved randomly
(as they were in the all positive condition). At test, subjects were
asked to indicate whether new strings, printed in black, were
grammatical or ungrammatical in terms of the acquired gram-
mar. The ungrammatical strings were like those shown during
acquisition in the blocked and mixed conditions. After indicat-
ing whether a string was grammatical or ungrammatical, sub-
jects gave a confidence rating in their response from 1 to 6, with
6 indicating highest confidence.
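A finite-state grammar of this kind can be sketched as a transition table over states; the states, letters, and transitions below are illustrative inventions, not the actual grammar used by Gomez (1997) or Young and Healy (2011):

```python
import random

# Hypothetical finite-state grammar: each state lists (letter, next_state)
# transitions; a string is generated by walking from "S0" to "END".
GRAMMAR = {
    "S0": [("V", "S1"), ("X", "S2")],
    "S1": [("P", "S2"), ("V", "S1")],
    "S2": [("X", "S3"), ("S", "END")],
    "S3": [("S", "END")],
}

def generate(rng=random):
    """Produce one grammatical string by a random walk through the grammar."""
    state, letters = "S0", []
    while state != "END":
        letter, state = rng.choice(GRAMMAR[state])
        letters.append(letter)
    return "".join(letters)

def is_grammatical(string):
    """Decide grammaticality by searching all state paths that spell `string`."""
    frontier = [("S0", 0)]
    while frontier:
        state, i = frontier.pop()
        if state == "END":
            if i == len(string):
                return True
            continue
        for letter, next_state in GRAMMAR[state]:
            if i < len(string) and string[i] == letter:
                frontier.append((next_state, i + 1))
    return False
```

On this toy grammar a string such as "VPS" is grammatical, whereas a repeating-consonant string such as "CCC" (like the all positive condition's ungrammatical fillers) is not.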
The results at test are summarized in the top panel of Figure
3.11 in terms of the total number correct (out of 15 possible) as
a function of acquisition condition. It might be expected that
providing reasonable ungrammatical strings during acquisition
would help subjects figure out the rules. Clearly this was not the
case because subjects performed significantly better in the all
positive condition than in the other two conditions. There was
no difference between the blocked and mixed (interleaved) con-
ditions. The unexpected finding of no advantage for mixing rela-
tive to blocking (cf. chapters 2 and 13) can be attributed to the
fact that subjects did not benefit from the opportunity to study
ungrammatical strings.
The same pattern of results was found for confidence ratings,
as shown in the bottom panel of Figure 3.11. These results sup-
port a principle of positive focusing according to which regu-
larities obeying complex rules can sometimes be best identified
with only positive exemplars, rather than both positive and neg-
ative exemplars. Following this principle, trainers should not
necessarily provide exemplars in which rules are violated along
with exemplars in which rules are followed, possibly because
negative instances are a source of confusion that obscures the
unknown underlying regularities.
Figure 3.11 Mean total number correct (out of 15 possible) (top panel)
and mean confidence rating (1–6) (bottom panel) at test as
a function of acquisition condition in study by Young and
Healy (2011). Error bars show standard errors of the mean.
Conclusions
The purpose of this chapter was to review in some detail a col-
lection of recent basic research experiments designed to extend
in a number of directions the body of research on training prin-
ciples reviewed in chapter 2. The chapter addresses four ques-
tions following up on the implications drawn from that review.
The questions examined in this chapter concern (a) the general-
ity of training principles, (b) the interaction of principles when
combined, (c) the operation of principles in complex, dynamic
environments, and (d) finally the possibility that yet unidentified
principles would emerge.
These results provide an expanded and more general picture
of how training principles affect performance in specific tasks.
Much of what was observed is consistent with expectations
derived from the review of laboratory-based cognitive psycho-
logical studies of these principles. In many cases, however, the
expectations needed to be modified in light of the experimental
results that contradicted or qualified some of the training principles
in these different contexts. Moreover, some of the results led to
unexpected findings that required the formulation of principles that
could not be anticipated on the basis of previous experiments.
Thus, the experiments presented in this chapter provide a more
complete picture about how cognitive psychological effects can
be incorporated into training principles.
The research reviewed in this chapter also suggests possi-
ble ways of making transitions from the laboratory to real life
training needs. Those transitions will require the development
of training guidelines and, eventually, training specifications for
specific tasks and jobs. That work will require applied research
of the type discussed in chapter 7 and elsewhere in this volume.
But the applications must begin with empirically valid training
principles of the type illustrated in the present chapter. There is
every reason to believe that a transition of this sort is the opti-
mal way of enhancing the effectiveness of real-world training.

References
Anderson, L. S., Healy, A. F., Kole, J. A., & Bourne, L. E., Jr. (2010,
November). The clicker technique: An effective way to compress
teaching time. Paper presented at the 51st Annual Meeting of the
Psychonomic Society, St. Louis, MO.
Baddeley, A. (2007). Working memory, thought, and action. New York:
Oxford University Press.
Barshi, I., & Healy, A. F. (2002). The effects of mental representa-
tion on performance in a navigation task. Memory & Cognition, 30,
1189–1203.
Barshi, I., & Healy, A. F. (2011). The effects of spatial representation
on memory for verbal navigation instructions. Memory & Cognition,
39, 47–62.
Bjork, R. A. (1994). Memory and metamemory considerations in the
training of human beings. In J. Metcalfe & A. Shimamura (Eds.),
Metacognition: Knowing about knowing (pp. 185–205). Cambridge,
MA: MIT Press.
Bonk, W. J., & Healy, A. F. (2010). Learning and memory for sequences
of pictures, words, and spatial locations: An exploration of serial
position effects. American Journal of Psychology, 123, 137–168.
Bourne, L. E., Jr., Healy, A. F., Bonk, W. J., & Buck-Gengler, C. J.
(2011). Intention to respond in a special way offers some protection
against forgetting associations. American Journal of Psychology,
124, 23–36.
Brown, G. D. A., Neath, I., & Chater, N. (2007). A temporal ratio model
of memory. Psychological Review, 114, 539–576.
Dale, E. (1969). Audio-visual methods in teaching (3rd ed.). New York:
Holt, Rinehart & Winston.
Gomez, R. L. (1997). Transfer and complexity in artificial grammar
learning. Cognitive Psychology, 33, 154–207.
Healy, A. F., Fendrich, D. W., & Proctor, J. D. (1990). Acquisition and
retention of a letter-detection skill. Journal of Experimental Psychol-
ogy: Learning, Memory, and Cognition, 16, 270–281.
Healy, A. F., Wohldmann, E. L., & Bourne, L. E., Jr. (2005). The pro-
cedural reinstatement principle: Studies on training, retention, and
transfer. In A. F. Healy (Ed.), Experimental cognitive psychology and
its applications (pp. 59–71). Washington, DC: American Psychologi-
cal Association.
Karpicke, J. D., & Roediger, H. L., III. (2008). The critical importance of
retrieval for learning. Science, 319, 966–968.
Ketels, S. L., Healy, A. F., Wickens, C. D., Buck-Gengler, C. J., & Bourne,
L. E., Jr. (2010, April). Spatial list learning and decision making in
the fusion paradigm. Poster presented at the 80th Annual Conven-
tion of the Rocky Mountain Psychological Association, Denver, CO.
Ketels, S. L., Healy, A. F., Wickens, C. D., Buck-Gengler, C. J., & Bourne,
L. E., Jr. (2011, March). A dual-process account of decision making:
Memory and anchoring. Paper presented in the Training Symposium
at the 57th Annual Meeting of the Southeastern Psychological Asso-
ciation, Jacksonville, FL.
Kole, J. A., Healy, A. F., Fierman, D. M., & Bourne, L. E., Jr. (2010).
Contextual memory and skill transfer in category search. Memory &
Cognition, 38, 67–82.
McCormick, B. A., Schneider, V. I., & Healy, A. F. (2010). [Comparing
verbal and symbol modalities in a navigational task]. Unpublished
raw data.
Murdock, B. B., Jr. (1960). The distinctiveness of stimuli. Psychological
Review, 67, 16–31.
Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing mem-
ory: Basic research and implications for educational practice. Per-
spectives on Psychological Science, 1, 181–210.
Schneider, V. I., Healy, A. F., & Bourne, L. E., Jr. (2002). What is learned
under difficult conditions is hard to forget: Contextual interference
effects in foreign vocabulary acquisition, retention, and transfer.
Journal of Memory and Language, 46, 419–440.
Staal, M. A., Bolton, A. E., Yaroush, R. A., & Bourne, L. E., Jr. (2008).
Cognitive performance and resilience to stress. In B. Lukey & V.
Tepe (Eds.), Biobehavioral resilience to stress (pp. 259–300). Boca
Raton, FL: CRC Press.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty:
Heuristics and biases. Science, 185, 1124–1130.
Young, M. D., & Healy, A. F. (2011, March). Artificial grammar learn-
ing: Implicit and explicit components for retention and transfer. Paper
presented in the Training Symposium at the 57th Annual Meeting of
the Southeastern Psychological Association, Jacksonville, FL.
Young, M. D., Healy, A. F., Gonzalez, C., Dutt, V., & Bourne, L. E., Jr.
(2011). Effects of training with added difficulties on RADAR detec-
tion. Applied Cognitive Psychology, 25, 395–407.
4 Attention and Cognitive
Resource Load in Training
Strategies
Chris Wickens
University of Colorado

Shaun Hutchins and Tom Carolan
Alion Science and Technology

John Cumming
Colorado State University

Attention and learning are inextricably linked. Put simply, it is
necessary to attend in order to learn efficiently. Furthermore,
the more attention that is given to learning and training, the
better are the results for retention and transfer. Hence, distrac-
tion, or divided attention away from learning/training material,
is undesirable. This commonsense dictum is manifest in many
forms, but some of these forms are not entirely intuitive, as dis-
cussed in other chapters and below, as they are reflected in a
variety of attention-related training strategies. Some strategies
designed to increase the amount of attention devoted to learn-
ing material can backfire; correspondingly, others designed
to reduce the attention demands of learning can have similar
unanticipated and undesirable consequences.
Cognitive load theory (Paas, Renkl, & Sweller, 2003; Paas
& van Gog, 2009; Sweller, 1988) provides a useful framework
for conceptualizing the relationship between attention and
learning/training. Briefly, cognitive load theory represents the
learner in the learning environment as dividing limited attention
or cognitive resources between three demands or task “loads.”
(a) Intrinsic load is characterized by the inherent complexity of
the task to be learned. Some tasks are relatively simple because
they have few elements, or the elements are not interrelated.
An example might be a set of procedural steps to activate a
piece of equipment. Other than possibly a necessary sequential
order, each step is independent of any other step. Others are
complex, because of heavy working memory load, or because
of the task’s relational complexity (Halford, Wilson, & Phil-
lips, 1998), such that some elements change form depending
on the state of other elements. The relational complexity of the
simple procedural task described above will be increased if the
procedures followed in elements later in the sequence depend
upon the outcome of procedures followed earlier. (b) Extraneous
load is imposed by sources of resource demand unrelated to
the task complexity (intrinsic load), but also not productively
related to learning. For example, suppose in learning the pro-
cedures sequence above, every time the learner is confronted
with a feedback screen, it contains unfamiliar acronyms, and
the learner must look up the meaning (and its implications for
action) in a separate paper manual, or must dive into a three-
level “vocabulary menu.” This procedure diverts attention both
from performing the procedural task and from learning how to
do it. The third source, (c) germane load, is related to this
“learning how to do it”: putting attention into understanding
the relationships in the task, rehearsing the procedures, devel-
oping strategies for coding the material in long term memory,
whether declarative or procedural (see chapter 2 for discussion
of this distinction), and paying attention to whatever consis-
tencies may exist across task material (“If I always do X under
condition Y, then Z will always happen”). Cognitive load theory,
and the many training experiments based upon its tenets, pos-
its that the ideal training strategy will take whatever resources
are available after those related to intrinsic load are expended,
and deploy those in a way to maximize the ratio of germane to
extraneous load (Mayer, 2007; Mayer & Moreno, 2003; Paas & van
Gog, 2009). But how to achieve this state of affairs is nontrivial
(see chapter 5 for further discussion of performance of concur-
rent tasks).
A second theoretical perspective, useful in the analysis
of attention-based training strategies, directly describes the
resource metaphor in task performance (Kahneman, 1973; Nor-
man & Bobrow, 1975; Wickens, 2008a; Wickens & McCarley,
2008). Here the focus is on how the level of task performance
is inversely related to the difference between the resources
demanded by a task and the resources supplied to it. Within
the framework of cognitive load theory, the task of interest is
learning, and hence the relevant resources are those devoted
to germane load (van Merriënboer, Kester, & Paas, 2006). This
equivalence between supply and demand (necessary for effective
learning) can be destroyed in three ways: (a) resource competi-
tion from the task’s intrinsic load (the task is too complex to
allow learning); (b) resource competition from extraneous load;
Figure 4.1 Representation of three sources of load in cognitive load
theory, and the implications of three instructional inter-
ventions (B, C, D) on load.

(c) loss of motivation (fewer resources supplied, or less effort
invested; Colquitt, LePine, & Noe, 2000; Paas, Tuovinen, van
Merriënboer, & Darabi, 2005). Correspondingly learning perfor-
mance should be improved by simplifying the task, removing
extraneous load, and increasing motivation to invest more effort.
The training strategies discussed below are all based upon some
combination of efforts to achieve these three goals.
These effects can be represented schematically in Figure 4.1.
Each pair of bar graphs represents the resources demanded by
a task (D on the left) and those supplied by the learner (S on the
right). The relatively constant level of resource supply (the hori-
zontal line) represents a relatively “fi xed” capacity. The bottom
(white) part of each bar represents the resources demanded by
and supplied to the task’s intrinsic load; the upper (gray) seg-
ments are those available for germane load.
The left pair of bar graphs (A) represents an optimal situation
in which the resources needed for germane load are available
because the task is only of modest difficulty (medium intrin-
sic load). In the second pair (B), an increase in motivation has
increased the supply of available resources. If the task remains
constant in its difficulty, more resources are then available for
germane load. The third pair (C) represents an increase in task
difficulty. The intrinsic load is greater, and hence fewer resources
are available for learning (germane load), if performance is to be
maintained and motivation is not increased. Finally, the fourth
case (D) is one in which extraneous load is imposed, either
intentionally or inadvertently (the hashed region at the bottom).
Here again resources available for germane load are diminished
relative to (A).
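The resource accounting behind Figure 4.1 can be sketched numerically; the unit values below are arbitrary illustrations of panels A–D, not measured quantities:

```python
def germane_resources(supply, intrinsic, extraneous=0):
    """Resources left over for germane load (the gray segment in Figure 4.1)
    after intrinsic and extraneous demands are met, floored at zero."""
    return max(0, supply - intrinsic - extraneous)

# Panels A-D of Figure 4.1 in arbitrary resource units:
optimal    = germane_resources(supply=10, intrinsic=6)                # A
motivated  = germane_resources(supply=12, intrinsic=6)                # B: more supply
harder     = germane_resources(supply=10, intrinsic=9)                # C: higher intrinsic load
distracted = germane_resources(supply=10, intrinsic=6, extraneous=3)  # D: extraneous load

print(optimal, motivated, harder, distracted)  # 4 6 1 1
```

Motivation (B) enlarges the germane segment, whereas added intrinsic (C) or extraneous (D) load shrinks it, exactly the pattern the panels depict.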

Regarding Training and Transfer


Before discussing attention-related training strategies, it is
important to highlight the distinctions between learning (train-
ing) and transfer. All of the present treatment of the effectiveness
of such strategies will be based on transfer measures, rather
than the level of performance obtained or effort experienced
during training. This is an important distinction, highlighted
by Schmidt and Bjork (1992), and relevant to the discussion of
some of the strategies. Those strategies that may produce bet-
ter performance during learning and impose less effort may not
produce more effective transfer. And the applied community
must be more concerned about the latter effect than the former.
A particularly poignant example of these effects is discussed in
the chapter on cognitive retraining (chapter 15). It is important
to realize, however, that the time spent in training (as opposed
to the level of performance during training) does have impor-
tant implications for transfer, in that a strategy that requires
longer time to implement will reduce the transfer effectiveness
ratio (Povenmire & Roscoe, 1973), even if it may produce positive
transfer relative to a control group.
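Povenmire and Roscoe's (1973) transfer effectiveness ratio divides the savings on the transfer task by the time spent in the training device; the hour values in this sketch are hypothetical:

```python
def transfer_effectiveness_ratio(control_time, transfer_time, trainer_time):
    """TER = (control group's time to criterion on the transfer task
              - trained group's time to criterion) / time in the trainer."""
    return (control_time - transfer_time) / trainer_time

# Hypothetical flight-training numbers: a control group reaches criterion in
# 10 hours of aircraft time; 5 hours in a simulator cuts that to 8 hours.
print(transfer_effectiveness_ratio(10, 8, 5))  # 0.4
```

Because trainer time sits in the denominator, a strategy that takes longer to implement lowers the ratio even when the numerator (transfer savings) stays positive, which is the point made here.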
Further implications of this distinction between training and
transfer performance for the desirability of different training
devices are discussed at the end of this chapter (Bjork, 1999).
Several training strategies are now presented that can be roughly
assigned into two major categories: those that induce greater
resource investment, and those that reduce resource demand.
In the context of Figure 4.1, their degree of success is described,
along with the manner in which efforts to induce resource alloca-
tion to germane load must be carefully analyzed and applied in
order to achieve this goal.

Inducing Resource Investment


In a classic paper, Craik and Lockhart (1972) set the stage for a
resource investment training strategy by distinguishing between
deep processing (focusing on semantic relationships) and shal-
low processing (focusing on acoustic or sensory relationships) in
their effect on memory and retention (see the discussion of depth
of processing in chapter 2). Deep processing better serves reten-
tion. At the same time, deep processing requires more cognitive
effort—resource investment—than simple rote rehearsal (recy-
cling sounds in a phonetic code). Thus a challenge to train-
ing researchers is to uncover other mechanisms for inducing
increased resource investment. Three categories of techniques
that accomplish this purpose are discussed below.

The Generation Effect: Active Choice


Slamecka and Graf (1978) proposed the generation effect to
describe how information is better retained when it is generated
by the learner whose own retention of that information is later
assessed than when it is provided by another agent (see the discussion of the generation
effect in chapter 2). Such an effect is witnessed in navigation
experiments, in which making active choices of navigational
directions (in particular, being in the driver’s seat when driving
or flying) induces better route knowledge of the area traversed,
than being a passenger passively witnessing those choice points
(Pearce, 1981), or being a pilot whose plane is flown by autopilot
(Williams et al., 1996).
In the study of human–automation interaction (Endsley &
Kiris, 1995; Parasuraman & Wickens 2008; Sheridan & Para-
suraman, 2006; see also chapter 6), a well-replicated phenom-
enon is that, to the extent that changes in system state are
implemented by an automated agent, human memory for those
state changes will be degraded, describing what Endsley (1995)
refers to as a loss of Level 2 situation awareness. In flight train-
ing programs, a general conclusion is that students are less
proficient at learning flight dynamics when they train on "dual
controls," keeping their hands on the flight yoke, and passively
moving it, as its movement is actually governed by a flight
instructor, than when they learn by actively generating their
own movements. Roediger and Karpicke (2006) have presented
compelling evidence that requiring students to take practice
tests will improve performance on subsequent high stake tests.
Karpicke and Roediger (2008) found that answer retrieval prac-
tice provided substantial benefits over simple study and review
of material (see the discussion of the testing effect and the effect
of retrieval practice in chapters 2 and 13).
In all of these cases, active choice requires greater investment of
resources into the material or task than does passive viewing of the
results of another agent making those choices. But that invest-
ment is germane to learning.
Although the advantage for retention of active choice in
training is well-established in many instances, there are still
circumstances in which too much learner-guided choice can
lead to serious errors, and a confused state labeled “thrash-
ing,” in which learner resources are redirected from germane
load to simply recovering from a performance catastrophe that
results from the high intrinsic load imposed by a now out-of-
control system. Thus consider the student pilot who, in taking
active control, has now stalled the aircraft, or produced wild
patterns of instability. Although this situation may provide a
lesson of what not to do (if the student is calm enough in these
circumstances to pay attention to what was done), it does not
instruct the student well in how effective aircraft control should
be maintained.

Active Learning and Learner Control


Two closely related concepts in the pedagogy of instruction are
active or exploratory learning (also discussed in chapter 2), in
which the learner participates actively in the process (rather
than passively listening to a lecture or reading material), and
learner control, in which the learner makes choices about what
material to study, or when to move on versus repeat study of
a particular concept. Both of these are concepts designed to
induce effort investment via active choices (see above) as well as
to increase interest, enjoyment, and engagement in the learning
process. Both may also increase the workload experienced in
learning.
But do they work? Here meta-analyses can provide information
as to the overall effectiveness of the two techniques, showing that
neither produces general benefits relative to more controlled and
instructor-driven strategies (Niemiec, Sikorski, & Walberg, 1996;
Wickens, Hutchins, Carolan, & Cumming, 2011). Similar to the
situation with active choice, the limits of these techniques may
be reached when thrashing (or some variant thereof) is induced
by too much learner control, or unguided exploration. Illuminat-
ing in this regard is the meta-analysis carried out by Wickens et
al. (2011) on learner control, which revealed little difference in
transfer between learner control and programmed control, but
found that the best transfer was achieved by a level in between
these two that the authors labeled “advisement.” Here the agent
(computer or human being) does not mandate which material is
to be encountered (or which computer screens are to be studied),
but advises visiting material that would otherwise be mandated
in programmed control: a little guidance can go a long way in
both preventing thrashing and enabling active choices (a similar
conclusion is discussed in chapter 15; see also chapter 6).
Interest, Entertainment, and “Engagement”
Everyone is familiar with the lecturer or instructor who tells
jokes, and generally keeps students awake with humor or com-
pelling “war stories.” In computer based learning, video-game-
like features can be fun and certainly engaging. Training
through virtual reality has the same engaging properties (Wick-
ens, 1992). And recent research has begun to focus on defin-
ing the concept of "engagement" (Horrey, Lesch, & Garabet, 2009;
Montgomery, Sharafi, & Hedman, 2004) and its clearly attractive
features, which can induce resource investment into the learning
or training environment. Yet the effects of engagement compelled
through interest in a learning or training environment are not
clear-cut. Humor or war stories in a lecture may divert attention
to that entertaining material (extraneous load) and away from
a deeper consideration of the concepts to be mastered (germane
load). As one specific example, Mayer, Griffith, Jurkowitz, and
Rothman (2008) examined the effects of interesting, but extrane-
ous detail on computer-based instruction and observed it to be
counterproductive to learning and transfer.

Summary
Collectively then, all of these substrategies make the same point:
Any vehicle that can induce the allocation of more resources into
the task to be mastered can improve learning if those resources
go into germane load (Example B in Figure 4.1). But “the task”
must be carefully trisected into its three load components. A
vehicle that makes the task too difficult, temporarily increasing
the resources demanded by intrinsic load (necessary to sustain
performance), can induce "thrashing" (Example C)—this idea is
sometimes referred to as staying within the “zone of learnabil-
ity” during training (Wolfe et al., 1998). A vehicle that misdirects
attention to irrelevant (albeit sometimes engaging) aspects of the
pedagogy will simply impose added extraneous load (Example D).

Reducing Resource Demand


Although the previous section focused on means to increase
the resources invested in learning, the present section consid-
ers a set of three strategies designed to reduce the resources
demanded by learning, hence taking a different tack toward
reducing the gap between resources demanded and supplied.
These three strategies are increasing difficulty, error prevention,
and part task training.
Increasing Difficulty
It is apparent that for many complex skills it is not appropriate
to start training at the full complexity level of the transfer task,
making the learner “sink or swim.” There is little benefit in plac-
ing the beginning student in an aircraft simulator cockpit with
the simple instruction: “fly the plane.” Even given a little knowl-
edge, the task is so complex and its intrinsic load so high that
full resources will be required to recover from errors, or thrash-
ing will take place, so that nothing will be available to invest in
learning (germane load). Hence an important strategy is to start
with a simple version of the task and gradually increase its diffi-
culty as learning progresses. This is a technique that Wightman
and Lintern (1985) call simplification. As difficulty increases
match skill development over time, the resource demands of
performance (intrinsic load) remain relatively constant, availing
ample resources for germane load.
Many variants of this procedure have been examined (see
Wightman & Lintern, 1985, for a review). For example, flight
dynamics in aircraft control are quite complex, requiring pre-
diction and perception of higher derivatives of change to “stay
ahead” of the aircraft and avoid pilot-induced oscillations. Briggs
and Naylor (1962) examined a technique whereby learners began
with simplified dynamics, and were gradually exposed to the
added sluggish lag in the control task as learning progressed.
As another example, Pollock, Chandler, and Sweller (2002) pro-
vided training to operate a complex piece of equipment either
at its full difficulty (control group), or in a sequence in which
first the noninteracting components were trained (low relational
complexity and intrinsic load) and then the interacting compo-
nents were added and trained (high relational complexity).
Importantly, there are two different means of increasing dif-
ficulty. In a fixed schedule technique, difficulty is increased for
all learners at the same predefined schedule. In an adaptive
technique, the increases are based on individual learner per-
formance. If the learner hits a plateau in the learning curve,
difficulty remains constant until performance begins to improve
again. The latter strategy is called adaptive training (see chapter
7 for the results of a study comparing adaptive and nonadaptive
training).
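A minimal sketch of the adaptive rule, hold difficulty constant at a plateau and raise it only when performance improves; the threshold and step size here are illustrative assumptions:

```python
def adapt_difficulty(level, recent_scores, threshold=0.8, step=1):
    """Return the next difficulty level under an adaptive schedule.

    `recent_scores` holds recent proportion-correct values; difficulty
    rises one step only when their mean clears `threshold`, and is held
    constant otherwise (the learning-plateau case).
    """
    if recent_scores and sum(recent_scores) / len(recent_scores) >= threshold:
        return level + step
    return level

print(adapt_difficulty(3, [0.90, 0.85]))  # 4: performance improving, raise difficulty
print(adapt_difficulty(3, [0.60, 0.70]))  # 3: plateau, hold difficulty constant
```

A fixed schedule, by contrast, would raise the level on every call regardless of the scores.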
There is no doubt that increasing difficulty can assist learn-
ing and hence transfer in some circumstances (see Wightman &
Lintern, 1985), particularly when compared against the extreme
case like that given above (“sink or swim” in the aircraft). It
keeps intrinsic load at a manageable level. The overall success
of increasing difficulty training, compared to training that ini-
tiates the task at its full difficulty level, is conclusively
established by a meta-analysis (Wickens et al., 2011; see also
chapters 2 and 15); yet careful consideration of specific studies
suggest two cautions.
First, the nature of what aspects of the task are changed
to increase the difficulty is of great importance. In the case of
the Briggs and Naylor (1962) tracking study, when the actual
display-control mappings were changed, the advantage
of such a technique was eliminated. This is because changing
stimulus-response mappings, whether suddenly or incremen-
tally over time, can lead to negative transfer (Holding, 1976),
which in that context can offset any benefits of reduced resource
demand. In this regard, Mane, Adams, and Donchin (1989) pro-
posed that an effective means of imposing increased difficulty is
to increase time pressure gradually. Greater time pressure does
not alter the event-action mappings of the task, but only the
requirement to invest more resources, or deploy those resources
more efficiently.
Second, whether training is adaptive or on a fixed difficulty
schedule (see again chapter 7) has a substantial impact (Wick-
ens et al., 2012). Adaptive training provides a clear transfer ben-
efit relative to constant difficulty control, whereas training with
increasing, but nonadaptive difficulty has a clear cost.

Error Prevention: Training Wheels


The strategies discussed above raise the intrinsic difficulty
of the task itself as learning progresses, so that intrinsic load
remains roughly constant as skill develops. In error prevention
strategies, however, the task characteristics remain constant
during the training period, but other techniques are applied early
on, in order to lower the resource demands of performance,
prevent thrashing, and guide more resources toward mastery
of the concepts or skills to be learned. This technique of error
prevention or “training wheels” (Carroll, 1990; Carroll & Car-
rithers, 1984) is closely related to a technique of “scaffolding”
that often appears in the education literature (Pea, 2004).
Although the two techniques are quite similar, and both demonstrate
equivalent levels of effectiveness relative to control
groups (Wickens et al., 2011), the focus here will be on error
prevention or training wheels.
In a typical training wheels approach, the skill is complex
and many errors are possible (as, for example, when
operating a complex piece of equipment or a software package);
76 Christopher D. Wickens et al.
unintended errors can have two consequences. Sometimes they
may simply get the task off track, as when a screen suddenly
goes blank after the wrong key is hit. The learner is confused,
wasting valuable time getting back on track, perhaps thrashing
about to do so. Sometimes errors can be catastrophic, requiring
the learning session to restart from scratch (deleting an entire
file, or shutting down the equipment). In order to prevent such
mild or catastrophic events, a training wheels approach sim-
ply locks out certain actions that can have serious unintended
consequences, in much the same way that providing training
wheels on a bicycle will lock out the unfortunate tendency of the
child to fall over.
An example of error prevention in mathematical learning is
the imposition of worked examples (Renkl, Stark, Gruber, &
Mandl, 1998). Here, when problems are quite difficult and the
learner may implement a totally incorrect strategy or not know
how to “set up” the problem, it is beneficial early in the process
to present problems that show the appropriate procedures step
by step.
The key element of effective error prevention is the
schedule for release: removing the error-prevention lockouts
or dismantling the scaffold. For example, initially all aspects
of the mathematical problems may be worked, but as learning
progresses more and more of these steps are turned over to the
learner, often starting with the final steps of the problem, and
working backward toward the initial steps.
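This backward-fading release schedule can be illustrated with a short sketch; the function name and placeholder text are assumptions introduced for illustration:

```python
def fade_worked_example(steps, n_faded):
    """Backward fading of a worked example: show the first steps fully
    worked and leave the last n_faded steps for the learner; as learning
    progresses, n_faded grows toward the full length of the solution."""
    n_faded = min(n_faded, len(steps))
    worked = steps[:len(steps) - n_faded]
    return worked + ["<learner completes this step>"] * n_faded
```

Early in training `n_faded` is zero (a fully worked example); late in training it equals the number of steps, so the learner solves the whole problem.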
Like increasing difficulty, error prevention and its close
cousin scaffolding have revealed substantial success in transfer
of training experiments, when compared with control counter-
parts (errors allowed, no scaffolding provided). A meta-analysis
revealed an overall gain in transfer (30% for training wheels and
60% for scaffolding) for the two techniques over control condi-
tions (Wickens et al., 2011).
But, as with difficulty-increasing strategies, the training
designer must pay careful attention to the strategies that
learners develop during training. In particular, care must be
taken to ensure that, while constraints (lockouts) are in place,
learners do not come to depend on them; otherwise, when the
constraints are removed, not only may the learner thrash (and
crash), but a dependence-free skill
will not have been acquired. The example of training wheels is
direct: The key skill in learning stability control while bike rid-
ing is to turn toward the direction of falling. But while training
wheels are on a bike, the learning child may never experience
the sense of falling, or may never realize the importance of this
“turn toward” strategy, because the wheels prevent any sense of
falling.
An example of error prevention in one of the most challenging
aviation skills, lining up the plane for landing, is also instruc-
tive. Perceiving the appropriate visual cues in the environment
is very challenging in these circumstances. Lintern and Roscoe
(1980) examined a technique of providing an artificial display
guidance that gave the student pilots a precise dynamic graphic
depicting when they were on the appropriate approach path
angle, and if they were off, the size of the deviation. Although
this technique generally showed positive transfer, a challenge
was how to prevent the students from concentrating solely on
the artificial guidance and ignoring the texture gradient and
flow field in the world beyond, the necessary source of informa-
tion when the scaffold was removed. In a clever manipulation,
Lintern and Roscoe examined a training condition in which the
artificial guidance was only displayed, adaptively, when the pilot
veered beyond a certain criterion off the approach path. Thus,
when the learner was doing well, he or she was forced to rely
upon the world cues to maintain such performance, and the
crutch of the training wheel was provided only in the serious
cases when a large error began to emerge (but before the error
grew so large that thrashing began).
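The adaptive display logic of this manipulation can be summarized in a brief sketch; the function and its criterion value are illustrative assumptions, not details from Lintern and Roscoe (1980):

```python
def show_guidance(deviation, criterion=2.0):
    """Adaptive guidance: display the artificial guidance only when the
    deviation from the approach path exceeds a criterion, so that a
    learner who is performing well must rely on world cues instead
    (the criterion value here is illustrative)."""
    return abs(deviation) > criterion
```

On each update of the display, a small deviation suppresses the guidance, and a large one restores it.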
The discussion of the scaffolding and training wheels strate-
gies above reintroduces a distinction made in learner control
between mandating and advising. That is, during the earlier
constrained learning phases, the undesirable behavior can
be either locked out (mandating error free behavior) or simply
advised against. In the case of Lintern and Roscoe’s (1980) land-
ing experiment, pilots were not prevented from going off course,
but were simply advised that they were. As with learner con-
trol, there is evidence that advising is superior to mandating. In
the meta-analysis, those studies in which advising rather than
mandating was applied produced significantly and substantially
more positive transfer (78% vs. 37%). Advising still forces
the learner to make active choices, which should improve learn-
ing; mandating does not. This distinction is revisited at the end
of this chapter.

Part Task Training


Wightman and Lintern (1985) have outlined and contrasted two
qualitatively different ways in which a complex whole task can be
broken into its parts. In segmentation, the parts are sequential.
For example a complex piece of music might be broken into dif-
ferent phrases, and intensive practice given on each in turn,
before they are joined in a longer sequence. In fractionation, the
two (or more) parts are performed concurrently, as time-shared
tasks, between which attention must be divided. Examples are
the left and right hands in a piano piece, strumming and chord-
ing on a guitar, talking on the radio while controlling the air-
craft, and, again in the aircraft, controlling a turn while climbing
or descending. In this chapter only fractionation is discussed,
as segmentation effects on transfer have less relation to attention;
such effects are positive for manual control tasks
(Wightman & Lintern, 1985), but not for the broader category of
all tasks (Wickens et al., 2012). Further discussion of part task
training is found in chapter 2.
Part task fractionation can be seen to have two related ben-
efits. First, the whole task may be extremely difficult, impos-
ing excessive intrinsic load, and so initial whole task training
performance may leave few resources at all to devote to ger-
mane load; in a worst-case scenario this procedure may lead
to thrashing and crashing. Second, from the opposite perspec-
tive, part task training can allow full resources to be deployed
to only half of a complex dual task configuration. Not only are
ample resources then available for its germane load, but cer-
tain types of tasks with consistent mappings can be practiced
to near full automaticity (theoretically, performance demanding
no resources; Schneider, 1985). Thus, when they are recombined
with the other half of the complex whole, performance of that
whole now demands few resources beyond those required by the
new added part.
In general, fractionation has not proven to be a terribly effec-
tive training strategy. Wightman and Lintern (1985) found that
only one of the nine tracking experiments that they reviewed
showed positive transfer over a whole task control group. The
more recent meta-analysis carried out by Wickens et al. (2012),
including 65 different comparisons, revealed a significant
(p < .05) 30% cost for fractionation part task training. There was
neither cost nor benefit for segmentation.
At least three possible reasons can be cited for this ambiva-
lent effect of reducing demand during training by fractionation.
First, the sequence of practicing and joining the parts matters
considerably. In the extreme, it would make little sense to divide
practice time 50/50 between an extremely easy subtask and a
very difficult one, rather than giving more time to the more dif-
ficult task. More subtly, it would make sense to provide more
part task practice time to a component for which automaticity
could be developed because of its consistent mappings (Schnei-
der, 1985), and less to that which has little consistency. There
is also subtlety and complexity in the time sequence with which
parts are joined together (Schneider & Detweiler, 1988). Indeed,
in the meta-analysis, the only study to show a clear benefit of
fractionation (Mane et al., 1989) was one preceded by a careful
task analysis.
Second, performance of many dual task combinations
depends not only on skills of the component task but also on
development of a time-sharing skill, an emergent property of the
task combination that resides in neither task by itself (Damos
& Wickens, 1980). It is not always clear precisely what this skill
might be but it would seem to be related to the ability to fluently
and optimally allocate and reallocate resources between tasks.
Such a conclusion is consistent with the findings of Gopher
and his colleagues (e.g., Gopher, 2007; Gopher, Weil, & Bareket,
1994; Gopher, Weil, & Siegel, 1989) that part task training
of dynamic resource allocation skills transfers very effectively
to whole task dual task performance, and this holds true not
only for the two task parts that were trained, but also for two
very different tasks in a new dual task combination (Kramer,
Larish, & Strayer, 1995). An important element of this atten-
tion allocation skill may lie in visual scanning which is often
required for dual task performance. Scanning as an element of
part-task training is discussed below. The importance of this
time-sharing skill in part task training is supported not only
by the overall cost, but also by the fact that this cost is greater
when the task ensemble to be trained is more complex (Wickens
et al., 2011), a circumstance that would be expected to increase
the demand for a time-sharing skill.
Third, the components of some dual task combinations are
not really independent at all, but are integrated in their infor-
mation processing requirements and their effects on system
behavior (Wickens & Hollands, 2000) (see the related discus-
sion of functional task development in chapter 2). Consider for
example two parts to the complex whole task of shifting gears
in a stick shift car—the clutch coordinated with the gas pedal
and the gear shift. Although two different manual actions are
required, careful coordination of timing between them is also
essential in order to prevent the gnashing of gears. Another
example introduced above is the task of lateral and vertical
control of the aircraft when carrying out an ascending turn.
Aircraft dynamics are such that turning the aircraft will cause
it to lose lift, resulting in a descent that must be compensated
by more aggressive vertical control than in a straight climb.
Furthermore, the loss of airspeed resulting from an ascent will
itself affect the dynamics of the turn. Therefore, vertical and
lateral control cannot be considered two independent tasks.
These interrelations simply cannot be learned or practiced in
part task training. Thus both time-sharing skills and, in particular,
integrated task components militate against part task
training (Lintern & Wickens, 1991). In the case of task integration,
this force appears great enough to impose a serious cost
on part task training (Naylor & Briggs, 1963).

Variable Priority Training


Given the challenge that part task training poses in eliminating
the development of time-sharing skills, Gopher and his colleagues
developed the technique of variable priority training, described
above in the context of resource allocation time-sharing skills.
Here, on the one hand, the task is practiced as a whole; but on
the other hand its parts are treated somewhat separately as the
learner is asked, in turn, to emphasize one or the other dur-
ing training. A time-sharing skill (or task integration) is still
required. But resources are focused primarily on each compo-
nent in turn, to direct those resources to component-specific ele-
ments of germane load. Such a technique provides clear positive
transfer relative to whole task training (Fabiani et al., 1989).
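A variable priority protocol of this kind might be expressed as an emphasis schedule; the component names and the 0.7/0.3 weighting below are illustrative assumptions, not the values used by Gopher and colleagues:

```python
def priority_schedule(n_blocks, components=("tracking", "monitoring"),
                      emphasis=0.7):
    """Variable priority training sketch: the whole task is practiced on
    every block, but scoring emphasis rotates among components so that
    full resources are directed to each part in turn."""
    schedule = []
    for block in range(n_blocks):
        focus = components[block % len(components)]
        others = len(components) - 1
        schedule.append({c: emphasis if c == focus else (1 - emphasis) / others
                         for c in components})
    return schedule
```

Each entry in the returned schedule gives the scoring weights for one block, with the emphasized component receiving the larger weight.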

Scanning Training
Another way of dividing up the parts of a multiple task scenario
when the tasks are visual is to simply train the attention allo-
cation component. The logic of such an approach is inherent in
the findings that experts show qualitatively different scan pat-
terns in complex multitask skills, like flying a plane (Bellenkes,
Wickens, & Kramer, 1997; Schriver, Morrow, Wickens, & Talleur,
2008; Wickens et al., 2008), or assisting in the hospital operat-
ing room (Koh, Park, Wickens, Ong, & Chia, 2011). Hence one
should be able to capture experts’ scan patterns, extract their
general features, and train the appropriate allocation behav-
ior separately from the rest of the task. Shapiro and Raymond
(1989) have done so for a complex Space-Fortress video game in
a way that was more productive than whole task training (but
less so than other forms of part task training). In contrast,
Bellenkes (1999) failed to find any effect of scan training on the
fundamental aspects of multi-axis control in aircraft flight:
training the scan pattern directly failed to improve the mental
model of multi-axis flight dynamics. Greater success was
obtained with a second group that first developed that mental
model of flight dynamics via a paper-based instructional sheet
showing how changes on one axis affected changes on others.

Additional Considerations
A set of training principles and associated strategies has been
described above within the context of the resource allocation
metaphor. In considering their collective effects, three general
findings emerge that transcend individual strategies.

The Expertise Effect


One of the strongest tests of cognitive load theory comes from
what is called the expertise effect in training strategies (Kalyuga,
Chandler, & Sweller, 1998; Paas & van Gog, 2009; Pollock
et al., 2002; van Merriënboer et al., 2006). Put simply, effective
effort-reducing training strategies for novices may be less effec-
tive for experts, whose performance of the task is less resource
limited. The expertise effect was confirmed in the meta-analysis
(Wickens et al., 2012), revealing a significantly greater cost for
fractionation for high vs. low experienced learners, along with
an elimination of the benefit for training wheels for high experi-
enced learners.
The basis of this effect in cognitive load theory is that the task
to be mastered is less complex for one with more experience; it
imposes less intrinsic load. Hence, with more resources already
available for germane load than for the beginner, additional
simplifying techniques that are designed to increase resources
available for germane load (lower difficulty, error prevention,
training in parts) are simply unnecessary.
experts in the absence of need, such techniques may simply
amplify their costs that were described earlier in the chapter
(e.g., developing inappropriate strategies from simplification,
failure to learn time-sharing skills in part task training).

Guidance versus Mandating


An emerging theme from the meta-analysis (Wickens et al., 2011)
of two of the strategies (training wheels and learner control) is
the superiority of strategies that do not mandate what a learner
should do (or not do, in the case of “lockouts”), nor encourage
him or her to take total freedom to do anything and make any
kind of mistake, but rather advise or guide the learner toward
certain kinds of choices, actions, and behavior, and against
others. Such
a finding has parallels in the extensive literature and emerg-
ing theory of human–automation interaction (HAI), regarding
the appropriate level of automation (Parasuraman, Sheridan,
& Wickens, 2000; Parasuraman & Wickens, 2008; Sheridan &
Parasuraman, 2006). Such a parallel is not surprising given the
extensive use of automated agents in many advanced computer
or Web-based simulators and training devices.
More specifically, the HAI literature has carefully distin-
guished between multiple levels of automation action advice
along scales typical of the following (see also chapter 6 for fur-
ther elucidation of the effects that different levels of automation
have on training efficiency):

High. Automation chooses an option with no veto possible
      Automation chooses an option but allows the human to
      veto it
      Automation suggests the best choice
      Automation suggests a selection of good options
Low.  Automation offers no advice (no automation)

Moving up the continuum (up the list) of levels of automation
authority has four effects (Wickens, Li, Santamaria, Sebok, &
Sarter, 2010): (a) workload is reduced as more cognitive work is
carried out by the automated agent; (b) performance is generally
better (given well validated and reliable automation routines); (c)
situation awareness is degraded, and, if automation does make
a mistake at the highest two levels, such errors usually have
more serious consequences; and (d) error recovery is often longer
and less fluent (Endsley & Kiris, 1995; Sarter & Schroeder,
2001). The latter two effects are generally linked, because the loss of
situation awareness at higher levels (resulting from fewer active
deliberative choices—see the generation effect discussed ear-
lier in the chapter), will make error recovery more challenging.
Importantly, because of the trade-offs involved between work-
load and situation awareness, and between routine and failure
performance, the optimum level of automation often appears to
be somewhere in the midrange (Endsley & Kiris, 1995; Kaber &
Endsley, 2003; Wickens, 2008b).
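The ordered scale just described can be represented as an ordered type; the sketch below is illustrative, with member names that are assumptions rather than terms from the HAI literature:

```python
from enum import IntEnum

class AutomationLevel(IntEnum):
    """Ordered levels of automation advice, from low to high authority
    (a sketch of the scale discussed in the text)."""
    NO_ADVICE = 0          # no automation
    SUGGESTS_OPTIONS = 1   # suggests a selection of good options
    SUGGESTS_BEST = 2      # suggests the best choice
    CHOOSES_WITH_VETO = 3  # chooses an option, but human may veto
    CHOOSES_NO_VETO = 4    # chooses an option with no veto possible

# Trade-offs between workload and situation awareness often favor a
# midrange level rather than either extreme.
midrange = AutomationLevel(len(AutomationLevel) // 2)
```

Encoding the scale as an ordered type makes the continuum explicit: higher members carry more authority, and the midrange member corresponds to the often-optimal middle of the scale.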
Similar findings, and to some extent similar mechanisms,
appear to apply to the continuum in training strategies between
total freedom and total control, with guidance/advisement in the
middle. Total freedom can create excessive workload leaving few
resources available for germane load. Total control, by removing
active choice, can degrade skill acquisition because the generation
effect is removed, in the same way that removal of action
generation at high levels in HAI degrades situation awareness. A final
feature of this parallel is that, just as attention and attention
theory are a salient presence in learning and training (the point
highlighted in this chapter; Lintern & Wickens, 1991), so they are also
salient in the theory of HAI (Manzey & Parasuraman, 2010).

Affect and Training: Illusory Competence and the Choice of
Training Devices
The concept of resource allocation and mental effort is integral
to an intriguing training-related phenomenon that Bjork (1999)
has described as the illusion of competence in one’s own memory
capabilities. In short, people often think that they have learned
things better than they actually have. As discussed above, cer-
tain training strategies such as training wheels or simplification
(reduced difficulty early in training) will improve performance
during learning, but may not lead to positive transfer. Other
training strategies—encouraging active choice—may show
poorer performance in training, even as they can produce supe-
rior transfer. This dissociation between the level of performance
in training, and that in transfer is well documented (Schmidt
& Bjork, 1992; see also chapters 1 and 2). But Bjork (1999)
has noted further that people intuitively evaluate the ease of
learning, training, and practice as a proxy for the quality and
effectiveness of that learning: People erroneously think that if
learning is easy, it is effective, and memory for what is learned
will be strong. This is an illusion. People using this heuristic
(ease of learning = quality of learning) will often study material
less than they should, or choose an inappropriately easy training
technique (e.g., relying upon training wheels).
This metacognitive illusion also has implications beyond
the self-choices of training strategy and practice time. If learn-
ers enjoy a particular training device or strategy, because of
its favorable impact on performance during training (and other
enjoyable aspects related to extraneous load), this positive affect
may well reflect favorably on the instructor or training environ-
ment in which that strategy is employed. Vendors who sell that
strategy (or an instructional curriculum or device based on it)
will benefit in sales and marketing because of this favorable atti-
tude. But the proof of effectiveness must lie in transfer, which
may not be correlated with (or may be negatively correlated with)
performance and enjoyment in training.

Conclusion
This chapter provides an “attention-centric” view of training,
anchored in cognitive load theory. In so doing, many other
principles and strategies discussed in this volume, based on
memory and learning theory, have been ignored. It is strongly
believed that attention and resource allocation are necessary,
but far from sufficient for effective learning and training to
take place, and so these are important considerations in the
design of any training environment. The general trichotomy
distinguishing between intrinsic, germane, and extraneous load
can always be applied to any training environment, even as the
boundaries between the three are sometimes fuzzy (Kalyuga,
2011). The greatest challenge for researchers would appear to
be identifying that boundary between germane and intrinsic
load, and understanding the intriguing paradox that sometimes
allocating more resources into intrinsic task load, to sustain
performance, does not assist skill acquisition, as intrinsic load
competes with resources for germane load. A challenge for
training research will be to continue to disentangle these issues.

Acknowledgments
Portions of this work were supported by the US Army Research
Institute under Contract #: W91WAW-09-C-0081 to Alion Sci-
ence and Technology titled Understanding the Impact of Train-
ing on Performance.

References
Bellenkes, A. (1999). The use of pilot performance models to facilitate
cockpit visual scanning. Unpublished Doctorial Dissertation. Uni-
versity of Illinois at Urbana Champaign.
Bellenkes, A. H., Wickens, C. D., & Kramer, A. F. (1997). Visual scan-
ning and pilot expertise: The role of attentional flexibility and mental
model development. Aviation, Space, and Environmental Medicine,
68, 569–579.
Bjork, R. A. (1999). Assessing our own competence: Heuristics and illu-
sions. In D. Gopher & A. Koriat (Eds.), Attention and performance:
Vol. 17. Cognitive regulation of performance: Interaction of theory and
application (pp. 435–459). Cambridge, MA: MIT Press.
Briggs, G., & Naylor, J. (1962). The relative efficiency of several training
methods as a function of transfer task complexity. Journal of Experi-
mental Psychology, 64, 505–512.
Carroll, J. (1990). The Nurnberg Funnel: Designing minimalist instruc-
tion for practical computer skills. Cambridge, MA: MIT Press.
Carroll, J. M., & Carrithers, C. (1984). Blocking learner error states in
a training-wheels system. Human Factors, 26, 377–389.
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative
theory of training motivation: A meta-analytic path analysis of 20
years of research. Journal of Applied Psychology, 85, 678–707.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A frame-
work for memory research. Journal of Verbal Learning and Verbal
Behavior 11, 671–684.
Damos, D., & Wickens, C. D. (1980). The identification and transfer of
time-sharing skills. Acta Psychologica, 46, 15–39.
Endsley, M. R. (1995). Towards a theory of situation awareness in
dynamic systems, Human Factors, 37, 32–64.
Endsley, M. R., & Kiris, E. O. (1995). The out-of-the-loop performance
problem and level of control in automation. Human Factors, 37,
381–394.
Fabiani, M., Buckely, J., Gratton, G., Coles, M., Donchin, E., & Logie,
R. (1989). The training of complex task performance. Acta Psycho-
logica, 71, 179–199.
Gopher, D. (2007). Emphasis change as a training protocol for high-
demand tasks. In A. Kramer, D. Wiegmann, & A. Kirlik (Eds.), Atten-
tion: From theory to practice (pp. 209–224). Oxford, England: Oxford
University Press.
Gopher, D., Weil, M., & Bareket, T. (1994). Transfer of skill from a com-
puter game trainer to flight. Human Factors, 36, 387–405.
Gopher, D., Weil, M., & Siegel, D. (1989). Practice under changing pri-
orities: An approach to training of complex skills. Acta Psychologica,
71, 147–197.
Halford, G. S., Wilson, W. H., & Phillips, S. (1998). Processing capac-
ity defined by relational complexity: Implications for comparative,
developmental, and cognitive psychology. Behavioral and Brain Sci-
ences, 21, 803–864.
Holding, D. H. (1976). An approximate transfer surface. Journal of
Motor Behavior, 8, 1–9.
Horrey, W., Lesch, A., & Garabet, A. (2009). Dissociation between driv-
ing performance and driver’s subjective estimates of performance
and workload in dual task conditions. Journal of Safety Research,
40, 7–12.
Kaber, D. B., & Endsley, M. R. (2003). The effects of level of automation
and adaptive automation on human performance, situation aware-
ness and workload in a dynamic control task. Theoretical Issues in
Ergonomic Science, 3, 1–40.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Pren-
tice Hall.
Kalyuga, S. (2011). Cognitive load theory: How many types of load does
it really need? Educational Psychology Review, 23, 1–19.
Kalyuga, S., Chandler, P., & Sweller, J. (1998). Levels of expertise and
instructional design. Human Factors, 40, 1–17.
Karpicke, J., & Roediger, H. (2008). The critical importance of retrieval
for learning. Science, 319, 966–968.
Koh, R., Park, T., Wickens, C., Ong, L. T., & Chia, S. N. (2011). Differ-
ences in attentional strategies by novice and experienced operating
theatre scrub nurses. Journal of Experimental Psychology: Applied,
17, 233–246.
Kramer, A. F., Larish, J. F., & Strayer, D. L. (1995). Training for atten-
tional control in dual task settings: A comparison of young and old
adults. Journal of Experimental Psychology: Applied, 1, 50–76.
Lintern, G., & Roscoe, S. (1980). Visual cue augmentation in contact
flight and simulation. In S. N. Roscoe (Ed.), Aviation psychology (pp.
227–238). Ames, IA: Iowa State University.
Lintern, G., & Wickens, C. D. (1991). Issues for acquisition in transfer
of timesharing and dual-task skills. In D. Damos (Ed.), Multiple-task
performance (pp. 123–138). London: Taylor & Francis.
Mane, A., Adams, J., & Donchin, E. (1989) Adaptive and part-whole
training in the acquisition of a complex perceptual-motor skill. Acta
Psychologica, 71, 179–196.
Manzey, D., & Parasuraman, R. (2010). Complacency and bias in
human use of automation: An attentional perspective. Human Fac-
tors, 52, 381–410.
Mayer, R. (2007). Research guidelines for multi-media instructions. In
F. Durso (Ed.), Reviews of human factors and ergonomics (Vol. 3, pp.
127–147). Santa Monica, CA: Human Factors.
Mayer, R., Griffith, E., Jurkowitz, T., & Rothman, D. (2008). Increas-
ing interestingness of extraneous details in a multimedia science
presentation leads to decreased learning. Journal of Experimental
Psychology: Applied, 24, 329–363.
Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in
multimedia learning. Educational Psychologist, 38, 45–52.
Montgomery, H., Sharafi, P., & Hedman, L. R. (2004). Engaging in
activities involving information technology: Dimensions, mode and
flow. Human Factors, 46, 334–348.
Naylor, J., & Briggs, G. (1963). Effects of task complexity and task orga-
nization on the relative efficiency of part and whole training meth-
ods. Journal of Experimental Psychology, 65, 217–224.
Niemiec, R., Sikorski, C., & Walberg, H. (1996). Learner-control effects:
A review of reviews and a meta-analysis. Journal of Educational
Computing Research, 15, 157–174.
Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-
limited processes. Cognitive Psychology 7, 44–64.
Paas, F., Renkl, A., & Sweller, J. (2003). Introduction: Cognitive load
theory and instructional design: Recent developments [Guest Edito-
rial Statement]. Educational Psychologist, 38, 1–4.
Paas, F., Tuovinen, J., van Merriënboer, J. J. G., & Darabi, A. (2005).
A motivational perspective on the relation between mental effort
and performance: Optimizing learners’ involvement in instructional
conditions. Educational Technology, Research and Development, 53,
25–33.
Paas, F., & van Gog, T. (2009). Principles for designing effective and
efficient training for complex cognitive skills. In F. Durso (Ed.), Reviews
of human factors and ergonomics (Vol. 5, pp. 166–194). Santa Mon-
ica, CA: Human Factors.
Parasuraman, R., Sheridan, T., & Wickens, C. D. (2008). Situation
awareness, mental workload, and trust in automation: Viable,
empirically supported cognitive engineering constructs. Journal of
Cognitive Engineering and Decision Making, 2, 140–160.
Parasuraman, R., & Wickens, C. D. (2008). Humans still vital after
all these years of automation [Special issue]. Human Factors, 3,
511–520.
Pea, R. D. (2004). The social and technological dimensions of scaf-
folding and related theoretical concepts for learning, education, and
human activity. The Journal of the Learning Sciences, 13, 423–451.
Pearce, P. (1981) Route maps: A study of traveller’s perception of a sec-
tion of countryside. Journal of Environmental Psychology, 1, 141–153.
Pollock, E., Chandler, P., & Sweller, J. (2002). Assimilating complex
information. Learning and Instruction, 12, 61–86.
Povenmire, H. K., & Roscoe, S. N. (1973). Incremental transfer effective-
ness of a ground-based general aviation trainer. Human Factors, 15,
534–542.
Renkl, A., Stark, R., Gruber, H., & Mandl, H. (1998). Learning from
worked-out examples: The effects of example variability and elic-
ited self-explanations. Contemporary Educational Psychology, 23,
90–108.
Roediger, H., & Karpicke, J. (2006). Test-enhanced learning: Taking
memory tests improves long-term retention. Psychological Science,
17, 249–255.
Sarter, N. B., & Schroeder, B. (2001). Supporting decision making and
action selection under time pressure and uncertainty: The case of
in-flight icing. Human Factors, 43, 573–583.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of prac-
tice: Common principles in three paradigms suggest new concepts
for training. Psychological Science, 3, 207–217.
Schneider, W. (1985). Training high-performance skills: Fallacies and
guidelines. Human Factors, 27, 285–300.
Schneider, W., & Detweiler, M. (1988). The role of practice in dual-task
performance: Toward workload modeling in a connectionist/control
architecture. Human Factors, 30, 539–566.
Schriver, A. T., Morrow, D. G., Wickens, C. D., & Talleur, D. A. (2008).
Expertise differences in attentional strategies related to pilot deci-
sion making. Human Factors, 50, 864–878.
Shapiro, K. L., & Raymond, J. E. (1989). Training of efficient oculo-
motor strategies enhances skill acquisition. Acta Psychologica, 71,
217–242.
Sheridan, T., & Parasuraman, R. (2006). Human–automation interac-
tion. Reviews of Human Factors and Ergonomics, 1, 89–129.
88 Christopher D. Wickens et al.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of
a phenomenon. Journal of Experimental Psychology: Human Learn-
ing and Memory, 4, 592–604.
Sweller, J. (1988). Cognitive load during problem solving: Effects on
learning. Cognitive Science, 12, 257–285.
van Merriënboer, J. J. G., Kester, L., & Paas, F. (2006). Teaching com-
plex rather than simple tasks: Balancing intrinsic and germane load
to enhance transfer of learning. Applied Cognitive Psychology 20,
343–352.
Wickens, C. D. (1992). Virtual reality and education. Proceedings of
the IEEE International Conference on Systems, Man, and Cybernetics
(Vol. 1, pp. 842–847) New York: IEEE.
Wickens, C. D. (2008a). Multiple resources and mental workload
[Golden Anniversary Special issue]. Human Factors, 50, 449–455.
Wickens, C. D. (2008b). Situation awareness: Review of Mica Endsley’s
articles on situation awareness [Golden Anniversary Special issue].
Human Factors, 50, 397–403.
Wickens, C. D., & Hollands, J. (2000) Engineering psychology and
human performance (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Wickens, C. D., Hutchins, S., Carolan T., & Cumming, J. (2011). Investi-
gating the impact of training on transfer: A meta-analysis. Proceed-
ings of the 2011 Conference of the Human Factors and Ergonomics
Society. Santa Monica, CA: Human Factors.
Wickens, C. D., Hutchins, S., Carolan, T., & Cummings, J (2012). Part
task training and increasing difficulty training strategies: A meta-
analysis approach. Human Factors, 54, ppxx.
Wickens C. D., Li, H., Santamaria, A., Sebok, A., & Sarter, N. (2010).
Stages & Levels of automation, An integrated meta-analysis. In
proceedings, 2011 conference of the Human Factors & Ergonomics
Society. Santa Monica, CA: Human Factors.
Wickens, C. D., & McCarley, J. (2008) Applied attention theory. Boca
Raton, FL: CRC Press.
Wickens, C. D., McCarley, J. S., Alexander, A. L., Thomas, L. C.,
Ambinder, M., & Zheng, S. (2008). Attention-situation awareness
(A-SA) model of pilot error. In D. Foyle & B. Hooey (Eds.), Human
performance models in aviation (pp. 213–239). Boca Raton, FL: CRC
Press.
Wightman, D., & Lintern, G. (1985). Part task training for tracking and
manual control. Human Factors, 27, 267–284.
Williams, H. P., Hutchinson, S., & Wickens, C. D. (1996). A comparison
of methods for promoting geographic knowledge in simulated air-
craft navigation. Human Factors, 38, 50–64.
Wolfe, M. B. W., Schreiner, M. E., Rehder, B., Laham, D., Foltz, P. W.,
Kintsch, W., & Landauer, T. K. (1998). Learning from text: Matching
readers and text by latent semantic analysis. Discourse Processes,
25, 309–336.
5 Acquisition and Transfer of Basic Skill Components

Robert W. Proctor, Motonori Yamaguchi, and James D. Miles
Purdue University

This chapter examines factors that influence the acquisition and transfer of fundamental components of skill. Much of the
research described in it was conducted with basic choice reac-
tion tasks, which permit isolation of fundamental cognitive
processes and rapid acquisition of skill within a single exper-
imental session (Proctor & Vu, 2006a). The methods relied
heavily, though not exclusively, on variants of spatial stimulus-
response compatibility (SRC) tasks. The concept of SRC and the
first investigations of compatibility effects are attributed to Paul
M. Fitts (Fitts & Deininger, 1954; Fitts & Seeger, 1953), who
founded the Psychology Branch of the Aero Medical Laboratory
of the U.S. Army at Wright Field at the end of World War II. Per-
haps more than anyone, he recognized the value of basic labora-
tory tasks for understanding processes involved in much more
complex military tasks. This value has also been appreciated by
other researchers associated with the military who have used
SRC tasks in the investigation of human performance issues,
including Alluisi and Warm (1990). Thus, the work described
in this chapter follows in a tradition of exploiting the properties
of SRC tasks to investigate a range of issues in human skilled
performance, in this case, ones concerning practice and trans-
fer effects.
The SRC task reveals a natural bias of human performance,
known as the SRC effect. In the prototypical spatial SRC task, a
stimulus can appear in a left or right location, and the performer
is to press an assigned left or right response key as quickly as
possible. Response times (RTs) are on average about 50 to 75 ms
shorter when subjects are instructed to press the left key to the
left stimuli and right key to the right stimuli (compatible map-
ping) than when they are instructed the opposite (incompatible
mapping). This advantage of the compatible mapping typically
amounts to 10 to 20% of the mean latency to react to visual or
auditory stimuli, implying its importance as a task parameter.
The SRC effect is robust and can be obtained in a wide range of
operational settings, such as flight simulations (Yamaguchi &
Proctor, 2006, 2010, 2011b) and driving simulations (Müsseler,
Aschersleben, Arning, & Proctor, 2009; see also Proctor & Vu,
2006b, for a review). The effect is also prevalent across different
age groups (Vu & Proctor, 2008) and cultures (Proctor & Vu,
2010b). Finally, the SRC effect is persistent; although perfor-
mance improves with practice, the effect never fully goes away
after even large amounts of practice (Dutta & Proctor, 1992;
Fitts & Seeger, 1953).
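
The size of the effect described above is simply a contrast between mean latencies under the two mappings. As a minimal sketch (the RT values below are invented for illustration, not data from the studies cited):

```python
from statistics import mean

def src_effect(compatible_rts, incompatible_rts):
    """Spatial SRC effect: mean RT cost (ms) of the incompatible mapping."""
    return mean(incompatible_rts) - mean(compatible_rts)

# Hypothetical response times in ms, for illustration only.
compatible = [410, 395, 420, 405, 400]
incompatible = [470, 455, 480, 460, 465]

effect = src_effect(compatible, incompatible)        # 60 ms advantage
relative = effect / mean(compatible + incompatible)  # ~14% of mean latency
```

On these invented numbers the compatible-mapping advantage is 60 ms, about 14% of the overall mean latency, in line with the 50 to 75 ms and 10 to 20% figures cited above.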
A variant of the SRC task that has come to be known as the
Simon task, after J. R. Simon (1990), was also used. For a Simon
task, although stimulus location still varies, the relevant stimu-
lus dimension is not the location of the stimulus but some non-
spatial feature such as its color (often, red or green). The Simon
effect refers to the fact that responses are still faster, and often
more accurate, when stimulus and response locations corre-
spond than when they do not, even though stimulus location is
defined as irrelevant to the task. The Simon effect has attracted
much research interest in recent years because it enables inves-
tigation of how response selection is affected by features of a
task that are not an explicit part of the instructed task goals
(e.g., Mordkoff & Hazeltine, 2011). The Simon effect is typically
attributed to long-term associations, or links, between particu-
lar stimulus and response locations (e.g., left stimulus locations
and left responses) that have been acquired through years of
experience (e.g., Zorzi & Umiltà, 1995). Activation of the corre-
sponding response is often described as occurring automatically
by way of these long-term links (sometimes called a direct acti-
vation route) when the appropriate stimulus occurs. Consistent
with this view, the benefit of correspondence for stimulus and
response locations is evident even when the left hand presses
the right key and the right hand the left key, and this benefit for
spatial correspondence of the stimulus and key locations per-
sists across extensive practice (Proctor & Shao, 2010).
As mentioned, the SRC and Simon effects reflect a preexist-
ing bias in operational performance that is very general across
different populations and task settings. The strategy for the
research described in this chapter was to isolate and reveal
basic components of skill acquisition and transfer by examin-
ing influences of task parameters on this robust performance
bias. Three major lines of investigation were carried out with
this aim in mind. The first investigated transfer of stimulus-
response (S-R) associations acquired during a practice task to
a subsequent task for which they were no longer relevant. The
second used a procedure in which trials from different tasks or
S-R mappings were intermixed, with only one task performed
on any given trial; this procedure allowed examination of the
influence of active S-R associations for one task on performance
of the other task. The final set of studies looked at the role of
attention, both in the acquisition and transfer of new associa-
tions as well as task performance, when two or more tasks were
performed concurrently (see also chapter 4). The main findings
in each of these areas and implications of those findings for skill
training are described in the following sections.

Factors Affecting Transfer of Learning


The studies of transfer of learning described in this section used
the following basic paradigm: In a practice session, subjects
performed a two-choice spatial SRC task with an incompatible
mapping (e.g., press “left” key to a stimulus that appears on the
“right”; incompatible-mapping task). Then, in a transfer session,
the subjects performed a Simon task in which they responded
to a nonspatial stimulus attribute (e.g., color). Thus, the spa-
tial relation between stimulus and response in the practice task
was task-relevant, but it became task-irrelevant in the transfer
task. The logic behind this research is that practice establishes
new short-term links between the stimulus locations and their
assigned responses (e.g., Zorzi & Umiltà, 1995; sometimes called
the indirect activation route) that, in the case of an incompatible
mapping, are counter to the long-term links that produce the
typical Simon effect.

Transfer from Incompatible-Mapping Task to Simon Task
After performing the incompatible-mapping task, the advantage
for the spatially corresponding responses in the Simon task is
eliminated and in some cases reversed (Proctor & Lu, 1999).
This outcome implies that the incompatible S-R links acquired
for the practice task are transferred to the subsequent task even
though they are no longer relevant. This experimental para-
digm is well suited for investigating factors that affect trans-
fer of learning because of the various manipulations of sensory
modalities, verbal and nonverbal stimulus modes, manual and
vocal response modes, and so on, that are possible for the prac-
tice and transfer tasks.
Perhaps the most striking outcome of the practice/transfer
studies is how easy it is to overcome or counteract effects of
long-term associations between stimuli and responses. The ben-
efit for spatial correspondence is eliminated by fewer than 100
trials of practice with an incompatible spatial mapping, which
is counterintuitive given the robust influences of the preexist-
ing biases in a variety of task conditions. This elimination is
equally apparent when tested 5 minutes later, one day later, and
a week later (Tagliabue, Zorzi, Umiltà, & Bassignani, 2000; Vu,
Proctor, & Urcuioli, 2003), and lasts for at least 800 Simon-task
trials (Soetens, Maetens, & Zeischka, 2010). In other words, this
small amount of training is sufficient to produce durable new
S-R links that override the preexisting habitual response ten-
dencies. With larger amounts of practice with an incompatible
spatial mapping, the transfer task shows reversal of the Simon
effect to favor the practiced incompatible S-R relation (Proctor &
Lu, 1999) and a broader range of transfer (e.g., Proctor, Yamagu-
chi, & Vu, 2007; Vu, 2007).
Transfer of the practiced mapping also occurs for tones pre-
sented to the left or right ear, left and right pointing arrows
and the location words left and right, and various response
modes (e.g., left–right unimanual joystick movements; “left” and
“right” vocal responses), as well as for vertically and orthogo-
nally oriented S-R sets (Bae, Cho, & Proctor, 2009; Vu, 2007).
Moreover, such transfer is obtained with nonspatial stimulus
and response sets. For example, when people first perform with
a compatible or incompatible mapping of positive and negative
pictures to approach and avoidance lever movements, the affec-
tive Simon effect in a subsequent task for which picture affect
is irrelevant (a benefit for correspondence of positive→approach
and negative→avoid over the opposite relation) is enhanced fol-
lowing the compatibly mapped task and reversed following the
incompatibly mapped task (Eder, Rothermund, & Proctor, 2010).

Quantitative Learning Framework


Many of the findings obtained with the practice/transfer para-
digm can be accommodated within the quantitative framework
described in chapter 12, in which the strength of learned knowl-
edge is represented by an activation function:
a_n = Σ_{i=1}^{n} α_i t_i^{−λ} exp{βS_i} ,  (1)

where a_n represents the activation of target knowledge after n practice trials. In general, the effect of learning (a_n) tends to increase as the number of practice trials (n) increases, and the
learning rate is rapid at first and then decelerates over time, reaching an asymptote at which additional practice yields little further gain.
This learning curve is consistent with the power law of learn-
ing (Newell & Rosenbloom, 1981). The equation embraces a kind
of strength theory in which remembering is a function of the
strength of the memory trace (representation).
According to the framework, efficiency of training is deter-
mined by number of trials (n), learning rate (β), contextual simi-
larity (S), and time passage (t and λ). As noted, practice with an
incompatible mapping increases the associative strength for the
incompatible S-R link through increase in n. The strength of
the incompatible S-R link is reflected in reduction of the Simon
effect, as in Figure 5.1a, which shows that the Simon effect
became smaller (or eliminated) after practice with the incom-
patible mapping (Proctor, Yamaguchi, Zhang, & Vu, 2009). The
relation between strength of S-R link and the amount of practice
follows a power function, as predicted by the learning model.
Consequently, when plotted against the number of practice tri-
als (Figure 5.1b), the Simon effect initially decreases rapidly, but
the amount of change decelerates over trials, eventually reach-
ing an asymptote.
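
The shape of this learning curve follows directly from Equation 1. The sketch below is an illustrative implementation with arbitrary parameter values (α = 1, β = 1, λ = 0.5, S_i = 1 throughout), taking t_i to be the number of trials elapsed since practice trial i:

```python
from math import exp

def activation(n, alpha=1.0, beta=1.0, lam=0.5, similarities=None):
    """Activation a_n after n practice trials (Equation 1): each practice
    trace is weighted by its age t_i raised to -lam and by exp(beta * S_i)."""
    if similarities is None:
        similarities = [1.0] * n  # assume a constant practice context
    return sum(alpha * (n - i + 1) ** (-lam) * exp(beta * s)
               for i, s in zip(range(1, n + 1), similarities))

# Activation after 100, 300, and 600 practice trials.
gains = [activation(n) for n in (100, 300, 600)]
```

Activation rises with n, but the gain per trial shrinks, so a performance measure tied to a_n (such as the Simon effect in Figure 5.1b) changes quickly early in practice and then levels off.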
The learning rate β may depend on several factors, such as learners’ motivation and comprehension ability, the effec-
tiveness of instructions, time scheduling of training, and the dif-
ficulty of learning materials. Examinations of the transfer effect
showed that the learning rate depended on the stimulus mode

[Two-panel graph; y-axes: Simon effect (ms), range −10 to 25; panel (a) x-axis: Practice Type (No Practice, After Practice); panel (b) x-axis: Number of Practice Trials (100, 300, 600).]

Figure 5.1 Simon effect (a) with no prior practice and after <100 trials of practice with an incompatible spatial mapping and (b) as a function of number of practice trials (Proctor, Yamaguchi, Zhang, & Vu, 2009).
through which spatial information is conveyed (physical location
of a circle, pointing direction of an arrow, the meaning of spatial
words; Proctor et al., 2009). In particular, the transfer effect
was evident after less than 100 trials of practice with physical
locations or with the pointing direction of arrows. Although the
Simon effect tended to be larger for the arrow stimuli than for
the location stimuli, the size of the transfer effect was equivalent
for the two stimulus modes. In contrast, after practice with the
word stimuli for less than 100 trials, there was little indication
of the transfer effect. Nevertheless, when the number of practice
trials was increased to 300, the transfer effect was observed (as
shown in Figure 5.1b), and it was as large as that for the location
and arrow stimuli. Because responses were made by pressing
the left and right keys, set-level compatibility was higher for the
location and arrow stimuli than for the word stimuli (cf. Proctor
& Wang, 1997). Therefore, a similar experiment was conducted
with vocal responses (i.e., saying “left” or “right”; Proctor et al.,
2009), for which set-level compatibility should be higher for the
word stimuli than for the location and arrow stimuli. However,
in this case, the transfer effect still was evident for the location
stimuli after less than 100 practice trials, but it appeared for
the word stimuli only after the number of trials was doubled.
This outcome suggests that the learning rate is not dependent
on set-level compatibility but, regardless of the response mode,
is determined by the stimulus type.

Specificity of Transfer
Another aspect of learning is its limited transfer to other situ-
ations. According to the framework, learning is utilized best in
a context that is similar to the original context in which the
learning has taken place, the principle of transfer specificity (see
chapter 2, this volume). The influence of contextual similarity
of the current trial to past trials is expressed by the exponen-
tial component of Equation 1, where S_i is the similarity of the
ith practice trial to the current trial. A well-known nonmetric
theory of similarity judgment is Tversky’s (1977) contrast model
in which an object or event is considered to be a set of unique
features. Then, the similarity between two objects Xi and Xj is
expressed by

S_ij = f(X_i ∩ X_j) − g(X_i/X_j) − h(X_j/X_i) ,  (2)

where X_i ∩ X_j denotes the features shared by the two objects, X_i/X_j denotes the features of X_i not contained in X_j, and f, g, and h are arbitrary

[Diagram: study context C_p and test context C_t overlap; the shared region C_p ∩ C_t determines the similarity S(C_p, C_t).]

Figure 5.2 A feature overlap account of contextual similarity.

functions. A special case of the contrast model is the feature overlap account of contextual similarity (Yamaguchi & Proctor, 2009; see Figure 5.2), in which the similarity between two task contexts (practice context C_p and test context C_t) is considered
to be a function of the number of overlapping features between
the contexts

S(C_p, C_t) = f(C_p ∩ C_t) .  (3)

Boundary conditions of transfer of newly acquired associations were studied by varying contextual features of the practice
and transfer tasks. The results were consistent with the feature
overlap account. For instance, the transfer effect is larger when
(a) the stimulus modalities (visual or auditory) match between
the practice and transfer conditions than when they mismatch
(Proctor et al., 2007; Vu et al., 2003); (b) the types of stimu-
lus mode (location word, arrow direction, or physical location)
match than when they mismatch (Proctor et al., 2009); (c) the
response modes match than when they mismatch (Yamaguchi &
Proctor, 2009); (d) the stimuli and responses are oriented along
the same spatial dimension (e.g., both horizontal) rather than
along orthogonal dimensions (one vertical, the other horizon-
tal; Proctor et al., 2007; Vu, 2007). Hence, transfer of newly
acquired associations depends on overlap of contextual features
present during practice and test.

Interference as a Source of Skill Dissipation


According to the framework, the influence of time passage (t
and λ) is thought to be loss of learning; that is, learned skills
dissipate over time if the skills are not used. However, there is
a long debate in psychology as to whether dissipation of learn-
ing (or memory) is due to passive decay or interference (e.g.,
Lewandowsky, Oberauer, & Brown, 2009; Portrat, Barrouillet,
& Camos, 2008). Depending on the theoretical position in this
debate, one can formulate different models of skill dissipation.
In previous studies (Vu et al., 2003), the transfer effect was as
large a week after the practice session took place as 5 min after
the session. This finding suggests that learned S-R links did
not decay during a period in which subjects did not perform
the incompatible-mapping task for a week. On the contrary, the
transfer effect was essentially eliminated if there were inter-
vening trials for which subjects performed the incompatible-
mapping task but with a different type of stimuli. In particular,
subjects were first provided with a practice session with word
stimuli. Then, they performed another practice session with
arrow stimuli. Finally, they transferred to the Simon task with
the word stimuli (Yamaguchi & Proctor, in preparation). The
Simon effect was larger than that observed for a group given only the first practice session (no intervening session), but it was as large as that for a control group given no practice sessions.
vening task “cut off” the learned incompatible S-R links. Hence,
the results support interference as the cause of skill dissipation.

Implementation Instructions
Although practiced skills influence performance of a subse-
quent task, there is evidence to suggest that instructions alone
may lead to similar effects. Gollwitzer (1999) has presented con-
siderable evidence that implementation intention instructions
provide an effective and targeted way to improve performance
without the need for practice. Rather than designating a general
task goal, such instructions are presented in an “if … then …”
format that defines the relation between a specified target (“if
a certain situation occurs”) and its associated response (“then
take a particular action”). By prespecifying this association,
responses purportedly become automatic; therefore, they are
immune to interference and lead to faster and more accurate
performance without a trade-off for alternative, nonspecified
actions. For example, Brandstätter, Lengfelder, and Gollwitzer
(2001) had subjects respond to digits in a stream of letters and
digits, and included the implementation instruction to respond
especially fast to the digit 5. Responses to the digit 5 showed
a performance benefit that was not influenced by the level of
cognitive load imposed by a concurrent task, indicating that
the target of the implemented instruction was not susceptible
to interference.
Cohen, Bayer, Jaudas, and Gollwitzer (2008) used implemen-
tation intention instructions to designate an opposite response
on specific incompatible trials in the Simon task (e.g., “if a red
circle appears on the left side, then I press the right button”)
and found an elimination of the Simon effect. Miles and Proc-
tor (2008) showed that implementation intention instructions
led to faster responses on both incompatible and compatible tri-
als, though, suggesting that the instructions do not replace the
long-term associations responsible for the Simon effect but set
up new associations that act in addition to them. However, the
instructions were still beneficial to performance of the specified
actions and did not lead to a performance cost for nonspeci-
fied actions. Exactly why implementation intention instructions
produce benefits in performance on a subsequent task is a topic
deserving of further research (see also the related discussion of
mental practice in chapter 2).
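
One way to picture the claim that implementation intentions add a direct target-response link alongside the general task rule is as a lookup consulted before rule application. This is only a schematic sketch, not a process model from the studies above:

```python
def respond(stimulus, general_rule, intentions):
    """Return a response: a prespecified "if stimulus, then action" link
    fires directly; otherwise the general task rule is applied."""
    if stimulus in intentions:
        return intentions[stimulus]
    return general_rule(stimulus)

# General goal: name the digit; added intention: a special response to 5.
intentions = {5: "fast-response-to-5"}
assert respond(5, str, intentions) == "fast-response-to-5"
assert respond(3, str, intentions) == "3"   # other stimuli are unaffected
```

The sketch captures Miles and Proctor's (2008) point that the new association acts in addition to, rather than in place of, existing ones: stimuli outside the intention are still handled by the general rule at no cost.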

Factors Affecting Transfer in Mixed Tasks with Multiple Mappings
People often have to be prepared to perform multiple tasks, any
one of which must be performed when an appropriate event
occurs, rather than performing a single task in isolation. Thus, it
is important to know how performance of one task is influenced
by the presence of other tasks to perform. In other words, how
does the readiness to perform one task affect performance of
another task with the same or different S-R mappings? Because
selection of which task to perform is required in such situations,
customary findings include (a) overall mixing costs (longer RTs
when tasks are mixed than when they are presented in pure
blocks of a single task) and (b) task switch costs (or repetition
benefits—longer RTs on trials for which the task switches from
that on the prior trial compared to those for which the task
repeats; e.g., Proctor, Koch, Vu, & Yamaguchi, 2008; Yamaguchi
& Proctor, 2011a; see Kiesel et al., 2010, for a review).
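
Both costs are simple contrasts over mean RT. A minimal sketch (the trial data below are invented for illustration):

```python
from statistics import mean

def mixing_cost(mixed_rts, pure_rts):
    """Mixing cost: mean RT in mixed blocks minus mean RT in pure blocks."""
    return mean(mixed_rts) - mean(pure_rts)

def switch_cost(trials):
    """Task-switch cost: mean RT on task-switch trials minus task-repeat
    trials. `trials` is a list of (task, rt) pairs in presentation order."""
    switches, repeats = [], []
    for (prev_task, _), (task, rt) in zip(trials, trials[1:]):
        (switches if task != prev_task else repeats).append(rt)
    return mean(switches) - mean(repeats)

trials = [("SRC", 520), ("SRC", 505), ("Simon", 560),
          ("Simon", 510), ("SRC", 555), ("SRC", 500)]
```

On these invented data the switch cost is 52.5 ms (switch trials average 557.5 ms, repeat trials 505 ms), the kind of repetition benefit described above.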

Elimination of the Compatible Mapping Benefit


In this line of research, the influence of mixing compatible and
incompatible mappings on choice-reaction tasks was investi-
gated (Vu & Proctor, 2004; Yamaguchi & Proctor, 2006). Among
the most interesting results is that the performance advantage
of the compatible mapping over the incompatible mapping is
reduced or eliminated under mixed conditions (e.g., Shaffer,
1965; Vu & Proctor, 2004; Yamaguchi & Proctor, 2006). This
finding can be attributed to subjects’ having to be prepared to
perform the incompatible-mapping task on any trial during the
session, so that they suppress the natural tendency to respond
with a spatially compatible response to a stimulus. The advan-
tage for the compatible spatial mapping is also lost when tri-
als for which stimulus location is relevant (with only a single
mapping) are mixed with Simon-task trials for which stimulus
location is irrelevant (Proctor & Vu, 2002; Proctor, Vu, & Mar-
ble, 2003). Also, the Simon effect increases somewhat when the
spatial mapping for the location-relevant trials is compatible
but reverses to favor the noncorresponding response when that
mapping is incompatible (e.g., Proctor et al., 2003), a result also
thought to reflect transfer of the task-defined S-R location links
to trials in which stimulus location is not relevant.

Specificity of Mixing Effects


Recent studies examined the specificity of these mixing effects
on performance. Proctor and Vu (2010a) showed that the effects
of mixing were reduced considerably when each mapping or task
used distinct key presses, with responses for one task made
with fingers on the left hand and those for the other task made
with fingers on the right hand. The lack of influence of mix-
ing on the SRC and Simon effects when the tasks have unique
responses implies that suppression of direct activation of the
corresponding response occurs primarily when tasks share
responses. Using different stimulus modes to convey the loca-
tion information for the SRC and Simon tasks also reduces the
impact of mixing the tasks. Proctor and Vu (2009c) showed that
the effects of task mixing on the spatial compatibility and Simon
effects were reduced when the location information was pre-
sented in physical locations for one task and location words for
the other. The mode distinction had little influence, though, on
the effects of mixing compatible and incompatible mappings for
the SRC task. These results imply that when the relevant task
dimension is location for one task and color for the other, the
task-defined associations of locations to responses are to some
extent mode-specific, but when the relevant task dimension is
location for both tasks, the associations are mode-independent
and rely on a shared spatial representation.
However, even in the case of mixed spatial SRC and Simon
tasks, an incompatible mapping of the words left and right to
responses eliminates the Simon effect for physical locations
(although it does not reverse it; Proctor, Marble, & Vu, 2000),
suggesting that in this situation a shared spatial representation
still plays a role. For bilinguals, this influence of incompatibly
mapped location words on performance occurs only when the
words are in the primary language, implying that those in the
second language do not activate the shared spatial representa-
tion (Notebaert, De Moor, Gevers, & Hartsuiker, 2007; Vu, Ngo,
Minakata, & Proctor, 2010).

Concomitant Mixing of Tasks and Spatial Mappings


Several experiments using a more complex version of the mix-
ing paradigm in which both tasks and mappings were mixed
(Proctor, Yamaguchi, Dutt, & Gonzalez, 2012) will now be
described in more detail. One purpose of this effort was to
model two major aspects of task performance, practice and
sequential effects, with an instance-based learning (IBL) the-
ory (Gonzalez, Lerch, & Lebiere, 2003) implemented within an
ACT-R modeling environment (Dutt, Gonzalez, Yamaguchi, &
Proctor, 2012; see chapter 9, this volume). For the experiments,
two tasks could occur on any trial, an SRC task in which sub-
jects responded to the locations of a white visual stimulus or
a Simon task in which subjects responded to the red or green
color of the stimulus while ignoring its location. Furthermore,
for the SRC task, subjects were required to respond by pressing
a response key whose location was compatible with the stimu-
lus location on some trials and incompatible on other trials.
The mapping was signaled by the orientation of a line (hori-
zontal or vertical) presented along with the imperative stimu-
lus. Four blocks of 160 trials were performed, allowing practice
effects to be examined.
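
The design just described can be sketched as a trial generator. This is a simplified reconstruction; which line orientation cued which mapping is not stated above, so the assignment below is arbitrary:

```python
import random

def make_block(n_trials=160, p_simon=0.5, seed=0):
    """Generate one block of the mixed SRC/Simon paradigm: each trial is
    either a Simon trial (respond to color, location irrelevant) or an SRC
    trial whose compatible/incompatible mapping is signaled by a line cue."""
    rng = random.Random(seed)
    block = []
    for _ in range(n_trials):
        side = rng.choice(["left", "right"])
        if rng.random() < p_simon:
            block.append({"task": "Simon", "side": side,
                          "color": rng.choice(["red", "green"])})
        else:
            mapping = rng.choice(["compatible", "incompatible"])
            # Arbitrary cue assignment for this sketch.
            cue = "horizontal" if mapping == "compatible" else "vertical"
            block.append({"task": "SRC", "side": side,
                          "mapping": mapping, "cue": cue})
    return block

# Four blocks of 160 trials, as in the experiment.
session = [make_block(seed=b) for b in range(4)]
```

Setting p_simon to 0.8 or 0.2 reproduces the biased mostly Simon and mostly SRC conditions of the second experiment described below.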

Basic Findings. The basic findings in the mixed-task experiment when the two tasks (and the two mappings for the SRC task) were
equally likely include: (a) responses were faster for the Simon
task than for the SRC task; (b) reduction in RT with practice was
larger for the SRC task than for the Simon task; (c) overall, the
SRC and Simon effects were eliminated (more specifically, they
were eliminated when the spatial correspondence on the current
trial was different from that on the preceding trial, but they
were present when the spatial correspondence on the current
trial was the same as that on the preceding trial); (d) the cost
of switching the correspondence relation was larger for the SRC
task than for the Simon task; (e) the task-switching cost was
larger for the Simon task than for the SRC task.
The first outcome can be attributed to the fact that the SRC
task requires more processing steps than does the Simon
task, as illustrated in Figure 5.3. For the Simon task subjects
must identify the stimulus color and select a correct response,
whereas for the SRC task they have to identify the stimulus
location, determine an appropriate S-R mapping rule, and then
select a correct response. The greater improvement with practice
for the SRC task than for the Simon task can be attributed to
improvement of the mapping determination process, which is
unique to the SRC task. The third outcome is consistent with
the results of previous studies showing that the benefit for the
compatible mapping tends to be eliminated when mappings or
tasks are mixed (e.g., Yamaguchi & Proctor, 2006). The fourth
outcome, that the cost of switching the correspondence relation
was larger for the SRC task than for the Simon task, is due to
the spatial correspondence relation being task-relevant for the
SRC task but task-irrelevant for the Simon task. Consequently,
the influence of switching that relation is more strongly mani-
fested in the former task than the latter task; thus, the effect is
due mainly to the mapping-determination stage. The last out-
come is consistent with what is called the switch-cost asym-
metry—the task-switch cost is typically larger from a difficult
task to an easy task than in the reverse direction (e.g., Allport,
Styles, & Hsieh, 1994; see Kiesel et al., 2010, for a review). As the
SRC task with mixed mappings is more complex than the Simon
task (see Figure 5.3), a larger cost of task-switching is expected
for the Simon task than for the SRC task.

Bias toward SRC or Simon Task. In a second experiment, the
frequencies of occurrence of the SRC and Simon tasks were
varied (Proctor et al., 2012): For half the subjects, 80% of the
trials were from the Simon task (mostly Simon group), whereas
for the other half, 80% of the trials were from the SRC task
(mostly SRC group). For the mostly Simon group, responses were
generally faster for the Simon task than for the SRC task, but
for the mostly SRC group, responses were initially faster for
the Simon task but then faster for the SRC task in later trials.
Thus, as subjects experienced the SRC task more often than the
Simon task, they became more proficient at performing the SRC
task than the Simon task. In contrast to the prior experiment,
the mostly SRC group showed similar costs of switching tasks,
implying that, in this case, the SRC task was no longer more
difficult than the Simon task.

Figure 5.3 A hypothetical process architecture for the expanded mixing
paradigm. [Flowchart: Encode display information → Select a task;
SRC task: Identify stimulus location → Determine mapping rule →
Select response; Simon task: Identify stimulus color → Select
response; both paths converge on Make response.]

As noted, Dutt et al. (2012) developed a computational
model, based on the IBL theory, to simulate performance with
mixed SRC and Simon tasks (see chapter 9, this volume, for
more details). The model was able to reproduce and explain the
changes with practice in both the Simon task alone and for the
experiment in which there was equal mixing of trials from the
Simon and SRC tasks (with equal compatible and incompatible
mapping trials). The model starts with response-selection via
an indirect route, with the response determined by applying a
decision rule. With practice, there is a gradual shift to a direct
route, in which the response for the current trial is based on
retrieval of a previous instance. The retrieval process is con-
trolled by the activation of the instances (which is a function of
recency and frequency of use of instances from memory) and
the similarity of the instances to a task situation. Sequential
patterns in the IBL model occur because the instance from the
previous trial has higher activation than other instances due to
its recent occurrence. When the task and mapping repeat, this
repetition increases the probability of that immediately prior
instance being retrieved on the current trial (due to similarity of
the mapping and/or task), leading to RT benefits. Predictions of
the model were generated and validated for the second of the two
experiments described above, in which bias toward the Simon or
SRC task was introduced.
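The retrieval mechanism just described can be made concrete with a small sketch. The Python fragment below is only an illustration of the general instance-based scheme, not the Dutt et al. (2012) model itself; the function names, the two-feature situation representation, and the parameter values (decay rate, mismatch penalty) are hypothetical stand-ins for the recency/frequency-driven activation and similarity terms mentioned above.

```python
import math

def activation(use_times, now, decay=0.5):
    """Base-level activation: grows with frequency and recency of
    an instance's past uses (power-law decay, ACT-R style)."""
    return math.log(sum((now - t) ** -decay for t in use_times))

def retrieval_score(instance, situation, now, mismatch_penalty=1.0):
    """Activation discounted by dissimilarity between the stored
    instance and the current task situation."""
    features = ("task", "mapping")
    matches = sum(instance[f] == situation[f] for f in features)
    similarity = matches / len(features)
    return (activation(instance["times"], now)
            - mismatch_penalty * (1.0 - similarity))

# Two stored instances; the SRC instance was used on the preceding trial.
instances = [
    {"task": "SRC", "mapping": "compatible", "times": [1.0, 5.0, 9.0]},
    {"task": "Simon", "mapping": None, "times": [2.0, 6.0]},
]
situation = {"task": "SRC", "mapping": "compatible"}
best = max(instances, key=lambda inst: retrieval_score(inst, situation, now=10.0))
print(best["task"])  # prints SRC: the recent, similar instance is retrieved
```

Because the instance from the immediately preceding trial carries the most recent use time, it wins retrieval whenever the task and mapping repeat, which is the source of the sequential repetition benefits described above.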
Bias toward Compatible or Incompatible Spatial Mapping. Two
additional experiments using the mixed mappings and
tasks procedure investigated the influence of bias toward the
compatible or incompatible mapping for the SRC task. In the first
of these experiments, the Simon and SRC tasks were equally
likely, as were the compatible and incompatible mappings for
the latter task. Participants received points for each correct
response and lost points for each incorrect response, and were
to try to maximize the number of points obtained. The payoff
structure for the Simon task was the same for all subjects, but
for the SRC task, half the subjects received a higher payoff for
correct responses on the compatible-mapping trials (C-favor
group), and the other half a higher payoff on the incompatible-
mapping trials (I-favor group). The general idea was that this
procedure would bias participants toward the compatible or
incompatible response. The experiment replicated the findings
of (a) faster responses for the Simon task than the SRC task
and (b) a larger practice effect for the SRC task than for the
Simon task. There was a dissociation between the Simon and
SRC effects; the Simon effect was positive (16 ms for the RT data,
1.77% for the percentage error data), whereas the SRC effect was
slightly negative overall (-14 ms, -0.76%). Moreover, the error
data suggested that in the first trial block, the correspondence
effect (average of the SRC and Simon effects) was positive for
the C-favor group and negative for the I-favor group, but, over
trials, the effects for both groups gradually approached zero.
Thus, the payoff manipulation was effective at early stages of
practice, but its influence decreased at later stages, with subjects
performing the mixed tasks in later trials much like the subjects
in Experiment 1 did when there was no payoff differential for
compatible and incompatible responses. This result is generally
consistent with the IBL model, because rule-based biases
would influence performance early in practice, whereas instance
retrieval would dominate on later trials.
In a final experiment (Proctor et al., 2012), mapping bias
was induced by varying the relative frequencies of the compat-
ible and incompatible mappings. The SRC and Simon tasks
occurred equally often, but on SRC trials, the mapping was
compatible on 80% of the trials and incompatible on 20% for
half of the subjects, and the opposite for the other half. The SRC
effect was again absent overall in this experiment, but it was
sensitive to the manipulation of mapping frequencies. In con-
trast, the Simon effect remained significant and was insensitive
to the manipulation of mapping frequencies in the SRC trials.
The dissociation of the Simon and SRC effects under mapping
bias conditions implies that the direct response-selection route
is not suppressed in mixed conditions and suggests that mixing
influences the effects by different means.

Factors Affecting Performance of Concurrent Tasks
Often, not only does one have to be prepared to perform one
of two or more tasks in isolation, but multitasking demands
require that the tasks be performed concurrently (similar issues
involving multitasking are discussed in chapters 4 and 14, this
volume). The research summarized in this section examined
issues relating to whether skills are acquired when attention is
directed toward another task and coordination of performance
across different tasks.

Attention during Learning and Expression of Skill

An issue of importance is the extent to which attention is
required (a) during learning of a skill and (b) for that newly
learned information to be expressed subsequently. This issue
was investigated with an auditory version of the practice/trans-
fer paradigm described in the first section, in which subjects
practiced making spatially incompatible responses to left and
right tones based on their locations and then made the same
responses based on the auditory frequencies (high or low) of the
tones (Miles & Proctor, 2010).
The unique aspect of the Miles and Proctor (2010) study was
that some subjects performed the incompatible-mapping task
while concurrently tracking a ball displayed on the screen by
moving the computer mouse. Because the ball tracking task was
attentionally demanding, subjects had less attention to devote
to the incompatible-mapping task. Consequently, if attention is
required for establishing the new S-R associations, a smaller
transfer effect to the Simon task should be obtained, as com-
pared to those subjects who performed the incompatible-map-
ping task without the ball tracking. This is the outcome that
was obtained. As in previous research, practice with the spa-
tially incompatible mapping eliminated the Simon effect in the
transfer task when there was no concurrent task during the
acquisition phase. However, the Simon effect was not reduced
in the transfer session when the tracking task was performed
concurrently during practice. In addition, the influence of the
concurrent ball tracking task in the transfer task was evalu-
ated. That is, all subjects performed the incompatible-mapping
task without the ball tracking and then transferred to the Simon
task either with the ball-tracking or without it. The Simon effect
was equivalent for the two groups, suggesting that attention is
not required to manifest the effect of incompatible S-R links.
These results imply that the transfer effect reflects “automatic”
or highly skilled retrieval or the use of a direct access route,
and is also consistent with the instance-based learning theory
(Gonzalez et al., 2003; Logan, 1988; see chapter 9, this volume).

Influence of Ideomotor Compatibility

Dual-task performance is often studied in what is called the
psychological refractory period (PRP) paradigm, in which stim-
uli for two different tasks are presented in close temporal prox-
imity, each of which requires a speeded response (see Lien &
Proctor, 2002, for a review). This paradigm, which has a long his-
tory of research in applied experimental psychology much like
that of compatibility effects (Telford, 1931), is of value because
it allows assessment of both general attentional demands of
response selection and more specific interactions across tasks.
The most widely established finding is that the response for
Task 2 is slowed considerably when the time between stimulus
onsets is short, and this PRP effect is typically attributed to
a response-selection bottleneck (Pashler, 1984). One issue has
been whether this bottleneck is bypassed (and the dual-task
interference eliminated) when the stimuli and responses have
a very high degree of compatibility called ideomotor compatibil-
ity (e.g., Greenwald, 2005; Lien, McCann, Ruthruff, & Proctor,
2005). An example of an ideomotor compatible task is respond-
ing to spoken letter stimuli by saying each letter’s name. The
idea is that the high S-R compatibility of such tasks may allow
the response to be generated automatically, without requiring
the typical response-selection process.
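The response-selection bottleneck account lends itself to a compact quantitative sketch. In the fragment below, Task 2's central stage must wait until Task 1's central stage has finished, producing slowing at short stimulus onset asynchronies (SOAs); all stage durations are illustrative, hypothetical values, not estimates from the studies discussed here.

```python
def rt2(soa, p1=100, c1=150, p2=100, c2=150, m2=100):
    """Central-bottleneck prediction for Task 2 RT (ms).
    p = perceptual, c = central (response selection), m = motor.
    Task 2's central stage starts when both its own perceptual
    stage and Task 1's central stage (p1 + c1) have finished."""
    central_start = max(soa + p2, p1 + c1)   # time from S1 onset
    return central_start + c2 + m2 - soa     # RT measured from S2 onset

print(rt2(soa=50), rt2(soa=1000))  # prints 450 350
```

The 100-ms difference between the short- and long-SOA predictions is the PRP effect; at short SOAs the model's RT2 falls one-for-one as SOA grows, the slope of −1 that is taken as the signature of a processing bottleneck.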
Two studies were conducted examining the PRP effect with
ideomotor compatible tasks. Shin, Cho, Lien, and Proctor (2007)
reported three experiments in which both Task 1 and Task 2
were two-choice tasks: Task 1 required manual responses (key
presses or joystick movements) to left and right pointing arrows
presented in left and right locations, respectively, and Task 2
required vocal naming responses to letters. Shin and Proctor
(2008) varied whether the first task had two or four choices,
also in three experiments. A PRP effect for Task 2 response time
was evident in all of the conditions of these two studies, show-
ing that ideomotor tasks do not seem to bypass the response-
selection bottleneck. Of most concern for present purposes is
that across four or more dual-task blocks of up to 48 trials each
in all experiments, only in one case, that of auditory-vocal Task
1 and visual-joystick Task 2 (Shin & Proctor, 2008), did the PRP
effect decrease with practice, and even there the effect was still
evident in the last trial block. In fact, for the two experiments in
which Task 1 used joystick responses to visual stimuli (Shin et
al., Experiment 2; Shin & Proctor, Experiment 1), the PRP effect
increased across blocks. So, even with very highly compatible
individual tasks, practice is not sufficient to overcome dual-task
interference.

Cross Talk between Tasks

Studies were also conducted that used the PRP paradigm to
examine cross talk between spatial tasks performed with the
left and right hands. For these experiments, the stimulus loca-
tions for Task 1 were to the left of center and those for Task 2
were to the right of center, and the responses were made with
fingers on the left and right hands respectively. Each task had
the same number of alternatives, two in the experiments of Vu
and Proctor (2006), three in the experiments of Proctor and Vu
(2009b), and four in those of Proctor and Vu (2009a). The main
variable of interest in all cases was the consistency of mappings
for the two tasks. Mappings were consistent when both were
compatible or both incompatible (e.g., make the mirror oppo-
site response) and inconsistent when Task 1 used one mapping
and Task 2 another. In all cases, a benefit for consistent map-
pings was obtained, similar to that reported initially by Duncan
(1979) for three-choice tasks. However, the basis for this consis-
tency benefit was different for the two-choice tasks when com-
pared to those involving more than two choices. For two-choice
tasks, several findings (e.g., presence of benefit mainly at short
onset intervals; no benefit when one task used auditory stimuli)
implied that the consistency benefit was due to an emergent per-
ceptual blank feature that allowed subjects to respond compat-
ibly to blank regions of the visual display (i.e., when both task
mappings were incompatible, the responses for both tasks cor-
responded to the locations in which stimuli did not occur). For
three- and four-choice tasks, in contrast, the evidence favored
Duncan’s original hypothesis that the benefit comes about from
having only a single mapping rule to apply to both tasks, rather
than having to choose between rules. These results suggest that
performance will be best when consistency of mappings is main-
tained across tasks and that training that highlights consistent
relationships may be most beneficial.
Influence of Current and Prior Payoff Schedules
on Allocation of Effort
Finally, a characteristic of multitasking in many situations is
that a person must determine how much effort to devote to a
particular task and when to switch attention from one task to
another (see chapter 4, this volume, for a similar argument and
conclusion). Issues relating to this strategic aspect of multitask-
ing were examined in a synthetic work environment (Wang,
Proctor, & Pick, 2007, 2009) intended to be a generic represen-
tation of a variety of multitasking situations. This environment
requires concurrent performance of four tasks (math, memory
search, visual monitoring, and auditory monitoring), each rep-
resented in a quadrant of the computer screen, that require
positioning of a cursor with a computer mouse on a response
button, and then clicking on the button. Points are received for
correct responses and lost for incorrect responses, and the goal
is to maximize the number of points obtained. Payoffs were var-
ied for the two more cognitively demanding tasks, math and
memory search, jointly (Wang et al., 2007) or singly (Wang et al.,
2009) between subjects to determine sensitivity of strategies to
the payoff schedule across eight 5-min sessions.
Subjects were sensitive to the payoff differences, performing
a task relatively more when its payoff was high than when it
was low. When the payoffs for the math and memory task were
varied concurrently, performance of both tasks reflected their
relative emphasis. However, when the payoff was varied explic-
itly for only one of the tasks, implicitly modifying the relative
payoff for the other, only performance of the task associated with
the explicit payoff was affected. For the next four transfer ses-
sions, the payoff schedule was switched for half of the subjects
and kept the same for the other half. Results showed that the
subjects modified their strategies consistent with the new pay-
offs. However, residual effects of prior payoffs were evident such
that the performance of the subjects for whom the payoff sched-
ule changed did not match that of subjects who had performed
with that payoff schedule all along. General implications of this
research include that payoffs for multiple-task environments
need to be explicit, and practice should be provided for strategy
development. When payoffs change, strategies adopted reflect
current and previous payoffs.
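One way to summarize this pattern is as a blending of the current payoff structure with a previously learned strategy. The sketch below is a toy illustration of that idea only, not the model used by Wang et al.; the proportional-allocation rule and the carryover parameter are hypothetical stand-ins for the residual-strategy effect.

```python
def allocation(payoffs, prior=None, carryover=0.3):
    """Share of effort per task: proportional to relative payoff,
    blended with the allocation learned under a prior schedule
    (the residual effect of previous payoffs)."""
    total = sum(payoffs.values())
    current = {task: p / total for task, p in payoffs.items()}
    if prior is None:
        return current
    return {task: (1 - carryover) * current[task] + carryover * prior[task]
            for task in current}

before = allocation({"math": 4, "memory": 1})         # {'math': 0.8, 'memory': 0.2}
after = allocation({"math": 1, "memory": 4}, before)  # math share stays above 0.2
```

After the schedule reverses, the allocation shifts toward the new payoffs but does not match that of a group trained on the reversed schedule from the start, mirroring the residual effects reported above.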

Conclusion
The research described in this chapter illustrates that there
are benefits of applying individual principles in the training of
specific tasks. A small amount of practice responding to a stim-
ulus dimension may have a long-lasting impact on performance
when that dimension is no longer relevant. Many results are
consistent with skill learning being an increasing function of
the number of trials, the rate of learning for a particular stimu-
lus mode, and the amount of allocated attention. The degree
of transfer varies directly as a function of contextual similar-
ity, with the greatest transfer occurring when the stimulus and
response modes are the same in practice and transfer condi-
tions. The amount of transfer is largely uninfluenced by the
passage of time, although intervening tasks with different S-R
mappings tend to reduce the amount of transfer.
The expression of preexisting biases is also influenced by
the current task environment. Factors such as similarity in
stimulus and response modes for two mixed or concurrent
tasks, as well as consistencies of mapping rules across the
tasks, are critical. Relative payoffs and frequencies for the dif-
ferent tasks and mappings in these more complex task envi-
ronments need to be taken into account in characterizing
skilled performance.

References
Allport, A., Styles, E. A., & Hsieh, S. (1994). Shifting intentional set:
Exploring the dynamic control of tasks. In C. Umiltà & M. Mosco-
vitch (Eds.), Attention and performance: Vol. 15. Conscious and non-
conscious information processing (pp. 421–452). Cambridge, MA:
MIT Press.
Alluisi, E. A., & Warm, J. S. (1990). Things that go together: A review of
stimulus-response compatibility and related effects. In R. W. Proctor
& T. G. Reeve (Eds.), Stimulus-response compatibility: An integrated
perspective (pp. 3–30). Amsterdam, the Netherlands: North-Holland.
Bae, G. Y., Cho, Y. S., & Proctor, R. W. (2009). Transfer of orthogonal
stimulus-response mappings to an orthogonal Simon task. Quar-
terly Journal of Experimental Psychology, 62, 746–765.
Brandstätter, V., Lengfelder, A., & Gollwitzer, P. M. (2001). Implementation
intentions and efficient action initiation. Journal of Personality
and Social Psychology, 81, 946–960.
Cohen, A. L., Bayer, U. C., Jaudas, A., & Gollwitzer, P. M. (2008). Self-
regulatory strategy and executive control: Implementation intentions
modulate task switching and Simon task performance. Psychologi-
cal Research, 72, 12–26.
Duncan, J. (1979). Divided attention: The whole is more than the sum
of its parts. Journal of Experimental Psychology: Human Perception
and Performance, 5, 216–228.
Dutt, V., Gonzalez, C., Yamaguchi, M., & Proctor, R. W. (2012). Instance-
based learning models of SRC and Simon effects. Manuscript sub-
mitted for publication.
Dutta, A., & Proctor, R. W. (1992). Persistence of stimulus-response
compatibility effects with extended practice. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 18, 801–809.
Eder, A. B., Rothermund, K., & Proctor, R. W. (2010). The prepared
emotional reflex: Intentional preparation of automatic approach and
avoidance tendencies as a means to regulate emotional responding.
Emotion, 10, 593–598.
Fitts, P. M., & Deininger, R. L. (1954). S-R compatibility: Correspon-
dence among paired elements within stimulus and response codes.
Journal of Experimental Psychology, 48, 483–492.
Fitts, P. M., & Seeger, C. M. (1953). S-R compatibility: Spatial charac-
teristics of stimulus and response codes. Journal of Experimental
Psychology, 46, 199–210.
Gollwitzer, P. M. (1999). Implementation intentions: Strong effects of
simple plans. American Psychologist, 54, 493–503.
Gonzalez, C., Lerch, J. F., & Lebiere, C. (2003). Instance-based learn-
ing in dynamic decision making. Cognitive Science, 27, 591–635.
Greenwald, A. G. (2005). A reminder about procedures needed to reli-
ably produce perfect timesharing: Comment on Lien, McCann,
Ruthruff, and Proctor (2005). Journal of Experimental Psychology:
Human Perception and Performance, 31, 221–225.
Kiesel, A., Steinhauser, M., Wendt, M., Falkenstein, M., Jost, K., Philipp,
A. M., & Koch, I. (2010). Control and interference in task switching—
A review. Psychological Bulletin, 136, 849–874.
Lewandowsky, S., Oberauer, K., & Brown, G. A. (2009). No temporal
decay in verbal short-term memory. Trends in Cognitive Sciences,
13(3), 120–126.
Lien, M., McCann, R. S., Ruthruff, E., & Proctor, R. W. (2005). Con-
firming and disconfirming theories about ideomotor compatibility
in dual-task performance: A reply to Greenwald (2005). Journal of
Experimental Psychology: Human Perception and Performance, 31,
226–229.
Lien, M.-C., & Proctor, R. W. (2002). Stimulus-response compatibility
and psychological refractory period effects: Implications for response
selection. Psychonomic Bulletin & Review, 9, 212–238.
Logan, G. D. (1988). Toward an instance theory of automatization. Psy-
chological Review, 95, 492–527.
Miles, J. D., & Proctor, R. W. (2008). Improving performance through
implementation intentions: Are preexisting response biases
replaced? Psychonomic Bulletin & Review, 15, 1105–1110.
Miles, J. D., & Proctor, R. W. (2010). Attention is required for the
acquisition but not expression of new spatial associations. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 36,
1554–1560.
Mordkoff, J. T., & Hazeltine, E. (2011). Responding to the source of
stimulation: J. Richard Simon and the Simon effect [Special issue]. Acta
Psychologica, 136(2).
Müsseler, J., Aschersleben, G., Arning, K., & Proctor, R. W. (2009).
Reversed effects of spatial compatibility in natural scenes. American
Journal of Psychology, 122, 325–336.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition
and the law of practice. In J. R. Anderson (Ed.), Cognitive skills and
their acquisition (pp. 1–55). Hillsdale, NJ: Erlbaum.
Notebaert, W., De Moor, W., Gevers, W., & Hartsuiker, R. J. (2007). New
visuospatial associations by training verbospatial mappings in the
first language. Psychonomic Bulletin & Review, 14, 1183–1188.
Pashler, H. (1984). Processing stages in overlapping tasks: Evidence
for a central bottleneck. Journal of Experimental Psychology: Human
Perception & Performance, 10, 358–377.
Portrat, S., Barrouillet, P., & Camos, V. (2008). Time-related decay or
interference-based forgetting in working memory? Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 34, 1561–1564.
Proctor, R. W., Koch, I., Vu, K.-P. L., & Yamaguchi, M. (2008). Influ-
ence of display and cue formats on switch costs and the prevalence
effect for horizontal and vertical dimensions. Memory & Cognition,
36, 998–1012.
Proctor, R. W., & Lu, C.-H. (1999). Processing irrelevant location infor-
mation: Practice and transfer effects in choice-reaction tasks. Mem-
ory & Cognition, 27, 63–77.
Proctor, R. W., Marble, J. G., & Vu, K.-P. L. (2000). Mixing incompat-
ibly mapped location-relevant trials with location-irrelevant trials:
Effects of stimulus mode on the reverse Simon effect. Psychological
Research, 64, 11–24.
Proctor, R. W., & Shao, C. (2010). Does the contribution of stimulus-
hand correspondence to the auditory Simon effect increase with
practice? Experimental Brain Research, 204, 131–137.
Proctor, R. W., & Vu, K.-P. L. (2002). Mixing location irrelevant and
relevant trials: Influence of stimulus mode on spatial compatibility
effects. Memory & Cognition, 30, 281–294.
Proctor, R. W., & Vu, K.-P. L. (2006a). Laboratory studies of training,
skill acquisition, and retention of performance. In K. A. Ericsson, N.
Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), Cambridge hand-
book of expertise and expert performance (pp. 265–286). Cambridge,
England: Cambridge University Press.
Proctor, R. W., & Vu, K.-P. L. (2006b). Stimulus-response compatibility
principles: Data, theory, and application. Boca Raton, FL: CRC Press.
Proctor, R. W., & Vu, K.-P. L. (2009a). Determinants of the benefit for
consistent stimulus-response mappings in dual-task performance
of four-choice tasks. Attention, Perception and Psychophysics, 71,
734–756.
Proctor, R. W., & Vu, K.-P. L. (2009b). Determinants of the benefit for
consistent stimulus-response mappings in dual-task performance
of three-choice tasks. Attention, Perception, and Psychophysics, 71,
1771–1781.
Proctor, R. W., & Vu, K.-P. L. (2009c). Task-defined associations are
mode specific for selection of relevant dimension but mode inde-
pendent for selection of mapping. Quarterly Journal of Experimental
Psychology, 62, 371–392.
Proctor, R. W., & Vu, K.-P. L. (2010a). Stimulus-response compatibil-
ity for mixed mappings and tasks with unique responses. Quarterly
Journal of Experimental Psychology, 63, 320–340.
Proctor, R. W., & Vu, K.-P. L. (2010b). Universal and culture-specific
effects of display-control compatibility. American Journal of Psychol-
ogy, 123, 425–435.
Proctor, R. W., Vu, K.-P. L., & Marble, J. G. (2003). Eliminating spa-
tial compatibility effects for location-relevant trials by intermixing
location-irrelevant trials. Visual Cognition, 10, 15–50.
Proctor, R. W., & Wang, H. (1997). Differentiating types of set-level
compatibility. In B. Hommel & W. Prinz (Eds.), Theoretical issues in
stimulus-response compatibility (pp. 11–37). Amsterdam, the Neth-
erlands: North-Holland.
Proctor, R. W., Yamaguchi, M., Dutt, V., & Gonzalez, C. (2012). Disso-
ciation of S-R compatibility and Simon effects with mixed tasks and
mappings. Manuscript submitted for publication.
Proctor, R. W., Yamaguchi, M., & Vu, K.-P. L. (2007). Transfer of non-
corresponding spatial associations to the auditory Simon task. Jour-
nal of Experimental Psychology: Learning, Memory, and Cognition,
33, 245–253.
Proctor, R. W., Yamaguchi, M., Zhang, Y., & Vu, K.-P. L. (2009). Influ-
ence of visual stimulus mode on transfer of acquired spatial asso-
ciations. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 35, 434–445.
Shaffer, L. H. (1965). Choice reaction with variable S-R mapping. Jour-
nal of Experimental Psychology, 70, 284–288.
Shin, Y. K., Cho, Y. S., Lien, M.-C., & Proctor, R. W. (2007). Is the
psychological refractory period effect for ideomotor compatible tasks
eliminated by speed-stress instructions? Psychological Research,
71, 553–567.
Shin, Y. K., & Proctor, R. W. (2008). Are spatial responses to visuo-
spatial stimuli and spoken responses to auditory letters ideomo-
tor-compatible tasks? Examination of set-size effects on dual-task
interference. Acta Psychologica, 129, 352–364.
Simon, J. R. (1990). The effects of an irrelevant directional cue on
human information processing. In R. W. Proctor & T. G. Reeve
(Eds.), Stimulus-response compatibility: An integrated perspective
(pp. 31–86). Amsterdam: North-Holland.
Soetens, E., Maetens, K., & Zeischka, P. (2010). Practice-induced and
sequential modulations of the Simon effect. Attention, Perception, &
Psychophysics, 72, 895–911.
Tagliabue, M., Zorzi, M., Umiltà, C., & Bassignani, F. (2000). The role of
LTM links and STM links in the Simon effect. Journal of Experimen-
tal Psychology: Human Perception and Performance, 26, 648–670.
Telford, C. W. (1931). Refractory phase of voluntary and associative
response. Journal of Experimental Psychology, 14, 1–35.
Tversky, A. (1977). Features of similarity. Psychological Review, 84,
327–352.
Vu, K.-P. L. (2007). Influences on the Simon effect of prior practice with
spatially incompatible mappings: Transfer within and between hori-
zontal and vertical dimensions. Memory & Cognition, 35, 1463–1471.
Vu, K.-P. L., Ngo, T. K., Minakata, K., & Proctor, R. W. (2010). Shared
spatial representations for physical locations and location words in
bilinguals’ primary language. Memory & Cognition, 38, 713–722.
Vu, K.-P. L., & Proctor, R. W. (2004). Mixing compatible and
incompatible mappings: Elimination, reduction, and enhancement of spatial
compatibility effects. Quarterly Journal of Experimental Psychology,
57A, 539–556.
Vu, K.-P. L., & Proctor, R. W. (2006). Emergent perceptual features in
the benefit of consistent stimulus-response mappings on dual-task
performance. Psychological Research, 70, 468–483.
Vu, K.-P. L., & Proctor, R. W. (2008). Age differences in response selec-
tion for pure and mixed stimulus-response mappings and tasks.
Acta Psychologica, 129, 49–60.
Vu, K.-P. L., Proctor, R. W., & Urcuioli, P. (2003). Transfer effects of
incompatible location-relevant mappings on subsequent visual or
auditory Simon tasks. Memory & Cognition, 31, 1146–1152.
Wang, D.-Y. D., Proctor, R. W., & Pick, D. F. (2007). Acquisition and
transfer of attention-allocation strategies in a multiple-task work
environment. Human Factors, 49, 995–1004.
Wang, D.-Y. D., Proctor, R. W., & Pick, D. F. (2009). Allocation of effort
as a function of payoffs for individual tasks in a multitasking envi-
ronment. Behavior Research Methods, 41, 705–716.
Yamaguchi, M., & Proctor, R. W. (2006). Stimulus-response compat-
ibility with pure and mixed mappings in a flight task environment.
Journal of Experimental Psychology: Applied, 12, 207–222.
Yamaguchi, M., & Proctor, R. W. (2009). Transfer of learning in choice
reactions: Contributions of specific and general components of man-
ual responses. Acta Psychologica, 130, 1–10.
Yamaguchi, M., & Proctor, R. W. (2010). Compatibility of motion infor-
mation in two aircraft attitude displays for tracking task. American
Journal of Psychology, 123, 81–92.
Yamaguchi, M., & Proctor, R. W. (2011a). Automaticity without exten-
sive training: The role of memory retrieval in automatic implemen-
tation of task-defined rules. Psychonomic Bulletin & Review, 18,
347–354.
Yamaguchi, M., & Proctor, R. W. (2011b). The Simon task with multi-
component responses: Two loci of response-effect compatibility.
Psychological Research, 75, 214–226.
Yamaguchi, M., & Proctor, R. W. (in preparation). Transfer of an incom-
patible spatial mapping to the Simon task: Influence of an interven-
ing task with a different stimulus mode.
Zorzi, M., & Umiltà, C. (1995). A computational model of the Simon
effect. Psychological Research, 58, 193–205.
6 How Cognitive Ability
and Automation Influence
Training Performance and
Transfer
Eric D. Heggestad
University of North Carolina, Charlotte

Benjamin A. Clegg
Colorado State University

Adrian Goh
University of North Carolina, Charlotte

Robert S. Gutzwiller
Colorado State University

Automation is becoming increasingly ubiquitous (see Sheridan,
2002), perhaps most notably in transportation systems like cars
and airplanes, but also in areas such as medicine, computing,
manufacturing, military environments, commerce, and even
in the home. Automation can be defined broadly as a process
whereby a device or system takes over a task or components
of a task that previously were, or potentially could have been
executed by people (Parasuraman & Riley, 1997). The intended
benefit of automated systems is the improvement of performance
resulting from reduced complexity of the task for the human
operator. As an example, one prevalent, longstanding use of
automation is the automatic gearbox in automobiles, which
removes the task of shifting gears from the human operator.
In this case, automation allows novice drivers to increase their
allocation of attention to other relevant tasks. Consistent with
this notion, evidence suggests that automation of this function
improves monitoring and remembering of road signs in learners
(Shinar, Meir, & Ben-Shoham, 1998).
In addition to being incorporated into tasks themselves,
automation offers potential as a training aid. The inclusion of such
Cognitive Ability and Automation 113
aides within the learning process is intended to reduce the cogni-
tive load on the learner, allowing him or her to focus on learning
particular components of the task within the full task environ-
ment (for related topics see chapters 2 and 4). Continuing in the
context of learning to drive a car, learners are typically required
to master simultaneously a set of distinct subtasks, including
steering, controlling the velocity of the vehicle, looking for and
anticipating the location of other road users, and navigating
the route. The inclusion of an automated training aide might
allow novice drivers first to focus their attention on learning
the single component of steering the vehicle while automation
controls other task components. Once learners have mastered
steering, then the automation could be adjusted so that learners
are either presented with a different component of the task (e.g.,
velocity control) or are required to perform a second component
of the task along with the steering component. Of course, auto-
mation as a training aide could be incorporated into training
programs for a wide range of complex tasks.
A potential benefit of additional attentional capacity afforded
by the presence of automation is the capability to focus on
higher-order aspects of the task. For example, automating some
task systems might provide the learner with an opportunity to
focus on learning to anticipate upcoming system states or how
to strategically handle those states. Returning to the example
of driving, the complexity of learning all of the component tasks
simultaneously leads to an immediate emphasis on managing
vehicle control (e.g., lane-keeping) with relatively little attention
directed to the higher-order aspects of the task (Groeger & Clegg,
2007) such as anticipating what other drivers are likely to do.
This imbalanced emphasis has also been observed in studies
using eye tracking, which show that novice drivers tend to focus
on the lane lines while more experienced drivers use periph-
eral vision to maintain lane position (e.g., Mourant & Rockwell,
1970, 1972). Thus, although the driver learns how to control a
vehicle, he or she does not learn about the situational/contex-
tual factors that would aid prediction of a collision prior to it
occurring. Directing attention to the higher-order aspects of the
task earlier in training may lead to improved performance over
more traditional training methods where the development of the
higher-order coordination and integration skills is delayed (for a
discussion of such issues see chapter 7).
It must be recognized, however, that the use of automation
within the training process might not always improve learn-
ing as there might be notable negative consequences to having
an automated system operate a portion of the task during the
learning process. Learners might come to rely on the auto-
mation, failing to learn all the components of the task or to
understand the broader perspective required to coordinate
the different components (Clegg & Heggestad, 2010). Thus, the
inclusion of automation may make training less effective (e.g.,
Ballas, Heitmeyer, & Pérez, 1992). To date, there have been few
empirical studies exploring the role of automation as a training
aid. Hence, whether the inclusion of automation in a training
program will enhance or hinder trainee learning remains an
open question.
Even if there are situations for which automation can enhance
the training process, such approaches are not likely to repre-
sent one-size-fits-all interventions. Individuals with certain
characteristics (e.g., lower levels of cognitive ability) may
benefit more from the inclusion of automated systems within
programs for training complex tasks. That is, although higher
ability learners may be able to deal with the level of complexity
presented by the task, lower ability learners may become over-
whelmed. Incorporating automation may reduce the complexi-
ties of the task for lower ability learners, allowing them to focus
on learning certain elements of the task and keeping them mov-
ing forward in the learning process.
Given that automation is being incorporated into training
processes, theory and research on the joint and interactive
effects of learner characteristics and methods of training with
automation on learning, retention, and transfer of skills are much
needed. Ideally, such research would allow for the specification
of individualized training programs that would optimally pair a
method of training with automation with a learner based on his
or her unique set of cognitive abilities (and potentially other indi-
vidual differences). Initial empirical and theoretical work on this
broadly defined issue is presented in this chapter. The chapter
begins with a discussion of the nature of skill acquisition and
the role of cognitive abilities in skill acquisition processes. Next,
an overview of the nature of automation and discussion of how
it can serve as a training aide are provided. The findings from
three initial experiments designed to examine the roles of auto-
mation and individual differences on learning and performance
in training are then presented. Building on these results, some
initial thoughts on a framework for understanding automation as
a training aide are laid out. Finally, the chapter closes with a
brief summary statement and the identification of directions for
future work in this area.
Skill Acquisition, Training, and Cognitive Ability
The degree of difficulty in acquiring a skill (i.e., learning a new
task) is a direct function of the complexity of a task. Complex
tasks require considerable practice before an individual fully
learns the task and is able to perform it proficiently, whereas
simpler tasks require relatively little practice to achieve profi-
ciency. Regardless of the difficulty of the task, Anderson (1982)
claims that skill acquisition occurs in three distinct phases (see
also Fitts & Posner, 1967; Schneider & Shiffrin, 1977; Shiffrin &
Schneider, 1977; for other conceptions of skill learning see chap-
ter 1). Using Anderson’s (1982) terminology, these three phases
include the declarative, knowledge compilation, and procedural
phases (some related discussion of the nature of declarative and
procedural information can be found in chapters 1 and 2). In
the declarative phase, there are strong cognitive demands on the
learner as he or she seeks to develop a conceptual understand-
ing of the task and the required responses. As a consequence
of these demands, learning during this phase is slow and error
prone. Knowledge compilation is characterized by the further
development and refining of the stimulus-response pairings
required for successful performance. Finally, in the procedural
phase, having learned how to perform the task, performance
often requires little cognitive effort on the part of the learner.
In this phase also, learners can approach automaticity in task
performance with continued practice.

The Role of Cognitive Ability in Training


Research has clearly demonstrated that individual differ-
ences in cognitive abilities are a powerful predictor of train-
ing outcomes such as learning, retention, and transfer (Ferretti
& Butterfield, 1992; Robinson & Kingsley, 1977). Individuals
with higher levels of cognitive ability learn faster and achieve
higher levels of performance. After time spent away from a
task, individuals with higher levels of cognitive ability are able
to return to their previous levels of performance more quickly
than individuals with lower levels of cognitive ability. Those
with higher levels of cognitive ability are also better able to
transfer their knowledge to different tasks that share the same
fundamental elements. Moreover, these differences between
higher and lower ability learners tend to increase as task com-
plexity increases.
Although true, these statements are in some ways overly sim-
plistic as cognitive ability is not a unitary construct but rather
a collection of distinct (although related) ability dimensions.
There has been a long history in the cognitive ability literature of
defining and differentiating these distinct cognitive ability con-
structs. Perhaps the most widely accepted perspective is that
offered by Carroll (1993, 1997) who, after conducting a thorough
review of the literature and reanalysis of some 460 data sets,
proposed a three-stratum model of cognitive abilities (a simi-
lar representation was proposed by Gustafsson, 1984, 1989).
According to Carroll (1992), “the structure of abilities can best
be described in terms of a three-stratum model comprising a
single g factor at the apex, or third stratum, a series of broad
abilities at the second stratum, and a larger set of narrow abili-
ties at the lowest stratum” (p. 268).
Ackerman (1988) argued that successful performance in each
of the phases of skill acquisition is associated with a different
cognitive ability construct. Given that the initial declarative
phase of skill acquisition is characterized by strong cognitive
demands on the learner, Ackerman argued that general cogni-
tive ability is associated most strongly with performance in this
phase. He further argued that performance during the knowl-
edge compilation phase, which is associated with building the
speed of already learned stimulus-response pairings, should
be associated with perceptual speed abilities (i.e., the speed
at which an individual can cognitively process and respond to
information). Once the learner approaches a high level of skill in
the procedural phase, performance is limited by the individual’s
psychomotor abilities (i.e., the speed and accuracy of motoric
responses). Ackerman provided notable empirical evidence in
support of these expectations for learning a complex air traffic
control task.

Cognitive Ability and the Training Situation


The nature of the relationships between cognitive abilities and
performance within training also depends on the nature of the
task being trained and variations in training. This idea is com-
monly referred to as an aptitude-treatment interaction (ATI),
which Cronbach and Snow (1977) suggested is quite common in
the educational domain. Perhaps the most well known example
of an ATI is that between general cognitive ability and the struc-
ture of classroom environments (Snow, 1989) where it is expected
that the relationship between cognitive ability and classroom
performance will differ as a function of the level of structure pro-
vided in the classroom (high structure tends to involve teacher-
led activities designed to teach specific aspects of the lesson;
lower structure tends to involve self-paced experiential learning
on the part of the student). More specifically, the relationship
between cognitive ability and classroom performance should
be strong in classroom settings with lower levels of structure;
that is, higher ability students will perform well and lower abil-
ity students will struggle. In contrast, the presence of struc-
ture in the classroom should help facilitate learning among the
lower ability students, thereby reducing the overall relationship
between cognitive ability and classroom performance in these
environments. Such ATIs suggest that individual differences in
learner capabilities and variations in the learning environment
jointly impact the degree of learning and performance in train-
ing contexts.
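The ATI described above has a simple statistical signature: in a regression predicting training performance from cognitive ability and training condition, an ATI appears as a nonzero ability-by-condition interaction term. The sketch below uses simulated data purely for illustration; the condition labels, effect sizes, and noise level are assumptions, not values from the studies discussed here.

```python
# Hypothetical illustration of an aptitude-treatment interaction (ATI):
# the ability-performance relationship is assumed weaker under high
# classroom structure than under low structure.
import numpy as np

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(0.0, 1.0, n)            # standardized cognitive ability
structured = rng.integers(0, 2, n)           # 0 = low structure, 1 = high structure

# Assumed effect sizes (for illustration only): ability slope of 0.8 in the
# low-structure condition versus 0.2 in the high-structure condition.
slope = np.where(structured == 1, 0.2, 0.8)
performance = slope * ability + rng.normal(0.0, 0.5, n)

# Fit: performance ~ intercept + ability + condition + ability:condition
X = np.column_stack([np.ones(n), ability, structured, ability * structured])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)

# beta[3] estimates the interaction; a negative value indicates that the
# ability-performance relationship weakens when structure is high.
print(round(float(beta[3]), 2))
```

A negative estimate of the interaction coefficient here corresponds to the verbal prediction in the text: structure compresses the performance gap between higher and lower ability learners.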

Automation as a Training Aid

Changes with Automation


As Sheridan and Verplank (1978) pointed out, automation
should be regarded as something other than just a binary choice
(automation vs. no automation). A range of options (“levels of
automation”) exist regarding the allocation of control between
human beings and machines. Subsequent developments in
understanding levels of automation (Endsley & Kaber, 1999;
Kaber & Endsley, 2004; Parasuraman, Sheridan, & Wickens,
2000) offer important insights into the nature of automation,
and further illustrate the potential complexity of matching the
level of automation to the situation (e.g., Ruff, Calhoun, Draper,
Fontejon, & Guilfoos, 2004). Crucially, the effectiveness of the
different levels of automation is likely to depend on the specific
cognitive processes implicated (Parasuraman et al., 2000). For
example, Clamann, Wright, and Kaber (2002) suggested that
problems adapting to automation seem more pronounced for
cognitive tasks (information analysis and decision aids) than for
lower level component tasks (information acquisition and action
implementation). These and other critical advances in under-
standing automation, particularly in terms of system design
from a user-centered perspective (Endsley, 1996; Sheridan,
1995), might result in automation that makes tasks easier.
Reducing the complexity of a particular task by including an
automated component might have distinct benefits. For instance,
a single operator might be able to control more systems and be
more productive, individuals might learn the skills required for
performing the task more rapidly, or the automation may enable
individuals without highly specialized training to accomplish
complex tasks. It needs to be recognized, however, that add-
ing automation does not always decrease the complexity of a
task and, in fact, automation may increase mental workload as
operators are given more distinct tasks to perform (see Kirlik,
1993; Wiener, 1988). As Sheridan (1997, 2002) has argued, far
from automation replacing human beings, the introduction of
technology has changed the person’s role, placing the individual
in supervisory control rather than direct control (see also Para-
suraman & Wickens, 2008).
Consider the case of unmanned vehicle control in military
contexts where there is a push to incorporate further automa-
tion of basic tasks (Finn & Scheding, 2010; U.S. Roadmap,
2007). With this increasing role of automation there have been
numerous calls to “invert” the ratio of operators to unmanned
vehicles. Whereas it has been common to have one operator, or
a team of operators, control one unmanned vehicle, the hope is
to have a single operator control as many as 12 vehicles simul-
taneously (Cummings & Guerlain, 2007; Dixon, Wickens, &
Chang, 2003; Ruff, Narayanan, & Draper, 2002). The number of
vehicles controlled is limited by the extent to which each vehi-
cle operates autonomously, and in turn then by how well the
human supervisor can perceive or remember what each entity
is doing, and meaningfully predict and plan the vehicles’ future
actions. Therefore, the interaction between human information
processing limitations and the workings of any instantiation of
automation becomes critical. In these situations, the demands
on the operator change from executing the task, to attending to,
switching between, and coordinating multiple automated tasks.
This perspective, which echoes Norman’s (1991) distinction
between a systems view and a personal view, hints at an impor-
tant issue for training. Even though from a systems view the
combination of human and automation might appear more pow-
erful, from a personal view the use of automation fundamentally
changes the nature of the task and imposes different demands
on the operator (e.g., Dzindolet, Beck, Pierce, & Dawe, 2001;
Funke, Matthews, Warm, & Emo, 2007; Parasuraman & Riley,
1997). Following this line of reasoning, training with automa-
tion ought not to be characterized as merely an enhanced ver-
sion of training without automation.

Automation within Training


Automation within training may be a double-edged sword. On
the one hand, automation can reduce the demands on the learn-
ers and keep them from being overwhelmed in complex learning
environments, allowing them to engage in more effective learn-
ing of critical task components (e.g., Marmie & Healy, 1995). On
the other hand, automation may reduce the need for the learner
to engage actively in learning or may preclude the development
of a comprehensive understanding of the task. To expand on
this point, researchers (Bainbridge, 1983; Endsley & Kiris, 1995;
Moray, 1986) have argued that automation can impair acquisi-
tion and maintenance of operators’ skill, and the development
of accurate mental models of the controlled system. One of the
main consequences of reduced direct contact with the system is
what has been termed the “out-of-the-loop performance prob-
lem” (Endsley & Kiris, 1995), an effect they suggest occurs in
a range of settings (e.g., Billings, 1991; Moray, 1986; Wiener &
Curry, 1980). Operators in overly automated environments can
have diminished ability to detect system failures or changes,
and a reluctance to take over manual control, in addition to
significantly reduced situation awareness along a spectrum of
automation levels (Endsley & Kiris, 1995; see also chapter 4).
Despite the concerns raised in the previous paragraph, there
remains a pervasive belief that automation can be an effective
training aid. Yet, the issue is clearly complex. Con-
sideration must be given to the distinct task demands and the
type and nature of the automation to be used (Parasuraman et
al., 2000). Further, consideration will need to be given to the
characteristics of the learner (i.e., cognitive abilities), as the
effectiveness of the inclusion of automation is likely to depend
on these learner characteristics. Moreover, given the dynamic
nature of skill acquisition the optimal implementation of auto-
mation at one point in the skill acquisition process will likely
be different from other points in the training process. All of this
reasoning implies that there will be a limited number of ways of
implementing automation within training that will be effective
at any one point in the training process. Consequently, there
are also likely to be many ways to implement automation within
training that will not be effective at improving learning or sub-
sequent performance.

Initial Investigations of Aptitude, Automation,
and Training
This chapter reports a new program of research to begin to
understand the independent and joint influences of cognitive
abilities of learners and methods of including automation in
training on training outcomes. At this point in time the research
does not address the complexities discussed in the previous
sections. Rather, this initial work has sought simply to identify
basic influences of cognitive ability and automation on learning
and performance within the context of training.

Experiment 1: Seeking Evidence of an Aptitude-
Automation Interaction
The goal in the initial experiment (Clegg, Heggestad, & Blalock,
2010) was to develop an understanding of how the presence of
automation influenced training performance and transfer to
variants of the same task, and whether automation differen-
tially impacted learners as a function of their level of cognitive
ability. The training context for this experiment was a simu-
lated process control task, “Pasteuriser,” derived from an earlier
microworld simulation of orange juice pasteurization developed
by Moray and colleagues (Lee & Moray, 1992; Muir & Moray,
1996). Complexity in the learner’s task arises from the interac-
tion of the three subsystems, the presence of competing goals,
and dynamics that incorporate time lags (for more details on
the simulation see Lee, 1992). Thus, this type of task captures
a number of elements present within a variety of high perfor-
mance skill training for complex dynamic systems.
Adapted from the levels of automation instantiated in this task
by Moray, Rodriguez, and Clegg (2000; see also Liu, Wasson, &
Vincenzi, 2009; Ruff et al., 2002), automation was operational-
ized in two ways. In a user-initiated automation (UIA) condition,
the automation offered the potential to assume control
over the most misaligned subsystem when the plant was out of
equilibrium. Short-term assistance on that subsystem was pro-
vided only when the learner actively engaged the automation.
In a second condition, referred to as the automatically initiated
automation (AIA) condition, the automation of various subsys-
tems was engaged automatically by the system unless it was
actively vetoed by the learner. Thus, in the UIA group automa-
tion was enacted only when the learner actively decided to use it,
whereas in the AIA group automation was enacted by the system
unless the learner actively decided not to use it. A control condi-
tion, the manual control (MC) condition, was also included in
which learners were not provided with any automation.
Learners in the MC condition produced less good juice and
more spoiled juice early in training than did learners in either of
the automation conditions. Although performance did not differ
by training condition later in training for good juice produc-
tion, learners in the MC condition produced less spoiled juice
(i.e., showed better performance) at the end of training than did
learners in the two automation conditions. These findings show
a cost from automation later in training. In part, these results
may reflect the suboptimal, simple heuristic-based automation
supplied within Pasteuriser. However, learners were not com-
pelled to employ the automation. Thus, the decision to continue
to use automation despite having had sufficient practice to be
able to outperform the automation is enlightening, highlight-
ing costs associated with having automation present during
training.
Additional evidence for the costs of training with auto-
mation was observed when the data from a final, nonautomated
transfer trial were examined. For this trial, in statistical terms,
the MC group produced significantly more good juice and less
spoiled juice than the AIA group. Performance of the UIA group
did not differ from either of the other groups. Taken together,
these results suggest that support from automation does not
enhance learning of the underlying system, and that
learners trained with the highest levels of automation may not
be acquiring the equivalent underlying knowledge of the system.
The extent to which the presence of automation changed the
relationship between general cognitive ability and training per-
formance was also investigated. Early in training the relation-
ship between g (general intelligence) and good juice production
was less strong in the two automation groups than in the MC
group. A similar pattern of results was found when the amount
of spoiled juice in the initial trial was used as the dependent
variable. These findings are consistent with prior research on
skill acquisition in training (e.g., Kanfer & Ackerman, 1989) and
suggest that including automation early in training programs
can be particularly advantageous to lower ability learners in
that it lowers the cognitive demands of the task. By the end of
training, the main effect for g was weaker in the prediction of
good juice production than it was earlier in training (though the
relationship remained statistically significant). Also consistent
with expectations, differences in the relationship between g and
performance between the conditions were less pronounced later
in training than they had been earlier in training. Specifically,
the strength of the relationship between g and performance was
not statistically different between the MC and AIA conditions.
Although the relationship between g and performance remained
stronger in the MC condition than it was in the UIA group later
in training, the magnitude of the difference was smaller at
this later point in training than it had been in the initial train-
ing block. Again, the differences in performance were more pro-
nounced among lower g learners.
Experiment 2: Variations in Automation
across Training
A second experiment (Clegg & Heggestad, 2010) began to explore
the ways in which the negative consequences of automation in
training might be removed. The first experiment suggested that
the presence of automation early in training was beneficial,
likely through the automation guiding performance when nov-
ices had no knowledge to draw upon, and operators needing to
devote cognitive resources to understanding and learning the
task. With the greatest impact of automation found in the initial
stages of acquisition, perhaps it is only necessary to provide
automation early in training, when the cognitive demands of the
task are the highest. Taking a scaffolding approach (see chapter
4), automation could be slowly removed as the task becomes less
cognitively demanding. Reducing the involvement of automation
over time would ensure the cognitive demands of the task remain
more consistent over the course of learning. To operationalize
this situation, learners in the decreasing automation (DA) condi-
tion started out by simply observing the system (i.e., automation
controlled all three subsystems of the task). Over the next set of
trials learners were given responsibility for operating only one
of the three subsystems, although this responsibility involved a
different subsystem on each trial. As training progressed, they
were then given responsibility for two of the subsystems. Finally,
by the end of training they were controlling all three subsystems
and automation was not available (see chapter 2 for discussion
of the related topic of part-task training).
It was also possible that the learners trained with automa-
tion in the initial experiment performed less well at the end of
training and when automation was removed because they failed
to learn one of the integrated aspects of the task (allowing auto-
mation to consistently do the work of one or a pair of specific
subsystems). To ensure learning of all of the subsystems of the
task while at the same time reducing the complexity of the task,
a condition was instituted in which automation controlled two of
the subsystems on each trial. This situation allowed the learner
to focus on only a single subsystem at a time. The subsystems
controlled by automation were balanced so that learners had to
operate all of the subsystems the same number of times across
the training period. This condition was referred to as the ran-
dom automation (RA) condition. A control condition with no
automation (MC) was also included.
As expected, participants in the MC condition showed
improved performance over the course of training. More modest
performance gains occurred in the RA condition. The results
again suggest potential costs associated with training with
automation, although unlike the previous experiment these
individuals were compelled to allow the nonoptimal automa-
tion to control subsystems, which certainly hampered their
performance.
Initially, those in the DA condition benefited from the ability of
automation to exceed novice performance. But when some level
of control was handed over to learners as they progressed through
training (though they continued to receive substantial support from
the automated subsystems), they produced less good juice than
learners in the MC condition. By the
end of training those in the MC condition were producing more
good juice than those who had support from automation earlier
in their training. These results again suggested that training
with automation is detrimental to performance later in training.
As in Experiment 1, the relationships between cognitive abil-
ity and training performance across the conditions at both the
beginning and the end of training were examined. With regard
to the amount of good juice produced in the first training block,
the results were consistent with the findings in the first experi-
ment. Specifically, g was related to level of performance in each
of the three conditions, but the relationship was stronger in the
MC condition than in either of the two automation conditions. By
the last training block, although g remained predictive of perfor-
mance, there were no differences between the conditions in the
relationship. Thus, the provision of automation early in training
reduced the cognitive demands of the learning task, resulting in
superior performance primarily among lower g learners.
In sum, the second experiment suggests that neither gradu-
ally removing automation nor using automation to impose the
need for an operator to learn the functioning of specific subsys-
tems was effective at reducing the negative impacts of training
with automation on learning. Hence simple solutions to reduc-
ing reliance on automation were shown to be ineffective. As in
Experiment 1, automation reduced the g demands of the task
early in training, but the reduction in task complexity appears
to result in less learning over time.

Experiment 3: Automation and Training
in Vehicle Control
In a third experiment (Blitch & Clegg, 2011) a military-task sim-
ulation was employed which extended the work from the previ-
ous continuous, closed-loop, slow dynamics of process control
to the domain of simulation featuring continuous, closed-loop,
fast dynamics (Moray & Inagaki, 1999). This research used the
Predator unmanned aerial vehicle (UAV) synthetic task envi-
ronment (STE), developed at the Air Force Research Labora-
tory’s Warfighter Training Research Division (Martin, Lyon, &
Schreiber, 1998). Advantages of the use of this platform were
the presence of structured training (in contrast to the trial-and-
error training in the Pasteuriser task) and a different instan-
tiation of automation within this platform (although of broadly
the same style, with control over subsystems of the integrated
task).
Training within the STE environment occurs in a series of
modules, with each accompanied by a multimedia tutorial fol-
lowed by hands-on practice. Following each maneuver learners
are supplied with feedback on their deviations from the optimal
flight path. In this experiment all participants were trained first
on maintaining heading and altitude while reducing airspeed.
The next training maneuver required a heading change. Only
this maneuver featured the automation manipulation, with
either the autopilot functions controlling pitch and throttle, or
learners required to manually maintain altitude and airspeed.
Next, all participants were trained on a descent maneuver, fea-
turing simultaneous airspeed and altitude reduction but no
change of heading.
A final test featured a combination of all of the trained ele-
ments within a landing task that required the learners to follow
a descending flight path around a series of markers and onto
an airfield runway. Consistent with the findings in the earlier
experiments, the use of automation in training ultimately led
to poorer acquisition of underlying knowledge of how to con-
trol the UAV (see Blitch & Clegg, 2011). Learners trained with
automation (automated control of pitch and throttle) during the
second training maneuver showed higher levels of error within
their glide slope during the landing task. Note that these learn-
ers, who had been able to focus during this earlier maneuver
on controlling their heading, showed no better performance in
maintaining heading within the landing task.

Conclusions from These Initial Findings


Across this series of experiments, data showed the presence
of automation in training can result in higher levels of per-
formance early in training, but can also result in lower levels
of performance later in training and when the automation is
removed. In other words, although automation can assist novice
operators early in training, it apparently often does so at a cost
to the degree of learning that occurs. Furthermore, the presence
of aptitude-automation interactions, shown for the first time in
these experiments, suggests that the effects of automation on
training are greater for lower aptitude learners. While supplying
greater support to such learners, the presence of automation
may be masking differences between individuals and simultane-
ously impairing their ability to acquire fundamental knowledge
about the operation of the system. These are clearly matters of
potential practical importance.

A Proposed Framework of Cognitive Abilities and Automation in Training
Although the preliminary findings indicating interactions
between automation and aptitude on training performance are
novel and interesting, applications of automation to the train-
ing context are far more complex than portrayed in these initial
experiments. To capture that complexity better, a high-level con-
ceptual framework is offered to begin to consider how automa-
tion, human information processing, and individual differences
jointly influence learner performance in the phases of training
(see Figure 6.1). The framework is referred to as automation,
processing, training, and individual differences (APTID).
The starting point for this framework is the model of automa-
tion from Parasuraman et al. (2000). These authors suggest that
classes of automation could be broadly linked with four stages of
human information processing. These classes of automation are
shown on the left side of Figure 6.1. Within each class, automa-
tion can vary from low to high, in accordance with the concepts
of levels of automation mentioned above. Hence, Parasuraman and colleagues' model provides the specificity about automation needed to support investigation across a range of domains. However, the simplified information processing stages originally suggested by Parasuraman and colleagues have been adapted here to capture the types of processes included within Kieras and Meyer's (1997) EPIC architecture.
The reformulation of the information processing elements
offers several advantages, including a basis for future specifica-
tion of a minimal set of processes required, and what may be a
clearer path to ultimately mapping cognitive abilities from the
individual differences literature onto the information processing
framework. The simplification of the elements involved within
the human information processing for this initial formulation is
aimed at capturing classes of processing elements, rather than
126 Eric D. Heggestad et al.
Figure 6.1 Proposed automation, processing, training, and individual differences (APTID) framework. [The figure links four classes of automation, each varying from low to high, to classes of human information processors and their associated abilities: Information Acquisition → Input Processors → Perceptual Speed; Information Analysis → Cognitive Processors → Working Memory; Decision Automation → Memory Processors → Spatial and Reasoning; Action Automation → Motor Processors → Psychomotor.]

individual items within EPIC (e.g., the separate visual and verbal processors of EPIC are presented here as a single category of “Input Processors”).
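As an illustration, the mappings shown in Figure 6.1 can be encoded as a simple lookup structure. This sketch is purely illustrative: the dictionary, the function name, and the ability groupings are assumptions based on the figure, not part of the APTID framework itself.

```python
# Illustrative encoding of the Figure 6.1 mappings: each class of
# automation (which can vary from low to high) is linked to a class of
# information processors and to the abilities associated with it.
# The structure and names are assumptions for illustration only.
APTID_MAP = {
    "information acquisition": ("input processors", ["perceptual speed"]),
    "information analysis": ("cognitive processors", ["working memory"]),
    "decision automation": ("memory processors", ["spatial", "reasoning"]),
    "action automation": ("motor processors", ["psychomotor"]),
}

def abilities_for(automation_class):
    """Return the abilities linked to a given class of automation."""
    _processors, abilities = APTID_MAP[automation_class]
    return abilities
```

For example, `abilities_for("action automation")` returns `["psychomotor"]`, mirroring the bottom row of the figure.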
some correspondence to the original formulation by Parasura-
man et al. (2000), but also a route to add complexity by allow-
ing future decomposition of aggregated processing stages to be
related back to those found in the EPIC architecture, or another
similar approach. Indeed the potential for future integration
into this framework of processing components from alternative
modeling approaches (e.g., chapters 9 and 10) certainly exists.
For now, a benefit to basing things within the general realm of
the EPIC model is that this model has already been applied to
the examination of the impact of automation on performance
(see Kieras & Meyer, 1997), and is well suited to the type of mul-
tiple task performance that occurs within complex situations
(Meyer & Kieras, 1997).
Although EPIC itself does not currently account for training
effects (for several examples of how training can be instantiated
within production system models see Klahr, Langley, & Neches,
1987), the basic elements offer sufficient insight to begin con-
sidering the types of mechanisms that would feature in the per-
formance changes that accompany increasing task proficiency.
The most central aspect is that with increasing exposure to a
task during training, new items will be stored within the “Mem-
ory Processors” components of the framework. EPIC contains
two forms of long-term memory processors, with individuals
able to create and store additional productions (sets of condi-
tion–action rules, essentially new ways of doing things) and
declarative memories (new facts and information). However, in
addition to refining the items and operations within stages, it
is plausible that training could also change the relationships
between stages; for example, one or more of the human infor-
mation processing stages employed by a novice in performing
the task might become bypassed following practice (see Pashler
& Baylis, 1991). Central to this notion is that training can have
different effects depending on the nature of the task, and the
form of training being utilized—related to the idea explored in
greater detail in chapter 8. Hence learning does not appear as a
single box or arrow within the present framework, but rather is
considered in terms of various changes within and between the
human information processing stages.
The APTID framework provides a basis to begin to consider
the implications of automation across different phases of skill
acquisition. As outlined above, one suggestion is that there are
three distinct phases: declarative, knowledge compilation, and
procedural phases, with each characterized by different foci on
the part of the learner. This perspective implies that the opti-
mal way of implementing automation may change as learners
move from one phase to the next. In first learning to drive a
car, for example, the learner in the declarative knowledge phase
might benefit most from an automated system that controls the
majority of subsystems, while the learner focuses in turn on
acquiring initial information about different limited sets of other
subsystems. The necessary declarative information could thus
be obtained relatively rapidly without overwhelming the learner.
As the learner then moves to the knowledge compilation phase,
however, it may be best to change the type and level of auto-
mation present during training exercises. Here the learner may
be focused on the creation of productions. Productions might be seen in simple terms as if–then pairings (if a stop sign is seen, then apply the brakes). Thus, for this type of learning
to progress efficiently, training might employ automation that facilitates or reinforces stimulus–response pairings.
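The if–then pairing described above is the defining feature of a production system and can be sketched in a few lines. The rules and the encoding of the situation below are invented purely for illustration:

```python
# Minimal production-system sketch: each production pairs a condition
# (the "if" test on the current situation) with an action (the "then").
# The rules and the situation encoding are invented for illustration.
PRODUCTIONS = [
    (lambda s: s.get("stop_sign_seen", False), "apply brakes"),
    (lambda s: s.get("light") == "green", "proceed"),
]

def fire(situation):
    """Return the action of the first production whose condition matches."""
    for condition, action in PRODUCTIONS:
        if condition(situation):
            return action
    return None  # no production matched the situation

print(fire({"stop_sign_seen": True}))  # -> apply brakes
```

With practice, a learner is assumed to accumulate more such condition–action rules, so that familiar situations trigger the appropriate action directly.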
As discussed earlier in this chapter, the phases of skill acquisition are associated with different cognitive abilities (see Ackerman, 1988). Within the context of automation as a training aid,
this perspective implies that the presence of automation will be
more or less effective for different people at different stages of
skill acquisition. Consistent with our research and findings that
general cognitive ability is a key determinant of the speed of
learning in the declarative phase of skill acquisition (Ackerman,
1988), automation implemented to facilitate learning in early
phases of training should be more beneficial to learners with
lower levels of g than it would be for those with higher levels
of g. However, automation implemented in a different phase of
skill acquisition would likely be more beneficial to a different
set of individuals as the specific ability determinants of learning would have also changed. As such, the learners who most
benefit from automation early in training are not likely to be
the same learners who most benefit from automation later in
training.
Taken together, this line of reasoning suggests that optimization will occur when a specific level of automation is chosen for a
particular phase of learning based on the individual charac-
teristics of the learner and the information processing require-
ments of the task at that phase of learning. The core idea is that
the rate of progress within each stage of skill acquisition, and
on to a new stage from the previous one, will vary not just as
a function of the type of (sub)task being learned but also with
the abilities of the learner. Creating the correct variations in the level of automation within the appropriate classes of automation will be the challenge for those designing both automation and training programs.
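One way to make this optimization concrete is a toy scheduling rule that selects a level of automation from the learner's current phase and a relevant ability score. The phase names follow the chapter, but the 0–1 ability scale and the thresholds are illustrative assumptions, not part of the APTID framework:

```python
def automation_level(phase, ability_score):
    """Pick a level of automation support for a training phase.

    A toy decision rule reflecting the aptitude-treatment reasoning in
    the text: lower-ability learners receive more automation support in
    the early (declarative) phase, and support is faded as performance
    becomes procedural. The thresholds and the 0-1 ability scale are
    illustrative assumptions.
    """
    if phase == "declarative":
        return "high" if ability_score < 0.5 else "medium"
    if phase == "knowledge compilation":
        return "medium" if ability_score < 0.5 else "low"
    return "low"  # procedural phase: minimal support for all learners
```

Under this rule, a lower-ability learner in the declarative phase would receive high automation support, while the same learner in the procedural phase would receive little.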

Future Directions, Applications, and Prospects


The experiments described in this chapter offer some important
initial insights into the impact on learning of using automation
as a training aid, including several cautionary tales for anyone
using untested approaches to training simply because automa-
tion can be attached to extant training environments. Automa-
tion clearly has a crucial role to play in enhancing performance,
productivity, and safety in many situations, but although there
may be functions that are better assigned to a machine (Fitts,
1951), reducing human involvement to passive observation or
overtaxing the human operator can both come at a cost (see
Hancock & Scallen, 1998). It is critical, however, to note that
even though the results of the initial research efforts suggest
that issues can arise when automation is present during train-
ing, it is far too early to advocate against using automation in
all training situations. The immediate need is to develop a better
understanding of the impact that automation has on learning,
retention, and transfer. The APTID framework laid out in this
chapter offers an initial perspective on how to begin to integrate
these various elements.
Although the initial empirical results show at best no ill effects of automation, and at worst detriments to learning, it is expected that there are certain combinations of tasks and individuals for whom the inclusion of automation can enhance the
training process. As the APTID framework suggests, identifying
benefits will ultimately depend on the combination of the type
and level of automation within the context of the task, the abili-
ties of the learner, and the changing influence of those abilities
on performance as training proceeds.
One of the major barriers to answering key questions regard-
ing the impact of automation in the training of complex skills is
the scale of work required. Complex skill acquisition is inher-
ently slow to occur, and will, for many tasks, require commit-
ment and motivation from learners. At the same time, studying
individual differences necessitates the use of large sample sizes
comprised of individuals of varying levels of abilities. Yet, it is
fully expected that only by engaging in such time-consuming, demanding, and resource-intensive research will a thorough
understanding of how best to instantiate automation in training
processes be developed.

References
Ackerman, P. L. (1988). Determinants of individual differences dur-
ing skill acquisition: Cognitive abilities and information processing.
Journal of Experimental Psychology: General, 117, 288–318.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological
Review, 89, 369–406.
Bainbridge, L. (1983). Ironies of automation. Automatica, 19, 755–779.
Ballas, J. A., Heitmeyer, C. L., & Pérez, M. A. (1992). Evaluating two
aspects of direct manipulation in advanced cockpits. Proceedings
of ACM CHI 92 Conference on Human Factors in Computing Systems
(pp. 127–134).
Billings, C. E. (1991). Human-centered aircraft automation: A concept
and guidelines (NASA Technical Memorandum 103885). Moffet
Field, CA: NASA Ames Research Center.
Blitch, J. G., & Clegg, B. A. (2011). The influences of automation and
trainee aptitude on training effectiveness. Proceedings of 55th
Annual Meeting of the Human Factors and Ergonomics Society.
Carroll, J. B. (1992). Cognitive abilities: The state of the art. Psychologi-
cal Science, 3, 266–270.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-ana-
lytic studies. New York: Cambridge University Press.
Carroll, J. B. (1997). The three-stratum theory of cognitive abilities.
In Contemporary intellectual assessment: Theories, tests, and issues
(pp. 122–130). New York: Guilford.
Clamann, M. P., Wright, M. C., & Kaber, D. B. (2002). Comparison of
performance effects of adaptive automation applied to various stages
of human-machine system information processing. Proceedings of
the Human Factors and Ergonomics Society 46th Annual Meeting (pp. 342–346). Santa Monica, CA: Human Factors and Ergonomics
Society.
Clegg, B. A., & Heggestad, E. D. (2010). Automation and effective train-
ing (Technical Report for U.S. Army Research Office MURI Grant
W911NF-05-1-0153). Boulder, CO: Colorado State University.
Clegg, B. A., Heggestad, E. D., & Blalock, L. D. (2010). The influences of
automation and trainee aptitude on training effectiveness. Proceed-
ings of 54th Annual Meeting of the Human Factors and Ergonomics
Society.
Cronbach, L., & Snow, R. (1977). Aptitudes and instructional methods:
A handbook for research on interactions. Oxford, England: Irvington.
Cummings, M. L., & Guerlain, S. (2007). Developing operator capacity
estimates for supervisory control of autonomous vehicles. Human
Factors, 49, 1–15.
Dixon, S. R., Wickens, C. D., & Chang, D. (2003). Comparing quantitative model predictions to experimental data in multiple-UAV flight
control. Proceedings of the 47th Annual Meeting of the Human Fac-
tors and Ergonomics Society.
Dzindolet, M. T., Beck, H. P., Pierce, L. G., & Dawe, L. A. (2001). A
framework of automation use (Report Number ARL-CR-2412). Aber-
deen Proving Ground, MD: Army Research Laboratory.
Endsley, M. R. (1996). Automation and situation awareness. In R. Para-
suraman & M. Mouloua (Eds.), Automation and human performance:
Theory and applications (pp. 163–181). Mahwah, NJ: Erlbaum.
Endsley, M. R., & Kaber, D. B. (1999). Level of automation effects on
performance, situational awareness, and workload in a dynamic
control task. Ergonomics, 42, 462–492.
Endsley, M. R., & Kiris, E. O. (1995). The out-of-the-loop performance
problem and level of control in automation. Human Factors, 37,
381–394.
Ferretti, R. P., & Butterfield, E. C. (1992). Intelligence-related differences in the learning, maintenance, and transfer of problem-solving
strategies. Intelligence, 16, 207–223.
Finn, A., & Scheding, S. (2010). Developments and challenges for auton-
omous unmanned vehicles. Berlin, Germany: Springer-Verlag.
Fitts, P. M. (1951). Human engineering for an effective air-navigation
and traffic-control system. Columbus, OH: Ohio State University
Research Foundation.
Fitts, P. M., & Posner, M. I. (1967). Human performance. Belmont, CA:
Brooks-Cole.
Funke, G., Matthews, G., Warm, J. S., & Emo, A. K. (2007). Vehicle
automation: A remedy for driver stress? Ergonomics, 50, 1302–1323.
Groeger, J. A., & Clegg, B. A. (2007). Systematic changes in the rate of
instruction during driver training. Applied Cognitive Psychology, 21,
1229–1244.
Gustafsson, J. E. (1984). A unifying model for the structure of intel-
lectual abilities. Intelligence, 8, 179–203.
Gustafsson, J. E. (1989). Broad and narrow abilities in research on
learning and instruction. In R. Kanfer, P. L. Ackerman, & R. Cudeck
(Eds.), Abilities, motivation, and methodology: The Minnesota Sym-
posium on Learning and Individual Differences (pp. 203–237). Hills-
dale, NJ: Erlbaum.
Hancock, P. A., & Scallen, S. F. (1998). Allocating functions in human-
machine systems. In R. R. Hoffman, M. F. Sherrick, & J. S. Warm
(Eds.), Viewing psychology as a whole: The integrative science of Wil-
liam M. Dember (pp. 509–539). Washington, DC: APA Press.
Kaber, D. B., & Endsley, M. R. (2004). The effects of level of automa-
tion and adaptive automation on human performance, situational
awareness, and workload in a dynamic control task. Theoretical
Issues in Ergonomics Science, 5, 113–153.
Kanfer, R., & Ackerman, P. L. (1989). Motivation and cognitive abili-
ties: An integrative/aptitude-treatment interaction approach to skill
acquisition. Journal of Applied Psychology, 74, 657–690.
Kieras, D., & Meyer, D. E. (1997). An overview of the EPIC architecture
for cognition and performance with application to human-computer
interaction. Human-Computer Interaction, 12, 391–438.
Kirlik, A. (1993). Modeling strategic behavior in human-automation
interaction: Why an “aid” can (and should) go unused. Human Fac-
tors, 35, 221–242.
Klahr, D., Langley, P., & Neches, R. (1987). Production system models of
learning and development. Cambridge, MA: MIT Press.
Lee, J. D. (1992). Trust, self confidence and operators’ adaptation to
automation (Unpublished doctoral dissertation). University of Illinois
at Urbana-Champaign.
Lee, J. D., & Moray, N. (1992). Trust, control strategies and allocation of
function in human-machine systems. Ergonomics, 35, 1243–1270.
Liu, D., Wasson, R., & Vincenzi, D. A. (2009). Effects of system automation management strategies and multi-mission operator-to-vehicle
ratio on operator performance in UAV systems. Journal of Intelligent
and Robotic Systems, 54, 795–810.
Marmie, W. R., & Healy, A. F. (1995). The long-term retention of a
complex skill. In A. F. Healy & L. E. Bourne, Jr. (Eds.), Learning
and memory of knowledge and skills: Durability and specificity (pp.
30–65). Thousand Oaks, CA: Sage.
Martin, E., Lyon, D. R., & Schreiber, B. T. (1998). Designing synthetic
tasks for human factors research: An application to uninhabited air
vehicles. Proceedings of the Human Factors and Ergonomics Society.
Meyer, D. E., & Kieras, D. E. (1997). A computational theory of execu-
tive control processes and human multiple-task performance: Part
1. Basic Mechanisms. Psychological Review, 104, 3–65.
Moray, N. (1986). Monitoring behaviour and supervisory control. In K.
R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception
and human performance (pp. 1–51). New York: Wiley.
Moray, N., & Inagaki, T. (1999). Laboratory studies of trust between
humans and machines in automated systems [Special issue on
Man–Machine Systems]. Transactions of the Institute for Measure-
ment and Control, 21, 203–211.
Moray, N., Rodriguez, D., & Clegg, B. A. (2000). Levels of automation
in process control. Proceedings of 44th Annual Meeting of the Human
Factors and Ergonomics Society.
Mourant, R. R., & Rockwell, T. H. (1970). Mapping eye-movement pat-
terns to the visual scene driving: An exploratory study. Human Fac-
tors, 12, 81–87.
Mourant, R. R., & Rockwell, T. H. (1972). Strategies of visual search by
novice and experienced drivers. Human Factors, 14, 325–335.
Muir, B. M., & Moray, N. (1996). Trust in automation. Part II. Experi-
mental studies of trust and human intervention in a process control
simulation. Ergonomics, 39, 429–461.
Norman, D. A. (1991). Cognitive artifacts. In J. M. Carroll (Ed.), Design-
ing interaction: Psychology at the human-computer interface (pp.
17–38). New York: Cambridge University Press.
Parasuraman, R., & Riley, V. (1997). Humans and automation: Use,
misuse, disuse, abuse. Human Factors, 39, 230–253.
Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model of
types and levels of human interaction with automation. IEEE Trans-
actions on Systems, Man, and Cybernetics. Part A: Systems and
Humans, 30, 286–297.
Parasuraman, R., & Wickens, C. D. (2008). Humans: Still vital after all
these years of automation. Human Factors, 50, 511–520.
Pashler, H., & Baylis, G. (1991). Procedural learning: Locus of practice
effects in speeded choice tasks. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 17, 20–32.
Robinson, J. A., & Kingsley, M. E. (1977). Memory and intelligence: Age
and ability differences in strategies and organization of recall. Intel-
ligence, 1, 318–330.
Ruff, H. A., Calhoun, G. L., Draper, M. H., Fontejon, J. V., & Guilfoos,
B. J. (2004). Exploring automation issues in supervisory control of
multiple UAVs. In D. A. Vincenzi, M. Mouloua, & P. A. Hancock (Eds.),
Human performance, situation awareness, and automation: Current
research and trends (Vol. 2, pp. 218–222). Mahwah, NJ: Erlbaum.
Ruff, H. A., Narayanan, S., & Draper, M. H. (2002). Human interaction
with levels of automation and decision-aid fidelity in the supervisory
control of multiple simulated unmanned air vehicles. Presence, 11,
335–351.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic
human information processing: I. Detection, search and attention.
Psychological Review, 84, 1–66.
Sheridan, T. B. (1995). Human centered automation: Oxymoron or
common sense? IEEE, 823–828.
Sheridan, T. B. (1997). Supervisory control. In G. Salvendy (Ed.), Handbook of human factors (2nd ed., pp. 1275–1327). New York: Wiley.
Sheridan, T. B. (2002). Humans and automation. New York: Wiley.
Sheridan, T. B., & Verplank, W. L. (1978). Human and computer control
of undersea teleoperators (Technical report). Cambridge, MA: Man-
Machine Systems Laboratory, Department of Mechanical Engineer-
ing, Massachusetts Institute of Technology.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic
human information processing: II. Perceptual learning, automatic
attending and a general theory. Psychological Review, 84, 127–190.
Shinar, D., Meir, M., & Ben-Shoham, I. (1998). How automatic is man-
ual gear shifting? Human Factors, 40, 647–654.
Snow, R. (1989). Cognitive-conative aptitude interactions in learning.
In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota Symposium on Learning and
Individual Differences (pp. 435–474). Hillsdale, NJ: Erlbaum.
U.S. Roadmap. (2007). Unmanned systems roadmap 2007–2032. U.S.
Department of Defense. Retrieved from http://purl.access.gpo.gov/
GPO/LPS91893
Wiener, E. L. (1988). Cockpit automation. In E. L. Wiener & D. C. Nagel
(Eds.), Human factors in aviation (pp. 433–461). San Diego, CA:
Academic.
Wiener, E. L., & Curry, R. E. (1980). Flight-deck automation: Promises
and problems. Ergonomics, 23, 995–1011.
7 Conducting Technology-Based Applied Training Research
Stephen L. Goldberg and Paula J. Durlach
Orlando Research Unit, U.S. Army Research
Institute for the Behavioral and Social Sciences

Military training has been going on since armies were first organized. The goal of military training is to provide soldiers
and units with the skills and knowledge necessary to maintain
effective and ready military forces. This goal is accomplished by
individual training of soldiers on the skills and knowledge their
military occupational specialty requires, and collective training
of units to perform their assigned missions with coordination
and synchronization.
Military training is a large and complex enterprise that must
overcome many challenges, such as limited time and resources
for training, the size and diversity of the trainee population, and
continually changing training content. In a short time, civilian
recruits must be taught the skills and knowledge necessary for
them to contribute to their unit meeting its mission require-
ments. Almost everyone enters the military at the entry level.
Unlike in the business world, an experienced platoon sergeant cannot be hired from another army. Because most soldiers spend
only 3 or 4 years in the military, there is a continual need to
train new recruits. Collective training trains soldiers to perform
their jobs as members of a hierarchy of units or teams.
These challenges and the large numbers that must be trained
necessitate use of training methods that are not only effective
but efficient, durable, and generalizable. Training development
needs to keep pace with the introduction of new equipment
and technology, and adaptive enemies. U.S. joint forces will
face a range of national security challenges that will require
a weighted mix of military actions including combat aimed at
defeating armed enemies, security activities to protect and man-
age civil populations, engagement to improve capabilities of or
cooperation with other governments, and relief and reconstruc-
tion (U.S. Department of Defense, 2009).

The opinions expressed in this chapter are those of the authors and should not
be construed to represent the official position of the Department of the Army.
Conducting Applied Military Research
In the U.S. military, training research grew out of intelligence
testing and selection and classification research during World
War I, and human factors research during World War II that
was conducted in academic and government laboratories. Since
1972 the organization that has been charged with conducting
training research in the Army has been the U.S. Army Research
Institute for the Behavioral and Social Sciences (ARI). ARI’s mis-
sion has been to apply behavioral science research methods to
improve Army training (Uhlaner, 1977). Technology-based train-
ing has been the research area addressed by ARI’s research unit
in Orlando, Florida. The unit was established in the mid-1980s
to provide behavioral science support to the Army’s agency that
procures its training systems and simulators. Technology-based
training research has been multidisciplinary with psycholo-
gists working closely with engineers and computer scientists to
develop training system research findings that will reduce risk
in technologies’ technical design and their features that support
learning and performance. The unit led research in the design
of after action review (AAR) systems which collect simulation
data to portray, summarize, and provide feedback to soldiers
regarding what happened during a collective training event. The
unit has also been conducting research on how virtual reality
technologies can be applied to training dismounted combatants
in virtual environments or computer games. Recently, the unit
has begun research on the design and authoring of adaptive
computer-based training to include intelligent tutor technology.
Applied research projects collect data from either military or
nonmilitary volunteers. Research employing soldiers as partici-
pants usually is conducted on military installations. Less fre-
quently soldiers are brought to research facilities like those at
the ARI laboratory in Orlando where nonmilitary participants
also have been subjects in experiments.
Conducting research with soldiers can be complicated to
arrange. In order to obtain troop support for research, coordina-
tion must occur through either the Army’s training command or
its command that is responsible for preparing operational units
for deployment. These two commands are responsible for the
majority of the soldiers in the United States. Both commands
schedule “umbrella weeks” during which an Army installation
stands down from its normal training activities to support vari-
ous research organizations’ requests for soldiers to participate
in either surveys or experiments. Typically, the major installa-
tions offer 1 week per year, at different times. The number and
types of soldiers and the amount of time they are available are
negotiated prior to the umbrella week with the major command
headquarters and the installation. The wars in Iraq and Afghan-
istan have reduced most installations’ capability to participate
in umbrella weeks because soldiers’ time at home is limited, and
there is much to do to reorganize and train for the next rotation
to a war zone. Training research may require more time than
umbrella weeks allow. In that case it has been possible to nego-
tiate for support during non-umbrella week periods.
Regardless of how soldiers are recruited for a research project,
problems sometimes arise because units and individual soldiers
have competing demands for their time. Supporting research
could become a lower priority than an operational requirement
or medical appointment. As a result, no matter what support was negotiated, there could be a difference between what the
researcher requested and who participates in the research.
Applied research involves compromise and adaptability. Given
that it could be impossible to run additional participants,
research results and conclusions may have to be tempered by
the vagaries of conducting research in the field.

Army Training in Transition


Traditionally, U.S. Army institutional training has been based
on career tracks, where, over time, the individual participates
in a set of predefined courses linked to promotions; however,
the Army is now in the process of converting this linear model
of education and training to one allowing more spontaneity and
responsiveness to immediate needs. The Army aims to become
“learner-centric” (TRADOC, 2011). Soldier proficiency and devel-
opmental needs, as well as specific operational circumstances
and requirements will be used to shape the content, timing,
delivery, and duration of training. Task analysis has been the
foundation of military training for the last 40 years. (The tax-
onomy presented in chapter 8 provides an elaborate description
of military tasks and their training requirements.) Task analysis
defines the skills, knowledge, and abilities needed to perform
a given occupational specialty from novice to expert. The sys-
tems approach to training (SAT; Branson, 1975) has been the
general approach to training used by the military, and it has
used task analysis to define tasks, conditions, and standards of
performance for each military specialty at each level of experi-
ence and for each type of military unit. SAT follows the analyze,
design, develop, implement, evaluate model, with an analyze
phase focusing on job and task analysis. This model results in
training that is highly procedural, but sometimes inadequate
in preparing soldiers to deal with novel situations (see chapter
2 for a discussion of the specificity of procedures and the pro-
cedural reinstatement principle). Today’s soldiers must be able
to deal effectively with ambiguity and quickly adapt to dynamic
evolving operations over short and extended periods. Current
operations demand that soldiers at all levels have an increased
understanding of geopolitical, cultural, language, technical, and
tactical knowledge. Leaders must be able to cope with complex,
ill-defined problems and make effective decisions with less than
perfect information. These aspects of soldiers’ activities have not
been captured in the traditional behavioral task analysis that
supports SAT (Clark & Estes, 1996; see also chapter 8).
Thus, Army training development is currently in a period
of transition aimed at producing adaptable soldiers and lead-
ers who have the cognitive, interpersonal, and cultural skills
necessary to make sound tactical and strategic judgments in
complex environments. The Army needs training that will create
not just routine expertise (capability to perform skills efficiently
and effectively), but also adaptive expertise, as per Hatano and
Inagaki (1986) and Schwartz, Bransford, and Sears (2005).
According to Hatano and Inagaki, adaptive experts understand
when and why different procedures are appropriate for different
circumstances. Moreover they can modify procedures according
to new constraints, and can make predictions or provide expla-
nations about novel situations beyond their past experience.
Left on its own, the development of an adequate level of adaptive
expertise could take years to achieve. A challenge to military
training and military training researchers is to develop methods
to accelerate the process through training.
Development of adaptive expertise depends on deliber-
ate practice involving modification of procedures to deal with
variations in the situation. Deliberate practice is a focused and
guided process that relies heavily on coaching for scaffolding of
task, feedback, and error diagnosis (Lussier & Shadrick, 2003),
which are discussed in chapters 2 and 4. This type of practice
might lead to conceptual understanding through development
and refinement of a mental model that can be used explicitly
for reasoning about new situations (Gentner & Stevens, 1983)
or through more implicit processes such as recognition-primed
decision making (Klein, 1998). During recognition-primed deci-
sion making, people make relatively fast decisions without hav-
ing to explicitly weigh various options, by generalizing from
experience of similar situations. The implicit pattern-matching
process may be experienced phenomenologically as intuition.
138 Stephen Goldberg and Paula Durlach
Scenario-Based Training
The military has a long history of using scenario-based train-
ing; that is, training that involves practice conducting tasks or
making decisions in situations similar to ones that will be faced
in the real situation. Unlike drill-and-practice, involving short
repetitive trials, scenario-based training often has a narrative
character, in which the students must try to solve some problem
(e.g., equipment troubleshooting), or must play a character in a
simulated situation (e.g., rescue the hostages). Traditional “war
games” are live simulations; but in addition to these, the military
has invested considerable resources to create simulation envi-
ronments, incorporating ever increasing technical capabilities
for networking and fidelity. One advantage of technology-based
simulation training (over live) is that it is potentially repeatable,
so that students can try out different courses of action, and
observe the results. Another advantage is that rare but hazardous
situations can be simulated, providing the opportunity to learn
how to deal with these situations in a realistic and safe environ-
ment. Technology-based scenario training has the potential to
provide students with the varied situations required to establish
adaptive expertise, exposing them to key learning opportuni-
ties, which normally might take several years of real experience
to encounter; however, the understanding of the most efficient
and effective methods for training adaptive expertise in such
environments has advanced more slowly than the technology
itself. Reaping training benefits from scenario-based training is
often highly dependent on support from human instructors who
select training scenarios, observe trainee behavior, and provide
feedback, prompts, and reflective discussion. The training ben-
efits can therefore be highly dependent on the motivations and
expertise of those human instructors. Much of current applied
research conducted on military scenario-based training is
aimed at automating some of these processes or in determining
the best guidance for human instructors, in order for students
to get the most out of the training experience. The questions
involve the entire training process, including determining the
training content, the learning objectives and assessment mea-
sures (including retention and transfer), designing appropri-
ate technology delivery mediums, and providing instructional
guidance and feedback. Examination of various methods to
implement these training components should be conducted by
randomized, controlled tests of competing methods, altering one
relevant variable at a time (Sweller, Kirschner, & Clark, 2007);
however, the reality of military applied research is that this ideal
may be difficult to attain. For example, the technology delivery
medium may already exist and be in use (e.g., an established
simulation system), and one must work with the existing tech-
nology and usage constraints.

Adaptive Training
The ultimate challenge in developing technology-based training
for individuals is to approximate as closely as possible one-on-
one human tutoring. This approximation requires the ability to
represent and employ the rich knowledge and behavioral flexibil-
ity of human tutors, the aim of intelligent tutoring systems (ITS).
Development of ITS has progressed since the 1960s (VanLehn,
2006), but, although a handful of companies profess to produc-
ing ITS, expertise in this area is still primarily in the hands
of academia. ITS ideally consist of several component models,
which interact to control the student experience. These com-
ponents correspond to the knowledge used by human tutors:
a student model (knowledge about the student); a pedagogical
model (a set of instructional strategies and behaviors); a domain
model (knowledge about the subject being taught); and an expert
model (knowledge of how to solve problems in the domain). The
term ITS has been somewhat loosely applied, however; applica-
tions labeled as ITS do not necessarily have all (or any!) of these
models. Many of the “true” ITS have been created to supplement
classroom training in academic subjects, such as algebra and
physics (Graesser et al., 2004; Koedinger & Aleven, 2007; Van-
Lehn et al., 2005).
These ITS provide students with an environment in which to
practice solving step-based problems that require a sequence of
steps to arrive at a solution (e.g., finding the value of x in 35 = 5x
+ 5). Guidance and feedback are typically provided for each step,
and analysis of student performance is used to select the next
problem the student should attempt. Relatively less effort has
been devoted to incorporating the techniques of ITS into sce-
nario-based practice environments; however, Sherlock was one
early exception (Lesgold & Nahemow, 2001). Sherlock trained
technicians on device troubleshooting, targeting rare, but criti-
cal breakdowns. Besides simulating the device environment, and
presenting challenging problems to solve, two methods of help
were provided. One was an opportunity to review brief descrip-
tions of fault-causing phenomena that might be relevant to the
situation. The other was the opportunity to review a structured
summary of actions already taken. The structuring was based
upon application of an expert model to the data already gathered
by the trainee plus the data in the original problem statement.
The trainee could click on any action and see a list of questions
the ITS was prepared to answer. Sherlock truly accelerated the
acquisition of adaptive expertise, producing the same amount
of improvement in skill after 25 hours of training as occurred,
on average, in 4 years on the job. One of the critical aspects of
Sherlock development, which made it successful, was the inten-
sive upfront analysis of the domain. This analysis allowed the
creation of an expert model which generated expert solutions to
the troubleshooting problems (A.M. Lesgold, personal communi-
cation, March 2010).
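The step-based checking these tutors perform can be conveyed with a minimal sketch (hypothetical code, not drawn from any of the systems cited): each step a student takes toward solving 35 = 5x + 5 is validated by testing whether it still preserves the known solution.

```python
# Minimal sketch of step-based tutoring for 35 = 5x + 5 (illustrative only;
# not code from any ITS discussed in this chapter). Each step is checked by
# testing whether the known solution x = 6 still satisfies the equation.

def satisfied(lhs, rhs, x=6.0):
    """Return True if candidate solution x satisfies lhs(x) == rhs(x)."""
    return abs(lhs(x) - rhs(x)) < 1e-9

# The correct step sequence, expressed as (lhs, rhs) callables.
steps = [
    (lambda x: 35, lambda x: 5 * x + 5),   # original problem
    (lambda x: 30, lambda x: 5 * x),       # subtract 5 from both sides
    (lambda x: 6,  lambda x: x),           # divide both sides by 5
]

def check_step(step_index):
    """Give immediate feedback on one step, as a step-based ITS would."""
    lhs, rhs = steps[step_index]
    return "correct" if satisfied(lhs, rhs) else "hint: re-check this step"
```

A real tutor would also record which skill each step exercises and use that record, as described above, to select the next problem.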

Testing Training Effectiveness. Research to develop an
adaptive training system to train leaders in Army companies
on employment of small unmanned aerial systems (SUAS) had
goals to create an automated adaptive training system and
a parallel nonadaptive system, to test whether the adaptive
version promoted superior learning outcomes compared with the
nonadaptive version, all in approximately 1 year (see Durlach
& Dargue, 2010, for additional details). The motivation was to
determine whether the extra effort involved in making training
software adaptive reaps benefits in superior learning outcomes
or time savings.
Based on the scope of the project and analysis of the knowl-
edge leaders need to bring to bear on SUAS operations, nine
terminal learning objectives were selected. Each of these had
3 to 10 specific enabling learning objectives. For example, for
the terminal objective covering airspace mission requests, there
were three types of requests to be covered: planned, immediate,
and dynamic.
Didactic materials to teach knowledge of the enabling objectives
were constructed by selecting information from relevant doc-
trine and other Army publications. Two subject matter experts
then constructed two branching scenarios, which would allow
students to demonstrate knowledge of the learning objectives in
the context of tactical missions. At each decision point, students
had to select a decision option. Each choice updated the stu-
dent’s scores on the enabling learning objectives, using weights
set by the subject matter experts. Thus the expert knowledge
was implicit in the way the scenarios were written, and the deci-
sion choices were linked to the enabling learning objectives.
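The weighted linkage between decision choices and enabling learning objectives can be sketched as follows (objective names, option names, and weights are all illustrative, not the values actually set by the subject matter experts):

```python
# Illustrative sketch: each decision option carries SME-assigned weights on
# enabling learning objectives; choosing it updates the student's scores.

def apply_choice(scores, choice_weights):
    """Update per-objective scores in place with one decision's weights."""
    for objective, weight in choice_weights.items():
        scores[objective] = scores.get(objective, 0.0) + weight
    return scores

# Hypothetical decision point in a branching scenario.
option_weights = {
    "request_planned": {"airspace_requests": +1.0, "mission_planning": +0.5},
    "request_dynamic": {"airspace_requests": -0.5, "mission_planning": +0.2},
}

scores = {}
apply_choice(scores, option_weights["request_planned"])
```

In this way the expert knowledge stays implicit in how each option is weighted, exactly as the text describes.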
The experts also created 22 multiple choice test questions
to assess student knowledge of the learning objectives, which
would be used for pretest and posttest assessment. An example
of a question illustrates the nature of the knowledge test used
for pre- and posttesting:
What procedural control measure is used to restrict fast
attack manned aircraft from operating in the battalion air-
space where it would interfere with SUAS flights?

a. Communications Checkpoint
b. Coordinating Altitude
c. Minimum Risk Route
d. Restricted Operating Zone

The adaptive version of the prototype system adapted to
the learner in two main ways. First, results on the pretest were
used to allow students to skip the didactic materials related to
the questions passed on the pretest (whereas in the nonadaptive
version, students had to review all the didactic material). (This
adaptation is similar to the clicker technology and the dropout
procedure discussed in chapter 3, which allow instructors to
compress instruction time to eliminate already known mate-
rial.) Second, remediation could be given during scenario per-
formance, if student reactions indicated poor understanding of
specific enabling learning objectives. If an error rate criterion
was met, the scenario would terminate and students would be
provided with a review of the relevant didactic information. They
would then be returned to the scenario at some point prior to
where they exited (at a natural break in the storyline). In con-
trast, in the nonadaptive version, scenarios were not aborted for
purposes of remediation. Neither did scenarios branch based
on student input (the story line was fixed). For both conditions,
feedback was given after each decision, and scenarios were fol-
lowed with performance review. In the nonadaptive case, stu-
dents could return to the didactic material if they chose, before
doing another scenario. In both conditions, reference materials
were available during the scenarios, including a glossary, and
the ability to ask for hints.
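The two adaptation mechanisms just described, pretest-based skipping and error-triggered remediation, can be sketched roughly as follows (the criterion value and function names are assumptions for illustration, not the prototype's actual parameters):

```python
# Rough sketch of the two adaptation rules (threshold is illustrative).

def modules_to_study(pretest_passed, all_modules):
    """Skip didactic modules whose pretest questions were already passed."""
    return [m for m in all_modules if m not in pretest_passed]

def needs_remediation(errors, decisions, criterion=0.5):
    """Trigger scenario exit for review when the error rate meets criterion."""
    return decisions > 0 and errors / decisions >= criterion
```

When `needs_remediation` fires, the scenario would terminate, the relevant didactic material would be reviewed, and the student would resume at a natural break before the exit point, as described above.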
In order to compare the effectiveness of the adaptive and non-
adaptive versions, the participation of 48 soldiers was requested
through umbrella weeks, and complete data sets were obtained
from 31 participants, 17 from one installation and 14 from
another. Requested were 12 each: company commanders (cap-
tains), platoon leaders (lieutenants), and noncommissioned offi-
cers (staff sergeants or sergeants first class), from maneuver
units (infantry and armor), because the responsibility for over-
seeing SUAS operations can vary from company to company.
The actual participants were 2 captains, 12 lieutenants, 4 staff
sergeants, and 13 sergeants first class, who came from nine dif-
ferent branches (e.g., infantry, artillery, military police, etc.).
Overall, 16 participants completed the adaptive version, and 15
the nonadaptive version. Soldiers all gave informed consent, and
were randomly assigned to conditions. They completed a paper
questionnaire prior to training, which asked about experience
and background.

Results. Participants completed the pretest (without feedback)
prior to training and an identical posttest after training. A mixed
analysis of variance using these test scores as the repeated
factor, and location, rank (commissioned vs. noncommissioned
officer [NCO]), and condition as the between-group factors
indicated a significant effect of test time (pretest vs. posttest),
F(1, 23) = 4.55, p < .01, showing that overall, participants
did increase their scores from pretest (mean 49%) to posttest
(mean 62%), with an effect size (partial eta-squared) of 0.48.
The predicted two-way interaction between test time and
condition was not significant; however, there was a significant
three-way interaction among test time, test location,
and condition, F(1, 23) = 4.55, p < .05. It appeared that at one
test site, participants learned more from the adaptive version
(average posttest scores of 69% vs. 56% for the adaptive and
nonadaptive conditions, respectively), whereas at the other test
site, they benefited more from the nonadaptive version (average
posttest scores of 54% vs. 66% for the adaptive and nonadaptive
conditions, respectively). The adaptive condition at one location
barely learned at all, with pretest and posttest scores of 52%
and 54%, respectively. The data were checked repeatedly to
rule out the possibility of a coding mix-up. Presumably, this
interaction was not truly caused by location per se, but by
some intervening variable or variables, or by chance. It was the
case that there were (by chance) more NCOs in the location-
adaptive combination that evidenced poor learning: 78% of the
participants in that condition were NCOs, whereas for the other
three subgroups only 28% to 38% of the participants were NCOs.
It is possible that the NCOs at one particular location tended to
have a poorer attitude about participating and did not take the
exercise seriously.
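The effect size metric used here, partial eta-squared, can be recovered directly from an F ratio and its degrees of freedom via the standard identity SS_effect / (SS_effect + SS_error); this helper is a generic sketch, not code from the study:

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared from an F ratio: SS_effect / (SS_effect + SS_error)."""
    return (f * df_effect) / (f * df_effect + df_error)
```

For example, the time-to-complete interaction reported below, F(1, 22) = 9.57, corresponds to a partial eta-squared of about .30.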
Participants were required to complete training, which
entailed successful completion of two scenarios, or unsuccess-
ful completion of the two after three attempts at each. With
respect to the time to complete training, the same interaction
was apparent as for test scores, F(1, 22) = 9.57, p < .01. That
is, participants at one location took more time to complete the
adaptive version than the nonadaptive version (means of 154.3
and 86.6 min, respectively), whereas at the other location, the
reverse was the case (means of 63.8 and 106.7 min for the adap-
tive and nonadaptive conditions, respectively). Shorter training
times tended to be associated with poorer posttest performance
(Spearman r = .35); but the time distribution was bimodal, mak-
ing it an inappropriate covariate in the analysis of test perfor-
mance scores. An analysis of pretest and posttest scores based
on a median split of time-on-task (median = 82.5 minutes) failed
to yield a significant interaction (F(1, 28) = 2.73; p > .10; pre-
test means were 49% correct for both below and above median
groups; posttest means were 58% and 67% for below and above
median groups, respectively). A Mann-Whitney U test on
the posttest data alone, however, yielded U = 64.5, p < .05.
These results are consistent with the conclusion that when
learners spend more time on training, they learn more; however,
they fail to disambiguate whether this is a causal effect (i.e.,
more time causes better learning), or whether this is due to an
unmeasured intervening variable (e.g., higher aptitude students
tend to spend more time on task).
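The median-split analysis just described can be sketched with a bare-bones U statistic on illustrative data (the real analysis would use a statistics package with proper tie handling and p values; function names and data here are assumptions):

```python
# Sketch of a median split on time-on-task followed by a Mann-Whitney-style
# comparison of posttest scores (illustrative; not the study's data).

def mann_whitney_u(group_a, group_b):
    """U for group_a: count of (a, b) pairs with a > b, plus 0.5 per tie."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def median_split(times, scores):
    """Split posttest scores by whether time-on-task exceeds the median."""
    med = sorted(times)[len(times) // 2]  # upper median, for simplicity
    below = [s for t, s in zip(times, scores) if t <= med]
    above = [s for t, s in zip(times, scores) if t > med]
    return below, above
```

The U statistic counts how often an above-median learner outscores a below-median one, which is what the reported U = 64.5 summarizes.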
Clearly more data are required to evaluate properly the
effectiveness of the two versions of the prototype training, and
whether it really might be the case that the adaptive version is
more beneficial to some soldiers whereas the nonadaptive ver-
sion is more beneficial to others. It had not been anticipated that
testing location might actually be involved in an interaction. The
mixture in ranks was requested to reflect the potential training
audience, were the prototype actually fielded. Likewise testing
occurred at two locations in order to get a sufficient number of
participants.
The outcome illustrates some of the challenges of attempting
to develop and evaluate adaptive training in a short time frame.
There was insufficient time to pilot test the questions contribut-
ing to the outcome evaluation measure (the questions used on
the pre- and posttest) for difficulty or sensitivity. Assuming the
observed interactions were spurious, the failure to find an out-
come difference across conditions could reflect true nondifferen-
tial effectiveness of the two conditions; but, it might also reflect
lack of sensitivity of the outcome measure. On the other hand,
if the observed interactions can be replicated, it would suggest
that there are factors besides during-training performance that
need to be considered when designing adaptive training tech-
nology. Something in the background or attitudes of the NCOs
at one testing location led to better posttest performance with
the nonadaptive version than the adaptive version; but that fac-
tor could not be isolated solely on the basis of the demograph-
ics data that were collected. If that factor could be identified, it
could be used to contribute to the selection of which version of
the training to present to different trainees. It could be that stu-
dent attitudes about participating in the training at all should
influence the type of training system they interact with.

Learning in Virtual Environments
The U.S. Army has invested in networked virtual simulators to
support collective training of its armor and mechanized infan-
try units. SIMNET (Alluisi, 1991) first, and then the Close Com-
bat Tactical Trainer (CCTT) were fielded across the Army to
provide training on tactics at the company and platoon level.
Before the introduction of virtual collective training systems,
sand table rehearsals (training conducted on terrain relief mod-
els on which models of vehicles or symbols of larger units could
be moved tactically) prepared units for field training. The net-
worked simulations provided a more realistic and active way to
train platoon and company tactics. The network simulations,
however, consisted exclusively of vehicle simulators (tanks and
infantry carriers). Effective simulation of infantry on foot, an
important battlefield element, was missing from both training
systems. Developing an effective simulator for the dismounted
soldier has been a significant simulation challenge from both a
technological and psychological perspective. A vehicle operator
learning to drive in a simulator sits in a work station with visual
inputs and controls similar to those in the vehicle. The trainee
turns a steering wheel, steps on a brake pedal and looks at a
visual virtual world through a windshield. The simulator for the
dismounted combatant is very different (see Figure 7.1). It has
to provide visual, auditory, and tactile inputs directly to the sol-
dier’s senses through devices such as helmet mounted visual
displays, and have effectors that allow the soldier to accomplish
in the simulation the various types of movement and positions
a person on foot can assume. The simulator also has to pro-
vide a means for the soldier to communicate with others in his
small unit and employ infantry weapons. Campbell, Knerr, and
Lampton (2004) listed the functions that needed to be present
in virtual simulations for dismounted soldier training to be pos-
sible. Psychologically, the virtual dismounted soldier needs to be
able to perceive the environment accurately, develop situational
awareness, navigate from point to point, and perform infantry
tasks when appropriate.
ARI conducted a multiyear research program that inves-
tigated the capabilities to train and perform military tasks in
virtual environments using virtual reality technology (Knerr et
al., 1998).

Figure 7.1 Dismounted soldier simulation system including helmet
mounted display, head and body tracking, weapon that
also controls movement and other functions, and wearable
image generator.

The program included research on individuals’ ability
to perceive stimuli and perform simple movements and actions
(Lampton et al., 1995), to navigate through a complex building
and transfer that knowledge to the real world (Witmer, Bailey,
& Knerr, 1995; Witmer, Sadowski, & Finkelstein, 2000), and to
use artificial cues placed in the environment to facilitate learn-
ing (Singer, Kring, & Hamilton, 2007). Also investigated were
training of two-person teams (Lampton, McDonald, Rodriguez,
Morris, & Parsons, 2001) and small units (Evans, Knerr, &
Gesselman, 2009; Knerr, 2007). Training in the virtual world
affords team training of team members who are geographically
distributed and have never met. Thus, research also compared
training outcomes when teammates were colocated geographi-
cally (i.e., in the real world) vs. distributed geographically (i.e.,
in different cities; Singer, Grant, Commarford, Kring, & Zavod,
2001).
Knerr (2007) listed the essential components and capabilities
required for collective training using dismounted soldier simu-
lation. The critical elements needed are:

1. Computer environment generator/processor.
2. Network that transmits information among the simulators.
3. Trainee control suite including: weapon controller, locomo-
tion controller, voice input device.
4. Trainee display suite including: visual display (large screen,
helmet mounted, or PC monitor), auditory display, possibly
haptic display.
5. Management system to set up training scenarios, track
progress, and provide feedback through an After Action
Review performance recording capability.

Knerr (2007) summarized the results of a series of dismounted
soldier simulation collective training assessments conducted by
ARI that ran from 1999 to 2005. Each of the assessments used
similar hardware and software with the majority of the exer-
cises being conducted at Ft. Benning, Georgia at the Soldier
Battle Lab (Knerr & Lampton, 2005; Knerr, Lampton, Crowell,
et al., 2002; Knerr, Lampton, Thomas et al., 2003). Each year
features were added and the general capabilities of the simula-
tors increased.
In each of the assessments a squad of soldiers was asked to
perform a series of scenarios in the Soldier Visualization Station
(SVS). The SVS was a system in which a soldier stood facing a
rear projected screen that displayed a visual image of a virtual
environment from the soldier’s perspective. In the SVS, the sol-
dier had a rifle that had a small joystick and some buttons built
into it. The joystick allowed the soldier to move in the environ-
ment and the buttons could be used to throw explosive or smoke
grenades. The soldier wore a head tracker, which informed the
system whether he or she was standing, kneeling, or in a
prone position. Each squad of soldiers was given training on the
use of the SVS followed by their performance of five scenarios
over a 2-day period. Each of the scenarios had soldiers perform-
ing a small unit infantry mission such as conduct a patrol, clear
a building, or come to the aid of a downed helicopter. Squad lead-
ers were given time to plan prior to conducting each scenario. In
most cases enemy combatants were computer generated semiau-
tomated forces. Semiautomated forces follow predefined routes,
have rules of behavior that dictate their movement, and have the
ability to detect and engage opponents. They are semiautomated
in that an operator can reprogram them during the exercise if
the original plan is no longer applicable or they are about to do
something that would reveal their limited artificial intelligence.
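The rule-driven behavior of such semiautomated forces can be sketched as a simple decision loop (entirely illustrative; fielded systems are far richer and, as noted, allow operator reprogramming):

```python
# Toy semiautomated entity: follow a predefined route unless an opponent is
# detected within engagement range (illustrative only).

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def next_action(position, route, opponents, detect_range=100.0):
    """Engage the nearest detected opponent, else continue along the route."""
    detected = [o for o in opponents if dist(position, o) <= detect_range]
    if detected:
        return ("engage", min(detected, key=lambda o: dist(position, o)))
    if route:
        return ("move_to", route[0])
    return ("hold", position)
```

An operator override would simply replace the route or the rules mid-exercise, which is what makes the forces "semi" automated.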
Soon after each scenario concluded there was an after action
review (AAR) led by an experienced trainer using the dismounted
infantry virtual after action review system. This system (Clark,
Lampton, Martin, & Bliss, 2004) was developed jointly by ARI,
Orlando and the University of Central Florida’s Institute for Sim-
ulation and Training. The system records the computer network
traffic during the scenario. During the AAR it allows the AAR
leader to selectively play back portions of the scenario to make
teaching points or to provide a ground truth counterpoint to the
squad’s perceptions of what happened. It also offers specialized
displays, such as the firefight graphic, which summarizes who
was shooting at whom, and the snail trail feature, which visually
depicts where each squad member went, how long squad members
stayed in one spot, and whether they knelt down or went into a
prone position. At the end of the second
day the soldiers were asked to rate their ability to perform
infantry tasks in the simulators. Squad and fire team leaders
and for some exercises fire team members were administered
a training effectiveness questionnaire, which asked if they felt
their performance on various tasks improved over the course
of the training. Also during the last exercise at Ft. Benning in
2002, three evaluators (experienced trainers) rated each squad
on 14 items with scores corresponding to the percentage of time
the leader and unit demonstrated the behaviors described in the
item. The first and last scenarios were designed to be equiva-
lent in an attempt to see if the evaluators’ rating of the second
equivalent scenario improved over the first. Knerr et al. (2003)
reported that five of the six comparisons showed improvement
though the improvement was not large and the small number of
squads (three) precluded further statistical analysis.
Knerr (2007) summarized the results of the training effec-
tiveness questionnaire over five separate data collections. He
reported that leaders’ ratings increased consistently every year
from .82 (less than slight improvement) in 1999 to 2.06 (moder-
ate improvement) in 2005. Leaders reported the most improve-
ment in controlling their units, assessing the tactical situation,
and communication. Ratings of simulator capability to perform
infantry tasks changed little over the exercises even though the
capabilities of the simulators increased with time. Precise move-
ment (to enter rooms or avoid furniture and other soldiers) and
determining where enemy fire was coming from were consis-
tently rated low.
The squad training exercises provided limited answers with
regard to the effectiveness of the virtual dismounted trainers in
improving squad learning and performance over the course of the
five training scenarios. With the exception of the trainer rat-
ings obtained in the last Ft. Benning exercise, the assessment of
learning was at Kirkpatrick’s first level of training evaluation:
the reaction of the trainees, that is, what they thought and
felt about the training (Kirkpatrick & Kirkpatrick, 2006). There
are several factors that contributed to the limited measures
of training effectiveness. The exercises used the usual Army
approach to collective training in which it is rare for scenarios
to be repeated. This variability (although probably beneficial
for learning; see the variability of practice principle in chapter
2) makes it difficult to compare performance across scenarios,
because conditions are always different. Because this is collec-
tive training, capturing the performance of all the individuals is
difficult, as is defining what constitutes learning. Also, because
the unit of measurement is the squad, obtaining enough squads
to reach adequate statistical power levels was not possible due
to time, expense, and availability of soldiers. Therefore, there
was no way to measure learning or the effectiveness of training
under these conditions.
In response to these problems two projects were initiated to
develop the tools needed. McGilvray, Leibrecht, and Lockaby’s
(2008) goal was to develop the foundation for a tool that could
be used to evaluate unit performance in a scenario independent
manner. Sticha, Weaver, Ford, and Campbell (2011) took the ear-
lier work as a starting point and developed a Bayesian network
probabilistic model that represents conditional dependencies
between the performance of specific skills, level of competency,
and effectiveness of training. In the model, observable skills are
linked to competencies such as “move” and “shoot.” Performance
measures of the skills (such as go/no go outcomes) are consid-
ered to be indicators of the general competencies. Skill perfor-
mance is moderated by the conditions present in the scenario.
Conditions might be “easy” or “difficult,” for example. Successful
performance of a skill would be evidence of competency, and it
would be stronger evidence if moderated by difficult conditions.
Competency levels are summed to provide an overall probability
of successful performance of the unit on a given scenario. If the
unit then performed a second scenario, the performance on that
scenario would provide estimates of competency and the learn-
ing rate which represents the net improvement or decrement in
each of the competencies between the first and second scenario.
Observations from each scenario add to or subtract from col-
lective competency scores. The result is a model that addresses
the core skills of small unit tactics and allows changes in per-
formance to provide evidence for changes in proficiency in those
competencies over diverse scenarios.
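The flavor of such a model can be conveyed with a single-node Bayesian update, a drastic simplification of the Sticha et al. (2011) network (all probabilities are illustrative): belief in a unit's competency is updated from a go/no-go observation, with success under difficult conditions counting as stronger evidence.

```python
# Single-competency Bayesian update (illustrative numbers). P(go) depends on
# competency and scenario difficulty; a "go" under difficult conditions
# shifts belief more than the same "go" under easy conditions.

P_SUCCESS = {  # P(go | competent?, difficult?)
    (True, True): 0.70, (True, False): 0.90,
    (False, True): 0.20, (False, False): 0.60,
}

def update(prior, success, difficult):
    """Posterior P(competent) after one go/no-go observation (Bayes rule)."""
    like_c = P_SUCCESS[(True, difficult)]
    like_n = P_SUCCESS[(False, difficult)]
    if not success:
        like_c, like_n = 1 - like_c, 1 - like_n
    num = like_c * prior
    return num / (num + like_n * (1 - prior))
```

Starting from a prior of .5, a success under difficult conditions raises P(competent) to about .78, whereas the same success under easy conditions raises it only to .60, mirroring the moderation by conditions described above; the full model chains such updates across skills and scenarios to estimate learning rates.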

Distributed Multiplayer Games


Game technology has improved to the point that game engines
for training have become the driver of dismounted virtual simu-
lations. Multiplayer games can be networked through a server
with game participants possibly widely geographically distrib-
uted. One such multiplayer game, the On-Line Interactive Virtual
Environment system (OLIVE) developed by Forterra Systems,
Inc., was modified to include soldiers, vehicles, weapons, civil-
ians, and enemy combatants on a Middle Eastern terrain data-
base. OLIVE evolved from a game that was originally designed to
provide opportunities for singles to socialize in a resort setting.
The resort game’s dune buggies and gliders were transformed in
the military version into HUMVEEs and helicopters. Unlike first
person shooter games, which emphasize tactical operations, the
social origins of OLIVE included the capability for its avatars to
portray facial expressions and gestures. Given the urban nature
of the wars in Iraq and Afghanistan, the ability to interpret body
language and gestures and use an interpreter was as important
as learning to maneuver under fire.
ARI participated with the U.S. Army Simulation and Training
Technology Center, a primarily engineering oriented research
organization, in three training exercises that used OLIVE. In
each exercise soldiers from the United States and the United
Kingdom trained together in game-based virtual environments
representing the Middle East and Africa. The soldiers learned
to perform rescue and peace keeping missions and perform
operations with soldiers from another country that had different
customs, idioms, and procedures. Research objectives for these
exercises were to evaluate the computer networks, to conduct
game-based simulations, to work out the processes for conduct-
ing training across 5,000 miles, and to determine usability, sol-
dier acceptance, and potential training value (Singer & Knerr,
2010).
In each of the three exercises the same measures were used to
collect data from both countries’ soldiers who were asked about
their military background and game playing experience prior
to participating in the scenarios. After the scenarios they were
asked about the game’s fidelity, usability, and ability to support
training. Soldiers rated the effectiveness of the training rela-
tive to other training they received for the same types of tasks
and missions. Training effectiveness questions also addressed
perceived skill changes in the individual and their small unit’s
performance. In the first two exercises there was interest in
comparing the results generated by the UK versus U.S. soldiers.
The third exercise introduced a second game engine to be com-
pared to OLIVE. Virtual Battle Space 2 (VBS2) was used for half
the scenarios (Singer, Barnett, & Taylor, 2011). It is a game the
U.S. Army has purchased for wide use, and there was interest in
seeing whether it had the capability to support what were primarily
peacekeeping scenarios with limited combat.
The exercises were complicated undertakings and difficult to
control. Engineering issues, as well as training issues, had to
be addressed. The exercises were the most complicated training
scenarios the OLIVE system had supported both in number of
objects in the scenarios and in the long-distance nature of the
computer networks. There were serious problems during each
of the exercises with the reliability and speed of the network.
The training scenarios and expectations for each exercise were
different, and there was sometimes a clash in training practices
between the U.S. and UK contingents that affected
how feedback was provided after a scenario. There were differences
in the makeup of the units from exercise to exercise. In
the first exercise the U.S. participants consisted of a mixture
of West Point cadets, reservists, and officers from Ft. Benning,
Georgia. The last two exercises included U.S. soldiers from the
same operational units. The UK side had an intact unit partici-
pating in each of the exercises. As a result of the complicated
nature of the exercises, limited time available, soldiers who were
unfamiliar with each other and with game-based training, and
technical problems, the training data that were collected were more
like those from a pilot study than from a carefully controlled experiment.
The data collected were mostly descriptive and opinion data. The
comments of the military participants indicated that they felt
that distributed game-based training had potential to provide
effective training.
Despite the technical problems, all of the exercises demon-
strated that training could be accomplished using game-based
simulations with a widely dispersed training audience. The
Conducting Technology-Based Applied Training Research 151
soldiers involved perceived that they benefited from the training
despite the deficiencies in the simulation system. U.S. and UK
soldiers were asked how they felt the training they received com-
pared to similar training in the field with regard to the diversity
of tasks to be performed, the ability to review their performance,
and the ease with which changes in the exercise could be implemented.
They were also asked whether the game-based exercise was
adequate for them to learn the escalation-of-force learning objectives
built into the scenarios. For all of the training effectiveness
questions, average responses were in the 2.5 to 3.0 range
on a 5-point scale, with standard errors from .163 to .270. These
responses indicated that training could be conducted but was
neither better nor worse than field training. Soldiers felt that
the OLIVE system did not contain all of the weapons and capa-
bilities that they would normally bring to the types of scenarios
used in the training. Although they liked the graphics, weap-
ons, and other military capabilities built into the VBS2 game
in Exercise 3, they preferred the interface for OLIVE over the
keyboard inputs required by VBS2. Soldiers from both the U.S.
and the UK had never experienced small unit coalition training or
missions. Many of them found the concept of working in small
units with soldiers from another country unrealistic and not
necessarily a good use of their time. The U.S. Army’s Director of
Training (BG R. C. Longo, personal communication, December
2, 2009) felt that coalition training at the small unit level provided
soldiers with insights into the culture, organization, tactics,
and procedures of soldiers from other nations that would
be useful given the coalition nature of current operations. UK
officers, when asked whether game-based training met their training
needs, responded that it could be a great addition to their overall
training program if it were used prior to field training (Singer et
al., 2011).

Final Comments
Applied training research investigates the application of psycho-
logical theories, technologies, and innovative training methods to
improve military training. Intelligent tutoring systems, dismounted
soldier simulation, and game-based distributed training were
presented to demonstrate the problems and opportunities inher-
ent in applied research. The controls applied to basic research
experiments are often difficult to impose on research conducted
in the field with soldiers. Sample sizes are small for collective
training research because the unit of measure is the small unit
(fire team, squad, or platoon) and not the individual soldier.
When methods for training with technologies are being investi-
gated, the technologies in many cases have not been developed
to the point where they are reliable and stable. Although innovation
can most easily be incorporated into new training systems
at this early stage, the innovation must then change along with
the overall system as it develops. Another frequent problem in applied training
research is the lack of adequate measurement methods, par-
ticularly for collective training.
The common problems inherent in conducting applied research
can sometimes limit the conclusions that can be drawn. However,
the findings that have been produced have had, and will
continue to have, an impact on the way the U.S. Army trains. As
mentioned above, the Army is moving to a learner-centric training
approach that emphasizes the use of ITS, simulation, and
collaborative learning (TRADOC, 2011). Applied research will
provide the support for adopting new training methods. It will
also help to reduce the risk to government contractors tasked
with developing new training technologies by identifying which
approaches do and do not work.

References
Alluisi, E. A. (1991). The development of technology for collective train-
ing: SIMNET, a case history. Human Factors, 33, 343–362.
Branson, R. K. (1975). Interservice procedures for instructional systems
development: Executive summary and model. Tallahassee, FL: Flor-
ida State University Center for Educational Technology for Naval
Education and Training Command, Pensacola, FL.
Campbell, C. H., Knerr, B. W., & Lampton, D. R. (2004). Virtual envi-
ronments for infantry Soldiers (ARI Special Report 59). U.S. Army
Research Institute. Retrieved from http://handle.dtic.mil/100.2/
ADA464022.
Clark, B. R., Lampton, D. R., Martin, G. A., & Bliss, J. P. (2004). User’s
manual for the Dismounted Infantry Virtual After Action Review Sys-
tem (DIVAARS) (ARI Research Product 2004-03). U.S. Army Research
Institute. Retrieved from http://handle.dtic.mil/100.2/ADA425427
Clark, R. E., & Estes, F. (1996). Cognitive task analysis for training.
International Journal of Educational Research, 25(5), 403–417.
Durlach, P. J., & Dargue, B. W. (2010). Adaptive and non-adaptive
training technology for small unmanned aerial system employment.
Paper presented at the U. S. Army Science Conference, Orlando, FL.
Evans, K. L., Knerr, B. W., & Gesselman, A. N. (2009). Training small unit
leaders and teams (ARI Special Report S-68). U.S. Army Research
Institute. Retrieved from http://handle.dtic.mil/100.2/ADA508029.
Gentner, D., & Stevens, A. L. (1983). Mental models. Hillsdale, NJ:
Erlbaum.
Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H. H., Ventura, M.,
Olney, A., & Louwerse, M. M. (2004). AutoTutor: A tutor with dia-
logue in natural language. Behavior Research Methods, Instruments,
and Computers, 36, 180–192.
Hatano, G., & Inagaki, K. (1986). Two courses of expertise. In H. Ste-
venson, H. Azuma, & K. Hakuta (Eds.), Child development and edu-
cation in Japan (pp. 262–272). New York: Freeman.
Kirkpatrick, D. L., & Kirkpatrick, J. D. (2006). Evaluating training pro-
grams (3rd ed.). San Francisco, CA: Berrett-Koehler.
Klein, G. A. (1998). Sources of power: How people make decisions. Cam-
bridge, MA: MIT Press.
Knerr, B. W. (2007). Immersive simulation training for the dismounted
Soldier (ARI Study Report 2007-01). U.S. Army Research Institute.
Retrieved from http://handle.dtic.mil/100.2/ADA464022.
Knerr, B. W., & Lampton, D. R. (2005). An assessment of the Virtual-
Integrated MOUT Training System (V-IMTS; ARI Technical Report
1163). U.S. Army Research Institute. Retrieved from http://handle.
dtic.mil/100.2/ADA438315.
Knerr, B. W., Lampton, D. R., Crowell, H. P., Thomas, M. A., Comer, B.
D., Grosse, J. R., … Washburn, D. A. (2002). Virtual environments
for dismounted soldier simulation, training, and mission rehearsal:
Results of the FY 2001 Culminating Event (ARI Technical Report
1129). U.S. Army Research Institute. Retrieved from http://handle.
dtic.mil/100.2/ADA403147.
Knerr, B. W., Lampton, D. R., Singer, M. J., Witmer, B. G., Goldberg,
S. L., Parsons, K. J., & Parsons, J. (1998). Virtual environments for
dismounted soldier training and performance: Results, recommenda-
tions and issues (ARI Technical Report-1089). U.S. Army Research
Institute. Retrieved from http://handle.dtic.mil/100.2/ADA360109
Knerr, B. W., Lampton, D. R., Thomas, M., Comer, B. D., Grosse, J.
R., Centric, J. H., … Washburn, D. A. (2003). Virtual environments
for dismounted soldier simulation, training, and mission rehearsal:
Results of the FY 2002 Culminating Event (ARI Technical Report
1138). U.S. Army Research Institute. Retrieved from http://handle.
dtic.mil/100.2/ADA417360.
Koedinger, K. R., & Aleven, V. (2007). Exploring the assistance
dilemma in experiments with cognitive tutors. Educational Psychology
Review, 19, 239–264.
Lampton, D. R., Knerr, B. W., Goldberg, S. L., Bliss, J. P., Moshell, J. M.,
& Blau, B. S. (1995). The Virtual Environment Performance Assess-
ment Battery (VEPAB): Development and evaluation. (ARI Technical
Report 1029). U.S. Army Research Institute. Retrieved from http://
handle.dtic.mil/100.2/ADA297277.
Lampton, D. R., McDonald, D. P., Rodriguez, M. E., Morris, C. S., &
Parsons, J. (2001). Instructional strategies for training teams in
virtual environments (ARI Technical Report 1110). U.S. Army Research
Institute. Retrieved from http://handle.dtic.mil/100.2/ADA389674.
Lesgold, A., & Nahemow, M. (2001). Tools to assist learning by doing:
Achieving and assessing efficient technology for learning. In D.
Klahr & S. Carver (Eds.), Cognition and instruction: Twenty-five
years of progress (pp. 307–346). Mahwah, NJ: Erlbaum.
Lussier, J. W., & Shadrick, S. B. (2003, October 13–15). Adaptive
thinking training for tactical leaders. Paper presented at NATO Human
Factors and Medicine Symposium on Advanced Technologies for
Military Training, Genoa, Italy.
McGilvray, D. H., Leibrecht, B. C., & Lockaby, K. J. (2008). Measuring
learning and performance in collective training exercises (ARI Con-
tractor Report 2008-02). U.S. Army Research Institute. Retrieved
from http://handle.dtic.mil/100.2/ADA480052.
Schwartz, D. L., Bransford, J. D., & Sears, D. (2005). Efficiency and
innovation in transfer. In J. P. Mestre (Ed.), Transfer of learning from
a modern multidisciplinary perspective (pp. 1–51). Greenwich, CT:
Information Age.
Singer, M. J., Barnett, J., & Taylor, G. (2011). Evaluation of two
distributed game-based simulations during exercises. Arlington, VA: U.S.
Army Research Institute for the Behavioral and Social Sciences.
Manuscript in preparation.
Singer, M. J., Grant, S. C., Commarford, P. M., Kring, J. P., & Zavod,
M. (2001). Team performance in distributed virtual environments (ARI
Technical Report 1118). U.S. Army Research Institute. Retrieved
from http://handle.dtic.mil/100.2/ADA396489.
Singer, M. J., & Knerr, B. W. (2010). Evaluation of a game-based simula-
tion during distributed exercises (ARI Research Report 1931). Arling-
ton, VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.
Singer, M. J., Kring, J. P., & Hamilton, R. M. (2007). Instructional
features for training in virtual environments (ARI Technical Report
1184). Retrieved from http://handle.dtic.mil/100.2/ADA455301.
Sticha, P. J., Weaver, E. A., Ford, L. A., & Campbell, R. C. (2011). Assess-
ing learning and performance in collective training exercises (Draft
HumRRO Report). Alexandria, VA: Human Resources Research
Organization.
Sweller, J., Kirschner, P. A., & Clark, R. E. (2007). Why minimally
guided teaching techniques do not work: A reply to commentaries.
Educational Psychologist, 42, 115–121.
TRADOC. (2011). The U. S. Army Learning Concept for 2015. TRADOC
Pamphlet 525-8-2. Retrieved from http://www-tradoc.army.mil/
tpubs/pams/tp525-8-2.pdf.
Uhlaner, J. E. (1977). The research psychologist in the Army 1917 to
1977 (ARI Research Report 1155). Arlington, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.
U.S. Department of Defense. (2009). Capstone concept for joint operations
(version 3.0). Retrieved from http://www.jfcom.mil/newslink/
storyarchive/2009/CCJO_2009.pdf.
VanLehn, K. (2006). The behavior of tutoring systems. International
Journal of Artificial Intelligence in Education, 16, 227–265.
VanLehn, K., Lynch, C., Schulze, K., Shapiro, J. A., Shelby, R., Taylor,
L., … Wintersgill, M. (2005). The Andes physics tutoring systems:
Lessons learned. International Journal of Artificial Intelligence in
Education, 15, 147–204.
Witmer, B. G., Bailey, J. H., & Knerr, B. W. (1995). Training dismounted
soldiers in virtual environments: Route learning and transfer (ARI
Technical Report 1022). Retrieved from http://www.dtic.mil/dtic/tr/
fulltext/u2/a292900.pdf.
Witmer, B. G., Sadowski, W. J., & Finkelstein, N. M. (2000). Training
dismounted Soldiers in virtual environments: Enhancing configuration
learning (ARI Technical Report 1103). Retrieved from http://
handle.dtic.mil/100.2/ADA381715.
8 A New Taxonomy for Training
William D. Raymond, Alice F. Healy,
and Lyle E. Bourne, Jr.
University of Colorado

The goal of research supported by the MURI grant was to predict
the effects on performance of different training methods for
complex military tasks. A multipronged approach for meeting
this goal involved extensive basic experimental research explor-
ing the effects of training variables on performance in labora-
tory tasks, together with computational modeling of human
task performance. The empirical research has been the basis
for a set of training principles, which are heuristics that relate
training methods and outcomes and can assist in the develop-
ment of training regimens by the military (see chapter 2). How-
ever, the range of variables that can affect training efficacy and
the multiplicity and diversity of tasks that may require train-
ing prevent an exhaustive quantification of training outcomes
for specific tasks and training scenarios. In order to render the
study of training effects tractable and to guide future research,
a multidimensional taxonomy was developed and is described
in this chapter, to provide a framework by which training effects
can be assessed and predicted for any task.
A taxonomy is a hierarchical classification based on a con-
sistent set of principles that can be tested for agreement with
empirical data and whose order corresponds to a real order of
the classified elements (Krathwohl, Bloom, & Masia, 1964). To be
testable, features of the present taxonomy should thus be relat-
able to the design of laboratory experiments conducted to explore
training variables. That is, the taxa must be capable of captur-
ing the tasks, manipulations, and measured responses of the
experiments. At the same time, taxa should be no finer than the
experimental manipulations. In addition, the features should be
broad enough to cover task, training, and performance require-
ments that may likely be encountered in a military context,
which may be broader than the scope of current experimental
coverage (although military tasks frequently include the experi-
mental tasks as subtasks). Of further interest to the military is
A New Taxonomy for Training 157
relating taxon effects captured by the present taxonomy to the
task taxonomy in the military’s simulation software, Improved
Performance Integration Tool (IMPRINT; Archer et al., 1999) (see
chapter 10). Thus, a further constraint on the taxonomy is that
there be a mapping from the present task taxa to IMPRINT task
taxa.
At the highest level, the taxonomy that was developed involves
a four-dimensional decomposition of the training space. It
includes separate dimensions of classification for task struc-
ture, training procedure, and the context and assessment of
task performance. The training principles that were described
in chapter 2 are considered the fourth dimension of the tax-
onomy. The first three dimensions have been structured in the
present taxonomy as hierarchical classifications, whose values
and relationships are described in this chapter.
An assumption of the decompositional approach is that the
goal of predicting performance for any task can be accomplished
by combining the effects on each performance measure of indi-
vidual training components for all task elements. Accomplishing
this goal would rely on an exploration of the matrix of cells in
the training space defined by the taxa of the three dimensions.
The required work extends well beyond the current state of the
science; however, the space was partially explored by empirical
studies conducted with support from the MURI, and identifica-
tion of current coverage allows for planning of future work.
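The matrix-of-cells idea can be illustrated with a short sketch. In this hypothetical Python fragment, the taxon lists are stand-in abbreviations (the real dimensions are far larger); each empirical study marks one cell of the three-dimensional training space as covered, and the remaining cells form the planning matrix:

```python
from itertools import product

# Illustrative, abbreviated taxa for the three dimensions (placeholders,
# not the full taxonomy described in this chapter).
task_taxa = ["visual detection", "decision making", "fine motor output"]
training_taxa = ["massed practice", "spaced practice", "part-task practice"]
performance_taxa = ["immediate retention", "delayed retention", "transfer"]

# Cells of the training space already covered by hypothetical studies.
covered = {
    ("visual detection", "spaced practice", "delayed retention"),
    ("fine motor output", "massed practice", "immediate retention"),
}

# Enumerate the unexplored cells to build a planning matrix.
unexplored = [cell
              for cell in product(task_taxa, training_taxa, performance_taxa)
              if cell not in covered]

print(len(unexplored))  # 27 cells in the space, 2 covered, so 25 remain
```

In practice the space is far too large to enumerate exhaustively, which is exactly why the chapter treats coverage identification as a planning device rather than a completed program.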
This chapter presents a brief review of approaches to tax-
onomies, emphasizing the first three dimensions, together with
motivation and description of the taxa selected for use in the
present taxonomy. Principles used to select taxa, as well as the
correspondence between the organization of taxa and the phe-
nomena they are meant to capture, are highlighted. After pre-
senting the taxonomy, applications of it to two tasks, a digit
data entry task (e.g., Healy, Kole, Wohldmann, Buck-Gengler, &
Bourne, 2011) and a visual search task (Young, Healy, Gonzalez,
Dutt, & Bourne, 2011), are discussed to illustrate how a taxo-
nomic analysis can facilitate the understanding of task acqui-
sition and performance. A taxonomic analysis using IMPRINT
task taxa and the present training and performance taxa was
performed on all experimental tasks used in MURI supported
research. The analyses were compiled to produce a planning
matrix that shows the current extent to which the training
space has been investigated. The planning matrix can be used
to guide future research. Finally, areas that are identified as
needing further development to enhance taxonomic analysis of
the training space are discussed.
158 William D. Raymond et al.
Task Type
A general definition of a task was given by Miller (1953) to
accommodate the analysis of increasingly complex human
activities. According to Miller, a task is “a group of discrimina-
tions, decisions and effector activities related to each other by
temporal proximity, immediate purpose and a common man-
machine output” (cited in Meister, 1976, p. 96). The definition
can be interpreted as recognizing that tasks involve perceptual
inputs, cognitive processing, and motor responses. From this
starting point, the development of a specific taxonomy of human
tasks has been approached in a variety of ways, including clas-
sifications based on task stimuli, human behavior during task
performance, or human ability requirements (Companion &
Corso, 1982). The approach to classification clearly depends on
the purpose to which a taxonomy is to be put (Gawron, Drury,
Czaja, & Wilkins, 1989).
One class of task taxonomies particularly important in the
fields of human learning and performance began with the
notion that tasks can be analyzed according to their demand
on human abilities (Fleishman, 1978). Roth (1992) proposed a
taxonomy with five broad ability taxa: attentional, perceptional,
psychomotor, physical, and cognitive. As an application of the
taxonomy, empirical data were used by Roth (1992) to relate
the effects of external stressors to each ability taxon. Weighted
decompositions of specific subtasks are then available to predict
stressor effects at the task level.
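Roth’s weighted-decomposition approach amounts to a weighted sum of ability-level effects. The following sketch is purely illustrative (the weights and per-ability stressor effects are invented numbers, not Roth’s data); it uses Roth’s five ability taxa as keys:

```python
# Hypothetical proportional performance decrements that a stressor
# produces in each ability taxon (invented values).
stressor_effect = {"attentional": 0.20, "perceptional": 0.10,
                   "psychomotor": 0.05, "physical": 0.00,
                   "cognitive": 0.15}

# Hypothetical weights giving each ability's contribution to one task.
task_weights = {"attentional": 0.4, "perceptional": 0.3,
                "psychomotor": 0.1, "physical": 0.0,
                "cognitive": 0.2}

# Task-level prediction: the weighted sum of ability-level effects.
predicted_decrement = sum(task_weights[ability] * stressor_effect[ability]
                          for ability in task_weights)
print(round(predicted_decrement, 3))
```

With these made-up numbers the predicted task-level decrement is simply 0.4 × 0.20 + 0.3 × 0.10 + 0.1 × 0.05 + 0.0 × 0.00 + 0.2 × 0.15.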
The current task decomposition, shown in Table 8.1, was built
on taxonomies like the Roth (1992) taxonomy of abilities, intro-
ducing a finer classification of abilities, while keeping the num-
ber of taxa tractable. Taxa were selected principally to capture
the cognitive processing of stimuli. Categorizing information pro-
cessing tasks was considered to be central, because of both the
military’s primary desire to optimize training for the networked
battlefield and the fact that most empirical studies conducted
with MURI support were largely designed to explore cognitive
processing, with concomitant perceptual and psychomotor pro-
cesses. In information processing tasks inputs are initially pro-
cessed using perceptual and attentional abilities. Information is
further synthesized with higher order cognitive processes and
memory, and output responding is planned. Finally, a psycho-
motor response is produced. This sequential processing cycle is
reflected in the hierarchy of the taxonomy.
Table 8.1 The Task Dimension of the New Taxonomy

Perceptual/Attentional Processing
    Visual detection
    Visual discrimination
    Language processing (written)
    Auditory detection
    Auditory discrimination
    Language processing (oral)
    Haptic processing

Cognitive/Affective Processing
    Synthesis
        Executive control/Monitoring
        Memory/Symbolic representation
        Imagery/Visual representation
        Concept formation/Classification
        Reasoning/Problem solving
        Decision making
        Motivation/Affect
    Response planning
        Language planning
        Motor response planning

Physical/Communicative Response
    Manipulation/Fine motor output
    Action/Gross motor output
    Language production

Although the current task taxonomy is sufficiently comprehensive
to decompose laboratory tasks, its use for some Army tasks may
require additional distinctions. New ability taxa could readily be
incorporated into the existing taxonomy. In addition, it may be
desirable to allow for the inclusion of the relative contribution of
each taxon to the performance of a task, which may vary from task
to task and also across training.
The current task taxa are different from the task taxa used for
military simulation in IMPRINT; however, it is possible to estab-
lish a mapping between the present features and the IMPRINT
task taxa, although the mapping is not one-to-one. The map-
ping is shown in Table 8.2.
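Because the mapping is many-to-many, it is naturally represented as a lookup from each new taxon to a list of IMPRINT taxa, where the list may contain several entries or none. A hypothetical Python rendering of a few rows of the mapping:

```python
# Partial, illustrative mapping from new task taxa to IMPRINT task taxa.
# One new taxon can map to several IMPRINT taxa, and some map to none.
to_imprint = {
    "Visual detection": ["Visual"],
    "Auditory detection": [],  # no corresponding IMPRINT taxon
    "Haptic processing": ["Fine motor - discrete",
                          "Fine motor - continuous"],
    "Reasoning/Problem solving": ["Information processing",
                                  "Numerical Analysis"],
}

for taxon, targets in to_imprint.items():
    print(taxon, "->", "; ".join(targets) if targets else "(none)")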

Table 8.2 Mapping between the New Task Taxa and IMPRINT Task Taxa

Visual detection, Visual discrimination -> Visual
Language processing (written) -> Communication (reading & writing)
Auditory detection, Auditory discrimination -> (no corresponding IMPRINT taxon)
Language processing (oral) -> Communication (oral)
Haptic processing -> Fine motor - discrete; Fine motor - continuous
Executive control/Monitoring -> Information processing
Memory/Symbolic representation -> Information processing; Communication (oral); Communication (reading & writing)
Imagery/Visual representation -> Information processing
Concept formation/Classification -> Information processing
Reasoning/Problem solving -> Information processing; Numerical Analysis
Decision making -> Information processing
Motivation/Affect -> (no corresponding IMPRINT taxon)
Language planning -> Communication (oral)
Motor response planning -> Communication (reading & writing); Fine motor - discrete; Fine motor - continuous
Manipulation/Fine motor output -> Fine motor - discrete; Fine motor - continuous
Action/Gross motor output -> Gross motor - light; Gross motor - heavy
Language production -> Communication (reading & writing); Communication (oral)

Training Method

The training dimension must include variables that capture
the method of instruction and the types of activities performed
during learning. The taxonomy adopted here builds on earlier
research on educational methods in the classroom. Berliner
(1983) recognized the need for more rigorous definitions of educational
treatments, and he provided a taxonomy for classroom
activity structures that takes into account variables such as
the roles of students and teachers during instruction, classroom
group size, response and feedback types, and the range and
source of instructional content. The Berliner taxonomy captures
many important training manipulations, but is limited because
of its focus on classroom settings.
A broader perspective of training methods for instructional
systems is captured by Jonassen and Tessmer (1996–1997),
who present a taxonomy of training outcomes and the instruc-
tional and learning strategies and tactics that can be used to
achieve them. Their taxonomy was compiled from a review of
relevant literature. The learning outcomes encompass not only
declarative and procedural (or more broadly structural) knowl-
edge (see chapter 1), but also the acquisition of higher order
cognitive and metacognitive processes, as well as motivational
learning and self-knowledge. The strategies and tactics range
from general objective strategies (e.g., present examples, allow
practice, provide feedback, relate to prior knowledge) to more
outcome-specific approaches that are meant to support the
higher order learning outcomes and that reflect psychological
research on learning (e.g., model cognitive activity, scaffold per-
formance, contextualize instruction, facilitate learner explora-
tion). To facilitate instructional design and evaluation, Jonassen
and Tessmer linked desired outcomes to the strategies appropri-
ate for achieving those outcomes. The measure of instructional
effectiveness is the number of identified required and supportive
tactics that are present in the instructional design.
The strategies and tactics of the Jonassen and Tessmer (1996–
1997) taxonomy cover a broad range of training components,
many of which are concepts important to training design. How-
ever, their taxonomy lacks detail sufficient to specify particular
training regimens. For example, their taxonomy includes taxa
for type of stimulus presentation (“massed practice” or “distrib-
uted practice”), but not the amount of practice; it also includes a
taxon for presentation of information (“present facts”), but does
not cover the medium of instruction (e.g., lecture vs. text). To
meet the goal of quantification of the effects of training method,
a more detailed specification of training methods, which can be
directly related to experimental design, is desired.
There are two major pieces in the decomposition of task learn-
ing in the present taxonomy: pedagogy and practice. Pedagogy
captures in detail the method of task instruction. The peda-
gogy taxa are shown in Table 8.3, along with the values each
parameter may assume. The practice taxa are used to describe
the nature of practice performed during training.

Table 8.3 The New Training Dimension Pedagogy Taxa

Instruction method
    Lecture/Instruction
    Demonstration
    Discovery
    Computer instruction
    Simulation (i.e., interaction with computerized representation of a task)
    Modeling (mimicking = observe and mimic a model performing the task)
Discussion/Question & answer: default = 1-way; 2-way
Immersion (embedded in field context): default = no; yes
Learning location: default = local; remote or “distance learning”
Individualization: default = no; yes (e.g., human or intelligent computer tutoring)
Group training: default = no; group size
Automation: default = no; yes

Practice can be further subdivided into scheduling parameters
(how and when stimuli are presented), task parameters (characteristics
of the training task, especially as it relates to the training goal),
feedback parameters, and training context parameters (speci-
fying additional or competing activities during task training).
The parameter groupings for the practice taxa and the currently
defined parameters within each grouping are shown in Table
8.4. Standard parameter values are indicated as default val-
ues in Tables 8.3 and 8.4, with the range of alternative values
indicated.
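The default-plus-override structure of these parameter tables lends itself to a simple configuration sketch. In this hypothetical Python fragment (the parameter names and values are abbreviated paraphrases of the table entries, not an official specification), a training regimen is specified by overriding only its non-default values:

```python
# A few practice parameters with their default values, paraphrased from
# the taxonomy tables (abbreviated, illustrative subset).
DEFAULTS = {
    "item_repetition": "massed",
    "time_spacing": "no rest",
    "distribution": "mixed",
    "testing": "no testing",
    "feedback": "no",
    "secondary_activity": "none",
}

def specify_regimen(**overrides):
    """Return a full parameter specification: defaults plus overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    return {**DEFAULTS, **overrides}

# A regimen that departs from the defaults only in repetition and testing.
spaced_with_tests = specify_regimen(item_repetition="spaced",
                                    testing="test after each block")
print(spaced_with_tests["item_repetition"], spaced_with_tests["distribution"])
```

The design choice mirrors the tables themselves: a regimen description stays short because anything unstated is assumed to take its default value.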
Evidence for effects of parameters from both groupings on
skill acquisition in a variety of tasks has been demonstrated
in numerous laboratory studies—see Proctor and Vu (2006) for
a review; see also O’Neil (2003) on distance learning; Carpen-
ter, Pashler, Wixted, and Vul (2008) and Szpunar, McDermott,
and Roediger (2008) on testing during training. Corroborative
evidence comes from studies of expert performance. Although
the set of parameter values selected for inclusion in the current
A New Taxonomy for Training 163
Table 8.4 The New Training Dimension Practice Taxa

Scheduling Number of items/trials


Parameters Item difficulty default = unspecified;
difficulty level
Item repetition default = massed;
repetition interval
Time spacing default = no rest; rest
interval
Distribution default = mixed; blocked
Change in spacing default = none;
expansion; contraction
Session (parameters of importance; at least
number of sessions and session spacing)
Testing default = no testing; test
schedule
Overlearning default = no; yes

Task Scope part, e.g., mental


Parameters rehearsal; default =
Practice parameters

whole; supplemental
Deep processing default = no; yes
Mediation (e.g., use default = no; yes
of prior knowledge)
Attentional focus default = no focus;
internal, external
Attentional breadth default = intermediate;
global, local
Stimulus-response default = yes; no
compatibility
Mapping type default = consistent;
varied
Contralateral default = no; yes
training
Time pressure default = no; yes
Stressor default = no; yes

Feedback Presence of default = no; yes


Parameters (response) feedback
Feedback scheduling (relative to items)

Training Distractor default = no; yes


Context Secondary activity default = none;
Parameters simultaneous;
sequential
164 William D. Raymond et al.
taxonomy are intended to allow an analysis of most training
scenarios, additional pedagogy and practice parameters may
be added to the taxonomy as the range of training exploration
expands.

Performance Context and Assessment


Taxonomies of criteria for assessing training outcomes have
been particularly important in assessing the effectiveness of
training programs in the business environment. A simple and
influential taxonomy of assessment criteria (Kirkpatrick, 1987;
see Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997, for
an augmented version of the taxonomy) specifies four catego-
ries of criteria: reactions, learning, behavior, and results. The
category of reactions assesses a trainee’s judgment of training
usefulness, difficulty, and pleasantness. Learning encompasses
all posttest assessments of knowledge and skill, although their
tests most commonly measure declarative knowledge of train-
ing materials. The behavior category captures on-the-job perfor-
mance or behavior. The results category includes measures of
the organizational impact of training.
Of importance to the current research effort from this tax-
onomy are the categories of behavior and learning, that is, mea-
sures of performance on the job (i.e., “in the field”) and of posttest
performance. However, the Kirkpatrick (1987) taxonomy lacks
sufficient detail to apply it to specific posttraining performance
situations. The behavior category does not capture differences
between training and performance environments, which are
known to impact performance. Additionally, the learning cate-
gory in the Kirkpatrick taxonomy leaves unspecified what types
of measures may be necessary to assess training outcomes. The
performance dimension of the current taxonomy incorporates
these two components with separate taxa, of performance con-
text and of performance assessment, but provides greater detail
for both. Performance context covers the conditions of and delay
to posttraining performance, relative to training; performance
assessment specifies appropriate measures of performance.

Performance Context
The performance context component relates the environment
of posttraining performance to the training environment. The
major component of performance context captures the relation-
ship of performance to the items, context, and task encountered
in training. In addition, performance context is concerned with
A New Taxonomy for Training 165
Table 8.5 Decomposition of the Performance Context Dimension of the
New Taxonomy

Transfer parameters
  New items, item order, or item distribution: default = same as
    training; different items, order, or distribution
  New context: default = same as training; different context
  New task: default = same as training; different task
Retention interval: default = none; time since training
Refresher training schedule: default = none; refresher schedule

the time between training and performance and the frequency
of any intervening refresher training prior to performance. The
taxa in the present taxonomy for performance context are shown
in Table 8.5.
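The default-plus-alternative structure of these performance context taxa can be sketched as a small data structure; the class and field names below are illustrative assumptions, not part of the taxonomy itself.

```python
from dataclasses import dataclass, field

# Illustrative encoding of the Table 8.5 taxa: each field defaults to the
# taxonomy's default value; a non-default value marks a transfer, retention,
# or refresher manipulation.
@dataclass
class PerformanceContext:
    items: str = "same as training"    # vs. different items, order, or distribution
    context: str = "same as training"  # vs. different context
    task: str = "same as training"     # vs. different task
    retention_interval: float = 0.0    # time since training (default = none)
    refresher_schedule: list = field(default_factory=list)  # default = none

    def is_transfer_test(self):
        """True if any transfer parameter departs from the training default."""
        return any(v != "same as training"
                   for v in (self.items, self.context, self.task))

# A delayed test with new items but the same context and task:
delayed_new_items = PerformanceContext(items="different items",
                                       retention_interval=7.0)
```

Encoding the defaults this way makes explicit which cells of a performance context match training and which constitute a transfer condition.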

Performance Assessment
Complex training goals can be evaluated using systems designed
to facilitate assessment of the acquisition of knowledge, such
as in the taxonomy of cognitive learning developed by Bloom,
Engelhart, Furst, Hill, and Krathwohl (1956). In their taxonomy,
cognitive learning goals can be arranged in a hierarchy
of knowledge complexity. Mastering any level of the hierarchy
requires mastery of the behaviors in the taxa below it. The lev-
els proposed by Bloom et al. are shown in Table 8.6, along with
methods of assessment for each level.

Table 8.6 The Bloom et al. (1956) Taxonomic Hierarchy for the
Cognitive Learning Domain
Learning Goal Assessment
Knowledge Recall or recognize information
Comprehension Comprehend or interpret information
Application Use information to complete a task
Analysis Distinguish, classify, and relate knowledge
Synthesis Originate and combine ideas
Evaluation Appraise and assess ideas based on standards
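Because mastering any level requires mastery of the levels below it, the hierarchy can be scored cumulatively; a minimal sketch follows (the function name and the pass/fail representation are assumptions for illustration).

```python
# Bloom et al. (1956) levels, ordered from least to most complex.
BLOOM_LEVELS = ["knowledge", "comprehension", "application",
                "analysis", "synthesis", "evaluation"]

def highest_mastered(passed):
    """Return the highest level mastered under the hierarchical rule:
    a level counts only if every level below it was also passed."""
    highest = None
    for level in BLOOM_LEVELS:
        if level in passed:
            highest = level
        else:
            break
    return highest

# A trainee who passes "application" without "comprehension" is credited
# only with "knowledge" under this hierarchical interpretation.
```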
The Bloom et al. (1956) taxonomy focuses on the acquisition
of verbal, or declarative, knowledge and associated behaviors.
Skill performance can generally be objectively assessed in terms
of speed or accuracy of task completion. Separate measures are
needed because speed and accuracy can trade off against each
other in some tasks. In a digit data entry
task, speed and accuracy show different patterns of results;
speed improves with training while accuracy declines (Healy,
Kole, Buck-Gengler, & Bourne, 2004; see chapters 1 and 2).
However, in other scenarios, the opposite pattern might obtain.
Moreover, situations in which training produces improved effi-
ciency of performance (i.e., faster and more accurate responding)
need to be differentiated from those in which it alters only the
speed–accuracy criterion. It is also important to assess perfor-
mance on subcomponents of a task. For example, the response
times for executing the different steps of a digit data entry task
are not always positively correlated, with the typing of digits
slowing down on some digits in order to be faster on subsequent
ones (Healy et al., 2004).
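The case for separate measures can be illustrated with a small computation on invented per-block results: a single combined score would mask the pattern described above, in which speed improves across practice while accuracy declines.

```python
# Invented per-block results for a data-entry-like task: response time (s)
# and proportion correct. The numbers are for illustration only.
blocks = [
    {"rt": 1.20, "acc": 0.97},
    {"rt": 1.05, "acc": 0.95},
    {"rt": 0.92, "acc": 0.93},
]

# Block-to-block changes, computed separately for each measure.
speed_trend = [round(blocks[i]["rt"] - blocks[i + 1]["rt"], 2)
               for i in range(len(blocks) - 1)]
acc_trend = [round(blocks[i + 1]["acc"] - blocks[i]["acc"], 2)
             for i in range(len(blocks) - 1)]

speed_improves = all(d > 0 for d in speed_trend)   # RTs keep dropping
accuracy_declines = all(d < 0 for d in acc_trend)  # accuracy keeps falling
tradeoff_suspected = speed_improves and accuracy_declines
```

Here both trends are monotonic but opposite in sign, so any single averaged score would flatten exactly the diagnostic pattern of interest.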
In some tasks, there is also a necessity to develop some
index of changes in the learner’s cognition during training. For
example, in a binary classification task, Bourne, Raymond,
and Healy (2010) have shown that even when both speed and
accuracy measures show continuous improvement, subjects
use different strategies to guide their responses, often changing
strategies during training, with strategy differences impacting
performance. Measures must be developed to assess changes
in cognitive strategies, because the strategy chosen may impact
speed and accuracy, or even retention and transfer.
Researchers have also expanded the scope of learning out-
comes to include affective or attitudinal learning goals as well as
knowledge and skill acquisition. Drawing on all three areas of
research, Kraiger, Ford, and Salas (1993) proposed a more com-
prehensive taxonomy of learning outcomes, shown in Table 8.7.
They define learning as changes in cognitive, skill-based, and
attitudinal states and discuss how learning in each category
can be measured (see Table 8.7). The Kraiger et al. (1993) classi-
fication forms the basis for the present performance assessment
taxonomy. However, speed and accuracy measures of individual
components can be combined with the different levels to form
a more detailed taxonomy of assessment tests. Having quanti-
fied the outcome of a particular training scenario, the effective-
ness of training can be measured by comparing posttraining
performance with performance before or at the beginning of
Table 8.7 The Kraiger, Ford, and Salas (1993) Classification of
Learning Outcomes and Associated Measures of Assessment

Cognitive Outcomes
  Verbal Knowledge: tests of memory
  Knowledge Organization: probe cognitive structures
  Cognitive Strategies: probe task protocol
Skill-based Outcomes
  Compilation (Proceduralization, Composition): change in performance
  Automaticity: test with interference stimuli or distractors
Affective Outcomes
  Attitudinal: self-report
  Motivational Disposition: self-report with increasing problem
    difficulty

training, using an accepted measure of improvement, such as
the training effectiveness ratio (Wickens & Hollands, 2000).
Performance results can then feed back to further training design
and enhancements to the training regimen.
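One illustrative form of such a comparison is a transfer effectiveness ratio: savings on the criterion task per unit of training invested. The formula below is a common textbook sketch, not necessarily the exact ratio used in any particular study.

```python
def transfer_effectiveness_ratio(control_time, transfer_time, training_time):
    """Illustrative effectiveness measure: savings on the criterion task
    (control group time minus trained group time) divided by the time
    invested in prior training. A ratio above 0 means training paid off."""
    if training_time <= 0:
        raise ValueError("training_time must be positive")
    return (control_time - transfer_time) / training_time

# If untrained controls need 10 h to reach criterion, trained subjects
# need 6 h, and training itself took 5 h, the ratio is (10 - 6) / 5.
ter = transfer_effectiveness_ratio(10.0, 6.0, 5.0)
```

A ratio near zero would indicate that training time bought little savings on the criterion task, the kind of result that should feed back into redesign of the regimen.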

Training Principles
Existing taxonomies put an emphasis on domains and teaching
methods and neglect empirically based principles that impact
performance within those domains and methods. A major con-
tribution of the present new taxonomy is its inclusion of an inde-
pendent dimension identifying those principles. This dimension
is orthogonal to the three others that comprise the new tax-
onomy. Table 8.8 provides a hierarchical listing of the principles,
following the outline of chapter 2. The listing divides the princi-
ples into three major categories relating to (a) resource and effort
allocation, (b) context effects, and (c) task parameters. These
principles have a solid basis in empirical research but might be
differentially effective across task types or domains, training
methods, and performance measures. The full four-dimensional
taxonomy, thus, serves to provide a framework for predicting
training outcomes under various combinations of tasks, meth-
ods, and measures.
Table 8.8 Principles of Training (from Chapter 2)

Principles Relating to Resource and Effort Allocation
  Deliberate Practice
  Depth of Processing
  Generation Effect
  Focus of Attention
  Strategic Use of Knowledge
  Cognitive Antidote to Fatigue, Boredom, or Task Disengagement

Principles Relating to Context Effects
  Procedural Reinstatement
  Functional Task Development
  Part-Task Training
  Easy-Difficult Ordering

Principles Relating to Task Parameters
  Spacing
  Feedback: optimal feedback type; optimal feedback schedule
  Rehearsal: mental versus physical rehearsal; fixed versus
    expanding rehearsal
  Testing
  Overlearning
  Task Difficulty
  Stimulus-Response Compatibility
  Serial Position
  Variability of Practice

Using the Taxonomy


A taxonomic breakdown of task, training, and performance
dimensions provides a way to explore the training space incre-
mentally. For example, by holding the task constant, training
effects can be empirically quantified within many cells in the tax-
onomic space across the training and performance dimensions.
Empirical data for combinations of taxa have been generated
by numerous experiments, and various separate experimen-
tal manipulations have provided speed, accuracy, and strategy
measures of performance for the effects of many training and
performance contexts on task taxa. This section considers the
coverage of the training space defined by the current taxonomy
that has been provided by experiments using two tasks, a simple
number typing task (digit data entry) and a more complex visual
search task (the RADAR task). Additional support and informa-
tion regarding the effects of some taxa have been provided by
computational modeling of these tasks (described in chapters 9
and 10).
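The combinatorial size of this space is easy to see by crossing even a toy subset of taxa with itertools.product; the parameter values below are invented for illustration and are far smaller than the full taxonomy.

```python
from itertools import product

# Toy subset of the taxonomy's dimensions; the real taxa lists are much larger.
task_taxa = ["visual detection", "fine motor output"]
practice = ["massed", "spaced"]
feedback = ["no feedback", "feedback"]
performance = ["immediate test", "delayed test"]

# Every cell of this (toy) training space is one combination of taxa.
cells = list(product(task_taxa, practice, feedback, performance))

# Holding the task constant prunes the space, the incremental strategy
# described in the text: 2 * 2 * 2 = 8 cells instead of 16.
visual_cells = [c for c in cells if c[0] == "visual detection"]
```

Even this four-way toy example yields 16 cells to fill with empirical data, which makes clear why the full space cannot be explored exhaustively.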
Digit data entry is one simple task that has been extensively
used by the investigators to explore the effects of training on
skill acquisition (e.g., Healy et al., 2011). Most basically, the digit
data entry task consists of typing, usually using the number
keypad, a series of four-digit numbers presented visually on a
computer screen. In this form, the task can be broken down,
using the task taxonomy, into four taxa: visual detection (read-
ing numbers from the screen); memory/symbolic representation
(the cognitive representation of each number); motor response
planning (for typing each number); and manipulation/fine motor
output (typing).
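The four-taxon breakdown lends itself to an ordered component-level representation, which is how component-level timing analyses can be organized; the cognitive/motoric tags below are an illustrative assumption for this sketch, not part of the taxonomy itself.

```python
# Ordered component taxa for the digit data entry task, tagged (as an
# assumption for illustration) by whether a timing analysis would treat
# them as cognitive or motoric.
DATA_ENTRY_TAXA = [
    ("visual detection", "cognitive"),               # reading numbers from the screen
    ("memory/symbolic representation", "cognitive"), # representing each number
    ("motor response planning", "cognitive"),        # planning the typing of each number
    ("manipulation/fine motor output", "motoric"),   # typing
]

cognitive = [name for name, kind in DATA_ENTRY_TAXA if kind == "cognitive"]
motoric = [name for name, kind in DATA_ENTRY_TAXA if kind == "motoric"]
```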
Pedagogy in all digit data entry experiments has simply
involved (written) instruction. Practice in all training scenarios
has involved the repeated entry of numbers. However, experi-
ments have explored the effects of varying practice scheduling
parameters, including the number of items, item difficulty (e.g.,
by varying numerical structure or by requiring the arithmetic
generation of the numbers that are to be entered), item repeti-
tion, item distribution, and the number of training sessions.
Various task parameters have also been manipulated, includ-
ing task scope (full typing task vs. mental rehearsal), process-
ing depth (numeral vs. verbal presentation format), processing
mediation (association of numbers with prior knowledge), con-
tralateral training (practice with one hand followed by execution
with the other), and the presence of a physical stressor during
training (hand weights). Additionally, the presence of feedback
has been manipulated, as well as use of a simultaneous second-
ary task (articulatory suppression) and a sequential secondary
task (calculation of the typing termination key at the end of each
digit sequence). Finally, performance context has been varied
from training context in terms of transfer parameters (new vs.
old numbers, mental vs. physical typing task, typing hand, and
typing on keypad vs. number row), posttraining retention inter-
val, and refresher training schedule.
A number of important findings are the result of analyzing
task performance in terms of its component taxa for digit data
entry. Measuring speed and accuracy separately revealed that
these measures show different patterns of results, as noted.
Moreover, different training methods can influence the results
of the measures independently, with, for example, the presence
of a secondary task requirement (the calculation of the typing
termination key) providing a cognitive antidote to the other-
wise observed decline in typing accuracy across practice (Kole,
Healy, & Bourne, 2008; see chapter 2). The scope of practice
(whole task vs. mental rehearsal) has an effect on the transfer
of performance, with mental practice improving retention and
transfer by strengthening an effector-independent representa-
tion (Wohldmann, Healy, & Bourne, 2008; and see chapter 2). A
taxonomic analysis of the digit data entry task has also allowed
us to quantify differential effects of training on individual taxa.
In particular, repeated practice results in faster performance;
however, the rate of improvement differs for the cognitive and
motoric components of the task, with more learning occurring
for the cognitive component than for the motoric component
(Healy et al., 2004). The size of the cognitive and motoric components
was confirmed and quantified by optimizing the IMPRINT
model of digit data entry (see chapter 11).
The RADAR task, developed by Gonzalez and Thomas (2008),
is a visual search task in which subjects look for assigned sym-
bol targets in four squares that move from the four corners to
the center of a radarlike display in a fixed amount of time. Each
search opportunity, as the squares converge to the center, is
called a frame. In each of the seven frames comprising a trial,
different sets of target and distractor symbols may be shown
in the squares, and the target symbols may differ from trial to
trial (with only one target possible per trial). The target memory
set assigned for a trial contains either one or four symbols.
Targets and distractors may be taken from different symbol
sets (consistent mapping) or the same symbol set (varied
mapping). The squares that contain the symbols may also be
blank. Subjects are to respond only if a target in the current
memory set appears in one of the squares, and scoring is based
on both accuracy and correct response speed. The task can be
broken down into six taxa: visual detection (scanning for sym-
bols); memory/symbolic representation (remembering targets
in the memory set); imagery/visual representation (of symbols
seen in a frame); decision making (target or distractor decision);
motor response planning; and manipulation/fine motor output
(button push on detection).
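Scoring on both accuracy and correct response speed can be tallied with standard signal-detection counts (hits, misses, false alarms, correct rejections); the frame representation below is an illustrative assumption, not the actual RADAR scoring code.

```python
def score_frames(frames):
    """Tally signal-detection outcomes for a set of frames.
    Each frame is a tuple (target_present, responded, rt)."""
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    hit_rts = []
    for target_present, responded, rt in frames:
        if target_present and responded:
            counts["hit"] += 1
            hit_rts.append(rt)          # only correct detections contribute RTs
        elif target_present:
            counts["miss"] += 1
        elif responded:
            counts["false_alarm"] += 1
        else:
            counts["correct_rejection"] += 1
    mean_hit_rt = sum(hit_rts) / len(hit_rts) if hit_rts else None
    return counts, mean_hit_rt

# Five invented frames: two hits, one miss, one false alarm, one correct rejection.
frames = [(True, True, 0.62), (True, False, None),
          (False, True, 0.70), (False, False, None), (True, True, 0.58)]
counts, mean_hit_rt = score_frames(frames)
```

Separating the four outcome counts is what allows statements such as "fewer false alarms with practice, but no improvement in detection times" to be made at all.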
Several experiments have explored the RADAR task (e.g.,
Young et al., 2011). Pedagogy in all RADAR experiments involved
(written) instruction. Practice involved repeated searches, with
blocked practice of items varying in difficulty of mapping type
(consistent vs. varied mapping) and processing load (size of the
memory set). Training involved two sessions, and the presence
of both a simultaneous secondary task (concurrent tone count-
ing) and a sequential secondary task (action firing decision) was
manipulated.
Analysis of RADAR experimental results showed that practice
enhanced correct target detection times at delayed test. Analyz-
ing speed and accuracy measures separately showed improve-
ment in target detection accuracy (viz., fewer false alarms)
with practice, but no improvement in target detection times. At
training, both simultaneous and sequential secondary tasks
increased correct response times, and the sequential secondary
task also lowered accuracy (resulting in more missed targets).
The effects on test performance of training with a secondary
task depended on the nature of the secondary task. There was
a detrimental effect on target detection accuracy at test (more
missed targets) of training with the simultaneous secondary
task, but a beneficial effect on target detection accuracy at test
(fewer missed targets) of training with the sequential secondary
task. These results corroborate the proposal that not all added
task difficulty during training enhances task performance at
test; only some difficulties are desirable during training (Bjork,
1994; see also chapter 3).

Possible Expansions to the Taxonomy


One important factor that is known to affect learning but that
is not currently taken into account in the present taxonomy is
individual differences in abilities and backgrounds. Whether or
not practice in a skill makes individuals more similar or more
different depends on the task, on individual differences in abil-
ity, and on individual differences in prior knowledge (Acker-
man, 2007). For example, for tasks that depend on declarative
knowledge, performance levels depend on whether the tasks are
“open” or “closed.” Closed tasks are those that are bounded by
a reasonably finite domain of knowledge, whereas open tasks
are those whose required knowledge grows with increasing
complexity. Thus, for open tasks (but not for closed tasks) there will
be an increasing difference between the levels of the highest
and lowest performing people. For tasks that allow individu-
als to build on existing knowledge, individual differences in
prior knowledge have a larger effect on the acquisition of new
knowledge than do individual differences in working memory
(e.g., see Beier & Ackerman, 2005). Thus, understanding the
effects of individual differences on training ultimately depends
on the identification and effective use of a taxonomy of indi-
vidual differences. As an example, work reported in this volume
(see chapter 6) has indicated that individual differences in gen-
eral intelligence interact with task automation, with a reduced
influence of general intelligence under higher levels of automa-
tion (Clegg & Heggestad, 2010). How individual differences affect
training and interact with other training variables remains to be
fully explored.
Group training is another important area for future work.
Many Army tasks involve the interaction of multiple individuals,
who share in the responsibility of task completion. Individuals
may have complementary or shared skills, with the combination
of these types of individuals, as well as group size, impacting
the complexity of task completion and hence the difficulty of
task training. Shute, Lajoie, and Gluck (2000) provide a discus-
sion of a taxonomy of common group training techniques and
the interaction of techniques with individual differences in abil-
ity, demographics, and background.

Toward Improving Training Effectiveness


As the preceding sections indicate, experimental work
performed as part of the MURI-supported project has provided
empirical data on a substantial number of task,
training, and performance taxa combinations. Taking into
account all experiments increases the number of cells of the
training space for which empirical data have been collected.
To provide a basis for future research planning by the Army,
a matrix of training and performance taxa has been compiled
against the IMPRINT task taxa. The cells of the matrix for which
empirical data have been collected are indicated with the name
of the appropriate experimental task. This planning matrix is
presented in Appendix A.
The number of cells in the taxonomic space defined by the
current taxonomy outlined in this chapter is large, and so at this
time many cells in the taxonomic space lack empirical data from
laboratory experiments that can be used to quantify the effects
of training. It is also important to note that the empirical data
generated for many cells have come from exploration of only a
single task, so that their generality remains to be examined. In
addition, at this point it is not known whether the effects in cells
of the taxonomic space that have been quantified are additive
when task, training, or performance context taxa are combined.
As noted, the effects of individual differences in skill and ability
and the interaction of individuals in group tasks also need to
be taken into account. Exploration of the taxonomic space must
necessarily extend beyond the MURI-supported project. However,
the taxonomic decomposition made possible by the present tax-
onomy affords an approach to evaluating training effectiveness
across tasks, potentially facilitating improved training in the
future.

References
Ackerman, P. L. (2007). New developments in understanding skilled
performance. Current Directions in Psychological Science, 16,
235–239.
Alliger, G. A., Tannenbaum, S. I., Bennett, W., Traver, H., & Shotland,
A. (1997). A meta-analysis of the relations among training criteria.
Personnel Psychology, 50, 341–359.
Archer, R., Walters, B., Yow, A., Carolan, T., Laughery, K. R., & Gillis, P.
(1999). Training as a performance shaping factor in computer gen-
erated forces. Proceedings of the 1999 Computer Generated Forces
Conferences, Orlando, FL.
Beier, M. E., & Ackerman, P. L. (2005). Age, ability and the role of prior
knowledge on the acquisition of new domain knowledge. Psychology
and Aging, 20, 341–355.
Berliner, D. C. (1983). Developing conceptions of classroom environ-
ments: Some light on the T in classroom studies of ATI. Educational
Psychologist, 18, 1–13.
Bjork, R. A. (1994). Memory and metamemory considerations in the
training of human beings. In J. Metcalfe & A. Shimamura (Eds.),
Metacognition: Knowing about knowing (pp. 185–205). Cambridge,
MA: MIT Press.
Bloom, B., Engelhart, M., Furst, E., Hill, W., & Krathwohl, D. (1956).
Taxonomy of educational objectives: The classification of educa-
tional goals. Handbook: Vol. 1. Cognitive domain. New York: Long-
mans, Green.
Bourne, L. E., Jr., Raymond, W. D., & Healy, A. F. (2010). Strategy
selection and use during classification skill acquisition. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 36, 500–514.
Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The
effects of tests on learning and forgetting. Memory & Cognition, 36,
438–448.
Clegg, B., & Heggestad, E. (2010, April). Experiments on levels of auto-
mation, individual differences, and team performance. Paper pre-
sented at the 2010 Ellis-Battig Memory Symposium: Optimizing
the Training of Knowledge and Skills: A Review of Accomplishments
from the Multidisciplinary University Research Initiative (MURI) on
Training, 80th Annual Convention of the Rocky Mountain Psycho-
logical Association, Denver, CO.
Companion, M. A., & Corso, G. M. (1982). Task taxonomies: A general
review and evaluation. International Journal of Man-Machine Stud-
ies, 17, 459–472.
Fleishman, E. A. (1978). Relating individual differences to the dimen-
sions of human tasks. Ergonomics, 21, 1007–1019.
Gawron, V. J., Drury, C. G., Czaja, S. J., & Wilkins, D. M. (1989). A
taxonomy of independent variables affecting human performance.
International Journal of Man–Machine Studies, 31, 643–672.
Gonzalez, C., & Thomas, R. P. (2008). Effects of automatic detection
on dynamic decision making. Journal of Cognitive Engineering and
Decision Making, 2, 328–348.
Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004).
Effects of prolonged work on data entry speed and accuracy. Journal of
Experimental Psychology: Applied, 10, 188–199.
Healy, A. F., Kole, J. A., Wohldmann, E. L., Buck-Gengler, C. J., &
Bourne, L. E., Jr. (2011). Data entry: A window to principles of train-
ing. In A. S. Benjamin (Ed.), Successful remembering and successful
forgetting: A festschrift in honor of Robert A. Bjork (pp. 277–296). New
York: Psychology Press.
Jonassen, D., & Tessmer, M. (1996/97). An outcomes-based taxonomy
for instructional systems design, evaluation, and research. Training
Research Journal, 2, 11–46.
Kirkpatrick, D. L. (1987). Evaluation of training. In R. L. Craig (Ed.),
Training and development handbook: A guide to human resource
development (3rd ed., pp. 301–319). New York: McGraw-Hill.
Kole, J. A., Healy, A. F., & Bourne, L. E., Jr. (2008). Cognitive com-
plications moderate the speed-accuracy tradeoff in data entry: A
cognitive antidote to inhibition. Applied Cognitive Psychology, 22,
917–937.
Kraiger, K., Ford, J. K., & Salas, E. (1993). Application of cognitive, skill-
based, and affective theories of learning outcomes to new methods
of training evaluation. Journal of Applied Psychology, 78, 311–328.
Krathwohl, D. R., Bloom, B. S., & Masia, B. B. (1964). Taxonomy of
educational objectives: The classification of educational goals. White
Plains, NY: Longman.
Meister, D. (1976). Behavioral foundations of system development. New
York: Wiley.
Miller, R. B. (1953). A method for man-machine task development (Tech-
nical Report WADC-TR-53-137). Wright-Patterson A.F.B., Ohio:
Wright Air Development Center.
O’Neil, H. F. (Ed.). (2003). What works in distance learning: Guidelines.
Charlotte, NC: Information Age.
Proctor, R. W., & Vu, K.-P. L. (2006). Laboratory studies of training,
skill acquisition, and retention of performance. In K. A. Ericsson, N.
Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), Cambridge hand-
book of expertise and expert performance (pp. 265–286). Cambridge,
England: Cambridge University Press.
Roth, J. T. (1992). Reliability and validity assessment of a taxonomy for
predicting relative stressor effects on human task performance (Tech-
nical Report 5060-1 prepared under contract DNA001-90-C-0139).
Boulder, CO: Micro Analysis and Design.
Shute, V. J., Lajoie, S. P., & Gluck, K. A. (2000). Individualized and
group approaches to training. In S. Tobias & J. D. Fletcher (Eds.),
Training and retraining: A handbook for business, industry, govern-
ment, and the military (pp. 171–207). New York: Macmillan.
Szpunar, K. K., McDermott, K. B., & Roediger, H. L., III. (2008). Testing
during study insulates against the buildup of proactive interference.
Journal of Experimental Psychology: Learning, Memory, and Cogni-
tion, 34, 1392–1399.
Wickens, C. D., & Hollands, J. G. (2000). Memory, learning, and training.
In C. D. Wickens & J. G. Hollands (Eds.), Engineering psychology
and human performance (3rd ed., pp. 241–284). New York:
HarperCollins.
Wohldmann, E. L., Healy, A. F., & Bourne, L. E., Jr. (2008). A mental
practice superiority effect: Less retroactive interference and more
transfer than physical practice. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 34, 823–833.
Young, M. D., Healy, A. F., Gonzalez, C., Dutt, V., & Bourne, L. E., Jr.
(2011). Effects of training with added difficulties on RADAR detec-
tion. Applied Cognitive Psychology, 25, 395–407.
Appendix A: The IMPRINT Planning Matrix

The matrix crosses the IMPRINT task taxa (rows) with the training
and performance taxa (columns); cells name the experimental
task(s) for which empirical data have been collected. Taxa with no
populated cells (gross motor - light and gross motor - heavy) are
omitted from the row lists below.

Pedagogy
Parameters: Lecture/Instruction; Demonstration; Modeling (observe
and mimic a model performing the task); Immersion (embedded in
actual field situation of the task); Simulation (interaction with
computerized representation of the task); Discovery; Computer
instruction; Learning location (default = local; remote, i.e.,
distance learning); Discussion/Q&A (default = 1-way; 2-way);
Individualization (default = no; intelligent tutoring); Group
training (default = no; group size); Automation (default = no; yes)
Visual: letter detection, data entry, navigation, target finding
  (clockface), fusion, color naming, handwriting symbols, dart
  throwing; radar, tank gunner; navigation
Numerical Analysis: pseudo-arithmetic, fire control (lecture);
  fire control (socratic); radar; fire control; Clegg pasteurizer
Information processing: letter string classification, letter
  detection, data entry, fact learning, mental calculation,
  estimation, reconstruction of order; Clegg pasteurizer, radar;
  letter string classification, time estimation, quantity
  estimation, sequence learning; Clegg pasteurizer
Fine motor - discrete: data entry, sequence learning, navigation,
  fusion; Proctor flight simulator, tank gunner, radar; navigation
Fine motor - continuous: target finding (clockface with mouse
  reversal); Proctor flight simulator; target finding (clockface
  with mouse reversal)
Communication (reading & writing): foreign language learning,
  letter detection, fire control, color naming; fire control
  (socratic); fire control
Communication (oral): navigation; navigation

Practice: scheduling parameters (of items and sessions)
Parameters: Number of items/trials; Item difficulty parameters
(whatever parameters are important; at least difficulty level);
Item repetition (default = unspecified; fixed); Distribution
(default = mixed; blocked); Time spacing (default = no rest;
rest); Change in spacing (default = none; expansion, contraction);
Sessions (number and interval; change in schedule); Testing
Visual: radar (letters/#s, planes); radar (blocks & sessions)
Numerical Analysis: radar (blocks & sessions)
Information processing: radar (varied/consistent mapping, memory
  load), navigation (message length), fusion (distribution); data
  entry (fixed v. massed); logic decision (blocked/mixed), time
  estimation (blocked/mixed), navigation (mixed/blocked length);
  data entry (variable practice), radar (blocks & sessions)
Fine motor - discrete: sequence learning (length, clustering);
  data entry (fixed v. massed); data entry (variable practice)
Communication (reading & writing): foreign language learning
  (blocked v. mixed), coding (easy 1st v. hard 1st); foreign
  language learning (fixed v. expanding items)

Practice: task parameters
Parameters: Scope (part [e.g., mental rehearsal]; default = whole;
supplemental); Deep processing (default = no); Mediation (e.g.,
through prior knowledge; default = no); Attentional focus (default
= no focus; internal, external); Attentional breadth (default =
intermediate; global, local); Stimulus-response compatibility
(default = yes); Mapping type (default = consistent; variable);
Contralateral training (default = no); Time pressure (default =
no); Stressor (default = no)
Visual: letter detection (standard/idiosyncratic mappings); radar;
  symbol copy, dart throwing
Information processing: data entry (whole v. partial v.
  supplemental); data entry (number/words), letter detection
  (standard/idiosyncratic mappings), color naming (word,
  sentence); fact learning (person association), data entry
  (person association); data entry (i/o format), Proctor s-r
  compatibility; radar; data entry (hand weights); memory
  components
Fine motor - discrete: data entry (whole v. partial v.
  supplemental); data entry (i/o format), Proctor s-r
  compatibility; symbol copy, dart throwing; data entry (hand
  weights)
Fine motor - continuous: target finding (mouse reversals); target
  finding (reversals)
Communication (reading & writing): color naming (word, sentence)

Practice: feedback and context parameters
Parameters: Presence of (response) feedback (default = no);
Feedback scheduling (relative to items); Distractor (default = no;
simultaneous, sequential); Secondary activity (default = none;
simultaneous, sequential)
Visual: navigation (noise?)
Numerical Analysis: radar (visual detection); radar (fire decision)
Information processing: data entry, navigation (correct/incorrect;
  immediate v. delayed); time estimation, reconstruction of order,
  radar (tone counting); time estimation (letter counting), data
  entry (articulatory suppression, +/- termination), radar (fire
  decision)
Fine motor - discrete: data entry; sequence learning (tones); data
  entry (articulatory suppression)
Fine motor - continuous: target finding (no reversals; periodic v.
  trial-by-trial)
Communication (oral): navigation; navigation (abbreviated
  responses)

MURI post-training performance context factors
Parameters: New items, item order, item distribution (default =
same as training; different items, order, or distribution); New
context (default = same as training; different from training); New
task (default = same as training; different from training);
Retention interval (default = none; time since training);
Refresher training schedule (default = none; refresher schedule)
Visual: letter detection, target finding; data entry (variable
  practice)
Information processing: navigation (block training, mixed test);
  time estimation (+/- 2ndary task, diff 2ndary task), radar (+/-
  tone counting, fire decision); data entry (typing hand, output
  configuration); mental calculation, time estimation,
  reconstruction of order, letter string classification, memory
  components, data entry; data entry (variable interval)
Fine motor - discrete: data entry (hand, output configuration),
  Proctor s-r compatibility, Proctor flight simulator; data entry
  (variable practice); data entry (variable interval)
Communication (reading & writing): foreign language learning
9 Cognitive Models
of Training Principles
and the Instance-Based
Learning Tool
Cleotilde Gonzalez
Carnegie Mellon University

This chapter reviews computational representations of human
behavior involving three training principles discussed in pre-
ceding chapters (especially chapters 2, 3, and 5): Speed–accu-
racy trade-off attributable to fatigue, training difficulty, and
stimulus-response compatibility. Effects of these three training
principles were modeled using the ACT-R cognitive architec-
ture (Anderson & Lebiere, 1998) and the instance-based learn-
ing (IBL) theory (Gonzalez, Lerch, & Lebiere, 2003). The use of
similar memory principles in all three projects resulted in the
implementation of an IBL tool (Dutt & Gonzalez, 2011), which
provides a computational framework that facilitates building
computational models using ACT-R and IBL theory. The last section
of this chapter summarizes the IBL tool and concludes with the
benefits of using computational representations of learning and
training principles: to develop an understanding of the learn-
ing process in a variety of tasks; to predict learning effects from
training principles; and most importantly, to demonstrate the
generality of computational principles and representations from
the ACT-R architecture and IBL theory.

Cognitive Architectures: ACT-R


In Unified Theories of Cognition, Allen Newell calls for “unifica-
tion” as an aim of science: “positing a single system of mech-
anisms—a cognitive architecture—that operate together to
produce the full range of human cognition” (Newell, 1990, p. 1).
His approach to unification involved a single piece of software
representing a theory about the nature of the human mind.
ACT-R (Anderson & Lebiere, 1998) is an example of a theory that
consists of multiple modules, but that evolved as an integrated
cognitive architecture (Anderson et al., 2004). The modules
correspond to human abilities such as perceptual-motor
processing, declarative memory, and goal management, and they
may help make particular predictions about human behavior
(Anderson et al., 2004).
ACT-R can be conceptualized as a toolbox of cognitive pro-
cesses, and a cognitive model is a particular computational
representation of the processes involved in executing a concrete
task. When a task is to be performed, different cognitive
processes may be required in different sequences, with different
intensities and durations. Thus, a cognitive modeler must make
the difficult decision of which particular sequences are needed.
Up to this point, there has been little
theory to guide the modeler. In reference to models of human
learning that are directly relevant to training principles, how-
ever, there have been at least two approaches: (a) learning by
means of rules (procedural knowledge, strategies) and (b) learn-
ing from particular domain-related events (declarative knowl-
edge, instances; Anderson & Lebiere, 2003). These approaches
may use ACT-R’s symbolic and subsymbolic knowledge repre-
sentations in different ways (Anderson & Lebiere, 1998, 2003).
The symbolic aspects can be declarative, procedural, or both.
Declarative knowledge is represented in chunks, and proce-
dural knowledge is represented in productions (if–then rules).
Subsymbolic elements are the neural-like mathematical mecha-
nisms that manipulate the symbolic representations.
ACT-R affords the modeler considerable freedom in which
approaches to take when developing an accurate representation
of the learning processes involved in a task’s execution. Modelers
can choose to “think of” or discover the strategies that human
beings use in a task, which can be represented in production
rules (strategy-based learning or SBL). These production rules
“compete” according to the values that are provided through a
process of reinforcement learning. In contrast, modelers can also
choose to represent knowledge in instances (i.e., task cues are
represented as slots of chunks), following IBL theory. Thus, IBL
represents the learning process in a generic set of productions
and uses mostly declarative knowledge as the basis for learning.
In IBL, instances are accumulated and retrieved according to a
memory mechanism called activation, which is a function of the
recency, frequency of the use of instances, and their similarity
to the task’s cues. ACT-R models often use a combination of the
SBL and IBL approaches. Furthermore, there are also consider-
able degrees of freedom to decide what parameters and subsym-
bolic mechanisms are used to “fit” a model to human data.
The IBL theory attempts to provide a framework of the cogni-
tive processes involved in making decisions from experience in
dynamic tasks (Gonzalez et al., 2003). As will become clear in
the rest of the chapter, this framework helps reduce the degrees
of freedom involved in modeling by adopting one particular per-
spective of learning from experience and exploration, which can
be applied to a broad range of dynamic and repeated choice
tasks. The models of the training effects illustrated in this
chapter utilized IBL theory and other proposed ACT-R learning
mechanisms. IBL theory will be introduced next, and then computational
models of three training principles will be summarized: speed–
accuracy trade-off, training difficulty, and stimulus-response
compatibility. Examples of modeling these three training princi-
ples will highlight the common and robust mechanisms of expe-
riential learning used in IBL theory. The chapter will conclude
with the presentation of the IBL tool, an easy-to-use compu-
tational approach that facilitates and frames the choices that
modelers can take with IBL theory.

Instance-Based Learning Theory


IBL theory (Gonzalez et al., 2003; Gonzalez & Dutt, 2011) pro-
poses a particular learning process free of fabricated and spe-
cific strategies, as well as concrete guidelines for the symbolic
representation of information. The theory also uses a subset of
subsymbolic learning mechanisms developed in and adapted
from ACT-R. Thus, developing models that follow IBL theory
reduces the number of decisions a modeler must make.
IBL theory was initially proposed to demonstrate how learn-
ing occurs in dynamic decision-making tasks (Gonzalez et al.,
2003). An IBL model was implemented within the ACT-R archi-
tecture, and it was demonstrated how IBL theory and ACT-R
parameters and processes were needed to account for human
decision making in a complex task. IBL theory has more recently
been used in other types of tasks, including simple binary choice
tasks (Lebiere, Gonzalez, & Martin, 2007; Lejarraga, Dutt, &
Gonzalez, in press), two-person game theory tasks (Gonzalez &
Lebiere, 2005), and other dynamic control tasks (Martin, Gon-
zalez, & Lebiere, 2004).
An instance in IBL theory is a triplet containing the cues that
define a situation, the actions that define a decision, and the
expected or experienced value from an action in such a situa-
tion. Simply put, an instance is a concrete representation of the
experience that a human acquires in terms of the task situation
encountered, the decision made, and the outcome (feedback)
obtained in the task. A modeler following the IBL theory must
define the structure of a situation-decision-value instance. IBL’s
generic decision making process involves the following steps:
Recognition (comparison of cues from the task to cues from
memory); Judgment (the calculation of a decision’s possible util-
ity in a situation, either from past memory or from heuristics);
Choice (the selection of the instance containing the highest util-
ity); Execution (the act of making a decision based upon the cho-
sen instance); and Feedback (the modification of the expected
utility defined in the judgment process with the experienced
utility after receiving the outcome from a decision made).
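These five phases can be sketched in code. The following is a minimal illustration of the situation-decision-utility loop, not the published model: the judgment heuristic and the task environment are supplied by the caller, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    """A situation-decision-utility triplet."""
    situation: tuple   # the cues that define a situation
    decision: str      # the action that defines a decision
    utility: float     # the expected or experienced outcome

def ibl_step(memory, situation, actions, judge, execute):
    # Recognition: gather past instances whose cues match the situation
    matches = [m for m in memory if m.situation == situation]
    # Judgment: estimate each action's utility from memory or a heuristic
    utilities = {}
    for a in actions:
        past = [m.utility for m in matches if m.decision == a]
        utilities[a] = sum(past) / len(past) if past else judge(situation, a)
    # Choice: select the action with the highest estimated utility
    best = max(utilities, key=utilities.get)
    # Execution: act in the task and observe the outcome
    outcome = execute(best)
    # Feedback: store the experienced utility as a new instance
    memory.append(Instance(situation, best, outcome))
    return best, outcome
```

Each call runs one trial; as instances accumulate, judgment by heuristic is gradually replaced by retrieval of experienced utilities.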
In making a choice, the IBL theory selects the alternative
with the highest blended value, V (Gonzalez & Dutt, 2011; Lejar-
raga et al., in press) resulting from all instances belonging to an
alternative. The blended value of alternative j is defined as

$$V_j = \sum_{i=1}^{n} p_i x_i \qquad (1)$$

where x_i is the value of the observed outcome in the outcome slot
of an instance i corresponding to the alternative j, and p_i is the
probability of that instance's retrieval from memory. The blended
value of an alternative (its utility) is the sum of all observed
outcomes x_i of corresponding instances in memory, weighted by
their probability of retrieval. In any trial t, the probability of
retrieving instance i from memory is a function of its activation
relative to the activation of all other instances corresponding to
that alternative, given by

$$P_{i,t} = \frac{e^{A_{i,t}/\tau}}{\sum_{j} e^{A_{j,t}/\tau}} \qquad (2)$$

where τ is random noise defined as τ = σ × √2 and σ is a free noise
parameter. Noise in Equation 2 captures the imprecision of
recalling instances from memory.
The activation of each instance in memory depends upon the
Activation mechanism originally proposed in the ACT-R archi-
tecture (Anderson & Lebiere, 1998). For each trial t, the Activa-
tion Ai,t of instance i is:

$$A_{i,t} = B_i + \sum_{j} W_j S_{ji} - D_i + \varepsilon \qquad (3)$$

The activation Ai of an instance i reflects how likely the instance
would match a task cue at the current point of time, and the
probability and speed of retrieval of that instance (Anderson &
Lebiere, 1998). The activation is determined by the base-level
activation Bi, the associative activation Si, the mismatch penalty
value Di, and noise. The base-level activation Bi of the instance
i reflects the recency and frequency of that instance’s use. Si
reflects the impact of contextual values on the instance’s acti-
vation, and Di is the degree to which the instance matches a
context (i.e., the extent to which a given instance is similar to
previously presented instances in each S slot). The noise param-
eter  is a variable value associated with an instance. See detail
information regarding each of the terms of the activation equa-
tion in Anderson and Lebiere (1998) and in Gonzalez, Best,
Healy, Kole, and Bourne (2011).
Si is defined by the sum of the source activation that an
instance receives from the elements currently in attention (i.e.,
the task cues). Wj represents the attentional weighting of each
element’s j cues that are part of the current goal instance, and
the Sji component represents the strengths of association that
measures how often the instance i is needed when cue j is an
element of the goal. ACT-R assumes that there is a limited total
amount of attention (W, the sum of all Wj) that can be
distributed over source objects. W is an ACT-R parameter that
reflects the salience or attention given to an instance’s cues.
This salience helps create a contrast between relevant and irrel-
evant cues for the current goal that will help in maintaining
information necessary for task performance. Thus, W influences
the maintenance and prioritization of goals, attention to rele-
vant and irrelevant information, and the amount of concurrent
processing (Lovett, Reder, & Lebiere, 1999). Higher values of W
facilitate the retrieval process by increasing spreading activa-
tion, whereas lower values reduce activation and increase the
likelihood of retrieving incorrect items.
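Equations 1 and 2 can be illustrated numerically. The sketch below uses made-up activation and outcome values together with the temperature τ = σ × √2 defined above; it shows how the more strongly activated instance dominates the blended value.

```python
import math

def retrieval_probs(activations, sigma=0.25):
    """Equation 2: Boltzmann probability of retrieving each instance,
    with temperature tau = sigma * sqrt(2)."""
    tau = sigma * math.sqrt(2)
    weights = [math.exp(a / tau) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]

def blended_value(outcomes, activations):
    """Equation 1: observed outcomes weighted by retrieval probability."""
    return sum(p * x for p, x in zip(retrieval_probs(activations), outcomes))

# Two instances of one alternative: an outcome of 4 stored in a highly
# activated instance, an outcome of 0 in a weakly activated one.
v = blended_value([4.0, 0.0], [1.2, -0.8])
```

Because retrieval probability rises steeply with activation, the blend lands close to the outcome of the better-activated instance.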

Computational Models of Three Training Principles

This section describes three examples of cognitive models devel-
oped to demonstrate the learning processes involved in three
training principles (see chapters 2, 3, and 5): speed–accuracy
trade-off attributable to fatigue, training difficulty, and stimu-
lus-response compatibility. In each project, behavioral results
are used from human experiments and human performance is
compared to the results produced by the computational mod-
els in the same tasks. Furthermore, all three projects involve
fitting human data and predictions on new, unknown condi-
tions, which demonstrate one of the most important benefits of
cognitive modeling. Note that a parallel modeling effort using
IMPRINT, rather than ACT-R, is reported in chapter 10 for the
first two training principles, and the two sets of models are eval-
uated and compared in chapter 11.

Models of Speed-Accuracy Trade-Off
in a Data Entry Task
Fatigue often results from prolonged work and is manifested as
a deterioration in performance that occurs alongside skill acquisition (see
chapter 2). On one hand, fatigue effects might be attributed to
limitations of cognitive processes such as attention. For exam-
ple, some models assert that cognitive resources are needed
during task performance and that there is a limited amount to
expend in the task (Wickens, 1984; see also chapter 4). Thus,
monotonous and prolonged perceptual processing depletes this
pool of resources, making it harder to maintain attention (Para-
suraman, 1986) and often resulting in habituation (Mackworth,
1969). On the other hand, fatigue effects might be explained
with arousal theories, which argue that performance decre-
ments are due to the lack of stimulation needed to maintain
alertness (Ballard, 1996). Often, sustained repetitive tasks are
boring (Hoffman, Sherrick, & Warm, 1998), which produces
decreases in arousal (Mackworth, 1969).
Gonzalez et al. (2011) presented a cognitive model represent-
ing the cognitive aspects of fatigue (e.g., attention) and fatigue
itself as an arousal process. This was a model that followed
initial work on fatigue modeling (Fu, Gonzalez, Healy, Kole,
& Bourne, 2006; Gonzalez, Fu, Healy, Kole, & Bourne, 2006).
The cognitive model was developed in the ACT-R architecture
to represent the behavioral pattern observed in a number of
experiments incorporating extended task performance, which
resulted in both beneficial and deleterious performance effects
(Healy, Kole, Buck-Gengler, & Bourne, 2004; Kole, Healy, &
Bourne, 2008). Beneficial effects, demonstrated as a decrease in
response latency over time, resulted from general skill acquisi-
tion and from specific learning or repetition priming attributable
to the repeated occurrence of stimuli and responses. Deleterious
effects, demonstrated as an increase in errors over time, have
causes that are less clear, but might be attributed to fatigue or
fatiguelike processes such as boredom, task disengagement, or
loss of attention that builds across trials.
Following previous work (Jongman, 1998), Gonzalez et
al. (2011) developed an ACT-R model of mental fatigue where
both arousal and cognitive factors influence performance. The
task was data entry, which required subjects to read a four-
digit number and then type it on the computer. Two laboratory
experiments were examined using the data entry task reported
in Healy et al. (2004). Some of the results from this effort that
involved comparisons of model predictions to human behavior for
average response time and the proportion of correct responses
are summarized. Figure 9.1 shows that the ACT-R model was
able to capture the primary observation by Healy et al. (2004):
that prolonged work resulted in both learning and fatigue
effects, with learning effects dominating the speed measure and
fatigue effects dominating the accuracy measure. Gonzalez et
al. (2011) showed that prolonged work effects are captured by
the combination of arousal and cognitive factors corresponding
to two ACT-R sub-symbolic parameters in combination with the
production compilation mechanism.

[Figure: two panels plotting the model prediction against human data across Blocks 1–10 of Experiment 1: total response time in seconds (top) and accuracy as proportion correct (bottom).]
Figure 9.1 Experiment 1 data from Healy et al. (2004) and ACT-R
model fits to the data from Gonzalez et al. (2011).
Models of Training Difficulty Principle
in the RADAR Task
The training difficulty principle (see chapters 2 and 3) predicts
that conditions that cause difficulty during learning would facil-
itate later retention and transfer. This principle was tested in a
RADAR target detection and decision-making task (Gonzalez &
Thomas, 2008), using laboratory experiments where, in some
cases, the potential targets were nine military vehicles (e.g.,
submarine, helicopter, and jeep) (Young, Healy, Gonzalez, Dutt,
& Bourne, 2011).
The goal in RADAR is to detect and eliminate hostile enemy
targets by visually discriminating moving targets among mov-
ing distractors. RADAR is similar to military target visual detec-
tion devices, in which a moving target needs to be identified as
a potential threat or not and a decision is made on how to best
destroy that target. The task requires the participant to make
both visual and memory searches. The participant must memo-
rize a set of targets and then seek out one or more targets on
a radar grid. A target threat may or may not be present among
a set of moving blips. The blips—in the form of potential tar-
gets or blank masks—begin at the four corners of the grid and
approach the center at a uniform rate. Detection of an enemy
target must occur before the blips collapse in the center.
Models of the training difficulty principle in the RADAR
task were developed under two perspectives, the IBL and SBL
approaches, and compared (Gonzalez, Dutt, Healy, Young, &
Bourne, 2009). The goal of the model comparison effort was to
understand the processes by which behavior is represented, the
constraints that the different approaches impose upon the task
models, and the comparison of the two approaches’ theoretical
assumptions (Lebiere, Gonzalez, & Warwick, 2009).
The IBL model was based upon the IBL theory as presented
above. The SBL model used four concrete strategies that varied
in their effectiveness at performing the target detection task.
One strategy was an optimal strategy, and three strategies were
suboptimal. These strategies represented practically feasible
ways to go about the task. The utility learning mechanism in
ACT-R (Anderson et al., 2004) was used, by which the different
strategies compete using a reinforcement learning algorithm.
This algorithm produces a gradual transition from the subop-
timal to the optimal strategies. When the model executes, there
is a competition set up between the three suboptimal strategies
and the optimal one. Although the suboptimal strategies are
executed more often initially, the optimal strategy later picks up
in usage because of its increased utility through repeated posi-
tive rewards.
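This competition can be sketched with ACT-R-style utility learning, where each use of a strategy moves its utility toward the reward received, U ← U + α(R − U), and selection is greedy over noisy utilities. The learning rate, noise, and reward values below are illustrative, not those of the actual SBL model.

```python
import random

def utility_learning(rewards, n_trials=500, alpha=0.2, noise_sd=1.0, seed=7):
    """Strategies compete via utility learning: after strategy i is used
    and rewarded with rewards[i], U_i <- U_i + alpha * (rewards[i] - U_i)."""
    rng = random.Random(seed)
    utilities = [0.0] * len(rewards)
    for _ in range(n_trials):
        # Noisy-greedy selection: pick the strategy whose utility plus
        # Gaussian noise is highest, allowing early exploration.
        choice = max(range(len(rewards)),
                     key=lambda i: utilities[i] + rng.gauss(0, noise_sd))
        utilities[choice] += alpha * (rewards[choice] - utilities[choice])
    return utilities

# Three suboptimal strategies (reward 2) and one optimal strategy (reward 10):
final = utility_learning([2.0, 2.0, 2.0, 10.0])
```

The optimal strategy is chosen rarely at first but, once sampled, its growing utility makes it dominate, producing the gradual transition described above.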
The SBL and IBL models were compared along two different
dimensions: (a) fit: how well each model fits human learning
data in the task; and (b) adaptability: how well a model that has
reproduced the way human beings learned in one task scenario
behaves in new scenarios that are similar to or different from
the training condition. The fit criterion is com-
mon in model comparisons, whereas the adaptability criterion
is relatively new (Gluck, Bello, & Busemeyer, 2008). The adapt-
ability criterion used here is similar to the generalization crite-
rion method (Busemeyer & Wang, 2000), which divides observed
data into two sets: a calibration or training set to estimate model
parameters, and a validation or test set to determine predic-
tive performance. However, the models’ adaptability was further
tested by examining their ability to adapt to new test condi-
tions that are either similar to or different from the training
conditions.
Figure 9.2 presents the average times for correct responses
during the training phase, in four conditions that varied in the
difficulty of target detection (Young et al., 2011). The 1+1 condi-
tion indicated the need to memorize one target and the presence
of only one item on the RADAR screen. Thus, this was the easi-
est condition. The 4+4 condition indicated the need to memo-
rize four targets and the presence of four items on the RADAR
screen, making it the most difficult condition. The mappings of


Figure 9.2 Average correct response times (ms) for CM 1+1, VM 1+1,
CM 4+4, and VM 4+4 blocks in human data and SBL and
IBL models during training. The error bars show 90% con-
fidence intervals.
the targets were either consistent (the target was always a target
within a block of trials) or varied (the target was sometimes used
as a distractor on a different trial within the same block).
As shown in Figure 9.2, both the IBL and SBL models fit the
human data quite well, RMSD = 69 ms for IBL and RMSD = 163
ms for SBL. However, the SBL model seems to generate generally
higher time values compared to human data, and it has a higher
RMSD. This difference may be because the four strategies in
the SBL model execute productions in a fi xed time (50 ms per
production). There is also no speedup in the correct response
times due to this fi xed strategy execution time, whereas the IBL
model speeds up on account of activation-retrieval time accel-
eration. The retrieval time decreases if the instances’ activation
increases over blocks (Anderson & Lebiere, 1998). It is also clear
from Figure 9.2 that both models take more time in the more
difficult (4+4) blocks than in the easier blocks (1+1) for both con-
sistent and varied target mappings. This finding demonstrates
the effects of workload well known from behavioral studies of
automaticity (Gonzalez & Thomas, 2008), which result from the
extra time taken to process additional items.
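The fit statistic reported here is straightforward to compute; a small sketch follows (the per-condition means are invented for illustration, not values from the experiment):

```python
import math

def rmsd(model, human):
    """Root-mean-square deviation between model and human means,
    the fit measure reported for the IBL and SBL models."""
    assert len(model) == len(human)
    return math.sqrt(sum((m - h) ** 2 for m, h in zip(model, human))
                     / len(model))

# Hypothetical per-condition mean correct RTs (ms):
human = [600.0, 1400.0, 700.0, 1500.0]
model = [650.0, 1350.0, 740.0, 1460.0]
fit = rmsd(model, human)
```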
Figure 9.3 demonstrates the effects of added difficulty in the
task (Young et al., 2011). In the “Tone” condition, participants
were required to count deviant tones (low and high frequency)
among standard tones (medium frequency) playing in the back-
ground during the target detection task. As shown by both mod-
els, the tone condition takes slightly more time to process than
silent trials because of an extra auditory production in both


Figure 9.3 Average correct response times (ms) for silent and tone
conditions for human data and SBL and IBL models during
training. The error bars show 90% confidence intervals.
models that processes the tones. Again, the difference between
the time in the SBL model and human data is greater than the
difference between the time in the IBL model and human data.
The SBL model has no activation-retrieval speedup to compen-
sate for time spent in tone counting, whereas there is such a
speedup in the IBL model that reduces the overall time.
To test the adaptability of both models, transfer was com-
pared from difficult to easier conditions (tone-to-silent) and easy
to more difficult conditions (silent-to-tone). Figure 9.4 shows
these results. The SBL model has an RMSD = 160 ms when
it is trained in tone and transferred to silent, whereas the IBL

Figure 9.4 Left panel: Average correct response times (ms) for human
data and SBL and IBL models for training in the tone and
testing in the silent condition. Right panel: Average cor-
rect response times (ms) for human data and SBL and IBL
models for training in the silent and testing in the tone
condition. The error bars show 90% confidence intervals.
model’s RMSD = 50 ms. The SBL model’s RMSD when trained in
silent and transferred to tone is 248 ms, whereas the RMSD for
the IBL model is 62 ms. Thus, one can conclude that both mod-
els are quite good according to the adaptability criterion, but
the IBL model produces values closer to the human data than
the SBL model. Although the numerical values are important,
the IBL model has other advantages over the SBL model not
shown in measurements: the changes in environmental condi-
tions are captured in the instances stored and retrieved from
memory, whereas the SBL approach is blind to those changes.
The SBL model continues applying the same strategies at test,
which might not be as effective as they were during training
once the task conditions change. Also, the strategies in dynamic
situations are often unknown a priori or difficult to define at all.
Human beings are often unable to explain any rules or strate-
gies used to solve a dynamic problem. Thus, the IBL approach is
more appropriate for modeling dynamic decision making (Gon-
zalez et al., 2003) than the SBL approach.

Models of Stimulus-Response Compatibility


The stimulus-response compatibility (SRC) training principle
and the Simon effect, as discussed in both chapters 2 and 5,
can be modeled using IBL theory (Dutt, Yamaguchi, Gonzalez,
& Proctor, 2011; Yamaguchi, Dutt, Gonzalez, & Proctor, 2011).
The SRC effect is characterized by faster responses when the
stimulus and response locations correspond than when they do
not. The effect is so robust that it is found even when stimu-
lus location is irrelevant to the task, a variation known as the
Simon Effect (Simon, 1990). Both SRC and Simon effects occur
for visual and tactile stimuli, verbal and nonverbal symbols that
convey location information (e.g., location words; Proctor, Yama-
guchi, Zhang, & Vu, 2009), a variety of response modes (e.g., a
steering wheel), and in more complex tasks such as flight opera-
tions (Yamaguchi & Proctor, 2006).
A dominant cognitive explanation of the faster RT with com-
patible stimuli and responses is the dual-route account (Proc-
tor & Vu, 2006), which assumes two distinct response-selection
processes characterized as direct and indirect routes. The indi-
rect route is presumed to activate a response based on the inten-
tions created through the instructed stimulus-response (S-R)
mappings. In contrast, the direct route is presumed to activate
automatically a response corresponding to the stimulus loca-
tion, which facilitates response when it is correct but interferes
when it is incorrect. Recent findings that the RT speedup can
be attenuated or even reversed in mixed-task conditions suggest
that the response-selection process that gives rise to these
effects is not as purely automatic (e.g., unconditional and inde-
pendent of task goals) as it is often described in the literature.
What is missing in the current literature is an account of the
learning mechanism(s) that produce(s) the observed phenomena.
Dutt et al. (2011) provide an explanation of the observed
learning effects using a computational model based on IBL the-
ory. The IBL theory has both the direct (automatic) and the indi-
rect (controlled) routes to model human performance in tasks
where there is a slow transition from the indirect to the direct
route over time. The presence of both routes enables IBL theory
to explain how the cognitive processes are used, how SRC and
Simon tasks become automatic, and how the performance can
be captured when SRC and Simon tasks are intermixed, and to
predict the behavior in novel task-mixing conditions. Further-
more, the results of the IBL model were compared to the human
data in sequential trials for mixed Simon and SRC tasks, when
the compatible (corresponding) or incompatible (noncorrespond-
ing) mapping repeats or switches in a SRC (Simon) task and
when the Simon or SRC task repeats or switches.
In IBL theory, learning occurs through a progressive accu-
mulation of decision instances in memory and by gradually
moving from an exploration phase, where more explicit rules
of action are used (the indirect route) to an exploitation phase,
where instances retrieved from memory are used. This latter
phase involves implicit recognition of familiar patterns and spe-
cific retrievals from memory, similar to the gradual process pro-
posed in Logan’s (1988) instance theory of automaticity. Thus,
an IBL model starts within the indirect route, predicting the
application of an action rule. The process then moves to the
direct route, such that an instance is retrieved from memory to
make a response. Under the direct route, if the task is SRC and
the mapping is compatible, an instance closest in similarity to
the task and mapping is retrieved from the memory. Because
IBL theory works by retrieval of past experiences in the form of
instances, a decrease in RT is expected when task and mapping
repeat, compared to when either task, mapping, or both switch
in mixed SRC and Simon tasks. This result occurs because the
retrieval of a past instance is faster when it has been performed
recently (recency effect) and/or frequently (frequency effect)
under the direct route.
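These recency and frequency effects fall out of the base-level term of the activation equation: each past use of an instance contributes a decaying amount of activation, and retrieval latency shrinks as activation grows. A small numerical sketch (the decay and latency-scale values are illustrative):

```python
import math

def base_level(uses, now, d=0.5):
    """Base-level activation: ln of summed, decayed traces of past uses;
    recent and frequent use yields higher activation."""
    return math.log(sum((now - t) ** -d for t in uses))

def retrieval_time(activation, latency_factor=1.0):
    """Retrieval latency falls exponentially as activation rises."""
    return latency_factor * math.exp(-activation)

# A mapping used on the last three trials vs. one used once, long ago:
repeated = base_level([7.0, 8.0, 9.0], now=10.0)
switched = base_level([2.0], now=10.0)
```

The repeated mapping ends up with higher activation and therefore a faster retrieval, which is the RT benefit for repetitions over switches.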
The discussion in Dutt et al. (2011) shows that the calibrated
model is able to explain the RT observed in a human experi-
ment. Furthermore, the same model without modification is

Figure 9.5 The IBL model’s fits to human data in different mappings
of the SRC and Simon tasks. The error bars show 95%
confidence intervals around the point estimate.

used to generate predictions in novel mixed task conditions. The
model prediction's fit to human data reveals the role of recency
and frequency in the mixed-task paradigm. Figure 9.5 shows
that the IBL model results are very close to those of human par-
ticipants. These fits capture the RTs in four different task trials,
and the sequential task and mapping trials. The model fits were
generally good with respect to practice and sequential effects in
two experiments, suggesting that it provides a good account of
the performance in mixed SRC/Simon tasks.

Making Computational Modeling Easy: IBL Tool


Although “unification” as a scientific goal for the cognitive sci-
ences is commendable (Newell, 1990), the representation of a
full range of human behavior has proven to be a very complex
challenge. Current cognitive architectures that embrace this
unification goal are rare (but ACT-R is an exception). The unifi-
cation goal has turned architectures into very complex systems
that are often incomplete and difficult to use.
IBL is not the basis of a unified theory of human behavior. It
is only a theory of dynamic decision making. Yet, it has shown
robustness across a wide diversity of tasks that vary in their
dynamic demands (see Gonzalez & Dutt, 2011, and Lejarraga et
al., in press, for concrete discussions and demonstrations), and
the examples shown in this chapter add to these demonstrations.
This section presents a way in which cognitive modeling and
the reuse of IBL theory as a whole can be facilitated: the cre-
ation of a simple-to-use tool that represents the unification and
constraints of IBL theory. The tool is built upon the ACT-R’s sub-
symbolic mechanisms needed for IBL theory. The construction of
a modeling tool will help demonstrate that IBL theory can make
general predictions across many diverse tasks, rather than creating
multiple, task-dependent models. It will also make the theory
more accessible to the community of cognitive modelers and
psychologists (a free copy of the tool is available at http://www.
cmu.edu/ddmlab/). The tool is motivated and explained further
in Dutt and Gonzalez (2011).
Figure 9.6 shows the architecture of the IBL tool, the step-
by-step processes from the theory and the interaction with an
“Environment,” a task for which a model is developed. The IBL
tool is an easy-to-use graphical user interface that uses a com-
mon mechanism of network communication between two com-
puter applications (i.e., socket communication) to communicate
remotely with the task. The tool allows the situation (S) and
feedback (U) cues to be retrieved from the task environment,
and the processed decisions (D) to be sent from the model to
the task. Thus, the task may be developed in any programming
language. Using socket interfaces for task communication is the
only technical requirement for using the tool.
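This exchange can be sketched in a few lines. The newline-delimited JSON protocol and the function names below are invented for illustration — they are not the IBL tool's actual message format — but any task able to pass situation (S), decision (D), and feedback (U) messages over a socket in some agreed format could be driven this way.

```python
import json
import socket
import threading

def run_task(conn):
    """Hypothetical task environment: sends a situation cue (S),
    waits for the model's decision (D), then returns feedback (U)."""
    conn.sendall((json.dumps({"S": {"option_count": 2}}) + "\n").encode())
    decision = json.loads(conn.makefile().readline())
    # Toy payoff rule: outcome 1 for choosing option 0, otherwise 0.
    u = 1 if decision["D"] == 0 else 0
    conn.sendall((json.dumps({"U": u}) + "\n").encode())
    conn.close()

def run_model(conn):
    """Hypothetical model side: reads S, sends D, reads U."""
    f = conn.makefile()
    situation = json.loads(f.readline())["S"]
    conn.sendall((json.dumps({"D": 0}) + "\n").encode())  # always pick option 0
    feedback = json.loads(f.readline())["U"]
    conn.close()
    return situation, feedback

# The two endpoints could be separate programs in different languages;
# a socketpair stands in for a localhost connection here.
task_end, model_end = socket.socketpair()
t = threading.Thread(target=run_task, args=(task_end,))
t.start()
situation, feedback = run_model(model_end)
t.join()
```

Because only the socket boundary matters, the task side could just as well be written in Java or C so long as it speaks the same line protocol.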

[Figure: the Instance-Based Learning Tool, with five phases — Recognition, Judgment, Choice, Execution, and Feedback (utility update) — drawing on memory, instance retrieval or heuristics, and thresholds (retrieval, utility, choice), and exchanging situation, decision, and feedback information with the task Environment.]

Figure 9.6 The IBL tool with five distinct IBL theory phases (Right)
and the task Environment (Left).
196 Cleotilde Gonzalez
The IBL tool takes a modeler step-by-step through the dis-
tinct process of IBL theory, making it easy to understand and
intuitive for new modelers. Most importantly, these steps are a
generic decision-making process that does not depend on the
modeler’s creativity to define (often complicated) decision-mak-
ing strategies. Research has demonstrated that this process is
generic enough to model most forms of decisions from experi-
ence (Gonzalez & Lebiere, 2005; Gonzalez et al., 2003; Lejar-
raga et al., in press). But the most relevant contribution of the
IBL tool is to make the theory more accessible. The IBL tool
makes it shareable, by bringing the theory closer to the end
users; generalizable, by making it possible to use in different
and diverse tasks; understandable, by making it easy to use
in cognitive models; robust, by abstracting the specifics of its
implementation independent of any specific programming lan-
guage; communicable, by making the tool interact more easily
and in a more standard way with tasks; and usable, by making
the theory more transparent.
A step-by-step demonstration of building a cognitive model
in the IBL tool for a particular task (the Iowa Gambling Task)
is explained in Dutt and Gonzalez (2011). Once a modeler has
defined the model’s parameters and instance structure, the tool
can simulate a number of model participants by connecting it to
the task using well-known computer communication standards.
These simulations provide the model’s predictions regarding
human behavior and performance in the task of interest.
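As a rough sketch of the generic process the tool automates, the following illustrates instance-based choice by blending, in the spirit of the ACT-R subsymbolic mechanisms on which IBL theory builds (Gonzalez, Lerch, & Lebiere, 2003). The parameter values, toy memory contents, and simplifications (noise off by default; Boltzmann weighting over activations) are illustrative only, not the tool's implementation.

```python
import math
import random

DECAY = 0.5          # base-level decay d (the ACT-R default)
TEMPERATURE = 0.25   # blending temperature (illustrative value)

def activation(use_times, now, noise_s=0.0):
    """Base-level activation: ln of the sum of t^-d over past uses,
    plus optional Gaussian noise."""
    base = math.log(sum((now - t) ** -DECAY for t in use_times))
    return base + random.gauss(0.0, noise_s)

def blended_value(option_instances, now):
    """option_instances: list of (utility, use_times) for one option.
    Blend utilities, weighting each instance by its retrieval
    probability (Boltzmann distribution over activations)."""
    acts = [activation(times, now) for (_, times) in option_instances]
    weights = [math.exp(a / TEMPERATURE) for a in acts]
    total = sum(weights)
    return sum((w / total) * util
               for (util, _), w in zip(option_instances, weights))

# Toy memory: option "A" paid off recently and often; "B" did so once, long ago.
memory = {
    "A": [(1.0, [8.0, 9.0, 9.5]), (0.0, [7.0])],
    "B": [(1.0, [1.0]), (0.0, [2.0])],
}
vals = {opt: blended_value(insts, now=10.0) for opt, insts in memory.items()}
choice = max(vals, key=vals.get)   # recency and frequency favor "A"
```

Note how the decay term alone reproduces the recency and frequency effects discussed above: recently and repeatedly rewarded instances dominate the blend.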

[Figure: proportion of choices of advantageous buttons (0–1) across six blocks, for human and model data.]
Figure 9.7 The fit of the IBL model developed in the IBL tool to human
data for controls in the Iowa Gambling Task. See Dutt and
Gonzalez (2011).
Cognitive Models and the Instance-Based Learning Tool 197
Figure 9.7 shows the results obtained from running the IBL
model of the Iowa Gambling Task in the IBL tool compared to
that of a group of healthy human (control) participants run
in the same task, as reported by Bishara et al. (2009). These
results were obtained using the default values of the parameters
in the tool. Thus, the intention was not to calibrate the model
parameters to produce the best predictions for human data, but
rather to use the Iowa Gambling Task as an example to explain
the IBL tool. Figure 9.7 shows the proportion of choices for the
two advantageous alternatives (out of a total of four options in
the Iowa Gambling Task) over six blocks of 20 trials each. The
proportion of choices has been averaged across all 32 human
and model participants. Although the exact levels of the values
are discrepant, the model in the IBL tool set at default param-
eter values provides a reasonable prediction of the trend in the
observed human behavior over the six blocks of the experiment
(MSD = 0.010; r = 0.86). In addition to the performance data,
running a model in the IBL tool produces data on the values and
dynamics of its mechanisms (e.g., activation, base-level learn-
ing, noise, and the values of the instance’s situation-decision-
value slots).
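The fit statistics reported above (MSD and r) are straightforward to compute from the two block-level curves. The block values below are hypothetical placeholders, not the Bishara et al. (2009) data; only the arithmetic is shown.

```python
import math

def msd(model, human):
    """Mean squared deviation between two equal-length series."""
    return sum((m - h) ** 2 for m, h in zip(model, human)) / len(model)

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical proportions of advantageous choices over six blocks.
human = [0.45, 0.52, 0.58, 0.61, 0.65, 0.68]
model = [0.50, 0.55, 0.57, 0.63, 0.64, 0.70]
fit_msd = msd(model, human)
fit_r = pearson_r(model, human)
```

A low MSD with a high r, as in the reported values, indicates that the model tracks both the level and the trend of the human learning curve.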

Conclusion
Three projects involving computational representations of
human behavior for three training principles are summarized:
speed–accuracy trade-off attributable to fatigue, training dif-
ficulty, and stimulus-response compatibility. Taken together,
these studies show that the ACT-R architecture and IBL theory
present an accurate and robust representation of the learning
process in several training paradigms. Because IBL theory
has also demonstrated accurate representations in many other
tasks (see Gonzalez & Dutt, 2011, for a discussion), the theory
is more general than it was initially conceived to be: IBL theory
accounts for decisions from experience at many different levels.
This ability is illustrated by the precision of the models’ predic-
tions in the projects described here. Moreover, the creation of
an explicit computer tool that represents the theory can also
give rise to interesting demonstrations and new questions and
answers. The theory was embodied in the IBL tool, which is
available from the authors for research purposes. This tool
should allow for more widespread use of IBL theory, as it helps
facilitate a cognitive modeler's work.
References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C.,
& Qin, Y. (2004). An integrated theory of the mind. Psychological
Review, 111, 1036–1060.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought.
Hillsdale, NJ: Erlbaum.
Anderson, J. R., & Lebiere, C. (2003). The Newell test for a theory of
mind. Behavioral and Brain Sciences, 26, 587–639.
Ballard, J. C. (1996). Computerized assessment of sustained attention:
A review of factors affecting vigilance performance. Journal of Clini-
cal and Experimental Psychology, 18, 843–863.
Bishara, A. J., Pleskac, T. J., Fridberg, D. J., Yechiam, E., Lucas, J.,
Busemeyer, J. R., et al. (2009). Similar processes despite divergent
behavior in two commonly used measures of risky decision making.
Journal of Behavioral Decision Making, 22, 435–454.
Busemeyer, J. R., & Wang, Y. M. (2000). Model comparison and model
selections based on generalization criterion methodology. Journal of
Mathematical Psychology, 44, 171–189.
Dutt, V., & Gonzalez, C. (2011). Making instance-based learning the-
ory usable and understandable: The Instance-Based Learning Tool.
Manuscript under review.
Dutt, V., Yamaguchi, M., Gonzalez, C., & Proctor, R. W. (2011). An
instance-based learning model of stimulus-response compatibility
effects in mixed location-relevant and location-irrelevant tasks. Man-
uscript under review.
Fu, W., Gonzalez, C., Healy, A. F., Kole, J. A., & Bourne, L. E., Jr. (2006).
Building predictive models of skill acquisition in a data entry task.
In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (HFES 50th Annual Meeting) (pp. 1122–1126). Santa Monica,
CA: Human Factors and Ergonomics Society.
Gluck, K., Bello, P., & Busemeyer, J. (2008). Introduction to the special
issue. Cognitive Science, 32, 1245–1247.
Gonzalez, C., Best, B. J., Healy, A. F., Kole, J. A., & Bourne, L. E., Jr.
(2011). A cognitive modeling account of simultaneous learning and
fatigue effects. Cognitive Systems Research, 12, 19–32.
Gonzalez, C., & Dutt, V. (2011). Instance-based learning: Integrating
decisions from experience in sampling and repeated choice para-
digms. Psychological Review, 118, 523–551.
Gonzalez, C., Dutt, V., Healy, A. F., Young, M. D., & Bourne, L. E.
(2009). Comparison of instance and strategy models in ACT-R. In A.
Howes, D. Peebles, & R. Cooper (Eds.), Proceedings of the 9th Inter-
national Conference on Cognitive Modeling—ICCM2009. Manchester,
UK.
Gonzalez, C., Fu, W., Healy, A. F., Kole, J. A., & Bourne, L. E., Jr.
(2006). ACT-R models of training data entry skills. In Proceedings of
the Conference on Behavior Representation in Modeling and Simula-
tion (BRIMS 2006) (pp. 101–109). Baltimore, MD.
Gonzalez, C., & Lebiere, C. (2005). Instance-based cognitive models of
decision making. In D. Zizzo & A. Courakis (Eds.), Transfer of knowl-
edge in economic decision-making (pp. 148–165). New York: Palgrave
Macmillan.
Gonzalez, C., Lerch, J. F., & Lebiere, C. (2003). Instance-based learn-
ing in dynamic decision making. Cognitive Science, 27, 591–635.
Gonzalez, C., & Thomas, R. P. (2008). Effects of automatic detection
on dynamic decision making. Journal of Cognitive Engineering and
Decision Making, 2, 328–348.
Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004).
Effects of prolonged work on data entry speed and accuracy. Journal
of Experimental Psychology: Applied, 10, 188–199.
Hoffman, R. R., Sherrick, M. F., & Warm, J. S. (1998). Viewing psychol-
ogy as a whole: The integrative science of William N. Dember. Wash-
ington DC: American Psychological Association.
Jongman, G. M. G. (1998). How to fatigue ACT-R? In Proceedings of the
2nd European Conference on Cognitive Modeling (pp. 52–57). Not-
tingham, England: Nottingham University Press.
Kole, J. A., Healy, A. F., & Bourne, L. E. (2008). Cognitive complications
moderate the speed-accuracy tradeoff in data entry: A cognitive
antidote to inhibition. Applied Cognitive Psychology, 22, 917–937.
Lebiere, C., Gonzalez, C., & Martin, M. (2007). Instance-based decision
making model of repeated binary choice. In R. L. Lewis, T. A. Polk &
J. E. Laird (Eds.), Proceedings of the 8th International Conference on
Cognitive Modeling (pp. 67–72). Ann Arbor, MI.
Lebiere, C., Gonzalez, C., & Warwick, W. (2009). Convergence and con-
straints revealed in a qualitative model comparison. Journal of Cog-
nitive Engineering and Decision Making, 3, 131–155.
Lejarraga, T., Dutt, V., & Gonzalez, C. (in press). Instance-based learn-
ing: A general model of repeated binary choice. Journal of Behavioral
Decision Making.
Logan, G. D. (1988). Toward an instance theory of automatization. Psy-
chological Review, 95, 492–527.
Lovett, M. C., Reder, L. M., & Lebiere, C. (1999). Modeling working
memory in a unified architecture: An ACT-R perspective. In A.
Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of
active maintenance and executive control (pp. 135–182). New York:
Cambridge University Press.
Mackworth, J. F. (1969). Vigilance and habituation. Harmondsworth,
England: Penguin Books.
Martin, M. K., Gonzalez, C., & Lebiere, C. (2004). Learning to make
decisions in dynamic environments: ACT-R plays the beer game. In
M. C. Lovett, C. D. Schunn, C. Lebiere, & P. Munro (Eds.), Proceed-
ings of the Sixth International Conference on Cognitive Modeling (Vol.
420, pp. 178–183). Mahwah, NJ: Erlbaum.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Har-
vard University Press.
Parasuraman, R. (1986). Vigilance, monitoring, and search. In K. Boff,
L. Kaufman, & J. Thomas (Eds.), Handbook of perception and human
performance. Vol. 2: Cognitive processes and performance (pp. 1–43).
New York: Wiley.
Proctor, R. W., & Vu, K. L. (2006). Stimulus-response compatibility prin-
ciples: Data, theory, and application. Boca Raton, FL: CRC Press.
Proctor, R. W., Yamaguchi, M., Zhang, Y., & Vu, K. L. (2009). Influence
of visual stimulus mode on transfer of acquired spatial associations.
Journal of Experimental Psychology: Learning, Memory, and Cogni-
tion, 35, 434–445.
Simon, H. A. (1990). Invariants of human behavior. Annual Review of
Psychology, 41, 1–19.
Wickens, C. D. (1984). The multiple resources model of human perfor-
mance: Implication for display design (AGARD/NATO Report). Wil-
liamsburg, VA: AGARD/NATO.
Yamaguchi, M., Dutt, V., Gonzalez, C., & Proctor, R. W. (2011). Cog-
nitive mechanisms of the stimulus-response compatibility effects in
mixed location-relevant and location-irrelevant tasks. Manuscript in
preparation.
Yamaguchi, M., & Proctor, R. W. (2006). Stimulus-response compat-
ibility with pure and mixed mappings in a flight task environment.
Journal of Experimental Psychology: Applied, 12, 207–222.
Young, M. D., Healy, A. F., Gonzalez, C., Dutt, V., & Bourne, L. E.
(2011). Effects of training with added difficulties on RADAR detec-
tion. Applied Cognitive Psychology, 25, 395–407.
10 Modeling Cognitive Tasks
in IMPRINT
Carolyn J. Buck-Gengler,
William D. Raymond, Alice F. Healy,
and Lyle E. Bourne, Jr.
University of Colorado

The IMPRINT Platform


The Improved Performance Research Integration Tool (IMPRINT)
is an Army-developed modeling tool used to simulate com-
plex, long-term activities involving personnel and equipment. It
contains a number of tools and algorithms allowing analysts
to project resource needs and task performance in a dynamic
way and in different scenarios. According to the Army Research
Laboratory’s website, “IMPRINT can be used to help set realis-
tic system requirements; to identify Soldier-driven constraints
on system design; and to evaluate the capability of available
manpower and personnel to effectively operate and maintain
a system under environmental stressors…. As a research tool,
IMPRINT incorporates task analysis, workload modeling, per-
formance shaping and degradation functions and stressors, and
embedded personnel characteristics data” (U.S. Army Research
Laboratory, 2011, para. 2). As a modeling tool, it allows systems
analysis and prediction of personnel and training requirements
early in system design so that problems may be worked out at a
less costly point in development. It focuses on the mental work-
load necessary for performing various and multiple tasks in
sequence as well as simultaneously. It can be used to help decide
the balance between soldiers and automation in allocating vari-
ous functions. IMPRINT has been used mainly, if not solely, for
large-scale modeling, involving large amounts of equipment and
personnel, and even when used to model an individual’s per-
formance, it typically is used for activities that span long time
durations. So far as is known, it has not been used before to
model events that are short in duration, nor to reflect cognitive
processes at the scale of those modeled and described in this
chapter.
Use of IMPRINT for Cognitive Modeling
As noted, most uses of IMPRINT have been for large-scale mod-
eling of Army personnel and equipment in war- or mainte-
nance/repair-related scenarios, typically activities that involve
many actors, both human and machine, over periods of days or
weeks. A typical cognitive psychology experiment is more likely
to involve repeated trials of similar types, with the purpose of
looking at the performance on individual trials that take place
in very short amounts of time (on the order of hundreds of mil-
liseconds to a few seconds). Another purpose is to evaluate how
performance changes with practice and how change is affected
by various experimental manipulations. Furthermore, the rela-
tionships among different variables in an experiment are more
complicated than are typically expressed in a straightforward
IMPRINT simulation. As a result, simulating the various experi-
mental tasks chosen for modeling in this research project posed
some unique challenges. One result of solving these challenges
was that most of the work in the IMPRINT models was done
in “expressions,” which had to be coded, rather than using the
basic modeling interface. One benefit of doing the modeling in
this manner was that the model was readily convertible to Mat-
lab (see chapter 11) for further analysis. However, because most
of the implementation was done in expressions, there were other
facets of the IMPRINT tool that could not be used. Nevertheless,
the result is clear: The type of cognitive tasks that are repre-
sented here can be modeled in IMPRINT, even though it is not a
typical way of using IMPRINT.

The Modeling Tasks


A large portion of the research leading up to this volume
attempted to identify and further understand various training
principles that could be employed in the modern army. These
principles are discussed in more detail in other chapters (see
specifically chapters 2 and 3). Most of the training principles
that have been identified have been validated using a limited
number of laboratory tasks. Among all the tasks considered
for modeling, the one demonstrating the largest number of
principles is a data entry task (see Healy, Kole, Wohldmann,
Buck-Gengler, & Bourne, 2011). Hence, the modeling efforts
were begun with data entry to provide insights into ways
to model human behavior that reflect a variety of training
principles. In its most basic form data entry is a simple task
in which subjects type four-digit numbers using a computer
console. Data entry is a component skill of many complex tasks
including military operations performed in the networked
battlefield. Furthermore, by modeling the same target task
in both the ACT-R and IMPRINT platforms, the models could
be compared and contrasted, so as to illuminate their relative
strengths and weaknesses, thereby yielding appropriate
refinements for each platform. The ACT-R model of data entry
is described in chapter 9. The evaluation of the models based
on both platforms is presented in chapter 11.
Data entry is a cognitively simple task. To examine the abil-
ity of IMPRINT to model other, more complex tasks, the second
laboratory task modeled was the RADAR target detection and
decision-making task. The description of this task using the new
taxonomy described in chapter 8 involves more component tax-
ons than the description of the data entry task, and by this mea-
sure is more complex. The greater complexity of RADAR poses,
to subjects and, hence, to any potential modeling platform, a
larger workload challenge and more processing requirements
than does data entry. Given its similarity to actual military
tasks, the RADAR task is of high military relevance. Like data
entry, it was modeled in both
ACT-R and IMPRINT.
The third task modeled in IMPRINT was the information inte-
gration, or fusion, task (see chapter 3). The fusion task was
designed to allow for the evaluation of memory processes con-
tributing to decision-making and thus to simulate in a more
realistic way these vital aspects of the networked battlefield.

The Data Entry Task

Experimental Task to Be Modeled


The basic task in data entry involves typing four-digit numbers
followed by the Enter key, with equal emphasis on speed and
accuracy. The numbers are displayed on the computer screen
and typically typed on the number keypad on the right side
of the keyboard. Usually only the dominant hand is used, and
only right-handed individuals participate. Over the years there
have been many variations of the basic experiment, including
the length of time that numbers appear on the screen, the for-
mat of the numbers, whether the digits are presented simulta-
neously or sequentially, and whether feedback is given (Healy et
al., 2011).
Experimental Data to Be Modeled
In the set of two experiments modeled (Healy, Kole, Buck-Gen-
gler, & Bourne, 2004), right-handed subjects were presented
with five blocks of 64 (four-digit) numbers, given a rest break,
then presented with five more blocks of 64 numbers. In Experi-
ment 1 the 64 numbers in each set of five blocks were repeated
in every block in different orders (repetitious practice), whereas
in Experiment 2 all 640 of the numbers were unique (nonrep-
etitious practice). Also, in Experiment 1 all subjects used only
their (nondominant) left hand, whereas in Experiment 2, in the
first five blocks half of the subjects used their dominant right
hand and half used their left hand, and in the last five blocks
half of the subjects from each group continued using the same
hand and half switched hands. Analyses of data of Experiment
2 of Kole, Healy, and Bourne (2008) also informed some aspects
of the model. Healy et al. (2004, Experiment 2) provided initial
values for nonrepetitious practice of data entry on the numerical
keypad of a computer keyboard. The analyses of these data pro-
vided block response time (RT) means for keystrokes 1 through 4
and the Enter key (terminating each trial) as well as mean total
response times (TRT) per trial for all blocks. These data formed
the basis for calibrating and evaluating performance of the ini-
tial model of nonrepetitious data entry, first using the dominant
hand (by focusing on the first half of the trials for subjects using
their right hand). Later, use of the nondominant left hand was
added to be able to model the effect of handedness. Subsequent
analyses of the data from Healy et al. (2004, Experiment 2) and
also Kole et al. (2008, Experiment 2) were performed to exam-
ine the variability of subjects’ overall RTs, how the individual
trial RTs for each component were distributed, the types and
rates of errors, and the extent of learning with (nonrepetitious)
skill practice. Using the results of these analyses, the model
was enhanced to include normally distributed subject RT vari-
ability, a skewed distribution of individual trial RTs, errors, and
learning as a linear function of practice. The corresponding set
of means from Healy et al. (2004, Experiment 1) was used to
determine a learning function for repetition of numbers.
The basic RT results from the experimental subjects that were
modeled were as follows:

1. When items were repeated over five blocks (Healy et al.,
2004, Experiment 1), TRT improved; however, in nonrepeti-
tious practice (Experiment 2), TRT did not improve as much
as in repetitious practice (Experiment 1) in the first half,
and in fact increased in the second half. These results sug-
gest specific item learning with repeated practice (Experi-
ment 1), with some learning of the basic skill of typing digits
in the first half of Experiment 2 being replaced by fatigue
in the second half of that experiment.
2. Regarding average RTs for individual keystrokes, the first
keystroke is by far the longest, indicating that the major-
ity of a subject’s cognitive processing takes place before
typing the first keystroke. Also, the third keystroke is sig-
nificantly longer than the second and fourth keystrokes,
indicating some additional cognitive processing resulting
from “chunking” of the four digits into groups of two digits.
3. The first keystroke was the only keystroke showing
improvement over time and also slowing, indicating cogni-
tive fatigue, in the second half of Experiment 2.
4. There was an aggregate speed–accuracy trade-off, in that
speed increased but accuracy declined across blocks in
both halves of both experiments.
5. Typing with the nondominant hand was slower than typing
with the dominant hand.
6. When the RTs of separate keystrokes of individual sub-
jects were examined, it was clear that not all subjects
“chunked”—that is, for some subjects all three keystroke
lengths were comparable. Using this difference in behavior,
the subjects were differentiable into two groups, chunkers
and nonchunkers.

In addition to speed, the model simulates subjects' response
accuracy at data entry. Modeling both speed and accuracy is
desirable, because the two measures of performance are known
to be differently affected by some training factors. In particu-
lar, although skill practice results in faster data entry, it also
produces a decline in accuracy for most subjects, an increasing
speed-accuracy trade-off.
The following assumptions about human behavior guided the
modeling of data entry:

1. Statistical subjects' overall TRTs for each number were
drawn from a gamma distribution around the experimen-
tal mean for TRT across all subjects, with RTs for each
keystroke then apportioned from the sampled TRT to each
component.
2. Statistical subjects were randomly selected to be “chun-
kers” or “nonchunkers,” indicating a predisposition toward
chunking or not on each trial that was modeled to reflect
the behavior of human subjects in the two groups.
3. Most of the learning during practice was cognitive, with
a small amount of physical learning; cognitive learning
accrues at the beginning of each chunk, shortening the
first and third keystrokes; physical learning accrues while
typing all keystrokes, resulting in shorter RTs for each key-
stroke across practice.
4. Fatigue counteracts improvements due to learning, result-
ing in longer TRTs. Fatigue becomes apparent partway
through the experiment, but not at the same time for all
subjects; statistical subjects’ fatigue distribution mimics
that of the human experimental subjects.
5. The left hand is slower overall than the right hand; how-
ever, physical learning is faster with the left hand than
with the right hand.
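Assumptions 1 and 2 can be made concrete with a small sampling sketch. All constants below (mean TRT, gamma shape, keystroke shares, chunker rate) are hypothetical placeholders rather than the model's calibrated values; the sketch shows only the mechanics of drawing a trial TRT from a gamma distribution and apportioning it across keystrokes, with extra time on the chunk-initial third keystroke for chunkers.

```python
import random

random.seed(0)

MEAN_TRT = 2.4        # illustrative mean total response time (s)
GAMMA_SHAPE = 9.0     # right-skewed draws; variance = mean^2 / shape
# Illustrative shares of the TRT for keys 1-4 and Enter (chunker profile:
# the 1st and 3rd, chunk-initial, keystrokes get larger shares).
KEY_SHARES = [0.45, 0.12, 0.19, 0.12, 0.12]
CHUNKER_RATE = 0.6    # hypothetical proportion of chunkers

def sample_trial(is_chunker):
    """Draw a TRT from a gamma around MEAN_TRT and apportion it per
    keystroke; nonchunkers get comparable 2nd-4th keystroke lengths."""
    trt = random.gammavariate(GAMMA_SHAPE, MEAN_TRT / GAMMA_SHAPE)
    shares = list(KEY_SHARES)
    if not is_chunker:
        mid = (shares[1] + shares[2] + shares[3]) / 3
        shares[1] = shares[2] = shares[3] = mid
    return [trt * s for s in shares]

subject_is_chunker = random.random() < CHUNKER_RATE
rts = sample_trial(subject_is_chunker)
```

Either way the first keystroke carries the largest share, mirroring the finding that most cognitive processing precedes the first keypress.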

Modeling Data Entry in IMPRINT


The IMPRINT model of data entry was designed so that the
effects of a number of factors could be simulated (Buck-Gengler,
Raymond, Healy, & Bourne, 2007). Thus, the model was also
able to make predictions about the effects of combinations of
factors that have not yet been explored empirically, which both
tested the robustness of the model and led to further empirical
studies.
The cognitive model simulated in IMPRINT consists of three
processing stages: (a) visual processing of a number creates an
internal representation of the digits to be typed, (b) the repre-
sentation guides development of a motor output plan, and (c) the
motor plan is accessed and implemented to execute each key-
stroke in sequence (see Figure 10.1). The model components are
compatible with generally accepted principles of cognitive pro-
cessing, and the resulting model is thus similar to other models
of human information processing and response, such as lan-
guage production. The model assumes serial processing of each
stimulus. In the IMPRINT model, the main network simulates
the computer, and the subject is simulated as a goal network,
whose activity is triggered by computer activity, namely, presen-
tation of stimuli to the subject. The simulated subject responds
by processing each stimulus of the experiment in turn.
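The division of labor between the two networks can be sketched as a simple control loop. IMPRINT models are built graphically and with expressions rather than in Python, and the class and method names here are invented, so this mirrors only the control flow: the main network presents each stimulus, and the goal network processes it through the three stages in series.

```python
class MainNetwork:
    """Stands in for the experiment program: presents each stimulus in turn."""
    def __init__(self, stimuli):
        self.stimuli = stimuli

    def run(self, subject):
        responses = []
        for stim in self.stimuli:
            responses.append(subject.process(stim))  # triggers the goal network
        return responses

class GoalNetwork:
    """Stands in for the simulated subject: serial three-stage processing."""
    def process(self, stimulus):
        digits = self.read_and_represent(stimulus)   # stage (a): visual/symbolic
        plan = self.plan_response(digits)            # stage (b): motor planning
        return self.execute(plan)                    # stage (c): keystrokes

    def read_and_represent(self, stimulus):
        return [int(ch) for ch in stimulus]

    def plan_response(self, digits):
        return digits + ["Enter"]

    def execute(self, plan):
        return list(plan)  # one "keystroke" per planned element

typed = MainNetwork(["1395", "2077"]).run(GoalNetwork())
```

Each trial is fully processed before the next stimulus is presented, matching the serial-processing assumption of the model.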
The model was developed to simulate RTs for both right-
handed and left-handed practice of unique and repeated num-
bers, typing accuracy, chunking behavior, and effects of fatigue.
Initial parameter settings in the model were estimated from
[Figure: for a stimulus such as "1395" — read digit → represent digit (symbolic representation); access representation → plan response (motor planning); access plan → execute plan (fine motor manipulation) — producing the typed output "1395".]

Figure 10.1 Cognitive model of data entry.

experimental data and modified iteratively until settings pro-
duced a satisfactory match between predictions and obser-
vations. Model parameters controlling speed and accuracy
performance were the same for the two experiments.
In the IMPRINT model, the main and goal networks run in
parallel to simulate a subject performing data entry. The main
network represents the program presenting the stimuli to the
subject, and sets experiment and subject variables for typing
hand and number repetition. The average population TRT var-
ies by experiment to match differences between subject groups.
Subject variables are set probabilistically to reflect individual
variation in TRT, chunking strategy, cognitive and physical
learning rates, and fatigue onset and rate. Each iteration of the
goal network is a single trial, representing a subject reading
and typing a four-digit number. A trial consists of several tasks,
based on the cognitive model, each contributing part of the sub-
ject TRT. Task times are randomly drawn from a right-skewed
distribution around the task’s proportion of the TRT, adjusted
for typing hand and improvement with practice. Improvement
on unique numbers is modeled by exponential functions of the
number of trials. Improvement from repetition follows an expo-
nential function of the number of repetitions seen. Left hand
typing is modeled as a multiplier penalty for motor activity. If
a trial involves chunking, an additional task executed before
the third keystroke models the extra cognitive processing of
the chunk. When a subject’s fatigue onset threshold is reached,
a small amount of time per block is added. Error trials occur
randomly on approximately 1 out of 10 trials, and each error
trial is randomly assigned an output length (from 0 to 8 digits);
time accrues for all digits typed. Errors increase linearly across
blocks.
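The per-trial timing machinery just described can be sketched as a single function, in Python rather than IMPRINT expressions. Every constant below is an illustrative stand-in, not a calibrated parameter from the model; the sketch shows only the form of the adjustments: exponential improvement over trials and over repetitions, a left-hand multiplier, an added chunk-processing cost, additive fatigue past an onset block, and roughly 1-in-10 error trials with a random output length of 0 to 8 digits.

```python
import math
import random

random.seed(2)

BASE_TIME = 2.4            # illustrative baseline TRT (s)
SKILL_RATE = 0.002         # improvement rate over unique trials
REP_RATE = 0.15            # improvement rate over item repetitions
LEFT_HAND_PENALTY = 1.15   # multiplier on time for the left hand
CHUNK_COST = 0.10          # extra cognitive time before the 3rd keystroke (s)
FATIGUE_PER_BLOCK = 0.02   # time added per block past fatigue onset (s)

def trial_time(trial, reps_seen, left_hand, chunker, block, fatigue_onset_block):
    t = BASE_TIME * math.exp(-SKILL_RATE * trial)    # skill practice
    t *= math.exp(-REP_RATE * reps_seen)             # item repetition
    if left_hand:
        t *= LEFT_HAND_PENALTY
    if chunker:
        t += CHUNK_COST
    if block >= fatigue_onset_block:
        t += FATIGUE_PER_BLOCK * (block - fatigue_onset_block + 1)
    return t

def is_error_trial():
    """Roughly 1 in 10 trials is an error, with output length 0-8 digits."""
    if random.random() < 0.1:
        return random.randint(0, 8)  # erroneous output length
    return None

early = trial_time(trial=1, reps_seen=0, left_hand=False, chunker=True,
                   block=1, fatigue_onset_block=7)
late = trial_time(trial=300, reps_seen=0, left_hand=False, chunker=True,
                  block=5, fatigue_onset_block=7)
err_len = is_error_trial()
```

Separating the multiplicative (hand, practice) and additive (chunking, fatigue) terms in this way makes each factor independently adjustable, which is what lets the model predict untested combinations of factors.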

Model Assessment
The final model was used to fit the data from both experiments,
with only the parameters of hand use, number repetition, and
overall aggregate RT being changed in the model between exper-
iments. Two seeds used for random number generation were
chosen arbitrarily, then two runs were executed for each experi-
ment, one with each of the two seeds. In each run the model
simulated 32 statistical subjects. Each run’s outputs were then
compared to the data from the 32 experimental subjects from
each of the respective experiments by Healy et al. (2004), as well
as to each other. RTs (TRT and RTs for each keystroke) for cor-
rect trials and error output length were used for comparison. To
evaluate model fit, both block mean r2 and Root Mean Square
Error (RMSE) between the set of individual keystroke RTs (50
data points) from the experiment and each of the model outputs
for that experiment were computed (see Table 10.1). RTs over all
10 blocks for each experiment and for both model and experi-
mental data are shown in Figures 10.2 (individual keystrokes)
and 10.3 (first keystroke).

Summary of the IMPRINT Model of Data Entry


Although IMPRINT was never intended as a platform for model-
ing detailed cognitive processes, the model of data entry dem-
onstrates that IMPRINT can be used for such modeling. The
IMPRINT model of data entry was able to fit the experimental
means, but also replicated individual differences and subject
behavior patterns not evident from examining just the aggregate

Table 10.1 Values for r2 and RMSEs for Comparison of the Two Final
Model Runs to the Subject Data as well as to Each Other in the Two
Experiments of the Data Entry Task
                    Exp vs.   Exp vs.   Run 1 vs.
       Experiment   run 1     run 2     run 2
r2     Exp 1        0.960     0.956     0.999
       Exp 2        0.994     0.993     0.999
RMSE   Exp 1        0.087     0.102     0.023
       Exp 2        0.031     0.047     0.023
[Figure: mean RT in s (0–1.6) by keystroke (1st, 2nd, 3rd, 4th, Enter) and block, plotting E1 data, E2 data, E1 model, and E2 model.]

Figure 10.2 Mean response times (in s) across blocks from one run of
the IMPRINT model of data entry for each experiment and
the experimental data on which it was based, by keystroke.

data (i.e., block means) reported by Healy et al. (2004). It suc-
cessfully simulated speed improvement due to skill practice
and number repetition, accuracy decline, effects of fatigue, and
speed differences due to typing hand and chunking behavior.
Moreover, it mirrored individual differences in chunking strat-
egy, speed improvement, and fatigue onset time and rate.

Figure 10.3 Mean response times (in s) for the first keystroke, across
           blocks, for both experiments and one model run for each
           experiment, showing similar patterns of learning and
           fatigue for data and model. The dashed lines show
           best-fitting linear functions.
210 Carolyn J. Buck-Gengler et al.
The RADAR Task

Experimental Task to Be Modeled


The RADAR task was developed by Gonzalez and Thomas (2008).
In the experiment modeled here (Young, Healy, Gonzalez, Dutt,
& Bourne, 2011; see also chapter 3 in this volume for data from
the RADAR task and chapter 9 for discussion of ACT-R models
of this task), subjects searched for assigned symbol targets in
four squares that moved inward from the four corners of the
computer screen to the center of a radar-like display in 2.062 s.
Different sets of symbols were shown in the squares for each of
seven frames comprising a trial. New targets were assigned for
each trial. Squares did not always contain a symbol, but could
be blank. Subjects were to respond only if an assigned target
appeared in one of the squares during a trial, and were scored
on response speed and accuracy.
The experiment contained both consistent mapping (CM)
and varied mapping (VM) trials. In CM, targets and distractors
came from different symbol types (letters, digits), so could be
distinguished by set membership alone; in VM, both targets and
distractors were from the same set, requiring specific memory
for target items. Processing load was manipulated by varying
memory load (i.e., number of targets in a trial) and search dif-
ficulty (i.e., number of filled squares) (see chapter 4 for a discus-
sion of the expected effects of processing load on performance).
In low processing load trials (LP), the target set for each trial
consisted of a single symbol, and only one square on the screen
of each frame contained a symbol, with the rest being blank. In
high processing load trials (HP), the target set consisted of four
symbols, and all four squares of each frame contained a symbol,
although at most one of them was from the target set.
Trials were grouped in blocks of 20, with 8 blocks in each of
2 sessions. Session 1 (training) occurred 1 week before Session
2 (test). A random 15 of the 20 trials in each block contained
a target. All trials in a block had the same mapping type and
processing load, and the block type varied systematically across
the 8 blocks in the following order: CM1, CM4, VM1, VM4, VM4,
VM1, CM4, CM1 (where 1 indicates LP and 4 indicates HP).
The effects on the main task of a concurrent secondary task
were also examined. The secondary task required counting and
reporting the number of tones heard deviating from a standard
(base) tone during the seven frames of each trial. In tone count-
ing conditions tones were played 500–1500 ms apart during all
trials of the experiment. Deviant tones were easily differentiated
from the base tone. About 15% of the tones in each trial deviated
from the base tone. There were 48 subjects; half trained with
tone counting and target detection, and half performed target
detection in silence. At test, half the subjects in each tone condi-
tion stayed in the same condition, and half switched to the other
tone condition.

Experimental Data to Be Modeled


Three measures of a subject’s target detection response were
used to capture subject behavior in the task; these measures
form the basis for assessing the IMPRINT model of RADAR
through comparison with the experimental results. The first
measure is average subject RT for correct target detections
on trials in which a target was presented (target trial). To be
included in the correct target detection RT measure, the space-
bar must have been pressed for the first time in the trial during
the frame in which a target was presented (target frame). The
second and third measures, hit rate and false alarm rate, reflect
subject accuracy. Hit rate is the proportion of target trials in a
block (out of 15) on which the subject correctly responded to the
target during the target frame. (Note that the correct RT measure
is the average of these hit RTs.) False alarm rate is the proportion
of trials in a block (out of a total of 20 trials) in which the
first response during a trial was within a frame before a target
appears in the trial.
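These three measures can be made concrete with a small scoring sketch. The trial record layout and field names below are illustrative assumptions, not the experiment's actual data format:

```python
def score_block(trials):
    """Score a block of RADAR trials into the three measures defined
    in the text: mean RT for correct target detections, hit rate, and
    false alarm rate. Each trial is a dict with illustrative fields:
    target_frame (1-7, or None), response_frame (frame of the first
    space bar press, or None), and rt (that press's RT in ms)."""
    target_trials = [t for t in trials if t["target_frame"] is not None]
    # Hit: the first response of the trial falls in the target frame.
    hits = [t for t in target_trials
            if t["response_frame"] == t["target_frame"]]
    # False alarm: the first response comes in a frame before any target.
    false_alarms = [t for t in trials
                    if t["response_frame"] is not None
                    and (t["target_frame"] is None
                         or t["response_frame"] < t["target_frame"])]
    hit_rate = len(hits) / len(target_trials)
    fa_rate = len(false_alarms) / len(trials)
    mean_rt = sum(t["rt"] for t in hits) / len(hits) if hits else None
    return mean_rt, hit_rate, fa_rate
```

Hit rate divides by the number of target trials (15 per block in the experiment), whereas false alarm rate divides by all trials in the block (20), matching the definitions above.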
For target detection, correct RTs were faster overall for CM
than for VM trials, and also faster for LP than for HP trials. The
disadvantage for HP trials was larger overall for VM trials than
for CM trials; this interaction was evident in both training and
test sessions. Accuracy as measured by hit rate was lowest for
the VM4 trials. The results for false alarm rate were more com-
plex, demonstrating improvement across trials as well as effects
of mapping type and processing load.
The secondary tone counting task negatively impacted all
measures in both sessions. Furthermore, counterintuitively and
counter to the training difficulty and procedural reinstatement
principles (see chapters 2 and 3), training with tone counting
resulted in reduced speed and accuracy in both tone conditions
at test.

Modeling the RADAR Task in IMPRINT


The IMPRINT model of the RADAR task was designed to capture
the interactions between mapping type and processing load,
as well as the interaction of those factors with session and the
secondary tone counting task (including the between-subjects
manipulation of tone counting across sessions). Correct RT, hit
rate, and false alarm rate were simulated at the frame level,
and the cumulative response across frames in a trial gives the
response at the trial level, which was used for comparison with
the reported experimental data (Buck-Gengler, Raymond, Healy,
& Bourne, 2010).
The IMPRINT model was implemented as two parallel net-
works: one network represented the computer presenting the
visual stimuli (and tones in the tone counting conditions); a sep-
arate network simulated the subject processing and responding
to stimuli as they were presented. The model assumes independence
between frames in a trial and between trials, an assumption that
may not reflect the behavior of experimental subjects; however, the
model was not allowed to give a target response in a frame if it
had responded in an earlier frame in that trial.
The model’s assumptions are as follows. If a target is rec-
ognized, the simulated search terminates, and the simulated
subject presses the space bar at that time (variability in when
the termination occurs is allowed, which could result in either
a longer response or a missed response, occurring in a frame
subsequent to the target frame). In each frame, one square is
examined first, and any symbol in the square is compared to the
target set for the trial. If the symbol does not match an item in
the target set, then (in the conditions with four-target sets and
four squares filled in with symbols) the search moves to another,
unexamined square on the screen and the decision compari-
son is made again; this combination of eye movement to a new
search location and target decision can happen up to four times
for the four squares. The assumption that the search terminates
if a target is detected means that, on average, 2.5 eye movements
and target decisions are executed per frame in the four-target
conditions. In the single-target conditions, on the other hand, if
a target is not detected, then the subject simply waits until the
end of the frame, unless there is an erroneous target detection,
resulting in a false alarm.
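The 2.5 figure follows from a self-terminating serial search over four squares when the target is equally likely to be found at each examination position. A quick check of the arithmetic (illustrative, not part of the model):

```python
from fractions import Fraction

def expected_examinations(n_squares):
    """Expected number of examinations in a self-terminating serial
    search when the target is equally likely to be found at each of
    the n_squares examination positions."""
    return sum(Fraction(1, n_squares) * k for k in range(1, n_squares + 1))

# (1 + 2 + 3 + 4) / 4 = 2.5 examinations per target frame
```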
The cognitive model of the visual search task simulated in
IMPRINT consists of three processing subtasks. The search sub-
task modeled a subject’s eye movement to one of the four search
locations of a frame. The decision subtask modeled the subject’s
decision of whether a location contains a symbol in the target
set. The response subtask modeled the subject’s motor response
to target detection by pressing the space bar. Implementation
details of each of the subtasks varied as a function of processing
load. Subtasks are repeated in each frame until the target is
found, all squares have been searched, or the trial times out
(see Figure 10.4).
Implementation details of the eye movement subtasks differed
depending on processing load; details of decision subtasks dif-
fered depending on mapping type and training condition. Eye
movements in the LP conditions were to the square containing
a symbol; in the HP conditions any square could be moved to
first, resulting in shorter movement time, with equivalent times
for subsequent movements. In CM, whether the square with a
symbol contains a target can be decided simply by comparing
the target’s symbol type to the symbol type of a square’s content
because all distractors were of the other symbol type. In VM,
target decisions require comparison of the square’s content to
the target set in memory. In VM1, the decision is a comparison
of the single target with the square’s content, with decision time
equivalent to that for CM. In VM4, four possible targets must
be compared against each square examined, resulting in longer
decision times. In all trials, if a target is detected, a response is
made and the trial ends; otherwise, the condition-appropriate
subtasks repeat until a target has been detected or all seven
trial frames have been presented.
Hits were modeled stochastically for frames with targets.
Hit rate was lower for VM4 trials than other trial types. False
alarms were also modeled stochastically for frames without tar-
gets. The false alarm rate declines were implemented with expo-
nential functions across trials, with exponents determined by
block type. Initial rates in a block were based on the false alarm
rate at the end of the previous block and the type of change in
difficulty from the previous block to the current block.
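The stochastic accuracy mechanisms just described can be sketched as follows. The probability values and decay constant are placeholders, not the model's fitted parameters:

```python
import math
import random

def hit_occurs(hit_prob, rng):
    """Stochastic hit for a frame containing a target: detection
    succeeds with probability hit_prob (set lower for VM4 blocks)."""
    return rng.random() < hit_prob

def false_alarm_prob(trial_index, initial_rate, decay):
    """False alarm probability on a trial (0-indexed) within a block,
    declining exponentially from the block-initial rate; in the model
    the decay constant depended on block type, and the initial rate
    was carried over from the end of the previous block."""
    return initial_rate * math.exp(-decay * trial_index)
```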

Figure 10.4 Cognitive model of RADAR task.
RTs for frames with hits were the sum of eye movement, deci-
sion, and response times. Eye movement and response times
were based on IMPRINT micromodels for eye movement and key
pressing. CM and VM1 decision times were modeled stochasti-
cally. Greater VM4 decision times were multiples of VM1 times
to model search of the memory set. RTs were increased and hit
rates were decreased to simulate the additional load of the sec-
ondary task and the impairment at test from training with tone
counting.

Modeling Tone Counting. Data analysis indicated that subjects
in conditions with tone counting performed worse than subjects
in conditions without tone counting. The analysis indicated
overall slowing of all responses with tone counting, regardless
of processing load. RTs with tone counting showed no evidence
of improvement with practice, as was also the case without tone
counting. As in the experiment, tones were implemented as
occurring randomly every 500 to 1500 ms and approximately
15% of the tones were “deviant.”
The model added RT costs of tone counting in three ways.
First, RTs were increased by a constant RT penalty for all trials
when tone counting was performed as a secondary task. Sec-
ond, a smaller RT penalty was added to RTs for each deviant
tone presentation, to simulate the cognitive demand of recog-
nizing the deviant tone and incrementing the count of deviant
tones heard during a trial. Third, a very small RT penalty was
added to RTs for each presentation of a nondeviant tone, recog-
nizing the cognitive demand of the tone deviance decision. The
tone counting task was implemented in IMPRINT by allowing
the computer network to keep track of tone presentations and
tone type during each frame. Penalties for the totals, along with
the general tone task penalty, were added to the subject’s total
frame RT whenever the subject responded.
Additionally, the data indicated that having trained with tone
counting resulted in longer RTs at test for both test tone conditions.
Therefore an RT penalty was assessed on all trials at test for
the conditions that had trained with tone counting. The penalty
for training with tone was implemented by increasing the sub-
ject’s total frame RT by a constant percentage for all test frames
whenever the subject responded.
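The three additive penalties, plus the multiplicative penalty at test for groups that trained with tone counting, combine as in this sketch. All constants are illustrative placeholders, not the model's fitted values:

```python
def frame_rt(base_rt_ms, counting_now, n_tones, n_deviants,
             trained_with_tone, at_test,
             task_penalty=80.0, deviant_penalty=30.0,
             tone_penalty=5.0, training_pct=0.05):
    """Add tone counting RT costs to a frame RT, following the three
    additive penalties described in the text plus the percentage
    penalty at test for subjects who trained with tone counting.
    All penalty constants here are placeholders for illustration."""
    rt = base_rt_ms
    if counting_now:
        rt += task_penalty                           # constant dual-task cost
        rt += deviant_penalty * n_deviants           # recognize and count each deviant tone
        rt += tone_penalty * (n_tones - n_deviants)  # deviance decision on each standard tone
    if at_test and trained_with_tone:
        rt *= 1.0 + training_pct                     # cost of having trained with tone counting
    return rt
```

Note that the percentage penalty applies at test whenever training included tone counting, regardless of whether tones are present at test, matching the finding described above.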

Modeling Accuracy Measures. The model was also designed
to simulate accuracy as measured by hit rate and false alarm
rate. Hit rate was lower in VM trials than in CM trials, with
lower hit rate in VM trials for the higher processing load than for
the lower processing load. There was also a decrease in hit rate
with tone counting and in both tone counting conditions at test
when training had been with tone counting. Hit rate effects were
implemented by assessing penalties at the trial level.
False alarm rate was clearly influenced by task difficulty, with
generally more false alarms in VM trials than in CM trials, and
with an additional increase in higher processing load VM trials
in Session 1. False alarm rate was high in the first block of each
session, relative to other CM blocks, perhaps because the task
was new or not recently practiced. Tone counting increased false
alarm rate at training and test and additionally increased false
alarm rate at test when subjects had trained with tone count-
ing. Furthermore, changing from one tone counting condition to
the other between sessions also increased false alarm rate. The
detailed pattern of initial false alarm rate results was modeled
without any further attempt to provide a cognitive model of the
effects. Initial false alarm rates for each block were implemented
as multipliers of the initial rate of the prior block, with only the
first blocks of the first session being numerically specified. Ini-
tial false alarm rate values for the first block were taken from
the experimental data.
Hit rate did not show any evidence of learning. However,
false alarm rate had a more complex pattern of performance
improvement that did suggest learning with practice. Learning
to avoid false alarms could be seen, and the rate of improve-
ment was a function of mapping type, processing load, and ses-
sion. Subjects learned most when the initial level was highest,
presumably because initial high rates allowed more room for
improvement. Initial high rates were found (a) in the first block
of both sessions, (b) in the VM4 blocks of training, and (c) in all
VM blocks of test. The results were implemented by allowing
false alarm rate to be adjusted downward from the block-initial
rates on each trial by a constant proportion.

Model Assessment
The final model was run twice with different random seeds. As
an example, Figure 10.5 shows the comparison of the RTs from
the experiment and simulated RTs from the first of these two
model runs. Table 10.2 includes r2 and RMSE comparing the two
final model runs to the data and to each other, for RT, hit rate,
and false alarm rate.
The fit is not as good for false alarm rate as it is for hit rate
or RT (see Table 10.2), due in large part to the more variable
and complex false alarm rate pattern. Also, twice as many data
Figure 10.5 Comparison of the IMPRINT model and RADAR data for
response time for correct responses, for training in silence
(top) and training with tone (bottom) groups. The spelled-
out labels represent subject data, and the labels with two
letters represent IMPRINT output. SS = silent-silent, ST =
silent-tone, TS = tone-silent, TT = tone-tone, CM = consis-
tent mapping, VM = varied mapping, 1 = one target item
and one filled square (LP); 4 = four targets and four filled
squares (HP). To see the impact of tone, for the training
half compare the two panels, and for the test half within
a panel compare the first and second of each pair of bars.
Also, means in training are averaged for the groups with
the same training condition, because at that point there is
no difference in method between them.
Table 10.2 Values for r2 and RMSE for the Comparisons of the Two
Final Model Runs to the Subject Data in the RADAR Task, as well as
to Each Other. The Unit for the Response Time RMSE is ms.

                                    r2       RMSE
  Response Time
    Model Run 1 vs. Data           0.975     54.2
    Model Run 2 vs. Data           0.984     49.1
    Model Run 1 vs. Model Run 2    0.980     34.3
  Hit Rate
    Model Run 1 vs. Data           0.969     0.017
    Model Run 2 vs. Data           0.940     0.025
    Model Run 1 vs. Model Run 2    0.974     0.017
  False Alarm Rate
    Model Run 1 vs. Data           0.461     0.062
    Model Run 2 vs. Data           0.285     0.067
    Model Run 1 vs. Model Run 2    0.210     0.081

points had to be fit because of the observed learning, which precluded
averaging over block in each session as was done with
the previous two measures.

The Fusion Task

Experimental Task to Be Modeled


The Information Integration, or fusion, task was developed to
provide a test bed that resembled authentic military field opera-
tions. In the fusion task, a sequence of targets is shown in a
matrix, after which the subject selects the location in that field
expected to maximize damage to the targets.
In this task, for the experiments modeled here, the subject
saw a 10 × 10 grid of squares occupying most of the computer
screen. Seven locations (“targets”) were shown sequentially for 1
s each; then the subject had to do two things: (a) recall the seven
locations (either using serial recall or free recall), and (b) select
the one square (firing decision) that he or she thought would
result in the most “damage” to targets in the seven locations.
The ideal firing location was the one that maximized the dam-
age based on the distance of the firing location to each target.
Subjects were given feedback as to accuracy with respect to the
optimal location.
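The chapter does not specify the damage function itself, but the idea of scoring every grid square against all seven targets can be illustrated with an assumed inverse-distance-squared kernel. The kernel (and the grid scan) are assumptions for illustration only; the experiment's actual damage function may differ:

```python
def best_firing_square(targets, grid=10):
    """Pick the grid square maximizing total damage to the targets,
    assuming (for illustration only) that damage falls off as
    1 / (1 + d^2) with squared Euclidean distance d^2 from the
    firing square to each target."""
    def damage(x, y):
        return sum(1.0 / (1.0 + (x - tx) ** 2 + (y - ty) ** 2)
                   for tx, ty in targets)
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    return max(cells, key=lambda c: damage(*c))
```

Under any such distance-decreasing kernel, the optimal square lies amid the cluster of presented target locations, which is why feedback relative to the optimal location is informative about the subject's memory for those locations.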
The experiments modeled are Experiments 2 and 3 of Ketels,
Healy, Wickens, Buck-Gengler, and Bourne (2010). In these two
experiments the seven target locations were confined to one of
the four 5 × 5 quadrants of the 10 × 10 grid, with each of the
four quadrants being used equally often among the 48 trials of
seven items; two orders of items, one the reverse of the other,
were used to eliminate location differences between the initial
and final items. The primary difference between the two experi-
ments is that in Experiment 3 the recall came before the firing
decision, and in Experiment 2 the firing decision came before
the recall. Only the serial recall condition is modeled.

Experimental Data to Be Modeled


The fusion experiments required both decision and recall, and
both responses were modeled. The primary purpose of the mod-
eling was to determine the extent to which the serial position
recall function could be used to predict the corresponding deci-
sion-making function. To the extent that the modeling effort was
successful there would then be evidence that the decisions made
by subjects depended, at least in part, upon what they recalled
about the locations.
The observed firing decision data essentially did not differ
between the two experiments. In contrast, the observed recall
rate in Experiment 3, in which recall was done first, was better
than that in Experiment 2, in which recall followed the decision.
The recall functions showed large primacy effects with a very
small recency effect only on the last position. The observed fir-
ing location, in relation to the seven target locations, bore some
resemblance to the serial recall curves. Thus, Experiment 3 was
taken as the base experiment, with the idea that the amount
recalled in both experiments should inform the firing decision
but that the firing decision in Experiment 2 might adversely
affect memory for the seven locations, thereby depressing the
recall accuracy in Experiment 2 relative to Experiment 3.
Accuracy in recall was scored all-or-none, based on whether
or not the exact location was chosen. Firing response selection
was measured as the distance of the firing response from the
displayed location for each displayed location position (except
the middle position which was omitted from the analysis). Note
that this is not a measure of decision accuracy (which would
be the distance from the ideal location) but rather of decision
response bias. For each position, the distance calculated was
the average of the values for the X and Y coordinates. Overall,
subjects showed a large primacy effect, with firing distance
becoming larger as the positions approached the middle. There was, in
addition, a weaker recency effect in firing distance. Thus, initial
positions were closer to the selected firing position than later
ones, and especially closer than ones in the middle (see chapter
3 and Ketels et al., 2010, for a cognitive interpretation of these
effects).

Modeling Fusion in IMPRINT


Both the firing decision data and the recall measure produced
bow-shaped serial position functions. It was hypothesized that
the extent to which a particular location was recalled contrib-
utes proportionately to the firing decision. Thus, a model of serial
recall was needed that could capture the serial recall functions
observed (strong primacy, a slight recency effect) and then be used as the
basis for the firing decision. Statistical analysis showed that fir-
ing first (Experiment 2) degraded performance on recall, com-
pared to recalling first (Experiment 3), whereas recall order did
not seem to make a difference in the firing location itself.
Several well-known models of recall were examined to find one
that could be beneficial as a starting point for understanding the
data trends and as an underpinning for the IMPRINT model. The
Start-End Model (SEM; Henson, 1998) was found to be the most
useful of existing cognitive models for the present purposes. The
SEM has several core assumptions; of importance here are that
(a) the starting and ending positions are the most salient and
thus serve as anchors; (b) the length of the list is known
or at least expected; (c) there are both a start marker, strongest
at the beginning and weakening over the span of the list, and
an end marker, weakest at the start of the list and strength-
ening over the span of the list, thus the relative strengths and
rates of increase or decrease of strength (four parameters) can
be used to create a two-dimensional code for each position; (d)
these parameters can be translated into a bow-shaped function,
with different values of the parameters giving different shapes.
The strengths of the SEM relative to the fusion experiment are
that it captures key serial recall findings (e.g., effects of primacy,
recency, list length, proactive interference, etc.), it was the first
model to capture complete patterns of errors (e.g., transposi-
tions, repetitions, omissions, inclusions, etc.), and for the pur-
pose of modeling fusion, even a truncated version recreated in a
theoretical fashion a usable bow-shaped curve.
SEM computes the start and end marker strengths from the
number of items, the maximum strengths of the start and end
markers, and the change in strength of each marker over posi-
tion in the list. These strengths for the start and end markers
are summed for each position, resulting in a curve resembling
a bow-shaped recall function. In the IMPRINT model, these val-
ues (one for each position) were converted to proportions and
then adjusted to put them into the appropriate range for the
experimental data. The resulting values were used as a thresh-
old (explained in more detail below) for recall accuracy. The
parameter values for the SEM equations were adjusted until the
function resembled that of Experiment 3.
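Under the SEM-style assumptions sketched above, a position code can be generated from four parameters. This toy version uses geometric decay and illustrative parameter values (not the fitted ones) to produce the required bow shape:

```python
def sem_strengths(n_items, s0=1.0, s_decay=0.7, e0=0.6, e_decay=0.75):
    """Truncated Start-End Model position strengths: a start marker
    strongest at position 1 and weakening geometrically across the
    list, plus an end marker strongest at the last position and
    weakening backward. Their sum gives a bow-shaped serial position
    curve. Parameter values here are illustrative placeholders."""
    start = [s0 * s_decay ** i for i in range(n_items)]
    end = [e0 * e_decay ** (n_items - 1 - i) for i in range(n_items)]
    return [s + e for s, e in zip(start, end)]
```

With a stronger start marker than end marker, the curve shows large primacy and a smaller recency bump, resembling the recall functions observed in the fusion experiments.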
The working hypothesis for the IMPRINT modeling was that
the selected firing decision location would be impacted by the
amount of memory for each position, rather than by all targets
contributing equally (which, in practice, would be the optimal
weighting of the locations). The recall response was assumed to
be the best guide to what is in memory, and therefore it ought to
contribute to the firing response as well. A second assumption
was that the Experiment 2 firing response was based on the
same memory as the Experiment 3 recall, but the act of firing
degraded the recall threshold for Experiment 2 recall.
The model worked as follows for Experiment 3, in which
the recall response was given before the firing response. The
strengths at each position were computed and adjusted for a
recall accuracy threshold. Then in every trial a random num-
ber was generated for every position and compared against the
value for that position in the recall accuracy threshold function.
Values below the threshold resulted in correct recall, whereas
those greater than threshold resulted in incorrect recall. The
firing response location was computed by using those values
as weights on the positions to give weighted average X and Y
values. The modeled recall data of Experiment 3 were also used
to predict firing responses given in Experiment 2. Recall data in
Experiment 2 were predicted from recall data in Experiment 3
by subtracting the average difference between observed Experi-
ment 2 and Experiment 3 recall curves, to reflect the degraded
recall in Experiment 2 compared with Experiment 3.
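The trial-level mechanism just described can be sketched as follows. The thresholds would come from the adjusted SEM values; where the text is not specific (e.g., exactly which values serve as firing weights), the choices below are assumptions for illustration:

```python
import random

def simulate_trial(locations, thresholds, rng):
    """One simulated fusion trial. Recall at each serial position is
    scored correct when a uniform draw falls below that position's
    threshold; the firing response is the threshold-weighted average
    of the target X and Y coordinates (assuming the memory-strength
    values themselves act as the weights)."""
    recalled = [rng.random() < th for th in thresholds]
    total = sum(thresholds)
    fire_x = sum(th * x for th, (x, y) in zip(thresholds, locations)) / total
    fire_y = sum(th * y for th, (x, y) in zip(thresholds, locations)) / total
    return recalled, (fire_x, fire_y)
```

Because the thresholds are largest for early positions, the weighted firing location is pulled toward the initial items, reproducing the primacy effect seen in the firing distance data.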

Model Assessment
The results of the IMPRINT model compared with the experi-
mental data can be seen in Figure 10.6 (Panel 1: Experiment 3
recall, r2 = 0.947, RMSE = 0.034; Panel 2: Experiment 3 firing
decision, r2 = 0.973, RMSE = 0.067), and Figure 10.7 (Panel
1: Experiment 2 recall, r2 = 0.978, RMSE = 0.028; Panel 2:
Experiment 2 firing decision, r2 = 0.987, RMSE = 0.080). As can
be seen from the fits and the resulting r2 and RMSE values,
the IMPRINT model did a good job capturing the firing values
based on the recall values, including capturing the depressed

Figure 10.6 Recall (top) and firing response (bottom) results from
Experiment 3, experimental data vs. IMPRINT model. For
the firing response, the initial positions (1st, 2nd, 3rd) are
compared to the final positions (7th, 6th, 5th), which have
the same locations across subjects because of the counter-
balancing. Exp = experimental.

Figure 10.7 Recall (top) and firing response (bottom) results from
Experiment 2, experimental data vs. IMPRINT model. For
the firing response, the initial positions (1st, 2nd, 3rd) are
compared to the final positions (7th, 6th, 5th), which have
the same locations across subjects because of the coun-
terbalancing. (Note, the IMPRINT values for the firing
response are identical to those for Experiment 3 because
they are based on the same threshold values.) Exp =
experimental.
recall values in Experiment 2 compared to Experiment 3 that
one might assume if having to make the firing decision first
did indeed impair memory. Thus, the IMPRINT model supported
the hypothesis that the firing decision was based in large part
on memory for the locations, such that the decisions made by
subjects depended upon how much they recalled about the
locations.

Summary of IMPRINT Modeling


The goals of modeling using IMPRINT were twofold: to provide
models that simulate and predict task performance and to
compare cognitive task models on two very different platforms
(IMPRINT and ACT-R). The IMPRINT platform had not previously
been used for detailed cognitive modeling, which makes the
present effort novel.
An additional benefit of the IMPRINT modeling effort was that
in the process of understanding the data in order to model them,
further behavioral and theoretical insights into the modeled tasks
were gained. In the data entry task, IMPRINT model-guided data
reanalysis led to the discovery that chunking between the first
two and the last two digits was not done by all subjects, and
appeared to be a strategy choice; this finding was then success-
fully built into the IMPRINT model (as well as into the ACT-R
model; see chapter 9). In the RADAR task, the false alarm rate
only (not RT or hit rate) showed improvement between blocks.
This previously unnoted but interesting fact was revealed as a
consequence of modeling in IMPRINT. Finally, in the fusion task,
the IMPRINT model revealed how recall memory could be used
to predict the firing decision. Specifically, the model was use-
ful in validating the hypothesis that firing decisions rest largely
on memory for the sequence of target locations preceding the
decisions. In all three cases, the IMPRINT modeling effort con-
tributed to a deeper conceptual understanding of the empirical
results.

References
Buck-Gengler, C. J., Raymond, W. D., Healy, A. F., & Bourne, L. E.,
Jr. (2007). Modeling data entry in IMPRINT. In Proceedings of the
Sixteenth Conference on Behavior Representation in Modeling and
Simulation (pp. 205–206). Orlando, FL: Simulation Interoperability
Standards Organization.
Buck-Gengler, C. J., Raymond, W. D., Healy, A. F., & Bourne, L. E.,
Jr. (2010). Modeling a visual search task with a secondary task in
IMPRINT. In Proceedings of the Nineteenth Conference on Behavior
Representation in Modeling and Simulation (pp. 63–64). Charleston,
SC: Simulation Interoperability Standards Organization.
Gonzalez, C., & Thomas, R. P. (2008). Effects of automatic detection
on dynamic decision making. Journal of Cognitive Engineering and
Decision Making, 2, 328–348.
Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004).
Effects of prolonged work on data entry speed and accuracy. Journal
of Experimental Psychology: Applied, 10, 188–199.
Healy, A. F., Kole, J. A., Wohldmann, E. L., Buck-Gengler, C. J., &
Bourne, L. E., Jr. (2011). Data entry: A window to principles of train-
ing. In A. S. Benjamin (Ed.), Successful remembering and successful
forgetting: A festschrift in honor of Robert A. Bjork (pp. 277–296). New
York: Psychology Press.
Henson, R. N. A. (1998). Short-term memory for serial order: The Start-
End Model. Cognitive Psychology, 36, 73–137.
Ketels, S. L., Healy, A. F., Wickens, C. D., Buck-Gengler, C. J., & Bourne,
L. E., Jr. (2010, April). Spatial list learning and decision making in
the fusion paradigm. Poster presented at the 80th Annual Conven-
tion of the Rocky Mountain Psychological Association, Denver, CO.
Kole, J. A., Healy, A. F., & Bourne, L. E., Jr. (2008). Cognitive com-
plications moderate the speed-accuracy tradeoff in data entry: A
cognitive antidote to inhibition. Applied Cognitive Psychology, 22,
917–937.
U.S. Army Research Laboratory. (2011). Improved performance research
integration tool (last updated March 1, 2011). Retrieved from http://
www.arl.army.mil/www/default.cfm?page=445
Young, M. D., Healy, A. F., Gonzalez, C., Dutt, V., & Bourne, L. E., Jr.
(2011). Effects of training with added difficulties on RADAR detec-
tion. Applied Cognitive Psychology, 25, 395–407.
11 Evaluation and Comparison of Models of Human Performance During Training

Bengt Fornberg, William D. Raymond, Carolyn J. Buck-Gengler, and Alice F. Healy
University of Colorado

Bradley J. Best
Adaptive Cognitive Systems LLC

Lyle E. Bourne, Jr.
University of Colorado

The comparison and evaluation of computational models of
human learning are issues of active interest to psychologists
and members of the behavior modeling community (e.g., Gluck
& Pew, 2001; Pitt & Myung, 2002; Young, 2003). A goal of the
present research project was to identify techniques and pro-
cedures that could be used for comparison and evaluation of
models of human learning, and to apply the techniques to the
models developed for predicting training effects. The results of
the model comparisons and evaluations are presented in this
chapter (see also Raymond, Fornberg, Buck-Gengler, Healy, &
Bourne, 2008).
The focus of this work has been on comparisons and eval-
uation of models of two empirical tasks, a digit entry task of
numerical data (digit data entry) and a more complex visual
search task (RADAR). Both of these tasks primarily involve cog-
nitive phenomena, which makes them directly relevant to the
effects of training on performance of cognitive tasks, and both
have been the subject of a wide variety of experimental stud-
ies (Healy, Kole, Wohldmann, Buck-Gengler, & Bourne, 2011;
Young, Healy, Gonzalez, Dutt, & Bourne, 2011; see also chap-
ters 2, 3, and 8, this volume). The models of these tasks were
developed not only to provide descriptive and predictive capa-
bilities, but also to deepen our understanding of the genuine
nature of the processes that are modeled. As part of the present
research project, both tasks were modeled using three different
computational systems: ACT-R, IMPRINT, and Matlab. The three
resulting models of each task were then separately evaluated
and compared. Evaluation of the model simulations was accomplished by measuring model accuracy in terms of the closeness of fit of the model outputs to previously collected experimental data.
Comparisons of the model simulations were done in terms of
model accuracy, model compactness, readability of model code,
and, especially, model execution timing. The comparison data
were gathered for each model on comparable computer systems,
typically standard desktop and notebook PCs (using a single
processor with clock speeds around 2 to 3 GHz).
As will be described in this chapter, the IMPRINT and ACT-R
models were both similarly accurate and had similar execution
times. However, the most striking outcome of the present model
evaluation effort was the very large speed gains that proved pos-
sible when using the Matlab environment to model the tasks, as
compared to execution times of models using IMPRINT and ACT-
R. Specifically, using Matlab resulted in speed increases over the other modeling platforms by factors of about 10,000. It should
be noted that Matlab is a computing environment designed for
scientific or engineering use and not a platform designed specifi-
cally for human behavior simulation. Matlab thus required an
algorithmic implementation of the tasks studied, which may not
be feasible for all applications. With this caveat, equivalent, or
perhaps larger, execution speed gains are likely if other similar
computer environments were to be employed, such as Fortran or
C++, so that the conclusions regarding the advantages of Matlab
are not limited to this particular programming environment.
Because of the speed increases obtained using Matlab, it was
possible to extend the present model evaluation and compari-
son effort to the exploration of two new opportunities that the
increased model execution speed afforded. The first additional
exploration was to perform rapid automated parameter optimi-
zation using state-of-the-art multivariate optimizers; the second
was to use radial basis functions (RBFs) to construct computa-
tionally even faster approximations of the Matlab models’ param-
eter spaces. It should be noted that these opportunities have a
very direct impact on one question that was of central interest
in this research project and in human behavior modeling more
generally, namely, the scalability of models of human learn-
ing using ACT-R and IMPRINT. The major opportunities shown
here to be available algorithmically (e.g., with a present Matlab
RBF model running some 5 million times faster than a direct
simulation in IMPRINT) suggest that the issue of scalability is
not the concern that it was perceived to be at the beginning of
this model evaluation and comparison effort. An important con-
clusion, however, is that different computational systems have
different strengths and weaknesses, and each system should
be used for the applications for which it was designed. In par-
ticular, scientific programming environments (such as Matlab,
but also Fortran or C++) can handle equation-based tasks with
vastly greater efficiency than systems with other primary goals.
On the other hand, IMPRINT is more appropriate for simulations
of military operations involving personnel and equipment, and
ACT-R for modeling human cognitive processes. With the ability
of most systems to communicate data and interchange compu-
tational requests with other programming environments, hybrid
solutions involving more than one programming platform can be
employed to achieve the best results for a given problem.

The Modeling Tasks


Digit data entry and RADAR were chosen as the tasks for model
evaluation for two reasons: Both tasks had been explored in
multiple empirical studies, so that there were abundant data
available for model creation and evaluation; and the two tasks
differ in cognitive complexity, with digit data entry being a cog-
nitively simple task, and RADAR a more cognitively challenging,
complex task. In addition, the two tasks had already been mod-
eled in ACT-R and IMPRINT (see chapters 9 and 10).

Digit Data Entry Task


The experiments on which the digit data entry models are based
are described in Healy, Kole, Buck-Gengler, and Bourne (2004).
The digit data entry task required subjects to type four-digit
numbers that were displayed to them on a computer screen as
quickly and as accurately as possible. Numbers were presented
one at a time and their digits typed on the keypad to the right of
the keyboard. Subjects did not see the number that they typed.
Each number was terminated by pressing the “Enter” key. The
stimuli consisted of 10 blocks of 64 numbers each, which were
divided by a short break into two session halves of 5 blocks
each. In both experiments there were 32 subjects. In Experi-
ment 1, a set of 64 numbers was repeated in each of the 5 blocks of the first half in different random orders, and a second set of 64 numbers was repeated in different random orders
in each of the 5 blocks of the second half. All subjects typed
using their left (nondominant) hand. In Experiment 2, all num-
bers were unique, and the hand used for typing (left, right) was
crossed with session half to create four conditions of hand use
during the experiment (LL, LR, RL, and RR).
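The block structure of the two experiments can be sketched as a stimulus generator. This Python sketch is illustrative only: the digit range and the use of the standard random module are assumptions, not details of the original experiment software.

```python
import random

def make_stimuli(experiment, seed=0):
    """Generate the 10 blocks of 64 four-digit numbers for one subject.

    Experiment 1: a set of 64 numbers is repeated (reshuffled) across the
    5 blocks of each session half; Experiment 2: all 640 numbers unique.
    """
    rng = random.Random(seed)
    pool = rng.sample(range(1000, 10000), 640)   # distinct 4-digit numbers
    blocks = []
    if experiment == 1:
        for half in range(2):                    # two session halves
            base = pool[half * 64:(half + 1) * 64]
            for _ in range(5):                   # 5 blocks per half
                blocks.append(rng.sample(base, 64))  # new random order
    else:
        for b in range(10):                      # every number unique
            blocks.append(pool[b * 64:(b + 1) * 64])
    return blocks
```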
The data entry task can be broken down into four taxa from
the new task taxonomy for training (see chapter 8): visual detec-
tion (reading numbers from the screen), memory/symbolic rep-
resentation (the cognitive representation of each number), motor
response planning (for typing each number), and manipulation/
fine motor output (typing).

RADAR Task
The experiments on which the RADAR models are based are
described in Young et al. (2011). The RADAR task, developed by
Gonzalez and Thomas (2008), is a visual search task in which
subjects look for assigned symbol targets in four squares that
move from the four corners to the center of a radar-like display in
a fixed amount of time. Each search opportunity, as the squares converge to the center, is called a frame. In each of the seven frames comprising a trial, different sets of target and distractor symbols
may be shown in the squares, and the target symbols may differ
from trial to trial (with only one target possible per trial). The
target memory set assigned for a trial contains either one or four symbols. Targets and distractors may be taken from
different symbol sets (consistent mapping) or the same symbol
set (varied mapping). Some squares may also be blank. Subjects are to respond only if a target in the cur-
rent memory set appears in one of the squares, and scoring is
based on both accuracy and correct response speed.
The task can be broken down into six taxa from the new tax-
onomy for training (see chapter 8): visual detection (scanning
for symbols), memory/symbolic representation (remembering
targets in the memory set), imagery/visual representation (of
symbols seen in a frame), decision making (target or distrac-
tor decision), motor response planning, and manipulation/fine
motor output (button push on target detection).
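The core detection decision on a single frame reduces to a set-membership test. A minimal Python sketch, with symbols represented as strings (an assumption; the real displays use graphic symbols):

```python
def frame_response(memory_set, squares):
    """Decide whether to respond on one RADAR frame.

    Respond only if a symbol in the current target memory set
    appears in one of the (possibly blank) squares.
    """
    return any(s is not None and s in memory_set for s in squares)
```

For example, a frame showing the assigned target produces a response, and an all-blank frame does not.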

The Modeling Platforms and Model Implementations
The IMPRINT and ACT-R models of the digit data entry and
RADAR tasks were developed independently by experts in the
two platforms to match the quality of model implementations
between platforms. The ACT-R models of the two tasks were
developed by researchers at Carnegie Mellon University (see
chapter 9). The ACT-R model of digit data entry is described
in Gonzalez, Best, Healy, Kole, and Bourne (2011). The ACT-R
models of RADAR are described in Best, Gonzalez, Young, Healy,
and Bourne (2007) and in Gonzalez, Dutt, Healy, Young, and
Bourne (2009). The programmers for the IMPRINT models were
researchers at the University of Colorado, Boulder (see chapter
10). The IMPRINT model of digit data entry is described in Buck-
Gengler, Raymond, Healy, and Bourne (2007). The IMPRINT
model of RADAR is described in Buck-Gengler, Raymond, Healy,
and Bourne (2010). Because the IMPRINT models of both digit
data entry and RADAR were equation based, it was possible to
convert these models to Matlab, so that the IMPRINT models of
the two tasks and the corresponding Matlab versions of them
are algorithmically equivalent.
Although the details of the ACT-R and the IMPRINT/Matlab models differ, implementations in all three environments share some
underlying general principles, including the facts that (a) the
tasks are decomposed into simple conceptual components; (b)
the components are combined to create a simulation with a (rela-
tively) user-friendly interface; and (c) the generated data simulate
variable human behavior on the tasks. However, the modeling
platforms differ in several respects. The focus of intended use
is different for the platforms: ACT-R was designed for cognitive
modeling; IMPRINT was designed for assessing and predicting
human performance in military tasks; and Matlab was designed
for science and engineering applications. The platforms also dif-
fer in computational speed, with Matlab faster than the other
two platforms, as has been noted. In addition, because Matlab
was intended for general engineering and scientific use, its pro-
gramming environment includes tools for parameter optimiza-
tion, graphics display, and interfacing to parallel computing
hardware, which the other platforms lack. On the other hand,
because of their intended use for human behavior simulation,
both the ACT-R and the IMPRINT platforms provide embedded
human performance-specific information, whereas Matlab does
not. This difference will inevitably make the former two plat-
forms slower on some simple tasks, but gradually more power-
ful as this type of information is increasingly utilized in more
complex tasks or scenarios. Moreover, the embedded informa-
tion differs in ACT-R and IMPRINT: ACT-R embodies a theory of
general cognitive mechanisms; IMPRINT can call on information
regarding the skills and abilities of army personnel and proper-
ties of military equipment.
The next three subsections give overviews of the three modeling platforms and the models created within each.

ACT-R Platform
ACT-R (Anderson et al., 2004; Anderson & Lebiere, 1998) is a
modeling platform based on a unified theory of cognition devel-
oped through over 30 years of cumulative improvement. It has
accounted for hundreds of phenomena from the cognitive psy-
chology and human factors literature. The version employed
here, ACT-R 6.0, is a modular architecture composed of interact-
ing modules for declarative memory, perceptual systems (such
as vision and audition modules), and motor systems (such as a
manual module), all synchronized through a central production
system.
ACT-R is a hybrid system combining a tractable symbolic
level, implemented as a production system that enables the
specification of complex cognitive functions, with a subsymbolic
level that tunes itself to the statistical structure of the environ-
ment. The combination of these aspects provides both the broad
structure of cognitive processes and the graded characteristics
of cognition, such as adaptivity, robustness, and stochasticity.
The central part of the architecture is the production mod-
ule. A production can match the contents of any combination of
buffers. Buffers include the goal buffer, which holds the current
context and intentions, the retrieval buffer, which holds the most
recent chunk retrieved from declarative memory, visual and audi-
tory buffers, which hold the current sensory information, and the
manual buffer, which holds the current state of the motor module.
During the matching phase, production rules whose conditions
match to the current state of various information buffers (goal,
memory retrieval, perceptual, etc.) qualify to enter a conflict set.
Because ACT-R specifies that only one production can fire at a
time, the rule with the highest expected utility from among those
that match is selected as the one to fire. Utility is graded both by
the expected value of information, driven by activation, and the
quality or exactness of the match itself.
The general structure of the ACT-R model of the data entry
experiments includes two main steps: (a) noticing and encod-
ing of the stimulus from the computer screen, and (b) entry of
the encoded stimulus using the keypad. The first step further
unpacks to include reading of individual numbers, whereas
the second step includes preparing the proper motor program
to press the desired keys. These steps say little about whether
numbers are encoded more than one at a time, and whether
any key presses occur before all of the numbers are encoded.
As is described below, human participants actually use mul-
tiple strategies to approach even this simple task, and tend to
vary between individuals in a preference either to encode all
four digits before entering any, or to encode a pair of digits at
a time, entering a pair after it is encoded. Thus, the model was
constructed to support both of these strategies. Again, although
the task is quite simple, it still requires maintenance of encoded
stimuli in working memory, potentially decomposing a task into
subgoals (working on entering one pair at a time), and the inter-
action with skilled actions (keyboard entry), which is simulated
through the application of individual ACT-R productions (e.g.,
typing the “9” key on the keypad).
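The two encoding strategies the model supports can be contrasted by listing the order of encode and keypress events each produces. This toy Python trace, with invented event labels "E" (encode) and "K" (keypress), is a sketch of the strategies, not of the ACT-R production code:

```python
def entry_events(digits, strategy):
    """Event order for typing one four-digit number under two strategies.

    'all-four': encode all four digits, then type them all;
    'pairwise': encode a pair of digits, type it, then the next pair.
    """
    if strategy == "all-four":
        return [("E", d) for d in digits] + [("K", d) for d in digits]
    events = []
    for pair in (digits[:2], digits[2:]):       # first pair, then second
        events += [("E", d) for d in pair] + [("K", d) for d in pair]
    return events
```

The pairwise trace interleaves encoding and typing at the pair boundary, which is where the observed extra latency on the third keystroke would arise.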

IMPRINT Platform
The versions of IMPRINT used to create the models of digit data
entry and RADAR examined in the present research, IMPRINT
7 and IMPRINT Pro, are primarily used to create simulations of
military personnel and equipment engaged in military tasks.
The simulations can be used to evaluate planning efficiency,
given constraints on time, accuracy, and equipment function-
ality, as well as human skills, abilities, and capacities. Simu-
lations can also take into account variables in the external
environment that may affect personnel or equipment. IMPRINT
was not specifically designed for modeling cognitive tasks; how-
ever, the current modeling effort shows that cognitive models
can be implemented on the IMPRINT platform.
The model of the digit data entry task was based on a cogni-
tive model of the task that involves three subprocesses: (a) read
each number and create an ordered mental representation of the
digits, one digit at a time; (b) access each of the represented dig-
its in sequence to create a motor plan for typing it; and (c) uti-
lize the motor plan to type each digit, followed by the enter key.
The subprocesses were assumed to occur sequentially for each
number. However, accommodation was made in the simulation
for a phenomenon observed in the experimental data, mentioned
in the last section, in which some subjects tended to group, or
chunk, the first two digits of a number and the last two digits
of a number, as evidenced in these subjects by longer response
times for the third keystroke than for the second and fourth.
The chunking phenomenon presumably entails some additional
cognitive processing between the two chunks, which was simu-
lated in the model.
The IMPRINT model consisted of a main network and a goal
network. In the main network, parameters can be set to dupli-
cate the conditions of Experiment 1 of Healy et al. (2004) (all left
hand typing and number repetition in each half) or of Experi-
ment 2 (typing hand crossed with session half and no repeated
numbers). The goal network was called repeatedly until the
stimuli were exhausted. Each run of the model represented the
output from one statistical subject.
A number of human performance parameters in the model
were assigned values stochastically to simulate human variabil-
ity of performance. Values for stochastic variables were taken
from a variety of probability distributions (viz., normal, uniform,
and gamma), which were chosen, together with their param-
eters, to capture distributions observed in the experimental
data. Other model parameters were predetermined through data
inspection and were not left free to vary in the model.
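Stochastic parameter assignment of this kind can be sketched as sampling from a truncated distribution. The normal distribution and the specific values below (mean, SD, floor) are illustrative assumptions, not the model's fitted parameters:

```python
import random

def keystroke_time(rng, mean=0.30, sd=0.05, floor=0.08):
    """Draw one simulated keystroke latency in seconds.

    A normal draw truncated at a physiological floor, standing in for
    the model's draws from normal, uniform, and gamma distributions.
    """
    return max(floor, rng.gauss(mean, sd))

rng = random.Random(42)
times = [keystroke_time(rng) for _ in range(1000)]
```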

Matlab Platform
Matlab evolved from FORTRAN in the late 1970s, and has since
become a widely used programming environment in science and
engineering. The language is technically an interpreted one, but
its statements are in effect compiled on their first execution,
and then reused in this latter form, providing greater execution
speed. The language is built around matrix/vector operations
and, when used so that this feature is exploited, model execu-
tion speeds come quite close to what is theoretically possible on
the computer hardware on which models are run.
The hardware of modern PCs often allows many computa-
tional threads to execute simultaneously. Not only are com-
puters typically equipped with one or several dual-core (or
multiple-core) processors, each of these cores may furthermore
be hyperthreaded (doubling again the number of independent
simultaneous threads). The resulting opportunities of parallel
processing are automatically utilized in Matlab’s matrix opera-
tions, with no special user attention needed. Matlab’s parallel
computing toolbox can also be utilized for other types of opera-
tions with (in most cases) only a few lines of extra programming.
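The payoff of this matrix/vector style can be illustrated in any array language. The NumPy sketch below (an analogue of Matlab's array operations, with an invented practice-speedup curve) simulates all subjects and trials in a few whole-array expressions rather than nested loops:

```python
import numpy as np

# Simulate 32 subjects x 640 trials of response times at once.
rng = np.random.default_rng(0)
base = 0.30 - 0.01 * np.log1p(np.arange(640))      # practice speedup (assumed form)
rt = base + rng.normal(0.0, 0.05, size=(32, 640))  # broadcast over all subjects
block_means = rt.reshape(32, 10, 64).mean(axis=2)  # 10 blocks of 64 trials each
```

Each line operates on the whole data array, so the interpreter overhead is paid once per expression rather than once per trial.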
In the present project, the Matlab model was a direct trans-
lation of the algorithms used in the IMPRINT code. Several
advantages were found to porting the algorithmic parts of
the IMPRINT code to Matlab. In addition to higher execution
speeds, Matlab code is short and easy to write. It can also be
comprehensively viewed as a single program, unlike IMPRINT
code, which is distributed throughout the simulation interface.
As mentioned, Matlab also has available within it some pow-
erful tools for graphics, optimization, debugging, and performance profiling. Modeling in Matlab also allowed comparison
of two environments specific to modeling human cognition and
performance (ACT-R and IMPRINT) against one with no such
specialization.
It should be stressed that the choice of Matlab (as opposed to
some other scientific or engineering programming environment)
was made for obtaining outside benchmark assessments on the
evaluation speeds of ACT-R and IMPRINT in the most flexible
and convenient way possible, and not because it was expected
that Matlab should be adopted on a large scale for cognitive
modeling. Matlab allowed a focus on obtaining timing and scal-
ability comparisons with little attention diverted to implementa-
tion technicalities.

Model Evaluations and Comparisons


Model comparison and evaluation can be divided into two prob-
lems, each focusing on models of the two tasks described pre-
viously. The first model comparison problem considered the
IMPRINT (see chapter 10), ACT-R (see chapter 9), and Matlab
models of the digit data entry task (see Raymond et al., 2008).
The second model comparison problem considered only the
IMPRINT and Matlab models of the RADAR task, not the ACT-R
models.
Model evaluations were performed in two ways. First, for a
model to be useful, it must be accurate. Specifically, the model
should be capable of accurately simulating the empirically
derived human data. To measure model accuracy, correlations
were calculated between the model outputs and the experimen-
tal data for each experiment. In addition, Root Mean Square
Errors (RMSEs) were calculated between model outputs and
the experimental data. Second, a model can be evaluated with
respect to the time it takes to run a simulation. For this com-
parison timing information was collected for execution of each
model.
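Both evaluation measures are standard; for concreteness, a plain-Python sketch of the correlation and RMSE computations applied to paired model outputs and observed values:

```python
import math

def correlation(xs, ys):
    """Pearson correlation between model outputs and observed data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def rmse(xs, ys):
    """Root Mean Square Error between model outputs and observed data."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```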

Accuracy Comparisons
Generally, there were no significant differences among the
platforms in their ability to simulate the empirical data. Cor-
relations were above .97 between all model pairs for both data
entry experiments. The models of digit data entry rely heavily
on random samplings to simulate human behavioral variability,
so that no two runs of any model give identical outputs. Nev-
ertheless, it should be noted that the differences between any
combination of IMPRINT, ACT-R, and Matlab runs used for the
accuracy comparison were no larger than between two different
IMPRINT, ACT-R, or Matlab runs.
Similarly, there were no significant differences in modeling
accuracy between the Matlab and IMPRINT models of RADAR.
The results for the comparison between the Matlab and
IMPRINT models of digit data entry were expected, because the
Matlab code for the digit data entry task was a direct transla-
tion of the IMPRINT algorithms. However, the Matlab code for
the RADAR task was structured differently from the IMPRINT
model of that task. The logical structure of the Matlab model
was a set of nested loops, whereas the IMPRINT structure
consisted of separate, interacting modules. Nevertheless, the
mathematical algorithms and parameter choices were identi-
cal in the two models, and hence, in comparing the Matlab and
IMPRINT models of RADAR there were again no differences in
modeling accuracy.

Performance Timing Comparisons


The ACT-R model of digit data entry was run on a Dell laptop
running Windows XP with a 2.0 GHz mobile Intel CPU and 1 GB
RAM and used ACT-R version 6.0. ACT-R requires a Lisp envi-
ronment; the current model runs used Allegro Common Lisp
version 6.1. The model required approximately 16 min for the
simulation of each experiment (i.e., approximately 30 s for each
of the 32 subjects). Note that the 16 min also included the time
needed to create detailed data files containing individual trial
results for each participant and file summaries, as well as the
time needed to handle the memory management necessary for
the Lisp interpreter.
Lisp is an interpreted programming language, and significant
speedup could be achieved by allowing model code to be par-
tially compiled, although this possibility was not pursued with
the current model. Note that, in addition to the small amount
of data that is collected and collated for the individual model
runs, the ACT-R system processes and records a large amount
of information that was not used in the current study, but is
nonetheless readily accessible (e.g., activations of every chunk,
previous instantiations of productions, a record of every goal the
system attempts to achieve, etc.). Thus, the performance data
produced by the model are derived from its behavior rather than
produced as a primary product.
The IMPRINT model of digit data entry was run using version
7.30 on a Dell computer under Microsoft Windows XP Profes-
sional with a 2.8 GHz processor and 2 GB memory. The model
required 24 min for the simulation of each experiment. This
total amounts to approximately 45 s for each of the 32 subjects.
During execution IMPRINT writes output data to a Microsoft
Excel spreadsheet. The execution time reported here does not
include the overhead for Excel output, but only the time needed
for producing the means for each of the 10 blocks of stimuli
when averaged over all statistical subjects (in each condition) and over all nonerror items, for both experiments. Writing all the data generated during a model run of the experiment to an Excel
spreadsheet file takes an additional 10 min. IMPRINT does not
include any profiler option that details the amount of time each
line of code requires. The times quoted are “wall clock times.”
The code for the Matlab implementation of data entry was
accomplished in no more than about 70 lines of code (not count-
ing comment lines). The Matlab model was run on a Dell GX-270
PC single processor operating at 3.2 GHz, with 2 GB RAM, under
Windows XP. Execution of the Matlab code with parameters set
for simulation of Experiment 1 (for all 32 subjects) took approxi-
mately 0.085 s. The time per subject for Experiment 1 was thus
about 0.0027 s (2.7 ms). For Experiment 2 the timing was equiv-
alent, resulting in a typical computer time of 0.17 s for running
both experiments.
The ratio between the Matlab and IMPRINT execution times is
thus approximately 6/100,000. Because the speeds of the com-
puters are roughly similar, this ratio suggests that assigning
specific subtasks to Matlab can offer gains that are much larger
than porting from a PC to a supercomputer system. It should be
stressed that using different computer languages or systems for
different tasks within a single project is a much more appropri-
ate approach for large tasks than relying on a single language.
Most programming environments, including IMPRINT and Mat-
lab, have interface options for running subtasks written in other
languages.
The conversion of the RADAR model from IMPRINT to Matlab
differed from the conversion of the data entry task in two pri-
mary ways: (a) stochastic features entered into the RADAR task
in such a way that Matlab’s array processing features could not
be applied to increase code efficiency as they were in the digit
data entry conversion; and (b) the general programming style
of Matlab (shared with Fortran, C/C++) encouraged the use of
nested loops for the RADAR conversion, rather than relying on
interacting modules, which were used for the digit data entry
model. These differences led to roughly offsetting advantages
and disadvantages. The Matlab model turned out again to be
about 10,000 times faster than IMPRINT on equivalent hard-
ware. The Matlab code was again extraordinarily compact and
readable, requiring in this case only about 100 lines of execut-
able code (plus about 30 lines for setting variable values).

Model Exploration: Parameter Optimization and RBF Modeling
Model evaluation was initially conceived in this research as
comparisons of model accuracy and performance, which would
lead to an assessment of scalability for future larger modeling
tasks. However, the extreme speed advantages of the Matlab
implementations pointed to two opportunities for further model
exploration. The first of these opportunities was the applica-
tion of algorithms that would allow for more extensive model
parameter optimization than could be accomplished by repeat-
edly running the IMPRINT and ACT-R models with selected
parameter values. The second opportunity was the creation of
RBF approximation functions for models of the two tasks that
further increased the speed of model execution, and hence of
parameter space exploration.
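The RBF idea can be sketched in a few lines: fit an interpolant through model outputs evaluated at a handful of parameter points, then evaluate the cheap interpolant instead of the full model. This Gaussian-RBF sketch in Python/NumPy is a minimal illustration of the technique, not the chapter's actual construction:

```python
import numpy as np

def rbf_fit(centers, values, eps=1.0):
    """Fit Gaussian RBF weights so the interpolant passes through
    the model outputs (`values`) at the sampled parameter points
    (`centers`, one row per point)."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    return np.linalg.solve(np.exp(-(eps * d) ** 2), values)

def rbf_eval(x, centers, weights, eps=1.0):
    """Evaluate the interpolant at a new parameter point x."""
    d = np.linalg.norm(centers - x, axis=1)
    return np.exp(-(eps * d) ** 2) @ weights
```

By construction the interpolant reproduces the model's output exactly at the sampled points, and evaluating it costs only one small matrix-vector product.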

Optimization
There are several approaches available for parameter optimi-
zation, some of which are yet to reach their full potential in
cognitive modeling. Although finding the global optimum of a
function of one or two variables can usually be handled effi-
ciently, and finding local optima of functions of many variables
is also relatively straightforward (meaning that effective algo-
rithms and software are available in optimization “toolboxes”),
the problem of finding global optima of functions of many vari-
ables is a daunting one. In Gluck, Scheutz, Gunzelmann, Harris,
and Kershner (2007), a calculation based on ACT-R, exploring a
four-variable parameter space by means of 21, 26, 105, and 31
increments in respective parameters, is reported to have con-
sumed 96,000 processor hours on a cluster at Wright-Patter-
son’s High Performance Computing Center. For each additional
variable sampled in a similar manner, times would be expected
to rise by another factor of around 20. Clearly, both faster gen-
eral optimization algorithms and an increase in computational
speed should be pursued as far as possible.
In a review in 2000 of the 10 most influential algorithms
developed during the 20th century (Cipra, 2000; Dongarra &
Sullivan, 2000), simulated annealing appeared in first place.
Genetic algorithms are a second approach, whose impact is yet to be fully realized. In many cases, optimization even over dozens, or tens of dozens, of variables can be entirely feasible.
Both simulated annealing and genetic algorithms are avail-
able within the Matlab environment, and adding either of these
optimizers to an existing model requires less than 10 extra
lines of code.
A large number of optional toolboxes are available with Mat-
lab, including one that provides both genetic algorithms (GA)
and simulated annealing (SA) search capabilities. As noted
above, these are two very successful strategies for searching
through high-dimensional parameter spaces for locating global
optima more effectively than an exhaustive parameter space
search. Both search methodologies borrow their key ideas from
processes in nature: biological evolution (for GA), and crystal
formation through slow cooling (for SA). Both GA and SA were
applied to the optimization of the IMPRINT/Matlab model of
digit data entry with five free parameters.
For optimization of the data entry model five important model
parameters were chosen that moderated learning and perfor-
mance. Using these five parameters, GA and SA optimizations
were each run 20 times. The runs executed both Experiments
1 and 2. A GA optimization consisted of allowing a population
of size 30 to evolve through 60 generations, for a total of 1800
evaluations of the RMSE objective function. The typical time for
each GA optimization was about 5 min. The SA optimizations
were stopped after about equally many objective function evalu-
ations, thus again taking around 5 min for each full run. The
results for both GA and SA optimizations are shown in Figure
11.1. The 20 GA and 20 SA runs give results comparable to what
an exhaustive search would have provided, but in a fraction of
the approximately 10 days the latter would have required.
For each of the five selected variables, Figure 11.1 shows two
horizontal lines with short vertical lines between them. The
extent of each of the horizontal lines corresponds to a reason-
able range of values for the respective variable, shown at the left
edge of the lines. Along the top line for each variable is shown
the outcome for that variable of 20 separate global GA optimiza-
tions; along the bottom line is shown the same for 20 SA opti-
mizations. The results show that the values (the “hand derived”
values) found by data inspection and a small number of runs
of the original IMPRINT model are consistent with the global
238 Bengt Fornberg et al.

[Figure 11.1: for each parameter, a GA row and an SA row span its search range: Cognitive learning [-0.50, 0.00]; Motoric learning [-0.20, 0.00]; Repetition priming [0.00, 0.50]; Left-hand penalty [1.00, 3.00]; Cognitive slowdown [0.00, 0.20].]

Figure 11.1 Outcomes of 20 GA and 20 SA optimizations of five model parameters for the Matlab digit data entry model. The horizontal lines represent the search ranges indicated numerically to the left. Short vertical lines show the outcomes of the individual optimizations for which the RMSE was 0.06 or better. We can see that these mostly are in good agreement with the hand-derived parameter values (longer vertical lines, placed in between the GA and SA results), while also conveying a measure of the parameter uncertainty.

optimization results. Although the results of the automated
optimization did not differ appreciably from the hand-derived
optimal values, the comparison shows that manual optimization,
which was feasible for this model but is not always practical,
can be confirmed (or replaced) by only a few minutes of
computing using a global optimizer.
In addition, the variation between different optimization runs
can provide good information about different model parameters’
uncertainty ranges. The presence of even large amounts of sta-
tistical noise in a model does not cause major difficulties for
fully automated parameter determination with either GA or SA.
Because scaling issues form a critical aspect of the present
model evaluation task, we can note that having 10 parameters
instead of 5 in the optimization would only increase the GA or SA
times by a factor of 20-100, whereas the cost for an exhaustive
search would increase times by a further factor of 21^5, that is,
to completely unrealistic computer times of the order of 500,000
years.
The RADAR model contains about 30 nontrivial parameters.
Rather than attempting a global optimization simultaneously
over all of these (which would make a challenging problem
nearly impossible), smaller groups of parameters were isolated
that logically belong together, and for which the hand-derived
values were particularly uncertain (or particularly interesting).
Evaluation and Comparison of Models 239
Sixteen parameters in four groups were selected: (a) four param-
eters determining target decision times; (b) four parameters
specifying the probability of target response; (c) five parameters
describing typing rate; and (d) three parameters capturing sub-
ject learning rate.
For these four parameter groups, 20 GA optimizations were
run with population sizes of 40 that evolved through between
10 and 30 generations, depending on the number of parameters
in the group. The result is illustrated in Figure 11.2. In each of
the four subplots, a horizontal line is shown for each parameter.
To the left is given a reasonable range of values, and below each
line a small vertical tick mark shows the hand-derived value for
each parameter. Above each line are shown the outcomes of the
20 GA simulations. In most cases, the agreement is fully sat-
isfactory, with one notable exception (i.e., the third case in the
top right subplot). For this case it was subsequently determined
that there had been an error in entering the hand-derived value
from the original model. The Matlab model optimization thus
found the correct value, underscoring another advantage to
the automated optimization procedure. Overall, in some cases,
the parameters turn out to be well determined by the GA data
whereas, in other cases, the uncertainties are large.
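The repeated-runs logic behind these uncertainty estimates is easy to reproduce: run a global optimizer many times on a noisy objective and read each parameter's uncertainty off the spread of the outcomes. A minimal sketch follows; the two-parameter objective is hypothetical, chosen so one parameter is tightly constrained and the other barely at all.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)

def objective(p):
    # p[0] is sharply constrained; p[1] barely influences the fit.
    return (p[0] - 0.3) ** 2 + 0.001 * (p[1] - 0.5) ** 2 \
           + 0.0005 * rng.standard_normal()

bounds = [(0.0, 1.0), (0.0, 1.0)]
outcomes = np.array([differential_evolution(objective, bounds, seed=k,
                                            tol=1e-10, maxiter=100).x
                     for k in range(20)])

# The spread across the 20 independent runs estimates each parameter's
# uncertainty range: small for p[0], large for p[1].
spread = outcomes.std(axis=0)
print(spread)
```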

[Figure 11.2: four panels (DecisionTimeDist, ResponseProb, FABlockTypeRate, FALearningRate), each showing per-parameter search ranges with hand-derived values and the 20 GA outcomes marked.]

Figure 11.2 Comparison between hand-derived and GA obtained values for 16 model parameters for the Matlab RADAR model, distributed over four different parameter categories. For each parameter, a "reasonable range" is displayed, with the hand-derived value of that parameter marked below each line and the results of 20 GA optimizations marked above it.
Adjusting the model parameters to agree better with the GA
results (e.g., replacing each value with the average for the GA
runs) reduced the typical RMSE by about 15%. The level of
reduction is not so much the issue as the fact that in a totally
automated way and in spite of all the random fluctuations, it
is possible to get information separately on a large number of
parameters, although these contribute only in combined form
toward the (measurable) model fit, as represented by the RMSE.

Radial Basis Function Modeling


The second opportunity afforded by the use of Matlab as a mod-
eling platform was the availability of Radial Basis Function
(RBF) modeling. RBF modeling is a form of data fitting, which is
closely related to the process of data mining (Fayyad, Piatetsky-
Shapiro, Smyth, & Uthurusamy, 1996; Han & Kamber, 2006;
Thuraisingham, 1998). The strength of the data fitting approach
is its ability to bring out entirely unanticipated, but neverthe-
less significant, relations in the data. Such relations frequently
lie deeply hidden in most large data sets, and virtually always
escape attention when using conventional visualization or simi-
lar inspection methods. The RBF methodology was first proposed
about 40 years ago in the context of multivariate interpolation
(Hardy, 1971). The generality and power of RBFs have only been
fully recognized much more recently. The method is currently successfully
employed in numerous other areas of application, such as within
neural networks, for the numerical solution of partial differen-
tial equations, and for graphical surface rendering. For a recent
survey of the concept of RBFs, their mathematical background
and approximation properties, as well as their implementation
in Matlab, see Fasshauer (2007). Note that multivariate inter-
polation allows the creation of RBF approximations of functions,
which can be very rapidly evaluated.
For the current research an RBF model of the previous digit
data entry model’s objective function was used to optimize
model parameters. The model was evaluated at some (i.e., a few
thousand) suitably chosen parameter locations. An attempt to
display the five-dimensional objective function can be seen in
Figure 11.3, where 10 two-dimensional slices through the five-
parameter space show the behavior of the objective function
in each of the 10 two-dimensional subspaces. With about 20
lines of additional code, an RBF model of the Matlab model’s
parameter space was then created that reproduces the process
of parameter space evaluation, but with stochastic noise sup-
pressed. Noise suppression can be performed to whatever degree
is desired. Applied to Matlab's five-parameter digit data entry
model, with the same 10 slices through the five-parameter space
containing the objective function illustrated in Figure 11.3, the
RBF model produces the result shown in Figure 11.4. Note the
identical trends in the two figures (which hold throughout the
full parameter space, and not just on the slices shown).
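The surrogate idea itself is generic. A rough Python sketch using SciPy's `RBFInterpolator` (the two-dimensional noisy objective is an illustrative stand-in for the model's RMSE surface): sample the stochastic objective at scattered parameter locations, then fit an RBF model whose `smoothing` term relaxes exact interpolation and thereby suppresses the noise.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Noisy stand-in objective with its true minimum at (0.5, 0.5).
def noisy_objective(p):
    return np.sum((p - 0.5) ** 2, axis=-1) \
           + 0.02 * rng.standard_normal(len(p))

centers = rng.uniform(0.0, 1.0, size=(2000, 2))
values = noisy_objective(centers)

# smoothing > 0 trades exact interpolation for noise suppression;
# the fitted surrogate then evaluates very rapidly per point.
surrogate = RBFInterpolator(centers, values, smoothing=1.0,
                            kernel="thin_plate_spline")

grid = np.linspace(0.0, 1.0, 21)
pts = np.array([[x, y] for x in grid for y in grid])
best = pts[np.argmin(surrogate(pts))]
print(best)
```

The surrogate's minimum lands near the true optimum even though every sampled value carried stochastic noise.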
The advantages of the RBF approach include increased
computational speed and the suppression of stochastic noise.
Whereas each evaluation of the original Matlab model (both
Experiments 1 and 2) required 0.17 s, the RBF model evaluates
in 0.00030 s (i.e., over 500 times faster than the already very
fast original Matlab version). The GA (and SA) algorithms were
highly effective for parameter optimization, even in the presence
of stochastic noise, so the accuracy gain achieved by working
instead with smoother (and deterministic) functions proved to

Figure 11.3 Displays of RMSE errors (the objective function) for the Matlab data entry model when any pair of two out of the five parameters that were optimized was left to freely vary over a reasonable range, while the remaining three parameters were held at their assumed best positions. The same data are displayed in the top right and the bottom left subplots as surface plots and as contour plots, respectively. The solid dots in the latter figures mark the hand-derived values. The fact that these dots are located at low spots of the different functions indicates that the manual parameter determination was successful.
Figure 11.4 The counterpart to Figure 11.3, with the difference that
the original IMPRINT/Matlab model has been replaced by
a RBF data-fitting model. All the trends from the display
in Figure 11.3 can be recognized again here, but a large
amount of the stochastic noise has been eliminated.

be minor. However, the speed gain of a factor of about 500 will
make optimizations faster by about that same amount.
The computation behind Figure 11.3 required a total of 4410
evaluations of the Matlab model for each of Experiment 1 and
Experiment 2. The total time for producing Figure 11.3 was 12.5
minutes (which would have been 147 days in IMPRINT). In con-
trast, the computing time for producing the data for Figure 11.4,
once the RBF model had been created, was 1.3 s.
Note that the ability to evaluate an RBF model very rapidly
opens up an opportunity, not yet utilized in the literature, to
interactively move through different dimensions and thereby
display multivariate functions without the customary limitation
of two-dimensional paper or a flat computer screen. The left
part of Figure 11.5 displays a standard two-dimensional sur-
face plot, conveying very clearly the character of a function of
two variables. A dashed frame shows how one can "slice" out
a one-dimensional function of x only (with its y-value fixed at
a certain value y0). The resulting slice can be displayed as a
moved by a mouse, causing the curve above it to be dynamically

Figure 11.5 Schematic illustration of the opportunity offered by fast RBF models to visualize functions of several variables. We see here the concept in the case of a two-dimensional function visualized by means of one-dimensional functions.

updated. By this method, a two-dimensional function can be
visualized as a one-dimensional curve together with one slider.
The opportunity that fast RBF models offer in this regard is that
the function to be displayed can be in d dimensions. By displaying
a surface and using d – 2 sliders, one can move the sliders to
obtain immediate visual inspection of a d-dimensional function.
This real-time display capability is an entirely novel opportunity
made possible by the d-dimensional RBF models that were
created and their very high computational speeds (effective
up to five- or six-dimensional spaces, well past the usual
two- or three-dimensional limitation).
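The mechanics behind such a display reduce to evaluating the model on a two-dimensional slice while the remaining d – 2 coordinates are pinned at slider values (the real-time part can then be driven by, e.g., matplotlib's Slider widget). A minimal, hypothetical sketch of the slicing step:

```python
import numpy as np

def slice_2d(f, fixed, dims=(0, 1), grid=None):
    """Evaluate a d-dimensional function f on a 2-D slice.

    The two dimensions in `dims` sweep over `grid`; every other
    coordinate stays pinned at its value in `fixed` (the sliders).
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 41)
    X, Y = np.meshgrid(grid, grid)
    pts = np.tile(np.asarray(fixed, dtype=float), (X.size, 1))
    pts[:, dims[0]] = X.ravel()
    pts[:, dims[1]] = Y.ravel()
    return X, Y, f(pts).reshape(X.shape)

# Example: slice a 5-D bowl in dimensions 0 and 1, with the other
# three coordinates pinned at 0.5 (as three sliders would pin them).
bowl = lambda p: np.sum((p - 0.5) ** 2, axis=-1)
X, Y, Z = slice_2d(bowl, fixed=[0.5] * 5)
print(Z.shape, Z.min())
```

Re-running `slice_2d` whenever a slider moves is fast enough for interactive use precisely because an RBF surrogate evaluates in fractions of a millisecond per point.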

Conclusions
The present project is highly complex, both with regard to quan-
tifying cognitive and motoric concepts and with regard to their
modeling. While pursuing both ACT-R and IMPRINT modeling,
it became clear that remarkable opportunities were available by
performing some tasks in a high-speed scientific programming
environment, here represented by Matlab. Speed gains on the
order of 10,000 seem to be typical relative to IMPRINT and ACT-
R, and optimization tools like GA or SA are then also readily
available. Within this environment, equation-based models can
be developed relatively easily, and then optimized with regard
to their parameters. Once the optimization is complete, conver-
sion to IMPRINT will preserve model quality and is an option for
creating codes that interface well with other military tasks. It
seems to be very beneficial to separate clearly the tasks of model
development and model applications (and use the appropriate
tools for each of these). For the application of finished models,
different tasks should each be carried out in the software envi-
ronments that are best suited for them.
For the critical tasks of model development and parameter
tuning, the equation-based approach implemented in a scientific
computing environment uses tools that will scale well even when
extended far beyond the levels encountered in the
present research. For example, additional factors in the thou-
sands are quite readily available through distributed processing
(still within the Matlab environment). Another direction for the
future will most likely be improved visualization tools for dis-
playing multivariate data (via the use of radial basis functions).
As much as it might be desirable to find completely new com-
putational paradigms that will totally revolutionize all model-
ing and prediction for cognitive and training processes (maybe
based on the incredible computing resources provided by the
latest generations of peta-flop or exa-flop supercomputer sys-
tems, by highly cost-effective GPU or SPI processors, or by quan-
tum computers), it is still very hard at present to see any more
solid opportunities than what are offered by an equation-based
framework, when this is combined with cutting edge numerical
algorithms.

References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., &
Qin, Y. (2004). An integrated theory of the mind. Psychological Review,
111, 1036–1060.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought.
Hillsdale, NJ: Erlbaum.
Best, B. J., Gonzalez, C., Young, M. D., Healy, A. F., & Bourne, L. E.,
Jr. (2007). Modeling automaticity and strategy selection in dynamic
visual detection. In Proceedings of the Sixteenth Conference on
Behavior Representation in Modeling and Simulation (pp. 3–11).
Orlando, FL: Simulation Interoperability Standards Organization.
Buck-Gengler, C. J., Raymond, W. D., Healy, A. F., & Bourne, L. E.,
Jr. (2007). Modeling data entry in IMPRINT. In Proceedings of the
Sixteenth Conference on Behavior Representation in Modeling and
Simulation (pp. 205–206). Orlando, FL: Simulation Interoperability
Standards Organization.
Buck-Gengler, C. J., Raymond, W. D., Healy, A. F., & Bourne, L. E.,
Jr. (2010). Modeling a visual search task with a secondary task in
IMPRINT. In Proceedings of the 19th Annual Conference on Behav-
ior Representation in Modeling and Simulation (BRIMS) (pp. 63–64).
BRIMS Society.
Cipra, B. (2000). The best of the 20th century: Editors name top 10
algorithms. SIAM News, 33, 4.
Dongarra, J., & Sullivan, F. (2000). Top 10 algorithms of the century.
Computing in Science and Engineering, 2, 22–23.
Fasshauer, G. E. (2007). Meshfree approximation methods with Matlab.
Singapore: World Scientific.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R.
(Eds.). (1996). Advances in knowledge discovery and data mining.
Menlo Park, CA: American Association for Artificial Intelligence.
Gluck, K. A., & Pew, R. W. (2001). Lessons learned and future direc-
tions for the AMBR model comparison project. In Proceedings of the
10th Annual Conference on Computer-Generated Forces and Behavior
Representation (pp. 113–121). Orlando, FL: Division of Continuing
Education, University of Central Florida.
Gluck, K., Scheutz, M., Gunzelmann, G., Harris, J., & Kershner, J.
(2007). Combinatorics meets processing power: Large-scale com-
putational resources for BRIMS. In Proceedings of the 16th Confer-
ence on Behavior Representation in Modeling and Simulation (BRIMS)
(pp. 73–83). Orlando, FL: Simulation Interoperability Standards
Organization.
Gonzalez, C., Best, B., Healy, A. F., Kole, J. A., & Bourne, L. E., Jr.
(2011). A cognitive modeling account of simultaneous learning and
fatigue effects. Cognitive Systems Research, 12, 19–32.
Gonzalez, C., Dutt, V., Healy, A. F., Young, M. D., & Bourne, L. E., Jr.
(2009). Comparison of instance and strategy models in ACT-R. In A.
Howes, D. Peebles, & R. Cooper (Eds.), 9th International Conference
on Cognitive Modeling—ICCM2009. Manchester, UK.
Gonzalez, C., & Thomas, R. P. (2008). Effects of automatic detection
on dynamic decision making. Journal of Cognitive Engineering and
Decision Making, 2, 328–348.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques
(2nd ed.). Waltham, MA: Morgan Kaufmann.
Hardy, R. L. (1971). Multiquadric equations of topography and other
irregular surfaces. Journal of Geophysical Research, 76, 1905–1915.
Healy, A. F., Kole, J. A., Buck-Gengler, C. J., & Bourne, L. E., Jr. (2004).
Effects of prolonged work on data entry speed and accuracy. Journal
of Experimental Psychology: Applied, 10, 188–199.
Healy, A. F., Kole, J. A., Wohldmann, E. L., Buck-Gengler, C. J., &
Bourne, L. E., Jr. (2011). Data entry: A window to principles of train-
ing. In A. S. Benjamin (Ed.), Successful remembering and successful
forgetting: A festschrift in honor of Robert A. Bjork (pp. 277–296). New
York: Psychology Press.
Pitt, M. A., & Myung, J. (2002). When a good fit can be bad. Trends in
Cognitive Sciences, 6, 421–425.
Raymond, W. D., Fornberg, B., Buck-Gengler, C. J., Healy, A. F., &
Bourne, L. E., Jr. (2008). Matlab optimization of an IMPRINT model
of human behavior. In Proceedings of the Seventeenth Conference
on Behavior Representation in Modeling and Simulation (pp. 26–34).
Orlando, FL: Simulation Interoperability Standards Organization.
Thuraisingham, B. M. (1998). Data mining: Technologies, techniques,
tools, and trends. Hug, Switzerland: CRC Press/Informa.
Young, M. D., Healy, A. F., Gonzalez, C., Dutt, V., & Bourne, L. E., Jr.
(2011). Effects of training with added difficulties on RADAR detec-
tion. Applied Cognitive Psychology, 25, 395–407.
Young, M. J. (2003). Human performance model validation: One size
does not fit all. SCSC ’03, 732–736.
12 A Compact Mathematical
Model for Predicting
the Effectiveness
of Training
Matt Jones, Lyle E. Bourne, Jr.,
and Alice F. Healy
University of Colorado

In any training context, it is often useful to make predictions
of learners' performance. Learners and training systems can
vary greatly in their history of relevant experience, and likewise
there is great variety in how that experience may be called on
in the future. The present chapter offers a concise mathematical
approach to making quantitative predictions of future perfor-
mance, as a function of the learner’s training history.
The goal of predicting a learner’s performance can be impor-
tant for a variety of reasons. First, prediction is useful in design-
ing training systems, including military and other job training,
tutorial software, and classroom curricula. In all of these cases,
designers can benefit from reliable predictions of how effective
candidate systems will be. One cannot generally expect math-
ematical modeling to completely replace empirical evaluation
with actual test subjects, but it can significantly narrow the
search space so that resources are focused on only the most
promising approaches.
A second, closely related use of predictive models is for fine-
tuning existing training systems. Most realistic training sys-
tems involve large numbers of variables, such as timing and
quantity of training experience on different tasks. Thus, opti-
mization entails search in a high-dimensional parameter space
that cannot be practically accomplished using empirical data.
In contrast, such high-dimensional search can be attainable
with mathematical models—especially simple ones like the
model presented here—because of their ability to rapidly gener-
ate predictions for a large number of candidate scenarios. Thus
one can quickly find the set of training variables for which the
model predicts that learning or test performance will be most
successful.
248 Matt Jones, Lyle E. Bourne, Jr., and Alice F. Healy
Third, predictive modeling can be useful for anticipating how
learners will actually perform in future situations. This sort of
prediction can be especially useful when applied to groups of
people working as a team, such as in a corporation or military
unit. Larger-scale planning can often benefit from estimates
of how productive a team will be or how much time the team
will require to complete a task, as is the goal of the IMPRINT
framework discussed in chapter 10. Mathematical models can
provide numerical answers to these questions, based on infor-
mation about the individuals’ training histories and the task in
question.
Fourth, mathematical modeling can be used in creating more
realistic simulation systems. Computer simulation has become
very important in military applications, including predicting
outcomes of combat scenarios, evaluating equipment and tac-
tics, and providing low-risk and inexpensive training to combat
personnel (Archer et al., 1999). Likewise, the video game indus-
try relies on increasingly faithful simulations. In both cases, the
realism or reliability of the simulation depends strongly on the
accuracy of the behavior of simulated characters. Fully realistic
simulation of human behavior of course is currently out of reach
and is certainly beyond the scope of this work. Still, one can
significantly increase the fidelity of simulated behavior by mak-
ing it nonstationary, dependent on the character’s past experi-
ence. Learning models are useful in this regard, because they
can specify how a character’s abilities and performance should
change over the course of a simulation.
Given this goal of quantitatively predicting learning behavior,
we now consider the elements of such a model. In its simplest
form, a learning model should take as input some description
of the learner’s past experience and should output predictions
of performance in a specified test situation. Characterizing past
experience can of course be highly complicated, but at a mini-
mum it should include what previous relevant tasks or task situ-
ations were encountered, for how long, and when. Likewise, test
behavior can in principle be predicted in great detail, but mini-
mally a model should generate predictions of performance (i.e.,
accuracy or proportion of correct choices), speed (i.e., response
time or task-completion time), and learning curves for each (i.e.,
their dependence on time in the test task).
The goal of the present work is to present a concise mathemat-
ical model that captures general relationships between training
experience and test performance. The model is not specific to
any learning domain, but it has free parameters that allow it
to be tuned to specific applications. The model is not meant to
Predicting the Effectiveness of Training 249
capture details of cognitive processing but to provide quantita-
tive predictions that might be useful to the applications listed
above. Similarly, the model is not intended as a new psychologi-
cal theory, but is instead meant as an encapsulation of previous
well-established theories that can explain or formalize many
of the empirical training principles discussed elsewhere in this
volume (e.g., chapters 2 and 3). Finally, the model is intention-
ally simple, comprising only a few straightforward equations. Its
goal is to encompass significant predictive power and a variety
of training principles within a concise formulation that is widely
applicable and easily integrated into existing theoretical or sim-
ulation frameworks.
The model presented here draws its foundations from two of
the most successful lines of research on mathematical modeling
of learning. The first line concerns skill acquisition and reten-
tion, specifically the interplay between strengthening of knowl-
edge from repeated experience and decay of knowledge with the
passage of time. The second line concerns transfer, specifically
the ability to generalize knowledge from one situation or stimu-
lus to another. Mathematical relationships have been proposed
and empirically supported in both cases, as reviewed below.
Here, we propose a natural integration of the two, yielding a
single master equation describing the strength of knowledge in
any task, as a function of the learner’s history of experience with
it and similar tasks. Two auxiliary equations use this strength
of knowledge to predict accuracy and task-completion time. Free
parameters in the model enable a variety of other training prin-
ciples to be expressed, via assumptions about how their values
depend on various task characteristics.
The remainder of this chapter is structured as follows. First,
we review past research and theory on skill acquisition and
retention and on transfer or generalization. We then show how
mathematical models in these two domains can be naturally
integrated, leading to the model proposed here. The equations
of the model are presented and the psychological interpretations
of its parameters are discussed. We then explain how the model
can capture many of the training principles presented in other
chapters of this volume. We conclude by noting some limitations
of the model, possible extensions, and implications for basic
cognitive theory.

A Mathematical Model of Learning


The model presented here is a simple set of equations meant to
provide useful predictions in a wide range of training situations.
The theoretical starting point is a decomposition of learning
into three component processes: acquisition, retention, and
transfer (see chapter 1). Acquisition concerns initial storage of
knowledge at training, retention concerns how that knowledge
is maintained or decays over time, and transfer involves apply-
ing knowledge to situations different from those in which it was
acquired. Previous research on learning and memory has pro-
duced a simple mathematical law encapsulating acquisition and
retention and another capturing certain types of transfer, both
discussed in detail below. The present model integrates these
two laws, yielding a concise yet powerful theory that captures a
wide array of training principles.

Acquisition and Retention


The time course of memory decay has been a focus of research
at least since Ebbinghaus (1885/1913). The mathematical
relationship between the strength of an individual memory and
time since its formation has been a matter of debate, but there
is good support that this relationship follows a power function
(Rubin & Wenzel, 1996; Wickelgren, 1972; Wixted & Carpenter,
2007). This power law of forgetting can be described by the
following formula, where a is memory strength (or activation)
after time t, γ > 0 is a parameter determining the decay rate, and
β is a scaling parameter determining overall memory strength.1

a = βt^(–γ) (1)

Complementary to the question of retention is that of acquisition.
Acquisition concerns how strength of knowledge increases
with repeated experience. Newell and Rosenbloom (1981) have
argued that acquisition also follows a power law. This power law
of practice can be described by the formula a = bn^c, where n is
the number of training trials, c ϵ (0, 1) is a parameter determin-
ing the learning rate, and b is a scaling parameter.
The power laws of practice and forgetting are naturally com-
patible. Under the assumption that memory strength for a
repeated fact or skill is the sum of trace strengths contributed
by each separate experience, then the power law of practice is a
consequence of the power law of forgetting (Anderson & Lebiere,
1998). Specifically, assume that activation can be expressed as

a = Σ_i βt_i^(–γ) (2)
where i indexes all past learning experiences, and t_i is the time
elapsed since experience i. When there are n equally spaced
experiences, separated by gaps of length τ, then it can easily be
shown (by replacing the sum with an integral) that Equation 2
is approximately equal to βτ^(–γ)/(1 – γ) ∙ n^(1–γ). Therefore the power law
of practice is obtained under appropriate substitution for b and
c, provided –γ ϵ (–1, 0). Figure 12.1 displays the evolution of activa-
tion under equally spaced practice, illustrating how the power
law of practice emerges from the power law of forgetting.
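This reduction is easy to check numerically. The sketch below follows the symbols of Equation 2 (β for the scaling parameter, γ for decay, with the gap length written as τ) and assumes the illustrative values β = 1, γ = 0.5, τ = 1; the exact trace sum tracks the closed-form power law ever more closely as n grows.

```python
import numpy as np

beta, gamma, tau = 1.0, 0.5, 1.0

def activation(n):
    # Equation 2: sum of power-law trace strengths for n equally spaced
    # experiences whose ages since occurrence are tau, 2*tau, ..., n*tau.
    ages = tau * np.arange(1, n + 1)
    return beta * np.sum(ages ** -gamma)

def practice_law(n):
    # Integral approximation: beta * tau^-gamma / (1 - gamma) * n^(1 - gamma)
    return beta * tau ** -gamma / (1.0 - gamma) * n ** (1.0 - gamma)

for n in (10, 100, 1000):
    print(n, round(activation(n), 2), round(practice_law(n), 2))
```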

Equation 2 offers a complete characterization of the dynamics
of knowledge activation for any single fact or skill, under
an arbitrary pattern of practice. It has been used with great
success as an activation function in the ACT-R cognitive archi-
tecture (Anderson et al., 2004; Anderson & Lebiere, 1998) and
many models based on that architecture (e.g., Petrov & Ander-
son, 2005; also see the instance-based ACT-R models described
in chapter 9). This activation function contributes half of the
central equation of the model proposed here.

[Axes: Activation (0–3.5) vs. Time (0–300).]

Figure 12.1 Emergence of power law of practice from power law of for-
getting. Solid curve is activation predicted from Equation
1, assuming regularly spaced training events, each trace
decaying according to a power law. Dashed line is fit of
power law of practice.
Transfer
Transfer is an important consideration in any learning con-
text, because test situations are generally different from those
in which knowledge was acquired. The general question of how
experience in one task or context affects performance in another
is more complex than any existing psychological theory can
address. However, in simpler cases, where issues of task struc-
ture can be ignored and the relationship between training and
transfer situations can be reduced to some notion of psychologi-
cal distance, modeling of how knowledge is generalized from one
situation to another can be straightforward.
The most successful example of this approach is in research
on stimulus generalization, which investigates how knowledge
associated with one stimulus (e.g., an appropriate response) is
extended to another, novel stimulus. Extensive empirical evi-
dence and normative Bayesian modeling support the proposal
that stimulus generalization is an exponentially decaying func-
tion of perceived similarity between stimuli (Shepard, 1987).
This universal law of generalization can be formalized as

a = e−d (3)

where  is a scaling parameter and a is activation (in this case,


strength of generalization) as before, d is psychological distance
between training and transfer stimuli, and  is a parameter
determining how narrowly knowledge is generalized. A large
value of  produces generalization only when similarity is quite
high; a low value produces generalization in a wider range of
situations (Figure 12.2).
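The shape of this gradient is easy to verify numerically. The sketch below, with arbitrary illustrative parameter values, evaluates Equation 3 at the same dissimilarity under a low and a high λ, reproducing the contrast shown in Figure 12.2.

```python
import math

# Equation 3: a = beta * exp(-lambda * d). Parameter values below are
# arbitrary, chosen only to illustrate the gradient's shape.
def generalization(d, beta=1.0, lam=1.0):
    return beta * math.exp(-lam * d)

d = 2.0  # a fixed dissimilarity between training and transfer stimuli
broad = generalization(d, lam=0.5)   # low lambda: broader transfer
narrow = generalization(d, lam=2.0)  # high lambda: narrower transfer
assert broad > narrow  # the same dissimilarity costs less under low lambda
```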
Key to this approach to modeling generalization is the notion
of psychological distance, which is negatively related to similar-
ity and depends on the assumption that stimulus representa-
tions can be modeled as points in a multidimensional Cartesian
space. The spatial model capturing distances between stimuli
can be derived from any of several multidimensional scal-
ing approaches (e.g., Kruskal, 1964; Shepard, 1962), starting
from pairwise confusion or similarity-rating data. The spa-
tial approach is known to be invalid for more complex stimuli
(e.g., Tversky & Gati, 1982), but several remedies exist, includ-
ing incorporating selective attention among stimulus dimen-
sions (Nosofsky, 1992), or replacing distance with alignability
of internal relational structure (Markman & Gentner, 1993) or
with transformation complexity (Chater & Vitányi, 2003). In
the remainder of this chapter, we interpret d as psychological

Figure 12.2 Illustration of generalization component of the model (Equation 3). Both curves show strength of generalization as a function of dissimilarity between training and testing tasks. Solid: Lower value of λ, leading to broader transfer. Dashed: Higher value of λ, leading to narrower transfer.

dissimilarity between training and testing situations, with the assumption that it is appropriately defined in accordance with the type of tasks or stimuli in question.

An Integrated Model of Acquisition, Retention, and Transfer
The models of retention (Eq. 1) and transfer (Eq. 3) discussed
above respectively address how the effectiveness of training
declines with delay or dissimilarity between training and test-
ing events. In practice, both effects are always relevant, because
testing is never simultaneous with training, nor is the testing
task ever perfectly identical to the training task. Therefore, we
seek an integrated characterization of how delay and dissimilar-
ity combine to determine the contribution of a training experi-
ence to test performance.
The retention and generalization equations (Eqs. 1 and 3,
respectively) are conceptually quite similar, both addressing
how differences between training and testing diminish the con-
tributions of past learning to current performance. Equation 1
deals with differences in temporal context, and Equation 3 deals
with all other differences (e.g., stimulus or response variables,
task structure, task context). Therefore, a natural proposal is
that delay and dissimilarity combine multiplicatively. The reten-
tion equation states that trace strength of any training event is
diminished by a factor of t^(−γ) from acquisition to test. The generalization equation states that the contribution of any training event to test performance equals a proportion e^(−λd) of what its contribution would be if training and testing situations were identical. Therefore, taking both delay and dissimilarity effects into account, the contribution of any past training experience should be equal to

a = t^(−γ) e^(−λd)  (4)

The final question in specifying the model is how the contributions of multiple training experiences combine. The model of
acquisition discussed above (Eq. 2) assumes additive contribu-
tions, an assumption that leads the model to correctly predict
the power law of practice (Figure 12.1). Additivity of past expe-
riences is also a common assumption in exemplar models of
concept learning (Nosofsky, 1986) that are built on the general-
ization model of Equation 3. Applying the same assumption to
Equation 4 yields the final form of the proposed model.

a= ∑βiti–e–d i (5)
i

Equation 5 states that knowledge activation at test equals the summed contribution of all relevant training experiences,
indexed by i. Each such contribution divides into three compo-
nents that correspond to the processes of acquisition, retention,
and transfer. Acquisition, as defined by the initial trace strength
of each training experience, is determined by β_i, a parameter that may vary according to characteristics of the training event (discussed below). Retention is determined by the time, t_i, elapsed since each training experience. The term t_i^(−γ) models the reduction in trace strength of that experience from acquisition to the time of test. Transfer is determined by the distance or dissimilarity, d_i, between each training event and the test event. The term e^(−λd_i) models the reduction in contribution of each training experience due to differences from the test situation.
The model in Equation 5 allows prediction of knowledge
strength at test following an arbitrary training schedule, con-
sisting of experiences of varying similarity to the test task that

Figure 12.3 Example dynamics of model, for a unidimensional task space (arbitrary units). Activation surface shows strength of knowledge for any task at any time, as predicted from Equation 5. Peaks correspond to training events, at various times and on various tasks.

occurred at various times in the past. Figure 12.3 illustrates the resulting dynamics of knowledge strength over time, in the
simple case of a unidimensional task space (e.g., size or bright-
ness of a stimulus). The height of the surface indicates knowl-
edge strength indexed by task and time. Each peak corresponds
to a training event. Acquisition can be seen as the sudden jump
in knowledge at the time of training, retention is the subsequent
decay across time, and transfer is the spread of the learning
benefit to similar tasks. Knowledge and skill build up over the
course of learning (a second sense of acquisition), in a detailed
manner depending on the timing and identity of the training
tasks. The model makes predictions of knowledge strength for
all possible test tasks, at any time in the future.
The activation formula (Eq. 5) alone is insufficient to make
behavioral predictions, because knowledge strength is an unob-
servable psychological construct. Therefore, we offer two addi-
tional formulas, for using the output of Equation 5 to predict
accuracy and task-completion time. For accuracy, the model fol-
lows the assumption of ACT-R that activation (a) represents the
log odds of retrieving the correct response (Anderson & Schooler,
1991). Within the present framework, this assumption leads to
the following equation for accuracy.
p(correct) = 1 / (1 + (N − 1) e^(−a))  (6a)
The variable N represents the number of possible response
options. Equation 6a assumes that only the correct response
has been reinforced in training. When the learner has been
trained with multiple responses (e.g., in different training tasks),
the accuracy formula can be replaced with Equation 6b, which
is a variant of the classic Luce choice rule (Luce, 1963; Shepard,
1957).
p(R_j) = e^(a_j) / Σ_k e^(a_k)  (6b)

In this formula, R_j represents the jth response, a_j represents the activation of that response, and k indexes all possible responses. This formulation allows the model to capture negative transfer effects, when previous training has reinforced a response other than the one that is correct at test. In addition, if only the correct response (R_j) has been trained, then a_k = 0 for all other responses, and Equation 6b reduces to Equation 6a.
For task-completion time (or response time, RT), the model
predicts
RT = A + B / a  (7)
where A is asymptotic (i.e., minimal) time for highly trained
experts, and B is the amount of a novice’s completion time that
can be reduced by practice (Anderson, Fincham, & Douglass,
1999).
Equations 5 to 7, taken together, enable predictions of accu-
racy and response time following arbitrary training schedules.
To generate these predictions, one needs estimates of the model
parameters , , , and d. Table 12.1 summarizes the interpreta-
tions of these parameters, factors they are proposed to depend
on, and training principles they can help explain and implement.
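Taken together, Equations 5 to 7 form a small computational pipeline from training history to behavioral predictions. The following sketch is illustrative only: the parameter values (λ, β_i, γ_i, A, B) and the toy training schedule are arbitrary assumptions, not estimates from data.

```python
import math

LAM = 1.0  # specificity of generalization (lambda); illustrative value

def activation(events, test_time):
    # Equation 5: summed contributions of past training events. Each event
    # is (time, d, beta, gamma); uses the t + 1 convention of Note 1.
    a = 0.0
    for t_i, d_i, beta_i, gamma_i in events:
        elapsed = test_time - t_i
        a += beta_i * (elapsed + 1) ** (-gamma_i) * math.exp(-LAM * d_i)
    return a

def p_correct(a, n_options):
    # Equation 6a: activation is the log odds of retrieving the correct
    # response among n_options alternatives.
    return 1.0 / (1.0 + (n_options - 1) * math.exp(-a))

def response_time(a, A=0.3, B=1.0):
    # Equation 7: asymptotic expert time A plus practice-reducible part B/a.
    return A + B / a

# Toy schedule: three identical-task events (d = 0) at times 0, 10, 20.
events = [(0, 0.0, 1.0, 0.3), (10, 0.0, 1.0, 0.3), (20, 0.0, 1.0, 0.3)]
a = activation(events, test_time=30)
print(p_correct(a, n_options=4), response_time(a))
```

With domain-specific parameter estimates substituted in, the same three functions would allow a training scheduler to compare predicted accuracy and completion time across candidate schedules.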

Application to Training Principles


The model presented above captures or accommodates many of
the training principles described elsewhere in this volume (chap-
ters 2 and 3). Some of these principles derive directly from the
Table 12.1 Model Parameters

Parameter | Interpretation | Principles
β | Acquisition—initial trace strength | Deliberate practice
γ | Retention—decay rate | Power law of forgetting; Power law of practice; Depth of processing; Testing effect; Instance- vs. rule-based training; Spacing of practice; Procedural vs. declarative training
λ | Transfer—specificity of generalization | Generalization depends on similarity; Procedural reinstatement
d | Dissimilarity between training and test situations | Attention effects on transfer

core equations (Eqs. 5–7), whereas others can be implemented via assumptions regarding how model parameters depend on
certain task factors. This section describes a number of training
principles and how they emerge from the model.
It should be noted that these training principles, as cap-
tured in the mathematical model proposed here and expressed
in equations, are candidates for inclusion in the military’s pro-
gramming platform known as IMPRINT (see chapter 10). As
noted above, IMPRINT is used in the military to make deci-
sions and plans about the deployment of personnel and maté-
riel to address issues or problems that might arise in military
operations. These training principles could be incorporated
into IMPRINT as performance shaping functions, thus expand-
ing the role of IMPRINT in the domain of training (see Bourne,
Raymond, & Healy, 2010a). The inclusion of training principles
in IMPRINT requires estimation of model parameters based on
real data. These data are available at least for some of the prin-
ciples. To date, these principles have not yet been added to the
IMPRINT platform.
Power Law of Practice
As discussed above, task completion time generally decreases
as a power law of the number of training encounters (Newell &
Rosenbloom, 1981; see also chapter 1). This principle emerges in
Equation 5 of the model from the retention component together
with the summation of trace strengths over training encounters,
as explained above (see Figure 12.1 and also Anderson & Leb-
iere, 1998). The validity of this principle has been established
in a variety of learning contexts including those involving the
learning of compatible and incompatible stimulus-response
associations (see chapter 5).

Power Law of Forgetting


As discussed above, task performance generally decreases as
a power function of the time since training (Wickelgren, 1972;
see also chapter 1). This principle arises from the model as a
direct consequence of the retention component of Equation 5
(ti –). The rate of forgetting in this power law is determined by the
decay parameter, . The empirical validity of this power law was
first established by Ebbinghaus (1885/1913) using a measure of
savings at relearning, and this principle has been consistently
replicated in more recent studies with a variety of performance
measures.

Deliberate Practice
Highly focused and highly motivated practice is best in terms
of promoting skill acquisition and expertise. This principle can
be incorporated into the model by assuming that acquisition
strength, , is greater for training experiences associated with
more focused or deliberate practice. The most definitive research
on this principle has been conducted by Ericsson and colleagues
(e.g., Ericsson, Krampe, & Tesch-Römer, 1993), who argued that
deliberate practice is a requirement for optimizing the level of
skill or becoming an expert at a given task.

Depth of Processing
Training that promotes deep and elaborate processing enhances
durability of knowledge (Craik & Lockhart, 1972). Deep process-
ing usually involves semantic relationships between the mate-
rial to be learned and the learner’s existing knowledge, whereas
shallow processing draws merely on the surface characteristics
of the material, such as phonological and graphemic features.
This principle is incorporated into the model on the assumption
that the major influence of depth of processing is to mitigate forgetting. Thus, the decay parameter, γ, is reduced by enhancing processing depth.

Testing Effect
Testing after initial presentation of material slows forgetting
(Carpenter, Pashler, Wixted, & Vul, 2008). This principle is
predicted as a corollary to the principle of depth of processing,
under the assumption that testing leads people to process infor-
mation more deeply. That is, testing events during the training
phase can be treated as additional training events with smaller
values of the decay parameter, γ.

Generalization Depends on Similarity


The gain in performance on one task as a consequence of train-
ing on a different task is an exponentially decaying function of
the dissimilarity between the two tasks (Shepard, 1987). This is
a direct consequence of the generalization component of Equa-
tion 5. The specificity of transfer, meaning how narrowly (vs.
broadly) knowledge and skills can be generalized, is determined
by the λ parameter.

Procedural vs. Declarative Training


Learning of procedural and of declarative information differs
in two respects. First, training on procedural tasks is more
durable than training on declarative tasks (Healy, Wohldmann,
& Bourne, 2005). The model accommodates this principle by
assuming that declarative information is associated with a
greater decay rate (declarative > procedural). Second, declarative train-
ing leads to better generalization (Healy, 2007), in accordance
with the procedural reinstatement principle. The model accom-
modates this principle by assuming that procedural information
is associated with greater specificity of generalization (λ_procedural > λ_declarative). For both types of information, performance at test
is improved by training that matches the test conditions as
closely as possible (Healy et al., 1992; Kole, Healy, Fierman, &
Bourne, 2010). This principle is predicted by the generalization
(e-d) component of Equation 5. When dissimilarity (d) between
training and testing conditions is low, the generalization term
will be close to 1, causing little loss in knowledge strength. The
greater value of the specificity-of-generalization parameter (λ)
for procedural information implies that procedural learning is
more sensitive to similarity between training and testing.

Instance- vs. Rule-Based Training


Training based on abstract rules versus specific instances
has differential consequences for resultant knowledge. First,
instance-based strategies lead to more efficient performance in
simple tasks, whereas rule-based strategies are optimal in more
complex tasks (Bourne, Healy, Kole, & Graham, 2006; Bourne,
Healy, Parker, & Rickard, 1999; Bourne, Raymond, & Healy,
2010b). This principle can be incorporated into the model by
assuming that  is a decreasing function of task complexity, and
that this dependence is stronger for instance- than rule-based
training. Second, rules tend to be more durably represented
in memory than are instances. The model can accommodate
this principle by assuming different decay rates, γ_instance > γ_rule.
Because rules are a form of declarative knowledge, combin-
ing this assumption with the above assumption regarding
procedural versus declarative knowledge leads to the ordering
instance > rule > procedural. The distinctions between instance- and
rule-based learning are discussed in greater detail in chapters
9 and 13.

Spacing of Practice
Knowledge is retained for longer periods when training sessions
are spaced in time than when they are massed (Cepeda, Pashler,
Vul, Wixted, & Rohrer, 2006; Cepeda, Vul, Rohrer, Wixted, &
Pashler, 2008; see also chapter 13). This principle is predicted
by the model under the assumption that the decay rate of any
training experience depends on the time since the preceding
encounter (Anderson & Schooler, 1991).

γ_i = max{0, γ_0 − b(t_i − t_(i−1))}  (8)

Here, 0 is a maximal decay rate, ti – ti-1 is the interval since


the previous training event, and b is a scaling parameter. The
consequence of Equation 8 is that, if a training event occurs
after a longer gap, its memory trace will decay more slowly, lead-
ing to better long-term retention.
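Reading Equation 8 as a linear, gap-dependent adjustment to the decay rate, a brief sketch can verify that it produces a spacing advantage at a long retention interval. The parameter values and schedules below are arbitrary illustrative choices, and d = 0 is assumed (identical training and test tasks).

```python
# Sketch of the spacing principle via Equation 8: the decay rate of a
# trace decreases with the gap since the previous training event.
# GAMMA_0 (maximal decay rate) and B_SCALE are illustrative assumptions.
GAMMA_0, B_SCALE = 0.5, 0.02

def decay_rate(gap):
    # Equation 8: gamma_i = max(0, gamma_0 - b * gap).
    return max(0.0, GAMMA_0 - B_SCALE * gap)

def retention(schedule, test_time, beta=1.0):
    # Equation 5 with event-specific decay rates and d = 0; uses the
    # t + 1 convention of Note 1.
    a, prev = 0.0, None
    for t in schedule:
        gap = t - prev if prev is not None else 0.0
        a += beta * (test_time - t + 1) ** (-decay_rate(gap))
        prev = t
    return a

massed = retention([0, 1, 2], test_time=100)
spaced = retention([0, 10, 20], test_time=100)
assert spaced > massed  # spaced practice is retained better at a long delay
```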
Attention Effects on Transfer
Knowledge is generalized more strongly between two situa-
tions when the psychological dimensions on which they differ
are unattended (Jones, Maddox, & Love, 2005; Nosofsky, 1986;
Sutherland & Mackintosh, 1971). This principle is consistent
with research on similarity that shows it is more dependent on
attended than unattended dimensions (Medin, Goldstone, &
Gentner, 1993; Tversky, 1977; see also chapter 4). The effects of
attention on generalization can be incorporated in the model by
defining the dissimilarity, d, between a training experience and
the test situation as an attention-weighted sum of their differ-
ences on individual psychological dimensions (Nosofsky, 1986).

d = ∑│traink – testk│ (9)


k

Here, k indexes the psychological dimensions on which train-


ing and testing scenarios might differ, and traink and testk denote
the corresponding values on those dimensions (which can be
coded as 0 and 1 for binary features). This equation applies to
all training events, and the index i from Equation 5 has been
suppressed here. The consequence of this principle is that, if
trainees vary in their focus of attention (due to instructions,
learning, or predisposition), the model will predict differential
patterns of transfer from different training tasks. Optimal per-
formance will arise if attention is low to irrelevant task dimen-
sions (thus facilitating positive transfer) and high to dimensions
that signal important differences between tasks (thus reduc-
ing negative transfer). This conception of attention in learning
relates to cognitive load theory, as discussed in chapter 4, in
that extraneous load draws attention to irrelevant or unneces-
sary information, impairing learners’ ability to generalize across
learning and testing experiences. Conversely, germane load
involves attention to task-relevant information, allowing learn-
ers to acquire important discriminations in how their behavior
should depend on task context.
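A minimal sketch of Equation 9, using hypothetical attention weights w_k and binary feature codings, shows how withdrawing attention from a differing dimension increases transfer through the e^(−λd) term of Equation 5.

```python
import math

def dissimilarity(train, test, weights):
    # Equation 9: attention-weighted sum of per-dimension differences.
    return sum(w * abs(a - b) for w, a, b in zip(weights, train, test))

def transfer_strength(d, lam=1.0):
    # Generalization term of Equation 5: e**(-lambda * d); lam is illustrative.
    return math.exp(-lam * d)

# Two binary-featured tasks differing only on dimension 1.
train_task, test_task = (1, 0, 1), (1, 1, 1)
attended = dissimilarity(train_task, test_task, weights=(1.0, 1.0, 1.0))
ignored = dissimilarity(train_task, test_task, weights=(1.0, 0.0, 1.0))
# Withdrawing attention from the differing dimension increases transfer.
assert transfer_strength(ignored) > transfer_strength(attended)
```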

Discussion
Development, implementation, and simulation of training sys-
tems can all benefit from quantitative predictions of performance.
The mathematical model proposed in this chapter offers one way
to generate such predictions, in a way that allows comparison
among arbitrary training schedules consisting of varying tasks
and contexts. It is founded on basic cognitive theory and has
potential utility in a variety of practical applications.
The model is formulated to be concise and general yet also
to capture a wide range of empirical principles related to the
effectiveness of training. The core activation equation is built
on the delineation of three fundamental cognitive processes of
learning: acquisition, retention, and transfer. Different model
parameters govern the characteristics of each process, thus
determining the initial strength of knowledge from each train-
ing experience, how that knowledge decays over time, and how
it can be applied to situations other than the one in which it was
acquired. The collective contribution of all past training events
determines the model’s prediction for performance in any future
testing situation.
Some work is still needed for the model to be implemented.
First, values of the model parameters must be determined.
These values will in general be domain-specific, so this work
would mostly need to be carried out empirically, for each
domain of interest. There are also interesting questions of vari-
ance and covariance of parameters across subjects, which could
be worked out if one desired to model individual differences.
Second, several qualitative parameter dependencies proposed
in the previous section would need to be formalized for model-
ing certain training principles. For example, it is proposed that
decay is slower for procedural than declarative knowledge and
for abstract rules than specific instances, but the magnitudes
of these differences (as well as possible domain dependencies)
are yet to be worked out. Third, the equations of the model need
to be implemented as computer code or integrated into existing
software. This effort should be fairly straightforward, given the
relative simplicity of the model.
Although the proposed model is meant primarily as a tool
for applying cognitive theory, it does raise interesting theoreti-
cal questions. One question concerns the relationship between
retention and generalization. Both of these phenomena concern
how the effect of learning diminishes as training and testing
contexts become more distant—either in time or in task char-
acteristics. The mathematical forms used to model the two pro-
cesses are compatible, but they are not identical, with retention
decaying according to a power function and transfer decaying
exponentially. The debate over power law versus exponential
memory decay has favored the power law (Wixted & Carpenter,
2007). However, we know of no studies in the generalization lit-
erature that test between exponential and power functions, and
we assume exponential generalization in the present exposition
only for consistency with prior modeling efforts (Shepard, 1987).
Therefore, an intriguing topic of future research would be to
evaluate a power law for generalization. If a power law were
empirically supported over the exponential, such a result would
suggest a more mathematically elegant—and perhaps theoreti-
cally deep—connection between retention and generalization.
Another question concerns how multiple prior experiences
combine to determine knowledge strength. Although exemplar
models have been highly successful (Hintzman, 1986; Logan,
1988; Nosofsky, 1992), there is good evidence from concept
learning that people learn more about a category than the union
of its elements (Sakamoto, Jones, & Love, 2008; Tenenbaum &
Griffiths, 2001). The spacing effect in memory is a temporal
analog of this phenomenon. Although Equation 8 above offers
one explanation of the spacing effect that maintains additiv-
ity of memory traces, the analogy to category learning (derived
from the connection between retention and generalization at the
core of the present proposal) suggests that richer forms of learn-
ing may be needed in memory modeling as well. One fruitful
approach may be to augment the current instance-based model
with rule-based abstraction of the sort discussed in chapters 9
and 13.
The above considerations suggest ways in which the model
might be refined or extended, and they might limit its predic-
tive power in its current form. Other limitations arise from
the simplifying view of transfer among tasks, specifically that
strength of transfer can be reduced to a notion of similarity.
A more complete model would need to address the internal
structure of tasks and the applicability of knowledge structures
acquired from one task in performing another. Such an exten-
sion would in turn entail consideration of the sort of construc-
tive learning (as opposed to mere accumulation of knowledge)
that presumably occurs as learners piece together experiences
to develop deeper representations and understandings of task
environments. Nevertheless, we suggest that the simplicity of
the present model makes it well suited for implementation in a
wide variety of applications, and that it can offer effective pre-
dictions of how people will perform following different histories
of training.

Note
1. An undesirable mathematical property of decreasing power laws
(i.e., with negative exponents) is that they predict infinite value
at t = 0. A useful convention for solving this problem is to start
time at 1, or equivalently to replace t with t+1. This approach has
the added benefit that β equals the initial memory strength at the
time of encoding.

References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C.,
& Qin, Y. (2004). An integrated theory of the mind. Psychological
Review, 111, 1036–1060.
Anderson, J. R., Fincham, J. S., & Douglass, S. (1999). Practice and
retention: A unifying analysis. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 25, 1120–1136.
Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought.
Mahwah, NJ: Erlbaum.
Anderson, J. R., & Schooler, L. (1991). Reflections of the environment in
memory. Psychological Science, 2, 396–408.
Archer, R., Walters, B., Yow, A., Carolan, T., Laughery, K. R., & Gillis, P.
(1999). Training as a performance shaping factor in computer gen-
erated forces. Proceedings of the 1999 Computer Generated Forces
Conferences, Orlando, FL.
Bourne, L. E., Jr., Healy, A. F., Kole, J. A., & Graham, S. M. (2006).
Strategy shifts in classification skill acquisition: Does memory
retrieval dominate rule use? Memory & Cognition, 34, 903–913.
Bourne, L. E., Jr., Healy, A. F., Parker, J. T., & Rickard, T. C. (1999).
The strategic basis of performance in binary classification tasks:
Strategy choices and strategy transitions. Journal of Memory and
Language, 41, 223–252.
Bourne, L. E., Jr., Raymond, W. D., & Healy, A. F. (2010a). Quantify-
ing performance effects of training manipulations: Performance shap-
ing functions based on selected training principles (Technical report:
CRT Publications). Boulder: University of Colorado.
Bourne, L. E., Jr., Raymond, W. D., & Healy, A. F. (2010b). Strategy
selection and use during classification skill acquisition. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 36,
500–514.
Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The
effects of tests on learning and forgetting. Memory & Cognition, 36,
438–448.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006).
Distributed practice in verbal recall tasks: A review and quantitative
synthesis. Psychological Bulletin, 132, 354–380.
Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008).
Spacing effects in learning: A temporal ridgeline of optimal reten-
tion. Psychological Science, 19, 1095–1102.
Chater, N., & Vitányi, P. (2003). The generalized universal law of generalization. Journal of Mathematical Psychology, 47, 346–369.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A frame-
work for memory research. Journal of Verbal Learning and Verbal
Behavior, 11, 671–684.
Ebbinghaus, H. (1913). Memory: A contribution to experimental psy-
chology. New York: Teachers College, Columbia University. (Original
work published 1885)
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of
deliberate practice in the acquisition of expert performance. Psycho-
logical Review, 100, 363–406.
Healy, A. F. (2007). Transfer: Specificity and generality. In H. L. Roedi-
ger, III, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Con-
cepts (pp. 271–275). New York: Oxford University Press.
Healy, A. F., Fendrich, D. W., Crutcher, R. J., Wittman, W. T., Gesi, A.
T., Ericsson, K. A., & Bourne, L. E., Jr. (1992). The long-term reten-
tion of skills. In A. F. Healy, S. M. Kosslyn, & R. M. Shiffrin (Eds.),
From learning processes to cognitive processes: Essays in honor of
William K. Estes (Vol. 2, pp. 87–118). Hillsdale, NJ: Erlbaum.
Healy, A. F., Wohldmann, E. L., & Bourne, L. E., Jr. (2005). The pro-
cedural reinstatement principle: Studies on training, retention, and
transfer. In A. F. Healy (Ed.), Experimental cognitive psychology and
its applications (pp. 59–71). Washington, DC: American Psychologi-
cal Association.
Hintzman, D. L. (1986). “Schema abstraction” in a multiple-trace mem-
ory model. Psychological Review, 93, 328–338.
Jones, M., Maddox, W. T., & Love, B. C. (2005). Stimulus generalization
in category learning. In Proceedings of the 27th Annual Meeting of
the Cognitive Science Society (pp. 1066–1071).
Kole, J. A., Healy, A. F., Fierman, D. M., & Bourne, L. E., Jr. (2010).
Contextual memory and skill transfer in category search. Memory &
Cognition, 38, 67–82.
Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numeri-
cal method. Psychometrika, 29, 115–129.
Logan, G. D. (1988). Toward an instance theory of automatization. Psy-
chological Review, 95, 492–527.
Luce, R. D. (1963). Detection and recognition. In R. D. Luce, R. R.
Bush, & E. Galanter (Eds.), Handbook of Mathematical Psychology
(pp. 103–189). New York: Wiley.
Markman, A. B., & Gentner, D. (1993). Structural alignment during
similarity comparisons. Cognitive Psychology, 25, 431–467.
Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for simi-
larity. Psychological Review, 100, 254–278.
Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition
and the power law of practice. In J. R. Anderson (Ed.), Cognitive
skills and their acquisition (pp. 1–55). Hillsdale, NJ: Erlbaum.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-cat-
egorization relationship. Journal of Experimental Psychology: Gen-
eral, 115, 39–57.
Nosofsky, R. M. (1992). Similarity scaling and cognitive process mod-
els. Annual Review of Psychology, 43, 25–53.
Petrov, A. A., & Anderson, J. R. (2005). The dynamics of scaling: A
memory-based anchor model of category rating and absolute identi-
fication. Psychological Review, 112, 383–416.
Rubin, D. C., & Wenzel, A. E. (1996). One hundred years of forgetting:
A quantitative description of retention. Psychological Review, 103,
734–760.
Sakamoto, Y., Jones, M., & Love, B. C. (2008). Putting the psychology
back into psychological models: Mechanistic vs. rational approaches.
Memory & Cognition, 36, 1057–1065.
Shepard, R. N. (1957). Stimulus and response generalization: Deduc-
tion of the generalization gradient from a trace model. Psychological
Review, 65, 242–256.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional
scaling with an unknown distance function. Psychometrika, 27,
125–140.
Shepard, R. N. (1987). Towards a universal law of generalization for
psychological science. Science, 237, 1317–1323.
Sutherland, N., & Mackintosh, N. (1971). Mechanisms of animal dis-
crimination learning. New York: Academic Press.
Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similar-
ity and Bayesian inference. Behavioral and Brain Sciences, 24,
629–640.
Tversky, A. (1977). Features of similarity. Psychological Review, 84,
327–352.
Tversky, A., & Gati, I. (1982). Similarity, separability, and the triangle
inequality. Psychological Review, 89, 123–154.
Wickelgren, W. A. (1972). Trace resistance and the decay of long-term
memory. Journal of Mathematical Psychology, 9, 418–455.
Wixted, J. T., & Carpenter, S. K. (2007). The Wickelgren power law
and the Ebbinghaus savings function. Psychological Science, 18,
133–134.
13 Put the SPRINT
in Knowledge Training
Training with SPacing,
Retrieval, and INTerleaving
Mark McDaniel
Washington University in St. Louis

The general focus of this chapter is the challenge of training knowledge for particular jobs or for particular tasks. Clearly, this topic cannot be treated comprehensively in a single chapter. Rather, the chapter highlights several parameters incorporated into a number of knowledge training environments that are arguably nonoptimal, suggests modifications based on well-established principles in the basic memory literature (principles mentioned throughout this volume; see chapters 2 and 3), and describes recent research that demonstrates the effectiveness of the suggested modifications.

Knowledge Training: Illustrative Cases


Consider three jobs that require extensive knowledge training: high school biology teacher, business executive, and medical doctor. All three enter their professions following a general education along with some specific training in the content area. Often, however, there is a need for additional knowledge
training to enhance job performance (indeed for the high school
teacher this can include the requirement to obtain advanced
degrees or additional knowledge training as professional devel-
opment; for the medical doctor this includes continuing medical
training; Moulton et al., 2006). Such knowledge training is fre-
quently provided as intense short courses, which, for example,
may involve 5 days during the teacher’s summer break, or sev-
eral days over a weekend for the doctor (Moulton et al., 2006)
and business executive. After the short course, the trainees
are expected to have gained the target knowledge to improve
their job performance (implementing better business practices,
or providing more knowledgeable instruction). Yet, a potential
268 Mark McDaniel
obstacle to effective integration of the new knowledge into the
job is that the trainees may not retain the target knowledge.
For instance, in a 5-day summer institute for high school biol-
ogy teachers, there could be a lecture on the principles of neu-
roscience and an associated lab exercise that could illustrate
possible activities for students. The idea is that training in this
knowledge will allow the teachers to provide more comprehen-
sive and accurate instruction to their high school students. The
catch is that this topic may not be included in the high school
curriculum until some months after the summer institute, and
some teachers report that by then they have forgotten nuanced
details that they need to enrich their standard lessons. Con-
sequently, the lessons remain as they were before the teacher
attended the institute for knowledge training (sometimes cre-
dentialed by the awarding of an advanced degree).
The above scenario is not an isolated one. In some medical
schools, students spend an intensive several week rotation in a
particular medical specialty, such as internal medicine. Students
completing the rotation are then expected to have the knowledge
necessary to skillfully consider straightforward pathologies they
may subsequently encounter. For example, after training, students may need to recall the workup of anemia. Also, a series of didactic conferences is mandated in resident training curricula (Accreditation Council for Graduate Medical Education,
2007; see as well Moulton et al., 2006, for 1-day or a weekend
continuing medical education programs; other examples of this
kind of knowledge training include summer boot camps in bio-
medical techniques; Rohrer & Pashler, 2010). Instructors com-
plain, however, that they are doubtful that the medical residents
have retained the core knowledge targeted in the rotation by
the following summer. Indeed, studies have documented that
residents who attend didactic conferences perform no better
on long-term assessment measures than residents who do not
attend (Fitzgerald & Wenger, 2003; Picciano et al., 2003). Again,
arguably the knowledge training has not met its objective.
In sum, in many circumstances successful training requires
that the trained knowledge be retained over relatively long inter-
vals before it is first applied (before the knowledge is needed).
Unfortunately, at least three aspects of the mentioned knowl-
edge training scenarios disfavor precisely this outcome. First,
repetition of the material, if it occurs, is presented in a fairly
compressed time frame. Second, in these situations there is little
if any retrieval practice. That is, the short professional development courses for the high school biology teachers and business professionals do not include quizzing and testing, techniques that
Put the SPRINT in Knowledge Training 269
require learners to retrieve information from long-term memory.
Similarly, the rotations for the medical students frequently do
not involve quizzing or testing. Third, there is little interleaving
of similar material; that is, content that is similar but requires
differentiation may not be interspersed; rather, it is trained in a
blocked fashion. For instance, in medical training on the gastro-
intestinal tract, the content would be typically blocked accord-
ing to cell types within the stomach, small intestine, and large
intestine. These cells serve different functions, but have simi-
lar appearances and are difficult for students to distinguish.
For instructors and students, blocking training by cell type has
intuitive appeal. Yet these parameters of many knowledge train-
ing contexts (little retrieval practice, massed repetition, and
blocked presentations) are counter to favored principles in basic
skill training for supporting acquisition and retention: practice
the skill, space the practice, and interleave practice of different
skills (Bjork, 1994; see chapter 2 in this volume).
The following sections review evidence from laboratory experi-
ments and authentic knowledge-training contexts indicating
that the commonplace knowledge-training practices just consid-
ered might be significantly enhanced by incorporating SPacing,
Retrieval practice, and INTerleaving (SPRINT). The final section
then offers brief illustrations of how these knowledge-training
practices might be modified to improve knowledge retention in
knowledge-training programs like those mentioned above, and in
turn, improve the success of these knowledge training programs.

Spacing
A fair amount of laboratory research has indicated that reten-
tion of verbal material is improved with spaced review of material
rather than massed review (see Cepeda, Pashler, Vul, Wixted, &
Rohrer, 2006, for a review). However, for several reasons this
body of work may not be compelling to professionals who are
in the business of knowledge training. The materials used in
such laboratory research are simple (nonsense syllables, word
pairs, word lists), and not reflective of the more complicated and
conceptual material that is central to many knowledge-training
contexts. Further, the retention intervals studied are usually
on the order of hours (if that) rather than the time frame of
weeks and months (and years) that is of interest in the authentic
knowledge training situations noted at the outset. Accordingly, a
legitimate question is whether spacing fosters better retention of
more authentic materials (those of interest in knowledge train-
ing) over long intervals spanning months and years.
Recent research has begun to address this key question. In
one experiment, retention of knowledge over a 9-month time
interval was examined as a function of the degree of spacing (for
review of the target information). Students in an 8th grade U.S.
history course were given a set of facts to review that they had
learned in the course (Carpenter, Pashler, & Cepeda, 2009). One
group of students reviewed the facts relatively soon after they
had finished their exams in the course (1 week later), whereas
another group reviewed the facts with much greater spacing
after finishing the exams (16 weeks later). On a recall test that
was administered 9 months after each group’s review session,
the group with greater spacing remembered over 50% more of
the reviewed facts than did the group with minimal spacing.
Moreover, facts reviewed with minimal spacing showed little retention advantage relative to tested (control) facts that had not been reviewed. Thus, spaced review of target knowledge enhanced
retention over long periods that are reflective of those of import
in knowledge training contexts.
In another impressively systematic experiment, a greater
range of spacing gaps (the interval between initial learning and
review) was examined, as well as several long retention intervals
(the time interval between the review and the final test) that
ranged from 1 week to 1 year (Cepeda, Vul, Rohrer, Wixted, &
Pashler, 2008). Subjects initially had to learn 32 trivia facts;
across conditions, subjects reviewed these facts immediately
upon initial learning (no spacing) or after a particular gap (e.g.,
1 day, 1 week, 3 weeks, or 15 weeks). Final recall and recogni-
tion tests administered after the prescribed delay showed that
spaced review produced better long-term knowledge retention
than did massed review.
Importantly, the study also established with ecologically valid
retention intervals that the optimal spacing gap for knowledge
retention depends on the ratio between the spacing gap and the
retention interval. The results were roughly consistent with the
rule of thumb that spacing is most effective when the time inter-
val of the gap is approximately 15 to 20% of the retention inter-
val. For instance, for a 10-week retention interval, to provide the
optimal benefit of spacing, the gap between initial learning and
review should be between 1 and 2 weeks. For more precise opti-
mization of spacing it is worth noting, however, that the opti-
mal ratio of spacing gap to retention interval became smaller
with the longest intervals so that for a 1-year retention inter-
val a spacing gap of about a month (8%) was optimal. Also, the
decline in retention when nonoptimal spacing gaps were used
was more substantial when the gaps were too short than when
the gaps were longer than the optimal gap. These findings rein-
force the observation that knowledge-training contexts that rely
on confining training to compressed time frames do not favor
long-term retention of the knowledge.
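As a rough illustration, the rule of thumb above can be turned into a simple gap calculator. The sketch below is a hypothetical simplification of the reported findings (the function name, the exact cutoffs, and the interpolation between ratios are my own assumptions, not part of the cited studies):

```python
def suggested_spacing_gap_days(retention_interval_days):
    """Return a (low, high) range of review-gap lengths, in days,
    following the rule of thumb described above."""
    if retention_interval_days >= 365:
        # For retention intervals around a year, the optimal ratio
        # shrank to roughly 8% (a gap of about a month).
        ratio_low, ratio_high = 0.07, 0.10
    else:
        # For shorter retention intervals, spacing was most effective
        # at roughly 15-20% of the retention interval.
        ratio_low, ratio_high = 0.15, 0.20
    return (retention_interval_days * ratio_low,
            retention_interval_days * ratio_high)

# For a 10-week (70-day) retention interval, the rule suggests a
# review gap of roughly 1 to 2 weeks.
print(suggested_spacing_gap_days(70))
```

Note that the decline from nonoptimal gaps was asymmetric in the data: gaps shorter than this range hurt retention more than gaps that were somewhat longer.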

Spacing in Authentic Training Contexts


Recent experimental results from authentic training contexts
converge on the above conclusions. In a required English proficiency course (for students whose native language was not English), students were given five remedial training sessions on English verb mor-
phology for past and past perfect tenses (these forms are typi-
cally difficult to master for nonnative speakers; Bird, 2010).
The spacing of these 1-hour training sessions (embedded in
the 14-week course) was manipulated. One spacing gap was a
relatively short 3 days; thus, training on the verb tenses was
compressed into less than 2 weeks, an interval reflecting those
common to intensive language training courses. The other spac-
ing gap was 14 days. The retention test was administered at
either 7 days or 60 days following the last training session. On
the 7-day posttest, significant gains in performance (judging
grammaticality of sentences) were observed relative to a pre-
training baseline regardless of the spacing used in training.
However, the 60-day posttest indicated that these gains were
only sustained when training was generously spaced. That is,
significant forgetting occurred after compressed training (every
3 days) but not spaced training (every 14 days). This pattern
again indicates that spaced training is preferred to compressed
training for achieving the objective of long-term retention of
knowledge, and also dovetails with the rule of thumb described
above that the spacing intervals should be about 20% of the
anticipated retention interval (here the 14-day spacing was
23% of the 60-day retention interval). The strong implication is that
intensive (compressed) training programs are inimical to long-
term proficiency.
In an experiment that directly reflects the scenarios of com-
pressed knowledge training outlined at the outset, surgical resi-
dents were either given four training sessions on microsurgery
in 1 day (including videos on microsurgery) or given one training
session per week over 4 weeks (Moulton et al., 2006). A reten-
tion test and a transfer test were administered 1 month after the conclusion of the final training session, with both tests requir-
ing performance of the trained surgical procedure. Both groups
showed improvement over training; however, the spaced group
retained the trained surgical skill at a significantly higher level
than the nonspaced training group. A new and important find-
ing was that spaced training also produced significantly better
performance on the transfer task, microsurgery on a live ani-
mal, according to expert-based judgments (blind to condition)
and to ability to complete the microsurgery (16% in the nons-
paced group damaged a vessel beyond repair, whereas no one in
the spaced group did so).

Spacing and Induction


When knowledge training is directed at conceptual information
or category patterns that typically require some induction (e.g.,
training a medical student to distinguish x-rays of pathological
tissue from normal tissue; teaching forestry students the leaf shapes of different hardwoods), the issue of transfer of train-
ing is paramount because the training exemplars will rarely be
those that are encountered during the job. Interestingly, a theo-
retical premise that has remained unchallenged until recently is
that “spacing is the enemy of induction” (Rothkopf; cited in Kor-
nell & Bjork, 2008). The reasoning here is that spacing obscures
the similarities among examples illustrating a concept, whereas
massing the examples from a category highlights their essential
similarities, thereby promoting learning of the concept. In one of
the first tests of this assertion using a naturalistically inspired
concept-learning task, McDaniel, Fadler, and Pashler (2011)
completed an experiment to contrast massing versus spacing for
learning function concepts (relations among continuous input
and output values). Function concepts are encountered in many
contexts, such as forecasting interest rates based on inflation
rates, predicting job performance on the basis of intelligence,
and predicting harvest yields on the basis of the amount of rain-
fall (McDaniel & Busemeyer, 2005). Because the experimental
results are not yet published, some of the key procedural details
will be described.
Participants were given the cover story that they were hired
by NASA to figure out the behavior of a new organism found on
Mars, an organism that ingested a Martian element (zebon) and
released another element (beros). Participants were told that
they needed to learn the relation between the input amounts
and the output amounts. They were presented with training
points (input values) and had to predict the output value of each
input training point. The input-output relations characterized
a V-shaped function as displayed in Figure 13.1. Learners were
given 10 blocks of training on 20 input values within a certain
range (80–120), with feedback on every trial.

Figure 13.1 The trained points for the function learning experiment contrasting massed and spaced training.

Several minutes after training, learning was tested by presenting a sample of the original training points, and transfer was tested by presenting
new points not seen in training (no feedback was provided dur-
ing the test phase). The new points reflected both interpolation
(points within the training range) and extrapolation (points out-
side the training range).
The key manipulation was spacing or massing the presen-
tation of the training points. For the spaced presentation, the
trained points were presented once each in a block, and then
the blocks were repeated. For massed presentation, McDaniel,
Fadler, and Pashler (2011) initially presented each training point
5 times in succession; once all 20 training points had been pre-
sented in this fashion (100 trials), this sequence was repeated
(another 100 trials). However, this massed training procedure
produced little if any learning of the function (as determined by
highly inaccurate test performances on both trained points and
new points not seen in training). Accordingly, McDaniel, Fadler,
and Pashler (2011) reasoned that to be effective the massed pre-
sentation needed to highlight the degree to which output values
changed as a function of change of the input values (e.g., the
function rule). To do so, pairs of input points were massed during
training (10 massed repetitions of each pair per block, with each
successive training block presenting a different pair) so that the
change in y (output) as a function of the change in x (input)
would be readily apparent. The retention and transfer perfor-
mances of this massed group and the spaced training group are
shown in Figures 13.2 (retention) and 13.3 (transfer).

Figure 13.2 Mean predicted outputs at test for the trained points for the massed and spaced conditions.

Inspection of Figure 13.2 reveals that spaced training produced much better learning and retention of training points. Figure 13.3 indicates that spaced training also produced more accurate transfer than massed training; that is, the responses to interpolation and extrapolation points better approximated the given function after spaced training (especially so for the interpolation points, though good interpolation alone does not implicate induction of the function; DeLosh, Busemeyer, & McDaniel, 1997). These
transfer patterns suggest that there was better induction of the
function after spaced than massed training, at least for some of
the learners in the spaced training. Indeed, a startling aspect of
these results is that massed training, despite the intuition that
it might make the functional relation more salient, appeared to
support little, if any, learning about the function.
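To make the contrast concrete, the two presentation schedules can be sketched as trial-order generators. This is an illustrative reconstruction based on the description above, not the authors' materials; the function names, the exact input values, and the block counts are assumptions.

```python
import random

def spaced_schedule(training_points, n_blocks=10):
    """Spaced presentation: each trained point appears once per block,
    in random order, and the blocks are repeated."""
    order = []
    for _ in range(n_blocks):
        block = list(training_points)
        random.shuffle(block)
        order.extend(block)
    return order

def massed_schedule(training_points, reps_per_pair=10):
    """Massed presentation (revised procedure): each successive block
    presents one pair of input points repeated many times, so the
    change in output across the pair is readily apparent."""
    pairs = [training_points[i:i + 2]
             for i in range(0, len(training_points), 2)]
    order = []
    for pair in pairs:  # one block per pair
        order.extend(list(pair) * reps_per_pair)
    return order

# 20 hypothetical trained inputs within the 80-120 range
points = list(range(80, 120, 2))
```

Both schedules yield the same 200 training trials per repetition cycle; only the ordering differs, which is precisely the manipulation of interest.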

Figure 13.3 Mean predicted outputs at test for transfer points for the
massed and spaced conditions.
In sum, recent experimental work with relatively real-world
materials and retention intervals converges on the conclusion
that spacing knowledge training promotes long-term reten-
tion of target factual information and target skills relative to
compressing training into short time periods. Importantly, the
newest results also suggest that spacing is favored for knowl-
edge training that involves induction of concepts. This pattern
is especially informative and critical for training practice, as
learners’ intuitions (and presumably those of people responsible
for designing training) are that massing supports better induc-
tive learning than does spacing (Kornell & Bjork, 2008).

Retrieval Practice
The Holy Grail for skill training is practice, practice, practice. A
key issue for knowledge training is to identify what constitutes
effective practice. One straightforward idea is to repeatedly
study, review, or reread the target information. An alternative
is to practice retrieval of the information. Retrieval practice
essentially involves testing oneself on the target information
or having tests administered as part of the training protocol.
Theoretically, testing is not just an assessment of knowledge;
it also modifies memory (Carpenter & DeLosh, 2006; McDan-
iel & Masson, 1985). Empirically, the basic-laboratory literature
has shown that repeated testing is substantially more effective
in promoting retention than is repeated study (e.g., Karpicke
& Roediger, 2007, 2008). Though suggestive, these basic labo-
ratory paradigms (which also are the foundation for the theo-
retical work) diverge in potentially critical ways from many real
knowledge-training contexts. In the typical laboratory para-
digms, the materials are presented in a single session, learners
have no opportunity to review the material outside of the labo-
ratory session, and learners may have little inherent motivation
to learn the target material. In contrast, in most training situa-
tions, the target material is emphasized in presentations, dem-
onstrations, simulations, labs, and so on, and can be reinforced
with assigned reading and homework. Accordingly, a legitimate
question is whether retrieval practice has substantial benefits
when applied to ecologically valid training contexts.

Effects of Retrieval Practice (Quizzing) on Exam Performance
The most relevant experimental studies addressing this ques-
tion are being conducted in recent work in educational settings.
One ongoing project is examining the benefits of low-stakes
quizzing (testing) on middle-school students’ social studies and
science course exams. The general paradigm involves giving
students several in-class multiple-choice quizzes (with feed-
back) for target content from each unit. Quizzing is conducted
prelecture, immediately postlecture, and a day prior to the unit
exam. The experiments manipulate quizzing within-subjects,
so that some of the target content that forms the unit exam is
quizzed and some of the target content is not quizzed, with the
particular content for quizzing (or not quizzing) randomly deter-
mined across several class sections. Performance on the unit
examinations and cumulative semester examinations (contain-
ing both quizzed and nonquizzed items) indicates whether quiz-
zing affected learning and retention.
Several informative results have emerged from the above
project. First, quizzing produced significant gains in exam per-
formance (comparing exam performance on quizzed versus
nonquizzed content) for a range of topics in social studies and
science (McDaniel, Agarwal, Huelser, McDermott, & Roediger,
2011; Roediger, Agarwal, McDaniel, & McDermott, 2011). The
gains were long-term, persisting for months, with quizzed items faring better on semester and end-of-year cumulative exams than nonquizzed items (see also Lyle & Crawford, 2011,
for similar results in a college statistics course). One interpre-
tational ambiguity of this pattern, however, is that the quiz-
zing effects may simply be showing that reviewing material is
advantageous, rather than indicating that retrieval practice
produces learning benefits. To address this issue, Roediger et
al. (2011, Experiment 2) included a restudy (review) control in
which, instead of being quizzed, some of the target content that
would subsequently be on the exam was presented in class for
students to review (at times that coincided with the quizzing).
As shown in Figure 13.4, the review only slightly improved exam
performance relative to content not reviewed (or quizzed), and
quizzing produced gains that were significantly above that evi-
denced for reviewed content. Further, the slight improvement gained by review was short-lived, as review did not enhance
long-term retention (semester exam performance), whereas as
noted above, quizzing did enhance long-term retention.
Perhaps more pertinent to knowledge training per se, espe-
cially in terms of the situations introduced at the outset, an
experiment examined the value of repeated testing for promot-
ing long-term retention (about 6 months) of medical information
presented at a didactic conference for pediatric and emer-
gency medicine residents (Larsen, Butler, & Roediger, 2009).
Figure 13.4 6th grade social studies: percent correct for tested, read, and nontested content on chapter exams and at the end of the semester (data from Roediger et al., 2011).

Immediately following a 1-hour teaching conference covering
two pediatric neurological emergencies that could be encoun-
tered, residents studied a review sheet on one of the neurologi-
cal emergencies and took a test (with feedback) on the other
neurological emergency (the content studied or tested was coun-
terbalanced across the residents). The same review sheet and
test were then presented twice more at 2-week intervals. Six
to 7 months after the didactic conference, the residents took
a final examination on both topics (the tests were identical to
the follow-up examinations). Final examination performances
showed that long-term retention of the information presented at
the didactic conference was significantly and consistently better
after repeated testing than after repeated study. Indeed, for one
of the topics, final performance after repeated testing was nearly
twice that achieved after repeated study. Note that because the
repeated study was spaced, one could expect that the repeated
study itself was somewhat beneficial to long-term retention (e.g.,
Carpenter et al., 2009). Thus, testing boosted long-term reten-
tion of critical medical information (information that the resi-
dents judged to be highly useful and well taught) that was above
and beyond any positive effects of spacing per se.

Retrieval Practice (Quizzing) and Transfer


An important feature of the research described thus far is that
the questions for the final examinations were identical to the
questions appearing on the follow-up tests (or quizzes). This pro-
cedure is appropriate in knowledge training situations in which
the precise expression of the target knowledge in the job environ-
ment can be well specified; here, repeated practice on particu-
lar questions that require such expression can be implemented.
However, trainers may not know ahead of time the exact manner
in which knowledge will be put to use in the job environment. In
these situations, testing would be broadly useful if it enhanced
transfer or generalization of the tested information to novel
expressions (new questions) of that information. Research is just
now emerging in educational contexts examining the value of
testing (retrieval practice) for improving final performance on
tests (questions) that are not identical to those used in follow-up
tests, and the results are promising.
In the middle-school testing project, McDaniel, Thomas,
Agarwal, McDermott, and Roediger (2012) examined whether
testing would foster transfer or generalization of 7th and 8th
grade science concepts. Students were repeatedly quizzed on
one particular application of a concept, followed by a unit exam
that tested application of the concept in a different context or
instantiation. For instance, for the concept of competition, the
quiz question was, “Both foxes and raccoons on Long Island
eat pheasant, which in recent years, has been in decline. The
foxes and raccoons’ situation is an example of what ecological
process?” The exam question posed a different context: “A group
of 500 pandas are living in a reserve. Recent dry weather has
reduced the bamboo populations, which the pandas rely on. The
pandas are in what type of relationship?” Exam performance on
these application questions was better if the concept had been
quizzed with the different application than if the concept had
not been quizzed. The application quizzes also improved per-
formance on exam questions that focused on definitions of the
concepts (relative to no quizzing). However, quizzing on defini-
tional information did not consistently improve performance on
application exam questions (relative to no quizzing). These patterns indicate that testing, especially with application test items, can promote flexible application of target content to new contexts
(i.e., not previously instructed) in authentic knowledge-training
settings.
Incorporating repeated testing into training (as was done in the studies just reviewed) allows an interesting additional possibility for improving generalization or transfer. The basic problem
solving and concept learning literatures suggest that learning
and transfer might be optimized by varying the contexts used
across the repeated test questions so that the target concept/
principle is more broadly and completely illustrated (e.g., Gick &
Holyoak, 1983; Homa & Vosburgh, 1976; Jacoby, Wahlheim, &
Coane, 2010). Glass (2009) examined this possibility in a college
lecture course in which students were given three quizzes (pre-
class, in class, and postclass 1 week after the class) prior to the
exams, with the quizzes composed of inference questions requir-
ing application of facts covered in the class. For some items, the
same inference question was presented across three quizzes;
for other items a different context (instantiating the inference)
was presented on each quiz. The corresponding exam questions
also required application (inference) of the fact, but in a context
that differed from any of the quiz questions. On unit exams and
cumulative final exams, exam performance on the application
questions was improved when the quizzes varied the inference
questions relative to repeating the identical question.

Retrieval Practice through Self-Testing


In some training contexts (e.g., medical training for residents,
especially in the context of didactic conferences), implementing
quizzing and testing may be awkward or impractical. An alter-
native is to encourage learners to self-test. For instance, learn-
ers could be instructed to recite target material after completing
a reading assignment or attending a didactic conference. In one
experiment, college students read encyclopedia articles on how
brakes work and how pumps work (McDaniel, Howard, & Ein-
stein, 2009). Some students then recited (retrieved) as much of
the material as they could remember, and others reread the arti-
cles. One-week retention was significantly better after recitation
than after rereading (restudy). Further, recitation also improved
problem solving (figuring out how to fix a broken pump or how to
build a better brake). Thus, retrieval not only enhances knowl-
edge retention, it also may foster a more organized (Zaromb &
Roediger, 2010) representation or perhaps a more coherent men-
tal model of the target content.
Further, in line with a major theme of this volume, retrieving
content to be used for the task at hand might itself be viewed
as a skill that improves with practice (e.g., facts about brakes,
information about a surgical technique). To the extent that this
idea has merit, it raises the exciting possibility that training
regimens that consistently incorporate testing to improve learn-
ing of target content might also benefit retrieval skill in general,
so that learners become more capable of retrieving information
relevant to their tasks. Favoring this possibility are contempo-
rary theoretical advances in memory that emphasize cognitive
control of memory and retrieval (e.g., Benjamin & Ross, 2008)
and successful retrieval-training procedures for older adults
280 Mark McDaniel
that increase their reliance on recollection (rather than familiar-
ity) under conditions of high interference during test (Jennings
& Jacoby, 2003). The possibility that quizzing/testing could
enhance retrieval skill in general appears to be fertile ground
for further investigation.
In sum, testing (quizzing) is an effective way to implement
practice for knowledge training. Testing promotes long-term
retention of knowledge relative to restudying. Testing supports
flexible use of that knowledge (i.e., improvements in performance
are not limited to the identical test items), so that the knowledge
can be applied to new contexts. This effect of testing appears to
be augmented when repeated tests vary the instantiation of the
target fact or construct. Self-testing (recitation) also yields bene-
fits. In addition, the retrieval processes involved in recitation (recall)
may modify learning so that the knowledge is better organized
and integrated into a coherent mental model, thereby facilitating
use of that knowledge in problem solving and transfer outside
the particular training context.

Interleaving
Knowledge-training situations involve training a body of knowl-
edge and skills that requires differentiating among different kinds
of problems and situations within a particular domain and link-
ing appropriate responses to those problems and situations. For
instance, in medicine, training in oncology requires students
to learn about and differentiate different types of tumors and
appropriate treatments. In business, training in accounting
requires students to learn different computational and concep-
tual components; for example revenue versus income. A straight-
forward training procedure that seems intuitively superior to
many individuals is to block training of each particular problem
type (cf. Kornell & Bjork, 2008). Indeed in teaching math, most
mathematics practice assignments consist almost entirely of a
block of problems on the immediately preceding topic (e.g., a
set of 24 ratio problems after a lesson on ratios; Rohrer & Tay-
lor, 2007). Unfortunately, blocking practice may not foster the
ability to discriminate among similar problems and situations,
thereby disrupting the selection of an appropriate and effective
response.
Research from several arenas suggests that interleaving prac-
tice, such that different types of similar problems are intermixed
rather than blocked, produces much more effective training out-
comes than does the standard blocked practice. This claim is par-
ticularly well supported by studies of motor skill learning (reviewed
Put the SPRINT in Knowledge Training 281
in Rohrer & Pashler, 2010). Regarding knowledge training, Kor-
nell and Bjork (2008) presented learners with paintings from
various artists either blocked by artist or interleaved by artist
(so that paintings from the same artist were never successively
presented). After the learning phase, participants were required
to classify new paintings (not seen in training) from the same
artists. Across several experiments, countering popular wisdom
and to the surprise of the researchers, interleaving exemplars
during training promoted more accurate categorization of the
new paintings than did blocking the exemplars.
Building on the findings described above, Rohrer and col-
leagues conducted several clever experiments to isolate and
demonstrate the benefits of interleaved practice in math learning
(Rohrer & Taylor, 2007; Taylor & Rohrer, 2010). In one experiment,
college students were taught four different types of geometric
solids and how to find their volume (wedge, spheroid, spherical
cone, and half cone). One group of students was trained using a
typical blocked procedure: for each solid a tutorial was provided,
followed immediately by four practice problems on the solid.
Another group of students was trained using an interleaved pro-
cedure. The tutorials on all four solids were first provided, and
then the practice problems were presented in a random order (but
constrained so that one of each problem was presented in a set
of four). Both the blocked and interleaved groups were given two
training sessions, spaced 1 week apart. A week after the second
training session, students were tested with two novel problems
for each of the four solids. Students who received the blocked
training procedure, despite two spaced sessions, performed
poorly on the test, solving an average of only 20% of the problems
(less than two problems)! By contrast, students given interleaved
training showed over a three-fold advantage over blocked train-
ing in their ability to solve the test problems (63% of the problems
solved). Similar benefits of interleaving were reported in a related
experiment on math training in young children (Taylor & Rohrer,
2010). Importantly, converging with the claim that blocking is
inimical to learning which response is appropriate for a given
problem or context, the high failure rate on the test problems (for
the blocked group) was nearly always a consequence of selecting
an incorrect formula (see also chapter 2 in this volume; Schnei-
der, Healy, & Bourne, 2002).
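The constrained ordering used for the interleaved group described above (practice problems presented in a random order, constrained so that each set of four contains exactly one problem of each type) can be sketched in a few lines of code. This is only an illustrative sketch of the scheduling constraint, not the researchers' actual materials; the function name is ours.

```python
import random

def interleaved_schedule(problem_types, num_sets, seed=None):
    """Order practice problems so that each consecutive set contains
    exactly one problem of each type, presented in a random order."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(num_sets):
        current_set = list(problem_types)
        rng.shuffle(current_set)  # randomize order within each set
        schedule.extend(current_set)
    return schedule

# The four solids from the experiment, four practice sets (16 problems)
solids = ["wedge", "spheroid", "spherical cone", "half cone"]
print(interleaved_schedule(solids, num_sets=4, seed=1))
```

A blocked schedule, by contrast, would simply present all problems of one type before moving on to the next.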
Further, performance on the training problems provides some
insight into why instructional designers, as well as trainees,
may erroneously believe that blocked training is preferable. With
blocked training, practice performance was particularly good
(89% averaged across the two sessions), and significantly better
than with interleaved training (60% on average; Rohrer & Taylor,
2007). Blocked training may well quickly provide learners with
facility in how to solve each problem, thereby giving the illusion
that learning is complete. However, to apply the knowledge (e.g.,
in the job environment) individuals need to be able to deter-
mine which response is most appropriate for the problem that is
encountered. The evidence available at this time indicates that
interleaving, but not blocking, assists in training this knowledge.

Applying SPRINT to Real-World Knowledge Training

To close, let us now consider how SPRINT might be applied to
improve training outcomes from several of the knowledge train-
ing domains mentioned at the outset. With regard to medical
training, one common model is to block exposure to material
and to test minimally on that material. For example, in one med-
ical school, training in structure-function knowledge includes
lectures on one area (arms, back) followed by one test; lectures
on another area (chest, pelvis) followed by a test; and so on.
Further, within one particular area, information that is similar
but critical to distinguish is blocked. As one example, for the
pelvis unit, the female system is presented on one day and the
male on another day. This blocking likely does not help students
learn to differentiate between the fallopian tubes and the vas
deferens (similar-looking structures). This training paradigm is
not optimal if the desired outcome is to promote retention of
critical medical information over long intervals, as well as to
promote discrimination of functionally different structures/cell
types that have morphological and cellular similarities (e.g., GI
cells). To improve retention, spacing and repeated testing should
be implemented. One straightforward way to achieve this goal
would be to implement cumulative examinations following the
unit exams. This practice would stimulate spaced study of the
material, as well as provide some repeated testing. To improve
differentiation, lectures should interleave critically similar
material (e.g., particular female and male reproductive struc-
tures). These changes would not require monumental redesigns
in training, and could have great potential payoffs in terms of
knowledge retention and application.
As another example, in training junior surgical residents,
one training curriculum is to have the residents attend a 2- to
3-hour session to learn something technical on one particular
organ/system, with each successive session (taught the follow-
ing week) devoted to another organ/system. To improve retention
of the surgical procedures, the training curricula could be rede-
signed so that the residents are required to repeatedly return for
practice on the procedure over a spaced interval (see Moulton et
al., 2006, for consideration of just such a redesign at one medi-
cal school). Repeated practice sessions could also very natu-
rally incorporate interleaving of practice on the various surgical
techniques, which would be expected further to enhance reten-
tion. As a final example, in accounting training the concepts of
revenue and income are typically presented in separate sessions
(blocked by topic) and are often confused by students. Better
understanding and differentiation of these con-
cepts (and associated accounting procedures) would likely be
improved by interleaving their presentation (as one instructor
at Washington University in St. Louis is now doing). One prag-
matic consideration in implementing interleaving might involve
educating both faculty and residents that their metacognitive
experience of greater fluency/acquisition of a particular skill for
blocked relative to interleaved practice is not a reliable indicator of
the expected retention and transfer performances of the skill
(for which interleaving is superior; Bjork, 1994).
A perhaps more difficult setting in which to implement the
SPRINT techniques is that of continuing education. In medical
training, for more advanced medical professionals, the train-
ing is often a single event, conducted in a day or a weekend,
after which the individuals return to their practice and leave
the training environment (see also chapter 14, this volume). As
noted earlier, a similar situation can exist for business-exec-
utive training and for some continuing training of secondary
science educators. To foster better retention, several improve-
ments are possible. First, these single training events could
include retrieval practice (by giving a test at the end of the ses-
sion). Additionally, Web-based tests (retrieval practice) could be
administered some time after the training event and perhaps
repeated in spaced fashion (with completion of tests strongly
encouraged). Web-based quizzes are becoming common in uni-
versity settings, and recent experiments have demonstrated that
repeated Web-based quizzes can increase retention of course
content (even with Web-based instruction where the instructor
does not meet with students; McDaniel, Anderson, Derbish, &
Morrisette, 2007; McDaniel, Wildman, & Anderson, 2012) and
might have the bonus advantage of improving skills associ-
ated with retrieving and using the newly acquired knowledge.
Accordingly, modifications to knowledge training in continuing
education settings that would be expected to significantly pro-
mote retention of trained knowledge are possible.
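The spaced follow-up quizzing suggested above amounts to a simple schedule: given the date of a one-time training event, compute dates for repeated Web-based quizzes at increasing delays. The particular gaps used below are arbitrary placeholders for illustration, not values prescribed by the research.

```python
from datetime import date, timedelta

def spaced_quiz_dates(training_day, gaps_in_days):
    """Dates for follow-up Web-based quizzes, each gap counted in
    days from the single training event."""
    return [training_day + timedelta(days=gap) for gap in gaps_in_days]

# Arbitrary example gaps: one week, three weeks, two months
quiz_days = spaced_quiz_dates(date(2012, 9, 3), [7, 21, 60])
print(quiz_days)
```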
References
Accreditation Council for Graduate Medical Education (2007). Com-
mon program requirements. Chicago, IL: Author. Retrieved from
http://www.acgme.org/acWebsite/dutyHours/dh_dutyhoursCom-
monPR07012007.ppf.
Benjamin, A. S., & Ross, B. H. (Eds.). (2008). Skill and strategy in mem-
ory use. San Diego, CA: Academic Press.
Bird, S. (2010). Effects of distributed practice on the acquisition of
second language English syntax. Applied Psycholinguistics, 31,
635–660.
Bjork, R. A. (1994). Memory and meta-memory considerations in the
training of human beings. In J. Metcalfe & A. Shimamura (Eds.),
Metacognition: Knowing about knowing (pp. 185–205). Cambridge,
MA: MIT Press.
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support
enhances subsequent retention: Support for the elaborative pro-
cessing explanation of the testing effect. Memory & Cognition, 34,
268–276.
Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to
enhance 8th grade students’ retention of U. S. history facts. Applied
Cognitive Psychology, 23, 760–771.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006).
Distributed practice in verbal recall tasks: A review and quantitative
synthesis. Psychological Bulletin, 132, 354–380.
Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008).
Spacing effects in learning: A temporal ridgeline of optimal reten-
tion. Psychological Science, 19, 1095–1102.
DeLosh, E. L., Busemeyer, J. R., & McDaniel, M. A. (1997). Extrapola-
tion: The sine qua non for function learning. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 23, 968–986.
Fitzgerald, J. D., & Wenger, N. S. (2003). Didactic teaching conferences
for IM residents: Who attends, and is attendance related to medical
certifying examination scores? Academic Medicine, 78, 84–89.
Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical
transfer. Cognitive Psychology, 15, 1–38.
Glass, A. L. (2009). The effect of distributed questioning with varied
examples on exam performance on inference questions. Educational
Psychology, 29, 831–848.
Homa, D., & Vosburgh, R. (1976). Category breadth and the abstrac-
tion of prototypical information. Journal of Experimental Psychology:
Human Learning and Memory, 2, 322–330.
Jacoby, L. L., Walheim, C., & Coane, J. (2010). Test-enhanced learning
of natural concepts: Effects on recognition memory, classification,
and metacognition. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 36, 1441–1451.
Jennings, J. M., & Jacoby, L. L. (2003). Improving memory in older
adults: Training recollection. Neuropsychological Rehabilitation, 13,
417–440.
Karpicke, J. D., & Roediger, H. L. (2007). Repeated retrieval during
learning is the key to long-term retention. Journal of Memory and
Language, 57, 151–162.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of
retrieval for learning. Science, 319, 966–968.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories:
Is spacing the “enemy of induction”? Psychological Science, 19,
585–592.
Larsen, D. P., Butler, A. C., & Roediger, H. L., III. (2009). Repeated
testing improves long-term retention relative to repeated study: A
randomized controlled trial. Medical Education, 43, 1174–1181.
Lyle, K. B., & Crawford, N. A. (2011). Retrieving essential material at the
end of lectures improves performance on statistics exams. Teaching
of Psychology, 38, 94–97.
McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B.,
& Roediger, H. L., III. (2011). Test-enhanced learning in a middle
school science classroom: The effects of quiz frequency and place-
ment. Journal of Educational Psychology, 103, 399–414.
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N.
(2007). Testing the testing effect in the classroom. European Journal
of Cognitive Psychology, 19, 494–513.
McDaniel, M. A., & Busemeyer, J. R. (2005). The conceptual basis of
function learning and extrapolation: Comparison of rule and asso-
ciative based models. Psychonomic Bulletin and Review, 12, 24–42.
McDaniel, M. A., Fadler, C., & Pashler, H. (2011). Effects of spaced ver-
sus massed training in function learning. Manuscript under review.
McDaniel, M. A., Howard, D. C., & Einstein, G. O. (2009). The read-
recite-review study strategy: Effective and portable. Psychological
Science, 20, 516–522.
McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory repre-
sentations through retrieval. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 11, 370–384.
McDaniel, M. A., Thomas, R., Agarwal, P. K., McDermott, K., & Roedi-
ger, H. L., III. (2011). Quizzing promotes learning of target principles
in middle school science: Benefits on classroom exams. Manuscript
under review.
McDaniel, M. A., Wildman, K., & Anderson, J. L. (2012). Using quiz-
zes to enhance summative-assessment performance in a web-based
class: An experimental study. Journal of Applied Research in Mem-
ory and Cognition, 1, 18–26.
Moulton, C-A. E., Dubrowski, A., MacRae, H., Graham, B., Grober, E.,
& Reznick, R. (2006). Teaching surgical skills: What kind of practice
makes perfect? Annals of Surgery, 244, 400–409.
Picciano, A., Winter, R., Ballan, D., Bimberg, B., Jacks, M., & Laing, E.
(2003). Resident acquisition of knowledge during a noontime confer-
ence series. Family Medicine, 35, 418–422.
Roediger, H. L., III, Agarwal, P. K., McDaniel, M. A., & McDermott,
K. (2011). Test-enhanced learning in the classroom: Long-term
improvements from quizzing. Journal of Experimental Psychology:
Applied, 17, 382–395.
Rohrer, D., & Pashler, H. (2010). Recent research on human learn-
ing challenges conventional instructional strategies. Educational
Researcher, 39, 406–412.
Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems
improves learning. Instructional Science, 35, 481–498.
Schneider, V. I., Healy, A. F., & Bourne, L. E., Jr. (2002). What is learned
under difficult conditions is hard to forget: Contextual interference
effects in foreign vocabulary acquisition, retention, and transfer.
Journal of Memory and Language, 46, 419–440.
Taylor, K., & Rohrer, D. (2010). The effect of interleaving practice.
Applied Cognitive Psychology, 24, 837–848.
Zaromb, F. M., & Roediger, H. L. (2010). The testing effect in free recall
is associated with enhanced organizational processes. Memory &
Cognition, 38, 995–1010.
14 Training for Real-World Job
Performance
Immanuel Barshi
National Aeronautics and Space Administration,
Ames Research Center

Loukia Loukopoulos
San Jose State University

The training principles presented in preceding chapters (espe-
cially chapters 2 and 3) come primarily from controlled labo-
ratory experiments. An important question, then, is how well
these principles generalize to real-world job training. One such
training principle with clear relevance to the real world is the
Procedural Reinstatement Principle (Healy, Wohldmann, &
Bourne, 2005). Specifically, in simple laboratory experiments,
people perform best when the testing conditions reinstate those
encountered during training. In real-world terms, this prin-
ciple implies that training conditions should reinstate the real
world—the actual conditions under which people will perform
these trained tasks. This principle has been the primary driver
behind large investments in high-fidelity simulators used for
training in the military, aviation, space, medicine, and nuclear
power generation industries. Given the cost of training, and
more importantly, the potential cost of errors in such high-risk
operations, it is clear that training should be as effective and as
efficient as possible. However, before training that reinstates the
“real world” can be designed, the real world must be understood.
This discussion begins with commercial aviation. The cock-
pit lends itself well to learning about the real world, and illus-
trates the kinds of challenges facing research and development
of training to truly prepare people for the real world. Later, flying
is linked with other, similar types of high-risk, complex human
activities.

Learning from Accidents


The success of an operation is typically judged by its final out-
come. An airplane reaching its destination must mean that all
went well. In reality, however, it’s hard to tell what may or may
not have happened in the course of the flight. What happened
becomes apparent only when the final outcome is not accomplished
and an accident occurs (much as normal brain function is often
revealed by research with people who have sustained a brain injury).
On August 20, 2008, Spanair flight 5022 crashed shortly after
taking off from Madrid’s airport. The preliminary investigation
report confirmed that the wing flaps had not been extended for
takeoff (Comisión de Investigación de Accidentes e Incidentes
de Aviación Civil [CIAIAC], 2008). The flaps are special surfaces
that extend from the trailing edge of the wing to change the
wing’s shape and provide additional lift in slow speed situations
such as takeoff and landing. Extending (setting) the flaps to the
proper position prior to takeoff is critical because the flapless
wing cannot produce sufficient lift to enable the aircraft to climb
away from the ground.
Because takeoff flaps are such a safety-critical item, a num-
ber of safeguards are in place to ensure their proper setting, in
the form of procedures and checklists, as well as an automated
takeoff configuration warning horn. That horn sounds when the
pilot advances the throttles to takeoff power on the ground and
the aircraft is not properly configured for takeoff, such as when
the flaps are not extended.
Complex factors influenced the context within which the Spa-
nair crew prepared their aircraft for takeoff. But not setting the
flaps for takeoff was certainly an unintentional omission. That
omission was coupled, by chance, with the failure of the warn-
ing system. This very omission of setting flaps for takeoff has led
to accidents before (e.g., National Transportation Safety Board
[NTSB], 1988, 1989). In general, pilots’ inadvertent omission of
steps and activities that are well-practiced, habitual, memo-
rized, and standardized are infamous because of their poten-
tial catastrophic outcome (Dismukes, Berman, & Loukopoulos,
2007). Although evidence exists to suggest various mechanisms
whereby errors might occur in the recall of items in serial lists
(such as checklists; e.g., Serial Position Principle, see chapters 2
and 3), it is important to bear in mind that accidents result from
series of events and factors, interacting in complex ways. Still,
seemingly simple human errors are often among the contribut-
ing factors.

Learning from Incidents


It is critical to elucidate the factors that lead to unintended
omissions and in general to accidents in order to help reduce
the probability of such occurrences in the future. Accident data
force the reconsideration of both the operating conditions and
the training received to operate in such conditions so they are
very useful sources of information about the real world. However,
it is not necessary to wait for tragic accidents in order to learn
about the real world. Learning can also come from incidents.
The NASA Aviation Safety Reporting System (ASRS) database
is a rich source of information about the many layers of com-
plexity of real-world, routine flight operations. A search of the
database revealed that between July 2000 and July 2010, 33
airline flight crews did not set or incorrectly set the flaps for
takeoff. Though the number may appear insignificant, it trans-
lates into 33 narrowly averted disasters. In the majority of cases,
the warning horn alerted the crew in time to reject the takeoff.
In other cases, pilots identified and corrected their mistake dur-
ing taxi (thanks to the checklist), or took off with the incorrect
flaps setting. In yet other cases, such as the one cited here, the
pilots realized the incorrect flap position and corrected it during
takeoff:

… As we started the taxi, I called for the taxi checklist, but
became confused about the route and queried the first-offi-
cer to help me clear up the discrepancy. We discussed the
route and continued the taxi…. We were cleared for takeoff
from runway 1, but the flight attendant call chime wasn’t
working. I had called for the Before Takeoff checklist, but
this was interrupted by the communications glitch – On
takeoff, rotation and liftoff were sluggish.… The first-officer
noticed the no flap condition and placed the flaps to 5….
(ASRS report #658970—May 2005)

Here, the last thread of hope, the configuration warning horn,
failed to provide the crew with a timely alert that flaps had not
been set for takeoff. Given that the aircraft successfully landed
at its planned destination, no one would have ever suspected
that anything out of the ordinary had happened on this flight.
Fortunately, the crew did report this occurrence, opening an
interesting window to the complexity of routine operations. For
example, after landing, it was determined that the warning horn
failed because of a “pulled” circuit breaker. A circuit breaker
may be “pulled” by mechanics prior to a flight to conduct nec-
essary tests, and they may forget to push it back “in.” Pilots
may later inadvertently overlook its position, because preflight
is a particularly busy time with frequent interruptions. Indeed,
a short while later, one such interruption, a “communications
glitch,” diverted this crew’s attention from the Before-Takeoff
checklist, the very safeguard designed to help ensure that criti-
cal items, such as the flaps, are properly set.
The selected report is not an exception. A systematic search
of the ASRS database reveals a variety of similar-type events.
In their totality, such events paint a realistic picture of every-
day operations, one where crews are vulnerable to all types of
inadvertent omissions and errors in performance, despite the
fact that they have received extensive training and accumulated
considerable experience performing, based on that training, in
daily operations.
The specifics of the reported omissions are highly variable:
pilots forgot to set flaps for takeoff or to verify that they had set
them, rushed to start taxiing without confirming they had the
necessary clearance, failed to start an engine, confused their
position on the taxiways, failed to monitor one another, to name
but a few. The time at which the omissions manifest themselves
also varies: failing to check the ramp area led to an immediate
collision with a vehicle, but not extending the flaps while still at
the gate led to an aborted takeoff many minutes later. Neverthe-
less, all these events led to significant safety compromises to the
aircraft occupants (and often other personnel on the ground),
and typically translated into monetary costs and delays.
So what do incidents and accidents say about the real world
of aviation? Are they somehow unique or extreme cases of (unin-
tentionally) negligent performance? Did these pilots just miss
parts of their training or were they inexperienced? No, these
pilots didn’t miss parts of their training; they all successfully
completed their training programs and have performed the very
same actions they omitted many times in previous flights with-
out failure. And no, neither these accidents nor the incidents are
the result of negligence. Any other crew facing the same condi-
tions would have been equally likely to make such mistakes.
Omissions and other types of errors say a lot about the nature of
real-world operations and the nature of real-world human per-
formance. And they say a lot about training.

Training (Theory)
Anyone who has ever peeked into an aircraft cockpit can easily
imagine the substantial volume and variety of activities that a
pilot must perform during a flight. To be managed, this workload
must be organized. Furthermore, it must be standardized, so
activities are accomplished in the same way by a pilot on every
flight, and also by different pilots on different flights. These
activities are, therefore, dutifully described in each airline’s
Flight Operations Manual (FOM). The FOM includes detailed
procedures for every activity pilots are expected to perform,
from the external checks prior to boarding, through preparing
the aircraft for flight, starting the engines, taxiing from the gate
to the runway, and all the way through shutting the engines
down and securing the aircraft at the end of the day at its final
destination.
Procedures must be, first and foremost, consistent with the
technical characteristics and limitations of the aircraft. The orig-
inal FOM is created by the aircraft manufacturer, based on the
design of the aircraft and on many hours of flight testing, giving
it an engineering focus, so that all aircraft systems function as
designed. The manufacturer, however, has no experience in car-
rying passengers in routine daily operations. It is therefore up to
each airline to take the FOM and adapt it to its own operational
needs, by adding different steps, depending on the particular
type of operation it conducts, and on its corporate culture.
Ultimately, FOMs describe two main kinds of procedures:
flows and checklists. Flows are sequences of actions that have
to be performed from memory. These flows are often based on
the spatial layout of the cockpit, so that a pilot may check all the
gauges on the instrument panel starting from the left and moving
to the right. In addition, actions are often functionally related;
for example, setting the flaps requires hydraulic pressure, and
hydraulic pressure, in turn, requires that the engines be started,
and for that to happen the battery has to be turned on.
Pilots have many opportunities to practice flows through
training and through their operational experience, and can thus
perform them from memory. To guard against memory failures,
there is the second kind of procedure in the FOM, the checklist.
This is a list of selected items from the flow that are considered
absolutely critical and must therefore be checked and confirmed
prior to the next phase of flight.
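The relation between a flow and its checklist can be sketched schematically: the checklist is the subset of flow items judged absolutely critical, to be checked and confirmed before the next phase of flight. The item names and critical/non-critical labels below are invented for illustration only and are not drawn from any actual FOM.

```python
# Schematic sketch only: item names and critical flags are invented,
# not taken from any actual Flight Operations Manual.
taxi_flow = [
    ("flaps set for takeoff", True),     # (action, safety-critical?)
    ("flight controls checked", True),
    ("trim set", True),
    ("instrument panel scanned", False),
    ("radio frequencies monitored", False),
]

# The checklist keeps only the items considered absolutely critical;
# the pilots verify these together before the next phase of flight.
taxi_checklist = [action for action, critical in taxi_flow if critical]
print(taxi_checklist)
```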
What does all this have to do with training? Flows and check-
lists form the basis for pilots’ training. In training, pilots spend
many hours in cockpit mockups and flight simulators, prac-
ticing these procedures to perfection. Unfortunately, 100% reli-
ability is not a common human trait and checklist procedures
carry particular challenges (e.g., Barshi & Healy, 1993; also see
the serial position principle, chapters 2 and 3); the focus here
is on the implications of having to perform these procedures in
the real world for the design and development of such training.
In the case of setting the flaps for takeoff, the action takes
place when the pilots are ready to taxi the aircraft from the gate
area to the runway. According to the FOM, each of the two pilots
in the cockpit has specific responsibilities during this phase of
flight. Before setting the aircraft into motion, the pilots set the
aircraft flaps to the takeoff position and request taxi clearance
(i.e., navigational “directions” to the runways) from Air Traf-
fic Control (ATC). While taxiing, the two pilots must perform,
from memory, a set of activities prescribed by a flow, which they
then verify by conducting together the corresponding checklist.
The taxi flow and checklist typically contain the “flaps” item,
because of its critical nature. Oftentimes, the FOM prescribes
yet one more, shorter flow and checklist as the aircraft nears
the runway. In any case, by the time all these taxi activities are
completed, the aircraft is ready for takeoff.
Pilots have additional responsibilities besides flows and
checklists while taxiing. They are expected to monitor several
radio frequencies and respond if called; they must monitor their
progress along the taxi route confirming that they are correctly
following the instructions in their taxi clearance; and they must
ascertain that their movements do not conflict with those of
other aircraft. Furthermore, they might be called upon by the
cabin crew, or receive important dispatch information via com-
puters. In short, taxiing to the runway is a very busy time in
the cockpit. Most of these responsibilities are mentioned in the
FOM, but are not organized in specific procedures and are with-
out specific guidance on how to best accomplish them or ensure
their integrity. Many responsibilities are mentioned in passing,
as side-notes to the precise description of procedural activities.
FOM-prescribed taxi-phase activities (and activities for each
of the flight phases), while large in number, are admittedly
straightforward as they are neatly packed into procedures, flows
and checklists, and laid out on paper. Furthermore, their orga-
nization lends itself perfectly to training thanks to three key
features. First, activities are linear, such that a given activity is
always preceded by and followed by specific activities, always in
the same sequence, because one activity is ostensibly a prereq-
uisite for the next. For example, the checklist is called for only
after each pilot has had a chance to accomplish the prerequisite
flow. Second, activities are predictable. Information and events
from the external environment are anticipated, and occur pre-
dictably at certain points in time. And third, activities are in the
pilots’ control. For instance, the pilots decide when to request the
taxi clearance, and they receive it as soon as they do so.
These three features make procedures attractive for train-
ing, both for pilots, and especially for trainers. Linearity facili-
tates learning. Given sufficient practice in flight simulators,
Training for Real-World Job Performance 293
and taking advantage of natural cognitive characteristics, such
as motor memory, a pilot can learn to perform a linear set of
activities with minimal effort and high reliability. Moreover, an
instructor can assess a pilot's proficiency simply by ensuring that
all activities are accomplished, and in the right order. Predictability
implies no surprises. Consequently, these procedures
lend themselves to learning because a pilot can simply and
automatically perform well-practiced actions, without concern
for unforeseen factors. Likewise, the instructor can focus on
watching that habitual activities are accomplished flawlessly,
and can devote more time to serious contingencies, such as
those requiring emergency procedures. Controllability, the pilot's
full control over the initiation, timing, and termination of activities,
again implies no surprises: there are no outside factors that might
alter or remove that control altogether. Such organized situations
can be both demonstrated by pilots and assessed by instructors, so
that the expected controllability of activities, together with their
predictability and linearity, lends itself conveniently to training.
It’s not difficult to appreciate that people organize many of
their daily activities around procedures that, although not
always explicit or formally taught, give a linear, predictable, and
controllable flavor to behavior. Withdrawing cash from an automatic
teller machine (ATM), for example, is accomplished following
a fixed, linear sequence of activities (insert card, type
PIN, select option to withdraw, retrieve card, receive money and
receipt), each of which predictably generates an outcome (typ-
ing in the correct PIN produces a menu of possible actions) over
which the user has a large degree of control (type in the PIN
when ready, select the desired option, etc.).
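The linear, predictable, and controllable character of this everyday procedure can be sketched in a few lines of code. This is a toy model only; the step and outcome labels are our own illustrative choices, not any real ATM's interface:

```python
# Toy model of a linear, predictable, controllable procedure: an ordered
# list of (action, expected outcome) pairs, where each action's outcome
# reliably follows and sets up the next action in the fixed sequence.
ATM_PROCEDURE = [
    ("insert card", "card accepted"),
    ("type PIN", "menu of possible actions"),
    ("select option to withdraw", "cash counted"),
    ("retrieve card", "aural signal"),
    ("receive money and receipt", "transaction complete"),
]

def run_procedure(steps):
    """Execute the steps strictly in order: linearity means each action is
    always preceded and followed by the same specific actions."""
    performed = []
    for action, outcome in steps:
        performed.append(action)  # controllable: the user initiates each action
        # predictable: `outcome` reliably follows, cueing the next action
    return performed

assert run_procedure(ATM_PROCEDURE) == [action for action, _ in ATM_PROCEDURE]
```

The fixed ordering in the list is what makes the procedure easy to learn and easy to assess: proficiency amounts to producing the same sequence every time.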
An important outcome of human performance based on lin-
ear, predictable, and controllable procedures is that it becomes
largely automatic (i.e., it is cheap in cognitive resources) and rel-
atively reliable. But it also leads to the generation of, and subse-
quent reliance on, triggers. Repeated performance of activities in
a particular sequence builds strong links among actions, so that
certain, particularly salient events or actions become triggers
that automatically (i.e., with minimal conscious, controlled effort)
elicit the next step in sequence. An expert money-withdrawer
at the ATM can select from the available menu options without
actually reading them on the screen. When the ATM returns the
card it typically generates an aural signal that automatically
triggers the withdrawer’s action to retrieve the card.
Repeated performance of procedures in the cockpit likewise
generates triggers and leads pilots to rely upon them. Interaction
with an ATM is simple, and matters are obviously more complex
when it comes to interacting with an aircraft. Training developers
and instructors may not expect FOM-prescribed activities to
accurately represent everything about the real world, or the
FOM-based training to completely prepare pilots for the real
operational environment. Pilots are expected to learn to deal
with the real world through exposure and experience. But that
doesn’t always work well, so understanding the differences
between the real operational world and the theoretical world of
the FOM is critical for the design of effective training.

Performing (Reality)
One way to truly understand how procedures are performed in
the real world is to actually observe them in real time. A few
years ago, the authors occupied the spare seat (jump seat) in
cockpits, behind the pilots, while passengers (and the authors)
were being whisked to destinations around the country, and
gained a unique perspective on procedures in real life. This
observational study and its findings are discussed in great
detail in their book, The Multitasking Myth (Loukopoulos, Dis-
mukes, & Barshi, 2009). Certain findings, however, have par-
ticular implications for the design of training and for the basic
research upon which such design depends.
One of the findings confirms that expert, professional, exten-
sively trained pilots indeed perform what they have been trained
to do: They follow detailed, exact sequences of complex activi-
ties, working alone, and coming together in a prescribed, care-
fully orchestrated and executed manner. Procedures generally
serve their purpose well.
Another finding concerns the presence of monitoring require-
ments, neither explicitly described by procedures nor specifi-
cally covered in training. Pilots navigate across busy airports,
watching out for other aircraft, ground servicing equipment and
vehicles, maintaining appropriate distance to avoid collisions,
remaining aware of their position along the taxi route, while
listening to communications with ATC so as to immediately pick
out the instructions issued to their aircraft (and infer the surrounding
situation by listening to instructions to other aircraft). Pilots
undertake this “extra” workload seemingly effortlessly. They
simply interleave the monitoring demands with everything else
they are doing and largely depend on skills that enable them to
work well in unison and to back up one another.
Yet another finding concerns the presence of operational fac-
tors requiring on-the-spot solutions that are also not explicitly
(if at all) described by procedures nor introduced in training.
For example, ice and snow on the taxiways may necessitate
(for technical reasons) postponing setting the flaps until right
before takeoff. Radio calls often interrupt the pilots in the mid-
dle of conducting checklists, and the calls must be responded to
before resuming the interrupted checklist. Wind changes cause
controllers to issue changes to the takeoff runway, making
pilots temporarily unavailable to monitor one another or con-
duct checklists together because they are unexpectedly busy
recalculating takeoff data.
Such instances (and the many more that were observed in the
course of this study) lead to the conclusion that activities in the
real operational world are not as straightforward as procedures
portray. The prevailing feature appears to be a flurry of events
that occurs during the course of a flight, at the same time that
pilots are busy carrying out normal procedures per the FOM.
Such events arise from operating within the larger context of
busy airports, changing weather patterns, and traffic-ridden air
space, in association with scores of other professional groups
(e.g., dispatchers, ground services, air traffic controllers) and
under the heavy influence of the operational culture of the air-
line and the economic constraints of the industry. Their pres-
ence “perturbs” the nominal, anticipated sequence of normal
activities prescribed by procedures, and creates a dynamic,
multidimensional (in terms of the many, often-conflicting priori-
ties) environment, in which pilots are called to make on-the-spot
decisions about interleaving, postponing and resuming, and
combining the accomplishment of various activities.
Perturbations do not qualify as “emergencies” because they
do not require pilots to perform radically different steps from
those prescribed by the normal procedures for which they have
been trained. Instead, pilots must execute those procedures
with small variations in the sequence of some activities, a cer-
tain degree of resourcefulness in the timing of other activities,
and, on occasion, simultaneously with other activities. Here,
then, is a true picture of the real operational world. Activities
are not linear, but dynamic. The checklist is typically called at
the conclusion of the preceding flow, but if ATC has issued new
instructions, the crew must first deal with them before the cap-
tain can call for the checklist. Activities are semi-predictable;
that is, many perturbations are unpredictable (such as a prob-
lem encountered by another aircraft that delays all movements
on the taxiway), and some can be anticipated though their spe-
cific timing is unpredictable (such as when a takeoff clearance
is issued much sooner than expected). Activities, finally, are
semi-controllable. The pilots may want to call for a clearance, but
the radio frequency may be congested with exchanges between
the controller and other crews, so the pilots have to wait before
they can make their call.
The question then becomes whether it matters that the real
world is drastically different from the theoretical world implied
by the FOM, for which procedures are written. Does it make
any difference for the design of training? More importantly, does
it make any difference for how the research that guides the
design of training is structured?
The short answer is that it does matter. According to the Pro-
cedural Reinstatement Principle (Healy et al., 2005), for train-
ing to be effective, it must reinstate the conditions in which the
skills learned will actually be put to use. The long answer is
that it matters a lot, because these features of the environment
that should be reinstated pose a safety risk given current pro-
cedures. This risk is evidenced in the way pilots’ responses to
operational perturbations interact with cognitive processes.

Multitasking
The issue with the real operational environment is not just one
of workload: perturbations do not simply increase its volume
(number of tasks), they increase its complexity. It’s not that
pilots are unable to cope with this complexity. On the contrary,
they are so used to perturbations (and so capable of incorporat-
ing them into their routine), that they consider them “business
as usual.” Recall, however, the initial discussion of the puzzling
occurrences of expert pilots forgetting to accomplish critical
tasks explicitly prescribed by procedures they had been trained
to perform, and had considerable experience performing in the
real world. These occurrences can be understood by examining
the link between the complexity of the real operational world
and inadvertent omissions (for an extensive discussion, see Lou-
kopoulos et al., 2009).
Perturbations give rise to interruptions and distractions, the
need for tasks to be executed in other than the normal, prac-
ticed sequence, unanticipated new tasks, the need to interleave
multiple tasks, and, of course, any combination of the above! In
doing so, they force pilots to engage in “multitasking”—a situa-
tion of doing several things at the same time. Multitasking with-
out negative effects on performance is not a human strength
because individuals are only able to focus their attention on
one thing at a time (Basak & Verhaeghen, 2011). Multitasking
therefore involves task switching, rather than parallel execution
of tasks. In turn, this means that the degree to which it can
be done successfully depends on many factors, such as the
number, nature, prior experience, and extent of practice of the
particular combination of tasks. When it involves familiar, well-
practiced activities that can be executed with little conscious
attention, multitasking is possible with a low risk for errors, as
can be attested by the pilot who taxis the aircraft and still moni-
tors the radio frequency for calls addressed to him. Throw in
a third, perhaps less familiar, more “involved,” or unexpected
activity such as having to make mental calculations and input
data into the computer, and the risk of errors increases because
task-switching has costs (Rubinstein, Meyer, & Evans, 2001). In
cases such as those encountered in real flight operations, where
multitasking typically involves a mixture of habitual and novel
activities, it proves to be particularly risky because it creates
situations that push the capacity of human memory and atten-
tion to the limits, rendering any pilot vulnerable to errors.
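The task-switching account can be made concrete with a toy calculation. The task times and switch cost below are arbitrary values chosen for illustration, not empirical estimates:

```python
def total_time(task_sequence, task_time, switch_cost):
    """Total completion time for a sequence of task labels: each task takes
    task_time, plus switch_cost whenever the label differs from the one
    before it (multitasking is switching, not parallel execution)."""
    time = 0.0
    prev = None
    for task in task_sequence:
        time += task_time
        if prev is not None and task != prev:
            time += switch_cost  # cost paid at every switch between tasks
        prev = task
    return time

# Blocked: finish all of task A, then all of task B -> a single switch.
blocked = total_time(["A"] * 4 + ["B"] * 4, task_time=1.0, switch_cost=0.5)
# Interleaved "multitasking": A B A B ... -> a switch at every step.
interleaved = total_time(["A", "B"] * 4, task_time=1.0, switch_cost=0.5)
assert blocked < interleaved  # interleaving pays the switch cost repeatedly
```

With these illustrative numbers, the blocked sequence costs 8.5 units and the interleaved sequence 11.5: same work, but the interleaved schedule pays the switch cost seven times instead of once.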
It is now possible to appreciate that behind each reported fail-
ure lie none other than perturbations of the routine flight opera-
tional environment; that is, events that gave rise to the need for
multitasking. A flight crew reported aborting the takeoff after
the configuration warning horn sounded to indicate that flaps
were not in their proper position. The reason? The crew, fol-
lowing instructions from the ground controller, had rushed to
vacate the gate area for another aircraft that had just arrived
and was assigned to the same gate. In rushing to release the
parking brake and get moving, the crew inadvertently forgot
to first set the flaps. The perturbing event (the other aircraft)
and the sense of time pressure it generated effectively masked
the habitual trigger for checking the flaps before releasing the
parking brake. A few moments later, when the aircraft was mov-
ing, the implicit assumption was generated that all preceding
activities had been completed, as usual. Operators everywhere
are prone to respond to interruptions that arise from various
operational demands, oftentimes without realizing the potential
risks involved.
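The masked-trigger account of such omissions can be sketched as a toy simulation. The step names, trigger labels, and masking rule here are our own illustrative assumptions, not an actual cockpit model:

```python
# Toy simulation: each habitual action fires only when its salient trigger
# is noticed; a perturbation that masks a trigger silently drops the action.
PRE_TAXI = [
    ("ready to taxi", "set flaps"),
    ("flaps set", "release parking brake"),
    ("brake released", "begin taxi"),
]

def perform(steps, masked_triggers):
    done = []
    for trigger, action in steps:
        if trigger in masked_triggers:
            continue  # trigger masked (e.g., by time pressure): action never elicited
        # Later habitual actions still run, implying (falsely) that all
        # preceding actions were completed, so the omission goes unnoticed.
        done.append(action)
    return done

normal = perform(PRE_TAXI, masked_triggers=set())
rushed = perform(PRE_TAXI, masked_triggers={"ready to taxi"})
assert "set flaps" in normal
assert "set flaps" not in rushed and "begin taxi" in rushed
```

The key feature of the sketch is that nothing downstream signals the omission: once the aircraft is moving, the remaining habitual steps run as usual.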
What about the crew that inadvertently struck a ground truck
after failing to obtain the expected signal from the truck driver?
The pilots were busy discussing the taxi instructions. Becoming
engaged and preoccupied by important and salient operational
issues can sidetrack performance, and allow the brain to jump
forward in the sequence of habitual activities and make assump-
tions that certain activities have already happened (on occasion,
having the strong, though wrong, feeling that they have actu-
ally taken place). False memories are a particularly well-known
and studied phenomenon in the retrieval of declarative informa-
tion (e.g., eyewitness testimonies) and one of the mechanisms
giving rise to such confusion (i.e., source confusion, Johnson,
Hashtroudi, & Lindsay, 1993) may well also be behind pilots’
mistaken assumption that an action has already been accom-
plished. However, false memory for procedural knowledge has
rarely been studied in the laboratory, and remains a challenge
to the research community.
How about the first officer who failed to adequately monitor the
captain, because she was busy reprogramming the computer in
response to just-issued changes, thus not paying enough atten-
tion to prevent the captain from inadvertently taxiing past an
intended turn? A change in runways issued while the aircraft
is already taxiing is a fairly common event. The first officer was
called to multitask, and was set up in a particularly challenging
situation combining an attention-absorbing activity (program-
ming) entailing physical direction of the body/head/eyes away
from the scene outside, with the notoriously nonsalient activity
of monitoring (especially another person’s actions). Monitoring,
even by itself, requires deliberate effort and, when combined
with another task, falls easily out of the scope of attention. Mon-
itoring is at high risk of being “dropped” in the face of perturba-
tions, both because of its nature (it has few salient cues) and
because it is often not considered a task in itself, but rather a
background activity.
The sources of reported and observed perturbations are truly
variable, and so are their effects. No task seems immune to the risk
of being omitted, regardless of its criticality. Perturbations are
clearly related to errors, and their effects in rendering the real
operational environment nonlinear, only semi-predictable, and
only semi-controllable must be factored into training to make it
effective.

Other Domains
Omissions and other types of human performance errors are
not exclusive to the cockpit. Multitasking and operational
perturbations are features of many professions such as air traffic
control (Loukopoulos, 2010), train dispatching (Department
of Transportation/Federal Rail Administration [USDOT/FRA],
2001), nursing (Kalisch & Aebersold, 2010), even working in
space (Burgess, 2000). A number of studies discuss operator
errors, including omissions, in a wide range of activities including
aircraft maintenance (Hobbs & Williamson, 2003), nuclear power
plants (Alder & Hausmann, 1995; Hirotsu, Suzuki, Kojima,
& Takano, 2001), the pipeline industry (American Petroleum
Institute [API], 2005), administering medication (McBride-Henry
& Foureur, 2006), and identifying the correct site for surgery
(Kwaan, Studdert, Zinner, & Gawande, 2006). Naturally, there
are large differences in the degree to which activities in each
of these work settings are dictated by precise procedures, the
manner with which such procedures are trained and carried
out, and the types and effectiveness of safeguards in place to
guard against unintended deviations from such procedures. The
brief descriptions below illustrate how the dynamic, multitasking
features of these real operational environments, as in aviation,
present a risk to human performance and therefore need to be
studied and reinstated in training.

Case 1: Air Traffic Control


When the crew of the first aircraft contacted the controller, he
was also providing traffic control services to two other aircraft
in the area, on another radio frequency. To handle the two fre-
quencies, he had to switch his physical position and his atten-
tion back and forth between two workstations. Until that time,
the operational demands were routine and benign, because
covering for multiple positions had become common practice.
Everything changed when the controller picked up the telephone
to inform the airport of the arriving aircraft and found it was
not working because of technical problems with the bypass tele-
phone system (the main phone line was down for maintenance
work). A multitasking situation emerged. Preoccupied with try-
ing to contact the airport, distracted by the presence of techni-
cians in the room, and burdened by having to visually scan two
radar screens projecting images at significantly different scales,
he inadvertently “dropped” the act of monitoring the aircraft,
therefore failing to notice that two of them were actually on a
collision course. Sadly, on that day the last thread of hope, a
ground-based collision warning system, also happened to be
switched off for maintenance. Simply following standard air
traffic control procedures was not sufficient to prevent the con-
troller from making a fatal mistake, and the two aircraft collided
in midair killing all aboard (Bundesstelle für Flugunfallunter-
suchung [BFU], 2004).

Case 2: Medication Administration


An order for an IV bag to contain, among other things, sodium
chloride was put through to the pharmacy. The technician
started preparing the bag but, just as she was about to add sodium
chloride from a multidose vial from which some amount had already
been removed, was asked to assist another technician.
She responded to the unexpected interruption by stepping away
from her primary task of preparing the solution for the IV bag.
When she returned, she saw the partially empty sodium chlo-
ride vial next to the bag she had been preparing before being
interrupted and assumed she had already completed the task of
adding the substance to the solution. The IV was administered
to the patient whose condition, after 5 hours of infusion, indi-
cated severe hyponatremia because of the lack of sodium in the
infused IV solution. The situation was immediately corrected
but the patient’s hospitalization was prolonged because of the
error. “Triggers,” such as those intuitively generated by having
prepared countless IV bags in the past, are reliable under nor-
mal circumstances. The sight of an open and (partially) empty
vial normally implies that the substance has already been used
in preparing the desired solution and triggers the technician to
proceed with the administration of the substance. However, in
the case of unexpected interruptions and other types of perturb-
ing events that may occur in parallel, the same “trigger” can be
misleading (U.S. Pharmacopeia [USP], 2003).

Case 3: Train Control


A coal train with 116 cars was traveling early in the morning east-
ward across Northern Texas when it received an “after-arrival”
warrant, an instruction to wait after arriving at a designated
point coming up ahead for the arrival of another, westbound
train, before proceeding. At the time the warrant was issued, the
coal train was 3 miles from the waiting point and was traveling
at about 48 mph. Just 5 seconds after confirming receipt of the
warrant, the train engineer placed a call on his cell phone. He
made no apparent preparations to stop the train. Four minutes
later, as the train passed the waiting point, he was still on the
phone. The telephone call lasted a total of 9 minutes. Four min-
utes after hanging up, the trains collided head-on, 8 miles past
the waiting point, resulting in fatal injuries (to the engineer of
the other train), critical injuries (to the engineer and conductor
of the coal train), and damages that exceeded $8 million. The
engineer was experienced and qualified on the territory, and
all communications had been clear. Yet, he somehow failed to
follow the straightforward instruction to wait for a train mov-
ing along the same track, in his direction. The “holding” pro-
cedure was undoubtedly simple, clear, and not uncommon, but
the engineer either never consciously processed the information
and what he needed to do in response, or forgot to follow through
with his intention to heed the warrant, being distracted by his
own cellphone conversation (NTSB, 2003).

Case 4: Operating Room


A woman with a trigger-finger condition was the last of three
patients scheduled for hand surgery that day. The first surgery
was a carpal-tunnel release. It went well, but as the dressing
was being applied, the patient became upset and the surgeon
had to help console her. A little later, before the second sur-
gery, the surgeon was asked to translate during the preoperative
preparation of the third patient, since no interpreter was avail-
able. According to hospital protocol, the correct arm had been
marked at the wrist by the nurse, though the planned incision
site on the hand was not. The surgeon went through his usual
pre-procedure routine with the patient, verifying the symptoms,
the examination findings, and confirming a persistent trig-
ger finger of the left ring finger. Next he performed a carpal-
tunnel release on the second patient. Several other surgeons
were behind schedule so the third patient was moved to another
operating room, and this also resulted in a change in person-
nel; specifically, the nurse who had performed the pre-operative
assessment would not be in the room during the procedure.
During the room change, the surgeon went to conduct another
consultation. When he returned, he was sent to console the first
patient who had again become agitated in the recovery area.
Although he was able to help put her at ease, the encounter was
very emotional. When he finally entered the operating room for
the last surgery, preparations were under way. He noticed that
there was no tourniquet so the circulating nurse had to leave
the room to get one. This distracted her from the patient and
made her fall behind on her documentation. While the patient’s
arm was being washed according to hospital protocol, the alco-
hol wiped the marking off the limb. The surgeon spoke with
the patient in Spanish, which the nurse mistook for a Time-Out
(standardized procedure of coordination among operating team
members prior to surgery). As a consequence, no formal Time-Out
took place before the procedure began. The surgeon proceeded
to perform a carpal-tunnel release on the patient, rather
than a trigger-finger release. About 15 minutes later, while in
his office dictating the report of the operation, he realized that
he had performed the wrong procedure. Small deviations from
standard procedures, combined with unexpected perturbations
to the normal sequence of events due to operational demands,
left nothing to protect against a healthy dose of human nature
(emotional stress, habit capture) (Ring, Herndon, & Meyer, 2010).

Case 5: Aircraft Maintenance


The two technicians working the overnight shift had two differ-
ent, time-consuming tasks to accomplish on an aircraft: replace
the nosewheel spin pads and the toilet dump valve. To complete
everything within the allotted time, the two technicians split the
tasks. It was 3:30 in the morning when one technician started
working on the aircraft, following the procedure for a nosewheel
spin pad replacement. He was under time pressure to complete
the task so he could assist a colleague working on the same
aircraft and another engineer working on a different aircraft.
To ensure adequate illumination of the nosewheel area, he used
a flashlight which he balanced on the nosewheel strut. While
working on the spin pads, he knocked over some tools but rather
than interrupt what he was doing, he decided to pick them up
after completing the replacement. Later, while picking up the
tools he was momentarily distracted by the workshop headlights
and forgot his (implicit) intention to retrieve the flashlight from
the nosewheel area before departing. Much later, while taxiing
to the departure runway, the flight crew found the nosewheel
uncontrollable. The flashlight remained stuck in the nosewheel
assembly (Confidential Human Factors Incident Reporting Pro-
gram—Maintenance Error Management System [CHIRP-MEMS]
report, n.d.).

Training for Multitasking


So what are the implications for training? Are operators (pilots,
controllers, technicians, nurses, surgeons) currently being
trained for multitasking situations and, if so, how and how well
are they being trained? How does one train for such “organized
chaos”? Should training continue to be based on procedures?
Do procedures adequately capture the dynamic and unpredict-
able nature of the real world? What aspects of the real world
should they capture?
Not all the answers are known, nor can they all be treated
within the scope of this chapter. Importantly, it is not being
advocated that training should enable individuals to multitask
indiscriminately. For example, drivers should not be taught to
talk/text on a cell phone while driving (training should, however,
recognize that people will have the tendency to do it and should
provide guidance against it). But some answers can be offered
to some of these questions, while raising the awareness of the
research community and generating interest in these important
issues, so that ultimately all questions can be addressed and
training can be made truly effective.
Human performance calls upon cognitive skills that have
known strengths and limitations. But the human brain cannot
be rewired. Nevertheless, man-made systems could be designed
to fit with the existing wiring. Attention, as mentioned earlier,
can only be focused on one thing at a time. Thus, multitask-
ing always involves task switching, with all the costs it entails.
Although not much can be done to change the operational envi-
ronment and the demands it creates, or to magically render the
human brain able to manage multiple complex tasks simulta-
neously, a much better job can be done than is currently being
done in understanding the true complexity of operations and
in minimizing the opportunity for perturbations to occur. The
design of tasks, procedures, and technologies can be improved,
to take into account the dynamic nature of complex operations,
and training can also be better designed so that it prepares
operators to manage the real world.
One potential solution is to introduce more deliberate prac-
tice (e.g., chapter 2) of handling perturbations and multitasking,
for example with part-task training methods that focus on dual
task combinations, and which gradually increase the training
context demands or alter the priorities of the tasks (for a more
detailed discussion of such training methods, see chapter 4).
Practice with concurrent tasks builds automaticity and reduces
the need for focused attention on their execution and, more
importantly, allows for the emergence of time-sharing skills
that are crucial for effective handling of operational demands.
Gradually increasing time pressure during training can help
teach trainees optimal deployment and resource investment
strategies (see chapter 4), skills they can rely on in real-life situ-
ations. Altering the priorities, or even teaching effective visual
scan patterns such as those used by experts, can help teach
strategies for attention allocation and monitoring, mnemonic
strategies, methods for selecting salient “cues” and setting arti-
ficial “triggers,” and techniques for crew cross-checking that
are specifically sensitive to threats arising from perturbations
and multitasking hazards. Relevant research on driving sug-
gests the advantages of a naturalistic environment for training
drivers, including similar training methods (Gamache, Hudon,
Teasdale, & Simoneau, 2010).
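The idea of gradually increasing the training context demands can be operationalized as a simple ramp in the probability of injecting a perturbation into each simulated training trial. This is a sketch; the function, the rates, and the linear ramp are our own illustrative choices, not a prescription from the training literature:

```python
def perturbation_schedule(n_trials, start_rate=0.0, end_rate=0.5):
    """Linearly ramp the per-trial probability of injecting a perturbation
    (an interruption, a reordered task, an unexpected new task) so that
    training demands grow as habitual performance becomes automatic."""
    if n_trials < 2:
        return [start_rate] * n_trials
    step = (end_rate - start_rate) / (n_trials - 1)
    return [start_rate + i * step for i in range(n_trials)]

rates = perturbation_schedule(10)
assert rates[0] == 0.0 and abs(rates[-1] - 0.5) < 1e-9
assert all(b >= a for a, b in zip(rates, rates[1:]))  # demands only ever increase
```

Early trials let trainees consolidate the unperturbed procedure; later trials increasingly resemble the dynamic, semi-predictable operational environment.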
People are amazingly adaptable. The basic assumption behind
current training approaches is that people can learn the “ideal,”
by-the-book, set of procedures which form the foundation for the
unperturbed operation, and then adapt to the dynamic nature of
the real world. The very high safety record of airline operations
stands as evidence that the assumption is usually warranted.
However, if the expectation for ever higher levels of safety and
reliability from pilots, surgeons, and other professionals contin-
ues, they must be provided with improved training.
So here are some answers, at least from the operational world
of aviation. No, pilots are not being trained to multitask, and
humans can’t truly learn to pay close attention to two things
at once. And no, current procedures do not adequately capture
the dynamic and unpredictable nature of the real world. But
yes, training should continue to use procedures as its starting
point. These procedures should, however, be designed with the
real world in mind, and they should only be used as the start-
ing point. Beyond that point, training should include realistic
perturbations, based on real-time studies of routine operations
to identify specific factors that give rise to multitasking situ-
ations. Also, given that real-life conditions are quite variable,
variability should also be included in the training context—
something which in principle is beneficial for training anyway
(e.g., Variability of Practice Principle, chapter 2). Special atten-
tion should be given to specific activities that are particularly
vulnerable in the face of multitasking, taking into account the
powerful nature of triggers and the strong association of activi-
ties practiced extensively together. Training should pay special
attention to monitoring because of its critical nature and its vul-
nerabilities. Finally, training should include specific strategies
to respond to these perturbations and to monitor well. Which
strategies could be most effective and under which conditions is
a fertile and critical area for future research. This chapter has
pointed a direction in which research on training could move, so
that more questions can be answered and training programs can
be better designed to support operators and protect them from
frustrating, if not catastrophic, mistakes.

References
Alder, H. P., & Hausmann, W. (1995). Analysis of human factors in inci-
dents reported by Swiss nuclear power plants to the Inspectorate.
In International Atomic Energy Agency (IAEA), Organizational factors
influencing human performance in nuclear power plants (pp.
115–119; IAEA-TECDOC-943). Vienna, Austria: International Atomic
Energy Agency.
American Petroleum Institute (API). (2005). Facilities piping and equip-
ment: Focus on items involved and causes of incidents (PPTS Opera-
tor Advisory 2005-4). Retrieved from http://committees.api.org/
pipeline/ppts/docs/advisories/20054advisoryFacilitiessubset.pdf
Barshi, I., & Healy, A. F. (1993). Checklist procedures and the cost of
automaticity. Memory & Cognition, 21, 496–505.
Basak, C., & Verhaeghen, P. (2011). Three layers of working memory:
Focus-switch costs and retrieval dynamics as revealed by the
N-count task. Journal of Cognitive Psychology, 23, 204–219.
Bundesstelle für Flugunfalluntersuchung (BFU). (2004). Accident near
Überlingen/Lake of Constance/Germany, 1 July 2002 (Investigation
Report AX001-1-2/02). Retrieved from http://ocw.mit.edu/courses/
aeronautics-and-astronautics/16-358j-system-safety-spring-2005/
assignments/ueberlingen.pdf
Burgess, P. W. (2000). Real-world multitasking from a cognitive neuro-
science perspective. In S. Monsell & J. Driver (Eds.), Control of cogni-
tive processes (pp. 465–472). Cambridge, MA: MIT Press.
Confidential Human Factors Incident Reporting Program—Mainte-
nance Error Management System (CHIRP-MEMS). (n.d.). Learning
from experience: Report 2—Torch left in nosewheel steering cable
run. Retrieved from http://www.chirp-mems.co.uk/experiance.asp
Comisión de Investigación de Accidentes e Incidentes de Aviación Civil
(CIAIAC). (2008). Preliminary report A-32/2008. Retrieved from
http://www.fomento.es/NR/rdonlyres/C58972BC-B96C-4E14-
B047-71B89DD0173E/76728/PreliminaryReportA_032_2009.pdf
Dismukes, R. K., Berman, B., & Loukopoulos, L. D. (2007). The limits of
expertise: Rethinking pilot error and the causes of airline accidents.
Aldershot, England: Ashgate.
Gamache, P.-L., Hudon, C., Teasdale, N., & Simoneau, M. (2010). Alter-
native avenues in the assessment of driving capacities in older driv-
ers and implications for training. Current Directions in Psychological
Science, 19, 370–374.
Healy, A. F., Wohldmann, E. L., & Bourne, L. E., Jr. (2005). The pro-
cedural reinstatement principle: Studies on training, retention, and
transfer. In A. F. Healy (Ed.), Experimental cognitive psychology and
its applications (pp. 59–71). Washington, DC: American Psychologi-
cal Association.
Hirotsu, Y., Suzuki, K., Kojima, M., & Takano, K. (2001). Multivari-
ate analysis of human error incidents occurring at nuclear power
plants: Several occurrence patterns of observed human errors. Cog-
nition, Technology, and Work, 3, 82–91.
Hobbs, A. N., & Williamson, A. (2003). Associations between errors
and contributing factors in aircraft maintenance. Human Factors,
45, 186–201.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source moni-
toring. Psychological Bulletin, 114, 3–28.
Kalisch, B. J., & Aebersold, M. (2010). Interruptions and multitask-
ing in nursing care. The Joint Commission Journal on Quality and
Patient Safety, 36(3), 126–132.
Kwaan, M. R., Studdert, D. M., Zinner, M. J., & Gawande, A. A. (2006).
Incidence, patterns, and prevention of wrong-site surgery. Archives
of Surgery, 141, 353–358.
Loukopoulos, L. D. (2010). Air traffic controllers do it too! Hindsight: A
Eurocontrol Journal, 10, 48–51.
Loukopoulos, L. D., Dismukes, R. K., & Barshi, I. (2009). The multi-
tasking myth: Handling complexity in real-world operations. Alder-
shot, England: Ashgate.
McBride-Henry, K., & Foureur, M. (2006). Medication administration
errors: Understanding the issues. Australian Journal of Advanced
Nursing, 23(3), 33–41.
National Aeronautics and Space Administration, Aviation Safety
Reporting System. (2005). Retrieved November 22, 2010, from http://
asrs.arc.nasa.gov/search/database.html
National Transportation Safety Board (NTSB). (1988). Northwest Air-
lines, Inc., McDonnell Douglas DC-9-82, N312RC, Detroit Met-
ropolitan Wayne County Airport, Romulus, Michigan, August 16,
1987 (Report no. PB88-910406, NTSB/AAR-88-05). Retrieved from
http://libraryonline.erau.edu/online-full-text/ntsb/aircraft-acci-
dent-reports/AAR88-05.pdf
National Transportation Safety Board (NTSB). (1989). Delta Airlines,
Inc., Boeing 727-232, N473DA, Dallas-Fort Worth International
Airport, Texas, August 31, 1988 (Report no. PB89-910406, NTSB/
AAR-89-04). Retrieved from http://libraryonline.erau.edu/online-
full-text/ntsb/aircraft-accident-reports/AAR89-04.pdf
National Transportation Safety Board (NTSB). (2003). Collision of two
Burlington Northern Santa Fe freight trains near Clarendon, Texas,
May 28, 2002 (Report No. PB2003-916301, NTSB/RAR-03/01).
Retrieved from http://www.ntsb.gov/publictn/2003/RAR0301.pdf
Ring, D. C., Herndon, J. H., & Meyer, G. S. (2010). Case 34-2010—A
65-year-old woman with an incorrect operation on the left hand.
New England Journal of Medicine, 363, 1950–1957. Retrieved from
http://www.nejm.org/doi/full/10.1056/NEJMcpc1007085#t=article
Rubinstein, J. S., Meyer, D. E., & Evans, J. E. (2001). Executive con-
trol of cognitive processes in task switching. Journal of Experimental
Psychology: Human Perception and Performance, 27(4), 763–797.
U.S. Pharmacopeia (USP). (2003, September). Distractions contribute
to medication errors. USP Patient Safety CAPSLink. Retrieved from
http://www.usp.org/pdf/EN/patientSafety/capsLink2003-09-01.
pdf
U.S. Department of Transportation, Federal Railroad Administration (DOT/FRA/
ORD-01/02). (2001). Understanding how train dispatchers manage
and control trains: Results of a cognitive task analysis. Retrieved
from http://www.fra.dot.gov/downloads/Research/ord0102.pdf
15 Cognitive Retraining
Following Acquired Brain
Injury
Keith R. Lohse and Lyle E. Bourne, Jr.
University of Colorado

This volume explores the training of cognitive skills in order to
optimize their efficiency, durability, and generalizability. Most of
the data presented in the various chapters are based on studies
using healthy or unimpaired subject populations. In this chap-
ter, the focus is shifted from the training of cognitive skills in
healthy subjects to the “retraining” or rehabilitation of cognitive
skills in patient populations, specifically patients suffering from
acquired brain injuries (ABI). Acquired brain injury is a larger
category of brain injuries that includes traumatic brain injury
(TBI) and nontraumatic brain damage such as ischemic and
hemorrhagic stroke, but excludes developmental or congenital
neurological disorders.
In the United States, and indeed much of the world, TBI is
a common cause of death and disability (Adekoya, Thurman,
White, & Webb, 2002; Langlois, Rutland-Brown, & Wald, 2006;
Rutland-Brown, Langlois, Thomas, & Xi, 2006). TBI is a form
of acquired brain injury that results from a sudden trauma or
insult to the brain as a result of external forces (e.g., piercing,
shearing, or blunt forces). Common causes of TBI include motor
vehicle accidents and transportation related crashes (e.g., bicy-
cle–car or car–pedestrian crashes), falls (especially common in
older adults and young children), sport related activities, and
accidents related to military service (a growing concern; Tan-
ielian & Jaycox, 2008; Warden, 2006). There are many different
types of TBI (e.g., concussion, diffuse axonal injury, open head
trauma), but in general TBI leads to direct damage to brain tissue
and further damaging complications which can arise following
the initial injury (Nortje & Menon, 2004). For instance, as intra-
cranial pressure increases or mean arterial pressure decreases
as the result of bleeding, the brain will become ischemic and
acute or permanent damage may result depending on the length
of the ischemia (Joshi, Ornstein, & Young, 2001). Ischemia and
edema can trigger the release of cytokines, neurotransmitters,
and free radicals further increasing intracranial pressure and
making the effects of TBI more severe; damage to brain tissue
following the initial injury is collectively referred to as a second-
ary brain insult (Armstrong et al., 2003; Nortje & Menon, 2004).
The severity and extent of damage from TBI can be mea-
sured structurally, using anatomical imaging methods such
as magnetic resonance imaging (MRI), or functionally,
using behavioral scales such as the internationally recognized
Glasgow Coma Scale (Teasdale & Jennett, 1974). The Glasgow
Coma Scale is a standard noninvasive tool for the rapid assess-
ment of TBI that rates overall responsiveness on a 15-point scale
with three subscales: eye opening (4 points: eyes open spon-
taneously; eyes open with a command/shouting; eyes open in
response to pain; no eye opening); verbal responses (5 points:
oriented; disoriented but able to answer questions; inappropriate
answers to questions but words are discernable; incomprehen-
sible speech; no verbalization); and the best motor response (6
points: can follow commands; purposeful movement in response
to pain; withdrawal from pain; responds to pain with abnormal
flexion; responds to pain with abnormal extension; no move-
ment). A Glasgow Coma Scale score of 13 to 15 is considered
mild TBI; 9 to 12 is moderate TBI; and 3 to 8 is severe TBI.
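The scoring rubric just described is simple enough to sketch in code. The following minimal Python sketch (the function names and input validation are our own illustration, not part of any clinical instrument) sums the three subscale scores and maps the total onto the severity bands given above:

```python
# Illustrative sketch of Glasgow Coma Scale scoring as described in the
# text. Subscale maxima (eye = 4, verbal = 5, motor = 6) and the
# severity bands (13-15 mild, 9-12 moderate, 3-8 severe) follow the
# description above; this is not a clinical tool.

def gcs_total(eye: int, verbal: int, motor: int) -> int:
    """Sum the three subscale scores (valid total range: 3-15)."""
    if not (1 <= eye <= 4 and 1 <= verbal <= 5 and 1 <= motor <= 6):
        raise ValueError("subscale score out of range")
    return eye + verbal + motor

def gcs_severity(total: int) -> str:
    """Map a total score onto the severity bands given in the text."""
    if 13 <= total <= 15:
        return "mild"
    if 9 <= total <= 12:
        return "moderate"
    if 3 <= total <= 8:
        return "severe"
    raise ValueError("GCS total must be between 3 and 15")
```

For example, a fully responsive patient (4 + 5 + 6 = 15) falls in the mild band, whereas a total of 3, the scale minimum, indicates severe TBI.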
Although acquired brain injuries (TBI and stroke) are talked
about collectively because of their similar etiologies, the exact
cause, area of damage, type of damage, and resulting type of def-
icit can differ widely across cases. For instance, neurobiological/
cognitive deficits as a result of TBI can be acute, as in the case
of concussion (de Kruijk, Leffers, Meerhoff, Rutten, & Twijnstra,
2002; Ryan & Warden, 2003), but if these deficits persist in the
long term, it becomes necessary to start an intensive rehabilita-
tion program. As recommended by Parikh, Koch, and Narayan
(2007), optimal rehabilitation for acquired brain injury needs
to address motoric deficits through physical, occupational, and
speech therapy, as well as cognitive deficits through strategy-
or skill-building activities; and counseling should be provided
to meet the patient’s social and emotional needs. (Beyond the
needs of the patient, brain-injury support groups also exist to
provide assistance to the families of TBI patients.)
In this chapter, the focus will be on the rehabilitation of defi-
cits through cognitive skill-building activities following acquired
brain injury. Previous reviews of cognitive rehabilitation have
found that deficit-specialized rehabilitation programs do lead to
significant behavioral improvements (Cappa et al., 2003; Cice-
rone, Dahlberg, Kalmar et al., 2000; Cicerone, Dahlberg, Malec
et al., 2005). The various studies reviewed by Cappa et al. and
Cicerone et al. have strong theoretical bases in how to treat spe-
cific deficits (i.e., theories of attention based on basic research
formed the basis for treatment programs to correct attentional
deficits). However, the structure of the rehabilitation programs
in these studies was not always equally grounded in empirical
research on training. Therefore, this chapter will look at reha-
bilitation programs spanning a wide range of cognitive deficits
and compare the methods used in different rehabilitation pro-
grams with known training principles (e.g., chapters 2 and 3 of
this volume).
In many instances there is congruence between the principles
of “normal” training (i.e., training principles that have been vali-
dated on samples of unimpaired participants) and the methods
used in rehabilitation (retraining). There are, however, also sev-
eral areas where rehabilitation methods might be inconsistent
with normal training principles, or where alternative principles
have not been tested. In reviewing and comparing skill acqui-
sition research with rehabilitation research, the goals of this
chapter are to expose open research questions that need to be
resolved and to suggest future research that might optimize
rehabilitation methods.
Previous reviews of the literature on rehabilitation have been
based on major types of cognitive deficits that result from ABI
and the methods used to treat these specific impairments (Cappa
et al., 2003; Cicerone, Dahlberg, Kalmar et al., 2000; Cicerone,
Dahlberg, Malec, et al., 2005). This chapter will follow a similar
structure, analyzing rehabilitation methods for deficits in atten-
tion, memory, executive functions and problem solving, visual
spatial processing, and language and communication deficits.
After reviewing rehabilitation methods for these specific deficits,
the chapter will conclude with an integration of rehabilitation
methods with established training principles and suggestions
for future areas of research.

Correcting Attentional Deficits


Attentional deficits following TBI can manifest as deficits in
selective/focused attention, divided attention, or the shifting
of attention. These deficits have been shown to respond to training
based on principles of part-task training, deliberate practice,
and depth of processing. For example, Sohlberg, McLaughlin,
Pavese, Heidrich, and Posner (2000) tested the effectiveness of
a retraining intervention called attention process training. This
approach consists of deliberate practice on tasks that empha-
size identifiable components of attention (computer-based tasks
to train either focused attention for blocking out unwanted
stimuli, divided attention for attending to multiple streams of
information, or shifting attention for releasing-moving-engaging
attention between different stimuli). Training was organized
hierarchically in this study so that easier versions of the task
had to be completed before advancing to more difficult versions of
the same task. This kind of scaffolding (i.e., the easy-to-difficult
part training progression) is an established training method
for skill acquisition in healthy subjects (see chapters 2 and 4).
Training was given over 10 weeks, during which period atten-
tion process training led to significantly improved attention and
memory functioning, represented by both self-report measures
and neuropsychological measures, compared to a control group
that received 10 weeks of therapeutic support.
Fasotti, Kovacs, Eling, and Brouwer (2000) worked with 22
TBI patients to create cognitive strategies for overcoming atten-
tional deficits, using knowledge and skills that they already
possessed. The strategy they taught was called time pressure
management (Ylvisaker, Szekeres, Henry, Sullivan, & Wheeler,
1987). Under this system, patients were trained to increase their
awareness of attentional errors and deficits, to assess the time
requirements of the task at hand, to plan an approach before
beginning, to be alert to contingencies that require plan altera-
tion, and to refresh their plans regularly using working mem-
ory. Successful use of this strategic approach produced greater
improvement of attention and memory functioning than did an
alternative treatment of concentration training.
Niemann, Ruff, and Baser (1990) randomly assigned TBI
patients (who were at least 12 months post injury) to an atten-
tion training program or a memory training program as a con-
trol condition. In the attentional training condition, patients had
to practice visual, auditory, and divided attention between the
two modalities. Within the visual, auditory, and divided atten-
tion tasks, patients also practiced focusing attention (which
required the correct identification of targets among distrac-
tors) and alternating attention (which required shifting atten-
tion to different dimensions of the same stimulus; e.g., color
versus direction of arrows). Following training, the attentional
training group improved significantly on measures of attention
compared to the memory training group, but did not improve
on memory tests, consistent with the procedural reinstatement
principle (see chapter 2).
The apparent independence of attention and memory pro-
cesses in the Niemann et al. (1990) study might be attribut-
able to the fact that memory testing involved a 30-min retention
interval. Other types of memory, such as working memory, are
more closely related to attention and thus it is reasonable to
expect greater transfer to working memory performance from
attention training and vice versa. Westerberg et al. (2007) gave
stroke patients a computer-based 5-week rehabilitation pro-
gram focusing on working memory. Tasks in the computerized
training-set included: reproducing a light sequence in a visuo-
spatial grid; indicating numbers in reverse order from memory
(the sequence was verbally presented); identifying letter posi-
tions in a sequence (letters were heard one at a time and subjects
then had to recall the letter when prompted with the position);
finding mismatched letters (two pseudo-words presented with a
single difference); and reproducing a light sequence in a rotated
grid. Following training on these complex working memory
tests, there was significant improvement not only on the trained
measures of working memory, but also on other untrained mea-
sures of working memory and on measures of attention. This
result demonstrates that focused rehabilitation can generalize
not only to untrained measures within a construct (e.g., new
assessments of working memory) but also to related psychological
constructs (e.g., attention).

Correcting Memory Deficits


Deficits in memory following acquired brain injury can affect a
variety of distinct psychological processes. For example, tem-
poral lobe damage might create problems for encoding new
information into long-term memory, whereas damage to the pre-
frontal cortex might create problems for the short-term mainte-
nance of information in working memory. Because of the diffuse
representation of memory in the brain and the diverse nature
of memory deficits, memory problems are a common outcome of
ABI and can have a severe impact on a patient’s independence
in daily life.
Earlier research has demonstrated the effectiveness of exter-
nal cues and memory aids in assisting individuals with memory
impairments (Glisky, 1995; Wilson, 1995), but a study by Kaschel
et al. (2002) was one of the first to assess using cognitive strat-
egy training (in this case imagery training) as a rehabilitation
method. Kaschel et al. intensively trained patients to use fea-
tures of images as cues for retrieving verbal information and for
prospective remembering. Imagery trained patients were com-
pared to other patients who were rehabilitated without an imag-
ery training component. Patients receiving the imagery training
(30 sessions over the course of 10 weeks) significantly improved
on tests of delayed recall and the successful recall of relevant
everyday memories (e.g., making appointments). Decreases in
memory errors were also observed by relatives of the patients in
the imagery training condition.
Other studies have looked at the efficacy of external aids in
improving memory. For instance, in a study by Wilson, Emslie,
Quirk, and Evans (2001), half of the patients after 2 weeks of
baseline observation received a pager whereas the other half of
the subjects waited for 7 weeks before receiving pagers. Prior
to baseline observation, patients worked with the therapists to
select individualized memory cues to later be sent to the patients
via their pagers. During baseline, both groups of patients had
equivalent memory loss (~50% failure rate in achieving mem-
ory targets). After 7 weeks, patients who had received pagers
were significantly better at remembering important events and
appointments (success on 74.47% of memory targets) compared
to patients who had not yet received pagers (success on 48.18%
of memory targets).
After testing at Week 7, the groups switched and the other
half of the patients were given the pagers and were sent their
self-selected memory cues. Testing at Week 14 showed that the
group that now had pagers was doing significantly better with
memory in their daily life (success on 76.13% of memory targets).
However, patients who had the pagers originally, and had now
spent 7 weeks without pagers, were still doing significantly bet-
ter than baseline (success on 62.15% of memory targets). These
data suggest that improved memory performance in everyday
activities was not transient or dependent on having the pager,
but instead that pagers help patients learn to compensate for
their memory deficits.
Ownsworth and McFarland (1999) combined self-instructional
training with the use of diaries to rehabilitate memory. Self-
instructional training emphasized self-awareness and self-regu-
lation using the mnemonic, WSTC: W, what are you trying to do?
S, select a strategy for the task. T, try the strategy you selected.
C, check if the strategy is working. One group of patients received
only training in the use and maintenance of a diary; a second
group of patients received training in using the diary and self-
instructional training. The second group of patients consistently
made more diary entries and had fewer self-reported memory
problems in their daily life, suggesting that self-instructional
training was a positive complement to diary-use. Further, the
results of this study show cognitive rehabilitation can be suc-
cessful even many months following TBI. Although early inter-
ventions might help to maximize gains in recovery, effective
rehabilitation can significantly improve performance long after
the initial injury (Gordon, 1990).

Retraining Executive Functions


Goal Management Training, developed by Robertson (1996), is a
structured, interactive protocol based on empirical research and
theories about cognitive executive functions and the disorgani-
zation of behavior following frontal lobe lesions (Duncan, 1986).
Goal management training centers around a patient and thera-
pist working through a five-step executive process: (a) orient-
ing—patients are trained to increase self-awareness and direct
awareness toward relevant goals; (b) goal selection—patients
are trained to select and clearly define higher-level task goals
(e.g., bake a cake); (c) build subgoals—patients are trained to
explicitly identify relevant subgoals that will contribute to the
higher-level goal (e.g., preheat the oven, sort all of the wet and
dry ingredients, etc.); (d) encoding—patients work on developing
personal strategies for encoding and retaining goals and sub-
goals; and (e) monitoring—the final outcome is compared to the
initially selected goal. In the event of a mismatch between the
goal and the outcome, the entire process is repeated. Levine et
al. (2000) studied patients with TBI damage to the prefrontal
cortex using goal management training to treat disorganized
behavior; a control group was given an equal amount of time
in motor-skills training. Goal management training, but not
motor-skills training, led to significantly improved performance
of laboratory tasks that were designed to replicate tasks that
were problematic for patients in their daily lives.
Tasks such as meal preparation are a major challenge for
patients with executive function impairments because goals
need to be selected and maintained in memory over time and
subgoals need to be properly sequenced. In a study by Manly,
Hawkins, Evans, Woldt, and Robertson (2002), these abilities
were tested by having patients complete a battery of six labora-
tory tests simulating problematic, daily tasks for patients with
lesions of the prefrontal cortex (PFC). Patients were required to
(a) sort a collection of coins, (b) proofread an informational pam-
phlet, (c) sort labels in alphabetical order, (d) open and close
garage doors at different times (i.e., pressing buttons on a but-
ton-box at appointed intervals), (e) look up telephone numbers
based on names, and (f) compile individual bills for hypotheti-
cal customers. In one condition, patients completed these tasks
with the assistance of auditory cues to remind them how much
time they had spent on a single task, and in the other condition,
no auditory cues were given. When auditory cues were given,
TBI patients performed comparably to IQ- and age-matched nor-
mal control subjects; without cues, TBI patients’ performance
was significantly worse.
Anger control and management is a common problem
encountered by TBI patients who are experiencing deficits in
executive functions. It is not always clear if anger is stemming
from a release of inhibition as a result of TBI, is a psychological
expression of frustration with cognitive deficits, or some com-
bination of physiological and psychological factors. Regardless
of the reasons, increased aggression is frequently reported by
patients, relatives, and therapists of TBI patients (Jacobs, 1987;
Livingston & Brooks, 1988; Miller, 1990). Medd and Tate (2000)
used an intervention in which PFC patients were given explicit
information about the relationship of anger to TBI, including
an explanation of psychological models of anger and common
affective difficulties facing TBI patients. The focus of training
was to increase patients’ awareness of their own anger and to
provide them with methods to monitor and control their anger.
Patients were trained to recognize cognitive, physical, and emo-
tional changes that occur in their initial moments of anger. After
working to identify triggers and symptoms of anger, therapists
trained patients to develop individual strategies to control anger
responses such as self-talk, relaxation, assertiveness training,
self-distraction, time-out methods, and cognitive challenging
(i.e., questioning the anger response from a rational perspec-
tive). These cognitive coping strategies were found to signifi-
cantly reduce levels of aggression and anger responses after
only six 1-hour sessions of therapy compared to a “wait-listed”
control group (that later received an identical treatment).

Retraining Visual-Spatial Skills


Niemeier (1998) trained a group of stroke patients, who were
suffering from visual deficits on either the left or the right side of
their visual field, to improve visual search by use of a very simple
cognitive coping strategy. During the course of their regular
rehabilitation, patients completed the Mesulam Verbal Cancel-
lation Test (Mesulam, 1985) and were shown the effects of their
deficit on that test (i.e., failure to detect targets on either the
right or left side of space respective to their lesions). Then, one
group of patients was shown a line drawing of a lighthouse and
was asked to imagine “that your eyes are like the lights inside
the top, sweeping all the way to the left and right of the hori-
zon to guide the ships at sea to safety. What would happen if
the lighthouse lit only the right side of the ocean and horizon?”
(Niemeier, 1998, p. 401). Finally, patients had the opportunity to
retake the cancellation test now using the lighthouse strategy.
Performance was generally much better on the second trial.
In subsequent sessions, patients and a therapist walked
through the hospital, the therapist asking the patient to locate
items, people, or objects to the left and the right, cuing the
patient to use the lighthouse strategy when necessary. Over
time, patients began independently to use the lighthouse strat-
egy and progressively fewer prompts had to be given. Ultimately,
patients trained to use the lighthouse strategy significantly
improved their visual search abilities and showed significantly
greater improvement in overall attention compared to a control
group of stroke patients (matched on age, gender, and education
level) who were not trained to use a cognitive strategy.
Kasten, Wüst, Behrens-Baumann, and Sabel (1998) exam-
ined the efficacy of Visual Restitution Training (VRT), a proce-
dure that uses the training difficulty principle (see chapters 2
and 3), to restore optic field size in TBI patients suffering par-
tial blindness from optic nerve or postchiasmic brain damage.
In these cases, part of the patient’s visual field is normal (the
“intact area”), another area is completely blind (“the defective
area”), and between these two areas is a deficient, but not com-
pletely blind area (the “transition zone”) where stimulus detec-
tion is difficult, but not impossible. In VRT, a target stimulus is
displayed on a screen at variable latitudes and the longitude of
the stimulus changes as a function of the individual patient’s
visual field. VRT progressively moves the stimulus into the tran-
sition zone based on correct identification of the stimulus by
the subject. In this way, VRT training progressively increases
the difficulty of the identification task. But the training is
also highly individualized because the rate at which difficulty
increases is calibrated to each patient. VRT training was highly
effective, leading to 29.4% improvement in stimulus detection
for postchiasmic patients, and a 73.6% improvement for optic
nerve patients. Patients in a control group, who were trained for
an equal amount of time using an eye fixation program requir-
ing eye movements to stimuli near the foveal region, showed no
improvement.
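The adaptive logic described for VRT resembles a simple staircase procedure. The sketch below (in Python; the parameter names, step sizes, and units are invented for illustration and do not reproduce the actual VRT software) advances the stimulus deeper into the transition zone after each correct detection and retreats toward the intact area after a miss, so that task difficulty tracks each patient's current ability:

```python
# Illustrative staircase sketch of VRT's adaptive difficulty; all
# parameter names, step sizes, and units are invented for this example.

def next_position(position, detected, advance=1.0, retreat=2.0,
                  intact_edge=0.0):
    """Update stimulus eccentricity (units beyond the intact area).

    A correct detection pushes the stimulus further into the deficient
    transition zone; a miss pulls it back toward the intact area.
    """
    if detected:
        return position + advance
    return max(intact_edge, position - retreat)

def simulate_session(responses, start=0.0):
    """Trace stimulus positions across a sequence of detections/misses."""
    trace = []
    position = start
    for detected in responses:
        position = next_position(position, detected)
        trace.append(position)
    return trace
```

Under these invented step sizes, three successive detections starting at the intact edge move the stimulus 1.0, 2.0, and then 3.0 units into the transition zone, and a subsequent miss retreats it to 1.0, keeping the task difficult but not impossible for that patient.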

Retraining Language and Communication Skills


Several studies have highlighted the value of using empiri-
cally based normal training principles in cognitive rehabilita-
tion of language and communication skills in aphasic patients
(from both stroke and TBI populations; see Robey, 1994, 1998).
Although there is a large body of evidence suggesting general
effectiveness of aphasia therapy, only recently has research
begun manipulating training variables to determine what
makes treatment most effective. Elman and Bernstein-Ellis
(1999) examined a rehabilitation protocol for 24 stroke patients
who suffered from chronic aphasia. Patients were randomly
assigned to either an immediate or a deferred treatment group.
For both groups during treatment, patients received 5 hours of
group-based communication training per week. Group therapy
sessions focused on developing useful strategies for conveying
information to another person, improving patients’ understand-
ing of communicative disorders, creating explicit personal goals
for rehabilitation and evaluating progress toward these goals,
encouraging the initiation of conversation, and promoting confi-
dence in initiating conversation outside the group setting. While
patients in the immediate treatment group were engaged in
group therapy, patients in the deferred treatment group engaged
in support groups and group motor-performance rehabilita-
tion in order to control for the effects of social contact. Immedi-
ate group-based communication treatment led to significantly
improved communicative and linguistic abilities after 2 months
and 4 months of treatment (compared to the deferred treatment
control group) with no significant declines in performance in
follow-up tests administered after 4 to 6 weeks of no treatment.
Using a similar treatment protocol, Denes, Perazzolo, Piani,
and Piccione (1996) compared the efficacy of aphasia rehabili-
tation on an intensive massed training schedule (training ses-
sions daily) with a slower paced, spaced training schedule (three
training sessions per week), with the schedules matched for the
number of sessions. Massed training led to greater improvement
than spaced training across multiple measures of language flu-
ency. This result is somewhat at odds with research on skill
training in healthy subjects, which often finds an advantage of
distributed practice over massed practice, particularly when
the recall test is far removed from the training session (Cepeda,
Pashler, Vul, Wixted, & Rohrer, 2006; Schmidt & Bjork, 1992;
see chapters 2 and 13). Whether massed or spaced practice
leads to more efficient training must depend to some extent on
the nature of the knowledge and skill being acquired. There is
initial evidence from studies on the rate of practice that massed
practice improves performance relative to spaced practice in
aphasic patients (controlling for number of sessions; Basso &
Caporali, 2001; Denes et al., 1996) and several studies of reha-
bilitation following stroke have found an advantage for a more
Cognitive Retraining Following Acquired Brain Injury 317
intense rehabilitation schedule in terms of more hours per week
compared to a less intense training schedule with fewer hours
per week (when matched on number of weeks; see Hillis, 1998;
Hinckley & Craig, 1998). But these studies are limited in several ways, and more research is needed to understand the relationship of the frequency and amount of rehabilitation to short-term and long-term improvements in performance in stroke patients.
In a related study, Pulvermuller et al. (2001) compared a con-
ventional language therapy (30–35 hours over 4 weeks) with
a constraint-induced therapy (30–35 hours over 10 consecu-
tive days). Patients with different types of aphasia were given
individualized treatments for their specific deficits, but across
different treatment types the only difference between the con-
ventional therapy group and the constraint-induced therapy
group was the rehabilitation schedule. Rehabilitation tasks for
both groups included object naming, word repetition, verbal
sentence completion, practice in following verbal instructions,
and conversations on topics selected by the patient. Constraint-
induced therapy led to larger pre- to posttreatment improvements on neuropsychological measures, patients' self-ratings, and blinded therapists' ratings of patients' performance, compared to the conventional therapy group.
Obviously more research is needed to determine the condi-
tions under which spaced or massed practice is to be recom-
mended for training efficiency. Less is known about the effects
of practice distribution on the retention of rehabilitated skills.
Future research will be needed to determine whether this type
of skill is retained better after spaced training as is the case
with most acquired skills (again, see chapters 2 and 13).
Hinckley, Patterson, and Carr (2001) compared the efficacy
of a context-based rehabilitation protocol with a skill-based
rehabilitation protocol in stroke patients suffering from chronic
aphasia. Context-based rehabilitation focuses on indirectly
developing communicative skill by placing patients into person-
ally relevant contexts (such as a conversation), closely related
contexts, or role playing. In context-based rehabilitation, the
goal is to foster communication through any means possible.
In contrast, skill-based rehabilitation emphasizes the identifi-
cation of highly specific subprocesses within the overall deficit
of aphasia and these subprocesses are rehabilitated through
designed interventions. From a theoretical perspective, skill-
based training assumes that subprocesses are independent
components that can be practiced and improved in isolation from
each other and, once automated, these component skills can
be generalized to a range of different tasks (Carr & Levy, 1990;
Fisk & Eboch, 1989; also see the analysis of cognitive tasks
into their component taxons in chapter 8). Comparing pretreat-
ment to posttreatment performance for the two types of treat-
ment revealed that context-based rehabilitation led to larger,
but much more specific, improvements on tests of linguistic and communicative ability (i.e., rehabilitation involving a telephone-based conversation did not generalize to a written version of the
same task). In contrast, skill-based rehabilitation led to smaller,
but more general improvement in linguistic and communicative
ability (i.e., performance was improved even on unpracticed versions of the task).
In a follow-up study, Hinckley and Carr (2005) compared
intensive to nonintensive training in a context-based rehabili-
tation of patients with moderate to severe nonfluent aphasia.
In the intensive condition, patients completed more consecutive
hours of individual treatment and group treatment weekly. In
the nonintensive group, patients completed fewer hours of indi-
vidual treatment weekly and no group treatment. In both con-
ditions, patients were trained to complete a catalogue ordering
task plus supplemental tasks that were unique to the individual
patient. Regardless of the task being trained, patients learned
to develop the same explicit cognitive strategies as the context-
based rehabilitation group in the Hinckley et al. (2001) study.
There were no initial differences between the two groups and
both groups showed comparable improvements as measured
by immediate posttest performance. Only the intensive therapy
group showed generalization to untrained language modalities
(i.e., improvements in writing in addition to speech), however.
This finding of improved generalization, together with the finding that the mean time to reach criterion in the catalogue-ordering task was 233 minutes across both groups, suggests that an intensive training schedule might be preferable when time is a limiting factor in the rehabilitation of aphasia. Without a long-term follow-up test, however, it is not clear whether the intensive schedule would also yield better retention or whether it improved performance only in the short term.

Conclusions: Integrating Normal Training Principles with Retraining Cognitive Skills
As matters stand currently, some training principles that
have been empirically validated in normal training (as out-
lined in chapters 2, 3 and elsewhere in this volume) have been
incorporated successfully into rehabilitation programs for
acquired brain injury. But to date, others have been neglected.
Where normal training principles have been applied in rehabili-
tation, their effects are for the most part consistent with those
found in normal training. This chapter concludes with a review
of a few significant examples of the way normal training prin-
ciples operate in rehabilitation programs.
Principles of resource or effort allocation are clearly evident
in the correction of attention and memory deficits, as discussed
by Wickens, Hutchins, Carolan, and Cumming (chapter 4). In
several rehabilitation studies, induced depth of processing and
the use of imagery have been shown to improve memory both in
standardized tests and in the activities of daily life.
The context in which retraining is performed also significantly
affects training efficiency. Evidence from the remediation of
executive functions, visual-spatial deficits, and communication
disorders suggests that scaffolding the difficulty of training is a
particularly effective method of rehabilitation. Using this easy-
to-difficult progression, especially when the rate of increasing
difficulty is tailored to the individual, promotes successful reha-
bilitation across a range of acquired brain injury deficits.
Related to training difficulty is the intensity or spacing of
training events. Some studies on the rehabilitation of aphasia
suggest that massed-practice schedules might confer an advan-
tage in rapid recovery over spaced-practice schedules. The
results of research on spacing in normal training are already
complicated, with training efficiency usually better with spaced
practice (chapter 13). Retraining lost skills is different from
learning new skills and might need to follow a more intense
training schedule, especially following brain injury when there
might be a window of increased brain plasticity (Emery, Royo,
Fischer, Saatman, & McIntosh, 2003). However, one outcome seems clear in normal training: the greater the spacing between practice sessions, the better the performance on delayed retention tests, with longer spacing being more appropriate for longer delays. Whether the same result holds for rehabilitation is not
yet well established.
In rehabilitation it is important that patients receive as much
guided and deliberate practice as possible as soon as they are
capable of such practice. Not much research has been done on
guided practice in normal training, although Wickens et al.
(chapter 4) suggest that guidance can have a strongly positive
impact on training efficiency in some normal training situations.
Patients rehabilitating memory or executive function deficits are
likely to need to use these skills in their everyday activities.
For a patient with an acquired brain injury undergoing rehabili-
tation, daily life is an integrated training/testing environment
and patients need as much practice as time will allow. In this
context of greater and earlier demand, the benefits of a massed
practice schedule make more intuitive sense. Further research
in rehabilitation science is needed, however, to weigh the long-
term clinical benefits of massed- compared to spaced-practice
schedules within the inpatient and outpatient environments.
Similarly, clinical research needs to address the effectiveness
of part-task training in greater detail (the principles of part- ver-
sus whole-task training have been explored in some detail in
normal training; chapters 2 and 4). Some evidence (specifically
from the rehabilitation of communicative disorders like apha-
sia) suggests the practice of individual language components,
divorced from the larger context of communication, can lead to
smaller, but more transferable, improvements in performance.
In contrast, in situ practice of communication leads to larger,
but more specific, improvements in the activities of daily living.
Further research is needed with patients suffering from various
types of deficits before this form of part-task training can be
seen as a valid principle of rehabilitation. It would also be fruit-
ful to examine combinations of context-based and task-specific
rehabilitation methods to determine the optimal rehabilitation
context for different disorders.
Another future direction for clinical research would be to
explore the role of attentional focus in the rehabilitation of cog-
nitive skills. Considerable work has been done on the role of
attentional focus during learning and performance of motor
skills (see chapter 2; McNevin, Wulf, & Carlson, 2000), and this
work is directly translatable to physical rehabilitation methods.
The effects of attentional focus in rehabilitation for motor deficits have been studied in stroke patients (Fasoli, Trombly, Tickle-
Degnen, & Verfaellie, 2002); in Parkinson’s disease patients
(Landers, Wulf, Wallmann, & Guadagnoli, 2005; Wulf, Land-
ers, Lewthwaite, & Töllner, 2009); and in patients with mus-
culoskeletal injuries (Laufer, Rotem-Lehrer, Ronen, Khayutin,
& Rozenberg, 2007). However, the role of attentional focus has
not been explored in the rehabilitation of cognitive deficits, where
explicitly directing attention to components of a task or task
goal could potentially improve both learning and performance.
An exciting direction of research in both basic research on
training and applied research in rehabilitation is on increas-
ing a learner’s/patient’s engagement during training/rehabilita-
tion (see the cognitive antidote principle in chapter 2 for basic
research; see Burke et al., 2009; Ellul, Watkins, Ferguson,
Barer, & Rowe, 1993, for clinical research on patient engage-
ment). Patient engagement is a major predictor of continued
rehabilitation and recovery; however, the structural features of a task that increase engagement (both with and without increasing difficulty) need to be better elucidated, and the physiological mechanisms that underlie engagement (both in the laboratory and the hospital) are not well established.
In conclusion, considerable research demonstrates the effectiveness of cognitive rehabilitation of acquired brain
injury based on normal training principles. More detailed
reviews have explored the efficacy of cognitive rehabilitation for both the deficits identified in this chapter and additional
deficits, such as apraxia, which typically follow acquired brain
injury (Cappa et al., 2003; Cicerone, Dahlberg, Kalmar et al.,
2000; Cicerone, Dahlberg, Malec et al., 2005). Having estab-
lished that such cognitive approaches are effective, clinical sci-
ence should now further explore training methods to optimize
the benefits of rehabilitation. This is a momentous task because
of the many task and patient variables that need to be explored
or controlled (e.g., the schedule of training, the amount and nature of feedback, the nominal difficulty of training sessions, age, personality, extent and area of damage, and type of deficit). To
that end, this chapter has been an initial attempt to synthesize
empirical work on rehabilitation with more general skill acquisi-
tion research, and to suggest potentially beneficial directions for
future clinical research.

References
Adekoya, N., Thurman, D. J., White, D. D., & Webb, K. W. (2002). Sur-
veillance for traumatic brain injury deaths—United States, 1989–
1998. MMWR Surveillance Summaries, 51, 1–14.
Armstrong, B. D., Hu, Z., Abad, C., Yamamoto, M., Rodrigeuz, W. I.,
Cheng, J. T., … Waschek, J. A. (2003). Lymphocyte regulation of
neuropeptide gene expression after neuronal injury. Journal of Neu-
roscience Research, 74, 240–247.
Basso, A., & Caporali, A. (2001). Aphasia therapy, or the importance of
being earnest. Aphasiology, 15, 307–332.
Burke, J. W., McNeill, M. D. J., Charles, D. K., Morrow, P. J., Cros-
bie, J. H., & McDonough, S. M. (2009). Optimising engagement for
stroke rehabilitation using serious games. The Visual Computer, 25,
1085–1099.
Cappa, S. F., Benke, T., Clarke, S., Rossi, B., Stemmer, B., & van
Heugten, C. M. (2003). EFNS guidelines on cognitive rehabilitation:
Report of an EFNS task force. European Journal of Neurology, 10,
11–23.
Carr, T. H., & Levy, B. A. (1990). Reading and its development: Compo-
nent skills approaches. Orlando, FL: Academic Press.
Cepeda, N. J., Pashler, H., Vul, E., Wixted J. T., & Rohrer, D. (2006).
Distributed practice in verbal recall tasks: A review and quantitative
synthesis. Psychological Bulletin, 132, 354–380.
Cicerone, K. D., Dahlberg, C., Kalmar, K., Langenbahn, D. M., Malec,
J. F., Bergquist, T. F., … Morse, P. A. (2000). Evidence based cogni-
tive rehabilitation: Recommendations for clinical practice. Archives
of Physical Medicine & Rehabilitation, 81, 1596–1615.
Cicerone, K. D., Dahlberg, C., Malec, J. F., Langenbahn, D. M., Felicetti,
T., Kneipp, S., & Catanese, J. (2005). Evidence based cognitive reha-
bilitation: Updated review of the literature from 1998 through 2002.
Archives of Physical Medicine and Rehabilitation, 86, 1681–1692.
de Kruijk, J. R., Leffers, P., Meerhoff, S., Rutten, J., & Twijnstra, A.
(2002). Effectiveness of bed rest after mild traumatic brain injury: A
randomised trial of no versus six days of bed rest. Journal of Neurol-
ogy, Neurosurgery, and Psychiatry, 73, 167–172.
Denes, G., Perazzolo, C., Piani, A., & Piccione, F. (1996). Intensive ver-
sus regular speech therapy in global aphasia: A controlled study.
Aphasiology, 10, 385–394.
Duncan, J. (1986). Disorganization of behavior after frontal lobe dam-
age. Cognitive Neuropsychology, 3, 271–290.
Ellul, J., Watkins, C., Ferguson, N., Barer, D., & Rowe, J. (1993).
Increasing patient engagement in rehabilitation activities. Clinical
Rehabilitation, 7, 297–302.
Elman, R. J., & Bernstein-Ellis, E. (1999). The efficacy of group com-
munication treatment in adults with chronic aphasia. Journal of
Speech, Language, and Hearing Research, 42, 411–419.
Emery, D. L., Royo, N. C., Fischer, I., Saatman, K. E., & McIntosh, T. K.
(2003). Plasticity following injury to the adult central nervous sys-
tem: Is recapitulation of a developmental state worth promoting?
Journal of Neurotrauma, 20, 1271–1292.
Fasoli, S. E., Trombly, C. A., Tickle-Degnen, L., & Verfaellie, M. H.
(2002). Effect of instructions on functional reach in persons with
and without cerebrovascular accident. American Journal of Occupa-
tional Therapy, 56, 380–390.
Fasotti, L., Kovacs, F., Eling, P. A., & Brouwer, W. H. (2000). Time pres-
sure management as a compensatory strategy training after closed
head injury. Neuropsychological Rehabilitation, 10, 47–65.
Fisk, A. D., & Eboch, M. (1989). An automatic/controlled process-
ing theory application to training component map reading skills.
Applied Ergonomics, 20, 2–8.
Glisky, E. L. (1995). Acquisition and transfer of word processing skill by
an amnesic patient. Neuropsychological Rehabilitation, 5, 299–318.
Gordon, W. A. (1990). Cognitive remediation: An approach to the ame-
lioration of behavioral disorders. In R. L. Wood (Ed.), Neurobehav-
ioural sequelae of traumatic brain injury (pp. 175–193). London:
Taylor & Francis.
Hillis, A. E. (1998). Treatment of naming disorders: New issues regard-
ing old therapies. Journal of the International Neuropsychological
Society, 4, 648–660.
Hinckley, J. J., & Carr, T. H. (2005). Comparing the outcomes of inten-
sive and non-intensive context-based aphasia treatment. Aphasiol-
ogy, 19, 965–974.
Hinckley, J. J., & Craig, H. K. (1998). Influence of rate of treatment on
the naming abilities of adults with chronic aphasia. Aphasiology,
12, 989–1006.
Hinckley, J. J., Patterson, J. P., & Carr, T. H. (2001). Differential effects
of context and skill-based treatment approaches: Preliminary find-
ings. Aphasiology, 15, 463–476.
Jacobs, H. E. (1987). The Los Angeles head injury survey: Project ratio-
nale and design implications. Journal of Head Trauma Rehabilita-
tion, 2, 37–50.
Joshi, J., Ornstein, E., & Young, W. L. (2001). Cerebral and spinal cord
blood flow. In J. E. Cottrell & D. C. Smith (Eds.), Anesthesia and
neurosurgery (4th ed.). St Louis, MO: Mosby.
Kaschel, R., Della Sala, S., Cantagallo, A., Fahlböck, A., Laaksonen,
R., & Kazen, M. (2002). Imagery mnemonics for the rehabilitation
of memory: A randomised group controlled trial. Neuropsychological
Rehabilitation, 12, 127–153.
Kasten, E., Wüst, S., Behrens-Baumann, W., & Sabel, B. A. (1998).
Computer-based training for treatment of partial blindness. Nature
Medicine, 4, 1083–1087.
Landers, M., Wulf, G., Wallmann, H., & Guadagnoli, M. A. (2005). An
external focus of attention attenuates balance impairment in Par-
kinson’s disease. Physiotherapy, 91, 152–185.
Langlois, J. A., Rutland-Brown, W., & Wald, M. M. (2006). The epidemi-
ology and impact of traumatic brain injury: A brief overview. Journal
of Head Trauma Rehabilitation, 21, 375–378.
Laufer, Y., Rotem-Lehrer, N., Ronen, Z., Khayutin, G., & Rozenberg, I.
(2007). Effect of attention focus on acquisition and retention of pos-
tural control following ankle sprain. Archives of Physical Medicine
and Rehabilitation, 88, 105–108.
Levine, B., Robertson, I. H., Clare, L., Carter, G., Hong, J., Wilson, B.
A., … Stuss, D. T. (2000). Rehabilitation of executive functioning:
An experimental-clinical validation of Goal Management Training.
Journal of the International Neuropsychological Society, 6, 299–312.
Livingston, M. G., & Brooks, D. N. (1988). The burden on families of
the brain-injured: A review. Journal of Head Trauma Rehabilitation,
3, 6–15.
Manly, T., Hawkins, K., Evans, J., Woldt, K., & Robertson, I. (2002).
Rehabilitation of executive function: Facilitation of effective goal
management on complex tasks using periodic auditory alerts. Neu-
ropsychologia, 40, 271–281.
McNevin, N. H., Wulf, G., & Carlson, C. (2000). Effects of attentional
focus, self-control, and dyad training on motor learning: Implica-
tions for physical rehabilitation. Physical Therapy, 80, 373–385.
Medd, J., & Tate, R. L. (2000). Evaluation of an anger management
therapy programme following acquired brain injury: A preliminary
study. Neuropsychological Rehabilitation, 10, 185–201.
Mesulam, M. M. (Ed.). (1985). Principles of behavioral neurology. Phila-
delphia, PA: F. A. Davis.
Miller, L. (1990). Major syndromes of aggressive behaviour following
head injury: An introduction to evaluation and treatment. Cognitive
Rehabilitation, 8, 14–19.
Niemann, H., Ruff, R. M., & Baser, C. A. (1990). Computer-assisted
attention retraining in head-injured individuals: A controlled effi-
cacy study of an outpatient program. Journal of Consulting and Clin-
ical Psychology, 58, 811–817.
Niemeier, J. P. (1998). The Lighthouse Strategy: Use of a visual imagery
technique to treat visual inattention in stroke patients. Brain Injury,
12, 399–406.
Nortje, J., & Menon, D. K. (2004). Traumatic brain injury: Physiol-
ogy, mechanisms, and outcome. Current Opinion in Neurology, 17,
711–718.
Ownsworth, T. L., & MacFarland, K. (1999). Memory remediation in
long-term acquired brain injury: Two approaches in diary training.
Brain Injury, 13, 605–626.
Parikh, S., Koch, M., & Narayan, R. K. (2007). Traumatic brain injury.
International Anesthesiology Clinics, 45, 119–135.
Pulvermuller, F., Neininger, B., Elbert, T., Mohr, B., Rockstroh, B., Koe-
bbel, P., & Taub, E. (2001). Constraint-induced therapy of chronic
aphasia after stroke. Stroke, 32, 1621–1626.
Robertson, I. H. (1996). Goal management training: A clinical manual.
Cambridge, England: PsyConsult.
Robey, R. R. (1994). The efficacy of treatment for aphasic persons: A
meta-analysis. Brain and Language, 47, 585–608.
Robey, R. R. (1998). A meta-analysis of clinical outcomes in the
treatment of aphasia. Journal of Speech, Language, and Hearing
Research, 41, 172–187.
Rutland-Brown, W., Langlois, J. A., Thomas, K. E., & Xi, Y. L. (2006).
Incidence of traumatic brain injury in the United States, 2003. Jour-
nal of Head Trauma Rehabilitation, 21, 544–548.
Ryan, L. M., & Warden, D. L. (2003). Post concussion syndrome. Inter-
national Review of Psychiatry, 15, 310–316.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of prac-
tice: Common principles in three paradigms suggest new concepts
for training. Psychological Science, 3, 207–217.
Sohlberg, M. M., McLaughlin, K. A., Pavese, A., Heidrich, A., & Pos-
ner, M. I. (2000). Evaluation of attention process training and brain
injury education in persons with acquired brain injury. Journal of
Clinical & Experimental Neuropsychology, 22, 656–676.
Tanielian, T., & Jaycox, L. H. (Eds.). (2008). Invisible wounds of war:
Psychological and cognitive injuries, their consequences, and ser-
vices to assist recovery. Santa Monica, CA: RAND.
Teasdale, G., & Jennett, B. (1974). Assessment of coma and impaired
consciousness: A practical scale. Lancet, 2, 81–84.
Warden, M. D. (2006). Military TBI during the Iraq and Afghanistan
Wars. Journal of Head Trauma Rehabilitation, 21, 398–402.
Westerberg, H., Jacobaeus, H., Hirvikoski, T., Clevberger, P., Östens-
son, M. L., Bartfai, A., & Klingberg, T. (2007). Computerized working
memory training after stroke—A pilot study. Brain Injury, 21, 21–29.
Wilson, B. A. (1995). Memory rehabilitation: Compensating for memory
problems. In L. Bäckman & R. Dixon (Eds.), Psychological compen-
sation (pp. 171–190). Hillsdale, NJ: Erlbaum.
Wilson, B. A., Emslie, H. C., Quirk, K., & Evans, J. J. (2001). Reduc-
ing everyday memory and planning problems by means of a paging
system: A randomized control crossover study. Journal of Neurology,
Neurosurgery, & Psychiatry, 70, 477–482.
Wulf, G., Landers, M., Lewthwaite, R., & Töllner, T. (2009). External
focus instructions reduce postural instability in individuals with
Parkinson disease. Physical Therapy, 89, 162–168.
Ylvisaker, M., Szekeres, S. F., Henry, K., Sullivan, D. M., & Wheeler, P.
(1987). Topics in cognitive rehabilitation theory. In M. Ylvisaker &
E. M. Gobble (Eds.), Community re-entry for head injured adults (pp.
137–220). Boston, MA: College-Hill Press.
16 Conclusions
Lyle E. Bourne, Jr. and Alice F. Healy
University of Colorado

The purpose of this volume is to describe the current state of the art as regards training of knowledge and skills. One of the
byproducts of this effort is a set of empirically valid training
principles, which appear in various forms throughout the chap-
ters of the book. The volume pursues several interrelated sub-
goals, involving reviewing existing literature, describing new
empirical research conducted by the authors, formulating both
a general theoretical framework and specific task-based models,
and considering the applications of this basic research to impor-
tant questions arising in the real world.

Important Distinctions
This volume starts by making a set of important distinctions
that apply to training research, theory, and application: (a)
training and education; (b) acquisition, retention, and transfer;
(c) declarative and procedural information; and (d) principles,
guidelines, and specifications. The book focuses on training,
in contrast to education. Training is the attempt to develop
knowledge and skills particular to a well-defined task or job.
Education, in contrast, focuses on the acquisition of general
knowledge and skills pertaining to a relatively broad area or
domain, not necessarily limited to a well-defined task. In training,
the fundamental underlying cognitive processes are acquisition
(the initial learning), retention (the durability over time), and
transfer (the flexibility or generalizability) of knowledge and
skills. The emphasis on training for particular tasks requires
differentiating declarative information (facts, knowing that) and
procedural information (skills, knowing how). Finally, when it
comes to utilizing the body of scientific knowledge on training,
there is a logical progression from basic science to application.
This progression has been described as beginning with training
principles (which have been the primary focus of the basic
research), to developing guidelines for the transition of those
principles to real-life situations, and finally to specifications for
actual training practices in a given context. These distinctions
appear in various forms throughout the chapters of the book.

Chapter Synopses
The major points in each chapter after the first, which sets out
the major distinctions reviewed above, are summarized here,
including their take-home messages with respect to training.
Chapter 2 reviews the existing cognitive psychological litera-
ture supporting the conclusion that certain principles of train-
ing have strong empirical validity at least in the context of the
laboratory. The effects of these principles have been shown in
all three cognitive component processes of training (acquisition,
retention, and transfer), although the effects are not necessarily
the same on each component. This review points to some pos-
sible applications of the laboratory findings to a variety of train-
ing situations.
Expanding on the review of the literature, chapter 3 reports a
series of recent experiments, some unpublished, that examined
four types of questions left unanswered by the literature. These
experiments revealed the generality of certain training princi-
ples and the limitation of others; illuminated the way in which
principles interact in combination; assessed the applicability of
laboratory based principles to complex, dynamic environments;
and identified novel principles not previously established by the
literature.
Chapter 4 describes the implications of cognitive load theory
for the effectiveness of different attention-related training strate-
gies by analyzing the limited cognitive resources of the learner
as allocated to the intrinsic load of the task to be performed,
the germane load necessary for skill acquisition, and the extra-
neous load in the training environment that competes with
the other two sources of load. Data from a meta-analysis are
reported to support the effectiveness of, and the influence of moderating variables on, the attention-related strategies of active choice, increasing difficulty training, error prevention, and part-task training, including the training of visual attention. Implications
for training environments are discussed.
Factors that affect the acquisition and transfer of rudimen-
tary components of skill are the focus of chapter 5. Much of
the research described uses basic choice reaction tasks, which
permit isolation of fundamental cognitive processes and rapid
acquisition of skill. The strategy of this research approach is to
isolate and reveal components of skill acquisition and transfer
by examining influences of task parameters on the preexisting
bias to make spatially corresponding responses.
The use of automation as a training aid is explored in chap-
ter 6. Working from an aptitude-treatment interaction perspec-
tive, the results are reported from three studies examining how
different instantiations of automation as a training aid interact
with the learner’s level of cognitive ability to influence learn-
ing and subsequent task performance. Building on these results
and extant theory and research in the automation, training,
and cognitive abilities literatures, a framework is proposed for
understanding how, when, and for whom automation can have a
positive influence on learning, retention, and transfer.
Technology-based training research, as reviewed in chapter
7, seeks to develop findings that reduce risk in training sys-
tems’ design and incorporate features supporting learning and
performance. Applied research projects conducted by the Army
Research Institute in a wide range of technology-based training
applications are described. Taken together they demonstrate the
strengths and weaknesses of applied training research.
Chapter 8 outlines the development and current form of a
four-dimensional taxonomy with which to categorize training
variables according to (a) task type, (b) training methodology,
(c) performance measures, and (d) training principles. This tax-
onomy should facilitate predictions about performance in given
training situations based on their locations in the taxonomic
space. Taxonomic analysis should help guide future research
into areas not yet fully explored in the taxonomic space and
should aid in the optimization of current training regimens.
Computational models of human behavior for three train-
ing principles are studied in chapter 9: (a) speed-accuracy
tradeoff attributable to fatigue, (b) training difficulty, and (c)
stimulus-response compatibility. These studies show that the
ACT-R architecture and IBL theory present an accurate and
robust representation of the learning process in several train-
ing paradigms. The creation of a computer tool that represents
IBL theory gives rise to interesting new demonstrations of other
learning and training principles.
Chapter 10 reviews three models of basic psychological tasks,
involving data entry, target detection, and information integra-
tion. IMPRINT was the modeling platform used in each case.
Although in the past IMPRINT has been used for larger scale
models of military scenarios, the present study demonstrates
that it is also capable of modeling cognitive tasks at the level of
component processes.
Conclusions 329
Chapter 11 describes a procedure used to compare and evalu-
ate models of cognitive training, which had been created using
both the ACT-R and IMPRINT modeling platforms, finding that
the models were equally accurate and relatively slow. Because
the IMPRINT models are equation based, they could be trans-
lated into equivalent models in Matlab, which had simulation
times that were orders of magnitude shorter than those of the
original models. This translation along with the functional-
ity available in Matlab enabled rapid optimization of multiple
parameter values and visualization of the high-dimensional
parameter space.
Predicting performance can be important for implementation
and design of alternative training scenarios. Chapter 12 pro-
poses a compact mathematical model, founded on existing cog-
nitive theory, that can generate predictions for test performance
as a function of training history on various tasks. The model
incorporates many of the training principles identified in other
chapters of this volume.
Chapter 13 addresses the challenge of training knowledge
so that it is retained and accessed for application to particular
jobs or tasks. Evidence is reviewed from laboratory experiments
and educational research indicating that SPacing, Retrieval
practice, and INTerleaving (SPRINT) can improve the retention
and transfer of trained knowledge. The chapter illustrates how
education-based practices might be modified to improve knowl-
edge retention in knowledge-training programs, and in turn, to
improve the success of these knowledge-training programs.
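As a purely illustrative sketch, not a procedure from chapter 13, the three SPRINT components can be combined in a single practice schedule; the fact-set names, session count, and round-robin rule below are invented for illustration:

```python
# Hypothetical sketch: SPacing, Retrieval practice, and INTerleaving (SPRINT)
# combined in one schedule. Each session is retrieval practice on a mix of
# fact sets, and revisits to any one set are spread across sessions.

def sprint_schedule(fact_sets, sessions):
    """Round-robin interleaving of fact sets across spaced sessions."""
    schedule = []
    for s in range(sessions):
        k = s % len(fact_sets)          # rotate the starting set each session
        rotated = fact_sets[k:] + fact_sets[:k]
        schedule.append(rotated[:2])    # retrieval practice on two sets
    return schedule

plan = sprint_schedule(["anatomy", "pharmacology", "procedures"], sessions=4)
# Each set recurs at spaced intervals, and no session masses a single set.
```

The point of the sketch is only that spacing and interleaving fall out of one scheduling rule once sessions are treated as retrieval opportunities rather than restudy opportunities.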
To make applications to the real world possible, it is neces-
sary first to understand the “real world.” Chapter 14 describes
the reality of complex operations, particularly those conducted
by airline pilots, and discusses the implications of such reality
for research on training. A critical issue in this discussion is
the demands of concurrent task management in many every-
day environments, such as distracted driving, and the ways in
which training research must address such demands.
Acquired brain injury (ABI) is one of the most common causes
of death and disability in the United States and around the
world. Because the nature of ABI and the resulting psychologi-
cal deficits can be so diverse, unique rehabilitation techniques
are being developed to retrain cognitive skills and restore inde-
pendence in patients with ABI. Chapter 15 compares training
methods found in rehabilitation practice with the training prin-
ciples of psychological science detailed in the rest of this volume.
There are many instances of overlap between rehabilitation
practices and empirical principles, but recommendations are
made for future directions to be explored in order to optimize
rehabilitation science.

Chapter Clusters
The chapters in the volume fall into three logical clusters. The
first is a collection of chapters dealing with basic scientific
knowledge about training derived primarily from the cognitive
psychology laboratory. This cluster includes chapter 2 (reviews
the existing experimental literature on training principles);
chapter 3 (presents new experiments identifying new training
principles, refining existing principles, and extending the range
of applicability of those principles); chapter 4 (provides a frame-
work for studying the role of attention in training); chapter 5
(describes experiments on the rudimentary components of skill);
and chapter 6 (examines the possible use of automation in train-
ing and the impact of individual differences on training). These
chapters complement each other to provide an all-encompassing
overview of empirically based principles of the training process.
The second cluster consists of a set of chapters aimed at devel-
oping a theoretical overview of the training process. The chap-
ters include chapter 8 (presents a four-dimensional taxonomic
analysis of tasks, methods, measures, and principles, which
maps out the general domain of training); chapters 9 and 10
(provide specific, detailed models of training in particular tasks,
using ACT-R and IMPRINT as modeling platforms and yielding
precise quantitative predictions of the effects of selected train-
ing principles in those tasks); chapter 11 (compares the models
described in chapters 9 and 10 as to their accuracy and speed
of execution); and chapter 12 (develops a high-level formulation
of training that captures the three fundamental cognitive pro-
cesses and the generalized influence of training principles on
performance).
The third cluster brings together a set of chapters that exam-
ines training in the real world. These chapters show a sample of
the possible applications of basic research and theory. Included
in this cluster are chapters 7 (summarizes the use of training
technology in the Army); chapter 13 (identifies possible transi-
tions from educational research to knowledge training required
by real jobs); chapter 14 (describes the need for training in multi-
tasking in most real-world jobs); chapter 15 (considers possible
uses of training principles developed in normal circumstances for
cognitive rehabilitation after an acquired brain injury). Training
research has applications in many real-world circumstances.
These chapters provide only a sample but illustrate the range of
possibilities and justify the attempt to extend the empirical and
theoretical findings from the laboratory to the field.

Possible Future Directions

Empirical Research
The chapters of this book describe the current state of the sci-
ence of training from the perspective of cognitive psychology.
These chapters reveal an impressive set of facts about the vari-
ables that affect training, pointing to possible ways to optimize
training. But these chapters do not close the book on the various
issues discussed. Just about every issue is in need of some fur-
ther scientific examination. A poignant example concerns how
to optimize simultaneously the three component cognitive pro-
cesses of training—acquisition, retention, and transfer. Studies
have shown that variables that improve one process might not
act in the same way for the other two processes. For instance,
making acquisition difficult slows the acquisition process while
at the same time benefiting retention and possibly also transfer. If
the idea is to optimize all three processes simultaneously, then
these differential effects or trade-offs have to be evaluated and
accommodated.
Chapter 3 provides several examples of the need for future
research. One case that seems most in need of additional work
concerns ways to overcome biases that influence decision mak-
ing. Many decisions require integration of informational items
that are presented sequentially to the decision maker. Exist-
ing data show that end-of-sequence decisions depend heavily
on the first items in the sequence (primacy) and, to a somewhat
lesser extent, on the items presented at the end of the sequence
(recency). Items presented in the middle are, thus, largely
ignored even though they provide an equal amount of informa-
tion. Optimal decisions usually require equal weighting of all
the items in the sequence. Finding ways to de-bias the over-
weighting of items from the ends of the sequence is an important
issue for future research. Finding ways to make items from the
middle of the sequence more salient, or distinctive, is one pos-
sible approach. Research might, for example, compare instruc-
tions that emphasize middle items to item-neutral instructions
of the sort used in prior research. Another possibility is to use
perceptual manipulations to make the middle items stand out.
Any such manipulations that shift the weighting toward middle
items should serve to de-bias the decisions away from the typi-
cal primacy and recency effects.
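The consequence of such serial-position weighting can be made concrete with a small numerical sketch; the evidence values and U-shaped weights below are invented for illustration, not taken from the experiments discussed:

```python
# Hypothetical illustration: end-of-sequence decisions that overweight early
# items (primacy) and late items (recency), compared with the equal weighting
# that an optimal integrator of sequential information would use.

def integrate(items, weights):
    """Weighted average of sequentially presented evidence values."""
    return sum(w * x for w, x in zip(weights, items)) / sum(weights)

def serial_position_weights(n, primacy=3.0, recency=2.0):
    """U-shaped weights: the ends of the sequence count more than the middle."""
    w = [1.0] * n
    w[0] *= primacy
    w[-1] *= recency
    return w

evidence = [0.9, 0.8, 0.1, 0.1, 0.1, 0.8]  # middle items argue the other way
biased = integrate(evidence, serial_position_weights(len(evidence)))
optimal = integrate(evidence, [1.0] * len(evidence))  # equal weighting
# The biased integrator (0.60) is pulled toward the strong first and last
# items; the equal-weighting integrator (about 0.47) tracks the evidence mean.
```

Any manipulation that flattens the weight profile, whether instructional emphasis or perceptual salience of middle items, moves the biased estimate toward the optimal one.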
The effects of part task training are ambiguous in tasks where
the parts are performed concurrently, requiring time-sharing.
Some studies show an advantage for initial training with only
part of the task, but others show a cost associated with this part
task training. Chapter 4 analyzes the problem and gives some
possible reasons for these ambiguous results. First, the order in
which parts are practiced, and the way in which, once practiced,
they are joined together, make a major difference in the success of
the training regimen. Second, independent part training does
not provide a sound basis for developing a time-sharing skill
that might be required of the task as a whole. Third, in some
cases the parts, while separable in principle, are integrated in
the whole task, and integration needs to be trained for success-
ful whole task performance. To resolve the ambiguity in these
results, further experimentation is needed to evaluate the rela-
tive contributions of the three possible causes of training failure.
Related to the issue of part task training is the question of
when automation might be used in training to simplify and,
thereby, enhance performance of a complex task. Automation is
usually introduced to eliminate the need to train on parts of a
task, which are performed automatically for the trainee, allow-
ing the trainee to concentrate on the remaining parts. Chap-
ter 6 demonstrates that eventual performance is, if anything,
adversely affected by introducing automation in the training
sequence. That outcome is entirely consistent with the training
principle of task difficulty described in chapter 2, whereby more
difficult training hinders acquisition but enhances retention
and transfer. Automation makes the training task simpler by
eliminating the need for training on certain parts of the overall
task. But this type of training does not eliminate the need to
perform on those parts that had been automated when whole
task requirements are introduced. Future research needs to be
aimed at how much automation to use at the beginning of train-
ing and how best to sequence the reintroduction of automated
parts into the training regimen.
One final empirical example in need of future research relates
to the connection between education and training. Chapter 13
discusses a set of empirical results related to fact learning and
fact memory that demonstrate that fact memory can be strength-
ened by repeated retrieval of those facts over time. Fact memory
is moderated by variables such as spacing of retrieval oppor-
tunities, self-testing, and interleaving of several sets of facts.
The emphasis was on training to enhance the memorability of
the facts. Throughout the book, however, a distinction has been
made between facts and skills (i.e., between declarative and
procedural components). Retrieval practice certainly serves to
strengthen the declarative component of performance. But there
is an additional possibility, mentioned in passing, that retrieval
practice might also strengthen the skill of retrieving, that is,
the procedural component of performance. Thus, practicing the
retrieval of one set of facts might not only benefit those facts
but might also enhance the learners’ ability to retrieve other
fact sets that might be encountered
in the future. To date there have been no direct experimen-
tal tests of that possibility although the research required is
straightforward.

Theoretical Research
Some of the work reported in this volume, specifically in chap-
ters 9 and 10, was designed to provide precise quantitative
computational models of well-defined tasks, such as data entry
and target detection. Although those models were successful
in accounting for and predicting performance by human sub-
jects in the base tasks, they are by their nature limited to the
particular tasks for which they were designed. These models
provide, however, a blueprint for model development that would
address on the same level other training tasks of interest. Using
this blueprint, future theoretical research is likely to produce
additional models specific to those tasks with the same compu-
tational platforms (ACT-R and IMPRINT).
These new models can be evaluated and compared using the
same procedures described in chapter 11 for the data entry mod-
els of chapters 9 and 10. That is, the mathematical soundness
and computational feasibility of these models can be assessed
by the procedures that were outlined. More importantly, for mul-
tiparameter models whose computations can be translated into
equations, the procedures developed in chapter 11 can be used
to find, efficiently and rapidly, parameter values that are opti-
mal for consistency with the data across a wide variety of task
conditions. The new techniques described in chapter 11 can
also be used to enhance visualization of the high-dimensional
parameter space. Knowledge of the best parameter values and
the ability to view the high-dimensional parameter space will
ensure that the predictions provided by future models are most
accurate.
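As a toy illustration of why equation-based models make this feasible (the task, data, and model below are invented, not those of chapters 9 through 11): once a model is reduced to an equation, parameter estimation is just error minimization over the parameter space, which is fast even by brute force:

```python
# Toy illustration: fitting a two-parameter power-law learning curve,
# RT = a * trial**(-b), to observed response times by exhaustive grid search.
import itertools

observed = [(1, 2.00), (2, 1.45), (4, 1.02), (8, 0.73)]  # (trial, RT) toy data

def sse(a, b):
    """Sum of squared errors between model predictions and observations."""
    return sum((a * t ** (-b) - rt) ** 2 for t, rt in observed)

a_grid = [x / 10 for x in range(10, 31)]   # a from 1.0 to 3.0
b_grid = [x / 100 for x in range(10, 91)]  # b from 0.10 to 0.90
best_a, best_b = min(itertools.product(a_grid, b_grid), key=lambda p: sse(*p))
# Each evaluation is cheap arithmetic, so the exhaustive search over
# 21 x 81 candidates runs essentially instantly; the same idea scales to
# smarter optimizers and to visualizing error over many parameters.
```

The contrast with simulation-based models, where each candidate parameter set requires a full model run, is the source of the speedups reported for the equation-based translations.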
The final, and possibly most important, future theoretical
development involves the compact mathematical model pre-
sented in chapter 12. This model captures all three component
cognitive processes involved in training in a quantitative form
in a single equation with separate parameters for each of the
three processes. It also accounts mathematically for the impact
of most major training principles. In the future, this model can
be used to analyze the data from specific tasks. Parameters esti-
mated from performance on particular tasks can be used to
evaluate the importance of each of the three processes inde-
pendently. A given experiment exploring a particular task will
include manipulations that could differentially influence the
three component processes, and these influences can be rep-
resented by estimated parameter values. Thus, the model can
enable an investigator to localize observed training effects to
the acquisition, retention, and/or transfer processes. Further-
more, the model generates ideas or hypotheses that remain to be
tested empirically. One example concerns predictions of trans-
fer performance based on similarity between the conditions of
a particular task used for training and for testing. The model
expresses this relationship quantitatively whenever the similar-
ity relationship is known and measurable.
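To make the idea concrete, a hypothetical single-equation form of this kind (the actual equation and parameterization in chapter 12 may differ) could multiply a power-law acquisition term, a power-law retention term, and a similarity-based transfer term:

```latex
P(n, d, s) \;=\;
\underbrace{a\, n^{\,b}}_{\text{acquisition}}
\;\times\;
\underbrace{(1 + d)^{-c}}_{\text{retention}}
\;\times\;
\underbrace{s^{\,g}}_{\text{transfer}}
```

Here n is the amount of training, d the retention interval, s a measure of training-test similarity, and a, b, c, and g are separately estimable parameters, so an observed training effect can be localized to whichever parameter a given manipulation moves.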

Applications
The various chapters of this volume have already identified
applications of training research to real-world problems. For
example, chapter 7 discusses applications to training in the
military using simulators and other high-tech support systems.
Chapter 13 describes applications to knowledge training in the
classroom for professional skill development. Chapter 14 reviews
the need for training in multitasking as a fundamental compo-
nent of complex real-world tasks such as piloting an aircraft.
There are many additional areas of potential application
of training principles that could benefit from a deeper scientific
examination. One area of considerable current interest and con-
cern, because of head injuries suffered by military personnel,
sports injuries (e.g., in professional football), and random acci-
dents, is cognitive retraining or rehabilitation. The issue here is
not the training of a novice but the retraining of a person who
once had a set of knowledge and skills but lost them because of
an acquired brain injury. As reviewed in chapter 15, research-
ers and clinicians have barely begun to scratch the surface of
this problem. There is some obvious success in current and past
efforts, but it is also obvious that there is room for improvement.
In particular, cognitive retrainers will need to consider the prin-
ciples for normal training outlined in this volume as possible
candidates for the facilitation of retraining efforts.
Another example comes from the possible application of theo-
retical ideas outlined in this volume to military planning. As
noted in chapter 10 and elsewhere, the Army uses a compu-
tational platform called IMPRINT to assist resource allocation
decision making in the field. IMPRINT contains parameters that
relate primarily to different task requirements and personnel
capabilities. Predictions using IMPRINT can be improved by
incorporating into the system the dimension of training based on
well-established training principles. The compact mathematical
model developed in chapter 12 encompasses a set of quantitative
expressions describing the impact of these training principles
on performance after training. These quantitative expressions
lend themselves directly to the computational device known as
performance shaping functions, and these functions are a fun-
damental part of IMPRINT. Thus, there is a direct path from the
theoretical work presented in this volume to an application in
military practice. That path could be followed in a future mili-
tary application.
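A minimal sketch of that path, with invented names, forms, and constants rather than IMPRINT's actual interface: a performance shaping function is simply a multiplier applied to a baseline performance estimate, so a quantitative training-principle expression slots in directly:

```python
# Hypothetical sketch: a performance shaping function (PSF) scales a baseline
# task-time estimate by a moderating condition. Here a power-law forgetting
# curve turns "days since training" into a time multiplier; the function
# name, form, and decay constant are invented for illustration.

def psf_retention(days_since_training, decay=0.1):
    """Multiplier >= 1 that inflates estimated task time as training recedes."""
    return (1.0 + days_since_training) ** decay

baseline_time = 10.0                               # seconds, freshly trained
adjusted_time = baseline_time * psf_retention(30)  # estimate after 30 days
```

Because the training-principle expressions in chapter 12 are already quantitative, each could in principle be wrapped as such a multiplier and composed with the moderators a planning tool already uses.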
Applications of training research to date have been limited
to a relatively small number of domains. The principles, how-
ever, should be much more general than those domains would
suggest. As noted above, various chapters have discussed appli-
cations in pilot training, military training, and classroom educa-
tion. But there are other domains in which principles identified
should also be useful. Among the possibilities are sports train-
ing, musical training, medical training, training astronauts for
long-haul missions (e.g., to Mars), and training digital skills. In
some cases, the principles would appear to apply directly. In
other cases, where there are imponderables, such as on a trip to
Mars, or where there is a continually evolving technology, such
as digital devices, the principles themselves might need modifi-
cation or extension. Nonetheless, these domains, in which there
are currently few applications of training research to practice,
are prime targets for future examination.

The Transitional Path


A great amount of work needs to be done before applications
of training research to real-life problems are fully justified. As
stated in chapter 1, training research aims primarily at the
empirical validation of general principles, presumably applicable
over a wide variety of jobs or tasks. But as general principles,
many of the details required for application are missing. The
proper use of these principles itself needs to be established by
research. In other words, the best way to adapt training so as
to be consistent with the principles has to be determined. What
is required then is a set of guidelines for introducing the prin-
ciples into training regimens, which is the next step along the
transitional path from basic science to application in the field.
Guidelines alone do not complete the transition. There is a need
for greater specificity. A guideline itself must be translated into
a particular set of actions, or specifications, that can be followed
by an instructor who might not be well versed in basic science.
At that point the transition is complete and can be evaluated as
to its success. This volume is not concerned with specifications
and has little to say about guidelines, but, through its delin-
eation of training principles, it does provide the foundation for
improving real-world training in the future.
Index

Page locators in italics indicate of effort, 106; influence of


figures and tables. ideometer compatibility,
104–5; integrated model of
accidents, learning from acquisition, retention, and
accidents, 287–88 transfer, 253–55, 255, 257;
accuracy: cognitive antidote to interference as a source of
fatigue, boredom, or task skill dissipation, 95–96; and
disengagement, 16; and performance of concurrent
power law of acquisition, tasks, 103–6; power law of
3–4 acquisition, 3–4; quantitative
acquired brain injury (ABI). learning framework,
See cognitive retraining 92–94, 93; Simon effect
following acquired brain as function of number of
injury practice trials, 93; specificity
acquisition and transfer of of mixing effects, 98–99;
basic skill components: specificity of transfer, 94–95,
overview, 8, 106–7, 327–28; 95; stimulus-response
acquisition, retention, and compatibility (SRC) task
transfer, 2–5; attention and effect, 89–91; training
during learning and effectiveness prediction,
expression of skill, 103–4; 250–51, 251; transfer from
concomitant mixing of incompatible-mapping task
tasks and spatial mappings, to Simon task, 91–92; and
99–103, 101; cross talk transfer in mixed tasks with
between tasks, 105; multiple mappings, 97–103;
elimination of the compatible and transfer of learning,
mapping benefit, 97–98; 91–97
feature overlap account ACT-R platform: cognitive
of contextual similarity, architecture, 181–83; human
95; hypothetical process performance during training,
architecture for the expanded 228–31
mixing paradigm, 101; active choice and resource
implementation instructions, investment, 71–72
96–97; influence of active learning and learner
current and prior payoff control, 72
schedules on allocation adaptive training, 139–44
338 Index
affect and training, illusory choice, 71–72; guidance
competence and choice of versus mandating, 81–83;
training devices, 83–84 illusory competence and
air traffic control case study, 299 the choice of training
aircraft: and performing (reality), devices, 83–84; increasing
294–98; and training difficulty, 74–75; inducing
(theory), 290–94 resource investment, 70–73;
aircraft maintenance case study, interest, entertainment,
302 and “engagement”, 73;
anchoring and information part task training, 77–80;
integration, 56–58, 57 reducing resource demand,
applied training research. See 73–81; regarding training
technology-based applied and transfer, 70; scanning
training research training, 80–81; three
aptitude: aptitude automation sources of load in cognitive
interaction experiment, load theory, 69; variable
120–21; aptitude-treatment priority training, 80. See
interaction (ATI), 116–17 also automation influences
Army training: adaptive training, on training performance and
139–44; Army Research transfer
Institute (ARI), 135; and automatically-initiated
conducting applied military automation (AIA), 120–21
research, 135–36; distributed automation influences on
multiplayer games, 149–51; training performance
scenario-based training, and transfer: overview, 8,
138–39; transitions in, 112–14, 328; automation
136–37; and virtual as a training aid, 117–19;
environments, 144–49, 145 automation within training,
artificial grammar learning, 60, 118–19; changes with
62–63, 63 automation, 117–18; cognitive
attention: attention during ability and the training
learning, performance of situation, 116–17; experiment
concurrent tasks, 103–4; 1: seeking evidence of
attention effects on transfer, an aptitude automation
261; correcting attentional interaction, 120–21;
deficits following acquired experiment 2: variations in
brain injury, 309–11; focus automation across training,
of attention, 15; and speed- 122–23; experiment 3:
accuracy trade-off in data automation and training
entry task, 186–87, 187 in vehicle control, 123–24;
attention and cognitive resource future research directions
load in training strategies: and applications, 128–29;
overview, 8, 84, 327; active initial investigations of
learning and learner control, aptitude, automation, and
72; cognitive load theory, training, 119–25; proposed
67–70, 69; error prevention: framework of cognitive
training wheels, 75–77; abilities and automation in
the expertise effect, 81; the training, 125–28, 126; role of
generation effect: active cognitive ability in training,
Index 339
115–16; skill acquisition, 83; computational models
training, and cognitive of three training principles,
ability, 115–17 185–87; IBL tool and
automation, processing, training, computer modeling, 194–97,
and individual differences 195, 196; instance-based
(APTID) framework, 125–28, learning (IBL) tool, 183–85;
126 instance-based training,
Aviation Safety Reporting System 260; speed-accuracy trade-
(ASRS), 289–90 off in a data entry task, 186–
87, 187; stimulus-response
bias: human performance bias, compatibility, 192–94, 194;
response times (RTs) and training difficulty principle
SRC effect, 89–90; and in the RADAR task, 188–94,
transfer in mixed tasks with 189, 190, 191
multiple mapping, 99–103, cognitive retraining following
101 acquired brain injury:
biology teacher, knowledge overview, 10, 307–9, 329–
training illustrative case, 30; correcting attentional
267–68 deficits, 309–11; correcting
boredom: cognitive antidote to memory deficits, 311–13;
fatigue, boredom, or task integrating normal training
disengagement, 16; and principles with retraining
speed-accuracy trade-off in cognitive skills, 318–21;
data entry task, 186–87, 187 retraining executive
brain injury. See cognitive functions, 313–14;
retraining following acquired retraining language and
brain injury communication skills, 315–
business executive, knowledge 18; retraining visual-spatial
training illustrative case, skills, 314–15
268–69 compatible mapping benefit
elimination, 97–98
clicker technique, 49–53, 51, 53 competence, illusory competence
close combat tactical trainer and choice of training
(CCTT), 144–46, 145 devices, 83–84
cognitive ability: and automation computational modeling and IBL
in training, 125–28, 126; tool, 194–97, 195, 196
cognitive/affective synthesis concurrent tasks, 103–6
processing, 158–59, 159; consistent mapping (CM) trials,
cognitive load theory, RADAR task modeling with
67–70, 69; role of cognitive IMPRINT, 210–17, 216
ability in training, 115–16; context effects and training
and stimulus-response principles, 16–20
compatibility, 28; and the contextual similarity, transfer
training situation, 116–17 specificity principle, 94–95, 95
cognitive models of training continuous assessment of
principles and the instance- memory, 59–60, 61
based learning tool: overview, cross talk between tasks and
9, 181, 197, 328; cognitive psychological refractory
architectures: ACT-R, 181– period (PRP), 105
340 Index
data entry task: modeling with errorless learning, 19, 22–23
IMPRINT, 203–9, 207, executive functions retraining
208, 209; models of speed- following acquired brain
accuracy trade-off, 186–87, injury, 313–14
187 expanded mixing paradigm,
declarative and procedural mixing of tasks and spatial
information, 5–7, 259–60 mappings, 100, 101
decomposition. See task expansions, new taxonomy for
decomposition training, 171–72
decreasing automation (DA), expertise effect and cognitive
variations in automation load theory, 81
across training experiment, explicit learning, 6–7
122–23 extraneous load, 68–69, 69
deep processing, 70–71
deliberate practice, 13–14 fatigue, 16, 186–87, 187
depth of processing, 14 feature overlap account of
difficulty: increasing difficulty contextual similarity, 94–95,
and resource demand 95
reduction, 74–75; skill feedback, 21–23
acquisition, training, and Fitts, Paul M., 89
cognitive ability, 115–17 fi xed versus expanding rehearsal
digit data entry task: human and task training, 24–25
performance during training, fl ight training, 80–81, 290–98
227–28; and new training focus of attention, 15
taxonomy, 168–70 fractionation, 78–79
distributed multiplayer games, functional task development,
149–51 17–18
dual coding principle, 47–49, 48 fusion task modeling with
dual-task performance and IMPRINT, 217–23, 221, 222
ideomotor compatibility,
104–5 generation effect, 14–15, 71–72
genetic algorithms (GA) and
easy-difficult ordering, 19–20 parameter optimization, 237–
effectiveness. See training 40, 238, 239
effectiveness prediction germane load, 68–70, 68
effort allocation and performance Glasgow Coma Scale, 308
of concurrent tasks, 106 group training, new training
engagement: and cognitive taxonomy, 172
resource load in training guidance versus mandating,
strategies, 73; and speed- 81–83
accuracy trade-off in data
entry task, 186–87, 187 human-automation interaction
EPIC architecture, cognitive (HAI), 82–83
abilities and automation in human performance during
training, 125–27 training: overview, 9, 225–
erroneous response correction 27, 233–34, 329; ACT-R
and timing of feedback, platform, 230–31; bias,
22–23 response times (RTs) and
error prevention, 19–20, 75–77 SRC effect, 89–90; digit
Index 341
data entry task, 227–28; integrated task components,
IMPRINT platform, 228–29, 79–80
231–32; Matlab platform, intelligent tutoring systems (ITS),
226–27, 232–33; model 139–40
accuracy comparisons, interest, entertainment, and
233–34; model evaluations “engagement”, 73
and comparisons, 233–36; interference: and introduced
model performance timing difficulty in training, 27–28;
comparisons, 234–36; skill dissipation and transfer
modeling platforms and of learning, 95–96
model implementations, 228– interleaving, 280–82
30; the modeling tasks, 227– intersession interval (ISI), 21
28; parameter optimization, intrinsic load, 67–69, 68
236–40, 238, 239; RADAR Iowa Gambling Task, IBL
task, 228; radial basis cognitive model, 196–97, 196
function (RBF) modeling,
240–43, 241, 242, 243 knowledge training: and
declarative information, 5;
identifying and testing new and spacing, 20–21; strategic
principles, 58–63 use of knowledge, 15–16. See
ideomotor compatibility, 104–5 also SPRINT in knowledge
implementation instructions, training
96–97
implicit learning, 6–7 language and communication
IMPRINT. See modeling cognitive skills retraining following
tasks in IMPRINT; new acquired brain injury,
taxonomy for training 315–18
improving training effectiveness, learner control, 72
172–73 learning: factors affecting
incidents, learning from transfer of learning,
incidents, 288–90 91–97; implicit and
incompatible-mapping task explicit learning, 6–7; and
transfer to Simon task, individual differences in
91–92 abilities and backgrounds,
inducing resource investment, 171–72; learning outcomes
70–73 classification and associated
induction and spacing, 272–75, measures of assessment,
273, 274 167; list learning, 17, 28–29,
Information Integration (fusion 47–49, 48; mathematical
task): described, 56–58, 57; model of learning, 249–56,
modeling with IMPRINT, 217– 251, 253, 255; overlearning,
23, 221, 222 26–27; quantitative learning
information lookup in a framework and transfer of
computerized database, learning, 92–94, 93; and
42–44, 43, 44 “testing effect”, 25–26;
instance-based learning (IBL) virtual environments and
tool. See cognitive models of technology-based applied
training principles and the training research, 144–49,
instance-based learning tool 145. See also transfer
342 Index
manual control (MC) and automation experiments, 120–23
mapping between the New Task Taxa and IMPRINT Task Taxa, 160
Matlab platform and human performance during training, 226–27, 232–33
medical training: application of SPRINT to real-world knowledge training, 282–83; knowledge training illustrative case, 268–69
medication administration case study, 299–300
memory: continuous assessment of memory, 59–60, 61; correcting memory deficits following acquired brain injury, 311–13; and depth of processing, 14; and procedural reinstatement, 17; retention: power law of forgetting, 4; for serial lists, 28–29, 47–49, 48; serial lists and tests of multiple principles in a single task, 47–49, 48. See also retention
mental versus physical rehearsal, 23–24
message comprehension, 44–46, 45, 46
metacognitive illusion, 83–84
military training. See new taxonomy for training; technology-based applied training research
mixed tasks with multiple mappings, 97–102
mixing effects specificity, 98–99
mixing of tasks and spatial mappings, 99–103, 101
model evaluations and comparisons, human performance during training, 233–36
model platforms and implementations, human performance during training, 228–30
modeling cognitive tasks in IMPRINT: overview, 9, 328; assessment of modeling fusion in IMPRINT, 220–23, 221, 222; assessment of modeling the RADAR task in IMPRINT, 215–17, 216, 217; cognitive model of data entry (IMPRINT), 207; cognitive model of RADAR task, 213; the data entry task (IMPRINT), 203–9; data to be modeled (fusion), 218–19; data to be modeled (IMPRINT), 204–6; data to be modeled (RADAR), 211; the fusion task, 217–23; human performance during training, 231–32; IMPRINT planning matrix, 176–80; the IMPRINT platform, 201; model assessment (IMPRINT), 208, 208, 209; modeling data entry in IMPRINT, 206–8, 207; modeling fusion in IMPRINT, 219–20; modeling tasks, 202–3; modeling the RADAR task in IMPRINT, 211–15, 213; the RADAR task, 210–17; summary of IMPRINT model of data entry, 208–9; summary of IMPRINT modeling, 223; task to be modeled (fusion), 217–18; task to be modeled (IMPRINT), 203; task to be modeled (RADAR), 210–11; use of IMPRINT for cognitive modeling, 202
modeling tasks, human performance during training, 227–28
motor skills, 15, 23
multiplayer games, 149–51
multitasking, 296–98, 302–4
The Multitasking Myth (Loukopoulos, Dismukes, & Barshi), 294
navigation experiments, active choice and resource investment, 71–72
new taxonomy for training: overview, 8–9, 156–57, 328; IMPRINT planning matrix, 176–80; improving training effectiveness, 172–73; mapping between the New Task Taxa and IMPRINT Task Taxa, 160; new training dimension pedagogy taxa, 162; new training dimension practice taxa, 163; performance assessment, 165–67, 165, 167; performance context, 164–65, 165; possible expansions to the taxonomy, 171–72; task dimension of the new taxonomy, 159; task type, 158–59, 159; taxonomic hierarchy for the cognitive learning domain, 165; training method, 159–64, 160, 162, 163; training principles, 167–68, 168; using the taxonomy, 168–71
Newell, Allen, 181
On-Line Interactive Virtual Environment (OLIVE) system, 149–51
operating room case study, 301–2
optimal modality principle, 44–46, 45, 46
overlearning, 26–27
parameter optimization, 236–40, 238, 239
part task training, 18–19, 77–80
payoff schedules and effort allocation, 106
perceptual/attentional processing, 158–59, 159
performance assessment, new taxonomy for training, 165–67, 165, 167
performance context, new taxonomy for training, 164–65, 165
performance of concurrent tasks, 103–6
performing (reality), real-world job performance training, 294–98
periodic summary feedback, 22
physical/communicative response, 158–59, 159
positive focusing principle, artificial grammar learning, 60, 62–63, 63
power law of acquisition, 3–4
power law of forgetting, 4, 250, 251, 258
power law of practice, 250, 251, 258
practice: deliberate practice, 13–14; depth of processing, 14; retrieval practice, 275–80, 277; variable practice conditions, 29–30
prediction. See training effectiveness prediction
principle of testing and clicker training, 50–52, 51, 53
principles, guidelines, and specifications of training. See training principles
procedural reinstatement principle, 17, 41–44, 43, 44, 296
protective function principle, continuous assessment of memory, 59–60, 61
psychological refractory period (PRP), 104–5
quantitative learning framework, 92–94, 93
quizzing. See retrieval practice (quizzing)
RADAR task: human performance during training, 228; modeling with IMPRINT, 210–17, 213, 216, 217; models of training difficulty principle, 188–92, 189, 190, 191; and new training taxonomy, 170–71; RADAR target detection and decision making, 53–55, 55
radial basis function (RBF) modeling, 225–26, 240–43, 241, 242, 243
random automation (RA), 122–23
real-world job performance training: overview, 10, 287, 329; air traffic control case study, 299; aircraft maintenance case study, 302; application of training research, 335–36; learning from accidents, 287–88; learning from incidents, 288–90; medication administration case study, 299–300; multitasking, 296–98; operating room case study, 301–2; other professions, 298–302; performing (reality), 294–98; train control case study, 300–301; training for multitasking, 302–4; training (theory), 290–94
real-world knowledge training and SPRINT, 282–83
rehearsal and task training, 23–25
research: future empirical research directions, 331–33; future theoretical research directions, 333–34. See also new taxonomy for training; technology-based applied training research; training principles research
resource and effort allocation principles, 13–16
resource demand reduction, 73–81
resource investment induction, 70–73
resource metaphor in task performance, 68–70, 69
response times (RTs): and data entry task modeling in IMPRINT, 204–9, 209; and SRC effect, 89–90
retention: and correcting attentional deficits following acquired brain injury, 309–10; and generation effect, 14–15; power law of forgetting, 4; and procedural reinstatement, 17; response location and retention interval, 61; and spacing, 269–71; and training, 2–3
retention intervals (RIs) and spacing, 21
retraining. See cognitive retraining following acquired brain injury
retrieval distraction principle, 47–49, 48
retrieval practice (quizzing): and exam performance, 275–77, 277; and self-testing, 279–80; and transfer, 277–79
rule-based training, 260
scaffolding: and correcting attentional deficits, 309–10; described, 19–20; and resource demand reduction, 75–77
scanning training, 80–81
scenario-based training, 138–39
schedule for release, error prevention, 76–77
segmentation, part task training, 77–78
self-testing and retrieval practice, 279–80
serial position principle, 28–29, 47–49, 48
shallow processing and resource investment, 70–71
similarity and transfer, laws relating to similarity, 4–5
Simon effect: as function of number of practice trials, 93; quantitative learning framework and transfer of learning, 92–94, 93
Simon task: mixing of tasks and spatial mappings, 99–103, 101; transfer from incompatible-mapping task, 91–92
simplification and resource demand reduction, 74
simulated annealing (SA) and parameter optimization, 237–38, 238
skill dissipation, interference and transfer of learning, 95–96
skill expression, attention during learning and performance of concurrent tasks, 103–4
skills training: and procedural information, 5–6; and spacing, 20–21
small unmanned aerial systems (SUAS), 140–44
soldier visualization station (SVS), 146–49
spacing: in authentic training contexts, 271–72; and induction, 272–75, 273, 274; and retention, 269–71; training principles, 20–21
SPacing, Retrieval practice, and INTerleaving. See SPRINT in knowledge training
Spanair flight 5022 crash, learning from accidents, 288
specificity of mixing effects, 98–99
specificity of transfer, 94–95, 95
speed of responses: cognitive antidote to fatigue, boredom, or task disengagement, 16; and power law of acquisition, 3–4
SPRINT in knowledge training: overview, 10, 267, 329; applying SPRINT to real-world knowledge training, 282–83; effects of retrieval practice (quizzing) on exam performance, 275–77, 277; interleaving, 280–82; knowledge training: illustrative cases, 267–69; retrieval practice (quizzing) and transfer, 277–79; retrieval practice through self-testing, 279–80; spacing and induction, 272–75, 273, 274; spacing in authentic training contexts, 271–72
SRC task, mixing of tasks and spatial mappings, 99–103, 101
Start-End Model (SEM) and fusion modeling in IMPRINT, 219–20
stimulus-response compatibility (SRC): models of stimulus-response compatibility, 192–94, 194; Simon task and Simon effect, 90–91; SRC effect, 89–91; stimulus-response (SR) associations and learning, 90–91; training principles, 28
strategic use of knowledge, 15–16
strategy-based learning (SBL), 182
synthetic task environment (STE), 123–25
task decomposition: decompositional approach to new training taxonomy, 157; performance context and assessment, 164–67, 165, 167; and task type, 158–59, 159, 160; and training method, 159–64, 162, 163; training pedagogy taxa, 162; training practice taxa, 163; and training principles, 167–68, 168
task difficulty, 27–28
task dimension, new taxonomy for training, 159
task disengagement, 16
task parameters and training principles: feedback, 21–23; overlearning, 26–27; rehearsal, 23–25; serial position, 28–29; spacing, 20–21; stimulus-response compatibility, 28; task difficulty, 27–28; testing, 25–26; variability of practice, 29–30
task type, new taxonomy for training, 158–59, 159
taxonomic hierarchy for the cognitive learning domain, new taxonomy for training, 165
taxonomy. See new taxonomy for training
technology-based applied training research: overview, 8, 134, 150–51, 328; adaptive training, 139–44; Army training in transition, 136–51; conducting applied military research, 135–36; distributed multiplayer games, 149–51; learning in virtual environments, 144–49, 145; scenario-based training, 138–39
test of principles in complex, dynamic task or job environments, 53–58
testing, training principles, 25–26
"testing effect", and learning, 25–26
testing effect, training effectiveness prediction, 259
tests of generality of individual principles across tasks or jobs, 41–46
tests of multiple principles in a single task, 46–53
thrashing, navigation experiments, 72, 73
time-sharing skill, part task training and resource demand reduction, 79
train control case study, 300–301
training: acquisition, retention, and transfer, 2–5; conditions of training and functional task development, 17–18; and declarative and procedural information, 5–7; part-task training, 18–19; principles, guidelines, and specifications, 7–8; training and transfer, attention and cognitive resource load, 70; and training effectiveness prediction, 256–61, 257; training method, new taxonomy for training, 159–64, 160, 162, 163; training principles, new taxonomy for training, 167–68, 168; training (theory), real-world job performance training, 290–94; training vs. education, 2. See also training effectiveness prediction; training principles
training and its cognitive underpinnings: overview, 1–2, 8–11; acquisition: power law of acquisition, 3–4; acquisition, retention, and transfer, 2–5; declarative and procedural information, 5–7; principles, guidelines, and specifications, 7–8; retention: power law of forgetting, 4; training vs. education, 2; transfer: laws relating to similarity, 4–5
training compression principle, and clicker training, 50–52, 51, 53
training difficulty principle, RADAR target detection and decision making, 53–55, 55
training effectiveness improvement: and IMPRINT planning matrix, 176–80; and new training taxonomy, 172–73
training effectiveness prediction: overview, 9–10, 247–49, 261–63, 329; acquisition and retention, 250–51, 251; application to training principles, 256–61, 257; attention effects on transfer, 261; deliberate practice, 258; depth of processing, 258–59; generalization depends on similarity, 259; instance- vs. rule-based training, 260; integrated model of acquisition, retention, and transfer, 253–55, 255, 257; mathematical model of learning, 249–56; power law of forgetting, 250, 251, 258; power law of practice, 250, 251, 258; procedural vs. declarative training, 259–60; small unmanned aerial systems (SUAS), 140–44; spacing of practice, 260; testing effect, 259; transfer, 252–53, 253
training principles: overview, 8, 13, 30, 327; cognitive antidote to fatigue, boredom, or task disengagement, 16; context effects principles, 16–20; deliberate practice, 13–14; depth of processing, 14; easy-difficult ordering, 19–20; feedback, 21–23; focus of attention, 15; functional task development, 17–18; generation effect, 14–15; overlearning, 26–27; part-task training, 18–19; procedural reinstatement, 17; rehearsal, 23–25; resource and effort allocation principles, 13–16; serial position, 28–29; spacing, 20–21; stimulus-response compatibility, 28; strategic use of knowledge, 15–16; task difficulty, 27–28; task parameters principles, 20–30; testing, 25–26; variability of practice, 29–30. See also task parameters and training principles
training principles research: overview, 8, 40–41, 64, 327; artificial grammar learning, 60, 62–63, 63; clicker technique, 49–53, 51, 53; continuous assessment of memory, 59–60, 61; hit rate at training and test as function of training and test conditions, 55; identifying and testing new principles, 58–63; information integration, 56–58, 57; information lookup in a computerized database, 42–44, 43, 44; mean total number correct as function of acquisition condition, 63; memory for serial lists, 47–49, 48; message comprehension, 44–46, 45, 46; radar target detection and decision making, 53–55, 55; test of principles in complex, dynamic task or job environments, 53–58; tests of generality of individual principles across tasks or jobs, 41–46; tests of multiple principles in a single task, 46–53
training wheels (error prevention), 19–20, 75–77
transfer: attention effects on transfer, 261; factors affecting transfer of learning, 91–97; illusory competence and choice of training devices, 83–84; integrated model of acquisition, retention, and transfer, 253–55, 255, 257; mixed tasks with multiple mappings, 97–102; performance of concurrent tasks, 103–6; and procedural reinstatement, 17; and training effectiveness prediction, 252–53, 253; and variable practice conditions, 29–30. See also automation influences on training performance and transfer
transfer from incompatible-mapping task to Simon task, 91–92
transfer in mixed tasks with multiple mappings, 97–103
transfer of learning, 91–97
transfer specificity principle, 94–95, 95
traumatic brain injury (TBI). See cognitive retraining following acquired brain injury
trial-by-trial feedback, 22
Unified Theories of Cognition (Newell), 181
universal law of generalization, 252–53, 253
unmanned aerial vehicle (UAV), 123–25
user-initiated automation (UIA), 120–21
variability of practice, 29–30
variable priority training, 80
vehicle control, 123–25
Virtual Battle Space 2 (VBS2), 150–51
virtual environments and Army training, 144–49, 145
Visual Restitution Training (VRT), 315
visual-spatial skills retraining following acquired brain injury, 314–15
worked examples, error prevention and resource demand reduction, 76
WSTC memory mnemonic, 312