
Review of Educational Research

Summer 1993, Vol. 63, No. 2, pp. 201-243

Techniques and Procedures for Assessing Cognitive Skills
James M. Royer, Cheryl A. Cisero, and Maria S. Carlo
University of Massachusetts

The intent of the article is to survey procedures that could be used to assess
progress in instructional programs designed to enhance cognitive skills. The
organizational framework is provided by J. R. Anderson's (1982) theory of
cognitive skill development and by Glaser, Lesgold, and Lajoie's (1985)
categorization of dimensions of cognitive skills. After describing Anderson's
theory, the article discusses the following types of measures of cognitive skills:
(a) measures of knowledge acquisition, organization, and structure; (b)
measures of depth of problem representation; (c) measures of mental models;
(d) measures of metacognitive skills; (e) measures of the automaticity of
performance; and (f) measures of efficiency of procedures. Each of the
sections describing measurement procedures is followed by a discussion of
the strengths and weaknesses of the procedures. The article closes with a
general discussion of techniques for measuring cognitive skills.

Recent developments in cognitive science have provided new avenues for approaching instruction and training. The recent developments include the
development of cognitive theories (or models) of learning and performance and
the development of instructional procedures based on the theories. These new
instructional methods require the development of new techniques for evaluating
the effectiveness of instruction. The purpose of this article is to present a catalog
of procedures that can be used to evaluate the effectiveness of instructional
procedures motivated by cognitive theories of learning.
The intended audience for this article is researchers interested in developing
procedures for assessing cognitive skills. Most of the assessment techniques
described in the article have been gleaned from research studies conducted for
the purpose of testing a hypothesis derived from cognitive theory. For this
reason, the techniques as described are rarely suitable for the purpose of assessing instructional outcomes without some tinkering to make the techniques fit the instructional situation. However, we believe that researchers interested in assessing instructional outcomes can develop useful assessment procedures using the techniques as conceptual starting points.

Author note. This work was supported by the Navy Personnel Research and Development Center (Dr. John Ellis) under the auspices of the U.S. Army Research Office Scientific Services Program administered by Battelle (Delivery Order 1248, Contract No. DAAL03-86-D-0001). The authors would like to thank Tom Andre, Dan Christinaz, John Ellis, William E. Montague, Steve Parchman, Larry Pugh, Nick Vanmatre, Merle Vogel, and Wallace Wulfeck for their valuable comments on an earlier draft of this article. The views, opinions, and/or findings herein are those of the authors and should not be construed as an official Department of the Army position, policy, or decision.
The Purpose and Form of Cognitive Assessment
A central thesis that motivated the writing of this article is that assessment in
cognitive theory-based instructional systems has both a different purpose and a
different form than it had in earlier instructional systems. In behaviorally
oriented (noncognitive) instructional systems, assessment is generally used to
evaluate the effectiveness of instruction, as it relates to an instructional goal, or
to provide an index of the degree to which a student has benefited from exposure
to instruction. Assessment in a cognitively based system has additional goals.
First, cognitive instructional systems view the product of learning as a developing
cognitive skill. This means that one purpose of the assessment is to identify the
student's current status in a developmental model of cognitive skill attainment.
This assessment of status in the developmental sequence of a cognitive skill can
then be used to prescribe future instructional experiences.
Another purpose of assessment in a cognitively oriented instructional system
is to provide diagnostic information. Performance on a cognitive assessment
procedure not only signals the success or failure of the instructional event but
also provides information that allows the instructional system to decide what to
do when learning failures are encountered. The role of errors may be critical in
this process. Errors are frequently considered to be the result of bugs in the
cognitive system, and certain errors are assumed to indicate the existence of
certain bugs (e.g., Burton, 1982). Thus, if a student makes one kind of error on
the assessment event, the system prescribes one kind of instructional event,
whereas a different error would result in the system's prescribing a different
instructional event.
The form of assessment procedures in cognitive instructional systems is also
different from earlier systems. Assessment in behaviorally oriented learning
systems generally focuses on whether learners have acquired the declarative
knowledge associated with the content domain and/or on whether learners can
use the knowledge they have acquired to perform activities such as selecting the
correct answer to a mathematical problem presented in the form of a multiple-
choice question. Success in this noncognitive assessment framework is generally
evaluated in terms of numbers of questions or problems correctly answered. In
contrast to this quantitative orientation to instructional assessment, cognitive
assessment frequently focuses on both qualitative and quantitative aspects of
performance. Successful instruction is thought to result in qualitative changes in
the organization and structure of knowledge and in the fluency and efficiency
with which the knowledge can be used. This means that cognitive assessment
procedures should be able to provide indexes of change in knowledge organiza-
tion and structure and indexes of the accuracy, speed, and resource load of the
activities being performed.
Procedures for assessing gains in cognitive instructional systems have not
received much attention thus far, largely due to the early stage of development of
both the theories and the instructional systems based on the theories. Theory
developers have made heavy use of techniques such as the collection of "thinking
aloud" protocols as a means of evaluating theory and the instructional systems
derived from the theories. Verbal protocols are adequate for small-scale research purposes, but they are impracticable for most instructional purposes: the process of collecting, scoring, and analyzing protocol data is extremely labor intensive, and the labor required is unlikely to be available in
actual instructional situations. Thus, there is a need for assessment procedures
that can assess the learning outcomes of instructional systems that are based on
cognitive theory. In this article, we take an initial step toward the fulfillment of
that need.
Cognitive Skill
A concept that is common to both recent cognitive theories and earlier
behavioral learning theories is cognitive skill. However, the meaning of the
concept is quite different in cognitive and behavioral theories. A cognitive skill in
recent cognitive theories is a holistic capability that has a distinctive history of
quantitative and qualitative developmental change. In contrast, cognitive skills
in earlier behavioral theories were viewed as packets of information that could be
acquired piecemeal. An example of this view can be found in descriptions of the
task analysis process that was an integral part of behavioral approaches to
instructional design (e.g., R. C. Anderson & Faust, 1973). The goal of instruc-
tion in behaviorally oriented theories was to identify and arrange the systematic
acquisition of the skill packets in a prescribed order. Evidence for success of the
instruction could be obtained by determining if the student could demonstrate
that each of the packets could be recalled or used and that successful use of the
packets resulted in the attainment of a behaviorally stated instructional goal.
Recent cognitive theories have a very different perspective and a very different
goal for instruction than the earlier behavioral theories. Given the centrality of
the cognitive skill concept to cognitive theories and the importance of the
concept for assessment issues, the next section of the article will consider the
concept in some detail.
Cognitive skill from a cognitive theory perspective. For purposes of exposition
and organization, this article will assume that the human information processing
system can be divided into three hierarchical layers: a layer of basic capacities, a
layer of cognitive skills that are capable of being transformed from controlled to
automatic/encapsulated processes, and a layer of higher cognitive skills and
capacities that are responsible for goal setting and planning of cognitive activity.
Basic capacities are general capabilities, like working memory capacity and
speed of concept activation, that under normal circumstances are relatively
impervious to instructional intervention. Performance on tasks that are com-
monly used to assess basic capacities can be influenced by extraordinary training
efforts, as witnessed by the Ericsson, Chase, and Faloon (1980) demonstration
that digit span memory can be improved to well over 50 digits by extensive
practice. But these changes in performance are assumed to be the result of improvements in coding strategies rather than changes in basic capacities.
Support for this assumption can be found in the fact that improvements in a
practiced task do not transfer to other tasks that are also dependent on the basic
skill.
The focus of this article will be on the development of instructionally relevant
cognitive skills (the middle and upper layers described in the previous para-
graph). The working definition adopted in this article of a cognitive skill has
several elements. First, cognitive skills consist of an integrated mixture of
specific facts and procedures for utilizing those facts. In other words, cognitive
skills are made up of both declarative and procedural knowledge. Second,
cognitive skills can be acquired through training and/or experience, in contrast to
intellectual abilities, such as intelligence, which are supposedly resistant to
change via training. Third, cognitive skills are applicable to a number of activ-
ities within a defined domain of activity, but their use is generally confined to
that domain. As an example, a cognitive skill utilized by a skilled electronic
troubleshooter will be applicable to troubleshooting activities on a variety of
types of electronic equipment. However, that skill will probably not transfer to
another domain of activity, such as troubleshooting internal combustion en-
gines. Finally, cognitive skills go through several ordered stages while being
acquired. These ordered acquisition stages transform the skill from an activity
that is slow and highly taxing on the cognitive system to an automated set of
activities that may place virtually no load on the system.
Another distinction that may help to capture the subject matter of this article
is that between weak method and strong method problem solving (e.g., Ander-
son, 1987). Weak method problem solving strategies such as solution via anal-
ogy, means-end analysis, and the utilization of worked problems can be applied
to problems in many different domains, but the techniques are not commonly
used by expert problem solvers. Rather, expert problem solvers utilize knowl-
edge intensive techniques that are restricted to the particular domain of exper-
tise. This article will primarily focus on the acquisition of strong-method skills.
The definition of a cognitive skill utilized in the article places several limita-
tions on the nature of the assessment procedures that will be discussed. For
instance, skills, as they are viewed in the article, are the goals of activities (e.g.,
training) designed to develop proficiency in limited domains of activity. These
specifically focused activities can be contrasted to other types of activities, such
as broad-based educational activities, where the goals are to acquire knowledge,
general problem solving procedures, and general learning strategies that can be
generalized to potentially large and unspecified domains of activity. One conse-
quence of this emphasis is that most of the assessment procedures considered in
this article are primarily applicable to adults, the target population for the bulk of
skill training activities.
Organizational Structure for the Article
The discussion of different procedures for assessing cognitive skills will be
organized around a framework provided by J. R. Anderson's (1982) theory of
skill development and around a taxonomy of important dimensions for assessing
achievement proposed by Glaser, Lesgold, and Lajoie (1985). Anderson's the-
ory was chosen to provide organizational structure because it is a developmental
theory: It has survived extensive empirical probings (e.g., Anderson, 1983); it
has relevance to domains of cognitive activity as varied as reading, mathematical
competency, and scientific problem solving; it has guided the development of
instructional activities (e.g., Anderson, Boyle, & Reiser, 1985), and it has been widely recognized as one of the most important theoretical developments of the
cognitive revolution. For instance, Holyoak (1991) refers to Anderson's theory
as "the first grand, overarching theory of cognition since the Hull-Spence
theories of the 1940s and 1950s" (p. 302). The Glaser et al. (1985) article
contributes to the organizational structure of this article by describing dimen-
sions of cognitive skills that could vary in the course of skill acquisition.
Cognitive Skill Development
Procedures used to assess the acquisition of cognitive skills have different
qualities than other assessment procedures. The manner in which cognitive skills
are trained frequently requires assessments of the stage of skill development
rather than an assessment of whether a skill has or has not been acquired. The
emphasis on the stage of skill development means that a scheme for assessing
cognitive skills should be embedded within a theory of skill development. The
discussion of ways of assessing cognitive skills will occur within the context of
such a theory.
Fitts (1964) originally proposed that skill acquisition could be described as a
three-stage process: a cognitive stage, an associative stage, and an autonomous
stage. Anderson elaborated the theory in 1982 and placed it within his ACT
theory of human cognition. Anderson labeled his three stages of skill develop-
ment the declarative stage, the knowledge compilation stage, and the procedural
stage.
The declarative stage. In the declarative stage, as Anderson (1982) describes it,
the learner either encounters or is instructed in the facts relevant to the execution
of a particular skill. These facts are represented in the learner's declarative
memory in the form of statements. In the initial stages of skill acquisition, the
learner uses these facts "interpretatively." The assumption is that the learner
uses a general problem solving strategy to organize the utilization of the facts and
then brings each fact into play as its use is called for by the general strategy.
When performance of the skill is called for, each fact is "interpreted" by the
general problem solver (Anderson, 1982). For example, a trainee enrolled in an
electronic troubleshooting course might have the goal of finding a faulted com-
ponent in an electrical circuit. The process of accomplishing the goal might
involve consulting a troubleshooting manual and executing a series of activities
described in the manual. The general problem solver specifies that the student
performs the first described activity, at which point memory or notes are consul-
ted to determine the next activity to be performed, and so on, with each step
being consulted and interpreted as the preceding step is performed.
A learner in the declarative stage of skill acquisition can answer questions
about the skill, and he or she can perform the skill by interpretatively utilizing
declarative information. However, the learner pays a serious cost for this perfor-
mance in terms of time and working memory load. The process is slow because
the learner must retrieve information from long-term memory (or from notes or
a book) each time a preceding step is completed. The process consumes a great
deal of memory because the learner must keep the goal, the general problem
solving strategy, and the declarative knowledge active in working memory at the
same time. This has the additional consequence of requiring relatively small
amounts of declarative knowledge to be active at any one time, thereby assuring
small steps and increased solution time.
The knowledge compilation stage. The knowledge compilation stage is a transi-
tion stage from the declarative stage to the procedural stage. The factual infor-
mation that has been acquired in the declarative stage is gradually transformed
into a procedural form that can be applied with minimal conscious reasoning
activity. Anderson (1982) suggests that there are two processes that are responsi-
ble for knowledge compilation. The first process is called composition. Composi-
tion is a process wherein several steps that must be followed in problem solution
(Anderson refers to these steps as productions) are collapsed into a single step
that executes each of the embedded steps in sequence. The result of composition
is a greatly speeded up execution of the steps and a reduction in the memory load
for performance of the task.
The second process responsible for compilation is proceduralization. This
process constructs new productions that contain within them the declarative
knowledge that previously had to be retrieved from long-term memory or from
an archival source. A production is a condition/action statement. The condition
part of the statement specifies a stimulus situation that, if present, will produce
the action specified in the statement. As an example, suppose that a trouble-
shooting manual specified that, given a particular circuit symptom (Stimulus
Situation 1), a voltage check on a particular component (Action 1) should be
conducted and, if the reading were a certain value (Stimulus Situation 2), that
step should be followed by readings on a second component (Action 2). A novice
troubleshooter would have to consult the archival source for Step 1, record the
results of that step, consult the source for Step 2, record the outcome of that step,
and so on. Proceduralization results in the development of a production (pro-
cedure) that combines all of the previously separate steps into a single, automat-
ically executed step. That is, the several conditions and actions specified in the
several steps become integrated into a single production. In addition to saving
time, proceduralization also greatly reduces memory load because the declara-
tive knowledge associated with performance of the skill no longer has to remain
activated in working memory.
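To make the production construct concrete, consider the following sketch (our illustration in Python, not Anderson's ACT formalism; the names and the troubleshooting example are invented). It represents productions as condition/action pairs and shows how composition collapses two sequentially executed productions into a single production.

```python
# Hedged sketch of productions as condition/action pairs and of "composition"
# collapsing a fixed two-step sequence into one step. Not Anderson's actual
# ACT implementation; all names are illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Production:
    name: str
    condition: Callable[[dict], bool]   # tests the current problem state
    action: Callable[[dict], dict]      # transforms the problem state

# Two interpretively applied steps from a hypothetical troubleshooting task.
p1 = Production(
    "check-voltage",
    condition=lambda s: s.get("symptom") == "no-output",
    action=lambda s: {**s, "v1": 0.0},           # read voltage on component 1
)
p2 = Production(
    "check-second-component",
    condition=lambda s: s.get("v1") == 0.0,
    action=lambda s: {**s, "fault": "component-2"},
)

def compose(first: Production, second: Production) -> Production:
    """Collapse two productions that always fire in sequence into one
    production that tests only the first condition and executes both
    actions. (A simplification: it assumes the sequence is fixed.)"""
    return Production(
        f"{first.name}+{second.name}",
        condition=first.condition,
        action=lambda s: second.action(first.action(s)),
    )

compiled = compose(p1, p2)
state = {"symptom": "no-output"}
if compiled.condition(state):
    state = compiled.action(state)   # one retrieval, two embedded steps
print(state["fault"])                # -> component-2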
The procedural stage. After the knowledge in a skill area has been translated
into a set of procedures that are applied more or less automatically, there is still
considerable learning that can take place. This learning entails the speeding up of
the application of particular skills to appropriate problems and a strengthening
process in which better rules are strengthened and poor rules are weakened.
The speeding up process is a continuation of the knowledge compilation stage
wherein separate cognitive activities continue to compile, thereby speeding up
their actions. Speeding up also is associated with an increasing associative
relationship between a particular stimulus event and a production that fires in the
presence of that event.
The strengthening process serves to strengthen productions that have general
utility in task performance and to weaken productions that may interfere with
skilled performance. During the knowledge compilation stage, the learner may
develop productions that do not lead to rapid and efficient task performance.
The strengthening process gradually eliminates these productions from the
system and strengthens applicable productions to the point where their use is
automatic and produces little load on the memory system.
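A minimal sketch of the strengthening idea appears below; the particular multiplicative update rule and the rate parameter are our own assumptions rather than the quantitative law proposed in ACT theory.

```python
# Hedged sketch of strengthening: each production carries a strength that
# rises when its application succeeds and falls when it fails, so useful
# rules come to dominate buggy ones. The update rule here is invented.

class TunableProduction:
    def __init__(self, name: str, strength: float = 1.0):
        self.name, self.strength = name, strength

    def update(self, succeeded: bool, rate: float = 0.2) -> None:
        self.strength *= (1 + rate) if succeeded else (1 - rate)

good, poor = TunableProduction("useful-rule"), TunableProduction("buggy-rule")
for _ in range(10):
    good.update(succeeded=True)
    poor.update(succeeded=False)
print(f"{good.strength:.2f} vs {poor.strength:.2f}")  # ~6.19 vs ~0.11
```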
Anderson's theory provides a useful way of describing the course of cognitive
skill development, and it suggests that an important characteristic of skill assess-
ment procedures is the ability to identify a learner's stage of skill development.
The theory does not, however, indicate how skill acquisition should be assessed.
An article by Glaser, Lesgold, and Lajoie (1985) provides a framework that is
useful for describing different skill assessment procedures.
Dimensions of Cognitive Assessment
There are many characteristics associated with performance at each of Ander-
son's stages of skill development. One way to provide order to these characteris-
tics is to utilize a taxonomy of dimensions proposed by Glaser et al. (1985).
Glaser and his associates described the dimensions as a set of components
common to developing skills. These dimensions, in combination with Ander-
son's stages of skill development, provide an organizational framework for
categorizing techniques for assessing cognitive skills. The dimensions suggested
by Glaser et al. (1985) are described below.
Knowledge organization and structure. As a learner first begins to acquire a
skill, the knowledge corresponding to that skill is stored as a set of unrelated or
loosely related facts. As skill develops, these knowledge units become highly
interconnected and structured. As a consequence, the skilled individual acti-
vates large chunks of information when performing an activity within the skill
domain. In contrast, the novice activates isolated facts, definitions, and con-
cepts. The implication of this is that measures of knowledge organization and
structure can provide indexes of skill development.
Depth of problem representation. Individuals who are highly skilled in a
problem domain frequently perceive a problem in terms of abstract principles
that subsume the particular problem as well as related problems. In contrast, the
novice commonly perceives a problem in terms of the particular elements pres-
ent in the problem. The ability to perceive the principles underlying a problem
rather than focusing on the surface structure of the problem is another index of
skill development.
Quality of mental models. As skill develops within a domain of activity, there is
a concomitant development in the learner's ability to envision the operation of
systems within the domain. This ability to imagine a system in operation is called
a mental model of the system. Experts within a domain frequently have complex
and elaborate mental models of the domain that guide their performance while
working within the domain. The presence and the sophistication of these models
are other ways of indexing skill development within the domain.
Efficiency of procedures. In many situations, the difference between skilled
and unskilled performance resides in the efficiency of performance rather than in
the accuracy of performance. Unskilled individuals can frequently reach the
correct solution to a problem by systematically following a fail-safe sequence of
steps. When faced with the same problem, the skilled performer follows a
solution path that eliminates many of the unnecessary steps in the fail-safe
procedure. The ability to efficiently utilize acquired skills is another index of
growing skill development.
Automaticity of performance. When an unskilled individual is performing
within an activity domain, every aspect of performance is frequently based on
conscious reasoning processes. Because the cognitive system has a very limited
capacity, this means that the individual's ability to process information other
than that associated with the immediate problem is nil. In contrast, a skilled
performer can handle many aspects of performance in an automatic and nearly
load-free manner, thereby leaving a certain amount of cognitive capacity avail-
able for performing other activities such as integrating information, planning, or
even performing a completely unrelated task. The ability to perform tasks in an
automatic and capacity-free manner is yet another index of skilled performance.
Metacognitive skills for learning. Metacognitive skills are cognitive activities
that allow an individual to reflect on and to control performance in a useful and
efficient manner. Skilled performers within a domain possess the capability of
planning their activity, monitoring the success or failure of their own activities,
and altering behavior in accordance with the monitoring activity. Less skilled
performers are far less proficient at this monitoring process and, correspon-
dingly, less successful at applying the skills they do possess.
The remainder of this article will describe techniques that can be used to assess
cognitive skills. The description of these techniques will be organized within the
framework provided by Anderson's theory (1982) and the dimensions of cogni-
tive skills of Glaser et al. (1985).
Techniques for Measuring Cognitive Skills
When we first began conceptualizing the organization of the article, we
thought it would be possible to develop a matrix of assessment techniques such
that particular techniques would be identified as possible assessment procedures
for a particular cognitive dimension at a particular stage of cognitive develop-
ment. It soon became apparent that assessment procedures could not be sorted
as easily into the pigeonholes created by the stages of the development/cognitive
dimensions framework as we had envisioned. The problem is that many of the
assessment techniques can be used with benefit during each of the stages of skill
development. We have gone ahead, however, with the categorization process in the
hopes that it would provide a convenient means of identifying the different assess-
ment techniques. The results of this activity are summarized in Table 1. The reader
should view the contents of Table 1 as suggestive rather than conclusive.
Another point that deserves emphasis is that the assessment of cognitive skills
should occur in the context of an effort to identify the cognitive skills required for
the performance of a particular activity. In short, a cognitive task analysis should
be conducted before an attempt is made to train a cognitive skill, and it should
certainly precede any effort to measure the acquisition of a cognitive skill.
Measures of Knowledge Acquisition, Organization, and Structure
This section of the article will discuss techniques that can be used to measure
the extent to which a learner has acquired the information necessary to perform
within a domain and the extent to which that information is represented as an
integrated knowledge structure. The shift from isolated facts and loosely bun-
dled units of information to highly integrated knowledge structures represents
one of the hallmarks of transition from novice to expert performance.

TABLE 1
Cognitive skill assessment techniques

Author | Type of task | Developmental level of cognitive skill

Knowledge acquisition
  Traditional assessment | Declarative
  Ronan et al., 1976 | Fireman tab test | Declarative
  Lesgold & Lajoie, 1991 | Recall of electronic components | Declarative

Knowledge structure and organization
  Shepard, 1962 | Multidimensional scaling | All levels
  Geeslin & Shavelson, 1975 | Associative recall of concepts | All levels
  Chi et al., 1982 | Conceptual recall of physics concepts | All levels
  Konold & Bates, 1982 | Concept ratings | All levels
  Konold & Bates, 1982 | Concept categorization | All levels
  Reitman & Rueter, 1980 | Concept free recall | All levels
  Adelson, 1981 | Free recall of computer programs | All levels
  Guthrie, 1988 | Document search | All levels
  Card et al., 1980 | Text editing | All levels
  Royer, 1990 | SVT assessment | All levels
  Carlo et al., 1992 | Inferencing assessment | All levels

Depth of problem representation
  Chase & Simon, 1973 | Chess perceptual reproduction | All levels
  Chase & Simon, 1973 | Chess memory reproduction | All levels
  Egan & Schwartz, 1979 | Reproduction of electronic circuits | All levels
  Barfield, 1986 | Program recall | All levels
  Chi et al., 1981 | Physics problem sorting | All levels
  Schoenfeld & Herrmann, 1982 | Math problem sorting | All levels
  Weiser & Shertz, 1983 | Computer program sorting | All levels
  Hershey et al., 1990 | Financial planning search | All levels
  Hardiman et al., 1989 | Physics problem judgments | All levels
  Carlo et al., 1992 | Classification of scientific principles | All levels
  Adelson, 1984 | Flowchart comprehension | All levels
  Adelson, 1984 | Insert missing line of program code | All levels
  Goulet et al., 1989 | Identification of tennis serves | All levels
  Allard et al., 1980 | Recall of basketball positions | All levels
  Purkitt & Dyson, 1988 | Information usage in political decision making | All levels

Mental models
  McCloskey et al., 1980 | Prediction of flight path | Declarative/compilation
  Gentner & Gentner, 1983 | Identifying underlying metaphors | Declarative/compilation
  Lopes, 1976 | Poker mental models | All levels
  J. R. Anderson, 1990 | Correct and buggy productions | All levels
  Johnson, 1988 | Malfunctioning generator models | All levels
  Lesgold et al., 1988 | X-ray drawing | All levels

Metacognitive skills
  Baker, 1989 | Text faulting | All levels
  Rosenbaum, 1986 | Visit planning | All levels
  Gerace & Mestre, 1990 | Planning in physics problem solving | All levels
  Lesgold et al., 1990 | Problem space planning | All levels
  Sweller et al., 1983 | Changes in problem solving strategy | All levels

Automaticity/encapsulation of performance
  Lesgold & Lajoie, 1991 | Speed of conceptual processing | All levels
  Schneider, 1985 | Dual task methodology | All levels
  Britton & Tesser, 1982 | Dual task methodology | All levels

Efficiency of procedures
  Glaser et al., 1985 | Card sorting of assembly procedures | Declarative
  Lesgold & Lajoie, 1991 | Multimeter judgment | All levels
  Lesgold & Lajoie, 1991 | Multimeter placement | All levels
  Lesgold & Lajoie, 1991 | Logic gate efficiency | All levels
  Green & Jackson, 1976 | Hark-back technique | All levels

Measures of knowledge acquisition. Traditional measurement procedures such as short answer, true-false, matching, and multiple-choice tests can be used to provide indexes of the extent to which learners have acquired the declarative knowledge required to function in a domain (i.e., the first step in skill development). These measurement techniques have been shown to provide reliable and valid assessments when the purpose of assessment is evaluating whether specified knowledge has been acquired. Moreover, when the exams are criterion
referenced, they are likely to be good indicators of the extent to which learners
have mastered the individual units of information that provide the initial building
blocks of a developing cognitive skill. It should be noted, however, that the
concept of mastery, as it is used in the context of behaviorally oriented instruc-
tional systems, is a misnomer in light of theories of cognitive skill development.
Schneider (1985, 1986) has shown that improvements in skilled performance
continue long after errorless performance is achieved. Thus, a student who has
achieved a mastery level of performance on a criterion-referenced test may be
years away from achieving the levels of performance attained by experts.1
Measures other than traditional procedures can also be useful in assessing the
degree to which individuals have acquired the declarative knowledge necessary
to function in their domain. An interesting example of a technique that uses a
variety of stimulus materials has been reported by Ronan, Anderson, and
Talbert (1976). They assessed fire fighters' ability to integrate situational condi-
tions at the scene of a fire with their job knowledge by using a tab test format. The
experimenters constructed nine tab tests to isolate knowledge about possible
rescue techniques, possible methods of providing protection from exposure to
victims being rescued, equipment necessary for rescue, possible methods of
access, equipment necessary for ventilation, possible methods of fire suppres-
sion, possible salvage and overhaul techniques, procedures in the event of a
malfunctioning emergency Air-Pak, and procedures for dealing with a malfunc-
tioning pumper. Ronan et al.'s (1976) tab test was constructed by creating a
hypothetical fire situation using both a description of the fire situation and
detailed drawings of the scene. Alternative answers or solutions to the problem
situation were listed alongside a box in the right-hand column that contained
either a yes (the alternative is correct) or no (incorrect solution) covered by silver
ink. The examinee's task was to find the most appropriate answer (or answers,
depending on the tab test) and erase the silver ink to determine if the correct
alternative had been chosen. The test was scored by subtracting the number of
erasures required to locate the correct alternative(s) from the total number of
alternatives given. Attractive properties of this technique are that it allows the
construction of more complicated problems and response alternatives than
would typically be found on a multiple-choice test and that the examinee receives
immediate feedback as to the correctness of a response.
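The scoring rule lends itself to a direct computational statement; the following sketch simply transcribes it (the function and variable names are ours, not Ronan et al.'s):

```python
def tab_test_score(total_alternatives: int, erasures_used: int) -> int:
    """Score a tab-test item as described by Ronan et al. (1976):
    total alternatives minus the number of erasures needed to uncover
    the correct alternative(s). Fewer erasures yield a higher score."""
    return total_alternatives - erasures_used

# An examinee who finds the correct alternative on the first erasure of a
# 10-alternative item scores 9; one who needs four erasures scores 6.
print(tab_test_score(10, 1))  # 9
print(tab_test_score(10, 4))  # 6
```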
Lesgold and Lajoie (1991) have described two other techniques that have been
used to assess the acquisition of declarative knowledge. The first technique
entailed providing Air Force men working in avionics with a list of 14 electronics
components and then asking them to tell everything they knew about each compo-
nent. Their responses, which were written in list form or as short answers, were
guided by response categories that directed the airmen to describe the physical,
functional, operational, and applicability properties of each component.
The second technique tapped a somewhat deeper level of knowledge acquisi-
tion in that it entailed presenting airmen with the seven functional components of
a radar system and asking them to arrange the components in a manner that
reflected the interaction of the components in a functioning radar system.
From the perspective of the theory of cognitive skill acquisition, procedures that
measure only the extent to which learners have mastered the declarative knowl-
edge essential for functioning in a domain have a serious limitation. Specifically,
they can provide an index of whether a learner has acquired the declarative
information necessary to function in the domain, but they do not indicate where
the learner falls along a skill development continuum. For instance, a learner
who performed satisfactorily on a traditional test could be a relative novice who
has just mastered the necessary declarative information or an expert who is
highly skilled at performing the activities involved in functioning in the domain.
The fact that the learner can correctly recall or identify the declarative knowl-
edge making up the domain is of little help in distinguishing between an expert
who can perform a task with great fluency and efficiency and the novice who has
memorized the steps in task performance but actually performs the task in a slow
and error-prone manner.
Measures of knowledge organization that are based on assumptions about the
associative nature of memory. As learners begin to acquire expertise in a domain,
the declarative knowledge they have acquired evolves into knowledge structures
that resemble the tightly integrated structures that are characteristic of an
expert's representation of the knowledge within a domain. There are several
techniques that have been developed to measure the nature of a developing
knowledge structure that are based on the assumption that concepts having
conceptual similarity with one another are associatively related in memory.
A classic procedure for describing knowledge structures makes use of multi-
dimensional scaling procedures (Shepard, 1962). Multidimensional scaling re-
quires the collection of pairwise judgments of similarity for all of the concepts or
elements being evaluated. These judgments are then subjected to scaling analysis, which provides a multidimensional "portrait" of a learner's representation of the domain under investigation. The learner's portrait can then be compared to the portraits of experts and novices to evaluate which it more closely matches. Salomon (1991) has also
presented an interesting discussion of how multidimensional scaling could be
used to assess classroom learning outcomes.
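As an illustration of the analysis pipeline, the following hedged sketch derives a two-dimensional portrait from a matrix of pairwise similarity judgments; the concepts and ratings are invented, and scikit-learn's MDS implementation stands in for whatever scaling procedure an investigator prefers.

```python
# Hypothetical sketch: a two-dimensional "portrait" from pairwise similarity
# judgments via multidimensional scaling. Concepts and data are invented.

import numpy as np
from sklearn.manifold import MDS

concepts = ["voltage", "current", "resistance", "capacitor", "inductor"]

# Learner's pairwise similarity judgments on a 1 (unrelated) to 7 (identical)
# scale, collected for all pairs; converted to dissimilarities for scaling.
similarity = np.array([
    [7, 6, 5, 3, 2],
    [6, 7, 5, 3, 2],
    [5, 5, 7, 2, 2],
    [3, 3, 2, 7, 5],
    [2, 2, 2, 5, 7],
], dtype=float)
dissimilarity = 7 - similarity

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)

for concept, (x, y) in zip(concepts, coords):
    print(f"{concept:>10s}: ({x:+.2f}, {y:+.2f})")
# The learner's configuration can then be compared (e.g., via interpoint-
# distance correlations) to configurations obtained from experts and novices.
```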
Multidimensional scaling is an attractive means of representing knowledge
structure because of the objective nature of the mathematical representations it
provides. However, it has serious shortcomings as a practical means of represent-
ing knowledge structures in instructional contexts. First, the collection of the
data is extremely tedious and time consuming. For example, Shepard and
Chipman (1970) were interested in the extent to which subjects stored visual
representations of the shapes of states. They asked their subjects to make
pairwise similarity judgments of the shapes of 15 states. They described their
subjects as complaining that the nearly hour-long task was "extremely trying"
(p. 4). This problem is compounded in many domains where there may be
substantially more than 15 concepts that are of interest to instructors or investi-
gators. Another problem with multidimensional scaling techniques involves the
process of deciding whether a novice's representation resembles a desired repre-
sentation. To date, there does not seem to be a simple and easily understandable
index that would allow one to determine that a learner had achieved a desired
degree of knowledge organization.
Perhaps in response to the complexity of using multidimensional scaling
procedures, several investigators have developed techniques for representing
knowledge structure that are somewhat easier to use. The first class of pro-
cedures is based on the assumption that experts possess knowledge representa-
tions that are organized so that concepts that are similar or connected concep-
tually will be represented within the same semantic space. Moreover, the
assumption suggests that the nature of this semantic space can be revealed by
using measures of the degree to which words are associated with one another. For
instance, one popular technique involves presenting learners with concepts from
a domain, one at a time, and asking them to write down other domain concepts
that come to mind (e.g., Geeslin & Shavelson, 1975; Johnson, 1969; Shavelson,
1972). The notion is that the concepts coming to mind and their order of mention
provide an index of the degree to which the concepts are associated in memory.
These indexes of association can then be compared to indexes of association
derived from the co-occurrence of concepts in instructional texts or to indexes
derived from the concept association performance of experts in the domain.
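One plausible way to quantify such comparisons is sketched below; the order-of-mention scoring scheme and the example data are our assumptions, not a procedure reported by these authors.

```python
# Hedged sketch: score each concept a learner produces to a cue by its order
# of mention (earlier = more strongly associated), then correlate the
# learner's ordering with an expert's. Scoring scheme and data are invented.

from scipy.stats import spearmanr

def association_strengths(responses: list[str]) -> dict[str, int]:
    """Map each produced concept to a strength score: the first-mentioned
    concept gets the highest score, later ones progressively less."""
    n = len(responses)
    return {concept: n - i for i, concept in enumerate(responses)}

expert_responses = ["current", "resistance", "voltage drop", "Ohm's law"]
learner_responses = ["resistance", "current", "Ohm's law", "voltage drop"]

expert = association_strengths(expert_responses)
learner = association_strengths(learner_responses)

shared = sorted(set(expert) & set(learner))
rho, _ = spearmanr([expert[c] for c in shared], [learner[c] for c in shared])
print(f"overlap: {len(shared)}/{len(expert)}, rank agreement rho = {rho:.2f}")
```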

Another technique that is based on the idea that knowledge structure can be
characterized by using indexes of associative memory has been reported by Chi,
Glaser, and Rees (1982). They had expert and novice physicists examine a list of
category labels from the domain of physics and then tell everything they could
about the category. The protocols were then transformed into node-link struc-
tures that presumably reflected the manner in which the concepts were repre-
sented in memory. The structures provided by the experts were interpreted as
being more tightly integrated than those provided by the novices.
Another associative technique involves having experts determine the degree
of relationship between concepts in a domain and then specifically choosing
concepts having particular relationships for learners to rate. For instance, Ko-
nold and Bates (1982) had experts arrange psychological concepts into groups of
pairs that had strong, moderate, slight, or negligible conceptual relationships.
These pairs of concepts were subsequently given to learners who rated the four
possible relationship categories into which the concepts fell. The ratings of
learners were then compared to the ratings provided by the experts.
Konold and Bates (1982) also used another technique to represent the knowl-
edge structures of their subjects. In the second technique, they obtained con-
cepts from a text that were associated with the fields of behavioral, cognitive, and
humanistic psychology. Subjects were then asked to place randomly ordered
concepts into the appropriate school of psychology, and their classification
accuracy was then compared to that obtained from the text.
A somewhat different form of knowledge structure assessment, based on
assumptions about the associative structure of memory, has been reported by
Reitman and Rueter (1980). They take advantage of the well-known finding that
subjects will utilize some form of structured recall when asked to free recall
words. Reitman and Rueter presented subjects with a list of words that could be
organized into categories based on conceptual similarities within a content
domain. Subjects were then asked to free recall the words, and the extent to
which subjects recalled the words in category order was taken as an index of the
extent to which a subject's knowledge was organized in accordance with the
conceptual organization of the domain.
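A simple version of such an index is sketched below; published clustering measures (e.g., the adjusted ratio of clustering) correct for chance, whereas this bare proportion of same-category adjacencies is only illustrative.

```python
# Hedged sketch of a category-clustering index for free recall protocols:
# the proportion of adjacent pairs in the recall order that come from the
# same conceptual category. Items and categories are invented.

def clustering_index(recall_order: list[str], category: dict[str, str]) -> float:
    pairs = list(zip(recall_order, recall_order[1:]))
    same = sum(category[a] == category[b] for a, b in pairs)
    return same / len(pairs)

category = {"resistor": "component", "capacitor": "component",
            "ohm": "unit", "farad": "unit", "volt": "unit"}
recall = ["resistor", "capacitor", "ohm", "farad", "volt"]
print(clustering_index(recall, category))  # 0.75: 3 of 4 adjacent pairs match
```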
Adelson (1981) has also reported a study that utilized free recall organization
as an index of knowledge structure. She investigated how novice and expert
computer programmers represent and use programming concepts by presenting
them with 16 lines of randomly organized programming code for nine presenta-
tion and free recall trials. The subjects were told that the code, which could be
organized procedurally as three separate programs or syntactically as five cate-
gories of commands, could be organized and that they might find it helpful to
impose an organization during recall. The extent to which the subjects grouped the code either procedurally or syntactically was taken as an index of the degree to which their programming knowledge was organized.
Indirect measures of knowledge organization. The procedures for assessing
knowledge structure that were described in the preceding section assumed that
knowledge within a domain was organized around conceptual topics and that an
index of organization could be attained by directly measuring associative aspects
of memory. The measures to be described in this section are indirect measures of
cognitive structure in that they do not purport to directly measure an index of
knowledge structure. Rather, they measure performance on a task that presuma-
bly is sensitive to the degree to which knowledge is tightly organized. The notion
is that if subjects do well on these tasks it would indicate that subjects have
achieved a high degree of knowledge acquisition and organization.
Guthrie's (1988; Guthrie, Britten, & Barker, 1991) research on document
search represents an instance of an indirect measure of knowledge organization.
Guthrie and his colleagues use a computer based environment to present sub-
jects with search tasks and the materials necessary to accomplish the tasks. For
instance, subjects might be presented with an air travel computer environment
which contained menu driven subcategories of information about things like
meals, arrival and departure times, entertainment, cost, and days of operation
(Guthrie, 1988). The subjects would then be presented with a problem such as arranging for a flight to Los Angeles that had to be booked at or under a given cost and had to arrive at a particular time. The computer would record all of the
subjects' keystrokes as they proceeded to find the material necessary to solve the
problem.
One attribute that surely determines a subject's efficiency of search through a document is the amount and organization of the domain knowledge the subject possesses. Accordingly, the efficiency with which a learner can
search through a document to find relevant information provides an indirect
measure of the degree to which a learner has acquired and organized domain
knowledge. This task should be particularly attractive in those training situations
that have developed computer presentations of supporting documentation.
Modifications of the documentation programs could easily be developed that
would save the learner's search activity and thereby provide an index of knowl-
edge acquisition and organization.
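The following sketch suggests one way such a keystroke log might be reduced to an efficiency index; the log format and the minimal-path scoring are our assumptions rather than Guthrie's published procedure.

```python
# Hedged sketch: reduce a logged sequence of menu selections to an efficiency
# index, the proportion of selections that were on the minimal path to the
# information the problem required. Log format and fields are invented.

def search_efficiency(log: list[str], minimal_path: set[str]) -> float:
    selections = [entry for entry in log if entry.startswith("menu:")]
    needed = sum(entry in minimal_path for entry in selections)
    return needed / len(selections)

log = ["menu:cost", "menu:meals", "menu:cost", "menu:arrival-times"]
minimal = {"menu:cost", "menu:arrival-times"}
print(f"{search_efficiency(log, minimal):.2f}")  # 0.75: one unneeded detour
```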
Card, Moran, and Newell (1980) have reported a study that is similar in some
respects to Guthrie's work. They examined how experienced text-editor users
select methods and combine selection rules into sequences by using an actual
text-editing task on a computer. In the experiments, individuals with at least 1
year's experience using the POET Text Editor were given an editing task involv-
ing an 11-page memo marked with 73 corrections in red pen and were instructed
to edit the manuscript using the computer. Keystrokes typed by the users and the
system's responses were recorded by the computer. These keystroke files were
then subsequently examined as evidence of the extent to which the subjects had
mastered the knowledge required to function in the editing situation.
Another example of the indirect assessment of knowledge acquisition and
organization can be found in Royer, Carlo, and Cisero's (1992) research on
assessing comprehension using the Sentence Verification Technique (SVT).
SVT tests are constructed by developing four types of text sentences based on the
sentences in an original passage: originals, which are exact copies of a passage
sentence; paraphrases, which entail changing as many words as possible in a
passage sentence without altering the meaning; meaning changes, which entail
changing one or two words in a passage sentence so that the meaning of the
sentence is changed; and distractors, which are sentences that are consistent with
the theme of the passage but are unrelated to any passage sentence. An SVT test
consists of an equal mix of each of the test sentence types. An examinee takes an
SVT test by reading (or listening to) a passage and then classifying each test
sentence as a yes or no sentence. Yes sentences have the same meaning as a
passage sentence (originals and paraphrases), and no sentences have a different
meaning than passage sentences (meaning changes and distractors).
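Scoring an SVT test reduces to comparing each response with the keyed yes/no status of its sentence type, as in the following sketch (the data structures and the per-type breakdown are our own, consistent with the description above):

```python
# Hedged sketch of SVT scoring: originals and paraphrases are keyed "yes,"
# meaning changes and distractors "no"; the score is proportion correct,
# reported overall and by sentence type.

SENTENCE_KEY = {"original": "yes", "paraphrase": "yes",
                "meaning_change": "no", "distractor": "no"}

def svt_scores(items: list[tuple[str, str]]) -> dict[str, float]:
    """items: (sentence_type, examinee_response) pairs.
    Returns proportion correct per sentence type and overall."""
    by_type: dict[str, list[bool]] = {}
    for stype, response in items:
        by_type.setdefault(stype, []).append(response == SENTENCE_KEY[stype])
    scores = {t: sum(v) / len(v) for t, v in by_type.items()}
    scores["overall"] = (sum(map(sum, by_type.values()))
                         / sum(map(len, by_type.values())))
    return scores

items = [("original", "yes"), ("paraphrase", "no"),
         ("meaning_change", "no"), ("distractor", "no")]
print(svt_scores(items))  # the missed paraphrase yields 0.75 overall
```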
The theoretical rationale for SVT tests is provided by theories of language
comprehension. Modern theories of comprehension (e.g., Perfetti, 1988; van
Dijk & Kintsch, 1983) assert that comprehension is a constructive process in
which the reader or listener must utilize the incoming linguistic message and his
or her prior knowledge to construct an interpretation of the message. The
interpretation is then stored in long-term memory. This suggests that compre-
hension could be measured by assessing the degree to which a reader or listener
had successfully stored a meaning preserving representation of a text that had
been read or heard. The SVT technique was designed to provide such an index.
The constructive view of language comprehension suggests that readers or
listeners must possess relevant prior knowledge that is appropriately organized
in order to comprehend a text. The research by Royer and his associates confirms
this expectation. For instance, Royer, Lynch, Hambleton, and Bulgareli (1984)
reported several studies showing that comprehension of a technical text as
indexed by SVT performance varied in accordance with the degree of domain
knowledge possessed by the subjects. Moreover, two other articles (Royer,
Abranovic, & Sinatra, 1987; Royer, Marchant, Sinatra, & Lovejoy, 1990) dem-
onstrated that the extent to which a student could comprehend text drawn from a
textbook to be used in a college course was a significant predictor of the grade the
student would receive in the course. This relationship was interpreted as indicat-
ing that both the comprehension of the text material and the amount learned in
the course were mediated by the amount of prior knowledge the learner had
about the course content.
Royer's SVT comprehension assessment technique has several attractive
properties as an indirect measure of amount and organization of knowledge
within a domain. First, the technique can provide assessments of either listening
or reading comprehension based on virtually any text material. Examples of
materials that have served as the source of tests are materials drawn from
military training manuals, materials from textbooks, materials from news-
papers, and materials from scripts used in radio programs. Second, the tech-
nique is easy to use, and most people with a solid knowledge of the domain the
text is drawn from can develop tests that have robust psychometric properties
(see Royer, 1990, for a review of SVT research).
An extension of the SVT technique called the Inference Identification Technique (IIT) has recently been reported by Carlo, Royer, Dufresne, and Mestre (1992). An IIT test consists of a series of passages and accompanying tests. Each
test consists of a set of test sentences, half of which are true inferences and half of
which are false inferences. In addition, half of the inferences (both true and
false) are near inferences, derived by combining information in two separate
sentences in the text, and half are far inferences, derived by combining a unit of
information in the text with prior knowledge the learner must have about the
domain. Carlo and her associates have shown that subjects varying in back-
ground knowledge in two domains (physics and psychology) perform on IIT tests
in a manner consistent with the interpretation that the tests are sensitive to
variations in domain knowledge.
Comments on measures of knowledge acquisition and organization. The mea-
sures of knowledge acquisition and organization described in the previous sec-
tion have strengths and weaknesses, depending on the purpose of the assess-
ment. As mentioned earlier, if the intent of instruction is to assure that learners
have acquired the declarative knowledge specified in instructional goals, tradi-
tional forms of assessment such as multiple-choice tests have very attractive
properties. In particular, they can be developed using well-established rules of
test development, and performance on the tests can be easily interpreted. The
difficulty with traditional forms of knowledge acquisition assessment is that they
are useful only for determining if a learner has acquired the declarative knowl-
edge that is needed to function in the domain.
Measures of knowledge organization that rely on assumptions about the
associative nature of memory also have strengths and weaknesses. Their
strengths are particularly apparent in situations where the instructional goal is
concerned with acquiring knowledge in a manner that results in a tightly inte-
grated knowledge structure. The associative measures described in the preced-
ing section are attractive for this purpose because they provide direct indexes of
the associative nature of knowledge organization. The shortcomings of the
associative measures are that they do not indicate whether the knowledge that
has been acquired can be used for any useful purpose. It is entirely possible that
an instructional sequence could result in the production of tightly integrated
knowledge structures as revealed by associative measures but that the learner
could not then do anything with the knowledge that had been acquired. For
instance, one could easily teach a subject to mimic the performance of a domain
expert on an associative task by presenting the concepts in the domain in paired
associate or list-learning formats. But the subject could not then use the asso-
ciatively learned concepts in any meaningful task.
The indirect measures of knowledge acquisition and organization also have
strengths and weaknesses. Their weaknesses involve the fact that they provide
indirect indexes of the extent to which knowledge has been acquired and orga-
nized. There are undoubtedly many situations where instructional designers
would not be satisfied with inferential evidence that knowledge has been ac-
quired and organized. In these cases, indirect measures would not be attractive
choices as an assessment procedure.
Indirect measures are attractive alternatives, however, in situations where
instructional designers are very interested in whether acquired knowledge trans-
fers to meaningful tasks within the domain. For instance, many domains require
people to consult documents, tables, schematics, or reference manuals while
employed at their jobs. Thus, Guthrie's (1988) measure of a learner's document
search efficiency provides an indirect index of the extent to which a learner has
acquired and organized domain knowledge and an index of the extent to which a
learner can perform a meaningful job activity.
In a similar fashion, almost all jobs require people to acquire new job related
information either through reading or listening. Thus, Royer's listening and
reading comprehension assessment procedure could be used to provide an index
of the extent to which a learner had acquired information necessary to compre-
hend new information and an index of a learner's future success in new job
related learning situations.
It should also be noted that measures of knowledge acquisition and organiza-
tion are only interpretable if one has a sense of the developmental properties of
knowledge attainment. Specifically, assessment specialists who are interested in
conducting assessments in a manner consistent with the theory of cognitive skill
development will have to measure different groups of learners at different points
in the instructional process, and they will have to measure acknowledged experts
in the domain, novices who are ignorant of the domain, and successful and
unsuccessful learners who are attempting to develop expertise in the domain.
These data would be essential for appropriately interpreting the test performance
of students currently receiving instruction.
The data from experts, novices, successful learners, and unsuccessful learners
typically would be collected in a cross-sectional manner. But an argument could
be made that interpretations of performance will also have to be based on
longitudinal examinations of learners in a training environment. The problem is that cross-sectional portraits of performance indicate where examinees are at particular points in a training program, but they do not indicate how performance changed between assessment periods. Information about performance profiles
during the entire period of skill development would be extremely valuable in
developing assessment/instructional systems that were truly diagnostic and
prescriptive.
Depth of Problem Representation
Chi, Feltovich, and Glaser (1981) noted that novices tend to recognize and
attend to the surface features of problems whereas experts tend to identify
inferences and principles that subsume the surface features. This general finding
has been noted in a broad domain of activities, and a variety of procedures have
evolved to identify the phenomenon. These techniques could be used to differen-
tiate between learners in the early stages of skill acquisition and those who had
attained considerable expertise.
Some of the earliest research examining differences in problem representation
occurred in studies involving novice and expert chess players. A classic example
is a study by Chase and Simon (1973) who attempted to identify the perceptual
structures (chunks) that novice and expert chess players perceive. They found
that skilled players could extract more information from a brief exposure of a
chess position than less skilled players and that skilled players could encode the
chess position into larger chunks than less skilled players. The differences
between skilled and less skilled players disappeared when the chess pieces were
in random order. These results were taken as evidence that experts and novices
were representing different kinds of information when examining a chess posi­
tion. Specifically, the experts were representing information as chunks by labeling board configurations in terms of familiar games or positions. In contrast, novices represented the board in terms of individual pieces in particular places.
Since the early chess research, there have been a large number of studies that
provide evidence consistent with the interpretation that experts and novices
represent problems in different ways. The techniques used in these studies can
be divided into those that use sorting techniques to examine problem representa­
tion, those that use some form of reproduction of a presented task, those that ask
the learners to make a judgment about the nature of the task, and those that ask
the learner to judge a physical skill.
Techniques that ask learners to reproduce domain information. Egan and Schwartz (1979) reported a study, using skilled and unskilled electronic technicians as subjects, that utilized a methodology similar to that used by chess
researchers. The experiments they reported were similar to the chess experi-
ments in that novice and expert subjects were asked to reconstruct legitimate and
illegitimate circuits after brief exposures. The results revealed a significant recall
advantage for expert subjects when exposed to legitimate circuits but no differ-
ence between the recall of experts and novices when the circuit to be recalled was
illegitimate. Moreover, Egan and Schwartz presented convincing evidence that
the recall advantages for expert subjects were not attributable to superior
guessing on the part of the experts relative to the novices. Instead, it seemed
apparent that experts were representing circuit diagrams as chunks of diagram-
matic information whereas novices were focusing on much smaller units of
information.
As was the case with the earlier chess research that showed that experts have
superior recall for legitimate chess positions, the Egan and Schwartz (1979)
studies can be interpreted as indicating that knowledgeable electronic techni-
cians encode a briefly presented circuit by activating a knowledge representation
that allows the subject to chunk the contents of the circuit. In contrast, the novice
is more likely to develop a representation that encodes individual elements in a
circuit. Findings similar to this using computer programmers as subjects have
been reported by several investigators (Barfield, 1986; Vessey, 1988).
For example, Barfield (1986) conducted a study in which BASIC programmers
characterized as being naive, novice, intermediate, or expert were presented
with 25-line BASIC programs that were organized either in executable order, as
random lines, or as random chunks. The subjects were asked to recall the
programs after viewing them for 5 minutes. The results indicated that the naive
and novice groups did not differ in terms of their recall on any of the program
conditions and that the experts were superior to naive and novice subjects in all
recall conditions. The experts were also superior to the intermediate program-
mers when the programs they were asked to recall were either executable or
presented as random chunks. The performance of the experts was not different from that of the intermediates, however, when the programs were in random order.
Sorting performance as an index of problem representation. Another technique
that has been interpreted as revealing interesting differences between experts
and novices in the ability to represent problem information utilizes a problem
sorting methodology. For instance, Chi et al. (1981) presented physics problems obtained from a college-level introductory physics text to undergraduates with little experience in physics, to graduate students, and to professors of physics.
The subjects were instructed to sort the problems into groups that could be
solved in the same way. They found that the novices tended to sort the problems
on the basis of surface features of the problems whereas experts tended to sort on
the basis of the physical principles involved in the problems. These results were
interpreted as indicating that novices focused on surface features like inclined
planes or pulleys whereas experts saw the problems as involving principles like
Newton's second law or angular momentum.
In a study that used a methodology similar to that used by Chi et al. (1981),
Schoenfeld and Herrmann (1982) assessed knowledge structure and problem
representation of experts and novices in the domain of mathematics by using a
card sort procedure. Each of the 32 problems used in the study was assigned an a priori mathematical deep structure and surface structure by the first author (a mathematician), assignments that were generally agreed on by other mathematicians. Nineteen freshman and sophomore college students with 1-3 semesters of college
mathematics experience (novices) and nine mathematics professors (experts)
read through the problems in random order and sorted the problems into
categories that could be solved the same way. The results paralleled those from
studies in other domains in that higher levels of expertise were associated with
the tendency to sort problems on the basis of deep structure characteristics.
A study examining the problem sorting performance of experts and novices in
the domain of computer programming has been reported by Weiser and Shertz
(1983). Programming problems were constructed so that they would fit into one
of three categories: application area (a plausible source of surface features for
novices), algorithm, and data structure. The algorithm and data structure attrib-
utes of the problems were considered to be high level attributes in that both were
considered to be foundations of programming knowledge. The results of the
study indicated that experts tended to use higher level attributes of programming
language as the basis for sorting whereas novices tended to focus on surface level
attributes.
A variant of a problem sorting procedure has been reported by Hershey,
Walsh, Read, and Chulef (1990) in their studies of expert and novice financial
planners. They presented experts and novices with a financial planning problem
and tracked the order and nature of the information the two groups consulted as
they solved the problem. Information that could be consulted was presented on a
series of overturned cards, each with an identifying label on the back. Of
particular interest in the problem was the extent to which subjects appeared to
have a systematic and efficient card selection process as they proceeded toward a
solution to the problem. The card selection of expert planners suggested that
they represented the problem differently than novices.
Asking learners to make judgments about problems or situations. The problem
sorting procedure developed by Chi et al. (1981) to examine problem representa-
tion differences between experts and novices is useful for experimental research,
but it is cumbersome to use in instructional settings. The recall procedures used
in the chess research and the circuit diagram research are much more tractable in
instructional settings but still require labor-intensive scoring efforts. There are,
however, several techniques that have been reported that measure depth of
problem representation in a manner much more amenable to instructional use.
Hardiman, Dufresne, and Mestre (1989) presented expert and novice physicists
with a series of standard problems and two accompanying comparison problems.
The standard problem was based on a physical principle, and the comparison
problems could match the standard problems in surface structure (S), deep structure (D), both deep and surface structure (SD), or neither deep nor surface structure (N). The subject's task in the experiment was to decide which comparison
problem could be solved in the same way as the standard problem.
The technique developed by Hardiman et al. (1989) is easily administered and
scored, but it does have a shortcoming in that three, possibly complex, problems
must be read and evaluated before the subject makes a single response with a
correct-by-chance probability of 0.5 (i.e., chooses one of the two comparison
problems as a match for the standard problem). This means that considerable
testing time must be expended in order to acquire sufficient information to
provide reliable assessments. A technique similar to the Hardiman et al. tech-
nique has been reported, though, that provides considerably more information
in approximately the same amount of testing time.
Carlo et al. (1992) have reported a study that involved presenting knowledge-
able and less knowledgeable students with a standard problem and four compari-
son problems. The comparison problems were similar to those used by Hardi-
man et al. (1989) in that the four comparison problems consisted of an S, D, SD,
and N problem. A subject's task was to examine the standard problem and then
classify each comparison problem in terms of whether it embodied the same
principle/concept as the standard problem. The study, which was conducted in
the domains of both physics and psychology, provided results consistent with
earlier studies in that knowledgeable subjects were able to identify the principles
embedded in the problems with greater facility than were the novice subjects.
Moreover, the difference between knowledgeable and novice subjects in terms of
accuracy of classification was greatest in the situations in which there was a
discrepancy between the surface features and deep structure features of the
problem (i.e., the S and D problems).
The Carlo et al. (1992) modification of the Hardiman et al. (1989) procedure is
useful in instructional settings in that the technique provides four subject re-
sponses for every five problems that the subject evaluates. This compares to one
response for every three problems evaluated in the Hardiman et al. technique.
The added data acquired should provide more reliable assessment in a shorter
time period.
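To make the scoring of such items concrete, the following is a minimal sketch in Python; the item representation and the scoring rule are assumptions based on the description above, under which a comparison problem embodies the same principle as the standard only when it matches the standard's deep structure (the D and SD types).

```python
# A minimal sketch of scoring one item set in the Carlo et al. (1992)
# task; the representation is an assumption based on the description in
# the text. A comparison problem shares the standard problem's principle
# only if it matches in deep structure (the D and SD types).
CORRECT_ANSWER = {"S": False, "D": True, "SD": True, "N": False}

def score_item_set(responses):
    """responses: the subject's True/False judgment, for each comparison
    type, that the problem embodies the same principle as the standard."""
    return {ptype: responses[ptype] == CORRECT_ANSWER[ptype]
            for ptype in CORRECT_ANSWER}

# A novice misled by surface features might respond this way:
novice = {"S": True, "D": False, "SD": True, "N": False}
print(score_item_set(novice))  # the S and D judgments are scored incorrect
```

Note also the arithmetic behind the efficiency claim: the Carlo et al. format yields 4 scoreable responses for every 5 problems read (0.8 responses per problem), whereas the Hardiman et al. format yields 1 response for every 3 problems read (roughly 0.33 responses per problem).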
Another procedure for assessing problem representation that involves asking
subjects to make judgments about a problem has been reported by Adelson
(1984). She examined the representations constructed by novice and expert
computer programmers using time to understand flowcharts and error rates on
questions about program functioning as indexes of problem representation. In
one experiment, she presented subjects with a flowchart that was organized
either abstractly, in terms of what the program was designed to do, or concretely,
in terms of how the program functioned, and recorded the time required for the
subjects to indicate that they comprehended the flowchart. In another experi-
ment, the subjects were presented with either a task that required the subject to
insert a missing line of code in a program that would allow the program to
perform a particular function or a task that required them to debug a flawed
program.
Adelson's (1984) procedure for assessing problem representation by asking
programmers to debug a program is very similar to assessing troubleshooting
capability in other domains such as electronic troubleshooting. But her pro-
cedure for assessing competence by inserting a missing line of code also could be
used in domains other than programming. For instance, one could imagine
presenting electronic troubleshooters with an incomplete circuit that is designed
to perform a particular function and asking learners to provide a component that
would make the circuit work.
Representations of physical actions. Interest in differences between skilled and
unskilled individuals in problem representation has also been evident in domains
involving the representation of physical action. For instance, Goulet, Bard, and
Fleury (1989) examined expertise differences in perception and selection of
visual information by comparing the performance of expert and novice tennis
players in identifying types of tennis serves. In the studies, subjects were pre­
sented with 16-mm films showing right-handers and left-handers delivering flat, topspin, and sliced serves. The task was to identify the type of serve as quickly and accurately as possible. The number of correct responses was recorded, and visual
search patterns were examined using an eye movement recorder. The expert
tennis players were shown to be both faster and more accurate than the novices in
picking up the serve being delivered.
In another study involving the perception of physical performance, Allard,
Graham, and Paarsalu (1980) explored the relationship between skill in perform­
ing and skill in perceiving in basketball players. The subjects were females who
were experienced basketball players or inexperienced basketball players. The
two groups of subjects were presented with 20 structured (involving execution of
set plays in a half-court offensive situation) and 20 unstructured (involving
transition situations) slides of basketball situations and were asked to recall
positions of players after viewing a slide for 4 seconds. In results paralleling those
for cognitive domains, the experienced players were superior to the inex­
perienced players in recalling the position of players in structured situations but
not in unstructured situations.
Problem framing in complex problem solving situations. An investigation of
problem representation in a highly complex problem space has been reported by
Purkitt and Dyson (1988). They reported a study that examined the cognitive
basis of political decision making in experts and novices. Groups of three novices
or experts were asked to examine (for a week) either quantitative or qualitative
information about a fictitious country with the goal of formulating an
annual budget for the country. Videotape and written data collected in the study
indicated that the expert and novice groups approached the problem in different
ways. Specifically, expert groups would begin their sessions with an effort to
frame an issue before proceeding with formulating a response to a specified
problem. In contrast, novice groups immediately began attempting to provide
answers to a specified problem.
Comments on procedures for assessing problem representations. As the studies
reviewed in the previous section indicate, depth of problem representation is a
characteristic that distinguishes novice and competent performers in many dif­
ferent domains. Of the techniques reviewed in the previous section, only the card
sorting techniques would seem to be of limited value in instructional situations.
The problem with the card sorting techniques is that they are somewhat cumber­
some to administer and score. However, a paper-and-pencil variant of card
sorting techniques could prove to be more tractable. For instance, examinees
could be presented with a series of problems and asked to indicate on an answer
sheet which problems are similar in a specified manner. This technique would be
easier to use than card sorting techniques, but research would have to establish
that the measurement properties of the technique were not destroyed by limiting
the examinees' ability to physically manipulate the problems.
The other techniques described in this section of the article are viable alterna-
tives in instructional settings. Reproduction procedures, such as that illustrated
by the Egan and Schwartz (1979) study, are relatively easy to administer, and
they have the attractive property of directly involving a meaningful domain
activity. As the Egan and Schwartz research exemplifies, electronic trouble-
shooters are frequently required to consult schematic diagrams while working.
Thus, their ability to process and remember elements of a diagram has direct
relevance to job performance.
The judgment procedures also have attractive properties. Some of the tech-
niques can be readily administered as paper-and-pencil procedures, and tests
could be machine scored, thereby increasing the practicality of the procedures.
These techniques do have a shortcoming, however, of being somewhat removed
from job performance activities. This means that the connection between perfor-
mance on the test and job performance must be inferential rather than direct.
Most of the techniques reviewed in this section involved relatively simple situations. However, the Purkitt and Dyson (1988) study examined a problem representation activity that occurred in the context of a highly complex problem solving situation. This study serves to demonstrate that depth of
problem representation can be assessed even in very complex performance
environments.
A serious shortcoming of the depth of representation procedures is that the
research provides little information about the developmental properties of the
skills that enable task performance. As indicated in Table 1, our judgment is that
the techniques would prove useful in assessing a range of competency from
beginning novice to expert. We could, however, be very wrong about this, and
the techniques may prove to be useful over a much narrower range of compe-
tence. It is entirely possible, for example, that trainees may have to acquire
competence beyond the acquisition of declarative knowledge before they are
able to process problem representations at even a superficial level.
Mental Models
Mental models are qualitative, domain specific representations formed by
learners as they are acquiring competence in a domain (cf. Gentner & Stevens,
1983). Glaser et al. (1985) suggest there are three types of representations. The
first, which they call a qualitative process model, is an internal representation of a
physical device along with a set of procedures for running that device. In a
problem solving situation, the qualitative process model allows the learner to
mentally simulate actions of the physical device under various parameter change
conditions, thereby facilitating problem solution.
A second type of model is called an appearance model. An appearance model
is similar to the qualitative process model in the sense that it is a representation of
a physical device. However, an appearance model provides a static representa-
tion of the device after it has undergone a change.
The final type of model described by Glaser et al. (1985) is a relational model,
which is a representation that encodes the features of a device as they relate to
other, perhaps better known, devices. It is quite possible that relational models
are the first type of model formed during the early stages of learning and that the more sophisticated process and appearance models form as the learner acquires
expertise in the domain.
Anzai and Yokoyama (1984) have described mental models in physics in
somewhat different terms. They classify models as being experiential, correct
scientific, or false scientific. Experiential models are derived from individual
experience and do not include scientific entities or relations. A correct scientific
model is a set of scientific concepts and the relations that are correct and
sufficient to capture problem information. A false scientific model is a model
that contains scientific concepts and relations, but the model incorrectly charac-
terizes problem information.
As an example of each of Anzai and Yokoyama's (1984) model types, consider how one might think about a ball falling to earth. A physically naive subject
might literally picture a falling ball. A more sophisticated subject might imagine
the correct scientific model, which would characterize the falling ball in terms of
Galileo's model relating acceleration, mass, and force. Finally, a subject with a
false scientific model might imagine the ball in terms of Aristotle's model of a
falling object, which incorrectly substituted velocity for acceleration.
Glaser et al.'s (1985) classification of mental models is useful in categorizing
types of models, but the Anzai and Yokoyama (1984) terminology is also
important because it makes the point that not all models are equal. It would be
very important in the educational process if one could determine whether a
learner was operating with a correct or incorrect mental model. In the following
discussion of particular techniques for assessing mental models, the Glaser et al.
(1985) scheme will serve as the organizational structure, but the reader should
keep in mind the importance of determining whether the learner has a correct or
an incorrect mental model.
Assessing relational models. McCloskey, Caramazza, and Green (1980) have
reported an interesting study that assessed the relational model underlying
predictions of physical motion. They presented subjects with a picture of a
spiraling tube and asked them to imagine that a pellet was shot through the tube.
The subject was then asked to draw a line that would predict the flight of the
pellet as it left the tube. Many of their subjects drew a line consistent with an
erroneous mental model that would suggest that the spiral tube imparted a
curvilinear path to the pellet as it left the tube.
Other examples of attempts to assess relatively simple relational models have
been reported by Riley (1985) and by Gentner and Gentner (1983) who asked
subjects questions about the state of an electrical circuit under varying condi-
tions. In the case of the Gentner and Gentner research, these questions were
directed at determining the nature of the model that subjects were using to guide
their question answering performance. For example, the models many of the
subjects seemed to be using were based on two metaphors: one that suggested
that the flow of electricity was analogous to the flow of water and another that
suggested that electricity flow could be characterized as the movement of teem-
ing crowds.
Assessing qualitative process models. An example of a study that attempted to
assess a qualitative process model has been reported by Lopes (1976). Lopes
used a modified version of five-card stud to examine how a poker player's mental
model of the game influences his or her play. Three experiments were reported in
which experienced poker players were given a standard five-card stud hand (a
pair of sevens and three unknown cards) and one or two competing hands
consisting of four cards up and a down hole card. The subjects were also given a
description of the playing style of opponents that could be labeled as conserva-
tive, risky, or average along with descriptions of the betting patterns that had led
up to the current situation. Lopes collected a variety of information from her
subjects including predictions about the likelihood of winning, the amount of a
bet subjects were willing to risk, and protocol data that assessed subjective
assessments of probabilities, rule-governed behavior, and strategic practices.
Another study that made use of protocol data to assess mental models has
been reported by Johnson (1988), who examined the influence of the mental
models of expert and novice troubleshooters as they worked with a malfunction-
ing generator. The procedure for assessing the state of mental models involved
the development of an ideal conceptual representation of a generator and the
construction of a problem space based on the ideal representation and
on the nature of the faults the subjects were going to encounter during the
troubleshooting phase of the experiment. After several practice problems, the
subjects were presented with a faulted generator, schematics, wiring diagrams,
and technical manuals and asked to diagnose the nature of the difficulty and to
think aloud while doing so. The verbal protocols were recorded and subse-
quently mapped onto the problem spaces based on the ideal model and the
nature of the fault. These mappings were interpreted as indicating the nature of
the mental model the subject was using during the troubleshooting activity.
Anderson's (e.g., 1990b) work with computer-based LISP tutors represents
yet another approach to the assessment of mental models. Anderson and his
colleagues have categorized instructional instances in tutorial sequences and
then have attempted to specify correct and incorrect moves that would occur in
the presence of the instances. These correct and buggy moves are then specified
as productions and used to model student performance in the lessons. The
models then dictate the nature of the instructional material and feedback the
student will encounter as instruction proceeds.
Assessing appearance models. An appearance model is a mental representa-
tion that encodes how a particular device or state is supposed to look under
specified conditions. A good example of a study that explicates the concept of an
appearance model and that provides a procedure for measuring one has been
reported by Lesgold, Rubinson, Feltovich, Glaser, Klopfer, and Wang (1988).
Lesgold et al. asked medical doctors with varying degrees of expertise to
examine a set of X rays that were normal, were examples of a common and
readily diagnosable malady, or were examples of difficult-to-diagnose abnormal-
ities. They then asked the subjects to think aloud as they were examining the
X rays and to draw outlines on the X rays that would represent the normal state
of the organ being inspected and that represented the state of the organ that they
considered to be abnormal. The results of the study indicated that the expert
diagnosticians were superior to the novices in their ability to correctly diagnose
the malady and that the experts were more accurate in depicting the appearance
of normal and abnormal organs in their X-ray drawings. These drawings were
assumed to provide an index of the physician's mental model of normal and
diseased organs.
Comments on procedures for assessing mental models. The studies described
in the previous section demonstrate the importance of mental models in func-
tioning in a variety of domains. Moreover, the research suggests that the identi-
fication of the nature of the mental model the learner is using could have
important instructional value. That is, knowing that a learner was working with an incorrect model, and knowing the nature of that model, would be an important diagnostic that could lead to an instructional intervention specifically designed to repair the flawed model. In an ideal assessment/instruction
system, this would involve the identification of all of the flawed models that
learners might possess, the development of procedures for uniquely identifying
each of the models, and the development of instructional procedures designed to
repair the specifically identified flawed model.
Some of the research examined in the section on mental models was concerned
with relatively simple situations where a single model was highlighted. Instances
are the McCloskey et al. (1980) research on flight paths and the Gentner and Gentner (1983) research on the identification of metaphors underlying conceptualizations of electricity. Both the McCloskey and the Gentner and Gentner
studies represent instances of relational mental models. That is, the models can
be analogical (or metaphorical) in the sense that the target activity can be
represented in terms of activities more familiar to the subject. The assessment of
relational mental models should be most useful in situations where trainees are
just beginning a course of study and it is important to identify gross conceptual
defects that may hinder subsequent learning and understanding. If they are used
later in a course of study, they are likely to be superfluous, or they would identify
students who have wasted considerable amounts of instructional time by not
having been diagnosed at an earlier point.
The remaining techniques described in the section represent instances where
there is an attempt to evaluate qualitative and appearance mental models. The
Anderson (1990), Johnson (1988), and Lesgold et al. (1988) articles utilized
procedures that, with some modification, could be used in a variety of situations.
The Johnson study involved the development of an ideal mental model against
which the mental model of a learner could be contrasted. A variant of this
technique might involve presenting learners with models of devices that were
similar, but not identical, to ones they had previously worked with and then
conducting various exercises, like troubleshooting faults in the model and pre-
dicting the outcome of parameter changes in the device. The idea is that learners
with a good mental model of the device being studied should be able to transfer
that model to a similar device, thereby enabling them to perform activities such
as troubleshooting and state predictions.
The Johnson technique involves a qualitative process model that the learner
runs to solve the problem at hand. The Lesgold et al. (1988) studies would seem
to involve an appearance model in which the physician envisions the state of an
organ, under various healthy and diseased states, and then contrasts an actual
X ray with the envisioned models. Again, this would seem to be a type of task
that could be modified to fit a variety of circumstances. One could imagine a
situation where learners were presented with a device and asked to predict the
state of that device under a variety of normal and unusual circumstances. Their
predictions could be recorded in a variety of ways including the drawing pro-
cedure utilized in the Lesgold et al. study.
One attribute of the Johnson (1988) and Lesgold et al. (1988) procedures that
would limit their use in instructional situations is the utilization of protocol data.
Protocol data is very valuable in research situations but too resource-demanding to be of use in instructional situations. It is very likely, however, that assessment
researchers could develop procedures whereby the same kind of information
gathered in a protocol format could be gathered in a less resource-consuming
fashion. This would allow the full utilization of mental model assessment
procedures in instructional conditions.
Metacognitive Skills
Metacognition is a general term that refers to one's capability of governing and
being aware of one's own learning activities. Examples of metacognitive perfor-
mance include an awareness of what one knows and what one does not know,
utilizing learning strategies that vary with the nature of the material to be learned
and the task demands of the learning situation, being able to predict the success
of one's learning efforts, monitoring the success of current learning efforts, and
planning ahead and utilizing learning time in an efficient manner.
Individuals who exhibit considerable skill in a content area are assumed to also
have well-developed metacognitive skills for operating in that area (e.g., Glaser
et al., 1985). The emphasis in the previous sentence on operating in a specific
area is intended in that many metacognitive skills are likely to be domain
specific. Glaser (1984) made this argument explicitly when he suggested that
attempts to teach students general problem solving strategies that could be used
in a variety of domains had been of little benefit. There are generalized abilities
such as reading (cf. Perfetti, 1989) and mathematics, but these are probably the
exceptions rather than the rule.
Assessing comprehension while reading. A considerable amount of attention
has been devoted to metacognitive abilities of young children when they are
learning to read (e.g., Baker & Brown, 1984). Recently, Baker (1989) has
reviewed the literature of metacognitive performance of adults while they are
reading with a particular focus on comprehension monitoring activities. The
general assessment procedure for examining metacognitive awareness involves
faulting a text in some way and determining if the reader can detect the fault.
Baker reported that there are seven different ways of faulting a text. It can occur
at the lexical level, where the understanding of individual words is assessed; at
the syntactic level, where the grammatical and syntactical appropriateness of a
text is evaluated; at the external consistency level, where text presented ideas are
evaluated for truth value using prior knowledge; at the propositional cohesive-
ness level, where the coherence of propositions is evaluated; at the internal
consistency level, where the reader evaluates whether ideas in a text are consis-
tent with other ideas in the same text; at the structural cohesiveness level, where
ideas are checked to see if they are thematically compatible; and at the informa-
tional completeness level, where the reader checks to see if all of the information
needed to complete a specified goal is contained in a text.
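The taxonomy lends itself to systematic item construction. The sketch below simply lists the seven levels alongside the evaluation each probes, paraphrasing the description above; the sample faulted passage at the end is our own invented illustration, not an item from Baker's (1989) review.

```python
# The seven text-faulting levels reviewed by Baker (1989), paired with
# the kind of evaluation each level probes (paraphrased from the text).
FAULT_LEVELS = {
    "lexical": "understanding of individual words",
    "syntactic": "grammatical and syntactic appropriateness",
    "external consistency": "truth of text ideas against prior knowledge",
    "propositional cohesiveness": "coherence of adjacent propositions",
    "internal consistency": "agreement among ideas within the same text",
    "structural cohesiveness": "thematic compatibility of ideas",
    "informational completeness": "presence of all goal-relevant information",
}

# Hypothetical internal-consistency item (invented for illustration):
# the passage contradicts itself, and detecting the contradiction
# indicates monitoring at that level.
sample_item = ("internal consistency",
               "Keep the switch open throughout the test. ... With the "
               "switch closed, as required, begin the test.")
```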
Comprehension monitoring tasks could be used in much the same way that
Royer's (1990) earlier described SVT technique for measuring reading and
listening comprehension might be used. Specifically, the comprehension mon-
itoring tasks could be used to assess the degree to which learners were able to
understand texts drawn from a content domain. The one advantage that compre-
hension monitoring tasks might have over the SVT task is that they can be
administered at a number of different text processing levels, thereby adding to
the diagnostic potential of the procedures.
Assessing planning skills. Rosenbaum (1986) has described a technique that
can be used to assess planning skills that are not tied to any specific domain. The
task, as described by Rosenbaum, involves the computer presentation of a home
circle in the middle of a computer screen and three target circles that can be
located at varying distances from the home circle. The subject is given a problem
specification consisting of instructions to visit the circles (using mouse movements), with specified stops at the home circle during the visitations. The infor-
mation acquired in the procedure includes the time elapsing between the initial
presentation of the problem and the initial move, and between each of the
subsequent moves (these times are presumed to reflect planning time), and
measures of the total distance encompassed by the mouse moves (presumably an
index of planning efficiency). It should be noted that there is no reason for
Rosenbaum's procedure to require computer presentation. It is likely that a
paper-and-pencil version of the task could also be developed.
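The two indexes are straightforward to compute once trials are logged. A minimal sketch follows, under the assumption that each move is recorded as a timestamped mouse position; this logging format is our own, not taken from Rosenbaum's report.

```python
# A minimal sketch of Rosenbaum-style planning measures, assuming each
# move is logged as a (time_in_seconds, x, y) record with the problem
# presented at time 0. The logging format is our assumption.
import math

def planning_measures(moves):
    initial_latency = moves[0][0]  # presumed to reflect initial planning
    # Pauses between successive moves: presumed on-line planning time.
    inter_move_times = [b[0] - a[0] for a, b in zip(moves, moves[1:])]
    # Total path length: presumably an (inverse) index of planning efficiency.
    total_distance = sum(math.dist(a[1:], b[1:])
                         for a, b in zip(moves, moves[1:]))
    return initial_latency, inter_move_times, total_distance

# Example trial: first move after 2.4 s, three moves in all.
print(planning_measures([(2.4, 0, 0), (3.1, 30, 40), (4.0, 30, 100)]))
```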
A more domain specific example of assessing planning activities has been
reported by Gerace and Mestre (1990). They developed a procedure for mon-
itoring the problem solving capabilities of learners who were studying physics.
They conceptualized physics problem solving as being decomposable into two
activities: a solution plan and an executed plan. The solution plan could be
broken into the subactivities of identifying a concept or procedure to be applied
to the solution of the problem and identifying a reason why the procedure or
concept was applicable. Having identified pertinent principles and reasons for
their application, the learner could then execute the plan to solve the problem.
To illustrate Gerace and Mestre's (1990) technique, imagine a learner who is
presented with the problem of determining the acceleration of a chair and given
the weight of the chair, the applied force, and the force of friction operating on
the chair. The authors suggest that an efficient problem solving plan would
entail: drawing a free-body diagram with an x (horizontal) y (vertical) coordinate
system to show all of the forces operating on the chair (reason: to identify and
keep track of the forces on the chair and to allow the breakup of Newton's second
law into component forms), the application of Newton's second law to the chair
(reason: to relate the net force operating on the chair to the chair's acceleration
and mass), finding the x and y components of the net force and writing Newton's
second law in component form (reason: the components are necessary to solve
for the unknown acceleration), and using the expression taken from Newton's
second law in the x direction to solve for acceleration. Once the learner had
carried out the planning steps, actual problem solution (the executed plan)
would involve the relatively simple process of plugging numbers into the appro-
priate equations and performing arithmetic calculations.
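Once the plan above is in place, the executed plan really is just arithmetic. The sketch below works the chair problem with invented numbers; the weight, applied force, and friction values are ours, not Gerace and Mestre's.

```python
# A hypothetical executed plan for the chair problem; all numerical
# values are invented for illustration.
g = 9.8                     # m/s^2
weight = 196.0              # N, the chair's weight (assumed value)
applied_force = 50.0        # N, horizontal applied force (assumed value)
friction_force = 10.0       # N, opposing the motion (assumed value)

mass = weight / g                             # from W = mg -> 20 kg
net_force_x = applied_force - friction_force  # x component of the net force
acceleration = net_force_x / mass             # Newton's second law, x direction
print(f"a = {acceleration:.1f} m/s^2")        # prints: a = 2.0 m/s^2
```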
Gerace and Mestre (1990) describe two types of tests that can be used to assess
problem solving planning. In the first, learners are given a problem with a
completed execution plan and asked to fill in the solution plan—that is, asked to
provide the concept or procedure to be applied (thereby resulting in a step in the
executed plan) and to supply the reason that it is applicable. In a more difficult
version of the task, learners are presented with a problem and a sheet of paper
with spaces where they can provide both the solution plan and the executed plan.
Lesgold, Lajoie, Logan, and Eggan (1990) have reported a technique that
utilizes a think aloud methodology for assessing planning activities in electronic
troubleshooters. The technique initially involves developing a representation of
an effective problem space for a particular troubleshooting problem. An effective
problem space is best understood by contrasting it with a complete problem
space. A complete problem space for a particular problem would consist of all of
the possible tests that could be conducted on an electronic system. An effective
problem space would consist of a representation of the system at the level of
replaceable units, such as a printed circuit board. This simplification greatly
reduced the complexity of the problem space, and, according to Lesgold et al.
(1990), effectively captured the level of detail necessary to analyze planning
activities during a troubleshooting exercise.
Once the effective problem space was specified, novice and expert trou­
bleshooters were presented with a problem and asked to suggest hypotheses,
state plans, and specify the steps they expected to take in solving the problem.
These protocols then were mapped onto the representations of the effective
problem space. These mappings revealed that experts utilized only the relevant
portions of the effective problem spaces and their plans indicated a systematic
and linked transition through the problem space. In contrast, the mappings of
many of the novices revealed utilizations of nonrelevant aspects of the effective
problem space and isolated islands of activity with no means of transitioning
from one activity to another.
Using efficient problem solving strategies. A number of studies have shown that
individuals who have considerable domain expertise tend to utilize knowledge-
based (forward-working) strategies during problem solving whereas subjects
with less expertise tend to use means-end (backward-working) strategies. Sev­
eral studies have reported ways of examining these differences. For instance,
Sweller, Mawer, and Ward (1983) examined the change in the use of means-end
to knowledge-based strategy as subjects solved a series of kinematics problems
having similar form. The problems were presented via computer, and solutions
were recorded by the subjects by typing the forms of their equations into the
computer. A means-end strategy was assumed if an equation containing the goal term was written or verbalized before the value of a variable was calculated from the givens. A forward strategy was assumed if the value of a variable was calculated from the givens before the equation containing the goal term was written or verbalized.
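Under an assumed event-log representation of a solution protocol (nothing about this encoding comes from Sweller et al.), the scoring rule reduces to checking which kind of event comes first:

```python
# A minimal sketch of the Sweller, Mawer, and Ward (1983) scoring rule,
# assuming each protocol entry is ("equation", contains_goal_term) or
# ("calculation", value_from_givens). The encoding is our assumption.
def classify_strategy(events):
    for kind, flag in events:
        if kind == "equation" and flag:
            return "means-end (backward-working)"       # goal equation first
        if kind == "calculation" and flag:
            return "knowledge-based (forward-working)"  # value from givens first
    return "unclassified"

print(classify_strategy([("equation", True), ("calculation", True)]))
# means-end (backward-working)
print(classify_strategy([("calculation", True), ("equation", True)]))
# knowledge-based (forward-working)
```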
Comments on assessing metacognitive skills. There is a considerable body of
research on assessing metacognitive performance while reading, and it is quite
possible that the techniques reviewed by Baker (1989) would be of value in
instructional settings. In particular, some of the techniques described by Baker
could be of diagnostic use. She reports that good readers generally have better
metacognitive skills than poor readers, and it is possible that learners who were
identified as having poor skills could benefit from instruction designed to im­
prove their abilities. There is, however, one caveat to be noted when considering
metacognitive assessment in instructional settings. Baker (1989) notes that the
literature suggests that adult readers who have good metacognitive skills are very
poor at estimating how much they have learned from a text they have read. This
suggests that there may be limited benefit in metacognitive training for adult
readers.
The techniques for assessing planning activities and for examining the effi-
ciency of procedures appear promising, but there is little research assessing the
value of the procedures. It should be noted, however, that the Gerace and
Mestre (1990) technique of assessing planning performance is part of an instruc-
tional effort designed to improve physics problem solving. A demonstration that
an assessment procedure can be tied to an instructional procedure makes the
technique much more attractive in instructional settings designed to enhance
cognitive skills.
The think aloud technique described by Lesgold et al. (1990) may provide a
valuable departure point for future research. A particularly attractive aspect of
their research is the attempt to specify the problem space and to then develop
procedures for mapping the plans of the troubleshooters onto the problem
space. In order for the technique to be practical in an instructional system,
however, procedures would have to be developed that would make use of a
technique that is less labor intensive.
Automaticity/Encapsulation of Performance
One hallmark of expert performance is the capability of performing domain
tasks rapidly and with little memory load. Glaser et al. (1985) and a number of
earlier writers (e.g., LaBerge & Samuels, 1974) referred to this capability as
automaticity of performance. A number of difficulties have developed, however,
with the automaticity concept (cf. Stanovich, 1990), and a number of writers
have moved toward the concept of informational encapsulation as a means of
describing the change that occurs in processing capability as expertise develops.
Informational encapsulation is a construct that derives from modularity the-
ory (e.g., Fodor, 1983; Forster, 1979). Processes that are encapsulated are fast,
but, in addition, they are data driven and impervious to influence by higher level
cognitive activities. The term encapsulated processes has its origin in the notion
that the processes occur in a cognitive component that is impenetrable (i.e., is
encapsulated) by other cognitive processes. This means that encapsulated proc-
essing cannot be influenced by contextual events that occurred before the target
processing event and that the encapsulated processes cannot be influenced by
strategic actions. The data-driven characteristic of the processes refers to the fact
that cognitive processes are automatically activated by particular stimulus events.
Word identification in skilled readers is a prototypical example of encapsu-
lated processing. Numerous studies have shown that skilled readers activate a
lexical entry for an exposed word independent of contextual support and that
they do so even when they consciously try to suppress lexical identification
(Stanovich, 1990).
The concept of encapsulated processing suggests that expert performance is
not merely a matter of performing activities accurately and rapidly. Rather,
some expert activities must be performed with a minimal resource load. Regian
and Schneider (1990) place considerable emphasis on this point. They suggest
that there can be considerable variability in speed of performance among
learners acquiring cognitive skills and among competent experts. The true
expert, though, is identified by the ability to perform some cognitive skill
activities with minimal demand on precious cognitive resources. A technique
that involves measuring performance on two tasks simultaneously provides an estimate of the
resource load that accompanies task performance. This technique will be de-
scribed in greater detail later in this section.
Regian and Schneider (1990) provide a number of examples showing that at
least some of the processing activities conducted by experts in a domain have the
characteristics of encapsulated processing. The next section of the article de-
scribes several ways of detecting if some aspects of domain functioning are
supported by automatic/encapsulated processing.
Assessing word and conceptual processing. The preceding discussion suggests that experts in a domain may develop the ability to process concepts from that
domain more rapidly than novices. Lesgold and Lajoie (1991) have described a
procedure for measuring the speed of semantic retrieval of domain concepts that
involves presenting subjects with stimuli consisting of word-word, word-pic-
ture, and picture-picture pairs. An example of the procedure, appropriate in an
electronics training context, might involve the development of stimulus pairs
consisting of the correct and incorrect combinations of the words for electronic
components and the pictures of components. Changes of the speed of processing
of the pairs could be taken as indexes of movement toward automatic processing
of electronic concepts.
The procedure of pairing stimulus arrays and asking subjects if they are similar
in some fashion could be used to assess processing skills ranging from the
relatively simple to the highly complex. The Lesgold and Lajoie (1991) pro-
cedure mentioned above provides an example of the processing of relatively
simple concepts. To illustrate a procedure that would assess a much more
complicated processing capability, imagine a situation where a trainee is pro-
vided with two schematic circuit diagrams and asked to make a decision about
whether the circuits perform the same function. This type of procedure could be
performed with circuits that had been studied before as an index of material
learned during training, and it could be performed with previously unseen
circuits as an index of ability to transfer a learned capability.
Research that evaluates several other procedures for assessing the automatic/
encapsulated processing of technical terms is currently ongoing in our laboratory
at the University of Massachusetts. We have developed a computer-adminis-
tered battery of tasks that assess the speed and accuracy of several levels of the
processing of technical, psychological terms. At the simplest level, a word naming task requires subjects to say, as quickly as they can, the name of a technical term (e.g., synapses, paranoia, episodic) appearing on the computer screen. The
next, more complex task presents subjects with conceptual categories drawn
from areas of psychology (e.g., abnormal, biological, methods) and then pres-
ents pairs of terms on the computer screen (e.g., phobia, affective; median,
oedipal) and asks the subjects to decide if the terms come from the same area of
psychology. The final task is a variant of a cloze task in which a sentence is
presented on the computer screen and the subject decides which of two terms
best completes a blank in the sentence. An example of such a sentence is: Some
abnormal behavior can be attributed to physical injury to the (neocortex/neuro-
transmitter).
All of the tasks described above involve the collection of both accuracy and
response time data and are directed at evaluating the extent to which text
processing capability improves as a function of gains in expertise in a technical
domain.
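As a rough illustration of the category-decision task, the console sketch below times a single trial. The two example pairs come from the text, but the category labels we assign to each term are assumptions, and an actual administration would of course use calibrated stimulus presentation and response collection software rather than console input.

```python
# A rough console sketch of one category-decision trial; category labels
# are assumed for illustration, and timing by input() is far coarser than
# the calibrated software an actual administration would require.
import time

CATEGORY = {"phobia": "abnormal", "affective": "abnormal",
            "median": "methods", "oedipal": "personality"}  # assumed labels

def run_trial(term1, term2):
    start = time.monotonic()
    answer = input(f"Same area of psychology? {term1} / {term2} (y/n): ")
    response_time = time.monotonic() - start
    same = CATEGORY[term1] == CATEGORY[term2]
    correct = (answer.strip().lower() == "y") == same
    return correct, response_time  # accuracy and response time both recorded

print(run_trial("phobia", "affective"))  # same-area pair from the text
print(run_trial("median", "oedipal"))    # different-area pair from the text
```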
Assessing encapsulated processes. In his recent review of how developmental
reading theory has tended to abandon the automaticity concept and adopt the
encapsulation concept, Stanovich (1990) suggested that one of the better ways of
assessing expenditure of processing resources was through the use of a dual task
methodology. Schneider (1985; Shiffrin & Schneider, 1977) has provided nu-
merous examples of the use of dual task methodologies to assess acquisition of
encapsulated processing in both experimental and training situations. In gen-
eral, the technique involves requiring subjects to perform a secondary task on a
given signal while simultaneously performing the primary task that is being
trained. For instance, an air intercept operator might press a button in response
to a signal from a headset while in the process of directing a fighter to a target
aircraft. The speed of the trainee's response to the secondary task then becomes
an index of the extent to which the primary task has become automated. The
reason for this is that, as performance of the primary task becomes automated, it
requires less and less cognitive capacity, thereby freeing up capacity that can be
utilized for the rapid performance of the secondary task.
Britton and Tesser (1982) have written an article that provides another exam-
ple of the use of a dual task methodology (see also, Navon & Gopher, 1979;
Norman & Bobrow, 1975; Reynolds & Anderson, 1982). They reported three
experiments in which subjects who were either knowledgeable or not knowl-
edgeable about a content domain performed a primary task of reading a text
concerned with the content, solving a problem in the domain, or thinking about
the subject matter. While performing these tasks, the subjects were also per-
forming a secondary task which involved pressing a button when a click was
heard. The response time to the secondary task was assumed to be an index of
the processing capacity taken up by the primary task. The basis for this assump-
tion is the notion that the amount of capacity taken up by the primary task will be
sensitive to the subject's familiarity with the subject matter domain. This sensi-
tivity will then be expressed as greater or lesser response times to the secondary
task.
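The index itself is simple to compute once probe and response times are logged. A minimal sketch follows, assuming each secondary-task probe (the click) and the corresponding button press are timestamped; the logging representation is ours.

```python
# A minimal sketch of the dual-task index: mean response time to
# secondary-task probes delivered while the primary task is underway.
# The timestamp-log representation is our assumption.
def secondary_task_index(probe_times, press_times):
    """press_times[i] is the button press answering the click at
    probe_times[i]; a longer mean RT is read as a greater primary-task
    resource load."""
    rts = [press - probe for probe, press in zip(probe_times, press_times)]
    return sum(rts) / len(rts)

# Illustrative values only: probes at 2.0, 9.5, and 17.1 s into reading.
print(secondary_task_index([2.0, 9.5, 17.1], [2.45, 9.98, 17.60]))  # ~0.48 s
```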
Britton and Tesser (1982) found that, in each of the three experiments (which
involved different subject matter areas and different primary tasks), knowledge-
able subjects had significantly longer response times than subjects with lesser
amounts of subject matter knowledge. These results were interpreted as suggest-
ing that subjects who were relatively unskilled in a content area have a consider-
able amount of processing capacity available for processing a secondary task
because they have little prior knowledge to activate while performing the pri-
mary task. In contrast, as learners become more skilled in the content domain,
they have greater amounts of knowledge that can be activated while performing
the primary task, and this activated knowledge takes up processing capacity,
thereby slowing the response to the secondary task.
In the discussion section of their article, Britton and Tesser (1982) suggest that
the subjects in their experiments may have consisted of individuals with little
expertise in the content domains and with midranges of content expertise.
Moreover, they suggested that if truly expert subjects had performed their
primary task they may well have had secondary-task response times similar to
those for novice subjects. That is, they suggested that the pattern of secondary-
task response times for subjects exhibiting the full range of content expertise
would have an inverted U shape. Subjects with little competence in the subject
matter would have relatively fast secondary-task response times because they
had little subject matter knowledge to engage as they were performing the
primary task. Subjects with midranges of subject matter competence would have
longer secondary-task response times because the knowledge they had acquired
would take up some of the available processing capacity. Finally, true experts
could have relatively short processing times because some of the domain activ-
ities they were involved in would have become automatic (or encapsulated),
thereby freeing up capacity for processing of the secondary task.
Comments on the assessment of automatic/encapsulated processes. The tech-
niques described in this section of the article are relatively easy to implement and
in some cases have already proven to be useful as indexes of improving cognitive
skill. Lesgold and Lajoie (1991) have apparently used the semantic processing
task with some success in their work on the SHERLOCK project, and the results
of our research with computer-administered processing tasks suggest they are
useful in differentiating between students having differing degrees of expertise in
psychology.
In addition, Schneider's (e.g., 1985) and Britton and Tesser's (1982) research
has shown that the dual task methodology was sensitive to varying degrees of
domain expertise over several content domains (electronic troubleshooting, air
intercept operating, fiction, chess, fashion, and football) while subjects were
performing a number of different processing activities (high performance skills,
reading, problem solving, thinking). These results suggest that the technique
may have general utility and could be used to monitor the development of
domain expertise in many settings.
However, in order for techniques designed to assess automatic processing to
be maximally useful, instructional researchers will have to either develop norms
of performance or conduct relatively continuous (longitudinal) monitoring of
student performance. The development of norms of performance would entail
the cross-sectional assessment of groups of students before exposure to instruc-
tion, at several points during instruction, and at the end of instruction. Moreover,
it would be important to develop a data base for students who were successful
and for those who were not successful. This normative data could then be used to
interpret performance for students at various points in the instructional process.
In the absence of normative performance data, useful information could be
obtained by continually assessing student progress. The research that we have
reviewed suggests that various aspects of task performance change as expertise
develops. These changes in performance over time could be used as indexes of
increased competence in the domain.
Efficiency of Procedures
It is frequently the case that individuals who are highly competent in a domain
of activity are not only accurate at what they do but highly efficient. That is, they
perform their activities in less time and with fewer steps than do novices. Glaser
et al. (1985) provide an interesting example of this capability. They describe
attempts to measure how well airmen could carry out the task of reassembling a
complex part of a jet engine after overhaul. The task they devised consisted of
asking airmen to sort a series of cards. Each card contained an assembly step,
and perfect performance on the task was considered to be a sorting of the cards in
accordance with the order specified in the manufacturer's manual. Somewhat
surprisingly, there was no relation between performance on the task and other
indexes of expertise. Glaser et al. (1985) subsequently found that the manufac-
turer's manual specified procedures in idiot-proof fashion. That is, if someone
followed the specified steps, they would be unlikely to make a grievous error.
However, expert mechanics knew that the nature of the overhaul task that had
been conducted dictated the order of reassembly. Hence, efficient reassembly
was constantly changing, depending on the nature of the activity to be per-
formed. Experts could make these changes whereas novices were restricted to
the relatively inefficient procedures dictated by the manual.
Assessing efficiency of procedures. Glaser et al. (1985) reported that their
solution to assessing the efficiency of jet engine mechanics involved developing a
family of sorting tasks in which the airmen were initially given the details of the
maintenance that had been performed and were then asked to sort activities in
the order that they would perform them. This type of activity is probably
applicable to many training situations and represents a means of assessing
efficiency of procedures that is relatively easy to develop and use.
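One simple way such a task could be scored, particularly in a computer-administered form, is to index the agreement between the trainee's ordering and an expert reference ordering for the same maintenance scenario. The sketch below (in Python, with invented step names; this is our illustration, not Glaser et al.'s scoring procedure) uses Spearman's rank correlation for this purpose.

```python
# Illustrative sketch: scoring a sorting-task response against an expert
# reference ordering with Spearman's rank correlation (no tied ranks).
def spearman_rho(reference, response):
    """Rank agreement between two orderings of the same items."""
    n = len(reference)
    ref_rank = {step: i for i, step in enumerate(reference)}
    d2 = sum((ref_rank[step] - i) ** 2 for i, step in enumerate(response))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical reassembly steps for one maintenance scenario.
expert = ["inspect seals", "seat bearing", "torque bolts", "apply safety wire"]
trainee = ["seat bearing", "inspect seals", "torque bolts", "apply safety wire"]
print(f"Agreement with expert ordering: {spearman_rho(expert, trainee):.2f}")
```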
Lesgold and Lajoie (1991) have described several procedures that assess the
efficiency with which electronic troubleshooters perform routine activities. The
first procedure involved asking trainees to compare meter readings of voltage or
resistance with expected values in printed documentation. A second task mea-
sured the efficiency with which electronic trainees could place measurement
probes in the proper place on a given circuit for a given measurement. Gitomer
(1984) had previously found that both of these techniques differentiated between
skilled and less skilled Air Force technicians.
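A computer-administered version of the first of these tasks might simply time each judgment of a meter reading against its documented value, as in the sketch below (in Python; the component labels, tolerances, and readings are all hypothetical).

```python
# Illustrative sketch: timing how quickly a trainee judges whether a meter
# reading matches the documented expected value within tolerance.
import time

CHECKS = [  # (label, documented value, fractional tolerance, meter reading)
    ("R12 resistance (ohms)", 4700, 0.05, 4650),
    ("TP3 voltage (volts)", 5.0, 0.10, 6.1),
]

for label, expected, tol, reading in CHECKS:
    start = time.monotonic()
    verdict = input(f"{label}: expect {expected}, meter reads {reading}. OK? (y/n) ")
    latency = time.monotonic() - start
    in_spec = abs(reading - expected) <= tol * expected
    correct = verdict.strip().lower() == ("y" if in_spec else "n")
    print(f"  correct={correct}, latency={latency:.2f}s")
```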
Lesgold and Lajoie (1991) also described a task involving more complicated
troubleshooting skills. Tracing a signal through digital circuitry requires a thor-
ough understanding of the inputs and outputs of common logic gates. Lesgold
and Lajoie (1991) described a task that involved presenting logic gates that have
either an output or one input missing and then asking the trainees to indicate the
nature of the missing value. Of particular interest in the procedure was the
efficiency with which trainees could fill in the missing values. Gitomer (1984) had
also found that this task differentiated between skilled and less skilled Air Force
troubleshooters.
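A minimal version of the logic gate task might look like the following sketch (in Python; the item format and response log are our own invention, not Gitomer's materials, and for brevity the items blank only the gate's output, whereas the task described also blanks inputs). The summary statistic is mean latency on correct trials, because efficiency rather than bare accuracy is the quantity of interest.

```python
# Illustrative sketch: scoring missing-value logic-gate items for both
# accuracy and speed; efficiency is indexed by latency on correct trials.
from statistics import mean

GATES = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b,
         "XOR": lambda a, b: a ^ b, "NAND": lambda a, b: 1 - (a & b)}

def missing_output(gate, a, b):
    """Correct value for an item that blanks the gate's output."""
    return GATES[gate](a, b)

# Hypothetical response log: (gate, input a, input b, answer, latency in s).
log = [("AND", 1, 0, 0, 1.2), ("OR", 0, 0, 0, 0.9),
       ("XOR", 1, 1, 1, 3.5), ("NAND", 1, 1, 0, 1.8)]

correct_rts = [rt for g, a, b, ans, rt in log
               if ans == missing_output(g, a, b)]
print(f"{len(correct_rts)}/{len(log)} correct; "
      f"mean latency on correct trials: {mean(correct_rts):.2f}s")
```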
A technique that would appear to have general utility as a means of measuring
efficiency of procedures has been reported by Green and Jackson (1976). They
described a technique ("hark-back") for measuring the frequency with which a
subject refers back to earlier conclusions in interviews, problem-solving
protocols, or any other activity where search is under the subject's control. A move
harks back when it is descended from another move that is not the most recent
one. The hark-back measure is relevant to assessing the efficiency of procedures
because subjects acquiring expertise in a domain frequently find themselves in
the position of continually referring back to something they have already done
while performing a domain activity. Anderson (1987) suggests that this looking-
back activity is largely the result of reliance on general weak-method problem-
solving strategies such as means-ends analysis or solution by analogy.
These techniques are extremely demanding of cognitive resources, and learners
frequently find themselves in the position of having forgotten an earlier activity
and having to refer back to the step. In contrast, experts can rely on more
powerful domain specific strategies that are less demanding of cognitive re-
sources and, hence, more efficient.
Green and Jackson's (1976) hark-back measure provides a quantitative index
(called the H coefficient) of the extent to which subjects refer back to earlier
steps. The article reports the formulas for calculating the H coefficient along with
the statistical assumptions and characteristics of the measure.
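Readers should consult Green and Jackson (1976) for the H coefficient itself; the underlying idea, however, can be conveyed with a simplified index, namely the proportion of moves in a protocol that descend from some move other than the most recent one. The sketch below (in Python) computes that proportion and is our simplification, not the published formula.

```python
# Illustrative sketch: a simplified hark-back index, not Green and
# Jackson's (1976) H coefficient. parents[i] gives the index of the move
# that move i descends from; a move "harks back" when its parent is not
# the immediately preceding move.
def hark_back_proportion(parents):
    """Proportion of moves (after the first) that hark back."""
    harks = sum(1 for i in range(1, len(parents)) if parents[i] != i - 1)
    return harks / (len(parents) - 1)

# Hypothetical protocol: moves 3 and 5 return to earlier conclusions.
print(hark_back_proportion([None, 0, 1, 0, 3, 1]))  # -> 0.4
```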
Comments on assessing efficiency of procedures. As has been mentioned
several times in this article, assessment procedures based on sorting activities are
relatively cumbersome to use in instructional settings. This makes the specific
technique described by Glaser et al. (1985) an unlikely candidate for an assess-
ment procedure to be used in an instructional setting. However, it is entirely
possible that more tractable versions of the general technique could be devel-
oped. For instance, it would probably be possible to develop a paper-and-pencil
version of the technique, and it would surely be possible to develop a computer-
administered version of the procedure. The computer-administered version
could be automatically scored, making it even more attractive in instructional
settings involving computers.
The digital multimeter judgment task, the digital multimeter placement task,
and the logic gate efficiency task described by Lesgold and Lajoie (1991) would
all appear to be good candidates for assessment procedures in situations involv-
ing the training of electronic troubleshooters. These techniques also suggest the
more general possibility of developing measures of performance efficiency in any
situation involving a routine activity. The efficiency with which that activity is
performed can provide a valuable index of one type of task expertise.
One cautionary note should be sounded, however, about measures of task
efficiency. The ability to perform a routine activity with efficiency is not syn-
onymous with high levels of task skill. True experts are frequently identified by
their ability to perform in the unusual rather than the usual situation.
Discussion
The preceding sections of this article document that a variety of procedures
can be used to measure dimensions of cognitive performance. The remaining
section discusses some of the ways in which the use of these procedures may
benefit cognitive skill training, as well as some of the concerns and cautions
that should accompany the use of cognitive skill assessment procedures as a
means of assessing training success.
Potential Benefits of Using Cognitive Skill Assessment Procedures
It is not at all uncommon in job settings to hear supervisors lament the fact that
newly trained workers "don't know anything" and will have to be trained on the
job. It is very likely that this lament is associated with the fact that many training
schools are designed to meet standards of performance that have little to do with
actual job performance. Moreover, it is also likely that the setting of standards of
performance has been strongly influenced by measurement procedures that
focus on acquisition of knowledge rather than on indexes associated with skilled
performance.
Educators have been concerned for many years that assessment has been
driving the curriculum. The most often voiced concern is that standardized tests
are dictating curriculum content. A more recent concern is that tests are guiding
not only the content to be learned but also the way it is being learned. Specifi-
cally, educators have expressed the concern that tests encourage the learning of
material and skills (e.g., the memorization of facts) that are not transferable to
real world activities (Collins, 1990; Neill & Medina, 1989).
Similar concerns could be raised about the training of skilled cognitive perfor-
mance. Multiple-choice tests are relatively easy to construct, and it is easy to
devise training goals based on multiple-choice test performance (e.g., 90% of the
trainees will learn 90% of the material). Over the years, military and industrial
training specialists have devised training procedures that can successfully accom-
plish these goals. But goal accomplishment may, in some cases, have been
purchased at the expense of failing to acquire skills more relevant to job perfor-
mance. For instance, Regian and Schneider (1990) note that traditional testing
procedures (the most common means of assessing training progress) are poor
predictors of skilled performance after training. They go on to suggest that
assessment activities targeted at task-specific cognitive processes are much bet-
ter than global traditional procedures in accurately charting the course of skill
acquisition.
Cognitive skill research of the type reviewed in this article has revealed that
experts and novices differ in many ways, only one of which is the amount of
domain knowledge that has been mastered. Other attributes such as the organi-
zation of knowledge, the ability to process problems in depth, and the appro-
priateness of the mental model possessed by the learner have been ignored as
assessment issues and, more importantly, as instructional issues. Training sys-
tems that focus on the development of cognitive skills and that use cognitive skill
assessment procedures as indexes of training success have the potential of
changing this situation. It is possible that training systems focused on a broader
range of cognitive skills will produce fewer trainees who "don't know anything."
Do Cognitive Assessment Procedures Have Good Psychometric Properties?
Instructional systems motivated by cognitive theory will become popular only
if it can be shown that the systems have distinct advantages relative to more
traditional systems. This evidence will only be forthcoming if effective assess-
ment procedures can be developed that reveal the advantages. The problem is
that there is a gaping research hole that prevents the acceptance of virtually all of
the assessment procedures reviewed in this article.
In all of the research that we read, there was not a single report of a reliability
index for an assessment procedure, and indexes of validity were available only as
inferences of the form, "If the measures were not valid, the experiment would
not have come out as it did." The lack of concern about the psychometric
properties of measures used in experimental studies is not at all surprising, given
that the concern in experiments is typically about whether the mean of one group
is larger than the mean of another and given that poor psychometric properties
have a conservative bias. That is, measures having poor psychometric properties
favor the acceptance of null hypotheses.
The situation is quite different, however, in instructional situations. In instruc-
tional settings, one commonly wants to make an inference about the perfor-
mance of a given individual. Inferences about individual accomplishment should
only be made based on measurement procedures that are highly reliable and that
have accumulated a mosaic of evidence (Messick, 1980) consistent with the
interpretation that the measure is valid for a specific purpose. Considerable
research establishing that cognitive assessment procedures are reliable and valid
will have to be completed before the procedures are commonly used as a means
of assessing progress in instructional settings.
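Much of this work could begin with quite routine methods. As one example, the sketch below (in Python, with hypothetical item-level data) computes Cronbach's alpha, a standard internal-consistency estimate that would be a minimal first step before a cognitive assessment is used to make inferences about individual trainees.

```python
# Illustrative sketch with hypothetical data: Cronbach's alpha as a
# routine internal-consistency (reliability) estimate for item scores.
from statistics import pvariance

def cronbach_alpha(scores):
    """scores[person][item] -> k/(k-1) * (1 - sum(item vars)/total var)."""
    k = len(scores[0])
    item_vars = sum(pvariance([row[j] for row in scores]) for j in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

data = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(f"alpha = {cronbach_alpha(data):.2f}")  # -> 0.79
```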
Is Cognitive Theory Developed Enough to Support Cognitive Skill
Instructional Systems?
The preceding section suggested that instructional systems based on cognitive
theory will become popular only if reliable and valid assessment procedures can
be developed. The implication of the section was that the inability to develop an
evidential base establishing the reliability and validity of cognitive assessment
procedures would seriously retard the development of instructional systems
based on cognitive theory. Another factor that could retard the development of
instructional systems based on cognitive theory would be the inability to trans-
form theoretical principles into instructional principles.
There is a long philosophical tradition suggesting that there is no such thing as
a truly correct scientific theory. Rather, theories vary in their usefulness, and
presumably theories of lesser usefulness are supplanted by theories of greater
usefulness. Cognitive theory certainly seems to have supplanted behavioral
theory in terms of its popularity with psychologists. However, cognitive theory
has not had much of an impact on instructional systems, and it remains an open
question as to whether it will have an impact.
Part of the difficulty is that early cognitive theory seemed to focus more on
structure than on process (though Newell & Simon's 1963 work is a notable
exception). That is, some of the most important debates seemed to center
around how many memory stores there were and whether knowledge represen-
tation was best conceptualized as semantic nets, as dual-process (verbal and
visual) codes, as propositional representations, or as systems of production
statements. These are clearly important issues for cognitive scientists, but
they are of only peripheral interest to an instructional scientist who is interested
in moving learners from one cognitive state to another.
Recent cognitive theory has become more involved in process issues, and, as
seen in the Anderson (1983) theory reviewed in the early sections of this article,
some theories provide an explicit account of how knowledge accumulates and
changes as learning and skill develop. The appearance of processing theories,
however, has raised an issue that is potentially even more vexing.
In a recent book, Anderson (1990a) has suggested that there is no principled
way that cognitive scientists can distinguish between competing explanations of
cognitive phenomena. For instance, one could take a situation where a person
experiences a particular set of events and exhibits a set of behaviors in the
presence of those events. Theory A accounts for those events with a set of
hypothetical structures and processes, and theory B accounts for the same events
with a quite different set of structures and processes. Anderson (1990a) suggests
that it is quite possible that both theories could provide equally acceptable
explanations of the events and that there would be no way to decide whether
one explanation was better than the other. The fundamental problem,
Anderson asserts, is that there are a myriad of functions that could map stimulus
event A onto behavioral sequence B and there is no way of knowing which
mapping function is closest to the true state of affairs.
The dilemma for Anderson represents an opportunity for instructional de-
signers.2 One approach to differentiating between theories of learning would be
to assess the impact of instructional approaches based on the theories. This is not
a novel idea. Many of the researchers involved in work on intelligent tutoring
systems are interested in evaluating cognitive theory through the use of instruc-
tional systems. Similarly, instructional practices associated with cognitive ap-
prenticeships (Collins, Brown, & Newman, 1989) such as reciprocal teaching
(e.g., Brown & Palincsar, 1989) and the procedural teaching of writing (Scar-
damalia, Bereiter, & Steinbach, 1984) have been motivated to some extent by
the desire to evaluate predictions derived from cognitive theoretical perspec-
tives. There seems to be a clear trend in cognitive science for some theory
evaluation to take place in the context of instructional efforts. It could very well
be that evaluations of instructional efforts based on differing cognitive theories
will serve to constrain viable forms of cognitive theories. Thus, instructional
designers could make a valuable contribution to theory development.
Are Cognitive Assessment Procedures Authentic?
Authentic assessment has been one of the more popular buzz terms in the
measurement community in the past few years.3 Authentic assessment involves
performances that have educational value in their own right (Wiggins, 1989).
Common examples include open-ended problems, computer simulations of real
world problems, essays, hands-on science problems, and portfolios of student
work. Given the current concern with making assessment more authentic and,
presumably, more responsive to educational needs, it is relevant to consider
whether the cognitive assessments we have described in this article are authentic.
It is obvious that many of the performances described in the article qualify as
instances of authentic assessment. Examples that readily come to mind are Egan
and Schwartz's (1979) assessment of processing circuit diagrams, Guthrie's
(1988) measures of document search efficiency, and Lesgold et al.'s (1988)
assessments of mental models involved in diagnosing X rays. However, it is
equally obvious that many of the performances being assessed have little value
in and of themselves. Examples of assessments that are not authentic include
multidimensional scaling approaches to the assessment of knowledge organiza-
tion and dual-task methods of assessing automaticity of performance.
Given that not all of the cognitive assessments we have described in this article
have value in and of themselves (i.e., are not authentic), the question becomes
whether those that are not authentic should be any less valued than those that
are. The assessments described in the article that are not authentic generally fall
into one of two types: those that provide an indirect index of a valued educational
performance and those that measure a cognitive skill that is a component of a
larger complex skill. We argue that both of these types of cognitive assessments
are, in fact, highly valuable in and of themselves and deserve equal status with
truly authentic assessments in instructional efforts directed at the training of
cognitive skills.
The value of indirect indexes of performance should be obvious. There are
some types of performance that are very difficult to measure directly. The extent
of knowledge acquisition is an example. If we were dependent on authentic tasks
for the measurement of knowledge acquisition, we would only be able to mea-
sure a very small amount of the knowledge of interest on any one assessment,
thereby running the risk that the assessment would either underestimate or over-
estimate the true extent to which the student has mastered a targeted knowledge
domain. Accordingly, researchers and educators rely on measures like those
described in the knowledge assessment section of the article to provide an
indirect measure of the extent to which targeted knowledge has been acquired.
We argue further that measures of cognitive skills that are components of the
complex of skills supporting the performance of authentic tasks are likewise
valuable in and of themselves. In making this argument, we would like to
differentiate between assessments that have task authenticity and assessments
that have process authenticity. Assessments that have task authenticity are
performances that earlier writers have described as having value in and of
themselves. Assessments having process authenticity measure a cognitive
skill that is a critical component of the authentic task skill. A critical component
skill is one that, if absent, would prevent acceptable performance of the
authentic task. Assessments having process authenticity would have diagnostic
value in that they would identify critical skills that had not been acquired, and
they would add evidence that students had truly acquired the desired complex
skill.
Earlier in the article, it was mentioned that cognitive assessments are most
useful in situations where a cognitive task analysis has preceded the choice of
measurement procedures. It should be noted that the concept of process
authenticity depends on the completion of a cognitive task analysis of a
complex skill. Authentic processing skills can only be identified by determining
the nature of the component skills that underlie a complex skill.

Notes
1. We would like to thank Bill Montague for suggesting we talk about this point.
2. We thank an anonymous reviewer for making this point.
3. Tom Andre originally suggested we discuss this issue of authentic assessment, and two reviewers drove the point home.
References
Adelson, B. (1981). Problem solving and the development of abstract categories in
programming languages. Memory & Cognition, 9(4), 422-433.
Adelson, B. (1984). When novices surpass experts: The difficulty of a task may
increase with expertise. Journal of Experimental Psychology: Learning, Memory
and Cognition, 10(3), 483-495.
Allard, F., Graham, S., & Paarsalu, M. E. (1980). Perception in sport: Basketball.
Journal of Sport Psychology, 2(1), 14-21.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89,
369-406.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard
University Press.
Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem
solutions. Psychological Review, 94, 192-210.
Anderson, J. R. (1990a). The adaptive character of thought. Hillsdale, NJ: Erlbaum.
Anderson, J. R. (1990b). Analysis of student performance with the LISP tutor. In N.
Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitor-
ing of skill and knowledge acquisition (pp. 27-50). Hillsdale, NJ: Erlbaum.
Anderson, J. R., Boyle, C. F., & Reiser, B. J. (1985). Intelligent tutoring systems.
Science, 228, 456-462.
Anderson, R. C., & Faust, G. W. (1973). Educational psychology: The science of
instruction and learning. New York: Dodd, Mead & Co.
Anzai, Y., & Yokoyama, T. (1984). Internal models in physics problem solving.
Cognition and Instruction, 1, 397-450.
Baker, L. (1989). Metacognition and the adult reader. Educational Psychology
Review, 1, 3-38.
Baker, L., & Brown, A. L. (1984). Metacognitive skills and reading. In D. P. Pearson
(Ed.), Handbook of research in reading (pp. 353-394). New York: Longman.
Barfield, W. (1986). Expert-novice differences for software: Implications for prob-
lem-solving and knowledge acquisition. Behavior and Information Technology,
5(1), 15-29.
Britton, B. K., & Tesser, A. (1982). Effects of prior knowledge on use of cognitive
capacity in three complex cognitive tasks. Journal of Verbal Learning and Verbal
Behavior, 21, 421-436.
Brown, A. L., & Palincsar, A. S. (1989). Guided, cooperative learning and
individual knowledge acquisition. In L. B. Resnick (Ed.), Knowing, learning
and instruction: Essays in honor of Robert Glaser (pp. 393-452). Hillsdale, NJ:
Erlbaum.
Burton, R. R. (1982). Diagnosing bugs in a simple procedural skill. In D. Sleeman &
J. S. Brown (Eds.), Intelligent tutoring systems (pp. 157-184). New York: Aca-
demic.
Card, S. K., Moran, T. P., & Newell, A. (1980). Computer text-editing: An informa-
tion-processing analysis of a routine cognitive skill. Cognitive Psychology, 12(1),
32-74.
Carlo, M. S., Royer, J. M., Dufresne, R., & Mestre, J. P. (1992, April). Reading,
inferencing and problem identification: Do experts and novices differ in all three?
Paper presented at the Annual Meeting of the American Educational Research
Association, San Francisco.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4,
55-81.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representa-
tion of physics problems by experts and novices. Cognitive Science, 5, 121-152.
Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In
R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1,
pp. 17-76). Hillsdale, NJ: Erlbaum.
Collins, A. (1990). Reformulating testing to measure learning and thinking. In N.
Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitor-
ing of skill and knowledge acquisition (pp. 75-87). Hillsdale, NJ: Erlbaum.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship:
Teaching the crafts of reading, writing, and mathematics. In L. B. Resnick (Ed.),
Knowing, learning and instruction: Essays in honor of Robert Glaser (pp. 453-494).
Hillsdale, NJ: Erlbaum.
Egan, D. E., & Schwartz, B. J. (1979). Chunking in recall of symbolic drawings.
Memory and Cognition, 7, 149-158.
Ericsson, K. A., Chase, W. G., & Faloon, S. (1980). Acquisition of a memory skill.
Science, 208, 1181-1182.
Fitts, P. M. (1964). Perceptual-motor skill learning. In A. W. Melton (Ed.), Catego-
ries of human learning (pp. 243-285). New York: Academic.
Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Forster, K. I. (1979). Levels of processing and the structure of the language
processor. In W. E. Cooper & E. Walker (Eds.), Sentence processing: Psycholinguistic
studies presented to Merrill Garrett (pp. 27-85). Hillsdale, NJ: Erlbaum.
Geeslin, W. E., & Shavelson, R. J. (1975). An exploratory analysis of the
representation of a mathematical structure in students' cognitive structure.
American Educational Research Journal, 12, 21-39.
Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental
models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental models
(pp. 99-129). Hillsdale, NJ: Erlbaum.
Gentner, D., & Stevens, A. L. (1983). Mental models. Hillsdale, NJ: Erlbaum.
Gerace, W. J., & Mestre, J. P. (1990). Materials for developing concept-based
problem solving skills in physics. Unpublished manuscript, University of
Massachusetts, Amherst.
Gitomer, D. H. (1984). A cognitive analysis of a complex troubleshooting task.
Unpublished doctoral dissertation, University of Pittsburgh.
Glaser, R. (1984). Education and thinking: The role of knowledge. American
Psychologist, 39, 93-104.
Glaser, R., Lesgold, A., & Lajoie, S. (1985). Toward a cognitive theory for the
measurement of achievement. In R. R. Ronning, J. Glover, J. C. Conoley, & J. C.
Witt (Eds.), The influence of cognitive psychology on testing and measurement
(pp. 41-85). Hillsdale, NJ: Erlbaum.
Goulet, C., Bard, C., & Fleury, M. (1989). Expertise differences in preparing to
return a tennis serve: A visual information processing approach. Journal of Sport
and Exercise Psychology, 11(4), 382-398.
Green, T. R., & Jackson, P. R. (1976). 'Hark-back': A simple measure of search
patterns. British Journal of Mathematical and Statistical Psychology, 29(1),
103-113.
Guthrie, J. T. (1988). Locating information in documents: Examination of a cogni­
tive model. Reading Research Quarterly, 23, 178-199.
Guthrie, J. T., Britten, T., & Barker, K. G. (1991). Roles of document structure,
cognitive strategy, and awareness in searching for information. Reading Research
Quarterly, 26, 300-324.
Hardiman, P. T., Dufresne, R., & Mestre, J. P. (1989). The relation between
problem categorization and problem solving among experts and novices. Memory
and Cognition, 17, 627-638.
Hershey, D. A., Walsh, D. A., Read, S. J., & Chulef, A. S. (1990). The effects of
expertise on financial problem solving: Evidence for goal-directed, problem-
solving scripts. Organizational Behavior and Human Decision Processes, 46,
77-101.
Holyoak, K. J. (1991). Symbolic connectionism: Toward third-generation theories of
expertise. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise
(pp. 301-335). Cambridge, England: Cambridge University Press.
Johnson, P. E. (1969). On the communication of concepts in science. Journal of
Educational Psychology, 60, 32-40.
Johnson, S. D. (1988). Cognitive analysis of expert and novice troubleshooting
performance. Performance Improvement Quarterly, 1(3), 38-54.
Konold, C. E., & Bates, J. A. (1982). The episodic/semantic memory distinction as a
heuristic in the study of instructional effects on cognitive structure. Contemporary
Educational Psychology, 7, 124-138.
LaBerge, D., & Samuels, S. (1974). Toward a theory of automatic information
processing in reading. Cognitive Psychology, 6, 293-323.
Lesgold, A., & Lajoie, S. (1991). Complex problem solving in electronics. In R. J.
Sternberg & P. A. Frensch (Eds.), Complex problem solving: Principles and
mechanisms (pp. 287-316). Hillsdale, NJ: Erlbaum.
Lesgold, A., Lajoie, S., Logan, D., & Eggan, G. (1990). Applying cognitive task
analysis and research methods to assessment. In N. Frederiksen, R. Glaser, A.
Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge
acquisition (pp. 325-350). Hillsdale, NJ: Erlbaum.
Lesgold, A., Rubinson, H., Feltovich, P., Glaser, R., Klopfer, D., & Wang, Y.
(1988). Expertise in a complex skill: Diagnosing x-ray pictures. In M. T. H. Chi, R.
Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. 311-342). Hillsdale, NJ:
Erlbaum.
Lopes, L. L. (1976). Model-based decision and inference in stud poker. Journal of
Experimental Psychology: General, 105(3), 217-239.
McCloskey, M., Caramazza, A., & Green, B. (1980). Curvilinear motion in the
absence of external forces: Naive beliefs about the motion of objects. Science, 210,
1139-1141.
Messick, S. (1980). Test validity and the ethics of assessment. American Psycholo-
gist, 35, 1012-1027.
Navon, D., & Gopher, D. (1979). On the economy of the human-processing system.
Psychological Review, 86, 214-255.
Neill, D. M., & Medina, N. J. (1989). Standardized testing: Harmful to educational
health. Phi Delta Kappan, 70, 688-697.
Newell, A., & Simon, H. A. (1963). GPS, a program that simulates human thought.
In E. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 279-293).
New York: McGraw-Hill.
Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-limited
processes. Cognitive Psychology, 7, 44-64.
Perfetti, C. A. (1988). Verbal efficiency in reading ability. In M. Daneman, G. E.
MacKinnon, & T. G. Waller (Eds.), Reading research: Advances in theory and
practice (pp. 109-143). New York: Academic.
Purkitt, H. E., & Dyson, J. W. (1988). An experimental study of cognitive processes
and information in political problem solving. Acta Psychologica, 68(3), 329-342.
Regian, J. W., & Schneider, W. (1990). Assessment procedures for predicting and
optimizing skill acquisition after extensive practice. In N. Frederiksen, R. Glaser,
A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge
acquisition (pp. 297-323). Hillsdale, NJ: Erlbaum.
Reitman, J. S., & Rueter, H. H. (1980). Organization revealed by recall orders and
confirmed by pauses. Cognitive Psychology, 12, 554-581.
Reynolds, R. E., & Anderson, R. C. (1982). Influence of questions on the allocation
of attention during reading. Journal of Educational Psychology, 74, 623-632.
Riley, M. S. (1985). Structural understanding in performance and learning. Un-
published doctoral dissertation, University of Pittsburgh.
Ronan, W. W., Anderson, C. L., & Talbert, T. L. (1976). A psychometric approach
to job performance: Fire fighters. Public Personnel Management, 5(6), 409-413.
Rosenbaum, D. A. (1986). Action planning. Unpublished manuscript, University of
Massachusetts, Amherst.
Royer, J. M. (1990). The Sentence Verification Technique: A new direction in the
assessment of reading comprehension. In S. M. Legg & J. Algina (Eds.), Cognitive
assessment of language and math outcomes (pp. 144-191). Norwood, NJ: Ablex.
Royer, J. M., Abranovic, W. A., & Sinatra, G. (1987). Using entering reading
performance as a predictor of course performance in college classes. Journal of
Educational Psychology, 79, 19-26.
Royer, J. M., Carlo, M. S., & Cisero, C. A. (1992). School-based uses for the
Sentence Verification Technique for measuring listening and reading comprehen-
sion. Psychological Test Bulletin, 5(1), 5-19.
Royer, J. M., Lynch, D. J., Hambleton, R. K., & Bulgareli, C. (1984). Using the
Sentence Verification Technique to assess the comprehension of technical text as a
function of subject matter expertise. American Educational Research Journal, 21,
839-869.
Royer, J. M., Marchant, H., Sinatra, G., & Lovejoy, D. (1990). The prediction of
college course performance from reading comprehension performance: Evidence
for general and specific factors. American Educational Research Journal, 27,
158-179.
Salomon, G. (1991). Transcending the qualitative debate: The analytic and systemic
approaches to educational research. Educational Researcher, 20(6), 10-18.
Scardamalia, M., Bereiter, C., & Steinbach, R. (1984). Teachability of reflective
processes in written composition. Cognitive Science, 8, 173-190.
Schneider, W. (1985). Toward a model of attention and the development of auto-
maticity. In M. Posner & O. S. Marin (Eds.), Attention and performance XI
(pp. 475-492). Hillsdale, NJ: Erlbaum.
Schneider, W. (1986). Building automatic processing component skills. In V. Holt
(Ed.), Issues in psychological research and application in transfer of training (pp.
45-58). Arlington, VA: U.S. Army Research Institute.
Schoenfeld, A. H., & Herrmann, D. J. (1982). Problem perception and knowledge
structure in expert and novice mathematical problem solvers. Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 8(5), 484-494.
Shavelson, R. J. (1972). Some aspects of the correspondence between content
structure and cognitive structure in physics instruction. Journal of Educational
Psychology, 63, 225-234.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an
unknown distance function (Vols. I & II). Psychometrika, 27, 125-140, 219-246.
Shepard, R. N., & Chipman, S. (1970). Second-order isomorphism of internal
representations: Shapes of states. Cognitive Psychology, 1, 1-17.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human informa-
tion processing: Vol. II. Perceptual learning, automatic attending, and a general
theory. Psychological Review, 84, 127-190.
Stanovich, K. E. (1990). Concepts in developmental theories of reading skill: Cogni-
tive resources, automaticity and modularity. Developmental Review, 10, 72-100.
Sweller, J., Mawer, R. F., & Ward, M. R. (1983). Development of expertise in mathe-
matical problem solving. Journal of Experimental Psychology: General, 112(4),
639-661.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New
York: Academic.
Vessey, I. (1988). Expert-novice knowledge organization: An empirical investigation
using computer program recall. Behavior and Information Technology, 7(2),
153-171.
Weiser, M., & Shertz, J. (1983). Programming problem representation in novice and
expert programmers. International Journal of Man-Machine Studies, 19(4),
391-398.
Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment.
Phi Delta Kappan, 70, 703-713.

Authors
JAMES M. ROYER is Professor, Department of Psychology, Tobin Hall, University
of Massachusetts, Amherst, MA 01003. He specializes in cognitive approaches to
assessment and instruction.
CHERYL A. CISERO is PhD Candidate, Department of Psychology, Tobin Hall,
University of Massachusetts, Amherst, MA 01003. She specializes in prereading
skills and reading acquisition.
MARIA S. CARLO is Research Associate, National Center on Adult Literacy,
University of Pennsylvania, 3910 Chestnut St., Philadelphia, PA 19104. She
specializes in bilingualism, reading, and assessment.
Received April 2, 1992
Revision received November 5, 1992
Accepted January 25, 1993