The intent of the article is to survey procedures that could be used to assess
progress in instructional programs designed to enhance cognitive skills. The
organizational framework is provided by J. R. Anderson's (1982) theory of
cognitive skill development and by Glaser, Lesgold, and Lajoie's (1985)
categorization of dimensions of cognitive skills. After describing Anderson's
theory, the article discusses the following types of measures of cognitive skills:
(a) measures of knowledge acquisition, organization, and structure; (b)
measures of depth of problem representation; (c) measures of mental models;
(d) measures of metacognitive skills; (e) measures of the automaticity of
performance; and (f) measures of efficiency of procedures. Each of the
sections describing measurement procedures is followed by a discussion of
the strengths and weaknesses of the procedures. The article closes with a
general discussion of techniques for measuring cognitive skills.
This work was supported by the Navy Personnel Research and Development
Center (Dr. John Ellis) under the auspices of the U.S. Army Research Office
Scientific Services Program administered by Battelle (Delivery Order 1248, Contract
No. DAAL03-86-D-0001). The authors would like to thank Tom Andre, Dan Chris-
tinaz, John Ellis, William E. Montague, Steve Parchman, Larry Pugh, Nick Van-
matre, Merle Vogel, and Wallace Wulfeck for their valuable comments on an earlier
draft of this article. The views, opinions, and/or findings herein are those of the
authors and should not be construed as an official Department of the Army position,
policy, or decision.
instructional situation. However, we believe that researchers interested in assess-
ing instructional outcomes can develop useful assessment procedures using the
techniques as conceptual starting points.
The Purpose and Form of Cognitive Assessment
A central thesis that motivated the writing of this article is that assessment in
cognitive theory-based instructional systems has both a different purpose and a
different form than it had in earlier instructional systems. In behaviorally
oriented (noncognitive) instructional systems, assessment is generally used to
evaluate the effectiveness of instruction, as it relates to an instructional goal, or
to provide an index of the degree to which a student has benefited from exposure
to instruction. Assessment in a cognitive-based system has additional goals.
First, cognitive instructional systems view the product of learning as a developing
cognitive skill. This means that one purpose of the assessment is to identify the
student's current status in a developmental model of cognitive skill attainment.
This assessment of status in the developmental sequence of a cognitive skill can
then be used to prescribe future instructional experiences.
Another purpose of assessment in a cognitively oriented instructional system
is to provide diagnostic information. Performance on a cognitive assessment
procedure not only signals the success or failure of the instructional event but
also provides information that allows the instructional system to decide what to
do when learning failures are encountered. The role of errors may be critical in
this process. Errors are frequently considered to be the result of bugs in the
cognitive system, and certain errors are assumed to indicate the existence of
certain bugs (e.g., Burton, 1982). Thus, if a student makes one kind of error on
the assessment event, the system prescribes one kind of instructional event,
whereas a different error would result in the system's prescribing a different
instructional event.
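As a toy illustration of this diagnostic logic, the sketch below (in Python) routes diagnosed bugs to remedial events. The bug names echo classic examples from the research on buggy subtraction procedures (e.g., Burton, 1982); the prescriptions are invented placeholders rather than any actual system's rules.

```python
# Sketch: routing diagnosed procedural bugs to instructional prescriptions.
# Bug names follow classic subtraction bugs from the buggy-procedure
# literature; the prescriptions are invented placeholders.

BUG_PRESCRIPTIONS = {
    "smaller_from_larger": "review borrowing with concrete base-ten materials",
    "borrow_no_decrement": "practice decrementing the borrowed-from column",
    "zero_minus_n_equals_n": "work examples of subtracting across zero",
}

def prescribe(diagnosed_bugs):
    """Map each diagnosed bug to a remedial instructional event."""
    return [BUG_PRESCRIPTIONS.get(bug, "flag for instructor review")
            for bug in diagnosed_bugs]

# A student who always subtracts the smaller digit from the larger one:
print(prescribe(["smaller_from_larger"]))
```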
The form of assessment procedures in cognitive instructional systems is also
different from earlier systems. Assessment in behaviorally oriented learning
systems generally focuses on whether learners have acquired the declarative
knowledge associated with the content domain and/or on whether learners can
use the knowledge they have acquired to perform activities such as selecting the
correct answer to a mathematical problem presented in the form of a multiple-
choice question. Success in this noncognitive assessment framework is generally
evaluated in terms of numbers of questions or problems correctly answered. In
contrast to this quantitative orientation to instructional assessment, cognitive
assessment frequently focuses on both qualitative and quantitative aspects of
performance. Successful instruction is thought to result in qualitative changes in
the organization and structure of knowledge and in the fluency and efficiency
with which the knowledge can be used. This means that cognitive assessment
procedures should be able to provide indexes of change in knowledge organiza-
tion and structure and indexes of the accuracy, speed, and resource load of the
activities being performed.
Procedures for assessing gains in cognitive instructional systems have not
received much attention thus far, largely due to the early stage of development of
both the theories and the instructional systems based on the theories. Theory
developers have made heavy use of techniques such as the collection of "thinking aloud" protocols.
TABLE 1 (continued)

Author                      Type of task                          Developmental level of
                                                                  cognitive skill

Mental models
  McCloskey et al., 1980    Prediction of flight path             Declarative/compilation
  Gentner & Gentner, 1983   Identifying underlying metaphors      Declarative/compilation
  Lopes, 1976               Poker mental models                   All levels
  J. R. Anderson, 1990      Correct and buggy productions         All levels
  Johnson, 1988             Malfunctioning generator models       All levels
  Lesgold et al., 1988      X-ray drawing                         All levels

Metacognitive skills
  Baker, 1989               Text faulting                         All levels
  Rosenbaum, 1986           Visit planning                        All levels
  Gerace & Mestre, 1990     Planning in physics problem solving   All levels
  Lesgold et al., 1990      Problem space planning                All levels
  Sweller et al., 1983      Changes in problem solving strategy   All levels

Automaticity/encapsulation of performance
  Lesgold & Lajoie, 1991    Speed of conceptual processing        All levels
  Schneider, 1985           Dual task methodology                 All levels
  Britton & Tesser, 1982    Dual task methodology                 All levels

Efficiency of procedures
  Glaser et al., 1985       Card sorting of assembly procedures   Declarative
  Lesgold & Lajoie, 1991    Multimeter judgment                   All levels
  Lesgold & Lajoie, 1991    Multimeter placement                  All levels
  Lesgold & Lajoie, 1991    Logic gate efficiency                 All levels
  Green & Jackson, 1976     Hark-back technique                   All levels

(Group headings indicate the cognitive dimension assessed.)
Another technique that is based on the idea that knowledge structure can be
characterized by using indexes of associative memory has been reported by Chi,
Glaser, and Rees (1982). They had expert and novice physicists examine a list of
category labels from the domain of physics and then tell everything they could
about the category. The protocols were then transformed into node-link struc-
tures that presumably reflected the manner in which the concepts were repre-
sented in memory. The structures provided by the experts were interpreted as
being more tightly integrated than those provided by the novices.
Another associative technique involves having experts determine the degree
of relationship between concepts in a domain and then specifically choosing
concepts having particular relationships for learners to rate. For instance, Ko-
nold and Bates (1982) had experts arrange psychological concepts into groups of
pairs that had strong, moderate, slight, or negligible conceptual relationships.
These pairs of concepts were subsequently given to learners, who rated each pair according to the four possible relationship categories. The ratings of
learners were then compared to the ratings provided by the experts.
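As a rough sketch of how learner ratings might be scored against expert judgments in a procedure of this kind, the Python fragment below correlates a learner's relatedness ratings with expert consensus ratings. The concept pairs, rating scale, and data are invented, and the correlation index is our choice rather than a reproduction of Konold and Bates's analysis.

```python
# Sketch: comparing learner relatedness ratings to expert consensus ratings.
# The pairs and the 1 (negligible) to 4 (strong) scale are hypothetical.

from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

expert = {("reinforcement", "conditioning"): 4,
          ("schema", "script"): 3,
          ("reinforcement", "schema"): 2,
          ("conditioning", "ego"): 1}

learner = {("reinforcement", "conditioning"): 4,
           ("schema", "script"): 2,
           ("reinforcement", "schema"): 2,
           ("conditioning", "ego"): 2}

pairs = list(expert)
r = pearson_r([expert[p] for p in pairs], [learner[p] for p in pairs])
print(f"Learner-expert rating agreement: r = {r:.2f}")
```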
Konold and Bates (1982) also used another technique to represent the knowl-
edge structures of their subjects. In the second technique, they obtained con-
cepts from a text that were associated with the fields of behavioral, cognitive, and
humanistic psychology. Subjects were then asked to place randomly ordered
concepts into the appropriate school of psychology, and their classification
accuracy was then compared to that obtained from the text.
A somewhat different form of knowledge structure assessment, based on
assumptions about the associative structure of memory, has been reported by
Reitman and Rueter (1980). They take advantage of the well-known finding that
subjects will utilize some form of structured recall when asked to free recall
words. Reitman and Rueter presented subjects with a list of words that could be
organized into categories based on conceptual similarities within a content
domain. Subjects were then asked to free recall the words, and the extent to
which subjects recalled the words in category order was taken as an index of the
extent to which a subject's knowledge was organized in accordance with the
conceptual organization of the domain.
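As a minimal sketch of how recall order can be reduced to an organization index, the fragment below computes the proportion of adjacently recalled pairs that come from the same category. This adjacency measure is only one common choice, and the word list is invented; Reitman and Rueter's own technique also exploits pauses between recalled items.

```python
# Sketch: an adjacency-based clustering index for free recall.
# An adjacent pair of recalled words counts as clustered when both words
# belong to the same conceptual category. Categories and data are invented.

CATEGORIES = {
    "voltage": "electricity", "current": "electricity", "resistance": "electricity",
    "force": "mechanics", "mass": "mechanics", "acceleration": "mechanics",
}

def adjacency_clustering(recall_order):
    """Proportion of adjacent recall pairs drawn from the same category."""
    pairs = list(zip(recall_order, recall_order[1:]))
    same = sum(CATEGORIES[a] == CATEGORIES[b] for a, b in pairs)
    return same / len(pairs)

recall = ["voltage", "current", "force", "mass", "acceleration", "resistance"]
print(f"Clustering index: {adjacency_clustering(recall):.2f}")  # 3/5 = 0.60
```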
Adelson (1981) has also reported a study that utilized free recall organization
as an index of knowledge structure. She investigated how novice and expert
computer programmers represent and use programming concepts by presenting
them with 16 lines of randomly organized programming code for nine presenta-
tion and free recall trials. The subjects were told that the code could be organized (procedurally as three separate programs or syntactically as five categories of commands) and that they might find it helpful to impose an organization during recall. The extent to which the subjects grouped
the code either procedurally or syntactically was taken as an index of the degree to which programming knowledge was organized.
Indirect measures of knowledge organization. The procedures for assessing
knowledge structure that were described in the preceding section assumed that
knowledge within a domain was organized around conceptual topics and that an
index of organization could be attained by directly measuring associative aspects
of memory. The measures to be described in this section are indirect measures of
cognitive structure in that they do not purport to directly measure an index of
knowledge structure. Rather, they measure performance on a task that presuma-
bly is sensitive to the degree to which knowledge is tightly organized. The notion is that good performance on these tasks indicates that subjects have achieved a high degree of knowledge acquisition and organization.
Guthrie's (1988; Guthrie, Britten, & Barker, 1991) research on document
search represents an instance of an indirect measure of knowledge organization.
Guthrie and his colleagues use a computer-based environment to present subjects with search tasks and the materials necessary to accomplish the tasks. For instance, subjects might be presented with an air travel computer environment that contained menu-driven subcategories of information about things like meals, arrival and departure times, entertainment, cost, and days of operation (Guthrie, 1988). The subjects would then be presented with a problem such as arranging for a flight to Los Angeles that had to be booked at or under a given cost and had to arrive at a particular time. The computer would record all of the
subjects' keystrokes as they proceeded to find the material necessary to solve the
problem.
One attribute that surely determines a subject's efficiency of search through a document is the amount and organization of the domain knowledge the subject possesses. Accordingly, the efficiency with which a learner can
search through a document to find relevant information provides an indirect
measure of the degree to which a learner has acquired and organized domain
knowledge. This task should be particularly attractive in those training situations
that have developed computer presentations of supporting documentation.
Modifications of the documentation programs could easily be developed that
would save the learner's search activity and thereby provide an index of knowl-
edge acquisition and organization.
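As an illustration, one simple way such logged search activity could be reduced to an efficiency index is to compare the number of selections the learner actually made with the minimum number needed to answer the problem. The log format and the ratio below are our assumptions, not Guthrie's scoring procedure.

```python
# Sketch: a search-efficiency index from a logged sequence of menu selections.
# The log format and the efficiency ratio are assumptions for illustration.

def search_efficiency(log, minimal_path):
    """Ratio of the minimum selections needed to the selections actually made.
    1.0 indicates a maximally direct search; values near 0 indicate wandering."""
    return len(minimal_path) / len(log)

# A learner hunting for an affordable, on-time flight wanders through menus.
log = ["meals", "cost", "entertainment", "cost", "arrival_times"]
minimal_path = ["cost", "arrival_times"]  # shortest route to the answer
print(f"Search efficiency: {search_efficiency(log, minimal_path):.2f}")  # 0.40
```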
Card, Moran, and Newell (1980) have reported a study that is similar in some
respects to Guthrie's work. They examined how experienced text-editor users
select methods and combine selection rules into sequences by using an actual
text-editing task on a computer. In the experiments, individuals with at least 1
year's experience using the POET Text Editor were given an editing task involv-
ing an 11-page memo marked with 73 corrections in red pen and were instructed
to edit the manuscript using the computer. Keystrokes typed by the users and the
system's responses were recorded by the computer. These keystroke files were subsequently examined as evidence of the extent to which the subjects had
mastered the knowledge required to function in the editing situation.
Another example of the indirect assessment of knowledge acquisition and
organization can be found in Royer, Carlo, and Cisero's (1992) research on
assessing comprehension using the Sentence Verification Technique (SVT).
SVT tests are constructed by developing four types of text sentences based on the
sentences in an original passage: originals, which are exact copies of a passage
sentence; paraphrases, which entail changing as many words as possible in a
passage sentence without altering the meaning; meaning changes, which entail
changing one or two words in a passage sentence so that the meaning of the
sentence is changed; and distractors, which are sentences that are consistent with
the theme of the passage but are unrelated to any passage sentence. An SVT test
consists of an equal mix of each of the test sentence types. An examinee takes an SVT test by reading (or listening to) a passage and then judging whether each test sentence does ("yes") or does not ("no") preserve the meaning of the passage.
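Originals and paraphrases thus call for "yes" responses, whereas meaning changes and distractors call for "no" responses. As a minimal illustration of how such a test might be scored automatically, the Python sketch below computes proportion correct overall and by sentence type. The items and responses are invented, and the sketch is ours rather than Royer et al.'s scoring procedure.

```python
# Sketch: scoring a Sentence Verification Technique (SVT) test.
# Originals and paraphrases preserve passage meaning ("yes" is correct);
# meaning changes and distractors do not ("no" is correct).

CORRECT_RESPONSE = {"original": "yes", "paraphrase": "yes",
                    "meaning_change": "no", "distractor": "no"}

def score_svt(items, responses):
    """Proportion correct overall and per sentence type."""
    overall = sum(CORRECT_RESPONSE[t] == r
                  for t, r in zip(items, responses)) / len(items)
    by_type = {}
    for t in set(items):
        idx = [i for i, item in enumerate(items) if item == t]
        by_type[t] = sum(CORRECT_RESPONSE[t] == responses[i] for i in idx) / len(idx)
    return overall, by_type

items = ["original", "paraphrase", "meaning_change", "distractor"] * 2
responses = ["yes", "yes", "no", "yes", "yes", "no", "no", "no"]
overall, by_type = score_svt(items, responses)
print(f"Overall proportion correct: {overall:.2f}")
for t, p in sorted(by_type.items()):
    print(f"  {t}: {p:.2f}")
```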
The function relating domain competence to secondary-task response time would have an inverted U shape. Subjects with little competence in the subject
matter would have relatively fast secondary-task response times because they
had little subject matter knowledge to engage as they were performing the
primary task. Subjects with midranges of subject matter competence would have
longer secondary-task response times because the knowledge they had acquired
would take up some of the available processing capacity. Finally, true experts
could have relatively short processing times because some of the domain activ-
ities they were involved in would have become automatic (or encapsulated),
thereby freeing up capacity for processing of the secondary task.
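As a sketch of how such dual-task data might be summarized, assuming a probe reaction-time design, the fragment below computes a capacity-load index as the difference between probe reaction times during the primary task and probe reaction times at rest. The data are invented and the simple difference score is our choice, not a reproduction of Schneider's or Britton and Tesser's analyses; the invented pattern mirrors the inverted U described above.

```python
# Sketch: a capacity-load index from a dual-task probe procedure.
# Load = mean probe reaction time (RT) during the primary task minus mean
# probe RT at rest; larger values suggest the primary task consumes more
# processing capacity. All data are invented.

from statistics import mean

def capacity_load(baseline_rts_ms, dual_task_rts_ms):
    """Secondary-task cost in milliseconds."""
    return mean(dual_task_rts_ms) - mean(baseline_rts_ms)

baseline = [310, 295, 305, 300]       # probe RTs with no primary task
novice = [330, 340, 325, 335]         # little knowledge to engage
intermediate = [420, 440, 430, 410]   # partial knowledge loads capacity
expert = [345, 350, 340, 355]         # automatized processing frees capacity

for label, rts in [("novice", novice), ("intermediate", intermediate),
                   ("expert", expert)]:
    print(f"{label}: load = {capacity_load(baseline, rts):.0f} ms")
```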
Comments on the assessment of automatic/encapsulated processes. The tech-
niques described in this section of the article are relatively easy to implement and
in some cases have already proven to be useful as indexes of improving cognitive
skill. Lesgold and Lajoie (1991) have apparently used the semantic processing
task with some success in their work on the SHERLOCK project, and the results
of our research with computer-administered processing tasks suggest they are
useful in differentiating between students having differing degrees of expertise in
psychology.
In addition, Schneider's (e.g., 1985) and Britton and Tesser's (1982) research
has shown that the dual task methodology was sensitive to varying degrees of
domain expertise over several content domains (electronic troubleshooting, air
intercept operating, fiction, chess, fashion, and football) while subjects were
performing a number of different processing activities (high performance skills,
reading, problem solving, thinking). These results suggest that the technique
may have general utility and could be used to monitor the development of
domain expertise in many settings.
However, in order for techniques designed to assess automatic processing to
be maximally useful, instructional researchers will have to either develop norms
of performance or conduct relatively continuous (longitudinal) monitoring of
student performance. The development of norms of performance would entail
the cross-sectional assessment of groups of students before exposure to instruc-
tion, at several points during instruction, and at the end of instruction. Moreover,
it would be important to develop a data base for students who were successful
and for those who were not successful. These normative data could then be used to
interpret performance for students at various points in the instructional process.
In the absence of normative performance data, useful information could be
obtained by continually assessing student progress. The research that we have
reviewed suggests that various aspects of task performance change as expertise
develops. These changes in performance over time could be used as indexes of
increased competence in the domain.
Efficiency of Procedures
It is frequently the case that individuals who are highly competent in a domain
of activity are not only accurate at what they do but highly efficient. That is, they
perform their activities in less time and with fewer steps than do novices. Glaser
et al. (1985) provide an interesting example of this capability. They describe
attempts to measure how well airmen could carry out the task of reassembling a
complex part of a jet engine after overhaul. The task they devised consisted of
asking airmen to sort a series of cards. Each card contained an assembly step,
and perfect performance on the task was considered to be a sorting of the cards in
accordance with the order specified in the manufacturer's manual. Somewhat
surprisingly, there was no relation between performance on the task and other
indexes of expertise. Glaser et al. (1985) subsequently found that the manufac-
turer's manual specified procedures in idiot-proof fashion. That is, if someone
followed the specified steps, they would be unlikely to make a grievous error.
However, expert mechanics knew that the nature of the overhaul task that had
been conducted dictated the order of reassembly. Hence, the efficient reassembly sequence changed constantly, depending on the nature of the activity to be performed. Experts could make these changes, whereas novices were restricted to
the relatively inefficient procedures dictated by the manual.
Assessing efficiency of procedures. Glaser et al. (1985) reported that their
solution to assessing the efficiency of jet engine mechanics involved developing a
family of sorting tasks in which the airmen were initially given the details of the
maintenance that had been performed and were then asked to sort activities in
the order that they would perform them. This type of activity is probably
applicable to many training situations and represents a means of assessing
efficiency of procedures that is relatively easy to develop and use.
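As an illustration of how such a sorting task could be scored automatically, the sketch below compares a trainee's ordering of steps with a reference ordering by counting pairwise inversions, a standard order-agreement measure related to Kendall's tau. The step names are invented, and Glaser et al. do not report a scoring formula here.

```python
# Sketch: scoring a card sort of procedural steps against a reference order.
# Agreement = 1 - (inversions / possible pairs); 1.0 is a perfect match.

from itertools import combinations

def order_agreement(reference, sorted_cards):
    """Pairwise order agreement (a normalized, Kendall-tau-style index)."""
    rank = {step: i for i, step in enumerate(reference)}
    pairs = list(combinations(sorted_cards, 2))
    inversions = sum(rank[a] > rank[b] for a, b in pairs)
    return 1 - inversions / len(pairs)

reference = ["inspect seals", "seat bearing", "torque bolts", "attach housing"]
trainee = ["inspect seals", "torque bolts", "seat bearing", "attach housing"]
print(f"Order agreement: {order_agreement(reference, trainee):.2f}")  # 0.83
```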
Lesgold and Lajoie (1991) have described several procedures that assess the
efficiency with which electronic troubleshooters perform routine activities. The
first procedure involved asking trainees to compare meter readings of voltage or
resistance with expected values in printed documentation. A second task mea-
sured the efficiency with which electronic trainees could place measurement
probes in the proper place on a given circuit for a given measurement. Gitomer
(1984) had previously found that both of these techniques differentiated between
skilled and less skilled Air Force technicians.
Lesgold and Lajoie (1991) also described a task involving more complicated
troubleshooting skills. Tracing a signal through digital circuitry requires a thor-
ough understanding of the inputs and outputs of common logic gates. Lesgold
and Lajoie (1991) described a task that involved presenting logic gates that have
either an output or one input missing and then asking the trainees to indicate the
nature of the missing value. Of particular interest in the procedure was the
efficiency with which trainees could fill in the missing values. Gitomer (1984) had
also found that this task differentiated between skilled and less skilled Air Force
troubleshooters.
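As a sketch of how items of this kind might be checked by computer, the fragment below infers the single missing input or output of a two-input gate. The item format is our assumption; the gate logic itself is standard.

```python
# Sketch: checking answers to logic-gate items with one missing value.
# Each item shows a two-input gate with one input or the output blanked
# out; the trainee supplies the missing bit. Gate definitions are standard.

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def missing_value(gate, a=None, b=None, out=None):
    """Return the value of the one missing field (a, b, or out).
    For a missing input, returns the value consistent with the output,
    or None when either input value would work (an ambiguous item)."""
    fn = GATES[gate]
    if out is None:
        return fn(a, b)                  # missing output: just evaluate
    known = a if b is None else b        # the one input that was given
    candidates = [v for v in (0, 1) if fn(known, v) == out]
    return candidates[0] if len(candidates) == 1 else None

print(missing_value("AND", a=1, out=1))  # -> 1 (only 1 & 1 gives 1)
print(missing_value("OR", a=0, out=1))   # -> 1 (only 0 | 1 gives 1)
print(missing_value("OR", a=1, out=1))   # -> None (either input works)
```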
A technique that would appear to have general utility as a means of measuring
efficiency of procedures has been reported by Green and Jackson (1976). They
described a technique (Hark-back) for measuring the frequency with which a subject refers back to earlier conclusions in interviews, problem-solving protocols, or any activity in which search is under the subject's control. A move
harks back when it is descended from another move that is not the most recent
one. The hark-back measure is relevant to assessing the efficiency of procedures
because subjects acquiring expertise in a domain frequently find themselves in
the position of continually referring back to something they have already done
while performing a domain activity. Anderson (1987) suggests that this looking-back activity is largely the result of reliance on general weak-method problem-solving strategies such as means-ends analysis or solution by analogy.
These techniques are extremely demanding of cognitive resources, and learners
frequently find themselves in the position of having forgotten an earlier activity
and having to refer back to the step. In contrast, experts can rely on more
powerful domain specific strategies that are less demanding of cognitive re-
sources and, hence, more efficient.
Green and Jackson's (1976) hark-back measure provides a quantitative index
(called the H coefficient) of the extent to which subjects refer back to earlier
steps. The article reports the formulas for calculating the H coefficient along with
the statistical assumptions and characteristics of the measure.
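Because a hark-back is defined precisely (a move descended from a move other than the most recent one), the raw counting is easy to sketch. The Python fragment below computes the simple proportion of moves that hark back from a record of moves and their parents; it does not reproduce the H coefficient itself, whose formulas and statistical assumptions appear in Green and Jackson's article.

```python
# Sketch: counting hark-backs in a recorded move sequence.
# Each move stores the index of the move it descends from (its parent).
# A move harks back when its parent is not the immediately preceding move.
# This raw proportion is not Green and Jackson's (1976) H coefficient.

def harkback_proportion(parents):
    """parents[i] is the index of the move that move i descends from;
    move 0 is the root and has no parent."""
    harkbacks = sum(parents[i] != i - 1 for i in range(1, len(parents)))
    return harkbacks / (len(parents) - 1)

# Moves 1-5: move 3 returns to move 0, and move 5 returns to move 2.
parents = [None, 0, 1, 0, 3, 2]
print(f"Hark-back proportion: {harkback_proportion(parents):.2f}")  # 2/5 = 0.40
```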
Comments on assessing efficiency of procedures. As has been mentioned
several times in this article, assessment procedures based on sorting activities are
relatively cumbersome to use in instructional settings. This makes the specific
technique described by Glaser et al. (1985) an unlikely candidate for an assess-
ment procedure to be used in an instructional setting. However, it is entirely
possible that more tractable versions of the general technique could be devel-
oped. For instance, it would probably be possible to develop a paper-and-pencil
version of the technique, and it would surely be possible to develop a computer-
administered version of the procedure. The computer-administered version
could be automatically scored, making it even more attractive in instructional
settings involving computers.
The digital multimeter judgment task, the digital multimeter placement task,
and the logic gate efficiency task described by Lesgold and Lajoie (1991) would
all appear to be good candidates for assessment procedures in situations involv-
ing the training of electronic troubleshooters. These techniques also suggest the
more general possibility of developing measures of performance efficiency in any
situation involving a routine activity. The efficiency with which that activity
could be performed could provide a valuable index of one type of task expertise.
One cautionary note should be sounded, however, about measures of task
efficiency. The ability to perform a routine activity with efficiency is not syn-
onymous with high levels of task skill. True experts are frequently identified by
their ability to perform in the unusual rather than the usual situation.
Discussion
The preceding sections of this article document the fact that there are a variety
of procedures that can be used to measure dimensions of cognitive performance.
The remaining section of the article will discuss some of the ways that the use of
the procedures may benefit cognitive skill training, and it will discuss some of the
concerns and cautions that should accompany the use of cognitive skill assess-
ment procedures as a means of assessing training success.
Potential Benefits of Using Cognitive Skill Assessment Procedures
It is not at all uncommon in job settings to hear supervisors lament the fact that
newly trained workers "don't know anything" and will have to be trained on the
job. It is very likely that this lament is associated with the fact that many training
schools are designed to meet standards of performance that have little to do with
actual job performance. Moreover, it is also likely that the setting of standards of
performance has been strongly influenced by measurement procedures that
focus on acquisition of knowledge rather than on indexes associated with skilled
performance.
Educators have been concerned for many years that assessment has been
driving the curriculum. The most often voiced concern is that standardized tests
are dictating curriculum content. A more recent concern is that tests are guiding
not only the content to be learned but also the way it is being learned. Specifi-
cally, educators have expressed the concern that tests encourage the learning of
material and skills (e.g., the memorization of facts) that are not transferable to
real world activities (Collins, 1990; Neill & Medina, 1989).
Similar concerns could be raised about the training of skilled cognitive perfor-
mance. Multiple-choice tests are relatively easy to construct, and it is easy to
devise training goals based on multiple-choice test performance (e.g., 90% of the
trainees will learn 90% of the material). Over the years, military and industrial
training specialists have devised training procedures that can successfully accom-
plish these goals. But goal accomplishment may, in some cases, have been
purchased at the expense of failing to acquire skills more relevant to job perfor-
mance. For instance, Regian and Schneider (1990) note that traditional testing
procedures (the most common means of assessing training progress) are poor
predictors of skilled performance after training. They go on to suggest that
assessment activities targeted at task-specific cognitive processes are much bet-
ter than global traditional procedures in accurately charting the course of skill
acquisition.
Cognitive skill research of the type reviewed in this article has revealed that
experts and novices differ in many ways, only one of which is the amount of
domain knowledge that has been mastered. Other attributes such as the organi-
zation of knowledge, the ability to process problems in depth, and the appro-
priateness of the mental model possessed by the learner have been ignored as
assessment issues and, more importantly, as instructional issues. Training sys-
tems that focus on the development of cognitive skills and that use cognitive skill
assessment procedures as indexes of training success have the potential of
changing this situation. It is possible that training systems focused on a broader
range of cognitive skills will produce fewer trainees who "don't know anything."
Do Cognitive Assessment Procedures Have Good Psychometric Properties?
Instructional systems motivated by cognitive theory will become popular only
if it can be shown that the systems have distinct advantages relative to more
traditional systems. This evidence will only be forthcoming if effective assess-
ment procedures can be developed that reveal the advantages. The problem is
that there is a gaping research hole that prevents the acceptance of virtually all of
the assessment procedures reviewed in this article.
In all of the research that we read, there was not a single report of a reliability
index for an assessment procedure, and indexes of validity were available only as
inferences of the form, "If the measures were not valid, the experiment would
not have come out as it did." The lack of concern about the psychometric
properties of measures used in experimental studies is not at all surprising, given
that the concern in experiments is typically about whether the mean of one group
is larger than the mean of another and given that poor psychometric properties
have a conservative bias. That is, measures having poor psychometric properties
favor the acceptance of null hypotheses.
The situation is quite different, however, in instructional situations. In instruc-
tional settings, one commonly wants to make an inference about the perfor-
mance of a given individual. Inferences about individual accomplishment should
only be made based on measurement procedures that are highly reliable and that
have accumulated a mosaic of evidence (Messick, 1980) consistent with the
interpretation that the measure is valid for a specific purpose. Considerable
research establishing that cognitive assessment procedures are reliable and valid
will have to be completed before the procedures are commonly used as a means
of assessing progress in instructional settings.
Is Cognitive Theory Developed Enough to Support Cognitive Skill
Instructional Systems?
The preceding section suggested that instructional systems based on cognitive
theory will become popular only if reliable and valid assessment procedures can
be developed. The implication of the section was that the inability to develop an
evidential base establishing the reliability and validity of cognitive assessment
procedures would seriously retard the development of instructional systems
based on cognitive theory. Another factor that could retard the development of
instructional systems based on cognitive theory would be the inability to trans-
form theoretical principles into instructional principles.
There is a long philosophical tradition suggesting that there is no such thing as
a truly correct scientific theory. Rather, theories vary in their usefulness, and
presumably theories of lesser usefulness are supplanted by theories of greater
usefulness. Cognitive theory certainly seems to have supplanted behavioral
theory in terms of its popularity with psychologists. However, cognitive theory
has not had much of an impact on instructional systems, and it remains an open
question as to whether it will have an impact.
Part of the difficulty is that early cognitive theory seemed to focus more on
structure than on process (though Newell & Simon's 1963 work is a notable
exception). That is, some of the most important debates seemed to center
around how many memory stores there were and whether knowledge represen-
tation was best conceptualized as semantic nets, as dual-process (verbal and
visual) codes, as propositional representations, or as systems of production
statements. These are clearly important issues for the cognitive scientists, but
they are of only peripheral interest to an instructional scientist who is interested
in moving learners from one cognitive state to another.
Recent cognitive theory has become more involved in process issues, and, as
seen in the Anderson (1983) theory reviewed in the early sections of this article,
some theories provide an explicit account of how knowledge accumulates and
changes as learning and skill develop. The appearance of processing theories,
however, has encouraged the development of an issue that is potentially even
more vexing.
In a recent book, Anderson (1990) has suggested that there is no principled
way that cognitive scientists can distinguish between competing explanations of
cognitive phenomena. For instance, one could take a situation where a person
experiences a particular set of events and exhibits a set of behaviors in the
presence of those events. Theory A accounts for those events with a set of
hypothetical structures and processes, and theory B accounts for the same events
with a quite different set of structures and processes. Anderson (1990) suggests
that it is quite possible that both theories could provide equally acceptable
explanations of the events and there would be no way that one could decide
whether one explanation was better than the other. The fundamental problem,
Anderson asserts, is that there are myriad functions that could map stimulus event A onto behavioral sequence B, and there is no way of knowing which mapping function is closest to the true state of affairs.
The dilemma for Anderson represents an opportunity for instructional de-
signers.2 One approach to differentiating between theories of learning would be
to assess the impact of instructional approaches based on the theories. This is not
a novel idea. Many of the researchers involved in work on intelligent tutoring
systems are interested in evaluating cognitive theory through the use of instruc-
tional systems. Similarly, instructional practices associated with cognitive ap-
prenticeships (Collins, Brown, & Newman, 1989) such as reciprocal teaching
(e.g., Brown & Palincsar, 1989) and the procedural teaching of writing (Scar-
damalia, Bereiter, & Steinbach, 1984) have been motivated to some extent by
the desire to evaluate predictions derived from cognitive theoretical perspec-
tives. There seems to be a clear trend in cognitive science for some theory
evaluation to take place in the context of instructional efforts. It could very well
be that evaluations of instructional efforts based on differing cognitive theories
will serve to constrain viable forms of cognitive theories. Thus, instructional
designers could make a valuable contribution to theory development.
Are Cognitive Assessment Procedures Authentic?
Authentic assessment has been one of the more popular buzz terms in the
measurement community in the past few years.3 Authentic assessment involves
performances that have educational value in their own right (Wiggins, 1989).
Common examples include open-ended problems, computer simulations of real
world problems, essays, hands-on science problems, and portfolios of student
work. Given the current concern with making assessment more authentic and,
presumably, more responsive to educational needs, it is relevant to consider
whether the cognitive assessments we have described in this article are authentic.
It is obvious that many of the performances described in the article qualify as
instances of authentic assessment. Examples that readily come to mind are Egan
and Schwartz's (1979) assessment of processing circuit diagrams, Guthrie's
(1988) measures of document search efficiency, and Lesgold et al.'s (1988)
assessments of mental models involved in diagnosing X rays. However, it is equally obvious that many of the performances being assessed have little value
in and of themselves. Examples of assessments that are not authentic include
multidimensional scaling approaches to the assessment of knowledge organiza-
tion and dual-task methods of assessing automaticity of performance.
Given that not all of the cognitive assessments we have described in this article
have value in and of themselves (i.e., are not authentic), the question becomes
whether those that are not authentic should be any less valued than those that
are. The assessments described in the article that are not authentic generally fall
into one of two types: those that provide an indirect index of a valued educational
performance and those that measure a cognitive skill that is a component of a
larger complex skill. We argue that both of these types of cognitive assessments
are, in fact, highly valuable in and of themselves and deserve equal status with
truly authentic assessments in instructional efforts directed at the training of
cognitive skills.
The value of indirect indexes of performance should be obvious. There are
some types of performance that are very difficult to measure directly. The extent
of knowledge acquisition is an example. If we were dependent on authentic tasks
for the measurement of knowledge acquisition, we would only be able to mea-
sure a very small amount of the knowledge of interest on any one assessment, thereby running the risk that the assessment would either underestimate or overestimate the true extent to which the student has mastered a targeted knowledge
domain. Accordingly, researchers and educators rely on measures like those
described in the knowledge assessment section of the article to provide an
indirect measure of the extent to which targeted knowledge has been acquired.
We also argue that measures of cognitive skills that are components of the
complex of skills supporting the performance of authentic tasks are also valued
measures in and of themselves. In making this argument, we would like to
differentiate between assessments that have task authenticity and assessments that have processing authenticity. Assessments that have task authenticity are
performances that earlier writers have described as having value in and of
themselves. Assessments having processing authenticity measure a cognitive
skill that is a critical component of the authentic task skill. A critical component
skill is one that, if absent, would prevent the acceptable performance of the
authentic task. Assessments having processing authenticity would have diagnos-
tic value in that they would identify critical skills that had not been acquired and
they would add evidence that students had truly acquired the desired complex
skill.
Earlier in the article, it was mentioned that cognitive assessments are most
useful in situations where a cognitive task analysis has preceded the choice of
measurement procedures. It should be noted that the concept of processing
authenticity is dependent on the completion of a cognitive task analysis of a
complex skill. Authentic processing skills can only be identified by determining
the nature of the component skills that underlie a complex skill.
Notes
1. We would like to thank Bill Montague for suggesting we talk about this point.
2. We thank an anonymous reviewer for making this point.
3. Tom Andre originally suggested we discuss this issue of authentic assessment, and two reviewers drove the point home.
References
Adelson, B. (1981). Problem solving and the development of abstract categories in
programming languages. Memory and Cognition, 9(4), 422-433.
Adelson, B. (1984). When novices surpass experts: The difficulty of a task may
increase with expertise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(3), 483-495.
Allard, F., Graham, S., & Paarsalu, M. E. (1980). Perception in sport: Basketball.
Journal of Sport Psychology, 2(1), 14-21.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89,
369-406.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard
University Press.
Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem
solutions. Psychological Review, 94, 192-210.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Erlbaum.
Anderson, J. R. (1990b). Analysis of student performance with the LISP tutor. In N.
Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitor-
ing of skill and knowledge acquisition (pp. 27-50). Hillsdale, NJ: Erlbaum.
Anderson, J. R., Boyle, C. F., & Reiser, B. J. (1985). Intelligent tutoring systems.
Science, 228, 456-462.
Anderson, R. C., & Faust, G. W. (1973). Educational psychology: The science of
instruction and learning. New York: Dodd, Mead & Co.
Anzai, Y., & Yokoyama, T. (1984). Internal models in physics problem solving.
Cognition and Instruction, 1, 397-450.
Baker, L. (1989). Metacognition and the adult reader. Educational Psychology
Review, 1, 3-38.
Baker, L., & Brown, A. L. (1984). Metacognitive skills and reading. In D. P. Pearson
(Ed.), Handbook of research in reading (pp. 353-394). New York: Longman.
Barfield, W. (1986). Expert-novice differences for software: Implications for prob-
lem-solving and knowledge acquisition. Behavior and Information Technology,
5(1), 15-29.
Britton, B. K., & Tesser, A. (1982). Effects of prior knowledge on use of cognitive
capacity in three complex cognitive tasks. Journal of Verbal Learning and Verbal
Behavior, 22, 421-436.
Brown, A. L., & Palincsar, A. S. (1989). Guided, cooperative learning and individual knowledge acquisition. In L. B. Resnick (Ed.), Knowing, learning and instruction: Essays in honor of Robert Glaser (pp. 393-452). Hillsdale, NJ: Erlbaum.
Burton, R. R. (1982). Diagnosing bugs in a simple procedural skill. In D. Sleeman &
J. S. Brown (Eds.), Intelligent tutoring systems (pp. 157-184). New York: Aca-
demic.
Card, S. K., Moran, T. P., & Newell, A. (1980). Computer text-editing: An informa-
tion-processing analysis of a routine cognitive skill. Cognitive Psychology, 12(1),
32-74.
Carlo, M. S., Royer, J. M., Dufresne, R., & Mestre, J. P. (1992, April). Reading,
inferencing and problem identification: Do experts and novices differ in all three?
Paper presented at the Annual Meeting of the American Educational Research
Association, San Francisco.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4,
55-81.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representa-
tion of physics problems by experts and novices. Cognitive Science, 5, 121-152.
Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In
R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1,
pp. 17-76). Hillsdale, NJ: Erlbaum.
Collins, A. (1990). Reformulating testing to measure learning and thinking. In N.
Frederiksen, R. Glaser, A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitor-
ing of skill and knowledge acquisition (pp. 75-87). Hillsdale, NJ: Erlbaum.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the crafts of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, learning and instruction: Essays in honor of Robert Glaser (pp. 453-494). Hillsdale, NJ: Erlbaum.
Egan, D. E., & Schwartz, B. J. (1979). Chunking in recall of symbolic drawings.
Memory and Cognition, 7, 149-158.
Ericsson, K. A., Chase, W. G., & Faloon, S. (1980). Acquisition of a memory skill.
Science, 208, 1181-1182.
Fitts, P. M. (1964). Perceptual-motor skill learning. In A. W. Melton (Ed.), Catego-
ries of human learning (pp. 243-285). New York: Academic.
Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Forster, K. I. (1979). Levels of processing and the structure of the language processor. In W. E. Cooper & E. Walker (Eds.), Sentence processing: Psycholinguistic
studies presented to Merrill Garrett (pp. 27-85). Hillsdale, NJ: Erlbaum.
Geeslin, W. E., & Shavelson, R. J. (1975). An exploratory analysis of the representation of a mathematical structure in students' cognitive structure. American Educational Research Journal, 12, 21-39.
Gentner, D., & Gentner, D. (1983). Flowing waters or teeming crowds: Mental
models of electricity. In D. Gentner & A. L. Stevens (Eds.), Mental models
(pp. 99-129). Hillsdale, NJ: Erlbaum.
Gentner, D., & Stevens, A. L. (1983). Mental models. Hillsdale, NJ: Erlbaum.
Gerace, W. J., & Mestre, J. P. (1990). Materials for developing concept-based
problem solving skills in physics. Unpublished manuscript, University of Massachusetts, Amherst.
Gitomer, D. H. (1984). A cognitive analysis of a complex troubleshooting task.
Unpublished doctoral dissertation, University of Pittsburgh.
Glaser, R. (1984). Education and thinking: The role of knowledge. American Psychologist, 39, 93-104.
Glaser, R., Lesgold, A., & Lajoie, S. (1985). Toward a cognitive theory for the
measurement of achievement. In R. R. Ronning, J. Glover, J. C. Conoley, & J. C.
Witt (Eds.), The influence of cognitive psychology on testing and measurement
(pp. 41-85). Hillsdale, NJ: Erlbaum.
Goulet, C., Bard, C., & Fleury, M. (1989). Expertise differences in preparing to return a tennis serve: A visual information processing approach. Journal of Sport and Exercise Psychology, 11(4), 382-398.
Green, T. R., & Jackson, P. R. (1976). 'Hark-back': A simple measure of search
patterns. British Journal of Mathematical and Statistical Psychology, 29(1),
103-113.
Guthrie, J. T. (1988). Locating information in documents: Examination of a cognitive model. Reading Research Quarterly, 23, 178-199.
Guthrie, J. T., Britten, T., & Barker, K. G. (1991). Roles of document structure,
cognitive strategy, and awareness in searching for information. Reading Research
Quarterly, 26, 300-324.
Hardiman, P. T., Dufresne, R., & Mestre, J. P. (1989). The relation between
problem categorization and problem solving among experts and novices. Memory
and Cognition, 17, 627-638.
Hershey, D. A., Walsh, D. A., Read, S. J., & Chulef, A. S. (1990). The effects of
expertise on financial problem solving: Evidence for goal-directed, problem-
solving scripts. Organizational Behavior and Human Decision Processes, 46,
77-101.
Holyoak, K. J. (1991). Symbolic connectionism: Toward third-generation theories of
expertise. In K. A. Ericsson & J. Smith (Eds.), Toward a general theory of expertise
(pp. 301-335). Cambridge, England: Cambridge University Press.
Johnson, P. E. (1969). On the communication of concepts in science. Journal of
Educational Psychology, 60, 32-40.
Johnson, S. D. (1988). Cognitive analysis of expert and novice troubleshooting
performance. Performance Improvement Quarterly, 1(3), 38-54.
Konold, C. E., & Bates, J. A. (1982). The episodic/semantic memory distinction as a
heuristic in the study of instructional effects on cognitive structure. Contemporary
Educational Psychology, 7, 124-138.
LaBerge, D., & Samuels, S. (1974). Toward a theory of automatic information
processing in reading. Cognitive Psychology, 6, 293-323.
Lesgold, A., & Lajoie, S. (1991). Complex problem solving in electronics. In R. J.
Sternberg & P. A. Frensch (Eds.), Complex problem solving: Principles and
mechanisms (pp. 287-316). Hillsdale, NJ: Erlbaum.
Lesgold, A., Lajoie, S., Logan, D., & Eggan, G. (1990). Applying cognitive task
analysis and research methods to assessment. In N. Frederiksen, R. Glaser, A.
Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge
acquisition (pp. 325-350). Hillsdale, NJ: Erlbaum.
Lesgold, A., Rubinson, H., Feltovich, P., Glaser, R., Klopfer, D., & Wang, Y.
(1988). Expertise in a complex skill: Diagnosing x-ray pictures. In M. T. H. Chi, R.
Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. 311-342). Hillsdale, NJ:
Erlbaum.
Lopes, L. L. (1976). Model-based decision and inference in stud poker. Journal of
Experimental Psychology: General, 105(3), 217-239.
McCloskey, M., Caramazza, A., & Green, B. (1980). Curvilinear motion in the
absence of external forces: Naive beliefs about the motion of objects. Science, 210,
1139-1141.
Messick, S. (1980). Test validity and the ethics of assessment. American Psycholo-
gist, 35, 1012-1027.
Navon, D., & Gopher, D. (1979). On the economy of the human-processing system.
Psychological Review, 86, 214-255.
Neill, D. M., & Medina, N. J. (1989). Standardized testing: Harmful to educational
health. Phi Delta Kappan, 70, 688-697.
Newell, A., & Simon, H. A. (1963). GPS, a program that simulates human thought.
In E. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 279-293).
New York: McGraw-Hill.
Norman, D. A., & Bobrow, D. G. (1975). On data-limited and resource-limited
processes. Cognitive Psychology, 7, 44-64.
Perfetti, C. A. (1988). Verbal efficiency in reading ability. In M. Daneman, G. E.
MacKinnon, & T. G. Waller (Eds.), Reading research: Advances in theory and
practice (pp. 109-143). New York: Academic.
Purkitt, H. E., & Dyson, J. W. (1988). An experimental study of cognitive processes
and information in political problem solving. Acta Psychologica, 68(3), 329-342.
Regian, J. W., & Schneider, W. (1990). Assessment procedures for predicting and
optimizing skill acquisition after extensive practice. In N. Frederiksen, R. Glaser,
A. Lesgold, & M. G. Shafto (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 297-323). Hillsdale, NJ: Erlbaum.
Reitman, J. S., & Rueter, H. H. (1980). Organization revealed by recall orders and
confirmed by pauses. Cognitive Psychology, 12, 554-581.
Reynolds, R. E., & Anderson, R. C. (1982). Influence of questions on the allocation
of attention during reading. Journal of Educational Psychology, 74, 623-632.
Riley, M. S. (1985). Structural understanding in performance and learning. Un-
published doctoral dissertation, University of Pittsburgh.
Ronan, W. W., Anderson, C. L., & Talbert, T. L. (1976). A psychometric approach
to job performance: Fire fighters. Public Personnel Management, 5(6), 409-413.
Rosenbaum, D. A. (1986). Action planning. Unpublished manuscript, University of
Massachusetts, Amherst.
Royer, J. M. (1990). The Sentence Verification Technique: A new direction in the
assessment of reading comprehension. In S. M. Legg & J. Algina (Eds.), Cognitive
assessment of language and math outcomes (pp. 144-191). Norwood, NJ: Ablex.
Royer, J. M., Abranovic, W. A., & Sinatra, G. (1987). Using entering reading
performance as a predictor of course performance in college classes. Journal of
Educational Psychology, 79, 19-26.
Royer, J. M., Carlo, M. S., & Cisero, C. A. (1992). School-based uses for the
Sentence Verification Technique for measuring listening and reading comprehen-
sion. Psychological Test Bulletin, 5(1), 5-19.
Royer, J. M., Lynch, D. J., Hambleton, R. K., & Bulgareli, C. (1984). Using the
Sentence Verification Technique to assess the comprehension of technical text as a
function of subject matter expertise. American Educational Research Journal, 21,
839-869.
Royer, J. M., Marchant, H., Sinatra, G., & Lovejoy, D. (1990). The prediction of
college course performance from reading comprehension performance: Evidence
for general and specific factors. American Educational Research Journal, 27,
158-179.
Salomon, G. (1991). Transcending the qualitative debate: The analytic and systemic
approaches to educational research. Educational Researcher, 20(6), 10-18.
Scardamalia, M., Bereiter, C., & Steinbach, R. (1984). Teachability of reflective
processes in written composition. Cognitive Science, 8, 173-190.
Schneider, W. (1985). Toward a model of attention and the development of auto-
maticity. In M. Posner & O. S. Marin (Eds.), Attention and performance XI
(pp. 475-492). Hillsdale, NJ: Erlbaum.
Schneider, W. (1986). Building automatic processing component skills. In V. Holt
(Ed.), Issues in psychological research and application in transfer of training (pp.
45-58). Arlington, VA: U.S. Army Research Institute.
Schoenfeld, A. H., & Herrmann, D. J. (1982). Problem perception and knowledge
structure in expert and novice mathematical problem solvers. Journal of Experi-
mental Psychology: Learning, Memory, and Cognition, 8(5), 484-494.
Shavelson, R. J. (1972). Some aspects of the correspondence between content
structure and cognitive structure in physics instruction. Journal of Educational
Psychology, 63, 225-234.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an
unknown distance function (Vols. I & II). Psychometrika, 27, 125-140, 219-246.
Shepard, R. N., & Chipman, S. (1970). Second-order isomorphism of internal
representations: Shapes of states. Cognitive Psychology, 1, 1-17.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human informa-
tion processing: Vol. II. Perceptual learning, automatic attending, and a general
theory. Psychological Review, 84, 127-190.
Stanovich, K. E. (1990). Concepts in developmental theories of reading skill: Cogni-
tive resources, automaticity and modularity. Developmental Review, 10, 72-100.
Sweller, J., Mawer, R. F., & Ward, M. R. (1983). Development of expertise in mathematical problem solving. Journal of Experimental Psychology: General, 112(4),
639-661.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New
York: Academic.
Vessey, I. (1988). Expert-novice knowledge organization: An empirical investigation
using computer program recall. Behavior and Information Technology, 7(2),
153-171.
Weiser, M., & Shertz, J. (1983). Programming problem representation in novice and
expert programmers. International Journal of Man-Machine Studies, 19(4),
391-398.
Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment.
Phi Delta Kappan, 70, 703-713.
Authors
JAMES M. ROYER is Professor, Department of Psychology, Tobin Hall, University
of Massachusetts, Amherst, MA 01003. He specializes in cognitive approaches to
assessment and instruction.
CHERYL A. CISERO is PhD Candidate, Department of Psychology, Tobin Hall,
University of Massachusetts, Amherst, MA 01003. She specializes in prereading
skills and reading acquisition.
MARIA S. CARLO is Research Associate, National Center on Adult Literacy,
University of Pennsylvania, 3910 Chestnut St., Philadelphia, PA 19104. She
specializes in bilingualism, reading, and assessment.
Received April 2, 1992
Revision received November 5, 1992
Accepted January 25, 1993