
Journal of Experimental Psychology: Learning, Memory, and Cognition
2010, Vol. 36, No. 3, 699–710
© 2010 American Psychological Association
0278-7393/10/$12.00  DOI: 10.1037/a0019182

Solve the Problem First: Constructive Solution Strategies Can Influence the
Accuracy of Retrospective Confidence Judgments

Ainsley L. Mitchum and Colleen M. Kelley


Florida State University

Two experiments tested whether differences in problem-solving strategies influence the ability of people
to monitor their problem-solving effectiveness as measured by confidence judgments. On multiple choice
problems, people tend to use either a constructive matching strategy, whereby they attempt to solve a
problem before looking at the response options, or a response elimination strategy, whereby they work
backward from response options trying to find one that fits as a solution. Constructive matching gives rise
to different cues that may enhance confidence monitoring. Experiment 1 showed that spontaneous
constructive matching in nonverbal spatial reasoning problems was associated with better confidence
calibration and resolution than response elimination. We manipulated strategy in Experiment 2 by
requiring constructive matching and found improved monitoring. Implications for research on monitor-
ing, overconfidence, and the association between skill and monitoring are discussed.

Keywords: strategies, monitoring, metacognition, individual differences, intelligence

Ainsley L. Mitchum and Colleen M. Kelley, Department of Psychology, Florida State University.

We thank Edward T. Cokely, Mark C. Fox, Katy Nandagopal, Tres Roring, and Cari Zimmerman for feedback and lively discussion that contributed substantially to the quality of this work. This research was completed as part of Ainsley L. Mitchum's thesis submitted to Florida State University. We thank committee members K. Anders Ericsson and Joyce Ehrlinger for their helpful feedback. We are also grateful for the assistance of our dedicated research team for help with data collection. Finally, we thank David Schell for his assistance in preparing the final version of the manuscript.

Correspondence concerning this article should be addressed to Ainsley L. Mitchum, Department of Psychology, Florida State University, 1107 West Call Street, Tallahassee, FL 32306-4301. E-mail: mitchum@psy.fsu.edu

Accurate monitoring is a key component of cognition. Solving problems, even when one has a relatively high degree of accuracy, is not sufficient for adaptive cognitive regulation unless one can also tell whether a solution is correct versus incorrect. Confidence in problem solving, as in other cognitive domains, must be based on a range of cues and inferences (Koriat, 1993, 1997; Schwartz, Benjamin, & Bjork, 1997). The level of monitoring accuracy one can achieve depends on the type and quality of cues available during monitoring rather than a direct assessment of knowledge or memory.

One important source of cues for metacognitive judgments such as confidence is feedback from the cognitive operations leading up to the production of a response (Koriat, 1997; Koriat, Ma'ayan, & Nussinson, 2006). During problem solving, one finds out how difficult an item is by attempting to solve it (Kelley & Jacoby, 1996; Koriat et al., 2006, Experiment 7). Problems that do not yield easily to attempts to solve them garner lower confidence judgments. Koriat et al. (2006) demonstrated that one's latency to produce a response when solving a problem or attempting to memorize a word pair is a valid basis for confidence judgments and judgments of learning, respectively.

When people engage in qualitatively different strategies on a particular task, the resulting process differences are also likely to affect the kind of information one has available for confidence monitoring. In the current set of experiments, we asked whether different strategies used in a nonverbal inductive reasoning task, Raven's Advanced Progressive Matrices (Raven, Raven, & Court, 1998), can create differences in the quality of confidence monitoring and if these differences can be independent of effects on accuracy. This task is well-suited to the question, as people are known to use different strategies on Raven's Matrices, and those strategies can be distinguished by using objective measures (Snow, 1980; Vigneau, Cassie, & Bors, 2006). In addition, the test has excellent psychometric properties that reduce chance fluctuations in performance that might obscure assessment of monitoring (Budescu, Wallsten, & Au, 1997).
Strategy Use on Inductive Reasoning Tasks
Research examining performance on inductive, analogical, and spatial/geometric reasoning tasks has shown a surprising degree of both inter- and intraindividual flexibility in performance strategies (Bethell-Fox, Lohman, & Snow, 1984; Egan & Grimes-Farrow, 1982; Kossowska & Nęcka, 1994; Kyllonen, Lohman, & Woltz, 1984; Sternberg, 1977). Snow (1980) outlined two primary strategies that participants tend to use on multiple-choice nonverbal reasoning tasks, constructive matching and response elimination. Constructive matching, which is more likely to be favored by high-performing participants (Bethell-Fox et al., 1984; Schiano, Cooper, Glaser, & Zhang, 1989; Snow, 1980; Vigneau et al., 2006), is characterized by a tendency to spend proportionally more time examining each problem before inspecting available answer choices.
Converging evidence from verbal reports and eye-movement analyses suggests that participants spend this extra time constructing a potential answer, which is then compared to the presented response options (Snow, 1980; Vigneau et al., 2006). Response elimination, which is more likely to be favored by poor performers, is characterized by a more trial-and-error approach to solving items. Rather than predicting what the correct answer would look like beforehand, those using response elimination tend to compare features of the stimulus items with features of response options in hopes of eliminating incorrect responses, in essence "reasoning backward" from each potential response option (Bethell-Fox et al., 1984; Snow, 1980). Snow and Bethell-Fox et al. noted that individual participants often do not exclusively use one strategy or the other. Some participants initially used constructive matching within tasks but switched to response elimination as problems became more difficult (Butler, Marsh, Goode, & Roediger, 2006; Snow, 1980). However, the worst performing participants tended to rely almost exclusively on response elimination or switched to that strategy fairly early during the course of the task.

More recently, Vigneau et al. (2006) have confirmed the use of constructive matching and response elimination strategies in geometric inductive reasoning tasks and their relationship to performance through an eye-movement analysis of performance on the Raven's test. Each problem consists of a 3 × 3 matrix of figures, with the bottom right figure removed and with eight response options (see Figure 1). Participants are instructed to select the response option that best completes the pattern. Vigneau et al. reported, consistent with results reported by Snow (1980) and Bethell-Fox et al. (1984), that participants who spent proportionally more time examining the matrix portion of items before examining the response options (taken as an indication of greater reliance on the constructive matching strategy) tended to earn higher scores than those who spent less time examining the matrix. In contrast, participants who quickly moved toward inspection of response options (which was taken as indicating heavy reliance on the response elimination strategy) tended to earn lower scores.

Figure 1. Sample item from Raven's Advanced Progressive Matrices. The participant is asked to select the response option that best completes the pattern both down the columns and across the rows but not diagonally.

Effect of Strategy on Cues Available for Confidence Monitoring

Using qualitatively different strategies on a task should affect monitoring accuracy as well as performance by producing differences in the type and quality of cues available for monitoring. Compared to participants who rely heavily on response elimination, participants who use constructive matching may have a qualitatively different and more diagnostic set of cues to draw from when making monitoring judgments. This allows them to more accurately evaluate their performance. Perhaps the most salient cue available to those who use constructive matching, one that is not available to those using response elimination, is the presence or absence of one's generated response among the available response options. If one's generated candidate answer is not present, this is a very salient and highly diagnostic cue that the generated response is incorrect, as most participants would be aware that one of the response options is definitely correct. Because it is possible to generate candidate responses that are incorrect but are still among the presented response options, finding one's generated response is somewhat less diagnostic as a cue; nonetheless, finding one's generated solution among the presented options may be taken as indicating that one's answer is likely correct and will likely prompt higher confidence ratings. Finding or failing to find one's constructed answer among the response options gives participants using constructive matching an additional opportunity to revise incorrect responses, detect systematic biases in their representations, and adjust confidence ratings accordingly. This strategy results in more accurate monitoring judgments.

Experiment 1

Experiment 1 examined the relationship between spontaneous task strategy, performance, and confidence monitoring accuracy on nonverbal reasoning problems. We expected, as in previous studies, that spontaneous use of the constructive matching strategy would be related to task performance. Participants earning higher scores would also tend to spend proportionally more time examining the matrix portion of each problem, an indication that they are using constructive matching to a greater extent. However, we also expected that spontaneous use of the constructive matching strategy would make an additional, highly diagnostic cue available for monitoring confidence (i.e., the presence or absence of one's generated response) and so enhance monitoring accuracy.

Individual differences in performance are often associated with differences in monitoring accuracy and task strategy, with high performers generally showing more accurate monitoring and use of better strategies compared to low performers (Higham & Arnold, 2007; Snow, 1980). However, we believe that the contributions of performance and strategy are, at least in part, separable. We predicted that individuals relying most heavily on constructive matching, when compared to individuals relying more on response elimination, would show better monitoring of whether they solve problems correctly, as assessed by both relative and absolute monitoring accuracy. Although performance on the problems is also likely to be related to improved monitoring accuracy, we predicted that spontaneous strategy use would explain additional unique variance in monitoring performance.

Method

Participants. Participants were 55 Florida State University undergraduates recruited from the general psychology participant pool. Participants received course credit in exchange for their participation. Data from five participants were excluded from all analyses because they indicated that they had previously participated in another experiment using Raven's Matrices.

Materials, design, and procedure. Participants were tested individually in a single session lasting about one hour, during which they completed the nonverbal reasoning problems of Raven's Matrices and rated their confidence for each problem. The testing was followed by a postexperimental questionnaire and debriefing.

Raven's Advanced Progressive Matrices (RAPM), Set 2. Participants first completed a computer-administered version of the RAPM, Set 2 (Raven et al., 1998), that was modified to assess strategy use. The task includes 36 items that are presented in ascending order of normative difficulty. Participants are instructed to select the piece that completes the pattern, both down the column and across the rows, from the eight response options presented below the matrix. Each item was displayed in two successive screens on the computer when assessing strategy. The first display consisted of the 3 × 3 matrix only. Participants were free to view the matrix for as long as they wished. They were instructed to press the space bar when they were ready to view the second screen showing both the 3 × 3 matrix and the eight response choices. Participants entered their responses while the second screen was displayed. Reaction time for the matrix-only screen was collected from the display of the matrix until the keypress that initiated the full item (matrix plus response options). Reaction time for the duration of the full item screen was also collected, starting when the full item screen was displayed and terminating when a response was entered. The reaction times from these two screens were used to determine strategy use, as participants who spent a greater proportion of time on the matrix before moving to the response screen were presumed to use that time to encode the problem and construct a response.

Confidence judgments. Following the response to each problem, participants were asked to rate their confidence in the likelihood that their response was correct on a scale ranging from 12% (chance performance) to 100%. Reaction times for confidence judgments were also collected.

Postexperimental questionnaire. After completing the test, participants completed a brief questionnaire about their strategy use on the task. Participants were asked which of the following potential strategies they used most often on the task: (a) Looked at each response choice until I found one that seemed to fit; (b) Tried to predict what the correct answer should be and then searched for it among the response options given; (c) A little of both; or (d) Other. Participants were asked to provide additional detail about their problem-solving approach if they selected either of the latter two options. Two additional free-response questions asked participants whether there were any special circumstances that might have affected their performance (e.g., not enough rest, hungry, ill), as well as whether they had participated in any other experiments using similar matrix-reasoning tasks. Participants who had participated in similar experiments were excluded from all analyses.

Results

Overall performance on the RAPM. Overall performance on the RAPM was similar to that of the normative sample of 506 first-year university students collected by Bors and Stokes (1998). In the current sample (N = 50), the average score was 21.4 (SD = 5.57) correct responses out of a possible 36, whereas Bors and Stokes reported a mean of 22.2 (SD = 5.60).

Strategy measure. Following Vigneau et al. (2006), we took spending a larger proportion of time viewing the matrix alone before displaying the response options as an indication of greater reliance on the constructive matching strategy. Participants' strategy use was defined as the average proportion of total time per problem spent examining the matrix portion before displaying response choices. Participants varied greatly in the proportion of time spent on the matrix-only screen (M = .26, SD = .21), ranging from as little as 3% to as much as 72% of total time. Consistent with past studies (Bethell-Fox et al., 1984; Snow, 1980; Vigneau et al., 2006), strategy use was related to overall performance, r(48) = .33, p = .02, indicating that higher performing participants tended to favor the constructive matching strategy.

Participants' responses on the postexperimental strategy questionnaire generally matched the objective measure of average proportion time on matrix as a measure of strategy use. Participants who reported that they relied heavily on response elimination (n = 6) spent, on average, only 6% of their total time per problem examining the matrix alone before displaying answer choices. Those who reported using constructive matching (n = 20) spent an average of 33% of their total time examining the matrix alone. Participants who reported using a combination of the two strategies (n = 22) spent, on average, 26% of their total time examining the matrix alone.

Measures of monitoring accuracy.

Resolution. Resolution (or relative monitoring accuracy) measures the degree to which confidence judgments for individual items distinguish between correct and incorrect responses. The Goodman–Kruskal gamma coefficient (γ), a nonparametric within-subjects correlation, is the most widely used measure of resolution for item-by-item judgments (Nelson, 1984). Values for gamma range from −1 to +1, with higher values indicating better resolution. Although it is often the case that monitoring relates to overall performance, it is important to note that this need not be the case with resolution, as measured by gamma (Nelson, 1984).

The Goodman–Kruskal gamma coefficient between confidence and problem accuracy was calculated for each participant as a measure of resolution.¹ Participants showed considerable variability in relative monitoring accuracy. Gamma ranged from .14 to .98, with an average gamma of .70 (SD = .19), which was significantly different from 0, t(49) = 25.5, p < .001.

¹ Masson and Rotello (2009) raised some concerns about the use of gamma as a measure of relative monitoring accuracy and recommended Az as an alternative measure. We computed both gamma and Az and found identical results with the two measures. For these data, Az correlated highly with gamma (r = .96, p < .001), suggesting that the two measures essentially convey the same information. Average Az for Experiment 1 was .81 (SD = .10), which was significantly different from .50, t(49) = 21.77, p < .001.
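For readers who want to see how an item-level gamma of this kind can be computed, the following is a minimal sketch in Python; it is not the authors' analysis code, and the variable names and toy data are purely illustrative.

```python
# Minimal sketch (not the authors' code): Goodman-Kruskal gamma between
# item-level confidence ratings and accuracy for a single participant.
from itertools import combinations

def goodman_kruskal_gamma(confidence, correct):
    """Gamma = (concordant - discordant) / (concordant + discordant),
    counted over all item pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    for (c1, a1), (c2, a2) in combinations(zip(confidence, correct), 2):
        prod = (c1 - c2) * (a1 - a2)
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical item-level data: confidence (12-100) and accuracy (1 = correct).
confidence = [95, 80, 60, 40, 100, 25, 70, 55]
correct    = [1,  1,  0,  0,  1,   0,  1,  0]
print(goodman_kruskal_gamma(confidence, correct))  # 1.0 for this toy example
```

Computed once per participant over all 36 items, values of this statistic can then be summarized and compared across groups, as in the analyses reported here.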

As predicted, strategy as measured by proportion time on the problem matrix was related to resolution, r(48) = .40, p = .004, indicating that participants who relied more heavily on constructive matching had better relative monitoring accuracy in their confidence judgments. However, gamma correlations were unrelated to overall performance, r(48) = .17, p = .24.

Absolute accuracy. Absolute accuracy, or calibration (also called bias), examines the magnitude of the difference between one's level of subjective confidence and actual performance, indicating overconfidence, underconfidence, or perfect calibration. This is typically done by plotting a calibration curve that displays actual performance as a function of subjective confidence ratings. For example, perfect calibration would be indicated when items to which an individual assigns a 50% probability of being correct are correct 50% of the time, and so forth across the scale.

Participant confidence judgments for each of the 36 problems were divided into 11 discrete categories (0–12, 13–20, 21–30, . . ., 91–99, 100). Calibration error scores (Oskamp, 1962, 1965) were calculated for each participant as the weighted mean of the absolute differences between the mean confidence and actual proportion correct for each confidence grouping,

C = Σ(n |p − c|) / N,

where n is the total number of observations at each confidence level, p is the assessed confidence level, c is the actual proportion correct at each confidence level, and N is the total number of observations. Calibration error scores give an absolute measure of calibration and do not indicate overconfidence or underconfidence. We chose this measure because we wanted to examine absolute calibration independent of over- or underconfidence.

As in past studies (e.g., Stankov, 1998), participants were fairly well calibrated (M = .20, SD = .10) but showed considerable variability (see Figure 2). Calibration error scores ranged from .06 to .51, with higher scores indicating poorer calibration. Calibration error scores were related to overall performance such that those earning lower scores tended to have larger calibration error scores, r(48) = −.51, p = .0002. More important, calibration error scores were also related to strategy use such that those relying more heavily on constructive matching had smaller calibration error scores, r(48) = −.52, p = .0001.

Hierarchical regression analysis was used to test the hypothesis that strategy use accounts for unique variance in calibration, even after controlling for performance (see Table 1 for a summary). Calibration error scores were regressed on overall performance and strategy use. Together, these predictors accounted for a significant amount of variance in calibration, F(2, 47) = 15.43, p < .001, adjusted R² = .37. Performance alone accounted for a significant proportion of the overall variance in calibration error scores (β = −.38, p = .003). More important and as predicted, the degree of constructive matching as measured by proportion time on the matrix before moving to the response options also accounted for significant unique variance in calibration error (β = −.40, p = .002).

Overconfidence and underconfidence. The difference between average confidence and average accuracy was computed for each participant to evaluate the relationship between overconfidence or underconfidence and strategy use. Participants were slightly overconfident. The average difference between overall confidence and overall performance was 8.44 (SD = 16.1), with difference scores ranging from −32.3 to 50.2. The relationship between strategy and difference scores was marginally significant, r(48) = −.24, p = .09, such that those who relied more on constructive matching tended to be slightly underconfident and those who relied primarily on response elimination tended to be slightly overconfident.

Figure 2. Calibration plot for Experiment 1. Labels are the number of observations per confidence bin.
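One way to implement the calibration error score C defined above is sketched below in Python. The bin boundaries follow the 11 groupings described in the text, and the per-bin confidence p is taken as the mean rating within the bin, which is one reasonable reading of the formula; the data and names are illustrative, not the authors' code.

```python
# Minimal sketch (not the authors' code): calibration error C = sum(n|p - c|) / N
# over the 11 confidence bins described above.
import numpy as np

BINS = [(0, 12), (13, 20), (21, 30), (31, 40), (41, 50), (51, 60),
        (61, 70), (71, 80), (81, 90), (91, 99), (100, 100)]

def calibration_error(confidence, correct):
    confidence = np.asarray(confidence, dtype=float)  # ratings in percent
    correct = np.asarray(correct, dtype=float)        # 0/1 accuracy
    total = 0.0
    for lo, hi in BINS:
        in_bin = (confidence >= lo) & (confidence <= hi)
        n = int(in_bin.sum())
        if n == 0:
            continue
        p = confidence[in_bin].mean() / 100.0  # assessed confidence for the bin
        c = correct[in_bin].mean()             # actual proportion correct in the bin
        total += n * abs(p - c)
    return total / len(confidence)

conf = [95, 80, 60, 40, 100, 25, 70, 55]   # hypothetical ratings
acc  = [1,  1,  0,  0,  1,   0,  1,  0]
print(round(calibration_error(conf, acc), 3))
```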

Table 1
Summary of Hierarchical Regression Predicting Calibration Error Scores in Experiment 1

Predictor                                  F change      β        t        p      Zero-order    Part    Partial    Tolerance
Model 1 (R² = .26, Adj. R² = .24)            16.60
  Mean accuracy on RAPM                                 −.51    −4.07    .0002
Model 2 (R² = .40, Adj. R² = .37)            10.85
  Mean accuracy on RAPM                                 −.38    −3.15    .003       −.51        −.42     −.36        .89
  Proportion time on matrix                             −.40    −3.29    .002       −.52        −.43     −.37        .89

Note. Zero-order, part, and partial are correlations. RAPM = Raven's Advanced Progressive Matrices.
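As a rough illustration of the two-step model summarized in Table 1, the sketch below fits an ordinary least squares regression with and without the strategy predictor and tests the change in R²; the simulated data and column names are hypothetical, and this is not the authors' analysis script.

```python
# Minimal sketch (not the authors' script): hierarchical regression predicting
# calibration error from RAPM accuracy (step 1) plus the strategy measure (step 2).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50                                        # mirrors the Experiment 1 sample size
rapm = rng.normal(21, 5.5, n)                 # hypothetical RAPM scores
strategy = rng.uniform(0.03, 0.72, n)         # hypothetical proportion of time on matrix
calib = 0.45 - 0.005 * rapm - 0.2 * strategy + rng.normal(0, 0.05, n)

df = pd.DataFrame({"calibration_error": calib,
                   "rapm_score": rapm,
                   "prop_time_matrix": strategy})

y = df["calibration_error"]
model1 = sm.OLS(y, sm.add_constant(df[["rapm_score"]])).fit()
model2 = sm.OLS(y, sm.add_constant(df[["rapm_score", "prop_time_matrix"]])).fit()

# F test for the change in R^2 when the strategy measure enters at step 2.
f_change, p_change, df_diff = model2.compare_f_test(model1)
print(model1.rsquared_adj, model2.rsquared_adj, f_change, p_change)
```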


Discussion

The results of Experiment 1 demonstrate a clear relationship between spontaneous task strategy and monitoring accuracy such that participants who relied more on constructive matching consistently showed more accurate monitoring across both relative and absolute measures. Task strategy accounted for unique variance in absolute monitoring accuracy after controlling for overall performance differences. In the case of resolution, strategy use was the only reliable predictor of monitoring: Task performance was unrelated to relative monitoring accuracy. Taken together, these results are consistent with our hypothesis that strategies can influence monitoring accuracy and that the effect of strategy may be, at least in part, independent of ability differences. However, because these data are correlational, we cannot make assumptions about the causal role of strategy use on monitoring accuracy. In Experiment 2, we attempted to gain evidence that strategy causally influences confidence monitoring by instructing participants to generate answers before viewing response choices.

Experiment 2

If qualitative differences in problem-solving strategy causally affect the type and quality of cues available for monitoring confidence, we should be able to improve confidence monitoring by instructing participants to use constructive matching. Experiment 2 randomly assigned participants to either a control group, similar to Experiment 1, in which they were free to use whatever strategy they wished to solve the problems, or a constructive matching group, in which they had to attempt to construct an answer to each problem before viewing the response options. We predicted improved monitoring resolution and calibration in the constructive matching group. Although instructed strategy may not be identical to spontaneous strategies, we believed that the act of attempting to generate a candidate response would provide information similar to that one would obtain from engaging in the strategy spontaneously.

Perhaps the most salient and diagnostic cue available to participants using constructive matching is the presence or absence of their constructed response among presented response options. In the present experiment, one indication that participants are using this cue would be the within-subjects relationship between confidence and time to enter a response once a candidate answer is generated. Participants using constructive matching tend to spend a greater proportion of their overall time looking at the screen containing the matrix alone. This suggests that when the matrix and answers are both displayed, participants using constructive matching may be searching for their constructed answer rather than continuing to work on the problem. When the constructed answer is present among the response options, those using constructive matching should have relatively quick response times on the full problem screen and should be highly confident in their responses. On the other hand, when the constructed response is not found among the presented options, participants should take longer to enter a response, given that they are likely to revise their previous, obviously incorrect response, and should also be less confident in the response they eventually chose. Therefore, we predicted that people instructed to construct an answer before viewing the response options would show a negative correlation between response times on the full matrix screen and confidence ratings.

Participants in the control condition were free to solve problems using whatever strategy they chose, and so some participants in the control condition would likely spontaneously use constructive matching and others would rely more on response elimination. The relationship between time to choose a response option and confidence was predicted to be weaker for participants in the control condition than for those in the constructive matching condition, as some searched for a constructed answer among response options and others attempted to solve the problem via response elimination once the response options were displayed. However, we predicted that the relationship between time to enter a response and confidence would increase as reliance on the constructive matching strategy increased.

Method

Participants. Participants were 69 Florida State University undergraduates recruited from the general psychology participant pool. Participants received partial course credit in exchange for their participation. In Experiment 2, 12 participants were excluded from final analyses (8 reported that they had participated in other experiments using matrix reasoning tasks, 4 failed to follow directions), leaving 28 participants in the constructive matching condition and 29 participants in the control condition.

Materials, design, and procedure. Participants were randomly assigned to one of two conditions, constructive matching or control. Both groups completed a computerized version of the RAPM that was identical to the version used in Experiment 1, with only the instructions differing between the two conditions.
In the constructive matching condition, participants were instructed to generate a candidate answer before advancing to the screen with the answers. In the control condition, instructions were identical to those in Experiment 1. To promote and monitor compliance with the strategy instruction to generate a response, we required participants in the constructive matching condition to draw their candidate answer on a piece of scratch paper before continuing to the answer screen.

In Experiment 2, the problems were presented to participants in a random order rather than in ascending order according to normative difficulty, as is typically done when the RAPM is used as an intelligence test. Pilot testing and results from Experiment 1 suggested that the item difficulty gradient could be the basis for monitoring accuracy, in that one becomes able to anticipate that each problem is more difficult than the last. The availability of this cue would reduce participants' reliance on more data-driven, experience-based cues. By random ordering the problems, we hoped to increase the likelihood that participants' confidence judgments were based on actual attempts to solve the problems rather than quick inferences reflecting the difficulty of previous problems.²

² For Experiment 1, the correlation between item number and confidence was computed for each participant, after controlling for accuracy, as a measure of the extent to which participants used item gradient as a confidence cue. The average correlation for the group was −.34, which was significantly different from zero, t(49) = −22.96, p < .001, suggesting that participants did use item presentation order as a confidence cue. For Experiment 2, we had hoped that presenting the items in random order would reduce the use of item sequence as a cue. To examine this we computed the correlation between presentation order and confidence, controlling for accuracy, for each participant in Experiment 2. The average correlation was .02 (SD = .22) for the constructive matching group and .01 (SD = .22) for the control group. These correlations did not differ significantly from one another (F < 1) and were not significantly different from zero, t(27) = 0.42, p = .68, for the constructive matching group or the control group, t(28) = 0.14, p = .89. Because the standard sequencing used on the RAPM is based on normative difficulty, we wondered if random ordering the items would change the degree to which confidence tracked normative item difficulty in Experiment 2. Again, we computed the within-subjects correlation between item number and confidence for each participant. The relationship between item number (normative difficulty) and confidence remained about the same in Experiment 2 as in Experiment 1. The average correlation was −.35 (SD = .20) for the constructive matching group and −.40 (SD = .15) for the control condition, both of which were different from zero, t(27) = −9.12, p < .001, and t(28) = −13.81, p < .001, respectively. These correlations did not differ from one another (F < 1), nor did they differ significantly from the average correlation in Experiment 1 (F < 1). We believe this is due to the high reliability of the test. In essence, random ordering the problems removes the possibility that participants are using presentation order as a cue but does not substantially change the relationship between confidence and normative difficulty.

Results

Overall performance. The mean proportion correct on the problems did not differ between conditions (F < 1; see Table 2 for means). Forcing participants to engage in constructive matching by drawing a candidate answer before viewing the response options did not improve their performance.³

³ Performance on the RAPM was slightly lower in Experiment 2 than in Experiment 1. We believe that this is related to having administered the items in random order (see Carlstedt, Gustafsson, & Ullstadius, 2000). However, it is important to note that Experiment 2 did not require that the RAPM scores be comparable to those of the normative sample but rather that the two experimental groups' scores not differ significantly from one another.

Monitoring accuracy. Gamma coefficients were calculated for each participant as a measure of resolution of confidence judgments. As predicted, participants in the constructive matching condition (M = .79, SD = .12) showed better relative monitoring accuracy than did participants in the control condition (M = .68, SD = .23), F(1, 55) = 5.77, p = .02, d = 0.70.⁴ Gamma was significantly different than zero for both the constructive matching condition, t(27) = 35.53, p < .001, and the control condition, t(28) = 16.07, p < .001.

⁴ Analyses were repeated with Az and yielded the same results, F(1, 55) = 7.01, p = .01, d = 0.76. Average Az was .86 (SD = .07) for the constructive matching group and .79 (SD = .11) for the control group. Both of these were significantly different from .50, t(27) = 27.52, p < .001, and t(28) = 14.18, p < .001, respectively. As in Experiment 1, Az and gamma were highly correlated for both the constructive matching condition, r(26) = .81, p < .001, and the control condition, r(27) = .92, p < .001. These correlations did not differ significantly (z = −1.43, p = .08).

Calibration error scores were calculated for each participant as a measure of absolute monitoring accuracy. Participants in the constructive matching condition showed significantly better absolute accuracy than did those in the control condition, F(1, 55) = 4.68, p = .04, d = 0.60 (see Table 2). Inspection of the overall calibration figures (see Figure 3) suggests that constructive matching led to better calibration across the range of confidence, including reductions in overconfidence at the higher levels of confidence and reductions in underconfidence at the very lowest levels. We also computed the difference between overall accuracy and mean confidence for each participant. The constructive matching instructed group showed slightly, but not significantly, less overconfidence, F(1, 55) = 2.27, p = .14. Average confidence ratings did not differ between conditions (F < 1).

Relationship between spontaneous strategy use, performance, and monitoring accuracy. As in Experiment 1, we examined the relationship between spontaneous strategy, overall performance, and monitoring accuracy for the control condition in Experiment 2. Within the control condition, spontaneous strategy use was unrelated both to overall performance, r(28) = .11, p = .58, and to relative monitoring accuracy, r(28) = −.06, p = .77, but the relationship between spontaneous strategy use and absolute monitoring accuracy reached marginal significance, r(28) = .34, p = .07. Given that a significant correlation between strategy use and performance, as well as strategy use and both relative and absolute monitoring accuracy, was found in Experiment 1, it is surprising that a similar pattern of results was not found in the control condition for Experiment 2. We believe there are two potential explanations. First, the correlation between strategy use and overall performance in Experiment 1 was .33; the sample size for the control condition in Experiment 2 would be insufficient to detect this. Second, as noted earlier, we changed the administration order of items in Experiment 2. In addition to leading to lower scores for Experiment 2, this could have affected other dependent measures, such as strategy use. If participants, particularly those at the highest ability levels, learn during the course of the task, presenting items in random order could have interfered with this.
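The item-order analysis in Footnote 2 amounts to a within-participant correlation between presentation position and confidence with accuracy partialed out. A minimal sketch of one way to do this, by correlating the residuals after regressing each variable on accuracy, is given below; the data are simulated and the names are hypothetical rather than the authors' own.

```python
# Minimal sketch (not the authors' code): per-participant correlation between item
# presentation order and confidence, controlling for accuracy (cf. Footnote 2).
import numpy as np

def partial_corr(order, confidence, accuracy):
    """Correlate the parts of order and confidence not explained by accuracy."""
    order = np.asarray(order, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    X = np.column_stack([np.ones(len(order)), np.asarray(accuracy, dtype=float)])
    resid_order = order - X @ np.linalg.lstsq(X, order, rcond=None)[0]
    resid_conf = confidence - X @ np.linalg.lstsq(X, confidence, rcond=None)[0]
    return np.corrcoef(resid_order, resid_conf)[0, 1]

# Hypothetical data for one participant: 36 items in presentation order.
rng = np.random.default_rng(1)
order = np.arange(1, 37)
accuracy = (rng.random(36) > order / 50).astype(float)  # later items solved less often
confidence = 100 - 1.5 * order + rng.normal(0, 8, 36)   # confidence drifts down with order
print(round(partial_corr(order, confidence, accuracy), 2))
```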

Table 2
Summary of Descriptive Statistics

Condition                    RAPM            Confidence       Gamma        Calibration error    Difference scores
Experiment 1 (N = 50)        21.44 (5.57)    68.02 (12.38)    .70 (.19)    .20 (.10)            8.44 (16.07)
Experiment 2
  Strategy (N = 28)          19.00 (7.55)    57.50 (17.29)    .79 (.12)    .16 (.07)            4.43 (10.78)
  Control (N = 29)           18.93 (5.24)    62.10 (17.56)    .68 (.23)    .21 (.09)            9.52 (14.39)

Note. The table gives group averages for critical measures. Standard deviations are shown in parentheses. RAPM scores reflect the raw score out of 36 items. Confidence is the average confidence for all 36 items averaged across participants. Difference scores are the signed difference between average confidence and average accuracy, which was computed for each participant individually. RAPM = Raven's Advanced Progressive Matrices.
We found a significant relationship in the strategy instructed group between overall performance and both relative monitoring accuracy, r(26) = .39, p = .04, and absolute monitoring accuracy, r(26) = −.39, p = .04. Participants who performed better on the task also tended to have better monitoring accuracy across both measures. Although this suggests that high performers tend to monitor their performance more accurately, our data do not allow us to speculate as to whether the monitoring advantage for high performers emerged because they were better able to carry out the instructed strategy or whether this advantage was simply a product of higher performance overall.

Sources of improved monitoring: Differential cue availability and use.

Presence or absence of constructed option among response options. One cue for monitoring created by our strategy manipulation is the presence or absence of one's generated response among the available options. On problems in which the generated response option was found among the given options, participants using constructive matching should have entered their responses quickly (see Park & Choi, 2008) and been more confident. If they did not find their generated response, they should have been slower to enter a response (most likely because they were attempting to revise their response) and less confident in that response.

To test whether participants were making use of the additional cue of finding or not finding their generated response among the presented answers, we computed the within-subjects correlation between time to select a response (on the screen displaying the matrix and answers) and confidence for all participants. On average, participants in the constructive matching condition showed a significantly stronger relationship between latency to enter a response and confidence (mean correlation, r = −.38, SD = .18) than did participants in the control condition (mean correlation, r = −.08, SD = .19), F(1, 55) = 36.93, p < .001, d = 1.62. For both the constructive matching condition, t(27) = −10.97, p < .001, and the control condition, t(28) = −2.13, p = .04, the correlation between latency to produce a response and confidence was different than zero.

Within the control condition, there was a significant negative relationship between the strategy measure, proportion time on matrix, and the within-subjects correlation between time to enter a response and confidence, r(27) = −.54, p = .003. That is, those participants who spontaneously used the constructive matching strategy also showed a stronger negative relationship between time to select a response and confidence. This finding suggests that participants in the control condition who spontaneously used constructive matching, similar to participants in the constructive matching instructed condition, also likely compared their generated response to the presented response choices (see Figure 4).

Discussion

Our major goal in Experiment 2 was to test whether we could manipulate the likelihood of participants using constructive matching by having them draw a candidate answer before seeing the response options, which we predicted would produce a concomitant improvement in the accuracy of confidence judgments. Monitoring accuracy did improve in the constructive matching instructed condition, both in terms of higher monitoring resolution and lower calibration error scores. This advantage in monitoring accuracy occurred in the absence of performance differences between groups, suggesting that the observed differences can be primarily attributed to differential strategy use. We believe this finding is particularly important, because it is often difficult to disentangle which advantages of strategy use are due to the strategy itself and which are due to ability advantages associated with spontaneously selecting a given strategy.

We suggest that constructing an answer before looking at the response options produces an additional cue for confidence judgments, namely, the confidence-enhancing presence of one's constructed answer among the response options or the confidence-undermining absence of one's constructed answer among the response options. Within the constructive matching group, participants who took longer to select a response on the answer screen tended to be less confident, likely because they did not find their generated option and took time to revise their answer, and those who chose a response quickly were more confident. We found additional support for use of such a cue in the negative relation between time to select a response option and confidence for those spontaneously using constructive matching in the control condition, suggesting that the effects of our strategy manipulation on monitoring were qualitatively similar to those of an endogenously generated strategy.
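The latency-confidence analysis discussed above boils down to one correlation per participant between time on the answer screen and the confidence rating, compared across instruction conditions. A minimal sketch with simulated trial data and hypothetical column names follows; it is not the authors' code.

```python
# Minimal sketch (not the authors' code): within-participant correlation between
# response time on the full-item screen and confidence, averaged by condition.
import numpy as np
import pandas as pd

def rt_confidence_correlations(trials: pd.DataFrame) -> pd.Series:
    """One Pearson r per participant between answer-screen RT and confidence."""
    return trials.groupby("participant").apply(
        lambda g: np.corrcoef(g["rt_answer_screen"], g["confidence"])[0, 1]
    )

# Hypothetical trial-level data: 10 participants x 36 items.
rng = np.random.default_rng(2)
trials = pd.DataFrame({
    "participant": np.repeat(np.arange(10), 36),
    "condition": np.repeat(["constructive"] * 5 + ["control"] * 5, 36),
    "rt_answer_screen": rng.gamma(2.0, 5.0, 360),
    "confidence": rng.uniform(12, 100, 360),
})

per_person = rt_confidence_correlations(trials)
condition = trials.groupby("participant")["condition"].first()
print(per_person.groupby(condition).mean())  # mean r for each instruction condition
```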

Figure 3. Calibration plot for the experimental group (top panel) and control group (bottom panel) in Experiment 2. Labels are the number of observations per confidence bin.

General Discussion

The current experiments offer evidence that differences in task strategy on a logical reasoning task can causally affect the monitoring accuracy of confidence judgments, independent of overall task performance. Experiment 1 established a link between spontaneous task strategy, overall task performance, and monitoring accuracy. Although use of the constructive matching strategy was associated with slightly better performance, strategies were found to account for unique variance in absolute monitoring accuracy even when controlling for overall performance differences. These findings suggest that the superior monitoring accuracy of those performing well on tasks is not entirely due to differences in ability but may also be a result of the informational advantage associated with selection of more effective and adaptive strategies.

Experiment 2 demonstrated that differential strategy use can causally affect monitoring accuracy, independent of performance differences. Across several different measures of monitoring, participants instructed to use constructive matching consistently demonstrated better monitoring accuracy on both relative and absolute measures. Further analyses established that these between-groups differences in monitoring accuracy were related to differential cue use between the constructive matching and uninstructed groups, rather than differences in overall confidence. We suggest that constructing a candidate response before looking at the response options gives rise to an additional cue for monitoring performance (i.e., the presence or absence of one's constructed response among the options). Finding one's answer may raise one's confidence in it, whereas not finding the candidate answer may lower confidence. The absence of one's constructed answer among response options also offers an additional opportunity to revise one's original response.

Although it is possible that our strategy instruction manipulation could lead to higher scores in the experimental condition, it did not. This may be due to a number of potential reasons. First, instructed and spontaneous strategies are often not equivalent. In some cases, not all individuals are able to benefit from an instructed strategy or benefit to the same extent as those spontaneously generating the strategy, nor do instructed strategies always improve performance (e.g., Baron, 1978; Bryan, Luszcz, & Pointer, 1999; Cokely, Kelley, & Gilchrist, 2006). Second, scores on Raven's Matrices and similar tests do not improve with brief training interventions such as the one used in the current experiment; rather, extensive, specific training is required to produce significant changes in scores (e.g., Jaeggi, Buschkuehl, Jonides, & Perrig, 2008). Finally, there is nothing about our strategy manipulation that necessarily gave those in the constructive matching condition any additional information about how to carry out operations that produce correct responses. Although it would be of great practical import if instructing people to engage in constructive matching improved their performance on IQ tasks like the Raven's Matrices, the lack of an effect of such instructions on performance in our Experiment 2 helps sharpen our argument about the effects of strategy on monitoring. Namely, strategy can affect confidence monitoring independent of effects on performance.

In Experiment 2, we further examined reliance of participants on the presence or absence of their generated option as a cue by computing the within-subjects correlation between (a) time to enter an answer after displaying response options and (b) confidence judgments for each participant. Participants who find their generated response should answer more quickly and be more confident. When they do not find their generated response, they should answer more slowly and be less confident. Indeed, we found that those instructed to use constructive matching showed a stronger negative correlation between time to choose a response and confidence than did those in the control condition. Within the control condition, participants spontaneously using constructive matching, as measured by the average proportion of time spent on the matrix section of problems, also showed a strong, negative relationship between time to enter a response and confidence.

Figure 4. Scatterplot of cue utilization by strategy. Participants in the control condition in Experiment 2 who spontaneously used constructive matching were less confident when they took longer to enter responses.

Taken together, the results of these two experiments highlight the potential influence of task strategy on monitoring performance. If feedback from solution processes is a significant part of the informational basis for monitoring judgments, it follows that systematic differences in solution processes can affect the accuracy of judgments by making different information available. On multiple-choice reasoning tasks, participants using constructive matching have access to at least one additional cue that those using response elimination do not, namely, the presence of their generated answer among response choices.

Relation to Overconfidence

The constructive matching manipulation improved calibration overall, with a combination of reduced overconfidence at the high end of the confidence scale and reduced underconfidence at the low end of the scale. The improved calibration adds to a growing literature indicating that overconfidence can be rooted in cognitive processes and does not simply reflect a bias due to random error or a general mismatch between the test and the environment to which one has adapted (Gigerenzer, Todd, & the ABC Research Group, 1999). Prior manipulations that have reduced overconfidence include requiring participants to retrieve more prior exemplars in classification tasks (Sieck & Yates, 2001) and requiring participants to consider each option on a general knowledge test as correct and construct an explanation of why that could be (Sieck, Merkle, & Van Zandt, 2007). People using response elimination may be more susceptible to overconfidence when they are not guessing if they quickly consider a best candidate option and then work backward through the matrix problem to construct an explanation of how that option could be an answer. As Sieck et al. noted in the case of multiple-choice general knowledge tests, it is relatively easy for people to construct a possible explanation for any answer, leading to so-called option fixation and a reduction in the consideration of other options. This may be particularly true in the case of logical reasoning problems in which foils are very plausible. A second element that might contribute to relatively poor monitoring when response elimination is used to solve nonverbal reasoning problems is that people may test the fit of possible responses with only a subset of the constraints that constitute the problem, making it too easy to "discover" a rule that fits with the local context. Vigneau et al. (2006) found that participants solving Raven's problems often visually fixate on the figures directly adjacent to the missing figure.
There are fewer constraints on the possible response if participants evaluate only whether a possible response fits with the local context than if they consider the entire matrix. It follows that it is relatively easy to bolster one's confidence in a wrong answer by seeing that it fits with adjacent figures. Such a process might contribute to overconfidence in incorrect responses. In contrast, constructive matching may lead to more thorough consideration of the matrix that places more constraints on possible options and so results in less overconfidence in incorrect answers.

We predict that constructive matching is likely to improve calibration in many other tasks as well, relative to a strategy of working backward from candidate options. For example, the classic task in overconfidence research is answering general knowledge questions. People taking general knowledge tests vary in whether they attempt to answer a question before looking at response options or work backward from response options. McClain (1983) found that top students (i.e., students who earn "A" grades in their courses) taking a class exam first answered the question on their own and then considered all the options. Lower performing students (i.e., those who earn "C" grades) tended to go directly to the options and appeared to consider only one or two. We are currently testing whether requiring participants to answer a question before seeing the response options improves monitoring of general knowledge and to what extent the improvement is mediated by a fuller consideration of response options, particularly when one's generated response is disconfirmed.

Additional Cues Linked to Performance and Strategy Use

Although in the current experiments we have focused exclusively on one strategy-related cue, specifically, the presence or absence of one's constructed answer among response options, there are other cues that could be used during confidence monitoring in nonverbal reasoning tasks similar to those of the RAPM. Many of these cues are likely related both to performance on the task and, to a lesser extent, to task strategy. For example, Schiano et al. (1989) found that, when asked to sort items into groups of similar items, high scorers grouped items on the basis of abstract transformational relations (e.g., simple rotations vs. multiple transformations such as rotation and reflection) and low scorers sorted problems into categories based on perceptual similarities or shared figural characteristics (e.g., "diamondlike" shapes and figures were sorted together). Several studies have linked perceptual features and overall complexity of matrix and geometric analogy items to normative difficulty (Meo, Roberts, & Marucci, 2007; Primi, 2001), and, at least in the case of memory monitoring judgments, perceptual features have been shown to affect the magnitude of predictive judgments (Rhodes & Castel, 2008). If high and low performers differ in their mental representations of items, this could also affect how perceptual and item features are used as cues for confidence monitoring, with high and low performers using different criteria to classify an item as "easy" or "difficult."

Intelligence, Strategies, and Superior Performance

We chose to study monitoring in the context of an intelligence test that is widely considered the best measure of general fluid intelligence (Marshalek, Lohman, & Snow, 1983; Snow, Kyllonen, & Marshalek, 1984). Performance on Raven's Matrices, and on intelligence tests in general, predicts a number of real-world behaviors ranging from occupational and educational performance to health outcomes (Hunter & Schmidt, 1996; Neisser et al., 1996). Monitoring is not part of the psychometric use of the RAPM, yet we speculate that it could add important information to an understanding of individual differences in reasoning. For example, the constructive matching manipulation used in Experiment 2 did not change people's scores, but it did improve the extent to which they knew what they knew. The confidence people have in their reasoning and judgment determines their reliance on the "answer" for action and the reliance others may place on their recommendations. Monitoring on the RAPM may predict additional variance in real-world performance indicators, beyond that predicted by overall score. Koren et al. (2004) demonstrated that monitoring accuracy on an executive functioning task, the Wisconsin Card Sorting Task, was a better predictor of poor insight (which is associated with treatment outcomes) in patients with schizophrenia than the conventional scores themselves.

It is theoretically interesting that our results reveal individual differences in monitoring (for a related discussion, see Hertzog & Dunlosky, 2004). Although there may not be a general monitoring ability (e.g., Kelemen, Frost, & Weaver, 2000), one avenue for future investigation is determining whether stable individual differences in strategic behavior could produce accompanying individual differences in monitoring accuracy (Cokely et al., 2006). Taking a more predictive approach to tasks benefits monitoring and is a common metastrategy favored by high performers across domains (Baron, 1985; Cokely & Kelley, 2009; Dunlosky, Rawson, & Middleton, 2005; Ericsson & Charness, 1994; Hertzog & Robinson, 2005; Kossowska & Necka, 1994; McClain, 1983; Sternberg, 1998). The two strategies examined in the current work, constructive matching and response elimination, could be classified as examples of a predictive strategy versus a more confirmatory strategy (in which one works backward). In general, one would expect that predictive strategies would yield more information that could be useful in monitoring one's performance. However, the current study was one of the first to examine the extent to which these different strategic approaches influence monitoring accuracy. More generally, predictive strategies similar to constructive matching may also be useful because they could be taught to individuals with poor monitoring skills (Sternberg, 1999). Our results demonstrate that individuals can reap monitoring benefits from this kind of strategy even in the face of significant individual differences in performance.

Related work in social psychology finds that poor performers are often "unskilled and unaware." That is, low performance tends to beget poor monitoring because low performers lack insight into their own errors (Dunning, Johnson, Ehrlinger, & Kruger, 2003; Ehrlinger, 2008; Ehrlinger, Johnson, Banner, Dunning, & Kruger, 2008; Kruger & Dunning, 1999; Maki, Shields, Wheeler, & Zacchilli, 2005). Although Krueger and Mueller (2002) showed that in some cases the unskilled and unaware effect is a statistical artifact due to regression to the mean, this is more likely to be the case when tests are unreliable and so is less likely to be true when highly reliable tests, such as the RAPM, are used to assess performance. More recently, Krajc and Ortmann (2008) proposed another alternative explanation for the unskilled and unaware effect.
STRATEGIES, MONITORING, AND CONTROL 709

able as a basis for estimates. Similarly, but at lower level of fail to recognize their own incompetence. Current Directions in Psy-
analysis, our work suggests that, at the item level, strategies used chological Science, 12, 83– 87.
by poor performers may limit the cues available for monitoring Egan, D. E., & Grimes-Farrow, D. D. (1982). Differences in mental
and so may contribute to the unskilled and unaware phenomenon. representations spontaneously adopted for reasoning. Memory & Cog-
Instructing participants in the use of more constructive strategies nition, 10, 297–307.
that produce self-correcting feedback may be one avenue for Ehrlinger, J. (2008). Skill level, self-views, and self-theories as sources of
error in self-assessment. Social and Personality Psychology Compass, 2,
reducing potential “unskilled and unaware” type effects.
382–398.
Ehrlinger, J., Johnson, K., Banner, M., Dunning, D., & Kruger, J. (2008).
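As a concrete illustration of the regression account, consider a minimal simulation sketch that assumes normally distributed ability, a test score that measures ability with error determined by the test's reliability, and self-estimates of relative standing that track ability imperfectly. The reliability values, the 0.8 noise term, and the function and variable names are illustrative assumptions rather than values taken from the present experiments. Splitting people on their observed test quartile then yields apparent overestimation among low scorers that shrinks, although it does not vanish, as reliability increases.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(reliability, n=100_000):
    """Regression-to-the-mean artifact: the test score and the self-estimate are
    both noisy indicators of the same latent ability, so splitting people on the
    observed score makes low scorers appear to overestimate their standing."""
    ability = rng.standard_normal(n)                      # latent skill
    test_noise_sd = np.sqrt(1 / reliability - 1)          # lower reliability -> noisier score
    score = ability + test_noise_sd * rng.standard_normal(n)
    estimate = ability + 0.8 * rng.standard_normal(n)     # self-estimate tracks ability imperfectly

    # Express the observed score and the self-estimate as percentile ranks (0-100).
    score_pct = 100 * score.argsort().argsort() / (n - 1)
    est_pct = 100 * estimate.argsort().argsort() / (n - 1)

    # Group by observed test quartile and compare estimated with actual standing.
    quartile = np.minimum(score_pct // 25, 3)
    for q in range(4):
        gap = est_pct[quartile == q].mean() - score_pct[quartile == q].mean()
        print(f"  quartile {q + 1}: mean overestimation = {gap:6.1f} percentile points")

for r in (0.60, 0.95):
    print(f"test reliability = {r:.2f}")
    simulate(r)
```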
Conclusion

It is well established that the selection of appropriate task strategies is crucial for effective task performance (Gigerenzer et al., 1999; Simon, 1990; Snow, 1980; Sternberg, 1977). In the current experiments, we extended these findings by demonstrating that task strategies also influence monitoring accuracy. Monitoring accuracy can be improved by using a prediction-based strategy, such as constructive matching, relative to confirmatory strategies that involve working backward, as in the case of response elimination. Our results demonstrate that strategies can affect monitoring by creating the opportunity for self-generated feedback about ongoing performance and, in doing so, shed light on some of the strategic sources of superior cognitive regulation.
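Item-level monitoring accuracy (resolution) of the kind at issue here is conventionally indexed by the Goodman–Kruskal gamma correlation between confidence and accuracy (Nelson, 1984; see Masson & Rotello, 2009, for limitations of the measure). As a concrete illustration, the short sketch below computes gamma from item-by-item confidence ratings and correctness; the data and the function name are hypothetical, not taken from the experiments.

```python
from itertools import combinations

def goodman_kruskal_gamma(confidence, correct):
    """Gamma between item-level confidence and accuracy: a pair of items is
    concordant when the correctly answered item drew the higher confidence,
    discordant when it drew the lower one; tied pairs are ignored."""
    concordant = discordant = 0
    for (c1, a1), (c2, a2) in combinations(zip(confidence, correct), 2):
        if c1 == c2 or a1 == a2:
            continue  # tie on confidence or accuracy: contributes nothing
        if (c1 > c2) == (a1 > a2):
            concordant += 1
        else:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical data for one participant: confidence on a 0-100 scale,
# accuracy scored 1 = correct, 0 = error.
confidence = [90, 75, 60, 85, 40, 55, 95, 30]
correct = [1, 1, 0, 1, 0, 1, 1, 0]
print(f"resolution (gamma) = {goodman_kruskal_gamma(confidence, correct):.2f}")
```

Gamma ranges from -1 to 1, and higher values indicate that correct responses were held with reliably greater confidence than errors.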
References

Baron, J. (1978). Intelligence and general strategies. In G. Underwood (Ed.), Strategies of information processing (pp. 403–450). London, England: Academic Press.
Baron, J. (1985). Rationality and intelligence. New York, NY: Cambridge University Press.
Bethell-Fox, C. E., Lohman, D. F., & Snow, R. E. (1984). Adaptive reasoning: Componential and eye-movement analysis of geometric analogy performance. Intelligence, 8, 205–238.
Bors, D. A., & Stokes, T. L. (1998). Raven's Advanced Progressive Matrices: Norms for first-year university students and the development of a short form. Educational and Psychological Measurement, 58, 382–398.
Bryan, J., Luszcz, M. A., & Pointer, S. (1999). Executive function and processing resources as predictors of adult age differences in the implementation of encoding strategies. Aging, Neuropsychology, and Cognition, 6, 273–287.
Budescu, D. V., Wallsten, T. S., & Au, W. T. (1997). On the importance of random error in the study of probability judgment: Part II. Applying the stochastic judgment model to detect systematic trends. Journal of Behavioral Decision Making, 10, 173–188.
Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L., III. (2006). When additional multiple-choice lures aid versus hinder memory. Applied Cognitive Psychology, 20, 941–956.
Carlstedt, B., Gustafsson, J., & Ullstadius, E. (2000). Item sequencing effects on the measurement of fluid intelligence. Intelligence, 28, 145–160.
Cokely, E. T., & Kelley, C. M. (2009). Cognitive abilities and superior decision making under risk: A protocol analysis and process model evaluation. Judgment and Decision Making, 4, 20–33.
Cokely, E. T., Kelley, C. M., & Gilchrist, A. H. (2006). Sources of individual differences in working memory: Contributions of strategy to capacity. Psychonomic Bulletin & Review, 13, 991–997.
Dunlosky, J., Rawson, K. A., & Middleton, E. L. (2005). What constrains the accuracy of metacomprehension judgments? Testing the transfer-appropriate-monitoring and accessibility hypothesis. Journal of Memory and Language, 52, 551–565.
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87.
Egan, D. E., & Grimes-Farrow, D. D. (1982). Differences in mental representations spontaneously adopted for reasoning. Memory & Cognition, 10, 297–307.
Ehrlinger, J. (2008). Skill level, self-views, and self-theories as sources of error in self-assessment. Social and Personality Psychology Compass, 2, 382–398.
Ehrlinger, J., Johnson, K., Banner, M., Dunning, D., & Kruger, J. (2008). Why the unskilled are unaware: Further explorations of (absent) self-insight among the incompetent. Organizational Behavior and Human Decision Processes, 105, 98–121.
Ericsson, K. A., & Charness, N. (1994). Expert performance: Its structure and acquisition. American Psychologist, 49, 725–747.
Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple heuristics that make us smart. New York, NY: Oxford University Press.
Hertzog, C., & Dunlosky, J. (2004). Aging, metacognition, and cognitive control. In B. H. Ross (Ed.), Psychology of learning and motivation (pp. 215–251). San Diego, CA: Academic Press.
Hertzog, C., & Robinson, A. E. (2005). Metacognition and intelligence. In O. Wilhelm & R. W. Engle (Eds.), Understanding and measuring intelligence (pp. 101–123). London, England: Sage.
Higham, P. A., & Arnold, M. M. (2007). How many questions should I answer? Using bias profiles to estimate optimal bias and maximum score on formula-scored tests. European Journal of Cognitive Psychology, 19, 718–742.
Hunter, J. E., & Schmidt, F. L. (1996). Intelligence and job performance: Economic and social implications. Psychology, Public Policy, and Law, 2, 447–472.
Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences, USA, 105, 6829–6833.
Kelemen, W. L., Frost, P. J., & Weaver, C. A. (2000). Individual differences in metacognition: Evidence against a general metacognitive ability. Memory & Cognition, 28, 92–107.
Kelley, C. M., & Jacoby, L. L. (1996). Adult egocentrism: Subjective experience versus analytic bases for judgment. Journal of Memory and Language, 35, 157–175.
Koren, D., Seidman, L. J., Poyurovsky, M., Goldsmith, M., Viksman, P., Zichel, S., & Klein, E. (2004). The neuropsychological basis of insight in first-episode schizophrenia: A pilot metacognitive study. Schizophrenia Research, 70, 195–202.
Koriat, A. (1993). How do we know that we know? The accessibility model of the feeling of knowing. Psychological Review, 100, 609–639.
Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370.
Koriat, A., Ma'ayan, H., & Nussinson, R. (2006). The intricate relationships between monitoring and control in metacognition: Lessons for the cause-and-effect relation between subjective experience and behavior. Journal of Experimental Psychology: General, 135, 36–69.
Kossowska, M., & Nęcka, E. (1994). Do it your own way: Cognitive strategies, intelligence, and personality. Personality and Individual Differences, 16, 33–46.
Krajc, M., & Ortmann, A. (2008). Are the unskilled really that unaware? An alternative explanation. Journal of Economic Psychology, 29, 724–738.
Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The better-than-average heuristic and statistical regression predict errors in estimates of own performance. Journal of Personality and Social Psychology, 82, 180–188.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1121–1134.
Kyllonen, P. C., Lohman, D. F., & Woltz, D. J. (1984). Componential modeling of alternative strategies for performing spatial tasks. Journal of Educational Psychology, 76, 1325–1345.
Maki, R. H., Shields, M., Wheeler, A. E., & Zacchilli, T. L. (2005). Individual differences in absolute and relative metacomprehension accuracy. Journal of Educational Psychology, 97, 723–731.
Marshalek, B., Lohman, D. F., & Snow, R. E. (1983). The complexity continuum in the radex and hierarchical models of intelligence. Intelligence, 7, 107–127.
Masson, M. E. J., & Rotello, C. M. (2009). Sources of bias in the Goodman–Kruskal gamma coefficient measure of association: Implications for studies of metacognitive processes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 509–527.
McClain, L. (1983). Behavior during examinations: A comparison of "A," "C," and "F" students. Teaching of Psychology, 10, 69–71.
Meo, M., Roberts, M. J., & Marucci, F. S. (2007). Element salience as a predictor of item difficulty for Raven's Progressive Matrices. Intelligence, 35, 359–368.
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., . . . Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133.
Oskamp, S. (1962). The relationship of clinical experience and training methods to several criteria of clinical prediction. Psychological Monographs, 76(28, Whole No. 547), 1–21.
Oskamp, S. (1965). Overconfidence in case study judgments. Journal of Consulting Psychology, 29, 261–265.
Park, J., & Choi, B. (2008). Higher retention after a new take-home computerized test. British Journal of Educational Technology, 39, 538–547.
Primi, R. (2001). Complexity of geometric inductive reasoning tasks: Contribution to the understanding of fluid intelligence. Intelligence, 30, 41–70.
Raven, J. C., Raven, J., & Court, J. H. (1998). Manual for Raven's Progressive Matrices and Vocabulary Scales: Section 4. Advanced Progressive Matrices, Sets I and II. Oxford, England: Oxford Psychologists Press.
Rhodes, M. G., & Castel, A. D. (2008). Memory predictions are influenced by perceptual information: Evidence for metacognitive illusions. Journal of Experimental Psychology: General, 137, 615–625.
Schiano, D. J., Cooper, L. A., Glaser, R., & Zhang, H. C. (1989). Highs are to lows as experts are to novices: Individual differences in the representation and solution of standardized figural analogies. Human Performance, 2, 225–248.
Schwartz, B. L., Benjamin, A. S., & Bjork, R. A. (1997). The inferential and experiential bases of metamemory. Current Directions in Psychological Science, 6, 132–137.
Sieck, W. R., Merkle, E. C., & Van Zandt, T. (2007). Option fixation: A cognitive contributor to overconfidence. Organizational Behavior and Human Decision Processes, 103, 68–83.
Sieck, W. R., & Yates, J. F. (2001). Overconfidence effects in category learning: A comparison of connectionist and exemplar memory models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1003–1021.
Simon, H. A. (1990). Invariants of human behavior. Annual Review of Psychology, 41, 1–19.
Snow, R. E. (1980). Aptitude processes. In R. E. Snow, P. A. Federico, & W. E. Montague (Eds.), Aptitude, learning, and instruction: Vol. 1. Cognitive process analyses of aptitude (pp. 27–63). Hillsdale, NJ: Erlbaum.
Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and learning correlations. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 2, pp. 47–103). Hillsdale, NJ: Erlbaum.
Stankov, L. (1998). Calibration curves, scatterplots, and the distinction between general knowledge and perceptual tasks. Learning and Individual Differences, 10, 29–50.
Sternberg, R. J. (1977). Component processes in analogical reasoning. Psychological Review, 84, 353–378.
Sternberg, R. J. (1998). Metacognition, abilities, and developing expertise: What makes an expert student? Instructional Science, 26, 127–140.
Sternberg, R. J. (1999). Successful intelligence: Finding a balance. Trends in Cognitive Sciences, 3, 436–442.
Vigneau, F., Cassie, A. F., & Bors, D. A. (2006). Eye-movement analysis demonstrates strategic influences on intelligence. Intelligence, 34, 261–272.

Received May 13, 2009
Revision received December 17, 2009
Accepted December 18, 2009