Solve the Problem First: Constructive Solution Strategies Can Influence the
Accuracy of Retrospective Confidence Judgments
Two experiments tested whether differences in problem-solving strategies influence the ability of people to monitor their problem-solving effectiveness as measured by confidence judgments. On multiple-choice problems, people tend to use either a constructive matching strategy, whereby they attempt to solve a problem before looking at the response options, or a response elimination strategy, whereby they work backward from response options trying to find one that fits as a solution. Constructive matching gives rise to different cues that may enhance confidence monitoring. Experiment 1 showed that spontaneous constructive matching in nonverbal spatial reasoning problems was associated with better confidence calibration and resolution than response elimination. We manipulated strategy in Experiment 2 by requiring constructive matching and found improved monitoring. Implications for research on monitoring, overconfidence, and the association between skill and monitoring are discussed.

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
Accurate monitoring is a key component of cognition. Solving problems, even when one has a relatively high degree of accuracy, is not sufficient for adaptive cognitive regulation unless one can also tell whether a solution is correct versus incorrect. Confidence in problem solving, as in other cognitive domains, must be based on a range of cues and inferences (Koriat, 1993, 1997; Schwartz, Benjamin, & Bjork, 1997). The level of monitoring accuracy one can achieve depends on the type and quality of cues available during monitoring rather than a direct assessment of knowledge or memory.

One important source of cues for metacognitive judgments such as confidence is feedback from the cognitive operations leading up to the production of a response (Koriat, 1997; Koriat, Ma'ayan, & Nussinson, 2006). During problem solving, one finds out how difficult an item is by attempting to solve it (Kelley & Jacoby, 1996; Koriat et al., 2006, Experiment 7). Problems that do not yield easily to attempts to solve them garner lower confidence judgments. Koriat et al. (2006) demonstrated that one's latency to produce a response when solving a problem or attempting to memorize a word pair is a valid basis for confidence judgments and judgments of learning, respectively.

When people engage in qualitatively different strategies on a particular task, the resulting process differences are also likely to affect the kind of information one has available for confidence monitoring. In the current set of experiments, we asked whether different strategies used in a nonverbal inductive reasoning task, Raven's Advanced Progressive Matrices (Raven, Raven, & Court, 1998), can create differences in the quality of confidence monitoring and if these differences can be independent of effects on accuracy. This task is well-suited to the question, as people are known to use different strategies on Raven's Matrices, and those strategies can be distinguished by using objective measures (Snow, 1980; Vigneau, Caissie, & Bors, 2006). In addition, the test has excellent psychometric properties that reduce chance fluctuations in performance that might obscure assessment of monitoring (Budescu, Wallsten, & Au, 1997).
Strategy Use on Inductive Reasoning Tasks
Research examining performance on inductive, analogical, and spatial/geometric reasoning tasks has shown a surprising degree of both inter- and intraindividual flexibility in performance strategies (Bethell-Fox, Lohman, & Snow, 1984; Egan & Grimes-Farrow, 1982; Kossowska & Nęcka, 1994; Kyllonen, Lohman, & Woltz, 1984; Sternberg, 1977). Snow (1980) outlined two primary strategies that participants tend to use on multiple-choice nonverbal reasoning tasks, constructive matching and response elimination. Constructive matching, which is more likely to be favored by high-performing participants (Bethell-Fox et al., 1984; Schiano, Cooper, Glaser, & Zhang, 1989; Snow, 1980; Vigneau et al., 2006), is characterized by a tendency to spend proportionally more time examining each problem before inspecting available answer choices. Converging evidence from verbal reports and eye-movement analyses suggests that participants spend this extra time constructing a potential answer, which is then compared to the presented response options (Snow, 1980; Vigneau et al., 2006). Response elimination, which is more likely to be favored by poor performers, is characterized by a more trial-and-error approach to solving items. Rather than predicting what the correct answer would look like beforehand, those using response elimination tend to compare features of the stimulus items with features of response options in hopes of eliminating incorrect responses, in essence "reasoning backward" from each potential response option (Bethell-Fox et al., 1984; Snow, 1980). Snow and Bethell-Fox et al. noted that individual participants often do not exclusively use one strategy or the other. Some participants initially used constructive matching within tasks but switched to response elimination as items became more difficult.

Effect of Strategy on Cues Available for Confidence Monitoring

Using qualitatively different strategies on a task should affect monitoring accuracy as well as performance by producing differences in the type and quality of cues available for monitoring. Compared to participants who rely heavily on response elimination, participants who use constructive matching may have a qualitatively different and more diagnostic set of cues to draw from when making monitoring judgments. This allows them to more accurately evaluate their performance. Perhaps the most salient cue available to those who use constructive matching, one that is not available to those using response elimination, is the presence or absence of one's generated response among the available options.

Ainsley L. Mitchum and Colleen M. Kelley, Department of Psychology, Florida State University.
We thank Edward T. Cokely, Mark C. Fox, Katy Nandagopal, Tres Roring, and Cari Zimmerman for feedback and lively discussion that contributed substantially to the quality of this work. This research was completed as part of Ainsley L. Mitchum's thesis submitted to Florida State University. We thank committee members K. Anders Ericsson and Joyce Ehrlinger for their helpful feedback. We are also grateful for the assistance of our dedicated research team for help with data collection. Finally, we thank David Schell for his assistance in preparing the final version of the manuscript.
Correspondence concerning this article should be addressed to Ainsley L. Mitchum, Department of Psychology, Florida State University, 1107 West Call Street, Tallahassee, FL 32306-4301. E-mail: mitchum@psy.fsu.edu
Method

Participants. Participants were 55 Florida State University undergraduates recruited from the general psychology participant pool. Participants received course credit in exchange for their participation. Data from five participants were excluded from all analyses because they indicated that they had previously participated in another experiment using Raven's Matrices.

Materials, design, and procedure. Participants were tested individually in a single session lasting about one hour, during which they completed the nonverbal reasoning problems of Raven's Matrices and rated their confidence for each problem. The testing was followed by a postexperimental questionnaire and debriefing.

Results

Overall performance on the RAPM. Overall performance on the RAPM was similar to that of the normative sample of 506 first-year university students collected by Bors and Stokes (1998). In the current sample (N = 50), the average score was 21.4 (SD = 5.57) correct responses out of a possible 36, whereas Bors and Stokes reported a mean of 22.2 (SD = 5.60).

Strategy measure. Following Vigneau et al. (2006), we took spending a larger proportion of time viewing the matrix alone before displaying the response options as an indication of greater reliance on the constructive matching strategy. Participants' strategy use was defined as the average proportion of total time per problem spent examining the matrix portion before displaying the response options.
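The strategy measure just described reduces to a simple per-participant computation. A minimal Python sketch (the function and variable names are ours, purely illustrative; timing data would come from the experiment software):

```python
def strategy_index(matrix_times, total_times):
    """Mean proportion of per-problem solution time spent viewing the matrix
    alone before the response options were displayed. Higher values indicate
    greater reliance on constructive matching (after Vigneau et al., 2006)."""
    proportions = [m / t for m, t in zip(matrix_times, total_times)]
    return sum(proportions) / len(proportions)

# Two hypothetical problems: 12 s of 20 s, then 9 s of 30 s on the matrix alone.
index = strategy_index([12.0, 9.0], [20.0, 30.0])  # (0.60 + 0.30) / 2 = 0.45
```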
measured by proportion of time on the problem matrix was related to resolution, r(48) = .40, p = .004, indicating that participants who relied more heavily on constructive matching had better relative monitoring accuracy in their confidence judgments. However, gamma correlations were unrelated to overall performance, r(48) = .17, p = .24.

Absolute accuracy. Absolute accuracy, or calibration (also called bias), examines the magnitude of the difference between one's level of subjective confidence and actual performance, indicating overconfidence, underconfidence, or perfect calibration. This is typically done by plotting a calibration curve that displays actual performance as a function of subjective confidence ratings. For example, perfect calibration would be indicated when items to which an individual assigns a 50% probability of being correct are correct 50% of the time, and so forth across the scale.

Participant confidence judgments for each of the 36 problems were divided into 11 discrete categories (0–12, 13–20, 21–30 . . . 91–99, 100). Calibration error scores (Oskamp, 1962, 1965) were calculated for each participant as the weighted mean of the absolute differences between the mean confidence and actual proportion correct for each confidence grouping,

C = Σ(n|p − c|)/N,

where n is the total number of observations at each confidence level, p is the assessed confidence level, c is the actual proportion correct at each confidence level, and N is the total number of observations. Calibration error scores give an absolute measure of calibration and do not indicate overconfidence or underconfidence. We chose this measure because we wanted to examine absolute calibration independent of over- or underconfidence.

As in past studies (e.g., Stankov, 1998), participants were fairly well calibrated (M = .20, SD = .10) but showed considerable variability (see Figure 2). Calibration error scores ranged from .06 to .51, with higher scores indicating poorer calibration. Calibration error scores were related to overall performance such that those earning lower scores tended to have larger calibration error scores, r(48) = −.51, p = .0002. More important, calibration error scores were also related to strategy use such that those relying more heavily on constructive matching had smaller calibration error scores, r(48) = −.52, p = .0001.

Hierarchical regression analysis was used to test the hypothesis that strategy use accounts for unique variance in calibration, even after controlling for performance (see Table 1 for a summary). Calibration error scores were regressed on overall performance and strategy use. Together, these predictors accounted for a significant amount of variance in calibration, F(2, 47) = 15.43, p < .001, adjusted R² = .37. Performance alone accounted for a significant proportion of the overall variance in calibration error scores (β = −.38, p = .003). More important and as predicted, the degree of constructive matching as measured by proportion of time on the matrix before moving to the response options also accounted for significant unique variance in calibration error (β = −.40, p = .002).

Overconfidence and underconfidence. The difference between average confidence and average accuracy was computed for each participant to evaluate the relationship between overconfidence or underconfidence and strategy use. Participants were slightly overconfident. The average difference between overall confidence and overall performance was 8.44 (SD = 16.1), with difference scores ranging from −32.3 to 50.2. The relationship between strategy and difference scores was marginally significant, r(48) = −.24, p = .09, such that those who relied more on constructive matching tended to be slightly underconfident and those who relied primarily on response elimination tended to be slightly overconfident.
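Concretely, the calibration error score defined above can be computed as in the following sketch (Python; the function name and the bin list passed in are ours for illustration, not part of the original materials):

```python
def calibration_error(confidence, correct, bins):
    """Calibration error C = sum over bins of n * |p - c|, divided by N, where
    n is the number of judgments falling in a confidence bin, p is the mean
    confidence in that bin (as a proportion), c is the proportion correct in
    that bin, and N is the total number of observations (Oskamp, 1962, 1965)."""
    N = len(confidence)
    C = 0.0
    for lo, hi in bins:
        members = [i for i, cf in enumerate(confidence) if lo <= cf <= hi]
        n = len(members)
        if n == 0:
            continue  # empty bins contribute nothing
        p = sum(confidence[i] for i in members) / n / 100.0  # mean confidence, 0-1
        c = sum(correct[i] for i in members) / n             # proportion correct
        C += n * abs(p - c)
    return C / N
```

For instance, four items all judged 100% likely to be correct but answered correctly only half the time yield C = .50, whereas confidence that matches accuracy within every bin yields C = 0.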
Figure 2. Calibration plot for Experiment 1. Labels are the number of observations per confidence bin.
Table 1
Summary of Hierarchical Regression Predicting Calibration Error Scores in Experiment 1
the answers. In the control condition, instructions were identical to those in Experiment 1. To promote and monitor compliance with the strategy instruction to generate a response, we required participants in the constructive matching condition to draw their candidate answer on a piece of scratch paper before continuing to the answer screen.

In Experiment 2, the problems were presented to participants in a random order rather than in ascending order according to normative difficulty, as is typically done when the RAPM is used as an intelligence test. Pilot testing and results from Experiment 1 suggested that the item difficulty gradient could be the basis for monitoring accuracy, in that one becomes able to anticipate that each problem is more difficult than the last. The availability of this cue would reduce participants' reliance on more data-driven, experience-based cues. By random ordering the problems, we hoped to increase the likelihood that participants' confidence judgments were based on actual attempts to solve the problems rather than quick inferences reflecting the difficulty of previous problems.²

Results

Overall performance. The mean proportion correct on the problems did not differ between conditions (F < 1; see Table 2 for means). Forcing participants to engage in constructive matching by drawing a candidate answer before viewing the response options did not improve their performance.³

Monitoring accuracy. Gamma coefficients were calculated for each participant as a measure of resolution of confidence judgments. As predicted, participants in the constructive matching condition (M = .79, SD = .12) showed better relative monitoring accuracy than did participants in the control condition (M = .68, SD = .23), F(1, 55) = 5.77, p = .02, d = 0.70.⁴ Gamma was significantly different than zero for both the constructive matching condition, t(27) = 35.53, p < .001, and the control condition, t(28) = 16.07, p < .001.

Calibration error scores were calculated for each participant as a measure of absolute monitoring accuracy. Participants in the constructive matching condition showed significantly better absolute accuracy than did those in the control condition, F(1, 55) = 4.68, p = .04, d = 0.60 (see Table 2). Inspection of the overall calibration figures (see Figure 3) suggests that constructive matching led to better calibration across the range of confidence, including reductions in overconfidence at the higher levels of confidence and reductions in underconfidence at the very lowest levels. We also computed the difference between overall accuracy and mean confidence for each participant. The constructive matching instructed group showed slightly, but not significantly, less overconfidence, F(1, 55) = 2.27, p = .14. Average confidence ratings did not differ between conditions (F < 1).

Relationship between spontaneous strategy use, performance, and monitoring accuracy. As in Experiment 1, we examined the relationship between spontaneous strategy, overall performance, and monitoring accuracy for the control condition in Experiment 2. Within the control condition, spontaneous strategy use was unrelated both to overall performance, r(28) = .11, p = .58, and to relative monitoring accuracy, r(28) = −.06, p = .77, but the relationship between spontaneous strategy use and absolute monitoring accuracy reached marginal significance, r(28) = .34, p = .07. Given that a significant correlation between strategy use and performance, as well as strategy use and both relative and absolute monitoring accuracy, was found in Experiment 1, it is surprising that a similar pattern of results was not found in the control condition for Experiment 2. We believe there are two potential explanations. First, the correlation between strategy use and overall performance in Experiment 1 was .33; the sample size for the control condition in Experiment 2 would be insufficient to detect this. Second, as noted earlier, we changed the administration order of items in Experiment 2. In addition to leading to lower scores for Experiment 2, this could have affected other dependent measures, such as strategy use. If participants, particularly those at the highest ability levels, learn during the course of the task, presenting items in random order could have interfered with this.

We found a significant relationship in the strategy instructed group between overall performance and both relative monitoring

² For Experiment 1, the correlation between item number and confidence was computed for each participant, after controlling for accuracy, as a measure of the extent to which participants used item gradient as a confidence cue. The average correlation for the group was −.34, which was significantly different from zero, t(49) = −22.96, p < .001, suggesting that participants did use item presentation order as a confidence cue. For Experiment 2, we had hoped that presenting the items in random order would reduce the use of item sequence as a cue. To examine this we computed the correlation between presentation order and confidence, controlling for accuracy, for each participant in Experiment 2. The average correlation was .02 (SD = .22) for the constructive matching group and .01 (SD = .22) for the control group. These correlations did not differ significantly from one another (F < 1) and were not significantly different from zero, t(27) = 0.42, p = .68, for the constructive matching group or the control group, t(28) = 0.14, p = .89. Because the standard sequencing used on the RAPM is based on normative difficulty, we wondered if random ordering the items would change the degree to which confidence tracked normative item difficulty in Experiment 2. Again, we computed the within-subjects correlation between item number and confidence for each participant. The relationship between item number (normative difficulty) and confidence remained about the same in Experiment 2 as in Experiment 1. The average correlation was −.35 (SD = .20) for the constructive matching group and −.40 (SD = .15) for the control condition, both of which were different from zero, t(27) = −9.12, p < .001, and t(28) = −13.81, p < .001, respectively. These correlations did not differ from one another (F < 1), nor did they differ significantly from the average correlation in Experiment 1 (F < 1). We believe this is due to the high reliability of the test. In essence, random ordering the problems removes the possibility that participants are using presentation order as a cue but does not substantially change the relationship between confidence and normative difficulty.

³ Performance on the RAPM was slightly lower in Experiment 2 than in Experiment 1. We believe that this is related to having administered the items in random order (see Carlstedt, Gustafsson, & Ullstadius, 2000). However, it is important to note that Experiment 2 did not require that the RAPM scores be comparable to those of the normative sample but rather that the two experimental groups' scores not differ significantly from one another.

⁴ Analyses were repeated with Az and yielded the same results, F(1, 55) = 7.01, p = .01, d = 0.76. Average Az was .86 (SD = .07) for the constructive matching group and .79 (SD = .11) for the control group. Both of these were significantly different from .50, t(27) = 27.52, p < .001, and t(28) = 14.18, p < .001, respectively. As in Experiment 1, Az and gamma were highly correlated for both the constructive matching condition, r(26) = .81, p < .001, and the control condition, r(27) = .92, p < .001. These correlations did not differ significantly (z = −1.43, p = .08).
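The gamma coefficient used here as the resolution measure is the Goodman–Kruskal gamma between item-by-item confidence and accuracy, computed over item pairs. A minimal Python sketch (our own illustrative code, not the authors' analysis script):

```python
def goodman_kruskal_gamma(confidence, correct):
    """Goodman-Kruskal gamma over all item pairs: (C - D) / (C + D), where C
    counts concordant pairs (the item with higher confidence is also the more
    accurate one) and D counts discordant pairs; tied pairs are ignored.
    Gamma of 1 indicates perfect resolution; 0 means confidence carries no
    information about which items were answered correctly."""
    concordant = discordant = 0
    n = len(confidence)
    for i in range(n):
        for j in range(i + 1, n):
            product = (confidence[i] - confidence[j]) * (correct[i] - correct[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # all pairs tied; gamma is undefined
    return (concordant - discordant) / (concordant + discordant)
```

A participant whose confidence perfectly orders correct above incorrect items scores 1.0; a perfectly inverted ordering scores −1.0.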
Table 2
Summary of Descriptive Statistics

Group                     RAPM score      Confidence       Gamma       Calibration error   Difference score
Experiment 1 (N = 50)     21.44 (5.57)    68.02 (12.38)    .70 (.19)   .20 (.10)           8.44 (16.07)
Experiment 2
  Strategy (N = 28)       19.00 (7.55)    57.50 (17.29)    .79 (.12)   .16 (.07)           4.43 (10.78)
  Control (N = 29)        18.93 (5.24)    62.10 (17.56)    .68 (.23)   .21 (.09)           9.52 (14.39)

Note. The table gives group averages for critical measures. Standard deviations are shown in parentheses. RAPM scores reflect the raw score out of 36 items. Confidence is the average confidence for all 36 items averaged across participants. Difference scores are the signed difference between average confidence and average accuracy, which was computed for each participant individually. RAPM = Raven's Advanced Progressive Matrices.
accuracy, r(26) = .39, p = .04, and absolute monitoring accuracy, r(26) = −.39, p = .04. Participants who performed better on the task also tended to have better monitoring accuracy across both measures. Although this suggests that high performers tend to monitor their performance more accurately, our data do not allow us to speculate as to whether the monitoring advantage for high performers emerged because they were better able to carry out the instructed strategy or whether this advantage was simply a product of higher performance overall.

Sources of improved monitoring: Differential cue availability and use.

Presence or absence of constructed option among response options. One cue for monitoring created by our strategy manipulation is the presence or absence of one's generated response among the available options. On problems in which the generated response option was found among the given options, participants using constructive matching should have entered their responses quickly (see Park & Choi, 2008) and been more confident. If they did not find their generated response, they should have been slower to enter a response (most likely because they were attempting to revise their response) and less confident in that response.

To test whether participants were making use of the additional cue of finding or not finding their generated response among the presented answers, we computed the within-subjects correlation between time to select a response (on the screen displaying the matrix and answers) and confidence for all participants. On average, participants in the constructive matching condition showed a significantly stronger relationship between latency to enter a response and confidence (mean correlation, r = −.38, SD = .18) than did participants in the control condition (mean correlation, r = −.08, SD = .19), F(1, 55) = 36.93, p < .001, d = 1.62. For both the constructive matching condition, t(27) = −10.97, p < .001, and the control condition, t(28) = −2.13, p = .04, the correlation between latency to produce a response and confidence was different than zero.

Within the control condition, there was a significant negative relationship among the strategy measure, proportion of time on the matrix, and the within-subjects correlation between time to enter a response and confidence, r(27) = −.54, p = .003. That is, those participants who spontaneously used the constructive matching strategy also showed a stronger negative relationship between time to select a response and confidence. This finding suggests that participants in the control condition who spontaneously used constructive matching, similar to participants in the constructive matching instructed condition, also likely compared their generated response to the presented response choices (see Figure 4).

Discussion

Our major goal in Experiment 2 was to test whether we could manipulate the likelihood of participants using constructive matching by having them draw a candidate answer before seeing the response options, which we predicted would produce a concomitant improvement in the accuracy of confidence judgments. Monitoring accuracy did improve in the constructive matching instructed condition, both in terms of higher monitoring resolution and lower calibration error scores. This advantage in monitoring accuracy occurred in the absence of performance differences between groups, suggesting that the observed differences can be primarily attributed to differential strategy use. We believe this finding is particularly important, because it is often difficult to disentangle what advantages of strategy use are due to the strategy itself and which are due to ability advantages associated with spontaneously selecting a given strategy.

We suggest that constructing an answer before looking at the response options produces an additional cue for confidence judgments, namely, the confidence-enhancing presence of one's constructed answer among the response options or the confidence-undermining absence of one's constructed answer among the response options. Within the constructive matching group, participants who took longer to select a response on the answer screen tended to be less confident, likely because they did not find their generated option and took time to revise their answer, and those who chose a response quickly were more confident. We found additional support for use of such a cue in the negative relation between time to select a response option and confidence for those spontaneously using constructive matching in the control condition, suggesting that the effects of our strategy manipulation on monitoring were qualitatively similar to those of an endogenously generated strategy.

General Discussion

The current experiments offer evidence that differences in task strategy on a logical reasoning task can causally affect the monitoring accuracy of confidence judgments, independent of overall task performance. Experiment 1 established a link between spontaneous task strategy, overall task performance, and monitoring
Figure 3. Calibration plot for the experimental group (top panel) and control group (bottom panel) in
Experiment 2. Labels are the number of observations per confidence bin.
accuracy. Although use of the constructive matching strategy was associated with slightly better performance, strategies were found to account for unique variance in absolute monitoring accuracy even when controlling for overall performance differences. These findings suggest that the superior monitoring accuracy of those performing well on tasks is not entirely due to differences in ability but may also be a result of the informational advantage associated with selection of more effective and adaptive strategies.

Experiment 2 demonstrated that differential strategy use can causally affect monitoring accuracy, independent of performance differences. Across several different measures of monitoring, participants instructed to use constructive matching consistently dem-
matrix. It follows that it is relatively easy to bolster one's confidence in a wrong answer by seeing that it fits with adjacent figures. Such a process might contribute to overconfidence in incorrect responses. In contrast, constructive matching may lead to more thorough consideration of the matrix that places more constraints on possible options and so results in less overconfidence in incorrect answers.

We predict that constructive matching is likely to improve calibration in many other tasks as well, relative to a strategy of working backward from candidate options. For example, the classic task in overconfidence research is answering general knowledge questions. People taking general knowledge tests vary in whether they attempt to answer a question before looking at response options or work backward from response options. McClain (1983) found that top students (i.e., students who earn "A" grades in their courses) taking a class exam first answered the question on their own and then considered all the options. Lower performing students (i.e., those who earn "C" grades) tended to go directly to the options and appeared to consider only one or two. We are currently testing whether requiring participants to answer a question before seeing the response options improves monitoring of general knowledge and to what extent the improvement is mediated by a fuller consideration of response options, particularly when one's generated response is disconfirmed.

Additional Cues Linked to Performance and Strategy Use

Although in the current experiments we have focused exclusively on one strategy-related cue, specifically, the presence or absence of one's constructed answer among response options, there are other cues that could be used during confidence monitoring in nonverbal reasoning tasks similar to those of the RAPM. Many of these cues are likely related both to performance on the task and, to a lesser extent, to task strategy. For example, Schiano et al. (1989) found that, when asked to sort items into groups of similar items, high scorers grouped items on the basis of abstract transformational relations (e.g., simple rotations vs. multiple transformations such as rotation and reflection) and low scorers sorted problems into categories based on perceptual similarities or shared figural characteristics (e.g., "diamondlike" shapes and figures were sorted together). Several studies have linked perceptual features and overall complexity of matrix and geometric analogy items to normative difficulty (Meo, Roberts, & Marucci, 2007; Primi, 2001), and, at least in the case of memory monitoring judgments, perceptual features have been shown to affect the magnitude of

behaviors ranging from occupational and educational performance to health outcomes (Hunter & Schmidt, 1996; Neisser et al., 1996). Monitoring is not part of the psychometric use of the RAPM, yet we speculate that it could add important information to an understanding of individual differences in reasoning. For example, the constructive matching manipulation used in Experiment 2 did not change people's scores, but it did improve the extent to which they knew what they knew. The confidence people have in their reasoning and judgment determines their reliance on the "answer" for action and the reliance others may place on their recommendations. Monitoring on the RAPM may predict additional variance in real-world performance indicators, beyond that predicted by overall score. Koren et al. (2004) demonstrated that monitoring accuracy on an executive functioning task, the Wisconsin Card Sorting Task, was a better predictor of poor insight (which is associated with treatment outcomes) in patients with schizophrenia than the conventional scores themselves.

It is theoretically interesting that our results reveal individual differences in monitoring (for a related discussion, see Hertzog & Dunlosky, 2004). Although there may not be a general monitoring ability (e.g., Kelemen, Frost, & Weaver, 2000), one avenue for future investigation is determining whether stable individual differences in strategic behavior could produce accompanying individual differences in monitoring accuracy (Cokely et al., 2006).

Taking a more predictive approach to tasks benefits monitoring and is a common metastrategy favored by high performers across domains (Baron, 1985; Cokely & Kelley, 2009; Dunlosky, Rawson, & Middleton, 2005; Ericsson & Charness, 1994; Hertzog & Robinson, 2005; Kossowska & Nęcka, 1994; McClain, 1983; Sternberg, 1998). The two strategies examined in the current work, constructive matching and response elimination, could be classified as examples of a predictive strategy versus a more confirmatory strategy (in which one works backward). In general, one would expect that predictive strategies would yield more information that could be useful in monitoring one's performance. However, the current study was one of the first to examine the extent to which these different strategic approaches influence monitoring accuracy. More generally, predictive strategies similar to constructive matching may also be useful because they could be taught to individuals with poor monitoring skills (Sternberg, 1999). Our results demonstrate that individuals can reap monitoring benefits from this kind of strategy even in the face of significant individual differences in performance.

Related work in social psychology finds that poor performers are often "unskilled and unaware." That is, low performance tends to beget poor monitoring because low performers lack insight into
predictive judgments (Rhodes & Castel, 2008). If high and low their own errors (Dunning, Johnson, Ehrlinger, & Kruger, 2003;
performers differ in their mental representations of items, this Ehrlinger, 2008; Ehrlinger, Johnson, Banner, Dunning, & Kruger,
could also affect how perceptual and item features are used as cues 2008; Kruger & Dunning, 1999; Maki, Shields, Wheeler, & Zac-
for confidence monitoring, with high and low performers using chilli, 2005). Although Krueger and Mueller (2002) showed that in
different criterion to classify an item as “easy” or “difficult.” some cases the unskilled and unaware effect is a statistical artifact
due to regression to the mean, this is more likely to be the case
Intelligence, Strategies, and Superior Performance when tests are unreliable and so is less likely to be true when
highly reliable tests, such as the RAPM, are used to assess per-
We chose to study monitoring in the context of an intelligence formance. More recently, Krajc and Ortmann (2008) proposed
test that is widely considered the best measure of general fluid another alternative explanation for the unskilled and unaware
intelligence (Marshalek, Lohman, & Snow, 1983; Snow, Kyl- effect. They pointed out, as part of their explanation, that estimat-
lonen, & Marshalek, 1984). Performance on Raven’s Matrices, and ing relative standing on tasks is more difficult for low than for high
on intelligence tests in general, predicts a number of real-world performers because the former have fewer diagnostic cues avail-
STRATEGIES, MONITORING, AND CONTROL 709
able as a basis for estimates. Similarly, but at lower level of fail to recognize their own incompetence. Current Directions in Psy-
analysis, our work suggests that, at the item level, strategies used chological Science, 12, 83– 87.
by poor performers may limit the cues available for monitoring Egan, D. E., & Grimes-Farrow, D. D. (1982). Differences in mental
and so may contribute to the unskilled and unaware phenomenon. representations spontaneously adopted for reasoning. Memory & Cog-
Instructing participants in the use of more constructive strategies nition, 10, 297–307.
that produce self-correcting feedback may be one avenue for Ehrlinger, J. (2008). Skill level, self-views, and self-theories as sources of
error in self-assessment. Social and Personality Psychology Compass, 2,
reducing potential “unskilled and unaware” type effects.
382–398.
Ehrlinger, J., Johnson, K., Banner, M., Dunning, D., & Kruger, J. (2008).
Conclusion Why the unskilled are unaware: Further explorations of (absent) self-
insight among the incompetent. Organizational Behavior and Human
It is well established that the selection of appropriate task Decision Processes, 105, 98 –121.
strategies is crucial for effective task performance (Gigerenzer et Ericsson, K. A., & Charness, N. (1994). Expert performance: Its structure
al., 1999; Simon, 1990; Snow, 1980; Sternberg, 1977). In the and acquisition. American Psychologist, 49, 725–747.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
current experiments, we extended these findings by demonstrating Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple
This document is copyrighted by the American Psychological Association or one of its allied publishers.
that task strategies also influence monitoring accuracy. Monitoring heuristics that make us smart. New York, NY: Oxford University Press.
accuracy can be improved by using a prediction-based strategy, Hertzog, C., & Dunlosky, J. (2004). Aging, metacognition, and cognitive
such as constructive matching, relative to confirmatory strategies control. In B. H. Ross (Ed.), Psychology of learning and motivation (pp.
that involve working backward, as in the case of response elimi- 215–251). San Diego: CA: Academic Press.
Hertzog, C., & Robinson, A. E. (2005). Metacognition and intelligence. In
nation. Our results demonstrate that strategies can affect monitor-
O. Wilhelm & R. W. Engle (Eds.), Understanding and measuring
ing by creating the opportunity for self-generated feedback about
intelligence (pp. 101–123). London, England: Sage.
ongoing performance. Moreover, our results shed light on some of Higham, P. A., & Arnold, M. M. (2007). How many questions should I
the influential strategic sources of superior cognitive regulation. answer? Using bias profiles to estimate optimal bias and maximum score
on formula-scored tests. European Journal of Cognitive Psychology, 19,
References 718 –742.
Hunter, J. E., & Schmidt, F. L. (1996). Intelligence and job performance:
Baron, J. (1978). Intelligence and general strategies. In G. Underwood Economic and social implications. Psychology Public Policy and Law, 2,
(Ed.), Strategies of information processing (pp. 403– 450). London, 447– 472.
England: Academic Press. Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Im-
Baron, J. (1985). Rationality and intelligence. New York, NY: Cambridge proving fluid intelligence with training on working memory. Proceed-
University Press. ings of the National Academy of Sciences, USA, 105, 6829 – 6833.
Bethell-Fox, C. E., Lohman, D. F., & Snow, R. E. (1984). Adaptive Kelemen, W. L., Frost, P. J., & Weaver, C. A. (2000). Individual differ-
reasoning: Componential and eye-movement analysis of geometric anal- ences in metacognition: Evidence against a general metacognitive abil-
ogy performance. Intelligence, 8, 205–238. ity. Memory & Cognition, 28, 92–107.
Bors, D. A., & Stokes, T. L. (1998). Raven’s Advanced Progressive Kelley, C. M., & Jacoby, L. L. (1996). Adult egocentrism: Subjective
Matrices: Norms for first-year university students and the development experience versus analytic bases for judgment. Journal of Memory and
of a short form. Educational and Psychological Measurement, 58, 382– Language, 35, 157–175.
398. Koren, D., Seidman, L. J., Poyurovsky, M., Goldsmith, M., Viksman, P.,
Bryan, J., Luszcz, M. A., & Pointer, S. (1999). Executive function and Zichel, S., & Klein, E. (2004). The neuropsychological basis of insight
processing resources as predictors of adult age differences in the imple- in first-episode schizophrenia: A pilot metacognitive study. Schizophre-
mentation of encoding strategies. Aging, Neuropsychology, and Cogni- nia Research, 70, 195–202.
tion, 6, 273–287.
Koriat, A. (1993). How do we know that we know? The accessibility model
Budescu, D. V., Wallsten, T. S., & Au, W. T. (1997). On the importance
of feeling of knowing. Psychological Review, 100, 609 – 639.
of random error in the study of probability judgment: Part II. Applying
Koriat, A. (1997). Monitoring one’s own knowledge during study: A
the stochastic judgment model to detect systematic trends. Journal of
cue-utilization approach to judgments of learning. Journal of Experi-
Behavioral Decision Making, 10, 173–188.
mental Psychology: General, 126, 349 –370.
Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L., III. (2006).
Koriat, A., Ma’ayan, H., & Nussinson, R. (2006). The intricate relation-
When additional multiple-choice lures aid versus hinder memory. Ap-
ships between monitoring and control in metacognition: Lessons for the
plied Cognitive Psychology, 20, 941–956.
Carlstedt, B., Gustafsson, J., & Ullstadius, E. (2000). Item sequencing cause-and-effect relation between subjective experience and behavior.
effects on the measurement of fluid intelligence. Intelligence, 28, 145– Journal of Experimental Psychology: General, 135, 36 – 69.
160. Kossowska, M., & Ne˛cka, E. (1994). Do it your own way: Cognitive
Cokely, E. T., & Kelley, C. M. (2009). Cognitive abilities and superior strategies, intelligence, and personality. Personality and Individual Dif-
decision making under risk: A protocol analysis and process model ferences, 16, 33– 46.
evaluation. Judgment and Decision Making, 4, 20 –33. Krajc, M., & Ortmann, A. (2008). Are the unskilled really that unaware?
Cokely, E. T., Kelley, C. M., & Gilchrist, A. H. (2006). Sources of An alternative explanation. Journal of Economic Psychology, 29, 724 –
individual differences in working memory: Contributions of strategy to 738.
capacity. Psychonomic Bulletin & Review, 13, 991–997. Krueger, J., & Mueller, R. A. (2002). Unskilled, unaware, or both? The
Dunlosky, J., Rawson, K. A., & Middleton, E. L. (2005). What constrains better-than-average heuristic and statistical regression predict errors in
the accuracy of metacomprehension judgments? Testing the transfer- estimates of own performance. Journal of Personality and Social Psy-
appropriate-monitoring and accessibility hypothesis. Journal of Memory chology, 82, 180 –188.
and Language, 52, 551–565. Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How
Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people difficulties in recognizing one’s own incompetence lead to inflated
710 MITCHUM AND KELLEY
self-assessments. Journal of Personality and Social Psychology, 77, by perceptual information: Evidence for metacognitive illusions. Jour-
1121–1134. nal of Experimental Psychology: General, 137, 615– 625.
Kyllonen, P. C., Lohman, D. F., & Woltz, D. J. (1984). Componential Schiano, D. J., Cooper, L. A., Glaser, R., & Zhang, H. C. (1989). Highs are
modeling of alternative strategies for performing spatial tasks. Journal of to lows as experts are to novices: Individual differences in the represen-
Educational Psychology, 76, 1325–1345. tation and solution of standardized figural analogies. Human Perfor-
Maki, R. H., Shields, M., Wheeler, A. E., & Zacchilli, T. L. (2005). mance, 2, 225–248.
Individual differences in absolute and relative metacomprehension ac- Schwartz, B. L., Benjamin, A. S., & Bjork, R. A. (1997). The inferential
curacy. Journal of Educational Psychology, 97, 723–731. and experiential bases of metamemory. Current Directions in Psycho-
Marshalek, B., Lohman, D. F., & Snow, R. E. (1983). The complexity logical Science, 6, 132–137.
continuum in the radex and hierarchical models of intelligence. Intelli- Sieck, W. R., Merkle, E. C., & Van Zandt, T. (2007). Option fixation: A
gence, 7, 107–127. cognitive contributor to overconfidence. Organizational Behavior and
Masson, M. E. J., & Rotello, C. M. (2009). Sources of bias in the Human Decision Processes, 103, 68 – 83.
Goodman–Kruskal gamma coefficient measure of association: Implica- Sieck, W. R., & Yates, J. F. (2001). Overconfidence effects in category
tions for studies of metacognitive processes. Journal of Experimental learning: A comparison of connectionist and exemplar memory models.
Journal of Experimental Psychology: Learning, Memory, and Cogni-
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.