Educational Psychology

Vol. 28, No. 5, August 2008, 483–503

Combined online and in-class pretesting improves exam performance in general psychology
Arnold Lewis Glass*, Gary Brill and Margaret Ingate

Psychology Department, Rutgers University, Piscataway, NJ, USA
*Corresponding author. Email: aglass@rutgers.edu


(Received 2 August 2007; final version received 29 October 2007)
This study examined the effect of distributed questioning on learning and retention in a college
lecture course. A total of 48 question pairs were presented over four exams. The 16 question
pairs associated with each of the three blocks of the course appeared on the block exams, and
all 48 appeared on the final exam. The two questions in each pair were related to each other,
so that knowing the answer to one question usually implied knowing the answer to the other.
One question in each pair was included in an experimental condition, in which questions were
presented online, in class, or both online and in class, before appearing in exams. These
conditions were counter-balanced across the sample. The control questions appeared only in
exams. Providing a question online in advance of class, as well as in class, had a significant
long-term effect on the probability of knowing the answers to both experimental and control
questions when they appeared in exams. These results demonstrate that coordinated online and
in-class instruction can significantly improve exam performance. These results are consistent
with the hypothesis that distributed instruction creates more robust memory traces, rather than
the hypothesis that it creates additional memory traces.
Keywords: distributed study; testing effect

The study of learning has revealed several effects of theoretical importance to the understanding
of learning and of potential practical importance to pedagogy. Two of these effects were the focus
of this study. The first is the finding that active study is more likely to produce long-term retention
than passive study. The second is the finding that distributed repetition increases the duration of
the interval over which study material is retained.

Distributed testing
Any task that requires a response to the study material beyond mere repetition constitutes active
study. For example, listening to or reading study material is passive study. However, answering
a question about it or recalling the study material, after even a short delay, is active study.
Much experimental research in cognition has demonstrated the superiority of active learning.
For example, when subjects made up their own linking sentences for the words in a study pair,
they performed twice as well on a subsequent cued recall test as did subjects whose linking
sentences were composed by someone else (Bobrow & Bower, 1969). Similarly, Glass, Krejci,
and Goldman (1989) found that recognition of nine-digit strings an hour after study was more
than twice as high among subjects who had verbally repeated the strings after presentation than
among subjects who passively observed the strings. The general phenomenon of improved reten-
tion as a result of actively producing information is known as the ‘generation effect’ (Slamecka
& Graf, 1978), and has become one of the most robust findings emerging from the laboratory
study of learning (deWinstanley & Bjork, 2002). Importantly, Foos, Mora, and Tkacz (1994)
showed that the generation effect can be found in natural educational settings for material specif-
ically targeted by student-generated outlines and questions.
The specific active study task investigated here is answering a question about the study mate-
rial, which creates what is called the testing effect, as first reported by Gates (1917). Subsequently,
a facilitating effect of a recall test on recognition was reported by Zangwill (1939). The second
focus of this study was the effect of distributed repetition on retention – that is, whether spaced
study trials would produce longer retention than massed trials, as first reported by Ebbinghaus
(1885/1964) at the dawn of experimental psychology.
Despite the venerable history of these effects, the cognitive processes that produce them are
not understood, and their benefit to pedagogy is speculative. Research into these effects has
involved brief retention intervals, generally within the confines of a single laboratory session, and
never greater than a week. Also, most research has involved relatively meaningless tasks such as
the recall of a word list. These research tasks provide only a limited amount of information about
the phenomena. Clearly, research with longer retention intervals and more meaningful materials
is needed. These observations have been made before. In the most recent extensive review of the
effect of distributed study on verbal tasks, Cepeda, Pashler, Vul, Wixted, and Rohrer (2006) stated,
‘As noted above, new studies are sorely needed to clarify the effects of interstudy and retention
intervals that are educationally relevant, that is, in the order of weeks, months, or years’ (p. 370).
In a recent research report on the test effect, Roediger and Marsh (2005) stated:
Our experiment and the few prior ones similar to it (e.g., Brown, Schilling, & Hockensmith, 1999;
Toppino & Luipersbeck, 1993) are just the beginnings of research that will determine effects of
testing under conditions that are somewhat realistic for educational concerns. (p. 1158)
Until now, such research has been expensive and extremely time-consuming. The study of the
retention of meaningful information over an extended period necessarily requires recruiting
subjects willing to devote the time necessary to study the material and to be repeatedly tested,
over an interval of weeks or even months.
However, the introduction of new technologies has introduced a methodology for studying
the effect of repeated testing of meaningful study materials over educationally typical retention
intervals. These technologies include course management systems that make it possible to admin-
ister quizzes to students online, and personal response systems that make it possible to collect
responses from students in class. This study made use of these technologies to embed a counter-
balanced, within-subject and within-item, experiment within a two-section general psychology
course in order to investigate the effect of repeated testing on the retention of general psychology
knowledge under conditions that have immediate implications for pedagogy, since results were
obtained in an actual course context.
Discussion of the new methodology used in this study is divided into several parts. First, the
new online and in-class technologies are described and the limited research literature on them is
reviewed. Second, the research relevant to this study and the purpose of the study are described.
Finally, the advantages and possible disadvantages of this methodology are described.

Online and clicker technologies


The introduction of classroom response systems, also known as personal response systems or
‘clickers’, has made it possible to study learning, retention, and retrieval experimentally with more
natural materials and in more natural settings. A clicker is a response pad about the size of a cell
phone, with a similar keypad. When the instructor presents or asks a question,
students respond by pressing the appropriate key or keys on their clickers. The responses are
collected by a receiver attached to the instructor’s laptop and immediately analysed, and may be
immediately displayed to the students, with the correct answer, through a projector attached to
the laptop. Software allows response collection and presentation to be integrated with PowerPoint
presentations. The student responses are saved in a file and may be graded. Although clickers are
now in widespread use, published research typically focuses on students’ reactions and lecture
attendance (Elliott, 2003). Relatively few studies report effects on student learning. Those that do
are correlational and lack rigorous experimental design (e.g., Kennedy & Cutts, 2005).
Similarly, online course management systems such as webCT or Blackboard are widely used
to supplement classroom instruction. Sometimes the system is used by the instructor to present
his/her own materials and sometimes it is used to present materials supplied by a publisher and
linked to a specific textbook. However, the impact of the use of such a system on student learning
is seldom evaluated. Students are able to take quizzes online and receive immediate feedback.
Their responses are collected in a file and may be graded. Romanov and Nevgi (2006) report a
recent experimental design comparing the final course grades of medical students who had been
supported for part of their course either by an interactive webCT site or by a passive website.
Students in the webCT condition obtained significantly higher grades, although the effect was
modest.

The effect of distributed repeated study on retention


As mentioned previously, the testing of recently studied material, the task used here, has been
shown to enhance memory. In two experiments, Roediger and Karpicke (2006) had students study
prose passages, and then either take one or three immediate free-recall tests (without feedback),
or restudy the material the same number of additional times as the other students were tested.
When compared with additional study, the immediate free-recall tests at the time of study
produced better free-recall performance on retention tests two days and one week later. Similarly,
Roediger and Marsh (2005) found that answering multiple-choice questions enhanced free-recall
performance. Furthermore, Karpicke and Roediger (2007) compared a study condition in which
subjects repeatedly recalled an entire word list until they recalled the entire list correctly with a
condition in which the subjects were only asked to recall those words that they failed to recall on
the previous trial. A week later, the probability of final recall was a function of the number of
times a word had been recalled during the study phase.
The purpose of the current study was, first, to determine the magnitude of the effect of
repeated questioning in an actual course; and second, to investigate the reason for the effect. This
study was a foundational attempt to scale up a laboratory-bound experimental design to the
context of a 16-week-long college lecture course.
One difference between experimental studies of learning and academic education is that
experimental studies often make use of simple learning materials such as word lists and verbatim
recognition or recall (e.g., Karpicke & Roediger, 2007; McDaniel & Masson, 1985), while
academic education focuses on comprehension and memory for gist. To make the results of this
study directly comparable to experimental studies, and also to address the goals of education, both
verbatim and gist measures of learning were used.
A total of 48 pairs of multiple-choice questions (each with four choices of answer) were
presented over four exams (16 pairs on each block exam and all 48 on the final exam). The two ques-
tions in each pair were related to each other; knowing the answer to one question usually implied
knowing the answer to the other. One question in each pair was included in the experimental
condition. The other question was used in the control condition. Questions in the experimental
condition may have appeared earlier online and/or in class, and hence were repeated questions
that provided a measure of gist retention. The questions in the control condition had not appeared
before the exams, and so provided a measure of gist retention since each question was similar to
a repeated experimental question.
Another issue addressed was the effect of a single quiz versus a quiz repeated on two successive
days. At the outset of the study, it was an open question (at least in the minds of the investigators)
whether a single quiz would be sufficient to produce an enduring effect on exam performance.
There is some evidence that once a study item has elicited a correct response, additional study
trials are unnecessary (Bahrick, 1979; Buschke, 1973; Rohrer, Taylor, Pashler, Wixted, & Cepeda,
2005; Thompson, Wenger, & Bartling, 1978). On the other hand, if a single correct response
proved insufficient for long-term retention, the extensive experimental literature on the effect of
distributed study indicated that two presentations of the quiz on successive days would be suffi-
cient to cause a long-term effect (Cepeda et al., 2006).
Perhaps the best documented and least applied effect on learning is the effect of distributed
study on retention. As mentioned above, repeating a study trial produces longer retention than a
single study trial, and the retention interval is proportional to the interval between study trials
(Cepeda et al., 2006). Hence, repeating a quiz on successive days should have produced longer
retention than presenting the quiz only once. Of course, students are free to engage in spaced
study themselves throughout the course. If students did this strategically, they would render redun-
dant any experimental manipulations of repetition within the course – however, the extent to
which students spontaneously engage in spaced study outside specific course assignments is an
open question. If many students did not spontaneously practice spaced study, then including
repeated quizzes within the framework of course assignments should have increased retention.

Two explanatory hypotheses of the effect of repeated testing on retention


The relative effectiveness of one and two previous quizzes on subsequent retrieval should
discriminate between two quite different explanations of the effect of an immediate question on
subsequent performance.
One possibility is that testing creates a quantitatively more detailed memory trace than mere
study. According to this hypothesis, the self-cueing and evaluation process involved in producing
a response to a test question results in the formation of associations between the target answer and
related information. McDaniel and colleagues (McDaniel & Fisher, 1991; McDaniel, Kowitz, &
Dunay, 1989; McDaniel & Masson, 1985) have argued that testing enhances learning by produc-
ing elaboration of existing memory traces and their cue–target relationships, and Bjork (1988) has
suggested that testing operates by multiplying the number of ‘retrieval routes’ to stored events.
These additional associations increase the probability of retrieval on subsequent tests. According
to the quantitative hypothesis, since the act of generating the correct answer creates more associ-
ations than mere study, at all retention intervals the probability of retrieving the correct response
should be greater following active testing than following mere study. Furthermore, only a single
immediate quiz should be required to obtain this effect, although two quizzes might produce a
larger effect than one quiz.
A second possibility is that testing creates a qualitatively more robust memory trace than study
does. That is, the trace remains available for retrieval over a longer retention interval. According
to this hypothesis, most daily experience is guided by a fragile, short-term representation of the
immediate context. The part of the contextual representation that is relevant to a participant’s
actions may be consolidated in a robust, long-term episodic representation of the action and its
effect. Thus, study produces a fragile, short-term representation that soon becomes unavailable
for retrieval. However, questioning requires the voluntary action of producing a response, which
selectively creates a more robust representation of the question and the response. Furthermore,
and most important, repeating a question on successive days signals to the memory system that
the question is a routine rather than idiosyncratic event, and that it may occur again in the future.
Consequently, a much more robust and longer-term representation of the answer is constructed.
Unlike the quantitative hypothesis, the qualitative hypothesis would not predict a difference
in immediate retrieval between study and questioning; this difference would be evident only after
a delay, when the fragile, short-term representation produced by study alone is no longer avail-
able. Also, unlike the quantitative hypothesis, the qualitative hypothesis does not predict cumu-
lative effects of previous quizzes on all retention intervals. That is, it does not predict that two
previous quizzes will produce better performance than one previous quiz, which in turn will
produce better performance than no previous quiz, at all retention intervals. Rather, it predicts that
on immediate test there may be no effect of the number of previous quizzes, at a longer retention
interval one or two quizzes will produce better retention than no quiz, and at a still longer reten-
tion interval two quizzes will produce better retention than one or no quiz.
Glass et al. (1989) observed no difference in immediate recognition between active study (i.e.,
immediate recall of a digit string) and passive study (i.e., shadowing a digit string); however, they
found that only an actively studied digit string was recognised after a filled delay of 30 minutes.
Similarly, Roediger and Karpicke (2006) found that their active learning intervention (repeated
testing, as contrasted with repeated study) resulted in better retention over both a two-day and a
one-week period, but not after a five-minute interval. The lack of an immediate effect is consis-
tent with the qualitative hypothesis.
In order to evaluate the effect of repeated questioning on retention, half of the 48 questions in
the experimental condition were presented in class. Also, before class, half of the experimental
questions presented in class and half of the experimental questions not presented were presented
online. Students from a two-section general psychology course participated in the study. Each
student saw 12 questions in each of four conditions: both online and in class, online only, in class
only, and neither. Students in each section saw different questions in each of these conditions:
questions that were presented both online and in class to one section were presented neither online
nor in class to the other section. Questions that were presented only online to one section were
presented in class to the other section.
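
To make the counterbalancing concrete, the following minimal Python sketch (not part of the original study) assigns the 48 experimental questions to the four pretest conditions; the condition labels and the particular rotation are illustrative assumptions, chosen only so that each section sees 12 questions per condition, four per condition within each block of 16 questions, and the two sections receive complementary conditions.

# Illustrative counterbalancing of the 48 experimental questions across the four
# pretest conditions; the actual question-to-condition assignment is not specified here.
from collections import Counter

conditions = ["both", "online_only", "in_class_only", "neither"]
complement = {"both": "neither", "neither": "both",
              "online_only": "in_class_only", "in_class_only": "online_only"}

design = {}  # question number -> pretest condition for each section
for q in range(1, 49):
    c = conditions[(q - 1) % 4]   # rotating over questions gives four per condition per block of 16
    design[q] = {"section_1": c, "section_2": complement[c]}

print(Counter(d["section_1"] for d in design.values()))   # 12 questions in each condition
print(Counter(d["section_2"] for d in design.values()))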

Notes on a new methodology


There are both advantages and disadvantages to embedding an experimental study within an
actual course, as compared with performing it in the laboratory.
There are three obvious advantages. First, significant results with useful effect sizes have face
validity for similar educational contexts. Second, the results of studies of learning are obviously
dependent on the degree of attention to the study materials. While there is no reason to believe
that subjects deliberately sabotage laboratory studies through inattention, the classroom context
is the ideal situation in which to study learning under intentional conditions. There is no reason
to believe that students who are consistently inattentive in class would produce more diagnostic
data regarding their learning ability in the laboratory. Third, an entire class is a larger and more
representative sample than the subset that may volunteer for a laboratory experiment for cash or
credit. In this study, over 800 students, nearly all of those who participated in the courses,
contributed.
On the other hand, the disadvantage of embedding an experiment in a class is that there is
much less control over what students are doing when they are not participating in the experiment.
Obviously, any study of the experimental items outside of the scope of the experiment may influ-
ence its results. There is no way that this control problem can be eliminated. However, we do
believe that it is manageable. It is often possible to use a design in which the most likely effect of
extra-experimental activity is to reduce the likelihood of a significant result. That was the case in
this experiment, where additional study by the students should have reduced the advantage in
long-term retention caused by two pretests. Furthermore, it may be possible to obtain measures
of extra-experimental activity, such as self-report measures of studying, and incorporate them
within an analysis of covariance design. Therefore, we believe that it would be premature to
exclude this study from the experimental literature on the basis of its novel methodology. Rather,
this pilot study of a promising new methodology should be judged by the informativeness and
reliability of the results that it produces.

Method
The experimental design was embedded in two two-section general psychology courses offered
at a north-eastern US university. The experiment was performed twice, in the fall and again in the
spring semester. Thus, students from four sections participated across the two semesters.

Participants
A total of 1020 students were enrolled in the course, 679 in the fall and 341 in the spring. Of these
students, 851 completed the four exams, including the final exam: 542 in the fall and 309 in the
spring. Students who took make-up exams were not included in the analysis. Demographic data
were not collected in the fall. A total of 309 students answered demographic questions on the final
in the spring. There were 132 males and 177 females. The students also self-identified
as 26 African-Americans, 112 Asian-Americans, 107 Caucasians, 31 Latinos, and 33 other. The
distribution of gender and race in the fall was probably the same.

Procedure
Semesters were 16 weeks long; there were 14 weeks of instruction followed by a two-week finals
period. Each lecture group met twice a week for 80-minute lecture periods (so there were 28 lecture
periods). The course was divided into three four-and-a-half-week blocks consisting of eight to nine
classes and a block exam, which was administered during a lecture period.
Kalat (2005) was the textbook for the course. A reading in Kalat (2005) was assigned for each
lecture, and an online quiz on the reading that was part of the ThomsonNOW online support
system for the text was available online before most classes, starting a few days before the class
and ending at the start of class. Students were instructed to do the assigned reading and then take
the quiz. The online quiz consisted of between 10 and 25 questions, presented sequentially. Once
they started the quiz, the students had to proceed through the questions and submit the quiz online
for grading. Students were aware that online ThomsonNOW scores would form 20% of their final
grade.
Each lecture was accompanied by a PowerPoint presentation of the main points being made.
Approximately 10 questions were asked throughout a class. Students responded with personal
response devices (clickers) using the TurningPoint system. Immediately after a topic containing
a target point was presented in class, both verbally and on a PowerPoint slide, a question would
be asked to assess how many students in the class understood the point being made. This meant
that the answer had appeared on one of the previous three slides, usually the immediately previous
one. Immediately after the class responded, the correct answer was shown and explained.
Students were aware that clicker scores recorded in class would comprise 30% of their final
grade.
All online, in class, block exam, and final exam questions were multiple-choice questions with
four alternative answers each. There were 60 questions on each block exam and 120 questions on
the final.

Table 1. General psychology target questions by course section.

Question number    Online Section 1    Clicker Section 1    Online Section 2    Clicker Section 2

[Rows for Questions 1–48: each row marks two of the four cells, indicating whether that question was pretested online and/or in class (clicker) for Section 1 and Section 2. The assignment was counterbalanced so that a question pretested both online and in class for one section received no pretest in the other section, and a question pretested only online in one section was pretested only in class in the other.]

The design of the experiment is shown in Table 1. The table shows which lectures and
sections had ThomsonNOW questions appearing in an online quiz before class and/or as a clicker
question in class. The similar questions appeared only on the block exams and the final exam.

Experimental materials
A total of 48 pairs of questions were used to test the effect of pretesting a question on exam perfor-
mance. One question in each pair was selected from the ThomsonNOW test bank for Kalat (2005)
and the other question was selected to be similar to it. For 36 of the pairs, a single proposition
logically entailed the answer to both members of the pair. For example, the following question
was selected from ThomsonNOW:
Which of the following is NOT part of the individual neuron?
a) synapse b) nucleus c) terminal button d) dendrite
The following similar question was constructed:
A synapse is
a) an area of dead tissue in the brain; b) the combination of a neuron and its nearest glia cell; c) a
junction where one neuron communicates with another; d) an immoral lapse.
Both questions are verified by the proposition ‘A synapse is a junction where one neuron commu-
nicates with the other.’
In the other 12 pairs, the similar question tested a related fact about the subject of the
ThomsonNOW question. For example, the following question was selected from ThomsonNOW:
Which measure of brain activity records the radioactivity put out by various brain areas after radio-
active chemicals are injected into the blood?
a) electroencephalograph; b) functional magnetic resonance imaging;
c) magnetoencephalograph; d) positron emission tomography
The following similar question was constructed:
Compared to the electroencephalograph (EEG), positron-emission tomography (PET) provides:
a) information on a millisecond-by-millisecond basis; b) more precise information about where the
brain is active; c) information about specific thoughts that a person is having; d) the best technique
for measuring people’s reactions to lights and sounds.
The 48 pairs of questions comprised four pairs each from 12 different chapters from Kalat (2005).
Each of these 12 sets of question pairs was associated with a different one of 12 lectures in the
course (i.e., four in each of the three blocks of the course). The 16 question pairs associated with
each block of the course appeared on that block exam. All 48 pairs of questions appeared on the
final exam.

Results
A level of p = .05 was adopted as the criterion for considering a result significant. Recall that the
experimental design was embedded within the course. Thus, in order for a student to contribute
to all cells of the experimental design, the student had to participate in: the in-class pretests in the
12 lectures constituting the design; the online pretests prior to those 12 lectures; and the three
block exams and the final. A total of 377 students met these criteria. Figure 1 shows the results
for these students on the pretests and exams.

Figure 1. For students who participated in all conditions in all blocks, the probability of a correct response to a repeated question for questions presented online and in class (both), online, in class, and only on the two exams (neither) (left panel); and the probability of a correct response to questions that were similar to the repeated questions (right panel).

A 6 × 2 × 3 analysis of variance was performed on the questions that were repeated four times
during the course, and on the similar question that appeared on the block exam and final. This is
the line labelled ‘Both’ in Figure 1. Recall that the students participated in online and in-class
pretests for exam items in all three blocks of the course, as shown in Table 1. The factors in the
analysis were question (six levels, as shown in Figure 1), section (two levels), and block (three
levels). As shown in Table 1, in each block of the course each section saw four exam questions
both online and in class and the other section saw the same questions only on the block exam and
final. Since the same design was used in the fall and spring semesters, the sections that received
the same items in the fall and spring were treated as the same section in the analysis. There were
240 students in one section and 137 students in the other section.
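
As a rough illustration of how this analysis could be set up, the sketch below (Python; not the authors' analysis code) builds a long-format table of simulated per-student proportions correct for the 6 question × 3 block cells and runs a within-subject analysis of variance with statsmodels. The cell means, seed, and variable names are placeholder assumptions, and the between-subject factor (section) is omitted because AnovaRM handles only within-subject factors.

# Sketch of the question (6 levels) x block (3 levels) repeated-measures ANOVA.
# Data are simulated placeholders; section (between-subject) is omitted.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
questions = ["pretest1", "pretest2", "exam_repeated", "final_repeated",
             "exam_similar", "final_similar"]
blocks = ["block1", "block2", "block3"]
rows = []
for student in range(377):                       # 377 students met the inclusion criteria
    for q in questions:
        for b in blocks:
            base = 0.93 if q.startswith("pretest") else 0.87   # placeholder cell means
            rows.append({"student": student, "question": q, "block": b,
                         "p_correct": float(np.clip(rng.normal(base, 0.05), 0.0, 1.0))})
df = pd.DataFrame(rows)
res = AnovaRM(df, depvar="p_correct", subject="student",
              within=["question", "block"]).fit()
print(res.anova_table)   # F values for question, block, and their interaction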

Quiz versus exam performance


As can be seen from Figure 1, the effect of question was significant (F[5,1875] = 39.1). Post-hoc
comparisons confirmed that this was because performance on the pretests (M =.93, SE =.005)
was better than performance on the repeated and similar items on the block exams and final exam
(M =.87, SE =.007, t[1875] = 8.0).

Robustness of effects across lecture sections, blocks, and student sub-groups


First, consider the effect of section. The proportion of answers correct for the two sections was
.888 (SE =.005) and .894 (SE =.006), respectively (F[1,375] =.68). That is, there was no effect of
section. Since there was no effect of section, and section was not a factor of theoretical interest,
we followed the advice of Cohen and Cohen (1983, p. 347) and did not consider any interactions
between section and the other factors. In all subsequent analyses, section was always included as
a factor and never had an effect. These data will not be reported.
The effect of block was significant (F[2,375] = 68.7), as was the interaction between block
and question (F[10,3750] = 20.7). Figure 2 shows the block × question interaction. With the
exception of one data point (online Block 1), each of the three blocks replicates the pattern for the
combined data shown in Figure 1. So, despite the fact that both the main effect of block and its
interaction with question were significant, neither provided any information about the effect of
repeated questioning on exam performance. Rather, the interaction apparently resulted from some
minor variations in the magnitude of the effect across blocks that were most likely item-specific
differences since, of course, different items were presented in different blocks. In subsequent
analyses, the main effect of block, which was always significant, will be reported. However, no

block interactions ever provided information beyond the main effects. These uninformative interactions will not be reported.

Figure 2. The effect of block on the probability of a correct response to questions that appeared on both pretests.

For the spring semester, an analysis of variance was performed that was identical to that
reported above except that gender was substituted for section. There were 54 males and 73 females
included in the analysis. The proportion of correct answers was .89 (SE =.011) for the males and
.90 (SE =.008) for the females (F[1,125] =.8). That is, there was no effect of gender. This was not
surprising. Even if there are gender differences in learning, the common criteria for accepting
students to a university make it unlikely that subgroups will differ in learning. Since there was
no effect of gender, and gender was not expected to be a factor of theoretical interest, we again
followed the advice of Cohen and Cohen (1983, p. 347) and did not consider any interactions
between gender and the other factors. In all subsequent analyses, an analysis of the spring data in
which gender was substituted for section was always performed and gender never had an effect.
These non-effects will not be reported.

Online versus in-class performance


Recall that the same questions that appeared on the online pretest for one section appeared on an
in-class pretest for the other section. An analysis of variance was performed on the questions that
were repeated three times during the course, and on the similar question that appeared on the
block exam and final. These are the lines labelled ‘Online’ and ‘In-class’ in Figure 1. The factors
in the analysis were question, pretest (online versus in-class), block, and section. The effects of
question (F[4,1500] = 28.6), pretest (F[1,375] = 45.2), and block (F[2,750] = 353.9) were all
significant, as was the question × pretest (F[4,1500] = 62.9) interaction. As shown in Figure 1 and
confirmed by post-hoc test (t[1500] = 12.8), the interaction was the result of better performance
on the online quizzes than on any other test.

The effect of pretests on subsequent exam performance


To evaluate the effects of the pretests, an analysis of variance was performed on the exams and
finals in which the factors were: exam (block exam versus final), similarity to pretest question
(repeated versus similar), pretest (both, online, in-class, neither), block, and section.
The effects of exam (M =.82, SE =.006 for block exam versus M =.83, SE =.006 for final;
F[1,375] = 11.9), similarity (M =.825, SE =.006 for repeated questions versus M =.817, SE =.005
for similar questions; F[1,345] = 4.8), pretest (F[3,1125] = 87.8), and block (F[2,750] = 289.6)
were all significant, as was the exam × pretest interaction (F[3,1125] = 6.4). As can be seen from
Figure 1, and was confirmed by post-hoc comparisons, the probability of a correct response was
greater for questions that appeared on two pretests (M =.87, SE =.005) than for questions that appeared
on one or no pretest (M =.80, SE =.006; t[1125] = 5.8). Also, the exam × pretest interaction was
such that there was no difference in the proportion of answers correct on the block exams versus
the final exam for questions that appeared on the pretests (M =.83, SE =.006 for block exams versus
M =.83, SE =.007 for final), but performance improved on the final for questions that did not
appear on the pretest (M =.78, SE =.008 for block exams versus M =.81, SE =.007 for final).
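
A minimal sketch of the kind of comparison reported here, using simulated placeholder values rather than the study's records: per-student exam accuracy for twice-pretested questions is compared with accuracy for once- or never-pretested questions with a paired t-test, which stands in for the post-hoc comparison on the ANOVA error term.

# Paired comparison of exam accuracy for twice-pretested versus once/never-pretested
# questions. Accuracy values are simulated placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_students = 377
acc_two_pretests = np.clip(rng.normal(0.87, 0.06, n_students), 0.0, 1.0)
acc_one_or_none = np.clip(rng.normal(0.80, 0.06, n_students), 0.0, 1.0)

t_stat, p_value = stats.ttest_rel(acc_two_pretests, acc_one_or_none)
diff = float(np.mean(acc_two_pretests - acc_one_or_none))
print(f"mean difference = {diff:.3f}, t({n_students - 1}) = {t_stat:.2f}, p = {p_value:.4g}")
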
Notice from Figure 1 that performance on both pretests was over 90%. For this reason, it
was not possible to analyse whether answering a question correctly or incorrectly on the pretest
had an effect on subsequent exam performance. There were not enough incorrect pretest
responses to construct a meaningful comparison. However, it was possible to extract all the
exam questions for which the student got all previous pretest occurrences (two, one, or zero)
correct. Of the 377 students in the analysis, 341 students met this criterion for some questions of
all kinds in all blocks. The results of this tabulation are shown in Figure 3.
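
The extraction described in this paragraph can be sketched as follows (Python with pandas; the response log, its column names, and the toy values are hypothetical, not the study's data files): exam responses are dropped whenever the same student answered any pretest occurrence of that question incorrectly, and accuracy is then tabulated by pretest condition.

# Keep exam responses only for questions whose pretest occurrences (if any) were all
# answered correctly by that student, then tabulate accuracy by pretest condition.
import pandas as pd

log = pd.DataFrame([
    # student, question, phase, condition, correct (toy rows, not real data)
    (1, 101, "online",  "both",    1),
    (1, 101, "clicker", "both",    1),
    (1, 101, "exam",    "both",    1),
    (1, 102, "online",  "online",  0),
    (1, 102, "exam",    "online",  1),
    (1, 103, "exam",    "neither", 0),
], columns=["student", "question", "phase", "condition", "correct"])

pretests = log[log["phase"] != "exam"]
exams = log[log["phase"] == "exam"]
# (student, question) pairs with at least one incorrect pretest response
bad = pretests.loc[pretests["correct"] == 0, ["student", "question"]].drop_duplicates()
# anti-join: drop exam responses to those questions for those students
eligible = exams.merge(bad, on=["student", "question"], how="left", indicator=True)
eligible = eligible[eligible["_merge"] == "left_only"]
print(eligible.groupby("condition")["correct"].mean())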

Figure 3. For students who participated in all conditions in all blocks, the probability of a correct response
to a question on the block exam and final for questions answered correctly online and in-class (both), online,
and in class, and those questions only presented on the two exams (neither) (left panel); and the probability
of a correct response to questions that were similar to repeated questions answered correctly on their pre-
test(s) (right panel).


The effect of correct responses on pretests on subsequent exam performance


To evaluate the effect of correct pretest responses, an analysis of variance was performed on the
questions on the block exams and final that had been answered correctly on all pretests in which
the question was included. The factors were: exam, similarity, pretest, block, and section.
The effects of exam (M =.83, SE =.006 for block exam, M =.84, SE =.006 for final; F[1,339]
= 10.4), similarity (M =.84, SE =.006 for repeated questions, M =.82, SE =.006 for similar ques-
tions; F[1,339] = 25.3), pretest (F[3,1017] = 88.3), and block (F[2,678] = 212.8) were all signif-
icant, as was the exam × pretest interaction (F[3,1017] = 3.5) and the pretest × similarity
interaction (F[3,1017] = 17.8). As can be seen from Figure 3, and was confirmed by post-hoc
comparisons, the probability of a correct response was greater for questions that appeared on two
pretests (M =.89, SE =.007) than for questions that appeared on one or no pretest (M =.81, SE
=.008) (t[1017] = 4.8).
As shown in Figure 3, the exam × pretest interaction was such that there was no difference
between block exams and the final exam in terms of the proportion of correct answers to questions
that appeared on the pretests (M =.84, SE =.007 for block exams versus M =.84, SE =.007 for
final), but performance was better on the final exam for questions that did not appear on the
pretest (M =.79, SE =.009 for block exams versus M =.82, SE =.008 for final). The pretest × simi-
larity interaction was such that questions that had appeared on at least one pretest (hence, were
repeated) were more likely to be answered correctly than similar questions (M =.86, SE =.008 for
repeated questions versus M =.82, SE =.007 for similar questions), but questions that had not
appeared on at least one pretest (hence, were not repeated) were not more likely to be answered
correctly than similar questions (M =.80, SE =.009 for unrepeated questions versus M =.81, SE
=.008 for similar questions).

Results for less diligent students who did not participate in all lectures or online assignments
Next, the 377 students whose data were analysed as part of a balanced experimental design were
eliminated from the data set. All the analyses reported above were performed again, eliminating
block as a variable, on the reduced data set. This made it possible to include 433 students who failed
to participate in one of the 12 critical lectures or in the preceding online quiz, but nevertheless
responded to the majority of the items in all conditions in this simplified design. Of the 433 students
included in this analysis, there were 272 students in one section and 161 students in the other section.
Figure 4 shows the effect of the pretests on exam performance for these students.
Figure 4. For students who did not participate in all conditions in all blocks, the probability of a correct response to a repeated question for questions presented online and in-class (both), online, in class, and only on the two exams (neither) (left panel); and the probability of a correct response to questions that were similar to the repeated questions (right panel).

Quiz versus exam performance


An analysis of variance was performed on the questions that were repeated four times during the
course, and on the similar questions that appeared on the block exam and final. This is the line
labelled ‘Both’ in Figure 4.
As can be seen from Figure 4, the effect of question was significant (F[5,2155] = 53.3). Post-
hoc comparisons confirmed that this was because performance on the pretests (M =.90, SE =.006)
was greater than performance on the repeated and similar items on the block exams and final
(M =.83, SE =.006) (t[2155] = 16.8).

Online versus in-class performance


Recall that the questions that appeared in the online pretest for one section appeared in an in-class
pretest for the other section. An analysis of variance was performed on the questions that were
repeated three times during the course, and on the similar questions that appeared in the block
exams and final. These are the lines labelled ‘Online’ and ‘In-class’ in Figure 4. The factors in the
analysis were question, pretest (online versus in-class), and section. The effects of question
(F[4,1724] = 81.3) and pretest (F[1,431] = 49.7) were both significant, as was the question ×
pretest interaction (F[4,1724] = 35.4). As shown in Figure 4, and confirmed by post-hoc test
(t[1724] = 25.2), the interaction was the result of better performance on the online quizzes than
on any other test.

The effect of the pretests on subsequent exam performance


To evaluate the effects of the pretests, an analysis of variance was performed on the block
exams and finals in which the factors were exam (block exam versus final), similarity to pretest
question (repeated versus similar), pretest (both, online, in-class, neither), block, and section.
The 433 students’ results that were included in the previous analysis were used for this analysis
as well. The results are shown in Figure 4.
The effects of exam (M =.75, SE =.005 for block exam versus M =.77, SE =.005 for final;
F[1,431] = 28.7) and pretest (F[3,1293] = 153.6) were both significant, as was the exam ×
pretest interaction (F[3,1293] = 10.6). As can be seen from Figure 4 and was confirmed by
post-hoc comparisons, the probability of a correct response was greater for questions that
appeared on two pretests (M =.83, SE =.005) than for questions that appeared on one or no
pretest (M =.74, SE =.007) (t[1293] =10.4). Also, the exam × pretest interaction was such that
there was no difference in the proportion of correct answers on the block exams versus the
final for questions that appeared on the pretests (M =.77, SE =.007 for block exams versus
M =.77, SE =.007 for final), but performance improved on the final for questions that did
not appear on the pretest (M =.74, SE =.007 for block exams versus M =.78, SE =.007 for
final).

The effect of correct responses on pretests on subsequent exam performance


To evaluate the effect of correct pretest responses, an analysis of variance was performed on the
block exams and finals in which the factors were: exam, similarity, pretest, and section. Of the
433 students in the previous analysis, 426 got some items correct on all pretests and were
included in this analysis: 265 students from one section and 161 students from the other
section. The effects of exam (M =.78, SE =.005 for block exam, M =.79, SE =.006 for final;
F[1,424] = 6.0), similarity (M =.79, SE =.006 for repeated questions, M =.80, SE =.005 for similar
questions; F[1,424] = 9.2), and pretest (F[3,1272] = 127.0) were all significant, as was the exam
× pretest interaction (F[3,1272] = 12.8) and the pretest × similarity interaction (F[3,1272] = 8.9).
As can be seen from Figure 5, and confirmed by post-hoc comparisons, the probability of a
correct response was greater for questions that appeared on two pretests (M =.87, SE =.006) than
for questions that appeared on one or no pretest (M =.76, SE =.007) (t[1272] = 10.1).

Figure 5. For students who did not participate in all conditions in all blocks, the probability of a correct response to a question on the block exam and final for questions answered correctly online and in-class (both), online, and in class, and those questions only presented on the two exams (neither) (left panel); and the probability of a correct response to questions that were similar to repeated questions answered correctly on the pretest(s) (right panel).

As shown in Figure 5, the exam × pretest interaction was such that there was no increase in
the proportion of correct answers from the block exams to the final for questions that appeared
on the pretests (M =.80, SE =.008 for block exams versus M =.79, SE =.008 for final), but perfor-
mance improved on the final for questions that did not appear on the pretest (M =.74, SE =.007
for block exams versus M =.78, SE =.007 for final). As shown in Figure 5, the pretest × similarity
interaction was such that questions that had appeared on at least one pretest (hence, were
repeated) were more likely to be answered correctly than similar questions (M =.80, SE =.008 for
repeated questions versus M =.79, SE =.008 for similar questions), but questions that had not
appeared on at least one pretest (hence, were not repeated) were not more likely to be answered
correctly than similar questions (M =.75, SE =.007 for unrepeated questions versus M =.77, SE
=.007 for similar questions).

The effect of the pretests on final performance for correctly-answered exam questions
Next, an analysis of variance was conducted on the questions that were answered correctly on
both the block exams and the final exam. The factors in the analysis were similarity, pretest,
block, and section. A total of 483 students contributed to all cells of this experimental design and
were included in this analysis. The effects of pretest (F[3,1443] = 17.0) and block (M =.87, SE
=.005 for Block 1, M =.93, SE =.004 for Block 2, M =.96, SE =.003 for Block 3; F[2,962] = 210.1)
were significant. As confirmed by a post-hoc comparison, the effect of pretest was significant
because the probability of a correct answer was greater for both questions (i.e., pretested both
online and in class; M =.94, SE =.004) than for online (M =.91, SE =.004), in-class (M =.90, SE
=.005), or neither questions (M =.92, SE =.005) (t[1443] = 2.5).

Discussion
This experiment produced three results that were informative about the effect of study on reten-
tion in a college lecture class. In addition, the experiment produced two results that demonstrate
significant forgetting over a semester-long retention interval and a dynamic balance between
forgetting and study.

Two pretests of a question increased retention of its answer but one pretest did not
As Figures 1 and 4 show, presenting a question once before the block exam, whether online or in
class, had no effect on the probability of that question being answered correctly on the block exam
or the final exam. However, presenting a question twice before the block exam, once online and
once in class, increased the probability of that question being answered correctly on both the
block exam and the final. The same effect was observed for both repeated questions and for simi-
lar questions whose answers were implied by the answers to the repeated questions. If the students
had only remembered the words of the answers to the questions verbatim, then the repeated ques-
tions would have been answered correctly more often than the similar questions. However,
performance on repeated versus similar questions was identical – so memory for a question must
have been at the level of comprehension. The gist of the correct answer was remembered rather
than its exact words. When the analysis was restricted to correct answers, there was a significant
effect of similarity, but the magnitude of the effect was small – the advantage of repeated over
similar items was .04 in Figure 3 and .01 in Figure 5. This result most probably reveals that in
addition to gist memory, some students recognised the words of their own responses. However,
it does not modify the conclusion that memory for gist was the largest effect of questioning.

A comparison of the results for all pretested questions shown in Figures 1 and 4 with those
answered correctly on the pretest shown in Figures 3 and 5 suggests no additional effect of
retrieving the correct answer on the pretest.
To summarise, all students were given the same reading assignments before class. If a student
was pretested on a question from the reading before class and given the correct answer as feed-
back, that student was more likely to get that question correct if asked again in class. If a student
was pretested on the same question both before class and in class, that student was more likely to
get that question correct on the block exam and final exam. So the pretests increased encoding
and/or retention of the study materials. Furthermore, these effects were observed whether or not
the student got the question correct on the quiz.
These results extend previous research into the effect of testing on learning in several important
ways, when compared with the previous research most similar in methodology and purpose to that
reported here – the work of Roediger and colleagues (Karpicke & Roediger, 2007; Roediger &
Karpicke, 2006; Roediger & Marsh, 2005). Roediger found an effect of testing on recall; here, the
effect was found to extend to multiple-choice questions. Roediger measured retention over an inter-
val of a week; here, retention was measured over intervals up to 15 weeks. Also, the present study’s
finding that, for the meaningful study materials used here, at least two test trials are required to
increase retention at intervals of greater than a week is new. In fact, this is the first study to examine
the effect of any kind of distributed study of meaningful material, whether active or passive, over
retention intervals of 1–15 weeks (Cepeda et al., 2006). Also, this is the first experimental study
to show a significant effect of clicker use on exam performance, and the second experimental study
(after Romanov & Nevgi, 2006) to demonstrate a significant effect of online testing.
The advantage of repeated testing over single testing, even when the first test produced a
correct answer, contradicts speculation that a repeated test would not produce better retention.
Buschke (1973), Thompson et al. (1978), and Bahrick (1979) all made use of a dropout procedure
in which items responded to correctly were not retested. In the context of this study, this would
mean that questions answered correctly online would not be repeated in class. The dropout proce-
dure is used when it is believed that repeated study of an item already producing a correct
response, called over-learning, would not improve retention. Karpicke and Roediger (2007,
p. 157) point out:

This dropout condition is similar to what study guides often instruct students to do when studying
facts by using flash cards and other methods: Drop material that is already ‘learned’ (or recallable)
from further practice and focus on material that is not yet learned.

Similarly, Rohrer et al. (2005) argued that over-learning may not benefit retention at very long
retention intervals (e.g., nine weeks).
In contrast, it was found here that at retention intervals of 1–12 weeks, when performance on
the final is considered, two pretests produced better retention than one pretest, even for questions
answered correctly. We agree with Karpicke and Roediger's (2007) conclusion that the effec-
tiveness of over-learning probably depends on the type of practice. While additional passive study
had little or no effect on retention in the studies mentioned above, repeated testing had a profound
effect here and on the results of Karpicke and Roediger (2007).
There are two possible explanations for the effectiveness of the pretests as study trials. These
explanations are not mutually exclusive.
The first explanation is the ‘additional study hypothesis’. According to this hypothesis, the
question alerted the student that this was something that had to be learned. In its extreme form,
this hypothesis would have a student not reading the assigned reading, but taking the pretest
before the class and looking up the answer to each question in the textbook. Consequently, at
the end of the pretest the student would know the answers to the pretest questions but nothing
else. Furthermore, the student would record the answer online, and also in class. Subsequently,
the student could study these answers, and hence know them for the block exams and final
exam.
The second explanation is the ‘repeated testing hypothesis’. According to this hypothesis,
asking the question before class results in an enhanced representation of its content in the memory
of the student, which produces higher retention for the block exam and final. A weakness of this
hypothesis is that it does not explain why a question had to appear on both online and in-class
pretests in order to produce better exam performance. Neither does it explain why the interval
between the block exam and the final had a significant effect on performance. If the main effect
on final performance was a result of what was studied just before the final, then how long it had
been since that recently studied material had last been tested should not have had the large effect
on performance that was observed. Furthermore, it should be emphasised that the answers to all
48 experimental questions were presented in class and on PowerPoint slides that were available
online both before and after class.
Thus, it could be that students only gave additional study to those facts they remembered
being asked about online or in class, and paid less attention to the additional material discussed
in class and available on the PowerPoint slides accompanying the lectures. That is, students had
precise memory for the facts they were questioned about, but it was not this memory of the ques-
tioning but the subsequent study that it encouraged that produced the improvement in exam
performance. This hypothesised sequence of events is less parsimonious than the repeated testing
hypothesis and will not be considered further.
Recall that there are two explanations of the testing effect: a quantitative increase in the
elaboration of the memory trace, versus a qualitative increase in its robustness and hence its
longevity. That two pretests were necessary to produce improved performance on the block
exams and final is consistent with the qualitative hypothesis: asking the question again in class
signalled to the brain that this information would be needed again, thus initiating a consolidation
process for the representation that produced better performance on the block exams and final.
Furthermore, as shown in Figures 3 and 5, even when the analysis was restricted to those items
correctly answered previously, the results were the same. Consider just those questions answered
correctly in class: questions were more likely to be answered correctly in the block exam and final
exam if they had been previously answered correctly online. This result is not explained by the
elaboration hypothesis, but it directly follows from the assumption that distributed presentation
produces a more robust memory trace.
Recently, evidence has been found of the neurological mechanism underlying the qualitative
change in memory trace. This mechanism is called neurogenesis. New cells are born in the brain
every day, but most of these cells soon die. However, when an animal learns something new,
some of these new neurons are recruited into the new memory trace and become a part of it.
So the retention of new neurons is an essential component of learning. A recent study of neuro-
genesis (Sisti, Glass, & Shors, 2007) found that repetition increases the survival of the new
neurons that form the memory trace of a new event.
Another result here also seems to be inconsistent with the elaboration hypothesis. Recall that
on both pretests the students received feedback as to the correct answer. If the predominant effect
of a pretest question was to encode an event in which the correct answer was either retrieved by the
student or made known to the student through feedback, then it should not have mattered whether or
not the student got the question correct; the effect on subsequent performance should have been the
same, and it was. However, if the predominant effect of a
pretest question was to provide a study trial during which the student formed an elaborated
representation during the retrieval of a response to a question, then presumably this elaborated
representation should have been useful on subsequent exams only if it was in fact associated with
the correct response. Hence, there should have been a larger effect on subsequent performance
when the student got the question correct.
In summary, these results suggest that, in this experiment, the effect of a repeated question
was to produce a more robust memory trace. However, these results should not be considered to
disconfirm the elaboration hypothesis, which is not a mutually exclusive alternative to the robust-
ness effect – the results here merely indicate that it was not the principal explanation of the effect
of repetition observed here.

Forgetting between tests


Notice that Figures 1 and 4 show significant forgetting between the pretest and the block exam.
This result is confirmed by the results shown in Figures 3 and 5. Fewer than 90% of the questions
answered correctly on a pretest were again answered correctly on the block exam. Also, even when
both the repeated and similar versions of a question were answered correctly on a block exam,
retention interval had a significant effect on whether the question would be answered correctly
on the final. As the retention interval decreased from the block exam to the final, the probability
of a correct response increased, from .87 to .93 to .96 across the three block exams. Together, these
results show that forgetting of the answers to questions occurred throughout the course.
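One way to see that these three proportions describe a regular forgetting function is to fit them, purely for illustration, with a simple exponential-decay assumption, as in the Python sketch below. The retention intervals used here (weeks between each block exam and the final) are hypothetical placeholders rather than the study's actual schedule; the sketch only shows how an implied decay constant could be estimated from such data.

import math

# Reported conditional probabilities of a correct final-exam answer, by block exam,
# paired with assumed retention intervals in weeks (hypothetical values, not the study's).
observed = [(10, 0.87), (6, 0.93), (3, 0.96)]  # (weeks_to_final, P(correct on final))

for weeks, p in observed:
    # Under p = exp(-k * weeks), solve for the implied decay constant k.
    k = -math.log(p) / weeks
    print(f'{weeks:2d} weeks before the final -> implied weekly decay constant k = {k:.4f}')

Under these assumed intervals the implied decay constants are roughly similar, which is the kind of regularity a single forgetting function would predict; with the actual intervals the same calculation could be repeated exactly.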
Forgetting is evidence that some students had less than optimal study strategies. The students
were familiar with the questions from pretests and block exams and had unlimited access to study
materials. If their review of the material had been complete, then questions answered correctly on
the block exams should all have been answered correctly on the final. So, at least some students
did not engage in an optimal review for the final, consequently responding incorrectly to ques-
tions that they had previously answered correctly. This is not to say that students did not study at
all. Performance improved between the block exams and the final, on questions that students saw
for the first time on the block exams. However, improvement from the block exam to the final did
not occur for questions that had been seen before. Whatever learning some students engaged in
was offset by the forgetting of others. This result poses a pedagogical challenge. At least one more
quiz, between the first block exam and the final, is necessary to level the forgetting function. Its
effectiveness should be the focus of future research.

Conclusions
This experiment produced three significant results: two pretests of a question increased retention
of its answer but one pretest did not; there was still forgetting of the correct answer between a
block exam and the final; and the same effects were observed for both exam questions identical
to the quiz questions and exam questions that were similar to quiz questions. In addition, the
experiment produced two results that demonstrate significant forgetting over a semester-long
retention interval and a dynamic balance between forgetting and study. Getting a question correct
on two pretests did not guarantee getting it correct on a block exam, and neither did getting two
versions of a question correct on a block exam guarantee getting it correct on the final. For ques-
tions that were not pretested, performance improved from the block exam to the final; however,
for questions that were pretested, performance did not improve from the block exam to the final.
From these results we can draw conclusions about memory and pedagogy.

Memory
Distributed testing increases the robustness of the memory trace. This is consistent with the
hypothesis that the traces of repeated events undergo consolidation, producing a trace whose
retention interval is proportional to the duration between event repetitions.
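As a rough formal rendering of this hypothesis, the sketch below assumes that the durability of a twice-presented trace grows in proportion to the spacing between its presentations, and that retention then decays exponentially over the delay to the test. The functional form and all constants are illustrative assumptions, not quantities estimated in this study.

import math

def retention(test_delay_days, spacing_days, base_durability=1.0, gain=30.0):
    # Durability (in days) of the consolidated trace is assumed to grow linearly
    # with the spacing between the two presentations; constants are arbitrary.
    durability = base_durability + gain * spacing_days
    # Retention is assumed to decay exponentially over the delay to the test.
    return math.exp(-test_delay_days / durability)

# Two presentations a day apart versus in the same session, each tested 30 days later.
print('spaced (1 day apart):', round(retention(30, 1.0), 3))
print('massed (same session):', round(retention(30, 0.0), 3))

The qualitative contrast, not the particular numbers, is the point: under this assumption a one-day spacing yields a trace that survives a month-long retention interval far better than a massed repetition does.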
Pedagogy
Given the duration of a semester, two repetitions of a study task, at least a day apart, are much
more likely to have a long-term effect on exam performance than a single presentation of the
study task. In the course studied here, performance increased from about 80% to 90%, roughly
from a B to an A.
It should be possible to obtain a better theoretical understanding of the effect of distributed
study on retention, hence providing a method for reducing intra-course forgetting and further
improving classroom performance, through a refined use of this methodology in future research.

Acknowledgement
This research was supported by a grant from the Cengage (formerly Thomson) Learning Corporation.

References
Bahrick, H.P. (1979). Maintenance of knowledge: Questions about memory we forgot to ask. Journal of
Experimental Psychology: General, 108, 206–208.
Bjork, R.A. (1988). Retrieval practice and the maintenance of knowledge. In M.M. Gruneberg, P.E.
Morris, & R.N. Sykes (Eds.), Practical aspects of memory: Current research and issues (Vol. 1,
pp. 396–401). New York: Wiley.
Bobrow, S.A., & Bower, G.H. (1969). Comprehension and recall of sentences. Journal of Experimental
Psychology, 80, 455–461.
Brown, A.S., Schilling, H.E.H., & Hockensmith, M.L. (1999). The negative suggestion effect: Pondering
incorrect alternatives may be hazardous to your knowledge. Journal of Educational Psychology, 91,
756–764.
Buschke, H. (1973). Selective reminding for analysis of memory and learning. Journal of Verbal Learning
and Verbal Behavior, 12, 543–550.
Cepeda, N.J., Pashler, H., Vul, E., Wixted, J.T., & Rohrer, D. (2006). Distributed practice in verbal recall
tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral
sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
deWinstanley, P.A., & Bjork, R. (2002). Successful lecturing: Presenting information in ways that engage
effective processing. New Directions for Teaching and Learning, 89, 19–31.
Ebbinghaus, H. (1964). Memory: A contribution to experimental psychology (H.A. Ruger, C.E. Bussenius,
& E.R. Hilgard, Trans.). New York: Dover Publications. (Original work published 1885).
Elliott, C. (2003). Using a personal response system in economics teaching. International Review of
Economics Education, 1, 80–86.
Foos, P.W., Mora, J.J., & Tkacz, S. (1994). Student study techniques and the generation effect. Journal of
Educational Psychology, 86, 567–576.
Gates, A.I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Glass, A.L., Krejci, J., & Goldman, J. (1989). The necessary and sufficient conditions for motor learning,
recognition and recall. Journal of Memory and Language, 28, 189–199.
Kalat, J.W. (2005). Introduction to psychology (7th ed.). Belmont, CA: Thomson.
Karpicke, J.D., & Roediger, H.L., III. (2007). Repeated retrieval during learning is the key to long-term
retention. Journal of Memory and Language, 57, 151–162.
Kennedy, G.E., & Cutts, Q.I. (2005). The association between students’ use of an electronic voting system
and their learning outcomes. Journal of Computer Assisted Learning, 21, 260–268.
McDaniel, M.A., & Fisher, R.P. (1991). Tests and test feedback as learning sources. Contemporary
Educational Psychology, 16, 192–201.
McDaniel, M.A., Kowitz, M.D., & Dunay, P.K. (1989). Altering memory through recall: The effects of
cue-guided retrieval processing. Memory and Cognition, 17, 423–434.
McDaniel, M.A., & Masson, M.E.J. (1985). Altering memory representations through retrieval. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 11, 371–385.
Roediger, H.L., III, & Karpicke, J.D. (2006). Test-enhanced learning: Taking memory tests improves long-
term retention. Psychological Science, 17, 249–255.
Roediger, H.L., III, & Marsh, E.J. (2005). The positive and negative consequences of multiple-choice test-
ing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159.
Rohrer, D., Taylor, K., Pashler, H., Wixted, J.T., & Cepeda, N.J. (2005). The effect of overlearning on
long-term retention. Applied Cognitive Psychology, 19, 361–374.
Romanov, K., & Nevgi, A. (2006). Learning outcomes in medical informatics: Comparison of a WebCT
course with ordinary web site learning material. International Journal of Medical Informatics, 75,
156–162.
Sisti, H., Glass, A.L., & Shors, T.J. (2007). Neurogenesis and the spacing effect: Learning over time
enhances memory and the survival of new neurons. Learning and Memory, 14, 368–375.
Slamecka, N.J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of
Experimental Psychology: Human Learning and Memory, 4, 592–604.
Thompson, C.P., Wenger, S.K., & Bartling, C.A. (1978). How recall facilitates subsequent recall: A reap-
praisal. Journal of Experimental Psychology: Human Learning and Memory, 4, 210–221.
Toppino, T.C., & Luipersbeck, S.M. (1993). Generality of the negative suggestion effect in objective tests.
Journal of Educational Psychology, 86, 357–362.
Zangwill, O.L. (1939). Some relations between reproducing and recognizing prose material. British Journal
of Psychology, 29, 370–382.
