Professional Documents
Culture Documents
Composition: A Meta-analysis of
Experimental Treatment Studies
(p. 914). Experimental research, he charges, "is written for other re-
searchers, promotions, or dusty archives in a language guaranteed for
self-extinction.... The data cannot be exported from room to room
... [Teachers] have been unable to transfer faceless data to the alive,
inquiring faces of the children they teach the next morning" (p. 918).
What Graves presumably means by this curious metaphor is that the
findings of experiments cannot be applied with comparable results in
other than the experimental classrooms. If Graves in particular is right,
we should find that results of experiments on similar instructional
variables have little in common, that their results are highly hetero-
geneous.
Despite the current disdain for experimental studies, it seemed wise
to examine them for a number of reasons. First, the total number of
experimental studies completed in the past 20 years exceeds the total
number of studies included in the Braddock bibliography. Second,
even a cursory review of the published studies indicates that many of
them have heeded the advice of Braddock and his colleagues, who
had rightly bemoaned the lack of carefully designed experiments.
Third, new techniques have been available for integrating the results
of experimental studies since 1978.
Selection of Studies
Meta-analysis
The techniques used in the analysis are based on the work of Glass
(1978) and particularly on the statistical model developed by Hedges
(1981, 1982a). A meta-analysis computes standard scores for various
treatments' gains or losses by dividing the difference between posttest
scores, adjusted for the difference between pretest scores, by the pooled
standard deviation of posttest scores for all groups in the study. The
resulting score, commonly called effect size, reports a given treatment
gain (or loss) in terms of standard score units. Thus, a given treatment
might be said to have an experimental/control effect size of .5 standard
deviations, meaning that the gain for the average student in the ex-
perimental group is .5 standard deviations greater than for the average
student in the control group.
Effect sizes can be accumulated across studies. Using the techniques
developed by Hedges (1981), this meta-analysis weights each effect
size by the reciprocal of its variance so that the accumulation is not
simply an average of raw effect sizes, but a mean effect size dependent
on the variance of its constituents.
The major goal of the meta-analysis is to explain the variability
among the characteristics of the treatments in relation to the variability
of their effect sizes. That involves categorizing the treatments along
various dimensions (e.g., instructional mode, focus of instruction, du-
ration), comparing mean effect sizes of treatments grouped together,
and testing the studies grouped together for homogeneity (Hedges
Mode (N = 29)
Presentational Natural Process Ind
Focus (N = 39) (N = 4) (N= 9) (
Grammar (N = 5)
Sentence combining
z (N = 5)
0 Models (N = 7) 3
Using criteria (scales,
etc.) (N = 6)
Cb ... ...
(I
c- Inquiry (N = 6)
Free writing (N = 10) 9 g. ? 9
s0
All Treatments
Duration
Mode of Instruction
D
TABLE 2
p)
o
Duration:Summaryof ExperimentallControl
EffectSzzeStatistics
95% Confidence
IntervaI
o
Duration Mean Effect SD Lower Upper
Presentational Mode
Environmental Mode
Individualized Mode
95% Confidence
Interval
0 Mean Effect SD Lower Upper
PI
w
0
E: All meta-analysis
treatments (N = 73) .28 .018 .24 .32 41
m Treatments (four outliers
removed) (N = 69) .24 .019 .20 .27 16
trl+ Treatments included in
0-
E: mode of instruction
analysis (N = 29) .24 .025 .19 .29 7
z
Natural process (N = 9) .19 .037 .11 .26 2
Environmental (N = 10) .44 .050 .34 .53 1
Presentational (N = 4) .02 .114 -.20 .24
Individualized (N = 6) .17 .064 .06 .28 1
Treatments categorized by
mode of instruction
(N = 29) . . . 5
Macrorie's Telling Writing had a substantially greater loss than did the
control, resulting in an experimental/control effect size of -.27. But
the reasons for that loss are not so apparent as they are in the Adams
(1977) study. At any rate, removal of these two studies reduces the
mean effect size to only .18 from .186, but reduces H to 14.44 from
23.15.
In the individualized group, one study contributes heavily to both
the mean and the heterogeneity (A. E. Thibodeau 1973). Its removal
reduces the mean effect size to .09 from .167 and H to 7.20 from
14.68. Speculation about its relatively high experimental/control effect
size of .45 is not possible without more detailed information about the
treatments than the dissertation provides.
As table 4 indicates, the removal of these three studies reduces the
heterogeneity in the dimension of mode and its categories to homo-
geneity (H = 35.39, df = 22, not significant atp < .01). The dimension
explains 89.6 percent of the 29 treatments categorized as having an
identifiable instructional mode in the experimental group and some
other in the control.
Since the characteristics of each treatment group were coded, in-
spection of mean pretest-to-posttest effect sizes regardless of experi-
mental or control status is possible. The mean pretest-to-posttest effect
for 33 presentational treatments is .18, considerably higher than might
be expected from the original analysis; for nine natural process treat-
ments, .26; for nine environmental treatments, .75; and for seven
individualized treatments, .24. Examining the treatments in this way
indicates that their relative positions have changed only slightly, with
the presentational and environmental treatments being somewhat
stronger in relation to the others. While the differences among the
presentational, natural process, and individualized modes are not sig-
nificant, the environmental gain is three times the gain for the others
TABLE 4
Effect Size H df
Presentational .02 .92 3
Natural process .18 14.44 6
Environmental .44 12.83 9
Individualized .09 7.20 4
Total modes 35.39 22
Focus of Instruction
95% Confidence
Interval
Mode Mean Effect SD Lower Upper
Presentational (N = 32) .18 .026 .13 .23 9
z Natural process (N = 9) .26 .035 .19 .33
o0 2
Environmental (N = 9) .75 .053 .64 .85 10
Individualized (N = 9) .24 .060 .12 .36
04
Al
Models
Sentence Combining
Scales
Inquiry
Free Writing
95% Confidence
Interval
Focus Mean Effect SD Lower Upper
Treatments included in focus of
instruction analysis (N = 39) .26 .023 .21 .30
Grammar (N = 5) -.29 .059 -.40 -.17
Sentence combining (N = 5) .35 .083 .19 .51
Models (N = 7) .22 .057 .11 .33
z Scales (N = 6) .36 .078 .21 .51
0o
Free writing (N = 10) .16 .035 .09 .23
Inquiry (N = 6) .56 .076 .41 .71
(D Treatments categorized by
focus of instruction (N = 39) ... ..
With two outliers removed
4 (N = 37) ... ... ..
The mean experimental/control effect size for the five studies focusing
on sentence combining activities is .35 and is also homogeneous
(H = 1.89, df = 4). Although the effect size is not significantly different
from that for the study of models, it is significantly greater than that
for the grammar treatments.
The mean experimental/control effect size for the use of scales is
.36 and is homogeneous (H = 6.89, df = 5). While the treatment is
not significantly different from sentence combining, it is significantly
different from three of the six focuses examined: grammar, models,
and free writing.
The mean experimental/control effect size for the six treatments
focusing on inquiry is .56, the highest mean effect size for any in-
structional focus. It, too, is homogeneous with H = 8.73 and df = 5.
It is significantly higher than grammar, models (z = 3.75, p < .0002),
scales (z = 1.96, p < .05), sentence combining (1.98, p < .05), and
free writing (z = 4.99, p < .0001).
Finally, the mean effect size for the 10 treatments using free writing
as a major instructional tool is .16. However, it is not homogeneous
(H = 29.25, df = 9). Although free writing has a significantly stronger
effect than grammar and mechanics, it is significantly lower than sentence
combining (z = 2.12, p < .05), scales (z = 2.35, p < .02), and inquiry
(z = 4.99, p < .0001).
With all studies included, the dimension of instructional focus comes
close to homogeneity (H = 58.92, df = 33, with 54.70 required for
nonsignificance at p < .01). The main source of this heterogeneity is
the free-writing category (H = 27.25). The Adams (1971) study displays
a high positive standardized residual, as it did in the natural process
category, and should be removed for substantive reasons, its gain being
entirely dependent on losses in the control groups. The removal of a
second treatment, Witte and Faigley (1981), this one with a high negative
standardized residual, along with Adams, increases the mean for free
writing to .17 from .16 but reduces H to 18.23, which is not significant
at p < .01, df = 7. The removal of these two studies reduces H for
the dimension to 48.58, not significant at p < .01, df = 31.
To achieve homogeneity for each focus and in the dimension as a
whole, only two of 39 treatments were removed. The dimension of
instructional focus, then, has a high level of explanatory power, ac-
counting for nearly 95 percent of the 39 treatments included in it.
Removal of Outliers
While the results reported above are clear, the critical reader may ask
about the extent to which removal of outliers distorts the results. Four
their drafts in light of comments from both students and the instructor.
But the instructor does not plan activities to help develop specific
strategies of composing. This instructional mode is about 25 percent
less effective than the average experimental treatment, but about 50
percent more effective than the presentational mode. In treatments
that examine the effects of individualized work with students, the
results are essentially the same.
I have labeled the most effective mode of instruction "environmental"
because it brings teacher, student, and materials more nearly into
balance and, in effect, takes advantage of all resources of the classroom.
In this mode, the instructor plans and uses activities that result in
high levels of student interaction concerning particular problems parallel
to those they encounter in certain kinds of writing, such as generating
criteria and examples to develop extended definitions of concepts or
generating arguable assertions from appropriate data and predicting
and countering opposing arguments. In contrast to the presentational
mode, this mode places priority on high levels of student involvement.
In contrast to the natural process mode, the environmental mode
places priority on structured problem-solving activities, with clear ob-
jectives, planned to enable students to deal with similar problems in
composing. On pretest-to-posttest measures, the environmental mode
is over four times more effective than the traditional presentational
mode and three times more effective than that natural process mode.
Like modes of instruction, the focuses of instruction examined have
important ramifications for instructional practice.
Grammar.-The study of traditional school grammer (i.e., the def-
inition of parts of speech, the parsing of sentences, etc.) has no effect
on raising the quality of student writing. Every other focus of instruction
examined in this review is stronger. Taught in certain ways, grammar
and mechanics instruction has a deleterious effect on student writing.
In some studies a heavy emphasis on mechanics and usage (e.g., marking
every error) results in significant losses in overall quality. School boards,
administrators, and teachers who impose the systematic study of tra-
ditional school grammar on their students over lengthy periods of
time in the name of teaching writing do them a gross disservice that
should not be tolerated by anyone concerned with the effective teaching
of good writing. Teachers concerned with teaching standard usage
and typographical conventions should teach them in the context of
real writing problems.
Models.-What I have referred to as teaching from models un-
doubtedly has a place in the English program. This research indicates
that emphasis on the presentation of good pieces of writing as models
is significantly more useful than the study of grammar. At the same
time, treatments that use the study of models almost exclusively are
less effective than other available techniques.
Free writing.-This focus asks students to write freely about whatever
interests or concerns them. As a major instructional technique, free
writing is more effective than teaching grammar in raising the quality
of student writing. However, it is less effective than any other focus
of instruction examined. Even when examined in conjunction with
other features of the "process" model of teaching writing (writing for
peers, feedback from peers, revision, and so forth), these treatments
are only about two-thirds as effective as the average experimental
treatment and less than half as effective as environmental treatments.
Sentencecombining.-The practice of building more complex sentences
from simpler ones has been shown to be effective in a large number
of experimental studies. This research shows sentence combining, on
the average, to be more than twice as effective as free writing as a
means of enhancing the quality of student writing.
Scales.-Scales, criteria, and specific questions that students apply
to their own or others' writing also have a powerful effect on enhancing
quality. Through using the criteria systematically, students appear to
internalize them and bring them to bear in generating new material
even when they do not have the criteria in front of them. These
treatments are two times more effective than free-writing techniques.
Inquiry.-Inquiry focuses the attention of students on strategies for
dealing with sets of data, strategies that will be used in writing. For
example, treatments categorized as inquiry might involve students in
finding and stating specific details that convey personal experience
vividly, in examining sets of data to develop and support explanatory
generalizations, or in analyzing situations that present ethical problems
and in developing arguments about those situations. On the average,
these treatments are nearly four times more effective than free writing
and over two-and-a-half times more powerful than the traditional
study of model pieces of writing.
While the results for the various treatments differ greatly from each
other, this does not imply that the less effective techniques have no
place in the writing curriculum. Indeed, sentence combining, scales,
and inquiry all make occasional use of models, but they certainly do
not emphasize the study of models exclusively. Structured free writing,
in which writers jot down all of their ideas on a particular topic, can
be successfully integrated with other techniques as a means of both
memory search and invention.
These results have important ramifications for those in positions to
make and recommend policy at local, state, and national levels. Indeed,
the findings of this study directly contradict the assumptions, policy
that are demonstrably more effective and continue our efforts to evaluate
them.
Note
The research reported in this paper was supported by a grant from the
Spencer Foundation and will be part of a book-length review of research on
the teaching of composition.
References
Studies in Meta-analysis