
British Journal of Educational Studies, ISSN 0007-1005
DOI: 10.1111/j.1467-8527.2009.00444.x
Vol. 57, No. 4, December 2009, pp. 363–379

SCHOOL IMPROVEMENT: REALITY AND ILLUSION

by Robert Coe, Durham University

ABSTRACT: School improvement is much sought and often claimed.
However, it is questionable whether overall achievement in countries
such as the USA or England has improved by any significant amount
over thirty years. Several school improvement programmes have been
claimed as successful, but evaluations, even where they exist, are generally
poor: based on the perceptions of participants, lacking any counterfactual
or reporting selectively. Accounts of improvement in individual schools
are numerous, but are inevitably selective; the attribution of causality is
problematic and knowledge of the conditions under which such phenomena
are likely to be replicated is limited. School effectiveness research also has
yet to identify specific strategies with clear causal effects. In short, many
claims of school improvement are illusory. Nevertheless, there are some
improvement strategies that are well-defined, feasible and robustly shown
to be effective. In future, we need greater clarity and agreement about
what constitutes success. Evaluation must be taken more seriously, and
its results treated more critically.

Keywords: school improvement, evidence-based, evaluation

1. Introduction
For those who want to improve schooling, there seems to be plenty
of advice about how to do it; the problem here is not a shortage of
initiatives. Most suggestions claim some basis in research, and many
journals and books are filled with the latest thinking about what a
successful school needs to do or how the process of transforming
a school can be undertaken. Also plentiful are the accounts of
improvement by individual schools. These are often dramatic – even
heroic – stories that will give hope and inspiration to others who
would set out to improve their school.
But are the claims about improvement made by researchers,
practitioners and policy makers to be trusted? Evaluation of the true
effects of school improvement initiatives is often seen as unnecessary
or, when it is done, is done badly. Without proper evaluation, almost
any approach can make what may appear to be compelling claims
about its effectiveness. What would a more critical analysis of the
evidence tell us about the history of and prospects for genuine
improvement?
The educational world is swept by periodic trends, confidently
and optimistically moving on from what did not work, but often
returning to ideas that were previously discarded. Short memories
and a belief in the power of novelty seem to be the driving forces.
Such an unscientific approach creates fashion victims, not improving
schools. Given the volume and complexity of the ‘improvement’
literature, does it have any practical implications for schools? Where
can we find advice that will reliably lead to real improvement rather
than just alignment with the latest fashion?

2. Are the Improvement Claims Credible?

Evidence of Systemic Improvement


One reason for being sceptical of the claims made about school
improvement in individual cases is the lack of convincing evidence
of improvement of the system as a whole. For example, in the US
results from the National Assessment of Educational Progress (Perie
et al., 2005) show that performance in reading has remained absolutely
flat since 1971 for all age groups. In mathematics, despite some gains
at ages nine and 13, performance at the end of schooling (age 17)
has not risen.
International studies such as PISA or TIMSS do not have such a
long history, are not designed to evaluate change over time and their
interpretation and use is inevitably problematic (see e.g. Brown, 1998).
Nevertheless, recent surveys have provided some evidence about
changes over time and, for countries such as England which lack
adequate internal systems for measuring changes in achievement,
they are certainly worth considering. These analyses tend to show
rises for some individual countries, for some ages or in some subjects,
but not overall improvement. For example, PISA 2006 compared
reading of 15-year-olds with achievement in 2000 and found that
‘performance generally remained flat’ (OECD, 2007, p. 5), despite,
as they point out, a 39 per cent real-terms increase in per-student
expenditure over the same period.
TIMSS 2007 compared science performance with 1995 (Martin
et al., 2008, pp. 44–48). Overall results for countries that participated
in both surveys did show improvements at Grade 4, but little
overall change at Grade 8. Mathematics in TIMSS 2007 showed
overall small rises at Grade 4, but declines in Grade 8 compared
with performance of the same countries in 1995 (Mullis et al., 2008,
p. 51).
In England, although, at face value, rises in performance on
national tests have been substantial, it is clear that they overstate
the true picture (Tymms, 2004). A review by Tymms and Merrell
(2007) of multiple sources of evidence about changes in standards
concluded that performance in both reading and mathematics at
the end of primary school (age 11) has ‘remained fairly constant’
since the 1950s, though results of national tests suggest that since
1995 reading has risen ‘very slightly’ and mathematics has risen
‘moderately’. At GCSE and A-level apparent rises have been
dramatic, though again it seems likely that these substantially over-
state any real gains, about which little evidence seems to be available
(Coe, 2007). However, even real rises in statutory assessments may
not amount to real improvement, since increases in performance
on high-stakes assessments do not necessarily correspond with rises
on other tests for which pupils have not been specifically prepared
(Klein et al., 2000; Linn, 2002).
Moreover, any optimism we might have about the power of
national education policy to impact on performance should be
tempered by the example of the UK, where, despite very different
policy contexts and initiatives, international studies often fail to
show dramatic differences across the four UK nations (e.g. Bradshaw
et al., 2007). Of course, some commentators have pointed to specific
examples of good performance in these studies as evidence of
the success of local policy, but these seem to depend on being
somewhat selective. For example, Hopkins’ (2007, p. 37) citation
of England’s good results in reading in PIRLS 2001 as evidence for
the success of the National Literacy Strategy was subsequently rather
undermined by its more modest performance in the 2006 survey.
Should the latter result be taken as evidence that the strategy is no
longer working?
Another possible aspect of systemic improvement would be
increased equity. If overall standards have not risen, but the gap
between rich and poor has closed, for example, this could still
represent progress. Here the evidence is mixed and rather complex.
Although it seems that in some countries the strength of association
between family of origin and educational outcomes may have
declined, in others (including England and the United States) it has
not (Breen and Jonsson, 2005).

Evidence of Impact of Specific School Improvement Programmes


Even if outcomes of the school system as a whole have not improved
it is still possible that individual programmes of school improvement
have worked. Certainly, there seems to be a tradition in writing on
school improvement of citing a number of key examples of successful
projects (e.g. Sammons, 2007). However, when one tries to investigate
the evidence on which their claims to success are based, one is often
faced with a circle of citation whose references are all to each other
or, at best, to equivocal or poor quality evaluations (Gray et al., 1999).
For example, the Improving the Quality of Education for All
(IQEA) project (Hopkins et al., 1994) is described by Harris (2002)
as ‘one of the most successful improvement projects in the UK’
(p. 3). However, in a resource pack for teachers, Hopkins (2002,
p. 1) concedes that ‘IQEA has not (as yet) been subject to external
evaluation’. The basis of the claims that the programme has been
successful seems to be the journals kept by participating teachers to
record their perceptions of the process (Hopkins et al., 1994).
In fact, the use of the perceptions of participants as a criterion for
success is fairly widespread in school improvement research. For example,
Harris (2002, p. 48) gives a list of tools that can be used to evaluate
school improvement programmes, including interviews, questionnaires
and observation, but with no mention of (for example) any assessment
of learning. Outcomes such as teachers’ perceptions are generally
easier to change than arguably more important outcomes like
students’ achievement, which are notoriously robust. It is extraordinary,
for example, that two recent evaluations of major large-scale school
improvement programmes in the USA were able to say nothing
about their impact on student achievement (Muijs, 2004). Without
any evidence about changes in achievement outcomes, an evaluation
can give at best a limited picture of a programme’s real impact.
Of course perceptions are important, and one would certainly
want to know how a school improvement programme
was perceived by those who experienced it and whose job it was to
make it work. In interpreting such perceptions, however, we must
bear in mind that they suffer from an in-built positive bias due to the
problem of ‘dissonance reduction’ (Festinger, 1957). This is the
social-psychological phenomenon whereby people who believe they
have freely invested effort in a particular course of action are more
likely to see it as successful than those who have not invested such
effort. This is because for the former group to see it as unsuccessful
would require them to think they had acted unwisely; the alternative
to the loss of self-esteem that this judgement would imply is to
increase the favourability of their judgements of the success of their
actions. Hence in situations where people have been persuaded,
encouraged or inspired (but not forced) to put effort into making
a change, they are more likely to see it as having been successful
than those who did not put in effort (e.g. Brown and Peterson, 1994).
The fact that participants view a programme as having succeeded
probably tells us more about the motivational and inspirational skills
of those who recruited and persuaded them to commit to it than it
does about the real impact of the programme.
A further problem in many accounts of school improvement is
that even where the targeted schools have improved it is quite possible
that they would have improved just as much had they not been
involved. This appears to be the case, for example, in Ofsted’s
(2006) evaluation of the London Challenge, in which substantial
improvement was seen in schools that were not directly part of the
programme, though no specific comparison was made and this is not
really clear in the rather celebratory tone of the document. The
need for a valid comparison is well illustrated by MacBeath et al.
(2007) who attempt to evaluate the impact of an initiative to improve
a group of ‘schools facing extremely challenging circumstances’.
Although it is clear that the schools’ results improved dramatically
over the period of the programme, the improvement was almost
perfectly matched by a group of unrelated comparison schools
identified as being comparable at the start.
It is well known in the evaluation literature that studies without
a comparison group tend to over-estimate effects (e.g. Lipsey and
Wilson, 1993). This is likely to be a particular problem when the
impact of a programme is being evaluated against a background of
wider changes, either in policy or in the nature of the outcome
measures being used, as for example when increases in GCSE attain-
ment are taken as evidence of improvement. An evaluation of a school
improvement programme without any kind of comparison group is
unlikely to tell us anything about the effects of that programme since
it cannot rule out other possible explanations for any apparent
improvement. Even where reference to a comparison group is made,
interpretation can still be problematic unless a good case is made for
the initial equivalence of the two groups, for example by the use of
random assignment.
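
To make this point concrete, the following purely illustrative simulation (a sketch added here, not part of the original article, with all figures invented) shows how a programme with no real effect can appear successful in a simple before-and-after comparison when results are rising everywhere, yet shows no effect once a comparison group is used.

```python
# Purely illustrative sketch (not from the article): a programme with no
# real effect looks successful in a pre-post evaluation when results are
# rising anyway (e.g. grade inflation), but not when compared against
# similar schools outside the programme. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n_schools = 200
true_programme_effect = 0.0   # the programme does nothing
secular_trend = 3.0           # background rise affecting all schools

baseline = rng.normal(50, 10, n_schools)
follow_up = baseline + secular_trend + rng.normal(0, 5, n_schools)
programme, comparison = np.split(np.arange(n_schools), 2)
follow_up[programme] += true_programme_effect

naive_gain = (follow_up[programme] - baseline[programme]).mean()
relative_gain = naive_gain - (follow_up[comparison] - baseline[comparison]).mean()

print(f"Pre-post gain in programme schools:  {naive_gain:+.1f}")    # roughly +3
print(f"Gain relative to comparison schools: {relative_gain:+.1f}") # roughly  0
```
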

Evidence of Improvement of Individual Schools


A widespread feature of the research literature and surrounding
narratives of school improvement is the accounts of the improvement
of individual schools. These descriptions are often rich in detail,
personally narrated and contain dramatic and compelling evidence
of improvement. On one level such accounts are often highly credible:
when faced with a headteacher who declares ‘My school was failing
in A, B, C ways. We did X, Y, Z. Now the evidence of success is P, Q,
R’, even a sceptical listener must acknowledge the contrast between
A, B, C and P, Q, R. Even if we accept the account of improvement
in this school as valid, however, there are two general reasons why we
should be cautious about thinking this has implications for other
schools.
The first is that such accounts are inevitably selective, on a number
of levels. For example, we seldom hear broadcast with the same
enthusiasm the story of the headteacher who turned a successful
school into a failure – yet such stories must exist. Perhaps they also
did X, Y, Z? If we hear only the success stories then pretty much
anything may seem like an effective strategy for school improvement;
a strategy has only to be popular for it to appear effective under this
level of critical scrutiny.
This phenomenon of ‘publication bias’ is well known in the
evaluation literature. Studies with more positive and statistically
significant results are more likely to be written up, submitted for
publication (and to more prominent outlets) and actually published
(Hopewell et al., 2009). Indeed, in fields where systematic review
of the evidence of impact of interventions is common, such as in
medicine, it is widely seen as a significant problem. Despite the fact
that medical trials must be prospectively registered, that awareness
of publication bias is widespread in medical research and that a
number of techniques are commonly used to identify and minimise
its effects, it is still believed to be a substantial threat, causing the
conclusions from reviews to be more positive than they would be if
all the evidence were available (Torgerson, 2003).
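
A toy simulation can illustrate the mechanism (this sketch is added for illustration and is not part of the original article; the true effect and all parameters are invented): if only studies with statistically significant positive results reach publication, the published record overstates a small true effect.

```python
# Purely illustrative sketch (not from the article): selective publication
# of 'significant' positive results inflates the apparent effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect = 0.05            # tiny true effect, in standard-deviation units
n_per_group, n_studies = 50, 1000

published_effects = []
for _ in range(n_studies):
    treated = rng.normal(true_effect, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    t, p = stats.ttest_ind(treated, control)
    if t > 0 and p < 0.05:    # only significant positive studies get written up
        published_effects.append(treated.mean() - control.mean())

print(f"True effect:                      {true_effect:.2f}")
print(f"Mean effect in published studies: {np.mean(published_effects):.2f}")  # much larger
```
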
Selection can also occur within a study, for example, in a greater
tendency to report positive instances, or to put greater emphasis on
them. There will also be choices to be made about what outcomes to
record, which groups to assess or which methods of analysis to use,
all of which may affect the results. These kinds of selection can operate
at a subconscious level, particularly where the person reporting the
programme has also been involved in its delivery. There is a natural
tendency to look for the best and we often fail to see the downsides
of what we have worked hard to create.
The second reason we should be cautious about applying lessons
from one school’s account of improvement to other schools is that
even if there has been real improvement, the reasons for it will
always be hard to isolate and may be even harder to replicate. Just
because you did X, Y, Z and P, Q, R resulted, it does not follow that
X, Y, Z was the cause of the improvement. The rooster that crowed
every morning believing that he made the sun come up might
have been surprised to discover that if one morning he was silent,
the sun would still rise. Applied to such an implausible context, the
argument seems facile, yet this same kind of ‘chicken logic’, albeit
cloaked in more subtle and reasonable form, underlies many
attempts to infer general strategies from the correlation between a
set of actions taken by one school and its subsequent improvement.
A more critical analysis of that school’s account might ask whether
there are other possible explanations for the improvement.
One example of such an alternative explanation is that the natural
fluctuation in performance from one year to the next, or the tendency
for systems with feedback to self-correct, will tend to make a bad
school better, regardless of any outside intervention. It may seem
obvious to start trying to improve immediately after a particularly bad
year, or to focus your efforts on particularly bad schools. However, in
such cases the phenomenon of ‘regression to the mean’ ensures that
they are likely to improve anyway.
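
A short, hypothetical simulation (added here for illustration, not part of the original article, with invented numbers) shows the effect: schools selected because of a particularly bad year tend to do better the following year even when no intervention of any kind has taken place.

```python
# Purely illustrative sketch (not from the article): regression to the mean.
import numpy as np

rng = np.random.default_rng(2)
n_schools = 1000
quality = rng.normal(50, 5, n_schools)            # stable underlying 'quality'
year_1 = quality + rng.normal(0, 5, n_schools)    # observed results = quality + luck
year_2 = quality + rng.normal(0, 5, n_schools)    # no intervention anywhere

worst_100 = np.argsort(year_1)[:100]              # 'failing' schools chosen on year 1
print(f"Chosen schools, year 1 mean: {year_1[worst_100].mean():.1f}")
print(f"Chosen schools, year 2 mean: {year_2[worst_100].mean():.1f}")  # noticeably higher
print(f"All schools, years 1 and 2:  {year_1.mean():.1f}, {year_2.mean():.1f}")
```
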
A second alternative explanation for improvement is that many
attempts to improve schools will result in changes in the charac-
teristics of students entering the school. The publicity, boost to
morale, charisma of the actors driving change and extra funding
that are often associated with improvement initiatives may all serve to
make those schools more popular and hence improve their intakes.
Unless an evaluation takes account of this change, any improvement
in outcomes may simply amount to moving low achievement from
one school to another. Moreover, even when a study does attempt to
adjust for intake, because measurements of intake characteristics are
not perfect, that adjustment will be inadequate, leading to exaggerated
estimates of the true effect of an initiative. Hence, for most schools,
one of the easiest ways to ‘improve’ may be simply to recruit better
students.
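
The under-adjustment problem can likewise be illustrated with an invented sketch (again not from the original article): when intake is adjusted for using a noisy measure of pupils' true ability, a school that has merely recruited abler pupils, and itself adds nothing, still shows apparently positive 'value added'.

```python
# Purely illustrative sketch (not from the article): value-added adjustment
# with an imperfect intake measure under-adjusts, leaving spurious 'improvement'.
import numpy as np

rng = np.random.default_rng(3)
n_pupils = 10_000
ability = rng.normal(0, 1, n_pupils)                 # true intake ability
intake_test = ability + rng.normal(0, 1, n_pupils)   # noisy measure of intake
outcome = ability + rng.normal(0, 0.5, n_pupils)     # later attainment; no school effect

recruited = ability > 0                               # the 'improving' school's intake

# Simple value-added adjustment: regress outcome on the intake test.
slope, intercept = np.polyfit(intake_test, outcome, 1)
residual = outcome - (intercept + slope * intake_test)

print(f"Apparent value added for the recruiting school: {residual[recruited].mean():+.2f}")
# Positive (about +0.4 here), even though the school added no value at all.
```
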
Finally, in relation to the issue of inferring causality from anecdotal
accounts, we must consider the question of generalisability. Even if
we are convinced that X, Y, Z was the cause of the improvement in
a particular case, it does not follow that it would work as well in all,
or even any, other schools. Special features of the context may have
mediated the effect. Unless we have some knowledge about the
range of contexts in which a benefit may be expected, as well as the
tolerance for variations in X, Y, Z that are allowable, it seems premature
to advocate this as a strategy to other schools.

3. What is the Secret of School Improvement?


Given how widespread these kinds of inadequacies in evaluation
design and interpretation are, we might be forgiven for thinking
that more or less any initiative could be made to look effective. A
cynical school improvement consultant might offer the following
advice to a school or school system that wants to be able to claim
‘improvement’:
• Wait for a bad year and/or choose a bad school to start with.
Things often self-correct and you can then attribute this to the
initiative, or even take the credit for it yourself.
• Take on any initiative, and ask everyone who put effort into it
whether they feel it worked. Pay money to a consultant. Then to
say it hasn’t worked would mean admitting that you’ve wasted
your time and money; no-one wants to think that.
• Define ‘improvement’ in terms of perceptions and ratings of
teachers and students. DO NOT conduct any objective assessments
– they may disappoint. If you have to assess student achievement
use a test that has high stakes for everyone involved – teachers can
usually find a way to improve scores. Remember, you can often be
selective in who you interview, observe or assess.
• Conduct some kind of evaluation, but don’t let the design be too
good. Avoid any kind of comparison group of schools if possible,
but if you have to, make sure you allow some important (but
unmeasured) differences between the ‘comparison’ and ‘inter-
vention’ schools to remain. AT ALL COSTS avoid random allocation.
Ideally, allow someone connected with, or at least supportive of,
the initiative to conduct the evaluation. Don’t make any decisions
about what outcomes to report or what analyses to do until you
have seen the data.
• If any improvement occurs in any aspect of performance, focus
attention on that rather than on any areas that have not improved
or got worse.
• Put some effort into marketing and presentation of the school
in order to recruit better students. That more or less guarantees
‘improvement’. If this seems too obviously misleading, by all
means collect some token data on intake and make a ‘value-
added’ adjustment. As long as the intake measures are not
perfect you won’t be able to adjust fully and some ‘improvement’
will remain.
Evaluation sleights of hand, such as these, probably account for a
large part of what is claimed as school improvement in the literature.

4. The Need for High-quality Evaluation


My argument so far has been that much of what is claimed as school
improvement is actually not real or, if it is, we often don’t really know
why it occurred. I have argued that wider and better use of more
rigorous evaluation designs would help us to distinguish real from
illusory improvement and to understand the strategies and con-
ditions under which real improvement can confidently be predicted
to occur as a result of any particular actions.
In fact the call for more evaluation is itself quite common within
the school improvement literature. In particular, with the attempts to
merge the approaches of school improvement and school effectiveness
in the 1990s came a number of appeals for improvement initiatives
to be better evaluated (e.g. Reynolds and Stoll, 1996). It is notable,
however, that the most credible of these tended to come from those
whose roots had been in the ‘effectiveness’ side of the merger.
For those steeped in the school improvement tradition, the idea
of evaluation, or at least impact evaluation, seems more problematic.
Part of the reason for this is a view of school improvement as an
organic, adaptive process that creatively and actively engages those
working in schools to seek their own local solutions, rather than
imposing some ‘one size fits all’ strategy on them from outside. What
works in one school may well not work in another. Real school
improvement is not done to a school but by it.
The belief that programmes to help organisations such as schools
can work only if they are sufficiently tailored to the particular unique
context and emphasise self-determination over passive response has
led some to reject the kind of ‘rigorous’ evaluation that sets out to
measure impact and identify cause and effect as not possible, necessary
or desirable (see for example Rootman et al., 2001, for a presentation
of this argument in relation to the evaluation of community health
programmes). Evaluations that use experimental or quasi-experimental
designs to seek general causes of measured outcomes are inappro-
priate when neither interventions nor outcomes can be pre-specified
and when any ‘treatment’ that is applied will be creatively adapted
and transformed by the participants into something that is likely to
be unrecognisable to its designers. Moreover, it is argued, in complex
field settings experimental control is practically impossible, as well as
unethical and counterproductive to programme aims of empower-
ment; each context is unique and cannot be fully specified, so
attempts to find general laws are doomed to fail inexplicably.
Unfortunately, the strict adoption of this perspective carries some
costly consequences. If school improvement strategies cannot be
pre-specified then they cannot be the subject of policy. You cannot
advise or require someone to act in a particular way unless you can
specify what it is you want them to do. However, even flexible pro-
grammes will contain a level of specification. For example, IQEA
(Hopkins et al., 1994) does not specify what kinds of strategies a
participating school must adopt – this is very much left to them to
decide – but it does specify a set of principles and broad operational
constraints within which those decisions should be made. To the
extent that it is specified (i.e., the principles and general approach)
it can therefore be evaluated.
Even if a strategy can be well enough specified for it to be adopted
as a policy at some level, it is hard to see how this can be rationally
defended in the absence of any evidence about its likely impact. To
say that all schools are unique so one cannot generalise about what
works, while at the same time giving specific advice or implementing
a particular policy with the intention of helping them to improve, is
tantamount to saying, ‘I have no basis for believing this will do more
good than harm in your case, but do it anyway.’
In fact, the need for improvement strategies to be sensitive to
individual circumstances is perfectly compatible with rigorous
evaluation. The problem lies in defining the limits of applicability of
the solution. A particular approach may work only with certain types
of school, teacher or student, in particular circumstances, or with
specific characteristics. If we can identify the conditions under which
it is appropriate, we can evaluate whether it really is a solution; if we
cannot, then although a putative solution may well work for some
and not others, unless we know which is which, we should not waste
time, energy and resources trying to implement it.

5. Where Should We Look for Knowledge About How to Improve Schools?
If school improvement research is an unreliable guide to what is
worth trying, where should we look? Some have claimed that school
effectiveness research can tell us how to make a school effective.
Others point to specific improvement strategies that have been
properly evaluated and shown to be effective. We should consider
both these suggestions.

School Effectiveness Research


In the 1990s a number of writers began to talk about a ‘merged para-
digm’ between the previously separate fields of school improvement
and school effectiveness research. Since the ‘merger’, writers such as
Sammons (2007) have argued that attempts to improve schools
should be informed by what we know about the characteristics of
effective schools.
Others have disagreed, however. In fact the whole field of school
effectiveness research has been the subject of strong criticism,
particularly in the UK. Writers such as Elliott (1996), Thrupp (2001)
and Wrigley (2004) have seen SER as dominated by a positivist,
reductionist paradigm which underestimates the importance of the
social context of schooling, ignores crucial questions about values
and oversimplifies educational goals.
Yet other writers have criticised more specific aspects of the methods
used and the claims made in SER, while broadly accepting the
paradigm (e.g. Coe and Fitz-Gibbon, 1998; Hill, 1998; Luyten et al.,
2005; Ouston, 1999; Scheerens et al., 2001). Among the criticisms
made by these writers are SER’s narrow and inappropriate definitions
of ‘effectiveness’, the oversimplification and lack of theoretical
basis of its modelling and its exaggeration of the consensus over the
correlates of effectiveness. Most crucial, in relation to the relevance
of SER to school improvement, are the criticisms that the amount of
variation in performance that is explained by any of the factors
associated with ‘effectiveness’ is very small and that SER has yet to
demonstrate the extent to which differences among schools in their
‘effectiveness’ are really caused by identifiable factors within the
school and, more importantly, factors within the school’s control.
The well known product of SER – a list of characteristics of effective
schools which a hopeful school improver should try to imitate –
turns out to be at best weakly related to a somewhat problematic
definition of effectiveness. Moreover, there is generally no guidance
about exactly how each characteristic can be acquired if it is absent,
or strengthened if it is present. Even if a school can manage to take
on these characteristics there is no guarantee that outcomes will
improve as a result. As Ouston (1999) has observed:
If one had cleaned the classrooms of the less effective schools and
given each teacher a house-plant would the exam results have
improved? I doubt it: the house-plants would probably have died.
(p. 168)
The absurdity of the causal interpretation is clear in this example, yet
it is the same logic that drives schools to try to adopt characteristics
like ‘strong educational leadership’ or ‘shared vision’. The fact that
a particular causal interpretation is superficially plausible should not
excuse any lowering of our critical standards. Convincing proof that
taking action to assume a particular characteristic will lead to
improvement in some outcome can come only from studies that
observe the effects of deliberate attempts to do so, not from cross-
sectional, correlational studies. It is interesting to note that when a
much less well known follow-up to one of the best known studies in
school effectiveness research (Fifteen Thousand Hours, Rutter et al.,
1979) tried to do just this, the results were rather disappointing
(Maughan et al., 1990; Ouston et al., 1991).

Improvement Programmes that have been Evaluated


The second suggestion is perhaps more promising. Of most interest
here will be strategies for improvement that meet three specific
criteria. First, the approach must be well-defined, so that it is clear
to those implementing it exactly what they must do, and to evaluators
whether or not the instance they are evaluating is actually a true
example of this approach. Second, it must be feasible. In other
words it must fit with the constraints of school life, appear attrac-
tive to those working in schools and not be prohibitively expensive.
Third, the approach must have been evaluated using adequate
designs to estimate its impact in a reasonable range of relevant
contexts, and the results must be sufficiently positive to justify the
effort and expense of adoption.
Perhaps the most obvious examples of approaches that meet these
criteria are the US-grown programmes of comprehensive school
reform that apply a particular structure of activities to whole school
improvement. Programmes like Success for All (Slavin and Madden,
2001) and the School Development Program (Comer et al., 1999) are
clearly defined, have been adopted very widely (at least in the US)
and have been well-evaluated (Borman et al., 2003).
As well as the whole-school focused approaches there are also
examples of strategies that are more focused on classroom learning,
some of which are themselves elements of the whole-school pro-
grammes mentioned above. Again, we should limit our attention to
approaches that are well-defined, feasible and shown to be effective.
In this category we might include Reading Recovery (Clay, 1993), peer-
tutoring (Ginsburg-Block et al., 2006) and the use of metacognitive
approaches (Higgins et al., 2005; Marzano, 1998).
Also possibly in this category are some strategies for the use of
assessment feedback to enhance learning. These include the use of
formative assessment (Black and Wiliam, 1998), ‘visible learning’
(Hattie, 2009) and performance monitoring (Visscher et al., 2002).
In all these cases there is substantial evidence that the use of these
kinds of feedback can lead to better learning. What may be less clear,
programme of teacher education or development, of what should it
consist? Unfortunately, although there is overwhelming evidence
that feedback can enhance learning, we also know that it doesn’t
always. Until well-specified strategies have been developed and
evaluated it may be premature to implement policies in this area.
Finally, under the heading of evaluated approaches we could
mention some that have been evaluated and found to be ineffective.
For example, there is little evidence to suggest that attempts to
match teaching to pupils’ learning styles have any benefits (Coffield
et al., 2004), despite the current popularity of this approach in England and elsewhere.
Another, perhaps more controversial, example is the use of ‘learning
mentors’, a core element of initiatives such as Excellence in Cities and
Every Child Matters in England (CWDC, 2009). Although here policy-
makers can cite evaluations that claim positive effects of mentoring
on learning and attitudes, a more critical analysis of these studies
and the conclusions of systematic reviews based on more rigorous
designs suggest that effects of this kind of mentoring are small, at
best (Coe, forthcoming).

6. Conclusions
I have argued that much of what is claimed as school improvement
is illusory, whether in relation to the improvement of whole systems,
particular programmes or the accounts of individual schools. I have
stressed the need for adequate evaluation to allow us to distinguish
between those programmes that are effective and those that are not.
Moreover, the claims of school effectiveness research, that it can
identify improvement strategies, have been questioned. On a more
positive note, I have cited some examples of strategies for improve-
ment that are well-defined, feasible and effective.
What lessons can the practitioner who is interested in real
improvement take from this? The most important message is the
need for better use of evidence from evaluation. There are three
particular changes required that may be highlighted here.
The first is a need for a cultural change in the value placed on
evaluation in informing decisions about practice or policy. Too often
the evaluation of initiatives appears to be overlooked, done as an
afterthought or used selectively when it supports an already decided
course of action. Instead it must be built into the development of
new initiatives and policies, properly resourced and treated as a key
part of the decision-making process. Decision makers, including
elected ministers, their advisors, local authority officers and head-
teachers, should be held to account for the evidence underpinning
their decisions.
Second is the need for greater awareness of the threats to validity
of evaluation claims. The problem is not just lack of evaluation or
lack of attention to it; sometimes poor quality evaluation is accepted
uncritically, leading to incorrect inference. This is a problem not
just within the educational research community but among policy
makers and practitioners. The wider the spread of informed critical
scepticism in the interpretation of causal claims, the less fertile is the
breeding ground for ineffective practices and policies.
The third change is the need for more attention to outcomes.
Before a programme can be evaluated, decisions must be made about
what outcomes are to count as success or failure. Too often available
outcomes are treated as given, without much thought about whether
they are measured adequately or even whether what they measure is
consistent with our aims for schooling and for the programme.
To sum up, it is beyond doubt that some of the effort that goes
into making changes and introducing new practices into schools
results in real improvements. Undoubtedly, however, much of it does
not and it is likely that some of it actually does harm. We must also
remember that a similar level of effort directed into other activities
might have led to even greater improvement. While we continue
to adopt practices and policies based on their popularity and super-
ficial plausibility, the irony is that as educators we fail to learn. Unless
we improve our ability to evaluate programmes properly and to act
on that evidence we will continue to be fashion victims, unable to
separate reality and illusion in school improvement. Worst of all, we
will fail to do our best for the children whose education matters so
much.

7. References
BLACK, P. and WILIAM, D. (1998) Assessment and classroom learning, Assessment
in Education, 5 (1), 7–74.
BORMAN, G., HEWES G., OVERMAN, L. and BROWN, S. (2003) Comprehensive
school reform and achievement: a meta-analysis, Review of Educational Research, 73
(2), 125–230.
BRADSHAW, J., STURMAN, L., VAPPULA, H., AGER, R. and WHEATER, R. (2007)
Achievement of 15-year-olds in England: PISA 2006 National Report (OECD Programme
for International Student Assessment) (Slough, NFER).
BREEN, R. and JONSSON, J.O. (2005) Inequality of opportunity in comparative
perspective: recent research on educational attainment and social mobility,
Annual Review of Sociology, 31, 223–243.

BROWN, M. (1998) The tyranny of the international horse race. In R. SLEE and S.
WEINER (with S. TOMLINSON) School Effectiveness for Whom? (London, Falmer).
BROWN, S.P. and PETERSON R.A. (1994) The effect of effort on sales performance
and job-satisfaction, Journal of Marketing, 58 (2), 70–80.
CLAY, M.M. (1993) Reading Recovery: a Guidebook for Teachers in Training (Auckland,
Heinemann Education).
COE, R. (2007, April) Changes in standards at GCSE and A-Level: Evidence from ALIS
and YELLIS. Report for the Office of National Statistics (Durham, Curriculum,
Evaluation and Management Centre, Durham University).
COE, R. (forthcoming) Evidence-based policy? The case of learning mentors
(submitted for publication).
COE, R.J. and FITZ-GIBBON, C.T. (1998) School effectiveness research: criticisms
and recommendations, Oxford Review of Education, 24 (4), 421–438.
COFFIELD, F., MOSELEY, D., HALL, E. and ECCLESTONE, K. (2004) Should we be
using Learning Styles? What Research has to Say to Practice (London, Learning and
Skills Research Centre).
COMER, J.P., HAYNES, N.M., JOYNER, E.T. and BEN-AVIE, M. (1999) Child by
Child: The Comer Process for Change in Education (New York, Teachers College
Press).
CWDC (Children’s Workforce Development Council) (2009) Learning Mentors.
Available online at: http://www.cwdcouncil.org.uk/learning-mentors/ (accessed
5 May 2009).
ELLIOTT, J. (1996) School effectiveness research and its critics: alternative visions
of schooling, Cambridge Journal of Education, 26, 199–223.
FESTINGER, L. (1957) A Theory of Cognitive Dissonance (Evanston, IL, Row, Peterson).
GINSBURG-BLOCK, M.D., ROHRBECK, C.A. and FANTUZZO, J.W. (2006)
A meta-analytic review of social, self-concept, and behavioral outcomes of peer-
assisted learning, Journal of Educational Psychology, 98 (4), 732–749.
GRAY, J., HOPKINS, D., REYNOLDS, D., WILCOX, B., FARRELL, S. and JESSON, D.
(1999) Improving Schools: Performance and Potential (Buckingham, Open University
Press).
HARRIS, A. (2002) School Improvement: What’s in it for Schools? (London, Routledge-
Falmer).
HATTIE, J.A.C. (2009) Visible Learning: A Synthesis of over 800 Meta-analyses Relating
to Achievement (Abingdon, Routledge).
HIGGINS, S., HALL, E., BAUMFIELD, V. and MOSELEY, D. (2005) A meta-analysis
of the impact of the implementation of thinking skills approaches on pupils. In
Research Evidence in Education Library (London, EPPI-Centre, Social Science
Research Unit, Institute of Education, University of London).
HILL, P.W. (1998) Shaking the foundations: research driven school reform, School
Effectiveness and School Improvement, 9, 419–436.
HOPEWELL, S., LOUDON, K., CLARKE, M.J., OXMAN, A.D. and DICKERSIN, K.
(2009) Publication bias in clinical trials due to statistical significance or direction
of trial results, Cochrane Database of Systematic Reviews, 2009, Issue 1, Article No. MR000006.
HOPKINS, D. (2002) Improving the Quality of Education for All: A Handbook of Staff
Development Activities (2nd edn.) (London, David Fulton).
HOPKINS, D. (2007) Every School a Great School: Realizing the Potential of System Lead-
ership (Maidenhead, Open University Press).
HOPKINS, D., AINSCOW, M. and WEST, M. (1994) School Improvement in an Era of
Change (London, Cassell).

KLEIN, S.P., HAMILTON, L.S., McCAFFREY, D.F. and STECHER, B.M. (2000)
What do test scores in Texas tell us? Education Policy Analysis Archives, 8, 49.
LINN, R.L. (2002) Assessments and accountability. Educational Researcher, 29 (2), 4–16.
LIPSEY, M.W. and WILSON, D.B. (1993) The efficacy of psychological, educational,
and behavioral treatment: confirmation from meta-analysis, American Psychologist,
48 (12), 1181–1209.
LUYTEN, H., VISSCHER, A. and WITZIERS, B. (2005) School effectiveness
research: from a review of the criticism to recommendations for further develop-
ment, School Effectiveness and School Improvement, 16 (3), 249–279.
MacBEATH, J., GRAY, J., CULLEN, J., FROST, D., STEWARD, S. and SWAFFIELD,
S. (2007) Schools on the Edge: Responding to Challenging Circumstances (London, Paul
Chapman).
MARTIN, M.O., MULLIS, I.V.S. and FOY, P. (with OLSON, J.F., ERBERBER, E.,
PREUSCHOFF, C. and GALIA, J.) (2008) TIMSS 2007 International Science Report:
Findings from IEA’s Trends in International Mathematics and Science Study at the Fourth
and Eighth Grades (Chestnut Hill, MA, TIMSS and PIRLS International Study
Center, Boston College).
MARZANO, R.J. (1998) A Theory-Based Meta-Analysis of Research on Instruction
(Aurora, CO, Mid-continent Regional Educational Laboratory).
MAUGHAN, B., PICKLES, A., RUTTER, M. and OUSTON, J. (1990) Can schools
change? I. Outcomes at six London secondary schools, School Effectiveness and
School Improvement, 1 (3), 188–210.
MUIJS, R.D. (2004) Tales of American comprehensive school reform: successes,
failures, and reflections, School Effectiveness and School Improvement, 15 (3–4),
487–492.
MULLIS, I.V.S., MARTIN, M.O. and FOY, P. (with OLSON, J.F., PREUSCHOFF, C.,
ERBERBER, E., ARORA, A. and GALIA, J.) (2008) TIMSS 2007 International
Mathematics Report: Findings from IEA’s Trends in International Mathematics and
Science Study at the Fourth and Eighth Grades (Chestnut Hill, MA, TIMSS and PIRLS
International Study Center, Boston College).
OECD (2007) PISA 2006: Science competencies for tomorrow’s world (executive summary)
(Paris, Organisation for Economic Co-operation and Development). Available
online at: www.pisa.oecd.org.
OFSTED (Office for Standards in Education) (2006, 5 December) Improvements in
London Schools 2000–2006, HMI 2509 (London, Ofsted).
OUSTON, J. (1999) School effectiveness and school improvement: critique of
a movement. In T. BUSH, R. BOLAM, R. GLATTER and P. RIBBINS (Eds)
Educational Management: Redefining Theory, Policy and Practice (London, Paul
Chapman).
OUSTON, J., MAUGHAN, B. and RUTTER, M. (1991) Can schools change? II:
practice in six London secondary schools, School Effectiveness and School Improve-
ment, 2 (1), 3–13.
PERIE, M., MORAN, R. and LUTKUS, A.D. (2005) NAEP 2004 Trends in Academic
Progress: Three Decades of Student Performance in Reading and Mathematics (NCES
2005–464) (Washington, DC, U.S. Department of Education, Institute of Educa-
tion Sciences, National Center for Education Statistics. Government Printing
Office).
REYNOLDS, D. and STOLL, L. (1996) Merging school effectiveness and school
improvement: the knowledge bases. In D. REYNOLDS, R. BOLLEN, B.
CREEMERS, D. HOPKINS, L. STOLL and N. LAGERWEIJ Making Good Schools:
Linking School Effectiveness and School Improvement (London, Routledge).

ROOTMAN, I., GOODSTADT, M., HYNDMAN, B., McQUEEN, D.V., POTVIN, L.,
SPRINGETT, J. and ZIGLIO, E. (Eds) (2001) Evaluation in Health Promotion:
Principles and Perspectives (World Health Organisation Regional Publications,
European Series, No. 92).
RUTTER, M., MAUGHAN, B., MORTIMORE, P. and OUSTON, J. (1979) Fifteen
Thousand Hours: Secondary Schools and Their Effects on Children (London, Open
Books).
SAMMONS, P. (2007) School Effectiveness and Equity: Making Connections. A Review of
School Effectiveness and Improvement Research – its Implications for Practitioners and
Policy Makers (Reading, CfBT Education Trust).
SCHEERENS, J., BOSKER, R.J. and CREEMERS, B.P.M. (2001) Time for self-
criticism: on the viability of school effectiveness research, School Effectiveness
and School Improvement, 12 (1), 131–157.
SLAVIN, R.E. and MADDEN, N.A. (2001) One Million Children: Success for All
(Thousand Oaks, CA, Corwin).
THRUPP, M. (2001) Sociological and political concerns about school effectiveness
research: time for a new research agenda, School Effectiveness and School Improvement,
12 (1), 7–40.
TORGERSON, C. (2003) Systematic Reviews and Meta-Analysis (London, Continuum).
TYMMS, P. (2004) Are standards rising in English primary schools? British Educa-
tional Research Journal, 30, 477–494.
TYMMS, P. and MERRELL, C. (2007) Standards and Quality in English Primary Schools
over Time: the National Evidence (Primary Review Research Survey 4/1) (Cam-
bridge, University of Cambridge Faculty of Education).
VISSCHER, A.J. and COE, R. (2002) School Improvement through Performance Feedback
(Lisse, Swets and Zeitlinger).
WRIGLEY, T. (2004) ‘School effectiveness’: the problem of reductionism, British
Educational Research Journal, 30 (2), 227–244.

Correspondence
Dr Robert Coe
School of Education
Durham University
Leazes Road
Durham DH1 1TA
E-mail: r.j.coe@dur.ac.uk
