
This article was downloaded by: [Oxford Brookes University] On: 01 July 2013, At: 04:44 Publisher: Routledge


British Journal of Educational Studies


Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rbje20

Using Assessment to Drive the Reform of Schooling: Time to Stop Pursuing the Chimera?
Harry Torrance
Manchester Metropolitan University

Published online: 08 Dec 2011.

To cite this article: Harry Torrance (2011): Using Assessment to Drive the Reform of Schooling: Time to Stop Pursuing the Chimera?, British Journal of Educational Studies, 59:4, 459-485
To link to this article: http://dx.doi.org/10.1080/00071005.2011.620944


British Journal of Educational Studies Vol. 59, No. 4, December 2011, pp. 459-485

USING ASSESSMENT TO DRIVE THE REFORM OF SCHOOLING: TIME TO STOP PURSUING THE CHIMERA?
by HARRY TORRANCE, Manchester Metropolitan University
ABSTRACT: Internationally, over the last 20-30 years, changing the procedures and processes of assessment has come to be seen, by many educators as well as policy-makers, as a way to frame the curriculum and drive the reform of schooling. Such developments have often been manifested in large-scale, high-stakes testing programmes. At the same time educational arguments have been made about the need to provide students with good quality formative feedback, and informative reports about what they have achieved. The chimera of a perfectly integrated and functioning curriculum and assessment system has been pursued, but such ambition far outstretches systemic capacity; it is neither feasible nor desirable. The national testing and examination system in England is an exemplar case. As national results have improved, much evidence suggests that, if anything, actual standards of achievement are falling, and grade inflation is undermining public confidence in the whole system. The paper will review these issues and tensions, and argue that a different model for developing curriculum and assessment is urgently needed.
Keywords: assessment, school reform, examination results, England


ISSN 0007-1005 (print)/ISSN 1467-8527 (online) © 2011 Society for Educational Studies http://dx.doi.org/10.1080/00071005.2011.620944 http://www.tandfonline.com

1. INTRODUCTION[1]

Internationally, over the last 20-30 years, various developments have taken place in the field of curriculum and assessment that have led governments around the world to look to assessment policy and practice as a way of exerting pressure on their school systems. Changing the procedures and processes of assessment has come to be seen, by many educators as well as policy-makers, as a way to frame the curriculum and drive the reform of schooling. There is not a single explanation for how and why this has happened. Many tributaries have contributed to the current torrent of policy initiatives. But the unintended consequences, or at least I assume they are unintended, are becoming very apparent, and if they are not addressed then they are likely to undermine the validity and legitimacy of the whole enterprise. In this paper I review some of the general factors contributing to the current mainstream intellectual and policy consensus; explore some of the consequences of current practice; and identify key elements which must be changed in order to develop a curriculum and assessment system which is fit for purpose. The argument of the paper is that in the increasingly frantic search for a perfectly integrated and functioning assessment system, our ambition has far outstretched our capacity to deliver in terms of both what is feasible and what is desirable. The compromises that have been struck between educational aspiration and political purpose have distorted both, with very negative consequences for the educational experience of students and the credibility of the overall system.

The first part of the paper briefly summarises where we are now with respect to both the educational and political arguments which focus on using assessment to reform schooling. Secondly, I examine evidence of the effects of such developments in England, including steadily rising pass rates in national tests and public examinations, but increasing scepticism about their validity and reliability, i.e. about what they actually mean in terms of educational standards. Finally, I go on to think about what can be salvaged and what has to change if the appropriateness and quality of schooling in the twenty-first century is to be improved.

2. HOW DID WE GET HERE?

Changes in assessment do not simply arise from technical developments in the field, though these certainly contribute. Rather, such change reflects developments in the social and economic aspirations which we hold for the education system, and thus what it is that we are trying to design assessment to accomplish. This section of the paper briefly reviews the following issues and developments with respect to their diverse contributions to the current settlement: selection, certification and norm referencing; human resource development and education for all; criterion referencing, clarity of outcomes and the development of content standards; social justice and educational inclusion; summative and formative assessment.
Selection, Certification and Norm Referencing

One of the most profound intellectual and policy shifts over the last 30 years or so has been the move from seeing only a small percentage of a student cohort as being both educable and worth educating, to seeing education as an investment in the social and economic potential of the whole cohort. We now take this aspiration for granted, at least at the level of policy rhetoric, but it was not always so: quite the reverse. Historically, education was a scarce commodity, access to educational opportunities was limited, and educational assessment was largely concerned with selecting individuals for those limited opportunities; for access to an elite secondary education for example, and access to university. In turn grades and certificates were awarded to individuals at the end of particular courses of study, as they progressed through the education system. So the focus of assessment was on identifying individual achievement, and particularly on selecting and certificating individuals. In so doing, this process functioned to identify the next cohort of suitably qualified and socialised personnel for economic and social leadership roles in society, and to legitimate that identification on grounds of educational merit. Selection and certification was done by relatively small elite groups, of relatively small elite groups, for relatively small elite groups, and was underpinned by reference to the idea that innate intellectual ability was distributed along a normal distribution curve within a population. The most obvious example in England is the work of Cyril Burt in producing the intellectual justification for mental testing and in turn the 11+ selection test for grammar school entrance (Torrance, 1981).

Selection, and grading for certification, produced the need for assessments to generate a rank order, with norm-referencing being used for such purposes. What mattered was where an individual's score came in relation to their peers, rather than any absolute level of achievement that it might signal in itself. Of course absolute levels of achievement were important in terms of determining grade boundaries, but such conceptions of achievement remained largely within the tacit knowledge of examiners and were not reported explicitly. What mattered publicly was the norm-referenced rank-order and grades awarded. Such practices were a product of their time, largely determined by the imperative to create a small social and economic elite, to lead and manage a largely unskilled manual workforce.

Human Resource Development and Education for All

Such times have changed. Without rehearsing all the changes in the international terms of trade, and the conditions of production, that have occurred over the last 30-40 years, it is apparent that we now live in a world of intense global economic competition with mass movements of capital and labour. Unskilled production has virtually vanished from the UK and other similar economies, and the emphasis now is on education for the so-called knowledge economy and as a form of investment in human capital. The focus is now on education for all, or at least the large majority, and the development of a fit-for-purpose assessment system as a system, i.e. as part of an integrated approach to national human resource development. The imperative now is to treat education as an economic investment, both on the part of the individual student, and on the part of government. Instead of needing a legitimate reason to dispense with the intellectual capabilities of most of the population, governments now need to cultivate these capabilities.

Criterion Referencing, Clarity of Outcomes and the Development of Content Standards

In parallel with such developments the need for assessment to produce more useful information about student achievement has also become apparent: useful information for teachers, for the students themselves, and also for other stakeholders such as parents, employers and government. Norm-referenced, rank-ordered grades do not communicate what students have achieved and, over time, we have seen a move toward more criterion-referenced assessment. Initial interest in criterion-referencing derived from the development of mastery learning programmes and evaluation studies, in the 1960s and 1970s, which sought to delineate and identify what students should know and be able to do after following a particular course of study (e.g. Bloom, 1974; Ebel, 1972; Glaser, 1963). Such early work was very much internal, so to speak, to the curriculum development and evaluation research community, but the idea of reporting learning outcomes, rather than norm-referenced grades, became more widely disseminated as demands for utility and accountability developed in the 1970s and 1980s. Employers wanted more information about what school leavers could do and governments wanted more information about what the school system was producing. Moreover, demands also grew for the school system to produce different things: a wider range of more relevant skills and understandings for the knowledge economy. This in turn required a wider range of assessment methods to be developed to identify and report a wider range of learning outcomes (practical work, coursework and extended project work, for example) to test practical competences and the application, rather than simply the memorisation and regurgitation, of knowledge. Thus a concern for what we might term content standards, and the production of more useful information about what school students know, understand and can do, has merged with debates about how best to measure and report such content standards, and indeed enforce them.

Social Justice and Educational Inclusion

Various types of social justice arguments have also contributed to this nexus of change, partly linked to the human capital development arguments outlined above, but also partly driven by arguments about promoting social inclusion and social mobility through equal access to educational opportunities. Thus advocates argue that the majority of the population should not be abandoned to comparative, norm-referenced, failure. Rather, we need our assessment systems to identify and report what students can do, rather than what they cannot, including the many social and attitudinal outcomes of education which are just as important as academic outcomes (e.g. Broadfoot et al., 1988; ILEA Hargreaves Report, 1984). Thus we want our children not just to be able to do maths, or science, or whatever, but to enjoy them and understand their importance. Equally we wish to value achievements in other domains, including social and political understanding, and ensure that students can contribute to civil society. In tandem with such general arguments about widening the scope and inclusiveness of assessment have come specific technical developments incorporating graded tests, the modularisation of the curriculum and the possibility of accumulating better final results through the assessment of coursework and even re-sits of modular papers to improve grades (Murphy and Torrance, 1988 review some of the original advocacy for these sorts of developments; Hayward and McNichol, 2007 report some of the problems).

Summative and Formative Assessment

A more specifically educational variant of some of these arguments is manifested in the debate about summative and formative assessment, and the role that formative assessment could play in improving the quality and outcomes of teaching and learning. Summative assessment reports the outcomes of an assessment process. It largely takes place at the end of a course of study, and results are reported after the course of study has finished, to the student and to interested others. Even if the assessment has involved some form of criterion-referencing, and is informative and positive (all big ifs of course), reporting after the fact about what has been achieved leaves little scope for using such information for improvement. Moreover narrow forms of summative assessment, focusing largely on testing academic achievement, can have a very narrowing backwash effect on the curriculum and the quality of teaching and the student experience (as will be reviewed in more detail below). Advocates of formative assessment argue that using a wider range of classroom-based tasks to assess student progress, and providing good quality feedback to students during a course on what they have achieved but also how they might improve, can facilitate learning and improve outcomes. Many issues arise of course, with respect to the nature and quality of the feedback and the support provided for students (which, again, will be reviewed below) but for the purposes of this introductory discussion it is sufficient to note that this major educational aspiration and endeavour is played out in the context of, and plays into the development of, the wider debate about the purpose, validity and reliability of assessment.

3. WHERE ARE WE NOW?

So, a wide variety of interacting elements, deriving from long term social and economic change, and from educational arguments about the role of assessment in facilitating learning, seem to have produced the current consensus.
It is not that there is some sort of simple progression here, such that norm referencing has been completely superseded, or that there is any particularly conscious orchestration and integration of these different elements. Rather, all elements are in play at the present time, but the major influences currently driving developments in curriculum and assessment derive from human capital theory coupled with the demand for clarity of objectives and the prescription of content standards. Thus governments around the world are looking to produce integrated curriculum and assessment systems to drive up standards, and they are supported by many educational advocates of greater integration. Perhaps the two most visible examples of change are the National Curriculum and Assessment system in England (DES, 1987), and the No Child Left Behind legislation in the United States (NCLB, 2001), now morphing into Obama's standards-oriented Race to the Top and the State-level Common Core Standards Initiative. Meanwhile, other countries are adopting similar programmes, including New Zealand, which has been developing national standards linked to a testing system since 2002 (NZQA, 2011), and Australia (cf. Wyatt-Smith et al., 2010).


A key problem, however, is that there remains a schism between the educational arguments for changes in assessment to enhance learning, and the policy demands for school improvement and accountability. This schism produces tensions in developing an integrated system, and can result in significant unintended consequences. The arguments in favour of using assessment to change teaching essentially fall into two related, but nevertheless distinct, categories. One argument derives from educational issues and values, the other is much more oriented towards accountability and the use of political pressure to bring about change. The educational arguments revolve around the role that assessment plays in determining the curriculum, using the so-called backwash effect of assessment, noted earlier, in a positive way; as Resnick and Resnick (1992) have put it:


You get what you assess; you don't get what you don't assess; you should build assessment towards what you want . . . to teach . . . (p. 59)

This is very much the thinking that influenced the Measurement-Driven Instruction movement in the USA in the late 1980s and 1990s. Put desired objectives into testing programmes and teachers will teach those desired objectives (Airasian, 1988; Popham, 1987). Thus the intention is to use changes in assessment directly to influence curriculum content and the process of teaching and learning. More recently such arguments have developed to incorporate the notion of a standards-based curriculum, whereby standards are set in terms of curriculum content and achievement levels and tests are aligned with the curriculum to reinforce the teaching of those standards and to measure whether and to what extent such standards have been achieved.

A rather more complex interpretation of the same broad insight focuses much more at classroom level and on the quality of teacher-student interaction. Thus, to reiterate, it is also recognised that routine, informal assessment can play a key role in underpinning or undermining the quality of teaching and learning in the classroom. How teachers assess students' work, what sorts of positive or negative feedback are given, and whether or not advice on how to improve is provided can make a great deal of difference to what is learned and how it is learned. This is the thinking which underlies the formative assessment movement in England, where such approaches are perhaps most developed (Black and Wiliam, 1998; Black et al., 2006; Torrance and Pryor, 1998), though it has also been very influential in Australia and New Zealand (Cowie and Bell, 1999; Sadler, 1989, 1998; Wyatt-Smith et al., 2010) and acknowledged as potentially important for developments in Hong Kong, the USA and elsewhere (Carless, 2006; Hargreaves, 2007; Shepard, 2000).

The political accountability arguments for using assessment to drive school reform are much simpler and more clear cut.
Here the claim is that education systems in general, schools in particular, must have their efficiency and effectiveness measured by the outcomes produced. Expected standards of achievement must be prescribed and tests regularly employed to identify whether or not these expectations have been met. In publicly maintained school systems such prescription is controlled by government and the quality of teaching and learning in the classroom is assumed to rise if results improve. Essentially, in this model, testing is used as a lever to affect the system qua system; the detail at classroom level is assumed to look after itself. If results are improving, the quality of students' educational experience and achievement is assumed to be improving. However, just as with measurement-driven instruction or a standards-based curriculum, it is crucial to the logic and practice of such an accountability system that the tests employed do indeed genuinely sample the curriculum, and reliably measure student achievement. The tests must be valid indicators of quality across the system as a whole, otherwise they will drive the system in the wrong direction.

In England, we are very much dealing with this politically driven, accountability-oriented analysis of the nature of the problem of educational standards and what to do about them; though some elements of the arguments about standards-based instruction and formative assessment, or assessment for learning as it is now more commonly known, also feature in debate. In this respect we can note once again that change in the education system is unlikely to occur only as a result of educational arguments, or indeed simply to comply with government pressure and legislation. It is the interaction of the two which produces particular practices at particular points in time. In the systemic social and institutional space of education, educational arguments are likely to be modified and adapted to fit the prevailing political context, while at one and the same time such arguments are deployed in policy debates in order to increase the rhetorical and symbolic legitimacy of policy and to mobilise action within local educational contexts and institutions.
It is this interaction of policy and educational aspiration that seems to have produced the current educational orthodoxy of trying to combine formative approaches to classroom assessment with large-scale summative accountability systems. The aspiration to combine formative and summative assessment in a single system was first articulated in the Task Group on Assessment and Testing Report (The TGAT Report, DES, 1987), which provided the educational rationale for the original national system of testing. The report argued that:
It is possible to build up a comprehensive picture of the overall achievements of a pupil by aggregating, in a structured way, the separate results of a set of assessments designed to serve formative purposes. (para. 25)

The TGAT Report was also known as the Black Report since Paul Black chaired the Task Group, and he has been a significant advocate of merging formative and summative assessment over the intervening years (e.g. Black, 1998). More recently, Shavelson et al. (2005), reviewing developments in England, Australia and the United States, argue that:
The potential for summative and formative assessment to work at cross purposes . . . is enormous. However . . . [i]f left in conflict the summative function will overpower the formative and . . . [t]he goal of teaching . . . becomes improving scores on these tests. (p. 7)


Gilmore (2002), reviewing the National Education Monitoring Project in New Zealand (NEMP), argues:
The vision that underlies large-scale assessment programmes . . . provides dependable assessment information for accountability (evaluative) purposes while at the same time supporting and sustaining exemplary teaching and learning. (p. 345)

Bennett (2011), in a substantial review of current international perspectives, concludes that:


The effectiveness of formative assessment will be limited by the nature of the larger system in which it is embedded . . . Ultimately we have to change the system . . . if we want to have maximum impact on learning and instruction. [This] means remaking our accountability tests . . . we have to rethink assessment from the ground up as a coherent system. (pp. 19-20)


Thus the rationale seems to be that accountability is here to stay; summative assessment will always drive out formative assessment if they are set in opposition to one another, therefore we need to splice them together in an attempt to create the perfect chimera, the perfect genetically modified assessment system. In this ambition our reach has far outstretched our grasp. The understandable felt need of educators to pursue their educational aspirations in the context of particular policy demands has produced a tendency towards making over-ambitious claims for what can be accomplished. And, certainly in England, the system that has been produced is starting to collapse under its own weight.

4. WHAT IS THE IMPACT OF THESE CHANGES? POLICY AND PRACTICE IN ENGLAND

I now move on to provide some illustrations of the problems that have become apparent in England, whereby the very improvement in results that the creation of a national system has brought about has arguably been accomplished by too much coaching and practising for the tests, and is now undermining public credibility in the whole enterprise. England is chosen for illustrative purposes because in many respects it represents the paradigm case of the sorts of trends we have seen over the last 30 years or so. Moreover England now has over 20 years of experience of developing a national curriculum and assessment system so any initial teething troubles should have been long overcome. If any problems remain (and they do) then it is likely that they are intractable and require a different approach.

England has a statutory National Curriculum and Testing system, introduced in 1988 (DES, 1987), which tests all students in a cohort at regular intervals. Originally all students were tested at age seven in English and Maths, at ages 11 and 14 in English, Maths and Science, and sat national public examinations, the General Certificate of Secondary Education (GCSE), at age 16. Currently students are tested only at 11, with GCSE retained at 16. Since 2005 classroom-based teacher assessment has been used to report results at age seven, while tests at 14 were abolished in 2009 following a fiasco of lost papers, unmarked papers and wrongly marked papers, which demonstrated just how overwhelming testing whole cohorts had become. Thus two stages in the process have been dropped as the undesirability of testing very young children, and the impossibility of maintaining any pretence of quality and reliability in such a mass system, became apparent. England also retains a subject-based examination system at the formal secondary school leaving age of 16 years, and a subject-based Advanced level (A-level) qualification, normally taken at 18 years of age, for entrance to university. So, over the last 25 years or so, if English policy makers could squeeze something into the assessment system, they did, though subsequently elements of what they squeezed in popped out again, as it transpired that there was not enough space in the system for all the testing that was envisaged. Two full levels of whole-cohort testing at ages seven and 14 have been dropped in an attempt to ameliorate the worst effects of testing.

The explicit use of assessment to drive educational change in England dates back to the introduction of a single system of secondary school examinations, the General Certificate of Secondary Education (GCSE), for 16-year-olds (the minimum school leaving age), by the then Conservative government of Margaret Thatcher in 1986 (with the first new exams taken in 1988). In the 1960s and 1970s England operated with two parallel secondary school examination systems: GCE O-level[2] for those students considered to be in the top 20 per cent of the ability range; and CSE[3] for those considered to be in the next 40 per cent of the ability range. The bottom 40 per cent were not considered capable of taking examinations at all.
Selection of the top 20 per cent for entry to grammar schools was based on the 11+ intelligence test and, overall, such a selective system represented the epitome of a norm-referenced system. The creation of a single system of examining, GCSE, in the mid-1980s, might be said to mark the point at which governments in England fully began to buy into human capital theory and treat education as an investment in the population as a whole, rather than as a way to select a social and economic elite. Of course education, and particularly assessment, still does play a major role in selecting and legitimating the selection of a social and economic elite, but it is at least now arguable that this outcome is an unintended effect of policy, rather than an overt intention.

One effect of selectivity was that, precisely because it was not thought appropriate for all children to take secondary school examinations, there were no overall data in the system about how well schools were doing and what standards were being attained across the system as a whole. Moreover, the selection test for secondary school allocation (the 11+) could only provide evidence that 80 per cent of pupils failed their 11+ and therefore failed primary education. Even this test was largely phased out with the introduction of comprehensive secondary schools, so that by the mid-1980s there were virtually no data whatsoever on the output of primary schools. The Labour government of the 1970s launched the Assessment of Performance Unit (APU) to try to provide evidence of standards achieved. However, the APU did not provide unequivocal and easily usable evidence about national standards (Gipps and Goldstein, 1983); nor, because of its sampling strategy, could it reach into and influence every classroom. First GCSE (1986), then the National Curriculum and National Testing (1988) were introduced in order to control directly what was taught and how it was taught, and to measure whether or not it was being taught effectively.

There has been extensive detailed argument about the scale and scope of the National Curriculum and Testing system and many modifications have been put in place since it was first introduced (cf. Daugherty, 1995; Torrance, 1995, 2003). However, the issue of educational accountability has remained the key policy problem for over 20 years now, and the development of a standards-based, test-driven education system has remained the key policy solution up to the present. The commitment to a testing regime has remained completely taken-for-granted, as elements of policy have been built up, layer by layer, and then stripped away again, in successive attempts by both Conservative and Labour governments to try finally to realise the vision of an integrated national curriculum and testing system and render it operational. So we have had 20+ years of a natural experiment with the educational provision of our children.

5. IMPACT ON RESULTS ARE STANDARDS RISING? Before moving on to review some of the detail of the system in operation, and some of the educational consequences of this experiment, I will review some of the results produced so we have a sense of what this move from a selective to a mass system actually looks like. The increasingly feverish activity of governments of both complexions since the mid-1990s, both with respect to the National Curriculum and testing regime, and with respect to GCSE and A-level, indicates that educational standards are still considered to be a political issue about which something must be done, or at least be seen to be done. National Curriculum test scores have risen since national testing was rst introduced but have plateaued since around 2000 and insofar as they indicate anything meaningful about educational standards this suggests that progress in primary education has stalled, or appears to have stalled. However, gures for GCSE and A-level indicate that examination scores have risen consistently since the 1970s, irrespective of which government is in power or which specic curriculum interventions have been pursued. These results tend to indicate that it is the general trend towards human resource development and criterion referencing, combined with the general pressure to succeed, that has seen scores rise. To take national test results rst, at age seven, in Table 1 and Figure 1, we can see that results started high, improved a little, and have stayed high, but there remains a stubborn 1520 per cent or so of children who are not reaching level 2, the expected level, in maths and English by the age of seven. At age 11 (Table 1


TABLE 1: Percentage of pupils gaining National Curriculum assessment level 2 or above at age seven and level 4 or above at age 11, England

Year   Age 7 Eng   Age 7 Maths   Age 11 Eng   Age 11 Maths   Age 11 Science
1992   77          78            -            -              -
1995   76          78            48           44             -
1996   80          80            58           54             62
1997   81/80       83            63           61             69
2000   81/84       90            75           72             85
2002   84/82       90            75           73             86
2005   85/82       91            79           75             86
2007   84/80       90            80           77             88
2008   85/80       90            81           78             88
2009   84/81       89            80           78             88
2010   85/81       89            81           80             85

Source: http://www.dcsf.gov.uk/rsgateway/
Notes:
1992 first full run of KS1 tests; 1995 first full run of KS2 English and Maths; 1996 first full run of KS2 Science.
New Labour government elected.
KS1 English results now being reported separately in terms of attainment targets (81 per cent gained level 2 in Reading, 80 per cent in Writing). Such details had been available previously but results were routinely reported as whole subject levels. The Writing score is averaged across the writing test and the spelling test. The scores are averaged in Figure 1.
KS1 tests now conducted as teacher assessment so results 2005–2010 are no longer directly comparable with previous results at KS1, though interestingly, teacher assessment does not seem to diverge from the established trend in test results.
Results for Age 11 Science now derive from teacher assessment only; a national sample of 5 per cent took tests with 81 per cent reaching level 4. NB 2010 also saw c. 25 per cent of primary schools boycotting the English and Maths tests (c. 4000 schools), leading to government agreement to review national testing for the future.

and Figure 2) the results start low, rapidly improve, but again, have plateaued since 2000 with around 20 per cent of children not achieving the expected level, level 4, in maths or English. Not every year's results are recorded in Table 1 and Figures 1 and 2; rather, sufficient years are recorded to indicate trends over time along with key dates which government has variously used and dropped as indicators of progress. Progress since 1997 was the measure routinely deployed by the New Labour government at national level. And at first sight progress since 1997 seems significant. But closer scrutiny indicates significant improvements occurred in results prior to 1997. Thus, for example, in the first two years after National Testing was first introduced at KS2 under a Conservative government (1995–1997) results improved by 15 percentage points in English (from 48 per cent to 63 per cent) and 17 percentage points in maths (from 44 per cent to 61 per cent). In the ten years after 1997, results improved by 16 percentage points in English (63 per cent to 79 per cent) and 14 percentage points in maths (61 per cent to 75 per cent; 1997–2006), but

Figure 1. Percentage of pupils gaining National Curriculum Assessment level 2 or above at age 7 (KS1), England. [Chart, 1992–2010; vertical axis 0–100 per cent; series: Age 7 English, Age 7 Maths.]

Figure 2. Percentage of pupils gaining National Curriculum Assessment level 4 or above at age 11 (KS2), England. [Chart, 1992–2010; vertical axis 0–100 per cent; series: Age 11 English, Age 11 Maths, Age 11 Science.]

with most of this improvement being achieved by 2000. The plateau effect since 2000 has continued through to the most recent results available in 2010. One inference we might take from these figures, especially with respect to results at KS2 (age 11), is that the introduction of National Testing constituted a major perturbation in the primary school system such that teachers were left initially deskilled by the innovation, so results started low; but results rapidly improved as teachers and students came to understand what was required of them, in terms of test preparation, and then progress tailed off as the limits of such artificial improvement were reached. Results for GCSE at age 16 are rather different but perhaps even more instructive (Table 2 and Figure 3). They have been rising steadily since the exam was


TABLE 2: Percentage of pupils gaining O-level/CSE grade 1/GCSE and Equivalents 1975–2010, England

Year   Percentage 5 or more A*–C   Percentage 5 or more A*–G
1975   22.6                        58.6
1980   24.0                        69.0
1988   29.9                        74.7
1990   34.5                        80.3
1995   43.5                        85.7
1997   45.1                        86.4
2000   49.2                        88.9
2005   56.8                        89.9
2007   61.4                        90.9
2008   65.3                        91.6
2009   70.0                        92.3
2010   75.4                        92.8

Source: Torrance (2003) and time series 1996–2010 available at: http://www.education.gov.uk/rsgateway/DB/SFR/s000985/sfr01-2011.pdf (accessed 5 September 2011). NB R&S update and revise pass rates so these figures may vary by fractions of a percentage from those published in previous years; the figures recorded here are the most recent posted on the DoE website. For details of calculating equivalence between O-level, CSE and GCSE see Torrance (2003). It should also be noted that DCSF/DfE Research and Statistics report totals including GCSEs and equivalents, including KS4 students taking iGCSEs and vocational GCSEs. The totals therefore are slightly higher than the headline GCSE pass rate that is announced each autumn but this does not affect the thrust of the argument.

first introduced in 1988 and indeed were rising prior to its introduction. In the mid-1970s, when only the top 20 per cent of students were thought capable of passing O-level, the percentage of students passing at least five O-levels or their equivalent under the previous dual system was 22.6 per cent.[4] By 1988, the first year of GCSE results, this had risen to 29.9 per cent. By the mid-1990s this had risen further to 43.5 per cent and the most recent results for 2010 indicate that
Figure 3. Percentage of pupils gaining O-level/CSE grade 1/GCSE and equivalents 1975–2010, England. [Chart; vertical axis 0–100 per cent; series: % 5 or more A*–C grades, % 5 or more A*–G grades.]

75 per cent of students now pass five or more GCSEs or their equivalent at grades A*–C. That is, 75 per cent of the school population now achieve what 30 years ago it was thought only the top 20 per cent could achieve. Furthermore, taking the full range of grades into account (A*–G), as an indicator of the numbers of students gaining at least some benefit from their secondary education, almost 60 per cent gained at least five A*–G grades in 1975, while nearly 93 per cent achieved five A*–G in 2010. So from the published statistics in the public domain the evidence is that pass rates have been steadily improving over many years. And, on the face of it, this represents an absolute transformation of what the system is achieving, compared to 30 years ago. Of course, overall pass rates conceal other issues. Within these general trends different sub-groups perform better than others, and results vary by social class, gender and race. Thus, for example, 87.5 per cent of candidates of Chinese origin gained at least five A*–Cs in 2009 (n=2,275), while only 67 per cent of candidates of Black African and Caribbean origin did so (n=23,609), though this in itself represents a very significant improvement from the 35 per cent recorded in the early 2000s (Torrance, 2005). Amongst candidates of White British origin, 69.8 per cent passed at least five A*–Cs in 2009 (n=461,445). A recent Joseph Rowntree-sponsored study indicates that poor, working-class white boys do worst of all (Cassen and Kingdon, 2007). There is not space here to explore such differential pass rates in more detail, but, taken together, the results of National Curriculum Tests and GCSE indicate that there is a major bifurcation developing between those students who are doing well and a substantial minority of perhaps 20–25 per cent of children who are not riding the rising tide of results. 
Clearly this raises major political and educational issues, as the system is driven more and more by the pursuit of examination success for the majority, and de facto attends less and less to the needs of those who, for whatever reason, do not fit into the model. Overall, however, for the purposes of the present discussion, the key point is that GCSE pass rates have been rising in England for more than 30 years. A-level pass rates have also been rising in similar fashion over the last 30 years (Table 3 and Figure 4). Originally A-level examinations were designed to qualify and select applicants for entrance to university. They were designed to be taken by a minority of the minority thought to be capable of benefiting from an academic secondary education. Today A-levels taken at age 18 are starting to replace GCSE as the marker of a successfully completed secondary education. Again, not every year is recorded in Table 3 and Figure 4, but rather those that indicate trends over time, and particularly over the last ten years when already good results still show an incremental improvement year on year. The key points to note include a steadily rising number of passes and top grades over 30 years; a particular blip around 2000–2001 when major changes to the structure of A-level were introduced (Curriculum 2000), but, once accommodated, the steadily rising

TABLE 3: A-level pass rates 1980–2010, England

Year   Entries    Percent A grades   Percent all grades A–E
1980   567,027    8.8                67.8
1985              9.8                70.2
1990              12.0               76.7
1995              15.6               84.0
2000              18.8               90.6
2001              18.3               89.6
2002              20.3               94.1
2003              21.3               95.3
2004              22.1               95.9
2005              22.4               96.2
2006              23.8               96.5
2007              25.0               96.9
2008              25.6               97.2
2009              26.5               97.5
2010   784,877    26.8               97.6

Other entry totals listed in the table: 606,995; 717,127 (years not legible in this copy).

Sources: Department of Education and Science (1980) Statistics of School Leavers, CSE and GCE, England; Department of Education and Science (1985) Statistics of Education, School Leavers, CSE and GCE England; Daily Telegraph 14 August 2008, accessed via website http://www.telegraph.co.uk 4 October 2010; Times Educational Supplement 17 August 1990, p. 3; Times Educational Supplement 18 August 1995, p. 5; Department for Education and Employment (2000) Statistics of Education: GCSE/GNVQ and GCE A/AS Level and Advanced GNVQ Examination Results 1999/2000 England, accessed from DCSF Research and Statistics website: http://www.dcsf.gov.uk/rsgateway 4 October 2010. All further results, 2001–2010, from Joint Council for Qualifications website http://www.jcq.org.uk/ accessed 4 October 2010. It might be noted in passing that it is remarkably difficult to gain access to A-level pass rates pre-2001 when they first started appearing on the JCQ website. So far as I can discern there is no single source of results which goes back to 1980 or beyond. Government statistics have been produced in very different ways over the period, to address whatever was the key policy concern of the day. Thus this table has been produced by extensive mining of different sources.
Notes:
First results for Curriculum 2000 including modular A/S levels.
First use of A* and A grades: A* = 8.1 per cent, A = 18.7 per cent = 26.8 per cent total A* + A (NB c. 14 per cent of all candidates achieved straight As in 2010, i.e. 3 x A* or A grades, up from c. 7 per cent in 2000).

pass rate was resumed and perhaps slightly accelerated; as was the shift towards awarding of top grades. In 1980 only nine per cent of entries gained grade A, and from a significantly smaller number of overall entries (n=567,027), while in 2010 27 per cent of entries were awarded grade As, from a total entry of 784,877. Figure 5 illustrates the change very dramatically. Here the diamonds line indicates the grade distribution in 1980, with the majority of passing grades being C, D and E, and skewed towards the bottom end. The squares line indicates grade distribution in 2010, and the graph is reversed, with most passing grades being C, B and A. So what might be the explanation for these long-term trends? Here we return to the points made in the opening section of the paper. Some element of a genuine

Figure 4. Percentage A-level passes, 1980–2010, England. 1980: n = 567,027; 2010: n = 784,877. [Chart; vertical axis 0–100 per cent; series: % A grades, % all pass grades (A–E).]

Figure 5. Percentage distribution of A-level grades, E–A, 1980 and 2010, England. [Chart; vertical axis 0–30 per cent; horizontal axis grades E, D, C, B, A; series: 1980, 2010.]

rise in standards is likely to be present, driven by better socio-economic conditions of students, higher expectations of educational outcomes by students, parents and teachers, and better teaching. But this is combined with and compounded by two key elements of the changes which have taken place in systems of assessment: (i) an increasingly focused concentration on passing exams, by both teachers (teaching to the test) and the majority of students (extrinsic motivation), because of the perceived importance of educational success in institutional accountability and individual life chances; (ii) the increased transparency of modular, criterion-referenced assessment systems, which affords teachers and students much more opportunity to practise for tests and improve grades through coaching, specific feedback and resubmission of work.
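
The percentage-point comparisons made in this section can be reproduced with simple arithmetic. The following Python sketch is illustrative only: the figures are those quoted in the text and in Tables 1 to 3, while the function names and series labels are assumptions introduced here for the example and form no part of the original analysis.

```python
# Start and end percentages as quoted in the text and in Tables 1-3.
# The labels are illustrative names, not terms used in the source.
QUOTED_RESULTS = {
    "KS2 English, 1995 to 1997": (48.0, 63.0),
    "KS2 Maths, 1995 to 1997": (44.0, 61.0),
    "KS2 English, ten years after 1997": (63.0, 79.0),
    "KS2 Maths, ten years after 1997": (61.0, 75.0),
    "GCSE five or more A*-C, 1975 to 2010": (22.6, 75.4),
    "A-level A grades, 1980 to 2010": (8.8, 26.8),
}


def point_change(start, end):
    """Return the change between two percentages, in percentage points."""
    # Rounded to one decimal place, matching the precision of the tables.
    return round(end - start, 1)


def summarise(results):
    """Map each series label to its percentage-point change."""
    return {label: point_change(start, end)
            for label, (start, end) in results.items()}


if __name__ == "__main__":
    for label, change in summarise(QUOTED_RESULTS).items():
        print(f"{label}: {change:+.1f} percentage points")
```

A percentage-point change is used here, rather than a relative percentage change, because that is the measure the text itself reports (for example, 48 per cent to 63 per cent is described as an improvement of 15 percentage points).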

6. IMPACT ON EDUCATIONAL EXPERIENCE AND QUALITY

In relation to National Curriculum test scores, the evidence in England suggests that teaching to the test is the most likely recent explanation for rising scores which tail off as teachers and students come to be about as efficient as they can be at scoring well on the tests within a regime of coaching and practice. Many research studies have reported an increasing focus on test preparation, particularly in the final year of primary school prior to the tests being taken (for a review of recent studies see Wyse and Torrance, 2009). Thus, for example, McNess et al. (2001) note that:
Whole class teaching and individual pupil work increased at the expense of group work . . . [there was] a noticeable increase in the time spent on the core subjects . . . [and] teachers . . . put time aside for revision and mock tests . . . (pp. 12–13)


While Hall et al. (2004) report that:


assessment is synonymous with testing . . . assessment, narrowed to test-taking in preparation for SATs, is the main business of life in the last two terms of year six. (p. 804)

However, it is not only independent research studies which highlight such problems. School inspectors who routinely visit schools have reported on a narrowing of the curriculum and summaries of their inspection findings have been included in the annual reports from the Office for Standards in Education (OfSTED). One recent report noted:
In many [primary] schools the focus of the teaching of English is on those parts of the curriculum on which there are likely to be questions in national tests . . . History and, more so, geography continued to be marginalized . . . In [secondary] schools . . . the experience of English had become narrower . . . as teachers focused on tests and examinations . . . There was a similar tension in mathematics . . . (OfSTED, 2006, pp. 52–56)

Such concerns have also now reached Parliament, with a report of the Select Committee for Children, Schools and Families (2008) stating:
In an effort to drive up national standards, too much emphasis has been placed on a single set of tests and this has been to the detriment of some aspects of the curriculum and some students. (Select Committee, reported on BBC 13 May 2008: http://news.bbc.co.uk/1/hi/education/7396623.stm)

And finally, after a boycott of KS2 tests in 2010 by about 25 per cent of primary schools, the Bew Committee was set up by the new Conservative and Liberal Democrat coalition government to review KS2 testing, reporting in June 2011. Their terms of reference included:
how to avoid, as far as possible, the risk of perverse incentives, over-rehearsal and reduced focus on productive learning. (Bew Report, 2011, p. 4)

The report notes:

There are considerable concerns . . . that the system is too high stakes, which can lead to unintended consequences such as over-rehearsal and teaching to the test. (p. 9)


Comparable evidence can also be identified internationally. Many research studies from the United States (e.g. Klein et al., 2000; Linn, 2000; Shepard, 1990) report similar findings from previous investigations of test-based reform in the USA, and the same issues are now beginning to emerge from studies of the No Child Left Behind program. State level NCLB test scores are rising (CEP, 2007), but equally 'Administrators and teachers have made a concerted effort to align curriculum and instruction with state academic standards and assessments' (CEP, 2006, p. 1). A recently completed study by Rand Education funded by the US National Science Foundation noted that:
changes included a narrowing of the curriculum and instruction toward tested topics and even toward certain problems styles or formats. Teachers also reported focusing more on students near the proficient cut-score . . . (Hamilton et al., 2007, Summary: p. xix)

So overall, the international research evidence suggests that rising test scores might actually mask falling standards as students are exposed to a much restricted curriculum. Similar issues with regard to narrowing the curriculum have been reported with respect to GCSE in England. For the purposes of government statistics and the compilation of league tables of secondary schools the passing grade for GCSE is grade C, and schools are under enormous pressure to maximise the numbers of students passing at least five A*–C grades. As noted above, OfSTED has observed a focus on exam preparation in secondary schools as well as primary schools, and research by Gillborn and Youdell (2006) has reported schools focusing particularly on identifying and supporting those students who could possibly be moved from a D to a C grade (GCSE 'triage', as they have termed it), a practice very similar to that noted by the US Rand Report cited above. Taken together with our earlier observations about a significant minority of children not fitting in with the model, this means that perhaps 20–25 per cent of children are increasingly ignored as they progress through the school system, if they are not thought worthy of such 'triage' investment. At A-level, research undertaken as part of a Nuffield Foundation sponsored review of the 14–19 curriculum in England identified modularisation and the retaking of modular tests as a key issue for pass rates in science (Hayward and McNichol, 2007). My own research investigating assessment across the post-compulsory sector including A-level has identified transparency of procedures, objectives and criteria as a key issue which, combined with the detailed feedback which teachers give students, has led to a situation which I have characterised as 'criteria compliance':


. . . greater transparency of intended learning outcomes and the criteria by which they are judged, and . . . [c]larity in assessment procedures [and] processes . . . has underpinned the widespread use of coaching, practice and provision of formative feedback to boost individual and institutional achievement . . . . However . . . such transparency encourages instrumentalism . . . transparency of objectives coupled with extensive use of coaching and practice to help learners meet them is in danger of removing the challenge of learning and reducing the quality and validity of outcomes achieved. This might be characterized as a move from assessment of learning, through the currently popular idea of assessment for learning, to assessment as learning, where assessment procedures and practices come completely to dominate the learning experience, and criteria compliance comes to replace learning. (Torrance, 2007 p. 282)


So, potentially, we have reached a situation in England where scores and grades are continuing to rise but the validity and reliability of the standards achieved are subject to increasing doubt, and the educational experience of even the most successful students, let alone those who are not successful, is compromised. Employers and university selectors alike are expressing concern about the quality and credibility of GCSE and A-level grades (e.g. Sykes Review, 2010) and this in turn is drawing responses from the Examination Boards (e.g. Cambridge Assessment, 2010, 2011). A recent review commissioned by the Conservative Party, the Sykes Review, reported that:
Confidence in the qualifications and assessment system has been diminishing for many years. The usefulness of the system has been eroded by the politicisation of assessment outcomes, by universities' loss of confidence in A levels as a certificate of readiness for university-level study, by employers' loss of confidence in GCSEs and A levels as certification of relevant knowledge and skills, and by the disproportionate burden placed by external assessment on pupils, teachers and schools. The volume of external assessment has also grown enormously . . . . This process has undermined the credibility of teacher and school assessment, as well as limiting and undermining teaching. (Sykes Review, 2010, p. 4)

Now it might be argued that it was ever thus: employers and indeed examiners have often complained that standards are not high enough. Also, to reiterate, the Sykes Review was set up to report to the Conservative Party when in opposition, and might thus be thought of as rather partisan. However, the Review Committee membership was very broad, and actually produced findings not that far removed from OfSTED and the 2008 Parliamentary Select Committee. The more interesting point to note is how widespread is the concern across the political spectrum. It is also interesting to note the time lag between the research findings of the late 1990s and early 2000s (the work of McNess, Hall and others, cited earlier) and the impact on policy thinking some ten years later. We could have avoided ten years of this progressive narrowing of the curriculum if politicians had been in a position to hear and respond to the research literature. Successive research studies over 40+ years have indicated that it is the vitality of teacher–student relationships and the quality of teacher–student interaction that are the most important factors in improving student learning experiences


and raising attainment (Galton et al., 1980; Galton et al., 1999; Jackson, 1968; Mehan, 1979; Mercer, 1995). Yet this is precisely what is threatened by an over-concentration on testing. The focus of the teacher–student relationship is currently oriented towards criteria compliance and grade accumulation, rather than learning. It is almost as if successive governments in England have taken an actuarial view of the role of rising pass rates. Examination success is correlated with social and economic success, so, the policy thinking seems to be, maximise exam passes, especially for previously disadvantaged groups, and social and economic mobility will necessarily improve, irrespective of the educational experience of the students or the quality of the outcomes achieved. In focusing on assessment, standards and accountability to drive improvements in schooling, policymakers seem to have lost sight of the purpose of education and the nature of individual achievement: of what it is that standards are supposed to embody in terms of the knowledge, skills, attitudes and competences that we might expect of young people leaving school in the twenty-first century. It is almost as if the unit of analysis for policy-making, so to speak, has shifted from the curriculum and the building blocks of individual student learning and achievement, to the overall output of the system. Individual achievement and, more particularly, the content and quality of that achievement (the traditional focus of assessment) has been ignored, or at least taken for granted, as policy has focused on raising test scores across the system as a whole. Meanwhile, many educationists and assessment developers have been content to ride the tiger of accountability, and take the policy context as given, in order to try to insinuate their own version of 'measurement driven instruction' or 'assessment for learning' into the system. 
However, assessment processes cannot be so easily assimilated into a single, integrated, systemic operation. Three issues in particular are apparent:

(i) just because assessment can be observed to have negative backwash effects on the curriculum and teaching, this does not necessarily mean that the same mechanism is available to harness these effects to beneficial purposes; certainly this is proving far more difficult than seems to have been anticipated;
(ii) the impact of assessment for learning on students' knowledge and understanding will inevitably be mediated by the accountability context in which it operates; thus students are currently learning to accumulate grades, rather than understand the structure, coherence and content of particular knowledge domains;
(iii) criterion-referencing enables the structure of knowledge domains and the processes of assessment associated with them to be more transparent, such that more students can achieve more success, but the very nature of that success undermines the selection function of assessment and thus, in turn, the credibility of what is achieved.


Education has moved from being a scarce good to a positional good, whereby all cannot succeed in terms of publicly reported grades without the nature of the success being called into question. This is now a key and urgent issue: to improve educational quality and outcomes without simultaneously undermining the credibility of the enterprise. Equally, we have to attend to the 20–25 per cent of students that the accountability model is leaving adrift.

7. WHERE DO WE GO FROM HERE? DEVELOPING EDUCATIONAL QUALITY

We are, however, where we are, and the irony of the current situation, certainly in England, is that a narrow test-driven accountability system is unlikely to produce flexible and creative workers for the so-called knowledge economy, even amongst the 75 per cent who are currently seen as successful, let alone the 25 per cent who are not. Nor are such systems likely to contribute to more general deliberation about the curriculum, teacher development and the democratic purposes and organisation of schooling. Quite the reverse: accountability systems take these matters of purpose as settled, the only issues being efficiency and effectiveness in meeting already determined goals. But it is becoming apparent that a one-size-fits-all standards agenda in educational policymaking has run its course and that what is needed is a new focus on curriculum content, local responsiveness and the quality of teaching and learning: on the production of diverse experiences of learning for an uncertain world.

8. POLICY OPTIONS FOR THE FUTURE

There are no simple or straightforward policy options with respect to assessment and testing. Assessment intersects with every aspect of an educational system:


at the level of the individual student and teacher and their various experiences (positive or negative) of the assessment process;
at the level of the school or similar educational institution and how it is organised and held to account; and
at the level of the educational and social system with respect to what knowledge is endorsed and which people are legitimately accredited for future economic and social leadership.

Governments in England over the last 20 years or so seem to have only partially appreciated this, assuming that standards can be mandated and measured without the process of measurement impacting on and, in key respects, distorting the system as a whole. Similarly, because educational achievement is correlated with social and economic well-being, the efforts of successive governments in England seem to have concentrated on pushing as many children as possible through as many examinations as possible. This seems to have been conceived of as part of the drive for social inclusion and improving social mobility, without reflecting on


the restricted educational experience it creates for even high achieving students, let alone the 25 per cent who still do not attain at least five A*–C grades at GCSE. The key policy problem is that assessment will always impact on teaching and learning; the key issue is to try to accentuate the positive impact and diminish the negative impact as far as possible. So what are the implications for the future? The new Conservative-dominated coalition government in the UK is concerned enough to have recently announced some action, but it is action that involves pulling the same levers, albeit in a slightly different direction, and is unlikely to solve the problem. They have announced a new policy with respect to inserting new features into the way in which results are reported and schools held to account. Five GCSE passes at A*–C will no longer be the minimum expected, but five GCSEs including English, maths, science, a modern foreign language and a humanities subject such as history. This 'English Baccalaureate', as it has been termed by the government, is very reminiscent of the old matriculation certificate of the 1930s. It certainly moves the goalposts a very long way in terms of measuring the success of the system. In 2010 only 22 per cent of secondary school students took such a mix of subjects and only 15.6 per cent gained passes at C or above (DfE, 2011). We seem to be going back to the future, with only about 20 per cent of the school population currently being involved in the new measure. While it seems unlikely that the coalition government is deliberately trying to turn the clock back and reinstate the old grammar school/secondary modern divide, the percentages of students included and excluded are startlingly similar. No doubt numbers will grow as schools adapt to this new demand. 
However, if nothing else changes with respect to the ways in which the pursuit of test success drives teaching, then simply coaching children to speak a few words of French or Spanish, or remember a few key dates, is hardly likely to address the major issues of educational quality which are at stake. Moreover, it will be difficult for the new government to claim to be making a positive difference to improving educational standards when the publicly reported measure of success plunges from 75 per cent to below 20 per cent. The new government has understood that there is a problem, but not what to do about it. Rather more positively, some more obviously educational elements of amelioration can be identified, particularly involving the separation of formative, localised, classroom-based assessment from large scale accountability testing. The key implications of what we now know about large scale accountability testing would seem to be:


The greater the scale and scope of the testing system, the simpler the tests will be.
The more individual student achievement is tied to system accountability, the more accountability measures will dominate student experience.
Therefore:
    restrict testing to a politically necessary minimum;
    attend to monitoring standards by use of small national samples;
    re-conceptualise the integration of curriculum development and assessment by starting from the perspective of the curriculum: i.e. put resources and support into re-thinking curriculum goals for the twenty-first century and developing illustrative examples of high quality assessment tasks that underpin and reinforce these goals, for teachers to use as appropriate.

The Bew Report (2011) on Key Stage 2 (KS2) testing indicates that statutory testing will be further restricted, particularly with respect to testing Writing, with 'writing composition . . . subject only to . . . teacher assessment' (p. 14). Meanwhile, national testing of the whole cohort in Science was ended in 2010 with achievement in KS2 Science now being monitored by the testing of a sample of schools and students. So, governments continue to learn the lessons of an over-concentration on testing, though the inspection regime also needs to change. The focus of institutional self-evaluation and other accountability mechanisms including inspection needs to be on the quality of the learning experience in the classroom, rather than simply concentrating on the outcomes produced. But given what we now know about twenty years of such attempts to combine amelioration with educational development, it might be argued that what is needed is a much more radical break with current policy and practice. Continuing centralisation of systemic control will threaten long-term systemic sustainability, professional development and capacity building. This issue was noted by the external evaluation of the National Literacy and Numeracy Strategies when it reported that continuing this kind of accountability for too long may result in a culture of dependence (Earl et al., 2003, p. 6). Similar problems can be observed with respect to the potential opportunities to develop teacher assessment. 
Although national testing was ended at age seven and replaced by teacher assessment, teachers continue to use past papers to provide them with evidence for their judgements (Reed and Lewis, 2005). Comparable issues are emerging with the abolition of national testing at age 14, with preparations for GCSE simply being started earlier. Thus what is required is not just rolling back the boundaries of testing. The liberated intellectual and material possibilities for developing high quality teaching and learning must be supported by positive programmes of professional development. In the end quality education must involve quality activities taking place in schools and classrooms and this requires curriculum and professional development at local level.

A radical version of the future would not simply attempt to ameliorate central control but to challenge it, and seek to embed new ideas and practices at local level. It is necessary to invest in more creative forms of curriculum and professional development, especially with respect to classroom level assessment skills and understandings, and to re-build the communities of epistemological practice within which judgements about standards are made and ultimately reside. Understanding and addressing key educational issues for the twenty-first century requires much more curriculum flexibility and responsiveness and this requires investment in teacher professional development at local level. Twenty years of increasing central control and regulation have produced a narrow and risk-averse education culture which is the very antithesis of the ostensible purpose of the exercise. Producing higher test scores is not enough, for learners or for governments. The need for better quality educational encounters and better quality information with which to take decisions has never been more acute. We need to stimulate new visions of what might be accomplished by our education system, and new ways to record diverse experiences and outcomes, rather than continuing to insist that all we can achieve is compliance with that which is already known.

9. NOTES
1 Earlier versions of this article were presented as a keynote speech to the New Zealand Association for Research in Education (NZARE) annual conference, University of Auckland, December 2010; the British Educational Studies Association (BESA) annual conference, Manchester Metropolitan University, July 2011; and as a paper to the British Educational Research Association (BERA) annual conference, London Institute of Education, September 2011.
2 General Certificate of Education Ordinary Level; GCE Advanced Level was and still is taken at around 18+ to qualify for entry to university.
3 Certificate of Secondary Education.
4 i.e. the equivalent of five GCSEs at grades A*–C: the top GCSE grades of A*–C are officially accepted as the equivalent of the old O-level passes; the percentage of students gaining at least five A*–Cs is the officially and commonly accepted measure of a good secondary education; the percentage of students gaining at least five A*–Gs (the full range of grades) is the officially and commonly accepted measure of a minimally satisfactory secondary education.

10. REFERENCES
Airasian, P. (1988) Measurement-driven instruction: a closer look, Educational Measurement: Issues and Practice, 7 (4), 6–11.
Bennett, R. (2011) Formative assessment: a critical review, Assessment in Education, 18 (1), 5–25.
Bew Report (2011) Independent Review of Key Stage 2 Testing, Assessment and Accountability. Available at: http://www.education.gov.uk/ks2review (accessed 25 August 2011).
Black, P. (1998) Testing: Friend or Foe? (London, Falmer Press).
Black, P. and Wiliam, D. (1998) Assessment and Classroom Learning, Assessment in Education, 5 (1), 7–74.
Black, P., McCormick, R., James, M. and Pedder, D. (2006) Learning how to learn and assessment for learning, Research Papers in Education, 21 (2), 119–132.
Bloom, B. (1974) An introduction to Mastery Learning Theory. In J. Block (Ed.) Schools Society and Mastery Learning (New York, Holt, Rinehart and Winston).
Broadfoot, P., James, M., McMeeking, S., Nuttall, D. and Stierer, B. (1988) Records of Achievement: Report of the National Evaluation (London, HMSO).
Cambridge Assessment (2010) A better approach to regulating qualification standards. Available at: http://www.cambridgeassessment.org.uk/ca/Viewpoints/Viewpoint?id=134763 (accessed 4 July 2011).

Cambridge Assessment (2011) Higher education admissions test must be fair, valid and transparent. Available at: http://www.cambridgeassessment.org.uk/ca/Spotlight/Detail?tag=entry (accessed 4 July 2011).
Carless, D. (2011) From Testing to Productive Student Learning (London, Routledge).
Carless, D., Joughin, G., Liu, N-F and Associates (2006) How Assessment Supports Learning (Hong Kong, Hong Kong University Press).
Cassen, R. and Kingdon, G. (2007) Tackling Low Educational Achievement (York, Joseph Rowntree Foundation).
Centre on Education Policy (2006) From the Capital to the Classroom: Year 4 of the No Child Left Behind Act: Summary and Recommendations. Available at: http://www.cep-dc.org/.
Centre on Education Policy (2007) Has Student Achievement Increased Since No Child Left Behind? Available at: http://www.cep-dc.org/.
Cowie, B. and Bell, B. (1999) A model of formative assessment in science education, Assessment in Education, 6, 101–116.
Daugherty, R. (1995) National Curriculum Assessment: a Review of Policy 1987–1994 (London, Falmer Press).
Department for Education (2011) The English Baccalaureate. Available at: http://www.education.gov.uk/schools/teachingandlearning/qualifications/englishbac/a0075975/theenglishbaccalaureate (accessed 4 July 2011).
Department of Education and Science (1987) Task Group on Assessment and Testing: A Report (London, DES).
Earl, L., Watson, N., Levin, B., Leithwood, K., Fullan, M., Torrance, N. et al. (2003) Watching and Learning 3: Final Report of the External Evaluation of England's Literacy and Numeracy Strategies; Executive Summary (Nottingham, DfES).
Ebel, R.L. (1972) Essentials of Educational Measurement (Englewood Cliffs, NJ, Prentice-Hall).
Galton, M., Simon, B. and Croll, P. (1980) Inside the Primary Classroom (London, Routledge and Kegan Paul).
Galton, M., Hargreaves, L., Comber, C. and Wall, D. (1999) Inside the Primary Classroom: 20 Years on (London, Routledge).
Gillborn, D. and Youdell, D. (2006) Educational triage and the D-to-C conversion: suitable case for treatment? In H. Lauder, P. Brown, J. Dillabough and A. H. Halsey (Eds) Education, Globalisation and Social Change (Oxford, Oxford University Press).
Gilmore, A. (2002) Large-scale assessment and teachers' assessment capacity: learning opportunities for teachers in the National Education Monitoring Project in New Zealand, Assessment in Education, 9 (3), 343–361.
Gipps, C. and Goldstein, H. (1983) Monitoring Children (London, Heinemann).
Glaser, R. (1963) Instructional technology and the measurement of learning outcomes, American Psychologist, 18, 519–522.
Hall, K., Collins, J., Benjamin, S., Nind, M. and Sheehy, K. (2004) SATurated models of pupildom: assessment and inclusion/exclusion, British Educational Research Journal, 30 (6), 801–881.
Hamilton, L. et al. (2007) Standards-based Accountability under No Child Left Behind (Santa Monica, Rand Education).
Hargreaves, E. (2007) The validity of collaborative assessment for learning, Assessment in Education, 14 (2), 185–199.
Hayward, G. and McNicholl, J. (2007) Modular mayhem? A case study of the development of the A level science curriculum in England, Assessment in Education, 14 (3), 335–351.
Inner London Education Authority (1984) Improving Secondary Schools (The Hargreaves Report) (London, ILEA).
Jackson, P. (1968) Life in Classrooms (New York, Holt, Rinehart and Winston).

Klein, S. et al. (2000) What do test scores in Texas tell us? Education Policy Analysis Archives, 8, 49. Available at: http://epaa.asu.edu/epaa/v8n49.
Linn, R. (2000) Assessments and accountability, Educational Researcher, 29, 4–16.
McNess, E., Triggs, P., Broadfoot, P., Osborn, M. and Pollard, A. (2001) The changing nature of assessment in English primary schools: findings from the PACE Project 1989–1997, Education 3–13, 29 (3), 9–16.
Mehan, H. (1979) Learning Lessons: Social Organisation in the Classroom (Harvard, Harvard University Press).
Mercer, N. (1995) The Guided Construction of Knowledge (Clevedon, Multi-Lingual Matters).
Murphy, R. and Torrance, H. (1988) The Changing Face of Educational Assessment (Maidenhead, Open University Press).
New Zealand Qualification Authority (2011) History of NCEA. Available at: http://www.nzqa.govt.nz/qualifications-standards/qualifications/ncea/understandingncea/history-of-ncea/ (accessed 7 July 2011).
No Child Left Behind Act (2001) Public Law 107–110. Available at: http://www.ed.gov/nclb/landing.jhtml.
Office for Standards in Education (2006) The Annual Report of Her Majesty's Chief Inspector of Schools 2005/06 (London, OfSTED). Available at: http://www.ofsted.gov.uk.
Popham, J. (1987) The merits of measurement-driven instruction, Phi Delta Kappan, 68, 679–682.
Reed, M. and Lewis, K. (2005) Key Stage 1 Evaluation of New Assessment Arrangements (Slough, NFER).
Resnick, L. and Resnick, D. (1992) Assessing the thinking curriculum. In B. Gifford and M. O'Connor (Eds) Future Assessments: Changing Views of Aptitude, Achievement and Instruction (Boston, Kluwer).
Sadler, R. (1989) Formative assessment and the design of instructional systems, Instructional Science, 18, 119–144.
Sadler, R. (1998) Formative assessment: revisiting the territory, Assessment in Education, 5 (1), 77–84.
Shavelson, R., Black, P., Wiliam, D. and Coffey, J. (2005) On linking formative and summative functions in the design of large-scale assessment systems. Available at: http://www.stanford.edu/dept/SUSE/SEAL/Reports_Papers/On%20Aligning%20Formative%20and%20Summative%20Functions_Submit.doc (accessed 4 July 2011).
Shepard, L. (1990) Inflated test score gains: is the problem old norms or teaching to the test? Educational Measurement: Issues and Practice, 9 (3), 15–22.
Shepard, L. (2000) The role of assessment in a learning culture, Educational Researcher, 29 (7), 4–14.
Sykes Review (2010) The Sir Richard Sykes Review of the future of English qualifications and assessment system. Available at: http://www.conservatives.com/news/news_stories/2010/03/~/media/Files/Downloadable%20Files/Sir%20Richard%20Sykes_Review.ashx (accessed 4 July 2011).
Torrance, H. (1981) The origins and development of mental testing in England and the United States, British Journal of Sociology of Education, 2 (1), 45–59.
Torrance, H. (Ed.) (1995) Evaluating Authentic Assessment: Issues, Problems and Future Possibilities (Buckingham, Open University Press).
Torrance, H. (2003) Assessment of the National Curriculum in England. In T. Kellaghan and D. Stufflebeam (Eds) International Handbook of Educational Evaluation (Dordrecht, Kluwer).

Torrance, H. (2005) Testing times for black achievement – some observations from England. Paper presented to Symposium Leaving No Child Behind: How Federal Education Agencies are Addressing Achievement Gaps for Linguistic and Racial/Ethnic Groups, American Educational Research Association Annual Conference, Montreal, 11–15 April.
Torrance, H. (2007) Assessment as Learning? How the use of explicit learning objectives, assessment criteria and feedback in post-secondary education and training can come to dominate learning, Assessment in Education, 14 (3), 281–294.
Torrance, H. and Pryor, J. (1998) Investigating Formative Assessment: Teaching, Learning and Assessment in the Classroom (Buckingham, Open University Press).
Wyatt-Smith, C., Klenowski, V. and Gunn, S. (2010) The centrality of teachers' judgement practice in assessment: A study of standards in moderation, Assessment in Education: Principles, Policy and Practice, 17 (1), 59–75.
Wyse, D. and Torrance, H. (2009) The development and consequences of national curriculum assessment for primary education in England, Educational Research, 51 (2), 213–238.

Correspondence
Professor Harry Torrance
The Education and Social Research Institute
Manchester Metropolitan University
799 Wilmslow Road
Manchester M20 2RR
E-mail: h.torrance@mmu.ac.uk