Three In-Course Assessment Reforms to Improve Higher Education Learning Outcomes

Article in Assessment & Evaluation in Higher Education, July 2015
Impact Factor: 0.84. DOI: 10.1080/02602938.2015.1064858

Author's final manuscript. This article was published online in July 2015. It will be assigned to an issue of
the journal with final page numbers in due course. Publication details: Sadler, D. R. (2015 online): Three
in-course assessment reforms to improve higher education learning outcomes, Assessment & Evaluation in
Higher Education, DOI:10.1080/02602938.2015.1064858

D. Royce Sadler
School of Education, The University of Queensland

Abstract
A current international concern is that, for too large a proportion of graduates, their
higher-order cognitive and practical capabilities are below acceptable levels. The
constituent courses of academic programs are the most logical sites for developing
these capabilities. Contributing to patchy attainment are deficiencies in three particular
aspects of assessment practice: the design and specifications of many assessment tasks;
the minimum requirements for awarding a passing grade in a course and granting
credit towards the degree; and the accumulation of points derived from quizzes,
assessments or activities completed during the teaching period. Rethinking and
reforming these would lead to improvements for significant sub-populations of
students. Pursuing such a goal would also have significant positive implications for
academic teachers but be contingent on favourable contextual settings including
departmental and institutional priorities.

Keywords: generic skills, higher education competencies, learning outcomes, academic
standards, higher education grades

Introduction
This article is mainly about cognitive capabilities that are important in most academic fields:
proficiency in thinking, reasoning, synthesising, conceptualising, evaluating and
communicating. These higher-order capabilities form a subset of what are also variously
called 'intended learning outcomes' (Biggs and Tang 2011), or some combination of
'generic', 'graduate' or 'higher education' with 'competencies', 'skills' or 'attributes'. With
the rapid expansion of higher education worldwide, it is natural to ask about the extent to
which all students can demonstrate adequate levels of such higher-order capabilities by the
time they graduate. But what is meant by 'adequate'? This is the fundamental question. A
number of agencies and commentators referenced in the next section have alleged that while
many graduates do achieve desired standards, many others do not.
This article is based on the premise that the most logical, direct and appropriate site for
developing capabilities is within the courses that constitute degree programs. Research by
Jones (2009, 2013) has demonstrated that interpretations of the competences differ from
field to field, sometimes widely. This is the nature of disciplines. However, there are
reasonable grounds for believing that capabilities developed thoroughly in one context (a
particular course or sequence of courses) normally have a transferable element to them.
This allows them to be reconfigured and repurposed for use in other contexts at other times.
As Strathern (1997, 320), an anthropologist, explained it:

In making transferable skills an objective, one cannot reproduce what makes a skill
work, i.e. its embeddedness ... [W]hat is needed is the very ability to embed oneself
in diverse contexts, but that can only be learnt one context at a time ... [I]f you
embed yourself in site A you are more likely, not less, to be able to embed yourself
in site B. But if in Site A you are always casting around for how you might do
research in B or C or D, you never learn that. There is a lesson here for disciplines.
Somehow we have to produce embedded knowledge: i.e. insights that are there
for excavating later, when the context is right, but not until then ... [W]e have not
to block or hinder the organism's capacity to use time for the absorption of
information: time-released knowledge or delayed-reaction comprehension.
[Capitalization in the original].

Reforming three particular assessment practices would increase the likelihood that more
students, especially those currently at the minimum pass level, would achieve the levels
expected of all graduates. The three form a mutually interdependent package. They are: the
design and specification of assessment tasks; the requirements for a Pass; and the design of
course assessment programs. Wherever these are not currently being practiced as aspects of
normal institutional quality assurance, they amount to reforms that require enabling changes
to be made elsewhere in the learning environment.

Context
Two widely read books by Bok (2006) and Arum and Roksa (2010) respectively describe
unevenness in graduate outcomes as perceived in the USA. Bok (2006, 7-8) wrote: 'Survey
after survey of students and recent graduates shows that they are remarkably pleased with
their college years. Overall, they also achieve significant gains in critical thinking, general
knowledge, moral reasoning, quantitative skills, and other competencies.' At the same time
and fully compatible with that, 'colleges and universities, for all the benefits they bring,
accomplish far less for their students than they should. Many seniors graduate without being
able to write well enough to satisfy their employers' (8) by expressing themselves with
'clarity, precision, style and grace' (82). 'Many cannot reason clearly or perform
competently in analysing complex, nontechnical problems, even though faculties rank
critical thinking as the primary goal of a college education' (8). 'The ability to think
critically (to ask pertinent questions, recognize and define problems, identify the arguments
on all sides of an issue, search for and use relevant data, and arrive in the end at carefully
reasoned judgments) is the indispensable means of making effective use of information'
(109).
Here, Bok has raised quite specific concerns. They may be valid to a greater or lesser
extent for particular institutions, academic degree programs or component courses; there is
usually no independent way of telling. However, his portrayal of the situation in the USA
resonates with similar concerns raised in other countries. These are reflected in the number
of national and international discussions, policies, projects, regulations, instruments and
forms of cooperation aimed at assuring graduate outcomes (Australian Learning and
Teaching Council 2010; Bergan and Damian 2010; Lewis 2010; Williams 2010; Douglass,
Thomson and Zhao 2012; Blömeke, Zlatkin-Troitschanskaia, Kuhn and Fege 2013; Dill and
Beerkens 2013; Sadler 2013b; Shavelson 2010, 2013; Tremblay 2013; Coates 2014). Part
of the overall unease is because, globally, higher education has expanded rapidly without
matching increases in public funding directed specifically towards teaching.
Despite what may seem an overwhelming challenge, progress could be made by
ensuring that the course grades entered on students' academic transcripts can be trusted to
represent adequate levels of the expected graduate competencies. Across a full degree
program, the transcript reports student performance on a large range of demanding tasks, in a
wide variety of courses, studied over a considerable period of time, and covering substantial
disciplinary and professional territory. Specialised tests of graduate competencies are not set
up to do this (Shavelson 2013). If third parties are to draw reasonably robust conclusions
about a graduate's acquired overall capability or competence, the grades on transcripts must
be trustworthy.

Reform 1: Assessment task design and specifications


Grading a student's performance involves drawing an inference from what the student
produces. The quality of the inference depends on several factors, two obvious ones being
the quality of the data (the student production) and the ability of the assessor. The quality of
the data is the focus here. Ordinarily, students respond to assessment items. An ambiguous
item is unlikely to give rise to good quality data because different students will most likely
interpret the item differently. This is why the stimulus needs to be both well designed and
clearly specified. It must set up a fresh problem to be solved, a question to be answered, an
issue to be addressed, or a position to be critiqued or defended. Students may or may not be
able to do the task well, but at least there is no excuse for not knowing the type of response
required.
To make this concrete, consider this poor assessment task: 'Write an essay on directive
and supportive leadership styles.' Any student who simply writes separate detailed accounts
of the two leadership styles technically fulfils the requirements of the task description.
However, high performing students delve deeply into a topic as a matter of course. They
may, for example, describe the two leadership styles briefly but then go on to analyse
similarities, differences, and the superior fit of one of the styles for a particular purpose.
(Many other possibilities exist.) These students' comprehensive understanding of both the
topic itself and the assessment context leads them to be analytical rather than descriptive,
and high marks typically follow. Regardless of the actual form of the assessment task
specifications, examiners and markers find that student responses generally range from low
to high quality for any reasonably sized student group. In some examiners' eyes, this range
would be sufficient evidence to conclude that the structure and content of the assessment
task is unproblematic, thus reinforcing the status quo. That reasoning is faulty.
An example of better design for the leadership styles task would be to set up some
scenario involving two particular types of organisations (say, a voluntary association and a
business employing mostly casual staff). Ask students to explain which leadership style, or
which aspects of each, might suit the two organisations. Making the intention clear in this
way makes separate descriptions unnecessary, because how well students know the two
styles will be evident in their responses. The improved design also makes it reasonable to
hold all students to the task requirements. This is an important consideration if the
evidence of achievement is not to be compromised by poor item structure. 'Poor quality
evidence of a student's ... achievement must not be confused with evidence of poor
achievement' (Sadler 2014b, 286).
In general, tasks need to stimulate higher-order thought processes such as:
hypothesising; extrapolating (or interpolating); exploring and articulating relationships
among things; estimating the likely effects of varying the parameters of a system;
redesigning something to suit a new purpose; using analogues as explanatory tools; outlining
and defending a scenario; and evaluating inadequacies or errors in solutions or arguments.
Given the huge variety of expected outcomes in different disciplines, fields and professions,
academics in those fields are best placed to determine the nature of well-formed questions
that push the students into the right amount of unfamiliar territory.
Ideally, the task specifications identify for all students the genre of response required.
Critical reviews, arguments, underlying assumptions that have to be identified, and causal
explanations are all distinct response genres (Sadler 2014a). This does not mean that students
should be given copious instructions on how to go about the task or detailed rubrics and
statements of criteria and standards of the type often recommended (Grunert O'Brien, Millis
and Cohen 2008). It means they have the right to know the genre for their response. It is both
illogical and counterproductive to appraise the quality of a student work as a member of a
particular genre if the work is not actually a member of that genre. (The concept of
'response genre' is not identical with 'writing genre' as Gardner and Nesi [2013] use the
term in connection with teaching academic writing to students.)
Creating demanding assessment tasks from scratch is hard work if the tasks are to tap
into higher-order operations on ideas and information. A straightforward way to proceed is
to collect a broad range of existing tasks that require students to construct responses of
considerable length. Sources include previous assignment tasks, project descriptions and
examinations in the field. Similar material from related discipline fields may also prove
useful for ideas. Academics, individually or in groups, generally can, without special tuition
or much difficulty, scrutinise the materials, broaden their own insights, and differentiate
them according to quality. In so doing, they expand their own understanding of the
possibilities and can decide which to avoid, emulate or adapt to suit their own context and
purpose. They can also imagine themselves as students faced with responding to particular
task specifications, trying to figure out how they would proceed.
Potential sources also include real-life problems in the relevant field. These may be of
special value in assessing graduate capability late in a degree program. Although it may not
be feasible to deal with the complexities of the full problem in its context, doing away with
unnecessary detail has to be balanced against the cost of providing students with experience
in deciding for themselves what is necessary and what may be safely discarded to make the
problem amenable to solution (Taylor 1974).

Iterative improvement of task specifications


Professional test developers routinely engage in revising task designs and specifications in
the light of experience. In higher education, a simple but revealing check on whether an
assessment task actually requires higher-order thinking and production is to ask one or two
competent others. They should interpret the wording of the specifications literally and either
indicate the absolute minimum that could be done to complete the task, or better still,
actually attempt the task itself. A more thorough check is to compare task specifications with
actual student works or performances and analyse how students responded to the tasks. This
process is passive and distinctly different from that used to score responses. What is sought
is at least a partial diagnosis of any deficiencies in the task design or specifications. Where at
least some responses technically do fall within a literal interpretation but are much simpler
than was intended, it may not have been imagined that such interpretations would be
possible. At the opposite extreme is a response that really 'captures' what was intuitively
hoped for but not fully conceptualised when the specifications were written. Capitalise on
that for the future.
The final check is to consult students themselves (Hounsell 1987) rather than try to infer
what they must have been thinking as they went about the task. This is the only
independent source that can confirm or disconfirm their understanding or reactions
(Alderson 1986). What went on in their heads while they were working out how to respond
to the task, and then during the planning and production phases? Were they surprised by how
the quality of their work was appraised?

Intended learning outcomes


To digress briefly, it might be thought at this point that assessment task design should start
with statements of course objectives, graduate capabilities or intended learning outcomes.
Biggs and Tang (2011) recommended this as foundational to achieving what they termed
'constructive alignment'. The treatment above, however, began directly with consideration
of assessment task design and specification. The rationale for this is as follows. Statements
of objectives typically use abstract terminology to frame higher-order cognitive
competencies such as 'critical analysis', 'problem solving' and the like. These terms are
open to wide interpretation (Weissberg 2013), and adding more words cannot solve the
problem. The explanation can be found in Sadler's (2014b) parallel argument about the
impossibility of expressing academic achievement standards in verbal or other codified form.
The same reasoning applies to learning outcomes. The key terms in the language used cannot
be interpreted unambiguously. They float according to context because they have imagined
rather than concrete referents. On the other hand, assessment tasks and specifications are
material formulations that can be exhibited, argued about and administered. They provide the
sharpest and most direct tool available for discussing, clarifying and communicating course
intentions for students and academics alike.

Reform 2: Grading at the Pass level


Many of the objects, products and processes used in everyday life have their quality
governed by external standards that are set by some recognised authority and sharply
discriminate pass from fail. Independent licensed testing agencies apply these standards
using calibrated testing procedures. No corresponding infrastructure exists for marking,
grading and reporting course-based student achievement in higher education. Exactly what
'standards' and 'comparability' mean receives relatively little attention. Yet markers
constantly need to make sound judgments about the quality of work in order to infer
underlying competence or capability. A central issue is where to pitch the course grade
boundaries. An especially important one is the lower boundary for a Passing grade, because
that usually determines whether credit will be granted towards the degree. Where should that
lower boundary be set so that when all courses are taken together, the result satisfies
discipline-based expectations, professional accreditation requirements and the capabilities
society expects of all higher education graduates?
In ordinary conversation, something is said to be 'passable' when it is adequate or
satisfactory for the purpose. The speaker initially assumes (and hearers could clarify if they
need to) what adequacy means in the context. Sometimes, tone of voice can indicate that
the requirements must technically be met, even if only just. The following has been distilled
from several existing definitions:

Pass (v): to demonstrate attainment, achievement or proficiency at or exceeding a
level accepted as satisfactory but not necessarily of the highest level; to satisfy
fully a minimum agreed performance requirement; to show sufficiency or adequacy
to purpose; to meet expectations, conform to specifications or reach some fixed and
approved standard.

How do institutions conceptualise what should count as a pass? Some clues can be found in
their published grade descriptors, where these exist, although the statements may not
necessarily correspond closely with actual grading decisions. Consider the five statements in
Table 1 outlining what a Pass represents in five different institutions, all obtained from their
web sites. All use the word 'Pass' explicitly as a grade label or refer to a pass in associated
documentation. In some cases, conditions apply. For example, the number of courses that
can be passed at the minimum level and also credited towards a degree may be strictly
limited.
In these statements, expectations range from concessions to students who stay the full
length of courses but may actually learn very little through to notionally adequate levels of
capability. Also in there can be found open tolerance of low levels of performance on
higher-order objectives (specifically, the ability to make sound judgments, act
independently, engage in analysis, and communicate clearly) and specific endorsement of
participation in class towards course credit. Participation is not strictly an element of
achievement or competence at all. Taken together, these grade descriptors send mixed
messages about what it means to pass.

Table 1. Five grade descriptors for the lowest level of achievement in a course for
which credit can be counted towards the degree. Conditions may apply.

Item  Designator*   Grade Descriptor

1     50-59         Satisfactory. Demonstrates appreciation of subject matter and
      Pass          issues. Addresses most of the assessment criteria adequately but
                    may lack in depth and breadth. Often work of this grade
                    demonstrates only basic comprehension or competency. Work
                    of this grade may be poorly structured and presented. (Monash
                    University).

2     D             Earned by work that is unsatisfactory but indicates some
      (D+, D, D-)   minimal command of the course materials and some minimal
                    participation in class activities that is worthy of course credit
                    toward the degree. (Harvard University, College of Arts and
                    Sciences).

3     40-49         Acceptable attainment of most intended learning outcomes,
      3rd Pass      displaying a qualified familiarity with a minimally sufficient
                    range of relevant materials, and a grasp of the analytical issues
                    and concepts which is generally reasonable, albeit insecure.
                    (University of Stirling).

4     E             Sufficient: A performance that meets the minimum criteria, but
                    no more. The candidate demonstrates a very limited degree of
                    judgement and independent thinking. (University of Oslo).

5     D             Deficient in mastery of course material; originality, creativity,
                    or both apparently absent from performance; deficient
                    performance in analysis, synthesis, and critical expression, oral
                    or written; ability to work independently deficient. (Dartmouth
                    College).

* Grade code as entered on academic transcript.

Although these formal grade descriptors indicate particular orientations, the definitive
measure of the adequacy of an institution's standards is whether the lowest-performing
students who gain credit for a course achieve higher-order objectives to a sufficient degree.
In the case of written responses, that includes the quality of writing. This can be determined
only by scrutinizing student responses to well-constructed assessment tasks. If a grade of D
is officially the lowest on the credit-earning scale but all students gain at least a B-, the
salient issue is whether the work awarded a B- deserves credit in terms of higher-order
outcomes. At the upper end of the grade scale, the issue is whether all students who gain the
highest available grade really do demonstrate excellence or a high level of distinction.
This is not the end of the story, and key questions still need to be asked: What is meant
and implied by 'acceptable standards' or 'to a sufficient degree'? How can appropriate
standards be set collaboratively so as to reflect a broad consensus? What is required to give
course grades integrity and currency across courses, programs and institutions? How may
standards be given material form so they can remain stable reference points over time?
These have been at least partially addressed both theoretically (Sadler 2013a, 2014b) and in
field trials (Watty et al. 2014).

Reform 3: Redesigning course assessment plans


This reform is about the timing, purpose and structure of assessment during and at the end
of a course. Specific aspects are the practice of combining marks awarded during a course
with those awarded at the end; ensuring that assessment during a course functions
formatively; and changing the parameters of summative assessment for grading.

Accumulation of marks
In theory, a course grade is meant to represent a student's level of capability attained by the
end of a course. '[G]rading is the assignment of a letter or number to indicate the level of
mastery the student has attained at the end of a course of study' (Schrag 2001, 63). It is
literally the out-come that goes on record. This is entirely consistent with the customary (and
legitimate) way of expressing intended learning outcomes: 'By the end of this course,
students should: ...' Whether the actual path of learning is smooth or bumpy, and regardless
of the effort the student has (or has not) put in, only the final achievement status should
matter in determining the course grade (Sadler 2009, 2010b). However, in many higher
education institutions, accumulating marks or points for work assessed during a period of
learning (continuous assessment) is the prevailing practice, mandated or at least endorsed by
the institution. Readily available software provides bookkeeping tools for it. These make it
easy to progressively bank marks, then weight and process them at the point of withdrawal
for conversion into the course grade.
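
The banking arithmetic described above can be made concrete with a short sketch. The Python below is an illustration only: the component names, weights and marks are hypothetical, the weighted-sum rule is an assumed common form of accumulation rather than the scheme of any particular institution or software package, and the final examination mark is used as a rough stand-in for achievement at course end.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    weight: float  # share of the course grade (all weights sum to 1.0)
    mark: float    # percentage mark awarded for this component (0-100)


def accumulated_score(components):
    """Return the weighted sum of banked marks (the accumulation model)."""
    total_weight = sum(c.weight for c in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1.0")
    return sum(c.weight * c.mark for c in components)


# Hypothetical ledger for one student who starts poorly and finishes well.
ledger = [
    Component("quiz (week 3)", 0.10, 30.0),
    Component("essay (week 7)", 0.30, 50.0),
    Component("participation", 0.10, 100.0),
    Component("final examination", 0.50, 80.0),
]

banked = accumulated_score(ledger)   # what the accumulation model reports
end_of_course = ledger[-1].mark      # performance shown at course end

print(f"accumulated score: {banked:.1f}")
print(f"end-of-course mark: {end_of_course:.1f}")
```

With these assumed figures the accumulated score differs from the end-of-course mark, because every banked component contributes to the figure that is converted into the grade.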
The common arguments for accumulation are essentially instrumentalist (Isaksson
2008). The purpose is not so much to help learners attain adequate levels of complex
knowledge and skills by the end of a course as to keep them working and provide multiple
opportunities for feedback. In any case, so it is argued, students need, expect, appreciate and
thrive under continuous assessment (Trotter 2006; Hernández 2012). However,
notwithstanding its superficial appeal, accumulation actually diverts attention from the goal
of achieving a satisfactory level by course end.
First, accumulating performance measures during the learning period maps the shape of
the actual learning path into the grade (Sadler 2010b). In general, the context and actions of
both teacher and learner influence the rate and depth at which learning occurs. For many
students, coming to grips with and then overcoming false starts, errors, bumbling
attempts and time spent going up blind alleys lead to deep understandings by the end of a
course. Students can take bold risks that end in disasters and safely make conceptual
connections that later have to be unlearned. For well over a century, the role of spacing
during the total time available for developing high-order knowledge and skills has been
extensively studied. This research provides robust findings on how humans learn
(Ebbinghaus 1885; Bloom 1974; Conway, Cohen and Stanhope 1992; Rohrer and Pashler
2010; Budé et al. 2011). This is especially marked in sequential learning in which
competence is attained only after a series of learning experiences that may take months or
years to complete before the learner has developed a satisfactory degree of attainment in the
field (Bloom 1974, 682).
Second, accumulation lends itself to awarding and banking marks for a variety of
non-achievements for the purpose of influencing student behaviour. Marks are used to
incentivise and reward student effort, engagement in preferred activities, completion of
exercises or work stages, and participation. These behaviours and activities may well assist
learning, but they do not constitute the final level of achievement, or even part of it. On the
debit side of the ledger, marks may be deducted to penalise late submission, cheating or
plagiarism. The cost of using marks to modify behaviour is contamination of the grade.
Other ways have to be found. Quite apart from behaviour management, many students insist
they have a moral right for aspects other than unadulterated achievement to be included in
their grades (Zinn et al. 2011; Tippin, Lafreniere and Page 2012). Overall, the banking
model takes data from non-achievement contaminants, early deficits, and idiosyncratic
paths of learning and mixes them all into the final grade. The grade is then logically
impossible to disentangle and hence interpret (Brookhart 1991; Sadler 2010b). Equally
serious is that no coherent concept of a standard can apply to such a mishmash of data.
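
Why an accumulated grade cannot be disentangled can be shown with a small worked example. The sketch below is hypothetical throughout: the two students, the four components and their weights are invented for illustration, and a simple weighted sum is assumed as the banking rule.

```python
def banked_total(weighted_marks):
    """Sum weight * mark over (weight, mark) pairs on a 0-100 scale."""
    return sum(weight * mark for weight, mark in weighted_marks)


# Each tuple is (weight, percentage mark), in the order:
# early quiz, mid-course essay, participation, final examination.
student_a = [(0.10, 90.0), (0.30, 80.0), (0.10, 100.0), (0.50, 40.0)]
student_b = [(0.10, 10.0), (0.30, 40.0), (0.10, 0.0), (0.50, 100.0)]

total_a = banked_total(student_a)
total_b = banked_total(student_b)

# Both ledgers close on the same balance ...
same_total = abs(total_a - total_b) < 1e-9
# ... although the final performances differ widely.
final_gap = student_b[-1][1] - student_a[-1][1]

print(round(total_a, 1), round(total_b, 1), same_total, final_gap)
```

Under these assumptions the two students receive the same total, yet one finishes the course with a weak final performance and the other with a strong one. The total alone gives a third party no way of telling which is which.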
Finally, although accumulating marks may succeed in motivating and focusing student
effort, the pressure and drive typically ease off once the ledger balance approaches the Pass
score cut-off. This allows students to sidestep the challenge of gaining a command over the
course as a whole, especially its higher-order objectives. Put another way, accumulation
invites students to valorise externally offered proximate goals at the very time that the
eventual goal should be kept front and centre in their minds. A person's perspective on the
fullness of the eventual goal to be achieved, or the central purpose to be served, can play a
determinative role in how they approach and manage their own learning, and the task of
becoming competent (Sommers 1980; Entwistle 1995; Sadler 2014a). A steady stream of
extrinsic rewards is a poor substitute for developed intrinsic rewards where students take
primary responsibility for their own learning. Extrinsic rewards work directly against the
students-as-learners maturation process in which they progress towards becoming
independent, self-directed, lifelong learners.
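
The easing of pressure near the Pass cut-off follows from simple arithmetic, sketched below. The function name, the cut-off of 50 and the example figures are assumptions made for illustration; actual cut-offs and weightings vary between institutions.

```python
def required_final_mark(banked_points, final_weight, pass_cutoff=50.0):
    """Percentage needed on the final task to reach the Pass cut-off.

    banked_points: weighted points already accumulated (0-100 scale).
    final_weight:  share of the course grade carried by the final task.
    A result above 100 means the cut-off can no longer be reached.
    """
    if not 0 < final_weight <= 1:
        raise ValueError("final_weight must be in the interval (0, 1]")
    needed = (pass_cutoff - banked_points) / final_weight
    return max(0.0, needed)


# Hypothetical student who has banked 38 of a possible 50 points
# before a final examination worth half of the course grade.
print(required_final_mark(38.0, 0.5))   # mark needed on the final
print(required_final_mark(55.0, 0.5))   # cut-off already reached
```

In the first case the student needs less than a quarter of the available marks on the final task to pass; in the second, the final task has no bearing on whether credit is granted.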

Designing during-course formative assessment


When the drag imposed by accumulating low or irrelevant marks is eliminated,
during-course tests and assessments are freed up to function purely formatively. 'Purely' indicates
high stakes for learning but zero influence on the summative grade. True, in a broader
context, students may use information about end-of-course performance in one course to
improve their performance in subsequent courses, but that is a different issue. Within a
single course, formative and summative assessments need to be clearly separated so that they
can serve their respective purposes.
Given a set of course objectives, formative assessment is commonly viewed narrowly as
giving students assessment tasks and then feedback so that they can improve (Nicol and
Macfarlane-Dick 2006). Despite all the effort typically invested in creating better and better
feedback, it too often makes practically no difference. Sadler (2010a, 2013c) argued that the
principal reason is that feedback is basically about telling students: the transmission model
of teaching transposed into an assessment setting. The alternative is to offer students
formative assessment opportunities that provide authentic evaluative experience of the type
they need in order to become better able to recognise, monitor and control the quality of
works they themselves are to produce. This matter is discussed at length in Sadler (1989,
2010a, 2013c, 2014a).
Students need to be exposed to a variety of complex tasks and their corresponding
response genres. This immerses them in decision spaces that are similar to those inhabited by
marker-teachers in which judgments are made about whether a particular work falls within
the required response genre, and if so, the macro-level and micro-level determinants of its
quality. Students need to become competent not only in making judgments about their own
works but also in defending those judgments and figuring out how those works could have
been made better. This involves learning to notice aspects that make a difference to quality,
and to pass over those that make only negligible difference. In other words, they need
practice in appraising works holistically, so that they come to understand how the
appropriate use of smaller-scale tactics enables a larger-scale purpose to be accomplished.
This type of seeing typically goes unrecognized in most of the research on assessment for
learning, where the focus has been on feedback (Sadler 2013c, 58).
Part of the agenda is specifically and deliberately to induct students into appreciating
the types and ranges of problems, issues or questions that could legitimately be set as
assessment tasks in the course. Multiple assessment tasks that demand complex cognitive
and other capabilities serve multiple purposes that include conveying the intended learning
outcomes for the course and equipping students for summative assessment at the end of the
course. Students need to be challenged with problems that develop, activate and coordinate
the same cognitive processes and professional skills they will need as graduates. By
definition, capability implies the freedom, versatility and adaptability to tackle successfully
problems that have not been delineated or anticipated in advance, and to do so on demand,
unaided and to a satisfactory standard. There are not just a handful of stereotypic problems
or types of tasks that characterise the course but a wide range of possibilities that entail
diverse cognitive and practical skills in different combinations. Research by Entwistle (1995)
showed that the best preparation for course examinations comes about only by having a
thorough grasp of the whole course. Sound assessment plans, tasks and specifications are
crucial to this.
The choice of assessment task format is an important meta-parameter in the design of
course assessment programs, both formative and summative. Extensive use of
multiple-choice tests reduces, if not eliminates altogether, the number of written prose responses,
and with it a valuable opportunity to develop competence in discipline-focused writing.
Creating precise and cogent prose promotes high-level learning primarily because it requires
careful, probing thought (Bok 2006, 103). In her classic 1977 article, Emig wrote that
"Clear writing by definition is that writing which signals without ambiguity the nature of
conceptual relationships, whether they be coordinate, subordinate, superordinate, causal, or
something other" (126). In Sternglass's research, students repeatedly reported that: "Only
through writing [papers of a type that] required them to integrate theory with evidence did
they achieve the insights that moved them to complex reasoning about the topic under
consideration" (1997, 295). Bok (2006), Zorn (2013) and many others have argued that the
best site for developing good writing is within the disciplines themselves, not separately as a
specialist activity.

Re-inventing end-of-course summative assessment


The plan for summative assessment in a course amounts to more than just ensuring that each
assessment task is well designed and specified. A basic tenet of assessment is that the
evidence of academic achievement should be unquestionably the student's own work.
Common threats to the integrity of achievement data include cheating, collusion, plagiarism,
outsourcing term papers and using substitute test takers. Increasingly sophisticated digital
technologies and telecommunications have contributed to the problem (Park 2003; Walker
2010). The traditional way of satisfying the secure data requirement has been to use
previously unseen assessment tasks in invigilated, time-restricted written examinations.
These assessment formats typically fail to take advantage of the technologies and tools of
production currently used in most workplaces: use of keyboard input, office
productivity software, the internet and web searching. Exploring ways to address this is an
active area of research. Williams (2006) and Williams and Wong (2009), for instance, have
trialled forms of open-book, open-web assessments.
Composing text and manipulating data using a keyboard (rather than pen and paper) are
now so common that these tools should be readily available for candidates. Editing of text, in
particular, has the potential to improve learning during testing. Sommers (1980) highlighted
the special role that editing and revising one's work can play in creating and clarifying
meaning:
[E]xperienced writers seek to discover (to create) meaning in the engagement with their
writing, in revision. They seek to emphasize and exploit the lack of clarity, the
differences of meaning, the dissonance, that writing as opposed to speech allows in the
possibility of revision. Writing has spatial and temporal features not apparent in speech –
words are recorded in space and fixed in time – which is why writing is susceptible to
reordering and later addition. Such features make possible the dissonance that both
provokes revision and promises, from itself, new meaning. (386)

Additional concerns about traditional examinations have their roots in typical examination
conditions. Students often experience considerable stress because of both the strict time
limits and the summary nature of high-stakes, make-or-break events. In some cases,
medical researchers have explored coping strategies and the possible use of medication
(Edwards and Trimble 1992). Removing or relaxing problematic examination conditions
could well include making time limits generous (within reasonable limits) and allowing
review time and re-examination (with an accompanying fee if necessary). If it is objected
that all students in a course should perform under identical conditions, the reply is
straightforward. Students with special needs typically have accommodations made for them,
but within any course, some students may be just below the threshold at which special
accommodations would apply. In addition, the quality of a student's response, as appraised
against standards rather than against other students' work, is a clearer indicator of their
capability than the speed of task completion.
This section is concluded with two observations that apply regardless of the mode or
medium of response: efficiency and sampling. An efficient plan results in high levels of
valid achievement information relative to the costs of getting it, including time in setting
and marking student work, and administrative overheads. Appropriate sampling involves
coverage across both the course subject matter (a preoccupation with many examiners) and
the range of relevant intended higher-order outcomes. These two together are somewhat
analogous to evaluating the economic potential of a mineral deposit by drilling a series of
cores into a prospective ore body to test its lateral extent and its richness (Whateley and
Scott 2006). Emphasising depth in thinking and precision in expression may well result in
higher quality but more condensed outputs.

Implications for students


In many higher education institutions, reforms along the lines sketched out above would
shift a significant measure of responsibility from the educational environment (teacher,
program director, resources, technology and the institution) to the students themselves. For
students to rise to the challenge of passing each course without concessions of any kind, they
have to set their priorities, and coordinate the resources over which they have control (prior
knowledge, personal talent, effort and time) in order to gain credit towards their degrees. For
that, they need a clear sense of future-mindedness or prospection (Osman 2014; Seligman
et al. 2013). This makes sense for the student only when the socially constructed context or
order in which they will live when they finish their degrees is sufficiently stable or secure
for them to know all the effort will have been worthwhile (Schatzki 2001). In the short term,
this involves attending to course achievement goals as they come, and for each, a sense of
agency over personal performance.

Goal setting
Extensive research over several decades in a wide variety of field and laboratory settings has
investigated the impact that so-called hard goals have on task performance. Progressive
reviews of this work are available in Locke et al. (1981), Locke et al. (1990), and the first
and last chapters of Locke and Latham (2013). Hard goals are specific and clear rather than
general or vague, difficult and challenging rather than simple or easy, and closer to the upper
limit of a person's capacity to perform than was their initial level of performance. Goals that
require students to stretch for them generally lead to substantial gains in performance. They
act to focus attention, mobilise effort and increase persistence at a task. In contrast, do-your-
best goals often fare little better than having no goals at all. As one would expect, the degree
of improvement is moderated by other factors, including the complexity of the task, the
learner's ability, the strategies employed and various contextual constraints (Locke et al.
1981). However, the general conclusion is that "an individual [cannot] thrive without goals
to provide a sense of purpose ... If the purpose is neither clear nor challenging, very little
gets accomplished" (Locke and Latham 2009, 22).
Arranging the learning environment so that all students have an adequate grasp of the
higher-order outcomes stated in course outlines is a clear imperative for universities and
colleges. Setting standards that some students initially see as tough (and possibly even
unfair or coercive, depending on their initial expectations) is part of that. Serious students
adapt pragmatically to hard constraints provided the settings are known, fair and relevant.
The consequences of a hard-earned Pass are highly positive in terms of both credit towards
the degree and personal sense of accomplishment. Carried out ethically, hard goals work
constructively for the student in both the short and the long term (Sadler 2014b).

Student sense of agency


The nub of student agency is the belief that, in the matter of passing courses and gaining
credit, one is significantly responsible for one's own learning and achievement. This is
captured nicely by Pacherie (2008, 195), who drew a distinction between a long-term and an
occurrent sense of agency. The long-term sense is "a sense of oneself as an agent apart
from any particular action... a form of self-narrative where one's past actions and projected
future actions are given a general coherence and unified through a set of overarching goals,
motivations, projects and general lines of conduct". The occurrent sense is that which one
experiences at the time one is preparing or performing a particular action. Pacherie was not
writing specifically about higher education, but Lindgren and McDaniel (2012) were:
"Agency can shape both the process and the outcomes of student learning ... People are more
driven to achieve the agendas they set for themselves" (346). The three reforms outlined
above, including formative assessment implemented according to sound principles, can
contribute to the growth of student agency in learning, not only by imposing rigorous but
reasonable conditions for students to succeed but also by providing effective developmental
support.
Being vividly aware that one is in control of one's actions brings with it a personal
sense of responsibility. Frith (2014) summarised an ancient Hellenistic perspective on this,
which in essence is that one's sense of agency is developed through two factors. The first is
the cognitive binding that links one's intentional action (say, considerable effort) to its
outcome (Passing the course). The second is the belief that an alternative action (investing
little or no effort) would have led to a different outcome (Fail) accompanied by an
experience of regret. The second part of this is known as counterfactual reasoning because
although it is valid to think this way, it is essentially hypothetical, being contrary to what
actually happened (Roese 1997). If the likelihood of failure in a course is low or
non-existent, the sense of agency is weakened or disappears altogether.
For students to gain clarity on a complex course-based achievement goal (something
radically different from trying to improve by, say, one grade), they must understand what
high-level achievement looks like and experience for themselves what reaching it entails.
Overall, students need to see and appreciate the purpose to be served, experience success in
moving towards its attainment, and be motivated, with grit and determination, to follow
through to completion.
Genuine achievement for which a student works hard and produces a high quality result
brings about levels of fulfilment and confidence that come only from possessing deep and
thorough knowledge of some body of worthwhile material or attaining proficiency in
high-level professional skills. The terms pleasure, satisfaction, motivation and
accomplishment have many nuanced and overlapping meanings, but there is little doubt
about the legitimacy of pleasure as a by-product of successful striving (Duncker 1941,
391). This is categorically different from, in the modern context, having satisfying
experiences in the classroom (although the two may co-occur) or experiencing success in
winning against others. For some students more than others, developing this type of personal
capital demands substantial striving and struggling and induces considerable stress.
However, little by way of significant and enduring learning comes cheaply, and experiencing
success at something that was originally thought to be out of reach brings a distinctive
personal reward, a palpable sense of accomplishment. Not to insist on a demonstration of an
adequate level of higher-order capabilities is to deprive students of both an important
stimulus to achieve and the satisfaction of reaching a significant goal.

Inhibitors of change
Some inhibitors are conceptual in nature. One of these consists of the multiple meanings
attached to the term "standard". Add to that a limited awareness of the need for externally
validated anchorage points for standards generally and Passing grades in particular (Sadler
2011, 2013a). Others have to do with assessment practices that detract from the integrity of
course marks and grades. Some have been criticized in the literature for decades
(Oppenheim, Jahoda and James 1967; Elton 2004; Sadler 2009), but they are now so deeply
embedded in assessment cultures they are resistant to change. In addition, new practices
keep coming along and are added incrementally. Accepted uncritically, these often become
popular through being labelled as "innovative" or "best practice". They are defended strongly
by academic teachers, students and administrators and may even be mandated in institutional
assessment policies. Accumulating marks is but one example. The fact that they reduce the
integrity of course grades goes largely unheralded.
Whether hard goals are actually set and enforced depends on a variety of other factors
as well, some of which are related to the grading dispositions of individual academics. At
successively higher levels in the chain of authority, the freedom of academics to make
significant changes depends on: an enabling and supportive context provided by academic
department heads and program directors; the fixedness of the prevailing assessment
traditions, grading policies and academic priorities; and requirements externally set by
governments or accrediting agencies.

Internal momentum and culture


Recent trends in higher education assessment practice have included: minimising the
proportion of achievement data that is secure; allowing lower-order outcomes to be
substituted for higher-order as the minimum for course credit; programming the learning
path so it is presented in small manageable self-contained steps to facilitate smooth, painless
progress from step to step; and markers reading between the lines of poorly composed
written work accompanied by making generous inferences as to the student's level of
understanding. The underlying drive is to ensure the least possible discomfort or stress for
students (Fiamengo 2013).
At the institutional policy level lie: curriculum freedom that allows students the
flexibility to pick and choose courses from a wide range to make up a substantial part of a
degree program; and credit transfer policies and recognition of prior learning that impose
few restrictions. These decrease the effectiveness of coherent sequences of courses
specifically designed to promote development of higher-order outcomes, which require
considerable time and multiple encounters to mature properly. Institutional factors are also
influenced by financial considerations, particularly continuity in total income from student
fees and government funding. In principle, assuring the quality of all graduates, maintaining
student entry levels and ensuring a satisfactory enrolments-based income stream are not
incompatible. However, in practice, academic achievement standards can be compromised to
avoid rebalancing internal resource allocations to prioritise teaching.
At the scale of individual academics, the following statements all have their origins in
conversations with academics in universities in different countries. They reveal a range of
problematic dispositions and attitudes related to passing courses. "Many students are
low-entry or from disadvantaged backgrounds. They have limited ability to achieve well on
higher-order outcome measures, but they nevertheless benefit greatly from the experience of
higher education." "Students who put in substantial effort no doubt learn something of
importance in the process and therefore deserve to pass." "While it is disappointing when
students submit low-level work, there is no guarantee those students would gain employment
directly in the fields of their degrees anyway." "Students who fail courses suffer adverse
personal and social consequences, such as loss of face, additional fees and delay to graduate
earnings, so avoiding failing grades is important." "When students have to pay substantial
fees, they expect to pass and in any case would appeal against failure." "All grade results are
reviewed by the Assessment Review Committee and, with very few exceptions, approved
without amendment." "Consistent with the principle of academic freedom, professors must
be free to decide, according to their own professional judgments, the grades to be assigned."
"Creative ways are found for students to earn enough marks for them to at least pass, with
scaffolding and active coaching there to help." "Students these days need a qualification even
if it means they are not truly qualified at the end. In any case, graduates learn most of what
they need to know after graduation." "Cutting out cumulative assessment and instead,
grading according to serious standards would produce high failure rates and consequential
loss of income. The institution would not tolerate that."
Finally, "I know I am generous in grading, but I need to keep my teaching evaluation
scores up so I can look forward to tenure." Whether there is a causal link between grades and
teaching evaluations is debated, but "[r]egardless of the true relationship between grades and
teaching evaluations, the fact that many instructors perceive a positive correlation between
assigned grades and student evaluations of teaching has important consequences when there
also exists a perception that student course evaluations play a prominent role in promotion,
salary, and tenure decisions" (Johnson 2003, 49).
Most of these comments amount to admissions that things as they exist may not be as
they ought to be, but by implication, not much can be done about it. Addressing inflated pass
rates at their source by raising actual achievement levels is the only valid means of ensuring
grade integrity. No amount of tinkering with other variables, and no configuration of proxy
measurements, will make the difference required.

Conclusion
In recent decades, the focus for evaluating teaching quality has been heavily weighted
towards inputs (student entry levels, participation rates, facilities, resources and support
services) and a select group of outcomes (degree completions, employability, starting
salaries and student satisfaction, experience or engagement). Conspicuously absent is
anything to do with actual academic achievement in courses. This has allowed a number of
sub-optimal assessment practices to become normalised into assessment cultures. One of the
consequences is that too many students have been able to graduate without the capabilities
expected of graduates, yet this is not necessarily apparent from their transcripts.
The focus in this article has been on student outcomes rather than inputs, with particular
emphasis on the higher-order capabilities of students. Many students fail to master these, yet
they gain credit in course after course and eventually graduate. Directly addressing the
deficient aspects of assessment culture and practice could radically alter this state of affairs,
but it would require a transformation in thinking and practice on the part of many academics.
The ultimate aim is to ensure that all students accept a significant proportion of the
responsibility for achieving adequate levels of higher-order outcomes. Bluntly put, no
student would be awarded a pass in a course without being able to demonstrate these levels.
For some students, this would necessitate a major change in their priorities. For academics,
both their assessment practices and the nature of the student-teacher relationship would
change.
Undoubtedly, determination to pursue this end would have significant washback effects
on teaching, learning, and course and program objectives, but that is intended. The
likelihood of success depends on finding a rational, ethical and affordable way to do it. This
may require re-engineering some parts of the transition path, creating other parts from
scratch, and reworking priorities, policies, and practices to a considerable extent. In
particular, it would entail rebalancing institutional resource allocations in order to cater for
student cohorts that have become much more diversified. Except for aims geared narrowly to
economic and employment considerations, this goal is broadly consistent with traditional
and many recent statements of the real purposes of higher education.

Sources for grade descriptors in Table 1


1. Monash University, Assessment in Coursework Units Policy: Grade Descriptors;
Course label Pass P (above Credit C; below Fail, F). Accessed 26-May-2015.
http://policy.monash.edu.au/policy-bank/academic/education/assessment/unit-assessment-procedures.html
2. Harvard University, College of Arts and Sciences; The Grading System. Grade label D
(above C; below E). Course Requirements for the Degree: All candidates for the
Bachelor of Arts or the Bachelor of Science degree must pass 16.0 full courses and
receive letter grades of C or higher in at least 10.5 of them (at least 12.0 to be eligible
for a degree with honors). Additional note: Grades of D+ through D- are passing but
unsatisfactory grades. Accessed 26-May-2015.
http://static.fas.harvard.edu/registrar/ugrad_handbook/current/chapter2/grades_honors.html
3. University of Stirling, University Common Marking Scheme; Grade label 3rd Pass
(above 2.2 Pass; below: Fail-Marginal). Accessed 26-May-2015.
http://www.stir.ac.uk/regulations/undergrad/assessmentandawardofcredit/
4. University of Oslo Grading system: Grading scale with letter values. Grade label: E
Sufficient (above D Satisfactory; below F Fail). Accessed 26-May-2015.
http://www.uio.no/english/studies/about/academic-system/grading-system/
5. Dartmouth College. Grade descriptions; Grade label: D (above C; below E); Credit
eligibility: Requirements for the Degree of Bachelor of Arts. II. A student must pass
thirty-five courses ... No more than eight courses passed with the grade of D may be
counted toward the thirty-five courses required for graduation. Accessed 26-May-2015.
http://www.dartmouth.edu/~reg/transcript/grade_descriptions.html

References
Alderson, J. C. 1986. Innovations in Language Testing? In Innovations in Language
Testing: Proceedings of the IUS/NFER Conference, edited by M. Portal, 93-105.
Windsor, Berkshire: NFER-Nelson.
Arum, R., and J. Roksa. 2010. Academically Adrift: Limited Learning on College Campuses.
Chicago: University of Chicago Press.
Australian Learning and Teaching Council. 2010. Learning and Teaching Academic
Standards Project Final Report. Strawberry Hills, NSW: Australian Learning and
Teaching Council.
Bergan, S., and R. Damian, eds. 2010. Higher Education for Modern Societies:
Competences and Values. Higher Education Series No. 15. Strasbourg: Council of
Europe Publishing.
Biggs, J. B., and C. Tang. 2011. Teaching for Quality Learning at University: What the
Student Does. 4th ed. Maidenhead, UK: McGraw-Hill/Society for Research into Higher
Education/Open University Press.
Blömeke, S., O. Zlatkin-Troitschanskaia, C. Kuhn, and J. Fege, eds. 2013. Modeling and
Measuring Competencies in Higher Education: Tasks and Challenges. Rotterdam:
Sense Publishers.
Bloom, B. S. 1974. Time and Learning. American Psychologist 29 (9): 682-688.
doi:10.1037/h0037632.
Bok, D. 2006. Our Underachieving Colleges: A Candid Look at How Much Students Learn
and Why They Should Be Learning More. Princeton, NJ: Princeton University Press.
Brookhart, S. M. 1991. Grading Practices and Validity. Educational Measurement: Issues
and Practice 10 (1): 35-36. doi:10.1111/j.1745-3992.1991.tb00182.x.
Budé, L., T. Imbos, M. W. van de Wiel, and M. P. Berger. 2011. The Effect of Distributed
Practice on Students' Conceptual Understanding of Statistics. Higher Education 62
(1): 69-79. doi:10.1007/s10734-010-9366-y.
Coates, H., ed. 2014. Higher Education Learning Outcomes Assessment: International
Perspectives. (Series: Higher Education Research and Policy. Vol. 6). Frankfurt am
Main, Berlin, Bern, Bruxelles, New York, Oxford, Wien: Peter Lang.
Conway, M. A., G. Cohen, and N. Stanhope. 1992. Very Long-Term Memory for
Knowledge Acquired at School and University. Applied Cognitive Psychology 6 (6):
467-482. doi:10.1002/acp.2350060603.
Dill, D. D. and M. Beerkens. 2013. Designing the Framework Conditions for Assuring
Academic Standards: Lessons Learned about Professional, Market, and Government
Regulation of Academic Quality. Higher Education 65 (3): 341-357.
doi:10.1007/s10734-012-9548-x.
Douglass, J. A., G. Thomson, and C-M. Zhao. 2012. The Learning Outcomes Race: The
Value of Self-Reported Gains in Large Research Universities. Higher Education 64
(3): 317-335. doi:10.1007/s10734-011-9496-x.
Duncker, K. 1941. On Pleasure, Emotion, and Striving. Philosophy and Phenomenological
Research 1 (4): 391-430. doi:10.2307/2103143.

Ebbinghaus, H. 1885. Memory: A Contribution to Experimental Psychology. Trans. H. A.
Ruger and C. E. Bussenius. 1913. NY: Teachers College, Columbia University.
Edwards, J. M. and K. Trimble. 1992. Anxiety, Coping and Academic Performance.
Anxiety, Stress & Coping: An International Journal 5 (4): 337-350.
doi:10.1080/10615809208248370.
Elton, L. 2004. A Challenge to Established Assessment Practice. Higher Education
Quarterly 58 (1): 43-62. doi:10.1111/j.1468-2273.2004.00259.x.
Emig, J. 1977. Writing as a Mode of Learning. College Composition and Communication
28 (2): 122-128. doi:10.2307/356095.
Entwistle, N. 1995. Frameworks for Understanding as Experienced in Essay Writing and in
Preparing for Examination. Educational Psychologist 30 (1): 47-54.
doi:10.1207/s15326985ep3001_5.
Fiamengo, J. 2013. The Fail-Proof Student. Academic Questions 26 (3): 329-337.
doi:10.1007/s12129-013-9372-5.
Frith, C. D. 2014. Action, Agency and Responsibility. Neuropsychologia 55 (1): 137-142.
doi:10.1016/j.neuropsychologia.2013.09.007.
Gardner, S. and H. Nesi. 2013. A Classification of Genre Families in University Student
Writing. Applied Linguistics 34 (1): 25-52. doi:10.1093/applin/ams024.
Grunert O'Brien, J., B. J. Millis, and M. W. Cohen. 2008. The Course Syllabus: A Learning-
Centered Approach. San Francisco: Jossey-Bass.
Hernández, R. 2012. Does Continuous Assessment in Higher Education Support Student
Learning? Higher Education 64 (4): 489-502. doi:10.1007/s10734-012-9506-7.
Hounsell, D. 1987. Essay Writing and the Quality of Feedback. In Student Learning:
Research in Education and Cognitive Psychology, edited by J. T. E. Richardson, M. W.
Eysenck, and D. Warren-Piper, 109-119. Milton Keynes: Open University Press and
Society for Research into Higher Education.
Isaksson, S. 2008. Assess As You Go: The Effect of Continuous Assessment on Student
Learning During a Short Course in Archaeology. Assessment & Evaluation in Higher
Education 33 (1): 1-7. doi:10.1080/02602930601122498.
Johnson, V. E. 2003. Grade Inflation: A Crisis in College Education. New York: Springer-
Verlag.
Jones, A. 2009. Redisciplining Generic Attributes: The Disciplinary Context in Focus.
Studies in Higher Education 34 (1): 85-100. doi:10.1080/03075070802602018.
Jones, A. 2013. There is Nothing Generic about Graduate Attributes: Unpacking the Scope
of Context. Journal of Further and Higher Education 37 (5): 591-605.
doi:10.1080/0309877X.2011.645466.
Lewis, R. 2010. External Examiner System in the United Kingdom. In Public Policy for
Academic Quality: Analyses of Innovative Policy Instruments, edited by D. D. Dill, and
M. Beerkens, 21-36. Dordrecht: Springer.
Lindgren, R., and R. McDaniel. 2012. Transforming Online Learning through Narrative and
Student Agency. Educational Technology & Society 15 (4): 344-355.
Locke, E. A., K. N. Shaw, L. M. Saari, and G. P. Latham. 1981. Goal Setting and Task
Performance: 1969-1980. Psychological Bulletin 90 (1): 125-152.
doi:10.1037/0033-2909.90.1.125.

Locke, E. A., G. P. Latham, K. J. Smith, and R. E. Wood. 1990. A Theory of Goal Setting
and Task Performance. Englewood Cliffs, NJ: Prentice Hall.
Locke, E. A. and G. P. Latham. 2009. Has Goal Setting Gone Wild, or Have its Attackers
Abandoned Good Scholarship? Academy of Management Perspectives 23 (1): 17-23.
doi:10.5465/AMP.2009.37008000.
Locke, E. A., and G. P. Latham, eds. 2013. New Developments in Goal Setting and Task
Performance. New York: Routledge.
Nicol, D. J., and D. Macfarlane-Dick. 2006. Formative Assessment and Self-Regulated
Learning: A Model and Seven Principles of Good Feedback Practice. Studies in Higher
Education 31 (2): 199-218. doi:10.1080/03075070600572090.
Oppenheim, A. N., M. Jahoda, and R. L. James. 1967. Assumptions Underlying the Use of
University Examinations. Higher Education Quarterly 21 (3): 341-351.
doi:10.1111/j.1468-2273.1967.tb00245.x.
Osman, M. 2014. Future-Minded: The Psychology of Agency and Control. Basingstoke:
Palgrave-Macmillan.
Pacherie, E. 2008. The Phenomenology of Action: A Conceptual Framework. Cognition
107 (1): 179-217. doi:10.1016/j.cognition.2007.09.003.
Park, C. 2003. In Other (People's) Words: Plagiarism by University Students – Literature
and Lessons. Assessment & Evaluation in Higher Education 28 (5): 461-488.
doi:10.1080/02602930301677.
Roese, N. 1997. Counterfactual Thinking. Psychological Bulletin 121 (1): 133-148.
doi:10.1037/0033-2909.121.1.133.
Rohrer, D., and H. Pashler. 2010. Recent Research on Human Learning Challenges
Conventional Instructional Strategies. Educational Researcher 39 (5): 406-412.
doi:10.3102/0013189X10374770.
Sadler, D. R. 1989. Formative Assessment and the Design of Instructional Systems.
Instructional Science 18 (2): 119-144. doi:10.1007/BF00117714.
Sadler, D. R. 2009. Grade Integrity and the Representation of Academic Achievement.
Studies in Higher Education 34 (7): 807-826. doi:10.1080/03075070802706553.
Sadler, D. R. 2010a. Beyond Feedback: Developing Student Capability in Complex
Appraisal. Assessment & Evaluation in Higher Education 35 (5): 535-550.
doi:10.1080/02602930903541015.
Sadler, D. R. 2010b. Fidelity as a Precondition for Integrity in Grading Academic
Achievement. Assessment & Evaluation in Higher Education 35 (6): 727-743.
doi:10.1080/02602930902977756.
Sadler, D. R. 2011. Academic Freedom, Achievement Standards and Professional Identity.
Quality in Higher Education 17 (1): 103-118. doi:10.1080/13538322.2011.554639.
Sadler, D. R. 2013a. Assuring Academic Achievement Standards: From Moderation to
Calibration. Assessment in Education: Principles, Policy and Practice 20 (1): 5-19.
doi:10.1080/0969594X.2012.714742.
Sadler, D. R. 2013b. Making Competent Judgments of Competence. In Modeling and
Measuring Competencies in Higher Education, edited by S. Blömeke, O. Zlatkin-
Troitschanskaia, C. Kuhn, and J. Fege, 13-27. Rotterdam: Sense Publishers.

Sadler, D. R. 2013c. Opening up Feedback: Teaching Learners to See. In
Reconceptualising Feedback in Higher Education: Developing Dialogue with Students,
edited by S. Merry, M. Price, D. Carless, and M. Taras, 54-63. London: Routledge.
Sadler, D. R. 2014a. Learning from Assessment Events: The Role of Goal Knowledge. In
Advances and Innovations in University Assessment and Feedback, edited by C. Kreber,
C. Anderson, N. Entwistle, and J. McArthur, 152-172. Edinburgh: Edinburgh University
Press.
Sadler, D. R. 2014b. The Futility of Attempting to Codify Academic Achievement
Standards. Higher Education 67 (3): 273-288. doi:10.1007/s10734-013-9649-1.
Schatzki, T. R. 2001. Practice Mind-ed Orders. In The Practice Turn in Contemporary
Theory, edited by T. R. Schatzki, K. K. Cetina, and E. von Savigny, 50-63. London:
Routledge.
Schrag, F. 2001. From Here to Equality: Grading Policies for Egalitarians. Educational
Theory 51 (1): 63-73. doi:10.1111/j.1741-5446.2001.00063.x.
Seligman, M. E. P., P. Railton, R. F. Baumeister, and C. Sripada. 2013. Navigating Into
the Future or Driven by the Past. Perspectives on Psychological Science 8 (2): 119-141.
doi:10.1177/1745691612474317.
Shavelson, R. J. 2010. Measuring College Learning Responsibly: Accountability in a New
Era. Stanford, CA: Stanford University Press.
Shavelson, R. J. 2013. An Approach to Testing and Modeling Competence. In Modeling
and Measuring Competencies in Higher Education, edited by S. Blömeke, O. Zlatkin-
Troitschanskaia, C. Kuhn, and J. Fege, 29-43. Rotterdam: Sense Publishers.
Sommers, N. 1980. Revision Strategies of Student Writers and Experienced Adult Writers.
College Composition and Communication 31 (4): 378-388. doi:10.2307/356588.
Sternglass, M. 1997. Time to Know Them: A Longitudinal Study of Writing and Learning at
the College Level. Mahwah, NJ: Erlbaum.
Strathern, M. 1997. Improving Ratings: Audit in the British University System.
European Review 5 (3): 305-321. doi:10.1002/(SICI)1234-981X(199707)5:33.0.CO;2-4.
Taylor, R. N. 1974. Nature of Problem Ill-Structuredness: Implications for Problem
Formulation and Solution. Decision Sciences 5 (4): 632-643.
doi:10.1111/j.1540-5915.1974.tb00642.x.
Tippin, G. K., K. D. Lafreniere, and S. Page. 2012. Student Perception of Academic
Grading: Personality, Academic Orientation, and Effort. Active Learning in Higher
Education 13 (1): 51-61. doi:10.1177/1469787411429187.
Tremblay, K. 2013. OECD Assessment of Higher Education Learning Outcomes
(AHELO): Rationale, Challenges and Initial Insights from the Feasibility Study. In
Modeling and Measuring Competencies in Higher Education, edited by S. Blömeke, O.
Zlatkin-Troitschanskaia, C. Kuhn, and J. Fege, 113-126. Rotterdam: Sense Publishers.
Trotter, E. 2006. Student Perceptions of Continuous Summative Assessment. Assessment
& Evaluation in Higher Education 31 (5): 505-521. doi:10.1080/02602930600679506.
Walker, J. 2010. Measuring Plagiarism: Researching What Students Do, Not What They
Say They Do. Studies in Higher Education 35 (1): 41-59.
doi:10.1080/03075070902912994.

Watty, K., M. Freeman, B. Howieson, P. Hancock, B. O'Connell, P. de Lange, and
A. Abraham. 2014. Social Moderation, Assessment and Assuring Standards for
Accounting Graduates. Assessment & Evaluation in Higher Education 39 (4): 461-478.
doi:10.1080/02602938.2013.848336.
Weissberg, R. 2013. Critically Thinking about Critical Thinking. Academic Questions 26
(3): 317-328. doi:10.1007/s12129-013-9375-2.
Whateley, M. K. G., and B. C. Scott. 2006. Evaluation Techniques. Chap. 10 in
Introduction to Mineral Exploration. 2nd ed., edited by Charles J. Moon, Michael K. G.
Whateley, and Anthony M. Evans, 199-252. Malden, MA: Blackwell.
Williams, G. 2010. Subject Benchmarking in the UK. In Public Policy for Academic
Quality: Analyses of Innovative Policy Instruments, edited by D. D. Dill, and M.
Beerkens, 157-181. Dordrecht: Springer. doi:10.1007/978-90-481-3754-1_9.
Williams, J. B. 2006. The Place of the Closed Book, Invigilated Final Examination in a
Knowledge Economy. Educational Media International 43 (2): 107-119.
doi:10.1080/09523980500237864.
Williams, J. B., and A. Wong. 2009. The Efficacy of Final Examinations: A Comparative
Study of Closed-Book, Invigilated Exams and Open-Book, Open-Web Exams. British
Journal of Educational Technology 40 (2): 227-236.
doi:10.1111/j.1467-8535.2008.00929.x.
Zinn, T. E., J. F. Magnotti, K. Marchuk, B. S. Schultz, A. Luther, and V. Varfolomeeva.
2011. Does Effort Still Count? More on What Makes the Grade. Teaching of
Psychology 38 (1): 10-15. doi:10.1177/0098628310390907.
Zorn, J. 2013. English Compositionism as Fraud and Failure. Academic Questions 26 (3):
270-284. doi:10.1007/s12129-013-9368-1.
