TOPIC 1
OVERVIEW OF ASSESSMENT: CONTEXT, ISSUES AND TRENDS
1.0 SYNOPSIS
1.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3.
1.2 FRAMEWORK OF TOPICS
[Framework diagram: Overview of Assessment: Context, Issues and Trends, covering definitions, purposes, and differences of various tests]
CONTENT
SESSION ONE (3 hours)
1.3 INTRODUCTION
1.4 Definitions
1.4.1 Test
The four terms above are frequently used interchangeably in academic discussions. A test is a subset of assessment intended to measure a test-taker's language proficiency, knowledge, performance or skills. Testing is a type of assessment technique. It is a systematically prepared procedure that happens at a point in time when a test-taker gathers all his abilities to achieve ultimate performance, because he knows that his responses are being evaluated and measured. A test is, first, a method of measuring a test-taker's ability, knowledge or performance in a given area; second, it must measure.
Bachman (1990), who is also quoted by Brown, defined a test as a process of quantifying a test-taker's performance according to explicit procedures or rules.
1.4.2 Assessment
Assessment is an often misunderstood term. Assessment is a comprehensive process of planning, collecting, analysing, reporting, and using information on students over time (Gottlieb, 2006, p. 86). Mousavi (2009) is of the opinion that assessment is appraising or estimating the level or magnitude of some attribute of a person. Assessment is an important aspect of the fields of language testing and educational measurement, and perhaps the most challenging part of them. It is an ongoing process in educational practice, which involves a multitude of methodological techniques. It can consist of tests, projects, portfolios, anecdotal information and student self-reflection. A test may be assessed formally or informally, subconsciously or consciously, and incidentally or intentionally by an appraiser.
1.4.3 Evaluation
Evaluation is another confusing term. Many confuse evaluation with testing. Evaluation does not necessarily entail testing. In reality, evaluation is involved when the results of a test (or other assessment procedure) are used for decision-making (Bachman, 1990, pp. 22-23). Evaluation involves the interpretation of information. If a teacher simply records numbers or makes check marks on a chart, it does not constitute evaluation. When a tester or marker evaluates, s/he values the results in such a way that the worth of the performance is conveyed to the test-taker. This is usually done with some reference to the consequences, good or bad, of the performance. This is commonly practised in applied linguistics
research, where the focus is often on describing processes, individuals, and
groups, and the relationships among language use, the language use
situation, and language ability.
1.4.4 Measurement
Measurement is the assigning of numbers to certain attributes of
objects, events, or people according to a rule-governed system. For our
purposes of language testing, we will limit the discussion to unobservable
abilities or attributes, sometimes referred to as traits, such as grammatical
knowledge, strategic competence or language aptitude. Similar to other types of assessment, measurement must be conducted according to explicit rules and procedures as spelled out in test specifications, criteria, and procedures for scoring. Measurement can be interpreted as the process of quantifying the observed performance of classroom learners. Bachman (1990) cautioned
us to distinguish between quantitative and qualitative descriptions. Simply
put, the former involves assigning numbers (including rankings and letter
grades) to observed performance, while the latter consists of written
descriptions, oral feedback, and non-quantifiable reports.
The relationships among test, measurement, assessment, and their
uses are illustrated in Figure 1.
a) research methodology;
b) practical advances;
c)
d)
e)
[Timeline diagram: milestones in the Malaysian assessment system: Pre-Independence; implementation of the Razak Report (1956); implementation of the Rahman Talib Report (1960); implementation of the Cabinet Report (1979); implementation of the Malaysia Education Blueprint (2013-2025)]
[Cycle diagram: the achievements of the Malaysian Examinations Syndicate: taking over the work of the Cambridge Examination Syndicate; putting in place an examination system to meet national needs; implementation of Malay as the National Language (1960); pioneering the use of the computer in the country (1967); recognition of examination certificates; implementation of the Open Certificate]
Tutorial question
Examine the contributing factors to the changing trends of
language assessment.
Create and present findings using graphic organisers.
TOPIC 2
2.0 SYNOPSIS
2.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
4.
5.
6.
2.2 FRAMEWORK OF TOPICS
[Framework diagram: Role and Purposes of Assessment in Teaching and Learning, covering the reasons/purposes of assessment; assessment of learning and assessment for learning; and types of tests: proficiency, achievement, diagnostic, aptitude, and placement tests]
CONTENT
SESSION TWO (3 hours)
2.3 Reasons/Purposes of Assessment
What approaches to assessment are most likely to maximise student learning and wellbeing? How best can we use assessment in the service of student learning and wellbeing? We have a traditional answer to these questions. Our traditional answer says that to maximise student learning we need to develop rigorous standardised tests given once a year to all students at approximately the same time. Then, the results are used for accountability, identifying schools for additional assistance, and certifying the extent to which individual students are meeting competency requirements.
Let us take a closer look at the two assessments below, i.e. Assessment of Learning and Assessment for Learning.
2.4 Assessment of Learning
Assessment of learning is the use of a task or an activity to measure, record and report on a student's level of achievement.
Types of Tests
The most common use of language tests is to identify strengths and weaknesses in students' abilities.
Aptitude Tests
This type of test no longer enjoys the widespread use it once had. An aptitude test is designed to measure general ability or capacity to learn a foreign language a priori (before taking a course) and ultimate predicted success in that undertaking. Language aptitude tests were seemingly designed to apply to the classroom learning of any language. In the United States, two standardised aptitude tests once in common use were the Modern Language Aptitude Test (MLAT; Carroll & Sapon, 1958) and the Pimsleur Language Aptitude Battery (PLAB; Pimsleur, 1966). Since there is no research to show unequivocally that these kinds of tasks predict communicative success in a language, apart from untutored language acquisition, standardised aptitude tests are seldom used today, with the exception of identifying foreign language disability (Stansfield & Reed, 2004).
Progress Tests
These tests measure the progress that students are making towards
defined course or programme goals. They are administered at various stages
throughout a language course to see what the students have learned,
perhaps after certain segments of instruction have been completed. Progress
tests are generally teacher-produced and are narrower in focus than
achievement tests because they cover a smaller amount of material and
assess fewer objectives.
Placement Tests
These tests, on the other hand, are designed to assess students' level
of language ability for placement in an appropriate course or class. This type
of test indicates the level at which a student will learn most effectively. The
main aim is to create groups, which are homogeneous in level. In designing a
placement test, the test developer may choose to base the test content either
on a theory of general language proficiency or on learning objectives of the
curriculum. In the former, institutions may choose to use a well-established
proficiency test such as the TOEFL or IELTS exam and link it to curricular
benchmarks. In the latter, tests are based on aspects of the syllabus taught
at the institution concerned.
TOPIC 3
3.0 SYNOPSIS
3.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
7.
8.
3.2 FRAMEWORK OF TOPICS
[Framework diagram: Types of Tests, covering norm-referenced and criterion-referenced tests; formative and summative tests; and objective and subjective tests]
CONTENT
SESSION THREE (3 hours)
3.3 Norm-Referenced and Criterion-Referenced Tests
Norm-referenced tests (NRTs) measure the knowledge and skills, in the aspect of academic achievement tests, that test-takers/students originally have, and their successive gains over time. As opposed to NRTs, CRTs focus on students' mastery of a subject matter (represented in the standards) along a continuum instead of ranking students on a bell curve. Table 3 below shows the differences between Norm-Referenced Tests (NRT) and Criterion-Referenced Tests (CRT).
Table 3: The differences between Norm-Referenced Tests (NRT) and Criterion-Referenced Tests (CRT)
Definition
NRT: A test that measures students' achievement as compared to other students in the group.
CRT: An approach that provides information on students' mastery based on a criterion specified by the teacher.
Purpose
NRT: Determine performance differences among individuals and groups.
CRT: Determine learning mastery based on a specified criterion and standard.
Test Item
NRT: From easy to difficult level, and able to discriminate examinees' ability.
CRT: Guided by minimum achievement in the related objectives.
Frequency
NRT: Continuous assessment.
CRT: Continuous assessment in the classroom.
Appropriateness
NRT: Summative evaluation.
CRT: Formative evaluation.
Example
NRT: Public exams: UPSR, PMR, SPM, and STPM.
CRT: Mastery tests: monthly tests, coursework, projects, exercises in the classroom.
3.5 Formative Test
A formative test or assessment, as the name implies, is a kind of assessment carried out while instruction is still in progress, so that the results can be used to give feedback and to shape further teaching and learning.
3.6 Summative Test
Common summative assessments in schools include final exams and national exams (UPSR, PMR, SPM, STPM); diagnostic tests and entrance exams are also commonly administered in schools (see Table 3.1: Common formative and summative assessments in schools).
3.7 Objective Test
According to BBC Teaching English, an objective test is a test that has right or wrong answers and so can be marked objectively. Common objective item types include:
i. Multiple-choice items/questions;
ii. True-false items/questions;
iii. Matching items/questions; and
iv. Fill-in-the-blank items/questions.
1.
2. Stem
Every multiple-choice item consists of a stem (the body of the item that presents the problem or question).
3. Options or alternatives
These are the list of possible responses to a test item. There are usually between three and five options/alternatives to choose from.
4. Key
This is the correct response. The response can either be the correct one or the best one. Usually, in a good item, the correct answer is not obvious compared with the distractors.
5. Distractors
A distractor is an incorrect option included to draw students away from the correct answer. An excellent distractor is almost the same as the correct answer, but it is not correct.
i.
ii.
iii. Make certain that the intended answer is clearly the only correct one;
iv.
3.8 Subjective Test
Contrary to an objective test, a subjective test is evaluated by giving an opinion, usually based on agreed criteria.
Exercise
1. Objective test items are items that have only one answer or correct response. Describe in depth the multiple-choice test item.
2.
Discussion
1. Identify at least three differences between formative and summative assessment.
2. What are the strengths of multiple-choice items compared to essay items?
3. Informal assessments are often unreliable, yet they are still important in classrooms. Explain why this is the case, and defend your explanation with examples.
4. Compare and contrast Norm-Referenced Tests with Criterion-Referenced Tests.
TOPIC 4
4.0 SYNOPSIS
4.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3.
4.2 FRAMEWORK OF TOPICS
[Framework diagram: Types of Tests, covering reliability, interpretability, validity, authenticity, practicality, washback effect and objectivity]
CONTENT
SESSION FOUR (3 hours)
4.3 INTRODUCTION
Assessment is a complex, iterative process requiring skills, knowledge and careful judgement.
4.4 RELIABILITY
Reliability means the degree to which an assessment tool produces stable and consistent results. A student's score can fluctuate for reasons unrelated to ability; a test-taker may, for instance, perform worse on the second half of a test due to fatigue, and so on. Thus, lack of reliability in the scores students receive is a threat to validity.
According to Brown (2010), a reliable test can be described as follows:
Factors that affect the reliability of a test include:
a. Test factors
b. Teacher and student factors
c. Environment factors
d. Test administration factors
e. Marking factors
b. Teacher and student factors
In most tests, it is normal for teachers to construct and mark the tests themselves.
c. Environment factors
An examination environment certainly influences test-takers and their performance.
d. Test administration factors
Because students' grades are dependent on the way tests are being administered, test administrators should strive to provide clear and accurate instructions, sufficient time and careful monitoring of tests to improve the reliability of their tests. A test-retest technique can be used to determine test reliability.
e. Marking factors
It is common that different markers award different marks for the same answer, even with a prepared mark scheme. A marker's assessment may vary from time to time and with different situations. Conversely, this does not happen with objective types of tests, since the responses are fixed. Thus, objectivity is a condition for reliability.
4.5 VALIDITY
Validity refers to the evidence base that can be provided about whether a test measures the criteria (concepts, skills and knowledge) relevant to the purpose of the examination. The important notion here is the purpose.
Content validity: Does the assessment content cover what you want to assess? Have satisfactory samples of language and language skills been selected for testing?
Construct validity: Are you measuring what you think you're measuring? Is the test based on the best available theory of language and language use?
Concurrent validity: Can you use the current test score to estimate scores on other criteria? Does the test correlate with other existing measures?
Types of validity:
a. Face validity
b. Content validity
c. Construct validity
d. Concurrent validity
e. Predictive validity
A test of speaking, for example, might assess juncture, (lack of) hesitations, and other elements within the construct of fluency. Tests are, in a manner of speaking, operational definitions
of constructs in that their test tasks are the building blocks of the entity
that is being measured (see Davidson, Hudson, & Lynch, 1985; T.
McNamara, 2000).
4.5.6 Practicality
Although practicality is an important characteristic of tests, it is often a limiting factor in testing. There will be situations in which, after we have already determined what we consider to be the most valid test, we need to reconsider the format purely because of practicality issues. A valid test of spoken interaction, for example, would require that the examinees be relaxed, interact with peers and speak on topics that they are familiar and comfortable with. This sounds like the kind of conversation that people have with their friends while sipping afternoon tea by the roadside stalls. Of course, such a situation would be a highly valid measure of spoken interaction if we could set it up. Imagine if we even tried to do so: it would require hidden cameras and recording equipment, which is hardly practical.
4.5.7 Objectivity
The objectivity of a test refers to the consistency of the teachers/examiners who mark the answer scripts: the extent to which an examiner awards the same score to the same answer script. A test is said to have high objectivity when the examiner is able to give the same score to similar answers, guided by the mark scheme. An objective test has the highest level of objectivity, because the scoring is not influenced by the examiner's skills and emotions. Meanwhile, a subjective test is said to have the lowest objectivity. Research has repeatedly shown that different examiners tend to award different scores to the same essay test. It is also possible that the same examiner would give different scores to the same essay if s/he were to re-check it at different times.
4.5.8 Washback effect
The term 'washback', or backwash (Hughes, 2003, p. 1), refers to the impact that tests have on teaching and learning. Such impact is usually seen as being negative: tests are said to force teachers to do things they do not necessarily wish to do. However, some washback can be positive, as when a well-designed test encourages teachers and learners to focus on genuinely useful language work.
4.5.9 Authenticity
Another major principle of language testing is authenticity. It is a concept that is difficult to define, particularly within the art and science of evaluating and designing tests. Citing Bachman and Palmer (1996), Brown (2010) defines authenticity as 'the degree of correspondence of the characteristics of a given language test task to the features of a target language task' (p. 23), and then suggests an agenda for identifying those target language tasks and for transforming them into valid test items.
4.6 Interpretability
Test interpretation encompasses all the ways that meaning is
assigned to the scores. Proper interpretation requires knowledge
about the test, which can be obtained by studying its manual and other
materials along with current research literature with respect to its
use; no one should undertake the interpretation of scores on any test
without such study. In any test interpretation, the following
considerations should be taken into account.
A. Consider Reliability: Reliability is important because it is a
prerequisite to validity and because the degree to which a score may
vary due to measurement error is an important factor in its
interpretation.
B. Consider Validity: Proper test interpretation requires knowledge of
the validity evidence available for the intended use of the test. Its
validity for other uses is not relevant. Indeed, use of a measurement
for a purpose for which it was not designed may constitute misuse.
The nature of the validity evidence required for a test depends upon its
use.
C. Scores, Norms, and Related Technical Features: The result of
scoring a test or subtest is usually a number called a raw score, which
by itself is not interpretable. Additional steps are needed to translate
the number directly into either a verbal description (e.g., pass or
fail) or into a derived score (e.g., a standard score). Less than full
understanding of these procedures is likely to produce errors in
interpretation and ultimately in counseling or other uses.
D. Administration and Scoring Variation: Stated criteria for score interpretation assume standard procedures for administering and scoring the test; departures from those procedures change how the scores can be interpreted.
TOPIC 5
5.0 SYNOPSIS
Topic 5 exposes you to the stages of test construction, the preparation of a test blueprint/test specifications, the elements in test specifications guidelines, and the importance of following the guidelines for constructing test items. Then we look at the various test formats that are appropriate for language assessment.
5.1 LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3. draw up a test specification that reflects both the purpose and the objectives of the test
4.
5.
6.
7.
8.
9.
5.2 FRAMEWORK OF TOPICS
[Framework diagram: stages of test construction; preparing the test blueprint/test specifications; guidelines for constructing test items; test format]
CONTENT
SESSION FIVE (3 hours)
5.3 Stages of Test Construction
i. determining
ii. planning
iii. writing
iv. preparing
v. reviewing
vi. pre-testing
vii. validating
5.3.1 Determining
The essential first step in testing is to make oneself perfectly
clear about what it is one wants to know and for what purpose. When
we start to construct a test, the following questions have to be
answered.
5.3.2 Planning
The first form that the solution takes is a set of specifications for
the test.This will include information on: content, format and timing,
criteria,levels of performance, and scoring procedures.
In this stage, the test constructor has to determine the content by addressing the following points:
• describing the purpose of the test;
• describing the characteristics of the test-takers, i.e. the nature of the population of examinees for whom the test is being designed;
• defining the nature of the ability we want to measure;
• developing a plan for evaluating the qualities of test usefulness (the degree to which a test is useful for teachers and students), which includes six qualities: reliability, validity, authenticity, practicality, interactiveness, and impact;
• identifying resources and developing a plan for their allocation and management;
• determining the format and timing of the test;
• determining levels of performance; and
• determining scoring procedures (a sketch of a complete specification follows below).
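To see how these planning decisions hang together, here is a minimal sketch of a test specification as a Python data structure; the field names and sample values are illustrative assumptions, not a prescribed format from the module.

    # A minimal, illustrative test specification (fields mirror the list above).
    test_spec = {
        "purpose": "placement into an intermediate English course",
        "test_takers": "adult learners, mixed language backgrounds",
        "ability": "reading comprehension of short factual texts",
        "usefulness_qualities": ["reliability", "validity", "authenticity",
                                 "practicality", "interactiveness", "impact"],
        "format": {"sections": 2, "items": 40, "timing_minutes": 60},
        "levels_of_performance": ["below basic", "basic", "proficient"],
        "scoring": "1 mark per item, no penalty for wrong answers",
    }

    # Print the specification field by field for review by the test team.
    for field, value in test_spec.items():
        print(f"{field}: {value}")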
5.3.3 Writing
Although writing items is time-consuming, writing good items is an art.
No one can expect to be able consistently to produce perfect items.
Some items will have to be rejected, others reworked. The best way to
identify items that have to be improved or abandoned is through
teamwork. Colleagues must really try to find fault; and despite the
seemingly inevitable emotional attachment that item writers develop to
items that they have created, they must be open to, and ready to
accept, the criticisms that are offered to them. Good personal relations
are a desirable quality in any test writing team.
5.3.4 Preparing
One has to understand the major principles and techniques of preparing test items, and have experience of doing so. Not every teacher makes a good tester. To construct different kinds of tests, the tester should observe some principles. In production-type tests, we have to bear in mind that no comments are necessary. Test writers should also try to avoid test items which can be answered through test-wiseness. Test-wiseness refers to the capacity of examinees to utilise the characteristics and formats of the test to guess the correct answer.
5.3.5 Reviewing
Principles for reviewing test items:
• The test should not be reviewed immediately after its construction; after an interval, the reviewer can look at the items with fresh eyes.
5.3.6 Pre-testing
After reviewing the test, it should be submitted to pre-testing.
• The tester should administer the newly developed test to a group of examinees similar to the target group; the purpose is to analyse every individual item as well as the whole test.
• Numerical data (test results) should be collected to check the efficiency of the items; this should include item facility and discrimination.
5.3.7 Validating
Item Facility (IF) shows to what extent an item is easy or difficult. Items should be neither too easy nor too difficult. To measure the facility, or easiness, of an item, the following formula is used:
IF = number of correct responses (c) / total number of candidates (N)
And to measure item difficulty, the proportion of wrong responses (w) is used instead:
w / N
The results of such equations range from 0 to 1. An item with a facility index of 0 is too difficult, and one with an index of 1 is too easy. The ideal item is one with a value of 0.5, and the acceptable range for item facility is between 0.37 and 0.63, i.e. less than 0.37 is difficult, and above 0.63 is easy.
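To make the formula concrete, here is a minimal sketch in Python; the function names and sample responses are illustrative, not from the module, and the classification thresholds are the 0.37 to 0.63 range given above.

    # Item facility (IF): the proportion of candidates answering an item correctly.
    def item_facility(responses):
        """responses: list of booleans, True meaning a correct answer."""
        correct = sum(1 for r in responses if r)
        return correct / len(responses)

    def classify_facility(if_value):
        # Acceptable range taken from the module: 0.37 to 0.63.
        if if_value < 0.37:
            return "difficult"
        if if_value > 0.63:
            return "easy"
        return "acceptable"

    # Illustrative data: 12 candidates, 6 of whom answered correctly.
    sample = [True, False, True, True, False, True,
              False, True, False, True, False, False]
    fi = item_facility(sample)
    print(fi, classify_facility(fi))  # 0.5 acceptable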
Thus, tests which are too easy or too difficult for a given sample
population, often show low reliability. As noted in Topic 4, reliability is
one of the complementary aspects of measurement.
5.4 Preparing the Test Blueprint / Test Specifications
The test specifications spell out, among other things, the skills to be included. Consider, for example, a programme in which students take separate listening and speaking classes, and the grammar class they attend serves to reinforce the grammatical forms that they have learnt in the two earlier classes.
Based on the scenario above, the test specs that you design might consist of four sequential steps:
1. a broad outline of how the test will be organised
2. which of the eight sub-skills you will test
3. what the various tasks and item types will be
4. how results will be scored, reported to students, and used in future
class (washback)
Besides knowing the purpose of the test you are creating, you are required to know as precisely as possible what it is you want to test. Do not construct a test hastily. Instead, examine carefully the objectives for the unit you are testing.
5.5
Taxonomy by allowing these two aspects, the noun and verb, to form
separate dimensions, the noun providing the basis for the Knowledge
dimension
and the verb forming the basis for the Cognitive Process
Level 1 C1
Categories & Cognitive Processes: Remember
Definition: retrieve relevant knowledge from long-term memory.
• Recognising (identifying): locating knowledge in long-term memory that is consistent with presented material
• Recalling (retrieving): retrieving relevant knowledge from long-term memory

Level 2 C2
Categories & Cognitive Processes: Understand
Definition: construct meaning from instructional messages, including oral, written, and graphic communication.
• Interpreting (clarifying, paraphrasing, representing, translating): changing from one form of representation to another
• Exemplifying (illustrating, instantiating): finding a specific example or illustration of a concept or principle
• Classifying (categorising, subsuming): determining that something belongs to a category
• Summarising (abstracting, generalising): abstracting a general theme or major point(s)
• Inferring (concluding, extrapolating, interpolating, predicting): drawing a logical conclusion from presented information
• Comparing (contrasting, mapping, matching): detecting correspondences between two ideas, objects, and the like
• Explaining (constructing models): constructing a cause-and-effect model of a system

Level 3 C3
Categories & Cognitive Processes: Apply
Definition: carry out or use a procedure in a given situation.
• Executing (carrying out): applying a procedure to a familiar task
• Implementing (using): applying a procedure to an unfamiliar task

Level 4 C4
Categories & Cognitive Processes: Analyse
Definition: break material into its constituent parts and determine how the parts relate to one another and to an overall structure or purpose.
• Differentiating (discriminating, distinguishing, focusing, selecting): distinguishing the relevant from the irrelevant, or the important from the unimportant, parts of presented material
• Organising (finding coherence, integrating, outlining, parsing, structuring): determining how elements fit or function within a structure
• Attributing (deconstructing): determining a point of view, bias, values, or intent underlying presented material

Level 5 C5
Categories & Cognitive Processes: Evaluate
Definition: make judgments based on criteria and standards.
• Checking (coordinating, detecting, monitoring, testing): detecting inconsistencies or fallacies within a process or product; determining whether a process or product has internal consistency; detecting the effectiveness of a procedure as it is being implemented
• Critiquing (judging): detecting inconsistencies between a product and external criteria; determining whether a product has external consistency; detecting the appropriateness of a procedure for a given problem

Level 6 C6
Categories & Cognitive Processes: Create
Definition: put elements together to form a coherent or functional whole; reorganise elements into a new pattern or structure.
• Generating (hypothesising): coming up with alternative hypotheses based on criteria
• Planning (designing): devising a procedure for accomplishing some task
• Producing (constructing): inventing a product
The Knowledge dimension comprises four categories:
• Factual knowledge: the basic elements students must know to be acquainted with a discipline or solve problems in it
• Conceptual knowledge: the interrelationships among the basic elements within a larger structure that enable them to function together
• Procedural knowledge: how to do something; methods of inquiry; and criteria for using skills, algorithms, techniques, and methods
• Metacognitive knowledge: knowledge of cognition in general, as well as awareness and knowledge of one's own cognition
In the SOLO taxonomy, we first learn one or a few aspects of the task (unistructural), then several aspects that remain unrelated (multistructural); then we learn how to integrate them into a whole (relational); and finally, we are able to generalise that whole to as yet untaught applications (extended abstract). The diagram below lists verbs typical of each such level.
5.6
Item writing requires attention to levels of thinking, and Bloom's taxonomy is often cited as a tool to use in item writing. Always write important questions that represent, and can predict, whether a test-taker is proficient at high levels of cognitive processing. A test should also contain easy, moderate and advanced test items; a reliable and valid test instrument should encompass all three levels of difficulty.
5.7 Test Format
What is the difference between test format and test type? For example, when you want to introduce a new kind of test, say a reading test which is organised a little differently from the existing test items, what do you say: test format or test type? Test format refers to the layout of questions on a test. For example, the format of a test could be two essay questions, 50 multiple-choice questions, and so on. For the sake of brevity, we will consider the outlines of some large-scale standardised tests.
UPSR
The Primary School Evaluation Test, also known as Ujian Penilaian Sekolah Rendah (commonly abbreviated UPSR), is a national examination taken by all pupils in our country at the end of their sixth year in primary school, before they leave for secondary school. It is prepared and examined by the Malaysian Examinations Syndicate. The test consists of two papers, namely Paper 1 and Paper 2. Paper 1 contains multiple-choice questions answered on a standardised optical answer sheet that uses optical mark recognition, while Paper 2 comprises three sections, namely Sections A, B, and C.
TOPIC 6
6.0 SYNOPSIS
Topic 6 focuses on ways to assess language skills and language content. It defines the types of test items used to assess language skills and language content. It also provides teachers with suggestions on ways a teacher can assess the listening, speaking, reading and writing skills in a classroom. It also discusses the concepts of, and differences between, discrete-point tests, integrative tests and communicative tests.
6.1 LEARNING OUTCOMES
At the end of Topic 6, teachers will be able to:
6.2 FRAMEWORK OF TOPICS
[Framework diagram: Assessing Language Skills and Language Content, covering the language skills (listening, speaking, reading, writing), language content (discrete, integrative and communicative tests), and objective and subjective testing]
CONTENT
SESSION SIX (6 hours)
6.2.1 Listening
Basically there are two kinds of listening tests: tests that test specific aspects of listening, like sound discrimination, and task-based tests which test skills in accomplishing different types of listening tasks considered important for the students being tested. In addition to this, Brown (2010) identified four types of listening performance from which assessment could be considered.
i. Intensive: listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.
ii. Responsive: listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.
iii. Selective: processing stretches of discourse, such as short monologues lasting several minutes, in order to scan for certain information. The purpose of such performance is not necessarily to look for global or general meaning, but to be able to comprehend designated information in a context of longer stretches of spoken language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.
iv. Extensive: listening to develop a top-down, global understanding of spoken language. Extensive performance ranges from listening to lengthy lectures to listening to a conversation and deriving a comprehensive message or purpose. Listening for the gist or the main idea, and making inferences, are all part of extensive listening.
6.2.2 Speaking
In the assessment of oral production, both discrete-feature objective tests and integrative task-based tests are used. The first type tests such skills as pronunciation, knowledge of what language is appropriate in different situations, and the language required for doing different things like describing, giving directions, giving instructions, etc. The second type involves finding out if pupils can perform different tasks using spoken language that is appropriate for the purpose and the context. Task-based activities involve describing scenes shown in a picture, participating in a discussion about a given topic, narrating a story, etc. As with the listening performance assessment tasks, Brown (2010) cited four categories for oral assessment.
6.2.3 Reading
A reading text can convey various kinds of meaning, and reading involves the interpretation or comprehension of these meanings. First, grammatical meanings are meanings that are expressed through linguistic structures, such as complex and simple sentences, and the correct interpretation of those structures. A second meaning is informational meaning, which refers largely to the concepts or messages contained in the text. Respondents may be required to comprehend merely the information or content of the passage, and this may be assessed through various means such as summary and précis writing. Compared to grammatical or syntactic meaning, informational meaning requires a more general understanding of a text rather than close attention to the linguistic structure of sentences. A third meaning contained in many texts is discourse meaning. This refers to the perception of rhetorical functions conveyed by the text. One typical function is discourse marking, which adds cohesiveness to a text. These words, such as unless, however, thus, therefore, etc., are crucial to the correct interpretation of a text, and students may be assessed on their ability to understand the discoursal features of a text.
6.2.4 Objective and Subjective Testing
There are many examples of each type of test. Objective-type tests include the multiple-choice test, true-false items and matching items, because each of these is graded objectively. In these examples of objective tests, there is only one correct response, and the grader does not need to subjectively assess the response.
Two other terms, select-type tests and supply-type tests, are related terms when we think of objective and subjective tests. In most cases, objective tests are similar to select-type tests, where students are expected to select or choose the answer from a list of options. Just as a multiple-choice test is an objective-type test, it can also be considered a select-type test. Similarly, tests involving essay-type questions are supply-type, as the students are expected to supply the answer through their essay. How then would you classify a fill-in-the-blank test? Definitely, for this type of test, the students need to supply the answer, but what is supplied is merely a single word or a short phrase, which differs tremendously from an essay. It may therefore be helpful to once again consider a continuum with supply-type and select-type items at each end.
It is not by accident that we find there are few, if any, test formats that are
either supply type and objective or select type and subjective. Select type
tests tend to be objective while supply type tests tend to be subjective.
In addition to the above, Brown and Hudson (1998) have also suggested
three broad categories to differentiate tests according to how students are
expected to respond. These categories are the selected response tests, the
constructed response tests, and the personal response tests. Examples of
each of these types of tests are given in Table 6.1.
Table 6.1: Examples of selected-response, constructed-response, and personal-response tests
Selected response: true-false; matching; multiple choice
Constructed response: fill-in; short answer; performance test
Personal response: conferences; portfolios; self- and peer assessments
Communicative Test
As language teaching has emphasised the importance of communication through the communicative approach, it is not surprising that communicative tests have also been given prominence. A communicative emphasis in testing involves many aspects, two of which revolve around communicative elements in tests and meaningful content. Both these aspects are briefly addressed in the following subsections.
In short, the kinds of tests that we should expect more of in the future will be communicative tests, in which candidates actually have to produce language in an interactive setting involving some degree of unpredictability, which is typical of any language interaction situation. These tests would also take the communicative purpose of the interaction into consideration and require the student to interact with language that is actual and unsimplified for the learner. Fulcher finally points out that in a communicative test, the only real criterion of success is the behavioural outcome, or whether the learner was able to achieve the intended communicative effect (p. 493). It is obvious from this description that the communicative test may not be easy to design, administer or score.
Exercise 1
1.
2.
TOPIC 7
7.0 SYNOPSIS
Topic 7 focuses on scoring, grading and assessment criteria. It provides teachers with brief descriptions of the different approaches to scoring, namely objective, holistic and analytic.
7.1 LEARNING OUTCOMES
7.2 FRAMEWORK OF TOPICS
[Framework diagram: approaches to scoring, namely objective, holistic and analytic]
CONTENT
SESSION SEVEN (3 hours)
7.2.1 Objective approach
One type of scoring approach is the objective scoring approach, which relies on quantified methods of evaluating students' writing. A sample of how objective scoring is conducted is given by Bailey (1999) in the form of a six-point rating scale with criteria for each band.
The 6-point scale includes broad descriptors of what a student's essay reflects for each band. It is quite apparent that graders using this scale are expected to pay attention to vocabulary, meaning, organisation and topic development.
Each component is assigned a weight, for example:
Content: 30 points
Organisation: 20 points
Vocabulary: 20 points
Language Use: 25 points
Mechanics: 5 points
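As a quick illustration of how the weighted components combine into a total mark, here is a minimal Python sketch; the weights follow the table above, while the sample marks for one essay are invented for the example.

    # Analytic scoring: each component is marked out of its weight,
    # and the component marks sum to a total out of 100.
    WEIGHTS = {
        "Content": 30,
        "Organisation": 20,
        "Vocabulary": 20,
        "Language Use": 25,
        "Mechanics": 5,
    }

    def total_score(marks):
        """marks: dict mapping component name -> mark awarded (<= its weight)."""
        for component, mark in marks.items():
            assert 0 <= mark <= WEIGHTS[component], f"{component} out of range"
        return sum(marks.values())

    # Invented sample marks for one essay:
    essay = {"Content": 24, "Organisation": 15, "Vocabulary": 16,
             "Language Use": 20, "Mechanics": 4}
    print(total_score(essay), "/", sum(WEIGHTS.values()))  # 79 / 100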
Advantages cited for these scoring approaches include: quick grading; providing a public standard that is understood by teachers and students alike; a relatively higher degree of rater reliability; applicability to the assessment of many different topics; emphasising the students' strengths rather than their weaknesses; providing clear guidelines for grading in the form of the various components; and allowing graders to consciously address important aspects of writing.
Disadvantages include: the single score may actually mask differences across individual compositions, and it does not provide a lot of diagnostic feedback.
EXERCISE
1.
TOPIC 8
8.0 SYNOPSIS
Topic 8 focuses on item analysis and interpretation. It provides teachers with brief descriptions of basic statistical terminology such as mode, median, mean, standard deviation, standard score and interpretation of data. It also looks at item analysis, which deals with item difficulty and item discrimination. Teachers will also be introduced to distractor analysis in language assessment.
8.2 FRAMEWORK OF TOPICS
[Framework diagram: Item Analysis and Interpretation. Basic statistics: mode, median, mean, standard deviation, standard score, interpretation of data. Item analysis: item difficulty, item discrimination, distractor analysis]
CONTENT
SESSION EIGHT (6 hours)
8.2.2 Standard deviation
Standard deviation refers to how much the scores deviate from the mean. There are two methods of calculating the standard deviation, the deviation method and the raw score method, illustrated by the following formulae:
Deviation method: SD = square root of [ Σ(X - mean)² / (N - 1) ]
Raw score method: SD = square root of [ ( ΣX² - (ΣX)² / N ) / (N - 1) ]
To illustrate this, we will use the scores 20, 25 and 30. Using the deviation method, we come up with the following table:
Table 8.1: Calculating the Standard Deviation Using the Deviation Method
Using the raw score method, we can come up with the following:
Table 8.2: Calculating the Standard Deviation Using the Raw Score Method
Both methods result in the same final value of 5. If you are calculating
standard deviation with a calculator, it is suggested that the deviation
method be used when there are only a few scores and the raw score
method be used when there are many scores. This is because when
there are many scores, it will be tedious to calculate the square of the
deviations and their sum.
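The two methods are easy to verify with a short Python sketch using the same three scores (20, 25 and 30); the N - 1 denominator is inferred from the worked value of 5 reported above, and the function names are illustrative.

    import math

    scores = [20, 25, 30]

    def sd_deviation_method(xs):
        # Sum of squared deviations from the mean, divided by N - 1.
        mean = sum(xs) / len(xs)
        ss = sum((x - mean) ** 2 for x in xs)
        return math.sqrt(ss / (len(xs) - 1))

    def sd_raw_score_method(xs):
        # The same quantity computed directly from raw scores,
        # without first calculating each deviation.
        n = len(xs)
        ss = sum(x * x for x in xs) - (sum(xs) ** 2) / n
        return math.sqrt(ss / (n - 1))

    print(sd_deviation_method(scores))   # 5.0
    print(sd_raw_score_method(scores))   # 5.0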
i. The Z score
The Z score is the basic standardised score. It is referred to as the basic form because other computations of standardised scores must first calculate the Z score. The formula used to calculate the Z score is as follows:
Z = (X - mean) / SD
where X is the raw score. Z score values are very small and usually range only from -2 to 2. Such small values make them inappropriate for score reporting, especially for those unaccustomed to the concept. Imagine what a parent might say if his child came home with a report card showing a Z score of 0.47 in English Language! Fortunately, there is another form of standardised score, the T score, with values that are more palatable to the relevant parties.
ii. The T score
The T score is a standardised score which can be computed using the formula 10(Z) + 50. As such, the T scores for students A, B, C, and D in Table 4.3 are 10(-1.28) + 50; 10(-0.23) + 50; 10(0.47) + 50; and so on.
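Both standardised scores take only a few lines of Python; the raw score, mean and standard deviation below are illustrative values, not data from the module.

    def z_score(raw, mean, sd):
        # How many standard deviations the raw score lies from the mean.
        return (raw - mean) / sd

    def t_score(raw, mean, sd):
        # Rescales the Z score to a mean of 50 and a standard deviation of 10.
        return 10 * z_score(raw, mean, sd) + 50

    # Illustrative: a raw score of 47 on a test with mean 42 and SD 7.
    z = z_score(47, 42, 7)   # about 0.71
    t = t_score(47, 42, 7)   # about 57.1
    print(round(z, 2), round(t, 1))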
Interpretation of data
The standardised score is a very important score if we want to compare performance across tests and between students. Let us take the following scenario as an example.
How can En. Abu solve this problem? He would need standardised scores in order to decide. This would require the following information:
Test 1: mean = 42, standard deviation = 7
Test 2: mean = 47, standard deviation = 8
Using the information above, En. Abu can find the Z score for each raw score reported, as follows:
Table 8.4: Z Score for Form 2A
Based on Table 8.4, both Ali and Chong have negative Z score totals across the two tests. However, Chong has the higher Z score total (-1.07 compared to -1.34) and therefore performed better when we take the performance of all the other students into consideration.
Similarly, the area between the mean and -1 standard deviation is also 34.13%. As such, the area between -1 and +1 standard deviations is 68.26%.
In using the normal curve, it is important to make a distinction between standard deviation values and standard deviation scores. A standard deviation value is a constant and is shown on the horizontal axis of the diagram above. The standard deviation score, on the other hand, is the score obtained when we use the standard deviation formula provided earlier. So, if we find the score to be 5, as in the earlier example, then the score for the standard deviation value of 1 is 5, for the value of 2 it is 5 x 2 = 10, and for the value of 3 it is 15, and so on. Standard deviation values of -1, -2, and -3 will have corresponding negative scores of -5, -10, and -15.
8.2.5 Item analysis
a. Item difficulty
Item difficulty refers to how easy or difficult an item is. The formula used to measure item difficulty is quite straightforward: it involves finding out how many students answered an item correctly and dividing this by the number of students who took the test. The formula is therefore:
Item difficulty = number of correct responses / number of students who took the test
Let's use the following instance as an example. Suppose you have just conducted a twenty-item test and obtained the following results:
As there are twelve students in the class, 33% of this total would be 4 students. Therefore, the upper group and the lower group will each consist of 4 students. Based on their total scores, the upper group would consist of students L, A, E, and G, while the lower group would consist of students J, H, D and I.
We now need to look at the performance of these students on each item in order to find the item discrimination index of each item.
For item 1, all four students in the upper group (L, A, E, and G) answered correctly, while only student H in the lower group answered correctly. Using the formula described earlier, we can plug in the numbers as follows:
Item discrimination = (4 - 1) / 4 = 0.75
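Here is a minimal Python sketch of the upper/lower-group procedure; the 33% fraction mirrors the worked example above, while the function names are illustrative assumptions.

    def split_groups(total_scores, fraction=0.33):
        """total_scores: dict of student -> total test score.
        Returns (upper, lower) lists holding the top and bottom fraction."""
        ranked = sorted(total_scores, key=total_scores.get, reverse=True)
        k = round(len(ranked) * fraction)   # 12 students -> groups of 4
        return ranked[:k], ranked[-k:]

    def discrimination_index(upper_correct, lower_correct, group_size):
        # D = (correct in upper group - correct in lower group) / group size.
        return (upper_correct - lower_correct) / group_size

    # Item 1 from the example: all 4 upper-group students answered correctly,
    # but only 1 lower-group student did.
    print(discrimination_index(4, 1, 4))  # 0.75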
Distractor analysis
Distractor analysis is an extension of item analysis, using techniques
that are similar to item difficulty and item discrimination. In distractor
analysis, however, we are no longer interested in how test takers select
the correct answer, but how the distractors were able to function
effectively by drawing the test takers away from the correct answer.
The number of times each distractor is selected is noted in order to
determine the effectiveness of the distractor. We would expect that the
distractor is selected by enough candidates for it to be a viable
distractor.
What exactly is an acceptable value? This depends to a large extent on
the difficulty of the item itself and what we consider to be an acceptable
item difficulty value for test items. If we are to assume that 0.7 is an
appropriate item difficulty value, then we should expect that the
remaining 0.3 be about evenly distributed among the distractors.
Let us assume that 100 students took the test. If we assume that A is the
answer and the item difficulty is 0.7, then 70 students answered correctly.
What about the remaining 30 students and the effectiveness of the three
distractors? If all 30 selected D, then distractors B and C are useless in
their role as distractors. Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and should be replaced.
Therefore, the ideal situation would be for each of the three distractors to
be selected by an equal number of all students who did not get the answer
correct, i.e. in this case 10 students. Therefore the effectiveness of each distractor can be quantified as 10/100 or 0.1, where 10 is the number of students who selected the distractor and 100 is the total number of students who took the test. This technique is similar to a difficulty index, although the
result does not indicate the difficulty of each item, but rather the
effectiveness of the distractor. In the first situation described in this
paragraph, options A, B, C and D would have a difficulty index of 0.7, 0, 0,
and 0.3 respectively. If the distractors worked equally well, then the indices
would be 0.7, 0.1, 0.1, and 0.1. Unlike in determining the difficulty of an
item, the value of the difficulty index formula for the distractors must be
interpreted in relation to the indices for the other distractors.
From a different perspective, the item discrimination formula can also be
used in distractor analysis. The concept of upper groups and lower groups
would still remain, but the analysis and expectation would differ slightly
from the regular item discrimination that we have looked at earlier. Instead
of expecting a positive value, we should logically expect a negative value
as more students from the lower group should select distractors. Each
distractor can have its own item discrimination value in order to analyse
how the distractors work and ultimately refine the effectiveness of the test
item itself.
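A short Python sketch of the distractor counts described above; the response data reproduce the ideal 70/10/10/10 split from the text, and the function name is illustrative.

    from collections import Counter

    def option_indices(choices):
        """choices: the option each test taker selected (e.g. 'A'..'D').
        Returns option -> proportion of all test takers selecting it."""
        counts = Counter(choices)
        n = len(choices)
        return {option: counts[option] / n for option in sorted(counts)}

    # Ideal case from the text: key A chosen by 70 of 100 students, with the
    # remaining 30 spread evenly across distractors B, C and D.
    choices = ["A"] * 70 + ["B"] * 10 + ["C"] * 10 + ["D"] * 10
    print(option_indices(choices))
    # {'A': 0.7, 'B': 0.1, 'C': 0.1, 'D': 0.1}: the key's index (0.7) is the
    # item difficulty, and each distractor's index shows its effectiveness.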
Table 8.6: Selection of Distractors
(For Items 1 to 5, the number of students selecting each of options A to D; the key, marked with *, was selected by 8 students for each of Items 1 to 4 and by 7 students for Item 5.)
For Item 1, the discrimination index for each distractor can be calculated using the discrimination index formula. From Table 8.5, we know that all the students in the upper group answered this item correctly and only one student from the lower group did so. If we assume that the three remaining students from the lower group all selected distractor B, then the discrimination index for Item 1, distractor B, will be:
(0 - 3) / 4 = -0.75
This negative value indicates that more students from the lower group selected the distractor compared to students from the upper group. This result is to be expected of a distractor, and a value between -1 and 0 is preferred.
EXERCISE
1. Calculate the mean, mode, median and range of the following set of
scores:
23, 24, 25, 23, 24, 23, 23, 26, 27, 22, 28.
2. What is a normal curve and what does this show? Does the final
result always show a normal curve and how does this relate to
standardised tests?
TOPIC 9
9.0 SYNOPSIS
Topic 9 focuses on reporting assessment data. It provides teachers with brief descriptions of the purposes of reporting and the reporting methods.
9.1 LEARNING OUTCOMES
By the end of Topic 9, teachers will be able to:
[Framework diagram: Reporting of Assessment Data: purposes of reporting; reporting methods]
CONTENT
SESSION NINE (3 hours)
9.2.2 Reporting methods
Student achievement and progress can be reported by comparing:
i. Norm-Referenced Assessment and Reporting
Assessing and reporting a student's achievement and progress in comparison to other students.
ii. Criterion-Referenced Assessment and Reporting
Assessing and reporting a student's achievement and progress in comparison to predetermined criteria. An outcomes approach to assessment will provide information about student achievement to enable reporting against a standards framework.
iii. An Outcomes Approach
This acknowledges that students, regardless of their class or grade, can be working towards syllabus outcomes anywhere along the learning continuum.
Is valid
Assessment strategies should accurately and appropriately assess
clearly defined aspects of student achievement. If a strategy does
not accurately assess what it is designed to assess, then its use is
misleading.
Valid assessment strategies are those that reflect the actual
intention of teaching and learning activities, based on syllabus
outcomes.
Where values and attitudes are expressed in syllabus outcomes,
these too should be assessed as part of student learning.
Is fair
Effective and informative assessment strategies are designed to
ensure equal opportunity for success regardless of students' age,
gender, physical or other disability, culture, background language,
socio-economic status or geographic location.
Engages the learner
Effective and informative assessment practice is student centred.
Ideally there is a cooperative interaction between teacher and
students, and among the students themselves.
The syllabus outcomes and the assessment processes to be used
should be made explicit to students. Students should participate in
the negotiation of learning tasks and actively monitor and reflect
upon their achievements and progress.
Values teacher judgement
Good assessment practice involves teachers making judgements, on the weight of assessment evidence, about student progress towards the achievement of outcomes. Teachers can be confident a student has achieved an outcome when there is sufficient evidence from a range of assessment activities.
TOPIC 10
10.0 SYNOPSIS
Topic 10 focuses on the issues and concerns related to assessment in Malaysian primary schools. It looks at how assessment is viewed and used in Malaysia.
10.1 LEARNING OUTCOMES
By the end of Topic 10, teachers will be able to:
[Framework diagram: Issues and Concerns in Malaysian Schools: exam-oriented system; cognitive levels of assessment; school-based assessment; alternative assessment]
CONTENT
SESSION TEN (3 hours)
10.3 Exam-oriented System
10.4 Cognitive Levels of Assessment
Bloom's taxonomy distinguishes six cognitive levels:
Knowledge
Comprehension
Application
Analysis
Synthesis
Evaluation
Knowledge
Recalling memorized information. May involve remembering a wide range of
material from specific facts to complete theories, but all that is required is the
bringing to mind of the appropriate information. Represents the lowest level
of learning outcomes in the cognitive domain.
Learning objectives at this level: know common terms, know specific facts,
know methods and procedures, know basic concepts, know principles.
Question verbs: Define, list, state, identify, label, name, who? when? where?
what?
Comprehension
The ability to grasp the meaning of material. Translating material from one
form to another (words to numbers), interpreting material (explaining or
summarizing), estimating future trends (predicting consequences or effects).
Goes one step beyond the simple remembering of material, and represents the lowest level of understanding.
Learning objectives at this level: understand facts and principles, interpret
verbal material, interpret charts and graphs, translate verbal material to
mathematical formulae, estimate the future consequences implied in data,
justify methods and procedures.
Question verbs: Explain, predict, interpret, infer, summarize, convert,
translate, give example, account for, paraphrase x?
Application
The ability to use learned material in new and concrete situations. Applying
rules, methods, concepts, principles, laws, and theories. Learning outcomes
in this area require a higher level of understanding than those under
comprehension.
Learning objectives at this level: apply concepts and principles to new situations, apply laws and theories to practical situations, solve mathematical problems.
Analysis
The ability to break down material into its component parts. Identifying parts,
analysis of relationships between parts, recognition of the organizational
principles involved. Learning outcomes here represent a higher intellectual
level than comprehension and application because they require an
understanding of both the content and the structural form of the material.
Learning objectives at this level: recognize unstated assumptions, recognize logical fallacies in reasoning, distinguish between facts and inferences,
evaluate the relevancy of data, analyze the organizational structure of a work
(art, music, writing).
Question verbs: Differentiate, compare / contrast, distinguish x from y, how
does x affect or relate to y? why? how? What piece of x is missing / needed?
Synthesis
(By definition, synthesis cannot be assessed with multiple-choice questions.
It appears here to complete Bloom's taxonomy.)
The ability to put parts together to form a new whole. This may involve the
production of a unique communication (theme or speech), a plan of
operations (research proposal), or a set of abstract relations (scheme for
classifying information). Learning outcomes in this area stress creative
behaviors, with major emphasis on the formulation of new patterns or
structure.
Learning objectives at this level: write a well organized paper, give a well
organized speech, write a creative short story (or poem or music), propose a
plan for an experiment, integrate learning from different areas into a plan for
solving a problem, formulate a new scheme for classifying objects (or events,
or ideas).
Question verbs: Design, construct, develop, formulate, imagine, create,
change, write a short story and label the following elements:
Evaluation
The ability to judge the value of material (statement, novel, poem, research
report) for a given purpose. The judgments are to be based on definite
criteria, which may be internal (organization) or external (relevance to the
purpose). The student may determine the criteria or be given them. Learning
outcomes in this area are highest in the cognitive hierarchy because they
contain elements of all the other categories, plus conscious value judgments
based on clearly defined criteria.
Learning objectives at this level: judge the logical consistency of written
material, judge the adequacy with which conclusions are supported by data,
judge the value of a work (art, music, writing) by the use of internal criteria,
judge the value of a work (art, music, writing) by use of external standards of
excellence.
Question verbs: Justify, appraise, evaluate, judge x according to given
criteria. Which option would be better/preferable to party y?
10.5
School-based Assessment
The traditional system of assessment no longer satisfies the educational and social needs of the third millennium. In the past few decades, many countries have made profound reforms in their assessment systems. Several educational systems have in turn introduced school-based assessment as part of, or instead of, external assessment in their certification. While examination bodies acknowledge the immense potential of school-based assessment in terms of validity and flexibility, at the same time they have to guard against or deal with difficulties related to reliability, quality control and quality assurance. In the debate on school-based assessment, the issue of why has been widely written about, and there is general agreement on the principles of validity of this form of assessment.
Izard (2001), as well as Raivoce and Pongi (2001), explains that school-based assessment (SBA) is often perceived as the process put in place to collect evidence of what students have achieved, especially in academic and non-academic domains:
Academic:
Non-academic:
Centralised Assessment
• Conducted and administered by teachers in schools using instruments, rubrics, guidelines, timelines and procedures prepared by LP
• Monitoring and moderation conducted by the PBS Committees at the school level, at the District and State Education Departments, and by LP
School Assessment
• The emphasis is on collecting first-hand information about pupils' learning based on curriculum standards
• Teachers plan the assessment, prepare the instruments and administer the assessment during the teaching and learning process
• Teachers mark pupils' responses and report their progress continuously
10.6 Alternative Assessment
Traditional assessment and alternative assessment can be contrasted as follows:
• One-shot tests vs continuous assessment
• Indirect tests vs direct tests
• Inauthentic tests vs authentic assessment
• Individual projects vs group projects
• No feedback to learners vs feedback to learners
• Speeded exams vs power exams
• Standardised exams vs classroom-based tests
• Summative vs formative
• Product of instruction vs process of instruction
• Intrusive vs integrated
• Judgmental vs developmental
• Teacher-proof vs teacher-mediated
Alternative assessments include:
• Physical demonstrations
• Pictorial products
• Reading response logs
• K-W-L (what I know / what I want to know / what I've learned) charts
• Dialogue journals
• Checklists
• Teacher-pupil conferences
• Interviews
• Performance tasks
• Portfolios
• Self-assessment
• Peer assessment
Portfolios
A well-known and commonly used alternative assessment is portfolio assessment. The contents of the portfolio become evidence of abilities, much like how we would use a test to measure the abilities of our students.
Bailey (1998, p. 218) describes a portfolio as containing four primary elements: an introductory section (with an overview and a reflective essay); a personal section; and an assessment section, which may include evaluation by peers, self-evaluation, journals, score reports, photographs and personal items.
Pupils may also rate their own ability on a scale from 4 to 1, with descriptors such as 'I have difficulty with some questions, but I generally get the meaning' at level 3; such self-rating scales stimulate meta-cognition.
EXERCISE
In your opinion, what are the advantages of using portfolios as
a form of alternative assessment?
REFERENCES
Allen, I. J. (2011). Repriviledging reading: The negotiation of uncertainty. Pedagogy: Critical Approaches to Teaching Literature, Language, Composition, and Culture, 12(1), pp. 97-120. Available at: http://pedagogy.dukejournals.org/cgi/doi/10.1215/153142001416540 (Retrieved September 26, 2013)
Alderson, J. C. (1986b). Innovations in language testing? In M. Portal (Ed.), Innovations in language testing (pp. 93-105). Windsor: NFER/Nelson.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.
Anderson, L. W. (Ed.), Krathwohl, D. R. (Ed.), Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., & Wittrock, M. C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's Taxonomy of Educational Objectives (Complete edition). New York: Longman.
Anderson, K. M. (2007). Differentiating instruction to include all students. Preventing School Failure, 51(3), pp. 49-54.
Bachman, L. F. (2004). Statistical analyses for language assessment (pp. 22-23). Cambridge: Cambridge University Press.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York, NY: Academic Press.
Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and the quality of intelligent behaviour. In H. Rowe (Ed.), Intelligence: Reconceptualization and measurement (pp. 57-75). Hillsdale, NJ: Lawrence Erlbaum.
Biggs, J. B., & Tang, C. (2009). Applying constructive alignment to outcomes-based teaching and learning. Training material, Quality Teaching for Learning in Higher Education Workshop for Master Trainers. Kuala Lumpur: Ministry of Higher Education.
Black, P., & Wiliam, D. (2009). Developing the theory of formative assessment. In J. Gardiner (Ed.), Educational Assessment, Evaluation and Accountability, 1(1), pp. 5-31. Available at: http://eprints.ioe.ac.uk/1119/ (Retrieved 23 August 2013)
Bloom, B. S. (Ed.), Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives. Handbook I: Cognitive domain. New York: David McKay.
In M. Celce-Murcia (Ed.), Beyond basics: Issues and research in TESOL (pp. 137-152). Rowley, MA: Newbury House.
Davidson, F., & Lynch, B. (2002). Testcraft: A teacher's guide to writing and using language test specifications. New Haven, CT: Yale University Press.
Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T. (1999). Dictionary of language testing. Cambridge: University of Cambridge Local Examinations Syndicate and Cambridge University Press.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105-146). New York, NY: Macmillan.
Gottlieb, M. (2006). Assessing English language learners: Bridges from language proficiency to academic achievement. USA: Corwin Press.
Grotjahn, R. (1986). Test validation and cognitive psychology: Some methodological considerations. Language Testing, 3, pp. 158-85.
Hattie, J. (2009). Visible learning. New York: Routledge.
Hattie, J. (2012). Visible learning for teachers: Maximizing impact on learning. Abingdon: Routledge.
Hattie, J., & Brown, G. (2004). Cognitive processes in asTTle: The SOLO taxonomy. asTTle Technical Report 43. University of Auckland/Ministry of Education.
Hook, P., & Mills, J. (2011). SOLO Taxonomy: A guide for schools. Book 1: A common language of learning. Laughton, UK: Essential Resources Educational Publishers.
Huang, S. C. (2012). English Teaching: Practice and Critique, 11(4), pp. 99-119.
Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge: Cambridge University Press.
Gavin, B. et al. (2008). An introduction to educational assessment, measurement and evaluation (2nd ed.). Australia: Pearson Education New Zealand.
Moseley, D., Baumfield, V., Elliott, J., Gregson, M., Higgins, S., Miller, J., & Newton, D. (2005). Frameworks for thinking: A handbook for teaching and learning. Cambridge: Cambridge University Press.
Mousavi, S. A. (2009). An encyclopedic dictionary of language testing (4th ed.). Tehran: Rahnama Publications.
Norleha Ibrahim. (2009). Management of measurement and evaluation module. Selangor: Open University Malaysia.
Nückles, M., Hübner, S., & Renkl, A. (2009). Enhancing self-regulated learning by writing learning protocols. Learning and Instruction, 19(3), pp. 259-271. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0959475208000558 (Retrieved March 26, 2013)
Oller, J. W. (1979). Language tests at school: A pragmatic approach. London: Longman.
Pearson, I. (1988). Tests as levers for change. In D. Chamberlain & R. Baumgardner (Eds.), ESP in the classroom: Practice and evaluation (Vol. 128, pp. 98-107). London: Modern English Publications.
Pimsleur, P. (1966). Pimsleur Language Aptitude Battery. New York, NY: Harcourt, Brace & World.
Name: NURLIZA BT OTHMAN (othmannurliza@yahoo.com)
Qualifications: