
HMEF5053

MEASUREMENT &
EVALUATION IN EDUCATION

Dr. Lee Leh Hong


sharonllh@yahoo.com
012-204 8891
TOPIC 1
WHY ASSESS?
By the end of this topic, you should be able to:

1. Describe the imperial examination in China;


2. Identify important events in the development of educational
testing in the United States;
3. Differentiate between measurement, evaluation and assessment;
4. Explain the purposes of assessment;
5. List the differences between formative and summative assessment;
and
6. Justify when norm-referenced and criterion-referenced tests are adopted.
HISTORY
Early evidence of educational testing comes from
China, where it was called the Imperial Examination
System.

Originated during the Han Dynasty around 115 AD
and was introduced during the Sui Dynasty around
600 AD.

Fully implemented in China during the Song
Dynasty (960-1279); the examination system was
conducted for over 1300 years.
Examination hall with
7500 cells, Guangdong,
1873.

http://www.chinatoday.com.cn/English/e2008/e200802/p54.htm
The imperial examination was a highly competitive endeavour which required
both effort and luck. For every round of examinations, only a selected few were
awarded top honours among tens of thousands of candidates. Many were
tempted to take shortcuts to success by cheating. A variety of tools and tricks
were used, including bribery, and cheats were caught frequently. This is a
replica of a linen vest which was used for cheating in the imperial examination
during the Qing dynasty. Hidden inside are 62 "eight-legged essays" with a total
of more than 40,000 characters. This artefact is from the Shanghai Jiading
Museum.
Development of Modern Educational
Measurement

Sir Francis Galton
(16 Feb 1822 – 17 Jan 1911)
Studied human abilities and was the first to apply
statistical methods to the measurement of human
differences and heredity. Introduced questionnaires
and surveys for collecting data on human abilities
and competencies.

Alfred Binet
(11 July 1857 – 18 Oct 1911)
French psychologist who invented the first usable
intelligence test, known at the time as the Binet test
and today referred to as the IQ test.
Charles Edward Spearman
(1863-1945)
British psychologist known for his
seminal work on the testing and
measuring of human intelligence. In
1904 he posited, in the
prestigious American Journal of
Psychology, his famous two-factor
theory of intelligence. Its components
were soon abbreviated to g to represent
general intelligence and s for specific
ability.
Lewis Terman
(15 Jan 1877 – 21 Dec 1956)
American psychologist, noted as a pioneer in
educational psychology in the early 20th century at
the Stanford University School of Education. He is
best known as the inventor of the Stanford-Binet IQ
test.

David Wechsler
(1896–1981)
The Wechsler Adult Intelligence Scale (WAIS IQ Test),
first published in 1955, originated in a revision of the
Wechsler-Bellevue IQ test (1939), itself a battery of
tests composed from subtests Wechsler conscripted
from Yerkes' Army Tests (Yerkes, 1921). The WAIS IQ
test measures general intelligence, which Wechsler
defined as "The global capacity of a person to act
purposefully, to think rationally, and to deal
effectively with his/her environment."
Thurstone (1887-1955) was responsible for
the standardized mean and standard
deviation of IQ scores used today, as opposed
to the intelligence test system originally used
by Alfred Binet. He is also known for the
development of the Thurstone scale.

Raymond Bernard Cattell
(20 March 1905 – 2 February 1998)
Known for identifying the dimensions of personality,
he also studied basic dimensions of other domains:
intelligence, motivation and vocational interests.
Cattell theorized the existence of fluid and
crystallized intelligences to explain human cognitive
ability, and authored the Culture Fair Intelligence
Test to minimize the bias of written language and
cultural background in intelligence testing.

Joy Paul Guilford
(1897-1987)
According to Guilford's Structure of Intellect (SI)
theory, an individual's performance on intelligence
tests can be traced back to the underlying mental
abilities or factors of intelligence. SI theory
comprises up to 150 different intellectual abilities
organized along three dimensions: Operations,
Content, and Products.
What do you understand by
the following terms?

 Measurement (Pengukuran)
 Evaluation/Assessment
(Penilaian/Pentaksiran)
 Test (Ujian)
TEST
A set of tasks/questions administered to a person
during a fixed period of time under standardised
conditions to obtain a score that has a specific
psychometric property (e.g. ability, skill, attitude,
etc.)
MEASUREMENT
The act of assigning a number to a particular
attribute/characteristic of a person.
ASSESSMENT/EVALUATION
The process of collecting information to make
decisions/value judgment about a person’s ability,
skill, attitude, etc.
All three concepts may be involved in a
single process

Example:
To determine a pupil’s performance in
Mathematics, a teacher may assign him
a task (test) to obtain a numeric score
(measurement). Based on the score, the
teacher decides whether this particular
pupil is good, average or poor in Maths
(evaluation/assessment)

Can you think of an example of your own?
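The Mathematics example above can be sketched in a few lines of Python. This is only an illustration: the answer key, the pupil's responses and the cut-off scores (80 and 50) are hypothetical values chosen for the sketch, not figures from this course.

```python
# Hypothetical answer key for a five-item Maths TEST (assumed for illustration).
ANSWER_KEY = ["b", "d", "a", "c", "a"]


def measure(responses, key=ANSWER_KEY):
    """MEASUREMENT: assign a number (percentage correct) to the performance."""
    correct = sum(r == k for r, k in zip(responses, key))
    return 100 * correct / len(key)


def evaluate(score, good_cutoff=80, average_cutoff=50):
    """EVALUATION: make a value judgement from the score.

    The cut-offs are assumptions; a real teacher or school sets its own.
    """
    if score >= good_cutoff:
        return "good"
    if score >= average_cutoff:
        return "average"
    return "poor"


# The pupil's answers to the test (hypothetical data).
pupil_responses = ["b", "d", "a", "a", "a"]

score = measure(pupil_responses)   # measurement
judgement = evaluate(score)        # evaluation/assessment
print(score, judgement)
```

The test supplies the responses, measurement turns them into a number, and evaluation attaches a judgement to that number. Changing the cut-offs changes the judgement but not the measurement.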


WHY ASSESS?

Task for Discussion


As a teacher, you are aware that your
pupils have often been assessed,
either by you or a higher authority.
Make a list of reasons why it is
necessary that they be assessed
throughout their schooling.
PURPOSES OF ASSESSMENT
 TO IMPROVE LEARNING
Diagnosis
Exceptionality
Placement
For Communication to parents
Certification
School Administration & Counselling
 TO IMPROVE TEACHING
Were the intended learning
outcomes achieved?
Were the teaching
methods/strategies effective?
Did the teaching take into
consideration learners’ factors (e.g.
prior knowledge, ability levels,
learning styles)?
Were the teaching materials effective?
Were particular teachers more
effective than the others? Why?
FORMATIVE ASSESSMENT AND SUMMATIVE ASSESSMENT

Timing
• Formative: Conducted throughout the teaching-learning process
• Summative: Conducted at the end of a teaching-learning phase (e.g. end of year or semester)

Aim
• Formative: To assess learning progress (on-going); to identify needs for remediation or enrichment
• Summative: To assess achievement of the instructional goals of a course/programme (terminal); to certify students and improve curriculum

Method
• Formative: Paper & pencil tests, observations, quizzes, exercises, etc.
• Summative: Paper & pencil tests, practical tests

Example
• Formative: Monthly tests, weekly quizzes, daily reports, etc.
• Summative: Final exam, qualifying tests, national exam (PMR, UPSR, SPM, STPM, etc.)
FORMATIVE Vs SUMMATIVE

[Figure: formative assessment and summative assessment shown in relation to the course of study]
• Task I: Class Discussion
A Science teacher and a Maths teacher have carried out
their monthly tests and are discussing the performance
of pupil A in the class.

• Sc Tr.: Pupil A has performed as well as or
better than 80% of the pupils in the
class in the test.
• Maths Tr.: Pupil A has obtained 80% of the
total marks in the test.

What type of test has been administered by each
teacher? Are the two tests different? What are the
differences? What are the similarities?
The Sc Teacher has compared pupil A’s
performance with the rest of the pupils in the
class: NORM-REFERENCED TEST
(UJIAN RUJUKAN NORMA)

The Maths Teacher has compared pupil A’s
performance with a predetermined criterion:
CRITERION-REFERENCED TEST
(UJIAN RUJUKAN KRITERIA)

A CRT determines what the students can or cannot do,
NOT how good they are compared with others, as in
an NRT.
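The two teachers' statements correspond to two different calculations, sketched below in Python. All scores, the total marks and the 70% cut-off are hypothetical values assumed for illustration, and the percentile-rank definition used here ("as well as or better than") is one of several in common use.

```python
def percentile_rank(score, class_scores):
    """Norm-referenced: % of the class scoring at or below this score."""
    at_or_below = sum(s <= score for s in class_scores)
    return 100 * at_or_below / len(class_scores)


def percent_of_marks(raw, total):
    """Criterion-referenced: % of the total marks obtained."""
    return 100 * raw / total


def meets_criterion(raw, total, cutoff=70):
    """Compare the result with a predetermined criterion (assumed cut-off)."""
    return percent_of_marks(raw, total) >= cutoff


# Science test: hypothetical scores of all ten pupils (pupil A scored 61).
science_scores = [35, 40, 42, 48, 50, 55, 58, 61, 70, 75]
science_rank = percentile_rank(61, science_scores)

# Maths test: pupil A obtained 40 out of a hypothetical total of 50 marks.
maths_percent = percent_of_marks(40, 50)
maths_pass = meets_criterion(40, 50)

print(science_rank, maths_percent, maths_pass)
```

In this made-up data both results come out as "80", but they mean different things: the Science figure depends on how the other pupils performed, while the Maths figure depends only on pupil A's own marks and the chosen criterion.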
NORM-REFERENCED TESTS (NRT) AND CRITERION-REFERENCED TESTS (CRT)

AIM
• NRT: Compare a student’s performance with other students; select students for certification
• CRT: Compare a student’s performance against some criteria; determine how much a student has learnt; improve the teaching & learning process

QUESTION
• NRT: Vary from simple to difficult; omit very easy & very hard items
• CRT: Possess similar difficulty related to the criteria; match item difficulty with learning tasks

CONTENT
• NRT: Wide coverage
• CRT: Specific aspects

REPORT
• NRT: Grades are assigned
• CRT: No grades are assigned

EXAMPLE
• NRT: UPSR, PMR, SPM
• CRT: Class tests, assignments, exercises
SIMILARITIES

• Both require specification of achievement
domain to be measured
• Both require a relevant sample of test
items
• Both can use the same types of test items
• Both are judged by the same qualities of
goodness (validity & reliability)
• Both are useful in educational assessment
THANK YOU
