
Week 10

1. Apply formative assessment to identify the achievement of learning outcomes
2. Identify suitable teaching and learning techniques

Analysis of teaching methods (4)
Evaluation
• Remediation/correction and reinforcement/enrichment
Learning Framework
…but have you answered the questions all
learners need to know?

 Where do I need to go?


 Why should I go there?
 How will I get there?
 How will I know when I’ve arrived?
Common Test Types and Characteristics

True–False / Yes–No
 Advantages: Easy to construct
 Disadvantages: Can be ambiguous; can reinforce incorrect information; enables guessing
 Best utilized: To measure recall and comprehension of facts

Multiple Choice
 Advantages: Easy to score and statistically analyse; can be constructed to measure analysis and synthesis of information
 Disadvantages: Difficult to construct; students may answer using unintentionally hidden clues
 Best utilized: To measure comprehension; to measure higher cognitive skills

Matching
 Advantages: Popular with students; can be constructed to include a broad range of information
 Disadvantages: Difficult to construct; enables students to answer by process of elimination
 Best utilized: To measure comprehension by comparing information

Short Answer / Open-Ended
 Advantages: Easy to construct; adaptable to specific subject content
 Disadvantages: Difficult to score, as more than one answer may be correct
 Best utilized: To measure recall of facts and specific knowledge

Fill in the Blank
 Advantages: Can be more focused and easily scored
 Disadvantages: Difficult to score when more than one answer may be correct
 Best utilized: To measure recall of facts and specific knowledge
Written Tests

Selected-response test
 Characteristics: Objective; choose among alternatives; assesses foundational knowledge
 Advantages: Efficiency
 Disadvantages: Focus on verbatim memorization

Short-answer test
 Characteristics: Objective; asks students to supply information from memory; assesses foundational knowledge
 Advantages: Relatively easy to write; allows for breadth
 Disadvantages: Focus on verbatim memorization

Essay test
 Characteristics: Asks students to discuss one or more related ideas according to certain criteria
 Advantages: Assesses higher-level abilities
 Disadvantages: Lack of consistency of grading
Ways to Measure Student Learning

 Written tests
- Selected-response tests
- Short-answer tests
- Essay tests
 Performance tests
- Direct writing assessments
- Portfolios
- Exhibitions
- Demonstrations
Conceptions of Learning (Saljo, 1979)

1. Learning as a quantitative increase in knowledge. Learning is acquiring information or "knowing a lot".
2. Learning as memorising. Learning is storing information that can be reproduced.
3. Learning as acquiring facts, skills and methods that can be retained and used as necessary.
4. Learning as making sense or abstracting meaning. Learning involves relating parts of the subject matter to each other and to the real world.
5. Learning as interpreting and understanding reality in a different way. Learning involves comprehending the world by re-interpreting knowledge.
Analytic Framework: Broad Conceptions of Teaching

Type I (Teacher-Focused)
 A: Teaching as transmitting concepts of the syllabus
 B: Teaching as transmitting the teacher's knowledge

Type II (Student-Focused)
 C: Teaching as helping students acquire concepts of the syllabus
 D: Teaching as helping students acquire the teacher's knowledge
Results: Type I Conception (Teacher-Focused)

"[Teaching] is a transfer of knowledge from somebody who accumulates [a] certain amount of knowledge to people who are recipient[s] of the knowledge" (Professor of Medicine)

* Focus on transfer of information
* Students' prior knowledge not considered
* Students are passive recipients
Results: Type II Conception (Student-Focused)

"I don't give [students] recipes. I expect them to understand the concepts behind what they're doing and I expect them to learn so they are able to do the experiments on their own. And you know, problem solve and design experiments later. So I guess I'm always teaching so they can function on their own as scientists. But they are expected to know the concepts, and not just how you do it, but why you do it, I guess." (Preventative Medicine)

 Teaching viewed as a facilitative process, not a matter of transmission
 Focus of teaching on helping students discover knowledge that the teacher, as an expert, already holds
 Students' ability to understand the relationships among provided concepts is a valued component of learning & professional development
Results: Type III Conception (Learner-Focused)

"The goal of teaching is to bring about changes in the student so that they can better understand the given area, so it's about changing the conceptual representations of students such that they can go out and not only be able to receive, go out on their own and recover more information outside of the classroom, but also generate their own questions." (Teacher of Linguistics)

 Teaching helps students develop & change their conceptions of subject matter
 Does not expect students to adopt the teacher's own worldview, but to create their own perspective on the subject material


Kirkpatrick's Four Levels of Evaluation

Figure 1. Kirkpatrick's Four Levels of Evaluation (pyramid, bottom to top): Reactions, Learning, Behavior, Results
Kirkpatrick's Four Levels of Evaluation
1. Reactions: Measures how students have reacted to the training – program evaluation sheets
2. Learning: Measures what students have learned from the training – individual pre- and post-tests for comparison
3. Behavior: Measures whether what was learned is being applied in practice – observations and feedback from others
4. Results: Measures whether the application of learning in class is achieving results – difficult to measure

* Each successive level of evaluation builds upon the evaluations of the previous level. Each successive level adds precision to the measure of effectiveness, but requires more time-consuming analysis and increased costs.
Evaluation Frameworks: Kirkpatrick's Model
Level 4: Does it matter? Does it advance strategy?
Level 3: Are they doing it (objectives) consistently and appropriately?
Level 2: Can they do it (objectives)? Do they show the skills and abilities?
Level 1: Did they like the experience? Satisfaction? Use? Repeat use?
Before Instruction
• Student Records
- portfolio
- report cards
- info cards
- anecdotal notes
 Interest Inventory
 Observations
 KWL (focus on K and W)
 Class discussions
During Instruction (Formative)
 Observational Checklist
 Anecdotal Notes
 Class work
 Conference Notes (writing/reading)
 Questioning
After Instruction (Summative)
Traditional
* Paper/pencil tests

Alternative
* Projects
* Portfolios
* Presentations
Designing Pre/Post Assessments
Selected Response Items
 Alternative Response
 Matching
 Multiple Choice
Alternative Response Items
• Stem contains a declarative statement
• Response choices (true/false, yes/no, right/wrong, correct/incorrect, fact/opinion, agree/disagree)
• Measures the ability to correctly identify the correctness of statements of fact, definitions of terms, or statements of principles (simple learning outcomes)
Sample Alternative Response Items
T F 1. The green coloring material in a plant leaf is called chlorophyll.
Y N 2. Is 50% of 38 more than 18?
T F 3. The Constitution of the United States is the highest law of the country.
T F 4. The earth is a planet.
T F O 5. There are intelligent life forms on other planets.
Tips to Remember
• If you are simply doing T/F, stay away from opinion statements
• Keep the stem clear and concise (avoid complex sentences)
• Do not use subjective words such as frequently, most, some, few, usually, often, etc.
• Do not use absolute terms such as always, never, all, none, or only
• Avoid the use of negative terms: no, not
• Keep a balanced response set
Advantages and Limitations
 Efficient
 Easy to construct
 Provides for a wide sampling of material
 Limited to measuring at the knowledge level
 Susceptible to guessing
Matching Items
 Measures simple associations/knowledge level
 Student is given a stem to match with the correct response
Sample Matching Items
(small sample set only – typically 7-10 items in a set)
Write the letter for the term in Column B that matches the description in Column A. Each term is used only once.

Column A                                           Column B
___ A number divisible by itself and one           A. Prime number
___ A symbolic representation of a whole number    B. Irrational number
___ A number that can be represented by a ratio    C. Numeral
    of whole numbers
___ A positive or negative whole number            D. Integer
                                                   E. Rational number
Alternate Matching Form
Read each example. Then write the name of the literary technique beside the example. A literary technique may be used more than once.

Personification   Simile   Metaphor   Alliteration

____ The flame of Nadia's newfound knowledge burned inside her
____ The kitten studied the ball of clay carefully, taking stock of its shape and size, trying to decide whether it was going to attack him
____ He helped himself to a hefty helping of hash brown potatoes
____ Her apartment was like the city dump
____ Juan felt as light as air, filled with joy at the news
____ The stump sat upright, looking down over the clear-cut valley with disdain
Advantages/Limitations
 Advantages: compact, easy form; easy to construct; easy to score
 Limitations: only measures basic recall; sometimes difficult to have enough items to develop a homogeneous set
Multiple Choice
 Measures simple to complex learning goals/objectives
 A typical item includes a stem and a list of distracters
Key Ideas
• The stem must be clear and concise; the reader should know what the answer should be without looking at the responses
• Do not leave a dangling stem – use a blank or convert to a question
• Blanks should be left near the end of the stem
• Convert stems to questions when possible
• Avoid negatives (not, no, except, least, etc.)
• Only one correct answer! Avoid "A and B", "B and C", "B and A but not C", "All of the above," "None of the above"
• Responses should be grammatically correct and approximately the same length
• All distracters should be plausible
• Avoid clues in the stem
Sample Item
 Who was the main character of the story?
a. Grandpa Jones
b. Grandma Jones
c. Cousin Ralph
d. Anabel Jones
Sample MC Item
 Mary had tickets to the movies. Each ticket cost 6 dollars.
What was the total cost of the tickets?
A. 12 dollars
B. 18 dollars
C. 24 dollars
D. 30 dollars
Advantages/Limitations
• Advantages: Measures a variety of learning outcomes (knowledge – application); item specificity eliminates vagueness; forces the student to know what is correct; greater reliability compared to alternative response items; can identify misconceptions
• Limitations: Does not move beyond the application phase; writing appropriate distracters can be difficult
Constructed Response Items
 Completion
 Short – answer
 Essay
 Explanations
 Writing Prompt
Short Answer/Completion Items
• Short answer items use a direct question or task
• Completion items consist of an incomplete statement
• Used for measuring a wide variety of simple learning outcomes: knowledge of terminology, facts, principles, methods or procedures; interpretation of data; solving numerical problems
Short Answer Sample Items
• What is the name of the man who invented the steamboat?
__________ __________
(first name) (last name)

 What device is used to detect whether an electric charge is positive or negative? _______

 Name three organs in the digestive system.
__________ __________ _________
Completion Sample Items
 The name of the man who invented the
steamboat is _____________

 A member of the United States Senate is


elected to a term of ___________ years
Advantages and Limitations
 Easy to construct
 Students must supply the answer, which limits guessing
 Not suitable for measuring complex learning outcomes (knowledge/comprehension)
 Difficulty of scoring (partial answers)
Helpful Hints
 Do not start the item with a blank
 Provide enough clues to lead to the correct answer
 Do not include too many blanks (at most two blanks)
 Blanks for answers should be equal in length
 If the answer is expressed in numerical units, indicate the type of answer: ___ lb. ___ oz.


Essay Items
 The response elicits one or more paragraphs from a student
 Measures a student's ability to synthesize, evaluate, and compose
 Two types: restricted response and extended response
Essay Items

Restricted Response
 Limits the form and content of the response
Example: In a paragraph, describe two functions of the digestive system. (6 points)

Extended Response
• Few boundaries
• More extensive response
• May want to limit length ("use no more than two pages")
Helpful Hints for Essay Items
 Nurture concise responses; convey clear expectations for the response to students
- structure questions: "name two", "list three", "in a paragraph…"
 Provide a value for each item
- "2 points" versus "5 points"
Helpful Hints
 Proofread your items carefully
 Is the language grade level appropriate?
 Is the layout grade level appropriate?
 Did you provide a space for name and date?
Best Practice Tips
 Check for alignment with instructional objectives!
 Make sure each item measures an instructional objective
 The assessment should consist of more than one item per objective

ANALYZING AND USING TEST ITEM
DATA
Purpose of Item Analysis
 Evaluates the quality of each item
 Rationale: the quality of the items determines the quality of the test (i.e., reliability & validity)
 May suggest ways of improving the measurement of a test
 Can help with understanding why certain tests predict some criteria but not others

Item Analysis
•When analyzing test items, we have several questions about the performance of each item. Some of these questions include:
•Are the items congruent with the test objectives?
•Are the items valid? Do they measure what they are supposed to measure?
•Are the items reliable? Do they measure consistently?
•How long does it take an examinee to complete each item?
•Which items are most difficult to answer correctly?
•Which items are easy?
•Are there any poorly performing items that need to be discarded?
Types of Item Analysis for CTT (classical test theory)
 Three major types:
1. Assess the quality of the distracters
2. Assess the difficulty of the items
3. Assess how well an item differentiates between high and low performers
Purposes and Elements of Item Analysis
 To select the best available items for the final form of the test.
 To identify structural or content defects in the items.
 To detect learning difficulties of the class as a whole.
 To identify the areas of weakness of students in need of remediation.
Three Elements of Item Analysis
1. Examination of the difficulty level of the items,
2. Determination of the discriminating power of each item, and
3. Examination of the effectiveness of distractors in multiple-choice or matching items.
The difficulty level of an item is known as the index of difficulty.

The index of difficulty is the percentage of students answering each item in the test correctly.

The index of discrimination refers to the percentage of high-scoring individuals responding correctly versus the number of low-scoring individuals responding correctly to an item. This numeric index indicates how effectively an item differentiates between the students who did well and those who did poorly on the test.
Preparing Data for Item Analysis
1. Arrange the test scores from highest to lowest.
2. Get one-third of the papers from the highest scores and another third from the lowest scores.
3. Record separately the number of times each alternative was chosen by the students in both groups.
4. Add the number of correct answers to each item made by the combined upper and lower groups.
5. Compute the index of difficulty for each item with the following formula:

IDF = (NRC / TS) × 100

Where IDF = index of difficulty
NRC = number of students responding correctly to an item
TS = total number of students in the upper and lower groups
6. Compute the index of discrimination, based on the formula:

IDN = (CU – CL) / NSG

Where IDN = index of discrimination
CU = number of correct responses of the upper group
CL = number of correct responses of the lower group
NSG = number of students per group
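The two formulas above can be sketched in Python (a minimal illustration; the function and example numbers are my own, not from the slides):

```python
def index_of_difficulty(num_correct, total_students):
    """IDF = (NRC / TS) * 100: the percentage of students answering correctly."""
    return 100.0 * num_correct / total_students

def index_of_discrimination(correct_upper, correct_lower, group_size):
    """IDN = (CU - CL) / NSG."""
    return (correct_upper - correct_lower) / group_size

# Hypothetical item: 18 of 20 upper-group and 6 of 20 lower-group students
# answer correctly.
idf = index_of_difficulty(18 + 6, 40)     # 60.0 -> "average" difficulty
idn = index_of_discrimination(18, 6, 20)  # 0.6  -> strong positive discrimination
```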
Using Information About the Index of Difficulty

The difficulty index of a test item tells a teacher about comprehension of, or performance on, the material or task contained in the item.
For an item to be considered a good item, its difficulty index should be about 50%. An item with a 50% difficulty index is neither easy nor difficult.
If an item has a difficulty index of 67.5%, this means that it is 67.5% easy and 32.5% difficult.
Information on the index of difficulty of an item can help a teacher decide whether an item should be revised, retained or modified.
Interpretation of the Difficulty Index

Range        Difficulty Level
20 & below   Very difficult
21-40        Difficult
41-60        Average
61-80        Easy
81 & above   Very easy
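The interpretation table can be encoded directly (a sketch; the function name is my own, and I treat each range boundary as inclusive of its upper end):

```python
def difficulty_level(idf_percent):
    """Map an index of difficulty (%) to the verbal levels in the table above."""
    if idf_percent <= 20:
        return "Very difficult"
    if idf_percent <= 40:
        return "Difficult"
    if idf_percent <= 60:
        return "Average"
    if idf_percent <= 80:
        return "Easy"
    return "Very easy"
```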
Using Information About the Index of Discrimination
 The index of discrimination tells a teacher the degree to which a test item differentiates the high achievers from the low achievers in his or her class. A test item may have positive or negative discriminating power.
 An item has positive discriminating power when more students from the upper group got the right answer than those from the lower group.
 When more students from the lower group got the correct answer on an item than those from the upper group, the item has negative discriminating power.

There are instances when an item has zero discriminating power – when equal numbers of students from the upper and lower groups got the right answer to a test item.
In the given example, item 5 has the highest discriminating power. This means that it can differentiate between high and low achievers.
Interpretation of the Discrimination Index

Range         Verbal Description
.40 & above   Very good item
.30 - .39     Good item
.20 - .29     Fair item
.09 - .19     Poor item
When should a test item be rejected? Retained? Modified or revised?

A test item can be retained when its level of difficulty is average and its discriminating power is positive.
It has to be rejected when it is either easy/very easy or difficult/very difficult and its discriminating power is negative or zero.
An item can be modified when its difficulty level is average and its discrimination index is negative.
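The three rules above can be sketched as a decision function (an illustration, not an official procedure; the boundary handling and the fallback "review" label for cases the slides do not cover are my own assumptions):

```python
def item_decision(difficulty_pct, discrimination):
    """Apply the retain/reject/modify rules stated above.

    difficulty_pct: index of difficulty, 0-100 (41-60 counts as "average").
    discrimination: index of discrimination, -1 to 1.
    """
    average = 41 <= difficulty_pct <= 60
    if average and discrimination > 0:
        return "retain"
    if average and discrimination < 0:
        return "modify"
    if not average and discrimination <= 0:
        return "reject"
    return "review"  # combinations the slides do not address
```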
Examining Distracter Effectiveness

An ideal item is one that all students in the upper group answer correctly and all students in the lower group answer wrongly, and the responses of the lower group are evenly distributed among the incorrect alternatives.
Developing an Item Data File

 Encourage teachers to undertake an item analysis as


often as practical
 Allowing for accumulated data to be used to make item
analysis more reliable
 Providing for a wider choice of item format and
objectives
 Facilitating the revision of items
 Accumulating a large pool of items as to allow for some
items to be shared with the students for study
purposes.
Limitations of Item Analysis
 It cannot be used for essay items.
 Teachers must be cautious about the damage that may be done to the table of specifications when items not meeting the criteria are deleted from the test. These items should be rewritten or replaced.
What is Item Discrimination?
 Generally, students who did well on the exam should select the correct answer to any given item on the exam.
 The Discrimination Index distinguishes, for each item, between the performance of students who did well and students who did poorly.
How does it work?
 For each item, subtract the number of students in the lower group who answered correctly from the number of students in the upper group who answered correctly.
 Divide the result by the number of students in one group.
 The Discrimination Index is listed in decimal format and ranges between -1 and 1.
What is a "good" value?
Item Discrimination: Examples

Number of correct answers in each group:

Item no.   Upper 1/4   Lower 1/4   Discrimination Index
1          90          20          0.7
2          80          70          0.1
3          100         0           1.0
4          100         100         0.0
5          50          50          0.0
6          20          60          -0.4
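The table's indices follow from the two-step computation described above, assuming each quarter-group contains 100 students (an assumption that makes the listed counts reproduce the listed indices):

```python
def discrimination_index(correct_upper, correct_lower, group_size=100):
    """(upper-group correct - lower-group correct) / students per group."""
    return (correct_upper - correct_lower) / group_size

# Counts from the examples table: item -> (upper 1/4, lower 1/4)
items = {1: (90, 20), 2: (80, 70), 3: (100, 0),
         4: (100, 100), 5: (50, 50), 6: (20, 60)}
indices = {i: discrimination_index(u, l) for i, (u, l) in items.items()}
# indices reproduces the table: 0.7, 0.1, 1.0, 0.0, 0.0, -0.4
```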
Quick Reference
 Use the following table as a guideline to
determine whether an item ( or its
corresponding instruction) should be
considered for revision.
Item Discrimination (D)   Item Difficulty
                          High     Medium   Low
D < 0%                    review   review   review
0% < D < 30%              ok       review   ok
D >= 30%                  ok       ok       ok
Distracter Analysis

The first question of item analysis: how many people choose each response?
If there is only one best response, then all other response options are distracters.
Example from an in-class assignment (N = 35):
Which method has the best internal consistency?
a) Projective test — 1
b) Peer ratings — 1
c) Forced choice — 21
d) Differences n.s. — 12
Distracter analysis (cont'd)
 A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right
2. People who do not know the item have responses equally distributed across the wrong answers

 It is not desirable to have one of the distracters chosen more often than the correct answer.
 This result indicates a potential problem with the question. The distracter may be too similar to the correct answer, and/or there may be something in either the stem or the alternatives that is misleading.
Distracter analysis (cont'd)
 Calculate the number of people expected to choose each of the distracters. If responses were random, the same number is expected for each wrong response (Figure 10-1):

# expected to choose each distracter = (N answering incorrectly) / (number of distracters) = 14 / 3 ≈ 4.7
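Applied to the in-class example above (the function name is my own), the expected count flags option d:

```python
def expected_per_distracter(n_incorrect, n_distracters):
    """Under random guessing, incorrect responses spread evenly over distracters."""
    return n_incorrect / n_distracters

# In-class example: N = 35, 21 chose the correct answer, so 14 were incorrect,
# spread over 3 distracters.
expected = expected_per_distracter(14, 3)   # ~4.7 per distracter
observed = {"a": 1, "b": 1, "d": 12}        # distracter counts from the example
# option "d" (12) far exceeds 4.7 -> flag the item for review
```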
Distracter analysis (cont'd)
When the number of persons choosing a distracter significantly exceeds the number expected, there are 2 possibilities:
1. It is possible that the choice reflects partial knowledge
2. The item is a poorly worded trick question

 Unpopular distracters may lower item and test difficulty because they are easily eliminated
 Extremely popular distracters are likely to lower the reliability and validity of the test
Distracter analysis: Definition
 Compare the performance of the highest- and lowest-scoring 25% of the students on the distracter options (i.e., the incorrect answers presented on the exam)
 Fewer of the top performers should choose each of the distracters as their answer compared to the bottom performers.
Distracter analysis: Examples

Item 1                      A    B    C    D    E    Omit
% of students in upper 1/4  20   5    0    0    0    0
% of students in middle     15   10   10   10   5    0
% of students in lower 1/4  5    5    5    10   0    0

Item 2                      A    B    C    D    E    Omit
% of students in upper 1/4  0    5    5    15   0    0
% of students in middle     0    10   15   5    20   0
% of students in lower 1/4  0    5    10   10   0    0

(Percentages are of all examinees, so each upper/lower row sums to 25 and each middle row to 50.)
Distracter Analysis : Discussion

 What is the purpose of a good distracter?

 Which distracters should you consider


throwing out?
Item analysis report
Exercise: Interpret Item Analysis
 Review the sample report.
 Identify any exam items that may require revision.
 For each identified item, list your observations and hypotheses about the nature of the problem.
Knowledge or Successful Guessing?

Multiple Choice Exam Strategies
- improve the odds by eliminating 1 or more infeasible or unlikely answer options

Descriptive Exam Strategies
- brain dumping
- part marks
- consideration for perfect answers to questions that were not asked
Possibility of a "Random Pass"

Depends on the number of answer options per question and the number of questions!
Percent Pass (≥50%) by Chance

Number of Questions   2-choice   3-choice   4-choice   5-choice
1                     50         33         25         20
2                     75         56         44         36
4                     69         41         26         18
6                     66         32         17         10
10                    62         21         8          3
20                    59         9.2        1.4        .3
50                    56         1          .01        .0004
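These percentages follow from the binomial distribution: a pure guess is correct with probability 1/(number of choices), and "pass" means at least half correct. A minimal sketch (the function is my own, not part of the source material):

```python
from math import comb, ceil

def pct_pass_by_chance(n_questions, n_choices):
    """P(at least half correct) when guessing uniformly, as a percentage."""
    p = 1 / n_choices
    k_min = ceil(n_questions / 2)  # smallest passing number of correct answers
    prob = sum(comb(n_questions, k) * p**k * (1 - p)**(n_questions - k)
               for k in range(k_min, n_questions + 1))
    return 100 * prob

# e.g. 4 questions with 2 choices each gives ~69%, matching the table row for 4
```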
Adjustment for Guessing
Negative marking…
- An elimination strategy reduces the odds of a wrong-answer penalty
- Subtract a percentage of the number of wrong answers from the final grade
- e.g., give a grade of 4 for a correct answer and a score of –1 for a wrong answer on a 4-choice question

Negative marking…
- A score of less than zero is possible
- Students hate negative marking
- Negative marking is not practised in descriptive examinations
- A poor substitute for a test that is too short with too few answer options
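The +4/–1 scheme described above can be written as a one-line scorer (a sketch; blanks scoring 0 is my assumption, and other schemes just change the two weights):

```python
def negative_marked_score(n_correct, n_wrong, reward=4, penalty=1):
    """Negative marking as described on the slide for a 4-choice question:
    +4 per correct answer, -1 per wrong answer; unanswered questions score 0."""
    return reward * n_correct - penalty * n_wrong

# 10 correct, 5 wrong, rest blank -> 40 - 5 = 35
# 0 correct, 3 wrong -> -3 (scores below zero are possible)
```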
Educational Measurement
and Evaluation

Myrna E. Lahoylahoy, Ph.D.


Measurement defined
 The process of quantifying an individual's achievement, personality, attitudes, habits and skills
 Quantitative appraisal of observable phenomena
 The process of assigning symbols to dimensions of phenomena
 An operation performed on the physical world by an observer
 The process by which information about the attributes or characteristics of things is determined and differentiated
Evaluation defined
 The qualitative aspect of determining the outcomes of learning
 The process of ranking with respect to attributes or traits
 Appraising the extent of learning
 Judging the effectiveness of educational experiences
 Interpreting and analyzing changes in behavior
 Describing accurately the quantity and quality of things
 Summing up the results of measurements or tests, giving them meaning based on value judgments
 The systematic process of determining the extent to which instructional objectives are achieved
 Considering evidence in the light of value standards and in terms of the particular situations and goals which the group of individuals is striving to attain
 TESTING – a technique of obtaining information needed for evaluation purposes
◦ Tests, quizzes, and measuring instruments are devices used to obtain such information
FUNCTIONS OF MEASUREMENT
1. INSTRUCTIONAL
a) Principal (basic purpose)
- to determine what knowledge, skills, abilities, habits and attitudes have been acquired
- to determine what progress or extent of learning has been attained
- to determine the strengths, weaknesses, difficulties and needs of students
Functions of Evaluation
1. Evaluation assesses or makes appraisals of
- educational objectives, programs, curricula, instructional materials, facilities
- the teacher
- the learner
- the public relations of the school
- the achievement scores of the learner
2. Evaluation conducts research
Principles of Evaluation
 Evaluation should be
1. Based on clearly stated objectives
2. Comprehensive
3. Cooperative
4. Used Judiciously
5. Continuous and integral part of the
teaching-learning process
Types of Evaluation Used in Classroom Instruction
1. Diagnostic evaluation – detects a pupil's learning difficulties which somehow are not revealed by formative tests. It is more comprehensive and specific.
2. Formative evaluation – provides feedback regarding the student's performance in attaining instructional objectives. It identifies learning errors that need to be corrected and provides information to make instruction more effective.
3. Placement evaluation – defines a student's entry behaviors. It determines the knowledge and skills he possesses which are necessary at the beginning of instruction.
4. Summative evaluation – determines the extent to which the objectives of instruction have been attained; it is used for assigning grades/marks and to provide feedback to students.
Qualities of a Good Measuring Instrument
1. VALIDITY
content, concurrent, predictive, construct
2. RELIABILITY
adequacy, objectivity, testing conditions, test administration procedures
3. USABILITY
(practicality) ease of administration, scoring, interpretation and application; low cost; proper mechanical make-up
VALIDITY
Content validity – face validity or logical validity; used in evaluating achievement tests
Concurrent validity – the test agrees with or correlates with a criterion (e.g., an entrance examination)
Predictive validity – the degree of accuracy with which the test foretells the activity it intends to predict
Construct validity – agreement of the test with a theoretical construct or trait (e.g., IQ)
RELIABILITY
 Methods of estimating reliability:
1. Test–retest method (uses the Spearman rank correlation coefficient)
2. Parallel forms/alternate forms (paired observations are correlated)
3. Split-half method (odd–even halves, computed using the Spearman–Brown formula)
4. Internal-consistency method (Kuder–Richardson formula 20)
5. Scorer reliability method (two examiners independently score a set of test papers, then their scores are correlated)
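For the split-half method, the Spearman–Brown formula steps the half-test correlation up to an estimate for the full-length test, r_full = 2r / (1 + r). A minimal sketch (the function name is my own):

```python
def spearman_brown_full_test(r_half):
    """Full-test reliability from a split-half correlation:
    r_full = 2 * r_half / (1 + r_half)."""
    return 2 * r_half / (1 + r_half)

# odd-even halves correlating at 0.60 give an estimated full-test
# reliability of 0.75
```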
Classification of Measuring Instruments
1. Standard tests
a) Psychological tests – intelligence tests, aptitude tests, personality (rating scale) tests, vocational and professional interest inventories
b) Educational tests
2. Teacher-made tests
planning, preparing, reproducing, administering, scoring, evaluating, interpreting
Criterion- and Norm-Referenced Tests
Norm-Referenced Tests
It compares a student's performance with that of other students in the class.
It uses the normal curve in distributing grades to students by placing them either above or below the mean.
The teacher's main concern is the variability of the scores. The more variable the scores are, the better, because the test can determine how one individual differs from another.
Uses percentiles and standard scores.
It tends to be of average difficulty.
 Measures of Central Tendency
Mean, Median, Mode
 Measures of Variability
Range, Quartile Deviation, Standard Deviation
Measures of Central Tendency
MODE – the crude or inspectional average measure. It is the most frequently occurring score. It is the poorest measure of central tendency.
Advantage: The mode is always a real value since it does not fall on zero. It is simple to approximate by observation for small cases. It does not necessitate arrangement of values.
Disadvantage: It is not rigidly defined and is inapplicable to irregular distributions.
What is the mode of these scores?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
Measures of Central Tendency
MEDIAN – the score that divides the distribution into halves. It is sometimes called the counting average.
Advantage: It is the best measure when the distribution is irregular or skewed. It can be located in an open-ended distribution or when the data are incomplete (e.g., 80% of the cases are reported).
Disadvantage: It necessitates arranging the items according to size before it can be computed.
What is the median?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
Measures of Central Tendency
MEAN – the most widely used and familiar average. It is the most reliable and the most stable of all measures of central tendency.
Advantage: It is the best measure for a regular distribution.
Disadvantage: It is affected by extreme values.
What is the mean?
75, 60, 78, 75, 76, 75, 88, 75, 81, 75
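The three "What is…?" questions above can be checked with Python's statistics module (a quick verification, not part of the original slides):

```python
import statistics

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]
mode = statistics.mode(scores)      # 75 (occurs five times)
median = statistics.median(scores)  # 75 (average of the two middle scores)
mean = statistics.mean(scores)      # 75.8 (758 / 10)
```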
STANDARD DEVIATION
 It is the most important and the best measure of the variability of test scores.
 A small standard deviation means that the group has small variability, i.e., is relatively homogeneous.
 It is used with the mean.
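Continuing the score list from the previous slides, the standard deviation is easy to compute (the population vs. sample distinction is my addition; the slides do not specify which form is intended):

```python
import statistics

scores = [75, 60, 78, 75, 76, 75, 88, 75, 81, 75]
pop_sd = statistics.pstdev(scores)    # ~6.58, treating the class as the population
sample_sd = statistics.stdev(scores)  # ~6.94, with the n-1 correction
```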
Letter Grades under Criterion-, Norm- and Self-Referenced Grading

Grade B
 Criterion-referenced: Very good or proficient; complete knowledge of most content and skills; mastery of most objectives
 Norm-referenced: Very good; performs above the average of the class
 Self-referenced: Very good; some improvement on most or all objectives

Grade C
 Criterion-referenced: Acceptable or basic; command of only the basic content and skills; mastery of some objectives
 Norm-referenced: Average; performs at the class average
 Self-referenced: Acceptable; some improvement on some of the objectives

Grade D
 Criterion-referenced: Lacking; little knowledge of most content; mastery of only a few objectives
 Norm-referenced: Poor; below the class average
 Self-referenced: Lacking; minimal progress on most objectives

Grade F
 Criterion-referenced: Unsatisfactory; lacks knowledge of content; no mastery of objectives
 Norm-referenced: Unsatisfactory; far below the class average; among the worst in the class
 Self-referenced: Unsatisfactory; no improvement on any objectives
Grading and Framing Questions (Frisbie & Waltman, 1992):
 What meaning should each grade symbol carry?
 What should “failure” mean?
 What elements or performances should be incorporated?
 How should the grades in a class be distributed?
 What should the components be like that go into a final
grade?
 What method should be used to assign grades?
 Should borderline cases be reviewed?
 What other factors can influence the philosophy of grading?
Essential Terminology
 Grade: A symbol that represents the degree to which students have met a set of well-defined instructional objectives.
 Absolute Grading: Absolute grading, or criterion-referenced grading, consists of comparisons between a student's performance and some previously defined criteria. Thus, students are not compared to other students. When using absolute grading, one must be careful in designing the criteria that will be used to determine the students' grades.
Essential Terminology
 Relative Grading:
- Relative grading, or norm-referenced grading, consists
of comparisons between a student and the others in the
same class, the norm group.
- Those who perform better than most other students
are assigned the higher grades.
- If using the normal curve in relative grading, then
3.6% of the students should be assigned As, 23.8% Bs,
45.2% Cs, 23.8% Ds, and 3.6% Fs.
- Relative grading emphasizes competition among group
members and does not accurately reflect any objective
level of achievement.
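As a sketch, the normal-curve percentages above can be turned into a rank-based grade assignment. This is illustrative only (the function name and tie handling are our own, not part of the slide); a real policy would also need rules for ties and small classes:

```python
def relative_grades(scores):
    """Norm-referenced grading using the slide's normal-curve percentages:
    3.6% A, 23.8% B, 45.2% C, 23.8% D, 3.6% F (as cumulative cutoffs).
    Illustrative sketch only; ties are broken by sort order."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: -scores[i])   # best student first
    cutoffs = [(0.036, "A"), (0.274, "B"), (0.726, "C"), (0.964, "D"), (1.0, "F")]
    grades = [None] * n
    for pos, idx in enumerate(ranked):
        frac = (pos + 1) / n        # fraction of the class at or above this rank
        grades[idx] = next(g for cut, g in cutoffs if frac <= cut)
    return grades
```

With a class of 100 distinct scores this yields roughly 3 As, 24 Bs, 45 Cs, 24 Ds, and 4 Fs, mirroring the stated percentages.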
Essential Terminology
 Growth Grading (self-referenced grading):
- Consists of comparisons between a
student's performance and his or her perceived
ability/capability.
- Overachievers would be assigned higher
grades, while underachievers would be
assigned lower grades.
- Growth grading, while de-emphasizing
competition, tends to produce grades that are
invalid relative to actual achievement levels.
Letter Grades
 Advantages
- Easy to use
- Easy to interpret (theoretically)
- Concise
 Disadvantages
- Meaning of a grade may vary widely
- Does not address strengths & weaknesses
- K-2 students may feel threatened by them
Number or Percentage Grades
1, 2, 3 or 98%, 80%, 60%
 Advantages
- Easy to use
- Easy to interpret (theoretically)
- Concise
- More continuous than letter grades
- May be combined with letter grades
 Disadvantages
- Meaning of a grade may vary widely
- Does not address strengths & weaknesses
- K-2 students may feel threatened by them
- Meaning may need to be explained/interpreted
Two-Category Grades
Pass/Fail, Acceptable/Unacceptable, S/U
 Advantages
- Less emotional for younger students
- Can encourage risk taking for students who
may not want to take the course for a grade
 Disadvantages
- Less reliable than a continuous measure
- Does not contain much information relative to a
student's achievement
Checklists and Rating Scales:
objectives evaluated by checks or numerical ratings
 Advantages
- Results in a detailed list of student achievements
- May be combined with other measures
 Disadvantages
- May become too detailed to easily comprehend
- Difficult for record keeping
Student-Teacher Conference
Discussion with no grade
 Advantages
- Involves a personal discussion of achievement
- May be used as a formative, ongoing measure
 Disadvantages
- Teachers need to be skilled in discussion and in
offering positive and negative feedback
- Time consuming
- Some students may feel threatened
- Difficult for record keeping
Parent-Teacher Conference:
discussion with no grade
 Advantages
- Involves a personal discussion of achievement and may
alleviate misunderstandings
- Teacher can show samples of work and the rationale for
assessment
- May improve relations with parents
 Disadvantages
- Teachers need to be skilled in discussion and in
offering positive and negative feedback
- Time consuming
- May provoke parent-teacher anxiety
- May be inconvenient for parents
- Difficult for record keeping
Letter to Parents:
explanation with no grade
 Advantages
- Most useful as an additional form of
communication
 Disadvantages
- Short letters may not adequately communicate
a student's achievement
- Requires good writing skills
- Time consuming
Guidelines for Effective and Fair
Grading (Gronlund, 1998)
 Discuss with students (and parents when
appropriate) the basis of all grading, and all
grading procedures, at the beginning of the
course/school year.
 Grades should reflect, and be based on,
students' level of achievement, using only
those assessments that validly measure
achievement.
 Grades should reflect, and be based on, a
composite of several valid assessments.
Guidelines for Effective and Fair
Grading
 When combining several valid assessments,
each assessment should be appropriately
weighted.
 An appropriate type of grading framework
should be adopted, given the ultimate use of
the grade.
 All borderline grades should be re-evaluated
based on a careful examination of all
achievement evidence.
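The weighting guideline can be made concrete with a small sketch. The function name, the component list, and the weights below are hypothetical examples, not prescriptions from the slides:

```python
def composite_grade(scores, weights):
    """Weighted average of several assessment scores (0-100 scale).
    A sketch of the 'appropriately weighted' guideline; the weights
    used below are purely illustrative."""
    if len(scores) != len(weights):
        raise ValueError("one weight per assessment is required")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical course: two tests (30% each), a project (25%), homework (15%)
final = composite_grade([82, 90, 75, 95], [0.30, 0.30, 0.25, 0.15])
print(round(final, 1))   # → 84.6
```

Dividing by the sum of the weights keeps the result on the original 0-100 scale even if the weights do not add up to exactly 1.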
A Few More Hints on Effective
Grading
 Emphasize fair grading and scoring.
 Grade relative to specific learning objectives.
 Base grades primarily on current performance.
 Provide accurate, timely, and helpful feedback.
 Use a sufficient number of assessments.
 Don't lower grades due to misbehavior or
attendance.
 Use professional judgment.
Common criticisms of Grading
 Harmful to a student's psyche.
 Do not motivate, but may provide a disincentive.
 Mastery may not be the purpose of the
activity, or 100% performance may be
necessary.
 Performance may be necessary to determine
acquisition of a skill (e.g., piano, computer).
 Written activities do not emphasize oral
communication, which may be a more
functional skill.
Are grades meaningless in the larger
context of education?
 There are vast differences in grading
practices between teachers and schools.
 Most schools lack a standardized and
codified grading policy.
 A grade, a simple symbol, is incapable of
conveying the complexity of a student's
achievement.
 Grading is not always valued by teachers and
thus often suffers from carelessness.
 Teachers often use grading as a form of
discipline and motivation, rather than as an
assessment report.
Some benefits of learning outcomes
 Select content
 Develop an instructional strategy
 Develop and select instructional materials
 Construct tests and other instruments for
assessing and evaluating
 Improve yourself as a teacher, and the overall
program
Writing Learning Outcomes
 Learning Outcomes Formula
 Bloom's Taxonomy
 Characteristics of Good Learning Outcomes
 Learning Outcomes Exercise
 Write Your Learning Outcomes
Theory Into Practice
5 Questions for Instructional Design
1. What do you want the student to be able to do?
(outcome)
2. What does the student need to know in order to do
this well? (curriculum)
3. What activity will facilitate the learning? (pedagogy)
4. How will the student demonstrate the learning?
(assessment)
5. How will I know the student has done this well?
(criteria)
1. What do you want the student to
be able to do?
This question asks you to develop the
outcome.
For example:
The student identifies, consults, and evaluates
reference books appropriate to the topic in
order to locate background information and
statistics.
Example 3
 Bad Outcome
- Use Illiad and Texshare in order to access
materials not available at UT Arlington
Library.
Example 3
 Good Outcome
- Utilize retrieval services in order to obtain
materials not owned by UT Arlington library.
Last example…I Promise
 Bad Outcome
- Students will construct bibliographies and
in-text references using discipline
appropriate styles in order to contribute to
academic discourse in their discipline.
Last Example…I Promise
 Good Outcome
- Construct bibliographies and in-text
references using discipline appropriate styles
in order to correctly attribute others' work
and ideas.
Let’s Write a Learning Outcomes
 We’re taking a friend camping for the first
time (not roughing it too much).
 What do they need to know?
Let’s Write a Learning Outcome
 We’ll concentrate on how to build a fire
 Why do we want our friend to be able to
properly build a fire?
Let’s Write a Learning Outcome
 Now let’s write the learning outcome
 What is our verb? (use Bloom's Taxonomy)
 Why?
Reliability
A test is reliable when it yields consistent results. To
establish reliability, researchers use several procedures:
1. Split-half Reliability: Dividing the test into two equal
halves and assessing how consistent the scores across
them are.
2. Alternate-forms Reliability: Using different forms
of the test to measure the consistency between them.
3. Test-Retest Reliability: Using the same test on two
occasions to measure consistency.
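A minimal sketch of procedure 1, split-half reliability: correlate odd-item and even-item half-test totals, then step the half-test correlation up with the Spearman-Brown formula. The function names are our own, not from the slides:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of per-item scores per student.
    Correlate odd-item vs. even-item half-test totals, then apply the
    Spearman-Brown correction to estimate full-test reliability."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)     # Spearman-Brown step-up
```

Test-retest reliability (procedure 3) would simply be `pearson_r` applied to the total scores from the two administrations.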
Validity
Reliability of a test does not ensure validity. The validity of a test refers to
whether it measures or predicts what it is supposed to.
1. Content Validity: Refers to the extent to which a test measures your definition
of the construct.
2. Criterion-related Validity: The relationship between scores on a test and an
independent measure of what the test is supposed to measure.
   1. Predictive Validity: Refers to the function of a test in predicting a
particular behavior or trait. For instance, we might theorize that a measure
of math ability should be able to predict how well a person will do in an
engineering-based profession.
   2. Convergent Validity: We examine the degree to which the
operationalization is similar to (converges on) other operationalizations. For
example, we might correlate the scores on our test with scores on other tests
that purport to measure basic math ability; high correlations would be
evidence of convergent validity.
Norm-referenced standard
Test score distribution
Criterion-referenced standard
Test score distribution (average group)
Test score distribution (poor group)
Test score distribution (good group)
Definitions
 Assessment – The process of measuring
something with the purpose of assigning a
numerical value.
 Scoring – The procedure of assigning a
numerical value to an assessment task.
 Evaluation – The process of determining the
worth of something in relation to established
benchmarks using assessment information.
Assessment Types
 Formative – for performance enhancement
 Summative – for judging final performance
 Formal – quizzes, tests, essays, lab
reports, etc.
 Informal – active questioning during and at the
end of class
 Traditional – tests, quizzes, homework, lab
reports; assessed by the teacher
 Alternative – PBLs, presentations, essays,
book reviews; assessed by peers
Alternative Assessment
 Alternative to what? Paper & pencil exams
 Alternatives:
- lab work / research projects
- portfolios
- presentations
- research papers
- essays
- self-assessment / peer assessment
- lab practicals
- classroom "clickers" or responder pads
More Formal Alternatives
 Rube Goldberg projects
 Bridge building / rocketry / mousetrap cars
 Writing a computer program
 Research project
 Term paper
 Create web page
 Create movie
 Role playing
 Building models
 Academic competitions
Informal CATs (Classroom Assessment
Techniques)
 Quick-fire questions
 Minute paper
  1) What did you learn today?
  2) What questions do you have?
 Directed paraphrasing (explain a concept to a
particular audience)
 The "muddiest" point (what is it about the
topic that remains unclear to you?)
Authentic Assessment
 The National Science Education Standards
draft (1994) states, "Authentic assessment
exercises require students to apply scientific
information and reasoning to situations like
those they will encounter in the world outside
the classroom, as well as situations that
approximate how scientists do their work."
Assessment Concerns
 Validity – is the test assessing what’s intended?
- are test items based on stated objectives?
- are test items properly constructed?
 Difficulty – are questions too hard or too easy?
(e.g., 30% to 70% of students should answer a
given item correctly)
 Discriminability – is performance on
individual test items positively correlated with
overall student performance? (e.g., only the best
students do well on the most difficult questions)
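The difficulty and discriminability checks above can be sketched for a 0/1-scored test, where rows are students and columns are items. The function name is our own, and the code assumes every item and the totals actually vary:

```python
def item_statistics(responses):
    """For each item, return (difficulty, discrimination):
    difficulty = proportion of students answering correctly,
    discrimination = Pearson correlation of the item with total scores.
    Illustrative sketch; assumes non-zero variance everywhere."""
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        difficulty = sum(item) / len(responses)
        stats.append((difficulty, pearson(item, totals)))
    return stats
```

By the slide's rule of thumb, items with difficulty outside roughly 0.30-0.70, or with low or negative discrimination, deserve review.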
Criterion-Referenced Eval’s
 Based on a predetermined set of criteria.
 For instance,
- 90% and up = A
- 80% to 89.99% = B
- 70% to 79.99% = C
- 60% to 69.99% = D
- 59.99% and below = F
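The cutoff table maps directly to a small function; a sketch using the percentages above (the function name is our own):

```python
def criterion_grade(percent):
    """Letter grade from a percentage score, using the slide's cutoffs."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"

print(criterion_grade(84.5))   # → B
```

Note that the grade depends only on the predetermined criteria, never on how classmates performed.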
Criterion-Referenced Eval’s
 Pros:
- Sets minimum performance expectations.
- Demonstrates what students can and cannot
do in relation to important content-area
standards (e.g., ILS).
 Cons:
- Sometimes it's hard to know just where to
set the boundary conditions.
- Lack of comparison data with other students
and/or schools.
Norm-referenced Evaluation
 Based upon the assumption of a standard
normal (Gaussian) distribution with n > 30.
 Employs the z score:
- A = top 10% (z > +1.28)
- B = next 20% (+0.53 < z < +1.28)
- C = central 40% (-0.53 < z < +0.53)
- D = next 20% (-1.28 < z < -0.53)
- F = bottom 10% (z < -1.28)
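The z-score cutoffs above can be applied mechanically. A sketch (using the population standard deviation, and ignoring the n > 30 caveat for the toy data; the function name is our own):

```python
def z_score_grades(scores):
    """Norm-referenced grades from z scores, per the slide's cutoffs:
    A: z > 1.28, B: z > 0.53, C: z >= -0.53, D: z >= -1.28, else F.
    Sketch only; in practice n > 30 is assumed."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5

    def grade(z):
        if z > 1.28:
            return "A"
        if z > 0.53:
            return "B"
        if z >= -0.53:
            return "C"
        if z >= -1.28:
            return "D"
        return "F"

    return [grade((x - mean) / sd) for x in scores]

print(z_score_grades([50, 60, 70, 80, 90]))   # → ['F', 'D', 'C', 'B', 'A']
```

Because z scores are relative to the class mean, the same raw score can earn different grades in different classes.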
Norm-referenced Evaluation
 Pros:
- Ensures a "spread" between the top and bottom of the
class for clear grade setting.
- Shows student performance relative to the group.
 Cons:
- In a group with great performance, some students are
still ensured an "F".
- Top and bottom performances can sometimes be very
close.
- Dispenses with absolute criteria for performance.
- Being above average does not necessarily imply "A"
performance.
Norm and Criterion Compared
 Norm-Referenced:
- Ensures a competitive classroom atmosphere
- Assumes a standard normal distribution
- Small-group statistics are a problem
- Assumes "this" class is like all others
 Criterion-Referenced:
- Allows for a cooperative classroom atmosphere
- No assumptions about the form of the distribution
- Small-group statistics are not a problem
- Difficult to know just where to set the criteria